Hi Gautam,

Thank you for sharing your thoughts! Matplotlib is a great resource that I should look into. Currently, I am exploring the possibility of generating equi-probable buckets with variable widths based on the data. Uniform bin width is a good starting point, since it makes it easy to merge buckets from different histograms.

Thanks,
Yuting Gan

On Wed, Jun 2, 2021 at 9:11 PM Gautam Worah <[email protected]> wrote:

> Disclaimer: I work with Yuting in Amazon Product Search, but the thoughts
> in this mail are independent and entirely mine.
>
> I tried to see if something similar has been done in other libraries and
> came across some interesting finds in matplotlib
> <https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html>.
> matplotlib provides support for automated histograms with different
> options for treating the data:
>
> 1. Provide the number of bins you want.
> 2. Let the library decide the number of bins and the uniform bin width
>    (with Scott's method
>    <https://docs.astropy.org/en/stable/api/astropy.stats.scott_bin_width.html#astropy.stats.scott_bin_width>,
>    the Freedman-Diaconis rule
>    <https://docs.astropy.org/en/stable/api/astropy.stats.freedman_bin_width.html#astropy.stats.freedman_bin_width>,
>    or other methods
>    <https://numpy.org/doc/stable/reference/generated/numpy.histogram_bin_edges.html#numpy.histogram_bin_edges>).
>
> I think this option could provide reasonably good results for most
> "normal" use cases.
>
> The blog post you've linked provides bins with different widths. That is
> another important factor in deciding the implementation.
> Lots of directions to explore!
>
> Regards,
> Gautam Worah.
>
>
> On Wed, May 26, 2021 at 3:52 PM Yuti G <[email protected]> wrote:
>
>> Hello everyone,
>>
>> I have been exploring the possibility of getting dynamic numeric range
>> facet counts without users specifying ranges.
>>
>> An example use case might be a price filter on an e-commerce site.
>> Instead of requiring ranges to be pre-defined before doing facet counting
>> in Lucene, it would be really cool if Lucene could examine the matching
>> products and automatically determine relevant price ranges.
>>
>> I saw this blog post
>> <https://www.elastic.co/guide/en/elasticsearch/reference/7.9/search-aggregations-bucket-variablewidthhistogram-aggregation.html>
>> where Elasticsearch implemented similar functionality, and I think it
>> would be useful to bring a similar idea into Lucene itself.
>>
>> I am very early in thinking about this. Has anybody else thought about
>> this?
>>
>> If anyone is interested or has any thoughts, I am more than happy to
>> learn from you. Please let me know :)
>>
>> Thanks,
>> Yuting
>>
>
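For reference, here is a minimal Java sketch of the two bucketing strategies discussed in this thread: a uniform bin width chosen by Scott's rule or the Freedman-Diaconis rule, and equi-probable (quantile-based) bucket edges whose widths vary with the data. It is a sketch under stated assumptions, not a proposed Lucene API: the class and method names (HistogramBins, scottWidth, freedmanDiaconisWidth, equiProbableEdges) are hypothetical, the formulas follow those documented for astropy's scott_bin_width (3.5 times the population standard deviation over the cube root of n) and freedman_bin_width (2 times the interquartile range over the cube root of n), quantiles use linear interpolation between sorted values, and it does not attempt to reproduce the Elasticsearch aggregation linked above.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration only; not part of Lucene or any library.
// All methods assume a non-empty input array.
final class HistogramBins {

    // Quantile q in [0, 1] of an already sorted array, interpolating linearly
    // between the two neighbouring sorted values.
    static double quantile(double[] sorted, double q) {
        double pos = q * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = Math.min(lo + 1, sorted.length - 1);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    static double[] sortedCopy(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted;
    }

    // Freedman-Diaconis rule: width = 2 * IQR / n^(1/3).
    static double freedmanDiaconisWidth(double[] values) {
        double[] sorted = sortedCopy(values);
        double iqr = quantile(sorted, 0.75) - quantile(sorted, 0.25);
        return 2.0 * iqr / Math.cbrt(sorted.length);
    }

    // Scott's rule: width = 3.5 * sigma / n^(1/3), sigma = population standard deviation.
    static double scottWidth(double[] values) {
        double mean = Arrays.stream(values).average().orElse(0.0);
        double variance = Arrays.stream(values)
                .map(v -> (v - mean) * (v - mean))
                .sum() / values.length;
        return 3.5 * Math.sqrt(variance) / Math.cbrt(values.length);
    }

    // Equi-probable edges: numBins + 1 quantiles, so each bucket receives roughly
    // values.length / numBins values while its width adapts to the data.
    static double[] equiProbableEdges(double[] values, int numBins) {
        double[] sorted = sortedCopy(values);
        double[] edges = new double[numBins + 1];
        for (int i = 0; i <= numBins; i++) {
            edges[i] = quantile(sorted, (double) i / numBins);
        }
        return edges;
    }

    public static void main(String[] args) {
        // Made-up prices of matching products, for illustration only.
        double[] prices = {3.99, 4.50, 5.25, 9.99, 12.00, 14.75, 19.99, 24.99, 49.99, 199.00};
        System.out.println("Scott width: " + scottWidth(prices));
        System.out.println("Freedman-Diaconis width: " + freedmanDiaconisWidth(prices));
        System.out.println("Equi-probable edges: " + Arrays.toString(equiProbableEdges(prices, 4)));
    }
}
```

On the merging point: if every histogram uses the same origin and the same uniform width, bucket boundaries line up exactly, so merging reduces to summing counts per bucket. Equi-probable edges depend on each data set's own quantiles, so two such histograms generally have different boundaries and cannot be merged by simply adding counts.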
