Hi Gautam,

Thank you for sharing your thoughts!


Matplotlib is a great resource I should look at. Currently, I am exploring
the possibility of generating equi-probable buckets with variable widths
based on the data. Uniform bin width is a good starting point, since
buckets from different histograms that share the same width are easy to
merge.
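
To make the trade-off concrete, here is a rough Python sketch of both ideas.
It is only an illustration, not Lucene code: the function names
(equi_probable_edges, uniform_counts, merge_uniform) are hypothetical, and
it assumes all values fit in memory and can simply be sorted.

```python
from collections import Counter


def equi_probable_edges(values, num_buckets):
    """Return bucket edges so each bucket holds roughly the same number of values.

    Hypothetical helper: sorts everything in memory and picks edges by rank.
    """
    if not values or num_buckets < 1:
        raise ValueError("need at least one value and one bucket")
    data = sorted(values)
    n = len(data)
    edges = [data[0]]
    for i in range(1, num_buckets):
        # The value at rank i*n/num_buckets becomes the i-th inner edge.
        edges.append(data[(i * n) // num_buckets])
    edges.append(data[-1])
    return edges


def uniform_counts(values, width):
    """Count values per fixed-width bucket, keyed by bucket index floor(v / width)."""
    return Counter(int(v // width) for v in values)


def merge_uniform(a, b):
    """Merge two histograms that share the same bucket width by adding counts."""
    return a + b


prices = [1, 2, 3, 4, 5, 6, 7, 8, 100, 200, 300, 1000]
print(equi_probable_edges(prices, 4))  # [1, 4, 7, 200, 1000]

left = uniform_counts([5, 15, 25], 10)
right = uniform_counts([12, 18, 31], 10)
print(sorted(merge_uniform(left, right).items()))  # [(0, 1), (1, 3), (2, 1), (3, 1)]
```

With uniform widths the merge is just an addition per bucket index, whereas
equi-probable edges computed on different data sets generally do not line
up, so merging them would need some form of re-bucketing.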


Thanks,

Yuting Gan

On Wed, Jun 2, 2021 at 9:11 PM Gautam Worah <[email protected]> wrote:

> Disclaimer: I work with Yuting in Amazon Product Search but the thoughts
> in this mail are independent and entirely mine.
>
> I tried to see if something similar has been done in other libraries and
> came across some interesting
> <https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html>
> finds in matplotlib.
> matplotlib supports automated histograms, with different options for
> choosing the bins:
>
> 1. Provide the number of bins you want
> 2. Let the library decide the number of bins and the uniform bin width
> (with Scott's method
> <https://docs.astropy.org/en/stable/api/astropy.stats.scott_bin_width.html#astropy.stats.scott_bin_width>,
> or Freedman-Diaconis rule
> <https://docs.astropy.org/en/stable/api/astropy.stats.freedman_bin_width.html#astropy.stats.freedman_bin_width>
> or other methods
> <https://numpy.org/doc/stable/reference/generated/numpy.histogram_bin_edges.html#numpy.histogram_bin_edges>).
>
> I think this option could provide reasonably good results which would work
> for most "normal" use cases.
>
> The blog post you've linked provides bins with different widths. That is
> another important factor in deciding the implementation.
> Lots of directions to explore!
>
> Regards,
> Gautam Worah.
>
>
> On Wed, May 26, 2021 at 3:52 PM Yuti G <[email protected]> wrote:
>
>> Hello everyone,
>>
>> I have been exploring the possibilities of getting dynamic numeric range
>> facet counts without users specifying ranges.
>>
>> An example use-case might be a price filter on an e-commerce site.
>> Instead of requiring ranges to be pre-defined before doing facet counting
>> in Lucene, it would be really cool if Lucene could examine the matching
>> products and automatically determine relevant price ranges.
>>
>> I saw this blog post <
>> https://www.elastic.co/guide/en/elasticsearch/reference/7.9/search-aggregations-bucket-variablewidthhistogram-aggregation.html>
>> where Elasticsearch implemented similar functionality and think it would be
>> useful to bring a similar idea into Lucene itself.
>>
>> I am very early in thinking about this. Has anybody else thought about
>> this?
>>
>> If anyone is also interested or has any thoughts, I am more than happy to
>> learn from you. Please let me know :)
>>
>> Thanks,
>> Yuting
>>
>
