[ 
https://issues.apache.org/jira/browse/SOLR-10548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15980235#comment-15980235
 ] 

Otis Gospodnetic commented on SOLR-10548:
-----------------------------------------

A paper published in January introduced a new cardinality estimation
algorithm called LogLog-Beta (LogLog-β):

https://arxiv.org/abs/1612.02284

"The new algorithm uses only one formula and needs no additional bias
corrections for the entire range of cardinalities, therefore, it is more
efficient and simpler to implement. Our simulations show that the accuracy
provided by the new algorithm is as good as or better than the accuracy
provided by either of HyperLogLog or HyperLogLog++."

Some comments about its accuracy (graphs included) can be found in this PR:
https://github.com/antirez/redis/pull/3677
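
To make the "one formula, no additional bias corrections" point concrete, here is a rough Python sketch, not code from Solr, Redis, or the paper. It shows a classic HyperLogLog estimate with its separate small-range (linear counting) branch, next to the single-formula shape that LogLog-Beta uses. The precision P = 12, the blake2b hash, and all function names are my own assumptions for illustration. The beta correction term is passed in by the caller because its fitted coefficients depend on the precision and have to be taken from the paper; none are hard-coded here.

```python
import hashlib
import math

P = 12          # precision (index bits); an illustrative choice, not Solr's
M = 1 << P      # number of registers


def _hash64(item: str) -> int:
    """64-bit hash of a string (blake2b is just a convenient stdlib choice)."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def add(registers: list, item: str) -> None:
    """Record one item in the register array."""
    h = _hash64(item)
    idx = h >> (64 - P)                    # top P bits select the register
    rest = h & ((1 << (64 - P)) - 1)       # remaining 64 - P bits
    # rank = 1-based position of the leftmost 1-bit in `rest`
    # (64 - P + 1 when `rest` is zero)
    rank = (64 - P) - rest.bit_length() + 1
    if rank > registers[idx]:
        registers[idx] = rank


def _alpha(m: int) -> float:
    """Standard HyperLogLog constant, valid for m >= 128."""
    return 0.7213 / (1.0 + 1.079 / m)


def hll_estimate(registers: list) -> float:
    """Classic HyperLogLog: raw estimate plus a separate small-range branch.

    The large-range correction is omitted because a 64-bit hash is used.
    """
    m = len(registers)
    raw = _alpha(m) * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        return m * math.log(m / zeros)     # linear counting
    return raw


def loglog_beta_estimate(registers: list, beta) -> float:
    """Single-formula shape of LogLog-Beta.

    `beta` is a caller-supplied function of the zero-register count z;
    the fitted polynomial in log(z + 1) must come from the paper.
    """
    m = len(registers)
    z = registers.count(0)
    harmonic = sum(2.0 ** -r for r in registers)
    return _alpha(m) * m * (m - z) / (beta(z) + harmonic)
```

Usage would look something like `regs = [0] * M`, then `add(regs, value)` for each value, then `hll_estimate(regs)`. Note that when no register is zero and beta(0) = 0, the LogLog-Beta expression reduces to the raw HyperLogLog estimate; the difference between the two approaches is confined to how low and mid cardinalities are handled.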

> hyper-log-log based numBuckets for faceting
> -------------------------------------------
>
>                 Key: SOLR-10548
>                 URL: https://issues.apache.org/jira/browse/SOLR-10548
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Facet Module
>            Reporter: Yonik Seeley
>
> numBuckets currently uses an estimate (same as the unique function detailed 
> at http://yonik.com/solr-count-distinct/ ).  We should either change 
> implementations or introduce a way to optionally select a hyper-log-log based 
> approach for a better estimate with high field cardinalities.


