Michael Gibney created SOLR-16144:
-------------------------------------

             Summary: Don't internally round [foreground|background]_popularity 
values in RelatednessAgg
                 Key: SOLR-16144
                 URL: https://issues.apache.org/jira/browse/SOLR-16144
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
          Components: Facet Module
    Affects Versions: main (10.0)
            Reporter: Michael Gibney


The "relatedness" facet function supports the concept of 
{{foreground_popularity}} and {{background_popularity}} -- i.e., the 
cardinality of the intersection of bucket domain with the foreground and 
background sets (respectively), each normalized with respect to background set 
cardinality.

The purpose of these values appears to be twofold:
# To provide clients with context for the computed relatedness values
# To (optionally) screen out "noise" from low-frequency terms up front, via 
the {{min_popularity}} function parameter.

For both purposes, popularity values are currently rounded to 5 decimal places.

This issue proposes that although rounding to 5 decimal places makes sense for 
the _first_ case (providing context to clients), this arbitrary loss of 
precision does not make sense as currently implemented for internally 
evaluating threshold popularity values for bucket inclusion.

Consider the case of a high-cardinality field with a relatively large 
background set and a selective foreground set. For {{|background_set| = 
2,000,000}} and a foreground set of cardinality 9, the highest possible 
foreground popularity is 9 / 2,000,000 = 0.0000045, which rounds to 0 at 5 
decimal places. As a result, even a bucket with a domain that exactly matches 
the foreground set would be screened out, for _any_ explicit (positive) setting 
of {{min_popularity}}.
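
The arithmetic in this example can be checked with a small standalone sketch. 
It is not the Solr code: the class and method names are hypothetical, and the 
rounding helper is only an assumption about what "rounded to 5 digits" means 
(scale by 1e5, round to nearest, scale back).

```java
// Standalone illustration only; names are hypothetical, not Solr's API.
final class PopularityRounding {

    // Assumed rounding rule: keep 5 decimal places, round to nearest.
    static double roundTo5Places(double val) {
        return Math.round(val * 1e5) / 1e5;
    }

    // Popularity as described above: the size of the intersection of the
    // bucket domain with a set, normalized by background set cardinality.
    static double popularity(long intersectionSize, long backgroundSize) {
        return (double) intersectionSize / backgroundSize;
    }

    public static void main(String[] args) {
        double raw = popularity(9, 2_000_000);
        double rounded = roundTo5Places(raw);
        double minPopularity = 0.000001; // any positive threshold behaves the same

        System.out.println("raw popularity     = " + raw);
        System.out.println("rounded popularity = " + rounded);
        System.out.println("passes using rounded value: " + (rounded >= minPopularity));
        System.out.println("passes using raw value:     " + (raw >= minPopularity));
    }
}
```

Under these assumptions the rounded value is exactly zero, so the bucket fails 
every positive threshold, while the unrounded value would pass any threshold 
up to 0.0000045.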

This behavior is due to where the rounding takes place (internally, upon 
initial {{computeDerivedValues()}}). It is further problematic that 
{{RelatednessAgg}} will currently accept {{min_popularity < 0.00001}}, which 
would be guaranteed to exclude _all_ buckets.
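
One possible shape for the proposed change, as a rough sketch rather than a 
patch: keep the unrounded values for the internal threshold check and round 
only the values reported to clients. Everything here is hypothetical 
({{BucketPopularity}}, its methods, and the rule that both popularities must 
meet the threshold are assumptions made for illustration, not taken from 
{{RelatednessAgg}}).

```java
// Hypothetical sketch; not the actual RelatednessAgg implementation.
final class BucketPopularity {
    private final double fgPop; // unrounded foreground popularity
    private final double bgPop; // unrounded background popularity

    // Counts are intersections of the bucket domain with each set; both are
    // normalized by background set cardinality. bgSize is assumed to be > 0.
    BucketPopularity(long fgCount, long bgCount, long bgSize) {
        this.fgPop = (double) fgCount / bgSize;
        this.bgPop = (double) bgCount / bgSize;
    }

    // Internal screening compares the unrounded values against the threshold.
    // Assumption: a bucket is kept only if both popularities meet it.
    boolean meetsMinPopularity(double minPopularity) {
        return fgPop >= minPopularity && bgPop >= minPopularity;
    }

    // Rounding is applied only to the values reported to clients.
    double reportedForegroundPopularity() {
        return roundTo5Places(fgPop);
    }

    double reportedBackgroundPopularity() {
        return roundTo5Places(bgPop);
    }

    private static double roundTo5Places(double val) {
        return Math.round(val * 1e5) / 1e5;
    }
}
```

With this separation, the example bucket above (counts of 9 against a 
background of 2,000,000) would still be reported with a popularity of 0.0, but 
it would pass a {{min_popularity}} of 0.000001 and fail one of 0.00001.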



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
