[ https://issues.apache.org/jira/browse/SOLR-9152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15296998#comment-15296998 ]
Scott Blum commented on SOLR-9152:
----------------------------------

Why is this even an option? I don't understand why you'd ever need the distributed request to have mincount=0 when the main request has mincount=1.

> Change the default of facet.distrib.mco from false to true
> ----------------------------------------------------------
>
>                 Key: SOLR-9152
>                 URL: https://issues.apache.org/jira/browse/SOLR-9152
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Dennis Gove
>            Priority: Minor
>
> SOLR-8988 added a new query option, facet.distrib.mco, which, when set to true, allows the use of facet.mincount=1 in cloud mode. The previous behavior, and the current default, is that facet.mincount=0 is used in cloud mode.
>
> h3. What exactly would be changed?
> The default of facet.distrib.mco=false would be changed to facet.distrib.mco=true.
>
> h3. When is this option effective?
> From the documentation,
> {code}
> /**
>  * If we are returning facet field counts, are sorting those facets by their count, and the minimum count to return is > 0,
>  * then allow the use of facet.mincount = 1 in cloud mode. To enable this use facet.distrib.mco=true.
>  *
>  * i.e. If the following three conditions are met in cloud mode: facet.sort=count, facet.limit > 0, facet.mincount > 0.
>  * Then use facet.mincount=1.
>  *
>  * Previously and by default facet.mincount will be explicitly set to 0 when in cloud mode for this condition.
>  * In SOLR-8599 and SOLR-8988, significant performance increase has been seen when enabling this optimization.
>  *
>  * Note: enabling this flag has no effect when the conditions above are not met. For those other cases the default behavior is sufficient.
>  */
> {code}
>
> h3. What is the result of turning this option on?
> When facet.distrib.mco=true is used and the conditions above are met, Solr includes facet.mincount=1 in the requests it sends to the individual shards.
> The result is that only terms with a count > 0 are considered when a shard processes its request. This can yield a significant performance gain when the field has high cardinality and the matching docset is relatively small, because terms with 0 matches are never considered.
> As shown in SOLR-8988, the runtime of a single query was reduced from 20 seconds to less than 1 second.
>
> h3. Can this change result in worse performance?
> The current thinking is no: performance will not get worse, even in non-optimal scenarios. From the comments in SOLR-8988,
> {quote}
> Consider you asked for up to 10 terms from shardA with mincount=1 but you received only 5 terms back. In this case you know, definitively, that a term seen in the response from shardB but not in the response from shardA could have at most a count of 0 in shardA. If it had any other count in shardA then it would have been returned in the response from shardA.
>
> Also, if you asked for up to 10 terms from shardA with mincount=1 and you get back a response with 10 terms having a count >= 1 then the response is identical to the one you'd have received if mincount=0.
>
> Because of this, there isn't a scenario where the response would result in more work than would have been required if mincount=0. For this reason, the decrease in required work when mincount=1 is *always* either a moot point or a net win.
> {quote}
>
> The belief here is that it is safe to change the default of facet.distrib.mco so that facet.mincount=1 is used when appropriate. The overall performance gain can be significant, and no performance cost has been observed.
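The two pieces of reasoning in the description (when the shard request gets facet.mincount=1, and what the coordinator can infer from a shard response) can be sketched in Java as below. This is a hypothetical model, not Solr's implementation: the class `DistribMcoSketch` and its methods `shardMincount`, `shardResponse` and `absentTermUpperBound` are invented for illustration, a shard is modelled as an in-memory map of term counts, and the "full page" bound (a missing term cannot outrank the last returned entry of a count-sorted list) is a general property of top-N lists rather than something stated in the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;

/**
 * Hypothetical sketch of the behaviour discussed in SOLR-9152 / SOLR-8988.
 * None of these names are Solr's; they only model the reasoning in the issue.
 */
final class DistribMcoSketch {

    /**
     * facet.mincount sent to each shard when facet.sort=count, facet.limit > 0
     * and facet.mincount > 0 all hold; empty when they do not, because the
     * flag then has no effect and the existing default logic applies.
     */
    static OptionalInt shardMincount(String sort, int limit, int mincount, boolean distribMco) {
        boolean conditionsMet = "count".equals(sort) && limit > 0 && mincount > 0;
        if (!conditionsMet) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(distribMco ? 1 : 0);
    }

    /** Models one shard: keep terms with count >= mincount, sort by count, return at most 'requested'. */
    static List<Map.Entry<String, Long>> shardResponse(Map<String, Long> termCounts, int requested, int mincount) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>();
        for (Map.Entry<String, Long> e : termCounts.entrySet()) {
            if (e.getValue() >= mincount) {
                entries.add(Map.entry(e.getKey(), e.getValue()));
            }
        }
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue()); // count descending
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey()); // ties: term ascending
        });
        if (entries.size() > requested) {
            return new ArrayList<>(entries.subList(0, requested));
        }
        return entries;
    }

    /**
     * Upper bound on the count, in one shard, of a term missing from that shard's
     * response. Assumes requested > 0 (facet.limit > 0 is one of the conditions).
     */
    static long absentTermUpperBound(List<Map.Entry<String, Long>> response, int requested, int mincount) {
        if (response.size() < requested) {
            // Short page: the shard ran out of terms meeting mincount, so a missing
            // term is below it. With mincount=1 this is the quoted "at most a count of 0".
            return Math.max(0L, mincount - 1L);
        }
        // Full page: a missing term sorts at or after the last returned entry.
        return response.get(response.size() - 1).getValue();
    }
}
```

With a sparse shard (2 matching terms out of 12 in the field, 10 requested), the mincount=1 response carries 2 entries instead of 10, yet both responses give the same bound of 0 for missing terms. With a dense shard where every term has a count >= 1, the two responses are identical, which is the "moot point or net win" argument expressed in code.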