Dennis Gove created SOLR-9152:
---------------------------------

             Summary: Change the default of facet.distrib.mco from false to true
                 Key: SOLR-9152
                 URL: https://issues.apache.org/jira/browse/SOLR-9152
             Project: Solr
          Issue Type: Improvement
            Reporter: Dennis Gove
            Priority: Minor


SOLR-8988 added a new query option, facet.distrib.mco, which, when set to true, 
allows the use of facet.mincount=1 in cloud mode. The previous behavior, and 
the current default, is to use facet.mincount=0 in cloud mode. 

h3. What exactly would be changed?
The default of facet.distrib.mco=false would be changed to 
facet.distrib.mco=true.

h3. When is this option effective?
From the documentation:
{code}
/**
 * If we are returning facet field counts, are sorting those facets by their count, and the minimum count to return is > 0,
 * then allow the use of facet.mincount = 1 in cloud mode. To enable this use facet.distrib.mco=true.
 *
 * i.e. If the following three conditions are met in cloud mode: facet.sort=count, facet.limit > 0, facet.mincount > 0.
 * Then use facet.mincount=1.
 *
 * Previously and by default facet.mincount will be explicitly set to 0 when in cloud mode for this condition.
 * In SOLR-8599 and SOLR-8988, significant performance increase has been seen when enabling this optimization.
 *
 * Note: enabling this flag has no effect when the conditions above are not met. For those other cases the default behavior is sufficient.
 */
{code}

h3. What is the result of turning this option on?
When facet.distrib.mco=true is set and the conditions above are met, Solr 
includes facet.mincount=1 in the requests it sends to the various shards. As a 
result, each shard considers only terms with a count > 0 when processing its 
request. This can yield a significant performance gain when the field has high 
cardinality and the matching docset is relatively small, because terms with 0 
matches are not considered. 

As shown in SOLR-8988, the runtime of a single query was reduced from 20 
seconds to less than 1 second.
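
The rule described above can be sketched roughly as follows. This is an 
illustration only, not the actual Solr code: the class and method names are 
hypothetical, and the pass-through for requests that do not meet the three 
conditions is a placeholder, since this issue does not describe what is sent 
to the shards in those cases.

```java
/** Illustrative sketch only; names are hypothetical, not taken from Solr. */
final class ShardMincountSketch {

    /**
     * Returns the facet.mincount value to place in a per-shard request.
     *
     * @param distribMco value of facet.distrib.mco
     * @param sort       value of facet.sort
     * @param limit      value of facet.limit
     * @param mincount   value of facet.mincount requested by the client
     */
    static int shardMincount(boolean distribMco, String sort, int limit, int mincount) {
        // The three conditions quoted from the documentation.
        boolean conditionsMet = "count".equals(sort) && limit > 0 && mincount > 0;
        if (!conditionsMet) {
            // Assumption: the flag has no effect here, so this sketch simply
            // passes the client value through as a placeholder.
            return mincount;
        }
        // Flag off (current default): shards are asked with mincount=0.
        // Flag on (proposed default): shards are asked with mincount=1.
        return distribMco ? 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println(shardMincount(false, "count", 10, 5)); // prints 0
        System.out.println(shardMincount(true, "count", 10, 5));  // prints 1
        System.out.println(shardMincount(true, "index", 10, 5));  // prints 5 (placeholder pass-through)
    }
}
```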

h3. Can this change result in worse performance?
The current thinking is no: performance should not get worse even in 
non-optimal scenarios. From the comments in SOLR-8988:
{quote}
Consider you asked for up to 10 terms from shardA with mincount=1 but you 
received only 5 terms back. In this case you know, definitively, that a term 
seen in the response from shardB but not in the response from shardA could have 
at most a count of 0 in shardA. If it had any other count in shardA then it 
would have been returned in the response from shardA.

Also, if you asked for up to 10 terms from shardA with mincount=1 and you get 
back a response with 10 terms having a count >= 1 then the response is 
identical to the one you'd have received if mincount=0. 

Because of this, there isn't a scenario where the response would result in more 
work than would have been required if mincount=0. For this reason, the decrease 
in required work when mincount=1 is *always* either a moot point or a net win.
{quote}

The belief here is that it is safe to change the default of facet.distrib.mco 
so that facet.mincount=1 is used when appropriate. The overall performance 
gain can be significant, and no performance cost has been observed.
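
The reasoning quoted from SOLR-8988 can be made concrete with a small sketch. 
It is a hypothetical helper, not Solr code. Given one shard's response to a 
request with facet.sort=count and mincount=1, it returns the largest count 
that a term missing from that response could still have on that shard. The 
first branch is the case from the quote. The second branch is an added 
assumption based on general count-sorted truncation: a term cut off by the 
limit cannot have a count above the smallest count returned.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only; names are hypothetical, not taken from Solr. */
final class MissingTermBoundSketch {

    /**
     * Upper bound on the shard-local count of a term that is absent from a
     * shard response requested with facet.sort=count and mincount=1.
     *
     * @param shardResponse term -> count as returned by the shard
     * @param limit         number of terms requested from the shard (> 0)
     */
    static int maxCountOfMissingTerm(Map<String, Integer> shardResponse, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be > 0");
        }
        if (shardResponse.size() < limit) {
            // The shard had room to return more terms but did not, so every
            // missing term must have a count of 0 on this shard.
            return 0;
        }
        // The response is full: a missing term may have been cut off by the
        // limit, so it can have at most the smallest returned count.
        return Collections.min(shardResponse.values());
    }

    public static void main(String[] args) {
        Map<String, Integer> shardA = new LinkedHashMap<>();
        shardA.put("red", 7);
        shardA.put("blue", 4);
        shardA.put("green", 2);

        System.out.println(maxCountOfMissingTerm(shardA, 10)); // prints 0
        System.out.println(maxCountOfMissingTerm(shardA, 3));  // prints 2
    }
}
```

In the first call the shard returned 3 of up to 10 requested terms, so a term 
seen only on another shard needs no follow-up on this one.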



