[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302430#comment-15302430 ] Keith Laban commented on SOLR-8988:
---
That's right. This affects all queries where {{isDistrib}} is true for any reason.

> Improve facet.method=fcs performance in SolrCloud
> -
>
> Key: SOLR-8988
> URL: https://issues.apache.org/jira/browse/SOLR-8988
> Project: Solr
> Issue Type: Improvement
> Affects Versions: 5.5, 6.0
> Reporter: Keith Laban
> Assignee: Dennis Gove
> Fix For: 6.1
>
> Attachments: SOLR-8988.patch, SOLR-8988.patch, SOLR-8988.patch,
> SOLR-8988.patch, Screen Shot 2016-04-25 at 2.54.47 PM.png, Screen Shot
> 2016-04-25 at 2.55.00 PM.png
>
>
> This relates to SOLR-8559, which improves the algorithm used by fcs
> faceting when {{facet.mincount=1}}.
> This patch allows {{facet.mincount}} to be sent as 1 for distributed queries.
> As far as I can tell there is no reason to set {{facet.mincount=0}} for
> refinement purposes. After trying to make sense of all the refinement logic,
> I can't see how the difference between _no value_ and _value=0_ would have a
> negative effect.
> *Test perf:*
> - ~15 million unique terms
> - query matches ~3 million documents
> *Params:*
> {code}
> facet.mincount=1
> facet.limit=500
> facet.method=fcs
> facet.sort=count
> {code}
> *Average Time Per Request:*
> - Before patch: ~20 seconds
> - After patch: <1 second
> *Note*: all tests pass and in my test, the output was identical before and
> after patch.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301550#comment-15301550 ] David Smiley commented on SOLR-8988:
BTW, references to SolrCloud or "cloud mode" here seem incorrect, right? This is about *distributed* (aka "sharded") faceting. I was confused by the title and by a related issue mentioning SolrCloud, and I wondered how on earth SolrCloud would affect faceting.
[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296924#comment-15296924 ] ASF subversion and git services commented on SOLR-8988:
---
Commit ab87a0e75641d3e4076b9f4c247339f9d9c47103 in lucene-solr's branch refs/heads/branch_6x from [~dpgove]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ab87a0e ]

SOLR-8988: Adds query option facet.distrib.mco which when set to true allows the use of facet.mincount=1 in cloud mode
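The commit message names a boolean request parameter, facet.distrib.mco. The following is a rough sketch of how the parameters from the issue description, plus that flag, could be assembled into a query string using only the JDK. It assumes a standard /select request and uses a hypothetical facet field named `category`. It illustrates the request shape and is not a SolrJ example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch only: assembles request parameters for a distributed, count-sorted
// fcs facet request. The field name passed in and q=*:* are placeholders.
public class McoRequestParams {

  public static String build(String facetField, boolean useMco) {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("q", "*:*");
    params.put("facet", "true");
    params.put("facet.field", facetField);
    // Values taken from the parameters listed in the issue description.
    params.put("facet.method", "fcs");
    params.put("facet.sort", "count");
    params.put("facet.limit", "500");
    params.put("facet.mincount", "1");
    // Opt in to sending mincount=1 to the shards (the option defaults to false).
    params.put("facet.distrib.mco", Boolean.toString(useMco));
    return params.entrySet().stream()
        .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
        .collect(Collectors.joining("&"));
  }

  // URL-encodes a single key or value as UTF-8 (Java 10+ overload).
  private static String enc(String s) {
    return URLEncoder.encode(s, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println(build("category", true));
  }
}
```

The resulting string would be appended to a `/select?` URL. Because the option defaults to false, passing `false` (or omitting the flag) should keep the earlier behavior.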
[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296907#comment-15296907 ] ASF subversion and git services commented on SOLR-8988:
---
Commit e4e990b993d6872f6345b7d064efb8ca22ee6556 in lucene-solr's branch refs/heads/master from [~dpgove]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e4e990b ]

SOLR-8988: Adds query option facet.distrib.mco which when set to true allows the use of facet.mincount=1 in cloud mode
[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296629#comment-15296629 ] Dennis Gove commented on SOLR-8988:
---
I'm going to commit this with the default behavior left as is (facet.mincount=0 is still sent to the shards), because the new option defaults to false. I've entered SOLR-9152 to discuss and handle changing the default. I believe it is safe to do so.
[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296621#comment-15296621 ] Dennis Gove commented on SOLR-8988:
---
Just to slightly rephrase the salient point here:

Suppose you asked for up to 10 terms from shardA with mincount=1 but received only 5 terms back. In this case you know, definitively, that a term seen in the response from shardB but not in the response from shardA has a count of at most 0 in shardA. If it had any other count in shardA, it would have been returned in shardA's response.

Also, if you asked for up to 10 terms from shardA with mincount=1 and got back 10 terms, each with a count >= 1, then the response is identical to the one you would have received with mincount=0.

Because of this, there is no scenario in which mincount=1 results in more work than mincount=0 would have required. The decrease in required work when mincount=1 is therefore *always* either a moot point or a net win.
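The argument can be checked with a toy model of a single shard. The code below is a hypothetical sketch, not Solr's implementation. `topTerms` stands in for a shard answering a count-sorted request. `maxUnseenCount` computes the largest count that a term missing from that response could have on the shard.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy model of one shard answering a count-sorted facet request.
public class MincountEquivalence {

  // Top `limit` terms by count (ties broken by term order), dropping counts below mincount.
  public static Map<String, Integer> topTerms(Map<String, Integer> shardCounts, int limit, int mincount) {
    Map<String, Integer> out = new LinkedHashMap<>();
    shardCounts.entrySet().stream()
        .filter(e -> e.getValue() >= mincount)
        .sorted((a, b) -> b.getValue().equals(a.getValue())
            ? a.getKey().compareTo(b.getKey())
            : Integer.compare(b.getValue(), a.getValue()))
        .limit(limit)
        .forEach(e -> out.put(e.getKey(), e.getValue()));
    return out;
  }

  // Largest count a term absent from `response` could have on this shard.
  public static int maxUnseenCount(Map<String, Integer> response, int limit, int mincount) {
    if (response.size() < limit) {
      // Short response: every term with count >= mincount was returned.
      return Math.max(mincount - 1, 0);
    }
    // Full response: an unseen term can be at most the smallest returned count.
    return response.values().stream().mapToInt(Integer::intValue).min().orElse(0);
  }

  public static void main(String[] args) {
    Map<String, Integer> shardA = new LinkedHashMap<>();
    shardA.put("apple", 7);
    shardA.put("kiwi", 3);
    shardA.put("pear", 0);
    shardA.put("plum", 0);

    Map<String, Integer> mc0 = topTerms(shardA, 10, 0);
    Map<String, Integer> mc1 = topTerms(shardA, 10, 1);
    System.out.println(mc0); // {apple=7, kiwi=3, pear=0, plum=0}
    System.out.println(mc1); // {apple=7, kiwi=3}
    System.out.println(maxUnseenCount(mc1, 10, 1)); // 0
  }
}
```

With `limit=2`, both mincount values return `{apple=7, kiwi=3}`. With `limit=10`, the mincount=1 response is short, so any term it omits must have a count of 0 on that shard and needs no refinement there.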
[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15294136#comment-15294136 ] Dennis Gove commented on SOLR-8988:
---
[~k317h], could you explain the javadocs on FACET_DISTRIB_MCO a little bit more? I don't quite follow the documentation on it:
{code}
+  public static final String FACET_DISTRIB = FACET + ".distrib";
+
+  /**
+   * The default mincount to request on distributed facet queries.
+   * This param only applies to COUNT sorted queries which have a limit > -1
+   *
+   * Default values:
+   * Sort COUNT and facet.limit = -1: Math.min(facet.minCount, 1)
+   * Sort COUNT and facet.limit > 0: 0
+   * Sort INDEX and facet.mincount <= 1: facet.mincount
+   * Sort INDEX and facet.mincount > 1: (int) Math.ceil((double) dff.minCount / rb.slices.length)
+   *
+   * EXPERT
+   */
+
+  public static final String FACET_DISTRIB_MCO = FACET_DISTRIB + ".mco";
+
{code}
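One reading of the "Default values" list in the draft javadoc is that it gives the per-shard initial mincount for each combination of sort, limit and mincount, and that facet.distrib.mco only changes the count-sorted, limited case. The sketch below encodes that reading. `choose` and its parameters are hypothetical names. The `mco` branch is an assumption drawn from the commit message, not a copy of the patch.

```java
// Hypothetical sketch of choosing the mincount sent to each shard.
public class InitialMincount {

  public static int choose(boolean sortByCount, int limit, int minCount, int numShards, boolean mco) {
    if (sortByCount) {
      if (limit > 0) {
        // Previous default: always ask shards for mincount=0.
        // Assumed behavior with facet.distrib.mco=true: ask for at most 1.
        return mco ? Math.min(minCount, 1) : 0;
      }
      // limit <= 0 (e.g. facet.limit=-1, unlimited): no need to go below 1.
      return Math.min(minCount, 1);
    }
    // Index-sorted facets.
    if (minCount <= 1) {
      return minCount;
    }
    // A term reaching minCount overall must reach ceil(minCount / numShards) on some shard.
    return (int) Math.ceil((double) minCount / numShards);
  }

  public static void main(String[] args) {
    System.out.println(choose(true, 500, 1, 12, false)); // 0
    System.out.println(choose(true, 500, 1, 12, true));  // 1
    System.out.println(choose(false, 10, 10, 3, false)); // 4
  }
}
```

Read this way, facet.distrib.mco is a boolean switch, and the listed "Default values" describe the resulting per-shard mincount rather than a default for the parameter itself.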
[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286979#comment-15286979 ] Hoss Man commented on SOLR-8988:
I haven't had time to review it thoroughly enough to be confident that I'd want to commit it myself, but if you have, then go for it; I'm +0.

My one bit of feedback from a quick skim of the patch is that I don't understand the javadocs for "FACET_DISTRIB_MCO" at all. It's a boolean param, but the docs describe it as "The default mincount to request on distributed facet queries", which makes it sound like a number. The "Default values" bit of the javadocs doesn't really clarify that confusion either, since it also (appears to) talk about the (eventual) distributed mincount, and not the default value of the "FACET_DISTRIB_MCO" param itself.
[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286602#comment-15286602 ] Dennis Gove commented on SOLR-8988:
---
[~hossman], do you still have concerns about this patch? I think it's a good change to make, and I'm happy to take on the committing if you don't have any further concerns.
[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266916#comment-15266916 ] Keith Laban commented on SOLR-8988:
---
[~hossman] how does the updated patch look?
[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249046#comment-15249046 ] Hoss Man commented on SOLR-8988:
You've convinced me that I don't understand the point behind that existing {{TODO: we could change this to 1...}} comment, but I still want to review the code more thoroughly before I'm confident enough to concede that your approach is better in all cases.

That said: if you updated your patch to make it optional based on a param, with some tests that randomly toggle the value (TestCloudPivotFacet and DistributedFacetPivotLongTailTest would be good ones), then I'd probably be game to commit even without being confident it's better in all cases, and we could worry about changing the default later.

bq. However I think this line block should also be changed.

Hmm, yeah ... that does smell like it could be optimized.

(FWIW: we have a TrackingShardHandlerFactory that can be used in tests to make assertions about which per-shard requests Solr triggers. It can be used along with some carefully crafted shards/docs/requests to verify that no unnecessary refinement is done in cases where you don't expect it, like this {{initialMincount}} vs {{initialMincount-1}} situation.)
[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248953#comment-15248953 ] Keith Laban commented on SOLR-8988:
---
[~hossman] can I convince you that this should be the default behavior?
[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243103#comment-15243103 ] Keith Laban commented on SOLR-8988:
---
For clarity of this test:
bq. num shards - 12
bq. num docs per shard - ~70 million
bq. num terms in field - ~15 million
bq. num terms with non-zero facet counts for docs matching query on a per shard basis - ~90k
bq. how much variance there is in the num terms with non-zero facet counts for docs matching query on a per shard basis - evenly distributed

bq. ...is that if you get back a count of foo=0 from shardA, and if foo winds up being a candidate term for the final topN list because of its count on other shards, then you know definitively that you don't have to ask shardA to provide a refinement value for "foo" - you already know its count.

This is the part that I would argue doesn't matter. Consider that you asked for 10 terms from shardA with mincount=1 and received only 5 terms back. Then you know that if foo was in shardB but not in shardA, the maximum count it could have had in shardA was 0; otherwise it would have been returned in the initial request. On the other hand, if you ask for 10 terms with mincount=1 and get back 10 terms with a count >= 1, the response would have been identical with mincount=0.

The logic that aids refinement, pulled from {{FacetComponent.DistributedFieldFacet}}:
{code}
void add(int shardNum, NamedList shardCounts, int numRequested) {
  // shardCounts could be null if there was an exception
  int sz = shardCounts == null ? 0 : shardCounts.size();
  int numReceived = sz;

  FixedBitSet terms = new FixedBitSet(termNum + sz);

  long last = 0;
  for (int i = 0; i < sz; i++) {
    String name = shardCounts.getName(i);
    long count = ((Number) shardCounts.getVal(i)).longValue();
    if (name == null) {
      missingCount += count;
      numReceived--;
    } else {
      ShardFacetCount sfc = counts.get(name);
      if (sfc == null) {
        sfc = new ShardFacetCount();
        sfc.name = name;
        sfc.indexed = ftype == null ? sfc.name : ftype.toInternal(sfc.name);
        sfc.termNum = termNum++;
        counts.put(name, sfc);
      }
      sfc.count += count;
      terms.set(sfc.termNum);
      last = count;
    }
  }

  // the largest possible missing term is initialMincount if we received
  // less than the number requested.
  if (numRequested < 0 || numRequested != 0 && numReceived < numRequested) {
    last = initialMincount;
  }

  missingMaxPossible += last;
  missingMax[shardNum] = last;
  counted[shardNum] = terms;
}
{code}

However, I think this block should also be changed:
{code}
if (numRequested < 0 || numRequested != 0 && numReceived < numRequested) {
  last = Math.max(initialMincount - 1, 0);
}
{code}
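The sketch below isolates the bound computed at the end of the quoted `add()` method, to show the effect of the proposed one-line change. `currentBound` and `proposedBound` are hypothetical standalone helpers, not Solr methods. `lastCount` stands in for the local variable `last` after the loop, which holds the last count received, or 0 if none was received.

```java
// Hypothetical standalone helpers mirroring the tail of the quoted add() method.
public class MissingMaxBound {

  // Current rule: a short (or unlimited) response bounds unseen terms at initialMincount.
  public static long currentBound(int numRequested, int numReceived, long lastCount, int initialMincount) {
    if (numRequested < 0 || numRequested != 0 && numReceived < numRequested) {
      return initialMincount;
    }
    return lastCount;
  }

  // Proposed rule: an unseen term's count must be strictly below initialMincount.
  public static long proposedBound(int numRequested, int numReceived, long lastCount, int initialMincount) {
    if (numRequested < 0 || numRequested != 0 && numReceived < numRequested) {
      return Math.max(initialMincount - 1, 0);
    }
    return lastCount;
  }

  public static void main(String[] args) {
    // Shard asked for 10 terms with mincount=1 but returned only 5 (last count 3).
    System.out.println(currentBound(10, 5, 3, 1));  // 1: an unseen term may still be refined
    System.out.println(proposedBound(10, 5, 3, 1)); // 0: its count on this shard is known
  }
}
```

When the shard's response is full (`numReceived == numRequested`), both rules return the last count received. The change therefore only tightens the bound for short responses, which is the case in which mincount=1 previously looked like it might need extra refinement.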
[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241998#comment-15241998 ] Hoss Man commented on SOLR-8988:
{quote}
* ~15million unique terms
* query matches ~3million documents
{quote}
Other key factors here are going to be:
* num shards
* num docs per shard
* num terms in field
* num terms with non-zero facet counts for docs matching query on a per shard basis
* how much variance there is in the num terms with non-zero facet counts for docs matching query on a per shard basis
[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241993#comment-15241993 ] Hoss Man commented on SOLR-8988:
bq. As far as I can tell there is no reason to set facet.mincount=0 for refinement purposes. After trying to make sense of all the refinement logic, I can't see how the difference between no value and value=0 would have a negative effect.

I haven't looked closely, but IIRC the justification for this comment...
{noformat}
-            dff.initialMincount = 0; // TODO: we could change this to 1, but would
-                                     // then need more refinement for small facet
-                                     // result sets?
{noformat}
...is that if you get back a count of foo=0 from shardA, and if foo winds up being a candidate term for the final topN list because of its count on other shards, then you know definitively that you don't have to ask shardA to provide a refinement value for "foo": you already know its count.

Which behavior is more performant in the most common cases? I have no idea off the top of my head; I'd have to really sit down and think about all the variables.

What would probably make the most sense is to add an expert-level option for controlling this (similar to the overrequest options) and leave the default as it is for now. That way people have one more knob they can try turning to tune performance, and if we decide later that the default behavior should be changed in the common case, it's easy to do.
[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241799#comment-15241799 ] Scott Blum commented on SOLR-8988:
--
+1, we ran across this too, and couldn't think of any reason for the inefficiency here
[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241767#comment-15241767 ] Keith Laban commented on SOLR-8988:
---
I'm not sure who would be best to look at this. Maybe [~yo...@apache.org] or [~erike4...@yahoo.com] would be more familiar with this code path. Is there any reason this wouldn't work?