[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud

2016-05-26 Thread Keith Laban (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302430#comment-15302430
 ] 

Keith Laban commented on SOLR-8988:
---

That's right. This affects all queries where {{isDistrib}} is true for any 
reason.

> Improve facet.method=fcs performance in SolrCloud
> -
>
> Key: SOLR-8988
> URL: https://issues.apache.org/jira/browse/SOLR-8988
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 5.5, 6.0
>Reporter: Keith Laban
>Assignee: Dennis Gove
> Fix For: 6.1
>
> Attachments: SOLR-8988.patch, SOLR-8988.patch, SOLR-8988.patch, 
> SOLR-8988.patch, Screen Shot 2016-04-25 at 2.54.47 PM.png, Screen Shot 
> 2016-04-25 at 2.55.00 PM.png
>
>
> This relates to SOLR-8559, which improves the algorithm used by fcs 
> faceting when {{facet.mincount=1}}.
> This patch allows {{facet.mincount}} to be sent as 1 for distributed queries. 
> As far as I can tell there is no reason to set {{facet.mincount=0}} for 
> refinement purposes. After trying to make sense of all the refinement logic, 
> I can't see how the difference between _no value_ and _value=0_ would have a 
> negative effect.
> *Test perf:*
> - ~15 million unique terms
> - query matches ~3 million documents
> *Params:*
> {code}
> facet.mincount=1
> facet.limit=500
> facet.method=fcs
> facet.sort=count
> {code}
> *Average Time Per Request:*
> - Before patch: ~20 seconds
> - After patch: <1 second
> *Note*: all tests pass, and in my test the output was identical before and 
> after the patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud

2016-05-25 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15301550#comment-15301550
 ] 

David Smiley commented on SOLR-8988:


BTW references to SolrCloud or "cloud mode" here seem incorrect, right?  This 
is about *distributed* (aka "sharded") faceting.  I was confused by the title 
and a related issue mentioning SolrCloud and I wondered how on earth SolrCloud 
would affect faceting.




[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud

2016-05-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296924#comment-15296924
 ] 

ASF subversion and git services commented on SOLR-8988:
---

Commit ab87a0e75641d3e4076b9f4c247339f9d9c47103 in lucene-solr's branch 
refs/heads/branch_6x from [~dpgove]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ab87a0e ]

SOLR-8988: Adds query option facet.distrib.mco which when set to true allows 
the use of facet.mincount=1 in cloud mode





[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud

2016-05-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296907#comment-15296907
 ] 

ASF subversion and git services commented on SOLR-8988:
---

Commit e4e990b993d6872f6345b7d064efb8ca22ee6556 in lucene-solr's branch 
refs/heads/master from [~dpgove]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e4e990b ]

SOLR-8988: Adds query option facet.distrib.mco which when set to true allows 
the use of facet.mincount=1 in cloud mode





[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud

2016-05-23 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296629#comment-15296629
 ] 

Dennis Gove commented on SOLR-8988:
---

I'm going to commit this with the default left as is (facet.mincount=0) because 
this new option will be defaulted to false. I've entered SOLR-9152 to discuss 
and handle changing the default. I believe it is safe to do so.
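Since the option is off by default, a client has to opt in per request. As a rough sketch (not SolrJ, just plain Java building a query string), the request might look like the following. The parameter names come from this issue and its commit message; the field name {{category}} and the {{/select}} path are hypothetical placeholders.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class McoRequestSketch {

    /** Builds the query string for a count-sorted facet request that opts in to facet.distrib.mco. */
    static String buildQueryString(String facetField) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("facet", "true");
        params.put("facet.field", facetField);   // hypothetical field name supplied by the caller
        params.put("facet.method", "fcs");
        params.put("facet.sort", "count");
        params.put("facet.limit", "500");
        params.put("facet.mincount", "1");
        params.put("facet.distrib.mco", "true"); // new option from this issue; defaults to false

        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // Only the values need encoding here; the keys are plain ASCII parameter names.
            joined.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // "/select" is the usual request handler path, assumed here for illustration.
        System.out.println("/select?" + buildQueryString("category"));
    }
}
```

Leaving {{facet.distrib.mco}} out of the request keeps the existing behavior.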




[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud

2016-05-23 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296621#comment-15296621
 ] 

Dennis Gove commented on SOLR-8988:
---

Just to slightly rephrase the salient point here:

Consider you asked for up to 10 terms from shardA with mincount=1 but you 
received only 5 terms back. In this case you know, definitively, that a term 
seen in the response from shardB but not in the response from shardA could have 
at most a count of 0 in shardA. If it had any other count in shardA then it 
would have been returned in the response from shardA.

Also, if you asked for up to 10 terms from shardA with mincount=1 and you get 
back a response with 10 terms having a count >= 1 then the response is 
identical to the one you'd have received if mincount=0. 

Because of this, there isn't a scenario where the response would result in more 
work than would have been required if mincount=0. For this reason, the decrease 
in required work when mincount=1 is *always* either a moot point or a net win.
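The two cases above can be checked with a toy simulation of a single shard. This is only a sketch under simplifying assumptions: the class and method names are made up, the shard is modelled as an in-memory map, and ties are broken by term name, which may not match what Solr does.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class MincountEquivalenceSketch {

    /** Simulates one shard's count-sorted response: the top `limit` terms whose count is >= mincount. */
    static List<String> shardResponse(Map<String, Integer> counts, int limit, int mincount) {
        return counts.entrySet().stream()
                .filter(e -> e.getValue() >= mincount)
                // Highest count first; ties broken by term name (an assumption of this sketch).
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(limit)
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> shardA = Map.of("a", 5, "b", 3, "c", 0, "d", 0);

        // Case 1: the limit is filled by terms with count >= 1, so both responses are identical.
        System.out.println(shardResponse(shardA, 2, 0));
        System.out.println(shardResponse(shardA, 2, 1));

        // Case 2: fewer terms than requested come back with mincount=1, so any term
        // missing from that response is known to have a count of 0 on this shard.
        System.out.println(shardResponse(shardA, 3, 0));
        System.out.println(shardResponse(shardA, 3, 1));
    }
}
```

In the second case the mincount=1 response is a prefix of the mincount=0 response, and the terms it omits carry no information that the coordinator could not already infer.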




[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud

2016-05-20 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15294136#comment-15294136
 ] 

Dennis Gove commented on SOLR-8988:
---

[~k317h], could you explain the javadocs on FACET_DISTRIB_MCO a little bit 
more? I don't quite follow the documentation on it

{code}
+  public static final String FACET_DISTRIB = FACET + ".distrib";
+
+  /**
+   * The default mincount to request on distributed facet queries.
+   * This param only applies to COUNT sorted queries which have a limit > -1
+   *
+   * Default values:
+   * Sort COUNT and facet.limit = -1: Math.min(facet.minCount, 1)
+   * Sort COUNT and facet.limit > 0: 0
+   * Sort INDEX and facet.mincount = 1: facet.mincount
+   * Sort INDEX and facet.mincount > 1: (int) Math.ceil((double) dff.minCount / rb.slices.length)
+   *
+   * EXPERT
+   */
+
+  public static final String FACET_DISTRIB_MCO = FACET_DISTRIB + ".mco";
+
{code}
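For what it's worth, here is how I would sketch the selection of the initial per-shard mincount once the boolean option is taken into account. This is not the patch's code. The branch conditions are my reading of the "Default values" list above (treating the COUNT case as limit > 0, and the INDEX cases as mincount <= 1 and mincount > 1), and the method and parameter names are hypothetical.

```java
final class InitialMincountSketch {

    /**
     * Returns the mincount to send to each shard on the initial request.
     *
     * @param sortByCount true for facet.sort=count, false for facet.sort=index
     * @param limit       facet.limit (-1 means unlimited)
     * @param minCount    the facet.mincount the user asked for
     * @param numShards   number of shards taking part in the request
     * @param mco         value of the boolean facet.distrib.mco option
     */
    static int initialMincount(boolean sortByCount, int limit, int minCount, int numShards, boolean mco) {
        if (sortByCount) {
            if (limit > 0) {
                // Existing default is 0; with the option enabled, send at most 1.
                return mco ? Math.min(minCount, 1) : 0;
            }
            // Unlimited: no need to lower the mincount below 1.
            return Math.min(minCount, 1);
        }
        // Index order.
        if (minCount <= 1) {
            return minCount;
        }
        // A term reaching minCount overall must have at least ceil(minCount / numShards) on some shard.
        return (int) Math.ceil((double) minCount / numShards);
    }

    public static void main(String[] args) {
        System.out.println(initialMincount(true, 500, 1, 12, false)); // option off
        System.out.println(initialMincount(true, 500, 1, 12, true));  // option on
        System.out.println(initialMincount(false, -1, 10, 3, false)); // index sort, mincount 10, 3 shards
    }
}
```

Read this way, the option only changes one branch: count-sorted requests with a positive limit.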




[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud

2016-05-17 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286979#comment-15286979
 ] 

Hoss Man commented on SOLR-8988:


I haven't had time to review it enough to be confident enough that I'd want to 
commit it myself -- but if you have then go for it, i'm +0.

My one bit of feedback from a quick skim of the patch is that i don't 
understand the javadocs for "FACET_DISTRIB_MCO" at all ... it's a boolean 
param, but the docs describe it as "The default mincount to request on 
distributed facet queries" which makes it sound like a number, and the "Default 
values" bit of the javadocs doesn't really do anything to clarify that confusion 
since it also (appears to) talk about the (eventual) distributed mincount, and 
not the default value of the "FACET_DISTRIB_MCO" param itself.




[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud

2016-05-17 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286602#comment-15286602
 ] 

Dennis Gove commented on SOLR-8988:
---

[~hossman], do you still have concerns on this patch? I think it's a good 
change to make and I'm happy to take on the committing if you don't have any 
further concerns.




[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud

2016-05-02 Thread Keith Laban (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15266916#comment-15266916
 ] 

Keith Laban commented on SOLR-8988:
---

[~hossman] how does the updated patch look?




[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud

2016-04-19 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249046#comment-15249046
 ] 

Hoss Man commented on SOLR-8988:


You've convinced me that i don't understand the point behind that existing 
{{TODO: we could change this to 1...}} comment, but I still want to review the 
code more thoroughly before i'm confident enough to concede your approach is 
better in all cases.

That said: If you updated your patch to make it optional based on a param 
w/some tests that randomly toggled the value (TestCloudPivotFacet, 
DistributedFacetPivotLongTailTest would be good ones) then i'd probably be game 
to commit even w/o being confident it's better in all cases, and we could worry 
about changing the default later.

bq. However I think this line block should also be changed.

Hmm, yeah ... that does smell like it could be optimized.

(FWIW: we have a TrackingShardHandlerFactory that can be used in tests to make 
assertions about what per-shard requests solr triggers. That can be used along 
with some carefully crafted shards/docs/requests to verify that no unnecessary 
refinement is done in cases where you don't expect it -- like with this 
{{initialMincount}} vs {{initialMincount-1}} situation)




[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud

2016-04-19 Thread Keith Laban (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248953#comment-15248953
 ] 

Keith Laban commented on SOLR-8988:
---

[~hossman] can I convince you that this should be the default behavior?




[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud

2016-04-15 Thread Keith Laban (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243103#comment-15243103
 ] 

Keith Laban commented on SOLR-8988:
---

For clarity of this test:

bq. num shards
- 12

bq. num docs per shard
- ~70 million

bq. num terms in field
- ~15 million

bq. num terms with non-zero facet counts for docs matching query on a per shard 
basis
- ~90k

bq. how much variance there is in the num terms with non-zero facet counts for 
docs matching query on a per shard basis
- evenly distributed 



bq. ...is that if you get back a count of foo=0 from shardA, and if foo winds 
up being a candidate term for the final topN list because of its count on 
other shards, then you know definitively that you don't have to ask shardA to 
provide a refinement value for "foo" - you already know its count.

This is the part that I would argue doesn't matter. Suppose you asked for 10 
terms from shardA with mincount=1 and received only 5 terms back. Then you 
know that if foo was in shardB but not in shardA, the maximum count it could 
have had in shardA is 0; otherwise it would have been returned in the initial 
request.

On the other hand, if you ask for 10 terms with mincount=1 and get back 10 
terms with a count >=1, the response would have been identical with 
mincount=0.

The logic that aids refinement, pulled from {{FacetComponent.DistributedFieldFacet}}:
{code}
void add(int shardNum, NamedList shardCounts, int numRequested) {
  // shardCounts could be null if there was an exception
  int sz = shardCounts == null ? 0 : shardCounts.size();
  int numReceived = sz;

  FixedBitSet terms = new FixedBitSet(termNum + sz);

  long last = 0;
  for (int i = 0; i < sz; i++) {
    String name = shardCounts.getName(i);
    long count = ((Number) shardCounts.getVal(i)).longValue();
    if (name == null) {
      missingCount += count;
      numReceived--;
    } else {
      ShardFacetCount sfc = counts.get(name);
      if (sfc == null) {
        sfc = new ShardFacetCount();
        sfc.name = name;
        sfc.indexed = ftype == null ? sfc.name : ftype.toInternal(sfc.name);
        sfc.termNum = termNum++;
        counts.put(name, sfc);
      }
      sfc.count += count;
      terms.set(sfc.termNum);
      last = count;
    }
  }

  // the largest possible missing term is initialMincount if we received
  // less than the number requested.
  if (numRequested < 0 || numRequested != 0 && numReceived < numRequested) {
    last = initialMincount;
  }

  missingMaxPossible += last;
  missingMax[shardNum] = last;
  counted[shardNum] = terms;
}
{code}

However, I think this block should also be changed:
{code}
  if (numRequested < 0 || numRequested != 0 && numReceived < numRequested) {
    last = Math.max(initialMincount-1, 0);
  }
{code}
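To make the effect of that change concrete, here is a small standalone sketch (not Solr code) that sums the per-shard upper bound for an unseen term under both formulas. The class, method, and the {{tightBound}} flag are hypothetical; only the condition and the two expressions are taken from the snippets above.

```java
final class MissingMaxSketch {

    /**
     * Sums, across shards, the largest count a term could have on a shard that did not return it.
     *
     * @param numReceived     number of terms each shard returned
     * @param lastCount       count of the last (smallest) term each shard returned
     * @param numRequested    the per-shard limit that was requested
     * @param initialMincount the mincount sent to the shards
     * @param tightBound      true to use max(initialMincount - 1, 0), false to use initialMincount
     */
    static long missingMaxPossible(int[] numReceived, long[] lastCount,
                                   int numRequested, int initialMincount, boolean tightBound) {
        long total = 0;
        for (int shard = 0; shard < numReceived.length; shard++) {
            long last = lastCount[shard];
            if (numRequested < 0 || numRequested != 0 && numReceived[shard] < numRequested) {
                // The shard ran out of qualifying terms, so an unseen term is below initialMincount.
                last = tightBound ? Math.max(initialMincount - 1, 0) : initialMincount;
            }
            total += last;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] received = {5, 10, 3};   // shards 0 and 2 returned fewer than the 10 requested
        long[] last = {1, 4, 2};
        System.out.println(missingMaxPossible(received, last, 10, 1, false)); // current formula
        System.out.println(missingMaxPossible(received, last, 10, 1, true));  // proposed formula
    }
}
```

With these made-up numbers the bound drops from 6 to 4, which in principle leaves fewer candidate terms that look like they might need refinement.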




[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud

2016-04-14 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241998#comment-15241998
 ] 

Hoss Man commented on SOLR-8988:


{quote} 
* ~15million unique terms
* query matches ~3million documents
{quote}

Other key factors here are going to be:
* num shards
* num docs per shard
* num terms in field
* num terms with non-zero facet counts for docs matching query on a per shard 
basis
* how much variance there is in the num terms with non-zero facet counts for 
docs matching query on a per shard basis




[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud

2016-04-14 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241993#comment-15241993
 ] 

Hoss Man commented on SOLR-8988:


bq. As far as I can tell there is no reason to set facet.mincount=0 for 
refinement purposes. After trying to make sense of all the refinement logic, I 
can't see how the difference between no value and value=0 would have a negative 
effect.

i haven't looked closely, but IIRC the justification for this comment...

{noformat}
-          dff.initialMincount = 0; // TODO: we could change this to 1, but would
-                                   // then need more refinement for small facet
-                                   // result sets?
{noformat}

is that if you get back a count of foo=0 from shardA, and if foo winds up being 
a candidate term for the final topN list because of its count on other shards, 
then you know definitively that you don't have to ask shardA to provide a 
refinement value for "foo" - you already know its count.

which behavior is more performant in the most common cases? ... i have no idea 
off the top of my head ... i'd have to really sit down and think about all the 
variables.

what would probably make the most sense is to add an expert level option for 
controlling this (similar to the overrequest options) and leave the default as 
it is for now -- that way people have one more knob they can try turning to 
tune performance, and if we decide later that the default behavior should be 
changed in the common case, it's easy to do.
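
One possible shape for such an option, as a sketch only. The parameter name 
below is hypothetical and made up for this example, as are the class and method 
names; the rule of sending min(facet.mincount, 1) to the shards when the option 
is on is an assumption about what the opt-in behavior could be, not a 
description of the attached patch.

```java
import java.util.Map;

// Hypothetical illustration only; not Solr code.
final class InitialMincountSketch {

    // Made-up parameter name for this sketch; not an existing Solr option.
    static final String HYPOTHETICAL_PARAM = "example.facet.shardMincountOptIn";

    /** Picks the mincount to send to the shards in the first phase. */
    static int initialShardMincount(Map<String, String> params) {
        boolean optIn = Boolean.parseBoolean(
                params.getOrDefault(HYPOTHETICAL_PARAM, "false"));
        int requested = Integer.parseInt(
                params.getOrDefault("facet.mincount", "0"));

        if (!optIn) {
            // Default left as it is: shards are asked for zero counts too.
            return 0;
        }
        // Opt-in: let shards skip zero-count terms, but never send more
        // than 1, because small per-shard counts can still add up.
        return Math.min(requested, 1);
    }

    public static void main(String[] args) {
        System.out.println(initialShardMincount(
                Map.of("facet.mincount", "1")));
        System.out.println(initialShardMincount(
                Map.of("facet.mincount", "1", HYPOTHETICAL_PARAM, "true")));
    }
}
```

A shard cannot safely apply a mincount above 1 on its own, since counts below 
the threshold on several shards can still add up to it, which is why the sketch 
caps the value at 1.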







[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud

2016-04-14 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241799#comment-15241799
 ] 

Scott Blum commented on SOLR-8988:
--

+1, we ran across this too, and couldn't think of any reason for the 
inefficiency here







[jira] [Commented] (SOLR-8988) Improve facet.method=fcs performance in SolrCloud

2016-04-14 Thread Keith Laban (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241767#comment-15241767
 ] 

Keith Laban commented on SOLR-8988:
---

I'm not sure who would be best to look at this. Maybe [~yo...@apache.org] or 
[~erike4...@yahoo.com] would be more familiar with this code path. Is there any 
reason this wouldn't work?



