[ 
https://issues.apache.org/jira/browse/SOLR-6314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088820#comment-14088820
 ] 

Erick Erickson commented on SOLR-6314:
--------------------------------------

OK, taking a closer look at this and I wonder what the right behavior is. The 
totals
are correct, it's just that they repeated in one case and not in  another. 

It _looks_ like I can restate the problem like this:

When a facet field is requested more than one time in non-sharded cluster, then 
the field
is repeated in the result set. 

When a facet field is requested more than once in a sharded cluster, then the 
field is only
returned once in the result set.

IOW, specifying the same facet.field twice: &facet.field=f1&facet.field=f1
results in two (identical) sections in the response in a non-sharded case
and one in the sharded case.

I'll look at the code tomorrow to see where the difference happens, I suspect 
in the
aggregating code in the distributed case but that's just a guess.

So the question is what the right behavior really is. I can argue that 
specifying the
exact same facet parameter (either query or field) more than once is a waste, 
and the
facet information should be cleaned up on the way in by removing duplicates. 
That would give
the same response in both cases and return just one entry per unique facet 
criteria. This is 
arguably avoiding useless work (what's the value of specifying the same facet 
parameter
twice?) That would change current behavior in the single-shard case however.

If we tried to return multiple entries in the sharded case, it seems quite 
fragile to try to sum the
facet sub-counts separately. By that I mean say shard 1 returns
f1:33
f1:33

shard two returns
f1:78
f1:78 

You'd like the final result to be
f1:111
f1:111

On the surface, some rule like "add facets by
position when the key is identical and return multiple
counts" seems like it would work, but it also seems
rife for errors to creep in with arguably no added value. What happens,
for instance, if there are three values for "f1" from one shard
and only two from another? I don't see how that would really happen,
but....

So, my question for you (and anyone who wants to chime in)
is: "Do you agree that pruning multiple identical facet criteria
is a Good Thing?". If not, what use case does returning multiple
identical facet counts support and is that use case worth the
effort? My gut feeling is no.

Thanks for bringing this up it's certainly something
that's confusing. I suspect it's just something that hasn't been
thought of in the past....

> Multi-threaded facet counts differ when SolrCloud has >1 shard
> --------------------------------------------------------------
>
>                 Key: SOLR-6314
>                 URL: https://issues.apache.org/jira/browse/SOLR-6314
>             Project: Solr
>          Issue Type: Bug
>          Components: SearchComponents - other, SolrCloud
>    Affects Versions: 5.0
>            Reporter: Vamsee Yarlagadda
>            Assignee: Erick Erickson
>
> I am trying to work with multi-threaded faceting on SolrCloud and in the 
> process i was hit by some issues.
> I am currently running the below upstream test on different SolrCloud 
> configurations and i am getting a different result set per configuration.
> https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/request/TestFaceting.java#L654
> Setup:
> - *Indexed 50 docs into SolrCloud.*
> - *If the SolrCloud has only 1 shard, the facet field query has the below 
> output (which matches with the expected upstream test output - # facet fields 
> ~ 50).*
> {code}
> $ curl  
> "http://localhost:8983/solr/collection1/select?facet=true&fl=id&indent=true&q=id%3A*&facet.limit=-1&facet.threads=1000&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&rows=1&wt=xml";
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
>   <int name="status">0</int>
>   <int name="QTime">21</int>
>   <lst name="params">
>     <str name="facet">true</str>
>     <str name="fl">id</str>
>     <str name="indent">true</str>
>     <str name="q">id:*</str>
>     <str name="facet.limit">-1</str>
>     <str name="facet.threads">1000</str>
>     <arr name="facet.field">
>       <str>f0_ws</str>
>       <str>f0_ws</str>
>       <str>f0_ws</str>
>       <str>f0_ws</str>
>       <str>f0_ws</str>
>       <str>f1_ws</str>
>       <str>f1_ws</str>
>       <str>f1_ws</str>
>       <str>f1_ws</str>
>       <str>f1_ws</str>
>       <str>f2_ws</str>
>       <str>f2_ws</str>
>       <str>f2_ws</str>
>       <str>f2_ws</str>
>       <str>f2_ws</str>
>       <str>f3_ws</str>
>       <str>f3_ws</str>
>       <str>f3_ws</str>
>       <str>f3_ws</str>
>       <str>f3_ws</str>
>       <str>f4_ws</str>
>       <str>f4_ws</str>
>       <str>f4_ws</str>
>       <str>f4_ws</str>
>       <str>f4_ws</str>
>       <str>f5_ws</str>
>       <str>f5_ws</str>
>       <str>f5_ws</str>
>       <str>f5_ws</str>
>       <str>f5_ws</str>
>       <str>f6_ws</str>
>       <str>f6_ws</str>
>       <str>f6_ws</str>
>       <str>f6_ws</str>
>       <str>f6_ws</str>
>       <str>f7_ws</str>
>       <str>f7_ws</str>
>       <str>f7_ws</str>
>       <str>f7_ws</str>
>       <str>f7_ws</str>
>       <str>f8_ws</str>
>       <str>f8_ws</str>
>       <str>f8_ws</str>
>       <str>f8_ws</str>
>       <str>f8_ws</str>
>       <str>f9_ws</str>
>       <str>f9_ws</str>
>       <str>f9_ws</str>
>       <str>f9_ws</str>
>       <str>f9_ws</str>
>     </arr>
>     <str name="wt">xml</str>
>     <str name="rows">1</str>
>   </lst>
> </lst>
> <result name="response" numFound="50" start="0">
>   <doc>
>     <float name="id">0.0</float></doc>
> </result>
> <lst name="facet_counts">
>   <lst name="facet_queries"/>
>   <lst name="facet_fields">
>     <lst name="f0_ws">
>       <int name="zero_1">25</int>
>       <int name="zero_2">25</int>
>     </lst>
>     <lst name="f0_ws">
>       <int name="zero_1">25</int>
>       <int name="zero_2">25</int>
>     </lst>
>     <lst name="f0_ws">
>       <int name="zero_1">25</int>
>       <int name="zero_2">25</int>
>     </lst>
>     <lst name="f0_ws">
>       <int name="zero_1">25</int>
>       <int name="zero_2">25</int>
>     </lst>
>     <lst name="f0_ws">
>       <int name="zero_1">25</int>
>       <int name="zero_2">25</int>
>     </lst>
>     <lst name="f1_ws">
>       <int name="one_1">33</int>
>       <int name="one_3">17</int>
>     </lst>
>     <lst name="f1_ws">
>       <int name="one_1">33</int>
>       <int name="one_3">17</int>
>     </lst>
>     <lst name="f1_ws">
>       <int name="one_1">33</int>
>       <int name="one_3">17</int>
>     </lst>
>     <lst name="f1_ws">
>       <int name="one_1">33</int>
>       <int name="one_3">17</int>
>     </lst>
>     <lst name="f1_ws">
>       <int name="one_1">33</int>
>       <int name="one_3">17</int>
>     </lst>
>     <lst name="f2_ws">
>       <int name="two_1">37</int>
>       <int name="two_4">13</int>
>     </lst>
>     <lst name="f2_ws">
>       <int name="two_1">37</int>
>       <int name="two_4">13</int>
>     </lst>
>     <lst name="f2_ws">
>       <int name="two_1">37</int>
>       <int name="two_4">13</int>
>     </lst>
>     <lst name="f2_ws">
>       <int name="two_1">37</int>
>       <int name="two_4">13</int>
>     </lst>
>     <lst name="f2_ws">
>       <int name="two_1">37</int>
>       <int name="two_4">13</int>
>     </lst>
>     <lst name="f3_ws">
>       <int name="three_1">40</int>
>       <int name="three_5">10</int>
>     </lst>
>     <lst name="f3_ws">
>       <int name="three_1">40</int>
>       <int name="three_5">10</int>
>     </lst>
>     <lst name="f3_ws">
>       <int name="three_1">40</int>
>       <int name="three_5">10</int>
>     </lst>
>     <lst name="f3_ws">
>       <int name="three_1">40</int>
>       <int name="three_5">10</int>
>     </lst>
>     <lst name="f3_ws">
>       <int name="three_1">40</int>
>       <int name="three_5">10</int>
>     </lst>
>     <lst name="f4_ws">
>       <int name="four_1">41</int>
>       <int name="four_6">9</int>
>     </lst>
>     <lst name="f4_ws">
>       <int name="four_1">41</int>
>       <int name="four_6">9</int>
>     </lst>
>     <lst name="f4_ws">
>       <int name="four_1">41</int>
>       <int name="four_6">9</int>
>     </lst>
>     <lst name="f4_ws">
>       <int name="four_1">41</int>
>       <int name="four_6">9</int>
>     </lst>
>     <lst name="f4_ws">
>       <int name="four_1">41</int>
>       <int name="four_6">9</int>
>     </lst>
>     <lst name="f5_ws">
>       <int name="five_1">42</int>
>       <int name="five_7">8</int>
>     </lst>
>     <lst name="f5_ws">
>       <int name="five_1">42</int>
>       <int name="five_7">8</int>
>     </lst>
>     <lst name="f5_ws">
>       <int name="five_1">42</int>
>       <int name="five_7">8</int>
>     </lst>
>     <lst name="f5_ws">
>       <int name="five_1">42</int>
>       <int name="five_7">8</int>
>     </lst>
>     <lst name="f5_ws">
>       <int name="five_1">42</int>
>       <int name="five_7">8</int>
>     </lst>
>     <lst name="f6_ws">
>       <int name="six_1">43</int>
>       <int name="six_8">7</int>
>     </lst>
>     <lst name="f6_ws">
>       <int name="six_1">43</int>
>       <int name="six_8">7</int>
>     </lst>
>     <lst name="f6_ws">
>       <int name="six_1">43</int>
>       <int name="six_8">7</int>
>     </lst>
>     <lst name="f6_ws">
>       <int name="six_1">43</int>
>       <int name="six_8">7</int>
>     </lst>
>     <lst name="f6_ws">
>       <int name="six_1">43</int>
>       <int name="six_8">7</int>
>     </lst>
>     <lst name="f7_ws">
>       <int name="seven_1">44</int>
>       <int name="seven_9">6</int>
>     </lst>
>     <lst name="f7_ws">
>       <int name="seven_1">44</int>
>       <int name="seven_9">6</int>
>     </lst>
>     <lst name="f7_ws">
>       <int name="seven_1">44</int>
>       <int name="seven_9">6</int>
>     </lst>
>     <lst name="f7_ws">
>       <int name="seven_1">44</int>
>       <int name="seven_9">6</int>
>     </lst>
>     <lst name="f7_ws">
>       <int name="seven_1">44</int>
>       <int name="seven_9">6</int>
>     </lst>
>     <lst name="f8_ws">
>       <int name="eight_1">45</int>
>       <int name="eight_10">5</int>
>     </lst>
>     <lst name="f8_ws">
>       <int name="eight_1">45</int>
>       <int name="eight_10">5</int>
>     </lst>
>     <lst name="f8_ws">
>       <int name="eight_1">45</int>
>       <int name="eight_10">5</int>
>     </lst>
>     <lst name="f8_ws">
>       <int name="eight_1">45</int>
>       <int name="eight_10">5</int>
>     </lst>
>     <lst name="f8_ws">
>       <int name="eight_1">45</int>
>       <int name="eight_10">5</int>
>     </lst>
>     <lst name="f9_ws">
>       <int name="nine_1">45</int>
>       <int name="nine_11">5</int>
>     </lst>
>     <lst name="f9_ws">
>       <int name="nine_1">45</int>
>       <int name="nine_11">5</int>
>     </lst>
>     <lst name="f9_ws">
>       <int name="nine_1">45</int>
>       <int name="nine_11">5</int>
>     </lst>
>     <lst name="f9_ws">
>       <int name="nine_1">45</int>
>       <int name="nine_11">5</int>
>     </lst>
>     <lst name="f9_ws">
>       <int name="nine_1">45</int>
>       <int name="nine_11">5</int>
>     </lst>
>   </lst>
>   <lst name="facet_dates"/>
>   <lst name="facet_ranges"/>
> </lst>
> </response> 
> {code}
> - *Now, if a create a new collection with 2 shards (>1 shard SolrCloud), the 
> same above query results in a different output. (# facet fields ~ 10 ;  
> Expected 50)*
> {code}
> $ curl  
> "http://localhost:8983/solr/collection1/select?facet=true&fl=id&indent=true&q=id%3A*&facet.limit=-1&facet.threads=1000&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&rows=1&wt=xml";
>  
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
>   <int name="status">0</int>
>   <int name="QTime">31</int>
>   <lst name="params">
>     <str name="facet">true</str>
>     <str name="fl">id</str>
>     <str name="indent">true</str>
>     <str name="q">id:*</str>
>     <str name="facet.limit">-1</str>
>     <str name="facet.threads">1000</str>
>     <arr name="facet.field">
>       <str>f0_ws</str>
>       <str>f0_ws</str>
>       <str>f0_ws</str>
>       <str>f0_ws</str>
>       <str>f0_ws</str>
>       <str>f1_ws</str>
>       <str>f1_ws</str>
>       <str>f1_ws</str>
>       <str>f1_ws</str>
>       <str>f1_ws</str>
>       <str>f2_ws</str>
>       <str>f2_ws</str>
>       <str>f2_ws</str>
>       <str>f2_ws</str>
>       <str>f2_ws</str>
>       <str>f3_ws</str>
>       <str>f3_ws</str>
>       <str>f3_ws</str>
>       <str>f3_ws</str>
>       <str>f3_ws</str>
>       <str>f4_ws</str>
>       <str>f4_ws</str>
>       <str>f4_ws</str>
>       <str>f4_ws</str>
>       <str>f4_ws</str>
>       <str>f5_ws</str>
>       <str>f5_ws</str>
>       <str>f5_ws</str>
>       <str>f5_ws</str>
>       <str>f5_ws</str>
>       <str>f6_ws</str>
>       <str>f6_ws</str>
>       <str>f6_ws</str>
>       <str>f6_ws</str>
>       <str>f6_ws</str>
>       <str>f7_ws</str>
>       <str>f7_ws</str>
>       <str>f7_ws</str>
>       <str>f7_ws</str>
>       <str>f7_ws</str>
>       <str>f8_ws</str>
>       <str>f8_ws</str>
>       <str>f8_ws</str>
>       <str>f8_ws</str>
>       <str>f8_ws</str>
>       <str>f9_ws</str>
>       <str>f9_ws</str>
>       <str>f9_ws</str>
>       <str>f9_ws</str>
>       <str>f9_ws</str>
>     </arr>
>     <str name="wt">xml</str>
>     <str name="rows">1</str>
>   </lst>
> </lst>
> <result name="response" numFound="50" start="0" maxScore="1.0">
>   <doc>
>     <float name="id">2.0</float></doc>
> </result>
> <lst name="facet_counts">
>   <lst name="facet_queries"/>
>   <lst name="facet_fields">
>     <lst name="f0_ws">
>       <int name="zero_1">25</int>
>       <int name="zero_2">25</int>
>     </lst>
>     <lst name="f1_ws">
>       <int name="one_1">33</int>
>       <int name="one_3">17</int>
>     </lst>
>     <lst name="f2_ws">
>       <int name="two_1">37</int>
>       <int name="two_4">13</int>
>     </lst>
>     <lst name="f3_ws">
>       <int name="three_1">40</int>
>       <int name="three_5">10</int>
>     </lst>
>     <lst name="f4_ws">
>       <int name="four_1">41</int>
>       <int name="four_6">9</int>
>     </lst>
>     <lst name="f5_ws">
>       <int name="five_1">42</int>
>       <int name="five_7">8</int>
>     </lst>
>     <lst name="f6_ws">
>       <int name="six_1">43</int>
>       <int name="six_8">7</int>
>     </lst>
>     <lst name="f7_ws">
>       <int name="seven_1">44</int>
>       <int name="seven_9">6</int>
>     </lst>
>     <lst name="f8_ws">
>       <int name="eight_1">45</int>
>       <int name="eight_10">5</int>
>     </lst>
>     <lst name="f9_ws">
>       <int name="nine_1">45</int>
>       <int name="nine_11">5</int>
>     </lst>
>   </lst>
>   <lst name="facet_dates"/>
>   <lst name="facet_ranges"/>
> </lst>
> </response>
> {code}
> This behavior is quite strange as it is being dependent on the number of 
> shards in SolrCloud. It would be great if someone can shed some light on this?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to