[ https://issues.apache.org/jira/browse/SOLR-6314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088820#comment-14088820 ]
Erick Erickson commented on SOLR-6314:
--------------------------------------

OK, taking a closer look at this, I wonder what the right behavior is. The totals are correct; it's just that they are repeated in one case and not in the other.

It _looks_ like I can restate the problem like this:

When a facet field is requested more than once in a non-sharded cluster, the field is repeated in the result set.

When a facet field is requested more than once in a sharded cluster, the field is returned only once in the result set.

IOW, specifying the same facet.field twice:
&facet.field=f1&facet.field=f1
results in two (identical) sections in the response in the non-sharded case and one in the sharded case.

I'll look at the code tomorrow to see where the difference happens. I suspect it is in the aggregating code in the distributed case, but that's just a guess.

So the question is what the right behavior really is. I can argue that specifying the exact same facet parameter (either query or field) more than once is a waste, and that the facet information should be cleaned up on the way in by removing duplicates. That would give the same response in both cases and return just one entry per unique facet criterion. This arguably avoids useless work (what's the value of specifying the same facet parameter twice?). It would change current behavior in the single-shard case, however.

If we tried to return multiple entries in the sharded case, it seems quite fragile to try to sum the facet sub-counts separately. By that I mean, say
shard 1 returns
f1:33
f1:33
shard 2 returns
f1:78
f1:78

You'd like the final result to be
f1:111
f1:111

On the surface, some rule like "add facets by position when the key is identical and return multiple counts" seems like it would work, but it also seems ripe for errors to creep in, with arguably no added value. What happens, for instance, if there are three values for "f1" from one shard and only two from another?
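To make the two options concrete, here is a minimal sketch. It is illustrative only and is not Solr code: the function names `dedupe_facet_fields` and `merge_by_position` are hypothetical, and a shard response is assumed to be a plain list of (field, count) pairs. It shows pruning duplicate facet.field values on the way in, and the "add facets by position" rule together with the mismatch case it cannot handle.

```python
def dedupe_facet_fields(fields):
    """Drop repeated facet.field values, keeping first-seen order."""
    return list(dict.fromkeys(fields))


def merge_by_position(shards):
    """Sum counts slot by slot across shards (hypothetical rule).

    Each shard is a list of (field, count) pairs. The rule only makes
    sense when every shard returns the same keys in the same order, so
    anything else is rejected rather than guessed at.
    """
    expected_keys = [key for key, _ in shards[0]]
    merged = [[key, 0] for key in expected_keys]
    for shard in shards:
        if [key for key, _ in shard] != expected_keys:
            raise ValueError("shards disagree on facet entries")
        for slot, (_, count) in zip(merged, shard):
            slot[1] += count
    return [tuple(slot) for slot in merged]


# Option 1: clean up duplicates before doing any work.
print(dedupe_facet_fields(["f1", "f1"]))  # ['f1']

# Option 2: keep duplicates and add by position.
shard1 = [("f1", 33), ("f1", 33)]
shard2 = [("f1", 78), ("f1", 78)]
print(merge_by_position([shard1, shard2]))  # [('f1', 111), ('f1', 111)]
```

With three "f1" entries from one shard and two from another, `merge_by_position` raises `ValueError` in this sketch, because the positional rule has no obvious answer for the unmatched entry.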
I don't see how that would really happen, but....

So, my question for you (and anyone who wants to chime in) is: "Do you agree that pruning multiple identical facet criteria is a Good Thing?" If not, what use case does returning multiple identical facet counts support, and is that use case worth the effort? My gut feeling is no.

Thanks for bringing this up; it's certainly something that's confusing. I suspect it's just something that hasn't been thought of in the past....

> Multi-threaded facet counts differ when SolrCloud has >1 shard
> --------------------------------------------------------------
>
>                 Key: SOLR-6314
>                 URL: https://issues.apache.org/jira/browse/SOLR-6314
>             Project: Solr
>          Issue Type: Bug
>          Components: SearchComponents - other, SolrCloud
>   Affects Versions: 5.0
>            Reporter: Vamsee Yarlagadda
>            Assignee: Erick Erickson
>
> I am trying to work with multi-threaded faceting on SolrCloud and in the
> process I was hit by some issues.
> I am currently running the below upstream test on different SolrCloud
> configurations and I am getting a different result set per configuration.
> https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/request/TestFaceting.java#L654 > Setup: > - *Indexed 50 docs into SolrCloud.* > - *If the SolrCloud has only 1 shard, the facet field query has the below > output (which matches with the expected upstream test output - # facet fields > ~ 50).* > {code} > $ curl > "http://localhost:8983/solr/collection1/select?facet=true&fl=id&indent=true&q=id%3A*&facet.limit=-1&facet.threads=1000&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&rows=1&wt=xml" > <?xml version="1.0" encoding="UTF-8"?> > <response> > <lst name="responseHeader"> > <int name="status">0</int> > <int name="QTime">21</int> > <lst name="params"> > <str name="facet">true</str> > <str name="fl">id</str> > <str name="indent">true</str> > <str name="q">id:*</str> > <str name="facet.limit">-1</str> > <str name="facet.threads">1000</str> > <arr name="facet.field"> > <str>f0_ws</str> > <str>f0_ws</str> > <str>f0_ws</str> > <str>f0_ws</str> > <str>f0_ws</str> > <str>f1_ws</str> > <str>f1_ws</str> > <str>f1_ws</str> > <str>f1_ws</str> > <str>f1_ws</str> > <str>f2_ws</str> > <str>f2_ws</str> > 
<str>f2_ws</str> > <str>f2_ws</str> > <str>f2_ws</str> > <str>f3_ws</str> > <str>f3_ws</str> > <str>f3_ws</str> > <str>f3_ws</str> > <str>f3_ws</str> > <str>f4_ws</str> > <str>f4_ws</str> > <str>f4_ws</str> > <str>f4_ws</str> > <str>f4_ws</str> > <str>f5_ws</str> > <str>f5_ws</str> > <str>f5_ws</str> > <str>f5_ws</str> > <str>f5_ws</str> > <str>f6_ws</str> > <str>f6_ws</str> > <str>f6_ws</str> > <str>f6_ws</str> > <str>f6_ws</str> > <str>f7_ws</str> > <str>f7_ws</str> > <str>f7_ws</str> > <str>f7_ws</str> > <str>f7_ws</str> > <str>f8_ws</str> > <str>f8_ws</str> > <str>f8_ws</str> > <str>f8_ws</str> > <str>f8_ws</str> > <str>f9_ws</str> > <str>f9_ws</str> > <str>f9_ws</str> > <str>f9_ws</str> > <str>f9_ws</str> > </arr> > <str name="wt">xml</str> > <str name="rows">1</str> > </lst> > </lst> > <result name="response" numFound="50" start="0"> > <doc> > <float name="id">0.0</float></doc> > </result> > <lst name="facet_counts"> > <lst name="facet_queries"/> > <lst name="facet_fields"> > <lst name="f0_ws"> > <int name="zero_1">25</int> > <int name="zero_2">25</int> > </lst> > <lst name="f0_ws"> > <int name="zero_1">25</int> > <int name="zero_2">25</int> > </lst> > <lst name="f0_ws"> > <int name="zero_1">25</int> > <int name="zero_2">25</int> > </lst> > <lst name="f0_ws"> > <int name="zero_1">25</int> > <int name="zero_2">25</int> > </lst> > <lst name="f0_ws"> > <int name="zero_1">25</int> > <int name="zero_2">25</int> > </lst> > <lst name="f1_ws"> > <int name="one_1">33</int> > <int name="one_3">17</int> > </lst> > <lst name="f1_ws"> > <int name="one_1">33</int> > <int name="one_3">17</int> > </lst> > <lst name="f1_ws"> > <int name="one_1">33</int> > <int name="one_3">17</int> > </lst> > <lst name="f1_ws"> > <int name="one_1">33</int> > <int name="one_3">17</int> > </lst> > <lst name="f1_ws"> > <int name="one_1">33</int> > <int name="one_3">17</int> > </lst> > <lst name="f2_ws"> > <int name="two_1">37</int> > <int name="two_4">13</int> > </lst> > <lst name="f2_ws"> > 
<int name="two_1">37</int> > <int name="two_4">13</int> > </lst> > <lst name="f2_ws"> > <int name="two_1">37</int> > <int name="two_4">13</int> > </lst> > <lst name="f2_ws"> > <int name="two_1">37</int> > <int name="two_4">13</int> > </lst> > <lst name="f2_ws"> > <int name="two_1">37</int> > <int name="two_4">13</int> > </lst> > <lst name="f3_ws"> > <int name="three_1">40</int> > <int name="three_5">10</int> > </lst> > <lst name="f3_ws"> > <int name="three_1">40</int> > <int name="three_5">10</int> > </lst> > <lst name="f3_ws"> > <int name="three_1">40</int> > <int name="three_5">10</int> > </lst> > <lst name="f3_ws"> > <int name="three_1">40</int> > <int name="three_5">10</int> > </lst> > <lst name="f3_ws"> > <int name="three_1">40</int> > <int name="three_5">10</int> > </lst> > <lst name="f4_ws"> > <int name="four_1">41</int> > <int name="four_6">9</int> > </lst> > <lst name="f4_ws"> > <int name="four_1">41</int> > <int name="four_6">9</int> > </lst> > <lst name="f4_ws"> > <int name="four_1">41</int> > <int name="four_6">9</int> > </lst> > <lst name="f4_ws"> > <int name="four_1">41</int> > <int name="four_6">9</int> > </lst> > <lst name="f4_ws"> > <int name="four_1">41</int> > <int name="four_6">9</int> > </lst> > <lst name="f5_ws"> > <int name="five_1">42</int> > <int name="five_7">8</int> > </lst> > <lst name="f5_ws"> > <int name="five_1">42</int> > <int name="five_7">8</int> > </lst> > <lst name="f5_ws"> > <int name="five_1">42</int> > <int name="five_7">8</int> > </lst> > <lst name="f5_ws"> > <int name="five_1">42</int> > <int name="five_7">8</int> > </lst> > <lst name="f5_ws"> > <int name="five_1">42</int> > <int name="five_7">8</int> > </lst> > <lst name="f6_ws"> > <int name="six_1">43</int> > <int name="six_8">7</int> > </lst> > <lst name="f6_ws"> > <int name="six_1">43</int> > <int name="six_8">7</int> > </lst> > <lst name="f6_ws"> > <int name="six_1">43</int> > <int name="six_8">7</int> > </lst> > <lst name="f6_ws"> > <int name="six_1">43</int> > <int 
name="six_8">7</int> > </lst> > <lst name="f6_ws"> > <int name="six_1">43</int> > <int name="six_8">7</int> > </lst> > <lst name="f7_ws"> > <int name="seven_1">44</int> > <int name="seven_9">6</int> > </lst> > <lst name="f7_ws"> > <int name="seven_1">44</int> > <int name="seven_9">6</int> > </lst> > <lst name="f7_ws"> > <int name="seven_1">44</int> > <int name="seven_9">6</int> > </lst> > <lst name="f7_ws"> > <int name="seven_1">44</int> > <int name="seven_9">6</int> > </lst> > <lst name="f7_ws"> > <int name="seven_1">44</int> > <int name="seven_9">6</int> > </lst> > <lst name="f8_ws"> > <int name="eight_1">45</int> > <int name="eight_10">5</int> > </lst> > <lst name="f8_ws"> > <int name="eight_1">45</int> > <int name="eight_10">5</int> > </lst> > <lst name="f8_ws"> > <int name="eight_1">45</int> > <int name="eight_10">5</int> > </lst> > <lst name="f8_ws"> > <int name="eight_1">45</int> > <int name="eight_10">5</int> > </lst> > <lst name="f8_ws"> > <int name="eight_1">45</int> > <int name="eight_10">5</int> > </lst> > <lst name="f9_ws"> > <int name="nine_1">45</int> > <int name="nine_11">5</int> > </lst> > <lst name="f9_ws"> > <int name="nine_1">45</int> > <int name="nine_11">5</int> > </lst> > <lst name="f9_ws"> > <int name="nine_1">45</int> > <int name="nine_11">5</int> > </lst> > <lst name="f9_ws"> > <int name="nine_1">45</int> > <int name="nine_11">5</int> > </lst> > <lst name="f9_ws"> > <int name="nine_1">45</int> > <int name="nine_11">5</int> > </lst> > </lst> > <lst name="facet_dates"/> > <lst name="facet_ranges"/> > </lst> > </response> > {code} > - *Now, if a create a new collection with 2 shards (>1 shard SolrCloud), the > same above query results in a different output. 
(# facet fields ~ 10 ; > Expected 50)* > {code} > $ curl > "http://localhost:8983/solr/collection1/select?facet=true&fl=id&indent=true&q=id%3A*&facet.limit=-1&facet.threads=1000&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&rows=1&wt=xml" > > <?xml version="1.0" encoding="UTF-8"?> > <response> > <lst name="responseHeader"> > <int name="status">0</int> > <int name="QTime">31</int> > <lst name="params"> > <str name="facet">true</str> > <str name="fl">id</str> > <str name="indent">true</str> > <str name="q">id:*</str> > <str name="facet.limit">-1</str> > <str name="facet.threads">1000</str> > <arr name="facet.field"> > <str>f0_ws</str> > <str>f0_ws</str> > <str>f0_ws</str> > <str>f0_ws</str> > <str>f0_ws</str> > <str>f1_ws</str> > <str>f1_ws</str> > <str>f1_ws</str> > <str>f1_ws</str> > <str>f1_ws</str> > <str>f2_ws</str> > <str>f2_ws</str> > <str>f2_ws</str> > <str>f2_ws</str> > <str>f2_ws</str> > <str>f3_ws</str> > <str>f3_ws</str> > <str>f3_ws</str> > <str>f3_ws</str> > <str>f3_ws</str> > <str>f4_ws</str> > <str>f4_ws</str> > <str>f4_ws</str> > <str>f4_ws</str> > <str>f4_ws</str> > <str>f5_ws</str> > <str>f5_ws</str> > 
<str>f5_ws</str> > <str>f5_ws</str> > <str>f5_ws</str> > <str>f6_ws</str> > <str>f6_ws</str> > <str>f6_ws</str> > <str>f6_ws</str> > <str>f6_ws</str> > <str>f7_ws</str> > <str>f7_ws</str> > <str>f7_ws</str> > <str>f7_ws</str> > <str>f7_ws</str> > <str>f8_ws</str> > <str>f8_ws</str> > <str>f8_ws</str> > <str>f8_ws</str> > <str>f8_ws</str> > <str>f9_ws</str> > <str>f9_ws</str> > <str>f9_ws</str> > <str>f9_ws</str> > <str>f9_ws</str> > </arr> > <str name="wt">xml</str> > <str name="rows">1</str> > </lst> > </lst> > <result name="response" numFound="50" start="0" maxScore="1.0"> > <doc> > <float name="id">2.0</float></doc> > </result> > <lst name="facet_counts"> > <lst name="facet_queries"/> > <lst name="facet_fields"> > <lst name="f0_ws"> > <int name="zero_1">25</int> > <int name="zero_2">25</int> > </lst> > <lst name="f1_ws"> > <int name="one_1">33</int> > <int name="one_3">17</int> > </lst> > <lst name="f2_ws"> > <int name="two_1">37</int> > <int name="two_4">13</int> > </lst> > <lst name="f3_ws"> > <int name="three_1">40</int> > <int name="three_5">10</int> > </lst> > <lst name="f4_ws"> > <int name="four_1">41</int> > <int name="four_6">9</int> > </lst> > <lst name="f5_ws"> > <int name="five_1">42</int> > <int name="five_7">8</int> > </lst> > <lst name="f6_ws"> > <int name="six_1">43</int> > <int name="six_8">7</int> > </lst> > <lst name="f7_ws"> > <int name="seven_1">44</int> > <int name="seven_9">6</int> > </lst> > <lst name="f8_ws"> > <int name="eight_1">45</int> > <int name="eight_10">5</int> > </lst> > <lst name="f9_ws"> > <int name="nine_1">45</int> > <int name="nine_11">5</int> > </lst> > </lst> > <lst name="facet_dates"/> > <lst name="facet_ranges"/> > </lst> > </response> > {code} > This behavior is quite strange as it is being dependent on the number of > shards in SolrCloud. It would be great if someone can shed some light on this? 