[ https://issues.apache.org/jira/browse/SOLR-6299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078396#comment-14078396 ]
Tomás Fernández Löbbe commented on SOLR-6299:
---------------------------------------------

I think the issue must be with the combination grouping+facet-query. Grouping already gives you bad group counts if you don't make sure that all docs of a group fall in the same shard.

> Facet count on facet queries returns different results if #shards > 1
> ---------------------------------------------------------------------
>
>                 Key: SOLR-6299
>                 URL: https://issues.apache.org/jira/browse/SOLR-6299
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 5.0
>            Reporter: Vamsee Yarlagadda
>              Labels: faceting
>
> I am trying to run some facet counts on facet queries and looks like i am
> getting different counts if i use >1 shards in the SolrCloud cluster.
> Here is the upstream unit test:
> https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/request/SimpleFacetsTest.java#L173
> Setup:
> * Ingested 5 solr docs.
> {code}
> {
>   "responseHeader": {
>     "status": 0,
>     "QTime": 22,
>     "params": {
>       "indent": "true",
>       "q": "*:*",
>       "_": "1406346687337",
>       "wt": "json"
>     }
>   },
>   "response": {
>     "numFound": 5,
>     "start": 0,
>     "maxScore": 1,
>     "docs": [
>       {
>         "id": 2004,
>         "range_facet_l": [
>           2004
>         ],
>         "hotel_s1": "b",
>         "airport_s1": "ams",
>         "duration_i1": 5,
>         "_version_": 1474661321774465000,
>         "timestamp": "2014-07-26T03:50:27.975Z",
>         "multiDefault": [
>           "muLti-Default"
>         ],
>         "intDefault": 42
>       },
>       {
>         "id": 2000,
>         "range_facet_l": [
>           2000
>         ],
>         "hotel_s1": "a",
>         "airport_s1": "ams",
>         "duration_i1": 5,
>         "_version_": 1474661323604230100,
>         "timestamp": "2014-07-26T03:50:29.734Z",
>         "multiDefault": [
>           "muLti-Default"
>         ],
>         "intDefault": 42
>       },
>       {
>         "id": 2003,
>         "range_facet_l": [
>           2003
>         ],
>         "hotel_s1": "b",
>         "airport_s1": "ams",
>         "duration_i1": 5,
>         "_version_": 1474661326312702000,
>         "timestamp": "2014-07-26T03:50:32.317Z",
>         "multiDefault": [
>           "muLti-Default"
>         ],
>         "intDefault": 42
>       },
>       {
>         "id": 2001,
>         "range_facet_l": [
>           2001
>         ],
>         "hotel_s1": "a",
>         "airport_s1": "dus",
>         "duration_i1": 10,
>         "_version_": 1474661326389248000,
>         "timestamp": "2014-07-26T03:50:32.375Z",
>         "multiDefault": [
>           "muLti-Default"
>         ],
>         "intDefault": 42
>       },
>       {
>         "id": 2002,
>         "range_facet_l": [
>           2002
>         ],
>         "hotel_s1": "b",
>         "airport_s1": "ams",
>         "duration_i1": 10,
>         "_version_": 1474661326464745500,
>         "timestamp": "2014-07-26T03:50:32.446Z",
>         "multiDefault": [
>           "muLti-Default"
>         ],
>         "intDefault": 42
>       }
>     ]
>   }
> }
> {code}
> Here is the query being run:
> {code}
> Test code:
> assertQ(
>   req(
>     "q", "*:*",
>     "fq", "id:[2000 TO 2004]",
>     "group", "true",
>     "group.facet", "true",
>     "group.field", "hotel_s1",
>     "facet", "true",
>     "facet.limit", facetLimit,
>     "facet.query", "airport_s1:ams"
>   ),
>   "//lst[@name='facet_queries']/int[@name='airport_s1:ams'][.='2']"
> );
> $ curl
> "http://localhost:8983/solr/collection1/select?facet=true&facet.query=airport_s1%3Aams&q=*%3A*&facet.limit=-100&group.field=hotel_s1&group=true&group.facet=true&fq=id%3A%5B2000+TO+2004%5D&indent=true&wt=xml"
>
> {code}
> Now, if i issue a query statement - On *1* shard system (Works as expected)
> {code}
> $ curl
> "http://localhost:8983/solr/collection1/select?facet=true&facet.query=airport_s1%3Aams&q=*%3A*&facet.limit=-100&group.field=hotel_s1&group=true&group.facet=true&fq=id%3A%5B2000+TO+2004%5D&indent=true&wt=xml"
>
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
>   <int name="status">0</int>
>   <int name="QTime">17</int>
>   <lst name="params">
>     <str name="facet">true</str>
>     <str name="indent">true</str>
>     <str name="facet.query">airport_s1:ams</str>
>     <str name="q">*:*</str>
>     <str name="facet.limit">-100</str>
>     <str name="group.field">hotel_s1</str>
>     <str name="group">true</str>
>     <str name="wt">xml</str>
>     <str name="fq">id:[2000 TO 2004]</str>
>     <str name="group.facet">true</str>
>   </lst>
> </lst>
> <lst name="grouped">
>   <lst name="hotel_s1">
>     <int name="matches">5</int>
>     <arr name="groups">
>       <lst>
>         <str name="groupValue">a</str>
>         <result name="doclist" numFound="2" start="0">
>           <doc>
>             <int name="id">2001</int>
>             <arr name="range_facet_l">
>               <long>2001</long>
>             </arr>
>             <str name="hotel_s1">a</str>
>             <str name="airport_s1">dus</str>
>             <int name="duration_i1">10</int>
>             <long name="_version_">1474989437819551744</long>
>             <date name="timestamp">2014-07-29T18:45:43.819Z</date>
>             <arr name="multiDefault">
>               <str>muLti-Default</str>
>             </arr>
>             <int name="intDefault">42</int></doc>
>         </result>
>       </lst>
>       <lst>
>         <str name="groupValue">b</str>
>         <result name="doclist" numFound="3" start="0">
>           <doc>
>             <int name="id">2003</int>
>             <arr name="range_facet_l">
>               <long>2003</long>
>             </arr>
>             <str name="hotel_s1">b</str>
>             <str name="airport_s1">ams</str>
>             <int name="duration_i1">5</int>
>             <long name="_version_">1474989439611568128</long>
>             <date name="timestamp">2014-07-29T18:45:45.528Z</date>
>             <arr name="multiDefault">
>               <str>muLti-Default</str>
>             </arr>
>             <int name="intDefault">42</int></doc>
>         </result>
>       </lst>
>     </arr>
>   </lst>
> </lst>
> <lst name="facet_counts">
>   <lst name="facet_queries">
>     <int name="airport_s1:ams">2</int>
>   </lst>
>   <lst name="facet_fields"/>
>   <lst name="facet_dates"/>
>   <lst name="facet_ranges"/>
> </lst>
> </response>
> {code}
> Now, if i run the same query on 2 shard system, i see facet count as *3*
> instead of *2*.
> Solr result on 2 shard cluster:
> {code}
> [systest@search-testing-c5-1 search]$ curl
> "http://localhost:8983/solr/collection1/select?facet=true&facet.query=airport_s1%3Aams&q=*%3A*&facet.limit=-100&group.field=hotel_s1&group=true&group.facet=true&fq=id%3A%5B2000+TO+2004%5D&indent=true&wt=xml"
>
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
>   <int name="status">0</int>
>   <int name="QTime">69</int>
>   <lst name="params">
>     <str name="facet">true</str>
>     <str name="indent">true</str>
>     <str name="facet.query">airport_s1:ams</str>
>     <str name="q">*:*</str>
>     <str name="facet.limit">-100</str>
>     <str name="group.field">hotel_s1</str>
>     <str name="group">true</str>
>     <str name="wt">xml</str>
>     <str name="fq">id:[2000 TO 2004]</str>
>     <str name="group.facet">true</str>
>   </lst>
> </lst>
> <lst name="grouped">
>   <lst name="hotel_s1">
>     <int name="matches">5</int>
>     <arr name="groups">
>       <lst>
>         <str name="groupValue">b</str>
>         <result name="doclist" numFound="3" start="0" maxScore="1.0">
>           <doc>
>             <int name="id">2002</int>
>             <arr name="range_facet_l">
>               <long>2002</long>
>             </arr>
>             <str name="hotel_s1">b</str>
>             <str name="airport_s1">ams</str>
>             <int name="duration_i1">10</int>
>             <long name="_version_">1474661326464745472</long>
>             <date name="timestamp">2014-07-26T03:50:32.446Z</date>
>             <arr name="multiDefault">
>               <str>muLti-Default</str>
>             </arr>
>             <int name="intDefault">42</int></doc>
>         </result>
>       </lst>
>       <lst>
>         <str name="groupValue">a</str>
>         <result name="doclist" numFound="2" start="0" maxScore="1.0">
>           <doc>
>             <int name="id">2001</int>
>             <arr name="range_facet_l">
>               <long>2001</long>
>             </arr>
>             <str name="hotel_s1">a</str>
>             <str name="airport_s1">dus</str>
>             <int name="duration_i1">10</int>
>             <long name="_version_">1474661326389248000</long>
>             <date name="timestamp">2014-07-26T03:50:32.375Z</date>
>             <arr name="multiDefault">
>               <str>muLti-Default</str>
>             </arr>
>             <int name="intDefault">42</int></doc>
>         </result>
>       </lst>
>     </arr>
>   </lst>
> </lst>
> <lst name="facet_counts">
>   <lst name="facet_queries">
>     <int name="airport_s1:ams">3</int>
>   </lst>
>   <lst name="facet_fields"/>
>   <lst name="facet_dates"/>
>   <lst name="facet_ranges"/>
> </lst>
> </response>
> {code}
> In order to replicate this, we can simply run the above test on >1 shard
> system and the solr response will be different.
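
To illustrate the co-location point from the comment at the top of this message, here is a minimal sketch in plain Java. It does not use or reproduce Solr's code. It assumes that a grouped facet count is the number of distinct hotel_s1 groups with at least one document matching airport_s1:ams, and that a distributed request simply adds up the per-shard counts. The class name, the method names, and both shard layouts are hypothetical; the report does not say how the five documents were actually distributed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GroupFacetSketch {
    // {id, hotel_s1, airport_s1} for the five documents listed in the report.
    static final String[][] DOCS = {
        {"2000", "a", "ams"},
        {"2001", "a", "dus"},
        {"2002", "b", "ams"},
        {"2003", "b", "ams"},
        {"2004", "b", "ams"},
    };

    // Distinct hotel_s1 groups that contain at least one doc with the given airport.
    static int groupFacetCount(List<String[]> docs, String airport) {
        Set<String> groups = new HashSet<>();
        for (String[] doc : docs) {
            if (doc[2].equals(airport)) {
                groups.add(doc[1]);
            }
        }
        return groups.size();
    }

    // ASSUMPTION: the distributed merge just sums the per-shard group counts.
    static int summedAcrossShards(List<List<String[]>> shards, String airport) {
        int total = 0;
        for (List<String[]> shard : shards) {
            total += groupFacetCount(shard, airport);
        }
        return total;
    }

    // HYPOTHETICAL layout: both groups are split across the two shards.
    static List<List<String[]>> hypotheticalSplit() {
        return Arrays.asList(
            Arrays.asList(DOCS[0], DOCS[2]),            // 2000 (a), 2002 (b)
            Arrays.asList(DOCS[1], DOCS[3], DOCS[4]));  // 2001 (a), 2003 (b), 2004 (b)
    }

    // HYPOTHETICAL layout: every group lives entirely on one shard.
    static List<List<String[]>> coLocatedSplit() {
        return Arrays.asList(
            Arrays.asList(DOCS[0], DOCS[1]),            // group a
            Arrays.asList(DOCS[2], DOCS[3], DOCS[4]));  // group b
    }

    public static void main(String[] args) {
        System.out.println("single shard:     " + groupFacetCount(Arrays.asList(DOCS), "ams"));
        System.out.println("groups split:     " + summedAcrossShards(hypotheticalSplit(), "ams"));
        System.out.println("groups colocated: " + summedAcrossShards(coLocatedSplit(), "ams"));
    }
}
```

Under these assumptions, group b is counted once on each shard in the split layout, which would give 3 instead of 2, the same discrepancy as in the report. When each group is kept on a single shard, the summed count matches the single-shard result.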