Re: Groups count in distributed grouping is wrong in some case
Hi,

I'm using Solr 4.x from trunk, the version from 2012-07-10, so it is one of the latest versions. I searched the mailing list and JIRA but found only this issue:

https://issues.apache.org/jira/browse/SOLR-3436

It was committed to trunk in May, so my version of Solr has this fix. But the problem still exists.

Cheers
Agnieszka

2012/7/15 Erick Erickson erickerick...@gmail.com
> What version of Solr are you using? [...]
Re: Groups count in distributed grouping is wrong in some case
What version of Solr are you using? There's been quite a bit of work on this lately; I'm not even sure how much has made it into 3.6. You might try searching the JIRA list. Martijn van Groningen has done a bunch of work lately, so look for his name. Fortunately, it's not likely to get a bunch of false hits <G>..

Best
Erick

On Fri, Jul 13, 2012 at 7:50 AM, Agnieszka Kukałowicz agnieszka.kukalow...@usable.pl wrote:
> Hi, I have problem with faceting count in distributed grouping. [...]
Groups count in distributed grouping is wrong in some case
Hi,

I have a problem with facet counts in distributed grouping. It appears only when I make a query that returns almost all of the documents.

My Solr implementation has 4 shards and my queries look like:

    http://host:port/select/?q=*:*&shards=shard1,shard2,shard3,shard4&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1

With a query like the one above I get strange counts for the field category1. The counts for the values are very big:

    <int name="val1">9659</int>
    <int name="val2">7015</int>
    <int name="val3">5676</int>
    <int name="val4">1180</int>
    <int name="val5">1105</int>
    <int name="val6">979</int>
    <int name="val7">770</int>
    <int name="val8">701</int>
    <int name="">612</int>
    <int name="val9">422</int>
    <int name="val10">358</int>

When I narrow the results by adding fq=category1:val1, etc. to the query, I get different counts than the category1 facet shows for the first few values:

    fq=category1:val1 - counts: 22
    fq=category1:val2 - counts: 22
    fq=category1:val3 - counts: 21
    fq=category1:val4 - counts: 19
    fq=category1:val5 - counts: 19
    fq=category1:val6 - counts: 20
    fq=category1:val7 - counts: 20
    fq=category1:val8 - counts: 25
    fq=category1:val9 - counts: 422
    fq=category1:val10 - counts: 358

From val9 on the count is ok.

At first I thought that for some values in the category1 facet the group count does not work and it returns the count of all documents, not grouped by the field id. But the number of all documents matching the query fq=category1:val1 is 45468. So the numbers are not the same.
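For anyone who wants to reproduce the two kinds of requests described above, here is a minimal sketch that assembles the query string with properly separated parameters. The parameter names and values are the ones from this message; `http://host:port/select/` is a placeholder, and the helper name `build_query` is only an illustration, not part of Solr.

```python
from urllib.parse import urlencode

# Parameters copied from the query in this message.
BASE_PARAMS = {
    "q": "*:*",
    "shards": "shard1,shard2,shard3,shard4",
    "group": "true",
    "group.field": "id",
    "group.facet": "true",
    "group.ngroups": "true",
    "facet.field": "category1",
    "facet.missing": "false",
    "facet.mincount": "1",
}


def build_query(base_url, fq=None):
    """Return the select URL, optionally narrowed with a filter query (hypothetical helper)."""
    params = dict(BASE_PARAMS)
    if fq is not None:
        params["fq"] = fq
    # Keep '*', ':' and ',' readable; other reserved characters are percent-encoded.
    return base_url + "?" + urlencode(params, safe="*:,")


if __name__ == "__main__":
    # Placeholder host, as in the message.
    print(build_query("http://host:port/select/"))
    print(build_query("http://host:port/select/", fq="category1:val1"))
```

The first call corresponds to the faceted query over all documents, the second to the narrowed query with fq=category1:val1.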
I checked the queries on each shard for val1 and the results are:

shard1:

    query: http://shard1/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1

    <lst name="fcategory">
    <int name="val1">11</int>

    query: http://shard1/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1&fq=category1:val1

shard2:

    query: http://shard2/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1

    there is no value val1 in the category1 facet.

    query: http://shard2/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1&fq=category1:val1

    <int name="ngroups">7</int>

shard3:

    query: http://shard3/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1

    there is no value val1 in the category1 facet.

    query: http://shard3/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1&fq=category1:val1

    <int name="ngroups">4</int>

So it looks like the detailed query with fq=category1:val1 returns the relevant results. But Solr has a problem with facet counts when one of the shards does not return a facet value (in this scenario val1) that exists on other shards.

I checked the shards for val10 and I got:

    shard1: count for val10 - 142
    shard2: count for val10 - 131
    shard3: count for val10 - 149
    sum of counts 422 - ok.

I'm not sure how to resolve this situation. For sure the counts of val1 to val9 should be different, and they should not be at the top of the category1 facet because this is very confusing.

Do you have any idea how to fix this problem?

Best regards
Agnieszka
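The comparison described in this message can be sketched as a small consistency check. All numbers are copied from the message; the unnamed facet entry with count 612 is left out because no fq count was reported for it. The sketch assumes that the 11 shown in shard1's facet is the grouped count for val1 on that shard, and the names `mismatched`, `val1_per_shard` and `val10_per_shard` are illustrative only.

```python
# Merged facet counts returned by the distributed query.
facet_counts = {
    "val1": 9659, "val2": 7015, "val3": 5676, "val4": 1180, "val5": 1105,
    "val6": 979, "val7": 770, "val8": 701, "val9": 422, "val10": 358,
}

# Counts obtained by narrowing the query with fq=category1:<value>.
fq_counts = {
    "val1": 22, "val2": 22, "val3": 21, "val4": 19, "val5": 19,
    "val6": 20, "val7": 20, "val8": 25, "val9": 422, "val10": 358,
}


def mismatched(facet, filtered):
    """Return the facet values whose merged count differs from the fq count."""
    return [value for value in facet if facet[value] != filtered.get(value)]


# Per-shard numbers for val1: 11 from shard1's facet (assumed to be the
# grouped count), 7 and 4 from ngroups on shard2 and shard3.
val1_per_shard = {"shard1": 11, "shard2": 7, "shard3": 4}

# Per-shard counts listed in the message under val10.
val10_per_shard = {"shard1": 142, "shard2": 131, "shard3": 149}

if __name__ == "__main__":
    print("values with inconsistent counts:", mismatched(facet_counts, fq_counts))
    print("val1 summed over shards:", sum(val1_per_shard.values()))
    print("second per-shard list summed:", sum(val10_per_shard.values()))
```

Under that assumption, the per-shard numbers for val1 add up to the fq count rather than to the merged facet count, which is consistent with the suspicion that the error arises when the shard responses are merged.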