Re: Groups count in distributed grouping is wrong in some case

2012-07-15 Agnieszka Kukałowicz
Hi,

I'm using Solr 4.x from trunk, the version from 2012-07-10, so this is one
of the latest versions.

I searched the mailing list and JIRA but found only this issue:
https://issues.apache.org/jira/browse/SOLR-3436

It was committed to trunk in May, so my version of Solr includes this fix,
but the problem still exists.

Cheers
Agnieszka




Re: Groups count in distributed grouping is wrong in some case

2012-07-14 Erick Erickson
What version of Solr are you using? There's been quite a bit of work
on this lately; I'm not even sure how much has made it into 3.6. You might
try searching JIRA: Martijn van Groningen has done a bunch of work lately,
so look for his name. Fortunately, it's not likely to get a bunch of false
hits <g>.

Best
Erick



Groups count in distributed grouping is wrong in some case

2012-07-13 Agnieszka Kukałowicz
Hi,

I have a problem with facet counts in distributed grouping. It appears only
when I run a query that returns almost all of the documents.

My Solr setup has 4 shards, and my queries look like this:

http://host:port/select/?q=*:*&shards=shard1,shard2,shard3,shard4&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1
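Building the query string programmatically avoids missing or misplaced
separators such as `&` and `?q=`. A minimal Python sketch using only the
standard library; `build_select_url` is a hypothetical helper, and its
parameters are exactly those of the query above ("host:port" and the shard
names are the placeholders from this message).

```python
from urllib.parse import urlencode


def build_select_url(base, shards=None, fq=None):
    """Build a grouped, faceted /select URL with the parameters from the message.

    base, shards and fq are supplied by the caller (hypothetical helper).
    """
    params = [("q", "*:*")]
    if shards:
        # Distributed request: the receiving node queries each listed shard.
        params.append(("shards", ",".join(shards)))
    params += [
        ("group", "true"),
        ("group.field", "id"),
        ("group.facet", "true"),
        ("group.ngroups", "true"),
        ("facet.field", "category1"),
        ("facet.missing", "false"),
        ("facet.mincount", "1"),
    ]
    if fq:
        # Narrow to one facet value to check its grouped count directly.
        params.append(("fq", fq))
    # Keep ':', '*' and ',' unescaped so the URL stays readable.
    return f"{base}/select/?{urlencode(params, safe=':*,')}"


if __name__ == "__main__":
    print(build_select_url("http://host:port",
                           shards=["shard1", "shard2", "shard3", "shard4"]))
    print(build_select_url("http://shard2", fq="category1:val1"))
```

The same helper produces the per-shard checks further down by leaving out
`shards` and passing `fq="category1:val1"`.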

With a query like the one above I get strange counts for the category1
field. The counts are very large:
<int name="val1">9659</int>
<int name="val2">7015</int>
<int name="val3">5676</int>
<int name="val4">1180</int>
<int name="val5">1105</int>
<int name="val6">979</int>
<int name="val7">770</int>
<int name="val8">701</int>
<int name="">612</int>
<int name="val9">422</int>
<int name="val10">358</int>

When I narrow the results by adding fq=category1:val1 (and so on) to the
query, I get different counts than the category1 facet shows for the first
few values:

fq=category1:val1 - counts: 22
fq=category1:val2 - counts: 22
fq=category1:val3 - counts: 21
fq=category1:val4 - counts: 19
fq=category1:val5 - counts: 19
fq=category1:val6 - counts: 20
fq=category1:val7 - counts: 20
fq=category1:val8 - counts: 25
fq=category1:val9 - counts: 422
fq=category1:val10 - counts: 358

From val9 onward, the counts are correct.
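The comparison above can be automated. A minimal Python sketch that uses only
the numbers quoted in this message and does not call Solr; the helper name
`mismatched_values` is hypothetical, and the facet entry that appears without
a name (count 612) is keyed as an empty string here.

```python
# Counts quoted in the message: the merged facet count from the distributed
# query, and the grouped count returned when filtering with fq=category1:<value>.
facet_counts = {
    "val1": 9659, "val2": 7015, "val3": 5676, "val4": 1180, "val5": 1105,
    "val6": 979, "val7": 770, "val8": 701, "": 612, "val9": 422, "val10": 358,
}
fq_counts = {
    "val1": 22, "val2": 22, "val3": 21, "val4": 19, "val5": 19,
    "val6": 20, "val7": 20, "val8": 25, "val9": 422, "val10": 358,
}


def mismatched_values(facet_counts, fq_counts):
    """Return facet values whose merged count disagrees with the fq count.

    Values without an fq measurement (here the empty-named entry) are skipped.
    The result keeps the order of facet_counts, i.e. the order Solr returned.
    """
    return [
        value
        for value, count in facet_counts.items()
        if value in fq_counts and fq_counts[value] != count
    ]


if __name__ == "__main__":
    for value in mismatched_values(facet_counts, fq_counts):
        print(f"{value}: facet={facet_counts[value]} fq={fq_counts[value]}")
```

On these numbers it flags val1 through val8 and leaves val9 and val10 alone.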

At first I thought that, for some values in the category1 facet, group
counting does not work and the facet returns the count of all documents
instead of the number of groups by the id field. But the number of all
documents matching fq=category1:val1 is 45468, not 9659, so that is not
the explanation either.

I checked the queries on each shard for val1, and the results are:

shard1:
query:
http://shard1/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1

<lst name="fcategory">
<int name="val1">11</int>

query:
http://shard1/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1&fq=category1:val1

shard2:
query:
http://shard2/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1

There is no val1 value in the category1 facet.

query:
http://shard2/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1&fq=category1:val1

<int name="ngroups">7</int>

shard3:
query:
http://shard3/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1

There is no val1 value in the category1 facet.

query:
http://shard3/select/?q=*:*&group=true&group.field=id&group.facet=true&group.ngroups=true&facet.field=category1&facet.missing=false&facet.mincount=1&fq=category1:val1

<int name="ngroups">4</int>

So it looks like the detailed query with fq=category1:val1 returns the
correct results (11 + 7 + 4 = 22 groups). But Solr has a problem with facet
counts when one of the shards does not return a facet value (in this
scenario val1) that exists on other shards.

I checked the shards for val10 and got:

shard1: count for val10 - 142
shard2: count for val10 - 131
shard3: count for val10 - 149
Sum of counts: 422, which is correct.
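The per-shard checks above are additive. A minimal Python sketch, assuming
that all documents of a group (same id) sit on a single shard, so per-shard
group counts can be added without double counting; `merged_group_count` is a
hypothetical name, and shard4 is left out because the message reports figures
for only three shards.

```python
# The three shards whose figures are reported in the message.
REPORTED_SHARDS = ("shard1", "shard2", "shard3")


def merged_group_count(per_shard, shards=REPORTED_SHARDS):
    """Sum one facet value's grouped count across shards.

    Assumes every group lives on exactly one shard. A shard with no entry
    for the value contributes 0.
    """
    return sum(per_shard.get(shard, 0) for shard in shards)


# val1: shard1 from its facet list; shard2 and shard3 from the
# fq=category1:val1 queries (ngroups), since their facets did not list val1.
val1 = {"shard1": 11, "shard2": 7, "shard3": 4}
# val10: per-shard counts as quoted in the message.
val10 = {"shard1": 142, "shard2": 131, "shard3": 149}

if __name__ == "__main__":
    print(merged_group_count(val1))   # 22
    print(merged_group_count(val10))  # 422
```

If only the unfiltered per-shard facet lists were summed, val1 would come to
11, because shard2 and shard3 did not list it; the fq queries supply the
missing 7 and 4 groups.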

I'm not sure how to resolve this situation. The counts for val1 to val9
should certainly be different, and those values should not be at the top of
the category1 facet, because this is very confusing. Do you have any idea
how to fix this problem?

Best regards
Agnieszka