[ 
https://issues.apache.org/jira/browse/SOLR-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-3793:
---------------------------

      Description: 
Günter Hipler reported on the solr-user mailing list that he was seeing 
inconsistencies between facet counts and numFound when drilling down 
into those facets (using "fq"). In particular: when adding an "fq" such as 
`fq={!term+f%3DnavNetwork}nebis`, the resulting numFound was higher than the 
number of docs reported by the facet constraint for nebis in the base request.
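
The expectation being violated can be written down as a small check: the facet count for a value in the base request should equal numFound once that value is applied as an "fq". What follows is only a sketch, assuming both responses were fetched with wt=json and already parsed into dicts; the helper names `facet_counts_to_dict` and `drilldown_mismatch` are made up for illustration and are not part of Solr.

```python
def facet_counts_to_dict(flat):
    """Turn Solr's flat [term, count, term, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


def drilldown_mismatch(base_response, filtered_response, field, value):
    """Compare the base facet count for `value` with the drill-down numFound.

    Returns None when the two agree, otherwise (facet_count, num_found).
    Both arguments are parsed wt=json responses (hypothetical inputs).
    """
    flat = base_response["facet_counts"]["facet_fields"][field]
    facet_count = facet_counts_to_dict(flat).get(value, 0)
    num_found = filtered_response["response"]["numFound"]
    if facet_count == num_found:
        return None
    return (facet_count, num_found)
```

A non-None result for a field/value pair is the symptom described above.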

I've been able to trivially reproduce this using the example data from Solr 
4.0-BETA (details in a comment to follow).

Important things to note from Günter's email thread with his assessment of the 
problem...

https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201208.mbox/%3ccam_u7jfdpnrgfmwmntnachcdcjw4yb-rlbbvrw_wp_jdob_...@mail.gmail.com%3E

bq. The behaviour is not consistent. Some of the facets provide the correct 
result, some not.  What I can't say for sure: The behaviour was correct (if I'm 
not wrong) once the whole index was newly created. After running some updates I 
got these results.

bq. I'm going to setup a new index with the Lucene 4.0 version from March (to 
be more exactly: it's version 4.0-2012-03-09_11-29-20) to see what are the 
results even in case of frequent updates ... the version deployed in march 
doesn't contain the error I now come across in Beta4.0 



    Fix Version/s: 4.0

Steps to reproduce...


{panel}

1) Start with a clean install of 4.0-BETA, containing a completely empty 
example index, and run Solr...

{noformat}
hossman@frisbee:~/tmp/apache-solr-4.0.0-BETA/solr/example$ ls -a solr/collection1/data/
.  ..
hossman@frisbee:~/tmp/apache-solr-4.0.0-BETA/solr/example$ java -jar start.jar 
2012-09-05 12:59:56.596:INFO:oejs.Server:jetty-8.1.2.v20120308
...
{noformat}

2) In another window, index all sample documents...

{noformat}
hossman@frisbee:~/tmp/apache-solr-4.0.0-BETA/solr/example/exampledocs$ java -jar post.jar *.xml
...
{noformat}

3) Observe the results of a simple query faceting on "cat", as well as the 
results of filtering on one of those cat values...

{noformat}
hossman@frisbee:~/tmp/apache-solr-4.0.0-BETA/solr/example/exampledocs$ curl 'http://localhost:8983/solr/select?echoParams=none&q=ipod&rows=5&fl=id&facet=true&facet.field=cat&facet.mincount=1&wt=json&indent=true'
{
  "responseHeader":{
    "status":0,
    "QTime":2},
  "response":{"numFound":3,"start":0,"docs":[
      {
        "id":"IW-02"},
      {
        "id":"F8V7067-APL-KIT"},
      {
        "id":"MA147LL/A"}]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "cat":[
        "electronics",3,
        "connector",2,
        "music",1]},
    "facet_dates":{},
    "facet_ranges":{}}}
hossman@frisbee:~/tmp/apache-solr-4.0.0-BETA/solr/example/exampledocs$ curl 'http://localhost:8983/solr/select?echoParams=none&q=ipod&rows=5&fl=id&facet=true&facet.field=cat&facet.mincount=1&wt=json&indent=true&fq=cat:electronics'
{
  "responseHeader":{
    "status":0,
    "QTime":8},
  "response":{"numFound":3,"start":0,"docs":[
      {
        "id":"IW-02"},
      {
        "id":"F8V7067-APL-KIT"},
      {
        "id":"MA147LL/A"}]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "cat":[
        "electronics",3,
        "connector",2,
        "music",1]},
    "facet_dates":{},
    "facet_ranges":{}}}
{noformat}

4) Re-index some of the sample documents, forcing a new segment to be created, 
as well as some deletions...

{noformat}
hossman@frisbee:~/tmp/apache-solr-4.0.0-BETA/solr/example/exampledocs$ java -jar post.jar ipod_*
...
{noformat}

5) Observe that while the "simple" results are unchanged, the filtered request 
now includes duplicate (deleted?) documents in the result set...

{noformat}
hossman@frisbee:~/tmp/apache-solr-4.0.0-BETA/solr/example/exampledocs$ curl 'http://localhost:8983/solr/select?echoParams=none&q=ipod&rows=5&fl=id&facet=true&facet.field=cat&facet.mincount=1&wt=json&indent=true'
{
  "responseHeader":{
    "status":0,
    "QTime":2},
  "response":{"numFound":3,"start":0,"docs":[
      {
        "id":"IW-02"},
      {
        "id":"F8V7067-APL-KIT"},
      {
        "id":"MA147LL/A"}]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "cat":[
        "electronics",3,
        "connector",2,
        "music",1]},
    "facet_dates":{},
    "facet_ranges":{}}}
hossman@frisbee:~/tmp/apache-solr-4.0.0-BETA/solr/example/exampledocs$ curl 'http://localhost:8983/solr/select?echoParams=none&q=ipod&rows=5&fl=id&facet=true&facet.field=cat&facet.mincount=1&wt=json&indent=true&fq=cat:electronics'
{
  "responseHeader":{
    "status":0,
    "QTime":2},
  "response":{"numFound":6,"start":0,"docs":[
      {
        "id":"IW-02"},
      {
        "id":"IW-02"},
      {
        "id":"F8V7067-APL-KIT"},
      {
        "id":"F8V7067-APL-KIT"},
      {
        "id":"MA147LL/A"}]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "cat":[
        "electronics",3,
        "connector",2,
        "music",1]},
    "facet_dates":{},
    "facet_ranges":{}}}
{noformat}

{panel}
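
The duplicates in step 5 are easy to spot by eye in five rows, but a small check helps on larger result sets. This is a minimal sketch that assumes the response was requested with wt=json and fl=id and has been parsed into a dict; `duplicate_ids` is a hypothetical helper name, and the sample data is copied from the filtered response in step 5.

```python
from collections import Counter


def duplicate_ids(response, id_field="id"):
    """Return the sorted ids that occur more than once in the returned docs."""
    docs = response["response"]["docs"]
    counts = Counter(doc[id_field] for doc in docs)
    return sorted(doc_id for doc_id, n in counts.items() if n > 1)


# The filtered (fq=cat:electronics) response from step 5 above.
step5_filtered = {
    "response": {
        "numFound": 6,
        "start": 0,
        "docs": [
            {"id": "IW-02"},
            {"id": "IW-02"},
            {"id": "F8V7067-APL-KIT"},
            {"id": "F8V7067-APL-KIT"},
            {"id": "MA147LL/A"},
        ],
    }
}

print(duplicate_ids(step5_filtered))  # ['F8V7067-APL-KIT', 'IW-02']
```

Note that only the returned rows are inspected (rows=5 here, while numFound is 6), so an empty list does not prove the full result set is free of duplicates.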


Interesting things to note...

1) Stopping & restarting jetty does not make the problem go away, which 
initially suggested to me that the problem is not related to any sort of 
stale caching of filters/docsets. However, if you stop & restart jetty, or 
even just issue a commit, and then re-issue the same two requests in reverse 
order, no duplicates are included. Do another commit, send the requests 
in the (original) problematic order, and the problem reappears...

{noformat}
hossman@frisbee:~/tmp/apache-solr-4.0.0-BETA/solr/example/exampledocs$ curl 'http://localhost:8983/solr/select?echoParams=none&q=ipod&rows=5&fl=id&facet=true&facet.field=cat&facet.mincount=1&wt=json&indent=true&fq=cat:electronics'
{
  "responseHeader":{
    "status":0,
    "QTime":6},
  "response":{"numFound":3,"start":0,"docs":[
      {
        "id":"IW-02"},
      {
        "id":"F8V7067-APL-KIT"},
      {
        "id":"MA147LL/A"}]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "cat":[
        "electronics",3,
        "connector",2,
        "music",1]},
    "facet_dates":{},
    "facet_ranges":{}}}
hossman@frisbee:~/tmp/apache-solr-4.0.0-BETA/solr/example/exampledocs$ curl 'http://localhost:8983/solr/select?echoParams=none&q=ipod&rows=5&fl=id&facet=true&facet.field=cat&facet.mincount=1&wt=json&indent=true'
{
  "responseHeader":{
    "status":0,
    "QTime":2},
  "response":{"numFound":3,"start":0,"docs":[
      {
        "id":"IW-02"},
      {
        "id":"F8V7067-APL-KIT"},
      {
        "id":"MA147LL/A"}]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "cat":[
        "electronics",3,
        "connector",2,
        "music",1]},
    "facet_dates":{},
    "facet_ranges":{}}}
hossman@frisbee:~/tmp/apache-solr-4.0.0-BETA/solr/example/exampledocs$ java -Ddata=args -jar post.jar '<commit/>'
SimplePostTool version 1.5
POSTing args to http://localhost:8983/solr/update..
COMMITting Solr index changes to http://localhost:8983/solr/update..
hossman@frisbee:~/tmp/apache-solr-4.0.0-BETA/solr/example/exampledocs$ curl 'http://localhost:8983/solr/select?echoParams=none&q=ipod&rows=5&fl=id&facet=true&facet.field=cat&facet.mincount=1&wt=json&indent=true'
{
  "responseHeader":{
    "status":0,
    "QTime":2},
  "response":{"numFound":3,"start":0,"docs":[
      {
        "id":"IW-02"},
      {
        "id":"F8V7067-APL-KIT"},
      {
        "id":"MA147LL/A"}]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "cat":[
        "electronics",3,
        "connector",2,
        "music",1]},
    "facet_dates":{},
    "facet_ranges":{}}}
hossman@frisbee:~/tmp/apache-solr-4.0.0-BETA/solr/example/exampledocs$ curl 'http://localhost:8983/solr/select?echoParams=none&q=ipod&rows=5&fl=id&facet=true&facet.field=cat&facet.mincount=1&wt=json&indent=true&fq=cat:electronics'
{
  "responseHeader":{
    "status":0,
    "QTime":3},
  "response":{"numFound":6,"start":0,"docs":[
      {
        "id":"IW-02"},
      {
        "id":"IW-02"},
      {
        "id":"F8V7067-APL-KIT"},
      {
        "id":"F8V7067-APL-KIT"},
      {
        "id":"MA147LL/A"}]
  },
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "cat":[
        "electronics",6,
        "connector",4,
        "music",1]},
    "facet_dates":{},
    "facet_ranges":{}}}
{noformat}

2) Optimizing seems to eliminate the problem completely, suggesting that the 
root cause is definitely related to multiple segments containing deletions.

3) Bizarrely, the problem seems to be specific to faceting: using the same 
steps, with the same simple queries & fq, but leaving out the facet params, the 
duplicate documents are not returned...

{noformat}
hossman@frisbee:~/tmp/apache-solr-4.0.0-BETA/solr/example/exampledocs$ java -jar post.jar *.xml
...
hossman@frisbee:~/tmp/apache-solr-4.0.0-BETA/solr/example/exampledocs$ curl 'http://localhost:8983/solr/select?echoParams=none&q=ipod&rows=5&fl=id&wt=json&indent=true'
{
  "responseHeader":{
    "status":0,
    "QTime":13},
  "response":{"numFound":3,"start":0,"docs":[
      {
        "id":"IW-02"},
      {
        "id":"F8V7067-APL-KIT"},
      {
        "id":"MA147LL/A"}]
  }}
hossman@frisbee:~/tmp/apache-solr-4.0.0-BETA/solr/example/exampledocs$ curl 'http://localhost:8983/solr/select?echoParams=none&q=ipod&rows=5&fl=id&wt=json&indent=true&fq=cat:electronics'
{
  "responseHeader":{
    "status":0,
    "QTime":10},
  "response":{"numFound":3,"start":0,"docs":[
      {
        "id":"IW-02"},
      {
        "id":"F8V7067-APL-KIT"},
      {
        "id":"MA147LL/A"}]
  }}
hossman@frisbee:~/tmp/apache-solr-4.0.0-BETA/solr/example/exampledocs$ java -jar post.jar ipod_*
...
hossman@frisbee:~/tmp/apache-solr-4.0.0-BETA/solr/example/exampledocs$ curl 'http://localhost:8983/solr/select?echoParams=none&q=ipod&rows=5&fl=id&wt=json&indent=true'
{
  "responseHeader":{
    "status":0,
    "QTime":2},
  "response":{"numFound":3,"start":0,"docs":[
      {
        "id":"IW-02"},
      {
        "id":"F8V7067-APL-KIT"},
      {
        "id":"MA147LL/A"}]
  }}
hossman@frisbee:~/tmp/apache-solr-4.0.0-BETA/solr/example/exampledocs$ curl 'http://localhost:8983/solr/select?echoParams=none&q=ipod&rows=5&fl=id&wt=json&indent=true&fq=cat:electronics'
{
  "responseHeader":{
    "status":0,
    "QTime":3},
  "response":{"numFound":3,"start":0,"docs":[
      {
        "id":"IW-02"},
      {
        "id":"F8V7067-APL-KIT"},
      {
        "id":"MA147LL/A"}]
  }}
{noformat}
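
Since the notes above depend on both the order of the two requests and on whether the facet params are present, it may help to generate the URLs rather than retype them. This is a rough sketch only: it reuses the host and parameters from the transcripts, the names `select_url` and `replay_order` are hypothetical, and it only builds the URLs, leaving it to the reader to fetch them and to issue a commit between runs.

```python
from urllib.parse import urlencode

# Host and handler path as used in the transcripts above.
SELECT_URL = "http://localhost:8983/solr/select"


def select_url(fq=None, facet=True):
    """Build the q=ipod request used above, optionally with fq and facet params."""
    params = [("echoParams", "none"), ("q", "ipod"), ("rows", "5"), ("fl", "id")]
    if facet:
        params += [("facet", "true"), ("facet.field", "cat"), ("facet.mincount", "1")]
    params += [("wt", "json"), ("indent", "true")]
    if fq is not None:
        params.append(("fq", fq))
    return SELECT_URL + "?" + urlencode(params)


def replay_order(filtered_first, facet=True):
    """Return the unfiltered and filtered URLs in the order they should be sent."""
    plain = select_url(facet=facet)
    filtered = select_url(fq="cat:electronics", facet=facet)
    return [filtered, plain] if filtered_first else [plain, filtered]
```

`replay_order(False)` gives the original (problematic) order from note 1, `replay_order(True)` the reversed order, and passing `facet=False` gives the requests from note 3.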






                
> duplicate (deleted) documents included in result set when using field 
> faceting with fq
> --------------------------------------------------------------------------------------
>
>                 Key: SOLR-3793
>                 URL: https://issues.apache.org/jira/browse/SOLR-3793
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.0-BETA
>            Reporter: Hoss Man
>            Priority: Blocker
>             Fix For: 4.0
