RE: inconsistent number of results returned in solr cloud

2013-03-08 Thread Hardik Upadhyay
Hi,

I am using Solr 4.0 (not BETA) and have created a 2-shard, 2-replica
configuration.
When I query Solr with a filter query, it returns an inconsistent result count.
Without the filter query, it returns a consistent result count.
I don't understand why.

Can anyone help with this?

Best Regards

Hardik Upadhyay




Re: inconsistent number of results returned in solr cloud

2013-03-08 Thread mike st. john

Check for duplicate ids.

A quick way is to facet on the id field and set facet.mincount to 2.
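
As a rough sketch of that check (not taken from the thread), the Python below builds such a facet request and reads the ids that come back. The base URL, the collection name, and the assumption that the unique key field is called "id" are all placeholders; faceting on every id with facet.limit=-1 may be expensive on a large index.

```python
from urllib.parse import urlencode


def duplicate_id_query(base_url, collection, id_field="id"):
    """Build a select URL that facets on the (assumed) unique-key field."""
    params = {
        "q": "*:*",
        "rows": 0,               # only the facet counts are needed
        "facet": "true",
        "facet.field": id_field,
        "facet.mincount": 2,     # keep only ids that occur at least twice
        "facet.limit": -1,       # do not cap the number of facet values
        "wt": "json",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


def duplicated_ids(response, id_field="id"):
    """Read the flat [value, count, value, count, ...] facet list into a dict."""
    flat = response["facet_counts"]["facet_fields"][id_field]
    return dict(zip(flat[0::2], flat[1::2]))
```

Any id that shows up in the result is indexed more than once, which would explain a count that changes depending on which replica answers.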


-Mike





Re: inconsistent number of results returned in solr cloud

2012-11-30 Thread Erick Erickson
Just glad it's resolved.

Erick


RE: inconsistent number of results returned in solr cloud

2012-11-29 Thread Buttler, David
Sorry, yes, I had been using the BETA version.  I have deleted all of that, 
replaced the jars with the released versions (reduced my core count), and now I 
have consistent results.
I guess I missed that JIRA ticket, sorry for the false alarm.
Dave





Re: inconsistent number of results returned in solr cloud

2012-11-23 Thread Erick Erickson
Dave:

I should have asked this first. What version of Solr are you using? I'm not
sure whether it was fixed in BETA or not (it certainly is in the 4.0 GA
release). There was a problem with adding a doclist via SolrJ. Here's one
related JIRA, although it wasn't the main fix:
https://issues.apache.org/jira/browse/SOLR-3001. I suspect that's the
known problem Mark mentioned.

Because what you're seeing _sure_ sounds similar.

Best
Erick





RE: inconsistent number of results returned in solr cloud

2012-11-19 Thread Buttler, David
Answers inline below

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Saturday, November 17, 2012 6:40 AM
To: solr-user@lucene.apache.org
Subject: Re: inconsistent number of results returned in solr cloud

Hmmm, first an aside. If by "commit after every batch of documents" you
mean after every call to server.add(doclist), there's no real need to do
that unless you're striving for really low latency. The usual
recommendation is to use commitWithin when adding and commit only at the
very end of the run. This shouldn't actually be germane to your issue, just
an FYI.

DB Good point.  The code for committing docs to solr is fairly old.  I will 
update it since I don't have a latency requirement.

So you're saying that the inconsistency is permanent? By that I mean it
keeps coming back inconsistently for minutes/hours/days?

DB Yes, it is permanent.  I have collections that have been up for weeks and 
are still returning inconsistent results, and I haven't been adding any 
additional documents.
DB Related to this, I seem to have a discrepancy between the number of 
documents I think I am sending to Solr and the number of documents it is 
reporting.  I have tried reducing the number of shards for one of my small 
collections, so I deleted all references to this collection and reloaded it. 
I think I have 260 documents submitted (counted from a Hadoop job).  Solr 
returns a count of ~430 (it varies), and the first returned document is not 
consistent.

I guess if I were trying to test this I'd need to know how you added
subsequent collections. In particular what you did re: zookeeper as you
added each collection.

DB These are my steps:
DB 1. Create the collection via the HTTP API: 
http://host:port/solr/admin/collections?action=CREATE&name=collection&numShards=6&collection.configName=collection
DB 2. Relaunch one of my JVM processes, bootstrapping the collection: 
DB java -Xmx16g -Dcollection.configName=collection -Djetty.port=port 
-DzkHost=zkhost -Dsolr.solr.home=solr home -DnumShards=6 
-Dbootstrap_confdir=conf -jar start.jar
DB 3. Load data
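
For step 1, the separators in the URL are easy to get wrong when it is typed by hand. A minimal Python sketch that assembles the same Collections API call is shown here; it uses only the parameters named in the steps above, and the host, port, and collection names in the usage line are placeholders rather than values from this cluster.

```python
from urllib.parse import urlencode


def create_collection_url(host, port, name, num_shards, config_name):
    """Assemble the CREATE request described in step 1."""
    params = [
        ("action", "CREATE"),
        ("name", name),
        ("numShards", num_shards),
        ("collection.configName", config_name),
    ]
    return f"http://{host}:{port}/solr/admin/collections?{urlencode(params)}"


# Hypothetical values, for illustration only.
print(create_collection_url("localhost", 8983, "mycollection", 6, "mycollection"))
```

Letting urlencode join the parameters avoids stray spaces or missing "&" characters in the query string.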

DB Let me know if something is unclear.  I can run through the process again 
and document it more carefully.
DB
DB Thanks for looking at it,
DB Dave

Best
Erick






Re: inconsistent number of results returned in solr cloud

2012-11-17 Thread Erick Erickson
Hmmm, first an aside. If by "commit after every batch of documents" you
mean after every call to server.add(doclist), there's no real need to do
that unless you're striving for really low latency. The usual
recommendation is to use commitWithin when adding and commit only at the
very end of the run. This shouldn't actually be germane to your issue, just
an FYI.
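
To illustrate the pattern (a sketch over plain HTTP rather than the SolrJ code discussed in the thread), the Python below plans one update request per batch with commitWithin set, followed by a single explicit commit. The base URL, collection name, batch size, and 10-second commitWithin value are assumptions, and posting JSON to /update presumes the handler accepts that content type.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def update_request(base_url, collection, docs=None,
                   commit_within_ms=None, commit=False):
    """Build (but do not send) a request against the /update handler."""
    params = {}
    if commit_within_ms is not None:
        params["commitWithin"] = commit_within_ms  # let Solr commit on its own schedule
    if commit:
        params["commit"] = "true"                  # explicit hard commit
    url = f"{base_url}/{collection}/update?{urlencode(params)}"
    if docs is None:
        return Request(url)                        # commit-only request, no body
    body = json.dumps(docs).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"})


def plan_requests(base_url, collection, docs, batch_size, commit_within_ms=10000):
    """One commitWithin request per batch, then one commit at the very end."""
    requests = [
        update_request(base_url, collection, docs[i:i + batch_size],
                       commit_within_ms=commit_within_ms)
        for i in range(0, len(docs), batch_size)
    ]
    requests.append(update_request(base_url, collection, commit=True))
    return requests
```

Each planned request could then be sent with urllib.request.urlopen; only the last one forces a commit.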

So you're saying that the inconsistency is permanent? By that I mean it
keeps coming back inconsistently for minutes/hours/days?

I guess if I were trying to test this I'd need to know how you added
subsequent collections. In particular what you did re: zookeeper as you
added each collection.

Best
Erick






Re: inconsistent number of results returned in solr cloud

2012-11-16 Thread Mark Miller
How did you do the final commit? Can you try a lone commit (with 
openSearcher=true) and see if that affects things?
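
A lone commit of that kind can be issued as a plain request to the update handler. The Python sketch below only builds the URL; the base URL and collection name are placeholders, not values from this thread.

```python
from urllib.parse import urlencode


def commit_url(base_url, collection, open_searcher=True):
    """URL for a standalone commit, optionally opening a new searcher."""
    params = {
        "commit": "true",
        "openSearcher": "true" if open_searcher else "false",
    }
    return f"{base_url}/{collection}/update?{urlencode(params)}"


# Hypothetical host and collection, for illustration only.
print(commit_url("http://localhost:8983/solr", "collection1"))
```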

Trying to determine if this is a known issue or not.

- Mark

On Nov 16, 2012, at 1:34 PM, Buttler, David buttl...@llnl.gov wrote:

 Hi all,
 I buried an issue in my last post, so let me pop it up.
 
 I have a cluster with 10 collections on it.  The first collection I loaded 
 works perfectly.  But every subsequent collection returns an inconsistent 
 number of results for each query.  The queries can be simply *:*, or more 
 complex facet queries.  If I go to individual cores and issue the query, with 
 distrib=false, I get a consistent number of results.  I am wondering if there 
 is some delay in returning results from my shards, and the queried node just 
 times out and displays the number of results that it has received so far.  If 
 there is such a timeout, it must be very small, as my QTime is around 11 ms.
 
 Dave
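
The per-core check described above can be scripted. This is a sketch under assumptions: the core URL in the usage line is hypothetical, and the helpers only build the distrib=false query and compare numFound values that were already fetched and parsed from JSON responses.

```python
from collections import Counter
from urllib.parse import urlencode


def core_count_url(core_url, query="*:*"):
    """Query a single core directly, without fanning out to other shards."""
    params = {"q": query, "rows": 0, "distrib": "false", "wt": "json"}
    return f"{core_url}/select?{urlencode(params)}"


def num_found(response):
    """Extract the hit count from a parsed JSON select response."""
    return response["response"]["numFound"]


def summarize(counts):
    """Return (is_consistent, tally) for counts seen across repeated queries."""
    tally = Counter(counts)
    return len(tally) <= 1, tally


# Hypothetical core URL, for illustration only.
print(core_count_url("http://host1:8983/solr/collection_shard1_replica1"))
```

Running the same query several times against the collection and against each core, then passing the counts to summarize, shows whether the variation comes from the distributed merge or from the individual cores.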



RE: inconsistent number of results returned in solr cloud

2012-11-16 Thread Buttler, David
My typical way of adding documents is through SolrJ, where I commit after every 
batch of documents (the batch size is configurable).

I have now tried committing several times, from the command line (curl) with 
and without openSearcher=true.  It does not affect anything.

Dave
