RE: solr query gives different numFound upon refreshing

2014-09-16 Thread Joshi, Shital
We wrote a script that queries each Solr instance in the cloud 
(http://$host/solr/replication?command=details), subtracts the 
‘replicableVersion’ number from the ‘indexVersion’ number, converts the 
difference to minutes, and alerts if it exceeds 20 minutes. We get alerted many 
times a day. The soft commit is set to run every 7 minutes. 
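
A minimal sketch of the check described above. It assumes (this is not stated in the thread) that both version numbers are millisecond timestamps, which is what makes the conversion to minutes meaningful. The function names and sample values are hypothetical, and fetching and parsing the replication details response is left out.

```python
ALERT_THRESHOLD_MINUTES = 20  # threshold used by the script described above


def version_lag_minutes(index_version, replicable_version):
    """Return indexVersion minus replicableVersion, converted to minutes.

    Assumes both values are millisecond timestamps.
    """
    return (index_version - replicable_version) / 60_000.0


def should_alert(index_version, replicable_version):
    """True when the lag exceeds the alert threshold."""
    lag = version_lag_minutes(index_version, replicable_version)
    return lag > ALERT_THRESHOLD_MINUTES


# Hypothetical sample values: the two versions are 25 minutes apart.
replicable = 1_410_000_000_000
index = replicable + 25 * 60_000
print(version_lag_minutes(index, replicable), should_alert(index, replicable))
```

Keeping the arithmetic separate from the HTTP call makes it easy to check the threshold logic on its own before pointing the script at a live node.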

Any idea what might be wrong here?

This is our commit setting. 


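<autoCommit>
   <maxTime>15000</maxTime>
   <maxDocs>10</maxDocs>
   <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
   <maxTime>45</maxTime>
</autoSoftCommit>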


We got rid of all the maxWarmingSearchers errors. 


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, September 04, 2014 6:07 PM
To: solr-user@lucene.apache.org
Subject: Re: solr query gives different numFound upon refreshing

Does this persist if you issue a hard commit? You can do something like
http://solr/collection/update?stream.body=

On Thu, Sep 4, 2014 at 2:19 PM, shamik  wrote:
> I've noticed similar behavior with our Solr cloud cluster for a while, it's
> random though. We've 2 shards with 3 replicas each. At times, I've observed
> that the same query on refresh will fetch different results (numFound) as
> well as the content. The only way to mitigate is to refresh the index with
> the documents till the nodes are in sync. I always use SolrJ which talks to
> Solr through zookeeper, even with that it seemed to be unavoidable at times.
> We are committing every 10 mins. I'm pretty much sure there's a minor glitch
> which creates a sync issue at times.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/solr-query-gives-different-numFound-upon-refreshing-tp4155414p4157026.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr query gives different numFound upon refreshing

2014-09-04 Thread Erick Erickson
Does this persist if you issue a hard commit? You can do something like
http://solr/collection/update?stream.body=
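
One hedged way to build such a request: the update handler also accepts a commit=true request parameter, which asks for a hard commit without a stream.body payload. The host and collection names below are placeholders, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def hard_commit_url(host, collection):
    """Build an update URL that requests a hard commit via commit=true."""
    query = urlencode({"commit": "true"})
    return f"http://{host}/solr/{collection}/update?{query}"


# Placeholder host and collection; open the URL with any HTTP client.
print(hard_commit_url("localhost:8983", "collection1"))
```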

On Thu, Sep 4, 2014 at 2:19 PM, shamik  wrote:
> I've noticed similar behavior with our Solr cloud cluster for a while, it's
> random though. We've 2 shards with 3 replicas each. At times, I've observed
> that the same query on refresh will fetch different results (numFound) as
> well as the content. The only way to mitigate is to refresh the index with
> the documents till the nodes are in sync. I always use SolrJ which talks to
> Solr through zookeeper, even with that it seemed to be unavoidable at times.
> We are committing every 10 mins. I'm pretty much sure there's a minor glitch
> which creates a sync issue at times.


Re: solr query gives different numFound upon refreshing

2014-09-04 Thread shamik
I've noticed similar behavior with our Solr cloud cluster for a while, it's
random though. We've 2 shards with 3 replicas each. At times, I've observed
that the same query on refresh will fetch different results (numFound) as
well as the content. The only way to mitigate is to refresh the index with
the documents till the nodes are in sync. I always use SolrJ which talks to
Solr through zookeeper, even with that it seemed to be unavoidable at times.
We are committing every 10 mins. I'm pretty much sure there's a minor glitch
which creates a sync issue at times. 





Re: solr query gives different numFound upon refreshing

2014-08-29 Thread Erick Erickson
bq: What is Master Searching vs. Master Replicable vs Slave Searching

Likely leftover from the days when master/slave was the only option. You
can pretty much ignore it in SolrCloud, I think.

Best,
Erick


On Fri, Aug 29, 2014 at 11:38 AM, Joshi, Shital  wrote:

> Erick,
>
> Thanks for your reply.
>
> We will increase the autocommit setting and let you know.
>
> We are using SolrCloud (4.8.0). When we select a collection in the Solr
> admin GUI and open the Overview tab, we see three versions of the index
> even though we have just 1 replica.
>
> Master (Searching)
> Master (Replicable)
> Slave (Searching)
>
> What is Master Searching vs. Master Replicable vs Slave Searching?
>
> Thanks.
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, August 29, 2014 12:22 AM
> To: solr-user@lucene.apache.org
> Subject: Re: solr query gives different numFound upon refreshing
>
> First, I want to be sure you're not mixing old-style
> replication and SolrCloud. Your mention of Master/Slave
> prompts this question.
>
> Second, your maxWarmingSearchers error indicates that
> your commit interval is too short relative to your autowarm
> times. Try lengthening your autocommit settings (probably
> soft commit) until you no longer see that error message
> and see if the problem goes away. If it doesn't, let us know.
>
> Best,
> Erick
>
>
>
> On Thu, Aug 28, 2014 at 9:39 AM, Joshi, Shital 
> wrote:
>
> > Hi Shawn,
> >
> > Thanks for your reply.
> >
> > We did some tests with shards.info=true and confirmed that there is
> > no duplicate copy of our index.
> >
> > We have one replica, but we often see three versions on the Admin
> > GUI/Overview tab. All three have different version and gen values. Is
> > that a problem?
> > Master (Searching)
> > Master (Replicable)
> > Slave (Searching)
> >
> > We constantly see the maxWarmingSearchers exception. The warmup time is
> > 1.5 minutes, but the difference between the openedAt and registeredAt
> > dates is at times more than 4-5 minutes. Is the true searcher warm-up
> > time the difference between these two dates rather than the warmupTime?
> >
> > openedAt:   2014-08-28T16:17:24.829Z
> > registeredAt:   2014-08-28T16:21:02.278Z
> > warmupTime: 65727
> >
> > Thanks for all help.
> >
> >
> > -Original Message-
> > From: Shawn Heisey [mailto:s...@elyograg.org]
> > Sent: Wednesday, August 27, 2014 2:37 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: solr query gives different numFound upon refreshing
> >
> > On 8/27/2014 10:44 AM, Bryan Bende wrote:
> > > Theoretically this shouldn't happen, but is it possible that the two
> > > replicas for a given shard are not fully in sync?
> > >
> > > Say shard1 replica1 is missing a document that is in shard1 replica2...
> > > if you run a query that would hit on that document and run it a bunch of
> > > times, sometimes replica 1 will handle the request and sometimes
> > > replica 2 will handle it, and it would change your number of results if
> > > one of them is missing a document. You could write a program that
> > > compares each replica's documents by querying them with distrib=false.
> > >
> > > If there was a replica out of sync, I would think it would detect that
> > > on a restart when comparing itself against the leader for that shard,
> > > but I'm not sure.
> >
> > A replica out of sync is a possibility, but the most common reason for a
> > changing numFound is because the overall distributed index has more than
> > one document with the same uniqueKey value -- different versions of the
> > same document in more than one shard.
> >
> > SolrCloud tries really hard to never end up with replicas out of sync,
> > but either due to highly unusual circumstances or bugs, it could still
> > happen.
> >
> > Thanks,
> > Shawn
> >
> >
>


RE: solr query gives different numFound upon refreshing

2014-08-29 Thread Joshi, Shital
Erick,

Thanks for your reply. 

We will increase the autocommit setting and let you know.

We are using SolrCloud (4.8.0). When we select a collection in the Solr admin 
GUI and open the Overview tab, we see three versions of the index even though 
we have just 1 replica. 

Master (Searching)
Master (Replicable)
Slave (Searching)

What is Master Searching vs. Master Replicable vs Slave Searching? 

Thanks. 

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, August 29, 2014 12:22 AM
To: solr-user@lucene.apache.org
Subject: Re: solr query gives different numFound upon refreshing

First, I want to be sure you're not mixing old-style
replication and SolrCloud. Your mention of Master/Slave
prompts this question.

Second, your maxWarmingSearchers error indicates that
your commit interval is too short relative to your autowarm
times. Try lengthening your autocommit settings (probably
soft commit) until you no longer see that error message
and see if the problem goes away. If it doesn't, let us know.

Best,
Erick



On Thu, Aug 28, 2014 at 9:39 AM, Joshi, Shital  wrote:

> Hi Shawn,
>
> Thanks for your reply.
>
> We did some tests with shards.info=true and confirmed that there is
> no duplicate copy of our index.
>
> We have one replica, but we often see three versions on the Admin
> GUI/Overview tab. All three have different version and gen values. Is
> that a problem?
> Master (Searching)
> Master (Replicable)
> Slave (Searching)
>
> We constantly see the maxWarmingSearchers exception. The warmup time is
> 1.5 minutes, but the difference between the openedAt and registeredAt
> dates is at times more than 4-5 minutes. Is the true searcher warm-up
> time the difference between these two dates rather than the warmupTime?
>
> openedAt:   2014-08-28T16:17:24.829Z
> registeredAt:   2014-08-28T16:21:02.278Z
> warmupTime: 65727
>
> Thanks for all help.
>
>
> -Original Message-
> From: Shawn Heisey [mailto:s...@elyograg.org]
> Sent: Wednesday, August 27, 2014 2:37 PM
> To: solr-user@lucene.apache.org
> Subject: Re: solr query gives different numFound upon refreshing
>
> On 8/27/2014 10:44 AM, Bryan Bende wrote:
> > Theoretically this shouldn't happen, but is it possible that the two
> > replicas for a given shard are not fully in sync?
> >
> > Say shard1 replica1 is missing a document that is in shard1 replica2...
> > if you run a query that would hit on that document and run it a bunch of
> > times, sometimes replica 1 will handle the request and sometimes replica
> > 2 will handle it, and it would change your number of results if one of
> > them is missing a document. You could write a program that compares each
> > replica's documents by querying them with distrib=false.
> >
> > If there was a replica out of sync, I would think it would detect that
> > on a restart when comparing itself against the leader for that shard,
> > but I'm not sure.
>
> A replica out of sync is a possibility, but the most common reason for a
> changing numFound is because the overall distributed index has more than
> one document with the same uniqueKey value -- different versions of the
> same document in more than one shard.
>
> SolrCloud tries really hard to never end up with replicas out of sync,
> but either due to highly unusual circumstances or bugs, it could still
> happen.
>
> Thanks,
> Shawn
>
>


Re: solr query gives different numFound upon refreshing

2014-08-28 Thread Erick Erickson
First, I want to be sure you're not mixing old-style
replication and SolrCloud. Your mention of Master/Slave
prompts this question.

Second, your maxWarmingSearchers error indicates that
your commit interval is too short relative to your autowarm
times. Try lengthening your autocommit settings (probably
soft commit) until you no longer see that error message
and see if the problem goes away. If it doesn't, let us know.

Best,
Erick



On Thu, Aug 28, 2014 at 9:39 AM, Joshi, Shital  wrote:

> Hi Shawn,
>
> Thanks for your reply.
>
> We did some tests with shards.info=true and confirmed that there is
> no duplicate copy of our index.
>
> We have one replica, but we often see three versions on the Admin
> GUI/Overview tab. All three have different version and gen values. Is
> that a problem?
> Master (Searching)
> Master (Replicable)
> Slave (Searching)
>
> We constantly see the maxWarmingSearchers exception. The warmup time is
> 1.5 minutes, but the difference between the openedAt and registeredAt
> dates is at times more than 4-5 minutes. Is the true searcher warm-up
> time the difference between these two dates rather than the warmupTime?
>
> openedAt:   2014-08-28T16:17:24.829Z
> registeredAt:   2014-08-28T16:21:02.278Z
> warmupTime: 65727
>
> Thanks for all help.
>
>
> -Original Message-
> From: Shawn Heisey [mailto:s...@elyograg.org]
> Sent: Wednesday, August 27, 2014 2:37 PM
> To: solr-user@lucene.apache.org
> Subject: Re: solr query gives different numFound upon refreshing
>
> On 8/27/2014 10:44 AM, Bryan Bende wrote:
> > Theoretically this shouldn't happen, but is it possible that the two
> > replicas for a given shard are not fully in sync?
> >
> > Say shard1 replica1 is missing a document that is in shard1 replica2...
> > if you run a query that would hit on that document and run it a bunch of
> > times, sometimes replica 1 will handle the request and sometimes replica
> > 2 will handle it, and it would change your number of results if one of
> > them is missing a document. You could write a program that compares each
> > replica's documents by querying them with distrib=false.
> >
> > If there was a replica out of sync, I would think it would detect that
> > on a restart when comparing itself against the leader for that shard,
> > but I'm not sure.
>
> A replica out of sync is a possibility, but the most common reason for a
> changing numFound is because the overall distributed index has more than
> one document with the same uniqueKey value -- different versions of the
> same document in more than one shard.
>
> SolrCloud tries really hard to never end up with replicas out of sync,
> but either due to highly unusual circumstances or bugs, it could still
> happen.
>
> Thanks,
> Shawn
>
>


RE: solr query gives different numFound upon refreshing

2014-08-28 Thread Joshi, Shital
Hi Shawn,

Thanks for your reply. 

We did some tests with shards.info=true and confirmed that there is no 
duplicate copy of our index.  

We have one replica, but we often see three versions on the Admin GUI/Overview 
tab. All three have different version and gen values. Is that a problem?
Master (Searching)
Master (Replicable)
Slave (Searching)

We constantly see the maxWarmingSearchers exception. The warmup time is 1.5 
minutes, but the difference between the openedAt and registeredAt dates is at 
times more than 4-5 minutes. Is the true searcher warm-up time the difference 
between these two dates rather than the warmupTime?

openedAt:   2014-08-28T16:17:24.829Z
registeredAt:   2014-08-28T16:21:02.278Z
warmupTime: 65727
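
For reference, the gap between the two timestamps above can be worked out directly. This is only a sketch: it assumes warmupTime is reported in milliseconds, and the function name is hypothetical. With the values shown, the openedAt-to-registeredAt gap is roughly 217 seconds, against roughly 66 seconds of reported warmup.

```python
from datetime import datetime

# Timestamp layout used by the openedAt / registeredAt values above.
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"


def seconds_between(opened_at, registered_at):
    """Elapsed seconds from openedAt to registeredAt."""
    opened = datetime.strptime(opened_at, FMT)
    registered = datetime.strptime(registered_at, FMT)
    return (registered - opened).total_seconds()


gap_s = seconds_between("2014-08-28T16:17:24.829Z", "2014-08-28T16:21:02.278Z")
warmup_s = 65727 / 1000.0  # assumes warmupTime is in milliseconds
print(gap_s, warmup_s)
```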

Thanks for all help. 


-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Wednesday, August 27, 2014 2:37 PM
To: solr-user@lucene.apache.org
Subject: Re: solr query gives different numFound upon refreshing

On 8/27/2014 10:44 AM, Bryan Bende wrote:
> Theoretically this shouldn't happen, but is it possible that the two
> replicas for a given shard are not fully in sync?
>
> Say shard1 replica1 is missing a document that is in shard1 replica2... if
> you run a query that would hit on that document and run it a bunch of
> times, sometimes replica 1 will handle the request and sometimes replica 2
> will handle it, and it would change your number of results if one of them
> is missing a document. You could write a program that compares each
> replica's documents by querying them with distrib=false.
>
> If there was a replica out of sync, I would think it would detect that on a
> restart when comparing itself against the leader for that shard, but I'm
> not sure.

A replica out of sync is a possibility, but the most common reason for a
changing numFound is because the overall distributed index has more than
one document with the same uniqueKey value -- different versions of the
same document in more than one shard.

SolrCloud tries really hard to never end up with replicas out of sync,
but either due to highly unusual circumstances or bugs, it could still
happen.

Thanks,
Shawn



Re: solr query gives different numFound upon refreshing

2014-08-27 Thread Shawn Heisey
On 8/27/2014 10:44 AM, Bryan Bende wrote:
> Theoretically this shouldn't happen, but is it possible that the two
> replicas for a given shard are not fully in sync?
>
> Say shard1 replica1 is missing a document that is in shard1 replica2... if
> you run a query that would hit on that document and run it a bunch of
> times, sometimes replica 1 will handle the request and sometimes replica 2
> will handle it, and it would change your number of results if one of them
> is missing a document. You could write a program that compares each
> replica's documents by querying them with distrib=false.
>
> If there was a replica out of sync, I would think it would detect that on a
> restart when comparing itself against the leader for that shard, but I'm
> not sure.

A replica out of sync is a possibility, but the most common reason for a
changing numFound is because the overall distributed index has more than
one document with the same uniqueKey value -- different versions of the
same document in more than one shard.
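
A rough sketch of how one might look for that situation, assuming the uniqueKey values of each shard have already been collected (for example by querying one core per shard with distrib=false). The shard names, sample ids, and function name are made up for illustration.

```python
from collections import defaultdict


def ids_in_multiple_shards(ids_by_shard):
    """Map each uniqueKey value found in more than one shard to those shards."""
    shards_for_id = defaultdict(set)
    for shard, ids in ids_by_shard.items():
        for doc_id in ids:
            shards_for_id[doc_id].add(shard)
    return {
        doc_id: sorted(shards)
        for doc_id, shards in shards_for_id.items()
        if len(shards) > 1
    }


# Hypothetical sample: id "c" was indexed into two different shards.
sample = {"shard1": ["a", "b", "c"], "shard2": ["c", "d"], "shard3": ["e"]}
print(ids_in_multiple_shards(sample))
```

For an index of this size the ids would need to be streamed or checked in batches rather than held in memory at once; the sketch only shows the comparison itself.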

SolrCloud tries really hard to never end up with replicas out of sync,
but either due to highly unusual circumstances or bugs, it could still
happen.

Thanks,
Shawn



Re: solr query gives different numFound upon refreshing

2014-08-27 Thread Bryan Bende
Theoretically this shouldn't happen, but is it possible that the two
replicas for a given shard are not fully in sync?

Say shard1 replica1 is missing a document that is in shard1 replica2... if
you run a query that would hit on that document and run it a bunch of
times, sometimes replica 1 will handle the request and sometimes replica 2
will handle it, and it would change your number of results if one of them
is missing a document. You could write a program that compares each
replica's documents by querying them with distrib=false.

If there was a replica out of sync, I would think it would detect that on a
restart when comparing itself against the leader for that shard, but I'm
not sure.
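
A possible starting point for such a program, as a sketch only. It builds a per-core count query with distrib=false and rows=0, and compares the numFound values once they have been fetched. The core URL is a placeholder, the function names are hypothetical, and the HTTP fetch and JSON parsing are omitted.

```python
from urllib.parse import urlencode


def replica_count_url(core_base_url, query="*:*"):
    """Build a non-distributed count query against a single core."""
    params = urlencode(
        {"q": query, "rows": 0, "distrib": "false", "wt": "json"}
    )
    return f"{core_base_url}/select?{params}"


def out_of_sync(num_found_by_replica):
    """True if the replicas of one shard report different numFound values."""
    return len(set(num_found_by_replica.values())) > 1


# Placeholder core URL for one replica of shard1.
url = replica_count_url("http://host1:8983/solr/collection1_shard1_replica1")
print(url)
print(out_of_sync({"replica1": 100, "replica2": 99}))
```

Comparing counts only shows that replicas differ. Finding the missing document would mean comparing ids as well, and counts should be taken after a commit so that in-flight updates do not show up as false differences.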


On Wed, Aug 27, 2014 at 11:37 AM, Joshi, Shital  wrote:

> Hi,
>
> We have a SolrCloud cluster (5 shards and 2 replicas) on 10 boxes. We have
> three collections. We recently upgraded from 4.4.0 to 4.8. We have ~850
> million documents.
>
> We are facing an issue where refreshing a Solr query may give different
> results (number of documents returned). This issue is seen in all three
> collections.
>
> We found that Solr admin would report Solr instance states as not
> “current”.  Is it indicative of the above issue?
>
> We checked logs and found various errors/warnings, but they don’t seem to
> be indicative of the above issue (or if they are – it’s not yet
> clear/obvious or maybe indirectly related). The error message is like this:
> 8/27/2014 2:01:24 AM ERROR SolrCmdDistributor
> org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error
> opening new searcher. exceeded limit of maxWarmingSearchers=2, try again
> later.
>
> This is our autocommit setting.
>
> 
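> <autoCommit>
>    <maxTime>15000</maxTime>
>    <maxDocs>10</maxDocs>
>    <openSearcher>false</openSearcher>
> </autoCommit>
> <autoSoftCommit>
>    <maxTime>30</maxTime>
> </autoSoftCommit>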
> 
> The searcher takes less than 1.5 minutes to warm up and the soft commit is
> set to run every 5 minutes, so there should be no way to end up with more
> than two warming searchers.
>
> The searcher registeredAt and openedAt times are sometimes 12-13 hours
> old and we end up bouncing the cloud.
>
> Any help to solve this issue is appreciated.


solr query gives different numFound upon refreshing

2014-08-27 Thread Joshi, Shital
Hi,

We have a SolrCloud cluster (5 shards and 2 replicas) on 10 boxes. We have 
three collections. We recently upgraded from 4.4.0 to 4.8. We have ~850 
million documents.

We are facing an issue where refreshing a Solr query may give different results 
(number of documents returned). This issue is seen in all three collections.

We found that Solr admin would report Solr instance states as not “current”.  
Is it indicative of the above issue?

We checked logs and found various errors/warnings, but they don’t seem to be 
indicative of the above issue (or if they are – it’s not yet clear/obvious or 
maybe indirectly related). The error message is like this:
8/27/2014 2:01:24 AM ERROR SolrCmdDistributor 
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error 
opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.

This is our autocommit setting.


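<autoCommit>
   <maxTime>15000</maxTime>
   <maxDocs>10</maxDocs>
   <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
   <maxTime>30</maxTime>
</autoSoftCommit>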

The searcher takes less than 1.5 minutes to warm up and the soft commit is set 
to run every 5 minutes, so there should be no way to end up with more than two 
warming searchers.
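
The reasoning above can be checked with a little arithmetic. This sketch assumes that new searchers are opened only on the soft-commit schedule; commits arriving from elsewhere (for example, explicit commits sent by indexing clients) would break that assumption and are one possible way to exceed the limit. The function name is hypothetical.

```python
import math


def max_overlapping_warmups(warmup_seconds, commit_interval_seconds):
    """Upper bound on searchers warming at once when a new searcher is opened
    every commit_interval_seconds and each takes warmup_seconds to warm."""
    return math.ceil(warmup_seconds / commit_interval_seconds)


# Figures from the message: 1.5 minute warmup, soft commit every 5 minutes.
print(max_overlapping_warmups(90, 300))
# A hypothetical case where commits arrive every 30 seconds instead.
print(max_overlapping_warmups(90, 30))
```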

The searcher registeredAt and openedAt times are sometimes 12-13 hours old 
and we end up bouncing the cloud.

Any help to solve this issue is appreciated.