[jira] [Commented] (SOLR-5407) Strange error condition with cloud replication not working quite right

Nathan Neulinger (JIRA) Thu, 31 Oct 2013 05:32:20 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810192#comment-13810192
 ]


Nathan Neulinger commented on SOLR-5407:
----------------------------------------

After some further investigation, it seems like this might be related to 
SOLR-5325, which was fixed in 4.5.1. We haven't upgraded yet, but have it scheduled.

I also raised the ZK tick to 5000 and increased the timeout to 40 seconds, just in 
case that helps. 

> Strange error condition with cloud replication not working quite right
> ----------------------------------------------------------------------
>
>                 Key: SOLR-5407
>                 URL: https://issues.apache.org/jira/browse/SOLR-5407
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.5
>            Reporter: Nathan Neulinger
>              Labels: cloud, replication
>
> I have a cloud deployment of 4.5 on EC2. The architecture is 3 dedicated ZK 
> nodes and a pair of Solr nodes. I'll apologize in advance that this error 
> report is not going to have a lot of detail; I'm really hoping that the 
> scenario/description will trigger some "likely" explanation.
> The situation I got into was that the server had decided to fail over, so my 
> app servers were all talking to what should have been the primary for most of 
> the shards/collections, but was actually the replica.
> Here's where it gets odd: no errors were being returned to the client code 
> for any of the searches or document updates, and the current primary server 
> was definitely receiving all of the updates, even though they were being 
> submitted to the inactive/replica node. (Clients were talking to solr-p1, 
> which was not primary at the time, and writes were being passed through to 
> solr-r1, which was primary at the time.)
> All sounds good so far, right? Except that the replica server at the time, 
> through which the writes were passing, never got any of those content 
> updates. It had an old, unmodified copy of the index. 
> I restarted solr-p1 (which was the replica at the time), with no change in 
> behavior. Behavior did not change until I killed and restarted the current 
> primary (solr-r1) to force it to fail over.
> At that point, everything was all happy again and working properly. 
> Until this morning, when one of the developers provisioned a new collection, 
> which happened to put its primary on solr-r1. Again, the clients were all 
> pointing at solr-p1. The developer reported that the documents were going into 
> the index, but were not visible on the replica server. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)