dsmiley opened a new pull request, #1484:
URL: https://github.com/apache/solr/pull/1484

   https://issues.apache.org/jira/browse/SOLR-16693
   
   Re the infamous error "ClusterState says we are the leader ... but locally we 
don't think so" from DistributedZkUpdateProcessor.
   
   As it happens, I have a test in a fork of Solr that causes this failure half 
the time on a rather simple split shard test (notwithstanding the inherent 
complexities of shard splits themselves). After debugging it, I came to a 
similar conclusion: this error should be caught and retried by the caller. It 
turns out this is as easy as changing the HTTP status code from 
SERVICE_UNAVAILABLE to INVALID_STATE.
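
To illustrate why the choice of code matters, here is a minimal, self-contained sketch of a caller that retries only on INVALID_STATE. It is not Solr's actual client code: `ErrorCode`, `CodedException`, and `RetrySketch.callWithRetry` are hypothetical stand-ins, and the numeric values 503 and 510 are my understanding of the corresponding HTTP codes.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for an error-code enum; not Solr's own class.
enum ErrorCode {
  SERVICE_UNAVAILABLE(503),
  INVALID_STATE(510);

  final int code;

  ErrorCode(int code) {
    this.code = code;
  }
}

// Hypothetical exception carrying one of the codes above.
class CodedException extends RuntimeException {
  final ErrorCode errorCode;

  CodedException(ErrorCode errorCode, String message) {
    super(message);
    this.errorCode = errorCode;
  }
}

class RetrySketch {
  // Runs the request up to maxAttempts times, retrying only on INVALID_STATE.
  // Any other error code is rethrown to the caller immediately.
  static <T> T callWithRetry(Supplier<T> request, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    CodedException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return request.get();
      } catch (CodedException e) {
        if (e.errorCode != ErrorCode.INVALID_STATE) {
          throw e; // not considered retryable in this sketch
        }
        last = e; // stale-state error: try again
      }
    }
    throw last; // every attempt failed with INVALID_STATE
  }
}
```

Under these assumptions, a request that fails with SERVICE_UNAVAILABLE surfaces on the first attempt, while the same failure reported as INVALID_STATE gets another try. A real caller would presumably also refresh its view of the cluster state between attempts.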
   
   I see another problem based on my test. A shard being split (a so-called 
parent shard), or one whose split recently completed (and thus may have state 
INACTIVE), receives docs from a client (the test) and forwards them to the 
sub-shards. But when a sub-shard fails with the error shown above, the failure 
does not bubble up to the client; it's swallowed as okay. Changing the status 
code may fix this for the invalid-state case but wouldn't for other general 
errors (e.g. a host suddenly going down). The result is data loss.
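
As a rough sketch of the behavior I would expect instead, the forwarding side could collect sub-shard failures and rethrow them rather than reporting success. This is illustrative only and not the code in DistributedZkUpdateProcessor: `ForwardSketch`, `forwardToSubShards`, and the `Consumer`-based senders are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ForwardSketch {
  // Forwards one doc to every sub-shard sender (hypothetical Consumer-based API).
  // Failures are collected and rethrown so the client sees them.
  static void forwardToSubShards(String doc, List<Consumer<String>> subShards) {
    List<RuntimeException> failures = new ArrayList<>();
    for (Consumer<String> subShard : subShards) {
      try {
        subShard.accept(doc);
      } catch (RuntimeException e) {
        failures.add(e); // remember the failure instead of treating it as success
      }
    }
    if (!failures.isEmpty()) {
      RuntimeException summary = new RuntimeException(
          failures.size() + " of " + subShards.size() + " sub-shard forwards failed",
          failures.get(0));
      // Attach the remaining failures so none of them is lost.
      for (RuntimeException other : failures.subList(1, failures.size())) {
        summary.addSuppressed(other);
      }
      throw summary;
    }
  }
}
```

With this shape, any forwarding error (invalid state, host down, or otherwise) reaches the client, which can then retry or report the failed update.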
   
   I don't have a test to contribute for this, at least not yet.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

