dsmiley opened a new pull request, #1484: URL: https://github.com/apache/solr/pull/1484
https://issues.apache.org/jira/browse/SOLR-16693

RE the infamous error "ClusterState says we are the leader ... but locally we don't think so." from DistributedZkUpdateProcessor: as it happens, I have a test in a fork of Solr that triggers this failure half the time. It is a shard split test that is rather simple (notwithstanding the inherent complexity of shard splits themselves). After debugging it, I came to a similar conclusion: this error should be caught and retried by the caller. It turns out this is as easy as changing the HTTP status code from SERVICE_UNAVAILABLE to INVALID_STATE.

I see another problem based on my test. A shard being split (a so-called parent shard), or one whose split recently completed (and which may therefore have state INACTIVE), receives docs from a client (the test) and forwards them to the sub-shards. But when a sub-shard fails with the error shown above, the failure does not bubble up to the client; it is swallowed as if it were okay. Changing the status code may fix this for the invalid-state case, but it would not fix it for other general errors (e.g. a host suddenly going down). The result is data loss. I don't have a test to contribute for this, at least not yet.

--
This is an automated message from the Apache Git Service.
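The two behaviors discussed in this PR can be sketched in a few lines. This is a hypothetical, minimal Java illustration, not Solr's client or DistributedZkUpdateProcessor code: `RetrySketch`, `sendWithStaleStateRetry`, `ClusterStateRefresher` and `forwardToSubShards` are invented names. It assumes a caller that treats INVALID_STATE (HTTP 510 in Solr's `SolrException.ErrorCode`) as "my view of the cluster is stale", so it refreshes and retries, while any other status, including SERVICE_UNAVAILABLE (503), goes straight back to the caller. The second method models what the description asks of a parent shard: sub-shard failures are collected and surfaced instead of being swallowed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;
import java.util.function.ToIntFunction;

// Hypothetical sketch only; none of these names are Solr APIs.
class RetrySketch {
  // Numeric values of the SolrException.ErrorCode constants named in the PR.
  static final int SERVICE_UNAVAILABLE = 503;
  static final int INVALID_STATE = 510;

  /** Hypothetical hook: re-read cluster state (e.g. shard leaders) before a retry. */
  interface ClusterStateRefresher {
    void refresh();
  }

  /**
   * Sends a request up to maxAttempts times. Only INVALID_STATE is treated as
   * stale local state, so only it triggers a refresh and a retry; any other
   * status (success or failure) is returned to the caller immediately.
   */
  static int sendWithStaleStateRetry(IntSupplier send, ClusterStateRefresher refresher, int maxAttempts) {
    int status = send.getAsInt();
    for (int attempt = 1; attempt < maxAttempts && status == INVALID_STATE; attempt++) {
      refresher.refresh();
      status = send.getAsInt();
    }
    return status;
  }

  /**
   * Forwards an update to every sub-shard and returns the non-2xx results,
   * so a parent shard could report them to the client rather than ignore them.
   */
  static List<String> forwardToSubShards(List<String> subShards, ToIntFunction<String> sendTo) {
    List<String> failures = new ArrayList<>();
    for (String shard : subShards) {
      int status = sendTo.applyAsInt(shard);
      if (status < 200 || status >= 300) {
        failures.add(shard + ": HTTP " + status);
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // First attempt hits a stale leader (510); the retry after a refresh succeeds.
    int status = sendWithStaleStateRetry(
        () -> ++calls[0] == 1 ? INVALID_STATE : 200,
        () -> System.out.println("refreshing cluster state"),
        3);
    System.out.println("final status " + status + " after " + calls[0] + " attempts");

    // One sub-shard fails; the failure is surfaced instead of swallowed.
    List<String> failures = forwardToSubShards(
        List.of("shard1_0", "shard1_1"),
        shard -> shard.equals("shard1_1") ? INVALID_STATE : 200);
    if (!failures.isEmpty()) {
      System.out.println("report to client: " + failures);
    }
  }
}
```

The retry loop alone only covers the stale-leader case. A sub-shard failing for an unrelated reason still comes back as a non-2xx status, which is why the sketch keeps propagation to the client as a separate step.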