[ 
https://issues.apache.org/jira/browse/SOLR-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14866031#comment-14866031
 ] 

Mark Miller commented on SOLR-8069:
-----------------------------------

bq. thus I prefer the simple logic of "do this action only if our zookeeper 
session state is exactly what it was when we decided to do it". Anyhow, this is 
probably beyond the scope of this JIRA.

I don't see an easy way to do that in this case. Almost all the solutions that 
fit with the code have the exact same holes / races. I think the local leader 
check around getting the leader context is the strongest thing I can think of 
so far other than adding further defensive checks.

I don't know that much more is needed though. If the context returned is from 
the leader, great, its zkparentversion will match. If the context is 
somehow not the right one, it won't match. We get a context and act only if it 
is the context for the leader in ZK, rather than acting whenever the context 
merely has a node in line. I'd say that is a pretty strong improvement.

This should only work if the node is a valid leader by its local state and by 
ZooKeeper.
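
To make the double check concrete, here is a minimal, self-contained sketch of 
the idea, not the code from the attached patch. Everything in it is 
hypothetical: FakeZk is an in-memory stand-in for ZooKeeper, and the names 
LeaderContext, putReplicaInLir and markDownIfVersionMatches are invented for 
illustration. The only thing taken from the discussion above is the shape of 
the guard: act only if the node is leader by its local state and the version 
remembered in its context still matches what ZooKeeper has.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a version-guarded LIR write; not Solr code. */
class LirGuardSketch {

    /** In-memory stand-in for ZooKeeper: one versioned election parent plus LIR state. */
    static final class FakeZk {
        private int electionParentVersion = 0;
        private final Map<String, String> lirState = new HashMap<>();

        synchronized int getElectionParentVersion() {
            return electionParentVersion;
        }

        /** Simulates a leadership change, which bumps the parent version. */
        synchronized void leaderChanged() {
            electionParentVersion++;
        }

        /** Checks the version and writes in one step, so nothing can slip in between. */
        synchronized boolean markDownIfVersionMatches(int expectedVersion, String replica) {
            if (expectedVersion != electionParentVersion) {
                return false; // context is stale: it is not the current leader's context
            }
            lirState.put(replica, "down");
            return true;
        }

        synchronized String lirStateOf(String replica) {
            return lirState.get(replica);
        }
    }

    /** What a node remembers from the moment it became leader. */
    static final class LeaderContext {
        final int zkParentVersion;

        LeaderContext(int zkParentVersion) {
            this.zkParentVersion = zkParentVersion;
        }
    }

    /** Puts a replica into LIR only if we are leader locally AND by the store's version. */
    static boolean putReplicaInLir(FakeZk zk, LeaderContext ctx,
                                   boolean locallyLeader, String replica) {
        if (!locallyLeader) {
            return false; // local state says we are not the leader
        }
        return zk.markDownIfVersionMatches(ctx.zkParentVersion, replica);
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        LeaderContext ctx = new LeaderContext(zk.getElectionParentVersion());

        System.out.println("fresh context:  " + putReplicaInLir(zk, ctx, true, "replica1"));

        zk.leaderChanged(); // someone else took over; ctx is now stale
        System.out.println("stale context:  " + putReplicaInLir(zk, ctx, true, "replica2"));
    }
}
```

The sketch folds the version check and the write into a single synchronized 
method so that no change can land between them. Against a real ZooKeeper 
ensemble, the analogous move would be a multi() that pairs a version check op 
with the write, but whether that fits the existing code paths is a separate 
question.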

> Leader Initiated Recovery can put the replica with the latest data into LIR 
> and a shard will have no leader even on restart.
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-8069
>                 URL: https://issues.apache.org/jira/browse/SOLR-8069
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Mark Miller
>         Attachments: SOLR-8069.patch, SOLR-8069.patch
>
>
> I've seen this twice now. Need to work on a test.
> When some issues hit all the replicas at once, you can end up in a situation 
> where the rightful leader was put or put itself into LIR. Even on restart, 
> this rightful leader won't take leadership and you have to manually clear the 
> LIR nodes.
> It seems that if all the replicas participate in election on startup, LIR 
> should just be cleared.



