[ 
https://issues.apache.org/jira/browse/NIFI-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446163#comment-16446163
 ] 

ASF GitHub Bot commented on NIFI-5096:
--------------------------------------

Github user markap14 commented on the issue:

    https://github.com/apache/nifi/pull/2646
  
    @mcgilman we do indeed implement the ConnectionStateListener, but we do so 
only to log the state change and then call super.stateChanged(). That call 
throws CancelLeadershipException, which in turn is supposed to interrupt our 
listener. We followed the "Error Handling" guidance provided by Apache Curator: 
https://curator.apache.org/curator-recipes/leader-election.html
    
    So we are handling the SUSPENDED and LOST scenarios as recommended, and 
this works 99% of the time. Unfortunately, we do occasionally see scenarios 
where the thread is not interrupted, and as a result the node believes that it 
still holds the lock. When this happens, it's not clear whether the thread just 
wasn't interrupted for some reason, whether the SUSPENDED/LOST notification was 
never received, or what else is preventing our ElectionListener from being 
interrupted.
    
    That's why I went with the solution of periodically polling ZooKeeper to 
check the state. That way, whatever the cause of the thread not being 
interrupted, we will still break out. If you think it makes sense, though, we 
can also detect the LOST state specifically and have that trigger us to leave 
the election, in addition to polling?
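
    As a rough illustration of the polling idea, here is a minimal sketch that 
uses only the JDK. The names PollingLeaderLoop, ExitReason, and stillLeader are 
hypothetical and are not taken from NiFi or Curator. The stillLeader check is 
a stand-in for whatever query against ZooKeeper the actual change performs. 
The loop still honors an interrupt, but it no longer depends on one arriving.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: none of these names come from NiFi or Curator.
final class PollingLeaderLoop {

    enum ExitReason { INTERRUPTED, LEADERSHIP_LOST, STOPPED }

    // Assumption: stands in for a check against ZooKeeper that reports
    // whether this node is still the elected leader.
    private final BooleanSupplier stillLeader;
    private final long pollMillis;
    private final CountDownLatch stopRequested = new CountDownLatch(1);

    PollingLeaderLoop(BooleanSupplier stillLeader, long pollMillis) {
        this.stillLeader = stillLeader;
        this.pollMillis = pollMillis;
    }

    // Asks the loop to give up leadership voluntarily.
    void stop() {
        stopRequested.countDown();
    }

    // Blocks while this node should act as leader and reports why it returned.
    ExitReason holdLeadership() {
        try {
            while (true) {
                // Wait up to one poll interval; await() returns true once stop() was called.
                if (stopRequested.await(pollMillis, TimeUnit.MILLISECONDS)) {
                    return ExitReason.STOPPED;
                }
                // Safety net: leave even if no interrupt was ever delivered.
                if (!stillLeader.getAsBoolean()) {
                    return ExitReason.LEADERSHIP_LOST;
                }
            }
        } catch (InterruptedException e) {
            // Normal path: the connection state listener interrupted this thread.
            Thread.currentThread().interrupt();
            return ExitReason.INTERRUPTED;
        }
    }

    public static void main(String[] args) {
        // Simulate a node that silently loses leadership on the third poll.
        AtomicInteger polls = new AtomicInteger();
        PollingLeaderLoop loop = new PollingLeaderLoop(() -> polls.incrementAndGet() < 3, 10L);
        System.out.println(loop.holdLeadership());
    }
}
```

    Waiting on the latch with a timeout, rather than sleeping, lets a single 
call serve as both the poll interval and the interruption point. A missed 
interrupt therefore costs at most one poll interval of overlap between the old 
and new primary node.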


> When Primary Node changes, occasionally both the new and old primary nodes 
> continue running processors
> ------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-5096
>                 URL: https://issues.apache.org/jira/browse/NIFI-5096
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>
> Occasionally we will see that Node A is Primary Node and then the Primary 
> Node switches to Node B, resulting in both Node A and Node B running 
> processors that are marked as Primary Node only.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
