[ https://issues.apache.org/jira/browse/NIFI-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446163#comment-16446163 ]
ASF GitHub Bot commented on NIFI-5096:
--------------------------------------

Github user markap14 commented on the issue:

    https://github.com/apache/nifi/pull/2646

@mcgilman we do indeed implement the ConnectionStateListener, but we do so only to log the fact and then call super.stateChanged(). When we call super.stateChanged(), that will throw CancelLeadershipException, which in turn is supposed to interrupt our listener. We followed the "Error Handling" guidance provided by Apache Curator: https://curator.apache.org/curator-recipes/leader-election.html

So we are handling the SUSPENDED and LOST scenarios as is recommended. And this works 99% of the time. Unfortunately, we do occasionally see scenarios where it does not interrupt the thread and as such the node believes that it retains the lock. It's not clear, when this happens, if the thread just wasn't interrupted for some reason, or if the notification of SUSPENDED/LOST never was received, or what exactly is occurring that prevents our ElectionListener from being interrupted.

That's why I went with the solution of periodically polling ZooKeeper, to check the state. That way, whatever the cause of the thread not being interrupted, we still will break out. If you think it makes sense, though, we can detect the LOST state specifically and have that trigger us to leave the election, in addition to polling?

> When Primary Node changes, occasionally both the new and old primary nodes
> continue running processors
> ------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-5096
>                 URL: https://issues.apache.org/jira/browse/NIFI-5096
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Major
>
> Occasionally we will see that Node A is Primary Node and then the Primary
> Node switches to Node B, resulting in both Node A and Node B running
> processors that are marked as Primary Node only.
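
The polling fallback described in the comment above can be illustrated with a minimal, library-free sketch. This is not NiFi's actual code and does not use Curator's API: the names `LeadershipPoller`, `holdLeadership`, `ExitReason`, and `stillLeader` are hypothetical, and `stillLeader` merely stands in for whatever query against ZooKeeper confirms that this node still holds the leader lock. The loop exits on interruption (the path the Curator recipe relies on) or, failing that, as soon as a periodic check reports that leadership is gone.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

final class LeadershipPoller {

    /** Why the leadership loop ended. */
    enum ExitReason { INTERRUPTED, POLL_DETECTED_LOSS }

    /**
     * Blocks for as long as this node believes it is the leader.
     *
     * @param stillLeader        hypothetical stand-in for a ZooKeeper query that
     *                           reports whether this node still holds the lock
     * @param pollIntervalMillis how long to wait between checks
     */
    static ExitReason holdLeadership(BooleanSupplier stillLeader, long pollIntervalMillis) {
        while (true) {
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                // Normal path: the listener thread was interrupted on SUSPENDED/LOST.
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return ExitReason.INTERRUPTED;
            }
            // Fallback path: even if no interrupt ever arrives, a failed check
            // still breaks us out of the loop.
            if (!stillLeader.getAsBoolean()) {
                return ExitReason.POLL_DETECTED_LOSS;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated check: reports "still leader" twice, then reports loss.
        AtomicInteger polls = new AtomicInteger();
        ExitReason reason = holdLeadership(() -> polls.incrementAndGet() < 3, 10L);
        System.out.println("exit reason: " + reason + " after " + polls.get() + " polls");
    }
}
```

In this sketch the worst-case delay before a node notices it has lost leadership is roughly one poll interval plus the time the check itself takes, so the interval trades detection latency against load on the coordination service. A flag set on the LOST state could be folded into the same `stillLeader` check if both triggers are wanted.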
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)