exceptionfactory commented on PR #6779: URL: https://github.com/apache/nifi/pull/6779#issuecomment-1356057372
Thanks for the feedback @markap14.

On further evaluation of the disconnect and reconnect behavior, I realized the `unregister` method was not removing the local leader identifier from the `roleLeaders` Map within `KubernetesLeaderElectionManager`. The corresponding command was also not being removed from the `roleCommands` Map, which prevented proper registration on cluster reconnection. I corrected this behavior and also corrected the Role ID resolution prior to calling `findLeader()`.

In addition to those changes, I removed the `withReleaseOnCancel()` setting from the Leader Elector Builder. This setting is a more recent addition to the Kubernetes Client library implementation. Its purpose is to update the Lease with a null holder identity, prompting nodes to attempt lease renewal. For the purpose of NiFi clustering, this behavior does not seem necessary, as NiFi nodes will proceed with attempting to update and obtain a lease lock. Removing the release on cancel setting avoids the error shown above while allowing standard lease lock update attempts to proceed.

These changes resulted in consistent behavior with various disconnect and reconnect attempts.
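
To illustrate the `unregister` fix described above, here is a minimal, self-contained sketch. It is not the actual `KubernetesLeaderElectionManager` code: the class name `LeaderRegistrySketch`, the method signatures, and the use of `Runnable` as the command type are all assumptions made for illustration. Only the `roleLeaders` and `roleCommands` Map names come from the comment. The sketch shows why leaving a stale entry in either Map would block re-registration after a reconnect.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the manager's bookkeeping; not the NiFi implementation.
public class LeaderRegistrySketch {
    // Role ID -> identifier of the current leader (name taken from the comment)
    private final Map<String, String> roleLeaders = new ConcurrentHashMap<>();
    // Role ID -> election command for that role (Runnable is an assumed type)
    private final Map<String, Runnable> roleCommands = new ConcurrentHashMap<>();

    // Registers a command for the role; returns false if one is already tracked.
    public boolean register(final String roleId, final Runnable command) {
        return roleCommands.putIfAbsent(roleId, command) == null;
    }

    // Records the leader reported for a role (assumed to be driven by election callbacks).
    public void setLeader(final String roleId, final String leaderId) {
        roleLeaders.put(roleId, leaderId);
    }

    // Removes BOTH entries so that a later register() call can succeed.
    public void unregister(final String roleId) {
        roleLeaders.remove(roleId);
        roleCommands.remove(roleId);
    }

    public Optional<String> findLeader(final String roleId) {
        return Optional.ofNullable(roleLeaders.get(roleId));
    }
}
```

In this sketch, if `unregister` skipped the `roleCommands.remove(roleId)` call, a second `register` for the same role would return `false`, which mirrors the failed registration on cluster reconnection described in the comment.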