exceptionfactory commented on PR #6779:
URL: https://github.com/apache/nifi/pull/6779#issuecomment-1356057372

   Thanks for the feedback @markap14.
   
   On further evaluation of the disconnect and reconnect behavior, I realized 
the `unregister` method was not removing the local leader identifier from the 
`roleLeaders` Map within `KubernetesLeaderElectionManager`. The corresponding 
command was also not being removed from the `roleCommands` Map, which prevented 
proper registration on cluster reconnection. I corrected this behavior and also 
corrected the Role ID resolution prior to calling `findLeader()`.
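
   As a rough illustration of the bookkeeping problem, here is a simplified sketch. It is not the actual `KubernetesLeaderElectionManager` code: the class name `LeaderRegistrySketch` and the methods `register`, `onLeaderElected`, and `isRegistered` are hypothetical, and only the two Map names and `findLeader` come from the description above. The point is that `unregister` has to clear both Maps, otherwise a later registration for the same role is treated as a duplicate.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the state described above; not the real NiFi class.
class LeaderRegistrySketch {
    // Role name -> identifier of the node currently believed to be leader
    private final Map<String, String> roleLeaders = new ConcurrentHashMap<>();
    // Role name -> command registered to participate in the election
    private final Map<String, Runnable> roleCommands = new ConcurrentHashMap<>();
    private final String localId;

    LeaderRegistrySketch(String localId) {
        this.localId = localId;
    }

    // Returns false when a command is already registered for the role.
    boolean register(String roleName, Runnable command) {
        return roleCommands.putIfAbsent(roleName, command) == null;
    }

    // Assumed callback that records the elected leader for a role.
    void onLeaderElected(String roleName, String leaderId) {
        roleLeaders.put(roleName, leaderId);
    }

    void unregister(String roleName) {
        // Remove the leader entry only when it holds the local identifier.
        roleLeaders.remove(roleName, localId);
        // Remove the command so that a later register() call can succeed.
        roleCommands.remove(roleName);
    }

    Optional<String> findLeader(String roleName) {
        return Optional.ofNullable(roleLeaders.get(roleName));
    }

    boolean isRegistered(String roleName) {
        return roleCommands.containsKey(roleName);
    }
}
```

   With both removals in place, a node that disconnects and reconnects can register for the same role again. If either removal is skipped, the stale entry remains and the second `register` call in this sketch would return `false`.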
   
   In addition to those changes, I removed the `withReleaseOnCancel()` setting 
from the Leader Elector Builder. This was a more recent addition to the 
Kubernetes Client library implementation. The purpose of the setting is to 
update the Lease with a null holder identity, prompting nodes to attempt lease 
renewal. For the purpose of NiFi clustering, this behavior does not seem 
necessary, as NiFi nodes will proceed with attempting to update and obtain a 
lease lock. Removing the release on cancel setting avoids the error shown above 
while allowing standard lease lock update attempts to proceed.
   
   These changes resulted in consistent behavior with various disconnect and 
reconnect attempts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.