markap14 commented on PR #6779: URL: https://github.com/apache/nifi/pull/6779#issuecomment-1372856078
Thanks for the latest updates @exceptionfactory. Ran into another issue when testing, unfortunately. I have a statefulset that had 3 replicas. `nifi-1` was both the primary node and the coordinator. I then scaled the statefulset to 0. This didn't expire the lease, though:

```
mpayne@cs-654103601966-default:~$ k get leases
NAME                  HOLDER             AGE
cluster-coordinator   nifi-1.nifi:4423   63m
primary-node          nifi-1.nifi:4423   62m
```

Even after I waited over an hour, the lease remains there. If I look at it:

```
mpayne@cs-654103601966-default:~$ k get lease cluster-coordinator -o yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  creationTimestamp: "2023-01-05T21:06:29Z"
  name: cluster-coordinator
  namespace: nifi
  resourceVersion: "252479"
  uid: 7e5d05d1-3b20-426d-8822-5cff92eb183f
spec:
  acquireTime: "2023-01-05T22:03:17.355642Z"
  holderIdentity: nifi-1.nifi:4423
  leaseDurationSeconds: 15
  leaseTransitions: 2
  renewTime: "2023-01-05T22:04:13.480562Z"
mpayne@cs-654103601966-default:~$ date
Thu 05 Jan 2023 10:11:34 PM UTC
```

We can see here that the date is well past the renewTime (10:11:34 PM = 22:11:34, vs. 22:04:13 as the renew time). So the lease appears to remain, and the new node, `nifi-0`, cannot proceed:

```
2023-01-05 22:09:37,513 INFO [main] o.a.n.c.p.AbstractNodeProtocolSender Cluster Coordinator is located at nifi-1.nifi:4423. Will send Cluster Connection Request to this address
2023-01-05 22:09:37,535 WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket to nifi-1.nifi:4423 due to: java.net.UnknownHostException: nifi-1.nifi
2023-01-05 22:09:42,550 INFO [main] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at nifi-1.nifi:4423; will use this address for sending heartbeat messages
2023-01-05 22:09:42,550 INFO [main] o.a.n.c.p.AbstractNodeProtocolSender Cluster Coordinator is located at nifi-1.nifi:4423. Will send Cluster Connection Request to this address
2023-01-05 22:09:42,550 WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket to nifi-1.nifi:4423 due to: java.net.UnknownHostException: nifi-1.nifi
```

As soon as I delete the lease (`k delete lease cluster-coordinator`), all works as expected. But we obviously can't have users manually deleting the lease all the time. Not sure whether this is the intended behavior and we should be ignoring the lease once the renewTime has expired, or whether it's because we don't actually participate in the leader election on startup since there appears to already be an elected leader. Either way, we need to make sure that we can properly handle this condition, where the lease points to a node that is no longer part of the cluster.
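To make the expiry arithmetic concrete, here is a minimal sketch of the kind of check in question. It assumes a lease should be treated as stale once `renewTime` plus `leaseDurationSeconds` is in the past; whether the leader election code should apply exactly this rule is the open question. The class and method names are hypothetical and not part of NiFi or any Kubernetes client library. The timestamps are the ones from the lease and `date` output above.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration only; not NiFi code and not a Kubernetes client API.
class LeaseExpirySketch {

    // Assumption: a lease is stale once renewTime + leaseDurationSeconds is before 'now'.
    static boolean isExpired(Instant renewTime, long leaseDurationSeconds, Instant now) {
        Instant expiration = renewTime.plus(Duration.ofSeconds(leaseDurationSeconds));
        return now.isAfter(expiration);
    }

    public static void main(String[] args) {
        // Values copied from the cluster-coordinator lease shown above.
        Instant renewTime = Instant.parse("2023-01-05T22:04:13.480562Z");
        long leaseDurationSeconds = 15;
        // Wall-clock time reported by `date` in the same session.
        Instant now = Instant.parse("2023-01-05T22:11:34Z");

        Instant expiration = renewTime.plusSeconds(leaseDurationSeconds);
        System.out.println("expired = " + isExpired(renewTime, leaseDurationSeconds, now));
        System.out.println("seconds past expiration = "
                + Duration.between(expiration, now).getSeconds());
    }
}
```

With a 15 second lease duration and a renewTime more than seven minutes old, a check like this reports the lease as expired, which is why I would expect `nifi-0` to be able to take over rather than keep trying to reach `nifi-1.nifi:4423`.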