markap14 commented on PR #6779:
URL: https://github.com/apache/nifi/pull/6779#issuecomment-1372856078

   Thanks for the latest updates, @exceptionfactory. Ran into another issue 
when testing, unfortunately.
   I have a statefulset that had 3 replicas. `nifi-1` was both the primary node 
and the coordinator.
   I then scaled the statefulset to 0.
   This didn't expire the lease, though:
   ```
   mpayne@cs-654103601966-default:~$ k get leases
   NAME                  HOLDER             AGE
   cluster-coordinator   nifi-1.nifi:4423   63m
   primary-node          nifi-1.nifi:4423   62m
   ```
   
   Even after I waited over an hour, the lease is still there. If I look at it:
   ```
   mpayne@cs-654103601966-default:~$ k get lease cluster-coordinator -o yaml
   apiVersion: coordination.k8s.io/v1
   kind: Lease
   metadata:
     creationTimestamp: "2023-01-05T21:06:29Z"
     name: cluster-coordinator
     namespace: nifi
     resourceVersion: "252479"
     uid: 7e5d05d1-3b20-426d-8822-5cff92eb183f
   spec:
     acquireTime: "2023-01-05T22:03:17.355642Z"
     holderIdentity: nifi-1.nifi:4423
     leaseDurationSeconds: 15
     leaseTransitions: 2
     renewTime: "2023-01-05T22:04:13.480562Z"
   
   mpayne@cs-654103601966-default:~$ date
   Thu 05 Jan 2023 10:11:34 PM UTC
   ```
   
   We can see here that the current date is well past the renewTime, even 
allowing for the 15-second leaseDurationSeconds (10:11:34 PM = 22:11:34, vs. 
22:04:13 as the renewTime).
   So the lease appears to remain, and the new node, `nifi-0`, cannot proceed:
   
   ```
   2023-01-05 22:09:37,513 INFO [main] o.a.n.c.p.AbstractNodeProtocolSender 
Cluster Coordinator is located at nifi-1.nifi:4423. Will send Cluster 
Connection Request to this address
   2023-01-05 22:09:37,535 WARN [main] o.a.nifi.controller.StandardFlowService 
Failed to connect to cluster due to: 
org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket to 
nifi-1.nifi:4423 due to: java.net.UnknownHostException: nifi-1.nifi
   2023-01-05 22:09:42,550 INFO [main] 
o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster 
Coordinator is located at nifi-1.nifi:4423; will use this address for sending 
heartbeat messages
   2023-01-05 22:09:42,550 INFO [main] o.a.n.c.p.AbstractNodeProtocolSender 
Cluster Coordinator is located at nifi-1.nifi:4423. Will send Cluster 
Connection Request to this address
   2023-01-05 22:09:42,550 WARN [main] o.a.nifi.controller.StandardFlowService 
Failed to connect to cluster due to: 
org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket to 
nifi-1.nifi:4423 due to: java.net.UnknownHostException: nifi-1.nifi
   ```
   
   As soon as I delete the lease (`k delete lease cluster-coordinator`) all 
works as expected.
   But we obviously can't have users manually deleting the lease all the time.
   Not sure whether this is the intended behavior and we should be ignoring 
the lease once the renewTime has expired, or whether it's because we don't 
actually participate in the leader election on startup, since there appears to 
already be an elected leader.
   Either way, we need to make sure that we can properly handle this condition, 
where the lease points to a node that is no longer part of the cluster.
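
To make the "ignore the lease if the renewTime has expired" option concrete, here is a minimal sketch of the check, using the values from the lease YAML and the `date` output above. The class and method names (`LeaseExpiryCheck`, `isExpired`) are hypothetical and are not taken from NiFi or from any Kubernetes client library; it only assumes that a lease should count as stale once `renewTime + leaseDurationSeconds` is in the past.

```java
import java.time.Instant;

// Hypothetical helper, not part of NiFi or a Kubernetes client library.
public class LeaseExpiryCheck {

    // Assumption: a lease is stale once renewTime + leaseDurationSeconds has passed.
    static boolean isExpired(Instant renewTime, long leaseDurationSeconds, Instant now) {
        Instant expiration = renewTime.plusSeconds(leaseDurationSeconds);
        return now.isAfter(expiration);
    }

    public static void main(String[] args) {
        // Values copied from the cluster-coordinator lease shown above
        Instant renewTime = Instant.parse("2023-01-05T22:04:13.480562Z");
        long leaseDurationSeconds = 15;

        // Output of `date` on the same host, expressed in UTC
        Instant now = Instant.parse("2023-01-05T22:11:34Z");

        // Prints "true": the holder stopped renewing several minutes earlier
        System.out.println(isExpired(renewTime, leaseDurationSeconds, now));
    }
}
```

With a check along these lines, a starting node could treat `nifi-1.nifi:4423` as an expired holder and take part in the election instead of trying to connect to it.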


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
