[jira] [Commented] (FLINK-34672) HA deadlock between JobMasterServiceLeadershipRunner and DefaultLeaderElectionService

2024-05-28 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849916#comment-17849916
 ] 

Hong Liang Teoh commented on FLINK-34672:
-----------------------------------------

OK, sounds good. We'll proceed with the 1.19.1 RC without this fix.

> HA deadlock between JobMasterServiceLeadershipRunner and 
> DefaultLeaderElectionService
> -------------------------------------------------------------------------------------
>
> Key: FLINK-34672
> URL: https://issues.apache.org/jira/browse/FLINK-34672
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.17.2, 1.19.0, 1.18.1, 1.20.0
>Reporter: Chesnay Schepler
>Assignee: Matthias Pohl
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.18.2, 1.20.0, 1.19.1
>
>
> We recently observed a deadlock in the JobManager (JM) within the HA system 
> (see below for the thread dumps).
> [~mapohl] and I looked a bit into it, and there appears to be a race condition 
> when leadership is revoked while a JobMaster is being started.
> It appears to be caused by 
> {{JobMasterServiceLeadershipRunner#createNewJobMasterServiceProcess}} 
> forwarding futures while holding a lock; depending on whether the forwarded 
> future is already complete, the next stage may or may not run while holding 
> that same lock.
> We haven't determined yet whether we should be holding that lock or not.
> {code}
> "DefaultLeaderElectionService-leadershipOperationExecutor-thread-1" #131 
> daemon prio=5 os_prio=0 cpu=157.44ms elapsed=78749.65s tid=0x7f531f43d000 
> nid=0x19d waiting for monitor entry  [0x7f53084fd000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.runIfStateRunning(JobMasterServiceLeadershipRunner.java:462)
> - waiting to lock <0xf1c0e088> (a java.lang.Object)
> at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.revokeLeadership(JobMasterServiceLeadershipRunner.java:397)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.notifyLeaderContenderOfLeadershipLoss(DefaultLeaderElectionService.java:484)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1252/0x000840ddec40.accept(Unknown
>  Source)
> at java.util.HashMap.forEach(java.base@11.0.22/HashMap.java:1337)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.onRevokeLeadershipInternal(DefaultLeaderElectionService.java:452)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1251/0x000840dcf840.run(Unknown
>  Source)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.lambda$runInLeaderEventThread$3(DefaultLeaderElectionService.java:549)
> - locked <0xf0e3f4d8> (a java.lang.Object)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService$$Lambda$1075/0x000840c23040.run(Unknown
>  Source)
> at 
> java.util.concurrent.CompletableFuture$AsyncRun.run(java.base@11.0.22/CompletableFuture.java:1736)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.22/ThreadPoolExecutor.java:1128)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.22/ThreadPoolExecutor.java:628)
> at java.lang.Thread.run(java.base@11.0.22/Thread.java:829)
> {code}
> {code}
> "jobmanager-io-thread-1" #636 daemon prio=5 os_prio=0 cpu=125.56ms 
> elapsed=78699.01s tid=0x7f5321c6e800 nid=0x396 waiting for monitor entry  
> [0x7f530567d000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService.hasLeadership(DefaultLeaderElectionService.java:366)
> - waiting to lock <0xf0e3f4d8> (a java.lang.Object)
> at 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElection.hasLeadership(DefaultLeaderElection.java:52)
> at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.isValidLeader(JobMasterServiceLeadershipRunner.java:509)
> at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner.lambda$forwardIfValidLeader$15(JobMasterServiceLeadershipRunner.java:520)
> - locked <0xf1c0e088> (a java.lang.Object)
> at 
> org.apache.flink.runtime.jobmaster.JobMasterServiceLeadershipRunner$$Lambda$1320/0x000840e1a840.accept(Unknown
>  Source)
> at 
> java.util.concurrent.CompletableFuture.uniWhenComplete(java.base@11.0.22/CompletableFuture.java:859)
> at 
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(java.base@11.0.22/CompletableFuture.java:837)
> at 
> java.uti

[jira] [Commented] (FLINK-34672) HA deadlock between JobMasterServiceLeadershipRunner and DefaultLeaderElectionService

2024-05-22 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848648#comment-17848648
 ] 

Matthias Pohl commented on FLINK-34672:
---------------------------------------

I'm still trying to find a reviewer; it's on my plate. It's not a blocker, 
though, because the issue already existed in older versions of Flink:
{quote}
I also verified that this is not something that was introduced in Flink 1.18 
with the FLIP-285 changes. AFAIS, it can also happen in 1.17- (I didn't check 
the pre-FLINK-24038 code but only looked into release-1.17).
{quote}


[jira] [Commented] (FLINK-34672) HA deadlock between JobMasterServiceLeadershipRunner and DefaultLeaderElectionService

2024-05-22 Thread Hong Liang Teoh (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848646#comment-17848646
 ] 

Hong Liang Teoh commented on FLINK-34672:
-----------------------------------------

[~mapohl] do we know whether this should be a blocker for the Flink 1.19.1 patch 
release?


[jira] [Commented] (FLINK-34672) HA deadlock between JobMasterServiceLeadershipRunner and DefaultLeaderElectionService

2024-03-19 Thread Matthias Pohl (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-34672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828221#comment-17828221
 ] 

Matthias Pohl commented on FLINK-34672:
---------------------------------------

IMHO, both locks are still necessary because we're accessing the state in the 
{{JobMasterServiceLeadershipRunner}} and the leadership state in the 
{{DefaultLeaderElectionService}}. I also verified that this is not something 
that was introduced in Flink 1.18 with the 
[FLIP-285|https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+Refactoring+LeaderElection+to+make+Flink+support+multi-component+leader+election+out-of-the-box]
 changes. AFAIS, it can also happen in 1.17- (I didn't check the 
pre-FLINK-24038 code but only looked into {{release-1.17}}).
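
To make the lock interaction concrete, here is a minimal, self-contained Java 
sketch (NOT Flink code; class, lock, and thread names are made up for 
illustration) of the AB-BA ordering the two thread dumps show: the election 
thread holds the service lock and waits for the runner lock, while the io thread 
({{jobmanager-io-thread-1}} in the dump) runs the forwarding callback of an 
already-completed future inline, so it holds the runner lock and waits for the 
service lock inside {{hasLeadership()}}. Whether it actually deadlocks depends on 
the interleaving, which matches the racy nature of the report.

{code}
import java.util.concurrent.CompletableFuture;

// Stand-alone illustration (NOT Flink code): the same AB-BA lock ordering as in
// the thread dumps above, with hypothetical names.
public class LeadershipDeadlockSketch {

    private final Object leadershipLock = new Object(); // stand-in for the DefaultLeaderElectionService lock
    private final Object runnerLock = new Object();     // stand-in for the JobMasterServiceLeadershipRunner lock

    // Contender-side check, guarded by the election-service lock
    // (compare DefaultLeaderElectionService#hasLeadership in the dump).
    boolean hasLeadership() {
        synchronized (leadershipLock) {
            return true;
        }
    }

    // Runner-side forwarding: if the future is already complete, the callback runs
    // inline on the calling (io) thread and takes the runner lock there.
    void forwardIfValidLeader(CompletableFuture<String> jobMasterServiceFuture) {
        jobMasterServiceFuture.whenComplete((service, error) -> {
            synchronized (runnerLock) {                  // lock A
                if (error == null && hasLeadership()) {  // needs lock B
                    System.out.println("forwarding " + service);
                }
            }
        });
    }

    // Election-service side: the service lock is held while the contender is told to
    // revoke, which in turn needs the runner lock (compare revokeLeadership in the dump).
    void revokeLeadership() {
        synchronized (leadershipLock) {                  // lock B
            synchronized (runnerLock) {                  // needs lock A
                System.out.println("leadership revoked");
            }
        }
    }

    public static void main(String[] args) {
        LeadershipDeadlockSketch sketch = new LeadershipDeadlockSketch();
        new Thread(
                () -> sketch.forwardIfValidLeader(
                        CompletableFuture.completedFuture("jobMasterService")),
                "jobmanager-io-thread")
                .start();
        new Thread(sketch::revokeLeadership, "leadershipOperationExecutor-thread").start();
        // With an unlucky interleaving both threads end up BLOCKED on each other's monitor.
    }
}
{code}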

One solution would be to move the async callback of 
[JobMasterServiceLeadershipRunner#forwardIfValidLeader|https://github.com/apache/flink/blob/c9fcb0c74b1354f4f0f1b7c7f62191b8cc6b5725/flink-runtime/src/main/java/org/apache/flink/runtime/jobmaster/JobMasterServiceLeadershipRunner.java#L547]
 into the leader-operation executor of the {{DefaultLeaderElectionService}} to 
force sequential execution of leadership-related operations.
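
A rough sketch of that idea only (assumed names and simplified state, not the 
actual change): route the validity check through the single-threaded 
leader-operation executor via {{CompletableFuture#whenCompleteAsync}}, so it can 
no longer run on an io thread while that thread also needs the election-service 
lock.

{code}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

// Sketch of the proposal only (NOT the actual Flink change; names are assumed):
// the leadership-validity check hops onto the single-threaded leader-operation
// executor instead of running on whichever io thread completes the source future.
class SerializedForwardingSketch {

    private final Executor leadershipOperationExecutor = Executors.newSingleThreadExecutor();
    private volatile boolean hasLeadership = true; // placeholder for DefaultLeaderElection#hasLeadership

    <T> void forwardIfValidLeader(CompletableFuture<T> source, CompletableFuture<T> target) {
        source.whenCompleteAsync(
                (value, error) -> {
                    // Runs on the same thread that processes leadership grant/revoke
                    // events, so it cannot block on a lock held by a concurrent revocation.
                    if (error != null) {
                        target.completeExceptionally(error);
                    } else if (hasLeadership) {
                        target.complete(value);
                    } else {
                        target.cancel(false);
                    }
                },
                leadershipOperationExecutor);
    }
}
{code}

The trade-off is that completing the forwarded future is then serialized with all 
other leadership events, which is exactly the sequential execution described 
above.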

This is only an issue in the {{JobMasterServiceLeadershipRunner}} because we're 
executing the creation asynchronously in an I/O thread. The other place where we 
check within the contender whether leadership is acquired is the 
{{DefaultDispatcherRunner}}, but we're not doing any async calls there during 
leadership handling (the {{DefaultDispatcherRunner}} is created directly in the 
leader-operation executor while handling the leadership-acquired event).
