[jira] [Updated] (FLINK-12382) HA + ResourceManager exception: Fencing token not set

2020-06-30 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-12382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gaël Renoux updated FLINK-12382:

Attachment: jobmanager_log

> HA + ResourceManager exception: Fencing token not set
> -
>
> Key: FLINK-12382
> URL: https://issues.apache.org/jira/browse/FLINK-12382
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.8.0
> Environment: Same all all previous bugs filed by myself, today, but 
> this time with HA with zetcd.
>Reporter: Henrik
>Priority: Major
> Attachments: jobmanager_log, jobmanager_stdout
>
>
> I'm testing zetcd + session jobs in k8s, and testing what happens when I kill 
> both the job-cluster and task-manager at the same time, but maintain ZK/zetcd 
> up and running.
> Then I get this stacktrace, that's completely non-actionable for me, and also 
> resolves itself. I expect a number of retries, and if this exception is part 
> of the protocol signalling to retry, then it should not be printed as a log 
> entry.
> This might be related to an older bug: 
> [https://jira.apache.org/jira/browse/FLINK-7734]
> {code:java}
> [tm] 2019-05-01 11:32:01,641 ERROR 
> org.apache.flink.runtime.taskexecutor.TaskExecutor    - Registration 
> at ResourceManager failed due to an error
> [tm] java.util.concurrent.CompletionException: 
> org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token 
> not set: Ignoring message 
> RemoteFencedMessage(b8fb92460bdd5c250de1415e6cf04d4e, 
> RemoteRpcInvocation(registerTaskExecutor(String, ResourceID, int, 
> HardwareDescription, Time))) sent to 
> akka.tcp://flink@analytics-job:6123/user/resourcemanager because the fencing 
> token is null.
> [tm]     at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> [tm]     at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> [tm]     at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
> [tm]     at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> [tm]     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> [tm]     at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> [tm]     at 
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:815)
> [tm]     at akka.dispatch.OnComplete.internal(Future.scala:258)
> [tm]     at akka.dispatch.OnComplete.internal(Future.scala:256)
> [tm]     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
> [tm]     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
> [tm]     at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> [tm]     at 
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
> [tm]     at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> [tm]     at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> [tm]     at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:534)
> [tm]     at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:97)
> [tm]     at 
> akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:982)
> [tm]     at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
> [tm]     at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:446)
> [tm]     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
> [tm]     at akka.actor.ActorCell.invoke(ActorCell.scala:495)
> [tm]     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
> [tm]     at akka.dispatch.Mailbox.run(Mailbox.scala:224)
> [tm]     at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
> [tm]     at 
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> [tm]     at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> [tm]     at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> [tm]     at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> [tm] Caused by: 
> org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token 
> not set: Ignoring message 
> RemoteFencedMessage(b8fb92460bdd5c250de1415e6cf04d4e, 
> RemoteRpcInvocation(registerTaskExecutor(String, ResourceID, int, 
> HardwareDescription, Time))) sent to 
> akka.tcp://flink@analytics-job:6123/user/resourcemanager because the fencing 
> token is null.
> [tm]     at 
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:63)
> [tm]     at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147)
> [tm]     at 
> 

[jira] [Updated] (FLINK-12382) HA + ResourceManager exception: Fencing token not set

2020-06-30 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FLINK-12382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gaël Renoux updated FLINK-12382:

Attachment: jobmanager_stdout

> HA + ResourceManager exception: Fencing token not set
> -
>
> Key: FLINK-12382
> URL: https://issues.apache.org/jira/browse/FLINK-12382
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Coordination
>Affects Versions: 1.8.0
> Environment: Same all all previous bugs filed by myself, today, but 
> this time with HA with zetcd.
>Reporter: Henrik
>Priority: Major
> Attachments: jobmanager_log, jobmanager_stdout
>
>
> I'm testing zetcd + session jobs in k8s, and testing what happens when I kill 
> both the job-cluster and task-manager at the same time, but maintain ZK/zetcd 
> up and running.
> Then I get this stacktrace, that's completely non-actionable for me, and also 
> resolves itself. I expect a number of retries, and if this exception is part 
> of the protocol signalling to retry, then it should not be printed as a log 
> entry.
> This might be related to an older bug: 
> [https://jira.apache.org/jira/browse/FLINK-7734]
> {code:java}
> [tm] 2019-05-01 11:32:01,641 ERROR 
> org.apache.flink.runtime.taskexecutor.TaskExecutor    - Registration 
> at ResourceManager failed due to an error
> [tm] java.util.concurrent.CompletionException: 
> org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token 
> not set: Ignoring message 
> RemoteFencedMessage(b8fb92460bdd5c250de1415e6cf04d4e, 
> RemoteRpcInvocation(registerTaskExecutor(String, ResourceID, int, 
> HardwareDescription, Time))) sent to 
> akka.tcp://flink@analytics-job:6123/user/resourcemanager because the fencing 
> token is null.
> [tm]     at 
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
> [tm]     at 
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
> [tm]     at 
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
> [tm]     at 
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> [tm]     at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> [tm]     at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> [tm]     at 
> org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:815)
> [tm]     at akka.dispatch.OnComplete.internal(Future.scala:258)
> [tm]     at akka.dispatch.OnComplete.internal(Future.scala:256)
> [tm]     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
> [tm]     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
> [tm]     at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
> [tm]     at 
> org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
> [tm]     at 
> scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
> [tm]     at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
> [tm]     at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:534)
> [tm]     at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:97)
> [tm]     at 
> akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:982)
> [tm]     at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
> [tm]     at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:446)
> [tm]     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
> [tm]     at akka.actor.ActorCell.invoke(ActorCell.scala:495)
> [tm]     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
> [tm]     at akka.dispatch.Mailbox.run(Mailbox.scala:224)
> [tm]     at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
> [tm]     at 
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> [tm]     at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> [tm]     at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> [tm]     at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> [tm] Caused by: 
> org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token 
> not set: Ignoring message 
> RemoteFencedMessage(b8fb92460bdd5c250de1415e6cf04d4e, 
> RemoteRpcInvocation(registerTaskExecutor(String, ResourceID, int, 
> HardwareDescription, Time))) sent to 
> akka.tcp://flink@analytics-job:6123/user/resourcemanager because the fencing 
> token is null.
> [tm]     at 
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:63)
> [tm]     at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147)
> [tm]     at 
> 

[jira] [Updated] (FLINK-12382) HA + ResourceManager exception: Fencing token not set

2019-05-01 Thread Henrik (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henrik updated FLINK-12382:
---
Description: 
I'm testing zetcd + session jobs in k8s, and testing what happens when I kill 
both the job-cluster and task-manager at the same time, but maintain ZK/zetcd 
up and running.

Then I get this stacktrace, that's completely non-actionable for me, and also 
resolves itself. I expect a number of retries, and if this exception is part of 
the protocol signalling to retry, then it should not be printed as a log entry.

This might be related to an older bug: 
[https://jira.apache.org/jira/browse/FLINK-7734]
{code:java}
[tm] 2019-05-01 11:32:01,641 ERROR 
org.apache.flink.runtime.taskexecutor.TaskExecutor    - Registration at 
ResourceManager failed due to an error
[tm] java.util.concurrent.CompletionException: 
org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token 
not set: Ignoring message RemoteFencedMessage(b8fb92460bdd5c250de1415e6cf04d4e, 
RemoteRpcInvocation(registerTaskExecutor(String, ResourceID, int, 
HardwareDescription, Time))) sent to 
akka.tcp://flink@analytics-job:6123/user/resourcemanager because the fencing 
token is null.
[tm]     at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
[tm]     at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
[tm]     at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
[tm]     at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
[tm]     at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
[tm]     at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
[tm]     at 
org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:815)
[tm]     at akka.dispatch.OnComplete.internal(Future.scala:258)
[tm]     at akka.dispatch.OnComplete.internal(Future.scala:256)
[tm]     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
[tm]     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
[tm]     at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
[tm]     at 
org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
[tm]     at 
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
[tm]     at 
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
[tm]     at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:534)
[tm]     at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:97)
[tm]     at 
akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:982)
[tm]     at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
[tm]     at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:446)
[tm]     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
[tm]     at akka.actor.ActorCell.invoke(ActorCell.scala:495)
[tm]     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
[tm]     at akka.dispatch.Mailbox.run(Mailbox.scala:224)
[tm]     at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
[tm]     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
[tm]     at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
[tm]     at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
[tm]     at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[tm] Caused by: org.apache.flink.runtime.rpc.exceptions.FencingTokenException: 
Fencing token not set: Ignoring message 
RemoteFencedMessage(b8fb92460bdd5c250de1415e6cf04d4e, 
RemoteRpcInvocation(registerTaskExecutor(String, ResourceID, int, 
HardwareDescription, Time))) sent to 
akka.tcp://flink@analytics-job:6123/user/resourcemanager because the fencing 
token is null.
[tm]     at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:63)
[tm]     at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147)
[tm]     at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
[tm]     at 
akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
[tm]     at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
[tm]     at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
[tm]     ... 9 more
[tm] 2019-05-01 11:32:01,650 INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor    - Pausing and 
re-attempting registration in 1 ms
[tm] 2019-05-01 11:32:03,070 INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor    - The heartbeat 
of JobManager with id 3642aa576f132fecd6811ae0d314c2b5 timed out.
[tm] 2019-05-01 11:32:03,070 INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor    - Close 

[jira] [Updated] (FLINK-12382) HA + ResourceManager exception: Fencing token not set

2019-05-01 Thread Henrik (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-12382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henrik updated FLINK-12382:
---
Description: 
I'm testing zetcd + session jobs in k8s, and testing what happens when I kill 
both the job-cluster and task-manager at the same time, but maintain ZK/zetcd 
up and running.

Then I get this stacktrace, that's completely non-actionable for me, and also 
resolves itself. I expect a number of retries, and if this exception is part of 
the protocol signalling to retry, then it should not be printed as a log entry.

This might be related to an older bug: 
[https://jira.apache.org/jira/browse/FLINK-7734]
{code:java}
[tm] 2019-05-01 11:32:01,641 ERROR 
org.apache.flink.runtime.taskexecutor.TaskExecutor    - Registration at 
ResourceManager failed due to an error
[tm] java.util.concurrent.CompletionException: 
org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token 
not set: Ignoring message RemoteFencedMessage(b8fb92460bdd5c250de1415e6cf04d4e, 
RemoteRpcInvocation(registerTaskExecutor(String, ResourceID, int, 
HardwareDescription, Time))) sent to 
akka.tcp://flink@analytics-job:6123/user/resourcemanager because the fencing 
token is null.
[tm]     at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
[tm]     at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
[tm]     at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
[tm]     at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
[tm]     at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
[tm]     at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
[tm]     at 
org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:815)
[tm]     at akka.dispatch.OnComplete.internal(Future.scala:258)
[tm]     at akka.dispatch.OnComplete.internal(Future.scala:256)
[tm]     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
[tm]     at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
[tm]     at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
[tm]     at 
org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:74)
[tm]     at 
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
[tm]     at 
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
[tm]     at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:534)
[tm]     at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:97)
[tm]     at 
akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:982)
[tm]     at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
[tm]     at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:446)
[tm]     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
[tm]     at akka.actor.ActorCell.invoke(ActorCell.scala:495)
[tm]     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
[tm]     at akka.dispatch.Mailbox.run(Mailbox.scala:224)
[tm]     at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
[tm]     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
[tm]     at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
[tm]     at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
[tm]     at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[tm] Caused by: org.apache.flink.runtime.rpc.exceptions.FencingTokenException: 
Fencing token not set: Ignoring message 
RemoteFencedMessage(b8fb92460bdd5c250de1415e6cf04d4e, 
RemoteRpcInvocation(registerTaskExecutor(String, ResourceID, int, 
HardwareDescription, Time))) sent to 
akka.tcp://flink@analytics-job:6123/user/resourcemanager because the fencing 
token is null.
[tm]     at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:63)
[tm]     at 
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:147)
[tm]     at 
org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
[tm]     at 
akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
[tm]     at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
[tm]     at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
[tm]     ... 9 more
[tm] 2019-05-01 11:32:01,650 INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor    - Pausing and 
re-attempting registration in 1 ms
[tm] 2019-05-01 11:32:03,070 INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor    - The heartbeat 
of JobManager with id 3642aa576f132fecd6811ae0d314c2b5 timed out.
[tm] 2019-05-01 11:32:03,070 INFO  
org.apache.flink.runtime.taskexecutor.TaskExecutor    - Close