[jira] [Commented] (FLINK-19022) AkkaRpcActor failed to start but no exception information
[ https://issues.apache.org/jira/browse/FLINK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189286#comment-17189286 ]

Till Rohrmann commented on FLINK-19022:
---------------------------------------

I think we should do both things in order to be on the safe side.

> AkkaRpcActor failed to start but no exception information
> ---------------------------------------------------------
>
>                 Key: FLINK-19022
>                 URL: https://issues.apache.org/jira/browse/FLINK-19022
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.10.0, 1.12.0, 1.11.1
>            Reporter: tartarus
>            Assignee: tartarus
>            Priority: Critical
>             Fix For: 1.12.0, 1.11.2, 1.10.3
>
>
> In my job, the JM could not start normally, and the JM container was
> finally killed by the RM.
> In the end, I found through debugging that AkkaRpcActor failed to start because
> the version of YARN in my job was incompatible with the version in the
> cluster.
> [AkkaRpcActor exception handling|https://github.com/apache/flink/blob/478c9657fe1240acdc1eb08ad32ea93e08b0cd5e/flink-runtime/src/main/java/org/apache/flink/runtime/rpc/akka/AkkaRpcActor.java#L550]
> I added log printing here, and then found the specific problem.
> {code:java}
> 2020-08-21 21:31:16,985 ERROR org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState [flink-akka.actor.default-dispatcher-4] - Could not start RpcEndpoint resourcemanager.
> java.lang.NoSuchMethodError: org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB.registerApplicationMaster(Lcom/google/protobuf/RpcController;Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterRequestProto;)Lorg/apache/hadoop/yarn/proto/YarnServiceProtos$RegisterApplicationMasterResponseProto;
>     at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy25.registerApplicationMaster(Unknown Source)
>     at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:222)
>     at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.registerApplicationMaster(AMRMClientImpl.java:214)
>     at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.registerApplicationMaster(AMRMClientAsyncImpl.java:138)
>     at org.apache.flink.yarn.YarnResourceManager.createAndStartResourceManagerClient(YarnResourceManager.java:229)
>     at org.apache.flink.yarn.YarnResourceManager.initialize(YarnResourceManager.java:262)
>     at org.apache.flink.runtime.resourcemanager.ResourceManager.startResourceManagerServices(ResourceManager.java:204)
>     at org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:192)
>     at org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:185)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.start(AkkaRpcActor.java:544)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:169)
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
>     at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
>     at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>     at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
>     at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
>     at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:561)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:225)
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
>     at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at
[jira] [Commented] (FLINK-19022) AkkaRpcActor failed to start but no exception information
[ https://issues.apache.org/jira/browse/FLINK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189152#comment-17189152 ]

tartarus commented on FLINK-19022:
----------------------------------

[~trohrmann] Sorry, there is one more question to confirm.

If we catch {{Throwable}} in {{ResourceManager}} and {{Dispatcher}}, they will call {{fatalErrorHandler.onFatalError(throwable);}}, which in turn calls {{ClusterEntrypoint.onFatalError()}}. In that case, is it still necessary to register the {{terminationFuture}} with {{DispatcherResourceManagerComponent}}?
[jira] [Commented] (FLINK-19022) AkkaRpcActor failed to start but no exception information
[ https://issues.apache.org/jira/browse/FLINK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17187499#comment-17187499 ]

Till Rohrmann commented on FLINK-19022:
---------------------------------------

[~tartarus] This is what the underlying {{AkkaRpcActor}} should already do for us, so there should be no need to add another {{terminationFuture}}.
[jira] [Commented] (FLINK-19022) AkkaRpcActor failed to start but no exception information
[ https://issues.apache.org/jira/browse/FLINK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17187201#comment-17187201 ]

tartarus commented on FLINK-19022:
----------------------------------

[~trohrmann] Hello, I have one question that needs to be confirmed while I work on this:

1) {{ResourceManager.getTerminationFuture}} is already implemented in {{RpcEndpoint}}:
{code:java}
/**
 * Return a future which is completed with true when the rpc endpoint has been terminated.
 * In case of a failure, this future is completed with the occurring exception.
 *
 * @return Future which is completed when the rpc endpoint has been terminated.
 */
public CompletableFuture<Void> getTerminationFuture() {
    return rpcServer.getTerminationFuture();
}
{code}
How about we add a {{terminationFuture}} to {{ResourceManager}} and register the current {{ResourceManager.getTerminationFuture}} with it? Then, if {{ResourceManager}} fails to start, we call {{completeExceptionally}} to complete the {{terminationFuture}}. Finally, we register this {{terminationFuture}} with the {{shutDownFuture}} of {{DispatcherResourceManagerComponent}}.
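The chaining proposed in this comment (forward the RPC termination future into a separate {{terminationFuture}}, fail that future when startup fails, then forward it into the component's {{shutDownFuture}}) could be sketched roughly as follows. This is only an illustration with plain {{java.util.concurrent}} types: the class {{ForwardingSketch}} and the helper {{forward}} are hypothetical names and are not taken from the Flink code base.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical sketch; not Flink code. */
final class ForwardingSketch {

    /** Completes {@code target} with whatever outcome {@code source} ends with. */
    static <T> void forward(CompletableFuture<T> source, CompletableFuture<T> target) {
        source.whenComplete((value, failure) -> {
            if (failure != null) {
                target.completeExceptionally(failure);
            } else {
                target.complete(value);
            }
        });
    }

    public static void main(String[] args) {
        // Stand-ins for the three futures named in the comment.
        CompletableFuture<Void> rpcTermination = new CompletableFuture<>();
        CompletableFuture<Void> terminationFuture = new CompletableFuture<>();
        CompletableFuture<Void> shutDownFuture = new CompletableFuture<>();

        forward(rpcTermination, terminationFuture);
        forward(terminationFuture, shutDownFuture);

        // A failed start completes the intermediate future exceptionally,
        // and the failure travels on to the shut-down future.
        terminationFuture.completeExceptionally(new IllegalStateException("start failed"));
        System.out.println("shutDownFuture failed: " + shutDownFuture.isCompletedExceptionally());
    }
}
```

Because a {{CompletableFuture}} can only be completed once, a later completion of the RPC termination future does not overwrite the startup failure.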
[jira] [Commented] (FLINK-19022) AkkaRpcActor failed to start but no exception information
[ https://issues.apache.org/jira/browse/FLINK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17185142#comment-17185142 ]

tartarus commented on FLINK-19022:
----------------------------------

OK, I will try to complete this issue as soon as possible.
[jira] [Commented] (FLINK-19022) AkkaRpcActor failed to start but no exception information
[ https://issues.apache.org/jira/browse/FLINK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184982#comment-17184982 ]

Till Rohrmann commented on FLINK-19022:
---------------------------------------

No, we don't remove the {{FatalErrorHandler}} from the {{Dispatcher}} and the {{ResourceManager}}. In the {{DispatcherResourceManagerComponent}} we should react to the termination future of the {{Dispatcher}} and {{ResourceManager}} in such a way that we call the fatal error handler if either of them completes while the {{DispatcherResourceManagerComponent}} is still running (not closing).
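The behaviour described in this comment (call the fatal error handler when either termination future completes while the component is still running) might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the actual Flink change: {{ComponentSketch}} is a hypothetical stand-in for {{DispatcherResourceManagerComponent}}, and a plain {{Consumer<Throwable>}} stands in for {{FatalErrorHandler}}.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Hypothetical stand-in for a component that owns two endpoints; not Flink code. */
final class ComponentSketch {

    private final AtomicBoolean running = new AtomicBoolean(true);

    ComponentSketch(
            CompletableFuture<Void> dispatcherTermination,
            CompletableFuture<Void> resourceManagerTermination,
            Consumer<Throwable> fatalErrorHandler) {
        // anyOf completes as soon as either endpoint terminates, normally or exceptionally.
        CompletableFuture.anyOf(dispatcherTermination, resourceManagerTermination)
                .whenComplete((ignored, failure) -> {
                    if (!running.get()) {
                        return; // termination during an orderly close is expected
                    }
                    Throwable cause;
                    if (failure == null) {
                        cause = new IllegalStateException("Endpoint terminated unexpectedly.");
                    } else if (failure instanceof CompletionException && failure.getCause() != null) {
                        cause = failure.getCause(); // anyOf wraps the original exception
                    } else {
                        cause = failure;
                    }
                    fatalErrorHandler.accept(cause);
                });
    }

    /** Marks the component as closing so that later terminations are not treated as fatal. */
    void close() {
        running.set(false);
    }

    public static void main(String[] args) {
        CompletableFuture<Void> dispatcher = new CompletableFuture<>();
        CompletableFuture<Void> resourceManager = new CompletableFuture<>();
        new ComponentSketch(dispatcher, resourceManager,
                t -> System.out.println("fatal error: " + t));
        resourceManager.completeExceptionally(new IllegalStateException("could not start"));
    }
}
```

The running flag is what separates an unexpected termination from the ordinary shutdown path, which is the distinction the comment draws with "still running (not closing)".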
[jira] [Commented] (FLINK-19022) AkkaRpcActor failed to start but no exception information
[ https://issues.apache.org/jira/browse/FLINK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17184921#comment-17184921 ]

tartarus commented on FLINK-19022:
----------------------------------

[~trohrmann] I want to confirm a few small details.

We pass a {{FatalErrorHandler}} to the {{DispatcherResourceManagerComponent}}. Do we then need to remove the {{FatalErrorHandler}} from {{ResourceManager}} and {{Dispatcher}}, and just register the {{TerminationFuture}} of {{ResourceManager}} and {{Dispatcher}} with {{DispatcherResourceManagerComponent}}?
[jira] [Commented] (FLINK-19022) AkkaRpcActor failed to start but no exception information
[ https://issues.apache.org/jira/browse/FLINK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183768#comment-17183768 ]

Till Rohrmann commented on FLINK-19022:
---

Yes, for 3) we also need to pass a {{FatalErrorHandler}} to the {{DispatcherResourceManagerComponent}}. And I would say that we add 4) logging to {{AkkaRpcActor.StartedState.terminate}} and {{AkkaRpcActor.StoppedState.start}}.
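The logging proposed in 4) could look roughly like the sketch below. It is not the actual {{AkkaRpcActor}} code: {{Endpoint}} and {{start}} are hypothetical stand-ins, and {{java.util.logging}} is used only to keep the example self-contained, on the assumption that the real class would use its existing logger instead.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class StartLoggingSketch {
    private static final Logger LOG = Logger.getLogger(StartLoggingSketch.class.getName());

    /** Hypothetical stand-in for an RPC endpoint with a start hook. */
    interface Endpoint {
        String name();

        void onStart() throws Exception;
    }

    /** Tries to start the endpoint; returns null on success, otherwise the failure cause. */
    static Throwable start(Endpoint endpoint) {
        try {
            endpoint.onStart();
            return null;
        } catch (Throwable t) {
            // Log the cause before the actor moves on to termination,
            // so that Errors such as NoSuchMethodError show up in the logs.
            LOG.log(Level.SEVERE, "Could not start RpcEndpoint " + endpoint.name() + ".", t);
            return t;
        }
    }

    public static void main(String[] args) {
        Throwable cause = start(new Endpoint() {
            public String name() {
                return "resourcemanager";
            }

            public void onStart() {
                throw new NoSuchMethodError("registerApplicationMaster");
            }
        });
        System.out.println("start failed with: " + cause);
    }
}
```

The same pattern would apply to the terminate path: log first, then propagate the failure.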
[jira] [Commented] (FLINK-19022) AkkaRpcActor failed to start but no exception information
[ https://issues.apache.org/jira/browse/FLINK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183295#comment-17183295 ]

tartarus commented on FLINK-19022:
---

[~trohrmann] I agree with you. If we do that, three questions need to be settled:
1) We may need to catch {{Throwable}} in {{ResourceManager}} and {{Dispatcher}}.
2) Do we call the {{FatalErrorHandler}} directly in {{ResourceManager}} and {{Dispatcher}}, or do we terminate through the {{DispatcherResourceManagerComponent}}?
3) If we terminate through the {{DispatcherResourceManagerComponent}}, we need to register both {{TerminationFuture}}s with the {{DispatcherResourceManagerComponent}}.

Is there anything else that needs attention?
[jira] [Commented] (FLINK-19022) AkkaRpcActor failed to start but no exception information
[ https://issues.apache.org/jira/browse/FLINK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183204#comment-17183204 ]

Till Rohrmann commented on FLINK-19022:
---

Well, the idea was that the {{DispatcherResourceManagerComponent}} must somehow react if one of the components shuts down unexpectedly. For example, one could monitor the termination futures of the {{ResourceManager}} and the {{Dispatcher}} and call a yet-to-be-passed-in {{FatalErrorHandler}} if they terminate while the {{DispatcherResourceManagerComponent}} is still running ({{isRunning}}).
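A minimal sketch of that monitoring idea, using only {{java.util.concurrent}}. The class name, the {{Consumer<Throwable>}} standing in for a {{FatalErrorHandler}}, and the {{running}} flag are all assumptions for illustration; the real component's fields and handler interface may differ.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

class TerminationMonitorSketch {
    // Hypothetical counterpart of the component's "is running" state.
    private final AtomicBoolean running = new AtomicBoolean(true);

    TerminationMonitorSketch(
            CompletableFuture<Void> resourceManagerTermination,
            CompletableFuture<Void> dispatcherTermination,
            Consumer<Throwable> fatalErrorHandler) {
        watch("ResourceManager", resourceManagerTermination, fatalErrorHandler);
        watch("Dispatcher", dispatcherTermination, fatalErrorHandler);
    }

    private void watch(
            String name, CompletableFuture<Void> termination, Consumer<Throwable> handler) {
        termination.whenComplete((ignored, failure) -> {
            // A termination while the component is still running is unexpected,
            // whether or not it carries a failure cause.
            if (running.get()) {
                handler.accept(
                        new IllegalStateException(name + " terminated unexpectedly.", failure));
            }
        });
    }

    /** Regular shutdown: later terminations are expected and no longer reported. */
    void close() {
        running.set(false);
    }

    public static void main(String[] args) {
        CompletableFuture<Void> rm = new CompletableFuture<>();
        CompletableFuture<Void> dispatcher = new CompletableFuture<>();
        TerminationMonitorSketch monitor =
                new TerminationMonitorSketch(rm, dispatcher, t -> System.err.println("FATAL: " + t));
        rm.completeExceptionally(new NoSuchMethodError("registerApplicationMaster"));
        monitor.close();
        dispatcher.complete(null); // not reported, the component is already shut down
    }
}
```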
[jira] [Commented] (FLINK-19022) AkkaRpcActor failed to start but no exception information
[ https://issues.apache.org/jira/browse/FLINK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183136#comment-17183136 ]

tartarus commented on FLINK-19022:
---

[~trohrmann] Do you mean adding a {{TerminationFuture}} for the {{ResourceManager}} and registering it with the {{DispatcherResourceManagerComponent}}, like this?
{code:java}
private void registerShutDownFuture() {
    FutureUtils.forward(dispatcherRunner.getShutDownFuture(), shutDownFuture);
}
{code}
[jira] [Commented] (FLINK-19022) AkkaRpcActor failed to start but no exception information
[ https://issues.apache.org/jira/browse/FLINK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183122#comment-17183122 ]

tartarus commented on FLINK-19022:
---

[~trohrmann] Thanks for your reply. Sorry, my description was not very clear.

The RM shut down the container because the failure happened in {{AMRMClientAsyncImpl#registerApplicationMaster}}, so the JM had not yet registered with the RM:
[https://github.com/apache/flink/blob/b4705edc841a8cf380d9a12d71551a4d38ec9e31/flink-yarn/src/main/java/org/apache/flink/yarn/YarnResourceManager.java#L223]

The failure went unnoticed because
[https://github.com/apache/flink/blob/b4705edc841a8cf380d9a12d71551a4d38ec9e31/flink-yarn/src/main/java/org/apache/flink/yarn/YarnResourceManager.java#L280]
only catches {{Exception}}, whereas my case is an {{Error}} ({{NoSuchMethodError}}).

In the current code, only {{AkkaRpcActor.StoppedState#start}} catches the {{Throwable}}, but it does not log it, so the error message is lost.
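To illustrate why the {{catch (Exception e)}} block is bypassed: {{NoSuchMethodError}} extends {{Error}}, which is a sibling of {{Exception}} under {{Throwable}}. The following self-contained sketch reproduces this with a hypothetical {{initialize()}} method; it is not the {{YarnResourceManager}} code itself.

```java
class CatchScopeSketch {

    /** Hypothetical stand-in for an initialization step that hits a binary incompatibility. */
    static void initialize() {
        throw new NoSuchMethodError("registerApplicationMaster");
    }

    /** Mirrors a handler that only catches Exception: the Error passes straight through. */
    static String startCatchingException() {
        try {
            initialize();
            return "started";
        } catch (Exception e) {
            // Not reached for NoSuchMethodError, because Error is not a subclass of Exception.
            return "handled: " + e.getClass().getSimpleName();
        }
    }

    /** Catching Throwable covers both Exception and Error. */
    static String startCatchingThrowable() {
        try {
            initialize();
            return "started";
        } catch (Throwable t) {
            return "handled: " + t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(startCatchingThrowable());
        try {
            System.out.println(startCatchingException());
        } catch (Error e) {
            System.out.println("escaped: " + e.getClass().getSimpleName());
        }
    }
}
```

In the first call the failure is handled locally; in the second it escapes to the caller, which matches the behaviour described above where only the outermost handler ever sees the error.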
[jira] [Commented] (FLINK-19022) AkkaRpcActor failed to start but no exception information
[ https://issues.apache.org/jira/browse/FLINK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183029#comment-17183029 ]

Till Rohrmann commented on FLINK-19022:
---

Thanks for reporting this issue [~tartarus]. I agree that this is a problem. Could you share the logs with us? I would like to learn why the RM eventually shuts down the container.

Concerning the problem: One problem is that https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java#L212 only catches {{Exception}} instead of {{Throwable}}. The same actually applies to https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java#L248.

The other problem is, as you've mentioned, that the {{AkkaRpcActor}} does not log the failure cause when a start or shutdown attempt fails. I think it would be a good improvement to add logging in these places.

Last but not least, I believe that the {{DispatcherResourceManagerComponent}} should also react if either the {{Dispatcher}} or the {{ResourceManager}} fails during start-up. One way to do this is to combine the termination futures {{Dispatcher.getTerminationFuture}} and {{ResourceManager.getTerminationFuture}} into the {{shutDownFuture}} of the {{DispatcherResourceManagerComponent}}.
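One possible way to combine the two termination futures, sketched with plain {{CompletableFuture}}s. The class and method names and the {{String}} result type are hypothetical, chosen only to make the example self-contained; the real {{shutDownFuture}} may carry a different type.

```java
import java.util.concurrent.CompletableFuture;

class ShutDownFutureSketch {

    /** Returns a future that completes as soon as either component terminates. */
    static CompletableFuture<String> combine(
            CompletableFuture<Void> dispatcherTermination,
            CompletableFuture<Void> resourceManagerTermination) {
        CompletableFuture<String> shutDownFuture = new CompletableFuture<>();
        forward("Dispatcher", dispatcherTermination, shutDownFuture);
        forward("ResourceManager", resourceManagerTermination, shutDownFuture);
        return shutDownFuture;
    }

    private static void forward(
            String name,
            CompletableFuture<Void> termination,
            CompletableFuture<String> shutDownFuture) {
        termination.whenComplete((ignored, failure) -> {
            // complete/completeExceptionally are no-ops on an already completed
            // future, so whichever component terminates first decides the outcome.
            if (failure != null) {
                shutDownFuture.completeExceptionally(failure);
            } else {
                shutDownFuture.complete(name + " terminated");
            }
        });
    }

    public static void main(String[] args) {
        CompletableFuture<Void> dispatcher = new CompletableFuture<>();
        CompletableFuture<Void> resourceManager = new CompletableFuture<>();
        CompletableFuture<String> shutDownFuture = combine(dispatcher, resourceManager);

        resourceManager.completeExceptionally(new NoSuchMethodError("registerApplicationMaster"));
        System.out.println("failed: " + shutDownFuture.isCompletedExceptionally());
    }
}
```

With this wiring, a {{ResourceManager}} that fails during start-up is no longer silent: whoever waits on the combined future observes the failure.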
[jira] [Commented] (FLINK-19022) AkkaRpcActor failed to start but no exception information
[ https://issues.apache.org/jira/browse/FLINK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17181911#comment-17181911 ]

tartarus commented on FLINK-19022:
---

[~chesnay] [~trohrmann] How about adding log output in the {{AkkaRpcActor}} exception handling (linked in the issue description) to help find the problem quickly? Please assign this to me, thanks.