[ https://issues.apache.org/jira/browse/FLINK-2969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi closed FLINK-2969.
------------------------------
    Resolution: Invalid

> FlinkYarnSessionCli with recovery enabled fails when killing TaskManager
> ------------------------------------------------------------------------
>
>                 Key: FLINK-2969
>                 URL: https://issues.apache.org/jira/browse/FLINK-2969
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination, YARN Client
>    Affects Versions: 0.10.0
>            Reporter: Ufuk Celebi
>
> I'm running a YARN session with 2 physical nodes and 5 containers 
> (ApplicationMaster and 4 TaskManagers). There is no Flink program submitted 
> to the cluster.
> Running a sequence of failure operations (killing the ApplicationMaster and 
> TaskManager containers), I sometimes get the following Exception after 
> killing a TaskManager:
> {code}
> 15:31:20,721 WARN  org.apache.flink.client.FlinkYarnSessionCli                   - Exception while running the interactive command line interface
> java.lang.RuntimeException: Unable to get Cluster status from Application Client
>       at org.apache.flink.yarn.FlinkYarnCluster.getClusterStatus(FlinkYarnCluster.java:307)
>       at org.apache.flink.client.FlinkYarnSessionCli.runInteractiveCli(FlinkYarnSessionCli.java:296)
>       at org.apache.flink.client.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:455)
>       at org.apache.flink.client.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:351)
> Caused by: akka.pattern.AskTimeoutException: Recipient[Actor[akka://flink/user/applicationClient#-607831833]] had already been terminated.
>       at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:132)
>       at akka.pattern.AskableActorRef$.$qmark$extension(AskSupport.scala:144)
>       at akka.pattern.AskSupport$class.ask(AskSupport.scala:75)
>       at akka.pattern.package$.ask(package.scala:43)
>       at akka.pattern.Patterns$.ask(Patterns.scala:47)
>       at akka.pattern.Patterns.ask(Patterns.scala)
>       at org.apache.flink.yarn.FlinkYarnCluster.getClusterStatus(FlinkYarnCluster.java:302)
>       ... 3 more
> {code}
> I would like to investigate this for the 0.10.1/1.0 release and not block the 
> current RC.



