I'm no expert here, but aren't 4 YARN containers/TaskManagers (-yn 4) too
many for 3 data nodes (=3 dn?)?
Also, isn't the YARN UI reporting on its own application, i.e. the Flink
session it launched, as opposed to the actual Flink job? Or did you mean that
the Flink web UI (reached through YARN) showed the submitted job as succeeded?
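If the container count is indeed the problem, a minimal sketch of a resubmission with one TaskManager per data node might look like this (paths, jar name, and main class are taken from your original command; this assumes a Flink 1.x CLI where -yn sets the number of YARN TaskManager containers, and of course needs your cluster to run):

```shell
# Request 3 TaskManager containers instead of 4, matching the 3 data nodes.
# -m yarn-cluster: start a per-job YARN session
# -yn 3:           number of YARN containers / TaskManagers (assumption: 3 dn)
# -yt ./lib:       ship the local ./lib directory to the cluster
./bin/flink run -m yarn-cluster -yn 3 -yt ./lib -c CEPForBAMOther \
  ~/flinkCEPJob/FlinkCEP-1.0-SNAPSHOT.jar
```

You could also check the number of available NodeManagers first with `yarn node -list` before picking the value for -yn.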
Nico
On Thursday, 8 June 2017 16:49:37 CEST Biplob Biswas wrote:
> I am trying to run a Flink job on our cluster with 3 data nodes (dn) and 1
> name node (nn). I am using the following command line arguments to run this
> job, but I get an exception saying "Could not connect to the leading
> JobManager. Please check that the JobManager is running" ... what could I be
> doing wrong?
> Surprisingly, the YARN UI shows that the job finished and succeeded, which
> is completely false.
>
> Can anyone shed some light on this?
>
> ./bin/flink run -m yarn-cluster -yn 4 -yt ./lib -c CEPForBAMOther
> ~/flinkCEPJob/FlinkCEP-1.0-SNAPSHOT.jar
> 2017-06-08 14:58:45,247 INFO  org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://airpluspoc-hdp-mn1.germanycentral.cloudapp.microsoftazure.de:8188/ws/v1/timeline/
> 2017-06-08 14:58:45,372 INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at airpluspoc-hdp-mn1.germanycentral.cloudapp.microsoftazure.de/172.16.6.31:8050
> 2017-06-08 14:58:45,883 WARN  org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
> 2017-06-08 14:58:51,296 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1496901009273_0048
> Cluster started: Yarn cluster with application id application_1496901009273_0048
> Using address airpluspoc-hdp-dn2.germanycentral.cloudapp.microsoftazure.de:36661 to connect to JobManager.
> JobManager web interface address http://airpluspoc-hdp-mn1.germanycentral.cloudapp.microsoftazure.de:8088/proxy/application_1496901009273_0048/
> Using the parallelism provided by the remote cluster (4). To use another parallelism, set it at the ./bin/flink client.
> Starting execution of program
> RecordReadEventType
> Test String
> Waiting until all TaskManagers have connected
>
>
> The program finished with the following exception:
>
> org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
> 	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)
> 	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)
> 	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:381)
> 	at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:838)
> 	at org.apache.flink.client.CliFrontend.run(CliFrontend.java:259)
> 	at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1086)
> 	at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1133)
> 	at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1130)
> 	at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
> 	at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1129)
> Caused by: java.lang.RuntimeException: Unable to get ClusterClient status from Application Client
> 	at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:243)
> 	at org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(YarnClusterClient.java:507)
> 	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:454)
> 	at org.apache.flink.yarn.YarnClusterClient.submitJob(YarnClusterClient.java:205)
> 	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:442)
> 	at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:71)
> 	at CEPForBAMOther.main(CEPForBAMOther.java:134)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
> 	... 13 more
> Caused by: org.apache.flink.util.FlinkException: Could not connect to the
> leading JobManager. Please check that the JobManager is