[ https://issues.apache.org/jira/browse/FLINK-20143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233592#comment-17233592 ]
Kostas Kloudas commented on FLINK-20143:
----------------------------------------

Yes, I will try to merge it today and hopefully it will make it in 1.12 [~zhisheng].

> use `yarn.provided.lib.dirs` config deploy job failed in yarn per job mode
> --------------------------------------------------------------------------
>
>                 Key: FLINK-20143
>                 URL: https://issues.apache.org/jira/browse/FLINK-20143
>             Project: Flink
>          Issue Type: Bug
>          Components: Client / Job Submission, Deployment / YARN
>    Affects Versions: 1.12.0, 1.11.2
>            Reporter: zhisheng
>            Assignee: Yang Wang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2020-11-13-17-21-47-751.png, image-2020-11-13-17-22-06-111.png, image-2020-11-13-18-43-55-188.png
>
>
> Deploying a Flink job to YARN with the following command fails:
> {code:java}
> ./bin/flink run -m yarn-cluster -d -ynm flink-1.12-test -ytm 3g -yjm 3g -yD yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/lib ./examples/streaming/StateMachineExample.jar
> {code}
> Log:
> {code:java}
> $ ./bin/flink run -m yarn-cluster -d -ynm flink-1.12-test -ytm 3g -yjm 3g -yD yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/lib ./examples/streaming/StateMachineExample.jar
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/data1/app/flink-1.12-SNAPSHOT/lib/log4j-slf4j-impl-2.12.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/data1/app/hadoop-2.7.3-snappy-32core12disk/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/data1/app/hadoop-2.7.3-snappy-32core12disk/share/hadoop/tools/lib/hadoop-aliyun-2.9.2-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> 2020-11-13 16:14:30,347 INFO  org.apache.flink.yarn.cli.FlinkYarnSessionCli                [] - Dynamic Property set: yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/lib
> Usage with built-in data generator: StateMachineExample [--error-rate <probability-of-invalid-transition>] [--sleep <sleep-per-record-in-ms>]
> Usage with Kafka: StateMachineExample --kafka-topic <topic> [--brokers <brokers>]
> Options for both the above setups: [--backend <file|rocks>] [--checkpoint-dir <filepath>] [--async-checkpoints <true|false>] [--incremental-checkpoints <true|false>] [--output <filepath> OR null for stdout]
>
> Using standalone source with error rate 0.000000 and sleep delay 1 millis
>
> 2020-11-13 16:14:30,706 WARN  org.apache.flink.yarn.configuration.YarnLogConfigUtil        [] - The configuration directory ('/data1/app/flink-1.12-SNAPSHOT/conf') already contains a LOG4J config file. If you want to use logback, then please delete or rename the log configuration file.
> 2020-11-13 16:14:30,947 INFO  org.apache.hadoop.yarn.client.AHSProxy                       [] - Connecting to Application History server at FAT-hadoopuat-69117.vm.dc01.tech/10.69.1.17:10200
> 2020-11-13 16:14:30,958 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2020-11-13 16:14:31,065 INFO  org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider [] - Failing over to rm2
> 2020-11-13 16:14:31,130 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - The configured JobManager memory is 3072 MB. YARN will allocate 4096 MB to make up an integer multiple of its minimum allocation memory (2048 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 1024 MB may not be used by Flink.
> 2020-11-13 16:14:31,130 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - The configured TaskManager memory is 3072 MB. YARN will allocate 4096 MB to make up an integer multiple of its minimum allocation memory (2048 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 1024 MB may not be used by Flink.
> 2020-11-13 16:14:31,130 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cluster specification: ClusterSpecification{masterMemoryMB=3072, taskManagerMemoryMB=3072, slotsPerTaskManager=2}
> 2020-11-13 16:14:31,681 WARN  org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory      [] - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
> 2020-11-13 16:14:33,417 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Submitting application master application_1599741232083_21990
> 2020-11-13 16:14:33,446 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl        [] - Submitted application application_1599741232083_21990
> 2020-11-13 16:14:33,446 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Waiting for the cluster to be allocated
> 2020-11-13 16:14:33,448 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Deploying cluster, current state ACCEPTED
>
> ------------------------------------------------------------
>  The program finished with the following exception:
>
> org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Could not deploy Yarn job cluster.
> 	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:330)
> 	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:198)
> 	at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
> 	at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:743)
> 	at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:242)
> 	at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:971)
> 	at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1047)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
> 	at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
> 	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1047)
> Caused by: org.apache.flink.client.deployment.ClusterDeploymentException: Could not deploy Yarn job cluster.
> 	at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:460)
> 	at org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:70)
> 	at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1916)
> 	at org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:128)
> 	at org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:76)
> 	at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1798)
> 	at org.apache.flink.streaming.examples.statemachine.StateMachineExample.main(StateMachineExample.java:142)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:316)
> 	... 11 more
> Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
> Diagnostics from YARN: Application application_1599741232083_21990 failed 2 times in previous 10000 milliseconds due to AM Container for appattempt_1599741232083_21990_000002 exited with exitCode: -1
> Failing this attempt.
> Diagnostics: [2020-11-13 16:14:38.244] Destination must be relative
> For more detailed output, check the application tracking page: http://FAT-hadoopuat-69117.vm.dc01.tech:8188/applicationhistory/app/application_1599741232083_21990 Then click on links to logs of each attempt.
> Failing the application. If log aggregation is enabled on your cluster, use this command to further investigate the issue:
> yarn logs -applicationId application_1599741232083_21990
> 	at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:1078)
> 	at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:558)
> 	at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:453)
> 	... 22 more
> 2020-11-13 16:14:38,492 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Cancelling deployment from Deployment Failure Hook
> 2020-11-13 16:14:38,494 INFO  org.apache.hadoop.yarn.client.AHSProxy                       [] - Connecting to Application History server at FAT-hadoopuat-69117.vm.dc01.tech/10.69.1.17:10200
> 2020-11-13 16:14:38,495 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Killing YARN application
> 2020-11-13 16:14:38,499 INFO  org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider [] - Failing over to rm2
> 2020-11-13 16:14:38,503 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl        [] - Killed application application_1599741232083_21990
> 2020-11-13 16:14:38,503 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Deleting files in hdfs://flashHadoopUAT/user/deploy/.flink/application_1599741232083_21990.
> {code}
> However, if I set `execution.target: yarn-per-job` in flink-conf.yaml, the same submission succeeds.
> Running in application mode also succeeds:
> {code:java}
> ./bin/flink run-application -p 2 -d -t yarn-application -ytm 3g -yjm 3g -yD yarn.provided.lib.dirs=hdfs:///flink/flink-1.12-SNAPSHOT/lib ./examples/streaming/StateMachineExample.jar
> {code}
> but the resulting job ID is 00000000000000000000000000000000.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
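As a side note, the workaround mentioned in the issue description (setting the execution target in flink-conf.yaml rather than relying on the `-m yarn-cluster` CLI path) can be sketched as the following config fragment. The HDFS path is the one from the bug report; adapt both values to your own cluster.

```yaml
# flink-conf.yaml — workaround from the issue description:
# declare the per-job target here instead of passing -m yarn-cluster,
# so that yarn.provided.lib.dirs is honored during deployment.
execution.target: yarn-per-job

# Flink distribution jars pre-uploaded to HDFS (reporter's path).
yarn.provided.lib.dirs: hdfs:///flink/flink-1.12-SNAPSHOT/lib
```

With this in place, the job can be submitted as plain `./bin/flink run -d ./examples/streaming/StateMachineExample.jar`, per the reporter's observation that the failure only occurs on the `-m yarn-cluster` code path.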