[ https://issues.apache.org/jira/browse/FLINK-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946727#comment-14946727 ]
ASF GitHub Bot commented on FLINK-2790:
---------------------------------------

Github user uce commented on the pull request:
https://github.com/apache/flink/pull/1213#issuecomment-146170086

I have a dependency problem with Curator leading to:

```bash
ERROR org.apache.flink.runtime.jobmanager.JobManager - Error while starting up JobManager
java.lang.NoSuchMethodError: org.apache.curator.utils.PathUtils.validatePath(Ljava/lang/String;)Ljava/lang/String;
    at org.apache.curator.framework.imps.NamespaceImpl.<init>(NamespaceImpl.java:37)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.<init>(CuratorFrameworkImpl.java:113)
    at org.apache.curator.framework.CuratorFrameworkFactory$Builder.build(CuratorFrameworkFactory.java:124)
    at org.apache.flink.runtime.util.ZooKeeperUtils.startCuratorFramework(ZooKeeperUtils.java:83)
    at org.apache.flink.runtime.util.ZooKeeperUtils.createLeaderElectionService(ZooKeeperUtils.java:145)
    at org.apache.flink.runtime.util.LeaderElectionUtils.createLeaderElectionService(LeaderElectionUtils.java:52)
    at org.apache.flink.runtime.jobmanager.JobManager$.createJobManagerComponents(JobManager.scala:1595)
    at org.apache.flink.runtime.jobmanager.JobManager$.startJobManagerActors(JobManager.scala:1672)
    at org.apache.flink.runtime.jobmanager.JobManager$.startJobManagerActors(JobManager.scala:1629)
    at org.apache.flink.runtime.jobmanager.JobManager$.startActorSystemAndJobManagerActors(JobManager.scala:1307)
    at org.apache.flink.yarn.ApplicationMasterBase.runAction(ApplicationMasterBase.scala:127)
    at org.apache.flink.yarn.ApplicationMasterBase$$anon$1.run(ApplicationMasterBase.scala:76)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:360)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
    at org.apache.flink.yarn.ApplicationMasterBase.run(ApplicationMasterBase.scala:74)
    at org.apache.flink.yarn.ApplicationMaster$.main(ApplicationMaster.scala:35)
    at org.apache.flink.yarn.ApplicationMaster.main(ApplicationMaster.scala)
```

Our application master class path:

```bash
13:45:50,795 DEBUG org.apache.flink.yarn.ApplicationMaster - All environment variables: {PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin, HADOOP_CONF_DIR=/Users/ufuk/Downloads/hadoop-2.6.0/etc/hadoop, MAX_APP_ATTEMPTS=2, HADOOP_SECURE_DN_PID_DIR=, HADOOP_PID_DIR=, MAIL=/var/mail/ufuk, LD_LIBRARY_PATH=:/Users/ufuk/Downloads/hadoop-2.6.0/lib/native:/Users/ufuk/Downloads/hadoop-2.6.0/lib/native, LOGNAME=ufuk, JVM_PID=13307, _DETACHED=false, PWD=/tmp/hadoop-ufuk/nm-local-dir/usercache/ufuk/appcache/application_1444217271951_0004/container_1444217271951_0004_01_000001, HADOOP_YARN_USER=yarn, HADOOP_PREFIX=/Users/ufuk/Downloads/hadoop-2.6.0, LOCAL_DIRS=/tmp/hadoop-ufuk/nm-local-dir/usercache/ufuk/appcache/application_1444217271951_0004, YARN_IDENT_STRING=ufuk, HADOOP_SECURE_DN_LOG_DIR=/, SHELL=/bin/zsh, YARN_CONF_DIR=/Users/ufuk/Downloads/hadoop-2.6.0/etc/hadoop, JAVA_MAIN_CLASS_10305=org.apache.hadoop.yarn.server.nodemanager.NodeManager, LOG_DIRS=/Users/ufuk/Downloads/hadoop-2.6.0/logs/userlogs/application_1444217271951_0004/container_1444217271951_0004_01_000001, 
_CLIENT_SHIP_FILES=file:/Users/ufuk/.flink/application_1444217271951_0004/flink-python-0.10-SNAPSHOT.jar,file:/Users/ufuk/.flink/application_1444217271951_0004/log4j-1.2.17.jar,file:/Users/ufuk/.flink/application_1444217271951_0004/slf4j-log4j12-1.7.7.jar,file:/Users/ufuk/.flink/application_1444217271951_0004/logback.xml,file:/Users/ufuk/.flink/application_1444217271951_0004/log4j.properties, _CLIENT_USERNAME=ufuk, HADOOP_YARN_HOME=/Users/ufuk/Downloads/hadoop-2.6.0, TMPDIR=/var/folders/_c/5tc5q5q55qjcjtqwlwvwd1m00000gn/T/, HADOOP_DATANODE_OPTS=-Dhadoop.security.logger=ERROR,RFAS -Dhadoop.security.logger=ERROR,RFAS , HADOOP_SECONDARYNAMENODE_OPTS=-Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender , _FLINK_JAR_PATH=file:/Users/ufuk/.flink/application_1444217271951_0004/flink-dist-0.10-SNAPSHOT.jar, __CF_USER_TEXT_ENCODING=0x1F5:0x0:0x0, LC_CTYPE=UTF-8, _CLIENT_TM_COUNT=2, _CLIENT_TM_MEMORY=1024, SHLVL=3, HADOOP_IDENT_STRING=ufuk, YARN_ROOT_LOGGER=INFO,RFA, _SLOTS=-1, _CLIENT_HOME_DIR=file:/Users/ufuk, APP_SUBMIT_TIME_ENV=1444218347305, NM_HOST=192.168.178.69, _APP_ID=application_1444217271951_0004, YARN_LOGFILE=yarn-ufuk-nodemanager-vinci.local.log, HADOOP_SECURE_DN_USER=, HADOOP_CLASSPATH=/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar, HADOOP_HDFS_HOME=/Users/ufuk/Downloads/hadoop-2.6.0, HADOOP_MAPRED_HOME=/Users/ufuk/Downloads/hadoop-2.6.0, HADOOP_COMMON_HOME=/Users/ufuk/Downloads/hadoop-2.6.0, HADOOP_CLIENT_OPTS=-Xmx512m -Xmx512m , _=/bin/java, APPLICATION_WEB_PROXY_BASE=/proxy/application_1444217271951_0004, _STREAMING_MODE=false, NM_HTTP_PORT=8042, JAVA_MAIN_CLASS_13316=org.apache.flink.yarn.ApplicationMaster, HADOOP_OPTS= -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/Users/ufuk/Downloads/hadoop-2.6.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/Users/ufuk/Downloads/hadoop-2.6.0 -Dhadoop.id.str=ufuk -Dhadoop.root.logger=INFO,console -Djava.library.path=/Users/ufuk/Downloads/hadoop-2.6.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/Users/ufuk/Downloads/hadoop-2.6.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/Users/ufuk/Downloads/hadoop-2.6.0 -Dhadoop.id.str=ufuk -Dhadoop.root.logger=INFO,console -Djava.library.path=/Users/ufuk/Downloads/hadoop-2.6.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true, SSH_CLIENT=::1 60015 22, NM_PORT=60017, USER=ufuk, CLASSPATH=/tmp/hadoop-ufuk/nm-local-dir/usercache/ufuk/appcache/application_1444217271951_0004/container_1444217271951_0004_01_000001/*:/Users/ufuk/Downloads/hadoop-2.6.0/etc/hadoop:/Users/ufuk/Downloads/hadoop-2.6.0/share/hadoop/common/*:/Users/ufuk/Downloads/hadoop-2.6.0/share/hadoop/common/lib/*:/Users/ufuk/Downloads/hadoop-2.6.0/share/hadoop/hdfs/*:/Users/ufuk/Downloads/hadoop-2.6.0/share/hadoop/hdfs/lib/*:/Users/ufuk/Downloads/hadoop-2.6.0/share/hadoop/yarn/*:/Users/ufuk/Downloads/hadoop-2.6.0/share/hadoop/yarn/lib/*, SSH_CONNECTION=::1 60015 ::1 22, HADOOP_TOKEN_FILE_LOCATION=/tmp/hadoop-ufuk/nm-local-dir/usercache/ufuk/appcache/application_1444217271951_0004/container_1444217271951_0004_01_000001/container_tokens, HADOOP_NFS3_OPTS=, HADOOP_NAMENODE_OPTS=-Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dhdfs.audit.logger=INFO,NullAppender , YARN_NICENESS=0, HOME=/home/, CONTAINER_ID=container_1444217271951_0004_01_000001, 
HADOOP_PORTMAP_OPTS=-Xmx512m -Xmx512m , MALLOC_ARENA_MAX=4}
```

The problem is */Users/ufuk/Downloads/hadoop-2.6.0/share/hadoop/yarn/lib/**, which contains Curator 2.6. I used vanilla YARN 2.6.0; Robert, for whom it worked, used CDH 5.4.

> Add high availability support for Yarn
> --------------------------------------
>
> Key: FLINK-2790
> URL: https://issues.apache.org/jira/browse/FLINK-2790
> Project: Flink
> Issue Type: Sub-task
> Components: JobManager, TaskManager
> Reporter: Till Rohrmann
> Fix For: 0.10
>
>
> Add master high availability support for Yarn. The idea is to let Yarn
> restart a failed application master in a new container. For that, we set the
> number of application retries to something greater than 1.
> From version 2.4.0 onwards, it is possible to reuse already started
> containers for the TaskManagers, thus avoiding unnecessary restart delays.
> From version 2.6.0 onwards, it is possible to specify an interval within which
> the number of application attempts has to be exceeded in order to fail the
> job. This will prevent long-running jobs from eventually depleting all
> available application attempts.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
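To make the classpath conflict reported above easier to pin down, here is a minimal diagnostic sketch (not part of the original report; the class name comes from the stack trace, the class `WhichJar` and the idea of running it with the application master's classpath are assumptions). It prints which jar a given class is actually loaded from:

```java
// Hedged diagnostic sketch (not from the original report): prints which jar a
// given class is resolved from, to spot conflicts like the Curator one above.
// Run it with the same classpath as the application master container.
public class WhichJar {
    public static void main(String[] args) throws Exception {
        // Class name from the stack trace; pass another name as the first argument.
        String className = args.length > 0 ? args[0] : "org.apache.curator.utils.PathUtils";
        Class<?> clazz = Class.forName(className);
        java.security.CodeSource source = clazz.getProtectionDomain().getCodeSource();
        // CodeSource can be null for classes loaded by the bootstrap classloader.
        System.out.println(className + " loaded from "
                + (source != null ? source.getLocation() : "<bootstrap>"));
    }
}
```

On the classpath shown above, this would presumably point at the Curator 2.6 jar under share/hadoop/yarn/lib rather than the Curator version Flink expects, which is consistent with the NoSuchMethodError.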
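The retry behaviour described in the issue summary maps roughly onto the Hadoop 2.4/2.6 client API as sketched below. This is a hedged illustration of the API, not Flink's actual YARN client code; the class name and the concrete values are placeholders:

```java
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;

// Illustration only: the ApplicationSubmissionContext settings the issue
// description refers to. Values are placeholders, not Flink defaults.
public class AmRetrySettingsSketch {

    public static void configure(ApplicationSubmissionContext appContext) {
        // Let YARN restart a failed application master in a new container
        // by allowing more than one application attempt.
        appContext.setMaxAppAttempts(3);

        // Hadoop 2.4.0+: keep already started containers (the TaskManagers)
        // alive across application attempts instead of restarting them.
        appContext.setKeepContainersAcrossApplicationAttempts(true);

        // Hadoop 2.6.0+: only attempt failures within this window count towards
        // the limit, so long-running jobs do not slowly use up all attempts.
        appContext.setAttemptFailuresValidityInterval(10 * 60 * 1000L); // 10 minutes
    }
}
```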