[ https://issues.apache.org/jira/browse/FLINK-2790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14946727#comment-14946727 ]

ASF GitHub Bot commented on FLINK-2790:
---------------------------------------

Github user uce commented on the pull request:

    https://github.com/apache/flink/pull/1213#issuecomment-146170086
  
    I have a dependency problem with Curator leading to:
    
    ```bash
    ERROR org.apache.flink.runtime.jobmanager.JobManager                - Error while starting up JobManager
    java.lang.NoSuchMethodError: org.apache.curator.utils.PathUtils.validatePath(Ljava/lang/String;)Ljava/lang/String;
        at org.apache.curator.framework.imps.NamespaceImpl.<init>(NamespaceImpl.java:37)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.<init>(CuratorFrameworkImpl.java:113)
        at org.apache.curator.framework.CuratorFrameworkFactory$Builder.build(CuratorFrameworkFactory.java:124)
        at org.apache.flink.runtime.util.ZooKeeperUtils.startCuratorFramework(ZooKeeperUtils.java:83)
        at org.apache.flink.runtime.util.ZooKeeperUtils.createLeaderElectionService(ZooKeeperUtils.java:145)
        at org.apache.flink.runtime.util.LeaderElectionUtils.createLeaderElectionService(LeaderElectionUtils.java:52)
        at org.apache.flink.runtime.jobmanager.JobManager$.createJobManagerComponents(JobManager.scala:1595)
        at org.apache.flink.runtime.jobmanager.JobManager$.startJobManagerActors(JobManager.scala:1672)
        at org.apache.flink.runtime.jobmanager.JobManager$.startJobManagerActors(JobManager.scala:1629)
        at org.apache.flink.runtime.jobmanager.JobManager$.startActorSystemAndJobManagerActors(JobManager.scala:1307)
        at org.apache.flink.yarn.ApplicationMasterBase.runAction(ApplicationMasterBase.scala:127)
        at org.apache.flink.yarn.ApplicationMasterBase$$anon$1.run(ApplicationMasterBase.scala:76)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:360)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
        at org.apache.flink.yarn.ApplicationMasterBase.run(ApplicationMasterBase.scala:74)
        at org.apache.flink.yarn.ApplicationMaster$.main(ApplicationMaster.scala:35)
        at org.apache.flink.yarn.ApplicationMaster.main(ApplicationMaster.scala)
    ```
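
    A `NoSuchMethodError` like this means the class that was found at runtime does not have the `validatePath(String): String` signature the Flink code was compiled against, i.e. an older Curator on the classpath wins the lookup. A quick, generic way to see which jar a class is actually loaded from is to ask its `ProtectionDomain` (this is an illustrative sketch, not Flink code; the default class name is taken from the stack trace above):

    ```java
    // Sketch: print where the JVM loaded a class from, to spot classpath conflicts.
    // Defaults to the Curator class from the stack trace; pass another FQCN as args[0].
    public class WhichJar {
        static String origin(String className) throws ClassNotFoundException {
            Class<?> clazz = Class.forName(className);
            java.security.CodeSource src = clazz.getProtectionDomain().getCodeSource();
            // Classes from the bootstrap loader (e.g. java.lang.String) have no code source.
            return src == null ? "bootstrap" : src.getLocation().toString();
        }

        public static void main(String[] args) throws Exception {
            String name = args.length > 0 ? args[0] : "org.apache.curator.utils.PathUtils";
            System.out.println(name + " loaded from: " + origin(name));
        }
    }
    ```

    Run on the application master's classpath, this would print the path of whichever Curator jar shadows the one Flink ships.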
    
    Our application master environment, including the CLASSPATH:
    ```bash
    13:45:50,795 DEBUG org.apache.flink.yarn.ApplicationMaster                  
     - All environment variables: 
{PATH=/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin, 
HADOOP_CONF_DIR=/Users/ufuk/Downloads/hadoop-2.6.0/etc/hadoop, 
MAX_APP_ATTEMPTS=2, HADOOP_SECURE_DN_PID_DIR=, HADOOP_PID_DIR=, 
MAIL=/var/mail/ufuk, 
LD_LIBRARY_PATH=:/Users/ufuk/Downloads/hadoop-2.6.0/lib/native:/Users/ufuk/Downloads/hadoop-2.6.0/lib/native,
 LOGNAME=ufuk, JVM_PID=13307, _DETACHED=false, 
PWD=/tmp/hadoop-ufuk/nm-local-dir/usercache/ufuk/appcache/application_1444217271951_0004/container_1444217271951_0004_01_000001,
 HADOOP_YARN_USER=yarn, HADOOP_PREFIX=/Users/ufuk/Downloads/hadoop-2.6.0, 
LOCAL_DIRS=/tmp/hadoop-ufuk/nm-local-dir/usercache/ufuk/appcache/application_1444217271951_0004,
 YARN_IDENT_STRING=ufuk, HADOOP_SECURE_DN_LOG_DIR=/, SHELL=/bin/zsh, 
YARN_CONF_DIR=/Users/ufuk/Downloads/hadoop-2.6.0/etc/hadoop, 
JAVA_MAIN_CLASS_10305=org.apache.hadoop.yarn.server.nodemanager.NodeManager, 
LOG_DIRS=/Users/ufuk/Downloads/hadoop-2.6.0/logs/userlogs/application_1444217271951_0004/container_1444217271951_0004_01_000001,
 
_CLIENT_SHIP_FILES=file:/Users/ufuk/.flink/application_1444217271951_0004/flink-python-0.10-SNAPSHOT.jar,file:/Users/ufuk/.flink/application_1444217271951_0004/log4j-1.2.17.jar,file:/Users/ufuk/.flink/application_1444217271951_0004/slf4j-log4j12-1.7.7.jar,file:/Users/ufuk/.flink/application_1444217271951_0004/logback.xml,file:/Users/ufuk/.flink/application_1444217271951_0004/log4j.properties,
 _CLIENT_USERNAME=ufuk, HADOOP_YARN_HOME=/Users/ufuk/Downloads/hadoop-2.6.0, 
TMPDIR=/var/folders/_c/5tc5q5q55qjcjtqwlwvwd1m00000gn/T/, 
HADOOP_DATANODE_OPTS=-Dhadoop.security.logger=ERROR,RFAS 
-Dhadoop.security.logger=ERROR,RFAS , 
HADOOP_SECONDARYNAMENODE_OPTS=-Dhadoop.security.logger=INFO,RFAS 
-Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS 
-Dhdfs.audit.logger=INFO,NullAppender , 
_FLINK_JAR_PATH=file:/Users/ufuk/.flink/application_1444217271951_0004/flink-dist-0.10-SNAPSHOT.jar,
 __CF_USER_TEXT_ENCODING=0x1F5:0x0:0x0, LC_CTYPE=UTF-8, _CLIENT_TM_COUNT=2, 
_CLIENT_TM_MEMORY=1024, SHLVL=3, HADOOP_IDENT_STRING=ufuk, 
YARN_ROOT_LOGGER=INFO,RFA, _SLOTS=-1, _CLIENT_HOME_DIR=file:/Users/ufuk, 
APP_SUBMIT_TIME_ENV=1444218347305, NM_HOST=192.168.178.69, 
_APP_ID=application_1444217271951_0004, 
YARN_LOGFILE=yarn-ufuk-nodemanager-vinci.local.log, HADOOP_SECURE_DN_USER=, 
HADOOP_CLASSPATH=/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar,
 HADOOP_HDFS_HOME=/Users/ufuk/Downloads/hadoop-2.6.0, 
HADOOP_MAPRED_HOME=/Users/ufuk/Downloads/hadoop-2.6.0, 
HADOOP_COMMON_HOME=/Users/ufuk/Downloads/hadoop-2.6.0, 
HADOOP_CLIENT_OPTS=-Xmx512m -Xmx512m , _=/bin/java, 
APPLICATION_WEB_PROXY_BASE=/proxy/application_1444217271951_0004, 
_STREAMING_MODE=false, NM_HTTP_PORT=8042, 
JAVA_MAIN_CLASS_13316=org.apache.flink.yarn.ApplicationMaster, HADOOP_OPTS= 
-Djava.net.preferIPv4Stack=true 
-Dhadoop.log.dir=/Users/ufuk/Downloads/hadoop-2.6.0/logs 
-Dhadoop.log.file=hadoop.log 
-Dhadoop.home.dir=/Users/ufuk/Downloads/hadoop-2.6.0 -Dhadoop.id.str=ufuk 
-Dhadoop.root.logger=INFO,console 
-Djava.library.path=/Users/ufuk/Downloads/hadoop-2.6.0/lib/native 
-Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true 
-Djava.net.preferIPv4Stack=true 
-Dhadoop.log.dir=/Users/ufuk/Downloads/hadoop-2.6.0/logs 
-Dhadoop.log.file=hadoop.log 
-Dhadoop.home.dir=/Users/ufuk/Downloads/hadoop-2.6.0 -Dhadoop.id.str=ufuk 
-Dhadoop.root.logger=INFO,console 
-Djava.library.path=/Users/ufuk/Downloads/hadoop-2.6.0/lib/native 
-Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true, 
SSH_CLIENT=::1 60015 22, NM_PORT=60017, USER=ufuk, 
CLASSPATH=/tmp/hadoop-ufuk/nm-local-dir/usercache/ufuk/appcache/application_1444217271951_0004/container_1444217271951_0004_01_000001/*:/Users/ufuk/Downloads/hadoop-2.6.0/etc/hadoop:/Users/ufuk/Downloads/hadoop-2.6.0/share/hadoop/common/*:/Users/ufuk/Downloads/hadoop-2.6.0/share/hadoop/common/lib/*:/Users/ufuk/Downloads/hadoop-2.6.0/share/hadoop/hdfs/*:/Users/ufuk/Downloads/hadoop-2.6.0/share/hadoop/hdfs/lib/*:/Users/ufuk/Downloads/hadoop-2.6.0/share/hadoop/yarn/*:/Users/ufuk/Downloads/hadoop-2.6.0/share/hadoop/yarn/lib/*,
 SSH_CONNECTION=::1 60015 ::1 22, 
HADOOP_TOKEN_FILE_LOCATION=/tmp/hadoop-ufuk/nm-local-dir/usercache/ufuk/appcache/application_1444217271951_0004/container_1444217271951_0004_01_000001/container_tokens,
 HADOOP_NFS3_OPTS=, HADOOP_NAMENODE_OPTS=-Dhadoop.security.logger=INFO,RFAS 
-Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS 
-Dhdfs.audit.logger=INFO,NullAppender , YARN_NICENESS=0, HOME=/home/, 
CONTAINER_ID=container_1444217271951_0004_01_000001, 
HADOOP_PORTMAP_OPTS=-Xmx512m -Xmx512m , MALLOC_ARENA_MAX=4}
    ```
    
    The problem is `/Users/ufuk/Downloads/hadoop-2.6.0/share/hadoop/yarn/lib/*`, which contains Curator 2.6.
    
    I've used vanilla YARN 2.6.0. Robert, for whom it worked, used CDH 5.4.
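
    Since the Curator 2.6 jar in `yarn/lib` lands on the classpath ahead of Flink's own Curator, one common way to make such a conflict impossible is to relocate Curator into a Flink-private namespace with the maven-shade-plugin, so the class the code links against can never be resolved from Hadoop's jars. The fragment below is only an illustrative sketch; the coordinates and relocation pattern are assumptions, not what this PR necessarily does:

    ```xml
    <!-- Illustrative maven-shade-plugin relocation; pattern names are assumptions. -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <phase>package</phase>
          <goals><goal>shade</goal></goals>
          <configuration>
            <relocations>
              <relocation>
                <pattern>org.apache.curator</pattern>
                <shadedPattern>org.apache.flink.shaded.org.apache.curator</shadedPattern>
              </relocation>
            </relocations>
          </configuration>
        </execution>
      </executions>
    </plugin>
    ```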
    



> Add high availability support for Yarn
> --------------------------------------
>
>                 Key: FLINK-2790
>                 URL: https://issues.apache.org/jira/browse/FLINK-2790
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager, TaskManager
>            Reporter: Till Rohrmann
>             Fix For: 0.10
>
>
> Add master high availability support for Yarn. The idea is to let Yarn 
> restart a failed application master in a new container. For that, we set the 
> number of application retries to something greater than 1. 
> From version 2.4.0 onwards, it is possible to reuse already started 
> containers for the TaskManagers, thus, avoiding unnecessary restart delays.
> From version 2.6.0 onwards, it is possible to specify an interval within which 
> the number of application attempts has to be exceeded in order to fail the 
> job. This prevents long-running jobs from eventually depleting all available 
> application attempts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
