[ https://issues.apache.org/jira/browse/FALCON-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Balu Vellanki updated FALCON-2090:
----------------------------------
    Fix Version/s:     (was: 0.10)

> HDFS Snapshot failed with UnknownHostException when scheduling in HA Mode
> -------------------------------------------------------------------------
>
>                 Key: FALCON-2090
>                 URL: https://issues.apache.org/jira/browse/FALCON-2090
>             Project: Falcon
>          Issue Type: Bug
>          Components: replication
>    Affects Versions: trunk
>            Reporter: Murali Ramasami
>            Assignee: Balu Vellanki
>            Priority: Critical
>             Fix For: trunk
>
>
> In NameNode HA mode, scheduling an HDFS snapshot replication fails with
> "java.net.UnknownHostException: mycluster1". In the error message,
> mycluster1 is the source (primary) cluster's nameservice. The complete
> stack trace follows.
> Stack Trace:
> {noformat}
> Log Contents:
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/grid/0/hadoop/yarn/local/filecache/371/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/grid/0/hadoop/yarn/local/filecache/213/mapreduce.tar.gz/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> Error: java.lang.IllegalArgumentException: java.net.UnknownHostException: mycluster1
> 	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:411)
> 	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:429)
> 	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.initialize(WebHdfsFileSystem.java:207)
> 	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2730)
> 	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:98)
> 	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2764)
> 	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2746)
> 	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
> 	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:178)
> 	at org.apache.falcon.hive.util.EventUtils.initializeFS(EventUtils.java:145)
> 	at org.apache.falcon.hive.mapreduce.CopyMapper.setup(CopyMapper.java:47)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> Caused by: java.net.UnknownHostException: mycluster1
> 	... 19 more
> {noformat}
> Steps to Reproduce:
> primaryCluster:
> ============
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?>
> <cluster xmlns="uri:falcon:cluster:0.1" colo="USWestOregon"
>          description="oregonHadoopCluster" name="primaryCluster">
>     <interfaces>
>         <interface type="readonly" endpoint="webhdfs://mycluster1:20070" version="0.20.2" />
>         <interface type="write" endpoint="hdfs://mycluster1:8020" version="0.20.2" />
>         <interface type="execute" endpoint="mramasami-falcon-multi-ha-bug-12.openstacklocal:8050" version="0.20.2" />
>         <interface type="workflow" endpoint="http://mramasami-falcon-multi-ha-bug-14.openstacklocal:11000/oozie" version="3.1" />
>         <interface type="messaging" endpoint="tcp://mramasami-falcon-multi-ha-bug-9.openstacklocal:61616?daemon=true" version="5.1.6" />
>         <interface type="registry" endpoint="thrift://mramasami-falcon-multi-ha-bug-14.openstacklocal:9083" version="0.11.0" />
>     </interfaces>
>     <locations>
>         <location name="staging" path="/tmp/fs" />
>         <location name="temp" path="/tmp" />
>         <location name="working" path="/tmp/fw" />
>     </locations>
>     <ACL owner="hrt_qa" group="users" permission="0755" />
>     <properties>
>         <property name="dfs.namenode.kerberos.principal" value="nn/_h...@example.com" />
>         <property name="hive.metastore.kerberos.principal" value="hive/_h...@example.com" />
>         <property name="hive.metastore.sasl.enabled" value="true" />
>         <property name="hadoop.rpc.protection" value="authentication" />
>         <property name="hive.metastore.uris" value="thrift://mramasami-falcon-multi-ha-bug-14.openstacklocal:9083" />
>         <property name="hive.server2.uri" value="hive2://mramasami-falcon-multi-ha-bug-14.openstacklocal:10000" />
>     </properties>
> </cluster>
> {noformat}
> falcon entity -submit -type cluster -file primaryCluster.xml --> primaryCluster
> backupCluster:
> ============
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?>
> <cluster xmlns="uri:falcon:cluster:0.1" colo="USWestOregon"
>          description="oregonHadoopCluster" name="backupCluster">
>     <interfaces>
>         <interface type="readonly" endpoint="webhdfs://mycluster2:20070" version="0.20.2" />
>         <interface type="write" endpoint="hdfs://mycluster2:8020" version="0.20.2" />
>         <interface type="execute" endpoint="mramasami-falcon-multi-ha-bug-5.openstacklocal:8050" version="0.20.2" />
>         <interface type="workflow" endpoint="http://mramasami-falcon-multi-ha-bug-6.openstacklocal:11000/oozie" version="3.1" />
>         <interface type="messaging" endpoint="tcp://mramasami-falcon-multi-ha-bug-1.openstacklocal:61616" version="5.1.6" />
>         <interface type="registry" endpoint="thrift://mramasami-falcon-multi-ha-bug-6.openstacklocal:9083" version="0.11.0" />
>     </interfaces>
>     <locations>
>         <location name="staging" path="/tmp/fs" />
>         <location name="temp" path="/tmp" />
>         <location name="working" path="/tmp/fw" />
>     </locations>
>     <ACL owner="hrt_qa" group="users" permission="0755" />
>     <properties>
>         <property name="dfs.namenode.kerberos.principal" value="nn/_h...@example.com" />
>         <property name="hive.metastore.kerberos.principal" value="hive/_h...@example.com" />
>         <property name="hive.metastore.sasl.enabled" value="true" />
>         <property name="hadoop.rpc.protection" value="authentication" />
>         <property name="hive.metastore.uris" value="thrift://mramasami-falcon-multi-ha-bug-6.openstacklocal:9083" />
>         <property name="hive.server2.uri" value="hive2://mramasami-falcon-multi-ha-bug-6.openstacklocal:10000" />
>     </properties>
> </cluster>
> {noformat}
> falcon entity -submit -type cluster -file backupCluster.xml --> backupCluster
> HDFS Snapshot Replication:
> =========================
> Source:
> ======
> hdfs dfs -mkdir -p /tmp/falcon-regression/HDFSSnapshotTest/source
> hdfs dfs -put /grid/0/hadoopqe/tests/ha/falcon/combinedActions/mr_input/2015/01/02/NYSE-2000-2001.tsv /tmp/falcon-regression/HDFSSnapshotTest/source
> Create Snapshot:
> ===============
> hdfs dfsadmin -allowSnapshot /tmp/falcon-regression/HDFSSnapshotTest/source    [ hdfs ]
> hdfs dfs -createSnapshot /tmp/falcon-regression/HDFSSnapshotTest/source        [ hrt_qa ]
> hdfs lsSnapshottableDir                                                        [ hrt_qa ]
> hdfs dfs -ls /tmp/falcon-regression/HDFSSnapshotTest/source/.snapshot
> Target:
> ======
> hdfs dfs -mkdir -p /tmp/falcon-regression/HDFSSnapshotTest/target
> hdfs dfsadmin -allowSnapshot /tmp/falcon-regression/HDFSSnapshotTest/target
> hdfs dfs -ls /tmp/falcon-regression/HDFSSnapshotTest/target/.snapshot
> hdfs-snapshot.properties
> ==========================
> {noformat}
> jobName=HDFSSnapshotTest
> jobClusterName=primaryCluster
> jobValidityStart=2016-05-09T06:25Z
> jobValidityEnd=2017-05-09T08:00Z
> jobFrequency=days(1)
> sourceCluster=primaryCluster
> sourceSnapshotDir=/tmp/falcon-regression/HDFSSnapshotTest/source
> sourceSnapshotRetentionAgeLimit=days(1)
> sourceSnapshotRetentionNumber=3
> targetCluster=backupCluster
> targetSnapshotDir=/tmp/falcon-regression/HDFSSnapshotTest/target
> targetSnapshotRetentionAgeLimit=days(1)
> targetSnapshotRetentionNumber=3
> jobAclOwner=hrt_qa
> jobAclGroup=users
> jobAclPermission="0x755"
> {noformat}
> falcon extension -extensionName hdfs-snapshot-mirroring -submitAndSchedule -file hdfs-snapshot.properties

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
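A plausible reading of the failure, offered as an assumption rather than a confirmed diagnosis: mycluster1 is an HA nameservice, not a resolvable hostname, so the HDFS client can only connect to it when the client-side HA mappings from hdfs-site.xml are present in the job configuration. If the replication job's map tasks launch without those properties, the nameservice falls through to a plain DNS lookup, which fails exactly as in the stack trace above (and since the failing class is WebHdfsFileSystem, the HTTP-address mappings matter as well as the RPC ones). For a nameservice named mycluster1, the standard Hadoop HA client settings have this shape; the nn1/nn2 names and namenode hostnames below are placeholders, not values taken from this report:

{noformat}
<!-- hdfs-site.xml fragment: client-side HA resolution for nameservice mycluster1 -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster1</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster1</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster1.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster1.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster1.nn1</name>
  <value>namenode1.example.com:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster1.nn2</name>
  <value>namenode2.example.com:50070</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
{noformat}

A quick way to check whether a given process's configuration carries these mappings is hdfs getconf -confKey dfs.nameservices on the node in question.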