[ https://issues.apache.org/jira/browse/HADOOP-15864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16662401#comment-16662401 ]
Wei-Chiu Chuang commented on HADOOP-15864:
------------------------------------------

Thanks for the patch. First off, please update the Affects Version/s and Fix Version/s fields. It sounds to me like the bug still exists in trunk, so you should definitely provide a patch against trunk to begin with. If you do intend to offer a patch for branch-2.7, try naming it HADOOP-15864.branch-2.7.002.patch. Separating the JIRA ID, branch name, and revision number with dots has always worked for me.

> Job submitter / executor fail when SBN domain name can not resolved
> -------------------------------------------------------------------
>
>                 Key: HADOOP-15864
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15864
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: He Xiaoqiao
>            Assignee: He Xiaoqiao
>            Priority: Critical
>         Attachments: HADOOP-15864-branch.2.7.001.patch, HADOOP-15864-branch.2.7.002.patch, HADOOP-15864.003.patch
>
>
> Job submission and task execution fail if the Standby NameNode's domain name cannot be resolved on an HDFS HA cluster with the DelegationToken feature.
> This issue is triggered when creating a {{ConfiguredFailoverProxyProvider}} instance, which invokes {{HAUtil.cloneDelegationTokenForLogicalUri}} in HA mode with security enabled. In HDFS HA mode the UGI needs to include a separate token for each NameNode in order to handle an Active-Standby switch, and those tokens of course have the same content.
> However, when {{HAUtil.cloneDelegationTokenForLogicalUri}} calls #setTokenService, it checks whether the NameNode's address has been resolved. If not, it throws an #IllegalArgumentException, and the job submitter / task executor fails.
> HDFS-8068 and HADOOP-12125 attempted to fix this, but I don't think those two tickets resolve it completely.
> Another question many people raise is why a NameNode domain name would fail to resolve. I think there are many scenarios, for instance replacing a node after a fault, or refreshing DNS. Anyway, in my opinion a Standby NameNode failure should not impact Hadoop cluster stability.
> a. code ref: org.apache.hadoop.security.SecurityUtil lines 373-386
> {code:java}
>   public static Text buildTokenService(InetSocketAddress addr) {
>     String host = null;
>     if (useIpForTokenService) {
>       if (addr.isUnresolved()) { // host has no ip address
>         throw new IllegalArgumentException(
>             new UnknownHostException(addr.getHostName())
>         );
>       }
>       host = addr.getAddress().getHostAddress();
>     } else {
>       host = StringUtils.toLowerCase(addr.getHostName());
>     }
>     return new Text(host + ":" + addr.getPort());
>   }
> {code}
> b. exception log ref:
> {code:xml}
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Couldn't create proxy provider class org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
>     at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:515)
>     at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:170)
>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:761)
>     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:691)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:150)
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2713)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2747)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2729)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
>     at org.apache.hadoop.fs.viewfs.ChRootedFileSystem.<init>(ChRootedFileSystem.java:106)
>     at org.apache.hadoop.fs.viewfs.ViewFileSystem$1.getTargetFileSystem(ViewFileSystem.java:178)
>     at org.apache.hadoop.fs.viewfs.ViewFileSystem$1.getTargetFileSystem(ViewFileSystem.java:172)
>     at org.apache.hadoop.fs.viewfs.InodeTree.createLink(InodeTree.java:303)
>     at org.apache.hadoop.fs.viewfs.InodeTree.<init>(InodeTree.java:377)
>     at org.apache.hadoop.fs.viewfs.ViewFileSystem$1.<init>(ViewFileSystem.java:172)
>     at org.apache.hadoop.fs.viewfs.ViewFileSystem.initialize(ViewFileSystem.java:172)
>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2713)
>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:93)
>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2747)
>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2729)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:176)
>     at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:665)
>     ... 35 more
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown Source)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>     at org.apache.hadoop.hdfs.NameNodeProxies.createFailoverProxyProvider(NameNodeProxies.java:498)
>     ... 58 more
> Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: standbynamenode
>     at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:390)
>     at org.apache.hadoop.security.SecurityUtil.setTokenService(SecurityUtil.java:369)
>     at org.apache.hadoop.hdfs.HAUtil.cloneDelegationTokenForLogicalUri(HAUtil.java:317)
>     at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.<init>(ConfiguredFailoverProxyProvider.java:132)
>     at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.<init>(ConfiguredFailoverProxyProvider.java:84)
>     ... 62 more
> Caused by: java.net.UnknownHostException: standbynamenode
>     ... 67 more
> {code}
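To make the failure mode described in this issue concrete, here is a minimal, dependency-free Java sketch. It mirrors the quoted SecurityUtil.buildTokenService logic, but takes the useIpForTokenService flag as a parameter and returns a String instead of Hadoop's Text. InetSocketAddress.createUnresolved stands in for the Standby NameNode ("standbynamenode") whose name does not resolve; the address 10.0.0.1 and port 8020 are placeholder values. The class TokenServiceSketch and the method buildServicesSkippingUnresolved are hypothetical. The second method only illustrates one possible direction (skip an unresolvable NameNode instead of failing the whole call). It is not what the attached patches or HDFS-8068/HADOOP-12125 actually do.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class TokenServiceSketch {

    // Mirrors the quoted SecurityUtil.buildTokenService, but takes the
    // useIpForTokenService flag as a parameter and returns a String instead
    // of org.apache.hadoop.io.Text so that no Hadoop jars are needed.
    static String buildTokenService(InetSocketAddress addr, boolean useIpForTokenService) {
        String host;
        if (useIpForTokenService) {
            if (addr.isUnresolved()) { // host has no IP address
                throw new IllegalArgumentException(
                        new UnknownHostException(addr.getHostName()));
            }
            host = addr.getAddress().getHostAddress();
        } else {
            // Hadoop's StringUtils.toLowerCase lower-cases with Locale.ENGLISH.
            host = addr.getHostName().toLowerCase(Locale.ENGLISH);
        }
        return host + ":" + addr.getPort();
    }

    // Hypothetical tolerant variant: builds one token service per NameNode
    // address and skips any address that is unresolved when IP-based services
    // are required, instead of letting that single address abort the call.
    static List<String> buildServicesSkippingUnresolved(
            List<InetSocketAddress> nnAddrs, boolean useIpForTokenService) {
        List<String> services = new ArrayList<>();
        for (InetSocketAddress addr : nnAddrs) {
            if (useIpForTokenService && addr.isUnresolved()) {
                // A real fix would log a warning through the project's logger.
                System.err.println("Skipping unresolved NameNode " + addr.getHostName());
                continue;
            }
            services.add(buildTokenService(addr, useIpForTokenService));
        }
        return services;
    }

    public static void main(String[] args) {
        // An IP literal is parsed directly, so this needs no DNS lookup.
        InetSocketAddress active = new InetSocketAddress("10.0.0.1", 8020);
        // createUnresolved never attempts resolution, so it simulates an SBN
        // whose domain name cannot be resolved.
        InetSocketAddress standby = InetSocketAddress.createUnresolved("standbynamenode", 8020);

        try {
            buildTokenService(standby, true);
        } catch (IllegalArgumentException e) {
            // The message is the cause's toString(), matching the log line:
            // java.net.UnknownHostException: standbynamenode
            System.out.println("strict: " + e.getMessage());
        }
        // Only the resolvable NameNode contributes a token service here.
        System.out.println("tolerant: "
                + buildServicesSkippingUnresolved(Arrays.asList(active, standby), true));
    }
}
```

Per the quoted code, the exception is only reachable on the useIpForTokenService path, which Hadoop controls with hadoop.security.token.service.use_ip (default true). With host-based token services, buildTokenService(standby, false) in the sketch returns "standbynamenode:8020" without any lookup. Whether skipping a NameNode is safe depends on how the client later selects a token for that NameNode after a failover, which this sketch does not model.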