Hi, we sometimes see tasks fail with the exception below. There are no network issues and the domain name resolves normally. All nodes also run a local DNS caching daemon. Any idea why we see this error? It usually happens when more than one job is running on the cluster.
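To check whether lookups really fail intermittently under load, a small probe along these lines could be run on a task node while several jobs are active. This is only a sketch: the class name ResolveProbe and the default of 1000 attempts are made up for illustration, and it assumes that setting the JVM address cache TTLs to 0 before the first lookup makes each call go to the system resolver instead of the JVM's own cache.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;

// Hypothetical diagnostic, not part of Hadoop.
class ResolveProbe {

    // Resolves `host` `attempts` times and returns how many lookups failed.
    static int countFailures(String host, int attempts) {
        int failures = 0;
        for (int i = 0; i < attempts; i++) {
            try {
                InetAddress.getByName(host);
            } catch (UnknownHostException e) {
                failures++;
                System.err.println("attempt " + i + " failed: " + e.getMessage());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Assumption: disabling the JVM's positive and negative address
        // caches here, before any lookup, forces a fresh lookup each time.
        Security.setProperty("networkaddress.cache.ttl", "0");
        Security.setProperty("networkaddress.cache.negative.ttl", "0");

        String host = args.length > 0 ? args[0] : "namenode";
        int attempts = args.length > 1 ? Integer.parseInt(args[1]) : 1000;

        int failures = countFailures(host, attempts);
        System.out.println(failures + " of " + attempts
                + " lookups failed for " + host);
    }
}
```

If this reports failures only while jobs are running, the problem is more likely in the local resolver or caching daemon than in Hadoop itself.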
We could, of course, add all nodes to /etc/hosts, but I would prefer not to.

java.net.UnknownHostException: unknown host: namenode
    at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:214)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1192)
    at org.apache.hadoop.ipc.Client.call(Client.java:1046)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at $Proxy2.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:118)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:222)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:187)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1328)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:65)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1346)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:244)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:122)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:254)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

Thanks