RE: Failed to run wordcount on YARN
Hi Devaraj,

Thanks a lot for the detailed explanation.

Best Regards,
Raymond Liu

-----Original Message-----
From: Devaraj k [mailto:devara...@huawei.com]
Sent: Friday, July 12, 2013 4:24 PM
To: user@hadoop.apache.org
Subject: RE: Failed to run wordcount on YARN
RE: Failed to run wordcount on YARN
Hi Raymond,

In Hadoop 2.0.5, the new-API FileInputFormat doesn't support reading files recursively in the input directory; it supports only an input directory that directly contains files. If the input directory has any child directories, it throws the error below. Recursive reading has been added in trunk with this JIRA: https://issues.apache.org/jira/browse/MAPREDUCE-3193.

You can either give the job an input directory that has no nested directories, or use the old FileInputFormat API to read the files in the subdirectories recursively.

Thanks
Devaraj k
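For reference, a minimal sketch of the second workaround with the old (org.apache.hadoop.mapred) API. The driver class name is made up for illustration, and it assumes the mapred.input.dir.recursive switch (from MAPREDUCE-1501) is present in the old-API FileInputFormat of your build:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class RecursiveInputJob {  // hypothetical driver, for illustration only
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(RecursiveInputJob.class);
        conf.setJobName("recursive-input");
        // Ask the old-API FileInputFormat to descend into subdirectories.
        // (Assumed flag from MAPREDUCE-1501; the new mapreduce API only gains
        // an equivalent with the MAPREDUCE-3193 change mentioned above.)
        conf.setBoolean("mapred.input.dir.recursive", true);
        conf.setInputFormat(TextInputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path("/tmp"));  // dir with nested dirs
        FileOutputFormat.setOutputPath(conf, new Path("/out"));
        // Mapper/Reducer omitted: the old-API identity defaults let this run as-is;
        // plug in old-API wordcount classes for the real job.
        JobClient.runJob(conf);
    }
}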
-----Original Message-----
From: Liu, Raymond [mailto:raymond@intel.com]
Sent: 12 July 2013 12:57
To: user@hadoop.apache.org
Subject: Failed to run wordcount on YARN
Failed to run wordcount on YARN
Hi,

I have just started trying out Hadoop 2.0, using the 2.0.5-alpha package, and followed http://hadoop.apache.org/docs/r2.0.5-alpha/hadoop-project-dist/hadoop-common/ClusterSetup.html to set up a cluster in non-secure mode. HDFS works fine with the client tools, but when I run the wordcount example there are errors:

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar wordcount /tmp /out

13/07/12 15:05:53 INFO mapreduce.Job: Task Id : attempt_1373609123233_0004_m_04_0, Status : FAILED
Error: java.io.FileNotFoundException: Path is not a file: /tmp/hadoop-yarn
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:42)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1317)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1276)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1252)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1225)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:403)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:239)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40728)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:986)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:974)
        at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:157)
        at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:124)
        at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:117)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1131)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:244)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:77)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:713)
        at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:89)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:519)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)

I checked HDFS and found that /tmp/hadoop-yarn is there; this directory's owner is the same as the job user. To rule that out, I also created /tmp/hadoop-yarn on the local filesystem. None of this works. Any idea what might be the problem? Thanks!

Best Regards,
Raymond Liu
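Given the reply above, the failure is consistent with /tmp/hadoop-yarn being a subdirectory of the input path rather than a missing or unreadable file. A small sketch for spotting such subdirectories before submitting the job; the class name is illustrative, and it uses only the stock FileSystem client API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FindSubdirs {  // hypothetical helper, for illustration only
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from the classpath configs, so run it with
        // `hadoop jar` or with HADOOP_CONF_DIR pointing at the cluster config.
        FileSystem fs = FileSystem.get(new Configuration());
        Path input = new Path(args.length > 0 ? args[0] : "/tmp");
        for (FileStatus status : fs.listStatus(input)) {
            // Any directory here (e.g. /tmp/hadoop-yarn) makes the 2.0.5
            // new-API FileInputFormat fail with "Path is not a file".
            if (status.isDirectory()) {
                System.out.println("subdirectory in input: " + status.getPath());
            }
        }
    }
}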