Hi All:
I've got 0.17.0 set up on a 7-node grid (6 slaves with datanodes, 1 master running
the namenode). I'm trying to process a small (180 GB) dataset. I've done this
successfully and painlessly running 0.15.0. When I run 0.17.0 with the same
data and the same code (with the API changes for 0.17.0 and recompiled, of course),
I get a large number of task failures. I've tried increasing the number of namenode
threads to resolve this, but that doesn't seem to help. The errors are of the
following flavor:
java.io.IOException: Could not get block locations. Aborting...
java.io.IOException: All datanodes 10.2.11.2:50010 are bad. Aborting...
Exception in thread "Thread-2" java.util.ConcurrentModificationException
Exception closing file /blah/_temporary/_task_200807052311_0001_r_000004_0/baz/part-xxxxx
As things stand right now, I can't deploy to 0.17.0 (or 0.16.4 or 0.17.1). I
am wondering if anybody can shed some light on this, or if others are having
similar problems.
Any thoughts, insights, etc. would be greatly appreciated.
Thanks,
C G
Here's an ugly trace:
08/07/06 01:43:29 INFO mapred.JobClient: map 100% reduce 93%
08/07/06 01:43:29 INFO mapred.JobClient: Task Id : task_200807052311_0001_r_000003_0, Status : FAILED
java.io.IOException: Could not get block locations. Aborting...
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2080)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.java:1702)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1818)
task_200807052311_0001_r_000003_0: Exception closing file /output/_temporary/_task_200807052311_0001_r_000003_0/a/b/part-00003
task_200807052311_0001_r_000003_0: java.io.IOException: All datanodes 10.2.11.2:50010 are bad. Aborting...
task_200807052311_0001_r_000003_0:     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2095)
task_200807052311_0001_r_000003_0:     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.java:1702)
task_200807052311_0001_r_000003_0:     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1818)
task_200807052311_0001_r_000003_0: Exception in thread "Thread-2" java.util.ConcurrentModificationException
task_200807052311_0001_r_000003_0:     at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
task_200807052311_0001_r_000003_0:     at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
task_200807052311_0001_r_000003_0:     at org.apache.hadoop.dfs.DFSClient.close(DFSClient.java:217)
task_200807052311_0001_r_000003_0:     at org.apache.hadoop.dfs.DistributedFileSystem.close(DistributedFileSystem.java:214)
task_200807052311_0001_r_000003_0:     at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1324)
task_200807052311_0001_r_000003_0:     at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:224)
task_200807052311_0001_r_000003_0:     at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:209)
08/07/06 01:44:32 INFO mapred.JobClient: map 100% reduce 74%
08/07/06 01:44:32 INFO mapred.JobClient: Task Id : task_200807052311_0001_r_000001_0, Status : FAILED
java.io.IOException: Could not get block locations. Aborting...
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2080)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.java:1702)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1818)
task_200807052311_0001_r_000001_0: Exception in thread "Thread-2" java.util.ConcurrentModificationException
task_200807052311_0001_r_000001_0:     at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
task_200807052311_0001_r_000001_0:     at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
task_200807052311_0001_r_000001_0:     at org.apache.hadoop.dfs.DFSClient.close(DFSClient.java:217)
task_200807052311_0001_r_000001_0:     at org.apache.hadoop.dfs.DistributedFileSystem.close(DistributedFileSystem.java:214)
task_200807052311_0001_r_000001_0:     at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:1324)
task_200807052311_0001_r_000001_0:     at org.apache.hadoop.fs.FileSystem.closeAll(FileSystem.java:224)
task_200807052311_0001_r_000001_0:     at org.apache.hadoop.fs.FileSystem$ClientFinalizer.run(FileSystem.java:209)
08/07/06 01:44:45 INFO mapred.JobClient: map 100% reduce 54%
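For reference, the ConcurrentModificationException at the bottom of each reduce-task trace is thrown by the fail-fast iterator of java.util.TreeMap (the TreeMap$KeyIterator.next frame): the iterator throws once it detects that the map was structurally modified by something other than the iterator itself. The sketch below reproduces that behavior in isolation, single-threaded, and shows the usual workaround of iterating over a snapshot of the keys. The class name, method names and keys are hypothetical placeholders. This is not DFSClient's code, and the trace alone doesn't show what else modified the map DFSClient.close() was walking; the FileSystem$ClientFinalizer frames only suggest it happened while cached filesystems were being closed at shutdown.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.TreeMap;

// Hypothetical demo class; not part of Hadoop.
class FailFastDemo {

    // Returns true if removing entries from the map (not via the iterator)
    // while iterating its key set triggers ConcurrentModificationException.
    static boolean iterateWhileModifying() {
        TreeMap<String, Integer> openFiles = new TreeMap<>();
        openFiles.put("part-00001", 1);
        openFiles.put("part-00003", 3);
        Iterator<String> it = openFiles.keySet().iterator();
        try {
            while (it.hasNext()) {
                String key = it.next();
                // Structural change made directly on the map: the iterator's
                // expected modification count no longer matches, so the next
                // call to it.next() throws.
                openFiles.remove(key);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Workaround sketch: iterate over a copy of the keys, so removing
    // entries from the map cannot invalidate the iteration.
    static int iterateOverSnapshot() {
        TreeMap<String, Integer> openFiles = new TreeMap<>();
        openFiles.put("part-00001", 1);
        openFiles.put("part-00003", 3);
        int closed = 0;
        for (String key : new ArrayList<>(openFiles.keySet())) {
            openFiles.remove(key);
            closed++;
        }
        return closed;
    }

    public static void main(String[] args) {
        System.out.println(iterateWhileModifying()); // true
        System.out.println(iterateOverSnapshot());   // 2
    }
}
```

In a multithreaded setting, copying the keys is not sufficient by itself; the map also has to be guarded (for example, by synchronizing on it) so another thread cannot modify it while the snapshot is taken.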