I was just asking because I had got to the point where all the map() tasks were done, and I had configured one machine in the cluster to run 3 reduce() tasks, but that was too much for it: everything else had finished, only those 3 tasks needed to complete, but with all 3 running at the same time they would all crash with the same error message. I changed the configuration so that machine now runs only one reduce() task. I have a very heterogeneous cluster (one server, for instance, is 4/5 years old), so I keep a specific hadoop-site.xml for each machine.
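For reference, the per-machine override looks roughly like this (a minimal sketch, assuming the stock mapred.tasktracker.reduce.tasks.maximum property of this Hadoop generation; only the relevant setting is shown):

  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
    <description>Run at most one reduce task at a time on this TaskTracker.</description>
  </property>

The map-side equivalent is mapred.tasktracker.map.tasks.maximum, so a slow machine can be throttled independently for each phase.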
Just one more question: does Hadoop handle reassigning failed tasks to different machines in some way? I saw that sometimes, usually at the end, when there are more "processing units" available than map() tasks left to process, the same map() task may be processed twice, and one attempt is killed when the other finishes first. Thanks for the reply and for the further links on the Java exception.

--
./david

2009/5/6 Steve Loughran <ste...@apache.org>:
> Tom White wrote:
>>
>> Hi David,
>>
>> The MapReduce framework will attempt to rerun failed tasks
>> automatically. However, if a task is running out of memory on one
>> machine, it's likely to run out of memory on another, isn't it? Have a
>> look at the mapred.child.java.opts configuration property for the
>> amount of memory that each task VM is given (200MB by default). You
>> can also control the memory that each daemon gets using the
>> HADOOP_HEAPSIZE variable in hadoop-env.sh. Or you can specify it on a
>> per-daemon basis using the HADOOP_<DAEMON_NAME>_OPTS variables in the
>> same file.
>>
>> Tom
>
> This looks not so much like a VM out-of-memory problem as OS thread
> provisioning. ulimit may be useful, as is the java -Xss option.
>
> http://candrews.integralblue.com/2009/01/preventing-outofmemoryerror-native-thread/
>
>> On Wed, May 6, 2009 at 1:28 AM, David Batista <dsbati...@gmail.com> wrote:
>>>
>>> I get this error when running Reduce tasks on a machine:
>>>
>>> java.lang.OutOfMemoryError: unable to create new native thread
>>>         at java.lang.Thread.start0(Native Method)
>>>         at java.lang.Thread.start(Thread.java:597)
>>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2591)
>>>         at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:454)
>>>         at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:190)
>>>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:487)
>>>         at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:387)
>>>         at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:117)
>>>         at org.apache.hadoop.mapred.lib.MultipleTextOutputFormat.getBaseRecordWriter(MultipleTextOutputFormat.java:44)
>>>         at org.apache.hadoop.mapred.lib.MultipleOutputFormat$1.write(MultipleOutputFormat.java:99)
>>>         at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:410)
>>>
>>> Is it possible to move a reduce task to another machine in the cluster
>>> on the fly?
>>>
>>> --
>>> ./david
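A follow-up note on the duplicate map() tasks described above: that behaviour matches Hadoop's speculative execution, where the framework launches a backup attempt of a straggling task on a free slot and kills whichever copy finishes second. If it gets in the way, it can be disabled per job; a minimal sketch, assuming the stock property names of this Hadoop generation:

  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>false</value>
  </property>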
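And on the native-thread error itself, pulling together the two suggestions above: the child task VMs (where the DFSClient threads in the stack trace are created) take their JVM flags from mapred.child.java.opts, so both the heap ceiling Tom mentions and a smaller per-thread stack size in the spirit of Steve's -Xss pointer can be set there, while the OS-level ceiling is the per-user process/thread ulimit. A sketch under those assumptions; the values are illustrative only, not recommendations:

  <!-- hadoop-site.xml: smaller thread stacks let more native threads
       fit alongside the heap in each task VM's address space -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m -Xss128k</value>
  </property>

  # shell, before starting the TaskTracker (child JVMs inherit the limit;
  # on Linux, threads count against the -u processes limit):
  ulimit -u 16384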