I was just asking because I got to the point where all the map() tasks
were done, and I had configured one machine in the cluster to run 3
reduce() tasks, which was too much for it. Everything else had
finished; only those 3 tasks needed to complete, but with all 3
running at the same time they would all crash with the same error
message. I changed the configuration so that machine now runs only one
reduce() task. I have a very heterogeneous cluster (one server, for
instance, is 4 to 5 years old), so I have a specific hadoop-site.xml
for each machine.
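
In case it's useful to someone, the relevant bit of that machine's
hadoop-site.xml now looks roughly like this (assuming the 0.x property
name mapred.tasktracker.reduce.tasks.maximum, which caps the number of
simultaneous reduce slots on a TaskTracker):

    <!-- hadoop-site.xml on the old machine: allow only one reduce
         task at a time, so 3 concurrent reducers can no longer
         exhaust the machine's native threads -->
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>1</value>
    </property>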

Just one more question: does Hadoop handle reassigning failed tasks to
different machines in some way?

I saw that sometimes, usually at the end of a job, when there are more
"processing units" available than map() tasks left to process, the
same map() task may be run twice, and one attempt is killed when the
other finishes first.
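
From what I understand that duplicate scheduling is Hadoop's
speculative execution, and if it ever becomes a problem it can
apparently be switched off, assuming the 0.x property names:

    <!-- hadoop-site.xml or per-job configuration: disable the
         duplicate "backup" attempts for map and reduce tasks -->
    <property>
      <name>mapred.map.tasks.speculative.execution</name>
      <value>false</value>
    </property>
    <property>
      <name>mapred.reduce.tasks.speculative.execution</name>
      <value>false</value>
    </property>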

Thanks for the reply and for the further links on the Java exception.
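
Following the advice below, I'll also try giving the task VMs a
smaller per-thread stack so more native threads fit in the same
address space. A rough sketch of what I plan to test, keeping the
default heap that Tom mentions and adding -Xss (the 128k value is only
a guess to experiment with):

    <!-- hadoop-site.xml: keep the default 200MB heap per task VM
         but shrink each thread's stack size -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx200m -Xss128k</value>
    </property>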

--
./david



2009/5/6 Steve Loughran <ste...@apache.org>:
> Tom White wrote:
>>
>> Hi David,
>>
>> The MapReduce framework will attempt to rerun failed tasks
>> automatically. However, if a task is running out of memory on one
>> machine, it's likely to run out of memory on another, isn't it? Have a
>> look at the mapred.child.java.opts configuration property for the
>> amount of memory that each task VM is given (200MB by default). You
>> can also control the memory that each daemon gets using the
>> HADOOP_HEAPSIZE variable in hadoop-env.sh. Or you can specify it on a
>> per-daemon basis using the HADOOP_<DAEMON_NAME>_OPTS variables in the
>> same file.
>>
>> Tom
>
> This looks not so much like a VM out-of-memory problem as an OS
> thread-provisioning one. ulimit may be useful, as is the java -Xss option
>
> http://candrews.integralblue.com/2009/01/preventing-outofmemoryerror-native-thread/
>
>>
>> On Wed, May 6, 2009 at 1:28 AM, David Batista <dsbati...@gmail.com> wrote:
>>>
>>> I get this error when running Reduce tasks on a machine:
>>>
>>> java.lang.OutOfMemoryError: unable to create new native thread
>>>       at java.lang.Thread.start0(Native Method)
>>>       at java.lang.Thread.start(Thread.java:597)
>>>       at
>>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2591)
>>>       at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:454)
>>>       at
>>> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:190)
>>>       at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:487)
>>>       at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:387)
>>>       at
>>> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:117)
>>>       at
>>> org.apache.hadoop.mapred.lib.MultipleTextOutputFormat.getBaseRecordWriter(MultipleTextOutputFormat.java:44)
>>>       at
>>> org.apache.hadoop.mapred.lib.MultipleOutputFormat$1.write(MultipleOutputFormat.java:99)
>>>       at
>>> org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:410)
>>>
>>> is it possible to move a reduce task to another machine in the cluster
>>> on the fly?
>>>
>>> --
>>> ./david
>>>
>
>
