Sandeep,

Is the same DN (DataNode), 10.0.25.149, reported across all the failures?
Also, do you notice any pattern in which machines the failed tasks ran on,
i.e. are they clumped repeatedly on one or a few particular TTs (TaskTrackers)?
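
If it helps to check the first question quickly, here is a rough sketch in
Python that tallies which DataNode IPs show up in the DFSClient messages. It
assumes the task logs contain lines worded like the ones you quoted
("Exception in createBlockOutputStream <ip>:<port>" and "Excluding datanode
<ip>:<port>"); the function name count_datanodes and the idea of passing log
file paths on the command line are only for illustration, not anything
Hadoop ships.

```python
import re
import sys
from collections import Counter

# Patterns modelled on the DFSClient messages quoted in this thread.
# Assumption: other Hadoop versions may word these lines differently.
FAILED_DN = re.compile(
    r"Exception in createBlockOutputStream\s+(\d+\.\d+\.\d+\.\d+):\d+")
EXCLUDED_DN = re.compile(
    r"Excluding datanode\s+(\d+\.\d+\.\d+\.\d+):\d+")


def count_datanodes(log_text):
    """Return two Counters keyed by DataNode IP: pipeline failures, exclusions."""
    failed = Counter(FAILED_DN.findall(log_text))
    excluded = Counter(EXCLUDED_DN.findall(log_text))
    return failed, excluded


if __name__ == "__main__":
    total_failed, total_excluded = Counter(), Counter()
    # Each argument is a path to a task log file collected from a node.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            failed, excluded = count_datanodes(fh.read())
        total_failed.update(failed)
        total_excluded.update(excluded)
    for ip, n in total_failed.most_common():
        print(f"{ip}: {n} createBlockOutputStream failures, "
              f"{total_excluded[ip]} exclusions")
```

If a single IP dominates the counts, that points at one DN; if the counts are
spread evenly over all five nodes, the problem is more likely cluster-wide.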

On Tue, May 22, 2012 at 7:32 PM, Sandeep Reddy P
<sandeepreddy.3...@gmail.com> wrote:
> Hi,
> We have a 5-node CDH3u4 cluster running. When I try to run teragen/terasort,
> some of the map tasks end up Failed/Killed, and the logs show a similar
> error on all machines.
>
> 2012-05-22 09:43:50,831 INFO org.apache.hadoop.hdfs.DFSClient:
> Exception in createBlockOutputStream 10.0.25.149:50010
> java.net.SocketTimeoutException: 69000 millis timeout while waiting
> for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/10.0.25.149:55835
> remote=/10.0.25.149:50010]
> 2012-05-22 09:44:25,968 INFO org.apache.hadoop.hdfs.DFSClient:
> Abandoning block blk_7260720956806950576_1825
> 2012-05-22 09:44:25,973 INFO org.apache.hadoop.hdfs.DFSClient:
> Excluding datanode 10.0.25.149:50010
> 2012-05-22 09:46:36,350 WARN org.apache.hadoop.mapred.Task: Parent
> died.  Exiting attempt_201205211504_0007_m_000016_1.
>
>
>
> Are these kinds of errors common? At least 1 map task is failing for the
> above reason on every machine. We are using 24 mappers for teragen.
> It took 3 hrs 44 min 17 sec to generate 50 GB of data with 24 mappers,
> with 17 failed / 8 killed task attempts.
>
> Generating 5 GB of data took 24 min 10 sec with 24 mappers and 9 killed
> task attempts. The cluster works well for small datasets.



-- 
Harsh J
