Sandeep,

Is the same DN (10.0.25.149) reported across all failures? And do you notice any machine patterns among the failed tasks (i.e., are they clumped on one or a few particular TTs repeatedly)?
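One way to answer the first question is to tally which DataNode addresses show up in the DFSClient messages across the failed attempts' logs. The following is only a rough sketch, assuming the task logs contain "Excluding datanode" and "Exception in createBlockOutputStream" lines in the same shape as the ones quoted in this thread; the function names `count_datanodes` and `count_datanodes_in_files` are hypothetical helpers, not part of Hadoop.

```python
import re
from collections import Counter

# Matches DFSClient lines shaped like the ones quoted in this thread, e.g.
#   "... Excluding datanode 10.0.25.149:50010"
#   "... Exception in createBlockOutputStream 10.0.25.149:50010 java.net..."
# Assumption: the DataNode is logged as an IPv4 address followed by ":port".
DN_PATTERN = re.compile(
    r"(?:Excluding datanode|Exception in createBlockOutputStream)\s+"
    r"(\d{1,3}(?:\.\d{1,3}){3}):\d+"
)


def count_datanodes(lines):
    """Count how often each DataNode IP appears in the matching log lines."""
    counts = Counter()
    for line in lines:
        for match in DN_PATTERN.finditer(line):
            counts[match.group(1)] += 1
    return counts


def count_datanodes_in_files(paths):
    """Aggregate the per-DataNode counts over several log files."""
    totals = Counter()
    for path in paths:
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with open(path, errors="replace") as handle:
            totals.update(count_datanodes(handle))
    return totals
```

Calling `count_datanodes_in_files(...)` on the syslog files of the failed attempts and printing `most_common()` on the result shows whether a single DataNode dominates or whether the timeouts are spread across the cluster.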
On Tue, May 22, 2012 at 7:32 PM, Sandeep Reddy P <sandeepreddy.3...@gmail.com> wrote:
> Hi,
> We have a 5node cdh3u4 cluster running. When i try to do teragen/terasort
> some of the map tasks are Failed/Killed and the logs show similar error on
> all machines.
>
> 2012-05-22 09:43:50,831 INFO org.apache.hadoop.hdfs.DFSClient:
> Exception in createBlockOutputStream 10.0.25.149:50010
> java.net.SocketTimeoutException: 69000 millis timeout while waiting
> for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/10.0.25.149:55835
> remote=/10.0.25.149:50010]
> 2012-05-22 09:44:25,968 INFO org.apache.hadoop.hdfs.DFSClient:
> Abandoning block blk_7260720956806950576_1825
> 2012-05-22 09:44:25,973 INFO org.apache.hadoop.hdfs.DFSClient:
> Excluding datanode 10.0.25.149:50010
> 2012-05-22 09:46:36,350 WARN org.apache.hadoop.mapred.Task: Parent
> died. Exiting attempt_201205211504_0007_m_000016_1.
>
> Are these kind of errors common?? Atleast 1 map task is failing due to
> above reason on all the machines.We are using 24 mappers for teragen.
> For us it took 3hrs 44min 17 sec to generate 50Gb data with 24 mappers
> and 17failed/8 killed task attempts.
>
> 24min 10 sec for 5GB data with 24 mappers and 9 killed Task attempts.
> Cluster works good for small datasets.

--
Harsh J