Identifying why a task is taking long on a given hadoop node

2011-06-03 Thread Mayuresh
Hi,

I am really having a hard time debugging this. I have a Hadoop cluster and
one of the map tasks is taking a long time. I checked the "datanode" logs and
can see no activity for around 10 minutes!

2011-06-03 10:09:06,772 DEBUG
org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
10.240.222.218:50010,
storageID=DS-1909388466-10.240.222.218-50010-1307002238331, infoPort=50075,
ipcPort=50020):Number of active connections is: 2
2011-06-03 10:19:41,033 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block
blk_-9115985339102075853_6140 file
/mnt/hadoop/hadoop-hadoop/dfs/data/current/blk_-9115985339102075853

The task running on this node gets killed finally.


Job: attempt_201106011013_0023_m_13_0
Task attempt machine: /default-rack/domU-12-31-39-04-D9-2C.compute-1.internal
Cleanup attempt machine: /default-rack/domU-12-31-39-04-D9-2C.compute-1.internal
Status: KILLED, Progress: 100.00%, Start: 3-Jun-2011 10:09:41, Finish: 3-Jun-2011 10:19:26 (9mins, 44sec)

I added this node to the cluster only recently, and my theory is that it does
not have the required data locally, so it has to pull it over the network. How
do I prove that this is what is happening here? I cannot see anything in the
log. Is there a way to identify the blocks the map was reading and verify that
they were not present on this machine when it ran?
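
Would something like the following work? Assuming the map reads a plain HDFS
file, it lists which datanodes hold each block of the input path (the same
information can also be dumped with "hadoop fsck <path> -files -blocks
-locations"). It is only a sketch; the class name and the input-path argument
are placeholders:

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder: pass the job's input file as the first argument.
    Path input = new Path(args[0]);
    FileSystem fs = FileSystem.get(new Configuration());

    FileStatus status = fs.getFileStatus(input);
    // One entry per block, with the hosts holding a replica of that block.
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

    for (BlockLocation block : blocks) {
      System.out.println("offset=" + block.getOffset()
          + " length=" + block.getLength()
          + " hosts=" + Arrays.toString(block.getHosts()));
    }
  }
}

If the new node's hostname never shows up for the split this attempt was
assigned, the map presumably read its input over the network; the job's
"Data-local map tasks" and "Rack-local map tasks" counters on the JobTracker
page should tell the same story.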

Thanks in advance.

Regards,
Mayuresh


Re: Identifying why a task is taking long on a given hadoop node

2011-06-05 Thread Steve Loughran

On 03/06/2011 12:24, Mayuresh wrote:

Hi,

I am really having a hard time debugging this. I have a Hadoop cluster and
one of the map tasks is taking a long time. I checked the "datanode" logs and
can see no activity for around 10 minutes!



The usual cause here is imminent disk failure, as reads start to take 
longer and longer. Look at your SMART disk logs and run some performance 
tests on all the drives.
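
For the SMART side, smartmontools' "smartctl -a /dev/<device>" is the usual
starting point. For a quick relative comparison of the drives, a crude
sequential-read timing is often enough; the sketch below just streams one
large existing file per drive (a block file under dfs/data/current will do)
and prints throughput. It is only a sketch, and the paths are whatever you
point it at:

import java.io.FileInputStream;
import java.io.InputStream;

public class DriveReadTest {
  public static void main(String[] args) throws Exception {
    byte[] buf = new byte[1024 * 1024]; // 1 MB read buffer
    for (String path : args) {
      long bytes = 0;
      long start = System.nanoTime();
      InputStream in = new FileInputStream(path);
      try {
        // Stream the whole file sequentially, counting bytes read.
        int n;
        while ((n = in.read(buf)) != -1) {
          bytes += n;
        }
      } finally {
        in.close();
      }
      double seconds = (System.nanoTime() - start) / 1e9;
      System.out.printf("%s: %.1f MB in %.1f s (%.1f MB/s)%n",
          path, bytes / 1e6, seconds, bytes / 1e6 / seconds);
    }
  }
}

A drive that reads far slower than its peers, or whose SMART output shows
reallocated or pending sectors, is the one to suspect. Repeat runs are skewed
by the OS page cache, so use files larger than RAM or a different file each
time.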