I am running terasort on 10^12 bytes on a 512 node hadoop cluster.

There is something funny about the timings, that I am unable to explain. 
Probably something trivial, but not visible to my naked eye !

Here are the details - 

I am using CDH3B3

The job started at 14:47 on 12/10/2010

10/12/10 14:57:18 INFO mapred.JobClient: Running job: job_201012101121_0010

I am looking at the taskTracker log for this job

Here are the 4 relevant lines

Task TASKID="task_201012101121_0010_m_000087" TASK_TYPE="MAP" 
START_TIME="1292021852907" 
SPLITS="/default-rack/hadoop-376,/default-rack/hadoop-444,/default-rack/hadoop-264"
 .

MapAttempt TASK_TYPE="MAP" TASKID="task_201012101121_0010_m_000087" 
TASK_ATTEMPT_ID="attempt_201012101121_0010_m_000087_0" 
START_TIME="1291891739732" 
TRACKER_NAME="tracker_hadoop-444:localhost\.localdomain/127\.0\.0\.1:51062" 
HTTP_PORT="50060" .

MapAttempt TASK_TYPE="MAP" TASKID="task_201012101121_0010_m_000087" 
TASK_ATTEMPT_ID="attempt_201012101121_0010_m_000087_0" TASK_STATUS="SUCCESS" 
FINISH_TIME="1291891849493" HOSTNAME="/default-rack/hadoop-444" STATE_STRING="" 
COUNTERS="{(Fil.... 


Task TASKID="task_201012101121_0010_m_000087" TASK_TYPE="MAP" 
TASK_STATUS="SUCCESS" FINISH_TIME="1292021964098" 
COUNTERS="{(FileSystemCounters)(FileSystemCounters)[(FILE_BYTES_READ)(FILE_BYTES_READ)(97220263)][(HDFS_BYTES_READ)(HDFS_BYTES_READ)(105263208)][(FILE_BYTES_WRITTEN)(FILE_BYTES_WRITTEN)(39496455)]}{(org\.apache\.hadoop\.mapred\.Task$Counter)(Map-Reduce
 Framework)[(COMBINE_OUTPUT_RECORDS)(Combine output 
records)(0)][(MAP_INPUT_RECORDS)(Map input 
records)(1052631)][(SPILLED_RECORDS)(Spilled 
Records)(2105262)][(MAP_OUTPUT_BYTES)(Map output byte....


Here are the four time stamps logged and the human readable format 

 Task                Start    1292021852907 -  Fri Dec 10 14:57:32 PST 2010
 MapAttempt     Start - 1291891739732 -  Thu Dec 09 02:48:59 PST 2010
 MapAttempt     End    1291891849493 -  Thu Dec 09 02:50:49 PST 2010
 Task                End     1292021964098 -  Fri Dec 10 14:59:24 PST 2010

The clocks on hadoop-1 (TT) and Hadoop-444 ( the node on which this task ran) 
are off by a few seconds. ( about 7)


Any reason why the Task Timings and the MapAttempt Timings are so off?

Raj

P.S: Apologize for the long post and posting to multiple locations.

Reply via email to