Re: Monitoring Slow job.
Hi all,

Thanks Vitthal, your reply led me to the solution. Here is what I did; hopefully it counts as a contribution back to the community.

1. ./hadoop job -list all | awk '{ if($2==1) print $1 }' --- to get the list of running JobIDs
2. ./hadoop job -status JobID --- gives the job file: hdfs://xxx.xx.xxx/zz/aa/job.xml
3. ./hadoop fs -cat hdfs://xxx.xx.xxx/zz/aa/job.xml --- from here we get mapred.output.dir
4. ./hadoop job -history mapred.output.dir --- gives "Launched At:"
5. Get the time difference between "Launched At:" and now.

It would be much easier to just scrape http://jobtracker:50030/jobdetails.jsp?jobid=jobId, and there might be some more elegant way of getting this; it's just how I did it this time.

Cheers,
P

On Mon, Oct 3, 2011 at 5:41 PM, Vitthal Suhas Gogate gog...@hortonworks.com wrote:
> I am not sure there is an easy way to get what you want on the command line...
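Step 5 can be scripted with GNU date. A minimal sketch, assuming the "Launched At:" line format shown in the history output in this thread; the sample line is hardcoded here in place of the real `./hadoop job -history` call, and the 17:31:17 timestamp is treated as UTC for reproducibility:

```shell
#!/bin/sh
# Step 5 sketch: how long has the job been running, given the
# "Launched At:" line from `hadoop job -history <mapred.output.dir>`?
# Hardcoded sample; in the real script it would come from something like
#   launched_line=$(./hadoop job -history "$outdir" | grep '^Launched At:')
launched_line='Launched At: 3-Oct-2011 17:31:17 (0sec)'

# Strip the label and the trailing "(...)" duration, leaving the timestamp,
# then turn "3-Oct-2011 17:31:17" into "3 Oct 2011 17:31:17" for GNU date.
ts=$(echo "$launched_line" | sed 's/^Launched At: //; s/ (.*$//' | tr '-' ' ')

launch_epoch=$(date -u -d "$ts" +%s)
now_epoch=$(date -u +%s)
elapsed_hours=$(( (now_epoch - launch_epoch) / 3600 ))

echo "launched at epoch $launch_epoch, running for ${elapsed_hours}h"
```

Comparing elapsed_hours against the X-hours threshold is then a plain integer test.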
Monitoring Slow job.
Hi Hadoopers,

I am writing a script to detect whether any running job has been running longer than X hours. So far, I use

./hadoop job -jt jobtracker:port -list all | awk '{ if($2==1) print $1 }' --- to get the list of running JobIDs

and I am looking for a way to get how long a job has been running, given its JobID. In the web admin page (http://job:50030) we can see the duration of each job pretty easily from "Started at:" / "Running for:" of each running job. How do we get such information from the command line?

Hope this makes sense. Thank you in advance,
-P
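For reference, the awk filter above keys off the second column of `hadoop job -list all`, where state 1 means RUNNING (in the 0.20-era JobStatus codes: 1=RUNNING, 2=SUCCEEDED, 3=FAILED, 4=PREP, 5=KILLED). A quick sanity check against mocked list output; the rows below are illustrative, not from a real cluster:

```shell
#!/bin/sh
# Mocked `hadoop job -list all` rows: JobId, State, StartTime, UserName.
# Only job state (field 2) matters here; other fields are placeholders.
list_output='job_201110031726_0001 1 1317688277686 vgogate
job_201110031726_0002 2 1317685000000 vgogate
job_201110031726_0003 1 1317689000000 vgogate'

# Same filter as above: keep rows whose 2nd field is 1, print the JobId.
running=$(echo "$list_output" | awk '{ if ($2 == 1) print $1 }')
echo "$running"
```

Here the filter keeps jobs _0001 and _0003 and drops the succeeded _0002.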
Re: Monitoring Slow job.
I am not sure there is an easy way to get what you want on the command line. One option is to use the following command, which gives you the verbose job history, where you can find the Submit, Launch, and Finish times (including the duration on the Finished At line). I am using the hadoop-0.20.205.0 branch, so check whether some such option exists for the version of Hadoop you are using. I am pasting sample output for my wordcount program:

bin/hadoop job -history job_output_directory_on_hdfs

==
horton-mac:hadoop-0.20.205.0 vgogate$ bin/hadoop job -history output
Warning: $HADOOP_HOME is deprecated.

Hadoop job: 0001_1317688277686_vgogate
=
Job tracker host name: job
job tracker start time: Sun May 16 08:53:51 PDT 1976
User: vgogate
JobName: word count
JobConf: hdfs://horton-mac.local:54310/tmp/mapred/staging/vgogate/.staging/job_201110031726_0001/job.xml
Submitted At: 3-Oct-2011 17:31:17
Launched At: 3-Oct-2011 17:31:17 (0sec)
Finished At: 3-Oct-2011 17:31:50 (32sec)
Status: SUCCESS

Counters:

|Group Name|Counter name|Map Value|Reduce Value|Total Value|
|---|---|---|---|---|
|Job Counters|Launched reduce tasks|0|0|1|
|Job Counters|SLOTS_MILLIS_MAPS|0|0|12,257|
|Job Counters|Total time spent by all reduces waiting after reserving slots (ms)|0|0|0|
|Job Counters|Total time spent by all maps waiting after reserving slots (ms)|0|0|0|
|Job Counters|Launched map tasks|0|0|1|
|Job Counters|Data-local map tasks|0|0|1|
|Job Counters|SLOTS_MILLIS_REDUCES|0|0|10,082|
|File Output Format Counters|Bytes Written|0|61,192|61,192|
|FileSystemCounters|FILE_BYTES_READ|0|70,766|70,766|
|FileSystemCounters|HDFS_BYTES_READ|112,056|0|112,056|
|FileSystemCounters|FILE_BYTES_WRITTEN|92,325|92,294|184,619|
|FileSystemCounters|HDFS_BYTES_WRITTEN|0|61,192|61,192|
|File Input Format Counters|Bytes Read|111,933|0|111,933|
|Map-Reduce Framework|Reduce input groups|0|2,411|2,411|
|Map-Reduce Framework|Map output materialized bytes|70,766|0|70,766|
|Map-Reduce Framework|Combine output records|2,411|0|2,411|
|Map-Reduce Framework|Map input records|2,643|0|2,643|
|Map-Reduce Framework|Reduce shuffle bytes|0|0|0|
|Map-Reduce Framework|Reduce output records|0|2,411|2,411|
|Map-Reduce Framework|Spilled Records|2,411|2,411|4,822|
|Map-Reduce Framework|Map output bytes|120,995|0|120,995|
|Map-Reduce Framework|Combine input records|5,849|0|5,849|
|Map-Reduce Framework|Map output records|5,849|0|5,849|
|Map-Reduce Framework|SPLIT_RAW_BYTES|123|0|123|
|Map-Reduce Framework|Reduce input records|0|2,411|2,411|

=
Task Summary

|Kind|Total|Successful|Failed|Killed|StartTime|FinishTime|
|---|---|---|---|---|---|---|
|Setup|1|1|0|0|3-Oct-2011 17:31:20|3-Oct-2011 17:31:24 (4sec)|
|Map|1|1|0|0|3-Oct-2011 17:31:26|3-Oct-2011 17:31:30 (4sec)|
|Reduce|1|1|0|0|3-Oct-2011 17:31:32|3-Oct-2011 17:31:42 (10sec)|
|Cleanup|1|1|0|0|3-Oct-2011 17:31:44|3-Oct-2011 17:31:48 (4sec)|

Analysis
=
Time taken by best performing map task task_201110031726_0001_m_00: 4sec
Average time taken by map tasks: 4sec
Worse performing map tasks:

|TaskId|Timetaken|
|---|---|
|task_201110031726_0001_m_00|4sec|

The last map task task_201110031726_0001_m_00 finished at (relative to the Job launch time): 3-Oct-2011 17:31:30 (12sec)

Time taken by best performing shuffle task task_201110031726_0001_r_00: 7sec
Average time taken by shuffle tasks: 7sec
Worse performing shuffle tasks:

|TaskId|Timetaken|
|---|---|
|task_201110031726_0001_r_00|7sec|

The last shuffle task task_201110031726_0001_r_00 finished at (relative to the Job launch time): 3-Oct-2011 17:31:39 (21sec)

Time taken by best performing reduce task task_201110031726_0001_r_00: 2sec
Average time taken by reduce tasks: 2sec
Worse performing reduce tasks:

|TaskId|Timetaken|
|---|---|
|task_201110031726_0001_r_00|2sec|

The last reduce task task_201110031726_0001_r_00 finished at (relative to the Job launch time): 3-Oct-2011 17:31:42
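Since the duration is already printed in parentheses on the "Finished At:" line, a one-line sed can pull it out of the history output. A sketch against the excerpt above, with the history text hardcoded in place of the real `bin/hadoop job -history output` call:

```shell
#!/bin/sh
# Excerpt of the `hadoop job -history` output shown above.
history_out='Submitted At: 3-Oct-2011 17:31:17
Launched At: 3-Oct-2011 17:31:17 (0sec)
Finished At: 3-Oct-2011 17:31:50 (32sec)
Status: SUCCESS'

# Pull the parenthesized duration off the "Finished At:" line.
duration=$(echo "$history_out" | sed -n 's/^Finished At: .*(\(.*\))$/\1/p')
echo "$duration"   # 32sec
```

For a still-running job there is no "Finished At:" line yet, so a monitoring script would fall back to the "Launched At:" timestamp and compute the elapsed time itself.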