[ 
https://issues.apache.org/jira/browse/HDFS-459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726240#action_12726240
 ] 

Konstantin Shvachko commented on HDFS-459:
------------------------------------------

> If JHLA does analysis across set of M/R jobs over a given time range, it can 
> be added as another offline analysis tool

Yes, JHLA analyzes history logs of multiple MR jobs over a time range.

> Should this be part of the MAPREDUCE?

JHLA is based on the framework defined for TestDFSIO, like everything else in 
the hdfs-with-mr subproject. That is why JHLA is where it is. I was thinking 
that TestDFSIO and the related tools here should actually be moved to 
benchmarks, because that is what they are. But this is not a part of this 
patch.

> There is already a job history analyzer contributed to hadoop, called hadoop 
> vaidya.

Sure, there are different ways and motivations to analyze history logs.
This approach tries to capture characteristics that reflect the load on the 
cluster, based on all jobs that ran on the cluster during a period of time. 
The results are in a very simple table-like format so that they can be 
processed by Excel or the R system. I'll attach some pictures to demonstrate 
the final output.

> Job History Log Analyzer
> ------------------------
>
>                 Key: HDFS-459
>                 URL: https://issues.apache.org/jira/browse/HDFS-459
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>    Affects Versions: 0.21.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>             Fix For: 0.21.0
>
>         Attachments: JHLA-description.html, JHLA.patch
>
>
> Job History Log Analyzer parses and analyzes history logs of map-reduce jobs. 
> History logs contain information about execution of jobs, tasks, and 
> attempts. The tool focuses on submission, launch, start, and finish times, as 
> well as the success or failure of jobs, tasks, and attempts.
> The analyzer calculates _per hour slot utilization_ and _pending times_ on 
> clusters running map-reduce jobs.
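
To make "per hour slot utilization" concrete, here is a minimal sketch of one 
possible definition. It is not taken from JHLA.patch. It assumes that 
utilization for an hour is the slot-time occupied by task attempts in that 
hour, divided by the total slot-time available. The function name, the 
(start, finish) input format, and the total_slots parameter are all 
hypothetical.

```python
from collections import defaultdict

HOUR = 3600  # seconds per hour


def per_hour_slot_utilization(attempts, total_slots):
    """Return {hour_index: fraction of slot-time occupied in that hour}.

    attempts: iterable of (start, finish) times in seconds (assumed input
    format, not the real history log format).
    total_slots: number of task slots in the cluster (assumed constant).
    """
    busy = defaultdict(float)  # hour index -> occupied slot-seconds
    for start, finish in attempts:
        t = start
        # Split an attempt that crosses hour boundaries into per-hour pieces.
        while t < finish:
            hour = int(t // HOUR)
            end = min(finish, (hour + 1) * HOUR)
            busy[hour] += end - t
            t = end
    return {h: s / (total_slots * HOUR) for h, s in sorted(busy.items())}


if __name__ == "__main__":
    # Two attempts on a hypothetical 2-slot cluster, printed as a
    # tab-separated table that a spreadsheet or R could read.
    table = per_hour_slot_utilization([(0, 1800), (1800, 5400)], total_slots=2)
    print("hour\tutilization")
    for hour, u in table.items():
        print(f"{hour}\t{u:.3f}")
```

Pending times could be tabulated per hour in the same way, for example as 
the difference between a recorded start time and the corresponding submission 
or launch time, but the exact definition used by the tool is not given here.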

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
