[ https://issues.apache.org/jira/browse/HDFS-459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726240#action_12726240 ]
Konstantin Shvachko commented on HDFS-459:
------------------------------------------

> If JHLA does analysis across set of M/R jobs over a given time range, it can
> be added as another offline analysis tool

Yes, JHLA analyzes history logs of multiple MR jobs over a time range.

> Should this be part of the MAPREDUCE?

It is based on the framework defined for TestDFSIO, like everything else in the hdfs-with-mr subproject. This is the reason why JHLA is where it is.
I was thinking that TestDFSIO and the related classes and tools here should actually be moved to benchmarks, because this is what they are. But this is not a part of this patch.

> There is already a job history analyzer contributed to hadoop, called hadoop
> vaidya.

Sure, there are different ways and motivations to analyze history logs. This approach tries to capture some characteristics that reflect the load on the cluster, based on all jobs run on the cluster during a period of time. The results are in a very simple table-like format so that they can be processed by Excel or the R system. I'll attach some pictures to demonstrate the final output.

> Job History Log Analyzer
> ------------------------
>
>                 Key: HDFS-459
>                 URL: https://issues.apache.org/jira/browse/HDFS-459
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>    Affects Versions: 0.21.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>             Fix For: 0.21.0
>
>         Attachments: JHLA-description.html, JHLA.patch
>
>
> Job History Log Analyzer parses and analyzes history logs of map-reduce jobs.
> History logs contain information about execution of jobs, tasks, and
> attempts. The tool focuses on submission, launch, start, and finish times, as
> well as the success or failure of jobs, tasks, and attempts.
> The analyzer calculates _per hour slot utilization_ and _pending times_ on
> clusters running map-reduce jobs.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.