[jira] Updated: (MAPREDUCE-751) Rumen: a tool to extract job characterization data from job tracker logs

2010-07-02 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-751:

Component/s: tools/rumen
 (was: jobtracker)

 Rumen: a tool to extract job characterization data from job tracker logs
 

 Key: MAPREDUCE-751
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-751
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: tools/rumen
Reporter: Dick King
Assignee: Dick King
 Fix For: 0.21.0

 Attachments: 2009-08-19--1030.patch, 2009-08-26--1513-patch.patch, 
 mapreduce-751--2009-07-23.patch, mapreduce-751-20090826.patch, 
 mapreduce-751-20090826.patch


  We propose a new map/reduce component, rumen, which can be used to process 
 job history logs to produce any or all of the following:
   * Retrospective info describing the statistical behavior of the
 amount of time it would have taken to launch a job into a certain
 percentage of the number of mapper slots in the log's cluster, given the
 load over the period covered by the log
   * Statistical info as to the runtimes and shuffle times, etc. of
 the tasks and jobs covered by the log
   * Files describing detailed job trace information, and the
 network topology as inferred from the host locations and rack IDs that
 arise in the job tracker log.
 In addition to this facility, rumen includes readers for this
 information to return job and detailed task information to other tools.
 These other tools include a more advanced version of gridmix, and
 also mumak: see blocked issues.
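
 As a rough illustration of the topology inference described above, the
 sketch below groups hosts by rack. It is not rumen's code: the
 "/rack/host" location format, the function name infer_topology, and the
 "unknown-rack" placeholder are all assumptions made for this example.

```python
from collections import defaultdict


def infer_topology(locations):
    """Group host names by rack from '/rack/host' style location strings.

    The location format is an assumption for this sketch, not a format
    confirmed by the issue description.
    """
    racks = defaultdict(set)
    for loc in locations:
        parts = [p for p in loc.split("/") if p]
        if not parts:
            continue  # empty location string, nothing to record
        if len(parts) < 2:
            # No rack component: file the host under a placeholder rack.
            rack, host = "unknown-rack", parts[0]
        else:
            rack, host = parts[-2], parts[-1]
        racks[rack].add(host)
    # Sort hosts so the result is deterministic.
    return {rack: sorted(hosts) for rack, hosts in racks.items()}


# Hypothetical locations as they might appear across task attempts.
sample = ["/rack-a/host1", "/rack-a/host2", "/rack-b/host3", "/rack-a/host1"]
topology = infer_topology(sample)
```

 Repeated sightings of the same host collapse into one entry, which is
 what a tool would want when the same node runs many task attempts.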

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-751) Rumen: a tool to extract job characterization data from job tracker logs

2009-08-26 Thread Dick King (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-751:


Attachment: 2009-08-26--1513-patch.patch

This is a new patch for rumen.  It replaces the previous one, incorporating the 
comments raised by test-patch.

Here is the new test-patch output summary:

{noformat}

 [exec] 
 [exec] -1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 38 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] -1 javac.  The applied patch generated 2226 javac compiler warnings (more than the trunk's current 2220 warnings).
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec] 
 [exec] -1 release audit.  The applied patch generated 215 release audit warnings (more than the trunk's current 202 warnings).
 [exec] 
{noformat}

The javac warnings are deprecation warnings.  We are using JobConf in this 
version of rumen.  We expect to move to the new interface in a future 
release.

The release audit warnings flag files that lack the Apache License header.  
These are .json input files used in the test cases.  JSON does not define a 
comment format.  Although some JSON engines accept one, relying on it would 
tie the test inputs to those engines for little gain.
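
To illustrate the point about comments, here is a small sketch using 
Python's standard json module (chosen only for the demonstration; it is not 
what rumen uses).  The license text and the sample payload are hypothetical.

```python
import json

# Hypothetical license header written as a comment line.
LICENSE_HEADER = "// Licensed to the Apache Software Foundation (ASF)\n"
# Hypothetical test input; the field name is made up for this example.
payload = '{"jobID": "job_0001"}'


def parses(text):
    """Return True if a strict JSON parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


plain_ok = parses(payload)                       # plain JSON is accepted
commented_ok = parses(LICENSE_HEADER + payload)  # comment header is rejected
```

A strict parser rejects the commented file, so a license header would only 
work with engines that extend JSON with comments.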

I fixed the TestZombieJob code.  These were the tests of the new code that 
failed.  The other failed tests were in streaming, a known source of test 
failures.

 Rumen: a tool to extract job characterization data from job tracker logs
 

 Key: MAPREDUCE-751
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-751
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1, 0.21.0
Reporter: Dick King
Assignee: Dick King
 Fix For: 0.20.1, 0.21.0

 Attachments: 2009-08-19--1030.patch, 2009-08-26--1513-patch.patch, 
 mapreduce-751--2009-07-23.patch





[jira] Updated: (MAPREDUCE-751) Rumen: a tool to extract job characterization data from job tracker logs

2009-08-26 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-751:


Fix Version/s: (was: 0.20.1)

 Rumen: a tool to extract job characterization data from job tracker logs
 

 Key: MAPREDUCE-751
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-751
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1, 0.21.0
Reporter: Dick King
Assignee: Dick King
 Fix For: 0.21.0

 Attachments: 2009-08-19--1030.patch, 2009-08-26--1513-patch.patch, 
 mapreduce-751--2009-07-23.patch





[jira] Updated: (MAPREDUCE-751) Rumen: a tool to extract job characterization data from job tracker logs

2009-08-26 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-751:


Status: Open  (was: Patch Available)

 Rumen: a tool to extract job characterization data from job tracker logs
 

 Key: MAPREDUCE-751
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-751
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1, 0.21.0
Reporter: Dick King
Assignee: Dick King
 Fix For: 0.21.0

 Attachments: 2009-08-19--1030.patch, 2009-08-26--1513-patch.patch, 
 mapreduce-751--2009-07-23.patch, mapreduce-751-20090826.patch





[jira] Updated: (MAPREDUCE-751) Rumen: a tool to extract job characterization data from job tracker logs

2009-08-26 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-751:


Attachment: mapreduce-751-20090826.patch

 Rumen: a tool to extract job characterization data from job tracker logs
 

 Key: MAPREDUCE-751
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-751
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1, 0.21.0
Reporter: Dick King
Assignee: Dick King
 Fix For: 0.21.0

 Attachments: 2009-08-19--1030.patch, 2009-08-26--1513-patch.patch, 
 mapreduce-751--2009-07-23.patch, mapreduce-751-20090826.patch





[jira] Updated: (MAPREDUCE-751) Rumen: a tool to extract job characterization data from job tracker logs

2009-08-26 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-751:


Status: Patch Available  (was: Open)

Added the suggested fixes based on the previous submission.  Approved by 
Dick King. :)

 Rumen: a tool to extract job characterization data from job tracker logs
 

 Key: MAPREDUCE-751
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-751
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1, 0.21.0
Reporter: Dick King
Assignee: Dick King
 Fix For: 0.21.0

 Attachments: 2009-08-19--1030.patch, 2009-08-26--1513-patch.patch, 
 mapreduce-751--2009-07-23.patch, mapreduce-751-20090826.patch





[jira] Updated: (MAPREDUCE-751) Rumen: a tool to extract job characterization data from job tracker logs

2009-08-19 Thread Dick King (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-751:


Attachment: 2009-08-19--1030.patch

This is the patch that implements Rumen.  It is licensed to Apache.

 Rumen: a tool to extract job characterization data from job tracker logs
 

 Key: MAPREDUCE-751
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-751
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Dick King
 Attachments: 2009-08-19--1030.patch, mapreduce-751--2009-07-23.patch





[jira] Updated: (MAPREDUCE-751) Rumen: a tool to extract job characterization data from job tracker logs

2009-07-23 Thread Dick King (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-751:


Attachment: mapreduce-751--2009-07-23.patch

This is a preliminary patch to gather early feedback on this functionality.

It works, but there are some areas I'm still working on, mostly general code 
cleanup.  Its functionality is complete.  Although there are foreseeable 
enhancements, they will be called out in their own JIRAs.

 Rumen: a tool to extract job characterization data from job tracker logs
 

 Key: MAPREDUCE-751
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-751
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Dick King
 Attachments: mapreduce-751--2009-07-23.patch

