[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-07-06 Thread Hong Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1309:
-

Attachment: mr-1309-yhadoop-20.10.patch

Patch for Yahoo Hadoop 20.10. Not to be committed.

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tools/rumen
>Reporter: Dick King
>Assignee: Dick King
> Fix For: 0.21.0
>
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, 
> mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, 
> mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch, 
> mr-1309-yhadoop-20.10.patch, rumen-yhadoop-20.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].
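
The two questions in the description can be treated as independent plug-in points. What follows is only a minimal sketch of that idea in Python, not the code in the attached patches: the header marker `### FILE: `, the function names, and the stand-in parsers are all hypothetical, and real rumen parsers would of course extract job and task events rather than count lines.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Section = Tuple[str, List[str]]  # (file name, lines of that file)

# Hypothetical section header; the real concatenated format may differ.
HEADER_PREFIX = "### FILE: "


def demux_separate(files: Dict[str, List[str]]) -> Iterator[Section]:
    """Packaging answer 1: separate files whose names derive from the job ID."""
    for name in sorted(files):
        yield name, files[name]


def demux_concatenated(lines: Iterable[str]) -> Iterator[Section]:
    """Packaging answer 2: one stream with a header line between sections."""
    name, body = None, []
    for line in lines:
        if line.startswith(HEADER_PREFIX):
            if name is not None:
                yield name, body
            # Start a fresh section named by the rest of the header line.
            name, body = line[len(HEADER_PREFIX):].strip(), []
        elif name is not None:
            body.append(line)
    if name is not None:
        yield name, body


def make_parser(version: str) -> Callable[[List[str]], dict]:
    """Stand-in for a per-version log parser (assumption: one per release)."""
    def parse(body: List[str]) -> dict:
        return {"version": version, "line_count": len(body)}
    return parse


# Version answers named in the description: 0.18, 0.20 and current.
PARSERS = {v: make_parser(v) for v in ("0.18", "0.20", "current")}


def build_trace(sections: Iterable[Section], version: str) -> List[Tuple[str, dict]]:
    """Combine any packaging demuxer with any version parser."""
    parse = PARSERS[version]
    return [(name, parse(body)) for name, body in sections]
```

Because the demuxer and the parser are chosen separately, supporting a new packaging or a new release would mean adding one function rather than one per combination.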

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-07-02 Thread Amar Kamat (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-1309:
--

Component/s: tools/rumen

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tools/rumen
>Reporter: Dick King
>Assignee: Dick King
> Fix For: 0.21.0
>
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, 
> mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, 
> mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch, 
> rumen-yhadoop-20.patch
>
>

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-06-24 Thread Hong Tang (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1309:
-

Attachment: rumen-yhadoop-20.patch

Backport to hadoop 20.1xx branch. Not to be committed.

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Fix For: 0.21.0
>
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, 
> mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, 
> mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch, 
> rumen-yhadoop-20.patch
>
>

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-18 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1309:
-

   Resolution: Fixed
Fix Version/s: 0.22.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

+1

I committed this. Thanks, Dick!

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Fix For: 0.22.0
>
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, 
> mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, 
> mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch
>
>

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-17 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, 
> mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, 
> mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch
>
>

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-17 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Attachment: mapreduce-1309--2010-02-17.patch

Got rid of two javadoc errors and a couple of unused fields in LoggedTask.

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, 
> mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, 
> mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch
>
>

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-17 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Open  (was: Patch Available)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, 
> mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, 
> mapreduce-1309--2010-02-16.patch
>
>

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-16 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Attachment: mapreduce-1309--2010-02-16-a.patch

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, 
> mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, 
> mapreduce-1309--2010-02-16.patch
>
>

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-16 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, 
> mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, 
> mapreduce-1309--2010-02-16.patch
>
>

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-16 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Open  (was: Patch Available)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, 
> mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16.patch
>
>

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-16 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, 
> mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16.patch
>
>

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-16 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Attachment: mapreduce-1309--2010-02-16.patch

This patch is a response to Hong's comments.  

Indeed they were minor, and the fixes are very simple.

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, 
> mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16.patch
>
>

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-16 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1309:
-

Status: Open  (was: Patch Available)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, 
> mapreduce-1309--2010-02-12.patch
>
>

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-12 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Attachment: mapreduce-1309--2010-02-12.patch

This fixes a null pointer exception in TraceBuilder.

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, 
> mapreduce-1309--2010-02-12.patch
>
>

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-12 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, 
> mapreduce-1309--2010-02-12.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-12 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Open  (was: Patch Available)

We discovered a corner case that generates a null pointer exception.

I wrote a simple fix.  I will withdraw this patch, and provide a new one that 
incorporates that fix.

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, 
> mapreduce-1309--2010-02-12.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-10 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Attachment: mapreduce-1309--2010-02-10.patch

created the read loop described above

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-10 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-10 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Open  (was: Patch Available)

There was a problem that I discovered in a bulk test.

The main change in the patch is

{noformat}
   input.mark(bufferSize + 1);
 
   int actualRead = input.read(buffer);
+  int mostRecentRead = actualRead;
+
+  while (actualRead < bufferSize && mostRecentRead > 0) {
+    mostRecentRead =
+        input.read(buffer, actualRead, bufferSize - actualRead);
+
+    if (mostRecentRead > 0) {
+      actualRead += mostRecentRead;
+    }
+  }
 
   if (actualRead < markerBytes.length) {
     input.reset();
{noformat}

{{BufferedInputStream.read(byte[])}} does NOT always fill the buffer, as I had 
expected it to.  It seems to stop at disk block boundaries [although a new read 
will carry on from there].

This patch clears this problem and only this problem, and is extremely unlikely 
to introduce new ones.
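
For readers who want to try the idea outside of rumen, here is a minimal, 
self-contained sketch of the same read-until-full loop.  It is not the rumen 
code: the names ReadLoopSketch, readAsMuchAsPossible and chunked are 
hypothetical, and the chunked stream merely imitates a source that returns 
short reads.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; none of these names come from the patch.
class ReadLoopSketch {

  /**
   * Keeps calling read until the buffer is full or the stream stops
   * producing bytes.  Returns the number of bytes stored in the buffer
   * (0 if the stream was already at end of file).
   */
  static int readAsMuchAsPossible(InputStream input, byte[] buffer)
      throws IOException {
    int total = 0;

    while (total < buffer.length) {
      int n = input.read(buffer, total, buffer.length - total);

      if (n <= 0) {
        break; // end of stream, or a stream that made no progress
      }

      total += n;
    }

    return total;
  }

  /** Test helper: a stream that returns at most chunk bytes per read. */
  static InputStream chunked(byte[] data, final int chunk) {
    return new ByteArrayInputStream(data) {
      @Override
      public synchronized int read(byte[] b, int off, int len) {
        return super.read(b, off, Math.min(len, chunk));
      }
    };
  }

  public static void main(String[] args) throws IOException {
    byte[] data = new byte[10];
    byte[] buffer = new byte[8];

    // A single read on this stream is short; the loop keeps going.
    int filled = readAsMuchAsPossible(chunked(data, 3), buffer);
    System.out.println("filled " + filled + " of " + buffer.length);
  }
}
```

A single call to read is only required to return at least one byte, so any 
caller that needs a full buffer has to loop along these lines or use a helper 
such as DataInputStream.readFully, which throws if the stream ends early.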

-dk


> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-04 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-04 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Attachment: mapreduce-1309--2010-02-04.patch

Replacement patch --- see previous comment

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-04 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Open  (was: Patch Available)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-03 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-03 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Attachment: mapreduce-1309--2010-02-03.patch

This is the new patch.  The main changes are new test cases for small 
components of rumen, and changing the main class to TraceBuilder.

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-03 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Open  (was: Patch Available)

I've gotten a code review and I've incorporated some suggestions.

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-20 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-20 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Attachment: mapreduce-1309--2010-01-20.patch

This patch file reflects the small changes suggested.

None of them rises to the level of a major change.

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-20 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Open  (was: Patch Available)

I made a few cosmetic changes based on a review

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-14 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-14 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Attachment: mapreduce-1309--2009-01-14-a.patch

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch

[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-14 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Open  (was: Patch Available)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-14 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-14 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Attachment: mapreduce-1309--2009-01-14.patch

Fixed a bug in which the launch time and finish time were getting confused with 
each other.

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-13 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Open  (was: Patch Available)

I discovered a bug.

I expect to have a fixed version of this patch in place by about 10 AM PST on 1/14.

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-11 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Patch Available  (was: Open)

This fixes a bug, introduced by another patch, that broke my test case.  Unless 
that has happened again, it should work this time.

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-11 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Attachment: demuxer-plus-concatenated-files--2010-01-11.patch

Line 143 of LoggedTaskAttempt.java was changed to read

   this.hostName = hostName.intern();

which introduces a bug that breaks my test case.  The call fails with a 
NullPointerException when a null hostName is read from a JSON string.

I added the fix to this patch instead of making a separate patch for that issue.

   this.hostName = (hostName == null ? null : hostName.intern());
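
The null guard above can be tried in isolation with the minimal sketch below.  The class name NullSafeIntern and the helper internOrNull are hypothetical and are not part of rumen; only the guarded expression itself comes from this patch.

```java
// Hypothetical stand-alone demo; not the actual LoggedTaskAttempt class.
class NullSafeIntern {
    // Same expression as the fix: intern non-null strings, pass null through.
    static String internOrNull(String hostName) {
        return (hostName == null ? null : hostName.intern());
    }

    public static void main(String[] args) {
        // A null host name, as read from a JSON string, no longer throws.
        System.out.println(internOrNull(null) == null);  // prints "true"

        // A non-null host name is still interned: the result is the same
        // pooled instance as the equal string literal.
        String fresh = new String("node1.example.com");
        System.out.println(internOrNull(fresh) == "node1.example.com");  // prints "true"
    }
}
```

Calling hostName.intern() directly in the first case would throw a NullPointerException, which is the failure described above.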


> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-11 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Open  (was: Patch Available)

This patch is incompatible with another patch.  See the patch reintroduction 
comment.

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-08 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Open  (was: Patch Available)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-08 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-08 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Attachment: demuxer-plus-concatenated-files--2010-01-08-d.patch

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-08 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Attachment: demuxer-plus-concatenated-files--2010-01-08-c.patch

This replacement patch fixes the problems noted by Hudson.  Please note that I 
request the following variances:

   * The javac warnings refer to deprecated interfaces, mostly Counters, that 
     are used ubiquitously and will require a lot of study to remove.
   * The release audit refers to files containing test cases, which are in 
     .json and cannot receive a banner headline.
   * The failed core tests include other subsystems, which I believe are known 
     failures.

I did fix the other problems.


> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-08 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-08 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Open  (was: Patch Available)

Cancel this patch to make room for a new one.

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-08 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Open  (was: Patch Available)

Fix dropped import.

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-08 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-08 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Attachment: demuxer-plus-concatenated-files--2010-01-08-b.patch

Fix dropped import in a test case.

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-08 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-08 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Attachment: demuxer-plus-concatenated-files--2010-01-08.patch

I made a couple of redundant changes to Trunk, in 1295 and here.  I removed 
them.


> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-08 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Open  (was: Patch Available)

A couple of changes from one of my other patches [1295] went into Trunk 
while I was preparing this patch.

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-07 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Patch Available  (was: Open)

Resubmitting the same patch a second time in the hopes that Hudson will notice 
it this time.

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-07 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Open  (was: Patch Available)

Hudson hasn't run on this patch in over a day.  I cancelled the submission and 
will resubmit it to give Hudson another kick.

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-06 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-01-06 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Attachment: demuxer-plus-concatenated-files--2010-01-06.patch

This fixes the compilation problem resulting from the change in the jobhistory 
Events interface.

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2009-12-22 Thread Chris Douglas (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1309:
-

Status: Open  (was: Patch Available)

Unfortunately, the patch does not compile against trunk (related to 
MAPREDUCE-1016?):
{noformat}
 [exec] compile-tools:
 [exec] [javac] Compiling 69 source files to /grid/0/hudson/hudson-slave/workspace/\
 Mapreduce-Patch-h6.grid.sp2.yahoo.net/trunk/build/tools
 [exec] [javac] /grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h6.grid.sp2.yahoo.net/trunk/\
 src/tools/org/apache/hadoop/tools/rumen/LoggedTask.java:178: cannot find symbol
 [exec] [javac] symbol  : class Counters
 [exec] [javac] location: interface org.apache.hadoop.mapreduce.jobhistory.Events
 [exec] [javac]   private void incorporateMapCounters(Events.Counters counters) {
{noformat}

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2009-12-21 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Attachment: demuxer-plus-concatenated-files--2009-12-21.patch

This patch implements a universal gridmix3/mumak trace generator.

It differs from previous versions of rumen in three ways:

1: The main class is o.a.h.tools.rumen.Driver

2: This tool is specialized to make traces.  Future statistics engines will be 
trace-based.

3: The argument list is more austere.  There are three or more arguments:

3a: the trace output, a {{Path}}, compressed or not

3b: the topology output, again a {{Path}}, again compressed or not

3c: any number of {{Path}} names, each of which can be compressed or not, and 
each of which can be a config.xml file, a job tracker log [ {{Driver}} 
determines the version ], or a directory filled with such files.
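
To make the argument layout above concrete, here is a minimal sketch of how 
such a three-or-more argument list could be split up.  It is not taken from 
the patch: the class name DriverArgs, its fields and the usage string are 
hypothetical, and plain strings stand in for Hadoop {{Path}} objects so that 
the example needs nothing beyond the JDK.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for the argument list described above; not the real rumen Driver. */
final class DriverArgs {
  final String traceOutput;     // 3a: where the trace would be written
  final String topologyOutput;  // 3b: where the topology would be written
  final List<String> inputs;    // 3c: config files, job tracker logs, or directories

  private DriverArgs(String traceOutput, String topologyOutput, List<String> inputs) {
    this.traceOutput = traceOutput;
    this.topologyOutput = topologyOutput;
    this.inputs = inputs;
  }

  /** Splits the raw arguments into the two outputs and the remaining inputs. */
  static DriverArgs parse(String[] args) {
    // Two outputs plus at least one input gives the "three or more" minimum.
    if (args.length < 3) {
      throw new IllegalArgumentException(
          "usage: <trace-output> <topology-output> <input>...");
    }
    List<String> inputs = Collections.unmodifiableList(
        Arrays.asList(Arrays.copyOfRange(args, 2, args.length)));
    return new DriverArgs(args[0], args[1], inputs);
  }

  public static void main(String[] args) {
    DriverArgs parsed = parse(args);
    System.out.println("trace output:    " + parsed.traceOutput);
    System.out.println("topology output: " + parsed.topologyOutput);
    System.out.println("inputs:          " + parsed.inputs);
  }
}
```

The sketch only separates the arguments.  Deciding whether a given path is 
compressed, or which release produced a job tracker log, is left to the tool 
itself and is not modelled here.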



> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].


[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2009-12-21 Thread Dick King (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1309:
-

Status: Patch Available  (was: Open)

> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -
>
> Key: MAPREDUCE-1309
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Dick King
>Assignee: Dick King
> Attachments: demuxer-plus-concatenated-files--2009-12-21.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].

