[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890795#action_12890795
 ] 

Hong Tang commented on MAPREDUCE-1918:
--------------------------------------

I think we should also describe (1) how the JSON objects are created through 
Jackson's ObjectMapper from the LoggedXXX classes; and (2) the API for 
building LoggedXXX objects, and for reading them back.
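
For (1), here is a minimal round-trip sketch through Jackson's ObjectMapper 
(Hadoop bundles Jackson 1.x under org.codehaus.jackson in this time frame; 
"job" below stands for an already-built LoggedJob, and the exact mapper 
configuration Rumen uses may differ):
{code}
        // Requires org.codehaus.jackson.map.ObjectMapper and
        // java.io.StringWriter. "job" is an already-built LoggedJob; this is
        // a sketch, not the exact setup Rumen performs internally.
        ObjectMapper mapper = new ObjectMapper();
        StringWriter out = new StringWriter();
        mapper.writeValue(out, job);                  // LoggedJob -> JSON text
        LoggedJob parsed = mapper.readValue(out.toString(), LoggedJob.class);
{code}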

The basic API flow for creating parsed Rumen objects is as follows (it is the 
user's responsibility to create input streams from the job conf xml and job 
history logs):
- JobConfigurationParser: parser that parses job conf xml. One instance can be 
reused to parse many job conf xml files (a reuse sketch follows the code block 
below).
{code}
        JobConfigurationParser jcp = new JobConfigurationParser(interestedProperties);
        // interestedProperties is a list of keys to be extracted from the
        // job conf xml file.
        Properties parsedProperties = jcp.parse(inputStream);
        // inputStream is the file input stream for the job conf xml file.
{code}
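
Since one instance can be reused, a sketch of parsing several conf files with 
the same parser (the confFiles list and its contents are hypothetical):
{code}
        // confFiles is a hypothetical list of job conf xml paths supplied by
        // the caller; the same jcp instance handles all of them.
        for (String confFile : confFiles) {
                InputStream in = new FileInputStream(confFile);
                try {
                        Properties props = jcp.parse(in);
                        // consume props ...
                } finally {
                        in.close();
                }
        }
{code}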
        
- JobHistoryParser: parser that parses job history files. It is an interface, 
and the actual implementations are defined as enums in JobHistoryParserFactory. 
One can directly use the implementation matching the version of the job history 
logs, or use the "canParse()" method to detect which parser is suitable 
(following the pattern in TraceBuilder; see the sketch after the code block 
below). Create one instance to parse a job history log and close it after use.
{code}
        JobHistoryParser parser = new Hadoop20JHParser(inputStream);
        // inputStream is the file input stream for the job history file.
        // JobHistoryParser APIs will be used later when the events are fed
        // into JobBuilder (below).
        parser.close();
{code}
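
A sketch of the detection pattern mentioned above, modeled on TraceBuilder. 
Note that canParse() consumes bytes from the stream, so TraceBuilder wraps the 
input in a RewindableInputStream and rewinds before handing it to the chosen 
parser; the signatures here are assumptions to be checked against the Rumen 
version in use:
{code}
        // Detection sketch (assumed signatures). canParse() reads from the
        // stream, so it must be rewound before constructing the parser.
        RewindableInputStream ris = new RewindableInputStream(inputStream);
        JobHistoryParser parser;
        if (Hadoop20JHParser.canParse(ris)) {
                ris.rewind();
                parser = new Hadoop20JHParser(ris);
        } else {
                throw new IOException("Unrecognized job history log format");
        }
{code}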

- JobBuilder: builder for LoggedJobs. Create one instance to parse a paired 
job history log and job conf. The order in which the conf file and the job 
history file are parsed does not matter.
{code}
        JobBuilder jb = new JobBuilder(jobID);
        // Extract the job ID from the file name:
        // <jobtracker>_job_<timestamp>_<sequence>
        jb.process(jcp.parse(jobConfInputStream));
        JobHistoryParser parser = new Hadoop20JHParser(jobHistoryInputStream);
        try {
                HistoryEvent e;
                while ((e = parser.nextEvent()) != null) {
                        jb.process(e);
                }
        } finally {
                parser.close();
        }
        LoggedJob job = jb.build();
{code}

From the reading side, the output produced by TraceBuilder or Folder can be 
read through JobTraceReader or ClusterTopologyReader. One can also use 
Jackson's ObjectMapper to parse the JSON-formatted data into LoggedJob or 
LoggedTopology objects.

> Add documentation to Rumen
> --------------------------
>
>                 Key: MAPREDUCE-1918
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1918
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tools/rumen
>    Affects Versions: 0.22.0
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>             Fix For: 0.22.0
>
>         Attachments: mapreduce-1918-v1.3.patch, mapreduce-1918-v1.4.patch, 
> rumen.pdf, rumen.pdf
>
>
> Add forrest documentation to Rumen tool.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
