[jira] Updated: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

Dick King (JIRA) Wed, 10 Feb 2010 12:23:50 -0800

     [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Dick King updated MAPREDUCE-1309:
---------------------------------

    Status: Open  (was: Patch Available)

There was a problem that I discovered in a bulk test.

The main change in the patch is

{noformat}
       input.mark(bufferSize + 1);
 
       int actualRead = input.read(buffer);
+      int mostRecentRead = actualRead;
+
+      while (actualRead < bufferSize && mostRecentRead > 0) {
+        mostRecentRead =
+            input.read(buffer, actualRead, bufferSize - actualRead);
+
+        if (mostRecentRead > 0) {
+          actualRead += mostRecentRead;
+        }
+      }
 
       if (actualRead < markerBytes.length) {
         input.reset();
{noformat}

{{BufferedInputStream.read(byte[])}} does NOT read as much as possible, as I 
had expected.  It seems to stop at disk block boundaries [but a new read will 
carry on from where the previous one stopped].
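
For illustration only, here is a minimal standalone sketch of the same retry
loop, assuming nothing more than the general {{InputStream}} contract that a
single read may return fewer bytes than requested.  The names
{{ReadFullySketch}}, {{readFully}} and {{ChunkedStream}} are hypothetical and
do not appear in the patch; {{ChunkedStream}} merely simulates a stream that
hands back a few bytes at a time.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadFullySketch {
    /**
     * Hypothetical helper: keeps reading until the buffer is full or the
     * stream stops delivering data. Returns the number of bytes stored.
     */
    static int readFully(InputStream input, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = input.read(buffer, total, buffer.length - total);
            // -1 means end of stream; 0 is not expected for a non-empty
            // request, but stopping on it avoids spinning forever.
            if (n <= 0) {
                break;
            }
            total += n;
        }
        return total;
    }

    /** Hypothetical test stream that returns at most 'chunk' bytes per read. */
    static final class ChunkedStream extends InputStream {
        private final InputStream in;
        private final int chunk;

        ChunkedStream(byte[] data, int chunk) {
            this.in = new ByteArrayInputStream(data);
            this.chunk = chunk;
        }

        @Override
        public int read() throws IOException {
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Deliberately short reads, similar in effect to the behavior
            // described above.
            return in.read(b, off, Math.min(len, chunk));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10];
        byte[] buffer = new byte[8];

        int single = new ChunkedStream(data, 3).read(buffer);
        int looped = readFully(new ChunkedStream(data, 3), buffer);

        System.out.println("single read: " + single + ", looped read: " + looped);
    }
}
```

A single {{read(buffer)}} on the simulated stream returns only one chunk, while
the loop fills the buffer or stops at end of stream, which is the property the
marker check after the loop depends on.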

This patch clears this problem and only this problem, and is extremely unlikely 
to introduce new ones.

-dk


> I want to change the rumen job trace generator to use a more modular internal 
> structure, to allow for more input log formats 
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1309
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>         Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
> demuxer-plus-concatenated-files--2010-01-06.patch, 
> demuxer-plus-concatenated-files--2010-01-08-b.patch, 
> demuxer-plus-concatenated-files--2010-01-08-c.patch, 
> demuxer-plus-concatenated-files--2010-01-08-d.patch, 
> demuxer-plus-concatenated-files--2010-01-08.patch, 
> demuxer-plus-concatenated-files--2010-01-11.patch, 
> mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
> mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
> mapreduce-1309--2010-02-04.patch
>
>
> There are two orthogonal questions to answer when processing a job tracker 
> log: how will the logs and the xml configuration files be packaged, and in 
> which release of hadoop map/reduce were the logs generated?  The existing 
> rumen only has a couple of answers to this question.  The new engine will 
> handle three answers to the version question: 0.18, 0.20 and current, and two 
> answers to the packaging question: separate files with names derived from the 
> job ID, and concatenated files with a header between sections [used for 
> easier file interchange].

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
