You are totally right, my bad. Your patched version passes all the JUnit
tests now. I will now test it on my largest jobs and compare with 0.9.2.
Should take about 7-8 hours. Thanks.

On 2/15/07, Devaraj Das (JIRA) <[EMAIL PROTECTED]> wrote:


     [ https://issues.apache.org/jira/browse/HADOOP-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-1014:
--------------------------------

    Attachment: zero-size-inmem-fs.patch
                TestMapRed.java

Riccardo, the problem with your testcase was that the "readFields" method of
the WritableWrapper class was not setting the "type" field, so "write" would
always write the value '0' for 'type' in the map output file. Hence the
reducers wouldn't get the intended map outputs. I have attached the fixed
TestMapRed.java.
In your map method you instantiate a WritableWrapper object that has the type
field correctly set. After that you call output.collect, which behaves
differently in 0.9 and in 0.10+. In 0.9, output.collect writes the data
directly to the final map output file. In 0.10+, however, the output is
buffered and written later, and a deserialization/serialization round trip
happens when the output is finally flushed from the buffer to disk. In your
code the deserialization (readFields) was not setting the type field, so the
serialization (write) would always write 0 (type INT_WRITABLE) for the type
field. The reducer, therefore, would never see UTF8.

Also attached is a patch that disables the in-memory merge (basically it sets
the buffer size for the ramfs to 0 and adds checks for that). This should
remove the blocker.
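For anyone who wants to try an equivalent workaround in job setup without
applying the patch, something along these lines might work. This is an
untested sketch and assumes the ramfs buffer size is read from the
fs.inmemory.size.mb property; please verify that key against your build's
hadoop-default.xml before relying on it:

import org.apache.hadoop.mapred.JobConf;

// Untested sketch: shrink the in-memory fs used for the shuffle-side merge to
// nothing, so map outputs are merged on disk instead. The property name
// fs.inmemory.size.mb is an assumption, not confirmed against 0.11.
public class DisableInMemMerge {
  public static void configure(JobConf conf) {
    conf.setInt("fs.inmemory.size.mb", 0);
  }
}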

Mike, Albert and Riccardo - please comment on whether this resolves, for the
time being, the issues you reported. I will continue to debug the in-memory
merge; it must be a race condition somewhere, since the failures are not consistent.
Thanks.

> map/reduce is corrupting data between map and reduce
> ----------------------------------------------------
>
>                 Key: HADOOP-1014
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1014
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.11.1
>            Reporter: Owen O'Malley
>         Assigned To: Devaraj Das
>            Priority: Blocker
>             Fix For: 0.11.2
>
>         Attachments: TestMapRed.java, TestMapRed.patch, TestMapRed2.patch, zero-size-inmem-fs.patch
>
>
> It appears that a random data corruption is happening between the map and the reduce. This looks to be a blocker until it is resolved. There were two relevant messages on hadoop-dev:
> from Mike Smith:
> The map/reduce jobs are not consistent in the hadoop 0.11 release and trunk when you rerun the same job. I have observed this inconsistency of the map output in different jobs. A simple test to double check is to use hadoop 0.11 with nutch trunk.
> from Albert Chern:
> I am having the same problem with my own map reduce jobs. I have a job which requires two pieces of data per key, and just as a sanity check I make sure that it gets both in the reducer, but sometimes it doesn't. What's even stranger is, the same tasks that complain about missing key/value pairs will maybe fail two or three times, but then succeed on a subsequent try, which leads me to believe that the bug has to do with randomization (I'm not sure, but I think the map outputs are shuffled?).
> All of my code works perfectly with 0.9, so I went back and just compared the sizes of the outputs. For some jobs, the outputs from 0.11 were consistently 4 bytes larger, probably due to changes in SequenceFile. But for others, the output sizes were all over the place. Some partitions were empty, some were correct, and some were missing data. There seems to be something seriously wrong with 0.11, so I suggest you use 0.9. I've been trying to pinpoint the bug but its random nature is really annoying.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

