[ https://issues.apache.org/jira/browse/HADOOP-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Devaraj Das updated HADOOP-2244:
--------------------------------
Status: Open (was: Patch Available)
Michael, there is one problem with the patch: the testcase would pass even
without the change to the readFields method in MapWritable. This is because it
reads the same key over and over again in the for loop (for clarity, note that
the 'instance' field in MapWritable is a Map, not a List, so repeated puts of
the same key collapse into a single entry). Instead, each iteration of the
loop should read a different key.
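Something along these lines would actually exercise the bug (a rough sketch,
not the attached testcase; the class and method names are illustrative):

{code}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;

import junit.framework.TestCase;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;

public class TestMapWritableReuse extends TestCase {
  public void testReadFieldsResetsInstance() throws Exception {
    MapWritable reused = new MapWritable();
    for (int i = 0; i < 3; i++) {
      // Serialize a map whose single entry uses a *different* key on
      // every iteration; a repeated key would be collapsed by the Map
      // and mask the accumulation bug.
      MapWritable src = new MapWritable();
      src.put(new Text("key-" + i), new Text("value-" + i));
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      src.write(new DataOutputStream(buf));

      // Deserialize into the same reused instance, as the framework does.
      reused.readFields(new DataInputStream(
          new ByteArrayInputStream(buf.toByteArray())));

      // Fails on the second iteration unless readFields clears the
      // internal map first.
      assertEquals(1, reused.size());
    }
  }
}
{code}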
> MapWritable.readFields needs to clear internal hash else instance accumulates
> entries forever
> ---------------------------------------------------------------------------------------------
>
> Key: HADOOP-2244
> URL: https://issues.apache.org/jira/browse/HADOOP-2244
> Project: Hadoop
> Issue Type: Bug
> Components: io
> Reporter: stack
> Assignee: stack
> Fix For: 0.16.0
>
> Attachments: hadoop-2244.patch
>
>
> A common framework pattern is to get an instance of a Writable, usually by
> reflection, and then just keep calling readFields to make new 'instances' of
> the particular Writable.
> For example, the spill-to-disk pass run at the end of a map task gets
> instances of map output keys and values and then loops over the (sorted) map
> output, calling readFields to make instances to write out to the filesystem
> (see around line #470 in the spill method).
> If the particular Writable is a MapWritable, this pattern currently produces
> incorrect results. MapWritable has an internal hash map that is created on
> instantiation, and each time readFields is called the newly deserialized
> entries are added to that map. The map needs to be cleared when readFields
> is called so it doesn't just keep growing ad infinitum (a sketch of such a
> fix follows this description).
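For reference, the fix amounts to resetting the internal map at the top of
MapWritable.readFields. A minimal sketch, reconstructed from the description
above and MapWritable's wire format rather than taken from the attached
hadoop-2244.patch:

{code}
// In org.apache.hadoop.io.MapWritable; a sketch of the described fix,
// not the attached patch verbatim.
public void readFields(DataInput in) throws IOException {
  super.readFields(in);

  // Reset the map first; otherwise entries from previous calls to
  // readFields accumulate in the reused instance.
  instance.clear();

  // Wire format: an entry count, then for each entry a class-id byte
  // plus the serialized key, and the same for the value.
  int entries = in.readInt();
  for (int i = 0; i < entries; i++) {
    Writable key = (Writable) ReflectionUtils.newInstance(
        getClass(in.readByte()), getConf());
    key.readFields(in);
    Writable value = (Writable) ReflectionUtils.newInstance(
        getClass(in.readByte()), getConf());
    value.readFields(in);
    instance.put(key, value);
  }
}
{code}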