Ah, so my question was not clear.  Let me try again.  Suppose my map or 
reduce method invokes output.collect(key, value) and then, after that 
returns, mutates (side effects) either the key or the value --- what 
happens?  Is this specified, forbidden, or unspecified?  Is there a 
general answer, or is this up to the OutputFormat?

I think you are saying that for SequenceFileOutputFormat and 
TextOutputFormat, the scenario I outlined is allowed and the original 
values appear in the output.  Have I got that right?  If so, the remaining 
question is whether I have to work this out separately for each 
OutputFormat or whether there is a general rule.

Suppose, in particular, that my map or reduce method calls 
output.collect(key,value) several times in series --- each time passing 
the same object reference for key, and each time passing the same object 
reference for value, but modifying those objects between calls on 
output.collect.  I would like to know if this is a supported scenario, 
with the semantics that what is output is the contents of the key and 
value objects at the moment output.collect(key,value) is called.
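
To make the scenario concrete, here is a minimal sketch in plain Java.  It 
does not use any Hadoop classes: the Collector interface and both 
implementations are hypothetical stand-ins I made up for illustration, and 
StringBuilder stands in for a mutable key or value type.  One collector 
copies the contents during the call (as a writer that serializes 
immediately would), the other keeps the references and reads them later 
(as a buffering writer might).

```java
import java.util.ArrayList;
import java.util.List;

public class CollectReuseDemo {
    /** Hypothetical stand-in for an output collector; NOT the Hadoop interface. */
    interface Collector {
        void collect(StringBuilder key, StringBuilder value);
        List<String> records();
    }

    /** Copies key and value contents during collect(), like immediate serialization. */
    static class CopyingCollector implements Collector {
        private final List<String> out = new ArrayList<>();
        public void collect(StringBuilder key, StringBuilder value) {
            out.add(key.toString() + "=" + value.toString());
        }
        public List<String> records() { return out; }
    }

    /** Holds the references and reads them only when records() is called. */
    static class BufferingCollector implements Collector {
        private final List<StringBuilder[]> held = new ArrayList<>();
        public void collect(StringBuilder key, StringBuilder value) {
            held.add(new StringBuilder[] {key, value});
        }
        public List<String> records() {
            List<String> out = new ArrayList<>();
            for (StringBuilder[] kv : held) {
                out.add(kv[0].toString() + "=" + kv[1].toString());
            }
            return out;
        }
    }

    /** Emits two records through the same key and value objects, mutating between calls. */
    static List<String> emitTwice(Collector c) {
        StringBuilder key = new StringBuilder("a");
        StringBuilder value = new StringBuilder("1");
        c.collect(key, value);
        key.setLength(0);
        key.append("b");
        value.setLength(0);
        value.append("2");
        c.collect(key, value);
        return c.records();
    }

    public static void main(String[] args) {
        System.out.println(emitTwice(new CopyingCollector()));   // [a=1, b=2]
        System.out.println(emitTwice(new BufferingCollector())); // [b=2, b=2]
    }
}
```

The semantics I am asking about are those of the copying collector.  With 
the buffering one, the first record is silently overwritten, which is the 
behavior I want to know whether I must guard against.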

Have I overlooked some documentation that answers my question?

Thanks,
Mike



From:   Brock Noland <br...@cloudera.com>
To:     mapreduce-user@hadoop.apache.org
Date:   02/12/2012 07:16 PM
Subject:        Re: Can caller mutate value after calling OutputCollector.collect(key,value) ?



Hi

On Sun, Feb 12, 2012 at 8:58 PM, Mike Spreitzer <mspre...@us.ibm.com> wrote:
> I have a question about the contract in the
> org.apache.hadoop.mapred.OutputCollector interface.  If it matters, let us
> say we are talking about Hadoop-1.0.0.  In my map or reduce method, after it
> calls output.collect(key,value), is it allowed to mutate (side effect)
> either the key or the value?
>
> If the answer is "it depends on the OutputFormat", then what is the answer
> for the more prominent ones, such as SequenceFileFormat and
> TextOutputFormat?

In both cases the key is serialized immediately so the caller cannot
mutate the key after calling collect.

http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0/src/mapred/org/apache/hadoop/mapred/TextOutputFormat.java

http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0/src/mapred/org/apache/hadoop/mapred/SequenceFileOutputFormat.java

http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0/src/core/org/apache/hadoop/io/SequenceFile.java


-- 
Apache MRUnit - Unit testing MapReduce - 
http://incubator.apache.org/mrunit/

