Ah, so my question was not clear. Let me try again. Suppose my map or reduce method invokes output.collect(key, value) and then, after that returns, mutates (side effects) either the key or the value --- what happens? Is this specified, forbidden, or unspecified? Is there a general answer, or is this up to the OutputFormat?
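To make the scenario concrete, here is a minimal self-contained sketch. It does not use Hadoop at all: `Collector`, `CopyingCollector`, and `RetainingCollector` are hypothetical stand-ins I made up for illustration, not Hadoop classes. The first models a collector that serializes the key and value during the call. The second models one that only holds references and reads them later. Which of the two a given OutputFormat resembles is exactly what I am asking.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectReuseDemo {
    /** Hypothetical stand-in for an output collector; NOT the Hadoop interface. */
    interface Collector {
        void collect(StringBuilder key, StringBuilder value);
    }

    /** Copies the contents inside collect(), so later mutation is invisible. */
    static class CopyingCollector implements Collector {
        final List<String> records = new ArrayList<>();

        public void collect(StringBuilder key, StringBuilder value) {
            records.add(key.toString() + "=" + value.toString());
        }
    }

    /** Keeps only references and reads them at flush(), so later mutation leaks in. */
    static class RetainingCollector implements Collector {
        private final List<StringBuilder[]> held = new ArrayList<>();

        public void collect(StringBuilder key, StringBuilder value) {
            held.add(new StringBuilder[] { key, value });
        }

        List<String> flush() {
            List<String> records = new ArrayList<>();
            for (StringBuilder[] kv : held) {
                records.add(kv[0] + "=" + kv[1]);
            }
            return records;
        }
    }

    /** The pattern in question: one key object and one value object, mutated between calls. */
    static void emit(Collector output) {
        StringBuilder key = new StringBuilder();
        StringBuilder value = new StringBuilder();
        String[][] data = { { "a", "1" }, { "b", "2" } };
        for (String[] kv : data) {
            key.setLength(0);
            key.append(kv[0]);
            value.setLength(0);
            value.append(kv[1]);
            output.collect(key, value);
        }
        // Mutate the key once more after the last collect() has returned.
        key.setLength(0);
        key.append("mutated");
    }

    public static void main(String[] args) {
        CopyingCollector copying = new CopyingCollector();
        emit(copying);
        System.out.println(copying.records);   // [a=1, b=2]

        RetainingCollector retaining = new RetainingCollector();
        emit(retaining);
        System.out.println(retaining.flush()); // [mutated=2, mutated=2]
    }
}
```

With the copying collector, the output reflects the contents at the moment of each call, which is the semantics I am hoping for. With the retaining collector, every record shows whatever the shared objects held last.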
I think you are saying that for SequenceFileOutputFormat and TextOutputFormat, the scenario I outlined is allowed and the original values appear in the output. Have I got that right? If so, what remains to be answered then is whether this is something I have to answer for each OutputFormat or there is a general rule here.

Suppose, in particular, that my map or reduce method calls output.collect(key,value) several times in series --- each time passing the same object reference for key, and each time passing the same object reference for value, but modifying those objects between calls on output.collect. I would like to know if this is a supported scenario, with the semantics that what is output is the contents of the key and value objects at the moment output.collect(key,value) is called. Have I overlooked some documentation that answers my question?

Thanks,
Mike

From: Brock Noland <br...@cloudera.com>
To: mapreduce-user@hadoop.apache.org
Date: 02/12/2012 07:16 PM
Subject: Re: Can caller mutate value after calling OutputCollector.collect(key,value) ?

Hi

On Sun, Feb 12, 2012 at 8:58 PM, Mike Spreitzer <mspre...@us.ibm.com> wrote:
> I have a question about the contract in the
> org.apache.hadoop.mapred.OutputCollector interface. If it matters, let us
> say we are talking about Hadoop-1.0.0. In my map or reduce method, after it
> calls output.collect(key,value), is it allowed to mutate (side effect)
> either the key or the value?
>
> If the answer is "it depends on the OutputFormat", then what is the answer
> for the more prominent ones, such as SequenceFileFormat and
> TextOutputFormat?

In both cases the key is serialized immediately so the caller cannot mutate the key after calling collect.

http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0/src/mapred/org/apache/hadoop/mapred/TextOutputFormat.java
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0/src/mapred/org/apache/hadoop/mapred/SequenceFileOutputFormat.java
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0/src/core/org/apache/hadoop/io/SequenceFile.java

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/