Jane,

Yes, and that's documented:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Reducer.html#reduce(K2,%20java.util.Iterator,%20org.apache.hadoop.mapred.OutputCollector,%20org.apache.hadoop.mapred.Reporter)

"The framework will reuse the key and value objects that are passed into the reduce, therefore the application should clone the objects they want to keep a copy of."

On Fri, Apr 6, 2012 at 6:26 AM, Jane Wayne <jane.wayne2...@gmail.com> wrote:
> i found out what my problem was. apparently, when you iterate over
> Iterable<Type> values, that instance of Type is being used over and
> over. for example, in my reducer,
>
> public void reduce(Key key, Iterable<Value> values, Context context)
>     throws IOException, InterruptedException {
>   Iterator<Value> it = values.iterator();
>   Value a = it.next();
>   Value b = it.next();
> }
>
> the variables, a and b of type Value, will be the same object
> instance! i suppose this behavior of the iterator is to optimize
> iterating so as to avoid the new operator.
>
> On Thu, Apr 5, 2012 at 4:55 PM, Jane Wayne <jane.wayne2...@gmail.com> wrote:
>> i am currently testing my map reduce job on Windows + Cygwin + Hadoop
>> v0.20.205. for some strange reason, the list of values (i.e.
>> Iterable<T> values) going into the reducer looks all wrong. i have
>> tracked the map reduce process with logging statements (i.e. logged
>> the input to the map, logged the output from the map, logged the
>> partitioner, logged the input to the reducer). at all stages,
>> everything looks correct except at the reducer.
>>
>> is there any way (using Windows + Cygwin) to view the local map
>> outputs before they are shuffled/sorted to the reducer? i need to know
>> why the values are incorrect.

--
Harsh J