Hadoop guarantees that every <k,v> pair with the same key ends up in the same reducer and is consumed in a single reduce instance.
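To make that concrete, here is a rough sketch in Python of what happens between the mappers and the reducers. It is only an illustration, not Hadoop's code: the function names (split_key_value, partition, shuffle) are made up for this example, and zlib.crc32 stands in for the Java hashCode() that Hadoop's default hash partitioner applies to the key. The point it shows is that the reducer index depends only on the key, so the same key always goes to the same reducer no matter which mapper emitted it or how many times.

```python
import zlib
from collections import defaultdict


def split_key_value(line, sep="\t"):
    """Split a streaming output line at the first tab: key before, value after."""
    key, _, value = line.rstrip("\n").partition(sep)
    return key, value


def partition(key, num_reducers):
    """Pick a reducer from the key alone.

    Assumption: crc32 is a stand-in for the key's Java hashCode(); the
    'non-negative hash modulo reducer count' shape mirrors hash partitioning.
    """
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def shuffle(mapper_outputs, num_reducers):
    """Route every line from every mapper to a reducer, then sort by key."""
    reducers = defaultdict(list)
    for lines in mapper_outputs:
        for line in lines:
            key, value = split_key_value(line)
            reducers[partition(key, num_reducers)].append((key, value))
    # Each reducer sees its pairs sorted by key, so equal keys are adjacent.
    return {r: sorted(pairs, key=lambda kv: kv[0]) for r, pairs in reducers.items()}


if __name__ == "__main__":
    # Two hypothetical mappers, both emitting the key "a" more than once.
    outputs = [["a\t1", "b\t2", "a\t3"], ["a\t4", "b\t5", "c\t6"]]
    for reducer, pairs in sorted(shuffle(outputs, 3).items()):
        print(reducer, pairs)
```

With streaming, the reducer script reads these sorted lines on stdin, so it has to detect the key changing from one line to the next itself.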

-----Original Message-----
From: Nipun Saggar [mailto:nipun.sag...@gmail.com]
Sent: Tuesday, August 25, 2009 10:41 AM
To: common-user@hadoop.apache.org
Subject: Re: Hadoop streaming: How is data distributed from mappers to reducers?

Does that mean that, if the same key is emitted more than once from a mapper, it is not necessary that the key value pairs (for that same key) will go to the same reducer?

-Nipun

On Tue, Aug 25, 2009 at 6:13 AM, Aaron Kimball <aa...@cloudera.com> wrote:

> Yes. It works just like Java-based MapReduce in that regard.
> - Aaron
>
> On Sun, Aug 23, 2009 at 5:09 AM, Nipun Saggar <nipun.sag...@gmail.com> wrote:
>
> > Hi all,
> >
> > I have recently started using Hadoop streaming. From the documentation, I
> > understand that by default, each line output from a mapper up to the first
> > tab becomes the key and rest of the line is the value. I wanted to know that
> > between the mapper and reducer, is there a shuffling(sorting) phase? More
> > specifically, Would it be correct to assume that output from all mappers
> > with the same key will go to the same reducer?
> >
> > Thanks,
> > Nipun