Koji - That makes a lot of sense. The two tasks are probably stepping over each other. I'll give it a try and let you know how it goes.
Malcolm - if you turned off speculative execution and are still getting the problem, it doesn't sound the same. Do you want to do a cut & paste of your reduce code and I'll see if I can spot anything suspicious?

On Mon, Mar 2, 2009 at 1:15 PM, Malcolm Matalka <mmata...@millennialmedia.com> wrote:
> I have a situation which may be related. I am running Hadoop 0.18.1. I
> am on a cluster with 5 machines and testing on a very small input of 10
> lines. The mapper produces either 1 or 0 outputs per line of input, yet
> somehow I get 18 lines of output from the reducer. For example, I have
> one input where the key is:
> fd349fc441ff5e726577aeb94cceb1e4
>
> However, I added a print to the reducer to print keys right before
> calling output.collect, and I have 3 instances of this key being printed.
>
> I have turned speculative execution off and still get this.
>
> Does this sound related? A known bug? Something I'm missing? Fixed in
> 19.1?
>
> - Malcolm
>
> -----Original Message-----
> From: Koji Noguchi [mailto:knogu...@yahoo-inc.com]
> Sent: Monday, March 02, 2009 15:59
> To: core-user@hadoop.apache.org
> Subject: RE: Potential race condition (Hadoop 18.3)
>
> Ryan,
>
> If you're using getOutputPath, try replacing it with getWorkOutputPath.
>
> http://hadoop.apache.org/core/docs/r0.18.3/api/org/apache/hadoop/mapred/FileOutputFormat.html#getWorkOutputPath(org.apache.hadoop.mapred.JobConf)
>
> Koji
>
> -----Original Message-----
> From: Ryan Shih [mailto:ryan.s...@gmail.com]
> Sent: Monday, March 02, 2009 11:01 AM
> To: core-user@hadoop.apache.org
> Subject: Potential race condition (Hadoop 18.3)
>
> Hi - I'm not sure yet, but I think I might be hitting a race condition in
> Hadoop 18.3.
> What seems to happen is that in the reduce phase, some of my tasks
> perform speculative execution, but when the initial task completes
> successfully, it sends a kill to the newly started task. After all is
> said and done, perhaps one in every five or ten tasks that kill their
> second attempt ends up with zero or truncated output. When I turn off
> speculative execution, the problem goes away. Are there known race
> conditions that I should be aware of around this area?
>
> Thanks in advance,
> Ryan
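Koji's suggestion works because getWorkOutputPath points each task attempt at its own scratch directory under the job's output, so two speculative attempts of the same task never write to the same file; only the committed attempt's files are promoted into the final output directory, while the killed attempt's scratch dir is discarded. The plain-Java sketch below illustrates that pattern with java.nio.file. It is not Hadoop code: runAttempt, commitAttempt, and the directory names are hypothetical stand-ins for what the 0.18.x framework does internally.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

public class WorkOutputPathSketch {

    // Each speculative attempt writes to its own scratch directory,
    // analogous to what getWorkOutputPath hands a task attempt.
    static Path runAttempt(Path jobDir, String attemptId, List<String> records)
            throws IOException {
        Path attemptDir = jobDir.resolve("_temporary").resolve(attemptId);
        Files.createDirectories(attemptDir);
        Path out = attemptDir.resolve("part-00000");
        Files.write(out, records);
        return out;
    }

    // Only the attempt that finishes first is committed: its file is moved
    // atomically into the final output directory, so readers never see a
    // half-written or clobbered part file.
    static Path commitAttempt(Path jobDir, Path attemptOutput) throws IOException {
        Path finalOut = jobDir.resolve(attemptOutput.getFileName());
        return Files.move(attemptOutput, finalOut, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path jobDir = Files.createTempDirectory("job_output");
        List<String> records = List.of("fd349fc441ff5e726577aeb94cceb1e4\t1");

        // Two speculative attempts of the same reduce task run concurrently;
        // each has a private work dir, so neither steps on the other.
        Path a0 = runAttempt(jobDir, "attempt_r_000000_0", records);
        Path a1 = runAttempt(jobDir, "attempt_r_000000_1", records);

        // Attempt 0 wins the race; attempt 1 is killed and its scratch
        // output is simply deleted, never reaching the final directory.
        Path committed = commitAttempt(jobDir, a0);
        Files.delete(a1);

        System.out.println(Files.readAllLines(committed));
    }
}
```

By contrast, code that resolves its side-file path with getOutputPath has every attempt writing directly to the same final file, which is exactly the "two tasks stepping over each other" truncation Ryan and Koji describe.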