Thanks Jie. That worked. Instead of part-r-00000, I just get part-m-00000, so that's no problem.
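The two file names above follow a visible pattern: `part-`, then `m` or `r` for the kind of task that wrote the file, then a zero-padded task number. The Java sketch below only reproduces that pattern as inferred from the names in this thread. It is not Hadoop's own code, and the class and method names (`PartFileNames`, `partFileName`, `expectedOutputFiles`) are hypothetical. It assumes one output file per map task in a map-only job and one per reduce task otherwise.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (NOT Hadoop code) of the output-file naming pattern seen
// in this thread: "part-m-NNNNN" from map tasks, "part-r-NNNNN" from reduce tasks.
public class PartFileNames {

    // Builds a name such as "part-m-00000" from the task type and the
    // zero-based task number, zero-padded to five digits.
    static String partFileName(boolean fromMapTask, int taskNumber) {
        return String.format("part-%c-%05d", fromMapTask ? 'm' : 'r', taskNumber);
    }

    // Assumption: a map-only job (zero reducers) leaves one file per map task;
    // otherwise there is one file per reduce task.
    static List<String> expectedOutputFiles(int numMapTasks, int numReduceTasks) {
        boolean mapOnly = numReduceTasks == 0;
        int count = mapOnly ? numMapTasks : numReduceTasks;
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add(partFileName(mapOnly, i));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(expectedOutputFiles(1, 1)); // [part-r-00000]
        System.out.println(expectedOutputFiles(1, 0)); // [part-m-00000]
        System.out.println(expectedOutputFiles(3, 0)); // [part-m-00000, part-m-00001, part-m-00002]
    }
}
```

So the switch from `part-r-00000` to `part-m-00000` is the expected sign that the output now comes straight from the map tasks.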
However, I'm going to see if I still get that IOException complaining about no more free disk space.

On Thu, Mar 8, 2012 at 12:19 AM, Jie Li <ji...@cs.duke.edu> wrote:
> You don't need to specify the reducer at all.
>
> Yeah, the map output will go to HDFS directly. It's called a map-only job.
>
> Jie
>
> On Thursday, March 8, 2012, Jane Wayne wrote:
>
> > Jie,
> >
> > So if I set the number of reduce tasks to 0, do I need to specify the reducer (or should I set it to null)? If I don't specify the reducer and just have a mapper, where do all the mapper's output key-value pairs go? Do they get serialized to disk/HDFS automagically?
> >
> > On Thu, Mar 8, 2012 at 12:02 AM, Jie Li <ji...@cs.duke.edu> wrote:
> >
> > > Hi Jane,
> > >
> > > The default Reducer (IdentityReducer) simply reads and writes everything that goes through it. By default, shuffling also happens, and the map output data is partitioned by the HashPartitioner.
> > >
> > > If you don't need the shuffle/reduce, you need to explicitly set the number of reduce tasks to zero via JobConf's setNumReduceTasks(int num).
> > >
> > > Hope that helps.
> > >
> > > Jie
> > >
> > > On Wed, Mar 7, 2012 at 11:28 PM, Jane Wayne <jane.wayne2...@gmail.com> wrote:
> > >
> > > > I have a Mapper and a Reducer as part of a job. All my data transformation occurs in the mapper, and there is absolutely nothing that needs to be done in the reducer. When I set the reducer on the Job, I simply use Reducer.class.
> > > >
> > > > I notice that after the map tasks have reached 100%, the time until reducing starts is very long. When reducing starts, I get an FSError: java.io.IOException: No space left on device. I checked the DFS health (via the web page), and I still have 42.41% DFS remaining. Why does this occur? I see that four attempts are eventually made to run the reducer; however, they all end with the IOException mentioned. The output is at the bottom. Notice that the percentage goes up and then back down to 0% before the IOException.
> > > >
> > > > Also, I want to know if I can just subclass Reducer or do something about shuffling and sorting, as these steps are not important to me. I just want each record emitted from the Mapper to go straight to disk. Is it possible to do this without going through the Reducer? I suspect this is part of why it takes so long between 100% map and the first sign of reduce.
> > > >
> > > > EXAMPLE OUTPUT
> > > >
> > > > 12/03/07 22:38:45 INFO mapred.JobClient: map 98% reduce 0%
> > > > 12/03/07 22:39:18 INFO mapred.JobClient: map 99% reduce 0%
> > > > 12/03/07 22:39:43 INFO mapred.JobClient: map 100% reduce 0%
> > > > 12/03/07 22:58:14 INFO mapred.JobClient: map 100% reduce 1%
> > > > 12/03/07 22:58:23 INFO mapred.JobClient: map 100% reduce 3%
> > > > 12/03/07 22:58:38 INFO mapred.JobClient: map 100% reduce 6%
> > > > 12/03/07 22:58:57 INFO mapred.JobClient: map 100% reduce 7%
> > > > 12/03/07 22:59:21 INFO mapred.JobClient: map 100% reduce 9%
> > > > 12/03/07 23:00:00 INFO mapred.JobClient: map 100% reduce 10%
> > > > 12/03/07 23:00:09 INFO mapred.JobClient: map 100% reduce 12%
> > > > 12/03/07 23:00:58 INFO mapred.JobClient: map 100% reduce 0%
> > > > 12/03/07 23:01:00 INFO mapred.JobClient: Task Id : attempt_201203071517_0043_r_000000_0, Status : FAILED
> > > > FSError: java.io.IOException: No space left on deviceFSError: java.io.IOException: No space left on deviceFSError: java.io.IOException: No space left on deviceFSError: java.io.IOException: No space left on deviceFSError: java.io.IOException: No space left on deviceFSError: java.io.IOException: No space left on device
> > > > attempt_201203071517_0043_r_000000_0: log4j:ERROR Failed to flush writer,
> > > > attempt_201203071517_0043_r_000000_0: java.io.IOException: No space left on device
> > > > 12/03/07 23:01:31 INFO mapred.JobClient: map 100% reduce 1%
> > > > 12/03/07 23:01:34 INFO mapred.JobClient: map 100% reduce 3%
> > > > 12/03/07 23:01:37 INFO mapred.JobClient: map 100% reduce 4%
> > > > 12/03/07 23:01:49 INFO mapred.JobClient: map 100% reduce 6%
> > > > 12/03/07 23:01:55 INFO mapred.JobClient: map 100% reduce 7%
> > > > 12/03/07 23:02:19 INFO mapred.JobClient: map 100% reduce 9%
> > > > 12/03/07 23:02:52 INFO mapred.JobClient: map 100% reduce 0%
> > > > 12/03/07 23:02:54 INFO mapred.JobClient: Task Id : attempt_201203071517_0043_r_000000_1, Status : FAILED
> > > > FSError: java.io.IOException: No space left on deviceFSError: java.io.IOException: No space left on deviceFSError: java.io.IOException: No space left on device
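To make Jie's two explanations concrete, here is a toy, in-memory Java model (not Hadoop code) of what happens to map output with the default identity reduce versus with zero reduce tasks. `partition` uses the same formula as Hadoop's HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. Everything else, including the class and method names and treating a record as a bare String key with no value, is a hypothetical simplification. In a real job the switch is `setNumReduceTasks(0)`, on the old-API `JobConf` or on the new-API `org.apache.hadoop.mapreduce.Job`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy, in-memory model (NOT Hadoop code) of the two setups discussed in this thread.
// Hypothetical simplification: a record is just its String key; values are omitted.
public class MapOnlyVsIdentityReduce {

    // Same formula Hadoop's HashPartitioner uses to choose a reduce task:
    // clear the sign bit of hashCode(), then take the remainder.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Zero reduce tasks (map-only job): each map task's records are written
    // as-is, in the order the mapper emitted them; no shuffle, no sort.
    static List<List<String>> mapOnly(List<List<String>> mapOutputs) {
        List<List<String>> files = new ArrayList<>();
        for (List<String> taskOutput : mapOutputs) {
            files.add(new ArrayList<>(taskOutput));
        }
        return files;
    }

    // Default identity reduce: every record is shuffled to the reducer chosen by
    // partition(), sorted by key there, and then written out unchanged.
    static List<List<String>> identityReduce(List<List<String>> mapOutputs, int numReduceTasks) {
        List<List<String>> files = new ArrayList<>();
        for (int r = 0; r < numReduceTasks; r++) {
            files.add(new ArrayList<>());
        }
        for (List<String> taskOutput : mapOutputs) {      // shuffle: every record moves
            for (String key : taskOutput) {
                files.get(partition(key, numReduceTasks)).add(key);
            }
        }
        for (List<String> file : files) {                 // sort within each partition
            Collections.sort(file);
        }
        return files;                                      // identity: records unchanged
    }

    public static void main(String[] args) {
        // Output of two hypothetical map tasks.
        List<List<String>> mapOutputs = List.of(List.of("pear", "apple"), List.of("fig", "apple"));
        System.out.println("map-only:        " + mapOnly(mapOutputs));
        System.out.println("identity reduce: " + identityReduce(mapOutputs, 1));
    }
}
```

With this sample input, the map-only path keeps each map task's records in emission order (`[[pear, apple], [fig, apple]]`), while the identity-reduce path with one reducer gathers everything into a single sorted partition (`[[apple, apple, fig, pear]]`). That extra shuffle-and-sort pass is exactly the step the original question wanted to skip.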