Owen and Rasit,

Thank you for the responses. I figured out that mapred.reduce.tasks was set to 1 in my hadoop-default.xml and I hadn't overridden it in my hadoop-site.xml configuration file.
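For anyone hitting the same thing, an override in hadoop-site.xml might look roughly like the fragment below. This is only a sketch: the value 4 is an arbitrary illustration, not a figure from this thread, and the right number depends on the cluster and the job.

```xml
<!-- hadoop-site.xml: site-specific overrides of hadoop-default.xml -->
<configuration>
  <property>
    <name>mapred.reduce.tasks</name>
    <!-- illustrative value only; choose one suited to the cluster -->
    <value>4</value>
  </property>
</configuration>
```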
Jim

On Wed, Jan 14, 2009 at 11:23 AM, Owen O'Malley <omal...@apache.org> wrote:

> On Jan 14, 2009, at 12:46 AM, Rasit OZDAS wrote:
>
>> Jim,
>>
>> As far as I know, there is no operation done after Reducer.
>
> Correct, other than output promotion, which moves the output file to the
> final filename.
>
>> But if you are a little experienced, you already know these.
>> Ordered list means one final file, or am I missing something?
>
> There is no value and a lot of cost associated with creating a single file
> for the output. The question is how you want the keys divided between the
> reduces (and therefore output files). The default partitioner hashes the key
> and mods by the number of reduces, which "stripes" the keys across the
> output files. You can use the mapred.lib.InputSampler to generate good
> partition keys and mapred.lib.TotalOrderPartitioner to get completely sorted
> output based on the partition keys. With the total order partitioner, each
> reduce gets an increasing range of keys and thus has all of the nice
> properties of a single file without the costs.
>
> -- Owen
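
The difference Owen describes between hash partitioning and range partitioning can be illustrated with a small, Hadoop-free Java sketch. It is not the Hadoop implementation: the class name PartitionSketch, the methods hashPartition and rangePartition, and the split points "h" and "p" are all hypothetical, chosen only for illustration. In a real job the split points would come from sampling the input (the role InputSampler plays) rather than being hard-coded.

```java
import java.util.Arrays;

public class PartitionSketch {

    // Hash partitioning: hash the key and take it modulo the number of
    // reduces. Masking with Integer.MAX_VALUE keeps the result non-negative.
    // Neighbouring keys end up "striped" across different reduces.
    static int hashPartition(String key, int numReduces) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    // Range partitioning: splitPoints must be sorted and has one entry fewer
    // than the number of reduces. Keys below splitPoints[0] go to reduce 0,
    // keys in [splitPoints[i-1], splitPoints[i]) go to reduce i, and keys at
    // or above the last split point go to the last reduce.
    static int rangePartition(String key, String[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // Found: the key equals a split point, so it opens the next range.
        // Not found: binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        String[] keys = {"apple", "h", "kiwi", "pear", "zebra"};
        String[] splitPoints = {"h", "p"}; // hypothetical, implies 3 reduces

        for (String key : keys) {
            System.out.println(key
                + " hash=" + hashPartition(key, 3)
                + " range=" + rangePartition(key, splitPoints));
        }
    }
}
```

With range partitioning, every key in reduce 0 sorts before every key in reduce 1, and so on, so reading the output files in order yields a globally sorted sequence even though no single file was ever produced.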