Re: Merging reducer outputs into a single part-00000 file

Jim Twensky Wed, 14 Jan 2009 16:34:25 -0800

Owen and Rasit,

Thank you for the responses. I figured out that mapred.reduce.tasks was set
to 1 in my hadoop-default.xml and that I hadn't overridden it in my
hadoop-site.xml configuration file.


Jim

On Wed, Jan 14, 2009 at 11:23 AM, Owen O'Malley <omal...@apache.org> wrote:

> On Jan 14, 2009, at 12:46 AM, Rasit OZDAS wrote:
>
>> Jim,
>>
>> As far as I know, there is no operation done after Reducer.
>>
>
> Correct, other than output promotion, which moves the output file to the
> final filename.
>
>> But if you are a little experienced, you already know this.
>> Ordered list means one final file, or am I missing something?
>>
>
> There is no value and a lot of cost associated with creating a single file
> for the output. The question is how you want the keys divided between the
> reduces (and therefore output files). The default partitioner hashes the key
> and mods by the number of reduces, which "stripes" the keys across the
> output files. You can use the mapred.lib.InputSampler to generate good
> partition keys and mapred.lib.TotalOrderPartitioner to get completely sorted
> output based on the partition keys. With the total order partitioner, each
> reduce gets an increasing range of keys and thus has all of the nice
> properties of a single file without the costs.
>
> -- Owen
>
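
To make Owen's point about partitioning concrete, here is a minimal sketch
in plain Python. It is a simulation, not Hadoop code: hash_partition,
pick_split_points, range_partition and run_reduces are hypothetical helper
names, zlib.crc32 only stands in for the key's hash function, and the split
points are taken from a sorted sample, which is roughly the job that
InputSampler and TotalOrderPartitioner do in the approach described above.

```python
import bisect
import zlib


def hash_partition(key, num_reduces):
    # Default-style partitioning: hash the key, then mod by the number of
    # reduces. crc32 is an assumption used here to get a deterministic hash.
    return zlib.crc32(key.encode("utf-8")) % num_reduces


def pick_split_points(sample, num_reduces):
    # Choose num_reduces - 1 boundary keys at evenly spaced positions of a
    # sorted sample, so each reduce receives a similar share of the keys.
    ordered = sorted(sample)
    step = len(ordered) / num_reduces
    return [ordered[int(step * i)] for i in range(1, num_reduces)]


def range_partition(key, split_points):
    # Total-order-style partitioning: binary search for the key's range.
    # Reduce i only receives keys that sort after every key of reduce i - 1.
    return bisect.bisect_right(split_points, key)


def run_reduces(keys, num_reduces, partition):
    # Each "output file" is the sorted list of keys sent to one reduce.
    outputs = [[] for _ in range(num_reduces)]
    for key in keys:
        outputs[partition(key)].append(key)
    return [sorted(part) for part in outputs]


keys = ["pear", "apple", "kiwi", "fig", "mango", "banana", "cherry", "grape"]
num_reduces = 3

# Hash partitioning "stripes" the keys: each output is sorted on its own,
# but the outputs read one after another are generally not in order.
hash_parts = run_reduces(keys, num_reduces,
                         lambda k: hash_partition(k, num_reduces))

# Range partitioning: reading the outputs in order gives fully sorted keys.
splits = pick_split_points(keys, num_reduces)
range_parts = run_reduces(keys, num_reduces,
                          lambda k: range_partition(k, splits))
```

With range partitioning, reading the outputs in order yields the same
sequence as one globally sorted file, while each reduce still writes its own
file in parallel. The sketch samples the whole key set for simplicity; a real
job would sample only a fraction of the input.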
