Isn't this just what Hadoop does when you set numReduces = 0?

On 4/18/08 10:45 AM, "Devaraj Das" <[EMAIL PROTECTED]> wrote:

> Within a task you can get the taskId (which is unique). Define "public void
> configure(JobConf job)" and in it get the taskId by doing
> job.get("mapred.task.id").
> 
> Now create filenames starting with that as the prefix and maybe a
> monotonically increasing integer as the suffix (defined as a static field in
> the task).
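A minimal sketch of that naming scheme in plain Java. The class and method names are illustrative, and the counter is the "static field in the task" mentioned above; in a real mapper the task id would come from job.get("mapred.task.id") inside configure(JobConf) rather than being passed in.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class UniqueNames {
    // Monotonically increasing suffix, shared across map() calls in one task.
    private static final AtomicInteger counter = new AtomicInteger(0);

    // The taskId prefix keeps names unique across tasks; the counter keeps
    // them unique across calls within a single task.
    static String nextFileName(String taskId) {
        return taskId + "-" + counter.getAndIncrement();
    }
}
```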
> 
>> -----Original Message-----
>> From: Kayla Jay [mailto:[EMAIL PROTECTED]
>> Sent: Friday, April 18, 2008 10:46 PM
>> To: core-user@hadoop.apache.org
>> Subject: RE: Map Intermediate key/value pairs written to file system
>> 
>> Hi.
>> 
>> I don't know how to create unique individual file names for
>> each mapper's key/value pairs.  How do you create individual
>> files per mapper's key/value pairs so they don't overwrite one another?
>> 
>> I.e., how do you create a new file each time, use that code
>> in all of the mappers, and not have each mapper trying
>> to overwrite the others' output?
>> 
>> For example, at the end of map(), I create a sequence file for
>> the output key/value pair.  But if another mapper is
>> running and does the same thing, the file gets overwritten,
>> because that mapper generates the exact same file name and
>> does exactly what the first mapper is doing.
>> 
>> Devaraj Das <[EMAIL PROTECTED]> wrote: Will your requirement
>> be addressed if, from within the map method, you create a
>> sequence file using the SequenceFile.createWriter API, write a
>> key/value pair using the writer's append(key, value) API, and then
>> close the file? You can do this for every key/value pair.
>> Please have a look at the createWriter APIs and the Writer class in
>> o.a.h.i.SequenceFile.
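A minimal sketch of the open, append, close-per-pair pattern described above, with plain java.nio file I/O standing in for Hadoop so it runs without the Hadoop jars. The real code would call SequenceFile.createWriter(...) and Writer.append(key, value); PerPairWriter and its method are illustrative names, not from the thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class PerPairWriter {
    // Write one key/value pair to its own file and close it immediately.
    // The taskId prefix plus a per-task sequence number avoids collisions
    // between concurrently running mappers.
    static Path writePair(Path dir, String taskId, int seq,
                          String key, String value) throws IOException {
        Path file = dir.resolve(taskId + "-" + seq);
        Files.write(file, (key + "\t" + value + "\n")
                .getBytes(StandardCharsets.UTF_8));
        return file;
    }
}
```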
>> 
>>> -----Original Message-----
>>> From: Kayla Jay [mailto:[EMAIL PROTECTED]
>>> Sent: Friday, April 18, 2008 6:12 PM
>>> To: core-user@hadoop.apache.org
>>> Subject: Map Intermediate key/value pairs written to file system
>>> 
>>> Hi
>>> 
>>> I have no reduces. I would like to write my map
>> results to disk 
>>> directly as they are produced, after each map call completes.  I
>> don't want 
>>> to collect them and then write to output.
>>> 
>>> If I wanted to write my map output one-by-one (intermediate
>>> key/value pairs) after each map call completes, into individual files,
>>> instead of collecting them until the end and writing them
>> in one swoop 
>>> into the composite results file (part-000X), is this
>> possible, and how
>>> do I do it?
>>> 
>>> Can I force a write within the map to emit the
>> key/value pairs as
>>> an individual file per result set, instead of using
>> output.collect and
>>> having all of the key/value pairs written to the output?
>>> 
>>> I.e., I would like the intermediate key/value pairs produced by the
>>> maps to be written to disk immediately, rather than collecting
>> at the end 
>>> all of the key/value pairs and writing them out.  I want individual
>>> files per key/value pair produced.
>>> 
>>> Thanks.
>>> 
>> 
>> 
>> 
>> 
> 
