Isn't this just what Hadoop does when you set numReduces = 0?
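[Editor's note: the map-only behavior referred to above is enabled by setting the number of reduce tasks to zero, e.g. JobConf.setNumReduceTasks(0) in code, or equivalently in the job configuration. A sketch, using the old (0.x-era) property name:]

```xml
<!-- Job configuration fragment: run a map-only job so each mapper's
     output is written directly to the output directory as part-0000N
     files, one per map task, with no reduce phase. -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>0</value>
</property>
```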
On 4/18/08 10:45 AM, "Devaraj Das" <[EMAIL PROTECTED]> wrote:

> Within a task you can get the taskId (which is unique). Define "public void
> configure(JobConf job)" and in that method get the taskId by calling
> job.get("mapred.task.id").
>
> Now create filenames starting with that as the prefix, and maybe a
> monotonically increasing integer as the suffix (defined as a static field in
> the task).
>
>> -----Original Message-----
>> From: Kayla Jay [mailto:[EMAIL PROTECTED]]
>> Sent: Friday, April 18, 2008 10:46 PM
>> To: core-user@hadoop.apache.org
>> Subject: RE: Map Intermediate key/value pairs written to file system
>>
>> Hi.
>>
>> I don't know how to create unique individual file names for each
>> mapper's key/value pairs. How do you create individual files per
>> mapper's key/value pairs so they don't overwrite one another?
>>
>> That is, how do you create a new file each time, use that code in all
>> the mappers, and not have each of the mappers trying to overwrite the
>> others' output?
>>
>> For example, at the end of map() I create a sequence file for the
>> output key/value pair. But if another mapper is running and does the
>> same thing, the file gets overwritten, because that other mapper
>> constructs the exact same file name and does exactly what the first
>> mapper is doing.
>>
>> Devaraj Das <[EMAIL PROTECTED]> wrote:
>> Will your requirement be addressed if, from within the map method, you
>> create a sequence file using the SequenceFile.createWriter API, write a
>> key/value pair using the writer's append(key, value) API, and then
>> close the file? You can do this for every key/value pair.
>> Please have a look at the createWriter APIs and the Writer class in
>> o.a.h.io.SequenceFile.
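[Editor's note: the naming scheme described above — task-id prefix plus a per-task counter suffix — can be sketched in plain Java. This is a self-contained illustration, not actual Hadoop code: in a real old-API mapper the prefix would come from job.get("mapred.task.id") inside configure(JobConf), the files would be SequenceFiles, and the thread suggests a static counter field; the class name UniqueNamer and the task-id strings below are hypothetical.]

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the unique-filename scheme from the thread: each task uses its
// (unique) task id as the filename prefix and a monotonically increasing
// integer as the suffix, so no two tasks -- and no two records within a
// task -- ever produce the same name.
public class UniqueNamer {
    private final String taskId;   // in Hadoop: job.get("mapred.task.id")
    private int counter = 0;       // the thread suggests a static field per task

    public UniqueNamer(String taskId) {
        this.taskId = taskId;
    }

    // One fresh name per key/value pair written.
    public String nextFileName() {
        return taskId + "-" + (counter++);
    }

    public static void main(String[] args) {
        // Two concurrent map tasks (ids are illustrative only)
        UniqueNamer task0 = new UniqueNamer("attempt_200804180001_0001_m_000000_0");
        UniqueNamer task1 = new UniqueNamer("attempt_200804180001_0001_m_000001_0");

        List<String> all = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            all.add(task0.nextFileName());
            all.add(task1.nextFileName());
        }
        Set<String> distinct = new HashSet<>(all);
        // All six names are distinct, so no mapper overwrites another's file.
        System.out.println(distinct.size() == all.size());
    }
}
```

Because the task id is unique per map attempt and the counter only ever increases within a task, uniqueness holds without any coordination between mappers.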
>>> -----Original Message-----
>>> From: Kayla Jay [mailto:[EMAIL PROTECTED]]
>>> Sent: Friday, April 18, 2008 6:12 PM
>>> To: core-user@hadoop.apache.org
>>> Subject: Map Intermediate key/value pairs written to file system
>>>
>>> Hi,
>>>
>>> I have no reduces. I would like to write my map results directly to
>>> disk as they are produced, after each map completes. I don't want to
>>> collect them and then write to the output.
>>>
>>> If I wanted to write my map output one by one (intermediate key/value
>>> pairs) after each map completes, into individual files instead of
>>> collecting them until the end and writing them in one swoop into the
>>> composite results file (part-000X), is this possible, and how do I do
>>> that?
>>>
>>> Can I force a write within the map to write the map key/value pairs as
>>> an individual file for each result set, instead of calling
>>> output.collect and having all the key/value pairs written to the
>>> output?
>>>
>>> That is, I would like the intermediate key/value pairs produced by the
>>> maps to be written to disk immediately, rather than collecting all of
>>> the key/value pairs at the end and writing them out. I want individual
>>> files per key/value pair produced.
>>>
>>> Thanks.
>>>
>>> ---------------------------------
>>> Be a better friend, newshound, and know-it-all with Yahoo! Mobile.
>>> Try it now.
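[Editor's note: the per-record write asked about here follows the pattern suggested upthread: open a writer, append one key/value pair, close it. A self-contained sketch, with plain text files standing in for SequenceFile.createWriter/append/close; the class name PerRecordWriter, the tab-separated record format, and the temp directory are illustrative assumptions, not Hadoop API.]

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: write each key/value pair to its own file as soon as the map
// produces it, instead of collecting everything into one part-000X file.
// Plain text files stand in for Hadoop SequenceFiles here.
public class PerRecordWriter {
    private final Path dir;
    private final String taskId;  // unique per map task, prevents collisions
    private int counter = 0;      // monotonically increasing suffix

    public PerRecordWriter(Path dir, String taskId) {
        this.dir = dir;
        this.taskId = taskId;
    }

    // Analogue of: createWriter -> append(key, value) -> close, per record.
    public Path writeOne(String key, String value) throws IOException {
        Path file = dir.resolve(taskId + "-" + (counter++));
        Files.writeString(file, key + "\t" + value + "\n");
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("map-out");
        PerRecordWriter w = new PerRecordWriter(tmp, "m_000000_0");
        Path a = w.writeOne("k1", "v1");
        Path b = w.writeOne("k2", "v2");
        System.out.println(!a.equals(b));  // two records, two distinct files
    }
}
```

Note that opening and closing a file per record is expensive on HDFS, and a very large number of small files puts pressure on the NameNode, which is why the collected part-000X output is the default.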