This should help.

// Old (org.apache.hadoop.mapred) API: "mapred.job.id" is the job ID and
// "mapred.task.partition" is this task's partition number (the same number
// Hadoop uses in part-00000 etc.), so together they give a unique filename.
String jobId = jobConf.get("mapred.job.id");
String taskPartition = jobConf.get("mapred.task.partition");
String filename = "file_" + jobId + "_" + taskPartition;
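A runnable sketch of the same idea, with a plain Map standing in for JobConf so it compiles without Hadoop on the classpath (the property names are the real old-API ones; the example job ID and partition values are made up):

```java
import java.util.HashMap;
import java.util.Map;

public class UniqueFilename {

    // Build a per-task filename from configuration values, exactly as in
    // the snippet above. In a real mapper you would call jobConf.get(...)
    // on the JobConf passed to configure().
    static String uniqueFilename(Map<String, String> conf) {
        String jobId = conf.get("mapred.job.id");
        String taskPartition = conf.get("mapred.task.partition");
        return "file_" + jobId + "_" + taskPartition;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapred.job.id", "job_200903030001_0042"); // hypothetical values
        conf.put("mapred.task.partition", "3");
        System.out.println(uniqueFilename(conf));
        // prints file_job_200903030001_0042_3
    }
}
```

Since every task gets a distinct partition number, no two mappers in the same job will collide on a filename.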

- Saranath

-----Original Message-----
From: Stuart White [mailto:stuart.whi...@gmail.com] 
Sent: Tuesday, March 03, 2009 6:50 PM
To: core-user@hadoop.apache.org
Subject: Best way to write multiple files from a MR job?

I have a large amount of data, from which I'd like to extract multiple
different types of data, writing each type of data to different sets
of output files.  What's the best way to accomplish this?  (I should
mention, I'm only using a mapper.  I have no need for sorting or
reduction.)

Of course, if I only wanted 1 output file, I can just .collect() the
output from my mapper and let mapreduce write the output for me.  But,
to get multiple output files, the only way I can see is to manually
write the files myself from within my mapper.  If that's the correct
way, then how can I get a unique filename for each mapper instance?
Obviously hadoop has solved this problem, because it writes out its
partition files (part-00000, etc...) with unique numbers.  Is there a
way for my mappers to get this unique number being used so they can
use it to ensure a unique filename?

Thanks!
