RE: Best way to write multiple files from a MR job?

2009-03-03 Thread Saranath Raghavan
This should help.

String jobId = jobConf.get("mapred.job.id");
String taskId = jobConf.get("mapred.task.partition");
String filename = "file_" + jobId + "_" + taskId;
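
For example, a mapper could use those two values to open its own side
file on HDFS.  This is a minimal sketch against the old
org.apache.hadoop.mapred API; the /tmp/out output directory and the
SideFileMapper class name are made up for illustration:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class SideFileMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private FSDataOutputStream sideFile;

  public void configure(JobConf conf) {
    // Build a per-task filename from the job id and task partition,
    // as in the snippet above.
    String jobId = conf.get("mapred.job.id");
    String taskId = conf.get("mapred.task.partition");
    Path path = new Path("/tmp/out", "file_" + jobId + "_" + taskId);
    try {
      sideFile = FileSystem.get(conf).create(path);
    } catch (IOException e) {
      throw new RuntimeException("cannot create side file", e);
    }
  }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // Write records of one type to the side file instead of
    // collecting them through the normal output path.
    sideFile.writeBytes(value.toString() + "\n");
  }

  public void close() throws IOException {
    sideFile.close();
  }
}

If your Hadoop release ships org.apache.hadoop.mapred.lib.MultipleOutputs
or MultipleOutputFormat, those can also generate per-task filenames
for you.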

- Saranath

-----Original Message-----
From: Stuart White [mailto:stuart.whi...@gmail.com] 
Sent: Tuesday, March 03, 2009 6:50 PM
To: core-user@hadoop.apache.org
Subject: Best way to write multiple files from a MR job?

I have a large amount of data from which I'd like to extract several
different types of records, writing each type to its own set of
output files.  What's the best way to accomplish this?  (I should
mention I'm only using a mapper; I have no need for sorting or
reduction.)

Of course, if I only wanted one output file, I could just .collect()
the output from my mapper and let MapReduce write it for me.  But to
get multiple output files, the only way I can see is to write the
files myself from within my mapper.  If that's the correct approach,
how can I get a unique filename for each mapper instance?
Obviously Hadoop has solved this problem, because it writes out its
partition files (part-00000, etc.) with unique numbers.  Is there a
way for my mappers to get the unique number being used, so they can
use it to ensure a unique filename?

Thanks!




Searching Lucene Index built using Hadoop

2008-10-06 Thread Saranath

I'm trying to index a large dataset using Hadoop+Lucene. I used the example
under hadoop/trunk/src/contrib/index/ for indexing, but I'm unable to find
a way to search the index that was successfully built.

I tried copying the per-task indexes over to one machine and merging them
using IndexWriter.addIndexesNoOptimize().
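
For reference, here is roughly what that merge (plus a sanity-check
search) would look like.  This is a minimal sketch assuming a Lucene
2.x-era API; the shard-N and merged directory names and the "content"
field are made up for illustration:

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeAndSearch {
  public static void main(String[] args) throws Exception {
    // Merge the per-task indexes (already copied to this machine)
    // into a single local index.
    Directory merged = FSDirectory.getDirectory(new File("merged"));
    Directory[] shards = {
        FSDirectory.getDirectory(new File("shard-0")),
        FSDirectory.getDirectory(new File("shard-1")),
    };
    IndexWriter writer = new IndexWriter(merged, new StandardAnalyzer(), true);
    writer.addIndexesNoOptimize(shards);
    writer.optimize();  // compact segments so searches touch fewer files
    writer.close();

    // Sanity check: run a term query against the merged index.
    IndexSearcher searcher = new IndexSearcher(merged);
    TopDocs hits =
        searcher.search(new TermQuery(new Term("content", "hadoop")), null, 10);
    System.out.println(hits.totalHits + " hits");
    searcher.close();
  }
}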

I would like to hear your input on the best way to index and search large
datasets.

Thanks,
Saranath
-- 
View this message in context: 
http://www.nabble.com/Searching-Lucene-Index-built-using-Hadoop-tp19842438p19842438.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.