The path is defined by the FileOutputFormat in use.  In particular, I think 
this function is responsible:

http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html#getDefaultWorkFile(org.apache.hadoop.mapreduce.TaskAttemptContext, java.lang.String)

It should give you the file path before all tasks have completed and the output 
is committed to the final output path.
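To illustrate, here is a sketch in plain Java of the path that call resolves to. The helper name and the hard-coded layout are assumptions based on Hadoop 0.20's default FileOutputCommitter (work path = <outputdir>/_temporary/_<task attempt id>); this is not the Hadoop API itself, just a re-creation of the naming scheme:

```java
// Illustrative re-creation of the path getDefaultWorkFile() returns under
// the default FileOutputCommitter in Hadoop 0.20 -- NOT the Hadoop API,
// just the assumed naming scheme spelled out.
public class DefaultWorkFile {
    static String defaultWorkFile(String outputDir, String attemptId,
                                  String taskType, int partition) {
        // Assumed work path layout: <outputdir>/_temporary/_<task attempt id>
        String workPath = outputDir + "/_temporary/_" + attemptId;
        // File name: part-<m|r>-<5-digit, zero-padded partition number>
        return String.format("%s/part-%s-%05d", workPath, taskType, partition);
    }

    public static void main(String[] args) {
        System.out.println(defaultWorkFile(
            "/user/mark/out", "attempt_200707121733_0003_m_000005_0", "m", 5));
        // /user/mark/out/_temporary/_attempt_200707121733_0003_m_000005_0/part-m-00005
    }
}
```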

Luca

On May 23, 2011 14:42:04 Joey Echeverria wrote:
> Hi Mark,
> 
> FYI, I'm moving the discussion over to
> mapreduce-u...@hadoop.apache.org since your question is specific to
> MapReduce.
> 
> You can derive the output name from the TaskAttemptID which you can
> get by calling getTaskAttemptID() on the context passed to your
> cleanup() function. The task attempt id will look like this:
> 
> attempt_200707121733_0003_m_000005_0
> 
> You're interested in the m_000005 part. This gets translated into the
> output file name part-m-00005.
> 
> -Joey
> 
> On Sat, May 21, 2011 at 8:03 PM, Mark question <markq2...@gmail.com> wrote:
> > Hi,
> > 
> >  I'm running a map-only job, and at the end of each map (i.e. in the
> > close() function) I want to open the file that the current map has
> > written using its output collector.
> > 
> >  I know "job.getWorkingDirectory()" would give me the parent path of the
> > file written, but how do I get the full path or the name (i.e. part-00000
> > or part-00001)?
> > 
> > Thanks,
> > Mark
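
The translation Joey describes can be sketched in plain Java. The helper below is hypothetical (Hadoop does this internally via FileOutputFormat.getUniqueFile()); it just shows how the type letter and partition number in the attempt id map to the final file name:

```java
// Hypothetical helper deriving the output file name from a task attempt id
// string of the form attempt_<jobtimestamp>_<jobnum>_<m|r>_<tasknum>_<attempt>,
// mirroring the mapping Joey describes.
public class OutputName {
    static String outputName(String attemptId) {
        String[] parts = attemptId.split("_");
        String type = parts[3];                      // "m" for map, "r" for reduce
        int partition = Integer.parseInt(parts[4]);  // e.g. "000005" -> 5
        // Final name: part-<type>-<5-digit, zero-padded partition number>
        return String.format("part-%s-%05d", type, partition);
    }

    public static void main(String[] args) {
        System.out.println(outputName("attempt_200707121733_0003_m_000005_0"));
        // part-m-00005
    }
}
```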

-- 
Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
Pula 09010 (CA), Italy
Tel:  +39 0709250452
