Alright, finally managed to get the intermediate file.

The pattern should be ".*_m_0000.*" instead of ".*_m_0000*"... stupid me.

If you want to keep everything, use ".*" as the pattern. ;)
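
For anyone who hits the same thing, here is a minimal sketch of why the first pattern failed. It rests on an assumption I have not confirmed against the Hadoop 1.0.3 source: that the TaskTracker applies keep.task.files.pattern as a full-string match against the task attempt ID (for example attempt_201208101040_0003_m_000021_0, as seen in the ttprivate path quoted below) rather than against the shorter task ID. Python's re.fullmatch stands in for Java's Pattern.matches here, and keeps_files is a hypothetical helper name, not a Hadoop API.

```python
import re


def keeps_files(pattern: str, task_name: str) -> bool:
    """Return True if the whole task name matches the pattern.

    Assumption: Hadoop requires a full-string match (like Java's
    Pattern.matches), so matching only a prefix does not count.
    """
    return re.fullmatch(pattern, task_name) is not None


# The task ID and the map attempt ID are taken from this thread;
# the reduce attempt ID is made up for contrast.
task_id = "task_201208101126_0004_m_000000"
map_attempt = "attempt_201208101040_0003_m_000021_0"
reduce_attempt = "attempt_201208101040_0003_r_000000_0"  # hypothetical

for pattern in (".*_m_0000*", ".*_m_0000.*", ".*"):
    for name in (task_id, map_attempt, reduce_attempt):
        print(f"{pattern!r:16} {name:40} {keeps_files(pattern, name)}")
```

Under that assumption, the trailing "0*" only allows more zeros at the end of the name, so the "_0" attempt suffix breaks the match, while a trailing ".*" accepts it. Note that ".*_m_0000.*" would then cover only map tasks 000000 through 000099.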


Best Regards,
Raymond Liu


> -----Original Message-----
> From: Liu, Raymond [mailto:raymond....@intel.com]
> Sent: Friday, August 10, 2012 2:42 PM
> To: Harsh J; common-user@hadoop.apache.org
> Subject: RE: How can I get the intermediate output file from mapper class?
> 
> Hi Harsh
> 
>       Thanks for your reply, though I don't quite catch what you mean.
> According to the description:
> 
> <property>
>   <name>keep.task.files.pattern</name>
>   <value>.*_m_0000*</value>
>   <description>Keep all files from tasks whose task names match the given
>                regular expression. Defaults to none.</description>
> </property>
> 
> 
>       Isn't that pattern matched against the task name? The task name is
> something like task_201208101126_0004_m_000000, so shouldn't this pattern
> keep all the data from those tasks from being cleaned up?
> 
>       If this doesn't work, can you kindly show me the exact pattern I
> should put here for the map->intermediate->reduce intermediate file (the
> merged partition file waiting to be shuffled to reduce tasks)? I tried
> ".out*", but it doesn't work either.
> 
> Or should I modify some other property instead?
> 
> 
> Best Regards,
> Raymond Liu
> 
> > -----Original Message-----
> > From: Harsh J [mailto:ha...@cloudera.com]
> > Sent: Friday, August 10, 2012 12:29 PM
> > To: common-user@hadoop.apache.org
> > Subject: Re: How can I get the intermediate output file from mapper class?
> >
> > Hi,
> >
> > You need the "file.out" and "file.out.index" files when wanting the
> > map->intermediate->reduce files. So try a pattern that matches these
> > and you should have it.
> >
> > The "XXXXX" kind of files are what MR produces on HDFS as regular
> > outputs - these aren't intermediate.
> >
> > On Fri, Aug 10, 2012 at 8:52 AM, Liu, Raymond <raymond....@intel.com>
> > wrote:
> > > Hi
> > >
> > >         I am trying to access the intermediate file saved to the
> > > local filesystem from MapReduce's mapper output.
> > >
> > >         I have googled this one:
> > > http://stackoverflow.com/questions/7867608/hadoop-mapreduce-intermediate-output
> > >
> > >         I am using Hadoop 1.0.3, and I set the following property
> > > in mapred-site.xml:
> > >
> > > <property>
> > >   <name>keep.task.files.pattern</name>
> > >   <value>.*_m_00000*</value>
> > > </property>
> > >
> > > Then, after restarting Hadoop and running some jobs, I did see tasks
> > > in my local dir, like:
> > >
> > > /mnt/DP_disk1/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> > >
> > > But I still cannot find any output dir there.
> > >
> > > I have four disks mounted for the local dir, and only the jars and
> > > work dirs are found, as follows:
> > >
> > > <property>
> > > <name>mapred.local.dir</name>
> > > <value>/mnt/DP_disk1/raymond/hdfs/mapred,/mnt/DP_disk2/raymond/hdfs/mapred,/mnt/DP_disk3/raymond/hdfs/mapred,/mnt/DP_disk4/raymond/hdfs/mapred</value>
> > > </property>
> > >
> > > Then I searched through them:
> > >
> > > raymond@sr173:~$ ls /mnt/DP_disk1/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> > > jars  job.xml
> > > raymond@sr173:~$ ls /mnt/DP_disk2/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> > > raymond@sr173:~$ ls /mnt/DP_disk3/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> > > jobToken  work
> > > raymond@sr173:~$ ls /mnt/DP_disk4/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> > >
> > > I also searched the ttprivate dir, with no luck there:
> > >
> > > raymond@sr173:~$ ls /mnt/DP_disk4/raymond/hdfs/mapred/ttprivate/taskTracker/raymond/jobcache/job_201208101040_0003/attempt_201208101040_0003_m_000021_0/taskjvm.sh
> > > /mnt/DP_disk4/raymond/hdfs/mapred/ttprivate/taskTracker/raymond/jobcache/job_201208101040_0003/attempt_201208101040_0003_m_000021_0/taskjvm.sh
> > >
> > > So, is there anything I am still missing?
> > >
> > >
> > > Best Regards,
> > > Raymond Liu
> > >
> >
> >
> >
> > --
> > Harsh J
