Alright, I finally managed to get the intermediate file. The pattern should be ".*_m_0000.*" instead of ".*_m_0000*"... stupid me.
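
A minimal sketch of why the trailing "0*" fails. It assumes the pattern has to match the whole task attempt ID, the way Java's Pattern.matches behaves. Python's re.fullmatch stands in for the Java call, and the attempt IDs are made up in the style of the ones in this thread.

```python
import re

# Hypothetical attempt IDs, modeled on the names seen in this thread.
attempt_ids = [
    "attempt_201208101040_0003_m_000000_0",
    "attempt_201208101040_0003_m_000021_0",
    "attempt_201208101040_0003_r_000000_0",
]


def kept(pattern, ids):
    """Return the IDs the pattern matches in full (assumed matching semantics)."""
    regex = re.compile(pattern)
    return [i for i in ids if regex.fullmatch(i)]


# "0*" means "zero or more 0 characters", so nothing may follow the zeros.
broken = kept(r".*_m_0000*", attempt_ids)

# ".*" after the prefix lets the rest of the ID (e.g. "21_0") match.
fixed = kept(r".*_m_0000.*", attempt_ids)

# ".*" alone matches every ID.
everything = kept(r".*", attempt_ids)

print("broken:    ", broken)
print("fixed:     ", fixed)
print("everything:", everything)
```

Under that assumption the first pattern keeps nothing, because every attempt ID ends in a suffix such as "_0" that "0*" cannot consume.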
If you want to keep everything, use ".*" as the pattern. ;)

Best Regards,
Raymond Liu

> -----Original Message-----
> From: Liu, Raymond [mailto:raymond....@intel.com]
> Sent: Friday, August 10, 2012 2:42 PM
> To: Harsh J; common-user@hadoop.apache.org
> Subject: RE: How can I get the intermediate output file from mapper class?
>
> Hi Harsh
>
> Thanks for your reply, though I don't quite catch what you mean...
> According to the description:
>
> <property>
>   <name>keep.task.files.pattern</name>
>   <value>.*_m_0000*</value>
>   <description>Keep all files from tasks whose task names match the given
>   regular expression. Defaults to none.</description>
> </property>
>
> Isn't that pattern for the task name? And the task name is something like
> task_201208101126_0004_m_000000? So shouldn't this pattern keep all the
> data from those tasks from being cleaned up?
>
> If this doesn't work, can you kindly show me the exact pattern I should
> put here for the map->intermediate->reduce intermediate file (the merged
> partition file waiting to be shuffled to reduce tasks)? I tried ".out*";
> it doesn't work either.
>
> Or should I modify some other property instead?
>
> Best Regards,
> Raymond Liu
>
> > -----Original Message-----
> > From: Harsh J [mailto:ha...@cloudera.com]
> > Sent: Friday, August 10, 2012 12:29 PM
> > To: common-user@hadoop.apache.org
> > Subject: Re: How can I get the intermediate output file from mapper class?
> >
> > Hi,
> >
> > You need the "file.out" and "file.out.index" files when wanting the
> > map->intermediate->reduce files. So try a pattern that matches these
> > and you should have it.
> >
> > The "XXXXX" kind of files are what MR produces on HDFS as regular
> > outputs - these aren't intermediate.
> >
> > On Fri, Aug 10, 2012 at 8:52 AM, Liu, Raymond <raymond....@intel.com> wrote:
> > > Hi
> > >
> > > I am trying to access the intermediate file saved to the local
> > > filesystem from mapreduce's mapper output.
> > >
> > > I have googled this one:
> > > http://stackoverflow.com/questions/7867608/hadoop-mapreduce-intermediate-output
> > >
> > > I am using hadoop 1.0.3, and I did set the following property in
> > > mapred-site.xml:
> > >
> > > <property>
> > >   <name>keep.task.files.pattern</name>
> > >   <value>.*_m_00000*</value>
> > > </property>
> > >
> > > Then, after restarting hadoop and running some jobs, I did see tasks in
> > > my local dir like:
> > >
> > > /mnt/DP_disk1/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> > >
> > > But I still cannot find any output dir there.
> > >
> > > I have four disks mounted for the local dir, and only the jars and work
> > > dirs are found, as follows:
> > >
> > > <property>
> > >   <name>mapred.local.dir</name>
> > >   <value>/mnt/DP_disk1/raymond/hdfs/mapred,/mnt/DP_disk2/raymond/hdfs/mapred,/mnt/DP_disk3/raymond/hdfs/mapred,/mnt/DP_disk4/raymond/hdfs/mapred</value>
> > > </property>
> > >
> > > Then I searched through them:
> > >
> > > raymond@sr173:~$ ls /mnt/DP_disk1/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> > > jars  job.xml
> > > raymond@sr173:~$ ls /mnt/DP_disk2/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> > > raymond@sr173:~$ ls /mnt/DP_disk3/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> > > jobToken  work
> > > raymond@sr173:~$ ls /mnt/DP_disk4/raymond/hdfs/mapred/taskTracker/raymond/jobcache/job_201208101040_0003/
> > >
> > > And I also searched the ttprivate dir, no luck there:
> > >
> > > raymond@sr173:~$ ls /mnt/DP_disk4/raymond/hdfs/mapred/ttprivate/taskTracker/raymond/jobcache/job_201208101040_0003/attempt_201208101040_0003_m_000021_0/taskjvm.sh
> > > /mnt/DP_disk4/raymond/hdfs/mapred/ttprivate/taskTracker/raymond/jobcache/job_201208101040_0003/attempt_201208101040_0003_m_000021_0/taskjvm.sh
> > >
> > > So, is there anything I am still missing?
> > >
> > > Best Regards,
> > > Raymond Liu
> >
> > --
> > Harsh J