Re: Separating mapper intermediate files

2012-03-27 Thread aayush
Thanks Harsh.

I set mapred.local.dir as you suggested. It creates 4 folders in it for
jobtracker, tasktracker, tt_private, etc. I could not see an attempt directory.
Can you let me know exactly where to look in this directory structure?

Furthermore, it seems that all the intermediate spill files and map output are
cleaned up when the mapper finishes. I want to inspect those intermediate files
and don't want them cleaned up. How can I achieve this?

Thanks a lot

On Mar 27, 2012, at 1:16 AM, Harsh J-2 [via Hadoop Common]
ml-node+s472056n3860389...@n3.nabble.com wrote:

 Hello Aayush, 
 
 Three things that'd help clear your confusion: 
 1. dfs.data.dir controls where HDFS blocks are stored. Set this
 to a path on partition2.
 2. mapred.local.dir controls where intermediate task data goes. Set
 this to a path on partition1.
 
  Furthermore, can someone also tell me how to save intermediate mapper
  files (spill outputs) and where they are saved?
 
 Intermediate outputs are handled by the framework itself (no user or
 manual work is involved) and are saved inside attempt directories
 under mapred.local.dir.
 
 On Tue, Mar 27, 2012 at 4:46 AM, aayush [hidden email] wrote: 
  I am a newbie to Hadoop and MapReduce. I am running a single-node Hadoop
  setup. I have created 2 partitions on my HDD. I want the mapper
  intermediate files (i.e. the spill files and the mapper output) to be sent
  to a file system on partition1, whereas everything else, including HDFS,
  should run on partition2. I am struggling to find the appropriate
  parameters in the conf files. I understand that there are hadoop.tmp.dir
  and mapred.local.dir, but I am not sure which to use for what. I would
  really appreciate it if someone could tell me exactly which parameters to
  modify to achieve this goal.
 
 -- 
 Harsh J 
 
 


--
View this message in context: 
http://hadoop-common.472056.n3.nabble.com/Separating-mapper-intermediate-files-tp3859787p3861159.html
Sent from the Users mailing list archive at Nabble.com.

Re: Separating mapper intermediate files

2012-03-27 Thread Raj Vishwanathan
Aayush

You can use the following property. Just adjust the pattern to match the
tasks whose files you want to keep.

 <property>
   <name>keep.task.files.pattern</name>
   <value>.*_m_123456_0</value>
   <description>Keep all files from tasks whose task names match the given
                regular expression. Defaults to none.</description>
 </property>
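
As a hedged illustration of adjusting the pattern: the sketch below assumes that map task names contain _m_ (as in the example value above), so a broader expression should keep the files of every map task rather than one specific attempt. The value .*_m_.* is an illustrative assumption and does not come from this thread. Files kept this way stay under mapred.local.dir until they are removed by hand.

```xml
<!-- mapred-site.xml fragment (or set per job); goes inside <configuration> -->
<property>
  <name>keep.task.files.pattern</name>
  <!-- Assumed pattern: matches any task name containing "_m_",
       i.e. all map tasks instead of a single attempt -->
  <value>.*_m_.*</value>
</property>
```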


Raj




 From: aayush aayushgupta...@gmail.com
To: common-user@hadoop.apache.org 
Sent: Tuesday, March 27, 2012 5:18 AM
Subject: Re: Separating mapper intermediate files
 
Thanks Harsh.

I set mapred.local.dir as you suggested. It creates 4 folders in it for
jobtracker, tasktracker, tt_private, etc. I could not see an attempt directory.
Can you let me know exactly where to look in this directory structure?

Furthermore, it seems that all the intermediate spill files and map output are
cleaned up when the mapper finishes. I want to inspect those intermediate files
and don't want them cleaned up. How can I achieve this?

Thanks a lot

On Mar 27, 2012, at 1:16 AM, Harsh J-2 [via Hadoop Common]
ml-node+s472056n3860389...@n3.nabble.com wrote:

 Hello Aayush, 
 
 Three things that'd help clear your confusion: 
 1. dfs.data.dir controls where HDFS blocks are stored. Set this
 to a path on partition2.
 2. mapred.local.dir controls where intermediate task data goes. Set
 this to a path on partition1.
 
  Furthermore, can someone also tell me how to save intermediate mapper
  files (spill outputs) and where they are saved?
 
 Intermediate outputs are handled by the framework itself (no user or
 manual work is involved) and are saved inside attempt directories
 under mapred.local.dir.
 
 On Tue, Mar 27, 2012 at 4:46 AM, aayush [hidden email] wrote: 
  I am a newbie to Hadoop and MapReduce. I am running a single-node Hadoop
  setup. I have created 2 partitions on my HDD. I want the mapper
  intermediate files (i.e. the spill files and the mapper output) to be sent
  to a file system on partition1, whereas everything else, including HDFS,
  should run on partition2. I am struggling to find the appropriate
  parameters in the conf files. I understand that there are hadoop.tmp.dir
  and mapred.local.dir, but I am not sure which to use for what. I would
  really appreciate it if someone could tell me exactly which parameters to
  modify to achieve this goal.
 
 -- 
 Harsh J 
 
 






Separating mapper intermediate files

2012-03-26 Thread aayush
I am a newbie to Hadoop and MapReduce. I am running a single-node Hadoop
setup. I have created 2 partitions on my HDD. I want the mapper intermediate
files (i.e. the spill files and the mapper output) to be sent to a file
system on partition1, whereas everything else, including HDFS, should run on
partition2. I am struggling to find the appropriate parameters in the conf
files. I understand that there are hadoop.tmp.dir and mapred.local.dir, but I
am not sure which to use for what. I would really appreciate it if someone
could tell me exactly which parameters to modify to achieve this goal.

Furthermore, can someone also tell me how to save intermediate mapper
files (spill outputs) and where they are saved?

Thanks in advance for any help.

--
View this message in context: 
http://hadoop-common.472056.n3.nabble.com/Separating-mapper-intermediate-files-tp3859787p3859787.html
Sent from the Users mailing list archive at Nabble.com.


Re: Separating mapper intermediate files

2012-03-26 Thread Harsh J
Hello Aayush,

Three things that'd help clear your confusion:
1. dfs.data.dir controls where HDFS blocks are stored. Set this
to a path on partition2.
2. mapred.local.dir controls where intermediate task data goes. Set
this to a path on partition1.

 Furthermore, can someone also tell me how to save intermediate mapper
 files (spill outputs) and where they are saved?

Intermediate outputs are handled by the framework itself (no user or
manual work is involved) and are saved inside attempt directories
under mapred.local.dir.

On Tue, Mar 27, 2012 at 4:46 AM, aayush aayushgupta...@gmail.com wrote:
 I am a newbie to Hadoop and MapReduce. I am running a single-node Hadoop
 setup. I have created 2 partitions on my HDD. I want the mapper intermediate
 files (i.e. the spill files and the mapper output) to be sent to a file
 system on partition1, whereas everything else, including HDFS, should run on
 partition2. I am struggling to find the appropriate parameters in the conf
 files. I understand that there are hadoop.tmp.dir and mapred.local.dir, but I
 am not sure which to use for what. I would really appreciate it if someone
 could tell me exactly which parameters to modify to achieve this goal.

-- 
Harsh J
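
A minimal sketch of the two settings discussed in this reply, laid out the way aayush asked for (mapper intermediate files on partition 1, HDFS on partition 2). The mount points /mnt/partition1 and /mnt/partition2 are hypothetical, and the file names assume the usual Hadoop 1.x split between hdfs-site.xml and mapred-site.xml. Each property goes inside that file's <configuration> element.

```xml
<!-- hdfs-site.xml: where the DataNode stores HDFS blocks -->
<property>
  <name>dfs.data.dir</name>
  <!-- hypothetical path on partition 2 -->
  <value>/mnt/partition2/hdfs/data</value>
</property>

<!-- mapred-site.xml: where spill files and map outputs are written -->
<property>
  <name>mapred.local.dir</name>
  <!-- hypothetical path on partition 1 -->
  <value>/mnt/partition1/mapred/local</value>
</property>
```

The daemons would need a restart to pick up the new directories, and both paths should be writable by the user running Hadoop.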