Re: Separating mapper intermediate files
Thanks Harsh.

I set mapred.local.dir as you suggested. It creates 4 folders in it, for jobtracker, tasktracker, tt_private, etc., but I could not see an attempt directory. Can you let me know exactly where to look in this directory structure?

Furthermore, it seems that all the intermediate spill files and map output are cleaned up when the mapper finishes. I want to inspect those intermediate files, so I don't want them to be cleaned up. How can I achieve that?

Thanks a lot

On Mar 27, 2012, at 1:16 AM, Harsh J-2 [via Hadoop Common] ml-node+s472056n3860389...@n3.nabble.com wrote:

> Hello Aayush,
>
> Three things that'd help clear your confusion:
> 1. dfs.data.dir controls where HDFS blocks are to be stored. Set this to a partition1 path.
> 2. mapred.local.dir controls where intermediate task data go to. Set this to a partition2 path.
>
>> Furthermore, can someone also tell me how to save intermediate mapper files(spill outputs) and where are they saved.
>
> Intermediate outputs are handled by the framework itself (there is no user/manual work involved), and are saved inside attempt directories under mapred.local.dir.
--
View this message in context: http://hadoop-common.472056.n3.nabble.com/Separating-mapper-intermediate-files-tp3859787p3861159.html
Sent from the Users mailing list archive at Nabble.com.
Re: Separating mapper intermediate files
Aayush,

You can use the following property. Just play around with the pattern.

<property>
  <name>keep.task.files.pattern</name>
  <value>.*_m_123456_0</value>
  <description>Keep all files from tasks whose task names match the given regular expression. Defaults to none.</description>
</property>

Raj

From: aayush aayushgupta...@gmail.com
To: common-user@hadoop.apache.org
Sent: Tuesday, March 27, 2012 5:18 AM
Subject: Re: Separating mapper intermediate files

> Thanks Harsh. I set the mapred.local.dir as you suggested. It creates 4 folders in it for jobtracker, tasktracker, tt_private etc. I could not see an attempt directory. Can you let me know exactly where to look in this directory structure?
>
> Furthermore, it seems that all the intermediate spill and map output are cleaned up when the mapper finishes. I want to see those intermediate files and don't want the cleanup of these files. How can I achieve it?
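As a hedged illustration of "playing around with the pattern": the sketch below assumes the regular expression is matched against the full task attempt name, and that map attempts carry an "_m_" segment in that name, as the value in Raj's example suggests. Under those assumptions a broader pattern would keep the files of every map attempt instead of a single one. The value .*_m_.* is an illustrative choice, not something given in the thread.

```xml
<!-- conf/mapred-site.xml, or the configuration of an individual job -->
<property>
  <name>keep.task.files.pattern</name>
  <!-- Illustrative pattern (assumption): matches any task attempt name
       containing "_m_", i.e. map attempts under the assumed naming scheme. -->
  <value>.*_m_.*</value>
</property>
```

With files retained, the attempt directories should still be present under mapred.local.dir after the job finishes. In Hadoop 1.x-era layouts they are usually nested below the tasktracker folder, in a per-job cache directory, but the exact path depends on the version, so treat that as something to verify locally. Keeping the files of every map task can use a lot of disk on larger jobs, so a narrower pattern is safer once you know which attempt you want to inspect.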
Separating mapper intermediate files
I am a newbie to Hadoop and MapReduce. I am running a single-node Hadoop setup and have created 2 partitions on my HDD. I want the mapper intermediate files (i.e. the spill files and the mapper output) to be written to a file system on Partition1, whereas everything else, including HDFS, should run on Partition2.

I am struggling to find the appropriate parameters in the conf files. I understand that there are hadoop.tmp.dir and mapred.local.dir, but I am not sure which one to use for what. I would really appreciate it if someone could tell me exactly which parameters to modify to achieve this.

Furthermore, can someone also tell me how to save the intermediate mapper files (spill outputs) and where they are saved?

Thanks in advance for any help.

--
View this message in context: http://hadoop-common.472056.n3.nabble.com/Separating-mapper-intermediate-files-tp3859787p3859787.html
Re: Separating mapper intermediate files
Hello Aayush,

Three things that'd help clear your confusion:

1. dfs.data.dir controls where HDFS blocks are to be stored. Set this to a partition1 path.
2. mapred.local.dir controls where intermediate task data go to. Set this to a partition2 path.

> Furthermore, can someone also tell me how to save intermediate mapper files(spill outputs) and where are they saved.

Intermediate outputs are handled by the framework itself (there is no user/manual work involved), and are saved inside attempt directories under mapred.local.dir.

On Tue, Mar 27, 2012 at 4:46 AM, aayush aayushgupta...@gmail.com wrote:

> I am a newbie to Hadoop and map reduce. I am running a single node hadoop setup. I have created 2 partitions on my HDD. I want the mapper intermediate files (i.e. the spill files and the mapper output) to be sent to a file system on Partition1 whereas everything else including HDFS should be run on partition2. I am struggling to find the appropriate parameters in the conf files. I understand that there is hadoop.tmp.dir and mapred.local.dir but am not sure how to use what. I would really appreciate if someone could tell me exactly which parameters to modify to achieve the goal.

--
Harsh J
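The two properties named in the reply above could be set roughly as follows. This is a minimal sketch, assuming a Hadoop 1.x-style configuration in which dfs.data.dir lives in conf/hdfs-site.xml and mapred.local.dir in conf/mapred-site.xml. The mount points /mnt/partition1 and /mnt/partition2 are hypothetical. The assignment shown follows the original question (intermediate map data on partition 1, HDFS blocks on partition 2); swap the two paths if you want the opposite layout.

```xml
<!-- conf/hdfs-site.xml: where the DataNode stores HDFS blocks.
     /mnt/partition2 is a hypothetical mount point. -->
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/partition2/hdfs/data</value>
</property>

<!-- conf/mapred-site.xml: where task-local intermediate data
     (spill files, map output) is written.
     /mnt/partition1 is a hypothetical mount point. -->
<property>
  <name>mapred.local.dir</name>
  <value>/mnt/partition1/mapred/local</value>
</property>
```

Each property goes inside the <configuration> element of the file named in its comment. The daemons most likely need a restart before the new locations take effect, and the directories must be writable by the user running Hadoop.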