Re: Does Hadoop Honor Reserved Space?
Hi Pete, Joydeep,

These sound like thoughts that could lead to excellent suggestions with a little more investment of your time. We'd love it if you could invest some effort into contributing to the release process! Hadoop is open source, and becoming an active contributor is the best possible way to address shortcomings that impact your organization. Thanks for your help!

E14

On Mar 10, 2008, at 8:43 PM, Pete Wyckoff wrote:

> +1 (obviously :))
Re: Does Hadoop Honor Reserved Space?
+1 (obviously :))

On 3/10/08 5:26 PM, "Joydeep Sen Sarma" <[EMAIL PROTECTED]> wrote:

> I have left some comments behind on the jira.
RE: Does Hadoop Honor Reserved Space?
I have left some comments behind on the jira.

We could argue over what's the right thing to do (and we will on the Jira) - but the higher-level problem is that this is another case where backwards compatibility with the existing semantics of this option was not carried over. Neither was there any notification to admins about this change. The change notes just do not convey the import of this change to existing deployments (incidentally, 1463 was classified as 'Bug Fix' - not that putting it under 'Incompatible Fix' would have helped, imho).

I would request the board/committers to consider setting up something along the lines of:

1. Something better than Change Notes to convey interface changes.
2. A field in the JIRA that marks an issue as important from an interface-change point of view (with notes on what's changing). This could be used to auto-populate #1.
3. Some way of auto-subscribing to bugs that cause interface changes (even an email filter on the jira mails would do).

As the Hadoop user base keeps growing - and gets used for 'production' tasks - I think it's absolutely essential that users/admins can keep in tune with changes that affect their deployments. Otherwise, any organization other than Yahoo would have a tough time upgrading.

(I am new to open source - but surely this has been solved before?)

Joydeep
Re: Does Hadoop Honor Reserved Space?
I think you have a misunderstanding of the reserved parameter. As I commented on hadoop-1463, remember that dfs.du.reserve is the space for non-dfs usage, including the space for map/reduce, other applications, fs metadata, etc. In your case, since /usr already takes 45GB, it far exceeds the reserved limit of 1G. You should set the reserved space to 50G.

Hairong
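The disagreement in this thread comes down to two readings of the reserved parameter. The sketch below is a toy model of those two readings, not the actual DataNode code, and the 100 GB capacity in the example is made up for illustration; only the 45GB /usr and 1G/50G reserve figures come from Hairong's message.

```python
def dfs_available_floor(capacity, dfs_used, non_dfs_used, reserved):
    """Reading A (what several admins here expected): reserved is a hard
    floor of free space on the volume; DFS may keep writing only while
    total free space stays above it."""
    free = capacity - dfs_used - non_dfs_used
    return max(0, free - reserved)

def dfs_available_budget(capacity, dfs_used, non_dfs_used, reserved):
    """Reading B (Hairong's description): reserved is a budget set aside
    for non-DFS usage; DFS's allowance is capacity - reserved, regardless
    of how much non-DFS data actually exists."""
    return max(0, (capacity - reserved) - dfs_used)

# Hairong's example, on an assumed 100 GB volume: /usr already takes 45 GB
# and reserved is only 1 GB.
cap, dfs, non_dfs, res = 100, 0, 45, 1
print(dfs_available_floor(cap, dfs, non_dfs, res))   # 54: stops at 1 GB free
print(dfs_available_budget(cap, dfs, non_dfs, res))  # 99: DFS may fill the disk

# Her advice follows from reading B: raising reserved to 50 keeps room
# for the existing 45 GB of non-DFS data plus some growth.
print(dfs_available_budget(cap, dfs, non_dfs, 50))   # 50
```

Under reading B, a 1G reserve on a volume whose non-DFS usage is already 45GB leaves DFS free to fill the disk, which is exactly the behavior being reported.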
RE: Does Hadoop Honor Reserved Space?
Filed https://issues.apache.org/jira/browse/HADOOP-2991
RE: Does Hadoop Honor Reserved Space?
folks - Jimmy is right - as we have unfortunately hit it as well:

https://issues.apache.org/jira/browse/HADOOP-1463 caused a regression. we have left some comments on the bug - but can't reopen it.

this is going to be affecting all 0.15 and 0.16 deployments!
Re: Does Hadoop Honor Reserved Space?
Unfortunately, I had to clean up my HDFS in order to get some work done, but I was running Hadoop 0.16.0 on a Linux box. My configuration is two machines: one has the JobTracker, the NameNode, and a TaskTracker instance all running on the same machine; the other machine is just running a TaskTracker. Replication was set to 2 for both the default and the max.

-- Jimmy
Re: Does Hadoop Honor Reserved Space?
In addition to the version, could you please send us a copy of the datanode report by running the command bin/hadoop dfsadmin -report?

Thanks,
Hairong
RE: Does Hadoop Honor Reserved Space?
but intermediate data is stored in a different directory from dfs/data (something like mapred/local by default, i think).

what version are you running?
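For reference, the two directories mentioned here both default to siblings under hadoop.tmp.dir. The values below are my recollection of the shipped defaults for the 0.15/0.16 era; double-check them against your own hadoop-default.xml:

```xml
<!-- Assumed defaults: HDFS block storage and map/reduce intermediate
     data live in sibling directories under hadoop.tmp.dir, so they
     share the same local volume unless overridden. -->
<property>
  <name>dfs.data.dir</name>
  <value>${hadoop.tmp.dir}/dfs/data</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <value>${hadoop.tmp.dir}/mapred/local</value>
</property>
```

Because both trees sit on the same volume by default, a reserve that only one subsystem honors cannot protect the other's space.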
RE: Does Hadoop Honor Reserved Space?
I've run into a similar issue in the past. From what I understand, this parameter only controls the HDFS space usage. However, the intermediate data in the map-reduce job is stored on the local file system (not HDFS) and is not subject to this configuration.

In the past I have used mapred.local.dir.minspacekill and mapred.local.dir.minspacestart to control the amount of space that is allowable for use by this temporary data. Not sure if that is the best approach though, so I'd love to hear what other people have done.

In your case, you have a map-red job that will consume too much space (without setting a limit, you didn't have enough disk capacity for the job), so looking at mapred.output.compress and mapred.compress.map.output might be useful to decrease the job's disk requirements.

--Ash
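A sketch of what Ash's suggestion might look like in hadoop-site.xml. The property names are the ones named in the message; the values and the exact semantics described in the comments are illustrative assumptions, not recommendations:

```xml
<!-- Illustrative values (bytes), not recommendations. As I understand
     these knobs: stop accepting new tasks when mapred.local.dir free
     space falls below minspacestart, and kill running tasks below
     minspacekill. Verify semantics against your hadoop-default.xml. -->
<property>
  <name>mapred.local.dir.minspacestart</name>
  <value>10737418240</value>   <!-- ~10 GiB -->
</property>
<property>
  <name>mapred.local.dir.minspacekill</name>
  <value>1073741824</value>    <!-- ~1 GiB -->
</property>

<!-- Compress intermediate map output and job output to shrink the
     job's footprint on local disk and HDFS respectively. -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
```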
Does Hadoop Honor Reserved Space?
I've got 2 datanodes set up with the following configuration parameter:

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>429496729600</value>
  <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.</description>
</property>

Both are housed on 800GB volumes, so I thought this would keep about half the volume free for non-HDFS usage.

After some long-running jobs last night, both disk volumes were completely filled. The bulk of the data was in:
${my.hadoop.tmp.dir}/hadoop-hadoop/dfs/data

This is running as the user hadoop.

Am I interpreting these parameters incorrectly?

I noticed this issue, but it is marked as closed:
http://issues.apache.org/jira/browse/HADOOP-2549

-- Jimmy
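Jimmy's arithmetic checks out. A quick sanity check on the value (the 800 GB capacity is taken as decimal gigabytes, which is an assumption about how the volume size was quoted):

```python
reserved = 429_496_729_600   # dfs.datanode.du.reserved from the email
GiB = 2**30

# The value is exactly 400 * 2^30 bytes, i.e. 400 GiB.
print(reserved / GiB)                 # 400.0

# On an 800 GB (decimal) volume that reserves a bit more than half,
# so if the setting were honored as a free-space floor, roughly half
# of each volume should have stayed free.
capacity = 800 * 10**9
print(round(reserved / capacity, 3))  # 0.537
```

So the volumes filling completely is not a misreading of the number itself; the question is which semantics the datanode actually applies to it.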