I have left some comments on the JIRA.

We could argue over what's the right thing to do (and we will, on the
JIRA) - but the higher-level problem is that this is another case where
backwards compatibility with the existing semantics of this option was
not carried over. Nor was there any notification to admins about this
change. The change notes simply do not convey the import of this change
for existing deployments (incidentally, HADOOP-1463 was classified as a
'Bug Fix' - not that filing it under 'Incompatible Fix' would have
helped, imho).

Would request the board/committers to consider setting up something
along the lines of:

1. Have something better than Change Notes to convey interface changes.
2. A field in the JIRA that marks an issue out as important from an
interface-change point of view (with notes on what's changing). This
could be used to auto-populate #1.
3. Some way of auto-subscribing to bugs that cause interface changes
(even an email filter on the JIRA mails would do).

As the Hadoop user base keeps growing - and Hadoop gets used for
'production' tasks - I think it's absolutely essential that users/admins
can keep in tune with changes that affect their deployments. Otherwise,
any organization other than Yahoo would have a tough time upgrading.

(I am new to open source - but surely this problem has been solved before?)

Joydeep

-----Original Message-----
From: Hairong Kuang [mailto:[EMAIL PROTECTED] 
Sent: Monday, March 10, 2008 5:17 PM
To: core-user@hadoop.apache.org
Subject: Re: Does Hadoop Honor Reserved Space?

I think you have a misunderstanding of the reserved parameter. As I
commented on HADOOP-1463, remember that dfs.datanode.du.reserved is the
space for non-dfs usage, including the space for map/reduce, other
applications, fs meta-data, etc. In your case, since /usr already takes
45GB, it far exceeds the reserved limit of 1G. You should set the
reserved space to 50G.
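
A sketch of what that might look like in hadoop-site.xml (the 50G
figure comes from the estimate above; the value must be in bytes):

<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- 50 GB = 50 * 1024^3 bytes, covering all non-dfs usage -->
  <value>53687091200</value>
</property>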

Hairong


On 3/10/08 4:54 PM, "Joydeep Sen Sarma" <[EMAIL PROTECTED]> wrote:

> Filed https://issues.apache.org/jira/browse/HADOOP-2991
> 
> -----Original Message-----
> From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 10, 2008 12:56 PM
> To: core-user@hadoop.apache.org; core-user@hadoop.apache.org
> Cc: Pete Wyckoff
> Subject: RE: Does Hadoop Honor Reserved Space?
> 
> folks - Jimmy is right - as we have unfortunately hit it as well:
> 
> https://issues.apache.org/jira/browse/HADOOP-1463 caused a regression.
> we have left some comments on the bug - but can't reopen it.
> 
> this is going to affect all 0.15 and 0.16 deployments!
> 
> 
> -----Original Message-----
> From: Hairong Kuang [mailto:[EMAIL PROTECTED]
> Sent: Thu 3/6/2008 2:01 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Does Hadoop Honor Reserved Space?
>  
> In addition to the version, could you please send us a copy of the
> datanode report by running the command bin/hadoop dfsadmin -report?
> 
> Thanks,
> Hairong
> 
> 
> On 3/6/08 11:56 AM, "Joydeep Sen Sarma" <[EMAIL PROTECTED]> wrote:
> 
>> but intermediate data is stored in a different directory from dfs/data
>> (something like mapred/local by default, I think).
>> 
>> what version are you running?
>> 
>> 
>> -----Original Message-----
>> From: Ashwinder Ahluwalia on behalf of [EMAIL PROTECTED]
>> Sent: Thu 3/6/2008 10:14 AM
>> To: core-user@hadoop.apache.org
>> Subject: RE: Does Hadoop Honor Reserved Space?
>>  
>> I've run into a similar issue in the past. From what I understand, this
>> parameter only controls the HDFS space usage. However, the intermediate
>> data in the map reduce job is stored on the local file system (not HDFS)
>> and is not subject to this configuration.
>> 
>> In the past I have used mapred.local.dir.minspacekill and
>> mapred.local.dir.minspacestart to control the amount of space that is
>> allowable for use by this temporary data.
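>> 
>> A sketch of how those might be set in hadoop-site.xml (the 10 GB and
>> 1 GB thresholds below are illustrative, not recommendations):
>> 
>> <property>
>>   <name>mapred.local.dir.minspacestart</name>
>>   <!-- don't accept new tasks if free local space drops below 10 GB -->
>>   <value>10737418240</value>
>> </property>
>> <property>
>>   <name>mapred.local.dir.minspacekill</name>
>>   <!-- below 1 GB free, stop scheduling and start killing tasks -->
>>   <value>1073741824</value>
>> </property>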
>> 
>> Not sure if that is the best approach though, so I'd love to hear what
>> other people have done. In your case, you have a map-red job that will
>> consume too much space (without setting a limit, you didn't have enough
>> disk capacity for the job), so looking at mapred.output.compress and
>> mapred.compress.map.output might be useful to decrease the job's disk
>> requirements.
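>> 
>> For instance, a minimal sketch enabling both (assuming the default
>> compression codec is acceptable):
>> 
>> <property>
>>   <name>mapred.compress.map.output</name>
>>   <!-- compress intermediate map output spilled to local disk -->
>>   <value>true</value>
>> </property>
>> <property>
>>   <name>mapred.output.compress</name>
>>   <!-- compress the final job output written to DFS -->
>>   <value>true</value>
>> </property>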
>> 
>> --Ash
>> 
>> -----Original Message-----
>> From: Jimmy Wan [mailto:[EMAIL PROTECTED]
>> Sent: Thursday, March 06, 2008 9:56 AM
>> To: core-user@hadoop.apache.org
>> Subject: Does Hadoop Honor Reserved Space?
>> 
>> I've got 2 datanodes set up with the following configuration parameter:
>> <property>
>>   <name>dfs.datanode.du.reserved</name>
>>   <value>429496729600</value>
>>   <description>Reserved space in bytes per volume. Always leave this
>>   much space free for non dfs use.
>>   </description>
>> </property>
>> 
>> Both are housed on 800GB volumes, so I thought this would keep about
>> half the volume free for non-HDFS usage.
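>> (For reference: 429496729600 bytes = 400 * 2^30, i.e. 400 GiB
>> reserved per volume.)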
>> 
>> After some long running jobs last night, both disk volumes were
>> completely filled. The bulk of the data was in:
>> ${my.hadoop.tmp.dir}/hadoop-hadoop/dfs/data
>> 
>> This is running as the user hadoop.
>> 
>> Am I interpreting these parameters incorrectly?
>> 
>> I noticed this issue, but it is marked as closed:
>> http://issues.apache.org/jira/browse/HADOOP-2549
> 
> 
> 
