Re: Does Hadoop Honor Reserved Space?

2008-03-12 Thread Eric Baldeschwieler

Hi Pete, Joydeep,

These sound like thoughts that could lead to excellent suggestions
with a little more investment of your time.

We'd love it if you could invest some effort into contributing to the
release process! Hadoop is open source, and becoming active
contributors is the best possible way to address shortcomings that
impact your organization.


Thanks for your help!

E14



On Mar 10, 2008, at 8:43 PM, Pete Wyckoff wrote:



+1

(obviously :))


On 3/10/08 5:26 PM, "Joydeep Sen Sarma" <[EMAIL PROTECTED]> wrote:


I have left some comments behind on the jira.

We could argue over what's the right thing to do (and we will on the
Jira) - but the higher-level problem is that this is another case where
backwards compatibility with the existing semantics of this option was not
carried over. Neither was there any notification to admins about this
change. The change notes just do not convey the import of this change to
existing deployments (incidentally, 1463 was classified as 'Bug Fix' -
not that putting it under 'Incompatible Fix' would have helped, imho).

Would request the board/committers to consider setting up something
along the lines of:

1. have something better than Change Notes to convey interface changes
2. a field in the JIRA that marks an issue as important from an
interface-change point of view (with notes on what's changing). This
could be used to auto-populate #1
3. some way of auto-subscribing to bugs that cause interface changes
(even an email filter on the jira mails would do).

As the Hadoop user base keeps growing - and gets used for 'production'
tasks - I think it's absolutely essential that users/admins can keep in
tune with changes that affect their deployments. Otherwise, any
organization other than Yahoo would have a tough time upgrading.

(I am new to open source - but surely this has been solved before?)

Joydeep

-----Original Message-----
From: Hairong Kuang [mailto:[EMAIL PROTECTED]
Sent: Monday, March 10, 2008 5:17 PM
To: core-user@hadoop.apache.org
Subject: Re: Does Hadoop Honor Reserved Space?

I think you have a misunderstanding of the reserved parameter. As I
commented on hadoop-1463, remember that dfs.du.reserve is the space for
non-dfs usage, including the space for map/reduce, other applications, fs
meta-data, etc. In your case, since /usr already takes 45GB, it far exceeds
the 1GB reserved limit. You should set the reserved space to be 50G.

Hairong


On 3/10/08 4:54 PM, "Joydeep Sen Sarma" <[EMAIL PROTECTED]> wrote:

Filed https://issues.apache.org/jira/browse/HADOOP-2991

-----Original Message-----
From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
Sent: Monday, March 10, 2008 12:56 PM
To: core-user@hadoop.apache.org; core-user@hadoop.apache.org
Cc: Pete Wyckoff
Subject: RE: Does Hadoop Honor Reserved Space?

folks - Jimmy is right - as we have unfortunately hit it as well:

https://issues.apache.org/jira/browse/HADOOP-1463 caused a regression.
we have left some comments on the bug - but can't reopen it.

this is going to affect all 0.15 and 0.16 deployments!


-----Original Message-----
From: Hairong Kuang [mailto:[EMAIL PROTECTED]
Sent: Thu 3/6/2008 2:01 PM
To: core-user@hadoop.apache.org
Subject: Re: Does Hadoop Honor Reserved Space?

In addition to the version, could you please send us a copy of the
datanode report by running the command bin/hadoop dfsadmin -report?

Thanks,
Hairong


On 3/6/08 11:56 AM, "Joydeep Sen Sarma" <[EMAIL PROTECTED]> wrote:

but intermediate data is stored in a different directory from dfs/data
(something like mapred/local by default i think).

what version are u running?


-----Original Message-----
From: Ashwinder Ahluwalia on behalf of [EMAIL PROTECTED]
Sent: Thu 3/6/2008 10:14 AM
To: core-user@hadoop.apache.org
Subject: RE: Does Hadoop Honor Reserved Space?

I've run into a similar issue in the past. From what I understand, this
parameter only controls the HDFS space usage. However, the intermediate
data in the map reduce job is stored on the local file system (not HDFS)
and is not subject to this configuration.

In the past I have used mapred.local.dir.minspacekill and
mapred.local.dir.minspacestart to control the amount of space that is
allowable for use by this temporary data.

Not sure if that is the best approach though, so I'd love to hear what
other people have done. In your case, you have a map-red job that will
consume too much space (without setting a limit, you didn't have enough
disk capacity for the job), so looking at mapred.output.compress and
mapred.compress.map.output might be useful to decrease the job's disk
requirements.

--Ash

-----Original Message-----
From: Jimmy Wan [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 06, 2008 9:56 AM
To: core-user@hadoop.apache.org
Subject: Does Hadoop Honor Reserved Space?

I've got 2 datanodes set up with the following configuration parameter:

Re: Does Hadoop Honor Reserved Space?

2008-03-10 Thread Pete Wyckoff

+1

(obviously :))


On 3/10/08 5:26 PM, "Joydeep Sen Sarma" <[EMAIL PROTECTED]> wrote:

> I have left some comments behind on the jira.
> 
> We could argue over what's the right thing to do (and we will on the
> Jira) - but the higher-level problem is that this is another case where
> backwards compatibility with the existing semantics of this option was not
> carried over. Neither was there any notification to admins about this
> change. The change notes just do not convey the import of this change to
> existing deployments (incidentally, 1463 was classified as 'Bug Fix' -
> not that putting it under 'Incompatible Fix' would have helped, imho).
> 
> Would request the board/committers to consider setting up something
> along the lines of:
> 
> 1. have something better than Change Notes to convey interface changes
> 2. a field in the JIRA that marks an issue as important from an
> interface-change point of view (with notes on what's changing). This
> could be used to auto-populate #1
> 3. some way of auto-subscribing to bugs that cause interface changes
> (even an email filter on the jira mails would do).
> 
> As the Hadoop user base keeps growing - and gets used for 'production'
> tasks - I think it's absolutely essential that users/admins can keep in
> tune with changes that affect their deployments. Otherwise, any
> organization other than Yahoo would have a tough time upgrading.
> 
> (I am new to open source - but surely this has been solved before?)
> 
> Joydeep
> 
> -----Original Message-----
> From: Hairong Kuang [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 10, 2008 5:17 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Does Hadoop Honor Reserved Space?
> 
> I think you have a misunderstanding of the reserved parameter. As I
> commented on hadoop-1463, remember that dfs.du.reserve is the space for
> non-dfs usage, including the space for map/reduce, other applications, fs
> meta-data, etc. In your case, since /usr already takes 45GB, it far exceeds
> the 1GB reserved limit. You should set the reserved space to be 50G.
> 
> Hairong
> 
> 
> On 3/10/08 4:54 PM, "Joydeep Sen Sarma" <[EMAIL PROTECTED]> wrote:
> 
>> Filed https://issues.apache.org/jira/browse/HADOOP-2991
>> 
>> -----Original Message-----
>> From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
>> Sent: Monday, March 10, 2008 12:56 PM
>> To: core-user@hadoop.apache.org; core-user@hadoop.apache.org
>> Cc: Pete Wyckoff
>> Subject: RE: Does Hadoop Honor Reserved Space?
>> 
>> folks - Jimmy is right - as we have unfortunately hit it as well:
>> 
>> https://issues.apache.org/jira/browse/HADOOP-1463 caused a regression.
>> we have left some comments on the bug - but can't reopen it.
>> 
>> this is going to affect all 0.15 and 0.16 deployments!
>> 
>> 
>> -----Original Message-----
>> From: Hairong Kuang [mailto:[EMAIL PROTECTED]
>> Sent: Thu 3/6/2008 2:01 PM
>> To: core-user@hadoop.apache.org
>> Subject: Re: Does Hadoop Honor Reserved Space?
>> 
>> In addition to the version, could you please send us a copy of the
>> datanode report by running the command bin/hadoop dfsadmin -report?
>> 
>> Thanks,
>> Hairong
>> 
>> 
>> On 3/6/08 11:56 AM, "Joydeep Sen Sarma" <[EMAIL PROTECTED]> wrote:
>> 
>>> but intermediate data is stored in a different directory from dfs/data
>>> (something like mapred/local by default i think).
>>> 
>>> what version are u running?
>>> 
>>> 
>>> -----Original Message-----
>>> From: Ashwinder Ahluwalia on behalf of [EMAIL PROTECTED]
>>> Sent: Thu 3/6/2008 10:14 AM
>>> To: core-user@hadoop.apache.org
>>> Subject: RE: Does Hadoop Honor Reserved Space?
>>> 
>>> I've run into a similar issue in the past. From what I understand, this
>>> parameter only controls the HDFS space usage. However, the intermediate
>>> data in the map reduce job is stored on the local file system (not HDFS)
>>> and is not subject to this configuration.
>>> 
>>> In the past I have used mapred.local.dir.minspacekill and
>>> mapred.local.dir.minspacestart to control the amount of space that is
>>> allowable for use by this temporary data.
>>> 
>>> Not sure if that is the best approach though, so I'd love to hear what
>>> other people have done. In your case, you have a map-red job that will
>>> consume too much space (without setting a limit, you didn't have enough
>>> disk capacity for the job), so looking at mapred.output.compress and
>>> mapred.compress.map.output might be useful to decrease the job's disk
>>> requirements.

RE: Does Hadoop Honor Reserved Space?

2008-03-10 Thread Joydeep Sen Sarma
I have left some comments behind on the jira.

We could argue over what's the right thing to do (and we will on the
Jira) - but the higher-level problem is that this is another case where
backwards compatibility with the existing semantics of this option was not
carried over. Neither was there any notification to admins about this
change. The change notes just do not convey the import of this change to
existing deployments (incidentally, 1463 was classified as 'Bug Fix' -
not that putting it under 'Incompatible Fix' would have helped, imho).

Would request the board/committers to consider setting up something
along the lines of:

1. have something better than Change Notes to convey interface changes
2. a field in the JIRA that marks an issue as important from an
interface-change point of view (with notes on what's changing). This
could be used to auto-populate #1
3. some way of auto-subscribing to bugs that cause interface changes
(even an email filter on the jira mails would do).

As the Hadoop user base keeps growing - and gets used for 'production'
tasks - I think it's absolutely essential that users/admins can keep in
tune with changes that affect their deployments. Otherwise, any
organization other than Yahoo would have a tough time upgrading.

(I am new to open source - but surely this has been solved before?)

Joydeep

-----Original Message-----
From: Hairong Kuang [mailto:[EMAIL PROTECTED] 
Sent: Monday, March 10, 2008 5:17 PM
To: core-user@hadoop.apache.org
Subject: Re: Does Hadoop Honor Reserved Space?

I think you have a misunderstanding of the reserved parameter. As I
commented on hadoop-1463, remember that dfs.du.reserve is the space for
non-dfs usage, including the space for map/reduce, other applications, fs
meta-data, etc. In your case, since /usr already takes 45GB, it far exceeds
the 1GB reserved limit. You should set the reserved space to be 50G.

Hairong


On 3/10/08 4:54 PM, "Joydeep Sen Sarma" <[EMAIL PROTECTED]> wrote:

> Filed https://issues.apache.org/jira/browse/HADOOP-2991
> 
> -----Original Message-----
> From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 10, 2008 12:56 PM
> To: core-user@hadoop.apache.org; core-user@hadoop.apache.org
> Cc: Pete Wyckoff
> Subject: RE: Does Hadoop Honor Reserved Space?
> 
> folks - Jimmy is right - as we have unfortunately hit it as well:
> 
> https://issues.apache.org/jira/browse/HADOOP-1463 caused a regression.
> we have left some comments on the bug - but can't reopen it.
> 
> this is going to affect all 0.15 and 0.16 deployments!
> 
> 
> -----Original Message-----
> From: Hairong Kuang [mailto:[EMAIL PROTECTED]
> Sent: Thu 3/6/2008 2:01 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Does Hadoop Honor Reserved Space?
>  
> In addition to the version, could you please send us a copy of the
> datanode report by running the command bin/hadoop dfsadmin -report?
> 
> Thanks,
> Hairong
> 
> 
> On 3/6/08 11:56 AM, "Joydeep Sen Sarma" <[EMAIL PROTECTED]> wrote:
> 
>> but intermediate data is stored in a different directory from dfs/data
>> (something like mapred/local by default i think).
>> 
>> what version are u running?
>> 
>> 
>> -----Original Message-----
>> From: Ashwinder Ahluwalia on behalf of [EMAIL PROTECTED]
>> Sent: Thu 3/6/2008 10:14 AM
>> To: core-user@hadoop.apache.org
>> Subject: RE: Does Hadoop Honor Reserved Space?
>>  
>> I've run into a similar issue in the past. From what I understand, this
>> parameter only controls the HDFS space usage. However, the intermediate
>> data in the map reduce job is stored on the local file system (not HDFS)
>> and is not subject to this configuration.
>> 
>> In the past I have used mapred.local.dir.minspacekill and
>> mapred.local.dir.minspacestart to control the amount of space that is
>> allowable
>> for use by this temporary data.
>> 
>> Not sure if that is the best approach though, so I'd love to hear what
>> other people have done. In your case, you have a map-red job that will
>> consume too much space (without setting a limit, you didn't have enough
>> disk capacity for the job), so looking at mapred.output.compress and
>> mapred.compress.map.output might be useful to decrease the job's disk
>> requirements.
>> 
>> --Ash
>> 
>> -Original Message-
>> From: Jimmy Wan [mailto:[EMAIL PROTECTED]
>> Sent: Thursday, March 06, 2008 9:56 AM
>> To: core-user@hadoop.apache.org
>> Subject: Does Hadoop Honor Reserved Space?
>> 
>> I've got 2 datanodes set up with the following configuration parameter:
>> 
>>

Re: Does Hadoop Honor Reserved Space?

2008-03-10 Thread Hairong Kuang
I think you have a misunderstanding of the reserved parameter. As I
commented on hadoop-1463, remember that dfs.du.reserve is the space for
non-dfs usage, including the space for map/reduce, other applications, fs
meta-data, etc. In your case, since /usr already takes 45GB, it far exceeds
the 1GB reserved limit. You should set the reserved space to be 50G.

Hairong
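
For reference, a minimal hadoop-site.xml sketch of this suggestion -
reserving 50GB per volume for non-dfs use - might look like the following
(the value is illustrative: 50 x 1024^3 = 53687091200 bytes):

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>53687091200</value>
  <description>Reserved space in bytes per volume. Always leave this much
  space free for non dfs use.</description>
</property>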


On 3/10/08 4:54 PM, "Joydeep Sen Sarma" <[EMAIL PROTECTED]> wrote:

> Filed https://issues.apache.org/jira/browse/HADOOP-2991
> 
> -----Original Message-----
> From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
> Sent: Monday, March 10, 2008 12:56 PM
> To: core-user@hadoop.apache.org; core-user@hadoop.apache.org
> Cc: Pete Wyckoff
> Subject: RE: Does Hadoop Honor Reserved Space?
> 
> folks - Jimmy is right - as we have unfortunately hit it as well:
> 
> https://issues.apache.org/jira/browse/HADOOP-1463 caused a regression.
> we have left some comments on the bug - but can't reopen it.
> 
> this is going to affect all 0.15 and 0.16 deployments!
> 
> 
> -----Original Message-----
> From: Hairong Kuang [mailto:[EMAIL PROTECTED]
> Sent: Thu 3/6/2008 2:01 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Does Hadoop Honor Reserved Space?
>  
> In addition to the version, could you please send us a copy of the
> datanode report by running the command bin/hadoop dfsadmin -report?
> 
> Thanks,
> Hairong
> 
> 
> On 3/6/08 11:56 AM, "Joydeep Sen Sarma" <[EMAIL PROTECTED]> wrote:
> 
>> but intermediate data is stored in a different directory from dfs/data
>> (something like mapred/local by default i think).
>> 
>> what version are u running?
>> 
>> 
>> -----Original Message-----
>> From: Ashwinder Ahluwalia on behalf of [EMAIL PROTECTED]
>> Sent: Thu 3/6/2008 10:14 AM
>> To: core-user@hadoop.apache.org
>> Subject: RE: Does Hadoop Honor Reserved Space?
>>  
>> I've run into a similar issue in the past. From what I understand, this
>> parameter only controls the HDFS space usage. However, the intermediate
>> data in the map reduce job is stored on the local file system (not HDFS)
>> and is not subject to this configuration.
>> 
>> In the past I have used mapred.local.dir.minspacekill and
>> mapred.local.dir.minspacestart to control the amount of space that is
>> allowable
>> for use by this temporary data.
>> 
>> Not sure if that is the best approach though, so I'd love to hear what
>> other people have done. In your case, you have a map-red job that will
>> consume too much space (without setting a limit, you didn't have enough
>> disk capacity for the job), so looking at mapred.output.compress and
>> mapred.compress.map.output might be useful to decrease the job's disk
>> requirements.
>> 
>> --Ash
>> 
>> -----Original Message-----
>> From: Jimmy Wan [mailto:[EMAIL PROTECTED]
>> Sent: Thursday, March 06, 2008 9:56 AM
>> To: core-user@hadoop.apache.org
>> Subject: Does Hadoop Honor Reserved Space?
>> 
>> I've got 2 datanodes set up with the following configuration parameter:
>> 
>> <property>
>>   <name>dfs.datanode.du.reserved</name>
>>   <value>429496729600</value>
>>   <description>Reserved space in bytes per volume. Always leave this much
>>   space free for non dfs use.</description>
>> </property>
>> 
>> 
>> Both are housed on 800GB volumes, so I thought this would keep about half
>> the volume free for non-HDFS usage.
>> 
>> After some long running jobs last night, both disk volumes were completely
>> filled. The bulk of the data was in:
>> ${my.hadoop.tmp.dir}/hadoop-hadoop/dfs/data
>> 
>> This is running as the user hadoop.
>> 
>> Am I interpreting these parameters incorrectly?
>> 
>> I noticed this issue, but it is marked as closed:
>> http://issues.apache.org/jira/browse/HADOOP-2549
> 
> 
> 



RE: Does Hadoop Honor Reserved Space?

2008-03-10 Thread Joydeep Sen Sarma
Filed https://issues.apache.org/jira/browse/HADOOP-2991

-----Original Message-----
From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED] 
Sent: Monday, March 10, 2008 12:56 PM
To: core-user@hadoop.apache.org; core-user@hadoop.apache.org
Cc: Pete Wyckoff
Subject: RE: Does Hadoop Honor Reserved Space?

folks - Jimmy is right - as we have unfortunately hit it as well:

https://issues.apache.org/jira/browse/HADOOP-1463 caused a regression.
we have left some comments on the bug - but can't reopen it.

this is going to affect all 0.15 and 0.16 deployments!


-----Original Message-----
From: Hairong Kuang [mailto:[EMAIL PROTECTED]
Sent: Thu 3/6/2008 2:01 PM
To: core-user@hadoop.apache.org
Subject: Re: Does Hadoop Honor Reserved Space?
 
In addition to the version, could you please send us a copy of the datanode
report by running the command bin/hadoop dfsadmin -report?

Thanks,
Hairong


On 3/6/08 11:56 AM, "Joydeep Sen Sarma" <[EMAIL PROTECTED]> wrote:

> but intermediate data is stored in a different directory from dfs/data
> (something like mapred/local by default i think).
> 
> what version are u running?
> 
> 
> -----Original Message-----
> From: Ashwinder Ahluwalia on behalf of [EMAIL PROTECTED]
> Sent: Thu 3/6/2008 10:14 AM
> To: core-user@hadoop.apache.org
> Subject: RE: Does Hadoop Honor Reserved Space?
>  
> I've run into a similar issue in the past. From what I understand, this
> parameter only controls the HDFS space usage. However, the intermediate
> data in the map reduce job is stored on the local file system (not HDFS)
> and is not subject to this configuration.
> 
> In the past I have used mapred.local.dir.minspacekill and
> mapred.local.dir.minspacestart to control the amount of space that is
> allowable
> for use by this temporary data.
> 
> Not sure if that is the best approach though, so I'd love to hear what
> other people have done. In your case, you have a map-red job that will
> consume too much space (without setting a limit, you didn't have enough
> disk capacity for the job), so looking at mapred.output.compress and
> mapred.compress.map.output might be useful to decrease the job's disk
> requirements.
> 
> --Ash
> 
> -----Original Message-----
> From: Jimmy Wan [mailto:[EMAIL PROTECTED]
> Sent: Thursday, March 06, 2008 9:56 AM
> To: core-user@hadoop.apache.org
> Subject: Does Hadoop Honor Reserved Space?
> 
> I've got 2 datanodes set up with the following configuration parameter:
> 
> <property>
>   <name>dfs.datanode.du.reserved</name>
>   <value>429496729600</value>
>   <description>Reserved space in bytes per volume. Always leave this much
>   space free for non dfs use.</description>
> </property>
> 
> 
> Both are housed on 800GB volumes, so I thought this would keep about half
> the volume free for non-HDFS usage.
> 
> After some long running jobs last night, both disk volumes were completely
> filled. The bulk of the data was in:
> ${my.hadoop.tmp.dir}/hadoop-hadoop/dfs/data
> 
> This is running as the user hadoop.
> 
> Am I interpreting these parameters incorrectly?
> 
> I noticed this issue, but it is marked as closed:
> http://issues.apache.org/jira/browse/HADOOP-2549





RE: Does Hadoop Honor Reserved Space?

2008-03-10 Thread Joydeep Sen Sarma
folks - Jimmy is right - as we have unfortunately hit it as well:

https://issues.apache.org/jira/browse/HADOOP-1463 caused a regression. we have 
left some comments on the bug - but can't reopen it.

this is going to affect all 0.15 and 0.16 deployments!


-----Original Message-----
From: Hairong Kuang [mailto:[EMAIL PROTECTED]
Sent: Thu 3/6/2008 2:01 PM
To: core-user@hadoop.apache.org
Subject: Re: Does Hadoop Honor Reserved Space?
 
In addition to the version, could you please send us a copy of the datanode
report by running the command bin/hadoop dfsadmin -report?

Thanks,
Hairong


On 3/6/08 11:56 AM, "Joydeep Sen Sarma" <[EMAIL PROTECTED]> wrote:

> but intermediate data is stored in a different directory from dfs/data
> (something like mapred/local by default i think).
> 
> what version are u running?
> 
> 
> -----Original Message-----
> From: Ashwinder Ahluwalia on behalf of [EMAIL PROTECTED]
> Sent: Thu 3/6/2008 10:14 AM
> To: core-user@hadoop.apache.org
> Subject: RE: Does Hadoop Honor Reserved Space?
>  
> I've run into a similar issue in the past. From what I understand, this
> parameter only controls the HDFS space usage. However, the intermediate data
> in
> the map reduce job is stored on the local file system (not HDFS) and is not
> subject to this configuration.
> 
> In the past I have used mapred.local.dir.minspacekill and
> mapred.local.dir.minspacestart to control the amount of space that is
> allowable
> for use by this temporary data.
> 
> Not sure if that is the best approach though, so I'd love to hear what other
> people have done. In your case, you have a map-red job that will consume too
> much space (without setting a limit, you didn't have enough disk capacity for
> the job), so looking at mapred.output.compress and mapred.compress.map.output
> might be useful to decrease the job's disk requirements.
> 
> --Ash
> 
> -----Original Message-----
> From: Jimmy Wan [mailto:[EMAIL PROTECTED]
> Sent: Thursday, March 06, 2008 9:56 AM
> To: core-user@hadoop.apache.org
> Subject: Does Hadoop Honor Reserved Space?
> 
> I've got 2 datanodes set up with the following configuration parameter:
> 
> <property>
>   <name>dfs.datanode.du.reserved</name>
>   <value>429496729600</value>
>   <description>Reserved space in bytes per volume. Always leave this much
>   space free for non dfs use.</description>
> </property>
> 
> 
> Both are housed on 800GB volumes, so I thought this would keep about half
> the volume free for non-HDFS usage.
> 
> After some long running jobs last night, both disk volumes were completely
> filled. The bulk of the data was in:
> ${my.hadoop.tmp.dir}/hadoop-hadoop/dfs/data
> 
> This is running as the user hadoop.
> 
> Am I interpreting these parameters incorrectly?
> 
> I noticed this issue, but it is marked as closed:
> http://issues.apache.org/jira/browse/HADOOP-2549





Re: Does Hadoop Honor Reserved Space?

2008-03-07 Thread Jimmy Wan
Unfortunately, I had to clean up my HDFS in order to get some work done,
but I was running Hadoop 0.16.0 on a Linux box. My configuration is two
machines: one has the JobTracker/NameNode and a TaskTracker instance all
running on the same machine; the other machine is just running a
TaskTracker.

Replication was set to 2 for the default and the max.
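
In hadoop-site.xml terms, that would be something like the sketch below;
the dfs.replication.max property name is an assumption worth verifying
against hadoop-default.xml for this version:

<property>
  <name>dfs.replication</name>
  <value>2</value>  <!-- default replication factor -->
</property>
<property>
  <name>dfs.replication.max</name>
  <value>2</value>  <!-- assumed name for the maximal replication cap -->
</property>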

--
Jimmy

On Thu, 06 Mar 2008 16:01:16 -0600, Hairong Kuang <[EMAIL PROTECTED]>  
wrote:


In addition to the version, could you please send us a copy of the datanode
report by running the command bin/hadoop dfsadmin -report?

Thanks,
Hairong


On 3/6/08 11:56 AM, "Joydeep Sen Sarma" <[EMAIL PROTECTED]> wrote:


but intermediate data is stored in a different directory from dfs/data
(something like mapred/local by default i think).

what version are u running?


-----Original Message-----
From: Ashwinder Ahluwalia on behalf of [EMAIL PROTECTED]
Sent: Thu 3/6/2008 10:14 AM
To: core-user@hadoop.apache.org
Subject: RE: Does Hadoop Honor Reserved Space?

I've run into a similar issue in the past. From what I understand, this
parameter only controls the HDFS space usage. However, the intermediate
data in the map reduce job is stored on the local file system (not HDFS)
and is not subject to this configuration.

In the past I have used mapred.local.dir.minspacekill and
mapred.local.dir.minspacestart to control the amount of space that is
allowable for use by this temporary data.

Not sure if that is the best approach though, so I'd love to hear what
other people have done. In your case, you have a map-red job that will
consume too much space (without setting a limit, you didn't have enough
disk capacity for the job), so looking at mapred.output.compress and
mapred.compress.map.output might be useful to decrease the job's disk
requirements.

--Ash

-----Original Message-----
From: Jimmy Wan [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 06, 2008 9:56 AM
To: core-user@hadoop.apache.org
Subject: Does Hadoop Honor Reserved Space?

I've got 2 datanodes set up with the following configuration parameter:

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>429496729600</value>
  <description>Reserved space in bytes per volume. Always leave this much
  space free for non dfs use.</description>
</property>


Both are housed on 800GB volumes, so I thought this would keep about half
the volume free for non-HDFS usage.

After some long running jobs last night, both disk volumes were completely
filled. The bulk of the data was in:
${my.hadoop.tmp.dir}/hadoop-hadoop/dfs/data

This is running as the user hadoop.

Am I interpreting these parameters incorrectly?

I noticed this issue, but it is marked as closed:
http://issues.apache.org/jira/browse/HADOOP-2549


Re: Does Hadoop Honor Reserved Space?

2008-03-06 Thread Hairong Kuang
In addition to the version, could you please send us a copy of the datanode
report by running the command bin/hadoop dfsadmin -report?

Thanks,
Hairong
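
As a rough way to cross-check that report against the OS view (the data
directory path below is illustrative - use whatever dfs.data.dir points at):

bin/hadoop dfsadmin -report    # capacity/used/remaining as HDFS sees it
df -h /path/to/dfs/data        # free space as the local filesystem sees it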


On 3/6/08 11:56 AM, "Joydeep Sen Sarma" <[EMAIL PROTECTED]> wrote:

> but intermediate data is stored in a different directory from dfs/data
> (something like mapred/local by default i think).
> 
> what version are u running?
> 
> 
> -----Original Message-----
> From: Ashwinder Ahluwalia on behalf of [EMAIL PROTECTED]
> Sent: Thu 3/6/2008 10:14 AM
> To: core-user@hadoop.apache.org
> Subject: RE: Does Hadoop Honor Reserved Space?
>  
> I've run into a similar issue in the past. From what I understand, this
> parameter only controls the HDFS space usage. However, the intermediate data
> in
> the map reduce job is stored on the local file system (not HDFS) and is not
> subject to this configuration.
> 
> In the past I have used mapred.local.dir.minspacekill and
> mapred.local.dir.minspacestart to control the amount of space that is
> allowable
> for use by this temporary data.
> 
> Not sure if that is the best approach though, so I'd love to hear what other
> people have done. In your case, you have a map-red job that will consume too
> much space (without setting a limit, you didn't have enough disk capacity for
> the job), so looking at mapred.output.compress and mapred.compress.map.output
> might be useful to decrease the job's disk requirements.
> 
> --Ash
> 
> -----Original Message-----
> From: Jimmy Wan [mailto:[EMAIL PROTECTED]
> Sent: Thursday, March 06, 2008 9:56 AM
> To: core-user@hadoop.apache.org
> Subject: Does Hadoop Honor Reserved Space?
> 
> I've got 2 datanodes set up with the following configuration parameter:
> 
> <property>
>   <name>dfs.datanode.du.reserved</name>
>   <value>429496729600</value>
>   <description>Reserved space in bytes per volume. Always leave this much
>   space free for non dfs use.</description>
> </property>
> 
> 
> Both are housed on 800GB volumes, so I thought this would keep about half
> the volume free for non-HDFS usage.
> 
> After some long running jobs last night, both disk volumes were completely
> filled. The bulk of the data was in:
> ${my.hadoop.tmp.dir}/hadoop-hadoop/dfs/data
> 
> This is running as the user hadoop.
> 
> Am I interpreting these parameters incorrectly?
> 
> I noticed this issue, but it is marked as closed:
> http://issues.apache.org/jira/browse/HADOOP-2549



RE: Does Hadoop Honor Reserved Space?

2008-03-06 Thread Joydeep Sen Sarma
but intermediate data is stored in a different directory from dfs/data 
(something like mapred/local by default i think).

what version are u running? 
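
If memory serves, that local directory is governed by mapred.local.dir,
whose default points under hadoop.tmp.dir - roughly the sketch below (the
exact default may differ by version; treat it as an assumption to verify
against hadoop-default.xml):

<property>
  <name>mapred.local.dir</name>
  <value>${hadoop.tmp.dir}/mapred/local</value>
  <description>Local directory where map/reduce writes intermediate
  data.</description>
</property>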


-----Original Message-----
From: Ashwinder Ahluwalia on behalf of [EMAIL PROTECTED]
Sent: Thu 3/6/2008 10:14 AM
To: core-user@hadoop.apache.org
Subject: RE: Does Hadoop Honor Reserved Space?
 
I've run into a similar issue in the past. From what I understand, this
parameter only controls the HDFS space usage. However, the intermediate data in
the map reduce job is stored on the local file system (not HDFS) and is not
subject to this configuration.

In the past I have used mapred.local.dir.minspacekill and
mapred.local.dir.minspacestart to control the amount of space that is allowable
for use by this temporary data. 

Not sure if that is the best approach though, so I'd love to hear what other
people have done. In your case, you have a map-red job that will consume too
much space (without setting a limit, you didn't have enough disk capacity for
the job), so looking at mapred.output.compress and mapred.compress.map.output
might be useful to decrease the job's disk requirements.

--Ash

-----Original Message-----
From: Jimmy Wan [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 06, 2008 9:56 AM
To: core-user@hadoop.apache.org
Subject: Does Hadoop Honor Reserved Space?

I've got 2 datanodes set up with the following configuration parameter:

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>429496729600</value>
  <description>Reserved space in bytes per volume. Always leave this much
  space free for non dfs use.</description>
</property>


Both are housed on 800GB volumes, so I thought this would keep about half  
the volume free for non-HDFS usage.

After some long running jobs last night, both disk volumes were completely  
filled. The bulk of the data was in:
${my.hadoop.tmp.dir}/hadoop-hadoop/dfs/data

This is running as the user hadoop.

Am I interpreting these parameters incorrectly?

I noticed this issue, but it is marked as closed:  
http://issues.apache.org/jira/browse/HADOOP-2549

-- 
Jimmy




RE: Does Hadoop Honor Reserved Space?

2008-03-06 Thread ahluwalia5
I've run into a similar issue in the past. From what I understand, this
parameter only controls the HDFS space usage. However, the intermediate data in
the map reduce job is stored on the local file system (not HDFS) and is not
subject to this configuration.

In the past I have used mapred.local.dir.minspacekill and
mapred.local.dir.minspacestart to control the amount of space that is allowable
for use by this temporary data. 

Not sure if that is the best approach though, so I'd love to hear what other
people have done. In your case, you have a map-red job that will consume too
much space (without setting a limit, you didn't have enough disk capacity for
the job), so looking at mapred.output.compress and mapred.compress.map.output
might be useful to decrease the job's disk requirements.

--Ash
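
A hedged sketch of those knobs as they might appear in hadoop-site.xml; the
byte values are illustrative thresholds, not recommendations (roughly: below
minspacestart no new tasks are started on the node, below minspacekill
running tasks are killed to reclaim space):

<property>
  <name>mapred.local.dir.minspacestart</name>
  <value>10737418240</value>  <!-- illustrative: require ~10GB free to start tasks -->
</property>
<property>
  <name>mapred.local.dir.minspacekill</name>
  <value>2147483648</value>  <!-- illustrative: kill tasks below ~2GB free -->
</property>
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>  <!-- compress intermediate map output -->
</property>
<property>
  <name>mapred.output.compress</name>
  <value>true</value>  <!-- compress final job output -->
</property>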

-----Original Message-----
From: Jimmy Wan [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 06, 2008 9:56 AM
To: core-user@hadoop.apache.org
Subject: Does Hadoop Honor Reserved Space?

I've got 2 datanodes set up with the following configuration parameter:

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>429496729600</value>
  <description>Reserved space in bytes per volume. Always leave this much
  space free for non dfs use.</description>
</property>


Both are housed on 800GB volumes, so I thought this would keep about half  
the volume free for non-HDFS usage.

After some long running jobs last night, both disk volumes were completely  
filled. The bulk of the data was in:
${my.hadoop.tmp.dir}/hadoop-hadoop/dfs/data

This is running as the user hadoop.

Am I interpreting these parameters incorrectly?

I noticed this issue, but it is marked as closed:  
http://issues.apache.org/jira/browse/HADOOP-2549

-- 
Jimmy



Does Hadoop Honor Reserved Space?

2008-03-06 Thread Jimmy Wan

I've got 2 datanodes set up with the following configuration parameter:

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>429496729600</value>
  <description>Reserved space in bytes per volume. Always leave this much
  space free for non dfs use.</description>
</property>


Both are housed on 800GB volumes, so I thought this would keep about half  
the volume free for non-HDFS usage.
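
(The arithmetic behind "about half": 429496729600 bytes = 400 x 1024^3
bytes = 400 GiB, which is indeed roughly half of an 800GB volume.)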


After some long running jobs last night, both disk volumes were completely  
filled. The bulk of the data was in:

${my.hadoop.tmp.dir}/hadoop-hadoop/dfs/data

This is running as the user hadoop.

Am I interpreting these parameters incorrectly?

I noticed this issue, but it is marked as closed:  
http://issues.apache.org/jira/browse/HADOOP-2549


--
Jimmy