I don't have the specific data you requested, but I can give you a general
outline of the dev cluster in question.

I have 4 general-use nodes.  These have about 1TB of storage each, but most
of that is used by other processes.  These nodes usually have 50-500GB free.

I have 8 nodes that have one 70GB drive and one 500GB drive.  The 70GB drive
usually has about 40GB free.  The 500GB drive is essentially all for hadoop.

I have 2 nodes that have one 70GB drive that usually has about 40GB of free
space.

Originally, my storage partitions were listed as
small-partition,large-partition.  I later changed that to
large-partition,small-partition, and then changed it again to list only the
single partition actually available on each machine.  A utility to evacuate a
partition would
come in very handy here, btw.  Turning off one node at a time and waiting
for the blocks to replicate is very slow.  It would be much nicer to be able
to announce to hadoop that I want the blocks on a particular disk partition
re-replicated NOW.  Since I had 8 partitions to evacuate and some of these
had been slightly corrupted due to disk-full conditions, evacuating them
took forever.  
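
(For reference, the partition list I am talking about is just the
comma-separated value of dfs.data.dir in hadoop-site.xml.  A rough sketch of
the final single-partition form, with made-up mount points, would be:

<property>
  <name>dfs.data.dir</name>
  <!-- was /mnt/small/dfs/data,/mnt/large/dfs/data, then the reverse order;
       now just the one partition that actually exists on the machine.
       Paths here are hypothetical. -->
  <value>/mnt/large/dfs/data</value>
  <description>Comma-separated list of local directories where the datanode
    stores its blocks</description>
</property>

As far as I can tell, removing a directory from that list does not move its
blocks anywhere; they simply stop being reported, and the namenode has to
re-replicate them from the remaining copies, which is exactly why an
evacuation utility would help.)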

I would have loved to be able to just say that those partitions should not
be counted as replicas (but should still be usable as replication sources).
I would also have appreciated some way to tell the cluster to prioritize
replication of at-risk blocks ahead of normal computation.  This is
especially important if somebody is running with only 2 copies of files.
Fsck should also have an option to trigger block reports from the datanodes
so that latent problems can be flushed out of hiding.
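
(The closest workaround I know of is to run fsck by hand and look at what it
reports; if I remember the flags right, something like

  bin/hadoop fsck / -files -blocks -locations

lists each file with its blocks and their locations, plus a summary of
under-replicated and corrupt blocks.  But that only reflects what the
namenode already knows from the last block reports; there is still no way to
force fresh reports out of the datanodes, which is what I am asking for.)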

I had about 20% usage across available storage.


On 1/8/08 2:16 PM, "Hairong Kuang" <[EMAIL PROTECTED]> wrote:

> I agree that block distribution does not deal with heterogeneous clusters
> well. Basically, block replication does not favor less-utilized datanodes.
> After 0.16 is released, you may periodically run the balancer to
> redistribute blocks with the command bin/start-balancer.sh.
> 
> I checked the datanode code. A datanode does check the amount of
> available space before block allocation. I need to investigate the cause
> of the disk-full problem. I would appreciate it if you could provide more
> information, such as the capacity of the disk, the amount of dfs used
> space, reserved space, and non-dfs used space when the out-of-disk
> problem occurs.
> 
> Hairong
> 
> -----Original Message-----
> From: Ted Dunning [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 08, 2008 1:37 PM
> To: hadoop-user@lucene.apache.org
> Subject: Re: Limit the space used by hadoop on a slave node
> 
> 
> And I have both but have had disk full problems.  I can't be sure right
> now whether this occurred under 14.4 or 15.1, but I think it was 15.1.
> 
> In any case, new file creation from a non-datanode host is definitely
> not well balanced and will lead to disk full conditions if you have
> dramatically different sized partitions available on the different
> datanodes.  Also, if you have a small and a large partition available on
> a single node, the small partition will fill up and cause corruption.  I
> had to go to single partitions on all nodes to avoid this.
> 
> <property>
>   <name>dfs.datanode.du.reserved</name>
>   <!-- 10 GB -->
>   <value>10000000000</value>
>   <description>Reserved space in bytes. Always leave this much space
>     free for non-dfs use.</description>
> </property>
> 
> <property>
>   <name>dfs.datanode.du.pct</name>
>   <value>0.9f</value>
>   <description>When calculating remaining space, only use this
> percentage of the real available space
>   </description>
> </property>
> 
> 
> 
> On 1/8/08 1:30 PM, "Koji Noguchi" <[EMAIL PROTECTED]> wrote:
> 
>> We use,
>> 
>> dfs.datanode.du.pct for 0.14 and dfs.datanode.du.reserved for 0.15.
>> 
>> The change was made in the Jira Hairong mentioned.
>> https://issues.apache.org/jira/browse/HADOOP-1463
>> 
>> Koji
>> 
>>> -----Original Message-----
>>> From: Ted Dunning [mailto:[EMAIL PROTECTED]
>>> Sent: Tuesday, January 08, 2008 1:13 PM
>>> To: hadoop-user@lucene.apache.org
>>> Subject: Re: Limit the space used by hadoop on a slave node
>>> 
>>> 
>>> I think I have seen related bad behavior on 15.1.
>>> 
>>> On 1/8/08 11:49 AM, "Hairong Kuang" <[EMAIL PROTECTED]> wrote:
>>> 
>>>> Has anybody tried 15.0? Please check
>>>> https://issues.apache.org/jira/browse/HADOOP-1463.
>>>> 
>>>> Hairong
>>>> -----Original Message-----
>>>> From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
>>>> Sent: Tuesday, January 08, 2008 11:33 AM
>>>> To: hadoop-user@lucene.apache.org; hadoop-user@lucene.apache.org
>>>> Subject: RE: Limit the space used by hadoop on a slave node
>>>> 
>>>> At least up until 14.4, these options are broken. See
>>>> https://issues.apache.org/jira/browse/HADOOP-2549
>>>> 
>>>> (There's a trivial patch, but I am still testing.)
>>>> 
>>>> 
>> 
> 
