[ 
https://issues.apache.org/jira/browse/HDFS-6075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925274#comment-13925274
 ] 

Charles Wimmer commented on HDFS-6075:
--------------------------------------

dfs.datanode.balance.bandwidthPerSec may be set dynamically while the cluster 
is running.  We requested this feature for exactly the type of operational 
situation you describe.  You may not be able to eliminate replication, but you 
can minimize the impact by temporarily setting the bandwidth extremely low.

From hdfs dfsadmin -help:
{noformat}
-setBalancerBandwidth <bandwidth>:
        Changes the network bandwidth used by each datanode during
        HDFS block balancing.

                <bandwidth> is the maximum number of bytes per second
                that will be used by each datanode. This value overrides
                the dfs.balance.bandwidthPerSec parameter.

                --- NOTE: The new value is not persistent on the DataNode.---
{noformat}
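
As a rough sketch of how this could be scripted around a maintenance window (the two bandwidth values and the RUN switch are illustrative assumptions, not recommendations or defaults, and it assumes the throttle covers the traffic you want to limit on your version):

```shell
# Sketch only: throttle per-DataNode bandwidth before rack maintenance,
# then raise it again afterwards. Both values are assumed for illustration.
LOW_BW=$((10 * 1024))            # 10 KiB/s per DataNode (assumed)
NORMAL_BW=$((10 * 1024 * 1024))  # 10 MiB/s per DataNode (assumed)

# Prints the command by default; set RUN=1 to actually execute it.
set_bandwidth() {
    if [ "${RUN:-0}" = "1" ]; then
        hdfs dfsadmin -setBalancerBandwidth "$1"
    else
        echo "hdfs dfsadmin -setBalancerBandwidth $1"
    fi
}

set_bandwidth "$LOW_BW"      # before taking the rack offline
# ... replace the switch, bring the rack back ...
set_bandwidth "$NORMAL_BW"   # once the rack is online again
```

Because the new value is not persistent on the DataNode, a DataNode restarted during the window would fall back to its configured value, so the low setting may need to be reapplied.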

> Introducing "non-replication mode"
> ----------------------------------
>
>                 Key: HDFS-6075
>                 URL: https://issues.apache.org/jira/browse/HDFS-6075
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Adam Kawa
>            Priority: Minor
>
> Afaik, HDFS does not provide an easy way to temporarily disable the 
> replication of missing blocks.
> If you would like to temporarily disable the replication, you would have to
> * set dfs.namenode.replication.interval (_The periodicity in seconds with 
> which the namenode computes replication work for datanodes_ Default 3) to 
> something very high. *Disadvantage*: you have to restart the NN
> * go into the safe-mode. *Disadvantage*: all write operations will fail
> We have the situation that we need to replace our top-of-rack switches for 
> each rack. Replacing a switch should take around 30 minutes. Each rack has 
> around 0.6 PB of data. We would like to avoid an expensive replication, since 
> we know that we will put this rack online quickly. To avoid any downtime, or 
> excessive network transfer, we think that temporarily disabling the 
> replication could fit us.
> The default block placement policy puts blocks into two racks, so when one 
> rack temporarily goes offline, we still have access to at least one replica of 
> each block. Of course, if we lose this replica, then we would have to wait 
> until the rack goes back online. This is what the administrator should be 
> aware of.
> This feature could disable the replication
> * globally - for a whole cluster
> * partially - e.g. only for missing blocks that come from a specified set of 
> DataNodes. So a file like "we_will_be_back_soon" :) could be introduced, 
> similar to include and exclude.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
