Hello Elton,

On Mon, Apr 4, 2011 at 11:44 AM, elton sky <eltonsky9...@gmail.com> wrote:
> Now I want to remove 1 disk from each node, say /data4/hdfs-data. What I
> should do to keep data integrity?
> -Elton

This can be done safely using the 'decommission' process: decommission
the DataNodes, reconfigure them (drop /data4/hdfs-data from their
dfs.data.dir), then recommission them. Multiple nodes may be taken down
per decommission round this way, but be wary of your cluster's actual
used data capacity and your minimum replication factors. Read more
about the decommission process here:
http://hadoop.apache.org/hdfs/docs/r0.21.0/hdfs_user_guide.html#DFSAdmin+Command
and http://developer.yahoo.com/hadoop/tutorial/module2.html#decommission
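In case it helps, the rough sequence looks something like the below
(the exclude-file path and hostname are placeholders; this assumes
dfs.hosts.exclude is already set in the NameNode's hdfs-site.xml):

```shell
# Assumed setup: hdfs-site.xml on the NameNode has dfs.hosts.exclude
# pointing at an exclude file, e.g. /etc/hadoop/conf/dfs.exclude.

# 1. Add the DataNodes you want to decommission to the exclude file:
echo "datanode1.example.com" >> /etc/hadoop/conf/dfs.exclude

# 2. Tell the NameNode to re-read its host lists; decommissioning
#    (re-replication of the affected blocks) begins:
hadoop dfsadmin -refreshNodes

# 3. Watch progress until those nodes report "Decommissioned":
hadoop dfsadmin -report

# 4. Once done, edit dfs.data.dir on those nodes to remove
#    /data4/hdfs-data, take them out of the exclude file, run
#    -refreshNodes again, and restart their DataNode daemons.
```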

You may also want to run the balancer across the cluster after the
entire process is done, to even out any skew in the distribution of
data across the DataNodes.
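For that step, something like the following (the threshold value here
is just an illustrative choice):

```shell
# Run the HDFS balancer; -threshold is the allowed deviation (in
# percent) of each DataNode's utilization from the cluster average.
hadoop balancer -threshold 10
```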

(P.s. As an alternative, you may bring down one DataNode at a time,
reconfigure it individually, and bring it back up; then repeat with
the next one once the NN's fsck reports a healthy state again (no
under-replicated blocks). But decommissioning is the guaranteed-safe
way, and is easier when handling a bulk of nodes.)
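If you go the one-at-a-time route, the health check between nodes is
just a namespace fsck:

```shell
# Check namespace health; wait for "Under-replicated blocks: 0" and
# "Status: HEALTHY" before taking down the next DataNode.
hadoop fsck /
```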

-- 
Harsh J
Support Engineer, Cloudera
