Hello Elton,

On Mon, Apr 4, 2011 at 11:44 AM, elton sky <eltonsky9...@gmail.com> wrote:
> Now I want to remove 1 disk from each node, say /data4/hdfs-data. What I
> should do to keep data integrity?
> -Elton
This can be done using the reliable 'decommission' process: decommission the nodes, reconfigure them (remove /data4/hdfs-data from their dfs.data.dir), and then recommission them. Multiple nodes may be taken down per decommission round this way, but be mindful of your cluster's actual used data capacity and your minimum replication factor.

Read more about the decommission process here:
http://hadoop.apache.org/hdfs/docs/r0.21.0/hdfs_user_guide.html#DFSAdmin+Command
and
http://developer.yahoo.com/hadoop/tutorial/module2.html#decommission

After the entire process is done, you may also need to run the cluster-wide balancer to even out any skew in the distribution of data across the DataNodes.

(P.s. As an alternative, you can bring down one DataNode at a time, reconfigure it individually, and bring it back up; then repeat with the next node once the NameNode's fsck reports a healthy state again (no under-replicated blocks). But decommissioning is the guaranteed-safe way, and it is easier when doing some bulk of nodes.)

--
Harsh J
Support Engineer, Cloudera
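P.P.s. For reference, a rough sketch of one decommission round (the hostname and the exclude-file path are illustrative; use whatever dfs.hosts.exclude points at in your hdfs-site.xml):

```shell
# 1. List the nodes to reconfigure in the NameNode's exclude file
#    (the file named by the dfs.hosts.exclude property):
echo "datanode1.example.com" >> /etc/hadoop/conf/excludes

# 2. Tell the NameNode to begin decommissioning them:
hadoop dfsadmin -refreshNodes

# 3. Wait until the report (or the NameNode web UI) shows the nodes
#    as "Decommissioned" before touching them:
hadoop dfsadmin -report

# 4. On each decommissioned node, remove /data4/hdfs-data from the
#    comma-separated dfs.data.dir value in hdfs-site.xml. Then take the
#    nodes back out of the exclude file and refresh again to recommission:
hadoop dfsadmin -refreshNodes

# 5. Once all rounds are done, even out the block distribution:
hadoop balancer
```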