Hi there,

Our data nodes all have 2 disks, one of which is nearly full and one of which is nearly empty:

$ df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00  120G   11G  104G   9% /
/dev/cciss/c0d0p1                 99M   35M   60M  37% /boot
tmpfs                            7.9G     0  7.9G   0% /dev/shm
/dev/cciss/c0d1                  1.8T  1.7T  103G  95% /data
/dev/cciss/c0d2                  1.8T   76G  1.8T   5% /data2

Reading through the docs and mailing list archives, my understanding is that HDFS will continue to round-robin writes across both disks until /data is completely full, and will then write only to /data2. Is this correct? Does it really write until the disk is 100% full (or as close to full as possible)?

Ignoring the performance implications and the monitoring hassles of having full disks, I just want to be sure that nothing bad is going to happen over the next couple of days as we fill up that /data partition.
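For what it's worth, while we sort this out I was planning to reserve some headroom per volume with dfs.datanode.du.reserved in hdfs-site.xml, so the datanode stops short of filling the disk completely. Something like this (the 10 GB figure is just an example I picked, not a recommendation):

<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- bytes reserved per volume for non-HDFS use: 10 GB -->
  <value>10737418240</value>
</property>

Does that actually prevent the disk from hitting 100%, or am I misreading what it does?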
I understand that my best two options to rebalance each data node are to either:

1) bring down HDFS, manually move ~50% of the /data/dfs/dn/current/subdir* directories over to /data2, and then bring HDFS back up; or

2) bring the data nodes down one at a time, clean out /data and /data2, put the node back into rotation, and let the balancer distribute replicated data back onto the node. Since HDFS will round-robin to both (now empty) disks, I should wind up with a nicely balanced data node. Repeat this process for the remaining nodes.

(Rough command sketches of what I mean by each option are in the P.S. below.)

I'm relatively new to HDFS, so can someone please confirm whether what I'm saying is correct? Any tips, tricks or things to watch out for would also be greatly appreciated.

Thanks,
Tom
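P.S. In case it clarifies the question, here is roughly what I had in mind for each option. Please treat these as untested sketches: the init-script name (hadoop-datanode) is a guess for our install, and I'm assuming dfs.data.dir is set to /data/dfs/dn,/data2/dfs/dn.

# Option 1: stop the datanode, move roughly half the block subdirs, restart.
# My assumption is that the datanode rescans its dfs.data.dir volumes on
# startup and picks the moved blocks up from their new location.
service hadoop-datanode stop        # or: bin/hadoop-daemon.sh stop datanode
mv /data/dfs/dn/current/subdir{32..63} /data2/dfs/dn/current/
#   ^ adjust the range to whichever subdir* names actually exist
service hadoop-datanode start

# Option 2: wipe one node's data dirs, rejoin, repeat node by node.
# Presumably only safe with replication >= 2, waiting for re-replication
# to finish before touching the next node.
service hadoop-datanode stop
rm -rf /data/dfs/dn/* /data2/dfs/dn/*
service hadoop-datanode start
hadoop balancer -threshold 10       # then rebalance across the cluster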