Hi Hadoopers,

I am currently running Hadoop 0.20.203 in production with 600 TB of data in it. I am planning to enable rack awareness on this cluster, but I have not thought it all the way through yet.

Plan/questions:

1. I have a script that can resolve a datanode/tasktracker IP to a rack name.
2. Add topology.script.file.name to hdfs-site.xml and restart the cluster.
3. After the cluster comes back, my questions start here:
   - Do I have to run the balancer, fsck, or some other command to have those 600 TB redistributed across the racks in one go?
   - Currently I run the balancer for 2 hours every day. Can I keep this routine and expect that at some point the data will be properly redistributed and rack-aware?
   - How can we know that the data in the cluster is now fully rack-aware?
   - If I just add the script and run the balancer 2 hours every day, then until all the data becomes rack-aware the cluster will hold a mix: existing data (not yet rebalanced) still on "default-rack", and newly loaded data that is presumably rack-aware. Is it OK to have default-rack and rack-specific data mixed together?
4. Thoughts?

Hope this makes sense.

Thanks in advance,
Patai
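
For concreteness, a script of the kind mentioned in step 1 could look roughly like the sketch below. It is not the actual production script: the subnet-to-rack table, the rack names, and the function name resolve_rack are made up for illustration. It only assumes the usual topology-script contract, where Hadoop passes one or more IPs/hostnames as arguments and reads one rack name per argument, whitespace-separated, from stdout.

```python
#!/usr/bin/env python
"""Illustrative topology script: print one rack name per address argument."""
import sys

# Rack reported for any address that is not in the table below.
DEFAULT_RACK = "/default-rack"

# Hypothetical mapping from the first three octets of an IP to a rack name.
# Replace with the real layout of the cluster.
SUBNET_TO_RACK = {
    "10.1.1": "/dc1/rack1",
    "10.1.2": "/dc1/rack2",
}


def resolve_rack(address):
    """Return the rack for an address, or DEFAULT_RACK if it is unknown."""
    # Drop the last dot-separated component, e.g. "10.1.1.15" -> "10.1.1".
    prefix = address.rsplit(".", 1)[0]
    return SUBNET_TO_RACK.get(prefix, DEFAULT_RACK)


def main(argv):
    # Emit the racks in the same order as the arguments, separated by spaces.
    print(" ".join(resolve_rack(a) for a in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```

With this table, any node outside the two listed subnets (or passed by hostname) falls back to /default-rack, so an incomplete table would itself produce a mix of default-rack and rack-specific nodes.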