Re: What will we encounter if we add a lot of nodes into the current cluster?
If you add these nodes, data will be placed on them as you add data to the cluster. Soon after adding the nodes, you should rebalance the storage to avoid age-related surprises in how files are arranged in your cluster. Other than that, your addition should cause few surprises.

On Tue, Aug 11, 2009 at 11:00 PM, yang song <hadoop.ini...@gmail.com> wrote:
> Dear all,
> I'm sorry to disturb you. Our cluster has 200 nodes now. In order to
> increase its capacity, we hope to add 60 nodes to the current cluster.
> However, we don't know what will happen if we add so many nodes at the
> same time. Could you give me some tips and notes? During the process,
> which parts should we pay the most attention to? Thank you!
> P.S. Our environment is hadoop-0.19.1, jdk1.6.0_06, Linux Red Hat
> Enterprise 4.0

--
Ted Dunning, CTO DeepDyve
Re: What will we encounter if we add a lot of nodes into the current cluster?
Also, if you haven't yet configured rack awareness, now's a good time to start :)

- Aaron

On Tue, Aug 11, 2009 at 11:27 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> If you add these nodes, data will be placed on them as you add data to
> the cluster. Soon after adding the nodes, you should rebalance the
> storage to avoid age-related surprises in how files are arranged in your
> cluster. Other than that, your addition should cause few surprises.
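[Editor's note: in this era of Hadoop, rack awareness is configured by pointing topology.script.file.name (in hadoop-site.xml) at a script that maps datanode addresses to rack paths. A minimal sketch, assuming a hypothetical one-subnet-per-rack layout; the subnets and rack names are illustrative, not part of the thread:]

```shell
#!/bin/sh
# Hypothetical topology script for topology.script.file.name.
# Hadoop invokes it with one or more datanode IPs/hostnames as arguments
# and expects one rack path per argument on stdout.
# The subnet-to-rack mapping below is an illustrative assumption.
resolve_rack() {
  case "$1" in
    10.1.*) echo "/dc1/rack1" ;;
    10.2.*) echo "/dc1/rack2" ;;
    *)      echo "/default-rack" ;;
  esac
}

for host in "$@"; do
  resolve_rack "$host"
done
```

Unmapped hosts fall back to /default-rack, which is also what Hadoop assumes for every node when no script is configured at all.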
Re: What will we encounter if we add a lot of nodes into the current cluster?
Thank you for teaching me that. I'm trying to use the balancer tool (bin/hadoop balancer -t xxx). However, the data transfer is so slow that it will take a very long time. Is there a good way to speed it up?

What's more, I have a question. We rarely use the existing data in the cluster, so rebalancing it hardly seems worthwhile. I intend not to rebalance the data. Is that reasonable? Thank you.

2009/8/12 Ted Dunning <ted.dunn...@gmail.com>
> If you add these nodes, data will be placed on them as you add data to
> the cluster. Soon after adding the nodes, you should rebalance the
> storage to avoid age-related surprises in how files are arranged in your
> cluster. Other than that, your addition should cause few surprises.
Re: What will we encounter if we add a lot of nodes into the current cluster?
On Thu, Aug 13, 2009 at 8:06 AM, yang song <hadoop.ini...@gmail.com> wrote:
> Thank you for teaching me that. I'm trying to use the balancer tool
> (bin/hadoop balancer -t xxx). However, the data transfer is so slow that
> it will take a very long time. Is there a good way to speed it up?
> What's more, I have a question. We rarely use the existing data in the
> cluster, so rebalancing it hardly seems worthwhile. I intend not to
> rebalance the data. Is that reasonable?

I think that if you add new nodes to the cluster and don't rebalance, Hadoop is probably going to favor the new nodes over the older ones for the new data you write into HDFS. As a result, even your new data will probably not be balanced evenly across the cluster. If you're going to run m/r jobs on this new data, it's a good idea to have it spread evenly across the cluster.

--
Harish Mallipeddi
http://blog.poundbang.in
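[Editor's note: Harish's point can be illustrated with a toy model. This is NOT HDFS's actual placement policy, just a sketch in which block placement is biased toward nodes with more free space: 200 nodes that are 60% full plus 60 empty nodes receive 10,000 new blocks, and the new nodes absorb far more than their fair share (60/260, about 23%):]

```shell
# Toy model (not HDFS's real placement policy): place each block on a node
# with probability proportional to that node's free space, then count how
# many blocks landed on the 60 newly added (empty) nodes.
pct=$(awk 'BEGIN {
  srand(42); cap = 1000; n = 260
  for (i = 0; i < 200; i++) used[i] = 600   # 200 old nodes, 60% full
  for (i = 200; i < n; i++)  used[i] = 0    # 60 new, empty nodes
  for (b = 0; b < 10000; b++) {
    freetotal = 0
    for (i = 0; i < n; i++) freetotal += cap - used[i]
    r = rand() * freetotal                  # weight nodes by free space
    for (i = 0; i < n; i++) { r -= cap - used[i]; if (r <= 0) break }
    used[i]++
  }
  newblocks = 0
  for (i = 200; i < n; i++) newblocks += used[i]
  printf "%d", 100 * newblocks / 10000
}')
echo "${pct}% of new blocks landed on the 60 new nodes (fair share: ~23%)"
```

Under this model the new nodes take roughly double their fair share of writes, which is the skew Harish describes; per-datanode usage can be checked with bin/hadoop dfsadmin -report.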
Re: What will we encounter if we add a lot of nodes into the current cluster?
There is a parameter (dfs.balance.bandwidthPerSec) that limits the rebalancing bandwidth. The default is rather low. See
http://developer.yahoo.com/hadoop/tutorial/module2.html#rebalancing

On Wed, Aug 12, 2009 at 7:36 PM, yang song <hadoop.ini...@gmail.com> wrote:
> I'm trying to use the balancer tool (bin/hadoop balancer -t xxx).
> However, the data transfer is so slow that it will take a very long time.
> Is there a good way to speed it up?

--
Ted Dunning, CTO DeepDyve
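[Editor's note: for reference, a hadoop-site.xml fragment raising that throttle. The value is per datanode, in bytes per second; the 10 MB/s figure below is an illustrative choice, not a recommendation from the thread (the default is 1 MB/s). Datanodes read it at startup, so they must be restarted before re-running the balancer:]

```xml
<!-- Illustrative value: 10 MB/s per datanode; the default is 1048576 (1 MB/s) -->
<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>10485760</value>
</property>
```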