Stack wrote:

I'm 0 on this.

-I would worry if the exclusion list were used by the NN to do its blacklisting; I'm glad to see this isn't happening. Yes, you could pick up datanode failure faster, but you would also be vulnerable to a user mounting a DoS against the cluster by reporting every DN as failing.
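To make the distinction concrete, here is a minimal sketch (not real HDFS code; all names are illustrative) of the HDFS-630 behaviour: the client keeps its own list of datanodes that failed *for it* and passes that list along when asking the NN for a block target. The NN honours the exclusion only for that one placement decision, so a malicious client reporting every DN as dead hurts nobody but itself:

```python
# Illustrative sketch of client-scoped datanode exclusion (HDFS-630 style).
# The exclusion list lives in the client and only influences placement
# requests made by that client -- the NN never blacklists nodes from it.

def choose_datanode(live_datanodes, excluded):
    """NN-side placement: skip nodes this particular client reported."""
    for dn in live_datanodes:
        if dn not in excluded:
            return dn
    return None  # no usable node for this client

class BlockWriter:
    """Hypothetical client that remembers its own failed datanodes."""

    def __init__(self, cluster_nodes):
        self.cluster_nodes = cluster_nodes
        self.excluded = set()  # private to this client, not cluster state

    def next_target(self):
        return choose_datanode(self.cluster_nodes, self.excluded)

    def report_failure(self, dn):
        self.excluded.add(dn)  # affects only this client's future requests

writer = BlockWriter(["dn1", "dn2", "dn3"])
assert writer.next_target() == "dn1"
writer.report_failure("dn1")      # dn1 failed for us; retry elsewhere
assert writer.next_target() == "dn2"
```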

-Russ Perry's work on high-speed Hadoop rendering [1] tweaked Hadoop so that clients get the entire list of nodes holding the data and make their own decision about where to read it from. This (1) pushes the policy of handling failure down to the clients, with less need to talk to the NN about it, and (2) lets you do something very fancy: deliberately pull data from different DNs, so that you can read data off the cluster at the full bandwidth of every disk.
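A rough sketch of the second point, under assumed names (this is not the real HDFS client API): given every replica location per block, the client can schedule its reads so they spread across distinct datanodes, letting a bulk read approach the aggregate disk bandwidth of the cluster:

```python
# Illustrative client-side replica scheduling in the spirit of Russ
# Perry's approach: the client sees all replica locations and balances
# its reads across datanodes instead of asking the NN to choose.

from collections import defaultdict

def schedule_reads(block_replicas):
    """block_replicas: {block_id: [datanodes holding a replica]}.
    Returns {block_id: chosen datanode}, greedily balancing per-node load."""
    load = defaultdict(int)
    plan = {}
    for block, replicas in block_replicas.items():
        # Pick the replica holder with the fewest reads scheduled so far,
        # so concurrent fetches hit as many distinct disks as possible.
        dn = min(replicas, key=lambda d: load[d])
        plan[block] = dn
        load[dn] += 1
    return plan

plan = schedule_reads({
    "blk_1": ["dnA", "dnB"],
    "blk_2": ["dnA", "dnB"],
    "blk_3": ["dnB", "dnC"],
})
# The three blocks land on three different datanodes, so all three
# reads can proceed at full disk speed in parallel.
assert sorted(plan.values()) == ["dnA", "dnB", "dnC"]
```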

Long term, I would like to see Russ's addition go in, so I wonder whether the HDFS-630 patch would be useful long term. Maybe it's a more fundamental issue: where does the decision making go, into the clients or into the NN?

-steve



[1] http://www.hpl.hp.com/techreports/2009/HPL-2009-345.html
