Todd Lipcon wrote:

On Sat, Jul 4, 2009 at 9:08 AM, David B. Ritch <david.ri...@gmail.com> wrote:

Thanks, Todd.  Perhaps I was misinformed, or misunderstood.  I'll make
sure I close files occasionally, but it's good to know that the only
real issue is with data recovery after losing a node.


Just to be clear, there aren't issues with data recovery of already-written
files. The issue is that, when you open a new file for writing, Hadoop sets
up a pipeline that looks something like:

Writer -> DN A -> DN B -> DN C

where each of DN A, B, and C is a datanode in your HDFS cluster. If Writer is
also a node in your HDFS cluster, HDFS will attempt to make DN A the same
machine as Writer.
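
Not from the thread, but a rough sketch of the writer side with the standard
Java client; the pipeline itself is set up transparently when create() is
called. The path, buffer size, and replication factor are placeholder values.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PipelineWriter {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);

            // Opening the file for write is what makes the client ask the namenode
            // for a block and set up the Writer -> DN A -> DN B -> DN C pipeline.
            // With replication = 3, and if this JVM runs on a datanode, DN A will
            // normally be the local machine.
            Path path = new Path("/tmp/pipeline-demo.txt");   // placeholder path
            FSDataOutputStream out =
                fs.create(path, true, 4096, (short) 3, fs.getDefaultBlockSize());

            out.write("some data".getBytes("UTF-8"));
            out.close();   // closing completes the block and tears down the pipeline
        }
    }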

If DN B fails, the write pipeline will reorganize itself to:

Writer -> DN A -> DN C

In theory I *believe* it's supposed to pick up a new datanode at this point
and tack it onto the end, but I'm not certain this is implemented quite yet.
Maybe Dhruba or someone else with more knowledge here can chime in.

Sounds like a good opportunity for a fun little test: start the write on a 4-DN (local) cluster, kill the DN in use, and check that all is well.
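
A rough sketch of what that test might look like using Hadoop's MiniDFSCluster
test harness (not from the thread; exact constructor/method names vary between
Hadoop versions, the path and sizes are placeholders, and stopping datanode 0
is only an approximation of "the DN in use"):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.MiniDFSCluster;

    public class TestWriterSurvivesDatanodeDeath {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Spin up an in-process cluster with 4 datanodes.
            MiniDFSCluster cluster = new MiniDFSCluster(conf, 4, true, null);
            try {
                FileSystem fs = cluster.getFileSystem();
                Path path = new Path("/test/pipeline-failure.txt");  // placeholder

                // Start the write; this sets up a 3-datanode pipeline.
                FSDataOutputStream out = fs.create(path, (short) 3);
                byte[] chunk = new byte[1024];
                out.write(chunk);

                // Kill one datanode while the file is still open.
                cluster.stopDataNode(0);

                // Keep writing through the (reorganized) pipeline, then close.
                out.write(chunk);
                out.close();

                // All is well if the file is readable with the expected length.
                long len = fs.getFileStatus(path).getLen();
                System.out.println("file length = " + len
                    + " (expected " + 2 * chunk.length + ")");
            } finally {
                cluster.shutdown();
            }
        }
    }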
