I'm wondering what an application that holds a reference to a FileSystem
object should do when a NameNode or DataNode fails.
* Does the FileSystem handle all of this itself (e.g. reconnect logic)?
* Do I need to get a new FileSystem using FileSystem.get(Configuration)?
* Does the old FileSystem need to be closed before getting a new one?
* Do the answers to these questions depend on whether it's a NameNode or
a DataNode that has failed?

In short, how does an application (not a Hadoop job -- just an app using
HDFS) properly recover from a NameNode or DataNode failure? I haven't
figured out the magic juju yet and my applications are not handling DFS
outages gracefully.

Thanks,
Brian
