I'm wondering what the proper actions are, in light of a NameNode or DataNode failure, for an application that holds a reference to a FileSystem object.

* Does the FileSystem handle all of this itself (e.g. reconnect logic)?
* Do I need to get a new FileSystem using .get(Configuration)?
* Does the FileSystem need to be closed before re-getting it?
* Do the answers to these questions depend on whether it's a NameNode or a DataNode that has failed?
In short, how does an application (not a Hadoop job, just an app using HDFS) properly recover from a NameNode or DataNode failure? I haven't figured out the magic juju yet, and my applications are not handling DFS outages gracefully.

Thanks,
Brian