[ https://issues.apache.org/jira/browse/HADOOP-12077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gera Shegalov updated HADOOP-12077: ----------------------------------- Attachment: HADOOP-12077.002.patch Addressed a bulk of check style warnings. Test failures are unrelated and don't reproduce locally. > Provide a muti-URI replication Inode for ViewFs > ----------------------------------------------- > > Key: HADOOP-12077 > URL: https://issues.apache.org/jira/browse/HADOOP-12077 > Project: Hadoop Common > Issue Type: New Feature > Components: fs > Reporter: Gera Shegalov > Assignee: Gera Shegalov > Attachments: HADOOP-12077.001.patch, HADOOP-12077.002.patch > > > This JIRA is to provide simple "replication" capabilities for applications > that maintain logically equivalent paths in multiple locations for caching or > failover (e.g., S3 and HDFS). We noticed a simple common HDFS usage pattern > in our applications. They host their data on some logical cluster C. There > are corresponding HDFS clusters in multiple datacenters. When the application > runs in DC1, it prefers to read from C in DC1, and the applications prefers > to failover to C in DC2 if the application is migrated to DC2 or when C in > DC1 is unavailable. New application data versions are created > periodically/relatively infrequently. > In order to address many common scenarios in a general fashion, and to avoid > unnecessary code duplication, we implement this functionality in ViewFs (our > default FileSystem spanning all clusters in all datacenters) in a project > code-named Nfly (N as in N datacenters). Currently each ViewFs Inode points > to a single URI via ChRootedFileSystem. Consequently, we introduce a new type > of links that points to a list of URIs that are each going to be wrapped in > ChRootedFileSystem. A typical usage: > /nfly/C/user->/DC1/C/user,/DC2/C/user,... This collection of > ChRootedFileSystem instances is fronted by the Nfly filesystem object that is > actually used for the mount point/Inode. Nfly filesystems backs a single > logical path /nfly/C/user/<user>/path by multiple physical paths. > Nfly filesystem supports setting minReplication. As long as the number of > URIs on which an update has succeeded is greater than or equal to > minReplication exceptions are only logged but not thrown. Each update > operation is currently executed serially (client-bandwidth driven parallelism > will be added later). > A file create/write: > # Creates a temporary invisible _nfly_tmp_file in the intended chrooted > filesystem. > # Returns a FSDataOutputStream that wraps output streams returned by 1 > # All writes are forwarded to each output stream. > # On close of stream created by 2, all n streams are closed, and the files > are renamed from _nfly_tmp_file to file. All files receive the same mtime > corresponding to the client system time as of beginning of this step. > # If at least minReplication destinations has gone through steps 1-4 without > failures the transaction is considered logically committed, otherwise a > best-effort attempt of cleaning up the temporary files is attempted. > As for reads, we support a notion of locality similar to HDFS /DC/rack/node. > We sort Inode URIs using NetworkTopology by their authorities. These are > typically host names in simple HDFS URIs. If the authority is missing as is > the case with the local file:/// the local host name is assumed > InetAddress.getLocalHost(). This makes sure that the local file system is > always the closest one to the reader in this approach. For our Hadoop 2 hdfs > URIs that are based on nameservice ids instead of hostnames it is very easy > to adjust the topology script since our nameservice ids already contain the > datacenter. As for rack and node we can simply output any string such as > /DC/rack-nsid/node-nsid, since we only care about datacenter-locality for > such filesystem clients. > There are 2 policies/additions to the read call path that makes it more > expensive, but improve user experience: > - readMostRecent - when this policy is enabled, Nfly first checks mtime for > the path under all URIs, sorts them from most recent to least recent. Nfly > then sorts the set of most recent URIs topologically in the same manner as > described above. > - repairOnRead - when readMostRecent is enabled Nfly already has to RPC all > underlying destinations. With repairOnRead, Nfly filesystem would > additionally attempt to refresh destinations with the path missing or a stale > version of the path using the nearest available most recent destination. -- This message was sent by Atlassian JIRA (v6.3.4#6332)