[ 
https://issues.apache.org/jira/browse/HADOOP-12077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated HADOOP-12077:
-----------------------------------
    Attachment: HADOOP-12077.002.patch

Addressed a bulk of check style warnings. Test failures are unrelated and don't 
reproduce locally. 

> Provide a muti-URI replication Inode for ViewFs
> -----------------------------------------------
>
>                 Key: HADOOP-12077
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12077
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Gera Shegalov
>            Assignee: Gera Shegalov
>         Attachments: HADOOP-12077.001.patch, HADOOP-12077.002.patch
>
>
> This JIRA is to provide simple "replication" capabilities for applications 
> that maintain logically equivalent paths in multiple locations for caching or 
> failover (e.g., S3 and HDFS). We noticed a simple common HDFS usage pattern 
> in our applications. They host their data on some logical cluster C. There 
> are corresponding HDFS clusters in multiple datacenters. When the application 
> runs in DC1, it prefers to read from C in DC1, and the applications prefers 
> to failover to C in DC2 if the application is migrated to DC2 or when C in 
> DC1 is unavailable. New application data versions are created 
> periodically/relatively infrequently. 
> In order to address many common scenarios in a general fashion, and to avoid 
> unnecessary code duplication, we implement this functionality in ViewFs (our 
> default FileSystem spanning all clusters in all datacenters) in a project 
> code-named Nfly (N as in N datacenters). Currently each ViewFs Inode points 
> to a single URI via ChRootedFileSystem. Consequently, we introduce a new type 
> of links that points to a list of URIs that are each going to be wrapped in 
> ChRootedFileSystem. A typical usage: 
> /nfly/C/user->/DC1/C/user,/DC2/C/user,... This collection of 
> ChRootedFileSystem instances is fronted by the Nfly filesystem object that is 
> actually used for the mount point/Inode. Nfly filesystems backs a single 
> logical path /nfly/C/user/<user>/path by multiple physical paths.
> Nfly filesystem supports setting minReplication. As long as the number of 
> URIs on which an update has succeeded is greater than or equal to 
> minReplication exceptions are only logged but not thrown. Each update 
> operation is currently executed serially (client-bandwidth driven parallelism 
> will be added later). 
> A file create/write: 
> # Creates a temporary invisible _nfly_tmp_file in the intended chrooted 
> filesystem. 
> # Returns a FSDataOutputStream that wraps output streams returned by 1
> # All writes are forwarded to each output stream.
> # On close of stream created by 2, all n streams are closed, and the files 
> are renamed from _nfly_tmp_file to file. All files receive the same mtime 
> corresponding to the client system time as of beginning of this step. 
> # If at least minReplication destinations has gone through steps 1-4 without 
> failures the transaction is considered logically committed, otherwise a 
> best-effort attempt of cleaning up the temporary files is attempted.
> As for reads, we support a notion of locality similar to HDFS  /DC/rack/node. 
> We sort Inode URIs using NetworkTopology by their authorities. These are 
> typically host names in simple HDFS URIs. If the authority is missing as is 
> the case with the local file:/// the local host name is assumed 
> InetAddress.getLocalHost(). This makes sure that the local file system is 
> always the closest one to the reader in this approach. For our Hadoop 2 hdfs 
> URIs that are based on nameservice ids instead of hostnames it is very easy 
> to adjust the topology script since our nameservice ids already contain the 
> datacenter. As for rack and node we can simply output any string such as 
> /DC/rack-nsid/node-nsid, since we only care about datacenter-locality for 
> such filesystem clients.
> There are 2 policies/additions to the read call path that makes it more 
> expensive, but improve user experience:
> - readMostRecent - when this policy is enabled, Nfly first checks mtime for 
> the path under all URIs, sorts them from most recent to least recent. Nfly 
> then sorts the set of most recent URIs topologically in the same manner as 
> described above.
> - repairOnRead - when readMostRecent is enabled Nfly already has to RPC all 
> underlying destinations. With repairOnRead, Nfly filesystem would 
> additionally attempt to refresh destinations with the path missing or a stale 
> version of the path using the nearest available most recent destination. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to