[ https://issues.apache.org/jira/browse/HDFS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252213#comment-15252213 ]
Chris Douglas commented on HDFS-9806:
-------------------------------------

We started down that path, citing similar reasoning. [~jghoman] and I even implemented a similar prototype. We elected to abandon a client-driven approach for one driven by the infrastructure, for a few reasons:

* Like ViewFS, each client maintaining its own mount tables is powerful, but that flexibility interferes with sharing. Defining consistency between storage tiers when each client has an idiosyncratic mapping is not trivial. For example, if two clients map a path in {{smallFS}} to different locations in {{bigFS}}, the result of many operations is undefined (see the mount-table sketch below). By contrast, if that mapping is part of HDFS, then one class of potential conflicts is obviated. Conflicts still exist, but they are between concurrent writers to {{bigFS}}.
* Fault-tolerant, client-driven write-through caching is difficult, and in some cases impossible, to implement. Unless other clients (which must share the same mapping) can recover operations from {{smallFS}} to {{bigFS}}, client failures will create inconsistency. For example, if a client appends to {{smallFS}} and then fails (or is partitioned from {{bigFS}}, its credentials expire, etc.), what component will recover the operation? If recovery is implemented in the client, then another client appending to {{smallFS}} must first replay that operation in {{bigFS}} (see the write-through sketch below).
* Migrating data, evicting data from the cache, quotas, etc. are all expressible using _existing_ HDFS machinery (see the admin API sketch below). If external storage complements the existing abstractions, then it is not yet another service in Hadoop clusters, nor will it need to re-implement (and then reconcile with) the functionality already written and debugged in HDFS.
* Some storage systems have simpler security models than HDFS, often a single key. To avoid giving every client access to the {{bigFS}} storage account (breaking HDFS security), an operator can embed the credentials in HDFS (see the credential sketch below).
* Not all external storage systems will be FileSystems. Many sync operations are much easier when blocks are mapped to objects, rather than to file regions.

You raise a good point about the granularity of caching policies. Files, directories, and blocks are all viable. The policy that directs the content of the cache need not match the mechanism; even if we serve by block, we may keep metrics at the file, or even directory, level (the last sketch below illustrates this separation). We'll add more detail to the design doc on this point (apologies for its delay; my fault).
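To make the mount-table point concrete, here is a minimal sketch of how two clients could wire conflicting mappings through standard ViewFS configuration. The {{fs.viewfs.mounttable.*.link.*}} keys are the real ViewFS properties; the {{bigfs://}} scheme, bucket names, and mount-table name are placeholders.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class ConflictingMounts {
  public static void main(String[] args) {
    // Client A's private mount table resolves /data to one bigFS location.
    Configuration clientA = new Configuration();
    clientA.set("fs.viewfs.mounttable.ClusterX.link./data", "bigfs://bucket-a/data");

    // Client B maps the same logical path somewhere else entirely.
    Configuration clientB = new Configuration();
    clientB.set("fs.viewfs.mounttable.ClusterX.link./data", "bigfs://bucket-b/data");

    // A write to viewfs://ClusterX/data through A and a read through B now
    // touch different objects, and neither client can detect the divergence.
  }
}
{code}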
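The write-through hazard looks like this in code. This is a hypothetical client wrapper, not a proposed API; {{smallFS}} and {{bigFS}} are just two {{FileSystem}} instances, and the comment marks the failure window the bullet describes.

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Hypothetical write-through client: append to the cache tier, then the backing store. */
public class WriteThroughClient {
  private final FileSystem smallFS; // cache tier
  private final FileSystem bigFS;   // backing store

  public WriteThroughClient(FileSystem smallFS, FileSystem bigFS) {
    this.smallFS = smallFS;
    this.bigFS = bigFS;
  }

  public void append(Path path, byte[] data) throws IOException {
    try (FSDataOutputStream out = smallFS.append(path)) {
      out.write(data);
    }
    // If this client dies here (crash, partition from bigFS, expired
    // credentials), smallFS and bigFS disagree and no component is
    // responsible for replaying the append. Every other client sharing
    // this mapping must detect and replay it before appending itself.
    try (FSDataOutputStream out = bigFS.append(path)) {
      out.write(data);
    }
  }
}
{code}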
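For the point about existing machinery: quotas, archival storage, and centralized cache management already express these controls through {{HdfsAdmin}}. The methods below exist today; the paths, quota value, policy name, and pool name are invented, and the cache pool is assumed to have been created beforehand.

{code:java}
import java.net.URI;
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CacheFlag;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsAdmin;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;

public class ExistingMachinery {
  public static void main(String[] args) throws Exception {
    HdfsAdmin admin = new HdfsAdmin(URI.create("hdfs://nn:8020"), new Configuration());

    // Quotas already bound how much a subtree may consume.
    admin.setSpaceQuota(new Path("/cache"), 10L * 1024 * 1024 * 1024);

    // Storage policies already direct replicas to a storage tier.
    admin.setStoragePolicy(new Path("/cache/hot"), "HOT");

    // Cache directives already pin a working set; "working-set" is a
    // pre-existing cache pool in this sketch.
    admin.addCacheDirective(
        new CacheDirectiveInfo.Builder()
            .setPath(new Path("/cache/hot/table"))
            .setPool("working-set")
            .build(),
        EnumSet.noneOf(CacheFlag.class));
  }
}
{code}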
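On embedding credentials, one mechanism that exists today is the Hadoop credential provider API, which keeps a store key in a keystore file on HDFS guarded by HDFS permissions. In this sketch the keystore path and {{fs.bigfs.account.key}} are placeholders for whatever property a {{bigFS}} connector reads; under the proposed design the HDFS services, rather than every client, would resolve the key.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class EmbeddedCredentials {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The keystore lives in HDFS, so access to the bigFS key is governed
    // by HDFS permissions rather than copied into every client's config.
    conf.set("hadoop.security.credential.provider.path",
        "jceks://hdfs@nn:8020/secure/bigfs.jceks");

    // getPassword() consults the provider path before falling back to
    // cleartext configuration.
    char[] key = conf.getPassword("fs.bigfs.account.key");
  }
}
{code}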
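Finally, a purely hypothetical sketch of the policy/mechanism separation above: the mechanism reports block-level activity, while the policy is free to aggregate at file or directory granularity before deciding what to cache or evict. All names here are invented for illustration.

{code:java}
import org.apache.hadoop.fs.Path;

/**
 * Hypothetical separation of cache policy from cache mechanism: the
 * serving path works block by block, but the policy may keep its
 * metrics per file or per directory.
 */
interface CachePolicy {
  /** Mechanism hook: invoked for each block served from the cache tier. */
  void onBlockRead(Path file, long blockId, long lengthServed);

  /** Policy decision, possibly computed from file- or directory-level stats. */
  boolean shouldEvict(Path fileOrDirectory);
}
{code}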
> Allow HDFS block replicas to be provided by an external storage system
> ----------------------------------------------------------------------
>
>                 Key: HDFS-9806
>                 URL: https://issues.apache.org/jira/browse/HDFS-9806
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Chris Douglas
>
> In addition to heterogeneous media, many applications work with heterogeneous
> storage systems. The guarantees and semantics provided by these systems are
> often similar, but not identical to those of
> [HDFS|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/index.html].
> Any client accessing multiple storage systems is responsible for reasoning
> about each system independently, and must propagate and renew credentials for
> each store.
> Remote stores could be mounted under HDFS. Block locations could be mapped to
> immutable file regions, opaque IDs, or other tokens that represent a
> consistent view of the data. While correctness for arbitrary operations
> requires careful coordination between stores, in practice we can provide
> workable semantics with weaker guarantees.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)