[ https://issues.apache.org/jira/browse/HDFS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252213#comment-15252213 ]

Chris Douglas commented on HDFS-9806:
-------------------------------------

We started down that path, following similar reasoning. [~jghoman] and I even 
implemented a prototype along those lines. We elected to abandon the 
client-driven approach in favor of one driven by the infrastructure, for a few 
reasons:
* As with ViewFS, letting each client maintain its own mount tables is 
powerful, but that flexibility interferes with sharing. Defining consistency 
between storage tiers when each client has an idiosyncratic mapping is not 
trivial. For example, if two clients map the same path in {{smallFS}} to 
different locations in {{bigFS}}, the result of many operations is undefined 
(a rough sketch follows this list). By contrast, if that mapping is part of 
HDFS, then one class of potential conflicts is obviated. Conflicts still 
exist, but they are between concurrent writers to {{bigFS}}.
* Fault-tolerant, client-driven write-through caching is difficult, and in some 
cases impossible, to implement. Unless other clients (which must share the same 
mapping) can recover operations from {{smallFS}} to {{bigFS}}, client failures 
will create inconsistency. For example, if a client appends to {{smallFS}} and 
then fails (or is partitioned from {{bigFS}}, its credentials expire, etc.), 
what component will recover the operation? If recovery is implemented in the 
client, then another client appending to {{smallFS}} must first replay that 
operation in {{bigFS}}.
* Migrating data, evicting data from the cache, quotas, etc. are all 
expressible using _existing_ HDFS machinery. If external storage complements 
the existing abstractions, then it is not yet another service in Hadoop 
clusters, nor does it need to re-implement (and then reconcile with) the 
functionality already written and debugged in HDFS.
* Some storage systems have simpler security models than HDFS, often a single 
key. To avoid giving every client access to the {{bigFS}} storage account 
(breaking HDFS security), an operator can embed the credentials in HDFS.
* Not all external storage systems will be FileSystems. Many sync operations 
are much easier when blocks are mapped to objects, rather than file regions.
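
To make the first point concrete, here is a rough sketch of the client-driven 
alternative we moved away from. The mount-table name {{cache}}, the s3a URIs, 
and the paths are invented for illustration; only the 
{{fs.viewfs.mounttable.*.link.*}} keys and the {{Configuration}} API are real.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class DivergentMounts {
  public static void main(String[] args) {
    // Client A maps the logical path /warehouse to one external store...
    Configuration confA = new Configuration();
    confA.set("fs.viewfs.mounttable.cache.link./warehouse",
        "s3a://bucket-a/warehouse");

    // ...while client B maps the same logical path somewhere else.
    Configuration confB = new Configuration();
    confB.set("fs.viewfs.mounttable.cache.link./warehouse",
        "s3a://bucket-b/warehouse");

    // When each client opens viewfs://cache/warehouse/part-0000 with its own
    // configuration, the two writes land in different backing stores, so the
    // combined result of later reads, renames, or appends is undefined.
    System.out.println("A resolves /warehouse to "
        + confA.get("fs.viewfs.mounttable.cache.link./warehouse"));
    System.out.println("B resolves /warehouse to "
        + confB.get("fs.viewfs.mounttable.cache.link./warehouse"));
  }
}
{code}

If the mapping lives in HDFS instead, both clients resolve the same backing 
location, and any remaining conflict is the familiar one between concurrent 
writers.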

You raise a good point about the granularity of caching policies. Files, 
directories, and blocks are all viable. The policy that directs the content of 
the cache need not match the mechanism; even if we serve by block, we may keep 
metrics at the file or even the directory level.
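
As a hypothetical sketch (none of these types exist in HDFS; they only 
illustrate decoupling the policy from the mechanism): the serving path can 
stay block-granular while access metrics are rolled up to the file that owns 
each block, and eviction decisions are made per file.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical: block-granular serving, file-granular policy.
public class FileLevelCachePolicy {
  // Access metrics keyed by the file that owns each block.
  private final Map<String, LongAdder> readsPerFile = new ConcurrentHashMap<>();

  // Called from the block-serving path: charge the read to the owning file.
  public void onBlockRead(String filePath, long blockId) {
    readsPerFile.computeIfAbsent(filePath, p -> new LongAdder()).increment();
  }

  // Called from the eviction path: decide at file granularity, then evict
  // every block of the chosen file through the ordinary block machinery.
  public boolean shouldEvict(String filePath, long threshold) {
    LongAdder reads = readsPerFile.get(filePath);
    return reads == null || reads.sum() < threshold;
  }
}
{code}

The same rollup could be kept per directory without changing the block-level 
serving or eviction mechanism.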

We'll add more detail to the design doc on this point (apologies for its delay; 
my fault).

> Allow HDFS block replicas to be provided by an external storage system
> ----------------------------------------------------------------------
>
>                 Key: HDFS-9806
>                 URL: https://issues.apache.org/jira/browse/HDFS-9806
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Chris Douglas
>
> In addition to heterogeneous media, many applications work with heterogeneous 
> storage systems. The guarantees and semantics provided by these systems are 
> often similar, but not identical to those of 
> [HDFS|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/index.html].
>  Any client accessing multiple storage systems is responsible for reasoning 
> about each system independently, and must propagate and renew credentials for 
> each store.
> Remote stores could be mounted under HDFS. Block locations could be mapped to 
> immutable file regions, opaque IDs, or other tokens that represent a 
> consistent view of the data. While correctness for arbitrary operations 
> requires careful coordination between stores, in practice we can provide 
> workable semantics with weaker guarantees.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
