[ https://issues.apache.org/jira/browse/HDFS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323433#comment-15323433 ]
Virajith Jalaparti commented on HDFS-9806:
------------------------------------------

Thanks [~ehiggs] (and [~PieterReuse] and [~Thomas Demoor]) for getting the PoC working against S3! It will definitely be interesting to look at the changes you had to make.

bq. It makes sense to us for there to be a series of commands to attach, detach, and rescan provided storage from the command line.

Yes, agreed! Our first-cut solution for this is the same as what you suggested: have different NNs manage different provided storages, and have a particular NN manage the local storage. Using HDFS federation and ViewFs across these NNs can enable the mount functionality you suggested. A long-term solution can be to add _mount points_ in the NN. This would not only allow operations with a single NN but also allow operations under the mount point without holding the FSN lock. We will update the document with a roadmap on how the implementation can be staged.

bq. PROVIDED blocks are not stored with the {{INodeFile}}

Are you referring to over-replication of blocks due to read-through caching? If so, that is addressed by [~chris.douglas]'s comment above. The PROVIDED blocks are treated like any other blocks. The {{INodeFile}} will contain references to these blocks (their {{StorageType}} will be marked as PROVIDED), and the blocks will have a _composite_ {{DatanodeStorage}} associated with them as their location (as mentioned in Section 4.1 of the document). Whenever an attempt is made to get the locations of these blocks, the composite is resolved to one of the DNs that advertised this storage.

bq. If we want to attach multiple provided storage locations within a single NN, ...

When there are multiple PROVIDED storages, different {{storageId}}s can be used to distinguish them. The multiplexing/de-multiplexing using the {{storageId}} can be handled inside {{ProvidedStorageMap}}, so as to avoid extensive changes to the {{BlockManager}}.
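To make the last two points concrete, here is a rough sketch of how keying by {{storageId}} and resolving the composite location could fit together. This is purely illustrative and is not the code in the PoC or in HDFS: the class {{ProvidedStorageMapSketch}}, its methods {{advertise}} and {{resolve}}, the string datanode names, and the round-robin choice among advertising DNs are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch only; NOT the real HDFS ProvidedStorageMap.
 * Each PROVIDED storage is identified by a storageId, and a lookup for
 * that storageId is resolved to one datanode that advertised it.
 */
class ProvidedStorageMapSketch {
    // storageId -> datanodes that advertised this PROVIDED storage
    private final Map<String, List<String>> datanodesByStorageId = new HashMap<>();
    // storageId -> number of lookups so far (drives the round-robin choice)
    private final Map<String, Integer> lookups = new HashMap<>();

    /** Record that a datanode reported the given PROVIDED storage. */
    void advertise(String storageId, String datanode) {
        List<String> dns =
            datanodesByStorageId.computeIfAbsent(storageId, k -> new ArrayList<>());
        if (!dns.contains(datanode)) {
            dns.add(datanode);
        }
    }

    /**
     * Resolve the composite location for a storageId to one concrete datanode.
     * Round-robin is an assumption of this sketch; any selection policy fits.
     */
    String resolve(String storageId) {
        List<String> dns = datanodesByStorageId.get(storageId);
        if (dns == null || dns.isEmpty()) {
            throw new IllegalStateException("No datanode advertises " + storageId);
        }
        int count = lookups.merge(storageId, 1, Integer::sum) - 1;
        return dns.get(Math.floorMod(count, dns.size()));
    }

    public static void main(String[] args) {
        ProvidedStorageMapSketch map = new ProvidedStorageMapSketch();
        map.advertise("s3-a", "dn1");
        map.advertise("s3-a", "dn2");
        map.advertise("wasb-b", "dn3");
        System.out.println(map.resolve("s3-a"));   // dn1
        System.out.println(map.resolve("s3-a"));   // dn2
        System.out.println(map.resolve("wasb-b")); // dn3
    }
}
```

The point of the sketch is only that callers hand over a {{storageId}} and get back a datanode, so the selection logic stays in one place and the caller (the {{BlockManager}} in the real system) would not need to know how many PROVIDED storages exist.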
> Allow HDFS block replicas to be provided by an external storage system
> ----------------------------------------------------------------------
>
>                 Key: HDFS-9806
>                 URL: https://issues.apache.org/jira/browse/HDFS-9806
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Chris Douglas
>         Attachments: HDFS-9806-design.001.pdf
>
>
> In addition to heterogeneous media, many applications work with heterogeneous
> storage systems. The guarantees and semantics provided by these systems are
> often similar, but not identical to those of
> [HDFS|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/index.html].
> Any client accessing multiple storage systems is responsible for reasoning
> about each system independently, and must propagate and renew credentials for
> each store.
> Remote stores could be mounted under HDFS. Block locations could be mapped to
> immutable file regions, opaque IDs, or other tokens that represent a
> consistent view of the data. While correctness for arbitrary operations
> requires careful coordination between stores, in practice we can provide
> workable semantics with weaker guarantees.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)