[ 
https://issues.apache.org/jira/browse/HDFS-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323433#comment-15323433
 ] 

Virajith Jalaparti commented on HDFS-9806:
------------------------------------------

Thanks [~ehiggs] (and [~PieterReuse] and [~Thomas Demoor]) for getting the PoC 
working against S3! It will definitely be interesting to look at the changes 
you had to make. 

bq. It makes sense to us for there to be a series of commands to attach, 
detach, and rescan provided storage from the command line. 

Yes, agreed! Our first-cut solution for this is the same as what you suggested: 
have different NNs manage different provided storages and have a particular NN 
manage the local storage. Using HDFS federation and ViewFs across these NNs can 
enable the mount functionality you suggested. A long-term solution can be to 
add _mount points_ in the NN; this would not only allow operations with a 
single NN but also allow operations under the mount point without holding the 
FSN lock. 
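
As a rough illustration of the mount behavior described above, here is a minimal, self-contained sketch of a client-side mount table that routes a path to the NN owning the longest matching mount point. It does not use the ViewFs or Hadoop APIs; the class {{MountTableSketch}}, its methods, and the NN URIs are all hypothetical names made up for this example.

```java
import java.util.TreeMap;

// Hypothetical sketch only: models attach/detach of mount points and
// longest-prefix resolution. This is NOT the ViewFs implementation.
class MountTableSketch {
    // Mount point (path prefix) -> URI of the NN that manages it.
    private final TreeMap<String, String> mounts = new TreeMap<>();

    // Register a NN as the owner of everything under mountPoint.
    void attach(String mountPoint, String nameNodeUri) {
        mounts.put(normalize(mountPoint), nameNodeUri);
    }

    // Remove a mount point; paths under it fall back to a shorter prefix.
    void detach(String mountPoint) {
        mounts.remove(normalize(mountPoint));
    }

    // Return the NN URI for the longest mount point that is a prefix of path.
    String resolve(String path) {
        String best = null;
        for (String mp : mounts.keySet()) {
            boolean matches = mp.equals("/")
                    || path.equals(mp)
                    || path.startsWith(mp + "/");
            if (matches && (best == null || mp.length() > best.length())) {
                best = mp;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("No mount point for " + path);
        }
        return mounts.get(best);
    }

    // Drop a trailing slash so "/remote/s3/" and "/remote/s3" are the same key.
    private static String normalize(String p) {
        return (p.length() > 1 && p.endsWith("/"))
                ? p.substring(0, p.length() - 1)
                : p;
    }

    public static void main(String[] args) {
        MountTableSketch table = new MountTableSketch();
        table.attach("/", "hdfs://nn-local");          // assumed URI
        table.attach("/remote/s3", "hdfs://nn-s3");    // assumed URI
        System.out.println(table.resolve("/remote/s3/data/part-0"));
        System.out.println(table.resolve("/user/alice"));
    }
}
```

In this sketch each mount point maps to a separate NN, which mirrors the federation approach; the long-term in-NN mount points would need a different structure.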

We will update the document with a roadmap on how the implementation can be 
staged. 

bq. PROVIDED blocks are not stored with the {{INodeFile}}

Are you referring to over-replication of blocks due to read-through caching? If 
so, that is addressed by [~chris.douglas]'s comment above. The PROVIDED blocks 
are treated as any other blocks. {{INodeFile}} will contain references to these 
blocks ({{StorageType}} will be marked as PROVIDED), and will have a 
_composite_ {{DatanodeStorage}} associated with them as their location (as 
mentioned in Section 4.1 in the document). Whenever an attempt is made to get 
the locations of these blocks, the composite is resolved to one of the DNs that 
advertised this storage. 
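
To make the resolution step concrete, here is a small sketch of a composite storage that tracks which DNs have advertised it and picks one of them when block locations are requested. The class name {{CompositeStorageSketch}}, its methods, and the random selection policy are assumptions for illustration; the actual selection policy is whatever the design document and patch define.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch only: one logical PROVIDED storage shared by many DNs.
class CompositeStorageSketch {
    private final String storageId;
    // DNs that have reported (advertised) this PROVIDED storage.
    private final List<String> datanodes = new ArrayList<>();
    private final Random random;

    CompositeStorageSketch(String storageId, long seed) {
        this.storageId = storageId;
        this.random = new Random(seed);
    }

    // Called when a DN reports the storage; duplicates are ignored.
    void report(String datanode) {
        if (!datanodes.contains(datanode)) {
            datanodes.add(datanode);
        }
    }

    // Called when a DN stops serving the storage.
    void remove(String datanode) {
        datanodes.remove(datanode);
    }

    // Resolve the composite to a single DN. Random choice is an assumption
    // made here; a real policy might prefer DNs close to the client.
    String chooseLocation() {
        if (datanodes.isEmpty()) {
            throw new IllegalStateException(
                    "No DN currently advertises storage " + storageId);
        }
        return datanodes.get(random.nextInt(datanodes.size()));
    }
}
```

Because the block's location is the composite rather than a fixed DN, a DN leaving only shrinks the candidate list in this sketch; no per-block state changes.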

bq.  If we want to attach multiple provided storage locations within a single 
NN, ...

When there are multiple PROVIDED storages, different {{storageId}} values can be 
used to distinguish them. The multiplexing/de-multiplexing using the 
{{storageId}} can be handled inside {{ProvidedStorageMap}}, so as to avoid 
extensive changes to the {{BlockManager}}.
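
A minimal sketch of that multiplexing idea follows. Only the name {{ProvidedStorageMap}} comes from the discussion above; the class {{ProvidedStorageMapSketch}}, its fields and methods, and the example URIs used with it are assumptions, not the actual patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: keeps the storageId bookkeeping in one place so
// that callers can ask about a block without knowing which external store
// backs it.
class ProvidedStorageMapSketch {
    // storageId -> description of the external store (an assumed URI string).
    private final Map<String, String> storages = new HashMap<>();
    // blockId -> storageId of the PROVIDED storage that holds the block.
    private final Map<Long, String> blockToStorage = new HashMap<>();

    // Attach a new PROVIDED storage under its own storageId.
    void addStorage(String storageId, String remoteUri) {
        storages.put(storageId, remoteUri);
    }

    // Record that a block lives in the given PROVIDED storage.
    void addBlock(long blockId, String storageId) {
        if (!storages.containsKey(storageId)) {
            throw new IllegalArgumentException("Unknown storageId " + storageId);
        }
        blockToStorage.put(blockId, storageId);
    }

    // De-multiplex: which PROVIDED storage owns this block (null if none)?
    String storageFor(long blockId) {
        return blockToStorage.get(blockId);
    }

    // Convenience lookup of the external store behind a block (null if none).
    String remoteFor(long blockId) {
        String storageId = blockToStorage.get(blockId);
        return storageId == null ? null : storages.get(storageId);
    }
}
```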



> Allow HDFS block replicas to be provided by an external storage system
> ----------------------------------------------------------------------
>
>                 Key: HDFS-9806
>                 URL: https://issues.apache.org/jira/browse/HDFS-9806
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Chris Douglas
>         Attachments: HDFS-9806-design.001.pdf
>
>
> In addition to heterogeneous media, many applications work with heterogeneous 
> storage systems. The guarantees and semantics provided by these systems are 
> often similar, but not identical, to those of 
> [HDFS|https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/index.html].
>  Any client accessing multiple storage systems is responsible for reasoning 
> about each system independently, and must propagate and renew credentials for 
> each store.
> Remote stores could be mounted under HDFS. Block locations could be mapped to 
> immutable file regions, opaque IDs, or other tokens that represent a 
> consistent view of the data. While correctness for arbitrary operations 
> requires careful coordination between stores, in practice we can provide 
> workable semantics with weaker guarantees.



