[ 
https://issues.apache.org/jira/browse/HDFS-4672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627272#comment-13627272
 ] 

Colin Patrick McCabe commented on HDFS-4672:
--------------------------------------------

We can add extended attributes in a way that imposes zero overhead for users 
who don't make use of them, by creating another subclass (or subclasses) of 
INode.  Inherited xattrs (that apply to all descendants) are also a reasonable 
idea.
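
As a rough illustration of the subclass approach, here is a minimal sketch. It
is not actual HDFS code: the INode and INodeWithXAttrs classes and all method
names below are hypothetical stand-ins invented for this example. The point is
that plain inodes hold no attribute map at all, while lookups walk up the
parent chain so that an xattr set on a directory applies to its descendants.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class XAttrSketch {
    // Hypothetical stand-in for an HDFS inode; the real class has many more fields.
    static class INode {
        final String name;
        final INode parent; // null for the root

        INode(String name, INode parent) {
            this.name = name;
            this.parent = parent;
        }

        // Plain inodes have no xattr field, so they pay no per-inode memory cost.
        Map<String, String> getLocalXAttrs() {
            return Collections.emptyMap();
        }

        // Inherited lookup: this inode first, then each ancestor up to the root.
        final String getXAttr(String key) {
            for (INode n = this; n != null; n = n.parent) {
                String value = n.getLocalXAttrs().get(key);
                if (value != null) {
                    return value;
                }
            }
            return null;
        }
    }

    // Only inodes that actually carry xattrs are created as this subclass.
    static class INodeWithXAttrs extends INode {
        private final Map<String, String> xattrs = new HashMap<>();

        INodeWithXAttrs(String name, INode parent) {
            super(name, parent);
        }

        void setXAttr(String key, String value) {
            xattrs.put(key, value);
        }

        @Override
        Map<String, String> getLocalXAttrs() {
            return xattrs;
        }
    }

    public static void main(String[] args) {
        INodeWithXAttrs dir = new INodeWithXAttrs("fast", null);
        dir.setXAttr("user.storage.class", "SSD"); // hypothetical attribute name
        INode file = new INode("data.bin", dir);   // plain inode, no xattr map
        System.out.println(file.getXAttr("user.storage.class"));
    }
}
```

The trade-off in this sketch is that an inode would have to be replaced by an
instance of the subclass the first time an xattr is set on it.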
                
> Support tiered storage policies
> -------------------------------
>
>                 Key: HDFS-4672
>                 URL: https://issues.apache.org/jira/browse/HDFS-4672
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, hdfs-client, libhdfs, namenode
>            Reporter: Andrew Purtell
>
> We would like to be able to create certain files on certain storage device 
> classes (e.g. spinning media, solid state devices, RAM disk, non-volatile 
> memory). HDFS-2832 enables heterogeneous storage at the DataNode, so the 
> NameNode can gain awareness of what different storage options are available 
> in the pool and where they are located, but no API is provided for clients or 
> block placement plugins to perform device-aware block placement. We would 
> like to propose a set of extensions that also have broad applicability to use 
> cases where storage device affinity is important:
>  
> - Add an enum of generic storage device classes, borrowing from current 
> taxonomy of the storage industry
>  
> - Augment DataNode volume metadata in storage reports with this enum
>  
> - Extend the namespace so pluggable block policies can be specified on a 
> directory and storage device class can be tracked in the Inode. Perhaps this 
> could be a larger discussion on adding support for extended attributes in the 
> HDFS namespace. The Inode should track both the storage device class hint and 
> the current actual storage device class. FileStatus should expose this 
> information (or xattrs in general) to clients.
>  
> - Extend the pluggable block policy framework so policies can also consider, 
> and specify, affinity for a particular storage device class
>  
> - Extend the file creation API to accept a storage device class affinity 
> hint. Such a hint can be supplied directly as a parameter, or, if we are 
> considering extended attribute support, then instead as one of a set of 
> xattrs. The hint would be stored in the namespace and also used by the client 
> to indicate to the NameNode/block placement policy/DataNode constraints on 
> block placement. Furthermore, if xattrs or device storage class affinity 
> hints are associated with directories, then the NameNode should provide the 
> storage device affinity hint to the client in the create API response, so the 
> client can provide the appropriate hint to DataNodes when writing new blocks.
>  
> - The list of candidate DataNodes for new blocks supplied by the NameNode to 
> clients should be weighted/sorted by availability of the desired storage 
> device class. 
>  
> - Block replication should consider storage device affinity hints. If a 
> client move()s a file from a location under a path with affinity hint X to 
> under a path with affinity hint Y, then all blocks currently residing on 
> media X should eventually be replicated onto media Y, after which the excess 
> replicas on media X are deleted.
>  
> - Introduce the concept of a degraded path: a path is degraded if a block 
> placement policy is forced to abandon a constraint in order to persist the 
> block (for example, when no space is available on the desired device class) 
> or to maintain the minimum necessary replication factor. This concept is 
> distinct from a corrupt path, where one or more blocks are missing. Paths 
> in degraded state should be periodically reevaluated for re-replication.
>  
> - The FSShell should be extended with commands for changing the storage 
> device class hint for a directory or file. 
>  
> - Clients like DistCP which compare metadata should be extended to be aware 
> of the storage device class hint. For DistCP specifically, there should be an 
> option to ignore the storage device class hints, enabled by default.
>  
> Suggested semantics:
>  
> - The default storage device class should be the null class, or simply the 
> “default class”, for all cases where a hint is not available. This should be 
> configurable. hdfs-default.xml could provide the default as spinning media.
>  
> - A storage device class hint should be provided (and is necessary) only when 
> the default is not sufficient.
>  
> - For backwards compatibility, any FSImage or edit log entry lacking a  
> storage device class hint is interpreted as having affinity for the null 
> class.
>  
> - All blocks for a given file share the same storage device class. If the 
> replication factor for this file is increased, the new replicas should all be 
> placed on the same storage device class.
>  
> - If one or more blocks for a given file cannot be placed on the required 
> device class, then the file is marked as degraded. Files in degraded state 
> should be periodically reevaluated for re-replication. 
>  
> - A directory or file can have only one storage device affinity hint. If the 
> file inode specifies a hint, that hint is used. Otherwise we walk up the path 
> until a hint is found and use that one. If none is found, the default storage 
> class is used.
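
The suggested semantics above can be sketched roughly as follows. This is only
an illustration under stated assumptions: the StorageClass enum members, the
Node class, and the resolve and isDegraded methods are hypothetical names, not
part of any existing HDFS API. The sketch shows hint resolution by walking up
the path, and the rule that a file is degraded when any replica sits on a
device class other than the required one.

```java
import java.util.Arrays;
import java.util.List;

class StorageHintSketch {
    // Hypothetical enum; the proposal asks for one but does not name its members.
    enum StorageClass { DEFAULT, SPINNING, SSD, RAM_DISK, NVM }

    // Hypothetical namespace node: a file or directory with an optional hint.
    static class Node {
        final Node parent;       // null for the root
        final StorageClass hint; // null means no hint is set on this node

        Node(Node parent, StorageClass hint) {
            this.parent = parent;
            this.hint = hint;
        }
    }

    // Use the file's own hint if present; otherwise walk up the path until a
    // hint is found; otherwise fall back to the configured default class.
    static StorageClass resolve(Node file, StorageClass defaultClass) {
        for (Node n = file; n != null; n = n.parent) {
            if (n.hint != null) {
                return n.hint;
            }
        }
        return defaultClass;
    }

    // A file is degraded if any replica is on a class other than the required one.
    static boolean isDegraded(StorageClass required, List<StorageClass> replicaClasses) {
        for (StorageClass actual : replicaClasses) {
            if (actual != required) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node root = new Node(null, null);
        Node dir = new Node(root, StorageClass.SSD);
        Node file = new Node(dir, null); // no hint of its own, inherits from dir

        StorageClass required = resolve(file, StorageClass.DEFAULT);
        System.out.println("required class: " + required);

        // One replica had to fall back to spinning media.
        List<StorageClass> replicas = Arrays.asList(StorageClass.SSD, StorageClass.SPINNING);
        System.out.println("degraded: " + isDegraded(required, replicas));
    }
}
```

A periodic re-replication pass could call something like isDegraded on each
file and retry placement for the ones that return true.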

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
