[ https://issues.apache.org/jira/browse/HDFS-4672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629403#comment-13629403 ]
Konstantin Shvachko commented on HDFS-4672:
-------------------------------------------

I think Extended Attributes are orthogonal to this issue.

"Extended file attributes is a file system feature that enables users to associate computer files with metadata not interpreted by the filesystem"

This implies

> Support tiered storage policies
> -------------------------------
>
>                 Key: HDFS-4672
>                 URL: https://issues.apache.org/jira/browse/HDFS-4672
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, hdfs-client, libhdfs, namenode
>            Reporter: Andrew Purtell
>
> We would like to be able to create certain files on certain storage device
> classes (e.g. spinning media, solid state devices, RAM disk, non-volatile
> memory). HDFS-2832 enables heterogeneous storage at the DataNode, so the
> NameNode can gain awareness of what different storage options are available
> in the pool and where they are located, but no API is provided for clients or
> block placement plugins to perform device aware block placement. We would
> like to propose a set of extensions that also have broad applicability to use
> cases where storage device affinity is important:
>
> - Add an enum of generic storage device classes, borrowing from current
> taxonomy of the storage industry
>
> - Augment DataNode volume metadata in storage reports with this enum
>
> - Extend the namespace so pluggable block policies can be specified on a
> directory and storage device class can be tracked in the Inode. Perhaps this
> could be a larger discussion on adding support for extended attributes in the
> HDFS namespace. The Inode should track both the storage device class hint and
> the current actual storage device class. FileStatus should expose this
> information (or xattrs in general) to clients.
>
> - Extend the pluggable block policy framework so policies can also consider,
> and specify, affinity for a particular storage device class
>
> - Extend the file creation API to accept a storage device class affinity
> hint. Such a hint can be supplied directly as a parameter, or, if we are
> considering extended attribute support, then instead as one of a set of
> xattrs. The hint would be stored in the namespace and also used by the client
> to indicate to the NameNode/block placement policy/DataNode constraints on
> block placement. Furthermore, if xattrs or device storage class affinity
> hints are associated with directories, then the NameNode should provide the
> storage device affinity hint to the client in the create API response, so the
> client can provide the appropriate hint to DataNodes when writing new blocks.
>
> - The list of candidate DataNodes for new blocks supplied by the NameNode to
> clients should be weighted/sorted by availability of the desired storage
> device class.
>
> - Block replication should consider storage device affinity hints. If a
> client move()s a file from a location under a path with affinity hint X to
> under a path with affinity hint Y, then all blocks currently residing on
> media X should be eventually replicated onto media Y with the then excess
> replicas on media X deleted.
>
> - Introduce the concept of degraded path: a path can be degraded if a block
> placement policy is forced to abandon a constraint in order to persist the
> block, when there may not be available space on the desired device class, or
> to maintain the minimum necessary replication factor. This concept is
> distinct from the corrupt path, where one or more blocks are missing. Paths
> in degraded state should be periodically reevaluated for re-replication.
>
> - The FSShell should be extended with commands for changing the storage
> device class hint for a directory or file.
>
> - Clients like DistCp which compare metadata should be extended to be aware
> of the storage device class hint. For DistCp specifically, there should be an
> option to ignore the storage device class hints, enabled by default.
>
> Suggested semantics:
>
> - The default storage device class should be the null class, or simply the
> “default class”, for all cases where a hint is not available. This should be
> configurable. hdfs-default.xml could provide the default as spinning media.
>
> - A storage device class hint should be provided (and is necessary) only when
> the default is not sufficient.
>
> - For backwards compatibility, any FSImage or edit log entry lacking a
> storage device class hint is interpreted as having affinity for the null
> class.
>
> - All blocks for a given file share the same storage device class. If the
> replication factor for this file is increased the replicas should all be
> placed on the same storage device class.
>
> - If one or more blocks for a given file cannot be placed on the required
> device class, then the file is marked as degraded. Files in degraded state
> should be periodically reevaluated for re-replication.
>
> - A directory and path can only have one storage device affinity hint. If the
> file inode specifies a hint, this is used, otherwise we walk up the path
> until a hint is found and use that one, otherwise the default storage class
> is used.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
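
The hint lookup rule in the suggested semantics of the quoted proposal (use the file inode's own hint, otherwise walk up the path until a hint is found, otherwise fall back to the default class) could be sketched roughly as follows. This is only an illustration under stated assumptions: `StorageHintSketch`, its `StorageClass` enum members, and the in-memory path map are all hypothetical and are not HDFS APIs; the proposal itself does not fix the enum values or where hints are stored.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: none of these names exist in HDFS.
final class StorageHintSketch {
    // Illustrative device classes; the proposal leaves the exact taxonomy open.
    enum StorageClass { SPINNING, SSD, RAM_DISK, NVM }

    // Path -> explicit hint. Stands in for per-inode metadata in the namespace.
    private final Map<String, StorageClass> hints = new HashMap<>();

    // Class used when no hint is found anywhere on the path (assumed configurable).
    private final StorageClass defaultClass;

    StorageHintSketch(StorageClass defaultClass) {
        this.defaultClass = defaultClass;
    }

    void setHint(String path, StorageClass hint) {
        hints.put(path, hint);
    }

    // Use the path's own hint if present, otherwise walk up toward the root,
    // otherwise return the default class.
    StorageClass resolve(String path) {
        String current = path;
        while (true) {
            StorageClass hint = hints.get(current);
            if (hint != null) {
                return hint;
            }
            if (current.equals("/")) {
                return defaultClass;
            }
            int slash = current.lastIndexOf('/');
            // Step to the parent; anything directly under the root maps to "/".
            current = (slash <= 0) ? "/" : current.substring(0, slash);
        }
    }

    public static void main(String[] args) {
        StorageHintSketch ns = new StorageHintSketch(StorageClass.SPINNING);
        ns.setHint("/hbase", StorageClass.SSD);
        ns.setHint("/hbase/wal/log1", StorageClass.NVM);
        System.out.println(ns.resolve("/hbase/wal/log1")); // file's own hint
        System.out.println(ns.resolve("/hbase/data/t1"));  // inherited from /hbase
        System.out.println(ns.resolve("/tmp/x"));          // no hint on the path
    }
}
```

Resolving on every lookup, rather than copying the hint into each child, keeps a single hint per directory authoritative, which is one way to read the "only one storage device affinity hint" rule; a real implementation might cache or persist the resolved value instead.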