[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750715#comment-13750715 ]

Arpit Agarwal commented on HDFS-2832:
-------------------------------------

Hi Andrew,

Thanks for looking at the doc; great questions.

{quote}
- For quota management, have you considered the YARN-like abstraction of users 
and pools? We're moving down that avenue in HDFS-4949, and it'd be nice to 
eventually have a single abstraction if we can. I get that for a first cut, 
it's easier to stick with the existing disk quota system.
{quote}
I am interested in seeing how the pools abstraction is defined and whether it 
can cover all our use cases. Do you have a design doc? We chose this approach 
because it extends the existing quota system and APIs and more importantly 
covers our use cases.

{quote}
- How do you expect applications to handle runtime failures? If I have a stream 
open and my write fails due to lack of SSD quota, can I change it to retry the 
write to HDD? Do I get metrics so I can alert somewhere?
{quote}
"Out of quota" is a hard failure just like hitting the disk space quota limit. 
The application must change the Storage Preference on the file to continue. We 
have not discussed metrics yet.
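To illustrate the pattern described above, here is a minimal, self-contained Java sketch of how an application might react to the hard failure: catch it, change the file's Storage Preference, and retry the write. All names here (QuotaExceededException, FakeFile, writeBlock, storagePreference) are illustrative stand-ins, not actual HDFS APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotaFallbackSketch {
    // Stand-in for the hard "out of quota" failure described above.
    static class QuotaExceededException extends Exception {}

    // Minimal stand-in for a file whose Storage Preference can be changed.
    static class FakeFile {
        String storagePreference = "SSD";
        long ssdQuotaRemaining = 1; // blocks of SSD quota left
        List<String> blocks = new ArrayList<>();

        void writeBlock() throws QuotaExceededException {
            if (storagePreference.equals("SSD")) {
                if (ssdQuotaRemaining == 0) throw new QuotaExceededException();
                ssdQuotaRemaining--;
            }
            blocks.add(storagePreference); // record where the block landed
        }
    }

    // The pattern the comment describes: on a quota failure the application
    // must change the file's Storage Preference before it can continue.
    static void writeWithFallback(FakeFile f) {
        try {
            f.writeBlock();
        } catch (QuotaExceededException e) {
            f.storagePreference = "DISK"; // fall back to HDD
            try {
                f.writeBlock();
            } catch (QuotaExceededException ignored) {
                // DISK writes are not quota-limited in this sketch
            }
        }
    }

    public static void main(String[] args) {
        FakeFile f = new FakeFile();
        writeWithFallback(f); // first block fits within SSD quota
        writeWithFallback(f); // quota exhausted -> retried on DISK
        System.out.println(f.blocks); // [SSD, DISK]
    }
}
```

The key point is that the fallback decision is the application's, not HDFS's: the write fails hard, and the retry only succeeds after the preference is changed.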

{quote}
- How do you handle block migration of files opened by long-lived applications 
like HBase that also use short-circuit local reads? Let's say HBase initially 
writes all its files to SSD, then we want to periodically migrate them to HDD. 
HBase holds onto the SSD file descriptors indefinitely, preventing reclamation 
of SSD capacity.
{quote}
The quota should remain held indefinitely until the files can be moved off 
their current Storage Type. We did not cover this use case, so thanks for 
calling it out! I will update the doc.

{quote}
- If "File Storage Preferences" are part of file metadata, what happens when 
the files are copied, or distcp'd to another cluster?
{quote}
I think this is a tools decision. We probably want to drop the File Attributes 
by default, with an option to preserve them. We will document this in more 
detail when we get to updating the tools.

{quote}
- Why do we want a default "Storage Preferences" specified on a directory? I'd 
actually prefer if we make applications explicitly request special treatment 
when they open a stream.
{quote}
Storage Preferences are not supported on directories. Please let me know if you 
see anything in the doc implying otherwise and I will fix it.

{quote}
- Let's say I'm a cluster operator, and have nodes with both PCI-e and SATA 
SSDs. Can I differentiate between them? How about if I add nodes with an 
unknown StorageType like NVRAM? Basically: what's required to add a new 
StorageType?
- Also related, when I bring up a new StorageType in my cluster, how do I make 
my applications start using it? Do I need to submit patches to HBase to now 
know how to use NVRAM properly? This seems like one of the downsides of 
physical storage types, logical means apps can do this more automatically.
{quote}
Adding a new StorageType requires a code change: an update to the StorageType 
enum. We made this trade-off in favor of API and implementation simplicity for 
v1, but we are not ruling out adding support for logical classification in the 
future.
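As a sketch of the kind of change involved: adding a new physical type like NVRAM means adding one constant to the StorageType enum, plus updating any code paths that switch on the type. The enum below is a simplified illustration, not the real Hadoop StorageType class.

```java
// Illustrative only: a trimmed-down StorageType-style enum showing that a new
// physical type is one new constant plus whatever code must recognize it.
public enum StorageType {
    DISK,
    SSD,
    NVRAM; // hypothetical new physical type

    // Parse a type name from configuration, case-insensitively.
    public static StorageType parse(String s) {
        return StorageType.valueOf(s.toUpperCase());
    }
}
```

This is the simplicity/flexibility trade-off mentioned above: physical types keep the enum and APIs small, at the cost of a code change (and application awareness) for each new hardware class.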
                
> Enable support for heterogeneous storages in HDFS
> -------------------------------------------------
>
>                 Key: HDFS-2832
>                 URL: https://issues.apache.org/jira/browse/HDFS-2832
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>    Affects Versions: 0.24.0
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>         Attachments: 20130813-HeterogeneousStorage.pdf
>
>
> HDFS currently supports configuration where storages are a list of 
> directories. Typically each of these directories correspond to a volume with 
> its own file system. All these directories are homogeneous and therefore 
> identified as a single storage at the namenode. I propose changing the 
> current model, where a Datanode * is a * storage, to one where a Datanode 
> * is a collection of * storages.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
