[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843549#comment-13843549 ]
Arpit Agarwal commented on HDFS-2832:
-------------------------------------

{quote}
I bring them up again because, 4 months later, I was wondering if you had any thoughts on potential solutions that could be added to the doc. It's fine if automatic migration, open files, more elaborate resource management, and additional storage types are all not in immediate scope, but I assume we'll want them in the future.
{quote}

Andrew, automatic migration is not in scope for our design. Regarding open files, can you describe a specific use case you think we should be handling that we have not described? Maybe that will help me understand your concern better. If you are concerned about reclaiming capacity from in-use blocks, that is analogous to asking "If a process keeps a long-lived handle to a file, what will the operating system do to reclaim the disk space used by the file?" and the answer is the same: nothing. I don't want anyone reading your comments to get the false impression that the feature is incompatible with SCR.

{quote}
Well, CDH supports rolling upgrades in some situations. ATM is working on metadata upgrade with HA enabled (HDFS-5138) and I've seen some recent JIRAs related to rolling upgrade (HDFS-5535), so it seems like a reasonable question. At least at the protobuf level, everything so far looks compatible, so I thought it might work as long as the handler code is compatible too.
{quote}

I am not familiar with how CDH does rolling upgrades, so I cannot tell you whether it will work. You recently bumped the layout version for caching, so you might recall that HDFS layout version checks prevent a DN from registering with an NN that has a mismatched version. To my knowledge, HDFS-5535 will not fix this limitation either. That said, we have retained wire-protocol compatibility.

{quote}
Do you foresee heartbeats and block reports always being combined in realistic scenarios? Or are there reasons to split them? Is there any additional overhead from splitting? Can we save any complexity by not supporting split reports? I see this on the test matrix.
{quote}

I thought I answered this; maybe if you describe your concerns I can give you a better answer. When the test plan says 'split', I meant splitting the reports across multiple requests. Reports will always be split by storage, but we are not splitting them across multiple messages for now. What kind of overhead are you thinking of?

{quote}
b1. Have you given any thought to metrics and tooling to help users and admins debug their quota usage and issues with migrating files to certain storage types?
{quote}

We'll include it in the next design rev as we start phase 2.

> Enable support for heterogeneous storages in HDFS
> -------------------------------------------------
>
>                 Key: HDFS-2832
>                 URL: https://issues.apache.org/jira/browse/HDFS-2832
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>    Affects Versions: 0.24.0
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>         Attachments: 20130813-HeterogeneousStorage.pdf,
> 20131125-HeterogeneousStorage-TestPlan.pdf,
> 20131125-HeterogeneousStorage.pdf,
> 20131202-HeterogeneousStorage-TestPlan.pdf,
> 20131203-HeterogeneousStorage-TestPlan.pdf, H2832_20131107.patch,
> editsStored, h2832_20131023.patch, h2832_20131023b.patch,
> h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch,
> h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch,
> h2832_20131105.patch, h2832_20131107b.patch, h2832_20131108.patch,
> h2832_20131110.patch, h2832_20131110b.patch, h2832_20131111.patch,
> h2832_20131112.patch, h2832_20131112b.patch, h2832_20131114.patch,
> h2832_20131118.patch, h2832_20131119.patch, h2832_20131119b.patch,
> h2832_20131121.patch, h2832_20131122.patch, h2832_20131122b.patch,
> h2832_20131123.patch, h2832_20131124.patch, h2832_20131202.patch,
> h2832_20131203.patch
>
> HDFS currently supports a configuration where storages are a list of directories.
> Typically each of these directories corresponds to a volume with its own file system. All these directories are homogeneous and are therefore identified as a single storage at the namenode. I propose changing the current model, where a Datanode *is a* storage, to one where a Datanode *is a collection of* storages.

-- This message was sent by Atlassian JIRA (v6.1.4#6159)
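For readers following the thread, the model change proposed in the issue description (a Datanode *is a collection of* storages) and the comment's point that block reports are "split by storage but not across messages" can be sketched roughly as below. This is a minimal illustration only; the class and field names are hypothetical and do not reflect the actual HDFS implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a Datanode modeled as a collection of storages,
// with one block report grouped per storage but delivered as a single
// combined message, as described in the comment above.
public class StorageReportSketch {

    // One attached storage (e.g. a disk volume) with its own ID and blocks.
    static class Storage {
        final String storageId;
        final List<Long> blockIds = new ArrayList<>();
        Storage(String storageId) { this.storageId = storageId; }
    }

    // The Datanode holds many storages rather than being a single storage.
    static class Datanode {
        final Map<String, Storage> storages = new HashMap<>();
        void addStorage(Storage s) { storages.put(s.storageId, s); }

        // Build one combined report: split by storage, but returned as a
        // single structure (not split across multiple requests).
        Map<String, List<Long>> buildBlockReport() {
            Map<String, List<Long>> report = new HashMap<>();
            for (Storage s : storages.values()) {
                report.put(s.storageId, new ArrayList<>(s.blockIds));
            }
            return report;
        }
    }

    public static void main(String[] args) {
        Datanode dn = new Datanode();
        Storage disk = new Storage("DS-disk-0001");
        disk.blockIds.add(1001L);
        Storage ssd = new Storage("DS-ssd-0002");
        ssd.blockIds.add(2002L);
        dn.addStorage(disk);
        dn.addStorage(ssd);

        Map<String, List<Long>> report = dn.buildBlockReport();
        System.out.println(report.size());              // 2 storages
        System.out.println(report.get("DS-ssd-0002"));  // [2002]
    }
}
```

Under this sketch, the namenode would see per-storage block lists for each Datanode instead of one undifferentiated list, which is what enables per-storage-type decisions in later phases.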