[ https://issues.apache.org/jira/browse/HDFS-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113829#comment-16113829 ]
SammiChen commented on HDFS-11082:
----------------------------------

Thanks [~andrew.wang] for the quick review! I just realized that the documentation has not been updated; I will update it later.

{quote}
Also need to think about the behavior of getErasureCodingPolicy. Right now it returns "null" to mean replication. With this patch, a user would have to check both for "null" and "replication-1-2-64K" to know if it's replicated. It'd be good to choose one or the other to make it simpler for downstreams. "null" would be more compatible, and it'd hide the special replicated EC policy from non-admin users which I like.
{quote}

Currently, the replication policy can only be set on a directory, not on a file, because in the current file header format the replication factor and the EC policy ID share the same bits. So a file can be either traditionally replicated or effectively erasure coded; it cannot carry the replication EC policy.

For getErasureCodingPolicy on a directory, returning "null" and returning "replication-1-2-64k" both have pros and cons. If it returns "null" for the replication EC policy:

Pros:
1. It is easy for downstream applications to check whether the directory is effectively EC or replicated.

Cons:
1. After the replication EC policy is set on a directory, it cannot be read back, so from the user's point of view there is no way to unset the policy or to be aware of it. The user cannot distinguish a traditional replication directory from a directory with the replication EC policy.

If it returns "replication-1-2-64k", the pros and cons are reversed. So it is a style choice: one option gives all the information to the user and lets them decide; the other handles it internally on the user's behalf. I lean toward giving all the information to the user, but I'm OK with the "null" solution if it will clearly bring more benefit to users. I think you have more experience with this. You make the call.

{quote}
This is not directly related (and I think we discussed this a bit on another JIRA) but I'm not happy with our getECPolicy API right now. Right now it returns the effective EC policy.
Without being able to query the actual EC policy, the behavior when setting/unsetting is kind of tricky. Should we add an "getActualECPolicy" API? Can be a follow-on JIRA.
{quote}

Do you mean {{getErasureCodingPolicy}} when you say {{getECPolicy}}? I have forgotten where we discussed this issue. Can you give me more hints?

The suggestions in all the other comments will be addressed in the next patch.

> Erasure Coding : Provide replicated EC policy to just replicating the files
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-11082
>                 URL: https://issues.apache.org/jira/browse/HDFS-11082
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>            Reporter: Rakesh R
>            Assignee: SammiChen
>            Priority: Critical
>              Labels: hdfs-ec-3.0-must-do
>         Attachments: HDFS-11082.001.patch
>
>
> The idea of this jira is to provide a new {{replicated EC policy}} so that we
> can override the EC policy on a parent directory and go back to just
> replicating the files based on replication factors.
> Thanks [~andrew.wang] for the
> [discussions|https://issues.apache.org/jira/browse/HDFS-11072?focusedCommentId=15620743&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15620743].