[ 
https://issues.apache.org/jira/browse/HDFS-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113829#comment-16113829
 ] 

SammiChen commented on HDFS-11082:
----------------------------------

Thanks [~andrew.wang] for the quick review! I just realized that the 
documentation has not been updated; I will update it later. 
{quote}
Also need to think about the behavior of getErasureCodingPolicy. Right now it 
returns "null" to mean replication. With this patch, a user would have to check 
both for "null" and "replication-1-2-64K" to know if it's replicated. It'd be 
good to choose one or the other to make it simpler for downstreams. "null" 
would be more compatible, and it'd hide the special replicated EC policy from 
non-admin users which I like.
{quote}
Currently, the replication policy can only be set on a directory, not on a 
file, because in the current file header format the replication factor and the 
EC policy ID share the same bits. So a file is either traditionally replicated 
or effectively erasure coded; it cannot carry the replication EC policy. 
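
The bit sharing described above can be pictured with a small sketch. This is 
not the actual HDFS header code: the class name, the field widths and the flag 
position are all assumptions chosen for illustration. It only shows why one 
field cannot hold a replication factor and an EC policy ID at the same time.

```java
// Simplified illustration only; names and bit widths are assumptions,
// not the real HDFS file header layout.
final class HeaderSketch {
    // Assumed: 11 value bits that hold EITHER a replication factor
    // OR an EC policy ID, plus one flag bit that says which one it is.
    static final int VALUE_BITS = 11;
    static final int VALUE_MASK = (1 << VALUE_BITS) - 1;
    static final int STRIPED_FLAG = 1 << VALUE_BITS;

    // Encode a traditionally replicated file: flag bit stays clear.
    static int replicated(int replicationFactor) {
        checkRange(replicationFactor);
        return replicationFactor;
    }

    // Encode an erasure-coded file: flag bit set, value bits hold the policy ID.
    static int striped(int ecPolicyId) {
        checkRange(ecPolicyId);
        return STRIPED_FLAG | ecPolicyId;
    }

    static boolean isStriped(int header) {
        return (header & STRIPED_FLAG) != 0;
    }

    // Returns the shared value bits; their meaning depends on isStriped().
    static int value(int header) {
        return header & VALUE_MASK;
    }

    private static void checkRange(int v) {
        if (v < 0 || v > VALUE_MASK) {
            throw new IllegalArgumentException("value does not fit: " + v);
        }
    }

    public static void main(String[] args) {
        int rep = replicated(3);
        int ec = striped(1);
        System.out.println("striped=" + isStriped(rep) + " value=" + value(rep));
        System.out.println("striped=" + isStriped(ec) + " value=" + value(ec));
    }
}
```

With a layout like this, a file that stores an EC policy ID has no room left 
for a replication factor, which is why the replication EC policy is limited to 
directories here.
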
For getErasureCodingPolicy on a directory, returning "null" and returning 
"replication-1-2-64k" both have pros and cons. If we return "null" for the 
replication EC policy:
Pros: 1. It's easy for downstream applications to check whether a path is 
effectively EC or replicated.
Cons: 1. After the replication EC policy is set on a directory, it cannot be 
read back, so from the user's point of view there is no way to unset the policy 
or even be aware of it. The user cannot distinguish a traditional replication 
directory from a replication EC policy directory. 
If we return "replication-1-2-64k", the pros and cons are reversed. So it's a 
style choice: one option gives all the information to the user and lets them 
decide; the other handles it internally on the user's behalf. 
I lean toward giving all the information to the user, but I'm OK with the 
"null" solution if it is sure to bring more benefit to users. I think you have 
more experience with this, so you make the call. 
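
To make the downstream impact concrete, here is a minimal sketch of the check 
an application would need under each option. The class and method names are 
hypothetical, and the policy is passed as a plain name string (or null) rather 
than through the real HDFS client types; "RS-6-3-64k" is used only as an 
example of a non-replication policy name.

```java
// Hypothetical helper, not part of HDFS. A policy is represented here by
// its name, with null meaning "no policy returned".
final class PolicyCheckSketch {
    static final String REPLICATION_POLICY_NAME = "replication-1-2-64k";

    // Option A: the API returns null for replication, so one check is enough.
    static boolean isReplicatedWhenNullMeansReplication(String policyName) {
        return policyName == null;
    }

    // Option B: the API may also return the special replication policy,
    // so callers have to handle both forms.
    static boolean isReplicatedWhenPolicyIsExposed(String policyName) {
        return policyName == null
            || REPLICATION_POLICY_NAME.equalsIgnoreCase(policyName);
    }

    public static void main(String[] args) {
        String[] samples = {null, "replication-1-2-64k", "RS-6-3-64k"};
        for (String name : samples) {
            System.out.println(name
                + " -> A:" + isReplicatedWhenNullMeansReplication(name)
                + " B:" + isReplicatedWhenPolicyIsExposed(name));
        }
    }
}
```

Under option A the directory-level information is hidden from the caller; under 
option B the caller sees it but has to remember the second comparison.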

{quote}
This is not directly related (and I think we discussed this a bit on another 
JIRA) but I'm not happy with our getECPolicy API right now. Right now it 
returns the effective EC policy. Without being able to query the actual EC 
policy, the behavior when setting/unsetting is kind of tricky. Should we add an 
"getActualECPolicy" API? Can be a follow-on JIRA.
{quote}
Are you referring to {{getErasureCodingPolicy}} when you say {{getECPolicy}}? 
I've forgotten where we discussed this issue. Can you give more hints? 
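
For reference, the difference between the effective policy and the actual 
policy in the quoted comment can be sketched as below. This is only a model 
under my own assumptions: policies are kept in an in-memory map keyed by 
normalized absolute paths, and the class and method names are hypothetical, not 
existing HDFS APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model; not HDFS code.
final class EcPolicyLookupSketch {
    // Policies that were set directly on a path.
    private final Map<String, String> directlySet = new HashMap<>();

    void setPolicy(String path, String policyName) {
        directlySet.put(path, policyName);
    }

    // "Actual" policy: only what was set on this exact path, else null.
    String getActualPolicy(String path) {
        return directlySet.get(path);
    }

    // "Effective" policy: the nearest policy found on the path or its ancestors.
    String getEffectivePolicy(String path) {
        for (String p = path; p != null; p = parent(p)) {
            String policy = directlySet.get(p);
            if (policy != null) {
                return policy;
            }
        }
        return null;
    }

    // Parent of a normalized absolute path such as "/a/b"; null for the root.
    static String parent(String path) {
        if (path.equals("/")) {
            return null;
        }
        int idx = path.lastIndexOf('/');
        return idx == 0 ? "/" : path.substring(0, idx);
    }

    public static void main(String[] args) {
        EcPolicyLookupSketch fs = new EcPolicyLookupSketch();
        fs.setPolicy("/ec", "RS-6-3-64k");
        System.out.println("actual:    " + fs.getActualPolicy("/ec/sub"));
        System.out.println("effective: " + fs.getEffectivePolicy("/ec/sub"));
    }
}
```

In this model a child directory with no policy of its own reports the parent's 
policy as effective, so a caller cannot tell from the effective value alone 
whether unsetting the policy on the child would change anything.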

The suggestions in all the other comments will be addressed in the next patch. 


> Erasure Coding : Provide replicated EC policy to just replicating the files
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-11082
>                 URL: https://issues.apache.org/jira/browse/HDFS-11082
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>            Reporter: Rakesh R
>            Assignee: SammiChen
>            Priority: Critical
>              Labels: hdfs-ec-3.0-must-do
>         Attachments: HDFS-11082.001.patch
>
>
> The idea of this jira is to provide a new {{replicated EC policy}} so that we 
> can override the EC policy on a parent directory and go back to just 
> replicating the files based on replication factors.
> Thanks [~andrew.wang] for the 
> [discussions|https://issues.apache.org/jira/browse/HDFS-11072?focusedCommentId=15620743&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15620743].


