[ 
https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658902#comment-14658902
 ] 

Andrew Wang commented on HDFS-8833:
-----------------------------------

I think EC schemas differ from storage policies in that the useful configuration 
space is much smaller. I'm basing this on how other storage systems with EC 
support only a single schema (or very few). QFS, I believe, hardcodes 6,3 with a 
64KB cell size. Similarly, f4 does just 10,4. Even if we were to provide all 
the EC schemas implemented by these other storage systems, I bet we'd be well 
within 64 (and probably even well within 16). I think 64 would give us plenty 
of room even when adding new codecs like Hitchhiker or LRC.

For directory rename, is the idea that we set the EC policy on the directory? 
This feels a little weird (and different from storage policies), since it means 
renaming a subdir implicitly calls createECZone (or whatever) on the subdir, i.e.:

{noformat}
/ec1 <-- ec policy 1
/ec2 <-- ec policy 2

rename /ec2/subdir to /ec1/subdir
create new file /ec1/subdir/foo
/ec1/subdir/foo will get policy 2, not policy 1
{noformat}
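
To make the scenario above concrete, here is a toy model of it. This is only a sketch under assumptions: the {{Dir}} class and the {{rename_pinning_policy}} and {{create_file}} helpers are hypothetical names invented for illustration, not HDFS code, and the policies are reduced to plain integers.

```python
class Dir:
    """Toy directory: ec_policy=None means 'inherit from the nearest ancestor'."""

    def __init__(self, name, parent=None, ec_policy=None):
        self.name = name
        self.parent = parent
        self.ec_policy = ec_policy
        self.children = {}
        if parent is not None:
            parent.children[name] = self

    def effective_policy(self):
        # Walk up the tree until some directory carries an explicit policy.
        node = self
        while node is not None:
            if node.ec_policy is not None:
                return node.ec_policy
            node = node.parent
        return None


def rename_pinning_policy(src, new_parent):
    """Hypothetical rename: pin the old effective policy on the moved directory."""
    pinned = src.effective_policy()
    del src.parent.children[src.name]
    src.parent = new_parent
    new_parent.children[src.name] = src
    src.ec_policy = pinned  # the implicit "createECZone or whatever" step


def create_file(parent, name):
    # A new file takes whatever policy its parent directory resolves to.
    return {"name": name, "ec_policy": parent.effective_policy()}


root = Dir("/")
ec1 = Dir("ec1", root, ec_policy=1)
ec2 = Dir("ec2", root, ec_policy=2)
subdir = Dir("subdir", ec2)

rename_pinning_policy(subdir, ec1)   # rename /ec2/subdir to /ec1/subdir
foo = create_file(subdir, "foo")     # create new file /ec1/subdir/foo
print(foo["ec_policy"])              # policy 2, not policy 1
```

In this model the surprise comes entirely from the pinning step in the rename: without it, {{/ec1/subdir/foo}} would resolve to policy 1 through {{/ec1}}.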

We could maybe solve this by having the directory rename specify the "schema of 
contained files" but not the schema of newly created files... but then we need 
to attach schema info to new files, since they differ from the directory's 
info. That's more complicated for sure, and I haven't thought through all the 
cases.
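
A rough sketch of that alternative, to show where the extra per-file state comes from. Everything here is an assumption made for illustration: the field names {{contained_schema}}, {{new_file_policy}} and {{own_schema}}, the helper functions, and the resolution order are all hypothetical, and none of it reflects an actual HDFS design.

```python
class Dir:
    def __init__(self, name, parent=None, new_file_policy=None):
        self.name = name
        self.parent = parent
        self.new_file_policy = new_file_policy  # applies to files created later
        self.contained_schema = None            # describes files already inside


class File:
    def __init__(self, name, parent, own_schema=None):
        self.name = name
        self.parent = parent
        self.own_schema = own_schema            # only set when it must differ

    def effective_schema(self):
        if self.own_schema is not None:
            return self.own_schema
        return schema_from_dirs(self.parent)


def ancestors(d):
    while d is not None:
        yield d
        d = d.parent


def policy_for_new_files(d):
    # New files ignore contained_schema and look only at new_file_policy.
    for a in ancestors(d):
        if a.new_file_policy is not None:
            return a.new_file_policy
    return None


def schema_from_dirs(d):
    # Files without their own schema resolve through directory info.
    for a in ancestors(d):
        if a.contained_schema is not None:
            return a.contained_schema
        if a.new_file_policy is not None:
            return a.new_file_policy
    return None


def rename(d, new_parent):
    # Record the schema of the files being moved, but set no policy for new files.
    d.contained_schema = schema_from_dirs(d)
    d.parent = new_parent


def create_file(parent, name):
    wanted = policy_for_new_files(parent)
    inherited = schema_from_dirs(parent)
    # The extra cost: a new file needs its own schema when the two disagree.
    return File(name, parent, wanted if wanted != inherited else None)


root = Dir("/")
ec1 = Dir("ec1", root, new_file_policy=1)
ec2 = Dir("ec2", root, new_file_policy=2)
subdir = Dir("subdir", ec2)
old = File("old", subdir)

rename(subdir, ec1)
foo = create_file(subdir, "foo")
print(old.effective_schema(), foo.effective_schema(), foo.own_schema)
```

Under these assumptions the existing file keeps schema 2 and the new file gets schema 1, but only because the new file carries its own schema field. Cases such as nested renames or later policy changes are not covered by this sketch.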

> Erasure coding: store EC schema and cell size in INodeFile and eliminate 
> notion of EC zones
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8833
>                 URL: https://issues.apache.org/jira/browse/HDFS-8833
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: HDFS-7285
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>
> We have [discussed | 
> https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754]
>  storing EC schema with files instead of EC zones and recently revisited the 
> discussion under HDFS-8059.
> As a recap, the _zone_ concept has severe limitations including renaming and 
> nested configuration. Those limitations are valid in encryption for security 
> reasons and it doesn't make sense to carry them over in EC.
> This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For 
> simplicity, we should first implement it as an xattr and consider memory 
> optimizations (such as moving it to file header) as a follow-on. We should 
> also disable changing EC policy on a non-empty file / dir in the first phase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
