[ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14658902#comment-14658902 ]
Andrew Wang commented on HDFS-8833:
-----------------------------------

I think EC schema differs from storage policies since the useful configuration space is much smaller. This is based on how other storage systems with EC support only a single schema (or very few). QFS, I believe, hardcodes 6,3 with a 64KB cell size. Similarly, f4 does just 10,4. Even if we were to provide all the EC schemas implemented by these other storage systems, I bet we'd be well within 64 (and probably even well within 16). I think 64 would give us plenty of room even when adding new codecs like Hitchhiker or LRC.

For directory rename, the idea is we set the EC policy on the directory? This feels a little weird (and different from storage policies) since it means renaming a subdir implicitly calls createECZone or whatever on the subdir, i.e.:

{noformat}
/ec1 <-- ec policy 1
/ec2 <-- ec policy 2
rename /ec2/subdir to /ec1/subdir
create new file /ec1/subdir/foo
/ec1/subdir/foo will get policy 2, not policy 1
{noformat}

We could maybe solve this by having the directory rename specify the "schema of contained files" but not the schema of newly created files... but then we need to attach schema info to new files, since they differ from the directory's info. More complicated for sure; I haven't thought through all the cases.
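To make the rename concern concrete, here is a minimal sketch of the "pin the policy on the renamed directory" behavior. It is a toy in-memory model, not HDFS code: the class EcRenameSketch, its methods, and the integer policy ids are all hypothetical, and paths are assumed to be absolute strings.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy namespace model; every name here is hypothetical, not an HDFS class. */
final class EcRenameSketch {
    // Directory path -> EC policy id explicitly set on that directory.
    private final Map<String, Integer> dirPolicy = new HashMap<>();

    public void setPolicy(String dir, int policyId) {
        dirPolicy.put(dir, policyId);
    }

    /** Policy of the path itself or its nearest ancestor; -1 if none is set. */
    public int effectivePolicy(String path) {
        String p = path;
        while (!p.isEmpty()) {
            Integer id = dirPolicy.get(p);
            if (id != null) {
                return id;
            }
            int slash = p.lastIndexOf('/');
            if (slash < 0) {
                break; // not an absolute path; stop walking
            }
            p = p.substring(0, slash); // step up to the parent
        }
        return -1;
    }

    /**
     * Rename that pins the source's effective policy on the moved directory,
     * which is the implicit "createECZone on the subdir" described above.
     * Simplification: only a policy set on src itself is moved.
     */
    public void renamePinningPolicy(String src, String dst) {
        int pinned = effectivePolicy(src);
        dirPolicy.remove(src);
        if (pinned >= 0) {
            dirPolicy.put(dst, pinned);
        }
    }

    public static void main(String[] args) {
        EcRenameSketch ns = new EcRenameSketch();
        ns.setPolicy("/ec1", 1);
        ns.setPolicy("/ec2", 2);
        ns.renamePinningPolicy("/ec2/subdir", "/ec1/subdir");
        // A file created under the renamed directory resolves to policy 2.
        System.out.println(ns.effectivePolicy("/ec1/subdir/foo"));
    }
}
```

With this behavior, a new file under /ec1/subdir resolves to policy 2 while a sibling such as /ec1/other still resolves to policy 1, which is the surprising result in the example.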
> Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8833
>                 URL: https://issues.apache.org/jira/browse/HDFS-8833
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>   Affects Versions: HDFS-7285
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>
> We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754]
> storing EC schema with files instead of EC zones and recently revisited the
> discussion under HDFS-8059.
>
> As a recap, the _zone_ concept has severe limitations including renaming and
> nested configuration. Those limitations are valid in encryption for security
> reasons and it doesn't make sense to carry them over in EC.
>
> This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For
> simplicity, we should first implement it as an xattr and consider memory
> optimizations (such as moving it to file header) as a follow-on. We should
> also disable changing EC policy on a non-empty file / dir in the first phase.
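As a rough illustration of the "move it to the file header" follow-on, the sketch below packs a policy id of at most 64 values into 6 bits of a 64-bit header word. The field position (POLICY_SHIFT) and the class EcHeaderSketch are assumptions made for illustration; the actual INodeFile header layout is not described in this thread.

```java
/** Hypothetical 6-bit EC policy field inside a 64-bit header word. */
final class EcHeaderSketch {
    static final int POLICY_BITS = 6;    // 2^6 = 64 distinct policy ids
    static final int POLICY_SHIFT = 48;  // assumed offset, chosen for illustration
    static final long POLICY_MASK = ((1L << POLICY_BITS) - 1) << POLICY_SHIFT;

    /** Returns a copy of header with the policy field set to policyId. */
    public static long withPolicy(long header, int policyId) {
        if (policyId < 0 || policyId >= (1 << POLICY_BITS)) {
            throw new IllegalArgumentException("policy id out of range: " + policyId);
        }
        // Clear the old field, then OR in the new value at its offset.
        return (header & ~POLICY_MASK) | ((long) policyId << POLICY_SHIFT);
    }

    /** Extracts the policy id stored in header. */
    public static int policyOf(long header) {
        return (int) ((header & POLICY_MASK) >>> POLICY_SHIFT);
    }
}
```

A fixed-width field like this costs no extra memory per file, at the price of capping the number of schemas, which is why the bound of 64 matters.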