[ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713580#comment-14713580 ]
Walter Su commented on HDFS-8833:
---------------------------------

bq. Rename isStriped to erasureCodingPolicy in INodeFile header.
Please don't do that.

bq. (Zhe Zhang) One hybrid approach we could pursue is providing a set of hardcoded policies and then allow additional user-configured policies as an XAttr. Users that stick to the hardcoded policies will have improved memory usage. We can also dedupe xattr values in addition to names as an additional optimization. Since IIUC we only want to provide hardcoded policies in phase 1, this hybrid approach can come later.

Previously I thought we could use 4~6 bits in the header to keep hardcoded policies, and you mentioned keeping user-configured policies in XAttr. I agree that we should keep policies in XAttr. It was ridiculous of me to think that "using reserved space" equals "no memory overhead". The reserved space could be put to better use than just storing the ECPolicy. Wasting 4~6 bits in every contiguous file for nothing is NOT "no memory overhead".

I found [btrfs compression|https://btrfs.wiki.kernel.org/index.php/Compression] interesting. The compression algorithm is stored per extent (just like blocks in HDFS), and btrfs uses a reserved 'compression flag' bit in the file attributes to support per-file compression.

So, how about we keep *ALL* policies in XAttr? (follow-on)
Assume the proportion of striped files to contiguous files is 1:10. Instead of wasting 40~60 precious bits across the ten contiguous files, why not store an xattr in the one striped file? (HDFS-8900 helps a little if there are more than 2 attrs.)

As for inheritance: using "chattr +c" to set compression on a non-empty directory doesn't affect existing files. It only affects files newly created under it. "chattr -R +c" applies the flag recursively. Even though mixing compressed files with uncompressed files is not best practice, it's pretty safe and could be very useful. "chattr -R +c" will automatically skip *.zip *.tar.gz files.

As far as the EC framework is concerned, we do support striping small files.
But striping small files is bad. If we have a tool that converts files from contiguous to striped, can it automatically skip the small files as well?

> Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8833
>                 URL: https://issues.apache.org/jira/browse/HDFS-8833
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: HDFS-7285
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-8833-HDFS-7285-merge.00.patch, HDFS-8833-HDFS-7285-merge.01.patch, HDFS-8833-HDFS-7285.02.patch
>
>
> We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754]
> storing EC schema with files instead of EC zones and recently revisited the
> discussion under HDFS-8059.
> As a recap, the _zone_ concept has severe limitations, including on renaming and
> nested configuration. Those limitations are valid in encryption for security
> reasons, and it doesn't make sense to carry them over to EC.
> This JIRA aims to store EC schema and cell size at the {{INodeFile}} level. For
> simplicity, we should first implement it as an xattr and consider memory
> optimizations (such as moving it to the file header) as a follow-on. We should
> also disable changing the EC policy on a non-empty file / dir in the first phase.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
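
To make the header-bits arithmetic in the comment above concrete, here is a minimal Java sketch. It is not the actual {{INodeFile}} header layout: the class name, the field position ({{POLICY_SHIFT}}) and the helper methods are all hypothetical. It only shows how a small policy ID could be packed into a 64-bit header, and how many reserved bits go unused when ten contiguous files each carry a 4~6 bit field they never need.

```java
// Hypothetical sketch; NOT the real INodeFile header layout.
final class EcPolicyOverhead {
    // Assumed position of the policy field inside a 64-bit header word.
    static final int POLICY_SHIFT = 0;

    // Mask covering `bits` header bits starting at POLICY_SHIFT.
    static long policyMask(int bits) {
        return ((1L << bits) - 1) << POLICY_SHIFT;
    }

    // Store a hardcoded policy ID in the reserved bits, leaving other bits untouched.
    static long setPolicyId(long header, int policyId, int bits) {
        if (policyId < 0 || policyId >= (1 << bits)) {
            throw new IllegalArgumentException("policy ID does not fit in " + bits + " bits");
        }
        return (header & ~policyMask(bits)) | ((long) policyId << POLICY_SHIFT);
    }

    // Read the policy ID back from the header.
    static int getPolicyId(long header, int bits) {
        return (int) ((header & policyMask(bits)) >>> POLICY_SHIFT);
    }

    // Reserved bits that sit unused in contiguous files, which have no EC policy.
    static long unusedHeaderBits(long contiguousFiles, int bitsPerFile) {
        return contiguousFiles * bitsPerFile;
    }

    public static void main(String[] args) {
        long header = setPolicyId(0L, 5, 6);
        System.out.println("policy ID read back: " + getPolicyId(header, 6));

        // Striped : contiguous = 1 : 10, as assumed in the comment.
        System.out.println("unused bits with a 4-bit field: " + unusedHeaderBits(10, 4));
        System.out.println("unused bits with a 6-bit field: " + unusedHeaderBits(10, 6));
    }
}
```

The sketch deliberately does not put a number on the xattr side of the trade-off, since the per-xattr cost is not stated in this thread.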