[ 
https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713580#comment-14713580
 ] 

Walter Su commented on HDFS-8833:
---------------------------------

bq. Rename isStriped to erasureCodingPolicy in INodeFile header.
Please don't do that.
bq. (Zhe Zhang) One hybrid approach we could pursue is providing a set of 
hardcoded policies and then allow additional user-configured policies as an 
XAttr. Users that stick to the hardcoded policies will have improved memory 
usage. We can also dedupe xattr values in addition to names as an additional 
optimization. Since IIUC we only want to provide hardcoded policies in phase 1, 
this hybrid approach can come later.
Previously I thought we could use 4~6 bits in the header to store the 
hardcoded policies. You mentioned keeping user-configured policies in an XAttr. 
I agree that we should keep policies in XAttrs.

It was wrong of me to think that "using reserved space" equals "no memory 
overhead". The reserved space could be put to much better use than storing the 
ECPolicy. Spending 4~6 bits of every contiguous file on nothing is NOT "no 
memory overhead".

I found [btrfs compression|https://btrfs.wiki.kernel.org/index.php/Compression] 
interesting. The compression algorithm is stored per extent (similar to blocks 
in HDFS), and it uses a reserved 'compression flag' bit in the file 
attributes to support per-file compression.

So, how about we keep *ALL* policies in XAttrs? (follow-on) Assume the 
ratio of striped files to contiguous files is 1:10. Instead of wasting 40~60 
precious bits across ten contiguous files (4~6 bits each), why not store an 
xattr on the one striped file? (HDFS-8900 helps a little if there are more 
than 2 xattrs.)
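
To make the arithmetic above easy to replay with other numbers, here is a minimal sketch in Java. The class and method names are hypothetical and are not HDFS code. It only counts the header bits reserved in contiguous files and derives the xattr size at which the two approaches would cost the same. The real in-memory cost of an xattr in the NameNode is not modelled here, so it is left as an input to compare against.

```java
public class EcPolicyOverhead {

    /** Bits reserved for an EC policy in contiguous files, which never use them. */
    static long wastedHeaderBits(long contiguousFiles, int bitsPerFile) {
        return contiguousFiles * bitsPerFile;
    }

    /**
     * Size in bytes of a per-striped-file xattr at which both approaches cost
     * the same, given how many contiguous files exist per striped file.
     */
    static double breakEvenXattrBytes(long contiguousPerStriped, int bitsPerFile) {
        return wastedHeaderBits(contiguousPerStriped, bitsPerFile) / 8.0;
    }

    public static void main(String[] args) {
        // 1 striped file per 10 contiguous files, 4~6 reserved bits per file.
        for (int bits = 4; bits <= 6; bits += 2) {
            System.out.println(bits + " bits/file: wasted="
                + wastedHeaderBits(10, bits) + " bits, break-even xattr="
                + breakEvenXattrBytes(10, bits) + " bytes");
        }
    }
}
```

The sparser striped files are, the larger the break-even xattr size becomes, which is the point of the 1:10 assumption.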

As for inheritance: setting compression with "chattr +c" on a non-empty 
directory doesn't affect existing files. It only affects files newly created 
under it. "chattr -R +c" applies it recursively. Mixing compressed and 
uncompressed files is not best practice, but it's pretty safe and could be very 
useful. "chattr -R +c" will automatically skip *.zip *.tar.gz files.
The EC framework supports striping small files, but striping small files is 
bad. If we have a conversion tool from contiguous to striped, can it 
automatically skip the small files as well? 
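
A rough sketch of what "skip the small files" could look like in such a conversion tool, in plain Java with no Hadoop dependency. Everything here is hypothetical: the class name, the method, and the idea of taking the threshold as a parameter. The example threshold of one full stripe (6 data cells of 64 KB) is an assumption chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StripingCandidateFilter {

    /**
     * Returns the paths whose length is at least minBytes, in input order.
     * Smaller files are skipped and would stay contiguous.
     */
    static List<String> selectForStriping(Map<String, Long> fileLengths, long minBytes) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Long> entry : fileLengths.entrySet()) {
            if (entry.getValue() >= minBytes) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Assumed threshold: one full stripe of 6 data cells, 64 KB each.
        long minBytes = 6L * 64 * 1024;

        Map<String, Long> files = new LinkedHashMap<>();
        files.put("/data/tiny.log", 4_096L);
        files.put("/data/big.seq", 512L * 1024 * 1024);

        System.out.println(selectForStriping(files, minBytes));
    }
}
```

Keeping the threshold as a parameter leaves the question of what counts as "small" open, which matches the state of the discussion.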

> Erasure coding: store EC schema and cell size in INodeFile and eliminate 
> notion of EC zones
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8833
>                 URL: https://issues.apache.org/jira/browse/HDFS-8833
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: HDFS-7285
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-8833-HDFS-7285-merge.00.patch, 
> HDFS-8833-HDFS-7285-merge.01.patch, HDFS-8833-HDFS-7285.02.patch
>
>
> We have [discussed | 
> https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754]
>  storing EC schema with files instead of EC zones and recently revisited the 
> discussion under HDFS-8059.
> As a recap, the _zone_ concept has severe limitations including renaming and 
> nested configuration. Those limitations are valid in encryption for security 
> reasons and it doesn't make sense to carry them over in EC.
> This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For 
> simplicity, we should first implement it as an xattr and consider memory 
> optimizations (such as moving it to file header) as a follow-on. We should 
> also disable changing EC policy on a non-empty file / dir in the first phase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
