[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones

Zhe Zhang (JIRA) Wed, 26 Aug 2015 14:56:51 -0700

    [ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715589#comment-14715589 ]


Zhe Zhang commented on HDFS-8833:
---------------------------------

Thanks for the thoughts, Walter.

bq. As for inheritance. "chattr +c" set compression on non-empty directory 
doesn't affect existed file. It only affects newly created file under it. 
That's equivalent to the policy implemented in the patch. I agree that the 
flexibility of having both EC and non-EC files under a directory is useful.

bq. But striping small files is bad. If we have a conversion tool from 
contiguous from striping, can we auto skip the small files as well?
Well, it depends on how small (relative to the cell size). We should certainly 
skip files smaller than a full stripe. Good thoughts on the conversion tool. We 
should probably create a JIRA under HDFS-8031 to track the overall conversion 
work.
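
To make the "smaller than a full stripe" rule concrete, here is a minimal sketch of the check a conversion tool might apply. Everything in it is an assumption for illustration: the class {{StripeSizeFilter}} and its methods are hypothetical and not part of HDFS or the patch, a full stripe is taken to mean one cell per data block, and the 6 data blocks and 64 KB cell size in {{main}} are example values only.

```java
/**
 * Hypothetical helper (not an HDFS class): decides whether a
 * contiguous-to-striped conversion tool should skip a file.
 */
final class StripeSizeFilter {
    private StripeSizeFilter() {}

    /** Bytes of user data in one full stripe, assuming one cell per data block. */
    static long fullStripeBytes(int numDataBlocks, int cellSizeBytes) {
        if (numDataBlocks <= 0 || cellSizeBytes <= 0) {
            throw new IllegalArgumentException("block count and cell size must be positive");
        }
        // Widen to long before multiplying to avoid int overflow.
        return (long) numDataBlocks * cellSizeBytes;
    }

    /** True if the file holds less data than a single full stripe. */
    static boolean shouldSkip(long fileLengthBytes, int numDataBlocks, int cellSizeBytes) {
        return fileLengthBytes < fullStripeBytes(numDataBlocks, cellSizeBytes);
    }

    public static void main(String[] args) {
        int numDataBlocks = 6;        // assumed example layout
        int cellSizeBytes = 64 * 1024; // assumed example cell size
        long[] fileLengths = {10_000L, 393_216L, 5_000_000L};
        for (long len : fileLengths) {
            System.out.println(len + " bytes -> skip: "
                + shouldSkip(len, numDataBlocks, cellSizeBytes));
        }
    }
}
```

Whether the threshold should be exactly one stripe or some multiple of it is a policy question this sketch does not try to answer.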

bq. The reserved space could be of great use than just to save ECPolicy. 
If file header space is a concern, we can actually encode {{replication}} and 
{{ecPolicy}} together. For example, the first of the 12 bits indicates whether 
the file is contiguous (0) or striped (1). If it's striped, the remaining 11 
bits represent the policy ID.
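
A rough sketch of that encoding, under stated assumptions: "first bit" is taken to mean the most significant of the 12 bits, and for a contiguous file the remaining 11 bits are assumed to hold the replication factor. The class {{LayoutBits}} and its method names are hypothetical, not the actual {{INodeFile}} header code.

```java
/**
 * Hypothetical illustration (not the INodeFile header format): packs a
 * striped/contiguous flag plus an 11-bit value into a 12-bit field.
 */
final class LayoutBits {
    private LayoutBits() {}

    static final int FIELD_BITS = 12;
    // Assumption: the "first bit" is the most significant of the 12 bits.
    static final int STRIPED_FLAG = 1 << (FIELD_BITS - 1); // 0x800
    static final int VALUE_MASK = STRIPED_FLAG - 1;         // 0x7FF, low 11 bits

    /** Contiguous file: flag bit 0, low 11 bits assumed to hold the replication factor. */
    static int encodeContiguous(int replication) {
        checkFits(replication);
        return replication;
    }

    /** Striped file: flag bit 1, low 11 bits hold the EC policy ID. */
    static int encodeStriped(int policyId) {
        checkFits(policyId);
        return STRIPED_FLAG | policyId;
    }

    static boolean isStriped(int field) {
        return (field & STRIPED_FLAG) != 0;
    }

    /** Replication factor for contiguous files, policy ID for striped files. */
    static int value(int field) {
        return field & VALUE_MASK;
    }

    private static void checkFits(int v) {
        if (v < 0 || v > VALUE_MASK) {
            throw new IllegalArgumentException("value does not fit in 11 bits: " + v);
        }
    }

    public static void main(String[] args) {
        int contiguous = encodeContiguous(3);
        int striped = encodeStriped(5);
        System.out.println("contiguous: striped=" + isStriped(contiguous)
            + ", replication=" + value(contiguous));
        System.out.println("striped: striped=" + isStriped(striped)
            + ", policyId=" + value(striped));
    }
}
```

One consequence of sharing the field this way is that the maximum representable replication factor drops to 2047, the same ceiling as the number of distinct policy IDs.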

At this point, I don't think too many use cases are competing for file header 
space. If that changes in the future, I think we should take a more holistic 
look at all the attributes. I agree that (at least in the beginning) only a 
small portion of files will be erasure coded. But the same applies to storage 
policies, and even replication factor and block size -- most files use the 
default.

> Erasure coding: store EC schema and cell size in INodeFile and eliminate 
> notion of EC zones
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8833
>                 URL: https://issues.apache.org/jira/browse/HDFS-8833
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: HDFS-7285
>            Reporter: Zhe Zhang
>            Assignee: Zhe Zhang
>         Attachments: HDFS-8833-HDFS-7285-merge.00.patch, 
> HDFS-8833-HDFS-7285-merge.01.patch, HDFS-8833-HDFS-7285.02.patch
>
>
> We have [discussed|https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754]
> storing EC schema with files instead of EC zones and recently revisited the 
> discussion under HDFS-8059.
> As a recap, the _zone_ concept has severe limitations including renaming and 
> nested configuration. Those limitations are valid in encryption for security 
> reasons and it doesn't make sense to carry them over in EC.
> This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For 
> simplicity, we should first implement it as an xattr and consider memory 
> optimizations (such as moving it to file header) as a follow-on. We should 
> also disable changing EC policy on a non-empty file / dir in the first phase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
