[jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema

SammiChen (JIRA) Thu, 11 May 2017 23:38:17 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007698#comment-16007698
 ]


SammiChen commented on HDFS-7337:
---------------------------------

Hi [~eddyxu], thanks for reviewing the design doc!
bq. Do you know what is the overhead of this check when NN restarts? Will it 
introduce noticeable slowdown to NN start process?
My current idea is this: when the NN restarts, it will load all file inodes and 
directory inodes and check whether each file is a striped file and whether each 
directory has an EC policy applied. We can leverage this process: if the inode 
is a striped file or an EC directory, add one extra step that puts its EC 
policy ID into a global map. Once we have all the EC policy IDs in use, we can 
decide whether an EC policy removed by the user can finally be deleted. I think 
one extra simple step will not introduce a noticeable slowdown to the NN start 
process. 
We have also thought about an alternative solution: EC policies removed by the 
user would no longer be visible to users through any NN API, but would not 
actually be deleted from the system, so some stale information would remain in 
the system. 
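
The extra step described above can be sketched roughly as follows. This is only 
a minimal illustration, not the NN code: the Inode class, collectUsedPolicyIds 
and canPurge are hypothetical names, and the "global map" is modeled as a plain 
map from EC policy ID to the number of inodes that reference it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names come from the HDFS code base. */
public class EcPolicyUsageScan {

    /** Stand-in for a file or directory inode; ecPolicyId is null if no EC policy applies. */
    static final class Inode {
        final String path;
        final Integer ecPolicyId;

        Inode(String path, Integer ecPolicyId) {
            this.path = path;
            this.ecPolicyId = ecPolicyId;
        }
    }

    /** One pass over the loaded inodes: record every EC policy ID that is still referenced. */
    static Map<Integer, Integer> collectUsedPolicyIds(List<Inode> inodes) {
        Map<Integer, Integer> used = new HashMap<>();
        for (Inode inode : inodes) {
            if (inode.ecPolicyId != null) {
                used.merge(inode.ecPolicyId, 1, Integer::sum);
            }
        }
        return used;
    }

    /** A user-removed policy can be purged only if no striped file or EC directory uses it. */
    static boolean canPurge(int removedPolicyId, Map<Integer, Integer> used) {
        return !used.containsKey(removedPolicyId);
    }

    public static void main(String[] args) {
        List<Inode> inodes = Arrays.asList(
            new Inode("/ec-dir", 1),
            new Inode("/ec-dir/striped-file", 1),
            new Inode("/plain-file", null));
        Map<Integer, Integer> used = collectUsedPolicyIds(inodes);
        System.out.println("policy 1 purgeable: " + canPurge(1, used));
        System.out.println("policy 2 purgeable: " + canPurge(2, used));
    }
}
```

The added cost per inode in this sketch is one null check and one map update, 
which is the reason for expecting no noticeable slowdown.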

bq.  Enable policy: while it supports adding / removing policies using the CLI, 
why does it not support enabling / using a policy via the CLI? This limitation 
makes the user experience inconsistent.
There is some history here. At first, there were only built-in EC policies, 
such as RS(6,3) and RS(10,4). Then an improvement was made to prevent users 
from using an EC policy that is not feasible for their cluster; for example, if 
the cluster has only 6 datanodes, then RS(10,4) is not feasible at all. The 
improvement was made through the 'dfs.namenode.ec.policies.enabled' property. 
As a precaution, it requires admin privilege to set the enabled policies and a 
cluster restart. Then came user-defined EC policies, so we thought 
'dfs.namenode.ec.policies.enabled' could be leveraged to enable or disable 
user-defined policies as well, also as a precaution. 
In the meantime, I have discussed this question with Kai. Adding an extra API 
is also an alternative, but I want to hear more from you guys. 
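
To make the feasibility example concrete, here is a minimal sketch. It assumes 
that an RS(k,m) policy needs at least k + m datanodes so that every block of a 
striped block group can be placed on a different node. The class and method 
names are hypothetical and are not part of HDFS.

```java
/** Hypothetical sketch of a feasibility check; not an HDFS API. */
public class EcPolicyFeasibility {

    /**
     * Assumption: each of the dataUnits + parityUnits blocks in a group
     * should be stored on a distinct datanode.
     */
    static boolean isFeasible(int dataUnits, int parityUnits, int datanodes) {
        return datanodes >= dataUnits + parityUnits;
    }

    public static void main(String[] args) {
        int datanodes = 6;
        // RS(10,4) needs 14 nodes under the assumption above.
        System.out.println("RS(10,4) on " + datanodes + " datanodes: "
            + isFeasible(10, 4, datanodes));
        // RS(6,3) needs 9 nodes under the same assumption.
        System.out.println("RS(6,3) on 9 datanodes: " + isFeasible(6, 3, 9));
    }
}
```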

bq. For the configuration keys of codecs, why do we need ".rawcoders" as suffix 
for each one?  Do you mean the implementations of the same algorithm?
Yes, they are different implementations of the same algorithm. For example, for 
the RS algorithm there are a pure Java coder, an ISA-L coder, and an HDFS RAID 
coder. Because the configuration keys are used to define the order of the 
different raw coders, they use .rawcoders as the suffix.
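
One way to read "define the order of different raw coders" is as a preference 
list with fallback: try the first configured coder and move on to the next one 
if it cannot be created. The sketch below illustrates that reading only. The 
RawCoder interface, the coder names "rs_isal" and "rs_java", and the factory 
map are hypothetical and do not reflect the actual Hadoop classes or values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch of ordered raw coder selection; not the Hadoop implementation. */
public class RawCoderFallback {

    /** Stand-in for a raw coder implementation. */
    interface RawCoder {
        String name();
    }

    static RawCoder named(String name) {
        return () -> name;
    }

    /**
     * Walks the comma-separated value of a ".rawcoders" style key in order
     * and returns the first coder that can actually be created.
     */
    static RawCoder createFirstAvailable(String configuredOrder,
                                         Map<String, Supplier<RawCoder>> factories) {
        for (String name : configuredOrder.split(",")) {
            Supplier<RawCoder> factory = factories.get(name.trim());
            if (factory == null) {
                continue; // unknown coder name, skip it
            }
            try {
                return factory.get();
            } catch (RuntimeException | LinkageError e) {
                // For example, a native library is not available: try the next coder.
            }
        }
        throw new IllegalStateException("No usable raw coder among: " + configuredOrder);
    }

    public static void main(String[] args) {
        Map<String, Supplier<RawCoder>> factories = new LinkedHashMap<>();
        // Simulate a native coder whose library cannot be loaded on this machine.
        factories.put("rs_isal", () -> {
            throw new UnsupportedOperationException("native library missing");
        });
        factories.put("rs_java", () -> named("rs_java"));

        RawCoder coder = createFirstAvailable("rs_isal,rs_java", factories);
        System.out.println("selected raw coder: " + coder.name());
    }
}
```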

bq. is there a system-wide default codec / configuration to use?
Yes. There are several system-wide codecs to use, including the RS codec, the 
RS legacy codec, and the XOR codec.

> Configurable and pluggable Erasure Codec and schema
> ---------------------------------------------------
>
>                 Key: HDFS-7337
>                 URL: https://issues.apache.org/jira/browse/HDFS-7337
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: erasure-coding
>            Reporter: Zhe Zhang
>            Priority: Critical
>              Labels: hdfs-ec-3.0-nice-to-have
>         Attachments: HDFS-7337-prototype-v1.patch, 
> HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, 
> PluggableErasureCodec.pdf, PluggableErasureCodec-v2.pdf, 
> PluggableErasureCodec-v3.pdf, PluggableErasureCodec v4.pdf
>
>
> According to HDFS-7285 and the design, this JIRA considers supporting 
> multiple erasure codecs via a pluggable approach. It allows defining and 
> configuring multiple codec schemas with different coding algorithms and 
> parameters. The resulting codec schemas can be specified via a command tool 
> for different file folders. While designing and implementing such a pluggable 
> framework, we will also implement a concrete codec by default (Reed Solomon) 
> to prove the framework is useful and workable. A separate JIRA could be 
> opened for the RS codec implementation.
> Note HDFS-7353 will focus on the very low-level codec API and implementation 
> to make concrete vendor libraries transparent to the upper layer. This JIRA 
> focuses on high-level matters that interact with configuration, schema, etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
