[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007698#comment-16007698 ]
SammiChen commented on HDFS-7337:
---------------------------------

Hi [~eddyxu], thanks for reviewing the design doc!

bq. Do you know what is the overhead of this check when NN restarts? Will it introduce noticeable slowdown to NN start process?

My current idea is this: when the NN restarts, it loads all file inodes and directory inodes and checks whether each file is a striped file and whether each directory has an EC policy applied. We can leverage this process: if the inode is a striped file or an EC directory, add one extra step that puts its EC policy ID into a global map. Once we have all the EC policy IDs that are in use, we can decide whether a user-removed EC policy can finally be deleted. I think one simple extra step will not introduce a noticeable slowdown to the NN start process.

We have also thought about an alternative solution: user-removed EC policies would no longer be visible to users through any NN API, but would not actually be deleted from the system, so some stale information would remain in the system.

bq. Enable policy: while it supports to add / remove policies using CLI, but why it does not support to enable / use the policy via CLI? This limitation makes the user experience not consistent.

There is some history here. At first, there were only built-in EC policies, such as RS(6,3) and RS(10,4). Then an improvement was made to keep users from choosing an EC policy that is not feasible for their cluster; for example, if the cluster has only 6 datanodes, RS(10,4) is not feasible at all. The improvement was made through the 'dfs.namenode.ec.policies.enabled' property, which requires Admin privilege to set the enabled policies and, to be cautious, a cluster restart. Then came user-defined EC policies, so we thought 'dfs.namenode.ec.policies.enabled' could be leveraged to enable or disable user-defined policies, again to be cautious. In the meantime, I have discussed this question with Kai. Adding an extra API is also an alternative. But I want to hear more from you guys.

bq. 
For the configuration keys of codecs, why do we need ".rawcoders" as suffix for each one? Do you mean the implementations of the same algorithm?

Yes, they are different implementations of the same algorithm. For example, for the RS algorithm there are a pure Java coder, an ISA-L coder, and an HDFS RAID coder. Because the configuration keys are used to define the order of the different raw coders, they use ".rawcoders" as the suffix.

bq. is there a system-wide default codec / configuration to use?

Yes. There are several system-wide codecs to use, including the RS codec, the RS legacy codec and the XOR codec.

> Configurable and pluggable Erasure Codec and schema
> ---------------------------------------------------
>
>                 Key: HDFS-7337
>                 URL: https://issues.apache.org/jira/browse/HDFS-7337
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: erasure-coding
>            Reporter: Zhe Zhang
>            Priority: Critical
>              Labels: hdfs-ec-3.0-nice-to-have
>         Attachments: HDFS-7337-prototype-v1.patch,
> HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip,
> PluggableErasureCodec.pdf, PluggableErasureCodec-v2.pdf,
> PluggableErasureCodec-v3.pdf, PluggableErasureCodec v4.pdf
>
>
> According to HDFS-7285 and the design, this considers to support multiple
> Erasure Codecs via pluggable approach. It allows to define and configure
> multiple codec schemas with different coding algorithms and parameters. The
> resultant codec schemas can be utilized and specified via command tool for
> different file folders. While design and implement such pluggable framework,
> it’s also to implement a concrete codec by default (Reed Solomon) to prove
> the framework is useful and workable. Separate JIRA could be opened for the
> RS codec implementation.
> Note HDFS-7353 will focus on the very low level codec API and implementation
> to make concrete vendor libraries transparent to the upper layer. This JIRA
> focuses on high level stuffs that interact with configuration, schema and etc.
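
To make the startup check from the comment above more concrete, here is a minimal sketch of the "one extra step" idea: while inodes are loaded, record each EC policy ID that is in use, then consult that record before purging a user-removed policy. Every name here (`UsedEcPolicyTracker`, `Inode`, `onInodeLoaded`, `canPurge`) is hypothetical and is not a Hadoop class or API; the comment speaks of a global map, and a set is used here only as the simplest stand-in.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch only; none of these types exist in Hadoop. */
public class UsedEcPolicyTracker {

    /** Minimal stand-in for a file or directory inode.
     *  ecPolicyId is null when no EC policy applies to the inode. */
    public static final class Inode {
        final String path;
        final Byte ecPolicyId;

        public Inode(String path, Byte ecPolicyId) {
            this.path = path;
            this.ecPolicyId = ecPolicyId;
        }
    }

    // EC policy IDs referenced by at least one striped file or EC directory.
    private final Set<Byte> usedPolicyIds = new HashSet<>();

    /** The extra step during inode loading: one hash insertion per EC inode. */
    public void onInodeLoaded(Inode inode) {
        if (inode.ecPolicyId != null) {
            usedPolicyIds.add(inode.ecPolicyId);
        }
    }

    /** A user-removed policy may be deleted only if no loaded inode uses it. */
    public boolean canPurge(byte policyId) {
        return !usedPolicyIds.contains(policyId);
    }
}
```

Under these assumptions the added cost per inode is a null check and at most one set insertion, which is consistent with the expectation that the step would not noticeably slow NN startup; an actual measurement would still be needed to confirm that.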