[ 
https://issues.apache.org/jira/browse/HDFS-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157633#comment-16157633
 ] 

Kai Zheng commented on HDFS-7859:
---------------------------------

bq. Checking which files / directories are using this particular policy is a 
potentially O(n) operation, where n = # of inodes. I feel that it is OK to 
leave it in fsimage as garbage for now. In the future, we can let the fsimage 
loading process handle this garbage, as it is O(n).
I discussed this with Sammi offline earlier; we can do it in a very lightweight way, as sketched below:
{code}
# all policies used by files/directories
usedPoliciesSet = ();

# while loading inodes from fsimage, add the following two lines
foreach (in: inodes) {
  policyId = getPolicyIdFromInode(in) # a bitwise op, very cheap
  usedPoliciesSet.add(policyId)
}

# when all inodes are loaded, add the following post step
ErasureCodingPolicyManager.getInstance().updateWithUsedPolicies(usedPoliciesSet)

# in ErasureCodingPolicyManager.updateWithUsedPolicies, it's a simple step to
# clean up removed policies that are not in the used policies set.
{code}
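
To make the idea a bit more concrete, here is a minimal Java sketch of the same two steps. Everything in it is hypothetical and for illustration only: {{InodeStub}}, {{PolicyManagerSketch}} and {{FsImageLoadSketch}} are stand-ins rather than real HDFS classes, and the assumption that the policy ID sits in the low 8 bits of an inode header word is made up to show the bitwise extraction, not taken from the actual inode layout.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for an inode; only the policy ID matters here. */
final class InodeStub {
    // Assumed layout, for illustration only: policy ID in the low 8 bits.
    private static final long POLICY_ID_MASK = 0xFFL;
    private final long header;

    InodeStub(long header) { this.header = header; }

    /** Extracts the policy ID with a single bitwise AND. */
    byte policyId() { return (byte) (header & POLICY_ID_MASK); }
}

enum PolicyState { ENABLED, DISABLED, REMOVED }

/** Simplified, hypothetical manager; not the real ErasureCodingPolicyManager. */
final class PolicyManagerSketch {
    private final Map<Byte, PolicyState> policies = new HashMap<>();

    void put(byte id, PolicyState state) { policies.put(id, state); }

    boolean contains(byte id) { return policies.containsKey(id); }

    /** Post-load step: drop REMOVED policies that no loaded inode references. */
    void updateWithUsedPolicies(Set<Byte> usedPolicies) {
        policies.entrySet().removeIf(e ->
            e.getValue() == PolicyState.REMOVED
                && !usedPolicies.contains(e.getKey()));
    }
}

final class FsImageLoadSketch {
    /** Collects policy IDs in the single pass the loader already makes. */
    static Set<Byte> collectUsedPolicies(List<InodeStub> inodes) {
        Set<Byte> used = new HashSet<>();
        for (InodeStub inode : inodes) {
            used.add(inode.policyId());
        }
        return used;
    }
}
```

In this sketch a removed policy that is still referenced by some inode is kept, so existing files stay readable, and only removed policies with no references are purged. The extra cost is one set insertion per inode during a pass that the loader has to make anyway.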

bq. Here or elsewhere, please ensure no policy can be DISABLED/REMOVED if it's 
used by files, with necessary tests.
Let me correct myself. We should allow policies to be disabled/removed regardless 
of whether they are in use. Tracking policy usage while the NN is running, with 
lots of files being operated on, would be too much overhead. We can just do a 
post-load cleanup as illustrated above.

I'm fine with leaving the policy cleanup as future work, but if it sounds good, 
maybe we can get it done before 3.0 GA. That should be OK since it doesn't 
involve an API change.

bq. Could this happen before BETA 1? It seems to be a breaking change. If not, 
do we have a plan to preserve both this key and the capability of 
adding/removing policies?
I agree; we should get it done this time. Actually, IIRC, this was already done, 
but Sammi may need to double-check and clean up anything left over.

> Erasure Coding: Persist erasure coding policies in NameNode
> -----------------------------------------------------------
>
>                 Key: HDFS-7859
>                 URL: https://issues.apache.org/jira/browse/HDFS-7859
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: SammiChen
>              Labels: hdfs-ec-3.0-must-do
>         Attachments: HDFS-7859.001.patch, HDFS-7859.002.patch, 
> HDFS-7859.004.patch, HDFS-7859.005.patch, HDFS-7859.006.patch, 
> HDFS-7859.007.patch, HDFS-7859.008.patch, HDFS-7859.009.patch, 
> HDFS-7859.010.patch, HDFS-7859.011.patch, HDFS-7859.012.patch, 
> HDFS-7859.013.patch, HDFS-7859.014.patch, HDFS-7859.015.patch, 
> HDFS-7859-HDFS-7285.002.patch, HDFS-7859-HDFS-7285.002.patch, 
> HDFS-7859-HDFS-7285.003.patch
>
>
> In meetup discussion with [~zhz] and [~jingzhao], it's suggested that we 
> persist EC schemas in NameNode centrally and reliably, so that EC zones can 
> reference them by name efficiently.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

