[ https://issues.apache.org/jira/browse/HDFS-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634823#comment-15634823 ]
Andrew Wang commented on HDFS-10899: ------------------------------------ Thanks for working on this Xiao! I didn't give the whole patch a deep look, but some review comments to start it off. I didn't get to the KMS changes, it'd help to split those into a separate patch to make review easier. Comments as follows: * getShortUsage, are -path / -cancel / -verify exclusive operations? We should indicate this in the usage text if so. {{man git}} for instance looks like: {noformat} usage: git [--version] [--help] [-C <path>] [-c name=value] [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path] [-p | --paginate | --no-pager] [--no-replace-objects] [--bare] [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>] <command> [<args>] {noformat} So {{<command> <path>}} would work here, with the long usage explaining the commands. * We have new functionality to specify time with units in configuration, want to use it here? * New keys should be documented in hdfs-default.xml also EZManager: * Some unused imports in I'm sure checkstyle will complain about * Typo: "filesReencryptedInCurentZone" -> "Current" * Can the new methods in EZManager be encapsulated as a class? Could use some javadoc about expected behavior also. * Empty {{finally}} case in the run method, can be deleted? * Thread should check if the NN is active or standby, typically we do this in FSNamesystem#startActiveServices and FSNamesystem#stopActiveServices * reencryptDir looks like it takes the writeLock the entire time while it's processing the files in a directory, including when talking to the KMS. We should not hold the lock while talking to the KMS. * reencryptDir is also a big method, refactor the innards of the for loop? * We also should do a logSync before we log the "Completed" message in {{reencryptEncryptionZoneInt}}, so we block first. Maybe elsewhere too. * Calling {{getListing}} will generate audit logs, we should be calling {{getListingInt}} instead. Also, we don't need a full {{HdfsFileStatus}} to reencrypt each file, so let's consider an INode-based traversal. * Structuring this as an iterator (and maybe also a visitor) would make the code more clear. * Recommend we rename {{updateReencryptStatus}} so it's clear that this is used during edit log / fsimage loading, or at least add some javadoc. ReencryptInfo: * Recommend we name ReencryptInfo something like "PendingReencryptionZones" or something to be more self documenting. The javadoc also could mention that this is basically a queue. * A small warning that {{hasZone}} will be {{O(n)}} if the JDK implementation is truly a queue, maybe we should use a LinkedHashMap instead? I don't think we use the deque nature. * How is the reencryptInfo in EZManager synchronized? I ask because it's a ConcurrentLinkedDeque, is there actually concurrency happening? > Add functionality to re-encrypt EDEKs. > -------------------------------------- > > Key: HDFS-10899 > URL: https://issues.apache.org/jira/browse/HDFS-10899 > Project: Hadoop HDFS > Issue Type: New Feature > Components: encryption, kms > Reporter: Xiao Chen > Assignee: Xiao Chen > Attachments: HDFS-10899.wip.patch, Re-encrypt edek design doc.pdf > > > Currently when an encryption zone (EZ) key is rotated, it only takes effect > on new EDEKs. We should provide a way to re-encrypt EDEKs after the EZ key > rotation, for improved security. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org