[ 
https://issues.apache.org/jira/browse/HDFS-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634823#comment-15634823
 ] 

Andrew Wang commented on HDFS-10899:
------------------------------------

Thanks for working on this Xiao! I didn't give the whole patch a deep look, but 
some review comments to start it off. I didn't get to the KMS changes, it'd 
help to split those into a separate patch to make review easier.

Comments as follows:

* getShortUsage, are -path / -cancel / -verify exclusive operations? We should 
indicate this in the usage text if so. {{man git}} for instance looks like:

{noformat}
usage: git [--version] [--help] [-C <path>] [-c name=value]
           [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
           [-p | --paginate | --no-pager] [--no-replace-objects] [--bare]
           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
           <command> [<args>]
{noformat}

So {{<command> <path>}} would work here, with the long usage explaining the 
commands.

* We have new functionality to specify time with units in configuration, want 
to use it here?
* New keys should be documented in hdfs-default.xml also

EZManager:

* Some unused imports in I'm sure checkstyle will complain about
* Typo: "filesReencryptedInCurentZone" -> "Current"
* Can the new methods in EZManager be encapsulated as a class? Could use some 
javadoc about expected behavior also.
* Empty {{finally}} case in the run method, can be deleted?
* Thread should check if the NN is active or standby, typically we do this in 
FSNamesystem#startActiveServices and FSNamesystem#stopActiveServices
* reencryptDir looks like it takes the writeLock the entire time while it's 
processing the files in a directory, including when talking to the KMS. We 
should not hold the lock while talking to the KMS.
* reencryptDir is also a big method, refactor the innards of the for loop?
* We also should do a logSync before we log the "Completed" message in 
{{reencryptEncryptionZoneInt}}, so we block first. Maybe elsewhere too.
* Calling {{getListing}} will generate audit logs, we should be calling 
{{getListingInt}} instead. Also, we don't need a full {{HdfsFileStatus}} to 
reencrypt each file, so let's consider an INode-based traversal.
* Structuring this as an iterator (and maybe also a visitor) would make the 
code more clear.
* Recommend we rename {{updateReencryptStatus}} so it's clear that this is used 
during edit log / fsimage loading, or at least add some javadoc.

ReencryptInfo:
* Recommend we name ReencryptInfo something like "PendingReencryptionZones" or 
something to be more self documenting. The javadoc also could mention that this 
is basically a queue.
* A small warning that {{hasZone}} will be {{O(n)}} if the JDK implementation 
is truly a queue, maybe we should use a LinkedHashMap instead? I don't think we 
use the deque nature.
* How is the reencryptInfo in EZManager synchronized? I ask because it's a 
ConcurrentLinkedDeque, is there actually concurrency happening?

> Add functionality to re-encrypt EDEKs.
> --------------------------------------
>
>                 Key: HDFS-10899
>                 URL: https://issues.apache.org/jira/browse/HDFS-10899
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: encryption, kms
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>         Attachments: HDFS-10899.wip.patch, Re-encrypt edek design doc.pdf
>
>
> Currently when an encryption zone (EZ) key is rotated, it only takes effect 
> on new EDEKs. We should provide a way to re-encrypt EDEKs after the EZ key 
> rotation, for improved security.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to