[ 
https://issues.apache.org/jira/browse/HDFS-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870903#comment-15870903
 ] 

Andrew Wang commented on HDFS-10899:
------------------------------------

Hi Xiao, this latest rev looks like a big improvement in readability, and 
thanks for the speedy work getting through all the previous comments!

Nit:
* FSEditLog, rename logCompleteReencryptZone to logReencryptZoneComplete to 
match name of the op and the Status method?

ReencryptionHandler:
* processCurrentBatch, checkReadyForWrite can throw an exception, and it's not 
guarded by the try/finally for the writeLock.
* processCurrentBatch, it looks like we ride over a GeneralSecurityException or 
null return value when talking to the KMS. Shouldn't this either throw, or be 
retried? We don't want to miss reencrypting an EDEKs.
* High-level, it'd be good to think about the behavior when the KMS is 
slow/flaky/down. It looks like right now an IOException will throw all the way 
up to {{run}} which has a fixed sleep, but we might want some backoff for 
connectivity issues.
* In reencryptEncryptionZone, I think restoreFromLastProcessedFile also needs 
the read lock held when calling? Sprinkling additional paranoia {{hasReadLock}} 
/ {{hasWriteLock}} checks everywhere would be appreciated.
* {{restoreFromLPF}}, javadoc looks out of date since there's no longer a 
{{subdirs}}
* The new throttling mechanism could use some code comments
* Would still prefer this if statement to be in the constructor, have it throw 
an exception all the way up to the FSDirectory constructor:

{code}
    if (ezManager.getProvider() == null) {
      LOG.info("No provider set, cannot re-encrypt");
      return;
    }
{code}

* Similarly, we should do some bounds checking on all the config keys in the 
constructor (like > 0). I think {{isEnabled}} becomes extraneous afterwards. I 
don't think enable/disable is necessary since reencryption is already "opt-in" 
in that an admin has to trigger it, but if we really do want a flag, I'd prefer 
we do it with a new config key rather than with the throttling configs.
* I still find the call hierarchy complicated to reason about, since it has two 
entry points and is bi-recursive (run -> reencryptEZ -> restoreFromLPF or 
reencryptDir -> reencryptDirInt <-> reencryptINode). Rather than recursion, 
have you considered using a stack or queue to hold unprocessed items? I think 
this gets us towards a single reencryptDir call in reencryptEZ, since 
restoreFromLPF would be responsible for repopulating the stack/queue.
* reencryptDirInt, I see there's a fixup for {{i}} if things have changed, but 
I don't think this is sufficient. Imagine if the dir's children have been 
deleted. {{INodeDirectory#nextChild}} can return a negative index, which will 
always be less than the for loop termination condition {{children.size()}}. 
{{children.get(i)}} won't behave correctly with a negative index.
* reencryptINode, needs documentation on the return value
* If the # of items is an exact multiple of {{reencryptBatchSize}}, will the 
final call to {{processCurrentBatch}} with {{lastBatch = true}} in 
{{reencryptEncryptionZone}} correctly log the completion? It returns early if 
{{currentBatch}} is empty. IMO we should separate batch processing from logging 
the completion, there doesn't seem like much code sharing.
* Could use some class-level documentation about the iteration order (appears 
to be depth-first) and how we handle deletes and creates.
* Add a Precondition check in ReencryptTracker that passed inode is not null? 
This is important so the {{Objects.equals}} in processCurrentBatch behaves 
correctly.

EZManager:
* Rename {{getDir}} to {{getFSDirectory}} for clarity?

Follow-on:
* This relates to the status follow-on you're planning. I noticed cancelling a 
reencrypt reuses the complete edit log op. It'd be nice to store some 
additional info in the EZ root xattr admins can differentiate between 
completion and cancellation.

> Add functionality to re-encrypt EDEKs.
> --------------------------------------
>
>                 Key: HDFS-10899
>                 URL: https://issues.apache.org/jira/browse/HDFS-10899
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: encryption, kms
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>         Attachments: editsStored, HDFS-10899.01.patch, HDFS-10899.02.patch, 
> HDFS-10899.03.patch, HDFS-10899.04.patch, HDFS-10899.05.patch, 
> HDFS-10899.06.patch, HDFS-10899.07.patch, HDFS-10899.08.patch, 
> HDFS-10899.09.patch, HDFS-10899.wip.2.patch, HDFS-10899.wip.patch, Re-encrypt 
> edek design doc.pdf
>
>
> Currently when an encryption zone (EZ) key is rotated, it only takes effect 
> on new EDEKs. We should provide a way to re-encrypt EDEKs after the EZ key 
> rotation, for improved security.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to