[ https://issues.apache.org/jira/browse/HDFS-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870903#comment-15870903 ]
Andrew Wang commented on HDFS-10899: ------------------------------------ Hi Xiao, this latest rev looks like a big improvement in readability, and thanks for the speedy work getting through all the previous comments! Nit: * FSEditLog, rename logCompleteReencryptZone to logReencryptZoneComplete to match name of the op and the Status method? ReencryptionHandler: * processCurrentBatch, checkReadyForWrite can throw an exception, and it's not guarded by the try/finally for the writeLock. * processCurrentBatch, it looks like we ride over a GeneralSecurityException or null return value when talking to the KMS. Shouldn't this either throw, or be retried? We don't want to miss reencrypting an EDEKs. * High-level, it'd be good to think about the behavior when the KMS is slow/flaky/down. It looks like right now an IOException will throw all the way up to {{run}} which has a fixed sleep, but we might want some backoff for connectivity issues. * In reencryptEncryptionZone, I think restoreFromLastProcessedFile also needs the read lock held when calling? Sprinkling additional paranoia {{hasReadLock}} / {{hasWriteLock}} checks everywhere would be appreciated. * {{restoreFromLPF}}, javadoc looks out of date since there's no longer a {{subdirs}} * The new throttling mechanism could use some code comments * Would still prefer this if statement to be in the constructor, have it throw an exception all the way up to the FSDirectory constructor: {code} if (ezManager.getProvider() == null) { LOG.info("No provider set, cannot re-encrypt"); return; } {code} * Similarly, we should do some bounds checking on all the config keys in the constructor (like > 0). I think {{isEnabled}} becomes extraneous afterwards. I don't think enable/disable is necessary since reencryption is already "opt-in" in that an admin has to trigger it, but if we really do want a flag, I'd prefer we do it with a new config key rather than with the throttling configs. * I still find the call hierarchy complicated to reason about, since it has two entry points and is bi-recursive (run -> reencryptEZ -> restoreFromLPF or reencryptDir -> reencryptDirInt <-> reencryptINode). Rather than recursion, have you considered using a stack or queue to hold unprocessed items? I think this gets us towards a single reencryptDir call in reencryptEZ, since restoreFromLPF would be responsible for repopulating the stack/queue. * reencryptDirInt, I see there's a fixup for {{i}} if things have changed, but I don't think this is sufficient. Imagine if the dir's children have been deleted. {{INodeDirectory#nextChild}} can return a negative index, which will always be less than the for loop termination condition {{children.size()}}. {{children.get(i)}} won't behave correctly with a negative index. * reencryptINode, needs documentation on the return value * If the # of items is an exact multiple of {{reencryptBatchSize}}, will the final call to {{processCurrentBatch}} with {{lastBatch = true}} in {{reencryptEncryptionZone}} correctly log the completion? It returns early if {{currentBatch}} is empty. IMO we should separate batch processing from logging the completion, there doesn't seem like much code sharing. * Could use some class-level documentation about the iteration order (appears to be depth-first) and how we handle deletes and creates. * Add a Precondition check in ReencryptTracker that passed inode is not null? This is important so the {{Objects.equals}} in processCurrentBatch behaves correctly. EZManager: * Rename {{getDir}} to {{getFSDirectory}} for clarity? Follow-on: * This relates to the status follow-on you're planning. I noticed cancelling a reencrypt reuses the complete edit log op. It'd be nice to store some additional info in the EZ root xattr admins can differentiate between completion and cancellation. > Add functionality to re-encrypt EDEKs. > -------------------------------------- > > Key: HDFS-10899 > URL: https://issues.apache.org/jira/browse/HDFS-10899 > Project: Hadoop HDFS > Issue Type: New Feature > Components: encryption, kms > Reporter: Xiao Chen > Assignee: Xiao Chen > Attachments: editsStored, HDFS-10899.01.patch, HDFS-10899.02.patch, > HDFS-10899.03.patch, HDFS-10899.04.patch, HDFS-10899.05.patch, > HDFS-10899.06.patch, HDFS-10899.07.patch, HDFS-10899.08.patch, > HDFS-10899.09.patch, HDFS-10899.wip.2.patch, HDFS-10899.wip.patch, Re-encrypt > edek design doc.pdf > > > Currently when an encryption zone (EZ) key is rotated, it only takes effect > on new EDEKs. We should provide a way to re-encrypt EDEKs after the EZ key > rotation, for improved security. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org