[ https://issues.apache.org/jira/browse/HDFS-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16134793#comment-16134793 ]
Wei-Chiu Chuang commented on HDFS-10899: ---------------------------------------- Hi Xiao, thanks for the latest work! I reviewed the latest patch (rev 014), mostly focused on ReencryptionHandler. # ReencryptionHandler#(constructor) {code:title=ReencryptionHandler#(constructor)} if (reencryptBatchSize > 2000) { // 2000 is based on buffer size = 512 * 1024, and SetXAttr op size ~ 200. LOG.warn("Re-encryption batch size is {}. It could cause edit log buffer " + "to be full an trigger a logSync within the writelock, greatly " + "impacting namenode throughput."); } {code} * The log needs to print reencryptBatchSize. * It might also improve readability by making the hard coded number (2000) a final static member variable. * Please also add few more lines of comment to add the reference to QuorumJournalManager#outputBufferCapacity. # ReencryptionHandler#batchService, and ReencryptionUpdater#batchService can be declared as ExecutorCompletionService<ReencryptionTask> # ReencryptionHandler#removeZone(): It looks like you can de-dup some code by reusing ReencryptionHandler#cancelZone(). # reencryptionHandler#reencryptEncryptionZone() {code:title=reencryptionHandler#reencryptEncryptionZone()} zs = getReencryptionStatus().getZoneStatus(zoneId); assert zs != null; {code} * zoneId is obtained when holding FSDirectory read lock, release the lock, and then acquire FSDirectory read lock again. * This assertion is only correct if there will be only one ReencryptionHandler running. # ReencryptionStatus#updateZoneStatus() should check that zoneNode is an encryption zone. # Why is currentBatch a TreeMap? # Does ZoneReencryptionStatus#getLastProcessedFile return the relative path? or file name only? or absolute path? # ReencryptionHandler#checkZoneReady {code:title=ReencryptionHandler#checkZoneReady} dir.getFSNamesystem() .checkNameNodeSafeMode("NN is in safe mode, cannot " + "re-encrypt"); {code} * The “+” is redundant. # ReencryptionHandler#reencryptDir {code:title=ReencryptionHandler#reencryptDir} curr = parent; curr = reencryptDirInt(zoneId, currentBatch, curr, startAfters, ezKeyVerName); {code} Better rewrite as {code} curr = reencryptDirInt(zoneId, currentBatch, parent, startAfters, ezKeyVerName); {code} # ReencryptionHandler#reencryptDirInt {code:title=ReencryptionHandler#reencryptDirInt} assert dir.hasReadLock(); Preconditions.checkNotNull(curr, "Current inode can't be null"); assert dir.hasReadLock(); {code} * The second assertion is redundant. # ReencryptionHandler#reencryptINode {code:title=ReencryptionHandler#reencryptINode} FileEncryptionInfo feInfo = FSDirEncryptionZoneOp .getFileEncryptionInfo(dir, INodesInPath.fromINode(inode)); if (feInfo == null || ezKeyVerName.equals(feInfo.getEzKeyVersionName())) { LOG.debug("File {} skipped re-encryption because edek's key version name" + " is not changed.", name); return false; } {code} * if feInfo is null, it would be unexpected — an INodeFile under an encryption zone is supposed to have FileEncryptionInfo. # ReencryptionHandler#submitCurrentBatch * It Allocates a 2000-element map, copy it over, and then clear the map. That looks suboptimal. Would it be feasible to wrap TreeMap and make a method that simply assigns the TreeMap reference to another currentBatch? # ReencryptionHandler.EDEKReencryptCallable: {code} // TODO: add reasonable retries here. {code} * I feel the retry should be handled by provider. * if reencryptEdeks() returns numFailures > 0, call() should not return a new ReencryptionTask object. > Add functionality to re-encrypt EDEKs > ------------------------------------- > > Key: HDFS-10899 > URL: https://issues.apache.org/jira/browse/HDFS-10899 > Project: Hadoop HDFS > Issue Type: New Feature > Components: encryption, kms > Reporter: Xiao Chen > Assignee: Xiao Chen > Attachments: editsStored, HDFS-10899.01.patch, HDFS-10899.02.patch, > HDFS-10899.03.patch, HDFS-10899.04.patch, HDFS-10899.05.patch, > HDFS-10899.06.patch, HDFS-10899.07.patch, HDFS-10899.08.patch, > HDFS-10899.09.patch, HDFS-10899.10.patch, HDFS-10899.10.wip.patch, > HDFS-10899.11.patch, HDFS-10899.12.patch, HDFS-10899.13.patch, > HDFS-10899.14.patch, HDFS-10899.wip.2.patch, HDFS-10899.wip.patch, Re-encrypt > edek design doc.pdf, Re-encrypt edek design doc V2.pdf > > > Currently when an encryption zone (EZ) key is rotated, it only takes effect > on new EDEKs. We should provide a way to re-encrypt EDEKs after the EZ key > rotation, for improved security. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org