[ 
https://issues.apache.org/jira/browse/HDFS-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16134793#comment-16134793
 ] 

Wei-Chiu Chuang commented on HDFS-10899:
----------------------------------------

Hi Xiao, thanks for the latest work!
I reviewed the latest patch (rev 014), mostly focused on ReencryptionHandler.

# ReencryptionHandler#(constructor)
{code:title=ReencryptionHandler#(constructor)}
if (reencryptBatchSize > 2000) {
  // 2000 is based on buffer size = 512 * 
1024, and SetXAttr op size ~ 200.
  
LOG.warn("Re-encryption batch size is {}. It could cause edit log buffer "

      + "to be full an trigger a logSync within the writelock, greatly "

      + "impacting namenode throughput.");
}
{code}
* The log needs to print reencryptBatchSize. 
* It might also improve readability by making the hard coded number (2000) a 
final static member variable.
* Please also add few more lines of comment to add the reference to 
QuorumJournalManager#outputBufferCapacity.


# ReencryptionHandler#batchService, and ReencryptionUpdater#batchService
can be declared as ExecutorCompletionService<ReencryptionTask>

# ReencryptionHandler#removeZone(): It looks like you can de-dup some code by 
reusing ReencryptionHandler#cancelZone().

# reencryptionHandler#reencryptEncryptionZone()
{code:title=reencryptionHandler#reencryptEncryptionZone()}
zs = getReencryptionStatus().getZoneStatus(zoneId);

assert zs != null;
{code}
* zoneId is obtained when holding FSDirectory read lock,  release the lock, and 
then acquire FSDirectory read lock again.
* This assertion is only correct if there will be only one ReencryptionHandler 
running.

# ReencryptionStatus#updateZoneStatus() should check that zoneNode is an 
encryption zone.

# Why is currentBatch a TreeMap?

# Does ZoneReencryptionStatus#getLastProcessedFile return the relative path? or 
file name only? or absolute path?

# ReencryptionHandler#checkZoneReady
{code:title=ReencryptionHandler#checkZoneReady}
dir.getFSNamesystem()

    .checkNameNodeSafeMode("NN is in safe mode, cannot " + "re-encrypt");
{code}
* The “+” is redundant.

# ReencryptionHandler#reencryptDir
{code:title=ReencryptionHandler#reencryptDir}
curr = parent;

curr =

    reencryptDirInt(zoneId, currentBatch, curr, startAfters, ezKeyVerName);
{code}
Better rewrite as
{code}

curr =

    reencryptDirInt(zoneId, currentBatch, parent, startAfters, ezKeyVerName);
{code}

# ReencryptionHandler#reencryptDirInt
{code:title=ReencryptionHandler#reencryptDirInt}
assert dir.hasReadLock();

Preconditions.checkNotNull(curr, "Current inode can't be null");

assert dir.hasReadLock(); 
{code}
* The second assertion is redundant.

# ReencryptionHandler#reencryptINode
{code:title=ReencryptionHandler#reencryptINode}
FileEncryptionInfo feInfo = FSDirEncryptionZoneOp

    .getFileEncryptionInfo(dir, INodesInPath.fromINode(inode));

if (feInfo == null || ezKeyVerName.equals(feInfo.getEzKeyVersionName())) {

  LOG.debug("File {} skipped re-encryption because edek's key version name"

      + " is not changed.", name);

  return false;

}
{code}
* if feInfo is null, it would be unexpected — an INodeFile under an encryption 
zone is supposed to have FileEncryptionInfo.

# ReencryptionHandler#submitCurrentBatch
* It Allocates a 2000-element map, copy it over, and then clear the map. That 
looks suboptimal. Would it be feasible to wrap TreeMap and make a method that 
simply assigns the TreeMap reference to another currentBatch?

# ReencryptionHandler.EDEKReencryptCallable:
{code}
// TODO: add reasonable retries here.
{code}
* I feel the retry should be handled by provider.
* if reencryptEdeks() returns numFailures > 0, call() should not return a new 
ReencryptionTask object.



> Add functionality to re-encrypt EDEKs
> -------------------------------------
>
>                 Key: HDFS-10899
>                 URL: https://issues.apache.org/jira/browse/HDFS-10899
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: encryption, kms
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>         Attachments: editsStored, HDFS-10899.01.patch, HDFS-10899.02.patch, 
> HDFS-10899.03.patch, HDFS-10899.04.patch, HDFS-10899.05.patch, 
> HDFS-10899.06.patch, HDFS-10899.07.patch, HDFS-10899.08.patch, 
> HDFS-10899.09.patch, HDFS-10899.10.patch, HDFS-10899.10.wip.patch, 
> HDFS-10899.11.patch, HDFS-10899.12.patch, HDFS-10899.13.patch, 
> HDFS-10899.14.patch, HDFS-10899.wip.2.patch, HDFS-10899.wip.patch, Re-encrypt 
> edek design doc.pdf, Re-encrypt edek design doc V2.pdf
>
>
> Currently when an encryption zone (EZ) key is rotated, it only takes effect 
> on new EDEKs. We should provide a way to re-encrypt EDEKs after the EZ key 
> rotation, for improved security.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to