[jira] [Commented] (HDFS-10899) Add functionality to re-encrypt EDEKs

Wei-Chiu Chuang (JIRA) Sun, 20 Aug 2017 23:55:28 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16134793#comment-16134793
 ]


Wei-Chiu Chuang commented on HDFS-10899:
----------------------------------------

Hi Xiao, thanks for the latest work!
I reviewed the latest patch (rev 014), mostly focused on ReencryptionHandler.

# ReencryptionHandler#(constructor)
{code:title=ReencryptionHandler#(constructor)}
if (reencryptBatchSize > 2000) {   // 2000 is based on buffer size = 512 * 
1024, and SetXAttr op size ~ 200.   
LOG.warn("Re-encryption batch size is {}. It could cause edit log buffer " 
      + "to be full an trigger a logSync within the writelock, greatly "
       + "impacting namenode throughput."); }
{code}
* The log needs to print reencryptBatchSize. 
* It might also improve readability by making the hard coded number (2000) a 
final static member variable.
* Please also add few more lines of comment to add the reference to 
QuorumJournalManager#outputBufferCapacity.


# ReencryptionHandler#batchService, and ReencryptionUpdater#batchService
can be declared as ExecutorCompletionService<ReencryptionTask>

# ReencryptionHandler#removeZone(): It looks like you can de-dup some code by 
reusing ReencryptionHandler#cancelZone().

# reencryptionHandler#reencryptEncryptionZone()
{code:title=reencryptionHandler#reencryptEncryptionZone()}
zs = getReencryptionStatus().getZoneStatus(zoneId); 
assert zs != null;
{code}
* zoneId is obtained when holding FSDirectory read lock,  release the lock, and 
then acquire FSDirectory read lock again.
* This assertion is only correct if there will be only one ReencryptionHandler 
running.

# ReencryptionStatus#updateZoneStatus() should check that zoneNode is an 
encryption zone.

# Why is currentBatch a TreeMap?

# Does ZoneReencryptionStatus#getLastProcessedFile return the relative path? or 
file name only? or absolute path?

# ReencryptionHandler#checkZoneReady
{code:title=ReencryptionHandler#checkZoneReady}
dir.getFSNamesystem()
     .checkNameNodeSafeMode("NN is in safe mode, cannot " + "re-encrypt");
{code}
* The “+” is redundant.

# ReencryptionHandler#reencryptDir
{code:title=ReencryptionHandler#reencryptDir}
curr = parent; 
curr = 
    reencryptDirInt(zoneId, currentBatch, curr, startAfters, ezKeyVerName);
{code}
Better rewrite as
{code} 
curr =
     reencryptDirInt(zoneId, currentBatch, parent, startAfters, ezKeyVerName);
{code}

# ReencryptionHandler#reencryptDirInt
{code:title=ReencryptionHandler#reencryptDirInt}
assert dir.hasReadLock();
 Preconditions.checkNotNull(curr, "Current inode can't be null"); 
assert dir.hasReadLock(); 
{code}
* The second assertion is redundant.

# ReencryptionHandler#reencryptINode
{code:title=ReencryptionHandler#reencryptINode}
FileEncryptionInfo feInfo = FSDirEncryptionZoneOp
     .getFileEncryptionInfo(dir, INodesInPath.fromINode(inode));
 if (feInfo == null || ezKeyVerName.equals(feInfo.getEzKeyVersionName())) {
   LOG.debug("File {} skipped re-encryption because edek's key version name" 
      + " is not changed.", name);
   return false;
 }
{code}
* if feInfo is null, it would be unexpected — an INodeFile under an encryption 
zone is supposed to have FileEncryptionInfo.

# ReencryptionHandler#submitCurrentBatch
* It Allocates a 2000-element map, copy it over, and then clear the map. That 
looks suboptimal. Would it be feasible to wrap TreeMap and make a method that 
simply assigns the TreeMap reference to another currentBatch?

# ReencryptionHandler.EDEKReencryptCallable:
{code}
// TODO: add reasonable retries here.
{code}
* I feel the retry should be handled by provider.
* if reencryptEdeks() returns numFailures > 0, call() should not return a new 
ReencryptionTask object.



> Add functionality to re-encrypt EDEKs
> -------------------------------------
>
>                 Key: HDFS-10899
>                 URL: https://issues.apache.org/jira/browse/HDFS-10899
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: encryption, kms
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>         Attachments: editsStored, HDFS-10899.01.patch, HDFS-10899.02.patch, 
> HDFS-10899.03.patch, HDFS-10899.04.patch, HDFS-10899.05.patch, 
> HDFS-10899.06.patch, HDFS-10899.07.patch, HDFS-10899.08.patch, 
> HDFS-10899.09.patch, HDFS-10899.10.patch, HDFS-10899.10.wip.patch, 
> HDFS-10899.11.patch, HDFS-10899.12.patch, HDFS-10899.13.patch, 
> HDFS-10899.14.patch, HDFS-10899.wip.2.patch, HDFS-10899.wip.patch, Re-encrypt 
> edek design doc.pdf, Re-encrypt edek design doc V2.pdf
>
>
> Currently when an encryption zone (EZ) key is rotated, it only takes effect 
> on new EDEKs. We should provide a way to re-encrypt EDEKs after the EZ key 
> rotation, for improved security.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-10899) Add functionality to re-encrypt EDEKs

Reply via email to