[ https://issues.apache.org/jira/browse/HDFS-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Chen updated HDFS-10899:
-----------------------------
    Attachment: HDFS-10899.08.patch

Thanks very much for the soak testing and all these good comments, Andrew. Sorry this took a while to update.

Attaching patch 8:
- Refactored the {{ReencryptionHandler}} so the innards are easier to reason about. This simplifies the locking, gets rid of the {{subdirs}}, and contacts the KMS only after a full batch is ready, in a new method {{processCurrentBatch}}.
- From some local benchmarking, the communication overhead to the KMS seems to dominate (60%+ of the time), so we could add a re-encrypt batch API to the KMS and use it in the above. I think we could also multithread {{processCurrentBatch}} to push performance further.
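A minimal sketch of the batch-then-submit idea, assuming a hypothetical {{BatchReencryptor}} interface as a stand-in for a batched KMS call (no such KMS API exists in this patch) and plain strings in place of EDEKs. It is not the patch's {{ReencryptionHandler}}; it only illustrates how buffering during the zone walk plus a thread pool behind {{processCurrentBatch}} could overlap several KMS round trips.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical batched KMS call; assumed for illustration, not a real KMS API. */
interface BatchReencryptor {
  List<String> reencryptBatch(List<String> edeks) throws Exception;
}

/**
 * Hypothetical driver: accumulates EDEKs into fixed-size batches and submits
 * each full batch to a thread pool, so several KMS round trips can be in
 * flight while the zone walk continues.
 */
class BatchingReencryptDriver implements AutoCloseable {
  private final int batchSize;
  private final BatchReencryptor kms;
  private final ExecutorService pool;
  private final List<Future<List<String>>> pending = new ArrayList<>();
  private List<String> currentBatch = new ArrayList<>();

  BatchingReencryptDriver(int batchSize, int threads, BatchReencryptor kms) {
    this.batchSize = batchSize;
    this.kms = kms;
    this.pool = Executors.newFixedThreadPool(threads);
  }

  /** Called during the zone walk; only buffers, never contacts the KMS. */
  void add(String edek) {
    currentBatch.add(edek);
    if (currentBatch.size() >= batchSize) {
      processCurrentBatch();
    }
  }

  /** Hands the buffered batch to the pool and starts a fresh one. */
  void processCurrentBatch() {
    if (currentBatch.isEmpty()) {
      return;
    }
    final List<String> batch = currentBatch;
    currentBatch = new ArrayList<>();
    Callable<List<String>> task = () -> kms.reencryptBatch(batch);
    pending.add(pool.submit(task));
  }

  /** Flushes the last partial batch and collects results in submission order. */
  List<String> finish() throws Exception {
    processCurrentBatch();
    List<String> results = new ArrayList<>();
    for (Future<List<String>> f : pending) {
      results.addAll(f.get());
    }
    pending.clear();
    return results;
  }

  @Override
  public void close() {
    pool.shutdown();
  }
}
```

Results are gathered in submission order, so each re-encrypted EDEK can still be matched back to its file; the pool size bounds how many concurrent requests hit the KMS.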

This also addresses all comments above, with the following exceptions:
bq. This could be difficult with all the lock/unlocks and stages, but I'd 
prefer a goal-pause-time configuration for the {{run}} loop. This is easier for 
admins to reason about. We would still use the batch size for determining when 
to log a batch.
Good idea. I'll work on that.
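For reference, a rough sketch of what a goal-pause-time loop could look like, assuming a generic {{java.util.concurrent.locks.Lock}} in place of the NameNode locks and a hypothetical {{GoalPauseTimeLoop}} class. Each lock hold ends once the goal time has elapsed rather than after a fixed number of items; batch size would still decide when to log a batch, which is not shown here.

```java
import java.util.Iterator;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/**
 * Hypothetical run loop throttled by a goal pause time: each lock hold lasts
 * roughly up to goalPauseMillis, instead of a fixed item count.
 */
class GoalPauseTimeLoop {
  private final Lock lock;
  private final long goalPauseNanos;
  private final LongSupplier nanoClock;

  GoalPauseTimeLoop(Lock lock, long goalPauseMillis, LongSupplier nanoClock) {
    this.lock = lock;
    this.goalPauseNanos = TimeUnit.MILLISECONDS.toNanos(goalPauseMillis);
    this.nanoClock = nanoClock;
  }

  /** Processes all items; returns how many times the lock was acquired. */
  <T> int run(Iterator<T> items, Consumer<T> work) {
    int acquisitions = 0;
    while (items.hasNext()) {
      lock.lock();
      acquisitions++;
      try {
        long start = nanoClock.getAsLong();
        // Always handle at least one item per acquisition so the loop makes progress.
        do {
          work.accept(items.next());
        } while (items.hasNext() && nanoClock.getAsLong() - start < goalPauseNanos);
      } finally {
        // Releasing here lets other operations run before the next hold.
        lock.unlock();
      }
    }
    return acquisitions;
  }
}
```

The clock is injected so the behaviour can be checked deterministically; note that a single hold can overshoot the goal by the cost of one item.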
bq. Looks like we aren't using the op cache in FSEditLog SetXAttrOp / 
RemoveXAttrOp. I think this is accidental, could you do some research? 
Particularly since we'll be doing a lot of SetXAttrOps, avoiding all that 
object allocation would be nice. This could be a separate JIRA.
I tracked this back to the initial HDFS-6301 and pinged there, but got no response. I agree this is likely a bug and created HDFS-11410. Good find!
bq. Follow-on idea: it'd be nice for admins to be able to query the status of 
queued and running reencrypt commands. Progress indicators like submission 
time, start time, # skipped, # reencrypted, total # (if this is cheap to get) 
would be helpful.
I plan to add {{-status}} to the crypto command; still marking it as a TODO for now. :)
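As a strawman for the {{-status}} output, here is a hypothetical per-zone progress record covering the indicators listed above (submission time, start time, # skipped, # re-encrypted, and the total when it is cheap to get). The class name, states, and output format are illustrative assumptions only.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical per-zone progress record that a -status query could print.
 * Counters are atomic because the re-encrypt thread updates them while an
 * admin request may read them concurrently.
 */
class ZoneReencryptProgress {
  enum State { SUBMITTED, PROCESSING, COMPLETED }

  private final String zone;
  private final long submissionTimeMs;
  private final long totalFiles; // -1 when the total is not cheap to compute
  private volatile long startTimeMs = -1;
  private volatile State state = State.SUBMITTED;
  private final AtomicLong skipped = new AtomicLong();
  private final AtomicLong reencrypted = new AtomicLong();

  ZoneReencryptProgress(String zone, long submissionTimeMs, long totalFiles) {
    this.zone = zone;
    this.submissionTimeMs = submissionTimeMs;
    this.totalFiles = totalFiles;
  }

  void markStarted(long nowMs) {
    startTimeMs = nowMs;
    state = State.PROCESSING;
  }

  void markCompleted() {
    state = State.COMPLETED;
  }

  void fileSkipped() {
    skipped.incrementAndGet();
  }

  void fileReencrypted() {
    reencrypted.incrementAndGet();
  }

  /** One-line summary suitable for a status listing. */
  String summary() {
    String total = totalFiles < 0 ? "unknown" : Long.toString(totalFiles);
    return String.format(
        "%s state=%s submitted=%d started=%d skipped=%d reencrypted=%d total=%s",
        zone, state, submissionTimeMs, startTimeMs, skipped.get(), reencrypted.get(), total);
  }
}
```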
bq. Catching Exception in run is a code smell. What is the intent? It looks 
like we already catch the checked exceptions, so this will catch 
RuntimeExceptions (which are normally unrecoverable).
True, it's not ideal. I added it because re-encryption happens in a separate thread, and uncaught exceptions there go to stderr, which may or may not be collected. Logging the exception in the NameNode log improves supportability.
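One alternative worth considering: instead of catching {{Exception}} in {{run}}, install a {{Thread.UncaughtExceptionHandler}} on the re-encrypt thread, so unexpected {{RuntimeException}}s still reach the NameNode log without the loop swallowing them. The sketch uses {{java.util.logging}} only to stay self-contained (the NameNode would use its own logger), and {{LoggedDaemon}} is a hypothetical helper.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper: starts a worker whose unexpected exceptions are logged, not lost on stderr. */
class LoggedDaemon {
  static final Logger LOG = Logger.getLogger("ReencryptionHandlerSketch");

  static Thread start(String name, Runnable task) {
    Thread t = new Thread(task, name);
    t.setDaemon(true);
    // Invoked on the dying thread for any exception that escapes run();
    // the exception is logged with its stack trace and the thread then exits.
    t.setUncaughtExceptionHandler((thread, e) ->
        LOG.log(Level.SEVERE, "Unexpected exception in " + thread.getName(), e));
    t.start();
    return t;
  }
}
```

Unlike a blanket catch in {{run}}, this does not keep the loop going after an unexpected error; it only guarantees the failure is recorded where admins will look.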

Also, it almost feels like we should do the KMS batching now. Let me play with it and post an update no later than Wednesday.

> Add functionality to re-encrypt EDEKs.
> --------------------------------------
>
>                 Key: HDFS-10899
>                 URL: https://issues.apache.org/jira/browse/HDFS-10899
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: encryption, kms
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>         Attachments: editsStored, HDFS-10899.01.patch, HDFS-10899.02.patch, 
> HDFS-10899.03.patch, HDFS-10899.04.patch, HDFS-10899.05.patch, 
> HDFS-10899.06.patch, HDFS-10899.07.patch, HDFS-10899.08.patch, 
> HDFS-10899.wip.2.patch, HDFS-10899.wip.patch, Re-encrypt edek design doc.pdf
>
>
> Currently when an encryption zone (EZ) key is rotated, it only takes effect 
> on new EDEKs. We should provide a way to re-encrypt EDEKs after the EZ key 
> rotation, for improved security.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)