[ https://issues.apache.org/jira/browse/HDFS-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104471#comment-16104471 ]
Xiao Chen commented on HDFS-10899:
----------------------------------

Had a productive offline review session with [~andrew.wang], where we discussed several things. Thanks Andrew!

- By design snapshots are immutable, so even after re-encryption, the snapshots of the EZ will still have old edeks. If there was a security breach, admins need to remove old snapshots, or take manual measures (e.g. cp, or mv to a non-snapshot dir then re-encrypt). Should add this to docs. This jira should not attempt to touch snapshots.

- Perf - latency
I have been running this in a test cluster (with kerberos + SSL), with 1 sample EZ and 1M files. Cluster is on [GCE|https://cloud.google.com/compute/docs/machine-types], NN is n1-highmem-4, 2 KMS instances on n1-standard-2.
Some perf numbers (when all 1M files have old edeks):
-- 1 edek thread: 40~50 mins
-- 10 edek threads (5000 edeks per task): 13 mins
-- 30 edek threads (1000 edeks per task): 12 mins
Time to generate all the tasks for 1M files is ~10 seconds.
The bottleneck of the entire operation is contacting the KMS: from the NN side, the HTTPS connection to the KMS averaged single-digit milliseconds per request, whereas inside the KMS the actual re-encryption took only tens of microseconds. The default keep-alive of 5 connections is used, and the first 5 connections (clean setup) took even longer.
This led me to prototype a batched re-encryption interface on the KMS, and the perf of that is:
-- 20 threads (1000 edeks per task): 1.5 min
This fits well within our goal of 200M files within 8 hours (assuming linear scaling, 200M files would take about 5 hours).
Discussing with Andrew, we felt the batched API is the way to go. I will file another jira to add a batched re-encryption API to the KMS, and update this patch to use that.

- Perf - memory
The test above was done without any throttling. We should throttle the {{ReencryptionHandler}} when instantiating Callables, to keep NN memory sane.
The plan is to use a static calculation, so we only keep a configurable # of Callables in memory - the handler simply waits until a Callable is done and released before creating a new one. The default number will be calculated from the # of cores of the NN. We should also consider how many edeks there are per Callable. Will implement this soon.

- Perf - lock throttling
Ideally we'd also throttle the {{ReencryptionHandler}} to control what % of time it can hold the readlock, and similarly {{ReencryptionUpdater}} for the writelock. But since we already need to wait for the Callables, this comes naturally, i.e. we won't be holding a writelock continuously for a long time. So we may not implement this in v1, pending confirmation from further perf runs.

- Failure handling
Now that we use batched re-encryption, it makes sense to simply retry the entire Callable (hence the entire batch, since the batch fails as a whole in 1 call). It then sounds more admin-friendly to simply retry forever, with backoffs. If the admin finds this annoying, they can always cancel. This is better than the current behavior of failing the re-encryption after a few attempts, telling the admin, and forcing them to rerun the command.
Also should add fault injectors to unit test some failure scenarios.
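
To make the throttling and failure-handling ideas above concrete, here is a minimal Python sketch. It is not the patch (which is Java inside the NN) and does not use any Hadoop or KMS API: {{run_throttled}}, {{reencrypt_with_retry}}, {{max_in_flight}} and the {{reencrypt_batch}} callback are all hypothetical names invented for illustration. It assumes a failed batch surfaces as an {{IOError}}, keeps a bounded number of tasks alive with a semaphore, and retries each whole batch with exponential backoff until it succeeds or is cancelled.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def reencrypt_with_retry(batch, reencrypt_batch, cancelled,
                         base_delay=0.01, max_delay=1.0):
    """Retry one whole batch until it succeeds or `cancelled` is set."""
    delay = base_delay
    while not cancelled.is_set():
        try:
            return reencrypt_batch(batch)
        except IOError:
            # The batch fails as a unit, so back off and retry all of it.
            time.sleep(delay)
            delay = min(delay * 2, max_delay)
    return None  # cancelled before the batch succeeded


def run_throttled(batches, reencrypt_batch, max_in_flight=4, threads=2,
                  cancelled=None):
    """Run every batch while keeping at most `max_in_flight` tasks alive."""
    if cancelled is None:
        cancelled = threading.Event()
    slots = threading.BoundedSemaphore(max_in_flight)
    results, results_lock = [], threading.Lock()

    def task(batch):
        try:
            out = reencrypt_with_retry(batch, reencrypt_batch, cancelled)
            if out is not None:
                with results_lock:
                    results.append(out)
        finally:
            slots.release()  # free the slot so the submitter can continue

    with ThreadPoolExecutor(max_workers=threads) as pool:
        for batch in batches:
            slots.acquire()  # wait until an earlier task is done and released
            pool.submit(task, batch)
    # Leaving the `with` block waits for all submitted tasks to finish.
    return results
```

Results are collected in completion order, not submission order. Setting the {{cancelled}} event from another thread plays the role of the admin cancelling: batches still being retried stop at their next attempt.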

> Add functionality to re-encrypt EDEKs
> -------------------------------------
>
> Key: HDFS-10899
> URL: https://issues.apache.org/jira/browse/HDFS-10899
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: encryption, kms
> Reporter: Xiao Chen
> Assignee: Xiao Chen
> Attachments: editsStored, HDFS-10899.01.patch, HDFS-10899.02.patch,
> HDFS-10899.03.patch, HDFS-10899.04.patch, HDFS-10899.05.patch,
> HDFS-10899.06.patch, HDFS-10899.07.patch, HDFS-10899.08.patch,
> HDFS-10899.09.patch, HDFS-10899.10.patch, HDFS-10899.10.wip.patch,
> HDFS-10899.11.patch, HDFS-10899.wip.2.patch, HDFS-10899.wip.patch,
> Re-encrypt edek design doc.pdf, Re-encrypt edek design doc V2.pdf
>
>
> Currently when an encryption zone (EZ) key is rotated, it only takes effect
> on new EDEKs. We should provide a way to re-encrypt EDEKs after the EZ key
> rotation, for improved security.