[ 
https://issues.apache.org/jira/browse/HDFS-13603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845461#comment-17845461
 ] 

ASF GitHub Bot commented on HDFS-13603:
---------------------------------------

simbadzina commented on code in PR #6774:
URL: https://github.com/apache/hadoop/pull/6774#discussion_r1597092684


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/ValueQueue.java:
##########
@@ -269,12 +269,23 @@ public ValueQueue(final int numValues, final float lowWaterMark, long expiry,
    * Initializes the Value Queues for the provided keys by calling the
    * fill Method with "numInitValues" values
    * @param keyNames Array of key Names
-   * @throws ExecutionException executionException.
+   * @throws IOException if no successful initialization for any key

Review Comment:
   The wording here is confusing. One way to read this is that if any key fails 
to initialize, then an exception will be thrown. But IIUC an exception will be 
thrown only if all keys fail to initialize.



##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirEncryptionZoneOp.java:
##########
@@ -537,12 +537,12 @@ static boolean isInAnEZ(final FSDirectory fsd, final INodesInPath iip)
    * then launch up a separate thread to warm them up.
    */
   static void warmUpEdekCache(final ExecutorService executor,
-      final FSDirectory fsd, final int delay, final int interval) {
+      final FSDirectory fsd, final int delay, final int interval, final int maxRetries) {

Review Comment:
   Can you add a comment to the function documentation indicating that the 
warm up is best effort?



##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirEncryptionZoneOp.java:
##########
@@ -580,15 +583,15 @@ public void run() {
       final int logCoolDown = 10000; // periodically print error log (if any)
       int sinceLastLog = logCoolDown; // always print the first failure
       boolean success = false;
+      int retryCount = 0;
       IOException lastSeenIOE = null;
       long warmUpEDEKStartTime = monotonicNow();
-      while (true) {
+
+      while (!success && retryCount < maxRetries) {
         try {
           kp.warmUpEncryptedKeys(keyNames);
-          NameNode.LOG
-              .info("Successfully warmed up {} EDEKs.", keyNames.length);
+          NameNode.LOG.info("Successfully warmed up {} EDEKs.", keyNames.length);
           success = true;
-          break;
         } catch (IOException ioe) {
           lastSeenIOE = ioe;
           if (sinceLastLog >= logCoolDown) {

Review Comment:
   `sinceLastLog` is no longer really needed. Since the retry count is now 
limited, you can just log every failure.
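
   One possible shape for that simplification, as a self-contained sketch rather 
than the actual patch. `BoundedWarmUp`, `WarmUpAction`, and `warmUpWithRetries` 
are hypothetical stand-ins for the loader's `run()` body and for 
`kp.warmUpEncryptedKeys(keyNames)`, and `System.err` stands in for 
`NameNode.LOG`. With the number of attempts capped by `maxRetries`, every 
failure is logged and the cool-down counter goes away.

```java
import java.io.IOException;

final class BoundedWarmUp {

  /** Hypothetical stand-in for kp.warmUpEncryptedKeys(keyNames). */
  @FunctionalInterface
  interface WarmUpAction {
    void run() throws IOException;
  }

  /**
   * Tries the action at most maxRetries times, logging every failure and
   * sleeping retryIntervalMs between failed attempts. Stops early on success
   * or if the thread is interrupted. Returns the number of attempts made.
   */
  static int warmUpWithRetries(WarmUpAction action, int maxRetries,
      long retryIntervalMs) {
    int attempts = 0;
    boolean success = false;
    while (!success && attempts < maxRetries) {
      attempts++;
      try {
        action.run();
        success = true;
      } catch (IOException ioe) {
        // No cool-down needed: at most maxRetries lines can ever be logged.
        System.err.println("Failed to warm up EDEKs (attempt " + attempts
            + " of " + maxRetries + "): " + ioe);
        if (attempts < maxRetries) {
          try {
            Thread.sleep(retryIntervalMs);
          } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            break;
          }
        }
      }
    }
    return attempts;
  }
}
```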



##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSDirEncryptionZoneOp.java:
##########
@@ -0,0 +1,59 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.namenode;
+
+import java.io.IOException;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.crypto.key.KeyProviderCryptoExtension;
+import org.apache.hadoop.hdfs.server.common.HdfsServerConstants.NamenodeRole;
+
+import org.junit.Test;
+
+import static org.mockito.ArgumentMatchers.any;
+import static org.mockito.Mockito.doThrow;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.times;
+import static org.mockito.Mockito.verify;
+
+public class TestFSDirEncryptionZoneOp {
+
+  @Test
+  public void testWarmUpEdekCacheRetries() throws IOException {
+    NameNode.initMetrics(new Configuration(), NamenodeRole.NAMENODE);
+
+    final int initialDelay = 100;
+    final int retryInterval = 100;
+    final int maxRetries = 2;
+
+    KeyProviderCryptoExtension kpMock = mock(KeyProviderCryptoExtension.class);
+
+    doThrow(new IOException())
+        .doThrow(new IOException())
+        .doAnswer(invocation -> null)
+        .when(kpMock).warmUpEncryptedKeys(any());
+
+    FSDirEncryptionZoneOp.EDEKCacheLoader loader =
+        new FSDirEncryptionZoneOp.EDEKCacheLoader(new String[] {"edek1", "edek2"}, kpMock,
+            initialDelay, retryInterval, maxRetries);
+
+    loader.run();
+
+    verify(kpMock, times(maxRetries)).warmUpEncryptedKeys(any());
+  }
+}

Review Comment:
   Can you add a test case in which one or some of the keys are successfully 
warmed up, while others aren't?
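
   A sketch of what such a case could check, written as plain Java without 
JUnit or Mockito so that it runs on its own. `PartialWarmUpSketch`, `KeyLoader`, 
`initializeQueues`, and `failingFor` are hypothetical stand-ins that mirror the 
patched `initializeQueuesForKeys` (throw only when every key fails); an actual 
test would presumably go through `ValueQueue` or a mocked key provider instead.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PartialWarmUpSketch {

  /** Hypothetical per-key loader; stands in for keyQueues.get(keyName). */
  interface KeyLoader {
    void load(String keyName) throws Exception;
  }

  /**
   * Mirrors the patched logic: every key is attempted, and an IOException is
   * thrown only if no key could be initialized. Returns the keys that were.
   */
  static List<String> initializeQueues(KeyLoader loader, String... keyNames)
      throws IOException {
    List<String> warmed = new ArrayList<>();
    Exception last = null;
    for (String keyName : keyNames) {
      try {
        loader.load(keyName);
        warmed.add(keyName);
      } catch (Exception e) {
        last = e; // remember the failure, keep going with the other keys
      }
    }
    if (keyNames.length > 0 && warmed.isEmpty()) {
      throw new IOException(
          "Failed to initialize any queue for the provided keys.", last);
    }
    return warmed;
  }

  /** Fake loader that fails only for the given key names. */
  static KeyLoader failingFor(String... badKeys) {
    Set<String> bad = new HashSet<>(Arrays.asList(badKeys));
    return keyName -> {
      if (bad.contains(keyName)) {
        throw new IOException("No such key: " + keyName);
      }
    };
  }
}
```

   The interesting assertions are that a mixed set such as `edek1`, `edek2`, 
`edek3` with only `edek2` failing does not throw and still warms the other two, 
while a set in which every key fails does throw.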



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/ValueQueue.java:
##########
@@ -269,12 +269,23 @@ public ValueQueue(final int numValues, final float lowWaterMark, long expiry,
    * Initializes the Value Queues for the provided keys by calling the
    * fill Method with "numInitValues" values
    * @param keyNames Array of key Names
-   * @throws ExecutionException executionException.
+   * @throws IOException if no successful initialization for any key
    */
-  public void initializeQueuesForKeys(String... keyNames)
-      throws ExecutionException {
+  public void initializeQueuesForKeys(String... keyNames) throws IOException {
+    int successfulInitializations = 0;
+    ExecutionException lastException = null;
+
     for (String keyName : keyNames) {
-      keyQueues.get(keyName);
+      try {
+        keyQueues.get(keyName);
+        successfulInitializations++;
+      } catch (ExecutionException e) {
+        lastException = e;
+      }
+    }
+
+    if (keyNames.length > 0 && successfulInitializations == 0) {
+      throw new IOException("Failed to initialize any queue for the provided keys.", lastException);

Review Comment:
   It seems you've made warm up a best effort operation. If so, there should be 
no need to throw an exception here. Just logging a warning should be enough.
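
   A minimal sketch of that alternative, under the assumption that warm up is 
purely best effort. `BestEffortInit`, `KeyLoader`, and `initializeBestEffort` 
are hypothetical stand-ins for the patched `initializeQueuesForKeys`, and 
`java.util.logging` stands in for whatever logger the real class uses.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class BestEffortInit {

  private static final Logger LOG =
      Logger.getLogger(BestEffortInit.class.getName());

  /** Hypothetical per-key loader; stands in for keyQueues.get(keyName). */
  interface KeyLoader {
    void load(String keyName) throws Exception;
  }

  /**
   * Attempts every key and never throws. If no key could be initialized,
   * a warning is logged instead. Returns the number of keys initialized.
   */
  static int initializeBestEffort(KeyLoader loader, String... keyNames) {
    int successfulInitializations = 0;
    Exception lastException = null;
    for (String keyName : keyNames) {
      try {
        loader.load(keyName);
        successfulInitializations++;
      } catch (Exception e) {
        lastException = e;
      }
    }
    if (keyNames.length > 0 && successfulInitializations == 0) {
      // Best effort: report the problem, but do not fail the caller.
      LOG.log(Level.WARNING,
          "Failed to initialize any queue for the provided keys.",
          lastException);
    }
    return successfulInitializations;
  }
}
```

   One consequence worth weighing: if this method never throws, a caller that 
retries on `IOException` would no longer see a failure when the KMS is down, so 
some other signal (the returned count in this sketch) would have to drive the 
retry.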





> Warmup NameNode EDEK thread retries continuously if there's an invalid key 
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-13603
>                 URL: https://issues.apache.org/jira/browse/HDFS-13603
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: encryption, namenode
>    Affects Versions: 2.8.0
>            Reporter: Antony Jay
>            Priority: Major
>              Labels: pull-request-available
>
> https://issues.apache.org/jira/browse/HDFS-9405 adds a background thread to 
> pre-warm EDEK cache. 
> However, this fails and retries continuously if key retrieval fails for one 
> encryption zone. In our use case, we have temporarily removed keys for certain 
> encryption zones. Currently the namenode and kms logs are filled with errors 
> related to the background thread retrying warmup forever.
> The pre-warm thread should:
>  * Continue to refresh other encryption zones even if it fails for one
>  * Retry only if it fails for all encryption zones, which will be the 
> case when kms is down.
>  


