[jira] [Work logged] (HDFS-16345) Fix test cases fail in TestBlockStoragePolicy
[ https://issues.apache.org/jira/browse/HDFS-16345?focusedWorklogId=690461=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690461 ] ASF GitHub Bot logged work on HDFS-16345: - Author: ASF GitHub Bot Created on: 04/Dec/21 07:36 Start Date: 04/Dec/21 07:36 Worklog Time Spent: 10m Work Description: ayushtkn commented on a change in pull request #3696: URL: https://github.com/apache/hadoop/pull/3696#discussion_r762397861 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockStoragePolicy.java ## @@ -1291,6 +1291,9 @@ public void testChooseTargetWithTopology() throws Exception { new HashSet(), 0, policy2, null); System.out.println(Arrays.asList(targets)); Assert.assertEquals(3, targets.length); +if (namenode != null) { + namenode.stop(); +} Review comment: This should be in a finally block, e.g.: Namenode namenode = new Namenode(conf); try { // do something } finally { if (namenode != null) { namenode.stop(); } } The reason being: if the test fails after the namenode has been created and the stop call is not in a finally block, it will never be executed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 690461) Time Spent: 1h 10m (was: 1h) > Fix test cases fail in TestBlockStoragePolicy > - > > Key: HDFS-16345 > URL: https://issues.apache.org/jira/browse/HDFS-16345 > Project: Hadoop HDFS > Issue Type: Improvement > Components: build >Affects Versions: 3.3.1 >Reporter: guophilipse >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > The test class `TestBlockStoragePolicy` fails frequently with a > `BindException`, which blocks all normal source code builds; 
we can improve it. > [ERROR] Tests run: 26, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 49.295 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestBlockStoragePolicy > [ERROR] > testChooseTargetWithTopology(org.apache.hadoop.hdfs.TestBlockStoragePolicy) > Time elapsed: 0.551 s <<< ERROR! java.net.BindException: Problem binding to > [localhost:43947] java.net.BindException: Address already in use; For more > details see: http://wiki.apache.org/hadoop/BindException at > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at > org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:931) at > org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:827) at > org.apache.hadoop.ipc.Server.bind(Server.java:657) at > org.apache.hadoop.ipc.Server$Listener.(Server.java:1352) at > org.apache.hadoop.ipc.Server.(Server.java:3252) at > org.apache.hadoop.ipc.RPC$Server.(RPC.java:1062) at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server.(ProtobufRpcEngine2.java:468) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2.getServer(ProtobufRpcEngine2.java:371) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:853) at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.(NameNodeRpcServer.java:466) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:860) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:766) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:1017) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:992) > at > org.apache.hadoop.hdfs.TestBlockStoragePolicy.testChooseTargetWithTopology(TestBlockStoragePolicy.java:1275) > at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at >
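The cleanup pattern the reviewer suggests for the NameNode in testChooseTargetWithTopology can be illustrated with a minimal, self-contained sketch. `FakeNameNode` below is a hypothetical stand-in for the real NameNode; only the try/finally shape matters:

```java
// A minimal sketch of the reviewer's suggestion, assuming a hypothetical
// FakeNameNode in place of the real NameNode: cleanup goes in a finally
// block so it runs even when the test body throws.
public class FinallyCleanupSketch {
    static class FakeNameNode {
        boolean stopped = false;
        void stop() { stopped = true; }
    }

    // Returns the node so callers can verify stop() ran on both paths.
    static FakeNameNode runTest(boolean fail) {
        FakeNameNode namenode = null;
        try {
            namenode = new FakeNameNode();
            if (fail) {
                throw new RuntimeException("test body failed after NameNode creation");
            }
        } catch (RuntimeException e) {
            // swallowed for the demo; a real test would let JUnit report it
        } finally {
            if (namenode != null) {
                namenode.stop();  // executed even when the body throws
            }
        }
        return namenode;
    }

    public static void main(String[] args) {
        if (!runTest(true).stopped || !runTest(false).stopped) {
            throw new AssertionError("stop() was skipped");
        }
        System.out.println("stop() ran on both the failing and passing paths");
    }
}
```

Without the finally block, the failing path would leave the RPC port bound, which is exactly how a later test run hits the BindException above.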
[jira] [Work logged] (HDFS-16338) Fix error configuration message in FSImage
[ https://issues.apache.org/jira/browse/HDFS-16338?focusedWorklogId=690460=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690460 ] ASF GitHub Bot logged work on HDFS-16338: - Author: ASF GitHub Bot Created on: 04/Dec/21 07:30 Start Date: 04/Dec/21 07:30 Worklog Time Spent: 10m Work Description: ayushtkn commented on a change in pull request #3684: URL: https://github.com/apache/hadoop/pull/3684#discussion_r762397398 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java ## @@ -275,6 +276,29 @@ public void testSaveAndLoadStripedINodeFile() throws IOException{ } } + @Test + public void testImportCheckpoint() { +Configuration conf = new Configuration(); +conf.set(DFSConfigKeys.DFS_NAMENODE_CHECKPOINT_EDITS_DIR_KEY, ""); +MiniDFSCluster cluster = null; +try { Review comment: Can use try with resources for cluster ## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImage.java ## @@ -275,6 +276,29 @@ public void testSaveAndLoadStripedINodeFile() throws IOException{ } } + @Test + public void testImportCheckpoint() { +Configuration conf = new Configuration(); +conf.set(DFSConfigKeys.DFS_NAMENODE_CHECKPOINT_EDITS_DIR_KEY, ""); +MiniDFSCluster cluster = null; +try { + cluster = new MiniDFSCluster.Builder(conf).build(); + cluster.waitActive(); + FSNamesystem fsn = cluster.getNamesystem(); + FSImage fsImage= new FSImage(conf); + fsImage.doImportCheckpoint(fsn); + fail("Expect to throw IOException."); +} catch (IOException e) { + GenericTestUtils.assertExceptionContains( + "Cannot import image from a checkpoint. " + + "\"dfs.namenode.checkpoint.edits.dir\" is not set.", e); Review comment: Use LambdaTestUtils instead of try-catch-assert -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 690460) Time Spent: 2h 20m (was: 2h 10m) > Fix error configuration message in FSImage > -- > > Key: HDFS-16338 > URL: https://issues.apache.org/jira/browse/HDFS-16338 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.3.1 >Reporter: guophilipse >Priority: Minor > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > `dfs.namenode.checkpoint.edits.dir` may be different from > `dfs.namenode.checkpoint.dir` , if `checkpointEditsDirs` is null or empty, > error message should warn the edit dir configuration, we can fix it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
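The two suggestions above (try-with-resources for the cluster, and a LambdaTestUtils-style `intercept` instead of try-catch-fail-assert) can be sketched together using only JDK classes. `FakeCluster` and the `intercept` helper below are hypothetical stand-ins for MiniDFSCluster and Hadoop's LambdaTestUtils:

```java
import java.util.concurrent.Callable;

// A sketch of both review suggestions, assuming hypothetical stand-ins:
// FakeCluster for MiniDFSCluster (which is AutoCloseable), and a tiny
// intercept() in the spirit of Hadoop's LambdaTestUtils.
public class InterceptSketch {
    static class FakeCluster implements AutoCloseable {
        boolean closed = false;
        @Override public void close() { closed = true; }
        void doImportCheckpoint() throws java.io.IOException {
            throw new java.io.IOException(
                "Cannot import image from a checkpoint. "
                + "\"dfs.namenode.checkpoint.edits.dir\" is not set.");
        }
    }

    // Minimal intercept: run the action, require the expected exception type
    // and message fragment, and return the caught exception.
    static <E extends Throwable> E intercept(
            Class<E> clazz, String contained, Callable<Object> action) {
        try {
            action.call();
        } catch (Throwable t) {
            if (clazz.isInstance(t) && t.getMessage() != null
                    && t.getMessage().contains(contained)) {
                return clazz.cast(t);
            }
            throw new AssertionError("unexpected exception: " + t, t);
        }
        throw new AssertionError("expected " + clazz.getName() + " was not thrown");
    }

    public static void main(String[] args) throws Exception {
        FakeCluster observed;
        // try-with-resources closes the cluster even if the body throws
        try (FakeCluster cluster = new FakeCluster()) {
            observed = cluster;
            intercept(java.io.IOException.class,
                "\"dfs.namenode.checkpoint.edits.dir\" is not set.",
                () -> { cluster.doImportCheckpoint(); return null; });
        }
        System.out.println("cluster closed: " + observed.closed);
    }
}
```

The `intercept` shape removes the easy-to-forget `fail(...)` call: if the action does not throw, the helper itself fails the test.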
[jira] [Work logged] (HDFS-16324) fix error log in BlockManagerSafeMode
[ https://issues.apache.org/jira/browse/HDFS-16324?focusedWorklogId=690459=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690459 ] ASF GitHub Bot logged work on HDFS-16324: - Author: ASF GitHub Bot Created on: 04/Dec/21 07:26 Start Date: 04/Dec/21 07:26 Worklog Time Spent: 10m Work Description: ayushtkn commented on a change in pull request #3661: URL: https://github.com/apache/hadoop/pull/3661#discussion_r762397173 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManagerSafeMode.java ## @@ -41,7 +42,6 @@ import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertFalse; import static org.junit.Assert.assertTrue; - Review comment: nit: revert this change Issue Time Tracking --- Worklog Id: (was: 690459) Time Spent: 2h 50m (was: 2h 40m) > fix error log in BlockManagerSafeMode > - > > Key: HDFS-16324 > URL: https://issues.apache.org/jira/browse/HDFS-16324 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.3.1 >Reporter: guophilipse >Priority: Minor > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > If `recheckInterval` is set to an invalid value, a warning is logged, but > the message is not quite appropriate; we can improve it.
[jira] [Commented] (HDFS-13947) Review of DirectoryScanner Class
[ https://issues.apache.org/jira/browse/HDFS-13947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17453298#comment-17453298 ] Ayush Saxena commented on HDFS-13947: - Hey Folks, Observed this while checking HDFS-16347. This patch changed the default value in DfsConfigKeys but not in hdfs-defaults: {code:java} DFS_DATANODE_DIRECTORYSCAN_THROTTLE_LIMIT_MS_PER_SEC_KEY = "dfs.datanode.directoryscan.throttle.limit.ms.per.sec"; public static final int - DFS_DATANODE_DIRECTORYSCAN_THROTTLE_LIMIT_MS_PER_SEC_DEFAULT = 1000; + DFS_DATANODE_DIRECTORYSCAN_THROTTLE_LIMIT_MS_PER_SEC_DEFAULT = -1; {code} Was that a miss or the change here is accidental. If we changed the default value we should have put that in the release notes for others to know. Let me know if that was intentional, if so we can get HDFS-16347 in and update release notes there > Review of DirectoryScanner Class > > > Key: HDFS-13947 > URL: https://issues.apache.org/jira/browse/HDFS-13947 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.2.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Fix For: 3.3.0 > > Attachments: HDFS-13947.1.patch, HDFS-13947.2.patch, > HDFS-13947.3.patch, HDFS-13947.4.patch, HDFS-13947.5.patch > > > Review of Directory Scanner. Replaced a lot of code with Guava MultiMap. > Some general house cleaning and improved logging. For performance, using > {{ArrayList}} instead of {{LinkedList}} where possible, especially since > these lists can be quite large a LinkedList will consume a lot of memory and > be slow to sort/iterate over. > https://stackoverflow.com/questions/322715/when-to-use-linkedlist-over-arraylist-in-java -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16347) Fix directory scan throttle default value
[ https://issues.apache.org/jira/browse/HDFS-16347?focusedWorklogId=690453=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690453 ] ASF GitHub Bot logged work on HDFS-16347: - Author: ASF GitHub Bot Created on: 04/Dec/21 06:35 Start Date: 04/Dec/21 06:35 Worklog Time Spent: 10m Work Description: ayushtkn commented on a change in pull request #3703: URL: https://github.com/apache/hadoop/pull/3703#discussion_r76239 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml ## @@ -874,7 +874,7 @@ dfs.datanode.directoryscan.throttle.limit.ms.per.sec - 1000 + -1 Review comment: This is not just a doc change, This change will indeed change the default. Let me confirm on the original jira as well -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 690453) Time Spent: 1h 10m (was: 1h) > Fix directory scan throttle default value > - > > Key: HDFS-16347 > URL: https://issues.apache.org/jira/browse/HDFS-16347 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Affects Versions: 3.3.1 >Reporter: guophilipse >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > `dfs.datanode.directoryscan.throttle.limit.ms.per.sec` was changed from > `1000` to `-1` by default after HDFS-13947, we can improve the doc -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16347) Fix directory scan throttle default value
[ https://issues.apache.org/jira/browse/HDFS-16347?focusedWorklogId=690452=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690452 ] ASF GitHub Bot logged work on HDFS-16347: - Author: ASF GitHub Bot Created on: 04/Dec/21 06:34 Start Date: 04/Dec/21 06:34 Worklog Time Spent: 10m Work Description: ayushtkn commented on a change in pull request #3703: URL: https://github.com/apache/hadoop/pull/3703#discussion_r76239 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml ## @@ -874,7 +874,7 @@ dfs.datanode.directoryscan.throttle.limit.ms.per.sec - 1000 + -1 Review comment: This is not just a doc change, This change will indeed change the default. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 690452) Time Spent: 1h (was: 50m) > Fix directory scan throttle default value > - > > Key: HDFS-16347 > URL: https://issues.apache.org/jira/browse/HDFS-16347 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Affects Versions: 3.3.1 >Reporter: guophilipse >Priority: Minor > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > `dfs.datanode.directoryscan.throttle.limit.ms.per.sec` was changed from > `1000` to `-1` by default after HDFS-13947, we can improve the doc -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16322) The NameNode implementation of ClientProtocol.truncate(...) can cause data loss.
[ https://issues.apache.org/jira/browse/HDFS-16322?focusedWorklogId=690449=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690449 ] ASF GitHub Bot logged work on HDFS-16322: - Author: ASF GitHub Bot Created on: 04/Dec/21 06:28 Start Date: 04/Dec/21 06:28 Worklog Time Spent: 10m Work Description: ayushtkn commented on a change in pull request #3705: URL: https://github.com/apache/hadoop/pull/3705#discussion_r762392267 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java ## @@ -1100,20 +1100,29 @@ public void rename2(String src, String dst, Options.Rename... options) } @Override // ClientProtocol - public boolean truncate(String src, long newLength, String clientName) - throws IOException { + public boolean truncate(String src, long newLength, String clientName) throws IOException { Review comment: nit: only formatting change and unrelated, Please avoid ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java ## @@ -1100,20 +1100,29 @@ public void rename2(String src, String dst, Options.Rename... 
options) } @Override // ClientProtocol - public boolean truncate(String src, long newLength, String clientName) - throws IOException { + public boolean truncate(String src, long newLength, String clientName) throws IOException { checkNNStartup(); -stateChangeLog -.debug("*DIR* NameNode.truncate: " + src + " to " + newLength); +if(stateChangeLog.isDebugEnabled()) { + stateChangeLog.debug("*DIR* NameNode.truncate: " + src + " to " + + newLength); +} +CacheEntryWithPayload cacheEntry = RetryCache.waitForCompletion(retryCache, null); +if (cacheEntry != null && cacheEntry.isSuccess()) { + return (boolean)cacheEntry.getPayload(); +} + String clientMachine = getClientMachine(); +boolean ret = false; try { - return namesystem.truncate( + ret = namesystem.truncate( src, newLength, clientName, clientMachine, now()); } finally { + RetryCache.setState(cacheEntry, true, ret); Review comment: The finally block will be executed in case of exception as well, so we cannot hard-code `true` here. Check other code, such as `append`, for reference. ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java ## @@ -1100,20 +1100,29 @@ public void rename2(String src, String dst, Options.Rename... options) } @Override // ClientProtocol - public boolean truncate(String src, long newLength, String clientName) - throws IOException { + public boolean truncate(String src, long newLength, String clientName) throws IOException { checkNNStartup(); -stateChangeLog -.debug("*DIR* NameNode.truncate: " + src + " to " + newLength); +if(stateChangeLog.isDebugEnabled()) { + stateChangeLog.debug("*DIR* NameNode.truncate: " + src + " to " + + newLength); +} +CacheEntryWithPayload cacheEntry = RetryCache.waitForCompletion(retryCache, null); Review comment: I think we should have `namesystem.checkOperation(OperationCategory.WRITE);` above this, like the other calls, and remove this check before the lock in FSNamesystem. 
## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java ## @@ -1100,20 +1100,29 @@ public void rename2(String src, String dst, Options.Rename... options) } @Override // ClientProtocol - public boolean truncate(String src, long newLength, String clientName) - throws IOException { + public boolean truncate(String src, long newLength, String clientName) throws IOException { checkNNStartup(); -stateChangeLog -.debug("*DIR* NameNode.truncate: " + src + " to " + newLength); +if(stateChangeLog.isDebugEnabled()) { + stateChangeLog.debug("*DIR* NameNode.truncate: " + src + " to " + + newLength); +} +CacheEntryWithPayload cacheEntry = RetryCache.waitForCompletion(retryCache, null); +if (cacheEntry != null && cacheEntry.isSuccess()) { + return (boolean)cacheEntry.getPayload(); +} + String clientMachine = getClientMachine(); +boolean ret = false; try { - return namesystem.truncate( + ret = namesystem.truncate( src, newLength, clientName, clientMachine, now()); } finally { + RetryCache.setState(cacheEntry, true, ret); metrics.incrFilesTruncated(); } +return ret; } - + Review comment: nit: unrelated change, revert!! -- This is an automated message from the Apache Git Service. To respond to the
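The RetryCache point from the review above can be sketched with plain JDK code: the finally block also runs on exceptions, so the success flag passed to setState must reflect the real outcome. `FakeRetryCache` and the `truncate` method below are hypothetical stand-ins for Hadoop's RetryCache and NameNodeRpcServer#truncate:

```java
// A sketch of the reviewer's point, assuming a hypothetical FakeRetryCache in
// place of Hadoop's RetryCache: record the actual outcome in the finally
// block, never a hard-coded true.
public class RetryCacheSketch {
    static class FakeRetryCache {
        Boolean recordedSuccess = null;
        Object recordedPayload = null;
        void setState(boolean success, Object payload) {
            recordedSuccess = success;
            recordedPayload = payload;
        }
    }

    // Hypothetical stand-in for NameNodeRpcServer#truncate's control flow.
    static boolean truncate(FakeRetryCache cache, boolean throwInBody) throws Exception {
        boolean ret = false;
        boolean success = false;      // stays false if the body throws
        try {
            if (throwInBody) {
                throw new Exception("truncate failed");
            }
            ret = true;               // result of the real namesystem.truncate(...)
            success = true;
        } finally {
            cache.setState(success, ret);  // record the actual outcome
        }
        return ret;
    }

    public static void main(String[] args) {
        FakeRetryCache cache = new FakeRetryCache();
        try {
            truncate(cache, true);
        } catch (Exception expected) {
            // the failure path still reaches setState, with success=false
        }
        System.out.println("after failure, recordedSuccess=" + cache.recordedSuccess);
    }
}
```

Hard-coding `true` would make a retried client read a cached "success" for an operation that actually failed, which is the data-loss risk the review is guarding against.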
[jira] [Work logged] (HDFS-16370) Fix assert message for BlockInfo
[ https://issues.apache.org/jira/browse/HDFS-16370?focusedWorklogId=690448=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690448 ] ASF GitHub Bot logged work on HDFS-16370: - Author: ASF GitHub Bot Created on: 04/Dec/21 05:53 Start Date: 04/Dec/21 05:53 Worklog Time Spent: 10m Work Description: ayushtkn commented on a change in pull request #3747: URL: https://github.com/apache/hadoop/pull/3747#discussion_r762390295 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java ## @@ -146,7 +146,7 @@ BlockInfo getNext(int index) { BlockInfo info = (BlockInfo)triplets[index*3+2]; assert info == null || info.getClass().getName().startsWith( BlockInfo.class.getName()) : -"BlockInfo is expected at " + index*3; +"BlockInfo is expected at " + (index*3+2); Review comment: Same as above, Can you pad some space between the values ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockInfo.java ## @@ -136,7 +136,7 @@ BlockInfo getPrevious(int index) { BlockInfo info = (BlockInfo)triplets[index*3+1]; assert info == null || info.getClass().getName().startsWith(BlockInfo.class.getName()) : -"BlockInfo is expected at " + index*3; +"BlockInfo is expected at " + (index*3+1); Review comment: nit: Better to have some space around the values ```suggestion "BlockInfo is expected at " + (index * 3 + 1); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 690448) Time Spent: 0.5h (was: 20m) > Fix assert message for BlockInfo > > > Key: HDFS-16370 > URL: https://issues.apache.org/jira/browse/HDFS-16370 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: tomscut >Assignee: tomscut >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > In both methods BlockInfo#getPrevious and BlockInfo#getNext, the assert > message is wrong. This may cause some misunderstanding and needs to be fixed. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
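The corrected assert messages follow from the triplets layout. A minimal sketch, using `String` as a hypothetical stand-in for `BlockInfo`:

```java
// A sketch of the triplets layout behind the corrected messages: for block i,
// slot 3*i holds the storage reference and slots 3*i+1 / 3*i+2 hold the
// previous / next BlockInfo, so each assert message should report the slot
// that was actually read. String stands in for BlockInfo here.
public class TripletsSketch {
    static Object[] triplets = new Object[9];  // room for three blocks

    static Object getPrevious(int index) {
        Object info = triplets[index * 3 + 1];
        assert info == null || info instanceof String
            : "BlockInfo is expected at " + (index * 3 + 1);  // not index * 3
        return info;
    }

    static Object getNext(int index) {
        Object info = triplets[index * 3 + 2];
        assert info == null || info instanceof String
            : "BlockInfo is expected at " + (index * 3 + 2);  // not index * 3
        return info;
    }

    public static void main(String[] args) {
        triplets[1] = "prev-of-block-0";
        triplets[2] = "next-of-block-0";
        System.out.println(getPrevious(0) + " / " + getNext(0));
    }
}
```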
[jira] [Work logged] (HDFS-16351) add path exception information in FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-16351?focusedWorklogId=690447=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690447 ] ASF GitHub Bot logged work on HDFS-16351: - Author: ASF GitHub Bot Created on: 04/Dec/21 05:49 Start Date: 04/Dec/21 05:49 Worklog Time Spent: 10m Work Description: ayushtkn commented on a change in pull request #3713: URL: https://github.com/apache/hadoop/pull/3713#discussion_r762390099 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSNamesystem.java ## @@ -120,6 +123,23 @@ public void testStartupSafemode() throws IOException { + "isInSafeMode still returned false", fsn.isInSafeMode()); } + @Test + public void testCheckAccess() throws IOException { +Configuration conf = new Configuration(); +FSImage fsImage = Mockito.mock(FSImage.class); Review comment: Remove this, one test is enough, we need not to mock and try ## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSPermission.java ## @@ -30,6 +30,7 @@ import java.util.Map; import java.util.Random; +import org.apache.hadoop.test.GenericTestUtils; Review comment: import order seems wrong, the import should be in org.apache.hadoop. block with the others. 
## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSPermission.java ## @@ -260,6 +261,33 @@ private void createAndCheckPermission(OpType op, Path name, short umask, checkPermission(name, expectedPermission, delete); } + @Test + public void testFSNamesystemCheckAccess() throws Exception { +Path testValidDir = new Path("/test1"); +Path testValidFile = new Path("/test1/file1"); +Path testInvalidPath = new Path("/test2"); +fs = FileSystem.get(conf); + +fs.mkdirs(testValidDir); +fs.create(testValidFile); + +fs.access(testValidDir, FsAction.READ); +fs.access(testValidFile, FsAction.READ); + +assertTrue(fs.exists(testValidDir)); +assertTrue(fs.exists(testValidFile)); + +try { + fs.access(testInvalidPath, FsAction.READ); + fail("Failed to get expected FileNotFoundException"); +} catch (FileNotFoundException e) { + GenericTestUtils.assertExceptionContains( + "Path not found: " + testInvalidPath, e); +} finally { + fs.delete(testValidDir, true); +} + } + Review comment: This is like testing normal fs.access also which isn't required, we just changed the exception, we can test that only. Can use LambdaTestUtils for that rather than the present try-catch. Something like this should do: ``` @Test public void testFSNamesystemCheckAccess() throws Exception { Path testInvalidPath = new Path("/test2"); fs = FileSystem.get(conf); LambdaTestUtils.intercept(FileNotFoundException.class, "Path not found: " + testInvalidPath, () -> fs.access(testInvalidPath, FsAction.READ)); } ``` ## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSPermission.java ## @@ -289,7 +317,7 @@ public void testImmutableFsPermission() throws IOException { fs.setPermission(new Path("/"), FsPermission.createImmutable((short)0777)); } - + Review comment: unrelated, revert -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 690447) Time Spent: 2h 10m (was: 2h) > add path exception information in FSNamesystem > -- > > Key: HDFS-16351 > URL: https://issues.apache.org/jira/browse/HDFS-16351 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.3.1 >Reporter: guophilipse >Priority: Minor > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > > Add path information to the exception message to make it clearer in > FSNamesystem.
[jira] [Work logged] (HDFS-16369) RBF: Fix the retry logic of RouterRpcServer#invokeAtAvailableNs
[ https://issues.apache.org/jira/browse/HDFS-16369?focusedWorklogId=690442=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690442 ] ASF GitHub Bot logged work on HDFS-16369: - Author: ASF GitHub Bot Created on: 04/Dec/21 05:25 Start Date: 04/Dec/21 05:25 Worklog Time Spent: 10m Work Description: ayushtkn commented on pull request #3745: URL: https://github.com/apache/hadoop/pull/3745#issuecomment-985971627 Merged, Thanx @goiri and @tomscut for the review!!! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 690442) Time Spent: 1.5h (was: 1h 20m) > RBF: Fix the retry logic of RouterRpcServer#invokeAtAvailableNs > --- > > Key: HDFS-16369 > URL: https://issues.apache.org/jira/browse/HDFS-16369 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > As of now invokeAtAvailableNs, retries only once if the default or the first > namespace is not available, despite having other namespaces available. > Optimise to retry on all namespaces. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-16369) RBF: Fix the retry logic of RouterRpcServer#invokeAtAvailableNs
[ https://issues.apache.org/jira/browse/HDFS-16369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena resolved HDFS-16369. - Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed > RBF: Fix the retry logic of RouterRpcServer#invokeAtAvailableNs > --- > > Key: HDFS-16369 > URL: https://issues.apache.org/jira/browse/HDFS-16369 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > As of now invokeAtAvailableNs, retries only once if the default or the first > namespace is not available, despite having other namespaces available. > Optimise to retry on all namespaces. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16369) RBF: Fix the retry logic of RouterRpcServer#invokeAtAvailableNs
[ https://issues.apache.org/jira/browse/HDFS-16369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17453290#comment-17453290 ] Ayush Saxena commented on HDFS-16369: - Committed to trunk. Thanx Everyone for the review!!! > RBF: Fix the retry logic of RouterRpcServer#invokeAtAvailableNs > --- > > Key: HDFS-16369 > URL: https://issues.apache.org/jira/browse/HDFS-16369 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > As of now invokeAtAvailableNs, retries only once if the default or the first > namespace is not available, despite having other namespaces available. > Optimise to retry on all namespaces. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16369) RBF: Fix the retry logic of RouterRpcServer#invokeAtAvailableNs
[ https://issues.apache.org/jira/browse/HDFS-16369?focusedWorklogId=690441=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690441 ] ASF GitHub Bot logged work on HDFS-16369: - Author: ASF GitHub Bot Created on: 04/Dec/21 05:24 Start Date: 04/Dec/21 05:24 Worklog Time Spent: 10m Work Description: ayushtkn merged pull request #3745: URL: https://github.com/apache/hadoop/pull/3745 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 690441) Time Spent: 1h 20m (was: 1h 10m) > RBF: Fix the retry logic of RouterRpcServer#invokeAtAvailableNs > --- > > Key: HDFS-16369 > URL: https://issues.apache.org/jira/browse/HDFS-16369 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > As of now invokeAtAvailableNs, retries only once if the default or the first > namespace is not available, despite having other namespaces available. > Optimise to retry on all namespaces. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16371) Exclude slow disks when choosing volume
tomscut created HDFS-16371:
---------------------------

             Summary: Exclude slow disks when choosing volume
                 Key: HDFS-16371
                 URL: https://issues.apache.org/jira/browse/HDFS-16371
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: tomscut
            Assignee: tomscut

Currently, the datanode can detect slow disks. When choosing a volume, we can exclude these slow disks according to some rules. This will prevent slow disks from affecting the throughput of the whole datanode.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
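A minimal sketch of the proposed behaviour, assuming a round-robin choice over volumes and a set of disks already flagged as slow by the datanode's detection logic. The class and method names are illustrative, not the DataNode's actual `VolumeChoosingPolicy` interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative sketch of excluding detected slow disks when choosing a
// volume for a new block replica. Names here are hypothetical stand-ins.
public class SlowDiskAwareChooser {
    private int nextIndex = 0;

    /** Round-robin over volumes, skipping those flagged as slow. */
    public String chooseVolume(List<String> volumes, Set<String> slowDisks) {
        List<String> candidates = new ArrayList<>();
        for (String v : volumes) {
            if (!slowDisks.contains(v)) {
                candidates.add(v);
            }
        }
        // Fall back to all volumes if everything is flagged slow, so a
        // write never fails outright just because of the filter.
        if (candidates.isEmpty()) {
            candidates = volumes;
        }
        String chosen = candidates.get(nextIndex % candidates.size());
        nextIndex++;
        return chosen;
    }

    public static void main(String[] args) {
        SlowDiskAwareChooser chooser = new SlowDiskAwareChooser();
        List<String> vols = List.of("/data1", "/data2", "/data3");
        Set<String> slow = Set.of("/data2");
        String first = chooser.chooseVolume(vols, slow);
        String second = chooser.chooseVolume(vols, slow);
        if (first.equals("/data2") || second.equals("/data2")) {
            throw new AssertionError("slow disk was chosen");
        }
        System.out.println(first + " " + second);
    }
}
```

The fallback branch reflects the "according to some rules" caveat in the issue: an exclusion filter needs an escape hatch for the degenerate case where every disk looks slow.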
[jira] [Work logged] (HDFS-16370) Fix assert message for BlockInfo
[ https://issues.apache.org/jira/browse/HDFS-16370?focusedWorklogId=690265=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690265 ] ASF GitHub Bot logged work on HDFS-16370: - Author: ASF GitHub Bot Created on: 03/Dec/21 19:07 Start Date: 03/Dec/21 19:07 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3747: URL: https://github.com/apache/hadoop/pull/3747#issuecomment-985760522 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 1m 1s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 35m 22s | | trunk passed | | +1 :green_heart: | compile | 1m 28s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | compile | 1m 19s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | checkstyle | 0m 57s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 27s | | trunk passed | | +1 :green_heart: | javadoc | 1m 1s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 31s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 3m 23s | | trunk passed | | +1 :green_heart: | shadedclient | 25m 40s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 19s | | the patch passed | | +1 :green_heart: | compile | 1m 22s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javac | 1m 22s | | the patch passed | | +1 :green_heart: | compile | 1m 14s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | javac | 1m 14s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 52s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 19s | | the patch passed | | +1 :green_heart: | javadoc | 0m 52s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 | | +1 :green_heart: | javadoc | 1m 24s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | +1 :green_heart: | spotbugs | 3m 26s | | the patch passed | | +1 :green_heart: | shadedclient | 25m 39s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 331m 40s | | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 40s | | The patch does not generate ASF License warnings. 
| | | | 440m 7s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3747/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/3747 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux aac445805a8d 4.15.0-153-generic #160-Ubuntu SMP Thu Jul 29 06:54:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 8cc00d6045598c7dfee290975fa04ecf6438d371 | | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3747/1/testReport/ | | Max. process+thread count | 2118 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3747/1/console | | versions |
[jira] [Commented] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
[ https://issues.apache.org/jira/browse/HDFS-16293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17453162#comment-17453162 ] Hadoop QA commented on HDFS-16293: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 42s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 1s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 6s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 41s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 41s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 3s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 20s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient 
{color} | {color:green} 22m 53s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 39s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 10s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 32m 19s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. {color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 5m 38s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 27s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 2s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 20s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 5m 20s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/748/artifact/out/diff-compile-javac-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt{color} | {color:red} hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 5 new + 646 unchanged - 0 fixed = 651 total (was 646) {color} | | {color:green}+1{color} | {color:green} compile {color} | 
{color:green} 5m 1s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 5m 1s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/748/artifact/out/diff-compile-javac-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt{color} | {color:red} hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 5 new + 623 unchanged - 0 fixed = 628 total (was 623) {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 5s{color} | {color:orange}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/748/artifact/out/diff-checkstyle-hadoop-hdfs-project.txt{color} | {color:orange}
[jira] [Work logged] (HDFS-16369) RBF: Fix the retry logic of RouterRpcServer#invokeAtAvailableNs
[ https://issues.apache.org/jira/browse/HDFS-16369?focusedWorklogId=690203=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690203 ] ASF GitHub Bot logged work on HDFS-16369: - Author: ASF GitHub Bot Created on: 03/Dec/21 17:30 Start Date: 03/Dec/21 17:30 Worklog Time Spent: 10m Work Description: goiri commented on a change in pull request #3745: URL: https://github.com/apache/hadoop/pull/3745#discussion_r762121422 ## File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterRPCMultipleDestinationMountTableResolver.java ## @@ -668,14 +674,16 @@ public void testInvokeAtAvailableNs() throws IOException { // Make one subcluster unavailable. MiniDFSCluster dfsCluster = cluster.getCluster(); dfsCluster.shutdownNameNode(0); +dfsCluster.shutdownNameNode(1); try { // Verify that #invokeAtAvailableNs works by calling #getServerDefaults. RemoteMethod method = new RemoteMethod("getServerDefaults"); FsServerDefaults serverDefaults = rpcServer.invokeAtAvailableNs(method, FsServerDefaults.class); assertNotNull(serverDefaults); Review comment: Yes, the flakiness is not ideal. Let's go with this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 690203) Time Spent: 1h 10m (was: 1h) > RBF: Fix the retry logic of RouterRpcServer#invokeAtAvailableNs > --- > > Key: HDFS-16369 > URL: https://issues.apache.org/jira/browse/HDFS-16369 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > As of now invokeAtAvailableNs, retries only once if the default or the first > namespace is not available, despite having other namespaces available. > Optimise to retry on all namespaces. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-16314) Support to make dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka resolved HDFS-16314.
----------------------------------
    Fix Version/s: 3.4.0
                   3.3.3
       Resolution: Fixed

Committed to trunk and branch-3.3. Thanks [~haiyang Hu] for your contribution!

> Support to make dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled reconfigurable
> ---------------------------------------------------------------------------------------------
>
>                 Key: HDFS-16314
>                 URL: https://issues.apache.org/jira/browse/HDFS-16314
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.3
>
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Consider making dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled reconfigurable, so the feature introduced by HDFS-16076 can be rolled back quickly if unexpected problems occur in a production environment.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Updated] (HDFS-16287) Support to make dfs.namenode.avoid.read.slow.datanode reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated HDFS-16287:
---------------------------------
    Fix Version/s: 3.3.3

Backported to branch-3.3 to backport HDFS-16314.

> Support to make dfs.namenode.avoid.read.slow.datanode reconfigurable
> --------------------------------------------------------------------
>
>                 Key: HDFS-16287
>                 URL: https://issues.apache.org/jira/browse/HDFS-16287
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.3
>
>          Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> 1. Consider making dfs.namenode.avoid.read.slow.datanode reconfigurable, so the feature introduced by [HDFS-16076|https://issues.apache.org/jira/browse/HDFS-16076] can be rolled back quickly if unexpected problems occur in a production environment.
> 2. Control whether DatanodeManager#startSlowPeerCollector runs via the parameter 'dfs.datanode.peer.stats.enabled'.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Work logged] (HDFS-16314) Support to make dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16314?focusedWorklogId=690195=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690195 ] ASF GitHub Bot logged work on HDFS-16314: - Author: ASF GitHub Bot Created on: 03/Dec/21 17:20 Start Date: 03/Dec/21 17:20 Worklog Time Spent: 10m Work Description: aajisaka commented on pull request #3664: URL: https://github.com/apache/hadoop/pull/3664#issuecomment-985693281 Merged. Thank you @haiyang1987 for your contribution and thank you @ferhui @tomscut for your review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 690195) Time Spent: 3h 50m (was: 3h 40m) > Support to make > dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled reconfigurable > - > > Key: HDFS-16314 > URL: https://issues.apache.org/jira/browse/HDFS-16314 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > Time Spent: 3h 50m > Remaining Estimate: 0h > > Consider that make > dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled reconfigurable > and rapid rollback in case this feature HDFS-16076 unexpected things happen > in production environment -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16314) Support to make dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16314?focusedWorklogId=690194=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690194 ] ASF GitHub Bot logged work on HDFS-16314: - Author: ASF GitHub Bot Created on: 03/Dec/21 17:19 Start Date: 03/Dec/21 17:19 Worklog Time Spent: 10m Work Description: aajisaka merged pull request #3664: URL: https://github.com/apache/hadoop/pull/3664 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 690194) Time Spent: 3h 40m (was: 3.5h) > Support to make > dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled reconfigurable > - > > Key: HDFS-16314 > URL: https://issues.apache.org/jira/browse/HDFS-16314 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > > Consider that make > dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled reconfigurable > and rapid rollback in case this feature HDFS-16076 unexpected things happen > in production environment -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16314) Support to make dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16314?focusedWorklogId=690193=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690193 ]

ASF GitHub Bot logged work on HDFS-16314:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Dec/21 17:17
            Start Date: 03/Dec/21 17:17
    Worklog Time Spent: 10m

Work Description: aajisaka commented on a change in pull request #3664:
URL: https://github.com/apache/hadoop/pull/3664#discussion_r762113040

## File path:
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicy.java

## @@ -261,4 +261,16 @@ protected String getRack(final DatanodeInfo datanode) {
       }
     }
   }
+
+  /**
+   * Updates the value used for excludeSlowNodesEnabled, which is set by
+   * {@code DFSConfigKeys.DFS_NAMENODE_BLOCKPLACEMENTPOLICY_EXCLUDE_SLOW_NODES_ENABLED_KEY}
+   * initially.
+   *
+   * @param enable true, we will filter out slow nodes
+   * when choosing targets for blocks, otherwise false not filter.
+   */
+  public abstract void setExcludeSlowNodesEnabled(boolean enable);
+
+  public abstract boolean getExcludeSlowNodesEnabled();

Review comment: This interface is marked as `@Private`, so adding abstract methods is okay.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 690193) Time Spent: 3.5h (was: 3h 20m) > Support to make > dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled reconfigurable > - > > Key: HDFS-16314 > URL: https://issues.apache.org/jira/browse/HDFS-16314 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > Time Spent: 3.5h > Remaining Estimate: 0h > > Consider that make > dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled reconfigurable > and rapid rollback in case this feature HDFS-16076 unexpected things happen > in production environment -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
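The reconfigurable-property pattern discussed in this thread can be illustrated with a simplified stand-in. `ReconfigurableFlag` below is hypothetical: it only models the idea of flipping the exclude-slow-nodes switch at runtime, not the NameNode's real reconfiguration machinery.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of a live-reconfigurable boolean: the key name is
// the one discussed in the thread, the class itself is a stand-in.
public class ReconfigurableFlag {
    public static final String EXCLUDE_SLOW_NODES_KEY =
        "dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled";

    private final Map<String, Boolean> liveConfig = new ConcurrentHashMap<>();

    public ReconfigurableFlag(boolean initial) {
        liveConfig.put(EXCLUDE_SLOW_NODES_KEY, initial);
    }

    /** Apply a runtime change, as an admin reconfig command would. */
    public void reconfigureProperty(String key, String newValue) {
        if (!EXCLUDE_SLOW_NODES_KEY.equals(key)) {
            throw new IllegalArgumentException("not reconfigurable: " + key);
        }
        liveConfig.put(key, Boolean.parseBoolean(newValue));
    }

    public boolean getExcludeSlowNodesEnabled() {
        return liveConfig.get(EXCLUDE_SLOW_NODES_KEY);
    }

    public static void main(String[] args) {
        ReconfigurableFlag nn = new ReconfigurableFlag(true);
        // Rapid rollback: disable the feature without a restart.
        nn.reconfigureProperty(EXCLUDE_SLOW_NODES_KEY, "false");
        if (nn.getExcludeSlowNodesEnabled()) throw new AssertionError();
        System.out.println("exclude-slow-nodes=" + nn.getExcludeSlowNodesEnabled());
    }
}
```

This is the "rapid rollback" motivation from the issue description in miniature: the flag can be turned off in production without restarting the service that holds it.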
[jira] [Commented] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
[ https://issues.apache.org/jira/browse/HDFS-16293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17453142#comment-17453142 ] Hadoop QA commented on HDFS-16293: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 47s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} {color} | {color:green} 0m 0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 13s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 26s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 8s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 37s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 41s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 42s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient 
{color} | {color:green} 26m 31s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 58s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 26s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 37m 32s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. {color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 6m 40s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 28s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 23s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 50s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 6m 50s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/747/artifact/out/diff-compile-javac-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt{color} | {color:red} hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 5 new + 647 unchanged - 0 fixed = 652 total (was 647) {color} | | {color:green}+1{color} | {color:green} compile {color} | 
{color:green} 6m 19s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 6m 19s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/747/artifact/out/diff-compile-javac-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt{color} | {color:red} hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 5 new + 624 unchanged - 0 fixed = 629 total (was 624) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 21s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 46s{color} |
[jira] [Work logged] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-16303?focusedWorklogId=690153=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690153 ]

ASF GitHub Bot logged work on HDFS-16303:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Dec/21 16:28
            Start Date: 03/Dec/21 16:28
    Worklog Time Spent: 10m

Work Description: KevinWikant commented on a change in pull request #3675:
URL: https://github.com/apache/hadoop/pull/3675#discussion_r762078547

## File path:
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminBackoffMonitor.java

## @@ -189,6 +190,30 @@ public void run() {
      * node will be removed from tracking by the pending cancel.
      */
     processCancelledNodes();
+
+    // Having more nodes decommissioning than can be tracked will impact decommissioning
+    // performance due to queueing delay
+    int numTrackedNodes = outOfServiceNodeBlocks.size();
+    int numQueuedNodes = getPendingNodes().size();
+    int numDecommissioningNodes = numTrackedNodes + numQueuedNodes;
+    if (numDecommissioningNodes > maxConcurrentTrackedNodes) {
+      LOG.warn(
+          "There are {} nodes decommissioning but only {} nodes will be tracked at a time. "
+              + "{} nodes are currently queued waiting to be decommissioned.",
+          numDecommissioningNodes, maxConcurrentTrackedNodes, numQueuedNodes);
+
+      // Re-queue unhealthy nodes to make space for decommissioning healthy nodes
+      final List<DatanodeDescriptor> unhealthyDns = outOfServiceNodeBlocks.keySet().stream()
+          .filter(dn -> !blockManager.isNodeHealthyForDecommissionOrMaintenance(dn))
+          .collect(Collectors.toList());
+      final List<DatanodeDescriptor> toRequeue =
+          identifyUnhealthyNodesToRequeue(unhealthyDns, numDecommissioningNodes);
+      for (DatanodeDescriptor dn : toRequeue) {
+        getPendingNodes().add(dn);
+        outOfServiceNodeBlocks.remove(dn);

Review comment: I think I may also need to remove from "pendingRep" here

-- This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 690153) Time Spent: 6h 50m (was: 6h 40m) > Losing over 100 datanodes in state decommissioning results in full blockage > of all datanode decommissioning > --- > > Key: HDFS-16303 > URL: https://issues.apache.org/jira/browse/HDFS-16303 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.1, 3.3.1 >Reporter: Kevin Wikant >Priority: Major > Labels: pull-request-available > Time Spent: 6h 50m > Remaining Estimate: 0h > > h2. Impact > HDFS datanode decommissioning does not make any forward progress. For > example, the user adds X datanodes to the "dfs.hosts.exclude" file and all X > of those datanodes remain in state decommissioning forever without making any > forward progress towards being decommissioned. > h2. Root Cause > The HDFS Namenode class "DatanodeAdminManager" is responsible for > decommissioning datanodes. > As per this "hdfs-site" configuration: > {quote}Config = dfs.namenode.decommission.max.concurrent.tracked.nodes > Default Value = 100 > The maximum number of decommission-in-progress datanodes nodes that will be > tracked at one time by the namenode. Tracking a decommission-in-progress > datanode consumes additional NN memory proportional to the number of blocks > on the datnode. Having a conservative limit reduces the potential impact of > decomissioning a large number of nodes at once. A value of 0 means no limit > will be enforced. > {quote} > The Namenode will only actively track up to 100 datanodes for decommissioning > at any given time, as to avoid Namenode memory pressure. 
> Looking into the "DatanodeAdminManager" code: > * a new datanode is only removed from the "tracked.nodes" set when it > finishes decommissioning > * a new datanode is only added to the "tracked.nodes" set if there is fewer > than 100 datanodes being tracked > So in the event that there are more than 100 datanodes being decommissioned > at a given time, some of those datanodes will not be in the "tracked.nodes" > set until 1 or more datanodes in the "tracked.nodes" finishes > decommissioning. This is generally not a problem because the datanodes in > "tracked.nodes" will eventually finish decommissioning, but
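The patch under review reduces to a small standalone model: when tracked plus queued nodes exceed the limit, unhealthy tracked nodes are pushed back onto the pending queue so healthy ones can take their slots. The types below are simplified stand-ins for `DatanodeDescriptor` and the admin monitor's collections, not the real Hadoop classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.function.Predicate;

// Simplified model of the re-queue idea from the HDFS-16303 patch.
public class DecommissionRequeue {
    final Set<String> trackedNodes = new LinkedHashSet<>();
    final Queue<String> pendingNodes = new ArrayDeque<>();
    final int maxTracked;

    DecommissionRequeue(int maxTracked) { this.maxTracked = maxTracked; }

    /** Re-queue unhealthy tracked nodes when the tracked set is saturated. */
    void requeueUnhealthy(Predicate<String> isHealthy) {
        int decommissioning = trackedNodes.size() + pendingNodes.size();
        if (decommissioning <= maxTracked) {
            return;                  // no queueing pressure, nothing to do
        }
        List<String> unhealthy = new ArrayList<>();
        for (String dn : trackedNodes) {
            if (!isHealthy.test(dn)) {
                unhealthy.add(dn);
            }
        }
        for (String dn : unhealthy) {
            trackedNodes.remove(dn); // free a tracked slot
            pendingNodes.add(dn);    // retry later, behind healthy nodes
        }
    }

    public static void main(String[] args) {
        DecommissionRequeue m = new DecommissionRequeue(3);
        m.trackedNodes.add("dead1");
        m.trackedNodes.add("dead2");
        m.trackedNodes.add("dead3");
        m.pendingNodes.add("live1"); // blocked behind the dead nodes
        m.requeueUnhealthy(dn -> dn.startsWith("live"));
        if (!m.trackedNodes.isEmpty()) throw new AssertionError();
        if (m.pendingNodes.size() != 4) throw new AssertionError();
        System.out.println("requeued: " + m.pendingNodes);
    }
}
```

In the model, the three dead nodes that had filled every tracked slot are moved behind the healthy node in the queue, so decommissioning can make forward progress — exactly the blockage the bug report describes.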
[jira] [Work logged] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-16303?focusedWorklogId=690149&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690149 ]

ASF GitHub Bot logged work on HDFS-16303:
- Author: ASF GitHub Bot
- Created on: 03/Dec/21 16:27
- Start Date: 03/Dec/21 16:27
- Worklog Time Spent: 10m

Work Description: KevinWikant commented on a change in pull request #3675:
URL: https://github.com/apache/hadoop/pull/3675#discussion_r762077696

File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommission.java

@@ -1654,4 +1658,139 @@ public Boolean get() {
     cleanupFile(fileSys, file);
   }

+  /**
+   * Test DatanodeAdminManager logic to re-queue unhealthy decommissioning nodes
+   * which are blocking the decommissioning of healthy nodes.
+   * Force the tracked nodes set to be filled with nodes lost while decommissioning,
+   * then decommission healthy nodes & validate they are decommissioned eventually.
+   */
+  @Test(timeout = 12)
+  public void testRequeueUnhealthyDecommissioningNodes() throws Exception {
+    // Allow 3 datanodes to be decommissioned at a time
+    getConf().setInt(DFSConfigKeys.DFS_NAMENODE_DECOMMISSION_MAX_CONCURRENT_TRACKED_NODES, 3);
+    // Disable the normal monitor runs
+    getConf()
+        .setInt(MiniDFSCluster.DFS_NAMENODE_DECOMMISSION_INTERVAL_TESTING_KEY, Integer.MAX_VALUE);
+
+    // Start cluster with 6 datanodes
+    startCluster(1, 6);
+    final FSNamesystem namesystem = getCluster().getNamesystem();
+    final BlockManager blockManager = namesystem.getBlockManager();
+    final DatanodeManager datanodeManager = blockManager.getDatanodeManager();
+    final DatanodeAdminManager decomManager = datanodeManager.getDatanodeAdminManager();
+    assertEquals(6, getCluster().getDataNodes().size());
+
+    // 3 datanodes will be "live" datanodes that are expected to be decommissioned eventually
+    final List<DatanodeDescriptor> liveNodes = getCluster().getDataNodes().subList(3, 6).stream()
+        .map(dn -> getDatanodeDesriptor(namesystem, dn.getDatanodeUuid()))
+        .collect(Collectors.toList());
+    assertEquals(3, liveNodes.size());
+
+    // 3 datanodes will be "dead" datanodes that are expected to never be decommissioned
+    final List<DatanodeDescriptor> deadNodes = getCluster().getDataNodes().subList(0, 3).stream()
+        .map(dn -> getDatanodeDesriptor(namesystem, dn.getDatanodeUuid()))
+        .collect(Collectors.toList());
+    assertEquals(3, deadNodes.size());
+
+    // Need to create some data or "isNodeHealthyForDecommissionOrMaintenance"
+    // may unexpectedly return true for a dead node
+    writeFile(getCluster().getFileSystem(), new Path("/tmp/test1"), 1, 100);
+
+    // Cause the 3 "dead" nodes to be lost while in state decommissioning
+    // and fill the tracked nodes set with those 3 "dead" nodes
+    ArrayList<DatanodeDescriptor> decommissionedNodes = Lists.newArrayList();
+    int expectedNumTracked = 0;
+    for (final DatanodeDescriptor deadNode : deadNodes) {

Review comment: The "waitFor" should be placed after the for loop so that the nodes can be stopped in parallel; this will improve the runtime of the test.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 690149)
Time Spent: 6h 40m (was: 6.5h)

> Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
> ---
>
> Key: HDFS-16303
> URL: https://issues.apache.org/jira/browse/HDFS-16303
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.10.1, 3.3.1
> Reporter: Kevin Wikant
> Priority: Major
> Labels: pull-request-available
> Time Spent: 6h 40m
> Remaining Estimate: 0h
>
> h2. Impact
> HDFS datanode decommissioning does not make any forward progress. For example, the user adds X datanodes to the "dfs.hosts.exclude" file and all X of those datanodes remain in state decommissioning forever without making any forward progress towards being decommissioned.
> h2. Root Cause
> The HDFS Namenode class "DatanodeAdminManager" is responsible for decommissioning datanodes.
> As per this "hdfs-site" configuration:
> {quote}Config = dfs.namenode.decommission.max.concurrent.tracked.nodes
> Default Value = 100
> The maximum number of decommission-in-progress datanodes that will be tracked at one time by the namenode. Tracking a
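The reviewer's suggestion above (fire off every stop, then wait for all of them after the loop) can be sketched without the Hadoop test harness. The `Node` class below is a hypothetical stand-in for a datanode whose shutdown completes asynchronously; it is not the MiniDFSCluster API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ParallelStopSketch {

  // Hypothetical stand-in for a datanode whose stop() takes ~100 ms to take effect.
  static class Node {
    private final CountDownLatch stopped = new CountDownLatch(1);

    void stop() {
      // Fire-and-forget: the shutdown completes asynchronously on another thread.
      new Thread(() -> {
        try {
          Thread.sleep(100);
        } catch (InterruptedException ignored) {
          Thread.currentThread().interrupt();
        }
        stopped.countDown();
      }).start();
    }

    boolean awaitStopped() {
      try {
        return stopped.await(5, TimeUnit.SECONDS);
      } catch (InterruptedException e) {
        return false;
      }
    }
  }

  // Stop every node first, then wait for all of them: total wall time is
  // roughly one stop latency instead of the sum over all nodes.
  static boolean stopAllThenWait(List<Node> nodes) {
    for (Node n : nodes) {
      n.stop(); // kick off all stops without blocking
    }
    boolean allStopped = true;
    for (Node n : nodes) {
      allStopped &= n.awaitStopped(); // the "waitFor" moved after the loop
    }
    return allStopped;
  }

  public static void main(String[] args) {
    List<Node> nodes = new ArrayList<>();
    for (int i = 0; i < 3; i++) {
      nodes.add(new Node());
    }
    long start = System.nanoTime();
    boolean ok = stopAllThenWait(nodes);
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    // Three 100 ms stops overlap, so the whole wait finishes well under 300 ms.
    System.out.println(ok && elapsedMs < 300 ? "parallel" : "serial");
  }
}
```

Interleaving `stop()` and `waitFor` inside the same loop would instead serialize the latencies, which is the test-runtime cost the reviewer is pointing out.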
[jira] [Work logged] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-16303?focusedWorklogId=690147&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690147 ]

ASF GitHub Bot logged work on HDFS-16303:
- Author: ASF GitHub Bot
- Created on: 03/Dec/21 16:26
- Start Date: 03/Dec/21 16:26
- Worklog Time Spent: 10m

Work Description: KevinWikant commented on a change in pull request #3675:
URL: https://github.com/apache/hadoop/pull/3675#discussion_r762076676

File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommission.java

+    // Need to create some data or "isNodeHealthyForDecommissionOrMaintenance"
+    // may unexpectedly return true for a dead node
+    writeFile(getCluster().getFileSystem(), new Path("/tmp/test1"), 1, 100);

Review comment: A larger replication factor should be used here to ensure there are LowRedundancy blocks.

Issue Time Tracking
---
Worklog Id: (was: 690147)
Time Spent: 6.5h (was: 6h 20m)

> Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
> ---
>
> Key: HDFS-16303
> URL: https://issues.apache.org/jira/browse/HDFS-16303
> Time Spent: 6.5h
> Remaining Estimate: 0h
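The replication-factor point in the review comment above can be made concrete with a small counting argument: a block can only be guaranteed LowRedundancy after the "dead" nodes stop if its replication factor exceeds the number of surviving datanodes. The sketch below is a simplified model of the NameNode's redundancy check, not Hadoop code; all names are hypothetical.

```java
public class LowRedundancySketch {

  // Simplified model of the NameNode's check: a block is "low redundancy"
  // when it has fewer live replicas than its target replication factor.
  static boolean isLowRedundancy(int targetReplication, int liveReplicas) {
    return liveReplicas < targetReplication;
  }

  // Best case for a block after some nodes are stopped: as many replicas as
  // possible sit on surviving nodes (at most one replica per node).
  static int bestCaseLiveReplicas(int targetReplication, int liveNodes) {
    return Math.min(targetReplication, liveNodes);
  }

  public static void main(String[] args) {
    int liveNodes = 3; // 6-node cluster, 3 "dead" nodes stopped by the test

    // Replication factor 1: all replicas may land on surviving nodes, so the
    // test is NOT guaranteed to observe a LowRedundancy block.
    System.out.println(isLowRedundancy(1, bestCaseLiveReplicas(1, liveNodes)));

    // Replication factor above the surviving-node count: at least one replica
    // must have lived on a stopped node, so LowRedundancy is guaranteed.
    System.out.println(isLowRedundancy(4, bestCaseLiveReplicas(4, liveNodes)));
  }
}
```

This is why a `writeFile` with replication 1 on a 6-node cluster may leave the dead nodes looking "healthy for decommission" even after they are stopped.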
[jira] [Work logged] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-16303?focusedWorklogId=690146&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690146 ]

ASF GitHub Bot logged work on HDFS-16303:
- Author: ASF GitHub Bot
- Created on: 03/Dec/21 16:25
- Start Date: 03/Dec/21 16:25
- Worklog Time Spent: 10m

Work Description: KevinWikant commented on a change in pull request #3675:
URL: https://github.com/apache/hadoop/pull/3675#discussion_r762076295

File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDecommission.java

+    // Start cluster with 6 datanodes
+    startCluster(1, 6);

Review comment: The number of nodes in this test can probably be reduced.

Issue Time Tracking
---
Worklog Id: (was: 690146)
Time Spent: 6h 20m (was: 6h 10m)

> Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
> ---
>
> Key: HDFS-16303
> URL: https://issues.apache.org/jira/browse/HDFS-16303
> Time Spent: 6h 20m
> Remaining Estimate: 0h
[jira] [Work logged] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-16303?focusedWorklogId=690140&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690140 ]

ASF GitHub Bot logged work on HDFS-16303:
- Author: ASF GitHub Bot
- Created on: 03/Dec/21 16:20
- Start Date: 03/Dec/21 16:20
- Worklog Time Spent: 10m

Work Description: KevinWikant commented on pull request #3675:
URL: https://github.com/apache/hadoop/pull/3675#issuecomment-985650485

I would also add that if you look at the implementation of the proposed alternative, removing a dead DECOMMISSION_INPROGRESS node from the DatanodeAdminManager (https://github.com/apache/hadoop/pull/3746/files), it is no less complex than this change, due to the aforementioned caveats that need to be dealt with.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 690140)
Time Spent: 6h 10m (was: 6h)

> Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
> ---
>
> Key: HDFS-16303
> URL: https://issues.apache.org/jira/browse/HDFS-16303
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.10.1, 3.3.1
> Reporter: Kevin Wikant
> Priority: Major
> Labels: pull-request-available
> Time Spent: 6h 10m
> Remaining Estimate: 0h
>
> h2. Impact
> HDFS datanode decommissioning does not make any forward progress. For example, the user adds X datanodes to the "dfs.hosts.exclude" file and all X of those datanodes remain in state decommissioning forever without making any forward progress towards being decommissioned.
> h2. Root Cause
> The HDFS Namenode class "DatanodeAdminManager" is responsible for decommissioning datanodes.
> As per this "hdfs-site" configuration:
> {quote}Config = dfs.namenode.decommission.max.concurrent.tracked.nodes
> Default Value = 100
> The maximum number of decommission-in-progress datanodes that will be tracked at one time by the namenode. Tracking a decommission-in-progress datanode consumes additional NN memory proportional to the number of blocks on the datanode. Having a conservative limit reduces the potential impact of decommissioning a large number of nodes at once. A value of 0 means no limit will be enforced.
> {quote}
> The Namenode will only actively track up to 100 datanodes for decommissioning at any given time, so as to avoid Namenode memory pressure.
> Looking into the "DatanodeAdminManager" code:
> * a datanode is only removed from the "tracked.nodes" set when it finishes decommissioning
> * a datanode is only added to the "tracked.nodes" set if there are fewer than 100 datanodes being tracked
> So in the event that there are more than 100 datanodes being decommissioned at a given time, some of those datanodes will not be in the "tracked.nodes" set until 1 or more datanodes in "tracked.nodes" finish decommissioning. This is generally not a problem because the datanodes in "tracked.nodes" will eventually finish decommissioning, but there is an edge case where this logic prevents the namenode from making any forward progress towards decommissioning.
> If all 100 datanodes in "tracked.nodes" are unable to finish decommissioning, then other datanodes (which may be able to be decommissioned) will never get added to "tracked.nodes" and therefore will never get the opportunity to be decommissioned.
> This can occur due to the following issue:
> {quote}2021-10-21 12:39:24,048 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockManager (DatanodeAdminMonitor-0): Node W.X.Y.Z:50010 is dead while in Decommission In Progress. Cannot be safely decommissioned or be in maintenance since there is risk of reduced data durability or data loss. Either restart the failed node or force decommissioning or maintenance by removing, calling refreshNodes, then re-adding to the excludes or host config files.
> {quote}
> If a Datanode is lost while decommissioning (for example if the underlying hardware fails or is lost), then it will remain in state decommissioning forever.
> If 100 or more Datanodes are lost while decommissioning over the Hadoop cluster lifetime, then this is enough to completely fill up the "tracked.nodes" set. With the entire "tracked.nodes" set filled with datanodes that can never finish decommissioning, any
[jira] [Work logged] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-16303?focusedWorklogId=690136&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690136 ]

ASF GitHub Bot logged work on HDFS-16303:
- Author: ASF GitHub Bot
- Created on: 03/Dec/21 16:16
- Start Date: 03/Dec/21 16:16
- Worklog Time Spent: 10m

Work Description: KevinWikant commented on pull request #3675:
URL: https://github.com/apache/hadoop/pull/3675#issuecomment-985647896

@sodonnel The existing test "TestDecommissioningStatus.testDecommissionStatusAfterDNRestart" will be problematic for the proposed alternative of removing a dead DECOMMISSION_INPROGRESS node from the DatanodeAdminManager: https://github.com/apache/hadoop/pull/3746/

As previously stated, removing the dead DECOMMISSION_INPROGRESS node from the DatanodeAdminManager means that when there are no LowRedundancy blocks, the dead node will remain in DECOMMISSION_INPROGRESS rather than transitioning to DECOMMISSIONED.

This violates the expectation that the unit test enforces, which is that a dead DECOMMISSION_INPROGRESS node should transition to DECOMMISSIONED when there are no LowRedundancy blocks:

```
"Delete the under-replicated file, which should let the DECOMMISSION_IN_PROGRESS node become DECOMMISSIONED"
```
https://github.com/apache/hadoop/blob/6342d5e523941622a140fd877f06e9b59f48c48b/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java#L451

Therefore, I think this is a good argument to remain more in favor of the original proposed change.

Issue Time Tracking
---
Worklog Id: (was: 690136)
Time Spent: 6h (was: 5h 50m)

> Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
> ---
>
> Key: HDFS-16303
> URL: https://issues.apache.org/jira/browse/HDFS-16303
> Time Spent: 6h
> Remaining Estimate: 0h
[jira] [Work logged] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-16303?focusedWorklogId=690134&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690134 ]

ASF GitHub Bot logged work on HDFS-16303:
- Author: ASF GitHub Bot
- Created on: 03/Dec/21 16:15
- Start Date: 03/Dec/21 16:15
- Worklog Time Spent: 10m

Work Description: KevinWikant commented on pull request #3746:
URL: https://github.com/apache/hadoop/pull/3746#issuecomment-985646803

@sodonnel The existing test "TestDecommissioningStatus.testDecommissionStatusAfterDNRestart" will be problematic for this change.

As previously stated, removing the dead DECOMMISSION_INPROGRESS node from the DatanodeAdminManager means that when there are no LowRedundancy blocks, the dead node will remain in DECOMMISSION_INPROGRESS rather than transitioning to DECOMMISSIONED.

This violates the expectation that the unit test enforces, which is that a dead DECOMMISSION_INPROGRESS node should transition to DECOMMISSIONED when there are no LowRedundancy blocks:

```
"Delete the under-replicated file, which should let the DECOMMISSION_IN_PROGRESS node become DECOMMISSIONED"
```
https://github.com/apache/hadoop/blob/6342d5e523941622a140fd877f06e9b59f48c48b/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java#L451

Therefore, I think this is a good argument to remain more in favor of the original proposed change: https://github.com/apache/hadoop/pull/3675

Issue Time Tracking
---
Worklog Id: (was: 690134)
Time Spent: 5h 50m (was: 5h 40m)

> Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
> ---
>
> Key: HDFS-16303
> URL: https://issues.apache.org/jira/browse/HDFS-16303
> Time Spent: 5h 50m
> Remaining Estimate: 0h
[jira] [Commented] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
[ https://issues.apache.org/jira/browse/HDFS-16293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17453122#comment-17453122 ]

Hadoop QA commented on HDFS-16293:
-1 overall

|| Vote || Subsystem || Runtime || Logfile / Comment ||
| 0 | reexec | 14m 19s | Docker mode activated. |
|| Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| trunk Compile Tests ||
| 0 | mvndep | 1m 50s | Maven dependency ordering for branch |
| +1 | mvninstall | 22m 34s | trunk passed |
| +1 | compile | 6m 7s | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | compile | 5m 40s | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 | checkstyle | 1m 16s | trunk passed |
| +1 | mvnsite | 2m 36s | trunk passed |
| +1 | shadedclient | 25m 16s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 1m 49s | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 | javadoc | 2m 15s | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| 0 | spotbugs | 35m 34s | Both FindBugs and SpotBugs are enabled, using SpotBugs. |
| +1 | spotbugs | 6m 16s | trunk passed |
|| Patch Compile Tests ||
| 0 | mvndep | 0m 29s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 24s | the patch passed |
| +1 | compile | 7m 20s | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| -1 | javac | 7m 20s | hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 5 new + 646 unchanged - 0 fixed = 651 total (was 646). Logfile: https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/746/artifact/out/diff-compile-javac-hadoop-hdfs-project-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04.txt |
| +1 | compile | 6m 34s | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| -1 | javac | 6m 34s | hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 5 new + 624 unchanged - 0 fixed = 629 total (was 624). Logfile: https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/746/artifact/out/diff-compile-javac-hadoop-hdfs-project-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10.txt |
| +1 | checkstyle | 1m 18s | the patch passed |
| +1 | mvnsite | 2m 28s |
[jira] [Work logged] (HDFS-16357) Fix log format in DFSUtilClient
[ https://issues.apache.org/jira/browse/HDFS-16357?focusedWorklogId=690133=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690133 ] ASF GitHub Bot logged work on HDFS-16357: - Author: ASF GitHub Bot Created on: 03/Dec/21 16:14 Start Date: 03/Dec/21 16:14 Worklog Time Spent: 10m Work Description: ayushtkn commented on a change in pull request #3729: URL: https://github.com/apache/hadoop/pull/3729#discussion_r762068062 ## File path: hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSUtilClient.java ## @@ -733,13 +733,13 @@ public static boolean isLocalAddress(InetSocketAddress targetAddr) InetAddress addr = targetAddr.getAddress(); Boolean cached = localAddrMap.get(addr.getHostAddress()); if (cached != null) { - LOG.trace("Address {} is {} local", targetAddr, (cached ? "" : "not")); + LOG.trace("Address " + targetAddr + (cached ? " is local" : " is not local")); Review comment: The present change Looks good to me, What is the problem in that? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 690133) Time Spent: 1h (was: 50m) > Fix log format in DFSUtilClient > --- > > Key: HDFS-16357 > URL: https://issues.apache.org/jira/browse/HDFS-16357 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.3.1 >Reporter: guophilipse >Priority: Minor > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > If the address is local, there will be an additional space in the log. We can > improve it to look proper. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
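The diff above replaces the parameterized trace call with plain concatenation. A minimal sketch of why the original form printed a double space when the address is local — the `{}` placeholder is substituted with an empty string, leaving both surrounding spaces. The address literal and message strings here are illustrative, not the exact Hadoop output:

```java
// Demonstrates the spacing bug in the old message template, simulated with
// plain string building (SLF4J would substitute the {} placeholders the same way).
public class LogFormatDemo {
    // Equivalent of: LOG.trace("Address {} is {} local", addr, cached ? "" : "not")
    static String parameterized(boolean cached) {
        return "Address 127.0.0.1:9866 is " + (cached ? "" : "not") + " local";
    }

    // Equivalent of the patched concatenation form in the diff above.
    static String fixed(boolean cached) {
        return "Address 127.0.0.1:9866" + (cached ? " is local" : " is not local");
    }

    public static void main(String[] args) {
        System.out.println("[" + parameterized(true) + "]"); // note the double space before "local"
        System.out.println("[" + fixed(true) + "]");         // single space
    }
}
```

The non-local case was never affected: `"not"` fills the placeholder, so the spacing comes out right.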
[jira] [Updated] (HDFS-16332) Expired block token causes slow read due to missing handling in sasl handshake
[ https://issues.apache.org/jira/browse/HDFS-16332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated HDFS-16332: - Fix Version/s: 3.4.0 3.2.4 3.3.3 Resolution: Fixed Status: Resolved (was: Patch Available) Committed to trunk, branch-3.3, and branch-3.2. Thank you [~lineyshinya] for your contribution! > Expired block token causes slow read due to missing handling in sasl handshake > -- > > Key: HDFS-16332 > URL: https://issues.apache.org/jira/browse/HDFS-16332 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, dfs, dfsclient >Affects Versions: 2.8.5, 3.3.1 >Reporter: Shinya Yoshida >Assignee: Shinya Yoshida >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.3 > > Attachments: Screenshot from 2021-11-18 12-11-34.png, Screenshot from > 2021-11-18 12-14-29.png, Screenshot from 2021-11-18 13-31-35.png > > Time Spent: 5h 40m > Remaining Estimate: 0h > > We're operating the HBase 1.4.x cluster on Hadoop 2.8.5. > We're recently evaluating Kerberos secured HBase and Hadoop cluster with > production load and we observed HBase's response slows >= several seconds, > and about several minutes for worst-case (about once~three times a month). > The following image is a scatter plot of HBase's response slow, each circle > is each base's slow response log. > The X-axis is the date time of the log occurred, the Y-axis is the response > slow time. > !Screenshot from 2021-11-18 12-14-29.png! > We could reproduce this issue by reducing "dfs.block.access.token.lifetime" > and we could figure out the cause. > (We used dfs.block.access.token.lifetime=60, i.e. 1 hour) > When hedged read enabled: > !Screenshot from 2021-11-18 12-11-34.png! > When hedged read disabled: > !Screenshot from 2021-11-18 13-31-35.png! > As you can see, it's worst if the hedged read is enabled. However, it happens > whether the hedged read is enabled or not. > This impacts our 99%tile response time. 
> This happens when the block token is expired and the root cause is the wrong > handling of the InvalidToken exception in sasl handshake in > SaslDataTransferServer. > I propose to add a new response code for DataTransferEncryptorStatus to > request the client to update the block token like DataTransferProtos does. > The test code and patch is available in > https://github.com/apache/hadoop/pull/3677 > We could reproduce this issue by the following test code in 2.8.5 branch and > trunk as I tested > {code:java} > // HDFS is configured as secure cluster > try (FileSystem fs = newFileSystem(); > FSDataInputStream in = fs.open(PATH)) { > waitBlockTokenExpired(in); > in.read(0, bytes, 0, bytes.length) > } > private void waitBlockTokenExpired(FSDataInputStream in1) throws Exception { > DFSInputStream innerStream = (DFSInputStream) in1.getWrappedStream(); > for (LocatedBlock block : innerStream.getAllBlocks()) { > while (!SecurityTestUtil.isBlockTokenExpired(block.getBlockToken())) { > Thread.sleep(100); > } > } > } > {code} > Here is the log we got, we added a custom log before and after the block > token refresh: > https://github.com/bitterfox/hadoop/commit/173a9f876f2264b76af01d658f624197936fd79c > {code} > 2021-11-16 09:40:20,330 WARN [hedgedRead-247] impl.BlockReaderFactory: I/O > error constructing remote block reader. 
> java.io.IOException: DIGEST-MD5: IO error acquiring password > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:420) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:475) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getSaslStreams(SaslDataTransferClient.java:389) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:263) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211) > at > org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.peerSend(SaslDataTransferClient.java:160) > at > org.apache.hadoop.hdfs.DFSUtilClient.peerFromSocketAndKey(DFSUtilClient.java:568) > at > org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2880) > at > org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:815) > at >
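The proposal above — a dedicated response code so the client refreshes its block token and retries instead of failing the whole handshake — can be sketched roughly as follows. This is not the actual Hadoop patch: the enum value and method names are invented for illustration only.

```java
// Hedged sketch of the proposed control flow. A dedicated status lets the
// client distinguish "token expired, refresh and retry" from a hard failure.
enum HandshakeStatus { SUCCESS, ERROR, ERROR_EXPIRED_TOKEN /* proposed */ }

class SaslClientSketch {
    int attempts = 0;

    // Stand-in for the SASL handshake with a DataNode; a real server would
    // validate the block token and answer with a status.
    HandshakeStatus handshake(boolean tokenExpired) {
        attempts++;
        return tokenExpired ? HandshakeStatus.ERROR_EXPIRED_TOKEN
                            : HandshakeStatus.SUCCESS;
    }

    boolean readBlock() {
        boolean expired = true; // first attempt carries a stale token
        for (int retry = 0; retry < 2; retry++) {
            HandshakeStatus s = handshake(expired);
            if (s == HandshakeStatus.SUCCESS) {
                return true;
            }
            if (s == HandshakeStatus.ERROR_EXPIRED_TOKEN) {
                expired = false; // stand-in for refetching block locations/token
                continue;        // retry promptly instead of burning the timeout
            }
            break;               // any other error is fatal
        }
        return false;
    }
}
```

Without the dedicated status, the generic SASL IOException above is what the client sees, and each affected read pays the full connect-and-fail cost.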
[jira] [Work logged] (HDFS-16332) Expired block token causes slow read due to missing handling in sasl handshake
[ https://issues.apache.org/jira/browse/HDFS-16332?focusedWorklogId=690047=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690047 ] ASF GitHub Bot logged work on HDFS-16332: - Author: ASF GitHub Bot Created on: 03/Dec/21 14:30 Start Date: 03/Dec/21 14:30 Worklog Time Spent: 10m Work Description: aajisaka commented on pull request #3677: URL: https://github.com/apache/hadoop/pull/3677#issuecomment-985565777 Merged. Thank you @bitterfox for your contribution! Issue Time Tracking --- Worklog Id: (was: 690047) Time Spent: 5h 40m (was: 5.5h)
[jira] [Work logged] (HDFS-16332) Expired block token causes slow read due to missing handling in sasl handshake
[ https://issues.apache.org/jira/browse/HDFS-16332?focusedWorklogId=690046=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-690046 ] ASF GitHub Bot logged work on HDFS-16332: - Author: ASF GitHub Bot Created on: 03/Dec/21 14:30 Start Date: 03/Dec/21 14:30 Worklog Time Spent: 10m Work Description: aajisaka merged pull request #3677: URL: https://github.com/apache/hadoop/pull/3677 Issue Time Tracking --- Worklog Id: (was: 690046) Time Spent: 5.5h (was: 5h 20m)
[jira] [Assigned] (HDFS-16332) Expired block token causes slow read due to missing handling in sasl handshake
[ https://issues.apache.org/jira/browse/HDFS-16332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka reassigned HDFS-16332: Assignee: Shinya Yoshida
[jira] [Commented] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
[ https://issues.apache.org/jira/browse/HDFS-16293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452973#comment-17452973 ] Yuanxin Zhu commented on HDFS-16293: [~tasanuma] Thanks for your review. I added some comments for the unit test. Could you check it? > Client sleeps and holds 'dataQueue' when DataNodes are congested > > > Key: HDFS-16293 > URL: https://issues.apache.org/jira/browse/HDFS-16293 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.2.2, 3.3.1, 3.2.3 >Reporter: Yuanxin Zhu >Assignee: Yuanxin Zhu >Priority: Major > Attachments: HDFS-16293.01-branch-3.2.2.patch, HDFS-16293.01.patch, > HDFS-16293.02.patch, HDFS-16293.03.patch, HDFS-16293.04.patch, > HDFS-16293.05.patch, HDFS-16293.06.patch, HDFS-16293.07.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > When I enable ECN and use Terasort (500G data, 8 DataNodes, 76 vcores/DN) for > testing, the DataNodes become congested (HDFS-8008). The client repeatedly enters the > sleep state after receiving ACKs, but does not release 'dataQueue'. The > ResponseProcessor thread needs 'dataQueue' to execute 'ackQueue.getFirst()', so the > ResponseProcessor waits for the client to release 'dataQueue', which effectively puts > the ResponseProcessor thread to sleep as well, resulting in ACK delays. MapReduce > tasks can be delayed by tens of minutes or even hours. > The DataStreamer thread can first execute 'one = dataQueue.getFirst()', > release 'dataQueue', and then decide whether to execute 'backOffIfNecessary()' > according to 'one.isHeartbeatPacket()'
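The fix the reporter describes amounts to never sleeping while holding the dataQueue monitor. A hedged, self-contained sketch of that pattern follows — this toy class is not the real DataStreamer; only the names dataQueue, isHeartbeatPacket, and backOffIfNecessary are taken from the issue text:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of "peek under the lock, back off outside the lock". Sleeping inside
// synchronized(dataQueue) is what blocked the ACK-reading thread in the issue.
class StreamerSketch {
    private final Deque<String> dataQueue = new ArrayDeque<>();

    void queuePacket(String p) {
        synchronized (dataQueue) {
            dataQueue.addLast(p);
            dataQueue.notifyAll();
        }
    }

    String takeNextPacket() throws InterruptedException {
        String one;
        synchronized (dataQueue) {
            while (dataQueue.isEmpty()) {
                dataQueue.wait(1000);
            }
            one = dataQueue.getFirst(); // peek while holding the lock
        }
        // Back off OUTSIDE the lock, and only for real data packets, so other
        // threads (the ResponseProcessor in the real code) can keep using the queue.
        if (!isHeartbeatPacket(one)) {
            backOffIfNecessary();
        }
        synchronized (dataQueue) {
            dataQueue.remove(one);
        }
        return one;
    }

    private boolean isHeartbeatPacket(String p) { return p.startsWith("HB"); }

    private void backOffIfNecessary() throws InterruptedException {
        Thread.sleep(10); // stand-in for the congestion back-off sleep
    }
}
```

The key design point is that the monitor is held only for the short queue operations, never across the back-off sleep.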
[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
[ https://issues.apache.org/jira/browse/HDFS-16293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanxin Zhu updated HDFS-16293: --- Attachment: HDFS-16293.07.patch
[jira] [Work logged] (HDFS-16332) Expired block token causes slow read due to missing handling in sasl handshake
[ https://issues.apache.org/jira/browse/HDFS-16332?focusedWorklogId=689956=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-689956 ] ASF GitHub Bot logged work on HDFS-16332: - Author: ASF GitHub Bot Created on: 03/Dec/21 12:10 Start Date: 03/Dec/21 12:10 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #3677: URL: https://github.com/apache/hadoop/pull/3677#issuecomment-985468663
:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|::|--:|:|::|:---:|
| +0 :ok: | reexec | 0m 38s | | Docker mode activated. |
_ Prechecks _
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | buf | 0m 1s | | buf was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
_ trunk Compile Tests _
| +0 :ok: | mvndep | 12m 48s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 21m 23s | | trunk passed |
| +1 :green_heart: | compile | 5m 18s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 4m 55s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 1m 12s | | trunk passed |
| +1 :green_heart: | mvnsite | 2m 25s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 46s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 2m 16s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 5m 40s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 12s | | branch has no errors when building and testing our client artifacts. |
_ Patch Compile Tests _
| +0 :ok: | mvndep | 0m 28s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 1s | | the patch passed |
| +1 :green_heart: | compile | 5m 7s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | cc | 5m 7s | | the patch passed |
| +1 :green_heart: | javac | 5m 7s | | the patch passed |
| +1 :green_heart: | compile | 4m 49s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | cc | 4m 49s | | the patch passed |
| +1 :green_heart: | javac | 4m 49s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 4s | | the patch passed |
| +1 :green_heart: | mvnsite | 2m 11s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 23s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 1m 57s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 5m 43s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 6s | | patch has no errors when building and testing our client artifacts. |
_ Other Tests _
| +1 :green_heart: | unit | 2m 23s | | hadoop-hdfs-client in the patch passed. |
| +1 :green_heart: | unit | 223m 39s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 48s | | The patch does not generate ASF License warnings. |
| | | | 352m 24s | |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3677/12/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3677 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell cc buflint bufcompat |
| uname | Linux 8c3a896c9e3b 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 5fe1d32e150a50598c0760a7b3848b0cee87ffe4 |
| Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| Test Results |
[jira] [Work logged] (HDFS-16370) Fix assert message for BlockInfo
[ https://issues.apache.org/jira/browse/HDFS-16370?focusedWorklogId=689935=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-689935 ] ASF GitHub Bot logged work on HDFS-16370: - Author: ASF GitHub Bot Created on: 03/Dec/21 11:46 Start Date: 03/Dec/21 11:46 Worklog Time Spent: 10m Work Description: tomscut opened a new pull request #3747: URL: https://github.com/apache/hadoop/pull/3747 JIRA: [HDFS-16370](https://issues.apache.org/jira/browse/HDFS-16370). In both methods BlockInfo#getPrevious and BlockInfo#getNext, the assert message is wrong. This may cause some misunderstanding and needs to be fixed. Issue Time Tracking --- Worklog Id: (was: 689935) Remaining Estimate: 0h Time Spent: 10m > Fix assert message for BlockInfo > > > Key: HDFS-16370 > URL: https://issues.apache.org/jira/browse/HDFS-16370 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: tomscut >Assignee: tomscut >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h >
[jira] [Updated] (HDFS-16370) Fix assert message for BlockInfo
[ https://issues.apache.org/jira/browse/HDFS-16370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-16370: -- Labels: pull-request-available (was: )
[jira] [Created] (HDFS-16370) Fix assert message for BlockInfo
tomscut created HDFS-16370: -- Summary: Fix assert message for BlockInfo Key: HDFS-16370 URL: https://issues.apache.org/jira/browse/HDFS-16370 Project: Hadoop HDFS Issue Type: Bug Reporter: tomscut Assignee: tomscut In both methods BlockInfo#getPrevious and BlockInfo#getNext, the assert message is wrong. This may cause some misunderstanding and needs to be fixed.
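As a hedged illustration of the kind of defect HDFS-16370 describes (this toy class is not the real BlockInfo; its fields and messages are invented), the bug pattern is a copy-pasted assert whose message names the wrong link, which misleads whoever reads the assertion failure:

```java
// Doubly linked node sketch: the assert messages must match the field each
// accessor actually checks, or a failure points the reader at the wrong code.
class LinkedNodeSketch {
    private LinkedNodeSketch previous;
    private LinkedNodeSketch next;

    LinkedNodeSketch getPrevious() {
        assert previous != this : "previous node must not point to itself";
        return previous;
    }

    LinkedNodeSketch getNext() {
        // Bug pattern being fixed: this message used to say "previous".
        assert next != this : "next node must not point to itself";
        return next;
    }

    void setPrevious(LinkedNodeSketch p) { previous = p; }
    void setNext(LinkedNodeSketch n) { next = n; }
}
```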
[jira] [Commented] (HDFS-16364) Remove unnecessary brackets in NameNodeRpcServer#L453
[ https://issues.apache.org/jira/browse/HDFS-16364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452922#comment-17452922 ] Brahma Reddy Battula commented on HDFS-16364: - Committed to trunk. [~wangzhaohui] thanks for contribution. > Remove unnecessary brackets in NameNodeRpcServer#L453 > - > > Key: HDFS-16364 > URL: https://issues.apache.org/jira/browse/HDFS-16364 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: wangzhaohui >Assignee: wangzhaohui >Priority: Trivial > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h >
[jira] [Resolved] (HDFS-16364) Remove unnecessary brackets in NameNodeRpcServer#L453
[ https://issues.apache.org/jira/browse/HDFS-16364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula resolved HDFS-16364. - Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed
[jira] [Work logged] (HDFS-16364) Remove unnecessary brackets in NameNodeRpcServer#L453
[ https://issues.apache.org/jira/browse/HDFS-16364?focusedWorklogId=689921=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-689921 ] ASF GitHub Bot logged work on HDFS-16364: - Author: ASF GitHub Bot Created on: 03/Dec/21 11:21 Start Date: 03/Dec/21 11:21 Worklog Time Spent: 10m Work Description: brahmareddybattula merged pull request #3742: URL: https://github.com/apache/hadoop/pull/3742 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 689921) Time Spent: 40m (was: 0.5h) > Remove unnecessary brackets in NameNodeRpcServer#L453 > - > > Key: HDFS-16364 > URL: https://issues.apache.org/jira/browse/HDFS-16364 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: wangzhaohui >Assignee: wangzhaohui >Priority: Trivial > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16364) Remove unnecessary brackets in NameNodeRpcServer#L453
[ https://issues.apache.org/jira/browse/HDFS-16364?focusedWorklogId=689920=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-689920 ] ASF GitHub Bot logged work on HDFS-16364: - Author: ASF GitHub Bot Created on: 03/Dec/21 11:20 Start Date: 03/Dec/21 11:20 Worklog Time Spent: 10m Work Description: brahmareddybattula commented on pull request #3742: URL: https://github.com/apache/hadoop/pull/3742#issuecomment-985438559 lgtm -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 689920) Time Spent: 0.5h (was: 20m) > Remove unnecessary brackets in NameNodeRpcServer#L453 > - > > Key: HDFS-16364 > URL: https://issues.apache.org/jira/browse/HDFS-16364 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: wangzhaohui >Assignee: wangzhaohui >Priority: Trivial > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
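[Editor's note] The exact expression at NameNodeRpcServer line 453 is not quoted anywhere in this thread, so the following is only an illustration of the general shape of a "remove unnecessary brackets" cleanup: purely cosmetic, no behavioral change.

```java
// Illustrative only (the real HDFS-16364 line is not shown in the thread):
// redundant parentheses around a bare returned expression are removed.
public class BracketCleanup {
    private static final int STATE = 453;

    static int withBrackets() {
        return (STATE);  // before: brackets add nothing
    }

    static int withoutBrackets() {
        return STATE;    // after: same result, cleaner source
    }
}
```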
[jira] [Commented] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
[ https://issues.apache.org/jira/browse/HDFS-16293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452908#comment-17452908 ] Takanobu Asanuma commented on HDFS-16293: - [~Yuanxin Zhu] Thanks for your explanation and for updating the patch. It seems the unit test becomes stable, and [^HDFS-16293.06.patch] mostly looks good to me. Some minor comments: * Could you add a timeout to the unit test? @Test(timeout=6) * Please provide more comments to the unit tests about the purpose of each thread, and why it verifies that congestedNodes.size() is greater than 1, and so on. * How about adding a comment like "// streamer has to release dataQueue before calling backoff" before calling backOffIfNecessary()? > Client sleeps and holds 'dataQueue' when DataNodes are congested > > > Key: HDFS-16293 > URL: https://issues.apache.org/jira/browse/HDFS-16293 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.2.2, 3.3.1, 3.2.3 >Reporter: Yuanxin Zhu >Assignee: Yuanxin Zhu >Priority: Major > Attachments: HDFS-16293.01-branch-3.2.2.patch, HDFS-16293.01.patch, > HDFS-16293.02.patch, HDFS-16293.03.patch, HDFS-16293.04.patch, > HDFS-16293.05.patch, HDFS-16293.06.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > When I open the ECN and use Terasort(500G data,8 DataNodes,76 vcores/DN) for > testing, DataNodes are congested(HDFS-8008). The client enters the sleep > state after receiving the ACK for many times, but does not release the > 'dataQueue'. The ResponseProcessor thread needs the 'dataQueue' to execute > 'ackQueue.getFirst()', so the ResponseProcessor will wait for the client to > release the 'dataQueue', which is equivalent to that the ResponseProcessor > thread also enters sleep, resulting in ACK delay.MapReduce tasks can be > delayed by tens of minutes or even hours. > The DataStreamer thread can first execute 'one = dataQueue. 
getFirst()', > release 'dataQueue', and then judge whether to execute 'backOffIfNecessary()' > according to 'one.isHeartbeatPacket()' > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
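[Editor's note] The proposal in the description — take the head packet while holding the dataQueue monitor, then decide on backOffIfNecessary() outside the lock — can be sketched as below. This is a hedged simplification, not the actual DataStreamer: Packet, the queue, and backOffIfNecessary() are stand-ins, and the real streamer does not dequeue at exactly this point. The structural idea is that the (potentially long) congestion back-off no longer happens inside synchronized(dataQueue), so ResponseProcessor can still acquire the lock to run ackQueue.getFirst().

```java
import java.util.LinkedList;

// Hedged sketch of the locking change proposed for HDFS-16293.
public class StreamerSketch {
    static class Packet {
        final boolean heartbeat;
        Packet(boolean heartbeat) { this.heartbeat = heartbeat; }
        boolean isHeartbeatPacket() { return heartbeat; }
    }

    private final LinkedList<Packet> dataQueue = new LinkedList<>();

    Packet takeNext() throws InterruptedException {
        Packet one;
        synchronized (dataQueue) {
            while (dataQueue.isEmpty()) {
                dataQueue.wait(1000);
            }
            one = dataQueue.getFirst();  // grab the head while holding the lock
            dataQueue.removeFirst();
        }
        // Monitor released here: a real back-off sleep at this point no
        // longer blocks other threads that need dataQueue's lock.
        if (!one.isHeartbeatPacket()) {
            backOffIfNecessary();
        }
        return one;
    }

    private void backOffIfNecessary() {
        // Placeholder for the congestion back-off sleep.
    }

    void enqueue(Packet p) {
        synchronized (dataQueue) {
            dataQueue.add(p);
            dataQueue.notifyAll();
        }
    }
}
```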
[jira] [Commented] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
[ https://issues.apache.org/jira/browse/HDFS-16293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452879#comment-17452879 ] Yuanxin Zhu commented on HDFS-16293: [~tasanuma] In HDFS-16293.06.patch, the program will definitely finish. Could you check it? > Client sleeps and holds 'dataQueue' when DataNodes are congested > > > Key: HDFS-16293 > URL: https://issues.apache.org/jira/browse/HDFS-16293 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.2.2, 3.3.1, 3.2.3 >Reporter: Yuanxin Zhu >Assignee: Yuanxin Zhu >Priority: Major > Attachments: HDFS-16293.01-branch-3.2.2.patch, HDFS-16293.01.patch, > HDFS-16293.02.patch, HDFS-16293.03.patch, HDFS-16293.04.patch, > HDFS-16293.05.patch, HDFS-16293.06.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > When I open the ECN and use Terasort(500G data,8 DataNodes,76 vcores/DN) for > testing, DataNodes are congested(HDFS-8008). The client enters the sleep > state after receiving the ACK for many times, but does not release the > 'dataQueue'. The ResponseProcessor thread needs the 'dataQueue' to execute > 'ackQueue.getFirst()', so the ResponseProcessor will wait for the client to > release the 'dataQueue', which is equivalent to that the ResponseProcessor > thread also enters sleep, resulting in ACK delay.MapReduce tasks can be > delayed by tens of minutes or even hours. > The DataStreamer thread can first execute 'one = dataQueue. getFirst()', > release 'dataQueue', and then judge whether to execute 'backOffIfNecessary()' > according to 'one.isHeartbeatPacket()' > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
[ https://issues.apache.org/jira/browse/HDFS-16293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanxin Zhu updated HDFS-16293: --- Attachment: HDFS-16293.06.patch > Client sleeps and holds 'dataQueue' when DataNodes are congested > > > Key: HDFS-16293 > URL: https://issues.apache.org/jira/browse/HDFS-16293 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.2.2, 3.3.1, 3.2.3 >Reporter: Yuanxin Zhu >Assignee: Yuanxin Zhu >Priority: Major > Attachments: HDFS-16293.01-branch-3.2.2.patch, HDFS-16293.01.patch, > HDFS-16293.02.patch, HDFS-16293.03.patch, HDFS-16293.04.patch, > HDFS-16293.05.patch, HDFS-16293.06.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > When I open the ECN and use Terasort(500G data,8 DataNodes,76 vcores/DN) for > testing, DataNodes are congested(HDFS-8008). The client enters the sleep > state after receiving the ACK for many times, but does not release the > 'dataQueue'. The ResponseProcessor thread needs the 'dataQueue' to execute > 'ackQueue.getFirst()', so the ResponseProcessor will wait for the client to > release the 'dataQueue', which is equivalent to that the ResponseProcessor > thread also enters sleep, resulting in ACK delay.MapReduce tasks can be > delayed by tens of minutes or even hours. > The DataStreamer thread can first execute 'one = dataQueue. getFirst()', > release 'dataQueue', and then judge whether to execute 'backOffIfNecessary()' > according to 'one.isHeartbeatPacket()' > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
[ https://issues.apache.org/jira/browse/HDFS-16293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452859#comment-17452859 ] Yuanxin Zhu edited comment on HDFS-16293 at 12/3/21, 10:07 AM: --- [~tasanuma] Thanks for your feedback. What I'm worried about is that the unit test went wrong because of threading problems. I think there are two situations: * Without fixing DataStreamer, the congestedNodes thread may run one step ahead of the dataQueue thread, resulting in the size of the congestedNodes greater than 1, it can be solved by increasing the sleep time of the congestedNodes thread. * With fixing DataStreamer, in order to save time, the previous unit test program exits after the dataQueue thread ends, which may cause the program to exit in advance when the size of the congestedNodes is not greater than 1. It can be solved by increasing the number of the congestedNodes thread runs and putting the program exit code in the congestedNodes thread, but it will affect the running time of the unit test Without fixing DataStreamer. If the program can't finish occasionally, we can increase the number of times the dataQueue thread runs, so as to prevent the DataStreamer from waiting because the dataQueue is empty, or add a packet again before the congestedNodes thread ends. Could you check it? was (Author: yuanxin zhu): [~tasanuma] Thanks for your feedback. What I'm worried about is that the unit test went wrong because of threading problems I think there are two situations: * Without fixing DataStreamer, the congestedNodes thread may run one step ahead of the dataQueue thread, resulting in the size of the congestedNodes greater than 1, it can be solved by increasing the sleep time of the congestedNodes thread. * With fixing DataStreamer, in order to save time, the previous unit test program exits after the dataQueue thread ends, which may cause the program to exit in advance when the size of the congestedNodes is not greater than 1. 
It can be solved by increasing the number of the congestedNodes thread runs and putting the program exit code in the congestedNodes thread, but it will affect the running time of the unit test Without fixing DataStreamer. If the program can't finish occasionally, we can increase the number of times the dataQueue thread runs, so as to prevent the DataStreamer from waiting because the dataQueue is empty, or add a packet again before the congestedNodes thread ends. Could you check it? > Client sleeps and holds 'dataQueue' when DataNodes are congested > > > Key: HDFS-16293 > URL: https://issues.apache.org/jira/browse/HDFS-16293 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.2.2, 3.3.1, 3.2.3 >Reporter: Yuanxin Zhu >Assignee: Yuanxin Zhu >Priority: Major > Attachments: HDFS-16293.01-branch-3.2.2.patch, HDFS-16293.01.patch, > HDFS-16293.02.patch, HDFS-16293.03.patch, HDFS-16293.04.patch, > HDFS-16293.05.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > When I open the ECN and use Terasort(500G data,8 DataNodes,76 vcores/DN) for > testing, DataNodes are congested(HDFS-8008). The client enters the sleep > state after receiving the ACK for many times, but does not release the > 'dataQueue'. The ResponseProcessor thread needs the 'dataQueue' to execute > 'ackQueue.getFirst()', so the ResponseProcessor will wait for the client to > release the 'dataQueue', which is equivalent to that the ResponseProcessor > thread also enters sleep, resulting in ACK delay.MapReduce tasks can be > delayed by tens of minutes or even hours. > The DataStreamer thread can first execute 'one = dataQueue. getFirst()', > release 'dataQueue', and then judge whether to execute 'backOffIfNecessary()' > according to 'one.isHeartbeatPacket()' > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDFS-16331) Make dfs.blockreport.intervalMsec reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-16331?focusedWorklogId=689854=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-689854 ] ASF GitHub Bot logged work on HDFS-16331: - Author: ASF GitHub Bot Created on: 03/Dec/21 09:42 Start Date: 03/Dec/21 09:42 Worklog Time Spent: 10m Work Description: tomscut commented on pull request #3676: URL: https://github.com/apache/hadoop/pull/3676#issuecomment-985371174 Thanks @tasanuma for the review and the merge. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 689854) Time Spent: 5h 10m (was: 5h) > Make dfs.blockreport.intervalMsec reconfigurable > > > Key: HDFS-16331 > URL: https://issues.apache.org/jira/browse/HDFS-16331 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: image-2021-11-18-09-33-24-236.png, > image-2021-11-18-09-35-35-400.png > > Time Spent: 5h 10m > Remaining Estimate: 0h > > We have a cold data cluster, which stores as EC policy. There are 24 fast > disks on each node and each disk is 7 TB. > Recently, many nodes have more than 10 million blocks, and the interval of > FBR is 6h as default. Frequent FBR caused great pressure on NN. > !image-2021-11-18-09-35-35-400.png|width=334,height=229! > !image-2021-11-18-09-33-24-236.png|width=566,height=159! > We want to increase the interval of FBR, but have to rolling restart the DNs, > this operation is very heavy. In this scenario, it is necessary to make > _dfs.blockreport.intervalMsec_ reconfigurable. 
-- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
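[Editor's note] For context on the property discussed above: the steady-state full-block-report interval is set in hdfs-site.xml. The value shown here (24h instead of the 6h default of 21600000 ms) is illustrative only:

```xml
<!-- hdfs-site.xml: full block report interval in milliseconds.
     21600000 (6h) is the default; 86400000 (24h) is an illustrative raise. -->
<property>
  <name>dfs.blockreport.intervalMsec</name>
  <value>86400000</value>
</property>
```

With the reconfiguration support added by this issue, the running value could then be changed on a live DataNode via the standard reconfiguration entry point (e.g. `hdfs dfsadmin -reconfig datanode <host:port> start`) instead of a rolling restart.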
[jira] [Commented] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
[ https://issues.apache.org/jira/browse/HDFS-16293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452859#comment-17452859 ] Yuanxin Zhu commented on HDFS-16293: [~tasanuma] Thanks for your feedback. It's also what I'm worried about. I think there are two situations: * Without fixing DataStreamer, the congestedNodes thread may run one step ahead of the dataQueue thread, resulting in the size of the congestedNodes greater than 1, it can be solved by increasing the sleep time of the congestedNodes thread. * With fixing DataStreamer, in order to save time, the previous unit test program exits after the dataQueue thread ends, which may cause the program to exit in advance when the size of the congestedNodes is not greater than 1. It can be solved by increasing the number of the congestedNodes thread runs and putting the program exit code in the congestedNodes thread, but it will affect the running time of the unit test Without fixing DataStreamer. Could you check it? > Client sleeps and holds 'dataQueue' when DataNodes are congested > > > Key: HDFS-16293 > URL: https://issues.apache.org/jira/browse/HDFS-16293 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.2.2, 3.3.1, 3.2.3 >Reporter: Yuanxin Zhu >Assignee: Yuanxin Zhu >Priority: Major > Attachments: HDFS-16293.01-branch-3.2.2.patch, HDFS-16293.01.patch, > HDFS-16293.02.patch, HDFS-16293.03.patch, HDFS-16293.04.patch, > HDFS-16293.05.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > When I open the ECN and use Terasort(500G data,8 DataNodes,76 vcores/DN) for > testing, DataNodes are congested(HDFS-8008). The client enters the sleep > state after receiving the ACK for many times, but does not release the > 'dataQueue'. 
The ResponseProcessor thread needs the 'dataQueue' to execute > 'ackQueue.getFirst()', so the ResponseProcessor will wait for the client to > release the 'dataQueue', which is equivalent to that the ResponseProcessor > thread also enters sleep, resulting in ACK delay.MapReduce tasks can be > delayed by tens of minutes or even hours. > The DataStreamer thread can first execute 'one = dataQueue. getFirst()', > release 'dataQueue', and then judge whether to execute 'backOffIfNecessary()' > according to 'one.isHeartbeatPacket()' > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16293) Client sleeps and holds 'dataQueue' when DataNodes are congested
[ https://issues.apache.org/jira/browse/HDFS-16293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanxin Zhu updated HDFS-16293: --- Attachment: HDFS-16293.05.patch > Client sleeps and holds 'dataQueue' when DataNodes are congested > > > Key: HDFS-16293 > URL: https://issues.apache.org/jira/browse/HDFS-16293 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.2.2, 3.3.1, 3.2.3 >Reporter: Yuanxin Zhu >Assignee: Yuanxin Zhu >Priority: Major > Attachments: HDFS-16293.01-branch-3.2.2.patch, HDFS-16293.01.patch, > HDFS-16293.02.patch, HDFS-16293.03.patch, HDFS-16293.04.patch, > HDFS-16293.05.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > When I open the ECN and use Terasort(500G data,8 DataNodes,76 vcores/DN) for > testing, DataNodes are congested(HDFS-8008). The client enters the sleep > state after receiving the ACK for many times, but does not release the > 'dataQueue'. The ResponseProcessor thread needs the 'dataQueue' to execute > 'ackQueue.getFirst()', so the ResponseProcessor will wait for the client to > release the 'dataQueue', which is equivalent to that the ResponseProcessor > thread also enters sleep, resulting in ACK delay.MapReduce tasks can be > delayed by tens of minutes or even hours. > The DataStreamer thread can first execute 'one = dataQueue. getFirst()', > release 'dataQueue', and then judge whether to execute 'backOffIfNecessary()' > according to 'one.isHeartbeatPacket()' > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-16332) Expired block token causes slow read due to missing handling in sasl handshake
[ https://issues.apache.org/jira/browse/HDFS-16332?focusedWorklogId=689835=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-689835 ] ASF GitHub Bot logged work on HDFS-16332: - Author: ASF GitHub Bot Created on: 03/Dec/21 09:13 Start Date: 03/Dec/21 09:13 Worklog Time Spent: 10m Work Description: aajisaka commented on a change in pull request #3677: URL: https://github.com/apache/hadoop/pull/3677#discussion_r761763311 ## File path: hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferClient.java ## @@ -603,7 +603,20 @@ private IOStreamPair doSaslHandshake(InetAddress addr, conf, cipherOption, underlyingOut, underlyingIn, false) : sasl.createStreamPair(out, in); } catch (IOException ioe) { - sendGenericSaslErrorMessage(out, ioe.getMessage()); + String message = ioe.getMessage(); + try { +sendGenericSaslErrorMessage(out, message); + } catch (Exception e) { +// If ioe is caused by error response from server, server will close peer connection. +// So sendGenericSaslErrorMessage might cause IOException due to "Broken pipe". +// We suppress IOException from sendGenericSaslErrorMessage +// and always throw `ioe` as top level. +// `ioe` can be InvalidEncryptionKeyException or InvalidBlockTokenException +// that indicates refresh key or token and are important for caller. +LOG.debug("Failed to send generic sasl error (server: {}, message: {}), suppress exception", +addr.toString(), message, e); Review comment: Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 689835) Time Spent: 5h 10m (was: 5h) > Expired block token causes slow read due to missing handling in sasl handshake > -- > > Key: HDFS-16332 > URL: https://issues.apache.org/jira/browse/HDFS-16332 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, dfs, dfsclient >Affects Versions: 2.8.5, 3.3.1 >Reporter: Shinya Yoshida >Priority: Major > Labels: pull-request-available > Attachments: Screenshot from 2021-11-18 12-11-34.png, Screenshot from > 2021-11-18 12-14-29.png, Screenshot from 2021-11-18 13-31-35.png > > Time Spent: 5h 10m > Remaining Estimate: 0h > > We're operating the HBase 1.4.x cluster on Hadoop 2.8.5. > We're recently evaluating Kerberos secured HBase and Hadoop cluster with > production load and we observed HBase's response slows >= several seconds, > and about several minutes for worst-case (about once~three times a month). > The following image is a scatter plot of HBase's response slow, each circle > is each base's slow response log. > The X-axis is the date time of the log occurred, the Y-axis is the response > slow time. > !Screenshot from 2021-11-18 12-14-29.png! > We could reproduce this issue by reducing "dfs.block.access.token.lifetime" > and we could figure out the cause. > (We used dfs.block.access.token.lifetime=60, i.e. 1 hour) > When hedged read enabled: > !Screenshot from 2021-11-18 12-11-34.png! > When hedged read disabled: > !Screenshot from 2021-11-18 13-31-35.png! > As you can see, it's worst if the hedged read is enabled. However, it happens > whether the hedged read is enabled or not. > This impacts our 99%tile response time. > This happens when the block token is expired and the root cause is the wrong > handling of the InvalidToken exception in sasl handshake in > SaslDataTransferServer. 
> I propose to add a new response code for DataTransferEncryptorStatus to > request the client to update the block token like DataTransferProtos does. > The test code and patch is available in > https://github.com/apache/hadoop/pull/3677 > We could reproduce this issue by the following test code in 2.8.5 branch and > trunk as I tested > {code:java} > // HDFS is configured as secure cluster > try (FileSystem fs = newFileSystem(); > FSDataInputStream in = fs.open(PATH)) { > waitBlockTokenExpired(in); > in.read(0, bytes, 0, bytes.length) > } > private void waitBlockTokenExpired(FSDataInputStream in1) throws Exception { > DFSInputStream innerStream = (DFSInputStream) in1.getWrappedStream(); > for (LocatedBlock block : innerStream.getAllBlocks()) { > while
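[Editor's note] The review comment quoted above is about keeping the primary IOException when sending the SASL error message itself fails (e.g. "Broken pipe" after the server closes the peer). A generic, hedged illustration of that pattern — interface and method names here are invented for the example, not the real SaslDataTransferClient API:

```java
import java.io.IOException;

// Hedged illustration of the handling in the diff above: if reporting a
// primary failure itself fails, record the secondary error and surface the
// primary one, which carries what the caller needs (e.g. that a key or
// block token must be refreshed).
public class PrimaryErrorWins {
    interface ErrorSender {
        void send(String message) throws IOException;
    }

    static IOException handshake(ErrorSender sender, IOException primary) {
        try {
            sender.send(primary.getMessage());
        } catch (Exception secondary) {
            // The real code only logs this; keeping it as a suppressed
            // exception preserves it for debugging.
            primary.addSuppressed(secondary);
        }
        return primary; // always surface the primary cause
    }
}
```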
[jira] [Work logged] (HDFS-15987) Improve oiv tool to parse fsimage file in parallel with delimited format
[ https://issues.apache.org/jira/browse/HDFS-15987?focusedWorklogId=689814=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-689814 ] ASF GitHub Bot logged work on HDFS-15987: - Author: ASF GitHub Bot Created on: 03/Dec/21 08:20 Start Date: 03/Dec/21 08:20 Worklog Time Spent: 10m Work Description: whbing commented on a change in pull request #2918: URL: https://github.com/apache/hadoop/pull/2918#discussion_r76172 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/PBImageTextWriter.java ## @@ -651,14 +683,123 @@ private void output(Configuration conf, FileSummary summary, is = FSImageUtil.wrapInputStreamForCompression(conf, summary.getCodec(), new BufferedInputStream(new LimitInputStream( fin, section.getLength(; -outputINodes(is); +INodeSection s = INodeSection.parseDelimitedFrom(is); +LOG.info("Found {} INodes in the INode section", s.getNumInodes()); +int count = outputINodes(is, out); +LOG.info("Outputted {} INodes.", count); } } afterOutput(); long timeTaken = Time.monotonicNow() - startTime; LOG.debug("Time to output inodes: {}ms", timeTaken); } + /** + * STEP1: Multi-threaded process sub-sections. + * Given n (n>1) threads to process k (k>=n) sections, + * E.g. 10 sections and 4 threads, grouped as follows: + * |---| + * | (012)(345)(67) (89) | + * | thread[0]thread[1]thread[2]thread[3] | + * |---| + * + * STEP2: Merge files. + */ + private void outputInParallel(Configuration conf, FileSummary summary, + ArrayList subSections) + throws IOException { +int nThreads = Integer.min(numThreads, subSections.size()); +LOG.info("Outputting in parallel with {} sub-sections" + +" using {} threads", subSections.size(), nThreads); +final CopyOnWriteArrayList exceptions = +new CopyOnWriteArrayList<>(); +Thread[] threads = new Thread[nThreads]; +String[] paths = new String[nThreads]; +for (int i = 0; i < paths.length; i++) { + paths[i] = parallelOut + ".tmp." 
+ i; +} +AtomicLong expectedINodes = new AtomicLong(0); +AtomicLong totalParsed = new AtomicLong(0); +String codec = summary.getCodec(); + +int mark = 0; +for (int i = 0; i < nThreads; i++) { + // Each thread processes different ordered sub-sections + // and outputs to different paths + int step = subSections.size() / nThreads + + (i < subSections.size() % nThreads ? 1 : 0); + int start = mark; + int end = start + step; + ArrayList subList = new ArrayList<>( + subSections.subList(start, end)); + mark = end; + String path = paths[i]; + + threads[i] = new Thread(() -> { Review comment: > Maybe thread pool is better here? @symious Thanks! I will try this suggestion in next commit. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 689814) Time Spent: 4h 20m (was: 4h 10m) > Improve oiv tool to parse fsimage file in parallel with delimited format > > > Key: HDFS-15987 > URL: https://issues.apache.org/jira/browse/HDFS-15987 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Hongbing Wang >Assignee: Hongbing Wang >Priority: Major > Labels: pull-request-available > Attachments: Improve_oiv_tool_001.pdf > > Time Spent: 4h 20m > Remaining Estimate: 0h > > The purpose of this Jira is to improve oiv tool to parse fsimage file with > sub-sections (see -HDFS-14617-) in parallel with delmited format. > 1.Serial parsing is time-consuming > The time to serially parse a large fsimage with delimited format (e.g. `hdfs > oiv -p Delimited -t ...`) is as follows: > {code:java} > 1) Loading string table: -> Not time consuming. 
> 2) Loading inode references: -> Not time consuming > 3) Loading directories in INode section: -> Slightly time consuming (3%) > 4) Loading INode directory section: -> A bit time consuming (11%) > 5) Output: -> Very time consuming (86%){code} > Therefore, output is the
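[Editor's note] The grouping rule in the STEP1 comment above — k ordered sub-sections split among n threads so group sizes differ by at most one, e.g. 10 sections over 4 threads giving (012)(345)(67)(89) — can be isolated as the small formula below. This is a standalone sketch of that formula, not the patch's code:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the sub-section partitioning used in the HDFS-15987 patch:
// the first (k % n) threads each take one extra section.
public class SubSectionPartition {
    /** Returns n half-open [start, end) ranges covering 0..k. */
    static List<int[]> partition(int k, int n) {
        List<int[]> groups = new ArrayList<>();
        int mark = 0;
        for (int i = 0; i < n; i++) {
            int step = k / n + (i < k % n ? 1 : 0);
            groups.add(new int[] { mark, mark + step });
            mark += step;
        }
        return groups;
    }
}
```

For k=10, n=4 this yields [0,3), [3,6), [6,8), [8,10) — exactly the (012)(345)(67)(89) grouping in the comment.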