[jira] [Updated] (HDFS-16616) Remove the use of Sets#newHashSet and Sets#newTreeSet
[ https://issues.apache.org/jira/browse/HDFS-16616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16616: -- Component/s: hdfs-common Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Remove the use of Sets#newHashSet and Sets#newTreeSet > -- > > Key: HDFS-16616 > URL: https://issues.apache.org/jira/browse/HDFS-16616 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs-common >Affects Versions: 3.4.0 >Reporter: Samrat Deb >Assignee: Samrat Deb >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > As part of removing Guava dependencies, HADOOP-17115, HADOOP-17721, > HADOOP-17722 and HADOOP-17720 were fixed. > Currently the code calls util functions to create HashSet and TreeSet > instances across the repo. These calls add little value, as they internally > just invoke new HashSet<>() / new TreeSet<>() from java.util. > This task is to clean up all these redundant set-creation calls. > Before the move to Java 8, sets were created using Guava functions and APIs; > now that Guava has been moved away from, the util code in Hadoop looks like > 1. > public static <E> TreeSet<E> newTreeSet() { return new TreeSet<E>(); } > 2. > public static <E> HashSet<E> newHashSet() { return new HashSet<E>(); } > These methods do nothing beyond adding an extra layer of function call. > Please refer to the task > https://issues.apache.org/jira/browse/HADOOP-17726 > Can anyone review whether this ticket adds value to the code? > Looking forward to input/thoughts. If it does not add value, we can close it > and not move forward with the changes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
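For illustration, a hedged sketch of the cleanup the ticket above proposes; the call sites below are hypothetical, only the wrapper-vs-constructor pattern is taken from the issue:

{code:java}
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class SetsCleanupExample {
  public static void main(String[] args) {
    // Before: redundant wrappers from org.apache.hadoop.util.Sets
    //   Set<String> hosts  = Sets.newHashSet();
    //   Set<String> sorted = Sets.newTreeSet();

    // After: call the java.util constructors directly.
    Set<String> hosts = new HashSet<>();
    Set<String> sorted = new TreeSet<>();
    hosts.add("dn-1");
    sorted.add("dn-1");
    System.out.println(hosts + " " + sorted);
  }
}
{code}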
[jira] [Updated] (HDFS-16522) Set Http and Ipc ports for Datanodes in MiniDFSCluster
[ https://issues.apache.org/jira/browse/HDFS-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16522: -- Component/s: test Hadoop Flags: Reviewed Target Version/s: 3.3.5, 3.4.0 Affects Version/s: 3.3.5 3.4.0 > Set Http and Ipc ports for Datanodes in MiniDFSCluster > -- > > Key: HDFS-16522 > URL: https://issues.apache.org/jira/browse/HDFS-16522 > Project: Hadoop HDFS > Issue Type: Task > Components: test >Affects Versions: 3.4.0, 3.3.5 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.5 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > We should provide options to set Http and Ipc ports for Datanodes in > MiniDFSCluster. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16502) Reconfigure Block Invalidate limit
[ https://issues.apache.org/jira/browse/HDFS-16502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16502: -- Component/s: block placement Hadoop Flags: Reviewed Target Version/s: 3.3.5, 3.4.0 Affects Version/s: 3.3.5 3.4.0 > Reconfigure Block Invalidate limit > -- > > Key: HDFS-16502 > URL: https://issues.apache.org/jira/browse/HDFS-16502 > Project: Hadoop HDFS > Issue Type: Task > Components: block placement >Affects Versions: 3.4.0, 3.3.5 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.5 > > Time Spent: 2h > Remaining Estimate: 0h > > Based on the cluster load, it would be helpful to consider tuning the block > invalidate limit (dfs.block.invalidate.limit). The only way to do this today > without restarting the Namenode is by reconfiguring the heartbeat interval: > {code:java} > Math.max(heartbeatInt*20, blockInvalidateLimit){code} > This logic is not straightforward, operators are usually not aware of it > (for lack of documentation), and updating the heartbeat interval is not > desirable in all cases. > We should provide the ability to alter the block invalidate limit without > affecting the heartbeat interval on a live cluster, to adjust load at the > Datanode level. > We should also take this opportunity to move the (heartbeatInterval * 20) > computation logic into a common method. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
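A minimal sketch of the shared helper that last sentence calls for; the class and method names are assumptions, only the Math.max(heartbeatInterval * 20, configuredLimit) relationship comes from the issue:

{code:java}
public final class BlockInvalidateLimits {
  private BlockInvalidateLimits() {}

  /**
   * Hypothetical common method: one place for the
   * max(20 * heartbeat interval, configured limit) rule, so the startup
   * path and a future reconfiguration path compute it identically.
   */
  public static int calculate(long heartbeatIntervalSeconds, int configuredLimit) {
    return Math.max((int) (20 * heartbeatIntervalSeconds), configuredLimit);
  }

  public static void main(String[] args) {
    // 3s heartbeat -> 60; the configured limit of 1000 wins.
    System.out.println(calculate(3L, 1000)); // prints 1000
  }
}
{code}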
[jira] [Updated] (HDFS-16481) Provide support to set Http and Rpc ports in MiniJournalCluster
[ https://issues.apache.org/jira/browse/HDFS-16481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16481: -- Component/s: test Target Version/s: 3.3.5, 3.4.0 Affects Version/s: 3.3.5 3.4.0 > Provide support to set Http and Rpc ports in MiniJournalCluster > --- > > Key: HDFS-16481 > URL: https://issues.apache.org/jira/browse/HDFS-16481 > Project: Hadoop HDFS > Issue Type: Task > Components: test >Affects Versions: 3.4.0, 3.3.5 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.5 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > We should provide support for clients to set Http and Rpc ports of > JournalNodes in MiniJournalCluster. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16054) Replace Guava Lists usage by Hadoop's own Lists in hadoop-hdfs-project
[ https://issues.apache.org/jira/browse/HDFS-16054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16054: -- Component/s: hdfs-common Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Replace Guava Lists usage by Hadoop's own Lists in hadoop-hdfs-project > -- > > Key: HDFS-16054 > URL: https://issues.apache.org/jira/browse/HDFS-16054 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs-common >Affects Versions: 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16435) Remove unneeded TODO comment for ObserverReadProxyProvider
[ https://issues.apache.org/jira/browse/HDFS-16435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16435: -- Component/s: namenode Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Remove unneeded TODO comment for ObserverReadProxyProvider > - > > Key: HDFS-16435 > URL: https://issues.apache.org/jira/browse/HDFS-16435 > Project: Hadoop HDFS > Issue Type: Wish > Components: namenode >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Based on the discussion in > [HDFS-13923|https://issues.apache.org/jira/browse/HDFS-13923], we don't think > a configuration to turn observer reads on/off is needed. > So I suggest removing the `TODO comment`, which is no longer needed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16541) Fix a typo in NameNodeLayoutVersion.
[ https://issues.apache.org/jira/browse/HDFS-16541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16541: -- Component/s: namenode Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix a typo in NameNodeLayoutVersion. > > > Key: HDFS-16541 > URL: https://issues.apache.org/jira/browse/HDFS-16541 > Project: Hadoop HDFS > Issue Type: Wish > Components: namenode >Affects Versions: 3.4.0 >Reporter: ZhiWei Shi >Assignee: ZhiWei Shi >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Fix a typo in NameNodeLayoutVersion. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16587) Allow configuring Handler number for the JournalNodeRpcServer
[ https://issues.apache.org/jira/browse/HDFS-16587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16587: -- Component/s: journal-node Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Allow configuring Handler number for the JournalNodeRpcServer > - > > Key: HDFS-16587 > URL: https://issues.apache.org/jira/browse/HDFS-16587 > Project: Hadoop HDFS > Issue Type: Wish > Components: journal-node >Affects Versions: 3.4.0 >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > We can allow configuring the handler number for the JournalNodeRpcServer. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
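A hedged sketch of how a deployment might use the new knob from HDFS-16587; the property name dfs.journalnode.handler.count is an assumption modeled on the analogous NameNode/DataNode handler-count keys, not confirmed from the patch:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class JnHandlerCountExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Assumed key: raise the JournalNode RPC handler count on a busy
    // cluster instead of living with the small default.
    conf.setInt("dfs.journalnode.handler.count", 20);
    System.out.println(conf.getInt("dfs.journalnode.handler.count", 5));
  }
}
{code}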
[jira] [Updated] (HDFS-16339) Show the threshold when mover threads quota is exceeded
[ https://issues.apache.org/jira/browse/HDFS-16339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16339: -- Component/s: datanode Hadoop Flags: Reviewed Target Version/s: 3.2.4, 3.3.2, 3.4.0 Affects Version/s: 3.2.4 3.3.2 3.4.0 > Show the threshold when mover threads quota is exceeded > --- > > Key: HDFS-16339 > URL: https://issues.apache.org/jira/browse/HDFS-16339 > Project: Hadoop HDFS > Issue Type: Wish > Components: datanode >Affects Versions: 3.4.0, 3.3.2, 3.2.4 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Attachments: image-2021-11-20-17-23-04-924.png > > Time Spent: 1.5h > Remaining Estimate: 0h > > Show the threshold when mover threads quota is exceeded in > DataXceiver#replaceBlock and DataXceiver#copyBlock. > !image-2021-11-20-17-23-04-924.png|width=1233,height=124! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16335) Fix HDFSCommands.md
[ https://issues.apache.org/jira/browse/HDFS-16335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16335: -- Component/s: documentation Hadoop Flags: Reviewed Target Version/s: 3.3.2, 3.4.0 Affects Version/s: 3.3.2 3.4.0 > Fix HDFSCommands.md > --- > > Key: HDFS-16335 > URL: https://issues.apache.org/jira/browse/HDFS-16335 > Project: Hadoop HDFS > Issue Type: Wish > Components: documentation >Affects Versions: 3.4.0, 3.3.2 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Fix HDFSCommands.md. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16326) Simplify the code for DiskBalancer
[ https://issues.apache.org/jira/browse/HDFS-16326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16326: -- Component/s: diskbalancer Hadoop Flags: Reviewed Target Version/s: 3.2.4, 3.3.2, 3.4.0 Affects Version/s: 3.2.4 3.3.2 3.4.0 > Simplify the code for DiskBalancer > -- > > Key: HDFS-16326 > URL: https://issues.apache.org/jira/browse/HDFS-16326 > Project: Hadoop HDFS > Issue Type: Wish > Components: diskbalancer >Affects Versions: 3.4.0, 3.3.2, 3.2.4 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 1h > Remaining Estimate: 0h > > Simplify the code for DiskBalancer. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16319) Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount
[ https://issues.apache.org/jira/browse/HDFS-16319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16319: -- Component/s: metrics Target Version/s: 3.3.2, 3.4.0 Affects Version/s: 3.3.2 3.4.0 > Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount > > > Key: HDFS-16319 > URL: https://issues.apache.org/jira/browse/HDFS-16319 > Project: Hadoop HDFS > Issue Type: Wish > Components: metrics >Affects Versions: 3.4.0, 3.3.2 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount. See > [HDFS-15808|https://issues.apache.org/jira/browse/HDFS-15808]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16298) Improve error msg for BlockMissingException
[ https://issues.apache.org/jira/browse/HDFS-16298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16298: -- Component/s: hdfs-client Hadoop Flags: Reviewed Target Version/s: 3.2.4, 3.3.2, 2.10.2, 3.4.0 Affects Version/s: 3.2.4 3.3.2 2.10.2 3.4.0 > Improve error msg for BlockMissingException > --- > > Key: HDFS-16298 > URL: https://issues.apache.org/jira/browse/HDFS-16298 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs-client >Affects Versions: 3.4.0, 2.10.2, 3.3.2, 3.2.4 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 2.10.2, 3.3.2, 3.2.4 > > Attachments: image-2021-11-04-15-28-05-886.png > > Time Spent: 2h > Remaining Estimate: 0h > > When the client fails to obtain a block, a BlockMissingException is thrown. > To make such issues easier to analyze, we can add the relevant block > location information to the error message here. > !image-2021-11-04-15-28-05-886.png|width=624,height=144! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16312) Fix typo for DataNodeVolumeMetrics and ProfilingFileIoEvents
[ https://issues.apache.org/jira/browse/HDFS-16312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16312: -- Component/s: datanode metrics Hadoop Flags: Reviewed Target Version/s: 3.2.4, 3.3.2, 2.10.2, 3.4.0 Affects Version/s: 3.2.4 3.3.2 2.10.2 3.4.0 > Fix typo for DataNodeVolumeMetrics and ProfilingFileIoEvents > > > Key: HDFS-16312 > URL: https://issues.apache.org/jira/browse/HDFS-16312 > Project: Hadoop HDFS > Issue Type: Wish > Components: datanode, metrics >Affects Versions: 3.4.0, 2.10.2, 3.3.2, 3.2.4 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 2.10.2, 3.3.2, 3.2.4 > > Time Spent: 50m > Remaining Estimate: 0h > > Fix typo for DataNodeVolumeMetrics and ProfilingFileIoEvents. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16280) Fix typo for ShortCircuitReplica#isStale
[ https://issues.apache.org/jira/browse/HDFS-16280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16280: -- Component/s: hdfs-client Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix typo for ShortCircuitReplica#isStale > > > Key: HDFS-16280 > URL: https://issues.apache.org/jira/browse/HDFS-16280 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs-client >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Fix typo for ShortCircuitReplica#isStale. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16281) Fix flaky unit tests failed due to timeout
[ https://issues.apache.org/jira/browse/HDFS-16281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16281: -- Component/s: test Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix flaky unit tests failed due to timeout > -- > > Key: HDFS-16281 > URL: https://issues.apache.org/jira/browse/HDFS-16281 > Project: Hadoop HDFS > Issue Type: Wish > Components: test >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h > Remaining Estimate: 0h > > I found that this unit test > *_TestViewFileSystemOverloadSchemeWithHdfsScheme_* failed several times due > to timeout. Can we change the timeout for some methods from _*3s*_ to *_30s_* > to be consistent with the other methods? > {code:java} > [ERROR] Tests run: 19, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: > 65.39 s <<< FAILURE! - in > org.apache.hadoop.fs.viewfs.TestViewFSOverloadSchemeWithMountTableConfigInHDFS[ERROR] > Tests run: 19, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 65.39 s <<< > FAILURE! - in > org.apache.hadoop.fs.viewfs.TestViewFSOverloadSchemeWithMountTableConfigInHDFS[ERROR] > > testNflyRepair(org.apache.hadoop.fs.viewfs.TestViewFSOverloadSchemeWithMountTableConfigInHDFS) > Time elapsed: 4.132 s <<< > ERROR!org.junit.runners.model.TestTimedOutException: test timed out after > 3000 milliseconds at java.lang.Object.wait(Native Method) at > java.lang.Object.wait(Object.java:502) at > org.apache.hadoop.util.concurrent.AsyncGet$Util.wait(AsyncGet.java:59) at > org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1577) at > org.apache.hadoop.ipc.Client.call(Client.java:1535) at > org.apache.hadoop.ipc.Client.call(Client.java:1432) at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129) > at com.sun.proxy.$Proxy26.setTimes(Unknown Source) at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setTimes(ClientNamenodeProtocolTranslatorPB.java:1059) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) > at com.sun.proxy.$Proxy27.setTimes(Unknown Source) at > org.apache.hadoop.hdfs.DFSClient.setTimes(DFSClient.java:2658) at > org.apache.hadoop.hdfs.DistributedFileSystem$37.doCall(DistributedFileSystem.java:1978) > at > org.apache.hadoop.hdfs.DistributedFileSystem$37.doCall(DistributedFileSystem.java:1975) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.setTimes(DistributedFileSystem.java:1988) > at org.apache.hadoop.fs.FilterFileSystem.setTimes(FilterFileSystem.java:542) > at > 
org.apache.hadoop.fs.viewfs.ChRootedFileSystem.setTimes(ChRootedFileSystem.java:328) > at > org.apache.hadoop.fs.viewfs.NflyFSystem$NflyOutputStream.commit(NflyFSystem.java:439) > at > org.apache.hadoop.fs.viewfs.NflyFSystem$NflyOutputStream.close(NflyFSystem.java:395) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77) > at > org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) at > org.apache.hadoop.fs.viewfs.TestViewFileSystemOverloadSchemeWithHdfsScheme.writeString(TestViewFileSystemOverloadSchemeWithHdfsScheme.java:685) > at > org.apache.hadoop.fs.viewfs.TestViewFileSystemOverloadSchemeWithHdfsScheme.testNflyRepair(TestViewFileSystemOverloadSchemeWithHdfsScheme.java:622) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.
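In JUnit 4 terms the change HDFS-16281 proposes is just a larger timeout attribute; a minimal sketch (the test body is a placeholder, not the real testNflyRepair):

{code:java}
import org.junit.Test;

public class TimeoutExample {
  // Before: @Test(timeout = 3000) — too tight for these RPC-heavy ViewFS
  // tests. After: 30s, consistent with the other methods in the class.
  @Test(timeout = 30000)
  public void testNflyRepair() throws Exception {
    // placeholder; the real test exercises Nfly mount-point repair
  }
}
{code}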
[jira] [Updated] (HDFS-16194) Simplify the code with DatanodeID#getXferAddrWithHostname
[ https://issues.apache.org/jira/browse/HDFS-16194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16194: -- Component/s: datanode metrics namenode Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Simplify the code with DatanodeID#getXferAddrWithHostname > > > Key: HDFS-16194 > URL: https://issues.apache.org/jira/browse/HDFS-16194 > Project: Hadoop HDFS > Issue Type: Wish > Components: datanode, metrics, namenode >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Simplify the code with DatanodeID#getXferAddrWithHostname. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
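A small sketch of the simplification; the DatanodeID accessors shown are real Hadoop APIs, while the surrounding method is illustrative:

{code:java}
import org.apache.hadoop.hdfs.protocol.DatanodeID;

public class XferAddrExample {
  static String describe(DatanodeID dn) {
    // Before: the address was assembled by hand at each call site.
    String manual = dn.getHostName() + ":" + dn.getXferPort();
    // After: one accessor produces the same hostname:port string.
    String simplified = dn.getXferAddrWithHostname();
    assert manual.equals(simplified);
    return simplified;
  }
}
{code}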
[jira] [Updated] (HDFS-16131) Show storage type for failed volumes on namenode web
[ https://issues.apache.org/jira/browse/HDFS-16131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16131: -- Component/s: namenode ui Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Show storage type for failed volumes on namenode web > > > Key: HDFS-16131 > URL: https://issues.apache.org/jira/browse/HDFS-16131 > Project: Hadoop HDFS > Issue Type: Wish > Components: namenode, ui >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: failed-volumes.jpg > > Time Spent: 1h 10m > Remaining Estimate: 0h > > To make it easy to query the storage type for failed volumes, we can display > them on namenode web. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16110) Remove unused method reportChecksumFailure in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-16110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16110: -- Component/s: dfsclient Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Remove unused method reportChecksumFailure in DFSClient > --- > > Key: HDFS-16110 > URL: https://issues.apache.org/jira/browse/HDFS-16110 > Project: Hadoop HDFS > Issue Type: Wish > Components: dfsclient >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Remove unused method reportChecksumFailure and fix some code styles by the > way in DFSClient. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16106) Fix flaky unit test TestDFSShell
[ https://issues.apache.org/jira/browse/HDFS-16106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16106: -- Component/s: test Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix flaky unit test TestDFSShell > > > Key: HDFS-16106 > URL: https://issues.apache.org/jira/browse/HDFS-16106 > Project: Hadoop HDFS > Issue Type: Wish > Components: test >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > This unit test occasionally fails. > The value set for dfs.namenode.accesstime.precision is too low; as a result, > the access time can be updated several times while the method executes, > eventually leading to a failed assertion. > IMO, dfs.namenode.accesstime.precision should be greater than or equal to the > timeout (120s) of TestDFSShell#testCopyCommandsWithPreserveOption(), or > set directly to 0 to disable this feature. > > {code:java} > [ERROR] Tests run: 52, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: > 106.778 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestDFSShell[ERROR] Tests > run: 52, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: 106.778 s <<< > FAILURE! - in org.apache.hadoop.hdfs.TestDFSShell [ERROR] > testCopyCommandsWithPreserveOption(org.apache.hadoop.hdfs.TestDFSShell) Time > elapsed: 2.353 s <<< FAILURE! java.lang.AssertionError: > expected:<1625095098319> but was:<1625095099374> at > org.junit.Assert.fail(Assert.java:89) at > org.junit.Assert.failNotEquals(Assert.java:835) at > org.junit.Assert.assertEquals(Assert.java:647) at > org.junit.Assert.assertEquals(Assert.java:633) at > org.apache.hadoop.hdfs.TestDFSShell.testCopyCommandsWithPreserveOption(TestDFSShell.java:2282) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:748) > [ERROR] > testCopyCommandsWithPreserveOption(org.apache.hadoop.hdfs.TestDFSShell) Time > elapsed: 2.467 s <<< FAILURE!
java.lang.AssertionError: > expected:<1625095192527> but was:<1625095193950> at > org.junit.Assert.fail(Assert.java:89) at > org.junit.Assert.failNotEquals(Assert.java:835) at > org.junit.Assert.assertEquals(Assert.java:647) at > org.junit.Assert.assertEquals(Assert.java:633) at > org.apache.hadoop.hdfs.TestDFSShell.testCopyCommandsWithPreserveOption(TestDFSShell.java:2323) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) at > java.lang.Thread.run(Thread.java:748) > [ERROR] > testCopyCommandsWithPreserveOption(org.apache.hadoop.hdfs.TestDFSShell) Time > elapsed: 2.173 s <<< FAILURE! java.lang.AssertionError: > expected:<1625095196756> but was:<1625095197975> at > org.junit.Assert.fail(Assert.java:89) at > org.junit.Assert.failNotEquals(Assert.java:835) at > org.junit.Assert.assertEquals(Assert.java:647) at > org.junit.Assert.assertEquals(Assert.java:633) at > org.apache.hadoop.hdfs.TestDFSShell.tes
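A hedged sketch of the second remedy the report suggests, disabling access-time updates in the test configuration (whether the committed patch did exactly this is not confirmed here; the config key itself is the real DFSConfigKeys constant):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class AccessTimeExample {
  public static void main(String[] args) {
    Configuration conf = new HdfsConfiguration();
    // 0 disables access-time updates entirely, so reads during the test
    // can no longer bump atime and break the preserve-option assertions.
    conf.setLong(DFSConfigKeys.DFS_NAMENODE_ACCESSTIME_PRECISION_KEY, 0L);
    System.out.println(conf.getLong(
        DFSConfigKeys.DFS_NAMENODE_ACCESSTIME_PRECISION_KEY, 3600000L));
  }
}
{code}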
[jira] [Updated] (HDFS-16089) EC: Add metric EcReconstructionValidateTimeMillis for StripedBlockReconstructor
[ https://issues.apache.org/jira/browse/HDFS-16089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16089: -- Component/s: erasure-coding metrics Hadoop Flags: Reviewed Target Version/s: 3.3.2, 3.4.0 Affects Version/s: 3.3.2 3.4.0 > EC: Add metric EcReconstructionValidateTimeMillis for > StripedBlockReconstructor > --- > > Key: HDFS-16089 > URL: https://issues.apache.org/jira/browse/HDFS-16089 > Project: Hadoop HDFS > Issue Type: Wish > Components: erasure-coding, metrics >Affects Versions: 3.4.0, 3.3.2 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Add metric EcReconstructionValidateTimeMillis for StripedBlockReconstructor, > so that we can measure the elapsed time of striped block reconstruction. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16104) Remove unused parameter and fix java doc for DiskBalancerCLI
[ https://issues.apache.org/jira/browse/HDFS-16104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16104: -- Component/s: diskbalancer documentation Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Remove unused parameter and fix java doc for DiskBalancerCLI > > > Key: HDFS-16104 > URL: https://issues.apache.org/jira/browse/HDFS-16104 > Project: Hadoop HDFS > Issue Type: Wish > Components: diskbalancer, documentation >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Remove unused parameter and fix java doc for DiskBalancerCLI. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16079) Improve the block state change log
[ https://issues.apache.org/jira/browse/HDFS-16079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16079: -- Component/s: block placement Target Version/s: 3.3.2, 3.4.0 Affects Version/s: 3.3.2 3.4.0 > Improve the block state change log > -- > > Key: HDFS-16079 > URL: https://issues.apache.org/jira/browse/HDFS-16079 > Project: Hadoop HDFS > Issue Type: Wish > Components: block placement >Affects Versions: 3.4.0, 3.3.2 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h > Remaining Estimate: 0h > > Improve the block state change log. Add readOnlyReplicas and > replicasOnStaleNodes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16078) Remove unused parameters for DatanodeManager.handleLifeline()
[ https://issues.apache.org/jira/browse/HDFS-16078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16078: -- Component/s: namenode Target Version/s: 3.3.2, 3.2.3, 3.4.0 Affects Version/s: 3.3.2 3.2.3 3.4.0 > Remove unused parameters for DatanodeManager.handleLifeline() > - > > Key: HDFS-16078 > URL: https://issues.apache.org/jira/browse/HDFS-16078 > Project: Hadoop HDFS > Issue Type: Wish > Components: namenode >Affects Versions: 3.4.0, 3.2.3, 3.3.2 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Remove unused parameters (blockPoolId, maxTransfers) for > DatanodeManager.handleLifeline(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15991) Add location into datanode info for NameNodeMXBean
[ https://issues.apache.org/jira/browse/HDFS-15991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15991: -- Component/s: metrics namenode Hadoop Flags: Reviewed Target Version/s: 3.3.1, 3.4.0 Affects Version/s: 3.3.1 3.4.0 > Add location into datanode info for NameNodeMXBean > -- > > Key: HDFS-15991 > URL: https://issues.apache.org/jira/browse/HDFS-15991 > Project: Hadoop HDFS > Issue Type: Wish > Components: metrics, namenode >Affects Versions: 3.3.1, 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Add location into datanode info for NameNodeMXBean. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-16535) SlotReleaser should reuse the domain socket based on socket paths
[ https://issues.apache.org/jira/browse/HDFS-16535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan reassigned HDFS-16535: - Assignee: Quanlong Huang > SlotReleaser should reuse the domain socket based on socket paths > - > > Key: HDFS-16535 > URL: https://issues.apache.org/jira/browse/HDFS-16535 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Affects Versions: 3.3.1, 3.4.0 >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.3 > > Time Spent: 2h > Remaining Estimate: 0h > > HDFS-13639 improves the performance of short-circuit shm slot releasing by > reusing the domain socket that the client previously used to send release > requests to the DataNode. > This works well when only one DataNode is co-located with the client (true in > most production environments). However, if we launch multiple DataNodes on one > machine (usually for testing, e.g. Impala's end-to-end tests), the request can > be sent to the wrong DataNode. See an example in > IMPALA-11234. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
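A hedged sketch of the path-keyed reuse the summary describes; the cache field and helper are hypothetical, only DomainSocket.connect(path) is the real Hadoop API:

{code:java}
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import org.apache.hadoop.net.unix.DomainSocket;

public class SlotReleaserSocketCache {
  // One cached socket per domain-socket path, so a slot-release request
  // always goes back to the DataNode that owns that path, even with
  // several DataNodes on the same machine.
  private final ConcurrentMap<String, DomainSocket> sockets =
      new ConcurrentHashMap<>();

  DomainSocket socketFor(String path) throws IOException {
    DomainSocket sock = sockets.get(path);
    if (sock == null) {
      sock = DomainSocket.connect(path);
      sockets.put(path, sock);
    }
    return sock;
  }
}
{code}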
[jira] [Updated] (HDFS-15951) Remove unused parameters in NameNodeProxiesClient
[ https://issues.apache.org/jira/browse/HDFS-15951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15951: -- Component/s: hdfs-client Hadoop Flags: Reviewed Target Version/s: 3.3.1, 3.4.0 Affects Version/s: 3.3.1 3.4.0 > Remove unused parameters in NameNodeProxiesClient > - > > Key: HDFS-15951 > URL: https://issues.apache.org/jira/browse/HDFS-15951 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs-client >Affects Versions: 3.3.1, 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Remove unused parameters in org.apache.hadoop.hdfs.NameNodeProxiesClient. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15975) Use LongAdder instead of AtomicLong
[ https://issues.apache.org/jira/browse/HDFS-15975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15975: -- Component/s: metrics Hadoop Flags: Reviewed Target Version/s: 3.3.1, 3.4.0 Affects Version/s: 3.3.1 3.4.0 > Use LongAdder instead of AtomicLong > --- > > Key: HDFS-15975 > URL: https://issues.apache.org/jira/browse/HDFS-15975 > Project: Hadoop HDFS > Issue Type: Wish > Components: metrics >Affects Versions: 3.3.1, 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > When counting some indicators, we can use LongAdder instead of AtomicLong to > improve performance. The long value is not an atomic snapshot in LongAdder, > but I think we can tolerate that. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
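A minimal sketch of the swap HDFS-15975 describes; the counter name is illustrative:

{code:java}
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

public class CounterExample {
  // Before: every increment contends on one CAS'd memory cell.
  static final AtomicLong bytesWrittenOld = new AtomicLong();
  // After: increments are striped across cells; sum() folds them on read.
  static final LongAdder bytesWritten = new LongAdder();

  public static void main(String[] args) {
    bytesWritten.add(4096);
    bytesWritten.increment();
    // sum() is not an atomic snapshot under concurrent updates, which the
    // issue deems an acceptable trade-off for metrics.
    System.out.println(bytesWritten.sum()); // 4097
  }
}
{code}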
[jira] [Updated] (HDFS-15938) Fix java doc in FSEditLog
[ https://issues.apache.org/jira/browse/HDFS-15938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15938: -- Component/s: documentation Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix java doc in FSEditLog > - > > Key: HDFS-15938 > URL: https://issues.apache.org/jira/browse/HDFS-15938 > Project: Hadoop HDFS > Issue Type: Wish > Components: documentation >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Fix java doc in > org.apache.hadoop.hdfs.server.namenode.FSEditLog#logAddCacheDirectiveInfo. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15906) Close FSImage and FSNamesystem after formatting is complete
[ https://issues.apache.org/jira/browse/HDFS-15906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15906: -- Component/s: namenode Hadoop Flags: Reviewed Target Version/s: 3.2.3, 3.3.1, 3.4.0 Affects Version/s: 3.2.3 3.3.1 3.4.0 > Close FSImage and FSNamesystem after formatting is complete > --- > > Key: HDFS-15906 > URL: https://issues.apache.org/jira/browse/HDFS-15906 > Project: Hadoop HDFS > Issue Type: Wish > Components: namenode >Affects Versions: 3.3.1, 3.4.0, 3.2.3 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Close FSImage and FSNamesystem after formatting is complete; see > org.apache.hadoop.hdfs.server.namenode.NameNode#format. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15892) Add metric for editPendingQ in FSEditLogAsync
[ https://issues.apache.org/jira/browse/HDFS-15892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15892: -- Component/s: metrics Hadoop Flags: Reviewed Target Version/s: 3.2.3, 3.3.1, 3.4.0 Affects Version/s: 3.2.3 3.3.1 3.4.0 > Add metric for editPendingQ in FSEditLogAsync > - > > Key: HDFS-15892 > URL: https://issues.apache.org/jira/browse/HDFS-15892 > Project: Hadoop HDFS > Issue Type: Wish > Components: metrics >Affects Versions: 3.3.1, 3.4.0, 3.2.3 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > To monitor editPendingQ in FSEditLogAsync, add a metric > and log a message when the queue is full. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
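A hedged sketch of what such a metric could look like using Hadoop's metrics2 annotations; the class and metric names are assumptions, not the committed patch:

{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableGaugeInt;

@Metrics(name = "FSEditLogAsyncMetrics", context = "dfs")
public class FSEditLogAsyncMetricsSketch {
  @Metric("Current depth of the editPendingQ queue")
  MutableGaugeInt editPendingQueueSize;

  static FSEditLogAsyncMetricsSketch create() {
    return DefaultMetricsSystem.instance()
        .register(new FSEditLogAsyncMetricsSketch());
  }

  // Would be called from the enqueue/dequeue paths (illustrative).
  void onQueueSizeChanged(int size) {
    editPendingQueueSize.set(size);
  }
}
{code}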
[jira] [Updated] (HDFS-15870) Remove unused configuration dfs.namenode.stripe.min
[ https://issues.apache.org/jira/browse/HDFS-15870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15870: -- Component/s: configuration Hadoop Flags: Reviewed Target Version/s: 3.2.3, 3.3.1, 3.4.0 Affects Version/s: 3.2.3 3.3.1 3.4.0 > Remove unused configuration dfs.namenode.stripe.min > --- > > Key: HDFS-15870 > URL: https://issues.apache.org/jira/browse/HDFS-15870 > Project: Hadoop HDFS > Issue Type: Wish > Components: configuration >Affects Versions: 3.3.1, 3.4.0, 3.2.3 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Remove unused configuration dfs.namenode.stripe.min. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15854) Make some parameters configurable for SlowDiskTracker and SlowPeerTracker
[ https://issues.apache.org/jira/browse/HDFS-15854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15854: -- Component/s: block placement Target Version/s: 3.3.5, 3.4.0 Affects Version/s: 3.3.5 3.4.0 > Make some parameters configurable for SlowDiskTracker and SlowPeerTracker > - > > Key: HDFS-15854 > URL: https://issues.apache.org/jira/browse/HDFS-15854 > Project: Hadoop HDFS > Issue Type: Wish > Components: block placement >Affects Versions: 3.4.0, 3.3.5 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.5 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Make some parameters configurable for SlowDiskTracker and SlowPeerTracker. > Related to https://issues.apache.org/jira/browse/HDFS-15814. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-13274) RBF: Extend RouterRpcClient to use multiple sockets
[ https://issues.apache.org/jira/browse/HDFS-13274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-13274: -- Component/s: rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > RBF: Extend RouterRpcClient to use multiple sockets > --- > > Key: HDFS-13274 > URL: https://issues.apache.org/jira/browse/HDFS-13274 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: Íñigo Goiri >Assignee: Íñigo Goiri >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > HADOOP-13144 introduces the ability to create multiple connections for the > same user and use different sockets. The RouterRpcClient should use this > approach to get a better throughput. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16598) Fix DataNode FsDatasetImpl lock issue without GS checks.
[ https://issues.apache.org/jira/browse/HDFS-16598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16598: -- Component/s: datanode Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix DataNode FsDatasetImpl lock issue without GS checks. > > > Key: HDFS-16598 > URL: https://issues.apache.org/jira/browse/HDFS-16598 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 3.4.0 >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 5h > Remaining Estimate: 0h > > org.apache.hadoop.hdfs.testPipelineRecoveryOnRestartFailure failed with a > stack trace like: > {code:java} > java.io.IOException: All datanodes > [DatanodeInfoWithStorage[127.0.0.1:57448,DS-1b5f7e33-a2bf-4edc-9122-a74c995a99f5,DISK]] > are bad. Aborting... > at > org.apache.hadoop.hdfs.DataStreamer.handleBadDatanode(DataStreamer.java:1667) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1601) > at > org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1587) > at > org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1371) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:674) > {code} > After tracing the root cause, this bug turned out to have been introduced by > [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534], because the > client's block generation stamp (GS) may be smaller than the DataNode's when > pipeline recovery fails. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16600) Fix deadlock of fine-grained lock for FsDatasetImpl of DataNode.
[ https://issues.apache.org/jira/browse/HDFS-16600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16600: -- Component/s: datanode Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix deadlock of fine-grained lock for FsDatasetImpl of DataNode. > - > > Key: HDFS-16600 > URL: https://issues.apache.org/jira/browse/HDFS-16600 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 3.4.0 >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 5h 10m > Remaining Estimate: 0h > > The UT > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction > failed because of a deadlock introduced by > [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. > Deadlock: > {code:java} > // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 > needs a read lock > try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl, > b.getBlockPoolId())) > // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line > 3526 needs a write lock > try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, > bpid)) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
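For intuition, a self-contained demonstration of the general read-to-write upgrade deadlock the snippet above describes, using a plain ReentrantReadWriteLock rather than the DataNode's lock manager:

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockUpgradeDemo {
  public static void main(String[] args) {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    lock.readLock().lock(); // like createRbw taking the block-pool read lock
    try {
      // like evictBlocks then requesting the write lock on the same pool:
      // a blocking lock() here would never return, because the write lock
      // cannot be granted while this thread still holds the read lock.
      boolean upgraded = lock.writeLock().tryLock();
      System.out.println("upgraded while holding read lock? " + upgraded); // false
    } finally {
      lock.readLock().unlock();
    }
  }
}
{code}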
[jira] [Updated] (HDFS-16526) Add metrics for slow DataNode
[ https://issues.apache.org/jira/browse/HDFS-16526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16526: -- Component/s: datanode metrics Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Add metrics for slow DataNode > - > > Key: HDFS-16526 > URL: https://issues.apache.org/jira/browse/HDFS-16526 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, metrics >Affects Versions: 3.4.0 >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: Metrics-html.png > > Time Spent: 2.5h > Remaining Estimate: 0h > > Add some more metrics for slow datanode operations - FlushOrSync, > PacketResponder send ACK. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16488) [SPS]: Expose metrics to JMX for external SPS
[ https://issues.apache.org/jira/browse/HDFS-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16488: -- Component/s: metrics sps Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > [SPS]: Expose metrics to JMX for external SPS > - > > Key: HDFS-16488 > URL: https://issues.apache.org/jira/browse/HDFS-16488 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: metrics, sps >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: image-2022-02-26-22-15-25-543.png > > Time Spent: 5h > Remaining Estimate: 0h > > Currently, external SPS has no monitoring metrics. We do not know how many > blocks are waiting to be processed, how many blocks are waiting to be > retried, and how many blocks have been migrated. > We can expose these metrics in JMX for easy collection and display by > monitoring systems. > !image-2022-02-26-22-15-25-543.png|width=631,height=170! > For example, in our cluster, we exposed these metrics to JMX, collected by > JMX-Exporter and combined with Prometheus, and finally display by Grafana. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16460) [SPS]: Handle failure retries for moving tasks
[ https://issues.apache.org/jira/browse/HDFS-16460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16460: -- Component/s: sps Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > [SPS]: Handle failure retries for moving tasks > -- > > Key: HDFS-16460 > URL: https://issues.apache.org/jira/browse/HDFS-16460 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: sps >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Handle failure retries for moving tasks. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16484) [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread
[ https://issues.apache.org/jira/browse/HDFS-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16484: -- Component/s: sps Hadoop Flags: Reviewed Target Version/s: 3.3.5, 3.2.4, 3.4.0 Affects Version/s: 3.3.5 3.2.4 3.4.0 > [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread > - > > Key: HDFS-16484 > URL: https://issues.apache.org/jira/browse/HDFS-16484 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: sps >Affects Versions: 3.4.0, 3.2.4, 3.3.5 >Reporter: qinyuren >Assignee: qinyuren >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.5 > > Attachments: image-2022-02-25-14-35-42-255.png > > Time Spent: 4h 10m > Remaining Estimate: 0h > > We ran SPS in our cluster and found this log: the SPSPathIdProcessor thread > enters an infinite loop and prints the same message over and over. > !image-2022-02-25-14-35-42-255.png|width=682,height=195! > If the SPSPathIdProcessor thread gets an inodeId whose path does not exist, > it enters an infinite loop and can no longer make progress. > The reason is that ctxt.getNextSPSPath() returns an inodeId whose path does > not exist; startINode is never reset to null, so the thread holds this > inodeId forever. > {code:java} > public void run() { > LOG.info("Starting SPSPathIdProcessor!."); > Long startINode = null; > while (ctxt.isRunning()) { > try { > if (!ctxt.isInSafeMode()) { > if (startINode == null) { > startINode = ctxt.getNextSPSPath(); > } // else same id will be retried > if (startINode == null) { > // Waiting for SPS path > Thread.sleep(3000); > } else { > ctxt.scanAndCollectFiles(startINode); > // check if directory was empty and no child added to queue > DirPendingWorkInfo dirPendingWorkInfo = > pendingWorkForDirectory.get(startINode); > if (dirPendingWorkInfo != null > && dirPendingWorkInfo.isDirWorkDone()) { > ctxt.removeSPSHint(startINode); > pendingWorkForDirectory.remove(startINode); > } > } > startINode = null; // Current inode successfully scanned. > } > } catch (Throwable t) { > String reClass = t.getClass().getName(); > if (InterruptedException.class.getName().equals(reClass)) { > LOG.info("SPSPathIdProcessor thread is interrupted. Stopping.."); > break; > } > LOG.warn("Exception while scanning file inodes to satisfy the policy", > t); > try { > Thread.sleep(3000); > } catch (InterruptedException e) { > LOG.info("Interrupted while waiting in SPSPathIdProcessor", t); > break; > } > } > } > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15987) Improve oiv tool to parse fsimage file in parallel with delimited format
[ https://issues.apache.org/jira/browse/HDFS-15987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15987: -- Component/s: tools Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Improve oiv tool to parse fsimage file in parallel with delimited format > > > Key: HDFS-15987 > URL: https://issues.apache.org/jira/browse/HDFS-15987 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: tools >Affects Versions: 3.4.0 >Reporter: Hongbing Wang >Assignee: Hongbing Wang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: Improve_oiv_tool_001.pdf > > Time Spent: 6h 40m > Remaining Estimate: 0h > > The purpose of this Jira is to improve the oiv tool to parse fsimage files > with sub-sections (see -HDFS-14617-) in parallel with the Delimited format. > 1. Serial parsing is time-consuming > The time to serially parse a large fsimage with the Delimited format (e.g. > `hdfs oiv -p Delimited -t ...`) breaks down as follows:
{code:java}
1) Loading string table:                 -> Not time consuming.
2) Loading inode references:             -> Not time consuming
3) Loading directories in INode section: -> Slightly time consuming (3%)
4) Loading INode directory section:      -> A bit time consuming (11%)
5) Output:                               -> Very time consuming (86%)
{code}
> Therefore, output is the stage most worth parallelizing. > 2. How to output in parallel > The sub-sections are grouped in order; each thread processes one group and > writes to its own output file, and the per-thread output files are merged at > the end. > 3. The result of a test
{code:java}
input fsimage file info:
3.4G, 12 sub-sections, 55976500 INodes
---------------------------------------------
Threads   TotalTime   OutputTime   MergeTime
1         18m37s      16m18s       –
4         8m7s        4m49s        41s
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16477) [SPS]: Add metric PendingSPSPaths for getting the number of paths to be processed by SPS
[ https://issues.apache.org/jira/browse/HDFS-16477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16477: -- Component/s: sps Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > [SPS]: Add metric PendingSPSPaths for getting the number of paths to be > processed by SPS > > > Key: HDFS-16477 > URL: https://issues.apache.org/jira/browse/HDFS-16477 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: sps >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 7h 50m > Remaining Estimate: 0h > > Currently we have no idea how many paths are waiting to be processed when > using the SPS feature. We should add metric PendingSPSPaths for getting the > number of paths to be processed by SPS in NameNode. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16499) [SPS]: Should not start indefinitely while another SPS process is running
[ https://issues.apache.org/jira/browse/HDFS-16499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16499: -- Component/s: sps Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > [SPS]: Should not start indefinitely while another SPS process is running > - > > Key: HDFS-16499 > URL: https://issues.apache.org/jira/browse/HDFS-16499 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: sps >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Normally, we can only start one SPS process at a time. Currently, when one process is running and another is started, the second one retries indefinitely. I think, in this case, it should exit immediately. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
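As a generic illustration of the fail-fast behavior proposed here (not the mechanism the SPS patch itself uses, which presumably checks against the NameNode whether an external SPS is already registered), a single-instance guard can look like:
{code:java}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class SingleInstanceGuard {
  public static void main(String[] args) throws IOException {
    // Non-blocking lock attempt: if another process already holds the
    // lock, exit immediately instead of retrying forever.
    FileChannel channel = FileChannel.open(Path.of("/tmp/sps.lock"),
        StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    FileLock lock = channel.tryLock();
    if (lock == null) {
      System.err.println("Another SPS process is already running; exiting.");
      System.exit(1);
    }
    // ... run the service while holding the lock ...
  }
}
{code}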
[jira] [Updated] (HDFS-13248) RBF: Namenode need to choose block location for the client
[ https://issues.apache.org/jira/browse/HDFS-13248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-13248: -- Component/s: rbf Hadoop Flags: Reviewed Target Version/s: 3.3.5, 2.10.2, 3.4.0 Affects Version/s: 3.3.5 2.10.2 3.4.0 > RBF: Namenode need to choose block location for the client > -- > > Key: HDFS-13248 > URL: https://issues.apache.org/jira/browse/HDFS-13248 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0, 2.10.2, 3.3.5 >Reporter: Wu Weiwei >Assignee: Owen O'Malley >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 2.10.2, 3.3.5 > > Attachments: HDFS-13248.000.patch, HDFS-13248.001.patch, > HDFS-13248.002.patch, HDFS-13248.003.patch, HDFS-13248.004.patch, > HDFS-13248.005.patch, HDFS-Router-Data-Locality.odt, RBF Data Locality > Design.pdf, clientMachine-call-path.jpeg, debug-info-1.jpeg, debug-info-2.jpeg > > Time Spent: 5h 10m > Remaining Estimate: 0h > > When executing a put operation via the router, the NameNode will choose block > locations for the router, not for the real client. This will affect the file's > locality. > I think on both the NameNode and the Router, we should add a new addBlock method, or > add a parameter to the current addBlock method, to pass the real client > information. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16458) [SPS]: Fix bug for unit test of reconfiguring SPS mode
[ https://issues.apache.org/jira/browse/HDFS-16458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16458: -- Component/s: sps test Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > [SPS]: Fix bug for unit test of reconfiguring SPS mode > -- > > Key: HDFS-16458 > URL: https://issues.apache.org/jira/browse/HDFS-16458 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: sps, test >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h > Remaining Estimate: 0h > > TestNameNodeReconfigure#verifySPSEnabled compared {*}isSPSRunning{*} with itself in assertEquals. > In addition, after an *internal SPS* has been removed, the *spsService daemon* will not start within StoragePolicySatisfyManager. I think the relevant code can be removed to simplify the code. > IMO, after reconfiguring the SPS mode, we just need to confirm whether the mode is correct and whether spsManager is null. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
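A minimal illustration of the vacuous-assertion bug and the intended check; names here are hypothetical, as the real test operates on NameNode internals.
{code:java}
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNull;

public class VerifySpsStateSketch {
  // Broken form from the report: a value compared with itself can never
  // fail, so the assertion checks nothing:
  //   assertEquals(isSPSRunning, isSPSRunning);

  // Intended form: compare the state observed after reconfiguration with
  // the expected state, and assert spsManager is cleared when disabled.
  static void verifySpsState(String expectedMode, String actualMode,
      Object spsManager) {
    assertEquals(expectedMode, actualMode);
    if ("none".equals(expectedMode)) {
      assertNull("spsManager should be null when SPS is disabled", spsManager);
    }
  }
}
{code}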
[jira] [Updated] (HDFS-16222) Fix ViewDFS with mount points for HDFS only API
[ https://issues.apache.org/jira/browse/HDFS-16222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16222: -- Component/s: viewfs Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix ViewDFS with mount points for HDFS only API > --- > > Key: HDFS-16222 > URL: https://issues.apache.org/jira/browse/HDFS-16222 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: viewfs >Affects Versions: 3.4.0 >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: test_to_repro.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > Presently, for HDFS-specific APIs (the ones not present in ViewFileSystem), the resolved path seems to come out wrong. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16231) Fix TestDataNodeMetrics#testReceivePacketSlowMetrics
[ https://issues.apache.org/jira/browse/HDFS-16231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16231: -- Component/s: datanode metrics Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix TestDataNodeMetrics#testReceivePacketSlowMetrics > > > Key: HDFS-16231 > URL: https://issues.apache.org/jira/browse/HDFS-16231 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, metrics >Affects Versions: 3.4.0 >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > TestDataNodeMetrics#testReceivePacketSlowMetrics fails with stacktrace: > {code:java} > java.lang.AssertionError: Expected exactly one metric for name > TotalPacketsReceived > Expected :1 > Actual :0 > > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.failNotEquals(Assert.java:835) > at org.junit.Assert.assertEquals(Assert.java:647) > at > org.apache.hadoop.test.MetricsAsserts.checkCaptured(MetricsAsserts.java:278) > at > org.apache.hadoop.test.MetricsAsserts.getLongCounter(MetricsAsserts.java:237) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics.testReceivePacketSlowMetrics(TestDataNodeMetrics.java:200) > {code} > {code:java} > // Error MetricsName in current code,e.g > TotalPacketsReceived,TotalPacketsSlowWriteToMirror,TotalPacketsSlowWriteToDisk,TotalPacketsSlowWriteToOsCache > MetricsRecordBuilder dnMetrics = > getMetrics(datanode.getMetrics().name()); > assertTrue("More than 1 packet received", > getLongCounter("TotalPacketsReceived", dnMetrics) > 1L); > assertTrue("More than 1 slow packet to mirror", > getLongCounter("TotalPacketsSlowWriteToMirror", dnMetrics) > 1L); > assertCounter("TotalPacketsSlowWriteToDisk", 1L, dnMetrics); > assertCounter("TotalPacketsSlowWriteToOsCache", 0L, dnMetrics); > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16192) ViewDistributedFileSystem#rename wrongly using src in the place of dst.
[ https://issues.apache.org/jira/browse/HDFS-16192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16192: -- Component/s: viewfs Hadoop Flags: Reviewed Target Version/s: 3.3.2, 3.4.0 Affects Version/s: 3.3.2 3.4.0 > ViewDistributedFileSystem#rename wrongly using src in the place of dst. > --- > > Key: HDFS-16192 > URL: https://issues.apache.org/jira/browse/HDFS-16192 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: viewfs >Affects Versions: 3.4.0, 3.3.2 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > In ViewDistributedFileSystem, we are mistakenly used src path in the place of > dst path when finding mount path info. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15671) TestBalancerRPCDelay#testBalancerRPCDelayQpsDefault fails on Trunk
[ https://issues.apache.org/jira/browse/HDFS-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15671: -- Component/s: balancer test Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > TestBalancerRPCDelay#testBalancerRPCDelayQpsDefault fails on Trunk > -- > > Key: HDFS-15671 > URL: https://issues.apache.org/jira/browse/HDFS-15671 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer, test >Affects Versions: 3.4.0 >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault.log > > Time Spent: 0.5h > Remaining Estimate: 0h > > qbt report shows failures on TestBalancer > {code:bash} > org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault > Failing for the past 1 build (Since Failed#317 ) > Took 45 sec. > Error Message > Timed out waiting for /tmp.txt to reach 20 replicas > Stacktrace > java.util.concurrent.TimeoutException: Timed out waiting for /tmp.txt to > reach 20 replicas > at > org.apache.hadoop.hdfs.DFSTestUtil.waitReplication(DFSTestUtil.java:829) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.createFile(TestBalancer.java:319) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:865) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerRPCDelay(TestBalancer.java:2193) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault(TestBalancerRPCDelay.java:53) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15973) RBF: Add permission check before doing router federation rename.
[ https://issues.apache.org/jira/browse/HDFS-15973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15973: -- Component/s: rbf Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > RBF: Add permission check before doing router federation rename. > > > Key: HDFS-15973 > URL: https://issues.apache.org/jira/browse/HDFS-15973 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15973.001.patch, HDFS-15973.002.patch, > HDFS-15973.003.patch, HDFS-15973.004.patch, HDFS-15973.005.patch, > HDFS-15973.006.patch, HDFS-15973.007.patch, HDFS-15973.008.patch, > HDFS-15973.009.patch, HDFS-15973.010.patch > > > The router federation rename lacks a permission check. It is a security > issue. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
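A hedged sketch of the kind of guard this adds; the checkAccess helper is hypothetical, and the actual patch enforces permissions through the downstream namespaces rather than this shape.
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.security.UserGroupInformation;

public abstract class FederationRenamePermissionSketch {
  // Assumed helper: throws AccessControlException if 'user' lacks 'action'
  // on 'path' in the owning namespace.
  protected abstract void checkAccess(UserGroupInformation user, Path path,
      FsAction action) throws IOException;

  // Require write access to both parents before submitting the rename job,
  // mirroring what a plain HDFS rename would enforce.
  public void checkRenamePermission(UserGroupInformation caller, Path src,
      Path dst) throws IOException {
    checkAccess(caller, src.getParent(), FsAction.WRITE);
    checkAccess(caller, dst.getParent(), FsAction.WRITE);
  }
}
{code}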
[jira] [Updated] (HDFS-13975) TestBalancer#testMaxIterationTime fails sporadically
[ https://issues.apache.org/jira/browse/HDFS-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-13975: -- Component/s: balancer test Target Version/s: 3.2.3, 2.10.2, 3.3.1, 3.4.0 > TestBalancer#testMaxIterationTime fails sporadically > > > Key: HDFS-13975 > URL: https://issues.apache.org/jira/browse/HDFS-13975 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer, test >Affects Versions: 3.2.0 >Reporter: Jason Darrell Lowe >Assignee: Toshihiko Uchida >Priority: Major > Labels: flaky-test, pull-request-available > Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3 > > Time Spent: 40m > Remaining Estimate: 0h > > A number of precommit builds have seen this test fail like this: > {noformat} > java.lang.AssertionError: Unexpected iteration runtime: 4021ms > 3.5s > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.testMaxIterationTime(TestBalancer.java:1649) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15848) Snapshot Operations: Add debug logs at the entry point
[ https://issues.apache.org/jira/browse/HDFS-15848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15848: -- Component/s: snapshots Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Snapshot Operations: Add debug logs at the entry point > -- > > Key: HDFS-15848 > URL: https://issues.apache.org/jira/browse/HDFS-15848 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots >Affects Versions: 3.4.0 >Reporter: Bhavik Patel >Assignee: Bhavik Patel >Priority: Minor > Fix For: 3.4.0 > > Attachments: HDFS-15848.001.patch, HDFS-15848.002.patch, > HDFS-15848.003.patch, HDFS-15848.004.patch > > > Add debug logs at the entry point for various Snapshot Operations -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
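The usual entry-point pattern for this in Hadoop is parameterized SLF4J debug logging, e.g. (sketch; the exact messages and operations covered by the patch may differ):
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SnapshotOpLoggingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(SnapshotOpLoggingSketch.class);

  public void createSnapshot(String snapshotRoot, String snapshotName) {
    // Parameterized logging avoids string building when DEBUG is off; the
    // explicit guard is optional with SLF4J but common in HDFS code.
    if (LOG.isDebugEnabled()) {
      LOG.debug("createSnapshot: snapshotRoot={}, snapshotName={}",
          snapshotRoot, snapshotName);
    }
    // ... existing snapshot creation logic ...
  }
}
{code}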
[jira] [Updated] (HDFS-15847) create client protocol: add ecPolicyName & storagePolicy param to debug statement string
[ https://issues.apache.org/jira/browse/HDFS-15847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15847: -- Component/s: erasure-coding namanode Target Version/s: 3.3.1, 3.4.0 Affects Version/s: 3.3.1 3.4.0 > create client protocol: add ecPolicyName & storagePolicy param to debug > statement string > - > > Key: HDFS-15847 > URL: https://issues.apache.org/jira/browse/HDFS-15847 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, namanode >Affects Versions: 3.3.1, 3.4.0 >Reporter: Bhavik Patel >Assignee: Bhavik Patel >Priority: Minor > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15847.0001.patch > > > A create (ClientProtocol) ==> namesystem.startFileInt does not print the > "ecPolicyName & storagePolicy" params. It would be good to have these params > added to the debug statement. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15834) Remove the usage of org.apache.log4j.Level
[ https://issues.apache.org/jira/browse/HDFS-15834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15834: -- Component/s: hdfs-common Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Remove the usage of org.apache.log4j.Level > -- > > Key: HDFS-15834 > URL: https://issues.apache.org/jira/browse/HDFS-15834 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-common >Affects Versions: 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Replace org.apache.log4j.Level with org.slf4j.event.Level in hadoop-hdfs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15820) Ensure snapshot root trash provisioning happens only post safe mode exit
[ https://issues.apache.org/jira/browse/HDFS-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15820: -- Component/s: snapshots Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Ensure snapshot root trash provisioning happens only post safe mode exit > > > Key: HDFS-15820 > URL: https://issues.apache.org/jira/browse/HDFS-15820 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots >Affects Versions: 3.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Currently, on namenode startup, snapshot trash root provisioning starts along > with the trash emptier service, but the namenode might not be out of safe mode > by then. This can fail the snapshot trash dir creation, thereby crashing the > namenode. The idea here is to trigger snapshot trash provisioning only post > safe mode exit. > {code:java} > 2021-02-04 11:23:47,323 ERROR > org.apache.hadoop.hdfs.server.namenode.NameNode: Error encountered requiring > NN shutdown. Shutting down immediately. > org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create > directory /upgrade/.Trash. Name node is in safe mode. > The reported blocks 0 needs additional 1383 blocks to reach the threshold > 0.9990 of total blocks 1385. > The number of live datanodes 0 needs an additional 1 live datanodes to reach > the minimum number 1. > Safe mode will be turned off automatically once the thresholds have been > reached. NamenodeHostName:quasar-brabeg-5.quasar-brabeg.root.hwx.site > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.newSafemodeException(FSNamesystem.java:1542) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1529) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3288) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAndProvisionSnapshotTrashRoots(FSNamesystem.java:8269) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1939) > at > org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:967) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:936) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1673) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1740) > 2021-02-04 11:23:47,334 INFO org.apache.hadoop.util.ExitUtil: Exiting with > status 1: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot > create directory /upgrade/.Trash. Name node is in safe mode. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
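A minimal sketch of the gating described above, reusing the method names visible in the stack trace (isInSafeMode, checkAndProvisionSnapshotTrashRoots); the actual hook point chosen by the patch may differ.
{code:java}
// Sketch only: provision snapshot trash roots from the startup path only
// once the namenode has left safe mode, instead of unconditionally in
// startActiveServices where it can hit SafeModeException.
void maybeProvisionSnapshotTrashRoots() {
  if (isInSafeMode()) {
    // Defer: re-invoked from the safe-mode-exit path.
    return;
  }
  checkAndProvisionSnapshotTrashRoots();
}
{code}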
[jira] [Updated] (HDFS-15817) Rename snapshots while marking them deleted
[ https://issues.apache.org/jira/browse/HDFS-15817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15817: -- Component/s: snapshots Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Rename snapshots while marking them deleted > > > Key: HDFS-15817 > URL: https://issues.apache.org/jira/browse/HDFS-15817 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots >Affects Versions: 3.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > With the ordered snapshot feature turned on, a snapshot will be just marked as > deleted but won't actually be deleted if it's not the oldest one. Since the > snapshot is just marked deleted, creating a new snapshot with the same > name as the one marked deleted will fail. To mitigate such > problems, the idea here is to rename the snapshot being marked as deleted > by appending the deletion timestamp along with the snapshot id to it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
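A small illustration of such a rename scheme; the separator and ordering here are made up, and the patch defines the actual format.
{code:java}
public final class DeletedSnapshotNameSketch {
  // Append the deletion timestamp and snapshot id so the original name
  // becomes immediately reusable for new snapshots.
  static String markDeletedName(String name, long deletionTimeMs, int snapshotId) {
    return name + "-deleted-" + deletionTimeMs + "-" + snapshotId;
  }

  public static void main(String[] args) {
    // e.g. "daily" -> "daily-deleted-1613999999000-42"
    System.out.println(markDeletedName("daily", 1613999999000L, 42));
  }
}
{code}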
[jira] [Updated] (HDFS-15767) RBF: Router federation rename of directory.
[ https://issues.apache.org/jira/browse/HDFS-15767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15767: -- Component/s: rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > RBF: Router federation rename of directory. > --- > > Key: HDFS-15767 > URL: https://issues.apache.org/jira/browse/HDFS-15767 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15767.001.patch, HDFS-15767.002.patch, > HDFS-15767.003.patch, HDFS-15767.004.patch, HDFS-15767.005.patch, > HDFS-15767.006.patch, HDFS-15767.007.patch > > > This Jira tries to support renaming a directory across namespaces using the > fedbalance framework. > We can do the router federation rename when: > # Both the src and dst have only one remote location. > # The src and dst remote locations are in different namespaces. > # The src is a directory (fedbalance depends on snapshots). > # The dst doesn't exist. > We can implement router federation rename of files in a new task so the patch > won't be too big to review. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
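A compact sketch of checking the four conditions listed above before choosing the federation rename path; the RemoteLocation shape is a stand-in, and the src type and dst existence would come from the downstream namespace.
{code:java}
import java.util.List;

public class FederationRenameCheckSketch {
  // Minimal stand-in for RBF's RemoteLocation (assumed shape).
  interface RemoteLocation {
    String getNameserviceId();
  }

  // Conditions 1-4 from the issue description, in order.
  static boolean canUseFederationRename(List<RemoteLocation> srcLocs,
      List<RemoteLocation> dstLocs, boolean srcIsDirectory, boolean dstExists) {
    return srcLocs.size() == 1 && dstLocs.size() == 1
        && !srcLocs.get(0).getNameserviceId()
            .equals(dstLocs.get(0).getNameserviceId())
        && srcIsDirectory   // fedbalance relies on snapshots on directories
        && !dstExists;
  }
}
{code}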
[jira] [Updated] (HDFS-15672) TestBalancerWithMultipleNameNodes#testBalancingBlockpoolsWithBlockPoolPolicy fails on trunk
[ https://issues.apache.org/jira/browse/HDFS-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15672: -- Component/s: balancer test Target Version/s: 3.2.3, 3.3.1, 3.4.0 Affects Version/s: 3.2.3 3.3.1 3.4.0 > TestBalancerWithMultipleNameNodes#testBalancingBlockpoolsWithBlockPoolPolicy > fails on trunk > --- > > Key: HDFS-15672 > URL: https://issues.apache.org/jira/browse/HDFS-15672 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer, test >Affects Versions: 3.3.1, 3.4.0, 3.2.3 >Reporter: Ahmed Hussein >Assignee: Masatake Iwasaki >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Time Spent: 1h > Remaining Estimate: 0h > > qbt report shows the following error: > {code:bash} > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancingBlockpoolsWithBlockPoolPolicy > Failing for the past 1 build (Since Failed#317 ) > Took 10 min. > Error Message > test timed out after 60 milliseconds > Stacktrace > org.junit.runners.model.TestTimedOutException: test timed out after 60 > milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.sleep(TestBalancerWithMultipleNameNodes.java:353) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.wait(TestBalancerWithMultipleNameNodes.java:159) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.runBalancer(TestBalancerWithMultipleNameNodes.java:175) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.runTest(TestBalancerWithMultipleNameNodes.java:550) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes.testBalancingBlockpoolsWithBlockPoolPolicy(TestBalancerWithMultipleNameNodes.java:609) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15762) TestMultipleNNPortQOP#testMultipleNNPortOverwriteDownStream fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15762: -- Component/s: hdfs test Target Version/s: 3.2.3, 3.3.1, 3.4.0 Affects Version/s: 3.2.3 3.3.1 3.4.0 > TestMultipleNNPortQOP#testMultipleNNPortOverwriteDownStream fails > intermittently > > > Key: HDFS-15762 > URL: https://issues.apache.org/jira/browse/HDFS-15762 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, test >Affects Versions: 3.3.1, 3.4.0, 3.2.3 >Reporter: Toshihiko Uchida >Assignee: Toshihiko Uchida >Priority: Minor > Labels: flaky-test, pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Attachments: PR2585#1-TestMultipleNNPortQOP-output.txt > > Time Spent: 3h 40m > Remaining Estimate: 0h > > This unit test failed in https://github.com/apache/hadoop/pull/2585 due to an > AssertionError. > {code} > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.hdfs.TestMultipleNNPortQOP.testMultipleNNPortOverwriteDownStream(TestMultipleNNPortQOP.java:267) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} > The failure occurred at the following assertion. > {code} > doTest(fsPrivacy, PATH1); > for (int i = 0; i < 2; i++) { > DataNode dn = dataNodes.get(i); > SaslDataTransferClient saslClient = dn.getSaslClient(); > String qop = null; > // It may take some time for the qop to populate > // to all DNs, check in a loop. > for (int trial = 0; trial < 10; trial++) { > qop = saslClient.getTargetQOP(); > if (qop != null) { > break; > } > Thread.sleep(100); > } > assertEquals("auth", qop); > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-14558) RBF: Isolation/Fairness documentation
[ https://issues.apache.org/jira/browse/HDFS-14558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-14558: -- Component/s: rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > RBF: Isolation/Fairness documentation > - > > Key: HDFS-14558 > URL: https://issues.apache.org/jira/browse/HDFS-14558 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: CR Hota >Assignee: Fengnan Li >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-14558.001.patch, HDFS-14558.002.patch, > HDFS-14558.003.patch > > > Documentation is needed to make users aware of this feature HDFS-14090. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15702) Fix intermittent failure of TestDecommission#testAllocAndIBRWhileDecommission
[ https://issues.apache.org/jira/browse/HDFS-15702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15702: -- Component/s: hdfs test Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix intermittent failure of TestDecommission#testAllocAndIBRWhileDecommission > -- > > Key: HDFS-15702 > URL: https://issues.apache.org/jira/browse/HDFS-15702 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, test >Affects Versions: 3.4.0 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > {noformat} > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.hdfs.TestDecommission.testAllocAndIBRWhileDecommission(TestDecommission.java:1025) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15766) RBF: MockResolver.getMountPoints() breaks the semantic of FileSubclusterResolver.
[ https://issues.apache.org/jira/browse/HDFS-15766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15766: -- Component/s: rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > RBF: MockResolver.getMountPoints() breaks the semantic of > FileSubclusterResolver. > - > > Key: HDFS-15766 > URL: https://issues.apache.org/jira/browse/HDFS-15766 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15766.001.patch, HDFS-15766.002.patch, > HDFS-15766.003.patch > > > MockResolver.getMountPoints() breaks the semantic of > FileSubclusterResolver.getMountPoints(). Currently it returns null when the > path is a mount point and no mount points are under the path. > {quote}Return zero-length list if the path is a mount point but there are no > mount points under the path. > {quote} > > This is required by router federation rename. I found this bug when writing > unit test for the rbf rename. Let's fix it here to avoid mixing up with the > router federation rename. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
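The contract fix described above reduces to distinguishing "mount point with no children" from "path not found"; a hedged sketch with assumed helper methods, not the real MockResolver internals:
{code:java}
import java.util.Collections;
import java.util.List;

public abstract class MountPointsSketch {
  protected abstract boolean isMountPoint(String path);           // assumed helper
  protected abstract List<String> childMountPoints(String path);  // assumed helper

  // FileSubclusterResolver contract: zero-length list when the path is a
  // mount point with no mount points under it; null only when the path is
  // not related to any mount point.
  public List<String> getMountPoints(String path) {
    List<String> children = childMountPoints(path);
    if (!children.isEmpty()) {
      return children;
    }
    return isMountPoint(path) ? Collections.emptyList() : null;
  }
}
{code}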
[jira] [Updated] (HDFS-15748) RBF: Move the router related part from hadoop-federation-balance module to hadoop-hdfs-rbf.
[ https://issues.apache.org/jira/browse/HDFS-15748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15748: -- Component/s: rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > RBF: Move the router related part from hadoop-federation-balance module to > hadoop-hdfs-rbf. > --- > > Key: HDFS-15748 > URL: https://issues.apache.org/jira/browse/HDFS-15748 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15748.001.patch, HDFS-15748.002.patch, > HDFS-15748.003.patch, HDFS-15748.004.patch > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15648) TestFileChecksum should be parameterized
[ https://issues.apache.org/jira/browse/HDFS-15648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15648: -- Component/s: test Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > TestFileChecksum should be parameterized > > > Key: HDFS-15648 > URL: https://issues.apache.org/jira/browse/HDFS-15648 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.4.0 >Reporter: Ahmed Hussein >Assignee: Masatake Iwasaki >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > {{TestFileChecksumCompositeCrc}} extends {{TestFileChecksum}} overriding 3 > methods that return a constant flag True/False. > The class is useless and it causes confusion with two different jiras, while > the main bug should be in TestFileChecksum. > The {{TestFileChecksum}} should be parameterized -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
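For reference, the standard JUnit 4 shape for folding the CompositeCrc subclass back into the parent test as a parameter (illustrative only, not the actual patch):
{code:java}
import java.util.Arrays;
import java.util.Collection;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class TestFileChecksumSketch {
  @Parameters(name = "compositeCrc={0}")
  public static Collection<Object[]> data() {
    return Arrays.asList(new Object[][] {{false}, {true}});
  }

  private final boolean useCompositeCrc;

  public TestFileChecksumSketch(boolean useCompositeCrc) {
    this.useCompositeCrc = useCompositeCrc;
  }
  // ... tests branch on useCompositeCrc instead of overridden methods ...
}
{code}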
[jira] [Updated] (HDFS-15677) TestRouterRpcMultiDestination#testGetCachedDatanodeReport fails on trunk
[ https://issues.apache.org/jira/browse/HDFS-15677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15677: -- Component/s: rbf test Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > TestRouterRpcMultiDestination#testGetCachedDatanodeReport fails on trunk > > > Key: HDFS-15677 > URL: https://issues.apache.org/jira/browse/HDFS-15677 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf, test >Affects Versions: 3.4.0 >Reporter: Ahmed Hussein >Assignee: Masatake Iwasaki >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > qbt report (Nov 8, 2020, 11:28 AM) shows failures in > testGetCachedDatanodeReport -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15674) TestBPOfferService#testMissBlocksWhenReregister fails on trunk
[ https://issues.apache.org/jira/browse/HDFS-15674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15674: -- Component/s: datanode test Target Version/s: 3.3.6, 3.4.0 Affects Version/s: 3.3.6 3.4.0 > TestBPOfferService#testMissBlocksWhenReregister fails on trunk > -- > > Key: HDFS-15674 > URL: https://issues.apache.org/jira/browse/HDFS-15674 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, test >Affects Versions: 3.4.0, 3.3.6 >Reporter: Ahmed Hussein >Assignee: Masatake Iwasaki >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.6 > > Time Spent: 1h > Remaining Estimate: 0h > > qbt report (Nov 8, 2020, 11:28 AM) shows failures timing out in > testMissBlocksWhenReregister -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15643) EC: Fix checksum computation in case of native encoders
[ https://issues.apache.org/jira/browse/HDFS-15643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15643: -- Component/s: erasure-coding Target Version/s: 3.2.3, 3.3.1, 3.2.2, 3.4.0 Affects Version/s: 3.2.3 3.3.1 3.2.2 3.4.0 > EC: Fix checksum computation in case of native encoders > --- > > Key: HDFS-15643 > URL: https://issues.apache.org/jira/browse/HDFS-15643 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: 3.2.2, 3.3.1, 3.4.0, 3.2.3 >Reporter: Ahmed Hussein >Assignee: Ayush Saxena >Priority: Blocker > Labels: pull-request-available > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15643-01.patch, Test-Fix-01.patch, > TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery17.log, > org.apache.hadoop.hdfs.TestFileChecksum-output.txt, > org.apache.hadoop.hdfs.TestFileChecksum.txt > > Time Spent: 4h 40m > Remaining Estimate: 0h > > There are many failures in {{TestFileChecksumCompositeCrc}}. The test cases > {{testStripedFileChecksumWithMissedDataBlocksRangeQueryXX}} fail. The > following is a sample of the stack trace in two of them Query7 and Query8. > {code:bash} > org.apache.hadoop.fs.PathIOException: `/striped/stripedFileChecksum1': Fail > to get block checksum for > LocatedStripedBlock{BP-1812707539-172.17.0.3-1602771351154:blk_-9223372036854775792_1001; > getBlockSize()=37748736; corrupt=false; offset=0; > locs=[DatanodeInfoWithStorage[127.0.0.1:36687,DS-b00139f0-4f28-4870-8f72-b726bd339e23,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36303,DS-49a3c58e-da4a-4256-b1f9-893e4003ec94,DISK], > > DatanodeInfoWithStorage[127.0.0.1:43975,DS-ac278858-b6c8-424f-9e20-58d718dabe31,DISK], > > DatanodeInfoWithStorage[127.0.0.1:37507,DS-17f9d8d8-f8d3-443b-8df7-29416a2f5cb0,DISK], > > DatanodeInfoWithStorage[127.0.0.1:36441,DS-7e9d19b5-6220-465f-b33e-f8ed0e60fb07,DISK], > > DatanodeInfoWithStorage[127.0.0.1:42555,DS-ce679f5e-19fe-45b0-a0cd-8d8bec2f4735,DISK], > > DatanodeInfoWithStorage[127.0.0.1:39093,DS-4a7f54bb-dd39-4b5b-8dee-31a1b565cd7f,DISK], > > DatanodeInfoWithStorage[127.0.0.1:41699,DS-e1f939f3-37e7-413e-a522-934243477d81,DISK]]; > indices=[1, 2, 3, 4, 5, 6, 7, 8]} > at > org.apache.hadoop.hdfs.FileChecksumHelper$StripedFileNonStripedChecksumComputer.checksumBlocks(FileChecksumHelper.java:640) > at > org.apache.hadoop.hdfs.FileChecksumHelper$FileChecksumComputer.compute(FileChecksumHelper.java:252) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumInternal(DFSClient.java:1851) > at > org.apache.hadoop.hdfs.DFSClient.getFileChecksumWithCombineMode(DFSClient.java:1871) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1902) > at > org.apache.hadoop.hdfs.DistributedFileSystem$34.doCall(DistributedFileSystem.java:1899) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1916) > at > org.apache.hadoop.hdfs.TestFileChecksum.getFileChecksum(TestFileChecksum.java:584) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery(TestFileChecksum.java:295) > at > org.apache.hadoop.hdfs.TestFileChecksum.testStripedFileChecksumWithMissedDataBlocksRangeQuery7(TestFileChecksum.java:377) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} > > {code:bash} > Error Message > `/striped/stripedFileChecksum1': Fail to get block checksum for > LocatedStripedBlock{BP-1299291876-172.17.0.3-1
[jira] [Updated] (HDFS-15460) TestFileCreation#testServerDefaultsWithMinimalCaching fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15460: -- Component/s: hdfs test Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > TestFileCreation#testServerDefaultsWithMinimalCaching fails intermittently > -- > > Key: HDFS-15460 > URL: https://issues.apache.org/jira/browse/HDFS-15460 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, test >Affects Versions: 3.4.0 >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available, test > Fix For: 3.4.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > {{TestFileCreation.testServerDefaultsWithMinimalCaching}} fails > intermittently on trunk > {code:bash} > [ERROR] Tests run: 25, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: > 103.413 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestFileCreation > [ERROR] > testServerDefaultsWithMinimalCaching(org.apache.hadoop.hdfs.TestFileCreation) > Time elapsed: 2.435 s <<< FAILURE! > java.lang.AssertionError: expected:<402653184> but was:<268435456> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.hdfs.TestFileCreation.testServerDefaultsWithMinimalCaching(TestFileCreation.java:279) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: 
hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-9776) TestHAAppend#testMultipleAppendsDuringCatchupTailing is flaky
[ https://issues.apache.org/jira/browse/HDFS-9776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-9776: - Component/s: test Target Version/s: 3.2.3, 3.3.1, 3.2.2, 3.4.0 Affects Version/s: 3.2.3 3.3.1 3.2.2 3.4.0 > TestHAAppend#testMultipleAppendsDuringCatchupTailing is flaky > - > > Key: HDFS-9776 > URL: https://issues.apache.org/jira/browse/HDFS-9776 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.2.2, 3.3.1, 3.4.0, 3.2.3 >Reporter: Vinayakumar B >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3 > > Attachments: TestHAAppend.testMultipleAppendsDuringCatchupTailing.log > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Initial analysis of the recent test failure in > {{TestHAAppend#testMultipleAppendsDuringCatchupTailing}} > [here|https://builds.apache.org/job/PreCommit-HDFS-Build/14420/testReport/org.apache.hadoop.hdfs.server.namenode.ha/TestHAAppend/testMultipleAppendsDuringCatchupTailing/] > > has found that, if the Active NameNode goes down immediately after the truncate > operation, but before the BlockRecovery command is sent to the datanode, > then this block will never be truncated. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15640) Add diff threshold to FedBalance
[ https://issues.apache.org/jira/browse/HDFS-15640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15640: -- Component/s: rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Add diff threshold to FedBalance > > > Key: HDFS-15640 > URL: https://issues.apache.org/jira/browse/HDFS-15640 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15640.001.patch, HDFS-15640.002.patch, > HDFS-15640.003.patch, HDFS-15640.004.patch > > > Currently the DistCpProcedure must submit distcp jobs round by round until > there is no diff before it can go to the final distcp stage. This condition is > very strict. During the incremental copy stage, if the diff size is under the > given threshold then we don't need to wait for zero diff; we can start the > final distcp directly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
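The exit rule this describes reduces to a simple comparison (sketch; the config name and wiring in the patch may differ):
{code:java}
public class DiffThresholdSketch {
  // With threshold 0 this degenerates to the old "no diff at all" rule.
  static boolean readyForFinalDistcp(int snapshotDiffSize, int diffThreshold) {
    return snapshotDiffSize <= diffThreshold;
  }

  public static void main(String[] args) {
    System.out.println(readyForFinalDistcp(3, 10));  // true: start final distcp
    System.out.println(readyForFinalDistcp(3, 0));   // false: keep iterating
  }
}
{code}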
[jira] [Updated] (HDFS-15614) Initialize snapshot trash root during NameNode startup if enabled
[ https://issues.apache.org/jira/browse/HDFS-15614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15614: -- Component/s: namanode snapshots Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Initialize snapshot trash root during NameNode startup if enabled > - > > Key: HDFS-15614 > URL: https://issues.apache.org/jira/browse/HDFS-15614 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namanode, snapshots >Affects Versions: 3.4.0 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > This is a follow-up to HDFS-15607. > Goal: > Initialize (create) snapshot trash root for all existing snapshottable > directories if {{dfs.namenode.snapshot.trashroot.enabled}} is set to > {{true}}. So admins won't have to run {{dfsadmin -provisionTrash}} manually > on all those existing snapshottable directories. > The change is expected to land in {{FSNamesystem}}. > Discussion: > 1. Currently in HDFS-15607, the snapshot trash root creation logic is on the > client side. But in order for NN to create it at startup, the logic must > (also) be implemented on the server side as well. -- which is also a > requirement by WebHDFS (HDFS-15612). > 2. Alternatively, we can provide an extra parameter to the > {{-provisionTrash}} command like: {{dfsadmin -provisionTrash -all}} to > initialize/provision trash root on all existing snapshottable dirs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15598) ViewHDFS#canonicalizeUri should not be restricted to DFS only API.
[ https://issues.apache.org/jira/browse/HDFS-15598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15598: -- Component/s: viewfs Target Version/s: 3.4.0 > ViewHDFS#canonicalizeUri should not be restricted to DFS only API. > -- > > Key: HDFS-15598 > URL: https://issues.apache.org/jira/browse/HDFS-15598 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: viewfs >Affects Versions: 3.4.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > As part of Hive partitions verification, an insert failed due to canonicalizeUri > being restricted to DFS only. This can be relaxed to delegate to > vfs#canonicalizeUri -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15585) ViewDFS#getDelegationToken should not throw UnsupportedOperationException.
[ https://issues.apache.org/jira/browse/HDFS-15585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15585: -- Component/s: viewfs Target Version/s: 3.3.1, 3.4.0 > ViewDFS#getDelegationToken should not throw UnsupportedOperationException. > -- > > Key: HDFS-15585 > URL: https://issues.apache.org/jira/browse/HDFS-15585 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: viewfs >Affects Versions: 3.4.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > When starting Hive in a secure environment, it throws > UnsupportedOperationException from ViewDFS. > at org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:736) > ~[hive-service-3.1.3000.7.2.3.0-54.jar:3.1.3000.7.2.3.0-54] > at > org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:1077) > ~[hive-service-3.1.3000.7.2.3.0-54.jar:3.1.3000.7.2.3.0-54] > ... 9 more > Caused by: java.lang.UnsupportedOperationException > at > org.apache.hadoop.hdfs.ViewDistributedFileSystem.getDelegationToken(ViewDistributedFileSystem.java:1042) > ~[hadoop-hdfs-client-3.1.1.7.2.3.0-54.jar:?] > at > org.apache.hadoop.security.token.DelegationTokenIssuer.collectDelegationTokens(DelegationTokenIssuer.java:95) > ~[hadoop-common-3.1.1.7.2.3.0-54.jar:?] > at > org.apache.hadoop.security.token.DelegationTokenIssuer.addDelegationTokens(DelegationTokenIssuer.java:76) > ~[hadoop-common-3.1.1.7.2.3.0-54.jar:?] > at > org.apache.tez.common.security.TokenCache.obtainTokensForFileSystemsInternal(TokenCache.java:140) > ~[tez-api-0.9.1.7.2.3.0-54.jar:0.9.1.7.2.3.0-54] > at > org.apache.tez.common.security.TokenCache.obtainTokensForFileSystemsInternal(TokenCache.java:101) > ~[tez-api-0.9.1.7.2.3.0-54.jar:0.9.1.7.2.3.0-54] > at > org.apache.tez.common.security.TokenCache.obtainTokensForFileSystems(TokenCache.java:77) > ~[tez-api-0.9.1.7.2.3.0-54.jar:0.9.1.7.2.3.0-54] > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.createLlapCredentials(TezSessionState.java:443) > ~[hive-exec-3.1.3000.7.2.3.0-54.jar:3.1.3000.7.2.3.0-54] > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.openInternal(TezSessionState.java:354) > ~[hive-exec-3.1.3000.7.2.3.0-54.jar:3.1.3000.7.2.3.0-54] > at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:313) > ~[hive-exec-3.1.3000.7.2.3.0-54.jar:3.1.3000.7.2.3.0-54] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15532) listFiles on root/InternalDir will fail if fallback root has file
[ https://issues.apache.org/jira/browse/HDFS-15532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15532: -- Component/s: viewfs Target Version/s: 3.3.1, 3.4.0 > listFiles on root/InternalDir will fail if fallback root has file > - > > Key: HDFS-15532 > URL: https://issues.apache.org/jira/browse/HDFS-15532 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: viewfs >Affects Versions: 3.4.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > The listFiles implementation gets the RemoteIterator created in > InternalViewFSDirFs, as the root is an InternalViewFSDir. > If there is a fallback and a file exists at the root level, it would have > been collected when collecting locatedStatuses. > When it is iterating over to that fallback file from the RemoteIterator (which > was returned from InternalViewFSDirFs), the iterator's next will call > getFileBlockLocations if it's a file. > {code:java} > @Override > public LocatedFileStatus next() throws IOException { > System.out.println(this); > if (!hasNext()) { > throw new NoSuchElementException("No more entries in " + f); > } > FileStatus result = stats[i++]; > // for files, use getBlockLocations(FileStatus, int, int) to avoid > // calling getFileStatus(Path) to load the FileStatus again > BlockLocation[] locs = result.isFile() ? > getFileBlockLocations(result, 0, result.getLen()) : > null; > return new LocatedFileStatus(result, locs); > }{code} > > This getFileBlockLocations call will be made on InternalViewFSDirFs, as the > iterator was originally created from that fs. > InternalViewFSDirFs#getFileBlockLocations does not handle fallback cases. > It always expects "/", which means it always assumes a directory. > But with the fallback and the iterator returned from InternalViewFSDirFs, this > creates problems. > Probably we need to handle the fallback case in getFileBlockLocations as well. > (Fallback should be the only reason for a call coming to InternalViewFSDirFs > with something other than "/".) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15558) ViewDistributedFileSystem#recoverLease should call super.recoverLease when there are no mounts configured
[ https://issues.apache.org/jira/browse/HDFS-15558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15558: -- Component/s: viewfs Target Version/s: 3.3.1, 3.4.0 Affects Version/s: 3.3.1 3.4.0 > ViewDistributedFileSystem#recoverLease should call super.recoverLease when > there are no mounts configured > - > > Key: HDFS-15558 > URL: https://issues.apache.org/jira/browse/HDFS-15558 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: viewfs >Affects Versions: 3.3.1, 3.4.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
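The title describes a simple fallback pattern; a hedged sketch of it follows (the mount-state check is an assumption, and the mounted branch is elided):

{code:java}
// Sketch: with no mounts configured, ViewDistributedFileSystem should
// behave exactly like DistributedFileSystem for lease recovery.
@Override
public boolean recoverLease(final Path f) throws IOException {
  if (this.vfs == null) {
    // No mount points configured: defer to DistributedFileSystem.
    return super.recoverLease(f);
  }
  // Mounts configured: resolve f against the mount table and call
  // recoverLease on the target fs (elided in this sketch).
  throw new UnsupportedOperationException("mounted case elided in sketch");
}
{code}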
[jira] [Updated] (HDFS-15496) Add UI for deleted snapshots
[ https://issues.apache.org/jira/browse/HDFS-15496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15496: -- Component/s: snapshots Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Add UI for deleted snapshots > > > Key: HDFS-15496 > URL: https://issues.apache.org/jira/browse/HDFS-15496 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots >Affects Versions: 3.4.0 >Reporter: Mukul Kumar Singh >Assignee: Vivek Ratnavel Subramanian >Priority: Major > Fix For: 3.4.0 > > > Add UI for deleted snapshots > a) Show the list of snapshots per snapshottable directory > b) Add deleted status in the JMX output for the Snapshot along with a snap ID > e) The NN UI should sort the snapshots by snap IDs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15518) Wrong operation name in FsNamesystem for listSnapshots
[ https://issues.apache.org/jira/browse/HDFS-15518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15518: -- Component/s: snapshots Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Wrong operation name in FsNamesystem for listSnapshots > -- > > Key: HDFS-15518 > URL: https://issues.apache.org/jira/browse/HDFS-15518 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots >Affects Versions: 3.4.0 >Reporter: Mukul Kumar Singh >Assignee: Aryan Gupta >Priority: Major > Fix For: 3.4.0 > > > The listSnapshots operation uses "listSnapshotDirectory" as the operation-name string in place of > "ListSnapshot". > https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L7026 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
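For context, FSNamesystem handlers pass an operation-name string to checkOperation and the audit logger, so the bug is only a wrong constant. A hedged sketch of the corrected shape (surrounding method body abridged; treat the exact surrounding code as an assumption):

{code:java}
// Sketch of the fix shape: use "ListSnapshot" as the operation name
// instead of "listSnapshotDirectory". Everything else is elided.
final String operationName = "ListSnapshot";
checkOperation(OperationCategory.READ);
// ... perform the snapshot listing under the read lock ...
logAuditEvent(true, operationName, snapshotRoot);
{code}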
[jira] [Updated] (HDFS-15374) Add documentation for fedbalance tool
[ https://issues.apache.org/jira/browse/HDFS-15374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15374: -- Component/s: documentation rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Add documentation for fedbalance tool > - > > Key: HDFS-15374 > URL: https://issues.apache.org/jira/browse/HDFS-15374 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: documentation, rbf >Affects Versions: 3.4.0 >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Fix For: 3.4.0 > > Attachments: BalanceProcedureScheduler.png, > FedBalance_Screenshot1.jpg, FedBalance_Screenshot2.jpg, > FedBalance_Screenshot3.jpg, HDFS-15374.001.patch, HDFS-15374.002.patch, > HDFS-15374.003.patch, HDFS-15374.004.patch, HDFS-15374.005.patch > > > Add documentation for fedbalance tool. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15410) Add separated config file hdfs-fedbalance-default.xml for fedbalance tool
[ https://issues.apache.org/jira/browse/HDFS-15410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15410: -- Component/s: rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Add separated config file hdfs-fedbalance-default.xml for fedbalance tool > - > > Key: HDFS-15410 > URL: https://issues.apache.org/jira/browse/HDFS-15410 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15410.001.patch, HDFS-15410.002.patch, > HDFS-15410.003.patch, HDFS-15410.004.patch, HDFS-15410.005.patch > > > Add a separated config file named hdfs-fedbalance-default.xml for fedbalance > tool configs. It's like the distcp-default.xml for the distcp tool. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
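As a sketch of what such a file could contain (the property names and defaults below are assumptions for illustration, not necessarily the committed values):

{code:xml}
<!-- Hypothetical sketch of hdfs-fedbalance-default.xml. Property names
     and defaults are assumptions, not the committed values. -->
<configuration>
  <property>
    <name>hdfs.fedbalance.procedure.work.thread.num</name>
    <value>10</value>
    <description>Worker threads used to run balance procedures.</description>
  </property>
  <property>
    <name>hdfs.fedbalance.procedure.scheduler.journal.uri</name>
    <value>hdfs://localhost:8020/tmp/procedure</value>
    <description>Journal location used to recover unfinished jobs.</description>
  </property>
</configuration>
{code}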
[jira] [Updated] (HDFS-15346) FedBalance tool implementation
[ https://issues.apache.org/jira/browse/HDFS-15346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15346: -- Component/s: rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > FedBalance tool implementation > -- > > Key: HDFS-15346 > URL: https://issues.apache.org/jira/browse/HDFS-15346 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15346.001.patch, HDFS-15346.002.patch, > HDFS-15346.003.patch, HDFS-15346.004.patch, HDFS-15346.005.patch, > HDFS-15346.006.patch, HDFS-15346.007.patch, HDFS-15346.008.patch, > HDFS-15346.009.patch, HDFS-15346.010.patch, HDFS-15346.011.patch, > HDFS-15346.012.patch > > > This Jira implements the HDFS FedBalance tool based on the basic framework > in HDFS-15340. The whole process of the HDFS federation balance tool is implemented in > this Jira. See the documentation at HDFS-15374/patch-v05 for a detailed > description of the HDFS fedbalance tool. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15340) RBF: Implement BalanceProcedureScheduler basic framework
[ https://issues.apache.org/jira/browse/HDFS-15340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15340: -- Component/s: rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > RBF: Implement BalanceProcedureScheduler basic framework > > > Key: HDFS-15340 > URL: https://issues.apache.org/jira/browse/HDFS-15340 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: Jinglun >Assignee: Jinglun >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15340.001.patch, HDFS-15340.002.patch, > HDFS-15340.003.patch, HDFS-15340.004.patch, HDFS-15340.005.patch, > HDFS-15340.006.patch, HDFS-15340.007.patch, HDFS-15340.008.patch > > > This Jira implements the basic framework (Balance Procedure Scheduler) of the > HDFS federation balance tool. > The Balance Procedure Scheduler implements a state machine. It’s responsible > for scheduling a balance job, including submit, run, delay and recover. See > the documentation at HDFS-15374/patch-v05 for a detailed description of the > state machine. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
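To make the state-machine idea concrete, here is a minimal, self-contained illustration of a submit/run/delay/recover scheduler loop (all names are invented for this sketch; the real BalanceProcedureScheduler is journal-backed and multi-threaded):

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of the scheduler's job lifecycle, for illustration only.
enum JobState { SUBMITTED, RUNNING, DELAYED, RECOVERING, DONE }

class TinyScheduler {
  static class BalanceJob {
    JobState state = JobState.SUBMITTED;
    // Returns true when all procedures of the job have finished.
    boolean runOneStep() { return true; }
  }

  private final Queue<BalanceJob> runnable = new ArrayDeque<>();

  void submit(BalanceJob job) {            // submit
    job.state = JobState.SUBMITTED;
    runnable.add(job);
  }

  void recover(BalanceJob journaled) {     // recover: restored from journal
    journaled.state = JobState.RECOVERING;
    runnable.add(journaled);
  }

  void runLoop() {                         // run / delay
    BalanceJob job;
    while ((job = runnable.poll()) != null) {
      job.state = JobState.RUNNING;
      if (job.runOneStep()) {
        job.state = JobState.DONE;
      } else {
        job.state = JobState.DELAYED;      // the real scheduler uses a
        runnable.add(job);                 // delay queue with a wake-up time
      }
    }
  }
}
{code}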
[jira] [Updated] (HDFS-15146) TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15146: -- Component/s: balancer test Target Version/s: 2.10.1, 3.2.2, 3.3.0, 3.4.0 Affects Version/s: 2.10.1 3.2.2 3.3.0 3.4.0 > TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault fails intermittently > - > > Key: HDFS-15146 > URL: https://issues.apache.org/jira/browse/HDFS-15146 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: balancer, test >Affects Versions: 3.3.0, 3.2.2, 2.10.1, 3.4.0 >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Minor > Fix For: 3.3.0, 3.2.2, 2.10.1, 3.4.0 > > Attachments: HDFS-15146-branch-2.10.001.patch, HDFS-15146.001.patch > > > TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault fails intermittently when > the number of blocks does not match the expected count. In > {{testBalancerRPCDelay}}, it seems that some datanodes will not be up by the > time we fetch the block locations. > I see the following stack trace: > {code:bash} > [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: > 39.969 s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay > [ERROR] > testBalancerRPCDelayQpsDefault(org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay) > Time elapsed: 12.035 s <<< FAILURE! > java.lang.AssertionError: Number of getBlocks should be not less than 20 > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancer.testBalancerRPCDelay(TestBalancer.java:2197) > at > org.apache.hadoop.hdfs.server.balancer.TestBalancerRPCDelay.testBalancerRPCDelayQpsDefault(TestBalancerRPCDelay.java:53) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
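A common way to harden such a test (a sketch of the general technique, not the committed HDFS-15146 patch; variable names like filePath and replicationFactor are placeholders) is to wait for full replication before fetching block locations:

{code:java}
// Sketch: make sure the cluster and the test file are ready before the
// balancer run, so getBlocks sees all datanodes. Not the actual patch.
cluster.waitActive();                              // datanodes registered
DFSTestUtil.waitReplication(fs, filePath, replicationFactor);
// ... only now start the balancer and assert on the getBlocks count ...
{code}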
[jira] [Updated] (HDFS-15898) Test case TestOfflineImageViewer fails
[ https://issues.apache.org/jira/browse/HDFS-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15898: -- Component/s: test Hadoop Flags: Reviewed Target Version/s: 3.3.1, 3.4.0 Affects Version/s: 3.3.1 3.4.0 > Test case TestOfflineImageViewer fails > -- > > Key: HDFS-15898 > URL: https://issues.apache.org/jira/browse/HDFS-15898 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Affects Versions: 3.3.1, 3.4.0 >Reporter: Hui Fei >Assignee: Hui Fei >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > The following 3 cases failed locally: > TestOfflineImageViewer#testWriterOutputEntryBuilderForFile > > {code:java} > org.junit.ComparisonFailure: Expected > :/path/file,5,2000-01-01 00:00,2000-01-01 > 00:00,1024,3,3072,0,0,-rwx-wx-w-+,user_1,group_1Actual > :/path/file,5,2000-01-01 08:00,2000-01-01 > 08:00,1024,3,3072,0,0,-rwx-wx-w-+,user_1,group_1 > at org.junit.Assert.assertEquals(Assert.java:115) at > org.junit.Assert.assertEquals(Assert.java:144) at > org.apache.hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer.testWriterOutputEntryBuilderForFile(TestOfflineImageViewer.java:760){code} > TestOfflineImageViewer#testWriterOutputEntryBuilderForDirectory > {code:java} > org.junit.ComparisonFailure: Expected > :/path/dir,0,2000-01-01 00:00,1970-01-01 > 00:00,0,0,0,700,1000,drwx-wx-w-+,user_1,group_1Actual > :/path/dir,0,2000-01-01 08:00,1970-01-01 > 08:00,0,0,0,700,1000,drwx-wx-w-+,user_1,group_1 at > org.junit.Assert.assertEquals(Assert.java:115) at > org.junit.Assert.assertEquals(Assert.java:144) at > org.apache.hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer.testWriterOutputEntryBuilderForDirectory(TestOfflineImageViewer.java:768){code} > TestOfflineImageViewer#testWriterOutputEntryBuilderForSymlink > {code:java} > org.junit.ComparisonFailure: Expected > :/path/sym,0,2000-01-01 00:00,2000-01-01 > 00:00,0,0,0,0,0,-rwx-wx-w-,user_1,group_1Actual :/path/sym,0,2000-01-01 > 08:00,2000-01-01 08:00,0,0,0,0,0,-rwx-wx-w-,user_1,group_1 > at org.junit.Assert.assertEquals(Assert.java:115) at > org.junit.Assert.assertEquals(Assert.java:144) at > org.apache.hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer.testWriterOutputEntryBuilderForSymlink(TestOfflineImageViewer.java:776){code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
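The expected values show 00:00 while the actual values show 08:00, which suggests the machine's local timezone (UTC+8) leaks into the formatted timestamps. A hedged sketch of the usual remedy, pinning the timezone in test setup (the committed fix may differ):

{code:java}
import java.util.TimeZone;
import org.junit.AfterClass;
import org.junit.BeforeClass;

// Sketch: pin the JVM default timezone so timestamp-formatting assertions
// are deterministic on any machine. May differ from the committed fix.
private static TimeZone savedTz;

@BeforeClass
public static void pinTimeZone() {
  savedTz = TimeZone.getDefault();
  TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
}

@AfterClass
public static void restoreTimeZone() {
  TimeZone.setDefault(savedTz);
}
{code}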
[jira] [Updated] (HDFS-15576) Erasure Coding: Add rs and rs-legacy codec test for addPolicies
[ https://issues.apache.org/jira/browse/HDFS-15576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15576: -- Component/s: erasure-coding test Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Erasure Coding: Add rs and rs-legacy codec test for addPolicies > --- > > Key: HDFS-15576 > URL: https://issues.apache.org/jira/browse/HDFS-15576 > Project: Hadoop HDFS > Issue Type: Test > Components: erasure-coding, test >Affects Versions: 3.4.0 >Reporter: Hui Fei >Assignee: Hui Fei >Priority: Minor > Fix For: 3.4.0 > > Attachments: HDFS-15576.001.patch, HDFS-15576.002.patch > > > * Add rs and rs-legacy codec test for TestErasureCodingCLI > * Add comments for failed test RS > * Modify UT, change "RS" to "rs", because "RS" is not supported -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15690) Add lz4-java as hadoop-hdfs test dependency
[ https://issues.apache.org/jira/browse/HDFS-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15690: -- Component/s: test Target Version/s: 3.3.1, 3.4.0 Affects Version/s: 3.3.1 3.4.0 > Add lz4-java as hadoop-hdfs test dependency > --- > > Key: HDFS-15690 > URL: https://issues.apache.org/jira/browse/HDFS-15690 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Affects Versions: 3.3.1, 3.4.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > TestFSImage.testNativeCompression fails with "java.lang.NoClassDefFoundError: > net/jpountz/lz4/LZ4Factory": > https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/305/testReport/junit/org.apache.hadoop.hdfs.server.namenode/TestFSImage/testNativeCompression/ > We need to add lz4-java to hadoop-hdfs test dependency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
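The change would look roughly like this in hadoop-hdfs's pom.xml (a sketch; the version is assumed to be managed by the parent hadoop-project POM so it matches what hadoop-common already uses):

{code:xml}
<!-- Sketch for hadoop-hdfs/pom.xml: test-only dependency on lz4-java.
     Version assumed to come from dependencyManagement in hadoop-project. -->
<dependency>
  <groupId>org.lz4</groupId>
  <artifactId>lz4-java</artifactId>
  <scope>test</scope>
</dependency>
{code}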
[jira] [Updated] (HDFS-15559) Complement initialize member variables in TestHdfsConfigFields#initializeMemberVariables
[ https://issues.apache.org/jira/browse/HDFS-15559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15559: -- Component/s: test Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Complement initialize member variables in > TestHdfsConfigFields#initializeMemberVariables > > > Key: HDFS-15559 > URL: https://issues.apache.org/jira/browse/HDFS-15559 > Project: Hadoop HDFS > Issue Type: Test > Components: test >Affects Versions: 3.4.0 >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > Fix For: 3.4.0 > > Attachments: HDFS-15559.001.patch, HDFS-15559.002.patch > > > There are some missing constant interfaces in > TestHdfsConfigFields#initializeMemberVariables > {code:java} > @Override > public void initializeMemberVariables() { > xmlFilename = new String("hdfs-default.xml"); > configurationClasses = new Class[] { HdfsClientConfigKeys.class, > HdfsClientConfigKeys.Failover.class, > HdfsClientConfigKeys.StripedRead.class, DFSConfigKeys.class, > HdfsClientConfigKeys.BlockWrite.class, > HdfsClientConfigKeys.BlockWrite.ReplaceDatanodeOnFailure.class }; > }{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16550) [SBN read] Improper cache-size for journal node may cause cluster crash
[ https://issues.apache.org/jira/browse/HDFS-16550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16550: -- Component/s: journal-node Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > [SBN read] Improper cache-size for journal node may cause cluster crash > --- > > Key: HDFS-16550 > URL: https://issues.apache.org/jira/browse/HDFS-16550 > Project: Hadoop HDFS > Issue Type: Bug > Components: journal-node >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: image-2022-04-21-09-54-29-751.png, > image-2022-04-21-09-54-57-111.png, image-2022-04-21-12-32-56-170.png > > Time Spent: 1h > Remaining Estimate: 0h > > When we introduced {*}SBN Read{*}, we encountered a situation while upgrading > the JournalNodes. > Cluster Info: > *Active: nn0* > *Standby: nn1* > 1. Rolling restart journal node. {color:#ff}(related config: > fs.journalnode.edit-cache-size.bytes=1G, -Xms1G, -Xmx=1G){color} > 2. The cluster runs for a while, edits cache usage is increasing and memory > is used up. > 3. {color:#ff}Active namenode(nn0){color} shutdown because of “{_}Timed > out waiting 12ms for a quorum of nodes to respond”{_}. > 4. Transfer nn1 to Active state. > 5. {color:#ff}New Active namenode(nn1){color} also shutdown because of > “{_}Timed out waiting 12ms for a quorum of nodes to respond” too{_}. > 6. {color:#ff}The cluster crashed{color}. > > Related code: > {code:java} > JournaledEditsCache(Configuration conf) { > capacity = conf.getInt(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_KEY, > DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_DEFAULT); > if (capacity > 0.9 * Runtime.getRuntime().maxMemory()) { > Journal.LOG.warn(String.format("Cache capacity is set at %d bytes but " + > "maximum JVM memory is only %d bytes. It is recommended that you " + > "decrease the cache size or increase the heap size.", > capacity, Runtime.getRuntime().maxMemory())); > } > Journal.LOG.info("Enabling the journaled edits cache with a capacity " + > "of bytes: " + capacity); > ReadWriteLock lock = new ReentrantReadWriteLock(true); > readLock = new AutoCloseableLock(lock.readLock()); > writeLock = new AutoCloseableLock(lock.writeLock()); > initialize(INVALID_TXN_ID); > } {code} > Currently, *fs.journalnode.edit-cache-size.bytes* can be set to a larger size > than the memory requested by the process. If > {*}fs.journalnode.edit-cache-size.bytes > 0.9 * > Runtime.getRuntime().maxMemory(){*}, only warn logs are printed during > journalnode startup. This can easily be overlooked by users. However, after the > cluster has been running for some time, it is likely to cause the cluster > to crash. > > NN log: > !image-2022-04-21-09-54-57-111.png|width=1012,height=47! > !image-2022-04-21-12-32-56-170.png|width=809,height=218! > IMO, we should not set the {{cache size}} to a fixed value, but to a ratio > of the maximum memory, 0.2 by default. > This avoids the problem of an oversized cache. In addition, users can > actively adjust the heap size when they need to increase the cache size. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
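A minimal sketch of the ratio-based sizing proposed above (the fraction key name and default are assumptions for illustration, not the committed config):

{code:java}
// Sketch: derive the edits-cache capacity from the actual heap instead of
// a fixed byte count. The fraction key and 0.2 default are assumptions.
static final String EDIT_CACHE_FRACTION_KEY =
    "dfs.journalnode.edit-cache-size.fraction";   // hypothetical key
static final float EDIT_CACHE_FRACTION_DEFAULT = 0.2f;

long computeCapacity(Configuration conf) {
  float fraction = conf.getFloat(EDIT_CACHE_FRACTION_KEY,
      EDIT_CACHE_FRACTION_DEFAULT);
  // The cache can now never be configured larger than the JVM heap, and it
  // grows automatically when the operator raises -Xmx.
  return (long) (fraction * Runtime.getRuntime().maxMemory());
}
{code}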
[jira] [Updated] (HDFS-16547) [SBN read] Namenode in safe mode should not be transferred to observer state
[ https://issues.apache.org/jira/browse/HDFS-16547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16547: -- Component/s: namanode Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > [SBN read] Namenode in safe mode should not be transferred to observer state > --- > > Key: HDFS-16547 > URL: https://issues.apache.org/jira/browse/HDFS-16547 > Project: Hadoop HDFS > Issue Type: Bug > Components: namanode >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Currently, when a Namenode is in safemode (still starting up, or having entered > safemode manually), we can transfer this Namenode to Observer by command. This > Observer node may receive many requests and then throw a SafemodeException, > which causes unnecessary failovers on the client. > So a Namenode in safe mode should not be transferred to observer state. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
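A hedged sketch of the guard (where the check sits and the exception type are assumptions, not the committed patch):

{code:java}
// Hypothetical guard on the transition path: refuse to become Observer
// while in safe mode, since such an Observer would only feed clients
// SafeModeExceptions and trigger needless client failovers.
void checkTransitionToObserver() throws ServiceFailedException {
  if (namesystem.isInSafeMode()) {
    throw new ServiceFailedException(
        "Cannot transition to Observer while in safe mode");
  }
}
{code}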
[jira] [Updated] (HDFS-16593) Correct inaccurate BlocksRemoved metric on DataNode side
[ https://issues.apache.org/jira/browse/HDFS-16593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16593: -- Component/s: datanode metrics Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Correct inaccurate BlocksRemoved metric on DataNode side > > > Key: HDFS-16593 > URL: https://issues.apache.org/jira/browse/HDFS-16593 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, metrics >Affects Versions: 3.4.0 >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > When tracing the root cause of a production issue, I found that the > BlocksRemoved metric on the Datanode side was inaccurate. > {code:java} > case DatanodeProtocol.DNA_INVALIDATE: > // > // Some local block(s) are obsolete and can be > // safely garbage-collected. > // > Block toDelete[] = bcmd.getBlocks(); > try { > // using global fsdataset > dn.getFSDataset().invalidate(bcmd.getBlockPoolId(), toDelete); > } catch(IOException e) { > // Exceptions caught here are not expected to be disk-related. > throw e; > } > dn.metrics.incrBlocksRemoved(toDelete.length); > break; > {code} > This is because even if the invalidate method throws an exception, some blocks may > have been successfully deleted internally. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
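One hedged way to make the metric accurate (a sketch only; the committed patch may take a different route, e.g. having invalidate report the removed count) is to count per-block successes:

{code:java}
// Sketch: increment BlocksRemoved per successfully invalidated block
// instead of assuming the whole batch succeeded. Illustrative only; note
// that the original code rethrew the IOException for the whole batch.
int removed = 0;
for (Block b : toDelete) {
  try {
    dn.getFSDataset().invalidate(bcmd.getBlockPoolId(), new Block[] { b });
    removed++;
  } catch (IOException e) {
    LOG.warn("Failed to invalidate " + b, e);
  }
}
dn.metrics.incrBlocksRemoved(removed);
{code}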
[jira] [Updated] (HDFS-16659) JournalNode should throw NewerTxnIdException if SinceTxId is bigger than HighestWrittenTxId
[ https://issues.apache.org/jira/browse/HDFS-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16659: -- Component/s: journal-node Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > JournalNode should throw NewerTxnIdException if SinceTxId is bigger than > HighestWrittenTxId > --- > > Key: HDFS-16659 > URL: https://issues.apache.org/jira/browse/HDFS-16659 > Project: Hadoop HDFS > Issue Type: Bug > Components: journal-node >Affects Versions: 3.4.0 >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > The JournalNode should throw `NewerTxnIdException` if `sinceTxId` is bigger than > `highestWrittenTxId` while handling the `getJournaledEdits` RPC from NNs. > The current logic may mean the in-progress EditlogTailer cannot replay any Edits > from the JournalNodes in some corner cases, resulting in an ObserverNameNode that cannot > handle requests from clients. > Suppose there are 3 journalNodes, JN0 ~ JN2. > * JN0 has some abnormal cases when the Active Namenode is syncing 10 Edits with > first txid 11 > * The NameNode just ignores the abnormal JN0 and continues to sync Edits to JN1 > and JN2 > * JN0 comes back to health > * The NameNode continues to sync 10 Edits with first txid 21. > * At this point, there are no Edits 11 ~ 30 in the cache of JN0 > * The Observer NameNode tries to select an EditLogInputStream through > `getJournaledEdits` with sinceTxId 21 > * JN2 has some abnormal cases that cause a slow response > The expected result is: the response should contain 10 Edits from txId 21 to txId > 30 from JN1 and JN2, because the Active NameNode successfully wrote these Edits > to JN1 and JN2 and failed to write the earlier ones to JN0. > But in the current implementation, the response is [Response(0) from JN0, > Response(10) from JN1], because the abnormal cases in JN2, such > as GC or a bad network, cause a slow response. So `maxAllowedTxns` will be > 0, and the NameNode will not replay any Edits. > As above, the root cause is that the JournalNode should throw NewerTxnIdException > when `sinceTxId` is more than `highestWrittenTxId`. > The buggy code is below: > {code:java} > if (sinceTxId > getHighestWrittenTxId()) { > // Requested edits that don't exist yet; short-circuit the cache here > metrics.rpcEmptyResponses.incr(); > return > GetJournaledEditsResponseProto.newBuilder().setTxnCount(0).build(); > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
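A sketch of the fix direction implied by the title (the exception's constructor shown here is an assumption):

{code:java}
// Sketch: surface "requester is ahead of this JN" as an error instead of
// Response(0), so one lagging JN cannot cap maxAllowedTxns at 0.
if (sinceTxId > getHighestWrittenTxId()) {
  throw new NewerTxnIdException(
      "Highest written txId " + getHighestWrittenTxId()
          + " is smaller than requested sinceTxId " + sinceTxId);
}
{code}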
[jira] [Updated] (HDFS-16623) IllegalArgumentException in LifelineSender
[ https://issues.apache.org/jira/browse/HDFS-16623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16623: -- Component/s: datanode Target Version/s: 3.3.5, 3.2.4, 3.4.0 Affects Version/s: 3.3.5 3.2.4 3.4.0 > IllegalArgumentException in LifelineSender > -- > > Key: HDFS-16623 > URL: https://issues.apache.org/jira/browse/HDFS-16623 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.4.0, 3.2.4, 3.3.5 >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.5 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > In our production environment, an IllegalArgumentException occurred in the > LifelineSender at one DataNode that was undergoing GC at the time. > The buggy code is at line 1060 in BPServiceActor.java: the computed sleep > time can be negative. > {code:java} > while (shouldRun()) { > try { > if (lifelineNamenode == null) { > lifelineNamenode = dn.connectToLifelineNN(lifelineNnAddr); > } > sendLifelineIfDue(); > Thread.sleep(scheduler.getLifelineWaitTime()); > } catch (InterruptedException e) { > Thread.currentThread().interrupt(); > } catch (IOException e) { > LOG.warn("IOException in LifelineSender for " + BPServiceActor.this, > e); > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
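A minimal hedged fix is to avoid handing a negative duration to Thread.sleep (the committed patch may instead correct the scheduler's arithmetic):

{code:java}
// Sketch: Thread.sleep throws IllegalArgumentException for negative
// input, so clamp the computed wait. The real fix may differ.
long waitTime = scheduler.getLifelineWaitTime();
if (waitTime > 0) {
  Thread.sleep(waitTime);
}
{code}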
[jira] [Updated] (HDFS-16583) DatanodeAdminDefaultMonitor can get stuck in an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-16583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16583: -- Component/s: datanode Hadoop Flags: Reviewed Target Version/s: 3.3.5, 3.2.4, 3.4.0 Affects Version/s: 3.3.5 3.2.4 3.4.0 > DatanodeAdminDefaultMonitor can get stuck in an infinite loop > - > > Key: HDFS-16583 > URL: https://issues.apache.org/jira/browse/HDFS-16583 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.4.0, 3.2.4, 3.3.5 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.5 > > Time Spent: 2h > Remaining Estimate: 0h > > We encountered a case where the decommission monitor in the namenode got > stuck for about 6 hours. The logs give: > {code} > 2022-05-15 01:09:25,490 INFO > org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager: Stopping > maintenance of dead node 10.185.3.132:50010 > 2022-05-15 01:10:20,918 INFO org.apache.hadoop.http.HttpServer2: Process > Thread Dump: jsp requested > > 2022-05-15 01:19:06,810 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: > PendingReconstructionMonitor timed out blk_4501753665_3428271426 > 2022-05-15 01:19:06,810 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: > PendingReconstructionMonitor timed out blk_4501753659_3428271420 > 2022-05-15 01:19:06,810 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: > PendingReconstructionMonitor timed out blk_4501753662_3428271423 > 2022-05-15 01:19:06,810 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: > PendingReconstructionMonitor timed out blk_4501753663_3428271424 > 2022-05-15 06:00:57,281 INFO > org.apache.hadoop.hdfs.server.blockmanagement.HeartbeatManager: Stopping > maintenance of dead node 10.185.3.34:50010 > 2022-05-15 06:00:58,105 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock > held for 17492614 ms via > java.lang.Thread.getStackTrace(Thread.java:1559) > org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032) > org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:263) > org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:220) > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1601) > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.run(DatanodeAdminManager.java:496) > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > java.lang.Thread.run(Thread.java:748) > Number of suppressed write-lock reports: 0 > Longest write-lock held interval: 17492614 > {code} > We only have the one thread dump triggered by the FC: > {code} > Thread 80 (DatanodeAdminMonitor-0): > State: RUNNABLE > Blocked count: 16 > Waited count: 453693 > Stack: > > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.check(DatanodeAdminManager.java:538) > >
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.run(DatanodeAdminManager.java:494) > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > java.lang.Thread.run(Thread.java:748) > {code} > This was the line of code: > {code} > private void check() { > final Iterator<Map.Entry<DatanodeDescriptor, AbstractList<BlockInfo>>> > it = new CyclicIteration<>(outOfServiceNodeBlocks, > iterkey).iterator(); > final LinkedList<DatanodeDescriptor> toRemove = new LinkedList<>(); > while (it.hasNext() && !exceededNumBlocksPerCheck() && namesystem > .isRunning()) { > numNodesChecked++; > final Ma
[jira] [Updated] (HDFS-15225) RBF: Add snapshot counts to content summary in router
[ https://issues.apache.org/jira/browse/HDFS-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15225: -- Component/s: rbf Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > RBF: Add snapshot counts to content summary in router > - > > Key: HDFS-15225 > URL: https://issues.apache.org/jira/browse/HDFS-15225 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Affects Versions: 3.4.0 >Reporter: Quan Li >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16572) Fix typo in readme of hadoop-project-dist
[ https://issues.apache.org/jira/browse/HDFS-16572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16572: -- Component/s: documentation Hadoop Flags: Reviewed Target Version/s: 3.4.0 > Fix typo in readme of hadoop-project-dist > - > > Key: HDFS-16572 > URL: https://issues.apache.org/jira/browse/HDFS-16572 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 3.4.0 >Reporter: Gautham Banasandra >Assignee: Gautham Banasandra >Priority: Trivial > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Change *not* to *no*. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16552) Fix NPE for TestBlockManager
[ https://issues.apache.org/jira/browse/HDFS-16552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16552: -- Component/s: test Hadoop Flags: Reviewed Target Version/s: 3.3.5, 3.2.4, 3.4.0 Affects Version/s: 3.3.5 3.2.4 3.4.0 > Fix NPE for TestBlockManager > > > Key: HDFS-16552 > URL: https://issues.apache.org/jira/browse/HDFS-16552 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.4.0, 3.2.4, 3.3.5 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.5 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > There is an NPE in BlockManager when running > TestBlockManager#testSkipReconstructionWithManyBusyNodes2, because > NameNodeMetrics is not initialized in this unit test. > > For the related CI link, see > [this|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4209/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt]. > {code:java} > [ERROR] Tests run: 34, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 30.088 s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager > [ERROR] > testSkipReconstructionWithManyBusyNodes2(org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager) > Time elapsed: 2.783 s <<< ERROR! > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.scheduleReconstruction(BlockManager.java:2171) > at > org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager.testSkipReconstructionWithManyBusyNodes2(TestBlockManager.java:947) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at >
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
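A hedged sketch of the kind of setup that avoids the NPE: register the NameNode metrics before exercising BlockManager (treat the exact initialization entry point as an assumption, not the committed patch):

{code:java}
// Sketch: initialize NameNodeMetrics in test setup so that
// BlockManager#scheduleReconstruction can update metrics without an NPE.
// Treat the exact initialization entry point as an assumption.
@Before
public void initNameNodeMetrics() {
  Configuration conf = new HdfsConfiguration();
  NameNode.initMetrics(conf, HdfsServerConstants.NamenodeRole.NAMENODE);
}
{code}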
[jira] [Updated] (HDFS-16507) [SBN read] Avoid purging edit log which is in progress
[ https://issues.apache.org/jira/browse/HDFS-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16507: -- Component/s: namanode Target Version/s: 3.3.3, 3.4.0 (was: 3.3.3) > [SBN read] Avoid purging edit log which is in progress > -- > > Key: HDFS-16507 > URL: https://issues.apache.org/jira/browse/HDFS-16507 > Project: Hadoop HDFS > Issue Type: Bug > Components: namanode >Affects Versions: 3.1.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.3 > > Time Spent: 4h 50m > Remaining Estimate: 0h > > We introduced the [Standby Read] feature in branch-3.1.0, but found a FATAL > exception. It looks like it is purging edit logs which are in progress. > According to the analysis, I suspect that the in-progress editlog to be > purged (after the SNN checkpoint) is not finalized (see HDFS-14317) before the ANN > rolls edits itself. > The stack: > {code:java} > java.lang.Thread.getStackTrace(Thread.java:1552) > org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032) > > org.apache.hadoop.hdfs.server.namenode.FileJournalManager.purgeLogsOlderThan(FileJournalManager.java:185) > > org.apache.hadoop.hdfs.server.namenode.JournalSet$5.apply(JournalSet.java:623) > > org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:388) > > org.apache.hadoop.hdfs.server.namenode.JournalSet.purgeLogsOlderThan(JournalSet.java:620) > > org.apache.hadoop.hdfs.server.namenode.FSEditLog.purgeLogsOlderThan(FSEditLog.java:1512) > org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:177) > > org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:1249) > > org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:617) > > org.apache.hadoop.hdfs.server.namenode.ImageServlet$2.run(ImageServlet.java:516) > java.security.AccessController.doPrivileged(Native Method) > javax.security.auth.Subject.doAs(Subject.java:422) > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) > > org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:515) > javax.servlet.http.HttpServlet.service(HttpServlet.java:710) > javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772) > > org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1604) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) > > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) > > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > >
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) > > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > org.eclipse.jetty.server.Server.handle(Server.java:539) > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:333) > > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) > > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceC
[jira] [Updated] (HDFS-16498) Fix NPE for checkBlockReportLease
[ https://issues.apache.org/jira/browse/HDFS-16498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16498: -- Component/s: datanode namanode Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Fix NPE for checkBlockReportLease > - > > Key: HDFS-16498 > URL: https://issues.apache.org/jira/browse/HDFS-16498 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namanode >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: image-2022-03-09-20-35-22-028.png, screenshot-1.png > > Time Spent: 5.5h > Remaining Estimate: 0h > > During a restart of the Namenode, a Datanode that is not yet registered may > trigger a full block report (FBR), which causes an NPE. > !image-2022-03-09-20-35-22-028.png|width=871,height=158! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
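A hedged sketch of the guard (the exception type and the surrounding code are assumptions about where such a check would sit):

{code:java}
// Sketch: checkBlockReportLease should tolerate a datanode that has not
// (re-)registered yet instead of dereferencing a null descriptor.
DatanodeDescriptor node = datanodeManager.getDatanode(nodeID);
if (node == null) {
  // Reject the FBR cleanly so the datanode re-registers first.
  throw new UnregisteredNodeException(nodeID, null);
}
// ... proceed with the normal lease check (elided) ...
{code}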
[jira] [Updated] (HDFS-16503) Should verify whether the path name is valid in the WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-16503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16503: -- Component/s: webhdfs Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Should verify whether the path name is valid in the WebHDFS > --- > > Key: HDFS-16503 > URL: https://issues.apache.org/jira/browse/HDFS-16503 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 3.4.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: image-2022-03-14-09-35-49-860.png > > Time Spent: 2h > Remaining Estimate: 0h > > When creating a file using WebHDFS, there are two main steps: > 1. Obtain the location of the Datanode to write to. > 2. Put the file to this location. > Currently *NameNodeRpcServer* verifies that the pathName is valid, but > *NamenodeWebHdfsMethods* and *RouterWebHdfsMethods* do not. > So if we use an invalid path (such as a duplicated slash), the first step > returns success, but the second step throws an {*}InvalidPathException{*}. > IMO, we should also do the validation in WebHdfs, which is consistent with > the NameNodeRpcServer. > !image-2022-03-14-09-35-49-860.png|width=548,height=164! > The affected WebHDFS operations are: CREATE, APPEND, OPEN, GETFILECHECKSUM. So we > can add DFSUtil.isValidName to redirectURI for *NamenodeWebHdfsMethods* and > *RouterWebHdfsMethods.* -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
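A minimal sketch of the check described above, placed before redirect-URI construction (the exact placement inside NamenodeWebHdfsMethods/RouterWebHdfsMethods is assumed):

{code:java}
// Sketch: validate the path before choosing a datanode and building the
// redirect URI, mirroring what NameNodeRpcServer already does.
if (!DFSUtil.isValidName(path)) {
  throw new InvalidPathException("Invalid path name: " + path);
}
// ... choose datanode, build and return the redirect URI ...
{code}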
[jira] [Updated] (HDFS-16406) DataNode metric ReadsFromLocalClient does not count short-circuit reads
[ https://issues.apache.org/jira/browse/HDFS-16406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16406: -- Component/s: datanode metrics Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > DataNode metric ReadsFromLocalClient does not count short-circuit reads > --- > > Key: HDFS-16406 > URL: https://issues.apache.org/jira/browse/HDFS-16406 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, metrics >Affects Versions: 3.4.0 >Reporter: secfree >Assignee: secfree >Priority: Minor > Labels: metrics, pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h > Remaining Estimate: 0h > > The following test case failed. > {code} > @Test > public void testNodeLocalMetrics() throws Exception { > Assume.assumeTrue(null == DomainSocket.getLoadingFailureReason()); > Configuration conf = new HdfsConfiguration(); > conf.setBoolean(HdfsClientConfigKeys.Read.ShortCircuit.KEY, true); > TemporarySocketDirectory sockDir = new TemporarySocketDirectory(); > DomainSocket.disableBindPathValidation(); > conf.set(DFSConfigKeys.DFS_DOMAIN_SOCKET_PATH_KEY, > new File(sockDir.getDir(), > "testNodeLocalMetrics._PORT.sock").getAbsolutePath()); > MiniDFSCluster cluster = new > MiniDFSCluster.Builder(conf).numDataNodes(1).build(); > try { > cluster.waitActive(); > FileSystem fs = cluster.getFileSystem(); > Path testFile = new Path("/testNodeLocalMetrics.txt"); > long file_len = 10; > DFSTestUtil.createFile(fs, testFile, file_len, (short)1, 1L); > DFSTestUtil.readFile(fs, testFile); > List<DataNode> datanodes = cluster.getDataNodes(); > assertEquals(datanodes.size(), 1); > DataNode datanode = datanodes.get(0); > MetricsRecordBuilder rb = getMetrics(datanode.getMetrics().name()); > // Write related metrics > assertCounter("WritesFromLocalClient", 1L, rb); > // Read related metrics > assertCounter("ReadsFromLocalClient", 1L, rb); // failed here > } finally { > if (cluster != null) { > cluster.shutdown(); > } > } > } > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
[ https://issues.apache.org/jira/browse/HDFS-16303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16303: -- Component/s: block placement datanode Target Version/s: 3.3.5, 3.2.4, 3.4.0 > Losing over 100 datanodes in state decommissioning results in full blockage > of all datanode decommissioning > --- > > Key: HDFS-16303 > URL: https://issues.apache.org/jira/browse/HDFS-16303 > Project: Hadoop HDFS > Issue Type: Bug > Components: block placement, datanode >Affects Versions: 2.10.1, 3.3.1 >Reporter: Kevin Wikant >Assignee: Kevin Wikant >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.5 > > Time Spent: 17h 50m > Remaining Estimate: 0h > > h2. Impact > HDFS datanode decommissioning does not make any forward progress. For > example, the user adds X datanodes to the "dfs.hosts.exclude" file and all X > of those datanodes remain in state decommissioning forever without making any > forward progress towards being decommissioned. > h2. Root Cause > The HDFS Namenode class "DatanodeAdminManager" is responsible for > decommissioning datanodes. > As per this "hdfs-site" configuration: > {quote}Config = dfs.namenode.decommission.max.concurrent.tracked.nodes > Default Value = 100 > The maximum number of decommission-in-progress datanodes that will be > tracked at one time by the namenode. Tracking a decommission-in-progress > datanode consumes additional NN memory proportional to the number of blocks > on the datanode. Having a conservative limit reduces the potential impact of > decommissioning a large number of nodes at once. A value of 0 means no limit > will be enforced. > {quote} > The Namenode will only actively track up to 100 datanodes for decommissioning > at any given time, so as to avoid Namenode memory pressure. > Looking into the "DatanodeAdminManager" code: > * a datanode is only removed from the "tracked.nodes" set when it > finishes decommissioning > * a datanode is only added to the "tracked.nodes" set if there are fewer > than 100 datanodes being tracked > So in the event that there are more than 100 datanodes being decommissioned > at a given time, some of those datanodes will not be in the "tracked.nodes" > set until 1 or more datanodes in "tracked.nodes" finish > decommissioning. This is generally not a problem because the datanodes in > "tracked.nodes" will eventually finish decommissioning, but there is an edge > case where this logic prevents the namenode from making any forward progress > towards decommissioning. > If all 100 datanodes in "tracked.nodes" are unable to finish > decommissioning, then other datanodes (which may be able to be > decommissioned) will never get added to "tracked.nodes" and therefore will > never get the opportunity to be decommissioned. > This can occur due to the following issue: > {quote}2021-10-21 12:39:24,048 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager > (DatanodeAdminMonitor-0): Node W.X.Y.Z:50010 is dead while in Decommission In > Progress. Cannot be safely decommissioned or be in maintenance since there is > risk of reduced data durability or data loss. Either restart the failed node > or force decommissioning or maintenance by removing, calling refreshNodes, > then re-adding to the excludes or host config files. > {quote} > If a Datanode is lost while decommissioning (for example if the underlying > hardware fails or is lost), then it will remain in state decommissioning > forever.
> If 100 or more Datanodes are lost while decommissioning over the Hadoop > cluster lifetime, then this is enough to completely fill up the > "tracked.nodes" set. With the entire "tracked.nodes" set filled with > datanodes that can never finish decommissioning, any datanodes added after > this point will never be able to be decommissioned because they will never be > added to the "tracked.nodes" set. > In this scenario: > * the "tracked.nodes" set is filled with datanodes which are lost & cannot > be recovered (and can never finish decommissioning so they will never be > removed from the set) > * the actual live datanodes being decommissioned are enqueued waiting to > enter the "tracked.nodes" set (and are stuck waiting indefinitely) > This means that no progress towards decommissioning the live datanodes will > be made unless the user takes the following action: > {quote}Either restart the failed node or force decommissioning or maintenance > by removing, calling refreshNodes, then re-adding to the excludes or host > config files. > {quote} > Ideally, the Namenode should be able to gracefully handle scenarios where th
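To make the starvation mechanism concrete, here is a simplified illustration of the admission logic described above (all names are invented for this sketch, not the exact DatanodeAdminManager code):

{code:java}
// Toy model: nodes leave trackedNodes only when decommissioning finishes,
// so dead-but-decommissioning nodes occupy slots forever and pendingNodes
// starves once trackedNodes is full.
void admitPendingNodes() {
  while (!pendingNodes.isEmpty()
      && (maxConcurrentTrackedNodes == 0
          || trackedNodes.size() < maxConcurrentTrackedNodes)) {
    trackedNodes.add(pendingNodes.poll());
  }
}
{code}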