[jira] [Updated] (HDFS-15555) RBF: Refresh cacheNS when SocketException occurs
[ https://issues.apache.org/jira/browse/HDFS-15555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15555: -- Hadoop Flags: Reviewed > RBF: Refresh cacheNS when SocketException occurs > > > Key: HDFS-15555 > URL: https://issues.apache.org/jira/browse/HDFS-15555 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.3.1, 3.4.0 > Environment: HDFS 3.3.0, Java 11 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Problem: > When the active NameNode is restarted and is loading the fsimage, DFSRouters > significantly slow down. > Investigation: > When the active NameNode is restarted and is loading the fsimage, RouterRpcClient > receives SocketException. Since > RouterRpcClient#isUnavailableException(IOException) returns false when the > argument is SocketException, the MembershipNameNodeResolver#cacheNS is not > refreshed. That's why the order of the NameNodes returned by > MembershipNameNodeResolver#getNamenodesForNameserviceId(String) is unchanged > and the active NameNode is still returned first. Therefore, RouterRpcClient > still tries to connect to the NameNode that is loading the fsimage. > After loading the fsimage, the NameNode throws StandbyException. That > exception is one of the 'Unavailable Exceptions', so the cacheNS is refreshed. > Workaround: > Stop the NameNode and wait 1 minute before starting it again, instead of > restarting it directly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
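The crux of the fix is to treat SocketException as an "unavailable" exception so the resolver cache gets refreshed. A minimal sketch of that idea (not the actual RouterRpcClient code; the real method checks additional exception types):
{code:java}
import java.io.EOFException;
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketException;
import org.apache.hadoop.ipc.StandbyException;

public class UnavailableExceptionCheck {
  // Simplified stand-in for RouterRpcClient#isUnavailableException(IOException):
  // when this returns true, the router refreshes cacheNS and re-orders the
  // NameNodes returned for the nameservice.
  static boolean isUnavailableException(IOException ioe) {
    return ioe instanceof ConnectException
        || ioe instanceof EOFException
        || ioe instanceof StandbyException
        // Proposed addition: a NameNode that is restarting and still loading the
        // fsimage surfaces as a SocketException, so count it as unavailable too.
        || ioe instanceof SocketException;
  }
}
{code}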
[jira] [Updated] (HDFS-15555) RBF: Refresh cacheNS when SocketException occurs
[ https://issues.apache.org/jira/browse/HDFS-15555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15555: -- Affects Version/s: 3.3.1 3.4.0 > RBF: Refresh cacheNS when SocketException occurs > > > Key: HDFS-15555 > URL: https://issues.apache.org/jira/browse/HDFS-15555 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.3.1, 3.4.0 > Environment: HDFS 3.3.0, Java 11 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > Problem: > When the active NameNode is restarted and is loading the fsimage, DFSRouters > significantly slow down. > Investigation: > When the active NameNode is restarted and is loading the fsimage, RouterRpcClient > receives SocketException. Since > RouterRpcClient#isUnavailableException(IOException) returns false when the > argument is SocketException, the MembershipNameNodeResolver#cacheNS is not > refreshed. That's why the order of the NameNodes returned by > MembershipNameNodeResolver#getNamenodesForNameserviceId(String) is unchanged > and the active NameNode is still returned first. Therefore, RouterRpcClient > still tries to connect to the NameNode that is loading the fsimage. > After loading the fsimage, the NameNode throws StandbyException. That > exception is one of the 'Unavailable Exceptions', so the cacheNS is refreshed. > Workaround: > Stop the NameNode and wait 1 minute before starting it again, instead of > restarting it directly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15574) Remove unnecessary sort of block list in DirectoryScanner
[ https://issues.apache.org/jira/browse/HDFS-15574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15574: -- Component/s: datanode > Remove unnecessary sort of block list in DirectoryScanner > - > > Key: HDFS-15574 > URL: https://issues.apache.org/jira/browse/HDFS-15574 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.4.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15574.001.patch, HDFS-15574.002.patch, > HDFS-15574.003.patch, HDFS-15574.branch-3.2.001.patch, > HDFS-15574.branch-3.2.002.patch, HDFS-15574.branch-3.3.001.patch, > HDFS-15574.branch-3.3.002.patch > > > These lines of code in DirectoryScanner#scan() obtain a snapshot of the > finalized blocks from memory and then sort them, under the DN lock. However, > the blocks are stored in a sorted structure (FoldedTreeSet) and hence the > sort should be unnecessary. > {code} > final List bl = dataset.getFinalizedBlocks(bpid); > Collections.sort(bl); // Sort based on blockId > {code} > This Jira removes the sort and renames getFinalizedBlocks to > getSortedFinalizedBlocks to make the intent of the method clearer. > Also added a test, just in case the underlying block structure is ever > changed to something unsorted. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15574) Remove unnecessary sort of block list in DirectoryScanner
[ https://issues.apache.org/jira/browse/HDFS-15574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15574: -- Hadoop Flags: Reviewed > Remove unnecessary sort of block list in DirectoryScanner > - > > Key: HDFS-15574 > URL: https://issues.apache.org/jira/browse/HDFS-15574 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.4.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15574.001.patch, HDFS-15574.002.patch, > HDFS-15574.003.patch, HDFS-15574.branch-3.2.001.patch, > HDFS-15574.branch-3.2.002.patch, HDFS-15574.branch-3.3.001.patch, > HDFS-15574.branch-3.3.002.patch > > > These lines of code in DirectoryScanner#scan() obtain a snapshot of the > finalized blocks from memory and then sort them, under the DN lock. However, > the blocks are stored in a sorted structure (FoldedTreeSet) and hence the > sort should be unnecessary. > {code} > final List bl = dataset.getFinalizedBlocks(bpid); > Collections.sort(bl); // Sort based on blockId > {code} > This Jira removes the sort and renames getFinalizedBlocks to > getSortedFinalizedBlocks to make the intent of the method clearer. > Also added a test, just in case the underlying block structure is ever > changed to something unsorted. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
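A rough sketch of the kind of check the new test can make, assuming getSortedFinalizedBlocks() is expected to return replicas ordered by block ID (the helper below is illustrative, not the test from the patch):
{code:java}
import java.util.List;
import org.apache.hadoop.hdfs.protocol.Block;
import static org.junit.Assert.assertTrue;

public class SortedBlockListCheck {
  // Fails if the list is not ordered by blockId, i.e. if the underlying block
  // structure is ever changed to something unsorted.
  static void assertSortedByBlockId(List<? extends Block> blocks) {
    for (int i = 1; i < blocks.size(); i++) {
      assertTrue("Block list is not sorted by blockId",
          blocks.get(i - 1).getBlockId() <= blocks.get(i).getBlockId());
    }
  }
}
{code}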
[jira] [Updated] (HDFS-15573) Only log warning if considerLoad and considerStorageType are both true
[ https://issues.apache.org/jira/browse/HDFS-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15573: -- Component/s: hdfs > Only log warning if considerLoad and considerStorageType are both true > -- > > Key: HDFS-15573 > URL: https://issues.apache.org/jira/browse/HDFS-15573 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.3.1, 3.4.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15573.001.patch > > > When we implemented HDFS-15255, we added a log message to warn if both > dfs.namenode.read.considerLoad and dfs.namenode.read.considerStorageType were > set to true, as they cannot be used together. > Somehow, we failed to wrap the log message in an IF statement, so it is > always printed, even when the two settings are not both enabled. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
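The fix amounts to guarding the warning with the check it describes. A minimal sketch, assuming a Configuration named conf and an slf4j Logger named LOG; the defaults and message text are illustrative:
{code:java}
boolean considerLoad =
    conf.getBoolean("dfs.namenode.read.considerLoad", false);
boolean considerStorageType =
    conf.getBoolean("dfs.namenode.read.considerStorageType", false);
// Only warn when the two settings actually conflict.
if (considerLoad && considerStorageType) {
  LOG.warn("dfs.namenode.read.considerLoad and"
      + " dfs.namenode.read.considerStorageType cannot both be enabled;"
      + " they will not be combined when sorting block locations.");
}
{code}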
[jira] [Updated] (HDFS-15607) Create trash dir when allowing snapshottable dir
[ https://issues.apache.org/jira/browse/HDFS-15607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15607: -- Hadoop Flags: Reviewed > Create trash dir when allowing snapshottable dir > > > Key: HDFS-15607 > URL: https://issues.apache.org/jira/browse/HDFS-15607 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Affects Versions: 3.4.0 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > In {{TrashPolicyDefault}}, the {{.Trash}} directory will be created with > permission 700 (and without the sticky bit) by the first user that moves a file > to the trash. This is an issue when other users try to move files to that > trash, because they may not have permission to write to that trash if the > trash root is shared -- in this case, snapshottable directories. > This only affects users when trash is enabled inside snapshottable > directories ({{dfs.namenode.snapshot.trashroot.enabled}} set to true), and > when a user performing move-to-trash operations doesn't have admin > permissions. > Solution: Create a {{.Trash}} directory with 777 permission and the sticky bit > enabled (similar to the solution in HDFS-10324). > Also need to deal with some corner cases: > 1. Even when the snapshottable directory trash root config is not enabled > ({{dfs.namenode.snapshot.trashroot.enabled}} set to false), create the > {{.Trash}} directory anyway? Or should we ask the admin to provision trash > manually after enabling {{dfs.namenode.snapshot.trashroot.enabled}} on an > existing cluster? > - If the cluster is just upgraded, we need to provision trash manually anyway. > 2. When immediately disallowing snapshot, it shouldn't fail. Just remove the > .Trash directory when disallowing snapshot on a dir if it is empty? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
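A minimal sketch of the proposed provisioning step, assuming a FileSystem handle and the snapshottable directory path (class and method names are illustrative, not the patch itself):
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class SnapshotTrashProvisioner {
  // Create <snapshottableDir>/.Trash with permission 1777 so any user can move
  // files into it, while the sticky bit keeps users from deleting each other's entries.
  static void provisionTrash(FileSystem fs, Path snapshottableDir) throws IOException {
    Path trashRoot = new Path(snapshottableDir, ".Trash");
    FsPermission perm = new FsPermission((short) 01777);
    if (!fs.exists(trashRoot)) {
      fs.mkdirs(trashRoot);
      // mkdirs applies the client umask, so set the permission explicitly.
      fs.setPermission(trashRoot, perm);
    }
  }
}
{code}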
[jira] [Updated] (HDFS-15596) ViewHDFS#create(f, permission, cflags, bufferSize, replication, blockSize, progress, checksumOpt) should not be restricted to DFS only.
[ https://issues.apache.org/jira/browse/HDFS-15596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15596: -- Affects Version/s: 3.4.0 > ViewHDFS#create(f, permission, cflags, bufferSize, replication, blockSize, > progress, checksumOpt) should not be restricted to DFS only. > --- > > Key: HDFS-15596 > URL: https://issues.apache.org/jira/browse/HDFS-15596 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.4.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > The ViewHDFS#create(f, permission, cflags, bufferSize, replication, > blockSize, progress, checksumOpt) API is already available in FileSystem. It > will use another overloaded API and can finally go to ViewFileSystem. This case > also works in regular ViewFileSystem. With ViewHDFS, we restricted this to > DFS only, which causes distcp to fail when the target is non-HDFS, as distcp uses this > API. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15596) ViewHDFS#create(f, permission, cflags, bufferSize, replication, blockSize, progress, checksumOpt) should not be restricted to DFS only.
[ https://issues.apache.org/jira/browse/HDFS-15596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15596: -- Component/s: hdfs-client > ViewHDFS#create(f, permission, cflags, bufferSize, replication, blockSize, > progress, checksumOpt) should not be restricted to DFS only. > --- > > Key: HDFS-15596 > URL: https://issues.apache.org/jira/browse/HDFS-15596 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Affects Versions: 3.4.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > The ViewHDFS#create(f, permission, cflags, bufferSize, replication, > blockSize, progress, checksumOpt) API is already available in FileSystem. It > will use another overloaded API and can finally go to ViewFileSystem. This case > also works in regular ViewFileSystem. With ViewHDFS, we restricted this to > DFS only, which causes distcp to fail when the target is non-HDFS, as distcp uses this > API. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
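The shape of the change is to stop rejecting non-DFS targets and instead let the call resolve through the mounted file system, as the plain FileSystem/ViewFileSystem path already does. A method-level sketch only; the wrapper field name mountedViewFs is assumed and this is not the committed patch:
{code:java}
@Override  // inside the ViewHDFS / ViewDistributedFileSystem wrapper
public FSDataOutputStream create(Path f, FsPermission permission,
    EnumSet<CreateFlag> cflags, int bufferSize, short replication,
    long blockSize, Progressable progress, Options.ChecksumOpt checksumOpt)
    throws IOException {
  if (mountedViewFs == null) {
    // No mount table configured: behave like a regular DistributedFileSystem.
    return super.create(f, permission, cflags, bufferSize, replication,
        blockSize, progress, checksumOpt);
  }
  // Resolve through ViewFileSystem so the target can be any FileSystem,
  // not just DFS -- this is the path distcp relies on.
  return mountedViewFs.create(f, permission, cflags, bufferSize, replication,
      blockSize, progress, checksumOpt);
}
{code}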
[jira] [Updated] (HDFS-15580) [JDK 12] DFSTestUtil#addDataNodeLayoutVersion fails
[ https://issues.apache.org/jira/browse/HDFS-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15580: -- Hadoop Flags: Reviewed > [JDK 12] DFSTestUtil#addDataNodeLayoutVersion fails > --- > > Key: HDFS-15580 > URL: https://issues.apache.org/jira/browse/HDFS-15580 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > DFSTestUtil#addDataNodeLayoutVersion uses reflection to update final > variables, however, it is not allowed in Java 12+. Please see > https://bugs.openjdk.java.net/browse/JDK-8210522 for the detail. > {noformat} > [ERROR] > testWithLayoutChangeAndFinalize(org.apache.hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade) > Time elapsed: 11.159 s <<< ERROR! > java.lang.NoSuchFieldException: modifiers > at java.base/java.lang.Class.getDeclaredField(Class.java:2569) > at > org.apache.hadoop.hdfs.DFSTestUtil.addDataNodeLayoutVersion(DFSTestUtil.java:1961) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade.testWithLayoutChangeAndFinalize(TestDataNodeRollingUpgrade.java:364) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at java.base/java.lang.Thread.run(Thread.java:832) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
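For context, this is the reflection pattern that stopped working: tests cleared the private modifiers field of java.lang.reflect.Field to overwrite a static final constant, and JDK 12+ no longer exposes that field. A simplified reproduction (not the DFSTestUtil code itself):
{code:java}
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class FinalFieldOverwrite {
  static void setStaticFinal(Field target, Object newValue) throws Exception {
    target.setAccessible(true);
    // Throws NoSuchFieldException("modifiers") on Java 12 and later.
    Field modifiersField = Field.class.getDeclaredField("modifiers");
    modifiersField.setAccessible(true);
    modifiersField.setInt(target, target.getModifiers() & ~Modifier.FINAL);
    target.set(null, newValue);
  }
}
{code}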
[jira] [Updated] (HDFS-15580) [JDK 12] DFSTestUtil#addDataNodeLayoutVersion fails
[ https://issues.apache.org/jira/browse/HDFS-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15580: -- Affects Version/s: 3.4.0 > [JDK 12] DFSTestUtil#addDataNodeLayoutVersion fails > --- > > Key: HDFS-15580 > URL: https://issues.apache.org/jira/browse/HDFS-15580 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > DFSTestUtil#addDataNodeLayoutVersion uses reflection to update final > variables, however, it is not allowed in Java 12+. Please see > https://bugs.openjdk.java.net/browse/JDK-8210522 for the detail. > {noformat} > [ERROR] > testWithLayoutChangeAndFinalize(org.apache.hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade) > Time elapsed: 11.159 s <<< ERROR! > java.lang.NoSuchFieldException: modifiers > at java.base/java.lang.Class.getDeclaredField(Class.java:2569) > at > org.apache.hadoop.hdfs.DFSTestUtil.addDataNodeLayoutVersion(DFSTestUtil.java:1961) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade.testWithLayoutChangeAndFinalize(TestDataNodeRollingUpgrade.java:364) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at java.base/java.lang.Thread.run(Thread.java:832) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15608) Rename variable DistCp#CLEANUP
[ https://issues.apache.org/jira/browse/HDFS-15608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15608: -- Hadoop Flags: Reviewed > Rename variable DistCp#CLEANUP > -- > > Key: HDFS-15608 > URL: https://issues.apache.org/jira/browse/HDFS-15608 > Project: Hadoop HDFS > Issue Type: Improvement > Components: distcp >Affects Versions: 3.3.0 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Trivial > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: HDFS-15608.001.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > The Cleanup variable defined in the DistCp#main() method looks like the > following: > public static void main(String argv[]) { > ... > Cleanup CLEANUP = new Cleanup(distCp); > ... > } > Here CLEANUP should be renamed to follow local-variable naming conventions, for example: cleanup. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
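The change is only a rename to the conventional lower-case form; roughly, assuming the variable is registered as a shutdown hook (the ShutdownHookManager line and the SHUTDOWN_HOOK_PRIORITY constant are paraphrased, not quoted from the patch):
{code:java}
// Before (as quoted above):
Cleanup CLEANUP = new Cleanup(distCp);
// After the rename -- same object, conventional local-variable name:
Cleanup cleanup = new Cleanup(distCp);
ShutdownHookManager.get().addShutdownHook(cleanup, SHUTDOWN_HOOK_PRIORITY);
{code}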
[jira] [Updated] (HDFS-15613) RBF: Router FSCK fails after HDFS-14442
[ https://issues.apache.org/jira/browse/HDFS-15613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15613: -- Hadoop Flags: Reviewed > RBF: Router FSCK fails after HDFS-14442 > --- > > Key: HDFS-15613 > URL: https://issues.apache.org/jira/browse/HDFS-15613 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.3.0 > Environment: HA is enabled >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > After HDFS-14442 fsck uses getHAServiceState operation to detect Active > NameNode, however, DFSRouter does not support the operation. > {noformat} > 20/10/05 16:41:30 DEBUG hdfs.HAUtil: Error while connecting to namenode > org.apache.hadoop.ipc.RemoteException(java.lang.UnsupportedOperationException): > Operation "getHAServiceState" is not supported > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.checkOperation(RouterRpcServer.java:488) > at > org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.getHAServiceState(RouterClientProtocol.java:1773) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getHAServiceState(RouterRpcServer.java:1333) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getHAServiceState(ClientNamenodeProtocolServerSideTranslatorPB.java:2011) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:532) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1020) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2952) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1562) > at org.apache.hadoop.ipc.Client.call(Client.java:1508) > at org.apache.hadoop.ipc.Client.call(Client.java:1405) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:234) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:119) > at com.sun.proxy.$Proxy12.getHAServiceState(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getHAServiceState(ClientNamenodeProtocolTranslatorPB.java:2055) > at org.apache.hadoop.hdfs.HAUtil.getAddressOfActive(HAUtil.java:281) > at > org.apache.hadoop.hdfs.tools.DFSck.getCurrentNamenodeAddress(DFSck.java:271) > at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:339) > at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:75) > at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:164) > at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:161) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845) > at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:160) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at 
org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:409) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
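One way to make the fsck client tolerate endpoints that do not implement getHAServiceState is sketched below, assuming proxy is a ClientProtocol; this is illustrative only and not necessarily the committed fix (the committed change may instead teach the Router to answer the call):
{code:java}
boolean useThisEndpoint;
try {
  useThisEndpoint =
      proxy.getHAServiceState() == HAServiceProtocol.HAServiceState.ACTIVE;
} catch (RemoteException re) {
  if (UnsupportedOperationException.class.getName().equals(re.getClassName())) {
    // A DFSRouter has no HA state to report; just use this endpoint for fsck.
    useThisEndpoint = true;
  } else {
    throw re;
  }
}
{code}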
[jira] [Updated] (HDFS-15621) Datanode DirectoryScanner uses excessive memory
[ https://issues.apache.org/jira/browse/HDFS-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15621: -- Hadoop Flags: Reviewed > Datanode DirectoryScanner uses excessive memory > --- > > Key: HDFS-15621 > URL: https://issues.apache.org/jira/browse/HDFS-15621 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.4.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Attachments: Screenshot 2020-10-09 at 14.11.36.png, Screenshot > 2020-10-09 at 15.20.56.png > > Time Spent: 1h 20m > Remaining Estimate: 0h > > We generally work to a rule of 1GB heap on a datanode per 1M blocks. For nodes > with a lot of blocks, this can mean a lot of heap. > We recently captured a heapdump of a DN with about 22M blocks and found only > about 1.5GB was occupied by the ReplicaMap. Another 9GB of the heap is taken > by the DirectoryScanner ScanInfo objects. Most of this memory was allocated to > strings. > Checking the strings in question, we can see two strings per scanInfo, > looking like: > {code} > /current/BP-671271071-10.163.205.13-1552020401842/current/finalized/subdir28/subdir17/blk_1180438785_106716708.meta > {code} > I will upload a screenshot from MAT showing this. > For the first string especially, the part > "/current/BP-671271071-10.163.205.13-1552020401842/current/finalized/" will > be the same for every block in the block pool as the scanner is only > concerned about finalized blocks. > We can probably also store just the subdir indexes "28" and "17" rather than > "subdir28/subdir17" and then construct the path when it is requested via the > getter. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
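A sketch of the memory optimisation described above; field and method names are illustrative, not the actual ScanInfo implementation. The idea is to keep only the two subdir indexes and rebuild the per-block path on demand:
{code:java}
class CompactScanInfo {
  private final long blockId;
  private final byte subdir1;   // e.g. 28 in .../finalized/subdir28/subdir17/...
  private final byte subdir2;   // e.g. 17

  CompactScanInfo(long blockId, byte subdir1, byte subdir2) {
    this.blockId = blockId;
    this.subdir1 = subdir1;
    this.subdir2 = subdir2;
  }

  // Rebuild the suffix in the getter instead of storing a full string per replica.
  String getBlockRelativePath() {
    return "subdir" + subdir1 + "/subdir" + subdir2 + "/blk_" + blockId;
  }
}
{code}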
[jira] [Updated] (HDFS-15641) DataNode could meet deadlock if invoke refreshNameNode
[ https://issues.apache.org/jira/browse/HDFS-15641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15641: -- Component/s: datanode > DataNode could meet deadlock if invoke refreshNameNode > -- > > Key: HDFS-15641 > URL: https://issues.apache.org/jira/browse/HDFS-15641 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.2.0 >Reporter: Hongbing Wang >Assignee: Hongbing Wang >Priority: Critical > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15641.001.patch, HDFS-15641.002.patch, > HDFS-15641.003.patch, deadlock.png, deadlock_fixed.png, jstack.log > > > DataNode could hit a deadlock when `hdfs dfsadmin -refreshNamenodes > hostname:50020` is invoked to register a new namespace in a federation environment. > The jstack output is shown in jstack.log. > The specific process is shown in Figure deadlock.png. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15620) RBF: Fix test failures after HADOOP-17281
[ https://issues.apache.org/jira/browse/HDFS-15620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15620: -- Affects Version/s: 3.3.1 3.4.0 > RBF: Fix test failures after HADOOP-17281 > - > > Key: HDFS-15620 > URL: https://issues.apache.org/jira/browse/HDFS-15620 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf, test >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > HADOOP-17281 added FileSystem.listStatusIterator API and added its contract > test cases. In RBF, the following tests are affected and they are now failing: > * hadoop.fs.contract.router.TestRouterHDFSContractGetFileStatus > * hadoop.fs.contract.router.TestRouterHDFSContractRootDirectory > * hadoop.fs.contract.router.TestRouterHDFSContractGetFileStatusSecure > * hadoop.fs.contract.router.web.TestRouterWebHDFSContractRootDirectory > * hadoop.fs.contract.router.TestRouterHDFSContractRootDirectorySecure -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15641) DataNode could meet deadlock if invoke refreshNameNode
[ https://issues.apache.org/jira/browse/HDFS-15641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15641: -- Hadoop Flags: Reviewed Target Version/s: 3.2.3, 3.3.1, 3.4.0 (was: 3.3.1, 3.4.0, 3.2.3) > DataNode could meet deadlock if invoke refreshNameNode > -- > > Key: HDFS-15641 > URL: https://issues.apache.org/jira/browse/HDFS-15641 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.2.0 >Reporter: Hongbing Wang >Assignee: Hongbing Wang >Priority: Critical > Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15641.001.patch, HDFS-15641.002.patch, > HDFS-15641.003.patch, deadlock.png, deadlock_fixed.png, jstack.log > > > DataNode could hit a deadlock when `hdfs dfsadmin -refreshNamenodes > hostname:50020` is invoked to register a new namespace in a federation environment. > The jstack output is shown in jstack.log. > The specific process is shown in Figure deadlock.png. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15657) RBF: TestRouter#testNamenodeHeartBeatEnableDefault fails by BindException
[ https://issues.apache.org/jira/browse/HDFS-15657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15657: -- Hadoop Flags: Reviewed > RBF: TestRouter#testNamenodeHeartBeatEnableDefault fails by BindException > - > > Key: HDFS-15657 > URL: https://issues.apache.org/jira/browse/HDFS-15657 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf, test >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Attachments: patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt > > Time Spent: 2h > Remaining Estimate: 0h > > https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java11-linux-x86_64/40/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt > {noformat} > [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.431 > s <<< FAILURE! - in org.apache.hadoop.hdfs.server.federation.router.TestRouter > [ERROR] > testNamenodeHeartBeatEnableDefault(org.apache.hadoop.hdfs.server.federation.router.TestRouter) > Time elapsed: 1.04 s <<< ERROR! > org.apache.hadoop.service.ServiceStateException: java.net.BindException: > Problem binding to [0.0.0.0:] java.net.BindException: Address already in > use; For more details see: http://wiki.apache.org/hadoop/BindException > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:174) > at > org.apache.hadoop.hdfs.server.federation.router.TestRouter.checkNamenodeHeartBeatEnableDefault(TestRouter.java:281) > at > org.apache.hadoop.hdfs.server.federation.router.TestRouter.testNamenodeHeartBeatEnableDefault(TestRouter.java:267) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > Caused by: java.net.BindException: Problem binding to [0.0.0.0:] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at >
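A common way to make such tests robust against "Address already in use" is to ask the OS for a currently free port instead of binding a fixed one. A generic helper along those lines (illustrative, not necessarily the committed fix):
{code:java}
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortFinder {
  // Bind to port 0 so the kernel picks an unused port, then release it and
  // hand the number to the test configuration.
  static int getFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      socket.setReuseAddress(true);
      return socket.getLocalPort();
    }
  }
}
{code}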
[jira] [Updated] (HDFS-15657) RBF: TestRouter#testNamenodeHeartBeatEnableDefault fails by BindException
[ https://issues.apache.org/jira/browse/HDFS-15657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15657: -- Affects Version/s: 3.3.1 3.4.0 > RBF: TestRouter#testNamenodeHeartBeatEnableDefault fails by BindException > - > > Key: HDFS-15657 > URL: https://issues.apache.org/jira/browse/HDFS-15657 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf, test >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Attachments: patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt > > Time Spent: 2h > Remaining Estimate: 0h > > https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java11-linux-x86_64/40/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt > {noformat} > [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.431 > s <<< FAILURE! - in org.apache.hadoop.hdfs.server.federation.router.TestRouter > [ERROR] > testNamenodeHeartBeatEnableDefault(org.apache.hadoop.hdfs.server.federation.router.TestRouter) > Time elapsed: 1.04 s <<< ERROR! > org.apache.hadoop.service.ServiceStateException: java.net.BindException: > Problem binding to [0.0.0.0:] java.net.BindException: Address already in > use; For more details see: http://wiki.apache.org/hadoop/BindException > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:174) > at > org.apache.hadoop.hdfs.server.federation.router.TestRouter.checkNamenodeHeartBeatEnableDefault(TestRouter.java:281) > at > org.apache.hadoop.hdfs.server.federation.router.TestRouter.testNamenodeHeartBeatEnableDefault(TestRouter.java:267) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > Caused by: java.net.BindException: Problem binding to [0.0.0.0:] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at >
[jira] [Updated] (HDFS-15685) [JDK 14] TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS fails
[ https://issues.apache.org/jira/browse/HDFS-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15685: -- Hadoop Flags: Reviewed > [JDK 14] TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS > fails > > > Key: HDFS-15685 > URL: https://issues.apache.org/jira/browse/HDFS-15685 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS fails after > [JDK-8225499|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8225499]. > > {noformat} > [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.115 > s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider > [ERROR] > testResolveDomainNameUsingDNS(org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider) > Time elapsed: 0.964 s <<< FAILURE! > java.lang.AssertionError: nn1 wasn't returned: > {host02.test/:8020=25, host01.test/:8020=25} > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider.testResolveDomainNameUsingDNS(TestConfiguredFailoverProxyProvider.java:295) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider.testResolveDomainNameUsingDNS(TestConfiguredFailoverProxyProvider.java:320) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15680) Disable Broken Azure Junits
[ https://issues.apache.org/jira/browse/HDFS-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15680: -- Affects Version/s: 3.3.1 > Disable Broken Azure Junits > --- > > Key: HDFS-15680 > URL: https://issues.apache.org/jira/browse/HDFS-15680 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.3.1 >Reporter: Ahmed Hussein >Assignee: Ahmed Hussein >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1 > > Time Spent: 20m > Remaining Estimate: 0h > > There are 6 test classes that have been failing on Yetus for several months. > They contribute more than 41 failing tests, which makes reviewing Yetus > reports a pain in the neck. Another point is to save resources and > avoid tying up ports, memory, and CPU. > Over the last month, there was some effort to bring Yetus back to a > stable state. However, there is no progress in addressing the Azure failures. > Generally, I do not like to disable failing tests, but for this specific > case, I do not think it makes any sense to have 41 failing tests from > one module for several months. Whenever someone finds that those tests are > useful, they can re-enable the tests on Yetus *_after_* the tests are > fixed. > Following a PR, I have to verify that my patch does not cause any failures > (including changed error messages in existing tests). A thorough review takes > a considerable amount of time browsing the nightly builds and GitHub reports. > So, please consider how much time has been spent reviewing those stack traces > over the last months. > Finally, this is one of the reasons developers tend to ignore the reports: > it would take too much time to review them, and by default the errors are > considered irrelevant. 
> CC: [~aajisaka], [~elgoiri], [~weichiu], [~ayushtkn] > {code:bash} > hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked >hadoop.fs.azure.TestNativeAzureFileSystemMocked >hadoop.fs.azure.TestBlobMetadata >hadoop.fs.azure.TestNativeAzureFileSystemConcurrency >hadoop.fs.azure.TestNativeAzureFileSystemFileNameCheck >hadoop.fs.azure.TestNativeAzureFileSystemContractMocked >hadoop.fs.azure.TestWasbFsck >hadoop.fs.azure.TestOutOfBandAzureBlobOperations > {code} > {code:bash} > org.apache.hadoop.fs.azure.TestBlobMetadata.testFolderMetadata > org.apache.hadoop.fs.azure.TestBlobMetadata.testFirstContainerVersionMetadata > org.apache.hadoop.fs.azure.TestBlobMetadata.testPermissionMetadata > org.apache.hadoop.fs.azure.TestBlobMetadata.testOldPermissionMetadata > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrency.testNoTempBlobsVisible > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrency.testLinkBlobs > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testListStatusRootDir > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testRenameDirectoryMoveToExistingDirectory > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testListStatus > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testRenameDirectoryAsExistingDirectory > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testRenameToDirWithSamePrefixAllowed > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testLSRootDir > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testDeleteRecursively > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemFileNameCheck.testWasbFsck > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testChineseCharactersFolderRename > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolderInFolderListingWithZeroByteRenameMetadata > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolderInFolderListing > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testUriEncoding > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testDeepFileCreation > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testListDirectory > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolderRenameInProgress > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRenameFolder > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRenameImplicitFolder > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolder > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testStoreDeleteFolder > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRename > org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked.testListStatus >
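Disabling a whole suite until it is fixed is a one-line annotation in JUnit 4, which these tests use; an example with a placeholder class name (each of the classes listed above would get the same treatment):
{code:java}
import org.junit.Ignore;
import org.junit.Test;

// Placeholder name -- the real change would annotate the failing Azure suites.
@Ignore("Fails consistently on Yetus (HDFS-15680); re-enable once the suite is fixed")
public class TestSomeAzureFeature {
  @Test
  public void testSomething() {
    // existing test body unchanged
  }
}
{code}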
[jira] [Updated] (HDFS-15684) EC: Call recoverLease on DFSStripedOutputStream close exception
[ https://issues.apache.org/jira/browse/HDFS-15684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15684: -- Affects Version/s: 3.4.0 > EC: Call recoverLease on DFSStripedOutputStream close exception > --- > > Key: HDFS-15684 > URL: https://issues.apache.org/jira/browse/HDFS-15684 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient, ec >Affects Versions: 3.4.0 >Reporter: Hongbing Wang >Assignee: Hongbing Wang >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15684.001.patch, HDFS-15684.002.patch, > HDFS-15684.003.patch > > > -HDFS-14694- added a feature that calls the recoverLease operation automatically > when DFSOutputStream close encounters an exception. When we wanted to apply this > feature to our cluster, we found that it does not support EC files. > I think this feature should take effect for both replicated files and EC files. > This Jira proposes to make it effective for EC files as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
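The behaviour being extended can be pictured from the client side as: if closing the output stream fails, ask the NameNode to recover the lease so the file is not left open indefinitely. A minimal sketch of that idea only; the real change wires it into DFSStripedOutputStream#close rather than application code:
{code:java}
import java.io.IOException;
import java.io.OutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class CloseWithLeaseRecovery {
  static void closeOrRecover(DistributedFileSystem dfs, Path file, OutputStream out)
      throws IOException {
    try {
      out.close();
    } catch (IOException e) {
      // Best effort: trigger lease recovery so the (EC or replicated) file can
      // still be closed by the NameNode even though this writer failed.
      dfs.recoverLease(file);
      throw e;
    }
  }
}
{code}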
[jira] [Updated] (HDFS-15685) [JDK 14] TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS fails
[ https://issues.apache.org/jira/browse/HDFS-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15685: -- Affects Version/s: 3.3.1 3.4.0 > [JDK 14] TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS > fails > > > Key: HDFS-15685 > URL: https://issues.apache.org/jira/browse/HDFS-15685 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS fails after > [JDK-8225499|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8225499]. > > {noformat} > [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.115 > s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider > [ERROR] > testResolveDomainNameUsingDNS(org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider) > Time elapsed: 0.964 s <<< FAILURE! > java.lang.AssertionError: nn1 wasn't returned: > {host02.test/:8020=25, host01.test/:8020=25} > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider.testResolveDomainNameUsingDNS(TestConfiguredFailoverProxyProvider.java:295) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider.testResolveDomainNameUsingDNS(TestConfiguredFailoverProxyProvider.java:320) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15689) allow/disallowSnapshot on EZ roots shouldn't fail due to trash provisioning/emptiness check
[ https://issues.apache.org/jira/browse/HDFS-15689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15689: -- Hadoop Flags: Reviewed > allow/disallowSnapshot on EZ roots shouldn't fail due to trash > provisioning/emptiness check > --- > > Key: HDFS-15689 > URL: https://issues.apache.org/jira/browse/HDFS-15689 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Affects Versions: 3.4.0 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > h2. Background > 1. HDFS-15607 added a feature that when > {{dfs.namenode.snapshot.trashroot.enabled=true}}, allowSnapshot will > automatically create a .Trash directory immediately after allowSnapshot > operation so files deleted will be moved into the trash root inside the > snapshottable directory. > 2. HDFS-15539 prevents admins from disallowing snapshot if the trash root > inside is not empty > h2. Problem > 1. When {{dfs.namenode.snapshot.trashroot.enabled=true}}, currently if the > directory (to be allowed snapshot on) is an EZ root, it throws > {{FileAlreadyExistsException}} because the trash root already exists > (encryption zone has already created an internal trash root). > 2. Similarly, at the moment if we disallow snapshot on an EZ root, it may > complain that the trash root is not empty (or delete it if empty, which is > not desired since EZ will still need it). > h2. Solution > 1. Let allowSnapshot succeed by not throwing {{FileAlreadyExistsException}}, > but informs the admin that the trash already exists. > 2. Ignore {{checkTrashRootAndRemoveIfEmpty()}} check if path is EZ root. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15685) [JDK 14] TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS fails
[ https://issues.apache.org/jira/browse/HDFS-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15685: -- Component/s: test > [JDK 14] TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS > fails > > > Key: HDFS-15685 > URL: https://issues.apache.org/jira/browse/HDFS-15685 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS fails after > [JDK-8225499|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8225499]. > > {noformat} > [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.115 > s <<< FAILURE! - in > org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider > [ERROR] > testResolveDomainNameUsingDNS(org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider) > Time elapsed: 0.964 s <<< FAILURE! > java.lang.AssertionError: nn1 wasn't returned: > {host02.test/:8020=25, host01.test/:8020=25} > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider.testResolveDomainNameUsingDNS(TestConfiguredFailoverProxyProvider.java:295) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider.testResolveDomainNameUsingDNS(TestConfiguredFailoverProxyProvider.java:320) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15749) Make size of editPendingQ can be configurable
[ https://issues.apache.org/jira/browse/HDFS-15749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15749: -- Hadoop Flags: Reviewed Target Version/s: 3.2.3, 3.3.0, 3.4.0 (was: 3.3.0, 3.4.0, 3.2.3) > Make size of editPendingQ can be configurable > - > > Key: HDFS-15749 > URL: https://issues.apache.org/jira/browse/HDFS-15749 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.4.0 >Reporter: Baolong Mao >Assignee: Baolong Mao >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
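For context, a minimal sketch of how such a queue-size setting would be consumed. The key name and default below are assumptions based on this JIRA and should be confirmed against hdfs-default.xml in the release that ships HDFS-15749:

{code:java}
// Sketch: reading a configurable pending-queue size for the async edit logger.
// The key name and default are assumptions, not confirmed against the patch.
import java.util.concurrent.ArrayBlockingQueue;
import org.apache.hadoop.conf.Configuration;

public class EditPendingQueueSketch {
  // Hypothetical constants mirroring the usual DFSConfigKeys pattern.
  static final String EDIT_PENDING_Q_SIZE_KEY =
      "dfs.namenode.edits.asynclogging.pending.queue.size";
  static final int EDIT_PENDING_Q_SIZE_DEFAULT = 1024;

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInt(EDIT_PENDING_Q_SIZE_KEY, 4096);   // operator override
    int size = conf.getInt(EDIT_PENDING_Q_SIZE_KEY, EDIT_PENDING_Q_SIZE_DEFAULT);
    // FSEditLogAsync-style loggers keep pending edits in a bounded queue of this size.
    ArrayBlockingQueue<Runnable> editPendingQ = new ArrayBlockingQueue<>(size);
    System.out.println("editPendingQ capacity = " + editPendingQ.remainingCapacity());
  }
}
{code}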
[jira] [Updated] (HDFS-15725) Lease Recovery never completes for a committed block which the DNs never finalize
[ https://issues.apache.org/jira/browse/HDFS-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15725: -- Hadoop Flags: Reviewed > Lease Recovery never completes for a committed block which the DNs never > finalize > - > > Key: HDFS-15725 > URL: https://issues.apache.org/jira/browse/HDFS-15725 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.4.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3 > > Attachments: HDFS-15725.001.patch, HDFS-15725.002.patch, > HDFS-15725.003.patch, HDFS-15725.branch-2.10.001.patch, > HDFS-15725.branch-3.2.001.patch, lease_recovery_2_10.patch > > > In a very rare condition, the HDFS client process can get killed right at the > time it is completing a block / file. > The client sends the "complete" call to the namenode, moving the block into a > committed state, but it dies before it can send the final packet to the > Datanodes telling them to finalize the block. > This means the blocks are stuck on the datanodes in RBW state and nothing > will ever tell them to move out of that state. > The namenode / lease manager will retry forever to close the file, but it > will always complain it is waiting for blocks to reach minimal replication. > I have a simple test and patch to fix this, but I think it warrants some > discussion on whether this is the correct thing to do, or if I need to put > the fix behind a config switch. > My idea is that if lease recovery occurs, and the block is still waiting on > "minimal replication", just put the file back to UNDER_CONSTRUCTION so that > on the next lease recovery attempt, BLOCK RECOVERY will happen, closing the > file and moving the replicas to FINALIZED. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
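A rough sketch of the idea described above (not the committed patch): during lease recovery, if the last block is COMMITTED but never reached minimal replication, fall back to block recovery by reverting the file to under construction. The types and method names below are simplified stand-ins, not the real NameNode classes:

{code:java}
// Self-contained sketch of the recovery decision described above; all types
// here are simplified stand-ins, not the real NameNode internals.
public class LeaseRecoverySketch {
  enum BlockState { COMMITTED, COMPLETE, UNDER_RECOVERY }

  static class LastBlock {
    BlockState state;
    int liveReplicas;
    LastBlock(BlockState s, int r) { state = s; liveReplicas = r; }
  }

  /** Returns the action lease recovery should take for the file's last block. */
  static String recoverLease(LastBlock b, int minReplication) {
    if (b.state == BlockState.COMMITTED && b.liveReplicas < minReplication) {
      // The client died before telling DataNodes to finalize: the replicas
      // stay RBW forever, so waiting for minimal replication never ends.
      // Proposed behaviour: reopen the file (UNDER_CONSTRUCTION) so the next
      // lease recovery attempt triggers block recovery and finalizes replicas.
      b.state = BlockState.UNDER_RECOVERY;
      return "revert to UNDER_CONSTRUCTION and start block recovery";
    }
    return "wait for replication / close file";
  }

  public static void main(String[] args) {
    System.out.println(recoverLease(new LastBlock(BlockState.COMMITTED, 0), 1));
  }
}
{code}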
[jira] [Updated] (HDFS-15788) Correct the statement for pmem cache to reflect cache persistence support
[ https://issues.apache.org/jira/browse/HDFS-15788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15788: -- Hadoop Flags: Reviewed > Correct the statement for pmem cache to reflect cache persistence support > - > > Key: HDFS-15788 > URL: https://issues.apache.org/jira/browse/HDFS-15788 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 3.4.0 >Reporter: Feilong He >Assignee: Feilong He >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: HDFS-15788-01.patch, HDFS-15788-02.patch > > Time Spent: 40m > Remaining Estimate: 0h > > Correct the statement for pmem cache to reflect cache persistence support. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
[ https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15790: -- Hadoop Flags: Reviewed > Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist > -- > > Key: HDFS-15790 > URL: https://issues.apache.org/jira/browse/HDFS-15790 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ipc >Affects Versions: 3.3.1, 3.4.0 >Reporter: David Mollitor >Assignee: Vinayakumar B >Priority: Critical > Labels: pull-request-available, release-blocker > Fix For: 3.3.1, 3.4.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > Changing from Protobuf 2 to Protobuf 3 broke some stuff in Apache Hive > project. This was not an awesome thing to do between minor versions in > regards to backwards compatibility for downstream projects. > Additionally, these two frameworks are not drop-in replacements, they have > some differences. Also, Protobuf 2 is not deprecated or anything so let us > have both protocols available at the same time. In Hadoop 4.x Protobuf 2 > support can be dropped. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
[ https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15790: -- Component/s: ipc > Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist > -- > > Key: HDFS-15790 > URL: https://issues.apache.org/jira/browse/HDFS-15790 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ipc >Affects Versions: 3.3.1, 3.4.0 >Reporter: David Mollitor >Assignee: Vinayakumar B >Priority: Critical > Labels: pull-request-available, release-blocker > Fix For: 3.3.1, 3.4.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > Changing from Protobuf 2 to Protobuf 3 broke some stuff in Apache Hive > project. This was not an awesome thing to do between minor versions in > regards to backwards compatibility for downstream projects. > Additionally, these two frameworks are not drop-in replacements, they have > some differences. Also, Protobuf 2 is not deprecated or anything so let us > have both protocols available at the same time. In Hadoop 4.x Protobuf 2 > support can be dropped. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
[ https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15790: -- Affects Version/s: 3.3.1 3.4.0 > Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist > -- > > Key: HDFS-15790 > URL: https://issues.apache.org/jira/browse/HDFS-15790 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.3.1, 3.4.0 >Reporter: David Mollitor >Assignee: Vinayakumar B >Priority: Critical > Labels: pull-request-available, release-blocker > Fix For: 3.3.1, 3.4.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > Changing from Protobuf 2 to Protobuf 3 broke some stuff in Apache Hive > project. This was not an awesome thing to do between minor versions in > regards to backwards compatibility for downstream projects. > Additionally, these two frameworks are not drop-in replacements, they have > some differences. Also, Protobuf 2 is not deprecated or anything so let us > have both protocols available at the same time. In Hadoop 4.x Protobuf 2 > support can be dropped. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
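As a rough illustration of what co-existence means for RPC setup (a sketch under assumptions, not the actual patch): protocols generated against protobuf 2 can keep the legacy engine while new protocols use the protobuf-3 based engine. The engine class names exist in recent Hadoop releases, but the wiring below is illustrative only:

{code:java}
// Sketch: selecting an RPC engine per protocol so protobuf-2 and protobuf-3
// based protocols can live in the same process. LegacyPB/NewPB are made-up
// stand-ins for real @ProtocolInfo-annotated interfaces.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.ProtobufRpcEngine;    // protobuf 2.x based
import org.apache.hadoop.ipc.ProtobufRpcEngine2;   // shaded protobuf 3.x based
import org.apache.hadoop.ipc.RPC;

public class RpcEngineSelectionSketch {
  interface LegacyPB {}
  interface NewPB {}

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Protocols generated against protobuf 2 keep the old engine...
    RPC.setProtocolEngine(conf, LegacyPB.class, ProtobufRpcEngine.class);
    // ...while protocols generated against the shaded protobuf 3 use the new one.
    RPC.setProtocolEngine(conf, NewPB.class, ProtobufRpcEngine2.class);
    System.out.println("per-protocol engines registered in " + conf);
  }
}
{code}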
[jira] [Updated] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally
[ https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15796: -- Hadoop Flags: Reviewed > ConcurrentModificationException error happens on NameNode occasionally > -- > > Key: HDFS-15796 > URL: https://issues.apache.org/jira/browse/HDFS-15796 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.1 >Reporter: Daniel Ma >Assignee: Daniel Ma >Priority: Critical > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Attachments: HDFS-15796-0001.patch > > > ConcurrentModificationException error happens on NameNode occasionally. > > {code:java} > 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor > thread received Runtime exception. | BlockManager.java:4746 > java.util.ConcurrentModificationException > at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909) > at java.util.ArrayList$Itr.next(ArrayList.java:859) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729) > at java.lang.Thread.run(Thread.java:748) > {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
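The trace above points at a list being mutated while computeReconstructionWorkForBlocks iterates it. A minimal, generic reproduction and the usual mitigation (iterate over a snapshot, or guard both sides with the same lock) is sketched below; this is unrelated to the actual BlockManager data structures and is not the committed fix:

{code:java}
// Generic reproduction of ConcurrentModificationException plus one common
// mitigation; nothing here is BlockManager code.
import java.util.ArrayList;
import java.util.List;

public class CmeSketch {
  public static void main(String[] args) {
    List<String> blocks = new ArrayList<>(List.of("blk_1", "blk_2", "blk_3"));

    try {
      for (String b : blocks) {          // iterator over the live list
        if (b.equals("blk_1")) {
          blocks.remove(b);              // structural change during iteration
        }
      }
    } catch (java.util.ConcurrentModificationException e) {
      System.out.println("reproduced: " + e);
    }

    // Mitigation: iterate over a snapshot so structural changes to the
    // original list cannot invalidate the iterator.
    for (String b : new ArrayList<>(blocks)) {
      if (b.equals("blk_3")) {
        blocks.remove(b);
      }
    }
    System.out.println("remaining: " + blocks);
  }
}
{code}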
[jira] [Updated] (HDFS-15798) EC: Reconstruct task failed, and It would be XmitsInProgress of DN has negative number
[ https://issues.apache.org/jira/browse/HDFS-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15798: -- Affects Version/s: 3.3.1 3.4.0 > EC: Reconstruct task failed, and It would be XmitsInProgress of DN has > negative number > -- > > Key: HDFS-15798 > URL: https://issues.apache.org/jira/browse/HDFS-15798 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.3.1, 3.4.0 >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15798.001.patch, HDFS-15798.002.patch, > HDFS-15798.003.patch > > > When an EC reconstruction task fails, processErasureCodingTasks decrements > xmitsInProgress by an incorrect amount; as a result the DN's XmitsInProgress counter can > go negative, which affects how the NN chooses pending tasks based on the ratio between > the lengths of the replication and erasure-coded block queues. > {code:java} > // 1.ErasureCodingWorker.java > public void processErasureCodingTasks( > Collection<BlockECReconstructionInfo> ecTasks) { > for (BlockECReconstructionInfo reconInfo : ecTasks) { > int xmitsSubmitted = 0; > try { > ... > // It may throw IllegalArgumentException from task#stripedReader > // constructor. > final StripedBlockReconstructor task = > new StripedBlockReconstructor(this, stripedReconInfo); > if (task.hasValidTargets()) { > // See HDFS-12044. We increase xmitsInProgress even the task is only > // enqueued, so that > // 1) NN will not send more tasks than what DN can execute and > // 2) DN will not throw away reconstruction tasks, and instead keeps > // an unbounded number of tasks in the executor's task queue. > xmitsSubmitted = Math.max((int)(task.getXmits() * xmitWeight), 1); > getDatanode().incrementXmitsInProcess(xmitsSubmitted); // task start > increment > stripedReconstructionPool.submit(task); > } else { > LOG.warn("No missing internal block. Skip reconstruction for task:{}", > reconInfo); > } > } catch (Throwable e) { > getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task failed > decrement, XmitsInProgress is decremented by the previous value > LOG.warn("Failed to reconstruct striped block {}", > reconInfo.getExtendedBlock().getLocalBlock(), e); > } > } > } > // 2.StripedBlockReconstructor.java > public void run() { > try { > initDecoderIfNecessary(); >... > } catch (Throwable e) { > LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e); > getDatanode().getMetrics().incrECFailedReconstructionTasks(); > } finally { > float xmitWeight = getErasureCodingWorker().getXmitWeight(); > // if the xmits is smaller than 1, the xmitsSubmitted should be set to 1 > // because if it set to zero, we cannot to measure the xmits submitted > int xmitsSubmitted = Math.max((int) (getXmits() * xmitWeight), 1); > getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task complete > decrement > ... > } > }{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15798) EC: Reconstruct task failed, and It would be XmitsInProgress of DN has negative number
[ https://issues.apache.org/jira/browse/HDFS-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15798: -- Component/s: erasure-coding > EC: Reconstruct task failed, and It would be XmitsInProgress of DN has > negative number > -- > > Key: HDFS-15798 > URL: https://issues.apache.org/jira/browse/HDFS-15798 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15798.001.patch, HDFS-15798.002.patch, > HDFS-15798.003.patch > > > When an EC reconstruction task fails, processErasureCodingTasks decrements > xmitsInProgress by an incorrect amount; as a result the DN's XmitsInProgress counter can > go negative, which affects how the NN chooses pending tasks based on the ratio between > the lengths of the replication and erasure-coded block queues. > {code:java} > // 1.ErasureCodingWorker.java > public void processErasureCodingTasks( > Collection<BlockECReconstructionInfo> ecTasks) { > for (BlockECReconstructionInfo reconInfo : ecTasks) { > int xmitsSubmitted = 0; > try { > ... > // It may throw IllegalArgumentException from task#stripedReader > // constructor. > final StripedBlockReconstructor task = > new StripedBlockReconstructor(this, stripedReconInfo); > if (task.hasValidTargets()) { > // See HDFS-12044. We increase xmitsInProgress even the task is only > // enqueued, so that > // 1) NN will not send more tasks than what DN can execute and > // 2) DN will not throw away reconstruction tasks, and instead keeps > // an unbounded number of tasks in the executor's task queue. > xmitsSubmitted = Math.max((int)(task.getXmits() * xmitWeight), 1); > getDatanode().incrementXmitsInProcess(xmitsSubmitted); // task start > increment > stripedReconstructionPool.submit(task); > } else { > LOG.warn("No missing internal block. Skip reconstruction for task:{}", > reconInfo); > } > } catch (Throwable e) { > getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task failed > decrement, XmitsInProgress is decremented by the previous value > LOG.warn("Failed to reconstruct striped block {}", > reconInfo.getExtendedBlock().getLocalBlock(), e); > } > } > } > // 2.StripedBlockReconstructor.java > public void run() { > try { > initDecoderIfNecessary(); >... > } catch (Throwable e) { > LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e); > getDatanode().getMetrics().incrECFailedReconstructionTasks(); > } finally { > float xmitWeight = getErasureCodingWorker().getXmitWeight(); > // if the xmits is smaller than 1, the xmitsSubmitted should be set to 1 > // because if it set to zero, we cannot to measure the xmits submitted > int xmitsSubmitted = Math.max((int) (getXmits() * xmitWeight), 1); > getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task complete > decrement > ... > } > }{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15798) EC: Reconstruct task failed, and It would be XmitsInProgress of DN has negative number
[ https://issues.apache.org/jira/browse/HDFS-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15798: -- Hadoop Flags: Reviewed > EC: Reconstruct task failed, and It would be XmitsInProgress of DN has > negative number > -- > > Key: HDFS-15798 > URL: https://issues.apache.org/jira/browse/HDFS-15798 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.3.1, 3.4.0 >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15798.001.patch, HDFS-15798.002.patch, > HDFS-15798.003.patch > > > When an EC reconstruction task fails, processErasureCodingTasks decrements > xmitsInProgress by an incorrect amount; as a result the DN's XmitsInProgress counter can > go negative, which affects how the NN chooses pending tasks based on the ratio between > the lengths of the replication and erasure-coded block queues. > {code:java} > // 1.ErasureCodingWorker.java > public void processErasureCodingTasks( > Collection<BlockECReconstructionInfo> ecTasks) { > for (BlockECReconstructionInfo reconInfo : ecTasks) { > int xmitsSubmitted = 0; > try { > ... > // It may throw IllegalArgumentException from task#stripedReader > // constructor. > final StripedBlockReconstructor task = > new StripedBlockReconstructor(this, stripedReconInfo); > if (task.hasValidTargets()) { > // See HDFS-12044. We increase xmitsInProgress even the task is only > // enqueued, so that > // 1) NN will not send more tasks than what DN can execute and > // 2) DN will not throw away reconstruction tasks, and instead keeps > // an unbounded number of tasks in the executor's task queue. > xmitsSubmitted = Math.max((int)(task.getXmits() * xmitWeight), 1); > getDatanode().incrementXmitsInProcess(xmitsSubmitted); // task start > increment > stripedReconstructionPool.submit(task); > } else { > LOG.warn("No missing internal block. Skip reconstruction for task:{}", > reconInfo); > } > } catch (Throwable e) { > getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task failed > decrement, XmitsInProgress is decremented by the previous value > LOG.warn("Failed to reconstruct striped block {}", > reconInfo.getExtendedBlock().getLocalBlock(), e); > } > } > } > // 2.StripedBlockReconstructor.java > public void run() { > try { > initDecoderIfNecessary(); >... > } catch (Throwable e) { > LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e); > getDatanode().getMetrics().incrECFailedReconstructionTasks(); > } finally { > float xmitWeight = getErasureCodingWorker().getXmitWeight(); > // if the xmits is smaller than 1, the xmitsSubmitted should be set to 1 > // because if it set to zero, we cannot to measure the xmits submitted > int xmitsSubmitted = Math.max((int) (getXmits() * xmitWeight), 1); > getDatanode().decrementXmitsInProgress(xmitsSubmitted); // task complete > decrement > ... > } > }{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
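Abstracting away from the actual DataNode control flow, the arithmetic of the failure is simply that more is subtracted from the shared counter than was added for a given task. A toy model (not the real accounting code, and not the committed fix):

{code:java}
// Toy model of how a shared in-progress counter goes negative when the
// increment and decrement paths disagree about a failed task.
import java.util.concurrent.atomic.AtomicInteger;

public class XmitsAccountingSketch {
  static final AtomicInteger xmitsInProgress = new AtomicInteger();

  public static void main(String[] args) {
    int xmitsSubmitted = 1;
    xmitsInProgress.addAndGet(xmitsSubmitted);     // incremented on enqueue

    // One path (e.g. the task's own cleanup) decrements for the failure...
    xmitsInProgress.addAndGet(-xmitsSubmitted);

    // ...and another path (e.g. the submitter's error handling) decrements
    // again for the same task, so the counter ends up below zero.
    xmitsInProgress.addAndGet(-xmitsSubmitted);

    System.out.println("xmitsInProgress = " + xmitsInProgress.get()); // -1
  }
}
{code}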
[jira] [Updated] (HDFS-15818) Fix TestFsDatasetImpl.testReadLockCanBeDisabledByConfig
[ https://issues.apache.org/jira/browse/HDFS-15818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15818: -- Hadoop Flags: Reviewed > Fix TestFsDatasetImpl.testReadLockCanBeDisabledByConfig > --- > > Key: HDFS-15818 > URL: https://issues.apache.org/jira/browse/HDFS-15818 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.4.0 >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.2.4 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Current TestFsDatasetImpl.testReadLockCanBeDisabledByConfig is incorrect: > 1) Test fails intermittently as holder can acquire lock first > [https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2666/1/testReport/] > > 2) Test passes regardless of the setting of > DFS_DATANODE_LOCK_READ_WRITE_ENABLED_KEY -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15818) Fix TestFsDatasetImpl.testReadLockCanBeDisabledByConfig
[ https://issues.apache.org/jira/browse/HDFS-15818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15818: -- Affects Version/s: 3.4.0 > Fix TestFsDatasetImpl.testReadLockCanBeDisabledByConfig > --- > > Key: HDFS-15818 > URL: https://issues.apache.org/jira/browse/HDFS-15818 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.4.0 >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.2.4 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Current TestFsDatasetImpl.testReadLockCanBeDisabledByConfig is incorrect: > 1) Test fails intermittently as holder can acquire lock first > [https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2666/1/testReport/] > > 2) Test passes regardless of the setting of > DFS_DATANODE_LOCK_READ_WRITE_ENABLED_KEY -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
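A sketch of the usual way to make a lock-ordering test like 1) deterministic, assuming nothing about the committed fix: have the holder signal that it owns the lock before the second thread tries to acquire it, so "holder acquires first" is guaranteed rather than racy:

{code:java}
// Generic pattern for avoiding the race described in 1): the holder thread
// signals via a latch once it holds the lock, and only then does the test
// thread attempt to acquire it. Not the actual TestFsDatasetImpl code.
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderingTestSketch {
  public static void main(String[] args) throws Exception {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    CountDownLatch holderHasLock = new CountDownLatch(1);
    CountDownLatch releaseHolder = new CountDownLatch(1);

    Thread holder = new Thread(() -> {
      lock.writeLock().lock();
      try {
        holderHasLock.countDown();          // guarantee the ordering
        releaseHolder.await();
      } catch (InterruptedException ignored) {
      } finally {
        lock.writeLock().unlock();
      }
    });
    holder.start();

    holderHasLock.await();                  // wait until the holder owns the lock
    // With a real (non-disabled) read lock this acquisition must block.
    boolean acquired = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
    System.out.println("read lock acquired while held for write? " + acquired);

    releaseHolder.countDown();
    holder.join();
  }
}
{code}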
[jira] [Updated] (HDFS-15836) RBF: Fix contract tests after HADOOP-13327
[ https://issues.apache.org/jira/browse/HDFS-15836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15836: -- Hadoop Flags: Reviewed > RBF: Fix contract tests after HADOOP-13327 > -- > > Key: HDFS-15836 > URL: https://issues.apache.org/jira/browse/HDFS-15836 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > {noformat} > [ERROR] Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: > 19.094 s <<< FAILURE! - in > org.apache.hadoop.fs.contract.router.TestRouterHDFSContractCreate > [ERROR] > testSyncable(org.apache.hadoop.fs.contract.router.TestRouterHDFSContractCreate) > Time elapsed: 0.102 s <<< FAILURE! > java.lang.AssertionError: Should not have capability: hflush in > FSDataOutputStream{wrappedStream=DFSOutputStream:block==null} > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at > org.apache.hadoop.fs.contract.ContractTestUtils.assertCapabilities(ContractTestUtils.java:1553) > at > org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:497) > at > org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {noformat} > https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2696/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
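For reference, the capability probe this contract test exercises looks roughly like the sketch below. StreamCapabilities.HFLUSH and HSYNC are real constants, but the path and the expected result here are illustrative; after HADOOP-13327 the declared capabilities are expected to match what the wrapped stream actually does:

{code:java}
// Sketch: probing hflush/hsync support on an output stream obtained through a
// FileSystem (e.g. the router). Paths and expected outcomes are illustrative.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.StreamCapabilities;

public class HflushCapabilitySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);   // router or HDFS URI from core-site
    try (FSDataOutputStream out = fs.create(new Path("/tmp/capability-probe"))) {
      // The contract test asserts these flags against the stream's real behaviour.
      System.out.println("hflush: " + out.hasCapability(StreamCapabilities.HFLUSH));
      System.out.println("hsync:  " + out.hasCapability(StreamCapabilities.HSYNC));
    }
  }
}
{code}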
[jira] [Updated] (HDFS-15819) Fix a codestyle issue for TestQuotaByStorageType
[ https://issues.apache.org/jira/browse/HDFS-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15819: -- Hadoop Flags: Reviewed > Fix a codestyle issue for TestQuotaByStorageType > > > Key: HDFS-15819 > URL: https://issues.apache.org/jira/browse/HDFS-15819 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.4.0 >Reporter: Baolong Mao >Assignee: Baolong Mao >Priority: Trivial > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15836) RBF: Fix contract tests after HADOOP-13327
[ https://issues.apache.org/jira/browse/HDFS-15836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15836: -- Affects Version/s: 3.3.1 3.4.0 > RBF: Fix contract tests after HADOOP-13327 > -- > > Key: HDFS-15836 > URL: https://issues.apache.org/jira/browse/HDFS-15836 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > {noformat} > [ERROR] Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: > 19.094 s <<< FAILURE! - in > org.apache.hadoop.fs.contract.router.TestRouterHDFSContractCreate > [ERROR] > testSyncable(org.apache.hadoop.fs.contract.router.TestRouterHDFSContractCreate) > Time elapsed: 0.102 s <<< FAILURE! > java.lang.AssertionError: Should not have capability: hflush in > FSDataOutputStream{wrappedStream=DFSOutputStream:block==null} > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertFalse(Assert.java:64) > at > org.apache.hadoop.fs.contract.ContractTestUtils.assertCapabilities(ContractTestUtils.java:1553) > at > org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:497) > at > org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {noformat} > https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2696/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15845) RBF: Router fails to start due to NoClassDefFoundError for hadoop-federation-balance
[ https://issues.apache.org/jira/browse/HDFS-15845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15845: -- Hadoop Flags: Reviewed > RBF: Router fails to start due to NoClassDefFoundError for > hadoop-federation-balance > > > Key: HDFS-15845 > URL: https://issues.apache.org/jira/browse/HDFS-15845 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > {noformat} > $ hdfs dfsrouter > ... > 2021-02-22 17:21:55,400 ERROR router.DFSRouter: Failed to start router > java.lang.NoClassDefFoundError: > org/apache/hadoop/tools/fedbalance/procedure/BalanceProcedure > at > org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.(RouterClientProtocol.java:195) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.(RouterRpcServer.java:394) > at > org.apache.hadoop.hdfs.server.federation.router.Router.createRpcServer(Router.java:391) > at > org.apache.hadoop.hdfs.server.federation.router.Router.serviceInit(Router.java:188) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:165) > at > org.apache.hadoop.hdfs.server.federation.router.DFSRouter.main(DFSRouter.java:69) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedure > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > ... 6 more > 2021-02-22 17:21:55,402 INFO util.ExitUtil: Exiting with status 1: > java.lang.NoClassDefFoundError: > org/apache/hadoop/tools/fedbalance/procedure/BalanceProcedure > 2021-02-22 17:21:55,404 INFO router.DFSRouter: SHUTDOWN_MSG: > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15845) RBF: Router fails to start due to NoClassDefFoundError for hadoop-federation-balance
[ https://issues.apache.org/jira/browse/HDFS-15845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15845: -- Affects Version/s: 3.4.0 > RBF: Router fails to start due to NoClassDefFoundError for > hadoop-federation-balance > > > Key: HDFS-15845 > URL: https://issues.apache.org/jira/browse/HDFS-15845 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.4.0 >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > {noformat} > $ hdfs dfsrouter > ... > 2021-02-22 17:21:55,400 ERROR router.DFSRouter: Failed to start router > java.lang.NoClassDefFoundError: > org/apache/hadoop/tools/fedbalance/procedure/BalanceProcedure > at > org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.(RouterClientProtocol.java:195) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.(RouterRpcServer.java:394) > at > org.apache.hadoop.hdfs.server.federation.router.Router.createRpcServer(Router.java:391) > at > org.apache.hadoop.hdfs.server.federation.router.Router.serviceInit(Router.java:188) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:165) > at > org.apache.hadoop.hdfs.server.federation.router.DFSRouter.main(DFSRouter.java:69) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedure > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > ... 6 more > 2021-02-22 17:21:55,402 INFO util.ExitUtil: Exiting with status 1: > java.lang.NoClassDefFoundError: > org/apache/hadoop/tools/fedbalance/procedure/BalanceProcedure > 2021-02-22 17:21:55,404 INFO router.DFSRouter: SHUTDOWN_MSG: > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15845) RBF: Router fails to start due to NoClassDefFoundError for hadoop-federation-balance
[ https://issues.apache.org/jira/browse/HDFS-15845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15845: -- Component/s: rbf > RBF: Router fails to start due to NoClassDefFoundError for > hadoop-federation-balance > > > Key: HDFS-15845 > URL: https://issues.apache.org/jira/browse/HDFS-15845 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > {noformat} > $ hdfs dfsrouter > ... > 2021-02-22 17:21:55,400 ERROR router.DFSRouter: Failed to start router > java.lang.NoClassDefFoundError: > org/apache/hadoop/tools/fedbalance/procedure/BalanceProcedure > at > org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.(RouterClientProtocol.java:195) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.(RouterRpcServer.java:394) > at > org.apache.hadoop.hdfs.server.federation.router.Router.createRpcServer(Router.java:391) > at > org.apache.hadoop.hdfs.server.federation.router.Router.serviceInit(Router.java:188) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:165) > at > org.apache.hadoop.hdfs.server.federation.router.DFSRouter.main(DFSRouter.java:69) > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedure > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > ... 6 more > 2021-02-22 17:21:55,402 INFO util.ExitUtil: Exiting with status 1: > java.lang.NoClassDefFoundError: > org/apache/hadoop/tools/fedbalance/procedure/BalanceProcedure > 2021-02-22 17:21:55,404 INFO router.DFSRouter: SHUTDOWN_MSG: > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15850) Superuser actions should be reported to external enforcers
[ https://issues.apache.org/jira/browse/HDFS-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15850: -- Hadoop Flags: Reviewed Target Version/s: 3.3.2, 3.4.0 (was: 3.4.0, 3.3.2) > Superuser actions should be reported to external enforcers > -- > > Key: HDFS-15850 > URL: https://issues.apache.org/jira/browse/HDFS-15850 > Project: Hadoop HDFS > Issue Type: Task > Components: security >Affects Versions: 3.3.0 >Reporter: Vivek Ratnavel Subramanian >Assignee: Vivek Ratnavel Subramanian >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: HDFS-15850.branch-3.3.001.patch, HDFS-15850.v1.patch, > HDFS-15850.v2.patch > > Time Spent: 5h 10m > Remaining Estimate: 0h > > Currently, HDFS superuser checks and actions are not reported to external > enforcers like Ranger, so the audit reports produced by such external enforcers > are incomplete and miss the superuser actions. To fix this, add a > new method to "AccessControlEnforcer" for all superuser checks. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
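The shape of the change, as described, is an extra callback on AccessControlEnforcer that external authorizers such as Ranger can implement so superuser checks show up in their audit. The interface, method name, and signature below are assumptions for illustration only, not the committed API:

{code:java}
// Hypothetical sketch of an enforcer that audits superuser checks. The real
// hook added by HDFS-15850 lives on INodeAttributeProvider.AccessControlEnforcer;
// the name and signature used here (checkSuperUserPermission) are assumptions.
import org.apache.hadoop.security.AccessControlException;
import org.apache.hadoop.security.UserGroupInformation;

public class AuditingSuperuserEnforcerSketch {

  interface SuperuserCheckHook {
    void checkSuperUserPermission(UserGroupInformation callerUgi,
        String operationName) throws AccessControlException;
  }

  static class RangerLikeEnforcer implements SuperuserCheckHook {
    @Override
    public void checkSuperUserPermission(UserGroupInformation callerUgi,
        String operationName) throws AccessControlException {
      // 1) Emit an audit record so the external system sees the action.
      System.out.printf("AUDIT superuser-check user=%s op=%s%n",
          callerUgi.getShortUserName(), operationName);
      // 2) Enforce: only the configured superuser may proceed (assumed "hdfs").
      if (!"hdfs".equals(callerUgi.getShortUserName())) {
        throw new AccessControlException(
            callerUgi.getShortUserName() + " is not a superuser for " + operationName);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    new RangerLikeEnforcer().checkSuperUserPermission(
        UserGroupInformation.getCurrentUser(), "datanodeReport");
  }
}
{code}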
[jira] [Updated] (HDFS-15904) Flaky test TestBalancer#testBalancerWithSortTopNodes()
[ https://issues.apache.org/jira/browse/HDFS-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15904: -- Component/s: test > Flaky test TestBalancer#testBalancerWithSortTopNodes() > -- > > Key: HDFS-15904 > URL: https://issues.apache.org/jira/browse/HDFS-15904 > Project: Hadoop HDFS > Issue Type: Test > Components: balancer mover, test >Affects Versions: 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > TestBalancer#testBalancerWithSortTopNodes shows some flakes in around ~10 > runs or so. It's reproducible locally also. Basically, balancing either moves > 2 blocks of size 100+100 bytes or it moves 3 blocks of size 100+100+50 bytes > (2nd case causes flakies). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
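One common way to de-flake this kind of assertion, sketched under the assumption that both outcomes are legitimate (this is the general approach, not necessarily the committed fix): accept either byte total, since moving 100+100 or 100+100+50 bytes both satisfy the balancing goal:

{code:java}
// Sketch: tolerate both valid balancer outcomes instead of asserting one
// exact byte count. Purely illustrative of the de-flaking approach.
public class BalancerAssertionSketch {
  static void assertBalanced(long bytesMoved) {
    // Either two blocks (100 + 100) or three blocks (100 + 100 + 50) may be
    // scheduled depending on ordering, so both totals are acceptable.
    if (bytesMoved != 200L && bytesMoved != 250L) {
      throw new AssertionError("unexpected bytes moved: " + bytesMoved);
    }
  }

  public static void main(String[] args) {
    assertBalanced(200L);
    assertBalanced(250L);
    System.out.println("both outcomes accepted");
  }
}
{code}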
[jira] [Updated] (HDFS-15895) DFSAdmin#printOpenFiles has redundant String#format usage
[ https://issues.apache.org/jira/browse/HDFS-15895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15895: -- Affects Version/s: 3.3.1 3.4.0 > DFSAdmin#printOpenFiles has redundant String#format usage > - > > Key: HDFS-15895 > URL: https://issues.apache.org/jira/browse/HDFS-15895 > Project: Hadoop HDFS > Issue Type: Task >Affects Versions: 3.3.1, 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15895) DFSAdmin#printOpenFiles has redundant String#format usage
[ https://issues.apache.org/jira/browse/HDFS-15895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15895: -- Hadoop Flags: Reviewed Target Version/s: 3.2.3, 2.10.2, 3.3.1, 3.4.0 (was: 3.3.1, 3.4.0, 2.10.2, 3.2.3) > DFSAdmin#printOpenFiles has redundant String#format usage > - > > Key: HDFS-15895 > URL: https://issues.apache.org/jira/browse/HDFS-15895 > Project: Hadoop HDFS > Issue Type: Task >Affects Versions: 3.3.1, 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15895) DFSAdmin#printOpenFiles has redundant String#format usage
[ https://issues.apache.org/jira/browse/HDFS-15895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15895: -- Component/s: dfsadmin > DFSAdmin#printOpenFiles has redundant String#format usage > - > > Key: HDFS-15895 > URL: https://issues.apache.org/jira/browse/HDFS-15895 > Project: Hadoop HDFS > Issue Type: Task > Components: dfsadmin >Affects Versions: 3.3.1, 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15904) Flaky test TestBalancer#testBalancerWithSortTopNodes()
[ https://issues.apache.org/jira/browse/HDFS-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15904: -- Affects Version/s: 3.4.0 > Flaky test TestBalancer#testBalancerWithSortTopNodes() > -- > > Key: HDFS-15904 > URL: https://issues.apache.org/jira/browse/HDFS-15904 > Project: Hadoop HDFS > Issue Type: Test > Components: balancer mover >Affects Versions: 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > TestBalancer#testBalancerWithSortTopNodes shows some flakes in around ~10 > runs or so. It's reproducible locally also. Basically, balancing either moves > 2 blocks of size 100+100 bytes or it moves 3 blocks of size 100+100+50 bytes > (2nd case causes flakies). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15911) Provide blocks moved count in Balancer iteration result
[ https://issues.apache.org/jira/browse/HDFS-15911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15911: -- Affects Version/s: 3.3.1 3.4.0 > Provide blocks moved count in Balancer iteration result > --- > > Key: HDFS-15911 > URL: https://issues.apache.org/jira/browse/HDFS-15911 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer mover >Affects Versions: 3.3.1, 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > Balancer provides Result for iteration and it contains info like exitStatus, > bytesLeftToMove, bytesBeingMoved etc. We should also provide blocksMoved > count from NameNodeConnector and print it with rest of details in > Result#print(). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
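A sketch of what the extended iteration result could look like. Only the names mentioned above (exitStatus, bytesLeftToMove, bytesBeingMoved, blocksMoved, Result#print) come from the description; everything else is illustrative:

{code:java}
// Illustrative stand-in for the Balancer iteration Result carrying a
// blocksMoved count alongside the existing byte counters.
import java.io.PrintStream;

public class BalancerResultSketch {
  static class Result {
    final String exitStatus;
    final long bytesLeftToMove;
    final long bytesBeingMoved;
    final long blocksMoved;          // new: taken from NameNodeConnector

    Result(String exitStatus, long bytesLeftToMove, long bytesBeingMoved,
        long blocksMoved) {
      this.exitStatus = exitStatus;
      this.bytesLeftToMove = bytesLeftToMove;
      this.bytesBeingMoved = bytesBeingMoved;
      this.blocksMoved = blocksMoved;
    }

    void print(int iteration, PrintStream out) {
      out.printf("iter %d: status=%s leftToMove=%d beingMoved=%d blocksMoved=%d%n",
          iteration, exitStatus, bytesLeftToMove, bytesBeingMoved, blocksMoved);
    }
  }

  public static void main(String[] args) {
    new Result("IN_PROGRESS", 1_073_741_824L, 268_435_456L, 42L).print(1, System.out);
  }
}
{code}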
[jira] [Updated] (HDFS-15907) Reduce Memory Overhead of AclFeature by avoiding AtomicInteger
[ https://issues.apache.org/jira/browse/HDFS-15907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15907: -- Hadoop Flags: Reviewed > Reduce Memory Overhead of AclFeature by avoiding AtomicInteger > -- > > Key: HDFS-15907 > URL: https://issues.apache.org/jira/browse/HDFS-15907 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.3.1, 3.4.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15907.001.patch > > > In HDFS-15792 we made some changes to the AclFeature and ReferenceCountedMap > classes to address a rare bug when loading the FSImage in parallel. > One change we made was to replace an int inside AclFeature with an > AtomicInteger to avoid synchronising the methods in AclFeature. > Discussing this change with [~weichiu], he pointed out that while the > AclFeature cache is intended to reduce the count of AclFeature objects, on a > large cluster, it is possible for there to be many millions of AclFeature > objects. > Previously, the int took 4 bytes of heap. > By moving to an AtomicInteger, we probably have an overhead of: > 4 bytes (or 8 if the heap is over 32GB) for a reference to the AtomicInteger > object > 12 bytes of object header for the AtomicInteger > 4 bytes inside the AtomicInteger to store the int. > > So the total heap overhead has gone from 4 bytes to 20 bytes just to use an > AtomicInteger. > Therefore I think it makes sense to remove the AtomicInteger and just > synchronise the methods of AclFeature where the value is incremented / > decremented / retrieved. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15907) Reduce Memory Overhead of AclFeature by avoiding AtomicInteger
[ https://issues.apache.org/jira/browse/HDFS-15907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15907: -- Affects Version/s: 3.3.1 3.4.0 > Reduce Memory Overhead of AclFeature by avoiding AtomicInteger > -- > > Key: HDFS-15907 > URL: https://issues.apache.org/jira/browse/HDFS-15907 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.3.1, 3.4.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15907.001.patch > > > In HDFS-15792 we made some changes to the AclFeature and ReferenceCountedMap > classes to address a rare bug when loading the FSImage in parallel. > One change we made was to replace an int inside AclFeature with an > AtomicInteger to avoid synchronising the methods in AclFeature. > Discussing this change with [~weichiu], he pointed out that while the > AclFeature cache is intended to reduce the count of AclFeature objects, on a > large cluster, it is possible for there to be many millions of AclFeature > objects. > Previously, the int took 4 bytes of heap. > By moving to an AtomicInteger, we probably have an overhead of: > 4 bytes (or 8 if the heap is over 32GB) for a reference to the AtomicInteger > object > 12 bytes of object header for the AtomicInteger > 4 bytes inside the AtomicInteger to store the int. > > So the total heap overhead has gone from 4 bytes to 20 bytes just to use an > AtomicInteger. > Therefore I think it makes sense to remove the AtomicInteger and just > synchronise the methods of AclFeature where the value is incremented / > decremented / retrieved. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
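The trade-off described above, sketched: keep a plain int and synchronize the three small methods, paying monitor acquisition instead of a separate per-instance AtomicInteger object. The class below is a generic stand-in, not the real AclFeature:

{code:java}
// Stand-in for the reference-counting style discussed above: a plain int
// guarded by synchronized methods instead of a separate AtomicInteger object.
public class RefCountSketch {
  private int refCount;   // 4 bytes inside the owning object, no extra object

  public synchronized int incrementAndGetRefCount() {
    return ++refCount;
  }

  public synchronized int decrementAndGetRefCount() {
    return refCount > 0 ? --refCount : 0;
  }

  public synchronized int getRefCount() {
    return refCount;
  }

  public static void main(String[] args) {
    RefCountSketch f = new RefCountSketch();
    f.incrementAndGetRefCount();
    f.incrementAndGetRefCount();
    f.decrementAndGetRefCount();
    System.out.println("refCount = " + f.getRefCount()); // 1
  }
}
{code}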
[jira] [Updated] (HDFS-15923) RBF: Authentication failed when rename accross sub clusters
[ https://issues.apache.org/jira/browse/HDFS-15923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15923: -- Affects Version/s: 3.4.0 > RBF: Authentication failed when rename accross sub clusters > > > Key: HDFS-15923 > URL: https://issues.apache.org/jira/browse/HDFS-15923 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: rbf >Affects Versions: 3.4.0 >Reporter: zhuobin zheng >Assignee: zhuobin zheng >Priority: Major > Labels: RBF, pull-request-available, rename > Fix For: 3.4.0 > > Attachments: HDFS-15923.001.patch, HDFS-15923.002.patch, > HDFS-15923.003.patch, HDFS-15923.stack-trace, > hdfs-15923-fix-security-issue.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > Rename accross subcluster with RBF and Kerberos environment. Will encounter > the following two errors: > # Save Object to journal. > # Precheck try to get src file status > So, we need use Router Login UGI doAs create DistcpProcedure and > TrashProcedure and submit Job. > > Beside, we should check user permission for src and dst path in router side > before do rename internal. (HDFS-15973) > First: Save Object to journal. > {code:java} > // code placeholder > 2021-03-23 14:01:16,233 WARN org.apache.hadoop.ipc.Client: Exception > encountered while connecting to the server > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) > at > org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:408) > at > org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:622) > at > org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:413) > at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:822) > at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:818) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) > at > org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:818) > at > org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:413) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1636) > at org.apache.hadoop.ipc.Client.call(Client.java:1452) > at org.apache.hadoop.ipc.Client.call(Client.java:1405) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118) > at com.sun.proxy.$Proxy11.create(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:376) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) > at > org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) > at > 
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) > at com.sun.proxy.$Proxy12.create(Unknown Source) > at > org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:277) > at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1240) > at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1219) > at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1201) > at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1139) > at > org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:533) > at > org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:530) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at >
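The remedy described above ("use Router Login UGI doAs") follows the standard UserGroupInformation pattern; the procedure construction is elided here, and only the doAs wrapping is the point of the sketch:

{code:java}
// Sketch: running privileged work (e.g. building and submitting the rename
// procedures) as the router's login user rather than the proxied caller.
// DistcpProcedure/TrashProcedure construction is intentionally omitted.
import java.io.IOException;
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class RouterLoginDoAsSketch {
  public static void main(String[] args) throws IOException, InterruptedException {
    // The identity the router logged in with from its keytab.
    UserGroupInformation routerLoginUgi = UserGroupInformation.getLoginUser();

    routerLoginUgi.doAs((PrivilegedExceptionAction<Void>) () -> {
      // Inside this block, RPC/SASL connections authenticate with the router's
      // own Kerberos credentials, so journal writes and the precheck
      // getFileStatus calls do not fail with "GSS initiate failed".
      System.out.println("running as " + UserGroupInformation.getCurrentUser());
      return null;
    });
  }
}
{code}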
[jira] [Updated] (HDFS-15931) Fix non-static inner classes for better memory management
[ https://issues.apache.org/jira/browse/HDFS-15931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15931: -- Component/s: hdfs > Fix non-static inner classes for better memory management > - > > Key: HDFS-15931 > URL: https://issues.apache.org/jira/browse/HDFS-15931 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.3.1, 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > If an inner class does not need to reference its enclosing instance, it can > be static. This prevents a common cause of memory leaks and uses less memory > per instance of the enclosing class. > Came across DataNodeProperties as a non static inner class defined in > MiniDFSCluster without holding any implicit reference to MiniDFSCluster. > Taking this opportunity to find other non-static inner classes that are not > holding implicit reference to their respective enclosing instances. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
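For readers unfamiliar with the pattern, the difference is simply the implicit reference to the enclosing instance. A generic example (not the MiniDFSCluster code):

{code:java}
// Generic illustration of the inner-class change: a non-static inner class
// keeps a hidden reference to its enclosing instance; a static nested class
// does not.
public class NestedClassSketch {
  private final byte[] bigBuffer = new byte[8 * 1024 * 1024];

  // Non-static: every LeakyHolder pins the enclosing NestedClassSketch (and
  // its 8 MB buffer) via the implicit NestedClassSketch.this reference.
  class LeakyHolder {
    int value;
  }

  // Static: no implicit reference, so the enclosing instance can be collected
  // even while StaticHolder objects are still alive.
  static class StaticHolder {
    int value;
  }

  public static void main(String[] args) {
    NestedClassSketch outer = new NestedClassSketch();
    LeakyHolder leaky = outer.new LeakyHolder();   // keeps outer reachable
    StaticHolder safe = new StaticHolder();        // independent of outer
    System.out.println(leaky.value + " " + safe.value + " " + outer.bigBuffer.length);
  }
}
{code}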
[jira] [Updated] (HDFS-15926) Removed duplicate dependency of hadoop-annotations
[ https://issues.apache.org/jira/browse/HDFS-15926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15926: -- Component/s: hdfs > Removed duplicate dependency of hadoop-annotations > -- > > Key: HDFS-15926 > URL: https://issues.apache.org/jira/browse/HDFS-15926 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > hadoop-annotations is duplicated dependency in hadoop-hdfs as it is also > declared in parent hadoop-project-dist pom. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15926) Removed duplicate dependency of hadoop-annotations
[ https://issues.apache.org/jira/browse/HDFS-15926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15926: -- Affects Version/s: 3.3.1 3.4.0 > Removed duplicate dependency of hadoop-annotations > -- > > Key: HDFS-15926 > URL: https://issues.apache.org/jira/browse/HDFS-15926 > Project: Hadoop HDFS > Issue Type: Task > Components: hdfs >Affects Versions: 3.3.1, 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > hadoop-annotations is duplicated dependency in hadoop-hdfs as it is also > declared in parent hadoop-project-dist pom. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15931) Fix non-static inner classes for better memory management
[ https://issues.apache.org/jira/browse/HDFS-15931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15931: -- Affects Version/s: 3.3.1 3.4.0 > Fix non-static inner classes for better memory management > - > > Key: HDFS-15931 > URL: https://issues.apache.org/jira/browse/HDFS-15931 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.3.1, 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > If an inner class does not need to reference its enclosing instance, it can > be static. This prevents a common cause of memory leaks and uses less memory > per instance of the enclosing class. > Came across DataNodeProperties as a non static inner class defined in > MiniDFSCluster without holding any implicit reference to MiniDFSCluster. > Taking this opportunity to find other non-static inner classes that are not > holding implicit reference to their respective enclosing instances. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15937) Reduce memory used during datanode layout upgrade
[ https://issues.apache.org/jira/browse/HDFS-15937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15937: -- Hadoop Flags: Reviewed > Reduce memory used during datanode layout upgrade > - > > Key: HDFS-15937 > URL: https://issues.apache.org/jira/browse/HDFS-15937 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.0, 3.1.4, 3.2.2, 3.4.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Attachments: heap-dump-after.png, heap-dump-before.png > > Time Spent: 2h > Remaining Estimate: 0h > > When the datanode block layout is upgraded from -56 (256x256) to -57 (32x32), > we have found the datanode uses a lot more memory than usual. > For each volume, the blocks are scanned and a list is created holding a > series of LinkArgs objects. This object contains a File object for the block > source and destination. The file object stores the path as a string, eg: > /data01/dfs/dn/current/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir0/blk_1073741825_1001.meta > /data01/dfs/dn/current/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir0/blk_1073741825 > This string is repeated for every block and meta file on the DN, and much > of the string is the same each time, leading to a large amount of memory. > If we change the linkArgs to store: > * Src Path without the block, eg > /data01/dfs/dn/previous.tmp/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir0 > * Dest Path without the block, eg > /data01/dfs/dn/current/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir10 > * Block / Meta file name, eg blk_12345678_1001 or blk_12345678_1001.meta > Then, provided we reuse the same File object for repeated src and dest paths, > we can save most of the memory without reworking the logic of the code. > The current logic works along the source paths recursively, so you can easily > re-use the src path object. > For the destination path, there are only 32x32 (1024) distinct paths, so we > can simply cache them in a HashMap and look up the reusable object each time. > I tested locally by generating 100k block files and attempting the layout > upgrade. A heap dump showed the 100k blocks using about 140MB of heap. That > is close to 1.5GB per 1M blocks. > After the change outlined above the same 100K blocks used about 20MB of heap, > so 200MB per million blocks. > A general DN sizing recommendation is 1GB of heap per 1M blocks, so the > upgrade should be able to happen within the pre-upgrade heap. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
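A hedged sketch of the idea described in HDFS-15937 (the LinkArgs/DestDirCache shapes here are illustrative, not the actual Hadoop code): store the parent directory once and only the block/meta file name per entry, and reuse a single shared File object per distinct destination directory.
{code:java}
import java.io.File;
import java.util.HashMap;
import java.util.Map;

class LinkArgs {
  final File srcDir;     // shared across all blocks under the same source subdir
  final File destDir;    // one of at most 32x32 = 1024 distinct directories
  final String fileName; // e.g. blk_12345678_1001 or blk_12345678_1001.meta

  LinkArgs(File srcDir, File destDir, String fileName) {
    this.srcDir = srcDir;
    this.destDir = destDir;
    this.fileName = fileName;
  }
}

class DestDirCache {
  private final Map<String, File> cache = new HashMap<>();

  // Return a single shared File object per distinct destination path.
  File get(String destPath) {
    return cache.computeIfAbsent(destPath, File::new);
  }
}
{code}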
[jira] [Updated] (HDFS-15940) Some tests in TestBlockRecovery are consistently failing
[ https://issues.apache.org/jira/browse/HDFS-15940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15940: -- Component/s: test > Some tests in TestBlockRecovery are consistently failing > > > Key: HDFS-15940 > URL: https://issues.apache.org/jira/browse/HDFS-15940 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Time Spent: 6h 20m > Remaining Estimate: 0h > > Some long-running tests in TestBlockRecovery are consistently failing. Also, > TestBlockRecovery has grown huge with so many tests, so we should refactor some of the > long-running and race-condition-specific tests into a separate class. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15942) Increase Quota initialization threads
[ https://issues.apache.org/jira/browse/HDFS-15942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15942: -- Affects Version/s: 3.3.1 3.4.0 > Increase Quota initialization threads > - > > Key: HDFS-15942 > URL: https://issues.apache.org/jira/browse/HDFS-15942 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.3.1, 3.4.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-15942.001.patch > > > On large namespaces, the quota initialization at startup can take a long time > with the default 4 threads. Also on NN failover, often the quota needs to be > calculated before the failover can complete, delaying the failover. > I performed some benchmarks some time back on a large image (316M inodes, 35GB > on disk); the quota load takes: > {code} > quota - 4 threads 39 seconds > quota - 8 threads 23 seconds > quota - 12 threads 20 seconds > quota - 16 threads 15 seconds > {code} > As the quota is calculated when the NN is starting up (and hence doing no > other work) or at failover time before the new standby becomes active, I > think the quota should use as many threads as possible. > I propose we change the default to 8 or 12 on at least trunk and branch-3.3 > so we have a better default going forward. > Has anyone got any other thoughts? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
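A hedged example of raising the quota initialization parallelism discussed in HDFS-15942. It assumes the thread count is controlled by the dfs.namenode.quota.init-threads key; verify the exact key name and default for your Hadoop version before relying on it.
{code:java}
import org.apache.hadoop.conf.Configuration;

public class QuotaInitThreadsExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // The default has historically been low (4); the benchmark above suggests
    // 8-16 threads significantly shortens quota initialization on large images.
    conf.setInt("dfs.namenode.quota.init-threads", 12);
    System.out.println(conf.getInt("dfs.namenode.quota.init-threads", 4));
  }
}
{code}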
[jira] [Updated] (HDFS-15977) Call explicit_bzero only if it is available
[ https://issues.apache.org/jira/browse/HDFS-15977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15977: -- Hadoop Flags: Reviewed > Call explicit_bzero only if it is available > --- > > Key: HDFS-15977 > URL: https://issues.apache.org/jira/browse/HDFS-15977 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs++ >Affects Versions: 3.4.0, 3.3.2 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > CentOS/RHEL 7 has glibc 2.17, and it does not support explicit_bzero. Now I > don't want to drop support for CentOS/RHEL 7, and we should call > explicit_bzero only if it is available. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15989) Split TestBalancer into two classes
[ https://issues.apache.org/jira/browse/HDFS-15989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15989: -- Component/s: balancer test > Split TestBalancer into two classes > --- > > Key: HDFS-15989 > URL: https://issues.apache.org/jira/browse/HDFS-15989 > Project: Hadoop HDFS > Issue Type: Task > Components: balancer, test >Affects Versions: 3.3.1, 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Time Spent: 7h 40m > Remaining Estimate: 0h > > TestBalancer has accumulated many tests, so it would be good to split it up into > two classes. Moreover, TestBalancer#testMaxIterationTime is flaky. We should > also resolve it with this Jira. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15989) Split TestBalancer into two classes
[ https://issues.apache.org/jira/browse/HDFS-15989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15989: -- Affects Version/s: 3.3.1 3.4.0 > Split TestBalancer into two classes > --- > > Key: HDFS-15989 > URL: https://issues.apache.org/jira/browse/HDFS-15989 > Project: Hadoop HDFS > Issue Type: Task >Affects Versions: 3.3.1, 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Time Spent: 7h 40m > Remaining Estimate: 0h > > TestBalancer has accumulated many tests, so it would be good to split it up into > two classes. Moreover, TestBalancer#testMaxIterationTime is flaky. We should > also resolve it with this Jira. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15977) Call explicit_bzero only if it is available
[ https://issues.apache.org/jira/browse/HDFS-15977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15977: -- Affects Version/s: 3.3.2 3.4.0 > Call explicit_bzero only if it is available > --- > > Key: HDFS-15977 > URL: https://issues.apache.org/jira/browse/HDFS-15977 > Project: Hadoop HDFS > Issue Type: Bug > Components: libhdfs++ >Affects Versions: 3.4.0, 3.3.2 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > CentOS/RHEL 7 has glibc 2.17, and it does not support explicit_bzero. Now I > don't want to drop support for CentOS/RHEL 7, and we should call > explicit_bzero only if it is available. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15989) Split TestBalancer into two classes
[ https://issues.apache.org/jira/browse/HDFS-15989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15989: -- Hadoop Flags: Reviewed Target Version/s: 3.2.3, 3.3.1, 3.4.0 (was: 3.3.1, 3.4.0, 3.2.3) > Split TestBalancer into two classes > --- > > Key: HDFS-15989 > URL: https://issues.apache.org/jira/browse/HDFS-15989 > Project: Hadoop HDFS > Issue Type: Task > Components: balancer, test >Affects Versions: 3.3.1, 3.4.0 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Time Spent: 7h 40m > Remaining Estimate: 0h > > TestBalancer has accumulated many tests, so it would be good to split it up into > two classes. Moreover, TestBalancer#testMaxIterationTime is flaky. We should > also resolve it with this Jira. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16001) TestOfflineEditsViewer.testStored() fails reading negative value of FSEditLogOpCodes
[ https://issues.apache.org/jira/browse/HDFS-16001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16001: -- Affects Version/s: 3.3.1 3.4.0 > TestOfflineEditsViewer.testStored() fails reading negative value of > FSEditLogOpCodes > > > Key: HDFS-16001 > URL: https://issues.apache.org/jira/browse/HDFS-16001 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.3.1, 3.4.0 >Reporter: Konstantin Shvachko >Assignee: Akira Ajisaka >Priority: Blocker > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > {{TestOfflineEditsViewer.testStored()}} fails consistently with an exception > {noformat} > java.io.IOException: Op -54 has size -1314247195, but the minimum op size is > 17 > {noformat} > Seems like there is a corrupt record in {{editsStored}} file. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16007) Deserialization of ReplicaState should avoid throwing ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HDFS-16007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16007: -- Affects Version/s: 3.3.1 3.4.0 > Deserialization of ReplicaState should avoid throwing > ArrayIndexOutOfBoundsException > > > Key: HDFS-16007 > URL: https://issues.apache.org/jira/browse/HDFS-16007 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.3.1, 3.4.0 >Reporter: junwen yang >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > The ReplicaState enum uses its ordinal for serialization and > deserialization, which is sensitive to the declaration order and can cause issues similar to > HDFS-15624. > To avoid this, we should either add comments warning later developers not to reorder this > enum, or add index checking in the read and getState functions to avoid an index > out of bounds error. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
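An illustrative sketch of the bounds check proposed in HDFS-16007 (this is not the actual ReplicaState source, just a hedged reconstruction of the pattern): guard ordinal-based deserialization so a corrupt or out-of-range value surfaces as a clear IOException rather than an ArrayIndexOutOfBoundsException.
{code:java}
import java.io.DataInput;
import java.io.IOException;

enum ReplicaStateSketch {
  FINALIZED, RBW, RWR, RUR, TEMPORARY;

  // Cache values() once; calling it repeatedly allocates a new array each time.
  private static final ReplicaStateSketch[] CACHED_VALUES = values();

  static ReplicaStateSketch getState(int v) throws IOException {
    if (v < 0 || v >= CACHED_VALUES.length) {
      throw new IOException("Invalid replica state ordinal: " + v);
    }
    return CACHED_VALUES[v];
  }

  static ReplicaStateSketch read(DataInput in) throws IOException {
    return getState(in.readByte());
  }
}
{code}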
[jira] [Updated] (HDFS-16014) Fix an issue in checking native pmdk lib by 'hadoop checknative' command
[ https://issues.apache.org/jira/browse/HDFS-16014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16014: -- Hadoop Flags: Reviewed Target Version/s: 3.2.4, 3.4.0 (was: 3.4.0, 3.2.4) > Fix an issue in checking native pmdk lib by 'hadoop checknative' command > > > Key: HDFS-16014 > URL: https://issues.apache.org/jira/browse/HDFS-16014 > Project: Hadoop HDFS > Issue Type: Bug > Components: native >Affects Versions: 3.4.0 >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: HDFS-16014-01.patch, HDFS-16014-02.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > In HDFS-14818, we proposed a patch to support checking the native pmdk lib. The > expected outcome is to display a hint to the user regarding the pmdk lib loaded state. > Recently, it was found that the pmdk lib was not actually loaded successfully, but > the `hadoop checknative` command still tells the user that it was. This issue can > be reproduced by moving libpmem.so* from the specified install path to another > place, or directly deleting these libs, after the project is built. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16007) Deserialization of ReplicaState should avoid throwing ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HDFS-16007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16007: -- Component/s: hdfs > Deserialization of ReplicaState should avoid throwing > ArrayIndexOutOfBoundsException > > > Key: HDFS-16007 > URL: https://issues.apache.org/jira/browse/HDFS-16007 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.3.1, 3.4.0 >Reporter: junwen yang >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > The ReplicaState enum uses its ordinal for serialization and > deserialization, which is sensitive to the declaration order and can cause issues similar to > HDFS-15624. > To avoid this, we should either add comments warning later developers not to reorder this > enum, or add index checking in the read and getState functions to avoid an index > out of bounds error. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16046) TestBalanceProcedureScheduler and TestDistCpProcedure timeout
[ https://issues.apache.org/jira/browse/HDFS-16046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16046: -- Hadoop Flags: Reviewed > TestBalanceProcedureScheduler and TestDistCpProcedure timeout > - > > Key: HDFS-16046 > URL: https://issues.apache.org/jira/browse/HDFS-16046 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf, test >Affects Versions: 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: image-2021-05-28-11-41-16-733.png, screenshot-1.png, > screenshot-2.png > > Time Spent: 40m > Remaining Estimate: 0h > > The following two tests timed out frequently in the qbt job. > [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/520/testReport/org.apache.hadoop.tools.fedbalance.procedure/TestBalanceProcedureScheduler/testSchedulerDownAndRecoverJob/] > {quote}org.junit.runners.model.TestTimedOutException: test timed out after > 6 milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at > org.apache.hadoop.tools.fedbalance.procedure.BalanceJob.waitJobDone(BalanceJob.java:220) > at > org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedureScheduler.waitUntilDone(BalanceProcedureScheduler.java:189) > at > org.apache.hadoop.tools.fedbalance.procedure.TestBalanceProcedureScheduler.testSchedulerDownAndRecoverJob(TestBalanceProcedureScheduler.java:331) > {quote} > [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/520/testReport/org.apache.hadoop.tools.fedbalance/TestDistCpProcedure/testSuccessfulDistCpProcedure/] > {quote}org.junit.runners.model.TestTimedOutException: test timed out after > 3 milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at > org.apache.hadoop.tools.fedbalance.procedure.BalanceJob.waitJobDone(BalanceJob.java:220) > at > org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedureScheduler.waitUntilDone(BalanceProcedureScheduler.java:189) > at > org.apache.hadoop.tools.fedbalance.TestDistCpProcedure.testSuccessfulDistCpProcedure(TestDistCpProcedure.java:121) > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16046) TestBalanceProcedureScheduler and TestDistCpProcedure timeout
[ https://issues.apache.org/jira/browse/HDFS-16046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16046: -- Affects Version/s: 3.4.0 > TestBalanceProcedureScheduler and TestDistCpProcedure timeout > - > > Key: HDFS-16046 > URL: https://issues.apache.org/jira/browse/HDFS-16046 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf, test >Affects Versions: 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: image-2021-05-28-11-41-16-733.png, screenshot-1.png, > screenshot-2.png > > Time Spent: 40m > Remaining Estimate: 0h > > The following two tests timed out frequently in the qbt job. > [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/520/testReport/org.apache.hadoop.tools.fedbalance.procedure/TestBalanceProcedureScheduler/testSchedulerDownAndRecoverJob/] > {quote}org.junit.runners.model.TestTimedOutException: test timed out after > 6 milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at > org.apache.hadoop.tools.fedbalance.procedure.BalanceJob.waitJobDone(BalanceJob.java:220) > at > org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedureScheduler.waitUntilDone(BalanceProcedureScheduler.java:189) > at > org.apache.hadoop.tools.fedbalance.procedure.TestBalanceProcedureScheduler.testSchedulerDownAndRecoverJob(TestBalanceProcedureScheduler.java:331) > {quote} > [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/520/testReport/org.apache.hadoop.tools.fedbalance/TestDistCpProcedure/testSuccessfulDistCpProcedure/] > {quote}org.junit.runners.model.TestTimedOutException: test timed out after > 3 milliseconds > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:502) > at > org.apache.hadoop.tools.fedbalance.procedure.BalanceJob.waitJobDone(BalanceJob.java:220) > at > org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedureScheduler.waitUntilDone(BalanceProcedureScheduler.java:189) > at > org.apache.hadoop.tools.fedbalance.TestDistCpProcedure.testSuccessfulDistCpProcedure(TestDistCpProcedure.java:121) > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16050) Some dynamometer tests fail
[ https://issues.apache.org/jira/browse/HDFS-16050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16050: -- Hadoop Flags: Reviewed Target Version/s: 3.3.2, 3.4.0 (was: 3.4.0, 3.3.2) > Some dynamometer tests fail > --- > > Key: HDFS-16050 > URL: https://issues.apache.org/jira/browse/HDFS-16050 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.4.0, 3.3.2 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h > Remaining Estimate: 0h > > The following tests failed: > {quote}hadoop.tools.dynamometer.TestDynamometerInfra > hadoop.tools.dynamometer.blockgenerator.TestBlockGen > hadoop.tools.dynamometer.workloadgenerator.TestWorkloadGenerator > {quote} > [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/523/artifact/out/patch-unit-hadoop-tools_hadoop-dynamometer.txt] > {quote}[ERROR] > testAuditWorkloadDirectParserWithOutput(org.apache.hadoop.tools.dynamometer.workloadgenerator.TestWorkloadGenerator) > Time elapsed: 1.353 s <<< ERROR! > java.lang.NoClassDefFoundError: org/mockito/stubbing/Answer > at > org.apache.hadoop.hdfs.MiniDFSCluster.isNameNodeUp(MiniDFSCluster.java:2618) > at > org.apache.hadoop.hdfs.MiniDFSCluster.isClusterUp(MiniDFSCluster.java:2632) > at > org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1498) > at > org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:977) > at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:576) > at > org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:518) > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16050) Some dynamometer tests fail
[ https://issues.apache.org/jira/browse/HDFS-16050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16050: -- Affects Version/s: 3.3.2 3.4.0 > Some dynamometer tests fail > --- > > Key: HDFS-16050 > URL: https://issues.apache.org/jira/browse/HDFS-16050 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.4.0, 3.3.2 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h > Remaining Estimate: 0h > > The following tests failed: > {quote}hadoop.tools.dynamometer.TestDynamometerInfra > hadoop.tools.dynamometer.blockgenerator.TestBlockGen > hadoop.tools.dynamometer.workloadgenerator.TestWorkloadGenerator > {quote} > [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/523/artifact/out/patch-unit-hadoop-tools_hadoop-dynamometer.txt] > {quote}[ERROR] > testAuditWorkloadDirectParserWithOutput(org.apache.hadoop.tools.dynamometer.workloadgenerator.TestWorkloadGenerator) > Time elapsed: 1.353 s <<< ERROR! > java.lang.NoClassDefFoundError: org/mockito/stubbing/Answer > at > org.apache.hadoop.hdfs.MiniDFSCluster.isNameNodeUp(MiniDFSCluster.java:2618) > at > org.apache.hadoop.hdfs.MiniDFSCluster.isClusterUp(MiniDFSCluster.java:2632) > at > org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1498) > at > org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:977) > at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:576) > at > org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:518) > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16075) Use empty array constants present in StorageType and DatanodeInfo to avoid creating redundant objects
[ https://issues.apache.org/jira/browse/HDFS-16075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16075: -- Hadoop Flags: Reviewed Target Version/s: 3.3.2, 3.4.0 (was: 3.4.0, 3.3.2) > Use empty array constants present in StorageType and DatanodeInfo to avoid > creating redundant objects > - > > Key: HDFS-16075 > URL: https://issues.apache.org/jira/browse/HDFS-16075 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1.5h > Remaining Estimate: 0h > > StorageType and DatanodeInfo already provide empty array constants. We > should use them where possible in order to avoid creating unnecessary new > empty array objects. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
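A sketch of the pattern HDFS-16075 refers to, using a hypothetical Item class rather than the real StorageType/DatanodeInfo constants: reuse a single shared empty array instead of allocating a new zero-length array on every call.
{code:java}
import java.util.List;

class Item {
  // Shared, immutable-by-convention empty array constant.
  static final Item[] EMPTY_ARRAY = {};
}

class EmptyArrayExample {
  // Before: allocates a new empty array each time the list is empty.
  static Item[] before(List<Item> items) {
    return items.isEmpty() ? new Item[0] : items.toArray(new Item[0]);
  }

  // After: returns the shared constant, avoiding the redundant allocation.
  static Item[] after(List<Item> items) {
    return items.isEmpty() ? Item.EMPTY_ARRAY : items.toArray(Item.EMPTY_ARRAY);
  }
}
{code}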
[jira] [Updated] (HDFS-16080) RBF: Invoking method in all locations should break the loop after successful result
[ https://issues.apache.org/jira/browse/HDFS-16080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16080: -- Component/s: rbf > RBF: Invoking method in all locations should break the loop after successful > result > --- > > Key: HDFS-16080 > URL: https://issues.apache.org/jira/browse/HDFS-16080 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1.5h > Remaining Estimate: 0h > > rename, delete and mkdir used by the Router client usually call multiple > locations if the path is present in multiple sub-clusters. After invoking > multiple concurrent proxy calls to multiple clients, we iterate through all > results and mark anyResult true if at least one of them was successful. We > should break the loop as soon as one of the proxy call results is successful rather > than iterating over the remaining calls. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
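A minimal sketch of the change described in HDFS-16080 (hypothetical types, not the actual RouterRpcClient code): stop scanning per-location results as soon as one successful result is found.
{code:java}
import java.util.Map;

class InvokeAllSketch {
  static boolean anySucceeded(Map<String, Boolean> resultsPerLocation) {
    boolean anyResult = false;
    for (Boolean ok : resultsPerLocation.values()) {
      if (Boolean.TRUE.equals(ok)) {
        anyResult = true;
        break; // no need to inspect the remaining locations
      }
    }
    return anyResult;
  }
}
{code}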
[jira] [Updated] (HDFS-16075) Use empty array constants present in StorageType and DatanodeInfo to avoid creating redundant objects
[ https://issues.apache.org/jira/browse/HDFS-16075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16075: -- Component/s: hdfs > Use empty array constants present in StorageType and DatanodeInfo to avoid > creating redundant objects > - > > Key: HDFS-16075 > URL: https://issues.apache.org/jira/browse/HDFS-16075 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1.5h > Remaining Estimate: 0h > > StorageType and DatanodeInfo already provide empty array constants. We > should use them where possible in order to avoid creating unnecessary new > empty array objects. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16075) Use empty array constants present in StorageType and DatanodeInfo to avoid creating redundant objects
[ https://issues.apache.org/jira/browse/HDFS-16075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16075: -- Affects Version/s: 3.3.2 3.4.0 > Use empty array constants present in StorageType and DatanodeInfo to avoid > creating redundant objects > - > > Key: HDFS-16075 > URL: https://issues.apache.org/jira/browse/HDFS-16075 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1.5h > Remaining Estimate: 0h > > StorageType and DatanodeInfo already provide empty array constants. We > should use them where possible in order to avoid creating unnecessary new > empty array objects. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16080) RBF: Invoking method in all locations should break the loop after successful result
[ https://issues.apache.org/jira/browse/HDFS-16080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16080: -- Affects Version/s: 3.3.2 3.4.0 > RBF: Invoking method in all locations should break the loop after successful > result > --- > > Key: HDFS-16080 > URL: https://issues.apache.org/jira/browse/HDFS-16080 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1.5h > Remaining Estimate: 0h > > rename, delete and mkdir used by the Router client usually call multiple > locations if the path is present in multiple sub-clusters. After invoking > multiple concurrent proxy calls to multiple clients, we iterate through all > results and mark anyResult true if at least one of them was successful. We > should break the loop as soon as one of the proxy call results is successful rather > than iterating over the remaining calls. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16090) Fine grained locking for datanodeNetworkCounts
[ https://issues.apache.org/jira/browse/HDFS-16090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16090: -- Component/s: datanode > Fine grained locking for datanodeNetworkCounts > -- > > Key: HDFS-16090 > URL: https://issues.apache.org/jira/browse/HDFS-16090 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 2.5h > Remaining Estimate: 0h > > While incrementing the DataNode network error count, we lock the entire LoadingCache > in order to increment the network count of a specific host. We should provide fine-grained > concurrency for this update, because locking the entire cache is redundant > and could impact performance while incrementing the network count for multiple > hosts. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
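A hedged sketch of the fine-grained approach HDFS-16090 argues for (hypothetical structure and names, not the actual DataNode code): update only the counter for the affected host atomically instead of synchronizing on the whole cache.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class NetworkErrorCounters {
  // host -> (metric name -> counter)
  private final ConcurrentHashMap<String, Map<String, AtomicLong>> counts =
      new ConcurrentHashMap<>();

  void incrementNetworkError(String host) {
    counts
        .computeIfAbsent(host, h -> new ConcurrentHashMap<>())
        .computeIfAbsent("networkErrors", k -> new AtomicLong())
        .incrementAndGet(); // atomic per-host update, no global lock held
  }
}
{code}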
[jira] [Updated] (HDFS-16082) Avoid non-atomic operations on exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance in Balancer
[ https://issues.apache.org/jira/browse/HDFS-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16082: -- Component/s: balancer > Avoid non-atomic operations on exceptionsSinceLastBalance and > failedTimesSinceLastSuccessfulBalance in Balancer > --- > > Key: HDFS-16082 > URL: https://issues.apache.org/jira/browse/HDFS-16082 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Balancer introduced two volatile int fields as part of HDFS-13783, namely > exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance. > However, we are performing non-atomic operations on them. Since the non-atomic > operations done here mostly depend on their previous values, we should use > AtomicInteger for both. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
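A sketch of the volatile-int versus AtomicInteger point from HDFS-16082 (illustrative names, not the actual Balancer fields): a read-modify-write on a volatile field is not atomic, so concurrent increments can be lost, whereas AtomicInteger makes the increment a single atomic operation.
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

class BalancerCountersSketch {
  private volatile int exceptionsVolatile;                        // lossy under contention
  private final AtomicInteger exceptions = new AtomicInteger();   // safe

  void onExceptionUnsafe() {
    exceptionsVolatile++; // read, add, write: three separate steps, not atomic
  }

  void onException() {
    exceptions.incrementAndGet(); // single atomic operation
  }
}
{code}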
[jira] [Updated] (HDFS-16082) Avoid non-atomic operations on exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance in Balancer
[ https://issues.apache.org/jira/browse/HDFS-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16082: -- Hadoop Flags: Reviewed Target Version/s: 3.3.2, 3.4.0 (was: 3.4.0, 3.3.2) > Avoid non-atomic operations on exceptionsSinceLastBalance and > failedTimesSinceLastSuccessfulBalance in Balancer > --- > > Key: HDFS-16082 > URL: https://issues.apache.org/jira/browse/HDFS-16082 > Project: Hadoop HDFS > Issue Type: Improvement > Components: balancer >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Balancer introduced two volatile int fields as part of HDFS-13783, namely > exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance. > However, we are performing non-atomic operations on them. Since the non-atomic > operations done here mostly depend on their previous values, we should use > AtomicInteger for both. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16082) Avoid non-atomic operations on exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance in Balancer
[ https://issues.apache.org/jira/browse/HDFS-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16082: -- Affects Version/s: 3.3.2 3.4.0 > Avoid non-atomic operations on exceptionsSinceLastBalance and > failedTimesSinceLastSuccessfulBalance in Balancer > --- > > Key: HDFS-16082 > URL: https://issues.apache.org/jira/browse/HDFS-16082 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Balancer introduced two volatile int fields as part of HDFS-13783, namely > exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance. > However, we are performing non-atomic operations on them. Since the non-atomic > operations done here mostly depend on their previous values, we should use > AtomicInteger for both. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16090) Fine grained locking for datanodeNetworkCounts
[ https://issues.apache.org/jira/browse/HDFS-16090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16090: -- Affects Version/s: 3.3.2 3.4.0 > Fine grained locking for datanodeNetworkCounts > -- > > Key: HDFS-16090 > URL: https://issues.apache.org/jira/browse/HDFS-16090 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 2.5h > Remaining Estimate: 0h > > While incrementing the DataNode network error count, we lock the entire LoadingCache > in order to increment the network count of a specific host. We should provide fine-grained > concurrency for this update, because locking the entire cache is redundant > and could impact performance while incrementing the network count for multiple > hosts. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16092) Avoid creating LayoutFlags redundant objects
[ https://issues.apache.org/jira/browse/HDFS-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16092: -- Affects Version/s: 3.3.2 3.4.0 > Avoid creating LayoutFlags redundant objects > > > Key: HDFS-16092 > URL: https://issues.apache.org/jira/browse/HDFS-16092 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We use LayoutFlags to represent features that EditLog/FSImage can support. > The utility writes an int (0) to the given OutputStream, and if EditLog/FSImage > supports layout flags, they read the value from the InputStream to confirm > whether there are unsupported feature flags (a non-zero int). However, we also > create and return a new LayoutFlags object, which is not used anywhere > because it's just a utility to read/write to/from the given stream. We should > stop creating such redundant objects while reading from the > InputStream using the LayoutFlags#read utility. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
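An illustrative sketch of the read utility described in HDFS-16092 (not the actual LayoutFlags source): validate the flags int from the stream and return nothing, rather than constructing a throwaway object for the caller.
{code:java}
import java.io.DataInputStream;
import java.io.IOException;

final class LayoutFlagsSketch {
  private LayoutFlagsSketch() {}

  static void read(DataInputStream in) throws IOException {
    int length = in.readInt();
    if (length != 0) {
      throw new IOException("Found feature flags which are not supported: " + length);
    }
    // Nothing to return: the check above is the only purpose of the call.
  }
}
{code}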
[jira] [Updated] (HDFS-16092) Avoid creating LayoutFlags redundant objects
[ https://issues.apache.org/jira/browse/HDFS-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16092: -- Hadoop Flags: Reviewed Target Version/s: 3.3.2, 3.2.3, 3.4.0 (was: 3.4.0, 3.2.3, 3.3.2) > Avoid creating LayoutFlags redundant objects > > > Key: HDFS-16092 > URL: https://issues.apache.org/jira/browse/HDFS-16092 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We use LayoutFlags to represent features that EditLog/FSImage can support. > The utility writes an int (0) to the given OutputStream, and if EditLog/FSImage > supports layout flags, they read the value from the InputStream to confirm > whether there are unsupported feature flags (a non-zero int). However, we also > create and return a new LayoutFlags object, which is not used anywhere > because it's just a utility to read/write to/from the given stream. We should > stop creating such redundant objects while reading from the > InputStream using the LayoutFlags#read utility. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16092) Avoid creating LayoutFlags redundant objects
[ https://issues.apache.org/jira/browse/HDFS-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16092: -- Component/s: hdfs > Avoid creating LayoutFlags redundant objects > > > Key: HDFS-16092 > URL: https://issues.apache.org/jira/browse/HDFS-16092 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We use LayoutFlags to represent features that EditLog/FSImage can support. > The utility writes an int (0) to the given OutputStream, and if EditLog/FSImage > supports layout flags, they read the value from the InputStream to confirm > whether there are unsupported feature flags (a non-zero int). However, we also > create and return a new LayoutFlags object, which is not used anywhere > because it's just a utility to read/write to/from the given stream. We should > stop creating such redundant objects while reading from the > InputStream using the LayoutFlags#read utility. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16127) Improper pipeline close recovery causes a permanent write failure or data loss.
[ https://issues.apache.org/jira/browse/HDFS-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16127: -- Component/s: hdfs > Improper pipeline close recovery causes a permanent write failure or data > loss. > --- > > Key: HDFS-16127 > URL: https://issues.apache.org/jira/browse/HDFS-16127 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.4.0, 3.3.2 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Major > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Attachments: HDFS-16127.patch > > > When a block is being closed, the data streamer in the client waits for the > final ACK to be delivered. If an exception is received during this wait, the > close is retried. This assumption was invalidated by HDFS-15813, resulting > in permanent write failures in some close error cases involving slow nodes. > There are also less frequent cases of data loss. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16127) Improper pipeline close recovery causes a permanent write failure or data loss.
[ https://issues.apache.org/jira/browse/HDFS-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16127: -- Affects Version/s: 3.3.2 3.4.0 > Improper pipeline close recovery causes a permanent write failure or data > loss. > --- > > Key: HDFS-16127 > URL: https://issues.apache.org/jira/browse/HDFS-16127 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.4.0, 3.3.2 >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Major > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Attachments: HDFS-16127.patch > > > When a block is being closed, the data streamer in the client waits for the > final ACK to be delivered. If an exception is received during this wait, the > close is retried. This assumption was invalidated by HDFS-15813, resulting > in permanent write failures in some close error cases involving slow nodes. > There are also less frequent cases of data loss. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16140) TestBootstrapAliasmap fails by BindException
[ https://issues.apache.org/jira/browse/HDFS-16140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16140: -- Hadoop Flags: Reviewed Target Version/s: 3.3.2, 3.4.0 (was: 3.4.0, 3.3.2) > TestBootstrapAliasmap fails by BindException > > > Key: HDFS-16140 > URL: https://issues.apache.org/jira/browse/HDFS-16140 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.4.0, 3.3.2 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 50m > Remaining Estimate: 0h > > TestBootstrapAliasmap fails if 50200 port is already in use. > https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3227/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt > {quote} > [ERROR] > testAliasmapBootstrap(org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapAliasmap) > Time elapsed: 0.472 s <<< ERROR! > java.net.BindException: Problem binding to [0.0.0.0:50200] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:914) > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:810) > at org.apache.hadoop.ipc.Server.bind(Server.java:642) > at org.apache.hadoop.ipc.Server$Listener.(Server.java:1301) > at org.apache.hadoop.ipc.Server.(Server.java:3199) > at org.apache.hadoop.ipc.RPC$Server.(RPC.java:1062) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server.(ProtobufRpcEngine2.java:464) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2.getServer(ProtobufRpcEngine2.java:371) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:853) > at > org.apache.hadoop.hdfs.server.aliasmap.InMemoryLevelDBAliasMapServer.start(InMemoryLevelDBAliasMapServer.java:98) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.startAliasMapServerIfNecessary(NameNode.java:801) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:761) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:1014) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:989) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1763) > at > org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1378) > at > org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:1147) > at > org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:1020) > at > org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:952) > at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:576) > at > org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:518) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapAliasmap.setup(TestBootstrapAliasmap.java:56) > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16139) Update BPServiceActor Scheduler's nextBlockReportTime atomically
[ https://issues.apache.org/jira/browse/HDFS-16139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16139: -- Component/s: datanode > Update BPServiceActor Scheduler's nextBlockReportTime atomically > > > Key: HDFS-16139 > URL: https://issues.apache.org/jira/browse/HDFS-16139 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Affects Versions: 3.4.0, 3.3.5 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.5 > > Time Spent: 0.5h > Remaining Estimate: 0h > > BPServiceActor#Scheduler's nextBlockReportTime is declared volatile and it > can be assigned/read by testing threads (through BPServiceActor#triggerXXX) > as well as by actor threads. Hence it is declared volatile but it is still > assigned non-atomically > e.g > {code:java} > if (factor != 0) { > nextBlockReportTime += factor * blockReportIntervalMs; > } else { > // If the difference between the present time and the scheduled > // time is very less, the factor can be 0, so in that case, we can > // ignore that negligible time, spent while sending the BRss and > // schedule the next BR after the blockReportInterval. > nextBlockReportTime += blockReportIntervalMs; > } > {code} > We should convert it to AtomicLong to take care of concurrent assignments > while making sure that it is assigned atomically. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
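A hedged, simplified sketch of the AtomicLong approach suggested in HDFS-16139 (not the actual BPServiceActor.Scheduler code): the whole read-compute-store of the next block report time becomes one atomic update, so concurrent test triggers and the actor thread cannot interleave mid-update.
{code:java}
import java.util.concurrent.atomic.AtomicLong;

class SchedulerSketch {
  private final AtomicLong nextBlockReportTime =
      new AtomicLong(System.currentTimeMillis());

  void scheduleNextBlockReport(long factor, long blockReportIntervalMs) {
    // Mirrors the branch quoted above: add factor * interval when factor != 0,
    // otherwise add a single interval; the addition itself is atomic.
    long delta = (factor != 0 ? factor : 1) * blockReportIntervalMs;
    nextBlockReportTime.getAndAdd(delta);
  }
}
{code}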
[jira] [Updated] (HDFS-16139) Update BPServiceActor Scheduler's nextBlockReportTime atomically
[ https://issues.apache.org/jira/browse/HDFS-16139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16139: -- Affects Version/s: 3.3.5 3.4.0 > Update BPServiceActor Scheduler's nextBlockReportTime atomically > > > Key: HDFS-16139 > URL: https://issues.apache.org/jira/browse/HDFS-16139 > Project: Hadoop HDFS > Issue Type: Task >Affects Versions: 3.4.0, 3.3.5 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.5 > > Time Spent: 0.5h > Remaining Estimate: 0h > > BPServiceActor#Scheduler's nextBlockReportTime is declared volatile and it > can be assigned/read by testing threads (through BPServiceActor#triggerXXX) > as well as by actor threads. Hence it is declared volatile but it is still > assigned non-atomically > e.g > {code:java} > if (factor != 0) { > nextBlockReportTime += factor * blockReportIntervalMs; > } else { > // If the difference between the present time and the scheduled > // time is very less, the factor can be 0, so in that case, we can > // ignore that negligible time, spent while sending the BRss and > // schedule the next BR after the blockReportInterval. > nextBlockReportTime += blockReportIntervalMs; > } > {code} > We should convert it to AtomicLong to take care of concurrent assignments > while making sure that it is assigned atomically. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16140) TestBootstrapAliasmap fails by BindException
[ https://issues.apache.org/jira/browse/HDFS-16140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16140: -- Affects Version/s: 3.3.2 3.4.0 > TestBootstrapAliasmap fails by BindException > > > Key: HDFS-16140 > URL: https://issues.apache.org/jira/browse/HDFS-16140 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.4.0, 3.3.2 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 50m > Remaining Estimate: 0h > > TestBootstrapAliasmap fails if 50200 port is already in use. > https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3227/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt > {quote} > [ERROR] > testAliasmapBootstrap(org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapAliasmap) > Time elapsed: 0.472 s <<< ERROR! > java.net.BindException: Problem binding to [0.0.0.0:50200] > java.net.BindException: Address already in use; For more details see: > http://wiki.apache.org/hadoop/BindException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:914) > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:810) > at org.apache.hadoop.ipc.Server.bind(Server.java:642) > at org.apache.hadoop.ipc.Server$Listener.(Server.java:1301) > at org.apache.hadoop.ipc.Server.(Server.java:3199) > at org.apache.hadoop.ipc.RPC$Server.(RPC.java:1062) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server.(ProtobufRpcEngine2.java:464) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2.getServer(ProtobufRpcEngine2.java:371) > at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:853) > at > org.apache.hadoop.hdfs.server.aliasmap.InMemoryLevelDBAliasMapServer.start(InMemoryLevelDBAliasMapServer.java:98) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.startAliasMapServerIfNecessary(NameNode.java:801) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:761) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:1014) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:989) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1763) > at > org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1378) > at > org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:1147) > at > org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:1020) > at > org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:952) > at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:576) > at > org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:518) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapAliasmap.setup(TestBootstrapAliasmap.java:56) > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16143) TestEditLogTailer#testStandbyTriggersLogRollsWhenTailInProgressEdits is flaky
[ https://issues.apache.org/jira/browse/HDFS-16143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16143: -- Affects Version/s: 3.4.0 > TestEditLogTailer#testStandbyTriggersLogRollsWhenTailInProgressEdits is flaky > - > > Key: HDFS-16143 > URL: https://issues.apache.org/jira/browse/HDFS-16143 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.4.0 >Reporter: Akira Ajisaka >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt > > Time Spent: 10h 50m > Remaining Estimate: 0h > > https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3229/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt > {quote} > [ERROR] > testStandbyTriggersLogRollsWhenTailInProgressEdits[0](org.apache.hadoop.hdfs.server.namenode.ha.TestEditLogTailer) > Time elapsed: 6.862 s <<< FAILURE! > java.lang.AssertionError > at org.junit.Assert.fail(Assert.java:87) > at org.junit.Assert.assertTrue(Assert.java:42) > at org.junit.Assert.assertTrue(Assert.java:53) > at > org.apache.hadoop.hdfs.server.namenode.ha.TestEditLogTailer.testStandbyTriggersLogRollsWhenTailInProgressEdits(TestEditLogTailer.java:444) > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16144) Revert HDFS-15372 (Files in snapshots no longer see attribute provider permissions)
[ https://issues.apache.org/jira/browse/HDFS-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16144: -- Hadoop Flags: Reviewed Target Version/s: 3.3.2, 3.4.0 (was: 3.4.0, 3.3.2) > Revert HDFS-15372 (Files in snapshots no longer see attribute provider > permissions) > --- > > Key: HDFS-16144 > URL: https://issues.apache.org/jira/browse/HDFS-16144 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.4.0, 3.3.2 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.4.0, 3.3.2 > > Attachments: HDFS-16144.001.patch, HDFS-16144.002.patch, > HDFS-16144.003.patch, HDFS-16144.004.patch > > > In HDFS-15372, I noted a change in behaviour between Hadoop 2 and Hadoop 3. > When a user accesses a file in a snapshot, if an attribute provider is > configured it would see the original file path (ie no .snapshot folder) in > Hadoop 2, but it would see the snapshot path in Hadoop 3. > HDFS-15372 changed this back, but I noted at the time it may make sense for > the provider to see the actual snapshot path instead. > Recently we discovered HDFS-16132, where the HDFS-15372 change does not work > correctly. At this stage I believe it is better to revert HDFS-15372, as the > fix to this issue is probably not trivial, and to allow providers to see the > actual path the user accessed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16144) Revert HDFS-15372 (Files in snapshots no longer see attribute provider permissions)
[ https://issues.apache.org/jira/browse/HDFS-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16144: -- Component/s: namenode > Revert HDFS-15372 (Files in snapshots no longer see attribute provider > permissions) > --- > > Key: HDFS-16144 > URL: https://issues.apache.org/jira/browse/HDFS-16144 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.4.0, 3.3.2 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.4.0, 3.3.2 > > Attachments: HDFS-16144.001.patch, HDFS-16144.002.patch, > HDFS-16144.003.patch, HDFS-16144.004.patch > > > In HDFS-15372, I noted a change in behaviour between Hadoop 2 and Hadoop 3. > When a user accesses a file in a snapshot, if an attribute provider is > configured, it would see the original file path (i.e. no .snapshot folder) in > Hadoop 2, but it would see the snapshot path in Hadoop 3. > HDFS-15372 changed this back, but I noted at the time it may make sense for > the provider to see the actual snapshot path instead. > Recently we discovered HDFS-16132, where the HDFS-15372 change does not work > correctly. At this stage I believe it is better to revert HDFS-15372, as the > fix to this issue is probably not trivial, and to allow providers to see the > actual path the user accessed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16157) Support configuring DNS record to get list of journal nodes.
[ https://issues.apache.org/jira/browse/HDFS-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16157: -- Hadoop Flags: Reviewed > Support configuring DNS record to get list of journal nodes. > > > Key: HDFS-16157 > URL: https://issues.apache.org/jira/browse/HDFS-16157 > Project: Hadoop HDFS > Issue Type: Improvement > Components: journal-node >Affects Versions: 3.4.0 >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We can use a DNS round-robin record to configure the list of journal nodes, so we > don't have to reconfigure everything when a journal node hostname is changed. For > example, in some containerized environments the hostnames of journal nodes can > change pretty often. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
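As a rough sketch of the idea (not the actual Hadoop patch), a single round-robin DNS record can be expanded into the full JournalNode list at resolution time. The DNS name below is hypothetical, and the qjournal:// URI shape is assumed for illustration.
{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class JournalNodeDnsExample {

  // Resolve a round-robin DNS record (one name, many A records) into a list
  // of host:port pairs, e.g. for building a qjournal:// shared-edits URI.
  static List<String> resolveJournalNodes(String dnsName, int port)
      throws UnknownHostException {
    List<String> addrs = new ArrayList<>();
    for (InetAddress addr : InetAddress.getAllByName(dnsName)) {
      addrs.add(addr.getHostAddress() + ":" + port);
    }
    return addrs;
  }

  public static void main(String[] args) throws UnknownHostException {
    // "journalnodes.example.com" is a made-up record with one entry per JN;
    // 8485 is the usual JournalNode RPC port.
    List<String> jns = resolveJournalNodes("journalnodes.example.com", 8485);
    // A qjournal URI lists all JNs separated by ';', followed by the journal id.
    String sharedEditsDir = "qjournal://" + String.join(";", jns) + "/mycluster";
    System.out.println(sharedEditsDir);
  }
}
{code}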
[jira] [Updated] (HDFS-16157) Support configuring DNS record to get list of journal nodes.
[ https://issues.apache.org/jira/browse/HDFS-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16157: -- Fix Version/s: 3.4.0 > Support configuring DNS record to get list of journal nodes. > > > Key: HDFS-16157 > URL: https://issues.apache.org/jira/browse/HDFS-16157 > Project: Hadoop HDFS > Issue Type: Improvement > Components: journal-node >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We can use a DNS round-robin record to configure the list of journal nodes, so we > don't have to reconfigure everything when a journal node hostname is changed. For > example, in some containerized environments the hostnames of journal nodes can > change pretty often. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16144) Revert HDFS-15372 (Files in snapshots no longer see attribute provider permissions)
[ https://issues.apache.org/jira/browse/HDFS-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16144: -- Affects Version/s: 3.3.2 3.4.0 > Revert HDFS-15372 (Files in snapshots no longer see attribute provider > permissions) > --- > > Key: HDFS-16144 > URL: https://issues.apache.org/jira/browse/HDFS-16144 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.4.0, 3.3.2 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.4.0, 3.3.2 > > Attachments: HDFS-16144.001.patch, HDFS-16144.002.patch, > HDFS-16144.003.patch, HDFS-16144.004.patch > > > In HDFS-15372, I noted a change in behaviour between Hadoop 2 and Hadoop 3. > When a user accesses a file in a snapshot, if an attribute provider is > configured, it would see the original file path (i.e. no .snapshot folder) in > Hadoop 2, but it would see the snapshot path in Hadoop 3. > HDFS-15372 changed this back, but I noted at the time it may make sense for > the provider to see the actual snapshot path instead. > Recently we discovered HDFS-16132, where the HDFS-15372 change does not work > correctly. At this stage I believe it is better to revert HDFS-15372, as the > fix to this issue is probably not trivial, and to allow providers to see the > actual path the user accessed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16157) Support configuring DNS record to get list of journal nodes.
[ https://issues.apache.org/jira/browse/HDFS-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16157: -- Affects Version/s: 3.4.0 > Support configuring DNS record to get list of journal nodes. > > > Key: HDFS-16157 > URL: https://issues.apache.org/jira/browse/HDFS-16157 > Project: Hadoop HDFS > Issue Type: Improvement > Components: journal-node >Affects Versions: 3.4.0 >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > We can use a DNS round-robin record to configure the list of journal nodes, so we > don't have to reconfigure everything when a journal node hostname is changed. For > example, in some containerized environments the hostnames of journal nodes can > change pretty often. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16184) De-flake TestBlockScanner#testSkipRecentAccessFile
[ https://issues.apache.org/jira/browse/HDFS-16184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16184: -- Affects Version/s: 3.3.2 3.4.0 > De-flake TestBlockScanner#testSkipRecentAccessFile > -- > > Key: HDFS-16184 > URL: https://issues.apache.org/jira/browse/HDFS-16184 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Test TestBlockScanner#testSkipRecentAccessFile is flaky: > > {code:java} > [ERROR] > testSkipRecentAccessFile(org.apache.hadoop.hdfs.server.datanode.TestBlockScanner) > Time elapsed: 3.936 s <<< FAILURE![ERROR] > testSkipRecentAccessFile(org.apache.hadoop.hdfs.server.datanode.TestBlockScanner) > Time elapsed: 3.936 s <<< FAILURE!java.lang.AssertionError: Scan nothing > for all files are accessed in last period. at > org.junit.Assert.fail(Assert.java:89) at > org.apache.hadoop.hdfs.server.datanode.TestBlockScanner.testSkipRecentAccessFile(TestBlockScanner.java:1015) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > {code} > e.g > [https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3235/37/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16171) De-flake testDecommissionStatus
[ https://issues.apache.org/jira/browse/HDFS-16171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-16171: -- Hadoop Flags: Reviewed Target Version/s: 3.3.2, 3.2.3, 2.10.2, 3.4.0 (was: 3.4.0, 2.10.2, 3.2.3, 3.3.2) > De-flake testDecommissionStatus > --- > > Key: HDFS-16171 > URL: https://issues.apache.org/jira/browse/HDFS-16171 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: test >Affects Versions: 3.4.0, 3.3.2 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.2, 3.2.4 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > testDecommissionStatus keeps failing intermittently. > {code:java} > [ERROR] > testDecommissionStatus(org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor) > Time elapsed: 3.299 s <<< FAILURE! > java.lang.AssertionError: Unexpected num under-replicated blocks expected:<4> > but was:<3> > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.failNotEquals(Assert.java:835) > at org.junit.Assert.assertEquals(Assert.java:647) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:169) > at > org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor.testDecommissionStatus(TestDecommissioningStatusWithBackoffMonitor.java:136) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
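The failing assertion checks an under-replicated block count that the NameNode only reaches asynchronously, so a common de-flaking approach is to poll until the expected state appears instead of asserting once; Hadoop's test utilities provide a waitFor helper in this spirit. Below is a self-contained sketch of that pattern; the counter it polls is a stand-in, not the real test's metric.
{code:java}
import java.util.concurrent.TimeoutException;
import java.util.function.IntSupplier;

public class WaitForCondition {

  // Poll until the supplied counter reaches the expected value, or fail after
  // a timeout. This mirrors the retry-style helpers used to de-flake tests
  // whose state (e.g. under-replicated block counts) converges asynchronously.
  static void waitForCount(IntSupplier counter, int expected,
      long intervalMillis, long timeoutMillis)
      throws InterruptedException, TimeoutException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (System.currentTimeMillis() < deadline) {
      if (counter.getAsInt() == expected) {
        return;
      }
      Thread.sleep(intervalMillis);
    }
    throw new TimeoutException(
        "count did not reach " + expected + " within " + timeoutMillis + " ms");
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for a real metric such as the number of under-replicated blocks.
    final long start = System.currentTimeMillis();
    IntSupplier underReplicated =
        () -> System.currentTimeMillis() - start > 500 ? 4 : 3;
    waitForCount(underReplicated, 4, 100, 10_000);
    System.out.println("reached expected count");
  }
}
{code}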