[jira] [Updated] (HDFS-15555) RBF: Refresh cacheNS when SocketException occurs

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15555:
--
Hadoop Flags: Reviewed

> RBF: Refresh cacheNS when SocketException occurs
> 
>
> Key: HDFS-15555
> URL: https://issues.apache.org/jira/browse/HDFS-15555
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Affects Versions: 3.3.1, 3.4.0
> Environment: HDFS 3.3.0, Java 11
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Problem:
> When the active NameNode is restarted and is loading its fsimage, DFSRouters
> slow down significantly.
> Investigation:
> While the restarted active NameNode is loading its fsimage, RouterRpcClient
> receives a SocketException. Because
> RouterRpcClient#isUnavailableException(IOException) returns false when its
> argument is a SocketException, MembershipNamenodeResolver#cacheNS is not
> refreshed. As a result, the order of the NameNodes returned by
> MembershipNamenodeResolver#getNamenodesForNameserviceId(String) is unchanged,
> and the active NameNode is still returned first. RouterRpcClient therefore
> keeps trying to connect to the NameNode that is still loading its fsimage.
> Once the fsimage has been loaded, the NameNode throws StandbyException. That
> exception is one of the "unavailable" exceptions, so cacheNS is then
> refreshed.
> Workaround:
> Instead of restarting the NameNode, stop it and wait 1 minute before starting
> it again.
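The following is a minimal sketch of the classification change that the investigation points to. It uses a simplified model. `UnavailableCheckSketch`, the `StandbyLikeException` stand-in and the `demote` helper are hypothetical names. They are not the actual RouterRpcClient or MembershipNamenodeResolver code, and the sketch does not reproduce the real list of exception types that RouterRpcClient checks.

The idea it shows: once `SocketException` counts as an "unavailable" exception, the cached NameNode order can be refreshed. `SocketException` also covers its subclass `ConnectException`. After the refresh, the next attempt goes to another NameNode instead of the one still loading its fsimage.

```java
import java.io.IOException;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.List;

public class UnavailableCheckSketch {

  /** Hypothetical stand-in for Hadoop's StandbyException. */
  static class StandbyLikeException extends IOException {
    StandbyLikeException(String msg) {
      super(msg);
    }
  }

  /**
   * Returns true if the NameNode cannot serve the call right now and another
   * NameNode should be tried. SocketException also matches ConnectException.
   */
  static boolean isUnavailableException(IOException ioe) {
    return ioe instanceof SocketException
        || ioe instanceof StandbyLikeException;
  }

  /** Hypothetical cache refresh: moves the failed NameNode to the end. */
  static List<String> demote(List<String> cached, String failed) {
    List<String> reordered = new ArrayList<>(cached);
    if (reordered.remove(failed)) {
      reordered.add(failed);
    }
    return reordered;
  }

  public static void main(String[] args) {
    List<String> namenodes = List.of("nn1", "nn2");
    IOException e = new SocketException("Connection reset");
    if (isUnavailableException(e)) {
      namenodes = demote(namenodes, "nn1");
    }
    System.out.println(namenodes); // [nn2, nn1]
  }
}
```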



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15555) RBF: Refresh cacheNS when SocketException occurs

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15555:
--
Affects Version/s: 3.3.1
   3.4.0


[jira] [Updated] (HDFS-15574) Remove unnecessary sort of block list in DirectoryScanner

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15574:
--
Component/s: datanode

> Remove unnecessary sort of block list in DirectoryScanner
> -
>
> Key: HDFS-15574
> URL: https://issues.apache.org/jira/browse/HDFS-15574
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3
>
> Attachments: HDFS-15574.001.patch, HDFS-15574.002.patch, 
> HDFS-15574.003.patch, HDFS-15574.branch-3.2.001.patch, 
> HDFS-15574.branch-3.2.002.patch, HDFS-15574.branch-3.3.001.patch, 
> HDFS-15574.branch-3.3.002.patch
>
>
> The following lines in DirectoryScanner#scan() obtain a snapshot of the
> finalized blocks from memory and then sort it, all while holding the DN lock.
> However, the blocks are already stored in a sorted structure (FoldedTreeSet),
> so the sort should be unnecessary.
> {code}
>   final List bl = dataset.getFinalizedBlocks(bpid);
>   Collections.sort(bl); // Sort based on blockId
> {code}
> This Jira removes the sort and renames getFinalizedBlocks to
> getSortedFinalizedBlocks to make the intent of the method clearer.
> It also adds a test, in case the underlying block structure is ever
> changed to something unsorted.
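The following small sketch shows why the sort is redundant. It uses a `TreeSet<Long>` of block IDs as a hypothetical stand-in for the FoldedTreeSet of replicas: copying a sorted set preserves its ascending iteration order. `isSortedAscending` is a hypothetical helper in the spirit of the added test, not the test itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class SortedSnapshotSketch {

  /**
   * Takes a snapshot of an already-sorted set. A TreeSet iterates in
   * ascending order, so the copy is sorted without calling Collections.sort.
   */
  static List<Long> sortedSnapshot(TreeSet<Long> blockIds) {
    return new ArrayList<>(blockIds);
  }

  /** Guard in the spirit of the new test: is the snapshot still ascending? */
  static boolean isSortedAscending(List<Long> ids) {
    for (int i = 1; i < ids.size(); i++) {
      if (ids.get(i - 1).compareTo(ids.get(i)) > 0) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Example block IDs inserted out of order.
    TreeSet<Long> blockIds =
        new TreeSet<>(List.of(1073741830L, 1073741825L, 1073741827L));
    List<Long> snapshot = sortedSnapshot(blockIds);
    // [1073741825, 1073741827, 1073741830] sorted=true
    System.out.println(snapshot + " sorted=" + isSortedAscending(snapshot));
  }
}
```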




[jira] [Updated] (HDFS-15574) Remove unnecessary sort of block list in DirectoryScanner

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15574:
--
Hadoop Flags: Reviewed


[jira] [Updated] (HDFS-15573) Only log warning if considerLoad and considerStorageType are both true

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15573:
--
Component/s: hdfs

> Only log warning if considerLoad and considerStorageType are both true
> --
>
> Key: HDFS-15573
> URL: https://issues.apache.org/jira/browse/HDFS-15573
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15573.001.patch
>
>
> When we implemented HDFS-15255, we added a log message to warn if both
> dfs.namenode.read.considerLoad and dfs.namenode.read.considerStorageType were
> set to true, as they cannot be used together.
> However, we failed to wrap the log message in an if statement, so it is
> always printed, even when the two settings do not conflict.
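The following is a minimal sketch of the intended guard. Hadoop logs through SLF4J. This self-contained version uses `java.util.logging` instead, and the class name and warning text are hypothetical.

```java
import java.util.logging.Logger;

public class ReadConfigCheckSketch {
  private static final Logger LOG =
      Logger.getLogger(ReadConfigCheckSketch.class.getName());

  /**
   * Logs the conflict warning only when both read options are enabled and
   * reports whether it did so.
   */
  static boolean warnIfConflicting(boolean considerLoad,
                                   boolean considerStorageType) {
    if (considerLoad && considerStorageType) {
      LOG.warning("dfs.namenode.read.considerLoad and "
          + "dfs.namenode.read.considerStorageType cannot be used together");
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    warnIfConflicting(true, false); // no warning
    warnIfConflicting(true, true);  // logs the warning
  }
}
```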




[jira] [Updated] (HDFS-15607) Create trash dir when allowing snapshottable dir

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15607:
--
Hadoop Flags: Reviewed

> Create trash dir when allowing snapshottable dir
> 
>
> Key: HDFS-15607
> URL: https://issues.apache.org/jira/browse/HDFS-15607
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> In {{TrashPolicyDefault}}, the {{.Trash}} directory is created with
> permission 700 (and without the sticky bit) by the first user who moves a
> file to the trash. This is a problem when the trash root is shared, as it is
> for snapshottable directories: other users may not have permission to move
> files into that trash.
> This only affects users when trash is enabled inside snapshottable
> directories ({{dfs.namenode.snapshot.trashroot.enabled}} set to true) and
> the user moving files to the trash does not have admin permissions.
> Solution: create the {{.Trash}} directory with permission 777 and the sticky
> bit enabled (a solution similar to HDFS-10324).
> Some corner cases also need to be handled:
> 1. Even when the snapshottable directory trash root config is not enabled
> ({{dfs.namenode.snapshot.trashroot.enabled}} set to false), should we create
> the {{.Trash}} directory anyway? Or should we ask the admin to provision
> trash manually after enabling {{dfs.namenode.snapshot.trashroot.enabled}} on
> an existing cluster?
> - If the cluster has just been upgraded, trash needs to be provisioned
> manually anyway.
> 2. Immediately disallowing trash should not fail. Should we just remove the
> {{.Trash}} directory when disallowing snapshot on a directory, if it is empty?
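The following sketch shows why 777 plus the sticky bit suits a shared trash root. That combination is mode 01777, shown as `rwxrwxrwt`. The sketch models the POSIX-style sticky-bit semantics that HDFS permissions follow for directories:

* Anyone can add entries to the directory.
* Only the superuser, the directory owner or the entry owner can delete or rename an entry.

Class and method names are hypothetical. Hadoop code would express the mode through its own `FsPermission` class rather than raw ints.

```java
public class StickyTrashSketch {
  /** Sticky bit in the usual octal permission encoding. */
  static final int STICKY_BIT = 01000;
  /** 777 plus the sticky bit, i.e. rwxrwxrwt. */
  static final int SHARED_TRASH_MODE = STICKY_BIT | 0777;

  /** True if users outside the owner and group can create entries (o+wx). */
  static boolean othersCanWrite(int dirMode) {
    return (dirMode & 03) == 03;
  }

  /**
   * Whether {@code user} may delete or rename an entry owned by
   * {@code entryOwner} in a directory owned by {@code dirOwner}, assuming the
   * user already has write and execute permission on that directory.
   */
  static boolean canRemoveEntry(int dirMode, String dirOwner,
                                String entryOwner, String user,
                                boolean isSuperuser) {
    if (isSuperuser || (dirMode & STICKY_BIT) == 0) {
      return true;
    }
    return user.equals(dirOwner) || user.equals(entryOwner);
  }

  public static void main(String[] args) {
    System.out.println(Integer.toOctalString(SHARED_TRASH_MODE)); // 1777
    System.out.println(othersCanWrite(0700));                     // false
    System.out.println(canRemoveEntry(
        SHARED_TRASH_MODE, "hdfs", "alice", "bob", false));       // false
  }
}
```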




[jira] [Updated] (HDFS-15596) ViewHDFS#create(f, permission, cflags, bufferSize, replication, blockSize, progress, checksumOpt) should not be restricted to DFS only.

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15596:
--
Affects Version/s: 3.4.0

> ViewHDFS#create(f, permission, cflags, bufferSize, replication, blockSize, 
> progress, checksumOpt) should not be restricted to DFS only.
> ---
>
> Key: HDFS-15596
> URL: https://issues.apache.org/jira/browse/HDFS-15596
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.4.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The ViewHDFS#create(f, permission, cflags, bufferSize, replication,
> blockSize, progress, checksumOpt) API is already available in FileSystem. It
> calls the other overloaded create APIs and eventually reaches ViewFileSystem,
> so this case also works with a regular ViewFileSystem. In ViewHDFS, however,
> we restricted this API to DFS only, which causes distcp to fail when the
> target is not HDFS, because distcp uses this API.
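The following toy model illustrates the delegation problem. It uses hypothetical classes, not the real FileSystem or ViewFileSystem APIs. A view resolves a path to its mount target and forwards `create`. With `dfsOnly` set, the view rejects non-HDFS targets the way the restricted ViewHDFS overload did, which is what breaks a copy to a non-HDFS target.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ViewCreateSketch {
  /** Hypothetical minimal filesystem interface with a single create call. */
  interface SimpleFs {
    String create(String path);
  }

  static class HdfsLike implements SimpleFs {
    public String create(String path) {
      return "hdfs:" + path;
    }
  }

  static class S3Like implements SimpleFs {
    public String create(String path) {
      return "s3:" + path;
    }
  }

  /** Resolves a path to its mounted target and delegates create to it. */
  static class ViewLike implements SimpleFs {
    private final Map<String, SimpleFs> mounts = new LinkedHashMap<>();
    private final boolean dfsOnly;

    ViewLike(boolean dfsOnly) {
      this.dfsOnly = dfsOnly;
    }

    void mount(String prefix, SimpleFs fs) {
      mounts.put(prefix, fs);
    }

    public String create(String path) {
      // First matching prefix wins; mount points are assumed not to overlap.
      for (Map.Entry<String, SimpleFs> e : mounts.entrySet()) {
        if (path.startsWith(e.getKey() + "/")) {
          SimpleFs target = e.getValue();
          if (dfsOnly && !(target instanceof HdfsLike)) {
            // The restriction described in the issue: non-HDFS targets fail.
            throw new UnsupportedOperationException("DFS only: " + path);
          }
          return target.create(path.substring(e.getKey().length()));
        }
      }
      throw new IllegalArgumentException("No mount point for " + path);
    }
  }

  public static void main(String[] args) {
    ViewLike view = new ViewLike(false);
    view.mount("/data", new HdfsLike());
    view.mount("/backup", new S3Like());
    System.out.println(view.create("/backup/part-0")); // s3:/part-0
  }
}
```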




[jira] [Updated] (HDFS-15596) ViewHDFS#create(f, permission, cflags, bufferSize, replication, blockSize, progress, checksumOpt) should not be restricted to DFS only.

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15596:
--
Component/s: hdfs-client


[jira] [Updated] (HDFS-15580) [JDK 12] DFSTestUtil#addDataNodeLayoutVersion fails

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15580:
--
Hadoop Flags: Reviewed

> [JDK 12] DFSTestUtil#addDataNodeLayoutVersion fails
> ---
>
> Key: HDFS-15580
> URL: https://issues.apache.org/jira/browse/HDFS-15580
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> DFSTestUtil#addDataNodeLayoutVersion uses reflection to update final
> variables; however, this is not allowed in Java 12+. Please see
> https://bugs.openjdk.java.net/browse/JDK-8210522 for details.
> {noformat}
> [ERROR] 
> testWithLayoutChangeAndFinalize(org.apache.hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade)
>   Time elapsed: 11.159 s  <<< ERROR!
> java.lang.NoSuchFieldException: modifiers
>   at java.base/java.lang.Class.getDeclaredField(Class.java:2569)
>   at 
> org.apache.hadoop.hdfs.DFSTestUtil.addDataNodeLayoutVersion(DFSTestUtil.java:1961)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeRollingUpgrade.testWithLayoutChangeAndFinalize(TestDataNodeRollingUpgrade.java:364)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:564)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at java.base/java.lang.Thread.run(Thread.java:832)
> {noformat}
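The following small probe illustrates the JDK change behind the failure. The usual reflection trick for changing a `static final` value was to clear the FINAL bit through `Field`'s private `modifiers` field. On JDK 12+, reflection filters that field out, so `getDeclaredField("modifiers")` throws the `NoSuchFieldException` seen in the trace above. The class name is hypothetical, and the probe needs JDK 10+ for `Runtime.version().feature()`. It does not show how the issue was fixed.

```java
import java.lang.reflect.Field;

public class ModifiersFieldProbe {

  /**
   * Returns true if Field's private "modifiers" field can still be looked up
   * reflectively. JDK 12+ hides it (JDK-8210522), so this returns false there.
   */
  static boolean canAccessModifiersField() {
    try {
      Field.class.getDeclaredField("modifiers");
      return true;
    } catch (NoSuchFieldException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    int feature = Runtime.version().feature();
    System.out.println("Java " + feature
        + ": modifiers field visible = " + canAccessModifiersField());
  }
}
```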




[jira] [Updated] (HDFS-15580) [JDK 12] DFSTestUtil#addDataNodeLayoutVersion fails

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15580:
--
Affects Version/s: 3.4.0


[jira] [Updated] (HDFS-15608) Rename variable DistCp#CLEANUP

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15608:
--
Hadoop Flags: Reviewed

> Rename variable DistCp#CLEANUP
> --
>
> Key: HDFS-15608
> URL: https://issues.apache.org/jira/browse/HDFS-15608
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 3.3.0
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15608.001.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The Cleanup variable defined in the DistCp#main() method is named in upper
> case, like a constant:
> public static void main(String argv[]) {
>  ...
>  Cleanup CLEANUP = new Cleanup(distCp);
>  ...
>  }
> CLEANUP should be renamed to follow local variable naming conventions, e.g.
> cleanup.




[jira] [Updated] (HDFS-15613) RBF: Router FSCK fails after HDFS-14442

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15613:
--
Hadoop Flags: Reviewed

> RBF: Router FSCK fails after HDFS-14442
> ---
>
> Key: HDFS-15613
> URL: https://issues.apache.org/jira/browse/HDFS-15613
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Affects Versions: 3.3.0
> Environment: HA is enabled
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Since HDFS-14442, fsck uses the getHAServiceState operation to detect the
> active NameNode; however, DFSRouter does not support this operation.
> {noformat}
> 20/10/05 16:41:30 DEBUG hdfs.HAUtil: Error while connecting to namenode
> org.apache.hadoop.ipc.RemoteException(java.lang.UnsupportedOperationException):
>  Operation "getHAServiceState" is not supported
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.checkOperation(RouterRpcServer.java:488)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.getHAServiceState(RouterClientProtocol.java:1773)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getHAServiceState(RouterRpcServer.java:1333)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getHAServiceState(ClientNamenodeProtocolServerSideTranslatorPB.java:2011)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:532)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1020)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2952)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1562)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1508)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1405)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:234)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:119)
>   at com.sun.proxy.$Proxy12.getHAServiceState(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getHAServiceState(ClientNamenodeProtocolTranslatorPB.java:2055)
>   at org.apache.hadoop.hdfs.HAUtil.getAddressOfActive(HAUtil.java:281)
>   at 
> org.apache.hadoop.hdfs.tools.DFSck.getCurrentNamenodeAddress(DFSck.java:271)
>   at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:339)
>   at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:75)
>   at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:164)
>   at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:161)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
>   at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:160)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>   at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:409)
> {noformat}




[jira] [Updated] (HDFS-15621) Datanode DirectoryScanner uses excessive memory

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15621:
--
Hadoop Flags: Reviewed

> Datanode DirectoryScanner uses excessive memory
> ---
>
> Key: HDFS-15621
> URL: https://issues.apache.org/jira/browse/HDFS-15621
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
> Attachments: Screenshot 2020-10-09 at 14.11.36.png, Screenshot 
> 2020-10-09 at 15.20.56.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We generally work to a rule of thumb of 1GB of heap on a DataNode per 1M
> blocks. For nodes with a lot of blocks, this can mean a lot of heap.
> We recently captured a heap dump of a DN with about 22M blocks and found that
> only about 1.5GB was occupied by the ReplicaMap. Another 9GB of the heap was
> taken by the DirectoryScanner ScanInfo objects, and most of this memory was
> allocated to strings.
> Checking the strings in question, we can see two strings per ScanInfo,
> looking like:
> {code}
> /current/BP-671271071-10.163.205.13-1552020401842/current/finalized/subdir28/subdir17/blk_1180438785
> _106716708.meta
> {code}
> I will upload a screenshot from MAT showing this.
> For the first string especially, the part
> "/current/BP-671271071-10.163.205.13-1552020401842/current/finalized/" will
> be the same for every block in the block pool, as the scanner is only
> concerned with finalized blocks.
> We can probably also store just the subdir indexes "28" and "17" rather than
> "subdir28/subdir17", and then construct the path when it is requested via the
> getter.
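The following sketch shows the proposed layout. It uses a hypothetical `CompactScanInfoSketch` class rather than the real ScanInfo, and the committed patch may differ. The sketch makes three changes:

* The shared finalized-directory prefix is one String reused by every entry.
* The subdir indexes are stored as ints.
* Both paths are rebuilt in the getters.

`subdirsFor` assumes the 32x32 block-ID-based directory layout, which to our understanding is what `DatanodeUtil#idToBlockDir` computes. For block 1180438785 it yields 28 and 17, matching the path above.

```java
public class CompactScanInfoSketch {
  private final String finalizedDir; // one shared String per block pool
  private final int subdir1;
  private final int subdir2;
  private final long blockId;
  private final long genStamp;

  CompactScanInfoSketch(String finalizedDir, int subdir1, int subdir2,
                        long blockId, long genStamp) {
    this.finalizedDir = finalizedDir;
    this.subdir1 = subdir1;
    this.subdir2 = subdir2;
    this.blockId = blockId;
    this.genStamp = genStamp;
  }

  /** Derives both subdir indexes from the block ID (assumed 32x32 layout). */
  static int[] subdirsFor(long blockId) {
    return new int[] {
        (int) ((blockId >> 16) & 0x1F), (int) ((blockId >> 8) & 0x1F)};
  }

  /** Builds the block file path on demand instead of storing it. */
  String getBlockPath() {
    return finalizedDir + "/subdir" + subdir1 + "/subdir" + subdir2
        + "/blk_" + blockId;
  }

  /** Builds the meta file path: block path + "_" + genstamp + ".meta". */
  String getMetaPath() {
    return getBlockPath() + "_" + genStamp + ".meta";
  }

  public static void main(String[] args) {
    String shared = "/current/BP-671271071-10.163.205.13-1552020401842"
        + "/current/finalized";
    long blockId = 1180438785L;
    int[] dirs = subdirsFor(blockId);
    CompactScanInfoSketch info = new CompactScanInfoSketch(
        shared, dirs[0], dirs[1], blockId, 106716708L);
    System.out.println(info.getMetaPath());
  }
}
```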




[jira] [Updated] (HDFS-15641) DataNode could meet deadlock if invoke refreshNameNode

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15641:
--
Component/s: datanode

> DataNode could meet deadlock if invoke refreshNameNode
> --
>
> Key: HDFS-15641
> URL: https://issues.apache.org/jira/browse/HDFS-15641
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.2.0
>Reporter: Hongbing Wang
>Assignee: Hongbing Wang
>Priority: Critical
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.2.3
>
> Attachments: HDFS-15641.001.patch, HDFS-15641.002.patch, 
> HDFS-15641.003.patch, deadlock.png, deadlock_fixed.png, jstack.log
>
>
> A DataNode can deadlock when `hdfs dfsadmin -refreshNamenodes
> hostname:50020` is invoked to register a new namespace in a federated
> environment.
> The jstack output is attached as jstack.log.
> The specific sequence of events is shown in the figure deadlock.png.




[jira] [Updated] (HDFS-15620) RBF: Fix test failures after HADOOP-17281

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15620:
--
Affects Version/s: 3.3.1
   3.4.0

> RBF: Fix test failures after HADOOP-17281
> -
>
> Key: HDFS-15620
> URL: https://issues.apache.org/jira/browse/HDFS-15620
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf, test
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> HADOOP-17281 added FileSystem.listStatusIterator API and added its contract 
> test cases. In RBF, the following tests are affected and they are now failing:
> * hadoop.fs.contract.router.TestRouterHDFSContractGetFileStatus
> * hadoop.fs.contract.router.TestRouterHDFSContractRootDirectory
> * hadoop.fs.contract.router.TestRouterHDFSContractGetFileStatusSecure
> * hadoop.fs.contract.router.web.TestRouterWebHDFSContractRootDirectory
> * hadoop.fs.contract.router.TestRouterHDFSContractRootDirectorySecure




[jira] [Updated] (HDFS-15641) DataNode could meet deadlock if invoke refreshNameNode

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15641:
--
Hadoop Flags: Reviewed
Target Version/s: 3.2.3, 3.3.1, 3.4.0  (was: 3.3.1, 3.4.0, 3.2.3)


[jira] [Updated] (HDFS-15657) RBF: TestRouter#testNamenodeHeartBeatEnableDefault fails by BindException

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15657:
--
Hadoop Flags: Reviewed

> RBF: TestRouter#testNamenodeHeartBeatEnableDefault fails by BindException
> -
>
> Key: HDFS-15657
> URL: https://issues.apache.org/jira/browse/HDFS-15657
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf, test
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
> Attachments: patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java11-linux-x86_64/40/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
> {noformat}
> [ERROR] Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 4.431 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.federation.router.TestRouter
> [ERROR] testNamenodeHeartBeatEnableDefault(org.apache.hadoop.hdfs.server.federation.router.TestRouter)  Time elapsed: 1.04 s  <<< ERROR!
> org.apache.hadoop.service.ServiceStateException: java.net.BindException: Problem binding to [0.0.0.0:] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
>   at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>   at org.apache.hadoop.service.AbstractService.init(AbstractService.java:174)
>   at org.apache.hadoop.hdfs.server.federation.router.TestRouter.checkNamenodeHeartBeatEnableDefault(TestRouter.java:281)
>   at org.apache.hadoop.hdfs.server.federation.router.TestRouter.testNamenodeHeartBeatEnableDefault(TestRouter.java:267)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> Caused by: java.net.BindException: Problem binding to [0.0.0.0:] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
>   at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> 
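The trace above reports `java.net.BindException: Address already in use`: a service initialized from `TestRouter#checkNamenodeHeartBeatEnableDefault` tried to listen on a port that was already held. The issue text does not describe the fix. As a general illustration only, one common way for tests to avoid fixed-port collisions is to bind to port 0 so the operating system assigns a free ephemeral port. The sketch below uses only `java.net.ServerSocket`; the class and method names are hypothetical and are not Hadoop helpers.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper illustrating ephemeral-port binding; not Hadoop code.
public class EphemeralPortDemo {

    /** Binds a listener on the loopback address; port 0 lets the OS pick a free port. */
    public static ServerSocket bindEphemeral() throws IOException {
        ServerSocket socket = new ServerSocket();
        socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
        return socket;
    }

    /** Returns true if a listener could be bound to the given fixed port, false on BindException. */
    public static boolean tryBind(int port) throws IOException {
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
            return true;
        } catch (BindException e) {
            // Same failure class as in the report: "Address already in use".
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket first = bindEphemeral(); ServerSocket second = bindEphemeral()) {
            // Each listener gets its own OS-assigned port, so the two never collide.
            System.out.println("first=" + first.getLocalPort() + " second=" + second.getLocalPort());
            // Binding again to a port that is already listening fails, like a hard-coded test port would.
            System.out.println("rebind first port ok? " + tryBind(first.getLocalPort()));
        }
    }
}
```

In a real test the assigned port would be read back from the bound socket (or from the service) rather than taken from configuration.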

[jira] [Updated] (HDFS-15657) RBF: TestRouter#testNamenodeHeartBeatEnableDefault fails by BindException

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15657:
--
Affects Version/s: 3.3.1
   3.4.0

> RBF: TestRouter#testNamenodeHeartBeatEnableDefault fails by BindException
> -
>
> Key: HDFS-15657
> URL: https://issues.apache.org/jira/browse/HDFS-15657
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf, test
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
> Attachments: patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>

[jira] [Updated] (HDFS-15685) [JDK 14] TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS fails

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15685:
--
Hadoop Flags: Reviewed

> [JDK 14] TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS 
> fails
> 
>
> Key: HDFS-15685
> URL: https://issues.apache.org/jira/browse/HDFS-15685
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS fails after 
> [JDK-8225499|https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8225499].
>  
> {noformat}
> [ERROR] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 2.115 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider
> [ERROR] testResolveDomainNameUsingDNS(org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider)  Time elapsed: 0.964 s  <<< FAILURE!
> java.lang.AssertionError: nn1 wasn't returned: {host02.test/:8020=25, host01.test/:8020=25}
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider.testResolveDomainNameUsingDNS(TestConfiguredFailoverProxyProvider.java:295)
>   at org.apache.hadoop.hdfs.server.namenode.ha.TestConfiguredFailoverProxyProvider.testResolveDomainNameUsingDNS(TestConfiguredFailoverProxyProvider.java:320)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
>   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:564)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {noformat}
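As far as we understand JDK-8225499, it changed the text produced by `InetSocketAddress.toString()` in JDK 14 (for example, unresolved addresses now print with a `/<unresolved>` marker). The report does not show how the test was fixed. Assuming the test's expectations depend on that string form, a version-independent alternative is to build lookup keys from `getHostString()` and `getPort()`. The helper below is a hypothetical sketch, not the HDFS-15685 patch.

```java
import java.net.InetSocketAddress;

// Hypothetical helper; not the code from HDFS-15685.
public class AddressKeys {

    /**
     * Builds a "host:port" key from the parts of an address instead of relying on
     * InetSocketAddress.toString(), whose format differs between JDK releases.
     */
    public static String hostPortKey(InetSocketAddress addr) {
        // getHostString() returns the hostname or literal without a reverse DNS lookup.
        return addr.getHostString() + ":" + addr.getPort();
    }

    public static void main(String[] args) {
        InetSocketAddress unresolved = InetSocketAddress.createUnresolved("host01.test", 8020);
        System.out.println(unresolved);              // format depends on the JDK version
        System.out.println(hostPortKey(unresolved)); // host01.test:8020 on any JDK
    }
}
```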






[jira] [Updated] (HDFS-15680) Disable Broken Azure Junits

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15680:
--
Affects Version/s: 3.3.1

> Disable Broken Azure Junits
> ---
>
> Key: HDFS-15680
> URL: https://issues.apache.org/jira/browse/HDFS-15680
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.1
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There are 6 test classes that have been failing on Yetus for several months.
> They account for more than 41 failing tests, which makes reviewing every Yetus
> report a pain in the neck. Another goal is to save resources and avoid
> tying up ports, memory, and CPU.
> Over the last month, there was some effort to bring Yetus back to a
> stable state. However, there has been no progress in addressing the Azure failures.
> Generally, I do not like to disable failing tests, but in this specific
> case, I do not think it makes any sense to have 41 failing tests from
> one module for several months. Whenever someone finds those tests
> useful, they can re-enable them on Yetus *_After_* the tests are
> fixed.
> Following a PR, I have to verify that my patch does not cause any failures
> (including changed error messages in existing tests). A thorough review takes
> a considerable amount of time browsing the nightly builds and GitHub reports.
> So, please consider how much time has been spent reviewing those stack traces
> over the last months.
> Finally, this is one of the reasons developers tend to ignore the reports:
> reviewing them takes too much time, so by default the errors are
> considered irrelevant.
> CC: [~aajisaka], [~elgoiri], [~weichiu], [~ayushtkn]
> {code:bash}
>   hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked 
>hadoop.fs.azure.TestNativeAzureFileSystemMocked 
>hadoop.fs.azure.TestBlobMetadata 
>hadoop.fs.azure.TestNativeAzureFileSystemConcurrency 
>hadoop.fs.azure.TestNativeAzureFileSystemFileNameCheck 
>hadoop.fs.azure.TestNativeAzureFileSystemContractMocked 
>hadoop.fs.azure.TestWasbFsck 
>hadoop.fs.azure.TestOutOfBandAzureBlobOperations 
> {code}
> {code:bash}
> org.apache.hadoop.fs.azure.TestBlobMetadata.testFolderMetadata
> org.apache.hadoop.fs.azure.TestBlobMetadata.testFirstContainerVersionMetadata
> org.apache.hadoop.fs.azure.TestBlobMetadata.testPermissionMetadata
> org.apache.hadoop.fs.azure.TestBlobMetadata.testOldPermissionMetadata
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrency.testNoTempBlobsVisible
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrency.testLinkBlobs
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testListStatusRootDir
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testRenameDirectoryMoveToExistingDirectory
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testListStatus
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testRenameDirectoryAsExistingDirectory
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testRenameToDirWithSamePrefixAllowed
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testLSRootDir
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked.testDeleteRecursively
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemFileNameCheck.testWasbFsck
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testChineseCharactersFolderRename
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolderInFolderListingWithZeroByteRenameMetadata
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolderInFolderListing
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testUriEncoding
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testDeepFileCreation
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testListDirectory
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolderRenameInProgress
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRenameFolder
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRenameImplicitFolder
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRedoRenameFolder
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testStoreDeleteFolder
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked.testRename
> org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked.testListStatus
> 

[jira] [Updated] (HDFS-15684) EC: Call recoverLease on DFSStripedOutputStream close exception

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15684:
--
Affects Version/s: 3.4.0

> EC: Call recoverLease on DFSStripedOutputStream close exception
> ---
>
> Key: HDFS-15684
> URL: https://issues.apache.org/jira/browse/HDFS-15684
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsclient, ec
>Affects Versions: 3.4.0
>Reporter: Hongbing Wang
>Assignee: Hongbing Wang
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15684.001.patch, HDFS-15684.002.patch, 
> HDFS-15684.003.patch
>
>
> -HDFS-14694- added a feature that calls the recoverLease operation automatically
> when DFSOutputStream close encounters an exception. When we wanted to apply this
> feature to our cluster, we found that it does not support EC files.
> I think this feature should take effect for both replicated files and EC files.
> This Jira proposes making it effective for EC files as well.
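The behavior HDFS-14694 introduced, and which this issue extends to EC files, can be sketched as a close wrapper: if `close()` fails, ask the file system to recover the lease so the file is not left open under a failed writer, then rethrow the original error. `LeaseRecoverer` and `closeWithLeaseRecovery` are hypothetical stand-ins; in a real client the recovery call would be something like `DistributedFileSystem#recoverLease(Path)`, and the actual patch changes the DFSClient output stream code rather than adding a wrapper like this.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical sketch; names are illustrative, not HDFS APIs.
public class CloseWithRecovery {

    /** Stand-in for a client call such as DistributedFileSystem#recoverLease(Path). */
    public interface LeaseRecoverer {
        boolean recoverLease(String path) throws IOException;
    }

    /**
     * Closes the stream; if close() throws, requests lease recovery on the file
     * and then rethrows the original exception so the caller still sees the failure.
     */
    public static void closeWithLeaseRecovery(Closeable out, String path, LeaseRecoverer fs)
            throws IOException {
        try {
            out.close();
        } catch (IOException closeFailure) {
            try {
                fs.recoverLease(path);
            } catch (IOException recoveryFailure) {
                // Keep the recovery failure attached without hiding the original error.
                closeFailure.addSuppressed(recoveryFailure);
            }
            throw closeFailure;
        }
    }
}
```

Because the wrapper depends only on `close()` failing, the same logic applies whether the stream writes replicated or striped (EC) blocks, which is the gap the issue describes.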






[jira] [Updated] (HDFS-15685) [JDK 14] TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS fails

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15685:
--
Affects Version/s: 3.3.1
   3.4.0

> [JDK 14] TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS 
> fails
> 
>
> Key: HDFS-15685
> URL: https://issues.apache.org/jira/browse/HDFS-15685
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HDFS-15689) allow/disallowSnapshot on EZ roots shouldn't fail due to trash provisioning/emptiness check

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15689:
--
Hadoop Flags: Reviewed

> allow/disallowSnapshot on EZ roots shouldn't fail due to trash 
> provisioning/emptiness check
> ---
>
> Key: HDFS-15689
> URL: https://issues.apache.org/jira/browse/HDFS-15689
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> h2. Background
> 1. HDFS-15607 added a feature: when
> {{dfs.namenode.snapshot.trashroot.enabled=true}}, allowSnapshot automatically
> creates a .Trash directory immediately after the allowSnapshot operation, so
> deleted files are moved into the trash root inside the snapshottable directory.
> 2. HDFS-15539 prevents admins from disallowing snapshot if the trash root
> inside is not empty.
> h2. Problem
> 1. When {{dfs.namenode.snapshot.trashroot.enabled=true}}, if the directory
> (on which snapshot is to be allowed) is an EZ root, allowSnapshot currently throws
> {{FileAlreadyExistsException}} because the trash root already exists
> (the encryption zone has already created an internal trash root).
> 2. Similarly, if we disallow snapshot on an EZ root, it may currently
> complain that the trash root is not empty (or delete it if empty, which is
> not desired since the EZ still needs it).
> h2. Solution
> 1. Let allowSnapshot succeed by not throwing {{FileAlreadyExistsException}},
> but inform the admin that the trash root already exists.
> 2. Skip the {{checkTrashRootAndRemoveIfEmpty()}} check if the path is an EZ root.
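The two proposed changes can be illustrated with a local-filesystem analogy using `java.nio.file`: creating the trash root tolerates an existing directory and reports it, and the disallow-time check (remove the trash root if empty, refuse if not) is skipped for an EZ root. The `isEzRoot` flag and all names below are hypothetical, and the sketch uses `java.nio.file`'s `FileAlreadyExistsException`, not Hadoop's class of the same name.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Local-filesystem analogy of the proposed HDFS-15689 behavior; not HDFS code.
public class TrashRootSketch {

    /** Ensures dir/.Trash exists; an existing trash directory is reported, not treated as an error. */
    public static String provisionTrashRoot(Path dir) throws IOException {
        Path trash = dir.resolve(".Trash");
        try {
            Files.createDirectory(trash);
            return "Created trash root " + trash;
        } catch (FileAlreadyExistsException e) {
            if (!Files.isDirectory(trash)) {
                throw e; // a non-directory named .Trash is still a real conflict
            }
            return "Trash root " + trash + " already exists";
        }
    }

    /**
     * Returns true if snapshots may be disallowed on dir. For a non-EZ directory an
     * empty trash root is removed and a non-empty one blocks the operation; for an
     * EZ root (hypothetical isEzRoot flag) the check is skipped and the trash is kept.
     */
    public static boolean mayDisallowSnapshot(Path dir, boolean isEzRoot) throws IOException {
        if (isEzRoot) {
            return true; // the encryption zone still needs its trash root
        }
        Path trash = dir.resolve(".Trash");
        if (!Files.isDirectory(trash)) {
            return true;
        }
        boolean empty;
        try (Stream<Path> entries = Files.list(trash)) {
            empty = !entries.findAny().isPresent();
        }
        if (empty) {
            Files.delete(trash);
        }
        return empty;
    }
}
```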






[jira] [Updated] (HDFS-15685) [JDK 14] TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS fails

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15685:
--
Component/s: test

> [JDK 14] TestConfiguredFailoverProxyProvider#testResolveDomainNameUsingDNS 
> fails
> 
>
> Key: HDFS-15685
> URL: https://issues.apache.org/jira/browse/HDFS-15685
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Updated] (HDFS-15749) Make size of editPendingQ can be configurable

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15749:
--
Hadoop Flags: Reviewed
Target Version/s: 3.2.3, 3.3.0, 3.4.0  (was: 3.3.0, 3.4.0, 3.2.3)

> Make size of editPendingQ can be configurable
> -
>
> Key: HDFS-15749
> URL: https://issues.apache.org/jira/browse/HDFS-15749
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: Baolong Mao
>Assignee: Baolong Mao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>







[jira] [Updated] (HDFS-15725) Lease Recovery never completes for a committed block which the DNs never finalize

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15725:
--
Hadoop Flags: Reviewed

> Lease Recovery never completes for a committed block which the DNs never 
> finalize
> -
>
> Key: HDFS-15725
> URL: https://issues.apache.org/jira/browse/HDFS-15725
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3
>
> Attachments: HDFS-15725.001.patch, HDFS-15725.002.patch, 
> HDFS-15725.003.patch, HDFS-15725.branch-2.10.001.patch, 
> HDFS-15725.branch-3.2.001.patch, lease_recovery_2_10.patch
>
>
> In a very rare condition, the HDFS client process can get killed right at the
> time it is completing a block / file.
> The client sends the "complete" call to the namenode, moving the block into a 
> committed state, but it dies before it can send the final packet to the 
> Datanodes telling them to finalize the block.
> This means the blocks are stuck on the datanodes in RBW state and nothing 
> will ever tell them to move out of that state.
> The namenode / lease manager will retry forever to close the file, but it 
> will always complain it is waiting for blocks to reach minimal replication.
> I have a simple test and patch to fix this, but I think it warrants some 
> discussion on whether this is the correct thing to do, or if I need to put 
> the fix behind a config switch.
> My idea is that if lease recovery occurs, and the block is still waiting on
> "minimal replication", just put the file back to UNDER_CONSTRUCTION so that 
> on the next lease recovery attempt, BLOCK RECOVERY will happen, close the 
> file and move the replicas to FINALIZED.
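The proposal above can be written as a small decision function. This is a simplified, hypothetical model and not the NameNode's lease-recovery code. The names `BlockState`, `RecoveryAction`, `LeaseRecoverySketch` and `nextAction` are illustrative. The `revertStuckCommittedBlock` flag stands in for the config switch the reporter mentions.

```java
import java.util.Objects;

/** Hypothetical model of the last block's state; not the NameNode's own enum. */
enum BlockState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

/** Hypothetical outcomes of a single lease-recovery attempt. */
enum RecoveryAction { CLOSE_FILE, START_BLOCK_RECOVERY, RETRY_LATER, REVERT_TO_UNDER_CONSTRUCTION }

final class LeaseRecoverySketch {
  private LeaseRecoverySketch() {}

  /**
   * Decides what one lease-recovery attempt does with the file's last block.
   *
   * @param lastBlock                 state of the last block
   * @param minReplicationMet         whether the block reached minimal replication
   * @param revertStuckCommittedBlock whether the proposed fix is enabled
   */
  static RecoveryAction nextAction(BlockState lastBlock, boolean minReplicationMet,
      boolean revertStuckCommittedBlock) {
    Objects.requireNonNull(lastBlock, "lastBlock");
    switch (lastBlock) {
      case COMPLETE:
        return RecoveryAction.CLOSE_FILE;
      case UNDER_CONSTRUCTION:
        // Block recovery moves the replicas to FINALIZED on the DataNodes,
        // after which the file can be closed.
        return RecoveryAction.START_BLOCK_RECOVERY;
      case COMMITTED:
        if (minReplicationMet) {
          return RecoveryAction.CLOSE_FILE;
        }
        // The client died before telling the DataNodes to finalize, so the
        // replicas stay in RBW and minimal replication is never reached.
        // Without the fix, every attempt lands here and retries forever.
        return revertStuckCommittedBlock
            ? RecoveryAction.REVERT_TO_UNDER_CONSTRUCTION
            : RecoveryAction.RETRY_LATER;
      default:
        throw new IllegalStateException("Unexpected state: " + lastBlock);
    }
  }
}
```

For the stuck case, `nextAction(BlockState.COMMITTED, false, true)` returns `REVERT_TO_UNDER_CONSTRUCTION`. On the following attempt, `nextAction(BlockState.UNDER_CONSTRUCTION, false, true)` returns `START_BLOCK_RECOVERY`. This mirrors the two-step sequence the reporter describes.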






[jira] [Updated] (HDFS-15788) Correct the statement for pmem cache to reflect cache persistence support

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15788:
--
Hadoop Flags: Reviewed

> Correct the statement for pmem cache to reflect cache persistence support
> -
>
> Key: HDFS-15788
> URL: https://issues.apache.org/jira/browse/HDFS-15788
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.4.0
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15788-01.patch, HDFS-15788-02.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Correct the statement for pmem cache to reflect cache persistence support.






[jira] [Updated] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15790:
--
Hadoop Flags: Reviewed

> Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
> --
>
> Key: HDFS-15790
> URL: https://issues.apache.org/jira/browse/HDFS-15790
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ipc
>Affects Versions: 3.3.1, 3.4.0
>Reporter: David Mollitor
>Assignee: Vinayakumar B
>Priority: Critical
>  Labels: pull-request-available, release-blocker
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Changing from Protobuf 2 to Protobuf 3 broke some things in the Apache Hive 
> project. This was not an awesome thing to do between minor versions with 
> regard to backwards compatibility for downstream projects.
> Additionally, these two frameworks are not drop-in replacements; they have 
> some differences. Also, Protobuf 2 is not deprecated, so let us have both 
> protocols available at the same time. In Hadoop 4.x, Protobuf 2 support can 
> be dropped.






[jira] [Updated] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15790:
--
Component/s: ipc

> Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
> --
>
> Key: HDFS-15790
> URL: https://issues.apache.org/jira/browse/HDFS-15790
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ipc
>Affects Versions: 3.3.1, 3.4.0
>Reporter: David Mollitor
>Assignee: Vinayakumar B
>Priority: Critical
>  Labels: pull-request-available, release-blocker
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Changing from Protobuf 2 to Protobuf 3 broke some things in the Apache Hive 
> project. This was not an awesome thing to do between minor versions with 
> regard to backwards compatibility for downstream projects.
> Additionally, these two frameworks are not drop-in replacements; they have 
> some differences. Also, Protobuf 2 is not deprecated, so let us have both 
> protocols available at the same time. In Hadoop 4.x, Protobuf 2 support can 
> be dropped.






[jira] [Updated] (HDFS-15790) Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15790:
--
Affects Version/s: 3.3.1
   3.4.0

> Make ProtobufRpcEngineProtos and ProtobufRpcEngineProtos2 Co-Exist
> --
>
> Key: HDFS-15790
> URL: https://issues.apache.org/jira/browse/HDFS-15790
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.3.1, 3.4.0
>Reporter: David Mollitor
>Assignee: Vinayakumar B
>Priority: Critical
>  Labels: pull-request-available, release-blocker
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Changing from Protobuf 2 to Protobuf 3 broke some things in the Apache Hive 
> project. This was not an awesome thing to do between minor versions with 
> regard to backwards compatibility for downstream projects.
> Additionally, these two frameworks are not drop-in replacements; they have 
> some differences. Also, Protobuf 2 is not deprecated, so let us have both 
> protocols available at the same time. In Hadoop 4.x, Protobuf 2 support can 
> be dropped.






[jira] [Updated] (HDFS-15796) ConcurrentModificationException error happens on NameNode occasionally

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15796:
--
Hadoop Flags: Reviewed

> ConcurrentModificationException error happens on NameNode occasionally
> --
>
> Key: HDFS-15796
> URL: https://issues.apache.org/jira/browse/HDFS-15796
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Daniel Ma
>Assignee: Daniel Ma
>Priority: Critical
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: HDFS-15796-0001.patch
>
>
> ConcurrentModificationException error happens on NameNode occasionally.
>  
> {code:java}
> 2021-01-23 20:21:18,107 | ERROR | RedundancyMonitor | RedundancyMonitor 
> thread received Runtime exception.  | BlockManager.java:4746
> java.util.ConcurrentModificationException
>   at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:909)
>   at java.util.ArrayList$Itr.next(ArrayList.java:859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1907)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1859)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4862)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4729)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  
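The trace shows `ArrayList$Itr.checkForComodification` failing while the RedundancyMonitor iterates inside `computeReconstructionWorkForBlocks`. Below is a minimal, generic Java sketch of that fail-fast mechanism, together with one common remedy: iterating over a snapshot copy. It does not reproduce the HDFS-15796 patch, and the class and method names are hypothetical. In the NameNode, the modification presumably comes from another thread rather than from the loop body. In that case a snapshot only helps if the copy is taken while holding the same lock the writers use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

final class ComodificationSketch {
  private ComodificationSketch() {}

  /** Returns true if adding to the list inside a for-each loop fails fast. */
  static boolean modifyDuringIterationThrows() {
    List<Integer> work = new ArrayList<>(Arrays.asList(1, 2, 3));
    try {
      for (Integer item : work) {
        if (item == 1) {
          // Structural change: bumps the list's modCount, so the iterator's
          // next call to next() throws ConcurrentModificationException.
          work.add(4);
        }
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  /** Iterates over a snapshot copy, so modifying the original list is safe. */
  static int modifyWhileIteratingSnapshot() {
    List<Integer> work = new ArrayList<>(Arrays.asList(1, 2, 3));
    for (Integer item : new ArrayList<>(work)) {
      if (item == 1) {
        work.add(4);
      }
    }
    return work.size();
  }
}
```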






[jira] [Updated] (HDFS-15798) EC: Reconstruct task failed, and It would be XmitsInProgress of DN has negative number

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15798:
--
Affects Version/s: 3.3.1
   3.4.0

> EC: Reconstruct task failed, and It would be XmitsInProgress of DN has 
> negative number
> --
>
> Key: HDFS-15798
> URL: https://issues.apache.org/jira/browse/HDFS-15798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
> Attachments: HDFS-15798.001.patch, HDFS-15798.002.patch, 
> HDFS-15798.003.patch
>
>
> When an EC reconstruction task fails, the decrementXmitsInProgress call in 
> processErasureCodingTasks can subtract an abnormal value, which can make the 
> DN's XmitsInProgress negative. This in turn affects how the NN chooses 
> pending tasks based on the ratio between the lengths of the replication and 
> erasure-coded block queues.
> {code:java}
> // 1. ErasureCodingWorker.java
> public void processErasureCodingTasks(
>     Collection<BlockECReconstructionInfo> ecTasks) {
>   for (BlockECReconstructionInfo reconInfo : ecTasks) {
>     int xmitsSubmitted = 0;
>     try {
>       ...
>       // It may throw IllegalArgumentException from task#stripedReader
>       // constructor.
>       final StripedBlockReconstructor task =
>           new StripedBlockReconstructor(this, stripedReconInfo);
>       if (task.hasValidTargets()) {
>         // See HDFS-12044. We increase xmitsInProgress even if the task is
>         // only enqueued, so that
>         //   1) NN will not send more tasks than what DN can execute and
>         //   2) DN will not throw away reconstruction tasks, and instead keeps
>         //      an unbounded number of tasks in the executor's task queue.
>         xmitsSubmitted = Math.max((int) (task.getXmits() * xmitWeight), 1);
>         // task start: increment
>         getDatanode().incrementXmitsInProcess(xmitsSubmitted);
>         stripedReconstructionPool.submit(task);
>       } else {
>         LOG.warn("No missing internal block. Skip reconstruction for task:{}",
>             reconInfo);
>       }
>     } catch (Throwable e) {
>       // task failed: decrement; XmitsInProgress is decremented by the
>       // previous value
>       getDatanode().decrementXmitsInProgress(xmitsSubmitted);
>       LOG.warn("Failed to reconstruct striped block {}",
>           reconInfo.getExtendedBlock().getLocalBlock(), e);
>     }
>   }
> }
> // 2. StripedBlockReconstructor.java
> public void run() {
>   try {
>     initDecoderIfNecessary();
>     ...
>   } catch (Throwable e) {
>     LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e);
>     getDatanode().getMetrics().incrECFailedReconstructionTasks();
>   } finally {
>     float xmitWeight = getErasureCodingWorker().getXmitWeight();
>     // if the xmits is smaller than 1, the xmitsSubmitted should be set to 1
>     // because if it is set to zero, we cannot measure the xmits submitted
>     int xmitsSubmitted = Math.max((int) (getXmits() * xmitWeight), 1);
>     // task complete: decrement
>     getDatanode().decrementXmitsInProgress(xmitsSubmitted);
>     ...
>   }
> }{code}
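The report relies on one invariant: each increment of xmitsInProgress must be matched by exactly one decrement. That decrement happens either in the submitter, if handing off the task fails, or in the task's `finally` block, once the task runs, but never in both places. The sketch below is a simplified model of that invariant. `XmitsAccountingSketch`, `submitCounted`, `xmitsInProgress` and `failedTasks` are hypothetical stand-ins for the DataNode counter and the failed-task metric. It is not the committed HDFS-15798 patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of DataNode xmits accounting; not the HDFS classes. */
final class XmitsAccountingSketch {
  /** Stand-in for the DataNode's xmitsInProgress counter. */
  final AtomicInteger xmitsInProgress = new AtomicInteger();
  /** Stand-in for the failed-reconstruction-tasks metric. */
  final AtomicInteger failedTasks = new AtomicInteger();

  /**
   * Counts a task before handing it to the pool. Exactly one decrement
   * follows: in the task's finally block if it runs, or here if the pool
   * rejects it.
   */
  boolean submitCounted(ExecutorService pool, Runnable work, int xmits) {
    // Local to this call and fixed before the increment, so a failure can
    // never subtract a value computed for a different task.
    final int xmitsSubmitted = Math.max(xmits, 1);
    xmitsInProgress.addAndGet(xmitsSubmitted);
    try {
      pool.execute(() -> {
        try {
          work.run();
        } catch (RuntimeException e) {
          failedTasks.incrementAndGet();
        } finally {
          xmitsInProgress.addAndGet(-xmitsSubmitted);
        }
      });
      return true;
    } catch (RejectedExecutionException e) {
      // The task never ran, so its finally block will never decrement.
      xmitsInProgress.addAndGet(-xmitsSubmitted);
      return false;
    }
  }
}
```

The counter returns to zero in both cases: when a submitted task throws, and when the pool rejects a submission (for example after `shutdown()`).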






[jira] [Updated] (HDFS-15798) EC: Reconstruct task failed, and It would be XmitsInProgress of DN has negative number

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15798:
--
Component/s: erasure-coding

> EC: Reconstruct task failed, and It would be XmitsInProgress of DN has 
> negative number
> --
>
> Key: HDFS-15798
> URL: https://issues.apache.org/jira/browse/HDFS-15798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
> Attachments: HDFS-15798.001.patch, HDFS-15798.002.patch, 
> HDFS-15798.003.patch
>
>
> When an EC reconstruction task fails, the decrementXmitsInProgress call in 
> processErasureCodingTasks can subtract an abnormal value, which can make the 
> DN's XmitsInProgress negative. This in turn affects how the NN chooses 
> pending tasks based on the ratio between the lengths of the replication and 
> erasure-coded block queues.
> {code:java}
> // 1. ErasureCodingWorker.java
> public void processErasureCodingTasks(
>     Collection<BlockECReconstructionInfo> ecTasks) {
>   for (BlockECReconstructionInfo reconInfo : ecTasks) {
>     int xmitsSubmitted = 0;
>     try {
>       ...
>       // It may throw IllegalArgumentException from task#stripedReader
>       // constructor.
>       final StripedBlockReconstructor task =
>           new StripedBlockReconstructor(this, stripedReconInfo);
>       if (task.hasValidTargets()) {
>         // See HDFS-12044. We increase xmitsInProgress even if the task is
>         // only enqueued, so that
>         //   1) NN will not send more tasks than what DN can execute and
>         //   2) DN will not throw away reconstruction tasks, and instead keeps
>         //      an unbounded number of tasks in the executor's task queue.
>         xmitsSubmitted = Math.max((int) (task.getXmits() * xmitWeight), 1);
>         // task start: increment
>         getDatanode().incrementXmitsInProcess(xmitsSubmitted);
>         stripedReconstructionPool.submit(task);
>       } else {
>         LOG.warn("No missing internal block. Skip reconstruction for task:{}",
>             reconInfo);
>       }
>     } catch (Throwable e) {
>       // task failed: decrement; XmitsInProgress is decremented by the
>       // previous value
>       getDatanode().decrementXmitsInProgress(xmitsSubmitted);
>       LOG.warn("Failed to reconstruct striped block {}",
>           reconInfo.getExtendedBlock().getLocalBlock(), e);
>     }
>   }
> }
> // 2. StripedBlockReconstructor.java
> public void run() {
>   try {
>     initDecoderIfNecessary();
>     ...
>   } catch (Throwable e) {
>     LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e);
>     getDatanode().getMetrics().incrECFailedReconstructionTasks();
>   } finally {
>     float xmitWeight = getErasureCodingWorker().getXmitWeight();
>     // if the xmits is smaller than 1, the xmitsSubmitted should be set to 1
>     // because if it is set to zero, we cannot measure the xmits submitted
>     int xmitsSubmitted = Math.max((int) (getXmits() * xmitWeight), 1);
>     // task complete: decrement
>     getDatanode().decrementXmitsInProgress(xmitsSubmitted);
>     ...
>   }
> }{code}






[jira] [Updated] (HDFS-15798) EC: Reconstruct task failed, and It would be XmitsInProgress of DN has negative number

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15798:
--
Hadoop Flags: Reviewed

> EC: Reconstruct task failed, and It would be XmitsInProgress of DN has 
> negative number
> --
>
> Key: HDFS-15798
> URL: https://issues.apache.org/jira/browse/HDFS-15798
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
> Attachments: HDFS-15798.001.patch, HDFS-15798.002.patch, 
> HDFS-15798.003.patch
>
>
> When an EC reconstruction task fails, the decrementXmitsInProgress call in 
> processErasureCodingTasks can subtract an abnormal value, which can make the 
> DN's XmitsInProgress negative. This in turn affects how the NN chooses 
> pending tasks based on the ratio between the lengths of the replication and 
> erasure-coded block queues.
> {code:java}
> // 1. ErasureCodingWorker.java
> public void processErasureCodingTasks(
>     Collection<BlockECReconstructionInfo> ecTasks) {
>   for (BlockECReconstructionInfo reconInfo : ecTasks) {
>     int xmitsSubmitted = 0;
>     try {
>       ...
>       // It may throw IllegalArgumentException from task#stripedReader
>       // constructor.
>       final StripedBlockReconstructor task =
>           new StripedBlockReconstructor(this, stripedReconInfo);
>       if (task.hasValidTargets()) {
>         // See HDFS-12044. We increase xmitsInProgress even if the task is
>         // only enqueued, so that
>         //   1) NN will not send more tasks than what DN can execute and
>         //   2) DN will not throw away reconstruction tasks, and instead keeps
>         //      an unbounded number of tasks in the executor's task queue.
>         xmitsSubmitted = Math.max((int) (task.getXmits() * xmitWeight), 1);
>         // task start: increment
>         getDatanode().incrementXmitsInProcess(xmitsSubmitted);
>         stripedReconstructionPool.submit(task);
>       } else {
>         LOG.warn("No missing internal block. Skip reconstruction for task:{}",
>             reconInfo);
>       }
>     } catch (Throwable e) {
>       // task failed: decrement; XmitsInProgress is decremented by the
>       // previous value
>       getDatanode().decrementXmitsInProgress(xmitsSubmitted);
>       LOG.warn("Failed to reconstruct striped block {}",
>           reconInfo.getExtendedBlock().getLocalBlock(), e);
>     }
>   }
> }
> // 2. StripedBlockReconstructor.java
> public void run() {
>   try {
>     initDecoderIfNecessary();
>     ...
>   } catch (Throwable e) {
>     LOG.warn("Failed to reconstruct striped block: {}", getBlockGroup(), e);
>     getDatanode().getMetrics().incrECFailedReconstructionTasks();
>   } finally {
>     float xmitWeight = getErasureCodingWorker().getXmitWeight();
>     // if the xmits is smaller than 1, the xmitsSubmitted should be set to 1
>     // because if it is set to zero, we cannot measure the xmits submitted
>     int xmitsSubmitted = Math.max((int) (getXmits() * xmitWeight), 1);
>     // task complete: decrement
>     getDatanode().decrementXmitsInProgress(xmitsSubmitted);
>     ...
>   }
> }{code}






[jira] [Updated] (HDFS-15818) Fix TestFsDatasetImpl.testReadLockCanBeDisabledByConfig

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15818:
--
Hadoop Flags: Reviewed

> Fix TestFsDatasetImpl.testReadLockCanBeDisabledByConfig
> ---
>
> Key: HDFS-15818
> URL: https://issues.apache.org/jira/browse/HDFS-15818
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.2.4
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The current TestFsDatasetImpl.testReadLockCanBeDisabledByConfig is incorrect:
> 1) The test fails intermittently, as the holder can acquire the lock first
> [https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2666/1/testReport/]
>  
> 2) The test passes regardless of the setting of 
> DFS_DATANODE_LOCK_READ_WRITE_ENABLED_KEY
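Both problems come down to how the test orders its threads. One way to make a test deterministic and sensitive to the setting is to take the lock on the test thread first and only then probe from a second thread with a bounded `tryLock`. The sketch below uses plain `ReentrantReadWriteLock` and is not the actual fix. `ReadLockConfigSketch` is hypothetical. The way it models the disabled setting, where "read" callers fall back to the exclusive write lock, is an assumption about what DFS_DATANODE_LOCK_READ_WRITE_ENABLED_KEY controls.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical model of a dataset lock whose shared read mode can be disabled. */
final class ReadLockConfigSketch {
  private final Lock readLock;

  ReadLockConfigSketch(boolean readWriteEnabled) {
    ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    // Assumption: with read/write locking disabled, "read" callers fall back
    // to the exclusive write lock.
    this.readLock = readWriteEnabled ? rwLock.readLock() : rwLock.writeLock();
  }

  /**
   * Takes the read lock on the calling thread first, then asks a second
   * thread to try for it. Because the holder already owns the lock when the
   * probe starts, the result depends only on the configuration.
   */
  boolean secondReaderCanAcquire() throws Exception {
    ExecutorService prober = Executors.newSingleThreadExecutor();
    readLock.lock();
    try {
      Future<Boolean> probe = prober.submit(() -> {
        boolean acquired = readLock.tryLock(200, TimeUnit.MILLISECONDS);
        if (acquired) {
          readLock.unlock();
        }
        return acquired;
      });
      return probe.get();
    } finally {
      readLock.unlock();
      prober.shutdown();
    }
  }
}
```

With the shared read lock, the probe succeeds immediately. With the exclusive fallback, it times out. A test built this way fails if the setting has no effect.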






[jira] [Updated] (HDFS-15818) Fix TestFsDatasetImpl.testReadLockCanBeDisabledByConfig

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15818:
--
Affects Version/s: 3.4.0

> Fix TestFsDatasetImpl.testReadLockCanBeDisabledByConfig
> ---
>
> Key: HDFS-15818
> URL: https://issues.apache.org/jira/browse/HDFS-15818
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.2.4
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The current TestFsDatasetImpl.testReadLockCanBeDisabledByConfig is incorrect:
> 1) The test fails intermittently, as the holder can acquire the lock first
> [https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2666/1/testReport/]
>  
> 2) The test passes regardless of the setting of 
> DFS_DATANODE_LOCK_READ_WRITE_ENABLED_KEY






[jira] [Updated] (HDFS-15836) RBF: Fix contract tests after HADOOP-13327

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15836:
--
Hadoop Flags: Reviewed

> RBF: Fix contract tests after HADOOP-13327
> --
>
> Key: HDFS-15836
> URL: https://issues.apache.org/jira/browse/HDFS-15836
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {noformat}
> [ERROR] Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 19.094 s <<< FAILURE! - in 
> org.apache.hadoop.fs.contract.router.TestRouterHDFSContractCreate
> [ERROR] 
> testSyncable(org.apache.hadoop.fs.contract.router.TestRouterHDFSContractCreate)
>   Time elapsed: 0.102 s  <<< FAILURE!
> java.lang.AssertionError: Should not have capability: hflush in 
> FSDataOutputStream{wrappedStream=DFSOutputStream:block==null}
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertFalse(Assert.java:64)
>   at 
> org.apache.hadoop.fs.contract.ContractTestUtils.assertCapabilities(ContractTestUtils.java:1553)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:497)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2696/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt






[jira] [Updated] (HDFS-15819) Fix a codestyle issue for TestQuotaByStorageType

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15819:
--
Hadoop Flags: Reviewed

> Fix a codestyle issue for TestQuotaByStorageType
> 
>
> Key: HDFS-15819
> URL: https://issues.apache.org/jira/browse/HDFS-15819
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: Baolong Mao
>Assignee: Baolong Mao
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>







[jira] [Updated] (HDFS-15836) RBF: Fix contract tests after HADOOP-13327

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15836:
--
Affects Version/s: 3.3.1
   3.4.0

> RBF: Fix contract tests after HADOOP-13327
> --
>
> Key: HDFS-15836
> URL: https://issues.apache.org/jira/browse/HDFS-15836
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {noformat}
> [ERROR] Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 19.094 s <<< FAILURE! - in 
> org.apache.hadoop.fs.contract.router.TestRouterHDFSContractCreate
> [ERROR] 
> testSyncable(org.apache.hadoop.fs.contract.router.TestRouterHDFSContractCreate)
>   Time elapsed: 0.102 s  <<< FAILURE!
> java.lang.AssertionError: Should not have capability: hflush in 
> FSDataOutputStream{wrappedStream=DFSOutputStream:block==null}
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertFalse(Assert.java:64)
>   at 
> org.apache.hadoop.fs.contract.ContractTestUtils.assertCapabilities(ContractTestUtils.java:1553)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:497)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2696/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt






[jira] [Updated] (HDFS-15845) RBF: Router fails to start due to NoClassDefFoundError for hadoop-federation-balance

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15845:
--
Hadoop Flags: Reviewed

> RBF: Router fails to start due to NoClassDefFoundError for 
> hadoop-federation-balance
> 
>
> Key: HDFS-15845
> URL: https://issues.apache.org/jira/browse/HDFS-15845
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {noformat}
> $ hdfs dfsrouter
> ...
> 2021-02-22 17:21:55,400 ERROR router.DFSRouter: Failed to start router
> java.lang.NoClassDefFoundError: 
> org/apache/hadoop/tools/fedbalance/procedure/BalanceProcedure
> at 
> org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.(RouterClientProtocol.java:195)
> at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.(RouterRpcServer.java:394)
> at 
> org.apache.hadoop.hdfs.server.federation.router.Router.createRpcServer(Router.java:391)
> at 
> org.apache.hadoop.hdfs.server.federation.router.Router.serviceInit(Router.java:188)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:165)
> at 
> org.apache.hadoop.hdfs.server.federation.router.DFSRouter.main(DFSRouter.java:69)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedure
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 6 more
> 2021-02-22 17:21:55,402 INFO util.ExitUtil: Exiting with status 1: 
> java.lang.NoClassDefFoundError: 
> org/apache/hadoop/tools/fedbalance/procedure/BalanceProcedure
> 2021-02-22 17:21:55,404 INFO router.DFSRouter: SHUTDOWN_MSG:
> {noformat}
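The trace shows the Router failing at startup because `org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedure`, from hadoop-federation-balance, cannot be loaded. A quick way to confirm this kind of problem is to check whether a class is loadable from a given class loader without running its static initializers. The sketch below is generic Java and does not reproduce the HDFS-15845 fix. `ClasspathCheck` and `isLoadable` are hypothetical names. On a live install, the output of `hdfs classpath` shows the effective classpath to compare against.

```java
/** Hypothetical diagnostic helper; not part of Hadoop. */
final class ClasspathCheck {
  private ClasspathCheck() {}

  /**
   * Returns true if the named class can be loaded by the given loader.
   * Passing initialize=false avoids running the class's static initializers.
   */
  static boolean isLoadable(String className, ClassLoader loader) {
    try {
      Class.forName(className, false, loader);
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      // ClassNotFoundException: the class is not on the classpath at all.
      // LinkageError (e.g. NoClassDefFoundError): the class was found, but
      // one of its own dependencies could not be resolved.
      return false;
    }
  }
}
```

For example, `ClasspathCheck.isLoadable("org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedure", ClasspathCheck.class.getClassLoader())` returns false whenever the federation-balance jar is missing from that loader's classpath.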






[jira] [Updated] (HDFS-15845) RBF: Router fails to start due to NoClassDefFoundError for hadoop-federation-balance

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15845:
--
Affects Version/s: 3.4.0

> RBF: Router fails to start due to NoClassDefFoundError for 
> hadoop-federation-balance
> 
>
> Key: HDFS-15845
> URL: https://issues.apache.org/jira/browse/HDFS-15845
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.4.0
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {noformat}
> $ hdfs dfsrouter
> ...
> 2021-02-22 17:21:55,400 ERROR router.DFSRouter: Failed to start router
> java.lang.NoClassDefFoundError: 
> org/apache/hadoop/tools/fedbalance/procedure/BalanceProcedure
> at 
> org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.(RouterClientProtocol.java:195)
> at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.(RouterRpcServer.java:394)
> at 
> org.apache.hadoop.hdfs.server.federation.router.Router.createRpcServer(Router.java:391)
> at 
> org.apache.hadoop.hdfs.server.federation.router.Router.serviceInit(Router.java:188)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:165)
> at 
> org.apache.hadoop.hdfs.server.federation.router.DFSRouter.main(DFSRouter.java:69)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedure
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 6 more
> 2021-02-22 17:21:55,402 INFO util.ExitUtil: Exiting with status 1: 
> java.lang.NoClassDefFoundError: 
> org/apache/hadoop/tools/fedbalance/procedure/BalanceProcedure
> 2021-02-22 17:21:55,404 INFO router.DFSRouter: SHUTDOWN_MSG:
> {noformat}






[jira] [Updated] (HDFS-15845) RBF: Router fails to start due to NoClassDefFoundError for hadoop-federation-balance

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15845:
--
Component/s: rbf

> RBF: Router fails to start due to NoClassDefFoundError for 
> hadoop-federation-balance
> 
>
> Key: HDFS-15845
> URL: https://issues.apache.org/jira/browse/HDFS-15845
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {noformat}
> $ hdfs dfsrouter
> ...
> 2021-02-22 17:21:55,400 ERROR router.DFSRouter: Failed to start router
> java.lang.NoClassDefFoundError: 
> org/apache/hadoop/tools/fedbalance/procedure/BalanceProcedure
> at 
> org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.<init>(RouterClientProtocol.java:195)
> at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.<init>(RouterRpcServer.java:394)
> at 
> org.apache.hadoop.hdfs.server.federation.router.Router.createRpcServer(Router.java:391)
> at 
> org.apache.hadoop.hdfs.server.federation.router.Router.serviceInit(Router.java:188)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:165)
> at 
> org.apache.hadoop.hdfs.server.federation.router.DFSRouter.main(DFSRouter.java:69)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedure
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 6 more
> 2021-02-22 17:21:55,402 INFO util.ExitUtil: Exiting with status 1: 
> java.lang.NoClassDefFoundError: 
> org/apache/hadoop/tools/fedbalance/procedure/BalanceProcedure
> 2021-02-22 17:21:55,404 INFO router.DFSRouter: SHUTDOWN_MSG:
> {noformat}






[jira] [Updated] (HDFS-15850) Superuser actions should be reported to external enforcers

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15850:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.2, 3.4.0  (was: 3.4.0, 3.3.2)

> Superuser actions should be reported to external enforcers
> --
>
> Key: HDFS-15850
> URL: https://issues.apache.org/jira/browse/HDFS-15850
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: security
>Affects Versions: 3.3.0
>Reporter: Vivek Ratnavel Subramanian
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15850.branch-3.3.001.patch, HDFS-15850.v1.patch, 
> HDFS-15850.v2.patch
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Currently, HDFS superuser checks and actions are not reported to external 
> enforcers such as Ranger, so the audit reports produced by those enforcers 
> are incomplete and miss the superuser actions. To fix this, add a 
> new method to "AccessControlEnforcer" for all superuser checks. 
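A minimal sketch of the general idea, assuming a simplified plug-in interface. `Enforcer`, `checkSuperUser` and `AuditingEnforcer` are hypothetical names and are not the actual `AccessControlEnforcer` API. Adding the new hook as a default no-op method keeps existing plug-ins compiling unchanged, while an auditing plug-in can override it to record superuser actions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for an access-control enforcer plug-in interface.
interface Enforcer {
  void checkPermission(String user, String path, String access);

  // Hypothetical new hook for superuser-only checks. The default is a no-op,
  // so existing implementations keep compiling and behave exactly as before.
  default void checkSuperUser(String user, String operation) {
  }
}

// Example plug-in that records every check, including superuser ones, for auditing.
final class AuditingEnforcer implements Enforcer {
  final List<String> audit = new ArrayList<>();

  @Override
  public void checkPermission(String user, String path, String access) {
    audit.add(user + " " + access + " " + path);
  }

  @Override
  public void checkSuperUser(String user, String operation) {
    audit.add(user + " superuser " + operation);
  }
}
```

A plug-in written before the hook existed (for example the lambda `(u, p, a) -> { }`) still satisfies the interface and silently ignores superuser checks.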






[jira] [Updated] (HDFS-15904) Flaky test TestBalancer#testBalancerWithSortTopNodes()

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15904:
--
Component/s: test

> Flaky test TestBalancer#testBalancerWithSortTopNodes()
> --
>
> Key: HDFS-15904
> URL: https://issues.apache.org/jira/browse/HDFS-15904
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: balancer & mover, test
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> TestBalancer#testBalancerWithSortTopNodes fails in roughly one out of ~10 
> runs, and the failure is also reproducible locally. Balancing either moves 
> 2 blocks of 100+100 bytes or 3 blocks of 100+100+50 bytes; the second case 
> causes the flaky failures.






[jira] [Updated] (HDFS-15895) DFSAdmin#printOpenFiles has redundant String#format usage

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15895:
--
Affects Version/s: 3.3.1
   3.4.0

> DFSAdmin#printOpenFiles has redundant String#format usage
> -
>
> Key: HDFS-15895
> URL: https://issues.apache.org/jira/browse/HDFS-15895
> Project: Hadoop HDFS
>  Issue Type: Task
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>







[jira] [Updated] (HDFS-15895) DFSAdmin#printOpenFiles has redundant String#format usage

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15895:
--
Hadoop Flags: Reviewed
Target Version/s: 3.2.3, 2.10.2, 3.3.1, 3.4.0  (was: 3.3.1, 3.4.0, 2.10.2, 
3.2.3)

> DFSAdmin#printOpenFiles has redundant String#format usage
> -
>
> Key: HDFS-15895
> URL: https://issues.apache.org/jira/browse/HDFS-15895
> Project: Hadoop HDFS
>  Issue Type: Task
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>







[jira] [Updated] (HDFS-15895) DFSAdmin#printOpenFiles has redundant String#format usage

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15895:
--
Component/s: dfsadmin

> DFSAdmin#printOpenFiles has redundant String#format usage
> -
>
> Key: HDFS-15895
> URL: https://issues.apache.org/jira/browse/HDFS-15895
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: dfsadmin
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>







[jira] [Updated] (HDFS-15904) Flaky test TestBalancer#testBalancerWithSortTopNodes()

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15904:
--
Affects Version/s: 3.4.0

> Flaky test TestBalancer#testBalancerWithSortTopNodes()
> --
>
> Key: HDFS-15904
> URL: https://issues.apache.org/jira/browse/HDFS-15904
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: balancer & mover
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> TestBalancer#testBalancerWithSortTopNodes fails in roughly one out of ~10 
> runs, and the failure is also reproducible locally. Balancing either moves 
> 2 blocks of 100+100 bytes or 3 blocks of 100+100+50 bytes; the second case 
> causes the flaky failures.






[jira] [Updated] (HDFS-15911) Provide blocks moved count in Balancer iteration result

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15911:
--
Affects Version/s: 3.3.1
   3.4.0

> Provide blocks moved count in Balancer iteration result
> ---
>
> Key: HDFS-15911
> URL: https://issues.apache.org/jira/browse/HDFS-15911
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Balancer returns a Result for each iteration, containing information such as 
> exitStatus, bytesLeftToMove and bytesBeingMoved. We should also provide the 
> blocksMoved count from NameNodeConnector and print it with the rest of the 
> details in Result#print().
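A small sketch of what carrying the extra counter could look like, assuming a simplified result class. `IterationResult`, its constructor and `format` are hypothetical and are not the real `Balancer.Result`; only the field names come from the issue text.

```java
import java.io.PrintStream;
import java.util.Locale;

// Hypothetical per-iteration result; field names follow the issue text,
// but this is not the actual Balancer.Result class.
final class IterationResult {
  final int exitStatus;
  final long bytesLeftToMove;
  final long bytesBeingMoved;
  final long blocksMoved;  // the new counter, e.g. read from a NameNodeConnector-like source

  IterationResult(int exitStatus, long bytesLeftToMove, long bytesBeingMoved,
      long blocksMoved) {
    this.exitStatus = exitStatus;
    this.bytesLeftToMove = bytesLeftToMove;
    this.bytesBeingMoved = bytesBeingMoved;
    this.blocksMoved = blocksMoved;
  }

  // Formats one summary line, including the blocks-moved count.
  // Locale.ROOT keeps the digits stable regardless of the JVM's default locale.
  String format(int iteration) {
    return String.format(Locale.ROOT,
        "Iteration %d: exitStatus=%d, bytesLeftToMove=%d, bytesBeingMoved=%d, blocksMoved=%d",
        iteration, exitStatus, bytesLeftToMove, bytesBeingMoved, blocksMoved);
  }

  void print(int iteration, PrintStream out) {
    out.println(format(iteration));
  }
}
```

Keeping formatting in a separate method makes the printed line easy to assert on in a unit test.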






[jira] [Updated] (HDFS-15907) Reduce Memory Overhead of AclFeature by avoiding AtomicInteger

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15907:
--
Hadoop Flags: Reviewed

> Reduce Memory Overhead of AclFeature by avoiding AtomicInteger
> --
>
> Key: HDFS-15907
> URL: https://issues.apache.org/jira/browse/HDFS-15907
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15907.001.patch
>
>
> In HDFS-15792 we made some changes to the AclFeature and ReferenceCountedMap 
> classes to address a rare bug when loading the FSImage in parallel.
> One change we made was to replace an int inside AclFeature with an 
> AtomicInteger to avoid synchronising the methods in AclFeature.
> When I discussed this change with [~weichiu], he pointed out that while the 
> AclFeature cache is intended to reduce the number of AclFeature objects, a 
> large cluster can still have many millions of them.
> Previously, the int took 4 bytes of heap.
> By moving to an AtomicInteger, we probably have an overhead of:
>  4 bytes (or 8 if the heap is over 32GB) for a reference to the AtomicInteger 
> object
>  12 bytes of header overhead for the Java object
>  4 bytes inside the AtomicInteger to store the int.
>  
> So the total heap overhead has gone from 4 bytes to 20 bytes just to use an 
> AtomicInteger.
> Therefore I think it makes sense to remove the AtomicInteger and just 
> synchronise the methods of AclFeature where the value is incremented, 
> decremented or retrieved.
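A minimal sketch of the proposed shape, assuming a simplified class. `RefCountedFeature` and its method names are hypothetical, not the real AclFeature code. The count stays a primitive int field inside the object, and synchronized methods need no extra field, so no separate AtomicInteger object or reference is allocated.

```java
// Hypothetical class illustrating the proposal: a plain int reference count
// guarded by the object's own monitor instead of a separate AtomicInteger.
final class RefCountedFeature {
  // A primitive field stored inline in this object; no extra object or reference.
  private int refCount;

  synchronized int incrementAndGetRefCount() {
    return ++refCount;
  }

  // Decrements but never lets the count go negative.
  synchronized int decrementAndGetRefCount() {
    if (refCount > 0) {
      refCount--;
    }
    return refCount;
  }

  synchronized int getRefCount() {
    return refCount;
  }
}
```

The trade-off is lock acquisition on each update instead of a lock-free CAS, which is cheap when these methods are rarely contended.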






[jira] [Updated] (HDFS-15907) Reduce Memory Overhead of AclFeature by avoiding AtomicInteger

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15907:
--
Affects Version/s: 3.3.1
   3.4.0

> Reduce Memory Overhead of AclFeature by avoiding AtomicInteger
> --
>
> Key: HDFS-15907
> URL: https://issues.apache.org/jira/browse/HDFS-15907
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15907.001.patch
>
>
> In HDFS-15792 we made some changes to the AclFeature and ReferenceCountedMap 
> classes to address a rare bug when loading the FSImage in parallel.
> One change we made was to replace an int inside AclFeature with an 
> AtomicInteger to avoid synchronising the methods in AclFeature.
> When I discussed this change with [~weichiu], he pointed out that while the 
> AclFeature cache is intended to reduce the number of AclFeature objects, a 
> large cluster can still have many millions of them.
> Previously, the int took 4 bytes of heap.
> By moving to an AtomicInteger, we probably have an overhead of:
>  4 bytes (or 8 if the heap is over 32GB) for a reference to the AtomicInteger 
> object
>  12 bytes of header overhead for the Java object
>  4 bytes inside the AtomicInteger to store the int.
>  
> So the total heap overhead has gone from 4 bytes to 20 bytes just to use an 
> AtomicInteger.
> Therefore I think it makes sense to remove the AtomicInteger and just 
> synchronise the methods of AclFeature where the value is incremented, 
> decremented or retrieved.






[jira] [Updated] (HDFS-15923) RBF: Authentication failed when rename accross sub clusters

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15923:
--
Affects Version/s: 3.4.0

> RBF:  Authentication failed when rename accross sub clusters
> 
>
> Key: HDFS-15923
> URL: https://issues.apache.org/jira/browse/HDFS-15923
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: zhuobin zheng
>Assignee: zhuobin zheng
>Priority: Major
>  Labels: RBF, pull-request-available, rename
> Fix For: 3.4.0
>
> Attachments: HDFS-15923.001.patch, HDFS-15923.002.patch, 
> HDFS-15923.003.patch, HDFS-15923.stack-trace, 
> hdfs-15923-fix-security-issue.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Renaming across sub-clusters with RBF in a Kerberos environment encounters 
> the following two errors:
>  # Save Object to journal.
>  # Precheck trying to get the src file status.
> So we need to use the Router login UGI's doAs to create the DistcpProcedure 
> and TrashProcedure and submit the job.
>  
> Besides, we should check the user's permissions on the src and dst paths on 
> the router side before doing the internal rename. (HDFS-15973)
> First: Save Object to journal.
> {code:java}
> 2021-03-23 14:01:16,233 WARN org.apache.hadoop.ipc.Client: Exception 
> encountered while connecting to the server 
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
> at 
> org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:408)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:622)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:413)
> at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:822)
> at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:818)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:818)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:413)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1636)
> at org.apache.hadoop.ipc.Client.call(Client.java:1452)
> at org.apache.hadoop.ipc.Client.call(Client.java:1405)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
> at com.sun.proxy.$Proxy11.create(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:376)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy12.create(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:277)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1240)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1219)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1201)
> at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1139)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:533)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$8.doCall(DistributedFileSystem.java:530)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> 

[jira] [Updated] (HDFS-15931) Fix non-static inner classes for better memory management

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15931:
--
Component/s: hdfs

> Fix non-static inner classes for better memory management
> -
>
> Key: HDFS-15931
> URL: https://issues.apache.org/jira/browse/HDFS-15931
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> If an inner class does not need to reference its enclosing instance, it can 
> be static. This prevents a common cause of memory leaks and uses less memory 
> per instance of the inner class.
> I came across DataNodeProperties, a non-static inner class defined in 
> MiniDFSCluster that never uses its implicit reference to MiniDFSCluster. 
> I am taking this opportunity to find other non-static inner classes that do 
> not need the implicit reference to their respective enclosing instances.
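A short illustration of the difference, using hypothetical class names rather than the real MiniDFSCluster code. With the javac versions of Java 8 and 11, every instance of a non-static inner class carries a hidden reference to the object that created it, which can keep a large enclosing object reachable; a static nested class has no such reference.

```java
// Hypothetical enclosing class standing in for something heavy, like a mini cluster.
final class EnclosingCluster {
  private final byte[] state = new byte[1 << 20];  // large state owned by the cluster

  // Inner (non-static) class: with Java 8/11 javac, each instance holds a hidden
  // reference to the EnclosingCluster that created it, keeping that cluster reachable.
  final class InnerProps {
    final String name;

    InnerProps(String name) {
      this.name = name;
    }
  }

  // Static nested class: no hidden reference, so it cannot keep a cluster alive.
  static final class StaticProps {
    final String name;

    StaticProps(String name) {
      this.name = name;
    }
  }

  InnerProps newInnerProps(String name) {
    return new InnerProps(name);
  }

  int stateSize() {
    return state.length;
  }
}
```

`new EnclosingCluster.StaticProps("dn0")` needs no cluster instance at all, whereas an `InnerProps` can only be created through an existing `EnclosingCluster`.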






[jira] [Updated] (HDFS-15926) Removed duplicate dependency of hadoop-annotations

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15926:
--
Component/s: hdfs

> Removed duplicate dependency of hadoop-annotations
> --
>
> Key: HDFS-15926
> URL: https://issues.apache.org/jira/browse/HDFS-15926
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: hdfs
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> hadoop-annotations is a duplicate dependency in hadoop-hdfs, as it is also 
> declared in the parent hadoop-project-dist pom.






[jira] [Updated] (HDFS-15926) Removed duplicate dependency of hadoop-annotations

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15926:
--
Affects Version/s: 3.3.1
   3.4.0

> Removed duplicate dependency of hadoop-annotations
> --
>
> Key: HDFS-15926
> URL: https://issues.apache.org/jira/browse/HDFS-15926
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: hdfs
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> hadoop-annotations is a duplicate dependency in hadoop-hdfs, as it is also 
> declared in the parent hadoop-project-dist pom.






[jira] [Updated] (HDFS-15931) Fix non-static inner classes for better memory management

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15931:
--
Affects Version/s: 3.3.1
   3.4.0

> Fix non-static inner classes for better memory management
> -
>
> Key: HDFS-15931
> URL: https://issues.apache.org/jira/browse/HDFS-15931
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> If an inner class does not need to reference its enclosing instance, it can 
> be static. This prevents a common cause of memory leaks and uses less memory 
> per instance of the inner class.
> I came across DataNodeProperties, a non-static inner class defined in 
> MiniDFSCluster that never uses its implicit reference to MiniDFSCluster. 
> I am taking this opportunity to find other non-static inner classes that do 
> not need the implicit reference to their respective enclosing instances.






[jira] [Updated] (HDFS-15937) Reduce memory used during datanode layout upgrade

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15937:
--
Hadoop Flags: Reviewed

> Reduce memory used during datanode layout upgrade
> -
>
> Key: HDFS-15937
> URL: https://issues.apache.org/jira/browse/HDFS-15937
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0, 3.1.4, 3.2.2, 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
> Attachments: heap-dump-after.png, heap-dump-before.png
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When the datanode block layout is upgraded from -56 (256x256) to -57 (32x32), 
> we found that the datanode uses a lot more memory than usual.
> For each volume, the blocks are scanned and a list is created holding a 
> series of LinkArgs objects. Each object contains a File object for the block 
> source and destination. The File object stores the path as a string, eg:
> /data01/dfs/dn/current/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir0/blk_1073741825_1001.meta
> /data01/dfs/dn/current/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir0/blk_1073741825
> This string is repeated for every block and meta file on the DN, and much 
> of the string is the same each time, so a large amount of memory is used.
> If we change LinkArgs to store:
> * Src Path without the block, eg 
> /data01/dfs/dn/previous.tmp/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir0
> * Dest Path without the block, eg 
> /data01/dfs/dn/current/BP-586623041-127.0.0.1-1617017575175/current/finalized/subdir0/subdir10
> * Block / Meta file name, eg blk_12345678_1001 or blk_12345678_1001.meta
> and then ensure we reuse the same File object for repeated src and dest 
> paths, we can save most of the memory without reworking the logic of the code.
> The current logic walks the source paths recursively, so the src path object 
> can easily be reused.
> For the destination path, there are only 32x32 (1024) distinct paths, so we 
> can simply cache them in a HashMap and look up the reusable object each time.
> I tested locally by generating 100k block files and attempting the layout 
> upgrade. A heap dump showed the 100k blocks using about 140MB of heap, which 
> is close to 1.5GB per 1M blocks.
> After the change outlined above, the same 100k blocks used about 20MB of 
> heap, or 200MB per million blocks.
> A general DN sizing recommendation is 1GB of heap per 1M blocks, so the 
> upgrade should fit within the pre-upgrade heap.
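A minimal sketch of the storage change described above, assuming simplified classes. `LinkArgsSketch` and `DstDirCache` are hypothetical names, not the actual DataNode upgrade code. Each entry keeps references to shared directory `File` objects plus only the short block or meta file name, and the destination directories are interned through a per-root HashMap.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical link request: directory File objects are shared between
// blocks, and only the short block/meta file name is stored per entry.
final class LinkArgsSketch {
  final File srcDir;
  final File dstDir;
  final String blockFile;  // e.g. "blk_12345678_1001" or "blk_12345678_1001.meta"

  LinkArgsSketch(File srcDir, File dstDir, String blockFile) {
    this.srcDir = srcDir;
    this.dstDir = dstDir;
    this.blockFile = blockFile;
  }

  // Full paths are built only when the link is actually made.
  File src() {
    return new File(srcDir, blockFile);
  }

  File dst() {
    return new File(dstDir, blockFile);
  }
}

// Hypothetical cache for one destination root: a single File per distinct
// subdirectory, so at most 32x32 = 1024 entries for the 32x32 layout.
final class DstDirCache {
  private final File root;
  private final Map<String, File> dirs = new HashMap<>();

  DstDirCache(File root) {
    this.root = root;
  }

  File get(String subdirPath) {
    return dirs.computeIfAbsent(subdirPath, p -> new File(root, p));
  }

  int size() {
    return dirs.size();
  }
}
```

The temporary `File` returned by `src()` and `dst()` becomes garbage right after the link call, so it does not add to the long-lived list held during the upgrade.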






[jira] [Updated] (HDFS-15940) Some tests in TestBlockRecovery are consistently failing

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15940:
--
Component/s: test

> Some tests in TestBlockRecovery are consistently failing
> 
>
> Key: HDFS-15940
> URL: https://issues.apache.org/jira/browse/HDFS-15940
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Some long-running tests in TestBlockRecovery are consistently failing. Also, 
> TestBlockRecovery is huge, with many tests; we should move some of the 
> long-running and race-condition-specific tests to a separate class.






[jira] [Updated] (HDFS-15942) Increase Quota initialization threads

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15942:
--
Affects Version/s: 3.3.1
   3.4.0

> Increase Quota initialization threads
> -
>
> Key: HDFS-15942
> URL: https://issues.apache.org/jira/browse/HDFS-15942
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
> Attachments: HDFS-15942.001.patch
>
>
> On large namespaces, the quota initialization at startup can take a long 
> time with the default of 4 threads. Also, on NN failover the quota often 
> needs to be calculated before the failover can complete, delaying the failover.
> Some time back I benchmarked a large image (316M inodes, 35GB on disk); 
> the quota load takes:
> {code}
> quota - 4  threads 39 seconds
> quota - 8  threads 23 seconds
> quota - 12 threads 20 seconds
> quota - 16 threads 15 seconds
> {code}
> As the quota is calculated when the NN is starting up (and hence doing no 
> other work) or at failover time before the new standby becomes active, I 
> think the quota initialization should use as many threads as possible.
> I propose we change the default to 8 or 12 on at least trunk and branch-3.3 
> so we have a better default going forward.
> Has anyone got any other thoughts?
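A minimal sketch of why the thread count matters, assuming the quota pass is a fork/join walk over the directory tree in which independent sibling subtrees are summed concurrently. `DirNode`, `SubtreeUsageTask` and `QuotaInitSketch` are hypothetical names, not NameNode code. We believe, but have not verified here, that the relevant NameNode setting is `dfs.namenode.quota.init-threads`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical directory node: its own usage plus child directories.
final class DirNode {
  final long usage;
  final List<DirNode> children = new ArrayList<>();

  DirNode(long usage) {
    this.usage = usage;
  }
}

// Sums usage over a subtree, forking one subtask per child directory.
final class SubtreeUsageTask extends RecursiveTask<Long> {
  private final DirNode node;

  SubtreeUsageTask(DirNode node) {
    this.node = node;
  }

  @Override
  protected Long compute() {
    List<SubtreeUsageTask> subtasks = new ArrayList<>();
    for (DirNode child : node.children) {
      SubtreeUsageTask t = new SubtreeUsageTask(child);
      t.fork();  // idle pool threads can steal sibling subtrees
      subtasks.add(t);
    }
    long total = node.usage;
    for (SubtreeUsageTask t : subtasks) {
      total += t.join();
    }
    return total;
  }
}

final class QuotaInitSketch {
  // Computes the total with a pool whose parallelism is the configured thread count.
  static long totalUsage(DirNode root, int threads) {
    ForkJoinPool pool = new ForkJoinPool(threads);
    try {
      return pool.invoke(new SubtreeUsageTask(root));
    } finally {
      pool.shutdown();
    }
  }
}
```

The result is the same for any pool size; only the number of subtrees processed at once changes, which is why raising the default is a low-risk change when the NN is otherwise idle.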






[jira] [Updated] (HDFS-15977) Call explicit_bzero only if it is available

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15977:
--
Hadoop Flags: Reviewed

> Call explicit_bzero only if it is available
> ---
>
> Key: HDFS-15977
> URL: https://issues.apache.org/jira/browse/HDFS-15977
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs++
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> CentOS/RHEL 7 ships glibc 2.17, which does not support explicit_bzero. For 
> now I don't want to drop support for CentOS/RHEL 7, so we should call 
> explicit_bzero only if it is available. 






[jira] [Updated] (HDFS-15989) Split TestBalancer into two classes

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15989:
--
Component/s: balancer
 test

> Split TestBalancer into two classes
> ---
>
> Key: HDFS-15989
> URL: https://issues.apache.org/jira/browse/HDFS-15989
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: balancer, test
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> TestBalancer has accumulated many tests, so it would be good to split it 
> into two classes. Moreover, TestBalancer#testMaxIterationTime is flaky; we 
> should also resolve that in this Jira.






[jira] [Updated] (HDFS-15989) Split TestBalancer into two classes

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15989:
--
Affects Version/s: 3.3.1
   3.4.0

> Split TestBalancer into two classes
> ---
>
> Key: HDFS-15989
> URL: https://issues.apache.org/jira/browse/HDFS-15989
> Project: Hadoop HDFS
>  Issue Type: Task
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> TestBalancer has accumulated many tests, so it would be good to split it into 
> two classes. Moreover, TestBalancer#testMaxIterationTime is flaky, and we 
> should also fix it in this Jira.



[jira] [Updated] (HDFS-15977) Call explicit_bzero only if it is available

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15977:
--
Affects Version/s: 3.3.2
   3.4.0

> Call explicit_bzero only if it is available
> ---
>
> Key: HDFS-15977
> URL: https://issues.apache.org/jira/browse/HDFS-15977
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs++
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> CentOS/RHEL 7 ships glibc 2.17, which does not support explicit_bzero. For 
> now, I don't want to drop support for CentOS/RHEL 7, so we should call 
> explicit_bzero only if it is available.



[jira] [Updated] (HDFS-15989) Split TestBalancer into two classes

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15989:
--
Hadoop Flags: Reviewed
Target Version/s: 3.2.3, 3.3.1, 3.4.0  (was: 3.3.1, 3.4.0, 3.2.3)

> Split TestBalancer into two classes
> ---
>
> Key: HDFS-15989
> URL: https://issues.apache.org/jira/browse/HDFS-15989
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: balancer, test
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> TestBalancer has accumulated many tests, so it would be good to split it into 
> two classes. Moreover, TestBalancer#testMaxIterationTime is flaky, and we 
> should also fix it in this Jira.



[jira] [Updated] (HDFS-16001) TestOfflineEditsViewer.testStored() fails reading negative value of FSEditLogOpCodes

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16001:
--
Affects Version/s: 3.3.1
   3.4.0

> TestOfflineEditsViewer.testStored() fails reading negative value of 
> FSEditLogOpCodes
> 
>
> Key: HDFS-16001
> URL: https://issues.apache.org/jira/browse/HDFS-16001
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Konstantin Shvachko
>Assignee: Akira Ajisaka
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> {{TestOfflineEditsViewer.testStored()}} fails consistently with an exception
> {noformat}
> java.io.IOException: Op -54 has size -1314247195, but the minimum op size is 
> 17
> {noformat}
> It seems there is a corrupt record in the {{editsStored}} file.



[jira] [Updated] (HDFS-16007) Deserialization of ReplicaState should avoid throwing ArrayIndexOutOfBoundsException

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16007:
--
Affects Version/s: 3.3.1
   3.4.0

> Deserialization of ReplicaState should avoid throwing 
> ArrayIndexOutOfBoundsException
> 
>
> Key: HDFS-16007
> URL: https://issues.apache.org/jira/browse/HDFS-16007
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.3.1, 3.4.0
>Reporter: junwen yang
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The ReplicaState enum uses its ordinal for serialization and 
> deserialization, which makes it sensitive to the order of its constants and 
> can cause issues similar to HDFS-15624.
> To avoid this, either add comments telling later developers not to change 
> this enum, or add index checking in the read and getState functions to avoid 
> an index-out-of-bounds error.
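A minimal Java sketch of the second option (index checking in read and getState), assuming a hypothetical `ReplicaStateSketch` enum whose constant names are modeled on, but not copied from, Hadoop's ReplicaState. The ordinal read from the stream is range-checked before it is used as an array index, so a corrupt value surfaces as a descriptive IOException rather than an ArrayIndexOutOfBoundsException.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical enum illustrating ordinal-based encoding with a bounds check;
// the constant names are assumptions, not Hadoop's actual source.
enum ReplicaStateSketch {
  FINALIZED, RBW, RWR, RUR, TEMPORARY;

  // values() allocates a new array on every call, so cache it once.
  private static final ReplicaStateSketch[] CACHED_VALUES = values();

  // Maps an ordinal back to a constant, rejecting out-of-range input with a
  // descriptive IOException instead of ArrayIndexOutOfBoundsException.
  static ReplicaStateSketch getState(int v) throws IOException {
    if (v < 0 || v >= CACHED_VALUES.length) {
      throw new IOException("Invalid ReplicaState ordinal " + v
          + ", expected 0 to " + (CACHED_VALUES.length - 1));
    }
    return CACHED_VALUES[v];
  }

  // Serializes by ordinal: reordering the constants changes the wire format.
  void write(DataOutputStream out) throws IOException {
    out.writeByte(ordinal());
  }

  // Deserializes through the checked lookup above.
  static ReplicaStateSketch read(DataInputStream in) throws IOException {
    return getState(in.readByte());
  }
}
```

The comment-based option is still worth keeping alongside the check: the check only catches out-of-range ordinals, while a reordered enum would silently map valid ordinals to the wrong constants.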



[jira] [Updated] (HDFS-16014) Fix an issue in checking native pmdk lib by 'hadoop checknative' command

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16014:
--
Hadoop Flags: Reviewed
Target Version/s: 3.2.4, 3.4.0  (was: 3.4.0, 3.2.4)

> Fix an issue in checking native pmdk lib by 'hadoop checknative' command
> 
>
> Key: HDFS-16014
> URL: https://issues.apache.org/jira/browse/HDFS-16014
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: native
>Affects Versions: 3.4.0
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-16014-01.patch, HDFS-16014-02.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> In HDFS-14818, we proposed a patch to support checking the native pmdk lib. 
> The goal was to show users whether the pmdk lib is loaded. Recently, we found 
> that the `hadoop checknative` command reports the pmdk lib as loaded even when 
> it was not actually loaded. This issue can be reproduced, after the project is 
> built, by moving libpmem.so* from the specified install path to another place 
> or by deleting these libs.
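To illustrate the principle of reporting the real load state, here is a hypothetical Java helper that derives the status from the outcome of the load attempt itself. It uses `System.loadLibrary` as an assumption for illustration only; it does not reflect how Hadoop itself locates libpmem or implements `hadoop checknative`.

```java
// Hypothetical probe: the reported status comes from the actual outcome of
// the load attempt, not from whether support was compiled in.
final class NativeLibProbe {
  private NativeLibProbe() {}

  // Returns true only if System.loadLibrary succeeded for libName.
  static boolean tryLoad(String libName) {
    try {
      System.loadLibrary(libName);
      return true;
    } catch (UnsatisfiedLinkError | SecurityException e) {
      // Missing or unloadable library, e.g. libpmem.so* moved or deleted.
      return false;
    }
  }
}
```

On Linux, `NativeLibProbe.tryLoad("pmem")` looks for `libpmem.so` on `java.library.path`, so moving or deleting that file, as in the reproduction steps above, makes it return false.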



[jira] [Updated] (HDFS-16007) Deserialization of ReplicaState should avoid throwing ArrayIndexOutOfBoundsException

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16007:
--
Component/s: hdfs

> Deserialization of ReplicaState should avoid throwing 
> ArrayIndexOutOfBoundsException
> 
>
> Key: HDFS-16007
> URL: https://issues.apache.org/jira/browse/HDFS-16007
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.3.1, 3.4.0
>Reporter: junwen yang
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The ReplicaState enum uses its ordinal for serialization and 
> deserialization, which makes it sensitive to the order of its constants and 
> can cause issues similar to HDFS-15624.
> To avoid this, either add comments telling later developers not to change 
> this enum, or add index checking in the read and getState functions to avoid 
> an index-out-of-bounds error.



[jira] [Updated] (HDFS-16046) TestBalanceProcedureScheduler and TestDistCpProcedure timeout

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16046:
--
Hadoop Flags: Reviewed

> TestBalanceProcedureScheduler and TestDistCpProcedure timeout
> -
>
> Key: HDFS-16046
> URL: https://issues.apache.org/jira/browse/HDFS-16046
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, test
>Affects Versions: 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2021-05-28-11-41-16-733.png, screenshot-1.png, 
> screenshot-2.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The following two tests timed out frequently in the qbt job.
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/520/testReport/org.apache.hadoop.tools.fedbalance.procedure/TestBalanceProcedureScheduler/testSchedulerDownAndRecoverJob/]
> {quote}org.junit.runners.model.TestTimedOutException: test timed out after 
> 6 milliseconds
>  at java.lang.Object.wait(Native Method)
>  at java.lang.Object.wait(Object.java:502)
>  at 
> org.apache.hadoop.tools.fedbalance.procedure.BalanceJob.waitJobDone(BalanceJob.java:220)
>  at 
> org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedureScheduler.waitUntilDone(BalanceProcedureScheduler.java:189)
>  at 
> org.apache.hadoop.tools.fedbalance.procedure.TestBalanceProcedureScheduler.testSchedulerDownAndRecoverJob(TestBalanceProcedureScheduler.java:331)
> {quote}
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/520/testReport/org.apache.hadoop.tools.fedbalance/TestDistCpProcedure/testSuccessfulDistCpProcedure/]
> {quote}org.junit.runners.model.TestTimedOutException: test timed out after 
> 3 milliseconds
>  at java.lang.Object.wait(Native Method)
>  at java.lang.Object.wait(Object.java:502)
>  at 
> org.apache.hadoop.tools.fedbalance.procedure.BalanceJob.waitJobDone(BalanceJob.java:220)
>  at 
> org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedureScheduler.waitUntilDone(BalanceProcedureScheduler.java:189)
>  at 
> org.apache.hadoop.tools.fedbalance.TestDistCpProcedure.testSuccessfulDistCpProcedure(TestDistCpProcedure.java:121)
> {quote}



[jira] [Updated] (HDFS-16046) TestBalanceProcedureScheduler and TestDistCpProcedure timeout

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16046:
--
Affects Version/s: 3.4.0

> TestBalanceProcedureScheduler and TestDistCpProcedure timeout
> -
>
> Key: HDFS-16046
> URL: https://issues.apache.org/jira/browse/HDFS-16046
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, test
>Affects Versions: 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2021-05-28-11-41-16-733.png, screenshot-1.png, 
> screenshot-2.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The following two tests timed out frequently in the qbt job.
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/520/testReport/org.apache.hadoop.tools.fedbalance.procedure/TestBalanceProcedureScheduler/testSchedulerDownAndRecoverJob/]
> {quote}org.junit.runners.model.TestTimedOutException: test timed out after 
> 6 milliseconds
>  at java.lang.Object.wait(Native Method)
>  at java.lang.Object.wait(Object.java:502)
>  at 
> org.apache.hadoop.tools.fedbalance.procedure.BalanceJob.waitJobDone(BalanceJob.java:220)
>  at 
> org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedureScheduler.waitUntilDone(BalanceProcedureScheduler.java:189)
>  at 
> org.apache.hadoop.tools.fedbalance.procedure.TestBalanceProcedureScheduler.testSchedulerDownAndRecoverJob(TestBalanceProcedureScheduler.java:331)
> {quote}
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/520/testReport/org.apache.hadoop.tools.fedbalance/TestDistCpProcedure/testSuccessfulDistCpProcedure/]
> {quote}org.junit.runners.model.TestTimedOutException: test timed out after 
> 3 milliseconds
>  at java.lang.Object.wait(Native Method)
>  at java.lang.Object.wait(Object.java:502)
>  at 
> org.apache.hadoop.tools.fedbalance.procedure.BalanceJob.waitJobDone(BalanceJob.java:220)
>  at 
> org.apache.hadoop.tools.fedbalance.procedure.BalanceProcedureScheduler.waitUntilDone(BalanceProcedureScheduler.java:189)
>  at 
> org.apache.hadoop.tools.fedbalance.TestDistCpProcedure.testSuccessfulDistCpProcedure(TestDistCpProcedure.java:121)
> {quote}



[jira] [Updated] (HDFS-16050) Some dynamometer tests fail

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16050:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.2, 3.4.0  (was: 3.4.0, 3.3.2)

> Some dynamometer tests fail
> ---
>
> Key: HDFS-16050
> URL: https://issues.apache.org/jira/browse/HDFS-16050
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The following tests failed:
> {quote}hadoop.tools.dynamometer.TestDynamometerInfra
>  hadoop.tools.dynamometer.blockgenerator.TestBlockGen
> hadoop.tools.dynamometer.workloadgenerator.TestWorkloadGenerator
> {quote}
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/523/artifact/out/patch-unit-hadoop-tools_hadoop-dynamometer.txt]
> {quote}[ERROR] 
> testAuditWorkloadDirectParserWithOutput(org.apache.hadoop.tools.dynamometer.workloadgenerator.TestWorkloadGenerator)
>  Time elapsed: 1.353 s <<< ERROR!
>  java.lang.NoClassDefFoundError: org/mockito/stubbing/Answer
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.isNameNodeUp(MiniDFSCluster.java:2618)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.isClusterUp(MiniDFSCluster.java:2632)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1498)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:977)
>  at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:576)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:518)
> {quote}



[jira] [Updated] (HDFS-16050) Some dynamometer tests fail

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16050:
--
Affects Version/s: 3.3.2
   3.4.0

> Some dynamometer tests fail
> ---
>
> Key: HDFS-16050
> URL: https://issues.apache.org/jira/browse/HDFS-16050
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The following tests failed:
> {quote}hadoop.tools.dynamometer.TestDynamometerInfra
>  hadoop.tools.dynamometer.blockgenerator.TestBlockGen
> hadoop.tools.dynamometer.workloadgenerator.TestWorkloadGenerator
> {quote}
> [https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/523/artifact/out/patch-unit-hadoop-tools_hadoop-dynamometer.txt]
> {quote}[ERROR] 
> testAuditWorkloadDirectParserWithOutput(org.apache.hadoop.tools.dynamometer.workloadgenerator.TestWorkloadGenerator)
>  Time elapsed: 1.353 s <<< ERROR!
>  java.lang.NoClassDefFoundError: org/mockito/stubbing/Answer
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.isNameNodeUp(MiniDFSCluster.java:2618)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.isClusterUp(MiniDFSCluster.java:2632)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1498)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:977)
>  at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:576)
>  at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:518)
> {quote}



[jira] [Updated] (HDFS-16075) Use empty array constants present in StorageType and DatanodeInfo to avoid creating redundant objects

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16075:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.2, 3.4.0  (was: 3.4.0, 3.3.2)

> Use empty array constants present in StorageType and DatanodeInfo to avoid 
> creating redundant objects
> -
>
> Key: HDFS-16075
> URL: https://issues.apache.org/jira/browse/HDFS-16075
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> StorageType and DatanodeInfo already provide empty array constants. We 
> should use them where possible to avoid creating unnecessary new empty array 
> objects.
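A minimal sketch of the pattern, using a hypothetical `NodeRecord` class in place of StorageType or DatanodeInfo (whose actual constant declarations are not shown in this issue). A zero-length array has no slots to modify, so a single static instance can safely be returned wherever an empty result is needed.

```java
import java.util.List;

// Hypothetical element type standing in for StorageType or DatanodeInfo.
final class NodeRecord {
  // Shared, effectively immutable empty array.
  static final NodeRecord[] EMPTY_ARRAY = {};
}

final class EmptyArrays {
  private EmptyArrays() {}

  // Returns the shared constant for an empty list instead of allocating
  // "new NodeRecord[0]" on every call.
  static NodeRecord[] toArray(List<NodeRecord> records) {
    if (records.isEmpty()) {
      return NodeRecord.EMPTY_ARRAY;
    }
    // For a non-empty list, toArray allocates a correctly sized array.
    return records.toArray(NodeRecord.EMPTY_ARRAY);
  }
}
```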



[jira] [Updated] (HDFS-16080) RBF: Invoking method in all locations should break the loop after successful result

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16080:
--
Component/s: rbf

> RBF: Invoking method in all locations should break the loop after successful 
> result
> ---
>
> Key: HDFS-16080
> URL: https://issues.apache.org/jira/browse/HDFS-16080
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The rename, delete and mkdir operations used by the Router client usually call 
> multiple locations if the path is present in multiple sub-clusters. After 
> invoking concurrent proxy calls to multiple clients, we iterate through all 
> results and set anyResult to true if at least one of them was successful. We 
> should break out of the loop as soon as one proxy call result is successful, 
> rather than iterating over the remaining results.
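A minimal sketch of the early exit, assuming a hypothetical `CallResult` type; the Router's actual result types and method signatures are not shown in the issue.

```java
import java.util.List;

final class AnyResultSketch {
  private AnyResultSketch() {}

  // Hypothetical stand-in for one location's proxy call outcome.
  static final class CallResult {
    final boolean success;

    CallResult(boolean success) {
      this.success = success;
    }
  }

  // Sets anyResult and stops at the first successful result instead of
  // iterating over the remaining ones.
  static boolean anyResult(List<CallResult> results) {
    boolean anyResult = false;
    for (CallResult result : results) {
      if (result.success) {
        anyResult = true;
        break;
      }
    }
    return anyResult;
  }
}
```

The concurrent calls have already completed at this point, so the break only saves the iteration, not the RPCs themselves.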



[jira] [Updated] (HDFS-16075) Use empty array constants present in StorageType and DatanodeInfo to avoid creating redundant objects

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16075:
--
Component/s: hdfs

> Use empty array constants present in StorageType and DatanodeInfo to avoid 
> creating redundant objects
> -
>
> Key: HDFS-16075
> URL: https://issues.apache.org/jira/browse/HDFS-16075
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> StorageType and DatanodeInfo already provide empty array constants. We 
> should use them where possible to avoid creating unnecessary new empty array 
> objects.



[jira] [Updated] (HDFS-16075) Use empty array constants present in StorageType and DatanodeInfo to avoid creating redundant objects

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16075:
--
Affects Version/s: 3.3.2
   3.4.0

> Use empty array constants present in StorageType and DatanodeInfo to avoid 
> creating redundant objects
> -
>
> Key: HDFS-16075
> URL: https://issues.apache.org/jira/browse/HDFS-16075
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> StorageType and DatanodeInfo already provide empty array constants. We 
> should use them where possible to avoid creating unnecessary new empty array 
> objects.



[jira] [Updated] (HDFS-16080) RBF: Invoking method in all locations should break the loop after successful result

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16080:
--
Affects Version/s: 3.3.2
   3.4.0

> RBF: Invoking method in all locations should break the loop after successful 
> result
> ---
>
> Key: HDFS-16080
> URL: https://issues.apache.org/jira/browse/HDFS-16080
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The rename, delete and mkdir operations used by the Router client usually call 
> multiple locations if the path is present in multiple sub-clusters. After 
> invoking concurrent proxy calls to multiple clients, we iterate through all 
> results and set anyResult to true if at least one of them was successful. We 
> should break out of the loop as soon as one proxy call result is successful, 
> rather than iterating over the remaining results.



[jira] [Updated] (HDFS-16090) Fine grained locking for datanodeNetworkCounts

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16090:
--
Component/s: datanode

> Fine grained locking for datanodeNetworkCounts
> --
>
> Key: HDFS-16090
> URL: https://issues.apache.org/jira/browse/HDFS-16090
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> While incrementing the DataNode network error count, we lock the entire 
> LoadingCache in order to increment the network count of a specific host. We 
> should provide fine-grained concurrency for this update, because locking the 
> entire cache is unnecessary and could hurt performance when incrementing 
> network counts for multiple hosts.
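A minimal sketch of per-host updates without a cache-wide lock. It swaps in `ConcurrentHashMap` with `LongAdder` counters as an assumption; it is not the LoadingCache-based code the issue refers to, and not necessarily the committed fix. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-host error counters: each update touches only the entry
// for one host, so updates for different hosts do not block each other.
final class NetworkErrorCounts {
  private final ConcurrentHashMap<String, ConcurrentHashMap<String, LongAdder>>
      counts = new ConcurrentHashMap<>();

  void increment(String host, String errorType) {
    // computeIfAbsent creates each map/adder at most once per key;
    // LongAdder.increment() itself needs no lock.
    counts.computeIfAbsent(host, h -> new ConcurrentHashMap<>())
        .computeIfAbsent(errorType, t -> new LongAdder())
        .increment();
  }

  long get(String host, String errorType) {
    Map<String, LongAdder> perHost = counts.get(host);
    if (perHost == null) {
      return 0L;
    }
    LongAdder adder = perHost.get(errorType);
    return adder == null ? 0L : adder.sum();
  }
}
```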



[jira] [Updated] (HDFS-16082) Avoid non-atomic operations on exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance in Balancer

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16082:
--
Component/s: balancer

> Avoid non-atomic operations on exceptionsSinceLastBalance and 
> failedTimesSinceLastSuccessfulBalance in Balancer
> ---
>
> Key: HDFS-16082
> URL: https://issues.apache.org/jira/browse/HDFS-16082
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> HDFS-13783 introduced two volatile ints in Balancer, namely 
> exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance. 
> However, we perform non-atomic operations on them. Since these operations 
> mostly depend on the previous values, we should use AtomicInteger for both.
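A minimal sketch of why AtomicInteger helps, using a hypothetical `BalancerCountersSketch` class that only borrows the two field names from the issue. `volatile` guarantees that other threads see the latest write, but `count++` on a volatile int is still three separate steps (read, add, write), so two threads can read the same value and one increment is lost; `incrementAndGet()` performs the whole update atomically.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder named after the Balancer fields in the issue.
final class BalancerCountersSketch {
  private final AtomicInteger exceptionsSinceLastBalance = new AtomicInteger();
  private final AtomicInteger failedTimesSinceLastSuccessfulBalance =
      new AtomicInteger();

  // Each increment is a single atomic read-modify-write.
  int recordException() {
    return exceptionsSinceLastBalance.incrementAndGet();
  }

  int recordFailedBalance() {
    return failedTimesSinceLastSuccessfulBalance.incrementAndGet();
  }

  int exceptions() {
    return exceptionsSinceLastBalance.get();
  }

  int failedTimes() {
    return failedTimesSinceLastSuccessfulBalance.get();
  }
}
```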



[jira] [Updated] (HDFS-16082) Avoid non-atomic operations on exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance in Balancer

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16082:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.2, 3.4.0  (was: 3.4.0, 3.3.2)

> Avoid non-atomic operations on exceptionsSinceLastBalance and 
> failedTimesSinceLastSuccessfulBalance in Balancer
> ---
>
> Key: HDFS-16082
> URL: https://issues.apache.org/jira/browse/HDFS-16082
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> HDFS-13783 introduced two volatile ints in Balancer, namely 
> exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance. 
> However, we perform non-atomic operations on them. Since these operations 
> mostly depend on the previous values, we should use AtomicInteger for both.



[jira] [Updated] (HDFS-16082) Avoid non-atomic operations on exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance in Balancer

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16082:
--
Affects Version/s: 3.3.2
   3.4.0

> Avoid non-atomic operations on exceptionsSinceLastBalance and 
> failedTimesSinceLastSuccessfulBalance in Balancer
> ---
>
> Key: HDFS-16082
> URL: https://issues.apache.org/jira/browse/HDFS-16082
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> HDFS-13783 introduced two volatile ints in Balancer, namely 
> exceptionsSinceLastBalance and failedTimesSinceLastSuccessfulBalance. 
> However, we perform non-atomic operations on them. Since these operations 
> mostly depend on the previous values, we should use AtomicInteger for both.



[jira] [Updated] (HDFS-16090) Fine grained locking for datanodeNetworkCounts

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16090:
--
Affects Version/s: 3.3.2
   3.4.0

> Fine grained locking for datanodeNetworkCounts
> --
>
> Key: HDFS-16090
> URL: https://issues.apache.org/jira/browse/HDFS-16090
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> While incrementing the DataNode network error count, we lock the entire 
> LoadingCache in order to increment the network count of a specific host. We 
> should provide fine-grained concurrency for this update, because locking the 
> entire cache is unnecessary and could hurt performance when incrementing 
> network counts for multiple hosts.



[jira] [Updated] (HDFS-16092) Avoid creating LayoutFlags redundant objects

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16092:
--
Affects Version/s: 3.3.2
   3.4.0

> Avoid creating LayoutFlags redundant objects
> 
>
> Key: HDFS-16092
> URL: https://issues.apache.org/jira/browse/HDFS-16092
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We use LayoutFlags to represent features that EditLog/FSImage can support. 
> The utility writes an int (0) to the given OutputStream, and if 
> EditLog/FSImage supports layout flags, the value is read back from the 
> InputStream to confirm whether there are unsupported feature flags (a non-zero 
> int). However, LayoutFlags#read also creates and returns a new LayoutFlags 
> object, which is not used anywhere because LayoutFlags is just a utility for 
> reading from and writing to a given stream. We should stop creating such 
> redundant objects when reading from an InputStream with the LayoutFlags#read 
> utility.
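A minimal sketch of a stateless read/write utility, assuming a hypothetical `LayoutFlagsSketch` class rather than Hadoop's actual LayoutFlags source: `read` validates the flags and returns nothing, so no object is allocated per call. Treating any non-zero int as unsupported follows the description above; the real class may distinguish further cases.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stateless utility in the spirit of LayoutFlags#read/write.
final class LayoutFlagsSketch {
  private LayoutFlagsSketch() {} // utility class: never instantiated

  // Writes a zero int, meaning no feature flags are set.
  static void write(DataOutputStream out) throws IOException {
    out.writeInt(0);
  }

  // Reads and validates the flags; returns void because callers never used
  // a returned LayoutFlags object.
  static void read(DataInputStream in) throws IOException {
    int flags = in.readInt();
    if (flags != 0) {
      throw new IOException("Found unsupported layout feature flags: " + flags);
    }
  }
}
```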



[jira] [Updated] (HDFS-16092) Avoid creating LayoutFlags redundant objects

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16092:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.2, 3.2.3, 3.4.0  (was: 3.4.0, 3.2.3, 3.3.2)

> Avoid creating LayoutFlags redundant objects
> 
>
> Key: HDFS-16092
> URL: https://issues.apache.org/jira/browse/HDFS-16092
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We use LayoutFlags to represent features that the EditLog/FSImage can support.
> The utility writes an int (0) to the given OutputStream, and if the
> EditLog/FSImage supports layout flags, the value is read back from the
> InputStream to confirm that there are no unsupported feature flags (a non-zero
> int). However, we also create and return a new LayoutFlags object, which is not
> used anywhere because LayoutFlags is just a utility for writing to and reading
> from the given stream. We should stop creating these redundant objects when
> reading from an InputStream with the LayoutFlags#read utility.






[jira] [Updated] (HDFS-16092) Avoid creating LayoutFlags redundant objects

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16092:
--
Component/s: hdfs

> Avoid creating LayoutFlags redundant objects
> 
>
> Key: HDFS-16092
> URL: https://issues.apache.org/jira/browse/HDFS-16092
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We use LayoutFlags to represent features that the EditLog/FSImage can support.
> The utility writes an int (0) to the given OutputStream, and if the
> EditLog/FSImage supports layout flags, the value is read back from the
> InputStream to confirm that there are no unsupported feature flags (a non-zero
> int). However, we also create and return a new LayoutFlags object, which is not
> used anywhere because LayoutFlags is just a utility for writing to and reading
> from the given stream. We should stop creating these redundant objects when
> reading from an InputStream with the LayoutFlags#read utility.






[jira] [Updated] (HDFS-16127) Improper pipeline close recovery causes a permanent write failure or data loss.

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16127:
--
Component/s: hdfs

> Improper pipeline close recovery causes a permanent write failure or data 
> loss.
> ---
>
> Key: HDFS-16127
> URL: https://issues.apache.org/jira/browse/HDFS-16127
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Major
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: HDFS-16127.patch
>
>
> When a block is being closed, the data streamer in the client waits for the
> final ACK to be delivered. If an exception is received during this wait, the
> close is retried. HDFS-15813 invalidated the assumption behind this retry,
> resulting in permanent write failures in some close error cases involving slow
> nodes. There are also less frequent cases of data loss.






[jira] [Updated] (HDFS-16127) Improper pipeline close recovery causes a permanent write failure or data loss.

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16127:
--
Affects Version/s: 3.3.2
   3.4.0

> Improper pipeline close recovery causes a permanent write failure or data 
> loss.
> ---
>
> Key: HDFS-16127
> URL: https://issues.apache.org/jira/browse/HDFS-16127
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
>Priority: Major
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: HDFS-16127.patch
>
>
> When a block is being closed, the data streamer in the client waits for the
> final ACK to be delivered. If an exception is received during this wait, the
> close is retried. HDFS-15813 invalidated the assumption behind this retry,
> resulting in permanent write failures in some close error cases involving slow
> nodes. There are also less frequent cases of data loss.






[jira] [Updated] (HDFS-16140) TestBootstrapAliasmap fails by BindException

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16140:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.2, 3.4.0  (was: 3.4.0, 3.3.2)

> TestBootstrapAliasmap fails by BindException
> 
>
> Key: HDFS-16140
> URL: https://issues.apache.org/jira/browse/HDFS-16140
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> TestBootstrapAliasmap fails if port 50200 is already in use.
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3227/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
> {quote}
> [ERROR] testAliasmapBootstrap(org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapAliasmap)  Time elapsed: 0.472 s  <<< ERROR!
> java.net.BindException: Problem binding to [0.0.0.0:50200] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:914)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:810)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:642)
>   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1301)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:3199)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:1062)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server.<init>(ProtobufRpcEngine2.java:464)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine2.getServer(ProtobufRpcEngine2.java:371)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:853)
>   at org.apache.hadoop.hdfs.server.aliasmap.InMemoryLevelDBAliasMapServer.start(InMemoryLevelDBAliasMapServer.java:98)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.startAliasMapServerIfNecessary(NameNode.java:801)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:761)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1014)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:989)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1763)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1378)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:1147)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:1020)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:952)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:576)
>   at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:518)
>   at org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapAliasmap.setup(TestBootstrapAliasmap.java:56)
> {quote}
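One common way to keep a test from colliding with a fixed port such as 50200 is to let the operating system pick a free port by binding to port 0. The sketch below assumes the test can feed the returned port into its configuration. EphemeralPorts is a hypothetical helper, and this message does not say which approach the fix used. Binding the server itself to port 0 is safer still, because it avoids the short window in which another process could take the port after this probe socket closes.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical test helper; not part of MiniDFSCluster or the HDFS-16140 fix.
final class EphemeralPorts {
  private EphemeralPorts() {}

  /**
   * Binds a throwaway socket to port 0 so the OS assigns a free port, then
   * returns that port number. The port is released when the socket closes,
   * so there is a small race before the server under test binds it.
   */
  static int findFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket()) {
      socket.setReuseAddress(true);
      socket.bind(new InetSocketAddress("127.0.0.1", 0));
      return socket.getLocalPort();
    }
  }
}
```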






[jira] [Updated] (HDFS-16139) Update BPServiceActor Scheduler's nextBlockReportTime atomically

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16139:
--
Component/s: datanode

> Update BPServiceActor Scheduler's nextBlockReportTime atomically
> 
>
> Key: HDFS-16139
> URL: https://issues.apache.org/jira/browse/HDFS-16139
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: datanode
>Affects Versions: 3.4.0, 3.3.5
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> BPServiceActor#Scheduler's nextBlockReportTime can be assigned and read by
> test threads (through BPServiceActor#triggerXXX) as well as by actor threads,
> which is why it is declared volatile. However, it is still updated
> non-atomically (a compound += is a separate read and write), e.g.:
> {code:java}
> if (factor != 0) {
>   nextBlockReportTime += factor * blockReportIntervalMs;
> } else {
>   // If the difference between the present time and the scheduled
>   // time is very less, the factor can be 0, so in that case, we can
>   // ignore that negligible time, spent while sending the BRss and
>   // schedule the next BR after the blockReportInterval.
>   nextBlockReportTime += blockReportIntervalMs;
> }
> {code}
> We should convert it to an AtomicLong so that concurrent updates are applied
> atomically.
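This is a minimal sketch of the proposed change. It assumes a simplified scheduler that keeps only the interval and the next report time. BlockReportSchedulerSketch and its methods are hypothetical and are not the BPServiceActor#Scheduler API. AtomicLong#addAndGet performs the read-modify-write as one atomic step. As a result, two concurrent increments cannot lose an update the way a compound += on a volatile field can.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical simplification of a block-report scheduler; not BPServiceActor.
class BlockReportSchedulerSketch {
  private final long blockReportIntervalMs;
  // AtomicLong makes each update a single atomic read-modify-write.
  private final AtomicLong nextBlockReportTime;

  BlockReportSchedulerSketch(long startTimeMs, long blockReportIntervalMs) {
    this.blockReportIntervalMs = blockReportIntervalMs;
    this.nextBlockReportTime = new AtomicLong(startTimeMs);
  }

  /** Advances the schedule; a factor of 0 advances by one interval. */
  long scheduleNext(long factor) {
    long delta = (factor != 0 ? factor : 1) * blockReportIntervalMs;
    // Unlike "x += delta" on a volatile long, addAndGet cannot lose an update.
    return nextBlockReportTime.addAndGet(delta);
  }

  /** Lets another thread (e.g. a test trigger) overwrite the time atomically. */
  void forceNextReportAt(long timeMs) {
    nextBlockReportTime.set(timeMs);
  }

  long getNextBlockReportTime() {
    return nextBlockReportTime.get();
  }
}
```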






[jira] [Updated] (HDFS-16139) Update BPServiceActor Scheduler's nextBlockReportTime atomically

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16139:
--
Affects Version/s: 3.3.5
   3.4.0

> Update BPServiceActor Scheduler's nextBlockReportTime atomically
> 
>
> Key: HDFS-16139
> URL: https://issues.apache.org/jira/browse/HDFS-16139
> Project: Hadoop HDFS
>  Issue Type: Task
>Affects Versions: 3.4.0, 3.3.5
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> BPServiceActor#Scheduler's nextBlockReportTime can be assigned and read by
> test threads (through BPServiceActor#triggerXXX) as well as by actor threads,
> which is why it is declared volatile. However, it is still updated
> non-atomically (a compound += is a separate read and write), e.g.:
> {code:java}
> if (factor != 0) {
>   nextBlockReportTime += factor * blockReportIntervalMs;
> } else {
>   // If the difference between the present time and the scheduled
>   // time is very less, the factor can be 0, so in that case, we can
>   // ignore that negligible time, spent while sending the BRss and
>   // schedule the next BR after the blockReportInterval.
>   nextBlockReportTime += blockReportIntervalMs;
> }
> {code}
> We should convert it to an AtomicLong so that concurrent updates are applied
> atomically.






[jira] [Updated] (HDFS-16140) TestBootstrapAliasmap fails by BindException

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16140:
--
Affects Version/s: 3.3.2
   3.4.0

> TestBootstrapAliasmap fails by BindException
> 
>
> Key: HDFS-16140
> URL: https://issues.apache.org/jira/browse/HDFS-16140
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> TestBootstrapAliasmap fails if port 50200 is already in use.
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3227/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
> {quote}
> [ERROR] testAliasmapBootstrap(org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapAliasmap)  Time elapsed: 0.472 s  <<< ERROR!
> java.net.BindException: Problem binding to [0.0.0.0:50200] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:914)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:810)
>   at org.apache.hadoop.ipc.Server.bind(Server.java:642)
>   at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:1301)
>   at org.apache.hadoop.ipc.Server.<init>(Server.java:3199)
>   at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:1062)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server.<init>(ProtobufRpcEngine2.java:464)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine2.getServer(ProtobufRpcEngine2.java:371)
>   at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:853)
>   at org.apache.hadoop.hdfs.server.aliasmap.InMemoryLevelDBAliasMapServer.start(InMemoryLevelDBAliasMapServer.java:98)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.startAliasMapServerIfNecessary(NameNode.java:801)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:761)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1014)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:989)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1763)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1378)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:1147)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:1020)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:952)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:576)
>   at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:518)
>   at org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapAliasmap.setup(TestBootstrapAliasmap.java:56)
> {quote}






[jira] [Updated] (HDFS-16143) TestEditLogTailer#testStandbyTriggersLogRollsWhenTailInProgressEdits is flaky

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16143:
--
Affects Version/s: 3.4.0

> TestEditLogTailer#testStandbyTriggersLogRollsWhenTailInProgressEdits is flaky
> -
>
> Key: HDFS-16143
> URL: https://issues.apache.org/jira/browse/HDFS-16143
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3229/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
> {quote}
> [ERROR] 
> testStandbyTriggersLogRollsWhenTailInProgressEdits[0](org.apache.hadoop.hdfs.server.namenode.ha.TestEditLogTailer)
>   Time elapsed: 6.862 s  <<< FAILURE!
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:87)
>   at org.junit.Assert.assertTrue(Assert.java:42)
>   at org.junit.Assert.assertTrue(Assert.java:53)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.TestEditLogTailer.testStandbyTriggersLogRollsWhenTailInProgressEdits(TestEditLogTailer.java:444)
> {quote}






[jira] [Updated] (HDFS-16144) Revert HDFS-15372 (Files in snapshots no longer see attribute provider permissions)

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16144:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.2, 3.4.0  (was: 3.4.0, 3.3.2)

> Revert HDFS-15372 (Files in snapshots no longer see attribute provider 
> permissions)
> ---
>
> Key: HDFS-16144
> URL: https://issues.apache.org/jira/browse/HDFS-16144
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
> Attachments: HDFS-16144.001.patch, HDFS-16144.002.patch, 
> HDFS-16144.003.patch, HDFS-16144.004.patch
>
>
> In HDFS-15372, I noted a change in behaviour between Hadoop 2 and Hadoop 3.
> When a user accesses a file in a snapshot and an attribute provider is
> configured, the provider would see the original file path (i.e. without the
> .snapshot folder) in Hadoop 2, but the snapshot path in Hadoop 3.
> HDFS-15372 restored the Hadoop 2 behaviour, but I noted at the time that it may
> make sense for the provider to see the actual snapshot path instead.
> We recently discovered HDFS-16132, a case where the HDFS-15372 change does not
> work correctly. At this stage I believe it is better to revert HDFS-15372,
> since the fix for that issue is probably not trivial, and to allow providers to
> see the actual path the user accessed.
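The two behaviours are easier to compare side by side. An HDFS snapshot path has the form /dir/.snapshot/<snapshotName>/..., and in Hadoop 2 the attribute provider saw the corresponding live path instead. The hypothetical helper below sketches that mapping with plain string handling only. It illustrates the two views of the same file; it is not how the NameNode resolves inodes, and it is not what the revert changes.

```java
// Hypothetical helper illustrating the two path views; not HDFS code.
final class SnapshotPaths {
  private static final String SNAPSHOT_DIR = ".snapshot";

  private SnapshotPaths() {}

  /**
   * Maps "/dir/.snapshot/s1/sub/file" to "/dir/sub/file" by dropping the
   * ".snapshot" component and the snapshot name that follows it.
   * Paths without a ".snapshot" component come back unchanged.
   */
  static String toOriginalPath(String path) {
    String[] parts = path.split("/");
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < parts.length; i++) {
      if (parts[i].isEmpty()) {
        continue; // skip the empty component before the leading '/'
      }
      if (parts[i].equals(SNAPSHOT_DIR)) {
        i++; // also skip the snapshot name after ".snapshot"
        continue;
      }
      out.append('/').append(parts[i]);
    }
    return out.length() == 0 ? "/" : out.toString();
  }
}
```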






[jira] [Updated] (HDFS-16144) Revert HDFS-15372 (Files in snapshots no longer see attribute provider permissions)

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16144:
--
Component/s: namenode

> Revert HDFS-15372 (Files in snapshots no longer see attribute provider 
> permissions)
> ---
>
> Key: HDFS-16144
> URL: https://issues.apache.org/jira/browse/HDFS-16144
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
> Attachments: HDFS-16144.001.patch, HDFS-16144.002.patch, 
> HDFS-16144.003.patch, HDFS-16144.004.patch
>
>
> In HDFS-15372, I noted a change in behaviour between Hadoop 2 and Hadoop 3.
> When a user accesses a file in a snapshot and an attribute provider is
> configured, the provider would see the original file path (i.e. without the
> .snapshot folder) in Hadoop 2, but the snapshot path in Hadoop 3.
> HDFS-15372 restored the Hadoop 2 behaviour, but I noted at the time that it may
> make sense for the provider to see the actual snapshot path instead.
> We recently discovered HDFS-16132, a case where the HDFS-15372 change does not
> work correctly. At this stage I believe it is better to revert HDFS-15372,
> since the fix for that issue is probably not trivial, and to allow providers to
> see the actual path the user accessed.






[jira] [Updated] (HDFS-16157) Support configuring DNS record to get list of journal nodes.

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16157:
--
Hadoop Flags: Reviewed

> Support configuring DNS record to get list of journal nodes.
> 
>
> Key: HDFS-16157
> URL: https://issues.apache.org/jira/browse/HDFS-16157
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node
>Affects Versions: 3.4.0
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We can use a DNS round-robin record to configure the list of journal nodes, so
> that we don't have to reconfigure everything whenever a journal node's hostname
> changes. For example, in some containerized environments the hostnames of
> journal nodes can change quite often.
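This is a minimal sketch of expanding one round-robin DNS name into a list of journal node addresses with the JDK's InetAddress.getAllByName. JournalNodeResolver is a hypothetical helper, and this message does not show the configuration keys or port handling used by the actual feature. Port 8485 in the usage note is only an example value. The name is re-resolved on each call to resolve(), so hostname changes behind the record are picked up without editing a node list, subject to the JVM's DNS cache TTL.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not the HDFS-16157 implementation or its config keys.
final class JournalNodeResolver {
  private JournalNodeResolver() {}

  /**
   * Expands one DNS name (for example a round-robin record) into "host:port"
   * entries, one per address the name currently resolves to.
   */
  static List<String> resolve(String dnsName, int port) throws UnknownHostException {
    List<String> nodes = new ArrayList<>();
    for (InetAddress addr : InetAddress.getAllByName(dnsName)) {
      String host = addr.getHostAddress();
      // Bracket IPv6 literals so the ":port" suffix stays unambiguous.
      if (addr instanceof Inet6Address) {
        host = "[" + host + "]";
      }
      nodes.add(host + ":" + port);
    }
    return nodes;
  }
}
```

For example, resolve("jn.example.com", 8485) would return one entry per address record behind that (hypothetical) name.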






[jira] [Updated] (HDFS-16157) Support configuring DNS record to get list of journal nodes.

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16157:
--
Fix Version/s: 3.4.0

> Support configuring DNS record to get list of journal nodes.
> 
>
> Key: HDFS-16157
> URL: https://issues.apache.org/jira/browse/HDFS-16157
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We can use a DNS round-robin record to configure the list of journal nodes, so
> that we don't have to reconfigure everything whenever a journal node's hostname
> changes. For example, in some containerized environments the hostnames of
> journal nodes can change quite often.






[jira] [Updated] (HDFS-16144) Revert HDFS-15372 (Files in snapshots no longer see attribute provider permissions)

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16144:
--
Affects Version/s: 3.3.2
   3.4.0

> Revert HDFS-15372 (Files in snapshots no longer see attribute provider 
> permissions)
> ---
>
> Key: HDFS-16144
> URL: https://issues.apache.org/jira/browse/HDFS-16144
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
> Attachments: HDFS-16144.001.patch, HDFS-16144.002.patch, 
> HDFS-16144.003.patch, HDFS-16144.004.patch
>
>
> In HDFS-15372, I noted a change in behaviour between Hadoop 2 and Hadoop 3.
> When a user accesses a file in a snapshot and an attribute provider is
> configured, the provider would see the original file path (i.e. without the
> .snapshot folder) in Hadoop 2, but the snapshot path in Hadoop 3.
> HDFS-15372 restored the Hadoop 2 behaviour, but I noted at the time that it may
> make sense for the provider to see the actual snapshot path instead.
> We recently discovered HDFS-16132, a case where the HDFS-15372 change does not
> work correctly. At this stage I believe it is better to revert HDFS-15372,
> since the fix for that issue is probably not trivial, and to allow providers to
> see the actual path the user accessed.






[jira] [Updated] (HDFS-16157) Support configuring DNS record to get list of journal nodes.

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16157:
--
Affects Version/s: 3.4.0

> Support configuring DNS record to get list of journal nodes.
> 
>
> Key: HDFS-16157
> URL: https://issues.apache.org/jira/browse/HDFS-16157
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node
>Affects Versions: 3.4.0
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We can use a DNS round-robin record to configure the list of journal nodes, so
> that we don't have to reconfigure everything whenever a journal node's hostname
> changes. For example, in some containerized environments the hostnames of
> journal nodes can change quite often.






[jira] [Updated] (HDFS-16184) De-flake TestBlockScanner#testSkipRecentAccessFile

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16184:
--
Affects Version/s: 3.3.2
   3.4.0

> De-flake TestBlockScanner#testSkipRecentAccessFile
> --
>
> Key: HDFS-16184
> URL: https://issues.apache.org/jira/browse/HDFS-16184
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Test TestBlockScanner#testSkipRecentAccessFile is flaky:
>  
> {code:java}
> [ERROR] testSkipRecentAccessFile(org.apache.hadoop.hdfs.server.datanode.TestBlockScanner)  Time elapsed: 3.936 s  <<< FAILURE!
> java.lang.AssertionError: Scan nothing for all files are accessed in last period.
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.apache.hadoop.hdfs.server.datanode.TestBlockScanner.testSkipRecentAccessFile(TestBlockScanner.java:1015)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
> {code}
> e.g.
> [https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3235/37/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt]
>  






[jira] [Updated] (HDFS-16171) De-flake testDecommissionStatus

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16171:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.2, 3.2.3, 2.10.2, 3.4.0  (was: 3.4.0, 2.10.2, 3.2.3, 3.3.2)

> De-flake testDecommissionStatus
> ---
>
> Key: HDFS-16171
> URL: https://issues.apache.org/jira/browse/HDFS-16171
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2, 3.2.4
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> testDecommissionStatus keeps failing intermittently.
> {code:java}
> [ERROR] 
> testDecommissionStatus(org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor)
>   Time elapsed: 3.299 s  <<< FAILURE!
> java.lang.AssertionError: Unexpected num under-replicated blocks expected:<4> 
> but was:<3>
>   at org.junit.Assert.fail(Assert.java:89)
>   at org.junit.Assert.failNotEquals(Assert.java:835)
>   at org.junit.Assert.assertEquals(Assert.java:647)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatus.checkDecommissionStatus(TestDecommissioningStatus.java:169)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor.testDecommissionStatus(TestDecommissioningStatusWithBackoffMonitor.java:136)
> {code}





