[jira] [Updated] (HDFS-17261) RBF: Fix getFileInfo return wrong path when get mountTable path which multi-level

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17261:
--
  Component/s: rbf
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> RBF: Fix getFileInfo return wrong path when get mountTable path which 
> multi-level
> -
>
> Key: HDFS-17261
> URL: https://issues.apache.org/jira/browse/HDFS-17261
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: liuguanghua
>Assignee: liuguanghua
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> With DFSRouter, suppose there are two nameservices: ns0 and ns1.
>  # Add a mount table entry: /testgetfileinfo/ns1/dir -> (ns1 -> 
> /testgetfileinfo/ns1/dir) 
>  # An HDFS client accesses a directory via DFSRouter: hdfs dfs -ls -d 
> /testgetfileinfo
>  # It returns the wrong path: /testgetfileinfo/testgetfileinfo
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17260) Fix the logic for reconfigure slow peer enable for Namenode.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17260:
--
Component/s: namanode

> Fix the logic for reconfigure slow peer enable for Namenode.
> 
>
> Key: HDFS-17260
> URL: https://issues.apache.org/jira/browse/HDFS-17260
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Reporter: huangzhaobo
>Assignee: huangzhaobo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>







[jira] [Updated] (HDFS-17263) RBF: Fix client ls trash path cannot get except default nameservices trash path

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17263:
--
  Component/s: rbf
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> RBF: Fix client ls trash path cannot get except default nameservices trash 
> path
> ---
>
> Key: HDFS-17263
> URL: https://issues.apache.org/jira/browse/HDFS-17263
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: liuguanghua
>Assignee: liuguanghua
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> With HDFS-16024, renaming data to the Trash is based on the src 
> locations. That is great for my usage. After a period of use, I found that 
> this causes an issue.
> There are two nameservices, ns0 and ns1, and ns0 is the default nameservice.
> (1) Add mount table entries:
> /home/data -> (ns0, /home/data)
> /data1/test1 -> (ns1, /data1/test1 )
> /data2/test2 -> (ns1, /data2/test2 )
> (2) Move files to the trash:
> ns0:   /user/test-user/.Trash/Current/home/data/file1
> ns1:   /user/test-user/.Trash/Current/data1/test1/file1
> (3) A client ls via DFSRouter does not show 
> /user/test-user/.Trash/Current/data1
> (4) A client ls of /user/test-user/.Trash/Current/data2/test2 throws an 
> exception.
>  
>  
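To make the relationship between a trash path and the mount table concrete, here is a minimal sketch. It assumes the /user/<user>/.Trash/<checkpoint>/<src path> layout seen in the report and a longest-prefix match against the mount table. TrashPathSketch, stripTrashPrefix and nameserviceFor are hypothetical names for illustration only, not the Router's actual resolver code.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch, not the real RBF mount table resolver. */
final class TrashPathSketch {
  // Assumed layout: /user/<user>/.Trash/<checkpoint>/<original src path>
  private static final Pattern TRASH =
      Pattern.compile("^/user/[^/]+/\\.Trash/[^/]+(/.*)$");

  private TrashPathSketch() {}

  /** Returns the path with the trash prefix removed, or the path unchanged. */
  static String stripTrashPrefix(String path) {
    Matcher m = TRASH.matcher(path);
    return m.matches() ? m.group(1) : path;
  }

  /** Longest-prefix match of the stripped path against the mount table. */
  static String nameserviceFor(String path, Map<String, String> mounts,
      String defaultNs) {
    String src = stripTrashPrefix(path);
    String best = null;
    for (String mount : mounts.keySet()) {
      boolean matches = src.equals(mount) || src.startsWith(mount + "/");
      if (matches && (best == null || mount.length() > best.length())) {
        best = mount;
      }
    }
    // Paths under no mount point fall back to the default nameservice.
    return best == null ? defaultNs : mounts.get(best);
  }
}
```

With the three mount entries from the report, a trash path under Current/data1/test1 maps to ns1 in this sketch, even though ns0 is the default nameservice.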






[jira] [Updated] (HDFS-17262) Fixed the verbose log.warn in DFSUtil.addTransferRateMetric()

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17262:
--
  Component/s: logging
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Fixed the verbose log.warn in DFSUtil.addTransferRateMetric()
> -
>
> Key: HDFS-17262
> URL: https://issues.apache.org/jira/browse/HDFS-17262
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 3.4.0
>Reporter: Bryan Beaudreault
>Assignee: Ravindra Dingankar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> HDFS-16917 added a LOG.warn when the passed duration is 0. The unit for 
> duration is millis, and it's very possible for a read to take less than a 
> millisecond when considering a local TCP connection. We are seeing this spam 
> multiple times per millisecond. There's another report on the PR for 
> HDFS-16917.
> Please downgrade to debug or remove the log.
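As a minimal sketch of the behaviour requested above, a rate helper can treat a zero-millisecond duration as "too fast to measure" and skip the sample quietly. TransferRateSketch and bytesPerSecond are hypothetical names; this is not the actual DFSUtil.addTransferRateMetric() code, and the real change may differ.

```java
import java.util.OptionalLong;

/** Hypothetical helper, not the real DFSUtil.addTransferRateMetric(). */
final class TransferRateSketch {
  private TransferRateSketch() {}

  /**
   * Returns the transfer rate in bytes per second, or empty when the
   * duration is not positive (for example a sub-millisecond local read).
   */
  static OptionalLong bytesPerSecond(long bytes, long durationMillis) {
    if (durationMillis <= 0) {
      // Too fast to measure in millis: skip the sample quietly instead of
      // logging at WARN on every read.
      return OptionalLong.empty();
    }
    return OptionalLong.of(bytes * 1000L / durationMillis);
  }
}
```

A caller would record the metric only when a value is present, which keeps the hot read path free of log output.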






[jira] [Updated] (HDFS-17265) RBF: Throwing an exception prevents the permit from being released when using FairnessPolicyController

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17265:
--
  Component/s: rbf
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> RBF: Throwing an exception prevents the permit from being released when using 
> FairnessPolicyController
> --
>
> Key: HDFS-17265
> URL: https://issues.apache.org/jira/browse/HDFS-17265
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-17265.patch
>
>
> *Bug description*
> When the router uses FairnessPolicyController, each time a request is 
> processed, the permit of the ns corresponding to the request is acquired 
> first *(method acquirePermit)*, and then the information about the namenodes 
> corresponding to the ns is obtained *(method getOrderedNamenodes)*.
> getOrderedNamenodes comes after acquirePermit, so if acquirePermit succeeds 
> but getOrderedNamenodes throws an exception, the permit cannot be released.
>  
> *How to reproduce*
> Use the original code to run the new unit test 
> testReleasedWhenExceptionOccurs in this PR
>  
>  
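The failure mode described above is the usual acquire-without-finally pattern. The sketch below uses a plain java.util.concurrent.Semaphore as a stand-in for the per-nameservice permits; PermitSketch and invoke are hypothetical names, and this is not the Router's actual code or necessarily how the patch is written.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical stand-in for a per-nameservice permit pool. */
final class PermitSketch {
  private final Semaphore permits;

  PermitSketch(int count) {
    this.permits = new Semaphore(count);
  }

  /** Runs the call while holding one permit; the permit is always returned. */
  <T> T invoke(Supplier<T> call) {
    if (!permits.tryAcquire()) {
      throw new IllegalStateException("no permit available");
    }
    try {
      // Anything that can throw after the acquire belongs inside the try,
      // including a lookup step such as getOrderedNamenodes.
      return call.get();
    } finally {
      permits.release();
    }
  }

  int available() {
    return permits.availablePermits();
  }
}
```

Because the release sits in a finally block, an exception thrown by the supplied call leaves the number of available permits unchanged.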






[jira] [Updated] (HDFS-17275) Judge whether the block has been deleted in the block report

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17275:
--
Component/s: hdfs

> Judge whether the block has been deleted in the block report
> 
>
> Key: HDFS-17275
> URL: https://issues.apache.org/jira/browse/HDFS-17275
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Now, we use the asynchronous thread MarkedDeleteBlockScrubber to delete 
> blocks. In the block report, we may do some useless block-related 
> calculations when blocks have not yet been added to invalidateBlocks.






[jira] [Updated] (HDFS-17275) Judge whether the block has been deleted in the block report

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17275:
--
Target Version/s: 3.4.0

> Judge whether the block has been deleted in the block report
> 
>
> Key: HDFS-17275
> URL: https://issues.apache.org/jira/browse/HDFS-17275
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Now, we use the asynchronous thread MarkedDeleteBlockScrubber to delete 
> blocks. In the block report, we may do some useless block-related 
> calculations when blocks have not yet been added to invalidateBlocks.






[jira] [Updated] (HDFS-17270) RBF: Fix ZKDelegationTokenSecretManagerImpl use closed zookeeper client to get token in some case

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17270:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> RBF: Fix ZKDelegationTokenSecretManagerImpl use closed zookeeper client  to 
> get token in some case
> --
>
> Key: HDFS-17270
> URL: https://issues.apache.org/jira/browse/HDFS-17270
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: lei w
>Assignee: lei w
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: CuratorFrameworkException
>
>
> Now, we use CuratorFramework to simplify using ZooKeeper in 
> ZKDelegationTokenSecretManagerImpl, and we always hold the same 
> zookeeperClient after initializing ZKDelegationTokenSecretManagerImpl. But 
> in some cases, such as network problems, CuratorFramework may close the 
> current zookeeperClient and create a new one. In that case, we use a 
> zkclient that has already been closed to get the token. We encountered this 
> situation in our cluster; the exception information is in the attachment.
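The hazard of holding on to a client handle that the framework may later replace can be shown with a toy model. FakeClient and FakeFramework below are hypothetical stand-ins, not Curator or ZooKeeper classes, and looking the handle up on each use is only one possible remedy; the actual fix may differ.

```java
import java.util.function.Supplier;

/** Hypothetical sketch; FakeClient stands in for a ZooKeeper handle. */
final class StaleClientSketch {
  private StaleClientSketch() {}

  static final class FakeClient {
    private volatile boolean closed;

    void close() {
      closed = true;
    }

    String read(String path) {
      if (closed) {
        throw new IllegalStateException("client is closed");
      }
      return "data@" + path;
    }
  }

  /** Stand-in for a framework that may replace its client after a network problem. */
  static final class FakeFramework {
    private volatile FakeClient current = new FakeClient();

    FakeClient getClient() {
      return current;
    }

    void reconnect() {
      current.close();
      current = new FakeClient();
    }
  }

  /** Reads through the supplier, so the handle is looked up on every call. */
  static String readToken(Supplier<FakeClient> clients, String path) {
    return clients.get().read(path);
  }
}
```

A handle cached before reconnect() fails on its next read in this model, while readToken(framework::getClient, path) keeps working because it asks for the current handle each time.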






[jira] [Updated] (HDFS-17272) NNThroughputBenchmark should support specifying the base directory for multi-client test

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17272:
--
Target Version/s: 3.4.0

> NNThroughputBenchmark should support specifying the base directory for 
> multi-client test
> 
>
> Key: HDFS-17272
> URL: https://issues.apache.org/jira/browse/HDFS-17272
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Currently, NNThroughputBenchmark does not support specifying the base 
> directory, and therefore does not support multiple clients performing stress 
> testing at the same time. However, for a high-performance namenode machine, 
> a single client submitting a stress test cannot make the namenode RPC access 
> reach the bottleneck. Therefore, multiple clients are required for parallel 
> testing to make the namenode pressure reach the level of a large-scale 
> production cluster.
> So I specify the base directory through the -baseDirName parameter to support 
> multiple clients submitting stress tests at the same time.






[jira] [Updated] (HDFS-17277) Delete invalid code logic in namenode format

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17277:
--
 Target Version/s: 3.4.0, 3.3.9
Affects Version/s: 3.4.0
   3.3.9

> Delete invalid code logic in namenode format
> 
>
> Key: HDFS-17277
> URL: https://issues.apache.org/jira/browse/HDFS-17277
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0, 3.3.9
>Reporter: zhangzhanchang
>Assignee: zhangzhanchang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> There is invalid logical processing in the namenode format process






[jira] [Updated] (HDFS-17278) Detect order dependent flakiness in TestViewfsWithNfs3.java under hadoop-hdfs-nfs module

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17278:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Detect order dependent flakiness in TestViewfsWithNfs3.java under 
> hadoop-hdfs-nfs module
> 
>
> Key: HDFS-17278
> URL: https://issues.apache.org/jira/browse/HDFS-17278
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.4.0
> Environment: openjdk version "17.0.9"
> Apache Maven 3.9.5
>Reporter: Ruby
>Assignee: Ruby
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: failed-1.png, failed-2.png, success.png
>
>
> The order-dependent flakiness occurs when the test class 
> TestDFSClientCache.java runs before TestRpcProgramNfs3.java.
> The error output looks like this:
> {code:java}
> [ERROR] Failures: 
> [ERROR]   TestRpcProgramNfs3.testAccess:279 Incorrect return code 
> expected:<0> but was:<13>
> [ERROR]   TestRpcProgramNfs3.testCommit:764 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testCreate:493 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   
> TestRpcProgramNfs3.testEncryptedReadWrite:359->createFileUsingNfs:393 
> Incorrect response:  expected: but 
> was:
> [ERROR]   TestRpcProgramNfs3.testFsinfo:714 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testFsstat:696 Incorrect return code: 
> expected:<0> but was:<13>
> [ERROR]   TestRpcProgramNfs3.testGetattr:205 Incorrect return code 
> expected:<0> but was:<13>
> [ERROR]   TestRpcProgramNfs3.testLookup:249 Incorrect return code 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testMkdir:517 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testPathconf:738 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testRead:341 Incorrect return code: expected:<0> 
> but was:<13>
> [ERROR]   TestRpcProgramNfs3.testReaddir:642 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testReaddirplus:666 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testReadlink:297 Incorrect return code: 
> expected:<0> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testRemove:570 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testRename:618 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testRmdir:594 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testSetattr:225 Incorrect return code 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testSymlink:546 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testWrite:468 Incorrect return code: 
> expected:<13> but was:<5>
> [INFO] 
> [ERROR] Tests run: 25, Failures: 20, Errors: 0, Skipped: 0
> [INFO] 
> [ERROR] There are test failures. {code}
> The polluter that caused this flakiness is the test method
> testGetUserGroupInformationSecure() in TestDFSClientCache.java. It contains 
> the line 
> {code:java}
> UserGroupInformation.setLoginUser(currentUserUgi);{code}
> which modifies shared static state (the login user), much like pre-setting 
> the config. To fix this issue, I added a cleanup method to 
> TestDFSClientCache.java that resets UserGroupInformation, so that test 
> classes stay isolated from one another.
> {code:java}
> @AfterClass
> public static void cleanup() {
>   UserGroupInformation.reset();
> }{code}
> The reset includes setting
> {code:java}
> authenticationMethod = null;
> conf = null; // set configuration to null
> setLoginUser(null); // reset login user to default null{code}
> ..., and so on. The reset() method is defined in 
> hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java.
> After the fix, the errors no longer occur and the test output is:
> {code:java}
> [INFO] ---
> [INFO]  T E S T S
> [INFO] ---
> [INFO] Running org.apache.hadoop.hdfs.nfs.nfs3.CustomTest
> [INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
> 18.457 s - in org.apache.hadoop.hdfs.nfs.nfs3.CustomTest
> [INFO] 
> [INFO] Results:
> [INFO] 
> [INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0
> [INFO] 
> [INFO] 
> 
> [INFO] BUILD SUCCESS
> [INFO] 
>  
> {code}
> Here is the CustomTest.java file that I used to run these two tests in order, 
> the 

[jira] [Updated] (HDFS-17275) Judge whether the block has been deleted in the block report

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17275:
--
Affects Version/s: 3.4.0

> Judge whether the block has been deleted in the block report
> 
>
> Key: HDFS-17275
> URL: https://issues.apache.org/jira/browse/HDFS-17275
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Now, we use the asynchronous thread MarkedDeleteBlockScrubber to delete 
> blocks. In the block report, we may do some useless block-related 
> calculations when blocks have not yet been added to invalidateBlocks.






[jira] [Updated] (HDFS-17277) Delete invalid code logic in namenode format

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17277:
--
Component/s: namenode

> Delete invalid code logic in namenode format
> 
>
> Key: HDFS-17277
> URL: https://issues.apache.org/jira/browse/HDFS-17277
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.4.0, 3.3.9
>Reporter: zhangzhanchang
>Assignee: zhangzhanchang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> There is invalid logical processing in the namenode format process






[jira] [Updated] (HDFS-17278) Detect order dependent flakiness in TestViewfsWithNfs3.java under hadoop-hdfs-nfs module

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17278:
--
Component/s: nfs
 test

> Detect order dependent flakiness in TestViewfsWithNfs3.java under 
> hadoop-hdfs-nfs module
> 
>
> Key: HDFS-17278
> URL: https://issues.apache.org/jira/browse/HDFS-17278
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nfs, test
>Affects Versions: 3.4.0
> Environment: openjdk version "17.0.9"
> Apache Maven 3.9.5
>Reporter: Ruby
>Assignee: Ruby
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: failed-1.png, failed-2.png, success.png
>
>

[jira] [Updated] (HDFS-17279) RBF: Fix link to Fedbalance document

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17279:
--
Target Version/s: 3.4.0

> RBF: Fix link to Fedbalance document 
> -
>
> Key: HDFS-17279
> URL: https://issues.apache.org/jira/browse/HDFS-17279
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: screenshot-1.png
>
>
>  !screenshot-1.png! 
> Fix the link to the Fedbalance document, which cannot be displayed.






[jira] [Updated] (HDFS-17282) Reconfig 'SlowIoWarningThreshold' parameters for datanode.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17282:
--
Component/s: datanode

> Reconfig 'SlowIoWarningThreshold' parameters for datanode.
> --
>
> Key: HDFS-17282
> URL: https://issues.apache.org/jira/browse/HDFS-17282
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: huangzhaobo
>Assignee: huangzhaobo
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>







[jira] [Updated] (HDFS-17282) Reconfig 'SlowIoWarningThreshold' parameters for datanode.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17282:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Reconfig 'SlowIoWarningThreshold' parameters for datanode.
> --
>
> Key: HDFS-17282
> URL: https://issues.apache.org/jira/browse/HDFS-17282
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: huangzhaobo
>Assignee: huangzhaobo
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>







[jira] [Updated] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17294:
--
Hadoop Flags: Reviewed

> Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
> ---
>
> Key: HDFS-17294
> URL: https://issues.apache.org/jira/browse/HDFS-17294
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: configuration
>Affects Versions: 3.4.0
>Reporter: huangzhaobo
>Assignee: huangzhaobo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>







[jira] [Updated] (HDFS-17279) RBF: Fix link to Fedbalance document

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17279:
--
Affects Version/s: 3.4.0

> RBF: Fix link to Fedbalance document 
> -
>
> Key: HDFS-17279
> URL: https://issues.apache.org/jira/browse/HDFS-17279
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: screenshot-1.png
>
>
>  !screenshot-1.png! 
> Fix the link to the Fedbalance document, which cannot be displayed.






[jira] [Updated] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17294:
--
Target Version/s: 3.4.0

> Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
> ---
>
> Key: HDFS-17294
> URL: https://issues.apache.org/jira/browse/HDFS-17294
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: configuration
>Affects Versions: 3.4.0
>Reporter: huangzhaobo
>Assignee: huangzhaobo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>







[jira] [Updated] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17294:
--
Affects Version/s: 3.4.0

> Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
> ---
>
> Key: HDFS-17294
> URL: https://issues.apache.org/jira/browse/HDFS-17294
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: configuration
>Affects Versions: 3.4.0
>Reporter: huangzhaobo
>Assignee: huangzhaobo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>







[jira] [Updated] (HDFS-17297) The NameNode should remove block from the BlocksMap if the block is marked as deleted.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17297:
--
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0, 3.3.9
Affects Version/s: 3.4.0
   3.3.9

> The NameNode should remove block from the BlocksMap if the block is marked as 
> deleted.
> --
>
> Key: HDFS-17297
> URL: https://issues.apache.org/jira/browse/HDFS-17297
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Affects Versions: 3.4.0, 3.3.9
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> When the internalReleaseLease method is called:
> {code:java}
> boolean internalReleaseLease(
> ...
> int minLocationsNum = 1;
> if (lastBlock.isStriped()) {
>   minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
> }
> if (uc.getNumExpectedLocations() < minLocationsNum &&
> lastBlock.getNumBytes() == 0) {
>   // There is no datanode reported to this block.
>   // may be client have crashed before writing data to pipeline.
>   // This blocks doesn't need any recovery.
>   // We can remove this block and close the file.
>   pendingFile.removeLastBlock(lastBlock);
>   finalizeINodeFileUnderConstruction(src, pendingFile,
>   iip.getLatestSnapshotId(), false); 
> ...
> }
> {code}
> If the condition `uc.getNumExpectedLocations() < minLocationsNum && 
> lastBlock.getNumBytes() == 0` is met while the UNDER_RECOVERY logic runs, 
> the block is removed from the block list of the inode file and marked as 
> deleted. 
> However, it is not removed from the BlocksMap, which may cause a memory leak.
> Therefore, the block must also be removed from the BlocksMap at this point.
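
The leak described in this issue can be illustrated with a toy model. This is only a sketch under stated assumptions: BlocksMapLeakSketch, its two collections, and its method names are hypothetical stand-ins for the inode's block list and the BlocksMap, not the NameNode classes themselves.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these are NOT the real INodeFile or BlocksMap classes.
final class BlocksMapLeakSketch {
    // Stand-in for the block list kept by the inode file.
    private final List<Long> fileBlocks = new ArrayList<>();
    // Stand-in for the global BlocksMap (block id -> owning file).
    private final Map<Long, String> blocksMap = new HashMap<>();

    void addBlock(long blockId, String owner) {
        fileBlocks.add(blockId);
        blocksMap.put(blockId, owner);
    }

    // Leaky variant: the last block leaves the file but stays in the map.
    void removeLastBlockFromFileOnly() {
        fileBlocks.remove(fileBlocks.size() - 1);
    }

    // Fixed variant: the last block is dropped from both structures.
    void removeLastBlockEverywhere() {
        Long blockId = fileBlocks.remove(fileBlocks.size() - 1);
        blocksMap.remove(blockId);
    }

    int mapSize() {
        return blocksMap.size();
    }

    public static void main(String[] args) {
        BlocksMapLeakSketch leaky = new BlocksMapLeakSketch();
        leaky.addBlock(1L, "/demo/file");
        leaky.removeLastBlockFromFileOnly();
        System.out.println("leaky map size: " + leaky.mapSize());

        BlocksMapLeakSketch fixed = new BlocksMapLeakSketch();
        fixed.addBlock(1L, "/demo/file");
        fixed.removeLastBlockEverywhere();
        System.out.println("fixed map size: " + fixed.mapSize());
    }
}
```

In the leaky variant the map keeps an entry that no file references any more, which is the kind of unreachable-but-retained state the issue describes.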






[jira] [Updated] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17294:
--
Component/s: configuration

> Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
> ---
>
> Key: HDFS-17294
> URL: https://issues.apache.org/jira/browse/HDFS-17294
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: configuration
>Reporter: huangzhaobo
>Assignee: huangzhaobo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>







[jira] [Updated] (HDFS-17298) Fix NPE in DataNode.handleBadBlock and BlockSender

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17298:
--
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0, 3.3.9
Affects Version/s: 3.4.0
   3.3.9

> Fix NPE in DataNode.handleBadBlock and BlockSender
> --
>
> Key: HDFS-17298
> URL: https://issues.apache.org/jira/browse/HDFS-17298
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.4.0, 3.3.9
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> We have hit several NPEs on the DataNode side in our production environment.
> The detailed exception information is:
> {code:java}
> 2023-12-20 13:58:25,449 ERROR datanode.DataNode (DataXceiver.java:run(330)) 
> [DataXceiver for client DFSClient_NONMAPREDUCE_xxx at /xxx:41452 [Sending 
> block BP-xxx:blk_xxx]] - xxx:50010:DataXceiver error processing READ_BLOCK 
> operation  src: /xxx:41452 dst: /xxx:50010
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:301)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:607)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:298)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> Code that triggers the NPE (data.getVolume(block) returns null):
> {code:java}
> if (!fromScanner && blockScanner.isEnabled()) {
>   // data.getVolume(block) is null
>   blockScanner.markSuspectBlock(data.getVolume(block).getStorageID(),
>   block);
> } 
> {code}
> {code:java}
> 2023-12-20 13:52:18,844 ERROR datanode.DataNode (DataXceiver.java:run(330)) 
> [DataXceiver for client /xxx:61052 [Copying block BP-xxx:blk_xxx]] - 
> xxx:50010:DataXceiver error processing COPY_BLOCK operation  src: /xxx:61052 
> dst: /xxx:50010
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.handleBadBlock(DataNode.java:4045)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.copyBlock(DataXceiver.java:1163)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opCopyBlock(Receiver.java:291)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:113)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:298)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> Code that triggers the NPE (datanode.data.getVolume(block) returns null):
> {code:java}
> // Obtain a reference before reading data
> volumeRef = datanode.data.getVolume(block).obtainReference(); 
> //datanode.data.getVolume(block) is null  
> {code}
> Both NPEs need to be fixed.
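
One common way to avoid this class of NPE is to check the lookup result before dereferencing it. The following is a minimal sketch under assumptions, not the patch for this issue: VolumeNullGuardSketch, storageIdFor, and the map-based lookup are hypothetical, with the map standing in for a getVolume(block) call that may return null.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: a Map lookup stands in for getVolume(block).
final class VolumeNullGuardSketch {
    // Returns the storage ID for a block, or fails with a descriptive
    // IOException instead of a NullPointerException when no volume is found.
    static String storageIdFor(Map<String, String> volumeByBlock, String block)
            throws IOException {
        String storageId = volumeByBlock.get(block); // may be null
        if (storageId == null) {
            throw new IOException("No volume found for block " + block);
        }
        return storageId;
    }

    public static void main(String[] args) {
        Map<String, String> volumes = new HashMap<>();
        volumes.put("blk_1", "DS-1");
        try {
            System.out.println(storageIdFor(volumes, "blk_1"));
            System.out.println(storageIdFor(volumes, "blk_2"));
        } catch (IOException e) {
            System.out.println("handled: " + e.getMessage());
        }
    }
}
```

Whether the real fix should throw, log, or skip the operation depends on the call site; the sketch only shows the guard itself.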






[jira] [Updated] (HDFS-17315) Optimize the namenode format code logic.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17315:
--
Component/s: namenode
 (was: namanode)

> Optimize the namenode format code logic.
> 
>
> Key: HDFS-17315
> URL: https://issues.apache.org/jira/browse/HDFS-17315
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.4.0, 3.3.9
>Reporter: huangzhaobo
>Assignee: huangzhaobo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> 1. Some invalid code was deleted in 
> https://issues.apache.org/jira/browse/HDFS-17277, but one line of invalid 
> code still remains.
> 2. Additionally, optimize the resource-closing logic by using 
> 'try-with-resources'.
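
For readers unfamiliar with the construct mentioned in item 2, here is a small sketch of try-with-resources. It is not the NameNode format code; TryWithResourcesSketch and TrackedResource are hypothetical names used only to show that the resource is closed on both the normal and the failing path.

```java
// Hypothetical demo classes; not taken from the Hadoop code base.
final class TryWithResourcesSketch {
    // A resource that records whether close() was called.
    static final class TrackedResource implements AutoCloseable {
        boolean closed;

        void use(boolean fail) {
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Uses the resource inside try-with-resources and reports whether
    // it ended up closed, regardless of whether use() threw.
    static boolean closedAfterUse(boolean fail) {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource r = resource) {
            r.use(fail);
        } catch (IllegalStateException e) {
            // Swallowed here only to keep the demo short.
        }
        return resource.closed;
    }

    public static void main(String[] args) {
        System.out.println("closed on success: " + closedAfterUse(false));
        System.out.println("closed on failure: " + closedAfterUse(true));
    }
}
```

The construct removes the need for an explicit finally block that closes each stream by hand.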






[jira] [Updated] (HDFS-17315) Optimize the namenode format code logic.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17315:
--
Hadoop Flags: Reviewed

> Optimize the namenode format code logic.
> 
>
> Key: HDFS-17315
> URL: https://issues.apache.org/jira/browse/HDFS-17315
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.4.0, 3.3.9
>Reporter: huangzhaobo
>Assignee: huangzhaobo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> 1. Some invalid code was deleted in 
> https://issues.apache.org/jira/browse/HDFS-17277, but one line of invalid 
> code still remains.
> 2. Additionally, optimize the resource-closing logic by using 
> 'try-with-resources'.






[jira] [Updated] (HDFS-17301) Add read and write dataXceiver threads count metrics to datanode.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17301:
--
Component/s: datanode

> Add read and write dataXceiver threads count metrics to datanode.
> -
>
> Key: HDFS-17301
> URL: https://issues.apache.org/jira/browse/HDFS-17301
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: huangzhaobo
>Assignee: huangzhaobo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> # The DataNodeActiveXceiversCount metric contains the number of threads of 
> all Op types.
> # In most cases, we care more about the number of read and write dataXceiver 
> threads, so add read and write dataXceiver thread count metrics to the 
> datanode.
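
The idea of splitting one overall count into per-operation counts can be sketched as follows. This is an illustration under assumptions: XceiverCountSketch, its Op enum, and its counters are hypothetical and do not reflect the metric names or classes added by the actual change.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: per-operation counters next to an overall counter.
final class XceiverCountSketch {
    // Illustrative subset of operation types.
    enum Op { READ_BLOCK, WRITE_BLOCK, COPY_BLOCK }

    private final AtomicInteger activeAll = new AtomicInteger();
    private final AtomicInteger activeRead = new AtomicInteger();
    private final AtomicInteger activeWrite = new AtomicInteger();

    // Called when a worker thread starts handling an operation.
    void begin(Op op) {
        activeAll.incrementAndGet();
        if (op == Op.READ_BLOCK) {
            activeRead.incrementAndGet();
        } else if (op == Op.WRITE_BLOCK) {
            activeWrite.incrementAndGet();
        }
    }

    // Called when the worker thread finishes the operation.
    void end(Op op) {
        activeAll.decrementAndGet();
        if (op == Op.READ_BLOCK) {
            activeRead.decrementAndGet();
        } else if (op == Op.WRITE_BLOCK) {
            activeWrite.decrementAndGet();
        }
    }

    int all() { return activeAll.get(); }
    int reads() { return activeRead.get(); }
    int writes() { return activeWrite.get(); }

    public static void main(String[] args) {
        XceiverCountSketch counts = new XceiverCountSketch();
        counts.begin(Op.READ_BLOCK);
        counts.begin(Op.WRITE_BLOCK);
        counts.begin(Op.COPY_BLOCK);
        System.out.println(counts.all() + " active, " + counts.reads()
            + " reading, " + counts.writes() + " writing");
    }
}
```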






[jira] [Updated] (HDFS-17301) Add read and write dataXceiver threads count metrics to datanode.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17301:
--
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Add read and write dataXceiver threads count metrics to datanode.
> -
>
> Key: HDFS-17301
> URL: https://issues.apache.org/jira/browse/HDFS-17301
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: huangzhaobo
>Assignee: huangzhaobo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> # The DataNodeActiveXceiversCount metric contains the number of threads of 
> all Op types.
> # In most cases, we care more about the number of read and write dataXceiver 
> threads, so add read and write dataXceiver thread count metrics to the 
> datanode.






[jira] [Updated] (HDFS-17315) Optimize the namenode format code logic.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17315:
--
 Target Version/s: 3.4.0, 3.3.9
Affects Version/s: 3.4.0
   3.3.9

> Optimize the namenode format code logic.
> 
>
> Key: HDFS-17315
> URL: https://issues.apache.org/jira/browse/HDFS-17315
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0, 3.3.9
>Reporter: huangzhaobo
>Assignee: huangzhaobo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> 1. Some invalid code was deleted in 
> https://issues.apache.org/jira/browse/HDFS-17277, but one line of invalid 
> code still remains.
> 2. Additionally, optimize the resource-closing logic by using 
> 'try-with-resources'.






[jira] [Updated] (HDFS-17315) Optimize the namenode format code logic.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17315:
--
Component/s: namanode

> Optimize the namenode format code logic.
> 
>
> Key: HDFS-17315
> URL: https://issues.apache.org/jira/browse/HDFS-17315
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Affects Versions: 3.4.0, 3.3.9
>Reporter: huangzhaobo
>Assignee: huangzhaobo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> 1. Some invalid code was deleted in 
> https://issues.apache.org/jira/browse/HDFS-17277, but one line of invalid 
> code still remains.
> 2. Additionally, optimize the resource-closing logic by using 
> 'try-with-resources'.






[jira] [Updated] (HDFS-17317) DebugAdmin metaOut not need multiple close

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17317:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> DebugAdmin metaOut not need multiple close
> ---
>
> Key: HDFS-17317
> URL: https://issues.apache.org/jira/browse/HDFS-17317
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> DebugAdmin's metaOut does not need to be closed multiple times.






[jira] [Updated] (HDFS-12862) CacheDirective becomes invalid when NN restart or failover

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-12862:
--
Hadoop Flags: Reviewed
 Environment: (was: 
)

> CacheDirective becomes invalid when NN restart or failover
> --
>
> Key: HDFS-12862
> URL: https://issues.apache.org/jira/browse/HDFS-12862
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: caching, hdfs
>Affects Versions: 2.7.1
>Reporter: Wang XL
>Assignee: Wang XL
>Priority: Major
>  Labels: patch
> Fix For: 3.3.0, 3.2.2
>
> Attachments: HDFS-12862-branch-2.7.1.001.patch, 
> HDFS-12862-trunk.002.patch, HDFS-12862-trunk.003.patch, 
> HDFS-12862-trunk.004.patch, HDFS-12862.005.patch, HDFS-12862.006.patch, 
> HDFS-12862.007.patch, HDFS-12862.branch-3.1.patch
>
>
> The logic in FSNDNCacheOp#modifyCacheDirective is not correct. When a 
> cacheDirective is modified, the expiration in the directive may be a relative 
> expiry time, and the EditLog will serialize it as a relative expiry time.
> {code:java}
> static void modifyCacheDirective(
>     FSNamesystem fsn, CacheManager cacheManager, CacheDirectiveInfo directive,
>     EnumSet<CacheFlag> flags, boolean logRetryCache) throws IOException {
>   final FSPermissionChecker pc = getFsPermissionChecker(fsn);
>   cacheManager.modifyDirective(directive, pc, flags);
>   fsn.getEditLog().logModifyCacheDirectiveInfo(directive, logRetryCache);
> }
> {code}
> But when the SBN replays the log, it invokes 
> FSImageSerialization#readCacheDirectiveInfo, which reads the value as an 
> absolute expiry time. This results in an inconsistency.
> {code:java}
>   public static CacheDirectiveInfo readCacheDirectiveInfo(DataInput in)
>   throws IOException {
> CacheDirectiveInfo.Builder builder =
> new CacheDirectiveInfo.Builder();
> builder.setId(readLong(in));
> int flags = in.readInt();
> if ((flags & 0x1) != 0) {
>   builder.setPath(new Path(readString(in)));
> }
> if ((flags & 0x2) != 0) {
>   builder.setReplication(readShort(in));
> }
> if ((flags & 0x4) != 0) {
>   builder.setPool(readString(in));
> }
> if ((flags & 0x8) != 0) {
>   builder.setExpiration(
>   CacheDirectiveInfo.Expiration.newAbsolute(readLong(in)));
> }
> if ((flags & ~0xF) != 0) {
>   throw new IOException("unknown flags set in " +
>   "ModifyCacheDirectiveInfoOp: " + flags);
> }
> return builder.build();
>   }
> {code}
> In other words, fsn.getEditLog().logModifyCacheDirectiveInfo(directive, 
> logRetryCache) may serialize a relative expiry time, but 
> builder.setExpiration(CacheDirectiveInfo.Expiration.newAbsolute(readLong(in)))
> reads it as an absolute expiry time.
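
The effect of this mismatch can be shown with plain arithmetic. The sketch below is an assumption-labeled illustration, not the patch: ExpiryMismatchSketch and its two helpers are hypothetical, and times are plain epoch milliseconds rather than CacheDirectiveInfo.Expiration objects.

```java
// Hypothetical illustration using raw epoch milliseconds.
final class ExpiryMismatchSketch {
    // Converts a relative duration into an absolute expiry timestamp.
    static long toAbsoluteMillis(long relativeMillis, long nowMillis) {
        return nowMillis + relativeMillis;
    }

    // A directive is expired once its absolute expiry time has passed.
    static boolean isExpired(long absoluteExpiryMillis, long nowMillis) {
        return absoluteExpiryMillis <= nowMillis;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long relative = 60_000L; // "expires in one minute"

        // Writer logs the relative value, reader treats it as absolute:
        // 60000 ms after the epoch is long in the past.
        System.out.println("read raw as absolute, expired: "
            + isExpired(relative, now));

        // Writer converts to an absolute time before logging.
        System.out.println("converted before logging, expired: "
            + isExpired(toAbsoluteMillis(relative, now), now));
    }
}
```

A relative value read back as an absolute one lands shortly after the epoch, so the replaying side would consider the directive already expired.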






[jira] [Updated] (HDFS-12920) HDFS default value change (with adding time unit) breaks old version MR tarball work with Hadoop 3.x

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-12920:
--
Affects Version/s: 3.3.2
   3.4.0

> HDFS default value change (with adding time unit) breaks old version MR 
> tarball work with Hadoop 3.x
> 
>
> Key: HDFS-12920
> URL: https://issues.apache.org/jira/browse/HDFS-12920
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: configuration, hdfs
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Junping Du
>Assignee: Akira Ajisaka
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> After HADOOP-15059 was resolved, I tried to deploy the 2.9.0 tarball with 
> 3.0.0 RC1 and ran a job, which failed with the following errors:
> {noformat}
> 2017-12-12 13:29:06,824 INFO [main] 
> org.apache.hadoop.service.AbstractService: Service 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.NumberFormatException: For input string: "30s"
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.NumberFormatException: For input string: "30s"
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:542)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:522)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1764)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:522)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:308)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1722)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1719)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1650)
> {noformat}
> This is caused by HDFS-10845: we added time units to hdfs-default.xml, but 
> old-version MR jars cannot recognize them. 
> This breaks our rolling upgrade story, so it should be marked as a blocker.
> A quick workaround is to set the values in hdfs-site.xml with all time units 
> removed. But the right way may be to revert HDFS-10845 (and get rid of the 
> noisy warnings).
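
The NumberFormatException in the trace is what a plain numeric parse produces for a value with a unit suffix. The sketch below assumes the old code path ends in something like Long.parseLong; TimeSuffixSketch and secondsOf are hypothetical helpers that handle only a trailing "s" and are not Hadoop APIs.

```java
// Hypothetical helpers; they only illustrate the parsing difference.
final class TimeSuffixSketch {
    // True if the value can be read by a plain numeric parse.
    static boolean parsesAsPlainLong(String value) {
        try {
            Long.parseLong(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Suffix-tolerant parse that understands only a trailing "s" (seconds).
    static long secondsOf(String value) {
        String trimmed = value.trim();
        if (trimmed.endsWith("s")) {
            trimmed = trimmed.substring(0, trimmed.length() - 1);
        }
        return Long.parseLong(trimmed);
    }

    public static void main(String[] args) {
        System.out.println("\"30s\" parses as plain long: " + parsesAsPlainLong("30s"));
        System.out.println("\"30\" parses as plain long: " + parsesAsPlainLong("30"));
        System.out.println("suffix-tolerant parse of \"30s\": " + secondsOf("30s"));
    }
}
```

This is why the workaround of writing unit-free values in hdfs-site.xml lets the older jars read the setting.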






[jira] [Updated] (HDFS-12920) HDFS default value change (with adding time unit) breaks old version MR tarball work with Hadoop 3.x

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-12920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-12920:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.2, 3.2.3, 3.4.0  (was: 3.4.0, 3.2.3, 3.3.2)

> HDFS default value change (with adding time unit) breaks old version MR 
> tarball work with Hadoop 3.x
> 
>
> Key: HDFS-12920
> URL: https://issues.apache.org/jira/browse/HDFS-12920
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: configuration, hdfs
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Junping Du
>Assignee: Akira Ajisaka
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> After HADOOP-15059 was resolved, I tried to deploy the 2.9.0 tarball with 
> 3.0.0 RC1 and ran a job, which failed with the following errors:
> {noformat}
> 2017-12-12 13:29:06,824 INFO [main] 
> org.apache.hadoop.service.AbstractService: Service 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.NumberFormatException: For input string: "30s"
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.lang.NumberFormatException: For input string: "30s"
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:542)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:522)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1764)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:522)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:308)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1722)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1719)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1650)
> {noformat}
> This is caused by HDFS-10845: we added time units to hdfs-default.xml, but 
> old-version MR jars cannot recognize them. 
> This breaks our rolling upgrade story, so it should be marked as a blocker.
> A quick workaround is to set the values in hdfs-site.xml with all time units 
> removed. But the right way may be to revert HDFS-10845 (and get rid of the 
> noisy warnings).






[jira] [Updated] (HDFS-13639) SlotReleaser is not fast enough

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-13639:
--
Hadoop Flags: Reviewed

> SlotReleaser is not fast enough
> ---
>
> Key: HDFS-13639
> URL: https://issues.apache.org/jira/browse/HDFS-13639
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.4.0, 2.6.0, 3.0.2
> Environment: 1. YCSB:
>  recordcount=20
>  fieldcount=1
>  fieldlength=1000
>  operationcount=1000
>  
>  workload=com.yahoo.ycsb.workloads.CoreWorkload
>  
>  table=ycsb-test
>  columnfamily=C
>  readproportion=1
>  updateproportion=0
>  insertproportion=0
>  scanproportion=0
>  
>  maxscanlength=0
>  requestdistribution=zipfian
>  
>  # default 
>  readallfields=true
>  writeallfields=true
>  scanlengthdistribution=constan
> 2. datanode:
> -Xmx2048m -Xms2048m -Xmn1024m -XX:MaxDirectMemorySize=1024m 
> -XX:MaxPermSize=256m -Xloggc:$run_dir/stdout/datanode_gc_${start_time}.log 
> -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError 
> -XX:HeapDumpPath=$log_dir -XX:+PrintGCApplicationStoppedTime 
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSParallelRemarkEnabled 
> -XX:+CMSClassUnloadingEnabled -XX:CMSMaxAbortablePrecleanTime=1 
> -XX:+CMSScavengeBeforeRemark -XX:+PrintPromotionFailure 
> -XX:+CMSConcurrentMTEnabled -XX:+ExplicitGCInvokesConcurrent 
> -XX:+SafepointTimeout -XX:MonitorBound=16384 -XX:-UseBiasedLocking 
> -verbose:gc -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps
> 3. regionserver:
> -Xmx10g -Xms10g -XX:MaxDirectMemorySize=10g 
> -XX:MaxGCPauseMillis=150 -XX:MaxTenuringThreshold=2 
> -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=5 
> -Xloggc:$run_dir/stdout/regionserver_gc_${start_time}.log -Xss256k 
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$log_dir -verbose:gc 
> -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime 
> -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy 
> -XX:+PrintTenuringDistribution -XX:+PrintSafepointStatistics 
> -XX:PrintSafepointStatisticsCount=1 -XX:PrintFLSStatistics=1 
> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=100 -XX:GCLogFileSize=128m 
> -XX:+SafepointTimeout -XX:MonitorBound=16384 -XX:-UseBiasedLocking 
> -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=65 
> -XX:+ParallelRefProcEnabled -XX:ConcGCThreads=4 -XX:ParallelGCThreads=16 
> -XX:G1HeapRegionSize=32m -XX:G1MixedGCCountTarget=64 
> -XX:G1OldCSetRegionThresholdPercent=5
> block cache is disabled: 
>  hbase.bucketcache.size
>  0.9
>  
>  
>Reporter: Gang Xie
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-13639-2.4.diff, HDFS-13639.001.patch, 
> HDFS-13639.002.patch, ShortCircuitCache_new_slotReleaser.diff, 
> perf_after_improve_SlotReleaser.png, perf_before_improve_SlotReleaser.png
>
>
> When testing the performance of HDFS short-circuit reads with YCSB, we 
> found that the SlotReleaser of the ShortCircuitCache has a performance issue. 
> The QPS of slot releasing only reaches 1000+, while the QPS of slot 
> allocating is ~3000. This means that the replica info on the datanode cannot 
> be released in time, which causes a lot of GCs and eventually full GCs.
>  
> The flame graph shows that SlotReleaser spends a lot of time connecting to 
> the domain socket and throwing/catching exceptions when closing the domain 
> socket and its streams. It makes no sense to connect and close each time. 
> Each time we connect to the domain socket, the Datanode allocates a new 
> thread to free the slot. This involves a lot of initialization work, and it 
> is costly. We need to reuse the domain socket. 
>  
> After switching to reusing the domain socket (see the attached diff), we get 
> a great improvement (see the perf graphs):
>  # Without reusing the domain socket, the get QPS of YCSB gets worse and 
> worse, and after about 45 minutes, full GC starts. When we reuse the domain 
> socket, no full GC is found, the stress test finishes smoothly, and the 
> QPS of allocating and releasing match.
>  # Due to the datanode young GC, the YCSB get QPS without the improvement is 
> even lower than with it: ~3700 vs. ~4200.
>  
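
The cost difference between connecting per release and reusing one connection can be sketched by counting connection setups. This is a toy model under assumptions: SocketReuseSketch and its counters are hypothetical, no real domain socket is opened, and the attached diff may structure the reuse differently.

```java
// Hypothetical toy model: counts connection setups, opens no real socket.
final class SocketReuseSketch {
    private int connects;      // number of simulated connection setups
    private boolean connected; // whether a reusable connection is open

    // Old behaviour as described: connect, release one slot, close.
    void releasePerCallConnect(int slots) {
        for (int i = 0; i < slots; i++) {
            connects++; // a new connection (and server-side thread) per slot
        }
    }

    // Reuse: connect once, then send every release over the same connection.
    void releaseWithReuse(int slots) {
        for (int i = 0; i < slots; i++) {
            if (!connected) {
                connects++;
                connected = true;
            }
        }
    }

    int connects() {
        return connects;
    }

    public static void main(String[] args) {
        SocketReuseSketch perCall = new SocketReuseSketch();
        perCall.releasePerCallConnect(1000);
        SocketReuseSketch reused = new SocketReuseSketch();
        reused.releaseWithReuse(1000);
        System.out.println("per-call connects: " + perCall.connects());
        System.out.println("reused connects: " + reused.connects());
    }
}
```

A real implementation would also need to reconnect when the shared socket fails, which this sketch leaves out.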






[jira] [Updated] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-13671:
--
Hadoop Flags: Reviewed

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> --
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namnode
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Yiqun Lin
>Assignee: Haibin Huang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, 
> image-2021-06-10-19-28-58-359.png, image-2021-06-18-15-46-46-052.png, 
> image-2021-06-18-15-47-04-037.png
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> The current deletion logic in the NameNode has two main steps:
> * Collect INodes and all blocks to be deleted, then delete INodes.
> * Remove blocks chunk by chunk in a loop.
> The first step should be the more expensive operation and take more time. 
> However, we now always see the NN hang during the remove-block operation. 
> Looking into this, we introduced a new structure {{FoldedTreeSet}} to get 
> better performance when handling FBRs/IBRs. But compared with the earlier 
> implementation of the remove-block logic, {{FoldedTreeSet}} seems slower 
> since it takes additional time to rebalance tree nodes. When a large number 
> of blocks is removed/deleted, it performs badly.
> For the get-type operations in {{DatanodeStorageInfo}}, we only provide 
> {{getBlockIterator}} to return a block iterator and no other get operation 
> for a specified block. Do we still need to use {{FoldedTreeSet}} in 
> {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits Get, not 
> Update. Maybe we can revert this to the earlier implementation.
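
The second step, removing blocks chunk by chunk in a loop, can be sketched as follows. This is an illustration under assumptions: ChunkedRemovalSketch, removeInChunks, and the chunk size are hypothetical, and the per-block removal is left as a comment rather than calling any NameNode code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of chunked removal; no NameNode classes are used.
final class ChunkedRemovalSketch {
    // Walks the collected block ids in fixed-size chunks and returns
    // how many chunks were processed.
    static int removeInChunks(List<Long> toDelete, int chunkSize) {
        int chunks = 0;
        int start = 0;
        while (start < toDelete.size()) {
            int end = Math.min(start + chunkSize, toDelete.size());
            List<Long> chunk = toDelete.subList(start, end);
            for (Long blockId : chunk) {
                // A real implementation would remove blockId here; the cost
                // of that single removal is what the issue is about.
            }
            chunks++;
            start = end;
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Long> blocks = new ArrayList<>();
        for (long id = 0; id < 10; id++) {
            blocks.add(id);
        }
        System.out.println("chunks processed: " + removeInChunks(blocks, 4));
    }
}
```

Chunking bounds how long each pass runs, but it does not help if every individual removal is slow, which is the concern raised about the tree-based set.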






[jira] [Updated] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-13671:
--
Component/s: namnode

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> --
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namnode
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Yiqun Lin
>Assignee: Haibin Huang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, 
> image-2021-06-10-19-28-58-359.png, image-2021-06-18-15-46-46-052.png, 
> image-2021-06-18-15-47-04-037.png
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> The current deletion logic in the NameNode has two main steps:
> * Collect the INodes and all blocks to be deleted, then delete the INodes.
> * Remove the blocks chunk by chunk in a loop.
> The first step should be the more expensive operation and take more time. 
> However, we now always see the NN hang during the remove-block operation.
> Looking into this, we introduced a new structure, {{FoldedTreeSet}}, to get 
> better performance when handling FBRs/IBRs. But compared with the earlier 
> implementation of the remove-block logic, {{FoldedTreeSet}} seems slower, 
> since it takes additional time to rebalance tree nodes. When a large number 
> of blocks has to be removed/deleted, this performs badly.
> For get-type operations, {{DatanodeStorageInfo}} only provides 
> {{getBlockIterator}} to return a block iterator; there is no other get 
> operation for a specified block. Do we still need {{FoldedTreeSet}} in 
> {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits Get, not 
> Update. Maybe we can revert this to the earlier implementation.
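For readers unfamiliar with the two-step delete described above, here is a minimal sketch of the second step (removing blocks chunk by chunk under a lock). It is not Hadoop code: the class name, the HashSet standing in for the blocks map, and the chunk size are all assumptions made for illustration. The point is only that every per-block removal runs while the lock is held, so a more expensive remove (such as one that rebalances a tree) directly lengthens each hold.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative stand-in for chunked block removal; not the NameNode implementation. */
final class ChunkedBlockRemoval {
  // Stand-in for the namesystem lock (assumption).
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Stand-in for the blocks map (assumption: a plain hash set of block IDs).
  private final Set<Long> blocksMap = new HashSet<>();
  private int lockAcquisitions = 0;

  ChunkedBlockRemoval(List<Long> initialBlocks) {
    blocksMap.addAll(initialBlocks);
  }

  /** Removes the collected blocks chunk by chunk, releasing the lock between chunks. */
  void removeBlocks(List<Long> toDelete, int chunkSize) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    for (int start = 0; start < toDelete.size(); start += chunkSize) {
      int end = Math.min(start + chunkSize, toDelete.size());
      lock.writeLock().lock();
      try {
        lockAcquisitions++;
        // Each removal here runs under the write lock.
        for (Long blockId : toDelete.subList(start, end)) {
          blocksMap.remove(blockId);
        }
      } finally {
        lock.writeLock().unlock();
      }
    }
  }

  int remainingBlocks() {
    return blocksMap.size();
  }

  int lockAcquisitions() {
    return lockAcquisitions;
  }
}
```

With 2400 blocks and a chunk size of 1000, the lock is taken three times; other operations can only get in between chunks.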






[jira] [Updated] (HDFS-14013) Skip any credentials stored in HDFS when starting ZKFC

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-14013:
--
Hadoop Flags: Reviewed

> Skip any credentials stored in HDFS when starting ZKFC
> --
>
> Key: HDFS-14013
> URL: https://issues.apache.org/jira/browse/HDFS-14013
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.1
>Reporter: Krzysztof Adamski
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: zkfc
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
> Attachments: HDFS-14013.001.patch, hadoop-hdfs-zkfc-server1.log
>
>
> HADOOP-15157 added the ability to use a jceks credential provider to store 
> the Zookeeper credentials needed by the Failover Controller to connect to 
> Zookeeper.
> By default, if any provider is specified in 
> hadoop.security.credential.provider.path, it will be checked to see if it 
> holds the required information; otherwise the traditional way of getting 
> the login will be used.
> hadoop.security.credential.provider.path can hold a list of credential 
> providers and if there is an error reading any of them, the exception bubbles 
> up and causes the ZKFC to fail. The intent of HADOOP-15157 is to have a local 
> jceks file for the FC credentials, but if there is another provider stored in 
> HDFS (eg S3A credentials), then it will fail to be read and cause the FC to 
> fail.
> Other components which use credential providers (e.g. S3A, ABFS) explicitly 
> disallow storing the credentials in the same type of filesystem, i.e. S3A 
> cannot use providers stored in S3. To avoid this sort of circular dependency, 
> any such credentials are removed from the list before they are used.
> The Failover Controller should do the same, and ensure it does not try to 
> read any credentials stored in HDFS, as it will never be able to do so until 
> HDFS is fully started.
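
To make the "remove such providers from the list" idea concrete, here is a small plain-Java sketch. It is not the Hadoop implementation: the class and method names are hypothetical, and it assumes provider URIs of the form jceks://SCHEME@HOST/path or jceks://SCHEME/path, where the part of the authority before "@" names the underlying filesystem.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that drops credential providers stored on a given filesystem. */
final class ProviderPathFilter {

  /** Returns the nested scheme of a provider URI, e.g. "hdfs" for jceks://hdfs@nn/path. */
  static String nestedScheme(String providerUri) {
    String authority = URI.create(providerUri.trim()).getRawAuthority();
    if (authority == null) {
      return null;
    }
    int at = authority.indexOf('@');
    return at >= 0 ? authority.substring(0, at) : authority;
  }

  /** Returns the comma-separated provider path without entries on excludedScheme. */
  static String excludeScheme(String providerPath, String excludedScheme) {
    List<String> kept = new ArrayList<>();
    for (String provider : providerPath.split(",")) {
      String p = provider.trim();
      if (p.isEmpty()) {
        continue;
      }
      // Keep the provider unless its nested scheme matches the excluded one.
      if (!excludedScheme.equalsIgnoreCase(nestedScheme(p))) {
        kept.add(p);
      }
    }
    return String.join(",", kept);
  }
}
```

For the ZKFC case, the excluded scheme would be "hdfs", leaving only providers such as a local jceks file in the list.
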
> For reference, the stack logged when the FC meets this problem is:
>   
> {code:java}
> 2018-10-22 08:17:09,251 FATAL tools.DFSZKFailoverController (DFSZKFailoverController.java:main(197)) - DFSZKFailOverController exiting due to earlier exception java.io.IOException: Configuration problem with provider path.
> 2018-10-22 08:17:09,252 DEBUG util.ExitUtil (ExitUtil.java:terminate(209)) - Exiting with status 1: java.io.IOException: Configuration problem with provider path.
> 1: java.io.IOException: Configuration problem with provider path.
>   at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:265)
>   at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:199)
> Caused by: java.io.IOException: Configuration problem with provider path.
>   at org.apache.hadoop.conf.Configuration.getPasswordFromCredentialProviders(Configuration.java:2363)
>   at org.apache.hadoop.conf.Configuration.getPassword(Configuration.java:2282)
>   at org.apache.hadoop.security.SecurityUtil.getZKAuthInfos(SecurityUtil.java:732)
>   at org.apache.hadoop.ha.ZKFailoverController.initZK(ZKFailoverController.java:343)
>   at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:194)
>   at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:60)
>   at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:175)
>   at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
>   at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:480)
>   at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:171)
>   at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:195)
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
>   at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1951)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1427)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3100)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1154)
>   at 
> 

[jira] [Updated] (HDFS-14694) Call recoverLease on DFSOutputStream close exception

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-14694:
--
Affects Version/s: 3.4.0

> Call recoverLease on DFSOutputStream close exception
> 
>
> Key: HDFS-14694
> URL: https://issues.apache.org/jira/browse/HDFS-14694
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.4.0
>Reporter: Chen Zhang
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-14694.001.patch, HDFS-14694.002.patch, 
> HDFS-14694.003.patch, HDFS-14694.004.patch, HDFS-14694.005.patch, 
> HDFS-14694.006.patch, HDFS-14694.007.patch, HDFS-14694.008.patch, 
> HDFS-14694.009.patch, HDFS-14694.010.patch, HDFS-14694.011.patch, 
> HDFS-14694.012.patch, HDFS-14694.013.patch, HDFS-14694.014.patch
>
>
> HDFS uses file leases to manage open files. When a file is not closed 
> normally, the NN recovers the lease automatically after the hard limit is 
> exceeded. But for a long-running service (e.g. HBase), the hdfs-client never 
> dies and the NN never gets a chance to recover the file.
> Usually client programs need to handle exceptions themselves to avoid this 
> condition (e.g. HBase automatically calls recoverLease for files that were 
> not closed normally), but in our experience most services (in our company) 
> don't handle this condition properly, which leaves lots of files in an 
> abnormal state or even causes data loss.
> This Jira proposes adding a feature that calls recoverLease automatically 
> when DFSOutputStream close encounters an exception. It should be disabled by 
> default, but anyone building a long-running service on HDFS can enable this 
> option.
> We added this feature to our internal Hadoop distribution more than 3 years 
> ago, and it has been quite useful in our experience.
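A rough sketch of the proposed behaviour follows, for illustration only. The two interfaces are hypothetical stand-ins for the output stream and for the client call that asks the NN to recover a lease; the boolean flag models the "disabled by default" option. None of these names come from the patch.

```java
import java.io.IOException;

/** Illustrative close-with-lease-recovery wrapper; not the HDFS-14694 patch. */
final class RecoverLeaseOnClose {

  /** Hypothetical stand-in for an output stream whose close() may fail. */
  interface ClosableStream {
    void close() throws IOException;
  }

  /** Hypothetical stand-in for the call that triggers lease recovery on the NN. */
  interface LeaseRecoverer {
    boolean recoverLease(String path) throws IOException;
  }

  /**
   * Closes the stream. If close fails and the option is enabled, asks for
   * lease recovery, then rethrows the original close failure.
   */
  static void close(ClosableStream out, String path, LeaseRecoverer recoverer,
      boolean recoverOnCloseException) throws IOException {
    try {
      out.close();
    } catch (IOException closeFailure) {
      if (recoverOnCloseException) {
        try {
          recoverer.recoverLease(path);
        } catch (IOException recoverFailure) {
          // Keep the close failure as the primary error.
          closeFailure.addSuppressed(recoverFailure);
        }
      }
      throw closeFailure;
    }
  }
}
```

The caller still sees the close exception either way; the only difference is that the file is no longer left waiting for a hard-limit expiry that a long-lived client would keep postponing.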






[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15098:
--
Component/s: hdfs

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: liusheng
>Assignee: liusheng
>Priority: Major
>  Labels: pull-request-available, sm4
> Fix For: 3.4.0
>
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, 
> HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, 
> HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, 
> HDFS-15098.009.patch, image-2020-08-19-16-54-41-341.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard 
> for Wireless LAN, WAPI (WLAN Authentication and Privacy Infrastructure).
> SM4 was proposed as a cipher for the IEEE 802.11i standard, but has so far 
> been rejected by ISO. One of the reasons for the rejection has been 
> opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>  
> *Use SM4 on HDFS as follows:*
> 1. Configure Hadoop KMS
> 2. Test HDFS SM4
>  hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
>  hdfs dfs -mkdir /benchmarks
>  hdfs crypto -createZone -keyName key1 -path /benchmarks
> *Requires:*
> 1. OpenSSL version >= 1.1.1






[jira] [Updated] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15160:
--
Hadoop Flags: Reviewed
Target Version/s: 3.2.3, 2.10.2, 3.4.0  (was: 3.4.0, 2.10.2, 3.2.3)

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl 
> methods should use datanode readlock
> ---
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
> Attachments: HDFS-15160-branch-3.3-001.patch, HDFS-15160.001.patch, 
> HDFS-15160.002.patch, HDFS-15160.003.patch, HDFS-15160.004.patch, 
> HDFS-15160.005.patch, HDFS-15160.006.patch, HDFS-15160.007.patch, 
> HDFS-15160.008.patch, HDFS-15160.branch-3-3.001.patch, 
> image-2020-04-10-17-18-08-128.png, image-2020-04-10-17-18-55-938.png
>
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> Now that we have HDFS-15150, we can start to move some DN operations to use 
> the read lock rather than the write lock to improve concurrency. The first 
> step is to make the changes to ReplicaMap, as many other methods make calls 
> to it.
> This Jira switches read operations against the volume map to use the readLock 
> rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (eg 
> getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result 
> in a read only fashion, so they can also be switched to using a readLock.
> Next is the directory scanner and disk balancer, which only require a read 
> lock.
> Finally (for this Jira) are various "low hanging fruit" items in BlockSender 
> and FsDatasetImpl where it is fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think 
> it's better to do any larger refactoring or risky changes each in their own 
> Jira.
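As a generic illustration of the read-lock/write-lock split described above (not the actual ReplicaMap code), a map guarded by java.util.concurrent's ReentrantReadWriteLock could look like this. The class name and the String payload are assumptions; only mutations take the write lock, while lookups and read-only copies take the read lock and can therefore run concurrently with each other.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative read/write-locked map; a stand-in, not Hadoop's ReplicaMap. */
final class ReplicaMapSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<Long, String> replicas = new HashMap<>();

  /** Mutation: needs the exclusive write lock. */
  void add(long blockId, String replicaInfo) {
    lock.writeLock().lock();
    try {
      replicas.put(blockId, replicaInfo);
    } finally {
      lock.writeLock().unlock();
    }
  }

  /** Lookup: the shared read lock is enough. */
  String get(long blockId) {
    lock.readLock().lock();
    try {
      return replicas.get(blockId);
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Read-only copy of all replicas, also under the read lock. */
  List<String> snapshot() {
    lock.readLock().lock();
    try {
      return new ArrayList<>(replicas.values());
    } finally {
      lock.readLock().unlock();
    }
  }
}
```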






[jira] [Updated] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15240:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.1, 3.2.2, 3.4.0  (was: 3.2.2, 3.3.1, 3.4.0)

> Erasure Coding: dirty buffer causes reconstruction block error
> --
>
> Key: HDFS-15240
> URL: https://issues.apache.org/jira/browse/HDFS-15240
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding
>Affects Versions: 3.3.1, 3.4.0
>Reporter: HuangTao
>Assignee: HuangTao
>Priority: Blocker
> Fix For: 3.2.2, 3.3.1, 3.4.0
>
> Attachments: HDFS-15240-branch-3.1-001.patch, 
> HDFS-15240-branch-3.1.001.patch, HDFS-15240-branch-3.2.001.patch, 
> HDFS-15240-branch-3.3-001.patch, HDFS-15240-branch-3.3.001.patch, 
> HDFS-15240.001.patch, HDFS-15240.002.patch, HDFS-15240.003.patch, 
> HDFS-15240.004.patch, HDFS-15240.005.patch, HDFS-15240.006.patch, 
> HDFS-15240.007.patch, HDFS-15240.008.patch, HDFS-15240.009.patch, 
> HDFS-15240.010.patch, HDFS-15240.011.patch, HDFS-15240.012.patch, 
> HDFS-15240.013.patch, image-2020-07-16-15-56-38-608.png, 
> org.apache.hadoop.hdfs.TestReconstructStripedFile-output.txt, 
> org.apache.hadoop.hdfs.TestReconstructStripedFile.txt, 
> test-HDFS-15240.006.patch
>
>
> # While reading some LZO files, we found that some blocks were broken.
> I read back all internal blocks (b0-b8) of the block group (RS-6-3-1024k) 
> directly from the DNs, chose 6 blocks (b0-b5) to decode the other 3 (b6', 
> b7', b8'), and computed the longest common sequence (LCS) between b6' 
> (decoded) and b6 (read from the DN), and likewise for b7'/b7 and b8'/b8.
> After trying every combination of 6 blocks from the block group, I found one 
> case where the length of the LCS is the block length - 64KB. 64KB is exactly 
> the length of the ByteBuffer used by StripedBlockReader, so the corrupt 
> reconstructed block was produced by a dirty buffer.
> The following log snippet (showing only 2 of the 28 cases) is the output of 
> my check program. In my case, I know that block 3 is corrupt, so I need 5 
> other blocks to decode another 3 blocks, and then find that the LCS substring 
> of block 1 is the block length - 64KB.
> This means that blocks (0,1,2,4,5,6) were used to reconstruct block 3, and 
> the dirty buffer was used before block 1 was read.
> Note that StripedBlockReader read from offset 0 of block 1 after the dirty 
> buffer had been used.
> EDITED for readability.
> {code:java}
> decode from block[0, 2, 3, 4, 5, 7] to generate block[1', 6', 8']
> Check the first 131072 bytes between block[1] and block[1'], the longest 
> common substring length is 4
> Check the first 131072 bytes between block[6] and block[6'], the longest 
> common substring length is 4
> Check the first 131072 bytes between block[8] and block[8'], the longest 
> common substring length is 4
> decode from block[0, 2, 3, 4, 5, 6] to generate block[1', 7', 8']
> Check the first 131072 bytes between block[1] and block[1'], the longest 
> common substring length is 65536
> CHECK AGAIN: all 27262976 bytes between block[1] and block[1'], the longest 
> common substring length is 27197440  # this one
> Check the first 131072 bytes between block[7] and block[7'], the longest 
> common substring length is 4
> Check the first 131072 bytes between block[8] and block[8'], the longest 
> common substring length is 4{code}
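
The numbers in the output above fit the dirty-buffer explanation: 27262976 - 65536 = 27197440, i.e. everything except one 64KB buffer matches. The toy program below shows one way a reused java.nio.ByteBuffer can produce that signature. It is a scaled-down illustration (4-byte buffer, 16-byte block) with hypothetical names, not the StripedBlockReader code: ByteBuffer.clear() only resets position and limit, so if a read is skipped and the buffer is consumed anyway, stale bytes come out first and the real data follows from offset 0.

```java
import java.nio.ByteBuffer;

/** Toy demonstration of stale data surviving ByteBuffer.clear(). */
final class DirtyBufferDemo {

  /** Copies up to buf.remaining() bytes of src, starting at offset, into buf. */
  static int read(byte[] src, int offset, ByteBuffer buf) {
    int n = Math.min(buf.remaining(), src.length - offset);
    buf.put(src, offset, n);
    return n;
  }

  /**
   * Rebuilds a block through one reused buffer. When firstReadInterrupted is
   * true, the first read never fills the buffer, but its content is consumed
   * as if it had (assumed failure mode, for illustration).
   */
  static byte[] copyBlock(byte[] block, ByteBuffer reused, boolean firstReadInterrupted) {
    byte[] out = new byte[block.length];
    int written = 0;
    int offset = 0;
    boolean first = true;
    while (written < out.length) {
      reused.clear(); // resets position and limit; does NOT erase the content
      if (first && firstReadInterrupted) {
        reused.position(reused.capacity()); // pretend the buffer was filled
      } else {
        offset += read(block, offset, reused);
      }
      first = false;
      reused.flip();
      int n = Math.min(reused.remaining(), out.length - written);
      reused.get(out, written, n);
      written += n;
    }
    return out;
  }
}
```

In the interrupted case the output is 4 stale bytes followed by the first 12 bytes of the block, so the longest match with the original is the block length minus the buffer size.
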
> Now I know the dirty buffer causes the reconstruction block error, but how 
> does the dirty buffer come about?
> After digging into the code and the DN log, I found that the following DN 
> log shows the root cause.
> {code:java}
> [INFO] [stripedRead-1017] : Interrupted while waiting for IO on channel 
> java.nio.channels.SocketChannel[connected local=/:52586 
> remote=/:50010]. 18 millis timeout left.
> [WARN] [StripedBlockReconstruction-199] : Failed to reconstruct striped 
> block: BP-714356632--1519726836856:blk_-YY_3472979393
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.util.StripedBlockUtil.getNextCompletedStripedRead(StripedBlockUtil.java:314)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.doReadMinimumSources(StripedReader.java:308)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.readMinimumSources(StripedReader.java:269)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:94)
> at 
> org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60)
> at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at 

[jira] [Updated] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15240:
--
Affects Version/s: 3.3.1
   3.4.0


[jira] [Updated] (HDFS-15253) Set default throttle value on dfs.image.transfer.bandwidthPerSec

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15253:
--
Affects Version/s: 3.3.1
   3.4.0

> Set default throttle value on dfs.image.transfer.bandwidthPerSec
> 
>
> Key: HDFS-15253
> URL: https://issues.apache.org/jira/browse/HDFS-15253
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Karthik Palanisamy
>Assignee: Karthik Palanisamy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The default value of dfs.image.transfer.bandwidthPerSec is 0, so fsimage 
> transfers during checkpoint can use the maximum available bandwidth. I 
> think we should throttle this. Many users have experienced NameNode failover 
> when transferring a large image (e.g. >25Gb) along with fsimage replication 
> on dfs.namenode.name.dir.
> Thought to set:
> dfs.image.transfer.bandwidthPerSec=52428800 (50 MB/s)
> dfs.namenode.checkpoint.txns=200 (Default is 1M; good to avoid frequent 
> checkpoints. However, the default checkpoint runs once every 6 hours)
>  
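A quick sanity check of the proposed value, as a sketch: 52428800 is 50 x 1024 x 1024 bytes. Assuming binary units throughout (an assumption; the issue text does not say), a 25 GiB image throttled to that rate would take about 512 seconds to transfer. The helper names below are made up for this example.

```java
/** Arithmetic helper for the throttle value; names are illustrative only. */
final class ImageTransferThrottle {

  /** Converts MiB/s to the bytes-per-second value the setting expects. */
  static long mbPerSecToBytes(long mbPerSec) {
    return mbPerSec * 1024L * 1024L;
  }

  /** Seconds needed to move imageBytes at bandwidthPerSec, rounded up. */
  static long transferSeconds(long imageBytes, long bandwidthPerSec) {
    if (bandwidthPerSec <= 0) {
      // In the issue text, 0 means "no throttle", so there is no bound to compute.
      throw new IllegalArgumentException("bandwidthPerSec must be positive");
    }
    return (imageBytes + bandwidthPerSec - 1) / bandwidthPerSec;
  }
}
```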






[jira] [Updated] (HDFS-15255) Consider StorageType when DatanodeManager#sortLocatedBlock()

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15255:
--
Component/s: hdfs

> Consider StorageType when DatanodeManager#sortLocatedBlock()
> 
>
> Key: HDFS-15255
> URL: https://issues.apache.org/jira/browse/HDFS-15255
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15255-findbugs-test.001.patch, 
> HDFS-15255.001.patch, HDFS-15255.002.patch, HDFS-15255.003.patch, 
> HDFS-15255.004.patch, HDFS-15255.005.patch, HDFS-15255.006.patch, 
> HDFS-15255.007.patch, HDFS-15255.008.patch, HDFS-15255.009.patch, 
> HDFS-15255.010.patch, experiment-find-bugs.001.patch
>
>
> Consider the case where only one replica of a block is on SSD and the others 
> are on HDD. 
> When the client reads the data, the current logic considers the distance 
> between the client and the DN. I think it should also consider the 
> StorageType of the replica, and give priority to the node with the faster 
> StorageType when the distance is the same.
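The proposed ordering can be sketched as a two-level comparator: network distance first, storage speed as the tie-breaker. This is an illustration, not DatanodeManager code; the Replica class and the Storage enum (declared fastest first) are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative distance-then-storage ordering; not the HDFS-15255 patch. */
final class ReplicaSorter {

  /** Hypothetical storage kinds, declared from fastest to slowest. */
  enum Storage { SSD, DISK, ARCHIVE }

  /** Hypothetical replica location. */
  static final class Replica {
    final String node;
    final int distance;
    final Storage storage;

    Replica(String node, int distance, Storage storage) {
      this.node = node;
      this.distance = distance;
      this.storage = storage;
    }
  }

  /** Returns a copy sorted by distance, then by storage speed for equal distances. */
  static List<Replica> sort(List<Replica> replicas) {
    List<Replica> sorted = new ArrayList<>(replicas);
    Comparator<Replica> byDistance = Comparator.comparingInt(r -> r.distance);
    Comparator<Replica> byStorage = Comparator.comparing(r -> r.storage);
    sorted.sort(byDistance.thenComparing(byStorage));
    return sorted;
  }
}
```

A closer node still wins even if it is on slower storage; the storage type only matters between nodes at the same distance.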






[jira] [Updated] (HDFS-15253) Set default throttle value on dfs.image.transfer.bandwidthPerSec

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15253:
--
Hadoop Flags: Reviewed




[jira] [Updated] (HDFS-15255) Consider StorageType when DatanodeManager#sortLocatedBlock()

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15255:
--
Affects Version/s: 3.3.1
   3.4.0




[jira] [Updated] (HDFS-15287) HDFS rollingupgrade prepare never finishes

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15287:
--
Hadoop Flags: Reviewed

> HDFS rollingupgrade prepare never finishes
> --
>
> Key: HDFS-15287
> URL: https://issues.apache.org/jira/browse/HDFS-15287
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.10.0, 3.3.0
>Reporter: Kihwal Lee
>Priority: Major
>
> After HDFS-12979, the prepare step of rolling upgrade does not work. This is 
> because it added an additional check that sufficient time has passed since 
> the last checkpoint. Since RU rollback image creation and upload can happen 
> at any time, uploading of the rollback image never succeeds. For a new 
> cluster deployed for testing, it might work since it never checkpointed 
> before.
> It was found that this check is disabled for unit tests, defeating the very 
> purpose of testing.






[jira] [Updated] (HDFS-15283) Cache pool MAXTTL is not persisted and restored on cluster restart

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15283:
--
Hadoop Flags: Reviewed

> Cache pool MAXTTL is not persisted and restored on cluster restart
> --
>
> Key: HDFS-15283
> URL: https://issues.apache.org/jira/browse/HDFS-15283
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1
>
> Attachments: HDFS-15283.001.patch
>
>
> The cache pool "getMaxRelativeExpiryMs" is never persisted to or read from 
> the FSImage. This means that if a MAXTTL is set on a pool, it will not 
> persist beyond a cluster restart.
> From the protobuf definition, there is an existing field to store it:
> {code}
> message CachePoolInfoProto {
>   optional string poolName = 1;
>   optional string ownerName = 2;
>   optional string groupName = 3;
>   optional int32 mode = 4;
>   optional int64 limit = 5;
>   optional int64 maxRelativeExpiry = 6; <-- NEVER SET
>   optional uint32 defaultReplication = 7 [default=1];
> }
> {code}
> But this is never set in the CacheManager.saveState() or read in 
> CacheManager.loadState().
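The missing round trip can be pictured with a small stand-in. This is not the generated protobuf class or the CacheManager code: PoolRecord is a hypothetical record whose nullable field models the optional maxRelativeExpiry, and the default passed to the loader is an assumption.

```java
/** Illustrative save/load round trip for an optional field; not Hadoop code. */
final class CachePoolStateSketch {

  /** Hypothetical persisted record; a null maxRelativeExpiry means "field not set". */
  static final class PoolRecord {
    final String poolName;
    final Long maxRelativeExpiry;

    PoolRecord(String poolName, Long maxRelativeExpiry) {
      this.poolName = poolName;
      this.maxRelativeExpiry = maxRelativeExpiry;
    }
  }

  /** Save side: copy the pool's max TTL into the record instead of leaving it unset. */
  static PoolRecord save(String poolName, long maxRelativeExpiryMs) {
    return new PoolRecord(poolName, maxRelativeExpiryMs);
  }

  /** Load side: use the stored value when present, else fall back to a default. */
  static long loadMaxRelativeExpiry(PoolRecord record, long defaultMs) {
    return record.maxRelativeExpiry != null ? record.maxRelativeExpiry : defaultMs;
  }
}
```

Records written before such a fix would have the field unset, which is why the load side keeps a fallback.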






[jira] [Updated] (HDFS-15255) Consider StorageType when DatanodeManager#sortLocatedBlock()

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15255:
--
Hadoop Flags: Reviewed

> Consider StorageType when DatanodeManager#sortLocatedBlock()
> 
>
> Key: HDFS-15255
> URL: https://issues.apache.org/jira/browse/HDFS-15255
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15255-findbugs-test.001.patch, 
> HDFS-15255.001.patch, HDFS-15255.002.patch, HDFS-15255.003.patch, 
> HDFS-15255.004.patch, HDFS-15255.005.patch, HDFS-15255.006.patch, 
> HDFS-15255.007.patch, HDFS-15255.008.patch, HDFS-15255.009.patch, 
> HDFS-15255.010.patch, experiment-find-bugs.001.patch
>
>
> When only one replica of a block is on SSD and the others are on HDD, the 
> current logic on a client read considers only the distance between the 
> client and the dn. I think it should also consider the StorageType of the 
> replica: when the distance is the same, the node with the faster StorageType 
> should be returned first.
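The proposed ordering can be sketched as a two-level comparator. This is an illustration only: `Replica` and `storageRank` are hypothetical names, the real code operates on DatanodeInfo and StorageType, and the speed ranking below is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical replica descriptor used only for this sketch.
final class Replica {
  final String node;
  final int distance;       // network distance from the client, smaller is closer
  final String storageType; // e.g. "SSD" or "DISK"

  Replica(String node, int distance, String storageType) {
    this.node = node;
    this.distance = distance;
    this.storageType = storageType;
  }

  // Assumed speed ranking for illustration: a lower rank means faster media.
  static int storageRank(String storageType) {
    switch (storageType) {
      case "RAM_DISK": return 0;
      case "SSD": return 1;
      case "DISK": return 2;
      default: return 3; // ARCHIVE and anything unknown
    }
  }

  // Sort by distance first and break ties by storage speed.
  static List<Replica> sort(List<Replica> replicas) {
    List<Replica> sorted = new ArrayList<>(replicas);
    sorted.sort(Comparator.comparingInt((Replica r) -> r.distance)
        .thenComparingInt(r -> storageRank(r.storageType)));
    return sorted;
  }
}
```

Distance stays the primary key, so a closer HDD replica still wins over a more distant SSD replica; the storage type only decides between replicas at equal distance.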






[jira] [Updated] (HDFS-15298) Fix the findbugs warnings introduced in HDFS-15217

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15298:
--
Hadoop Flags: Reviewed

> Fix the findbugs warnings introduced in HDFS-15217
> --
>
> Key: HDFS-15298
> URL: https://issues.apache.org/jira/browse/HDFS-15298
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
> Fix For: 3.4.0
>
>
> We need to fix the findbugs warnings introduced in HDFS-15217:
> https://builds.apache.org/job/hadoop-multibranch/job/PR-1954/5/artifact/out/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html






[jira] [Updated] (HDFS-15298) Fix the findbugs warnings introduced in HDFS-15217

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15298:
--
Component/s: namanode

> Fix the findbugs warnings introduced in HDFS-15217
> --
>
> Key: HDFS-15298
> URL: https://issues.apache.org/jira/browse/HDFS-15298
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
> Fix For: 3.4.0
>
>
> We need to fix the findbugs warnings introduced in HDFS-15217:
> https://builds.apache.org/job/hadoop-multibranch/job/PR-1954/5/artifact/out/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html






[jira] [Updated] (HDFS-15298) Fix the findbugs warnings introduced in HDFS-15217

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15298:
--
Affects Version/s: 3.4.0

> Fix the findbugs warnings introduced in HDFS-15217
> --
>
> Key: HDFS-15298
> URL: https://issues.apache.org/jira/browse/HDFS-15298
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
> Fix For: 3.4.0
>
>
> We need to fix the findbugs warnings introduced in HDFS-15217:
> https://builds.apache.org/job/hadoop-multibranch/job/PR-1954/5/artifact/out/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html






[jira] [Updated] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15313:
--
Hadoop Flags: Reviewed

> Ensure inodes in active filesystem are not deleted during snapshot delete
> -
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.2.2, 2.10.1, 3.3.1, 3.4.0
>
> Attachments: HDFS-15313-branch-3.1.001.patch, HDFS-15313.000.patch, 
> HDFS-15313.001.patch, HDFS-15313.branch-2.10.001.patch, 
> HDFS-15313.branch-2.10.patch, HDFS-15313.branch-2.8.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that 
> deleting a snapshot can end up cleaning up inodes from the active fs that 
> are referenced from only one snapshot, because the isLastReference() check 
> for the parent dir introduced in HDFS-13101 may return true in certain 
> cases. The aim of this Jira is to add a check ensuring that inodes still 
> referenced in the active fs are not deleted while a snapshot is deleted.






[jira] [Updated] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15313:
--
Affects Version/s: 3.3.1
   3.4.0

> Ensure inodes in active filesystem are not deleted during snapshot delete
> -
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.2.2, 2.10.1, 3.3.1, 3.4.0
>
> Attachments: HDFS-15313-branch-3.1.001.patch, HDFS-15313.000.patch, 
> HDFS-15313.001.patch, HDFS-15313.branch-2.10.001.patch, 
> HDFS-15313.branch-2.10.patch, HDFS-15313.branch-2.8.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that 
> deleting a snapshot can end up cleaning up inodes from the active fs that 
> are referenced from only one snapshot, because the isLastReference() check 
> for the parent dir introduced in HDFS-13101 may return true in certain 
> cases. The aim of this Jira is to add a check ensuring that inodes still 
> referenced in the active fs are not deleted while a snapshot is deleted.






[jira] [Updated] (HDFS-15344) DataNode#checkSuperuserPrivilege should use UGI#getGroups after HADOOP-13442

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15344:
--
Component/s: datanode

> DataNode#checkSuperuserPrivilege should use UGI#getGroups after HADOOP-13442
> 
>
> Key: HDFS-15344
> URL: https://issues.apache.org/jira/browse/HDFS-15344
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.5
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Fix For: 3.4.0
>
>
> HADOOP-13442 added UGI#getGroups to avoid list->array->list conversions. This 
> ticket is opened to change DataNode#checkSuperuserPrivilege to use 
> UGI#getGroups. 
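The conversion being avoided can be sketched as follows. `GroupCheck` is a hypothetical stand-in and not the real UserGroupInformation; it only mirrors the shape of an array-returning accessor next to a list-returning one.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the group lookup; not the real UserGroupInformation.
final class GroupCheck {
  private final List<String> groups;

  GroupCheck(List<String> groups) {
    this.groups = groups;
  }

  // Array-returning accessor: copies the internal list into a new array.
  String[] getGroupNames() {
    return groups.toArray(new String[0]);
  }

  // List-returning accessor: hands back the list without copying.
  List<String> getGroups() {
    return groups;
  }

  // Before: list -> array -> list, only to call contains().
  boolean inSupergroupViaArray(String supergroup) {
    return Arrays.asList(getGroupNames()).contains(supergroup);
  }

  // After: query the list directly.
  boolean inSupergroup(String supergroup) {
    return getGroups().contains(supergroup);
  }
}
```

Both methods give the same answer; the second simply skips the intermediate array allocation on every privilege check.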






[jira] [Updated] (HDFS-15320) StringIndexOutOfBoundsException in HostRestrictingAuthorizationFilter

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15320:
--
Affects Version/s: 3.3.1
   3.4.0

> StringIndexOutOfBoundsException in HostRestrictingAuthorizationFilter
> -
>
> Key: HDFS-15320
> URL: https://issues.apache.org/jira/browse/HDFS-15320
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.3.1, 3.4.0
> Environment: HostRestrictingAuthorizationFilter (HDFS-14234) is 
> enabled
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
>
> When there is a request to "http://:/" without "webhdfs/v1" 
> suffix, DN returns 500 response code and throws 
> StringIndexOutOfBoundsException as follows: 
> {noformat}
> 2020-05-01 16:10:20,220 ERROR 
> org.apache.hadoop.hdfs.server.datanode.web.HostRestrictingAuthorizationFilterHandler:
>  Exception in HostRestrictingAuthorizationFilterHandler
> java.lang.StringIndexOutOfBoundsException: String index out of range: -10
> at java.base/java.lang.String.substring(String.java:1841)
> at 
> org.apache.hadoop.hdfs.server.common.HostRestrictingAuthorizationFilter.handleInteraction(HostRestrictingAuthorizationFilter.java:234)
> at 
> org.apache.hadoop.hdfs.server.datanode.web.HostRestrictingAuthorizationFilterHandler.channelRead0(HostRestrictingAuthorizationFilterHandler.java:155)
> at 
> org.apache.hadoop.hdfs.server.datanode.web.HostRestrictingAuthorizationFilterHandler.channelRead0(HostRestrictingAuthorizationFilterHandler.java:58)
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
> at 
> io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:328)
> at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:302)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
> at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
> at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931)
> at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:635)
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:552)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
> at 
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}
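A minimal sketch of the kind of guard that avoids this exception, assuming the failure comes from stripping a fixed path prefix from a URI that does not start with it. `WebHdfsPath` and `stripPrefix` are hypothetical names, not the actual filter code. Under that assumption, the -10 in the trace would be consistent with stripping the 11-character prefix "/webhdfs/v1" from the 1-character URI "/".

```java
// Hypothetical helper; not the actual HostRestrictingAuthorizationFilter code.
final class WebHdfsPath {
  static final String PREFIX = "/webhdfs/v1";

  // Returns the path after the prefix, or null when the URI is not a WebHDFS request.
  static String stripPrefix(String uri) {
    if (uri == null || !uri.startsWith(PREFIX)) {
      return null; // the caller can reject the request instead of failing with a 500
    }
    return uri.substring(PREFIX.length());
  }
}
```

With the check in place, a request to "/" yields null rather than an unchecked exception from String.substring.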






[jira] [Updated] (HDFS-15345) RBF: RouterPermissionChecker#checkSuperuserPrivilege should use UGI#getGroups after HADOOP-13442

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15345:
--
Component/s: rbf

> RBF: RouterPermissionChecker#checkSuperuserPrivilege should use UGI#getGroups 
> after HADOOP-13442
> 
>
> Key: HDFS-15345
> URL: https://issues.apache.org/jira/browse/HDFS-15345
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 2.7.5
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Fix For: 3.4.0
>
>
> HADOOP-13442 added UGI#getGroups to avoid list->array->list conversions. This 
> ticket is opened to change  RouterPermissionChecker#checkSuperuserPrivilege 
> to use UGI#getGroups. 






[jira] [Updated] (HDFS-15344) DataNode#checkSuperuserPrivilege should use UGI#getGroups after HADOOP-13442

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15344:
--
Affects Version/s: 3.4.0

> DataNode#checkSuperuserPrivilege should use UGI#getGroups after HADOOP-13442
> 
>
> Key: HDFS-15344
> URL: https://issues.apache.org/jira/browse/HDFS-15344
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.5, 3.4.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Fix For: 3.4.0
>
>
> HADOOP-13442 added UGI#getGroups to avoid list->array->list conversions. This 
> ticket is opened to change DataNode#checkSuperuserPrivilege to use 
> UGI#getGroups. 






[jira] [Updated] (HDFS-15345) RBF: RouterPermissionChecker#checkSuperuserPrivilege should use UGI#getGroups after HADOOP-13442

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15345:
--
Affects Version/s: 3.4.0

> RBF: RouterPermissionChecker#checkSuperuserPrivilege should use UGI#getGroups 
> after HADOOP-13442
> 
>
> Key: HDFS-15345
> URL: https://issues.apache.org/jira/browse/HDFS-15345
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 2.7.5, 3.4.0
>Reporter: Xiaoyu Yao
>Assignee: Xiaoyu Yao
>Priority: Major
> Fix For: 3.4.0
>
>
> HADOOP-13442 added UGI#getGroups to avoid list->array->list conversions. This 
> ticket is opened to change  RouterPermissionChecker#checkSuperuserPrivilege 
> to use UGI#getGroups. 






[jira] [Updated] (HDFS-15371) Nonstandard characters exist in NameNode.java

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15371:
--
Hadoop Flags: Reviewed

> Nonstandard characters exist in NameNode.java
> -
>
> Key: HDFS-15371
> URL: https://issues.apache.org/jira/browse/HDFS-15371
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Affects Versions: 3.1.0
>Reporter: JiangHua Zhu
>Assignee: Zhao Yi Ming
>Priority: Minor
> Fix For: 3.4.0
>
>
> In NameNode.java, DFS_HA_ZKFC_PORT_KEY has non-standard characters after it.






[jira] [Updated] (HDFS-15372) Files in snapshots no longer see attribute provider permissions

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15372:
--
Hadoop Flags: Reviewed

> Files in snapshots no longer see attribute provider permissions
> ---
>
> Key: HDFS-15372
> URL: https://issues.apache.org/jira/browse/HDFS-15372
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15372.001.patch, HDFS-15372.002.patch, 
> HDFS-15372.003.patch, HDFS-15372.004.patch, HDFS-15372.005.patch
>
>
> Given a cluster with an authorization provider configured (e.g. Sentry), 
> where the paths covered by the provider are snapshottable, the way provider 
> permissions and ACLs are applied to files in snapshots changed between the 
> 2.x branch and Hadoop 3.0.
> For example, suppose the snapshottable path /data is managed by Sentry. The 
> ACLs below are provided by Sentry:
> {code}
> hadoop fs -getfacl -R /data
> # file: /data
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::---
> group:flume:rwx
> user:hive:rwx
> group:hive:rwx
> group:testgroup:rwx
> mask::rwx
> other::--x
> /data/tab1
> {code}
> After taking a snapshot, the files in the snapshot do not see the provider 
> permissions:
> {code}
> hadoop fs -getfacl -R /data/.snapshot
> # file: /data/.snapshot
> # owner: 
> # group: 
> user::rwx
> group::rwx
> other::rwx
> # file: /data/.snapshot/snap1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/.snapshot/snap1/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> {code}
> However pre-Hadoop 3.0 (when the attribute provider etc was extensively 
> refactored) snapshots did get the provider permissions.
> The reason is this code in FSDirectory.java which ultimately calls the 
> attribute provider and passes the path we want permissions for:
> {code}
>   INodeAttributes getAttributes(INodesInPath iip)
>       throws IOException {
>     INode node = FSDirectory.resolveLastINode(iip);
>     int snapshot = iip.getPathSnapshotId();
>     INodeAttributes nodeAttrs = node.getSnapshotINode(snapshot);
>     UserGroupInformation ugi = NameNode.getRemoteUser();
>     INodeAttributeProvider ap = this.getUserFilteredAttributeProvider(ugi);
>     if (ap != null) {
>       // permission checking sends the full components array including the
>       // first empty component for the root.  however file status
>       // related calls are expected to strip out the root component according
>       // to TestINodeAttributeProvider.
>       byte[][] components = iip.getPathComponents();
>       components = Arrays.copyOfRange(components, 1, components.length);
>       nodeAttrs = ap.getAttributes(components, nodeAttrs);
>     }
>     return nodeAttrs;
>   }
> {code}
> The line:
> {code}
> INode node = FSDirectory.resolveLastINode(iip);
> {code}
> Picks the last resolved inode. If you then call node.getPathComponents 
> for a path like '/data/.snapshot/snap1/tab1', it returns /data/tab1: it 
> resolves the snapshot path to its original location, but it's still the 
> snapshot inode.
> However, the logic passes 'iip.getPathComponents', which returns 
> "/data/.snapshot/snap1/tab1", to the provider.
> The pre-Hadoop 3.0 code passes the inode directly to the provider, and hence 
> the provider only ever sees the path as "/data/tab1".
> It is debatable which path should be passed to the provider in the case of 
> snapshots: /data/.snapshot/snap1/tab1 or /data/tab1. However, as 
> the behaviour has changed, I feel we should ensure the old behaviour is 
> retained.
> It would also be fairly easy to provide a config switch so the provider gets 
> either the full snapshot path or the resolved path.
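The two candidate paths, and the config switch suggested in the report, can be sketched with plain strings. `SnapshotPaths`, `resolve`, and `pathForProvider` are hypothetical helpers; the NameNode itself resolves paths through INodesInPath rather than string manipulation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not NameNode code.
final class SnapshotPaths {
  static final String DOT_SNAPSHOT = ".snapshot";

  // Drops the ".snapshot/<name>" pair, so "/data/.snapshot/snap1/tab1"
  // becomes "/data/tab1". Paths without a snapshot component are unchanged.
  static String resolve(String path) {
    String[] parts = path.split("/");
    List<String> kept = new ArrayList<>();
    for (int i = 0; i < parts.length; i++) {
      if (parts[i].isEmpty()) {
        continue; // leading slash or doubled slashes
      }
      if (parts[i].equals(DOT_SNAPSHOT)) {
        i++; // also skip the snapshot name that follows, if any
        continue;
      }
      kept.add(parts[i]);
    }
    return "/" + String.join("/", kept);
  }

  // Sketch of the proposed switch: hand the provider either form of the path.
  static String pathForProvider(String path, boolean useResolvedPath) {
    return useResolvedPath ? resolve(path) : path;
  }
}
```

With the switch set to the resolved form, a provider that keys its rules on /data/tab1 would match files read through the snapshot as well.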






[jira] [Updated] (HDFS-15350) Set dfs.client.failover.random.order to true as default

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15350:
--
Affects Version/s: 3.4.0

> Set dfs.client.failover.random.order to true as default
> ---
>
> Key: HDFS-15350
> URL: https://issues.apache.org/jira/browse/HDFS-15350
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.4.0
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
> Fix For: 3.4.0
>
>
> {noformat}
> Currently, the default value of dfs.client.failover.random.order is
> false. If it's true, clients access NameNodes in random order instead
> of the configured order defined in hdfs-site.xml.
> Setting dfs.client.failover.random.order=true is very important for
> RBF if there are multiple routers. If it's false, all the clients
> point to the same router because routers are always active.
> I think dfs.client.failover.random.order=true would also be good
> practice for a normal HA (two-NameNode) cluster. If it's false and the
> first NameNode is standby, clients always access the standby NameNode
> first.
> So I'd like to set dfs.client.failover.random.order to true by default
> from 3.4. Does anyone have any concerns?
> {noformat}
> https://lists.apache.org/thread.html/ra79dde30235a1d302ea82120de8829c0aa7d6c0789f4613430610b8a%40%3Chdfs-dev.hadoop.apache.org%3E
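The effect of the setting can be sketched as below. `FailoverOrder` is a hypothetical class and not the HDFS client's failover proxy provider; it only shows configured order versus shuffled order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of how a client could order its failover targets.
final class FailoverOrder {
  // Returns the targets in configured order, or shuffled when randomOrder is true.
  static List<String> order(List<String> configured, boolean randomOrder, Random rng) {
    List<String> targets = new ArrayList<>(configured); // leave the input untouched
    if (randomOrder) {
      Collections.shuffle(targets, rng);
    }
    return targets;
  }
}
```

With randomOrder false, every client starts at the first configured address, which is the pile-up on a single router described above. Shuffling per client spreads the first attempt across all of them.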






[jira] [Updated] (HDFS-15359) EC: Allow closing a file with committed blocks

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15359:
--
Affects Version/s: 3.4.0

> EC: Allow closing a file with committed blocks
> --
>
> Key: HDFS-15359
> URL: https://issues.apache.org/jira/browse/HDFS-15359
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding
>Affects Versions: 3.4.0
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15359-01.patch, HDFS-15359-02.patch, 
> HDFS-15359-03.patch, HDFS-15359-04.patch, HDFS-15359-05.patch
>
>
> Presently, {{dfs.namenode.file.close.num-committed-allowed}} is ignored in 
> the case of EC blocks. But under heavy load, IBRs from the Datanode may get 
> delayed and cause the file write to fail. So we can allow EC files to close 
> with blocks in the committed state, as is done for replicated (REP) files.






[jira] [Updated] (HDFS-15371) Nonstandard characters exist in NameNode.java

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15371:
--
Component/s: namanode

> Nonstandard characters exist in NameNode.java
> -
>
> Key: HDFS-15371
> URL: https://issues.apache.org/jira/browse/HDFS-15371
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Affects Versions: 3.1.0
>Reporter: JiangHua Zhu
>Assignee: Zhao Yi Ming
>Priority: Minor
> Fix For: 3.4.0
>
>
> In NameNode.java, DFS_HA_ZKFC_PORT_KEY has non-standard characters after it.






[jira] [Updated] (HDFS-15415) Reduce locking in Datanode DirectoryScanner

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15415:
--
Hadoop Flags: Reviewed

> Reduce locking in Datanode DirectoryScanner
> ---
>
> Key: HDFS-15415
> URL: https://issues.apache.org/jira/browse/HDFS-15415
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0
>
> Attachments: HDFS-15415.001.patch, HDFS-15415.002.patch, 
> HDFS-15415.003.patch, HDFS-15415.004.patch, HDFS-15415.005.patch, 
> HDFS-15415.branch-3.1.001.patch, HDFS-15415.branch-3.1.002.patch, 
> HDFS-15415.branch-3.2.001.patch, HDFS-15415.branch-3.2.002.patch, 
> HDFS-15415.branch-3.3.001.patch
>
>
> In HDFS-15406, we have a small change to greatly reduce the runtime and 
> locking time of the datanode DirectoryScanner. There may be room for further 
> improvement.
> From the scan step, we have captured a snapshot of what is on disk. After 
> calling `dataset.getFinalizedBlocks(bpid);` we have taken a snapshot of what 
> is in memory. The two snapshots are never 100% in sync as things are always 
> changing as the disk is scanned.
> We are only comparing finalized blocks, so they should not really change:
> * If a block is deleted after our snapshot, our snapshot will not see it and 
> that is OK.
> * A finalized block could be appended. If that happens both the genstamp and 
> length will change, but that should be handled by reconcile when it calls 
> `FSDatasetImpl.checkAndUpdate()`, and there is nothing stopping blocks being 
> appended after they have been scanned from disk, but before they have been 
> compared with memory.
> My suspicion is that we can do all the comparison work outside of the lock 
> and checkAndUpdate() re-checks any differences later under the lock on a 
> block by block basis.
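The idea of comparing outside the lock and re-checking under it can be sketched as follows. `ScanReconciler` is a hypothetical model that maps block ids to lengths; it is not the DirectoryScanner or FsDatasetImpl code, and step 3 simply trusts the scan result, which is a simplification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model: block id -> length, for both the disk scan and memory.
final class ScanReconciler {
  private final ReentrantLock datasetLock = new ReentrantLock();
  private final Map<Long, Long> inMemory = new HashMap<>();

  void putBlock(long id, long length) {
    datasetLock.lock();
    try {
      inMemory.put(id, length);
    } finally {
      datasetLock.unlock();
    }
  }

  Long getBlock(long id) {
    datasetLock.lock();
    try {
      return inMemory.get(id);
    } finally {
      datasetLock.unlock();
    }
  }

  // Returns the ids of the blocks that were actually updated.
  List<Long> reconcile(Map<Long, Long> diskScan) {
    // 1. Hold the lock only long enough to copy the in-memory view.
    Map<Long, Long> memorySnapshot;
    datasetLock.lock();
    try {
      memorySnapshot = new HashMap<>(inMemory);
    } finally {
      datasetLock.unlock();
    }

    // 2. Compare the two snapshots without holding the lock.
    List<Long> suspects = new ArrayList<>();
    for (Map.Entry<Long, Long> e : diskScan.entrySet()) {
      if (!e.getValue().equals(memorySnapshot.get(e.getKey()))) {
        suspects.add(e.getKey());
      }
    }

    // 3. Re-check each suspect under the lock, one block at a time,
    //    because memory may have changed since the snapshot was taken.
    List<Long> updated = new ArrayList<>();
    for (Long id : suspects) {
      datasetLock.lock();
      try {
        Long onDisk = diskScan.get(id);
        if (!onDisk.equals(inMemory.get(id))) {
          inMemory.put(id, onDisk);
          updated.add(id);
        }
      } finally {
        datasetLock.unlock();
      }
    }
    return updated;
  }
}
```

The lock is held for one map copy plus one short critical section per differing block, instead of for the whole comparison.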






[jira] [Updated] (HDFS-15372) Files in snapshots no longer see attribute provider permissions

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15372:
--
Component/s: snapshots

> Files in snapshots no longer see attribute provider permissions
> ---
>
> Key: HDFS-15372
> URL: https://issues.apache.org/jira/browse/HDFS-15372
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15372.001.patch, HDFS-15372.002.patch, 
> HDFS-15372.003.patch, HDFS-15372.004.patch, HDFS-15372.005.patch
>
>
> Given a cluster with an authorization provider configured (e.g. Sentry), 
> where the paths covered by the provider are snapshottable, the way provider 
> permissions and ACLs are applied to files in snapshots changed between the 
> 2.x branch and Hadoop 3.0.
> For example, suppose the snapshottable path /data is managed by Sentry. The 
> ACLs below are provided by Sentry:
> {code}
> hadoop fs -getfacl -R /data
> # file: /data
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::---
> group:flume:rwx
> user:hive:rwx
> group:hive:rwx
> group:testgroup:rwx
> mask::rwx
> other::--x
> /data/tab1
> {code}
> After taking a snapshot, the files in the snapshot do not see the provider 
> permissions:
> {code}
> hadoop fs -getfacl -R /data/.snapshot
> # file: /data/.snapshot
> # owner: 
> # group: 
> user::rwx
> group::rwx
> other::rwx
> # file: /data/.snapshot/snap1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/.snapshot/snap1/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> {code}
> However pre-Hadoop 3.0 (when the attribute provider etc was extensively 
> refactored) snapshots did get the provider permissions.
> The reason is this code in FSDirectory.java which ultimately calls the 
> attribute provider and passes the path we want permissions for:
> {code}
>   INodeAttributes getAttributes(INodesInPath iip)
>       throws IOException {
>     INode node = FSDirectory.resolveLastINode(iip);
>     int snapshot = iip.getPathSnapshotId();
>     INodeAttributes nodeAttrs = node.getSnapshotINode(snapshot);
>     UserGroupInformation ugi = NameNode.getRemoteUser();
>     INodeAttributeProvider ap = this.getUserFilteredAttributeProvider(ugi);
>     if (ap != null) {
>       // permission checking sends the full components array including the
>       // first empty component for the root.  however file status
>       // related calls are expected to strip out the root component according
>       // to TestINodeAttributeProvider.
>       byte[][] components = iip.getPathComponents();
>       components = Arrays.copyOfRange(components, 1, components.length);
>       nodeAttrs = ap.getAttributes(components, nodeAttrs);
>     }
>     return nodeAttrs;
>   }
> {code}
> The line:
> {code}
> INode node = FSDirectory.resolveLastINode(iip);
> {code}
> Picks the last resolved inode. If you then call node.getPathComponents 
> for a path like '/data/.snapshot/snap1/tab1', it returns /data/tab1: it 
> resolves the snapshot path to its original location, but it's still the 
> snapshot inode.
> However, the logic passes 'iip.getPathComponents', which returns 
> "/data/.snapshot/snap1/tab1", to the provider.
> The pre-Hadoop 3.0 code passes the inode directly to the provider, and hence 
> the provider only ever sees the path as "/data/tab1".
> It is debatable which path should be passed to the provider in the case of 
> snapshots: /data/.snapshot/snap1/tab1 or /data/tab1. However, as 
> the behaviour has changed, I feel we should ensure the old behaviour is 
> retained.
> It would also be fairly easy to provide a config switch so the provider gets 
> either the full snapshot path or the resolved path.






[jira] [Updated] (HDFS-15372) Files in snapshots no longer see attribute provider permissions

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15372:
--
Affects Version/s: 3.3.1
   3.4.0

> Files in snapshots no longer see attribute provider permissions
> ---
>
> Key: HDFS-15372
> URL: https://issues.apache.org/jira/browse/HDFS-15372
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15372.001.patch, HDFS-15372.002.patch, 
> HDFS-15372.003.patch, HDFS-15372.004.patch, HDFS-15372.005.patch
>
>
> Given a cluster with an authorization provider configured (e.g. Sentry) where 
> the paths covered by the provider are snapshottable, there was a change in 
> behaviour in how the provider permissions and ACLs are applied to files in 
> snapshots between the 2.x branch and Hadoop 3.0.
> For example, suppose the snapshottable path /data is Sentry managed. The ACLs 
> below are provided by Sentry:
> {code}
> hadoop fs -getfacl -R /data
> # file: /data
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::---
> group:flume:rwx
> user:hive:rwx
> group:hive:rwx
> group:testgroup:rwx
> mask::rwx
> other::--x
> /data/tab1
> {code}
> After taking a snapshot, the files in the snapshot do not see the provider 
> permissions:
> {code}
> hadoop fs -getfacl -R /data/.snapshot
> # file: /data/.snapshot
> # owner: 
> # group: 
> user::rwx
> group::rwx
> other::rwx
> # file: /data/.snapshot/snap1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/.snapshot/snap1/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> {code}
> However pre-Hadoop 3.0 (when the attribute provider etc was extensively 
> refactored) snapshots did get the provider permissions.
> The reason is this code in FSDirectory.java which ultimately calls the 
> attribute provider and passes the path we want permissions for:
> {code}
>   INodeAttributes getAttributes(INodesInPath iip)
>       throws IOException {
>     INode node = FSDirectory.resolveLastINode(iip);
>     int snapshot = iip.getPathSnapshotId();
>     INodeAttributes nodeAttrs = node.getSnapshotINode(snapshot);
>     UserGroupInformation ugi = NameNode.getRemoteUser();
>     INodeAttributeProvider ap = this.getUserFilteredAttributeProvider(ugi);
>     if (ap != null) {
>       // permission checking sends the full components array including the
>       // first empty component for the root.  however file status
>       // related calls are expected to strip out the root component according
>       // to TestINodeAttributeProvider.
>       byte[][] components = iip.getPathComponents();
>       components = Arrays.copyOfRange(components, 1, components.length);
>       nodeAttrs = ap.getAttributes(components, nodeAttrs);
>     }
>     return nodeAttrs;
>   }
> {code}
> The line:
> {code}
> INode node = FSDirectory.resolveLastINode(iip);
> {code}
> picks the last resolved inode. If you then call node.getPathComponents 
> for a path like '/data/.snapshot/snap1/tab1', it returns /data/tab1: it 
> resolves the snapshot path to its original location, but it is still the 
> snapshot inode.
> However, the logic passes 'iip.getPathComponents', which returns 
> "/data/.snapshot/snap1/tab1", to the provider.
> The pre-Hadoop 3.0 code passed the inode directly to the provider, and hence 
> the provider only ever saw the path as "/data/tab1".
> It is debatable which path should be passed to the provider in the case of 
> snapshots: /data/.snapshot/snap1/tab1 or /data/tab1. However, as 
> the behaviour has changed, I feel we should ensure the old behaviour is 
> retained.
> It would also be fairly easy to provide a config switch so the provider gets 
> either the full snapshot path or the resolved path.
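
The two candidate paths discussed above can be illustrated with a small standalone sketch. It does not use any Hadoop classes; SnapshotPathSketch, stripRoot, resolveSnapshotPath and join are hypothetical names, and plain strings stand in for the byte[][] components. It only shows the root component being stripped (as getAttributes does with Arrays.copyOfRange) and what a "resolved" path would look like once the ".snapshot/<name>" pair is dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SnapshotPathSketch {
  // Name of the snapshot marker directory, as seen in the paths above.
  static final String DOT_SNAPSHOT = ".snapshot";

  /** Drops the leading empty root component, like the copyOfRange call in getAttributes. */
  public static String[] stripRoot(String[] components) {
    return Arrays.copyOfRange(components, 1, components.length);
  }

  /** Removes ".snapshot" and the snapshot name that follows it, if present. */
  public static String[] resolveSnapshotPath(String[] components) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < components.length; i++) {
      if (DOT_SNAPSHOT.equals(components[i])) {
        i++; // also skip the snapshot name that follows (e.g. "snap1")
        continue;
      }
      out.add(components[i]);
    }
    return out.toArray(new String[0]);
  }

  /** Joins components back into an absolute path string for display. */
  public static String join(String[] components) {
    return "/" + String.join("/", components);
  }

  public static void main(String[] args) {
    String[] full = {"", "data", ".snapshot", "snap1", "tab1"};
    String[] noRoot = stripRoot(full);
    System.out.println(join(noRoot));                      // full snapshot path
    System.out.println(join(resolveSnapshotPath(noRoot))); // resolved path
  }
}
```

A config switch of the kind suggested above would then amount to choosing which of the two arrays is handed to the provider.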






[jira] [Updated] (HDFS-15418) ViewFileSystemOverloadScheme should represent mount links as non symlinks

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15418:
--
Component/s: hdfs

> ViewFileSystemOverloadScheme should represent mount links as non symlinks
> -
>
> Key: HDFS-15418
> URL: https://issues.apache.org/jira/browse/HDFS-15418
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0
>
>
> Currently ViewFileSystemOverloadScheme uses the default ViewFileSystem 
> behavior: ViewFS always represents mount links as symlinks. With 
> ViewFSOverloadScheme we can have any scheme, and if that scheme's file system 
> does not have symlinks, ViewFS's symlink behavior can be confusing.
> So I propose representing mount links as non-symlinks in 
> ViewFSOverloadScheme.






[jira] [Updated] (HDFS-15418) ViewFileSystemOverloadScheme should represent mount links as non symlinks

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15418:
--
Affects Version/s: 3.4.0

> ViewFileSystemOverloadScheme should represent mount links as non symlinks
> -
>
> Key: HDFS-15418
> URL: https://issues.apache.org/jira/browse/HDFS-15418
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0
>
>
> Currently ViewFileSystemOverloadScheme uses the default ViewFileSystem 
> behavior: ViewFS always represents mount links as symlinks. With 
> ViewFSOverloadScheme we can have any scheme, and if that scheme's file system 
> does not have symlinks, ViewFS's symlink behavior can be confusing.
> So I propose representing mount links as non-symlinks in 
> ViewFSOverloadScheme.






[jira] [Updated] (HDFS-15418) ViewFileSystemOverloadScheme should represent mount links as non symlinks

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15418:
--
Affects Version/s: 3.3.1

> ViewFileSystemOverloadScheme should represent mount links as non symlinks
> -
>
> Key: HDFS-15418
> URL: https://issues.apache.org/jira/browse/HDFS-15418
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.3.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0
>
>
> Currently ViewFileSystemOverloadScheme uses the default ViewFileSystem 
> behavior: ViewFS always represents mount links as symlinks. With 
> ViewFSOverloadScheme we can have any scheme, and if that scheme's file system 
> does not have symlinks, ViewFS's symlink behavior can be confusing.
> So I propose representing mount links as non-symlinks in 
> ViewFSOverloadScheme.






[jira] [Updated] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15422:
--
Affects Version/s: 3.3.1
   3.4.0

> Reported IBR is partially replaced with stored info when queuing.
> -
>
> Key: HDFS-15422
> URL: https://issues.apache.org/jira/browse/HDFS-15422
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Kihwal Lee
>Assignee: Stephen O'Donnell
>Priority: Critical
> Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3
>
> Attachments: HDFS-15422-branch-2.10.001.patch, 
> HDFS-15422-branch-2.10.002.patch, HDFS-15422.001.patch
>
>
> When queueing an IBR (incremental block report) on a standby namenode, some 
> of the reported information is being replaced with the existing stored 
> information.  This can lead to false block corruption.
> We had a namenode that, after transitioning to active, started reporting 
> missing blocks with "SIZE_MISMATCH" as the corrupt reason. These were blocks 
> that had been appended to, and the sizes were actually correct on the 
> datanodes. Upon further investigation, it was determined that the namenode 
> was queueing IBRs with altered information.
> Although it sounds bad, I am not making it blocker 






[jira] [Updated] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15422:
--
Hadoop Flags: Reviewed

> Reported IBR is partially replaced with stored info when queuing.
> -
>
> Key: HDFS-15422
> URL: https://issues.apache.org/jira/browse/HDFS-15422
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Kihwal Lee
>Assignee: Stephen O'Donnell
>Priority: Critical
> Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3
>
> Attachments: HDFS-15422-branch-2.10.001.patch, 
> HDFS-15422-branch-2.10.002.patch, HDFS-15422.001.patch
>
>
> When queueing an IBR (incremental block report) on a standby namenode, some 
> of the reported information is being replaced with the existing stored 
> information.  This can lead to false block corruption.
> We had a namenode that, after transitioning to active, started reporting 
> missing blocks with "SIZE_MISMATCH" as the corrupt reason. These were blocks 
> that had been appended to, and the sizes were actually correct on the 
> datanodes. Upon further investigation, it was determined that the namenode 
> was queueing IBRs with altered information.
> Although it sounds bad, I am not making it blocker 






[jira] [Updated] (HDFS-15429) mkdirs should work when parent dir is internalDir and fallback configured.

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15429:
--
Component/s: hdfs

> mkdirs should work when parent dir is internalDir and fallback configured.
> --
>
> Key: HDFS-15429
> URL: https://issues.apache.org/jira/browse/HDFS-15429
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0
>
>
> mkdir will not work if the parent dir is an internal mount dir (a non-leaf in 
> the mount path) and a fallback is configured.
> Since a fallback is available, if the same tree structure is available in the 
> fallback, we should be able to mkdir in the fallback.






[jira] [Updated] (HDFS-15449) Optionally ignore port number in mount-table name when picking from initialized uri

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15449:
--
Component/s: hdfs

> Optionally ignore port number in mount-table name when picking from 
> initialized uri
> ---
>
> Key: HDFS-15449
> URL: https://issues.apache.org/jira/browse/HDFS-15449
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
>
> Currently the mount-table name is taken from the URI's authority part. This 
> authority part contains IP:port or HOST:port. Some may configure it without a 
> port as well.
> ex: hdfs://ns1 or hdfs://ns1:8020
> It may be a good idea to use only the hostname/IP when users configure the 
> IP:port/HOST:port format, so that we get the same mount-table name in 
> both cases.
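
As a rough illustration of the idea above, the following sketch derives a mount-table name from a URI with java.net.URI only. MountTableNameSketch, mountTableName and the ignorePort switch are hypothetical names for this example; they are not the configuration key or code introduced by the patch.

```java
import java.net.URI;

class MountTableNameSketch {
  /**
   * Returns the mount-table name for a file system URI.
   * ignorePort is a hypothetical switch used only in this sketch.
   */
  public static String mountTableName(URI uri, boolean ignorePort) {
    String authority = uri.getAuthority(); // e.g. "ns1" or "ns1:8020"
    if (!ignorePort || authority == null) {
      return authority;
    }
    // getHost() is null when the authority cannot be parsed as host[:port];
    // fall back to the raw authority in that case.
    String host = uri.getHost();
    return host != null ? host : authority;
  }

  public static void main(String[] args) {
    System.out.println(mountTableName(URI.create("hdfs://ns1"), true));
    System.out.println(mountTableName(URI.create("hdfs://ns1:8020"), true));
    System.out.println(mountTableName(URI.create("hdfs://ns1:8020"), false));
  }
}
```

With the switch on, hdfs://ns1 and hdfs://ns1:8020 map to the same name; with it off, the port remains part of the name.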






[jira] [Updated] (HDFS-15430) create should work when parent dir is internalDir and fallback configured.

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15430:
--
Component/s: hdfs

> create should work when parent dir is internalDir and fallback configured.
> ---
>
> Key: HDFS-15430
> URL: https://issues.apache.org/jira/browse/HDFS-15430
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
>
> create will not work if the parent dir is an internal mount dir (a non-leaf in 
> the mount path) and a fallback is configured.
> Since a fallback is available, if the same tree structure is available in the 
> fallback, we should be able to create in the fallback fs.






[jira] [Updated] (HDFS-15449) Optionally ignore port number in mount-table name when picking from initialized uri

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15449:
--
Affects Version/s: 3.3.1
   3.4.0

> Optionally ignore port number in mount-table name when picking from 
> initialized uri
> ---
>
> Key: HDFS-15449
> URL: https://issues.apache.org/jira/browse/HDFS-15449
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
>
> Currently the mount-table name is taken from the URI's authority part. This 
> authority part contains IP:port or HOST:port. Some may configure it without a 
> port as well.
> ex: hdfs://ns1 or hdfs://ns1:8020
> It may be a good idea to use only the hostname/IP when users configure the 
> IP:port/HOST:port format, so that we get the same mount-table name in 
> both cases.






[jira] [Updated] (HDFS-15462) Add fs.viewfs.overload.scheme.target.ofs.impl to core-default.xml

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15462:
--
Hadoop Flags: Reviewed

> Add fs.viewfs.overload.scheme.target.ofs.impl to core-default.xml
> -
>
> Key: HDFS-15462
> URL: https://issues.apache.org/jira/browse/HDFS-15462
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: configuration, viewfs, viewfsOverloadScheme
>Affects Versions: 3.2.1
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0
>
>
> HDFS-15394 added the existing impls in core-default.xml except ofs. Let's add 
> ofs to core-default here.






[jira] [Updated] (HDFS-15488) Add a command to list all snapshots for a snaphottable root with snapshot Ids

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15488:
--
Hadoop Flags: Reviewed

> Add a command to list all snapshots for a snaphottable root with snapshot Ids
> -
>
> Key: HDFS-15488
> URL: https://issues.apache.org/jira/browse/HDFS-15488
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots
>Affects Versions: 3.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15488.000.patch
>
>
> Currently, the way to list snapshots is to do an ls on  
> /.snapshot directory. Since creation time is not 
> recorded, there is no way to actually figure out the chronological order of 
> snapshots. The idea here is to add a command to list snapshots for a 
> snapshottable directory along with snapshot Ids, which grow monotonically as 
> snapshots are created in the system. With the snapshot Id, it will be 
> possible to figure out the chronology of snapshots in the system.
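
Because the Ids grow monotonically, sorting a listing by snapshot Id recovers creation order even without timestamps. The sketch below assumes a hypothetical Entry type holding an Id and a name; it is not the output format or API of the proposed command.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SnapshotOrderSketch {
  /** Hypothetical holder for one listing entry: snapshot Id plus snapshot name. */
  public static final class Entry {
    final int snapshotId;
    final String name;

    public Entry(int snapshotId, String name) {
      this.snapshotId = snapshotId;
      this.name = name;
    }
  }

  /** Returns snapshot names ordered by ascending Id, i.e. oldest first. */
  public static List<String> namesInCreationOrder(List<Entry> entries) {
    List<Entry> sorted = new ArrayList<>(entries); // leave the caller's list untouched
    sorted.sort(Comparator.comparingInt((Entry e) -> e.snapshotId));
    List<String> names = new ArrayList<>();
    for (Entry e : sorted) {
      names.add(e.name);
    }
    return names;
  }

  public static void main(String[] args) {
    List<Entry> listing = new ArrayList<>();
    listing.add(new Entry(7, "weekly"));
    listing.add(new Entry(2, "adhoc"));
    listing.add(new Entry(5, "monthly"));
    System.out.println(namesInCreationOrder(listing));
  }
}
```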






[jira] [Updated] (HDFS-15488) Add a command to list all snapshots for a snaphottable root with snapshot Ids

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15488:
--
Affects Version/s: 3.4.0

> Add a command to list all snapshots for a snaphottable root with snapshot Ids
> -
>
> Key: HDFS-15488
> URL: https://issues.apache.org/jira/browse/HDFS-15488
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots
>Affects Versions: 3.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15488.000.patch
>
>
> Currently, the way to list snapshots is to do an ls on  
> /.snapshot directory. Since creation time is not 
> recorded, there is no way to actually figure out the chronological order of 
> snapshots. The idea here is to add a command to list snapshots for a 
> snapshottable directory along with snapshot Ids, which grow monotonically as 
> snapshots are created in the system. With the snapshot Id, it will be 
> possible to figure out the chronology of snapshots in the system.






[jira] [Updated] (HDFS-15492) Make trash root inside each snapshottable directory

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15492:
--
Hadoop Flags: Reviewed

> Make trash root inside each snapshottable directory
> ---
>
> Key: HDFS-15492
> URL: https://issues.apache.org/jira/browse/HDFS-15492
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, hdfs-client
>Affects Versions: 3.2.1
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We have seen FSImage corruption cases (e.g. HDFS-13101) where files inside 
> a snapshottable directory are moved outside of it. The most common case 
> of this is when trash is enabled and a user deletes some file via the command 
> line without skipTrash.
> This jira aims to make a trash root for each snapshottable directory, the 
> same as how encryption zones behave at the moment.
> This will make trash cleanup a little more expensive on the NameNode, as 
> it will need to iterate over all trash roots. But it should be fine as long 
> as there aren't many snapshottable directories.
> I could make this improvement optional and disabled by default if 
> needed, with a setting such as {{dfs.namenode.snapshot.trashroot.enabled}}.
> One small caveat applies when disabling (disallowing) snapshots on a 
> snapshottable directory while this improvement is in place: the client should 
> merge the snapshottable directory's trash with that user's trash to ensure 
> proper trash cleanup.






[jira] [Updated] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15493:
--
Hadoop Flags: Reviewed

> Update block map and name cache in parallel while loading fsimage.
> --
>
> Key: HDFS-15493
> URL: https://issues.apache.org/jira/browse/HDFS-15493
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Chengwei Wang
>Assignee: Chengwei Wang
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15493.001.patch, HDFS-15493.002.patch, 
> HDFS-15493.003.patch, HDFS-15493.004.patch, HDFS-15493.005.patch, 
> HDFS-15493.006.patch, HDFS-15493.007.patch, HDFS-15493.008.patch, 
> fsimage-loading.log
>
>
> While loading the INodeDirectorySection of the fsimage, the loader updates 
> the name cache and block map after adding each inode file to its inode 
> directory. Running these steps in parallel would reduce the time cost of 
> fsimage loading.
> In our test case, with patches HDFS-13694 and HDFS-14617 applied, the time 
> cost to load the fsimage (220M files & 240M blocks) is 470s; with this patch, 
> the time cost drops to 410s.
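
The general shape of such a change can be sketched as below. This is not the patch's implementation: ParallelLoadSketch, its two maps and the load method are hypothetical stand-ins, and the real name cache and block map are far more involved. The sketch only shows the loading loop handing the two kinds of update to separate single-threaded executors so that they overlap with each other and with the loop.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ParallelLoadSketch {
  // Simplified stand-ins for the name cache (name -> use count) and block map.
  final Map<String, Integer> nameCache = new ConcurrentHashMap<>();
  final Map<Long, String> blockMap = new ConcurrentHashMap<>();

  /** Loads files, updating both structures on background threads. */
  public void load(String[] fileNames, long[] blockIds) {
    // One single-threaded executor per structure: updates to each map stay
    // in submission order, while the two maps are updated concurrently.
    ExecutorService nameCacheUpdater = Executors.newSingleThreadExecutor();
    ExecutorService blockMapUpdater = Executors.newSingleThreadExecutor();
    for (int i = 0; i < fileNames.length; i++) {
      final String name = fileNames[i];
      final long blockId = blockIds[i];
      // A real loader would add the inode to its parent directory here.
      nameCacheUpdater.execute(() -> nameCache.merge(name, 1, Integer::sum));
      blockMapUpdater.execute(() -> blockMap.put(blockId, name));
    }
    nameCacheUpdater.shutdown();
    blockMapUpdater.shutdown();
    try {
      // Wait for all queued updates before the maps are considered loaded.
      if (!nameCacheUpdater.awaitTermination(1, TimeUnit.MINUTES)
          || !blockMapUpdater.awaitTermination(1, TimeUnit.MINUTES)) {
        throw new IllegalStateException("updates did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for updates", e);
    }
  }

  public static void main(String[] args) {
    ParallelLoadSketch sketch = new ParallelLoadSketch();
    sketch.load(new String[] {"a", "b", "a"}, new long[] {1L, 2L, 3L});
    System.out.println(sketch.nameCache + " " + sketch.blockMap.size());
  }
}
```

For scale, the figures quoted above (470s down to 410s) correspond to a reduction of 60s, or roughly 13% of the loading time.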






[jira] [Updated] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15493:
--
Affects Version/s: 3.3.1
   3.4.0

> Update block map and name cache in parallel while loading fsimage.
> --
>
> Key: HDFS-15493
> URL: https://issues.apache.org/jira/browse/HDFS-15493
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Chengwei Wang
>Assignee: Chengwei Wang
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15493.001.patch, HDFS-15493.002.patch, 
> HDFS-15493.003.patch, HDFS-15493.004.patch, HDFS-15493.005.patch, 
> HDFS-15493.006.patch, HDFS-15493.007.patch, HDFS-15493.008.patch, 
> fsimage-loading.log
>
>
> While loading the INodeDirectorySection of the fsimage, the loader updates 
> the name cache and block map after adding each inode file to its inode 
> directory. Running these steps in parallel would reduce the time cost of 
> fsimage loading.
> In our test case, with patches HDFS-13694 and HDFS-14617 applied, the time 
> cost to load the fsimage (220M files & 240M blocks) is 470s; with this patch, 
> the time cost drops to 410s.






[jira] [Updated] (HDFS-15499) Clean up httpfs/pom.xml to remove aws-java-sdk-s3 exclusion

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15499:
--
Affects Version/s: 3.3.1
   3.4.0

> Clean up httpfs/pom.xml to remove aws-java-sdk-s3 exclusion
> ---
>
> Key: HDFS-15499
> URL: https://issues.apache.org/jira/browse/HDFS-15499
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: httpfs
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Major
> Fix For: 3.1.4, 3.2.2, 2.10.1, 3.3.1, 3.4.0
>
>
> In [HADOOP-14040] we switched to the shaded aws-sdk uber-JAR instead of the 
> s3 jar in hadoop-project/pom.xml. After that, we should also update the 
> httpfs `pom.xml` file to exclude the correct jar dependency.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15506) [JDK 11] Fix javadoc errors in hadoop-hdfs module

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15506:
--
Affects Version/s: 3.3.1
   3.4.0

> [JDK 11] Fix javadoc errors in hadoop-hdfs module
> -
>
> Key: HDFS-15506
> URL: https://issues.apache.org/jira/browse/HDFS-15506
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Xieming Li
>Priority: Major
>  Labels: newbie
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15506.001.patch, HDFS-15506.002.patch
>
>
> {noformat}
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java:43:
>  error: self-closing element not allowed
> [ERROR]  * 
> [ERROR]^
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java:682:
>  error: malformed HTML
> [ERROR]* a NameNode per second. Values <= 0 disable throttling. This 
> affects
> [ERROR]^
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java:1780:
>  error: exception not thrown: java.io.FileNotFoundException
> [ERROR]* @throws FileNotFoundException
> [ERROR]  ^
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectorySnapshottableFeature.java:176:
>  error: @param name not found
> [ERROR]* @param mtime The snapshot creation time set by Time.now().
> [ERROR] ^
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java:2187:
>  error: exception not thrown: java.lang.Exception
> [ERROR]* @exception Exception if the filesystem does not exist.
> [ERROR] ^
> {noformat}
> Full error log: 
> https://gist.github.com/aajisaka/a0c16f0408a623e798dd7df29fbddf82
> How to reproduce the failure:
> * Remove {{true}} from pom.xml
> * Run {{mvn process-sources javadoc:javadoc-no-fork}}






[jira] [Updated] (HDFS-15507) [JDK 11] Fix javadoc errors in hadoop-hdfs-client module

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15507:
--
Affects Version/s: 3.3.1
   3.4.0

> [JDK 11] Fix javadoc errors in hadoop-hdfs-client module
> 
>
> Key: HDFS-15507
> URL: https://issues.apache.org/jira/browse/HDFS-15507
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Xieming Li
>Priority: Major
>  Labels: newbie
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15507.001.patch, HDFS-15507.002.patch
>
>
> {noformat}
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ClientGSIContext.java:32:
>  error: self-closing element not allowed
> [ERROR]  * 
> [ERROR]^
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java:1245:
>  error: unexpected text
> [ERROR]* Same as {@link #create(String, FsPermission, EnumSet, boolean, 
> short, long,
> [ERROR]  ^
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java:161:
>  error: reference not found
> [ERROR]* {@link HdfsConstants#LEASE_HARDLIMIT_PERIOD hard limit}. Until 
> the
> [ERROR] ^
> {noformat}
> Full error log: 
> https://gist.github.com/aajisaka/7ab1c48a9bd7a0fdb11fa82eb04874d5
> How to reproduce the failure:
> * Remove {{true}} from pom.xml
> * Run {{mvn process-sources javadoc:javadoc-no-fork}}






[jira] [Updated] (HDFS-15524) Add edit log entry for Snapshot deletion GC thread snapshot deletion

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15524:
--
Hadoop Flags: Reviewed

> Add edit log entry for Snapshot deletion GC thread snapshot deletion
> 
>
> Key: HDFS-15524
> URL: https://issues.apache.org/jira/browse/HDFS-15524
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots
>Affects Versions: 3.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, the snapshot deletion GC thread doesn't create an edit log 
> transaction when the actual snapshot is garbage collected. In such cases, 
> if the GC thread deletes snapshots and the namenode is then 
> restarted, snapshots that were garbage collected by the GC thread 
> before the restart will reappear until the GC thread picks them up again for 
> garbage collection, because the edits were not captured for the actual 
> garbage collection. At the same time, data might have already been deleted 
> from the datanodes, which may lead to many spurious missing-block alerts.






[jira] [Updated] (HDFS-15508) [JDK 11] Fix javadoc errors in hadoop-hdfs-rbf module

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15508:
--
Affects Version/s: 3.3.1
   3.4.0

> [JDK 11] Fix javadoc errors in hadoop-hdfs-rbf module
> -
>
> Key: HDFS-15508
> URL: https://issues.apache.org/jira/browse/HDFS-15508
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: newbie
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15508.01.patch
>
>
> {noformat}
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/security/token/package-info.java:21:
>  error: reference not found
> [ERROR]  * Implementations should extend {@link 
> AbstractDelegationTokenSecretManager}.
> [ERROR] ^
> {noformat}
> Full error log: 
> https://gist.github.com/aajisaka/a7dde76a4ba2942f60bf6230ec9ed6e1
> How to reproduce the failure:
> * Remove {{true}} from pom.xml
> * Run {{mvn process-sources javadoc:javadoc-no-fork}}






[jira] [Updated] (HDFS-15508) [JDK 11] Fix javadoc errors in hadoop-hdfs-rbf module

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15508:
--
Hadoop Flags: Reviewed

> [JDK 11] Fix javadoc errors in hadoop-hdfs-rbf module
> -
>
> Key: HDFS-15508
> URL: https://issues.apache.org/jira/browse/HDFS-15508
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Akira Ajisaka
>Assignee: Akira Ajisaka
>Priority: Major
>  Labels: newbie
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15508.01.patch
>
>
> {noformat}
> [ERROR] 
> /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/security/token/package-info.java:21:
>  error: reference not found
> [ERROR]  * Implementations should extend {@link 
> AbstractDelegationTokenSecretManager}.
> [ERROR] ^
> {noformat}
> Full error log: 
> https://gist.github.com/aajisaka/a7dde76a4ba2942f60bf6230ec9ed6e1
> How to reproduce the failure:
> * Remove {{true}} from pom.xml
> * Run {{mvn process-sources javadoc:javadoc-no-fork}}






[jira] [Updated] (HDFS-15524) Add edit log entry for Snapshot deletion GC thread snapshot deletion

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15524:
--
Affects Version/s: 3.4.0

> Add edit log entry for Snapshot deletion GC thread snapshot deletion
> 
>
> Key: HDFS-15524
> URL: https://issues.apache.org/jira/browse/HDFS-15524
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots
>Affects Versions: 3.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, the snapshot deletion GC thread doesn't create an edit log 
> transaction when the actual snapshot is garbage collected. As a result, if 
> the GC thread deletes snapshots and the namenode is then restarted, the 
> snapshots that were garbage collected before the restart will reappear until 
> the GC thread picks them up again, because no edits were recorded for the 
> actual garbage collection. Meanwhile, the data might already have been 
> deleted from the datanodes, which may lead to many spurious missing block 
> alerts.
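
The restart behaviour described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class SnapshotGcEditLogSketch, the Op values and the logEdit flag are hypothetical names invented for illustration and are not the HDFS edit log API. It shows that a deletion which is applied in memory but never logged is undone when the state is rebuilt from the log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of a namespace that is rebuilt on restart by replaying an edit log. */
final class SnapshotGcEditLogSketch {
    /** Hypothetical op kinds for illustration; not the real HDFS edit log op codes. */
    enum Op { CREATE_SNAPSHOT, DELETE_SNAPSHOT }

    /** One logged transaction. */
    static final class Edit {
        final Op op;
        final String snapshot;

        Edit(Op op, String snapshot) {
            this.op = op;
            this.snapshot = snapshot;
        }
    }

    private final List<Edit> editLog = new ArrayList<>();
    private final Set<String> liveSnapshots = new TreeSet<>();

    void createSnapshot(String name) {
        liveSnapshots.add(name);
        editLog.add(new Edit(Op.CREATE_SNAPSHOT, name));
    }

    /** The GC always removes the snapshot in memory; logEdit controls whether that is recorded. */
    void gcSnapshot(String name, boolean logEdit) {
        liveSnapshots.remove(name);
        if (logEdit) {
            editLog.add(new Edit(Op.DELETE_SNAPSHOT, name));
        }
    }

    /** In-memory view before any restart. */
    Set<String> liveSnapshots() {
        return new TreeSet<>(liveSnapshots);
    }

    /** Simulates a restart: the state is rebuilt from the edit log alone. */
    Set<String> replayAfterRestart() {
        Set<String> rebuilt = new TreeSet<>();
        for (Edit e : editLog) {
            if (e.op == Op.CREATE_SNAPSHOT) {
                rebuilt.add(e.snapshot);
            } else {
                rebuilt.remove(e.snapshot);
            }
        }
        return rebuilt;
    }
}
```

With logEdit set to false the snapshot is gone from the in-memory view but comes back after replay, which matches the reported symptom; with logEdit set to true the replayed state agrees with the in-memory state.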






[jira] [Updated] (HDFS-15539) When disallowing snapshot on a dir, throw exception if its trash root is not empty

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15539:
--
Hadoop Flags: Reviewed

> When disallowing snapshot on a dir, throw exception if its trash root is not 
> empty
> --
>
> Key: HDFS-15539
> URL: https://issues.apache.org/jira/browse/HDFS-15539
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When snapshot is disallowed on a dir, {{getTrashRoots()}} won't return the 
> trash root in that dir anymore (if any). The risk is the trash root will be 
> left there forever.
> We need to throw an exception there and prompt the user to clean up or rename 
> the trash root if it is not empty.
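
The proposed check can be sketched roughly as follows. This is a self-contained toy model, not the HDFS implementation: DisallowSnapshotSketch and its methods are hypothetical, the trash root is assumed to live at <dir>/.Trash, and an unchecked exception is used only to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy namespace that refuses to disallow snapshots while a trash root still has entries. */
final class DisallowSnapshotSketch {
    /** Directory path -> names of its children. */
    private final Map<String, List<String>> children = new HashMap<>();
    private final Set<String> snapshottable = new HashSet<>();

    void mkdir(String path) {
        children.putIfAbsent(path, new ArrayList<>());
    }

    void addEntry(String dir, String name) {
        mkdir(dir);
        children.get(dir).add(name);
    }

    void allowSnapshot(String dir) {
        mkdir(dir);
        snapshottable.add(dir);
    }

    boolean isSnapshottable(String dir) {
        return snapshottable.contains(dir);
    }

    /** Rejects the request if the (assumed) trash root under dir exists and is not empty. */
    void disallowSnapshot(String dir) {
        String trashRoot = dir + "/.Trash"; // assumed location of the trash root
        List<String> entries = children.get(trashRoot);
        if (entries != null && !entries.isEmpty()) {
            throw new IllegalStateException("Trash root " + trashRoot
                + " is not empty; clean it up or rename it before disallowing snapshot on " + dir);
        }
        snapshottable.remove(dir);
    }
}
```

Failing early keeps the directory snapshottable, so the trash root stays visible until the user has cleaned it up or renamed it.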






[jira] [Updated] (HDFS-15539) When disallowing snapshot on a dir, throw exception if its trash root is not empty

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15539:
--
Affects Version/s: 3.4.0

> When disallowing snapshot on a dir, throw exception if its trash root is not 
> empty
> --
>
> Key: HDFS-15539
> URL: https://issues.apache.org/jira/browse/HDFS-15539
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When snapshot is disallowed on a dir, {{getTrashRoots()}} won't return the 
> trash root in that dir anymore (if any). The risk is the trash root will be 
> left there forever.
> We need to throw an exception there and prompt the user to clean up or rename 
> the trash root if it is not empty.






[jira] [Updated] (HDFS-15542) Add identified snapshot corruption tests for ordered snapshot deletion

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15542:
--
Component/s: test

> Add identified snapshot corruption tests for ordered snapshot deletion
> --
>
> Key: HDFS-15542
> URL: https://issues.apache.org/jira/browse/HDFS-15542
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots, test
>Affects Versions: 3.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HDFS-13101, HDFS-15012 and HDFS-15313 along with HDFS-15470 have fsimage 
> corruption sequences with snapshots. The idea here is to aggregate these 
> unit tests and enable them for the ordered snapshot deletion feature.






[jira] [Updated] (HDFS-15540) Directories protected from delete can still be moved to the trash

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15540:
--
Hadoop Flags: Reviewed

> Directories protected from delete can still be moved to the trash
> -
>
> Key: HDFS-15540
> URL: https://issues.apache.org/jira/browse/HDFS-15540
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15540.001.patch
>
>
> With HDFS-8983, HDFS-14802 and HDFS-15243 we are able to list protected 
> directories which cannot be deleted or renamed, provided the following is set:
> fs.protected.directories: 
> dfs.protected.subdirectories.enable: true
> Testing this feature out, I can see it mostly works fine, but protected 
> non-empty folders can still be moved to the trash. In this example 
> /dir/protected is set in fs.protected.directories, and 
> dfs.protected.subdirectories.enable is true.
> {code}
> hadoop fs -ls -R /dir
> drwxr-xr-x - hdfs supergroup 0 2020-08-26 16:52 /dir/protected
> -rw-r--r-- 3 hdfs supergroup 174 2020-08-26 16:52 /dir/protected/file1
> drwxr-xr-x - hdfs supergroup 0 2020-08-26 16:52 /dir/protected/subdir1
> -rw-r--r-- 3 hdfs supergroup 174 2020-08-26 16:52 /dir/protected/subdir1/file1
> drwxr-xr-x - hdfs supergroup 0 2020-08-26 16:52 /dir/protected/subdir2
> -rw-r--r-- 3 hdfs supergroup 174 2020-08-26 16:52 /dir/protected/subdir2/file1
> [hdfs@7d67ed1af9b0 /]$ hadoop fs -rm -r -f -skipTrash /dir/protected/subdir1
> rm: Cannot delete/rename subdirectory under protected subdirectory 
> /dir/protected
> [hdfs@7d67ed1af9b0 /]$ hadoop fs -mv /dir/protected/subdir1 
> /dir/protected/subdir1-moved
> mv: Cannot delete/rename subdirectory under protected subdirectory 
> /dir/protected
> ** ALL GOOD SO FAR **
> [hdfs@7d67ed1af9b0 /]$ hadoop fs -rm -r -f /dir/protected/subdir1
> 2020-08-26 16:54:32,404 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://nn1/dir/protected/subdir1' to trash at: 
> hdfs://nn1/user/hdfs/.Trash/Current/dir/protected/subdir1
> ** It moved the protected sub-dir to the trash, where it will be deleted **
> ** Checking the top level dir, it is the same **
> [hdfs@7d67ed1af9b0 /]$ hadoop fs -rm -r -f -skipTrash /dir/protected 
> rm: Cannot delete/rename non-empty protected directory /dir/protected
> [hdfs@7d67ed1af9b0 /]$ hadoop fs -mv /dir/protected /dir/protected-new
> mv: Cannot delete/rename non-empty protected directory /dir/protected
> [hdfs@7d67ed1af9b0 /]$ hadoop fs -rm -r -f /dir/protected 
> 2020-08-26 16:55:32,402 INFO fs.TrashPolicyDefault: Moved: 
> 'hdfs://nn1/dir/protected' to trash at: 
> hdfs://nn1/user/hdfs/.Trash/Current/dir/protected1598460932388
> {code}
> The reason for this seems to be that "move to trash" uses a different rename 
> method in FSNamesystem and FSDirRenameOp, which avoids the 
> DFSUtil.checkProtectedDescendants(...) check added in the earlier Jiras.
> I believe that "move to trash" should be protected in the same way as a 
> -skipTrash delete.
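
The idea of guarding both rename paths with the same check can be sketched as below. This is a simplified, hypothetical model: ProtectedRenameSketch, rename and moveToTrash are illustrative names rather than the FSNamesystem or FSDirRenameOp methods, and the sketch ignores the "non-empty" condition that the real protection applies.

```java
import java.util.Set;

/** Toy model in which a plain rename and a move to trash share one protection check. */
final class ProtectedRenameSketch {
    private final Set<String> protectedDirs;
    private final boolean protectSubdirs;

    ProtectedRenameSketch(Set<String> protectedDirs, boolean protectSubdirs) {
        this.protectedDirs = protectedDirs;
        this.protectSubdirs = protectSubdirs;
    }

    /** A path is protected if it is listed, or lies under a listed dir when subdir protection is on. */
    boolean isProtected(String path) {
        for (String dir : protectedDirs) {
            if (path.equals(dir)) {
                return true;
            }
            if (protectSubdirs && path.startsWith(dir + "/")) {
                return true;
            }
        }
        return false;
    }

    private void checkProtected(String src) {
        if (isProtected(src)) {
            throw new IllegalStateException("Cannot delete/rename protected path " + src);
        }
    }

    /** Plain rename, as used by "hadoop fs -mv". */
    String rename(String src, String dst) {
        checkProtected(src);
        return dst;
    }

    /** Move to trash: the same guard runs first, which is the step the report says is skipped. */
    String moveToTrash(String src, String user) {
        checkProtected(src);
        return "/user/" + user + "/.Trash/Current" + src;
    }
}
```

Routing both entry points through one private check makes it harder for a new rename variant to bypass the protection by accident.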






[jira] [Updated] (HDFS-15541) Disallow making a Snapshottable directory unsnapshottable if it has no empty snapshot trash directory

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15541:
--
Fix Version/s: (was: 3.4.0)

> Disallow making a Snapshottable directory unsnapshottable if it has no empty 
> snapshot trash directory
> -
>
> Key: HDFS-15541
> URL: https://issues.apache.org/jira/browse/HDFS-15541
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Siyao Meng
>Priority: Major
>
> If the snapshot trash is enabled, a snapshottable directory should not be 
> allowed to be marked unsnapshottable if it has a non-empty snapshot trash 
> directory.






[jira] [Updated] (HDFS-15542) Add identified snapshot corruption tests for ordered snapshot deletion

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15542:
--
Hadoop Flags: Reviewed

> Add identified snapshot corruption tests for ordered snapshot deletion
> --
>
> Key: HDFS-15542
> URL: https://issues.apache.org/jira/browse/HDFS-15542
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots, test
>Affects Versions: 3.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HDFS-13101, HDFS-15012 and HDFS-15313 along with HDFS-15470 have fsimage 
> corruption sequences with snapshots. The idea here is to aggregate these 
> unit tests and enable them for the ordered snapshot deletion feature.






[jira] [Updated] (HDFS-15545) (S)Webhdfs will not use updated delegation tokens available in the ugi after the old ones expire

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15545:
--
Component/s: webhdfs

> (S)Webhdfs will not use updated delegation tokens available in the ugi after 
> the old ones expire
> 
>
> Key: HDFS-15545
> URL: https://issues.apache.org/jira/browse/HDFS-15545
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.4.0
>Reporter: Issac Buenrostro
>Assignee: Issac Buenrostro
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15545.001.patch, HDFS-15545.002.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> WebHdfsFileSystem can select a delegation token to use from the current user 
> UGI. The token selection is sticky, and WebHdfsFileSystem will re-use it 
> every time without searching the UGI again.
> If the previous token expires, WebHdfsFileSystem will catch the exception and 
> attempt to get a new token. However, the mechanism to get a new token 
> bypasses searching for one on the UGI, so even if there is external logic 
> that has retrieved a new token, it is not possible to make the FileSystem use 
> the new, valid token, rendering the FileSystem object unusable.
> A simple fix would allow WebHdfsFileSystem to re-search the UGI and, if it 
> finds a token different from the cached one, try to use it.
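
The sticky selection and the proposed re-search can be sketched as follows. This is a toy model under stated assumptions: TokenReselectSketch and Token are hypothetical stand-ins for WebHdfsFileSystem and its delegation token, a plain list stands in for the UGI, and expiry is detected by comparing timestamps instead of catching an exception from the server.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a client that caches a token and re-searches the user's tokens once it expires. */
final class TokenReselectSketch {
    /** Minimal stand-in for a delegation token. */
    static final class Token {
        final String id;
        final long expiryMillis;

        Token(String id, long expiryMillis) {
            this.id = id;
            this.expiryMillis = expiryMillis;
        }

        boolean isExpired(long nowMillis) {
            return nowMillis >= expiryMillis;
        }
    }

    /** Stand-in for the tokens held by the current user's UGI. */
    private final List<Token> ugiTokens = new ArrayList<>();
    private Token cached;

    /** External logic may add a fresh token at any time. */
    void addTokenToUgi(Token token) {
        ugiTokens.add(token);
    }

    /** Returns the first token that has not expired, or null if there is none. */
    private Token selectFromUgi(long nowMillis) {
        for (Token t : ugiTokens) {
            if (!t.isExpired(nowMillis)) {
                return t;
            }
        }
        return null;
    }

    /** Sticky while the cached token is valid; re-searches the UGI once it has expired. */
    Token tokenFor(long nowMillis) {
        if (cached == null || cached.isExpired(nowMillis)) {
            Token candidate = selectFromUgi(nowMillis);
            if (candidate != null && candidate != cached) {
                cached = candidate; // switch to the different, still valid token
            }
        }
        return cached;
    }
}
```

The cached token is still preferred while it is valid, so the sticky behaviour is preserved; only an expired or missing token triggers a new search.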






[jira] [Updated] (HDFS-15542) Add identified snapshot corruption tests for ordered snapshot deletion

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15542:
--
Affects Version/s: 3.4.0

> Add identified snapshot corruption tests for ordered snapshot deletion
> --
>
> Key: HDFS-15542
> URL: https://issues.apache.org/jira/browse/HDFS-15542
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots
>Affects Versions: 3.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HDFS-13101, HDFS-15012 and HDFS-15313 along with HDFS-15470 have fsimage 
> corruption sequences with snapshots. The idea here is to aggregate these 
> unit tests and enable them for the ordered snapshot deletion feature.






[jira] [Updated] (HDFS-15545) (S)Webhdfs will not use updated delegation tokens available in the ugi after the old ones expire

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15545:
--
Hadoop Flags: Reviewed
Target Version/s: 3.3.2, 3.4.0  (was: 3.4.0, 3.3.2)

> (S)Webhdfs will not use updated delegation tokens available in the ugi after 
> the old ones expire
> 
>
> Key: HDFS-15545
> URL: https://issues.apache.org/jira/browse/HDFS-15545
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 3.4.0
>Reporter: Issac Buenrostro
>Assignee: Issac Buenrostro
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15545.001.patch, HDFS-15545.002.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> WebHdfsFileSystem can select a delegation token to use from the current user 
> UGI. The token selection is sticky, and WebHdfsFileSystem will re-use it 
> every time without searching the UGI again.
> If the previous token expires, WebHdfsFileSystem will catch the exception and 
> attempt to get a new token. However, the mechanism to get a new token 
> bypasses searching for one on the UGI, so even if there is external logic 
> that has retrieved a new token, it is not possible to make the FileSystem use 
> the new, valid token, rendering the FileSystem object unusable.
> A simple fix would allow WebHdfsFileSystem to re-search the UGI and, if it 
> finds a token different from the cached one, try to use it.






[jira] [Updated] (HDFS-15545) (S)Webhdfs will not use updated delegation tokens available in the ugi after the old ones expire

2024-01-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-15545:
--
Affects Version/s: 3.4.0

> (S)Webhdfs will not use updated delegation tokens available in the ugi after 
> the old ones expire
> 
>
> Key: HDFS-15545
> URL: https://issues.apache.org/jira/browse/HDFS-15545
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Issac Buenrostro
>Assignee: Issac Buenrostro
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15545.001.patch, HDFS-15545.002.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> WebHdfsFileSystem can select a delegation token to use from the current user 
> UGI. The token selection is sticky, and WebHdfsFileSystem will re-use it 
> every time without searching the UGI again.
> If the previous token expires, WebHdfsFileSystem will catch the exception and 
> attempt to get a new token. However, the mechanism to get a new token 
> bypasses searching for one on the UGI, so even if there is external logic 
> that has retrieved a new token, it is not possible to make the FileSystem use 
> the new, valid token, rendering the FileSystem object unusable.
> A simple fix would allow WebHdfsFileSystem to re-search the UGI and, if it 
> finds a token different from the cached one, try to use it.





