[jira] [Updated] (HDFS-17261) RBF: Fix getFileInfo return wrong path when get mountTable path which multi-level
[ https://issues.apache.org/jira/browse/HDFS-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17261:
------------------------------
    Component/s: rbf
    Target Version/s: 3.4.0
    Affects Version/s: 3.4.0

> RBF: Fix getFileInfo return wrong path when get mountTable path which multi-level
> ----------------------------------------------------------------------------------
>
> Key: HDFS-17261
> URL: https://issues.apache.org/jira/browse/HDFS-17261
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: rbf
> Affects Versions: 3.4.0
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0
>
> With DFSRouter, suppose there are two nameservices: ns0 and ns1.
> # Add mountTable /testgetfileinfo/ns1/dir -> (ns1 -> /testgetfileinfo/ns1/dir)
> # An hdfs client accesses a directory via DFSRouter: hdfs dfs -ls -d /testgetfileinfo
> # It returns the wrong path: /testgetfileinfo/testgetfileinfo

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17260) Fix the logic for reconfigure slow peer enable for Namenode.
[ https://issues.apache.org/jira/browse/HDFS-17260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17260:
------------------------------
    Component/s: namanode

> Fix the logic for reconfigure slow peer enable for Namenode.
> -------------------------------------------------------------
>
> Key: HDFS-17260
> URL: https://issues.apache.org/jira/browse/HDFS-17260
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namanode
> Reporter: huangzhaobo
> Assignee: huangzhaobo
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
[jira] [Updated] (HDFS-17263) RBF: Fix client ls trash path cannot get except default nameservices trash path
[ https://issues.apache.org/jira/browse/HDFS-17263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17263:
------------------------------
    Component/s: rbf
    Target Version/s: 3.4.0
    Affects Version/s: 3.4.0

> RBF: Fix client ls trash path cannot get except default nameservices trash path
> --------------------------------------------------------------------------------
>
> Key: HDFS-17263
> URL: https://issues.apache.org/jira/browse/HDFS-17263
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: rbf
> Affects Versions: 3.4.0
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> With HDFS-16024, renaming data to the Trash is based on the src locations. That is great for my usage. After a period of use, I found that this causes an issue.
> There are two nameservices, ns0 and ns1, and ns0 is the default nameservice.
> (1) Add mountTable
> /home/data -> (ns0, /home/data)
> /data1/test1 -> (ns1, /data1/test1)
> /data2/test2 -> (ns1, /data2/test2)
> (2) mv file to trash
> ns0: /user/test-user/.Trash/Current/home/data/file1
> ns1: /user/test-user/.Trash/Current/data1/test1/file1
> (3) A client running ls via DFSRouter will not see /user/test-user/.Trash/Current/data1
> (4) A client running ls on /user/test-user/.Trash/Current/data2/test2 gets an exception.
[jira] [Updated] (HDFS-17262) Fixed the verbose log.warn in DFSUtil.addTransferRateMetric()
[ https://issues.apache.org/jira/browse/HDFS-17262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17262:
------------------------------
    Component/s: logging
    Target Version/s: 3.4.0
    Affects Version/s: 3.4.0

> Fixed the verbose log.warn in DFSUtil.addTransferRateMetric()
> --------------------------------------------------------------
>
> Key: HDFS-17262
> URL: https://issues.apache.org/jira/browse/HDFS-17262
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: logging
> Affects Versions: 3.4.0
> Reporter: Bryan Beaudreault
> Assignee: Ravindra Dingankar
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> HDFS-16917 added a LOG.warn when the passed duration is 0. The unit for duration is millis, and it's very possible for a read to take less than a millisecond when considering a local TCP connection. We are seeing this spam multiple times per millisecond. There's another report on the PR for HDFS-16917.
> Please downgrade to debug or remove the log.
[jira] [Updated] (HDFS-17265) RBF: Throwing an exception prevents the permit from being released when using FairnessPolicyController
[ https://issues.apache.org/jira/browse/HDFS-17265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17265:
------------------------------
    Component/s: rbf
    Target Version/s: 3.4.0
    Affects Version/s: 3.4.0

> RBF: Throwing an exception prevents the permit from being released when using FairnessPolicyController
> -------------------------------------------------------------------------------------------------------
>
> Key: HDFS-17265
> URL: https://issues.apache.org/jira/browse/HDFS-17265
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: rbf
> Affects Versions: 3.4.0
> Reporter: Jian Zhang
> Assignee: Jian Zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-17265.patch
>
> *Bug description*
> When the router uses FairnessPolicyController, each time a request is processed, the permit of the ns corresponding to the request is obtained first {*}(method acquirePermit){*}, and then the information of the namenodes corresponding to the ns is obtained {*}(method getOrderedNamenodes){*}.
> getOrderedNamenodes comes after acquirePermit, so if acquirePermit succeeds but getOrderedNamenodes throws an exception, the permit cannot be released.
>
> *How to reproduce*
> Run the new unit test testReleasedWhenExceptionOccurs in this PR against the original code.
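Archive note on HDFS-17265: the leak described above is the usual acquire-then-fail pattern. The following is a minimal sketch of the general remedy, assuming a plain java.util.concurrent.Semaphore in place of the router's permit pool. The class PermitReleaseSketch and the methods lookupNamenodes and invoke are hypothetical stand-ins, not the router's API or the contents of the attached patch.

```java
import java.util.concurrent.Semaphore;

public class PermitReleaseSketch {
  // Hypothetical stand-in for a per-nameservice permit pool with one permit.
  static final Semaphore PERMITS = new Semaphore(1);

  // Hypothetical stand-in for getOrderedNamenodes: throws when fail is true.
  static String lookupNamenodes(boolean fail) {
    if (fail) {
      throw new IllegalStateException("no namenodes available");
    }
    return "nn0";
  }

  // Acquire the permit first, then run every later step inside try,
  // so that finally returns the permit even when a later step throws.
  static String invoke(boolean fail) {
    if (!PERMITS.tryAcquire()) {
      throw new IllegalStateException("no permit available");
    }
    try {
      return lookupNamenodes(fail);
    } finally {
      PERMITS.release();
    }
  }

  public static void main(String[] args) {
    try {
      invoke(true);
    } catch (IllegalStateException expected) {
      // The lookup failed, but the permit was still returned.
    }
    System.out.println(PERMITS.availablePermits());
  }
}
```

If the lookup were placed between tryAcquire and the try block instead, a failure there would leave the permit count at zero, which is the symptom the issue reports.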
[jira] [Updated] (HDFS-17275) Judge whether the block has been deleted in the block report
[ https://issues.apache.org/jira/browse/HDFS-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17275:
------------------------------
    Component/s: hdfs

> Judge whether the block has been deleted in the block report
> -------------------------------------------------------------
>
> Key: HDFS-17275
> URL: https://issues.apache.org/jira/browse/HDFS-17275
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Affects Versions: 3.4.0
> Reporter: lei w
> Assignee: lei w
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Blocks are now deleted by the asynchronous thread MarkedDeleteBlockScrubber. In a block report, we may do some useless block-related calculations when blocks haven't been added to invalidateBlocks.
[jira] [Updated] (HDFS-17275) Judge whether the block has been deleted in the block report
[ https://issues.apache.org/jira/browse/HDFS-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17275:
------------------------------
    Target Version/s: 3.4.0

> Judge whether the block has been deleted in the block report
> -------------------------------------------------------------
>
> Key: HDFS-17275
> URL: https://issues.apache.org/jira/browse/HDFS-17275
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.4.0
> Reporter: lei w
> Assignee: lei w
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Blocks are now deleted by the asynchronous thread MarkedDeleteBlockScrubber. In a block report, we may do some useless block-related calculations when blocks haven't been added to invalidateBlocks.
[jira] [Updated] (HDFS-17270) RBF: Fix ZKDelegationTokenSecretManagerImpl use closed zookeeper client to get token in some case
[ https://issues.apache.org/jira/browse/HDFS-17270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17270:
------------------------------
    Target Version/s: 3.4.0
    Affects Version/s: 3.4.0

> RBF: Fix ZKDelegationTokenSecretManagerImpl use closed zookeeper client to get token in some case
> --------------------------------------------------------------------------------------------------
>
> Key: HDFS-17270
> URL: https://issues.apache.org/jira/browse/HDFS-17270
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: rbf
> Affects Versions: 3.4.0
> Reporter: lei w
> Assignee: lei w
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: CuratorFrameworkException
>
> We use CuratorFramework to simplify using ZooKeeper in ZKDelegationTokenSecretManagerImpl, and we always hold the same zookeeperClient after ZKDelegationTokenSecretManagerImpl is initialized. But in some cases, such as a network problem, CuratorFramework may close the current zookeeperClient and create a new one. In that case we use a zkclient that has already been closed to get the token. We encountered this situation in our cluster; the exception information is in the attachment.
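Archive note on HDFS-17270: the failure mode above comes from caching a client handle that a framework may later replace. The sketch below illustrates the general idea of asking for the current client on every call instead of holding a reference taken at initialization. It uses no Curator or ZooKeeper classes; Client, CURRENT, reconnect, and readToken are hypothetical names for illustration, and this is not a description of the actual patch.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class FreshClientSketch {
  // Hypothetical client that refuses reads once it has been closed.
  static final class Client {
    final int id;
    volatile boolean closed;

    Client(int id) {
      this.id = id;
    }

    String get(String key) {
      if (closed) {
        throw new IllegalStateException("client closed");
      }
      return key + "@" + id;
    }
  }

  // Hypothetical holder for whichever client is currently live.
  static final AtomicReference<Client> CURRENT = new AtomicReference<>(new Client(1));

  // Simulates the framework closing the old client and creating a new one.
  static void reconnect() {
    Client old = CURRENT.get();
    old.closed = true;
    CURRENT.set(new Client(old.id + 1));
  }

  // Resolves the client at call time through the supplier, never from a cached field.
  static String readToken(Supplier<Client> clients, String key) {
    return clients.get().get(key);
  }

  public static void main(String[] args) {
    Client cached = CURRENT.get();
    reconnect();
    System.out.println(readToken(CURRENT::get, "token"));
    try {
      cached.get("token");
    } catch (IllegalStateException e) {
      System.out.println("cached handle failed: " + e.getMessage());
    }
  }
}
```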
[jira] [Updated] (HDFS-17272) NNThroughputBenchmark should support specifying the base directory for multi-client test
[ https://issues.apache.org/jira/browse/HDFS-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17272:
------------------------------
    Target Version/s: 3.4.0

> NNThroughputBenchmark should support specifying the base directory for multi-client test
> -----------------------------------------------------------------------------------------
>
> Key: HDFS-17272
> URL: https://issues.apache.org/jira/browse/HDFS-17272
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.4.0
> Reporter: caozhiqiang
> Assignee: caozhiqiang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Currently, NNThroughputBenchmark does not support specifying the base directory, and therefore does not support multiple clients performing stress testing at the same time. However, for a high-performance namenode machine, a single client submitting a stress test cannot push the namenode RPC access to its bottleneck. Multiple clients testing in parallel are therefore required to bring the namenode pressure to the level of a large-scale production cluster.
> So I specify the base directory through the -baseDirName parameter to support multiple clients submitting stress tests at the same time.
[jira] [Updated] (HDFS-17277) Delete invalid code logic in namenode format
[ https://issues.apache.org/jira/browse/HDFS-17277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17277:
------------------------------
    Target Version/s: 3.4.0, 3.3.9
    Affects Version/s: 3.4.0
                       3.3.9

> Delete invalid code logic in namenode format
> ---------------------------------------------
>
> Key: HDFS-17277
> URL: https://issues.apache.org/jira/browse/HDFS-17277
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.4.0, 3.3.9
> Reporter: zhangzhanchang
> Assignee: zhangzhanchang
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
> There is invalid logical processing in the namenode format process.
[jira] [Updated] (HDFS-17278) Detect order dependent flakiness in TestViewfsWithNfs3.java under hadoop-hdfs-nfs module
[ https://issues.apache.org/jira/browse/HDFS-17278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17278:
------------------------------
    Target Version/s: 3.4.0
    Affects Version/s: 3.4.0

> Detect order dependent flakiness in TestViewfsWithNfs3.java under hadoop-hdfs-nfs module
> -----------------------------------------------------------------------------------------
>
> Key: HDFS-17278
> URL: https://issues.apache.org/jira/browse/HDFS-17278
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.4.0
> Environment: openjdk version "17.0.9"
> Apache Maven 3.9.5
> Reporter: Ruby
> Assignee: Ruby
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: failed-1.png, failed-2.png, success.png
>
> The order-dependent flakiness is detected when the test class TestDFSClientCache.java runs before TestRpcProgramNfs3.java.
> The error message looks like this:
> {code:java}
> [ERROR] Failures:
> [ERROR] TestRpcProgramNfs3.testAccess:279 Incorrect return code expected:<0> but was:<13>
> [ERROR] TestRpcProgramNfs3.testCommit:764 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testCreate:493 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testEncryptedReadWrite:359->createFileUsingNfs:393 Incorrect response: expected: but was:
> [ERROR] TestRpcProgramNfs3.testFsinfo:714 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testFsstat:696 Incorrect return code: expected:<0> but was:<13>
> [ERROR] TestRpcProgramNfs3.testGetattr:205 Incorrect return code expected:<0> but was:<13>
> [ERROR] TestRpcProgramNfs3.testLookup:249 Incorrect return code expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testMkdir:517 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testPathconf:738 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testRead:341 Incorrect return code: expected:<0> but was:<13>
> [ERROR] TestRpcProgramNfs3.testReaddir:642 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testReaddirplus:666 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testReadlink:297 Incorrect return code: expected:<0> but was:<5>
> [ERROR] TestRpcProgramNfs3.testRemove:570 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testRename:618 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testRmdir:594 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testSetattr:225 Incorrect return code expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testSymlink:546 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testWrite:468 Incorrect return code: expected:<13> but was:<5>
> [INFO]
> [ERROR] Tests run: 25, Failures: 20, Errors: 0, Skipped: 0
> [INFO]
> [ERROR] There are test failures. {code}
> The polluter that led to this flakiness is the test method testGetUserGroupInformationSecure() in TestDFSClientCache.java. It contains the line
> {code:java}
> UserGroupInformation.setLoginUser(currentUserUgi);{code}
> which modifies shared state and resources, similar to pre-setting the config. To fix this issue, I added a cleanup method to TestDFSClientCache.java that resets UserGroupInformation, to ensure isolation between test classes.
> {code:java}
> @AfterClass
> public static void cleanup() {
>   UserGroupInformation.reset();
> }{code}
> The reset includes setting
> {code:java}
> authenticationMethod = null;
> conf = null; // set configuration to null
> setLoginUser(null); // reset login user to default null{code}
> ..., and so on. The reset() method can be found in hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java.
> After the fix, the error no longer occurs and the success output was:
> {code:java}
> [INFO] ---
> [INFO] T E S T S
> [INFO] ---
> [INFO] Running org.apache.hadoop.hdfs.nfs.nfs3.CustomTest
> [INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.457 s - in org.apache.hadoop.hdfs.nfs.nfs3.CustomTest
> [INFO]
> [INFO] Results:
> [INFO]
> [INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0
> [INFO]
> [INFO]
> [INFO] BUILD SUCCESS
> [INFO]
> {code}
> Here is the CustomTest.java file that I used to run these two tests in order, the
[jira] [Updated] (HDFS-17275) Judge whether the block has been deleted in the block report
[ https://issues.apache.org/jira/browse/HDFS-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17275:
------------------------------
    Affects Version/s: 3.4.0

> Judge whether the block has been deleted in the block report
> -------------------------------------------------------------
>
> Key: HDFS-17275
> URL: https://issues.apache.org/jira/browse/HDFS-17275
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.4.0
> Reporter: lei w
> Assignee: lei w
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Blocks are now deleted by the asynchronous thread MarkedDeleteBlockScrubber. In a block report, we may do some useless block-related calculations when blocks haven't been added to invalidateBlocks.
[jira] [Updated] (HDFS-17277) Delete invalid code logic in namenode format
[ https://issues.apache.org/jira/browse/HDFS-17277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17277:
------------------------------
    Component/s: namenode

> Delete invalid code logic in namenode format
> ---------------------------------------------
>
> Key: HDFS-17277
> URL: https://issues.apache.org/jira/browse/HDFS-17277
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.4.0, 3.3.9
> Reporter: zhangzhanchang
> Assignee: zhangzhanchang
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
> There is invalid logical processing in the namenode format process.
[jira] [Updated] (HDFS-17278) Detect order dependent flakiness in TestViewfsWithNfs3.java under hadoop-hdfs-nfs module
[ https://issues.apache.org/jira/browse/HDFS-17278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17278:
------------------------------
    Component/s: nfs
                 test

> Detect order dependent flakiness in TestViewfsWithNfs3.java under hadoop-hdfs-nfs module
> -----------------------------------------------------------------------------------------
>
> Key: HDFS-17278
> URL: https://issues.apache.org/jira/browse/HDFS-17278
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: nfs, test
> Affects Versions: 3.4.0
> Environment: openjdk version "17.0.9"
> Apache Maven 3.9.5
> Reporter: Ruby
> Assignee: Ruby
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: failed-1.png, failed-2.png, success.png
>
> The order-dependent flakiness is detected when the test class TestDFSClientCache.java runs before TestRpcProgramNfs3.java.
> The error message looks like this:
> {code:java}
> [ERROR] Failures:
> [ERROR] TestRpcProgramNfs3.testAccess:279 Incorrect return code expected:<0> but was:<13>
> [ERROR] TestRpcProgramNfs3.testCommit:764 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testCreate:493 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testEncryptedReadWrite:359->createFileUsingNfs:393 Incorrect response: expected: but was:
> [ERROR] TestRpcProgramNfs3.testFsinfo:714 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testFsstat:696 Incorrect return code: expected:<0> but was:<13>
> [ERROR] TestRpcProgramNfs3.testGetattr:205 Incorrect return code expected:<0> but was:<13>
> [ERROR] TestRpcProgramNfs3.testLookup:249 Incorrect return code expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testMkdir:517 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testPathconf:738 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testRead:341 Incorrect return code: expected:<0> but was:<13>
> [ERROR] TestRpcProgramNfs3.testReaddir:642 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testReaddirplus:666 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testReadlink:297 Incorrect return code: expected:<0> but was:<5>
> [ERROR] TestRpcProgramNfs3.testRemove:570 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testRename:618 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testRmdir:594 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testSetattr:225 Incorrect return code expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testSymlink:546 Incorrect return code: expected:<13> but was:<5>
> [ERROR] TestRpcProgramNfs3.testWrite:468 Incorrect return code: expected:<13> but was:<5>
> [INFO]
> [ERROR] Tests run: 25, Failures: 20, Errors: 0, Skipped: 0
> [INFO]
> [ERROR] There are test failures. {code}
> The polluter that led to this flakiness is the test method testGetUserGroupInformationSecure() in TestDFSClientCache.java. It contains the line
> {code:java}
> UserGroupInformation.setLoginUser(currentUserUgi);{code}
> which modifies shared state and resources, similar to pre-setting the config. To fix this issue, I added a cleanup method to TestDFSClientCache.java that resets UserGroupInformation, to ensure isolation between test classes.
> {code:java}
> @AfterClass
> public static void cleanup() {
>   UserGroupInformation.reset();
> }{code}
> The reset includes setting
> {code:java}
> authenticationMethod = null;
> conf = null; // set configuration to null
> setLoginUser(null); // reset login user to default null{code}
> ..., and so on. The reset() method can be found in hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java.
> After the fix, the error no longer occurs and the success output was:
> {code:java}
> [INFO] ---
> [INFO] T E S T S
> [INFO] ---
> [INFO] Running org.apache.hadoop.hdfs.nfs.nfs3.CustomTest
> [INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.457 s - in org.apache.hadoop.hdfs.nfs.nfs3.CustomTest
> [INFO]
> [INFO] Results:
> [INFO]
> [INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0
> [INFO]
> [INFO]
> [INFO] BUILD SUCCESS
> [INFO]
> {code}
> Here is the CustomTest.java file that I used to run these two tests
[jira] [Updated] (HDFS-17279) RBF: Fix link to Fedbalance document
[ https://issues.apache.org/jira/browse/HDFS-17279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17279:
------------------------------
    Target Version/s: 3.4.0

> RBF: Fix link to Fedbalance document
> -------------------------------------
>
> Key: HDFS-17279
> URL: https://issues.apache.org/jira/browse/HDFS-17279
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: rbf
> Affects Versions: 3.4.0
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: screenshot-1.png
>
> !screenshot-1.png!
> Fix the link to the Fedbalance document, which cannot be displayed.
[jira] [Updated] (HDFS-17282) Reconfig 'SlowIoWarningThreshold' parameters for datanode.
[ https://issues.apache.org/jira/browse/HDFS-17282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17282:
------------------------------
    Component/s: datanode

> Reconfig 'SlowIoWarningThreshold' parameters for datanode.
> -----------------------------------------------------------
>
> Key: HDFS-17282
> URL: https://issues.apache.org/jira/browse/HDFS-17282
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.4.0
> Reporter: huangzhaobo
> Assignee: huangzhaobo
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0
[jira] [Updated] (HDFS-17282) Reconfig 'SlowIoWarningThreshold' parameters for datanode.
[ https://issues.apache.org/jira/browse/HDFS-17282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17282:
------------------------------
    Target Version/s: 3.4.0
    Affects Version/s: 3.4.0

> Reconfig 'SlowIoWarningThreshold' parameters for datanode.
> -----------------------------------------------------------
>
> Key: HDFS-17282
> URL: https://issues.apache.org/jira/browse/HDFS-17282
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.4.0
> Reporter: huangzhaobo
> Assignee: huangzhaobo
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0
[jira] [Updated] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
[ https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17294:
------------------------------
    Hadoop Flags: Reviewed

> Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
> ------------------------------------------------------------------------
>
> Key: HDFS-17294
> URL: https://issues.apache.org/jira/browse/HDFS-17294
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: configuration
> Affects Versions: 3.4.0
> Reporter: huangzhaobo
> Assignee: huangzhaobo
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
[jira] [Updated] (HDFS-17279) RBF: Fix link to Fedbalance document
[ https://issues.apache.org/jira/browse/HDFS-17279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17279:
------------------------------
    Affects Version/s: 3.4.0

> RBF: Fix link to Fedbalance document
> -------------------------------------
>
> Key: HDFS-17279
> URL: https://issues.apache.org/jira/browse/HDFS-17279
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: rbf
> Affects Versions: 3.4.0
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: screenshot-1.png
>
> !screenshot-1.png!
> Fix the link to the Fedbalance document, which cannot be displayed.
[jira] [Updated] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
[ https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17294:
------------------------------
    Target Version/s: 3.4.0

> Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
> ------------------------------------------------------------------------
>
> Key: HDFS-17294
> URL: https://issues.apache.org/jira/browse/HDFS-17294
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: configuration
> Affects Versions: 3.4.0
> Reporter: huangzhaobo
> Assignee: huangzhaobo
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
[jira] [Updated] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
[ https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17294:
------------------------------
    Affects Version/s: 3.4.0

> Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
> ------------------------------------------------------------------------
>
> Key: HDFS-17294
> URL: https://issues.apache.org/jira/browse/HDFS-17294
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: configuration
> Affects Versions: 3.4.0
> Reporter: huangzhaobo
> Assignee: huangzhaobo
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
[jira] [Updated] (HDFS-17297) The NameNode should remove block from the BlocksMap if the block is marked as deleted.
[ https://issues.apache.org/jira/browse/HDFS-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17297:
------------------------------
    Hadoop Flags: Reviewed
    Target Version/s: 3.4.0, 3.3.9
    Affects Version/s: 3.4.0
                       3.3.9

> The NameNode should remove block from the BlocksMap if the block is marked as deleted.
> ---------------------------------------------------------------------------------------
>
> Key: HDFS-17297
> URL: https://issues.apache.org/jira/browse/HDFS-17297
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namanode
> Affects Versions: 3.4.0, 3.3.9
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
> When the internalReleaseLease method is called:
> {code:java}
> boolean internalReleaseLease(
> ...
>     int minLocationsNum = 1;
>     if (lastBlock.isStriped()) {
>       minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum();
>     }
>     if (uc.getNumExpectedLocations() < minLocationsNum &&
>         lastBlock.getNumBytes() == 0) {
>       // There is no datanode reported to this block.
>       // may be client have crashed before writing data to pipeline.
>       // This blocks doesn't need any recovery.
>       // We can remove this block and close the file.
>       pendingFile.removeLastBlock(lastBlock);
>       finalizeINodeFileUnderConstruction(src, pendingFile,
>           iip.getLatestSnapshotId(), false);
> ...
> }
> {code}
> If the condition `uc.getNumExpectedLocations() < minLocationsNum && lastBlock.getNumBytes() == 0` is met during the execution of the UNDER_RECOVERY logic, the block is removed from the block list in the inode file and marked as deleted.
> However, it is not removed from the BlocksMap, which may cause a memory leak.
> Therefore it is necessary to remove the block from the BlocksMap at this point as well.
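Archive note on HDFS-17297: the leak described above arises when an entry is dropped from one structure (the file's block list) but stays in a second, global index (the BlocksMap). A minimal sketch of keeping the two in step follows, using plain Java collections. BlockRemovalSketch, FILE_BLOCKS, BLOCKS_MAP, addBlock, and removeLastBlock are hypothetical names for illustration and do not mirror the NameNode's actual classes or the merged change.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BlockRemovalSketch {
  // Hypothetical global index from block ID to block metadata.
  static final Map<Long, String> BLOCKS_MAP = new HashMap<>();
  // Hypothetical per-file ordered block list.
  static final List<Long> FILE_BLOCKS = new ArrayList<>();

  // Registers a block in both the file's list and the global index.
  static void addBlock(long id) {
    FILE_BLOCKS.add(id);
    BLOCKS_MAP.put(id, "block-" + id);
  }

  // Removes the last block from the file and from the global index together,
  // so no orphaned index entry is left behind.
  static void removeLastBlock() {
    if (FILE_BLOCKS.isEmpty()) {
      return;
    }
    Long last = FILE_BLOCKS.remove(FILE_BLOCKS.size() - 1);
    BLOCKS_MAP.remove(last);
  }

  public static void main(String[] args) {
    addBlock(1L);
    addBlock(2L);
    removeLastBlock();
    System.out.println(FILE_BLOCKS.size() + " " + BLOCKS_MAP.size());
  }
}
```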
[jira] [Updated] (HDFS-17294) Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
[ https://issues.apache.org/jira/browse/HDFS-17294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-17294:
------------------------------
    Component/s: configuration

> Reconfigure the scheduling cycle of the slowPeerCollectorDaemon thread.
> ------------------------------------------------------------------------
>
> Key: HDFS-17294
> URL: https://issues.apache.org/jira/browse/HDFS-17294
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: configuration
> Reporter: huangzhaobo
> Assignee: huangzhaobo
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
[jira] [Updated] (HDFS-17298) Fix NPE in DataNode.handleBadBlock and BlockSender
[ https://issues.apache.org/jira/browse/HDFS-17298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-17298: -- Hadoop Flags: Reviewed Target Version/s: 3.4.0, 3.3.9 Affects Version/s: 3.4.0 3.3.9 > Fix NPE in DataNode.handleBadBlock and BlockSender > -- > > Key: HDFS-17298 > URL: https://issues.apache.org/jira/browse/HDFS-17298 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.4.0, 3.3.9 >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.9 > > > There are some NPE issues on the DataNode side of our online environment. > The detailed exception information is > {code:java} > 2023-12-20 13:58:25,449 ERROR datanode.DataNode (DataXceiver.java:run(330)) > [DataXceiver for client DFSClient_NONMAPREDUCE_xxx at /xxx:41452 [Sending > block BP-xxx:blk_xxx]] - xxx:50010:DataXceiver error processing READ_BLOCK > operation src: /xxx:41452 dst: /xxx:50010 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.(BlockSender.java:301) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:607) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:298) > at java.lang.Thread.run(Thread.java:748) > {code} > NPE Code logic: > {code:java} > if (!fromScanner && blockScanner.isEnabled()) { > // data.getVolume(block) is null > blockScanner.markSuspectBlock(data.getVolume(block).getStorageID(), > block); > } > {code} > {code:java} > 2023-12-20 13:52:18,844 ERROR datanode.DataNode (DataXceiver.java:run(330)) > [DataXceiver for client /xxx:61052 [Copying block BP-xxx:blk_xxx]] - > xxx:50010:DataXceiver error processing COPY_BLOCK operation src: /xxx:61052 > dst: /xxx:50010 > 
java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.datanode.DataNode.handleBadBlock(DataNode.java:4045) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.copyBlock(DataXceiver.java:1163) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opCopyBlock(Receiver.java:291) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:113) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:298) > at java.lang.Thread.run(Thread.java:748) > {code} > NPE Code logic: > {code:java} > // Obtain a reference before reading data > volumeRef = datanode.data.getVolume(block).obtainReference(); > //datanode.data.getVolume(block) is null > {code} > We need to fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
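A minimal sketch of one way to guard the two call sites reported in HDFS-17298 is a null check before the volume is dereferenced. This is not the committed patch; `VolumeLookupSketch`, `Volume`, `Dataset`, and `storageIdOrNull` are hypothetical stand-ins for the real DataNode types, assumed only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-ins for the DataNode types named in the report. */
final class VolumeLookupSketch {
  /** Minimal volume carrying only the field the sketch needs. */
  static final class Volume {
    private final String storageId;

    Volume(String storageId) {
      this.storageId = storageId;
    }

    String getStorageID() {
      return storageId;
    }
  }

  /** Maps block IDs to volumes; getVolume returns null for unknown blocks. */
  static final class Dataset {
    private final Map<Long, Volume> volumes = new HashMap<>();

    void add(long blockId, Volume volume) {
      volumes.put(blockId, volume);
    }

    Volume getVolume(long blockId) {
      return volumes.get(blockId);
    }
  }

  /** Returns the storage ID, or null when no volume is associated with the block. */
  static String storageIdOrNull(Dataset data, long blockId) {
    Volume volume = data.getVolume(blockId);
    if (volume == null) {
      // Skip the dereference instead of throwing a NullPointerException.
      return null;
    }
    return volume.getStorageID();
  }
}
```

A caller would then skip `markSuspectBlock` (or fail the read with a descriptive exception) when the lookup returns null; which of the two is appropriate is a design decision left to the actual patch.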
[jira] [Updated] (HDFS-17315) Optimize the namenode format code logic.
[ https://issues.apache.org/jira/browse/HDFS-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-17315: -- Component/s: namenode (was: namanode) > Optimize the namenode format code logic. > > > Key: HDFS-17315 > URL: https://issues.apache.org/jira/browse/HDFS-17315 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.4.0, 3.3.9 >Reporter: huangzhaobo >Assignee: huangzhaobo >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.9 > > > 1. HDFS-17277 (https://issues.apache.org/jira/browse/HDFS-17277) deleted some invalid code, but one line of invalid code still has not been deleted. > 2. Additionally, optimize the resource closure logic by using 'try-with-resources'.
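The 'try-with-resources' form mentioned in item 2 can be illustrated with a small, self-contained sketch. It does not reproduce the namenode format code; `TryWithResourcesSketch` and `TrackedResource` are hypothetical names used only to show that the resource is closed on both the normal and the failing path.

```java
/** Illustrative only: try-with-resources closes the resource on every path. */
final class TryWithResourcesSketch {
  /** Hypothetical resource that records whether close() was called. */
  static final class TrackedResource implements AutoCloseable {
    boolean closed = false;

    @Override
    public void close() {
      closed = true;
    }
  }

  /** Uses the resource and lets the language close it, even if the body throws. */
  static TrackedResource useAndClose(boolean fail) {
    TrackedResource ref = new TrackedResource();
    try (TrackedResource resource = ref) {
      if (fail) {
        throw new IllegalStateException("simulated failure");
      }
    } catch (IllegalStateException e) {
      // close() has already run by the time this handler executes.
    }
    return ref;
  }
}
```

Compared with an explicit `finally` block, this removes the need for manual null checks and repeated `close()` calls.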
[jira] [Updated] (HDFS-17315) Optimize the namenode format code logic.
[ https://issues.apache.org/jira/browse/HDFS-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-17315: -- Hadoop Flags: Reviewed > Optimize the namenode format code logic. > > > Key: HDFS-17315 > URL: https://issues.apache.org/jira/browse/HDFS-17315 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.4.0, 3.3.9 >Reporter: huangzhaobo >Assignee: huangzhaobo >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.9 > > > 1. HDFS-17277 (https://issues.apache.org/jira/browse/HDFS-17277) deleted some invalid code, but one line of invalid code still has not been deleted. > 2. Additionally, optimize the resource closure logic by using 'try-with-resources'.
[jira] [Updated] (HDFS-17301) Add read and write dataXceiver threads count metrics to datanode.
[ https://issues.apache.org/jira/browse/HDFS-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-17301: -- Component/s: datanode > Add read and write dataXceiver threads count metrics to datanode. > - > > Key: HDFS-17301 > URL: https://issues.apache.org/jira/browse/HDFS-17301 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 3.4.0 >Reporter: huangzhaobo >Assignee: huangzhaobo >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > # The DataNodeActiveXeiversCount metric contains the number of threads of all > Op types. > # In most cases, we focus more on the number of read and write dataXceiver > threads, so add read and write dataXceiver threads count metrics to datanode. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
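As a rough illustration of the idea in HDFS-17301, the sketch below keeps separate read and write counters next to the overall count. It is an assumption-laden toy, not the DataNode metrics code: `XceiverCountSketch`, `OpKind`, and the method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-operation thread counters kept alongside the total. */
final class XceiverCountSketch {
  enum OpKind { READ, WRITE, OTHER }

  private final AtomicInteger total = new AtomicInteger();
  private final AtomicInteger read = new AtomicInteger();
  private final AtomicInteger write = new AtomicInteger();

  /** Called when a thread starts serving an operation of the given kind. */
  void threadStarted(OpKind kind) {
    total.incrementAndGet();
    if (kind == OpKind.READ) {
      read.incrementAndGet();
    } else if (kind == OpKind.WRITE) {
      write.incrementAndGet();
    }
  }

  /** Called when the thread finishes; must mirror threadStarted. */
  void threadFinished(OpKind kind) {
    total.decrementAndGet();
    if (kind == OpKind.READ) {
      read.decrementAndGet();
    } else if (kind == OpKind.WRITE) {
      write.decrementAndGet();
    }
  }

  int getTotal() {
    return total.get();
  }

  int getReadCount() {
    return read.get();
  }

  int getWriteCount() {
    return write.get();
  }
}
```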
[jira] [Updated] (HDFS-17301) Add read and write dataXceiver threads count metrics to datanode.
[ https://issues.apache.org/jira/browse/HDFS-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-17301: -- Hadoop Flags: Reviewed Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > Add read and write dataXceiver threads count metrics to datanode. > - > > Key: HDFS-17301 > URL: https://issues.apache.org/jira/browse/HDFS-17301 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 3.4.0 >Reporter: huangzhaobo >Assignee: huangzhaobo >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > # The DataNodeActiveXeiversCount metric contains the number of threads of all > Op types. > # In most cases, we focus more on the number of read and write dataXceiver > threads, so add read and write dataXceiver threads count metrics to datanode. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17315) Optimize the namenode format code logic.
[ https://issues.apache.org/jira/browse/HDFS-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-17315: -- Target Version/s: 3.4.0, 3.3.9 Affects Version/s: 3.4.0 3.3.9 > Optimize the namenode format code logic. > > > Key: HDFS-17315 > URL: https://issues.apache.org/jira/browse/HDFS-17315 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.4.0, 3.3.9 >Reporter: huangzhaobo >Assignee: huangzhaobo >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.9 > > > 1. HDFS-17277 (https://issues.apache.org/jira/browse/HDFS-17277) deleted some invalid code, but one line of invalid code still has not been deleted. > 2. Additionally, optimize the resource closure logic by using 'try-with-resources'.
[jira] [Updated] (HDFS-17315) Optimize the namenode format code logic.
[ https://issues.apache.org/jira/browse/HDFS-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-17315: -- Component/s: namanode > Optimize the namenode format code logic. > > > Key: HDFS-17315 > URL: https://issues.apache.org/jira/browse/HDFS-17315 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namanode >Affects Versions: 3.4.0, 3.3.9 >Reporter: huangzhaobo >Assignee: huangzhaobo >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.9 > > > 1. HDFS-17277 (https://issues.apache.org/jira/browse/HDFS-17277) deleted some invalid code, but one line of invalid code still has not been deleted. > 2. Additionally, optimize the resource closure logic by using 'try-with-resources'.
[jira] [Updated] (HDFS-17317) DebugAdmin metaOut not need multiple close
[ https://issues.apache.org/jira/browse/HDFS-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-17317: -- Target Version/s: 3.4.0 Affects Version/s: 3.4.0 > DebugAdmin metaOut not need multiple close > --- > > Key: HDFS-17317 > URL: https://issues.apache.org/jira/browse/HDFS-17317 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.4.0 >Reporter: xy >Assignee: xy >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > DebugAdmin metaOut does not need to be closed multiple times.
[jira] [Updated] (HDFS-12862) CacheDirective becomes invalid when NN restart or failover
[ https://issues.apache.org/jira/browse/HDFS-12862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-12862: -- Hadoop Flags: Reviewed Environment: (was: ) > CacheDirective becomes invalid when NN restart or failover > -- > > Key: HDFS-12862 > URL: https://issues.apache.org/jira/browse/HDFS-12862 > Project: Hadoop HDFS > Issue Type: Bug > Components: caching, hdfs >Affects Versions: 2.7.1 >Reporter: Wang XL >Assignee: Wang XL >Priority: Major > Labels: patch > Fix For: 3.3.0, 3.2.2 > > Attachments: HDFS-12862-branch-2.7.1.001.patch, > HDFS-12862-trunk.002.patch, HDFS-12862-trunk.003.patch, > HDFS-12862-trunk.004.patch, HDFS-12862.005.patch, HDFS-12862.006.patch, > HDFS-12862.007.patch, HDFS-12862.branch-3.1.patch > > > The logic in FSNDNCacheOp#modifyCacheDirective is not correct. When modifying a cacheDirective, the expiration in the directive may be a relative expiry time, and the EditLog will serialize that relative expiry time. > {code:java} > // Some comments here > static void modifyCacheDirective( > FSNamesystem fsn, CacheManager cacheManager, CacheDirectiveInfo directive, > EnumSet<CacheFlag> flags, boolean logRetryCache) throws IOException { > final FSPermissionChecker pc = getFsPermissionChecker(fsn); > cacheManager.modifyDirective(directive, pc, flags); > fsn.getEditLog().logModifyCacheDirectiveInfo(directive, logRetryCache); > } > {code} > But when the SBN replays the log, it invokes FSImageSerialization#readCacheDirectiveInfo, which reads the value as an absolute expiry time. This results in an inconsistency. 
> {code:java} > public static CacheDirectiveInfo readCacheDirectiveInfo(DataInput in) > throws IOException { > CacheDirectiveInfo.Builder builder = > new CacheDirectiveInfo.Builder(); > builder.setId(readLong(in)); > int flags = in.readInt(); > if ((flags & 0x1) != 0) { > builder.setPath(new Path(readString(in))); > } > if ((flags & 0x2) != 0) { > builder.setReplication(readShort(in)); > } > if ((flags & 0x4) != 0) { > builder.setPool(readString(in)); > } > if ((flags & 0x8) != 0) { > builder.setExpiration( > CacheDirectiveInfo.Expiration.newAbsolute(readLong(in))); > } > if ((flags & ~0xF) != 0) { > throw new IOException("unknown flags set in " + > "ModifyCacheDirectiveInfoOp: " + flags); > } > return builder.build(); > } > {code} > In other words, fsn.getEditLog().logModifyCacheDirectiveInfo(directive, logRetryCache) may serialize a relative expiry time, but builder.setExpiration(CacheDirectiveInfo.Expiration.newAbsolute(readLong(in))) reads it as an absolute expiry time.
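The mismatch described in HDFS-12862 can be reproduced with a toy model. The sketch below assumes that one way to fix it is to convert a relative expiry to an absolute one before it is written to the log; whether the attached patches do exactly this is not stated here. `ExpirySketch`, `Expiry`, `toLoggedValue`, and `readLogged` are hypothetical names, not Hadoop APIs.

```java
/** Toy model of the relative-vs-absolute expiry mismatch. */
final class ExpirySketch {
  /** Hypothetical expiry: either an offset from "now" or an absolute timestamp. */
  static final class Expiry {
    final long millis;
    final boolean relative;

    Expiry(long millis, boolean relative) {
      this.millis = millis;
      this.relative = relative;
    }

    /** Resolves the expiry to an absolute timestamp given the current time. */
    long absoluteMillis(long nowMillis) {
      return relative ? nowMillis + millis : millis;
    }
  }

  /** Value to write to the log: always absolute, so the reader's assumption holds. */
  static long toLoggedValue(Expiry expiry, long nowMillis) {
    return expiry.absoluteMillis(nowMillis);
  }

  /** Mirrors the reader, which treats whatever was logged as absolute. */
  static Expiry readLogged(long loggedMillis) {
    return new Expiry(loggedMillis, false);
  }
}
```

With a relative expiry of 60,000 ms set at time 1,000,000, logging the raw value would make the replaying side believe the directive expired at timestamp 60,000, while logging the converted value preserves the intended 1,060,000.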
[jira] [Updated] (HDFS-12920) HDFS default value change (with adding time unit) breaks old version MR tarball work with Hadoop 3.x
[ https://issues.apache.org/jira/browse/HDFS-12920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-12920: -- Affects Version/s: 3.3.2 3.4.0 > HDFS default value change (with adding time unit) breaks old version MR > tarball work with Hadoop 3.x > > > Key: HDFS-12920 > URL: https://issues.apache.org/jira/browse/HDFS-12920 > Project: Hadoop HDFS > Issue Type: Bug > Components: configuration, hdfs >Affects Versions: 3.4.0, 3.3.2 >Reporter: Junping Du >Assignee: Akira Ajisaka >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Time Spent: 40m > Remaining Estimate: 0h > > After HADOOP-15059 get resolved. I tried to deploy 2.9.0 tar ball with 3.0.0 > RC1, and run the job with following errors: > {noformat} > 2017-12-12 13:29:06,824 INFO [main] > org.apache.hadoop.service.AbstractService: Service > org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.lang.NumberFormatException: For input string: "30s" > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.lang.NumberFormatException: For input string: "30s" > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:542) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:522) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1764) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:522) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:308) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1722) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1719) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1650) > {noformat} > This is because of HDFS-10845: time units were added to hdfs-default.xml, but old-version MR jars cannot recognize them. > This breaks our rolling upgrade story, so it should be marked as a blocker. > A quick workaround is to add the values to hdfs-site.xml with all time units removed. But the right way may be to revert HDFS-10845 (and get rid of noisy warnings).
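The failure mode in HDFS-12920 comes down to parsing a value such as "30s" as a plain number. The sketch below contrasts that with a suffix-aware parser. It is a simplified illustration, not Hadoop's configuration code; `TimeSuffixSketch` and `parseSeconds` are hypothetical, and the set of accepted suffixes is an assumption.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

/** Simplified, hypothetical parser for values such as "30", "30s", or "2m". */
final class TimeSuffixSketch {
  /** Parses a duration and returns it in seconds; bare numbers are taken as seconds. */
  static long parseSeconds(String value) {
    String v = value.trim().toLowerCase(Locale.ROOT);
    // Check the longer suffix first so "ms" is not mistaken for "s".
    if (v.endsWith("ms")) {
      return TimeUnit.MILLISECONDS.toSeconds(number(v, 2));
    }
    if (v.endsWith("s")) {
      return number(v, 1);
    }
    if (v.endsWith("m")) {
      return TimeUnit.MINUTES.toSeconds(number(v, 1));
    }
    if (v.endsWith("h")) {
      return TimeUnit.HOURS.toSeconds(number(v, 1));
    }
    if (v.endsWith("d")) {
      return TimeUnit.DAYS.toSeconds(number(v, 1));
    }
    return Long.parseLong(v);
  }

  /** Parses the numeric part after dropping a suffix of the given length. */
  private static long number(String v, int suffixLength) {
    return Long.parseLong(v.substring(0, v.length() - suffixLength).trim());
  }
}
```

A reader that calls `Long.parseLong("30s")` directly throws `NumberFormatException`, which matches the error in the log above; that is why removing the unit in hdfs-site.xml works as a stopgap for old jars.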
[jira] [Updated] (HDFS-12920) HDFS default value change (with adding time unit) breaks old version MR tarball work with Hadoop 3.x
[ https://issues.apache.org/jira/browse/HDFS-12920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-12920: -- Hadoop Flags: Reviewed Target Version/s: 3.3.2, 3.2.3, 3.4.0 (was: 3.4.0, 3.2.3, 3.3.2) > HDFS default value change (with adding time unit) breaks old version MR > tarball work with Hadoop 3.x > > > Key: HDFS-12920 > URL: https://issues.apache.org/jira/browse/HDFS-12920 > Project: Hadoop HDFS > Issue Type: Bug > Components: configuration, hdfs >Affects Versions: 3.4.0, 3.3.2 >Reporter: Junping Du >Assignee: Akira Ajisaka >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Time Spent: 40m > Remaining Estimate: 0h > > After HADOOP-15059 get resolved. I tried to deploy 2.9.0 tar ball with 3.0.0 > RC1, and run the job with following errors: > {noformat} > 2017-12-12 13:29:06,824 INFO [main] > org.apache.hadoop.service.AbstractService: Service > org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.lang.NumberFormatException: For input string: "30s" > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.lang.NumberFormatException: For input string: "30s" > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:542) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:522) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1764) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:522) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:308) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1722) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1886) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1719) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1650) > {noformat} > This is because of HDFS-10845: time units were added to hdfs-default.xml, but old-version MR jars cannot recognize them. > This breaks our rolling upgrade story, so it should be marked as a blocker. > A quick workaround is to add the values to hdfs-site.xml with all time units removed. But the right way may be to revert HDFS-10845 (and get rid of noisy warnings).
[jira] [Updated] (HDFS-13639) SlotReleaser is not fast enough
[ https://issues.apache.org/jira/browse/HDFS-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-13639: -- Hadoop Flags: Reviewed > SlotReleaser is not fast enough > --- > > Key: HDFS-13639 > URL: https://issues.apache.org/jira/browse/HDFS-13639 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 2.4.0, 2.6.0, 3.0.2 > Environment: 1. YCSB: > {color:#00} recordcount=20 > fieldcount=1 > fieldlength=1000 > operationcount=1000 > > workload=com.yahoo.ycsb.workloads.CoreWorkload > > table=ycsb-test > columnfamily=C > readproportion=1 > updateproportion=0 > insertproportion=0 > scanproportion=0 > > maxscanlength=0 > requestdistribution=zipfian > > # default > readallfields=true > writeallfields=true > scanlengthdistribution=constan{color} > {color:#00}2. datanode:{color} > -Xmx2048m -Xms2048m -Xmn1024m -XX:MaxDirectMemorySize=1024m > -XX:MaxPermSize=256m -Xloggc:$run_dir/stdout/datanode_gc_${start_time}.log > -XX:+DisableExplicitGC -XX:+HeapDumpOnOutOfMemoryError > -XX:HeapDumpPath=$log_dir -XX:+PrintGCApplicationStoppedTime > -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 > -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSParallelRemarkEnabled > -XX:+CMSClassUnloadingEnabled -XX:CMSMaxAbortablePrecleanTime=1 > -XX:+CMSScavengeBeforeRemark -XX:+PrintPromotionFailure > -XX:+CMSConcurrentMTEnabled -XX:+ExplicitGCInvokesConcurrent > -XX:+SafepointTimeout -XX:MonitorBound=16384 -XX:-UseBiasedLocking > -verbose:gc -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps > {color:#00}3. 
regionserver:{color} > {color:#00}-Xmx10g -Xms10g -XX:MaxDirectMemorySize=10g > -XX:MaxGCPauseMillis=150 -XX:MaxTenuringThreshold=2 > -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=5 > -Xloggc:$run_dir/stdout/regionserver_gc_${start_time}.log -Xss256k > -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$log_dir -verbose:gc > -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime > -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy > -XX:+PrintTenuringDistribution -XX:+PrintSafepointStatistics > -XX:PrintSafepointStatisticsCount=1 -XX:PrintFLSStatistics=1 > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=100 -XX:GCLogFileSize=128m > -XX:+SafepointTimeout -XX:MonitorBound=16384 -XX:-UseBiasedLocking > -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=65 > -XX:+ParallelRefProcEnabled -XX:ConcGCThreads=4 -XX:ParallelGCThreads=16 > -XX:G1HeapRegionSize=32m -XX:G1MixedGCCountTarget=64 > -XX:G1OldCSetRegionThresholdPercent=5{color} > {color:#00}block cache is disabled:{color}{color:#00} > hbase.bucketcache.size > 0.9 > {color} > >Reporter: Gang Xie >Assignee: Lisheng Sun >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-13639-2.4.diff, HDFS-13639.001.patch, > HDFS-13639.002.patch, ShortCircuitCache_new_slotReleaser.diff, > perf_after_improve_SlotReleaser.png, perf_before_improve_SlotReleaser.png > > > When test the performance of the ShortCircuit Read of the HDFS with YCSB, we > find that SlotReleaser of the ShortCircuitCache has some performance issue. > The problem is that, the qps of the slot releasing could only reach to 1000+ > while the qps of the slot allocating is ~3000. This means that the replica > info on datanode could not be released in time, which causes a lot of GCs and > finally full GCs. > > The fireflame graph shows that SlotReleaser spends a lot of time to do domain > socket connecting and throw/catching the exception when close the domain > socket and its streams. 
It doesn't make any sense to connect and close each time. Each time we connect to the domain socket, the DataNode allocates a new thread to free the slot. There is a lot of initialization work, and it is costly. We need to reuse the domain socket. > > After switching to reuse the domain socket (see the attached diff), we get a great improvement (see the perf attachments): > # Without reusing the domain socket, the YCSB get qps gets worse and worse, and after about 45 minutes full GC starts. When we reuse the domain socket, no full GC is observed, the stress test finishes smoothly, and the qps of allocating and releasing match. > # Due to the DataNode young GC, the YCSB get qps without the improvement is even lower than with it: ~3700 vs. ~4200.
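The reuse idea in HDFS-13639 can be sketched as a releaser that keeps one connection open and reconnects only after a failure. This is a generic illustration, not the attached diff; `ReusableConnectionSketch`, `Connection`, and `Connector` are hypothetical interfaces standing in for the domain socket and its setup.

```java
import java.io.IOException;

/** Hypothetical releaser that reuses one connection instead of reconnecting per call. */
final class ReusableConnectionSketch {
  /** Stand-in for an open domain socket. */
  interface Connection {
    void send(long slotId) throws IOException;

    void close();
  }

  /** Stand-in for the (costly) connect step. */
  interface Connector {
    Connection connect() throws IOException;
  }

  private final Connector connector;
  private Connection cached;

  ReusableConnectionSketch(Connector connector) {
    this.connector = connector;
  }

  /** Releases a slot, connecting only when no usable connection is cached. */
  synchronized void release(long slotId) throws IOException {
    if (cached == null) {
      cached = connector.connect();
    }
    try {
      cached.send(slotId);
    } catch (IOException e) {
      // Drop the broken connection so the next call reconnects.
      cached.close();
      cached = null;
      throw e;
    }
  }
}
```

Three consecutive releases through this class trigger a single connect, which is the property the report relies on to bring the release rate up to the allocation rate.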
[jira] [Updated] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
[ https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-13671: -- Hadoop Flags: Reviewed > Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet > -- > > Key: HDFS-13671 > URL: https://issues.apache.org/jira/browse/HDFS-13671 > Project: Hadoop HDFS > Issue Type: Bug > Components: namnode >Affects Versions: 3.1.0, 3.0.3 >Reporter: Yiqun Lin >Assignee: Haibin Huang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, > image-2021-06-10-19-28-58-359.png, image-2021-06-18-15-46-46-052.png, > image-2021-06-18-15-47-04-037.png > > Time Spent: 7h 40m > Remaining Estimate: 0h > > NameNode hung when deleting large files/blocks. The stack info: > {code} > "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 > tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000] >java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244) > at > 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > {code} > In the current deletion logic in NameNode, there are mainly two steps: > * Collect INodes and all blocks to be deleted, then delete INodes. > * Remove blocks chunk by chunk in a loop. > Actually the first step should be a more expensive operation and will takes > more time. However, now we always see NN hangs during the remove block > operation. > Looking into this, we introduced a new structure {{FoldedTreeSet}} to have a > better performance in dealing FBR/IBRs. But compared with early > implementation in remove-block logic, {{FoldedTreeSet}} seems more slower > since It will take additional time to balance tree node. When there are large > block to be removed/deleted, it looks bad. > For the get type operations in {{DatanodeStorageInfo}}, we only provide the > {{getBlockIterator}} to return blocks iterator and no other get operation > with specified block. Still we need to use {{FoldedTreeSet}} in > {{DatanodeStorageInfo}}? As we know {{FoldedTreeSet}} is benefit for Get not > Update. Maybe we can revert this to the early implementation. 
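The second deletion step, removing blocks chunk by chunk in a loop, can be sketched as follows. The sketch assumes that a lock is taken per chunk and released between chunks so other work can proceed; that locking detail, along with `ChunkedRemovalSketch` and its method names, is a hypothetical illustration and not the NameNode implementation.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical chunked removal: the lock is held per chunk, not for the whole delete. */
final class ChunkedRemovalSketch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Set<Long> blocks = new HashSet<>();
  private int lockAcquisitions = 0;

  void addBlock(long blockId) {
    blocks.add(blockId);
  }

  int size() {
    return blocks.size();
  }

  int getLockAcquisitions() {
    return lockAcquisitions;
  }

  /** Removes the given blocks in chunks and returns how many were actually removed. */
  int removeInChunks(List<Long> toDelete, int chunkSize) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    int removed = 0;
    for (int start = 0; start < toDelete.size(); start += chunkSize) {
      int end = Math.min(start + chunkSize, toDelete.size());
      lock.lock();
      try {
        lockAcquisitions++;
        for (Long blockId : toDelete.subList(start, end)) {
          if (blocks.remove(blockId)) {
            removed++;
          }
        }
      } finally {
        // Releasing between chunks gives other threads a chance to run.
        lock.unlock();
      }
    }
    return removed;
  }
}
```

The per-block cost inside the loop is what the report attributes to {{FoldedTreeSet}}; the chunking only bounds how long the lock is held, it does not make each removal cheaper.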
[jira] [Updated] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
[ https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-13671: -- Component/s: namnode > Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet > -- > > Key: HDFS-13671 > URL: https://issues.apache.org/jira/browse/HDFS-13671 > Project: Hadoop HDFS > Issue Type: Bug > Components: namnode >Affects Versions: 3.1.0, 3.0.3 >Reporter: Yiqun Lin >Assignee: Haibin Huang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Attachments: HDFS-13671-001.patch, image-2021-06-10-19-28-18-373.png, > image-2021-06-10-19-28-58-359.png, image-2021-06-18-15-46-46-052.png, > image-2021-06-18-15-47-04-037.png > > Time Spent: 7h 40m > Remaining Estimate: 0h > > NameNode hung when deleting large files/blocks. The stack info: > {code} > "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 > tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000] >java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849) > at > org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911) > at > org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813) > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244) > at > 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > {code} > In the current deletion logic in NameNode, there are mainly two steps: > * Collect INodes and all blocks to be deleted, then delete INodes. > * Remove blocks chunk by chunk in a loop. > Actually the first step should be a more expensive operation and will takes > more time. However, now we always see NN hangs during the remove block > operation. > Looking into this, we introduced a new structure {{FoldedTreeSet}} to have a > better performance in dealing FBR/IBRs. But compared with early > implementation in remove-block logic, {{FoldedTreeSet}} seems more slower > since It will take additional time to balance tree node. When there are large > block to be removed/deleted, it looks bad. > For the get type operations in {{DatanodeStorageInfo}}, we only provide the > {{getBlockIterator}} to return blocks iterator and no other get operation > with specified block. Still we need to use {{FoldedTreeSet}} in > {{DatanodeStorageInfo}}? As we know {{FoldedTreeSet}} is benefit for Get not > Update. Maybe we can revert this to the early implementation. 
[jira] [Updated] (HDFS-14013) Skip any credentials stored in HDFS when starting ZKFC
[ https://issues.apache.org/jira/browse/HDFS-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-14013: -- Hadoop Flags: Reviewed > Skip any credentials stored in HDFS when starting ZKFC > -- > > Key: HDFS-14013 > URL: https://issues.apache.org/jira/browse/HDFS-14013 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.1 >Reporter: Krzysztof Adamski >Assignee: Stephen O'Donnell >Priority: Major > Labels: zkfc > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Attachments: HDFS-14013.001.patch, hadoop-hdfs-zkfc-server1.log > > > HADOOP-15157 added the ability to use a jceks credential provider to store > the Zookeeper credentials needed by the Failover Controller to connect to > Zookeeper. > By default, if any provider is specified in > hadoop.security.credential.provider.path it will be checked to see if it > holds the required information, otherwise the traditional way of getting the > the login will be used. > hadoop.security.credential.provider.path can hold a list of credential > providers and if there is an error reading any of them, the exception bubbles > up and causes the ZKFC to fail. The intent of HADOOP-15157 is to have a local > jceks file for the FC credentials, but if there is another provider stored in > HDFS (eg S3A credentials), then it will fail to be read and cause the FC to > fail. > Other components which use credential providers (eg S3A, ABFS etc) explicitly > disallow storing the credentials in the same type of filesystem. Ie, S3A > cannot use providers stored in S3. To avoid this sort of circular dependency, > any such credentials are removed from the list before they are used. > The Failover Controller should do the same, and ensure it does not try to > read any credentials stored in HDFS, as it will never be able to do so until > HDFS is full started. 
> For reference, the stack logged when the FC meets this problem is: > > {code:java} > 2018-10-22 08:17:09,251 FATAL tools.DFSZKFailoverController > (DFSZKFailoverController.java:main(197)) - DFSZKFailOverController exiting > due to earlier exception java.io.IOException: Configuration problem with > provider path. 2018-10-22 08:17:09,252 DEBUG util.ExitUtil > (ExitUtil.java:terminate(209)) - Exiting with status 1: java.io.IOException: > Configuration problem with provider path. 1: java.io.IOException: > Configuration problem with provider path. at > org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:265) at > org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:199) > Caused by: java.io.IOException: Configuration problem with provider path. > at > org.apache.hadoop.conf.Configuration.getPasswordFromCredentialProviders(Configuration.java:2363) > at > org.apache.hadoop.conf.Configuration.getPassword(Configuration.java:2282) > at > org.apache.hadoop.security.SecurityUtil.getZKAuthInfos(SecurityUtil.java:732) > at > org.apache.hadoop.ha.ZKFailoverController.initZK(ZKFailoverController.java:343) > at > org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:194) > at > org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:60) > at > org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:175) > at > org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:171) > at java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:360) at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:480) > at > org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:171) > at > org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:195) > Caused by: > 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): > Operation category READ is not supported in state standby. Visit > https://s.apache.org/sbnn-error at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:88) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1951) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1427) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3100) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1154) > at >
[jira] [Updated] (HDFS-14694) Call recoverLease on DFSOutputStream close exception
[ https://issues.apache.org/jira/browse/HDFS-14694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-14694:
--
    Affects Version/s: 3.4.0

> Call recoverLease on DFSOutputStream close exception
> --
>
> Key: HDFS-14694
> URL: https://issues.apache.org/jira/browse/HDFS-14694
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 3.4.0
> Reporter: Chen Zhang
> Assignee: Lisheng Sun
> Priority: Major
> Fix For: 3.4.0
> Attachments: HDFS-14694.001.patch, HDFS-14694.002.patch, HDFS-14694.003.patch, HDFS-14694.004.patch, HDFS-14694.005.patch, HDFS-14694.006.patch, HDFS-14694.007.patch, HDFS-14694.008.patch, HDFS-14694.009.patch, HDFS-14694.010.patch, HDFS-14694.011.patch, HDFS-14694.012.patch, HDFS-14694.013.patch, HDFS-14694.014.patch
>
> HDFS uses file leases to manage open files. When a file is not closed normally, the NN recovers the lease automatically after the hard limit is exceeded. But for a long-running service (e.g. HBase), the hdfs-client never dies and the NN never gets a chance to recover the file.
> Usually client programs need to handle exceptions themselves to avoid this condition (e.g. HBase automatically calls recoverLease for files that were not closed normally), but in our experience most services (in our company) don't handle this condition properly, which leaves lots of files in an abnormal state or even causes data loss.
> This Jira proposes a feature that calls the recoverLease operation automatically when DFSOutputStream close encounters an exception. It should be disabled by default, but when somebody builds a long-running service based on HDFS, they can enable this option.
> We added this feature to our internal Hadoop distribution more than 3 years ago, and it has been quite useful in our experience.
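The behaviour proposed here can be sketched without any Hadoop dependency. The interfaces Closer and LeaseRecoverer below are hypothetical stand-ins (for an output stream and for a call such as DistributedFileSystem#recoverLease(Path)), and the boolean flag stands in for the opt-in setting the issue mentions; the real patch may be structured quite differently.

```java
import java.io.IOException;

/** Illustrative sketch only; all names here are hypothetical. */
final class CloseWithLeaseRecovery {

    /** Stand-in for an output stream whose close() may fail. */
    interface Closer {
        void close() throws IOException;
    }

    /** Stand-in for a lease-recovery call on the filesystem client. */
    interface LeaseRecoverer {
        boolean recoverLease(String path) throws IOException;
    }

    /**
     * Closes the stream; if close fails and recovery is enabled, asks for
     * lease recovery before rethrowing the original failure.
     */
    static void close(Closer stream, LeaseRecoverer fs, String path,
                      boolean recoverOnFailure) throws IOException {
        try {
            stream.close();
        } catch (IOException closeFailure) {
            if (recoverOnFailure) {
                try {
                    fs.recoverLease(path);
                } catch (IOException recoverFailure) {
                    // Keep the close failure as the primary error.
                    closeFailure.addSuppressed(recoverFailure);
                }
            }
            throw closeFailure;
        }
    }

    public static void main(String[] args) {
        try {
            close(() -> { throw new IOException("simulated close failure"); },
                  path -> { System.out.println("recoverLease(" + path + ")"); return true; },
                  "/tmp/example", true);
        } catch (IOException e) {
            System.out.println("close failed: " + e.getMessage());
        }
    }
}
```

Rethrowing the original exception keeps the caller informed that the write did not complete cleanly, while the recovery request lets the NameNode release the lease without waiting for the hard limit.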
[jira] [Updated] (HDFS-15098) Add SM4 encryption method for HDFS
[ https://issues.apache.org/jira/browse/HDFS-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15098:
--
    Component/s: hdfs

> Add SM4 encryption method for HDFS
> --
>
> Key: HDFS-15098
> URL: https://issues.apache.org/jira/browse/HDFS-15098
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: hdfs
> Affects Versions: 3.4.0
> Reporter: liusheng
> Assignee: liusheng
> Priority: Major
> Labels: pull-request-available, sm4
> Fix For: 3.4.0
> Attachments: HDFS-15098.001.patch, HDFS-15098.002.patch, HDFS-15098.003.patch, HDFS-15098.004.patch, HDFS-15098.005.patch, HDFS-15098.006.patch, HDFS-15098.007.patch, HDFS-15098.008.patch, HDFS-15098.009.patch, image-2020-08-19-16-54-41-341.png
> Time Spent: 40m
> Remaining Estimate: 0h
>
> SM4 (formerly SMS4) is a block cipher used in the Chinese National Standard for Wireless LAN, WAPI (WLAN Authentication and Privacy Infrastructure).
> SM4 was proposed as a cipher for the IEEE 802.11i standard, but has so far been rejected by ISO. One of the reasons for the rejection has been opposition to the WAPI fast-track proposal by the IEEE. Please see:
> [https://en.wikipedia.org/wiki/SM4_(cipher)]
>
> *Use SM4 on HDFS as follows:*
> 1. Configure Hadoop KMS
> 2. Test HDFS SM4:
> hadoop key create key1 -cipher 'SM4/CTR/NoPadding'
> hdfs dfs -mkdir /benchmarks
> hdfs crypto -createZone -keyName key1 -path /benchmarks
> *Requires:*
> 1. openssl version >= 1.1.1
[jira] [Updated] (HDFS-15160) ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
[ https://issues.apache.org/jira/browse/HDFS-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15160:
--
    Hadoop Flags: Reviewed
    Target Version/s: 3.2.3, 2.10.2, 3.4.0 (was: 3.4.0, 2.10.2, 3.2.3)

> ReplicaMap, Disk Balancer, Directory Scanner and various FsDatasetImpl methods should use datanode readlock
> --
>
> Key: HDFS-15160
> URL: https://issues.apache.org/jira/browse/HDFS-15160
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.3.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
> Attachments: HDFS-15160-branch-3.3-001.patch, HDFS-15160.001.patch, HDFS-15160.002.patch, HDFS-15160.003.patch, HDFS-15160.004.patch, HDFS-15160.005.patch, HDFS-15160.006.patch, HDFS-15160.007.patch, HDFS-15160.008.patch, HDFS-15160.branch-3-3.001.patch, image-2020-04-10-17-18-08-128.png, image-2020-04-10-17-18-55-938.png
> Time Spent: 7h 20m
> Remaining Estimate: 0h
>
> Now that we have HDFS-15150, we can start to move some DN operations to use the read lock rather than the write lock to improve concurrency. The first step is to make the changes to ReplicaMap, as many other methods make calls to it.
> This Jira switches read operations against the volume map to use the readLock rather than the write lock.
> Additionally, some methods make a call to replicaMap.replicas() (e.g. getBlockReports, getFinalizedBlocks, deepCopyReplica) and only use the result in a read-only fashion, so they can also be switched to using a readLock.
> Next are the directory scanner and disk balancer, which only require a read lock.
> Finally (for this Jira) are various "low hanging fruit" items in BlockSender and FsDatasetImpl where it is fairly obvious they only need a read lock.
> For now, I have avoided changing anything which looks too risky, as I think it's better to do any larger refactoring or risky changes each in their own Jira.
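The read-lock versus write-lock split discussed in this issue follows the usual java.util.concurrent pattern. The sketch below is a generic illustration with a hypothetical ReplicaIndex class; it is not the DataNode's ReplicaMap and makes no claim about how the patch organises its locks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative sketch only; ReplicaIndex is a hypothetical class. */
final class ReplicaIndex {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<Long, String> replicas = new HashMap<>();

    /** Mutations take the exclusive write lock. */
    void add(long blockId, String replica) {
        lock.writeLock().lock();
        try {
            replicas.put(blockId, replica);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Lookups only need the shared read lock, so many readers can proceed at once. */
    String get(long blockId) {
        lock.readLock().lock();
        try {
            return replicas.get(blockId);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Read-only scans (for example, building a report) also use the read lock. */
    int size() {
        lock.readLock().lock();
        try {
            return replicas.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        ReplicaIndex index = new ReplicaIndex();
        index.add(1L, "replica-1");
        System.out.println(index.get(1L) + " of " + index.size());
    }
}
```

Readers block only while a writer holds the lock, which is why moving read-only callers off the write lock can improve concurrency.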
[jira] [Updated] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error
[ https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15240:
--
    Hadoop Flags: Reviewed
    Target Version/s: 3.3.1, 3.2.2, 3.4.0 (was: 3.2.2, 3.3.1, 3.4.0)

> Erasure Coding: dirty buffer causes reconstruction block error
> --
>
> Key: HDFS-15240
> URL: https://issues.apache.org/jira/browse/HDFS-15240
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, erasure-coding
> Affects Versions: 3.3.1, 3.4.0
> Reporter: HuangTao
> Assignee: HuangTao
> Priority: Blocker
> Fix For: 3.2.2, 3.3.1, 3.4.0
> Attachments: HDFS-15240-branch-3.1-001.patch, HDFS-15240-branch-3.1.001.patch, HDFS-15240-branch-3.2.001.patch, HDFS-15240-branch-3.3-001.patch, HDFS-15240-branch-3.3.001.patch, HDFS-15240.001.patch, HDFS-15240.002.patch, HDFS-15240.003.patch, HDFS-15240.004.patch, HDFS-15240.005.patch, HDFS-15240.006.patch, HDFS-15240.007.patch, HDFS-15240.008.patch, HDFS-15240.009.patch, HDFS-15240.010.patch, HDFS-15240.011.patch, HDFS-15240.012.patch, HDFS-15240.013.patch, image-2020-07-16-15-56-38-608.png, org.apache.hadoop.hdfs.TestReconstructStripedFile-output.txt, org.apache.hadoop.hdfs.TestReconstructStripedFile.txt, test-HDFS-15240.006.patch
>
> # When reading some lzo files we found that some blocks were broken.
> I read back all internal blocks (b0-b8) of the block group (RS-6-3-1024k) from the DN directly, and chose 6 blocks (b0-b5) to decode the other 3 blocks (b6', b7', b8'). Then I found the longest common sequence (LCS) between b6' (decoded) and b6 (read from the DN), and likewise for b7'/b7 and b8'/b8.
> After selecting 6 blocks of the block group, one combination at a time, and iterating through all cases, I found one case where the length of the LCS is the block length - 64KB; 64KB is exactly the length of the ByteBuffer used by StripedBlockReader. So the corrupt reconstructed block was produced by a dirty buffer.
> The following log snippet (showing only 2 of 28 cases) is the output of my check program. In my case, I know block 3 is corrupt, so I need 5 other blocks to decode another 3 blocks, and then find that the LCS substring of block 1 is block length - 64KB.
> It means blocks (0,1,2,4,5,6) were used to reconstruct block 3, and the dirty buffer was used before reading block 1.
> Note that StripedBlockReader read from offset 0 of block 1 after using the dirty buffer.
> EDITED for readability.
> {code:java}
> decode from block[0, 2, 3, 4, 5, 7] to generate block[1', 6', 8']
> Check the first 131072 bytes between block[1] and block[1'], the longest common substring length is 4
> Check the first 131072 bytes between block[6] and block[6'], the longest common substring length is 4
> Check the first 131072 bytes between block[8] and block[8'], the longest common substring length is 4
> decode from block[0, 2, 3, 4, 5, 6] to generate block[1', 7', 8']
> Check the first 131072 bytes between block[1] and block[1'], the longest common substring length is 65536
> CHECK AGAIN: all 27262976 bytes between block[1] and block[1'], the longest common substring length is 27197440 # this one
> Check the first 131072 bytes between block[7] and block[7'], the longest common substring length is 4
> Check the first 131072 bytes between block[8] and block[8'], the longest common substring length is 4{code}
> Now I know the dirty buffer causes the reconstruction block error, but how does the dirty buffer come about?
> After digging into the code and the DN log, I found that the following DN log shows the root cause.
> {code:java}
> [INFO] [stripedRead-1017] : Interrupted while waiting for IO on channel java.nio.channels.SocketChannel[connected local=/:52586 remote=/:50010]. 18 millis timeout left.
> [WARN] [StripedBlockReconstruction-199] : Failed to reconstruct striped > block: BP-714356632--1519726836856:blk_-YY_3472979393 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.util.StripedBlockUtil.getNextCompletedStripedRead(StripedBlockUtil.java:314) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.doReadMinimumSources(StripedReader.java:308) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.readMinimumSources(StripedReader.java:269) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:94) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at
[jira] [Updated] (HDFS-15240) Erasure Coding: dirty buffer causes reconstruction block error
[ https://issues.apache.org/jira/browse/HDFS-15240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15240:
--
    Affects Version/s: 3.3.1 3.4.0

> Erasure Coding: dirty buffer causes reconstruction block error
> --
>
> Key: HDFS-15240
> URL: https://issues.apache.org/jira/browse/HDFS-15240
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, erasure-coding
> Affects Versions: 3.3.1, 3.4.0
> Reporter: HuangTao
> Assignee: HuangTao
> Priority: Blocker
> Fix For: 3.2.2, 3.3.1, 3.4.0
> Attachments: HDFS-15240-branch-3.1-001.patch, HDFS-15240-branch-3.1.001.patch, HDFS-15240-branch-3.2.001.patch, HDFS-15240-branch-3.3-001.patch, HDFS-15240-branch-3.3.001.patch, HDFS-15240.001.patch, HDFS-15240.002.patch, HDFS-15240.003.patch, HDFS-15240.004.patch, HDFS-15240.005.patch, HDFS-15240.006.patch, HDFS-15240.007.patch, HDFS-15240.008.patch, HDFS-15240.009.patch, HDFS-15240.010.patch, HDFS-15240.011.patch, HDFS-15240.012.patch, HDFS-15240.013.patch, image-2020-07-16-15-56-38-608.png, org.apache.hadoop.hdfs.TestReconstructStripedFile-output.txt, org.apache.hadoop.hdfs.TestReconstructStripedFile.txt, test-HDFS-15240.006.patch
>
> # When reading some lzo files we found that some blocks were broken.
> I read back all internal blocks (b0-b8) of the block group (RS-6-3-1024k) from the DN directly, and chose 6 blocks (b0-b5) to decode the other 3 blocks (b6', b7', b8'). Then I found the longest common sequence (LCS) between b6' (decoded) and b6 (read from the DN), and likewise for b7'/b7 and b8'/b8.
> After selecting 6 blocks of the block group, one combination at a time, and iterating through all cases, I found one case where the length of the LCS is the block length - 64KB; 64KB is exactly the length of the ByteBuffer used by StripedBlockReader. So the corrupt reconstructed block was produced by a dirty buffer.
> The following log snippet (showing only 2 of 28 cases) is the output of my check program. In my case, I know block 3 is corrupt, so I need 5 other blocks to decode another 3 blocks, and then find that the LCS substring of block 1 is block length - 64KB.
> It means blocks (0,1,2,4,5,6) were used to reconstruct block 3, and the dirty buffer was used before reading block 1.
> Note that StripedBlockReader read from offset 0 of block 1 after using the dirty buffer.
> EDITED for readability.
> {code:java}
> decode from block[0, 2, 3, 4, 5, 7] to generate block[1', 6', 8']
> Check the first 131072 bytes between block[1] and block[1'], the longest common substring length is 4
> Check the first 131072 bytes between block[6] and block[6'], the longest common substring length is 4
> Check the first 131072 bytes between block[8] and block[8'], the longest common substring length is 4
> decode from block[0, 2, 3, 4, 5, 6] to generate block[1', 7', 8']
> Check the first 131072 bytes between block[1] and block[1'], the longest common substring length is 65536
> CHECK AGAIN: all 27262976 bytes between block[1] and block[1'], the longest common substring length is 27197440 # this one
> Check the first 131072 bytes between block[7] and block[7'], the longest common substring length is 4
> Check the first 131072 bytes between block[8] and block[8'], the longest common substring length is 4{code}
> Now I know the dirty buffer causes the reconstruction block error, but how does the dirty buffer come about?
> After digging into the code and the DN log, I found that the following DN log shows the root cause.
> {code:java}
> [INFO] [stripedRead-1017] : Interrupted while waiting for IO on channel java.nio.channels.SocketChannel[connected local=/:52586 remote=/:50010]. 18 millis timeout left.
> [WARN] [StripedBlockReconstruction-199] : Failed to reconstruct striped > block: BP-714356632--1519726836856:blk_-YY_3472979393 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.util.StripedBlockUtil.getNextCompletedStripedRead(StripedBlockUtil.java:314) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.doReadMinimumSources(StripedReader.java:308) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedReader.readMinimumSources(StripedReader.java:269) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.reconstruct(StripedBlockReconstructor.java:94) > at > org.apache.hadoop.hdfs.server.datanode.erasurecode.StripedBlockReconstructor.run(StripedBlockReconstructor.java:60) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at >
[jira] [Updated] (HDFS-15253) Set default throttle value on dfs.image.transfer.bandwidthPerSec
[ https://issues.apache.org/jira/browse/HDFS-15253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15253:
--
    Affects Version/s: 3.3.1 3.4.0

> Set default throttle value on dfs.image.transfer.bandwidthPerSec
> --
>
> Key: HDFS-15253
> URL: https://issues.apache.org/jira/browse/HDFS-15253
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.3.1, 3.4.0
> Reporter: Karthik Palanisamy
> Assignee: Karthik Palanisamy
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
> Time Spent: 40m
> Remaining Estimate: 0h
>
> The default value of dfs.image.transfer.bandwidthPerSec is 0, so fsimage transfers during checkpoint can use the maximum available bandwidth. I think we should throttle this. Many users have experienced namenode failover when transferring a large image (e.g. >25Gb) along with fsimage replication on dfs.namenode.name.dir.
> Thought to set:
> dfs.image.transfer.bandwidthPerSec=52428800 (50 MB/s)
> dfs.namenode.checkpoint.txns=200 (Default is 1M; good to avoid frequent checkpoints. However, the default checkpoint runs once every 6 hours)
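The arithmetic behind the proposed value can be checked with a few lines of Java. The class ImageTransferEstimate is hypothetical, and the sketch assumes binary units (1 MB = 1024 * 1024 bytes) and reads the ">25Gb" image size in the description as 25 GiB; it only estimates raw transfer time and ignores everything else a checkpoint does.

```java
/** Illustrative arithmetic only; the class name is hypothetical. */
final class ImageTransferEstimate {

    /** Converts a rate in MB/s (binary megabytes) to bytes per second. */
    static long bytesPerSecond(long megabytesPerSecond) {
        return megabytesPerSecond * 1024L * 1024L;
    }

    /** Seconds needed to move imageBytes at the given throttle, rounded up. */
    static long transferSeconds(long imageBytes, long bandwidthBytesPerSec) {
        if (bandwidthBytesPerSec <= 0) {
            // Per the description, 0 means "unthrottled", so no estimate is possible.
            throw new IllegalArgumentException("bandwidth must be positive");
        }
        return (imageBytes + bandwidthBytesPerSec - 1) / bandwidthBytesPerSec;
    }

    public static void main(String[] args) {
        long throttle = bytesPerSecond(50);            // the value proposed in the issue
        long image = 25L * 1024L * 1024L * 1024L;      // assumed 25 GiB fsimage
        System.out.println(throttle + " bytes/s, " + transferSeconds(image, throttle) + " s");
    }
}
```

Under these assumptions, 50 MB/s corresponds to the 52428800 quoted above, and a 25 GiB image would take roughly eight and a half minutes to transfer at that rate.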
[jira] [Updated] (HDFS-15255) Consider StorageType when DatanodeManager#sortLocatedBlock()
[ https://issues.apache.org/jira/browse/HDFS-15255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15255:
--
    Component/s: hdfs

> Consider StorageType when DatanodeManager#sortLocatedBlock()
> --
>
> Key: HDFS-15255
> URL: https://issues.apache.org/jira/browse/HDFS-15255
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Affects Versions: 3.3.1, 3.4.0
> Reporter: Lisheng Sun
> Assignee: Lisheng Sun
> Priority: Major
> Fix For: 3.3.1, 3.4.0
> Attachments: HDFS-15255-findbugs-test.001.patch, HDFS-15255.001.patch, HDFS-15255.002.patch, HDFS-15255.003.patch, HDFS-15255.004.patch, HDFS-15255.005.patch, HDFS-15255.006.patch, HDFS-15255.007.patch, HDFS-15255.008.patch, HDFS-15255.009.patch, HDFS-15255.010.patch, experiment-find-bugs.001.patch
>
> Consider the case where only one replica of a block is on SSD and the others are on HDD.
> When the client reads the data, the current logic considers only the distance between the client and the DN. I think it should also consider the StorageType of the replica, giving priority to the node with the faster StorageType when the distance is the same.
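The ordering proposed here (distance first, storage speed as the tie-breaker) can be expressed as a two-level comparator. Everything in this sketch is hypothetical: the Replica class, the Media enum and its fastest-first ordering are illustrative stand-ins, not the DatanodeManager code or Hadoop's StorageType.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative sketch only; all types here are hypothetical. */
final class ReplicaOrdering {

    /** Storage media, declared fastest first so enum order doubles as speed rank. */
    enum Media { RAM_DISK, SSD, DISK, ARCHIVE }

    static final class Replica {
        final String node;
        final int distance;
        final Media media;

        Replica(String node, int distance, Media media) {
            this.node = node;
            this.distance = distance;
            this.media = media;
        }
    }

    /** Closer nodes first; among equally distant nodes, faster media first. */
    static final Comparator<Replica> BY_DISTANCE_THEN_MEDIA =
            Comparator.comparingInt((Replica r) -> r.distance)
                      .thenComparing(r -> r.media);

    /** Returns node names in the order a reader would try them. */
    static List<String> sortedNodes(List<Replica> replicas) {
        List<Replica> copy = new ArrayList<>(replicas);
        copy.sort(BY_DISTANCE_THEN_MEDIA);
        List<String> nodes = new ArrayList<>();
        for (Replica r : copy) {
            nodes.add(r.node);
        }
        return nodes;
    }

    public static void main(String[] args) {
        List<Replica> replicas = new ArrayList<>();
        replicas.add(new Replica("dn1", 2, Media.DISK));
        replicas.add(new Replica("dn2", 2, Media.SSD));
        replicas.add(new Replica("dn3", 0, Media.DISK));
        System.out.println(sortedNodes(replicas));
    }
}
```

Because distance remains the primary key, a local HDD replica still wins over a remote SSD replica; storage type only decides between nodes at the same distance, which matches the proposal.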
[jira] [Updated] (HDFS-15253) Set default throttle value on dfs.image.transfer.bandwidthPerSec
[ https://issues.apache.org/jira/browse/HDFS-15253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15253:
--
    Hadoop Flags: Reviewed

> Set default throttle value on dfs.image.transfer.bandwidthPerSec
> --
>
> Key: HDFS-15253
> URL: https://issues.apache.org/jira/browse/HDFS-15253
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.3.1, 3.4.0
> Reporter: Karthik Palanisamy
> Assignee: Karthik Palanisamy
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
> Time Spent: 40m
> Remaining Estimate: 0h
>
> The default value of dfs.image.transfer.bandwidthPerSec is 0, so fsimage transfers during checkpoint can use the maximum available bandwidth. I think we should throttle this. Many users have experienced namenode failover when transferring a large image (e.g. >25Gb) along with fsimage replication on dfs.namenode.name.dir.
> Thought to set:
> dfs.image.transfer.bandwidthPerSec=52428800 (50 MB/s)
> dfs.namenode.checkpoint.txns=200 (Default is 1M; good to avoid frequent checkpoints. However, the default checkpoint runs once every 6 hours)
[jira] [Updated] (HDFS-15255) Consider StorageType when DatanodeManager#sortLocatedBlock()
[ https://issues.apache.org/jira/browse/HDFS-15255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15255:
--
    Affects Version/s: 3.3.1 3.4.0

> Consider StorageType when DatanodeManager#sortLocatedBlock()
> --
>
> Key: HDFS-15255
> URL: https://issues.apache.org/jira/browse/HDFS-15255
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.3.1, 3.4.0
> Reporter: Lisheng Sun
> Assignee: Lisheng Sun
> Priority: Major
> Fix For: 3.3.1, 3.4.0
> Attachments: HDFS-15255-findbugs-test.001.patch, HDFS-15255.001.patch, HDFS-15255.002.patch, HDFS-15255.003.patch, HDFS-15255.004.patch, HDFS-15255.005.patch, HDFS-15255.006.patch, HDFS-15255.007.patch, HDFS-15255.008.patch, HDFS-15255.009.patch, HDFS-15255.010.patch, experiment-find-bugs.001.patch
>
> Consider the case where only one replica of a block is on SSD and the others are on HDD.
> When the client reads the data, the current logic considers only the distance between the client and the DN. I think it should also consider the StorageType of the replica, giving priority to the node with the faster StorageType when the distance is the same.
[jira] [Updated] (HDFS-15287) HDFS rollingupgrade prepare never finishes
[ https://issues.apache.org/jira/browse/HDFS-15287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15287:
--
    Hadoop Flags: Reviewed

> HDFS rollingupgrade prepare never finishes
> --
>
> Key: HDFS-15287
> URL: https://issues.apache.org/jira/browse/HDFS-15287
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.10.0, 3.3.0
> Reporter: Kihwal Lee
> Priority: Major
>
> After HDFS-12979, the prepare step of rolling upgrade does not work. This is because it added an additional check that sufficient time has passed since the last checkpoint. Since RU rollback image creation and upload can happen at any time, uploading of the rollback image never succeeds. For a new cluster deployed for testing, it might work since it has never checkpointed before.
> It was found that this check is disabled for unit tests, defeating the very purpose of testing.
[jira] [Updated] (HDFS-15283) Cache pool MAXTTL is not persisted and restored on cluster restart
[ https://issues.apache.org/jira/browse/HDFS-15283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15283:
--
    Hadoop Flags: Reviewed

> Cache pool MAXTTL is not persisted and restored on cluster restart
> --
>
> Key: HDFS-15283
> URL: https://issues.apache.org/jira/browse/HDFS-15283
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.4.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2, 2.10.1
> Attachments: HDFS-15283.001.patch
>
> The cache pool "getMaxRelativeExpiryMs" is never persisted to or read from the FSImage. This means that if a MAXTTL is set on a pool, it will not persist beyond a cluster restart.
> From the protobuf definition, there is an existing field to store it:
> {code}
> message CachePoolInfoProto {
>   optional string poolName = 1;
>   optional string ownerName = 2;
>   optional string groupName = 3;
>   optional int32 mode = 4;
>   optional int64 limit = 5;
>   optional int64 maxRelativeExpiry = 6; <-- NEVER SET
>   optional uint32 defaultReplication = 7 [default=1];
> }
> {code}
> But this is never set in the CacheManager.saveState() or read in CacheManager.loadState().
[jira] [Updated] (HDFS-15255) Consider StorageType when DatanodeManager#sortLocatedBlock()
[ https://issues.apache.org/jira/browse/HDFS-15255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15255:
--
    Hadoop Flags: Reviewed

> Consider StorageType when DatanodeManager#sortLocatedBlock()
> --
>
> Key: HDFS-15255
> URL: https://issues.apache.org/jira/browse/HDFS-15255
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Affects Versions: 3.3.1, 3.4.0
> Reporter: Lisheng Sun
> Assignee: Lisheng Sun
> Priority: Major
> Fix For: 3.3.1, 3.4.0
> Attachments: HDFS-15255-findbugs-test.001.patch, HDFS-15255.001.patch, HDFS-15255.002.patch, HDFS-15255.003.patch, HDFS-15255.004.patch, HDFS-15255.005.patch, HDFS-15255.006.patch, HDFS-15255.007.patch, HDFS-15255.008.patch, HDFS-15255.009.patch, HDFS-15255.010.patch, experiment-find-bugs.001.patch
>
> Consider the case where only one replica of a block is on SSD and the others are on HDD.
> When the client reads the data, the current logic considers only the distance between the client and the DN. I think it should also consider the StorageType of the replica, giving priority to the node with the faster StorageType when the distance is the same.
[jira] [Updated] (HDFS-15298) Fix the findbugs warnings introduced in HDFS-15217
[ https://issues.apache.org/jira/browse/HDFS-15298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15298:
--
    Hadoop Flags: Reviewed

> Fix the findbugs warnings introduced in HDFS-15217
> --
>
> Key: HDFS-15298
> URL: https://issues.apache.org/jira/browse/HDFS-15298
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namanode
> Affects Versions: 3.4.0
> Reporter: Toshihiro Suzuki
> Assignee: Toshihiro Suzuki
> Priority: Major
> Fix For: 3.4.0
>
> We need to fix the findbugs warnings introduced in HDFS-15217:
> https://builds.apache.org/job/hadoop-multibranch/job/PR-1954/5/artifact/out/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
[jira] [Updated] (HDFS-15298) Fix the findbugs warnings introduced in HDFS-15217
[ https://issues.apache.org/jira/browse/HDFS-15298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15298:
--
    Component/s: namanode

> Fix the findbugs warnings introduced in HDFS-15217
> --
>
> Key: HDFS-15298
> URL: https://issues.apache.org/jira/browse/HDFS-15298
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namanode
> Affects Versions: 3.4.0
> Reporter: Toshihiro Suzuki
> Assignee: Toshihiro Suzuki
> Priority: Major
> Fix For: 3.4.0
>
> We need to fix the findbugs warnings introduced in HDFS-15217:
> https://builds.apache.org/job/hadoop-multibranch/job/PR-1954/5/artifact/out/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
[jira] [Updated] (HDFS-15298) Fix the findbugs warnings introduced in HDFS-15217
[ https://issues.apache.org/jira/browse/HDFS-15298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15298:
--
    Affects Version/s: 3.4.0

> Fix the findbugs warnings introduced in HDFS-15217
> --
>
> Key: HDFS-15298
> URL: https://issues.apache.org/jira/browse/HDFS-15298
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.4.0
> Reporter: Toshihiro Suzuki
> Assignee: Toshihiro Suzuki
> Priority: Major
> Fix For: 3.4.0
>
> We need to fix the findbugs warnings introduced in HDFS-15217:
> https://builds.apache.org/job/hadoop-multibranch/job/PR-1954/5/artifact/out/new-findbugs-hadoop-hdfs-project_hadoop-hdfs.html
[jira] [Updated] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete
[ https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15313:
--
    Hadoop Flags: Reviewed

> Ensure inodes in active filesystem are not deleted during snapshot delete
> --
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Affects Versions: 3.3.1, 3.4.0
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.2.2, 2.10.1, 3.3.1, 3.4.0
> Attachments: HDFS-15313-branch-3.1.001.patch, HDFS-15313.000.patch, HDFS-15313.001.patch, HDFS-15313.branch-2.10.001.patch, HDFS-15313.branch-2.10.patch, HDFS-15313.branch-2.8.patch
>
> After HDFS-13101, it was observed in one of our customer deployments that deleting a snapshot ends up cleaning up inodes from the active fs that are referred to from only one snapshot, because the isLastReference() check for the parent dir introduced in HDFS-13101 may return true in certain cases. The aim of this Jira is to add a check to ensure that inodes still referred to in the active fs are not deleted while a snapshot is being deleted.
[jira] [Updated] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete
[ https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15313:
--
    Affects Version/s: 3.3.1 3.4.0

> Ensure inodes in active filesystem are not deleted during snapshot delete
> --
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Affects Versions: 3.3.1, 3.4.0
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.2.2, 2.10.1, 3.3.1, 3.4.0
> Attachments: HDFS-15313-branch-3.1.001.patch, HDFS-15313.000.patch, HDFS-15313.001.patch, HDFS-15313.branch-2.10.001.patch, HDFS-15313.branch-2.10.patch, HDFS-15313.branch-2.8.patch
>
> After HDFS-13101, it was observed in one of our customer deployments that deleting a snapshot ends up cleaning up inodes from the active fs that are referred to from only one snapshot, because the isLastReference() check for the parent dir introduced in HDFS-13101 may return true in certain cases. The aim of this Jira is to add a check to ensure that inodes still referred to in the active fs are not deleted while a snapshot is being deleted.
[jira] [Updated] (HDFS-15344) DataNode#checkSuperuserPrivilege should use UGI#getGroups after HADOOP-13442
[ https://issues.apache.org/jira/browse/HDFS-15344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15344: -- Component/s: datanode > DataNode#checkSuperuserPrivilege should use UGI#getGroups after HADOOP-13442 > > > Key: HDFS-15344 > URL: https://issues.apache.org/jira/browse/HDFS-15344 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.5 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Fix For: 3.4.0 > > > HADOOP-13442 added UGI#getGroups to avoid list->array->list conversions. This > ticket is opened to change DataNode#checkSuperuserPrivilege to use > UGI#getGroups. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
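Editor's sketch for HDFS-15344 (and the matching RBF ticket HDFS-15345): the description mentions avoiding list->array->list conversions. The class below is a hypothetical stand-in, not Hadoop's UserGroupInformation; only the method names getGroupNames (array) and getGroups (list) mirror the UGI calls named in the ticket. It shows, under those assumptions, why a membership check on the list form skips a round trip.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a UGI-like object; not the Hadoop class.
final class GroupLookupSketch {
    private final List<String> groups;

    GroupLookupSketch(List<String> groups) {
        this.groups = new ArrayList<>(groups);
    }

    // Array-returning accessor: copies the list into a new array on every call.
    String[] getGroupNames() {
        return groups.toArray(new String[0]);
    }

    // List-returning accessor: no intermediate array is created.
    List<String> getGroups() {
        return groups;
    }

    // Older style: list -> array -> list just to call contains().
    static boolean isSuperuserViaArray(GroupLookupSketch ugi, String superGroup) {
        return Arrays.asList(ugi.getGroupNames()).contains(superGroup);
    }

    // Style proposed in the ticket: query the list directly.
    static boolean isSuperuserViaList(GroupLookupSketch ugi, String superGroup) {
        return ugi.getGroups().contains(superGroup);
    }
}
```

Both checks return the same answer; the second simply avoids the extra allocations.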
[jira] [Updated] (HDFS-15320) StringIndexOutOfBoundsException in HostRestrictingAuthorizationFilter
[ https://issues.apache.org/jira/browse/HDFS-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15320: -- Affects Version/s: 3.3.1 3.4.0 > StringIndexOutOfBoundsException in HostRestrictingAuthorizationFilter > - > > Key: HDFS-15320 > URL: https://issues.apache.org/jira/browse/HDFS-15320 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 3.3.1, 3.4.0 > Environment: HostRestrictingAuthorizationFilter (HDFS-14234) is > enabled >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Fix For: 3.3.1, 3.4.0 > > > When there is a request to "http://:/" without "webhdfs/v1" > suffix, DN returns 500 response code and throws > StringIndexOutOfBoundsException as follows: > {noformat} > 2020-05-01 16:10:20,220 ERROR > org.apache.hadoop.hdfs.server.datanode.web.HostRestrictingAuthorizationFilterHandler: > Exception in HostRestrictingAuthorizationFilterHandler > java.lang.StringIndexOutOfBoundsException: String index out of range: -10 > at java.base/java.lang.String.substring(String.java:1841) > at > org.apache.hadoop.hdfs.server.common.HostRestrictingAuthorizationFilter.handleInteraction(HostRestrictingAuthorizationFilter.java:234) > at > org.apache.hadoop.hdfs.server.datanode.web.HostRestrictingAuthorizationFilterHandler.channelRead0(HostRestrictingAuthorizationFilterHandler.java:155) > at > org.apache.hadoop.hdfs.server.datanode.web.HostRestrictingAuthorizationFilterHandler.channelRead0(HostRestrictingAuthorizationFilterHandler.java:58) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) > at > 
io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:328) > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:302) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1422) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:931) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:700) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:635) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:552) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514) > at > io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044) > at > io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
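Editor's sketch for HDFS-15320: the "String index out of range: -10" in the trace is consistent with calling substring with a begin index of 11 (the length of "/webhdfs/v1") on a one-character path such as "/". The helper below is a hypothetical illustration of guarding that call; the class name, the constant, and the choice to return null are assumptions, and it is not the code from the actual patch.

```java
// Hypothetical guard; names and return convention are assumptions for illustration.
final class WebHdfsPathGuard {
    // Assumed request prefix; the real filter may define it differently.
    static final String PREFIX = "/webhdfs/v1";

    /** Returns the path after the prefix, or null when the URI lacks the prefix. */
    static String stripPrefix(String uriPath) {
        if (uriPath == null || !uriPath.startsWith(PREFIX)) {
            // Without this check, "/".substring(PREFIX.length()) would throw
            // StringIndexOutOfBoundsException.
            return null;
        }
        return uriPath.substring(PREFIX.length());
    }

    public static void main(String[] args) {
        System.out.println(stripPrefix("/webhdfs/v1/user/foo"));
        System.out.println(stripPrefix("/"));
    }
}
```

A caller could map the null case to a 4xx response instead of letting the exception surface as a 500.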
[jira] [Updated] (HDFS-15345) RBF: RouterPermissionChecker#checkSuperuserPrivilege should use UGI#getGroups after HADOOP-13442
[ https://issues.apache.org/jira/browse/HDFS-15345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15345: -- Component/s: rbf > RBF: RouterPermissionChecker#checkSuperuserPrivilege should use UGI#getGroups > after HADOOP-13442 > > > Key: HDFS-15345 > URL: https://issues.apache.org/jira/browse/HDFS-15345 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 2.7.5 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Fix For: 3.4.0 > > > HADOOP-13442 added UGI#getGroups to avoid list->array->list conversions. This > ticket is opened to change RouterPermissionChecker#checkSuperuserPrivilege > to use UGI#getGroups. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15344) DataNode#checkSuperuserPrivilege should use UGI#getGroups after HADOOP-13442
[ https://issues.apache.org/jira/browse/HDFS-15344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15344: -- Affects Version/s: 3.4.0 > DataNode#checkSuperuserPrivilege should use UGI#getGroups after HADOOP-13442 > > > Key: HDFS-15344 > URL: https://issues.apache.org/jira/browse/HDFS-15344 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 2.7.5, 3.4.0 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Fix For: 3.4.0 > > > HADOOP-13442 added UGI#getGroups to avoid list->array->list conversions. This > ticket is opened to change DataNode#checkSuperuserPrivilege to use > UGI#getGroups. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15345) RBF: RouterPermissionChecker#checkSuperuserPrivilege should use UGI#getGroups after HADOOP-13442
[ https://issues.apache.org/jira/browse/HDFS-15345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15345: -- Affects Version/s: 3.4.0 > RBF: RouterPermissionChecker#checkSuperuserPrivilege should use UGI#getGroups > after HADOOP-13442 > > > Key: HDFS-15345 > URL: https://issues.apache.org/jira/browse/HDFS-15345 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 2.7.5, 3.4.0 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Fix For: 3.4.0 > > > HADOOP-13442 added UGI#getGroups to avoid list->array->list conversions. This > ticket is opened to change RouterPermissionChecker#checkSuperuserPrivilege > to use UGI#getGroups. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15371) Nonstandard characters exist in NameNode.java
[ https://issues.apache.org/jira/browse/HDFS-15371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15371: -- Hadoop Flags: Reviewed > Nonstandard characters exist in NameNode.java > - > > Key: HDFS-15371 > URL: https://issues.apache.org/jira/browse/HDFS-15371 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namanode >Affects Versions: 3.1.0 >Reporter: JiangHua Zhu >Assignee: Zhao Yi Ming >Priority: Minor > Fix For: 3.4.0 > > > In NameNode.Java, DFS_HA_ZKFC_PORT_KEY has non-standard characters behind it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15372) Files in snapshots no longer see attribute provider permissions
[ https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15372: -- Hadoop Flags: Reviewed > Files in snapshots no longer see attribute provider permissions > --- > > Key: HDFS-15372 > URL: https://issues.apache.org/jira/browse/HDFS-15372 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Affects Versions: 3.3.1, 3.4.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15372.001.patch, HDFS-15372.002.patch, > HDFS-15372.003.patch, HDFS-15372.004.patch, HDFS-15372.005.patch > > > Given a cluster with an authorization provider configured (eg Sentry) and the > paths covered by the provider are snapshotable, there was a change in > behaviour in how the provider permissions and ACLs are applied to files in > snapshots between the 2.x branch and Hadoop 3.0. > Eg, if we have the snapshotable path /data, which is Sentry managed. The ACLs > below are provided by Sentry: > {code} > hadoop fs -getfacl -R /data > # file: /data > # owner: hive > # group: hive > user::rwx > group::rwx > other::--x > # file: /data/tab1 > # owner: hive > # group: hive > user::rwx > group::--- > group:flume:rwx > user:hive:rwx > group:hive:rwx > group:testgroup:rwx > mask::rwx > other::--x > /data/tab1 > {code} > After taking a snapshot, the files in the snapshot do not see the provider > permissions: > {code} > hadoop fs -getfacl -R /data/.snapshot > # file: /data/.snapshot > # owner: > # group: > user::rwx > group::rwx > other::rwx > # file: /data/.snapshot/snap1 > # owner: hive > # group: hive > user::rwx > group::rwx > other::--x > # file: /data/.snapshot/snap1/tab1 > # owner: hive > # group: hive > user::rwx > group::rwx > other::--x > {code} > However pre-Hadoop 3.0 (when the attribute provider etc was extensively > refactored) snapshots did get the provider permissions. 
> The reason is this code in FSDirectory.java which ultimately calls the > attribute provider and passes the path we want permissions for: > {code} > INodeAttributes getAttributes(INodesInPath iip) > throws IOException { > INode node = FSDirectory.resolveLastINode(iip); > int snapshot = iip.getPathSnapshotId(); > INodeAttributes nodeAttrs = node.getSnapshotINode(snapshot); > UserGroupInformation ugi = NameNode.getRemoteUser(); > INodeAttributeProvider ap = this.getUserFilteredAttributeProvider(ugi); > if (ap != null) { > // permission checking sends the full components array including the > // first empty component for the root. however file status > // related calls are expected to strip out the root component according > // to TestINodeAttributeProvider. > byte[][] components = iip.getPathComponents(); > components = Arrays.copyOfRange(components, 1, components.length); > nodeAttrs = ap.getAttributes(components, nodeAttrs); > } > return nodeAttrs; > } > {code} > The line: > {code} > INode node = FSDirectory.resolveLastINode(iip); > {code} > Picks the last resolved Inode and if you then call node.getPathComponents, > for a path like '/data/.snapshot/snap1/tab1' it will return /data/tab1. It > resolves the snapshot path to its original location, but its still the > snapshot inode. > However the logic passes 'iip.getPathComponents' which returns > "/user/.snapshot/snap1/tab" to the provider. > The pre Hadoop 3.0 code passes the inode directly to the provider, and hence > it only ever sees the path as "/user/data/tab1". > It is debatable which path should be passed to the provider - > /user/.snapshot/snap1/tab or /data/tab1 in the case of snapshots. However as > the behaviour has changed I feel we should ensure the old behaviour is > retained. > It would also be fairly easy to provide a config switch so the provider gets > the full snapshot path or the resolved path. 
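Editor's sketch for HDFS-15372: the description contrasts the full snapshot path with the resolved path of the original location. The helper below is a hypothetical, standalone illustration of that mapping on plain string components (dropping ".snapshot" and the snapshot name that follows it). It is not the FSDirectory code and not necessarily how the patch resolves the path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; operates on strings rather than HDFS inode structures.
final class SnapshotPathSketch {
    static final String DOT_SNAPSHOT = ".snapshot";

    /** Maps [data, .snapshot, snap1, tab1] to [data, tab1]. */
    static List<String> resolve(List<String> components) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < components.size(); i++) {
            if (DOT_SNAPSHOT.equals(components.get(i))) {
                i++; // also skip the snapshot name that follows, if present
                continue;
            }
            out.add(components.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(resolve(Arrays.asList("data", ".snapshot", "snap1", "tab1")));
    }
}
```

Passing the resolved components to a provider would correspond to the pre-3.0 behaviour described above; passing the unmodified components corresponds to the behaviour the ticket reports.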
[jira] [Updated] (HDFS-15350) Set dfs.client.failover.random.order to true as default
[ https://issues.apache.org/jira/browse/HDFS-15350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15350: -- Affects Version/s: 3.4.0 > Set dfs.client.failover.random.order to true as default > --- > > Key: HDFS-15350 > URL: https://issues.apache.org/jira/browse/HDFS-15350 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Affects Versions: 3.4.0 >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Fix For: 3.4.0 > > > {noformat} > Currently, the default value of dfs.client.failover.random.order is > false. If it's true, clients access NameNodes in random order instead > of the configured order defined in hdfs-site.xml. > Setting dfs.client.failover.random.order=true is very important for > RBF if there are multiple routers. If it's false, all the clients > point to the same router because routers are always active. > And I think dfs.client.failover.random.order=true would be a good default > for a normal HA (two-NameNode) cluster too. If it's false and the first > NameNode is standby, clients always access the standby NameNode > first. > So I'd like to set dfs.client.failover.random.order to true by default > from 3.4. Does anyone have any concerns? > {noformat} > https://lists.apache.org/thread.html/ra79dde30235a1d302ea82120de8829c0aa7d6c0789f4613430610b8a%40%3Chdfs-dev.hadoop.apache.org%3E
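Editor's sketch for HDFS-15350: a minimal illustration of what "random order instead of the configured order" means for a client choosing which NameNode or router to try first. The class and method names are hypothetical and this is not the HDFS client's failover proxy provider; it only assumes the flag toggles a shuffle of the configured address list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration; not the HDFS client implementation.
final class FailoverOrderSketch {
    /** Returns the order in which a client would try the configured addresses. */
    static List<String> proxyOrder(List<String> configured, boolean randomOrder, Random rnd) {
        List<String> order = new ArrayList<>(configured);
        if (randomOrder) {
            // Each client gets its own permutation, spreading load across routers.
            Collections.shuffle(order, rnd);
        }
        return order;
    }

    public static void main(String[] args) {
        List<String> routers = Arrays.asList("router1:8888", "router2:8888", "router3:8888");
        System.out.println(proxyOrder(routers, false, new Random()));
        System.out.println(proxyOrder(routers, true, new Random()));
    }
}
```

With the flag off, every client starts at the first configured address, which is the single-router hot spot the message describes.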
[jira] [Updated] (HDFS-15359) EC: Allow closing a file with committed blocks
[ https://issues.apache.org/jira/browse/HDFS-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15359: -- Affects Version/s: 3.4.0 > EC: Allow closing a file with committed blocks > -- > > Key: HDFS-15359 > URL: https://issues.apache.org/jira/browse/HDFS-15359 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding >Affects Versions: 3.4.0 >Reporter: Ayush Saxena >Assignee: Ayush Saxena >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15359-01.patch, HDFS-15359-02.patch, > HDFS-15359-03.patch, HDFS-15359-04.patch, HDFS-15359-05.patch > > > Presently, {{dfs.namenode.file.close.num-committed-allowed}} is ignored for > EC blocks. Under heavy load, however, IBRs from Datanodes may be > delayed and cause the file write to fail. So we can allow EC files to close > with blocks in committed state, as replicated (REP) files can.
[jira] [Updated] (HDFS-15371) Nonstandard characters exist in NameNode.java
[ https://issues.apache.org/jira/browse/HDFS-15371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15371: -- Component/s: namanode > Nonstandard characters exist in NameNode.java > - > > Key: HDFS-15371 > URL: https://issues.apache.org/jira/browse/HDFS-15371 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namanode >Affects Versions: 3.1.0 >Reporter: JiangHua Zhu >Assignee: Zhao Yi Ming >Priority: Minor > Fix For: 3.4.0 > > > In NameNode.Java, DFS_HA_ZKFC_PORT_KEY has non-standard characters behind it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15415) Reduce locking in Datanode DirectoryScanner
[ https://issues.apache.org/jira/browse/HDFS-15415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15415: -- Hadoop Flags: Reviewed > Reduce locking in Datanode DirectoryScanner > --- > > Key: HDFS-15415 > URL: https://issues.apache.org/jira/browse/HDFS-15415 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.4.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0 > > Attachments: HDFS-15415.001.patch, HDFS-15415.002.patch, > HDFS-15415.003.patch, HDFS-15415.004.patch, HDFS-15415.005.patch, > HDFS-15415.branch-3.1.001.patch, HDFS-15415.branch-3.1.002.patch, > HDFS-15415.branch-3.2.001.patch, HDFS-15415.branch-3.2.002.patch, > HDFS-15415.branch-3.3.001.patch > > > In HDFS-15406, we have a small change to greatly reduce the runtime and > locking time of the datanode DirectoryScanner. There may be room for further > improvement. > From the scan step, we have captured a snapshot of what is on disk. After > calling `dataset.getFinalizedBlocks(bpid);` we have taken a snapshot of what is in > memory. The two snapshots are never 100% in sync as things are always > changing as the disk is scanned. > We are only comparing finalized blocks, so they should not really change: > * If a block is deleted after our snapshot, our snapshot will not see it and > that is OK. > * A finalized block could be appended. If that happens both the genstamp and > length will change, but that should be handled by reconcile when it calls > `FSDatasetImpl.checkAndUpdate()`, and there is nothing stopping blocks being > appended after they have been scanned from disk, but before they have been > compared with memory. > My suspicion is that we can do all the comparison work outside of the lock > and checkAndUpdate() re-checks any differences later under the lock on a > block by block basis. 
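Editor's sketch for HDFS-15415: the pattern suggested in the last paragraph (compare two snapshots without holding the lock, then re-check each difference under the lock) can be illustrated as below. Everything here is hypothetical: the class, the block-id-to-length maps, and the repair rule are simplifications, and the real DirectoryScanner and FSDatasetImpl.checkAndUpdate() track far more state.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model: block id -> length, guarded by a single dataset lock.
final class ScanReconcileSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Map<Long, Long> inMemory = new HashMap<>();

    void put(long blockId, long length) {
        lock.lock();
        try { inMemory.put(blockId, length); } finally { lock.unlock(); }
    }

    // Short critical section: copy the in-memory view and release the lock.
    private Map<Long, Long> snapshotMemory() {
        lock.lock();
        try { return new HashMap<>(inMemory); } finally { lock.unlock(); }
    }

    // Pure comparison of two snapshots; runs with no lock held.
    static Set<Long> diff(Map<Long, Long> onDisk, Map<Long, Long> mem) {
        Set<Long> ids = new HashSet<>(onDisk.keySet());
        ids.addAll(mem.keySet());
        ids.removeIf(id -> Objects.equals(onDisk.get(id), mem.get(id)));
        return ids;
    }

    /** Returns how many blocks were actually corrected. */
    int reconcile(Map<Long, Long> onDisk) {
        Set<Long> suspects = diff(onDisk, snapshotMemory());
        int fixed = 0;
        for (Long id : suspects) {
            lock.lock();
            try {
                Long disk = onDisk.get(id);
                // Re-check under the lock: memory may have changed since the snapshot.
                if (!Objects.equals(disk, inMemory.get(id))) {
                    if (disk == null) inMemory.remove(id); else inMemory.put(id, disk);
                    fixed++;
                }
            } finally {
                lock.unlock();
            }
        }
        return fixed;
    }
}
```

The lock is held only for the snapshot copy and for each per-block re-check, so the potentially long comparison never blocks other users of the dataset.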
[jira] [Updated] (HDFS-15372) Files in snapshots no longer see attribute provider permissions
[ https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15372: -- Component/s: snapshots > Files in snapshots no longer see attribute provider permissions > --- > > Key: HDFS-15372 > URL: https://issues.apache.org/jira/browse/HDFS-15372 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Affects Versions: 3.3.1, 3.4.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15372.001.patch, HDFS-15372.002.patch, > HDFS-15372.003.patch, HDFS-15372.004.patch, HDFS-15372.005.patch > > > Given a cluster with an authorization provider configured (eg Sentry) and the > paths covered by the provider are snapshotable, there was a change in > behaviour in how the provider permissions and ACLs are applied to files in > snapshots between the 2.x branch and Hadoop 3.0. > Eg, if we have the snapshotable path /data, which is Sentry managed. The ACLs > below are provided by Sentry: > {code} > hadoop fs -getfacl -R /data > # file: /data > # owner: hive > # group: hive > user::rwx > group::rwx > other::--x > # file: /data/tab1 > # owner: hive > # group: hive > user::rwx > group::--- > group:flume:rwx > user:hive:rwx > group:hive:rwx > group:testgroup:rwx > mask::rwx > other::--x > /data/tab1 > {code} > After taking a snapshot, the files in the snapshot do not see the provider > permissions: > {code} > hadoop fs -getfacl -R /data/.snapshot > # file: /data/.snapshot > # owner: > # group: > user::rwx > group::rwx > other::rwx > # file: /data/.snapshot/snap1 > # owner: hive > # group: hive > user::rwx > group::rwx > other::--x > # file: /data/.snapshot/snap1/tab1 > # owner: hive > # group: hive > user::rwx > group::rwx > other::--x > {code} > However pre-Hadoop 3.0 (when the attribute provider etc was extensively > refactored) snapshots did get the provider permissions. 
> The reason is this code in FSDirectory.java which ultimately calls the > attribute provider and passes the path we want permissions for: > {code} > INodeAttributes getAttributes(INodesInPath iip) > throws IOException { > INode node = FSDirectory.resolveLastINode(iip); > int snapshot = iip.getPathSnapshotId(); > INodeAttributes nodeAttrs = node.getSnapshotINode(snapshot); > UserGroupInformation ugi = NameNode.getRemoteUser(); > INodeAttributeProvider ap = this.getUserFilteredAttributeProvider(ugi); > if (ap != null) { > // permission checking sends the full components array including the > // first empty component for the root. however file status > // related calls are expected to strip out the root component according > // to TestINodeAttributeProvider. > byte[][] components = iip.getPathComponents(); > components = Arrays.copyOfRange(components, 1, components.length); > nodeAttrs = ap.getAttributes(components, nodeAttrs); > } > return nodeAttrs; > } > {code} > The line: > {code} > INode node = FSDirectory.resolveLastINode(iip); > {code} > Picks the last resolved Inode and if you then call node.getPathComponents, > for a path like '/data/.snapshot/snap1/tab1' it will return /data/tab1. It > resolves the snapshot path to its original location, but its still the > snapshot inode. > However the logic passes 'iip.getPathComponents' which returns > "/user/.snapshot/snap1/tab" to the provider. > The pre Hadoop 3.0 code passes the inode directly to the provider, and hence > it only ever sees the path as "/user/data/tab1". > It is debatable which path should be passed to the provider - > /user/.snapshot/snap1/tab or /data/tab1 in the case of snapshots. However as > the behaviour has changed I feel we should ensure the old behaviour is > retained. > It would also be fairly easy to provide a config switch so the provider gets > the full snapshot path or the resolved path. 
[jira] [Updated] (HDFS-15372) Files in snapshots no longer see attribute provider permissions
[ https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15372: -- Affects Version/s: 3.3.1 3.4.0 > Files in snapshots no longer see attribute provider permissions > --- > > Key: HDFS-15372 > URL: https://issues.apache.org/jira/browse/HDFS-15372 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.1, 3.4.0 >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15372.001.patch, HDFS-15372.002.patch, > HDFS-15372.003.patch, HDFS-15372.004.patch, HDFS-15372.005.patch > > > Given a cluster with an authorization provider configured (eg Sentry) and the > paths covered by the provider are snapshotable, there was a change in > behaviour in how the provider permissions and ACLs are applied to files in > snapshots between the 2.x branch and Hadoop 3.0. > Eg, if we have the snapshotable path /data, which is Sentry managed. The ACLs > below are provided by Sentry: > {code} > hadoop fs -getfacl -R /data > # file: /data > # owner: hive > # group: hive > user::rwx > group::rwx > other::--x > # file: /data/tab1 > # owner: hive > # group: hive > user::rwx > group::--- > group:flume:rwx > user:hive:rwx > group:hive:rwx > group:testgroup:rwx > mask::rwx > other::--x > /data/tab1 > {code} > After taking a snapshot, the files in the snapshot do not see the provider > permissions: > {code} > hadoop fs -getfacl -R /data/.snapshot > # file: /data/.snapshot > # owner: > # group: > user::rwx > group::rwx > other::rwx > # file: /data/.snapshot/snap1 > # owner: hive > # group: hive > user::rwx > group::rwx > other::--x > # file: /data/.snapshot/snap1/tab1 > # owner: hive > # group: hive > user::rwx > group::rwx > other::--x > {code} > However pre-Hadoop 3.0 (when the attribute provider etc was extensively > refactored) snapshots did get the provider permissions. 
> The reason is this code in FSDirectory.java which ultimately calls the > attribute provider and passes the path we want permissions for: > {code} > INodeAttributes getAttributes(INodesInPath iip) > throws IOException { > INode node = FSDirectory.resolveLastINode(iip); > int snapshot = iip.getPathSnapshotId(); > INodeAttributes nodeAttrs = node.getSnapshotINode(snapshot); > UserGroupInformation ugi = NameNode.getRemoteUser(); > INodeAttributeProvider ap = this.getUserFilteredAttributeProvider(ugi); > if (ap != null) { > // permission checking sends the full components array including the > // first empty component for the root. however file status > // related calls are expected to strip out the root component according > // to TestINodeAttributeProvider. > byte[][] components = iip.getPathComponents(); > components = Arrays.copyOfRange(components, 1, components.length); > nodeAttrs = ap.getAttributes(components, nodeAttrs); > } > return nodeAttrs; > } > {code} > The line: > {code} > INode node = FSDirectory.resolveLastINode(iip); > {code} > Picks the last resolved Inode and if you then call node.getPathComponents, > for a path like '/data/.snapshot/snap1/tab1' it will return /data/tab1. It > resolves the snapshot path to its original location, but its still the > snapshot inode. > However the logic passes 'iip.getPathComponents' which returns > "/user/.snapshot/snap1/tab" to the provider. > The pre Hadoop 3.0 code passes the inode directly to the provider, and hence > it only ever sees the path as "/user/data/tab1". > It is debatable which path should be passed to the provider - > /user/.snapshot/snap1/tab or /data/tab1 in the case of snapshots. However as > the behaviour has changed I feel we should ensure the old behaviour is > retained. > It would also be fairly easy to provide a config switch so the provider gets > the full snapshot path or the resolved path. 
[jira] [Updated] (HDFS-15418) ViewFileSystemOverloadScheme should represent mount links as non symlinks
[ https://issues.apache.org/jira/browse/HDFS-15418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15418: -- Component/s: hdfs > ViewFileSystemOverloadScheme should represent mount links as non symlinks > - > > Key: HDFS-15418 > URL: https://issues.apache.org/jira/browse/HDFS-15418 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Affects Versions: 3.3.1, 3.4.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0 > > > Currently ViewFileSystemOverloadScheme uses the ViewFileSystem default behavior. > ViewFS always represents mount links as symlinks. With > ViewFSOverloadScheme we can have any scheme, and if that scheme's fs does not > have symlinks, the ViewFS symlink behavior can be confusing. > So, here I propose to represent mount links as non-symlinks in > ViewFSOverloadScheme.
[jira] [Updated] (HDFS-15418) ViewFileSystemOverloadScheme should represent mount links as non symlinks
[ https://issues.apache.org/jira/browse/HDFS-15418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15418: -- Affects Version/s: 3.4.0 > ViewFileSystemOverloadScheme should represent mount links as non symlinks > - > > Key: HDFS-15418 > URL: https://issues.apache.org/jira/browse/HDFS-15418 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.3.1, 3.4.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0 > > > Currently ViewFileSystemOverloadScheme uses the ViewFileSystem default behavior. > ViewFS always represents mount links as symlinks. With > ViewFSOverloadScheme we can have any scheme, and if that scheme's fs does not > have symlinks, the ViewFS symlink behavior can be confusing. > So, here I propose to represent mount links as non-symlinks in > ViewFSOverloadScheme.
[jira] [Updated] (HDFS-15418) ViewFileSystemOverloadScheme should represent mount links as non symlinks
[ https://issues.apache.org/jira/browse/HDFS-15418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15418: -- Affects Version/s: 3.3.1 > ViewFileSystemOverloadScheme should represent mount links as non symlinks > - > > Key: HDFS-15418 > URL: https://issues.apache.org/jira/browse/HDFS-15418 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.3.1 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0 > > > Currently ViewFileSystemOverloadScheme uses the ViewFileSystem default behavior. > ViewFS always represents mount links as symlinks. With > ViewFSOverloadScheme we can have any scheme, and if that scheme's fs does not > have symlinks, the ViewFS symlink behavior can be confusing. > So, here I propose to represent mount links as non-symlinks in > ViewFSOverloadScheme.
[jira] [Updated] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.
[ https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15422: -- Affects Version/s: 3.3.1 3.4.0 > Reported IBR is partially replaced with stored info when queuing. > - > > Key: HDFS-15422 > URL: https://issues.apache.org/jira/browse/HDFS-15422 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.3.1, 3.4.0 >Reporter: Kihwal Lee >Assignee: Stephen O'Donnell >Priority: Critical > Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3 > > Attachments: HDFS-15422-branch-2.10.001.patch, > HDFS-15422-branch-2.10.002.patch, HDFS-15422.001.patch > > > When queueing an IBR (incremental block report) on a standby namenode, some > of the reported information is being replaced with the existing stored > information. This can lead to false block corruption. > We had a namenode that, after transitioning to active, started reporting missing > blocks with "SIZE_MISMATCH" as the corrupt reason. These were blocks that were > appended and the sizes were actually correct on the datanodes. Upon further > investigation, it was determined that the namenode was queueing IBRs with > altered information. > Although it sounds bad, I am not making it a blocker.
[jira] [Updated] (HDFS-15422) Reported IBR is partially replaced with stored info when queuing.
[ https://issues.apache.org/jira/browse/HDFS-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15422: -- Hadoop Flags: Reviewed > Reported IBR is partially replaced with stored info when queuing. > - > > Key: HDFS-15422 > URL: https://issues.apache.org/jira/browse/HDFS-15422 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.3.1, 3.4.0 >Reporter: Kihwal Lee >Assignee: Stephen O'Donnell >Priority: Critical > Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.3 > > Attachments: HDFS-15422-branch-2.10.001.patch, > HDFS-15422-branch-2.10.002.patch, HDFS-15422.001.patch > > > When queueing an IBR (incremental block report) on a standby namenode, some > of the reported information is being replaced with the existing stored > information. This can lead to false block corruption. > We had a namenode, after transitioning to active, started reporting missing > blocks with "SIZE_MISMATCH" as corrupt reason. These were blocks that were > appended and the sizes were actually correct on the datanodes. Upon further > investigation, it was determined that the namenode was queueing IBRs with > altered information. > Although it sounds bad, I am not making it blocker -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
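The failure mode described in HDFS-15422 can be illustrated with a small model. This is only a sketch under stated assumptions: `Block`, `queue_with_stored_info` and `queue_with_reported_info` are hypothetical names, not the NameNode's actual classes or methods. It shows how queueing the stored record instead of the reported one loses the appended length and would later look like a size mismatch.

```python
from dataclasses import dataclass, replace


@dataclass
class Block:
    # Hypothetical, simplified block record (not the HDFS Block class).
    block_id: int
    num_bytes: int
    gen_stamp: int


def queue_with_stored_info(pending, stored, reported):
    # Problem pattern: the standby queues its own stored record, so the
    # length and generation stamp reported by the datanode are lost.
    pending.append(stored[reported.block_id])


def queue_with_reported_info(pending, stored, reported):
    # Intended pattern: queue a copy of exactly what the datanode reported.
    pending.append(replace(reported))


stored = {1: Block(block_id=1, num_bytes=100, gen_stamp=1)}
reported = Block(block_id=1, num_bytes=150, gen_stamp=2)  # after an append

altered, faithful = [], []
queue_with_stored_info(altered, stored, reported)
queue_with_reported_info(faithful, stored, reported)
```

When the queued report is processed after a transition to active, the `altered` entry carries the old length, while the `faithful` entry matches what is on the datanode.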
[jira] [Updated] (HDFS-15429) mkdirs should work when parent dir is internalDir and fallback configured.
[ https://issues.apache.org/jira/browse/HDFS-15429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15429: -- Component/s: hdfs > mkdirs should work when parent dir is internalDir and fallback configured. > -- > > Key: HDFS-15429 > URL: https://issues.apache.org/jira/browse/HDFS-15429 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Affects Versions: 3.2.1 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0 > > > mkdir will not work if the parent dir is Internal mount dir (non leaf in > mount path) and fall back configured. > Since fallback is available and if same tree structure available in fallback, > we should be able to mkdir in fallback. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15449) Optionally ignore port number in mount-table name when picking from initialized uri
[ https://issues.apache.org/jira/browse/HDFS-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15449: -- Component/s: hdfs > Optionally ignore port number in mount-table name when picking from > initialized uri > --- > > Key: HDFS-15449 > URL: https://issues.apache.org/jira/browse/HDFS-15449 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Affects Versions: 3.3.1, 3.4.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Fix For: 3.3.1, 3.4.0 > > > Currently mount-table name is used from uri's authority part. This authority > part contains IP:port/HOST:port. Some may configure without port as well. > ex: hdfs://ns1 or hdfs://ns1:8020 > It may be good idea to use only hostname/IP when users configured with > IP:port/HOST:port format. So, that we will have unique mount-table name in > both cases. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15430) create should work when parent dir is internalDir and fallback configured.
[ https://issues.apache.org/jira/browse/HDFS-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15430: -- Component/s: hdfs > create should work when parent dir is internalDir and fallback configured. > --- > > Key: HDFS-15430 > URL: https://issues.apache.org/jira/browse/HDFS-15430 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Affects Versions: 3.2.1 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Fix For: 3.3.1, 3.4.0 > > > create will not work if the parent dir is Internal mount dir (non leaf in > mount path) and fall back configured. > Since fallback is available and if same tree structure available in fallback, > we should be able to create in fallback fs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15449) Optionally ignore port number in mount-table name when picking from initialized uri
[ https://issues.apache.org/jira/browse/HDFS-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15449: -- Affects Version/s: 3.3.1 3.4.0 > Optionally ignore port number in mount-table name when picking from > initialized uri > --- > > Key: HDFS-15449 > URL: https://issues.apache.org/jira/browse/HDFS-15449 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: 3.3.1, 3.4.0 >Reporter: Uma Maheswara Rao G >Assignee: Uma Maheswara Rao G >Priority: Major > Fix For: 3.3.1, 3.4.0 > > > Currently mount-table name is used from uri's authority part. This authority > part contains IP:port/HOST:port. Some may configure without port as well. > ex: hdfs://ns1 or hdfs://ns1:8020 > It may be good idea to use only hostname/IP when users configured with > IP:port/HOST:port format. So, that we will have unique mount-table name in > both cases. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
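A minimal sketch of the idea in HDFS-15449, written in Python rather than Hadoop's Java and not taken from the patch. The function `mount_table_name` and its `ignore_port` parameter are hypothetical names used for illustration; only `urllib.parse.urlsplit` is a real library call.

```python
from urllib.parse import urlsplit


def mount_table_name(uri, ignore_port=True):
    """Derive a mount-table name from the authority part of a URI.

    Hypothetical helper: with ignore_port=True only the host (or IP) is
    used, so hdfs://ns1 and hdfs://ns1:8020 map to the same name.
    """
    parts = urlsplit(uri)
    if ignore_port:
        # urlsplit().hostname drops the port (and lowercases the host).
        return parts.hostname
    # Full authority, e.g. "ns1:8020".
    return parts.netloc


with_port = mount_table_name("hdfs://ns1:8020")
without_port = mount_table_name("hdfs://ns1")
full_authority = mount_table_name("hdfs://ns1:8020", ignore_port=False)
```

With the port ignored, both URI forms resolve to one mount-table name, which is the uniqueness property the issue asks for.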
[jira] [Updated] (HDFS-15462) Add fs.viewfs.overload.scheme.target.ofs.impl to core-default.xml
[ https://issues.apache.org/jira/browse/HDFS-15462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15462: -- Hadoop Flags: Reviewed > Add fs.viewfs.overload.scheme.target.ofs.impl to core-default.xml > - > > Key: HDFS-15462 > URL: https://issues.apache.org/jira/browse/HDFS-15462 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: configuration, viewfs, viewfsOverloadScheme >Affects Versions: 3.2.1 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Fix For: 3.2.2, 3.3.1, 3.4.0 > > > HDFS-15394 added the existing impls in core-default.xml except ofs. Let's add > ofs to core-default here. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15488) Add a command to list all snapshots for a snaphottable root with snapshot Ids
[ https://issues.apache.org/jira/browse/HDFS-15488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15488: -- Hadoop Flags: Reviewed > Add a command to list all snapshots for a snaphottable root with snapshot Ids > - > > Key: HDFS-15488 > URL: https://issues.apache.org/jira/browse/HDFS-15488 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots >Affects Versions: 3.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15488.000.patch > > > Currently, the way to list snapshots is to do an ls on /.snapshot directory. Since creation time is not recorded, there is no way to figure out the chronological order of snapshots. The idea here is to add a command to list snapshots for a snapshottable directory along with snapshot IDs, which grow monotonically as snapshots are created in the system. With the snapshot ID, it will be easier to figure out the chronology of snapshots in the system.
[jira] [Updated] (HDFS-15488) Add a command to list all snapshots for a snaphottable root with snapshot Ids
[ https://issues.apache.org/jira/browse/HDFS-15488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15488: -- Affects Version/s: 3.4.0 > Add a command to list all snapshots for a snaphottable root with snapshot Ids > - > > Key: HDFS-15488 > URL: https://issues.apache.org/jira/browse/HDFS-15488 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots >Affects Versions: 3.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 3.4.0 > > Attachments: HDFS-15488.000.patch > > > Currently, the way to list snapshots is to do an ls on /.snapshot directory. Since creation time is not recorded, there is no way to figure out the chronological order of snapshots. The idea here is to add a command to list snapshots for a snapshottable directory along with snapshot IDs, which grow monotonically as snapshots are created in the system. With the snapshot ID, it will be easier to figure out the chronology of snapshots in the system.
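Because the snapshot IDs described in HDFS-15488 grow monotonically, sorting by ID recovers creation order even without a recorded creation time. The sketch below assumes a hypothetical `Snapshot` record and made-up sample data; it is not the output format of the proposed command.

```python
from typing import NamedTuple


class Snapshot(NamedTuple):
    # Hypothetical record: a snapshot name plus its monotonically
    # increasing snapshot ID.
    name: str
    snapshot_id: int


def chronological(snapshots):
    """Return snapshots ordered oldest first, using the ID as the clock."""
    return sorted(snapshots, key=lambda s: s.snapshot_id)


# Sample data in the name order an ls-style listing might show.
listed = [Snapshot("daily", 7), Snapshot("adhoc", 12), Snapshot("before-upgrade", 3)]
ordered_names = [s.name for s in chronological(listed)]
```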
[jira] [Updated] (HDFS-15492) Make trash root inside each snapshottable directory
[ https://issues.apache.org/jira/browse/HDFS-15492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15492: -- Hadoop Flags: Reviewed > Make trash root inside each snapshottable directory > --- > > Key: HDFS-15492 > URL: https://issues.apache.org/jira/browse/HDFS-15492 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, hdfs-client >Affects Versions: 3.2.1 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > We have seen FSImage corruption cases (e.g. HDFS-13101) where files inside > one snapshottable directories are moved outside of it. The most common case > of this is when trash is enabled and user deletes some file via the command > line without skipTrash. > This jira aims to make a trash root for each snapshottable directory, same as > how encryption zone behaves at the moment. > This will make trash cleanup a little bit more expensive on the NameNode as > it will be to iterate all trash roots. But should be fine as long as there > aren't many snapshottable directories. > I could make this improvement as an option and disable it by default if > needed, such as {{dfs.namenode.snapshot.trashroot.enabled}} > One small caveat though, when disabling (disallowing) snapshot on the > snapshottable directory when this improvement is in place. The client should > merge the snapshottable directory's trash with that user's trash to ensure > proper trash cleanup. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.
[ https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15493: -- Hadoop Flags: Reviewed > Update block map and name cache in parallel while loading fsimage. > -- > > Key: HDFS-15493 > URL: https://issues.apache.org/jira/browse/HDFS-15493 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.3.1, 3.4.0 >Reporter: Chengwei Wang >Assignee: Chengwei Wang >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15493.001.patch, HDFS-15493.002.patch, HDFS-15493.003.patch, HDFS-15493.004.patch, HDFS-15493.005.patch, HDFS-15493.006.patch, HDFS-15493.007.patch, HDFS-15493.008.patch, fsimage-loading.log > > > While loading the INodeDirectorySection of the fsimage, the name cache and block map are updated after each inode file is added to its inode directory. Running these steps in parallel would reduce the time cost of fsimage loading. In our test case, with patches HDFS-13694 and HDFS-14617 applied, the time cost to load the fsimage (220M files & 240M blocks) is 470s; with this patch, the time cost is reduced to 410s.
[jira] [Updated] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.
[ https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15493: -- Affects Version/s: 3.3.1 3.4.0 > Update block map and name cache in parallel while loading fsimage. > -- > > Key: HDFS-15493 > URL: https://issues.apache.org/jira/browse/HDFS-15493 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.3.1, 3.4.0 >Reporter: Chengwei Wang >Assignee: Chengwei Wang >Priority: Major > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15493.001.patch, HDFS-15493.002.patch, HDFS-15493.003.patch, HDFS-15493.004.patch, HDFS-15493.005.patch, HDFS-15493.006.patch, HDFS-15493.007.patch, HDFS-15493.008.patch, fsimage-loading.log > > > While loading the INodeDirectorySection of the fsimage, the name cache and block map are updated after each inode file is added to its inode directory. Running these steps in parallel would reduce the time cost of fsimage loading. In our test case, with patches HDFS-13694 and HDFS-14617 applied, the time cost to load the fsimage (220M files & 240M blocks) is 470s; with this patch, the time cost is reduced to 410s.
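The general shape of HDFS-15493, running the two bookkeeping updates as independent tasks, can be sketched as follows. This is an illustration in Python under simplifying assumptions, not the patch itself: `update_block_map`, `update_name_cache` and `load_directory_section` are hypothetical names, and the inode files are modeled as plain `(name, block_ids)` tuples.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def update_block_map(files):
    # Map every block ID to the file that owns it.
    block_map = {}
    for name, block_ids in files:
        for block_id in block_ids:
            block_map[block_id] = name
    return block_map


def update_name_cache(files):
    # Count how often each file name occurs (a stand-in for a name cache).
    return Counter(name for name, _ in files)


def load_directory_section(files):
    # The two updates touch separate structures, so they can be submitted
    # as independent tasks and joined once both have finished.
    with ThreadPoolExecutor(max_workers=2) as pool:
        block_map_future = pool.submit(update_block_map, files)
        name_cache_future = pool.submit(update_name_cache, files)
        return block_map_future.result(), name_cache_future.result()


files = [("a", [1, 2]), ("b", [3]), ("a", [4])]
block_map, name_cache = load_directory_section(files)
```

The sketch only shows the task structure; any speedup in a real NameNode depends on how the work is partitioned, which the issue measures as 470s versus 410s in the reporter's test case.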
[jira] [Updated] (HDFS-15499) Clean up httpfs/pom.xml to remove aws-java-sdk-s3 exclusion
[ https://issues.apache.org/jira/browse/HDFS-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15499: -- Affects Version/s: 3.3.1 3.4.0 > Clean up httpfs/pom.xml to remove aws-java-sdk-s3 exclusion > --- > > Key: HDFS-15499 > URL: https://issues.apache.org/jira/browse/HDFS-15499 > Project: Hadoop HDFS > Issue Type: Bug > Components: httpfs >Affects Versions: 3.3.1, 3.4.0 >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Major > Fix For: 3.1.4, 3.2.2, 2.10.1, 3.3.1, 3.4.0 > > > In [HADOOP-14040] we started using the shaded aws-sdk uber-JAR instead of the s3 jar in hadoop-project/pom.xml. Following that change, we should also update the httpfs `pom.xml` file to exclude the correct jar dependency.
[jira] [Updated] (HDFS-15506) [JDK 11] Fix javadoc errors in hadoop-hdfs module
[ https://issues.apache.org/jira/browse/HDFS-15506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15506: -- Affects Version/s: 3.3.1 3.4.0 > [JDK 11] Fix javadoc errors in hadoop-hdfs module > - > > Key: HDFS-15506 > URL: https://issues.apache.org/jira/browse/HDFS-15506 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Xieming Li >Priority: Major > Labels: newbie > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15506.001.patch, HDFS-15506.002.patch > > > {noformat} > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java:43: > error: self-closing element not allowed > [ERROR] * > [ERROR]^ > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java:682: > error: malformed HTML > [ERROR]* a NameNode per second. Values <= 0 disable throttling. This > affects > [ERROR]^ > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java:1780: > error: exception not thrown: java.io.FileNotFoundException > [ERROR]* @throws FileNotFoundException > [ERROR] ^ > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectorySnapshottableFeature.java:176: > error: @param name not found > [ERROR]* @param mtime The snapshot creation time set by Time.now(). > [ERROR] ^ > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java:2187: > error: exception not thrown: java.lang.Exception > [ERROR]* @exception Exception if the filesystem does not exist. 
> [ERROR] ^ > {noformat} > Full error log: > https://gist.github.com/aajisaka/a0c16f0408a623e798dd7df29fbddf82 > How to reproduce the failure: > * Remove {{true}} from pom.xml > * Run {{mvn process-sources javadoc:javadoc-no-fork}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15507) [JDK 11] Fix javadoc errors in hadoop-hdfs-client module
[ https://issues.apache.org/jira/browse/HDFS-15507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15507: -- Affects Version/s: 3.3.1 3.4.0 > [JDK 11] Fix javadoc errors in hadoop-hdfs-client module > > > Key: HDFS-15507 > URL: https://issues.apache.org/jira/browse/HDFS-15507 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Xieming Li >Priority: Major > Labels: newbie > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15507.001.patch, HDFS-15507.002.patch > > > {noformat} > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ClientGSIContext.java:32: > error: self-closing element not allowed > [ERROR] * > [ERROR]^ > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java:1245: > error: unexpected text > [ERROR]* Same as {@link #create(String, FsPermission, EnumSet, boolean, > short, long, > [ERROR] ^ > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java:161: > error: reference not found > [ERROR]* {@link HdfsConstants#LEASE_HARDLIMIT_PERIOD hard limit}. Until > the > [ERROR] ^ > {noformat} > Full error log: > https://gist.github.com/aajisaka/7ab1c48a9bd7a0fdb11fa82eb04874d5 > How to reproduce the failure: > * Remove {{true}} from pom.xml > * Run {{mvn process-sources javadoc:javadoc-no-fork}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15524) Add edit log entry for Snapshot deletion GC thread snapshot deletion
[ https://issues.apache.org/jira/browse/HDFS-15524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15524: -- Hadoop Flags: Reviewed > Add edit log entry for Snapshot deletion GC thread snapshot deletion > > > Key: HDFS-15524 > URL: https://issues.apache.org/jira/browse/HDFS-15524 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots >Affects Versions: 3.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 3.4.0 > > > Currently, the snapshot deletion GC thread doesn't create an edit log transaction when the actual snapshot is garbage collected. As a result, if the GC thread deletes snapshots and the namenode is then restarted, the snapshots that were garbage collected before the restart will reappear until the GC thread picks them up again, because no edits were captured for the actual garbage collection. At the same time, the data might already have been deleted from the datanodes, which may lead to many spurious missing block alerts.
[jira] [Updated] (HDFS-15508) [JDK 11] Fix javadoc errors in hadoop-hdfs-rbf module
[ https://issues.apache.org/jira/browse/HDFS-15508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15508: -- Affects Version/s: 3.3.1 3.4.0 > [JDK 11] Fix javadoc errors in hadoop-hdfs-rbf module > - > > Key: HDFS-15508 > URL: https://issues.apache.org/jira/browse/HDFS-15508 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: newbie > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15508.01.patch > > > {noformat} > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/security/token/package-info.java:21: > error: reference not found > [ERROR] * Implementations should extend {@link > AbstractDelegationTokenSecretManager}. > [ERROR] ^ > {noformat} > Full error log: > https://gist.github.com/aajisaka/a7dde76a4ba2942f60bf6230ec9ed6e1 > How to reproduce the failure: > * Remove {{true}} from pom.xml > * Run {{mvn process-sources javadoc:javadoc-no-fork}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15508) [JDK 11] Fix javadoc errors in hadoop-hdfs-rbf module
[ https://issues.apache.org/jira/browse/HDFS-15508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15508: -- Hadoop Flags: Reviewed > [JDK 11] Fix javadoc errors in hadoop-hdfs-rbf module > - > > Key: HDFS-15508 > URL: https://issues.apache.org/jira/browse/HDFS-15508 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 3.3.1, 3.4.0 >Reporter: Akira Ajisaka >Assignee: Akira Ajisaka >Priority: Major > Labels: newbie > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15508.01.patch > > > {noformat} > [ERROR] > /Users/aajisaka/git/hadoop/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/security/token/package-info.java:21: > error: reference not found > [ERROR] * Implementations should extend {@link > AbstractDelegationTokenSecretManager}. > [ERROR] ^ > {noformat} > Full error log: > https://gist.github.com/aajisaka/a7dde76a4ba2942f60bf6230ec9ed6e1 > How to reproduce the failure: > * Remove {{true}} from pom.xml > * Run {{mvn process-sources javadoc:javadoc-no-fork}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15524) Add edit log entry for Snapshot deletion GC thread snapshot deletion
[ https://issues.apache.org/jira/browse/HDFS-15524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15524: -- Affects Version/s: 3.4.0 > Add edit log entry for Snapshot deletion GC thread snapshot deletion > > > Key: HDFS-15524 > URL: https://issues.apache.org/jira/browse/HDFS-15524 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots >Affects Versions: 3.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 3.4.0 > > > Currently, the snapshot deletion GC thread doesn't create an edit log transaction when the actual snapshot is garbage collected. As a result, if the GC thread deletes snapshots and the namenode is then restarted, the snapshots that were garbage collected before the restart will reappear until the GC thread picks them up again, because no edits were captured for the actual garbage collection. At the same time, the data might already have been deleted from the datanodes, which may lead to many spurious missing block alerts.
[jira] [Updated] (HDFS-15539) When disallowing snapshot on a dir, throw exception if its trash root is not empty
[ https://issues.apache.org/jira/browse/HDFS-15539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15539: -- Hadoop Flags: Reviewed > When disallowing snapshot on a dir, throw exception if its trash root is not > empty > -- > > Key: HDFS-15539 > URL: https://issues.apache.org/jira/browse/HDFS-15539 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Affects Versions: 3.4.0 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > When snapshot is disallowed on a dir, {{getTrashRoots()}} won't return the > trash root in that dir anymore (if any). The risk is the trash root will be > left there forever. > We need to throw an exception there and prompt the user to clean up or rename > the trash root if it is not empty. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15539) When disallowing snapshot on a dir, throw exception if its trash root is not empty
[ https://issues.apache.org/jira/browse/HDFS-15539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15539: -- Affects Version/s: 3.4.0 > When disallowing snapshot on a dir, throw exception if its trash root is not > empty > -- > > Key: HDFS-15539 > URL: https://issues.apache.org/jira/browse/HDFS-15539 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs >Affects Versions: 3.4.0 >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > When snapshot is disallowed on a dir, {{getTrashRoots()}} won't return the > trash root in that dir anymore (if any). The risk is the trash root will be > left there forever. > We need to throw an exception there and prompt the user to clean up or rename > the trash root if it is not empty. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
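The check proposed in HDFS-15539 can be sketched with a toy namespace. Everything here is an assumption made for illustration: the namespace is a plain dict from path to child names, the trash root is assumed to be a `.Trash` directory directly inside the snapshottable directory, and `disallow_snapshot` and `TrashRootNotEmptyError` are hypothetical names rather than HDFS APIs.

```python
class TrashRootNotEmptyError(Exception):
    """Raised when a snapshottable directory still has trash in it."""


def disallow_snapshot(listing, snapshottable, directory):
    # Assumption: the trash root lives at <directory>/.Trash.
    trash_root = directory.rstrip("/") + "/.Trash"
    if listing.get(trash_root):
        # Refuse, and tell the user what to do, so the trash root is not
        # left behind where trash cleanup would no longer find it.
        raise TrashRootNotEmptyError(
            f"{trash_root} is not empty; clean it up or rename it first"
        )
    snapshottable.discard(directory)


listing = {"/data/.Trash": ["Current"], "/logs/.Trash": []}
snapshottable = {"/data", "/logs"}

disallow_snapshot(listing, snapshottable, "/logs")  # empty trash: allowed
try:
    disallow_snapshot(listing, snapshottable, "/data")
    refused = False
except TrashRootNotEmptyError:
    refused = True
```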
[jira] [Updated] (HDFS-15542) Add identified snapshot corruption tests for ordered snapshot deletion
[ https://issues.apache.org/jira/browse/HDFS-15542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15542: -- Component/s: test > Add identified snapshot corruption tests for ordered snapshot deletion > -- > > Key: HDFS-15542 > URL: https://issues.apache.org/jira/browse/HDFS-15542 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: snapshots, test >Affects Versions: 3.4.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > HDFS-13101, HDFS-15012 and HDFS-15313 along with HDFS-15470 have fsimage > corruption sequences with snapshots . The idea here is to aggregate these > unit tests and enabled them for ordered snapshot deletion feature. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15540) Directories protected from delete can still be moved to the trash
[ https://issues.apache.org/jira/browse/HDFS-15540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan updated HDFS-15540: -- Hadoop Flags: Reviewed
> Directories protected from delete can still be moved to the trash
>
> Key: HDFS-15540
> URL: https://issues.apache.org/jira/browse/HDFS-15540
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.4.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Fix For: 3.3.1, 3.4.0
> Attachments: HDFS-15540.001.patch
>
> With HDFS-8983, HDFS-14802 and HDFS-15243 we are able to list protected directories which cannot be deleted or renamed, provided the following is set:
> fs.protected.directories:
> dfs.protected.subdirectories.enable: true
> Testing this feature out, I can see it mostly works fine, but protected non-empty folders can still be moved to the trash. In this example /dir/protected is set in fs.protected.directories, and dfs.protected.subdirectories.enable is true.
> {code}
> hadoop fs -ls -R /dir
> drwxr-xr-x - hdfs supergroup 0 2020-08-26 16:52 /dir/protected
> -rw-r--r-- 3 hdfs supergroup 174 2020-08-26 16:52 /dir/protected/file1
> drwxr-xr-x - hdfs supergroup 0 2020-08-26 16:52 /dir/protected/subdir1
> -rw-r--r-- 3 hdfs supergroup 174 2020-08-26 16:52 /dir/protected/subdir1/file1
> drwxr-xr-x - hdfs supergroup 0 2020-08-26 16:52 /dir/protected/subdir2
> -rw-r--r-- 3 hdfs supergroup 174 2020-08-26 16:52 /dir/protected/subdir2/file1
> [hdfs@7d67ed1af9b0 /]$ hadoop fs -rm -r -f -skipTrash /dir/protected/subdir1
> rm: Cannot delete/rename subdirectory under protected subdirectory /dir/protected
> [hdfs@7d67ed1af9b0 /]$ hadoop fs -mv /dir/protected/subdir1 /dir/protected/subdir1-moved
> mv: Cannot delete/rename subdirectory under protected subdirectory /dir/protected
> ** ALL GOOD SO FAR **
> [hdfs@7d67ed1af9b0 /]$ hadoop fs -rm -r -f /dir/protected/subdir1
> 2020-08-26 16:54:32,404 INFO fs.TrashPolicyDefault: Moved: 'hdfs://nn1/dir/protected/subdir1' to trash at: hdfs://nn1/user/hdfs/.Trash/Current/dir/protected/subdir1
> ** It moved the protected sub-dir to the trash, where it will be deleted **
> ** Checking the top level dir, it is the same **
> [hdfs@7d67ed1af9b0 /]$ hadoop fs -rm -r -f -skipTrash /dir/protected
> rm: Cannot delete/rename non-empty protected directory /dir/protected
> [hdfs@7d67ed1af9b0 /]$ hadoop fs -mv /dir/protected /dir/protected-new
> mv: Cannot delete/rename non-empty protected directory /dir/protected
> [hdfs@7d67ed1af9b0 /]$ hadoop fs -rm -r -f /dir/protected
> 2020-08-26 16:55:32,402 INFO fs.TrashPolicyDefault: Moved: 'hdfs://nn1/dir/protected' to trash at: hdfs://nn1/user/hdfs/.Trash/Current/dir/protected1598460932388
> {code}
> The reason for this, seems to be that "move to trash" uses a different rename method in FSNameSystem and FSDirRenameOp which avoids the DFSUtil.checkProtectedDescendants(...) in the earlier Jiras.
> I believe that "move to trash" should be protected in the same way as a -skipTrash delete.
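The request in HDFS-15540 amounts to routing the trash move through the same protection check as delete and rename. The sketch below is a simplified Python model of that idea, not the NameNode code: `check_protected`, `move_to_trash` and `ProtectedDirectoryError` are hypothetical names, and the subdirectory rule is reduced to a path-prefix test.

```python
class ProtectedDirectoryError(Exception):
    """Raised when an operation would remove protected content."""


def check_protected(src, protected_dirs, is_non_empty):
    # Simplified model of the protection rules shown in the log output.
    for protected in protected_dirs:
        if src == protected and is_non_empty(protected):
            raise ProtectedDirectoryError(
                f"Cannot delete/rename non-empty protected directory {protected}"
            )
        if src.startswith(protected.rstrip("/") + "/"):
            raise ProtectedDirectoryError(
                f"Cannot delete/rename subdirectory under protected subdirectory {protected}"
            )


def move_to_trash(src, trash_root, protected_dirs, is_non_empty):
    # Apply the same check a -skipTrash delete would hit before renaming
    # the path into the trash.
    check_protected(src, protected_dirs, is_non_empty)
    return f"{trash_root}/Current{src}"


protected_dirs = ["/dir/protected"]
allowed = move_to_trash("/tmp/x", "/user/hdfs/.Trash", protected_dirs, lambda p: True)
try:
    move_to_trash("/dir/protected/subdir1", "/user/hdfs/.Trash", protected_dirs, lambda p: True)
    blocked = False
except ProtectedDirectoryError:
    blocked = True
```

With the check shared, `hadoop fs -rm -r -f /dir/protected/subdir1` would fail the same way as the `-skipTrash` and `-mv` cases in the report.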
[jira] [Updated] (HDFS-15541) Disallow making a Snapshottable directory unsnapshottable if it has no empty snapshot trash directory
[ https://issues.apache.org/jira/browse/HDFS-15541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15541:
    Fix Version/s: (was: 3.4.0)

> Disallow making a Snapshottable directory unsnapshottable if it has no empty snapshot trash directory
>
> Key: HDFS-15541
> URL: https://issues.apache.org/jira/browse/HDFS-15541
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Siyao Meng
> Priority: Major
>
> If the snapshot trash is enabled, a snapshottable directory should not be allowed to be marked unsnapshottable if it has a non-empty snapshot trash directory.
[jira] [Updated] (HDFS-15542) Add identified snapshot corruption tests for ordered snapshot deletion
[ https://issues.apache.org/jira/browse/HDFS-15542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15542:
    Hadoop Flags: Reviewed

> Add identified snapshot corruption tests for ordered snapshot deletion
>
> Key: HDFS-15542
> URL: https://issues.apache.org/jira/browse/HDFS-15542
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: snapshots, test
> Affects Versions: 3.4.0
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> HDFS-13101, HDFS-15012 and HDFS-15313, along with HDFS-15470, have fsimage corruption sequences with snapshots. The idea here is to aggregate these unit tests and enable them for the ordered snapshot deletion feature.
[jira] [Updated] (HDFS-15545) (S)Webhdfs will not use updated delegation tokens available in the ugi after the old ones expire
[ https://issues.apache.org/jira/browse/HDFS-15545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15545:
    Component/s: webhdfs

> (S)Webhdfs will not use updated delegation tokens available in the ugi after the old ones expire
>
> Key: HDFS-15545
> URL: https://issues.apache.org/jira/browse/HDFS-15545
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: webhdfs
> Affects Versions: 3.4.0
> Reporter: Issac Buenrostro
> Assignee: Issac Buenrostro
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15545.001.patch, HDFS-15545.002.patch
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> WebHdfsFileSystem can select a delegation token to use from the current user UGI. The token selection is sticky, and WebHdfsFileSystem will re-use it every time without searching the UGI again.
> If the previous token expires, WebHdfsFileSystem will catch the exception and attempt to get a new token. However, the mechanism to get a new token bypasses searching for one on the UGI, so even if there is external logic that has retrieved a new token, it is not possible to make the FileSystem use the new, valid token, rendering the FileSystem object unusable.
> A simple fix would allow WebHdfsFileSystem to re-search the UGI and, if it finds a token different from the cached one, try to use it.
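
The fix proposed in the description can be illustrated with a small, self-contained Java sketch. It is only a model under stated assumptions, not the attached patch and not the WebHdfsFileSystem API: the class StickyTokenSelector, the credentialStore supplier (standing in for a search of the UGI) and the fetchNew supplier (standing in for requesting a fresh token from the server) are all hypothetical names, and tokens are modelled as plain strings.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical model of sticky token selection with a re-search on expiry. */
final class StickyTokenSelector {
  // Stand-in for searching the current user's credentials (the UGI).
  private final Supplier<Optional<String>> credentialStore;
  // Stand-in for asking the server for a brand-new token.
  private final Supplier<String> fetchNew;
  private String cached;

  StickyTokenSelector(Supplier<Optional<String>> credentialStore,
                      Supplier<String> fetchNew) {
    this.credentialStore = credentialStore;
    this.fetchNew = fetchNew;
  }

  /** Sticky selection: search once, then keep re-using the cached token. */
  String current() {
    if (cached == null) {
      cached = credentialStore.get().orElseGet(fetchNew);
    }
    return cached;
  }

  /**
   * Called after the cached token was rejected as expired.
   * First re-search the credential store; only if that yields nothing
   * new (absent, or the same token that just failed) fetch a new one.
   */
  String replaceExpired() {
    Optional<String> found = credentialStore.get();
    if (found.isPresent() && !found.get().equals(cached)) {
      cached = found.get();
    } else {
      cached = fetchNew.get();
    }
    return cached;
  }
}
```

With this shape, external logic that places a renewed token in the credential store is picked up on the next expiry, while the normal path stays sticky and avoids repeated searches.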
[jira] [Updated] (HDFS-15542) Add identified snapshot corruption tests for ordered snapshot deletion
[ https://issues.apache.org/jira/browse/HDFS-15542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15542:
    Affects Version/s: 3.4.0

> Add identified snapshot corruption tests for ordered snapshot deletion
>
> Key: HDFS-15542
> URL: https://issues.apache.org/jira/browse/HDFS-15542
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: snapshots
> Affects Versions: 3.4.0
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> HDFS-13101, HDFS-15012 and HDFS-15313, along with HDFS-15470, have fsimage corruption sequences with snapshots. The idea here is to aggregate these unit tests and enable them for the ordered snapshot deletion feature.
[jira] [Updated] (HDFS-15545) (S)Webhdfs will not use updated delegation tokens available in the ugi after the old ones expire
[ https://issues.apache.org/jira/browse/HDFS-15545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15545:
    Hadoop Flags: Reviewed
    Target Version/s: 3.3.2, 3.4.0 (was: 3.4.0, 3.3.2)

> (S)Webhdfs will not use updated delegation tokens available in the ugi after the old ones expire
>
> Key: HDFS-15545
> URL: https://issues.apache.org/jira/browse/HDFS-15545
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: webhdfs
> Affects Versions: 3.4.0
> Reporter: Issac Buenrostro
> Assignee: Issac Buenrostro
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15545.001.patch, HDFS-15545.002.patch
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> WebHdfsFileSystem can select a delegation token to use from the current user UGI. The token selection is sticky, and WebHdfsFileSystem will re-use it every time without searching the UGI again.
> If the previous token expires, WebHdfsFileSystem will catch the exception and attempt to get a new token. However, the mechanism to get a new token bypasses searching for one on the UGI, so even if there is external logic that has retrieved a new token, it is not possible to make the FileSystem use the new, valid token, rendering the FileSystem object unusable.
> A simple fix would allow WebHdfsFileSystem to re-search the UGI and, if it finds a token different from the cached one, try to use it.
[jira] [Updated] (HDFS-15545) (S)Webhdfs will not use updated delegation tokens available in the ugi after the old ones expire
[ https://issues.apache.org/jira/browse/HDFS-15545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-15545:
    Affects Version/s: 3.4.0

> (S)Webhdfs will not use updated delegation tokens available in the ugi after the old ones expire
>
> Key: HDFS-15545
> URL: https://issues.apache.org/jira/browse/HDFS-15545
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.4.0
> Reporter: Issac Buenrostro
> Assignee: Issac Buenrostro
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15545.001.patch, HDFS-15545.002.patch
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> WebHdfsFileSystem can select a delegation token to use from the current user UGI. The token selection is sticky, and WebHdfsFileSystem will re-use it every time without searching the UGI again.
> If the previous token expires, WebHdfsFileSystem will catch the exception and attempt to get a new token. However, the mechanism to get a new token bypasses searching for one on the UGI, so even if there is external logic that has retrieved a new token, it is not possible to make the FileSystem use the new, valid token, rendering the FileSystem object unusable.
> A simple fix would allow WebHdfsFileSystem to re-search the UGI and, if it finds a token different from the cached one, try to use it.