[jira] [Updated] (HDFS-17151) EC: Fix wrong metadata in BlockInfoStriped after recovery

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17151:
--
  Component/s: erasure-coding
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> EC: Fix wrong metadata in BlockInfoStriped after recovery
> -
>
> Key: HDFS-17151
> URL: https://issues.apache.org/jira/browse/HDFS-17151
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.4.0
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> When the datanode completes a block recovery, it calls the 
> `commitBlockSynchronization` method to notify the NN of the new locations of 
> the block. For an EC block group, the NN determines the index of each internal 
> block based on the position of the DatanodeID in the parameter `newtargets`.
> If the internal blocks written by the client don't have contiguous indices, 
> the current datanode code may cause the NN to record incorrect block metadata. 
> For simplicity, let's take RS(3,2) as an example. The timeline of the 
> problem is as follows:
> 1. The client plans to write internal blocks with indices [0,1,2,3,4] to 
> datanodes [dn0, dn1, dn2, dn3, dn4] respectively. But dn1 is unreachable, so 
> the client only writes data to the remaining 4 datanodes;
> 2. The client crashes;
> 3. The NN fails over;
> 4. Now the content of `uc.getExpectedStorageLocations()` depends entirely 
> on block reports, and it is [dn0, null, dn2, dn3, dn4];
> 5. When the lease exceeds the hard limit, the NN issues a block recovery 
> command;
> 6. The datanode that receives the recovery command fills `DatanodeID[] 
> newLocs` with [dn0, null, dn2, dn3, dn4];
> 7. The serialization process filters out null values, so the parameters 
> passed to the NN become [dn0, dn2, dn3, dn4];
> 8. The NN mistakenly believes that dn2 stores the internal block with index 
> 1, dn3 stores the internal block with index 2, and so on.
> The above timeline is just one example; other situations, such as a pipeline 
> update on the client side, can lead to the same error. We should fix this bug.
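> A self-contained sketch of the mismatch (plain strings stand in for 
> DatanodeID; illustrative only):
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
>
> public class IndexMismatchDemo {
>   public static void main(String[] args) {
>     // Expected locations from block reports; array index i is the EC
>     // internal-block index (dn1 never received its block).
>     String[] newLocs = {"dn0", null, "dn2", "dn3", "dn4"};
>
>     // Serialization filters out nulls, producing a dense list.
>     List<String> wire = new ArrayList<>();
>     for (String d : newLocs) {
>       if (d != null) {
>         wire.add(d);
>       }
>     }
>     // Position no longer equals internal-block index: the NN would record
>     // dn2 as index 1, dn3 as index 2, and so on.
>     System.out.println(wire);  // [dn0, dn2, dn3, dn4]
>   }
> }
> {code}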



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17150) EC: Fix the bug of failed lease recovery.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17150:
--
  Component/s: erasure-coding
 Target Version/s: 3.4.0, 3.3.9
Affects Version/s: 3.4.0, 3.3.9

> EC: Fix the bug of failed lease recovery.
> -
>
> Key: HDFS-17150
> URL: https://issues.apache.org/jira/browse/HDFS-17150
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.4.0, 3.3.9
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> If the client crashes without writing the minimum number of internal blocks 
> required by the EC policy, the lease recovery process for the corresponding 
> unclosed file may fail repeatedly. Taking the RS(6,3) policy as an example, 
> the timeline is as follows:
> 1. The client writes some data to only 5 datanodes;
> 2. The client crashes;
> 3. The NN fails over;
> 4. Now the result of `uc.getNumExpectedLocations()` depends entirely on 
> block reports, and there are 5 datanodes reporting internal blocks;
> 5. When the lease exceeds the hard limit, the NN issues a block recovery 
> command;
> 6. The datanode checks the command and finds that the number of internal 
> blocks is insufficient, resulting in an error and recovery failure;
> 7. The lease exceeds the hard limit again, the NN issues another block 
> recovery command, and the recovery fails again, indefinitely.
> When the number of internal blocks written by the client is less than 6, the 
> block group is actually unrecoverable. We should treat this situation like 
> the case of a replicated file whose replica count is 0, i.e., directly 
> remove the last block group and close the file, as sketched below.
>  
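> A minimal sketch of the proposed handling (method names are illustrative 
> assumptions, not the committed patch):
> {code:java}
> // During lease recovery of a striped last block: if fewer internal blocks
> // were written than the policy's data-unit count, the group can never be
> // recovered, so handle it like a block whose replica count is 0.
> BlockInfoStriped last = (BlockInfoStriped) file.getLastBlock();
> int written = countNodes(last).liveReplicas();
> if (written < last.getRealDataBlockNum()) {
>   file.removeLastBlock(last);  // drop the unrecoverable block group
>   closeFile(src, file);        // assumed helper: close the file directly
> }
> {code}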






[jira] [Updated] (HDFS-17140) Revisit the BPOfferService.reportBadBlocks() method.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17140:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Revisit the BPOfferService.reportBadBlocks() method.
> 
>
> Key: HDFS-17140
> URL: https://issues.apache.org/jira/browse/HDFS-17140
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Liangjun He
>Assignee: Liangjun He
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The current BPOfferService.reportBadBlocks() method can be optimized by 
> moving the creation of the rbbAction object outside the loop.
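> A sketch of the hoisting (assuming the method's current shape; the 
> signature is abbreviated):
> {code:java}
> void reportBadBlocks(ExtendedBlock block,
>     String storageUuid, StorageType storageType) {
>   checkBlock(block);
>   // Create the action once instead of once per actor iteration.
>   ReportBadBlockAction rbbAction =
>       new ReportBadBlockAction(block, storageUuid, storageType);
>   for (BPServiceActor actor : bpServices) {
>     actor.bpThreadEnqueue(rbbAction);
>   }
> }
> {code}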






[jira] [Updated] (HDFS-17148) RBF: SQLDelegationTokenSecretManager must cleanup expired tokens in SQL

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17148:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> RBF: SQLDelegationTokenSecretManager must cleanup expired tokens in SQL
> ---
>
> Key: HDFS-17148
> URL: https://issues.apache.org/jira/browse/HDFS-17148
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Hector Sandoval Chaverri
>Assignee: Hector Sandoval Chaverri
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The SQLDelegationTokenSecretManager fetches tokens from SQL and stores them 
> temporarily in a memory cache with a short TTL. The ExpiredTokenRemover in 
> AbstractDelegationTokenSecretManager runs periodically to clean up any 
> expired tokens from the cache, but by then most tokens have already been 
> evicted automatically per the TTL configuration. This leaves many expired 
> tokens in the SQL database that are never cleaned up.
> The SQLDelegationTokenSecretManager should find expired tokens in SQL 
> instead of in the memory cache when running the periodic cleanup, e.g. as 
> sketched below.
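> A minimal sketch of the idea (table and column names are illustrative 
> assumptions about the SQL schema):
> {code:java}
> // Delete expired tokens directly in SQL rather than iterating the
> // (mostly already-evicted) in-memory cache.
> String sql = "DELETE FROM Tokens WHERE renewDate < ?";
> try (Connection conn = getConnection();
>      PreparedStatement ps = conn.prepareStatement(sql)) {
>   ps.setLong(1, Time.now());  // tokens whose renew date has passed
>   int removed = ps.executeUpdate();
>   LOG.info("Removed {} expired tokens from SQL", removed);
> }
> {code}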






[jira] [Updated] (HDFS-17145) Fix description of property dfs.namenode.file.close.num-committed-allowed.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17145:
--
Hadoop Flags: Reviewed

> Fix description of property dfs.namenode.file.close.num-committed-allowed.
> --
>
> Key: HDFS-17145
> URL: https://issues.apache.org/jira/browse/HDFS-17145
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Normally a file can only be closed when all of its blocks are complete.
> But in hdfs-default.xml the description says committed. We should fix it.
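> The corrected entry would read roughly as follows (a sketch; the wording 
> around the fixed term is paraphrased):
> {code:xml}
> <property>
>   <name>dfs.namenode.file.close.num-committed-allowed</name>
>   <value>0</value>
>   <description>
>     Normally a file can only be closed when all its blocks are complete.
>     When this value is set to a positive integer N, a file can be closed
>     if N blocks are committed and the rest are complete.
>   </description>
> </property>
> {code}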






[jira] [Updated] (HDFS-17144) Remove incorrect comment in method storeAllocatedBlock

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17144:
--
Hadoop Flags: Reviewed

> Remove incorrect comment in method storeAllocatedBlock
> --
>
> Key: HDFS-17144
> URL: https://issues.apache.org/jira/browse/HDFS-17144
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> As title described, remove incorrect comment in method storeAllocatedBlock.






[jira] [Updated] (HDFS-17139) RBF: For the doc of the class RouterAdminProtocolTranslatorPB, it describes the function of the class ClientNamenodeProtocolTranslatorPB

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17139:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> RBF: For the doc of the class RouterAdminProtocolTranslatorPB, it describes 
> the function of the class ClientNamenodeProtocolTranslatorPB
> 
>
> Key: HDFS-17139
> URL: https://issues.apache.org/jira/browse/HDFS-17139
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-17139.001.patch, HDFS-17139.patch
>
>
> The doc of the class RouterAdminProtocolTranslatorPB mistakenly describes 
> the function of the class ClientNamenodeProtocolTranslatorPB.






[jira] [Updated] (HDFS-17137) Standby/Observer NameNode skip to handle redundant replica block logic when set decrease replication.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17137:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Standby/Observer NameNode skip to handle redundant replica block logic when 
> set decrease replication. 
> --
>
> Key: HDFS-17137
> URL: https://issues.apache.org/jira/browse/HDFS-17137
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Standby/Observer NameNodes should not run the redundant-replica handling 
> logic when replication is decreased.
> At present, when setReplication is called to decrease replication:
> * The Active NameNode calls BlockManager#processExtraRedundancyBlock to 
> select the dn holding the redundant replica, adds it to the 
> excessRedundancyMap, and adds the block to invalidateBlocks (the 
> RedundancyMonitor is then scheduled to delete the block on that dn).
> * The Standby or Observer NameNode then loads the editlog and applies the 
> SetReplicationOp. If the dn of the replica to be deleted has not yet sent an 
> incremental block report, BlockManager#processExtraRedundancyBlock is also 
> called here to select a redundant-replica dn and add it to the 
> excessRedundancyMap (the dn selected here may be inconsistent with the dn 
> selected on the Active NameNode).
> A dn left in the excessRedundancyMap may affect decommissioning, so that a 
> dn decommission cannot complete on the Standby/Observer NameNode.
> A specific case is as follows:
> A file has 3 replicas (d1, d2, d3) and setReplication sets the file to 2 
> replicas.
> * The Active NameNode selects the redundant replica on d1 and adds it to 
> the excessRedundancyMap and invalidateBlocks.
> * The Standby NameNode replays the SetReplicationOp (at this time d1 has 
> not yet sent an incremental block report), so it may select a 
> redundant-replica dn inconsistent with the Active NameNode, e.g. it adds d2 
> to its excessRedundancyMap.
> * d1 then deletes the block and sends an incremental block report.
> * The DN list for this block on the Active NameNode becomes d2 and d3 (d1 
> is deleted from the excessRedundancyMap while processing the incremental 
> block report).
> * The DN list for this block on the Standby NameNode is also d2 and d3, but 
> d2 cannot be deleted from the excessRedundancyMap while processing the 
> incremental block report.
> Now the decommission operation is executed on d3:
> * The Active NameNode selects a new node d4 to copy the replica, and d4 
> sends an incremental block report.
> * The DN list for this block on the Active NameNode is d2, d3 
> (decommissioning status) and d4, so d3 can be decommissioned normally.
> * The DN list for this block on the Standby NameNode is d3 (decommissioning 
> status), d2 (redundant status) and d4; since the requirement of two live 
> replicas is not met, d3 cannot be decommissioned.
> Therefore, the Standby or Observer NameNode should not process redundant 
> replicas when applying setReplication. A sketch of the proposed guard 
> follows.
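> A minimal sketch of the proposed guard (the HA-state check is an 
> illustrative assumption; the committed patch may differ):
> {code:java}
> // In BlockManager's setReplication handling: only the Active NameNode
> // should pick excess replicas. Standby/Observer NameNodes learn the final
> // replica set from incremental block reports instead.
> // isActiveState() stands in for the real HA-state check.
> if (oldRepl > newRepl && isActiveState()) {
>   processExtraRedundancyBlock(b, newRepl, null, null);
> }
> {code}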






[jira] [Updated] (HDFS-17138) RBF: We changed the hadoop.security.auth_to_local configuration of one router, the other routers stopped working

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17138:
--
  Component/s: rbf
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> RBF: We changed the hadoop.security.auth_to_local configuration of one 
> router, the other routers stopped working
> 
>
> Key: HDFS-17138
> URL: https://issues.apache.org/jira/browse/HDFS-17138
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
> Environment: hadoop 3.3.0
>Reporter: Xiping Zhang
>Assignee: Xiping Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2023-08-02-16-20-34-454.png, 
> image-2023-08-03-10-32-03-457.png
>
>
> Error log from the other routers:
> !image-2023-08-02-16-20-34-454.png!






[jira] [Updated] (HDFS-17136) Fix annotation description and typo in BlockPlacementPolicyDefault Class

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17136:
--
Affects Version/s: 3.4.0

> Fix annotation description and typo in BlockPlacementPolicyDefault Class
> 
>
> Key: HDFS-17136
> URL: https://issues.apache.org/jira/browse/HDFS-17136
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: huangzhaobo
>Assignee: huangzhaobo
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Fix annotation description and typo in BlockPlacementPolicyDefault Class






[jira] [Updated] (HDFS-17136) Fix annotation description and typo in BlockPlacementPolicyDefault Class

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17136:
--
Hadoop Flags: Reviewed

> Fix annotation description and typo in BlockPlacementPolicyDefault Class
> 
>
> Key: HDFS-17136
> URL: https://issues.apache.org/jira/browse/HDFS-17136
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: block placement
>Affects Versions: 3.4.0
>Reporter: huangzhaobo
>Assignee: huangzhaobo
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Fix annotation description and typo in BlockPlacementPolicyDefault Class






[jira] [Updated] (HDFS-17136) Fix annotation description and typo in BlockPlacementPolicyDefault Class

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17136:
--
Component/s: block placement

> Fix annotation description and typo in BlockPlacementPolicyDefault Class
> 
>
> Key: HDFS-17136
> URL: https://issues.apache.org/jira/browse/HDFS-17136
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: block placement
>Affects Versions: 3.4.0
>Reporter: huangzhaobo
>Assignee: huangzhaobo
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Fix annotation description and typo in BlockPlacementPolicyDefault Class






[jira] [Updated] (HDFS-17136) Fix annotation description and typo in BlockPlacementPolicyDefault Class

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17136:
--
Target Version/s: 3.4.0

> Fix annotation description and typo in BlockPlacementPolicyDefault Class
> 
>
> Key: HDFS-17136
> URL: https://issues.apache.org/jira/browse/HDFS-17136
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: huangzhaobo
>Assignee: huangzhaobo
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Fix annotation description and typo in BlockPlacementPolicyDefault Class






[jira] [Updated] (HDFS-17135) Update fsck -blockId to display excess state info of blocks

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17135:
--
Component/s: namenode

> Update fsck -blockId  to display excess state info of blocks
> 
>
> Key: HDFS-17135
> URL: https://issues.apache.org/jira/browse/HDFS-17135
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Update fsck -blockId to display excess state info of blocks 






[jira] [Updated] (HDFS-17133) TestFsDatasetImpl missing null check when cleaning up

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17133:
--
  Component/s: test
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> TestFsDatasetImpl missing null check when cleaning up
> -
>
> Key: HDFS-17133
> URL: https://issues.apache.org/jira/browse/HDFS-17133
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0
>Reporter: ConfX
>Assignee: ConfX
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: reproduce.sh
>
>
> h2. What happened
> I set {{dfs.namenode.quota.init-threads=1468568631}} and then the test 
> {{org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl#testMoveBlockFailure}} 
> fails with a null pointer.
> h2. Where's the problem
> In the clean up part of the test:
> {noformat}
>     } finally {
>       if (cluster.isClusterUp()) {
>         cluster.shutdown();
>       }
>     }{noformat}
> if {{cluster}} is null, the test fails directly with a NullPointerException, 
> potentially hiding the actual failure. A sketch of the missing guard follows 
> below.
> h2. How to reproduce
>  # set {{dfs.namenode.quota.init-threads}}={{1468568631}}
>  # run 
> {{org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl#testMoveBlockFailure}}
> you should observe
> {noformat}
> java.lang.NullPointerException
>     at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl.testMoveBlockFailure(TestFsDatasetImpl.java:1005){noformat}
> For an easy reproduction, run the reproduce.sh in the attachment.
> We are happy to provide a patch if this issue is confirmed.
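> A minimal sketch of the missing null check (based on the snippet above):
> {code:java}
>     } finally {
>       // cluster may never have been assigned if MiniDFSCluster startup
>       // threw, so check for null before querying its state.
>       if (cluster != null && cluster.isClusterUp()) {
>         cluster.shutdown();
>       }
>     }
> {code}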






[jira] [Updated] (HDFS-17135) Update fsck -blockId to display excess state info of blocks

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17135:
--
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Update fsck -blockId  to display excess state info of blocks
> 
>
> Key: HDFS-17135
> URL: https://issues.apache.org/jira/browse/HDFS-17135
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Update fsck -blockId to display excess state info of blocks 






[jira] [Updated] (HDFS-17134) RBF: Fix duplicate results of getListing through Router.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17134:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> RBF: Fix duplicate results of getListing through Router.
> 
>
> Key: HDFS-17134
> URL: https://issues.apache.org/jira/browse/HDFS-17134
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The results of `getListing` in the NameNode are sorted based on `byte[]`, 
> while the Router side sorts based on `String`. If there are special 
> characters in a path, the sorting result of the router will be inconsistent 
> with the namenode's. This may cause the client to receive duplicate 
> `getListing` results due to a wrong `startAfter` parameter.
> For example, the namenode returns [path1, path2, path3] for a `getListing` 
> request, while the router returns [path1, path3, path2] to the client. The 
> client will then pass `path2` as `startAfter` in the next iteration, so it 
> will receive `path3` again.
> We need to fix the Router code so that the order of its results matches the 
> NameNode's. The sketch below shows how the two orderings can diverge.
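> A self-contained sketch of the divergence: Java `String` ordering compares 
> UTF-16 code units, while unsigned `byte[]` ordering compares UTF-8 bytes, 
> and the two disagree for supplementary characters:
> {code:java}
> import java.nio.charset.StandardCharsets;
>
> public class OrderingDemo {
>   // Unsigned lexicographic byte comparison, as the NameNode sorts listings.
>   static int compareBytes(byte[] a, byte[] b) {
>     int n = Math.min(a.length, b.length);
>     for (int i = 0; i < n; i++) {
>       int d = (a[i] & 0xff) - (b[i] & 0xff);
>       if (d != 0) return d;
>     }
>     return a.length - b.length;
>   }
>
>   public static void main(String[] args) {
>     String p1 = "\uFFFD";        // U+FFFD, one code unit 0xFFFD
>     String p2 = "\uD800\uDC00";  // U+10000, surrogates starting 0xD800
>     // String order (Router): p1 > p2, because 0xFFFD > 0xD800.
>     System.out.println(p1.compareTo(p2) > 0);   // true
>     // Byte order (NameNode): p1 (EF BF BD) < p2 (F0 90 80 80).
>     System.out.println(compareBytes(
>         p1.getBytes(StandardCharsets.UTF_8),
>         p2.getBytes(StandardCharsets.UTF_8)) < 0);  // true
>   }
> }
> {code}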






[jira] [Updated] (HDFS-17122) Rectify the table length discrepancy in the DataNode UI.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17122:
--
 Component/s: ui
Target Version/s: 3.4.0

> Rectify the table length discrepancy in the DataNode UI.
> 
>
> Key: HDFS-17122
> URL: https://issues.apache.org/jira/browse/HDFS-17122
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 3.4.0
>Reporter: Hualong Zhang
>Assignee: Hualong Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2023-07-25-18-12-10-582.png
>
>
> The hidden column settings in *table-datanodes.dataTable* have caused an 
> error in the calculation of table length in *dataTable*.
> !image-2023-07-25-18-12-10-582.png|width=798,height=318!






[jira] [Updated] (HDFS-17119) RBF: Logger fix for StateStoreMySQLImpl

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17119:
--
  Component/s: rbf
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> RBF: Logger fix for StateStoreMySQLImpl
> ---
>
> Key: HDFS-17119
> URL: https://issues.apache.org/jira/browse/HDFS-17119
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Zhaohui Wang
>Assignee: Zhaohui Wang
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>







[jira] [Updated] (HDFS-17117) Print reconstructionQueuesInitProgress periodically when BlockManager processMisReplicatesAsync.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17117:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Print reconstructionQueuesInitProgress periodically when BlockManager 
> processMisReplicatesAsync.
> 
>
> Key: HDFS-17117
> URL: https://issues.apache.org/jira/browse/HDFS-17117
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> BlockManager#processMisReplicatesAsync should periodically print 
> reconstructionQueuesInitProgress, so that the admin can track the progress 
> of the replication queues initialization.
> Plan: add the corresponding logs and metrics.






[jira] [Updated] (HDFS-17116) RBF: Update invoke millisecond time as monotonicNow() in RouterSafemodeService.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17116:
--
  Component/s: rbf
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> RBF: Update invoke millisecond time as monotonicNow() in 
> RouterSafemodeService.
> ---
>
> Key: HDFS-17116
> URL: https://issues.apache.org/jira/browse/HDFS-17116
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The following exception occurred in our online environment:
> # After the machine restarted, the system time was abnormal: it was a time 
> in the future.
> # After starting the router, the log showed "Delaying safemode exit for 
> 24981702 milliseconds...", and the router stayed in the safemode state.
> This is mainly because the startupTime was recorded using the future system 
> time when the router started; the system time soon returned to normal, 
> resulting in a negative delta. At that point the service could only be 
> restored by restarting the router.
> The relevant logs are:
> {code:java}
> 2023-07-15 03:15:49,276 INFO  ipc.Server xxx
> 2023-07-15 11:21:03,785 INFO  router.DFSRouter (LogAdapter.java:info(51)) 
> [main] - STARTUP_MSG:
> /
> STARTUP_MSG: Starting Router
> ...
> 2023-07-15 11:21:51,325 INFO xxx
> 2023-07-15 03:22:00,257 INFO xxx
> 2023-07-15 03:22:29,829 INFO router.RouterSafemodeService 
> (RouterSafemodeService.java:periodicInvoke(167)) [RouterSafemodeService-0] - 
> Delaying safemode exit for 28761777 milliseconds...
> {code}
> We can handle this case at the code level by resetting startupTime and 
> enterSafeModeTime when the delta is negative, ensuring the router can exit 
> the safemode state normally once the system time returns to normal. A 
> sketch follows.
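> A minimal sketch of the idea (illustrative; Time.monotonicNow() is the 
> existing Hadoop utility the title refers to):
> {code:java}
> // Use a monotonic clock so wall-clock jumps cannot produce a negative
> // delta; additionally, reset the recorded timestamp defensively.
> long now = Time.monotonicNow();
> long delta = now - enterSafeModeTime;
> if (delta < 0) {
>   enterSafeModeTime = now;  // timestamp was in the future; reset it
>   delta = 0;
> }
> {code}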






[jira] [Updated] (HDFS-17111) RBF: Optimize msync to only call nameservices that have observer reads enabled.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17111:
--
  Component/s: rbf
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> RBF: Optimize msync to only call nameservices that have observer reads 
> enabled.
> ---
>
> Key: HDFS-17111
> URL: https://issues.apache.org/jira/browse/HDFS-17111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Right now when a client MSYNCs to the router, the call is fanned out to all 
> nameservices. We only need to proxy the msync to nameservices that have 
> observer reads configured.
> We can do this either by adding a new config for the admin to specify which 
> nameservices have CRS configured, or we can try to automatically detect these.






[jira] [Updated] (HDFS-17112) Show decommission duration in JMX and HTML

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17112:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Show decommission duration in JMX and HTML
> --
>
> Key: HDFS-17112
> URL: https://issues.apache.org/jira/browse/HDFS-17112
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Expose the decommission duration in the JMX page. It is very useful 
> information when decommissioning a batch of datanodes in a cluster.






[jira] [Updated] (HDFS-17115) HttpFS Add Support getErasureCodeCodecs API

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17115:
--
Target Version/s: 3.4.0

> HttpFS Add Support getErasureCodeCodecs API
> ---
>
> Key: HDFS-17115
> URL: https://issues.apache.org/jira/browse/HDFS-17115
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Affects Versions: 3.4.0
>Reporter: Hualong Zhang
>Assignee: Hualong Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> We should ensure that *HttpFS* remains synchronized with *WebHDFS*, as 
> the latter has already implemented the *getErasureCodeCodecs* interface.






[jira] [Updated] (HDFS-17093) Fix block report lease issue to avoid missing some storages report.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17093:
--
Target Version/s: 3.4.0

> Fix block report lease issue to avoid missing some storages report.
> ---
>
> Key: HDFS-17093
> URL: https://issues.apache.org/jira/browse/HDFS-17093
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.4
>Reporter: Yanlei Yu
>Assignee: Yanlei Yu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> In our cluster of 800+ nodes, after restarting the namenode, we found that 
> some datanodes did not report enough blocks, causing the namenode to stay 
> in safe mode for a long time after restarting because of the incomplete 
> block reports.
> In the logs of a datanode with incomplete block reporting, I found that the 
> first FBR attempt failed, possibly due to namenode stress, and then a 
> second FBR attempt was made, as follows:
> {code:java}
> 
> 2023-07-17 11:29:28,982 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Unsuccessfully sent block report 0x6237a52c1e817e,  containing 12 storage 
> report(s), of which we sent 1. The reports had 1099057 total blocks and used 
> 1 RPC(s). This took 294 msec to generate and 101721 msecs for RPC and NN 
> processing. Got back no commands.
> 2023-07-17 11:37:04,014 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Successfully sent block report 0x62382416f3f055,  containing 12 storage 
> report(s), of which we sent 12. The reports had 1099048 total blocks and used 
> 12 RPC(s). This took 295 msec to generate and 11647 msecs for RPC and NN 
> processing. Got back no commands. {code}
> There is nothing wrong with that: the datanode retries the send if it 
> fails. But on the namenode side, the logic is:
> {code:java}
> if (namesystem.isInStartupSafeMode()
>     && !StorageType.PROVIDED.equals(storageInfo.getStorageType())
>     && storageInfo.getBlockReportCount() > 0) {
>   blockLog.info("BLOCK* processReport 0x{} with lease ID 0x{}: "
>       + "discarded non-initial block report from {}"
>       + " because namenode still in startup phase",
>       strBlockReportId, fullBrLeaseId, nodeID);
>   blockReportLeaseManager.removeLease(node);
>   return !node.hasStaleStorages();
> } {code}
> When a storage is identified as having already reported (i.e. 
> storageInfo.getBlockReportCount() > 0), the lease is removed from the 
> datanode, causing the second report to fail because it no longer holds a 
> lease. One possible direction is sketched below.
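> An illustrative sketch only (not necessarily the committed patch): keep the 
> lease while any storage of the datanode has never completed a report, so a 
> retried FBR is not rejected for lack of a lease.
> {code:java}
> boolean allStoragesReported = true;
> for (DatanodeStorageInfo s : node.getStorageInfos()) {
>   if (s.getBlockReportCount() == 0) {
>     allStoragesReported = false;  // this storage never finished an FBR
>     break;
>   }
> }
> if (allStoragesReported) {
>   // Safe to drop the lease: every storage has reported at least once.
>   blockReportLeaseManager.removeLease(node);
> }
> {code}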






[jira] [Updated] (HDFS-17105) mistakenly purge editLogs even after it is empty in NNStorageRetentionManager

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17105:
--
  Component/s: namenode
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

>  mistakenly purge editLogs even after it is empty in NNStorageRetentionManager
> --
>
> Key: HDFS-17105
> URL: https://issues.apache.org/jira/browse/HDFS-17105
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: ConfX
>Assignee: ConfX
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: reproduce.sh
>
>
> h2. What happened:
> Got {{IndexOutOfBoundsException}} after setting 
> {{dfs.namenode.max.extra.edits.segments.retained}} to a negative value and 
> purging old records with {{NNStorageRetentionManager}}.
> h2. Where's the bug:
> In line 156 of {{NNStorageRetentionManager}}, the manager trims 
> {{editLogs}} until it is under the {{maxExtraEditsSegmentsToRetain}}:
> {noformat}
> while (editLogs.size() > maxExtraEditsSegmentsToRetain) {
>       purgeLogsFrom = editLogs.get(0).getLastTxId() + 1;
>       editLogs.remove(0);
> }{noformat}
> However, if {{dfs.namenode.max.extra.edits.segments.retained}} is set below 
> 0, the size of {{editLogs}} can never drop below the threshold, so the loop 
> keeps removing entries until {{editLogs.size()}} is 0, at which point 
> {{editLogs.get(0)}} is out of range.
> h2. How to reproduce:
> (1) Set {{dfs.namenode.max.extra.edits.segments.retained}} to -1974676133
> (2) Run test: 
> {{org.apache.hadoop.hdfs.server.namenode.TestNNStorageRetentionManager#testNoLogs}}
> h2. Stacktrace:
> {noformat}
> java.lang.IndexOutOfBoundsException: Index 0 out of bounds for length 0
>     at 
> java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
>     at 
> java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
>     at 
> java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:248)
>     at java.base/java.util.Objects.checkIndex(Objects.java:372)
>     at java.base/java.util.ArrayList.get(ArrayList.java:459)
>     at 
> org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:157)
>     at 
> org.apache.hadoop.hdfs.server.namenode.TestNNStorageRetentionManager.runTest(TestNNStorageRetentionManager.java:299)
>     at 
> org.apache.hadoop.hdfs.server.namenode.TestNNStorageRetentionManager.testNoLogs(TestNNStorageRetentionManager.java:143){noformat}
> For an easy reproduction, run the reproduce.sh in the attachment.
> We are happy to provide a patch if this issue is confirmed.
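> An illustrative guard (assuming clamping the configured value is an 
> acceptable fix):
> {code:java}
> // Treat a negative configuration value as 0 so the trim loop cannot
> // run past the end of the list.
> int retained = Math.max(0, maxExtraEditsSegmentsToRetain);
> while (editLogs.size() > retained) {
>   purgeLogsFrom = editLogs.get(0).getLastTxId() + 1;
>   editLogs.remove(0);
> }
> {code}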






[jira] [Updated] (HDFS-17094) EC: Fix bug in block recovery when there are stale datanodes

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17094:
--
  Component/s: erasure-coding
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> EC: Fix bug in block recovery when there are stale datanodes
> 
>
> Key: HDFS-17094
> URL: https://issues.apache.org/jira/browse/HDFS-17094
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.4.0
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> When a block recovery occurs, `RecoveryTaskStriped` in datanode expects 
> `rBlock.getLocations()` and `rBlock. getBlockIndices()` to be in one-to-one 
> correspondence. However, if there are locations in stale state when NameNode 
> handles heartbeat, this correspondence will be disrupted. In detail, there is 
> no stale location in `recoveryLocations`, but the block indices array is 
> still complete (i.e. contains the indices of all the locations). This will 
> cause `BlockRecoveryWorker.RecoveryTaskStriped#recover` to generate a wrong 
> internal block ID, and the corresponding datanode cannot find the replica, 
> thus making the recovery process fail. This bug needs to be fixed.






[jira] [Updated] (HDFS-17088) Improve the debug verifyEC and dfsrouteradmin commands in HDFSCommands.md

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17088:
--
  Component/s: dfsadmin
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Improve the debug verifyEC and dfsrouteradmin commands in HDFSCommands.md
> -
>
> Key: HDFS-17088
> URL: https://issues.apache.org/jira/browse/HDFS-17088
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsadmin
>Affects Versions: 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Improve the debug verifyEC and dfsrouteradmin commands in HDFSCommands.md:
> * add descriptions for the new debug verifyEC params, such as [-blockId ] 
> [-skipFailureBlocks];
> * document the newly added dfsrouteradmin -addAll command.
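> The documented usage would look roughly like this (a sketch; the exact 
> argument names follow the command's help output and may differ):
> {noformat}
> hdfs debug verifyEC -file <file> [-blockId <blockId>] [-skipFailureBlocks]
> {noformat}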






[jira] [Updated] (HDFS-17087) Add Throttler for datanode reading block

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17087:
--
  Component/s: datanode
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Add Throttler for datanode reading block 
> -
>
> Key: HDFS-17087
> URL: https://issues.apache.org/jira/browse/HDFS-17087
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> DataXceiver#readBlock code is:
> {code:java}
> ...
>   read = blockSender.sendBlock(out, baseStream, null); // send data
> ..
> {code}
> Currently DataXceiver#readBlock has no throttler; we could support 
> throttling block reads to cater for specific scenarios.
> The default read throttler value is still null, i.e. unthrottled. A sketch 
> follows.
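> A minimal sketch of the idea (the getter is an illustrative assumption; 
> DataTransferThrottler and the sendBlock signature are existing HDFS code):
> {code:java}
> // DataXceiver#readBlock: pass a throttler instead of null. A null
> // throttler preserves the current unthrottled behaviour as the default.
> DataTransferThrottler throttler = datanode.getReadThrottler(); // assumed getter
> read = blockSender.sendBlock(out, baseStream, throttler);
> {code}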






[jira] [Updated] (HDFS-17086) Fix the parameter settings in TestDiskspaceQuotaUpdate#updateCountForQuota.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17086:
--
  Component/s: test
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Fix the parameter settings in TestDiskspaceQuotaUpdate#updateCountForQuota.
> ---
>
> Key: HDFS-17086
> URL: https://issues.apache.org/jira/browse/HDFS-17086
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>







[jira] [Updated] (HDFS-17082) Add documentation for provisionSnapshotTrash command to HDFSCommands.md and HdfsSnapshots.md

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17082:
--
  Component/s: documentation
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Add documentation for provisionSnapshotTrash command to HDFSCommands.md  and 
> HdfsSnapshots.md
> -
>
> Key: HDFS-17082
> URL: https://issues.apache.org/jira/browse/HDFS-17082
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> HDFS-15607 and HDFS-15997 introduced provisionSnapshotTrash; we should add 
> it to the documentation.
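> The command to document (a sketch following the dfsadmin usage; the exact 
> syntax in the docs may differ):
> {noformat}
> hdfs dfsadmin -provisionSnapshotTrash <snapshotDir> [-all]
> {noformat}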






[jira] [Updated] (HDFS-17083) Support getErasureCodeCodecs API in WebHDFS

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17083:
--
Target Version/s: 3.4.0

> Support getErasureCodeCodecs API in WebHDFS
> ---
>
> Key: HDFS-17083
> URL: https://issues.apache.org/jira/browse/HDFS-17083
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 3.4.0
>Reporter: Hualong Zhang
>Assignee: Hualong Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2023-07-12-22-52-15-954.png
>
>
> WebHDFS should support getErasureCodeCodecs:
> !image-2023-07-12-22-52-15-954.png|width=799,height=210!






[jira] [Updated] (HDFS-17081) EC: Add logic for striped blocks in isSufficientlyReplicated

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17081:
--
  Component/s: erasure-coding
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> EC: Add logic for striped blocks in isSufficientlyReplicated
> 
>
> Key: HDFS-17081
> URL: https://issues.apache.org/jira/browse/HDFS-17081
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> When appending to an EC file, the check that a block is replicated to at 
> least the minimum replication needs to consider striped blocks.
> Currently only the replica minimum replication is considered; the code is 
> as follows:
> {code:java}
> /**
>* Check if a block is replicated to at least the minimum replication.
>*/
>   public boolean isSufficientlyReplicated(BlockInfo b) {
> // Compare against the lesser of the minReplication and number of live 
> DNs.
> final int liveReplicas = countNodes(b).liveReplicas();
> if (liveReplicas >= minReplication) {
>   return true;
> }
> // getNumLiveDataNodes() is very expensive and we minimize its use by
> // comparing with minReplication first.
> return liveReplicas >= getDatanodeManager().getNumLiveDataNodes();
>   }
> {code}
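> A sketch of an EC-aware variant (an assumption of how the check could 
> account for striped blocks, not necessarily the committed patch):
> {code:java}
> /**
>  * Check if a block has at least the minimum number of live internal
>  * blocks (striped) or replicas (contiguous).
>  */
> public boolean isSufficientlyReplicated(BlockInfo b) {
>   // For a striped block, "sufficient" means at least the number of data
>   // units in its erasure coding policy; otherwise use minReplication.
>   final int minRequired = b.isStriped()
>       ? ((BlockInfoStriped) b).getRealDataBlockNum()
>       : minReplication;
>   final int liveReplicas = countNodes(b).liveReplicas();
>   if (liveReplicas >= minRequired) {
>     return true;
>   }
>   // getNumLiveDataNodes() is very expensive and we minimize its use by
>   // comparing with minRequired first.
>   return liveReplicas >= getDatanodeManager().getNumLiveDataNodes();
> }
> {code}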






[jira] [Updated] (HDFS-17075) Reconfig disk balancer parameters for datanode

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17075:
--
  Component/s: datanode
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Reconfig disk balancer parameters for datanode
> --
>
> Key: HDFS-17075
> URL: https://issues.apache.org/jira/browse/HDFS-17075
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> A rolling restart of datanodes takes a long time; making the disk balancer 
> parameters dfs.disk.balancer.enabled and dfs.disk.balancer.plan.valid.interval 
> reconfigurable in the datanode facilitates cluster operation and maintenance.






[jira] [Updated] (HDFS-17076) Remove the unused method isSlownodeByNameserviceId in DataNode

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17076:
--
  Component/s: datanode
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Remove the unused method isSlownodeByNameserviceId in DataNode
> --
>
> Key: HDFS-17076
> URL: https://issues.apache.org/jira/browse/HDFS-17076
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Remove the unused method isSlownodeByNameserviceId() in DataNode.






[jira] [Updated] (HDFS-17070) Remove unused import in DataNodeMetricHelper.java.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17070:
--
Component/s: datanode

> Remove unused import in DataNodeMetricHelper.java.
> --
>
> Key: HDFS-17070
> URL: https://issues.apache.org/jira/browse/HDFS-17070
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Remove unused import in DataNodeMetricHelper.java.






[jira] [Updated] (HDFS-17073) Enhance the warning message output for BlockGroupNonStripedChecksumComputer#compute

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17073:
--
  Component/s: hdfs
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Enhance the warning message output for 
> BlockGroupNonStripedChecksumComputer#compute
> ---
>
> Key: HDFS-17073
> URL: https://issues.apache.org/jira/browse/HDFS-17073
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Consider improving the log output of the warning messages generated by 
> BlockGroupNonStripedChecksumComputer when calling checksumBlock, to make it 
> easier to locate the block information where an exception occurs.






[jira] [Updated] (HDFS-17070) Remove unused import in DataNodeMetricHelper.java.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17070:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Remove unused import in DataNodeMetricHelper.java.
> --
>
> Key: HDFS-17070
> URL: https://issues.apache.org/jira/browse/HDFS-17070
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Remove unused import in DataNodeMetricHelper.java.






[jira] [Updated] (HDFS-17068) Datanode should record last directory scan time.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17068:
--
Affects Version/s: 3.4.0

> Datanode should record last directory scan time.
> 
>
> Key: HDFS-17068
> URL: https://issues.apache.org/jira/browse/HDFS-17068
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> I think it is useful for us to record the last directory scan time of a 
> datanode.






[jira] [Updated] (HDFS-17065) Fix typos in hadoop-hdfs-project

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17065:
--
  Component/s: hdfs
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Fix typos in hadoop-hdfs-project
> 
>
> Key: HDFS-17065
> URL: https://issues.apache.org/jira/browse/HDFS-17065
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: Zhaohui Wang
>Assignee: Zhaohui Wang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>







[jira] [Updated] (HDFS-17068) Datanode should record last directory scan time.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17068:
--
 Component/s: datanode
Target Version/s: 3.4.0

> Datanode should record last directory scan time.
> 
>
> Key: HDFS-17068
> URL: https://issues.apache.org/jira/browse/HDFS-17068
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> I think it is useful for us to record the last directory scan time of a 
> datanode.






[jira] [Updated] (HDFS-17064) Document the usage of the new Balancer "sortTopNodes" and "hotBlockTimeInterval" parameter

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17064:
--
  Component/s: balancer, documentation
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Document the usage of the new Balancer "sortTopNodes" and 
> "hotBlockTimeInterval" parameter
> --
>
> Key: HDFS-17064
> URL: https://issues.apache.org/jira/browse/HDFS-17064
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer, documentation
>Affects Versions: 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>







[jira] [Updated] (HDFS-17063) Support to configure different capacity reserved for each disk of DataNode.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17063:
--
Hadoop Flags: Reviewed
Target Version/s: 3.4.0

> Support to configure different capacity reserved for each disk of DataNode.
> ---
>
> Key: HDFS-17063
> URL: https://issues.apache.org/jira/browse/HDFS-17063
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, hdfs
>Affects Versions: 3.3.6
>Reporter: Jiale Qi
>Assignee: Jiale Qi
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Currently, _dfs.datanode.du.reserved_ takes effect for every directory of a 
> datanode.
> This issue allows a cluster administrator to configure 
> {_}dfs.datanode.du.reserved./data/hdfs1/data{_}, which takes effect only for a 
> specific directory.
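> A minimal sketch of what the resulting configuration could look like (the 
> per-directory key form follows the description above; values are illustrative):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> 
> public class ReservedCapacityExample {
>   public static void main(String[] args) {
>     Configuration conf = new Configuration();
>     // Default reserved capacity applied to every volume.
>     conf.setLong("dfs.datanode.du.reserved", 10L * 1024 * 1024 * 1024); // 10 GiB
>     // Per-directory override that takes effect only for this volume.
>     conf.setLong("dfs.datanode.du.reserved./data/hdfs1/data",
>         50L * 1024 * 1024 * 1024); // 50 GiB
>   }
> }
> {code}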



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17056) EC: Fix verifyClusterSetup output in case of an invalid param.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17056:
--
  Component/s: erasure-coding
   (was: ec)
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> EC: Fix verifyClusterSetup output in case of an invalid param.
> --
>
> Key: HDFS-17056
> URL: https://issues.apache.org/jira/browse/HDFS-17056
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.4.0
>Reporter: Ayush Saxena
>Assignee: huangzhaobo
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 3.4.0
>
>
> {code:java}
> bin/hdfs ec  -verifyClusterSetup XOR-2-1-1024k        
> 9 DataNodes are required for the erasure coding policies: RS-6-3-1024k, 
> XOR-2-1-1024k. The number of DataNodes is only 3. {code}
> verifyClusterSetup accepts -policy followed by the names of policies; 
> otherwise it defaults to all enabled policies.
> If there are additional invalid options, it silently ignores them, unlike 
> other EC commands, which throw a Too Many Arguments exception.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17057) RBF: Add DataNode maintenance states to Federation UI

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17057:
--
  Component/s: rbf
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> RBF: Add DataNode maintenance states to Federation UI 
> --
>
> Key: HDFS-17057
> URL: https://issues.apache.org/jira/browse/HDFS-17057
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Add DataNode maintenance states to Federation UI 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17053) Optimize method BlockInfoStriped#findSlot to reduce time complexity.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17053:
--
  Component/s: hdfs
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Optimize method BlockInfoStriped#findSlot to reduce time complexity.
> 
>
> Key: HDFS-17053
> URL: https://issues.apache.org/jira/browse/HDFS-17053
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Currently, the method findSlot contains the following code snippet:
> {code:java}
> for (; i < getCapacity(); i++) {
>   if (getStorageInfo(i) == null) {
> return i;
>   }
> } {code}
> it will compute (triplets.length / 3) on every iteration; I think this can be 
> optimized.
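> A minimal sketch of the optimization (hoisting the bound; getStorageInfo and i 
> are as in the snippet above):
> {code:java}
> // Compute the capacity once instead of re-deriving (triplets.length / 3)
> // via getCapacity() on every loop iteration.
> final int capacity = getCapacity();
> for (; i < capacity; i++) {
>   if (getStorageInfo(i) == null) {
>     return i;
>   }
> }
> {code}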



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17052) Improve BlockPlacementPolicyRackFaultTolerant to avoid node selection failures when there are not enough racks.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17052:
--
Target Version/s: 3.4.0

> Improve BlockPlacementPolicyRackFaultTolerant to avoid node selection 
> failures when there are not enough racks.
> ---
>
> Key: HDFS-17052
> URL: https://issues.apache.org/jira/browse/HDFS-17052
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Hualong Zhang
>Assignee: Hualong Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: failed reconstruction ec in same rack-1.png, write ec in 
> same rack.png
>
>
> When writing EC data, if the number of racks matching the storageType is 
> insufficient, more than one block is allowed to be written to the same rack.
> !write ec in same rack.png|width=962,height=604!
>  
>  
>  
> However, during EC block recovery, it is not possible to recover on the same 
> rack, which deviates from the expected behavior.
> !failed reconstruction ec in same rack-1.png|width=946,height=413!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17045) File renamed from a snapshottable dir to a non-snapshottable dir cannot be deleted.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17045:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> File renamed from a snapshottable dir to a non-snapshottable dir cannot be 
> deleted.
> ---
>
> Key: HDFS-17045
> URL: https://issues.apache.org/jira/browse/HDFS-17045
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, snapshots
>Affects Versions: 3.4.0
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> HDFS-16972 added a 
> [shouldDestroy|https://github.com/szetszwo/hadoop/blob/331e075115b4a35574622318b26f6d4731658d57/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeReference.java#L834-L845]
>  method which caused the following bug.
> h3. Background:
>  - When {{FileSystem.rename(src, dst)}} from a snapshottable dir (src) to a 
> snapshottable dir (dst), dstSnapshotId is set to the latest snapshot at dst. 
> As a result, dst is NOT in dstSnapshotId because dstSnapshotId was already 
> taken before rename.
>  - snapshotToBeDeleted is the snapshot id of the current operation if the 
> operation is {{{}FileSystem.deleteSnapshot{}}}. Otherwise, 
> snapshotToBeDeleted is set to CURRENT_STATE_ID.
>  - If (snapshotToBeDeleted > dstSnapshotId), dst is in snapshotToBeDeleted. 
> The shouldDestroy method returns true to continue deletion.
>  - If (snapshotToBeDeleted <= dstSnapshotId), dst must not be in 
> snapshotToBeDeleted. The shouldDestroy method returns false to stop deletion.
> All the above are correct for renaming within snapshottable directories.
> h3. Bug:
>  - If rename(src, dst) from a snapshottable dir (src) to a non-snapshottable 
> dir (dst), dstSnapshotId becomes CURRENT_STATE_ID.
>  - When {{FileSystem.delete(dst)}} happens, snapshotToBeDeleted is also set 
> to CURRENT_STATE_ID.
>  - In this case, snapshotToBeDeleted == dstSnapshotId, the shouldDestroy 
> method will return false and it incorrectly stops the deletion.
> Note that this bug may cause fsimage corruption and quota miscalculation 
> since some files can be partially deleted. Fortunately, this bug won't cause 
> data loss.
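> A hedged sketch of the corrected check (names taken from the description; this 
> illustrates the condition only, not the committed patch):
> {code:java}
> // A plain delete uses CURRENT_STATE_ID, and dst renamed into a
> // non-snapshottable dir also has dstSnapshotId == CURRENT_STATE_ID,
> // so equality must not stop the deletion.
> static boolean shouldDestroy(int snapshotToBeDeleted, int dstSnapshotId) {
>   if (snapshotToBeDeleted == Snapshot.CURRENT_STATE_ID) {
>     return true; // current-state deletion always proceeds
>   }
>   return snapshotToBeDeleted > dstSnapshotId; // dst is in snapshotToBeDeleted
> }
> {code}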



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17043) HttpFS implementation for getAllErasureCodingPolicies

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17043:
--
Target Version/s: 3.4.0

> HttpFS implementation for getAllErasureCodingPolicies
> -
>
> Key: HDFS-17043
> URL: https://issues.apache.org/jira/browse/HDFS-17043
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Affects Versions: 3.4.0
>Reporter: Hualong Zhang
>Assignee: Hualong Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> HttpFS should support the getAllErasureCodingPolicies API in order to be able 
> to retrieve all Erasure Coding Policies. The WebHDFS implementation is 
> available in HDFS-17029.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17047) BlockManager#addStoredBlock should log storage id when AddBlockResult is REPLACED

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17047:
--
  Component/s: hdfs
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> BlockManager#addStoredBlock should log storage id when AddBlockResult is 
> REPLACED
> -
>
> Key: HDFS-17047
> URL: https://issues.apache.org/jira/browse/HDFS-17047
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Recently, we frequently found logs like the following in the active namenode:
>  
> {code:java}
> 2023-06-12 05:34:09,821 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-12 05:34:09,892 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-12 11:34:07,932 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-12 11:34:08,027 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-12 17:34:08,742 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-12 17:34:08,813 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-12 23:34:09,752 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-12 23:34:09,812 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-13 05:34:08,065 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-13 05:34:08,144 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-13 11:34:08,638 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010
> 2023-06-13 11:34:08,681 WARN BlockStateChange: BLOCK* addStoredBlock: block 
> blk_-9223372036614126544_57136788 moved to storageType DISK on node 
> datanode1:50010{code}
>  
>  
> All logs have the same EC block id (blk_-9223372036614126544_57136788) and 
> are printed every 6 hours (the FBR interval of our cluster).
> To figure out what happened, I think we should also log the storage id here.
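> A small sketch of the proposed logging change (field and accessor names are 
> assumptions, not the exact patch):
> {code:java}
> // Also print the storage id so repeated REPLACED results can be traced to
> // the specific DataNode storage that reported the replica.
> blockLog.warn("BLOCK* addStoredBlock: block {} moved to storageType {}"
>     + " on node {} (storageID {})", reportedBlock,
>     storageInfo.getStorageType(), node, storageInfo.getStorageID());
> {code}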



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17045) File renamed from a snapshottable dir to a non-snapshottable dir cannot be deleted.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17045:
--
Hadoop Flags: Reviewed

> File renamed from a snapshottable dir to a non-snapshottable dir cannot be 
> deleted.
> ---
>
> Key: HDFS-17045
> URL: https://issues.apache.org/jira/browse/HDFS-17045
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, snapshots
>Affects Versions: 3.4.0
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> HDFS-16972 added a 
> [shouldDestroy|https://github.com/szetszwo/hadoop/blob/331e075115b4a35574622318b26f6d4731658d57/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeReference.java#L834-L845]
>  method which caused the following bug.
> h3. Background:
>  - When {{FileSystem.rename(src, dst)}} from a snapshottable dir (src) to a 
> snapshottable dir (dst), dstSnapshotId is set to the latest snapshot at dst. 
> As a result, dst is NOT in dstSnapshotId because dstSnapshotId was already 
> taken before rename.
>  - snapshotToBeDeleted is the snapshot id of the current operation if the 
> operation is {{{}FileSystem.deleteSnapshot{}}}. Otherwise, 
> snapshotToBeDeleted is set to CURRENT_STATE_ID.
>  - If (snapshotToBeDeleted > dstSnapshotId), dst is in snapshotToBeDeleted. 
> The shouldDestroy method returns true to continue deletion.
>  - If (snapshotToBeDeleted <= dstSnapshotId), dst must not be in 
> snapshotToBeDeleted. The shouldDestroy method returns false to stop deletion.
> All the above are correct for renaming within snapshottable directories.
> h3. Bug:
>  - If rename(src, dst) from a snapshottable dir (src) to a non-snapshottable 
> dir (dst), dstSnapshotId becomes CURRENT_STATE_ID.
>  - When {{FileSystem.delete(dst)}} happens, snapshotToBeDeleted is also set 
> to CURRENT_STATE_ID.
>  - In this case, snapshotToBeDeleted == dstSnapshotId, the shouldDestroy 
> method will return false and it incorrectly stops the deletion.
> Note that this bug may cause fsimage corruption and quota miscalculation 
> since some files can be partially deleted. Fortunately, this bug won't cause 
> data loss.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17044) Set size of non-exist block to NO_ACK when process FBR or IBR to avoid useless report from DataNode

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17044:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Set size of non-exist block to NO_ACK when process FBR or IBR to avoid 
> useless report from DataNode
> ---
>
> Key: HDFS-17044
> URL: https://issues.apache.org/jira/browse/HDFS-17044
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> When the NameNode processes a DataNode incremental or full block report, if a 
> block is not in the blocks map, it will be added to the invalidates set and 
> the replica should be removed from the datanode; the block size should be set 
> to NO_ACK, which should reduce some useless DataNode block reports.
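> A hedged sketch of the idea (helper names assumed; the real patch may differ):
> {code:java}
> // The reported replica is unknown to the blocks map: queue it for deletion
> // and set its size to NO_ACK so the DataNode does not report the deletion
> // back to the NameNode.
> Block unknown = new Block(reportedBlock);
> unknown.setNumBytes(BlockCommand.NO_ACK);
> addToInvalidates(unknown, storageInfo.getDatanodeDescriptor());
> {code}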



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17041) RBF: Fix putAll impl for mysql and file based state stores

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17041:
--
  Component/s: rbf
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> RBF: Fix putAll impl for mysql and file based state stores
> --
>
> Key: HDFS-17041
> URL: https://issues.apache.org/jira/browse/HDFS-17041
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Only the zookeeper based state store allows all records to be inserted even 
> when a few of them already exist and "errorIfExists" is true; however, the 
> file/fs based as well as the mysql based putAll fails the whole operation 
> immediately after encountering a single record that already exists, when 
> "errorIfExists" is true (which is the case while inserting records for the 
> first time).
> For all implementations, we should allow inserts of the records that do not 
> already exist and report any record that already exists as a failure, rather 
> than failing the whole operation and not trying to insert valid records.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17037) Consider nonDfsUsed when running balancer

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17037:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Consider nonDfsUsed when running balancer
> -
>
> Key: HDFS-17037
> URL: https://issues.apache.org/jira/browse/HDFS-17037
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover
>Affects Versions: 3.4.0
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> When we run the balancer with the `BalancingPolicy.Node` policy, our goal is 
> to balance the storage of each datanode. But in the current implementation, 
> the balancer doesn't account for non-dfs storage usage on the datanodes, 
> which can make the situation worse for datanodes that are already strained on 
> storage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17031) RBF: Reduce repeated code in RouterRpcServer

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17031:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> RBF: Reduce repeated code in RouterRpcServer
> 
>
> Key: HDFS-17031
> URL: https://issues.apache.org/jira/browse/HDFS-17031
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Chengwei Wang
>Assignee: Chengwei Wang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Reduce repeated code:
>  
> {code:java}
> if (subclusterResolver instanceof MountTableResolver) {
>   try {
> MountTableResolver mountTable = (MountTableResolver)subclusterResolver;
> MountTable entry = mountTable.getMountPoint(path);
> // check logic
>   } catch (IOException e) {
> LOG.error("Cannot get mount point", e);
>   }
> }
> return false; {code}
>  
>  
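> One possible shape for the extracted helper (a sketch using 
> java.util.function.Predicate; not the committed change):
> {code:java}
> /** Apply {@code check} to the mount entry for {@code path}, if one exists. */
> private boolean checkMountPoint(String path, Predicate<MountTable> check) {
>   if (!(subclusterResolver instanceof MountTableResolver)) {
>     return false;
>   }
>   try {
>     MountTableResolver mountTable = (MountTableResolver) subclusterResolver;
>     MountTable entry = mountTable.getMountPoint(path);
>     return entry != null && check.test(entry);
>   } catch (IOException e) {
>     LOG.error("Cannot get mount point", e);
>     return false;
>   }
> }
> {code}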



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17035) FsVolumeImpl#getActualNonDfsUsed may return negative value

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17035:
--
Target Version/s: 3.4.0

> FsVolumeImpl#getActualNonDfsUsed may return negative value
> --
>
> Key: HDFS-17035
> URL: https://issues.apache.org/jira/browse/HDFS-17035
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17027) RBF: Add supports for observer.auto-msync-period when using routers

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17027:
--
Affects Version/s: 3.4.0

> RBF: Add supports for observer.auto-msync-period when using routers
> ---
>
> Key: HDFS-17027
> URL: https://issues.apache.org/jira/browse/HDFS-17027
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Non-RBF clients that use observer reads have the option to set 
> *dfs.client.failover.observer.auto-msync-period*. This config 
> makes the client automatically do an msync, allowing clients to use the 
> observer reads feature without any code change.
> To use observer reads with RBF, clients set 
> *dfs.client.rbf.observer.read.enable*. The way this flag is implemented does 
> not allow clients to use the *auto-msync-period* config. So with RBF, clients 
> either have to:
> # Not use observer reads
> # Use observer reads with the risk of stale reads
> # Make code changes to explicitly call msync.
> We should add support for 
> *dfs.client.failover.observer.auto-msync-period*. This can be 
> done by adding a ProxyProvider, in a similar manner to the 
> ObserverReadProxyProvider.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17027) RBF: Add supports for observer.auto-msync-period when using routers

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17027:
--
 Component/s: rbf
Target Version/s: 3.4.0

> RBF: Add supports for observer.auto-msync-period when using routers
> ---
>
> Key: HDFS-17027
> URL: https://issues.apache.org/jira/browse/HDFS-17027
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Non-RBF clients that use observer reads have the option to set 
> *dfs.client.failover.observer.auto-msync-period*. This config 
> makes the client automatically do an msync, allowing clients to use the 
> observer reads feature without any code change.
> To use observer reads with RBF, clients set 
> *dfs.client.rbf.observer.read.enable*. The way this flag is implemented does 
> not allow clients to use the *auto-msync-period* config. So with RBF, clients 
> either have to:
> # Not use observer reads
> # Use observer reads with the risk of stale reads
> # Make code changes to explicitly call msync.
> We should add support for 
> *dfs.client.failover.observer.auto-msync-period*. This can be 
> done by adding a ProxyProvider, in a similar manner to the 
> ObserverReadProxyProvider.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17026) RBF: NamenodeHeartbeatService should update JMX report with configurable frequency

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17026:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> RBF: NamenodeHeartbeatService should update JMX report with configurable 
> frequency
> --
>
> Key: HDFS-17026
> URL: https://issues.apache.org/jira/browse/HDFS-17026
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Hector Sandoval Chaverri
>Assignee: Hector Sandoval Chaverri
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-17026-branch-3.3.patch
>
>
> The NamenodeHeartbeatService currently calls each Namenode's JMX 
> endpoint every time it wakes up (by default, every 5 seconds).
> In a cluster with 40 routers, we have observed service degradation on some of 
> the Namenodes, since the JMX request obtains Datanode status and blocks 
> other RPC requests. However, the JMX report data doesn't seem to be used for 
> critical paths on the routers.
> We should configure the NamenodeHeartbeatService so it updates the JMX 
> reports at a lower frequency than the Namenode states, or allow disabling the 
> reports completely.
> The class describes the JMX request as optional even though there is no 
> implementation to turn it off:
> {noformat}
> // Read the stats from JMX (optional)
> updateJMXParameters(webAddress, report);{noformat}
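> A hedged sketch of a configurable frequency (the key name here is 
> hypothetical, not necessarily the one ultimately committed):
> {code:java}
> import java.util.concurrent.TimeUnit;
> import org.apache.hadoop.conf.Configuration;
> 
> public class JmxIntervalExample {
>   public static void main(String[] args) {
>     Configuration conf = new Configuration();
>     // Refresh the JMX report every 60s instead of on every 5s wake-up;
>     // a value of 0 could mean "disable the JMX report entirely".
>     conf.setTimeDuration(
>         "dfs.federation.router.heartbeat.jmx.interval", 60, TimeUnit.SECONDS);
>   }
> }
> {code}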



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17029) Support getECPolices API in WebHDFS

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17029:
--
Target Version/s: 3.4.0

> Support getECPolices API in WebHDFS
> ---
>
> Key: HDFS-17029
> URL: https://issues.apache.org/jira/browse/HDFS-17029
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 3.4.0
>Reporter: Hualong Zhang
>Assignee: Hualong Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2023-05-29-23-55-09-224.png
>
>
> WebHDFS should support getEcPolicies:
> !image-2023-05-29-23-55-09-224.png|width=817,height=234!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17024) Potential data race introduced by HDFS-15865

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17024:
--
Hadoop Flags: Reviewed

> Potential data race introduced by HDFS-15865
> 
>
> Key: HDFS-17024
> URL: https://issues.apache.org/jira/browse/HDFS-17024
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 3.3.1
>Reporter: Wei-Chiu Chuang
>Assignee: Segawa Hiroaki
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> After HDFS-15865, we found a client abort due to an NPE.
> {noformat}
> 2023-04-10 16:07:43,409 ERROR 
> org.apache.hadoop.hbase.regionserver.HRegionServer: * ABORTING region 
> server kqhdp36,16020,1678077077562: Replay of WAL required. Forcing server 
> shutdown *
> org.apache.hadoop.hbase.DroppedSnapshotException: region: WAFER_ALL,16|CM 
> RIE.MA1|CP1114561.18|PROC|,1625899466315.0fbdf0f1810efa9e68af831247e6555f.
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2870)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2539)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2511)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2401)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:613)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:582)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:69)
> at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:362)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.DataStreamer.waitForAckedSeqno(DataStreamer.java:880)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:781)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.closeImpl(DFSOutputStream.java:898)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:850)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:76)
> at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishClose(HFileWriterImpl.java:859)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.close(HFileWriterImpl.java:687)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileWriter.close(StoreFileWriter.java:393)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69)
> at 
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:78)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:1047)
> at 
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2349)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2806)
> {noformat}
> This is only possible if a data race happened. Filing this jira to 
> investigate and eliminate the data race.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17022) Fix the exception message to print the Identifier pattern

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17022:
--
 Target Version/s: 3.3.6, 3.4.0
Affects Version/s: 3.3.6
   3.4.0

> Fix the exception message to print the Identifier pattern
> -
>
> Key: HDFS-17022
> URL: https://issues.apache.org/jira/browse/HDFS-17022
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.4.0, 3.3.6
>Reporter: Nishtha Shah
>Assignee: Nishtha Shah
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.6
>
>
> When an incorrect string is passed as a value, an exception is thrown, but 
> the message doesn't print the identifier pattern.
> {code:java}
> java.lang.IllegalArgumentException: [] = [[a] must be {2}{code}
>  instead of 
> {code:java}
> java.lang.IllegalArgumentException: [] = [[a] must be 
> [a-zA-Z_][a-zA-Z0-9_\-]*{code}
> Ref to original discussion: 
> https://github.com/apache/hadoop/pull/5669#discussion_r1198937053
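> A minimal sketch of the fix (class and method names are illustrative only):
> {code:java}
> import java.util.regex.Pattern;
> 
> public class IdentifierCheck {
>   private static final Pattern IDENTIFIER =
>       Pattern.compile("[a-zA-Z_][a-zA-Z0-9_\\-]*");
> 
>   /** Include the actual pattern in the message instead of a bare "{2}". */
>   public static void check(String name, String value) {
>     if (!IDENTIFIER.matcher(value).matches()) {
>       throw new IllegalArgumentException(
>           "[" + name + "] = [" + value + "] must be " + IDENTIFIER.pattern());
>     }
>   }
> }
> {code}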



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17020) RBF: mount table addAll should print failed records in std error

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17020:
--
Affects Version/s: 3.4.0

> RBF: mount table addAll should print failed records in std error
> 
>
> Key: HDFS-17020
> URL: https://issues.apache.org/jira/browse/HDFS-17020
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Now that the state store putAll supports returning failed record keys, the 
> addAll command for mount entries should also support printing failed records 
> to standard error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17022) Fix the exception message to print the Identifier pattern

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17022:
--
Component/s: httpfs

> Fix the exception message to print the Identifier pattern
> -
>
> Key: HDFS-17022
> URL: https://issues.apache.org/jira/browse/HDFS-17022
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: httpfs
>Affects Versions: 3.4.0, 3.3.6
>Reporter: Nishtha Shah
>Assignee: Nishtha Shah
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.6
>
>
> When an incorrect string is passed as a value, an exception is thrown, but 
> the message doesn't print the identifier pattern.
> {code:java}
> java.lang.IllegalArgumentException: [] = [[a] must be {2}{code}
>  instead of 
> {code:java}
> java.lang.IllegalArgumentException: [] = [[a] must be 
> [a-zA-Z_][a-zA-Z0-9_\-]*{code}
> Ref to original discussion: 
> https://github.com/apache/hadoop/pull/5669#discussion_r1198937053



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17020) RBF: mount table addAll should print failed records in std error

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17020:
--
Component/s: rbf

> RBF: mount table addAll should print failed records in std error
> 
>
> Key: HDFS-17020
> URL: https://issues.apache.org/jira/browse/HDFS-17020
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Now that the state store putAll supports returning failed record keys, the 
> addAll command for mount entries should also support printing failed records 
> to standard error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17018) Improve dfsclient log format

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17018:
--
Target Version/s: 3.4.0

> Improve dfsclient log format
> 
>
> Key: HDFS-17018
> URL: https://issues.apache.org/jira/browse/HDFS-17018
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsclient
>Affects Versions: 3.3.4
>Reporter: Xianming Lei
>Assignee: Xianming Lei
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Modify the log format.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17019) Optimize the logic for reconfigure slow peer enable for Namenode

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17019:
--
>   Component/s: namenode
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

>  Optimize the logic for reconfigure slow peer enable for Namenode
> -
>
> Key: HDFS-17019
> URL: https://issues.apache.org/jira/browse/HDFS-17019
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The logic for reconfiguring slow peer enable for the Namenode requires the 
> following optimizations:
> 1. Make the SlowPeerTracker slowPeerTracker field volatile.
> 2. When starting the NameNode, if the parameter 
> dfs.datanode.peer.stats.enabled is set to false, 
> DatanodeManager#startSlowPeerCollector() is not called, and as a result the 
> slow peer collection thread 'slowPeerCollectorDaemon' is not started.
>  If the parameter dfs.datanode.peer.stats.enabled is later dynamically 
> refreshed to true, the current logic still does not call 
> DatanodeManager#startSlowPeerCollector(), which leaves the thread 
> 'slowPeerCollectorDaemon' not started as expected, so we will optimize this.
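> A rough sketch of the intended reconfigure path (method names are assumptions):
> {code:java}
> // The tracker is swapped on reconfigure, so readers need volatile visibility.
> private volatile SlowPeerTracker slowPeerTracker;
> 
> void reconfigureSlowPeersEnabled(boolean enabled) {
>   if (enabled) {
>     // Previously skipped when the NameNode booted with the flag off.
>     startSlowPeerCollector();
>   } else {
>     stopSlowPeerCollector();
>   }
> }
> {code}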



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17020) RBF: mount table addAll should print failed records in std error

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17020:
--
Target Version/s: 3.4.0

> RBF: mount table addAll should print failed records in std error
> 
>
> Key: HDFS-17020
> URL: https://issues.apache.org/jira/browse/HDFS-17020
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Now that the state store putAll supports returning failed record keys, the 
> addAll command for mount entries should also support printing failed records 
> to standard error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17019) Optimize the logic for reconfigure slow peer enable for Namenode

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17019:
--
Hadoop Flags: Reviewed

>  Optimize the logic for reconfigure slow peer enable for Namenode
> -
>
> Key: HDFS-17019
> URL: https://issues.apache.org/jira/browse/HDFS-17019
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The logic for reconfiguring slow peer enable for the Namenode requires the 
> following optimizations:
> 1. Make the SlowPeerTracker slowPeerTracker field volatile.
> 2. When starting the NameNode, if the parameter 
> dfs.datanode.peer.stats.enabled is set to false, 
> DatanodeManager#startSlowPeerCollector() is not called, and as a result the 
> slow peer collection thread 'slowPeerCollectorDaemon' is not started.
>  If the parameter dfs.datanode.peer.stats.enabled is later dynamically 
> refreshed to true, the current logic still does not call 
> DatanodeManager#startSlowPeerCollector(), which leaves the thread 
> 'slowPeerCollectorDaemon' not started as expected, so we will optimize this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17017) Fix the issue of arguments number limit in report command in DFSAdmin.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17017:
--
  Component/s: dfsadmin
 Target Version/s: 3.3.6, 3.4.0
Affects Version/s: 3.3.6
   3.4.0

> Fix the issue of arguments number limit in report command in DFSAdmin.
> --
>
> Key: HDFS-17017
> URL: https://issues.apache.org/jira/browse/HDFS-17017
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfsadmin
>Affects Versions: 3.4.0, 3.3.6
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.6
>
>
> Currently, the DFSAdmin report command should support a maximum of 7 
> arguments, such as:
> hdfs dfsadmin [-report] [-live] [-dead] [-decommissioning] 
> [-enteringmaintenance] [-inmaintenance] [-slownodes]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17012) Remove unused DFSConfigKeys#DFS_DATANODE_PMEM_CACHE_DIRS_DEFAULT

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17012:
--
Labels: pull-requests-available  (was: )

> Remove unused DFSConfigKeys#DFS_DATANODE_PMEM_CACHE_DIRS_DEFAULT
> 
>
> Key: HDFS-17012
> URL: https://issues.apache.org/jira/browse/HDFS-17012
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs
>Affects Versions: 3.3.4
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Minor
>  Labels: pull-requests-available
> Fix For: 3.4.0
>
> Attachments: screenshot-1.png
>
>
> In DFSConfigKeys, DFS_DATANODE_PMEM_CACHE_DIRS_DEFAULT doesn't seem to be 
> used anywhere; this is a redundant option and we should remove it.
>  !screenshot-1.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17014) HttpFS Add Support getStatus API

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17014:
--
Target Version/s: 3.4.0

> HttpFS Add Support getStatus API
> 
>
> Key: HDFS-17014
> URL: https://issues.apache.org/jira/browse/HDFS-17014
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 3.4.0
>Reporter: Hualong Zhang
>Assignee: Hualong Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2023-05-24-21-58-28-674.png
>
>
> We should ensure that *HttpFS* remains synchronized with *WebHDFS*, as the 
> latter has already implemented the *getStatus* interface.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17015) Typos in HDFS Documents

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17015:
--
Target Version/s: 3.4.0

> Typos in HDFS Documents
> ---
>
> Key: HDFS-17015
> URL: https://issues.apache.org/jira/browse/HDFS-17015
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: configuration
>Affects Versions: 3.3.5
>Reporter: Liang Yan
>Assignee: Liang Yan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> There are some typos in HDFS documents. I will submit a PR to fix these typos.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17012) Remove unused DFSConfigKeys#DFS_DATANODE_PMEM_CACHE_DIRS_DEFAULT

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17012:
--
Target Version/s: 3.4.0

> Remove unused DFSConfigKeys#DFS_DATANODE_PMEM_CACHE_DIRS_DEFAULT
> 
>
> Key: HDFS-17012
> URL: https://issues.apache.org/jira/browse/HDFS-17012
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, hdfs
>Affects Versions: 3.3.4
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Minor
>  Labels: pull-requests-available
> Fix For: 3.4.0
>
> Attachments: screenshot-1.png
>
>
> In DFSConfigKeys, DFS_DATANODE_PMEM_CACHE_DIRS_DEFAULT doesn't seem to be 
> used anywhere; this is a redundant option and we should remove it.
>  !screenshot-1.png! 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17011) Fix the metric of "HttpPort" at DataNodeInfo

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17011:
--
 Hadoop Flags: Reviewed
 Target Version/s: 3.3.6, 3.4.0
Affects Version/s: 3.3.6
   3.4.0

> Fix the metric of  "HttpPort" at DataNodeInfo 
> --
>
> Key: HDFS-17011
> URL: https://issues.apache.org/jira/browse/HDFS-17011
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.4.0, 3.3.6
>Reporter: Zhaohui Wang
>Assignee: Zhaohui Wang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.6
>
> Attachments: after.png, before.png
>
>
> Now, the "HttpPort" metric was getting from the conf `dfs.datanode.info.port`
> but the conf seem to be useless, httpPort already named infoPort and was 
> assigned (#Line1373)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17011) Fix the metric of "HttpPort" at DataNodeInfo

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17011:
--
Labels: pull-request-available  (was: )

> Fix the metric of  "HttpPort" at DataNodeInfo 
> --
>
> Key: HDFS-17011
> URL: https://issues.apache.org/jira/browse/HDFS-17011
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zhaohui Wang
>Assignee: Zhaohui Wang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.6
>
> Attachments: after.png, before.png
>
>
> Now, the "HttpPort" metric was getting from the conf `dfs.datanode.info.port`
> but the conf seem to be useless, httpPort already named infoPort and was 
> assigned (#Line1373)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17011) Fix the metric of "HttpPort" at DataNodeInfo

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17011:
--
Component/s: datanode

> Fix the metric of  "HttpPort" at DataNodeInfo 
> --
>
> Key: HDFS-17011
> URL: https://issues.apache.org/jira/browse/HDFS-17011
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Zhaohui Wang
>Assignee: Zhaohui Wang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.6
>
> Attachments: after.png, before.png
>
>
> Now, the "HttpPort" metric was getting from the conf `dfs.datanode.info.port`
> but the conf seem to be useless, httpPort already named infoPort and was 
> assigned (#Line1373)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17010) Add a subtree test to TestSnapshotDiffReport

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17010:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Add a subtree test to TestSnapshotDiffReport
> 
>
> Key: HDFS-17010
> URL: https://issues.apache.org/jira/browse/HDFS-17010
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Minor
> Fix For: 3.4.0
>
>
> The JIRA is to add a test for running SnapshotDiffReport over subtrees of a 
> snapshottable directory.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17009) RBF: state store putAll should also return failed records

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17009:
--
Affects Version/s: 3.4.0

> RBF: state store putAll should also return failed records
> -
>
> Key: HDFS-17009
> URL: https://issues.apache.org/jira/browse/HDFS-17009
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> State store implementations allow adding/updating multiple records using 
> putAll. The implementation returns whether all records were successfully 
> added or updated. We should also allow the implementation to return which 
> records failed to get updated.
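> One hedged sketch of a richer return type (names here are illustrative):
> {code:java}
> import java.util.List;
> 
> /** Result of a bulk put: overall status plus the records that failed. */
> public class PutAllResult {
>   private final boolean allSucceeded;
>   private final List<String> failedRecordKeys;
> 
>   public PutAllResult(boolean allSucceeded, List<String> failedRecordKeys) {
>     this.allSucceeded = allSucceeded;
>     this.failedRecordKeys = failedRecordKeys;
>   }
> 
>   public boolean isAllSucceeded() { return allSucceeded; }
> 
>   public List<String> getFailedRecordKeys() { return failedRecordKeys; }
> }
> {code}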



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17009) RBF: state store putAll should also return failed records

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17009:
--
Target Version/s: 3.4.0

> RBF: state store putAll should also return failed records
> -
>
> Key: HDFS-17009
> URL: https://issues.apache.org/jira/browse/HDFS-17009
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> State store implementations allow adding/updating multiple records using 
> putAll. The implementation returns whether all records were successfully 
> added or updated. We should also allow the implementation to return which 
> records failed to get updated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17003) Erasure Coding: invalidate wrong block after reporting bad blocks from datanode

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17003:
--
Affects Version/s: 3.3.6
   3.4.0

> Erasure Coding: invalidate wrong block after reporting bad blocks from 
> datanode
> ---
>
> Key: HDFS-17003
> URL: https://issues.apache.org/jira/browse/HDFS-17003
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0, 3.3.6
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.6
>
>
> After receiving a reportBadBlocks RPC from a datanode, the NameNode computes 
> the wrong block to invalidate. This is dangerous behaviour and may cause data 
> loss. Some logs from our production cluster are below:
>  
> NameNode log:
> {code:java}
> 2023-05-08 21:23:49,112 INFO org.apache.hadoop.hdfs.StateChange: *DIR* 
> reportBadBlocks for block: 
> BP-932824627--1680179358678:blk_-9223372036848404320_1471186 on datanode: 
> datanode1:50010
> 2023-05-08 21:23:49,183 INFO org.apache.hadoop.hdfs.StateChange: *DIR* 
> reportBadBlocks for block: 
> BP-932824627--1680179358678:blk_-9223372036848404319_1471186 on datanode: 
> datanode2:50010{code}
> datanode1 log:
> {code:java}
> 2023-05-08 21:23:49,088 WARN 
> org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad 
> BP-932824627--1680179358678:blk_-9223372036848404320_1471186 on 
> /data7/hadoop/hdfs/datanode
> 2023-05-08 21:24:00,509 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Failed 
> to delete replica blk_-9223372036848404319_1471186: ReplicaInfo not 
> found.{code}
>  
> This phenomenon can be reproduced.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17009) RBF: state store putAll should also return failed records

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17009:
--
Labels: pull-request-available  (was: )

> RBF: state store putAll should also return failed records
> -
>
> Key: HDFS-17009
> URL: https://issues.apache.org/jira/browse/HDFS-17009
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> State store implementations allow adding/updating multiple records using 
> putAll. The implementation returns whether all records were successfully 
> added or updated. We should also allow the implementation to return which 
> records failed to get updated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17008) Fix RBF JDK 11 javadoc warnings

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17008:
--
  Component/s: rbf
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Fix RBF JDK 11 javadoc warnings
> ---
>
> Key: HDFS-17008
> URL: https://issues.apache.org/jira/browse/HDFS-17008
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> HDFS-16978 excluded proto packages from maven-javadoc-plugin for rbf, hence 
> now we have JDK 11 javadoc warnings (e.g. 
> [here|https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5554/14/artifact/out/results-javadoc-javadoc-hadoop-hdfs-project_hadoop-hdfs-rbf-jdkUbuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1.txt]).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17009) RBF: state store putAll should also return failed records

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17009:
--
Component/s: rbf

> RBF: state store putAll should also return failed records
> -
>
> Key: HDFS-17009
> URL: https://issues.apache.org/jira/browse/HDFS-17009
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> State store implementations allow adding/updating multiple records using 
> putAll. The implementation returns whether all records were successfully 
> added or updated. We should also allow the implementation to return which 
> records failed to get updated.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17001) Support getStatus API in WebHDFS

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17001:
--
Target Version/s: 3.4.0

> Support getStatus API in WebHDFS
> 
>
> Key: HDFS-17001
> URL: https://issues.apache.org/jira/browse/HDFS-17001
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 3.4.0
>Reporter: Hualong Zhang
>Assignee: Hualong Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2023-05-08-14-34-51-873.png
>
>
> WebHDFS should support getStatus:
> !image-2023-05-08-14-34-51-873.png!
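>
> A hypothetical usage sketch once the API is in place (host and port are 
> placeholders): through the webhdfs:// scheme, {{FileSystem#getStatus}} should 
> report cluster capacity, used, and remaining space just as hdfs:// does.
> {code:java}
> import java.net.URI;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.FsStatus;
>
> public class WebHdfsGetStatus {
>   public static void main(String[] args) throws Exception {
>     FileSystem fs = FileSystem.get(
>         URI.create("webhdfs://namenode:9870"), new Configuration());
>     FsStatus status = fs.getStatus(); // served over HTTP by the new operation
>     System.out.println("capacity=" + status.getCapacity()
>         + " used=" + status.getUsed()
>         + " remaining=" + status.getRemaining());
>   }
> }
> {code}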



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17000) Potential infinite loop in TestDFSStripedOutputStreamUpdatePipeline.testDFSStripedOutputStreamUpdatePipeline

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17000:
--
Component/s: test

> Potential infinite loop in 
> TestDFSStripedOutputStreamUpdatePipeline.testDFSStripedOutputStreamUpdatePipeline
> 
>
> Key: HDFS-17000
> URL: https://issues.apache.org/jira/browse/HDFS-17000
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Marcono1234
>Assignee: Marcono1234
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The method 
> {{TestDFSStripedOutputStreamUpdatePipeline.testDFSStripedOutputStreamUpdatePipeline}}
>  contains the following line:
> {code}
> for (int i = 0; i < Long.MAX_VALUE; i++) {
> {code}
> [GitHub source 
> link|https://github.com/apache/hadoop/blob/4ee92efb73a90ae7f909e96de242d216ad6878b2/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSStripedOutputStreamUpdatePipeline.java#L48]
> Because {{i}} is an {{int}}, the condition {{i < Long.MAX_VALUE}} will always 
> be true and {{i}} will simply overflow.
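>
> A minimal sketch of the obvious repair (declaring the loop variable as a 
> {{long}}), assuming the loop is meant to exit via {{break}}/{{return}} rather 
> than by exhausting the bound:
> {code:java}
> public class LoopBound {
>   public static void main(String[] args) {
>     // With a long counter the condition is no longer a tautology and the
>     // counter cannot silently wrap around.
>     for (long i = 0; i < Long.MAX_VALUE; i++) {
>       if (i > 3) {
>         break; // stand-in for the test's real exit condition
>       }
>     }
>   }
> }
> {code}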



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16999) Fix wrong use of processFirstBlockReport()

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16999:
--
  Component/s: namanode
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Fix wrong use of processFirstBlockReport()
> --
>
> Key: HDFS-16999
> URL: https://issues.apache.org/jira/browse/HDFS-16999
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> `processFirstBlockReport()` is used to process the first block report from a 
> datanode. It does not calculate a `toRemove` list because it assumes the 
> namenode holds no metadata about the datanode yet. However, if a datanode is 
> re-registered after restarting, its `blockReportCount` will be reset to 0. 
> That is to say, the first block report after a datanode restart will be 
> processed by `processFirstBlockReport()`. This is unreasonable because 
> metadata for the datanode already exists in the namenode at this time; if 
> redundant replica metadata is not removed promptly, blocks with insufficient 
> replicas cannot be reconstructed in time, which increases the risk of missing 
> blocks. In summary, `processFirstBlockReport()` should only be used when the 
> namenode restarts, not when the datanode restarts.
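>
> An illustrative sketch of the intended guard, with assumed names (this is 
> not the actual patch): take the lightweight first-report path only when the 
> namenode truly has no prior state for the storage, not merely because a 
> restarted datanode's report count was reset to 0.
> {code:java}
> // Assumed shapes, for illustration only -- not the BlockManager API.
> class StorageState {
>   int blockReportCount;
>   int numBlocks;
> }
>
> public class ReportPathChooser {
>   /** First-report path is safe only if the namenode has no prior state
>    *  for this storage (e.g. after a NameNode restart); otherwise the full
>    *  diff, including toRemove, must run. */
>   static boolean useFirstReportPath(StorageState s) {
>     return s.blockReportCount == 0 && s.numBlocks == 0;
>   }
> }
> {code}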



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17000) Potential infinite loop in TestDFSStripedOutputStreamUpdatePipeline.testDFSStripedOutputStreamUpdatePipeline

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17000:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Potential infinite loop in 
> TestDFSStripedOutputStreamUpdatePipeline.testDFSStripedOutputStreamUpdatePipeline
> 
>
> Key: HDFS-17000
> URL: https://issues.apache.org/jira/browse/HDFS-17000
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Marcono1234
>Assignee: Marcono1234
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The method 
> {{TestDFSStripedOutputStreamUpdatePipeline.testDFSStripedOutputStreamUpdatePipeline}}
>  contains the following line:
> {code}
> for (int i = 0; i < Long.MAX_VALUE; i++) {
> {code}
> [GitHub source 
> link|https://github.com/apache/hadoop/blob/4ee92efb73a90ae7f909e96de242d216ad6878b2/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSStripedOutputStreamUpdatePipeline.java#L48]
> Because {{i}} is an {{int}}, the condition {{i < Long.MAX_VALUE}} will always 
> be true and {{i}} will simply overflow.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16998) RBF: Add ops metrics for getSlowDatanodeReport in RouterClientActivity

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16998:
--
 Component/s: rbf
Target Version/s: 3.4.0

> RBF: Add ops metrics for getSlowDatanodeReport in RouterClientActivity
> --
>
> Key: HDFS-16998
> URL: https://issues.apache.org/jira/browse/HDFS-16998
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16997) Set the locale to avoid printing useless logs in BlockSender

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16997:
--
  Component/s: block placement
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Set the locale to avoid printing useless logs in BlockSender
> 
>
> Key: HDFS-16997
> URL: https://issues.apache.org/jira/browse/HDFS-16997
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: block placement
>Affects Versions: 3.4.0
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> In our production environment, if the Hadoop process is started in a 
> non-English locale, many unexpected error logs are printed. The following is 
> the error message printed by the datanode; 断开的管道 is the locale-specific 
> rendering of "Broken pipe".
> ```
> 2023-05-01 09:10:50,299 ERROR 
> org.apache.hadoop.hdfs.server.datanode.FileIoProvider: error in op 
> transferToSocketFully : 断开的管道
> 2023-05-01 09:10:50,299 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: BlockSender.sendChunks() 
> exception: 
> java.io.IOException: 断开的管道
> at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
> at 
> sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:428)
> at 
> sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:493)
> at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:608)
> at 
> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:242)
> at 
> org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:260)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:559)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:801)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:755)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:580)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:258)
> at java.lang.Thread.run(Thread.java:745)
> 2023-05-01 09:10:50,298 ERROR 
> org.apache.hadoop.hdfs.server.datanode.FileIoProvider: error in op 
> transferToSocketFully : 断开的管道
> 2023-05-01 09:10:50,298 ERROR 
> org.apache.hadoop.hdfs.server.datanode.FileIoProvider: error in op 
> transferToSocketFully : 断开的管道
> 2023-05-01 09:10:50,298 ERROR 
> org.apache.hadoop.hdfs.server.datanode.FileIoProvider: error in op 
> transferToSocketFully : 断开的管道
> 2023-05-01 09:10:50,298 ERROR 
> org.apache.hadoop.hdfs.server.datanode.FileIoProvider: error in op 
> transferToSocketFully : 断开的管道
> 2023-05-01 09:10:50,302 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: BlockSender.sendChunks() 
> exception: 
> java.io.IOException: 断开的管道
> at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
> at 
> sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:428)
> at 
> sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:493)
> at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:608)
> at 
> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:242)
> at 
> org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:260)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:559)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:801)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:755)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:580)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:258)
> at java.lang.Thread.run(Thread.java:745)
> 2023-05-01 09:10:50,303 ERROR 
> org.apache.hadoop.hdfs.server.datanode.FileIoProvider: error in op 
> transferToSocketFully : 断开的管道
> 2023-05-01 09:10:50,303 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: BlockSender.sendChunks() 
> exception: 
> java.io.IOException: 断开的管道
> at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
> at 
> sun.nio.ch.FileChann

[jira] [Updated] (HDFS-16999) Fix wrong use of processFirstBlockReport()

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16999:
--
Component/s: block placement
 (was: namanode)

> Fix wrong use of processFirstBlockReport()
> --
>
> Key: HDFS-16999
> URL: https://issues.apache.org/jira/browse/HDFS-16999
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: block placement
>Affects Versions: 3.4.0
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> `processFirstBlockReport()` is used to process the first block report from a 
> datanode. It does not calculate a `toRemove` list because it assumes the 
> namenode holds no metadata about the datanode yet. However, if a datanode is 
> re-registered after restarting, its `blockReportCount` will be reset to 0. 
> That is to say, the first block report after a datanode restart will be 
> processed by `processFirstBlockReport()`. This is unreasonable because 
> metadata for the datanode already exists in the namenode at this time; if 
> redundant replica metadata is not removed promptly, blocks with insufficient 
> replicas cannot be reconstructed in time, which increases the risk of missing 
> blocks. In summary, `processFirstBlockReport()` should only be used when the 
> namenode restarts, not when the datanode restarts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16998) RBF: Add ops metrics for getSlowDatanodeReport in RouterClientActivity

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16998:
--
Affects Version/s: 3.4.0

> RBF: Add ops metrics for getSlowDatanodeReport in RouterClientActivity
> --
>
> Key: HDFS-16998
> URL: https://issues.apache.org/jira/browse/HDFS-16998
> Project: Hadoop HDFS
>  Issue Type: Task
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16995) Remove unused parameters at NameNodeHttpServer#initWebHdfs

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16995:
--
  Component/s: webhdfs
 Hadoop Flags: Reviewed
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Remove unused parameters at NameNodeHttpServer#initWebHdfs
> --
>
> Key: HDFS-16995
> URL: https://issues.apache.org/jira/browse/HDFS-16995
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 3.4.0
>Reporter: Zhaohui Wang
>Assignee: Zhaohui Wang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16986) EC: Fix locationBudget in getListing()

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16986:
--
  Component/s: erasure-coding
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> EC: Fix locationBudget in getListing()
> --
>
> Key: HDFS-16986
> URL: https://issues.apache.org/jira/browse/HDFS-16986
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.4.0
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The current `locationBudget` is estimated using the `block_replication` in 
> `FileStatus`, which is unreasonable for EC files because it counts the number 
> of locations of an EC block as 1. We should take the ErasureCodingPolicy of 
> the files into account to keep the meaning of `locationBudget` consistent.
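>
> An illustrative sketch of the budgeting idea (assumed names, not the actual 
> listing code): an EC file's block group should be charged 
> dataUnits + parityUnits locations rather than its `block_replication`.
> {code:java}
> import org.apache.hadoop.hdfs.protocol.ErasureCodingPolicy;
> import org.apache.hadoop.hdfs.protocol.HdfsFileStatus;
>
> public final class LocationBudget {
>   static int locationsPerBlock(HdfsFileStatus status) {
>     ErasureCodingPolicy ec = status.getErasureCodingPolicy();
>     return ec != null
>         ? ec.getNumDataUnits() + ec.getNumParityUnits() // EC block group
>         : status.getReplication();                      // replicated block
>   }
> }
> {code}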



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16990) HttpFS Add Support getFileLinkStatus API

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16990:
--
Target Version/s: 3.4.0

> HttpFS Add Support getFileLinkStatus API
> 
>
> Key: HDFS-16990
> URL: https://issues.apache.org/jira/browse/HDFS-16990
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: httpfs
>Affects Versions: 3.4.0
>Reporter: Hualong Zhang
>Assignee: Hualong Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> HttpFS should implement the *getFileLinkStatus* API already implemented in 
> WebHDFS.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16988) Improve NameServices info at JournalNode web UI

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16988:
--
  Component/s: journal-node
   ui
 Target Version/s: 3.3.6, 3.4.0
Affects Version/s: 3.3.6
   3.4.0

> Improve NameServices info at JournalNode web UI
> ---
>
> Key: HDFS-16988
> URL: https://issues.apache.org/jira/browse/HDFS-16988
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node, ui
>Affects Versions: 3.4.0, 3.3.6
>Reporter: Zhaohui Wang
>Assignee: Zhaohui Wang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.6
>
> Attachments: Before.png, after.png
>
>
> If a nameservice is named xxx-abc-edg, only xxx is displayed on the 
> JournalNode web UI.
> If NS1 is named xxx-abc-edg and NS2 is named xxx-lmn-xyz, both nameservices 
> are shown as xxx on the JN web UI.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16985) Fix data missing issue when delete local block file.

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16985:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Fix data missing issue when delete local block file.
> 
>
> Key: HDFS-16985
> URL: https://issues.apache.org/jira/browse/HDFS-16985
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Chengwei Wang
>Assignee: Chengwei Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> We encountered several missing-block problems in our production cluster, 
> which runs HDFS on AWS EC2 + EBS.
> The root cause:
>  # the block has only 1 replica left and hasn't been reconstructed
>  # the DN checks that the block file exists during BlockSender construction
>  # the EBS check fails and throws FileNotFoundException (the EBS volume may 
> be in a fault condition)
>  # the DN invalidates the block and schedules async block deletion
>  # EBS is already back to normal when the DN deletes the block
>  # the block file is deleted permanently and can't be recovered
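>
> A hedged sketch of a defensive direction (an assumption, not necessarily the 
> committed fix): re-verify existence on the async deletion path, since the 
> earlier FileNotFoundException may have come from a transient volume fault 
> rather than a truly missing replica.
> {code:java}
> import java.io.File;
>
> public final class SafeReplicaDelete {
>   /** Returns true only if the replica file is still missing at deletion
>    *  time; if it reappeared, the earlier failure was transient and the
>    *  last copy must not be deleted permanently. */
>   static boolean confirmStillMissing(File blockFile) {
>     return !blockFile.exists();
>   }
> }
> {code}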



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16982) Use the right Quantiles Array for Inverse Quantiles snapshot

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16982:
--
 Target Version/s: 3.4.0
Affects Version/s: 3.4.0

> Use the right Quantiles Array for Inverse Quantiles snapshot 
> -
>
> Key: HDFS-16982
> URL: https://issues.apache.org/jira/browse/HDFS-16982
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, metrics
>Affects Versions: 3.4.0
>Reporter: Ravindra Dingankar
>Assignee: Ravindra Dingankar
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> HDFS-16949 introduced InverseQuantiles. However, when taking a snapshot of 
> InverseQuantiles, we were still trying to access values from the previous 
> snapshot based on the quantile array declared in MutableQuantiles: 
> Quantile(.50, .050), Quantile(.75, .025), Quantile(.90, .010), 
> Quantile(.95, .005), Quantile(.99, .001).
> InverseQuantiles won't have these values (except for Quantile(.50, .050)), 
> so apart from the 50th percentile the snapshot won't return any value for 
> the remaining quantiles.
> The fix is to use the correct quantile array to retrieve values during 
> snapshot. New unit tests verify this behavior.
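>
> An illustrative sketch of the mismatch (the inverse-quantile keys other than 
> 0.50 are assumed here): looking up the standard quantile keys in a map keyed 
> by inverse quantiles finds only the 0.50 entry.
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
>
> public class QuantileKeyMismatch {
>   public static void main(String[] args) {
>     Map<Double, Long> inverseEstimates = new HashMap<>();
>     for (double q : new double[] {0.50, 0.25, 0.10, 0.05, 0.01}) {
>       inverseEstimates.put(q, 42L); // dummy estimates keyed by inverse quantiles
>     }
>     for (double q : new double[] {0.50, 0.75, 0.90, 0.95, 0.99}) {
>       // prints a value for 0.50 only; every other lookup returns null
>       System.out.println(q + " -> " + inverseEstimates.get(q));
>     }
>   }
> }
> {code}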



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16983) Fix concat operation doesn't honor dfs.permissions.enabled

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16983:
--
Target Version/s: 3.4.0

> Fix concat operation doesn't honor dfs.permissions.enabled
> --
>
> Key: HDFS-16983
> URL: https://issues.apache.org/jira/browse/HDFS-16983
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The concat RPC calls FSDirConcatOp::verifySrcFiles() to check the source 
> files. This function performs a permission check on the srcs. Whether that 
> permission check runs should be decided by the dfs.permissions.enabled 
> configuration, and the 'pc' parameter is never null, so the current guard 
> never skips the check.
> We should therefore change 'if (pc != null)' to 'if (fsd.isPermissionEnabled())'.
> {code:java}
> // permission check for srcs
> if (pc != null) {
>   fsd.checkPathAccess(pc, iip, FsAction.READ); // read the file
>   fsd.checkParentAccess(pc, iip, FsAction.WRITE); // for delete
> } 
> {code}
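>
> A sketch of the corrected guard, following the change the description 
> proposes:
> {code:java}
> // permission check for srcs, honoring dfs.permissions.enabled
> if (fsd.isPermissionEnabled()) {
>   fsd.checkPathAccess(pc, iip, FsAction.READ); // read the file
>   fsd.checkParentAccess(pc, iip, FsAction.WRITE); // for delete
> }
> {code}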



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16981) Support getFileLinkStatus API in WebHDFS

2024-01-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-16981:
--
Target Version/s: 3.4.0

> Support getFileLinkStatus API in WebHDFS
> 
>
> Key: HDFS-16981
> URL: https://issues.apache.org/jira/browse/HDFS-16981
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 3.4.0
>Reporter: Hualong Zhang
>Assignee: Hualong Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2023-04-13-23-41-51-380.png
>
>
> WebHDFS should support getFileLinkStatus:
> !image-2023-04-13-23-41-51-380.png|width=670,height=187!
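>
> A hypothetical usage sketch once the API is in place (host, port, and path 
> are placeholders): getFileLinkStatus returns the status of the link itself 
> rather than of the link target.
> {code:java}
> import java.net.URI;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class GetFileLinkStatusExample {
>   public static void main(String[] args) throws Exception {
>     FileSystem fs = FileSystem.get(
>         URI.create("webhdfs://namenode:9870"), new Configuration());
>     FileStatus st = fs.getFileLinkStatus(new Path("/user/alice/link"));
>     // For a symlink, report its target; otherwise report the path itself.
>     System.out.println(st.isSymlink() ? st.getSymlink() : st.getPath());
>   }
> }
> {code}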



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org


