[jira] [Created] (HDFS-17420) [FGL] FSEditLogLoader supports fine-grained lock

2024-03-07 Thread ZanderXu (Jira)
ZanderXu created HDFS-17420:
---

 Summary: [FGL] FSEditLogLoader supports fine-grained lock
 Key: HDFS-17420
 URL: https://issues.apache.org/jira/browse/HDFS-17420
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: ZanderXu
Assignee: ZanderXu


[FGL] FSEditLogLoader supports fine-grained lock






[jira] [Created] (HDFS-17419) [FGL] CacheReplicationMonitor supports fine-grained lock

2024-03-07 Thread ZanderXu (Jira)
ZanderXu created HDFS-17419:
---

 Summary: [FGL] CacheReplicationMonitor supports fine-grained lock
 Key: HDFS-17419
 URL: https://issues.apache.org/jira/browse/HDFS-17419
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: ZanderXu
Assignee: ZanderXu









[jira] [Created] (HDFS-17418) [FGL] DatanodeAdminMonitor supports fine-grained locking

2024-03-07 Thread ZanderXu (Jira)
ZanderXu created HDFS-17418:
---

 Summary: [FGL] DatanodeAdminMonitor supports fine-grained locking
 Key: HDFS-17418
 URL: https://issues.apache.org/jira/browse/HDFS-17418
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: ZanderXu
Assignee: ZanderXu


[FGL] DatanodeAdminMonitor supports fine-grained locking.
 * DatanodeAdminBackoffMonitor
 * DatanodeAdminDefaultMonitor






[jira] [Created] (HDFS-17417) [FGL] Monitor in HeartbeatManager supports fine-grained lock

2024-03-07 Thread ZanderXu (Jira)
ZanderXu created HDFS-17417:
---

 Summary: [FGL] Monitor in HeartbeatManager supports fine-grained 
lock
 Key: HDFS-17417
 URL: https://issues.apache.org/jira/browse/HDFS-17417
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: ZanderXu
Assignee: ZanderXu


[FGL] Monitor in HeartbeatManager supports fine-grained lock.






[jira] [Created] (HDFS-17416) [FGL] Monitor threads in BlockManager.class support fine-grained lock

2024-03-07 Thread ZanderXu (Jira)
ZanderXu created HDFS-17416:
---

 Summary: [FGL] Monitor threads in BlockManager.class support 
fine-grained lock
 Key: HDFS-17416
 URL: https://issues.apache.org/jira/browse/HDFS-17416
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: ZanderXu
Assignee: ZanderXu


There are some monitor threads in BlockManager.class.

 

This ticket is used to make these threads support fine-grained locking (see the sketch after this list):
 * BlockReportProcessingThread
 * MarkedDeleteBlockScrubber
 * RedundancyMonitor
 * Reconstruction Queue Initializer
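
For context, here is a minimal sketch of what "fine-grained lock" means for such a monitor thread: instead of serializing on the single global namesystem lock, the thread takes only a block-manager (BM) lock, leaving the namespace (FS) lock free. The lock names and structure below are illustrative assumptions, not the actual FGL patch.

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch only: FGL splits the single global lock into an FS
// (namespace) lock and a BM (block manager) lock; the names are assumptions.
public class FglMonitorSketch {
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
  private final ReentrantReadWriteLock bmLock = new ReentrantReadWriteLock();

  // Coarse-grained: a monitor doing block work holds the one global lock,
  // blocking every namespace RPC while it runs.
  void scanCoarse(ReentrantReadWriteLock globalLock) {
    globalLock.writeLock().lock();
    try {
      // compute redundancy / reconstruction work ...
    } finally {
      globalLock.writeLock().unlock();
    }
  }

  // Fine-grained: the monitor takes only the BM lock, so namespace
  // operations guarded by fsLock can proceed concurrently.
  void scanFineGrained() {
    bmLock.writeLock().lock();
    try {
      // compute redundancy / reconstruction work ...
    } finally {
      bmLock.writeLock().unlock();
    }
  }
}
{code}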

 






[jira] [Created] (HDFS-17415) [FGL] RPCs in NamenodeProtocol support fine-grained lock

2024-03-07 Thread ZanderXu (Jira)
ZanderXu created HDFS-17415:
---

 Summary: [FGL] RPCs in NamenodeProtocol support fine-grained lock
 Key: HDFS-17415
 URL: https://issues.apache.org/jira/browse/HDFS-17415
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: ZanderXu
Assignee: ZanderXu


[FGL] RPCs in NamenodeProtocol support fine-grained lock.
 * getBlocks
 * getBlockKeys
 * getTransactionID
 * getMostRecentCheckpointTxId
 * rollEditLog
 * versionRequest
 * errorReport
 * registerSubordinateNamenode
 * startCheckpoint
 * endCheckpoint
 * getEditLogManifest
 * isUpgradeFinalized
 * isRollingUpgrade
 * getNextSPSPath






[jira] [Created] (HDFS-17414) [FGL] RPCs in DatanodeProtocol support fine-grained lock

2024-03-07 Thread ZanderXu (Jira)
ZanderXu created HDFS-17414:
---

 Summary: [FGL] RPCs in DatanodeProtocol support fine-grained lock
 Key: HDFS-17414
 URL: https://issues.apache.org/jira/browse/HDFS-17414
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: ZanderXu
Assignee: ZanderXu


[FGL] RPCs in DatanodeProtocol support fine-grained lock.
 * registerDatanode
 * sendHeartbeat
 * sendLifeline
 * blockReport
 * cacheReport
 * blockReceivedAndDeleted
 * errorReport
 * versionRequest
 * reportBadBlocks
 * commitBlockSynchronization






[jira] [Created] (HDFS-17413) [FGL] Client RPCs involving Cache supports fine-grained lock

2024-03-07 Thread ZanderXu (Jira)
ZanderXu created HDFS-17413:
---

 Summary: [FGL] Client RPCs involving Cache supports fine-grained 
lock
 Key: HDFS-17413
 URL: https://issues.apache.org/jira/browse/HDFS-17413
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: ZanderXu
Assignee: ZanderXu


This ticket is used to make the client RPCs involving Cache support fine-grained locking:
 * addCacheDirective
 * modifyCacheDirective
 * removeCacheDirective
 * listCacheDirectives
 * addCachePool
 * modifyCachePool
 * removeCachePool
 * listCachePools






[jira] [Updated] (HDFS-17389) [FGL] Client RPCs involving read process supports fine-grained lock

2024-03-07 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17389:

Description: 
The client read process involves many client RPCs. 

 

This ticket is used to make these RPCs support fine-grained lock.
 * getListing
 * getBatchedListing
 * listOpenFiles
 * getFileInfo
 * isFileClosed
 * getBlockLocations
 * reportBadBlocks
 * getServerDefaults
 * getStats
 * getReplicatedBlockStats
 * getECBlockGroupStats
 * getPreferredBlockSize
 * listCorruptFileBlocks
 * getContentSummary
 * getLocatedFileInfo
 * createEncryptionZone
 * msync
 * checkAccess
 * getFileLinkInfo
 * getLinkTarget
 * getDelegationToken
 * getDataEncryptionKey

  was:
The client read process involves many client RPCs. 

 

This ticket is used to make these RPCs support fine-grained lock.
 * getListing
 * getBatchedListing
 * listOpenFiles
 * getFileInfo
 * isFileClosed
 * getBlockLocations
 * reportBadBlocks
 * getServerDefaults
 * getStats
 * getReplicatedBlockStats
 * getECBlockGroupStats
 * getPreferredBlockSize
 * listCorruptFileBlocks
 * getContentSummary
 * getLocatedFileInfo
 * createEncryptionZone
 * msync
 * checkAccess


> [FGL] Client RPCs involving read process supports fine-grained lock
> ---
>
> Key: HDFS-17389
> URL: https://issues.apache.org/jira/browse/HDFS-17389
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> The client read process involves many client RPCs. 
>  
> This ticket is used to make these RPCs support fine-grained lock.
>  * getListing
>  * getBatchedListing
>  * listOpenFiles
>  * getFileInfo
>  * isFileClosed
>  * getBlockLocations
>  * reportBadBlocks
>  * getServerDefaults
>  * getStats
>  * getReplicatedBlockStats
>  * getECBlockGroupStats
>  * getPreferredBlockSize
>  * listCorruptFileBlocks
>  * getContentSummary
>  * getLocatedFileInfo
>  * createEncryptionZone
>  * msync
>  * checkAccess
>  * getFileLinkInfo
>  * getLinkTarget
>  * getDelegationToken
>  * getDataEncryptionKey






[jira] [Updated] (HDFS-17388) [FGL] Client RPCs involving write process supports fine-grained lock

2024-03-07 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17388:

Description: 
The client write process involves many client RPCs. 

 

This ticket is used to make these RPCs support fine-grained lock.
 * mkdir 
 * create
 * addBlock
 * abandonBlock
 * getAdditionalDatanode
 * updateBlockForPipeline
 * updatePipeline
 * fsync
 * commit
 * rename
 * rename2
 * append
 * renewLease
 * recoverLease
 * delete
 * createSymlink
 * renewDelegationToken
 * cancelDelegationToken

  was:
The client write process involves many client RPCs. 

 

This ticket is used to make these RPCs support fine-grained lock.
 * mkdir 
 * create
 * addBlock
 * abandonBlock
 * getAdditionalDatanode
 * updateBlockForPipeline
 * updatePipeline
 * fsync
 * commit
 * rename
 * rename2
 * append
 * renewLease
 * recoverLease
 * delete


> [FGL] Client RPCs involving write process supports fine-grained lock
> 
>
> Key: HDFS-17388
> URL: https://issues.apache.org/jira/browse/HDFS-17388
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> The client write process involves many client RPCs. 
>  
> This ticket is used to make these RPCs support fine-grained lock.
>  * mkdir 
>  * create
>  * addBlock
>  * abandonBlock
>  * getAdditionalDatanode
>  * updateBlockForPipeline
>  * updatePipeline
>  * fsync
>  * commit
>  * rename
>  * rename2
>  * append
>  * renewLease
>  * recoverLease
>  * delete
>  * createSymlink
>  * renewDelegationToken
>  * cancelDelegationToken






[jira] [Updated] (HDFS-17389) [FGL] Client RPCs involving read process supports fine-grained lock

2024-03-07 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17389:

Description: 
The client read process involves many client RPCs. 

 

This ticket is used to make these RPCs support fine-grained lock.
 * getListing
 * getBatchedListing
 * listOpenFiles
 * getFileInfo
 * isFileClosed
 * getBlockLocations
 * reportBadBlocks
 * getServerDefaults
 * getStats
 * getReplicatedBlockStats
 * getECBlockGroupStats
 * getPreferredBlockSize
 * listCorruptFileBlocks
 * getContentSummary
 * getLocatedFileInfo
 * createEncryptionZone
 * msync
 * checkAccess

  was:
The client read process involves many client RPCs. 

 

This ticket is used to make these RPCs support fine-grained lock.
 * getListing
 * getBatchedListing
 * listOpenFiles
 * getFileInfo
 * isFileClosed
 * getBlockLocations
 * reportBadBlocks
 * getServerDefaults
 * getStats
 * getReplicatedBlockStats
 * getECBlockGroupStats
 * getPreferredBlockSize
 * listCorruptFileBlocks
 * getContentSummary
 * getLocatedFileInfo
 * createEncryptionZone
 * msync


> [FGL] Client RPCs involving read process supports fine-grained lock
> ---
>
> Key: HDFS-17389
> URL: https://issues.apache.org/jira/browse/HDFS-17389
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> The client read process involves many client RPCs. 
>  
> This ticket is used to make these RPCs support fine-grained lock.
>  * getListing
>  * getBatchedListing
>  * listOpenFiles
>  * getFileInfo
>  * isFileClosed
>  * getBlockLocations
>  * reportBadBlocks
>  * getServerDefaults
>  * getStats
>  * getReplicatedBlockStats
>  * getECBlockGroupStats
>  * getPreferredBlockSize
>  * listCorruptFileBlocks
>  * getContentSummary
>  * getLocatedFileInfo
>  * createEncryptionZone
>  * msync
>  * checkAccess






[jira] [Updated] (HDFS-17410) [FGL] Client RPCs that changes file attributes supports fine-grained lock

2024-03-07 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17410:

Description: 
There are some client RPCs that are used to change file attributes.

This ticket is used to make these RPCs support fine-grained locking.
 * setReplication
 * getStoragePolicies
 * setStoragePolicy
 * unsetStoragePolicy
 * satisfyStoragePolicy
 * getStoragePolicy
 * setPermission
 * setOwner
 * setTimes
 * concat
 * truncate
 * setQuota
 * getQuotaUsage
 * modifyAclEntries
 * removeAclEntries
 * removeDefaultAcl
 * removeAcl
 * setAcl
 * getAclStatus
 * getEZForPath
 * listEncryptionZones
 * reencryptEncryptionZone
 * listReencryptionStatus
 * setXAttr
 * getXAttrs
 * listXAttrs
 * removeXAttr

  was:
There are some client RPCs that are used to change file attributes.

This ticket is used to make these RPCs support fine-grained locking.
 * setReplication
 * getStoragePolicies
 * setStoragePolicy
 * unsetStoragePolicy
 * getStoragePolicy
 * setPermission
 * setOwner
 * setTimes
 * concat
 * truncate
 * setQuota
 * getQuotaUsage
 * modifyAclEntries
 * removeAclEntries
 * removeDefaultAcl
 * removeAcl
 * setAcl
 * getAclStatus
 * getEZForPath
 * listEncryptionZones
 * reencryptEncryptionZone
 * listReencryptionStatus


> [FGL] Client RPCs that changes file attributes supports fine-grained lock
> -
>
> Key: HDFS-17410
> URL: https://issues.apache.org/jira/browse/HDFS-17410
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>
> There are some client RPCs that are used to change file attributes.
> This ticket is used to make these RPCs support fine-grained locking.
>  * setReplication
>  * getStoragePolicies
>  * setStoragePolicy
>  * unsetStoragePolicy
>  * satisfyStoragePolicy
>  * getStoragePolicy
>  * setPermission
>  * setOwner
>  * setTimes
>  * concat
>  * truncate
>  * setQuota
>  * getQuotaUsage
>  * modifyAclEntries
>  * removeAclEntries
>  * removeDefaultAcl
>  * removeAcl
>  * setAcl
>  * getAclStatus
>  * getEZForPath
>  * listEncryptionZones
>  * reencryptEncryptionZone
>  * listReencryptionStatus
>  * setXAttr
>  * getXAttrs
>  * listXAttrs
>  * removeXAttr






[jira] [Updated] (HDFS-17389) [FGL] Client RPCs involving read process supports fine-grained lock

2024-03-07 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17389:

Description: 
The client read process involves many client RPCs. 

 

This ticket is used to make these RPCs support fine-grained lock.
 * getListing
 * getBatchedListing
 * listOpenFiles
 * getFileInfo
 * isFileClosed
 * getBlockLocations
 * reportBadBlocks
 * getServerDefaults
 * getStats
 * getReplicatedBlockStats
 * getECBlockGroupStats
 * getPreferredBlockSize
 * listCorruptFileBlocks
 * getContentSummary
 * getLocatedFileInfo
 * createEncryptionZone
 * msync

  was:
The client read process involves many client RPCs. 

 

This ticket is used to make these RPCs support fine-grained lock.
 * getListing
 * getBatchedListing
 * listOpenFiles
 * getFileInfo
 * isFileClosed
 * getBlockLocations
 * reportBadBlocks
 * getServerDefaults
 * getStats
 * getReplicatedBlockStats
 * getECBlockGroupStats
 * getPreferredBlockSize
 * listCorruptFileBlocks
 * getContentSummary
 * getLocatedFileInfo
 * createEncryptionZone


> [FGL] Client RPCs involving read process supports fine-grained lock
> ---
>
> Key: HDFS-17389
> URL: https://issues.apache.org/jira/browse/HDFS-17389
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> The client read process involves many client RPCs. 
>  
> This ticket is used to make these RPCs support fine-grained lock.
>  * getListing
>  * getBatchedListing
>  * listOpenFiles
>  * getFileInfo
>  * isFileClosed
>  * getBlockLocations
>  * reportBadBlocks
>  * getServerDefaults
>  * getStats
>  * getReplicatedBlockStats
>  * getECBlockGroupStats
>  * getPreferredBlockSize
>  * listCorruptFileBlocks
>  * getContentSummary
>  * getLocatedFileInfo
>  * createEncryptionZone
>  * msync



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17412) [FGL] Client RPCs involving maintenance supports fine-grained lock

2024-03-07 Thread ZanderXu (Jira)
ZanderXu created HDFS-17412:
---

 Summary: [FGL] Client RPCs involving maintenance supports 
fine-grained lock
 Key: HDFS-17412
 URL: https://issues.apache.org/jira/browse/HDFS-17412
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: ZanderXu
Assignee: ZanderXu


There are multiple client RPCs that admins use to maintain the cluster.

 

This ticket is used to make these RPCs support fine-grained locking.
 * getDatanodeReport
 * getDatanodeStorageReport
 * setSafeMode
 * saveNamespace
 * metaSave
 * rollEdits
 * restoreFailedStorage
 * refreshNodes
 * finalizeUpgrade
 * upgradeStatus
 * rollingUpgrade
 * setBalancerBandwidth
 * getCurrentEditLogTxid
 * getEditsFromTxid
 * getHAServiceState
 * getSlowDatanodeReport
 * getEnclosingRoot






[jira] [Commented] (HDFS-17408) Reduce the number of quota calculations in FSDirRenameOp

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824608#comment-17824608
 ] 

ASF GitHub Bot commented on HDFS-17408:
---

ThinkerLei commented on PR #6608:
URL: https://github.com/apache/hadoop/pull/6608#issuecomment-1984976217

   > Hi @ThinkerLei , Please check if the failed unit tests are related to these changes.
   
   @Hexiaoqiao  Thanks for your reply, I will work on this soon.




> Reduce the number of quota calculations in FSDirRenameOp
> 
>
> Key: HDFS-17408
> URL: https://issues.apache.org/jira/browse/HDFS-17408
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Assignee: lei w
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Commented] (HDFS-17408) Reduce the number of quota calculations in FSDirRenameOp

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824606#comment-17824606
 ] 

ASF GitHub Bot commented on HDFS-17408:
---

Hexiaoqiao commented on PR #6608:
URL: https://github.com/apache/hadoop/pull/6608#issuecomment-1984973314

   Hi @ThinkerLei , Please check if the failed unit tests are related to these changes.




> Reduce the number of quota calculations in FSDirRenameOp
> 
>
> Key: HDFS-17408
> URL: https://issues.apache.org/jira/browse/HDFS-17408
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Assignee: lei w
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Commented] (HDFS-17380) FsImageValidation: remove inaccessible nodes

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824605#comment-17824605
 ] 

ASF GitHub Bot commented on HDFS-17380:
---

Hexiaoqiao commented on PR #6549:
URL: https://github.com/apache/hadoop/pull/6549#issuecomment-1984971681

   @szetszwo Thanks for your response.
   
   > This may not be acceptable in some use cases since the newly created files 
will be lost (i.e. data loss) if we recover from an earlier fsimage.
   
   Recovering from an earlier checkpoint will not lose data; it keeps both the fsimage and all edit logs up to the latest transaction.
   
   > If we remove the inaccessible inodes, we won't lose any files.
   
   When you talk about `inaccessible inode`, do you mean that unexpected NameNode logic causes some inodes to become unreachable?
   
   > this is just a tool to fix fsimages. Users may choose not to use it if they are fine with recovering from an earlier fsimage.
   
   +1. Will join the review once I understand what it will improve. Thanks again.




> FsImageValidation: remove inaccessible nodes
> 
>
> Key: HDFS-17380
> URL: https://issues.apache.org/jira/browse/HDFS-17380
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>  Labels: pull-request-available
>
> If a fsimage is corrupted,  it may have inaccessible nodes.  The 
> FsImageValidation tool currently is able to identify the inaccessible nodes 
> when validating the INodeMap.  This JIRA is to update the tool to remove the 
> inaccessible nodes and then save a new fsimage.
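
For intuition, an inode is "inaccessible" when following its parent pointers never reaches the root, so removing such nodes amounts to a sweep over the INodeMap. The sketch below is illustrative only; the types and traversal are simplified assumptions, not the tool's actual code.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for INode and the INodeMap (assumptions, not HDFS types).
class SimpleInode {
  final long id;
  final SimpleInode parent; // null means this inode has no parent
  SimpleInode(long id, SimpleInode parent) { this.id = id; this.parent = parent; }
}

public class InaccessibleInodeSketch {
  // Collect inodes whose parent chain never reaches the root; the tool would
  // then drop these and save a new fsimage.
  static List<SimpleInode> findInaccessible(Map<Long, SimpleInode> inodeMap,
                                            SimpleInode root) {
    List<SimpleInode> inaccessible = new ArrayList<>();
    for (SimpleInode inode : inodeMap.values()) {
      SimpleInode cur = inode;
      int steps = 0;
      // Cap the walk: a corrupted image could contain parent cycles.
      while (cur != null && cur != root && steps++ <= inodeMap.size()) {
        cur = cur.parent;
      }
      if (cur != root) {
        inaccessible.add(inode);
      }
    }
    return inaccessible;
  }
}
{code}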






[jira] [Commented] (HDFS-17364) Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824604#comment-17824604
 ] 

ASF GitHub Bot commented on HDFS-17364:
---

Hexiaoqiao commented on PR #6514:
URL: https://github.com/apache/hadoop/pull/6514#issuecomment-1984965658

   > @Hexiaoqiao @zhangshuyan0 Thanks for your reviews. I realized there's also 
an ElasticBufferPool in DFSStripedOutputStream. I'm thinking of handling that 
here as well. What do you think?
   
   +1 from my side.




> Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream
> 
>
> Key: HDFS-17364
> URL: https://issues.apache.org/jira/browse/HDFS-17364
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>
> DFSStripedInputStream uses ElasticByteBufferPool to allocate byte buffers for 
> the "curStripeBuf". This is used for non-positional (stateful) reads and is 
> allocated with a size of numDataBlocks * cellSize. For RS-6-3-1024k, that 
> means each DFSStripedInputStream could allocate a 6mb buffer. When the IS is 
> finished, the buffer is put back in the pool. Over time and with spikes of 
> concurrent reads, the pool grows and most of the buffers sit there unused.
>  
> WeakReferencedElasticByteBufferPool was introduced HADOOP-18105 and mitigates 
> this issue because the excess buffers can be GC'd once they are no longer 
> needed. We should use this same pool in DFSStripedInputStream
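
The change suggested here is small; a minimal sketch follows, using the WeakReferencedElasticByteBufferPool added by HADOOP-18105 and the buffer sizing from the description (numDataBlocks * cellSize, i.e. 6 x 1 MiB = 6 MiB for RS-6-3-1024k). The surrounding class and the read placeholder are assumptions for illustration.

{code:java}
import java.nio.ByteBuffer;
import org.apache.hadoop.io.ByteBufferPool;
import org.apache.hadoop.io.WeakReferencedElasticByteBufferPool;

public class StripeBufferPoolSketch {
  // Weakly referenced pool: idle buffers become GC-eligible instead of
  // accumulating after read spikes, which is the point of this JIRA.
  private static final ByteBufferPool POOL =
      new WeakReferencedElasticByteBufferPool();

  public static void main(String[] args) {
    int numDataBlocks = 6;            // RS-6-3-1024k data blocks
    int cellSize = 1024 * 1024;       // 1 MiB cells
    // Each curStripeBuf for stateful reads is numDataBlocks * cellSize = 6 MiB.
    ByteBuffer curStripeBuf = POOL.getBuffer(true, numDataBlocks * cellSize);
    try {
      // ... stateful (non-positional) read fills and drains the stripe buffer ...
    } finally {
      POOL.putBuffer(curStripeBuf);   // returned buffer is now only weakly held
    }
  }
}
{code}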






[jira] [Commented] (HDFS-17364) Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824603#comment-17824603
 ] 

ASF GitHub Bot commented on HDFS-17364:
---

Hexiaoqiao commented on code in PR #6514:
URL: https://github.com/apache/hadoop/pull/6514#discussion_r1517110356


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java:
##
@@ -3159,6 +3165,21 @@ private void initThreadsNumForStripedReads(int numThreads) {
     }
   }
 
+  private void initBufferPoolForStripedReads(boolean useWeakReference) {
+    if (STRIPED_READ_BUFFER_POOL != null) {
+      return;
+    }
+    synchronized (DFSClient.class) {

Review Comment:
   Got it.
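
The hunk above is cut off; for readability, here is a self-contained sketch of the double-checked initialization pattern it appears to use (the field type, the volatile modifier, and the body after the synchronized line are guesses, not the actual patch):

{code:java}
import org.apache.hadoop.io.ByteBufferPool;
import org.apache.hadoop.io.ElasticByteBufferPool;
import org.apache.hadoop.io.WeakReferencedElasticByteBufferPool;

public class PoolInitSketch {
  // volatile so the unsynchronized fast path reads a fully published pool.
  private static volatile ByteBufferPool STRIPED_READ_BUFFER_POOL;

  private void initBufferPoolForStripedReads(boolean useWeakReference) {
    if (STRIPED_READ_BUFFER_POOL != null) {
      return;                       // fast path: already initialized
    }
    synchronized (PoolInitSketch.class) {
      // Re-check under the lock: another thread may have won the race.
      if (STRIPED_READ_BUFFER_POOL == null) {
        STRIPED_READ_BUFFER_POOL = useWeakReference
            ? new WeakReferencedElasticByteBufferPool()
            : new ElasticByteBufferPool();
      }
    }
  }
}
{code}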





> Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream
> 
>
> Key: HDFS-17364
> URL: https://issues.apache.org/jira/browse/HDFS-17364
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>
> DFSStripedInputStream uses ElasticByteBufferPool to allocate byte buffers for 
> the "curStripeBuf". This is used for non-positional (stateful) reads and is 
> allocated with a size of numDataBlocks * cellSize. For RS-6-3-1024k, that 
> means each DFSStripedInputStream could allocate a 6mb buffer. When the IS is 
> finished, the buffer is put back in the pool. Over time and with spikes of 
> concurrent reads, the pool grows and most of the buffers sit there unused.
>  
> WeakReferencedElasticByteBufferPool was introduced HADOOP-18105 and mitigates 
> this issue because the excess buffers can be GC'd once they are no longer 
> needed. We should use this same pool in DFSStripedInputStream






[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824531#comment-17824531
 ] 

ASF GitHub Bot commented on HDFS-17299:
---

hadoop-yetus commented on PR #6612:
URL: https://github.com/apache/hadoop/pull/6612#issuecomment-1984392158

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 20s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  1s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 3 new or modified test files.  |
    _ branch-3.3 Compile Tests _ |
   | +0 :ok: |  mvndep  |  13m 44s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  22m 14s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  compile  |   2m 16s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  checkstyle  |   0m 38s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  mvnsite  |   1m 31s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  javadoc  |   1m 37s |  |  branch-3.3 passed  |
   | -1 :x: |  spotbugs  |   1m 27s | 
[/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6612/3/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html)
 |  hadoop-hdfs-project/hadoop-hdfs-client in branch-3.3 has 2 extant spotbugs 
warnings.  |
   | +1 :green_heart: |  shadedclient  |  22m  3s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 23s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 13s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 13s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 31s |  |  hadoop-hdfs-project: The 
patch generated 0 new + 249 unchanged - 3 fixed = 249 total (was 252)  |
   | +1 :green_heart: |  mvnsite  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  spotbugs  |   3m 26s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m  3s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 47s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  | 121m 34s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6612/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +0 :ok: |  asflicense  |   0m 27s |  |  ASF License check generated no 
output?  |
   |  |   | 225m 17s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor 
|
   |   | hadoop.hdfs.TestEncryptedTransfer |
   |   | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
   |   | hadoop.hdfs.server.datanode.TestBatchIbr |
   |   | hadoop.hdfs.server.datanode.TestBlockScanner |
   |   | hadoop.hdfs.TestLeaseRecovery2 |
   |   | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
   |   | hadoop.hdfs.TestErasureCodingPolicyWithSnapshotWithRandomECPolicy |
   |   | hadoop.hdfs.server.datanode.TestDataNodeFaultInjector |
   |   | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations |
   |   | hadoop.hdfs.server.datanode.TestBlockRecovery2 |
   |   | hadoop.hdfs.TestParallelUnixDomainRead |
   |   | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
   |   | hadoop.hdfs.TestDFSStripedOutputStreamWithRandomECPolicy |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6612/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6612 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux e6f838e8b04d 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   |

[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824518#comment-17824518
 ] 

ASF GitHub Bot commented on HDFS-17299:
---

ritegarg commented on PR #6612:
URL: https://github.com/apache/hadoop/pull/6612#issuecomment-1984310623

   > There are a few test failures. Can you please take a look? @ritegarg
   
   I was looking into the failures; they look like transient failures. The same tests run fine locally.




> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Assignee: Ritesh
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
> Attachments: repro.patch
>
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 600000 
> dfs.heartbeat.interval: 3 
> So it will take 1230000 ms (20.5 mins) to detect that the datanode is dead.
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
>  # DataStreamer is not able to find block locations that satisfy the rack 
> placement policy (one copy in each rack, which essentially means one copy in 
> each AZ)
>  # Since all the datanodes in that AZ are down but still alive to namenode, 
> the client gets different datanodes but still all of them are in the same AZ. 
> See logs below.
>  # HBase is not able to create a WAL file and it aborts the region server.
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=trueugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,061 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874--1594838129323:blk_1214652565_140946716
> 2023-12-16 17:17:44,179 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK]
> 2023-12-16 17:17:44,339 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652580_140946764, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,369 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,369 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764
> 2023-12-16 17:17:44,454 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK]
> 2023-12-16 17:17:44,522 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652594_140946796, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,712 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.pro
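
For reference on the 20.5-minute figure in the description above: the NameNode declares a datanode dead only after 2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval, the standard HDFS heartbeat-expiry formula. With the quoted values that is 2 * 600000 ms + 10 * 3000 ms = 1230000 ms, i.e. about 20.5 minutes during which every datanode in the downed AZ still looks alive to the NameNode, so block placement keeps handing them out.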

[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824478#comment-17824478
 ] 

ASF GitHub Bot commented on HDFS-17299:
---

shahrs87 commented on PR #6612:
URL: https://github.com/apache/hadoop/pull/6612#issuecomment-1984012754

   There are a few test failures. Can you please take a look? @ritegarg 




> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Assignee: Ritesh
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
> Attachments: repro.patch
>
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 600000 
> dfs.heartbeat.interval: 3 
> So it will take 1230000 ms (20.5 mins) to detect that the datanode is dead.
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
>  # DataStreamer is not able to find block locations that satisfy the rack 
> placement policy (one copy in each rack, which essentially means one copy in 
> each AZ)
>  # Since all the datanodes in that AZ are down but still alive to namenode, 
> the client gets different datanodes but still all of them are in the same AZ. 
> See logs below.
>  # HBase is not able to create a WAL file and it aborts the region server.
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=trueugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,061 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874--1594838129323:blk_1214652565_140946716
> 2023-12-16 17:17:44,179 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK]
> 2023-12-16 17:17:44,339 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652580_140946764, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,369 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,369 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764
> 2023-12-16 17:17:44,454 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK]
> 2023-12-16 17:17:44,522 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652594_140946796, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,712 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apa

[jira] [Commented] (HDFS-17146) Use the dfsadmin -reconfig command to initiate reconfiguration on all decommissioning datanodes.

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824473#comment-17824473
 ] 

ASF GitHub Bot commented on HDFS-17146:
---

hadoop-yetus commented on PR #6595:
URL: https://github.com/apache/hadoop/pull/6595#issuecomment-1983993404

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 20s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 24s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 45s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 39s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 46s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 41s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 10s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   2m  0s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m  6s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 42s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 41s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 31s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 46s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 31s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 57s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 19s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 215m 22s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/9/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 30s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 308m 42s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.namenode.TestNamenodeRetryCache |
   |   | hadoop.hdfs.server.namenode.TestReconstructStripedBlocks |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/9/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6595 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 908072e384a3 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 6c18e7912316a868d950d08f8525bd559629fa82 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/9/testReport/ |
   | Max. process+thread count | 4655 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/had

[jira] [Commented] (HDFS-17146) Use the dfsadmin -reconfig command to initiate reconfiguration on all decommissioning datanodes.

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824465#comment-17824465
 ] 

ASF GitHub Bot commented on HDFS-17146:
---

hadoop-yetus commented on PR #6595:
URL: https://github.com/apache/hadoop/pull/6595#issuecomment-1983953951

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 25s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  1s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | -1 :x: |  mvninstall  |   5m 31s | 
[/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/8/artifact/out/branch-mvninstall-root.txt)
 |  root in trunk failed.  |
   | +1 :green_heart: |  compile  |   1m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 38s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 46s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  3s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 44s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  24m  6s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 42s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 38s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 10s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 56s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m 30s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 228m 50s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/8/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 26s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 295m 34s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestEncryptionZonesWithKMS |
   |   | hadoop.hdfs.TestReconstructStripedFile |
   |   | hadoop.hdfs.TestErasureCodingPoliciesWithRandomECPolicy |
   |   | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA |
   |   | hadoop.hdfs.TestDFSStripedOutputStreamWithRandomECPolicy |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/8/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6595 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 75aec88bb203 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 6c18e7912316a868d950d08f8525bd559629fa82 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/jav

[jira] [Commented] (HDFS-17380) FsImageValidation: remove inaccessible nodes

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824462#comment-17824462
 ] 

ASF GitHub Bot commented on HDFS-17380:
---

szetszwo commented on PR #6549:
URL: https://github.com/apache/hadoop/pull/6549#issuecomment-1983883646

   @Hexiaoqiao , thanks for reviewing this!
   
   > ... We should recover from other fsimages first if one fsimage file is 
corrupted ...
   
   This may not be acceptable in some use cases since the newly created files 
will be lost (i.e. data loss) if we recover from an earlier fsimage.  If we 
remove the inaccessible inodes, we won't lose any files (i.e. no data loss).
   
   BTW, this is just a tool to fix fsimages.  Users may choose not to use it if they are fine with recovering from an earlier fsimage.




> FsImageValidation: remove inaccessible nodes
> 
>
> Key: HDFS-17380
> URL: https://issues.apache.org/jira/browse/HDFS-17380
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>  Labels: pull-request-available
>
> If a fsimage is corrupted,  it may have inaccessible nodes.  The 
> FsImageValidation tool currently is able to identify the inaccessible nodes 
> when validating the INodeMap.  This JIRA is to update the tool to remove the 
> inaccessible nodes and then save a new fsimage.






[jira] [Commented] (HDFS-17408) Reduce the number of quota calculations in FSDirRenameOp

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824460#comment-17824460
 ] 

ASF GitHub Bot commented on HDFS-17408:
---

hadoop-yetus commented on PR #6608:
URL: https://github.com/apache/hadoop/pull/6608#issuecomment-1983825438

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 46s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  48m 43s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 26s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 10s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 23s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 10s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 35s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 33s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  40m 36s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 15s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   1m 15s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 12s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  4s |  |  
hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 46 unchanged - 1 
fixed = 46 total (was 47)  |
   | +1 :green_heart: |  mvnsite  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 32s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 35s |  |  the patch passed  |
   | -1 :x: |  shadedclient  |  40m 26s |  |  patch has errors when building 
and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 302m 57s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6608/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 52s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 458m 11s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.namenode.TestDeleteRace |
   |   | hadoop.hdfs.web.TestFSMainOperationsWebHdfs |
   |   | hadoop.hdfs.TestErasureCodingPolicyWithSnapshotWithRandomECPolicy |
   |   | hadoop.hdfs.TestLeaseRecovery2 |
   |   | hadoop.hdfs.server.namenode.TestFSNamesystemLockReport |
   |   | hadoop.hdfs.TestTrashWithEncryptionZones |
   |   | hadoop.fs.viewfs.TestViewFSOverloadSchemeWithMountTableConfigInHDFS |
   |   | hadoop.fs.contract.hdfs.TestHDFSContractAppend |
   |   | hadoop.hdfs.server.namenode.TestNNThroughputBenchmark |
   |   | hadoop.hdfs.TestDFSShell |
   |   | hadoop.hdfs.TestFileCreation |
   |   | hadoop.fs.viewfs.TestViewFsHdfs |
   |   | hadoop.fs.contract.hdfs.TestHDFSContractRename |
   |   | hadoop.hdfs.TestDFSUpgradeFromImage |
   |   | hadoop.hdfs.server.namenode.TestReencryption |
   |   | hadoop.fs.viewfs.TestViewFileSystemLinkFallback |
   |   | hadoop.hdfs.TestDFSRename |
   |   | hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport |
   |   | hadoop.cli.TestAclCLI |
   |   | hadoop.hdfs.web.TestWebHdfsFileSystemContract |
   |   | hadoop.hdfs.tools.offli

[jira] [Commented] (HDFS-17146) Use the dfsadmin -reconfig command to initiate reconfiguration on all decommissioning datanodes.

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824459#comment-17824459
 ] 

ASF GitHub Bot commented on HDFS-17146:
---

hadoop-yetus commented on PR #6595:
URL: https://github.com/apache/hadoop/pull/6595#issuecomment-1983822417

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|--------:|:--------|:------:|:-----:|
   | +0 :ok: |  reexec  |   0m 56s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  48m  8s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 29s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   1m 15s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 16s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 25s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  8s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 23s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  40m 40s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 23s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   1m 23s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 11s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  7s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 16s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 57s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 29s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 21s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  41m 20s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 291m 14s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/7/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 57s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 448m  1s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestLargeBlockReport |
   |   | hadoop.hdfs.protocol.TestBlockListAsLongs |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/7/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6595 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 6d1ecdf07d36 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 53ff41a79cd2904c76053cfca956b0511270b1ec |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/7/testReport/ |
   | Max. process+thread count | 2596 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | C

[jira] [Commented] (HDFS-17391) Adjust the checkpoint io buffer size to the chunk size

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824423#comment-17824423
 ] 

ASF GitHub Bot commented on HDFS-17391:
---

hadoop-yetus commented on PR #6594:
URL: https://github.com/apache/hadoop/pull/6594#issuecomment-1983556930

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|--------:|:--------|:------:|:-----:|
   | +0 :ok: |  reexec  |   0m 20s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 42s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 36s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 44s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 39s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  3s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 48s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 33s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 28s |  |  
hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 4 unchanged - 4 
fixed = 4 total (was 8)  |
   | +1 :green_heart: |  mvnsite  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 33s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 43s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 19s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 197m 48s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6594/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 28s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 286m 18s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.tools.TestDFSAdmin |
   |   | hadoop.hdfs.server.datanode.TestLargeBlockReport |
   |   | hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand |
   |   | hadoop.hdfs.protocol.TestBlockListAsLongs |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6594/5/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6594 |
   | JIRA Issue | HDFS-17391 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux f97425500da8 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 595e396fa499ab7b0a67ad1d9f4d4d762a14e260 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-U

[jira] [Commented] (HDFS-17364) Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824417#comment-17824417
 ] 

ASF GitHub Bot commented on HDFS-17364:
---

bbeaudreault commented on PR #6514:
URL: https://github.com/apache/hadoop/pull/6514#issuecomment-1983545666

   @Hexiaoqiao @zhangshuyan0 Thanks for your reviews. I realized there's also 
an ElasticByteBufferPool in DFSStripedOutputStream. I'm thinking of handling 
that here as well. What do you think?




> Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream
> 
>
> Key: HDFS-17364
> URL: https://issues.apache.org/jira/browse/HDFS-17364
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>
> DFSStripedInputStream uses ElasticByteBufferPool to allocate byte buffers for 
> the "curStripeBuf". This is used for non-positional (stateful) reads and is 
> allocated with a size of numDataBlocks * cellSize. For RS-6-3-1024k, that 
> means each DFSStripedInputStream could allocate a 6 MB buffer. When the input 
> stream is finished, the buffer is put back in the pool. Over time and with 
> spikes of concurrent reads, the pool grows and most of the buffers sit there 
> unused.
>  
> WeakReferencedElasticByteBufferPool was introduced in HADOOP-18105 and 
> mitigates this issue because the excess buffers can be GC'd once they are no 
> longer needed. We should use this same pool in DFSStripedInputStream.
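
For illustration, a minimal sketch of the proposed swap (the pool classes are 
from hadoop-common per HADOOP-18105; the surrounding class, fields, and 
methods here are illustrative, not the actual patch):

{code:java}
import java.nio.ByteBuffer;
import org.apache.hadoop.io.ByteBufferPool;
import org.apache.hadoop.io.WeakReferencedElasticByteBufferPool;

class StripeBufferExample {
  // Weakly referenced pool: idle buffers become GC-eligible, so spikes of
  // concurrent reads no longer grow the pool permanently.
  private static final ByteBufferPool BUFFER_POOL =
      new WeakReferencedElasticByteBufferPool();

  // For RS-6-3-1024k: 6 data blocks * 1 MB cell size = a 6 MB buffer.
  static ByteBuffer allocateStripeBuffer(int numDataBlocks, int cellSize) {
    return BUFFER_POOL.getBuffer(false /* on-heap */, numDataBlocks * cellSize);
  }

  static void releaseStripeBuffer(ByteBuffer buf) {
    BUFFER_POOL.putBuffer(buf); // returned buffer is now only weakly held
  }
}
{code}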



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17364) Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824416#comment-17824416
 ] 

ASF GitHub Bot commented on HDFS-17364:
---

bbeaudreault commented on code in PR #6514:
URL: https://github.com/apache/hadoop/pull/6514#discussion_r1516181907


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java:
##
@@ -3159,6 +3165,21 @@ private void initThreadsNumForStripedReads(int 
numThreads) {
 }
   }
 
+  private void initBufferPoolForStripedReads(boolean useWeakReference) {
+if (STRIPED_READ_BUFFER_POOL != null) {
+  return;
+}
+synchronized (DFSClient.class) {

Review Comment:
   @Hexiaoqiao thanks for review. For this block, it's sort of modeled after 
other examples in DFSClient, such [as initializing the striped read thread 
pool](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java#L3150).
 I think the idea is that DFSClient could easily be used in multiple threads, 
so we want to avoid double initializing the shared resource.
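
For reference, a minimal sketch of the double-checked initialization pattern 
being described (the field and method names follow the snippet above; the 
volatile modifier and pool construction are assumptions, not the actual 
patch):

{code:java}
private static volatile ByteBufferPool STRIPED_READ_BUFFER_POOL;

private void initBufferPoolForStripedReads(boolean useWeakReference) {
  if (STRIPED_READ_BUFFER_POOL != null) {
    return; // fast path: another DFSClient already initialized the pool
  }
  synchronized (DFSClient.class) {
    // Re-check under the lock: two threads may both pass the fast path,
    // but only one of them should create the shared pool.
    if (STRIPED_READ_BUFFER_POOL == null) {
      STRIPED_READ_BUFFER_POOL = useWeakReference
          ? new WeakReferencedElasticByteBufferPool()
          : new ElasticByteBufferPool();
    }
  }
}
{code}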





> Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream
> 
>
> Key: HDFS-17364
> URL: https://issues.apache.org/jira/browse/HDFS-17364
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>
> DFSStripedInputStream uses ElasticByteBufferPool to allocate byte buffers for 
> the "curStripeBuf". This is used for non-positional (stateful) reads and is 
> allocated with a size of numDataBlocks * cellSize. For RS-6-3-1024k, that 
> means each DFSStripedInputStream could allocate a 6 MB buffer. When the input 
> stream is finished, the buffer is put back in the pool. Over time and with 
> spikes of concurrent reads, the pool grows and most of the buffers sit there 
> unused.
>  
> WeakReferencedElasticByteBufferPool was introduced in HADOOP-18105 and 
> mitigates this issue because the excess buffers can be GC'd once they are no 
> longer needed. We should use this same pool in DFSStripedInputStream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17364) Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824415#comment-17824415
 ] 

ASF GitHub Bot commented on HDFS-17364:
---

bbeaudreault commented on code in PR #6514:
URL: https://github.com/apache/hadoop/pull/6514#discussion_r1516179674


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java:
##
@@ -530,6 +530,9 @@ interface StripedRead {
  * span 6 DNs, so this default value accommodates 3 read streams
  */
 int THREADPOOL_SIZE_DEFAULT = 18;
+
+String WEAK_REF_BUFFER_POOL_KEY = PREFIX + 
"bufferpool.weak.references.enabled";
+boolean WEAK_REF_BUFFER_POOL_DEFAULT = false;

Review Comment:
   Will do





> Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream
> 
>
> Key: HDFS-17364
> URL: https://issues.apache.org/jira/browse/HDFS-17364
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>
> DFSStripedInputStream uses ElasticByteBufferPool to allocate byte buffers for 
> the "curStripeBuf". This is used for non-positional (stateful) reads and is 
> allocated with a size of numDataBlocks * cellSize. For RS-6-3-1024k, that 
> means each DFSStripedInputStream could allocate a 6 MB buffer. When the input 
> stream is finished, the buffer is put back in the pool. Over time and with 
> spikes of concurrent reads, the pool grows and most of the buffers sit there 
> unused.
>  
> WeakReferencedElasticByteBufferPool was introduced in HADOOP-18105 and 
> mitigates this issue because the excess buffers can be GC'd once they are no 
> longer needed. We should use this same pool in DFSStripedInputStream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17368) HA: Standby should exit safemode when resources recover from low availability

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824410#comment-17824410
 ] 

ASF GitHub Bot commented on HDFS-17368:
---

Hexiaoqiao commented on code in PR #6518:
URL: https://github.com/apache/hadoop/pull/6518#discussion_r1516122704


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
##
@@ -1582,6 +1582,10 @@ void startStandbyServices(final Configuration conf, 
boolean isObserver)
   standbyCheckpointer = new StandbyCheckpointer(conf, this);
   standbyCheckpointer.start();
 }
+if (isNoManualAndResourceLowSafeMode()) {
+  LOG.info("Standby should not enter safe mode when resources are low, 
exiting safe mode.");
+  leaveSafeMode(false);

Review Comment:
   It looks reasonable at first glance, but I have not thought it through 
carefully. Are there any cases that could trigger the Standby to leave 
safemode at the wrong time? Thanks.





> HA: Standby should exit safemode when resources recover from low availability
> -
>
> Key: HDFS-17368
> URL: https://issues.apache.org/jira/browse/HDFS-17368
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zilong Zhu
>Assignee: Zilong Zhu
>Priority: Major
>  Labels: pull-request-available
>
> The NameNodeResourceMonitor automatically enters safemode when it detects 
> that the resources are not sufficient. The NNRM runs only on the ANN. If both 
> the ANN and the SNN enter SM due to low resources, and later the SNN's disk 
> space is restored, the SNN will become ANN and the ANN will become SNN. 
> However, at this point, the new SNN will not exit the SM, even if its disk 
> is recovered.
> Consider the following scenario:
>  * Initially, nn-1 is active and nn-2 is standby. With insufficient resources 
> in both nn-1's and nn-2's dfs.namenode.name.dir, the NameNodeResourceMonitor 
> detects the resource issue and puts nn-1 into safemode.
>  * At this point, nn-1 is in safemode (ON) and active, while nn-2 is in 
> safemode (OFF) and standby.
>  * After a period of time, the resources in nn-2's dfs.namenode.name.dir 
> recover, triggering failover.
>  * Now, nn-1 is in safe mode (ON) and standby, while nn-2 is in safe mode 
> (OFF) and active.
>  * Afterward, the resources in nn-1's dfs.namenode.name.dir recover.
>  * However, since nn-1 is standby but in safemode (ON), it is unable to exit 
> safe mode automatically.
> There are two possible ways to fix this issue:
>  # If the SNN is detected to be in SM (due to low resources), it will exit; 
> see the sketch below.
>  # Or, since we already have HDFS-17231, we can revert HDFS-2914, bringing 
> the NNRM back to the SNN.
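
A minimal sketch of option 1, matching the diff quoted above from 
FSNamesystem#startStandbyServices (everything outside the three quoted lines 
is an assumption for illustration):

{code:java}
// At the end of FSNamesystem#startStandbyServices: the NNRM that would
// normally lift a resource-low safemode runs only on the active NameNode,
// so a standby that inherited that state has to leave it explicitly.
if (isNoManualAndResourceLowSafeMode()) {
  LOG.info("Standby should not enter safe mode when resources are low, "
      + "exiting safe mode.");
  leaveSafeMode(false);
}
{code}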



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17368) HA: Standby should exit safemode when resources recover from low availability

2024-03-07 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824403#comment-17824403
 ] 

Xiaoqiao He commented on HDFS-17368:


Added [~zilong zhu] to the contributor list and assigned this ticket to them.

> HA: Standby should exit safemode when resources recover from low availability
> -
>
> Key: HDFS-17368
> URL: https://issues.apache.org/jira/browse/HDFS-17368
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zilong Zhu
>Assignee: Zilong Zhu
>Priority: Major
>  Labels: pull-request-available
>
> The NameNodeResourceMonitor automatically enters safemode when it detects 
> that the resources are not sufficient. The NNRM runs only on the ANN. If both 
> the ANN and the SNN enter SM due to low resources, and later the SNN's disk 
> space is restored, the SNN will become ANN and the ANN will become SNN. 
> However, at this point, the new SNN will not exit the SM, even if its disk 
> is recovered.
> Consider the following scenario:
>  * Initially, nn-1 is active and nn-2 is standby. With insufficient resources 
> in both nn-1's and nn-2's dfs.namenode.name.dir, the NameNodeResourceMonitor 
> detects the resource issue and puts nn-1 into safemode.
>  * At this point, nn-1 is in safemode (ON) and active, while nn-2 is in 
> safemode (OFF) and standby.
>  * After a period of time, the resources in nn-2's dfs.namenode.name.dir 
> recover, triggering failover.
>  * Now, nn-1 is in safe mode (ON) and standby, while nn-2 is in safe mode 
> (OFF) and active.
>  * Afterward, the resources in nn-1's dfs.namenode.name.dir recover.
>  * However, since nn-1 is standby but in safemode (ON), it is unable to exit 
> safe mode automatically.
> There are two possible ways to fix this issue:
>  # If the SNN is detected to be in SM (due to low resources), it will exit.
>  # Or, since we already have HDFS-17231, we can revert HDFS-2914, bringing 
> the NNRM back to the SNN.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-17368) HA: Standby should exit safemode when resources recover from low availability

2024-03-07 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He reassigned HDFS-17368:
--

Assignee: Zilong Zhu

> HA: Standby should exit safemode when resources recover from low availability
> -
>
> Key: HDFS-17368
> URL: https://issues.apache.org/jira/browse/HDFS-17368
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Zilong Zhu
>Assignee: Zilong Zhu
>Priority: Major
>  Labels: pull-request-available
>
> The NameNodeResourceMonitor automatically enters safemode when it detects 
> that the resources are not sufficient. The NNRM runs only on the ANN. If both 
> the ANN and the SNN enter SM due to low resources, and later the SNN's disk 
> space is restored, the SNN will become ANN and the ANN will become SNN. 
> However, at this point, the new SNN will not exit the SM, even if its disk 
> is recovered.
> Consider the following scenario:
>  * Initially, nn-1 is active and nn-2 is standby. With insufficient resources 
> in both nn-1's and nn-2's dfs.namenode.name.dir, the NameNodeResourceMonitor 
> detects the resource issue and puts nn-1 into safemode.
>  * At this point, nn-1 is in safemode (ON) and active, while nn-2 is in 
> safemode (OFF) and standby.
>  * After a period of time, the resources in nn-2's dfs.namenode.name.dir 
> recover, triggering failover.
>  * Now, nn-1 is in safe mode (ON) and standby, while nn-2 is in safe mode 
> (OFF) and active.
>  * Afterward, the resources in nn-1's dfs.namenode.name.dir recover.
>  * However, since nn-1 is standby but in safemode (ON), it is unable to exit 
> safe mode automatically.
> There are two possible ways to fix this issue:
>  # If the SNN is detected to be in SM (due to low resources), it will exit.
>  # Or, since we already have HDFS-17231, we can revert HDFS-2914, bringing 
> the NNRM back to the SNN.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17364) Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824398#comment-17824398
 ] 

ASF GitHub Bot commented on HDFS-17364:
---

Hexiaoqiao commented on code in PR #6514:
URL: https://github.com/apache/hadoop/pull/6514#discussion_r1516099526


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java:
##
@@ -530,6 +530,9 @@ interface StripedRead {
  * span 6 DNs, so this default value accommodates 3 read streams
  */
 int THREADPOOL_SIZE_DEFAULT = 18;
+
+String WEAK_REF_BUFFER_POOL_KEY = PREFIX + 
"bufferpool.weak.references.enabled";
+boolean WEAK_REF_BUFFER_POOL_DEFAULT = false;

Review Comment:
   Please also add this default config to core-default.xml.
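
As a side note, a hedged sketch of how the new key would be consumed on the 
client side (the exact call site in DFSClient is an assumption):

{code:java}
boolean useWeakRef = conf.getBoolean(
    HdfsClientConfigKeys.StripedRead.WEAK_REF_BUFFER_POOL_KEY,
    HdfsClientConfigKeys.StripedRead.WEAK_REF_BUFFER_POOL_DEFAULT);
initBufferPoolForStripedReads(useWeakRef);
{code}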



##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java:
##
@@ -3159,6 +3165,21 @@ private void initThreadsNumForStripedReads(int 
numThreads) {
 }
   }
 
+  private void initBufferPoolForStripedReads(boolean useWeakReference) {
+if (STRIPED_READ_BUFFER_POOL != null) {
+  return;
+}
+synchronized (DFSClient.class) {

Review Comment:
   What this `synchronized` would like to protect?





> Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream
> 
>
> Key: HDFS-17364
> URL: https://issues.apache.org/jira/browse/HDFS-17364
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bryan Beaudreault
>Priority: Major
>  Labels: pull-request-available
>
> DFSStripedInputStream uses ElasticByteBufferPool to allocate byte buffers for 
> the "curStripeBuf". This is used for non-positional (stateful) reads and is 
> allocated with a size of numDataBlocks * cellSize. For RS-6-3-1024k, that 
> means each DFSStripedInputStream could allocate a 6 MB buffer. When the input 
> stream is finished, the buffer is put back in the pool. Over time and with 
> spikes of concurrent reads, the pool grows and most of the buffers sit there 
> unused.
>  
> WeakReferencedElasticByteBufferPool was introduced in HADOOP-18105 and 
> mitigates this issue because the excess buffers can be GC'd once they are no 
> longer needed. We should use this same pool in DFSStripedInputStream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17401) EC: Excess internal block may not be able to be deleted correctly when it's stored in fallback storage

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824389#comment-17824389
 ] 

ASF GitHub Bot commented on HDFS-17401:
---

haiyang1987 commented on code in PR #6597:
URL: https://github.com/apache/hadoop/pull/6597#discussion_r1516059534


##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestReconstructStripedBlocks.java:
##
@@ -575,5 +576,82 @@ public void testReconstructionWithStorageTypeNotEnough() 
throws Exception {
   cluster.shutdown();
 }
   }
+  @Test
+  public void testDeleteOverReplicatedStripedBlock() throws Exception {
+final HdfsConfiguration conf = new HdfsConfiguration();
+conf.setInt(DFSConfigKeys.DFS_NAMENODE_REDUNDANCY_INTERVAL_SECONDS_KEY, 1);
+conf.setBoolean(DFSConfigKeys.DFS_NAMENODE_REDUNDANCY_CONSIDERLOAD_KEY,
+false);
+StorageType[][] st = new StorageType[groupSize + 2][1];
+for (int i = 0;i < st.length-1;i++){
+  st[i] = new StorageType[]{StorageType.SSD};
+}
+st[st.length -1] = new StorageType[]{StorageType.DISK};
+
+cluster = new MiniDFSCluster.Builder(conf).numDataNodes(groupSize + 2)
+.storagesPerDatanode(1)
+.storageTypes(st)
+.build();
+cluster.waitActive();
+DistributedFileSystem fs = cluster.getFileSystem();
+fs.enableErasureCodingPolicy(
+StripedFileTestUtil.getDefaultECPolicy().getName());
+try {
+  fs.mkdirs(dirPath);
+  fs.setErasureCodingPolicy(dirPath,
+  StripedFileTestUtil.getDefaultECPolicy().getName());
+  fs.setStoragePolicy(dirPath, HdfsConstants.ALLSSD_STORAGE_POLICY_NAME);
+  DFSTestUtil.createFile(fs, filePath,
+  cellSize * dataBlocks * 2, (short) 1, 0L);
+  FSNamesystem fsn3 = cluster.getNamesystem();
+  BlockManager bm3 = fsn3.getBlockManager();
+  // stop a dn

Review Comment:
   The first letter should be uppercase~





> EC: Excess internal block may not be able to be deleted correctly when it's 
> stored in fallback storage
> --
>
> Key: HDFS-17401
> URL: https://issues.apache.org/jira/browse/HDFS-17401
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.6
>Reporter: Ruinan Gu
>Assignee: Ruinan Gu
>Priority: Major
>  Labels: pull-request-available
>
> An excess internal block can't be deleted correctly when it's stored in 
> fallback storage.
> Simple case:
> An EC-RS-6-3-1024k file is stored using the ALL_SSD storage policy (SSD is 
> the default storage type and DISK is the fallback storage type). Suppose the 
> block group is as follows:
> [0(SSD), 0(SSD), 1(SSD), 2(SSD), 3(SSD), 4(SSD), 5(SSD), 6(SSD), 7(SSD), 
> 8(DISK)] 
> There are two index-0 internal blocks, and one of them should be chosen for 
> deletion. But the current implementation chooses the index-0 internal blocks 
> as candidates while using DISK as the excess storage type. As a result, the 
> excess storage type (DISK) does not correspond to the excess internal blocks' 
> storage type (SSD), and the excess internal block cannot be deleted correctly.
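
To make the mismatch concrete, a hypothetical sketch (not the actual 
BlockManager code) of why matching candidates against the policy's excess 
storage type fails here:

{code:java}
import java.util.List;
import org.apache.hadoop.fs.StorageType;
import org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo;

class ExcessReplicaExample {
  // In the case above: candidates are the two index-0 replicas, both on
  // SSD, while excessTypes is [DISK], derived from the policy's fallback.
  static DatanodeStorageInfo chooseExcessReplica(
      List<DatanodeStorageInfo> candidates, List<StorageType> excessTypes) {
    for (DatanodeStorageInfo storage : candidates) {
      if (excessTypes.contains(storage.getStorageType())) {
        // Never matches here: candidates are SSD, excess type is DISK.
        return storage;
      }
    }
    return null; // nothing chosen, so the duplicate internal block survives
  }
}
{code}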



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17410) [FGL] Client RPCs that changes file attributes supports fine-grained lock

2024-03-07 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17410:

Description: 
There are some client RPCs that are used to change file attributes.

This ticket is used to make these RPCs support fine-grained locking.
 * setReplication
 * getStoragePolicies
 * setStoragePolicy
 * unsetStoragePolicy
 * getStoragePolicy
 * setPermission
 * setOwner
 * setTimes
 * concat
 * truncate
 * setQuota
 * getQuotaUsage
 * modifyAclEntries
 * removeAclEntries
 * removeDefaultAcl
 * removeAcl
 * setAcl
 * getAclStatus
 * getEZForPath
 * listEncryptionZones
 * reencryptEncryptionZone
 * listReencryptionStatus

  was:
There are some client RPCs that are used to change file attributes.

This ticket is used to make these RPCs support fine-grained locking.
 * setReplication
 * getStoragePolicies
 * setStoragePolicy
 * unsetStoragePolicy
 * getStoragePolicy
 * setPermission
 * setOwner
 * setTimes
 * concat
 * truncate
 *  


> [FGL] Client RPCs that changes file attributes supports fine-grained lock
> -
>
> Key: HDFS-17410
> URL: https://issues.apache.org/jira/browse/HDFS-17410
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>
> There are some client RPCs that are used to change file attributes.
> This ticket is used to make these RPCs support fine-grained locking.
>  * setReplication
>  * getStoragePolicies
>  * setStoragePolicy
>  * unsetStoragePolicy
>  * getStoragePolicy
>  * setPermission
>  * setOwner
>  * setTimes
>  * concat
>  * truncate
>  * setQuota
>  * getQuotaUsage
>  * modifyAclEntries
>  * removeAclEntries
>  * removeDefaultAcl
>  * removeAcl
>  * setAcl
>  * getAclStatus
>  * getEZForPath
>  * listEncryptionZones
>  * reencryptEncryptionZone
>  * listReencryptionStatus



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17388) [FGL] Client RPCs involving write process supports fine-grained lock

2024-03-07 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17388:

Description: 
The client write process involves many client RPCs. 

 

This ticket is used to make these RPCs support fine-grained locking.
 * mkdir 
 * create
 * addBlock
 * abandonBlock
 * getAdditionalDatanode
 * updateBlockForPipeline
 * updatePipeline
 * fsync
 * commit
 * rename
 * rename2
 * append
 * renewLease
 * recoverLease
 * delete

  was:
The client write process involves many client RPCs. 

 

This ticket is used to make these RPCs support fine-grained locking.
 * mkdir 
 * create
 * addBlock
 * abandonBlock
 * getAdditionalDatanode
 * upadteBlockForPipeline
 * updatePipeline
 * fsync
 * commit
 * rename
 * rename2
 * append
 * renewLease
 * recoverLease


> [FGL] Client RPCs involving write process supports fine-grained lock
> 
>
> Key: HDFS-17388
> URL: https://issues.apache.org/jira/browse/HDFS-17388
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> The client write process involves many client RPCs. 
>  
> This ticket is used to make these RPCs support fine-grained locking.
>  * mkdir 
>  * create
>  * addBlock
>  * abandonBlock
>  * getAdditionalDatanode
>  * updateBlockForPipeline
>  * updatePipeline
>  * fsync
>  * commit
>  * rename
>  * rename2
>  * append
>  * renewLease
>  * recoverLease
>  * delete



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17411) [FGL] Client RPCs involving snapshot support fine-grained lock

2024-03-07 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17411:

Description: 
There are some client RPCs to handle snapshots.

This ticket is used to make these RPCs support fine-grained locking.
 * getSnapshottableDirListing
 * getSnapshotListing
 * createSnapshot
 * deleteSnapshot
 * renameSnapshot
 * allowSnapshot
 * disallowSnapshot
 * getSnapshotDiffReport
 * getSnapshotDiffReportListing

  was:
There are some client RPCs to handle snapshots.

This ticket is used to make these RPCs support fine-grained locking.
 *  


> [FGL] Client RPCs involving snapshot support fine-grained lock
> --
>
> Key: HDFS-17411
> URL: https://issues.apache.org/jira/browse/HDFS-17411
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>
> There are some client RPCs to handle snapshots.
> This ticket is used to make these RPCs support fine-grained locking.
>  * getSnapshottableDirListing
>  * getSnapshotListing
>  * createSnapshot
>  * deleteSnapshot
>  * renameSnapshot
>  * allowSnapshot
>  * disallowSnapshot
>  * getSnapshotDiffReport
>  * getSnapshotDiffReportListing



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17389) [FGL] Client RPCs involving read process supports fine-grained lock

2024-03-07 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17389:

Description: 
The client read process involves many client RPCs. 

 

This ticket is used to make these RPCs support fine-grained locking.
 * getListing
 * getBatchedListing
 * listOpenFiles
 * getFileInfo
 * isFileClosed
 * getBlockLocations
 * reportBadBlocks
 * getServerDefaults
 * getStats
 * getReplicatedBlockStats
 * getECBlockGroupStats
 * getPreferredBlockSize
 * listCorruptFileBlocks
 * getContentSummary
 * getLocatedFileInfo
 * createEncryptionZone

  was:
The client read process involves many client RPCs. 

 

This ticket is used to make these RPCs support fine-grained locking.
 * getListing
 * getBatchedListing
 * listOpenFiles
 * getFileInfo
 * isFileClosed
 * getBlockLocations
 * reportBadBlocks


> [FGL] Client RPCs involving read process supports fine-grained lock
> ---
>
> Key: HDFS-17389
> URL: https://issues.apache.org/jira/browse/HDFS-17389
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> The client read process involves many client RPCs. 
>  
> This ticket is used to make these RPCs support fine-grained locking.
>  * getListing
>  * getBatchedListing
>  * listOpenFiles
>  * getFileInfo
>  * isFileClosed
>  * getBlockLocations
>  * reportBadBlocks
>  * getServerDefaults
>  * getStats
>  * getReplicatedBlockStats
>  * getECBlockGroupStats
>  * getPreferredBlockSize
>  * listCorruptFileBlocks
>  * getContentSummary
>  * getLocatedFileInfo
>  * createEncryptionZone



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17411) [FGL] Client RPCs involving snapshot support fine-grained lock

2024-03-07 Thread ZanderXu (Jira)
ZanderXu created HDFS-17411:
---

 Summary: [FGL] Client RPCs involving snapshot support fine-grained 
lock
 Key: HDFS-17411
 URL: https://issues.apache.org/jira/browse/HDFS-17411
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: ZanderXu
Assignee: ZanderXu


There are some client RPCs to handle snapshots.

This ticket is used to make these RPCs support fine-grained locking.
 *  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17410) [FGL] Client RPCs that changes file attributes supports fine-grained lock

2024-03-07 Thread ZanderXu (Jira)
ZanderXu created HDFS-17410:
---

 Summary: [FGL] Client RPCs that changes file attributes supports 
fine-grained lock
 Key: HDFS-17410
 URL: https://issues.apache.org/jira/browse/HDFS-17410
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: ZanderXu
Assignee: ZanderXu


There are some client RPCs that are used to change file attributes.

This ticket is used to make these RPCs support fine-grained locking.
 * setReplication
 * getStoragePolicies
 * setStoragePolicy
 * unsetStoragePolicy
 * getStoragePolicy
 * setPermission
 * setOwner
 * setTimes
 * concat
 * truncate
 *  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17380) FsImageValidation: remove inaccessible nodes

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824353#comment-17824353
 ] 

ASF GitHub Bot commented on HDFS-17380:
---

Hexiaoqiao commented on PR #6549:
URL: https://github.com/apache/hadoop/pull/6549#issuecomment-1983209877

   Hi @szetszwo , thanks for your work. I am not sure this is a safe 
operation. The NameNode keeps at least 2 checkpoints by default 
(dfs.namenode.num.checkpoints.retained), and in production environments it is 
generally configured to more than the default value. IMO, if one fsimage file 
is corrupted, we should recover from the other fsimages first, rather than 
remove the inaccessible nodes and then recover. I am afraid this will not be 
acceptable in most cases. Thanks again.




> FsImageValidation: remove inaccessible nodes
> 
>
> Key: HDFS-17380
> URL: https://issues.apache.org/jira/browse/HDFS-17380
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>  Labels: pull-request-available
>
> If an fsimage is corrupted, it may have inaccessible nodes. The 
> FsImageValidation tool is currently able to identify the inaccessible nodes 
> when validating the INodeMap. This JIRA is to update the tool to remove the 
> inaccessible nodes and then save a new fsimage.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17389) [FGL] Client RPCs involving read process supports fine-grained lock

2024-03-07 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17389:

Description: 
The client read process involves many client RPCs. 

 

This ticket is used to make these RPCs support fine-grained locking.
 * getListing
 * getBatchedListing
 * listOpenFiles
 * getFileInfo
 * isFileClosed
 * getBlockLocations
 * reportBadBlocks

  was:The Create RPC minimizes the scope of the global BM lock, because it 
doesn't need the global BM lock in most scenarios.


> [FGL] Client RPCs involving read process supports fine-grained lock
> ---
>
> Key: HDFS-17389
> URL: https://issues.apache.org/jira/browse/HDFS-17389
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> The client read process involves many client RPCs. 
>  
> This ticket is used to make these RPCs support fine-grained locking.
>  * getListing
>  * getBatchedListing
>  * listOpenFiles
>  * getFileInfo
>  * isFileClosed
>  * getBlockLocations
>  * reportBadBlocks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824348#comment-17824348
 ] 

ASF GitHub Bot commented on HDFS-17299:
---

hadoop-yetus commented on PR #6612:
URL: https://github.com/apache/hadoop/pull/6612#issuecomment-1983180401

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|--------:|:--------|:------:|:-----:|
   | +0 :ok: |  reexec  |   0m 22s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 3 new or modified test files.  |
    _ branch-3.3 Compile Tests _ |
   | +0 :ok: |  mvndep  |  12m 59s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  22m 27s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  compile  |   2m 14s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  checkstyle  |   0m 37s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  mvnsite  |   1m 27s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  javadoc  |   1m 37s |  |  branch-3.3 passed  |
   | -1 :x: |  spotbugs  |   1m 26s | 
[/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6612/2/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html)
 |  hadoop-hdfs-project/hadoop-hdfs-client in branch-3.3 has 2 extant spotbugs 
warnings.  |
   | +1 :green_heart: |  shadedclient  |  23m 12s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 21s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m  9s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m  9s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 33s |  |  hadoop-hdfs-project: The 
patch generated 0 new + 249 unchanged - 3 fixed = 249 total (was 252)  |
   | +1 :green_heart: |  mvnsite  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  spotbugs  |   3m 18s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m  8s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 49s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  | 172m 34s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6612/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 31s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 276m 57s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.protocol.TestBlockListAsLongs |
   |   | hadoop.hdfs.server.mover.TestMover |
   |   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   |   | hadoop.hdfs.TestLeaseRecovery2 |
   |   | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |
   |   | hadoop.hdfs.server.datanode.TestLargeBlockReport |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6612/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6612 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 8029685ad3de 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | branch-3.3 / 5d4a6ed957d86f85618f70f27d11f6077336b16f |
   | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~18.04-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6612/2/testReport/ |
   | Max. process+thread count | 4424 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-client 
hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/

[jira] [Updated] (HDFS-17389) [FGL] Client RPCs involving read process supports fine-grained lock

2024-03-07 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17389:

Summary: [FGL] Client RPCs involving read process supports fine-grained 
lock  (was: [FGL] Create RPC minimizes the scope of the global BM lock)

> [FGL] Client RPCs involving read process supports fine-grained lock
> ---
>
> Key: HDFS-17389
> URL: https://issues.apache.org/jira/browse/HDFS-17389
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> The Create RPC minimizes the scope of the global BM lock, because it doesn't 
> need the global BM lock in most scenarios.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17388) [FGL] Client RPCs involving write process supports fine-grained lock

2024-03-07 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17388:

Description: 
The client write process involves many client RPCs. 

 

This ticket is used to make these RPCs support fine-grained locking.
 * mkdir 
 * create
 * addBlock
 * abandonBlock
 * getAdditionalDatanode
 * upadteBlockForPipeline
 * updatePipeline
 * fsync
 * commit
 * rename
 * rename2
 * append
 * renewLease
 * recoverLease

  was:
The create RPC involves only the directory tree if it creates a new file, and 
most cases are like this. It involves blocks only if the file already exists 
and the call tries to overwrite it.

So in most scenarios, the create RPC just needs the FS lock.

The current implementation holds the global write lock, so in order for the 
improvement to be better accepted, the first step is just to replace the lock 
mode without changing any logic; a sketch follows below. We can minimize the 
scope of the global BM lock in the second step.
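
A minimal sketch of that first step, assuming the FGL split of the global 
namesystem lock into an FS (directory tree) lock and a BM (block) lock; the 
lock-mode API shown here is an assumption for illustration, not the actual 
patch:

{code:java}
// Before: create() holds the single global write lock for everything.
namesystem.writeLock();
try {
  // resolve the path, check permissions, add the new INode ...
} finally {
  namesystem.writeUnlock("create");
}

// After (step 1): acquire only the FS write lock, since creating a new
// file touches just the directory tree in the common case. The logic
// inside the critical section is unchanged.
namesystem.writeLock(FSNamesystemLockMode.FS);
try {
  // same logic as before
} finally {
  namesystem.writeUnlock(FSNamesystemLockMode.FS, "create");
}
{code}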


> [FGL] Client RPCs involving write process supports fine-grained lock
> 
>
> Key: HDFS-17388
> URL: https://issues.apache.org/jira/browse/HDFS-17388
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> The client write process involves many client RPCs. 
>  
> This ticket is used to make these RPCs support fine-grained locking.
>  * mkdir 
>  * create
>  * addBlock
>  * abandonBlock
>  * getAdditionalDatanode
>  * upadteBlockForPipeline
>  * updatePipeline
>  * fsync
>  * commit
>  * rename
>  * rename2
>  * append
>  * renewLease
>  * recoverLease



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17388) [FGL] Client RPCs involving write process supports fine-grained lock

2024-03-07 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu updated HDFS-17388:

Summary: [FGL] Client RPCs involving write process supports fine-grained 
lock  (was: [FGL] Create RPC supports this fine-grained locking I)

> [FGL] Client RPCs involving write process supports fine-grained lock
> 
>
> Key: HDFS-17388
> URL: https://issues.apache.org/jira/browse/HDFS-17388
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> The create RPC involves only the directory tree if it creates a new file, 
> and most cases are like this. It involves blocks only if the file already 
> exists and the call tries to overwrite it.
> So in most scenarios, the create RPC just needs the FS lock.
> The current implementation holds the global write lock, so in order for the 
> improvement to be better accepted, the first step is just to replace the 
> lock mode without changing any logic. We can minimize the scope of the 
> global BM lock in the second step.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-17407) Exception during image upload

2024-03-07 Thread ruiliang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824329#comment-17824329
 ] 

ruiliang edited comment on HDFS-17407 at 3/7/24 9:29 AM:
-

After analyzing the logs and source code, the cause is that the two SbNNs 
initiated a checkpoint at the same time. When the latter one checked the file 
stream, it found that the file had already been updated and threw an 
exception. Should this really be reported as an exception?

SbNN 1 log

 
{code:java}
root@cluster06-yynn1:/data/logs/hadoop/hdfs# grep 57258734311 
hadoop-hdfs-namenode-cluster06-nn1.xx.com.log 
2024-03-07 16:48:00,061 INFO  namenode.FSImage (FSImage.java:loadEdits(887)) - 
Reading 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@4afc4056 
expecting start txid #57258734311
2024-03-07 16:48:00,061 INFO  namenode.FSImage 
(FSEditLogLoader.java:loadFSEdits(158)) - Start loading edits file 
http://fs-nn-party-65-190.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true,
 
http://fs-nn-party-65-191.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true
 maxTxnsToRead = 9223372036854775807
2024-03-07 16:48:00,061 INFO  namenode.RedundantEditLogInputStream 
(RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 
'http://fs-nn-party-65-190.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true,
 
http://fs-nn-party-65-191.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true'
 to transaction ID 57258734311
2024-03-07 16:48:00,061 INFO  namenode.RedundantEditLogInputStream 
(RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 
'http://fs-nn-party-65-190.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true'
 to transaction ID 57258734311
2024-03-07 16:48:02,592 INFO  namenode.FSImage 
(FSEditLogLoader.java:loadFSEdits(162)) - Edits file 
http://fs-nn-party-65-190.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true,
 
http://fs-nn-party-65-191.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true
 of size 35380849 edits # 214398 loaded in 2 seconds {code}
SbNN 2 log

 
{code:java}
root@cluster06-yynn3:/data/logs/hadoop/hdfs# grep 57258734311 
hadoop-hdfs-namenode-cluster06-nn3.xx.com.log
2024-03-07 16:48:32,536 INFO  namenode.FSImage (FSImage.java:loadEdits(887)) - 
Reading 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@6d0659cd 
expecting start txid #57258734311
2024-03-07 16:48:32,536 INFO  namenode.FSImage 
(FSEditLogLoader.java:loadFSEdits(158)) - Start loading edits file 
http://fs-nn-party-65-191.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true,
 
http://fs-nn-party-65-190.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true
 maxTxnsToRead = 9223372036854775807
2024-03-07 16:48:32,536 INFO  namenode.RedundantEditLogInputStream 
(RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 
'http://fs-nn-party-65-191.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true,
 
http://fs-nn-party-65-190.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true'
 to transaction ID 57258734311
2024-03-07 16:48:32,536 INFO  namenode.RedundantEditLogInputStream 
(RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 
'http://fs-nn-party-65-191.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true'
 to transaction ID 57258734311
2024-03-07 16:48:35,634 INFO  namenode.FSImage 
(FSEditLogLoader.java:loadFSEdits(162)) - Edits file 
http://fs-nn-party-65-191.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inP

[jira] [Comment Edited] (HDFS-17407) Exception during image upload

2024-03-07 Thread ruiliang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824329#comment-17824329
 ] 

ruiliang edited comment on HDFS-17407 at 3/7/24 9:26 AM:
-

After analyzing the logs and the source code, the cause is that the two SbNNs 
initiated a checkpoint at the same time. When the later one verified the file 
stream, it found that the file had already been updated, and it threw an 
exception. Should this really be reported as an exception?
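
For illustration, here is a minimal Java sketch of that race (the class, method, 
and field names are hypothetical, not the actual ImageServlet/TransferFsImage 
code): the active NN tracks the txid of the newest fsimage it has accepted, so 
when two SbNNs checkpoint the same txid, whichever upload arrives second is 
turned away, and the losing standby sees the rejection as an IOException.

{code:java}
// Hypothetical sketch only -- models the concurrent-checkpoint race described
// above, not the real Hadoop implementation.
import java.util.concurrent.atomic.AtomicLong;

public class ImageUploadRaceSketch {
  // txid of the newest fsimage this (active) NameNode has accepted so far
  private final AtomicLong latestImageTxId = new AtomicLong(-1);

  /** Called once per incoming image upload; only a strictly newer txid wins. */
  public void receiveImage(long uploadTxId, String uploader) {
    long prev = latestImageTxId.getAndAccumulate(uploadTxId, Math::max);
    if (uploadTxId <= prev) {
      // The loser of the race: on the standby side this surfaces as an
      // IOException ("Error writing request body to server") when the
      // server side drops the connection.
      throw new IllegalStateException("fsimage txid " + uploadTxId + " from "
          + uploader + " is not newer than already-accepted txid " + prev);
    }
    System.out.println("Accepted fsimage txid " + uploadTxId + " from " + uploader);
  }

  public static void main(String[] args) {
    ImageUploadRaceSketch activeNn = new ImageUploadRaceSketch();
    activeNn.receiveImage(57258734311L, "SbNN-1");   // first checkpoint wins
    try {
      activeNn.receiveImage(57258734311L, "SbNN-2"); // same txid -> rejected
    } catch (IllegalStateException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
{code}

Read this way, the second standby's failure is an expected outcome of the race 
rather than a fault, which is why logging it at ERROR level is questionable.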

SbNN 1 log

 
{code:java}
root@cluster06-yynn1:/data/logs/hadoop/hdfs# grep 57258734311 
hadoop-hdfs-namenode-cluster06-yynn1.xx.com.log 
2024-03-07 16:48:00,061 INFO  namenode.FSImage (FSImage.java:loadEdits(887)) - 
Reading 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@4afc4056 
expecting start txid #57258734311
2024-03-07 16:48:00,061 INFO  namenode.FSImage 
(FSEditLogLoader.java:loadFSEdits(158)) - Start loading edits file 
http://fs-nn-party-65-190.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true,
 
http://fs-nn-party-65-191.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true
 maxTxnsToRead = 9223372036854775807
2024-03-07 16:48:00,061 INFO  namenode.RedundantEditLogInputStream 
(RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 
'http://fs-nn-party-65-190.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true,
 
http://fs-nn-party-65-191.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true'
 to transaction ID 57258734311
2024-03-07 16:48:00,061 INFO  namenode.RedundantEditLogInputStream 
(RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 
'http://fs-nn-party-65-190.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true'
 to transaction ID 57258734311
2024-03-07 16:48:02,592 INFO  namenode.FSImage 
(FSEditLogLoader.java:loadFSEdits(162)) - Edits file 
http://fs-nn-party-65-190.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true,
 
http://fs-nn-party-65-191.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true
 of size 35380849 edits # 214398 loaded in 2 seconds {code}
SbNN 2 log

 
{code:java}
root@cluster06-yynn3:/data/logs/hadoop/hdfs# grep 57258734311 
hadoop-hdfs-namenode-cluster06-yynn3.xx.com.log
2024-03-07 16:48:32,536 INFO  namenode.FSImage (FSImage.java:loadEdits(887)) - 
Reading 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@6d0659cd 
expecting start txid #57258734311
2024-03-07 16:48:32,536 INFO  namenode.FSImage 
(FSEditLogLoader.java:loadFSEdits(158)) - Start loading edits file 
http://fs-nn-party-65-191.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true,
 
http://fs-nn-party-65-190.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true
 maxTxnsToRead = 9223372036854775807
2024-03-07 16:48:32,536 INFO  namenode.RedundantEditLogInputStream 
(RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 
'http://fs-nn-party-65-191.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true,
 
http://fs-nn-party-65-190.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true'
 to transaction ID 57258734311
2024-03-07 16:48:32,536 INFO  namenode.RedundantEditLogInputStream 
(RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 
'http://fs-nn-party-65-191.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true'
 to transaction ID 57258734311
2024-03-07 16:48:35,634 INFO  namenode.FSImage 
(FSEditLogLoader.java:lo

[jira] [Updated] (HDFS-17407) Exception during image upload

2024-03-07 Thread ruiliang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ruiliang updated HDFS-17407:

Issue Type: Improvement  (was: Bug)

> Exception during image upload
> -
>
> Key: HDFS-17407
> URL: https://issues.apache.org/jira/browse/HDFS-17407
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.1.0
> Environment: hadoop 3.1.0 
> linux:ubuntu 16.04
> ambari-hdp:3.1.1
>Reporter: ruiliang
>Priority: Major
>
> After I added a third HDFS namenode, the service itself was fine. However, the 
> service logs of the two Standby namenodes always show exceptions during image 
> upload. I observe that the image file on the primary node is being updated 
> normally, which indicates that a standby node has merged the image file and 
> uploaded it to the primary node. But I don't understand why the two Standby 
> namenodes keep emitting such exception logs. Is there a potential risk here?
>  
> namenode log 
> {code:java}
> 2024-03-01 15:31:46,162 INFO  namenode.TransferFsImage 
> (TransferFsImage.java:copyFileToStream(394)) - Sending fileName: 
> /data/hadoop/hdfs/namenode/current/fsimage_55689095810, fileSize: 
> 4626167848. Sent total: 1703936 bytes. Size of last segment intended to send: 
> 131072 bytes.
> java.io.IOException: Error writing request body to server
>         at 
> sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587)
>         at 
> sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570)
>         at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:376)
>         at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:320)
>         at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:294)
>         at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:229)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:236)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:231)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> 2024-03-01 15:31:46,630 INFO  blockmanagement.BlockManager 
> (BlockManager.java:enqueue(4923)) - Block report queue is full
> 2024-03-01 15:31:46,664 ERROR ha.StandbyCheckpointer 
> (StandbyCheckpointer.java:doWork(452)) - Exception in doCheckpoint
> java.io.IOException: Exception during image upload
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:257)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1500(StandbyCheckpointer.java:62)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.doWork(StandbyCheckpointer.java:432)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.access$600(StandbyCheckpointer.java:331)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread$1.run(StandbyCheckpointer.java:351)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:360)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
>         at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:480)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.run(StandbyCheckpointer.java:347)
> Caused by: java.util.concurrent.ExecutionException: java.io.IOException: 
> Error writing request body to server
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:250)
>         ... 9 more
> Caused by: java.io.IOException: Error writing request body to server
>         at 
> sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587)
>         at 
> sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570)
>         at 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(T

[jira] [Commented] (HDFS-17407) Exception during image upload

2024-03-07 Thread ruiliang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824329#comment-17824329
 ] 

ruiliang commented on HDFS-17407:
-

After analyzing the logs and the source code, the cause is that the two SbNNs 
initiated a checkpoint at the same time. When the later one verified the file 
stream, it found that the file had already been updated, and it threw an 
exception. Should this really be reported as an exception?

SbNN 1 log

 
{code:java}
root@fs-hiido-yycluster06-yynn1:/data/logs/hadoop/hdfs# grep 57258734311 
hadoop-hdfs-namenode-fs-hiido-yycluster06-yynn1.hiido.host.yydevops.com.log 
2024-03-07 16:48:00,061 INFO  namenode.FSImage (FSImage.java:loadEdits(887)) - 
Reading 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@4afc4056 
expecting start txid #57258734311
2024-03-07 16:48:00,061 INFO  namenode.FSImage 
(FSEditLogLoader.java:loadFSEdits(158)) - Start loading edits file 
http://fs-nn-party-65-190.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true,
 
http://fs-nn-party-65-191.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true
 maxTxnsToRead = 9223372036854775807
2024-03-07 16:48:00,061 INFO  namenode.RedundantEditLogInputStream 
(RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 
'http://fs-nn-party-65-190.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true,
 
http://fs-nn-party-65-191.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true'
 to transaction ID 57258734311
2024-03-07 16:48:00,061 INFO  namenode.RedundantEditLogInputStream 
(RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 
'http://fs-nn-party-65-190.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true'
 to transaction ID 57258734311
2024-03-07 16:48:02,592 INFO  namenode.FSImage 
(FSEditLogLoader.java:loadFSEdits(162)) - Edits file 
http://fs-nn-party-65-190.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true,
 
http://fs-nn-party-65-191.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true
 of size 35380849 edits # 214398 loaded in 2 seconds {code}
SbNN 2 log

 
{code:java}
root@fs-hiido-yycluster06-yynn3:/data/logs/hadoop/hdfs# grep 57258734311 
hadoop-hdfs-namenode-fs-hiido-yycluster06-yynn3.hiido.host.int.yy.com.log
2024-03-07 16:48:32,536 INFO  namenode.FSImage (FSImage.java:loadEdits(887)) - 
Reading 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@6d0659cd 
expecting start txid #57258734311
2024-03-07 16:48:32,536 INFO  namenode.FSImage 
(FSEditLogLoader.java:loadFSEdits(158)) - Start loading edits file 
http://fs-nn-party-65-191.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true,
 
http://fs-nn-party-65-190.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true
 maxTxnsToRead = 9223372036854775807
2024-03-07 16:48:32,536 INFO  namenode.RedundantEditLogInputStream 
(RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 
'http://fs-nn-party-65-191.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true,
 
http://fs-nn-party-65-190.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true'
 to transaction ID 57258734311
2024-03-07 16:48:32,536 INFO  namenode.RedundantEditLogInputStream 
(RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 
'http://fs-nn-party-65-191.hiido.host.yydevops.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true'
 to transaction ID 57258734311
2024-03-07 16:48:35,634 INFO  namenode.FSIm

[jira] [Assigned] (HDFS-17391) Adjust the checkpoint io buffer size to the chunk size

2024-03-07 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He reassigned HDFS-17391:
--

Assignee: lei w

> Adjust the checkpoint io buffer size to the chunk size
> --
>
> Key: HDFS-17391
> URL: https://issues.apache.org/jira/browse/HDFS-17391
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Assignee: lei w
>Priority: Major
>  Labels: pull-request-available
>
> Adjust the checkpoint io buffer size to the chunk size to reduce checkpoint 
> time.
> Before change:
> 2022-07-11 07:10:50,900 INFO 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
> txid 374700896827 to namenode at http://:50070 in 1729.465 seconds
> After change:
> 2022-07-12 08:15:55,068 INFO 
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with 
> txid 375717629244 to namenode at http://:50070  in 858.668 seconds
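
As a rough illustration of where the speedup comes from (a hedged sketch with 
assumed buffer sizes, not the actual patch), copying an image through a 
chunk-sized buffer needs orders of magnitude fewer read/write round trips than 
copying through a small one:

{code:java}
// Hedged sketch: a plain buffered copy loop like the one used for image
// transfer. The 4 KB and 1 MB sizes below are assumptions for illustration,
// not the values chosen by the patch.
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public final class BufferedCopySketch {

  /** Copies in to out through a bufferSize-byte buffer; returns round trips. */
  static long copy(InputStream in, OutputStream out, int bufferSize)
      throws IOException {
    byte[] buf = new byte[bufferSize];
    long roundTrips = 0;
    int n;
    while ((n = in.read(buf)) > 0) {
      out.write(buf, 0, n);
      roundTrips++; // one read+write pair per buffer fill
    }
    return roundTrips;
  }

  public static void main(String[] args) {
    long imageBytes = 4_626_167_848L; // size of the fsimage quoted in HDFS-17407
    // roughly 1.13 million round trips at 4 KB vs about 4,400 at 1 MB
    System.out.println("4 KB buffer: ~" + (imageBytes / 4_096) + " round trips");
    System.out.println("1 MB buffer: ~" + (imageBytes / 1_048_576) + " round trips");
  }
}
{code}

Each round trip also pays per-call overhead on both the HTTP stream and the 
disk, which is consistent with the roughly 2x checkpoint-time reduction shown 
in the timings above.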



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-17408) Reduce the number of quota calculations in FSDirRenameOp

2024-03-07 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He reassigned HDFS-17408:
--

Assignee: lei w

> Reduce the number of quota calculations in FSDirRenameOp
> 
>
> Key: HDFS-17408
> URL: https://issues.apache.org/jira/browse/HDFS-17408
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Assignee: lei w
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17408) Reduce the number of quota calculations in FSDirRenameOp

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824320#comment-17824320
 ] 

ASF GitHub Bot commented on HDFS-17408:
---

Hexiaoqiao commented on PR #6608:
URL: https://github.com/apache/hadoop/pull/6608#issuecomment-1983018540

   A minor point: it would be helpful for reviewers if you added some description 
of this improvement's background and goal. Offering a benchmark result would be 
even better.




> Reduce the number of quota calculations in FSDirRenameOp
> 
>
> Key: HDFS-17408
> URL: https://issues.apache.org/jira/browse/HDFS-17408
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17408) Reduce the number of quota calculations in FSDirRenameOp

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824319#comment-17824319
 ] 

ASF GitHub Bot commented on HDFS-17408:
---

Hexiaoqiao commented on PR #6608:
URL: https://github.com/apache/hadoop/pull/6608#issuecomment-1983003619

   Thanks @ThinkerLei for your work. It's a great performance improvement! The 
last CI run wasn't clean, so I've triggered it again. Let's wait and see what it 
says.




> Reduce the number of quota calculations in FSDirRenameOp
> 
>
> Key: HDFS-17408
> URL: https://issues.apache.org/jira/browse/HDFS-17408
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17146) Use the dfsadmin -reconfig command to initiate reconfiguration on all decommissioning datanodes.

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824295#comment-17824295
 ] 

ASF GitHub Bot commented on HDFS-17146:
---

hadoop-yetus commented on PR #6595:
URL: https://github.com/apache/hadoop/pull/6595#issuecomment-1982845035

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 22s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 26s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 43s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  compile  |   0m 39s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 39s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 45s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 45s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 34s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javac  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 34s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 27s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 38s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 32s |  |  the patch passed with JDK 
Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 40s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 42s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 206m  8s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 31s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 294m 12s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.datanode.TestLargeBlockReport |
   |   | hadoop.hdfs.protocol.TestBlockListAsLongs |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.44 ServerAPI=1.44 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/6/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6595 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 92fc8280315c 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 
15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 8a47b5fee635b96071b99ac3b460e852cf25a6d5 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/6/testReport/ |
   | Max. process+thread count | 3963 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | C

[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824294#comment-17824294
 ] 

ASF GitHub Bot commented on HDFS-17299:
---

Hexiaoqiao commented on PR #6613:
URL: https://github.com/apache/hadoop/pull/6613#issuecomment-1982836474

   Hi @ritegarg, thanks for your PR. branch-3.2 is EOL, so we should not submit 
PRs against it. I will close this one. Please feel free to reopen it if I have 
missed something. Thanks again.




> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Assignee: Ritesh
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
> Attachments: repro.patch
>
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 600000 
> dfs.heartbeat.interval: 3 
> So it will take 2 * 600000 + 10 * 3000 = 1230000 ms (20.5 mins) to detect that 
> a datanode is dead (the sketch just below walks through this arithmetic).
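
The 20.5-minute figure comes from the NameNode's usual stale-datanode window, 
2 * recheck interval + 10 * heartbeat interval; a small sketch of the 
arithmetic (the constant names are illustrative, not the actual DatanodeManager 
fields):

{code:java}
// Sketch of the datanode-death detection window referenced above; the formula
// mirrors how the NameNode derives its heartbeat expiry, but the names here
// are illustrative.
public class HeartbeatExpirySketch {
  public static void main(String[] args) {
    long recheckIntervalMs = 600_000;   // dfs.namenode.heartbeat.recheck-interval
    long heartbeatIntervalSec = 3;      // dfs.heartbeat.interval
    long expiryMs = 2 * recheckIntervalMs + 10 * heartbeatIntervalSec * 1_000;
    // 2 * 600000 + 10 * 3000 = 1230000 ms = 20.5 minutes
    System.out.printf("datanode declared dead after %d ms (%.1f min)%n",
        expiryMs, expiryMs / 60_000.0);
  }
}
{code}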
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
>  # DataStreamer is not able to find block locations that satisfy the rack 
> placement policy (one copy in each rack, which essentially means one copy in 
> each AZ).
>  # Since all the datanodes in that AZ are down but still appear alive to the 
> namenode, the client keeps getting different datanodes, yet all of them are in 
> the same AZ. See the logs below and the retry sketch after this list.
>  # HBase is not able to create a WAL file and it aborts the region server.
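
A hedged sketch of the client-side loop in steps 3 and 4 (type and method names 
are illustrative, not the real DataStreamer API): the client excludes each 
datanode that fails, but because the NameNode still considers the downed AZ's 
nodes alive during the expiry window above, every new pipeline it hands out can 
again land entirely in the dead AZ:

{code:java}
// Illustrative sketch of the abandon-and-retry loop visible in the logs
// below; not the actual DataStreamer code.
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PipelineRetrySketch {

  /** Stand-in for the NameNode's block allocation RPC (hypothetical). */
  interface Namenode {
    List<String> allocateBlock(Set<String> excludedNodes);
  }

  static boolean setupPipeline(Namenode nn, int maxRetries) {
    Set<String> excluded = new HashSet<>();
    for (int attempt = 0; attempt < maxRetries; attempt++) {
      List<String> pipeline = nn.allocateBlock(excluded);
      String badNode = firstUnreachable(pipeline);
      if (badNode == null) {
        return true;               // pipeline established
      }
      excluded.add(badNode);       // "Excluding datanode ..." in the logs
      // abandon the block and retry; with one AZ down but not yet expired,
      // the NameNode may keep proposing nodes from that same AZ
    }
    return false;                  // client gives up; HBase aborts the RS
  }

  /** Placeholder connectivity probe: returns the first dead node, or null. */
  static String firstUnreachable(List<String> pipeline) {
    return pipeline.stream()
        .filter(node -> node.startsWith("az2-")) // pretend AZ-2 is down
        .findFirst().orElse(null);
  }
}
{code}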
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=trueugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,061 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874--1594838129323:blk_1214652565_140946716
> 2023-12-16 17:17:44,179 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK]
> 2023-12-16 17:17:44,339 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652580_140946764, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,369 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,369 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764
> 2023-12-16 17:17:44,454 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK]
> 2023-12-16 17:17:44,522 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652594_140946796, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,712 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop

[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824293#comment-17824293
 ] 

ASF GitHub Bot commented on HDFS-17299:
---

Hexiaoqiao closed pull request #6613: HDFS-17299. Adding rack failure tolerance 
when creating a new file  (…
URL: https://github.com/apache/hadoop/pull/6613




> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Assignee: Ritesh
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
> Attachments: repro.patch
>
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 600000 
> dfs.heartbeat.interval: 3 
> So it will take 2 * 600000 + 10 * 3000 = 1230000 ms (20.5 mins) to detect that 
> a datanode is dead.
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
>  # DataStreamer is not able to find block locations that satisfy the rack 
> placement policy (one copy in each rack, which essentially means one copy in 
> each AZ).
>  # Since all the datanodes in that AZ are down but still appear alive to the 
> namenode, the client keeps getting different datanodes, yet all of them are in 
> the same AZ. See logs below.
>  # HBase is not able to create a WAL file and it aborts the region server.
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=trueugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,061 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874--1594838129323:blk_1214652565_140946716
> 2023-12-16 17:17:44,179 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK]
> 2023-12-16 17:17:44,339 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652580_140946764, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,369 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,369 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764
> 2023-12-16 17:17:44,454 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK]
> 2023-12-16 17:17:44,522 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652594_140946796, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,712 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.Da