[jira] [Created] (HDFS-17420) [FGL] FSEditLogLoader supports fine-grained lock
ZanderXu created HDFS-17420:
----------------------------

             Summary: [FGL] FSEditLogLoader supports fine-grained lock
                 Key: HDFS-17420
                 URL: https://issues.apache.org/jira/browse/HDFS-17420
             Project: Hadoop HDFS
          Issue Type: Sub-task
            Reporter: ZanderXu
            Assignee: ZanderXu

[FGL] FSEditLogLoader supports fine-grained lock.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-17419) [FGL] CacheReplicationMonitor supports fine-grained lock
ZanderXu created HDFS-17419:
----------------------------

             Summary: [FGL] CacheReplicationMonitor supports fine-grained lock
                 Key: HDFS-17419
                 URL: https://issues.apache.org/jira/browse/HDFS-17419
             Project: Hadoop HDFS
          Issue Type: Sub-task
            Reporter: ZanderXu
            Assignee: ZanderXu
[jira] [Created] (HDFS-17418) [FGL] DatanodeAdminMonitor supports fine-grained locking
ZanderXu created HDFS-17418:
----------------------------

             Summary: [FGL] DatanodeAdminMonitor supports fine-grained locking
                 Key: HDFS-17418
                 URL: https://issues.apache.org/jira/browse/HDFS-17418
             Project: Hadoop HDFS
          Issue Type: Sub-task
            Reporter: ZanderXu
            Assignee: ZanderXu

[FGL] DatanodeAdminMonitor supports fine-grained locking:
* DatanodeAdminBackoffMonitor
* DatanodeAdminDefaultMonitor
[jira] [Created] (HDFS-17417) [FGL] Monitor in HeartbeatManager supports fine-grained lock
ZanderXu created HDFS-17417:
----------------------------

             Summary: [FGL] Monitor in HeartbeatManager supports fine-grained lock
                 Key: HDFS-17417
                 URL: https://issues.apache.org/jira/browse/HDFS-17417
             Project: Hadoop HDFS
          Issue Type: Sub-task
            Reporter: ZanderXu
            Assignee: ZanderXu

[FGL] Monitor in HeartbeatManager supports fine-grained lock.
[jira] [Created] (HDFS-17416) [FGL] Monitor threads in BlockManager.class support fine-grained lock
ZanderXu created HDFS-17416:
----------------------------

             Summary: [FGL] Monitor threads in BlockManager.class support fine-grained lock
                 Key: HDFS-17416
                 URL: https://issues.apache.org/jira/browse/HDFS-17416
             Project: Hadoop HDFS
          Issue Type: Sub-task
            Reporter: ZanderXu
            Assignee: ZanderXu

There are some monitor threads in BlockManager.class. This ticket is used to make these threads support fine-grained locking:
* BlockReportProcessingThread
* MarkedDeleteBlockScrubber
* RedundancyMonitor
* Reconstruction Queue Initializer
[jira] [Created] (HDFS-17415) [FGL] RPCs in NamenodeProtocol support fine-grained lock
ZanderXu created HDFS-17415:
----------------------------

             Summary: [FGL] RPCs in NamenodeProtocol support fine-grained lock
                 Key: HDFS-17415
                 URL: https://issues.apache.org/jira/browse/HDFS-17415
             Project: Hadoop HDFS
          Issue Type: Sub-task
            Reporter: ZanderXu
            Assignee: ZanderXu

[FGL] RPCs in NamenodeProtocol support fine-grained lock:
* getBlocks
* getBlockKeys
* getTransactionID
* getMostRecentCheckpointTxId
* rollEditLog
* versionRequest
* errorReport
* registerSubordinateNamenode
* startCheckpoint
* endCheckpoint
* getEditLogManifest
* isUpgradeFinalized
* isRollingUpgrade
* getNextSPSPath
[jira] [Created] (HDFS-17414) [FGL] RPCs in DatanodeProtocol support fine-grained lock
ZanderXu created HDFS-17414:
----------------------------

             Summary: [FGL] RPCs in DatanodeProtocol support fine-grained lock
                 Key: HDFS-17414
                 URL: https://issues.apache.org/jira/browse/HDFS-17414
             Project: Hadoop HDFS
          Issue Type: Sub-task
            Reporter: ZanderXu
            Assignee: ZanderXu

[FGL] RPCs in DatanodeProtocol support fine-grained lock:
* registerDatanode
* sendHeartbeat
* sendLifeline
* blockReport
* cacheReport
* blockReceivedAndDeleted
* errorReport
* versionRequest
* reportBadBlocks
* commitBlockSynchronization
[jira] [Created] (HDFS-17413) [FGL] Client RPCs involving Cache supports fine-grained lock
ZanderXu created HDFS-17413:
----------------------------

             Summary: [FGL] Client RPCs involving Cache supports fine-grained lock
                 Key: HDFS-17413
                 URL: https://issues.apache.org/jira/browse/HDFS-17413
             Project: Hadoop HDFS
          Issue Type: Sub-task
            Reporter: ZanderXu
            Assignee: ZanderXu

Client RPCs involving Cache support fine-grained locking:
* addCacheDirective
* modifyCacheDirective
* removeCacheDirective
* listCacheDirectives
* addCachePool
* modifyCachePool
* removeCachePool
* listCachePools
[jira] [Updated] (HDFS-17389) [FGL] Client RPCs involving read process supports fine-grained lock
[ https://issues.apache.org/jira/browse/HDFS-17389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZanderXu updated HDFS-17389:
----------------------------
    Description:
The client read process involves many client RPCs. This ticket is used to make these RPCs support fine-grained lock.
* getListing
* getBatchedListing
* listOpenFiles
* getFileInfo
* isFileClosed
* getBlockLocations
* reportBadBlocks
* getServerDefaults
* getStats
* getReplicatedBlockStats
* getECBlockGroupStats
* getPreferredBlockSize
* listCorruptFileBlocks
* getContentSummary
* getLocatedFileInfo
* createEncryptionZone
* msync
* checkAccess
* getFileLinkInfo
* getLinkTarget
* getDelegationToken
* getDataEncryptionKey

    was: the same description without the last four RPCs (getFileLinkInfo, getLinkTarget, getDelegationToken, getDataEncryptionKey).

> [FGL] Client RPCs involving read process supports fine-grained lock
> -------------------------------------------------------------------
>
>                 Key: HDFS-17389
>                 URL: https://issues.apache.org/jira/browse/HDFS-17389
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
[jira] [Updated] (HDFS-17388) [FGL] Client RPCs involving write process supports fine-grained lock
[ https://issues.apache.org/jira/browse/HDFS-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZanderXu updated HDFS-17388:
----------------------------
    Description:
The client write process involves many client RPCs. This ticket is used to make these RPCs support fine-grained lock.
* mkdir
* create
* addBlock
* abandonBlock
* getAdditionalDatanode
* updateBlockForPipeline
* updatePipeline
* fsync
* commit
* rename
* rename2
* append
* renewLease
* recoverLease
* delete
* createSymlink
* renewDelegationToken
* cancelDelegationToken

    was: the same description without the last three RPCs (createSymlink, renewDelegationToken, cancelDelegationToken).

> [FGL] Client RPCs involving write process supports fine-grained lock
> --------------------------------------------------------------------
>
>                 Key: HDFS-17388
>                 URL: https://issues.apache.org/jira/browse/HDFS-17388
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
[jira] [Updated] (HDFS-17389) [FGL] Client RPCs involving read process supports fine-grained lock
[ https://issues.apache.org/jira/browse/HDFS-17389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZanderXu updated HDFS-17389:
----------------------------
    Description:
The client read process involves many client RPCs. This ticket is used to make these RPCs support fine-grained lock.
* getListing
* getBatchedListing
* listOpenFiles
* getFileInfo
* isFileClosed
* getBlockLocations
* reportBadBlocks
* getServerDefaults
* getStats
* getReplicatedBlockStats
* getECBlockGroupStats
* getPreferredBlockSize
* listCorruptFileBlocks
* getContentSummary
* getLocatedFileInfo
* createEncryptionZone
* msync
* checkAccess

    was: the same description without the last RPC (checkAccess).

> [FGL] Client RPCs involving read process supports fine-grained lock
> -------------------------------------------------------------------
>
>                 Key: HDFS-17389
>                 URL: https://issues.apache.org/jira/browse/HDFS-17389
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
[jira] [Updated] (HDFS-17410) [FGL] Client RPCs that changes file attributes supports fine-grained lock
[ https://issues.apache.org/jira/browse/HDFS-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZanderXu updated HDFS-17410:
----------------------------
    Description:
Some client RPCs are used to change file attributes. This ticket is used to make these RPCs support fine-grained locking:
* setReplication
* getStoragePolicies
* setStoragePolicy
* unsetStoragePolicy
* satisfyStoragePolicy
* getStoragePolicy
* setPermission
* setOwner
* setTimes
* concat
* truncate
* setQuota
* getQuotaUsage
* modifyAclEntries
* removeAclEntries
* removeDefaultAcl
* removeAcl
* setAcl
* getAclStatus
* getEZForPath
* listEncryptionZones
* reencryptEncryptionZone
* listReencryptionStatus
* setXAttr
* getXAttrs
* listXAttrs
* removeXAttr

    was: the same description without satisfyStoragePolicy and the four XAttr RPCs (setXAttr, getXAttrs, listXAttrs, removeXAttr).

> [FGL] Client RPCs that changes file attributes supports fine-grained lock
> -------------------------------------------------------------------------
>
>                 Key: HDFS-17410
>                 URL: https://issues.apache.org/jira/browse/HDFS-17410
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
[jira] [Updated] (HDFS-17389) [FGL] Client RPCs involving read process supports fine-grained lock
[ https://issues.apache.org/jira/browse/HDFS-17389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZanderXu updated HDFS-17389:
----------------------------
    Description:
The client read process involves many client RPCs. This ticket is used to make these RPCs support fine-grained lock.
* getListing
* getBatchedListing
* listOpenFiles
* getFileInfo
* isFileClosed
* getBlockLocations
* reportBadBlocks
* getServerDefaults
* getStats
* getReplicatedBlockStats
* getECBlockGroupStats
* getPreferredBlockSize
* listCorruptFileBlocks
* getContentSummary
* getLocatedFileInfo
* createEncryptionZone
* msync

    was: the same description without the last RPC (msync).

> [FGL] Client RPCs involving read process supports fine-grained lock
> -------------------------------------------------------------------
>
>                 Key: HDFS-17389
>                 URL: https://issues.apache.org/jira/browse/HDFS-17389
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
[jira] [Created] (HDFS-17412) [FGL] Client RPCs involving maintenance supports fine-grained lock
ZanderXu created HDFS-17412:
----------------------------

             Summary: [FGL] Client RPCs involving maintenance supports fine-grained lock
                 Key: HDFS-17412
                 URL: https://issues.apache.org/jira/browse/HDFS-17412
             Project: Hadoop HDFS
          Issue Type: Sub-task
            Reporter: ZanderXu
            Assignee: ZanderXu

There are multiple client RPCs that admins use to maintain the cluster. This ticket is used to make these RPCs support fine-grained locking:
* getDatanodeReport
* getDatanodeStorageReport
* setSafeMode
* saveNamespace
* metaSave
* rollEdits
* restoreFailedStorage
* refreshNodes
* finalizeUpgrade
* upgradeStatus
* rollingUpgrade
* setBalancerBandwidth
* getCurrentEditLogTxid
* getEditsFromTxid
* getHAServiceState
* getSlowDatanodeReport
* getEnclosingRoot
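The FGL sub-tasks above all apply the same change: instead of every RPC serializing on the namesystem's single global lock, each operation takes only the lock that guards the state it actually touches (for example, the directory tree vs. the block manager). A minimal, self-contained sketch of that pattern; the class, field, and method names here are invented for illustration and are not the actual HDFS FGL implementation:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative only: two independent read-write locks replace one global
// lock, so namespace operations and block-manager operations no longer
// block each other.
public class FineGrainedLocks {
  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(); // directory tree
  private final ReentrantReadWriteLock bmLock = new ReentrantReadWriteLock(); // block manager

  private long dirs = 0;   // state guarded by fsLock
  private long blocks = 0; // state guarded by bmLock

  // A mutating namespace RPC such as mkdir needs only the namespace write lock.
  public void mkdir() {
    fsLock.writeLock().lock();
    try { dirs++; } finally { fsLock.writeLock().unlock(); }
  }

  // A block report needs only the block-manager write lock.
  public void blockReport(int n) {
    bmLock.writeLock().lock();
    try { blocks += n; } finally { bmLock.writeLock().unlock(); }
  }

  // A read RPC such as getListing takes only the shared (read) side.
  public long getListing() {
    fsLock.readLock().lock();
    try { return dirs; } finally { fsLock.readLock().unlock(); }
  }

  public long getBlocks() {
    bmLock.readLock().lock();
    try { return blocks; } finally { bmLock.readLock().unlock(); }
  }

  public static void main(String[] args) {
    FineGrainedLocks fgl = new FineGrainedLocks();
    fgl.mkdir();
    fgl.blockReport(3);
    System.out.println(fgl.getListing() + " " + fgl.getBlocks());
  }
}
```

The payoff is concurrency: a heartbeat or block report contends only on `bmLock`, so it no longer delays a `getListing` holding `fsLock`, which is the essence of what these sub-tasks pursue per RPC.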
[jira] [Commented] (HDFS-17408) Reduce the number of quota calculations in FSDirRenameOp
[ https://issues.apache.org/jira/browse/HDFS-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824608#comment-17824608 ]

ASF GitHub Bot commented on HDFS-17408:
---------------------------------------

ThinkerLei commented on PR #6608:
URL: https://github.com/apache/hadoop/pull/6608#issuecomment-1984976217

> Hi @ThinkerLei, please check whether the failed unit tests are related to these changes.

@Hexiaoqiao Thanks for your reply, I will work on this soon.

> Reduce the number of quota calculations in FSDirRenameOp
> --------------------------------------------------------
>
>                 Key: HDFS-17408
>                 URL: https://issues.apache.org/jira/browse/HDFS-17408
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: lei w
>            Assignee: lei w
>            Priority: Major
>              Labels: pull-request-available
[jira] [Commented] (HDFS-17408) Reduce the number of quota calculations in FSDirRenameOp
[ https://issues.apache.org/jira/browse/HDFS-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824606#comment-17824606 ]

ASF GitHub Bot commented on HDFS-17408:
---------------------------------------

Hexiaoqiao commented on PR #6608:
URL: https://github.com/apache/hadoop/pull/6608#issuecomment-1984973314

Hi @ThinkerLei, please check whether the failed unit tests are related to these changes.

> Reduce the number of quota calculations in FSDirRenameOp
> --------------------------------------------------------
>
>                 Key: HDFS-17408
>                 URL: https://issues.apache.org/jira/browse/HDFS-17408
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: lei w
>            Assignee: lei w
>            Priority: Major
>              Labels: pull-request-available
[jira] [Commented] (HDFS-17380) FsImageValidation: remove inaccessible nodes
[ https://issues.apache.org/jira/browse/HDFS-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824605#comment-17824605 ]

ASF GitHub Bot commented on HDFS-17380:
---------------------------------------

Hexiaoqiao commented on PR #6549:
URL: https://github.com/apache/hadoop/pull/6549#issuecomment-1984971681

@szetszwo Thanks for your response.

> This may not be acceptable in some use cases since the newly created files will be lost (i.e. data loss) if we recover from an earlier fsimage.

Recovering from an earlier checkpoint will not lose data; it keeps both the fsimage and all edit logs up to the latest transaction.

> If we remove the inaccessible inodes, we won't lose any files.

When you talk about `inaccessible inodes`, do you mean that unexpected NameNode logic causes some inodes to become unreachable?

> this is just a tool to fix fsimages. Users may choose not to use it if they are fine to recover from an earlier fsimage.

+1. Will join the review once I understand what it improves. Thanks again.

> FsImageValidation: remove inaccessible nodes
> --------------------------------------------
>
>                 Key: HDFS-17380
>                 URL: https://issues.apache.org/jira/browse/HDFS-17380
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: tools
>            Reporter: Tsz-wo Sze
>            Assignee: Tsz-wo Sze
>            Priority: Major
>              Labels: pull-request-available
>
> If a fsimage is corrupted, it may have inaccessible nodes. The FsImageValidation tool currently is able to identify the inaccessible nodes when validating the INodeMap. This JIRA is to update the tool to remove the inaccessible nodes and then save a new fsimage.
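As background for the discussion above: an inode is "inaccessible" when no path from the root reaches it, so removing such inodes amounts to a reachability walk followed by a prune. A hedged, self-contained sketch with plain maps standing in for the INodeMap; none of these names come from the actual FsImageValidation tool:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: walk the tree from the root, mark every reachable
// inode id, then drop the rest. A real fsimage tool would operate on the
// serialized INode sections and save a new fsimage afterwards.
public class InaccessibleNodes {
  // children: inode id -> child ids; id 0 is the root.
  // allIds is pruned in place; the removed (inaccessible) ids are returned.
  public static Set<Long> removeInaccessible(Map<Long, List<Long>> children,
                                             Set<Long> allIds) {
    Set<Long> reachable = new HashSet<>();
    Deque<Long> stack = new ArrayDeque<>();
    stack.push(0L);
    while (!stack.isEmpty()) {
      long id = stack.pop();
      if (!reachable.add(id)) continue;                      // already visited
      for (long c : children.getOrDefault(id, List.of())) stack.push(c);
    }
    Set<Long> removed = new HashSet<>(allIds);
    removed.removeAll(reachable);                            // never visited
    allIds.retainAll(reachable);                             // prune the map
    return removed;
  }

  public static void main(String[] args) {
    Map<Long, List<Long>> tree = new HashMap<>();
    tree.put(0L, List.of(1L, 2L));                           // root -> 1, 2
    Set<Long> ids = new HashSet<>(Set.of(0L, 1L, 2L, 99L));  // 99 is orphaned
    System.out.println(removeInaccessible(tree, ids));       // prints [99]
  }
}
```

This also makes the trade-off in the thread concrete: the prune keeps every file still reachable from the root, whereas rolling back to an earlier fsimage plus edit logs replays history instead; which is preferable depends on how the corruption arose.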
[jira] [Commented] (HDFS-17364) Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream
[ https://issues.apache.org/jira/browse/HDFS-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824604#comment-17824604 ]

ASF GitHub Bot commented on HDFS-17364:
---------------------------------------

Hexiaoqiao commented on PR #6514:
URL: https://github.com/apache/hadoop/pull/6514#issuecomment-1984965658

> @Hexiaoqiao @zhangshuyan0 Thanks for your reviews. I realized there's also an ElasticBufferPool in DFSStripedOutputStream. I'm thinking of handling that here as well. What do you think?

+1 from my side.

> Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream
> ----------------------------------------------------------------
>
>                 Key: HDFS-17364
>                 URL: https://issues.apache.org/jira/browse/HDFS-17364
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Priority: Major
>              Labels: pull-request-available
>
> DFSStripedInputStream uses ElasticByteBufferPool to allocate byte buffers for the "curStripeBuf". This is used for non-positional (stateful) reads and is allocated with a size of numDataBlocks * cellSize. For RS-6-3-1024k, that means each DFSStripedInputStream could allocate a 6 MB buffer. When the input stream is finished, the buffer is put back in the pool. Over time and with spikes of concurrent reads, the pool grows and most of the buffers sit there unused.
>
> WeakReferencedElasticByteBufferPool was introduced in HADOOP-18105 and mitigates this issue because the excess buffers can be GC'd once they are no longer needed. We should use this same pool in DFSStripedInputStream.
[jira] [Commented] (HDFS-17364) Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream
[ https://issues.apache.org/jira/browse/HDFS-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824603#comment-17824603 ]

ASF GitHub Bot commented on HDFS-17364:
---------------------------------------

Hexiaoqiao commented on code in PR #6514:
URL: https://github.com/apache/hadoop/pull/6514#discussion_r1517110356

## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java:

@@ -3159,6 +3165,21 @@ private void initThreadsNumForStripedReads(int numThreads) {
     }
   }

+  private void initBufferPoolForStripedReads(boolean useWeakReference) {
+    if (STRIPED_READ_BUFFER_POOL != null) {
+      return;
+    }
+    synchronized (DFSClient.class) {

Review Comment: Got it.

> Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream
> ----------------------------------------------------------------
>
>                 Key: HDFS-17364
>                 URL: https://issues.apache.org/jira/browse/HDFS-17364
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Priority: Major
>              Labels: pull-request-available
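The snippet under review follows the classic double-checked locking shape: an unsynchronized fast-path check, a class-level lock, a re-check, then one-time construction. A self-contained sketch of that pattern with invented names (this is not the DFSClient code; note the shared field is declared volatile here so the unsynchronized first read is safe under the Java memory model, which the excerpt above does not show either way):

```java
// Illustrative double-checked locking for a process-wide buffer pool.
// "BufferPool" and its two implementations stand in for
// ElasticByteBufferPool / WeakReferencedElasticByteBufferPool; the boolean
// flag mirrors the useWeakReference parameter in the patch.
public class PoolHolder {
  interface BufferPool {}
  static class StrongPool implements BufferPool {}
  static class WeakPool implements BufferPool {}

  // volatile so the fast-path read cannot observe a half-constructed pool
  private static volatile BufferPool STRIPED_READ_BUFFER_POOL;

  static BufferPool get(boolean useWeakReference) {
    BufferPool pool = STRIPED_READ_BUFFER_POOL;
    if (pool != null) {
      return pool;                        // fast path: already initialized
    }
    synchronized (PoolHolder.class) {     // slow path: one thread constructs
      if (STRIPED_READ_BUFFER_POOL == null) {
        STRIPED_READ_BUFFER_POOL =
            useWeakReference ? new WeakPool() : new StrongPool();
      }
      return STRIPED_READ_BUFFER_POOL;
    }
  }

  public static void main(String[] args) {
    BufferPool a = get(true);
    BufferPool b = get(false);            // flag is ignored once initialized
    System.out.println(a == b);           // prints true: one shared pool
  }
}
```

One consequence worth noting: because the pool is created once per process, the reference-strength choice is fixed by whichever caller initializes it first.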
[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.
[ https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824531#comment-17824531 ]

ASF GitHub Bot commented on HDFS-17299:
---------------------------------------

hadoop-yetus commented on PR #6612:
URL: https://github.com/apache/hadoop/pull/6612#issuecomment-1984392158

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|:--------|:-------:|:-------:|
| +0 :ok: | reexec | 0m 20s | | Docker mode activated. |
| | _ Prechecks _ | | | |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 1s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. |
| | _ branch-3.3 Compile Tests _ | | | |
| +0 :ok: | mvndep | 13m 44s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 22m 14s | | branch-3.3 passed |
| +1 :green_heart: | compile | 2m 16s | | branch-3.3 passed |
| +1 :green_heart: | checkstyle | 0m 38s | | branch-3.3 passed |
| +1 :green_heart: | mvnsite | 1m 31s | | branch-3.3 passed |
| +1 :green_heart: | javadoc | 1m 37s | | branch-3.3 passed |
| -1 :x: | spotbugs | 1m 27s | [/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6612/3/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html) | hadoop-hdfs-project/hadoop-hdfs-client in branch-3.3 has 2 extant spotbugs warnings. |
| +1 :green_heart: | shadedclient | 22m 3s | | branch has no errors when building and testing our client artifacts. |
| | _ Patch Compile Tests _ | | | |
| +0 :ok: | mvndep | 0m 23s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 1m 22s | | the patch passed |
| +1 :green_heart: | compile | 2m 13s | | the patch passed |
| +1 :green_heart: | javac | 2m 13s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 31s | | hadoop-hdfs-project: The patch generated 0 new + 249 unchanged - 3 fixed = 249 total (was 252) |
| +1 :green_heart: | mvnsite | 1m 18s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 20s | | the patch passed |
| +1 :green_heart: | spotbugs | 3m 26s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 3s | | patch has no errors when building and testing our client artifacts. |
| | _ Other Tests _ | | | |
| +1 :green_heart: | unit | 1m 47s | | hadoop-hdfs-client in the patch passed. |
| -1 :x: | unit | 121m 34s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6612/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +0 :ok: | asflicense | 0m 27s | | ASF License check generated no output? |
| | | 225m 17s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor |
| | hadoop.hdfs.TestEncryptedTransfer |
| | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
| | hadoop.hdfs.server.datanode.TestBatchIbr |
| | hadoop.hdfs.server.datanode.TestBlockScanner |
| | hadoop.hdfs.TestLeaseRecovery2 |
| | hadoop.hdfs.server.datanode.TestDataNodeErasureCodingMetrics |
| | hadoop.hdfs.TestErasureCodingPolicyWithSnapshotWithRandomECPolicy |
| | hadoop.hdfs.server.datanode.TestDataNodeFaultInjector |
| | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations |
| | hadoop.hdfs.server.datanode.TestBlockRecovery2 |
| | hadoop.hdfs.TestParallelUnixDomainRead |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithRandomECPolicy |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6612/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6612 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux e6f838e8b04d 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.
[ https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824518#comment-17824518 ] ASF GitHub Bot commented on HDFS-17299: --- ritegarg commented on PR #6612: URL: https://github.com/apache/hadoop/pull/6612#issuecomment-1984310623 > There are few test failures. Can you please take a look? @ritegarg I was looking into the failures, looks like transient failures. The same tests are running fine locally. > HDFS is not rack failure tolerant while creating a new file. > > > Key: HDFS-17299 > URL: https://issues.apache.org/jira/browse/HDFS-17299 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.1 >Reporter: Rushabh Shah >Assignee: Ritesh >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.1, 3.5.0 > > Attachments: repro.patch > > > Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ. > Our configuration: > 1. We use 3 Availability Zones (AZs) for fault tolerance. > 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy. > 3. We use the following configuration parameters: > dfs.namenode.heartbeat.recheck-interval: 60 > dfs.heartbeat.interval: 3 > So it will take 123 ms (20.5mins) to detect that datanode is dead. > > Steps to reproduce: > # Bring down 1 AZ. > # HBase (HDFS client) tries to create a file (WAL file) and then calls > hflush on the newly created file. > # DataStreamer is not able to find blocks locations that satisfies the rack > placement policy (one copy in each rack which essentially means one copy in > each AZ) > # Since all the datanodes in that AZ are down but still alive to namenode, > the client gets different datanodes but still all of them are in the same AZ. > See logs below. > # HBase is not able to create a WAL file and it aborts the region server. 
> > Relevant logs from hdfs client and namenode > > {noformat} > 2023-12-16 17:17:43,818 INFO [on default port 9000] FSNamesystem.audit - > allowed=trueugi=hbase/ (auth:KERBEROS) ip= > cmd=create src=/hbase/WALs/ dst=null > 2023-12-16 17:17:43,978 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652565_140946716, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,061 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715) > 2023-12-16 17:17:44,061 WARN [Thread-39087] hdfs.DataStreamer - Abandoning > BP-179318874--1594838129323:blk_1214652565_140946716 > 2023-12-16 17:17:44,179 WARN [Thread-39087] hdfs.DataStreamer - Excluding > datanode > DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK] > 2023-12-16 17:17:44,339 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652580_140946764, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,369 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651) > at 
org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715) > 2023-12-16 17:17:44,369 WARN [Thread-39087] hdfs.DataStreamer - Abandoning > BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764 > 2023-12-16 17:17:44,454 WARN [Thread-39087] hdfs.DataStreamer - Excluding > datanode > DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK] > 2023-12-16 17:17:44,522 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652594_140946796, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,712 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.pro
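For context on the 20.5-minute figure in the report above: the NameNode only declares a datanode dead after 2 × recheck-interval + 10 × heartbeat-interval have elapsed without a heartbeat. A minimal sketch of that arithmetic (the interval values are the ones assumed in the report, not read from any cluster):

```java
public class HeartbeatExpiry {
    // Sketch of the NameNode's dead-node detection window:
    // heartbeatExpireInterval = 2 * recheckIntervalMs + 10 * heartbeatIntervalSec * 1000
    static long expiryMs(long recheckIntervalMs, long heartbeatIntervalSec) {
        return 2 * recheckIntervalMs + 10 * heartbeatIntervalSec * 1000;
    }

    public static void main(String[] args) {
        // Assumed values from the report: recheck-interval 600000 ms, heartbeat.interval 3 s
        long ms = expiryMs(600_000L, 3L);
        System.out.println(ms + " ms = " + (ms / 60000.0) + " min"); // 1230000 ms = 20.5 min
    }
}
```

Until that window elapses, the namenode keeps handing out the "dead" AZ's datanodes as valid targets, which is why the client keeps excluding them one by one.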
[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.
[ https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824478#comment-17824478 ] ASF GitHub Bot commented on HDFS-17299: --- shahrs87 commented on PR #6612: URL: https://github.com/apache/hadoop/pull/6612#issuecomment-1984012754 There are a few test failures. Can you please take a look? @ritegarg > HDFS is not rack failure tolerant while creating a new file. 
[jira] [Commented] (HDFS-17146) Use the dfsadmin -reconfig command to initiate reconfiguration on all decommissioning datanodes.
[ https://issues.apache.org/jira/browse/HDFS-17146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824473#comment-17824473 ] ASF GitHub Bot commented on HDFS-17146: --- hadoop-yetus commented on PR #6595: URL: https://github.com/apache/hadoop/pull/6595#issuecomment-1983993404 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 20s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 32m 24s | | trunk passed | | +1 :green_heart: | compile | 0m 45s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | compile | 0m 42s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 0m 39s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 46s | | trunk passed | | +1 :green_heart: | javadoc | 0m 41s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 10s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 2m 0s | | trunk passed | | +1 :green_heart: | shadedclient | 22m 6s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 42s | | the patch passed | | +1 :green_heart: | compile | 0m 41s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javac | 0m 41s | | the patch passed | | +1 :green_heart: | compile | 0m 38s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 0m 38s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 31s | | the patch passed | | +1 :green_heart: | mvnsite | 0m 46s | | the patch passed | | +1 :green_heart: | javadoc | 0m 31s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 9s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 57s | | the patch passed | | +1 :green_heart: | shadedclient | 23m 19s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 215m 22s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/9/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 30s | | The patch does not generate ASF License warnings. 
| | | | 308m 42s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestNamenodeRetryCache | | | hadoop.hdfs.server.namenode.TestReconstructStripedBlocks | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/9/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6595 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 908072e384a3 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 6c18e7912316a868d950d08f8525bd559629fa82 | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/9/testReport/ | | Max. process+thread count | 4655 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/had
[jira] [Commented] (HDFS-17146) Use the dfsadmin -reconfig command to initiate reconfiguration on all decommissioning datanodes.
[ https://issues.apache.org/jira/browse/HDFS-17146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824465#comment-17824465 ] ASF GitHub Bot commented on HDFS-17146: --- hadoop-yetus commented on PR #6595: URL: https://github.com/apache/hadoop/pull/6595#issuecomment-1983953951 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 25s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 1s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | -1 :x: | mvninstall | 5m 31s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/8/artifact/out/branch-mvninstall-root.txt) | root in trunk failed. | | +1 :green_heart: | compile | 1m 27s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | compile | 0m 34s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 0m 38s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 46s | | trunk passed | | +1 :green_heart: | javadoc | 0m 42s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 3s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 44s | | trunk passed | | +1 :green_heart: | shadedclient | 24m 6s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 40s | | the patch passed | | +1 :green_heart: | compile | 0m 42s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javac | 0m 42s | | the patch passed | | +1 :green_heart: | compile | 0m 38s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 0m 38s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 30s | | the patch passed | | +1 :green_heart: | mvnsite | 0m 40s | | the patch passed | | +1 :green_heart: | javadoc | 0m 30s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 10s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 56s | | the patch passed | | +1 :green_heart: | shadedclient | 21m 30s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 228m 50s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/8/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 26s | | The patch does not generate ASF License warnings. 
| | | | 295m 34s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.TestEncryptionZonesWithKMS | | | hadoop.hdfs.TestReconstructStripedFile | | | hadoop.hdfs.TestErasureCodingPoliciesWithRandomECPolicy | | | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.hdfs.TestDFSStripedOutputStreamWithRandomECPolicy | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/8/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6595 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 75aec88bb203 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 6c18e7912316a868d950d08f8525bd559629fa82 | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/jav
[jira] [Commented] (HDFS-17380) FsImageValidation: remove inaccessible nodes
[ https://issues.apache.org/jira/browse/HDFS-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824462#comment-17824462 ] ASF GitHub Bot commented on HDFS-17380: --- szetszwo commented on PR #6549: URL: https://github.com/apache/hadoop/pull/6549#issuecomment-1983883646 @Hexiaoqiao , thanks for reviewing this! > ... We should recover from other fsimages first if one fsimage file is corrupted ... This may not be acceptable in some use cases since the newly created files will be lost (i.e. data loss) if we recover from an earlier fsimage. If we remove the inaccessible inodes, we won't lose any files (i.e. no data loss). BTW, this is just a tool to fix fsimages. Users may choose not to use it if they are fine with recovering from an earlier fsimage. > FsImageValidation: remove inaccessible nodes > > > Key: HDFS-17380 > URL: https://issues.apache.org/jira/browse/HDFS-17380 > Project: Hadoop HDFS > Issue Type: Improvement > Components: tools >Reporter: Tsz-wo Sze >Assignee: Tsz-wo Sze >Priority: Major > Labels: pull-request-available > > If an fsimage is corrupted, it may have inaccessible nodes. The > FsImageValidation tool is currently able to identify the inaccessible nodes > when validating the INodeMap. This JIRA is to update the tool to remove the > inaccessible nodes and then save a new fsimage. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
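The "inaccessible nodes" discussed above are inodes that can no longer be reached from the root of the corrupted fsimage. As a rough illustration of the idea only (not the FsImageValidation code), a reachability sweep over a toy parent-to-children inode map might look like:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy sketch: walk the inode tree from the root and collect every inode id
// that is NOT reachable; those are the candidates to drop before saving a
// new fsimage, so no reachable file is lost.
public class ReachabilitySketch {
    static Set<Long> inaccessible(Map<Long, List<Long>> children, long rootId) {
        Set<Long> reachable = new HashSet<>();
        Deque<Long> stack = new ArrayDeque<>(List.of(rootId));
        while (!stack.isEmpty()) {
            long id = stack.pop();
            if (reachable.add(id)) {                       // first visit only
                stack.addAll(children.getOrDefault(id, List.of()));
            }
        }
        Set<Long> all = new HashSet<>(children.keySet());  // every known inode id
        children.values().forEach(all::addAll);
        all.removeAll(reachable);
        return all;                                        // unreachable inodes
    }
}
```

With a root pointing at ids 2 and 3 and an orphaned subtree 9 → 10, the sweep reports {9, 10} as inaccessible, which matches the tool's "remove and save a new fsimage" approach rather than rolling back to an older image.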
[jira] [Commented] (HDFS-17408) Reduce the number of quota calculations in FSDirRenameOp
[ https://issues.apache.org/jira/browse/HDFS-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824460#comment-17824460 ] ASF GitHub Bot commented on HDFS-17408: --- hadoop-yetus commented on PR #6608: URL: https://github.com/apache/hadoop/pull/6608#issuecomment-1983825438 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 46s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 48m 43s | | trunk passed | | +1 :green_heart: | compile | 1m 26s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | compile | 1m 14s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 10s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 23s | | trunk passed | | +1 :green_heart: | javadoc | 1m 10s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 35s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 33s | | trunk passed | | +1 :green_heart: | shadedclient | 40m 36s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 11s | | the patch passed | | +1 :green_heart: | compile | 1m 15s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javac | 1m 15s | | the patch passed | | +1 :green_heart: | compile | 1m 12s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 1m 12s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 4s | | hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 46 unchanged - 1 fixed = 46 total (was 47) | | +1 :green_heart: | mvnsite | 1m 16s | | the patch passed | | +1 :green_heart: | javadoc | 0m 56s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 32s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 35s | | the patch passed | | -1 :x: | shadedclient | 40m 26s | | patch has errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 302m 57s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6608/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 52s | | The patch does not generate ASF License warnings. 
| | | | 458m 11s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestDeleteRace | | | hadoop.hdfs.web.TestFSMainOperationsWebHdfs | | | hadoop.hdfs.TestErasureCodingPolicyWithSnapshotWithRandomECPolicy | | | hadoop.hdfs.TestLeaseRecovery2 | | | hadoop.hdfs.server.namenode.TestFSNamesystemLockReport | | | hadoop.hdfs.TestTrashWithEncryptionZones | | | hadoop.fs.viewfs.TestViewFSOverloadSchemeWithMountTableConfigInHDFS | | | hadoop.fs.contract.hdfs.TestHDFSContractAppend | | | hadoop.hdfs.server.namenode.TestNNThroughputBenchmark | | | hadoop.hdfs.TestDFSShell | | | hadoop.hdfs.TestFileCreation | | | hadoop.fs.viewfs.TestViewFsHdfs | | | hadoop.fs.contract.hdfs.TestHDFSContractRename | | | hadoop.hdfs.TestDFSUpgradeFromImage | | | hadoop.hdfs.server.namenode.TestReencryption | | | hadoop.fs.viewfs.TestViewFileSystemLinkFallback | | | hadoop.hdfs.TestDFSRename | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshotDiffReport | | | hadoop.cli.TestAclCLI | | | hadoop.hdfs.web.TestWebHdfsFileSystemContract | | | hadoop.hdfs.tools.offli
[jira] [Commented] (HDFS-17146) Use the dfsadmin -reconfig command to initiate reconfiguration on all decommissioning datanodes.
[ https://issues.apache.org/jira/browse/HDFS-17146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824459#comment-17824459 ] ASF GitHub Bot commented on HDFS-17146: --- hadoop-yetus commented on PR #6595: URL: https://github.com/apache/hadoop/pull/6595#issuecomment-1983822417 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 56s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 48m 8s | | trunk passed | | +1 :green_heart: | compile | 1m 29s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | compile | 1m 15s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 16s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 25s | | trunk passed | | +1 :green_heart: | javadoc | 1m 8s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 37s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 23s | | trunk passed | | +1 :green_heart: | shadedclient | 40m 40s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 16s | | the patch passed | | +1 :green_heart: | compile | 1m 23s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javac | 1m 23s | | the patch passed | | +1 :green_heart: | compile | 1m 11s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 1m 11s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 7s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 16s | | the patch passed | | +1 :green_heart: | javadoc | 0m 57s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 29s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 21s | | the patch passed | | +1 :green_heart: | shadedclient | 41m 20s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 291m 14s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/7/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 57s | | The patch does not generate ASF License warnings. 
| | | | 448m 1s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.datanode.TestLargeBlockReport | | | hadoop.hdfs.protocol.TestBlockListAsLongs | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/7/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6595 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 6d1ecdf07d36 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 53ff41a79cd2904c76053cfca956b0511270b1ec | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/7/testReport/ | | Max. process+thread count | 2596 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | C
[jira] [Commented] (HDFS-17391) Adjust the checkpoint io buffer size to the chunk size
[ https://issues.apache.org/jira/browse/HDFS-17391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824423#comment-17824423 ] ASF GitHub Bot commented on HDFS-17391: --- hadoop-yetus commented on PR #6594: URL: https://github.com/apache/hadoop/pull/6594#issuecomment-1983556930 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 20s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 31m 42s | | trunk passed | | +1 :green_heart: | compile | 0m 41s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | compile | 0m 40s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 0m 36s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 44s | | trunk passed | | +1 :green_heart: | javadoc | 0m 39s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 3s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 48s | | trunk passed | | +1 :green_heart: | shadedclient | 20m 33s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 35s | | the patch passed | | +1 :green_heart: | compile | 0m 37s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javac | 0m 37s | | the patch passed | | +1 :green_heart: | compile | 0m 33s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 0m 33s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 28s | | hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 4 unchanged - 4 fixed = 4 total (was 8) | | +1 :green_heart: | mvnsite | 0m 37s | | the patch passed | | +1 :green_heart: | javadoc | 0m 33s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 0m 58s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 43s | | the patch passed | | +1 :green_heart: | shadedclient | 22m 19s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 197m 48s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6594/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 28s | | The patch does not generate ASF License warnings. 
| | | | 286m 18s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.tools.TestDFSAdmin | | | hadoop.hdfs.server.datanode.TestLargeBlockReport | | | hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand | | | hadoop.hdfs.protocol.TestBlockListAsLongs | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6594/5/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6594 | | JIRA Issue | HDFS-17391 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux f97425500da8 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 595e396fa499ab7b0a67ad1d9f4d4d762a14e260 | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-U
[jira] [Commented] (HDFS-17364) Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream
[ https://issues.apache.org/jira/browse/HDFS-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824417#comment-17824417 ] ASF GitHub Bot commented on HDFS-17364: --- bbeaudreault commented on PR #6514: URL: https://github.com/apache/hadoop/pull/6514#issuecomment-1983545666 @Hexiaoqiao @zhangshuyan0 Thanks for your reviews. I realized there's also an ElasticByteBufferPool in DFSStripedOutputStream. I'm thinking of handling that here as well. What do you think? > Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream > > > Key: HDFS-17364 > URL: https://issues.apache.org/jira/browse/HDFS-17364 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Bryan Beaudreault >Priority: Major > Labels: pull-request-available > > DFSStripedInputStream uses ElasticByteBufferPool to allocate byte buffers for > the "curStripeBuf". This is used for non-positional (stateful) reads and is > allocated with a size of numDataBlocks * cellSize. For RS-6-3-1024k, that > means each DFSStripedInputStream could allocate a 6 MB buffer. When the input stream is > finished, the buffer is put back in the pool. Over time and with spikes of > concurrent reads, the pool grows and most of the buffers sit there unused. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
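The mitigation proposed in this issue relies on the pool holding returned buffers only through weak references, so the GC can reclaim idle ones instead of the pool pinning them forever. A simplified sketch of that idea (illustrative only, not Hadoop's WeakReferencedElasticByteBufferPool):

```java
import java.lang.ref.WeakReference;
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified weak-referenced buffer pool: released buffers are held only
// weakly, so under memory pressure the GC can collect them; references that
// have been cleared are simply skipped on the next get().
public class WeakBufferPool {
    private final Deque<WeakReference<ByteBuffer>> pool = new ArrayDeque<>();

    public synchronized ByteBuffer get(int size) {
        WeakReference<ByteBuffer> ref;
        while ((ref = pool.poll()) != null) {
            ByteBuffer b = ref.get();
            if (b != null && b.capacity() >= size) {
                b.clear();
                return b;          // reuse a pooled buffer that survived GC
            }
        }
        return ByteBuffer.allocate(size);  // pool empty or all collected
    }

    public synchronized void release(ByteBuffer buffer) {
        pool.push(new WeakReference<>(buffer));
    }
}
```

The trade-off matches the discussion above: a plain elastic pool keeps every spike-sized stripe buffer alive indefinitely, while a weakly-referenced pool lets the excess be collected once the read spike passes.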
[jira] [Commented] (HDFS-17364) Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream
[ https://issues.apache.org/jira/browse/HDFS-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824416#comment-17824416 ] ASF GitHub Bot commented on HDFS-17364: --- bbeaudreault commented on code in PR #6514: URL: https://github.com/apache/hadoop/pull/6514#discussion_r1516181907 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java: ## @@ -3159,6 +3165,21 @@ private void initThreadsNumForStripedReads(int numThreads) { } } + private void initBufferPoolForStripedReads(boolean useWeakReference) { +if (STRIPED_READ_BUFFER_POOL != null) { + return; +} +synchronized (DFSClient.class) { Review Comment: @Hexiaoqiao thanks for review. For this block, it's sort of modeled after other examples in DFSClient, such [as initializing the striped read thread pool](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java#L3150). I think the idea is that DFSClient could easily be used in multiple threads, so we want to avoid double initializing the shared resource. > Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream > > > Key: HDFS-17364 > URL: https://issues.apache.org/jira/browse/HDFS-17364 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Bryan Beaudreault >Priority: Major > Labels: pull-request-available > > DFSStripedInputStream uses ElasticByteBufferPool to allocate byte buffers for > the "curStripeBuf". This is used for non-positional (stateful) reads and is > allocated with a size of numDataBlocks * cellSize. For RS-6-3-1024k, that > means each DFSStripedInputStream could allocate a 6mb buffer. When the IS is > finished, the buffer is put back in the pool. Over time and with spikes of > concurrent reads, the pool grows and most of the buffers sit there unused. 
> > WeakReferencedElasticByteBufferPool was introduced HADOOP-18105 and mitigates > this issue because the excess buffers can be GC'd once they are no longer > needed. We should use this same pool in DFSStripedInputStream -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
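The initialization pattern discussed in this review thread (a fast null check, then a class-level synchronized block, because many DFSClient instances may race to initialize one shared static resource) can be sketched as follows. The names are simplified stand-ins for the DFSClient fields, and the sketch includes the `volatile` modifier and the re-check inside the lock that make the pattern safe; it is not the exact PR code.

```java
/** Sketch of check-then-lock-then-re-check initialization of a shared
 *  static resource (names are stand-ins, not the actual DFSClient code). */
class SharedPoolInit {
  // volatile so the fast-path read sees a fully constructed pool.
  private static volatile Object STRIPED_READ_BUFFER_POOL;
  static int initCount = 0;  // for demonstration only

  static void initBufferPool(boolean useWeakReference) {
    if (STRIPED_READ_BUFFER_POOL != null) {
      return;  // fast path: already initialized, no lock taken
    }
    synchronized (SharedPoolInit.class) {
      // Re-check under the lock: another thread may have won the race
      // between our fast-path check and lock acquisition.
      if (STRIPED_READ_BUFFER_POOL == null) {
        STRIPED_READ_BUFFER_POOL = useWeakReference
            ? "WeakReferencedElasticByteBufferPool"
            : "ElasticByteBufferPool";
        initCount++;
      }
    }
  }
}
```

Without the re-check inside the `synchronized` block, two threads that both pass the fast-path check would each initialize the pool, which is exactly the double-initialization the reviewer's question is probing.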
[jira] [Commented] (HDFS-17364) Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream
[ https://issues.apache.org/jira/browse/HDFS-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824415#comment-17824415 ] ASF GitHub Bot commented on HDFS-17364: --- bbeaudreault commented on code in PR #6514: URL: https://github.com/apache/hadoop/pull/6514#discussion_r1516179674 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java: ## @@ -530,6 +530,9 @@ interface StripedRead { * span 6 DNs, so this default value accommodates 3 read streams */ int THREADPOOL_SIZE_DEFAULT = 18; + +String WEAK_REF_BUFFER_POOL_KEY = PREFIX + "bufferpool.weak.references.enabled"; +boolean WEAK_REF_BUFFER_POOL_DEFAULT = false; Review Comment: Will do > Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream > > > Key: HDFS-17364 > URL: https://issues.apache.org/jira/browse/HDFS-17364 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Bryan Beaudreault >Priority: Major > Labels: pull-request-available > > DFSStripedInputStream uses ElasticByteBufferPool to allocate byte buffers for > the "curStripeBuf". This is used for non-positional (stateful) reads and is > allocated with a size of numDataBlocks * cellSize. For RS-6-3-1024k, that > means each DFSStripedInputStream could allocate a 6mb buffer. When the IS is > finished, the buffer is put back in the pool. Over time and with spikes of > concurrent reads, the pool grows and most of the buffers sit there unused. > > WeakReferencedElasticByteBufferPool was introduced HADOOP-18105 and mitigates > this issue because the excess buffers can be GC'd once they are no longer > needed. We should use this same pool in DFSStripedInputStream -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17368) HA: Standby should exit safemode when resources recover from low availability
[ https://issues.apache.org/jira/browse/HDFS-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824410#comment-17824410 ] ASF GitHub Bot commented on HDFS-17368: --- Hexiaoqiao commented on code in PR #6518: URL: https://github.com/apache/hadoop/pull/6518#discussion_r1516122704 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java: ## @@ -1582,6 +1582,10 @@ void startStandbyServices(final Configuration conf, boolean isObserver) standbyCheckpointer = new StandbyCheckpointer(conf, this); standbyCheckpointer.start(); } +if (isNoManualAndResourceLowSafeMode()) { + LOG.info("Standby should not enter safe mode when resources are low, exiting safe mode."); + leaveSafeMode(false); Review Comment: It looks reasonable at first glance, but I have not thought it through carefully. Are there any cases that could trigger the Standby to leave safemode prematurely? Thanks. > HA: Standby should exit safemode when resources recover from low availability > - > > Key: HDFS-17368 > URL: https://issues.apache.org/jira/browse/HDFS-17368 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zilong Zhu >Assignee: Zilong Zhu >Priority: Major > Labels: pull-request-available > > The NameNodeResourceMonitor automatically enters safemode when it detects > that the resources are not sufficient. The NNRM runs only on the ANN. If both the ANN and > SNN enter SM due to low resources, and later the SNN's disk space is restored, the > SNN will become the ANN and the ANN will become the SNN. However, at this point, the new SNN > will not exit the SM, even after its disk is recovered. > Consider the following scenario: > * Initially, nn-1 is active and nn-2 is standby. Resources are insufficient > in dfs.namenode.name.dir on both nn-1 and nn-2, so the NameNodeResourceMonitor > detects the resource issue and puts nn-1 into safemode. > * At this point, nn-1 is in safemode (ON) and active, while nn-2 is in > safemode (OFF) and standby. 
> * After a period of time, the resources in nn-2's dfs.namenode.name.dir > recover, triggering failover. > * Now, nn-1 is in safe mode (ON) and standby, while nn-2 is in safe mode > (OFF) and active. > * Afterward, the resources in nn-1's dfs.namenode.name.dir recover. > * However, since nn-1 is standby but in safemode (ON), it is unable to exit > safe mode automatically. > There are two possible ways to fix this issue: > # If the SNN is detected to be in SM (because of low resources), it will exit. > # Or, since we already have HDFS-17231, we can revert HDFS-2914, bringing the NNRM back > to the SNN. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
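The first proposed fix (exit safemode on transition to standby when it was entered only because resources were low) can be modeled with a toy state machine. The class and field names below are hypothetical, not the actual FSNamesystem members:

```java
/** Toy model of the proposed standby safemode-exit fix (names invented). */
class SafeModeStateSketch {
  boolean manualSafeMode;       // entered by an operator; must not auto-exit
  boolean resourceLowSafeMode;  // entered by the NameNodeResourceMonitor

  boolean isNoManualAndResourceLowSafeMode() {
    return !manualSafeMode && resourceLowSafeMode;
  }

  void startStandbyServices() {
    // Proposed fix: a standby NN cannot clear a resource-low safemode on
    // its own (the resource monitor runs only on the active NN), so drop
    // out of it during the transition to standby.
    if (isNoManualAndResourceLowSafeMode()) {
      leaveSafeMode();
    }
  }

  void leaveSafeMode() {
    resourceLowSafeMode = false;
  }
}
```

Note the guard deliberately skips manually entered safemode, which matches the `isNoManualAndResourceLowSafeMode()` check in the quoted diff.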
[jira] [Commented] (HDFS-17368) HA: Standby should exit safemode when resources recover from low availability
[ https://issues.apache.org/jira/browse/HDFS-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824403#comment-17824403 ] Xiaoqiao He commented on HDFS-17368: Added [~zilong zhu] to the contributor list and assigned this ticket to them. > HA: Standby should exit safemode when resources recover from low availability > - > > Key: HDFS-17368 > URL: https://issues.apache.org/jira/browse/HDFS-17368 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zilong Zhu >Assignee: Zilong Zhu >Priority: Major > Labels: pull-request-available > > The NameNodeResourceMonitor automatically enters safemode when it detects > that the resources are not sufficient. The NNRM runs only on the ANN. If both the ANN and > SNN enter SM due to low resources, and later the SNN's disk space is restored, the > SNN will become the ANN and the ANN will become the SNN. However, at this point, the new SNN > will not exit the SM, even after its disk is recovered. > Consider the following scenario: > * Initially, nn-1 is active and nn-2 is standby. Resources are insufficient > in dfs.namenode.name.dir on both nn-1 and nn-2, so the NameNodeResourceMonitor > detects the resource issue and puts nn-1 into safemode. > * At this point, nn-1 is in safemode (ON) and active, while nn-2 is in > safemode (OFF) and standby. > * After a period of time, the resources in nn-2's dfs.namenode.name.dir > recover, triggering failover. > * Now, nn-1 is in safe mode (ON) and standby, while nn-2 is in safe mode > (OFF) and active. > * Afterward, the resources in nn-1's dfs.namenode.name.dir recover. > * However, since nn-1 is standby but in safemode (ON), it is unable to exit > safe mode automatically. > There are two possible ways to fix this issue: > # If the SNN is detected to be in SM (because of low resources), it will exit. > # Or, since we already have HDFS-17231, we can revert HDFS-2914, bringing the NNRM back > to the SNN. 
[jira] [Assigned] (HDFS-17368) HA: Standby should exit safemode when resources recover from low availability
[ https://issues.apache.org/jira/browse/HDFS-17368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He reassigned HDFS-17368: -- Assignee: Zilong Zhu > HA: Standby should exit safemode when resources recover from low availability > - > > Key: HDFS-17368 > URL: https://issues.apache.org/jira/browse/HDFS-17368 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Zilong Zhu >Assignee: Zilong Zhu >Priority: Major > Labels: pull-request-available > > The NameNodeResourceMonitor automatically enters safemode when it detects > that the resources are not sufficient. The NNRM runs only on the ANN. If both the ANN and > SNN enter SM due to low resources, and later the SNN's disk space is restored, the > SNN will become the ANN and the ANN will become the SNN. However, at this point, the new SNN > will not exit the SM, even after its disk is recovered. > Consider the following scenario: > * Initially, nn-1 is active and nn-2 is standby. Resources are insufficient > in dfs.namenode.name.dir on both nn-1 and nn-2, so the NameNodeResourceMonitor > detects the resource issue and puts nn-1 into safemode. > * At this point, nn-1 is in safemode (ON) and active, while nn-2 is in > safemode (OFF) and standby. > * After a period of time, the resources in nn-2's dfs.namenode.name.dir > recover, triggering failover. > * Now, nn-1 is in safe mode (ON) and standby, while nn-2 is in safe mode > (OFF) and active. > * Afterward, the resources in nn-1's dfs.namenode.name.dir recover. > * However, since nn-1 is standby but in safemode (ON), it is unable to exit > safe mode automatically. > There are two possible ways to fix this issue: > # If the SNN is detected to be in SM (because of low resources), it will exit. > # Or, since we already have HDFS-17231, we can revert HDFS-2914, bringing the NNRM back > to the SNN. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17364) Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream
[ https://issues.apache.org/jira/browse/HDFS-17364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824398#comment-17824398 ] ASF GitHub Bot commented on HDFS-17364: --- Hexiaoqiao commented on code in PR #6514: URL: https://github.com/apache/hadoop/pull/6514#discussion_r1516099526 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java: ## @@ -530,6 +530,9 @@ interface StripedRead { * span 6 DNs, so this default value accommodates 3 read streams */ int THREADPOOL_SIZE_DEFAULT = 18; + +String WEAK_REF_BUFFER_POOL_KEY = PREFIX + "bufferpool.weak.references.enabled"; +boolean WEAK_REF_BUFFER_POOL_DEFAULT = false; Review Comment: Please also add this default config to core-default.xml. ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java: ## @@ -3159,6 +3165,21 @@ private void initThreadsNumForStripedReads(int numThreads) { } } + private void initBufferPoolForStripedReads(boolean useWeakReference) { +if (STRIPED_READ_BUFFER_POOL != null) { + return; +} +synchronized (DFSClient.class) { Review Comment: What this `synchronized` would like to protect? > Use WeakReferencedElasticByteBufferPool in DFSStripedInputStream > > > Key: HDFS-17364 > URL: https://issues.apache.org/jira/browse/HDFS-17364 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Bryan Beaudreault >Priority: Major > Labels: pull-request-available > > DFSStripedInputStream uses ElasticByteBufferPool to allocate byte buffers for > the "curStripeBuf". This is used for non-positional (stateful) reads and is > allocated with a size of numDataBlocks * cellSize. For RS-6-3-1024k, that > means each DFSStripedInputStream could allocate a 6mb buffer. When the IS is > finished, the buffer is put back in the pool. Over time and with spikes of > concurrent reads, the pool grows and most of the buffers sit there unused. 
> > WeakReferencedElasticByteBufferPool was introduced HADOOP-18105 and mitigates > this issue because the excess buffers can be GC'd once they are no longer > needed. We should use this same pool in DFSStripedInputStream -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17401) EC: Excess internal block may not be able to be deleted correctly when it's stored in fallback storage
[ https://issues.apache.org/jira/browse/HDFS-17401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824389#comment-17824389 ] ASF GitHub Bot commented on HDFS-17401: --- haiyang1987 commented on code in PR #6597: URL: https://github.com/apache/hadoop/pull/6597#discussion_r1516059534 ## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestReconstructStripedBlocks.java: ## @@ -575,5 +576,82 @@ public void testReconstructionWithStorageTypeNotEnough() throws Exception { cluster.shutdown(); } } + @Test + public void testDeleteOverReplicatedStripedBlock() throws Exception { +final HdfsConfiguration conf = new HdfsConfiguration(); +conf.setInt(DFSConfigKeys.DFS_NAMENODE_REDUNDANCY_INTERVAL_SECONDS_KEY, 1); +conf.setBoolean(DFSConfigKeys.DFS_NAMENODE_REDUNDANCY_CONSIDERLOAD_KEY, +false); +StorageType[][] st = new StorageType[groupSize + 2][1]; +for (int i = 0;i < st.length-1;i++){ + st[i] = new StorageType[]{StorageType.SSD}; +} +st[st.length -1] = new StorageType[]{StorageType.DISK}; + +cluster = new MiniDFSCluster.Builder(conf).numDataNodes(groupSize + 2) +.storagesPerDatanode(1) +.storageTypes(st) +.build(); +cluster.waitActive(); +DistributedFileSystem fs = cluster.getFileSystem(); +fs.enableErasureCodingPolicy( +StripedFileTestUtil.getDefaultECPolicy().getName()); +try { + fs.mkdirs(dirPath); + fs.setErasureCodingPolicy(dirPath, + StripedFileTestUtil.getDefaultECPolicy().getName()); + fs.setStoragePolicy(dirPath, HdfsConstants.ALLSSD_STORAGE_POLICY_NAME); + DFSTestUtil.createFile(fs, filePath, + cellSize * dataBlocks * 2, (short) 1, 0L); + FSNamesystem fsn3 = cluster.getNamesystem(); + BlockManager bm3 = fsn3.getBlockManager(); + // stop a dn Review Comment: The first letter should be uppercase~ > EC: Excess internal block may not be able to be deleted correctly when it's > stored in fallback storage > -- > > Key: HDFS-17401 > URL: https://issues.apache.org/jira/browse/HDFS-17401 > Project: 
Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.6 >Reporter: Ruinan Gu >Assignee: Ruinan Gu >Priority: Major > Labels: pull-request-available > > Excess internal block can't be deleted correctly when it's stored in fallback > storage. > Simple case: > An EC-RS-6-3-1024k file is stored using the ALL_SSD storage policy (SSD is the default > storage type and DISK is the fallback storage type). If the block group is as > follows > [0(SSD), 0(SSD), 1(SSD), 2(SSD), 3(SSD), 4(SSD), 5(SSD), 6(SSD), 7(SSD), > 8(DISK)] > there are two index 0 internal blocks, and one of them should be chosen for > deletion. But the current implementation chooses the index 0 internal blocks as > candidates but DISK as the excess storage type. As a result, the excess storage > type (DISK) does not correspond to the excess internal blocks' storage type (SSD), > and the excess internal block cannot be deleted correctly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
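The mismatch described in this report can be modeled in a few lines: the deletion candidates are the internal blocks with a duplicated block index, so the excess storage type should be taken from those candidates themselves rather than computed independently (where it can land on the fallback DISK replica of a different index). This is a toy illustration with invented names, not the BlockManager code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of excess EC internal-block selection (names invented). */
class ExcessEcReplicaSketch {
  static final class Replica {
    final int blockIndex;
    final String storageType;
    Replica(int blockIndex, String storageType) {
      this.blockIndex = blockIndex;
      this.storageType = storageType;
    }
  }

  /** Return a replica to delete: one of the duplicated internal blocks.
   *  Because the victim is picked from the duplicate group, its storage
   *  type matches the candidates by construction, avoiding the
   *  SSD-vs-DISK mismatch described in the report. */
  static Replica chooseExcess(List<Replica> replicas) {
    Map<Integer, List<Replica>> byIndex = new TreeMap<>();
    for (Replica r : replicas) {
      byIndex.computeIfAbsent(r.blockIndex, k -> new ArrayList<>()).add(r);
    }
    for (List<Replica> group : byIndex.values()) {
      if (group.size() > 1) {
        return group.get(0);  // delete one duplicate of the over-replicated index
      }
    }
    return null;  // no over-replicated index
  }
}
```

For the block group in the report, the chosen victim is an index-0 SSD replica, never the index-8 DISK replica.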
[jira] [Updated] (HDFS-17410) [FGL] Client RPCs that changes file attributes supports fine-grained lock
[ https://issues.apache.org/jira/browse/HDFS-17410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZanderXu updated HDFS-17410: Description: There are some client RPCs are used to change file attributes. This ticket is used to make these RPCs supporting fine-grained lock. * setReplication * getStoragePolicies * setStoragePolicy * unsetStoragePolicy * getStoragePolicy * setPermission * setOwner * setTimes * concat * truncate * setQuota * getQuotaUsage * modifyAclEntries * removeAclEntries * removeDefaultAcl * removeAcl * setAcl * getAclStatus * getEZForPath * listEncryptionZones * reencryptEncryptionZone * listReencryptionStatus was: There are some client RPCs are used to change file attributes. This ticket is used to make these RPCs supporting fine-grained lock. * setReplication * getStoragePolicies * setStoragePolicy * unsetStoragePolicy * getStoragePolicy * setPermission * setOwner * setTimes * concat * truncate * > [FGL] Client RPCs that changes file attributes supports fine-grained lock > - > > Key: HDFS-17410 > URL: https://issues.apache.org/jira/browse/HDFS-17410 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > > There are some client RPCs are used to change file attributes. > This ticket is used to make these RPCs supporting fine-grained lock. > * setReplication > * getStoragePolicies > * setStoragePolicy > * unsetStoragePolicy > * getStoragePolicy > * setPermission > * setOwner > * setTimes > * concat > * truncate > * setQuota > * getQuotaUsage > * modifyAclEntries > * removeAclEntries > * removeDefaultAcl > * removeAcl > * setAcl > * getAclStatus > * getEZForPath > * listEncryptionZones > * reencryptEncryptionZone > * listReencryptionStatus -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17388) [FGL] Client RPCs involving write process supports fine-grained lock
[ https://issues.apache.org/jira/browse/HDFS-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZanderXu updated HDFS-17388: Description: The client write process involves many client RPCs. This ticket is used to make these RPCs support fine-grained lock. * mkdir * create * addBlock * abandonBlock * getAdditionalDatanode * updateBlockForPipeline * updatePipeline * fsync * commit * rename * rename2 * append * renewLease * recoverLease * delete was: The client write process involves many client RPCs. This ticket is used to make these RPCs support fine-grained lock. * mkdir * create * addBlock * abandonBlock * getAdditionalDatanode * upadteBlockForPipeline * updatePipeline * fsync * commit * rename * rename2 * append * renewLease * recoverLease > [FGL] Client RPCs involving write process supports fine-grained lock > > > Key: HDFS-17388 > URL: https://issues.apache.org/jira/browse/HDFS-17388 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > > The client write process involves many client RPCs. > > This ticket is used to make these RPCs support fine-grained lock. > * mkdir > * create > * addBlock > * abandonBlock > * getAdditionalDatanode > * updateBlockForPipeline > * updatePipeline > * fsync > * commit > * rename > * rename2 > * append > * renewLease > * recoverLease > * delete -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17411) [FGL] Client RPCs involving snapshot support fine-grained lock
[ https://issues.apache.org/jira/browse/HDFS-17411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZanderXu updated HDFS-17411: Description: There are some client rpcs to handle snapshot. This ticket is used to make these RPCs supporting fine-grained locking. * getSnapshottableDirListing * getSnapshotListing * createSnapshot * deleteSnapshot * renameSnapshot * allowSnapshot * disallowSnapshot * getSnapshotDiffReport * getSnapshotDiffReportListing was: There are some client rpcs to handle snapshot. This ticket is used to make these RPCs supporting fine-grained locking. * > [FGL] Client RPCs involving snapshot support fine-grained lock > -- > > Key: HDFS-17411 > URL: https://issues.apache.org/jira/browse/HDFS-17411 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > > There are some client rpcs to handle snapshot. > This ticket is used to make these RPCs supporting fine-grained locking. > * getSnapshottableDirListing > * getSnapshotListing > * createSnapshot > * deleteSnapshot > * renameSnapshot > * allowSnapshot > * disallowSnapshot > * getSnapshotDiffReport > * getSnapshotDiffReportListing -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17389) [FGL] Client RPCs involving read process supports fine-grained lock
[ https://issues.apache.org/jira/browse/HDFS-17389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZanderXu updated HDFS-17389: Description: The client read process involves many client RPCs. This ticket is used to make these RPCs support fine-grained lock. * getListing * getBatchedListing * listOpenFiles * getFileInfo * isFileClosed * getBlockLocations * reportBadBlocks * getServerDefaults * getStats * getReplicatedBlockStats * getECBlockGroupStats * getPreferredBlockSize * listCorruptFileBlocks * getContentSummary * getLocatedFileInfo * createEncryptionZone was: The client read process involves many client RPCs. This ticket is used to make these RPCs support fine-grained lock. * getListing * getBatchedListing * listOpenFiles * getFileInfo * isFileClosed * getBlockLocations * reportBadBlocks > [FGL] Client RPCs involving read process supports fine-grained lock > --- > > Key: HDFS-17389 > URL: https://issues.apache.org/jira/browse/HDFS-17389 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > > The client read process involves many client RPCs. > > This ticket is used to make these RPCs support fine-grained lock. > * getListing > * getBatchedListing > * listOpenFiles > * getFileInfo > * isFileClosed > * getBlockLocations > * reportBadBlocks > * getServerDefaults > * getStats > * getReplicatedBlockStats > * getECBlockGroupStats > * getPreferredBlockSize > * listCorruptFileBlocks > * getContentSummary > * getLocatedFileInfo > * createEncryptionZone -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-17411) [FGL] Client RPCs involving snapshot support fine-grained lock
ZanderXu created HDFS-17411: --- Summary: [FGL] Client RPCs involving snapshot support fine-grained lock Key: HDFS-17411 URL: https://issues.apache.org/jira/browse/HDFS-17411 Project: Hadoop HDFS Issue Type: Sub-task Reporter: ZanderXu Assignee: ZanderXu There are some client rpcs to handle snapshot. This ticket is used to make these RPCs supporting fine-grained locking. * -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-17410) [FGL] Client RPCs that changes file attributes supports fine-grained lock
ZanderXu created HDFS-17410: --- Summary: [FGL] Client RPCs that changes file attributes supports fine-grained lock Key: HDFS-17410 URL: https://issues.apache.org/jira/browse/HDFS-17410 Project: Hadoop HDFS Issue Type: Sub-task Reporter: ZanderXu Assignee: ZanderXu There are some client RPCs are used to change file attributes. This ticket is used to make these RPCs supporting fine-grained lock. * setReplication * getStoragePolicies * setStoragePolicy * unsetStoragePolicy * getStoragePolicy * setPermission * setOwner * setTimes * concat * truncate * -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17380) FsImageValidation: remove inaccessible nodes
[ https://issues.apache.org/jira/browse/HDFS-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824353#comment-17824353 ] ASF GitHub Bot commented on HDFS-17380: --- Hexiaoqiao commented on PR #6549: URL: https://github.com/apache/hadoop/pull/6549#issuecomment-1983209877 Hi @szetszwo , thanks for your work. I am not sure this is a safe operation. At least 2 checkpoints are kept by default (dfs.namenode.num.checkpoints.retained), and it is generally configured to more than the default value in production environments. IMO, if one fsimage file is corrupted, we should recover from the other fsimages first rather than remove inaccessible nodes and then recover. I am afraid this will not be acceptable in most cases. Thanks again. > FsImageValidation: remove inaccessible nodes > > > Key: HDFS-17380 > URL: https://issues.apache.org/jira/browse/HDFS-17380 > Project: Hadoop HDFS > Issue Type: Improvement > Components: tools >Reporter: Tsz-wo Sze >Assignee: Tsz-wo Sze >Priority: Major > Labels: pull-request-available > > If an fsimage is corrupted, it may have inaccessible nodes. The > FsImageValidation tool currently is able to identify the inaccessible nodes > when validating the INodeMap. This JIRA is to update the tool to remove the > inaccessible nodes and then save a new fsimage. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
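Independent of whether the removal is safe to apply in production, the tool's notion of an "inaccessible node" is reachability in the inode graph: any inode not reachable from the root is dropped. A minimal sketch of that idea, with invented names rather than the actual FsImageValidation code:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy reachability pass over an inode graph (names invented). */
class InodeReachabilitySketch {
  /** Collect all inode ids reachable from the root via child edges. */
  static Set<Integer> reachable(Map<Integer, List<Integer>> children, int root) {
    Set<Integer> seen = new HashSet<>();
    Deque<Integer> stack = new ArrayDeque<>(List.of(root));
    while (!stack.isEmpty()) {
      int n = stack.pop();
      if (seen.add(n)) {
        stack.addAll(children.getOrDefault(n, List.of()));
      }
    }
    return seen;
  }

  /** Drop every inode id that the walk from the root cannot reach. */
  static void removeInaccessible(Map<Integer, List<Integer>> children,
                                 Set<Integer> allNodes, int root) {
    allNodes.retainAll(reachable(children, root));
  }
}
```

In the real tool the surviving inodes would then be written out as a new fsimage; the sketch only shows the pruning step.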
[jira] [Updated] (HDFS-17389) [FGL] Client RPCs involving read process supports fine-grained lock
[ https://issues.apache.org/jira/browse/HDFS-17389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZanderXu updated HDFS-17389: Description: The client read process involves many client RPCs. This ticket is used to make these RPCs support fine-grained lock. * getListing * getBatchedListing * listOpenFiles * getFileInfo * isFileClosed * getBlockLocations * reportBadBlocks was:The Create RPC minimizes the scope of the global BM lock, because it doesn't need the global BM lock in most scenes. > [FGL] Client RPCs involving read process supports fine-grained lock > --- > > Key: HDFS-17389 > URL: https://issues.apache.org/jira/browse/HDFS-17389 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > > The client read process involves many client RPCs. > > This ticket is used to make these RPCs support fine-grained lock. > * getListing > * getBatchedListing > * listOpenFiles > * getFileInfo > * isFileClosed > * getBlockLocations > * reportBadBlocks -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.
[ https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824348#comment-17824348 ] ASF GitHub Bot commented on HDFS-17299: --- hadoop-yetus commented on PR #6612: URL: https://github.com/apache/hadoop/pull/6612#issuecomment-1983180401 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 22s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. | _ branch-3.3 Compile Tests _ | | +0 :ok: | mvndep | 12m 59s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 22m 27s | | branch-3.3 passed | | +1 :green_heart: | compile | 2m 14s | | branch-3.3 passed | | +1 :green_heart: | checkstyle | 0m 37s | | branch-3.3 passed | | +1 :green_heart: | mvnsite | 1m 27s | | branch-3.3 passed | | +1 :green_heart: | javadoc | 1m 37s | | branch-3.3 passed | | -1 :x: | spotbugs | 1m 26s | [/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6612/2/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client-warnings.html) | hadoop-hdfs-project/hadoop-hdfs-client in branch-3.3 has 2 extant spotbugs warnings. | | +1 :green_heart: | shadedclient | 23m 12s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 21s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 1m 18s | | the patch passed | | +1 :green_heart: | compile | 2m 9s | | the patch passed | | +1 :green_heart: | javac | 2m 9s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 33s | | hadoop-hdfs-project: The patch generated 0 new + 249 unchanged - 3 fixed = 249 total (was 252) | | +1 :green_heart: | mvnsite | 1m 20s | | the patch passed | | +1 :green_heart: | javadoc | 1m 18s | | the patch passed | | +1 :green_heart: | spotbugs | 3m 18s | | the patch passed | | +1 :green_heart: | shadedclient | 22m 8s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 1m 49s | | hadoop-hdfs-client in the patch passed. | | -1 :x: | unit | 172m 34s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6612/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 31s | | The patch does not generate ASF License warnings. 
| | | | 276m 57s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.protocol.TestBlockListAsLongs | | | hadoop.hdfs.server.mover.TestMover | | | hadoop.hdfs.server.datanode.TestDirectoryScanner | | | hadoop.hdfs.TestLeaseRecovery2 | | | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes | | | hadoop.hdfs.server.datanode.TestLargeBlockReport | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6612/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6612 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 8029685ad3de 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | branch-3.3 / 5d4a6ed957d86f85618f70f27d11f6077336b16f | | Default Java | Private Build-1.8.0_362-8u372-ga~us1-0ubuntu1~18.04-b09 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6612/2/testReport/ | | Max. process+thread count | 4424 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/
[jira] [Updated] (HDFS-17389) [FGL] Client RPCs involving read process supports fine-grained lock
[ https://issues.apache.org/jira/browse/HDFS-17389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZanderXu updated HDFS-17389: Summary: [FGL] Client RPCs involving read process supports fine-grained lock (was: [FGL] Create RPC minimizes the scope of the global BM lock) > [FGL] Client RPCs involving read process supports fine-grained lock > --- > > Key: HDFS-17389 > URL: https://issues.apache.org/jira/browse/HDFS-17389 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > > The Create RPC minimizes the scope of the global BM lock, because it doesn't > need the global BM lock in most scenes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17388) [FGL] Client RPCs involving write process supports fine-grained lock
[ https://issues.apache.org/jira/browse/HDFS-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZanderXu updated HDFS-17388: Description: The client write process involves many client RPCs. This ticket is used to make these RPCs support fine-grained lock. * mkdir * create * addBlock * abandonBlock * getAdditionalDatanode * updateBlockForPipeline * updatePipeline * fsync * commit * rename * rename2 * append * renewLease * recoverLease was: The create RPC just involves the directory tree if it creates a new file, and most cases are like this. It involves blocks only if the file already exists and is being overwritten. So in most scenarios, the create RPC just needs the FS lock. The current implementation just holds the global write lock, so in order for the improvement to be better accepted, the first step is just to replace the lock mode without changing logic. We can minimize the scope of the global BM lock in the second step. > [FGL] Client RPCs involving write process supports fine-grained lock > > > Key: HDFS-17388 > URL: https://issues.apache.org/jira/browse/HDFS-17388 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > > The client write process involves many client RPCs. > > This ticket is used to make these RPCs support fine-grained lock. > * mkdir > * create > * addBlock > * abandonBlock > * getAdditionalDatanode > * updateBlockForPipeline > * updatePipeline > * fsync > * commit > * rename > * rename2 > * append > * renewLease > * recoverLease -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-17388) [FGL] Client RPCs involving write process supports fine-grained lock
[ https://issues.apache.org/jira/browse/HDFS-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZanderXu updated HDFS-17388: Summary: [FGL] Client RPCs involving write process supports fine-grained lock (was: [FGL] Create RPC supports this fine-grained locking I) > [FGL] Client RPCs involving write process supports fine-grained lock > > > Key: HDFS-17388 > URL: https://issues.apache.org/jira/browse/HDFS-17388 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > > The create RPC just involves directory tree if it creates a new file, and > most scenes are like this. It involves blocks only if the file is existing > and it tries to overwrite it. > So in most scenarios, the create RPC just needs FS lock. > The current implementation just holds the global write lock, so in order for > the improvement to be better accepted, the first step is just to replace the > lock mode without changing logic. We can minimize the scope of the global BM > lock in the second step. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
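The locking split these FGL sub-tasks build on can be pictured with a small sketch. This is an illustrative sketch only, with hypothetical class and method names, not the actual NameNode code: a directory-only RPC such as mkdir takes just the FS (directory tree) lock, while a block-touching RPC such as addBlock takes both the FS and BM (block manager) locks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch of the FGL idea: split the single global namesystem
// lock into a directory-tree (FS) lock and a block-manager (BM) lock, so a
// directory-only operation no longer serializes with block-level work.
public class FineGrainedLocks {
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
    private final ReentrantReadWriteLock bmLock = new ReentrantReadWriteLock(true);
    private final List<String> inodes = new ArrayList<>();
    private final List<String> blocks = new ArrayList<>();

    public void mkdir(String path) {
        fsLock.writeLock().lock();            // touches the directory tree only
        try {
            inodes.add(path);
        } finally {
            fsLock.writeLock().unlock();
        }
    }

    public void addBlock(String path, String blockId) {
        fsLock.writeLock().lock();            // the file's block list lives in the tree
        bmLock.writeLock().lock();            // the block map is BM state
        try {
            blocks.add(path + ":" + blockId);
        } finally {
            bmLock.writeLock().unlock();      // release in reverse acquisition order
            fsLock.writeLock().unlock();
        }
    }

    public int inodeCount() { return inodes.size(); }
    public int blockCount() { return blocks.size(); }
}
```

Acquiring the two locks in a fixed order (FS before BM) and releasing in reverse avoids deadlock between RPCs that need both.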
[jira] [Comment Edited] (HDFS-17407) Exception during image upload
[ https://issues.apache.org/jira/browse/HDFS-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824329#comment-17824329 ] ruiliang edited comment on HDFS-17407 at 3/7/24 9:29 AM: - After analyzing the logs and source code: the two SbNNs initiated a checkpoint at the same time, and when the later one verified the file stream it found that the file had already been updated, so it threw an exception. Should this really be reported as an exception? SbNN 1 log {code:java} root@cluster06-yynn1:/data/logs/hadoop/hdfs# grep 57258734311 hadoop-hdfs-namenode-cluster06-nn1.xx.com.log 2024-03-07 16:48:00,061 INFO namenode.FSImage (FSImage.java:loadEdits(887)) - Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@4afc4056 expecting start txid #57258734311 2024-03-07 16:48:00,061 INFO namenode.FSImage (FSEditLogLoader.java:loadFSEdits(158)) - Start loading edits file http://fs-nn-party-65-190.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true, http://fs-nn-party-65-191.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true maxTxnsToRead = 9223372036854775807 2024-03-07 16:48:00,061 INFO namenode.RedundantEditLogInputStream (RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 'http://fs-nn-party-65-190.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true, http://fs-nn-party-65-191.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true' to transaction ID 57258734311 2024-03-07 16:48:00,061 INFO namenode.RedundantEditLogInputStream
(RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 'http://fs-nn-party-65-190.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true' to transaction ID 57258734311 2024-03-07 16:48:02,592 INFO namenode.FSImage (FSEditLogLoader.java:loadFSEdits(162)) - Edits file http://fs-nn-party-65-190.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true, http://fs-nn-party-65-191.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true of size 35380849 edits # 214398 loaded in 2 seconds {code} SbNN 2 log {code:java} root@cluster06-yynn3:/data/logs/hadoop/hdfs# grep 57258734311 hadoop-hdfs-namenode-cluster06-nn3.xx.com.log 2024-03-07 16:48:32,536 INFO namenode.FSImage (FSImage.java:loadEdits(887)) - Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@6d0659cd expecting start txid #57258734311 2024-03-07 16:48:32,536 INFO namenode.FSImage (FSEditLogLoader.java:loadFSEdits(158)) - Start loading edits file http://fs-nn-party-65-191.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true, http://fs-nn-party-65-190.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true maxTxnsToRead = 9223372036854775807 2024-03-07 16:48:32,536 INFO namenode.RedundantEditLogInputStream (RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 
'http://fs-nn-party-65-191.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true, http://fs-nn-party-65-190.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true' to transaction ID 57258734311 2024-03-07 16:48:32,536 INFO namenode.RedundantEditLogInputStream (RedundantEditLogInputStream.java:nextOp(177)) - Fast-forwarding stream 'http://fs-nn-party-65-191.xxcom:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inProgressOk=true' to transaction ID 57258734311 2024-03-07 16:48:35,634 INFO namenode.FSImage (FSEditLogLoader.java:loadFSEdits(162)) - Edits file http://fs-nn-party-65-191.xx.com:8480/getJournal?jid=yycluster06&segmentTxId=57258734311&storageInfo=-64%3A848315649%3A1660893388633%3ACID-1becf536-8c05-40cb-a1ff-106923139c5c&inP
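The race described in the comment above can be sketched minimally. This is a hypothetical illustration, not the actual ImageServlet logic: two standbys checkpoint at the same txid, the active accepts whichever upload lands first, and rejects the second because an image at that txid already exists, which is the condition surfacing as an exception.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of the concurrent-checkpoint race: the active NN
// tracks the txid of its newest fsimage and rejects any upload whose txid
// is not strictly newer. The second SbNN uploading at the same txid loses.
public class UploadGate {
    private final AtomicLong latestImageTxId = new AtomicLong(-1);

    /** Returns true iff the upload was accepted as strictly newer. */
    public boolean tryAccept(long uploadTxId) {
        long cur = latestImageTxId.get();
        if (uploadTxId <= cur) {
            return false;                 // a same-or-newer image already landed
        }
        return latestImageTxId.compareAndSet(cur, uploadTxId);
    }
}
```

Whether the losing side should log this at ERROR level, rather than treat it as a benign outcome, is exactly the question the comment raises.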
[jira] [Updated] (HDFS-17407) Exception during image upload
[ https://issues.apache.org/jira/browse/HDFS-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ruiliang updated HDFS-17407: Issue Type: Improvement (was: Bug) > Exception during image upload > - > > Key: HDFS-17407 > URL: https://issues.apache.org/jira/browse/HDFS-17407 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.1.0 > Environment: hadoop 3.1.0 > linux:ubuntu 16.04 > ambari-hdp:3.1.1 >Reporter: ruiliang >Priority: Major > > After I added a third HDFS namenode, the service was fine. However, the two > Standby namenode service logs always show exceptions during image upload. > At the same time, I observe that the image file on the primary node is being updated > normally, which indicates that a standby node has merged the image file > and uploaded it to the primary node. But I don't understand why the two Standby > namenodes keep logging such exceptions. Are there potential risks? > > namenode log > {code:java} > 2024-03-01 15:31:46,162 INFO namenode.TransferFsImage > (TransferFsImage.java:copyFileToStream(394)) - Sending fileName: > /data/hadoop/hdfs/namenode/current/fsimage_55689095810, fileSize: > 4626167848. Sent total: 1703936 bytes. Size of last segment intended to send: > 131072 bytes.
> java.io.IOException: Error writing request body to server > at > sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at > sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(TransferFsImage.java:376) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.writeFileToPutRequest(TransferFsImage.java:320) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImage(TransferFsImage.java:294) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.uploadImageFromStorage(TransferFsImage.java:229) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:236) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$1.call(StandbyCheckpointer.java:231) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > 2024-03-01 15:31:46,630 INFO blockmanagement.BlockManager > (BlockManager.java:enqueue(4923)) - Block report queue is full > 2024-03-01 15:31:46,664 ERROR ha.StandbyCheckpointer > (StandbyCheckpointer.java:doWork(452)) - Exception in doCheckpoint > java.io.IOException: Exception during image upload > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:257) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1500(StandbyCheckpointer.java:62) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.doWork(StandbyCheckpointer.java:432) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.access$600(StandbyCheckpointer.java:331) > at > 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread$1.run(StandbyCheckpointer.java:351) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:480) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.run(StandbyCheckpointer.java:347) > Caused by: java.util.concurrent.ExecutionException: java.io.IOException: > Error writing request body to server > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:250) > ... 9 more > Caused by: java.io.IOException: Error writing request body to server > at > sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.checkError(HttpURLConnection.java:3587) > at > sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.write(HttpURLConnection.java:3570) > at > org.apache.hadoop.hdfs.server.namenode.TransferFsImage.copyFileToStream(T
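The method at the top of the stack trace, TransferFsImage.copyFileToStream, is essentially a segmented copy loop from the local fsimage file to the HTTP PUT request body. A simplified, self-contained approximation (not the actual Hadoop code) shows where the "Error writing request body to server" surfaces: any IOException thrown by the output stream's write, typically because the remote NameNode closed the connection mid-upload.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Simplified approximation of TransferFsImage.copyFileToStream: push the
// fsimage to the upload stream in fixed-size segments, returning the total
// byte count. write() throwing here is what the stack trace above reports.
public class ImageUpload {
    public static long copyFileToStream(InputStream in, OutputStream out, int bufSize)
            throws IOException {
        byte[] buf = new byte[bufSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) > 0) {
            out.write(buf, 0, n);   // fails if the receiver rejected the upload
            total += n;
        }
        return total;
    }
}
```

The log line above ("Sent total: 1703936 bytes. Size of last segment intended to send: 131072 bytes") is consistent with a 128 KiB segment size and the connection dying early in the transfer.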
[jira] [Assigned] (HDFS-17391) Adjust the checkpoint io buffer size to the chunk size
[ https://issues.apache.org/jira/browse/HDFS-17391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He reassigned HDFS-17391: -- Assignee: lei w > Adjust the checkpoint io buffer size to the chunk size > -- > > Key: HDFS-17391 > URL: https://issues.apache.org/jira/browse/HDFS-17391 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: lei w >Assignee: lei w >Priority: Major > Labels: pull-request-available > > Adjust the checkpoint io buffer size to the chunk size to reduce checkpoint > time. > Before change: > 2022-07-11 07:10:50,900 INFO > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with > txid 374700896827 to namenode at http://:50070 in 1729.465 seconds > After change: > 2022-07-12 08:15:55,068 INFO > org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with > txid 375717629244 to namenode at http://:50070 in 858.668 seconds -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
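A back-of-the-envelope calculation shows why the checkpoint I/O buffer size matters: the number of write calls is the ceiling of the file size over the buffer size. The buffer sizes below are illustrative, not the values used by the patch; the fsimage size is the ~4.6 GB figure from the HDFS-17407 log earlier in this digest.

```java
// Illustrative arithmetic for HDFS-17391: larger buffers mean far fewer
// write round-trips for the same image. writeCalls is a ceiling division.
public class BufferMath {
    public static long writeCalls(long fileSize, long bufSize) {
        return (fileSize + bufSize - 1) / bufSize;   // ceil(fileSize / bufSize)
    }
}
```

For a 4,626,167,848-byte image, a 4 KiB buffer needs over a million write calls while a 1 MiB buffer needs a few thousand, which is the kind of per-call overhead reduction behind the 1729 s to 858 s upload-time improvement quoted above.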
[jira] [Assigned] (HDFS-17408) Reduce the number of quota calculations in FSDirRenameOp
[ https://issues.apache.org/jira/browse/HDFS-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He reassigned HDFS-17408: -- Assignee: lei w > Reduce the number of quota calculations in FSDirRenameOp > > > Key: HDFS-17408 > URL: https://issues.apache.org/jira/browse/HDFS-17408 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: lei w >Assignee: lei w >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17408) Reduce the number of quota calculations in FSDirRenameOp
[ https://issues.apache.org/jira/browse/HDFS-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824320#comment-17824320 ] ASF GitHub Bot commented on HDFS-17408: --- Hexiaoqiao commented on PR #6608: URL: https://github.com/apache/hadoop/pull/6608#issuecomment-1983018540 One minor point: it would help reviewers if you added some description of this improvement's background and goal. Offering benchmark results would be even better. > Reduce the number of quota calculations in FSDirRenameOp > > > Key: HDFS-17408 > URL: https://issues.apache.org/jira/browse/HDFS-17408 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: lei w >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17408) Reduce the number of quota calculations in FSDirRenameOp
[ https://issues.apache.org/jira/browse/HDFS-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824319#comment-17824319 ] ASF GitHub Bot commented on HDFS-17408: --- Hexiaoqiao commented on PR #6608: URL: https://github.com/apache/hadoop/pull/6608#issuecomment-1983003619 Thanks @ThinkerLei for your work. It's a great performance improvement! The last CI run wasn't clean, so let's trigger it again and wait to see what it says. > Reduce the number of quota calculations in FSDirRenameOp > > > Key: HDFS-17408 > URL: https://issues.apache.org/jira/browse/HDFS-17408 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: lei w >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
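The idea behind reducing quota calculations in rename can be sketched as follows. This is an illustrative sketch with hypothetical types, not FSDirRenameOp itself: the namespace/storagespace usage of the moved subtree is identical on the source and destination sides, so one subtree walk can serve both the source-quota release and the destination-quota check instead of recounting the subtree for each.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: count a subtree's {inode count, byte count} once
// and reuse the result for both sides of a rename, instead of walking
// the subtree separately for the release and the check.
public class RenameQuota {
    public static class Node {
        long fileBytes;
        List<Node> children = new ArrayList<>();
        public Node(long bytes) { fileBytes = bytes; }
    }

    // One recursive walk returns {inodes, bytes}.
    public static long[] countSubtree(Node n) {
        long inodes = 1, bytes = n.fileBytes;
        for (Node c : n.children) {
            long[] sub = countSubtree(c);
            inodes += sub[0];
            bytes += sub[1];
        }
        return new long[] {inodes, bytes};
    }

    /** Releases usage on the source and returns whether the dest quota fits. */
    public static boolean rename(Node subtree, long[] srcUsed,
                                 long destNsQuota, long destSsQuota) {
        long[] counts = countSubtree(subtree);           // computed exactly once
        srcUsed[0] -= counts[0];
        srcUsed[1] -= counts[1];                         // release on the source side
        return counts[0] <= destNsQuota && counts[1] <= destSsQuota; // dest check
    }
}
```

Since subtree counting is O(size of subtree), halving the number of walks matters most for renames of large directories.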
[jira] [Commented] (HDFS-17146) Use the dfsadmin -reconfig command to initiate reconfiguration on all decommissioning datanodes.
[ https://issues.apache.org/jira/browse/HDFS-17146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824295#comment-17824295 ] ASF GitHub Bot commented on HDFS-17146: --- hadoop-yetus commented on PR #6595: URL: https://github.com/apache/hadoop/pull/6595#issuecomment-1982845035 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 22s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 32m 26s | | trunk passed | | +1 :green_heart: | compile | 0m 43s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | compile | 0m 39s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 0m 39s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 45s | | trunk passed | | +1 :green_heart: | javadoc | 0m 42s | | trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 1m 9s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 45s | | trunk passed | | +1 :green_heart: | shadedclient | 20m 34s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 39s | | the patch passed | | +1 :green_heart: | compile | 0m 37s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javac | 0m 37s | | the patch passed | | +1 :green_heart: | compile | 0m 34s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 0m 34s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 27s | | the patch passed | | +1 :green_heart: | mvnsite | 0m 38s | | the patch passed | | +1 :green_heart: | javadoc | 0m 32s | | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 | | +1 :green_heart: | javadoc | 0m 58s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 40s | | the patch passed | | +1 :green_heart: | shadedclient | 20m 42s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 206m 8s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 31s | | The patch does not generate ASF License warnings. 
| | | | 294m 12s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.datanode.TestLargeBlockReport | | | hadoop.hdfs.protocol.TestBlockListAsLongs | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/6/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6595 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 92fc8280315c 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 8a47b5fee635b96071b99ac3b460e852cf25a6d5 | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6595/6/testReport/ | | Max. process+thread count | 3963 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | C
[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.
[ https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824294#comment-17824294 ] ASF GitHub Bot commented on HDFS-17299: --- Hexiaoqiao commented on PR #6613: URL: https://github.com/apache/hadoop/pull/6613#issuecomment-1982836474 Hi @ritegarg Thanks for your PR. branch-3.2 has been EOL, so we should not submit PRs against this branch. I will close this one. Please feel free to reopen it if I missed something. Thanks again. > HDFS is not rack failure tolerant while creating a new file. > > > Key: HDFS-17299 > URL: https://issues.apache.org/jira/browse/HDFS-17299 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.1 >Reporter: Rushabh Shah >Assignee: Ritesh >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.1, 3.5.0 > > Attachments: repro.patch > > > Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ. > Our configuration: > 1. We use 3 Availability Zones (AZs) for fault tolerance. > 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy. > 3. We use the following configuration parameters: > dfs.namenode.heartbeat.recheck-interval: 600000 > dfs.heartbeat.interval: 3 > So it will take 1230000 ms (20.5 mins) to detect that a datanode is dead. > > Steps to reproduce: > # Bring down 1 AZ. > # HBase (HDFS client) tries to create a file (WAL file) and then calls > hflush on the newly created file. > # DataStreamer is not able to find block locations that satisfy the rack > placement policy (one copy in each rack, which essentially means one copy in > each AZ). > # Since all the datanodes in that AZ are down but still alive to the namenode, > the client gets different datanodes, but all of them are still in the same AZ. > See logs below. > # HBase is not able to create a WAL file and it aborts the region server.
> > Relevant logs from hdfs client and namenode > > {noformat} > 2023-12-16 17:17:43,818 INFO [on default port 9000] FSNamesystem.audit - > allowed=true ugi=hbase/ (auth:KERBEROS) ip= > cmd=create src=/hbase/WALs/ dst=null > 2023-12-16 17:17:43,978 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652565_140946716, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,061 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715) > 2023-12-16 17:17:44,061 WARN [Thread-39087] hdfs.DataStreamer - Abandoning > BP-179318874--1594838129323:blk_1214652565_140946716 > 2023-12-16 17:17:44,179 WARN [Thread-39087] hdfs.DataStreamer - Excluding > datanode > DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK] > 2023-12-16 17:17:44,339 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652580_140946764, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,369 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651) > at 
org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715) > 2023-12-16 17:17:44,369 WARN [Thread-39087] hdfs.DataStreamer - Abandoning > BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764 > 2023-12-16 17:17:44,454 WARN [Thread-39087] hdfs.DataStreamer - Excluding > datanode > DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK] > 2023-12-16 17:17:44,522 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652594_140946796, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,712 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop
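The repeating pattern in the log above (allocate block, fail in createBlockOutputStream, abandon block, exclude one datanode, allocate again) can be sketched as a toy loop. This is an illustrative model only, not the real DataStreamer API; datanode names, the `allocate_pipeline` helper, and the health flags are all hypothetical. It shows why excluding one bad node per round makes no progress when the namenode, still believing the whole AZ is alive, keeps handing out pipelines confined to the unhealthy AZ:

```python
import itertools

# Toy model: each datanode is (name, az, healthy). The "namenode" still
# believes every node is alive, so allocation only avoids explicitly
# excluded nodes, mirroring the client-side retry loop in the log above.
datanodes = [(f"az2-dn-{i}", "AZ-2", False) for i in range(5)]  # whole AZ is down

def allocate_pipeline(excluded):
    # Hypothetical stand-in for block allocation: first three non-excluded nodes.
    return [dn for dn in datanodes if dn[0] not in excluded][:3]

excluded = set()
for attempt in itertools.count(1):
    pipeline = allocate_pipeline(excluded)
    if not pipeline:
        print(f"attempt {attempt}: no datanodes left, stream aborts")
        break
    bad = next((dn for dn in pipeline if not dn[2]), None)
    if bad is None:
        print(f"attempt {attempt}: pipeline established")
        break
    # "Exception in createBlockOutputStream ... Abandoning ... Excluding datanode"
    print(f"attempt {attempt}: firstBadLink={bad[0]}, abandoning block, excluding node")
    excluded.add(bad[0])
```

Each round burns one allocation and one exclusion, so with every candidate node in the dead AZ the client exhausts the node list and the write ultimately fails, which is what aborted the region server in the incident described.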
[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.
[ https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824293#comment-17824293 ] ASF GitHub Bot commented on HDFS-17299: --- Hexiaoqiao closed pull request #6613: HDFS-17299. Adding rack failure tolerance when creating a new file (… URL: https://github.com/apache/hadoop/pull/6613