[jira] [Created] (HDFS-16989) Large scale block transfer causes too many excess blocks
ZhangHB created HDFS-16989: -- Summary: Large scale block transfer causes too many excess blocks Key: HDFS-16989 URL: https://issues.apache.org/jira/browse/HDFS-16989 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.3.5, 3.4.0 Reporter: ZhangHB -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16937) Delete RPC should also record the number of deleted blocks in the audit log
ZhangHB created HDFS-16937: -- Summary: Delete RPC should also record the number of deleted blocks in the audit log Key: HDFS-16937 URL: https://issues.apache.org/jira/browse/HDFS-16937 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.3.4 Reporter: ZhangHB To better trace the jitter caused by delete RPCs, we should also record the number of deleted blocks in the audit log. With this information, we can know which user caused the jitter. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
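A rough illustration of the proposed audit field, using a hypothetical formatter rather than the real FSNamesystem audit logger; the field name blocks and the exact line layout are assumptions:

{code:java}
// Hypothetical sketch, not the HDFS audit logger: shows the kind of extra field this issue
// proposes, so a delete entry also carries how many blocks it released and the user behind
// a jitter-inducing delete can be identified from the audit log alone.
public class DeleteAuditSketch {
  static String auditLine(String ugi, String ip, String src, long deletedBlocks) {
    return String.format("allowed=true ugi=%s ip=%s cmd=delete src=%s blocks=%d",
        ugi, ip, src, deletedBlocks);
  }

  public static void main(String[] args) {
    // e.g. a large directory delete that released 120,000 blocks
    System.out.println(auditLine("alice", "/10.0.0.1", "/warehouse/tmp", 120_000L));
  }
}
{code}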
[jira] [Commented] (HDFS-16600) Fix deadlock of fine-grain lock for FsDatasetImpl of DataNode.
[ https://issues.apache.org/jira/browse/HDFS-16600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693871#comment-17693871 ] ZhangHB commented on HDFS-16600: Hi [~xuzq_zander], could you please provide some performance results? Thanks. Looking forward to your reply. > Fix deadlock of fine-grain lock for FsDatasetImpl of DataNode. > - > > Key: HDFS-16600 > URL: https://issues.apache.org/jira/browse/HDFS-16600 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 5h 10m > Remaining Estimate: 0h > > The UT > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction > failed because of a deadlock introduced by > [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. > DeadLock: > {code:java} > // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 > needs a read lock > try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl, > b.getBlockPoolId())) > // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line > 3526 needs a write lock > try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, > bpid)) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
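For readers unfamiliar with the failure mode quoted above, here is a minimal standalone sketch (plain ReentrantReadWriteLock, not the DataSetLockManager code) of the read-to-write upgrade that the createRbw -> evictBlocks path performs on the same block-pool lock:

{code:java}
// Standalone illustration, not HDFS code: a thread that already holds the read lock of a
// ReentrantReadWriteLock blocks forever when it asks for the write lock on the same lock,
// which is the upgrade pattern behind the createRbw -> evictBlocks deadlock above.
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockUpgradeDeadlock {
  public static void main(String[] args) {
    ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    rwLock.readLock().lock();          // analogous to createRbw taking the block-pool read lock
    try {
      System.out.println("read lock held, requesting write lock...");
      rwLock.writeLock().lock();       // analogous to evictBlocks needing the write lock: hangs here
      System.out.println("never reached");
      rwLock.writeLock().unlock();
    } finally {
      rwLock.readLock().unlock();
    }
  }
}
{code}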
[jira] [Created] (HDFS-16928) Both getCurrentEditLogTxid and getEditsFromTxid should be OperationCategory.WRITE
ZhangHB created HDFS-16928: -- Summary: Both getCurrentEditLogTxid and getEditsFromTxid should be OperationCategory.WRITE Key: HDFS-16928 URL: https://issues.apache.org/jira/browse/HDFS-16928 Project: Hadoop HDFS Issue Type: Improvement Reporter: ZhangHB After HDFS-13183, the standby NameNode can handle some RPCs with OperationCategory.READ. This is controlled by the configuration dfs.ha.allow.stale.reads. But these two RPCs should only be handled by the active NameNode. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-16921) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.
[ https://issues.apache.org/jira/browse/HDFS-16921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangHB resolved HDFS-16921. Resolution: Duplicate > The logic of IncrementalBlockReportManager#addRDBI method may cause missing > blocks when cluster is busy. > > > Key: HDFS-16921 > URL: https://issues.apache.org/jira/browse/HDFS-16921 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.3.4 >Reporter: ZhangHB >Priority: Critical > > The current logic of the IncrementalBlockReportManager#addRDBI method could lead > to missing blocks when datanodes in the pipeline are I/O busy. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-16920) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.
[ https://issues.apache.org/jira/browse/HDFS-16920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangHB resolved HDFS-16920. Resolution: Duplicate > The logic of IncrementalBlockReportManager#addRDBI method may cause missing > blocks when cluster is busy. > > > Key: HDFS-16920 > URL: https://issues.apache.org/jira/browse/HDFS-16920 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.3.4 >Reporter: ZhangHB >Priority: Critical > > The current logic of the IncrementalBlockReportManager#addRDBI method could lead > to missing blocks when datanodes in the pipeline are I/O busy. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-16919) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.
[ https://issues.apache.org/jira/browse/HDFS-16919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangHB resolved HDFS-16919. Resolution: Duplicate > The logic of IncrementalBlockReportManager#addRDBI method may cause missing > blocks when cluster is busy. > > > Key: HDFS-16919 > URL: https://issues.apache.org/jira/browse/HDFS-16919 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.3.4 >Reporter: ZhangHB >Priority: Critical > > The current logic of the IncrementalBlockReportManager#addRDBI method could lead > to missing blocks when datanodes in the pipeline are I/O busy. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16922) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.
ZhangHB created HDFS-16922: -- Summary: The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy. Key: HDFS-16922 URL: https://issues.apache.org/jira/browse/HDFS-16922 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: ZhangHB The current logic of the IncrementalBlockReportManager#addRDBI method could lead to missing blocks when datanodes in the pipeline are I/O busy. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16921) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.
ZhangHB created HDFS-16921: -- Summary: The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy. Key: HDFS-16921 URL: https://issues.apache.org/jira/browse/HDFS-16921 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.3.4 Reporter: ZhangHB The current logic of the IncrementalBlockReportManager#addRDBI method could lead to missing blocks when datanodes in the pipeline are I/O busy. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16920) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.
ZhangHB created HDFS-16920: -- Summary: The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy. Key: HDFS-16920 URL: https://issues.apache.org/jira/browse/HDFS-16920 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.3.4 Reporter: ZhangHB The current logic of the IncrementalBlockReportManager#addRDBI method could lead to missing blocks when datanodes in the pipeline are I/O busy. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16919) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.
ZhangHB created HDFS-16919: -- Summary: The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy. Key: HDFS-16919 URL: https://issues.apache.org/jira/browse/HDFS-16919 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.3.4 Reporter: ZhangHB The current logic of the IncrementalBlockReportManager#addRDBI method could lead to missing blocks when datanodes in the pipeline are I/O busy. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16915) Optimize metrics for operations hold lock times of FsDatasetImpl
ZhangHB created HDFS-16915: -- Summary: Optimize metrics for operations hold lock times of FsDatasetImpl Key: HDFS-16915 URL: https://issues.apache.org/jira/browse/HDFS-16915 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.3.4 Reporter: ZhangHB The current calculation method also includes the time spent waiting for the lock. So I think we should optimize how the lock-hold-time metrics for FsDatasetImpl operations are computed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
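A minimal sketch of the proposed separation, with hypothetical names rather than the actual FsDatasetImpl instrumentation: starting the timer only after the lock is acquired keeps queueing time out of the hold-time metric.

{code:java}
// Hedged sketch, not FsDatasetImpl code: timing from before lock() folds waiting time into
// the "hold" metric; starting the clock after acquisition measures only the hold time.
import java.util.concurrent.locks.ReentrantLock;

public class LockHoldTimerSketch {
  private final ReentrantLock lock = new ReentrantLock();

  public void runGuarded(Runnable work) {
    long requested = System.nanoTime();
    lock.lock();                                   // time spent here is wait time, not hold time
    long acquired = System.nanoTime();
    try {
      work.run();                                  // only this region counts as hold time
    } finally {
      long released = System.nanoTime();
      lock.unlock();
      System.out.printf("waitMicros=%d holdMicros=%d%n",
          (acquired - requested) / 1_000, (released - acquired) / 1_000);
    }
  }

  public static void main(String[] args) {
    new LockHoldTimerSketch().runGuarded(() -> { /* simulated guarded operation */ });
  }
}
{code}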
[jira] [Created] (HDFS-16914) Add some logs for updateBlockForPipeline RPC.
ZhangHB created HDFS-16914: -- Summary: Add some logs for updateBlockForPipeline RPC. Key: HDFS-16914 URL: https://issues.apache.org/jira/browse/HDFS-16914 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 3.3.4 Reporter: ZhangHB Assignee: ZhangHB Recently, we received a phone alarm about missing blocks. We found the following logs on one of the datanodes where the block was placed: {code:java} 2023-02-09 15:05:10,376 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received BP-578784987-x.x.x.x-1667291826362:blk_1305044966_231832415 src: /clientAddress:44638 dest: /localAddress:50010 of size 45733720 2023-02-09 15:05:10,376 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received BP-578784987-x.x.x.x-1667291826362:blk_1305044966_231826462 src: /upStreamDatanode:60316 dest: /localAddress:50010 of size 45733720 {code} The datanode received the same block with different generation stamps because of a socket timeout exception. blk_1305044966_231826462 was received from the upstream datanode in the pipeline, which has two datanodes. blk_1305044966_231832415 was received directly from the client. We searched all log info about blk_1305044966 on the namenode and the three datanodes in the original pipeline, but we could not find any helpful message about generation stamp 231826462. After diving into the source code, we found it is assigned in NameNodeRpcServer#updateBlockForPipeline, which is invoked from DataStreamer#setupPipelineInternal. The updateBlockForPipeline RPC does not log anything, so I think we should add some logs to this RPC. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
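The shape of the log line being proposed, sketched with a hypothetical helper rather than a patch to NameNodeRpcServer; the field names are assumptions:

{code:java}
// Hypothetical sketch, not NameNode code: a log line tying the block ID to the newly
// assigned generation stamp, so a stamp such as 231826462 could later be traced back
// to the updateBlockForPipeline call that produced it.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class UpdateBlockLogSketch {
  private static final Logger LOG = LoggerFactory.getLogger(UpdateBlockLogSketch.class);

  void logUpdateBlockForPipeline(long blockId, long oldGenStamp, long newGenStamp,
      String clientName) {
    LOG.info("updateBlockForPipeline: blk_{} generation stamp {} -> {} for client {}",
        blockId, oldGenStamp, newGenStamp, clientName);
  }
}
{code}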
[jira] [Created] (HDFS-16909) Move the null-check statement out of the for loop in ReplicaMap#mergeAll method.
ZhangHB created HDFS-16909: -- Summary: Move the null-check statement out of the for loop in ReplicaMap#mergeAll method. Key: HDFS-16909 URL: https://issues.apache.org/jira/browse/HDFS-16909 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 3.3.4 Reporter: ZhangHB Currently, the code is as below: {code:java} for (ReplicaInfo replicaInfo : replicaSet) { checkBlock(replicaInfo); if (curSet == null) { // Add an entry for block pool if it does not exist already curSet = new LightWeightResizableGSet<>(); map.put(bp, curSet); } curSet.put(replicaInfo); } {code} The statement {code:java}if (curSet == null){code} should be moved out of the for loop. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
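A simplified sketch of the suggested refactor using standard collections (not a drop-in ReplicaMap patch): the block-pool entry is created at most once before the loop instead of being re-checked on every iteration.

{code:java}
// Hedged sketch with standard collections, not the ReplicaMap code: the null check runs
// once before the loop; the loop body keeps only the per-replica work.
import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class MergeAllSketch {
  static void mergeAll(Map<String, Set<String>> map, String bp, Collection<String> replicaSet) {
    Set<String> curSet = map.get(bp);
    if (curSet == null && !replicaSet.isEmpty()) {
      curSet = new HashSet<>();                    // add the block-pool entry once, outside the loop
      map.put(bp, curSet);
    }
    for (String replicaInfo : replicaSet) {
      curSet.add(replicaInfo);                     // per-replica work only
    }
  }
}
{code}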
[jira] [Commented] (HDFS-16900) Method DataNode#isWrite seems not working in DataTransfer constructor method
[ https://issues.apache.org/jira/browse/HDFS-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684368#comment-17684368 ] ZhangHB commented on HDFS-16900: So, I think this issue can be closed. > Method DataNode#isWrite seems not working in DataTransfer constructor method > > > Key: HDFS-16900 > URL: https://issues.apache.org/jira/browse/HDFS-16900 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.3.4 >Reporter: ZhangHB >Priority: Major > > In the constructor of DataTransfer, there is code like below: > {code:java} > if (isTransfer(stage, clientname)) { > this.throttler = xserver.getTransferThrottler(); > } else if(isWrite(stage)) { > this.throttler = xserver.getWriteThrottler(); > } {code} > The stage is a parameter of the DataTransfer constructor. Let us see where > the DataTransfer object is instantiated. > In the method transferReplicaForPipelineRecovery, the code is like below: > {code:java} > final DataTransfer dataTransferTask = new DataTransfer(targets, > targetStorageTypes, targetStorageIds, b, stage, client); {code} > But the stage can never be PIPELINE_SETUP_STREAMING_RECOVERY or > PIPELINE_SETUP_APPEND_RECOVERY. > It can only be TRANSFER_RBW or TRANSFER_FINALIZED. So I think the isWrite > check never takes effect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16900) Method DataNode#isWrite seems not working in DataTransfer constructor method
[ https://issues.apache.org/jira/browse/HDFS-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684366#comment-17684366 ] ZhangHB commented on HDFS-16900: Hi [~elgoiri]. After reading the code here, I found the original code is right. There is no need to change the isWrite logic here, because when recovery occurs, createBlockOutputStream will be invoked in the setupPipelineInternal method. The createBlockOutputStream method has the code below: {code:java} BlockConstructionStage bcs = recoveryFlag ? stage.getRecoveryStage() : stage; {code} It then passes the bcs to the writeBlock method, and writeBlock will use dataXceiverServer.getWriteThrottler(). > Method DataNode#isWrite seems not working in DataTransfer constructor method > > > Key: HDFS-16900 > URL: https://issues.apache.org/jira/browse/HDFS-16900 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.3.4 >Reporter: ZhangHB >Priority: Major > > In the constructor of DataTransfer, there is code like below: > {code:java} > if (isTransfer(stage, clientname)) { > this.throttler = xserver.getTransferThrottler(); > } else if(isWrite(stage)) { > this.throttler = xserver.getWriteThrottler(); > } {code} > The stage is a parameter of the DataTransfer constructor. Let us see where > the DataTransfer object is instantiated. > In the method transferReplicaForPipelineRecovery, the code is like below: > {code:java} > final DataTransfer dataTransferTask = new DataTransfer(targets, > targetStorageTypes, targetStorageIds, b, stage, client); {code} > But the stage can never be PIPELINE_SETUP_STREAMING_RECOVERY or > PIPELINE_SETUP_APPEND_RECOVERY. > It can only be TRANSFER_RBW or TRANSFER_FINALIZED. So I think the isWrite > check never takes effect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16908) Fix javadoc of field IncrementalBlockReportManager#readyToSend.
[ https://issues.apache.org/jira/browse/HDFS-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangHB updated HDFS-16908: --- Description: Fix the javadoc of the field IncrementalBlockReportManager#readyToSend. In sendImmediately(), readyToSend is used together with the {{monotonicNow() - ibrInterval >= lastIBR}} condition, so we should update its javadoc. was:IncrementalBlockReportManager#sendImmediately should use or logic to decide whether send immediately or not. > Fix javadoc of field IncrementalBlockReportManager#readyToSend. > --- > > Key: HDFS-16908 > URL: https://issues.apache.org/jira/browse/HDFS-16908 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.4 >Reporter: ZhangHB >Priority: Major > Labels: pull-request-available > > Fix the javadoc of the field IncrementalBlockReportManager#readyToSend. > In sendImmediately(), readyToSend is used together with the {{monotonicNow() - > ibrInterval >= lastIBR}} condition, so we should update its javadoc. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
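A paraphrase of the quoted condition as a small self-contained method (an assumed shape, not an excerpt from IncrementalBlockReportManager): a report goes out only when something is ready to send and the IBR interval has elapsed.

{code:java}
// Hedged paraphrase of the condition quoted above, not a source excerpt.
import java.util.concurrent.TimeUnit;

public class IbrTimingSketch {
  static long monotonicNow() {
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime());
  }

  // readyToSend alone is not enough; the configured interval must also have passed.
  static boolean sendImmediately(boolean readyToSend, long lastIBR, long ibrIntervalMs) {
    return readyToSend && monotonicNow() - ibrIntervalMs >= lastIBR;
  }
}
{code}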
[jira] [Updated] (HDFS-16908) HDFS-16908. Fix javadoc of field IncrementalBlockReportManager#readyToSend.
[ https://issues.apache.org/jira/browse/HDFS-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangHB updated HDFS-16908: --- Summary: HDFS-16908. Fix javadoc of field IncrementalBlockReportManager#readyToSend. (was: IncrementalBlockReportManager#sendImmediately should use or logic to decide whether send immediately or not.) > HDFS-16908. Fix javadoc of field IncrementalBlockReportManager#readyToSend. > --- > > Key: HDFS-16908 > URL: https://issues.apache.org/jira/browse/HDFS-16908 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.4 >Reporter: ZhangHB >Priority: Major > Labels: pull-request-available > > IncrementalBlockReportManager#sendImmediately should use OR logic to decide > whether to send immediately or not. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16908) Fix javadoc of field IncrementalBlockReportManager#readyToSend.
[ https://issues.apache.org/jira/browse/HDFS-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangHB updated HDFS-16908: --- Summary: Fix javadoc of field IncrementalBlockReportManager#readyToSend. (was: HDFS-16908. Fix javadoc of field IncrementalBlockReportManager#readyToSend.) > Fix javadoc of field IncrementalBlockReportManager#readyToSend. > --- > > Key: HDFS-16908 > URL: https://issues.apache.org/jira/browse/HDFS-16908 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Affects Versions: 3.3.4 >Reporter: ZhangHB >Priority: Major > Labels: pull-request-available > > IncrementalBlockReportManager#sendImmediately should use OR logic to decide > whether to send immediately or not. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16908) IncrementalBlockReportManager#sendImmediately should use OR logic to decide whether to send immediately or not.
ZhangHB created HDFS-16908: -- Summary: IncrementalBlockReportManager#sendImmediately should use OR logic to decide whether to send immediately or not. Key: HDFS-16908 URL: https://issues.apache.org/jira/browse/HDFS-16908 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 3.3.4 Reporter: ZhangHB IncrementalBlockReportManager#sendImmediately should use OR logic to decide whether to send immediately or not. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-14795) Add Throttler for writing block
[ https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683822#comment-17683822 ] ZhangHB commented on HDFS-14795: Hi [~leosun08], why was it designed this way: in the PIPELINE_SETUP_APPEND_RECOVERY or PIPELINE_SETUP_STREAMING_RECOVERY stage, the default throttler is still null. Could you please help me figure it out? Thanks a lot. > Add Throttler for writing block > --- > > Key: HDFS-14795 > URL: https://issues.apache.org/jira/browse/HDFS-14795 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Lisheng Sun >Assignee: Lisheng Sun >Priority: Minor > Fix For: 3.3.0 > > Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, > HDFS-14795.003.patch, HDFS-14795.004.patch, HDFS-14795.005.patch, > HDFS-14795.006.patch, HDFS-14795.007.patch, HDFS-14795.008.patch, > HDFS-14795.009.patch, HDFS-14795.010.patch, HDFS-14795.011.patch, > HDFS-14795.012.patch > > > DataXceiver#writeBlock > {code:java} > blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut, > mirrorAddr, null, targets, false); > {code} > As the above code shows, DataXceiver#writeBlock does no throttling. > I think it is necessary to throttle block writes by adding a throttler > in the PIPELINE_SETUP_APPEND_RECOVERY or > PIPELINE_SETUP_STREAMING_RECOVERY stages. > Default throttler value is still null. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16903) Fix javadoc of Class LightWeightResizableGSet
ZhangHB created HDFS-16903: -- Summary: Fix javadoc of Class LightWeightResizableGSet Key: HDFS-16903 URL: https://issues.apache.org/jira/browse/HDFS-16903 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs Affects Versions: 3.3.4 Reporter: ZhangHB After [HDFS-16249. Add DataSetLockManager to manage fine-grain locks for FsDataSetImpl.], the class LightWeightResizableGSet is thread-safe. So we should fix its javadoc. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16900) Method DataNode#isWrite seems not working in DataTransfer constructor method
[ https://issues.apache.org/jira/browse/HDFS-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682871#comment-17682871 ] ZhangHB commented on HDFS-16900: Yes, [~elgoiri]. I will fix it soon. Thanks for your reply. > Method DataNode#isWrite seems not working in DataTransfer constructor method > > > Key: HDFS-16900 > URL: https://issues.apache.org/jira/browse/HDFS-16900 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.3.4 >Reporter: ZhangHB >Priority: Major > > In the constructor of DataTransfer, there is code like below: > {code:java} > if (isTransfer(stage, clientname)) { > this.throttler = xserver.getTransferThrottler(); > } else if(isWrite(stage)) { > this.throttler = xserver.getWriteThrottler(); > } {code} > The stage is a parameter of the DataTransfer constructor. Let us see where > the DataTransfer object is instantiated. > In the method transferReplicaForPipelineRecovery, the code is like below: > {code:java} > final DataTransfer dataTransferTask = new DataTransfer(targets, > targetStorageTypes, targetStorageIds, b, stage, client); {code} > But the stage can never be PIPELINE_SETUP_STREAMING_RECOVERY or > PIPELINE_SETUP_APPEND_RECOVERY. > It can only be TRANSFER_RBW or TRANSFER_FINALIZED. So I think the isWrite > check never takes effect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16900) Method DataNode#isWrite seems not working in DataTransfer constructor method
[ https://issues.apache.org/jira/browse/HDFS-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682569#comment-17682569 ] ZhangHB commented on HDFS-16900: Hi [~inigoiri], [~sunlisheng], could you please have a look at this? > Method DataNode#isWrite seems not working in DataTransfer constructor method > > > Key: HDFS-16900 > URL: https://issues.apache.org/jira/browse/HDFS-16900 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 3.3.4 >Reporter: ZhangHB >Priority: Major > > In the constructor of DataTransfer, there is code like below: > {code:java} > if (isTransfer(stage, clientname)) { > this.throttler = xserver.getTransferThrottler(); > } else if(isWrite(stage)) { > this.throttler = xserver.getWriteThrottler(); > } {code} > The stage is a parameter of the DataTransfer constructor. Let us see where > the DataTransfer object is instantiated. > In the method transferReplicaForPipelineRecovery, the code is like below: > {code:java} > final DataTransfer dataTransferTask = new DataTransfer(targets, > targetStorageTypes, targetStorageIds, b, stage, client); {code} > But the stage can never be PIPELINE_SETUP_STREAMING_RECOVERY or > PIPELINE_SETUP_APPEND_RECOVERY. > It can only be TRANSFER_RBW or TRANSFER_FINALIZED. So I think the isWrite > check never takes effect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16900) Method DataNode#isWrite seems not working in DataTransfer constructor method
ZhangHB created HDFS-16900: -- Summary: Method DataNode#isWrite seems not working in DataTransfer constructor method Key: HDFS-16900 URL: https://issues.apache.org/jira/browse/HDFS-16900 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.3.4 Reporter: ZhangHB In the constructor of DataTransfer, there is code like below: {code:java} if (isTransfer(stage, clientname)) { this.throttler = xserver.getTransferThrottler(); } else if(isWrite(stage)) { this.throttler = xserver.getWriteThrottler(); } {code} The stage is a parameter of the DataTransfer constructor. Let us see where the DataTransfer object is instantiated. In the method transferReplicaForPipelineRecovery, the code is like below: {code:java} final DataTransfer dataTransferTask = new DataTransfer(targets, targetStorageTypes, targetStorageIds, b, stage, client); {code} But the stage can never be PIPELINE_SETUP_STREAMING_RECOVERY or PIPELINE_SETUP_APPEND_RECOVERY. It can only be TRANSFER_RBW or TRANSFER_FINALIZED. So I think the isWrite check never takes effect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16898) Make write lock fine-grain in processCommandFromActor method
ZhangHB created HDFS-16898: -- Summary: Make write lock fine-grain in processCommandFromActor method Key: HDFS-16898 URL: https://issues.apache.org/jira/browse/HDFS-16898 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.3.4 Reporter: ZhangHB Now, in the method processCommandFromActor, we have code like below: {code:java} writeLock(); try { if (actor == bpServiceToActive) { return processCommandFromActive(cmd, actor); } else { return processCommandFromStandby(cmd, actor); } } finally { writeUnlock(); } {code} If processCommandFromActive takes a long time, the write lock is not released. That may block the updateActorStatesFromHeartbeat method in offerService; furthermore, it can make the last contact of the datanode very high, and the datanode may even be marked dead when the last contact exceeds 600s. {code:java} bpos.updateActorStatesFromHeartbeat( this, resp.getNameNodeHaState());{code} Here we can make the write lock fine-grained in the processCommandFromActor method to address this problem. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
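A sketch of the direction the issue suggests, with hypothetical names and simplified types rather than the BPOfferService code; whether releasing the lock before command processing is safe depends on what processCommandFromActive actually relies on the lock for.

{code:java}
// Hedged sketch, not BPOfferService code: hold the lock only long enough to decide which
// path to take, then run the potentially slow command processing outside the lock so
// heartbeat handling (updateActorStatesFromHeartbeat) is not blocked.
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CommandDispatchSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private Object bpServiceToActive;

  boolean processCommandFromActor(Object cmd, Object actor) {
    final boolean fromActive;
    lock.readLock().lock();                        // short critical section: read shared state only
    try {
      fromActive = (actor == bpServiceToActive);
    } finally {
      lock.readLock().unlock();
    }
    // long-running work happens without the lock held
    return fromActive ? processCommandFromActive(cmd) : processCommandFromStandby(cmd);
  }

  private boolean processCommandFromActive(Object cmd) { return true; }
  private boolean processCommandFromStandby(Object cmd) { return true; }
}
{code}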
[jira] [Updated] (HDFS-16882) RBF: Add cache hit rate metric in MountTableResolver#getDestinationForPath
[ https://issues.apache.org/jira/browse/HDFS-16882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhangHB updated HDFS-16882: --- Attachment: locationCache.png > RBF: Add cache hit rate metric in MountTableResolver#getDestinationForPath > -- > > Key: HDFS-16882 > URL: https://issues.apache.org/jira/browse/HDFS-16882 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.3.4 >Reporter: ZhangHB >Priority: Minor > Labels: pull-request-available > Attachments: locationCache.png > > > Currently, the default value of > "dfs.federation.router.mount-table.cache.enable" is true and the default > value of "dfs.federation.router.mount-table.max-cache-size" is 1. > But there is no metric that displays the cache hit rate. I think we can add a hit > rate metric to watch the cache performance and better tune the parameters. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16882) Add cache hit rate metric in MountTableResolver#getDestinationForPath
ZhangHB created HDFS-16882: -- Summary: Add cache hit rate metric in MountTableResolver#getDestinationForPath Key: HDFS-16882 URL: https://issues.apache.org/jira/browse/HDFS-16882 Project: Hadoop HDFS Issue Type: Improvement Components: rbf Affects Versions: 3.3.4 Reporter: ZhangHB Currently, the default value of "dfs.federation.router.mount-table.cache.enable" is true, and the default value of "dfs.federation.router.mount-table.max-cache-size" is 1. But there is no metric that displays the cache hit rate. I think we can add a hit rate metric to watch the cache performance and better tune the parameters. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
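A minimal sketch of the kind of counter the issue asks for, using a hypothetical helper class rather than MountTableResolver or its metrics system: count lookups and hits around the location-cache access, and expose the ratio.

{code:java}
// Hedged sketch, not router code: two counters around the location-cache lookup are enough
// to derive the hit rate this issue wants to expose as a metric.
import java.util.concurrent.atomic.LongAdder;

public class CacheHitRateSketch {
  private final LongAdder lookups = new LongAdder();
  private final LongAdder hits = new LongAdder();

  public void recordLookup(boolean hit) {
    lookups.increment();
    if (hit) {
      hits.increment();
    }
  }

  public double hitRate() {
    long total = lookups.sum();
    return total == 0 ? 0.0 : (double) hits.sum() / total;
  }
}
{code}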
[jira] [Created] (HDFS-16880) modify invokeSingleXXX interface in order to pass actual file src to namenode for debug info.
ZhangHB created HDFS-16880: -- Summary: modify invokeSingleXXX interface in order to pass actual file src to namenode for debug info. Key: HDFS-16880 URL: https://issues.apache.org/jira/browse/HDFS-16880 Project: Hadoop HDFS Issue Type: Improvement Components: rbf Affects Versions: 3.3.4 Reporter: ZhangHB We found lots of INFO-level logs like below: {quote}2022-12-30 15:31:04,169 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: / is closed by DFSClient_attempt_1671783180362_213003_m_77_0_1102875551_1 2022-12-30 15:31:04,186 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: / is closed by DFSClient_NONMAPREDUCE_1198313144_27480 {quote} The real path of completeFile is lost. Actually, this is caused by: *org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient#invokeSingle(java.lang.String, org.apache.hadoop.hdfs.server.federation.router.RemoteMethod)* In this method, it instantiates a RemoteLocationContext object: *RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");* and then executes: *Object[] params = method.getParams(loc);* The problem is right here: because we always use new RemoteParam(), context.getDest() always returns "/". That's why we see lots of incorrect logs. After diving into the invokeSingleXXX source code, I classified the RPCs into those that need the actual src and those that do not. *RPCs that need the src path:* addBlock, abandonBlock, getAdditionalDatanode, complete *RPCs that do not need the src path:* updateBlockForPipeline, reportBadBlocks, getBlocks, updatePipeline, invokeAtAvailableNs (invoked by: getServerDefaults, getBlockKeys, getTransactionID, getMostRecentCheckpointTxId, versionRequest, getStoragePolicies) After the changes, the src can be passed to the NN correctly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
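A toy illustration of the symptom with hypothetical classes (not the RBF RouterRpcClient/RemoteLocation code): when the remote location is built with a hard-coded "/", every parameter resolved from it is "/", which is what shows up in the NameNode's completeFile log lines.

{code:java}
// Hedged toy example, not RBF code: contrast a placeholder location ("/") with one that
// carries the actual client path; only the latter lets the NameNode log a useful src.
public class RemoteLocationSketch {
  final String nsId;
  final String dest;

  RemoteLocationSketch(String nsId, String dest) {
    this.nsId = nsId;
    this.dest = dest;
  }

  public static void main(String[] args) {
    RemoteLocationSketch placeholder = new RemoteLocationSketch("ns0", "/");
    RemoteLocationSketch withRealPath = new RemoteLocationSketch("ns0", "/user/foo/part-00000");
    System.out.println("completeFile src seen by NN (placeholder): " + placeholder.dest);
    System.out.println("completeFile src seen by NN (actual path): " + withRealPath.dest);
  }
}
{code}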
[jira] [Created] (HDFS-16859) acquirePermit: move LOG.debug into if condition
ZhangHB created HDFS-16859: -- Summary: acquirePermit: move LOG.debug into if condition Key: HDFS-16859 URL: https://issues.apache.org/jira/browse/HDFS-16859 Project: Hadoop HDFS Issue Type: Improvement Components: rbf Affects Versions: 3.3.4 Reporter: ZhangHB The method AbstractRouterRpcFairnessPolicyController#acquirePermit is invoked very frequently. Before getting the permit of a nameservice, there is always a LOG.debug statement. It is better to move that statement into the if-condition branch. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
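A sketch of the intent with hypothetical names (not the actual controller code): the debug statement moves into the branch where it carries information, so the hot path does nothing extra when permits are granted and debug logging is off.

{code:java}
// Hedged sketch, not AbstractRouterRpcFairnessPolicyController code: log only in the branch
// where acquisition fails (and only when debug is enabled), instead of on every call.
import java.util.concurrent.Semaphore;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class AcquirePermitSketch {
  private static final Logger LOG = LoggerFactory.getLogger(AcquirePermitSketch.class);

  boolean acquirePermit(String nsId, Semaphore permits) {
    boolean acquired = permits.tryAcquire();
    if (!acquired && LOG.isDebugEnabled()) {
      LOG.debug("No permit available for nameservice {}", nsId);
    }
    return acquired;
  }
}
{code}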