[jira] [Created] (HDFS-16989) Large scale block transfer causes too many excess blocks

2023-04-23 Thread ZhangHB (Jira)
ZhangHB created HDFS-16989:
--

 Summary: Large scale block transfer causes too many excess blocks
 Key: HDFS-16989
 URL: https://issues.apache.org/jira/browse/HDFS-16989
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.3.5, 3.4.0
Reporter: ZhangHB






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16937) Delete RPC should also record number of delete blocks in audit log

2023-02-27 Thread ZhangHB (Jira)
ZhangHB created HDFS-16937:
--

 Summary: Delete RPC should also record number of delete blocks in 
audit log
 Key: HDFS-16937
 URL: https://issues.apache.org/jira/browse/HDFS-16937
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.3.4
Reporter: ZhangHB


To better trace jitter caused by the delete RPC, we should also record the 
number of deleted blocks in the audit log. With this information, we can 
identify which user caused the jitter.
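The proposed audit entry could look like the following sketch (the field name "deletedBlocks" and the entry layout are assumptions for illustration, not the committed audit-log format):

```java
// Minimal sketch of a delete audit entry carrying a block count, so that
// delete-induced jitter can be attributed to a specific user.
class AuditSketch {
  static String deleteAuditEntry(String ugi, String src, long deletedBlocks) {
    // "deletedBlocks" is a hypothetical field appended to the usual entry.
    return String.format("allowed=true ugi=%s cmd=delete src=%s deletedBlocks=%d",
        ugi, src, deletedBlocks);
  }
}
```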






[jira] [Commented] (HDFS-16600) Fix deadlock of fine-grain lock for FsDatastImpl of DataNode.

2023-02-27 Thread ZhangHB (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17693871#comment-17693871
 ] 

ZhangHB commented on HDFS-16600:


Hi [~xuzq_zander], could you please share some performance results? 
Thanks, looking forward to your reply.

> Fix deadlock of fine-grain lock for FsDatastImpl of DataNode.
> -
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed because of a deadlock introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> Deadlock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}






[jira] [Created] (HDFS-16928) Both getCurrentEditLogTxid and getEditsFromTxid should be OperationCategory.WRITE

2023-02-21 Thread ZhangHB (Jira)
ZhangHB created HDFS-16928:
--

 Summary: Both getCurrentEditLogTxid and getEditsFromTxid should be 
OperationCategory.WRITE
 Key: HDFS-16928
 URL: https://issues.apache.org/jira/browse/HDFS-16928
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: ZhangHB


After HDFS-13183, the standby NameNode can handle some RPCs with 
OperationCategory.READ. This is controlled by the configuration 
dfs.ha.allow.stale.reads.

However, these two RPCs should only be handled by the active NameNode.
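The effect of the proposed change can be sketched as follows (the enums and the check mirror the NameNode's OperationCategory pattern but are simplified here, not the actual patch): categorizing these RPCs as WRITE makes a standby reject them instead of serving possibly stale edit-log state.

```java
// Simplified sketch of HA operation-category checking.
class OpCheckSketch {
  enum OperationCategory { READ, WRITE }
  enum HAState { ACTIVE, STANDBY }

  static void checkOperation(HAState state, OperationCategory op) {
    // A standby may serve READ when stale reads are allowed, but a
    // WRITE-categorized RPC must always go to the active NameNode.
    if (state == HAState.STANDBY && op == OperationCategory.WRITE) {
      throw new IllegalStateException(
          "Operation category WRITE is not supported in state standby");
    }
  }
}
```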






[jira] [Resolved] (HDFS-16921) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.

2023-02-14 Thread ZhangHB (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangHB resolved HDFS-16921.

Resolution: Duplicate

> The logic of IncrementalBlockReportManager#addRDBI method may cause missing 
> blocks when cluster is busy.
> 
>
> Key: HDFS-16921
> URL: https://issues.apache.org/jira/browse/HDFS-16921
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Critical
>
> The current logic of the IncrementalBlockReportManager#addRDBI method could 
> lead to missing blocks when the datanodes in the pipeline are I/O busy.






[jira] [Resolved] (HDFS-16920) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.

2023-02-14 Thread ZhangHB (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangHB resolved HDFS-16920.

Resolution: Duplicate

> The logic of IncrementalBlockReportManager#addRDBI method may cause missing 
> blocks when cluster is busy.
> 
>
> Key: HDFS-16920
> URL: https://issues.apache.org/jira/browse/HDFS-16920
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Critical
>
> The current logic of the IncrementalBlockReportManager#addRDBI method could 
> lead to missing blocks when the datanodes in the pipeline are I/O busy.






[jira] [Resolved] (HDFS-16919) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.

2023-02-14 Thread ZhangHB (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangHB resolved HDFS-16919.

Resolution: Duplicate

> The logic of IncrementalBlockReportManager#addRDBI method may cause missing 
> blocks when cluster is busy.
> 
>
> Key: HDFS-16919
> URL: https://issues.apache.org/jira/browse/HDFS-16919
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Critical
>
> The current logic of the IncrementalBlockReportManager#addRDBI method could 
> lead to missing blocks when the datanodes in the pipeline are I/O busy.






[jira] [Created] (HDFS-16922) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.

2023-02-14 Thread ZhangHB (Jira)
ZhangHB created HDFS-16922:
--

 Summary: The logic of IncrementalBlockReportManager#addRDBI method 
may cause missing blocks when cluster is busy.
 Key: HDFS-16922
 URL: https://issues.apache.org/jira/browse/HDFS-16922
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: ZhangHB


The current logic of the IncrementalBlockReportManager#addRDBI method could 
lead to missing blocks when the datanodes in the pipeline are I/O busy.






[jira] [Created] (HDFS-16921) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.

2023-02-14 Thread ZhangHB (Jira)
ZhangHB created HDFS-16921:
--

 Summary: The logic of IncrementalBlockReportManager#addRDBI method 
may cause missing blocks when cluster is busy.
 Key: HDFS-16921
 URL: https://issues.apache.org/jira/browse/HDFS-16921
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.3.4
Reporter: ZhangHB


The current logic of the IncrementalBlockReportManager#addRDBI method could 
lead to missing blocks when the datanodes in the pipeline are I/O busy.






[jira] [Created] (HDFS-16920) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.

2023-02-14 Thread ZhangHB (Jira)
ZhangHB created HDFS-16920:
--

 Summary: The logic of IncrementalBlockReportManager#addRDBI method 
may cause missing blocks when cluster is busy.
 Key: HDFS-16920
 URL: https://issues.apache.org/jira/browse/HDFS-16920
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.3.4
Reporter: ZhangHB


The current logic of the IncrementalBlockReportManager#addRDBI method could 
lead to missing blocks when the datanodes in the pipeline are I/O busy.






[jira] [Created] (HDFS-16919) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.

2023-02-14 Thread ZhangHB (Jira)
ZhangHB created HDFS-16919:
--

 Summary: The logic of IncrementalBlockReportManager#addRDBI method 
may cause missing blocks when cluster is busy.
 Key: HDFS-16919
 URL: https://issues.apache.org/jira/browse/HDFS-16919
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.3.4
Reporter: ZhangHB


The current logic of the IncrementalBlockReportManager#addRDBI method could 
lead to missing blocks when the datanodes in the pipeline are I/O busy.






[jira] [Created] (HDFS-16915) Optimize metrics for lock hold times of FsDatasetImpl operations

2023-02-14 Thread ZhangHB (Jira)
ZhangHB created HDFS-16915:
--

 Summary: Optimize metrics for lock hold times of FsDatasetImpl 
operations
 Key: HDFS-16915
 URL: https://issues.apache.org/jira/browse/HDFS-16915
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.3.4
Reporter: ZhangHB


The current calculation also includes the time spent waiting for the lock. So I 
think we should optimize how the lock-hold-time metrics for FsDatasetImpl 
operations are computed.
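The proposed fix can be sketched in plain Java (names are illustrative, not the FsDatasetImpl code): start the timer only after the lock has been acquired, so the metric records hold time rather than wait-plus-hold time.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch: measure pure lock hold time, excluding time spent waiting.
class LockMetricSketch {
  static final ReentrantLock lock = new ReentrantLock();

  static long timedOperationNanos(Runnable op) {
    lock.lock();                       // any waiting for the lock happens here
    long start = System.nanoTime();    // timer starts only after acquisition
    try {
      op.run();
      return System.nanoTime() - start; // pure hold time
    } finally {
      lock.unlock();
    }
  }
}
```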

 






[jira] [Created] (HDFS-16914) Add some logs for updateBlockForPipeline RPC.

2023-02-09 Thread ZhangHB (Jira)
ZhangHB created HDFS-16914:
--

 Summary: Add some logs for updateBlockForPipeline RPC.
 Key: HDFS-16914
 URL: https://issues.apache.org/jira/browse/HDFS-16914
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 3.3.4
Reporter: ZhangHB
Assignee: ZhangHB


Recently, we received a phone alarm about missing blocks. We found logs like the 
following on the datanode where the block was placed:

 
{code:java}
2023-02-09 15:05:10,376 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Received BP-578784987-x.x.x.x-1667291826362:blk_1305044966_231832415 src: 
/clientAddress:44638 dest: /localAddress:50010 of size 45733720

2023-02-09 15:05:10,376 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Received BP-578784987-x.x.x.x-1667291826362:blk_1305044966_231826462 src: 
/upStreamDatanode:60316 dest: /localAddress:50010 of size 45733720 {code}
The datanode received the same block with different generation stamps because of 
a socket timeout exception. blk_1305044966_231826462 was received from the 
upstream datanode in a pipeline of two datanodes, while blk_1305044966_231832415 
was received directly from the client.

 

We searched all log messages about blk_1305044966 on the namenode and the three 
datanodes in the original pipeline, but could not find any helpful message about 
generation stamp 231826462. After diving into the source code, we found that it 
is assigned in NameNodeRpcServer#updateBlockForPipeline, which is invoked from 
DataStreamer#setupPipelineInternal. The updateBlockForPipeline RPC does not log 
anything, so I think we should add some logs to this RPC.
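The kind of trace line this issue asks for could look like the sketch below (the message format and field names are assumptions, not the committed patch); the generation stamps are the ones from the incident above. Logging both the old and the newly assigned generation stamp would make recoveries like this one traceable afterwards.

```java
// Sketch of a hypothetical trace line for updateBlockForPipeline.
class PipelineLogSketch {
  static String updateBlockTrace(String blockId, long oldGs, long newGs, String client) {
    return String.format(
        "updateBlockForPipeline: %s oldGenerationStamp=%d newGenerationStamp=%d client=%s",
        blockId, oldGs, newGs, client);
  }
}
```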

 

 






[jira] [Created] (HDFS-16909) Move the null check out of the for loop in ReplicaMap#mergeAll method.

2023-02-05 Thread ZhangHB (Jira)
ZhangHB created HDFS-16909:
--

 Summary: Move the null check out of the for loop in 
ReplicaMap#mergeAll method.
 Key: HDFS-16909
 URL: https://issues.apache.org/jira/browse/HDFS-16909
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.3.4
Reporter: ZhangHB


Currently, the code is as below:
{code:java}
for (ReplicaInfo replicaInfo : replicaSet) {
  checkBlock(replicaInfo);
  if (curSet == null) {
// Add an entry for block pool if it does not exist already
curSet = new LightWeightResizableGSet<>();
map.put(bp, curSet);
  }
  curSet.put(replicaInfo);
} {code}
The statement:
{code:java}
if (curSet == null){code}
should be moved outside the for loop, since curSet can only be null before the 
first iteration.
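The proposed hoisting can be sketched in plain Java (simplified collections instead of LightWeightResizableGSet; names are illustrative):

```java
import java.util.*;

// Sketch: hoist the block-pool null check out of the per-replica loop,
// since curSet can only be null before the first iteration.
class MergeSketch {
  static void mergeAll(Map<String, Set<String>> map, String bp, List<String> replicas) {
    Set<String> curSet = map.get(bp);
    // Add an entry for the block pool once, before the loop.
    if (curSet == null) {
      curSet = new HashSet<>();
      map.put(bp, curSet);
    }
    for (String replicaInfo : replicas) {
      curSet.add(replicaInfo); // only the per-replica work stays in the loop
    }
  }
}
```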






[jira] [Commented] (HDFS-16900) Method DataNode#isWrite seems not working in DataTransfer constructor method

2023-02-05 Thread ZhangHB (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684368#comment-17684368
 ] 

ZhangHB commented on HDFS-16900:


So I think this issue can be closed.

> Method DataNode#isWrite seems not working in DataTransfer constructor method
> 
>
> Key: HDFS-16900
> URL: https://issues.apache.org/jira/browse/HDFS-16900
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Major
>
> In the constructor of DataTransfer, there is code like the following:
> {code:java}
> if (isTransfer(stage, clientname)) {
>   this.throttler = xserver.getTransferThrottler();
> } else if(isWrite(stage)) {
>   this.throttler = xserver.getWriteThrottler();
> } {code}
> The stage is a parameter of the DataTransfer constructor. Let us see where 
> the DataTransfer object is instantiated.
> In the method transferReplicaForPipelineRecovery, the code looks like:
> {code:java}
> final DataTransfer dataTransferTask = new DataTransfer(targets,
> targetStorageTypes, targetStorageIds, b, stage, client); {code}
> But the stage can never be PIPELINE_SETUP_STREAMING_RECOVERY or 
> PIPELINE_SETUP_APPEND_RECOVERY; it can only be TRANSFER_RBW or 
> TRANSFER_FINALIZED. So I think the isWrite branch is never taken.






[jira] [Commented] (HDFS-16900) Method DataNode#isWrite seems not working in DataTransfer constructor method

2023-02-05 Thread ZhangHB (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17684366#comment-17684366
 ] 

ZhangHB commented on HDFS-16900:


Hi [~elgoiri]. After reading the code here, I found the original code is right. 
There is no need to change the isWrite logic, because when recovery occurs, 
createBlockOutputStream is invoked in the setupPipelineInternal method. 
createBlockOutputStream contains the following code:

{code:java}
BlockConstructionStage bcs = recoveryFlag ?
stage.getRecoveryStage() : stage; {code}

It then passes bcs to the writeBlock method, which uses 
dataXceiverServer.getWriteThrottler().

 

> Method DataNode#isWrite seems not working in DataTransfer constructor method
> 
>
> Key: HDFS-16900
> URL: https://issues.apache.org/jira/browse/HDFS-16900
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Major
>
> In the constructor of DataTransfer, there is code like the following:
> {code:java}
> if (isTransfer(stage, clientname)) {
>   this.throttler = xserver.getTransferThrottler();
> } else if(isWrite(stage)) {
>   this.throttler = xserver.getWriteThrottler();
> } {code}
> The stage is a parameter of the DataTransfer constructor. Let us see where 
> the DataTransfer object is instantiated.
> In the method transferReplicaForPipelineRecovery, the code looks like:
> {code:java}
> final DataTransfer dataTransferTask = new DataTransfer(targets,
> targetStorageTypes, targetStorageIds, b, stage, client); {code}
> But the stage can never be PIPELINE_SETUP_STREAMING_RECOVERY or 
> PIPELINE_SETUP_APPEND_RECOVERY; it can only be TRANSFER_RBW or 
> TRANSFER_FINALIZED. So I think the isWrite branch is never taken.






[jira] [Updated] (HDFS-16908) Fix javadoc of field IncrementalBlockReportManager#readyToSend.

2023-02-04 Thread ZhangHB (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangHB updated HDFS-16908:
---
Description: 
Fix the javadoc of the field IncrementalBlockReportManager#readyToSend.
In sendImmediately(), readyToSend is used together with the {{monotonicNow() - 
ibrInterval >= lastIBR}} condition, so we should update its javadoc.
  was:IncrementalBlockReportManager#sendImmediately should use or logic to 
decide whether send immediately or not.


> Fix javadoc of field IncrementalBlockReportManager#readyToSend.
> ---
>
> Key: HDFS-16908
> URL: https://issues.apache.org/jira/browse/HDFS-16908
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Major
>  Labels: pull-request-available
>
> Fix the javadoc of the field IncrementalBlockReportManager#readyToSend.
> In sendImmediately(), readyToSend is used together with the {{monotonicNow() - 
> ibrInterval >= lastIBR}} condition, so we should update its javadoc.






[jira] [Updated] (HDFS-16908) HDFS-16908. Fix javadoc of field IncrementalBlockReportManager#readyToSend.

2023-02-04 Thread ZhangHB (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangHB updated HDFS-16908:
---
Summary: HDFS-16908. Fix javadoc of field 
IncrementalBlockReportManager#readyToSend.  (was: 
IncrementalBlockReportManager#sendImmediately should use or logic to decide 
whether send immediately or not.)

> HDFS-16908. Fix javadoc of field IncrementalBlockReportManager#readyToSend.
> ---
>
> Key: HDFS-16908
> URL: https://issues.apache.org/jira/browse/HDFS-16908
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Major
>  Labels: pull-request-available
>
> IncrementalBlockReportManager#sendImmediately should use OR logic to decide 
> whether to send immediately or not.






[jira] [Updated] (HDFS-16908) Fix javadoc of field IncrementalBlockReportManager#readyToSend.

2023-02-04 Thread ZhangHB (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangHB updated HDFS-16908:
---
Summary: Fix javadoc of field IncrementalBlockReportManager#readyToSend.  
(was: HDFS-16908. Fix javadoc of field 
IncrementalBlockReportManager#readyToSend.)

> Fix javadoc of field IncrementalBlockReportManager#readyToSend.
> ---
>
> Key: HDFS-16908
> URL: https://issues.apache.org/jira/browse/HDFS-16908
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Major
>  Labels: pull-request-available
>
> IncrementalBlockReportManager#sendImmediately should use OR logic to decide 
> whether to send immediately or not.






[jira] [Created] (HDFS-16908) IncrementalBlockReportManager#sendImmediately should use or logic to decide whether send immediately or not.

2023-02-03 Thread ZhangHB (Jira)
ZhangHB created HDFS-16908:
--

 Summary: IncrementalBlockReportManager#sendImmediately should use 
or logic to decide whether send immediately or not.
 Key: HDFS-16908
 URL: https://issues.apache.org/jira/browse/HDFS-16908
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.3.4
Reporter: ZhangHB


IncrementalBlockReportManager#sendImmediately should use OR logic to decide 
whether to send immediately or not.






[jira] [Commented] (HDFS-14795) Add Throttler for writing block

2023-02-03 Thread ZhangHB (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683822#comment-17683822
 ] 

ZhangHB commented on HDFS-14795:


Hi [~leosun08], why was it designed this way: in the 
PIPELINE_SETUP_APPEND_RECOVERY or PIPELINE_SETUP_STREAMING_RECOVERY stage, the 
default throttler is still null.

Could you please help me figure it out? Thanks a lot.

> Add Throttler for writing block
> ---
>
> Key: HDFS-14795
> URL: https://issues.apache.org/jira/browse/HDFS-14795
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: HDFS-14795.001.patch, HDFS-14795.002.patch, 
> HDFS-14795.003.patch, HDFS-14795.004.patch, HDFS-14795.005.patch, 
> HDFS-14795.006.patch, HDFS-14795.007.patch, HDFS-14795.008.patch, 
> HDFS-14795.009.patch, HDFS-14795.010.patch, HDFS-14795.011.patch, 
> HDFS-14795.012.patch
>
>
> DataXceiver#writeBlock
> {code:java}
> blockReceiver.receiveBlock(mirrorOut, mirrorIn, replyOut,
> mirrorAddr, null, targets, false);
> {code}
> As the code above shows, DataXceiver#writeBlock does not throttle.
>  I think it is necessary to throttle block writes, adding a throttler 
> in the PIPELINE_SETUP_APPEND_RECOVERY or 
> PIPELINE_SETUP_STREAMING_RECOVERY stage.
> The default throttler value is still null.






[jira] [Created] (HDFS-16903) Fix javadoc of Class LightWeightResizableGSet

2023-02-01 Thread ZhangHB (Jira)
ZhangHB created HDFS-16903:
--

 Summary: Fix javadoc of Class LightWeightResizableGSet
 Key: HDFS-16903
 URL: https://issues.apache.org/jira/browse/HDFS-16903
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, hdfs
Affects Versions: 3.3.4
Reporter: ZhangHB


After HDFS-16249 (Add DataSetLockManager to manage fine-grained locks for 
FsDatasetImpl), the class LightWeightResizableGSet is thread-safe, so we should 
fix its javadoc.






[jira] [Commented] (HDFS-16900) Method DataNode#isWrite seems not working in DataTransfer constructor method

2023-01-31 Thread ZhangHB (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682871#comment-17682871
 ] 

ZhangHB commented on HDFS-16900:


Yes, [~elgoiri]. I will fix it soon. Thanks for your reply.

> Method DataNode#isWrite seems not working in DataTransfer constructor method
> 
>
> Key: HDFS-16900
> URL: https://issues.apache.org/jira/browse/HDFS-16900
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Major
>
> In the constructor of DataTransfer, there is code like the following:
> {code:java}
> if (isTransfer(stage, clientname)) {
>   this.throttler = xserver.getTransferThrottler();
> } else if(isWrite(stage)) {
>   this.throttler = xserver.getWriteThrottler();
> } {code}
> The stage is a parameter of the DataTransfer constructor. Let us see where 
> the DataTransfer object is instantiated.
> In the method transferReplicaForPipelineRecovery, the code looks like:
> {code:java}
> final DataTransfer dataTransferTask = new DataTransfer(targets,
> targetStorageTypes, targetStorageIds, b, stage, client); {code}
> But the stage can never be PIPELINE_SETUP_STREAMING_RECOVERY or 
> PIPELINE_SETUP_APPEND_RECOVERY; it can only be TRANSFER_RBW or 
> TRANSFER_FINALIZED. So I think the isWrite branch is never taken.






[jira] [Commented] (HDFS-16900) Method DataNode#isWrite seems not working in DataTransfer constructor method

2023-01-31 Thread ZhangHB (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682569#comment-17682569
 ] 

ZhangHB commented on HDFS-16900:


Hi [~inigoiri], [~sunlisheng], could you please have a look at this?

> Method DataNode#isWrite seems not working in DataTransfer constructor method
> 
>
> Key: HDFS-16900
> URL: https://issues.apache.org/jira/browse/HDFS-16900
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Major
>
> In the constructor of DataTransfer, there is code like the following:
> {code:java}
> if (isTransfer(stage, clientname)) {
>   this.throttler = xserver.getTransferThrottler();
> } else if(isWrite(stage)) {
>   this.throttler = xserver.getWriteThrottler();
> } {code}
> The stage is a parameter of the DataTransfer constructor. Let us see where 
> the DataTransfer object is instantiated.
> In the method transferReplicaForPipelineRecovery, the code looks like:
> {code:java}
> final DataTransfer dataTransferTask = new DataTransfer(targets,
> targetStorageTypes, targetStorageIds, b, stage, client); {code}
> But the stage can never be PIPELINE_SETUP_STREAMING_RECOVERY or 
> PIPELINE_SETUP_APPEND_RECOVERY; it can only be TRANSFER_RBW or 
> TRANSFER_FINALIZED. So I think the isWrite branch is never taken.






[jira] [Created] (HDFS-16900) Method DataNode#isWrite seems not working in DataTransfer constructor method

2023-01-31 Thread ZhangHB (Jira)
ZhangHB created HDFS-16900:
--

 Summary: Method DataNode#isWrite seems not working in DataTransfer 
constructor method
 Key: HDFS-16900
 URL: https://issues.apache.org/jira/browse/HDFS-16900
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.3.4
Reporter: ZhangHB


In the constructor of DataTransfer, there is code like the following:
{code:java}
if (isTransfer(stage, clientname)) {
  this.throttler = xserver.getTransferThrottler();
} else if(isWrite(stage)) {
  this.throttler = xserver.getWriteThrottler();
} {code}
The stage is a parameter of the DataTransfer constructor. Let us see where the 
DataTransfer object is instantiated.

In the method transferReplicaForPipelineRecovery, the code looks like:
{code:java}
final DataTransfer dataTransferTask = new DataTransfer(targets,
targetStorageTypes, targetStorageIds, b, stage, client); {code}
But the stage can never be PIPELINE_SETUP_STREAMING_RECOVERY or 
PIPELINE_SETUP_APPEND_RECOVERY.

It can only be TRANSFER_RBW or TRANSFER_FINALIZED. So I think the isWrite 
branch is never taken.






[jira] [Created] (HDFS-16898) Make write lock fine-grain in processCommandFromActor method

2023-01-29 Thread ZhangHB (Jira)
ZhangHB created HDFS-16898:
--

 Summary: Make write lock fine-grain in processCommandFromActor 
method
 Key: HDFS-16898
 URL: https://issues.apache.org/jira/browse/HDFS-16898
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.3.4
Reporter: ZhangHB


Now, in the processCommandFromActor method, we have code like the following:
{code:java}
writeLock();
try {
  if (actor == bpServiceToActive) {
return processCommandFromActive(cmd, actor);
  } else {
return processCommandFromStandby(cmd, actor);
  }
} finally {
  writeUnlock();
} {code}
If processCommandFromActive takes a long time, the write lock is not released.

This may block the updateActorStatesFromHeartbeat method in offerService; 
furthermore, it can drive the datanode's lastContact very high, even marking 
the datanode dead when lastContact exceeds 600s.
{code:java}
bpos.updateActorStatesFromHeartbeat(
this, resp.getNameNodeHaState());{code}
Here we can make the write lock in the processCommandFromActor method 
fine-grained to address this problem.
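One way the fine-grained locking could look is sketched below (class and field names are illustrative, not the actual patch): hold the write lock only long enough to decide which actor the command came from, then run the potentially slow processing outside the lock, so heartbeat handling is no longer blocked.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: shrink the critical section to the role decision only.
class CommandSketch {
  static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  static final Object bpServiceToActive = new Object();

  static String processCommandFromActor(Object cmd, Object actor) {
    boolean fromActive;
    lock.writeLock().lock();          // short critical section
    try {
      fromActive = (actor == bpServiceToActive);
    } finally {
      lock.writeLock().unlock();      // released before the slow work
    }
    // Long-running command processing happens without holding the lock.
    return (fromActive ? "active:" : "standby:") + cmd;
  }
}
```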

 






[jira] [Updated] (HDFS-16882) RBF: Add cache hit rate metric in MountTableResolver#getDestinationForPath

2023-01-05 Thread ZhangHB (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangHB updated HDFS-16882:
---
Attachment: locationCache.png

> RBF: Add cache hit rate metric in MountTableResolver#getDestinationForPath
> --
>
> Key: HDFS-16882
> URL: https://issues.apache.org/jira/browse/HDFS-16882
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Minor
>  Labels: pull-request-available
> Attachments: locationCache.png
>
>
> Currently, the default value of 
> "dfs.federation.router.mount-table.cache.enable" is true and the default 
> value of "dfs.federation.router.mount-table.max-cache-size" is 1.
> But there is no metric that displays the cache hit rate. I think we can add a 
> hit-rate metric to watch the cache performance and better tune the parameters.






[jira] [Created] (HDFS-16882) Add cache hit rate metric in MountTableResolver#getDestinationForPath

2023-01-03 Thread ZhangHB (Jira)
ZhangHB created HDFS-16882:
--

 Summary: Add cache hit rate metric in 
MountTableResolver#getDestinationForPath
 Key: HDFS-16882
 URL: https://issues.apache.org/jira/browse/HDFS-16882
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: rbf
Affects Versions: 3.3.4
Reporter: ZhangHB


Currently, the default value of 
"dfs.federation.router.mount-table.cache.enable" is true,

and the default value of "dfs.federation.router.mount-table.max-cache-size" is 
1.

But there is no metric that displays the cache hit rate. I think we can add a 
hit-rate metric to watch the cache performance and tune these parameters better.
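A small sketch of what such a hit-rate counter could look like around a location-cache lookup, in the spirit of MountTableResolver#getDestinationForPath. The class and method names (LocationCache, hitRate, resolveFromMountTable) are illustrative assumptions, not the real RBF API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: count lookups and hits around the cache access so a
// hit rate can be exported as a metric. A real implementation would use the
// Hadoop metrics framework and a thread-safe cache; HashMap is used here
// only to keep the sketch self-contained.
public class LocationCache {
    private final Map<String, String> cache = new HashMap<>();
    private final LongAdder hits = new LongAdder();
    private final LongAdder lookups = new LongAdder();

    public String getDestinationForPath(String path) {
        lookups.increment();
        String dest = cache.get(path);
        if (dest != null) {
            hits.increment();                   // cache hit
            return dest;
        }
        dest = resolveFromMountTable(path);     // slow path on a miss
        cache.put(path, dest);
        return dest;
    }

    /** Hit rate in [0.0, 1.0]; 0.0 before any lookup has been made. */
    public double hitRate() {
        long total = lookups.sum();
        return total == 0 ? 0.0 : (double) hits.sum() / total;
    }

    // Stand-in for the real mount-table resolution.
    private String resolveFromMountTable(String path) {
        return "ns0" + path;
    }
}
```

Watching hitRate() over time shows whether max-cache-size is large enough for the workload's working set of paths.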






[jira] [Created] (HDFS-16880) modify invokeSingleXXX interface in order to pass actual file src to namenode for debug info.

2022-12-29 Thread ZhangHB (Jira)
ZhangHB created HDFS-16880:
--

 Summary: modify invokeSingleXXX interface in order to pass actual 
file src to namenode for debug info.
 Key: HDFS-16880
 URL: https://issues.apache.org/jira/browse/HDFS-16880
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: rbf
Affects Versions: 3.3.4
Reporter: ZhangHB


We found lots of INFO-level logs like the ones below:
{quote}2022-12-30 15:31:04,169 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
completeFile: / is closed by 
DFSClient_attempt_1671783180362_213003_m_77_0_1102875551_1
2022-12-30 15:31:04,186 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
completeFile: / is closed by DFSClient_NONMAPREDUCE_1198313144_27480
{quote}
The real path of completeFile is lost. Actually this is caused by: 

 
*org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient#invokeSingle(java.lang.String,
 org.apache.hadoop.hdfs.server.federation.router.RemoteMethod)*

In this method, it instantiates a RemoteLocationContext object:

*RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");*

and then execute: *Object[] params = method.getParams(loc);*

The problem is right here: because we always use new RemoteParam(), 

context.getDest() always returns "/". That's why we saw lots of incorrect logs.

 

After diving into the invokeSingleXXX source code, I found the following RPCs 
can be classified into those that need the actual src path and those that do 
not.

 

*RPCs that need the src path:*

addBlock, abandonBlock, getAdditionalDatanode, complete

*RPCs that do not need the src path:*

updateBlockForPipeline, reportBadBlocks, getBlocks, updatePipeline, 
invokeAtAvailableNs (invoked by: getServerDefaults, getBlockKeys, 
getTransactionID, getMostRecentCheckpointTxId, versionRequest, 
getStoragePolicies)

 

After these changes, the src can be passed to the NameNode correctly.
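A deliberately simplified model of why the hard-coded "/" loses the file path, and what threading the actual src through would change. The real RouterRpcClient and RemoteLocation classes are much richer; this sketch only reproduces the getDest() behavior described above, and the invokeSingle signatures are illustrative assumptions.

```java
// Hypothetical sketch: with dest hard-coded to "/", every downstream log
// that prints the destination shows "/" instead of the real file path.
public class RemoteLocationSketch {
    static class RemoteLocation {
        final String nsId;
        final String dest;
        RemoteLocation(String nsId, String src, String dest) {
            this.nsId = nsId;
            this.dest = dest;
        }
        String getDest() { return dest; }
    }

    // Before: dest is always "/", so the NameNode logs "completeFile: /".
    static String invokeSingle(String nsId) {
        RemoteLocation loc = new RemoteLocation(nsId, "/", "/");
        return "completeFile: " + loc.getDest();
    }

    // After: the caller's actual src is threaded through, so the log
    // carries the real path.
    static String invokeSingle(String nsId, String src) {
        RemoteLocation loc = new RemoteLocation(nsId, src, src);
        return "completeFile: " + loc.getDest();
    }
}
```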

 

 






[jira] [Created] (HDFS-16859) acquirePermit : move LOG.debug into if condition

2022-12-02 Thread ZhangHB (Jira)
ZhangHB created HDFS-16859:
--

 Summary: acquirePermit : move LOG.debug into if condition
 Key: HDFS-16859
 URL: https://issues.apache.org/jira/browse/HDFS-16859
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: rbf
Affects Versions: 3.3.4
Reporter: ZhangHB


The method 
AbstractRouterRpcFairnessPolicyController#acquirePermit is invoked very 
frequently. Before acquiring the permit of a nameservice, a LOG.debug 
statement is always executed.

It would be better to guard that statement with an if (LOG.isDebugEnabled()) 
condition so the debug message is only built when debug logging is on.
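A minimal sketch of the proposed change. The Logger class below is a stand-in for the real logging facade so the example stays self-contained; the point it demonstrates is that the message-construction cost on the hot path is paid only when debug logging is enabled.

```java
// Hypothetical sketch: guard the debug statement so the message string is
// only built when debug is enabled. messagesBuilt counts how many messages
// were actually constructed, standing in for the avoided overhead.
public class AcquirePermitSketch {
    static class Logger {
        private final boolean debugEnabled;
        int messagesBuilt = 0;
        Logger(boolean debugEnabled) { this.debugEnabled = debugEnabled; }
        boolean isDebugEnabled() { return debugEnabled; }
        void debug(String msg) { messagesBuilt++; }
    }

    static void acquirePermit(Logger log, String nsId) {
        // Proposed form: the concatenation and the debug() call are skipped
        // entirely on the hot path when debug logging is off.
        if (log.isDebugEnabled()) {
            log.debug("Taking lock for nameservice " + nsId);
        }
        // ... acquire the semaphore permit for nsId here ...
    }
}
```

With a real SLF4J logger, parameterized logging (log.debug("... {}", nsId)) already avoids the concatenation, but the isDebugEnabled() guard additionally skips the call itself on this very hot path.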


