[jira] [Created] (HDFS-16989) Large scale block transfer causes too many excess blocks

2023-04-23 Thread ZhangHB (Jira)
ZhangHB created HDFS-16989:
--

 Summary: Large scale block transfer causes too many excess blocks
 Key: HDFS-16989
 URL: https://issues.apache.org/jira/browse/HDFS-16989
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.3.5, 3.4.0
Reporter: ZhangHB






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-16937) Delete RPC should also record number of deleted blocks in audit log

2023-02-27 Thread ZhangHB (Jira)
ZhangHB created HDFS-16937:
--

 Summary: Delete RPC should also record the number of deleted blocks in 
audit log
 Key: HDFS-16937
 URL: https://issues.apache.org/jira/browse/HDFS-16937
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.3.4
Reporter: ZhangHB


To better trace jitter caused by the delete RPC, we should also record the 
number of deleted blocks in the audit log. With this information, we can 
identify which user caused the jitter.
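For illustration, a delete audit entry carrying such a count could look like the line below. The surrounding fields follow the usual HDFS audit-log layout; the deleteBlocks field name is hypothetical, not an existing field:

```
ugi=hive ip=/10.0.0.12 cmd=delete src=/warehouse/tmp dst=null perm=null proto=rpc deleteBlocks=18342
```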






[jira] [Created] (HDFS-16928) Both getCurrentEditLogTxid and getEditsFromTxid should be OperationCategory.WRITE

2023-02-21 Thread ZhangHB (Jira)
ZhangHB created HDFS-16928:
--

 Summary: Both getCurrentEditLogTxid and getEditsFromTxid should be 
OperationCategory.WRITE
 Key: HDFS-16928
 URL: https://issues.apache.org/jira/browse/HDFS-16928
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: ZhangHB


After HDFS-13183, the standby NameNode can handle some RPCs marked 
OperationCategory.READ. This is controlled by the configuration 
dfs.ha.allow.stale.reads.

However, these two RPCs should only be handled by the active NameNode.
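As a minimal, self-contained sketch of what the recategorization buys (simplified enums, not the real HDFS classes): a standby that permits stale reads still rejects anything marked WRITE, which is exactly what should happen to these two RPCs:

```java
public class OperationGateSketch {
    enum OperationCategory { READ, WRITE }
    enum HAState { ACTIVE, STANDBY }

    static boolean allows(HAState state, OperationCategory op, boolean allowStaleReads) {
        if (state == HAState.ACTIVE) {
            return true; // the active NameNode handles everything
        }
        // Standby: READ is allowed only when dfs.ha.allow.stale.reads is on;
        // WRITE is always refused, which is the point of recategorizing
        // getCurrentEditLogTxid / getEditsFromTxid as WRITE.
        return op == OperationCategory.READ && allowStaleReads;
    }

    public static void main(String[] args) {
        boolean staleReads = true; // dfs.ha.allow.stale.reads = true
        System.out.println(allows(HAState.STANDBY, OperationCategory.READ, staleReads));
        System.out.println(allows(HAState.STANDBY, OperationCategory.WRITE, staleReads));
    }
}
```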






[jira] [Resolved] (HDFS-16921) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.

2023-02-14 Thread ZhangHB (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangHB resolved HDFS-16921.

Resolution: Duplicate

> The logic of IncrementalBlockReportManager#addRDBI method may cause missing 
> blocks when cluster is busy.
> 
>
> Key: HDFS-16921
> URL: https://issues.apache.org/jira/browse/HDFS-16921
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Critical
>
> The current logic of the IncrementalBlockReportManager#addRDBI method could 
> lead to missing blocks when datanodes in the pipeline are I/O busy.






[jira] [Resolved] (HDFS-16920) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.

2023-02-14 Thread ZhangHB (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangHB resolved HDFS-16920.

Resolution: Duplicate

> The logic of IncrementalBlockReportManager#addRDBI method may cause missing 
> blocks when cluster is busy.
> 
>
> Key: HDFS-16920
> URL: https://issues.apache.org/jira/browse/HDFS-16920
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Critical
>
> The current logic of the IncrementalBlockReportManager#addRDBI method could 
> lead to missing blocks when datanodes in the pipeline are I/O busy.






[jira] [Resolved] (HDFS-16919) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.

2023-02-14 Thread ZhangHB (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhangHB resolved HDFS-16919.

Resolution: Duplicate

> The logic of IncrementalBlockReportManager#addRDBI method may cause missing 
> blocks when cluster is busy.
> 
>
> Key: HDFS-16919
> URL: https://issues.apache.org/jira/browse/HDFS-16919
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Critical
>
> The current logic of the IncrementalBlockReportManager#addRDBI method could 
> lead to missing blocks when datanodes in the pipeline are I/O busy.






[jira] [Created] (HDFS-16922) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.

2023-02-14 Thread ZhangHB (Jira)
ZhangHB created HDFS-16922:
--

 Summary: The logic of IncrementalBlockReportManager#addRDBI method 
may cause missing blocks when cluster is busy.
 Key: HDFS-16922
 URL: https://issues.apache.org/jira/browse/HDFS-16922
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: ZhangHB


The current logic of the IncrementalBlockReportManager#addRDBI method could 
lead to missing blocks when datanodes in the pipeline are I/O busy.






[jira] [Created] (HDFS-16919) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.

2023-02-14 Thread ZhangHB (Jira)
ZhangHB created HDFS-16919:
--

 Summary: The logic of IncrementalBlockReportManager#addRDBI method 
may cause missing blocks when cluster is busy.
 Key: HDFS-16919
 URL: https://issues.apache.org/jira/browse/HDFS-16919
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.3.4
Reporter: ZhangHB


The current logic of the IncrementalBlockReportManager#addRDBI method could 
lead to missing blocks when datanodes in the pipeline are I/O busy.






[jira] [Created] (HDFS-16921) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.

2023-02-14 Thread ZhangHB (Jira)
ZhangHB created HDFS-16921:
--

 Summary: The logic of IncrementalBlockReportManager#addRDBI method 
may cause missing blocks when cluster is busy.
 Key: HDFS-16921
 URL: https://issues.apache.org/jira/browse/HDFS-16921
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.3.4
Reporter: ZhangHB


The current logic of the IncrementalBlockReportManager#addRDBI method could 
lead to missing blocks when datanodes in the pipeline are I/O busy.






[jira] [Created] (HDFS-16920) The logic of IncrementalBlockReportManager#addRDBI method may cause missing blocks when cluster is busy.

2023-02-14 Thread ZhangHB (Jira)
ZhangHB created HDFS-16920:
--

 Summary: The logic of IncrementalBlockReportManager#addRDBI method 
may cause missing blocks when cluster is busy.
 Key: HDFS-16920
 URL: https://issues.apache.org/jira/browse/HDFS-16920
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.3.4
Reporter: ZhangHB


The current logic of the IncrementalBlockReportManager#addRDBI method could 
lead to missing blocks when datanodes in the pipeline are I/O busy.






[jira] [Created] (HDFS-16915) Optimize lock-hold-time metrics for FsDatasetImpl operations

2023-02-14 Thread ZhangHB (Jira)
ZhangHB created HDFS-16915:
--

 Summary: Optimize lock-hold-time metrics for FsDatasetImpl operations
 Key: HDFS-16915
 URL: https://issues.apache.org/jira/browse/HDFS-16915
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.3.4
Reporter: ZhangHB


The current calculation also includes the time spent waiting for the lock. So 
I think we should optimize how the lock-hold-time metrics of FsDatasetImpl 
operations are computed, so that they measure only the time the lock is 
actually held.
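A minimal, self-contained sketch of the proposed fix (plain ReentrantLock, not FsDatasetImpl's actual lock manager): start the timer only after the lock is acquired, so the metric excludes lock-wait time. All names here are illustrative:

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockHoldMetricSketch {
    public static void main(String[] args) throws InterruptedException {
        final ReentrantLock lock = new ReentrantLock();
        final long[] nanos = new long[2]; // [0] = wait + hold, [1] = hold only

        lock.lock(); // main thread holds the lock so the worker must wait
        Thread worker = new Thread(() -> {
            long beforeAcquire = System.nanoTime();
            lock.lock();
            long afterAcquire = System.nanoTime(); // correct start of "hold"
            try {
                // ... the guarded operation would run here ...
            } finally {
                long beforeRelease = System.nanoTime();
                lock.unlock();
                nanos[0] = beforeRelease - beforeAcquire; // includes lock-wait
                nanos[1] = beforeRelease - afterAcquire;  // pure hold time
            }
        });
        worker.start();
        Thread.sleep(200); // keep the worker waiting for ~200 ms
        lock.unlock();
        worker.join();

        // The wait-inclusive measurement is far larger than the real hold time.
        System.out.println(nanos[1] < nanos[0]);
    }
}
```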

 






[jira] [Created] (HDFS-16914) Add some logs for updateBlockForPipeline RPC.

2023-02-09 Thread ZhangHB (Jira)
ZhangHB created HDFS-16914:
--

 Summary: Add some logs for updateBlockForPipeline RPC.
 Key: HDFS-16914
 URL: https://issues.apache.org/jira/browse/HDFS-16914
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 3.3.4
Reporter: ZhangHB
Assignee: ZhangHB


Recently, we received a phone alarm about missing blocks. We found logs like 
below in one of the datanodes where the block was placed:

 
{code:java}
2023-02-09 15:05:10,376 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Received BP-578784987-x.x.x.x-1667291826362:blk_1305044966_231832415 src: 
/clientAddress:44638 dest: /localAddress:50010 of size 45733720

2023-02-09 15:05:10,376 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
Received BP-578784987-x.x.x.x-1667291826362:blk_1305044966_231826462 src: 
/upStreamDatanode:60316 dest: /localAddress:50010 of size 45733720 {code}
The datanode received the same block with different generation stamps because 
of a socket timeout exception. blk_1305044966_231826462 was received from the 
upstream datanode in a pipeline of two datanodes, while 
blk_1305044966_231832415 was received directly from the client.

 

We searched all log entries about blk_1305044966 in the NameNode and in the 
three datanodes of the original pipeline, but could not find any helpful 
message about generation stamp 231826462. After diving into the source code, 
we found that it is assigned in NameNodeRpcServer#updateBlockForPipeline, 
which is invoked from DataStreamer#setupPipelineInternal. The 
updateBlockForPipeline RPC does not log anything, so I think we should add 
some logging to this RPC.

 

 






[jira] [Created] (HDFS-16909) Move the null check out of the for loop in ReplicaMap#mergeAll method.

2023-02-05 Thread ZhangHB (Jira)
ZhangHB created HDFS-16909:
--

 Summary: Move the null check out of the for loop in ReplicaMap#mergeAll 
method.
 Key: HDFS-16909
 URL: https://issues.apache.org/jira/browse/HDFS-16909
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.3.4
Reporter: ZhangHB


Currently, the code is as below:
{code:java}
for (ReplicaInfo replicaInfo : replicaSet) {
  checkBlock(replicaInfo);
  if (curSet == null) {
// Add an entry for block pool if it does not exist already
curSet = new LightWeightResizableGSet<>();
map.put(bp, curSet);
  }
  curSet.put(replicaInfo);
} {code}
The statement:
{code:java}
if(curSet == null){code}
should be moved out of the for loop.
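A self-contained sketch of the hoisted version, using standard collections in place of HDFS's LightWeightResizableGSet: resolve (and lazily create) the per-block-pool set once before iterating, instead of re-testing curSet == null on every iteration:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MergeSketch {
    public static void main(String[] args) {
        Map<String, Set<String>> map = new HashMap<>();
        String bp = "BP-1";
        List<String> replicaSet = Arrays.asList("blk_1", "blk_2", "blk_3");

        // Hoisted null check: create the entry for the block pool once,
        // before the loop, rather than checking inside every iteration.
        Set<String> curSet = map.computeIfAbsent(bp, k -> new HashSet<>());
        for (String replicaInfo : replicaSet) {
            curSet.add(replicaInfo);
        }
        System.out.println(map.get(bp).size());
    }
}
```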






[jira] [Created] (HDFS-16908) IncrementalBlockReportManager#sendImmediately should use OR logic to decide whether to send immediately

2023-02-03 Thread ZhangHB (Jira)
ZhangHB created HDFS-16908:
--

 Summary: IncrementalBlockReportManager#sendImmediately should use OR 
logic to decide whether to send immediately
 Key: HDFS-16908
 URL: https://issues.apache.org/jira/browse/HDFS-16908
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 3.3.4
Reporter: ZhangHB


IncrementalBlockReportManager#sendImmediately should use OR logic to decide 
whether to send the incremental block report immediately.






[jira] [Created] (HDFS-16903) Fix javadoc of Class LightWeightResizableGSet

2023-02-01 Thread ZhangHB (Jira)
ZhangHB created HDFS-16903:
--

 Summary: Fix javadoc of Class LightWeightResizableGSet
 Key: HDFS-16903
 URL: https://issues.apache.org/jira/browse/HDFS-16903
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, hdfs
Affects Versions: 3.3.4
Reporter: ZhangHB


After HDFS-16249 (Add DataSetLockManager to manage fine-grain locks for 
FsDataSetImpl), the class LightWeightResizableGSet is thread-safe. So we 
should fix its javadoc accordingly.






[jira] [Created] (HDFS-16900) Method DataNode#isWrite seems not to work in the DataTransfer constructor

2023-01-31 Thread ZhangHB (Jira)
ZhangHB created HDFS-16900:
--

 Summary: Method DataNode#isWrite seems not to work in the DataTransfer 
constructor
 Key: HDFS-16900
 URL: https://issues.apache.org/jira/browse/HDFS-16900
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.3.4
Reporter: ZhangHB


In the DataTransfer constructor, there is the following code:
{code:java}
if (isTransfer(stage, clientname)) {
  this.throttler = xserver.getTransferThrottler();
} else if(isWrite(stage)) {
  this.throttler = xserver.getWriteThrottler();
} {code}
The stage is a parameter of the DataTransfer constructor. Let us look at 
where the DataTransfer object is instantiated.

In the method transferReplicaForPipelineRecovery, the code looks like below:
{code:java}
final DataTransfer dataTransferTask = new DataTransfer(targets,
targetStorageTypes, targetStorageIds, b, stage, client); {code}
But here the stage can never be PIPELINE_SETUP_STREAMING_RECOVERY or 
PIPELINE_SETUP_APPEND_RECOVERY; it can only be TRANSFER_RBW or 
TRANSFER_FINALIZED. So I think the isWrite branch never takes effect.
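The report can be sketched in a self-contained way. The enum below is a simplified stand-in for BlockConstructionStage, and isWrite mirrors the shape described in the report (true only for the pipeline-recovery stages); it is not the real HDFS code:

```java
public class IsWriteSketch {
    // Simplified stand-in for BlockConstructionStage.
    enum Stage {
        TRANSFER_RBW, TRANSFER_FINALIZED,
        PIPELINE_SETUP_STREAMING_RECOVERY, PIPELINE_SETUP_APPEND_RECOVERY
    }

    // Shape of DataNode#isWrite as described in the report:
    // true only for the pipeline-recovery stages.
    static boolean isWrite(Stage stage) {
        return stage == Stage.PIPELINE_SETUP_STREAMING_RECOVERY
            || stage == Stage.PIPELINE_SETUP_APPEND_RECOVERY;
    }

    public static void main(String[] args) {
        // transferReplicaForPipelineRecovery only ever passes these two
        // stages, so the isWrite branch can never be taken from that call site.
        System.out.println(isWrite(Stage.TRANSFER_RBW));
        System.out.println(isWrite(Stage.TRANSFER_FINALIZED));
    }
}
```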






[jira] [Created] (HDFS-16898) Make write lock fine-grain in processCommandFromActor method

2023-01-29 Thread ZhangHB (Jira)
ZhangHB created HDFS-16898:
--

 Summary: Make write lock fine-grain in processCommandFromActor 
method
 Key: HDFS-16898
 URL: https://issues.apache.org/jira/browse/HDFS-16898
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.3.4
Reporter: ZhangHB


Currently, in the method processCommandFromActor, we have code like below:

 
{code:java}
writeLock();
try {
  if (actor == bpServiceToActive) {
return processCommandFromActive(cmd, actor);
  } else {
return processCommandFromStandby(cmd, actor);
  }
} finally {
  writeUnlock();
} {code}
If processCommandFromActive takes a long time, the write lock is not released 
for that entire duration.

This may block the updateActorStatesFromHeartbeat call in offerService; 
furthermore, it can drive the datanode's last-contact time very high, and the 
datanode may even be marked dead when last contact exceeds 600 seconds.
{code:java}
bpos.updateActorStatesFromHeartbeat(
this, resp.getNameNodeHaState());{code}
Here we can make the write lock fine-grained in the processCommandFromActor 
method to address this problem.
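One possible narrowing, sketched below with illustrative names (this is not the actual BPOfferService code, and whether each command handler is actually safe to run outside the lock needs case-by-case review): hold the lock only while reading which actor is active, and run the potentially slow processing outside it, so heartbeat handling is not blocked for the whole duration:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FineGrainLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Object bpServiceToActive = new Object();

    String processCommandFromActor(Object cmd, Object actor) {
        boolean fromActive;
        lock.writeLock().lock(); // guard only the state read
        try {
            fromActive = (actor == bpServiceToActive);
        } finally {
            lock.writeLock().unlock();
        }
        // Slow command processing happens without holding the lock, so a
        // concurrent heartbeat update is no longer blocked by it.
        return fromActive ? processFromActive(cmd) : processFromStandby(cmd);
    }

    private String processFromActive(Object cmd)  { return "active"; }
    private String processFromStandby(Object cmd) { return "standby"; }

    public static void main(String[] args) {
        FineGrainLockSketch s = new FineGrainLockSketch();
        System.out.println(s.processCommandFromActor("cmd", s.bpServiceToActive));
        System.out.println(s.processCommandFromActor("cmd", new Object()));
    }
}
```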

 






[jira] [Created] (HDFS-16882) Add cache hit rate metric in MountTableResolver#getDestinationForPath

2023-01-03 Thread ZhangHB (Jira)
ZhangHB created HDFS-16882:
--

 Summary: Add cache hit rate metric in 
MountTableResolver#getDestinationForPath
 Key: HDFS-16882
 URL: https://issues.apache.org/jira/browse/HDFS-16882
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: rbf
Affects Versions: 3.3.4
Reporter: ZhangHB


Currently, the default value of 
"dfs.federation.router.mount-table.cache.enable" is true, and the default 
value of "dfs.federation.router.mount-table.max-cache-size" is 1.

But there is no metric that reports the cache hit rate. I think we can add a 
hit-rate metric to observe cache performance and better tune these parameters.
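A minimal, self-contained sketch of the proposed metric (illustrative names, not the real MountTableResolver): count hits and misses around the location-cache lookup and expose hitRate = hits / (hits + misses):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class CacheHitRateSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    String getDestinationForPath(String path) {
        String dest = cache.get(path);
        if (dest != null) {
            hits.incrementAndGet();
            return dest;
        }
        misses.incrementAndGet();
        dest = "ns0" + path; // stand-in for the actual mount-table resolution
        cache.put(path, dest);
        return dest;
    }

    double hitRate() {
        long h = hits.get(), total = h + misses.get();
        return total == 0 ? 0.0 : (double) h / total;
    }

    public static void main(String[] args) {
        CacheHitRateSketch r = new CacheHitRateSketch();
        r.getDestinationForPath("/a"); // miss
        r.getDestinationForPath("/a"); // hit
        r.getDestinationForPath("/a"); // hit
        r.getDestinationForPath("/b"); // miss
        System.out.println(r.hitRate()); // 2 hits out of 4 lookups
    }
}
```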






[jira] [Created] (HDFS-16880) Modify the invokeSingleXXX interface to pass the actual file src to the NameNode for debug info

2022-12-29 Thread ZhangHB (Jira)
ZhangHB created HDFS-16880:
--

 Summary: Modify the invokeSingleXXX interface to pass the actual file 
src to the NameNode for debug info
 Key: HDFS-16880
 URL: https://issues.apache.org/jira/browse/HDFS-16880
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: rbf
Affects Versions: 3.3.4
Reporter: ZhangHB


We found lots of INFO-level logs like below:
{quote}2022-12-30 15:31:04,169 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
completeFile: / is closed by 
DFSClient_attempt_1671783180362_213003_m_77_0_1102875551_1
2022-12-30 15:31:04,186 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
completeFile: / is closed by DFSClient_NONMAPREDUCE_1198313144_27480
{quote}
The real path of the completed file is lost. This is caused by:

*org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient#invokeSingle(java.lang.String,
 org.apache.hadoop.hdfs.server.federation.router.RemoteMethod)*

In this method, a RemoteLocationContext object is instantiated:

*RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");*

and then *Object[] params = method.getParams(loc);* is executed.

The problem is right here: because we always use new RemoteParam(), 
context.getDest() always returns "/". That is why we saw so many incorrect 
logs.

 

After diving into the invokeSingleXXX source code, I classified the RPCs into 
those that need the actual src path and those that do not.

*RPCs that need the actual src path:*

addBlock, abandonBlock, getAdditionalDatanode, complete

*RPCs that do not need the actual src path:*

updateBlockForPipeline, reportBadBlocks, getBlocks, updatePipeline, 
invokeAtAvailableNs (invoked by: getServerDefaults, getBlockKeys, 
getTransactionID, getMostRecentCheckpointTxId, versionRequest, 
getStoragePolicies)

After these changes, the src can be passed to the NameNode correctly.






[jira] [Created] (HDFS-16859) acquirePermit : move LOG.debug into if condition

2022-12-02 Thread ZhangHB (Jira)
ZhangHB created HDFS-16859:
--

 Summary: acquirePermit : move LOG.debug into if condition
 Key: HDFS-16859
 URL: https://issues.apache.org/jira/browse/HDFS-16859
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: rbf
Affects Versions: 3.3.4
Reporter: ZhangHB


The method AbstractRouterRpcFairnessPolicyController#acquirePermit is invoked 
at high frequency, and before acquiring the permit for a nameservice there is 
always a LOG.debug statement.

It is better to guard that statement with an if (LOG.isDebugEnabled()) 
condition.
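The pattern can be demonstrated in a self-contained way with java.util.logging (the HDFS code uses SLF4J, where the equivalent guard is LOG.isDebugEnabled()): build the debug message only when debug-level logging is actually enabled, so a hot path like acquirePermit does not pay for message construction on every call:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedDebugSketch {
    private static final Logger LOG = Logger.getLogger("acquirePermit");

    static int messagesBuilt = 0;

    static String expensiveMessage() {
        messagesBuilt++; // count how often the message is actually constructed
        return "Taking a permit for nameservice ns0";
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // debug level (FINE) disabled

        // Unguarded: the argument is evaluated even though nothing is logged.
        LOG.fine(expensiveMessage());

        // Guarded: the message construction is skipped entirely.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(expensiveMessage());
        }

        System.out.println(messagesBuilt); // only the unguarded call built one
    }
}
```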


