[ https://issues.apache.org/jira/browse/HDFS-17267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791659#comment-17791659 ]

farmmamba edited comment on HDFS-17267 at 11/30/23 2:43 PM:
------------------------------------------------------------

We can debug the unit test method testPipelineRecoveryWithSlowNode to verify 
this PR.

Set a breakpoint in DataStreamer#run at `LOG.debug("{} sending {}", this, one);`.

We can see that the DFSPacket with seq=3 is sent twice.

 

BTW, on the DataNode side, the packet data will not be written twice, because 
receivePacket() compares onDiskLen with offsetInBlock: if onDiskLen >= 
offsetInBlock, the data is not written again.


was (Author: zhanghaobo):
We can debug the unit test method testPipelineRecoveryWithSlowNode to verify 
this PR.

Set breakpoint in DataStreamer#run : `LOG.debug("{} sending {}", this, one);`.

We can see the DFSPacket with seq=3 sends twice.

> Client sends the same packet multiple times when method markSlowNode throws 
> IOException.
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-17267
>                 URL: https://issues.apache.org/jira/browse/HDFS-17267
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.3.6
>            Reporter: farmmamba
>            Assignee: farmmamba
>            Priority: Major
>
> Since HDFS-16348, we can kick out a SLOW node from the pipeline when writing 
> data to it. 
> I think it introduced a problem: the same packet will be sent twice or more 
> times when we kick out the SLOW node.
>  
> The flow is as below:
> 1. DFSPacket p1 is pushed into dataQueue.
> 2. DataStreamer takes DFSPacket p1 from dataQueue.
> 3. p1 is removed from dataQueue and pushed into ackQueue.
> 4. sendPacket(p1).
> 5. In ResponseProcessor#run, the pipeline ack for p1 is read.
> 6. We meet a SLOW node, so method markSlowNode throws an IOException and 
> `ackQueue.removeFirst();` is not executed.
> 7. In the next loop of DataStreamer#run, we enter method 
> processDatanodeOrExternalError and execute `dataQueue.addAll(0, ackQueue);`.
> 8. p1 is sent repeatedly.
>  
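The eight steps quoted above can be simulated with two queues. This is a hypothetical sketch, not DataStreamer's real code: the class `ResendSketch` is invented, and only the dataQueue/ackQueue moves from the steps are modeled (the field names and the `dataQueue.addAll(0, ackQueue)` call mirror the description).

```java
import java.util.LinkedList;

public class ResendSketch {
    /** Simulates the quoted flow; returns how many times p1 gets sent. */
    public static int sendCount(boolean markSlowNodeThrows) {
        LinkedList<String> dataQueue = new LinkedList<>();
        LinkedList<String> ackQueue = new LinkedList<>();
        int sends = 0;

        dataQueue.add("p1");                 // step 1: p1 pushed into dataQueue
        String p1 = dataQueue.removeFirst(); // steps 2-3: moved to ackQueue
        ackQueue.addLast(p1);
        sends++;                             // step 4: sendPacket(p1)

        if (markSlowNodeThrows) {
            // steps 6-7: markSlowNode throws before ackQueue.removeFirst(),
            // so error recovery requeues every unacked packet at the front.
            dataQueue.addAll(0, ackQueue);
            ackQueue.clear();
        } else {
            ackQueue.removeFirst();          // step 5: ack processed normally
        }

        while (!dataQueue.isEmpty()) {       // step 8: next run() loop resends
            dataQueue.removeFirst();
            sends++;
        }
        return sends;
    }

    public static void main(String[] args) {
        System.out.println(sendCount(true));  // prints 2: p1 sent twice
        System.out.println(sendCount(false)); // prints 1: normal ack path
    }
}
```

The sketch shows why the duplicate send follows directly from skipping `ackQueue.removeFirst()`: whatever is still in ackQueue is prepended to dataQueue during error recovery.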



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
