ivandika3 commented on code in PR #3229: URL: https://github.com/apache/ozone/pull/3229#discussion_r1901568117
########## hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockDataStreamOutput.java: ########## @@ -406,12 +410,26 @@ private void executePutBlock(boolean close, byteBufferList = null; } waitFuturesComplete(); + final BlockData blockData = containerBlockData.build(); if (close) { - dataStreamCloseReply = out.closeAsync(); + final ContainerCommandRequestProto putBlockRequest + = ContainerProtocolCalls.getPutBlockRequest( + xceiverClient.getPipeline(), blockData, true, token); + dataStreamCloseReply = executePutBlockClose(putBlockRequest, + PUT_BLOCK_REQUEST_LENGTH_MAX, out); + dataStreamCloseReply.whenComplete((reply, e) -> { + if (e != null || reply == null || !reply.isSuccess()) { + LOG.warn("Failed executePutBlockClose, reply=" + reply, e); + try { + executePutBlock(true, false); + } catch (IOException ex) { + throw new CompletionException(ex); + } + } + }); Review Comment: > two PutBlockRequest are created by getPutBlockRequest using the same parameters This is true. However, it seems the BCSID will be set on the DN side (`ContainerStateMachine`) based on the log index. This BCSID will be part of the PutBlock response. <img width="628" alt="image" src="https://github.com/user-attachments/assets/a9343ee6-29b3-4b10-bc9a-0e7f1beecd1a" /> The possible issue is that it seems that `putBlockAsync` is the only one that will update the `BlockID`'s `blockCommitSequenceId` in the `BlockDataStreamOutput` that will be used as the `BlockID`'s `blockCommitSequenceId` in `OmKeyLocationInfo'. So if `putBlockAsync` is executed in the DN before the `executePutBlock`, the `blockCommitSequenceId` in the committed key in OM will be smaller than the one in the DN replicas. ## Example ### Normal case When `putBlockAsync` is processed in DN after the `executePutBlock` 1. `executePutBlock` (`ContainerStateMachine#link`) -> BCSID: 100 2. `putBlockAsync` (`ContainerStateMachine#applyTransaction`) -> BCSID: 101 The BCSID in Datanode in 101 and committed blockCommitSequenceId in OM is 101. When client calls `getBlock`, the block will be returned successfully since the blockCommitSequenceId = BCSID. ### Problematic case When `putBlockAsync` is executed in DN before the `executPutBlock` 1. `putBlockAsync` (`ContainerStateMachine#applyTransaction`) -> BCSID: 100 2. `executePutBlock` (`ContainerStateMachine#link`) -> BCSID: 101 The BCSID in Datanode is 101, but the committed blockCommitSequenceId in OM is 100. Therefore, there might be a difference between the blockCommitSequenceId in OM and DN. However, I'm not sure if the reordering might happen in the first place. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org