poorbarcode opened a new pull request, #4557:
URL: https://github.com/apache/bookkeeper/pull/4557

   ### Motivation
   
   **Background: the normal steps of adding an entry**
   - Gains BKC's IO thread
   - Triggers the callback after gaining the IO thread
     - Write data to servers
   - Triggers `PendingAddOp.writeComplete` after receiving the response from BK 
servers.
   - Triggers all succeed callbacks in the pending queue
   
   **Background: the steps of disconnection**
   - Gains BKC's IO thread
   - Triggers a failed callback after gaining the IO thread
     - Triggers a failed `PendingAddOp.writeComplete`. You can reproduce this 
flow by the new test `testAddEntriesCallbackWithBKClientThread`
     - **(Highlight)** If the writing is already complete, this process will 
also trigger all successful callbacks in the pending queue even if the current 
writing is failed<sup>[code-1]</sup>
   
   ---
   
   **Issue-1: write stuck due to pending add callback by multiple threads**
   - Settings
     - Ensembles: `3`
     - WriteQuoram: `3`
     - AckQuoram: `2`
   
   
   | steps | write entries | `client->BK1` | `client->BK2` | `client-> BK3`|
   | --- | --- | --- | --- | --- |
   | 1 | write BKs | 
   | 2 | | start writing | start writing | start writing |
   | 3 | | write success<br> `ack`: `1/3` | write success<br> `ack`: `2/3` | |
   | 4 | mark the writing as `complete` since ack quorum is `2/3` |
   | 5 | | | | connected and disconnected |
   | 6 | | | | Triggers a failed `PendingAddOp.writeComplete` |
   | 8 | | | `thread`: `bookkeeper workers` |  `thread`: `client-server io` |
   | 7 | | | Trigger all succeed callbacks in the pending queue |  Trigger all 
succeed callbacks in the pending queue<sup>[code-1]</sup> |
   
   Since there are multiple threads that will trigger all successful callbacks 
in the pending queue, it may cause the following race 
condition<sup>[code-2]</sup>
     - BTW, the `thread-1` and `thread-2` may be triggered by different 
`PendingAddOps`
   
   | steps | `thread-1` | `thread-2` |
   | --- | --- | --- |
   | 1 | peek pending addOp from the queue | peek pending addOp from the queue |
   | 2 | check it is the first item: `success`|  check it is the first item: 
`success`|
   | 3 | call `queue.pop` | call `queue.pop`|
   | 4 | | Issue: the second OP will never get a callback triggering|
   
   
   **[1] code link**: 
https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingAddOp.java#L307
   
   ```java
   // PendingAddOp.writeComplete
   public synchronized void writeComplete(int rc, long ledgerId, long entryId, 
BookieId addr, Object ctx) {
           if (completed) {
               sendAddSuccessCallbacks();
               maybeRecycle();
               return;
   }
   ```
   
   **[2] code-link**: 
https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L2092-L2124
   
   ```java
       // LedgerHandle.sendAddSuccessCallbacks
       void sendAddSuccessCallbacks() {
           while ((pendingAddOp = pendingAddOps.peek()) != null && 
!changingEnsemble) {
               if (!pendingAddOp.completed) {
                   return;
               }
               pendingAddOps.remove();
               explicitLacFlushPolicy.updatePiggyBackedLac(lastAddConfirmed);
               pendingAddsSequenceHead = pendingAddOp.entryId;
               pendingAddOp.submitCallback(BKException.Code.OK);
           }
       }
   
   ```
   
   ---
   
   **Issue-2: ledger will be closed with a incorrect length**
   
   Since the task that triggers all successful callbacks in the pending queue 
may be run in `IO` thread, the task "triggers all successful callbacks in the 
pending queue" and closing ledger may concurrectly execute
   
   | steps | `workeer-thread` | `io-thread` |
   | --- | --- | --- |
   | 1 | start closing ledger | peek pending addOp from the queue |
   | 2 |  |  check it is the first item: `success`|
   | 3 | drain pending adds<sup>[code-3]</sup> |
   | 4 | reduce `ledger.length` which was popped out from the 
queue<sup>[code-3]</sup> | |
   | 5 | | call `queue.pop` and pop nothing|
   | 6 | | update `ledger.LAC` |
   
   The variables `ledger.LAC` and `ledger.length` do not match
   
   **[3] code-link**: 
https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L2076-L2084
   
   ```java
       synchronized List<PendingAddOp> drainPendingAddsAndAdjustLength() {
           PendingAddOp pendingAddOp;
           List<PendingAddOp> opsDrained = new 
ArrayList<PendingAddOp>(pendingAddOps.size());
           while ((pendingAddOp = pendingAddOps.poll()) != null) {
               addToLength(-pendingAddOp.entryLength);
               opsDrained.add(pendingAddOp);
           }
           return opsDrained;
       }
   ```
   
   ---
   
   **The issue we encountered**
   
   A pulsar topic is stuck at `ClosingLedger` state
   
   **pulsar topic stats**
   ```json
   {
     "entriesAddedCounter" : 12485917,
     "numberOfEntries" : 126383,
     "totalSize" : 53668291,
     "currentLedgerEntries" : 137418,
     "currentLedgerSize" : 58374388,
     "lastLedgerCreatedTimestamp" : "2025-02-06T09:13:45.371Z",
     "waitingCursorsCount" : 1,
     "pendingAddEntriesCount" : 8342123,
     "lastConfirmedEntry" : "41901:126416",
     "state" : "ClosingLedger",
     "ledgers" : [ {
       "ledgerId" : 41901,
       "entries" : 0,
       "size" : 0,
       "offloaded" : false,
       "underReplicated" : false
     } ],
     "cursors" : {
   ...
   ```
   
   **logs**
   ```
   2025-02-06T09:14:06,421+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] 
WARN  org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (41901, 
126417) to bookie (2, bookie-0:3181): Bookie handle is not available"
   2025-02-06T09:14:06,421+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] 
WARN  org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (41901, 
126412) to bookie (2, bookie-0:3181): Bookie handle is not available"
   2025-02-06T09:14:06,421+0000 [pulsar-io-3-5] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to 
bookie: bookie-0:3181 [id: 0xbc663b9b, L:/10.247.4.112:59578 - 
R:bookie-0/10.247.4.38:3181]"
   2025-02-06T09:14:06,421+0000 [pulsar-io-3-5] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 
0xbc663b9b, L:/10.247.4.112:59578 - R:bookie-0/10.247.4.38:3181] authenticated 
as BookKeeperPrincipal{ANONYMOUS}"
   2025-02-06T09:14:06,423+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] 
INFO  org.apache.bookkeeper.client.LedgerHandle - New Ensemble: [bookie-3:3181, 
bookie-2:3181, bookie-1:3181] for ledger: 41901"
   2025-02-06T09:14:06,423+0000 [pulsar-io-3-5] WARN  
org.apache.bookkeeper.proto.PerChannelBookieClient - Exception caught on:[id: 
0xbc663b9b, L:/10.247.4.112:59578 - R:bookie-0/10.247.4.38:3181] cause: 
recvAddress(..) failed: Connection reset by peer"
   2025-02-06T09:14:06,423+0000 [pulsar-io-3-5] INFO  
org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie 
channel [id: 0xbc663b9b, L:/10.247.4.112:59578 ! R:bookie-0/10.247.4.38:3181]"
   2025-02-06T09:14:06,423+0000 [pulsar-io-3-6] WARN  
org.apache.bookkeeper.mledger.impl.OpAddEntry - [{topic}] The add op is 
terminal legacy callback for entry 41901-126416 adding."
   2025-02-06T09:14:06,423+0000 [pulsar-io-3-6] WARN  
org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (41901, 
126426) to bookie (2, bookie-1:3181): Bookie handle is not available"
   2025-02-06T09:14:06,423+0000 [BookKeeperClientWorker-OrderedExecutor-3-0] 
WARN  org.apache.bookkeeper.client.BookieWatcherImpl - replaceBookie for 
bookie: bookie-1:3181 in ensemble: [bookie-3:3181, bookie-2:3181, 
bookie-1:3181] is not adhering to placement policy and chose bookie-0:3181. 
excludedBookies [bookie-1:3181] and quarantinedBookies [bookie-2:3181]"
   ```
   
   
   ### Changes
   
   Switch the thread to `Bookkeeper works` if the connection is broken.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to