michaeljmarshall opened a new pull request, #18237:
URL: https://github.com/apache/pulsar/pull/18237

   Fixes: https://github.com/apache/pulsar/issues/18236
   
   ### Motivation
   
   See https://github.com/apache/pulsar/issues/18236 for details on the 
problematic behavior.
   
   The fundamental problem is that the cursor persists the `readPosition` 
instead of persisting the `markDeletePosition` in the 
`ManagedCursor#internalResetCursor` method. This method is only called on 
specific occasions, the broker persists the correct `markDeletePosition` in 
zookeeper, and the bookkeeper's cursor data is only used when the broker 
doesn't update the cursor in zookeeper, so this bug is rare.
   
   I found this bug at first by producing to a newly created topic and 
subscription where the subscription was set to "latest". In that case, when the 
broker evaluates the following code, the `position` evaluates to `ledgerId:0` 
instead of `ledgerId:-1`. It seems that this bug affects all reset cursor calls 
except "earliest" because of the way "earliest" always goes to entryId -1.
   
   
https://github.com/apache/pulsar/blob/fa328a42d5770a27237d926eac97385284876174/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L1186-L1191
   
   After setting the position incorrectly for `latest` and for regular resets, 
the method persists `newPosition` (which is defined by `position`) as the 
`markDeletePosition`:
   
   
https://github.com/apache/pulsar/blob/fa328a42d5770a27237d926eac97385284876174/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L1281
   
   As a result, the cursor's persisted data is incorrect for those two cases.
   
   Interestingly, we do not see this bug much because a cleanly closed cursor 
ledger persists its `markDeletePosition` in 
[Zookeeper](https://github.com/apache/pulsar/blob/fa328a42d5770a27237d926eac97385284876174/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L2525),
 and when the broker recovers the cursor, it just uses the zookeeper data:
   
   
https://github.com/apache/pulsar/blob/fa328a42d5770a27237d926eac97385284876174/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L458
   
   Note that the only reason the zk metadata is correct is because we update 
the `markDeletePosition` to a new value in a callback run _after_ persisting a 
different `markDeletePosition`:
   
   
https://github.com/apache/pulsar/blob/fa328a42d5770a27237d926eac97385284876174/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L1217
   
   Note also that this bug does not occur unless you produce a message to the 
topic after triggering the `resetCursor` logic. If you do not produce a 
message, you'll see a log line like `Current position 4:0 is ahead of last 
position 4:-1`, which was introduced by 
https://github.com/apache/pulsar/pull/2673. It's possible that PR partially 
found this bug.
   
   Finally, this bug may be related to 
https://github.com/apache/pulsar/pull/15031 
https://github.com/apache/pulsar/pull/15067.
   
   ### Modifications
   
   * Update `ManagedCursor#internalResetCursor` so that it persists the 
`markDeletePosition` instead of the `readPosition` to the cursor's bookkeeper 
ledger. The rest of the changes make sure that the correct variables are shared 
around.
   * Improve logging to make it clearer that `resetCursor` updates the 
`readPosition`
   * Fix test assertion ordering for semi-related test (this test failed as I 
iterated over the solution)
   
   ### Verifying this change
   
   This PR includes two new tests and all current tests continue to pass.
   
   ### Does this pull request potentially affect one of the following parts:
   
   The only possible "breaking" change is that this PR asserts that the 
`resetCursor` logic resets the read position (not the mark delete position). I 
think this is very sensible, but I'll need verification. Note that we already 
follow this logic in the way that we update the `markDeletePosition` in memory 
and in zookeeper. This change just updates what gets written to the cursor's 
ledger.
   
   ### Documentation
   
   - [x] `doc`
   
   ### Matching PR in forked repository
   
   PR in forked repository: https://github.com/michaeljmarshall/pulsar/pull/7


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to