lhotari commented on PR #25768:
URL: https://github.com/apache/pulsar/pull/25768#issuecomment-4444486298
_Local review findings from Claude Code — not a substitute for human review,
but flagging for the author's consideration._
### 1. [BUG] `TxnIds.fromKey` cannot round-trip a negative `mostSigBits` —
`TxnIds.java`
`toKey(new TxnID(-1, 1))` produces `"-1-1"`. `fromKey` then runs
`key.indexOf('-')` → returns `0` (the leading sign), trips the `dash <= 0`
guard, and throws `IllegalArgumentException`. Same for `Long.MIN_VALUE`
mostSigBits. The javadoc claim "round-trips losslessly" is false. In current
Pulsar code `TxnID` mostSigBits is normally non-negative, so this is latent —
but the contract should match reality. The companion `TxnPaths.txnIdFromOpPath`
already uses `lastIndexOf('-')`, so the on-disk encoding is consistent; only
`fromKey` is wrong.
### 2. [BUG] `recover()` leaks the segment-events subscription on the
failure path — `MetadataTransactionBuffer.java:115-167`
`subscribeSegmentEvents` is called first and `subscription` is set. If the
subsequent `listWritesBySegment` or per-txn header reads fail, `recoveryFuture`
is completed exceptionally and the method returns without closing
`subscription`. Nothing else closes it until/unless `closeAsync()` is later
invoked — but a TB whose recovery failed may never get a clean shutdown call,
and the listener stays wired to the metadata store until then. Close
`subscription` in the error branch of the terminal `whenComplete`.
### 3. [QUALITY] `txns` map grows unbounded —
`MetadataTransactionBuffer.java:94, 286-308`
Terminal txns are never removed from `txns`; their `firstPosition` is nulled
but the entry stays in the map for the life of the topic. `cleanupOpRecords`
only deletes the on-disk `/txn-op/...` records. For a long-running segment with
high txn turnover this is an in-memory leak. Either evict from `txns` once
`cleanupOpRecords` completes, or cap the cache. If you evict, `isTxnAborted`
would then start returning `true` for evicted txns (the "unknown → aborted"
default) — that matches what readers should do after physical cleanup, but it
should be tested.
### 4. [QUALITY] Events fired during recovery are dropped, with no
post-recovery rescan — `MetadataTransactionBuffer.java:115-167, 262`
`subscribeSegmentEvents` is set up before the scan, but `triggerReconcile`
short-circuits while `\!recoveryFuture.isDone()`. Any event fired between
subscription start and recovery completion is silently ignored. If no further
event arrives for an affected txn, its cached state stays at the
recovery-snapshot value (potentially OPEN when the header has already moved to
terminal). The design intent — events as wake-ups, header as truth — relies on
a strong "any event will re-read all open headers", but that does not protect
against a quiet txn whose only event landed in the recovery window. Either
queue events arriving during recovery and replay them after `recoveryFuture`
completes, or do an unconditional reconcile pass once recovery finishes.
### 5. [QUALITY] `recordOp` can mutate state on a concurrently-reconciled
terminal txn — `MetadataTransactionBuffer.java:240-257`
Between the cache-first header check and the lock re-acquisition in
`recordOp`, a reconcile may have flipped the entry to a terminal state with
`firstPosition = null`. The current code then does `entry.firstPosition =
position` (because `firstPosition == null`), mutating a terminal entry.
`recomputeMaxReadPositionLocked` ignores non-OPEN entries so it is benign for
read-position correctness today, but the state is misleading. Worse, the `entry
== null` branch *adds* the txn back as OPEN with `firstPosition = position` —
that genuinely undoes a reconcile decision (though that branch only triggers if
some path actually removes entries, which today nothing does — see also finding
3). Guard the mutation on `entry.state == OPEN`.
### 6. [QUALITY] TOCTOU between header authorization and ML append —
`MetadataTransactionBuffer.java:197-205`
The cache-first header read happens before `appendToLedger`. Between them,
the TC may commit or abort the txn. The entry is still written to the managed
ledger, and a `/txn-op` record is then written too. On COMMITTED, the spurious
write becomes visible since `isTxnAborted` returns `false`. On ABORTED, it is
filtered. The legacy `TopicTransactionBuffer` presumably has a similar window
so this may be accepted as a TC-ordering concern, but a sentence in the class
javadoc on what the TC must guarantee here would help future maintainers.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]