grishaf opened a new pull request, #25817:
URL: https://github.com/apache/pulsar/pull/25817

   ### Motivation
   When a non-Java producer (C++, Python, Go) sends a null-value message 
(tombstone) on a compacted topic, the key is not removed during topic 
compaction. The tombstone is retained in the compacted view instead of being 
deleted.
   **Root cause:**
   `AbstractTwoPhaseCompactor.extractKeyAndSize()` computed payload size using 
`headersAndPayload.readableBytes()`, which returns the combined size of the 
serialized MessageMetadata + payload. For null-value messages the payload is 
empty, but the metadata is always present, so `readableBytes()` is always > 0 
(e.g. 32 bytes of metadata). This prevents the tombstone path (`size <= 0 → 
latestForKey.remove(key)`) from being reached in phase one of compaction.
   The batch code path is not affected because it extracts per-message payload 
sizes from `SingleMessageMetadata.payloadSize`, which correctly returns 0 for 
null-value messages. The Java client always sets `numMessagesInBatch` in the 
metadata (even with `enableBatching(false)`), so all Java-produced messages go 
through the batch path — which is why this bug was never caught by existing 
tests.
   ### Modifications
   - **`AbstractTwoPhaseCompactor.extractKeyAndSize()`**: Fixed to compute the 
correct payload-only size by using `Commands.skipMessageMetadata()` to skip 
past the metadata before reading `readableBytes()`. For compressed messages, 
`getUncompressedSize()` was already correct.
   - **`EventTimeOrderCompactor.extractMessageCompactionData()`**: Refactored 
to reuse `extractKeyAndSize()` instead of duplicating the (buggy) size 
calculation.
   ### Verifying this change
   This change added tests and can be verified as follows:
   - Added `CompactionTest.testNonBatchedMessageWithNullValue` — writes raw 
non-batch entries (no `numMessagesInBatch` in metadata, simulating C++/Python 
producers) directly to the managed ledger, triggers compaction, and verifies 
tombstoned keys are removed from the compacted view.
   | Test                                | Without fix                        | 
With fix |
   
|-------------------------------------|------------------------------------|----------|
   | testNonBatchedMessageWithNullValue  | FAILED: expected [2] but found [4] | 
PASSED   |
   ### Does this pull request potentially affect one of the following parts:
   - [ ] Dependencies (add or upgrade a dependency)
   - [ ] The public API
   - [ ] The schema
   - [ ] The default values of configurations
   - [ ] The threading model
   - [ ] The binary protocol
   - [ ] The REST endpoints
   - [ ] The admin CLI options
   - [ ] The metrics
   - [ ] Anything that affects deployment


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to