grishaf opened a new pull request, #25817: URL: https://github.com/apache/pulsar/pull/25817
### Motivation When a non-Java producer (C++, Python, Go) sends a null-value message (tombstone) on a compacted topic, the key is not removed during topic compaction. The tombstone is retained in the compacted view instead of being deleted. **Root cause:** `AbstractTwoPhaseCompactor.extractKeyAndSize()` computed payload size using `headersAndPayload.readableBytes()`, which returns the combined size of the serialized MessageMetadata + payload. For null-value messages the payload is empty, but the metadata is always present, so `readableBytes()` is always > 0 (e.g. 32 bytes of metadata). This prevents the tombstone path (`size <= 0 → latestForKey.remove(key)`) from being reached in phase one of compaction. The batch code path is not affected because it extracts per-message payload sizes from `SingleMessageMetadata.payloadSize`, which correctly returns 0 for null-value messages. The Java client always sets `numMessagesInBatch` in the metadata (even with `enableBatching(false)`), so all Java-produced messages go through the batch path — which is why this bug was never caught by existing tests. ### Modifications - **`AbstractTwoPhaseCompactor.extractKeyAndSize()`**: Fixed to compute the correct payload-only size by using `Commands.skipMessageMetadata()` to skip past the metadata before reading `readableBytes()`. For compressed messages, `getUncompressedSize()` was already correct. - **`EventTimeOrderCompactor.extractMessageCompactionData()`**: Refactored to reuse `extractKeyAndSize()` instead of duplicating the (buggy) size calculation. ### Verifying this change This change added tests and can be verified as follows: - Added `CompactionTest.testNonBatchedMessageWithNullValue` — writes raw non-batch entries (no `numMessagesInBatch` in metadata, simulating C++/Python producers) directly to the managed ledger, triggers compaction, and verifies tombstoned keys are removed from the compacted view. | Test | Without fix | With fix | |-------------------------------------|------------------------------------|----------| | testNonBatchedMessageWithNullValue | FAILED: expected [2] but found [4] | PASSED | ### Does this pull request potentially affect one of the following parts: - [ ] Dependencies (add or upgrade a dependency) - [ ] The public API - [ ] The schema - [ ] The default values of configurations - [ ] The threading model - [ ] The binary protocol - [ ] The REST endpoints - [ ] The admin CLI options - [ ] The metrics - [ ] Anything that affects deployment -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
