github-actions[bot] commented on code in PR #63116:
URL: https://github.com/apache/doris/pull/63116#discussion_r3214802351
##########
be/src/storage/compaction/compaction.cpp:
##########
@@ -1332,12 +1332,19 @@ Status CompactionMixin::modify_rowsets() {
// New loads are not blocked, so some keys of input rowsets might
// be deleted during the time. We need to deal with delete bitmap
// of incremental data later.
- // TODO(LiaoXin): check if there are duplicate keys
std::size_t missed_rows_size = 0;
tablet()->calc_compaction_output_rowset_delete_bitmap(
_input_rowsets, *_rowid_conversion, 0, version.second + 1,
missed_rows.get(),
location_map.get(), _tablet->tablet_meta()->delete_bitmap(),
&output_rowset_delete_bitmap);
+ // In cluster-key MOW compaction, rows are sorted by cluster key, so duplicate unique keys
+ // may be non-adjacent in merge order. Scan the output primary key index to delete older
+ // duplicate rows inside the output rowset.
+ if (!tablet()->tablet_schema()->cluster_key_uids().empty()) {
+ RETURN_IF_ERROR(tablet()->calc_compaction_output_rowset_internal_delete_bitmap(
Review Comment:
This only fixes the shared-nothing `CompactionMixin::modify_rowsets()` path.
Cloud compactions do not call this method;
`CloudCumulativeCompaction::modify_rowsets()` calls
`CloudTablet::calc_delete_bitmap_for_compaction()` instead, and that function
still only converts input-rowset delete bitmaps before/after the meta-service
lock. For a cloud cluster-key MoW table with the same shape as the new test
(older `(uk=1, ck=30)` and newer `(uk=1, ck=20)` sorted by cluster key into the
same output rowset), the output rowset is committed without any internal delete
bitmap entry, so both duplicates can remain visible. Please apply
`calc_compaction_output_rowset_internal_delete_bitmap()` in the cloud
delete-bitmap calculation path as well and add coverage for that path.
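
To make the failure shape concrete, here is a minimal, self-contained sketch of the duplicate scan that any path producing the output rowset would need. It is not Doris code: `OutputRow`, its `version` field, and `older_duplicate_positions()` are hypothetical names invented for illustration, and the real implementation works on the primary key index and a delete bitmap rather than on an in-memory vector.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical row model for illustration only; not a Doris type.
struct OutputRow {
    int64_t unique_key;   // the MoW unique key (uk)
    int64_t cluster_key;  // the sort key of the output rowset (ck)
    int64_t version;      // assumption: a larger value means a newer row
};

// Returns the positions of rows shadowed by a newer row with the same
// unique key. The input models an output rowset sorted by cluster key,
// so duplicates of one unique key are not necessarily adjacent.
std::vector<uint32_t> older_duplicate_positions(const std::vector<OutputRow>& rows) {
    // unique_key -> position of the newest row seen so far
    std::unordered_map<int64_t, uint32_t> newest;
    std::vector<uint32_t> deleted;
    for (uint32_t pos = 0; pos < rows.size(); ++pos) {
        // Precondition of this sketch: rows are ordered by cluster key.
        assert(pos == 0 || rows[pos - 1].cluster_key <= rows[pos].cluster_key);
        auto it = newest.find(rows[pos].unique_key);
        if (it == newest.end()) {
            newest.emplace(rows[pos].unique_key, pos);
            continue;
        }
        if (rows[pos].version > rows[it->second].version) {
            deleted.push_back(it->second);  // previously seen row is older
            it->second = pos;
        } else {
            deleted.push_back(pos);  // current row is older (or a tie)
        }
    }
    return deleted;
}
```

With the rows from the scenario above, the newer `(uk=1, ck=20)` sorts ahead of the older `(uk=1, ck=30)`, so the scan has to mark the second position as deleted. If a compaction path skips this step, neither row is marked.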
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]