xiangfu0 opened a new pull request, #17606: URL: https://github.com/apache/pinot/pull/17606
### Motivation
- Provide a config toggle to enable or disable dimension-table upsert/dedup logic so clusters can opt into queryable-doc-id filtering and upsert behavior for dimension tables.
- Ensure upsert-related processing (computing/applying per-segment queryable doc id bitmaps and enabling segment upsert state) is only performed when the feature is explicitly enabled.

### Description
- Added an `enableUpsert` boolean to `DimensionTableConfig` (JSON property `enableUpsert`) and exposed `isUpsertEnabled()` in `pinot-spi`.
- Read the new flag in `DimensionTableDataManager` and gate upsert-related logic behind `_enableUpsert`, including using queryable-doc-id snapshots when sizing/iterating segments and applying per-segment bitmaps.
- Introduced a small `RecordLocation` type and helper methods `applyQueryableDocIdsForRecordLocations`, `applyQueryableDocIdsForLookupTable`, `applyQueryableDocIdsToSegments`, and `getQueryableDocIdsSnapshot` in `DimensionTableDataManager` to compute and apply per-segment `MutableRoaringBitmap` sets and call `ImmutableSegmentImpl.enableUpsert(...)` when appropriate.
- Updated all test and helper call sites that construct `DimensionTableConfig` to pass the new flag, and added integration coverage that creates a small OFFLINE upsert dimension table and asserts deduplicated selection/count results (`testDimensionTableUpsertSelection`), as well as a unit test `testLookupRespectsQueryableDocIds` that verifies lookup respects queryable doc ids when upsert is enabled.

### Testing
- No automated test suites (`mvn`/CI) were executed as part of this change.
- Added/updated tests include `MultiStageEngineIntegrationTest.testDimensionTableUpsertSelection` (integration) and `DimensionTableDataManagerTest.testLookupRespectsQueryableDocIds` (unit), but these tests were added and not run in this rollout.
- Existing test usages and benchmark helpers were updated to construct the new config parameter where needed and compile-time imports were adjusted accordingly.

------
[Codex Task](https://chatgpt.com/codex/tasks/task_e_697af028b33c832d98fb7d8ff1035e4a)
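
For readers unfamiliar with the queryable-doc-id idea described above, here is a minimal, self-contained sketch of how per-segment bitmaps could be derived from record locations. It is not the code in this PR: `QueryableDocIdsSketch`, `Row`, and `queryableDocIds` are hypothetical names, `java.util.BitSet` stands in for `MutableRoaringBitmap`, and the "last row seen for a primary key wins" rule is an assumption made for illustration only.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class QueryableDocIdsSketch {

  /** Hypothetical stand-in for the PR's RecordLocation: the segment and doc id holding a row. */
  public static final class RecordLocation {
    final String segmentName;
    final int docId;

    RecordLocation(String segmentName, int docId) {
      this.segmentName = segmentName;
      this.docId = docId;
    }
  }

  /** Hypothetical input row: a primary key plus the location where the row is stored. */
  public static final class Row {
    final String primaryKey;
    final RecordLocation location;

    public Row(String primaryKey, String segmentName, int docId) {
      this.primaryKey = primaryKey;
      this.location = new RecordLocation(segmentName, docId);
    }
  }

  /**
   * Returns one bitmap of queryable doc ids per segment.
   * When enableUpsert is false no bitmaps are computed at all, mirroring the
   * gating described in the PR; the result is then an empty map.
   */
  public static Map<String, BitSet> queryableDocIds(List<Row> rows, boolean enableUpsert) {
    Map<String, BitSet> perSegment = new TreeMap<>();
    if (!enableUpsert) {
      return perSegment;
    }
    // Assumption: rows arrive in load order and the last row for a key wins.
    Map<String, RecordLocation> winners = new HashMap<>();
    for (Row row : rows) {
      winners.put(row.primaryKey, row.location);
    }
    // Mark only the winning doc ids as queryable in their owning segment.
    for (RecordLocation loc : winners.values()) {
      perSegment.computeIfAbsent(loc.segmentName, k -> new BitSet()).set(loc.docId);
    }
    return perSegment;
  }

  public static void main(String[] args) {
    List<Row> rows = List.of(
        new Row("k1", "seg0", 0),
        new Row("k2", "seg0", 1),
        new Row("k1", "seg1", 0)); // supersedes the k1 row in seg0
    System.out.println(queryableDocIds(rows, true));
    System.out.println(queryableDocIds(rows, false));
  }
}
```

In this sketch the superseded `k1` row in `seg0` is left out of that segment's bitmap, which is the effect the deduplicated selection/count assertions in `testDimensionTableUpsertSelection` are described as checking.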
