xiangfu0 opened a new pull request, #17536: URL: https://github.com/apache/pinot/pull/17536
### Motivation

- Allow offline dimension tables to support UPSERT-like semantics so later segments can deterministically overwrite earlier rows with the same primary key instead of producing duplicates.
- Prevent invalid table configuration combinations that would enable both upsert semantics and strict duplicate-key errors.

### Description

- Add an `enableUpsert` flag to `DimensionTableConfig` with a JSON-aware constructor, backwards-compatible constructors, and an `isEnableUpsert()` getter in `pinot-spi`.
- Thread `enableUpsert` into `DimensionTableDataManager` by adding `_enableUpsert`, reading it from `DimensionTableConfig`, and changing duplicate-key handling to throw only when `!_enableUpsert && _errorOnDuplicatePrimaryKey`, in both the fast-lookup and memory-optimized loading paths.
- Implement deterministic segment ordering when upsert is enabled via `sortSegmentsForUpsert(...)`, which sorts by `indexCreationTime` and then `segmentName` so later segments overwrite earlier ones.
- Add `validateDimensionTableConfig(...)` in `TableConfigUtils.validate` to reject configs that enable both `enableUpsert` and `errorOnDuplicatePrimaryKey`.
- Add or adjust unit-test helpers and tests in `DimensionTableDataManagerTest` (including `testUpsertOverwritesDuplicatePrimaryKey` and the new `testUpsertDedupesAcrossSegments`), and add a validation test in `TableConfigUtilsTest`. Also add a `createSegmentFromCsv` test helper and update existing tests to pass the new flag.

### Testing

- Unit tests added or updated: `DimensionTableDataManagerTest#testUpsertOverwritesDuplicatePrimaryKey` and `DimensionTableDataManagerTest#testUpsertDedupesAcrossSegments`, plus an invalid-config check in `TableConfigUtilsTest`. These cover overwrite semantics and invalid-config detection.
- No automated test suites (for example `mvn` runs or CI) were executed as part of this change.
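For readers who want to see how the ordering, duplicate-key rule, and config validation described above fit together, here is a minimal, self-contained sketch. It is not the code from this PR: `DimensionUpsertSketch`, `Row`, `Segment`, `validate`, and `load` are hypothetical stand-ins, and only the names `sortSegmentsForUpsert`, `indexCreationTime`, `segmentName`, `enableUpsert`, and `errorOnDuplicatePrimaryKey` are taken from the description. The sketch assumes non-null string values and a plain in-memory map in place of Pinot's lookup structures.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; none of these types are Pinot classes.
final class DimensionUpsertSketch {

  /** One dimension-table row: a primary key and a (non-null) payload. */
  static final class Row {
    final String primaryKey;
    final String value;

    Row(String primaryKey, String value) {
      this.primaryKey = primaryKey;
      this.value = value;
    }
  }

  /** Minimal segment model carrying the two fields the description sorts on. */
  static final class Segment {
    final String segmentName;
    final long indexCreationTime;
    final List<Row> rows;

    Segment(String segmentName, long indexCreationTime, List<Row> rows) {
      this.segmentName = segmentName;
      this.indexCreationTime = indexCreationTime;
      this.rows = rows;
    }
  }

  /** Rejects the flag combination that the description declares invalid. */
  static void validate(boolean enableUpsert, boolean errorOnDuplicatePrimaryKey) {
    if (enableUpsert && errorOnDuplicatePrimaryKey) {
      throw new IllegalStateException(
          "enableUpsert and errorOnDuplicatePrimaryKey cannot both be true");
    }
  }

  /** Returns a copy ordered by creation time, then by name, so iteration is deterministic. */
  static List<Segment> sortSegmentsForUpsert(List<Segment> segments) {
    List<Segment> sorted = new ArrayList<>(segments);
    sorted.sort(Comparator.comparingLong((Segment s) -> s.indexCreationTime)
        .thenComparing((Segment s) -> s.segmentName));
    return sorted;
  }

  /** Builds a primary-key lookup; with upsert enabled, later segments overwrite earlier ones. */
  static Map<String, String> load(List<Segment> segments, boolean enableUpsert,
      boolean errorOnDuplicatePrimaryKey) {
    validate(enableUpsert, errorOnDuplicatePrimaryKey);
    // Without upsert, segments are visited in whatever order the caller supplied.
    List<Segment> ordered = enableUpsert ? sortSegmentsForUpsert(segments) : segments;
    Map<String, String> lookup = new HashMap<>();
    for (Segment segment : ordered) {
      for (Row row : segment.rows) {
        // put() returns the previous value, which is non-null only for a duplicate key.
        String previous = lookup.put(row.primaryKey, row.value);
        if (previous != null && !enableUpsert && errorOnDuplicatePrimaryKey) {
          throw new IllegalStateException("Duplicate primary key: " + row.primaryKey);
        }
      }
    }
    return lookup;
  }
}
```

In this sketch, two segments created at times 100 and 200 that both contain key `k1` resolve to the value from the later segment when `load(segments, true, false)` is called, regardless of the order in which the segments are passed in. Calling `load(segments, false, true)` on the same input throws on the duplicate key, and `load(segments, true, true)` is rejected before any rows are read.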
