nsivabalan opened a new issue, #18830: URL: https://github.com/apache/hudi/issues/18830
**Describe the problem you faced**

When a Hudi table has a pending clustering plan and an `INSERT_OVERWRITE` (or `INSERT_OVERWRITE_TABLE`) operation targets the same partition(s), the operation proceeds and replaces the file groups that clustering was scheduled against.

The clustering update strategies (`SparkRejectUpdateStrategy`, `SparkAllowUpdateStrategy`) only inspect explicit *record-level* updates to detect a conflict. `INSERT_OVERWRITE` does not tag records with existing file groups; instead, it declares entire partitions as replaced via `getPartitionToReplacedFileIds`. The strategies never see the to-be-replaced file groups, so `SparkRejectUpdateStrategy` (the default) does not throw, and the overwrite is admitted.

With `hoodie.clustering.rollback.pending.replacecommit=true`, this can also lead to clustering being rolled back repeatedly (pipeline starvation).

**To Reproduce**

1. Configure a table with `hoodie.clustering.updates.strategy=org.apache.hudi.client.clustering.update.strategy.SparkRejectUpdateStrategy` (the default).
2. Ingest some data into partition `p`.
3. Schedule clustering on `p` (do not run it).
4. Issue `INSERT_OVERWRITE` against partition `p`.
5. The overwrite completes; the `Reject` strategy did not detect the conflict.

**Expected behavior**

`SparkRejectUpdateStrategy` should throw `HoodieClusteringUpdateException` because the file groups being replaced overlap with pending clustering. The same is expected for `INSERT_OVERWRITE_TABLE` against any partition that has pending clustering.

**Environment Description**

* Hudi version: master
* Spark version: 3.5
* Storage: any

**Additional context**

`DELETE_PARTITION` already has its own pre-existing check (`DeletePartitionUtils.checkForPendingTableServiceActions`) and is unaffected. PR #18829 addresses this.
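To make the expected check concrete, here is a minimal, language-neutral sketch of the overlap test described above, written in Python rather than against Hudi's Java classes. It is not the code from PR #18829. The names `FileGroupId`, `ClusteringUpdateConflict`, `find_overlap`, and `reject_if_overlapping` are hypothetical; the only assumption carried over from the issue is that the overwrite exposes a mapping from partition to replaced file IDs (as `getPartitionToReplacedFileIds` does) and that the set of file groups under pending clustering is known.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for a file group identifier: (partition path, file id).
FileGroupId = Tuple[str, str]


class ClusteringUpdateConflict(Exception):
    """Hypothetical stand-in for HoodieClusteringUpdateException."""


def find_overlap(
    partition_to_replaced_file_ids: Dict[str, List[str]],
    pending_clustering_file_groups: Set[FileGroupId],
) -> Set[FileGroupId]:
    """Return the file groups that the overwrite would replace and that
    are also part of a pending clustering plan."""
    # Flatten the partition -> file ids mapping into (partition, file id) pairs.
    replaced = {
        (partition, file_id)
        for partition, file_ids in partition_to_replaced_file_ids.items()
        for file_id in file_ids
    }
    return replaced & pending_clustering_file_groups


def reject_if_overlapping(
    partition_to_replaced_file_ids: Dict[str, List[str]],
    pending_clustering_file_groups: Set[FileGroupId],
) -> None:
    """Mimic the 'reject' behaviour: fail the write if any replaced file
    group is scheduled for clustering, otherwise do nothing."""
    overlap = find_overlap(
        partition_to_replaced_file_ids, pending_clustering_file_groups
    )
    if overlap:
        raise ClusteringUpdateConflict(
            f"Replaced file groups overlap with pending clustering: {sorted(overlap)}"
        )
```

In this model, an overwrite of partition `p` while `p` has file groups in a pending clustering plan raises the conflict, whereas an overwrite of an unrelated partition passes through. The point of the sketch is that the check operates on replaced file groups as a whole, not on per-record update tagging.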
