[PR] fix(spark): fix mor bulk insert commit type error [hudi]

via GitHub Thu, 28 May 2026 21:31:04 -0700


fhan688 opened a new pull request, #18878:
URL: https://github.com/apache/hudi/pull/18878

### Describe the issue this Pull Request addresses

When Spark datasource write options specify MOR table type through
`hoodie.datasource.write.table.type`, `mergeParamsAndGetHoodieConfig` did not
propagate that value to the table config key `hoodie.table.type`.

This caused `HoodieWriteConfig#getTableType` to fall back to the default COW
table type in the row-writer `bulk_insert` path. As a result, MOR row-writer
bulk insert could choose `commit` instead of the expected `deltacommit`.
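
The failure mode can be illustrated with a small, simplified model. This is a sketch only, not Hudi's actual code: the helper names `get_table_type` and `commit_action_for` are hypothetical, and plain dictionaries stand in for the real config objects. The config keys, table type values, and action names are the ones named in this description.

```python
# Simplified model of the bug; the function names here are hypothetical.
TABLE_TYPE_KEY = "hoodie.table.type"
DATASOURCE_TABLE_TYPE_KEY = "hoodie.datasource.write.table.type"
DEFAULT_TABLE_TYPE = "COPY_ON_WRITE"


def get_table_type(config):
    # Only the table config key is consulted; the default is COW.
    return config.get(TABLE_TYPE_KEY, DEFAULT_TABLE_TYPE)


def commit_action_for(table_type):
    # MOR writes are expected to complete as delta commits.
    return "deltacommit" if table_type == "MERGE_ON_READ" else "commit"


# The user asked for MOR, but only through the datasource option.
write_options = {DATASOURCE_TABLE_TYPE_KEY: "MERGE_ON_READ"}

# The datasource key is never read, so the COW default wins.
action = commit_action_for(get_table_type(write_options))
print(action)
```

In this model the lookup never sees the datasource key, so the MOR request is silently ignored and the COW action is selected.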

This PR describes the issue inline; no GitHub issue is linked.

### Summary and Changelog

This PR fixes MOR row-writer bulk insert commit action selection by keeping the
datasource table type option and table config table type key consistent.

Changes:
- Populate `hoodie.table.type` from `hoodie.datasource.write.table.type` when
  the table config key is not already present.
- Preserve existing precedence: an explicitly provided `hoodie.table.type` is
  not overwritten.
- Add a regression test covering MOR + row writer + `bulk_insert`, verifying
  the completed write instant uses `deltacommit`.
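
The normalization rule in the list above can be sketched as follows. This is an illustrative approximation, assuming a plain dictionary of parameters; `normalize_table_type` is a hypothetical name and does not reflect the actual change inside `mergeParamsAndGetHoodieConfig`.

```python
# Illustrative sketch of the normalization rule; not the actual patch.
TABLE_TYPE_KEY = "hoodie.table.type"
DATASOURCE_TABLE_TYPE_KEY = "hoodie.datasource.write.table.type"


def normalize_table_type(params):
    """Return a copy of params with the table config key filled in if absent."""
    merged = dict(params)  # leave the caller's dictionary untouched
    if TABLE_TYPE_KEY not in merged and DATASOURCE_TABLE_TYPE_KEY in merged:
        merged[TABLE_TYPE_KEY] = merged[DATASOURCE_TABLE_TYPE_KEY]
    return merged


# Case 1: only the datasource option is set, so it is propagated.
only_datasource = normalize_table_type(
    {DATASOURCE_TABLE_TYPE_KEY: "MERGE_ON_READ"}
)

# Case 2: an explicit table config value is present and is not overwritten.
explicit_wins = normalize_table_type(
    {
        TABLE_TYPE_KEY: "COPY_ON_WRITE",
        DATASOURCE_TABLE_TYPE_KEY: "MERGE_ON_READ",
    }
)

# Case 3: neither key is set, so nothing is added.
untouched = normalize_table_type({})
```

Copying the input first keeps the rule side-effect free, which makes the precedence behavior easy to check in isolation.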

No code was copied.

### Impact

User-facing bug fix for Spark datasource writes.

MOR tables written with row writer and `bulk_insert` now use the correct
`deltacommit` action when table type is provided through datasource options.
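
For context, the affected write shape looks roughly like the PySpark options below. This is a hedged sketch: the table name and target path are hypothetical placeholders, other options a real write needs (record key, precombine field, and so on) are omitted, and the write call is left as a comment because it requires a Spark session with the Hudi bundle on the classpath.

```python
# Datasource options for the affected case: MOR + row writer + bulk_insert.
# The table name and target path are hypothetical placeholders.
hudi_options = {
    "hoodie.table.name": "example_mor_table",
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.datasource.write.operation": "bulk_insert",
    "hoodie.datasource.write.row.writer.enable": "true",
}

target_path = "/tmp/example_mor_table"

# With a DataFrame `df` available, the write would be issued as:
# df.write.format("hudi").options(**hudi_options).mode("append").save(target_path)
```

Note that `hoodie.table.type` is deliberately not set here; the table type arrives only through the datasource option, which is the case this PR addresses.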

No public API changes.
No storage format changes.
No new config is introduced.
No performance impact is expected.

### Risk Level

low

The change is limited to parameter normalization in Spark SQL writer config
construction. It only fills `hoodie.table.type` when that key is absent and
`hoodie.datasource.write.table.type` is present, so explicit table-config
values continue to take precedence.

Verification:
- `mvn -Pspark3.5 -pl hudi-spark-datasource/hudi-spark -am -Dtest=org.apache.hudi.TestHoodieSparkSqlWriter#testMorRowWriterBulkInsertUsesDeltaCommitAction -Dsurefire.failIfNoSpecifiedTests=false -DfailIfNoTests=false -Dcheckstyle.skip=true -DskipUTs=true test`

Result:
- `Tests run: 1, Failures: 0, Errors: 0, Skipped: 0`
- `BUILD SUCCESS`

### Documentation Update

none

This fixes existing Spark datasource behavior and does not add or change any
public config, feature, or API.

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable

