zhongyujiang opened a new pull request, #5241: URL: https://github.com/apache/paimon/pull/5241
<!-- Please specify the module before the PR name: [core] ... or [flink] ... -->

### Purpose

<!-- Linking this pull request to the issue -->
Linked issue: part of #4816

<!-- What is the purpose of the change -->
Support the Spark DataSource V2 write path to reduce write serialization overhead and accelerate writing to primary key tables in Spark. Currently only fixed-bucket tables are supported.

### Tests

<!-- List UT and IT cases to verify this change -->
BucketFunctionTest, SparkWriteITCase

PaimonSourceWriteBenchmark:

```md
Benchmark                            Mode  Cnt   Score    Error  Units
PaimonSourceWriteBenchmark.v1Write     ss    5  13.845 ± 23.192   s/op
PaimonSourceWriteBenchmark.v2Write     ss    5   9.579 ± 14.929   s/op
```

### API and Format

<!-- Does this change affect API or storage format -->

### Documentation

<!-- Does this change introduce a new feature -->
Adds a config `spark.sql.paimon.use-v2-write` to enable switching to the v2 write path. It falls back to the v1 write when encountering an unsupported scenario (e.g. a `HASH_DYNAMIC` bucket mode table).

Note: this is an overall draft PR, which will be split into smaller PRs for easier review.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
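For readers trying out the new flag, a minimal sketch of how a session might opt in (the config key is from this PR; the table name `t` and the inserted row are purely illustrative):

```sql
-- Opt into the DataSource V2 write path for this session.
-- Per the PR description, Paimon falls back to the v1 write when it hits an
-- unsupported scenario, e.g. a HASH_DYNAMIC bucket mode table.
SET spark.sql.paimon.use-v2-write = true;

-- Subsequent writes to a fixed-bucket primary key table should then take
-- the v2 path (hypothetical table `t`):
INSERT INTO t VALUES (1, 'a');
```

Since the fallback is automatic, enabling the flag should be safe even for mixed workloads that touch both fixed-bucket and dynamic-bucket tables.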
