Re: [D] Spark DataSource V2 read and write benchmarks? [hudi]

via GitHub Wed, 12 Nov 2025 20:03:35 -0800


GitHub user geserdugarov added a comment to the discussion: Spark DataSource V2 
read and write benchmarks?


From my point of view, the main stages are the following:
- prepare benchmarks for write scenarios to check whether it is true that V2 
doesn't provide enough flexibility and that V1 is more performant for 
integration with Hudi:
  - **if the write path will use V1**: prepare a list of all APIs that could 
call either V1 or V2,
  - prepare a design for a hybrid approach that calls V1 for writes and V2 for 
reads,
  - **if the write path will use V2**: support all missing APIs for a 
performant V2 write path,
- prepare benchmarks for read scenarios to check that switching to the V2 read 
path doesn't decrease performance,
- implement the V2 read path.

GitHub link: 
https://github.com/apache/hudi/discussions/13955#discussioncomment-14954490
