GitHub user geserdugarov added a comment to the discussion: Spark DataSource V2 read and write benchmarks?
From my point of view, the main stages are the following:

- prepare benchmarks for write scenarios to check whether it is true that V2 does not provide enough flexibility and that V1 is more performant for integration with Hudi:
  - **if the write path will use V1**: prepare a list of all APIs that could call either V1 or V2,
  - prepare a design for the hybrid approach: V1 for writes and V2 for reads,
  - **if the write path will use V2**: support all missing APIs needed for a performant V2 write path,
- prepare benchmarks for read scenarios to check that switching to the V2 read path doesn't decrease performance,
- implement the V2 read path.

GitHub link: https://github.com/apache/hudi/discussions/13955#discussioncomment-14954490

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]
