Re: [D] Spark DataSource V2 read and write benchmarks? [hudi]

via GitHub Mon, 24 Nov 2025 01:58:43 -0800


GitHub user geserdugarov edited a comment on the discussion: Spark DataSource 
V2 read and write benchmarks?


For queries that read from and write to the same table:
```sql
-- 1st example
INSERT INTO hudi_tbl
SELECT * FROM hudi_tbl WHERE ...

-- 2nd example
UPDATE hudi_tbl t
SET somecol = somecol + 100
WHERE EXISTS (
  SELECT 1
  FROM hudi_tbl s
  WHERE s.id = t.id
    AND s.anothercol > 100
);
```
combining a V1 write with a V2 read could be tricky.

I suppose we could shift the focus to full DataSource V2 support (read and 
write) without a performance drop, instead of trying to support V1 write and V2 
read simultaneously. In that case, we would also have to resolve compatibility 
issues, but only from the V1 to V2 migration point of view, rather than for a 
complex hybrid migration with a lot of edge cases.

GitHub link: 
https://github.com/apache/hudi/discussions/13955#discussioncomment-15059073

