Re: [D] Spark DataSource V2 read and write benchmarks? [hudi]

via GitHub Mon, 24 Nov 2025 00:51:15 -0800


GitHub user geserdugarov edited a comment on the discussion: Spark DataSource 
V2 read and write benchmarks?


For queries that read from and write to the same table:
```sql
-- 1st example
INSERT INTO hudi_tbl
SELECT * FROM hudi_tbl WHERE ...

-- 2nd example
UPDATE hudi_tbl t
SET somecol = somecol + 100
WHERE EXISTS (
  SELECT 1
  FROM hudi_tbl s
  WHERE s.id = t.id
    AND s.anothercol > 100
);
```
it looks like it is not possible to combine a V1 write with a V2 read.

I suppose we should shift the focus to full support of DataSource V2 (read 
and write) without a performance drop, instead of trying to support V1 write 
and V2 read simultaneously.

GitHub link: 
https://github.com/apache/hudi/discussions/13955#discussioncomment-15059073

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]
