malinjawi commented on PR #11900:
URL: https://github.com/apache/gluten/pull/11900#issuecomment-4236562676
> @malinjawi Thanks.
>
> A broad design question: Can you check if it's worth it to maintain native
deletion vector reader vs. passing serialized deletion vector from Java to C++?
In terms of the overall reading performance.
Thanks for the design question @zhztheplayer.
I checked this with 2 benchmark layers:
1. a focused DV-path microbenchmark
2. a full end-to-end Spark benchmark on a Delta table with deletion vectors
I compared the native path vs. a JVM-materialized serialized-DV handoff path,
set up as follows:
- native side loads/materializes the deletion vector and applies it in the
read path
- JVM materializes the deletion vector first using Delta’s JVM path,
serializes it as a portable Roaring payload, then passes it to native through
Gluten’s existing split metadata path to decode/deserialize it and apply it in
the read path
Here's what I found:
Microbenchmark:
- When full cost is included, the JVM-materialized handoff path performs
essentially the same as the native path.
- Across the inline and stored DV cases I tested, the observed difference
stayed within normal benchmark noise, approximately `-1%` to `+3%` relative to
the native path.
- The only clearly faster result came from an optimistic consumer-only setup
where native receives serialized DV bytes at zero cost, which does not reflect
the real end-to-end design.
End-to-end Spark benchmark:
- 1M rows, 125k deleted rows in the DV
- native median: `627.81 ms`
- JVM handoff median: `596.75 ms`
- JVM handoff was approximately `4.9%` faster in this run
- 4M rows, 500k deleted rows in the DV
- native median: `1556.15 ms`
- JVM handoff median: `1541.18 ms`
- JVM handoff was approximately `1.0%` faster in this run
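As a sanity check on those percentages, the relative difference of the medians can be recomputed directly from the numbers above:

```java
// Relative speedup of the JVM-handoff median vs. the native median,
// using the medians reported above.
public class DvBenchmarkDelta {
    static double relDiffPct(double nativeMs, double handoffMs) {
        return (nativeMs - handoffMs) / nativeMs * 100.0;
    }

    public static void main(String[] args) {
        System.out.printf("1M rows: %.1f%%%n", relDiffPct(627.81, 596.75));   // ~4.9%
        System.out.printf("4M rows: %.1f%%%n", relDiffPct(1556.15, 1541.18)); // ~1.0%
    }
}
```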
From my observation, the smaller run shows a modest advantage for the
JVM-materialized handoff path, but the larger run narrows that gap to about
`1%`. Taken together, these results do not show a strong end-to-end performance
advantage for replacing the native DV reader with a JVM-side serialized-DV
handoff design; the two approaches are effectively performance-neutral in this
testing, and the JVM handoff path is not clearly better than native.
Based on that, I think keeping the native DV reader is still the right
foundation for this PR.
I think we can introduce caching to avoid repeated DV loads, rather than
shifting the DV load boundary from native to JVM. Another option worth checking
out is a zero-copy JNI byte-transfer design.
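Those two ideas could be sketched together: a cache keyed by file path that stores each serialized DV in a direct `ByteBuffer`, so a native reader could access the bytes via JNI's `GetDirectBufferAddress` without an extra array copy. This is a hypothetical sketch; the class, method, and key names are not Gluten APIs.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: cache each file's serialized DV so repeated splits of
// the same file skip the load, and keep the payload in a direct ByteBuffer so
// native code could read it zero-copy (env->GetDirectBufferAddress on the
// C++ side). All names here are illustrative.
public class DvCacheSketch {
    private final ConcurrentHashMap<String, ByteBuffer> cache = new ConcurrentHashMap<>();

    ByteBuffer getOrLoad(String filePath, Supplier<byte[]> loader) {
        return cache.computeIfAbsent(filePath, p -> {
            byte[] dv = loader.get(); // expensive DV load runs at most once per file
            ByteBuffer buf = ByteBuffer.allocateDirect(dv.length);
            buf.put(dv).flip();
            return buf;
        });
    }

    public static void main(String[] args) {
        DvCacheSketch sketch = new DvCacheSketch();
        int[] loads = {0};
        Supplier<byte[]> loader = () -> { loads[0]++; return new byte[] {1, 2, 3}; };
        sketch.getOrLoad("part-0001.parquet", loader);
        sketch.getOrLoad("part-0001.parquet", loader); // cache hit, loader not re-invoked
        System.out.println(loads[0]); // 1
    }
}
```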
What do you think @zhztheplayer ?
cc: @zhouyuan
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]