malinjawi commented on PR #11900:
URL: https://github.com/apache/gluten/pull/11900#issuecomment-4236562676

   > @malinjawi Thanks.
   > 
   > A broad design question: Can you check if it's worth it to maintain native 
deletion vector reader vs. passing serialized deletion vector from Java to C++? 
In terms of the overall reading performance.
   
   Thanks for the design query @zhztheplayer.
   
    I checked this at two benchmark levels:
   1. a focused DV-path microbenchmark
   2. a full end-to-end Spark benchmark on a Delta table with deletion vectors
   
   I compared the native path vs. the JVM-materialized serialized-DV handoff path, set up as follows:
   
   - native side loads/materializes the deletion vector and applies it in the 
read path
   
   - JVM materializes the deletion vector first using Delta’s JVM path, serializes it as a portable Roaring payload, then passes it to native through Gluten’s existing split metadata path; native deserializes it and applies it in the read path
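
   The handoff path can be sketched roughly as below. This is a minimal illustration, not Gluten code: it uses `java.util.BitSet` as a stand-in for Delta's Roaring-based DV representation, and the class/method names (`DvHandoffSketch`, `serializeDeletionVector`, `countSurvivingRows`) are hypothetical.

```java
import java.util.BitSet;

// Sketch of the JVM-materialized handoff: the JVM builds the deletion
// vector, serializes it to a portable byte payload, and the consumer
// (standing in for the C++ reader) deserializes it and applies it
// while reading. Real code would use Delta's Roaring payload format.
class DvHandoffSketch {

    // JVM side: materialize the DV as a bitmap and serialize it to bytes.
    static byte[] serializeDeletionVector(long[] deletedRows) {
        BitSet dv = new BitSet();
        for (long row : deletedRows) {
            dv.set((int) row); // sketch only; a real DV uses 64-bit positions
        }
        return dv.toByteArray();
    }

    // "Native" side stand-in: deserialize the payload and skip deleted rows.
    static long countSurvivingRows(byte[] payload, long totalRows) {
        BitSet dv = BitSet.valueOf(payload);
        long survivors = 0;
        for (long row = 0; row < totalRows; row++) {
            if (!dv.get((int) row)) {
                survivors++;
            }
        }
        return survivors;
    }
}
```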
   
   
   Here's what I found:
   
   Microbenchmark:
   - When full cost is included, the JVM-materialized handoff path performs 
essentially the same as the native path.
   - Across the inline and stored DV cases I tested, the observed difference 
stayed within normal benchmark noise, approximately `-1%` to `+3%` relative to 
the native path.
   - The only clearly faster result came from an optimistic consumer-only setup 
where native receives serialized DV bytes at zero cost, which does not reflect 
the real end-to-end design.
   
   End-to-end Spark benchmark:
   - 1M rows, 125k deleted rows in the DV
     - native median: `627.81 ms`
     - JVM handoff median: `596.75 ms`
     - JVM handoff was approximately `4.9%` faster in this run
   
   - 4M rows, 500k deleted rows in the DV
     - native median: `1556.15 ms`
     - JVM handoff median: `1541.18 ms`
     - JVM handoff was approximately `1.0%` faster in this run
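
   For clarity, the percentages above are just the relative difference of the medians, `(native − handoff) / native`:

```java
// Relative speedup of the handoff path vs. the native path, in percent.
// Plugging in the medians above reproduces the ~4.9% and ~1.0% figures.
class RelativeSpeedup {
    static double pctFaster(double nativeMedianMs, double handoffMedianMs) {
        return 100.0 * (nativeMedianMs - handoffMedianMs) / nativeMedianMs;
    }
}
```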
   
   
   From my observation, the smaller run shows a modest advantage for the JVM-materialized handoff path, but the larger run narrows that gap to about `1%`. Taken together, these results do not show a strong end-to-end performance advantage for replacing the native DV reader with a JVM-side serialized-DV handoff design. The two approaches are effectively performance-neutral, so the JVM handoff path is not clearly better than native in these tests.
   
   Based on that, I think keeping the native DV reader is still the right 
foundation for this PR.
   
   I think we could instead introduce caching to avoid repeated DV loads, rather than shifting the DV load boundary from native to JVM. Another option worth exploring is a zero-copy JNI byte-transfer design.
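
   The caching idea could look roughly like this. Hypothetical sketch only: `DvCache`, the key scheme, and `loadCount` are illustrative names, not existing Gluten code. The point is to memoize the materialized DV per DV identity (e.g. file path plus DV descriptor) so repeated scans of the same file skip the load/decode step:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Sketch of a per-DV cache: computeIfAbsent guarantees the loader runs at
// most once per key even under concurrent scans of the same file.
class DvCache {
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();
    // Counts actual loads, for illustration/verification only.
    final AtomicInteger loadCount = new AtomicInteger();

    byte[] getOrLoad(String dvKey, Function<String, byte[]> loader) {
        return cache.computeIfAbsent(dvKey, k -> {
            loadCount.incrementAndGet();
            return loader.apply(k);
        });
    }
}
```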
   
   
   What do you think @zhztheplayer ?
   
   cc: @zhouyuan 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

