malinjawi opened a new pull request, #12062:
URL: https://github.com/apache/gluten/pull/12062

   What changes are proposed in this pull request?
   This PR follows #12024 and adds native support for Delta OPTIMIZE ZORDER 
expression execution in the Velox backend.
   
   The change:
   
   - adds Velox native functions for Delta ZORDER expressions:
     - interleave_bits for InterleaveBits
     - range_partition_id for RangePartitionId / PartitionerExpr
   - converts supported Delta ZORDER expressions in ExpressionConverter
   - allows supported OPTIMIZE ... ZORDER BY commands through 
GlutenOptimisticTransaction
   - keeps unsupported OPTIMIZE variants on the existing Delta command path
   - adds Delta 3.3 and Delta 4.0 coverage for path-based ZORDER and 
partition-predicate ZORDER
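
For readers unfamiliar with Z-ordering, the bit interleaving behind `interleave_bits` can be sketched as below. This is only an illustration, not the Velox function added in this PR. It assumes 32-bit integer inputs whose bits are taken round-robin from most significant to least significant, producing 4 bytes of output per input. The name `interleaveBits` and its signature are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative sketch only (hypothetical helper, not the PR's Velox code).
// Interleaves the bits of the inputs round-robin, most significant bit first.
// The result holds 4 * inputs.size() bytes.
std::vector<uint8_t> interleaveBits(const std::vector<int32_t>& inputs) {
  const std::size_t n = inputs.size();
  std::vector<uint8_t> out(4 * n, 0);
  std::size_t outBit = 0;  // next output bit, counted from the MSB of out[0]
  for (int bit = 31; bit >= 0; --bit) {
    for (std::size_t i = 0; i < n; ++i) {
      const uint32_t v = static_cast<uint32_t>(inputs[i]);
      if ((v >> bit) & 1u) {
        // Set the matching bit in the output byte, MSB-first within the byte.
        out[outBit / 8] |= static_cast<uint8_t>(0x80u >> (outBit % 8));
      }
      ++outBit;
    }
  }
  return out;
}
```

Sorting rows by this byte string tends to keep rows that are close in every ZORDER column close together in the output files, which is the reason for interleaving instead of concatenating the column values.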
   
   Why are the changes needed?
   #12024 enabled offload of the plain OPTIMIZE compaction command. OPTIMIZE 
ZORDER additionally evaluates InterleaveBits and RangePartitionId, which had no 
native coverage, so the supported command could not stay fully native.
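
As a rough illustration of what a range partition id computes, the sketch below maps a value to the index of the range it falls into, given a sorted list of upper bounds. It is a simplification under stated assumptions: the bounds are passed in directly here, whereas in Delta they would come from the partitioner that samples the data, and the name `rangePartitionId` and its signature are hypothetical, not the function added in this PR.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative sketch only (hypothetical helper, not the PR's Velox code).
// Returns the index of the first bound that is >= value, i.e. the number of
// bounds strictly less than value. Values above every bound map to
// sortedBounds.size(). sortedBounds must be in ascending order.
int32_t rangePartitionId(int64_t value,
                         const std::vector<int64_t>& sortedBounds) {
  const auto it =
      std::lower_bound(sortedBounds.begin(), sortedBounds.end(), value);
  return static_cast<int32_t>(it - sortedBounds.begin());
}
```

With bounds {10, 20, 30}, values up to 10 map to id 0, values in (10, 20] to id 1, values in (20, 30] to id 2, and anything larger to id 3. Replacing each ZORDER column with such a small id gives the interleaving step inputs of comparable scale.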
   
   Does this PR introduce any user-facing change?
   No public API change. It extends native Delta OPTIMIZE ZORDER support in the 
Velox backend.
   
   How was this patch tested?
   Built and ran locally:
   
   - Spark 3.5 test-compile with delta profile
   - Spark 4.0 test-compile with delta profile
   - C++ Velox backend build
   - focused Spark 3.5 DeltaNativeWriteSuite ZORDER tests
   - focused Spark 4.0 DeltaNativeWriteSuite ZORDER tests
   
   Performance
   I ran a targeted local benchmark for OPTIMIZE ZORDER on Spark 3.5. Workload: 
2,000,000 rows, 14 columns, 128 input files, 1 warmup, 3 measured runs. The 
benchmark measures only:
   
   OPTIMIZE delta.`path` ZORDER BY (z1, z2)
   
   Table setup time is excluded.
   
   Compared with native Delta write disabled:
   
   | mode | avg ms | median ms | files removed | files added |
   |---|---:|---:|---:|---:|
   | native ZORDER | 5519.0 | 5361.2 | 128 | 1 |
   | fallback ZORDER | 7459.8 | 7116.3 | 128 | 1 |
   
   Compared with local vanilla Spark/Delta using spark.gluten.enabled=false:
   
   | mode | avg ms | median ms | files removed | files added |
   |---|---:|---:|---:|---:|
   | native ZORDER | 5762.7 | 5756.6 | 128 | 1 |
   | vanilla Spark/Delta | 5293.5 | 5320.6 | 128 | 1 |
   
   The patch extends Gluten's native ZORDER coverage and is about 26% faster on 
average than the existing Gluten fallback path on this workload (5519.0 ms vs 
7459.8 ms). It does not yet beat local vanilla Spark/Delta, which is about 9% 
faster on average (5293.5 ms vs 5762.7 ms); the remaining overhead is likely in 
Delta command planning/log/listing/commit work plus Gluten planning/listener 
and small terminal job overhead.
   
   Related issue: #10215
   Tracked by #12025
   
   Was this patch authored or co-authored using generative AI tooling?
   Generated-by: IBM BOB

