malinjawi opened a new pull request, #12062:
URL: https://github.com/apache/gluten/pull/12062
What changes are proposed in this pull request?
This PR follows #12024 and adds native support for Delta OPTIMIZE ZORDER
expression execution in the Velox backend.
The change:
- adds Velox native functions for Delta ZORDER expressions:
  - interleave_bits for InterleaveBits
  - range_partition_id for RangePartitionId / PartitionerExpr
- converts supported Delta ZORDER expressions in ExpressionConverter
- allows supported OPTIMIZE ... ZORDER BY commands through
GlutenOptimisticTransaction
- keeps unsupported OPTIMIZE variants on the existing Delta command path
- adds Delta 3.3 and Delta 4.0 coverage for path-based ZORDER and
partition-predicate ZORDER
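
As a rough illustration of what the two expressions compute together, here is a minimal Python sketch of building a Z-order key. It is not the Velox implementation from this PR. It assumes that range_partition_id maps a value to the index of its range bucket given sorted upper bounds, and that interleave_bits takes 32-bit integers and emits their bits round-robin, most significant bit first. The helper `zorder_key` and the sample bounds are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def range_partition_id(value: int, bounds: Sequence[int]) -> int:
    """Return the index of the range bucket for `value`.

    Assumption: `bounds` holds sorted upper bounds, and a value equal to a
    bound stays in the lower bucket.
    """
    return bisect_left(bounds, value)


def interleave_bits(ids: Sequence[int]) -> bytes:
    """Interleave the 32 bits of each input, MSB first, round-robin.

    Assumption: inputs fit in 32 bits; the result has 4 * len(ids) bytes.
    """
    out = bytearray(4 * len(ids))
    pos = 0  # index of the next output bit, counted from the MSB of out[0]
    for bit in range(31, -1, -1):
        for value in ids:
            if (value >> bit) & 1:
                out[pos // 8] |= 0x80 >> (pos % 8)
            pos += 1
    return bytes(out)


def zorder_key(row: Sequence[int], bounds_per_col: Sequence[Sequence[int]]) -> bytes:
    """Hypothetical helper: bucket each column, then interleave the bucket ids."""
    ids = [range_partition_id(v, b) for v, b in zip(row, bounds_per_col)]
    return interleave_bits(ids)


if __name__ == "__main__":
    bounds = [[10, 20, 30], [100, 200, 300]]  # illustrative bounds only
    rows = [(25, 50), (5, 250), (15, 150), (35, 350)]
    # Sorting by the key places rows with nearby bucket ids next to each other.
    for row in sorted(rows, key=lambda r: zorder_key(r, bounds)):
        print(row, zorder_key(row, bounds).hex())
```

Sorting rows by a key like this is what lets a rewrite cluster rows with similar values in both columns into the same output files.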
Why are the changes needed?
#12024 enabled offload of the plain OPTIMIZE compaction command. OPTIMIZE ZORDER
still needed native expression coverage for InterleaveBits and RangePartitionId
so that the supported command can execute natively.
Does this PR introduce any user-facing change?
No public API change. It extends native Delta OPTIMIZE ZORDER support in the
Velox backend.
How was this patch tested?
Built and ran locally:
- Spark 3.5 test-compile with delta profile
- Spark 4.0 test-compile with delta profile
- C++ Velox backend build
- focused Spark 3.5 DeltaNativeWriteSuite ZORDER tests
- focused Spark 4.0 DeltaNativeWriteSuite ZORDER tests
Performance
I ran a targeted local benchmark for OPTIMIZE ZORDER on Spark 3.5. Workload:
2,000,000 rows, 14 columns, 128 input files, 1 warmup, 3 measured runs. The
benchmark measures only:
OPTIMIZE delta.`path` ZORDER BY (z1, z2)
Table setup time is excluded.
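
The measurement scheme described above (1 warmup, 3 measured runs, setup excluded from timing) can be sketched as follows. This is not the benchmark code used for this PR. The function `measure` and its parameters are hypothetical, and it assumes the table is recreated before every run, which the description does not state. In a real run, `action` would issue the OPTIMIZE statement through a Spark session.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure(
    action: Callable[[], None],
    setup: Callable[[], None],
    warmup: int = 1,
    runs: int = 3,
) -> Dict[str, float]:
    """Time `action` over `runs` measured iterations after `warmup` untimed ones.

    `setup` runs before every iteration and is never included in the timing.
    """
    timings_ms: List[float] = []
    for i in range(warmup + runs):
        setup()  # untimed, e.g. recreate the unoptimized table (assumption)
        start = time.perf_counter()
        action()  # timed, e.g. the OPTIMIZE ... ZORDER BY statement
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if i >= warmup:  # discard warmup iterations
            timings_ms.append(elapsed_ms)
    return {
        "avg_ms": statistics.mean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "runs": float(len(timings_ms)),
    }


if __name__ == "__main__":
    # Placeholder workload; replace with real setup and OPTIMIZE calls.
    print(measure(action=lambda: sum(range(100_000)), setup=lambda: None))
```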
Compared with native Delta write disabled:
| mode | avg ms | median ms | files removed | files added |
|---|---:|---:|---:|---:|
| native ZORDER | 5519.0 | 5361.2 | 128 | 1 |
| fallback ZORDER | 7459.8 | 7116.3 | 128 | 1 |
Compared with local vanilla Spark/Delta using spark.gluten.enabled=false:
| mode | avg ms | median ms | files removed | files added |
|---|---:|---:|---:|---:|
| native ZORDER | 5762.7 | 5756.6 | 128 | 1 |
| vanilla Spark/Delta | 5293.5 | 5320.6 | 128 | 1 |
The patch extends native ZORDER coverage in Gluten and is about 26% faster on
average than the existing Gluten fallback path on this workload (5519.0 ms vs
7459.8 ms). It does not yet beat local vanilla Spark/Delta: native ZORDER is
about 9% slower on average (5762.7 ms vs 5293.5 ms). The remaining overhead is
likely in Delta command planning, log, listing, and commit work, plus Gluten
planning and listener overhead and small terminal job overhead.
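
For reference, the relative differences between the averages and medians in the two tables can be derived with a few lines of arithmetic. The snippet uses only the numbers reported above. The helper name `relative_change` is hypothetical, and negative values mean the candidate is faster than the baseline.

```python
def relative_change(candidate_ms: float, baseline_ms: float) -> float:
    """Percent change of candidate relative to baseline (negative = faster)."""
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0


# Native ZORDER vs Gluten fallback (first table).
avg_vs_fallback = relative_change(5519.0, 7459.8)
median_vs_fallback = relative_change(5361.2, 7116.3)

# Native ZORDER vs vanilla Spark/Delta (second table).
avg_vs_vanilla = relative_change(5762.7, 5293.5)
median_vs_vanilla = relative_change(5756.6, 5320.6)

if __name__ == "__main__":
    print(f"native vs fallback: avg {avg_vs_fallback:+.1f}%, median {median_vs_fallback:+.1f}%")
    print(f"native vs vanilla:  avg {avg_vs_vanilla:+.1f}%, median {median_vs_vanilla:+.1f}%")
```

With only three measured runs per mode, these percentages are indicative rather than statistically robust.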
Related issue: #10215
Tracked by #12025
Was this patch authored or co-authored using generative AI tooling?
Generated-by: IBM BOB