cloud-fan commented on PR #55498:
URL: https://github.com/apache/spark/pull/55498#issuecomment-4301820207
Posting local benchmark evidence that validates the PR's two claims (not
meant to replace CI-regenerated `*-results.txt`; local hardware differs).
Hardware: `Intel(R) Xeon(R) Platinum 8375C @ 2.90GHz`, JDK 17.0.14, `local[1]`.
**Setup.** Ran the benchmark on two source trees, both on this branch's code
**except** for `complexTypeExtractors.scala`:
- **WITHOUT fix** — `complexTypeExtractors.scala` from `upstream/master`
(SPARK-55959 as merged).
- **WITH fix** — this PR (`HEAD`).
Reduced to `size=10000` for the foldable-literal case and `size=1000` for
the non-foldable map-column case (10K rows per case, 3 iterations). `numRows *
mapSize` exceeds local heap at larger sizes.
### Claim 1: map-col regresses WITHOUT the fix
Non-foldable map column (`col('m')` backed by row data), `threshold=0` (most
aggressive), `size=1000`, `hit=1.0`:
| Case | WITHOUT fix (best, ms) | WITH fix (best, ms) | Delta |
|---|---:|---:|---:|
| GetMapValue interpreted | 2144 | 1153 | **WITHOUT 1.86x slower** |
| GetMapValue codegen | 260 | 215 | **WITHOUT 1.21x slower** |
| ElementAt interpreted | 2240 | 1137 | **WITHOUT 1.97x slower** |
| ElementAt codegen | 260 | 218 | **WITHOUT 1.19x slower** |
Without the foldable gate, every row rebuilds the hash index (the
reference-identity cache `lastMap ne map` / `$keys != $lastKeyArray` never hits
for `UnsafeRow.getMap()`, which allocates a fresh `UnsafeMapData` per call).
With the fix, non-foldable maps short-circuit to linear scan, which is cheaper
for single-lookup-per-row shapes.
### Claim 2: map-lit perf is unchanged WITH the fix
Foldable map literal (`typedLit(map)`), `size=10000`, `hit=1.0`:
| Case | WITHOUT fix (best, ms) | WITH fix (best, ms) | Delta |
|---|---:|---:|---:|
| GetMapValue interp Linear | 114 | 119 | ~noise |
| GetMapValue interp Hash | 38 | 36 | ~noise |
| GetMapValue codegen Linear| 88 | 89 | 0% |
| GetMapValue codegen Hash | 38 | 40 | ~noise (+5%) |
| ElementAt interp Linear | 100 | 122 | ~noise |
| ElementAt interp Hash | 34 | 38 | ~noise |
| ElementAt codegen Linear | 85 | 99 | ~noise |
| ElementAt codegen Hash | 36 | 36 | 0% |
For the codegen hash path, the PR's `PrebuiltHashExecutor` now builds the
`int[] buckets` table on the driver and uses the same inline primitive hash
codegen SPARK-55959 used (just without the per-row rebuild branch) — so the
best-case hot loop is unchanged.
(An earlier iteration of this PR simplified the codegen to a `HashMap.get` +
autobox lookup and showed a ~25-34% slowdown on this case; that was reverted to
preserve SPARK-55959's performance for foldable maps.)
### TL;DR
- **Non-foldable maps**: the fix eliminates the per-row hash rebuild
regression (1.2x-2x speedup depending on path).
- **Foldable maps**: performance matches SPARK-55959. The optimization is
preserved in the regime where it was designed to win.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]