cloud-fan commented on PR #55498:
URL: https://github.com/apache/spark/pull/55498#issuecomment-4301820207

   Posting local benchmark evidence that validates the PR's two claims (not 
meant to replace CI-regenerated `*-results.txt`; local hardware differs). 
Hardware: `Intel(R) Xeon(R) Platinum 8375C @ 2.90GHz`, JDK 17.0.14, `local[1]`.
   
   **Setup.** Ran the benchmark on two source trees, both on this branch's code 
**except** for `complexTypeExtractors.scala`:
   - **WITHOUT fix** — `complexTypeExtractors.scala` from `upstream/master` 
(SPARK-55959 as merged).
   - **WITH fix** — this PR (`HEAD`).
   
   Reduced to `size=10000` for the foldable-literal case and `size=1000` for 
the non-foldable map-column case (10K rows per case, 3 iterations). `numRows * 
mapSize` exceeds local heap at larger sizes.
   
   ### Claim 1: map-col regresses WITHOUT the fix
   
   Non-foldable map column (`col('m')` backed by row data), `threshold=0` (most 
aggressive), `size=1000`, `hit=1.0`:
   
   | Case | WITHOUT fix (best, ms) | WITH fix (best, ms) | Delta |
   |---|---:|---:|---:|
   | GetMapValue interpreted | 2144 | 1153 | **WITHOUT 1.86x slower** |
   | GetMapValue codegen    | 260  | 215  | **WITHOUT 1.21x slower** |
   | ElementAt  interpreted | 2240 | 1137 | **WITHOUT 1.97x slower** |
   | ElementAt  codegen     | 260  | 218  | **WITHOUT 1.19x slower** |
   
   Without the foldable gate, every row rebuilds the hash index (the 
reference-identity cache `lastMap ne map` / `$keys != $lastKeyArray` never hits 
for `UnsafeRow.getMap()`, which allocates a fresh `UnsafeMapData` per call). 
With the fix, non-foldable maps short-circuit to linear scan, which is cheaper 
for single-lookup-per-row shapes.
   
   ### Claim 2: map-lit perf is unchanged WITH the fix
   
   Foldable map literal (`typedLit(map)`), `size=10000`, `hit=1.0`:
   
   | Case | WITHOUT fix (best, ms) | WITH fix (best, ms) | Delta |
   |---|---:|---:|---:|
   | GetMapValue interp Linear | 114 | 119 | ~noise |
   | GetMapValue interp Hash   |  38 |  36 | ~noise |
   | GetMapValue codegen Linear|  88 |  89 | 0% |
   | GetMapValue codegen Hash  |  38 |  40 | ~noise (+5%) |
   | ElementAt  interp Linear  | 100 | 122 | ~noise |
   | ElementAt  interp Hash    |  34 |  38 | ~noise |
   | ElementAt  codegen Linear |  85 |  99 | ~noise |
   | ElementAt  codegen Hash   |  36 |  36 | 0% |
   
   For the codegen hash path, the PR's `PrebuiltHashExecutor` now builds the 
`int[] buckets` table on the driver and uses the same inline primitive hash 
codegen SPARK-55959 used (just without the per-row rebuild branch) — so the 
best-case hot loop is unchanged.
   
   (An earlier iteration of this PR simplified the codegen to a `HashMap.get` + 
autobox lookup and showed a ~25-34% slowdown on this case; that was reverted to 
preserve SPARK-55959's performance for foldable maps.)
   
   ### TL;DR
   
   - **Non-foldable maps**: the fix eliminates the per-row hash rebuild 
regression (1.2x-2x speedup depending on path).
   - **Foldable maps**: performance matches SPARK-55959. The optimization is 
preserved in the regime where it was designed to win.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to