viirya commented on PR #56072:
URL: https://github.com/apache/spark/pull/56072#issuecomment-4532432830

   @LuciferYang Good catch. I dug into the diff and I think this is GHA runner 
heterogeneity rather than a real regression. Comparing the result headers:
   
   - The existing master baseline (`-results.txt`, JDK 17) was generated on 
`AMD EPYC 7763 64-Core Processor` (Zen 3).
   - The refreshed JDK 17 run from this PR landed on `AMD EPYC 9V74 80-Core 
Processor` (Zen 4).
   
   GHA's `ubuntu-latest` runner pool mixes CPU models, and we can't pin a job 
to a specific one. Since the two JDK 17 snapshots were taken on different CPUs 
(and slightly different kernel revisions), the JDK 17 diff is comparing apples 
to oranges.
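   
   For reference, the header comparison can be scripted. This is a minimal Python sketch, assuming each results file carries the CPU model on its own line beginning with `AMD` or `Intel`. The `cpu_models` and `comparable` helpers and the sample header text are hypothetical, not part of Spark's tooling:

```python
import re

# Assumption: the CPU model sits on its own line and starts with a vendor name.
_CPU_LINE = re.compile(r"^[ \t]*((?:AMD|Intel)\b.*\S)[ \t]*$", re.MULTILINE)


def cpu_models(results_text: str) -> set[str]:
    """Return the distinct CPU model lines found in a results file's text."""
    return {m.group(1) for m in _CPU_LINE.finditer(results_text)}


def comparable(baseline_text: str, patched_text: str) -> bool:
    """True only if both files report the same non-empty set of CPU models."""
    a, b = cpu_models(baseline_text), cpu_models(patched_text)
    return bool(a) and a == b


# Illustrative headers: only the CPU names are taken from the comment above.
baseline = "OpenJDK 64-Bit Server VM on Linux\nAMD EPYC 7763 64-Core Processor\n"
patched = "OpenJDK 64-Bit Server VM on Linux\nAMD EPYC 9V74 80-Core Processor\n"
print(comparable(baseline, patched))  # False: the two runs used different CPUs
```

   A diff between two files that fail this check mostly measures the hardware difference, so it should not be read as a regression or a win.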
   
   Looking at JDK 21 / JDK 25 where the patched and baseline numbers were 
generated on the same CPU class, the picture is what we expect:
   
   **Group C — Nullable batch with def-levels, JDK 21 (ns/row, lower is 
better):**
   
   | nullRatio, shape | master baseline (ns/row) | this PR (ns/row) | improvement |
   | --- | ---: | ---: | ---: |
   | 0.1 random    | 8.1  | 7.1  | **+12%** |
   | 0.1 clustered | 5.7  | 4.9  | **+14%** |
   | 0.3 random    | 11.7 | 10.4 | **+11%** |
   | 0.3 clustered | 5.8  | 4.8  | **+17%** |
   | 0.5 random    | 13.1 | 11.4 | **+13%** |
   | 0.5 clustered | 5.8  | 4.7  | **+19%** |
   | 0.9 random    | 7.6  | 6.2  | **+18%** |
   | 0.9 clustered | 5.5  | 4.3  | **+22%** |
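   
   The last column is the relative reduction in ns/row against the baseline. A minimal Python sketch of that arithmetic, using two rows from the table (the `reduction_pct` helper is a hypothetical name used only for illustration):

```python
def reduction_pct(baseline_ns: float, patched_ns: float) -> float:
    """Percentage of per-row time saved relative to the baseline."""
    if baseline_ns <= 0:
        raise ValueError("baseline must be positive")
    return (baseline_ns - patched_ns) / baseline_ns * 100.0


# (baseline, this PR) in ns/row, copied from the JDK 21 table.
rows = {
    "0.5 random": (13.1, 11.4),
    "0.9 clustered": (5.5, 4.3),
}
for label, (base, patched) in rows.items():
    print(f"{label}: {reduction_pct(base, patched):.1f}% less time per row")
```

   Because the inputs are rounded to one decimal place, the recomputed percentages can differ by a point from values derived from the raw benchmark output.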
   
   JDK 25 shows small wins on clustered runs and noise on random; JDK 17 looks 
worse only because of the CPU swap described above. Most of the work lands on 
the on-heap path (`OnHeapColumnVector` is the default), and JDK 21 gives the 
cleanest like-for-like comparison there, since both runs used the same CPU 
class.
   
   I can re-trigger JDK 17 a couple of times to try to land on a 7763 runner if 
you'd like a like-for-like comparison there too, but that's a coin flip given 
the GHA pool. Let me know if you'd prefer that or if the JDK 21 evidence is 
sufficient.
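   
   If re-triggering is the route, a small guard could make a job fail fast when it lands on the wrong CPU instead of running the full benchmark. This is a minimal Python sketch, assuming a Linux runner where `/proc/cpuinfo` exposes a `model name` field. It is not an existing step in Spark's benchmark workflow, and both helper names are hypothetical:

```python
from typing import Optional


def cpu_model(cpuinfo_text: str) -> Optional[str]:
    """Return the first 'model name' value from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            return value.strip()
    return None


def on_expected_cpu(cpuinfo_text: str, expected: str = "EPYC 7763") -> bool:
    """True if the runner's CPU model contains the expected substring."""
    model = cpu_model(cpuinfo_text)
    return model is not None and expected in model


# On a runner this text would come from open("/proc/cpuinfo").read();
# a hard-coded sample keeps the sketch runnable anywhere.
sample = "processor\t: 0\nmodel name\t: AMD EPYC 9V74 80-Core Processor\n"
print(on_expected_cpu(sample))  # False: this runner is not a 7763
```

   A guard like this does not make a matching runner more likely; it only makes a mismatch cheap to detect.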
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

