viirya commented on PR #56072: URL: https://github.com/apache/spark/pull/56072#issuecomment-4532432830
@LuciferYang Good catch. I dug into the diff, and I think this is GHA runner heterogeneity rather than a real regression. Comparing the result headers:

- The existing master baseline (`-results.txt`, JDK 17) was generated on `AMD EPYC 7763 64-Core Processor` (Zen 3).
- The refreshed JDK 17 run from this PR landed on `AMD EPYC 9V74 80-Core Processor` (Zen 4).

GHA's `ubuntu-latest` runner pool is mixed, and we can't pin a specific CPU. Since the two snapshots were taken on different CPUs (and slightly different kernel revisions), the JDK 17 diff is comparing apples to oranges. On JDK 21 and JDK 25, where the patched and baseline numbers were generated on the same CPU class, the picture is what we expect:

**Group C: Nullable batch with def-levels, JDK 21 (ns/row, lower is better):**

| nullRatio, shape | master baseline (ns/row) | this PR (ns/row) | improvement |
| --- | ---: | ---: | ---: |
| 0.1 random | 8.1 | 7.1 | **+14%** |
| 0.1 clustered | 5.7 | 4.9 | **+14%** |
| 0.3 random | 11.7 | 10.4 | **+11%** |
| 0.3 clustered | 5.8 | 4.8 | **+17%** |
| 0.5 random | 13.1 | 11.4 | **+13%** |
| 0.5 clustered | 5.8 | 4.7 | **+19%** |
| 0.9 random | 7.6 | 6.2 | **+18%** |
| 0.9 clustered | 5.5 | 4.3 | **+22%** |

JDK 25 shows small wins on the clustered runs and noise-level differences on the random ones; JDK 17 looks worse only because of the CPU swap described above. The on-heap path is where most of the work lands (`OnHeapColumnVector` is the default), and JDK 21 gives the cleanest CPU-stable, apples-to-apples picture there.

I can re-trigger JDK 17 a couple of times to try to land on a 7763 runner if you'd like a like-for-like comparison there too, but that's a coin flip given the GHA pool. Let me know whether you'd prefer that or whether the JDK 21 evidence is sufficient.
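The like-for-like check described above can be sketched as a small guard that refuses to compare two benchmark result snapshots unless their processor lines match, and otherwise reports the ns/row improvement. This is a minimal, hypothetical sketch: `processor_name`, `reduction_pct`, and `compare` are illustrative helpers, not part of Spark, and the parser assumes the CPU name sits on the line right after the JVM/OS line that starts with `OpenJDK`. It also assumes the improvement is the time reduction relative to the master baseline; recomputing from the rounded ns/row values shown in the table may differ slightly from figures derived from the raw benchmark output.

```python
from typing import Optional


def processor_name(result_header: str) -> Optional[str]:
    """Return the CPU line from a benchmark result header (hypothetical helper).

    Assumption: the processor name is the line immediately after the
    JVM/OS line, i.e. the first line starting with "OpenJDK".
    """
    lines = [line.strip() for line in result_header.splitlines()]
    for i, line in enumerate(lines[:-1]):
        if line.startswith("OpenJDK"):
            return lines[i + 1]
    return None


def reduction_pct(baseline_ns: float, patched_ns: float) -> float:
    """Percent reduction in ns/row relative to the baseline (lower ns/row is better)."""
    return (baseline_ns - patched_ns) / baseline_ns * 100.0


def compare(baseline_header: str, patched_header: str,
            baseline_ns: float, patched_ns: float) -> float:
    """Compute the improvement only when both runs report the same CPU."""
    base_cpu = processor_name(baseline_header)
    new_cpu = processor_name(patched_header)
    if base_cpu is None or base_cpu != new_cpu:
        raise ValueError(f"not like-for-like: {base_cpu!r} vs {new_cpu!r}")
    return reduction_pct(baseline_ns, patched_ns)


# Hypothetical headers; only the CPU names are taken from the comment above.
zen3 = "OpenJDK 64-Bit Server VM (JDK 17) on Linux\nAMD EPYC 7763 64-Core Processor"
zen4 = "OpenJDK 64-Bit Server VM (JDK 17) on Linux\nAMD EPYC 9V74 80-Core Processor"

if __name__ == "__main__":
    # Same CPU: the comparison is allowed (0.3 clustered row from the JDK 21 table).
    print(round(compare(zen3, zen3, 5.8, 4.8)))
    # Different CPUs: the guard rejects the comparison.
    try:
        compare(zen3, zen4, 5.8, 4.8)
    except ValueError as err:
        print(err)
```

Failing loudly on a CPU mismatch, rather than annotating the result, keeps a Zen 3 vs. Zen 4 diff like the JDK 17 one from being read as a regression by accident.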
