viirya opened a new pull request, #55929:
URL: https://github.com/apache/spark/pull/55929
### What changes were proposed in this pull request?
Fix three sites in `LongToUnsafeRowMap` where a `Long` page-word count is
multiplied by 8 using `Int` arithmetic. At the upper bound of `1 << 30` long
words (the explicit cap in `grow`, i.e. an 8 GiB page), `Int * 8` wraps to 0:
- `LongToUnsafeRowMap.grow`: `val newPage = allocatePage(newNumWords.toInt * 8)`
- `LongToUnsafeRowMap.read` (deserialization on executors), two sites:
  - `page = allocatePage(pageLength * 8)`
  - `cursor = pageLength * 8 + page.getBaseOffset`
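The wrap to 0 follows from ordinary 32-bit two's-complement multiplication on the JVM: `(1 << 30) * 8` is 2^33, and the low 32 bits of 2^33 are all zero. The sketch below is a standalone illustration, not Spark code. It compares the `Int` product with the `Long` product at the `1 << 30` word cap. The object and method names are hypothetical.

```scala
// Standalone illustration of JVM Int vs Long multiplication; not Spark code.
object PageSizeOverflow {
  // Page size cap in 8-byte words, matching the `1 << 30` cap described above.
  val MaxWords: Long = 1L << 30

  // Buggy form: narrow to Int first, multiply in 32-bit arithmetic,
  // then widen the (already wrapped) result to Long.
  def intBytes(numWords: Long): Long = numWords.toInt * 8

  // Fixed form: multiply in 64-bit arithmetic.
  def longBytes(numWords: Long): Long = numWords * 8L

  def main(args: Array[String]): Unit = {
    println(intBytes(MaxWords))  // 0, because 2^33 wraps to 0 in 32 bits
    println(longBytes(MaxWords)) // 8589934592 (8 GiB)
  }
}
```

Widening the result to `Long` afterwards does not help, because the overflow has already happened inside the `Int` multiply.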
When the product overflows to 0, `MemoryConsumer.allocatePage(0)` falls
through to `TaskMemoryManager.allocatePage(Math.max(pageSize, 0))` and
returns a default-sized page instead of the intended one. Subsequent
`append`s keep advancing `cursor` past the new page's end, so
`Platform.copyMemory(... page.getBaseObject, cursor, ...)` writes to and reads
from adjacent native pages. This eventually crashes inside the SIMD-optimized
`StubRoutines::forward_copy_longs` on aarch64 (SEGV_ACCERR when the copy
over-reads into the next mmap page).
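The overflow does not fail the allocation. It turns into an out-of-bounds copy, and the following toy model of the `Math.max(pageSize, required)` fallback shows why. It is a sketch under stated assumptions, not Spark's implementation. `DefaultPageSize` (64 MiB here) is a hypothetical stand-in for the configured `pageSize`, and `Page`, `allocatePage`, and `overruns` are simplified, hypothetical helpers that only track sizes and offsets.

```scala
// Toy model of the allocation fallback; all names and sizes are hypothetical.
object AllocationFallback {
  // Hypothetical default page size; the real value comes from Spark's configuration.
  val DefaultPageSize: Long = 64L * 1024 * 1024

  final case class Page(size: Long)

  // Mirrors the Math.max(pageSize, required) fallback described above:
  // a request of 0 bytes silently yields a default-sized page.
  def allocatePage(required: Long): Page = Page(math.max(DefaultPageSize, required))

  // True if writing `len` bytes at offset `cursor` would run past the page end.
  def overruns(page: Page, cursor: Long, len: Long): Boolean = cursor + len > page.size

  def main(args: Array[String]): Unit = {
    val words = 1L << 30
    val page = allocatePage(words.toInt * 8) // Int product wraps to 0
    println(page.size)                       // 67108864, not 8589934592
    // The caller still believes it has words * 8L bytes available,
    // so an append at the page's end lands outside the allocation.
    println(overruns(page, cursor = page.size, len = 8L)) // true
  }
}
```

The caller and the allocator disagree about the page size without any error, so nothing stops `cursor` from moving into memory that belongs to another allocation.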
We observed the crash on ARM Graviton, and this fix resolves it. The
underlying bug is latent heap corruption on any architecture.
Fix: use `Long` multiplication (`* 8L`) at all three sites so the product
matches the declared `Long` types of the `allocatePage` argument and `cursor`.
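Here is a rough sketch of the three sites after the change. Only the `* 8L` pattern comes from this PR. `MemoryBlock`, `allocatePage`, `baseOffset`, and the `Int` type assumed for `pageLength` are hypothetical, simplified stand-ins that let the snippet compile on its own.

```scala
// Sketch of the three corrected expressions using stand-in types; not Spark's code.
object FixedSites {
  // Hypothetical stand-in for a memory page: only its size and base offset matter here.
  final case class MemoryBlock(size: Long, baseOffset: Long)

  // Hypothetical allocator that just records the requested size.
  def allocatePage(required: Long): MemoryBlock = MemoryBlock(required, baseOffset = 16L)

  // Site 1 (grow): Int * Long promotes to Long, so no 32-bit wrap.
  def grow(newNumWords: Long): MemoryBlock = allocatePage(newNumWords.toInt * 8L)

  // Sites 2 and 3 (read): size the page and place the cursor with Long arithmetic.
  // pageLength is assumed to be an Int here.
  def read(pageLength: Int): (MemoryBlock, Long) = {
    val page = allocatePage(pageLength * 8L)
    val cursor = pageLength * 8L + page.baseOffset
    (page, cursor)
  }
}
```

With `FixedSites.grow(1L << 30)`, the requested size is 8589934592 bytes rather than 0. Changing only the literal from `8` to `8L` is enough, because Scala widens the `Int` operand to `Long` before multiplying.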
### Why are the changes needed?
To fix a JVM SEGV in `LongToUnsafeRowMap` triggered when the page reaches
the 8 GiB cap, observed on ARM Graviton.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing `HashedRelationSuite` tests cover the affected code paths. The fix
was validated on a downstream broadcast-hash-join build on ARM Graviton where
the original SEGV reproduced; with the fix applied, the crash no longer occurs.
The reproducing suite is internal and hard to port to OSS, but the bug is
evident from reading the code.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code