LuciferYang opened a new pull request, #55853: URL: https://github.com/apache/spark/pull/55853

### What changes were proposed in this pull request?

Extend the bulk read+widen pattern introduced in SPARK-56791 to `DowncastLongUpdater` (parquet INT64 + DECIMAL(p<=9) read into a Spark 32-bit `DecimalType`).

A new `readLongsAsInts` default method on `VectorizedValuesReader` does the per-row fallback. `VectorizedPlainValuesReader` overrides it to fetch source bytes once via `getBuffer(total * 8)` and run a tight in-method conversion loop. `DowncastLongUpdater.readValues` becomes a one-line delegation.

The narrowing is Java's primitive long-to-int cast (`(int) buffer.getLong()`), which discards the high 32 bits; this is non-lossy in practice because Parquet's DECIMAL(p<=9) encoding bounds the value range to `[-999_999_999, 999_999_999]`.

### Why are the changes needed?

`DowncastLongUpdater.readValues` allocates a fresh `ByteBuffer` slice inside `getBuffer(8)` for every element on the legacy path, and that allocation dominates the loop. Collapsing N allocations into one is the same win SPARK-56791 delivered for the INT32 -> Long sibling.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

(To be updated after the GHA benchmark and test runs complete.)

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code
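
The difference between the per-element path and the bulk path described above can be sketched with plain `java.nio`. This is a minimal illustration, not the code from the PR: the class `BulkNarrowSketch` and its methods `readPerElement`, `readBulk`, and `encode` are hypothetical names, and `ByteBuffer.slice()` stands in for Spark's `getBuffer`, whose implementation is not shown here. The sketch assumes little-endian 8-byte values, as in Parquet's PLAIN encoding.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical sketch: names and structure are assumptions, not Spark's API.
final class BulkNarrowSketch {

  // Per-element path: creates one 8-byte slice per value, so N values cost N slice objects.
  static void readPerElement(ByteBuffer in, int total, int[] out) {
    for (int i = 0; i < total; i++) {
      ByteBuffer one = in.slice().order(ByteOrder.LITTLE_ENDIAN);
      one.limit(8);
      out[i] = (int) one.getLong();   // narrowing cast keeps the low 32 bits
      in.position(in.position() + 8);
    }
  }

  // Bulk path: creates a single slice covering total * 8 bytes, then converts in a tight loop.
  static void readBulk(ByteBuffer in, int total, int[] out) {
    ByteBuffer all = in.slice().order(ByteOrder.LITTLE_ENDIAN);
    all.limit(total * 8);
    for (int i = 0; i < total; i++) {
      out[i] = (int) all.getLong();   // same narrowing cast, no per-element slice
    }
    in.position(in.position() + total * 8);
  }

  // Test helper: writes the values as consecutive little-endian 8-byte integers.
  static ByteBuffer encode(long[] values) {
    ByteBuffer buf = ByteBuffer.allocate(values.length * 8).order(ByteOrder.LITTLE_ENDIAN);
    for (long v : values) {
      buf.putLong(v);
    }
    buf.flip();
    return buf;
  }

  public static void main(String[] args) {
    // Values stay within the DECIMAL(p<=9) range, so the cast loses nothing.
    long[] values = {0L, 1L, -1L, 999_999_999L, -999_999_999L};
    int[] perElement = new int[values.length];
    int[] bulk = new int[values.length];
    readPerElement(encode(values), values.length, perElement);
    readBulk(encode(values), values.length, bulk);
    System.out.println("paths agree: " + Arrays.equals(perElement, bulk));
  }
}
```

Both methods produce the same `int` values for inputs in the DECIMAL(p<=9) range; only the number of intermediate buffer objects differs. A value outside the 32-bit range would wrap silently under the cast, which is why the bound stated in the description matters.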
