LuciferYang opened a new pull request, #55853:
URL: https://github.com/apache/spark/pull/55853

   ### What changes were proposed in this pull request?
   
   Extend the bulk read+widen pattern introduced in SPARK-56791 to 
`DowncastLongUpdater` (Parquet INT64 columns annotated as DECIMAL(p<=9), read 
into a Spark `DecimalType` backed by a 32-bit int).
   
   A new `readLongsAsInts` default method on `VectorizedValuesReader` does the 
per-row fallback. `VectorizedPlainValuesReader` overrides it to fetch source 
bytes once via `getBuffer(total * 8)` and run a tight in-method conversion 
loop. `DowncastLongUpdater.readValues` becomes a one-line delegation. The 
narrowing is Java's primitive long-to-int cast (`(int) buffer.getLong()`), 
which discards the high 32 bits; this is lossless here because the unscaled 
value of a DECIMAL(p<=9) always lies in `[-999_999_999, 999_999_999]`, well 
within the `int` range.
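   
   The narrowing argument above can be checked in isolation. The following is 
a minimal sketch, not code from this PR: the class `NarrowingSketch` and its 
methods `fitsPrecision9` and `narrow` are hypothetical names used only for 
illustration.

```java
// Hypothetical stand-alone sketch; none of these names exist in Spark.
final class NarrowingSketch {
    // Largest unscaled magnitude representable with decimal precision 9.
    static final long MAX_UNSCALED_P9 = 999_999_999L;

    // True when the unscaled value is within the DECIMAL(p<=9) bound.
    static boolean fitsPrecision9(long unscaled) {
        return unscaled >= -MAX_UNSCALED_P9 && unscaled <= MAX_UNSCALED_P9;
    }

    // Java's primitive narrowing conversion: keeps only the low 32 bits.
    static int narrow(long unscaled) {
        return (int) unscaled;
    }

    public static void main(String[] args) {
        // In-range values survive the cast unchanged.
        System.out.println(narrow(999_999_999L));
        System.out.println(narrow(-999_999_999L));
        // An out-of-range value loses its high 32 bits.
        System.out.println(narrow((1L << 32) + 1));
    }
}
```

   Values inside the precision-9 bound round-trip through the cast, while a 
value such as `2^32 + 1` collapses to `1`, which is why the bound matters.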
   
   ### Why are the changes needed?
   
   `DowncastLongUpdater.readValues` allocates a fresh `ByteBuffer` slice inside 
`getBuffer(8)` for every element on the legacy path, and that allocation 
dominates the loop. Collapsing N allocations into one is the same win 
SPARK-56791 delivered for the INT32 -> Long sibling.
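   
   To illustrate the allocation difference, here is a hedged sketch that 
models the two access patterns over a plain `java.nio.ByteBuffer`. It is not 
the Spark reader: the class `BulkReadSketch`, its `slices` counter, and this 
simplified `getBuffer` are assumptions made for the example, and little-endian 
byte order is assumed as in Parquet's PLAIN encoding.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical model of a values reader; not the Spark implementation.
final class BulkReadSketch {
    private final ByteBuffer in;
    int slices = 0; // number of slice objects created so far

    BulkReadSketch(ByteBuffer source) {
        // Independent position so the caller's buffer is left untouched.
        this.in = source.duplicate();
    }

    // Simplified stand-in for a reader's getBuffer(length): one new slice per call.
    ByteBuffer getBuffer(int length) {
        ByteBuffer slice = in.slice().order(ByteOrder.LITTLE_ENDIAN);
        slice.limit(length);
        in.position(in.position() + length);
        slices++;
        return slice;
    }

    // Legacy-style pattern: one slice per element.
    int[] readPerElement(int total) {
        int[] out = new int[total];
        for (int i = 0; i < total; i++) {
            out[i] = (int) getBuffer(Long.BYTES).getLong();
        }
        return out;
    }

    // Bulk pattern: one slice for the whole batch, then a tight loop.
    int[] readBulk(int total) {
        ByteBuffer buffer = getBuffer(total * Long.BYTES);
        int[] out = new int[total];
        for (int i = 0; i < total; i++) {
            out[i] = (int) buffer.getLong();
        }
        return out;
    }
}
```

   Both methods produce the same `int[]`; in this model the per-element path 
creates `total` slices while the bulk path creates one.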
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   (To be updated after the GHA benchmark and test runs complete.)
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

