gengliangwang opened a new pull request, #55934:
URL: https://github.com/apache/spark/pull/55934

   **Title**: [SPARK-56909][SQL] Refactor Cast to int/long codegen under ANSI mode
   
   **Base**: apache/spark master
   **Head**: gengliangwang:SPARK-56909-cast-int-long
   
   ---
   
   ### What changes were proposed in this pull request?
   
   Introduce `CastUtils.java` and use it from `Cast.scala` to collapse the multi-line ANSI overflow-check codegen for casts that target `int` and `long` into one-line static-method calls. Source and target `DataType` constants used in the overflow error message live as `private static final` fields on the helper class, so the happy path performs no per-row `references[]` lookups.
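   To make the size difference concrete, the sketch below builds the approximate per-call-site Java text for an ANSI `long -> int` cast in both styles. It is illustrative only, not Spark code: the variable names, the `overflowError(...)` placeholder for Spark's real error call, and the unqualified `CastUtils` reference are assumptions, and the source Spark actually emits differs in detail.

   ```java
   // Illustrative sketch only: approximates the shape of the Java source emitted
   // per call site for an ANSI long -> int cast. Names and overflowError(...)
   // are placeholders, not the text Spark actually generates.
   final class EmittedSourceSketch {

       // Before: a 5-line inline overflow block repeated at every call site.
       static String inlineBlock(String input, String result) {
           return String.join("\n",
               "if (" + input + " == (int) " + input + ") {",
               "  " + result + " = (int) " + input + ";",
               "} else {",
               "  throw overflowError(" + input + ");",
               "}");
       }

       // After: a single static call; the range check and error path live in the helper.
       static String helperCall(String input, String result) {
           return result + " = CastUtils.longToIntExact(" + input + ");";
       }

       public static void main(String[] args) {
           String before = inlineBlock("value_1", "result_1");
           String after = helperCall("value_1", "result_1");
           System.out.println(before);
           System.out.println("---");
           System.out.println(after);
           System.out.println(before.split("\n").length + " lines vs " + after.split("\n").length);
       }
   }
   ```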
   
   Helpers added:
   * `longToIntExact(long)` for narrowing `long -> int`.
   * `floatToIntExact(float)`, `doubleToIntExact(double)` for fractional -> int.
   * `floatToLongExact(float)`, `doubleToLongExact(double)` for fractional -> long.
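   The intended semantics of these helpers can be sketched in plain Java. This is a hypothetical stand-in, not the actual `CastUtils.java`: a plain `ArithmeticException` replaces Spark's overflow error, the type names are `private static final` strings mirroring the PR's `DataType` constants, and the fractional checks use a floor/ceil bound test that matches Java's truncation toward zero.

   ```java
   // Hypothetical sketch of the helper semantics; not the real CastUtils.java.
   // ArithmeticException stands in for Spark's overflow error, and the type names
   // are static final Strings standing in for the DataType constants.
   final class CastUtilsSketch {
       private CastUtilsSketch() {}

       private static final String BIGINT = "BIGINT";
       private static final String INT = "INT";
       private static final String FLOAT = "FLOAT";
       private static final String DOUBLE = "DOUBLE";

       // 2^63 as a double. It is not a valid long, so it is an exclusive upper bound.
       private static final double TWO_POW_63 = 0x1p63;

       private static ArithmeticException overflow(Object value, String from, String to) {
           return new ArithmeticException(
               "Casting " + value + " from " + from + " to " + to + " causes overflow");
       }

       // long -> int: exact iff the value survives the narrowing round trip.
       static int longToIntExact(long x) {
           if (x == (int) x) {
               return (int) x;
           }
           throw overflow(x, BIGINT, INT);
       }

       // Fractional -> integral casts truncate toward zero, so floor(x) is checked
       // against the upper bound and ceil(x) against the lower bound.
       // Any comparison with NaN is false, so NaN also throws.
       static int floatToIntExact(float x) {
           if (Math.floor(x) <= Integer.MAX_VALUE && Math.ceil(x) >= Integer.MIN_VALUE) {
               return (int) x;
           }
           throw overflow(x, FLOAT, INT);
       }

       static int doubleToIntExact(double x) {
           if (Math.floor(x) <= Integer.MAX_VALUE && Math.ceil(x) >= Integer.MIN_VALUE) {
               return (int) x;
           }
           throw overflow(x, DOUBLE, INT);
       }

       static long floatToLongExact(float x) {
           if (Math.floor(x) < TWO_POW_63 && Math.ceil(x) >= -TWO_POW_63) {
               return (long) x;
           }
           throw overflow(x, FLOAT, BIGINT);
       }

       static long doubleToLongExact(double x) {
           if (Math.floor(x) < TWO_POW_63 && Math.ceil(x) >= -TWO_POW_63) {
               return (long) x;
           }
           throw overflow(x, DOUBLE, BIGINT);
       }
   }
   ```

   One detail in this sketch: for `long` targets the bound 2^63 is exclusive because `(double) Long.MAX_VALUE` rounds up to exactly 2^63. The real helper may express its bounds differently.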
   
   `Cast.scala` changes:
   * `castIntegralTypeToIntegralTypeExactCode` and `castFractionToIntegralTypeCode` dispatch on the target type: `int` targets (and `long` targets in the fraction case) emit a `CastUtils.<...>Exact` call, while byte/short targets keep the inline body (handled separately in SPARK-56910).
   * The interpreted eval path of `castToInt` gains ANSI `LongType` / `FloatType` / `DoubleType` cases, and that of `castToLong` gains `FloatType` / `DoubleType` cases; both delegate to the new helpers.
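   The target-type dispatch described above can be pictured with a small code-generator sketch. Everything here is hypothetical: the `emit` method, the `Target` enum, and the inline text for byte/short targets are illustrations, not the code in `Cast.scala`; only the `<source>To<Target>Exact` helper names come from this PR.

   ```java
   // Hypothetical sketch of target-type dispatch in a cast code generator.
   // Not Cast.scala: names and the inline byte/short text are placeholders.
   final class FractionCastDispatchSketch {

       enum Target { BYTE, SHORT, INT, LONG }

       // sourceType is "float" or "double", matching the helper name prefixes.
       static String emit(String sourceType, Target target, String input, String result) {
           switch (target) {
               case INT:
                   // One-line call into the shared helper.
                   return result + " = CastUtils." + sourceType + "ToIntExact(" + input + ");";
               case LONG:
                   return result + " = CastUtils." + sourceType + "ToLongExact(" + input + ");";
               default: {
                   // byte/short keep an inline overflow block in this PR.
                   String boxed = target == Target.BYTE ? "Byte" : "Short";
                   String prim = target == Target.BYTE ? "byte" : "short";
                   return String.join("\n",
                       "if (Math.floor(" + input + ") <= " + boxed + ".MAX_VALUE"
                           + " && Math.ceil(" + input + ") >= " + boxed + ".MIN_VALUE) {",
                       "  " + result + " = (" + prim + ") " + input + ";",
                       "} else {",
                       "  throw overflowError(" + input + ");",
                       "}");
               }
           }
       }

       public static void main(String[] args) {
           for (Target t : Target.values()) {
               System.out.println("-- " + t);
               System.out.println(emit("double", t, "value_1", "result_1"));
           }
       }
   }
   ```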
   
   Also adds `ExpressionClassIdentitySuite`, a regression guard checking that `Add`, `Cast`, `EqualTo`, and `And` keep their class identity after canonicalization. This prevents a future PR from accidentally wrapping a hot expression in `RuntimeReplaceable` and silently breaking optimizer pattern matches that look for those classes.
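   Why class identity matters can be shown with a toy model. Everything below is hypothetical: the `Expr` interface, the `Replaced` wrapper (standing in for a `RuntimeReplaceable`-style rewrite), and the rule are illustrations, not Spark's expression hierarchy or the contents of `ExpressionClassIdentitySuite`.

   ```java
   // Toy model, not Spark code: a rule that matches on a concrete class stops
   // firing if canonicalization hands back a different class.
   final class ClassIdentitySketch {

       interface Expr {
           Expr canonicalized();
       }

       // Hypothetical wrapper, standing in for a RuntimeReplaceable-style rewrite.
       static final class Replaced implements Expr {
           final Expr replacement;
           Replaced(Expr replacement) { this.replacement = replacement; }
           public Expr canonicalized() { return this; }
       }

       // regressed = true models a future change that wraps Cast on canonicalization.
       static final class Cast implements Expr {
           final boolean regressed;
           Cast(boolean regressed) { this.regressed = regressed; }
           public Expr canonicalized() {
               return regressed ? new Replaced(new Cast(false)) : new Cast(false);
           }
       }

       // Stand-in for an optimizer rule that pattern-matches on Cast.
       static boolean ruleFires(Expr e) {
           return e instanceof Cast;
       }

       // The kind of check a class-identity regression guard performs.
       static boolean keepsClassIdentity(Expr e) {
           return e.canonicalized().getClass() == e.getClass();
       }

       public static void main(String[] args) {
           Expr good = new Cast(false);
           Expr bad = new Cast(true);
           System.out.println(keepsClassIdentity(good) + " " + ruleFires(good.canonicalized()));
           System.out.println(keepsClassIdentity(bad) + " " + ruleFires(bad.canonicalized()));
       }
   }
   ```

   In the regressed case the rule still matches the original node but silently misses its canonical form, which is the failure mode a class-identity test catches early.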
   
   ### Why are the changes needed?
   
   Part of SPARK-56908 (umbrella). The current ANSI cast codegen emits a 5-line inline overflow block at every call site. Multiplied across the many cast paths in a TPC-DS plan, this adds meaningfully to the generated source size and to Janino compile time, and pushes whole-stage-codegen methods closer to the JVM's 64KB bytecode limit per method.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. Runtime behavior is identical; only the emitted Java source text changes.
   
   ### How was this patch tested?
   
   ```
   build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite *CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite *ExpressionClassIdentitySuite"
   ```
   
   312/312 pass.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Cursor 1.x
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

