[ 
https://issues.apache.org/jira/browse/SPARK-56908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-56908:
-----------------------------------
    Description: 
Whole-stage codegen generates a fresh Java class per stage. Across many 
operators the generated source contains (a) boilerplate that is 
type-independent across stages and can be deduplicated into static Java 
helpers, and (b) branches or variables that are statically dead at codegen time 
but emitted anyway.
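
As a rough illustration of (a) and (b), the toy generator below emits a null guard only when an input can actually be null, and keeps type-independent logic in one shared static helper. This is a minimal sketch under stated assumptions: the class {{CodegenSketch}} and the methods {{anyNull}} and {{genAdd}} are hypothetical names made up for this example and are not part of Spark's codegen API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names exist in Spark.
final class CodegenSketch {

    // (a) Shared static helper: generated stage classes would call this one
    // body instead of each emitting its own copy of the loop.
    static boolean anyNull(boolean[] nullFlags) {
        for (boolean isNull : nullFlags) {
            if (isNull) {
                return true;
            }
        }
        return false;
    }

    // (b) Toy generator for "a + b" over long inputs. Nullability is known
    // at codegen time, so the null branch and its variable are emitted only
    // when at least one input is nullable.
    static String genAdd(String a, boolean aNullable, String b, boolean bNullable) {
        List<String> nullTerms = new ArrayList<>();
        if (aNullable) {
            nullTerms.add(a + "IsNull");
        }
        if (bNullable) {
            nullTerms.add(b + "IsNull");
        }
        if (nullTerms.isEmpty()) {
            // Statically non-null: no guard, no isNull variable.
            return "long result = " + a + " + " + b + ";\n";
        }
        return "boolean resultIsNull = " + String.join(" || ", nullTerms) + ";\n"
            + "long result = -1L;\n"
            + "if (!resultIsNull) {\n"
            + "  result = " + a + " + " + b + ";\n"
            + "}\n";
    }
}
```

With two non-nullable inputs, {{genAdd("x", false, "y", false)}} produces a single assignment; with only {{x}} nullable, the guard tests {{xIsNull}} alone and never mentions {{yIsNull}}.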

These patterns cost us in three places:
- The JVM's 64KB per-method bytecode limit and the per-class constant-pool 
limit; hitting either forces a fallback to interpreted execution on deep 
query plans.
- Janino compile time per stage.
- JIT compile work, since each stage class carries its own copies of the 
method bodies.

This umbrella tracks small, behavior-preserving cleanups across the generated 
Java to address these issues. Each subtask is independently PR-able; behavior 
is preserved end-to-end and verified by the relevant operator's existing test 
suite with {{spark.sql.codegen.wholeStage}} forced both on and off.
        Summary: Reduce generated Java size in whole-stage codegen (was: 
Improve expression codegen for ANSI SQL mode)

> Reduce generated Java size in whole-stage codegen
> -------------------------------------------------
>
>                 Key: SPARK-56908
>                 URL: https://issues.apache.org/jira/browse/SPARK-56908
>             Project: Spark
>          Issue Type: Umbrella
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: Gengliang Wang
>            Priority: Major
>              Labels: pull-request-available


