[ 
https://issues.apache.org/jira/browse/SPARK-57125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-57125:
-----------------------------------
    Labels: pull-request-available  (was: )

> LimitPushDown should fold literal Limit+Offset sum so plan stays planable 
> without ConstantFolding
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-57125
>                 URL: https://issues.apache.org/jira/browse/SPARK-57125
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.8, 4.1.2
>            Reporter: Norio Akagi
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 5.0.0
>
>
> h3. Description:
> {{LimitPushDown}} in {{Optimizer.scala}} rewrites {{LocalLimit(le, Offset(oe, 
> child))}} to {{{}Offset(oe, LocalLimit(Add(le, oe), child)){}}}. The 
> {{Add(le, oe)}} is left unfolded, depending on a subsequent ConstantFolding 
> run to become a Literal. If ConstantFolding is excluded via 
> {{{}spark.sql.optimizer.excludedRules{}}}, the resulting 
> {{LocalLimit(Add(Literal(N), Literal(M)), ...)}} cannot be matched by 
> BasicOperators in SparkStrategies (which only matches 
> {{{}LocalLimit(IntegerLiteral, _)){}}}, causing physical planning to fail 
> with 
> {noformat}
> AssertionError: No plan for LocalLimit (N + M){noformat}
> h4. Reproduction (Scala):
> {noformat}
> val spark = SparkSession.builder().master("local[2]")
> .config("spark.sql.optimizer.excludedRules",
> "org.apache.spark.sql.catalyst.optimizer.ConstantFolding")
> .getOrCreate()
> spark.sql("CREATE TEMP VIEW dept AS SELECT * FROM VALUES 
> (10,'d1'),(20,'d2'),(30,'d3') AS dept(id,name)")
> spark.sql("CREATE TEMP VIEW emp AS SELECT * FROM VALUES (1,10) AS emp(id, 
> dept_id)")
> spark.sql("""
> SELECT * FROM emp
> WHERE EXISTS (SELECT name FROM dept WHERE id > 10 LIMIT 1 OFFSET 2)
> """).show(){noformat}
> Fails with
> {noformat}
> [INTERNAL_ERROR] Caused by: AssertionError: assertion failed: No plan for 
> LocalLimit (1 + 2){noformat}
> Root cause: Optimizer.scala:
> {noformat}
> case LocalLimit(le, Offset(oe, grandChild)) =>
> Offset(oe, LocalLimit(Add(le, oe), grandChild)){noformat}
> This rule produces output that requires ConstantFolding to be planable.
> h4. Fix:
> Fold the sum eagerly when both operands are integer literals (the realistic 
> case from LIMIT N OFFSET M):
>  
> {noformat}
> case LocalLimit(le, Offset(oe, grandChild)) =>
> val mergedLimit = (le, oe) match {
>   case (IntegerLiteral(l), IntegerLiteral(o)) => Literal(l + o, IntegerType)
>   case _ => Add(le, oe)
> }
> Offset(oe, LocalLimit(mergedLimit, grandChild))
> {noformat}
> h4. Impact:
>  - Default config: no behavior change ({{{}ConstantFolding{}}} folds the Add 
> as today)
>  - With {{ConstantFolding}} excluded: queries with LIMIT N OFFSET M no longer 
> crash at physical planning
> Related: SPARK-39057 introduced this rule.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to