Norio Akagi created SPARK-57125:
-----------------------------------

             Summary: LimitPushDown should fold literal Limit+Offset sum so 
plan stays planable without ConstantFolding
                 Key: SPARK-57125
                 URL: https://issues.apache.org/jira/browse/SPARK-57125
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.1.2, 3.5.8
            Reporter: Norio Akagi
             Fix For: 5.0.0


h3. Description:

{{LimitPushDown}} in {{Optimizer.scala}} rewrites
{{LocalLimit(le, Offset(oe, child))}} to
{{Offset(oe, LocalLimit(Add(le, oe), child))}}. The {{Add(le, oe)}} is left
unfolded and only becomes a {{Literal}} if a later {{ConstantFolding}} run
folds it. If {{ConstantFolding}} is excluded via
{{spark.sql.optimizer.excludedRules}}, the resulting
{{LocalLimit(Add(Literal(N), Literal(M)), ...)}} cannot be matched by
{{BasicOperators}} in {{SparkStrategies}} (which only matches
{{LocalLimit(IntegerLiteral, _)}}), causing physical planning to fail with
{noformat}
AssertionError: No plan for LocalLimit (N + M){noformat}
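The rewrite relies on a limit of N applied after an offset of M being the same as an offset of M applied after a limit of N + M. The sketch below checks that identity on a plain Scala {{List}}, which stands in for an ordered child relation. The object and method names are hypothetical and are not part of Spark.

```scala
// Toy model only: a List stands in for the ordered rows of the child plan.
object LimitOffsetEquivalence {
  // Shape of LocalLimit(n, Offset(m, child)): skip m rows, then keep at most n.
  def limitOverOffset[A](rows: List[A], n: Int, m: Int): List[A] =
    rows.drop(m).take(n)

  // Shape of Offset(m, LocalLimit(n + m, child)): keep at most n + m rows, then skip m.
  def offsetOverLimit[A](rows: List[A], n: Int, m: Int): List[A] =
    rows.take(n + m).drop(m)

  def main(args: Array[String]): Unit = {
    val rows = (1 to 10).toList
    // Compare both shapes for small limits and offsets, including ones past the end.
    for (n <- 0 to 12; m <- 0 to 12)
      assert(limitOverOffset(rows, n, m) == offsetOverLimit(rows, n, m))
    println(offsetOverLimit(rows, 1, 2)) // List(3)
  }
}
```

Because the pushed-down limit is always the sum N + M, the planner needs that sum as a single literal, which is why the unfolded {{Add}} matters.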
h4. Reproduction (Scala):
{noformat}
import org.apache.spark.sql.SparkSession

// Exclude ConstantFolding so the Add produced by LimitPushDown is never folded.
val spark = SparkSession.builder().master("local[2]")
  .config("spark.sql.optimizer.excludedRules",
    "org.apache.spark.sql.catalyst.optimizer.ConstantFolding")
  .getOrCreate()
spark.sql("CREATE TEMP VIEW dept AS SELECT * FROM VALUES " +
  "(10,'d1'),(20,'d2'),(30,'d3') AS dept(id,name)")
spark.sql("CREATE TEMP VIEW emp AS SELECT * FROM VALUES (1,10) AS emp(id, dept_id)")
// LIMIT 1 OFFSET 2 inside the subquery yields LocalLimit(1, Offset(2, ...)).
spark.sql("""
  SELECT * FROM emp
  WHERE EXISTS (SELECT name FROM dept WHERE id > 10 LIMIT 1 OFFSET 2)
""").show(){noformat}
Fails with
{noformat}
[INTERNAL_ERROR] Caused by: AssertionError: assertion failed: No plan for LocalLimit (1 + 2){noformat}
Root cause, in {{Optimizer.scala}}:
{noformat}
case LocalLimit(le, Offset(oe, grandChild)) =>
  Offset(oe, LocalLimit(Add(le, oe), grandChild)){noformat}
The output of this rule can only be planned after {{ConstantFolding}} has folded the {{Add}}.
h4. Fix:

Fold the sum eagerly when both operands are integer literals (the realistic
case from {{LIMIT N OFFSET M}}):
{noformat}
case LocalLimit(le, Offset(oe, grandChild)) =>
  val mergedLimit = (le, oe) match {
    case (IntegerLiteral(l), IntegerLiteral(o)) => Literal(l + o, IntegerType)
    case _ => Add(le, oe)
  }
  Offset(oe, LocalLimit(mergedLimit, grandChild))
{noformat}
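To see the effect of the eager fold without a Spark build, here is a minimal self-contained sketch. It uses small stand-in case classes ({{Lit}}, {{Attr}}, {{Relation}}, and simplified {{Add}}, {{LocalLimit}}, {{Offset}}) that only mimic the shape of the Catalyst classes. They are assumptions for illustration, not the Spark types, and {{isPlanable}} is a hypothetical stand-in for the literal-only match in the planner.

```scala
// Stand-ins for Catalyst expressions; NOT the real Spark classes.
sealed trait Expr
final case class Lit(value: Int) extends Expr
final case class Attr(name: String) extends Expr // any non-literal expression
final case class Add(left: Expr, right: Expr) extends Expr

// Stand-ins for Catalyst logical plan nodes.
sealed trait Plan
final case class Relation(name: String) extends Plan
final case class LocalLimit(limit: Expr, child: Plan) extends Plan
final case class Offset(offset: Expr, child: Plan) extends Plan

object FoldedLimitPushDown {
  // Fold eagerly only when both operands are literals; otherwise keep the Add.
  def mergedLimit(le: Expr, oe: Expr): Expr = (le, oe) match {
    case (Lit(l), Lit(o)) => Lit(l + o)
    case _                => Add(le, oe)
  }

  // Same shape as the proposed rule: push the limit below the offset.
  def rewrite(plan: Plan): Plan = plan match {
    case LocalLimit(le, Offset(oe, grandChild)) =>
      Offset(oe, LocalLimit(mergedLimit(le, oe), grandChild))
    case other => other
  }

  // Hypothetical planner check: limits and offsets must already be literals.
  def isPlanable(plan: Plan): Boolean = plan match {
    case LocalLimit(Lit(_), child) => isPlanable(child)
    case Offset(Lit(_), child)     => isPlanable(child)
    case Relation(_)               => true
    case _                         => false
  }
}
```

With these stand-ins, {{rewrite(LocalLimit(Lit(1), Offset(Lit(2), Relation("dept"))))}} yields {{Offset(Lit(2), LocalLimit(Lit(3), Relation("dept")))}}, which passes the literal-only check with no separate folding step. A non-literal operand still produces an {{Add}}, matching the fallback branch of the proposed fix.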
h4. Impact:
 - Default config: no behavior change ({{ConstantFolding}} still folds the {{Add}}, as it does today)
 - With {{ConstantFolding}} excluded: queries with {{LIMIT N OFFSET M}} no longer crash at physical planning

Related: SPARK-39057 introduced this rule.


