rdtr opened a new pull request, #12165:
URL: https://github.com/apache/gluten/pull/12165

   
   ## Summary
   
   Enables `subquery/exists-subquery/exists-orderby-limit.sql` in spark41 SQL 
query tests by re-enabling `ConstantFolding` for just this one file via a 
per-file `--SET spark.sql.optimizer.excludedRules=...` directive. The test was 
previously TODO-commented out because it crashed with an `INTERNAL_ERROR` 
during physical planning.
   
   ## Root cause
   
   `GlutenSQLQueryTestSuite` excludes `ConvertToLocalRelation`, 
`ConstantFolding`,
   and `NullPropagation` from the optimizer by default to force queries through
   Gluten's offload paths. With `ConstantFolding` excluded, Spark's 
`LimitPushDown`
   rule produces an unfolded `Add` expression that `BasicOperators` in
   `SparkStrategies` cannot match, causing physical planning to fail with:
   
   ```
   java.lang.AssertionError: assertion failed: No plan for LocalLimit (1 + 2)
   +- Project [1 AS col#...]
      +- Filter (dept_id#... > 10)
         +- LocalRelation [dept_id#..., dept_name#..., state#...]
   
     at scala.Predef$.assert(Predef.scala:279)
     at 
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
     at 
org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:79)
     ...
     at 
org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan.compileSubquery(...)
   ```
   
   wrapped as `[INTERNAL_ERROR] The Spark SQL phase planning failed with an 
internal error`.
   
   The trigger is the EXISTS+OFFSET pattern (query # 17 in this SQL file):
   ```sql
   SELECT * FROM emp WHERE EXISTS (
     SELECT dept.dept_name FROM dept WHERE dept.dept_id > 10 LIMIT 1 OFFSET 2
   )
   ```
   
   `LimitPushDown` rewrites `LocalLimit(le, Offset(oe, child))` into 
`Offset(oe, LocalLimit(Add(le, oe), child))` and relies on `ConstantFolding` to 
subsequently fold `Add(Literal(1), Literal(2))` to `Literal(3)` so that 
`BasicOperators` (which only matches `LocalLimit(IntegerLiteral, _)`) can 
produce a physical plan.
   
   ## Fix
   
   Add a per-file `--SET` directive that overrides the default exclusion to 
only exclude `ConvertToLocalRelation`, re-enabling `ConstantFolding` (and 
`NullPropagation`, which gets re-enabled because the test framework's `--SET` 
parser splits values by comma — see the note below). This keeps Gluten's 
offload paths exercised while allowing the test to plan.
   
   The upstream Spark fix is tracked as **SPARK-57125** (PR
   [apache/spark#56180](https://github.com/apache/spark/pull/56180)), which 
makes
   `LimitPushDown` produce a folded literal directly so the rule no longer 
depends
   on `ConstantFolding`. Once that lands and Gluten picks up the Spark version,
   the `--SET` directive in this file can be removed.
   
   ### Test framework limitation
   
   The Gluten/Spark SQL test framework's `--SET` parser at
   `SQLQueryTestHelper.scala:476-481` splits values by comma at the top level,
   which means multi-rule values like
   `excludedRules=Rule1,Rule2` can't be specified in a single `--SET`
   (`StringIndexOutOfBoundsException`). For now, accepting `NullPropagation`
   being re-enabled is fine for this test. Filed as a separate follow-up.
   
   ## Verification
   
   - ✅ Reproduced the `AssertionError: No plan for LocalLimit (1 + 2)` by
     enabling the test without the `--SET` directive on the original
     `GlutenSQLQueryTestSuite` (not a diagnostic subclass).
   - ✅ With the `--SET` directive applied, the test passes.
   - ✅ Spark unit test added in SPARK-57125 / apache/spark#56180 fails on
     master and passes with the upstream fix.
   
   ## Regarding Spark 4.0
   
   The same SQL test file (with the same EXISTS+OFFSET queries) is enabled and
   appears to pass in `gluten-ut/spark40`. I checked:
   
   - Same SQL input file content
   - Identical golden file (both versions expect successful results for the
     OFFSET queries)
   - Same `LimitPushDown` rule in Spark 4.0.0 source (verified via the GitHub
     v4.0.0 tag)
   - Same `BasicOperators` `IntegerLiteral`-only matchers in Spark 4.0.0 source
   - Same rule exclusions (`ConvertToLocalRelation`, `ConstantFolding`,
     `NullPropagation`) in `gluten-ut/spark40/.../GlutenSQLQueryTestSuite.scala`
   - `gluten-ut/spark40` is enabled in `velox_backend_x86.yml` and the
     `spark-test-spark40-slow` job runs `GlutenSQLQueryTestSuite` via the
     `ExtendedSQLTest` tag — and passes
   
   I couldn't identify what makes Spark 4.0.0 + Gluten avoid hitting this code 
path while Spark 4.1.1 + Gluten triggers it. The bug ingredients look 
identical. If a reviewer with more context on the spark40 setup can shed light, 
that would be appreciated — but it doesn't block this PR. I am happy to build 
and test locally with 4.0 if necessary.
   
   ## Test plan
   
   - [ ] CI: `spark-test-spark41-slow` runs `GlutenSQLQueryTestSuite` with
     `ExtendedSQLTest` tag, which exercises this file.
   - [ ] Verified locally in IntelliJ by running `GlutenSQLQueryTestSuite`
     filtered to `subquery/exists-subquery/exists-orderby-limit.sql`.
   
   Related: GLUTEN-11916, 
[#12146](https://github.com/apache/incubator-gluten/pull/12146)
   (first batch of Spark 4.1 TODO test fixes).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to