rdtr opened a new pull request, #12165:
URL: https://github.com/apache/gluten/pull/12165
## Summary
Enables `subquery/exists-subquery/exists-orderby-limit.sql` in spark41 SQL
query tests by re-enabling `ConstantFolding` for just this one file via a
per-file `--SET spark.sql.optimizer.excludedRules=...` directive. The test was
previously TODO-commented out because it crashed with an `INTERNAL_ERROR`
during physical planning.
## Root cause
`GlutenSQLQueryTestSuite` excludes `ConvertToLocalRelation`,
`ConstantFolding`,
and `NullPropagation` from the optimizer by default to force queries through
Gluten's offload paths. With `ConstantFolding` excluded, Spark's
`LimitPushDown`
rule produces an unfolded `Add` expression that `BasicOperators` in
`SparkStrategies` cannot match, causing physical planning to fail with:
```
java.lang.AssertionError: assertion failed: No plan for LocalLimit (1 + 2)
+- Project [1 AS col#...]
+- Filter (dept_id#... > 10)
+- LocalRelation [dept_id#..., dept_name#..., state#...]
at scala.Predef$.assert(Predef.scala:279)
at
org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
at
org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:79)
...
at
org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan.compileSubquery(...)
```
wrapped as `[INTERNAL_ERROR] The Spark SQL phase planning failed with an
internal error`.
The trigger is the EXISTS+OFFSET pattern (query # 17 in this SQL file):
```sql
SELECT * FROM emp WHERE EXISTS (
SELECT dept.dept_name FROM dept WHERE dept.dept_id > 10 LIMIT 1 OFFSET 2
)
```
`LimitPushDown` rewrites `LocalLimit(le, Offset(oe, child))` into
`Offset(oe, LocalLimit(Add(le, oe), child))` and relies on `ConstantFolding` to
subsequently fold `Add(Literal(1), Literal(2))` to `Literal(3)` so that
`BasicOperators` (which only matches `LocalLimit(IntegerLiteral, _)`) can
produce a physical plan.
## Fix
Add a per-file `--SET` directive that overrides the default exclusion to
only exclude `ConvertToLocalRelation`, re-enabling `ConstantFolding` (and
`NullPropagation`, which gets re-enabled because the test framework's `--SET`
parser splits values by comma — see the note below). This keeps Gluten's
offload paths exercised while allowing the test to plan.
The upstream Spark fix is tracked as **SPARK-57125** (PR
[apache/spark#56180](https://github.com/apache/spark/pull/56180)), which
makes
`LimitPushDown` produce a folded literal directly so the rule no longer
depends
on `ConstantFolding`. Once that lands and Gluten picks up the Spark version,
the `--SET` directive in this file can be removed.
### Test framework limitation
The Gluten/Spark SQL test framework's `--SET` parser at
`SQLQueryTestHelper.scala:476-481` splits values by comma at the top level,
which means multi-rule values like
`excludedRules=Rule1,Rule2` can't be specified in a single `--SET`
(`StringIndexOutOfBoundsException`). For now, accepting `NullPropagation`
being re-enabled is fine for this test. Filed as a separate follow-up.
## Verification
- ✅ Reproduced the `AssertionError: No plan for LocalLimit (1 + 2)` by
enabling the test without the `--SET` directive on the original
`GlutenSQLQueryTestSuite` (not a diagnostic subclass).
- ✅ With the `--SET` directive applied, the test passes.
- ✅ Spark unit test added in SPARK-57125 / apache/spark#56180 fails on
master and passes with the upstream fix.
## Regarding Spark 4.0
The same SQL test file (with the same EXISTS+OFFSET queries) is enabled and
appears to pass in `gluten-ut/spark40`. I checked:
- Same SQL input file content
- Identical golden file (both versions expect successful results for the
OFFSET queries)
- Same `LimitPushDown` rule in Spark 4.0.0 source (verified via the GitHub
v4.0.0 tag)
- Same `BasicOperators` `IntegerLiteral`-only matchers in Spark 4.0.0 source
- Same rule exclusions (`ConvertToLocalRelation`, `ConstantFolding`,
`NullPropagation`) in `gluten-ut/spark40/.../GlutenSQLQueryTestSuite.scala`
- `gluten-ut/spark40` is enabled in `velox_backend_x86.yml` and the
`spark-test-spark40-slow` job runs `GlutenSQLQueryTestSuite` via the
`ExtendedSQLTest` tag — and passes
I couldn't identify what makes Spark 4.0.0 + Gluten avoid hitting this code
path while Spark 4.1.1 + Gluten triggers it. The bug ingredients look
identical. If a reviewer with more context on the spark40 setup can shed light,
that would be appreciated — but it doesn't block this PR. I am happy to build
and test locally with 4.0 if necessary.
## Test plan
- [ ] CI: `spark-test-spark41-slow` runs `GlutenSQLQueryTestSuite` with
`ExtendedSQLTest` tag, which exercises this file.
- [ ] Verified locally in IntelliJ by running `GlutenSQLQueryTestSuite`
filtered to `subquery/exists-subquery/exists-orderby-limit.sql`.
Related: GLUTEN-11916,
[#12146](https://github.com/apache/incubator-gluten/pull/12146)
(first batch of Spark 4.1 TODO test fixes).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]