srielau opened a new pull request, #56038: URL: https://github.com/apache/spark/pull/56038
### What changes were proposed in this pull request? Backports the default SQL `PATH` stabilization stack from `master` to `branch-4.x`. Four commits, applied in dependency order: | # | Upstream commit | PR | JIRA | |---|-----------------|----|------| | 1 | `3a3e4edb02d` | [#55647](https://github.com/apache/spark/pull/55647) | [SPARK-56681][SQL] PATH cleanup | | 2 | `cc16f9166e9` | [#55717](https://github.com/apache/spark/pull/55717) | [SPARK-56750][SQL] default path config | | 3 | `26dd5955522` | [#55866](https://github.com/apache/spark/pull/55866) | [SPARK-56853][SQL][TESTS] Improve PATH Tests | | 4 | `6ecd76f8345` | [#55977](https://github.com/apache/spark/pull/55977) | [SPARK-56939][SQL] Resolve deadlock between USE and function lookup | Net effect on `branch-4.x`: - New session conf `spark.sql.defaultPath` (parallel to `spark.sql.defaultCatalog`), with `SET PATH = DEFAULT_PATH` expanding to its value and an inner `DEFAULT_PATH` token cycle-safely falling back to the spark-builtin default ordering. - `CatalogManager` materializes a single effective `sqlResolutionPathEntries` from stored `SET PATH`, the parsed `DEFAULT_PATH` conf, and the seeded `defaultPathOrder`. `current_path()`, unqualified function resolution, and unqualified relation resolution all read from this single source of truth. - Function-resolution shortcuts (the `COUNT(*) -> COUNT(1)` rewrite gate and the temp-vs-builtin security check) are now driven by the live PATH via `FunctionResolution.isSessionBeforeBuiltinInPath` / `CatalogManager.systemFunctionKindsFromPath`, fixing a semantic gap where `SET PATH = system.session, system.builtin, ...` did not flip those gates unless the legacy `spark.sql.functionResolution.sessionOrder` conf was also flipped. - `SET PATH` rejects static literal-vs-literal duplicates at statement time. Literal-vs-`CURRENT_SCHEMA` collisions are tolerated (state-dependent and harmless under first-match resolution). - Concurrency fix: `CatalogManager.setCurrentNamespace` / `setCurrentCatalog` / `reset` no longer hold the manager's intrinsic lock across the `v1SessionCatalog` callbacks (split-lock pattern). New `CatalogManager.sessionFunctionKindsForUnqualifiedResolution()` is the single snapshot point shared by `SessionCatalog.sessionFunctionKindsInResolutionOrder` and `FunctionResolution.isSessionBeforeBuiltinInPath`, eliminating the `SessionCatalog <-> CatalogManager` lock-order inversion that could deadlock concurrent `USE SCHEMA` / `USE CATALOG` and unqualified function lookups. - Test coverage: Spark Connect E2E (`SqlPathE2ETestSuite`), PySpark (`test_path_*` in `CatalogTestsMixin`), golden SQL (`sql-tests/inputs/sql-path.sql`), V2 catalogs in PATH (`SqlPathV2CatalogSuite`), PATH-disabled persisted-view fallback, `cloneSession()` propagation matrix, `ALTER VIEW ... WITH SCHEMA` frozen-path preservation, and `CatalogManagerSuite` / `SqlPathFormatSuite` unit tests for serialization, `PathElement.validateNoStaticDuplicates`, and display formatting. Two minor conflict resolutions during cherry-pick, both confined to `sql/core/src/test/scala/org/apache/spark/sql/SetPathSuite.scala`: - During #55977: brought in the new SPARK-56939 concurrency test verbatim; the conflict was an empty HEAD side on this branch. - During #55866: kept the SPARK-56939 concurrency test already on this branch (#55866 predates #55977 in time on master, so its incoming side was empty at that hunk). ### Why are the changes needed? The default `PATH` feature plus the deadlock fix have shipped together on master. Backporting all four commits keeps `branch-4.x` consistent with the released semantics: admins get a sessionable default path, the routine-resolution shortcuts agree with `SET PATH`, and the `USE SCHEMA` / `USE CATALOG` deadlock that surfaced after #55717 is fixed. ### Does this PR introduce _any_ user-facing change? Yes, on `branch-4.x` (matches the user-facing surface already on master): - New conf `spark.sql.defaultPath`. - With `spark.sql.path.enabled=true`, sessions without explicit `SET PATH` resolve through the configured default. - `SET PATH = DEFAULT_PATH` and `CURRENT_PATH()` reflect the configured default. - Function-resolution shortcuts (`COUNT(*) -> COUNT(1)` rewrite gate, temp-vs-builtin security check) now follow the effective PATH instead of only `spark.sql.functionResolution.sessionOrder`. - `UNRESOLVED_VARIABLE.searchPath` is rendered as a bracketed list and (for DML lookups) reflects the live SQL path. - `DROP TEMPORARY VARIABLE` no longer raises `UNRESOLVED_VARIABLE` when `system.session` is not on the path (DDL on session variables ignores PATH, matching `DECLARE OR REPLACE VARIABLE`). - `lookupFunctionType` no longer swallows non-`NotFound` catalog errors as `UNRESOLVED_ROUTINE`. - No serial-behavior change from #55977; under contention, the session no longer deadlocks. ### How was this patch tested? Local on the backport branch (`branch-4.x` base): - `build/sbt catalyst/compile sql/compile sql/Test/compile` -- clean, no errors (1m42s). - `build/sbt 'sql/testOnly *SetPathSuite'` -- 60/60 pass (includes the SPARK-56939 concurrent-`USE`/lookup test and the SPARK-56853 PATH-QA additions). - `build/sbt 'catalyst/testOnly *CatalogManagerSuite *FunctionResolution* *SessionCatalogSuite *SqlPathFormatSuite'` -- 126/126 pass. - Broader suites (`*SQLViewSuite`, `*SQLFunctionSuite`, `*RelationQualificationSuite`, `*FunctionQualificationSuite`, `*DataSourceV2FunctionSuite`) are running locally in parallel with CI. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor / Claude Opus 4.7 (backport mechanics: cherry-pick, conflict resolution, PR description). The original product PRs disclose their own AI tooling usage individually. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
