srielau opened a new pull request, #55977: URL: https://github.com/apache/spark/pull/55977
### What changes were proposed in this pull request? Break the `SessionCatalog`/`CatalogManager` lock-order inversion that can deadlock concurrent `USE SCHEMA` / `USE CATALOG` and unqualified function resolution on the same `SparkSession`. - `CatalogManager.setCurrentNamespace` / `setCurrentCatalog`: snapshot the dispatch decision under the manager's intrinsic lock, run the `v1SessionCatalog` callbacks **outside** the lock, then publish the new state under the lock again. This stops the "manager lock then catalog lock" arm of the cycle. - Add `CatalogManager.sessionFunctionKindsForUnqualifiedResolution()` that snapshots `(currentCatalog, currentNamespace, sessionPath)` in a single critical section. The `v1SessionCatalog.getCurrentDatabase` read needed for the default-namespace fallback is taken **before** the manager lock, so the helper never re-introduces the deadlock cycle while still avoiding torn-state observations under racing path updates. - Route `SessionCatalog.sessionFunctionKindsInResolutionOrder` and `FunctionResolution.isSessionBeforeBuiltinInPath` through that single helper, so the lookup loop and the `session-before-builtin` predicate share one consistent snapshot. - Tighten the doc comments on the affected methods to document the locking contract and prevent future regressions. ### Why are the changes needed? After SPARK-56750 wired `CatalogManager` into `SessionCatalog` as the live source for path-driven session function kinds, two paths form a lock-order inversion: - **Arm 1** (`SessionCatalog.synchronized` -> `CatalogManager.synchronized`): unqualified function resolution evaluating the live PATH reaches into `CatalogManager.sqlResolutionPathEntries` (which synchronizes on the manager) while holding the catalog's intrinsic lock at peer call sites. - **Arm 2** (`CatalogManager.synchronized` -> `SessionCatalog.synchronized`): `setCurrentNamespace` / `setCurrentCatalog` hold the manager's lock and then call back into `v1SessionCatalog.setCurrentDatabase*`, which synchronizes on `SessionCatalog`. Two threads sharing a `SparkSession` -- one running any SQL with an unqualified function reference, the other running `USE SCHEMA` / `USE CATALOG` -- can wedge on each other's intrinsic locks. The hazard is independent of `spark.sql.functionResolution.sessionOrder`: Arm 1 still acquires the manager lock just to read what the order is, and Arm 2 has nothing to do with order at all. See [SPARK-56939](https://issues.apache.org/jira/browse/SPARK-56939) for a standalone repro. ### Does this PR introduce _any_ user-facing change? No. This is a concurrency fix; serial behavior is unchanged. The only observable difference under contention is that the session no longer deadlocks. ### How was this patch tested? - New regression test in `SetPathSuite`, `SPARK-56939: concurrent USE SCHEMA and unqualified function lookups do not deadlock`, that follows the JIRA's exact repro: one thread alternates `USE SCHEMA s1` / `USE SCHEMA s2`, another runs unqualified `count(*)` queries. Without the fix the threads hang on each other's intrinsic locks; with the fix the test completes within the 30s budget. - ``build/sbt 'sql/testOnly *SetPathSuite'`` -- 60/60 pass. - ``build/sbt 'catalyst/testOnly *SessionCatalogSuite *CatalogManagerSuite *FunctionResolution*'`` -- 119/119 pass. - ``build/sbt 'sql/testOnly *SQLFunctionSuite *SQLViewSuite'`` -- 70/70 pass. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Cursor / Claude Opus 4.7 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
