srielau commented on code in PR #55977:
URL: https://github.com/apache/spark/pull/55977#discussion_r3272828330


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala:
##########
@@ -132,15 +132,33 @@ class CatalogManager(
     }
   }
 
-  def setCurrentNamespace(namespace: Array[String]): Unit = synchronized {
-    if (isSessionCatalog(currentCatalog) && namespace.length == 1) {
+  def setCurrentNamespace(namespace: Array[String]): Unit = {
+    // SPARK-56939: do NOT hold [[CatalogManager]]'s intrinsic lock across the 
callbacks below.
+    // [[v1SessionCatalog.setCurrentDatabaseWithNameCheck]] briefly 
synchronizes on
+    // [[SessionCatalog]], and concurrent unqualified function resolution 
acquires the
+    // [[SessionCatalog]] lock and then reaches into [[CatalogManager]] via
+    // [[sqlResolutionPathEntries]]; nesting the manager lock outside the 
catalog lock here
+    // would invert that order and deadlock. Snapshot the dispatch decision 
under the lock,
+    // run callbacks outside it, then publish the new namespace under the lock 
again.
+    //
+    // Concurrency trade-off versus the pre-SPARK-56939 atomic version: the 
`isSession`
+    // snapshot can drift if a concurrent [[setCurrentCatalog]] switches to a 
v2 catalog
+    // between this read and the v1 callback below -- the callback would still 
touch
+    // `v1.currentDb` even though the active catalog is no longer the session 
catalog. A
+    // later switch back to the session catalog resets `v1.currentDb` to 
`default` (see
+    // [[setCurrentCatalog]]), so long-term state remains consistent; only the 
intermediate
+    // observation is novel. Acceptable trade-off against the deadlock 
alternative.

Review Comment:
   Good catch -- the prior comment scoped the trade-off to v1-side staleness 
only. Restructured the block in commit f504667f into two labeled cases: (a) the 
v1-side `isSession` drift it already documented, and (b) the sticky CM-side 
publish-overwrite race you describe (concurrent `setCurrentCatalog` completing 
between the v1 callback returning and the publish, leaving `_currentNamespace = 
Some(namespace)` published under a different `_currentCatalogName` than the 
snapshotted one, with no auto-recovery). Framed (b) explicitly as 
last-writer-wins for racing `USE` commands -- the conventional expectation -- 
so it reads as an accepted trade-off rather than a hidden anomaly.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala:
##########
@@ -221,6 +239,15 @@ class CatalogManager(
    * When PATH is enabled and a session path is in effect (stored or via
    * [[SQLConf#DEFAULT_PATH]]), formats the resolved entries. Otherwise falls 
back to the legacy
    * resolutionSearchPath.
+   *
+   * SPARK-56939 note: this is currently the only intentional 
`CatalogManager.synchronized ->
+   * SessionCatalog.synchronized` nest left in this class. The transitive call 
into
+   * [[v1SessionCatalog.getCurrentDatabase]] happens via [[currentNamespace]], 
which fetches
+   * the v1 current database under the CM lock. It is safe today because no 
code path holds
+   * [[SessionCatalog]]'s intrinsic lock while waiting on [[CatalogManager]]'s 
-- the
+   * SPARK-56939 fix removed every such SC->CM ordering. Any future change 
that introduces a
+   * new SC->CM ordering must avoid resurrecting the deadlock by also taking
+   * `currentPathString` (or any other CM->SC nest) into account.

Review Comment:
   Applied your suggested rewording (f504667f) -- inverted the clauses so the 
modal verb attaches unambiguously.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala:
##########
@@ -265,6 +292,48 @@ class CatalogManager(
       currentCatalog, currentNamespace,
       currentCatalog, currentNamespace)
 
+  /**
+   * Snapshot the live PATH-derived [[SessionCatalog.SessionFunctionKind]] 
order used by
+   * unqualified function/table-function resolution.
+   *
+   * The `(currentCatalog, _currentNamespace, sessionPath)` triple is read 
together inside a
+   * single CM critical section so a concurrent `USE` / `SET PATH` cannot 
return a torn
+   * snapshot for those three fields (e.g. catalog from one observation, 
explicit namespace
+   * from another).
+   *
+   * The `v1SessionCatalog.getCurrentDatabase` read needed for the 
default-namespace fallback
+   * is taken OUTSIDE the CM lock and is therefore intentionally racy w.r.t. a 
concurrent
+   * `USE SCHEMA`. That staleness is harmless for this helper's output: this 
method consumes
+   * `effectiveNs` only to expand `CURRENT_SCHEMA` markers in the SQL path, and
+   * [[CatalogManager.systemFunctionKindsFromPath]] only retains literal 
`system.builtin` /
+   * `system.session` entries from the resolved path -- it never inspects any
+   * `(catalog, namespace)` derived from `v1`. So if `v1CurrentDb` lags by one 
`USE SCHEMA`,
+   * a `CURRENT_SCHEMA` entry might briefly resolve to the previous database, 
but the kinds
+   * list (the only thing returned here) is unaffected. Lifting the read 
inside the CM lock
+   * would re-introduce the SPARK-56939 lock-order inversion this helper 
exists to avoid.

Review Comment:
   Applied (f504667f). "Lifting" -> "Moving".



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to