srielau commented on code in PR #55717:
URL: https://github.com/apache/spark/pull/55717#discussion_r3196685813


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala:
##########
@@ -113,17 +113,34 @@ class SessionCatalog(
     identifier.copy(funcName = "") == SESSION_NAMESPACE_TEMPLATE
 
   /**
-   * Session function kinds in resolution order for unqualified lookups.
-   * Matches [[SQLConf.sessionFunctionResolutionOrder]]: "first" (session 
first),
-   * "second" (default), "last" (builtin only; session tried after persistent).
+   * Provider for the ordered `system.builtin` / `system.session` entries on 
the effective SQL
+   * PATH (mapped to [[SessionFunctionKind]]). Wired by
+   * [[org.apache.spark.sql.connector.catalog.CatalogManager]] after 
construction so unqualified
+   * function lookups and the security check that blocks temp functions from 
shadowing builtins
+   * read the live PATH (post-`SET PATH`, with [[SQLConf.DEFAULT_PATH]] and
+   * [[SQLConf.defaultPathOrder]] fallbacks already applied) instead of the 
legacy
+   * [[SQLConf.SESSION_FUNCTION_RESOLUTION_ORDER]] proxy.
+   *
+   * The default reproduces the previous "second" behavior 
(builtin-then-session) for tests
+   * that construct a [[SessionCatalog]] directly without going through 
[[CatalogManager]].
    */
-  private def sessionFunctionKindsInResolutionOrder: Seq[SessionFunctionKind] 
= {
-    conf.sessionFunctionResolutionOrder match {
-      case "first" => Seq(Temp, Builtin)
-      case "last" => Seq(Builtin)
-      case _ => Seq(Builtin, Temp) // "second" (default)
-    }
-  }
+  private[sql] var sessionFunctionKindsProvider: () => 
Seq[SessionFunctionKind] =

Review Comment:
   **Deadlock vector + publication concern on this `var`.**
   
   *Deadlock.* `lookupBuiltinOrTempTableFunction` (line 2519, master) holds 
`SessionCatalog.synchronized` and now reaches `sessionFunctionKindsProvider()` 
→ `CatalogManager.leadingSystemFunctionKinds()` → `currentCatalog.name()` / 
`sqlResolutionPathEntries` (`CatalogManager.synchronized`). Lock order: SC → 
CM. Conversely, `CatalogManager.setCurrentNamespace` 
(`CatalogManager.scala:159`) holds CM and calls 
`v1SessionCatalog.setCurrentDatabaseWithNameCheck` → `synchronized { currentDb 
= … }` (SC). Lock order: CM → SC. Classic AB-BA pattern. Pre-PR, 
`sessionFunctionKindsInResolutionOrder` only read 
`conf.sessionFunctionResolutionOrder` and never crossed into CM, so this hazard 
is new.
   
   *Publication.* The `var` is mutated from `CatalogManager`'s constructor body 
without `@volatile`; under JMM, another thread that obtained a `SessionCatalog` 
reference may observe the default callback even after the wiring statement runs.
   
   Options: (a) compute the kinds outside the SC lock — e.g., drop 
`synchronized` on `lookupBuiltinOrTempTableFunction` (it doesn't actually need 
it given underlying registry lookups), or pre-compute kinds before calling 
`lookupFunctionWithShadowing`; (b) replace the `var` with a back-reference to 
`CatalogManager` passed at construction (eliminates publication, makes the 
dependency explicit); (c) cache the kinds list inside `CatalogManager` so the 
callback doesn't acquire CM in the hot path.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala:
##########
@@ -113,17 +113,34 @@ class SessionCatalog(
     identifier.copy(funcName = "") == SESSION_NAMESPACE_TEMPLATE
 
   /**
-   * Session function kinds in resolution order for unqualified lookups.
-   * Matches [[SQLConf.sessionFunctionResolutionOrder]]: "first" (session 
first),
-   * "second" (default), "last" (builtin only; session tried after persistent).
+   * Provider for the ordered `system.builtin` / `system.session` entries on 
the effective SQL
+   * PATH (mapped to [[SessionFunctionKind]]). Wired by
+   * [[org.apache.spark.sql.connector.catalog.CatalogManager]] after 
construction so unqualified
+   * function lookups and the security check that blocks temp functions from 
shadowing builtins
+   * read the live PATH (post-`SET PATH`, with [[SQLConf.DEFAULT_PATH]] and
+   * [[SQLConf.defaultPathOrder]] fallbacks already applied) instead of the 
legacy
+   * [[SQLConf.SESSION_FUNCTION_RESOLUTION_ORDER]] proxy.
+   *
+   * The default reproduces the previous "second" behavior 
(builtin-then-session) for tests
+   * that construct a [[SessionCatalog]] directly without going through 
[[CatalogManager]].
    */
-  private def sessionFunctionKindsInResolutionOrder: Seq[SessionFunctionKind] 
= {
-    conf.sessionFunctionResolutionOrder match {
-      case "first" => Seq(Temp, Builtin)
-      case "last" => Seq(Builtin)
-      case _ => Seq(Builtin, Temp) // "second" (default)
-    }
-  }
+  private[sql] var sessionFunctionKindsProvider: () => 
Seq[SessionFunctionKind] =

Review Comment:
   **Test default silently ignores `SESSION_FUNCTION_RESOLUTION_ORDER`.** 
Pre-PR `sessionFunctionKindsInResolutionOrder` switched on the conf. Tests that 
construct `SessionCatalog` directly without `CatalogManager` (`AnalysisTest`, 
`SessionCatalogSuite`, `LookupFunctionsSuite`, `TableLookupCacheSuite`, 
`PlanResolutionSuite`, …) now get `Seq(Builtin, Temp)` regardless of 
`withSQLConf(SESSION_FUNCTION_RESOLUTION_ORDER → "first"|"last")`. Either 
restore the conf-switch as the default so existing tests keep working 
unchanged, or document the new contract loudly.
   
   ```suggestion
     private[sql] var sessionFunctionKindsProvider: () => 
Seq[SessionFunctionKind] =
       () => conf.sessionFunctionResolutionOrder match {
         case "first" => Seq(Temp, Builtin)
         case "last" => Seq(Builtin)
         case _ => Seq(Builtin, Temp)
       }
   ```



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala:
##########
@@ -52,6 +53,38 @@ class CatalogManager(
   // TODO: create a real SYSTEM catalog to host `TempVariableManager` under 
the SESSION namespace.
   val tempVariableManager: TempVariableManager = new TempVariableManager
 
+  // Wire the path-driven kinds provider into SessionCatalog so the 
function-resolution loop
+  // and the security check that blocks temp functions from shadowing builtins 
read the live
+  // SQL PATH (post-`SET PATH`, with `DEFAULT_PATH` and `defaultPathOrder` 
fallbacks already
+  // applied) instead of the legacy 
[[SQLConf.SESSION_FUNCTION_RESOLUTION_ORDER]] proxy.
+  // Order is every `system.builtin` / `system.session` entry on the PATH in 
path order
+  // (user-catalog segments in between are skipped for this list).
+  v1SessionCatalog.sessionFunctionKindsProvider = () => 
leadingSystemFunctionKinds()
+
+  /**
+   * Walk the effective SQL PATH and collect every system function namespace 
entry
+   * (`system.builtin` / `system.session`) in path order, mapped to
+   * [[SessionCatalog.SessionFunctionKind]].
+   *
+   * User-catalog entries (e.g. `spark_catalog.default`) appear on the PATH 
before
+   * `system.builtin` / `system.session`; they must not hide later system 
entries.
+   * Unqualified builtin/temp resolution follows this ordered kind list via
+   * 
[[org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupFunctionWithShadowing]].
+   *
+   * When no system entries appear on the PATH (user-defined path without 
builtins/session),
+   * falls back to builtin-then-session so bare names still resolve like the 
legacy default.
+   */
+  private def leadingSystemFunctionKinds(): 
Seq[SessionCatalog.SessionFunctionKind] = {
+    val path = sqlResolutionPathEntries(currentCatalog.name(), 
currentNamespace.toSeq)
+    val kinds = path.flatMap { e =>

Review Comment:
   **Behavior change in `"last"` mode that the PR description doesn't call 
out.** `defaultPathOrder("last")` returns `[builtin, ..catalog.., session]`, so 
this `flatMap` extracts `Seq(Builtin, Temp)`. Pre-PR 
`sessionFunctionKindsInResolutionOrder` returned `Seq(Builtin)` for `"last"` — 
session was deliberately excluded from the system-only shortcut.
   
   Concrete consequence: `DESCRIBE FUNCTION foo` (`Analyzer.scala:2258`, via 
`lookupBuiltinOrTempFunction`) in `"last"` mode now returns a temp `foo` even 
when a persistent `foo` exists in the user catalog. Previously the shortcut 
returned None and the rule fell through to persistent lookup. Same change 
affects `Analyzer.LookupFunctions.quickCheck` (line 2186) and 
`HigherOrderFunctionResolver` (line 109).
   
   Either preserve `Seq(Builtin)` for `"last"` mode in the path-derived 
computation (e.g., only collect kinds *up to the first user-catalog entry*, so 
`[builtin, ..catalog.., session]` yields `Seq(Builtin)`), or document the 
change and add an explicit test (`DESCRIBE FUNCTION foo` with both temp and 
persistent in `"last"` mode).



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala:
##########
@@ -138,8 +171,39 @@ class CatalogManager(
 
   private var _sessionPath: Option[Seq[SessionPathEntry]] = None
 
-  /** Returns the raw stored session path entries, or None if no path is set. 
*/
-  def sessionPathEntries: Option[Seq[SessionPathEntry]] = synchronized { 
_sessionPath }
+  /**
+   * Returns the effective session path entries: the explicit `SET PATH` value 
if stored,
+   * else the parsed [[SQLConf.DEFAULT_PATH]] conf if non-empty (mirroring how
+   * [[currentCatalog]] falls back to [[SQLConf.DEFAULT_CATALOG]]). Returns 
`None` when
+   * [[SQLConf.PATH_ENABLED]] is false or both sources are empty.
+   */
+  def sessionPathEntries: Option[Seq[SessionPathEntry]] = synchronized {
+    if (!conf.pathEnabled) None
+    else _sessionPath.orElse(workspaceDefaultPathEntries)
+  }
+
+  /** Raw `_sessionPath` (post-`SET PATH`), without the 
[[SQLConf.DEFAULT_PATH]] fallback. */
+  def storedSessionPathEntries: Option[Seq[SessionPathEntry]] = synchronized { 
_sessionPath }
+
+  /**
+   * Parsed [[SQLConf.DEFAULT_PATH]] value, or `None` when the conf is empty / 
blank. Reuses
+   * the SET PATH grammar via [[CatalystSqlParser.parseSqlPathElements]] so 
the workspace
+   * default accepts the same syntax as `SET PATH = ...`.
+   *
+   * An inner `DEFAULT_PATH` token inside the conf value resolves to the 
spark-builtin
+   * default ordering rather than recursing back into this method (cycle 
break).
+   */
+  def workspaceDefaultPathEntries: Option[Seq[SessionPathEntry]] = {

Review Comment:
   **Naming: `workspace*`.** "Workspace" isn't a Spark/OSS concept; the conf is 
`spark.sql.defaultPath`, not `spark.sql.workspaceDefaultPath`. Suggest 
`confDefaultPathEntries` / `isConfDefaultExpansion` (or 
`defaultPathConfEntries`) — applies here, on 
`_sessionPath.orElse(workspaceDefaultPathEntries)` (line 182), on the parameter 
`isWorkspaceDefaultExpansion` in `PathElement.expand`, and inside 
`PathElement.scala`'s Scaladoc.
   
   Also: this method doesn't apply the duplicate-entry rejection that 
`SetPathCommand.run` does (`SetPathCommand.scala:51-56`), so `DEFAULT_PATH = 
"system.builtin, system.builtin"` is silently accepted while the equivalent 
`SET PATH` is rejected. Easiest fix: pull the dedup into `PathElement.expand` 
(or a small helper) and run it on both paths.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala:
##########
@@ -52,6 +53,38 @@ class CatalogManager(
   // TODO: create a real SYSTEM catalog to host `TempVariableManager` under 
the SESSION namespace.
   val tempVariableManager: TempVariableManager = new TempVariableManager
 
+  // Wire the path-driven kinds provider into SessionCatalog so the 
function-resolution loop
+  // and the security check that blocks temp functions from shadowing builtins 
read the live
+  // SQL PATH (post-`SET PATH`, with `DEFAULT_PATH` and `defaultPathOrder` 
fallbacks already
+  // applied) instead of the legacy 
[[SQLConf.SESSION_FUNCTION_RESOLUTION_ORDER]] proxy.
+  // Order is every `system.builtin` / `system.session` entry on the PATH in 
path order
+  // (user-catalog segments in between are skipped for this list).
+  v1SessionCatalog.sessionFunctionKindsProvider = () => 
leadingSystemFunctionKinds()
+
+  /**
+   * Walk the effective SQL PATH and collect every system function namespace 
entry
+   * (`system.builtin` / `system.session`) in path order, mapped to
+   * [[SessionCatalog.SessionFunctionKind]].
+   *
+   * User-catalog entries (e.g. `spark_catalog.default`) appear on the PATH 
before
+   * `system.builtin` / `system.session`; they must not hide later system 
entries.
+   * Unqualified builtin/temp resolution follows this ordered kind list via
+   * 
[[org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupFunctionWithShadowing]].
+   *
+   * When no system entries appear on the PATH (user-defined path without 
builtins/session),
+   * falls back to builtin-then-session so bare names still resolve like the 
legacy default.
+   */
+  private def leadingSystemFunctionKinds(): 
Seq[SessionCatalog.SessionFunctionKind] = {

Review Comment:
   **Coordination layer doing strategy work** (open to discussion, not 
blocking). Per Spark's name-resolution layer model, `CatalogManager` 
(Coordination) shouldn't *generate resolution candidates or decide resolution 
order* — that's `FunctionResolution`'s (Strategy) job. 
`leadingSystemFunctionKinds` and `isSessionBeforeBuiltinInPath` derive 
resolution order from the path.
   
   The pre-PR site of this logic was inside `SessionCatalog`, which was also 
wrong by the same model — so this PR moves the violation rather than resolving 
it. Cleanest home would be `FunctionResolution` (it already consumes 
`sqlResolutionPathEntries` via `sqlResolutionPathEntriesForAnalysis`, and the 
lookup shortcuts in `SessionCatalog` would just receive a kinds list as a 
parameter or via a thin helper). Curious if this was considered.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogManager.scala:
##########
@@ -138,8 +171,39 @@ class CatalogManager(
 
   private var _sessionPath: Option[Seq[SessionPathEntry]] = None
 
-  /** Returns the raw stored session path entries, or None if no path is set. 
*/
-  def sessionPathEntries: Option[Seq[SessionPathEntry]] = synchronized { 
_sessionPath }
+  /**
+   * Returns the effective session path entries: the explicit `SET PATH` value 
if stored,
+   * else the parsed [[SQLConf.DEFAULT_PATH]] conf if non-empty (mirroring how
+   * [[currentCatalog]] falls back to [[SQLConf.DEFAULT_CATALOG]]). Returns 
`None` when
+   * [[SQLConf.PATH_ENABLED]] is false or both sources are empty.
+   */
+  def sessionPathEntries: Option[Seq[SessionPathEntry]] = synchronized {
+    if (!conf.pathEnabled) None
+    else _sessionPath.orElse(workspaceDefaultPathEntries)
+  }
+
+  /** Raw `_sessionPath` (post-`SET PATH`), without the 
[[SQLConf.DEFAULT_PATH]] fallback. */
+  def storedSessionPathEntries: Option[Seq[SessionPathEntry]] = synchronized { 
_sessionPath }
+
+  /**
+   * Parsed [[SQLConf.DEFAULT_PATH]] value, or `None` when the conf is empty / 
blank. Reuses
+   * the SET PATH grammar via [[CatalystSqlParser.parseSqlPathElements]] so 
the workspace
+   * default accepts the same syntax as `SET PATH = ...`.
+   *
+   * An inner `DEFAULT_PATH` token inside the conf value resolves to the 
spark-builtin
+   * default ordering rather than recursing back into this method (cycle 
break).
+   */
+  def workspaceDefaultPathEntries: Option[Seq[SessionPathEntry]] = {
+    val confValue = conf.defaultPath
+    if (confValue == null || confValue.trim.isEmpty) {
+      None
+    } else {
+      val elements = CatalystSqlParser.parseSqlPathElements(confValue)

Review Comment:
   **Hot-path re-parse + parsing inside `CatalogManager.synchronized`.** This 
method runs on every `sessionPathEntries` call when `_sessionPath` is None and 
`PATH_ENABLED` is true — every unqualified function/table resolution, every 
`count(*)→count(1)` gate via `isSessionBeforeBuiltinInPath`, every 
`currentPathString`. With `DEFAULT_PATH` set, that's an ANTLR parse + 
`PathElement.expand` per call, all while `sessionPathEntries` (line 180) holds 
the CatalogManager lock.
   
   Cache the parsed result keyed on the conf string — invalidate on string 
change. Even a simple `@volatile var cachedDefaultPath: Option[(String, 
Option[Seq[SessionPathEntry]])]` (read outside the synchronized block, written 
under it) would eliminate the per-call parse cost in the steady state and move 
parsing out of the hot lock.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to