[ 
https://issues.apache.org/jira/browse/SPARK-56681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Tenedorio resolved SPARK-56681.
--------------------------------------
    Fix Version/s: 5.0.0
       Resolution: Fixed

Issue resolved by pull request 55647
[https://github.com/apache/spark/pull/55647]

> PATH cleanup
> ------------
>
>                 Key: SPARK-56681
>                 URL: https://issues.apache.org/jira/browse/SPARK-56681
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 4.2.0
>            Reporter: Serge Rielau
>            Assignee: Serge Rielau
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 5.0.0
>
>
> Umbrella for SQL Language follow-ups identified  Each item below is
> independently actionable; #1, #4, #5, #6, #7, #11 are 1-line / 1-block
> fixes; #2 and #3 are the substantive follow-up work the SPARK-56605
> docstring already promised.
> Source review: 
> [https://github.com/databricks-eng/runtime/pull/215112#issuecomment-4353729745]
> (See the comment on this issue for the full enumerated checklist; each
> item can be broken out into its own linked subtask.)
> h2. Critical
> h3. 1. {{FunctionResolution.resolveProcedure}} is dead code
> File: 
> {{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionResolution.scala}}
> The PR adds a ~70-line {{def resolveProcedure(unresolved: 
> UnresolvedProcedure): LogicalPlan}} that has no callers in 
> {{{}apache/spark{}}}. Procedure resolution is still routed through the 
> existing {{Analyzer.ResolveProcedures}} rule into a different code path. 
> Downstream forks have to either wire it or delete it on each cherry-pick.
> *Action:* either replace the existing call site to use the new method (and 
> delete the older path) or revert the hunk. Decide as part of the wiring PR 
> that the SPARK-56605 docstring promises.
> h2. High
> h3. 2. Frozen view / SQL-function PATH wiring is unfinished
> The PR docstring states:
> Frozen path analysis for views/SQL functions (using stored path during 
> analysis) comes in a follow-up PR. This PR only wires the resolution engine 
> to use the live session path.\{quote}
>  
> The follow-up has not landed. Concretely:
>  * {{AnalysisContext.resolutionPathEntries}} is declared but never populated 
> from view metadata.
>  * {{CatalogManager.serializePathEntries}} / {{pathEntriesForPersistence}} 
> are declared but never written to view / SQL function properties.
>  * {{AnalysisContext.relationResolutionPathCache}} is unused on this branch.
> *Action:* finish the wiring. Capture path entries at CREATE VIEW / CREATE 
> FUNCTION time, persist into catalog properties, deserialize on view body / 
> SQL function body re-analysis, and populate 
> {{AnalysisContext.resolutionPathEntries}} accordingly.
> h3. 3. {{AnalysisContext.resolutionPathEntries}} is a new threadlocal
> The single-pass resolver project explicitly forbids new threadlocal state. 
> The frozen view path needs to be plumbed through 
> {{OperatorResolutionContextStack}} (or another single-pass-friendly 
> propagation mechanism) and the field removed from {{{}AnalysisContext{}}}.
> *Action:* before/together with #2, plumb the frozen path through the 
> single-pass operator stack instead of {{{}AnalysisContext{}}}. Audit 
> {{AnalysisContext.withNewAnalysisContext}} and {{AnalysisContext.reset}} to 
> ensure no leakage even before the field is removed.
> h3. 4. {{Analyzer.executeAndCheck}} clobbers outer 
> {{SQLConf.withExistingConf}}
> After the PR, {{Analyzer}} carries a {{sessionConf: Option[SQLConf]}} and 
> {{executeAndCheck}} unconditionally wraps analysis in 
> {{{}SQLConf.withExistingConf(sessionConf){}}}. This clobbers any outer 
> {{withExistingConf}} scope (e.g. a SQL UDF / view body that wired frozen 
> configs). {{executeSameContext}} already short-circuits with {{SQLConf.get ne 
> c => super.execute(plan)}} but {{executeAndCheck}} does not. The two 
> predicates also disagree: {{executeSameContext}} uses {{SQLConf.get}} (falls 
> through to {{{}confGetter{}}}) while a correct fix needs to peek at the 
> threadlocal directly so "no outer scope" and "outer scope happens to match 
> the session conf" are not conflated.
> *Action:* add public {{SQLConf.getExistingConfIfSet: Option[SQLConf]}} 
> (returns {{Some}} only when an outer scope explicitly installed a conf via 
> {{{}withExistingConf{}}}). Extract one {{runWithSessionConf}} helper in 
> {{{}Analyzer{}}}, used by both {{executeAndCheck}} and 
> {{{}executeSameContext{}}}, that:
>  * returns the thunk unchanged if {{sessionConf}} is {{{}None{}}};
>  * returns the thunk unchanged if an outer scope has already installed a 
> different conf via {{withExistingConf}} (use {{{}getExistingConfIfSet{}}});
>  * otherwise wraps in {{{}SQLConf.withExistingConf(sessionConf){}}}.
> Reproducer: a SQL UDF / view body that captures an {{ANSI_ENABLED}} value at 
> CREATE time and is re-analyzed in a session that has flipped {{ANSI_ENABLED}} 
> should still see the captured value during analysis. Today the captured value 
> is overwritten by the analyzer's {{{}sessionConf{}}}.
> h3. 5. {{VariableResolution.allowUnqualifiedSessionTempVariableLookup}} 
> force-loads default catalog
> File: 
> {{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/VariableResolution.scala}}
> The check eagerly evaluates {{{}catalogManager.currentCatalog.name(){}}}, 
> which calls {{Catalogs.load}} on the configured default catalog. If the 
> configured default catalog is not a registered plugin (e.g. tests, customer 
> sessions with a mistyped {{{}spark.sql.defaultCatalog{}}}), this throws 
> {{CATALOG_NOT_FOUND}} mid-analysis on every column-resolution pass that 
> reaches the variable-lookup last-resort path.
> When SQL PATH is not explicitly set, the answer is fully determined by 
> {{SQLConf.isDefinerProcedureExcludeSessionFromSearchPath}} (the default path 
> always contains {{system.session}} otherwise) and never depends on the 
> catalog name.
> *Action:* short-circuit before touching the catalog when PATH is not 
> explicitly set:
> {code:scala}
> private def allowUnqualifiedSessionTempVariableLookup(nameParts: 
> Seq[String]): Boolean = {
>   if (nameParts.length != 1) return true
>   val pathExplicitlySet = conf.pathEnabled && 
> catalogManager.sessionPathEntries.isDefined
>   if (!pathExplicitlySet) {
>     return !conf.isDefinerProcedureExcludeSessionFromSearchPath
>   }
>   catalogManager.sessionScopeUnqualifiedAllowed(
>     catalogManager.currentCatalog.name(),
>     catalogManager.currentNamespace.toSeq)
> }
> {code}
> h3. 6. DROP VARIABLE PATH gate is asymmetric with DECLARE / CREATE VARIABLE
> File: 
> {{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala}}
> The PR added a {{{}system.session{}}}-on-PATH gate to {{DropVariable}} only. 
> {{CreateVariable}} (DECLARE / CREATE) has no such gate, so a user can 
> {{DECLARE OR REPLACE VARIABLE foo}} they cannot drop under a {{SET PATH}} 
> that omits {{{}system.session{}}}. {{SET VAR foo = ...}} (which goes through 
> {{VariableResolution.lookupVariable}} and inherits the same gate) also fails 
> in that case.
> The intended policy is "DDL ignores PATH, DML uses it":
>  * DECLARE / CREATE / DROP VARIABLE always target {{system.session}} directly 
> regardless of PATH (no gate).
>  * SET VAR / {{SELECT @foo}} resolve through PATH and require 
> {{system.session}} on the path (gate stays).
> *Action:* remove the {{if (nameParts.length == 1 && 
> !catalogManager.sessionScopeUnqualifiedAllowed(...)) throw ...}} block from 
> the {{DropVariable}} case in {{{}ResolveCatalogs{}}}. Add a regression test 
> that exercises a {{DECLARE -> SET VAR (qualified) -> DROP}} cycle under 
> {{{}SET PATH = somecat.somesch{}}}.
> h3. 7. {{lookupFunctionType}} exception swallow is too broad
> File: 
> {{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionResolution.scala}}
> The PR uses {{{}catch \{ case NonFatal(_) => false }}} around 
> {{functionExists{}}}. That swallows {{{}PERMISSION_DENIED{}}}, transient 
> catalog errors, and unrelated bugs uniformly – the user sees 
> {{UNRESOLVED_ROUTINE}} with no upstream cause, and the underlying error never 
> surfaces.
> *Action:* mirror the explicit not-found list already used in 
> {{{}resolveFunctionCandidate{}}}:
> {code:scala}
> catch {
>   case _: NoSuchFunctionException
>      | _: NoSuchDatabaseException
>      | _: NoSuchNamespaceException
>      | _: CatalogNotFoundException =>
>     false
>   case e: AnalysisException if e.getCondition == "FORBIDDEN_OPERATION" =>
>     false
> }
> {code}
> Other exceptions should propagate.
> h3. 8. {{lookupFunctionType}} fan-out includes wasteful system-prefix entries
> For 1-part names the new code iterates {{resolutionCandidates(nameParts)}} 
> and calls {{functionExists}} on every entry. The candidate list includes the 
> {{{}system.session.<name>{}}}, {{{}system.builtin.<name>{}}}, 
> {{system.ai.<name>}} path entries – but routines under those namespaces are 
> already resolved by {{lookupBuiltinOrTempFunction}} / 
> {{lookupBuiltinOrTempTableFunction}} higher up. Calling {{functionExists}} on 
> them wastes catalog calls (UC backends RPC) and is observable under tests 
> that count calls.
> *Action:* filter the candidate list down to non-system entries before the 
> {{functionExists}} walk:
> {code:scala}
> val persistentCandidates = resolutionCandidates(nameParts).filterNot { c =>
>   c.length >= 2 && 
> c.head.equalsIgnoreCase(CatalogManager.SYSTEM_CATALOG_NAME) && {
>     val ns = c(1)
>     ns.equalsIgnoreCase(CatalogManager.SESSION_NAMESPACE)
>       || ns.equalsIgnoreCase(CatalogManager.BUILTIN_NAMESPACE)
>       || ns.equalsIgnoreCase(CatalogManager.AI_NAMESPACE)
>   }
> }
> {code}
> h2. Medium
> h3. 9. Three near-duplicate path-resolution helpers should be lifted to 
> {{CatalogManager}}
> {{RelationResolution.relationResolutionEntries}} and 
> {{FunctionResolution.sqlResolutionPathEntriesForAnalysis}} each implement 
> "view-aware ordered SQL path entries". They will drift. Procedure resolution 
> in particular doesn't consult {{{}AnalysisContext.resolutionPathEntries{}}}, 
> so when #2 lands, {{CALL}} inside a view body will silently use the live 
> session path.
> *Action:* add one helper on {{{}CatalogManager{}}}:
> {code:scala}
> def resolutionPathEntriesForAnalysis(
>     pinnedEntries: Option[Seq[Seq[String]]],
>     viewCatalogAndNamespace: Seq[String]): Seq[Seq[String]]
> {code}
> that picks pinned entries when set (and PATH is enabled) or falls back to 
> {{sqlResolutionPathEntries}} with view-aware path/expand args. Route 
> relation, function, and procedure resolution all through it.
> h3. 10. Tests for the new error paths and gates
> The PR added at least three new behaviour gates with no direct test coverage 
> in {{{}SetPathSuite{}}}:
>  * DECLARE / SET VAR / DROP VARIABLE cycle under {{SET PATH = 
> somecat.somesch}} (positive case for DDL, negative case for unqualified DML 
> lookup).
>  * DESCRIBE TABLE error message format under the new {{QueryLike}} / 
> {{UnresolvedTableOrViewSearchPathMode}} error path – verify the message lists 
> the full PATH.
>  * Unqualified variable resolution failure when {{system.session}} is not on 
> PATH (e.g. PATH set, {{SELECT @x}} must fail; {{SELECT system.session.x}} 
> must succeed).
> *Action:* add {{{}intercept[AnalysisException]{}}}-based tests for the above 
> to {{{}SetPathSuite{}}}.
> h3. 11. {{ProtoToParsedPlanTestSuite.analyzerIsolationConf}} is a bare 
> {{SQLConf}}
> File: 
> {{sql/connect/server/src/test/scala/org/apache/spark/sql/connect/ProtoToParsedPlanTestSuite.scala}}
> The PR introduced {{analyzerIsolationConf = new SQLConf()}} to insulate the 
> suite from session SET PATH. That bare {{SQLConf}} discards every override 
> the test session's {{sparkConf}} sets ({{{}ANSI_ENABLED{}}}, 
> {{{}ALWAYS_INLINE_COMMON_EXPR{}}}, {{{}USE_COMMON_EXPR_ID_FOR_ALIAS{}}}, ...) 
> and exposes the {{SQLConf}} factory defaults. Each lost override is silently 
> a coverage gap.
> *Action:* clone the shared session's conf instead, only overriding the knobs 
> that genuinely need to differ:
> {code:scala}
> private lazy val analyzerIsolationConf: SQLConf = {
>   val c = spark.sessionState.conf.clone()
>   c.setConf(SQLConf.PATH_ENABLED, false)
>   c
> }
> {code}
> This automatically picks up the test session's overrides and only the genuine 
> isolation knob remains explicit.
> ALSO:
> Confirmed — {{ResolveSetVariable.scala:105}} hardcodes {{Seq("SYSTEM", 
> "SESSION")}} as the search path it passes to {{{}unresolvedVariableError{}}}, 
> regardless of the actual SQL PATH. So the error message lies about what was 
> searched. That's the same family of bug as Vladimir's #6 (DESCRIBE error 
> format) — the error path doesn't get reformatted under the new {{QueryLike}} 
> mode either.
> I'll add it to SPARK-56681 as a separate follow-up item and commit the test 
> as-is (the hardcoded message means my {{checkError}} matches today, with the 
> understanding that the OSS fix will require a corresponding {{parameters = 
> Map(... "searchPath" -> "<actual path>")}} update here).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to