[ 
https://issues.apache.org/jira/browse/SPARK-56681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Serge Rielau updated SPARK-56681:
---------------------------------
    Description: 
Umbrella for SQL Language follow-ups identified  Each item below is
independently actionable; #1, #4, #5, #6, #7, #11 are 1-line / 1-block
fixes; #2 and #3 are the substantive follow-up work the SPARK-56605
docstring already promised.

Source review: 
[https://github.com/databricks-eng/runtime/pull/215112#issuecomment-4353729745]

(See the comment on this issue for the full enumerated checklist; each
item can be broken out into its own linked subtask.)
h2. Critical
h3. 1. {{FunctionResolution.resolveProcedure}} is dead code

File: 
{{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionResolution.scala}}

The PR adds a ~70-line {{def resolveProcedure(unresolved: UnresolvedProcedure): 
LogicalPlan}} that has no callers in {{{}apache/spark{}}}. Procedure resolution 
is still routed through the existing {{Analyzer.ResolveProcedures}} rule into a 
different code path. Downstream forks have to either wire it or delete it on 
each cherry-pick.

*Action:* either replace the existing call site to use the new method (and 
delete the older path) or revert the hunk. Decide as part of the wiring PR that 
the SPARK-56605 docstring promises.
h2. High
h3. 2. Frozen view / SQL-function PATH wiring is unfinished

The PR docstring states:

Frozen path analysis for views/SQL functions (using stored path during 
analysis) comes in a follow-up PR. This PR only wires the resolution engine to 
use the live session path.\{quote}

 

The follow-up has not landed. Concretely:
 * {{AnalysisContext.resolutionPathEntries}} is declared but never populated 
from view metadata.
 * {{CatalogManager.serializePathEntries}} / {{pathEntriesForPersistence}} are 
declared but never written to view / SQL function properties.
 * {{AnalysisContext.relationResolutionPathCache}} is unused on this branch.

*Action:* finish the wiring. Capture path entries at CREATE VIEW / CREATE 
FUNCTION time, persist into catalog properties, deserialize on view body / SQL 
function body re-analysis, and populate 
{{AnalysisContext.resolutionPathEntries}} accordingly.
h3. 3. {{AnalysisContext.resolutionPathEntries}} is a new threadlocal

The single-pass resolver project explicitly forbids new threadlocal state. The 
frozen view path needs to be plumbed through {{OperatorResolutionContextStack}} 
(or another single-pass-friendly propagation mechanism) and the field removed 
from {{{}AnalysisContext{}}}.

*Action:* before/together with #2, plumb the frozen path through the 
single-pass operator stack instead of {{{}AnalysisContext{}}}. Audit 
{{AnalysisContext.withNewAnalysisContext}} and {{AnalysisContext.reset}} to 
ensure no leakage even before the field is removed.
h3. 4. {{Analyzer.executeAndCheck}} clobbers outer {{SQLConf.withExistingConf}}

After the PR, {{Analyzer}} carries a {{sessionConf: Option[SQLConf]}} and 
{{executeAndCheck}} unconditionally wraps analysis in 
{{{}SQLConf.withExistingConf(sessionConf){}}}. This clobbers any outer 
{{withExistingConf}} scope (e.g. a SQL UDF / view body that wired frozen 
configs). {{executeSameContext}} already short-circuits with {{SQLConf.get ne c 
=> super.execute(plan)}} but {{executeAndCheck}} does not. The two predicates 
also disagree: {{executeSameContext}} uses {{SQLConf.get}} (falls through to 
{{{}confGetter{}}}) while a correct fix needs to peek at the threadlocal 
directly so "no outer scope" and "outer scope happens to match the session 
conf" are not conflated.

*Action:* add public {{SQLConf.getExistingConfIfSet: Option[SQLConf]}} (returns 
{{Some}} only when an outer scope explicitly installed a conf via 
{{{}withExistingConf{}}}). Extract one {{runWithSessionConf}} helper in 
{{{}Analyzer{}}}, used by both {{executeAndCheck}} and 
{{{}executeSameContext{}}}, that:
 * returns the thunk unchanged if {{sessionConf}} is {{{}None{}}};
 * returns the thunk unchanged if an outer scope has already installed a 
different conf via {{withExistingConf}} (use {{{}getExistingConfIfSet{}}});
 * otherwise wraps in {{{}SQLConf.withExistingConf(sessionConf){}}}.

Reproducer: a SQL UDF / view body that captures an {{ANSI_ENABLED}} value at 
CREATE time and is re-analyzed in a session that has flipped {{ANSI_ENABLED}} 
should still see the captured value during analysis. Today the captured value 
is overwritten by the analyzer's {{{}sessionConf{}}}.
h3. 5. {{VariableResolution.allowUnqualifiedSessionTempVariableLookup}} 
force-loads default catalog

File: 
{{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/VariableResolution.scala}}

The check eagerly evaluates {{{}catalogManager.currentCatalog.name(){}}}, which 
calls {{Catalogs.load}} on the configured default catalog. If the configured 
default catalog is not a registered plugin (e.g. tests, customer sessions with 
a mistyped {{{}spark.sql.defaultCatalog{}}}), this throws {{CATALOG_NOT_FOUND}} 
mid-analysis on every column-resolution pass that reaches the variable-lookup 
last-resort path.

When SQL PATH is not explicitly set, the answer is fully determined by 
{{SQLConf.isDefinerProcedureExcludeSessionFromSearchPath}} (the default path 
always contains {{system.session}} otherwise) and never depends on the catalog 
name.

*Action:* short-circuit before touching the catalog when PATH is not explicitly 
set:
{code:scala}
private def allowUnqualifiedSessionTempVariableLookup(nameParts: Seq[String]): 
Boolean = {
  if (nameParts.length != 1) return true
  val pathExplicitlySet = conf.pathEnabled && 
catalogManager.sessionPathEntries.isDefined
  if (!pathExplicitlySet) {
    return !conf.isDefinerProcedureExcludeSessionFromSearchPath
  }
  catalogManager.sessionScopeUnqualifiedAllowed(
    catalogManager.currentCatalog.name(),
    catalogManager.currentNamespace.toSeq)
}
{code}
h3. 6. DROP VARIABLE PATH gate is asymmetric with DECLARE / CREATE VARIABLE

File: 
{{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala}}

The PR added a {{{}system.session{}}}-on-PATH gate to {{DropVariable}} only. 
{{CreateVariable}} (DECLARE / CREATE) has no such gate, so a user can {{DECLARE 
OR REPLACE VARIABLE foo}} they cannot drop under a {{SET PATH}} that omits 
{{{}system.session{}}}. {{SET VAR foo = ...}} (which goes through 
{{VariableResolution.lookupVariable}} and inherits the same gate) also fails in 
that case.

The intended policy is "DDL ignores PATH, DML uses it":
 * DECLARE / CREATE / DROP VARIABLE always target {{system.session}} directly 
regardless of PATH (no gate).
 * SET VAR / {{SELECT @foo}} resolve through PATH and require 
{{system.session}} on the path (gate stays).

*Action:* remove the {{if (nameParts.length == 1 && 
!catalogManager.sessionScopeUnqualifiedAllowed(...)) throw ...}} block from the 
{{DropVariable}} case in {{{}ResolveCatalogs{}}}. Add a regression test that 
exercises a {{DECLARE -> SET VAR (qualified) -> DROP}} cycle under {{{}SET PATH 
= somecat.somesch{}}}.
h3. 7. {{lookupFunctionType}} exception swallow is too broad

File: 
{{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionResolution.scala}}

The PR uses {{{}catch \{ case NonFatal(_) => false }}} around 
{{functionExists{}}}. That swallows {{{}PERMISSION_DENIED{}}}, transient 
catalog errors, and unrelated bugs uniformly – the user sees 
{{UNRESOLVED_ROUTINE}} with no upstream cause, and the underlying error never 
surfaces.

*Action:* mirror the explicit not-found list already used in 
{{{}resolveFunctionCandidate{}}}:
{code:scala}
catch {
  case _: NoSuchFunctionException
     | _: NoSuchDatabaseException
     | _: NoSuchNamespaceException
     | _: CatalogNotFoundException =>
    false
  case e: AnalysisException if e.getCondition == "FORBIDDEN_OPERATION" =>
    false
}
{code}
Other exceptions should propagate.
h3. 8. {{lookupFunctionType}} fan-out includes wasteful system-prefix entries

For 1-part names the new code iterates {{resolutionCandidates(nameParts)}} and 
calls {{functionExists}} on every entry. The candidate list includes the 
{{{}system.session.<name>{}}}, {{{}system.builtin.<name>{}}}, 
{{system.ai.<name>}} path entries – but routines under those namespaces are 
already resolved by {{lookupBuiltinOrTempFunction}} / 
{{lookupBuiltinOrTempTableFunction}} higher up. Calling {{functionExists}} on 
them wastes catalog calls (UC backends RPC) and is observable under tests that 
count calls.

*Action:* filter the candidate list down to non-system entries before the 
{{functionExists}} walk:
{code:scala}
val persistentCandidates = resolutionCandidates(nameParts).filterNot { c =>
  c.length >= 2 && c.head.equalsIgnoreCase(CatalogManager.SYSTEM_CATALOG_NAME) 
&& {
    val ns = c(1)
    ns.equalsIgnoreCase(CatalogManager.SESSION_NAMESPACE)
      || ns.equalsIgnoreCase(CatalogManager.BUILTIN_NAMESPACE)
      || ns.equalsIgnoreCase(CatalogManager.AI_NAMESPACE)
  }
}
{code}
h2. Medium
h3. 9. Three near-duplicate path-resolution helpers should be lifted to 
{{CatalogManager}}

{{RelationResolution.relationResolutionEntries}} and 
{{FunctionResolution.sqlResolutionPathEntriesForAnalysis}} each implement 
"view-aware ordered SQL path entries". They will drift. Procedure resolution in 
particular doesn't consult {{{}AnalysisContext.resolutionPathEntries{}}}, so 
when #2 lands, {{CALL}} inside a view body will silently use the live session 
path.

*Action:* add one helper on {{{}CatalogManager{}}}:
{code:scala}
def resolutionPathEntriesForAnalysis(
    pinnedEntries: Option[Seq[Seq[String]]],
    viewCatalogAndNamespace: Seq[String]): Seq[Seq[String]]
{code}
that picks pinned entries when set (and PATH is enabled) or falls back to 
{{sqlResolutionPathEntries}} with view-aware path/expand args. Route relation, 
function, and procedure resolution all through it.
h3. 10. Tests for the new error paths and gates

The PR added at least three new behaviour gates with no direct test coverage in 
{{{}SetPathSuite{}}}:
 * DECLARE / SET VAR / DROP VARIABLE cycle under {{SET PATH = somecat.somesch}} 
(positive case for DDL, negative case for unqualified DML lookup).
 * DESCRIBE TABLE error message format under the new {{QueryLike}} / 
{{UnresolvedTableOrViewSearchPathMode}} error path – verify the message lists 
the full PATH.
 * Unqualified variable resolution failure when {{system.session}} is not on 
PATH (e.g. PATH set, {{SELECT @x}} must fail; {{SELECT system.session.x}} must 
succeed).

*Action:* add {{{}intercept[AnalysisException]{}}}-based tests for the above to 
{{{}SetPathSuite{}}}.
h3. 11. {{ProtoToParsedPlanTestSuite.analyzerIsolationConf}} is a bare 
{{SQLConf}}

File: 
{{sql/connect/server/src/test/scala/org/apache/spark/sql/connect/ProtoToParsedPlanTestSuite.scala}}

The PR introduced {{analyzerIsolationConf = new SQLConf()}} to insulate the 
suite from session SET PATH. That bare {{SQLConf}} discards every override the 
test session's {{sparkConf}} sets ({{{}ANSI_ENABLED{}}}, 
{{{}ALWAYS_INLINE_COMMON_EXPR{}}}, {{{}USE_COMMON_EXPR_ID_FOR_ALIAS{}}}, ...) 
and exposes the {{SQLConf}} factory defaults. Each lost override is silently a 
coverage gap.

*Action:* clone the shared session's conf instead, only overriding the knobs 
that genuinely need to differ:
{code:scala}
private lazy val analyzerIsolationConf: SQLConf = {
  val c = spark.sessionState.conf.clone()
  c.setConf(SQLConf.PATH_ENABLED, false)
  c
}
{code}
This automatically picks up the test session's overrides and only the genuine 
isolation knob remains explicit.


ALSO:

Confirmed — {{ResolveSetVariable.scala:105}} hardcodes {{Seq("SYSTEM", 
"SESSION")}} as the search path it passes to {{{}unresolvedVariableError{}}}, 
regardless of the actual SQL PATH. So the error message lies about what was 
searched. That's the same family of bug as Vladimir's #6 (DESCRIBE error 
format) — the error path doesn't get reformatted under the new {{QueryLike}} 
mode either.

I'll add it to SPARK-56681 as a separate follow-up item and commit the test 
as-is (the hardcoded message means my {{checkError}} matches today, with the 
understanding that the OSS fix will require a corresponding {{parameters = 
Map(... "searchPath" -> "<actual path>")}} update here).

  was:
Umbrella for SQL Language follow-ups identified  Each item below is
independently actionable; #1, #4, #5, #6, #7, #11 are 1-line / 1-block
fixes; #2 and #3 are the substantive follow-up work the SPARK-56605
docstring already promised.

Source review: 
[https://github.com/databricks-eng/runtime/pull/215112#issuecomment-4353729745]

(See the comment on this issue for the full enumerated checklist; each
item can be broken out into its own linked subtask.)
h2. Critical
h3. 1. {{FunctionResolution.resolveProcedure}} is dead code

File: 
{{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionResolution.scala}}

The PR adds a ~70-line {{def resolveProcedure(unresolved: UnresolvedProcedure): 
LogicalPlan}} that has no callers in {{{}apache/spark{}}}. Procedure resolution 
is still routed through the existing {{Analyzer.ResolveProcedures}} rule into a 
different code path. Downstream forks have to either wire it or delete it on 
each cherry-pick.

*Action:* either replace the existing call site to use the new method (and 
delete the older path) or revert the hunk. Decide as part of the wiring PR that 
the SPARK-56605 docstring promises.
h2. High
h3. 2. Frozen view / SQL-function PATH wiring is unfinished

The PR docstring states:

Frozen path analysis for views/SQL functions (using stored path during 
analysis) comes in a follow-up PR. This PR only wires the resolution engine to 
use the live session path.\{quote}

 

The follow-up has not landed. Concretely:
 * {{AnalysisContext.resolutionPathEntries}} is declared but never populated 
from view metadata.
 * {{CatalogManager.serializePathEntries}} / {{pathEntriesForPersistence}} are 
declared but never written to view / SQL function properties.
 * {{AnalysisContext.relationResolutionPathCache}} is unused on this branch.

*Action:* finish the wiring. Capture path entries at CREATE VIEW / CREATE 
FUNCTION time, persist into catalog properties, deserialize on view body / SQL 
function body re-analysis, and populate 
{{AnalysisContext.resolutionPathEntries}} accordingly.
h3. 3. {{AnalysisContext.resolutionPathEntries}} is a new threadlocal

The single-pass resolver project explicitly forbids new threadlocal state. The 
frozen view path needs to be plumbed through {{OperatorResolutionContextStack}} 
(or another single-pass-friendly propagation mechanism) and the field removed 
from {{{}AnalysisContext{}}}.

*Action:* before/together with #2, plumb the frozen path through the 
single-pass operator stack instead of {{{}AnalysisContext{}}}. Audit 
{{AnalysisContext.withNewAnalysisContext}} and {{AnalysisContext.reset}} to 
ensure no leakage even before the field is removed.
h3. 4. {{Analyzer.executeAndCheck}} clobbers outer {{SQLConf.withExistingConf}}

After the PR, {{Analyzer}} carries a {{sessionConf: Option[SQLConf]}} and 
{{executeAndCheck}} unconditionally wraps analysis in 
{{{}SQLConf.withExistingConf(sessionConf){}}}. This clobbers any outer 
{{withExistingConf}} scope (e.g. a SQL UDF / view body that wired frozen 
configs). {{executeSameContext}} already short-circuits with {{SQLConf.get ne c 
=> super.execute(plan)}} but {{executeAndCheck}} does not. The two predicates 
also disagree: {{executeSameContext}} uses {{SQLConf.get}} (falls through to 
{{{}confGetter{}}}) while a correct fix needs to peek at the threadlocal 
directly so "no outer scope" and "outer scope happens to match the session 
conf" are not conflated.

*Action:* add public {{SQLConf.getExistingConfIfSet: Option[SQLConf]}} (returns 
{{Some}} only when an outer scope explicitly installed a conf via 
{{{}withExistingConf{}}}). Extract one {{runWithSessionConf}} helper in 
{{{}Analyzer{}}}, used by both {{executeAndCheck}} and 
{{{}executeSameContext{}}}, that:
 * returns the thunk unchanged if {{sessionConf}} is {{{}None{}}};
 * returns the thunk unchanged if an outer scope has already installed a 
different conf via {{withExistingConf}} (use {{{}getExistingConfIfSet{}}});
 * otherwise wraps in {{{}SQLConf.withExistingConf(sessionConf){}}}.

Reproducer: a SQL UDF / view body that captures an {{ANSI_ENABLED}} value at 
CREATE time and is re-analyzed in a session that has flipped {{ANSI_ENABLED}} 
should still see the captured value during analysis. Today the captured value 
is overwritten by the analyzer's {{{}sessionConf{}}}.
h3. 5. {{VariableResolution.allowUnqualifiedSessionTempVariableLookup}} 
force-loads default catalog

File: 
{{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/VariableResolution.scala}}

The check eagerly evaluates {{{}catalogManager.currentCatalog.name(){}}}, which 
calls {{Catalogs.load}} on the configured default catalog. If the configured 
default catalog is not a registered plugin (e.g. tests, customer sessions with 
a mistyped {{{}spark.sql.defaultCatalog{}}}), this throws {{CATALOG_NOT_FOUND}} 
mid-analysis on every column-resolution pass that reaches the variable-lookup 
last-resort path.

When SQL PATH is not explicitly set, the answer is fully determined by 
{{SQLConf.isDefinerProcedureExcludeSessionFromSearchPath}} (the default path 
always contains {{system.session}} otherwise) and never depends on the catalog 
name.

*Action:* short-circuit before touching the catalog when PATH is not explicitly 
set:
{code:scala}
private def allowUnqualifiedSessionTempVariableLookup(nameParts: Seq[String]): 
Boolean = {
  if (nameParts.length != 1) return true
  val pathExplicitlySet = conf.pathEnabled && 
catalogManager.sessionPathEntries.isDefined
  if (!pathExplicitlySet) {
    return !conf.isDefinerProcedureExcludeSessionFromSearchPath
  }
  catalogManager.sessionScopeUnqualifiedAllowed(
    catalogManager.currentCatalog.name(),
    catalogManager.currentNamespace.toSeq)
}
{code}
h3. 6. DROP VARIABLE PATH gate is asymmetric with DECLARE / CREATE VARIABLE

File: 
{{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala}}

The PR added a {{{}system.session{}}}-on-PATH gate to {{DropVariable}} only. 
{{CreateVariable}} (DECLARE / CREATE) has no such gate, so a user can {{DECLARE 
OR REPLACE VARIABLE foo}} they cannot drop under a {{SET PATH}} that omits 
{{{}system.session{}}}. {{SET VAR foo = ...}} (which goes through 
{{VariableResolution.lookupVariable}} and inherits the same gate) also fails in 
that case.

The intended policy is "DDL ignores PATH, DML uses it":
 * DECLARE / CREATE / DROP VARIABLE always target {{system.session}} directly 
regardless of PATH (no gate).
 * SET VAR / {{SELECT @foo}} resolve through PATH and require 
{{system.session}} on the path (gate stays).

*Action:* remove the {{if (nameParts.length == 1 && 
!catalogManager.sessionScopeUnqualifiedAllowed(...)) throw ...}} block from the 
{{DropVariable}} case in {{{}ResolveCatalogs{}}}. Add a regression test that 
exercises a {{DECLARE -> SET VAR (qualified) -> DROP}} cycle under {{{}SET PATH 
= somecat.somesch{}}}.
h3. 7. {{lookupFunctionType}} exception swallow is too broad

File: 
{{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionResolution.scala}}

The PR uses {{{}catch \{ case NonFatal(_) => false }}} around 
{{functionExists{}}}. That swallows {{{}PERMISSION_DENIED{}}}, transient 
catalog errors, and unrelated bugs uniformly – the user sees 
{{UNRESOLVED_ROUTINE}} with no upstream cause, and the underlying error never 
surfaces.

*Action:* mirror the explicit not-found list already used in 
{{{}resolveFunctionCandidate{}}}:
{code:scala}
catch {
  case _: NoSuchFunctionException
     | _: NoSuchDatabaseException
     | _: NoSuchNamespaceException
     | _: CatalogNotFoundException =>
    false
  case e: AnalysisException if e.getCondition == "FORBIDDEN_OPERATION" =>
    false
}
{code}
Other exceptions should propagate.
h3. 8. {{lookupFunctionType}} fan-out includes wasteful system-prefix entries

For 1-part names the new code iterates {{resolutionCandidates(nameParts)}} and 
calls {{functionExists}} on every entry. The candidate list includes the 
{{{}system.session.<name>{}}}, {{{}system.builtin.<name>{}}}, 
{{system.ai.<name>}} path entries – but routines under those namespaces are 
already resolved by {{lookupBuiltinOrTempFunction}} / 
{{lookupBuiltinOrTempTableFunction}} higher up. Calling {{functionExists}} on 
them wastes catalog calls (UC backends RPC) and is observable under tests that 
count calls.

*Action:* filter the candidate list down to non-system entries before the 
{{functionExists}} walk:
{code:scala}
val persistentCandidates = resolutionCandidates(nameParts).filterNot { c =>
  c.length >= 2 && c.head.equalsIgnoreCase(CatalogManager.SYSTEM_CATALOG_NAME) 
&& {
    val ns = c(1)
    ns.equalsIgnoreCase(CatalogManager.SESSION_NAMESPACE)
      || ns.equalsIgnoreCase(CatalogManager.BUILTIN_NAMESPACE)
      || ns.equalsIgnoreCase(CatalogManager.AI_NAMESPACE)
  }
}
{code}
h2. Medium
h3. 9. Three near-duplicate path-resolution helpers should be lifted to 
{{CatalogManager}}

{{RelationResolution.relationResolutionEntries}} and 
{{FunctionResolution.sqlResolutionPathEntriesForAnalysis}} each implement 
"view-aware ordered SQL path entries". They will drift. Procedure resolution in 
particular doesn't consult {{{}AnalysisContext.resolutionPathEntries{}}}, so 
when #2 lands, {{CALL}} inside a view body will silently use the live session 
path.

*Action:* add one helper on {{{}CatalogManager{}}}:
{code:scala}
def resolutionPathEntriesForAnalysis(
    pinnedEntries: Option[Seq[Seq[String]]],
    viewCatalogAndNamespace: Seq[String]): Seq[Seq[String]]
{code}
that picks pinned entries when set (and PATH is enabled) or falls back to 
{{sqlResolutionPathEntries}} with view-aware path/expand args. Route relation, 
function, and procedure resolution all through it.
h3. 10. Tests for the new error paths and gates

The PR added at least three new behaviour gates with no direct test coverage in 
{{{}SetPathSuite{}}}:
 * DECLARE / SET VAR / DROP VARIABLE cycle under {{SET PATH = somecat.somesch}} 
(positive case for DDL, negative case for unqualified DML lookup).
 * DESCRIBE TABLE error message format under the new {{QueryLike}} / 
{{UnresolvedTableOrViewSearchPathMode}} error path – verify the message lists 
the full PATH.
 * Unqualified variable resolution failure when {{system.session}} is not on 
PATH (e.g. PATH set, {{SELECT @x}} must fail; {{SELECT system.session.x}} must 
succeed).

*Action:* add {{{}intercept[AnalysisException]{}}}-based tests for the above to 
{{{}SetPathSuite{}}}.
h3. 11. {{ProtoToParsedPlanTestSuite.analyzerIsolationConf}} is a bare 
{{SQLConf}}

File: 
{{sql/connect/server/src/test/scala/org/apache/spark/sql/connect/ProtoToParsedPlanTestSuite.scala}}

The PR introduced {{analyzerIsolationConf = new SQLConf()}} to insulate the 
suite from session SET PATH. That bare {{SQLConf}} discards every override the 
test session's {{sparkConf}} sets ({{{}ANSI_ENABLED{}}}, 
{{{}ALWAYS_INLINE_COMMON_EXPR{}}}, {{{}USE_COMMON_EXPR_ID_FOR_ALIAS{}}}, ...) 
and exposes the {{SQLConf}} factory defaults. Each lost override is silently a 
coverage gap.

*Action:* clone the shared session's conf instead, only overriding the knobs 
that genuinely need to differ:
{code:scala}
private lazy val analyzerIsolationConf: SQLConf = {
  val c = spark.sessionState.conf.clone()
  c.setConf(SQLConf.PATH_ENABLED, false)
  c
}
{code}
This automatically picks up the test session's overrides and only the genuine 
isolation knob remains explicit.


> PATH cleanup
> ------------
>
>                 Key: SPARK-56681
>                 URL: https://issues.apache.org/jira/browse/SPARK-56681
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 4.2.0
>            Reporter: Serge Rielau
>            Priority: Major
>
> Umbrella for SQL Language follow-ups identified  Each item below is
> independently actionable; #1, #4, #5, #6, #7, #11 are 1-line / 1-block
> fixes; #2 and #3 are the substantive follow-up work the SPARK-56605
> docstring already promised.
> Source review: 
> [https://github.com/databricks-eng/runtime/pull/215112#issuecomment-4353729745]
> (See the comment on this issue for the full enumerated checklist; each
> item can be broken out into its own linked subtask.)
> h2. Critical
> h3. 1. {{FunctionResolution.resolveProcedure}} is dead code
> File: 
> {{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionResolution.scala}}
> The PR adds a ~70-line {{def resolveProcedure(unresolved: 
> UnresolvedProcedure): LogicalPlan}} that has no callers in 
> {{{}apache/spark{}}}. Procedure resolution is still routed through the 
> existing {{Analyzer.ResolveProcedures}} rule into a different code path. 
> Downstream forks have to either wire it or delete it on each cherry-pick.
> *Action:* either replace the existing call site to use the new method (and 
> delete the older path) or revert the hunk. Decide as part of the wiring PR 
> that the SPARK-56605 docstring promises.
> h2. High
> h3. 2. Frozen view / SQL-function PATH wiring is unfinished
> The PR docstring states:
> Frozen path analysis for views/SQL functions (using stored path during 
> analysis) comes in a follow-up PR. This PR only wires the resolution engine 
> to use the live session path.\{quote}
>  
> The follow-up has not landed. Concretely:
>  * {{AnalysisContext.resolutionPathEntries}} is declared but never populated 
> from view metadata.
>  * {{CatalogManager.serializePathEntries}} / {{pathEntriesForPersistence}} 
> are declared but never written to view / SQL function properties.
>  * {{AnalysisContext.relationResolutionPathCache}} is unused on this branch.
> *Action:* finish the wiring. Capture path entries at CREATE VIEW / CREATE 
> FUNCTION time, persist into catalog properties, deserialize on view body / 
> SQL function body re-analysis, and populate 
> {{AnalysisContext.resolutionPathEntries}} accordingly.
> h3. 3. {{AnalysisContext.resolutionPathEntries}} is a new threadlocal
> The single-pass resolver project explicitly forbids new threadlocal state. 
> The frozen view path needs to be plumbed through 
> {{OperatorResolutionContextStack}} (or another single-pass-friendly 
> propagation mechanism) and the field removed from {{{}AnalysisContext{}}}.
> *Action:* before/together with #2, plumb the frozen path through the 
> single-pass operator stack instead of {{{}AnalysisContext{}}}. Audit 
> {{AnalysisContext.withNewAnalysisContext}} and {{AnalysisContext.reset}} to 
> ensure no leakage even before the field is removed.
> h3. 4. {{Analyzer.executeAndCheck}} clobbers outer 
> {{SQLConf.withExistingConf}}
> After the PR, {{Analyzer}} carries a {{sessionConf: Option[SQLConf]}} and 
> {{executeAndCheck}} unconditionally wraps analysis in 
> {{{}SQLConf.withExistingConf(sessionConf){}}}. This clobbers any outer 
> {{withExistingConf}} scope (e.g. a SQL UDF / view body that wired frozen 
> configs). {{executeSameContext}} already short-circuits with {{SQLConf.get ne 
> c => super.execute(plan)}} but {{executeAndCheck}} does not. The two 
> predicates also disagree: {{executeSameContext}} uses {{SQLConf.get}} (falls 
> through to {{{}confGetter{}}}) while a correct fix needs to peek at the 
> threadlocal directly so "no outer scope" and "outer scope happens to match 
> the session conf" are not conflated.
> *Action:* add public {{SQLConf.getExistingConfIfSet: Option[SQLConf]}} 
> (returns {{Some}} only when an outer scope explicitly installed a conf via 
> {{{}withExistingConf{}}}). Extract one {{runWithSessionConf}} helper in 
> {{{}Analyzer{}}}, used by both {{executeAndCheck}} and 
> {{{}executeSameContext{}}}, that:
>  * returns the thunk unchanged if {{sessionConf}} is {{{}None{}}};
>  * returns the thunk unchanged if an outer scope has already installed a 
> different conf via {{withExistingConf}} (use {{{}getExistingConfIfSet{}}});
>  * otherwise wraps in {{{}SQLConf.withExistingConf(sessionConf){}}}.
> Reproducer: a SQL UDF / view body that captures an {{ANSI_ENABLED}} value at 
> CREATE time and is re-analyzed in a session that has flipped {{ANSI_ENABLED}} 
> should still see the captured value during analysis. Today the captured value 
> is overwritten by the analyzer's {{{}sessionConf{}}}.
> h3. 5. {{VariableResolution.allowUnqualifiedSessionTempVariableLookup}} 
> force-loads default catalog
> File: 
> {{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/VariableResolution.scala}}
> The check eagerly evaluates {{{}catalogManager.currentCatalog.name(){}}}, 
> which calls {{Catalogs.load}} on the configured default catalog. If the 
> configured default catalog is not a registered plugin (e.g. tests, customer 
> sessions with a mistyped {{{}spark.sql.defaultCatalog{}}}), this throws 
> {{CATALOG_NOT_FOUND}} mid-analysis on every column-resolution pass that 
> reaches the variable-lookup last-resort path.
> When SQL PATH is not explicitly set, the answer is fully determined by 
> {{SQLConf.isDefinerProcedureExcludeSessionFromSearchPath}} (the default path 
> always contains {{system.session}} otherwise) and never depends on the 
> catalog name.
> *Action:* short-circuit before touching the catalog when PATH is not 
> explicitly set:
> {code:scala}
> private def allowUnqualifiedSessionTempVariableLookup(nameParts: 
> Seq[String]): Boolean = {
>   if (nameParts.length != 1) return true
>   val pathExplicitlySet = conf.pathEnabled && 
> catalogManager.sessionPathEntries.isDefined
>   if (!pathExplicitlySet) {
>     return !conf.isDefinerProcedureExcludeSessionFromSearchPath
>   }
>   catalogManager.sessionScopeUnqualifiedAllowed(
>     catalogManager.currentCatalog.name(),
>     catalogManager.currentNamespace.toSeq)
> }
> {code}
> h3. 6. DROP VARIABLE PATH gate is asymmetric with DECLARE / CREATE VARIABLE
> File: 
> {{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala}}
> The PR added a {{{}system.session{}}}-on-PATH gate to {{DropVariable}} only. 
> {{CreateVariable}} (DECLARE / CREATE) has no such gate, so a user can 
> {{DECLARE OR REPLACE VARIABLE foo}} they cannot drop under a {{SET PATH}} 
> that omits {{{}system.session{}}}. {{SET VAR foo = ...}} (which goes through 
> {{VariableResolution.lookupVariable}} and inherits the same gate) also fails 
> in that case.
> The intended policy is "DDL ignores PATH, DML uses it":
>  * DECLARE / CREATE / DROP VARIABLE always target {{system.session}} directly 
> regardless of PATH (no gate).
>  * SET VAR / {{SELECT @foo}} resolve through PATH and require 
> {{system.session}} on the path (gate stays).
> *Action:* remove the {{if (nameParts.length == 1 && 
> !catalogManager.sessionScopeUnqualifiedAllowed(...)) throw ...}} block from 
> the {{DropVariable}} case in {{{}ResolveCatalogs{}}}. Add a regression test 
> that exercises a {{DECLARE -> SET VAR (qualified) -> DROP}} cycle under 
> {{{}SET PATH = somecat.somesch{}}}.
> h3. 7. {{lookupFunctionType}} exception swallow is too broad
> File: 
> {{sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionResolution.scala}}
> The PR uses {{{}catch \{ case NonFatal(_) => false }}} around 
> {{functionExists{}}}. That swallows {{{}PERMISSION_DENIED{}}}, transient 
> catalog errors, and unrelated bugs uniformly – the user sees 
> {{UNRESOLVED_ROUTINE}} with no upstream cause, and the underlying error never 
> surfaces.
> *Action:* mirror the explicit not-found list already used in 
> {{{}resolveFunctionCandidate{}}}:
> {code:scala}
> catch {
>   case _: NoSuchFunctionException
>      | _: NoSuchDatabaseException
>      | _: NoSuchNamespaceException
>      | _: CatalogNotFoundException =>
>     false
>   case e: AnalysisException if e.getCondition == "FORBIDDEN_OPERATION" =>
>     false
> }
> {code}
> Other exceptions should propagate.
> h3. 8. {{lookupFunctionType}} fan-out includes wasteful system-prefix entries
> For 1-part names the new code iterates {{resolutionCandidates(nameParts)}} 
> and calls {{functionExists}} on every entry. The candidate list includes the 
> {{{}system.session.<name>{}}}, {{{}system.builtin.<name>{}}}, 
> {{system.ai.<name>}} path entries – but routines under those namespaces are 
> already resolved by {{lookupBuiltinOrTempFunction}} / 
> {{lookupBuiltinOrTempTableFunction}} higher up. Calling {{functionExists}} on 
> them wastes catalog calls (UC backends RPC) and is observable under tests 
> that count calls.
> *Action:* filter the candidate list down to non-system entries before the 
> {{functionExists}} walk:
> {code:scala}
> val persistentCandidates = resolutionCandidates(nameParts).filterNot { c =>
>   c.length >= 2 && 
> c.head.equalsIgnoreCase(CatalogManager.SYSTEM_CATALOG_NAME) && {
>     val ns = c(1)
>     ns.equalsIgnoreCase(CatalogManager.SESSION_NAMESPACE)
>       || ns.equalsIgnoreCase(CatalogManager.BUILTIN_NAMESPACE)
>       || ns.equalsIgnoreCase(CatalogManager.AI_NAMESPACE)
>   }
> }
> {code}
> h2. Medium
> h3. 9. Three near-duplicate path-resolution helpers should be lifted to 
> {{CatalogManager}}
> {{RelationResolution.relationResolutionEntries}} and 
> {{FunctionResolution.sqlResolutionPathEntriesForAnalysis}} each implement 
> "view-aware ordered SQL path entries". They will drift. Procedure resolution 
> in particular doesn't consult {{{}AnalysisContext.resolutionPathEntries{}}}, 
> so when #2 lands, {{CALL}} inside a view body will silently use the live 
> session path.
> *Action:* add one helper on {{{}CatalogManager{}}}:
> {code:scala}
> def resolutionPathEntriesForAnalysis(
>     pinnedEntries: Option[Seq[Seq[String]]],
>     viewCatalogAndNamespace: Seq[String]): Seq[Seq[String]]
> {code}
> that picks pinned entries when set (and PATH is enabled) or falls back to 
> {{sqlResolutionPathEntries}} with view-aware path/expand args. Route 
> relation, function, and procedure resolution all through it.
> h3. 10. Tests for the new error paths and gates
> The PR added at least three new behaviour gates with no direct test coverage 
> in {{{}SetPathSuite{}}}:
>  * DECLARE / SET VAR / DROP VARIABLE cycle under {{SET PATH = 
> somecat.somesch}} (positive case for DDL, negative case for unqualified DML 
> lookup).
>  * DESCRIBE TABLE error message format under the new {{QueryLike}} / 
> {{UnresolvedTableOrViewSearchPathMode}} error path – verify the message lists 
> the full PATH.
>  * Unqualified variable resolution failure when {{system.session}} is not on 
> PATH (e.g. PATH set, {{SELECT @x}} must fail; {{SELECT system.session.x}} 
> must succeed).
> *Action:* add {{{}intercept[AnalysisException]{}}}-based tests for the above 
> to {{{}SetPathSuite{}}}.
> h3. 11. {{ProtoToParsedPlanTestSuite.analyzerIsolationConf}} is a bare 
> {{SQLConf}}
> File: 
> {{sql/connect/server/src/test/scala/org/apache/spark/sql/connect/ProtoToParsedPlanTestSuite.scala}}
> The PR introduced {{analyzerIsolationConf = new SQLConf()}} to insulate the 
> suite from session SET PATH. That bare {{SQLConf}} discards every override 
> the test session's {{sparkConf}} sets ({{{}ANSI_ENABLED{}}}, 
> {{{}ALWAYS_INLINE_COMMON_EXPR{}}}, {{{}USE_COMMON_EXPR_ID_FOR_ALIAS{}}}, ...) 
> and exposes the {{SQLConf}} factory defaults. Each lost override is silently 
> a coverage gap.
> *Action:* clone the shared session's conf instead, only overriding the knobs 
> that genuinely need to differ:
> {code:scala}
> private lazy val analyzerIsolationConf: SQLConf = {
>   val c = spark.sessionState.conf.clone()
>   c.setConf(SQLConf.PATH_ENABLED, false)
>   c
> }
> {code}
> This automatically picks up the test session's overrides and only the genuine 
> isolation knob remains explicit.
> ALSO:
> Confirmed — {{ResolveSetVariable.scala:105}} hardcodes {{Seq("SYSTEM", 
> "SESSION")}} as the search path it passes to {{{}unresolvedVariableError{}}}, 
> regardless of the actual SQL PATH. So the error message lies about what was 
> searched. That's the same family of bug as Vladimir's #6 (DESCRIBE error 
> format) — the error path doesn't get reformatted under the new {{QueryLike}} 
> mode either.
> I'll add it to SPARK-56681 as a separate follow-up item and commit the test 
> as-is (the hardcoded message means my {{checkError}} matches today, with the 
> understanding that the OSS fix will require a corresponding {{parameters = 
> Map(... "searchPath" -> "<actual path>")}} update here).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


Reply via email to