Re: [PR] [SPARK-56984][DOCS] Document the SQL PATH feature [spark]

via GitHub Fri, 22 May 2026 08:30:47 -0700


cloud-fan commented on code in PR #56040:
URL: https://github.com/apache/spark/pull/56040#discussion_r3289444095



##########
docs/sql-ref-syntax-aux-conf-mgmt-set-path.md:
##########
@@ -0,0 +1,239 @@
+---
+layout: global
+title: SET PATH
+displayTitle: SET PATH
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+     http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+### Description
+
+`SET PATH` changes the **SQL Path** of the current session.
+
+The SQL Path is an ordered list of catalog-qualified schema names that Spark walks when
+resolving unqualified references to functions, tables, views, and session variables in queries
+and DML (`SELECT`, `INSERT`, `UPDATE`, `DELETE`, `MERGE`). The first match wins. DDL
+(`CREATE TABLE`, `CREATE VIEW`, `CREATE FUNCTION`, `DROP`, `ALTER`, ...) resolves unqualified
+object names against `current_catalog.current_schema`, not the path; so `CREATE TABLE t` always
+creates `t` in the current schema regardless of the path.
+
+The path can include two virtual namespaces in the `system` catalog:
+
+- `system.builtin` &mdash; built-in functions, including those injected by
+  `SparkSessionExtensions`.
+- `system.session` &mdash; temporary views, temporary functions, and session variables in the
+  current session.
+
+`SET PATH` is controlled by `spark.sql.path.enabled`. When it is `false` (the default),
+`SET PATH` raises `UNSUPPORTED_FEATURE.SET_PATH_WHEN_DISABLED`. Unqualified resolution and
+[`current_path()`](sql-ref-function-current-path.html) still use the default path.
+
+The initial value of `PATH` in a session is `DEFAULT_PATH`. `DEFAULT_PATH` is either the value of
+`spark.sql.defaultPath`, or, when that configuration is empty, a built-in value composed of
+`system.builtin`, `system.session`, and the current schema. To override, set
+`spark.sql.defaultPath`. See the [`DEFAULT_PATH` parameter](#parameters) for the exact derivation
+rules.
+
+The effect of `SET PATH` is scoped to the current session and is lost when the session ends. To
+re-apply the current default path mid-session, run `SET PATH = DEFAULT_PATH`. (This stores a
+snapshot of `DEFAULT_PATH` at the moment of the statement; later changes to
+`spark.sql.defaultPath` are not picked up automatically.) Cloned sessions inherit the parent's
+path at clone time; later changes in the child do not propagate back.
+
+Persistent views and SQL UDFs capture the path at `CREATE` time into the object's metadata.
+Each invocation resolves the body against that frozen path, not the invoker's current path;
+`current_schema()` and `current_path()` inside the body still return the invoker's context.
+
+The leading names `session` and `builtin` have special meaning in 2-part references; see
+[Reserved system names](sql-ref-identifier.html#reserved-system-names).
+
+### Syntax
+
+```sql
+SET PATH = path_element [ , ... ]
+
+path_element
+    { DEFAULT_PATH |
+      SYSTEM_PATH |
+      PATH |
+      CURRENT_SCHEMA |
+      CURRENT_DATABASE |
+      catalog_name . namespace [ . namespace ... ] }
+```
+
+### Parameters
+
+* **`DEFAULT_PATH`**
+
+  Expands to the session's default path. The default path has two layers:
+
+  1. If `spark.sql.defaultPath` is set to a non-empty value, that value is parsed using the same
+     grammar as `SET PATH` (with one restriction: the `PATH` keyword is not allowed inside the
+     conf value, since it would be self-referential).
+
+     The conf value is validated for syntax at the time it is set; an invalid value is rejected.
+     Static duplicates inside the conf are tolerated (unlike interactive `SET PATH`, which
+     rejects them) so a later `USE SCHEMA` cannot turn a previously valid default into a runtime
+     error. A `DEFAULT_PATH` token inside the conf value resolves to the spark-built-in default
+     below to avoid a cycle, rather than recursing.
+
+  2. If `spark.sql.defaultPath` is empty (the factory setting), the spark-built-in default
+     applies: `system.builtin`, `system.session`, and the current schema
+     (`current_catalog.current_schema`), in that order.
+
+  To change the default path, set `spark.sql.defaultPath` via any of the usual mechanisms
+  (`SET spark.sql.defaultPath = ...` at runtime, `--conf` on `spark-submit`, `SparkConf`, or
+  `spark-defaults.conf`); clear it with `RESET spark.sql.defaultPath` to return to the
+  spark-built-in default.
+
+* **`SYSTEM_PATH`**
+
+  Expands to the two system namespaces, `system.builtin` and `system.session`.
+
+* **`PATH`**
+
+  Expands to the **current** value of the SQL Path. Useful for appending entries without
+  re-typing them, for example `SET PATH = PATH, spark_catalog.analytics`.
+  `PATH` is not allowed in the value of `spark.sql.defaultPath` (it would create a cycle).
+
+* **`CURRENT_SCHEMA`** / **`CURRENT_DATABASE`**
+
+  A virtual marker that resolves to the catalog-qualified current schema
+  (`current_catalog.current_schema`) every time the path is consulted. This means subsequent
+  `USE SCHEMA` statements are picked up without re-issuing `SET PATH`.
+  `CURRENT_DATABASE` is a synonym for `CURRENT_SCHEMA`.
+
+* **`schema_name`**

Review Comment:
   The grammar update in this commit (`catalog_name . namespace [ . namespace ... ]` on line 74) left this parameter heading orphaned. Every other heading in this block (`DEFAULT_PATH`, `SYSTEM_PATH`, `PATH`, `CURRENT_SCHEMA`, `CURRENT_DATABASE`) appears verbatim as a token in the grammar above, but `schema_name` no longer does. The pre-follow-up grammar said `catalog_name . schema_name`, so the alignment used to hold.
   
   Suggest renaming this heading (and its label) to match the new grammar, e.g.:
   
   ```
   * **`catalog_name . namespace [ . namespace ... ]`**
   
     An explicit catalog-qualified namespace reference. At least two parts are required.
     The catalog and namespace do not need to exist at the time of `SET PATH`; non-existent
     entries are silently skipped during name resolution. ...
   ```
   
   Alternatively, revert the grammar to `catalog_name . schema_name` and convey "multi-level namespaces allowed" only in the body. The current grammar form is more accurate, though, so realigning the heading seems better.
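
The alignment rule described in this comment (each parameter heading should appear verbatim as a token in the grammar block) can be checked mechanically. The following is a minimal sketch, not part of Spark's docs tooling: the helper name `orphaned_headings` is hypothetical, and it assumes parameter headings use the `* **`NAME`**` bullet form seen in this file.

```python
import re


# Hypothetical helper for illustration; not part of Spark's docs build.
def orphaned_headings(grammar: str, doc: str) -> list[str]:
    """Return parameter headings whose text does not occur verbatim in the grammar."""
    headings = []
    for line in doc.splitlines():
        # Parameter headings are bullets such as: * **`NAME`** / **`ALIAS`**
        if line.lstrip().startswith("* **`"):
            headings.extend(re.findall(r"\*\*`([^`]+)`\*\*", line))
    return [h for h in headings if h not in grammar]


# Grammar block as it appears in the diff under review.
GRAMMAR = """
SET PATH = path_element [ , ... ]

path_element
    { DEFAULT_PATH |
      SYSTEM_PATH |
      PATH |
      CURRENT_SCHEMA |
      CURRENT_DATABASE |
      catalog_name . namespace [ . namespace ... ] }
"""

# Parameter headings as they appear in the diff under review.
DOC = """
* **`DEFAULT_PATH`**
* **`SYSTEM_PATH`**
* **`PATH`**
* **`CURRENT_SCHEMA`** / **`CURRENT_DATABASE`**
* **`schema_name`**
"""

print(orphaned_headings(GRAMMAR, DOC))  # ['schema_name']
```

Renaming the last heading to `catalog_name . namespace [ . namespace ... ]`, as suggested above, makes the check return an empty list.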



##########
docs/sql-ref-identifier.md:
##########
@@ -52,6 +52,30 @@ An identifier is a string used to identify a database object such as a table, vi
 
     Any character from the character set. Use <code>`</code> to escape special characters (e.g., <code>`</code>).
 
+### Reserved system names
+
+`system`, `session`, and `builtin` have special meaning and should not be used as user-defined
+catalog or schema names.
+
+| Name | Position | Notes |
+| :--- | :------- | :---- |
+| `system` | catalog | Virtual catalog hosting `system.builtin` and `system.session`. Spark does not load `system` through the v2 catalog API; setting `spark.sql.catalog.system = ...` is unsupported and produces undefined results. The current catalog cannot be `system`. |
+| `builtin` | schema | A persistent schema named `builtin` is allowed but discouraged because it collides with `system.builtin`. |
+| `session` | schema | A persistent schema named `session` is allowed but discouraged because it collides with `system.session`. |
+
+An unqualified 2-part reference like `builtin.x` or `session.x` walks a small **mini-path** to

Review Comment:
   "An unqualified 2-part reference" is in tension with the taxonomy this PR establishes in `sql-ref-name-resolution.md:273`, which heads exactly this case as `### Partially qualified (2 parts) — schema.object`. `describe-function.md:47` (also new in this PR) just says "2-part names". A 2-part reference like `builtin.x` is partially qualified: it carries one level of qualifier (the schema), so calling it "unqualified" reads as self-contradictory.
   
   (Late catch: this wording was already in the prior review's snapshot and I should have flagged it then. Apologies for the second pass.)
   
   ```suggestion
   A partially qualified 2-part reference like `builtin.x` or `session.x` walks a small **mini-path** to
   ```
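
To make the terminology point concrete, here is a small sketch that labels a reference by its number of dot-separated parts. The function name `qualification_level` is hypothetical, and the labels for one part and for three or more parts are assumptions; only the 2-part label, "partially qualified", comes from the heading quoted in the comment above. The sketch also ignores backtick-quoted identifiers that contain dots.

```python
# Hypothetical helper for illustration; this is not Spark's resolver.
def qualification_level(reference: str) -> str:
    """Label a dotted name by how many parts it has."""
    parts = reference.split(".")
    if len(parts) == 1:
        return "unqualified"          # assumption: a bare object name
    if len(parts) == 2:
        return "partially qualified"  # schema.object, per the quoted heading
    return "fully qualified"          # assumption: catalog.schema.object or deeper


for ref in ("x", "builtin.x", "system.builtin.x"):
    print(ref, "->", qualification_level(ref))
```

Under this labeling, `builtin.x` and `session.x` are never "unqualified", which is the inconsistency the suggestion removes.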



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

