This is an automated email from the ASF dual-hosted git repository.
cloud-fan pushed a commit to branch branch-4.2
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.2 by this push:
new 22f018bea049 [SPARK-55250][SQL][FOLLOWUP] Skip createNamespace for IF
NOT EXISTS on existing namespace
22f018bea049 is described below
commit 22f018bea049cec7ba627fec34d0ed290199dc5d
Author: Wenchen Fan <[email protected]>
AuthorDate: Thu May 21 18:25:15 2026 +0800
[SPARK-55250][SQL][FOLLOWUP] Skip createNamespace for IF NOT EXISTS on
existing namespace
### What changes were proposed in this pull request?
Follow-up to SPARK-55250. Add a recovery path to
`CreateNamespaceExec.run()` so `IF NOT EXISTS` is a no-op when the namespace
already exists, even if the catalog surfaces an error other than
`NamespaceAlreadyExistsException`:
```scala
try {
  val ownership = Map(PROP_OWNER -> Utils.getCurrentUserName())
  catalog.createNamespace(ns, (properties ++ ownership).asJava)
} catch {
  case _: NamespaceAlreadyExistsException if ifNotExists =>
    logWarning(...)
  case NonFatal(e) if ifNotExists =>
    val exists = try catalog.namespaceExists(ns) catch { case NonFatal(_) => false }
    if (exists) logWarning(..., e) else throw e
}
```
The unconditional `createNamespace` call introduced by SPARK-55250 is
preserved as the first step, so the perf win from that PR is kept on every
happy path. The `namespaceExists` fallback runs only when `createNamespace` has
already failed — a path that was previously an unrecoverable error.
### Why are the changes needed?
SPARK-55250 changed `CREATE NAMESPACE IF NOT EXISTS foo` from "check
existence first, skip if present" to "always call `createNamespace`, catch
`NamespaceAlreadyExistsException`". This relies on the catalog raising
`NamespaceAlreadyExistsException` rather than some other error when the
namespace is pre-existing.
For `SupportsNamespaces` implementations that validate the request (ACLs,
properties, etc.) before checking existence, this assumption doesn't hold: the
validation error surfaces first, the `NamespaceAlreadyExistsException` is never
thrown, and the `IF NOT EXISTS` no-op semantic is lost.
The fix asks the only question that actually matters under `IF NOT EXISTS`
after a failure: "does the namespace exist now?" If yes, intent satisfied;
otherwise the original error propagates.
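As a rough illustration of that question, the following self-contained Scala sketch mimics the recovery logic against a toy catalog. `ValidatingCatalog`, `AlreadyExists`, and `RecoverySketch.create` are hypothetical names made up for this example and are not Spark APIs; the actual change lives in `CreateNamespaceExec.run()`.

```scala
import scala.collection.mutable
import scala.util.control.NonFatal

// Hypothetical stand-ins for illustration only; none of these are Spark classes.
final class AlreadyExists(ns: String)
  extends RuntimeException(s"namespace $ns already exists")

// A toy catalog that validates before checking existence, so a pre-existing
// namespace surfaces a generic error instead of AlreadyExists.
final class ValidatingCatalog {
  private val namespaces = mutable.Set.empty[String]

  def namespaceExists(ns: String): Boolean = namespaces.contains(ns)

  def createNamespace(ns: String): Unit = {
    if (namespaces.contains(ns)) {
      throw new IllegalStateException(s"validation failed for $ns")
    }
    namespaces += ns
  }
}

object RecoverySketch {
  // Returns true if the namespace was created, false if the call was a no-op.
  def create(catalog: ValidatingCatalog, ns: String, ifNotExists: Boolean): Boolean = {
    try {
      catalog.createNamespace(ns)
      true
    } catch {
      case _: AlreadyExists if ifNotExists => false
      case NonFatal(e) if ifNotExists =>
        // Ask whether the namespace exists now; a failing probe counts as "no".
        val exists = try catalog.namespaceExists(ns) catch { case NonFatal(_) => false }
        if (exists) false else throw e
    }
  }
}
```

With a fresh `ValidatingCatalog`, the first `create(c, "foo", ifNotExists = true)` call creates the namespace and the second is treated as a no-op. Without `ifNotExists`, the second call rethrows the validation error.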
### Does this PR introduce _any_ user-facing change?
Yes. `CREATE NAMESPACE IF NOT EXISTS foo` is again a no-op when `foo`
already exists, regardless of which error the catalog raised on the create
attempt. This matches the pre-SPARK-55250 contract.
RPC accounting, counted as catalog calls per statement (using Unity Catalog as an example of a catalog that validates before checking existence):
| Scenario | SPARK-55250 | This PR |
|---|---|---|
| no `IF NOT EXISTS` | 1 | 1 |
| `IF NOT EXISTS`, foo absent | 1 | 1 |
| `IF NOT EXISTS`, foo exists, create succeeds-or-throws-AlreadyExists | 1 | 1 |
| `IF NOT EXISTS`, foo exists, create throws something else | 1 (surfaces error ❌) | 2 (recovers ✅) |
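The call counts in the table can be reproduced with a small counting sketch. This is only an illustration under assumptions: `CountingCatalog` and `RpcAccounting` are hypothetical names, each method call stands in for one RPC, and the toy catalog raises a generic error (not an "already exists" error) on a duplicate create.

```scala
import scala.collection.mutable
import scala.util.control.NonFatal

// Hypothetical toy catalog that counts every call it receives; not a Spark class.
final class CountingCatalog {
  private val namespaces = mutable.Set.empty[String]
  var calls: Int = 0

  def namespaceExists(ns: String): Boolean = {
    calls += 1
    namespaces.contains(ns)
  }

  // Validates before checking existence, so a duplicate create raises a generic error.
  def createNamespace(ns: String): Unit = {
    calls += 1
    if (namespaces.contains(ns)) {
      throw new IllegalStateException(s"validation failed for $ns")
    }
    namespaces += ns
  }
}

object RpcAccounting {
  // Create-first flow; the existence probe runs only after a failed create.
  def createIfNotExists(catalog: CountingCatalog, ns: String): Unit = {
    try {
      catalog.createNamespace(ns)
    } catch {
      case NonFatal(e) =>
        val exists = try catalog.namespaceExists(ns) catch { case NonFatal(_) => false }
        if (!exists) throw e
    }
  }

  // Number of catalog calls made by a single createIfNotExists invocation.
  def callsFor(catalog: CountingCatalog, ns: String): Int = {
    val before = catalog.calls
    createIfNotExists(catalog, ns)
    catalog.calls - before
  }
}
```

On a fresh `CountingCatalog`, `callsFor(c, "foo")` makes one call when `foo` is absent and two calls when `foo` already exists, matching the second and fourth rows of the table.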
### How was this patch tested?
- New `ValidatingInMemoryTableCatalog` (an `InMemoryTableCatalog` subclass that validates before checking existence, so a pre-existing namespace raises a non-`NamespaceAlreadyExistsException`), registered as `validating_test_catalog` through a `sparkConf` override in `v2.CreateNamespaceSuite`.
- New SQL-level regression test in `v2.CreateNamespaceSuite` that creates a
namespace and re-runs `CREATE NAMESPACE IF NOT EXISTS` against that catalog —
fails on master, passes with this PR.
- Existing
`org.apache.spark.sql.hive.execution.command.CreateNamespaceSuite` "hive client
calls" test still asserts exactly 1 RPC for each of the three `CREATE
NAMESPACE` shapes it covers — confirming SPARK-55250's perf win is preserved on
the happy path.
- Existing v1 / v2 `CreateNamespaceSuite` pass.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7)
Closes #56027 from cloud-fan/wenchen/SPARK-55250-followup.
Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 4131b00ca51b34b0505c498edc9349d5fc6c13c7)
Signed-off-by: Wenchen Fan <[email protected]>
---
.../catalog/ValidatingInMemoryTableCatalog.scala} | 21 ++++++++++++++++-----
.../datasources/v2/CreateNamespaceExec.scala | 14 ++++++++++++++
.../execution/command/v2/CreateNamespaceSuite.scala | 21 +++++++++++++++++++++
3 files changed, 51 insertions(+), 5 deletions(-)
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/CreateNamespaceSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/ValidatingInMemoryTableCatalog.scala
similarity index 50%
copy from sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/CreateNamespaceSuite.scala
copy to sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/ValidatingInMemoryTableCatalog.scala
index 6b5475a1e267..820f51a2af45 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/CreateNamespaceSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/ValidatingInMemoryTableCatalog.scala
@@ -15,13 +15,24 @@
  * limitations under the License.
  */
 
-package org.apache.spark.sql.execution.command.v2
+package org.apache.spark.sql.connector.catalog
 
-import org.apache.spark.sql.execution.command
+import java.util
 
 /**
- * The class contains tests for the `CREATE NAMESPACE` command to check V2 table catalogs.
+ * A test catalog whose `createNamespace` validates the request before checking existence, so a
+ * pre-existing namespace surfaces a non-`NamespaceAlreadyExistsException` error. Mirrors the
+ * authorize-then-execute ordering of catalogs like Unity Catalog and is used to exercise the
+ * `IF NOT EXISTS` recovery path in `CreateNamespaceExec`.
  */
-class CreateNamespaceSuite extends command.CreateNamespaceSuiteBase with CommandSuiteBase {
-  override def namespace: String = "ns1.ns2"
+class ValidatingInMemoryTableCatalog extends InMemoryTableCatalog {
+  override def createNamespace(
+      namespace: Array[String],
+      metadata: util.Map[String, String]): Unit = {
+    if (namespaceExists(namespace)) {
+      throw new RuntimeException(
+        s"simulated validation failure on pre-existing namespace ${namespace.mkString(".")}")
+    }
+    super.createNamespace(namespace, metadata)
+  }
 }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateNamespaceExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateNamespaceExec.scala
index 02197a76aa1b..95edbba62dcb 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateNamespaceExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateNamespaceExec.scala
@@ -18,6 +18,7 @@
 package org.apache.spark.sql.execution.datasources.v2
 
 import scala.jdk.CollectionConverters.MapHasAsJava
+import scala.util.control.NonFatal
 
 import org.apache.spark.internal.LogKeys.NAMESPACE
 import org.apache.spark.sql.catalyst.InternalRow
@@ -46,6 +47,19 @@ case class CreateNamespaceExec(
       case _: NamespaceAlreadyExistsException if ifNotExists =>
         logWarning(log"Namespace ${MDC(NAMESPACE, namespace.quoted)} was created concurrently. " +
           log"Ignoring.")
+      case NonFatal(e) if ifNotExists =>
+        // Some catalogs validate the request (e.g. ACLs, properties) before checking existence,
+        // so creating a pre-existing namespace can surface errors unrelated to the "already
+        // exists" condition the caller intends to ignore under IF NOT EXISTS. If the namespace
+        // really does exist, treat the operation as a no-op; otherwise propagate the original
+        // error.
+        val exists = try catalog.namespaceExists(ns) catch { case NonFatal(_) => false }
+        if (exists) {
+          logWarning(log"Namespace ${MDC(NAMESPACE, namespace.quoted)} already exists; " +
+            log"swallowing underlying error under IF NOT EXISTS.", e)
+        } else {
+          throw e
+        }
     }
 
     Seq.empty
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/CreateNamespaceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/CreateNamespaceSuite.scala
index 6b5475a1e267..973676fe1f63 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/CreateNamespaceSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/CreateNamespaceSuite.scala
@@ -17,6 +17,8 @@
 
 package org.apache.spark.sql.execution.command.v2
 
+import org.apache.spark.SparkConf
+import org.apache.spark.sql.connector.catalog.ValidatingInMemoryTableCatalog
 import org.apache.spark.sql.execution.command
 
 /**
@@ -24,4 +26,23 @@ import org.apache.spark.sql.execution.command
  */
 class CreateNamespaceSuite extends command.CreateNamespaceSuiteBase with CommandSuiteBase {
   override def namespace: String = "ns1.ns2"
+
+  // A test catalog whose createNamespace validates before checking existence; used to
+  // exercise CreateNamespaceExec's IF NOT EXISTS recovery path.
+  private val validatingCatalog: String = "validating_test_catalog"
+
+  override def sparkConf: SparkConf = super.sparkConf
+    .set(s"spark.sql.catalog.$validatingCatalog",
+      classOf[ValidatingInMemoryTableCatalog].getName)
+
+  test("SPARK-55250: IF NOT EXISTS is a no-op on pre-existing namespace even when the " +
+    "catalog raises a non-NamespaceAlreadyExistsException error") {
+    val ns = s"$validatingCatalog.$namespace"
+    withNamespace(ns) {
+      sql(s"CREATE NAMESPACE $ns")
+      // Without the IF NOT EXISTS recovery path, this would surface the catalog's
+      // pre-existence validation error.
+      sql(s"CREATE NAMESPACE IF NOT EXISTS $ns")
+    }
+  }
 }