viirya opened a new pull request, #56102:
URL: https://github.com/apache/spark/pull/56102

   ### What changes were proposed in this pull request?
   
   Add a standard DSv2 mix-in interface `SupportsBranching`, plus the SQL 
surface to manage branches:
   
   ```java
   public interface SupportsBranching extends Table {
       TableBranch createBranch(String name, OptionalLong sourceSnapshotId);
       // Methods marked `default` have a default implementation; bodies are omitted in this summary.
       default TableBranch replaceBranch(String name, OptionalLong sourceSnapshotId);
       boolean dropBranch(String name);
       TableBranch fastForward(String branch, String targetBranch);
       default TableBranch[] listBranches();
   }
   ```
   
   The interface comes with a companion value type 
`TableBranch(name, snapshotId, creationTimeMs)` and 
`SupportsBranching.BranchAlreadyExistsException` for the duplicate-create case.
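
To make the contract above concrete, here is a minimal, self-contained sketch of how a connector might back these methods with an in-memory map. Everything in it is a hypothetical stand-in rather than the code in this PR: `BranchingSketch`, `ToyTable`, the unchecked exception, the "empty source means current snapshot" rule, and the use of snapshot ID ordering as a proxy for ancestry in `fastForward` are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical stand-ins; the real types live in Spark's connector catalog package.
final class BranchingSketch {

    /** Mirrors the value type TableBranch(name, snapshotId, creationTimeMs). */
    static final class TableBranch {
        final String name;
        final long snapshotId;
        final long creationTimeMs;

        TableBranch(String name, long snapshotId, long creationTimeMs) {
            this.name = name;
            this.snapshotId = snapshotId;
            this.creationTimeMs = creationTimeMs;
        }
    }

    /** Stand-in for SupportsBranching.BranchAlreadyExistsException (unchecked here by assumption). */
    static final class BranchAlreadyExistsException extends RuntimeException {
        BranchAlreadyExistsException(String name) {
            super("Branch already exists: " + name);
        }
    }

    /** Toy table that tracks branches by name; snapshot IDs are assumed to grow linearly. */
    static final class ToyTable {
        private final Map<String, TableBranch> branches = new LinkedHashMap<>();
        private final long currentSnapshotId;

        ToyTable(long currentSnapshotId) {
            this.currentSnapshotId = currentSnapshotId;
        }

        TableBranch createBranch(String name, OptionalLong sourceSnapshotId) {
            if (branches.containsKey(name)) {
                throw new BranchAlreadyExistsException(name);
            }
            // Assumption: an empty source means "branch from the current snapshot".
            long snapshotId = sourceSnapshotId.orElse(currentSnapshotId);
            TableBranch branch = new TableBranch(name, snapshotId, System.currentTimeMillis());
            branches.put(name, branch);
            return branch;
        }

        TableBranch replaceBranch(String name, OptionalLong sourceSnapshotId) {
            branches.remove(name); // no-op when the branch does not exist yet
            return createBranch(name, sourceSnapshotId);
        }

        boolean dropBranch(String name) {
            // true only if a branch was actually removed
            return branches.remove(name) != null;
        }

        TableBranch fastForward(String branch, String targetBranch) {
            TableBranch from = branches.get(branch);
            TableBranch to = branches.get(targetBranch);
            if (from == null || to == null) {
                throw new IllegalArgumentException("Unknown branch");
            }
            // Assumption: a larger snapshot ID stands in for "descendant of".
            if (from.snapshotId > to.snapshotId) {
                throw new IllegalStateException(
                    "Cannot fast-forward " + branch + " backwards to " + targetBranch);
            }
            TableBranch moved = new TableBranch(branch, to.snapshotId, from.creationTimeMs);
            branches.put(branch, moved);
            return moved;
        }

        TableBranch[] listBranches() {
            return branches.values().toArray(new TableBranch[0]);
        }
    }
}
```

A real connector would replace the snapshot ID comparison with an actual ancestry check against its snapshot log; the sketch only illustrates the shape of each operation and its failure modes.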
   
   New DDL:
   
   ```sql
   ALTER TABLE t CREATE [OR REPLACE] BRANCH [IF NOT EXISTS] name [AS OF VERSION <integer>]
   ALTER TABLE t DROP BRANCH [IF EXISTS] name
   ALTER TABLE t FAST FORWARD branch TO target
   SHOW BRANCHES (FROM | IN) t
   ```
   
   Implementation:
   - Define the interface and `TableBranch` value type under 
`sql/catalyst/.../connector/catalog/`.
   - Extend the ANTLR grammar with the four DDL forms; register `BRANCH`, 
`BRANCHES`, `FAST`, `FORWARD` as non-reserved keywords; update 
`docs/sql-ref-ansi-compliance.md`.
   - Add logical plans (`CreateBranch` / `DropBranch` / `FastForwardBranch` / 
`ShowBranches`) and exec nodes; dispatch through `ResolvedTable` in 
`DataSourceV2Strategy`.
   - Add `DataSourceV2Implicits.asBranchable` and 
`QueryCompilationErrors.tableDoesNotSupportBranchingError` so non-branching 
tables fail with a clear message.
   - Add error condition `CREATE_BRANCH_WITH_IF_NOT_EXISTS_AND_REPLACE` for the 
conflicting clauses.
   - Implement `SupportsBranching` on `InMemoryTable` for testing.
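
The conflicting-clause check mentioned in the list above can be sketched as a small pure function. This is an illustration only: `CreateBranchClauses`, `Mode`, and the use of `IllegalArgumentException` are hypothetical names and choices, and the PR itself presumably reports the condition through Spark's own error framework. Only the condition name `CREATE_BRANCH_WITH_IF_NOT_EXISTS_AND_REPLACE` comes from the description.

```java
// Hypothetical helper: maps the two optional clauses of
// ALTER TABLE ... CREATE [OR REPLACE] BRANCH [IF NOT EXISTS] to a single mode.
final class CreateBranchClauses {

    enum Mode { CREATE, CREATE_IF_NOT_EXISTS, CREATE_OR_REPLACE }

    static Mode resolve(boolean orReplace, boolean ifNotExists) {
        // OR REPLACE asks to overwrite an existing branch, IF NOT EXISTS asks to
        // leave it alone, so the combination has no sensible meaning.
        if (orReplace && ifNotExists) {
            throw new IllegalArgumentException(
                "CREATE_BRANCH_WITH_IF_NOT_EXISTS_AND_REPLACE: "
                    + "cannot combine OR REPLACE with IF NOT EXISTS");
        }
        if (orReplace) {
            return Mode.CREATE_OR_REPLACE;
        }
        return ifNotExists ? Mode.CREATE_IF_NOT_EXISTS : Mode.CREATE;
    }
}
```

Rejecting the combination at parse time, rather than picking one clause to win, keeps the statement's intent unambiguous for the user.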
   
   Reads and writes against a specific branch (`SELECT ... FOR BRANCH 'x'`, 
`INSERT ... FOR BRANCH 'x'`) are out of scope here and are added in the 
follow-up SPARK-57057.
   
   ### Why are the changes needed?
   
   Apache Iceberg and similar table formats support named branches as a 
first-class concept, but Spark today only exposes branching through 
connector-specific SQL extensions (e.g. `IcebergSparkSessionExtensions`). A 
standard DSv2 interface lets any data source expose branching through built-in 
Spark SQL, the same way `SupportsDelete` / `TruncatableTable` / 
`SupportsRowLevelOperations` standardize their respective capabilities.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. New SQL DDL is recognized:
   
   ```sql
   ALTER TABLE t CREATE [OR REPLACE] BRANCH [IF NOT EXISTS] name [AS OF VERSION <integer>]
   ALTER TABLE t DROP BRANCH [IF EXISTS] name
   ALTER TABLE t FAST FORWARD branch TO target
   SHOW BRANCHES (FROM | IN) t
   ```
   
   Data sources that do not implement `SupportsBranching` are otherwise 
unaffected; running the new DDL against them raises `AnalysisException` ("does 
not support branching"). Four new non-reserved keywords (`BRANCH`, `BRANCHES`, 
`FAST`, `FORWARD`) are added; they remain usable as identifiers in non-DDL contexts.
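
The failure path for non-branching tables amounts to a capability check on the resolved table. A minimal sketch follows, with the caveat that the nested `Table` and `SupportsBranching` interfaces are simplified stand-ins for the real DSv2 types, `BranchableCheck` is a hypothetical name, and `UnsupportedOperationException` substitutes for the `AnalysisException` that Spark raises.

```java
// Hypothetical, simplified model of the capability check behind the new DDL.
final class BranchableCheck {

    /** Stand-in for the DSv2 Table interface (only the name is modelled). */
    interface Table {
        String name();
    }

    /** Stand-in for the mix-in; branch methods are omitted for brevity. */
    interface SupportsBranching extends Table {
    }

    /** Returns the table as a branching table, or fails with a clear message. */
    static SupportsBranching asBranchable(Table table) {
        if (table instanceof SupportsBranching) {
            return (SupportsBranching) table;
        }
        throw new UnsupportedOperationException(
            "Table " + table.name() + " does not support branching");
    }
}
```

Because the check is a plain `instanceof` on the mix-in, existing connectors need no changes: they simply never pass it.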
   
   ### How was this patch tested?
   
   - `DDLParserSuite`: 5 new parser tests covering all four DDL forms and the 
`IF NOT EXISTS` / `OR REPLACE` conflict error.
   - `SupportsBranchingSuite`: 12 new integration tests exercising the new DDL 
end-to-end against `InMemoryTable`, including positive cases for each operation 
and negative cases for duplicates, missing branches, fast-forward direction, 
and non-branching tables.
   - All 152 existing tests in `DDLParserSuite` still pass; 
`TableIdentifierParserSuite` (7 tests) confirms the new keywords don't regress 
identifier handling.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Code (Claude Opus 4.7)

