Hi everyone,

I'd like to discuss adding an endpoint to the REST SQL Gateway that exposes
Flink's parser as a side-effect-free service for splitting and classifying
multi-statement SQL scripts.

*Motivation*
Tools that submit multi-statement SQL scripts to the SQL Gateway today must
choose between three bad options for splitting them into individual
statements:

1. Naive ;-splitting: breaks on semicolons inside string literals, quoted
identifiers, comments, and STATEMENT SET BEGIN ... END blocks.
2. Custom client-side state machines: every tool reimplements the full
Flink lexer (string literals, escape rules, line/block comments,
statement-set detection). That is hundreds of lines of duplicated code that
drifts as Flink evolves and never quite matches the parser.
3. Per-statement Parser.parse() calls: parse() validates against the
catalog, so it rejects any statement that references a table not yet
defined. It can't be used to split scripts where DDL precedes DML.
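
To make option 1 concrete, here is a minimal Java sketch of what naive
splitting does to a script with a semicolon inside a string literal. The
class and method names are made up for illustration; this is not code from
Flink or from any particular client.

```java
import java.util.ArrayList;
import java.util.List;

final class NaiveSplitDemo {

    // Naive approach: cut at every semicolon, ignoring SQL lexical rules.
    static List<String> naiveSplit(String script) {
        List<String> parts = new ArrayList<>();
        for (String part : script.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Two statements; the first contains ';' inside a string literal.
        String script = "INSERT INTO sink SELECT 'a;b' FROM src; SELECT 1;";
        List<String> parts = naiveSplit(script);
        // Yields 3 fragments instead of 2; the literal 'a;b' is cut in half.
        System.out.println(parts.size());
        parts.forEach(System.out::println);
    }
}
```

The first two fragments are not valid SQL on their own, which is the failure
mode a parser-backed split would avoid.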

Each of these carries a real cost: subtly wrong splitting (string literals
cut at embedded semicolons, STATEMENT SET blocks fragmented), duplicated
client code that diverges from Flink's grammar, or a hard dependency on the
catalog being pre-populated.

The gateway already has the right tool — Calcite's parseStmtList() — but
it's not exposed.

*Proposal*
*A new endpoint:*
```
POST /sessions/{handle}/statements/split
```

*Request:*
```json
{"script": "CREATE TABLE src ...; INSERT INTO sink SELECT * FROM src;"}
```

*Response:*
```json
{
  "statements": [
    {"sql": "CREATE TABLE `src` ...", "producesPlan": false},
    {"sql": "INSERT INTO `sink`\nSELECT * FROM `src`", "producesPlan": true}
  ]
}
```

The endpoint would be backed by a new Parser.splitStatements(String) API
that delegates to parseStmtList(): the same parser used everywhere else,
minus the catalog resolution.
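
For a sense of the client side, here is a rough Java sketch that builds (but
does not send) a request for the proposed endpoint using the JDK's
java.net.http API. The base URL, the session handle value, and the helper
names are assumptions for illustration; only the path suffix and the
"script" field come from the proposal above.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class SplitRequestSketch {

    // Minimal JSON string escaping so the script survives quotes and newlines.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Request body in the shape shown in the proposal: {"script": "..."}
    static String toJsonBody(String script) {
        return "{\"script\": " + jsonString(script) + "}";
    }

    // baseUrl is an assumption (e.g. a local gateway address plus any prefix).
    static HttpRequest buildSplitRequest(String baseUrl, String sessionHandle, String script) {
        URI uri = URI.create(baseUrl + "/sessions/" + sessionHandle + "/statements/split");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(toJsonBody(script), StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildSplitRequest(
                "http://localhost:8083",   // hypothetical gateway address
                "my-session-handle",       // hypothetical session handle
                "CREATE TABLE src ...; INSERT INTO sink SELECT * FROM src;");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request and decoding the response is left out on purpose, since
the response handling depends on whichever JSON library the client already
uses.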

*Key properties*
- Parser-aware — respects string literals, quoted identifiers, comments,
and STATEMENT SET BEGIN ... END blocks.
- Side-effect free — no catalog mutation, no DDL execution, no session
state changes. Safe to call from any session at any point.
- Catalog-independent — works on scripts that reference tables defined
later in the same script.
- Annotated output — producesPlan: true for INSERT, CTAS, REPLACE TABLE AS
SELECT, STATEMENT SET, EXECUTE STATEMENT SET — anything eligible for
COMPILE PLAN FOR, EXPLAIN, or EXECUTE.

*Why include a boolean for producesPlan?*
Many SQL clients want to know whether they can run COMPILE PLAN or EXPLAIN
on a given statement. Today, answering that requires either a clone of the
parsing logic or a try/catch around the attempt. The flag gives clients a
simple answer that comes directly from the engine's parser.
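
As a rough illustration of how a client might consume the flag, the Java
sketch below filters split results down to the statements worth prefixing
with EXPLAIN. The SplitStatement class is a hypothetical client-side mirror
of one entry in the proposed response, not an existing Flink type.

```java
import java.util.ArrayList;
import java.util.List;

final class ProducesPlanSketch {

    // Hypothetical client-side view of one element of "statements".
    static final class SplitStatement {
        final String sql;
        final boolean producesPlan;

        SplitStatement(String sql, boolean producesPlan) {
            this.sql = sql;
            this.producesPlan = producesPlan;
        }
    }

    // Build EXPLAIN statements only for entries flagged as plan-producing,
    // so the client never has to guess and catch a failure.
    static List<String> explainCandidates(List<SplitStatement> statements) {
        List<String> out = new ArrayList<>();
        for (SplitStatement s : statements) {
            if (s.producesPlan) {
                out.add("EXPLAIN " + s.sql);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Values copied from the example response in the proposal.
        List<SplitStatement> statements = List.of(
                new SplitStatement("CREATE TABLE `src` ...", false),
                new SplitStatement("INSERT INTO `sink`\nSELECT * FROM `src`", true));
        explainCandidates(statements).forEach(System.out::println);
    }
}
```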

Most REST endpoint changes don't require a FLIP, but I can open one if the
team thinks it is worthwhile. Otherwise, I can open a PR with our internal
changes.

Ryan van Huuksloot
Staff Engineer, Infrastructure | Streaming Platform