Hi Everyone,
I'd like to discuss adding an endpoint to the REST SQL Gateway that exposes
Flink's parser as a side-effect-free service for splitting and classifying
multi-statement SQL scripts.
*Motivation*
Tools that submit multi-statement SQL scripts to the SQL Gateway today must
choose between three bad options for splitting them into individual
statements:
1. Naive ;-splitting — breaks on semicolons inside string literals, quoted
identifiers, comments, and STATEMENT SET BEGIN ... END blocks.
2. Custom client-side state machines — every tool reimplements the full
Flink lexer (string literals, escape rules, line/block comments,
statement-set detection). It's hundreds of lines of duplicated code that
drifts as Flink evolves and never quite matches the parser.
3. Per-statement Parser.parse() calls — validates against the catalog, so
it rejects any statement that references a table not yet defined. Can't be
used to split scripts where DDL precedes DML.
Each of these carries a real cost: subtly wrong splitting (semicolons in
string literals broken, STATEMENT SET blocks fragmented), dead code paths
that diverge from Flink's grammar, or a hard dependency on the catalog
being pre-populated.
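To make the first failure mode concrete, here is a minimal Python sketch of naive ;-splitting. The script and the `naive_split` helper are made up for illustration and are not taken from any existing tool.

```python
# Hypothetical two-statement script; the first statement contains a
# semicolon inside a string literal.
script = "INSERT INTO sink VALUES ('a;b'); SELECT 1;"


def naive_split(sql: str) -> list[str]:
    """Split on every semicolon, ignoring SQL lexical structure."""
    return [part.strip() for part in sql.split(";") if part.strip()]


fragments = naive_split(script)

# The literal 'a;b' is cut in half, so the script yields three
# fragments instead of the two statements it actually contains.
for fragment in fragments:
    print(repr(fragment))
```

Neither of the first two fragments is valid SQL on its own, which is the kind of breakage a parser-backed split is meant to avoid.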
The gateway already has the right tool — Calcite's parseStmtList() — but
it's not exposed.
*Proposal*
*A new endpoint:*
```
POST /sessions/{handle}/statements/split
```
*Request:*
```json
{"script": "CREATE TABLE src ...; INSERT INTO sink SELECT * FROM src;"}
```
*Response:*
```json
{
  "statements": [
    {"sql": "CREATE TABLE `src` ...", "producesPlan": false},
    {"sql": "INSERT INTO `sink`\nSELECT * FROM `src`", "producesPlan": true}
  ]
}
```
Backed by a new Parser.splitStatements(String) API that delegates to
parseStmtList() — the same parser used everywhere else, minus the catalog
resolution.
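From the client side, calling the proposed endpoint could look roughly like the Python sketch below. It only assumes the request and response shapes shown above. The base URL, the session handle value, and the helper names `build_split_request` and `parse_split_response` are hypothetical, and any version prefix the gateway requires is assumed to be part of the base URL.

```python
import json
import urllib.request


def build_split_request(base_url: str, session_handle: str, script: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the proposed endpoint."""
    url = f"{base_url}/sessions/{session_handle}/statements/split"
    body = json.dumps({"script": script}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_split_response(payload: str) -> list[tuple[str, bool]]:
    """Turn the response body into (sql, producesPlan) pairs."""
    doc = json.loads(payload)
    return [(stmt["sql"], stmt["producesPlan"]) for stmt in doc["statements"]]


# Assumed values for illustration only.
request = build_split_request(
    "http://localhost:8083",
    "my-session-handle",
    "CREATE TABLE src ...; INSERT INTO sink SELECT * FROM src;",
)
# Sending would be: urllib.request.urlopen(request).read().decode("utf-8")
```

The request is built separately from sending so the sketch can be inspected without a running gateway.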
*Key properties*
- Parser-aware — respects string literals, quoted identifiers, comments,
and STATEMENT SET BEGIN ... END blocks.
- Side-effect free — no catalog mutation, no DDL execution, no session
state changes. Safe to call from any session at any point.
- Catalog-independent — works on scripts that reference tables defined
later in the same script.
- Annotated output — producesPlan: true for INSERT, CTAS, REPLACE TABLE AS
SELECT, STATEMENT SET, EXECUTE STATEMENT SET — anything eligible for
COMPILE PLAN FOR, EXPLAIN, or EXECUTE.
*Why include a boolean for producesPlan?*
Many SQL clients need to know whether a statement can be passed to COMPILE
PLAN or EXPLAIN. Today that requires either a client-side copy of the
parsing logic or a trial call wrapped in try/catch. The flag exposes the
answer directly, using the engine's own parser.
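As a rough illustration of how a client might use the flag, the Python sketch below takes a response shaped like the example above and prepares EXPLAIN statements only for entries marked `producesPlan: true`. The helper name `explain_candidates` is hypothetical, and the sample payload simply mirrors the example response.

```python
import json

# Sample payload mirroring the example response in this proposal.
sample_response = json.dumps({
    "statements": [
        {"sql": "CREATE TABLE `src` ...", "producesPlan": False},
        {"sql": "INSERT INTO `sink`\nSELECT * FROM `src`", "producesPlan": True},
    ]
})


def explain_candidates(payload: str) -> list[str]:
    """Prefix EXPLAIN only to statements the gateway marked as plan-producing."""
    statements = json.loads(payload)["statements"]
    return [f"EXPLAIN {stmt['sql']}" for stmt in statements if stmt["producesPlan"]]


to_explain = explain_candidates(sample_response)
# Only the INSERT is selected; the CREATE TABLE entry is skipped.
for sql in to_explain:
    print(sql)
```

No client-side parsing or try/catch probing is needed here; the decision rests entirely on the flag returned by the gateway.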
Most REST endpoint changes don't require a FLIP, but I can open one if the
team thinks it is worthwhile. Otherwise, I can open a PR with our internal
changes.
Ryan van Huuksloot
Staff Engineer, Infrastructure | Streaming Platform