Hi Calcite devs,

I would like to start a discussion around *error handling semantics at SQL statement level* in Calcite.

------------------------------

1. Background
In modern data processing systems (both batch and streaming), handling malformed or partially invalid data is a common requirement. Typical issues include:

- malformed JSON / structured payloads
- type mismatches during casting
- schema evolution inconsistencies
- runtime exceptions in user-defined functions

Currently, Calcite provides limited support for error handling via:

- expression-level constructs (e.g., TRY_CAST)
- NULL propagation semantics

However, these approaches are limited to *expression-level error tolerance*.

------------------------------

2. Problem Statement
There is currently no way to express *statement-level error handling semantics* in SQL within Calcite. Specifically:

2.1 No abstraction beyond expression level
Error handling must be embedded into individual expressions:

    TRY_CAST(col AS INT)

This leads to:

- verbose queries
- duplicated logic
- lack of composability

------------------------------

2.2 No structured error propagation
There is no way to:

- capture error context
- classify errors
- propagate error metadata alongside query execution

------------------------------

2.3 No extensibility for downstream systems
Many systems built on Calcite (e.g., streaming engines, data processing frameworks) require more advanced error handling capabilities, but currently must implement them outside SQL.

------------------------------

3. Discussion Proposal
I would like to explore whether Calcite should support a more general abstraction for error handling at SQL level.
Some possible directions:

------------------------------

Option A: Statement-level TRY semantics

    SELECT * FROM TRY(source_table)

Semantics:

- failed records are skipped or handled based on policy

------------------------------

Option B: Error handling clause (conceptual)

    SELECT * FROM source_table
    HANDLE ERRORS WITH <policy>

Where <policy> could define:

- ignore
- nullify
- propagate
- custom handling

------------------------------

Option C: Error-aware relational operator
Introduce a logical abstraction such as:

    ErrorHandlingRelNode

Which could:

- wrap existing RelNodes
- attach error metadata
- allow downstream systems to interpret error semantics

------------------------------

4. Key Questions
I would appreciate feedback on:

1. Should Calcite support error handling beyond expression-level?
2. Is there prior discussion or design work in this area?
3. Would a logical operator (instead of SQL syntax) be more appropriate?
4. How should this interact with relational algebra assumptions (single output, determinism)?
5. Should Calcite remain minimal and leave this entirely to downstream systems?

------------------------------

5. Motivation
The goal is not to introduce engine-specific features, but to explore whether Calcite should provide a *generic abstraction layer* for error handling that downstream systems can leverage.

------------------------------

Closing
I am interested in hearing thoughts on whether this direction aligns with Calcite's design goals, and whether such an extension would be considered in scope.

Thanks!
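
P.S. To make the policy idea in Option B a bit more concrete, here is a toy, language-neutral model of the "ignore", "nullify" and "propagate" policies applied per row. It is plain Python and deliberately uses no Calcite APIs. The names ErrorPolicy and apply_with_policy are hypothetical, and reading "nullify" as "emit NULL in place of the failing value" is only an assumption for the sake of discussion, not a proposed definition.

```python
from enum import Enum
from typing import Any, Callable, Iterable, Iterator


class ErrorPolicy(Enum):
    """Hypothetical per-row policies, mirroring the list in Option B."""
    IGNORE = "ignore"        # drop the row that failed
    NULLIFY = "nullify"      # assumption: emit None (NULL) for the failing value
    PROPAGATE = "propagate"  # re-raise, so the whole statement fails


def apply_with_policy(
    rows: Iterable[Any],
    fn: Callable[[Any], Any],
    policy: ErrorPolicy,
) -> Iterator[Any]:
    """Apply fn to each row, handling per-row failures according to policy."""
    for row in rows:
        try:
            yield fn(row)
        except Exception:
            if policy is ErrorPolicy.IGNORE:
                continue          # skip the bad row entirely
            if policy is ErrorPolicy.NULLIFY:
                yield None        # keep the row position, lose the value
                continue
            raise                 # PROPAGATE: surface the original error


if __name__ == "__main__":
    source = ["1", "x", "3"]      # "x" cannot be cast to int
    for policy in (ErrorPolicy.IGNORE, ErrorPolicy.NULLIFY):
        print(policy.value, list(apply_with_policy(source, int, policy)))
```

With the sample input above, "ignore" yields [1, 3] and "nullify" yields [1, None, 3], while "propagate" raises the underlying ValueError. The point of the sketch is only that the policy lives outside the expression (here, int), which is the separation the statement-level options are trying to achieve.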
