Viggo Chen created CALCITE-7213:
-----------------------------------
Summary: StackOverflowError in SqlValidatorImpl with long OR/AND
predicate chains
Key: CALCITE-7213
URL: https://issues.apache.org/jira/browse/CALCITE-7213
Project: Calcite
Issue Type: Bug
Reporter: Viggo Chen
h1. Problem Description:
Apache Calcite's SQL validation stage is vulnerable to
`java.lang.StackOverflowError` when processing queries that contain a large
number of chained logical operators (`AND`/`OR`). This occurs because the
parser generates a deeply nested, unbalanced Abstract Syntax Tree (AST) for
these predicates. Subsequent recursive traversal during the validation phase
exhausts the JVM's call stack, causing the process to fail.
This issue prevents Calcite from handling complex, often machine-generated, SQL
queries, which are prevalent in modern data systems.
Steps to Reproduce:
The issue can be reliably reproduced with a test case that generates a long
chain of `OR` conditions. The following Java code creates a query with 5000
`OR` predicates, which is sufficient to trigger the stack overflow during the
validation step.
{code:java}
// Test case to generate the problematic SQL
final String whereClause = java.util.stream.IntStream.range(0, 5000)
.mapToObj(i -> "sal = " + i)
.collect(java.util.stream.Collectors.joining(" OR "));
// This will throw StackOverflowError during the validation phase
sql("SELECT * FROM emp WHERE " + whereClause).ok(); {code}
Generated SQL:
{code:java}
SELECT * FROM emp WHERE sal = 0 OR sal = 1 OR sal = 2 OR ... OR sal = 4999
{code}
h1. Stack Trace Analysis:
A typical stack trace points to a deep recursion within the `SqlValidatorImpl`
component, demonstrating that the failure happens during the validation and
semantic analysis of the expression tree.
{code:java}
java.lang.StackOverflowError
at org.apache.calcite.sql.SqlBasicCall.getKind(SqlBasicCall.java:83)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.performUnconditionalRewrites(SqlValidatorImpl.java:1458)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.performUnconditionalRewrites(SqlValidatorImpl.java:1473)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.performUnconditionalRewrites(SqlValidatorImpl.java:1473)
// ... repeats thousands of times {code}
h1. Impact on System Stability:
While a query with thousands of chained predicates may be considered
suboptimal, the framework's response should be a graceful error (e.g., a
validation exception), not a catastrophic failure. A
`java.lang.StackOverflowError` is a critical, non-recoverable error that can
terminate the executing thread and potentially bring down the entire service.
This transforms a query-level issue into a severe system stability and
reliability problem.
h1. Suggested Solutions:
I propose a two-part approach: an immediate defensive measure to improve
stability, and long-term solutions to fundamentally fix the issue.
h2. Part 1: Immediate Defensive Measure (Mitigation)
Introduce a configurable limit on the maximum depth of the AST. This acts as a
"safety fuse" to prevent stack overflows.
- Mechanism: After parsing, or at the very beginning of the validation phase,
perform a quick, non-recursive depth check of the `SqlNode` tree.
- Configuration: This depth limit should be configurable (e.g., via
`SqlParser.Config` or `SqlValidator.Config`). The default value should be set
to a safe but reasonable number (e.g., 1000).
- Action: If the AST depth exceeds the configured threshold, Calcite should
immediately throw a standard `SqlParseException` or `SqlValidatorException`
with a clear error message, such as "Query exceeds maximum expression depth
limit of [N]".
This provides users with an immediate way to protect their systems from
problematic queries without waiting for a full architectural fix.
h2. Part 2: Fundamental Fixes (Long-Term Solutions)
To properly support these complex queries, the underlying cause of the deep
recursion must be addressed.
1. Adopt Iterative Traversal (Recommended): The most robust solution is to
refactor the recursive algorithms within `SqlValidatorImpl` and other core
components to use an explicit, heap-allocated stack (e.g., `java.util.Deque`).
This converts the recursive traversal into an iterative one, which is not
constrained by the call stack depth and can handle expression trees of any
complexity. This holistically solves this class of problems.
2. Balanced Tree Construction: An alternative is to modify the parser's
behavior. Instead of generating a left-deep tree for chained operators like `a
OP b OP c OP d`, it could produce a balanced tree: `OP(OP(a, b), OP(c, d))`. A
balanced tree's depth grows logarithmically (`log2(N)`) with the number of
operators, which would naturally prevent stack overflows for even a very large
number of predicates. This could be a configurable parser feature.
By implementing the defensive measure from Part 1, we can provide immediate
stability. Following up with one of the fundamental fixes from Part 2 will
ensure Calcite's long-term resilience and scalability.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)