Viggo Chen created CALCITE-7213:
-----------------------------------

             Summary: StackOverflowError in SqlValidatorImpl with long OR/AND 
predicate chains
                 Key: CALCITE-7213
                 URL: https://issues.apache.org/jira/browse/CALCITE-7213
             Project: Calcite
          Issue Type: Bug
            Reporter: Viggo Chen


h1. Problem Description:
Apache Calcite's SQL validation stage is vulnerable to 
`java.lang.StackOverflowError` when processing queries that contain a large 
number of chained logical operators (`AND`/`OR`). This occurs because the 
parser generates a deeply nested, unbalanced Abstract Syntax Tree (AST) for 
these predicates. Subsequent recursive traversal during the validation phase 
exhausts the JVM's call stack, causing the process to fail.
This issue prevents Calcite from handling complex, often machine-generated, SQL 
queries, which are prevalent in modern data systems.
Steps to Reproduce:
The issue can be reliably reproduced with a test case that generates a long 
chain of `OR` conditions. The following Java code creates a query with 5000 
`OR` predicates, which is sufficient to trigger the stack overflow during the 
validation step.
{code:java}
// Test case to generate the problematic SQL
final String whereClause = java.util.stream.IntStream.range(0, 5000)
    .mapToObj(i -> "sal = " + i)
    .collect(java.util.stream.Collectors.joining(" OR "));
// This will throw StackOverflowError during the validation phase
sql("SELECT * FROM emp WHERE " + whereClause).ok(); {code}
Generated SQL:
{code:java}
SELECT * FROM emp WHERE sal = 0 OR sal = 1 OR sal = 2 OR ... OR sal = 4999 
{code}
h1. Stack Trace Analysis:
A typical stack trace points to a deep recursion within the `SqlValidatorImpl` 
component, demonstrating that the failure happens during the validation and 
semantic analysis of the expression tree.
{code:java}
java.lang.StackOverflowError
at org.apache.calcite.sql.SqlBasicCall.getKind(SqlBasicCall.java:83)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.performUnconditionalRewrites(SqlValidatorImpl.java:1458)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.performUnconditionalRewrites(SqlValidatorImpl.java:1473)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.performUnconditionalRewrites(SqlValidatorImpl.java:1473)
// ... repeats thousands of times {code}
h1. Impact on System Stability:

While a query with thousands of chained predicates may be considered 
suboptimal, the framework's response should be a graceful error (e.g., a 
validation exception), not a catastrophic failure. A 
`java.lang.StackOverflowError` is a critical, non-recoverable error that can 
terminate the executing thread and potentially bring down the entire service. 
This transforms a query-level issue into a severe system stability and 
reliability problem.
 
h1. Suggested Solutions:

I propose a two-part approach: an immediate defensive measure to improve 
stability, and long-term solutions to fundamentally fix the issue.
h2. Part 1: Immediate Defensive Measure (Mitigation)
Introduce a configurable limit on the maximum depth of the AST. This acts as a 
"safety fuse" to prevent stack overflows.
- Mechanism: After parsing, or at the very beginning of the validation phase, 
perform a quick, non-recursive depth check of the `SqlNode` tree.
- Configuration: This depth limit should be configurable (e.g., via 
`SqlParser.Config` or `SqlValidator.Config`). The default value should be set 
to a safe but reasonable number (e.g., 1000).
- Action: If the AST depth exceeds the configured threshold, Calcite should 
immediately throw a standard `SqlParseException` or `SqlValidatorException` 
with a clear error message, such as "Query exceeds maximum expression depth 
limit of [N]".
This provides users with an immediate way to protect their systems from 
problematic queries without waiting for a full architectural fix.
h2. Part 2: Fundamental Fixes (Long-Term Solutions)
To properly support these complex queries, the underlying cause of the deep 
recursion must be addressed.
1. Adopt Iterative Traversal (Recommended): The most robust solution is to 
refactor the recursive algorithms within `SqlValidatorImpl` and other core 
components to use an explicit, heap-allocated stack (e.g., `java.util.Deque`). 
This converts the recursive traversal into an iterative one, which is not 
constrained by the call stack depth and can handle expression trees of any 
complexity. This holistically solves this class of problems.
2. Balanced Tree Construction: An alternative is to modify the parser's 
behavior. Instead of generating a left-deep tree for chained operators like `a 
OP b OP c OP d`, it could produce a balanced tree: `OP(OP(a, b), OP(c, d))`. A 
balanced tree's depth grows logarithmically (`log2(N)`) with the number of 
operators, which would naturally prevent stack overflows for even a very large 
number of predicates. This could be a configurable parser feature.
By implementing the defensive measure from Part 1, we can provide immediate 
stability. Following up with one of the fundamental fixes from Part 2 will 
ensure Calcite's long-term resilience and scalability.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to