This is an automated email from the ASF dual-hosted git repository.
dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git
The following commit(s) were added to refs/heads/master by this push:
new 87aa62c1992 [feature]support recusive cte (#3206)
87aa62c1992 is described below
commit 87aa62c199288a84fded5d9b8f310837df0f740b
Author: starocean999 <[email protected]>
AuthorDate: Tue Feb 10 08:05:41 2026 +0800
[feature]support recusive cte (#3206)
## Versions
- [x] dev
- [ ] 4.x
- [ ] 3.x
- [ ] 2.1
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
---
docs/query-data/cte.md | 118 ++++++++++++++++++--
.../current/query-data/cte.md | 120 +++++++++++++++++++--
2 files changed, 226 insertions(+), 12 deletions(-)
diff --git a/docs/query-data/cte.md b/docs/query-data/cte.md
index e4398f6216c..b339b6dc9bb 100644
--- a/docs/query-data/cte.md
+++ b/docs/query-data/cte.md
@@ -40,12 +40,118 @@ SELECT b, d FROM cte1 JOIN cte2
WHERE cte1.a = cte2.c;
```
-### Recursive CTE (NOT Support)
+### Recursive CTE
+A recursive CTE (Common Table Expression with the `RECURSIVE` keyword) is used
to express self-referential queries within a single SQL statement, and is
commonly applied in scenarios such as tree/hierarchy traversal, graph
traversal, and hierarchical aggregation. A recursive CTE consists of two parts:
+
+- **Anchor query**: The non-recursive part, executed once to generate the
initial row set (seed).
+- **Recursive query**: Can reference the CTE itself and continue generating
new rows based on the new rows produced in the previous iteration.
+
+The anchor and recursive parts are typically connected by `UNION` or `UNION
ALL`. Recursive execution continues until no new rows are generated or a system
limit is reached.
+
+## Syntax
+
+```sql
+WITH [RECURSIVE] cte_name [(col1, col2, ...)] AS (
+ <anchor_query> -- Non-recursive part (executed once)
+ UNION [ALL]
+ <recursive_query> -- Recursive part that can reference cte_name
+)
+SELECT ... FROM cte_name;
+```
+
+Key Points:
+- The `RECURSIVE` keyword allows the CTE definition to reference itself.
+- The number of columns and their data types output by the anchor and
recursive members must be strictly consistent.
+- The `cte_name` can be referenced in the `recursive_query`, usually used in
the form of a `JOIN`.
+
+## Execution Semantics (Iterative Model)
+
+Typical iterative execution flow:
+1. Execute the `anchor_query`, write the results to the output set (Output),
and use them as the work set (WorkSet) for the first iteration.
+2. While the WorkSet is not empty:
+ - Use the WorkSet as input for the `recursive_query`, execute the
`recursive_query`, and obtain `newRows`.
+ - If `UNION ALL` is used: Directly append `newRows` to the Output and set
`newRows` as the WorkSet for the next iteration.
+ - If `UNION` (deduplication) is used: Compute the difference set between
`newRows` and the existing Output (to remove duplicates), and only add the
non-duplicate rows to the Output and the next iteration's WorkSet.
+3. Repeat step 2 until `newRows` is empty or a preset system upper limit is
triggered (the Doris session variable `cte_max_recursion_depth` limits the
recursion depth, with a default value of 100; exceeding this will throw an
error).
+
+Termination occurs when no new rows are generated in the current iteration (or
the system's maximum recursion depth limit is reached).
+
+## UNION vs UNION ALL
+
+- `UNION ALL`: Retains duplicates and has low execution overhead (no
deduplication required). Suitable for scenarios where duplicates are allowed or
controlled by business logic in the backend.
+- `UNION`: Implicitly performs deduplication, which adds sorting/hash-based
deduplication overhead per iteration or globally—this cost is significant,
especially with large data volumes.
+
+Recommendation: Prefer `UNION ALL` if the semantics allow it and duplicates
can be post-processed at the application layer.
+
+## Common Use Cases and SQL Examples
+
+### 1) Simple Hierarchy Traversal
```sql
-WITH r_cte AS (
- SELECT 1 AS user_id, 2 as manager_id
- UNION ALL
- SELECT user_id, manager_id FROM r_cte INNER JOIN (SELECT 1 AS user_id, 2 as
manager_id) t ON r_cte.manager_id = t.user_id
+CREATE TABLE tree
+(
+ id int,
+ parent_id int,
+ data varchar(100)
+) DUPLICATE KEY (id)
+DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES ('replication_num' = '1');
+
+INSERT INTO tree VALUES (0, NULL, 'ROOT'), (1, 0, 'Child_1'), (2, 0,
'Child_2'), (3, 1, 'Child_1_1');
+
+WITH RECURSIVE search_tree AS (
+ SELECT id, parent_id, data
+ FROM tree t
+ WHERE t.id = 0
+UNION ALL
+ SELECT t.id, t.parent_id, t.data
+ FROM tree t, search_tree st
+ WHERE t.parent_id = st.id
+)
+SELECT * FROM search_tree ORDER BY id;
+```
+
+### 2) Graph Traversal
+```sql
+CREATE TABLE graph
+(
+ c_from int,
+ c_to int,
+ label varchar(100)
+) DUPLICATE KEY (c_from) DISTRIBUTED BY HASH(c_from) BUCKETS 1 PROPERTIES
'replication_num' = '1';
+
+INSERT INTO graph VALUES (1, 2, '1 -> 2'), (1, 3, '1 -> 3'), (2, 3, '2 -> 3'),
(1, 4, '1 -> 4'), (4, 5, '4 -> 5');
+
+WITH RECURSIVE search_graph AS (
+ SELECT c_from, c_to, label FROM graph g
+UNION ALL
+ SELECT g.c_from, g.c_to, g.label
+ FROM graph g, search_graph sg
+ WHERE g.c_from = sg.c_to
)
-SELECT * FROM r_cte
+SELECT DISTINCT * FROM search_graph ORDER BY c_from, c_to;
+```
+
+Note: Using `UNION` performs deduplication in each iteration, resulting in
high overhead.
+
+## Limitations of Recursive CTEs
+
+- The top-level operator of the internal query must be UNION(ALL).
+- Subqueries in the non-recursive part cannot reference the recursive CTE
itself.
+- Subqueries in the recursive part can only reference the recursive CTE once.
+- If a subquery within the recursive part contains another nested subquery,
the nested subquery cannot reference the recursive CTE.
+- The data types of the output columns of a recursive CTE are determined by
the output of the non-recursive subquery. An error will be thrown if the data
types of the recursive and non-recursive sides do not match—manual casting is
required to ensure consistency between the two sides.
+- The session variable `cte_max_recursion_depth` limits the maximum number of
recursions to prevent infinite loops (default value: 100).
+
+## Common Errors, Causes, and Solutions
+
+### 1. Error: Mismatched number of columns or data types between anchor and
recursive members
+- **Cause**: The number of columns or their data types in the `SELECT` clauses
of the two parts are inconsistent.
+- **Solution**: Ensure the number, order, and data types of columns on both
sides are consistent. Use `CAST` or explicit column names if necessary.
+
+### 2. Error: Illegal self-reference in the anchor query
+- **Cause**: The anchor query is not allowed to reference the CTE itself.
+- **Solution**: Reference the CTE only in the recursive member; check the
syntax/parse tree.
+
+### 3. Error: Infinite recursion / Exceeded maximum recursion depth
+- **Cause**: The recursion lacks a convergence condition or the convergence
condition is incorrectly configured.
+- **Solution**: Add a `WHERE` filter, adjust the system's maximum recursion
depth, or correct the query logic if infinite recursion is inherent to the
logic.
\ No newline at end of file
diff --git
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/cte.md
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/cte.md
index 7d78ab88f77..8d211129939 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/cte.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/cte.md
@@ -40,12 +40,120 @@ SELECT b, d FROM cte1 JOIN cte2
WHERE cte1.a = cte2.c;
```
-### 递归 CTE(Doris 不支持)
+### 递归 CTE
+
+递归 CTE(Common Table Expression with `RECURSIVE`)用于在单条 SQL
内表达自引用的查询,常用于树/层级遍历、图遍历和分层聚合等场景。递归 CTE 由两部分组成:
+
+- 锚点(anchor)查询:非递归部分,执行一次生成初始行集(seed)。
+- 递归(recursive)查询:可以引用 CTE 本体,基于上轮产生的新行继续生成新行。
+
+锚点与递归部分通常由 `UNION` 或 `UNION ALL` 连接。递归执行直到不再产生新行或达到系统限制。
+
+## 语法
+
+```sql
+WITH [RECURSIVE] cte_name [(col1, col2, ...)] AS (
+ <anchor_query> -- 非递归部分(一次执行)
+ UNION [ALL]
+ <recursive_query> -- 可引用 cte_name 的递归部分
+)
+SELECT ... FROM cte_name;
+```
+
+要点:
+
+- `RECURSIVE` 关键字允许在 CTE 定义中引用自身。
+- 锚点和递归成员输出的列数和类型必须严格一致。
+- `recursive_query` 中能引用 `cte_name`,通常以 `JOIN`形式使用。
+
+## 执行语义(迭代模型)
+
+典型迭代执行流程:
+
+1. 执行 `anchor_query`,将结果写入输出集合(Output)并作为首轮的工作集合(WorkSet)。
+2. 当 WorkSet 非空时:
+ - 用 WorkSet 作为 `recursive_query` 的输入,执行 `recursive_query`,得到 `newRows`。
+ - 若使用 `UNION ALL`:直接将 `newRows` 追加到 Output,并把 `newRows` 作为下一轮的 WorkSet。
+ - 若使用 `UNION`(去重):对 `newRows` 与已有 Output 做差集(去重),只将未出现的行加入 Output 与下一轮
WorkSet。
+3. 重复步骤 2,直到 `newRows` 为空或触发系统预设的上限(Doris
session变量限制递归深度`cte_max_recursion_depth` 默认值为100,超出会抛错)。
+
+终止当当前轮没有新行被产生(或达到系统最大递归深度限制)。
+
+## UNION vs UNION ALL
+
+- `UNION ALL`:保留重复,执行开销低(无需去重)。适用于允许重复或在后端由业务逻辑控制重复的场景。
+- `UNION`:隐含去重,会在每轮或全局增加排序/哈希去重开销,代价显著,尤其是在大数据量下。
+
+建议:如果语义允许且能在应用层后处理重复,优先使用 `UNION ALL`。
+
+## 常见用例与 SQL 示例
+
+1) 简单层级遍历:
```sql
-WITH r_cte AS (
- SELECT 1 AS user_id, 2 as manager_id
- UNION ALL
- SELECT user_id, manager_id FROM r_cte INNER JOIN (SELECT 1 AS user_id, 2 as
manager_id) t ON r_cte.manager_id = t.user_id
+CREATE TABLE tree
+(
+ id int,
+ parent_id int,
+ data varchar(100)
+) DUPLICATE KEY (id)
+DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES ('replication_num' = '1');
+
+INSERT INTO tree VALUES (0, NULL, 'ROOT'), (1, 0, 'Child_1'), (2, 0,
'Child_2'), (3, 1, 'Child_1_1');
+
+WITH RECURSIVE search_tree AS (
+ SELECT id, parent_id, data
+ FROM tree t
+ WHERE t.id = 0
+UNION ALL
+ SELECT t.id, t.parent_id, t.data
+ FROM tree t, search_tree st
+ WHERE t.parent_id = st.id
)
-SELECT * FROM r_cte
+SELECT * FROM search_tree order BY id;
+```
+
+2) 图遍历:
+
+```sql
+CREATE TABLE graph
+(
+ c_from int,
+ c_to int,
+ label varchar(100)
+) DUPLICATE KEY (c_from) DISTRIBUTED BY HASH(c_from) BUCKETS 1 PROPERTIES
'replication_num' = '1');
+
+INSERT INTO graph VALUES (1, 2, '1 -> 2'), (1, 3, '1 -> 3'), (2, 3, '2 -> 3'),
(1, 4, '1 -> 4'), (4, 5, '4 -> 5');
+
+WITH RECURSIVE search_graph AS (
+ SELECT c_from, c_to, label FROM graph g
+UNION ALL
+ SELECT g.c_from, g.c_to, g.label
+ FROM graph g, search_graph sg
+ WHERE g.c_from = sg.c_to
+)
+SELECT DISTINCT * FROM search_graph ORDER BY c_from, c_to;
+```
+
+注意:使用 `UNION` 会在每轮做去重,代价较高。
+
+## 递归CTE的限制
+
+- 内部查询顶层操作符必须是UNION(ALL)。
+- 非递归部分的子查询不能引用递归CTE自身。
+- 递归部分的子查询只能引用递归CTE一次。
+- 递归部分的子查询如果内部还有子查询,那内部的子查询不能引用递归CTE。
+- 递归CTE的输出列类型由非递归侧的子查询输出决定,如果递归侧和非递归侧类型不一致,会报错。需要人工cast保证两边数据类型一致。
+- session变量`cte_max_recursion_depth`,限制最大递归次数,防止死循环,默认值是100。
+
+## 常见错误、原因与解决办法
+
+1. 错误:锚点与递归成员列数或类型不匹配
+ - 原因:两部分 `SELECT` 列数或列类型不一致。
+ - 解决:确保两侧列数、顺序与类型一致,必要时使用 `CAST` 或显式列名。
+2. 错误:锚点引用自身(非法)
+ - 原因:锚点不允许引用 CTE 本体。
+ - 解决:只在递归成员中引用 CTE;检查语法/解析树。
+3. 错误:无限递归 / 超过最大递归深度
+ - 原因:递归没有收敛条件或收敛条件设置不正确。
+ - 解决:添加`WHERE` 过滤、或调整系统最大递归深度;若逻辑确实无限递归需修正查询逻辑。
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]