(doris-website) branch master updated: [feature]support recusive cte (#3206)

dataroaring Mon, 09 Feb 2026 16:07:48 -0800

This is an automated email from the ASF dual-hosted git repository.

dataroaring pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris-website.git



The following commit(s) were added to refs/heads/master by this push:
     new 87aa62c1992 [feature]support recusive cte (#3206)
87aa62c1992 is described below

commit 87aa62c199288a84fded5d9b8f310837df0f740b
Author: starocean999 <[email protected]>
AuthorDate: Tue Feb 10 08:05:41 2026 +0800

    [feature]support recusive cte (#3206)
    
    ## Versions
    
    - [x] dev
    - [ ] 4.x
    - [ ] 3.x
    - [ ] 2.1
    
    ## Languages
    
    - [x] Chinese
    - [x] English
    
    ## Docs Checklist
    
    - [ ] Checked by AI
    - [ ] Test Cases Built
---
 docs/query-data/cte.md                             | 118 ++++++++++++++++++--
 .../current/query-data/cte.md                      | 120 +++++++++++++++++++--
 2 files changed, 226 insertions(+), 12 deletions(-)

diff --git a/docs/query-data/cte.md b/docs/query-data/cte.md
index e4398f6216c..b339b6dc9bb 100644
--- a/docs/query-data/cte.md
+++ b/docs/query-data/cte.md
@@ -40,12 +40,118 @@ SELECT b, d FROM cte1 JOIN cte2
 WHERE cte1.a = cte2.c;
 ```
 
-### Recursive CTE (NOT Support)
+### Recursive CTE
 
+A recursive CTE (Common Table Expression with the `RECURSIVE` keyword) is used 
to express self-referential queries within a single SQL statement, and is 
commonly applied in scenarios such as tree/hierarchy traversal, graph 
traversal, and hierarchical aggregation. A recursive CTE consists of two parts:
+
+- **Anchor query**: The non-recursive part, executed once to generate the 
initial row set (seed).
+- **Recursive query**: Can reference the CTE itself and continue generating 
new rows based on the new rows produced in the previous iteration.
+
+The anchor and recursive parts are typically connected by `UNION` or `UNION 
ALL`. Recursive execution continues until no new rows are generated or a system 
limit is reached.
+
+## Syntax
+
+```sql
+WITH [RECURSIVE] cte_name [(col1, col2, ...)] AS (
+  <anchor_query>     -- Non-recursive part (executed once)
+  UNION [ALL]
+  <recursive_query>  -- Recursive part that can reference cte_name
+)
+SELECT ... FROM cte_name;
+```
+
+Key Points:
+- The `RECURSIVE` keyword allows the CTE definition to reference itself.
+- The number of columns and their data types output by the anchor and 
recursive members must be strictly consistent.
+- The `cte_name` can be referenced in the `recursive_query`, usually used in 
the form of a `JOIN`.
+
+## Execution Semantics (Iterative Model)
+
+Typical iterative execution flow:
+1. Execute the `anchor_query`, write the results to the output set (Output), 
and use them as the work set (WorkSet) for the first iteration.
+2. While the WorkSet is not empty:
+   - Use the WorkSet as input for the `recursive_query`, execute the 
`recursive_query`, and obtain `newRows`.
+   - If `UNION ALL` is used: Directly append `newRows` to the Output and set 
`newRows` as the WorkSet for the next iteration.
+   - If `UNION` (deduplication) is used: Compute the difference set between 
`newRows` and the existing Output (to remove duplicates), and only add the 
non-duplicate rows to the Output and the next iteration's WorkSet.
+3. Repeat step 2 until `newRows` is empty or a preset system upper limit is 
triggered (the Doris session variable `cte_max_recursion_depth` limits the 
recursion depth, with a default value of 100; exceeding this will throw an 
error).
+
+Termination occurs when no new rows are generated in the current iteration (or 
the system's maximum recursion depth limit is reached).
+
+## UNION vs UNION ALL
+
+- `UNION ALL`: Retains duplicates and has low execution overhead (no 
deduplication required). Suitable for scenarios where duplicates are allowed or 
controlled by business logic in the backend.
+- `UNION`: Implicitly performs deduplication, which adds sorting/hash-based 
deduplication overhead per iteration or globally—this cost is significant, 
especially with large data volumes.
+
+Recommendation: Prefer `UNION ALL` if the semantics allow it and duplicates 
can be post-processed at the application layer.
+
+## Common Use Cases and SQL Examples
+
+### 1) Simple Hierarchy Traversal
 ```sql
-WITH r_cte AS (
-  SELECT 1 AS user_id, 2 as manager_id
-  UNION ALL
-  SELECT user_id, manager_id FROM r_cte INNER JOIN (SELECT 1 AS user_id, 2 as 
manager_id) t ON r_cte.manager_id = t.user_id
+CREATE TABLE tree
+(
+    id int,
+    parent_id int,
+    data varchar(100)
+) DUPLICATE KEY (id)
+DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES ('replication_num' = '1');
+
+INSERT INTO tree VALUES (0, NULL, 'ROOT'), (1, 0, 'Child_1'), (2, 0, 
'Child_2'), (3, 1, 'Child_1_1');
+
+WITH RECURSIVE search_tree AS (
+    SELECT id, parent_id, data
+    FROM tree t
+    WHERE t.id = 0
+UNION ALL
+    SELECT t.id, t.parent_id, t.data
+    FROM tree t, search_tree st
+    WHERE t.parent_id = st.id
+)
+SELECT * FROM search_tree ORDER BY id;
+```
+
+### 2) Graph Traversal
+```sql
+CREATE TABLE graph
+(
+    c_from int,
+    c_to int,
+    label varchar(100)
+) DUPLICATE KEY (c_from) DISTRIBUTED BY HASH(c_from) BUCKETS 1 PROPERTIES 
'replication_num' = '1';
+
+INSERT INTO graph VALUES (1, 2, '1 -> 2'), (1, 3, '1 -> 3'), (2, 3, '2 -> 3'), 
(1, 4, '1 -> 4'), (4, 5, '4 -> 5');
+
+WITH RECURSIVE search_graph AS (
+    SELECT c_from, c_to, label FROM graph g
+UNION ALL
+    SELECT g.c_from, g.c_to, g.label
+    FROM graph g, search_graph sg
+    WHERE g.c_from = sg.c_to
 )
-SELECT * FROM r_cte
+SELECT DISTINCT * FROM search_graph ORDER BY c_from, c_to;
+```
+
+Note: Using `UNION` performs deduplication in each iteration, resulting in 
high overhead.
+
+## Limitations of Recursive CTEs
+
+- The top-level operator of the internal query must be UNION(ALL).
+- Subqueries in the non-recursive part cannot reference the recursive CTE 
itself.
+- Subqueries in the recursive part can only reference the recursive CTE once.
+- If a subquery within the recursive part contains another nested subquery, 
the nested subquery cannot reference the recursive CTE.
+- The data types of the output columns of a recursive CTE are determined by 
the output of the non-recursive subquery. An error will be thrown if the data 
types of the recursive and non-recursive sides do not match—manual casting is 
required to ensure consistency between the two sides.
+- The session variable `cte_max_recursion_depth` limits the maximum number of 
recursions to prevent infinite loops (default value: 100).
+
+## Common Errors, Causes, and Solutions
+
+### 1. Error: Mismatched number of columns or data types between anchor and 
recursive members
+- **Cause**: The number of columns or their data types in the `SELECT` clauses 
of the two parts are inconsistent.
+- **Solution**: Ensure the number, order, and data types of columns on both 
sides are consistent. Use `CAST` or explicit column names if necessary.
+
+### 2. Error: Illegal self-reference in the anchor query
+- **Cause**: The anchor query is not allowed to reference the CTE itself.
+- **Solution**: Reference the CTE only in the recursive member; check the 
syntax/parse tree.
+
+### 3. Error: Infinite recursion / Exceeded maximum recursion depth
+- **Cause**: The recursion lacks a convergence condition or the convergence 
condition is incorrectly configured.
+- **Solution**: Add a `WHERE` filter, adjust the system's maximum recursion 
depth, or correct the query logic if infinite recursion is inherent to the 
logic.
\ No newline at end of file
diff --git 
a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/cte.md 
b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/cte.md
index 7d78ab88f77..8d211129939 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/cte.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/query-data/cte.md
@@ -40,12 +40,120 @@ SELECT b, d FROM cte1 JOIN cte2
 WHERE cte1.a = cte2.c;
 ```
 
-### 递归 CTE（Doris 不支持）
+### 递归 CTE
+
+递归 CTE（Common Table Expression with `RECURSIVE`）用于在单条 SQL 
内表达自引用的查询，常用于树/层级遍历、图遍历和分层聚合等场景。递归 CTE 由两部分组成：
+
+- 锚点（anchor）查询：非递归部分，执行一次生成初始行集（seed）。
+- 递归（recursive）查询：可以引用 CTE 本体，基于上轮产生的新行继续生成新行。
+
+锚点与递归部分通常由 `UNION` 或 `UNION ALL` 连接。递归执行直到不再产生新行或达到系统限制。
+
+## 语法
+
+```sql
+WITH [RECURSIVE] cte_name [(col1, col2, ...)] AS (
+  <anchor_query>     -- 非递归部分（一次执行）
+  UNION [ALL]
+  <recursive_query>  -- 可引用 cte_name 的递归部分
+)
+SELECT ... FROM cte_name;
+```
+
+要点：
+
+- `RECURSIVE` 关键字允许在 CTE 定义中引用自身。
+- 锚点和递归成员输出的列数和类型必须严格一致。
+- `recursive_query` 中能引用 `cte_name`，通常以 `JOIN`形式使用。
+
+## 执行语义（迭代模型）
+
+典型迭代执行流程：
+
+1. 执行 `anchor_query`，将结果写入输出集合（Output）并作为首轮的工作集合（WorkSet）。
+2. 当 WorkSet 非空时：
+   - 用 WorkSet 作为 `recursive_query` 的输入，执行 `recursive_query`，得到 `newRows`。
+   - 若使用 `UNION ALL`：直接将 `newRows` 追加到 Output，并把 `newRows` 作为下一轮的 WorkSet。
+   - 若使用 `UNION`（去重）：对 `newRows` 与已有 Output 做差集（去重），只将未出现的行加入 Output 与下一轮 
WorkSet。
+3. 重复步骤 2，直到 `newRows` 为空或触发系统预设的上限（Doris 
session变量限制递归深度`cte_max_recursion_depth` 默认值为100，超出会抛错）。
+
+终止当当前轮没有新行被产生（或达到系统最大递归深度限制）。
+
+## UNION vs UNION ALL
+
+- `UNION ALL`：保留重复，执行开销低（无需去重）。适用于允许重复或在后端由业务逻辑控制重复的场景。
+- `UNION`：隐含去重，会在每轮或全局增加排序/哈希去重开销，代价显著，尤其是在大数据量下。
+
+建议：如果语义允许且能在应用层后处理重复，优先使用 `UNION ALL`。
+
+## 常见用例与 SQL 示例
+
+1) 简单层级遍历：
 
 ```sql
-WITH r_cte AS (
-  SELECT 1 AS user_id, 2 as manager_id
-  UNION ALL
-  SELECT user_id, manager_id FROM r_cte INNER JOIN (SELECT 1 AS user_id, 2 as 
manager_id) t ON r_cte.manager_id = t.user_id
+CREATE TABLE tree
+(
+    id int,
+    parent_id int,
+    data varchar(100)
+) DUPLICATE KEY (id)
+DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES ('replication_num' = '1');
+
+INSERT INTO tree VALUES (0, NULL, 'ROOT'), (1, 0, 'Child_1'), (2, 0, 
'Child_2'), (3, 1, 'Child_1_1');
+
+WITH RECURSIVE search_tree AS (
+    SELECT id, parent_id, data
+    FROM tree t
+    WHERE t.id = 0
+UNION ALL
+    SELECT t.id, t.parent_id, t.data
+    FROM tree t, search_tree st
+    WHERE t.parent_id = st.id
 )
-SELECT * FROM r_cte
+SELECT * FROM search_tree order BY id;
+```
+
+2) 图遍历：
+
+```sql
+CREATE TABLE graph
+(
+    c_from int,
+    c_to int,
+    label varchar(100)
+) DUPLICATE KEY (c_from) DISTRIBUTED BY HASH(c_from) BUCKETS 1 PROPERTIES 
'replication_num' = '1');
+
+INSERT INTO graph VALUES (1, 2, '1 -> 2'), (1, 3, '1 -> 3'), (2, 3, '2 -> 3'), 
(1, 4, '1 -> 4'), (4, 5, '4 -> 5');
+
+WITH RECURSIVE search_graph AS (
+    SELECT c_from, c_to, label FROM graph g
+UNION ALL
+    SELECT g.c_from, g.c_to, g.label
+    FROM graph g, search_graph sg
+    WHERE g.c_from = sg.c_to
+)
+SELECT DISTINCT * FROM search_graph ORDER BY c_from, c_to;
+```
+
+注意：使用 `UNION` 会在每轮做去重，代价较高。
+
+## 递归CTE的限制
+
+- 内部查询顶层操作符必须是UNION(ALL)。
+- 非递归部分的子查询不能引用递归CTE自身。
+- 递归部分的子查询只能引用递归CTE一次。
+- 递归部分的子查询如果内部还有子查询，那内部的子查询不能引用递归CTE。
+- 递归CTE的输出列类型由非递归侧的子查询输出决定，如果递归侧和非递归侧类型不一致，会报错。需要人工cast保证两边数据类型一致。
+- session变量`cte_max_recursion_depth`，限制最大递归次数，防止死循环，默认值是100。
+
+## 常见错误、原因与解决办法
+
+1. 错误：锚点与递归成员列数或类型不匹配  
+   - 原因：两部分 `SELECT` 列数或列类型不一致。  
+   - 解决：确保两侧列数、顺序与类型一致，必要时使用 `CAST` 或显式列名。
+2. 错误：锚点引用自身（非法）  
+   - 原因：锚点不允许引用 CTE 本体。  
+   - 解决：只在递归成员中引用 CTE；检查语法/解析树。
+3. 错误：无限递归 / 超过最大递归深度  
+   - 原因：递归没有收敛条件或收敛条件设置不正确。  
+   - 解决：添加`WHERE` 过滤、或调整系统最大递归深度；若逻辑确实无限递归需修正查询逻辑。


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

(doris-website) branch master updated: [feature]support recusive cte (#3206)

Reply via email to