[ 
https://issues.apache.org/jira/browse/HIVE-24606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

okumin reassigned HIVE-24606:
-----------------------------


> Multi-stage materialized CTEs can lost intermediate data
> --------------------------------------------------------
>
>                 Key: HIVE-24606
>                 URL: https://issues.apache.org/jira/browse/HIVE-24606
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 3.1.2, 2.3.7, 4.0.0
>            Reporter: okumin
>            Assignee: okumin
>            Priority: Major
>
> With complex multi-stage CTEs, Hive can start a latter stage before its 
> previous stage finishes.
>  That's because `SemanticAnalyzer#toRealRootTasks` can fail to resolve 
> dependency between multistage materialized CTEs when a non-materialized CTE 
> cuts in.
>  
> [https://github.com/apache/hive/blob/425e1ff7c054f87c4db87e77d004282d529599ae/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L1414]
>  
> For example, when submitting this query,
> {code:sql}
> SET hive.optimize.cte.materialize.threshold=2;
> SET hive.optimize.cte.materialize.full.aggregate.only=false;
> WITH x AS ( SELECT 'x' AS id ), -- not materialized
> a1 AS ( SELECT 'a1' AS id ), -- materialized by a2 and the root
> a2 AS ( SELECT 'a2 <- ' || id AS id FROM a1) -- materialized by the root
> SELECT * FROM a1
> UNION ALL
> SELECT * FROM x
> UNION ALL
> SELECT * FROM a2
> UNION ALL
> SELECT * FROM a2;
> {code}
> `toRealRootTask` will traverse the CTEs in order of `a1`, `x`, and `a2`. It 
> means the dependency between `a1` and `a2` will be ignored and `a2` can start 
> without waiting for `a1`. As a result, the above query returns the following 
> result.
> {code:java}
> +-----+
> | id  |
> +-----+
> | a1  |
> | x   |
> +-----+
> {code}
> For your information, I ran this test with revision = 
> 425e1ff7c054f87c4db87e77d004282d529599ae.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to