[ https://issues.apache.org/jira/browse/SPARK-26639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17511358#comment-17511358 ]

Stu edited comment on SPARK-26639 at 3/24/22, 10:13 PM:
--------------------------------------------------------

Here's another example of this happening in Spark 3.1.2. I'm running the 
following code:
{code:java}
WITH t AS (
  SELECT random() as a
) 
  SELECT * FROM t
  UNION
  SELECT * FROM t {code}
The CTE contains a non-deterministic function. If the CTE were evaluated only 
once, the same random value would be used for `a` in both branches of the 
UNION, and the output would be deduplicated into a single record.

That is not what happens: the output is two records with different random 
values, which suggests the CTE body is re-evaluated for each reference.
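
If it helps triage, here's a quick sketch for seeing where the duplication 
comes from. It uses only the query above plus EXPLAIN, so nothing new is 
assumed; if the CTE is being inlined into both branches, the physical plan 
should show two independent projections over rand() rather than a single 
reused one:
{code:java}
-- sketch: print the plan for the repro query above
EXPLAIN EXTENDED
WITH t AS (
  SELECT random() AS a
)
SELECT * FROM t
UNION
SELECT * FROM t;
-- expectation (an assumption, not verified here): two separate
-- Project [rand(...)] nodes appear, i.e. the CTE body is planned twice
{code}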

On our platform, some users like to write complex CTEs and reference them 
multiple times. Recomputing these for every reference is computationally 
expensive, so we recommend creating separate tables in such cases (one 
variant of that workaround is sketched below), but we have no way to enforce 
it. Fixing this would save a good number of compute hours!



> The reuse subquery function maybe does not work in SPARK SQL
> ------------------------------------------------------------
>
>                 Key: SPARK-26639
>                 URL: https://issues.apache.org/jira/browse/SPARK-26639
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Ke Jia
>            Priority: Major
>
> The subquery reuse feature was implemented in 
> [https://github.com/apache/spark/pull/14548]
> In my test, I found that the visualized plan does show the subquery being 
> executed once, but the stage for that subquery may execute more than once.
>  


