Andy Lam created SPARK-46743:
--------------------------------

             Summary: Count bug introduced for scalar subquery when using TEMPORARY VIEW, as compared to using table
                 Key: SPARK-46743
                 URL: https://issues.apache.org/jira/browse/SPARK-46743
             Project: Spark
          Issue Type: Bug
          Components: Optimizer
    Affects Versions: 3.5.0
            Reporter: Andy Lam


Using a temporary view reproduces the COUNT bug: the scalar subquery returns NULL instead of 0 for every outer row, whereas the same query over tables returns 0.

With a table:
{code:java}
scala> spark.sql("""CREATE TABLE outer_table USING parquet AS SELECT * FROM VALUES
     |     (1, 1),
     |     (2, 1),
     |     (3, 3),
     |     (6, 6),
     |     (7, 7),
     |     (9, 9) AS inner_table(a, b)""")

val res6: org.apache.spark.sql.DataFrame = []

scala> spark.sql("CREATE TABLE null_table USING parquet AS SELECT CAST(null AS int) AS a, CAST(null AS int) AS b;")

val res7: org.apache.spark.sql.DataFrame = []

scala> spark.sql("""SELECT (SELECT COUNT(null_table.a) AS aggAlias FROM null_table
     |     WHERE null_table.a = outer_table.a) FROM outer_table""").collect()

val res8: Array[org.apache.spark.sql.Row] = Array([0], [0], [0], [0], [0], [0]) 
{code}
With a view:
{code:java}
spark.sql("CREATE TEMPORARY VIEW outer_view(a, b) AS VALUES (1, 1), (2, 1), (3, 3), (6, 6), (7, 7), (9, 9);")

spark.sql("CREATE TEMPORARY VIEW null_view(a, b) AS SELECT CAST(null AS int), CAST(null AS int);")

spark.sql("""SELECT (SELECT COUNT(null_view.a) AS aggAlias FROM null_view
  WHERE null_view.a = outer_view.a) FROM outer_view""").collect()

val res2: Array[org.apache.spark.sql.Row] = Array([null], [null], [null], [null], [null], [null]){code}
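For context, the COUNT bug is a classic pitfall when a correlated scalar subquery is decorrelated into an aggregate that is LEFT OUTER JOINed back to the outer query: outer rows with no matching group receive NULL, whereas evaluating COUNT over an empty input yields 0. The following is a minimal sketch of that mechanism using plain Scala collections (no Spark), with the same data as the reproduction above and `None` standing in for SQL NULL. It is an illustration under those assumptions, not Spark's optimizer code; `CountBugSketch`, `correlated`, `naiveDecorrelated` and `patchCount` are hypothetical names.

```scala
// Hypothetical model of the COUNT bug using plain Scala collections (not Spark).
// None stands in for SQL NULL; each row is an (a, b) pair.
object CountBugSketch {
  type Row = (Option[Int], Option[Int])

  // Same data as outer_table / outer_view in the reproduction.
  val outer: Seq[Row] =
    Seq(1 -> 1, 2 -> 1, 3 -> 3, 6 -> 6, 7 -> 7, 9 -> 9).map { case (a, b) => (Some(a), Some(b)) }

  // Same data as null_table / null_view: a single row of NULLs.
  val inner: Seq[Row] = Seq((None, None))

  // SQL equality: a comparison involving NULL is never true.
  def sqlEq(x: Option[Int], y: Option[Int]): Boolean = (x, y) match {
    case (Some(l), Some(r)) => l == r
    case _                  => false
  }

  // Reference semantics: run the subquery once per outer row.
  // COUNT(inner.a) skips NULLs and returns 0 (never NULL) on empty input.
  def correlated(): Seq[Option[Long]] =
    outer.map { case (outerA, _) =>
      val matching = inner.filter { case (innerA, _) => sqlEq(innerA, outerA) }
      Some(matching.count(_._1.isDefined).toLong)
    }

  // Decorrelated plan: GROUP BY inner.a computing COUNT(inner.a), then
  // LEFT OUTER JOIN back on inner.a = outer.a. The NULL-key group can never
  // satisfy the join condition, so it is dropped up front. Unmatched outer
  // rows get NULL unless the count is patched to its empty-input value, 0.
  def naiveDecorrelated(patchCount: Boolean): Seq[Option[Long]] = {
    val countsByKey: Map[Int, Long] =
      inner
        .collect { case (Some(a), _) => a }
        .groupBy(identity)
        .map { case (key, group) => key -> group.size.toLong }
    outer.map { case (outerA, _) =>
      val joined = outerA.flatMap(countsByKey.get)
      if (patchCount) joined.orElse(Some(0L)) else joined
    }
  }

  def main(args: Array[String]): Unit = {
    println(s"correlated:            ${correlated()}")
    println(s"decorrelated, no fix:  ${naiveDecorrelated(patchCount = false)}")
    println(s"decorrelated, patched: ${naiveDecorrelated(patchCount = true)}")
  }
}
```

In this model, only the unpatched rewrite produces an all-NULL result, which matches the symptom seen with the temporary views; the sketch does not show which code path Spark actually takes for views.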

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
