[jira] [Commented] (SPARK-29708) Different answers in aggregates of duplicate grouping sets

Dongjoon Hyun (Jira) Thu, 16 Jan 2020 13:00:24 -0800


    [ https://issues.apache.org/jira/browse/SPARK-29708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017478#comment-17017478 ]


Dongjoon Hyun commented on SPARK-29708:
---------------------------------------

This has been backported to branch-2.4: https://github.com/apache/spark/pull/27229

> Different answers in aggregates of duplicate grouping sets
> ----------------------------------------------------------
>
>                 Key: SPARK-29708
>                 URL: https://issues.apache.org/jira/browse/SPARK-29708
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 3.0.0
>            Reporter: Takeshi Yamamuro
>            Assignee: Takeshi Yamamuro
>            Priority: Major
>              Labels: correctness
>             Fix For: 2.4.5, 3.0.0
>
>
> The query below, which uses duplicate grouping sets, returns different
> answers in PostgreSQL and Spark:
> {code:java}
> postgres=# create table gstest4(id integer, v integer, unhashable_col bit(4), unsortable_col xid);
> postgres=# insert into gstest4
> postgres-# values (1,1,b'0000','1'), (2,2,b'0001','1'),
> postgres-#        (3,4,b'0010','2'), (4,8,b'0011','2'),
> postgres-#        (5,16,b'0000','2'), (6,32,b'0001','2'),
> postgres-#        (7,64,b'0010','1'), (8,128,b'0011','1');
> INSERT 0 8
> postgres=# select unsortable_col, count(*)
> postgres-#   from gstest4 group by grouping sets ((unsortable_col),(unsortable_col))
> postgres-#   order by text(unsortable_col);
>  unsortable_col | count 
> ----------------+-------
>               1 |     8
>               1 |     8
>               2 |     8
>               2 |     8
> (4 rows)
> {code}
> {code:java}
> scala> sql("""create table gstest4(id integer, v integer, unhashable_col /* bit(4) */ byte, unsortable_col /* xid */ integer) using parquet""")
> scala> sql("""
>      | insert into gstest4
>      | values (1,1,tinyint('0'),1), (2,2,tinyint('1'),1),
>      |        (3,4,tinyint('2'),2), (4,8,tinyint('3'),2),
>      |        (5,16,tinyint('0'),2), (6,32,tinyint('1'),2),
>      |        (7,64,tinyint('2'),1), (8,128,tinyint('3'),1)
>      | """)
> res21: org.apache.spark.sql.DataFrame = []
> scala> 
> scala> sql("""
>      | select unsortable_col, count(*)
>      |   from gstest4 group by grouping sets ((unsortable_col),(unsortable_col))
>      |   order by string(unsortable_col)
>      | """).show
> +--------------+--------+
> |unsortable_col|count(1)|
> +--------------+--------+
> |             1|       8|
> |             2|       8|
> +--------------+--------+
> {code}
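
To illustrate the row multiplicity at issue in the report above, here is a
minimal sketch in plain Python. It is a model of grouping-set semantics, not
Spark's or PostgreSQL's implementation. The function count_by_grouping_sets
and its distinguish_duplicates flag are hypothetical names introduced for
illustration, and the idea that the two-row result comes from duplicate
grouping sets being merged into one group is an assumption, not something
stated in this message.

```python
from collections import Counter

# unsortable_col values of the eight rows inserted in the report, in order.
ROWS = [{"unsortable_col": v} for v in [1, 1, 2, 2, 2, 2, 1, 1]]

# The same grouping set listed twice, as in the reported query.
GROUPING_SETS = [("unsortable_col",), ("unsortable_col",)]


def count_by_grouping_sets(rows, grouping_sets, distinguish_duplicates=True):
    """Model `select ..., count(*) ... group by grouping sets (...)`.

    Each input row is expanded once per grouping set. When
    distinguish_duplicates is True, every grouping set is tagged by its
    position, so identical sets still form separate groups. When False
    (hypothetical mode), identical sets share one tag and their expanded
    rows are merged into the same group.
    """
    counts = Counter()
    for position, gset in enumerate(grouping_sets):
        for row in rows:
            key = tuple(row[col] for col in gset)
            tag = position if distinguish_duplicates else gset
            counts[(tag, key)] += 1
    # Drop the tag and return (group key, count) pairs in sorted order.
    return sorted((key, n) for (_tag, key), n in counts.items())


separate = count_by_grouping_sets(ROWS, GROUPING_SETS)
merged = count_by_grouping_sets(ROWS, GROUPING_SETS, distinguish_duplicates=False)
print(separate)
print(merged)
```

With the eight rows listed in the report, treating each grouping set
separately gives four output rows with a count of 4 each, while merging the
duplicates gives two output rows with a count of 8, which is the shape of the
Spark output quoted above.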



