linhongliu-db commented on a change in pull request #31286:
URL: https://github.com/apache/spark/pull/31286#discussion_r565873980



##########
File path: sql/core/src/test/resources/sql-tests/inputs/postgreSQL/join.sql
##########
@@ -743,20 +743,20 @@ select * from a left join b on i = x and i = y and x = i;
 --
 -- test NULL behavior of whole-row Vars, per bug #5025
 --
-select t1.q2, count(t2.*)
+select t1.q2, count(t2.q1, t2.q2)

Review comment:
       @cloud-fan , @maropu , I did more tests on pgsql, seems it is much more 
wired than I thought.
   TLDR: in pgsql, `count(t2.*)` means how many MATCHING rows from t2 table. 
and spark doesn't have such a semantic. Here, if we want to keep the output 
unchanged. it should be `count(t2.q1, t2.q2)`
   
   Here is the example:
   ```
   CREATE TABLE INT8_TBL AS SELECT * FROM (VALUES (1, 2), (2, 1), (3, NULL), 
(NULL, NULL)) AS v(q1, q2);
   select * from int8_tbl as t1 left join int8_tbl as t2 on t1.q1 = t2.q2;
   
   t1.q1  | t1.q2  | t2.q1  | t2.q2
   --     | --     | --     | --
   1      | 2      | 2      | 1
   2      | 1      | 1      | 2
   3      | (null) | (null) | (null)
   (null) | (null) | (null) | (null)
   
   select count(t1.*), count(t2.*) from int8_tbl as t1 left join int8_tbl as t2 
on t1.q1 = t2.q2;
   
   t1.count | t2.count
   --       | --
   4        | 2
   
   ```  
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to