Re: [PR] HIVE-28572: Support Distribute by and Cluster by clauses in CBO [hive]

via GitHub Wed, 22 Jan 2025 03:45:30 -0800


zabetak commented on code in PR #5505:
URL: https://github.com/apache/hive/pull/5505#discussion_r1925182241



##########
ql/src/test/results/clientpositive/llap/input_columnarserde.q.out:
##########
@@ -54,7 +54,7 @@ STAGE PLANS:
                     Reduce Output Operator
                       null sort order: 
                       sort order: 
-                      Map-reduce partition columns: 1 (type: int)
+                      Map-reduce partition columns: _col0 (type: array<int>)

Review Comment:
   The situation is a bit subtle. 
   
   `DISTRIBUTE BY constant` should work in a similar fashion to `GROUP BY 
constant` (https://blog.jooq.org/how-to-group-by-nothing-in-sql/ has some nice 
examples). In that sense, if we had `DISTRIBUTE BY 'A'`, we would basically put 
all rows into the same bucket/group.
   
   On the other hand, many DBMSs, including Hive, support `GROUP BY <column 
index>`, so in that case it is reasonable to say that `DISTRIBUTE BY 1` refers 
to the first column of the SELECT clause.
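   
   To make the two readings concrete, here is a toy Python model of hash 
partitioning. It is not Hive code: the `distribute` helper, the sample rows, 
and the reducer count are all hypothetical and only illustrate the difference.

```python
from collections import defaultdict


def distribute(rows, key_fn, num_reducers):
    """Assign each row to a reducer bucket by hashing its distribution key.

    Hypothetical helper: a simplified stand-in for hash partitioning,
    not Hive's actual partitioner.
    """
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(key_fn(row)) % num_reducers].append(row)
    return dict(buckets)


# Made-up rows standing in for the output of the SELECT clause.
rows = [(10, "x"), (11, "y"), (12, "z"), (13, "w")]

# Reading 1: `1` is a literal constant, so every row gets the same key.
by_constant = distribute(rows, lambda row: 1, num_reducers=4)

# Reading 2: `1` is a column position, so the key is the first column.
by_position = distribute(rows, lambda row: row[0], num_reducers=4)

print(len(by_constant), "bucket(s) when 1 is a constant")
print(len(by_position), "bucket(s) when 1 is the first column")
```

   Under the constant reading every row lands in a single bucket, while under 
the positional reading the rows spread out according to the first column.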
   
   I think we should treat the CBO-off behavior (partitioning on the literal 
`1`, as in the removed plan line above) as a bug and document the expected 
behavior in 
https://github.com/apache/hive-site/blob/main/content/docs/latest/languagemanual-sortby_27362045.md
   
   For the specific test here, I don't know what the original intention 
behind `DISTRIBUTE BY 1` was.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

