zabetak commented on code in PR #5505:
URL: https://github.com/apache/hive/pull/5505#discussion_r1925182241
##########
ql/src/test/results/clientpositive/llap/input_columnarserde.q.out:
##########
@@ -54,7 +54,7 @@ STAGE PLANS:
Reduce Output Operator
null sort order:
sort order:
- Map-reduce partition columns: 1 (type: int)
+ Map-reduce partition columns: _col0 (type: array<int>)
Review Comment:
The situation is a bit subtle.
`DISTRIBUTE BY constant` should work in a similar fashion to `GROUP BY
constant` (https://blog.jooq.org/how-to-group-by-nothing-in-sql/ has some nice
examples). In that sense, `DISTRIBUTE BY 'A'` would put all rows into the
same bucket/group.
On the other hand, many DBMSs, including Hive, support `GROUP BY <column
index>`, so it is also reasonable to read `DISTRIBUTE BY 1` as a reference to
the first column of the SELECT clause.
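To make the two readings concrete, here is a minimal sketch in plain Python. It is not Hive code: the `distribute` helper, the sample rows, and the reducer count are all hypothetical, and the hash-modulo bucketing is only an assumed stand-in for how rows get assigned to reducers.

```python
from collections import defaultdict


def distribute(rows, key_fn, num_reducers):
    """Hypothetical helper: bucket rows by hashing a partition key."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(key_fn(row)) % num_reducers].append(row)
    return dict(buckets)


# Sample rows standing in for the output of a SELECT with two columns.
rows = [(1, "x"), (2, "y"), (3, "z"), (4, "w")]

# Reading 1: `1` is a constant, so every row shares the same partition key.
by_constant = distribute(rows, lambda row: 1, num_reducers=2)

# Reading 2: `1` is a 1-based position in the SELECT list (the first column).
by_position = distribute(rows, lambda row: row[1 - 1], num_reducers=2)

print(by_constant)
print(by_position)
```

Under the constant reading all rows land in a single bucket, while under the positional reading they spread across buckets according to the first column's values. That matches the change in the plan above from `1 (type: int)` to `_col0`.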
I think we should treat the CBO-off behavior as a bug and document the
expected behavior in
https://github.com/apache/hive-site/blob/main/content/docs/latest/languagemanual-sortby_27362045.md
For the specific test here, I don't know what the original intention behind
`DISTRIBUTE BY 1` was.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]