[ https://issues.apache.org/jira/browse/HIVE-25905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17483120#comment-17483120 ]
Alessandro Solimando commented on HIVE-25905: --------------------------------------------- You are right [~zabetak], it works as expected and it matches both the property description and the summarisation table in your comment: {noformat} -- hive.default.nulls.last is true by default, it sets NULLS_FIRST for DESC set hive.default.nulls.last; OUT: hive.default.nulls.last=true SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC) FROM test1; OUT: John Doe 1990-05-10 00:00:00 2022-01-10 00:00:00 1 John Doe 1990-05-10 00:00:00 2021-12-10 00:00:00 2 John Doe 1990-05-10 00:00:00 2021-11-10 00:00:00 3 John Doe 1990-05-10 00:00:00 2021-10-10 00:00:00 4 John Doe 1990-05-10 00:00:00 2021-09-10 00:00:00 5 John Doe 1987-05-10 00:00:00 NULL 1 John Doe 1987-05-10 00:00:00 2022-01-10 00:00:00 2 John Doe 1987-05-10 00:00:00 2021-12-10 00:00:00 3 John Doe 1987-05-10 00:00:00 2021-11-10 00:00:00 4 John Doe 1987-05-10 00:00:00 2021-10-10 00:00:00 5 -- we set hive.default.nulls.last=false, it sets NULLS_LAST for DESC set hive.default.nulls.last=false; SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC) FROM test1; OUT: John Doe 1990-05-10 00:00:00 2022-01-10 00:00:00 1 John Doe 1990-05-10 00:00:00 2021-12-10 00:00:00 2 John Doe 1990-05-10 00:00:00 2021-11-10 00:00:00 3 John Doe 1990-05-10 00:00:00 2021-10-10 00:00:00 4 John Doe 1990-05-10 00:00:00 2021-09-10 00:00:00 5 John Doe 1987-05-10 00:00:00 2022-01-10 00:00:00 1 John Doe 1987-05-10 00:00:00 2021-12-10 00:00:00 2 John Doe 1987-05-10 00:00:00 2021-11-10 00:00:00 3 John Doe 1987-05-10 00:00:00 2021-10-10 00:00:00 4 John Doe 1987-05-10 00:00:00 NULL 5 -- we set hive.default.nulls.last=false but we have explicit NULLS_LAST, we expect NULLS_LAST set hive.default.nulls.last=false; SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC NULLS LAST) FROM test1; OUT: John Doe 1990-05-10 00:00:00 2022-01-10 00:00:00 1 John Doe 1990-05-10 00:00:00 2021-12-10 00:00:00 2 John Doe 1990-05-10 00:00:00 2021-11-10 00:00:00 3 John Doe 1990-05-10 00:00:00 2021-10-10 00:00:00 4 John Doe 1990-05-10 00:00:00 2021-09-10 00:00:00 5 John Doe 1987-05-10 00:00:00 2022-01-10 00:00:00 1 John Doe 1987-05-10 00:00:00 2021-12-10 00:00:00 2 John Doe 1987-05-10 00:00:00 2021-11-10 00:00:00 3 John Doe 1987-05-10 00:00:00 2021-10-10 00:00:00 4 John Doe 1987-05-10 00:00:00 NULL 5 -- we have explicit NULLS_FIRST, we expect NULLS_FIRST SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC NULLS FIRST) FROM test1; --OUT: John Doe 1990-05-10 00:00:00 2022-01-10 00:00:00 1 John Doe 1990-05-10 00:00:00 2021-12-10 00:00:00 2 John Doe 1990-05-10 00:00:00 2021-11-10 00:00:00 3 John Doe 1990-05-10 00:00:00 2021-10-10 00:00:00 4 John Doe 1990-05-10 00:00:00 2021-09-10 00:00:00 5 John Doe 1987-05-10 00:00:00 NULL 1 John Doe 1987-05-10 00:00:00 2022-01-10 00:00:00 2 John Doe 1987-05-10 00:00:00 2021-12-10 00:00:00 3 John Doe 1987-05-10 00:00:00 2021-11-10 00:00:00 4 John Doe 1987-05-10 00:00:00 2021-10-10 00:00:00 5{noformat} > ORDER BY colName DESC does not honour 'hive.default.nulls.last' property in > absence of NULLS_LAST/FIRST > ------------------------------------------------------------------------------------------------------- > > Key: HIVE-25905 > URL: https://issues.apache.org/jira/browse/HIVE-25905 > Project: Hive > Issue Type: Bug > Components: Parser > Affects Versions: 4.0.0 > Reporter: Alessandro Solimando > Assignee: Alessandro Solimando > Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Consider the following table and data: > {noformat} > create table test1 > ( > a string, > b timestamp, > c timestamp > ); > INSERT INTO TABLE test1 VALUES > ('John Doe', '1990-05-10 00:00:00.0', '2022-01-10 00:00:00.0'), > ('John Doe', '1990-05-10 00:00:00.0', '2021-12-10 00:00:00.0'), > ('John Doe', '1990-05-10 00:00:00.0', '2021-11-10 00:00:00.0'), > ('John Doe', '1990-05-10 00:00:00.0', '2021-10-10 00:00:00.0'), > ('John Doe', '1990-05-10 00:00:00.0', '2021-09-10 00:00:00.0'), > ('John Doe', '1987-05-10 00:00:00.0', '2022-01-10 00:00:00.0'), > ('John Doe', '1987-05-10 00:00:00.0', '2021-12-10 00:00:00.0'), > ('John Doe', '1987-05-10 00:00:00.0', '2021-11-10 00:00:00.0'), > ('John Doe', '1987-05-10 00:00:00.0', '2021-10-10 00:00:00.0'), > ('John Doe', '1987-05-10 00:00:00.0', null);{noformat} > Consider also the following query: > {noformat} > SELECT a, b, c, row_number() OVER (PARTITION BY a, b ORDER BY b DESC, c DESC) > FROM test1; > {noformat} > The output is: > {noformat} > John Doe 10/05/1990 00:00 10/01/2022 00:00 1 > John Doe 10/05/1990 00:00 10/12/2021 00:00 2 > John Doe 10/05/1990 00:00 10/11/2021 00:00 3 > John Doe 10/05/1990 00:00 10/10/2021 00:00 4 > John Doe 10/05/1990 00:00 10/09/2021 00:00 5 > John Doe 10/05/1987 00:00 NULL 1 > John Doe 10/05/1987 00:00 10/01/2022 00:00 2 > John Doe 10/05/1987 00:00 10/12/2021 00:00 3 > John Doe 10/05/1987 00:00 10/11/2021 00:00 4 > John Doe 10/05/1987 00:00 10/10/2021 00:00 5{noformat} > While the expected output should be: > {noformat} > John Doe 10/05/1990 00:00 10/01/2022 00:00 1 > John Doe 10/05/1990 00:00 10/12/2021 00:00 2 > John Doe 10/05/1990 00:00 10/11/2021 00:00 3 > John Doe 10/05/1990 00:00 10/10/2021 00:00 4 > John Doe 10/05/1990 00:00 10/09/2021 00:00 5 > John Doe 10/05/1987 00:00 10/01/2022 00:00 1 > John Doe 10/05/1987 00:00 10/12/2021 00:00 2 > John Doe 10/05/1987 00:00 10/11/2021 00:00 3 > John Doe 10/05/1987 00:00 10/10/2021 00:00 4 > John Doe 10/05/1987 00:00 NULL 5{noformat} -- This message was sent by Atlassian Jira (v8.20.1#820001)