mustafaiman commented on a change in pull request #1824:
URL: https://github.com/apache/hive/pull/1824#discussion_r553168379
##########
File path: ql/src/test/results/clientpositive/llap/auto_sortmerge_join_14.q.out
##########
@@ -194,7 +222,7 @@ STAGE PLANS:
keys:
0 _col0 (type: int)
1 _col0 (type: int)
- Statistics: Num rows: 221 Data size: 1768 Basic stats: COMPLETE Column stats: COMPLETE
+ Statistics: Num rows: 220 Data size: 1760 Basic stats: COMPLETE Column stats: COMPLETE
Review comment:
Yes, before the patch the distinct count on tbl2_n6 was miscalculated.
Running the modified test before the patch shows that the stored
`distinct_count` for `key` (117) does not match the actual number of
distinct keys (121).
```
describe formatted tbl2_n6 key;
select count (distinct key) from tbl2_n6;
```
```
POSTHOOK: Input: default@tbl2_n6
col_name key
data_type int
min 0
max 199
num_nulls 0
distinct_count 117
avg_col_len
max_col_len
num_trues
num_falses
bit_vector HL
comment from deserializer
COLUMN_STATS_ACCURATE
{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"key\":\"true\",\"value\":\"true\"}}
...
POSTHOOK: query: select count (distinct key) from tbl2_n6
...
121
```
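To put a number on the gap in the output above, here is a minimal sketch. The helper name `ndv_gap` is hypothetical and not part of Hive; the two inputs are copied from the pre-patch `describe formatted` and `count (distinct key)` results.

```python
def ndv_gap(stored_ndv: int, exact_ndv: int) -> tuple:
    """Return (absolute gap, relative error) of a stored NDV vs. the exact count."""
    if exact_ndv <= 0:
        raise ValueError("exact_ndv must be positive")
    gap = exact_ndv - stored_ndv
    return gap, abs(gap) / exact_ndv


# Values copied from the pre-patch output above (hypothetical helper, not a Hive API).
gap, rel_err = ndv_gap(stored_ndv=117, exact_ndv=121)
print(f"gap={gap}, relative error={rel_err:.1%}")  # prints "gap=4, relative error=3.3%"
```

So the stored column stats undercounted `key` by 4 distinct values before the patch.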
After the patch the distinct count is correct, so the stats in the
following query are correct now.