[ https://issues.apache.org/jira/browse/HIVE-7060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xuefu Zhang updated HIVE-7060:
------------------------------
    Description:
It seems that the result from column statistics isn't correct on two measures for numeric columns: min (which is always 0) and distinct count. Here is an example:
{code}
select count(distinct avgTimeOnSite), min(avgTimeOnSite) from UserVisits_web_text_none;
...
OK
9 1
Time taken: 9.747 seconds, Fetched: 1 row(s)
{code}
The statistics for the column:
{code}
desc formatted UserVisits_web_text_none avgTimeOnSite
...
# col_name     data_type  min  max  num_nulls  distinct_count  avg_col_len  max_col_len  num_trues  num_falses  comment
avgTimeOnSite  int        0    9    0          11              null         null         null
{code}

  was:
It seems that the result from column statistics isn't correct on two measures for numeric columns: min (which is always 0) and distinct count. Here is an example:
{code}
select count(distinct avgTimeOnSite), min(avgTimeOnSite) from UserVisits_web_text_none;
...
OK
9 1
Time taken: 9.747 seconds, Fetched: 1 row(s)
{code}
The statistics for the column:
{code}
PREHOOK: query: desc formatted UserVisits_web_text_none avgTimeOnSite
PREHOOK: type: DESCTABLE
PREHOOK: Input: default@uservisits_web_text_none
POSTHOOK: query: desc formatted UserVisits_web_text_none avgTimeOnSite
POSTHOOK: type: DESCTABLE
POSTHOOK: Input: default@uservisits_web_text_none
# col_name     data_type  min  max  num_nulls  distinct_count  avg_col_len  max_col_len  num_trues  num_falses  comment
avgTimeOnSite  int        0    9    0          11              null         null         null
{code}

> Column stats give incorrect min and distinct_count
> --------------------------------------------------
>
> Key: HIVE-7060
> URL: https://issues.apache.org/jira/browse/HIVE-7060
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Affects Versions: 0.13.0
> Reporter: Xuefu Zhang
>
> It seems that the result from column statistics isn't correct on two measures
> for numeric columns: min (which is always 0) and distinct count. Here is an
> example:
> {code}
> select count(distinct avgTimeOnSite), min(avgTimeOnSite) from
> UserVisits_web_text_none;
> ...
> OK
> 9 1
> Time taken: 9.747 seconds, Fetched: 1 row(s)
> {code}
> The statistics for the column:
> {code}
> desc formatted UserVisits_web_text_none avgTimeOnSite
> ...
> # col_name data_type min max
> num_nulls distinct_count avg_col_len
> max_col_len num_trues num_falses
> comment
> avgTimeOnSite int 0 9
> 0 11 null
> null null
> {code}

--
This message was sent by Atlassian JIRA
(v6.2#6252)