Konstantin Bereznyakov created HIVE-29534:
---------------------------------------------

             Summary: Statistics: StatsUtils::getColStatistics does not set 
NDV/some other fields for DATE/TIMESTAMP columns
                 Key: HIVE-29534
                 URL: https://issues.apache.org/jira/browse/HIVE-29534
             Project: Hive
          Issue Type: Bug
            Reporter: Konstantin Bereznyakov


Technically, the method is missing stats for multiple data types. The most 
important omissions seem to be the setCountDistint() calls for DATE_TYPE_NAME 
and TIMESTAMP_TYPE_NAME.
The TIMESTAMP data type could also benefit from setBitVectors(), for which the 
info also appears to be available.

As a result, the NDV of columns of these data types is assigned a value of 0, 
which could negatively impact the execution planning of some queries.
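
To illustrate the effect, here is a minimal, self-contained sketch. It does not 
use Hive's classes: ColStats, build() and estimateEqualityRows() are 
hypothetical stand-ins, the type dispatch is heavily simplified, and the 
row estimate is a textbook rows / NDV formula rather than Hive's actual 
planner logic. It only shows how a type branch that never sets the NDV leaves 
it at 0, and how that can distort a cardinality estimate.

```java
import java.util.Locale;

public class NdvSketch {

    /** Hypothetical, simplified column-statistics holder; not Hive's ColStatistics. */
    public static final class ColStats {
        public final String typeName;
        public long countDistinct; // stays 0 when no branch sets it

        public ColStats(String typeName) {
            this.typeName = typeName;
        }
    }

    /**
     * Hypothetical, simplified type dispatch. When coverDateAndTimestamp is
     * false, the date/timestamp branch leaves the NDV unset, which mimics the
     * behaviour described in this issue.
     */
    public static ColStats build(String typeName, long metastoreNdv,
                                 boolean coverDateAndTimestamp) {
        ColStats cs = new ColStats(typeName);
        switch (typeName.toLowerCase(Locale.ROOT)) {
            case "int":
            case "bigint":
                cs.countDistinct = metastoreNdv;
                break;
            case "date":
            case "timestamp":
                if (coverDateAndTimestamp) {
                    cs.countDistinct = metastoreNdv;
                }
                break;
            default:
                break;
        }
        return cs;
    }

    /** Textbook estimate for "col = constant": rows / NDV (assumption, not Hive's formula). */
    public static long estimateEqualityRows(long numRows, long ndv) {
        // Guard against division by zero when the NDV was never populated.
        return numRows / Math.max(ndv, 1L);
    }

    public static void main(String[] args) {
        long numRows = 1_000_000L;
        ColStats before = build("date", 365L, false);
        ColStats after = build("date", 365L, true);
        System.out.println("NDV unset: ndv=" + before.countDistinct
                + ", estimated rows=" + estimateEqualityRows(numRows, before.countDistinct));
        System.out.println("NDV set:   ndv=" + after.countDistinct
                + ", estimated rows=" + estimateEqualityRows(numRows, after.countDistinct));
    }
}
```

In this sketch the unset NDV makes the equality filter look non-selective (all 
rows are estimated to match), while the populated NDV yields a much smaller 
estimate. The real impact depends on how each Hive optimizer rule treats an 
NDV of 0.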

[https://github.com/apache/hive/blob/bbd83dff5bfc8b8ce018476391469da3331216dd/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L840]

[https://github.com/apache/hive/blob/bbd83dff5bfc8b8ce018476391469da3331216dd/ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java#L870]

Adding this info seems to change the output of about 100 .out files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
