[ https://issues.apache.org/jira/browse/HIVE-18279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296624#comment-16296624 ]
Oleksiy Sayankin commented on HIVE-18279:
-----------------------------------------
Our customer uses their own SerDe format for storing data in tables. Statistics
can't be collected for this data, so when we execute
{code}
select count(*) from my_custom_serde_table;
{code}
We get 0, although the table does contain rows. This query
{code}
select count(some_column) from my_custom_serde_table;
{code}
works fine and returns the correct value. After investigating, I found that in
the first case Hive tries to optimize the query by taking the row count from
table statistics, and it returns a wrong result because it does not treat 0 as
a sign that statistics are missing.
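To illustrate the mechanism, here is a simplified, hypothetical model of a stats-based {{count(*)}} shortcut. The class {{CountStarShortcut}} and its methods are made up for this sketch and are not Hive code. It assumes, as described above, that a table whose statistics were never collected reports a stored row count of 0; with a check that only rejects {{null}}, that 0 is returned as the answer instead of counting the real rows.

{code:java}
import java.util.Arrays;
import java.util.List;

// Hypothetical, heavily simplified model of a stats-based count(*)
// shortcut; the class and method names are illustrative, not Hive's.
public class CountStarShortcut {

    // Current check: only a missing row count (null) disables the
    // shortcut, so a stored 0 is trusted as the real row count.
    static Long rowCntCurrentCheck(Long storedRowCnt) {
        if (storedRowCnt == null) {
            return null; // no stats: do not optimize
        }
        return storedRowCnt;
    }

    // count(*): answer from stats when a row count is available,
    // otherwise fall back to counting the rows actually stored.
    static long countStar(Long storedRowCnt, List<String> actualRows) {
        Long rowCnt = rowCntCurrentCheck(storedRowCnt);
        if (rowCnt != null) {
            return rowCnt;
        }
        return actualRows.size();
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c");
        // Stats never collected for the custom SerDe table, stored count is 0.
        System.out.println(countStar(0L, rows));   // 0, although 3 rows exist
        // No row count recorded at all: the rows are counted instead.
        System.out.println(countStar(null, rows)); // 3
    }
}
{code}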
> Incorrect condition in StatsOptimizer
> -------------------------------------
>
> Key: HIVE-18279
> URL: https://issues.apache.org/jira/browse/HIVE-18279
> Project: Hive
> Issue Type: Bug
> Components: Statistics
> Reporter: Oleksiy Sayankin
> Assignee: Oleksiy Sayankin
> Fix For: 3.0.0
>
> Attachments: HIVE-18279.1.patch
>
>
> At the moment {{StatsOptimizer}} has the following code
> {code}
> if (rowCnt == null) {
>   // if rowCnt < 1 than its either empty table or table on which stats are not
>   // computed We assume the worse and don't attempt to optimize.
>   Logger.debug("Table doesn't have up to date stats " + tbl.getTableName());
>   rowCnt = null;
> }
> {code}
> in the method {{private Long getRowCnt()}}. The condition
> {code}
> if (rowCnt == null) {
> {code}
> should be changed to
> {code}
> if (rowCnt == null || rowCnt == 0) {
> {code}
> because a row count of 0 may also mean that the table statistics were never computed.
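A minimal standalone sketch of the proposed condition, assuming the row count is kept as a string under the {{numRows}} table parameter (the key Hive uses for basic row-count stats). The class {{RowCntCheck}} and its method are hypothetical and are not the actual {{StatsOptimizer}} code. Because {{rowCnt == null}} is checked first, the {{rowCnt == 0}} comparison never unboxes a null {{Long}}.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed check, outside of Hive.
// Assumption: the row count is stored under the "numRows" table parameter.
public class RowCntCheck {

    // Returns the row count usable for answering count(*) from stats,
    // or null when the optimizer should not rely on stats.
    static Long getRowCnt(Map<String, String> tableParams) {
        String raw = tableParams.get("numRows");
        Long rowCnt = (raw == null) ? null : Long.valueOf(raw);
        // Proposed condition: treat 0 like a missing value, because 0 can
        // mean stats were never computed (e.g. for a custom SerDe table).
        // The null check short-circuits, so rowCnt == 0 never unboxes null.
        if (rowCnt == null || rowCnt == 0) {
            return null;
        }
        return rowCnt;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        System.out.println(getRowCnt(params)); // null: no stats recorded
        params.put("numRows", "0");
        System.out.println(getRowCnt(params)); // null: 0 is not trusted
        params.put("numRows", "42");
        System.out.println(getRowCnt(params)); // 42
    }
}
{code}

The trade-off of this check is that a genuinely empty table is no longer answered from statistics; the query is executed instead, which for an empty table should be cheap.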
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)