[ 
https://issues.apache.org/jira/browse/HIVE-18279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16296624#comment-16296624
 ] 

Oleksiy Sayankin commented on HIVE-18279:
-----------------------------------------

Our customer uses a custom SerDe to store table data. Stats can't be
collected on this data, so when we execute

{code}
select count(*) from my_custom_serde_table;
{code}

we see 0, although the table does contain rows. The query

{code}
select count(some_column) from my_custom_serde_table;
{code}

works fine and returns the correct value. After investigating, I found that in
the first case Hive tries to optimize the query by taking the row count from
the table statistics, but returns a wrong result because it does not treat a
row count of 0 as missing statistics.
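
To illustrate the intended behavior, here is a minimal sketch. The class
RowCountSketch and the method usableRowCount are hypothetical stand-ins written
for this comment, not the actual Hive code; they only show the proposed check,
assuming that a null result means "do not answer count(*) from stats".

```java
// Hypothetical stand-in for the row-count check discussed in this issue;
// this is NOT the real StatsOptimizer code from Hive.
final class RowCountSketch {

    // Returns the row count only when it looks usable for the stats shortcut.
    // A null result means "fall back to actually scanning the table".
    static Long usableRowCount(Long rowCntFromStats) {
        // Proposed condition: 0 is treated like a missing value, because a
        // table whose stats were never computed may also report 0 rows.
        if (rowCntFromStats == null || rowCntFromStats == 0) {
            return null;
        }
        return rowCntFromStats;
    }

    public static void main(String[] args) {
        System.out.println(usableRowCount(null)); // prints "null"
        System.out.println(usableRowCount(0L));   // prints "null"
        System.out.println(usableRowCount(42L));  // prints "42"
    }
}
```

The trade-off is that a genuinely empty table is then scanned instead of being
answered from stats, which is cheap and always gives the correct result.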

> Incorrect condition in StatsOptimizer
> -------------------------------------
>
>                 Key: HIVE-18279
>                 URL: https://issues.apache.org/jira/browse/HIVE-18279
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>            Reporter: Oleksiy Sayankin
>            Assignee: Oleksiy Sayankin
>             Fix For: 3.0.0
>
>         Attachments: HIVE-18279.1.patch
>
>
> At the moment {{StatsOptimizer}} has the following code
> {code}
>         if (rowCnt == null) {
>           // if rowCnt < 1 than its either empty table or table on which stats are not
>           //  computed We assume the worse and don't attempt to optimize.
>           Logger.debug("Table doesn't have up to date stats " + tbl.getTableName());
>           rowCnt = null;
>         }
> {code}
> in the method {{private Long getRowCnt()}}. The condition
> {code}
> if (rowCnt == null) {
> {code}
> should be changed to 
> {code}
> if (rowCnt == null || rowCnt == 0) {
> {code}
> because a value of 0 may also mean that the table stats have not been computed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
