[ 
https://issues.apache.org/jira/browse/IMPALA-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-8409:
-----------------------------------
    Fix Version/s:     (was: Impala 4.0)
                   Impala 3.3.0

> STRINGs without stats have too low row-size in explain plan
> -----------------------------------------------------------
>
>                 Key: IMPALA-8409
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8409
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.2.0
>            Reporter: Csaba Ringhofer
>            Assignee: Csaba Ringhofer
>            Priority: Minor
>              Labels: explain, statistics
>             Fix For: Impala 3.3.0
>
>
> STRING columns without an avg_size statistic are counted as 11 bytes in the
> row-size estimate, although each takes 12 bytes in the tuple (plus additional
> memory elsewhere if the string is not empty). The cause is that -1 (meaning
> "unknown") is added to the 12-byte slot size.
> I don't think this causes problems in practice, since the estimate is probably
> way off without statistics anyway, but row-size >= tuple size seems like a
> meaningful invariant that we shouldn't break.
> Reproduce:
> {code}
> create table test_row_size (s string);
> explain select * from test_row_size; 
> Result:
> ...
> WARNING: The following tables are missing relevant table and/or column 
> statistics.
> default.test_row_size
> ...
> 00:SCAN HDFS [default.test_row_size]
>    partitions=1/1 files=0 size=0B
>    row-size=11B cardinality=0
> {code}
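
The arithmetic behind the 11B figure can be sketched as follows. This is only an illustration of the description above, not Impala's frontend code: the function and constant names are hypothetical, and clamping the sentinel to zero is one possible way to keep row-size >= tuple size, not necessarily the fix that shipped in Impala 3.3.0.

```python
# Hypothetical names for illustration; not taken from the Impala code base.
STRING_SLOT_SIZE = 12   # bytes a STRING slot takes in the tuple, per the report
UNKNOWN_AVG_SIZE = -1   # sentinel meaning "no avg_size statistic available"


def buggy_row_size(slot_size, avg_size):
    """Adds the avg_size value directly, so the -1 sentinel shrinks the estimate."""
    return slot_size + avg_size


def clamped_row_size(slot_size, avg_size):
    """Treats a negative (unknown) avg_size as 0 so the estimate never drops
    below the slot size."""
    return slot_size + max(avg_size, 0)


if __name__ == "__main__":
    print(buggy_row_size(STRING_SLOT_SIZE, UNKNOWN_AVG_SIZE))    # 12 + (-1) = 11
    print(clamped_row_size(STRING_SLOT_SIZE, UNKNOWN_AVG_SIZE))  # 12 + 0 = 12
```

With the clamp, a column whose avg_size is known is unaffected, while a column without statistics contributes at least its 12-byte slot.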


