Csaba Ringhofer created IMPALA-8409:
---------------------------------------

             Summary: STRINGs without stats have too low row-size in explain plan
                 Key: IMPALA-8409
                 URL: https://issues.apache.org/jira/browse/IMPALA-8409
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 3.2.0
            Reporter: Csaba Ringhofer


STRING columns without an avg_size statistic are counted in the row-size as 
11 bytes, although each takes 12 bytes in the tuple (plus additional memory 
elsewhere if the string is not empty). The cause is that -1 (meaning 
unknown) is added to the 12-byte slot size.

I think that this doesn't cause problems, as the estimation is probably way off 
without statistics anyway, but row-size >= tuple size seems like a meaningful 
invariant that we shouldn't break.
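
The arithmetic behind the 11-byte figure can be sketched as follows. This is 
a minimal Python illustration, not Impala's frontend code: the names 
STRING_SLOT_SIZE, UNKNOWN_AVG_SIZE, buggy_row_size and guarded_row_size are 
hypothetical, and clamping the unknown value to 0 is only one possible way to 
keep row-size >= tuple size.

```python
# Hypothetical constants, taken from the numbers in this report.
STRING_SLOT_SIZE = 12   # bytes a STRING slot occupies in the tuple
UNKNOWN_AVG_SIZE = -1   # sentinel meaning "no avg_size statistic"


def buggy_row_size(avg_size):
    """Adds the sentinel directly, so an unknown size shrinks the estimate."""
    return STRING_SLOT_SIZE + avg_size


def guarded_row_size(avg_size):
    """Assumed fix: treat a negative (unknown) avg_size as 0 bytes of payload."""
    payload = max(avg_size, 0)
    return STRING_SLOT_SIZE + payload


if __name__ == "__main__":
    print(buggy_row_size(UNKNOWN_AVG_SIZE))    # 11, below the 12-byte slot
    print(guarded_row_size(UNKNOWN_AVG_SIZE))  # 12, invariant holds
```

With the guard in place the estimate never drops below the slot size, and a 
known avg_size is still added unchanged.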

Reproduce:
{code}
create table test_row_size (s string);
explain select * from test_row_size;
{code}
Result:
{code}
...
WARNING: The following tables are missing relevant table and/or column statistics.
default.test_row_size
...
00:SCAN HDFS [default.test_row_size]
   partitions=1/1 files=0 size=0B
   row-size=11B cardinality=0
{code}




