[jira] [Commented] (IMPALA-8409) STRINGs without stats have too low row-size in explain plan
[ https://issues.apache.org/jira/browse/IMPALA-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124813#comment-17124813 ]

Csaba Ringhofer commented on IMPALA-8409:
-----------------------------------------

[~tarmstrong] yes, thanks for the reminder, closing it

> STRINGs without stats have too low row-size in explain plan
> -----------------------------------------------------------
>
>                 Key: IMPALA-8409
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8409
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.2.0
>            Reporter: Csaba Ringhofer
>            Assignee: Csaba Ringhofer
>            Priority: Minor
>              Labels: explain, statistics
>
> STRING columns without avg_size statistic are calculated into the row-size as
> 11 bytes, while they take 12 bytes in the tuple (+ more somewhere in the
> memory if they are not empty). The issue is caused by adding -1 (meaning
> unknown) to the 12 byte slot size.
> I think that this doesn't cause problems, as the estimation is probably way
> off without statistics anyway, but row-size >= tuple size seems like a
> meaningful invariant that we shouldn't break.
> Reproduce:
> {code}
> create table test_row_size (s string);
> explain select * from test_row_size;
> Result:
> ...
> WARNING: The following tables are missing relevant table and/or column
> statistics.
> default.test_row_size
> ...
> 00:SCAN HDFS [default.test_row_size]
>    partitions=1/1 files=0 size=0B
>    row-size=11B cardinality=0
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
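The invariant named in the issue description above (row-size >= tuple size) can be checked mechanically against explain output. The following is only a minimal sketch: the class `ExplainRowSizeCheck` and its methods are hypothetical and not part of Impala, and the only things taken from the report are the `row-size=<n>B` line format and the 12-byte STRING slot size. The helper throws when no `row-size` line is found, so that a changed output format cannot make the check pass silently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not Impala code.
class ExplainRowSizeCheck {
    // Slot size of a STRING column in the tuple, as stated in the issue.
    static final int STRING_SLOT_SIZE = 12;

    // Matches the "row-size=11B" token shown in the reproduction output.
    private static final Pattern ROW_SIZE = Pattern.compile("row-size=(\\d+)B");

    // Returns the first row-size value in bytes, or -1 if nothing matches.
    static int parseRowSize(String explainOutput) {
        Matcher m = ROW_SIZE.matcher(explainOutput);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    // True if the reported row-size is at least the given tuple size.
    // Throws instead of returning true when no row-size line is present.
    static boolean satisfiesInvariant(String explainOutput, int tupleSize) {
        int rowSize = parseRowSize(explainOutput);
        if (rowSize < 0) {
            throw new IllegalArgumentException("no row-size found in explain output");
        }
        return rowSize >= tupleSize;
    }

    public static void main(String[] args) {
        // Line copied from the reproduction in the issue description.
        String reported = "   row-size=11B cardinality=0";
        // prints "false": 11 bytes is below the 12-byte slot size
        System.out.println(satisfiesInvariant(reported, STRING_SLOT_SIZE));
    }
}
```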
[jira] [Commented] (IMPALA-8409) STRINGs without stats have too low row-size in explain plan
[ https://issues.apache.org/jira/browse/IMPALA-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17107632#comment-17107632 ]

Tim Armstrong commented on IMPALA-8409:
---------------------------------------

[~csringhofer] this is fixed, right?
[jira] [Commented] (IMPALA-8409) STRINGs without stats have too low row-size in explain plan
[ https://issues.apache.org/jira/browse/IMPALA-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832672#comment-16832672 ]

ASF subversion and git services commented on IMPALA-8409:
---------------------------------------------------------

Commit c2516d220da8e532b6ebdb6f3a12e7ad97c4f597 in impala's branch refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c2516d2 ]

IMPALA-8409: Fix row-size for STRING columns with unknown stats

Explain returned row-size=11B for STRING columns without statistics.
The issue was caused by adding -1 (meaning unknown) to the 12 byte
slot size (sizeof(StringValue)). The code in TupleDescriptor.java
tried to handle this by checking if the size is -1, but it was
already 11 at this point.

There is more potential for cleanup, but I wanted to keep this
change minimal.

Testing:
- revived some tests in CatalogTest.java that were removed in 2013
  due to flakiness
- added an EE test that checks row size with and without stats
- fixed a similar test, test_explain_validate_cardinality_estimates
  (the format of the line it looks for has changed, which led to
  skipping the actual verification and accepting everything)
- ran core FE and EE tests

Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Reviewed-on: http://gerrit.cloudera.org:8080/13190
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
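The failure mode described in the commit message above, where the -1 "unknown" sentinel is added to the 12-byte slot size before anyone checks for it, can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the actual code in TupleDescriptor.java: the class `RowSizeSketch`, both method names, and the choice to count an unknown average size as 0 extra bytes are hypothetical.

```java
// Illustrative sketch only; names and the fix strategy are assumptions,
// not the real Impala implementation.
class RowSizeSketch {
    // sizeof(StringValue), per the commit message.
    static final int STRING_SLOT_SIZE = 12;
    // Sentinel meaning "avg_size statistic is unknown".
    static final int UNKNOWN = -1;

    // Buggy order: the sentinel is added first and checked afterwards.
    static int buggyRowSize(int avgSize) {
        int size = STRING_SLOT_SIZE + avgSize;
        // For avgSize == -1 the sum is already 11, so this check does not fire.
        if (size == UNKNOWN) {
            size = STRING_SLOT_SIZE;
        }
        return size;
    }

    // One possible fix: test for the sentinel before doing any arithmetic.
    static int fixedRowSize(int avgSize) {
        int extra = (avgSize == UNKNOWN) ? 0 : avgSize;
        return STRING_SLOT_SIZE + extra;
    }

    public static void main(String[] args) {
        System.out.println(buggyRowSize(UNKNOWN)); // prints 11
        System.out.println(fixedRowSize(UNKNOWN)); // prints 12
    }
}
```

With the sentinel handled first, the computed row-size can no longer drop below the slot size, which restores the row-size >= tuple size invariant mentioned in the issue.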