[jira] [Commented] (IMPALA-8409) STRINGs without stats have too low row-size in explain plan

2020-06-03 Thread Csaba Ringhofer (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124813#comment-17124813
 ] 

Csaba Ringhofer commented on IMPALA-8409:
-

[~tarmstrong] yes, thanks for the reminder, closing it

> STRINGs without stats have too low row-size in explain plan
> ---
>
> Key: IMPALA-8409
> URL: https://issues.apache.org/jira/browse/IMPALA-8409
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.2.0
> Reporter: Csaba Ringhofer
> Assignee: Csaba Ringhofer
> Priority: Minor
> Labels: explain, statistics
>
> STRING columns without the avg_size statistic are counted as 11 bytes in the
> row-size, although they take 12 bytes in the tuple (plus more elsewhere in
> memory if they are not empty). The issue is caused by adding -1 (meaning
> unknown) to the 12-byte slot size.
> I think that this doesn't cause problems, as the estimate is probably way
> off without statistics anyway, but row-size >= tuple size seems like a
> meaningful invariant that we shouldn't break.
> Reproduce:
> {code}
> create table test_row_size (s string);
> explain select * from test_row_size; 
> Result:
> ...
> WARNING: The following tables are missing relevant table and/or column 
> statistics.
> default.test_row_size
> ...
> 00:SCAN HDFS [default.test_row_size]
>    partitions=1/1 files=0 size=0B
>    row-size=11B cardinality=0
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8409) STRINGs without stats have too low row-size in explain plan

2020-05-14 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17107632#comment-17107632
 ] 

Tim Armstrong commented on IMPALA-8409:
---

[~csringhofer] this is fixed, right?




[jira] [Commented] (IMPALA-8409) STRINGs without stats have too low row-size in explain plan

2019-05-03 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832672#comment-16832672
 ] 

ASF subversion and git services commented on IMPALA-8409:
-

Commit c2516d220da8e532b6ebdb6f3a12e7ad97c4f597 in impala's branch 
refs/heads/master from Csaba Ringhofer
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c2516d2 ]

IMPALA-8409: Fix row-size for STRING columns with unknown stats

Explain returned row-size=11B for STRING columns without statistics.
The issue was caused by adding -1 (meaning unknown) to the 12 byte
slot size (sizeof(StringValue)). The code in TupleDescriptor.java
tried to handle this by checking if the size is -1, but it was
already 11 at this point.
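
The ordering problem described above can be illustrated with a minimal Java sketch. This is not the code from TupleDescriptor.java; the class, method, and constant names are hypothetical, and treating an unknown avg_size as zero extra bytes is an assumption made only for illustration. The sketch shows why a check for -1 cannot fire once the sentinel has already been added to the 12 byte slot size.

```java
class RowSizeSketch {
    // Fixed slot size of a STRING in the tuple, sizeof(StringValue) per the commit message.
    static final int STRING_SLOT_SIZE = 12;
    // Sentinel meaning "avg_size statistic is unknown".
    static final int UNKNOWN_AVG_SIZE = -1;

    // Buggy order: the sentinel is added to the slot size first, so the
    // later comparison against -1 sees 11 and never matches.
    static int buggyRowSize(int avgSize) {
        int size = STRING_SLOT_SIZE + avgSize;
        if (size == UNKNOWN_AVG_SIZE) {
            size = STRING_SLOT_SIZE;
        }
        return size;
    }

    // Corrected order: test for the sentinel before doing any arithmetic.
    // Assumption: an unknown avg_size contributes 0 extra bytes.
    static int fixedRowSize(int avgSize) {
        int payload = (avgSize == UNKNOWN_AVG_SIZE) ? 0 : avgSize;
        return STRING_SLOT_SIZE + payload;
    }

    public static void main(String[] args) {
        System.out.println(buggyRowSize(UNKNOWN_AVG_SIZE)); // prints 11
        System.out.println(fixedRowSize(UNKNOWN_AVG_SIZE)); // prints 12
    }
}
```

With the check moved ahead of the addition, the estimate can no longer drop below the slot size, which preserves the row-size >= tuple size invariant mentioned in the issue description.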

There is more potential for cleanup, but I wanted to keep this
change minimal.

Testing:
- revived some tests in CatalogTest.java that were removed
  in 2013 due to flakiness
- added an EE test that checks row size with and without stats
- fixed a similar test, test_explain_validate_cardinality_estimates
  (the format of the line it looks for has changed, which led to
  skipping the actual verification and accepting everything)
- ran core FE and EE tests
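
The third item describes a check that passed vacuously because its line pattern no longer matched anything. A hedged sketch of one way to guard against that follows. It is not the actual test_explain_validate_cardinality_estimates code; the class name, the method, and the regular expression are hypothetical, and it inspects the row-size field from the plan output quoted in the issue rather than cardinality. The point is the final check: the validation fails if no line was examined at all.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExplainCheckSketch {
    // Hypothetical pattern for plan lines such as "row-size=11B cardinality=0".
    static final Pattern ROW_SIZE = Pattern.compile("row-size=(\\d+)B");

    // Returns the number of plan lines that were actually verified.
    static int checkRowSizes(List<String> planLines, int minRowSize) {
        int checked = 0;
        for (String line : planLines) {
            Matcher m = ROW_SIZE.matcher(line);
            if (!m.find()) {
                continue; // not a line this check is about
            }
            int rowSize = Integer.parseInt(m.group(1));
            if (rowSize < minRowSize) {
                throw new AssertionError("row-size too low in: " + line.trim());
            }
            checked++;
        }
        // Guard against a stale pattern: verifying zero lines is a failure,
        // not a pass.
        if (checked == 0) {
            throw new AssertionError("no plan line matched; the pattern may be outdated");
        }
        return checked;
    }

    public static void main(String[] args) {
        List<String> plan = Arrays.asList(
            "00:SCAN HDFS [default.test_row_size]",
            "   partitions=1/1 files=0 size=0B",
            "   row-size=12B cardinality=0");
        System.out.println(checkRowSizes(plan, 12)); // prints 1
    }
}
```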

Change-Id: I866acf10b2c011a735dee019f4bc29358f2ec4e5
Reviewed-on: http://gerrit.cloudera.org:8080/13190
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 

