Quanlong Huang created IMPALA-13103:
---
Summary: Corrupt column stats are not reported
Key: IMPALA-13103
URL: https://issues.apache.org/jira/browse/IMPALA-13103
Project: IMPALA
Issue Type: Bug
Components: Frontend
Reporter: Quanlong Huang
Impala reports corrupt table stats in the query plan, but it does not report
corrupt column stats. For instance, consider the following table:
{code:sql}
create table t1 (id int, name string);
insert into t1 values (1, 'aaa'), (2, 'aaa'), (3, 'aaa'), (4, 'aaa');{code}
with the following stats:
{code:sql}
alter table t1 set tblproperties('numRows'='4');
alter table t1 set column stats name ('numNulls'='0');{code}
Note that column "id" has missing stats and column "name" has missing/corrupt
stats (ndv=-1, numNulls=0).
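These values can be double-checked with the statements below. This is a minimal sketch that assumes t1 is in the current database; the output was not captured for this report, so none is shown:
{code:sql}
show table stats t1;
show column stats t1;{code}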
Grouping by "id" will report the missing stats:
{code:sql}
explain select id, count(*) from t1 group by id;
WARNING: The following tables are missing relevant table and/or column statistics.
default.t1{code}
However, grouping by "name" doesn't report the missing/corrupt stats:
{noformat}
explain select name, count(*) from t1 group by name;
+------------------------------------------------------------------------------------------+
| Explain String                                                                           |
+------------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=38.00MB Threads=2                              |
| Per-Host Resource Estimates: Memory=144MB                                                |
| Codegen disabled by planner                                                              |
| Analyzed query: SELECT name, count(*) FROM `default`.t1 GROUP BY name                    |
|                                                                                          |
| F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                    |
| | Per-Host Resources: mem-estimate=144.00MB mem-reservation=38.00MB thread-reservation=2 |
| PLAN-ROOT SINK                                                                           |
| | output exprs: name, count(*)                                                           |
| | mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0    |
| |                                                                                        |
| 01:AGGREGATE [FINALIZE]                                                                  |
| | output: count(*)                                                                       |
| | group by: name                                                                         |
| | mem-estimate=128.00MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0 |
| | tuple-ids=1 row-size=20B cardinality=4                                                 |
| | in pipelines: 01(GETNEXT), 00(OPEN)                                                    |
| |                                                                                        |
| 00:SCAN HDFS [default.t1]                                                                |
|    HDFS partitions=1/1 files=1 size=24B                                                  |
|    stored statistics:                                                                    |
|      table: rows=4 size=unavailable                                                      |
|      columns: all                                                                        |
|    extrapolated-rows=disabled max-scan-range-rows=4                                      |
|    mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1                      |
|    tuple-ids=0 row-size=12B cardinality=4                                                |
|    in pipelines: 00(GETNEXT)                                                             |
+------------------------------------------------------------------------------------------+
{noformat}
CC [~rizaon]