[jira] [Created] (IMPALA-13103) Corrupt column stats are not reported

2024-05-20 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13103:
---

 Summary: Corrupt column stats are not reported
 Key: IMPALA-13103
 URL: https://issues.apache.org/jira/browse/IMPALA-13103
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Quanlong Huang


Impala reports corrupt table stats in the query plan, but it does not report 
corrupt column stats. For instance, consider the following table:
{code:sql}
create table t1 (id int, name string);
insert into t1 values (1, 'aaa'), (2, 'aaa'), (3, 'aaa'), (4, 'aaa');{code}
with the following stats:
{code:sql}
alter table t1 set tblproperties('numRows'='4');
alter table t1 set column stats name ('numNulls'='0');{code}
Note that column "id" has no stats at all, while column "name" has 
missing/corrupt stats: its NDV is unset (ndv=-1) even though numNulls=0 is set.
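The stats described above can be checked directly. The following is a minimal sketch, not part of the original report, assuming table t1 is in the current database. In the output of these statements Impala shows -1 for a stat that is unknown:
{code:sql}
-- Table-level stats: the "#Rows" column should show the manually set value of 4.
show table stats t1;
-- Column-level stats: "#Distinct Values" is the NDV and "#Nulls" is numNulls.
show column stats t1;{code}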
Grouping by "id" will report the missing stats:
{code:sql}
explain select id, count(*) from t1 group by id;

WARNING: The following tables are missing relevant table and/or column statistics.
default.t1{code}
However, grouping by "name" doesn't report the missing/corrupt stats:
{noformat}
explain select name, count(*) from t1 group by name;
+---+
| Explain String |
+---+
| Max Per-Host Resource Reservation: Memory=38.00MB Threads=2 |
| Per-Host Resource Estimates: Memory=144MB |
| Codegen disabled by planner |
| Analyzed query: SELECT name, count(*) FROM `default`.t1 GROUP BY name |
| |
| F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 |
| |  Per-Host Resources: mem-estimate=144.00MB mem-reservation=38.00MB thread-reservation=2 |
| PLAN-ROOT SINK |
| |  output exprs: name, count(*) |
| |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0 |
| | |
| 01:AGGREGATE [FINALIZE] |
| |  output: count(*) |
| |  group by: name |
| |  mem-estimate=128.00MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0 |
| |  tuple-ids=1 row-size=20B cardinality=4 |
| |  in pipelines: 01(GETNEXT), 00(OPEN) |
| | |
| 00:SCAN HDFS [default.t1] |
|    HDFS partitions=1/1 files=1 size=24B |
|    stored statistics: |
|      table: rows=4 size=unavailable |
|      columns: all |
|    extrapolated-rows=disabled max-scan-range-rows=4 |
|    mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1 |
|    tuple-ids=0 row-size=12B cardinality=4 |
|    in pipelines: 00(GETNEXT) |
+---+
{noformat}
CC [~rizaon]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-12754) Update Impala document to cover external jdbc table

2024-05-20 Thread Jankiram Balakrishnan (Jira)


 [ https://issues.apache.org/jira/browse/IMPALA-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on IMPALA-12754 started by Jankiram Balakrishnan.
--
> Update Impala document to cover external jdbc table
> ---
>
> Key: IMPALA-12754
> URL: https://issues.apache.org/jira/browse/IMPALA-12754
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Wenzhe Zhou
>Assignee: Jankiram Balakrishnan
>Priority: Major
>
> We need to document the SQL syntax for creating and altering external JDBC 
> tables, including the table properties to be set for JDBC and DBCP (Database 
> Connection Pool).
>  


