[jira] [Comment Edited] (IMPALA-11701) Slow query problem about querying iceberg table by impala

Qizhu Chan (Jira) Thu, 03 Nov 2022 23:15:26 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628693#comment-17628693
 ]


Qizhu Chan edited comment on IMPALA-11701 at 11/4/22 6:14 AM:
--------------------------------------------------------------

Oh, I found an issue 
[IMPALA-11171|https://issues.apache.org/jira/browse/IMPALA-11171] that seems to 
be related to my issue?


was (Author: libra_816):
Oh, I found an 
issue[IMPALA-11171|https://issues.apache.org/jira/browse/IMPALA-11171] that 
seems to be related to my issue ?

> Slow query problem about querying iceberg table by impala
> ---------------------------------------------------------
>
>                 Key: IMPALA-11701
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11701
>             Project: IMPALA
>          Issue Type: Question
>            Reporter: Qizhu Chan
>            Priority: Major
>              Labels: impala-iceberg
>         Attachments: image-2022-11-03-17-37-14-712.png, 
> profile_cf446a1ab3a5e852_1b1005de00000000.txt
>
>
> I use Impala to query an Iceberg table, but query performance is poor: 
> compared with querying a Hive-format table holding the same data, the 
> query takes dozens of times longer.
> The SQL statement is a very simple count query, like:
> select count(*) from `db_name`.tbl_name where datekey='20221001' and event='xxx'
> ('datekey' and 'event' are the partition fields)
> I expected that Impala could use Iceberg's metadata stats and return the 
> result very quickly, but it doesn't.
> The Iceberg table uses a Hadoop-type catalog, and Impala accesses it through 
> an external table created in Hive. Snapshot expiration and data compaction 
> run on the Iceberg table daily, so there should be no small-file problem.
> I found this warning using the explain statement:
> {code:java}
> | WARNING: The following tables are missing relevant table and/or column statistics. |
> | iceberg.gamebox_event_iceberg
> {code}
> Query: SHOW TABLE STATS `iceberg`.gamebox_event_iceberg
> +-------+--------+--------+--------------+-------------------+---------+-------------------+------------------------------------------------------+
> | #Rows | #Files | Size   | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                             |
> +-------+--------+--------+--------------+-------------------+---------+-------------------+------------------------------------------------------+
> | 0     | 590509 | 1.91TB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs:///hive/warehouse/iceberg/gamebox_event_iceberg |
> +-------+--------+--------+--------------+-------------------+---------+-------------------+------------------------------------------------------+
> It seems that Impala is not syncing Iceberg's table and column statistics, 
> but I'm not sure whether this is related to the slow query.
> As the screenshot shows, I think most of the query time is spent in planning 
> and in the execution backends, but I don't know why these two phases take 
> so long.
> The complete profile for this query is attached.
> How can I speed up the query? Can someone please help?
>  !image-2022-11-03-17-37-14-712.png! 
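
As a rough sanity check on the SHOW TABLE STATS figures quoted above (590509 files, 1.91TB), the average file size can be estimated with a few lines of Python. This is only a sketch: it assumes the Size column uses binary units (1 TB = 1024^4 bytes) and that #Files counts data files only, neither of which is confirmed in the report. The function and constant names are illustrative.

```python
# Figures copied from the SHOW TABLE STATS output in the issue description.
NUM_FILES = 590509
TOTAL_SIZE_TB = 1.91

# Assumption: the Size column uses binary units (1 TB = 1024**4 bytes).
BYTES_PER_TB = 1024 ** 4
BYTES_PER_MB = 1024 ** 2


def average_file_size_mb(total_tb: float, num_files: int) -> float:
    """Return the mean file size in MB for total_tb spread over num_files."""
    if num_files <= 0:
        raise ValueError("num_files must be positive")
    return total_tb * BYTES_PER_TB / num_files / BYTES_PER_MB


avg_mb = average_file_size_mb(TOTAL_SIZE_TB, NUM_FILES)
print(f"average file size: {avg_mb:.2f} MB")
```

Under these assumptions the average works out to a few megabytes per file, which may be worth comparing against the expectation that daily compaction leaves no small files.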



