[jira] [Updated] (IMPALA-12395) Planner overestimates scan cardinality for queries using count star optimization

Michael Smith (Jira) Tue, 16 Jul 2024 16:44:22 -0700


     [ 
https://issues.apache.org/jira/browse/IMPALA-12395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Michael Smith updated IMPALA-12395:
-----------------------------------
    Component/s: Frontend
                     (was: fe)

> Planner overestimates scan cardinality for queries using count star 
> optimization
> --------------------------------------------------------------------------------
>
>                 Key: IMPALA-12395
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12395
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: David Rorke
>            Assignee: Riza Suminto
>            Priority: Critical
>             Fix For: Impala 4.3.0
>
>
> The scan cardinality estimate for count(*) queries doesn't account for the 
> fact that the count(*) optimization only scans metadata and not the actual 
> columns.
> Scan node from the profile of a count(*) query on Parquet store_sales (2.71K 
> actual rows versus an estimate of 8.64B):
>  
> {noformat}
> Operator    #Hosts  #Inst  Avg Time  Max Time  #Rows  Est. #Rows  Peak Mem   Est. Peak Mem  Detail
> ---------------------------------------------------------------------------------------------------------------------------------------
> 00:SCAN S3  6       72     8s131ms   8s496ms   2.71K  8.64B       128.00 KB  88.00 MB       tpcds_3000_string_parquet_managed.store_sales
> {noformat}
>  
> This affects every file/table format that implements a count(*) 
> optimization (Parquet, and probably also ORC and Iceberg).
> The problem is more serious than it used to be: since IMPALA-12091, 
> executor group assignment relies on scan cardinality estimates, so count(*) 
> queries are likely to be assigned to a larger executor group than they need.
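
To make the size of the gap concrete, here is a minimal Python sketch of how an estimate could be capped when the count(*) optimization applies. It is not Impala's planner code and not necessarily how the fix was implemented. The function name, its parameters, and the assumption that an optimized scan emits roughly one row per metadata unit (for example a Parquet row group) are all hypothetical. The two figures are the rounded values from the profile above.

```python
def estimate_scan_cardinality(table_rows: int,
                              metadata_units: int,
                              count_star_optimized: bool) -> int:
    """Hypothetical estimator, for illustration only.

    table_rows: row count from table statistics.
    metadata_units: number of units (e.g. row groups) whose metadata is read;
        assumed here to bound the rows an optimized scan can emit.
    """
    if count_star_optimized:
        # Assumption: the scan returns at most one row per metadata unit.
        return min(table_rows, metadata_units)
    # Without the optimization, every row in the table is scanned.
    return table_rows


# Rounded figures from the profile: 8.64B estimated rows, 2.71K actual rows.
naive = estimate_scan_cardinality(8_640_000_000, 2_710, count_star_optimized=False)
capped = estimate_scan_cardinality(8_640_000_000, 2_710, count_star_optimized=True)
print(naive // capped)  # factor by which the uncapped estimate exceeds the capped one
```

With these rounded inputs the uncapped estimate is more than six orders of magnitude larger than the number of rows the scan actually produced, which is the kind of gap that can push a query into a larger executor group.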



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
