[ https://issues.apache.org/jira/browse/IMPALA-12395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Smith updated IMPALA-12395:
-----------------------------------
    Component/s: Frontend
                     (was: fe)

> Planner overestimates scan cardinality for queries using count star optimization
> --------------------------------------------------------------------------------
>
>                 Key: IMPALA-12395
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12395
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: David Rorke
>            Assignee: Riza Suminto
>            Priority: Critical
>             Fix For: Impala 4.3.0
>
>
> The scan cardinality estimate for count(*) queries doesn't account for the
> fact that the count(*) optimization only scans metadata and not the actual
> columns.
> Scan for a count(*) query on Parquet store_sales:
>
> {noformat}
> Operator    #Hosts  #Inst  Avg Time  Max Time  #Rows  Est. #Rows  Peak Mem   Est. Peak Mem  Detail
> -----------------------------------------------------------------------------------------------------------------------------------------------------
> 00:SCAN S3       6     72   8s131ms   8s496ms  2.71K       8.64B  128.00 KB  88.00 MB       tpcds_3000_string_parquet_managed.store_sales
> {noformat}
>
> This is a problem with all file/table formats that implement count(*)
> optimizations (Parquet and also probably ORC and Iceberg).
> This problem is more serious than it was in the past because with
> IMPALA-12091 we now rely on scan cardinality estimates for executor group
> assignments so count(*) queries are likely to get assigned to a larger
> executor group than needed.
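
To illustrate the size of the mismatch in the profile above, here is a minimal Python sketch. It is not Impala's planner code. It assumes that a count(*)-optimized scan returns roughly one row per metadata unit it reads (for example, one per file or row group); that assumption and the names `estimate_scan_cardinality`, `metadata_units`, and `overestimate_factor` are hypothetical and chosen only for illustration. The inputs are the rounded figures from the profile (8.64B estimated rows, 2.71K actual rows).

```python
def estimate_scan_cardinality(table_rows: int,
                              metadata_units: int,
                              count_star_optimized: bool) -> int:
    """Hypothetical scan cardinality estimate (illustration only).

    Assumption: when the count(*) optimization applies, the scan emits
    about one row per metadata unit instead of one row per table row.
    """
    if count_star_optimized:
        # Never estimate more rows than the table actually holds.
        return min(table_rows, metadata_units)
    return table_rows


def overestimate_factor(estimated_rows: int, actual_rows: int) -> float:
    """How many times larger the estimate is than the observed row count."""
    return estimated_rows / actual_rows


# Rounded figures taken from the profile in the issue description.
EST_ROWS = 8_640_000_000   # "Est. #Rows" = 8.64B
ACTUAL_ROWS = 2_710        # "#Rows" = 2.71K

naive = estimate_scan_cardinality(EST_ROWS, ACTUAL_ROWS, count_star_optimized=False)
adjusted = estimate_scan_cardinality(EST_ROWS, ACTUAL_ROWS, count_star_optimized=True)

print("naive estimate:   ", naive)
print("adjusted estimate:", adjusted)
print("overestimate factor: %.0f" % overestimate_factor(naive, ACTUAL_ROWS))
```

Under this assumption the unadjusted estimate exceeds the observed row count by more than six orders of magnitude, which is the kind of gap that could steer executor group selection toward a larger group than the query needs.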