[ https://issues.apache.org/jira/browse/IMPALA-11171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Becker updated IMPALA-11171:
-----------------------------------
    Fix Version/s: Not Applicable
                       (was: Impala 4.2.0)

> Impala still re-reads Iceberg manifest files for each SCAN node.
> ----------------------------------------------------------------
>
>                 Key: IMPALA-11171
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11171
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>             Fix For: Not Applicable
>
>         Attachments: Screen Shot 2022-03-12 at 3.23.28 PM.png
>
>
> In IcebergUtil.getIcebergDataFiles() we call scan.planFiles():
> https://github.com/apache/impala/blob/7f1ce039be30d5b36a490e8b07728f82f5d4c3de/fe/src/main/java/org/apache/impala/util/IcebergUtil.java#L534
> scan.planFiles() has to read the manifest files in order to return the list
> of files to be scanned. Because this happens for each SCAN node, it adds
> significant overhead to the planning time of short-running queries.
> We could mitigate this in the following ways:
> * Cache the result of TableScan.planFiles() computed without predicates, and
> use it instead of pushing predicates down to Iceberg. This would need logic
> to decide when to use the cached plan files and when to push down predicates.
> * Figure out whether manifest files can be cached, so that we don't need to
> re-read them for each table scan.
> ** If this is not possible, we might need to contribute code to Iceberg.
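
The first mitigation idea in the description (caching the unfiltered result of planFiles()) could look roughly like the sketch below. It is only an illustration under stated assumptions: PlanFilesCache, getOrPlan() and shouldUseCache() are hypothetical names that do not exist in Impala or Iceberg, file paths stand in for Iceberg's file scan tasks, the Supplier stands in for the expensive scan.planFiles() call, and the size threshold is a placeholder for whatever decision logic would really be needed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names come from Impala or Iceberg.
final class PlanFilesCache {
  // Key is "<table>@<snapshotId>": a new snapshot gets a new entry, so a
  // cached list is never served for a table state it was not planned against.
  private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

  // Returns the unfiltered file list for the given snapshot. The planner
  // (standing in for scan.planFiles()) runs at most once per key.
  List<String> getOrPlan(String tableName, long snapshotId,
      Supplier<List<String>> planner) {
    String key = tableName + "@" + snapshotId;
    return cache.computeIfAbsent(key,
        k -> Collections.unmodifiableList(new ArrayList<>(planner.get())));
  }

  // Placeholder heuristic (an assumption, not taken from the issue): use the
  // cached list only while it is small enough that filtering it on the Impala
  // side is cheaper than re-reading manifests with predicates pushed down.
  static boolean shouldUseCache(int cachedFileCount, int maxCachedFiles) {
    return cachedFileCount <= maxCachedFiles;
  }
}
```

With this shape, every SCAN node that reads the same table snapshot within a query would share one planning pass. Invalidation is handled only implicitly, through the snapshot id in the key; evicting entries for old snapshots is left out of the sketch.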