[ https://issues.apache.org/jira/browse/IMPALA-11784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646598#comment-17646598 ]

ASF subversion and git services commented on IMPALA-11784:
----------------------------------------------------------

Commit da304c1feddd24ce5042d38365c9357d61608962 in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=da304c1fe ]

IMPALA-11784: Don't call Iceberg's planFiles redundantly during table load

Iceberg's planFiles() API is very expensive because it involves reading
the Avro manifest files. It's especially expensive on object stores,
though manifest caching can help here.

Currently we invoke this API twice during table loading (via
IcebergUtil.getIcebergFiles()): once in loadAllPartition() and once in
loadPartitionStats().

With this patch we invoke IcebergUtil.getIcebergFiles() once, then pass
the result object to loadAllPartition() and loadPartitionStats().

Change-Id: I72575c722e65a91b14926cf24b4622b4499a4e20
Reviewed-on: http://gerrit.cloudera.org:8080/19334
Reviewed-by: <lipeng...@sensorsdata.cn>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Tamas Mate <tma...@apache.org>
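
The change described in the commit message above follows a common pattern:
compute an expensive result once and hand it to every consumer. The Java
sketch below only illustrates that pattern and is not the actual Impala code.
The class name PlanOnceSketch, the call counter, the file paths, the method
signatures, and the simplified return types are all assumptions made for
illustration. Only the method names getIcebergFiles(), loadAllPartition(),
and loadPartitionStats() come from the commit message.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class PlanOnceSketch {
    // Counts how often the (simulated) expensive scan planning runs.
    static final AtomicInteger planCalls = new AtomicInteger();

    // Hypothetical stand-in for IcebergUtil.getIcebergFiles(). The real method
    // goes through Iceberg's planFiles() and reads the Avro manifest files.
    static List<String> getIcebergFiles() {
        planCalls.incrementAndGet();
        return List.of("part=1/a.parquet", "part=1/b.parquet", "part=2/c.parquet");
    }

    // Hypothetical loader: consumes the precomputed file list and returns
    // the number of files it saw.
    static long loadAllPartition(List<String> files) {
        return files.size();
    }

    // Hypothetical loader: consumes the same list and returns the number of
    // distinct partition prefixes.
    static long loadPartitionStats(List<String> files) {
        return files.stream()
                .map(f -> f.substring(0, f.indexOf('/')))
                .distinct()
                .count();
    }

    // Plan once, then pass the result object to both loaders.
    static long[] loadTable() {
        List<String> files = getIcebergFiles();
        return new long[] { loadAllPartition(files), loadPartitionStats(files) };
    }

    public static void main(String[] args) {
        long[] result = loadTable();
        System.out.println("files=" + result[0]
                + " partitions=" + result[1]
                + " planCalls=" + planCalls.get());
    }
}
```

In this sketch the counter ends at 1 after loadTable(). If each loader called
getIcebergFiles() itself, as in the pre-patch behavior described above, the
counter would end at 2.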


> Don't call Iceberg's planFiles redundantly during table load
> ------------------------------------------------------------
>
>                 Key: IMPALA-11784
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11784
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>
> Iceberg's planFiles() API is very expensive because it involves reading the 
> Avro manifest files. It's especially expensive on object stores, though 
> manifest caching can help here.
> Currently we invoke this API twice during table loading (via 
> IcebergUtil.getIcebergFiles()): once in loadAllPartition() and once in 
> loadPartitionStats().
> We should just invoke IcebergUtil.getIcebergFiles() once, then pass the 
> result object to loadAllPartition() and loadPartitionStats().



