[ 
https://issues.apache.org/jira/browse/IMPALA-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-10254:
---------------------------------------
    Description: 
Currently we still load the file descriptors of an Iceberg table via recursive 
file listing.

This lists too many files, e.g. metadata files, files that are being written 
(can later throw checksum errors), files from aborted INSERTs, removed files, 
etc.

We should use the Iceberg API to load the file descriptors corresponding to the 
table snapshot.

Note that we already load data files through the Iceberg APIs to fill the 
'path_hash_to_file_descriptor' map 
(https://github.com/apache/impala/blob/master/common/thrift/CatalogObjects.thrift#L551).
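
To illustrate the difference between the two loading strategies, here is a 
minimal toy model. It does not use the real Iceberg or Impala APIs: ToyTable, 
list_recursively, list_from_snapshot and all file names are hypothetical, 
chosen only to mirror the cases listed above. In the Iceberg Java API the 
snapshot-based listing would presumably come from planning a table scan 
(Table.newScan().planFiles()), but that is an assumption about how the fix 
could be done, not a description of the actual change.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical stand-in for an Iceberg table; not the real Iceberg API."""

    # Every path physically present under the table location.
    files_on_storage: set = field(default_factory=set)
    # Data files referenced by the current snapshot.
    snapshot_data_files: set = field(default_factory=set)


def list_recursively(table: ToyTable) -> list:
    """Current behaviour: return everything found under the table directory."""
    return sorted(table.files_on_storage)


def list_from_snapshot(table: ToyTable) -> list:
    """Proposed behaviour: return only the files the snapshot references."""
    return sorted(table.snapshot_data_files)


table = ToyTable(
    files_on_storage={
        "data/a.parquet",
        "data/b.parquet",
        "data/removed.parquet",         # removed from the table, still on disk
        "data/aborted_insert.parquet",  # left behind by an aborted INSERT
        "data/in_progress.parquet",     # still being written
        "metadata/v2.metadata.json",    # metadata file, not a data file
    },
    snapshot_data_files={"data/a.parquet", "data/b.parquet"},
)

# Files that recursive listing returns but the snapshot does not reference.
extra = set(list_recursively(table)) - set(list_from_snapshot(table))
print(sorted(extra))
```

In this sketch the recursive listing returns all six paths, while the 
snapshot-based listing returns only the two data files that belong to the 
table.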

  was:
Currently we still load the file descriptors of an Iceberg table via recursive 
file listing.

This lists too many files, e.g. metadata files, files that are being written 
(can later throw checksum errors), files from aborted INSERTs, removed files, 
etc.

We should use the Iceberg API to load the file descriptors corresponding to the 
table snapshot.


> Load data files via Iceberg for Iceberg Tables
> ----------------------------------------------
>
>                 Key: IMPALA-10254
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10254
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>
> Currently we still load the file descriptors of an Iceberg table via 
> recursive file listing.
> This lists too many files, e.g. metadata files, files that are being written 
> (can later throw checksum errors), files from aborted INSERTs, removed files, 
> etc.
> We should use the Iceberg API to load the file descriptors corresponding to 
> the table snapshot.
> Note that we already load data files through the Iceberg APIs to fill the 
> 'path_hash_to_file_descriptor' map 
> (https://github.com/apache/impala/blob/master/common/thrift/CatalogObjects.thrift#L551).



