[ 
https://issues.apache.org/jira/browse/IMPALA-11608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-11608:
---------------------------------------
    Description: 
Impala's SHOW TABLE STATS outputs the wrong number of files for Iceberg tables.
It counts all files under the table directory, including metadata files,
orphaned files, and old data files that do not belong to the current snapshot.

It should output only the number of data files in the current snapshot, making
the output consistent with SHOW FILES IN tbl;

{noformat}
create table test (i int) stored as iceberg;

compute stats test;

show table stats test;

+-------+--------+--------+--------------+-------------------+---------+-------------------+--------------------------------------------+
| #Rows | #Files | Size   | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                   |
+-------+--------+--------+--------------+-------------------+---------+-------------------+--------------------------------------------+
| -1    | 2      | 2.70KB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test |
+-------+--------+--------+--------------+-------------------+---------+-------------------+--------------------------------------------+
{noformat}

SHOW TABLE STATS is handled here: 
https://github.com/apache/impala/blob/66484a4c081f3242750a3a0e04159dd4580b37a4/fe/src/main/java/org/apache/impala/service/Frontend.java#L1429-L1457
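
The counting rule described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not Impala's code and not the Iceberg API: the class IcebergFileCount, the TableFile type, both method names, and all file paths are hypothetical. It contrasts counting everything under the table directory with counting only the data files that the current snapshot references.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IcebergFileCount {
    /** Hypothetical stand-in for a file found under the table directory. */
    static final class TableFile {
        final String path;
        final boolean isDataFile;

        TableFile(String path, boolean isDataFile) {
            this.path = path;
            this.isDataFile = isDataFile;
        }
    }

    /** Counts every file under the table directory (the reported behavior). */
    static long countAllFiles(List<TableFile> filesUnderTableDir) {
        return filesUnderTableDir.size();
    }

    /** Counts only data files referenced by the current snapshot (the expected behavior). */
    static long countCurrentSnapshotDataFiles(
            List<TableFile> filesUnderTableDir, Set<String> currentSnapshotPaths) {
        return filesUnderTableDir.stream()
                .filter(f -> f.isDataFile)
                .filter(f -> currentSnapshotPaths.contains(f.path))
                .count();
    }

    public static void main(String[] args) {
        // Made-up paths, for illustration only.
        List<TableFile> files = Arrays.asList(
                new TableFile("metadata/v1.metadata.json", false), // metadata file
                new TableFile("metadata/snap-1.avro", false),      // metadata file
                new TableFile("data/orphan.parquet", true),        // orphaned data file
                new TableFile("data/old.parquet", true),           // data file of an old snapshot
                new TableFile("data/current.parquet", true));      // data file of the current snapshot
        Set<String> currentSnapshot =
                new HashSet<>(Arrays.asList("data/current.parquet"));

        System.out.println(countAllFiles(files));                                  // prints 5
        System.out.println(countCurrentSnapshotDataFiles(files, currentSnapshot)); // prints 1
    }
}
```

In this model the second count is the one that would match the number of rows returned by SHOW FILES IN tbl.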


  was:
Impala's SHOW TABLE STATS outputs the wrong number of files. It counts all
files under the table directory, including metadata files, orphaned files, and
old data files that do not belong to the current snapshot.

It should output only the number of data files in the current snapshot, making
the output consistent with SHOW FILES IN tbl;

{noformat}
create table test (i int) stored as iceberg;

compute stats test;

show table stats test;

+-------+--------+--------+--------------+-------------------+---------+-------------------+--------------------------------------------+
| #Rows | #Files | Size   | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                   |
+-------+--------+--------+--------------+-------------------+---------+-------------------+--------------------------------------------+
| -1    | 2      | 2.70KB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test |
+-------+--------+--------+--------------+-------------------+---------+-------------------+--------------------------------------------+
{noformat}

SHOW TABLE STATS is handled here: 
https://github.com/apache/impala/blob/66484a4c081f3242750a3a0e04159dd4580b37a4/fe/src/main/java/org/apache/impala/service/Frontend.java#L1429-L1457



> Impala SHOW TABLE STATS shows wrong number of files
> ---------------------------------------------------
>
>                 Key: IMPALA-11608
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11608
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg, ramp-up
>
> Impala's SHOW TABLE STATS outputs the wrong number of files for Iceberg
> tables. It counts all files under the table directory, including metadata
> files, orphaned files, and old data files that do not belong to the current
> snapshot.
> It should output only the number of data files in the current snapshot,
> making the output consistent with SHOW FILES IN tbl;
> {noformat}
> create table test (i int) stored as iceberg;
> compute stats test;
> show table stats test;
> +-------+--------+--------+--------------+-------------------+---------+-------------------+--------------------------------------------+
> | #Rows | #Files | Size   | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                   |
> +-------+--------+--------+--------------+-------------------+---------+-------------------+--------------------------------------------+
> | -1    | 2      | 2.70KB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://localhost:20500/test-warehouse/test |
> +-------+--------+--------+--------------+-------------------+---------+-------------------+--------------------------------------------+
> {noformat}
> SHOW TABLE STATS is handled here: 
> https://github.com/apache/impala/blob/66484a4c081f3242750a3a0e04159dd4580b37a4/fe/src/main/java/org/apache/impala/service/Frontend.java#L1429-L1457



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
