Hi

The metadata tables can probably help with this.

For the size/num_rows of partitions, you can query the files table,
https://iceberg.apache.org/docs/latest/spark-queries/#files.  (Iceberg
keeps stats for files, and not necessarily for partitions, so you have
to aggregate them yourself.)

SELECT f.partition, SUM(f.file_size_in_bytes), SUM(f.record_count)
FROM $my_table.files f GROUP BY f.partition

This gives the compressed size. (Again, Iceberg keeps file-level stats,
so I'm not sure whether there are any stats for uncompressed sizes.)
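
To make the aggregation concrete, here is a minimal Python sketch of
what the GROUP BY above computes. The rows are made up and only mimic
the partition, file_size_in_bytes and record_count columns of the files
table; the partition is simplified to a string (in Iceberg it is a
struct), and partition_stats is a hypothetical helper, not an Iceberg
API.

```python
from collections import defaultdict

# Made-up rows shaped like the `files` metadata table (illustration only).
files = [
    {"partition": "2022-02-21", "file_size_in_bytes": 1000, "record_count": 10},
    {"partition": "2022-02-21", "file_size_in_bytes": 2000, "record_count": 20},
    {"partition": "2022-02-22", "file_size_in_bytes": 500, "record_count": 5},
]


def partition_stats(rows):
    """Sum file-level sizes and row counts per partition."""
    totals = defaultdict(lambda: {"bytes": 0, "rows": 0})
    for row in rows:
        entry = totals[row["partition"]]
        entry["bytes"] += row["file_size_in_bytes"]  # on-disk (compressed) size
        entry["rows"] += row["record_count"]
    return dict(totals)


stats = partition_stats(files)
```

Each partition's totals are just the sums over the data files that
belong to it, which is why only file-level (compressed) sizes are
available this way.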

For the last modified time, it will be slightly harder. The file's
physical modified time is not good enough, because it is not exactly
when the file was 'committed' into Iceberg. You may have to try a more
advanced query that joins the snapshots table with the entries
(manifest entries) table:
https://iceberg.apache.org/docs/latest/spark-queries/#snapshots

SELECT MAX(s.committed_at), e.data_file.partition
FROM $my_table.snapshots s
JOIN $my_table.entries e ON s.snapshot_id = e.snapshot_id
GROUP BY e.data_file.partition
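
As a rough illustration of what that join computes, here is a small
Python sketch on made-up rows. The snapshot ids, timestamps and
partition values are invented, the partition is flattened to a string
(the real column is the struct data_file.partition), and
last_commit_per_partition is a hypothetical helper rather than anything
Iceberg provides.

```python
from datetime import datetime

# Made-up rows shaped like the `snapshots` and `entries` metadata tables.
snapshots = [
    {"snapshot_id": 1, "committed_at": datetime(2022, 2, 21, 9, 0)},
    {"snapshot_id": 2, "committed_at": datetime(2022, 2, 22, 9, 0)},
    {"snapshot_id": 3, "committed_at": datetime(2022, 2, 23, 9, 0)},
]
entries = [
    {"snapshot_id": 1, "partition": "a"},
    {"snapshot_id": 2, "partition": "a"},
    {"snapshot_id": 3, "partition": "b"},
    {"snapshot_id": 99, "partition": "b"},  # no matching snapshot row
]


def last_commit_per_partition(snapshots, entries):
    """Inner-join entries to snapshots, then take MAX(committed_at) per partition."""
    committed = {s["snapshot_id"]: s["committed_at"] for s in snapshots}
    latest = {}
    for entry in entries:
        ts = committed.get(entry["snapshot_id"])
        if ts is None:
            continue  # inner join: drop entries without a matching snapshot
        part = entry["partition"]
        if part not in latest or ts > latest[part]:
            latest[part] = ts
    return latest


latest = last_commit_per_partition(snapshots, entries)
```

Note that, as with the SQL inner join, an entry whose snapshot no longer
appears in the snapshots table contributes nothing to the result.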

Hope that helps,
Szehon

On Wed, Feb 23, 2022 at 8:50 AM Mayur Srivastava <
mayur.srivast...@twosigma.com> wrote:

> Hi,
>
>
>
> In Iceberg, is there a way to get the last modified timestamp and other
> stats (e.g. num rows, uncompressed size, compressed size) of the data per
> partition?
>
>
>
> Thanks,
>
> Mayur
>
>
>
