Re: Typical number of key-value metadata entries?

Fokko Driesprong Thu, 16 May 2024 13:45:19 -0700

Hey Antoine,

First of all, I love the recent uptick in activity on the Parquet side. I'm
on holiday, but I'll definitely catch up when I return.


I wanted to respond to this particular mail since we do store various
fields in the metadata for Apache Iceberg. For example:

   - The JSON-serialized Iceberg schema that was used when writing the
   file:
   https://github.com/apache/iceberg/blob/bd046f844a1cbad6c98919d8ea63176aeae78d33/parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java#L274
   - In the case of delete files, we write the kind of file (positional or
   equality), and in the case of equality, also the field IDs:
   https://github.com/apache/iceberg/blob/bd046f844a1cbad6c98919d8ea63176aeae78d33/parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java#L905-L910

The schema could become quite big, as its size is proportional to the number
of columns. The metadata is mostly set for debugging purposes and is not
part of the official Iceberg spec.
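
To give a rough feel for how the schema entry scales with the number of
columns, here is a minimal Python sketch. It is an illustration under
assumptions, not Iceberg's actual code: the "iceberg.schema" key, the
simplified JSON layout, and the helper names iceberg_like_schema_json and
metadata_footprint are all hypothetical stand-ins, and real schemas with
nested types or field docs would be larger.

```python
import json


def iceberg_like_schema_json(num_columns):
    """Build a simplified, Iceberg-style JSON schema (assumed layout)
    with `num_columns` optional long fields."""
    fields = [
        {"id": i, "name": f"col_{i}", "required": False, "type": "long"}
        for i in range(1, num_columns + 1)
    ]
    return json.dumps({"type": "struct", "schema-id": 0, "fields": fields})


def metadata_footprint(entries):
    """Return (number of entries, total UTF-8 bytes of keys plus values)
    for a dict of key-value metadata strings."""
    total = sum(
        len(key.encode("utf-8")) + len(value.encode("utf-8"))
        for key, value in entries.items()
    )
    return len(entries), total


# "iceberg.schema" is an assumed key name used only for this illustration.
for n in (10, 100, 1000):
    entries = {"iceberg.schema": iceberg_like_schema_json(n)}
    count, size = metadata_footprint(entries)
    print(f"{n:>5} columns -> {count} entry, {size} bytes")
```

The entry count stays at one while the byte size grows roughly linearly
with the column count, which is the overhead that matters for wide tables.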

I hope this helps!

Kind regards,
Fokko

On Thu, 16 May 2024 at 21:17, Antoine Pitrou <[email protected]> wrote:

>
> Hello,
>
> In https://github.com/apache/parquet-format/pull/242 the question came
> up of the size and overhead of key-value metadata entries in real-world
> Parquet files (basically, user-defined metadata attached either to the
> entire file or to individual columns). Do people have insight to share
> about the typical number of metadata entries in a file or column, and
> their typical byte size?
>
> Regards
>
> Antoine.
>
>
>
