On Sun, Nov 2, 2025 at 1:00 PM Jan Finis <[email protected]> wrote:

> Note that you can already put such metadata into the footer by just putting
> it into the regular key-value metadata. Put a JSON array as value there
> with the same number of entries as the schema, then you have an implicit
> 1-to-1 mapping per column. We already use this to store per-column metadata
> and haven't encountered any problems with it so far.
>

Of course you can put anything you want into a single metadata slot.
The hope is to have something that's sensible and semantically clear. An
advantage of the Thrift encoding is that adding fields to a struct doesn't
impact existing readers, as they skip fields that they don't recognize.
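
As a rough sketch of the key-value approach quoted above (the key name,
column names and metadata values are all made up for illustration, and a
plain dict stands in for the footer's key-value metadata):

```python
import json

# Hypothetical schema columns and per-column metadata (None = no metadata).
columns = ["x", "y", "z"]
per_column = [{"unit": "m"}, {"unit": "m"}, None]

# One key-value entry holding a JSON array with one slot per schema column.
# "example.column_metadata" is an invented key, not part of any spec.
key_value_metadata = {"example.column_metadata": json.dumps(per_column)}

# A reader recovers the mapping purely by position in the array.
decoded = json.loads(key_value_metadata["example.column_metadata"])
assert len(decoded) == len(columns)  # the implicit 1-to-1 contract
by_column = dict(zip(columns, decoded))
```

The mapping lives only in the convention that the array order matches the
schema order, which is the part I'd like to make explicit.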

I think this is a free lunch proposal -- there is benefit and no harm.

Here is another possibility: how about allowing extension of the Parquet
Thrift IDL in general by permitting all negative field IDs in the defined
structs to be owned by users? There could be some registry if desired, but
something like this would allow users to add whatever data they like to the
existing metadata layout without impacting those using the standard IDL.
Although the Thrift IDL doc doesn't specify the size of a struct field
identifier, the generated .tcc code uses a signed 16-bit value. This should
leave plenty of room for additions to the accepted spec and for user
additions as well. Again, there would be no impact on existing readers or
writers.
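
To illustrate the idea, here is a minimal sketch of a reader skipping a
field with a negative ID. It assumes the plain Thrift binary protocol
because it is the simplest to show; Parquet footers use the compact
protocol, which encodes field headers differently. The field IDs, values
and helper names are hypothetical, and only two field types are handled.

```python
import struct

T_STOP, T_I32, T_STRING = 0, 8, 11  # Thrift binary-protocol type codes


def write_i32(fid, value):
    # Field header is type (i8) + field id (signed i16), then the i32 value.
    return struct.pack(">bhi", T_I32, fid, value)


def write_string(fid, text):
    # Strings are an i32 byte length followed by the bytes.
    data = text.encode("utf-8")
    return struct.pack(">bhi", T_STRING, fid, len(data)) + data


def read_known(buf, known_ids):
    """Return {field id: value} for known ids, skipping all other fields."""
    out, pos = {}, 0
    while True:
        ftype = buf[pos]
        pos += 1
        if ftype == T_STOP:
            return out
        (fid,) = struct.unpack_from(">h", buf, pos)
        pos += 2
        if ftype == T_I32:
            (value,) = struct.unpack_from(">i", buf, pos)
            pos += 4
        elif ftype == T_STRING:
            (n,) = struct.unpack_from(">i", buf, pos)
            pos += 4
            value = buf[pos:pos + n].decode("utf-8")
            pos += n
        else:
            raise ValueError(f"unsupported type {ftype}")
        # The type tag alone is enough to step over an unrecognized field.
        if fid in known_ids:
            out[fid] = value


# Field 1 stands in for a standard field, field -1 for a user-owned one.
payload = write_i32(1, 42) + write_string(-1, "user data") + bytes([T_STOP])
old_reader = read_known(payload, known_ids={1})
new_reader = read_known(payload, known_ids={1, -1})
```

Here old_reader only sees field 1, while new_reader also picks up the
user-owned field -1 from the same bytes.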

-- 
Andrew Bell
[email protected]
