Hi again,
Ok, a quick summary of my current feedback on this:
- decoding speed measurements are given, but not footer size
measurements; it would be interesting to have both
- it's not obvious whether the stated numbers are for reading all
columns or a subset of them
- optional LZ4 compression is mentioned, but no numbers are given for
it; it would be nice if numbers were available for both uncompressed
and compressed footers
- the numbers seem quite underwhelming currently; I think most of us
were expecting massive speed improvements given past discussions
- I'm firmly against narrowing sizes to 32 bits; making the footer more
compact is useful, but not to the point of reducing usefulness or
generality
A more general proposal: given the slightly underwhelming perf
numbers, has nested Flatbuffers been considered as an alternative?
For example, the RowGroup table could become:
```
table ColumnChunk {
file_path: string;
meta_data: ColumnMetadata;
// etc.
}
// A table rather than a struct: Flatbuffers structs cannot contain vectors
table EncodedColumnChunk {
// Flatbuffers-encoded ColumnChunk, to be decoded/validated individually
column: [ubyte];
}
table RowGroup {
columns: [EncodedColumnChunk];
total_byte_size: int;
num_rows: int;
sorting_columns: [SortingColumn];
file_offset: long;
total_compressed_size: int;
ordinal: short = null;
}
```
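To illustrate the intent, here is a rough Python sketch. It is not a Flatbuffers implementation: plain length-prefixed blobs stand in for the nested encoding, the names `encode_row_group`, `column_spans` and `decode_columns` are made up for this example, and `decode` is a placeholder for whatever decodes and validates a single ColumnChunk. The only point is that a reader selecting a subset of columns pays the decoding cost for those columns alone.
```python
import struct


def encode_row_group(column_blobs):
    """Pack independently encoded column chunks as length-prefixed blobs.

    Layout (an assumption for this sketch): uint32 count, then for each
    column a uint32 length followed by that many opaque bytes.
    """
    parts = [struct.pack("<I", len(column_blobs))]
    for blob in column_blobs:
        parts.append(struct.pack("<I", len(blob)))
        parts.append(blob)
    return b"".join(parts)


def column_spans(buf):
    """Return (offset, length) for each column blob without decoding any."""
    (count,) = struct.unpack_from("<I", buf, 0)
    pos = 4
    spans = []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        spans.append((pos, length))
        pos += length
    return spans


def decode_columns(buf, wanted, decode):
    """Decode only the requested column indices; other blobs stay opaque."""
    spans = column_spans(buf)
    result = {}
    for index in wanted:
        offset, length = spans[index]
        result[index] = decode(buf[offset:offset + length])
    return result
```
With real Flatbuffers the outer vector would give direct access to each element, so the linear scan in `column_spans` is an artifact of this simplified layout rather than part of the proposal.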
Regards
Antoine.
On Thu, 11 Sep 2025 08:41:34 +0200
Alkis Evlogimenos
<[email protected]>
wrote:
> Hi all. I am sharing as a separate thread the proposal for the footer
> change we have been working on:
> https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit
> .
>
> The proposal outlines the technical aspects of the design and the
> experimental results of shadow testing this in production workloads. I
> would like to discuss the proposal's most salient points in the next sync:
> 1. the use of flatbuffers as footer serialization format
> 2. the additional limitations imposed on parquet files (row group size
> limit, row group max num row limit)
>
> I would prefer comments on the google doc to facilitate async discussion.
>
> Thank you,
>