Thanks Alkis! I so wanted to love this proposal, but unfortunately I don't
think it will work.

> 2. Mark FileMetaData.version optional in thrift. A writer that sets the
> bundle field omits version. A file carries exactly one of the two.
> 
> The trick is (2): the deployed readers I checked hard-fail at footer parse
> when FileMetaData.version is missing: parquet-java, arrow-cpp, parquet-rs
> and DuckDB. They all enforce its presence even though the spec says to
> ignore its value. Old readers fail immediately on open instead of tripping
> on obscure errors later, or worse, reading bad data.

The problem is that the footer metadata is written and parsed depth first,
and a struct's required fields are only validated after all of its fields
have been read. So even if "version" is missing, an old reader won't find
out until well after the row group metadata has been parsed. If
"path_in_schema" is
missing as well, that error will be thrown first. The only ways I can think
of to make this work are pretty convoluted. I can share my ideas in a
separate thread if there's interest.
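
To make the ordering concrete, here is a toy Python model of a depth-first
struct reader that checks required fields only once a struct has been fully
consumed. It is a sketch of the behaviour described above, not the code any
real reader generates from thrift: REQUIRED, read_struct, read_value and
first_error are made-up names, and the required-field table is trimmed to
the two fields under discussion.

```python
class MissingField(Exception):
    """Raised when a struct finishes parsing without a required field."""


# Hypothetical, heavily trimmed required-field table for illustration only.
REQUIRED = {
    "FileMetaData": ["version"],
    "ColumnMetaData": ["path_in_schema"],
}


def read_value(value):
    """Recurse into nested structs and lists; return scalars unchanged."""
    if isinstance(value, tuple):      # (struct_name, fields) is a nested struct
        return read_struct(*value)
    if isinstance(value, list):
        return [read_value(v) for v in value]
    return value


def read_struct(name, fields):
    """Read every field first (depth first), then validate this struct."""
    out = {}
    for key, value in fields.items():
        out[key] = read_value(value)  # children are fully read and validated here
    # Only now, after all fields were consumed, is presence checked.
    for req in REQUIRED.get(name, ()):
        if req not in out:
            raise MissingField(f"{name}.{req}")
    return out


def make_footer(with_path):
    """Build a footer with no 'version'; optionally drop 'path_in_schema' too."""
    column_meta = {"type": 1}
    if with_path:
        column_meta["path_in_schema"] = ["a"]
    chunk = ("ColumnChunk", {"meta_data": ("ColumnMetaData", column_meta)})
    row_group = ("RowGroup", {"columns": [chunk]})
    return ("FileMetaData", {"num_rows": 10, "row_groups": [row_group]})


def first_error(footer):
    """Return the first missing-field error raised while parsing, or None."""
    try:
        read_struct(*footer)
    except MissingField as exc:
        return str(exc)
    return None
```

In this model, first_error(make_footer(with_path=False)) reports the nested
ColumnMetaData.path_in_schema field, and the missing version only surfaces
once the row groups parse cleanly.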

Cheers,
Ed


