> I have concerns about the Rust implementation which seems to hard-code
> writing out the new sort order.  It would be good to understand if this
> would break old readers (i.e. how do they handle unknown sort order at the
> thrift parsing stage).  I have less of a concern if like Java this is an
> opt-in feature for now.

Well, it is just a proof of concept ;-)

More seriously, the only generated code that I know of that would break is the
Rust version (which is one of the reasons we chose to roll our own thrift parser
in parquet-rs). I've looked at the Java, C++, and Python generated thrift code
and they all seem to tolerate unknown union fields (as they should, a union is
just a struct after all). I'm not sure what polars uses, so that might be an 
issue.

My original PoC for the earlier PR (#221 [1]) was opt-in, but by the time I took
over the implementation for the current PR (#514 [2]) a deliberate decision was
made to only use the new ColumnOrder for floats. The reasoning being making
it configurable added complexity to an already complex implementation, and
continuing to write files that cannot support pushdown makes little sense (so
long as existing implementations can simply ignore the new ordering). If it does
turn out that the change isn't as forward compatible as we think, we can revisit
before releasing it (which realistically won't be until August at the earliest).

We do have a file in parquet-testing [3] (thanks Gang!) that can be used to
check compatibility.

Cheers,
Ed

[1] https://github.com/apache/parquet-format/pull/221
[2] https://github.com/apache/parquet-format/pull/514
[3] https://github.com/apache/parquet-testing/pull/104

Reply via email to