Hi,

Arrow2 and parquet2 have passed the IP clearance vote and are ready to be
merged into the apache/* repos.

My plan is to merge them and then open PRs against both with the latest
updates from my own repos, so that I can archive the versions under my
account (temporarily, and hopefully permanently) and move development to
apache/*.

Most of the work happening in arrow-rs is backward compatible or simple to
deprecate. However, the situation is different for arrow2 and parquet2: a
cadence of one major release every 3 months is prohibitive at the pace at
which I am plowing through changes.

The core API (types, alloc, buffer, bitmap, array, mutable array) is, IMO,
stable and unlikely to change much, but the non-core API (namely IO and
compute) is likely to change. Examples:

* Add a Scalar API to allow dynamic casting over the aggregate kernels and
parquet statistics
* Move compute/ from the arrow crate into a separate crate
* Move io/ from the arrow crate into a separate crate
* Add an option to select the encoding based on DataType and field name when
writing to parquet

(I will create issues for them in the experimental repos for proper
visibility and discussion.)

This situation is usually addressed via the 0.X model of semver 2 (in
Python, FastAPI <https://fastapi.tiangolo.com/> is a prominent example that
uses it, and almost all of the Rust ecosystem uses it as well). However,
there are a couple of blockers in this context:

1. We do not allow releases of experimental repos to avoid confusion over
which is *the* official package.
2. arrow-rs is at version 5, and some dependents, like IOx/Influx, seem to
prefer a slower cadence of breaking changes.
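To make the 0.X point concrete: with Cargo's default (caret) version
requirements, a bump of the left-most non-zero component is treated as
breaking. A 0.X crate can therefore ship breaking changes in every minor
release (0.5 to 0.6) without dependents picking them up silently, whereas a
1.0+ crate such as arrow-rs needs a new major for the same. Below is a
minimal sketch of that rule. The `Version` type and `caret_matches` function
are hypothetical illustrations, not Cargo's actual implementation, and
pre-release and partial versions are ignored.

```rust
use std::fmt;

/// Hypothetical minimal semantic version (no pre-release or build metadata).
/// Field order matters: the derived `Ord` compares major, then minor, then patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

impl Version {
    const fn new(major: u64, minor: u64, patch: u64) -> Self {
        Version { major, minor, patch }
    }
}

impl fmt::Display for Version {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}.{}.{}", self.major, self.minor, self.patch)
    }
}

/// Sketch of Cargo's caret rule for a bare requirement like `"0.5.1"`:
/// the candidate must be >= the requirement, and the left-most non-zero
/// component of the requirement must match exactly.
fn caret_matches(req: Version, candidate: Version) -> bool {
    if candidate < req {
        return false;
    }
    if req.major > 0 {
        // 1.0+: any later minor/patch within the same major is compatible.
        candidate.major == req.major
    } else if req.minor > 0 {
        // 0.X: each minor bump is treated as a breaking release.
        candidate.major == 0 && candidate.minor == req.minor
    } else {
        // 0.0.Z: every patch bump is treated as breaking.
        candidate.major == 0 && candidate.minor == 0 && candidate.patch == req.patch
    }
}

fn main() {
    // A dependent declaring a (hypothetical) `arrow2 = "0.5.1"` requirement.
    let req = Version::new(0, 5, 1);
    for candidate in [Version::new(0, 5, 3), Version::new(0, 6, 0), Version::new(1, 0, 0)] {
        println!("^{req} accepts {candidate}: {}", caret_matches(req, candidate));
    }
}
```

Running it prints `true` for 0.5.3 and `false` for 0.6.0 and 1.0.0: a 0.6
release could break APIs freely, and dependents would only see it after
explicitly bumping their requirement.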

On the other hand, other parts of the community do not care about this
aspect. Polars, for example, the fastest DataFrame library in the H2O
benchmarks, currently maintains an arrow2 branch that is faster and safer
than its master [1], and will release its Python binaries from that branch.
We would like to release the Rust API based on arrow2 as well, which
requires arrow2 to be published on crates.io.

The best "hack" I can come up with, given the constraints above, is to
publish arrow2 and parquet2 on crates.io from my personal account, so that
dependents can publish to crates.io while it remains obvious that these are
not the official releases. However, this is obviously not ideal.

Any suggestions?

[1] https://github.com/pola-rs/polars/pull/922

Best,
Jorge
