Hi,

Thanks for your input.

Every time there is a new major release, all new development shifts towards
that new API and users of previous APIs are left behind. It is not just a
matter of SemVer and the size of version numbers; there is a whole development
shift to stay on top of the new API.

I disagree that software that has a major release every 3 months and no
maintenance window for previous versions is stable. I alluded to the Tokio
example because Tokio 1.0 recently became the runtime of Rust-based AWS
Lambda functions [1]; this commitment is only possible by enforcing API
stability and maintenance beyond a 3-month period (at least 3 years in
their case).

Also, IMO the current major version number is not meaningless: divided by
the software's age, it gives the historical release cadence, which is
usually a good predictor of the cadence of future releases.

The evidence is that we haven't been able to support any version for any
period of time; recently, Andrew has been doing amazing work supporting
the latest version for a period of 3 months. I.e., an application that
depends on `arrow = ^5.0` has a support window of 3 months. Given that we
have not backported any security fixes to previous versions, it is
reasonable to assume that security patches are also applied within a
3-month period only.
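To make the `^5.0` support window concrete, here is a minimal sketch of Cargo's default caret-requirement rule, simplified to plain major.minor.patch triples (pre-release tags and shorthand forms such as `^0.0` are ignored). `caret_matches` is a hypothetical helper written for illustration, not a Cargo API. Under this rule `^5.0` resolves to any 5.x release but never to 6.0, so such a dependent only receives fixes that land in the 5.x line.

```rust
/// A plain `major.minor.patch` version; pre-release tags are ignored here.
type Version = (u64, u64, u64);

/// Simplified Cargo caret rule, where `req` is the version written after `^`:
/// - ^X.Y.Z with X > 0  => >= X.Y.Z and < (X+1).0.0
/// - ^0.Y.Z with Y > 0  => >= 0.Y.Z and < 0.(Y+1).0
/// - ^0.0.Z             => exactly 0.0.Z
fn caret_matches(req: Version, v: Version) -> bool {
    // Tuples compare lexicographically, so this enforces the lower bound.
    if v < req {
        return false;
    }
    match req {
        (0, 0, patch) => v == (0, 0, patch),
        (0, minor, _) => v.0 == 0 && v.1 == minor,
        (major, _, _) => v.0 == major,
    }
}

fn main() {
    // `arrow = "^5.0"` (i.e. ^5.0.0) picks up 5.x but not 6.0.0.
    println!("{}", caret_matches((5, 0, 0), (5, 3, 1))); // true
    println!("{}", caret_matches((5, 0, 0), (6, 0, 0))); // false
    // A 0.x requirement such as `^0.2` is confined to 0.2.x.
    println!("{}", caret_matches((0, 2, 0), (0, 3, 0))); // false
}
```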

As a contributor to arrow2, I would rather not have arrow2 under Apache Arrow
than have to release it under its current versioning and scheduling (this
is similar to some of Julia's concerns). As a contributor to Apache Arrow,
I currently cannot guarantee a maintenance window for arrow-rs for any
period of time, because it is unsafe by design and I do not have the
motivation to fix it. As both, I am confident that the core of arrow2 will
soon reach a point where we can live with and develop on top of it for at
least a year. This is not true of the whole API surface, though: there are
APIs that we will need to change more often until stability can be promised.

So, I am requesting that we tie the discussion of arrow2 to how it will be
released.

Could a middle ground be somewhere along the lines of splitting the crate
into smaller crates that are versioned independently? I.e., continue to
release `arrow` under the same versioning and cadence, and create 3 new
crates based on arrow2's code base, arrow-core, arrow-compute, and arrow-io
(see also [2]), that would have their own 0.X versioning until stability
is achieved. The migration of the `arrow` crate to arrow2's API would be
to re-export from the smaller crates (e.g. `pub use arrow_core::array`).
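A minimal single-file sketch of that re-export layout. The crate names `arrow_core` and `arrow_compute` follow the proposal above but are hypothetical, and they are modelled here as inline modules with a toy `Int32Array` so the example compiles on its own; in practice each would be a separate 0.X crate listed in the `arrow` crate's Cargo.toml.

```rust
// Stand-ins for the separately versioned crates (hypothetical names).
mod arrow_core {
    pub mod array {
        /// Toy array type; not the real arrow2 array API.
        #[derive(Debug, Clone, PartialEq)]
        pub struct Int32Array {
            pub values: Vec<i32>,
        }

        impl Int32Array {
            pub fn from_vec(values: Vec<i32>) -> Self {
                Self { values }
            }

            pub fn len(&self) -> usize {
                self.values.len()
            }
        }
    }
}

mod arrow_compute {
    pub mod aggregate {
        use crate::arrow_core::array::Int32Array;

        /// Sum of all values, or `None` for an empty array.
        pub fn sum(array: &Int32Array) -> Option<i32> {
            if array.values.is_empty() {
                None
            } else {
                Some(array.values.iter().sum())
            }
        }
    }
}

// The `arrow` facade owns no code of its own; it only re-exports.
pub mod arrow {
    pub use crate::arrow_compute::aggregate as compute;
    pub use crate::arrow_core::array;
}

fn main() {
    // Downstream code only ever names `arrow::...` paths.
    let a = arrow::array::Int32Array::from_vec(vec![1, 2, 3]);
    println!("len = {}, sum = {:?}", a.len(), arrow::compute::sum(&a));
}
```

Because downstream code only names `arrow::...` paths, the sub-crates could in principle make 0.X breaking releases while `arrow` absorbs the churn in its own major bumps.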

[1] https://crates.io/crates/lambda_runtime/0.3.1/dependencies
[2] https://github.com/jorgecarleitao/arrow2/issues/257

Best,
Jorge


On Thu, Aug 5, 2021 at 11:53 PM Adam Lippai <a...@rigo.sk> wrote:

> Not taking sides, just two technical notes below.
>
> Semver.org clearly defines when a project should be at 1.0.0
> (https://semver.org/#how-do-i-know-when-to-release-100):
> * If it's used in production, it's 1.0.0.
> * If it provides an API others depend on, it's 1.0.0.
> * If you intend to keep backward compatibility, it's 1.0.0.
> TL;DR: 1.0.0 represents the version from which point we guarantee that
> non-production releases are marked (alpha, beta, rc) and that breaking
> (backwards-incompatible) API changes result in a major version bump. This
> we already do, 4x per year.
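A rough sketch of the bump rule described above, assuming standard SemVer for versions >= 1.0.0 and Cargo's convention for 0.y.z, where a change in `y` is treated as breaking. `Change` and `bump` are hypothetical names for illustration. It shows why a quarterly breaking cadence raises the major number by four per year after 1.0, while the same cadence before 1.0 only advances the minor number.

```rust
/// Kind of change shipped in a release.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Change {
    Breaking,
    Feature,
    Fix,
}

type Version = (u64, u64, u64);

/// Next version for a release containing `change`.
fn bump(v: Version, change: Change) -> Version {
    let (major, minor, patch) = v;
    match (major, change) {
        // 0.y.z: `y` is the compatibility boundary (Cargo convention).
        (0, Change::Breaking) => (0, minor + 1, 0),
        (0, _) => (0, minor, patch + 1),
        // >= 1.0.0: standard SemVer.
        (_, Change::Breaking) => (major + 1, 0, 0),
        (_, Change::Feature) => (major, minor + 1, 0),
        (_, Change::Fix) => (major, minor, patch + 1),
    }
}

fn main() {
    // Four breaking releases in a year, starting from 1.0.0 and from 0.1.0.
    let stable = (0..4).fold((1, 0, 0), |v, _| bump(v, Change::Breaking));
    let pre = (0..4).fold((0, 1, 0), |v, _| bump(v, Change::Breaking));
    println!("{:?} {:?}", stable, pre); // (5, 0, 0) (0, 5, 0)
}
```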
>
> The second fact is that arrow2 uses the arrow name, but it doesn't have
> Apache governance. It's not released from GitHub.com/apache, there are no
> formal releases, there are no votes. This is not correct or fair usage of
> the brand (on the same level as DataFuse, or db-benchmark calling a custom
> R implementation arrow) even if it's "unofficial". My understanding is that
> arrow2 can be an unofficial implementation with a different name or an
> arrow-rs experiment with the intention to merge the code, but not both.
>
> I think both issues could be solved and I really value and like the arrow2
> work so far. That's the right way. I hope we'll see it in prod either way
> as soon as it's ready.
>
> Best regards,
> Adam Lippai
>
> On Wed, Aug 4, 2021, 08:25 QP Hou <houqp....@gmail.com> wrote:
>
> > Just my two cents.
> >
> > I think we all have the same goal here, which is to accelerate the
> > transitioning of arrow to arrow2 as the official arrow rust
> > implementation.
> >
> > In my opinion, the biggest gain we can get from merging two projects
> > into one repo is to have some kind of a policy to enforce that every
> > new feature/test added to the current arrow implementation also needs
> > to be added to the arrow2 implementation. This way, we can make sure
> > the gap between arrow and arrow2 is closing on every iteration.
> > Without this, I tend to agree with Jorge that merging two repos would
> > add more overhead to his work and slow him down.
> >
> > For those who want to contribute to arrow2 to accelerate the
> > transition, I don't think they would have a problem sending PRs to the
> > arrow2 repo. For those who are not interested in contributing to
> > arrow2, merging the arrow2 code base into the current arrow-rs repo
> > won't incentivize them to contribute. Merging arrow2 into current
> > arrow-rs repo could help with discovery. But I think this can be
> > achieved by adding a big note in the current arrow-rs README to
> > encourage contributions to the arrow2 repo as well.
> >
> > At the end of the day, Jorge is currently the sole active contributor
> > to the arrow2 implementation, so I think he would have the most say on
> > what's the most productive way to push arrow2 forward. The only
> > concern I have with regard to merging arrow2 into arrow-rs right now
> > is that Jorge would spend all the effort on the merge, only for it to
> > turn out that he is still the only active contributor to arrow2 within
> > arrow-rs, but with more overhead to deal with.
> >
> > As for maintaining semantic versioning for arrow2, Andy had a good
> > point that we could still release arrow2 with its own versioning even
> > if we merge it into the arrow-rs repo. So I don't think we should
> > worry/focus too much about versioning in our discussion. Velocity to
> > close the gap between arrow-rs and arrow2 is the most important thing.
> >
> > Lastly, I do agree with Andrew that it would be good to only maintain
> > a single arrow crate in crates.io in the long run. As he mentioned,
> > when the current arrow2 code base becomes stable, we could still
> > release it under the arrow namespace in crates.io with a major version
> > bump. The absolute value of the major version doesn't really matter as
> > long as we stick to the convention that breaking changes result in a
> > major version bump.
> >
> > Thanks,
> > QP
> >
> >
> >
> > On Tue, Aug 3, 2021 at 5:31 PM paddy horan <paddyho...@hotmail.com> wrote:
> > >
> > > Hi Jorge,
> > >
> > > I see value in consolidating development in a single repo and releasing
> > > under the existing arrow crate. Regarding versioning, I think as long as
> > > we follow semantic versioning we are fine. I don't think it's worth
> > > migrating to a different repo and crate to comply with the de facto
> > > standard you mention.
> > >
> > > Just one person's opinion though,
> > > Paddy
> > >
> > >
> > > -----Original Message-----
> > > From: Jorge Cardoso Leitão <jorgecarlei...@gmail.com>
> > > Sent: Tuesday, August 3, 2021 5:23 PM
> > > To: dev@arrow.apache.org
> > > Subject: Re: [Discuss] [Rust] Arrow2/parquet2 going forward
> > >
> > > Hi Paddy,
> > >
> > > > What do you think about moving Arrow2 into the main Arrow repo where
> > > > it is only enabled via an "experimental" feature flag?
> > >
> > > AFAIK this is already possible:
> > > * add `arrow2 = { version = "0.2.0", optional = true }` to Cargo.toml
> > > * add `#[cfg(feature = "arrow2")]\npub mod arrow2;\n` to lib.rs
> > >
> > > We do this kind of thing to expose APIs from non-arrow crates, such as
> > > parts of the parquet-format-rs crate, and it is generally the way to go
> > > when a crate wants to expose a third-party API.
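A minimal, self-contained sketch of the `cfg` gating described above. The feature name `arrow2` comes from the message; the `stable` and `experimental` modules and their functions are hypothetical stand-ins (in the real setup the gated module would re-export the optional `arrow2` dependency declared in Cargo.toml). Built without `--features arrow2`, the experimental module is simply not compiled.

```rust
/// Always compiled: the current, stable API.
pub mod stable {
    pub fn backend() -> &'static str {
        "arrow"
    }
}

/// Compiled only when the crate is built with the `arrow2` feature.
#[cfg(feature = "arrow2")]
pub mod experimental {
    pub fn backend() -> &'static str {
        "arrow2"
    }
}

/// `cfg!` evaluates the same predicate as a compile-time boolean.
pub fn experimental_enabled() -> bool {
    cfg!(feature = "arrow2")
}

fn main() {
    println!("stable backend: {}", stable::backend());
    #[cfg(feature = "arrow2")]
    println!("experimental backend: {}", experimental::backend());
    println!("arrow2 feature enabled: {}", experimental_enabled());
}
```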
> > >
> > > I would not recommend doing this, though: by exposing arrow2 from arrow,
> > > we double the compilation time and binary size of all dependents that
> > > activate the flag. Furthermore, there are users of arrow2 that do not
> > > need the arrow crate, which this model would not support.
> > >
> > > AFAIK, where development happens is unrelated to this aspect; Rust
> > > enables this by design.
> > >
> > > > but also this would be a clear signal that Arrow2 is <1.0.
> > > > the experimental flag will be a clear signal to the existing Arrow
> > > > community that Arrow2 is the future but that it is <1.0
> > >
> > > arrow2 is already <1.0 <https://crates.io/crates/arrow2>.
> > > My argument is that arrow/arrow-flight/parquet are not versioned
> > > according to the Rust community's standards: it is a de facto practice
> > > in Rust to delay major releases until the API is stable. See Tokio's blog
> > > post about their 1.0 <https://tokio.rs/blog/2020-12-tokio-1-0>
> > > (i.e. "[...] we commit to holding back on a Tokio 2.0 release for at
> > > least 3 years."). The 10 most downloaded crates:
> > >
> > > * rand <https://crates.io/crates/rand> (0.8.4)
> > > * syn <https://crates.io/crates/syn> (1.0.74)
> > > * libc <https://crates.io/crates/libc> (0.2.98)
> > > * rand_core <https://crates.io/crates/rand_core> (0.6.3)
> > > * quote (1.0.9)
> > > * unicode-xid (0.2.2)
> > > * proc-macro2 (1.0.28)
> > > * cfg-if (1.0.0)
> > > * serde <https://crates.io/crates/serde> (1.0.126)
> > > * bitflags (1.2.1)
> > >
> > > These are small crates with a small scope, but even larger projects
> > > share the same pattern:
> > >
> > > * crossbeam <https://crates.io/crates/crossbeam> (0.8.1)
> > > * rocket <https://crates.io/crates/rocket> (0.5)
> > > * polars <https://crates.io/crates/polars> (0.14.8)
> > > * tower <https://crates.io/crates/tower> (0.4.8)
> > > * Tokio <https://crates.io/crates/tokio> (1.9.0)
> > > * hyper <https://crates.io/crates/hyper> (0.14.11)
> > >
> > > Crates that arrow depends on
> > > <https://github.com/apache/arrow-rs/blob/master/arrow/Cargo.toml> and
> > > that DataFusion depends on
> > > <https://github.com/apache/arrow-datafusion/blob/master/datafusion/Cargo.toml>
> > > all share the same pattern: 0.X until their API is stable, 1.X once it
> > > is, and 2.X when they need a large change in the API. This contrasts
> > > with Apache Arrow's releases, where we are now at 5.0 (and we have yet
> > > to arrive at a safe design).
> > >
> > > > existing users will be well supported in this transition
> > >
> > > How so? IMO people either PR to the arrow/arrow2 code base or they
> > > won't. This is largely independent of where the development of either
> > > arrow2 or arrow happens; people google the crate, click on the
> > > repository link and file an issue or open a PR.
> > >
> > > > In general, I think the longer that development proceeds in separate
> > > > repos the harder it will be to eventually merge the two in a way that
> > > > supports existing users.
> > >
> > > How so? I may be mistaken, but API design is unrelated to which repo
> > > the development happens in: it is primarily driven by who is designing
> > > it and by what or whom they are inspired. The designs of both the arrow
> > > and parquet crates are inspired by the C++ implementation and have
> > > gradually been migrated to "idiomatic" Rust, as "idiomatic" is becoming
> > > better defined in Rust.
> > > Arrow2 is inspired by the current crate and the pains of using it in
> > > DataFusion. Datafuse, a fork of DataFusion, recently migrated to arrow2
> > > <https://github.com/datafuselabs/datafuse/pull/1239>:
> > > +1,947 −3,484, which shows that the crate is capturing important
> > > patterns from the arrow crate and exposing ones that are useful / result
> > > in less code for the same or higher performance.
> > >
> > > On the other hand, merging the development of crates under the same
> > > repo leads to: more triaging of PRs; more work for releases and
> > > changelogs; tagging based on crates; multiple READMEs in subpaths of the
> > > repo; curation of the CI to accommodate this; a workspace with many
> > > crates, each with its own set of dependencies, increasing compilation
> > > and development time; mixed commit logs; difficulties with reverts and
> > > cherry-picks; and more difficulty finding things in the repo. See e.g.
> > > how tokio-rs does it: <https://github.com/tokio-rs>, even for small
> > > crates like bytes <https://github.com/tokio-rs/bytes>.
> > >
> > > Best,
> > > Jorge
> > >
> > > On Tue, Aug 3, 2021 at 3:13 PM paddy horan <paddyho...@hotmail.com> wrote:
> > >
> > > > Hi Jorge,
> > > >
> > > > What do you think about moving Arrow2 into the main Arrow repo where
> > > > it is only enabled via an "experimental" feature flag? This would
> > > > allow development of Arrow2 to proceed in the main repo but also this
> > > > would be a clear signal that Arrow2 is <1.0. When we feel ready (i.e.
> > > > Arrow2 is 1.0) we can release it in the next main release with Arrow2
> > > > being the default and move the existing implementation behind a
> > > > "legacy" feature flag.
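A sketch of the later switch described above, where the new implementation becomes the default and the old one moves behind a `legacy` feature. The feature name comes from the message; `legacy_impl`, `new_impl`, `api` and `describe` are hypothetical. Two mutually exclusive `cfg` attributes pick which module is re-exported as `api`, so callers keep the same path either way.

```rust
/// Existing implementation, kept for users who opt into `legacy`.
pub mod legacy_impl {
    pub fn describe() -> &'static str {
        "legacy arrow API"
    }
}

/// New (arrow2-based) implementation, the default once it reaches 1.0.
pub mod new_impl {
    pub fn describe() -> &'static str {
        "arrow2-based API"
    }
}

// Exactly one of these re-exports is compiled, so callers always write `api::...`.
#[cfg(feature = "legacy")]
pub use legacy_impl as api;
#[cfg(not(feature = "legacy"))]
pub use new_impl as api;

fn main() {
    println!("{}", api::describe());
}
```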
> > > >
> > > > Here is why I think this might work well:
> > > >  - People contributing to the Arrow project will naturally contribute
> > > > to Arrow2. At the moment, some people will still contribute to Arrow
> > > > instead of Arrow2 just by virtue of it being the "official"
> > > > implementation. However, if both are in one repo, people will want to
> > > > contribute to the "future", i.e. Arrow2.
> > > >  - the experimental flag will be a clear signal to the existing Arrow
> > > > community that Arrow2 is the future but that it is <1.0
> > > >  - existing users will be well supported in this transition
> > > >  - In general, I think the longer that development proceeds in
> > > > separate repos the harder it will be to eventually merge the two in a
> > > > way that supports existing users.
> > > >
> > > > Do you think this would work?
> > > >
> > > > Paddy
> > > >
> > > > -----Original Message-----
> > > > From: Jorge Cardoso Leitão <jorgecarlei...@gmail.com>
> > > > Sent: Monday, August 2, 2021 1:59 PM
> > > > To: dev@arrow.apache.org
> > > > Subject: Re: [Discuss] [Rust] Arrow2/parquet2 going forward
> > > >
> > > > Hi,
> > > >
> > > > Sorry for the delay.
> > > >
> > > > If there is a path towards an official release under a <1.0.0
> > > > versioning schema aligned with the rest of the Rust ecosystem and in
> > > > line with the stability of the API, then IMO we should move all
> > > > development to within Apache experimental asap (I can handle this and
> > > > the likely IP clearance round). If we require a >=1.X.Y release for it
> > > > and/or a schedule, then I prefer to keep expectations aligned and
> > > > postpone any movement.
> > > >
> > > > Under the move scenario, I was thinking of something as follows:
> > > >
> > > > * gradually stop maintaining "arrow" in crates, offering a maintenance
> > > > window over which we release patches (*)
> > > > * work towards achieving feature parity on arrow2/parquet2 in the
> > > > experimental repos
> > > > * keep releasing arrow2/parquet2 under a 0.X model during the step
> > > > above (**)
> > > > * migrate to arrow-rs and archive the experimental repos (***)
> > > > * break arrow2 into smaller crates so that we can version the APIs at
> > > > different cadences
> > > > * once a crate reaches some stability (this is always opinionated, but
> > > > it is fine), we bump it to 1.0 and announce a maintenance plan à la
> > > > tokio <https://tokio.rs/blog/2020-12-tokio-1-0>.
> > > >
> > > > (*) e.g. "we will continue to patch the arrow crate up to at least 6
> > > > months starting after the first release of arrow2 that supports
> > > > a) nested parquet read and write
> > > > b) union array (including IPC integration tests)
> > > > c) map array (including IPC integration tests)"
> > > >
> > > > (**) officially or un-officially (I would suggest officially so that
> > > > we can acknowledge everyone's work on it, but no strong feelings)
> > > >
> > > > (***) something like:
> > > > 1. place arrow2 on top of a clear arrow repo so that the full
> > > > contribution history up to that point is preserved
> > > > 2. make arrow-rs the home of arrow2 (i.e. we start releasing arrow2
> > > > from arrow-rs) and archive the experimental repos; create
> > > > arrow-rs-parquet or something for parquet2.
> > > >
> > > > In summary, the core pain point for me is the current versioning of
> > > > arrow, which I feel is incompatible with my goals for arrow2 and the
> > > > ecosystem I envision it supporting :)
> > > >
> > > > Best,
> > > > Jorge
> > > >
> > > > On Fri, Jul 30, 2021 at 8:44 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > > >
> > > > > I think it would also be fine to push "beta" arrow2 crates out of a
> > > > > repo under apache/ so long as they are not marked on crates.io as
> > > > > being Apache-official releases. There's a possible slippery slope
> > > > > there, but as long as we are on a path to formalizing the releases I
> > > > > think it is okay.
> > > > >
> > > > > On Fri, Jul 30, 2021 at 1:07 PM Andrew Lamb <al...@influxdata.com> wrote:
> > > > >
> > > > > > Jorge -- do you feel like we have a resolution on what to do with
> > > > > > arrow2 in the near term?
> > > > > >
> > > > > > The current state of affairs, it seems to me, is that arrow2 is
> > > > > > released from <https://github.com/jorgecarleitao/arrow2> to
> > > > > > crates.io (which is fine). Are you happy with keeping development
> > > > > > in the jorgecarleitao repo, where you will retain maximal control
> > > > > > and flexibility, until it is ready to start integrating?
> > > > > >
> > > > > > Or would you prefer to put it into one of the apache repos and
> > > > > > subject its development and release to the normal Arrow governance
> > > > > > model (tarball, vote, etc)?
> > > > > >
> > > > > > Since you are the primary author/architect I think you should have
> > > > > > a substantial say at this stage.
> > > > > >
> > > > > > Andrew
> > > > > >
> > > > > >
> > > > > > On Tue, Jul 27, 2021 at 7:16 PM Andrew Lamb <al...@influxdata.com> wrote:
> > > > > >
> > > > > > > I would be happy with this approach. Thank you for the
> > > > > > > suggestion.
> > > > > > >
> > > > > > > This hybrid approach of both arrow and arrow2 in the same repo
> > > > > > > seems better to me than separate repos.
> > > > > > >
> > > > > > > What I really care about is ensuring we don't have two
> > > > > > > crates/APIs indefinitely -- as long as we are continually making
> > > > > > > progress towards unification, that is what is important to me.
> > > > > > >
> > > > > > > Andrew
> > > > > > >
> > > > > > > On Tue, Jul 27, 2021 at 1:40 PM Andy Grove <andygrov...@gmail.com> wrote:
> > > > > > >
> > > > > > >> Apologies for being late to this discussion.
> > > > > > >>
> > > > > > >> There is a hybrid option to consider here where we add the
> > > > > > >> arrow2 code into the arrow crate as a separate module, so we
> > > > > > >> release one crate containing the "old" API (which we can mark as
> > > > > > >> deprecated) as well as the new API. Java did a similar thing a
> > > > > > >> long time ago with "java.io" versus "java.nio" (new IO).
> > > > > > >>
> > > > > > >> I agree that the versioning wouldn't be ideal, but this seems
> > > > > > >> like it might be a pragmatic compromise?
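A minimal sketch of that hybrid layout using Rust's built-in `#[deprecated]` attribute; the module names `old` and `next` and the toy `sum` functions are hypothetical. Downstream code that still calls the old path keeps compiling but gets a deprecation warning pointing at the new module, much as `java.nio` sat alongside `java.io`.

```rust
/// Old API, still shipped in the same crate.
pub mod old {
    #[deprecated(note = "use `next::sum` (the arrow2-based API) instead")]
    pub fn sum(values: &[i32]) -> i32 {
        values.iter().sum()
    }
}

/// New API living in the same crate as a separate module.
pub mod next {
    /// Checked sum: returns `None` on overflow.
    pub fn sum(values: &[i32]) -> Option<i32> {
        values.iter().try_fold(0i32, |acc, &v| acc.checked_add(v))
    }
}

fn main() {
    // The deprecated path still compiles; without `allow`, rustc would warn here.
    #[allow(deprecated)]
    let legacy = old::sum(&[1, 2, 3]);
    let modern = next::sum(&[1, 2, 3]);
    println!("{} {:?}", legacy, modern);
}
```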
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >>
> > > > > > >> Andy.
> > > > > > >>
> > > > > > >>
> > > > > > >> On Tue, Jul 20, 2021 at 5:41 AM Andrew Lamb <al...@influxdata.com> wrote:
> > > > > > >>
> > > > > > >> > What I meant is that when you decide arrow2 is suitable for
> > > > > > >> > release to existing arrow users, I stand ready to help you
> > > > > > >> > incorporate it into arrow.
> > > > > > >> >
> > > > > > >> > All the feedback I have heard so far from the rest of the
> > > > > > >> > community is that we are ready. One might even say we are
> > > > > > >> > anxious to do so :)
> > > > > > >> >
> > > > > > >> > Andrew
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
>
