Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?

2021-09-24 Thread Sutou Kouhei
Hi,

Looks reasonable. I'll start voting next week.


Thanks,
-- 
kou

In "Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?" on Mon, 20 Sep 2021 23:25:54 -0700,
  QP Hou wrote:

> To expedite the donation, perhaps we could move on with the decoupled
> version scheme for now to reduce workload and disruption to the
> existing users. The julia maintainers can always decide to change the
> versioning scheme later after the donation has been completed. This
> doesn't seem like a blocker issue to me.
> 
> On Mon, Sep 20, 2021 at 8:09 PM Sutou Kouhei  wrote:
>>
>> Hi Jacob,
>>
>> Thanks for confirming this.
>>
>> For major release:
>>
>> As far as I know:
>>
>> We chose this style because we will develop actively in at
>> least a few years. Active development will need API breaking
>> changes. So we release a major version per 3-4 months.
>>
>> Our release process releases all implementations at once
>> before we chose this style. We just didn't change it. Some
>> implementations don't have API breaking changes between
>> major releases. But we just don't care it.
>>
>> Aligned versions for all implementations may have a merit
>> for users. Users can assume that it's safe that they use
>> Apache Arrow C++ 6.0.0 and Apache Arrow Rust 6.0.0. (We have
>> integration tests for implementations with the same version.)
>>
>> References:
>>
>>   * Discussion: [Discuss] Compatibility Guarantees and Versioning Post 
>> "1.0.0"
>> 
>> https://lists.apache.org/thread.html/5715a4d402c835d22d929a8069c5c0cf232077a660ee98639d544af8%40%3Cdev.arrow.apache.org%3E
>>
>>   * Vote: [VOTE] Adopt FORMAT and LIBRARY SemVer-based version schemes for 
>> Arrow 1.0.0 and beyond
>> 
>> https://lists.apache.org/thread.html/2a630234214e590eb184c24bbf9dac4a8d8f7677d85a75fa49d70ba8%40%3Cdev.arrow.apache.org%3E
>>
>>   * Follow-up thread: Versioning of arrow
>> 
>> https://lists.apache.org/thread.html/rb11c0839a7167c2f1d82b0b77134c53abc5487e9165c3493b55db12b%40%3Cdev.arrow.apache.org%3E
>>
>>
>> My opinion:
>>
>> I have no opinion on this. I don't object that the Julia
>> implementation uses separated version.
>>
>>
>> Thanks,
>> --
>> kou
>>
>> In 
>>   "Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?" on Thu, 16 
>> Sep 2021 23:47:45 -0600,
>>   Jacob Quinn  wrote:
>>
>> > Good question.
>> >
>> > In my mind, I was imagining the arrow-julia repo would have a fully
>> > decoupled versioning from the main arrow project. This comes from my
>> > understanding that the julia implementation is it's own "project" that
>> > implements the arrow spec/format, and we may need a breaking major release
>> > at different cadences than the main spec version. Indeed, while the arrow
>> > project has gone from 2.0 -> 6.0 since the julia implementation was first
>> > released, we're just now releasing our own 2.0.0 version after a change in
>> > API for how metadata is set/retrieved on table/column objects.
>> >
>> > I'll admit that it's not entirely clear to me how to best signal/implement
>> > coordination between the main arrow project versions and the julia version
>> > though. I'm just guessing here, but is that why the main arrow project does
>> > so frequent major version releases? To account for any child
>> > implementations happening to have breaking changes? I think I remember
>> > discussion recently around moving the actual spec/format document out as a
>> > separate repo or at least versioning it separately from all the various
>> > implementations, and that seems like it would be a good idea, though I
>> > guess the format itself has versioning builtin to itself. It's certainly
>> > something we can clarify in the Julia package itself; i.e. which version of
>> > the spec a given Julia package version is compatible with. Typically with
>> > other julia package dependencies, just a minor version increment is
>> > required when a new breaking dependency version is upgraded, so I would
>> > think we could follow something similar by treating the arrow format as a
>> > "dependency".
>> >
>> > I'll clarify that I don't feel very strongly on these points, so if there's
>> > something I'm missing or gaps in my understanding of how the rest of the
>> > web of projects are coordinating things, I'm all ears.
>> >
>> > -Jacob

Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?

2021-09-21 Thread QP Hou
To expedite the donation, perhaps we could move forward with the decoupled
versioning scheme for now to reduce workload and disruption for
existing users. The Julia maintainers can always decide to change the
versioning scheme later, after the donation has been completed. This
doesn't seem like a blocking issue to me.

On Mon, Sep 20, 2021 at 8:09 PM Sutou Kouhei  wrote:
>
> Hi Jacob,
>
> Thanks for confirming this.
>
> For major release:
>
> As far as I know:
>
> We chose this style because we will develop actively in at
> least a few years. Active development will need API breaking
> changes. So we release a major version per 3-4 months.
>
> Our release process releases all implementations at once
> before we chose this style. We just didn't change it. Some
> implementations don't have API breaking changes between
> major releases. But we just don't care it.
>
> Aligned versions for all implementations may have a merit
> for users. Users can assume that it's safe that they use
> Apache Arrow C++ 6.0.0 and Apache Arrow Rust 6.0.0. (We have
> integration tests for implementations with the same version.)
>
> References:
>
>   * Discussion: [Discuss] Compatibility Guarantees and Versioning Post "1.0.0"
> 
> https://lists.apache.org/thread.html/5715a4d402c835d22d929a8069c5c0cf232077a660ee98639d544af8%40%3Cdev.arrow.apache.org%3E
>
>   * Vote: [VOTE] Adopt FORMAT and LIBRARY SemVer-based version schemes for 
> Arrow 1.0.0 and beyond
> 
> https://lists.apache.org/thread.html/2a630234214e590eb184c24bbf9dac4a8d8f7677d85a75fa49d70ba8%40%3Cdev.arrow.apache.org%3E
>
>   * Follow-up thread: Versioning of arrow
> 
> https://lists.apache.org/thread.html/rb11c0839a7167c2f1d82b0b77134c53abc5487e9165c3493b55db12b%40%3Cdev.arrow.apache.org%3E
>
>
> My opinion:
>
> I have no opinion on this. I don't object that the Julia
> implementation uses separated version.
>
>
> Thanks,
> --
> kou
>
> In 
>   "Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?" on Thu, 16 Sep 
> 2021 23:47:45 -0600,
>   Jacob Quinn  wrote:
>
> > Good question.
> >
> > In my mind, I was imagining the arrow-julia repo would have a fully
> > decoupled versioning from the main arrow project. This comes from my
> > understanding that the julia implementation is it's own "project" that
> > implements the arrow spec/format, and we may need a breaking major release
> > at different cadences than the main spec version. Indeed, while the arrow
> > project has gone from 2.0 -> 6.0 since the julia implementation was first
> > released, we're just now releasing our own 2.0.0 version after a change in
> > API for how metadata is set/retrieved on table/column objects.
> >
> > I'll admit that it's not entirely clear to me how to best signal/implement
> > coordination between the main arrow project versions and the julia version
> > though. I'm just guessing here, but is that why the main arrow project does
> > so frequent major version releases? To account for any child
> > implementations happening to have breaking changes? I think I remember
> > discussion recently around moving the actual spec/format document out as a
> > separate repo or at least versioning it separately from all the various
> > implementations, and that seems like it would be a good idea, though I
> > guess the format itself has versioning builtin to itself. It's certainly
> > something we can clarify in the Julia package itself; i.e. which version of
> > the spec a given Julia package version is compatible with. Typically with
> > other julia package dependencies, just a minor version increment is
> > required when a new breaking dependency version is upgraded, so I would
> > think we could follow something similar by treating the arrow format as a
> > "dependency".
> >
> > I'll clarify that I don't feel very strongly on these points, so if there's
> > something I'm missing or gaps in my understanding of how the rest of the
> > web of projects are coordinating things, I'm all ears.
> >
> > -Jacob
> >
> > On Thu, Sep 16, 2021 at 11:24 PM Sutou Kouhei  wrote:
> >
> >> Hi,
> >>
> >> Good point! Jacob, could you confirm this?
> >>
> >>
> >> Thanks,
> >> --
> >> kou
> >>
> >> In 
> >>   "Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?" on Sat, 11
> >> Sep 2021 16:57:17 -0700,
> >>   QP Hou  wrote:
> >>
> >> > Just one minor point to confirm and clarify. It looks like Julia arrow
> >> only
> >> > wants 

Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?

2021-09-20 Thread Sutou Kouhei
Hi Jacob,

Thanks for confirming this.

For major release:

As far as I know:

We chose this style because we will be developing actively for
at least a few years. Active development will need breaking API
changes, so we release a major version every 3-4 months.

Our release process already released all implementations at
once before we chose this style. We just didn't change it. Some
implementations don't have breaking API changes between
major releases, but we just don't care about that.

Aligned versions for all implementations may benefit users:
they can assume that it's safe to use Apache Arrow C++ 6.0.0
together with Apache Arrow Rust 6.0.0. (We have integration
tests for implementations with the same version.)

References:

  * Discussion: [Discuss] Compatibility Guarantees and Versioning Post "1.0.0"

https://lists.apache.org/thread.html/5715a4d402c835d22d929a8069c5c0cf232077a660ee98639d544af8%40%3Cdev.arrow.apache.org%3E

  * Vote: [VOTE] Adopt FORMAT and LIBRARY SemVer-based version schemes for Arrow 1.0.0 and beyond

https://lists.apache.org/thread.html/2a630234214e590eb184c24bbf9dac4a8d8f7677d85a75fa49d70ba8%40%3Cdev.arrow.apache.org%3E

  * Follow-up thread: Versioning of arrow

https://lists.apache.org/thread.html/rb11c0839a7167c2f1d82b0b77134c53abc5487e9165c3493b55db12b%40%3Cdev.arrow.apache.org%3E


My opinion:

I have no strong opinion on this. I don't object to the Julia
implementation using a separate version.


Thanks,
-- 
kou

In "Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?" on Thu, 16 Sep 2021 23:47:45 -0600,
  Jacob Quinn wrote:

> Good question.
> 
> In my mind, I was imagining the arrow-julia repo would have a fully
> decoupled versioning from the main arrow project. This comes from my
> understanding that the julia implementation is it's own "project" that
> implements the arrow spec/format, and we may need a breaking major release
> at different cadences than the main spec version. Indeed, while the arrow
> project has gone from 2.0 -> 6.0 since the julia implementation was first
> released, we're just now releasing our own 2.0.0 version after a change in
> API for how metadata is set/retrieved on table/column objects.
> 
> I'll admit that it's not entirely clear to me how to best signal/implement
> coordination between the main arrow project versions and the julia version
> though. I'm just guessing here, but is that why the main arrow project does
> so frequent major version releases? To account for any child
> implementations happening to have breaking changes? I think I remember
> discussion recently around moving the actual spec/format document out as a
> separate repo or at least versioning it separately from all the various
> implementations, and that seems like it would be a good idea, though I
> guess the format itself has versioning builtin to itself. It's certainly
> something we can clarify in the Julia package itself; i.e. which version of
> the spec a given Julia package version is compatible with. Typically with
> other julia package dependencies, just a minor version increment is
> required when a new breaking dependency version is upgraded, so I would
> think we could follow something similar by treating the arrow format as a
> "dependency".
> 
> I'll clarify that I don't feel very strongly on these points, so if there's
> something I'm missing or gaps in my understanding of how the rest of the
> web of projects are coordinating things, I'm all ears.
> 
> -Jacob
> 
> On Thu, Sep 16, 2021 at 11:24 PM Sutou Kouhei  wrote:
> 
>> Hi,
>>
>> Good point! Jacob, could you confirm this?
>>
>>
>> Thanks,
>> --
>> kou
>>
>> In 
>>   "Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?" on Sat, 11
>> Sep 2021 16:57:17 -0700,
>>   QP Hou  wrote:
>>
>> > Just one minor point to confirm and clarify. It looks like Julia arrow
>> only
>> > wants to do on demand minor and patch releases. Major version release
>> still
>> > needs to be aligned with the main arrow release schedule, is that
>> correct?
>> > In other words, breaking changes should be avoided in on demand releases
>> > (assuming they are using semantic versioning).
>> >
>> > From the original julia donation thread, I got the impression that the
>> > julia maintainers wanted to have their own versioning scheme. Maybe
>> that’s
>> > not the case anymore. So I wanted to make sure we set the right
>> expectation
>> > for Julia maintainers.
>> >
>> > FWIW, Arrow-rs today aligns the major version with the main arrow
>> release,
>> > so Andrew spend quite a bit of time maintaining an active release branch
>> to
>

Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?

2021-09-16 Thread Jacob Quinn
Good question.

In my mind, I was imagining the arrow-julia repo would have a fully
decoupled versioning from the main arrow project. This comes from my
understanding that the julia implementation is its own "project" that
implements the arrow spec/format, and we may need a breaking major release
at different cadences than the main spec version. Indeed, while the arrow
project has gone from 2.0 -> 6.0 since the julia implementation was first
released, we're just now releasing our own 2.0.0 version after a change in
API for how metadata is set/retrieved on table/column objects.

I'll admit that it's not entirely clear to me how to best signal/implement
coordination between the main arrow project versions and the julia version
though. I'm just guessing here, but is that why the main arrow project does
such frequent major version releases? To account for any child
implementations happening to have breaking changes? I think I remember
discussion recently around moving the actual spec/format document out as a
separate repo or at least versioning it separately from all the various
implementations, and that seems like it would be a good idea, though I
guess the format itself has versioning built in. It's certainly
something we can clarify in the Julia package itself; i.e. which version of
the spec a given Julia package version is compatible with. Typically with
other julia package dependencies, just a minor version increment is
required when a new breaking dependency version is upgraded, so I would
think we could follow something similar by treating the arrow format as a
"dependency".

I'll clarify that I don't feel very strongly on these points, so if there's
something I'm missing or gaps in my understanding of how the rest of the
web of projects are coordinating things, I'm all ears.
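
To make the "format as a dependency" idea above a bit more concrete, here is
a rough sketch in Python. It is only an illustration under stated
assumptions: `bump` and the change labels are hypothetical names, not part of
Arrow.jl or Julia's Pkg, and the version numbers are arbitrary.

```python
from typing import Tuple

Version = Tuple[int, int, int]

# Hypothetical change labels mapped to the semantic-versioning part they bump.
# "dependency_upgrade" follows the convention described above: adopting a new
# breaking version of a dependency (here, the Arrow format) without changing
# the package's own API is treated as a minor bump.
BUMP_FOR_CHANGE = {
    "bugfix": "patch",
    "feature": "minor",
    "dependency_upgrade": "minor",
    "api_break": "major",
}


def bump(version: Version, change: str) -> Version:
    """Return the next (major, minor, patch) version for a kind of change."""
    major, minor, patch = version
    part = BUMP_FOR_CHANGE[change]
    if part == "major":
        return (major + 1, 0, 0)
    if part == "minor":
        return (major, minor + 1, 0)
    return (major, minor, patch + 1)


if __name__ == "__main__":
    current = (1, 2, 3)  # arbitrary example version
    for change in BUMP_FOR_CHANGE:
        print(change, "->", bump(current, change))
```

Under this scheme, only a change to the package's own API forces a major
release, which is why the package's major version can move at a different
pace than the main Arrow version.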

-Jacob

On Thu, Sep 16, 2021 at 11:24 PM Sutou Kouhei  wrote:

> Hi,
>
> Good point! Jacob, could you confirm this?
>
>
> Thanks,
> --
> kou
>
> In 
>   "Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?" on Sat, 11
> Sep 2021 16:57:17 -0700,
>   QP Hou  wrote:
>
> > Just one minor point to confirm and clarify. It looks like Julia arrow
> only
> > wants to do on demand minor and patch releases. Major version release
> still
> > needs to be aligned with the main arrow release schedule, is that
> correct?
> > In other words, breaking changes should be avoided in on demand releases
> > (assuming they are using semantic versioning).
> >
> > From the original julia donation thread, I got the impression that the
> > julia maintainers wanted to have their own versioning scheme. Maybe
> that’s
> > not the case anymore. So I wanted to make sure we set the right
> expectation
> > for Julia maintainers.
> >
> > FWIW, Arrow-rs today aligns the major version with the main arrow
> release,
> > so Andrew spend quite a bit of time maintaining an active release branch
> to
> > backport backwards compatible commits for minor and patch releases.
> > Datadusion and ballista on the other hand has a versioning scheme that’s
> > fully decoupled from the main Arrow version including the major version.
> >
> > On Thu, Sep 9, 2021 at 1:38 PM Sutou Kouhei  wrote:
> >
> >> Hi,
> >>
> >> Thanks for all comments about release schedule.
> >>
> >> Let's use release-on-demand approach based on
> >> arrow-datafusion's flow for the Julia Arrow implementation.
> >>
> >> Do we have more items to be discussed? Can we start voting?
> >>
> >>
> >> Thanks,
> >> --
> >> kou
> >>
> >> In 
> >>   "Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?" on Thu, 9
> >> Sep 2021 09:48:57 -0400,
> >>   Andrew Lamb  wrote:
> >>
> >> > I also think release on demand is a good strategy.
> >> >
> >> > The primary reasons to do an arrow-rs release every 2 weeks were:
> >> > 1. To have predictable cadence into downstream projects (e.g.
> datafusion
> >> > and others)
> >> > 2. Amortize the overhead associated with each release (the process is
> non
> >> > trivial and the current 72 hour voting window adds some backpressure
> as
> >> > well -- I remember Wes may have said windows shorter than 72 hours
> might
> >> be
> >> > fine too)
> >> >
> >> >
> >> > On Wed, Sep 8, 2021 at 12:19 AM QP Hou 
> wrote:
> >> >
> >> >> A minor note on the Rust side of things. arrow-rs has a 2 weeks
> >> >> release cycle, but arrow-datafusion mostly does release on demand at
> >>

Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?

2021-09-16 Thread Sutou Kouhei
Hi,

Good point! Jacob, could you confirm this?


Thanks,
-- 
kou

In "Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?" on Sat, 11 Sep 2021 16:57:17 -0700,
  QP Hou wrote:

> Just one minor point to confirm and clarify. It looks like Julia arrow only
> wants to do on demand minor and patch releases. Major version release still
> needs to be aligned with the main arrow release schedule, is that correct?
> In other words, breaking changes should be avoided in on demand releases
> (assuming they are using semantic versioning).
> 
> From the original julia donation thread, I got the impression that the
> julia maintainers wanted to have their own versioning scheme. Maybe that’s
> not the case anymore. So I wanted to make sure we set the right expectation
> for Julia maintainers.
> 
> FWIW, Arrow-rs today aligns the major version with the main arrow release,
> so Andrew spend quite a bit of time maintaining an active release branch to
> backport backwards compatible commits for minor and patch releases.
> Datadusion and ballista on the other hand has a versioning scheme that’s
> fully decoupled from the main Arrow version including the major version.
> 
> On Thu, Sep 9, 2021 at 1:38 PM Sutou Kouhei  wrote:
> 
>> Hi,
>>
>> Thanks for all comments about release schedule.
>>
>> Let's use release-on-demand approach based on
>> arrow-datafusion's flow for the Julia Arrow implementation.
>>
>> Do we have more items to be discussed? Can we start voting?
>>
>>
>> Thanks,
>> --
>> kou
>>
>> In 
>>   "Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?" on Thu, 9
>> Sep 2021 09:48:57 -0400,
>>   Andrew Lamb  wrote:
>>
>> > I also think release on demand is a good strategy.
>> >
>> > The primary reasons to do an arrow-rs release every 2 weeks were:
>> > 1. To have predictable cadence into downstream projects (e.g. datafusion
>> > and others)
>> > 2. Amortize the overhead associated with each release (the process is non
>> > trivial and the current 72 hour voting window adds some backpressure as
>> > well -- I remember Wes may have said windows shorter than 72 hours might
>> be
>> > fine too)
>> >
>> >
>> > On Wed, Sep 8, 2021 at 12:19 AM QP Hou  wrote:
>> >
>> >> A minor note on the Rust side of things. arrow-rs has a 2 weeks
>> >> release cycle, but arrow-datafusion mostly does release on demand at
>> >> the moment. Our most uptodate release processes are documented at [1]
>> >> and [2].
>> >>
>> >> [1]:
>> https://github.com/apache/arrow-rs/blob/master/dev/release/README.md
>> >> [2]:
>> >>
>> https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md
>> >>
>> >> On Tue, Sep 7, 2021 at 4:01 PM Jacob Quinn 
>> wrote:
>> >> >
>> >> > Thanks kou.
>> >> >
>> >> > I think the TODO action list looks good.
>> >> >
>> >> > The one point I think could use some additional discussion is around
>> the
>> >> > release cadence: it IS desirable to be able to release more frequently
>> >> than
>> >> > the parent repo 3-4 month cadence. But we also haven't had the
>> frequency
>> >> of
>> >> > commits to necessarily warrant a release every 2 weeks. I can think of
>> >> two
>> >> > possible options, not sure if one or the other would be more
>> compatible
>> >> > with the apache release process:
>> >> >
>> >> > 1) Allow for release-on-demand; this is idiomatic for most Julia
>> packages
>> >> > I'm aware of. When a particular bug is fixed, or feature added, a user
>> >> can
>> >> > request a release, a little discussion happens, and a new release is
>> >> made.
>> >> > This approach would work well for the "bursty" kind of contributions
>> >> we've
>> >> > seen to Arrow.jl where development by certain people will happen
>> >> frequently
>> >> > for a while, then take a break for other things. This also avoids
>> having
>> >> > "scheduled" releases (every 2 weeks, 3 months, etc.) where there
>> hasn't
>> >> > been significant updates to necessarily warrant a new release. This
>> >> > approach may also facilitate differentiating between bugfix (patch)
>> >> > releases vs. new functionality r

Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?

2021-09-11 Thread QP Hou
Just one minor point to confirm and clarify. It looks like Julia arrow only
wants to do on-demand minor and patch releases. Major version releases still
need to be aligned with the main arrow release schedule, is that correct?
In other words, breaking changes should be avoided in on demand releases
(assuming they are using semantic versioning).
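
A minimal sketch of that rule, in Python and assuming plain "X.Y.Z" version
strings (the function names are hypothetical, chosen for illustration only):
an on-demand release is acceptable as long as it does not change the major
version.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Parse a plain 'X.Y.Z' version string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def release_kind(previous: str, new: str) -> str:
    """Classify a release as 'major', 'minor', or 'patch'."""
    old, cur = parse(previous), parse(new)
    if cur <= old:
        raise ValueError(f"{new} is not newer than {previous}")
    if cur[0] != old[0]:
        return "major"
    if cur[1] != old[1]:
        return "minor"
    return "patch"


def allowed_on_demand(previous: str, new: str) -> bool:
    """Under the rule above, only non-breaking releases go out on demand."""
    return release_kind(previous, new) != "major"


if __name__ == "__main__":
    for new in ("1.2.4", "1.3.0", "2.0.0"):
        print(new, release_kind("1.2.3", new), allowed_on_demand("1.2.3", new))
```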

From the original julia donation thread, I got the impression that the
julia maintainers wanted to have their own versioning scheme. Maybe that’s
not the case anymore. So I wanted to make sure we set the right expectation
for Julia maintainers.

FWIW, Arrow-rs today aligns the major version with the main arrow release,
so Andrew spends quite a bit of time maintaining an active release branch to
backport backwards compatible commits for minor and patch releases.
DataFusion and Ballista, on the other hand, have a versioning scheme that’s
fully decoupled from the main Arrow version, including the major version.

On Thu, Sep 9, 2021 at 1:38 PM Sutou Kouhei  wrote:

> Hi,
>
> Thanks for all comments about release schedule.
>
> Let's use release-on-demand approach based on
> arrow-datafusion's flow for the Julia Arrow implementation.
>
> Do we have more items to be discussed? Can we start voting?
>
>
> Thanks,
> --
> kou
>
> In 
>   "Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?" on Thu, 9
> Sep 2021 09:48:57 -0400,
>   Andrew Lamb  wrote:
>
> > I also think release on demand is a good strategy.
> >
> > The primary reasons to do an arrow-rs release every 2 weeks were:
> > 1. To have predictable cadence into downstream projects (e.g. datafusion
> > and others)
> > 2. Amortize the overhead associated with each release (the process is non
> > trivial and the current 72 hour voting window adds some backpressure as
> > well -- I remember Wes may have said windows shorter than 72 hours might
> be
> > fine too)
> >
> >
> > On Wed, Sep 8, 2021 at 12:19 AM QP Hou  wrote:
> >
> >> A minor note on the Rust side of things. arrow-rs has a 2 weeks
> >> release cycle, but arrow-datafusion mostly does release on demand at
> >> the moment. Our most uptodate release processes are documented at [1]
> >> and [2].
> >>
> >> [1]:
> https://github.com/apache/arrow-rs/blob/master/dev/release/README.md
> >> [2]:
> >>
> https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md
> >>
> >> On Tue, Sep 7, 2021 at 4:01 PM Jacob Quinn 
> wrote:
> >> >
> >> > Thanks kou.
> >> >
> >> > I think the TODO action list looks good.
> >> >
> >> > The one point I think could use some additional discussion is around
> the
> >> > release cadence: it IS desirable to be able to release more frequently
> >> than
> >> > the parent repo 3-4 month cadence. But we also haven't had the
> frequency
> >> of
> >> > commits to necessarily warrant a release every 2 weeks. I can think of
> >> two
> >> > possible options, not sure if one or the other would be more
> compatible
> >> > with the apache release process:
> >> >
> >> > 1) Allow for release-on-demand; this is idiomatic for most Julia
> packages
> >> > I'm aware of. When a particular bug is fixed, or feature added, a user
> >> can
> >> > request a release, a little discussion happens, and a new release is
> >> made.
> >> > This approach would work well for the "bursty" kind of contributions
> >> we've
> >> > seen to Arrow.jl where development by certain people will happen
> >> frequently
> >> > for a while, then take a break for other things. This also avoids
> having
> >> > "scheduled" releases (every 2 weeks, 3 months, etc.) where there
> hasn't
> >> > been significant updates to necessarily warrant a new release. This
> >> > approach may also facilitate differentiating between bugfix (patch)
> >> > releases vs. new functionality releases (minor), since when a release
> is
> >> > requested, it could be specified whether it should be patch or minor
> (or
> >> > major).
> >> >
> >> > 2) Commit to a scheduled release pattern like every 2 weeks, once a
> >> month,
> >> > etc. This has the advantage of consistency and clearer expectations
> for
> >> > users/devs involved. A release also doesn't need to be requested,
> because
> >> > we can just wait for the scheduled time to release. In terms of the
> >> > "unnecessary r

Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?

2021-09-09 Thread Sutou Kouhei
Hi,

Thanks for all the comments about the release schedule.

Let's use a release-on-demand approach based on
arrow-datafusion's flow for the Julia Arrow implementation.

Do we have more items to be discussed? Can we start voting?


Thanks,
-- 
kou

In "Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?" on Thu, 9 Sep 2021 09:48:57 -0400,
  Andrew Lamb wrote:

> I also think release on demand is a good strategy.
> 
> The primary reasons to do an arrow-rs release every 2 weeks were:
> 1. To have predictable cadence into downstream projects (e.g. datafusion
> and others)
> 2. Amortize the overhead associated with each release (the process is non
> trivial and the current 72 hour voting window adds some backpressure as
> well -- I remember Wes may have said windows shorter than 72 hours might be
> fine too)
> 
> 
> On Wed, Sep 8, 2021 at 12:19 AM QP Hou  wrote:
> 
>> A minor note on the Rust side of things. arrow-rs has a 2 weeks
>> release cycle, but arrow-datafusion mostly does release on demand at
>> the moment. Our most uptodate release processes are documented at [1]
>> and [2].
>>
>> [1]: https://github.com/apache/arrow-rs/blob/master/dev/release/README.md
>> [2]:
>> https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md
>>
>> On Tue, Sep 7, 2021 at 4:01 PM Jacob Quinn  wrote:
>> >
>> > Thanks kou.
>> >
>> > I think the TODO action list looks good.
>> >
>> > The one point I think could use some additional discussion is around the
>> > release cadence: it IS desirable to be able to release more frequently
>> than
>> > the parent repo 3-4 month cadence. But we also haven't had the frequency
>> of
>> > commits to necessarily warrant a release every 2 weeks. I can think of
>> two
>> > possible options, not sure if one or the other would be more compatible
>> > with the apache release process:
>> >
>> > 1) Allow for release-on-demand; this is idiomatic for most Julia packages
>> > I'm aware of. When a particular bug is fixed, or feature added, a user
>> can
>> > request a release, a little discussion happens, and a new release is
>> made.
>> > This approach would work well for the "bursty" kind of contributions
>> we've
>> > seen to Arrow.jl where development by certain people will happen
>> frequently
>> > for a while, then take a break for other things. This also avoids having
>> > "scheduled" releases (every 2 weeks, 3 months, etc.) where there hasn't
>> > been significant updates to necessarily warrant a new release. This
>> > approach may also facilitate differentiating between bugfix (patch)
>> > releases vs. new functionality releases (minor), since when a release is
>> > requested, it could be specified whether it should be patch or minor (or
>> > major).
>> >
>> > 2) Commit to a scheduled release pattern like every 2 weeks, once a
>> month,
>> > etc. This has the advantage of consistency and clearer expectations for
>> > users/devs involved. A release also doesn't need to be requested, because
>> > we can just wait for the scheduled time to release. In terms of the
>> > "unnecessary releases" mentioned above, it could be as simple as
>> > "cancelling" a release if there hasn't been significant updates in the
>> > elapsed time period.
>> >
>> > My preference would be for 1), but that's influenced from what I'm
>> familiar
>> > with in the Julia package ecosystem. It seems like it would still fit in
>> > the apache way since we would formally request a new release, wait the
>> > elapsed amount of time for voting (24 hours would be preferrable), then
>> at
>> > the end of the voting period, a new release could be made.
>> >
>> > Thanks again kou for helping support the Julia implementation here.
>> >
>> > -Jacob
>> >
>> > 2)
>> >
>> > On Sun, Sep 5, 2021 at 3:25 PM Sutou Kouhei  wrote:
>> >
>> > > Hi,
>> > >
>> > > Sorry for the delay. This is a continuation of the "Status
>> > > of Arrow Julia implementation?" thread:
>> > >
>> > >
>> > >
>> https://lists.apache.org/x/thread.html/r6d91286686d92837fbe21dd042801a57e3a7b00b5903ea90a754ac7b%40%3Cdev.arrow.apache.org%3E
>> > >
>> > > I summarize the current status, the next actions and items
>> > > to be discussed.
>> > >
>> > > The current status:
>

Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?

2021-09-09 Thread Andrew Lamb
I also think release on demand is a good strategy.

The primary reasons to do an arrow-rs release every 2 weeks were:
1. To provide a predictable cadence for downstream projects (e.g. datafusion
and others)
2. To amortize the overhead associated with each release (the process is
non-trivial and the current 72-hour voting window adds some backpressure as
well -- I remember Wes may have said windows shorter than 72 hours might be
fine too)
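
For a sense of the backpressure the voting window adds, here is a small
sketch in Python. The 72-hour and 24-hour figures come from this thread; the
function name and the example timestamp are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def vote_closes_at(opened: datetime, window_hours: int = 72) -> datetime:
    """Earliest time a release vote can close, given the voting window."""
    return opened + timedelta(hours=window_hours)


if __name__ == "__main__":
    # Hypothetical moment at which a release vote is opened.
    opened = datetime(2021, 9, 9, 12, 0, tzinfo=timezone.utc)
    for hours in (72, 24):
        closes = vote_closes_at(opened, hours)
        print(f"{hours}h window: vote closes at {closes.isoformat()}")
```

With a fixed window per vote, batching several fixes into one release means
paying that waiting time once rather than once per fix.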


On Wed, Sep 8, 2021 at 12:19 AM QP Hou  wrote:

> A minor note on the Rust side of things. arrow-rs has a 2-week
> release cycle, but arrow-datafusion mostly does release on demand at
> the moment. Our most up-to-date release processes are documented at [1]
> and [2].
>
> [1]: https://github.com/apache/arrow-rs/blob/master/dev/release/README.md
> [2]: https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md
>
> On Tue, Sep 7, 2021 at 4:01 PM Jacob Quinn  wrote:
> >
> > Thanks kou.
> >
> > I think the TODO action list looks good.
> >
> > The one point I think could use some additional discussion is around the
> > release cadence: it IS desirable to be able to release more frequently than
> > the parent repo 3-4 month cadence. But we also haven't had the frequency of
> > commits to necessarily warrant a release every 2 weeks. I can think of two
> > possible options, not sure if one or the other would be more compatible
> > with the apache release process:
> >
> > 1) Allow for release-on-demand; this is idiomatic for most Julia packages
> > I'm aware of. When a particular bug is fixed, or feature added, a user can
> > request a release, a little discussion happens, and a new release is made.
> > This approach would work well for the "bursty" kind of contributions we've
> > seen to Arrow.jl where development by certain people will happen frequently
> > for a while, then take a break for other things. This also avoids having
> > "scheduled" releases (every 2 weeks, 3 months, etc.) where there hasn't
> > been significant updates to necessarily warrant a new release. This
> > approach may also facilitate differentiating between bugfix (patch)
> > releases vs. new functionality releases (minor), since when a release is
> > requested, it could be specified whether it should be patch or minor (or
> > major).
> >
> > 2) Commit to a scheduled release pattern like every 2 weeks, once a month,
> > etc. This has the advantage of consistency and clearer expectations for
> > users/devs involved. A release also doesn't need to be requested, because
> > we can just wait for the scheduled time to release. In terms of the
> > "unnecessary releases" mentioned above, it could be as simple as
> > "cancelling" a release if there hasn't been significant updates in the
> > elapsed time period.
> >
> > My preference would be for 1), but that's influenced from what I'm familiar
> > with in the Julia package ecosystem. It seems like it would still fit in
> > the apache way since we would formally request a new release, wait the
> > elapsed amount of time for voting (24 hours would be preferable), then at
> > the end of the voting period, a new release could be made.
> >
> > Thanks again kou for helping support the Julia implementation here.
> >
> > -Jacob
> >
> > 2)
> >
> > On Sun, Sep 5, 2021 at 3:25 PM Sutou Kouhei  wrote:
> >
> > > Hi,
> > >
> > > Sorry for the delay. This is a continuation of the "Status
> > > of Arrow Julia implementation?" thread:
> > >
> > >
> > >
> > > https://lists.apache.org/x/thread.html/r6d91286686d92837fbe21dd042801a57e3a7b00b5903ea90a754ac7b%40%3Cdev.arrow.apache.org%3E
> > >
> > > I summarize the current status, the next actions and items
> > > to be discussed.
> > >
> > > The current status:
> > >
> > >   * The Julia Arrow implementation uses
> > > https://github.com/JuliaData/Arrow.jl as a "dev branch"
> > > instead of creating a branch in
> > > https://github.com/apache/arrow
> > >   * The Julia Arrow implementation wants to use GitHub
> > > for the main issue management platform
> > >   * The Julia Arrow implementation wants to release
> > > more frequency than 1 release per 3-4 months
> > >   * The current workflow of the Rust Arrow implementation
> > > will also fit the Julia Arrow implementation
> > >
> > > The current workflow of the Rust Arrow implementation:
> > >
> > >
> > >
> > > https://docs.google.com/document/d/1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI/edit#heading=h.kv1hwbhi3cmi
> > >
> > > * Uses apache/arrow-rs and apache/arrow-datafusion instead
> > >   of apache/arrow for repository
> > >
> > > * Uses GitHub instead of JIRA for issue management
> > >   platform
> > >
> > >
> > >
> > > https://docs.google.com/document/d/1tMQ67iu8XyGGZuj--h9WQYB9inCk6c2sL_4xMTwENGc/edit
> > >
> > > * Releases a new minor and patch version every 2 weeks
> > >   in addition to the quarterly release of the other releases
> > >
> > > The next actions after we get a consensus about this
> > > 

Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?

2021-09-07 Thread QP Hou
A minor note on the Rust side of things. arrow-rs has a 2-week
release cycle, but arrow-datafusion mostly releases on demand at
the moment. Our most up-to-date release processes are documented at [1]
and [2].

[1]: https://github.com/apache/arrow-rs/blob/master/dev/release/README.md
[2]: https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md

On Tue, Sep 7, 2021 at 4:01 PM Jacob Quinn  wrote:
>
> Thanks kou.
>
> I think the TODO action list looks good.
>
> The one point I think could use some additional discussion is around the
> release cadence: it IS desirable to be able to release more frequently than
> the parent repo 3-4 month cadence. But we also haven't had the frequency of
> commits to necessarily warrant a release every 2 weeks. I can think of two
> possible options, not sure if one or the other would be more compatible
> with the apache release process:
>
> 1) Allow for release-on-demand; this is idiomatic for most Julia packages
> I'm aware of. When a particular bug is fixed, or feature added, a user can
> request a release, a little discussion happens, and a new release is made.
> This approach would work well for the "bursty" kind of contributions we've
> seen to Arrow.jl where development by certain people will happen frequently
> for a while, then take a break for other things. This also avoids having
> "scheduled" releases (every 2 weeks, 3 months, etc.) where there hasn't
> been significant updates to necessarily warrant a new release. This
> approach may also facilitate differentiating between bugfix (patch)
> releases vs. new functionality releases (minor), since when a release is
> requested, it could be specified whether it should be patch or minor (or
> major).
>
> 2) Commit to a scheduled release pattern like every 2 weeks, once a month,
> etc. This has the advantage of consistency and clearer expectations for
> users/devs involved. A release also doesn't need to be requested, because
> we can just wait for the scheduled time to release. In terms of the
> "unnecessary releases" mentioned above, it could be as simple as
> "cancelling" a release if there hasn't been significant updates in the
> elapsed time period.
>
> My preference would be for 1), but that's influenced from what I'm familiar
> with in the Julia package ecosystem. It seems like it would still fit in
> the apache way since we would formally request a new release, wait the
> elapsed amount of time for voting (24 hours would be preferable), then at
> the end of the voting period, a new release could be made.
>
> Thanks again kou for helping support the Julia implementation here.
>
> -Jacob
>
> 2)
>
> On Sun, Sep 5, 2021 at 3:25 PM Sutou Kouhei  wrote:
>
> > Hi,
> >
> > Sorry for the delay. This is a continuation of the "Status
> > of Arrow Julia implementation?" thread:
> >
> >
> > https://lists.apache.org/x/thread.html/r6d91286686d92837fbe21dd042801a57e3a7b00b5903ea90a754ac7b%40%3Cdev.arrow.apache.org%3E
> >
> > I summarize the current status, the next actions and items
> > to be discussed.
> >
> > The current status:
> >
> >   * The Julia Arrow implementation uses
> > https://github.com/JuliaData/Arrow.jl as a "dev branch"
> > instead of creating a branch in
> > https://github.com/apache/arrow
> >   * The Julia Arrow implementation wants to use GitHub
> > for the main issue management platform
> >   * The Julia Arrow implementation wants to release
> > more frequency than 1 release per 3-4 months
> >   * The current workflow of the Rust Arrow implementation
> > will also fit the Julia Arrow implementation
> >
> > The current workflow of the Rust Arrow implementation:
> >
> >
> > https://docs.google.com/document/d/1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI/edit#heading=h.kv1hwbhi3cmi
> >
> > * Uses apache/arrow-rs and apache/arrow-datafusion instead
> >   of apache/arrow for repository
> >
> > * Uses GitHub instead of JIRA for issue management
> >   platform
> >
> >
> > https://docs.google.com/document/d/1tMQ67iu8XyGGZuj--h9WQYB9inCk6c2sL_4xMTwENGc/edit
> >
> > * Releases a new minor and patch version every 2 weeks
> >   in addition to the quarterly release of the other releases
> >
> > The next actions after we get a consensus about this
> > discussion:
> >
> >   1. Start voting the Julia Arrow implementation move like
> >  the Rust's one:
> >
> >
> > https://lists.apache.org/x/thread.html/r44390a18b3fbb08ddb68aa4d12f37245d948984fae11a41494e5fc1d@%3Cdev.arrow.apache.org%3E
> >
> >   2. Create apache/arrow-julia
> >
> >   3. Start IP clearance process to import JuliaData/Arrow.jl
> >  to apache/arrow-julia
> >
> >  (We don't use julia/Arrow/ in apache/arrow.)
> >
> >   4. Import JuliaData/Arrow.jl to apache/arrow-julia
> >
> >   5. Prepare integration tests CI in apache/arrow-julia and apache/arrow
> >
> >   6. Prepare releasing tools in apache/arrow-julia and apache/arrow
> >
> >   7. Remove julia/... from apache/arrow and leave
> 

Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?

2021-09-07 Thread Jacob Quinn
Thanks kou.

I think the TODO action list looks good.

The one point I think could use some additional discussion is the
release cadence: it IS desirable to be able to release more frequently than
the parent repo's 3-4 month cadence. But we also haven't had commits
frequent enough to warrant a release every 2 weeks. I can think of two
possible options; I'm not sure whether one or the other would be more
compatible with the Apache release process:

1) Allow for release-on-demand; this is idiomatic for most Julia packages
I'm aware of. When a particular bug is fixed, or feature added, a user can
request a release, a little discussion happens, and a new release is made.
This approach would work well for the "bursty" kind of contributions we've
seen to Arrow.jl where development by certain people will happen frequently
for a while, then take a break for other things. This also avoids having
"scheduled" releases (every 2 weeks, 3 months, etc.) where there haven't
been significant enough updates to warrant a new release. This
approach may also facilitate differentiating between bugfix (patch)
releases vs. new functionality releases (minor), since when a release is
requested, it could be specified whether it should be patch or minor (or
major).
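
To illustrate the patch/minor/major distinction above, here is a minimal
Python sketch of how the kind of release named in a request could map to
the next SemVer-style version number. The `Version` class and its `bump`
method are hypothetical names used only for illustration; they are not
part of Arrow.jl or of any Apache release tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """A SemVer-style version triple (hypothetical helper for illustration)."""

    major: int
    minor: int
    patch: int

    def bump(self, kind: str) -> "Version":
        """Return the next version for the requested release kind."""
        if kind == "patch":
            # Bug-fix release: only the patch component changes.
            return Version(self.major, self.minor, self.patch + 1)
        if kind == "minor":
            # New functionality: bump minor and reset patch.
            return Version(self.major, self.minor + 1, 0)
        if kind == "major":
            # Breaking change: bump major and reset minor and patch.
            return Version(self.major + 1, 0, 0)
        raise ValueError(f"unknown release kind: {kind!r}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


if __name__ == "__main__":
    current = Version(1, 5, 2)  # made-up starting version
    for kind in ("patch", "minor", "major"):
        print(kind, "->", current.bump(kind))
```

With this shape, whoever requests a release only has to name the kind of
release, and the next version number follows mechanically.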

2) Commit to a scheduled release pattern like every 2 weeks, once a month,
etc. This has the advantage of consistency and clearer expectations for
users/devs involved. A release also doesn't need to be requested, because
we can just wait for the scheduled time to release. In terms of the
"unnecessary releases" mentioned above, it could be as simple as
"cancelling" a release if there haven't been significant updates in the
elapsed time period.

My preference would be for 1), but that's influenced by what I'm familiar
with in the Julia package ecosystem. It seems like it would still fit
the Apache way since we would formally request a new release, wait out the
voting period (24 hours would be preferable), and then, at the end of the
voting period, make a new release.

Thanks again kou for helping support the Julia implementation here.

-Jacob

On Sun, Sep 5, 2021 at 3:25 PM Sutou Kouhei  wrote:

> Hi,
>
> Sorry for the delay. This is a continuation of the "Status
> of Arrow Julia implementation?" thread:
>
>
> https://lists.apache.org/x/thread.html/r6d91286686d92837fbe21dd042801a57e3a7b00b5903ea90a754ac7b%40%3Cdev.arrow.apache.org%3E
>
> I summarize the current status, the next actions and items
> to be discussed.
>
> The current status:
>
>   * The Julia Arrow implementation uses
> https://github.com/JuliaData/Arrow.jl as a "dev branch"
> instead of creating a branch in
> https://github.com/apache/arrow
>   * The Julia Arrow implementation wants to use GitHub
> for the main issue management platform
>   * The Julia Arrow implementation wants to release
> more frequency than 1 release per 3-4 months
>   * The current workflow of the Rust Arrow implementation
> will also fit the Julia Arrow implementation
>
> The current workflow of the Rust Arrow implementation:
>
>
> https://docs.google.com/document/d/1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI/edit#heading=h.kv1hwbhi3cmi
>
> * Uses apache/arrow-rs and apache/arrow-datafusion instead
>   of apache/arrow for repository
>
> * Uses GitHub instead of JIRA for issue management
>   platform
>
>
> https://docs.google.com/document/d/1tMQ67iu8XyGGZuj--h9WQYB9inCk6c2sL_4xMTwENGc/edit
>
> * Releases a new minor and patch version every 2 weeks
>   in addition to the quarterly release of the other releases
>
> The next actions after we get a consensus about this
> discussion:
>
>   1. Start voting the Julia Arrow implementation move like
>  the Rust's one:
>
>
> https://lists.apache.org/x/thread.html/r44390a18b3fbb08ddb68aa4d12f37245d948984fae11a41494e5fc1d@%3Cdev.arrow.apache.org%3E
>
>   2. Create apache/arrow-julia
>
>   3. Start IP clearance process to import JuliaData/Arrow.jl
>  to apache/arrow-julia
>
>  (We don't use julia/Arrow/ in apache/arrow.)
>
>   4. Import JuliaData/Arrow.jl to apache/arrow-julia
>
>   5. Prepare integration tests CI in apache/arrow-julia and apache/arrow
>
>   6. Prepare releasing tools in apache/arrow-julia and apache/arrow
>
>   7. Remove julia/... from apache/arrow and leave
>  julia/README.md pointing to apache/arrow-julia
>
>
> Items to be discussed:
>
>   * Interval of minor and patch releases
>
> * The Rust Arrow implementation uses 2 weeks.
>
> * Does the Julia Arrow implementation also wants to use
>   2 weeks?
>
>   * Can we accordance with the Apache way with this workflow
> without pain?
>
> The Rust Arrow implementation workflow includes the
> following for this:
>
>
> https://docs.google.com/document/d/1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI/edit#heading=h.kv1hwbhi3cmi
>
>   > Contributors will be required to write issues for
>   > planned 

[DISCUSS][Julia] How to restart at apache/arrow-julia?

2021-09-05 Thread Sutou Kouhei
Hi,

Sorry for the delay. This is a continuation of the "Status
of Arrow Julia implementation?" thread:

  
https://lists.apache.org/x/thread.html/r6d91286686d92837fbe21dd042801a57e3a7b00b5903ea90a754ac7b%40%3Cdev.arrow.apache.org%3E

I summarize the current status, the next actions and items
to be discussed.

The current status:

  * The Julia Arrow implementation uses
    https://github.com/JuliaData/Arrow.jl as a "dev branch"
    instead of creating a branch in
    https://github.com/apache/arrow
  * The Julia Arrow implementation wants to use GitHub
    as the main issue management platform
  * The Julia Arrow implementation wants to release
    more frequently than once every 3-4 months
  * The current workflow of the Rust Arrow implementation
    will also fit the Julia Arrow implementation

The current workflow of the Rust Arrow implementation:

  
https://docs.google.com/document/d/1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI/edit#heading=h.kv1hwbhi3cmi

* Uses apache/arrow-rs and apache/arrow-datafusion instead
  of apache/arrow as the repositories

* Uses GitHub instead of JIRA as the issue management
  platform

  
https://docs.google.com/document/d/1tMQ67iu8XyGGZuj--h9WQYB9inCk6c2sL_4xMTwENGc/edit

* Releases a new minor or patch version every 2 weeks
  in addition to the quarterly release shared with the
  other implementations

The next actions after we get a consensus about this
discussion:

  1. Start a vote on moving the Julia Arrow implementation,
     like the one for Rust:

   
https://lists.apache.org/x/thread.html/r44390a18b3fbb08ddb68aa4d12f37245d948984fae11a41494e5fc1d@%3Cdev.arrow.apache.org%3E

  2. Create apache/arrow-julia

  3. Start IP clearance process to import JuliaData/Arrow.jl
     to apache/arrow-julia

     (We don't use julia/Arrow/ in apache/arrow.)

  4. Import JuliaData/Arrow.jl to apache/arrow-julia

  5. Prepare integration test CI in apache/arrow-julia and apache/arrow

  6. Prepare release tools in apache/arrow-julia and apache/arrow

  7. Remove julia/... from apache/arrow and leave
     julia/README.md pointing to apache/arrow-julia


Items to be discussed:

  * Interval of minor and patch releases

    * The Rust Arrow implementation uses 2 weeks.

    * Does the Julia Arrow implementation also want to use
      2 weeks?

  * Can we follow the Apache way with this workflow
    without pain?

    The Rust Arrow implementation workflow includes the
    following for this:

  
https://docs.google.com/document/d/1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI/edit#heading=h.kv1hwbhi3cmi

  > Contributors will be required to write issues for
  > planned features and bug fixes so that we have
  > visibility and opportunities for collaboration
  > before a PR shows up.

  * More items?


Thanks,
-- 
kou