Re: [DISCUSS][MATLAB] Proposed "Category B" License for Bundling MATLAB MEX Build Artifacts in Official Arrow Release

2024-03-12 Thread Jacob Wujciak-Jens
That's great news, thank you both for your efforts!

Best,
Jacob

On Tue, Mar 12, 2024 at 6:45 PM Sarah Gilmore
 wrote:

> Hi Everyone,
>
>
>
> We just wanted to close the loop on this discussion.
>
>
>
> After further discussion with our colleagues at MathWorks, we determined
> that we can license the MEX binaries and ALL other contents included within
> the MLTBX files distributed via the ASF release infrastructure under the
> standard Apache V2 license.
>
>
>
> ASF Legal agreed [1] that this approach abides by the ASF 3rd Party
> License Policy [2].
>
>
>
> Moving forward, Kevin and I will continue working on integrating with the
> Arrow project's release infrastructure [3] as we initially planned.
>
>
>
> We sincerely appreciate everyone's patience as we navigated these
> challenges.
>
>
>
> [1]
> https://issues.apache.org/jira/browse/LEGAL-665?focusedCommentId=17823330&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17823330
> [2] https://www.apache.org/legal/resolved.html
> [3] https://github.com/apache/arrow/pull/38660
>
>
>
> Best,
>
>
>
> Sarah and Kevin
>
> 
> From: Sarah Gilmore 
> Sent: Friday, January 26, 2024 1:28 PM
> To: dev@arrow.apache.org 
> Cc: Kevin Gurney 
> Subject: Re: [DISCUSS][MATLAB] Proposed "Category B" License for Bundling
> MATLAB MEX Build Artifacts in Official Arrow Release
>
> Hi Ian,
>
> Thanks for the feedback! We will proceed with the ASF Legal process. Once
> we hear back from them, we'll follow up on this thread to close the loop.
>
> Thanks again!
>
> Sarah and Kevin
>
>
> From: Ian Cook 
> Sent: Friday, January 26, 2024 11:37 AM
> To: dev@arrow.apache.org 
> Cc: Kevin Gurney 
> Subject: Re: [DISCUSS][MATLAB] Proposed "Category B" License for Bundling
> MATLAB MEX Build Artifacts in Official Arrow Release
>
> Hi Sarah and Kevin,
>
> Thanks for your thoughtful follow-up.
>
> Based on all of this, it seems that this question will need to be
> submitted to ASF Legal for consideration. I think it is quite clear
> that this is a good-faith effort to abide by the spirit of the ASF 3rd
> Party License Policy, but the specific details will need to be
> considered by ASF Legal.
>
> > The binaries we plan to submit, and the accompanying license,
> > are similar to the use cases listed under “Handling Licenses That
> > Prevent Modification” [3] in the Category B description. While most
> > of the contents of the distributed MLTBX file would be Apache-
> > licensed, the compiled MEX functions would be dynamically linked
> > against proprietary MathWorks shared libraries, which would cause
> > inclusion of non-Apache licensed object code.
>
> Yes, I think that is the right approach to pursue with ASF Legal:
> asking them to add the license that governs the MEX functions to the
> list of approved licenses under [3].
>
> Thanks,
> Ian
>
>
> On Fri, Jan 26, 2024 at 10:56 AM Sarah Gilmore
>  wrote:
> >
> > Hi all,
> >
> > After consulting with some of our colleagues at MathWorks, we wanted to
> > follow up on this thread.
> >
> > Before going through the official ASF legal process, we wanted to give
> the community some insight into our thinking about why our proposed license
> may be appropriate for Category B consideration.
> >
> > Our interpretation of the ASF 3rd Party License Policy [1] was that
> Category B licenses are not limited to standard licenses, but, rather, must
> meet the Appropriately Labelled Condition and the Binary-Only Inclusion
> Condition. The proposed license [2] we shared is intended to meet these
> conditions. However, we understand that our interpretation may not be
> accurate.
> >
> > The binaries we plan to submit, and the accompanying license, are
> similar to the use cases listed under “Handling Licenses That Prevent
> Modification” [3] in the Category B description. While most of the contents
> of the distributed MLTBX file would be Apache-licensed, the compiled MEX
> functions would be dynamically linked against proprietary MathWorks shared
> libraries, which would cause inclusion of non-Apache licensed object code.
> >
> > The goal of the proposed license is to allow the MLTBX file to be used
> and distributed freely as an official ASF release artifact. Ideally,
> MathWorks would like to restrict reverse engineering and modification of
> the proprietary components and the proposed license includes a clause for
> this restriction. Since the MATLAB Interface to Arrow will likely only be
> useful to users of MathWorks products, our hope is that this restriction
> would not be an impediment to users.
> >
> > We understand this is an unusual situation and appreciate the
> community's support in helping us identify a solution.
> >
> > [1] https://www.apache.org/legal/resolved.html
> > [2] https://github.com/apache/arrow/files/13955180/license.txt
> > [3] 

Re: [VOTE] Move Arrow DataFusion Subproject to new Top Level Apache Project

2024-03-03 Thread Jacob Wujciak-Jens
+1 (non-binding)

On Mon, Mar 4, 2024 at 3:39 AM Yang Jiang  wrote:

> +1 (non-binding)
>
> On 2024/03/01 18:08:26 Daniël Heres wrote:
> > +1 (binding)
> >
> > On Fri, Mar 1, 2024, 19:05 Chao Sun  wrote:
> >
> > > +1 (non-binding)
> > >
> > > On Fri, Mar 1, 2024 at 9:59 AM QP Hou  wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > exciting milestone :)
> > > >
> > > > On Fri, Mar 1, 2024 at 9:49 AM David Li  wrote:
> > > > >
> > > > > +1
> > > > >
> > > > > On Fri, Mar 1, 2024, at 12:06, Jorge Cardoso Leitão wrote:
> > > > > > +1 - great work!!!
> > > > > >
> > > > > > On Fri, Mar 1, 2024 at 5:49 PM Micah Kornfield <
> > > emkornfi...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> +1 (binding)
> > > > > >>
> > > > > >> On Friday, March 1, 2024, Uwe L. Korn  wrote:
> > > > > >>
> > > > > >> > +1 (binding)
> > > > > >> >
> > > > > >> > On Fri, Mar 1, 2024, at 2:37 PM, Andy Grove wrote:
> > > > > >> > > +1 (binding)
> > > > > >> > >
> > > > > >> > > On Fri, Mar 1, 2024 at 6:20 AM Weston Pace <
> > > weston.p...@gmail.com
> > > > >
> > > > > >> > wrote:
> > > > > >> > >
> > > > > >> > >> +1 (binding)
> > > > > >> > >>
> > > > > >> > >> On Fri, Mar 1, 2024 at 3:33 AM Andrew Lamb <
> > > al...@influxdata.com
> > > > >
> > > > > >> > wrote:
> > > > > >> > >>
> > > > > >> > >> > Hello,
> > > > > >> > >> >
> > > > > >> > >> > As we have discussed[1][2] I would like to vote on the
> > > > proposal to
> > > > > >> > >> > create a new Apache Top Level Project for DataFusion. The
> > > text
> > > > of
> > > > > >> the
> > > > > >> > >> > proposed resolution and background document is
> copy/pasted
> > > > below
> > > > > >> > >> >
> > > > > >> > >> > If the community is in favor of this, we plan to submit
> the
> > > > > >> resolution
> > > > > >> > >> > to the ASF board for approval with the next Arrow report
> (for
> > > > the
> > > > > >> > >> > April 2024 board meeting).
> > > > > >> > >> >
> > > > > >> > >> > The vote will be open for at least 7 days.
> > > > > >> > >> >
> > > > > >> > >> > [ ] +1 Accept this Proposal
> > > > > >> > >> > [ ] +0
> > > > > >> > >> > [ ] -1 Do not accept this proposal because...
> > > > > >> > >> >
> > > > > >> > >> > Andrew
> > > > > >> > >> >
> > > > > >> > >> > [1]
> > > > > >>
> https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
> > > > > >> > >> > [2]
> > > > https://github.com/apache/arrow-datafusion/discussions/6475
> > > > > >> > >> >
> > > > > >> > >> > -- Proposed Resolution -
> > > > > >> > >> >
> > > > > >> > >> > Resolution to Create the Apache DataFusion Project from
> the
> > > > Apache
> > > > > >> > >> > Arrow DataFusion Sub Project
> > > > > >> > >> >
> > > > > >> > >> >
> =
> > > > > >> > >> >
> > > > > >> > >> > X. Establish the Apache DataFusion Project
> > > > > >> > >> >
> > > > > >> > >> > WHEREAS, the Board of Directors deems it to be in the
> best
> > > > > >> > >> > interests of the Foundation and consistent with the
> > > > > >> > >> > Foundation's purpose to establish a Project Management
> > > > > >> > >> > Committee charged with the creation and maintenance of
> > > > > >> > >> > open-source software related to an extensible query
> engine
> > > > > >> > >> > for distribution at no charge to the public.
> > > > > >> > >> >
> > > > > >> > >> > NOW, THEREFORE, BE IT RESOLVED, that a Project Management
> > > > > >> > >> > Committee (PMC), to be known as the "Apache DataFusion
> > > > Project",
> > > > > >> > >> > be and hereby is established pursuant to Bylaws of the
> > > > > >> > >> > Foundation; and be it further
> > > > > >> > >> >
> > > > > >> > >> > RESOLVED, that the Apache DataFusion Project be and
> hereby is
> > > > > >> > >> > responsible for the creation and maintenance of software
> > > > > >> > >> > related to an extensible query engine; and be it further
> > > > > >> > >> >
> > > > > >> > >> > RESOLVED, that the office of "Vice President, Apache
> > > > DataFusion" be
> > > > > >> > >> > and hereby is created, the person holding such office to
> > > > > >> > >> > serve at the direction of the Board of Directors as the
> chair
> > > > > >> > >> > of the Apache DataFusion Project, and to have primary
> > > > responsibility
> > > > > >> > >> > for management of the projects within the scope of
> > > > > >> > >> > responsibility of the Apache DataFusion Project; and be
> it
> > > > further
> > > > > >> > >> >
> > > > > >> > >> > RESOLVED, that the persons listed immediately below be
> and
> > > > > >> > >> > hereby are appointed to serve as the initial members of
> the
> > > > > >> > >> > Apache DataFusion Project:
> > > > > >> > >> >
> > > > > >> > >> > * Andy Grove (agr...@apache.org)
> > > > > >> > >> > * Andrew Lamb (al...@apache.org)
> > > > > >> > >> > * Daniël Heres (dhe...@apache.org)
> > > > > >> > >> > * Jie Wen (jake...@apache.org)
> > > > > >> > >> > * Kun Liu (liu...@apache.org)
> > > > > >> > >> > * Liang-Chi Hsieh (vii...@apache.org)

Re: [ANNOUNCE] New Arrow committer: Jay Zhan

2024-02-20 Thread Jacob Wujciak-Jens
Congrats and welcome!

On Sat, Feb 17, 2024 at 9:45 AM Christiano Anderson 
wrote:

> Congrats!
>
> On 16/02/2024 11:25, Andrew Lamb wrote:
> > On behalf of the Arrow PMC, I'm happy to announce that Jay Zhan
> > has accepted an invitation to become a committer on Apache
> > Arrow. Welcome, and thank you for your contributions!
> >
> > Andrew
> >
>


Re: New tag for releases for R-universe

2024-02-10 Thread Jacob Wujciak-Jens
Thanks Nic.

For a versioned history I planned to also have the versioned
'apache-arrow-x.y.z-cran' tags or maybe 'apache-arrow-x.y.z-r' would be
less ambiguous as Nic mentioned.
The r-universe tag is unversioned so we can update it without any changes
on the r-universe side.

On Sat, Feb 10, 2024 at 10:16 PM Raúl Cumplido 
wrote:

> Hi,
>
> Thanks for doing this! Only one question: would there be any downside on
> the R side to having a version associated with the tag, so that we have a
> history in the repo?
>
> Something along the lines of "r-universe-release-15.0.0".
>
> Maybe not relevant at the moment but if in the future we decide to have
> some long term support releases or something like this it might be
> relevant.
>
> Thanks,
> Raúl
>
>
> On Sat, Feb 10, 2024, 22:10, Jonathan Keane wrote:
>
> > Thanks for this Nic.
> >
> > And just to clarify: "latest" here means the latest _release_ of Apache
> > Arrow with this new setup. Prior to this, the builds available on
> > R-universe were effectively dev builds (commits to main), but with this
> > new tag, R-universe will only have (or at least default to having) the
> > latest release.
> >
> > -Jon
> >
> >
> > On Sat, Feb 10, 2024 at 2:18 PM Nic Crane  wrote:
> >
> > > Hi folks,
> > >
> > > The Arrow R package is distributed via a few different methods, one of
> > > which is R-universe[1].
> > >
> > > In order for r-universe to track the latest version of the R package,
> we
> > > have started using the tag "r-universe-release" to indicate the commit
> > > which represents the latest version of the R package (which is also
> > > submitted to CRAN).
> > >
> > > I'm mentioning it here just to be transparent about this - it doesn't
> > make
> > > any changes to the current release process as it's still the version
> > based
> > > off the release candidate and this is just an additional step for the R
> > > package which follows a successful main project release.
> > >
> > > Hope this all sounds OK - if not, happy to take feedback for changes
> etc
> > on
> > > this.
> > >
> > > Thanks,
> > >
> > > Nic
> > >
> > >
> > > [1] https://r-universe.dev/
> > >
> >
>
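
The tagging scheme discussed in this thread can be sketched as follows. This
is a minimal illustration, assuming a remote named `origin` and the
'apache-arrow-x.y.z-r' naming floated above; the helper
`release_tag_commands` is hypothetical and only builds the git command lines
rather than running them.

```python
from typing import List

FLOATING_TAG = "r-universe-release"  # unversioned tag tracked by R-universe


def release_tag_commands(version: str, commit: str, remote: str = "origin") -> List[List[str]]:
    """Build git commands that record a versioned tag and move the floating tag.

    Hypothetical helper: the versioned name follows the 'apache-arrow-x.y.z-r'
    suggestion from the thread; the remote name is an assumption.
    """
    versioned = f"apache-arrow-{version}-r"
    return [
        # Versioned tag: created once and never moved, so history stays in the repo.
        ["git", "tag", versioned, commit],
        ["git", "push", remote, f"refs/tags/{versioned}"],
        # Floating tag: force-moved to the new release commit on every release.
        ["git", "tag", "--force", FLOATING_TAG, commit],
        ["git", "push", "--force", remote, f"refs/tags/{FLOATING_TAG}"],
    ]


if __name__ == "__main__":
    # The version and commit below are placeholders for illustration only.
    for cmd in release_tag_commands("15.0.0", "abc1234"):
        print(" ".join(cmd))
```

Because only the floating tag is force-moved, nothing needs to change on the
R-universe side between releases, while the versioned tags accumulate as a
record of what was shipped.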


Re: [ANNOUNCE] New Arrow committer: Jeffrey Vo

2024-02-07 Thread Jacob Wujciak-Jens
Congrats 

On Wed, Feb 7, 2024, 14:02, Raúl Cumplido wrote:

> Congratulations Jeffrey!
>
> On Wed, Feb 7, 2024 at 14:00, Andrew Lamb wrote:
> >
> > Congratulations Jeffrey! Well deserved!
> >
> > On Tue, Feb 6, 2024 at 1:30 PM Raphael Taylor-Davies
> >  wrote:
> >
> > > On behalf of the Arrow PMC, I am happy to announce that Jeffrey Vo has
> > > accepted an invitation to become a committer on Apache Arrow. Welcome,
> > > and thank you for your contributions!
> > >
> > > Raphael Taylor-Davies
> > >
> > >
>


Re: [VOTE] Accept donation of Comet Spark native engine

2024-01-27 Thread Jacob Wujciak-Jens
+1 (non-binding)

On Sun, Jan 28, 2024, 05:17, Jorge Cardoso Leitão wrote:

> +1
>
> On Sun, 28 Jan 2024, 00:00 Wes McKinney,  wrote:
>
> > +1 (binding)
> >
> > On Sat, Jan 27, 2024 at 12:26 PM Micah Kornfield 
> > wrote:
> >
> > > +1 Binding
> > >
> > > On Sat, Jan 27, 2024 at 10:21 AM David Li  wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > On Sat, Jan 27, 2024, at 13:03, L. C. Hsieh wrote:
> > > > > +1 (binding)
> > > > >
> > > > > On Sat, Jan 27, 2024 at 8:10 AM Andrew Lamb 
> > > > wrote:
> > > > >>
> > > > >> +1 (binding)
> > > > >>
> > > > >> This is super exciting
> > > > >>
> > > > >> On Sat, Jan 27, 2024 at 11:00 AM Daniël Heres <
> > danielhe...@gmail.com>
> > > > wrote:
> > > > >>
> > > > >> > +1 (binding). Awesome addition to the DataFusion ecosystem!!!
> > > > >> >
> > > > >> > Daniël
> > > > >> >
> > > > >> >
> > > > >> > On Sat, Jan 27, 2024, 16:57 vin jake 
> > wrote:
> > > > >> >
> > > > >> > > +1 (binding)
> > > > >> > >
> > > > > >> > > On Sat, Jan 27, 2024 at 11:43 PM Andy Grove wrote:
> > > > >> > >
> > > > >> > > > Hello,
> > > > >> > > >
> > > > >> > > > This vote is to determine if the Arrow PMC is in favor of
> > > > accepting the
> > > > >> > > > donation of Comet (a Spark native engine that is powered by
> > > > DataFusion
> > > > >> > > and
> > > > >> > > > the Rust implementation of Arrow).
> > > > >> > > >
> > > > >> > > > The donation was previously discussed on the mailing list
> [1].
> > > > >> > > >
> > > > >> > > > The proposed donation is at [2].
> > > > >> > > >
> > > > >> > > > The Arrow PMC will start the IP clearance process if the
> vote
> > > > passes.
> > > > >> > > There
> > > > >> > > > is a Google document [3] where the community is working on
> the
> > > > draft
> > > > >> > > > contents for the IP clearance form.
> > > > >> > > >
> > > > >> > > > The vote will be open for at least 72 hours.
> > > > >> > > >
> > > > >> > > > [ ] +1 : Accept the donation
> > > > >> > > > [ ] 0 : No opinion
> > > > >> > > > [ ] -1 : Reject donation because...
> > > > >> > > >
> > > > >> > > > My vote: +1
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > >
> > > > >> > > > Andy.
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > [1]
> > > > https://lists.apache.org/thread/0q1rb11jtpopc7vt1ffdzro0omblsh0s
> > > > >> > > > [2] https://github.com/apache/arrow-datafusion-comet/pull/1
> > > > >> > > > [3]
> > > > >> > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > >
> > >
> >
> https://docs.google.com/document/d/1azmxE1LERNUdnpzqDO5ortKTsPKrhNgQC4oZSmXa8x4/edit?usp=sharing
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > >
> > >
> >
>


Re: [DISC] Improve Arrow Release verification process

2024-01-19 Thread Jacob Wujciak-Jens
I concur, a minimally scoped verification script for the actual voting
process, without any binary verification etc., should be created. Making a
release easier to verify will lower the burden of participating in the vote,
which is good for the community and will even be necessary if we ever want to
increase release cadence as previously discussed.

In my opinion it will also mean that the binaries are no longer part of the
release, which will help in situations like the one where Python 3.12 was
released just after 14.0.0 and lots of users ran into issues because there
were no 14.0.0 wheels for 3.12.

While it would still be nice to make reproduction of CI errors easier by
having better methods to restart a failed script, this is of much lower
importance than improving the release process.

Jacob

On Fri, Jan 19, 2024 at 7:38 PM Andrew Lamb  wrote:

> I would second this notion that manually running tests that are already
> covered as part of CI as part of the release process is of (very) limited
> value.
>
> While we do the same thing (compile and run some tests) as part of the Rust
> release this has never caught any serious defect I am aware of and we only
> run a subset of tests (e.g. not tests for integration with other arrow
> versions)
>
> Reducing the burden for releases I think would benefit everyone.
>
> Andrew
>
> On Fri, Jan 19, 2024 at 1:08 PM Antoine Pitrou  wrote:
>
> >
> > Well, if the main objective is to just follow the ASF Release
> > guidelines, then our verification process can be simplified drastically.
> >
> > The ASF indeed just requires:
> > """
> > Every ASF release MUST contain one or more source packages, which MUST
> > be sufficient for a user to build and test the release provided they
> > have access to the appropriate platform and tools. A source release
> > SHOULD not contain compiled code.
> > """
> >
> > So, basically, if the source tarball is enough to compile Arrow on a
> > single platform with a single set of tools, then we're ok. :-)
> >
> > The rest is just an additional burden that we voluntarily inflict on
> > ourselves. *Ideally*, our continuous integration should be able to
> > detect any build-time or runtime issue on supported platforms. There
> > have been rare cases where some issues could be detected at release time
> > thanks to the release verification script, but these are a tiny portion
> > of all issues routinely detected in the form of CI failures. So, there
> > doesn't seem to be a reason to believe that manual release verification
> > is bringing significant benefits compared to regular CI.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On 19/01/2024 at 11:42, Raúl Cumplido wrote:
> > > Hi,
> > >
> > > One of the challenges we have when doing a release is verification and
> > voting.
> > >
> > > Currently the Arrow verification process is quite long, tedious and
> > error prone.
> > >
> > > I would like to use this email to get feedback and user requests in
> > > order to improve the process.
> > >
> > > Several things already on my mind:
> > >
> > > One thing that is quite annoying is that any flaky failure makes us
> > > restart the process and possibly requires downloading everything
> > > again. It would be great to have some kind of retry mechanism that
> > > allows us to keep going from where it failed and doesn't have to redo
> > > the previous successful jobs.
> > >
> > > We do have a bunch of flags to do specific parts but that requires
> > > knowledge and time to go over the different flags, etcetera so the UX
> > > could be improved.
> > >
> > > Based on the ASF release policy [1] in order to cast a +1 vote we have
> > > to validate the source code packages but it is not required to
> > > validate binaries locally. Several binaries are currently tested using
> > > docker images and they are already tested and validated on CI. Our
> > > documentation for release verification still asks verifiers to perform
> > > binary validation. I plan to update the documentation and move it to the
> > > official docs instead of the wiki [2].
> > >
> > > I would appreciate input on the topic so we can improve the current
> > process.
> > >
> > > Thanks everyone,
> > > Raúl
> > >
> > > [1] https://www.apache.org/legal/release-policy.html#release-approval
> > > [2]
> >
> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
> >
>
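
The minimal, source-only check discussed in this thread could start from
something like the sketch below. It assumes the release ships a `.sha512`
file next to the source tarball whose first whitespace-separated field is the
hex digest; the function names are illustrative, and this is not the
project's verification script. Signature (GPG) verification is left out.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(tarball: Path, checksum_file: Path) -> bool:
    """Compare the tarball's digest with the first field of the checksum file.

    Assumption: the checksum file looks like '<hex digest>  <file name>'.
    """
    expected = checksum_file.read_text().split()[0].lower()
    return sha512_of(tarball) == expected
```

A check like this covers only the "download and verify checksums" part of a
vote; building and testing from the source tarball would still follow.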


Re: [DISCUSS][MATLAB] Proposal for incremental point releases of the MATLAB interface

2023-11-10 Thread Jacob Wujciak-Jens
GitHub releases do have a prerelease/rc status that can be activated. Maybe
that could be used as an indicator to not include these on the exchange
site?

On Fri, Nov 10, 2023, 21:06, Sarah Gilmore wrote:

> Hi Raúl,
>
> > Currently all the binaries are generated on the third step of the
> > Release process [1] when we run `03-binary-submit.sh`. The crossbow
> > job could build the MLTBX artifact and then when we do download the
> > other binaries (`04-binary-download.sh`) we should also download the
> > MLTBX and when we submit the rest to jfrog (`05-binary-upload.sh`) we
> > could Upload MLTBX to GitHub Releases for apache-arrow-X.Y.Z-rcN.
>
> Thanks for clarifying how these scripts work together. This all makes
> sense. Our one concern is that the Arrow-MATLAB File Exchange entry would
> be automatically updated to show release candidates that have been uploaded
> to apache/arrow's GitHub Releases area. We're looking into how to prevent
> this from happening.
>
> Best,
>
> Sarah Gilmore
>
> 
> From: Raúl Cumplido 
> Sent: Friday, November 10, 2023 1:11 PM
> To: Raúl Cumplido 
> Cc: Sutou Kouhei ; dev@arrow.apache.org <
> dev@arrow.apache.org>; Lei Hou ; Sarah Gilmore <
> sgilm...@mathworks.com>
> Subject: Re: [DISCUSS][MATLAB] Proposal for incremental point releases of
> the MATLAB interface
>
> In case it was not clear, even though the binary job is run on
> ursacomputing/crossbow when we upload the binaries and create the
> Release that should be, at least in my opinion, an apache/arrow
> release.
>
> Both for the steps:
> 1. RC: Upload MLTBX to GitHub Releases for apache-arrow-X.Y.Z-rcN
> and
> 2.2 Upload it to GitHub Releases for apache-arrow-X.Y.Z
>
> On Fri, Nov 10, 2023 at 19:02, Raúl Cumplido wrote:
> >
> > Hi Sarah,
> >
> > On Fri, Nov 10, 2023 at 18:48, Sarah Gilmore wrote:
> > >
> > > Hi Kou,
> > >
> > > > We can use apache/arrow's GitHub Releases. The release
> > > > distribution document says that we can use GitHub as a
> > > > release platform:
> > > > https://infra.apache.org/release-distribution.html#other-platforms
> > > >
> > > > apache/arrow doesn't use GitHub Releases yet but
> > > > apache/arrow-adbc and apache/arrow-flight-sql-postgresql
> > > > already use GitHub Releases. (We just use "gh release
> > > > upload" to upload our artifacts to GitHub Releases.)
> > >
> > > Thank you for clarifying that we can use apache/arrow's GitHub
> Releases area for hosting the MLTBX file. We assumed we couldn't use the
> main repository, but it's great to hear we can!
> > >
> > > > BTW, how does File Exchange's "Connecting to GitHub Repositories" work?
> > > >
> https://www.mathworks.com/matlabcentral/content/fx/about.html#Why_GitHub
> > > >
> > > > Does it just use "polling"? Or do we need to install any
> > > > GitHub App, set secret variable or something on
> > > > apache/arrow? If the latter, we need to ask INFRA to do it.
> > >
> > > We are currently consulting with the development team responsible for
> > > the GitHub <-> File Exchange integration. We'll send a follow-up email
> > > with a concrete answer once we know more.
> > >
> > > > If we use GitHub Releases on apache/arrow, we can use the
> > > > following workflow. We don't need to use JFrog.
> > > >
> > > > 1. RC: Upload MLTBX to GitHub Releases for apache-arrow-X.Y.Z-rcN
> > > > 2. Release: Run a post release script that would:
> > > > 2.1 Download MLTBX from GitHub Releases for apache-arrow-X.Y.Z-rcN
> > > > 2.2 Upload it to GitHub Releases for apache-arrow-X.Y.Z
> > > > 2.3 Linked File Exchange entry will be automatically updated
> > >
> > > This seems like a much more streamlined approach. Not having to upload
> to JFrog will make things easier. Thanks for the suggestion!
> > >
> > > To clarify, in step 1, would we upload the MLTBX to
> ursacomputing/crossbow's GitHub Releases area [1]? Or, would we upload to
> apache/arrow's GitHub Releases area? If we upload release candidates to
> apache/arrow's GitHub Releases area, they would get automatically linked to
> the File Exchange. Ideally, we wouldn't want users to download release
> candidates.
> > >
> >
> > Currently all the binaries are generated on the third step of the
> > Release process [1] when we run `03-binary-submit.sh`. The crossbow
> > job could build the MLTBX artifact and then when we do download the
> > other binaries (`04-binary-download.sh`) we should also download the
> > MLTBX and when we submit the rest to jfrog (`05-binary-upload.sh`) we
> > could Upload MLTBX to GitHub Releases for apache-arrow-X.Y.Z-rcN.
> >
> > Once the release is approved and we do the post-release tasks to
> > "officially" release, we would download the MLTBX and upload it to the
> > new GitHub Releases for apache-arrow-X.Y.Z. This can be done as another
> > step in our post-release tasks (post-xx-matlab.sh).
> >
> > [1]
> 
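
The prerelease idea raised at the top of this thread can be illustrated with
a small sketch. It assumes release metadata shaped like the GitHub REST API's
release objects, which carry `tag_name` and a boolean `prerelease` field.
Whether File Exchange honors that flag is exactly the open question in the
thread, so the filter below is hypothetical, and the sample tags are made up
for illustration.

```python
from typing import Dict, List


def final_releases(releases: List[Dict]) -> List[str]:
    """Return the tag names of releases that are not marked as prereleases."""
    return [r["tag_name"] for r in releases if not r.get("prerelease", False)]


if __name__ == "__main__":
    # Made-up sample data: one release candidate and one final release.
    sample = [
        {"tag_name": "apache-arrow-15.0.0-rc1", "prerelease": True},
        {"tag_name": "apache-arrow-15.0.0", "prerelease": False},
    ]
    print(final_releases(sample))
```

If release candidates were uploaded with the prerelease flag set, a consumer
applying a filter like this would only ever see the final
apache-arrow-X.Y.Z releases.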

Re: [VOTE] Release Apache Arrow 14.0.0 - RC2

2023-10-24 Thread Jacob Wujciak-Jens
+1 (non-binding)

pop_os 22.04

On Tue, Oct 24, 2023 at 8:14 PM Raúl Cumplido 
wrote:

> Hi Bryce,
>
> This happened on the verification tasks and is related to this issue [1].
>
> It should be solved if you pull the latest main and the related
> testing submodules.
>
> Thanks,
> Raúl
>
> [1] https://github.com/apache/arrow/issues/38345
>
> On Tue, Oct 24, 2023 at 20:02, Bryce Mecum wrote:
> >
> > I've failed to verify this release candidate on macOS M1, running
> > "dev/release/verify-release-candidate.sh 14.0.0 2" [1]. The failure
> > looks related to the Go implementation's "parquet-encryption-test".
> > Can anyone on a similar machine verify?
> >
> > [1] https://gist.github.com/amoeba/f47534bea44d78a7ee79e4b44ed0e4ff
> >
> >
> > On Mon, Oct 23, 2023 at 11:19 PM Raúl Cumplido 
> wrote:
> > >
> > > Hi,
> > >
> > > I would like to propose the following release candidate (RC2) of Apache
> > > Arrow version 14.0.0. This is a release consisting of 461
> > > resolved GitHub issues[1].
> > >
> > > This release candidate is based on commit:
> > > 2dcee3f82c6cf54b53a64729fd81840efa583244 [2]
> > >
> > > The source release rc2 is hosted at [3].
> > > The binary artifacts are hosted at [4][5][6][7][8][9][10][11].
> > > The changelog is located at [12].
> > >
> > > Please download, verify checksums and signatures, run the unit tests,
> > > and vote on the release. See [13] for how to validate a release
> candidate.
> > >
> > > See also a verification result on GitHub pull request [14].
> > >
> > > The vote will be open for at least 72 hours.
> > >
> > > [ ] +1 Release this as Apache Arrow 14.0.0
> > > [ ] +0
> > > [ ] -1 Do not release this as Apache Arrow 14.0.0 because...
> > >
> > > [1]:
> https://github.com/apache/arrow/issues?q=is%3Aissue+milestone%3A14.0.0+is%3Aclosed
> > > [2]:
> https://github.com/apache/arrow/tree/2dcee3f82c6cf54b53a64729fd81840efa583244
> > > [3]:
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-14.0.0-rc2
> > > [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
> > > [5]: https://apache.jfrog.io/artifactory/arrow/amazon-linux-rc/
> > > [6]: https://apache.jfrog.io/artifactory/arrow/centos-rc/
> > > [7]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
> > > [8]: https://apache.jfrog.io/artifactory/arrow/java-rc/14.0.0-rc2
> > > [9]: https://apache.jfrog.io/artifactory/arrow/nuget-rc/14.0.0-rc2
> > > [10]: https://apache.jfrog.io/artifactory/arrow/python-rc/14.0.0-rc2
> > > [11]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
> > > [12]:
> https://github.com/apache/arrow/blob/2dcee3f82c6cf54b53a64729fd81840efa583244/CHANGELOG.md
> > > [13]:
> https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
> > > [14]: https://github.com/apache/arrow/pull/38343
>


Re: [ANNOUNCE] New Arrow committer: Xuwei Fu

2023-10-23 Thread Jacob Wujciak-Jens
Congrats and welcome!

On Mon, Oct 23, 2023 at 5:02 PM Ian Cook  wrote:

> Congratulations Xuwei!
>
> On Mon, Oct 23, 2023 at 12:46 AM Sutou Kouhei  wrote:
> >
> > On behalf of the Arrow PMC, I'm happy to announce that Xuwei Fu
> > has accepted an invitation to become a committer on Apache
> > Arrow. Welcome, and thank you for your contributions!
> >
> > --
> > kou
>


Re: [ANNOUNCE] New Arrow PMC member: Jonathan Keane

2023-10-14 Thread Jacob Wujciak-Jens
Congratulations !

On Sun, Oct 15, 2023, 00:58, Raúl Cumplido wrote:

> Congratulations Jon!
>
> On Sun, Oct 15, 2023, 0:05, Antoine Pitrou wrote:
>
> >
> > Welcome to the PMC, Jon!
> >
> > On 14/10/2023 at 19:42, David Li wrote:
> > > Congrats Jon!
> > >
> > > On Sat, Oct 14, 2023, at 13:25, Ian Cook wrote:
> > >> Congratulations Jonathan!
> > >>
> > >> On Sat, Oct 14, 2023 at 13:24 Andrew Lamb 
> wrote:
> > >>
> > >>> The Project Management Committee (PMC) for Apache Arrow has invited
> > >>> Jonathan Keane to become a PMC member and we are pleased to announce
> > >>> that Jonathan Keane has accepted.
> > >>>
> > >>> Congratulations and welcome!
> > >>>
> > >>> Andrew
> > >>>
> >
>


Re: [Java][Discuss]: consensus for JDK 8 deprecation

2023-10-10 Thread Jacob Wujciak-Jens
> I cannot estimate the effort to backport large features like the new
layouts that are currently being added (e.g. RunEndEncoding, ListView,
etc.).

In my mind we are only talking about patch releases for security fixes or
similarly critical issues as otherwise the effort to maintain 'v14' (but
actually arrow-latest) would surely overshadow any gains made by
deprecating jdk 8?

On Wed, Oct 11, 2023 at 3:31 AM Gang Wu  wrote:

> I agree that we have to move on. It seems that patch releases for
> Arrow v14 are a good idea, though I cannot estimate the effort to
> backport large features like the new layouts that are currently
> being added (e.g. RunEndEncoding, ListView, etc.).
>
> As an Arrow developer, I am always happy to drop JDK 8. My
> employer has leveraged Apache Arrow in the internal engine
> and depends on Arrow Java in the Java SDK. For end users
> who cannot move away from JDK 8, we might need to prepare
> different Java SDKs and use only features that are available in
> Arrow v14, or let the server side choose which subset of
> features to use based on the SDK version.
>
> Thanks,
> Gang
>
>
>
> On Wed, Oct 11, 2023 at 12:40 AM Dane Pitkin wrote:
>
> > To summarize the discussion so far:
> >
> > * Some Arrow Java users are still on JDK 8
> > * Arrow v14 is proposed as the final version with JDK 8 support
> > * Arrow v14 can support patch releases if necessary for JDK 8 users
> > * There is an open question to decide if JDK 11 should be dropped
> > simultaneously
> >
> > Gang Wu, I'm curious what are your thoughts given your initial concerns?
> >
> > -Dane
> >
> > On Sat, Oct 7, 2023 at 12:00 AM Jacob Wujciak-Jens
> >  wrote:
> >
> > > From a release engineer perspective (without java knowledge) I agree
> with
> > > Micah, I'd rather make a patch release for an older version if needed
> but
> > > modernize the codebase and simplify CI!
> > >
> > >
> > > On Sat, Oct 7, 2023 at 5:27 AM Micah Kornfield 
> > > wrote:
> > >
> > > > I think given the stability of Arrow Java, dropping support probably
> > > makes
> > > > > sense.  If a bug comes up or consumers really need new features we
> > can
> > > > always make a patch release of an older version.
> > > >
> > > > On Thu, Oct 5, 2023 at 3:13 PM Dane Pitkin wrote:
> > > >
> > > > > I also learned today that Apache Spark has dropped support for
> > > > > Java 8 and 11 for their next release (v4.0)[1]. Should we consider
> > > > > dropping Java 11 as well?
> > > > >
> > > > > [1]https://github.com/apache/spark/pull/43005
> > > > >
> > > > > -Dane
> > > > >
> > > > > On Thu, Oct 5, 2023 at 3:30 PM Dane Pitkin wrote:
> > > > >
> > > > > > I created a GH issue[1] proposing the removal of Java 8 support.
> > > > > > It would target the Arrow v15 release (~Jan 2024).
> > > > > >
> > > > > > IMO it would be in the best interest of the project for two major
> > > > > > reasons:
> > > > > > 1. Unblock the Java Platform Module System (JPMS)[2]
> > > > > > implementation.
> > > > > > 2. Unblock Arrow from upgrading dependencies that no longer
> > > > > > support Java 8. (See [1] for examples)
> > > > > >
> > > > > > Since Arrow Java has been quite stable, will Java 8 users be okay
> > > > > > with pinning Arrow to the last supported release (v14) if the Arrow
> > > > > > project ultimately decides to remove Java 8 support?
> > > > > >
> > > > > >
> > > > > > [1]https://github.com/apache/arrow/issues/38051
> > > > > > [2]https://en.wikipedia.org/wiki/Java_Platform_Module_System
> > > > > >
> > > > > > -Dane
> > > > > >
> > > > > > On Fri, Sep 15, 2023 at 12:26 PM Dane Pitkin <d...@voltrondata.com> wrote:
> > > > > >
> > > > > > >> - As a low level library, users have to add specific flags to
> > > > > > >>>  use Java 9 and up with Arrow to resolve issues with java.nio.
> > > > > > >>>  This has been annoying for our cust

Re: [DISCUSS][Swift] repo for swift similar to arrow-rs

2023-10-10 Thread Jacob Wujciak-Jens
+1 on Dewey's sentiment.

With regards to technicalities:
- a PMC member can create the repo via ASF's gitbox (I assume
'arrow-swift'?)
- the repo then needs to be configured using the '.asf.yaml'
  - which merge styles are allowed
  - branch protection rules
  - to which ml should notifications be sent
  - see [1] for more features
- CI
- PR/Issue template
- ...
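
To make the '.asf.yaml' items above a little more concrete, here is a minimal sketch in Python that collects the kind of settings such a file usually holds and prints them as YAML-style text. The key names (`enabled_merge_buttons`, `protected_branches`, `notifications`) follow the ASF infra documentation as far as I recall it; every value, including the mailing-list addresses, and the `render` helper are assumptions for illustration and not the configuration of any existing Arrow repo.

```python
# Hypothetical settings for a new repo's .asf.yaml; all values are illustrative.
ASF_YAML = {
    "github": {
        # which merge styles are allowed
        "enabled_merge_buttons": {"squash": True, "merge": False, "rebase": False},
        # branch protection rules
        "protected_branches": {
            "main": {
                "required_pull_request_reviews": {
                    "required_approving_review_count": 1,
                },
            },
        },
    },
    # to which mailing lists notifications should be sent (assumed addresses)
    "notifications": {
        "commits": "commits@arrow.apache.org",
        "issues": "github@arrow.apache.org",
        "pullrequests": "github@arrow.apache.org",
    },
}


def render(node, indent=0):
    """Render nested dicts of scalars as simple block-style YAML lines."""
    lines = []
    pad = "  " * indent
    for key, value in node.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(render(value, indent + 1))
        elif isinstance(value, bool):
            # YAML booleans are lowercase
            lines.append(f"{pad}{key}: {'true' if value else 'false'}")
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


if __name__ == "__main__":
    print("\n".join(render(ASF_YAML)))
```

The printed text is only a starting point; the real file would need to be checked against the current ASF infra documentation before being committed.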

What is the usual versioning scheme for swift projects and what release
cadence are you planning?

Best
Jacob


On Tue, Oct 10, 2023 at 10:25 PM Dewey Dunnington
 wrote:

> Hi Alva,
>
> I would encourage you to do whatever will make life more pleasant for
> you and other contributors to the Swift Arrow implementation. I have
> found development of an Arrow subproject (nanoarrow) in a separate
> repository very pleasant. While I don't run integration tests there,
> it's not because of any technical limitation (instead of pulling one
> repo in your CI job, just pull two).
>
> For the R bindings to Arrow, which do depend on the C++ bindings, we
> do have some benefit because Arrow C++ changes that break R tend to
> get fixed by the C++ contributor in their PR, rather than that
> responsibility always falling on us. That said, it doesn't happen very
> often, and we have informally toyed with the idea of moving out of the
> monorepo to make it less intimidating for outside contributors.
>
> Cheers,
>
> -dewey
>
> On Tue, Oct 10, 2023 at 2:33 PM Antoine Pitrou  wrote:
> >
> >
> > Hi Alva,
> >
> > I'll let others give their opinions on the repo.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > On 2023-10-10 at 19:25, Alva Bandy wrote:
> > > Hi Antoine,
> > >
> > > Thanks for the reply.
> > >
> > > It would be great to get the Swift implementation added to the
> > > integration test.  I have a task for adding the C Data Interface and I
> > > will work on getting the integration test running for Swift after that
> > > task. Can we move forward with setting up the repo as long as there is
> > > a task/issue to ensure the integration test will be run against Swift
> > > soon or would this be a blocker?
> > >
> > > Also, I am not sure about Julia, I have not looked into Julia’s
> > > implementation.
> > >
> > > Thank you,
> > > Alva Bandy
> > >
> > > On 2023/10/10 08:54:30 Antoine Pitrou wrote:
> > >>
> > >> Hello Alva,
> > >>
> > >> This is a reasonable request, but it might come with its own drawbacks
> > >> as well.
> > >>
> > >> One significant drawback is that adding the Swift implementation to
> > >> the cross-implementation integration tests will be slightly more
> > >> complicated.
> > >> It is very important that all Arrow implementations are
> > >> integration-tested against each other, otherwise we only have a
> > >> theoretical guarantee that they are compatible. See how this is done
> > >> here:
> > >> https://arrow.apache.org/docs/dev/format/Integration.html
> > >>
> > >> Unless I'm mistaken, neither Swift nor Julia is running the
> > >> integration tests.
> > >>
> > >> Regards
> > >>
> > >> Antoine.
> > >>
> > >>
> > >>
> > >> On 2023-10-09 at 22:26, Alva Bandy wrote:
> > >>> Hi,
> > >>>
> > >>> I would like to request a repo for Arrow Swift (similar to
> > >>> arrow-rs).  Swift arrow is currently fully Swift and doesn't leverage
> > >>> the C++ libraries. One of the goals of Arrow Swift was to provide a
> > >>> fully Swift impl and splitting them now would help ensure that Swift
> > >>> Arrow stays on this path.
> > >>>
> > >>> Also, the Swift Package Manager uses a git repo url to pull down a
> > >>> package.  This can lead to a large download since the entire arrow
> > >>> repo will be pulled down just to include Arrow Swift.  It would be
> > >>> great to make this change before registering Swift Arrow with a Swift
> > >>> registry (such as Swift Package Registry).
> > >>>
> > >>> Please let me know if this is possible and if so, what would be the
> > >>> process going forward.
> > >>>
> > >>> Thank you,
> > >>> Alva Bandy
> > >>>
>


Re: [Java][Discuss]: consensus for JDK 8 deprecation

2023-10-06 Thread Jacob Wujciak-Jens
From a release engineer perspective (without java knowledge) I agree with
Micah, I'd rather make a patch release for an older version if needed but
modernize the codebase and simplify CI!
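
For context on the java.nio flags that come up in the quoted messages below: at the time of this thread the Arrow Java install docs asked users on Java 9 and up to pass `--add-opens=java.base/java.nio=ALL-UNNAMED` to the JVM, an option that Java 8 does not recognize. A minimal sketch of how a launcher might branch on the Java version, written in Python; the helper name `jvm_args` and the jar name are hypothetical.

```python
# Flag from the Arrow Java install docs for Java 9 and up (as of this thread).
ADD_OPENS = "--add-opens=java.base/java.nio=ALL-UNNAMED"


def jvm_args(java_major, jar="app.jar"):
    """Build a `java` command line, adding --add-opens only on Java 9+."""
    args = ["java"]
    if java_major >= 9:
        # Java 8 predates the module system and rejects this option.
        args.append(ADD_OPENS)
    args += ["-jar", jar]
    return args


if __name__ == "__main__":
    for major in (8, 11, 17):
        print(major, " ".join(jvm_args(major)))
```

This is the kind of per-version branching that supporting both Java 8 and newer JDKs pushes onto users.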


On Sat, Oct 7, 2023 at 5:27 AM Micah Kornfield 
wrote:

> I think given the stability of Arrow Java, dropping support probably makes
> sense.  If a bug comes up or consumers really need new features we can
> always make a patch release of an older version.
>
> On Thu, Oct 5, 2023 at 3:13 PM Dane Pitkin wrote:
>
> > I also learned today that Apache Spark has dropped support for Java 8 and
> > 11 for their next release (v4.0)[1]. Should we consider dropping Java 11
> > as well?
> >
> > [1]https://github.com/apache/spark/pull/43005
> >
> > -Dane
> >
> > On Thu, Oct 5, 2023 at 3:30 PM Dane Pitkin  wrote:
> >
> > > I created a GH issue[1] proposing the removal of Java 8 support. It
> > > would target the Arrow v15 release (~Jan 2024).
> > >
> > > IMO it would be in the best interest of the project for two major
> > > reasons:
> > > 1. Unblock the Java Platform Module System (JPMS)[2] implementation.
> > > 2. Unblock Arrow from upgrading dependencies that no longer support
> > > Java 8. (See [1] for examples)
> > >
> > > Since Arrow Java has been quite stable, will Java 8 users be okay with
> > > pinning Arrow to the last supported release (v14) if the Arrow project
> > > ultimately decides to remove Java 8 support?
> > >
> > >
> > > [1]https://github.com/apache/arrow/issues/38051
> > > [2]https://en.wikipedia.org/wiki/Java_Platform_Module_System
> > >
> > > -Dane
> > >
> > > On Fri, Sep 15, 2023 at 12:26 PM Dane Pitkin wrote:
> > >
> > >> - As a low level library, users have to add specific flags to use
> > >>>  Java 9 and up with Arrow to resolve issues with java.nio. This has
> > >>>  been annoying for our customers constantly. If this is not resolved,
> > >>>  I would say we may see a lot of complaints in the future.
> > >>>
> > >> I filed issue 37739[1] to track this, but it sounds like this can't be
> > >> changed until Java 21 or 24.
> > >>
> > >> - It seems that the EOL of Java 8 from Oracle is Dec 2030 [2]. A lot
> > >>>  of users will still stay on it for a long time. At least this is
> > >>>  true for our customers. So I am afraid we may not upgrade to newer
> > >>>  versions of Arrow if it no longer supports Java 8.
> > >>>
> > >> Java 8 does have a long Extended Support timeline, but a recent
> > >> report shows Java 11 increasing in adoption vs Java 8. "More than 56%
> > >> of applications are now using Java 11 in production (up from 48% in
> > >> 2022 and 11% in 2020). Java 8 is a close second with nearly 33% of
> > >> applications using it in production (down from 46% in 2022)."[2]
> > >> I expect the Java ecosystem will find a way to move on from Java 8
> > >> much sooner than 2030, meaning many of Arrow's dependencies could drop
> > >> support for Java 8 before then. At this point, Arrow may be forced to
> > >> support a higher minimum Java version.
> > >>
> > >> That being said, it's hard to argue against real use cases. I'd be
> > >> curious to hear what Java version other users of Arrow are using (and
> > >> if there is a timeline to upgrade if on Java 8).
> > >>
> > >>
> > >> [1]https://github.com/apache/arrow/issues/37739
> > >> [2] https://newrelic.com/sites/default/files/2023-04/new-relic-2023-state-of-the-java-ecosystem-2023-04-20.pdf
> > >>
> > >>
> > >> -Dane
> > >>
> > >>
> > >> On Thu, Sep 14, 2023 at 11:45 AM Gang Wu  wrote:
> > >>
> > >>> Thanks for bringing this up!
> > >>>
> > >>> I have two concerns about dropping Java 8 support:
> > >>> - As a low level library, users have to add specific flags [1] to use
> > >>>  Java 9 and up with Arrow to resolve issues with java.nio. This has
> > >>>  been annoying for our customers constantly. If this is not resolved,
> > >>>  I would say we may see a lot of complaints in the future.
> > >>> - It seems that the EOL of Java 8 from Oracle is Dec 2030 [2]. A lot
> > >>>  of users will still stay on it for a long time. At least this is
> > >>>  true for our customers. So I am afraid we may not upgrade to newer
> > >>>  versions of Arrow if it no longer supports Java 8.
> > >>>
> > >>> [1] https://arrow.apache.org/docs/java/install.html#java-compatibility
> > >>> [2] https://www.oracle.com/java/technologies/java-se-support-roadmap.html
> > >>>
> > >>> Best,
> > >>> Gang
> > >>>
> > >>>
> > >>>
> > >>> On Thu, Sep 14, 2023 at 11:14 PM David Dali Susanibar Arce <
> > >>> davi.sar...@gmail.com> wrote:
> > >>>
> > >>> > Hi Arrow Java developers,
> > >>> >
> > >>> > I would like to propose a timeline for dropping support for Java 8:
> > >>> > - Propose to drop JDK8 in Arrow v15 (2 releases from now)
> > >>> > - JDK 21 support will be added before removal of JDK8
> > >>> >
> > >>> > Why?
> > >>> > - Java 8 no longer receives Premier Support (1)
> > >>> > - Some Arrow Java (test) dependencies have 

Re: [VOTE] Release Apache Arrow nanoarrow 0.3.0 - RC0

2023-09-26 Thread Jacob Wujciak-Jens
+1 (non-binding)

full verification with conda arrow 13.0.0 R 4.3 on pop_os 23.04, cmake
3.27, gcc 11

On Wed, Sep 27, 2023 at 1:26 AM Bryce Mecum  wrote:

> +1 (non-binding)
>
> Verified with `./verify-release-candidate.sh 0.3.0 0` on:
> - Windows 10, x86_64, libarrow-main, MSVC 17 2022, R 4.3.1, Rtools 43
> - macOS 13.6, aarch64, libarrow 13.0.0, R 4.3.1
> - Ubuntu 23.04, aarch64, libarrow 13.0.0, R 4.2.2
>


Re: [ANNOUNCE] New Arrow Committer: Metehan Yildirim

2023-09-05 Thread Jacob Wujciak-Jens
Welcome and congratulations!

On Tue, Sep 5, 2023 at 11:05 PM Raúl Cumplido  wrote:

> Welcome!
>
> On Tue, Sep 5, 2023 at 22:32, Mehmet Ozan Kabak wrote:
>
> > Welcome to the community Mete! We’ve already worked on many interesting
> > things together, and I’m looking forward to working on many others.
> >
> > Mehmet Ozan Kabak
> >
> > > On Sep 5, 2023, at 7:14 PM, Andrew Lamb  wrote:
> > >
> > > Belatedly,
> > >
> > > On behalf of the Arrow PMC, I'm happy to announce that Metehan Yildirim
> > > (mete[1])
> > > has accepted an invitation to become a committer on Apache
> > > Arrow. Welcome, and thank you for your contributions!
> > >
> > > Andrew
> > >
> > > [1]: https://people.apache.org/phonebook.html?uid=mete
> >
>


Improved nightly build dashboard

2023-08-25 Thread Jacob Wujciak-Jens
Hello Everyone!

Sam spent some time this week improving our nightly build dashboard with
new features like filtering by job name (failed and passing builds!) and
fancy new graphs.

This will make investigating and fixing nightly failures even easier (the next
release will come soon ;) )! Thanks Sam!

You can find the dashboard here: http://crossbow.voltrondata.com/
(it is intentionally http due to technical limitations)

Best
Jacob


Re: [Discuss] Do we need a release verification script?

2023-08-22 Thread Jacob Wujciak-Jens
I am also in favour of changes to the verification scripts. This is a bit
of a tangent but has direct implications for our entire release process so
I will bring it up first. While re-reading the general release policy I
came across this new addition (made in late July):

> All supplied packages MUST be cryptographically signed with a detached
> signature. It MUST be signed by either the Release Manager or __the
> automated release infrastructure__

The details are described in [1] but we now have a clear way to completely
automate everything outside the vote. Combined with the current discussion
and the previous discussions about release frequency, versioning and the
need for a support policy, a re-thinking of our entire process might be in
order.

[1]: https://infra.apache.org/release-signing.html#automated-release-signing
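
As a small illustration of the kind of check that gives the same answer on any machine, and so lends itself to automation, here is a sketch of a checksum comparison in Python using only `hashlib`. The function names are hypothetical, this is not what the Arrow verification script does, and it does not cover the GPG signature check that the policy quoted above is about.

```python
import hashlib


def sha512_hex(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, published):
    """Compare a file's digest with a published checksum line.

    `published` may be a bare hex digest or a "digest  filename" line;
    only the first whitespace-separated field is used.
    """
    expected = published.split()[0].lower()
    return sha512_hex(path) == expected
```

A caller would pass the downloaded artifact and the contents of the accompanying checksum file; an empty `published` string raises `IndexError` rather than silently passing.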

On Tue, Aug 22, 2023 at 12:33 PM Raúl Cumplido wrote:

> Hi,
>
> I do agree that currently verifying the release locally provides
> little benefit for the effort we have to put in but I thought this was
> required as per Apache policy:
> https://www.apache.org/legal/release-policy.html#release-approval
>
> Copying the important bit:
> """
> Before casting +1 binding votes, individuals are REQUIRED to download
> all signed source code packages onto their own hardware, verify that
> they meet all requirements of ASF policy on releases as described
> below, validate all cryptographic signatures, compile as provided, and
> test the result on their own platform.
> """
>
> I also think we should try and challenge those.
>
> In the past we have identified some minor issues on the local
> verification but I don't recall any of them being blockers for the
> release.
>
> Thanks,
> Raúl
>
> On Tue, Aug 22, 2023 at 11:46, Andrew Lamb wrote:
> >
> > The Rust arrow implementation (arrow-rs) and DataFusion also use release
> > verification scripts, mostly inherited from when they were split from the
> > mono repo. They have found issues from time to time, for us, but those
> > issues are often not platform related and have not been release blockers.
> >
> > Thankfully for Rust, the verification scripts don't need much maintenance
> > so we just continue the ceremony. However, I certainly don't think we
> > would lose much/any test coverage if we stopped their use.
> >
> > Andrew
> >
> > On Tue, Aug 22, 2023 at 4:54 AM Antoine Pitrou wrote:
> >
> > >
> > > Hello,
> > >
> > > Abiding by the Apache Software Foundation's guidelines, every Arrow
> > > release is voted on and requires at least 3 "binding" votes to be
> > > approved.
> > >
> > > Also, every Arrow release vote is accompanied by a little ceremony
> > > where contributors and core developers run a release verification
> > > script on their machine, wait for long minutes (sometimes an hour) and
> > > report the results.
> > >
> > > This ceremony has gone on for years, and it has not really been
> > > questioned. Yet, it's not obvious to me what it is achieving exactly.
> > > I've been here since 2018, but I don't really understand what the
> > > verification script is testing for, or, more importantly, *why* it is
> > > testing for what it is testing. I'm probably not the only one?
> > >
> > > I would like to bring up the following points:
> > >
> > > * platform compatibility is (supposed to be) exercised on Continuous
> > > Integration; there is no understandable reason why it should be
> > > ceremoniously tested on each developer's machine before the release
> > >
> > > * just before a release is probably the wrong time to be testing
> > > platform compatibility, and fixing compatibility bugs (though, of
> > > course, it might still be better than not noticing?)
> > >
> > > * home environments are unstable, and not all developers run the
> > > verification script for each release, so each release is actually
> > > verified on different, uncontrolled, platforms
> > >
> > > * as for sanity checks on binary packages, GPG signatures, etc., there
> > > shouldn't be any need to run them on multiple different machines, as
> > > they are (should be?) entirely deterministic and platform-agnostic
> > >
> > > * maintaining the verification scripts is a thankless task, in part due
> > > to their nature (they need to track and mirror changes made in each
> > > implementation's build chain), in part due to implementation choices
> > >
> > > * due to the existence of the verification scripts, the release vote is
> > > focussed on getting the script to run successfully (a very contextual
> > > and non-reproducible result), rather than the actual *contents* of the
> > > release
> > >
> > > The most positive thing I can personally say about the verification
> > > scripts is that they *may* help us trust the release is not broken? But
> > > that's a very unqualified statement, and is very close to
> > > cargo-culting.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
>


Re: [VOTE][Format] Add Utf8View Arrays to Arrow Format

2023-08-18 Thread Jacob Wujciak-Jens
+1 (non-binding)

On Fri, Aug 18, 2023 at 6:04 PM L. C. Hsieh  wrote:

> +1 (binding)
>
> On Fri, Aug 18, 2023 at 5:53 AM Neal Richardson
>  wrote:
> >
> > +1
> >
> > Thanks all for the thoughtful discussions here.
> >
> > Neal
> >
> > On Fri, Aug 18, 2023 at 4:14 AM Raphael Taylor-Davies
> >  wrote:
> >
> > > +1 (binding)
> > >
> > > Despite my earlier misgivings, I think this will be a valuable addition
> > > to the specification.
> > >
> > > To clarify I've interpreted this as a vote on both Utf8View and
> > > BinaryView as in the linked PR.
> > >
> > > On 28/06/2023 20:34, Benjamin Kietzman wrote:
> > > > Hello,
> > > >
> > > > I'd like to propose adding Utf8View arrays to the arrow format.
> > > > Previous discussion in [1], columnar format description in [2],
> > > > flatbuffers changes in [3].
> > > >
> > > > There are implementations available in both C++[4] and Go[5] which
> > > > exercise the new type over IPC. Utf8View format demonstrates[6]
> > > > significant performance benefits over Utf8 in common tasks.
> > > >
> > > > The vote will be open for at least 72 hours.
> > > >
> > > > [ ] +1 add the proposed Utf8View type to the Apache Arrow format
> > > > [ ] -1 do not add the proposed Utf8View type to the Apache Arrow
> > > > format because...
> > > >
> > > > Sincerely,
> > > > Ben Kietzman
> > > >
> > > > [1] https://lists.apache.org/thread/w88tpz76ox8h3rxkjl4so6rg3f1rv7wt
> > > > [2] https://github.com/apache/arrow/blob/46cf7e67766f0646760acefa4d2d01cdfead2d5d/docs/source/format/Columnar.rst#variable-size-binary-view-layout
> > > > [3] https://github.com/apache/arrow/pull/35628/files#diff-0623d567d0260222d5501b4e169141b5070eabc2ec09c3482da453a3346c5bf3
> > > > [4] https://github.com/apache/arrow/pull/35628
> > > > [5] https://github.com/apache/arrow/pull/35769
> > > > [6] https://github.com/apache/arrow/pull/35628#issuecomment-1583218617
> > > >
> > >
>


Re: [VOTE] Release Apache Arrow 13.0.0 - RC3

2023-08-18 Thread Jacob Wujciak-Jens
+1 (non-binding)
Verified on pop_os 22.04 with conda

On Fri, Aug 18, 2023 at 5:09 PM David Li  wrote:

> +1
>
> Tested on Ubuntu Linux 20.04/Conda/x86_64
>
> On Fri, Aug 18, 2023, at 09:37, Raúl Cumplido wrote:
> > +1 (non-binding)
> >
> > I have tested both SOURCES and BINARIES successfully with:
> > TEST_DEFAULT=0 TEST_SOURCE=1 dev/release/verify-release-candidate.sh 13.0.0 3
> > TEST_DEFAULT=0 TEST_BINARIES=1 dev/release/verify-release-candidate.sh 13.0.0 3
> > with:
> >   * Python 3.11.4
> >   * gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
> >   * NVIDIA CUDA cuda_11.5.r11.5/compiler.30672275_0
> >   * openjdk version 17.0.8 2023-07-18
> >   * ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux-gnu]
> >   * dotnet 7.0.400
> >   * Ubuntu 22.04 LTS
> >
> > On Fri, Aug 18, 2023 at 10:05, Raúl Cumplido wrote:
> >>
> >> Hi,
> >>
> >> I ran the new benchmark comparison between the previous release
> >> (12.0.1) and the new Release Candidate (13.0.0 RC3). This can be
> >> found here [1].
> >>
> >> Many thanks to Sam Albers and Jacob Wujciak for setting up the crossbow
> >> workflow to generate the report.
> >>
> >> Thanks,
> >> Raúl
> >>
> >> [1] http://crossbow.voltrondata.com/performance-release-report13.0.0-rc3.html
> >>
> >> On Fri, Aug 18, 2023 at 10:00, Raúl Cumplido wrote:
> >> >
> >> > Hi,
> >> >
> >> > I would like to propose the following release candidate (RC3) of
> >> > Apache Arrow version 13.0.0. This is a release consisting of 440
> >> > resolved GitHub issues[1].
> >> >
> >> > This release candidate is based on commit:
> >> > b7d2f7ffca66c868bd2fce5b3749c6caa002a7f0 [2]
> >> >
> >> > The source release rc3 is hosted at [3].
> >> > The binary artifacts are hosted at [4][5][6][7][8][9][10][11].
> >> > The changelog is located at [12].
> >> >
> >> > Please download, verify checksums and signatures, run the unit tests,
> >> > and vote on the release. See [13] for how to validate a release
> >> > candidate.
> >> >
> >> > See also a verification result on GitHub pull request [14].
> >> >
> >> > The vote will be open for at least 72 hours.
> >> >
> >> > [ ] +1 Release this as Apache Arrow 13.0.0
> >> > [ ] +0
> >> > [ ] -1 Do not release this as Apache Arrow 13.0.0 because...
> >> >
> >> > [1]: https://github.com/apache/arrow/issues?q=is%3Aissue+milestone%3A13.0.0+is%3Aclosed
> >> > [2]: https://github.com/apache/arrow/tree/b7d2f7ffca66c868bd2fce5b3749c6caa002a7f0
> >> > [3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-13.0.0-rc3
> >> > [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
> >> > [5]: https://apache.jfrog.io/artifactory/arrow/amazon-linux-rc/
> >> > [6]: https://apache.jfrog.io/artifactory/arrow/centos-rc/
> >> > [7]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
> >> > [8]: https://apache.jfrog.io/artifactory/arrow/java-rc/13.0.0-rc3
> >> > [9]: https://apache.jfrog.io/artifactory/arrow/nuget-rc/13.0.0-rc3
> >> > [10]: https://apache.jfrog.io/artifactory/arrow/python-rc/13.0.0-rc3
> >> > [11]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
> >> > [12]: https://github.com/apache/arrow/blob/b7d2f7ffca66c868bd2fce5b3749c6caa002a7f0/CHANGELOG.md
> >> > [13]: https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
> >> > [14]: https://github.com/apache/arrow/pull/37220
>


Re: [QUESTION][BLOG] Contributing a Blog Post

2023-07-14 Thread Jacob Wujciak-Jens
+1 Sounds great, looking forward to more blog posts!

On Fri, Jul 14, 2023 at 4:40 PM Matt Topol  wrote:

> I think this would be a great idea! It's been great seeing various
> organizations posting on the Arrow blog and this would be a great
> contribution. Assuming that no one objects, you can contribute a PR to
> https://github.com/apache/arrow-site
>
> --Matt
>
> On Fri, Jul 14, 2023 at 10:17 AM Christopher Akiki <
> christopher.ak...@gmail.com> wrote:
>
> > Hi everyone!
> >
> > We are currently writing a blog post (
> > https://github.com/huggingface/blog/pull/1283) about the synergies
> > between the Hugging Face `datasets` library and Apache Arrow, and how to
> > use the Compute API to analyze HF datasets out-of-core. This will soon be
> > published on the HF blog: https://huggingface.co/blog.
> >
> > We thought it might be cool to cross-post (not necessarily in its exact
> > same form) on the Arrow blog, if that's something that you'd be
> > interested in.
> > Look forward to hearing what you think!
> >
> > Best,
> > Chris
> >
>


Re: [RESULT][VOTE] Release Apache Arrow 12.0.1 - RC1

2023-07-13 Thread Jacob Wujciak-Jens
Hello Everyone,

I have opened a PR to add branch protection to the main branch. This will
prevent force pushes and direct commits without a reviewed PR:
https://github.com/apache/arrow/pull/36678

Jacob

On Tue, Jun 13, 2023 at 4:23 PM Raúl Cumplido  wrote:

> Hi,
>
> I've had an issue with the post-11-bump-versions.sh script. For patch
> releases the script fails unless using the `BUMP_DEB_PACKAGE_NAMES=0`
> flag.
> This is not documented and I had to test several retries locally to
> understand what the issue was.
>
> The problem is that this script commits and pushes to main without
> prompting. And me executing it several times ended up with several
> commits on main that got accidentally pushed to the upstream remote.
> I raised the issue on Zulip and had a chat with David Li on how to
> solve it and ended up pushing a single commit reverting the
> "duplicated commits". The wrong commits + the revert can be seen here:
>
> https://github.com/apache/arrow/compare/f08670bd20e81ae79f33e66256927f584ae62d02...e53db939bfad2f20e332172ab4f453add1dc680d
>
> I have created an issue to improve the `post-11-bump-versions.sh`
> script to avoid this from happening in the future:
> https://github.com/apache/arrow/issues/36048
>
> I just wanted to give a heads up on what happened.
>
> Thanks and sorry about that.
>
> I will continue with the post-release tasks:
>
> - [done] Update the released milestone Date and set to "Closed" on GitHub
> - [done] Merge changes on release branch to maintenance branch for
> patch releases
> - [done] Add the new release to the Apache Reporter System
> - [done] Upload source
> - [done] Upload binaries
> - [done] Update website
> - [done] Upload JavaScript packages
> - [done] Upload C# packages
> - [done] Upload wheels/sdist to pypi
> - [done] Publish Maven artifacts
> - [done] Bump versions
> - [done] Update tags for Go modules
> - [In progress] Update Homebrew packages
> - [In progress] Update MSYS2 package
> - [In progress] Update vcpkg port
> - [ ] Upload RubyGems
> - [ ] Update Conan recipe
> - [ ] Update docs
> - [ ] Update version in Apache Arrow Cookbook
> - [ ] Announce the new release
> - [ ] Publish release blog posts
> - [ ] Announce the release on Twitter
> - [ ] Remove old artifacts
>
> I will need help with:
> - [ ] Update conda recipes
> - [ ] Update R packages
>
> On Tue, Jun 13, 2023 at 12:02, Raúl Cumplido wrote:
> >
> > Thanks Nic for helping me with uploading sources and adding the
> > release to the Apache Reporter System.
> >
> > This is the current status of the post-release tasks:
> >
> > - [done] Update the released milestone Date and set to "Closed" on GitHub
> > - [done] Merge changes on release branch to maintenance branch for
> > patch releases
> > - [done] Add the new release to the Apache Reporter System
> > - [done] Upload source
> > - [done] Upload binaries
> > - [ ] Update website
> > - [ ] Update Homebrew packages
> > - [ ] Update MSYS2 package
> > - [ ] Upload RubyGems
> > - [ ] Upload JavaScript packages
> > - [ ] Upload C# packages
> > - [ ] Upload wheels/sdist to pypi
> > - [ ] Publish Maven artifacts
> > - [ ] Update vcpkg port
> > - [ ] Update Conan recipe
> > - [ ] Bump versions
> > - [ ] Update tags for Go modules
> > - [ ] Update docs
> > - [ ] Update version in Apache Arrow Cookbook
> > - [ ] Announce the new release
> > - [ ] Publish release blog posts
> > - [ ] Announce the release on Twitter
> > - [ ] Remove old artifacts
> >
> > I will need help with:
> > - [ ] Make the CPP PARQUET related version as "RELEASED" on JIRA
> > - [ ] Start the new version on JIRA for the related CPP PARQUET version
> > - [ ] Update conda recipes
> > - [ ] Update R packages
> >
> > On Tue, Jun 13, 2023 at 9:09, Raúl Cumplido wrote:
> > >
> > > Hi,
> > >
> > > Thanks everyone.
> > >
> > > The result of the vote is successful with 3 +1 binding votes, 3 +1
> > > non-binding votes and no -1 votes.
> > > I will start the post release tasks for 12.0.1 [1].
> > >
> > > Thanks,
> > > Raúl
> > >
> > > [1] https://arrow.apache.org/docs/dev/developers/release.html#post-release-tasks
> > >
> > > On Tue, Jun 13, 2023 at 5:21, Jacob Wujciak-Jens wrote:
> > > >
> > > > +1 non-binding, verified Go and C++ on manjaro
> > > >
> > > > On Mon, Jun 12, 2023 at 6:17 PM Raúl Cumplido wrote:

Re: [DISCUSS] Canonical alternative layout proposal

2023-07-12 Thread Jacob Wujciak-Jens
Hello Everyone,

Thanks for this comprehensive but concise write-up, Neal! I think this
proposal is a good way to avoid both fragmentation of the Arrow ecosystem
and its obsolescence. In my opinion, obsolescence is the bigger of these
two problems, since (as mentioned in the proposal) Arrow is already (close
to) being relegated to the sidelines in ecosystem-defining projects.

Jacob
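
A toy sketch of the MUST/MAY rules in the proposal quoted below may help when reading them. Everything in it (the `Layout` enum, `ToyStringArray`, and the "alternative" representation as a list of (length, bytes) views) is made up for illustration and is not an Arrow API. It only models three of the rules: the primary layout is always supported, conversion to and from the alternative is available, and the alternative is only handed out when explicitly asked for.

```python
from enum import Enum


class Layout(Enum):
    PRIMARY = "primary"          # here: offsets + one data buffer
    ALTERNATIVE = "alternative"  # here: one (length, bytes) view per value


class ToyStringArray:
    """Toy string column stored in a primary (offsets + data) layout."""

    def __init__(self, values):
        encoded = [v.encode("utf-8") for v in values]
        self.offsets = [0]
        for item in encoded:
            self.offsets.append(self.offsets[-1] + len(item))
        self.data = b"".join(encoded)

    def to_alternative(self):
        """Convert primary -> alternative."""
        return [
            (end - start, self.data[start:end])
            for start, end in zip(self.offsets, self.offsets[1:])
        ]

    @classmethod
    def from_alternative(cls, views):
        """Convert alternative -> primary."""
        return cls([raw.decode("utf-8") for _, raw in views])

    def export(self, layout=Layout.PRIMARY):
        """Return the primary layout unless the alternative is requested."""
        if layout is Layout.ALTERNATIVE:
            return self.to_alternative()
        return (self.offsets, self.data)
```

The default argument of `export` is the point of the sketch: a consumer that knows nothing about the alternative layout never sees it.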

On Thu, Jul 13, 2023 at 12:03 AM Neal Richardson <neal.p.richard...@gmail.com> wrote:

> Hi all,
> As was previously raised in [1] and surfaced again in [2], there is a
> proposal for representing alternative layouts. The intent, as I understand
> it, is to be able to support memory layouts that some (but perhaps not all)
> applications of Arrow find valuable, so that these nearly Arrow systems can
> be fully Arrow-native.
>
> I wanted to start a more focused discussion on it because I think it's
> worth being considered on its own merits, but I also think this gets to the
> core of what the Arrow project is and should be, and I don't want us to
> lose sight of that.
>
> To restate the proposal from [1]:
>
>  * There are one or more primary layouts
>    * Existing layouts are automatically considered primary layouts,
>      even if they wouldn't have been primary layouts initially
>      (e.g. large list)
>  * A new layout, if it is semantically equivalent to another, is
>    considered an alternative layout
>  * An alternative layout still has the same requirements for adoption
>    (two implementations and a vote)
>    * An implementation should not feel pressured to rush and implement
>      the new layout. It would be good if they contribute in the
>      discussion and consider the layout and vote if they feel it would
>      be an acceptable design.
>  * We can define and vote and approve as many canonical alternative
>    layouts as we want:
>    * A canonical alternative layout should, at a minimum, have some
>      reasonable justification, such as improved performance for
>      algorithm X
>  * Arrow implementations MUST support the primary layouts
>  * An Arrow implementation MAY support a canonical alternative, however:
>    * An Arrow implementation MUST first support the primary layout
>    * An Arrow implementation MUST support conversion to/from the primary
>      and canonical layout
>    * An Arrow implementation's APIs MUST only provide data in the
>      alternative layout if it is explicitly asked for (e.g. schema
>      inference should prefer the primary layout).
>  * We can still vote for new primary layouts (e.g. promoting a
>    canonical alternative) but, in these votes we don't only consider
>    the value (e.g. performance) of the layout but also the
>    interoperability. In other words, a layout can only become a primary
>    layout if there is significant evidence that most implementations
>    plan to adopt it.
>
>
> To summarize some of the arguments against the proposal from the previous
> threads, there are concerns about increasing the complexity of the Arrow
> specification and the cost/burden of updating all of the Arrow
> specifications to support them.
>
> Where these discussions, both about several proposed new types and this
> layout proposal, get to the core of Arrow is well expressed in the comments
> on the previous thread by Raphael [3] and Pedro [4]. Raphael asks: "what
> matters to people more, interoperability or best-in-class performance?" And
> Pedro notes that the overhead of converting these not-yet-Arrow types to
> the Arrow C ABI is high enough that they've considered abandoning Arrow as
> their interchange format. So: on the one hand, we're kinda
> choosing which quality we're optimizing for, but on the other,
> interoperability and performance are dependent on each other.
>
> What I see that we're trying to do here is find a way to expand the Arrow
> specification just enough so that Arrow becomes or remains the in-memory
> standard everywhere, but not so much that it creates too much complexity or
> burden to implement. Expand too much and you get a fragmented ecosystem
> where everyone is writing subsets of the Arrow standard and so nothing is
> fully compatible and the whole premise is undermined. But expand too little
> and projects will abandon the standard and we've also failed.
>
> I don't have a tidy answer, but I wanted to acknowledge the bigger issues,
> and see if this helps us reason about the various proposals on the table. I
> wonder if the alternative layout proposal is the happy medium that adds
> some complexity to the specification, but less than there would be if three
> new types were added, and still meets the needs of projects like DuckDB,
> Velox, and Gluten and gets them fully Arrow native.
>
> Neal
>
>
> [1]: https://lists.apache.org/thread/pfy02d9m2zh08vn8opm5td6l91z6ssrk
> [2]: https://lists.apache.org/thread/wosy53ysoy4s0yy6zbnch3dx2x4jplw6
> [3]: https://lists.apache.org/thread/r35g5612kszx9scfpk5rqpmlym4yq832
> [4]: 

Re: [ANNOUNCE] New Arrow committer: Kevin Gurney

2023-07-06 Thread Jacob Wujciak-Jens
Congratulations and welcome!

On Wed, Jul 5, 2023 at 10:49 PM Kevin Gurney  wrote:

> Thank you all for the kind words and warm welcome!
>
> I feel honored and excited to be part of this vibrant community! I look
> forward to continuing to collaborate with all of you!
>
> Best Regards,
>
> Kevin Gurney
> 
> From: Alenka Frim 
> Sent: Wednesday, July 5, 2023 12:22 AM
> To: dev@arrow.apache.org 
> Subject: Re: [ANNOUNCE] New Arrow committer: Kevin Gurney
>
> Congratulations!
>
> On Tue, Jul 4, 2023 at 9:41 PM Dewey Dunnington
>  wrote:
>
> > Congrats!
> >
> > On Tue, Jul 4, 2023 at 2:08 PM Matt Topol 
> wrote:
> > >
> > > Welcome!
> > >
> > > On Tue, Jul 4, 2023, 11:06 AM Joris Van den Bossche <
> > > jorisvandenboss...@gmail.com> wrote:
> > >
> > > > Congrats Kevin!
> > > >
> > > > On Tue, 4 Jul 2023 at 13:47, David Li  wrote:
> > > > >
> > > > > Welcome Kevin!
> > > > >
> > > > > On Tue, Jul 4, 2023, at 05:55, Raúl Cumplido wrote:
> > > > > > Congratulations Kevin!!!
> > > > > >
> > > > > > > On Tue, Jul 4, 2023 at 3:32, Weston Pace <weston.p...@gmail.com> wrote:
> > > > > >>
> > > > > >> Congratulations Kevin!
> > > > > >>
> > > > > > >> On Mon, Jul 3, 2023 at 5:18 PM Sutou Kouhei wrote:
> > > > > >>
> > > > > >> > On behalf of the Arrow PMC, I'm happy to announce that Kevin
> > Gurney
> > > > > >> > has accepted an invitation to become a committer on Apache
> > > > > >> > Arrow. Welcome, and thank you for your contributions!
> > > > > >> >
> > > > > >> > --
> > > > > >> > kou
> > > > > >> >
> > > >
> >
>


Re: [ANNOUNCE] New Arrow PMC member: Dewey Dunnington

2023-06-23 Thread Jacob Wujciak-Jens
Well deserved! Congratulations Dewey!

On Fri, Jun 23, 2023 at 16:32, Ian Cook wrote:

> Congratulations Dewey!
>
> On Fri, Jun 23, 2023 at 10:03 AM Matt Topol 
> wrote:
> >
> > Congrats Dewey!!
> >
> > On Fri, Jun 23, 2023, 9:35 AM Dane Pitkin 
> > wrote:
> >
> > > Congrats Dewey!
> > >
> > > On Fri, Jun 23, 2023 at 9:15 AM Nic Crane  wrote:
> > >
> > > > Well-deserved Dewey, congratulations!
> > > >
> > > > On Fri, 23 Jun 2023 at 11:53, Vibhatha Abeykoon 
> > > > wrote:
> > > >
> > > > > Congratulations Dewey!
> > > > >
> > > > > On Fri, Jun 23, 2023 at 4:16 PM Alenka Frim <
> ale...@voltrondata.com
> > > > > .invalid>
> > > > > wrote:
> > > > >
> > > > > > Congratulations Dewey!! 
> > > > > >
> > > > > > On Fri, Jun 23, 2023 at 12:10 PM Raúl Cumplido <
> > > raulcumpl...@gmail.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Congratulations Dewey!
> > > > > > >
> > > > > > > > On Fri, Jun 23, 2023 at 11:55, Andrew Lamb wrote:
> > > > > > >
> > > > > > > > The Project Management Committee (PMC) for Apache Arrow has
> > > invited
> > > > > > > > Dewey Dunnington (paleolimbot) to become a PMC member and we
> are
> > > > > > pleased
> > > > > > > to
> > > > > > > > announce
> > > > > > > > that Dewey Dunnington has accepted.
> > > > > > > >
> > > > > > > > Congratulations and welcome!
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
>


Re: [VOTE] Release Apache Arrow ADBC 0.5.0 - RC0

2023-06-19 Thread Jacob Wujciak-Jens
+1 (nb) with conda on ubuntu

On Mon, Jun 19, 2023 at 2:18 PM David Li  wrote:

> My vote: +1 (Ubuntu Linux 20.04/x86_64)
>
> On Fri, Jun 16, 2023, at 05:24, Raúl Cumplido wrote:
> > +1 (non-binding)
> >
> > I ran the following on Ubuntu 22.04:
> >
> > USE_CONDA=1 dev/release/verify-release-candidate.sh 0.5.0 0
> >
> > Thanks!
> > Raúl
> >
> > On Fri, Jun 16, 2023 at 9:10, Sutou Kouhei wrote:
> >>
> >> +1
> >>
> >> I ran the following on Debian GNU/Linux sid:
> >>
> >>   JAVA_HOME=/usr/lib/jvm/default-java \
> >> dev/release/verify-release-candidate.sh 0.5.0 0
> >>
> >> with:
> >>
> >>   * Python 3.11.2
> >>   * g++ (Debian 12.2.0-14) 12.2.0
> >>   * go version go1.19.8 linux/amd64
> >>   * openjdk version "17.0.6" 2023-01-17
> >>   * ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux-gnu]
> >>   * R version 4.3.0 (2023-04-21) -- "Already Tomorrow"
> >>
> >>
> >> Note: I needed https://github.com/apache/arrow-adbc/pull/810 .
> >>
> >>
> >> Thanks,
> >> --
> >> kou
> >>
> >>
> >> In <74f01cb9-5c76-4745-b357-4deca0bbd...@app.fastmail.com>
> >>   "[VOTE] Release Apache Arrow ADBC 0.5.0 - RC0" on Thu, 15 Jun 2023
> 20:06:46 -0400,
> >>   "David Li"  wrote:
> >>
> >> > Hello,
> >> >
> >> > I would like to propose the following release candidate (RC0) of
> Apache Arrow ADBC version 0.5.0. This is a release consisting of 36
> resolved GitHub issues [1].
> >> >
> >> > This release candidate is based on commit:
> ac0e0ef8bd83787f65e53d421fce6ad490d9a37d [2]
> >> >
> >> > The source release rc0 is hosted at [3].
> >> > The binary artifacts are hosted at [4][5][6][7][8].
> >> > The changelog is located at [9].
> >> >
> >> > Please download, verify checksums and signatures, run the unit tests,
> and vote on the release. See [10] for how to validate a release candidate.
> >> >
> >> > See also a verification result on GitHub Actions [11].
> >> >
> >> > The vote will be open for at least 72 hours.
> >> >
> >> > [ ] +1 Release this as Apache Arrow ADBC 0.5.0
> >> > [ ] +0
> >> > [ ] -1 Do not release this as Apache Arrow ADBC 0.5.0 because...
> >> >
> >> > Note: to verify APT/YUM packages on macOS/AArch64, you must `export
> DOCKER_DEFAULT_PLATFORM=linux/amd64`. (Or skip this step by `export
> TEST_APT=0 TEST_YUM=0`.)
> >> >
> >> > [1]:
> https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A%22ADBC+Libraries+0.5.0%22+is%3Aclosed
> >> > [2]:
> https://github.com/apache/arrow-adbc/commit/ac0e0ef8bd83787f65e53d421fce6ad490d9a37d
> >> > [3]:
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-adbc-0.5.0-rc0/
> >> > [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
> >> > [5]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
> >> > [6]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
> >> > [7]:
> https://repository.apache.org/content/repositories/staging/org/apache/arrow/adbc/
> >> > [8]:
> https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.5.0-rc0
> >> > [9]:
> https://github.com/apache/arrow-adbc/blob/apache-arrow-adbc-0.5.0-rc0/CHANGELOG.md
> >> > [10]:
> https://arrow.apache.org/adbc/main/development/releasing.html#how-to-verify-release-candidates
> >> > [11]: https://github.com/apache/arrow-adbc/actions/runs/5284608862
>


Re: [VOTE] Release Apache Arrow nanoarrow 0.2.0 - RC0

2023-06-16 Thread Jacob Wujciak-Jens
+1 (non-binding) verified fully on R 4.3 and GCC 12 on manjaro

On Fri, Jun 16, 2023 at 11:13 PM David Li  wrote:

> +1
>
> Tested on Ubuntu 20.04/x86_64
>
> On Fri, Jun 16, 2023, at 16:15, Dewey Dunnington wrote:
> > Hello,
> >
> > I would like to propose the following release candidate (RC0) of
> > Apache Arrow nanoarrow version 0.2.0. This release consists of 17
> > resolved GitHub issues [1].
> >
> > This release candidate is based on commit:
> > a7b824de6cb99ce458e1a5cd311d69588ceb0570 [2]
> >
> > The source release rc0 is hosted at [3].
> > The changelog is located at [4].
> >
> > Please download, verify checksums and signatures, run the unit tests,
> > and vote on the release. See [5] for how to validate a release
> > candidate.
> >
> > The vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this as Apache Arrow nanoarrow 0.2.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Arrow nanoarrow 0.2.0 because...
> >
> > [0] https://github.com/apache/arrow-nanoarrow
> > [1] https://github.com/apache/arrow-nanoarrow/milestone/2?closed=1
> > [2]
> >
> https://github.com/apache/arrow-nanoarrow/tree/apache-arrow-nanoarrow-0.2.0-rc0
> > [3]
> >
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-nanoarrow-0.2.0-rc0/
> > [4]
> >
> https://github.com/apache/arrow-nanoarrow/blob/apache-arrow-nanoarrow-0.2.0-rc0/CHANGELOG.md
> > [5]
> >
> https://github.com/apache/arrow-nanoarrow/blob/main/dev/release/README.md
>


Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-06-15 Thread Jacob Wujciak-Jens
> Even if ListView is rarely used for interoperability (if it never gains
wide adoption), some of the arrow implementations could use ListView to
offer faster computation kernels, which I think has real value

This is an important point, thanks for the clear phrasing Andrew!

On Thu, Jun 15, 2023 at 5:32 PM Felipe Oliveira Carvalho <
felipe...@gmail.com> wrote:

> On Wed, Jun 14, 2023 at 5:07 PM Raphael Taylor-Davies
>  wrote:
>
> > Even something relatively straightforward becomes a huge implementation
> > effort when multiplied by a large number of codebases, users and
> > datasets. Parquet is a great source of historical examples of the
> > challenges of incremental changes that don't meaningfully unlock new
> > use-cases. To take just one, Int96 was deprecated almost a decade ago,
> > in favour of some additional metadata over an existing physical layout,
> > and yet Int96 is still to the best of my knowledge used by Spark by
> > default.
> >
> > That's not to say that I think the arrow specification should ossify and
> > we should never change it, but I'm not hugely enthusiastic about adding
> > encodings that are only incremental improvements over existing encodings.
> >
> > I therefore wonder if there are some new use-cases I am missing that
> > would be unlocked by this change, and that wouldn't be supported by the
> > dictionary proposal? Perhaps you could elaborate here?
>
>
> The dict-encoded ListArray would add a memory indirection: the slot in the
> dictionary has to be resolved before the offset can be. With ListView, the
> CPU can access offset and size without waiting on that memory access. Both
> need two buffer accesses after the validity check, but ListView doesn't
> have that dependency.
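
A minimal sketch of the two layouts being compared, using plain Python lists
in place of Arrow buffers and leaving out validity bitmaps. The helper names
`list_get` and `list_view_get` are made up for illustration and are not part
of any Arrow API; the point is only that a ListView row is described by its
own (offset, size) pair, so offsets do not have to be sorted.

```python
def list_get(offsets, values, i):
    """List layout: row i spans values[offsets[i]:offsets[i + 1]]."""
    return list(values[offsets[i]:offsets[i + 1]])


def list_view_get(offsets, sizes, values, i):
    """ListView layout: row i spans values[offsets[i]:offsets[i] + sizes[i]]."""
    start = offsets[i]
    return list(values[start:start + sizes[i]])


values = [10, 20, 30, 40, 50]

# List: n + 1 offsets, which must be non-decreasing.
list_offsets = [0, 2, 2, 5]
list_rows = [list_get(list_offsets, values, i) for i in range(3)]

# ListView: n offsets plus n sizes; the offsets here are deliberately unsorted.
view_offsets = [2, 0, 0]
view_sizes = [3, 0, 2]
view_rows = [list_view_get(view_offsets, view_sizes, values, i) for i in range(3)]

print(list_rows)
print(view_rows)
```

Both layouts read the same child `values`; only the bookkeeping that locates
each row differs.
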
>
>
> > Whilst I do agree
> > using dictionaries as proposed is perhaps a less elegant solution, I
> > don't see anything inherently wrong with it, and if it ain't broke we
> > really shouldn't be trying to fix it.
> >
> > Kind Regards,
> >
> > Raphael Taylor-Davies
> >
> > On 14 June 2023 17:52:52 BST, Felipe Oliveira Carvalho
> >  wrote:
> >
> > General approach to alternative formats aside, in the specific case
> > of ListView, I think the implementation complexity is being
> > overestimated in these discussions. The C++ Arrow implementation
> > shares a lot of code between List and LargeList. And with some
> > tweaks, I'm able to share that common infrastructure for ListView as
> > well. [1]
> >
> > ListView is similar to list: it doesn't require offsets to
> > be sorted and adds an extra buffer containing sizes. For symmetry
> > with the List and LargeList types (FixedSizeList not included), I'm
> > going to propose we add a LargeListView. That is not part of the
> > draft implementation yet, but seems like an obvious thing to have
> > now that I implemented the `if_else` specialization. [2]
> >
> > David Li asked about this above and I can confirm now that 64-bit
> > version of ListView (LargeListView) is in the plans.
> >
> > Trying to avoid re-implementing some kernels is not a good goal to
> > chase, IMO, because kernels need tweaks to take advantage of the
> > format.
> >
> > [1] https://github.com/apache/arrow/pull/35345
> > [2] https://github.com/felipecrv/arrow/commits/list_view_backup
> >
> > --
> > Felipe
> >
> > On Wed, Jun 14, 2023 at 12:08 PM Weston Pace
> >  wrote:
> >
> > perhaps we could support this use-case as a canonical
> > extension type over dictionary encoded, variable-sized arrays
> >
> > I believe this suggestion is valid and could be used to solve
> > the if-else case. The algorithm, if I understand it, would be
> > roughly:
> >
> > ```
> > // Note: Simple pseudocode, vectorization left as exercise for the reader
> > auto indices_builder = ...
> > auto list_builder = ...
> > indices_builder.resize(batch.length);
> > Array condition_mask = condition.EvaluateBatch(batch);
> > for row_index in selected_rows(condition_mask):
> >   indices_builder[row_index] = list_builder.CurrentLength();
> >   list_builder.Append(if_expr.EvaluateRow(batch, row_index))
> > for row_index in unselected_rows(condition_mask):
> >   indices_builder[row_index] = list_builder.CurrentLength();
> >   list_builder.Append(else_expr.EvaluateRow(batch, row_index))
> > return DictionaryArray(indices_builder.Finish(), list_builder.Finish())
> > ```
> >
> > I also agree this is a slightly awkward use of dictionaries (e.g.
> > the dictionary would have the same size as the # of indices) and
> > perhaps not the most intuitive way to solve the problem. My gut
> > reaction is simply "an improved if/else kernel is not, alone, enough
> > justification for a new layout" and yet... I think we are seeing the
> > start (I hope) of a trend where Arrow is not just used "between
> > systems" (e.g. to shuttle data from one

Re: [Parquet C++] Plan to bump default write version from 2.4 -> 2.6 (include nanoseconds LogicalType)

2023-06-15 Thread Jacob Wujciak-Jens
+1 on the update but also on properly communicating the change to avoid
surprising issues :)

On Thu, Jun 15, 2023 at 7:53 PM Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:

> On Thu, 15 Jun 2023 at 19:08, Ian Cook  wrote:
> >
> > It will still be possible to write files using Parquet 2.4 by
> > explicitly specifying the 2.4 version to the Parquet writer, correct?
> > If yes, that provides a simple workaround for users who encounter
> > compatibility issues.
>
> Indeed. Using the pyarrow API, it would be something like
> `pq.write_table(table, path, version="2.4")`
>
> >
> > However we should take care to document this as a potentially breaking
> > change, and document the workaround in release notes, release blog,
> > etc.
>
> Certainly!
>
> >
> > Ian
> >
> > On Thu, Jun 15, 2023 at 12:25 PM Joris Van den Bossche
> >  wrote:
> > >
> > > Hi all,
> > >
> > > Bringing up https://github.com/apache/arrow/issues/35746 to the
> > > mailing list: this issue proposes to bump the default Parquet version
> > > we use for writing to Parquet files in the C++ library (and in the
> > > various bindings including pyarrow and R arrow) from the current
> > > default of "2.4" to "2.6".
> > >
> > > In practice, the only change is that the writer will, by default,
> > > write the Timestamp LogicalType with NANOS unit
> > > (
> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp
> )
> > > if your data uses timestamp("ns") (currently, such data gets coerced
> > > to microsecond resolution when writing to Parquet).
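
A minimal sketch of what coercing nanosecond timestamps to microsecond
resolution loses, using plain integers rather than pyarrow. The helper
`coerce_ns_to_us` is a made-up name for illustration; it models the coercion
as integer division for a non-negative timestamp and says nothing about
whether a particular writer truncates silently or raises when digits would
be lost.

```python
NANOS_PER_MICRO = 1_000


def coerce_ns_to_us(ts_ns: int) -> int:
    """Drop the sub-microsecond digits of a non-negative nanosecond timestamp."""
    return ts_ns // NANOS_PER_MICRO


# An arbitrary timestamp expressed as nanoseconds since the epoch.
ts_ns = 1_686_830_400_123_456_789

ts_us = coerce_ns_to_us(ts_ns)
# Nanoseconds that cannot be represented at microsecond resolution.
lost_ns = ts_ns - ts_us * NANOS_PER_MICRO

print(ts_us, lost_ns)
```

With a NANOS-unit Timestamp LogicalType the original integer could be stored
as-is, which is the practical difference described above.
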
> > >
> > > In theory this could cause compatibility issues if the files you are
> > > writing need to be read by other Parquet implementations which don't
> > > yet support nanoseconds. But the Parquet format 2.6 was released in
> > > Sept 2018, and parquet-mr added support for it in 2018 as well.
> > >
> > > Unless there is pushback on this, we are currently planning to make
> > > this change for the upcoming Arrow 13.0.0 release.
> > >
> > > Best,
> > > Joris
>


Re: [DISCUSS][C++] Can we require CMake 3.16+ since 13.0.0?

2023-06-15 Thread Jacob Wujciak-Jens
+1 on 3.16 and dropping Amazon Linux 2 (as that is recommended by AWS).

@Antoine 3.14+ has a number of improvements to FetchContent that we could
use to vastly improve our bundled dependency system. There are also
improvements to precompiled headers etc. An overview of some of the changes
in each version is here:
https://cliutils.gitlab.io/modern-cmake/chapters/intro/newcmake.html

On Thu, Jun 15, 2023 at 11:36 PM Antoine Pitrou  wrote:

>
> Hi,
>
> I'd ask the question differently: what do we gain from requiring 3.16
> rather than 3.13?
>
>
> On 15/06/2023 at 23:19, Sutou Kouhei wrote:
> > Hi,
> >
> > We require CMake 3.5+ now because Ubuntu 16.04 ships 3.5.
> > We dropped support for Ubuntu 18.04 because it reached EOL.
> >
> > Can we require CMake 3.16+ in Apache Arrow C++ 13.0.0?
> >
> > Here are CMake versions of our supported platforms:
> >
> > * Ubuntu 20.04: CMake 3.16
> > * CentOS 7: CMake 3.17
> > * Debian GNU/Linux bullseye: 3.18
> > * Amazon Linux 2: CMake 3.13!!!
> >
> > See the Amazon Linux 2 item. It ships CMake 3.13 not CMake
> > 3.16+. Can we drop support for Amazon Linux 2 in Apache
> > Arrow C++ 13.0.0?
> >
> > FYI1: Amazon Linux released a new version (Amazon Linux
> > 2023) on 2023-03-15:
> >
> > Amazon Linux 2023, a Cloud-Optimized Linux Distribution with
> > Long-Term Support
> >
> https://aws.amazon.com/blogs/aws/amazon-linux-2023-a-cloud-optimized-linux-distribution-with-long-term-support/
> >
> > FYI2: Amazon Linux 2 was scheduled to reach EOL on
> > 2023-06-30 but has been extended to 2025-06-30:
> >
> > https://aws.amazon.com/amazon-linux-2/faqs/
> >
> >> Amazon Linux 2 end of support date (End of Life, or EOL)
> >> has been extended by two years from 2023-06-30 to
> >> 2025-06-30 to provide customers with ample time to migrate
> >> to the next version.
> >
> > FYI3: We'll support Amazon Linux 2023 in Apache Arrow C++
> > 13.0.0:
> >
> > https://github.com/apache/arrow/pull/36081
> >
> >
> > Related issue:
> >
> > [C++] Require CMake 3.16 or later
> > https://github.com/apache/arrow/issues/34921
> >
> >
> > Thanks,
>


Re: [VOTE] Release Apache Arrow 12.0.1 - RC1

2023-06-12 Thread Jacob Wujciak-Jens
+1 non-binding, verified Go and C++ on manjaro

On Mon, Jun 12, 2023 at 6:17 PM Raúl Cumplido  wrote:

> +1 non-binding
>
> I've run the following:
>
> TEST_DEFAULT=0 TEST_SOURCE=1
> ARROW_CMAKE_OPTIONS="-DBoost_NO_BOOST_CMAKE=ON"
> dev/release/verify-release-candidate.sh 12.0.1 1
> TEST_DEFAULT=0 TEST_WHEELS=1
> TEST_WHEEL_PLATFORM_TAGS="manylinux_2_17_x86_64.manylinux2014_x86_64"
> dev/release/verify-release-candidate.sh 12.0.1 1
> TEST_DEFAULT=0 TEST_JARS=1 dev/release/verify-release-candidate.sh 12.0.1 1
> TEST_DEFAULT=0 TEST_YUM=1 dev/release/verify-release-candidate.sh 12.0.1 1
> TEST_DEFAULT=0 TEST_BINARY=1 dev/release/verify-release-candidate.sh
> 12.0.1 1
>
> I haven't been able to verify APT due to the known issue of
> artifactory returning HTTP 403's.
>
> with:
>   * Python 3.10.6
>   * gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
>   * NVIDIA CUDA cuda_11.5.r11.5/compiler.30672275_0
>   * openjdk 17.0.7 2023-04-18
>   * ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux-gnu]
>   * dotnet 7.0.302
>   * Ubuntu 22.04 LTS
>
> On Mon, Jun 12, 2023 at 16:02, Dewey Dunnington wrote:
> >
> > +1! I ran
> >
> > TEST_DEFAULT=0 TEST_CPP=1
> > ARROW_CMAKE_OPTIONS="-DProtobuf_SOURCE=BUNDLED -DARROW_FLIGHT=OFF
> > -DARROW_FLIGHT_SQL=OFF"  ./verify-release-candidate.sh
> >
> > ...on MacOS Ventura aarch64. (Flight disabled because of protobuf
> issues).
> >
> > On Mon, Jun 12, 2023 at 10:28 AM Joris Van den Bossche
> >  wrote:
> > >
> > > +1 (verified source release on Ubuntu 20.04, using conda)
> > >
> > > On Sat, 10 Jun 2023 at 22:31, Sutou Kouhei  wrote:
> > > >
> > > > +1
> > > >
> > > > I ran the following on Debian GNU/Linux sid:
> > > >
> > > >   * TEST_DEFAULT=0 \
> > > >   TEST_SOURCE=1 \
> > > >   LANG=C \
> > > >   TZ=UTC \
> > > >   CUDAToolkit_ROOT=/usr \
> > > >   ARROW_CMAKE_OPTIONS="-DBoost_NO_BOOST_CMAKE=ON
> -Dxsimd_SOURCE=BUNDLED" \
> > > >   dev/release/verify-release-candidate.sh 12.0.1 1
> > > >
> > > >   * TEST_DEFAULT=0 \
> > > >   TEST_APT=1 \
> > > >   LANG=C \
> > > >   dev/release/verify-release-candidate.sh 12.0.1 1
> > > >
> > > >   * TEST_DEFAULT=0 \
> > > >   TEST_BINARY=1 \
> > > >   LANG=C \
> > > >   dev/release/verify-release-candidate.sh 12.0.1 1
> > > >
> > > >   * TEST_DEFAULT=0 \
> > > >   TEST_JARS=1 \
> > > >   LANG=C \
> > > >   dev/release/verify-release-candidate.sh 12.0.1 1
> > > >
> > > >   * TEST_DEFAULT=0 \
> > > >   TEST_PYTHON_VERSIONS=3.11 \
> > > >
>  TEST_WHEEL_PLATFORM_TAGS=manylinux_2_17_x86_64.manylinux2014_x86_64 \
> > > >   TEST_WHEELS=1 \
> > > >   LANG=C \
> > > >   dev/release/verify-release-candidate.sh 12.0.1 1
> > > >
> > > >   * TEST_DEFAULT=0 \
> > > >   TEST_YUM=1 \
> > > >   LANG=C \
> > > >   dev/release/verify-release-candidate.sh 12.0.1 1
> > > >
> > > > with:
> > > >
> > > >   * .NET SDK (6.0.408)
> > > >   * Python 3.11.2
> > > >   * gcc (Debian 12.2.0-14) 12.2.0
> > > >   * nvidia-cuda-dev 11.8.89~11.8.0-3
> > > >   * openjdk version "18.0.2-ea" 2022-07-19
> > > >   * ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux-gnu]
> > > >
> > > >
> > > > Thanks,
> > > > --
> > > > kou
> > > >
> > > > In  rg...@mail.gmail.com>
> > > >   "[VOTE] Release Apache Arrow 12.0.1 - RC1" on Fri, 9 Jun 2023
> 14:32:26 +0200,
> > > >   Raúl Cumplido  wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I would like to propose the following release candidate (RC1) of
> Apache
> > > > > Arrow version 12.0.1. This is a release consisting of 29
> > > > > resolved GitHub issues[1].
> > > > >
> > > > > This release candidate is based on commit:
> > > > > 6af660f48472b8b45a5e01b7136b9b040b185eb1 [2]
> > > > >
> > > > > The source release rc1 is hosted at [3].
> > > > > The binary artifacts are hosted at [4][5][6][7][8][9][10][11].
> > > > > The changelog is located at [12].
> > > > >
> > > > > Please download, verify checksums and signatures, run the unit
> tests,
> > > > > and vote on the release. See [13] for how to validate a release
> candidate.
> > > > >
> > > > > See also a verification result on GitHub pull request [14].
> > > > >
> > > > > The vote will be open for at least 72 hours.
> > > > >
> > > > > [ ] +1 Release this as Apache Arrow 12.0.1
> > > > > [ ] +0
> > > > > [ ] -1 Do not release this as Apache Arrow 12.0.1 because...
> > > > >
> > > > > [1]:
> https://github.com/apache/arrow/issues?q=is%3Aissue+milestone%3A12.0.1+is%3Aclosed
> > > > > [2]:
> https://github.com/apache/arrow/tree/6af660f48472b8b45a5e01b7136b9b040b185eb1
> > > > > [3]:
> https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-12.0.1-rc1
> > > > > [4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
> > > > > [5]: https://apache.jfrog.io/artifactory/arrow/amazon-linux-rc/
> > > > > [6]: https://apache.jfrog.io/artifactory/arrow/centos-rc/
> > > > > [7]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
> > > > > [8]: