from:"Benson Muite"

Re: Help regarding setting up the r package in arrow apache

2023-10-16 Thread Benson Muite

Depending on context, this message may be better suited to the users list:
https://arrow.apache.org/community/

Consider examining the CI files:
https://github.com/apache/arrow/blob/main/.github/workflows/r.yml

An alternative to Docker is Guix:
https://packages.guix.gnu.org/packages/r-arrow/13.0.0.1/

On 10/16/23 11:47, Nic Crane wrote:
> Hi Divyansh,
> 
> There are instructions for creating a R package dev setup here:
> https://arrow.apache.org/docs/r/articles/developers/setup.html
> 
> If you can explain a bit more about what you've tried so far and what's not
> working, we may be able to advise.
> 
> Best wishes,
> 
> Nic
> 
> On Mon, 16 Oct 2023 at 06:02, Divyansh Khatri 
> wrote:
> 
>> I am having problems regarding setting up the r package using docker of the
>> apache arrow.Can you give me the step by step process of how do i setup the
>> r package in my vs code system using docker.
>>
>

Re: [DISCUSS] Blog on improved DataFusion Grouping in 28.0.0

2023-08-05 Thread Benson Muite

On 8/5/23 21:51, Andrew Lamb wrote:
> I propose posting a blog article that Daniël Heres, Raphael
> Tustvold-Davies, and I wrote about DataFusion grouping performance [1]
> 
> This content was originally published on the InfluxData blog[2] but we
> would like to repost it on the Arrow site blog [3], as the content is
> general and other reasons described on the PR. This is the same pattern we
> followed for [4], which seems to have been successful and uncontroversial.
> 
> Please let us know your thoughts either here or by commenting on the PR

It may be helpful to set up a blog aggregator; see, for example, [5], [6]
and [7].


> Andrew
> 
> [1]: https://github.com/apache/arrow-site/pull/386
> [2]:
> https://www.influxdata.com/blog/aggregating-millions-groups-fast-apache-arrow-datafusion/
> [3]: https://arrow.apache.org/blog/
> [4]:
> https://arrow.apache.org/blog/2022/12/26/querying-parquet-with-millisecond-latency/
> 
[5] https://www.r-bloggers.com/
[6] https://meetingcpp.com/blog/blogroll/
[7] https://demo.freshrss.org/

Re: Apache Arrow | Graph Algorithms & Data Structures

2023-06-29 Thread Benson Muite

On 6/30/23 04:21, Bechir Ben Daadouch wrote:
> Dear Apache Arrow Dev Community,
> 
> My name is Bechir, I am currently working on a project that involves
> implementing graph algorithms in Apache Arrow.
> 
> The initial plan was to construct a node structure and a subsequent graph
> that would encompass all the nodes. However, I quickly realized that due to
> Apache Arrow's columnar format, this approach was not feasible.
> 
> I tried a couple of things, including the implementation of the
> shortest-path algorithm. However, I rapidly discovered that manipulating
> arrow objects, particularly when applying graph algorithms, proved more
> complex than anticipated and it became very clear that I would need to
> resort to some data structures outside of what arrow offers (i.e.: Heapq
> wouldn't be possible using arrow).
> 
> I also gave a shot at doing it similar to a certain SQL method (see:
> https://ibb.co/0rPGB42 ), but ran into some roadblocks there too and I
> ended up having to resort to using Pandas for some transformations.
> 
> My next course of action is to experiment with compressed sparse rows,
> hoping to execute Matrix Multiplication using this method. But honestly,
> with what I know right now, I remain skeptical about the feasibility
> of it. However,
> before committing to this approach, I would greatly appreciate your opinion
> based on your experience with Apache Arrow.
> 
> Thank you very much for your time.
> 
> Looking forward to potentially discussing this further.
> 
> Many thanks,
> Bechir
> 
Arrow may not be the best choice for most graph algorithms, as they
typically require random memory accesses that are difficult to
coalesce into forms that allow vectorization. If your data fits
in the memory of a single node, you might consider:
https://github.com/DrTimothyAldenDavis/GraphBLAS
https://pypi.org/project/python-graphblas/
https://github.com/JuliaSparse/SuiteSparseGraphBLAS.jl
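
To illustrate the access pattern, here is a minimal pure-Python sketch,
assuming a small directed graph stored in compressed sparse row (CSR) form,
the layout mentioned in the question. The function names (to_csr, bfs_hops)
and the example edges are made up for illustration; nothing here uses Arrow
or GraphBLAS APIs. The two flat integer arrays are the kind of data a
columnar format stores well, but the traversal itself still performs
scattered lookups.

```python
from collections import deque


def to_csr(num_nodes, edges):
    """Build CSR arrays (offsets, targets) from directed (src, dst) edges."""
    counts = [0] * num_nodes
    for src, _ in edges:
        counts[src] += 1
    # offsets[i]..offsets[i + 1] delimits the neighbours of node i.
    offsets = [0] * (num_nodes + 1)
    for i in range(num_nodes):
        offsets[i + 1] = offsets[i] + counts[i]
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def bfs_hops(offsets, targets, source):
    """Return hop counts from source; -1 marks unreachable nodes."""
    dist = [-1] * (len(offsets) - 1)
    dist[source] = 0
    queue = deque([source])
    while queue:
        node = queue.popleft()
        # The neighbour slice is contiguous, but the dist[] lookups
        # below jump to arbitrary positions (random access).
        for k in range(offsets[node], offsets[node + 1]):
            nxt = targets[k]
            if dist[nxt] == -1:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


# Hypothetical example graph with six nodes; node 5 has no edges.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
offsets, targets = to_csr(6, edges)
hops = bfs_hops(offsets, targets, 0)
```

The GraphBLAS libraries listed above express this kind of traversal as
sparse matrix-vector products over a similar compressed layout.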

Re: [DISCUSS] Migrate s390x from Travis to ASF Jenkins

2023-04-21 Thread Benson Muite

On 4/21/23 16:45, Raúl Cumplido wrote:
>>> On Thu, Apr 20, 2023 at 4:07 PM Matt Topol  wrote:
>>>>
>>>> I just wanted to add on that there was a Go on s390x job too that needs to
>>>> get migrated and wasn't on the list in Raul's original email.
> 
> Thanks! I thought I added the Go one too :)
> 
>>>> On Thu, Apr 20, 2023 at 2:42 PM Benson Muite 
>>>> wrote:
>>>>
>>>>> Might also consider testing farm for Centos Stream, Fedora and/or RHEL
>>>>> builds[1][2].
>>>>>
>>>>> 1) https://docs.testing-farm.io/general/0.1/test-environment.html
>>>>> 2)
>>>>>
>>>>> https://fedoramagazine.org/test-github-projects-with-github-actions-and-testing-farm/
>>>>>
> 
> That looks pretty cool! I am not sure we can use it though as from
> their onboarding page it looks like is only opened to:
> 
> Currently Testing Farm is open for:
> -any Red Hat employee, team or project
> -any Fedora or CentOS Stream contributor, team or SIG
I can check on this, as I contribute to Fedora. Arrow is available in Fedora:
https://packages.fedoraproject.org/pkgs/libarrow/libarrow/
> -any public project, service or initiative which Red Hat or Fedora
> is maintaining or co-maintaining
>  -   via Packit integration
> https://docs.testing-farm.io/general/0.1/onboarding.html
> 
> If someone has any contacts on Red Hat/Fedora maybe this is something
> we could explore as it would be like adding GitHub actions.
It would not support other operating systems, though.  Maybe the tooling
could be added to ASF infrastructure? Bryan Cutler and Kazuaki
Ishizaki may be interested in this as well.
> 
>>>>> On 4/20/23 19:43, Antoine Pitrou wrote:
>>>>>>
>>>>>> Hi Raul,
>>>>>>
>>>>>> I'm a bit lukewarm about this. We currently don't use Jenkins and it's
>>>>>> quite different from the CI services we have. Adding Jenkins jobs for
>>>>>> s390x sounds like significant additional maintenance for a little-used
>>>>>> platform. Has someone been asking for this?
> 
> I agree, adding Jenkins is far from ideal. I thought there was some
> reminder from INFRA being sent to the PMCs or PMC Chair earlier in the
> year but I might be wrong. Kou, do you know if INFRA asked about this
> lately?
> 
>>>>>> Regards
>>>>>>
>>>>>> Antoine.
>>>>>>
>>>>>>
>>>>>> On 20/04/2023 at 13:00, Raúl Cumplido wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> As discussed on this mailing list thread [1], one month and a half ago
>>>>>>> we migrated the ARM 64 jobs from Travis to self-hosted runners [2].
>>>>>>>
>>>>>>> We are still missing to migrate the s390x jobs that we run on Travis
>>>>> [3].
>>>>>>>
>>>>>>> - name: "Java on s390x"
>>>>>>> - name: "C++ on s390x"
>>>>>>> - name: "Python on s390x"
>>>>>>>
>>>>>>> As we don't have other s390x hosts I will try and set up new Jenkins
>>>>>>> jobs for those using the ASF provided infrastructure.
>>>>>>>   From what I can read on the ASF wiki [4] I might require some PMC to
>>>>>>> help me get access to Jenkins via the whimsy tool to be added to the
>>>>>>> hudson-jobadmin group in order to have access to set up jobs on
>>>>>>> Jenkins.
>>>>>>>
>>>>>>> I wanted to validate that this is ok and would like to ask if someone
>>>>>>> can help me with the access.
>>>>>>>
>>>>>>> As a reminder all Apache projects were supposed to migrate from Travis
>>>>>>> CI by the end of 2022 [5].
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Raúl
>>>>>>>
>>>>>>> [1] https://lists.apache.org/thread/mskpqwpdq65t1wpj4f5klfq9217ljodw
>>>>>>> [2] https://github.com/apache/arrow/pull/34482
>>>>>>> [3]
>>>>>>>
>>>>> https://github.com/apache/arrow/blob/f2cc0b41fe9fb1d8d2bdb1d2abf676278e273f55/.travis.yml
>>>>>>> [4] https://cwiki.apache.org/confluence/display/INFRA/Jenkins
>>>>>>> [5] https://github.com/apache/arrow/issues/20496
>>>>>
>>>>>

Re: [DISCUSS] Migrate s390x from Travis to ASF Jenkins

2023-04-20 Thread Benson Muite

You might also consider Testing Farm for CentOS Stream, Fedora and/or RHEL
builds [1][2].

1) https://docs.testing-farm.io/general/0.1/test-environment.html
2)
https://fedoramagazine.org/test-github-projects-with-github-actions-and-testing-farm/

On 4/20/23 19:43, Antoine Pitrou wrote:
> 
> Hi Raul,
> 
> I'm a bit lukewarm about this. We currently don't use Jenkins and it's
> quite different from the CI services we have. Adding Jenkins jobs for
> s390x sounds like significant additional maintenance for a little-used
> platform. Has someone been asking for this?
> 
> Regards
> 
> Antoine.
> 
> 
> On 20/04/2023 at 13:00, Raúl Cumplido wrote:
>> Hi,
>>
>> As discussed on this mailing list thread [1], one month and a half ago
>> we migrated the ARM 64 jobs from Travis to self-hosted runners [2].
>>
>> We are still missing to migrate the s390x jobs that we run on Travis [3].
>>
>> - name: "Java on s390x"
>> - name: "C++ on s390x"
>> - name: "Python on s390x"
>>
>> As we don't have other s390x hosts I will try and set up new Jenkins
>> jobs for those using the ASF provided infrastructure.
>>  From what I can read on the ASF wiki [4] I might require some PMC to
>> help me get access to Jenkins via the whimsy tool to be added to the
>> hudson-jobadmin group in order to have access to set up jobs on
>> Jenkins.
>>
>> I wanted to validate that this is ok and would like to ask if someone
>> can help me with the access.
>>
>> As a reminder all Apache projects were supposed to migrate from Travis
>> CI by the end of 2022 [5].
>>
>> Thanks,
>> Raúl
>>
>> [1] https://lists.apache.org/thread/mskpqwpdq65t1wpj4f5klfq9217ljodw
>> [2] https://github.com/apache/arrow/pull/34482
>> [3]
>> https://github.com/apache/arrow/blob/f2cc0b41fe9fb1d8d2bdb1d2abf676278e273f55/.travis.yml
>> [4] https://cwiki.apache.org/confluence/display/INFRA/Jenkins
>> [5] https://github.com/apache/arrow/issues/20496

Re: New Pandas-Apache repo

2023-01-22 Thread Benson Muite

On 1/22/23 13:15, Adesola Adedewe wrote:
> i'm working on a project where big financial data needs to be loaded stored
> and manipulated. the data is stored as parquet. my initial version had
> arrow just load the parquet data and i used the basic unorderedmap but this
> limited me to only one data type. i found i could make my database more
> generic with arrow and its performance benefits. unfortunately my team is
> mostly filled with python dev, so i decided to write a cleaner interface
> over arrow, and using interfaces closer to panda. This enabled us to use
> fewer lines of code as well, and still enjoy the benefit. i will write a
> blog post later, i was mostly looking for other developers looking to
> collaborate, or who may need this as well. not necessarily add it to the
> main library, but i'm not opposed to that. I also implemented some
> custom kernels like covariance correlation, cumprod, shift, pctchange.
> 

The context is very helpful. Most developers are overburdened, so a blog
post that explains your use case and how the library may help others
would alert the wider Arrow developer community to your work and
encourage exploration and review of your repository.
Updating the README of your repository would also encourage use.

Re: New Pandas-Apache repo

2023-01-22 Thread Benson Muite

On 1/22/23 11:41, Adesola Adedewe wrote:
> The project was initially meant to provide a simpler interface over arrow
> apache so pretty much what was done with the python api, but it has
> evolved to be more than that ,with indexing and other panda operations
> implemented like reindex, resample, concat etc. I currently have it good
> enough for my project but I think it has potential to also open the door
> for more developers to use arrow for their projects. please take a look.
> 

Thanks.  What problem did this solve for you?  How did you utilize it
for your project?  Maybe you could contribute a blog post to Arrow
describing the end use case and the motivation for a C++ dataframe
interface?

Re: New Pandas-Apache repo

2023-01-22 Thread Benson Muite



On 1/22/23 06:23, Adesola Adedewe wrote:
> okay thanks for your consideration.
> 
> On Sat, Jan 21, 2023 at 4:49 PM Sutou Kouhei  wrote:
> 
>> Hi,
>>
>> I'm not sure pandas like API is suitable for our official
>> data frame API.
>>
>> FYI:
>>
>> * GitHub issue of this:
>>   https://github.com/apache/arrow/issues/33747
>> * [DISCUSS] Developing a "data frame" subproject in the Arrow C++ libraries
>>   https://lists.apache.org/thread/50vbmw49w83sj3km326srown64c7hlf1
>>
>>
>> Thanks,
>> --
>> kou
>>
>> In 
>>   "New Pandas-Apache repo" on Wed, 18 Jan 2023 19:04:53 -0800,
>>   Adesola Adedewe  wrote:
>>
>>> https://github.com/ava6969/panda-arrow.git
>>>
How would this compare to [1] and [2]?

1) https://github.com/xtensor-stack/xframe
2) https://github.com/hosseinmoein/DataFrame

>>> please review this new repo i made, writing nice wrappers over arrow
>>> apache using panda interfaces. i'll love to add it to the main repo.
>>
>

Re: [VOTE] Release Apache Arrow ADBC 0.1.0 - RC6

2023-01-09 Thread Benson Muite

Non-binding +1

Verified on Fedora Rawhide, aarch64, with:
TEST_APT=0 TEST_YUM=0 ./dev/release/verify-release-candidate.sh 0.1.0 6
This used the Go installed by the script. The script hung when using the Go
from the distribution repositories.

Ruby gems can be installed using
gem install --user-install red-arrow
which does not require sudo permissions unless it also pulls in Arrow
from the package repositories; it did not on Fedora Rawhide.

With
DOCKER_DEFAULT_ARCHITECTURE=linux/arm64
./dev/release/verify-release-candidate.sh 0.1.0 6
or
DOCKER_DEFAULT_ARCHITECTURE=linux/amd64
./dev/release/verify-release-candidate.sh 0.1.0 6

I get the error:
E: Unable to locate package libadbc-driver-manager-dev
Failed to verify the APT repository for debian:bullseye


On 1/9/23 19:40, Raúl Cumplido wrote:
> +1
> 
> Verified on Ubuntu 22.04 with:
> 
> USE_CONDA=1 ./dev/release/verify-release-candidate.sh 0.1.0 6
> 
> Minor note, I tried without conda first but had to install libpq manually
> and I got the same issue as Antoine with requiring sudo to install Ruby
> gems.
> 
> 
> 
> On Mon, 9 Jan 2023 at 17:21, David Li wrote:
> 
>> Hmm. Kou, do you have an idea of what's going on here? I tried just now
>> and wasn't able to reproduce, either on my Ubuntu 18.04 system, or in a
>> fresh Debian Bookworm container. I see that I get bundler 2.3.7 instead of
>> 2.3.5, but that's it.
>>
>> On Mon, Jan 9, 2023, at 10:56, Antoine Pitrou wrote:
>>> Ok, now that I pass `USE_CONDA=1`, I get the following error:
>>>
>>> Fetching gem metadata from https://rubygems.org/...
>>> Resolving dependencies...
>>> Using bundler 2.3.5
>>> Fetching fiddle 1.1.1
>>> Fetching pkg-config 1.5.1
>>> Fetching power_assert 2.0.3
>>> Fetching native-package-installer 1.1.5
>>> Fetching bigdecimal 3.1.3
>>> Fetching extpp 0.1.1
>>> Installing pkg-config 1.5.1
>>> Installing extpp 0.1.1
>>> Installing native-package-installer 1.1.5
>>> Installing fiddle 1.1.1 with native extensions
>>> Installing power_assert 2.0.3
>>> Installing bigdecimal 3.1.3 with native extensions
>>> Fetching test-unit 3.5.7
>>> Fetching glib2 4.0.6
>>> Installing test-unit 3.5.7
>>> Installing glib2 4.0.6 with native extensions
>>> Fetching gobject-introspection 4.0.6
>>> Installing gobject-introspection 4.0.6 with native extensions
>>> Fetching gio2 4.0.6
>>> Installing gio2 4.0.6 with native extensions
>>> Fetching red-arrow 10.0.0
>>> Installing red-arrow 10.0.0 with native extensions
>>> Bundle complete! 3 Gemfile dependencies, 12 gems now installed.
>>> Bundled gems are installed into `./vendor/bundle`
>>> Could not find gobject-introspection-4.0.6, red-arrow-10.0.0,
>>> test-unit-3.5.7, glib2-4.0.6, bigdecimal-3.1.3, extpp-0.1.1, gio2-4.0.6,
>>> native-package-installer-1.1.5, pkg-config-1.5.1, power_assert-2.0.3,
>>> fiddle-1.1.1 in any of the sources
>>> Run `bundle install` to install missing gems.
>>>
>>>
>>> Regards
>>>
>>> Antoine.
>>>
>>>
>>> On 09/01/2023 at 16:39, David Li wrote:
>>>> USE_CONDA=1 or manually installing libarrow-dev 10 beforehand should
>>>> both get around this. In this case I believe the issue is that building the
>>>> Arrow gem requires the Arrow libraries from the system package manager.
>>>>
>>>> On Mon, Jan 9, 2023, at 10:23, Antoine Pitrou wrote:
> The dev release script asks for the root password at some point, which
> is concerning:
>
> [...]
> Fetching red-arrow 10.0.0
> Installing red-arrow 10.0.0 with native extensions
> [sudo] password for antoine to install :
>
>
> Is there no possibility to install Ruby packages in some kind of
>> virtual
> environment, like in Python?
>
> Regards
>
> Antoine.
>
>
> On 26/12/2022 at 16:55, David Li wrote:
>> Hello,
>>
>> I would like to propose the following release candidate (RC0) of
>> Apache Arrow ADBC version 0.1.0. This is a release consisting of 63
>> resolved GitHub issues[1].
>>
>> This release candidate is based on commit:
>> 618a2ff1c64a5e2e410e30c5a156409c96fd9dfc [2]
>>
>> The source release rc6 is hosted at [3].
>> The binary artifacts are hosted at [4][5][6][7][8].
>> The changelog is located at [9].
>>
>> Please download, verify checksums and signatures, run the unit tests,
>> and vote on the release. See [10] for how to validate a release
>> candidate.
>>
>> See also a verification result on GitHub Actions [11].
>>
>> The vote will be open for at least 72 hours.
>>
>> [ ] +1 Release this as Apache Arrow ADBC 0.1.0
>> [ ] +0
>> [ ] -1 Do not release this as Apache Arrow ADBC 0.1.0 because...
>>
>> Note: to verify APT/YUM packages on macOS/AArch64, you must `export
>> DOCKER_DEFAULT_ARCHITECTURE=linux/amd64`. (Or skip this step by `export
>> TEST_APT=0 TEST_YUM=0`.)
>>
>> [1]:
>> https://github.com/apache/arrow-adbc/issues?q=is%3Aissue+milestone%3A0.1.0+is%3Aclosed
>> [2]:
>> https://github.com/apache/arrow-adbc/releases/tag/apache-arrow-adbc-0.1.0-rc6
>> [3]:
>

Re: Arrow sync call January 4 at 12:00 US/Eastern, 17:00 UTC

2023-01-06 Thread Benson Muite

On 1/7/23 05:54, Ian Cook wrote:
>> If a Google Doc is used, can it be configured to send out notifications of
> the summary to the list?
> 
> Not as far as I know, but I think we can continue to send a copy of the
> notes to the mailing list after each biweekly meeting, copied and pasted
> from the Google Doc.
It may be possible to automate this with the Google Docs API or Apps Script:
https://developers.google.com/docs/api/how-tos/overview
https://developers.google.com/apps-script/guides/docs

However, manually sending the notes is also fine.

Re: Arrow sync call January 4 at 12:00 US/Eastern, 17:00 UTC

2023-01-06 Thread Benson Muite



> Proposal to move sync call meeting notes into a Google Doc
> 
> - Will proposed that we share notes from sync calls in a publicly
> viewable Google Doc instead of in emails to the mailing list [2]
> - There was a discussion about whether managing edit access to this
> Google Doc would be difficult and whether we should consider
> alternatives such as GitHub or Confluence, but the consensus seemed to
> be that a Google Doc would be best
> - Further discussion welcome; we tentatively plan to begin using a
> Google Doc in the next meeting
If a Google Doc is used, can it be configured to send out notifications
of the summary to the list?
> 
> 
>

Re: [ANNOUNCE] New Arrow PMC chair: Andrew Lamb

2022-12-26 Thread Benson Muite

Congratulations!
On 12/27/22 05:44, Yibo Cai wrote:

Congratulations!

-----Original Message-----
From: Rok Mihevc 
Sent: Tuesday, December 27, 2022 7:57 AM
To: dev@arrow.apache.org
Subject: Re: [ANNOUNCE] New Arrow PMC chair: Andrew Lamb

Congratulations Andrew!

Rok

On Mon, Dec 26, 2022 at 11:26 PM Neal Richardson <neal.p.richard...@gmail.com> wrote:

Congratulations!

On Mon, Dec 26, 2022 at 4:38 PM Matt Topol wrote:

Congrats!!!

On Mon, Dec 26, 2022, 12:47 PM Jacob Wujciak wrote:

Congratulations Andrew!

On Mon, 26 Dec 2022 at 16:44, Matthew Turner wrote:

Congratulations, Andrew!

From: Yijie Shen
Date: Monday, December 26, 2022 at 8:14 AM
To: dev@arrow.apache.org
Subject: Re: [ANNOUNCE] New Arrow PMC chair: Andrew Lamb
Congrats Andrew!

On Mon, Dec 26, 2022 at 20:37 Wang Xudong wrote:

Congratulations Andrew!

Thank you for your dedication to the Arrow Rust ecosystem!

On Mon, Dec 26, 2022 at 20:13, Willy Kuo wrote:

Congratulations Andrew!

On Dec 26, 2022, at 7:48 PM, Nic Crane wrote:

Congratulations!

On Mon, 26 Dec 2022, 11:01 Daniël Heres <danielhe...@gmail.com> wrote:

Congrats Andrew!

On Mon, Dec 26, 2022, 09:00 L. C. Hsieh wrote:

Congratulations!

On Sun, Dec 25, 2022 at 10:36 PM Weston Pace <weston.p...@gmail.com> wrote:

Congratulations!

On Sun, Dec 25, 2022, 9:44 PM Remzi Yang <1371656737...@gmail.com> wrote:

Congratulations Andrew!

On Mon, 26 Dec 2022 at 13:40, David Li <lidav...@apache.org> wrote:

Congrats Andrew!

On Mon, Dec 26, 2022, at 00:26, vin jake wrote:

Congratulations!

On Mon, Dec 26, 2022 at 12:54, Sutou Kouhei wrote:

I am pleased to announce that we have a new PMC chair and VP as per
our newly started tradition of rotating the chair once a year. I have
resigned and Andrew Lamb was duly elected by the PMC and approved
unanimously by the board. Please join me in congratulating Andrew
Lamb!

Thanks,
--
kou


Re: Current state of using GitHub issues for Arrow

2022-12-08 Thread Benson Muite


This may be a bit late, but perhaps a repository could be created just for issues.

Code repositories would just have discussion on pull requests.

This might also make managing and filtering subscriptions easier.

On 12/8/22 14:36, Antoine Pitrou wrote:


I'm not sure it makes sense to file website issues to a different repo 
than documentation issues.


Regards

Antoine.


On 08/12/2022 at 11:50, Joris Van den Bossche wrote:
On Tue, 6 Dec 2022 at 08:41, Benson Muite wrote:

For sure the exact workflows will still be further refined while starting
to use this. And if there are things missing or unclear in the current
practices around how to handle GitHub issues or any other feedback or
ideas, this thread is yours!

It may also be helpful to update the website bot:
https://github.com/apache/arrow-site


Related to the arrow-site repo: up to now we were also using the Arrow
JIRA to track website issues. Now we move to GitHub issues, it
probably makes more sense to use the issues on the arrow-site repo
itself. So we just enabled that issues can be opened there:
https://github.com/apache/arrow-site/issues

We will still have to update the merge script in the arrow-site repo
as well (or discuss about using the merge button).

Joris

Re: Current state of using GitHub issues for Arrow

2022-12-05 Thread Benson Muite





For sure the exact workflows will still be further refined while starting
to use this. And if there are things missing or unclear in the current
practices around how to handle GitHub issues or any other feedback or
ideas, this thread is yours!

It may also be helpful to update the website bot:
https://github.com/apache/arrow-site

Re: [ANNOUNCE] New Arrow committer: Raúl Cumplido

2022-12-05 Thread Benson Muite


On 12/6/22 05:53, Sutou Kouhei wrote:

On behalf of the Arrow PMC, I'm happy to announce that Raúl
Cumplido has accepted an invitation to become a committer on
Apache Arrow. Welcome, and thank you for your contributions!


Congratulations Raúl

Re: [DISCUSS] Maintenance policy

2022-11-23 Thread Benson Muite


On 10/19/22 20:47, Will Jones wrote:

One particular type of defect we might want to consider backporting to
supported versions are ones that silently produce incorrect data. Unlike
ones that cause a crash, it's not easy for a user to know they are affected.

Here are a few examples:

  * ARROW-17453: [Go][C++][Parquet] Inconsistent Data with Repetition Levels
[1] (fixed in 10.0.0)
  * ARROW-17995: [C++] Fix json decimals not being rescaled based on the
explicit schema [2] (fixed in 10.0.0)
  * ARROW-14523: [C++] Fix potential data loss in S3 multipart upload [3]
(fixed in 7.0.0)

Also, I know we have high release costs for new versions, but is that also
true for backporting fixes? Unlike new releases, if we were creating a
bugfix release, we are presumably starting from a much more stable point,
right?

Thanks,

Will Jones

[1] https://issues.apache.org/jira/browse/ARROW-17453
[2] https://issues.apache.org/jira/browse/ARROW-17995
[3] https://issues.apache.org/jira/browse/ARROW-14523

On Wed, Oct 19, 2022 at 9:32 AM Todd Farmer 
wrote:


Hi,

I've been thinking a lot about maintenance and lifecycle policies and
defect classification recently - I'm very grateful this is being raised. I
believe establishing such policies will prove instrumental to enable
adoption of Arrow for a number of use cases that prioritize stability over
innovation.

On Wed, Oct 19, 2022 at 5:42 AM Antoine Pitrou  wrote:



Hi Kou,

On 19/10/2022 at 06:29, Sutou Kouhei wrote:


My proposal: We maintain the last major release:
* We maintain 9.Y.Z when the latest major release is 9.0.0
* We may release 9.Y.Z when we find a problem such as a
security vulnerability in 9.Y.Z
* We drop support for 9.Y.Z when we release 10.0.0


That sounds ok to me, but is there a more precise criterion than "we
find a problem"?
For most users, backwards compatibility and supported platforms are 
likely more important than the version number.  Frequent breaking API 
changes increase the cost of using Arrow, so the goal should be to make 
continued use of Arrow easy.


In the past, we have from time to time done maintenance releases based
on annoying bugs/regressions. But not always.



I very much agree, and actually think there are multiple questions to
answer here:

1. Which class of defects should be allowed to be merged into a maintenance
branch?
2. Which class of defects must be fixed in a supported maintenance branch?
3. Which class of defects should trigger a maintenance release once a fix
is made to the branch?
4. Which versions should be targeted in backporting a defect fix?  How long
will a release receive maintenance support?
5. Which class of defects can be batched into a future maintenance release,
and which need immediate release?
6. What delivery artifacts are needed for maintenance releases? Can some
things be source-only?

Today, any fix may be a candidate for backporting to a maintenance branch
if there's support for doing so in a vote. I believe it might be useful to
more formally triage defects in part to establish policy answering these
questions. For example:

* How severe is the defect?  Does it produce wrong results? Cause crashes?
Or is it an annoying spelling error in a log message?
* How widespread is the impact? Is everybody who uses Arrow going to be
affected by this? Or is it only triggered by some very obscure use case?
* How accessible is any workaround?
* How much risk is involved in a fix?

Having a common framework to classify those elements above would enable
policy that clearly defines which defects can (or should, eventually) get
what attention.

If there is interest in the community, I'll continue a draft proposal I'm
working on to formalize triage to capture these aspects. Any such triage
process would be entirely optional for work done against master/main, but
could be required for assessing potential backports as needed.

I'll also note that I recognize Arrow may not currently see a need to
answer all the questions about maintenance/lifecycle policy today, or may
not have the resources needed to deliver what may be desired. It takes a
lot of work to generate a release today. I think it's completely
appropriate to commit only to what can be delivered today, with an eye
towards incremental improvement. For example, an entirely acceptable policy
might be:

* Only the most recently-released minor version is eligible for defect
fixes.
* Security vulnerabilities with CVSS 3.0 score >= 7.0 (High) should trigger
a maintenance release.
* Fixes for defects of any nature may be backported if it reaches
established thresholds (TBD) for severity, widespread impact, workaround
accessibility and risk. Such fixes will be incorporated into the release
maintenance release, made available via source, but no release will be
produced unless triggered by a subsequent security vulnerability fix.
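
The example policy above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the Defect fields, the
triage function and the meets_backport_thresholds flag are hypothetical
names, and the backport thresholds themselves are left as a boolean because
the policy marks them as TBD. Only the CVSS threshold of 7.0 comes from the
text.

```python
from dataclasses import dataclass

CVSS_HIGH = 7.0  # threshold named in the example policy


@dataclass
class Defect:
    affected_version: str                    # e.g. "10.0.1"
    is_security: bool
    cvss_score: float = 0.0
    meets_backport_thresholds: bool = False  # thresholds are TBD in the policy


def same_minor(a, b):
    """Compare only the major.minor part of two version strings."""
    return a.split(".")[:2] == b.split(".")[:2]


def triage(defect, latest_release):
    """Return 'release', 'backport' or 'none' under the example policy."""
    if not same_minor(defect.affected_version, latest_release):
        return "none"      # only the most recent minor version is eligible
    if defect.is_security and defect.cvss_score >= CVSS_HIGH:
        return "release"   # high-severity vulnerability triggers a release
    if defect.meets_backport_thresholds:
        return "backport"  # fix lands in source; no release is cut yet
    return "none"


# Hypothetical inputs for illustration only.
urgent = triage(Defect("10.0.1", True, 9.8), "10.0.0")
old = triage(Defect("9.0.0", True, 9.8), "10.0.0")
queued = triage(Defect("10.0.0", False, meets_backport_thresholds=True), "10.0.0")
```

Writing the rules down this way makes the open questions visible: everything
hidden behind meets_backport_thresholds still needs a definition.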

It may be good to disclose known problems on a site associated with the 
release.  Bug tickets are helpful for work i

Re: [ANNOUNCE] New Arrow committer: Curtis Vogt

2022-11-03 Thread Benson Muite


Congratulations
On 11/4/22 01:29, Vibhatha Abeykoon wrote:

Congratulations

On Thu, Nov 3, 2022 at 7:09 PM Rok Mihevc  wrote:


Congratulations!

On Thu, Nov 3, 2022 at 12:31 AM David Li  wrote:


Welcome, Curtis!

On Tue, Nov 1, 2022, at 17:14, Sutou Kouhei wrote:

On behalf of the Arrow PMC, I'm happy to announce that Curtis Vogt
has accepted an invitation to become a committer on Apache
Arrow. Welcome, and thank you for your contributions!

Re: [ANNOUNCE] New Arrow committer: Yang Jiang

2022-11-03 Thread Benson Muite


Congratulations!
On 11/3/22 21:09, Percy Camilo Triveño Aucahuasi wrote:

Congratulations!

On Thu, Nov 3, 2022 at 8:39 AM Rok Mihevc  wrote:


Congrats!

On Thu, Nov 3, 2022 at 2:27 PM Weston Pace  wrote:


Congratulations

On Thu, Nov 3, 2022, 6:25 AM Patrick Horan  wrote:


Congrats Jiang!

On Thu, Nov 3, 2022, at 1:52 AM, Wang Xudong wrote:

Congratulations!

On Thu, Nov 3, 2022 at 11:08, Yijie Shen wrote:


Congratulations Jiang!

On Thu, Nov 3, 2022 at 9:54 AM vin jake wrote:

Congratulations!

On Thu, Nov 3, 2022 at 9:52 AM Remzi Yang <1371656737...@gmail.com> wrote:

Congratulations Yang!

On Thu, 3 Nov 2022 at 09:51, Kun Liu wrote:

Congrats !!!

On Thu, Nov 3, 2022 at 07:31, David Li wrote:

Congrats Yang!

On Wed, Nov 2, 2022, at 17:09, Andy Grove wrote:

On behalf of the Arrow PMC, I'm happy to announce that Yang Jiang
has accepted an invitation to become a committer on Apache
Arrow. Welcome, and thank you for your contributions!

Re: Using Arrow on RHEL/CentOS/Rocky and related linux distros

2022-11-02 Thread Benson Muite


On 11/2/22 10:32, Sutou Kouhei wrote:

Hi,


As an example Arrow is packaged in Fedora/EPEL. The spec file does not
bundle Abseil, thrift, gRPC,
https://src.fedoraproject.org/rpms/libarrow/blob/rawhide/f/libarrow.spec


Because Fedora ships recent Abseil, Thrift and gRPC. It
doesn't use software collections, right?

It seems that software collections don't provide them:

https://www.softwarecollections.org/en/scls/?search=abseil
https://www.softwarecollections.org/en/scls/?search=grpc

(Note that our RPMs already use system Thrift package.)

As a starting point for redistribution, I will create a build on Copr:
https://www.softwarecollections.org/en/docs/add-to-catalogue/


So it seems that software collections doesn't help us...


Thanks,

Re: Using Arrow on RHEL/CentOS/Rocky and related linux distros

2022-10-31 Thread Benson Muite


On 10/31/22 00:14, Sutou Kouhei wrote:

Hi,

Thanks for the suggestion. But what do we need to do for it?

For example, our RPMs for AlmaLinux 9 bundle the following
libraries:

https://github.com/ursacomputing/crossbow/actions/runs/3354778483/jobs/5558561346#step:6:463

* Protocol Buffers
* jemalloc
* mimalloc
* gRPC
* Abseil
* Google Cloud C++
* CRC32C
* ORC
As an example, Arrow is packaged in Fedora/EPEL. The spec file does not 
bundle Abseil, Thrift or gRPC:

https://src.fedoraproject.org/rpms/libarrow/blob/rawhide/f/libarrow.spec
jemalloc is available in this ecosystem, but is not enabled in this build.

Using a software collection would make build deployment and 
customization much easier.  In particular, some of these packages follow 
a live-at-head philosophy and others do not, so what is packaged by the 
distribution may not have the features used in Arrow.  A software 
collection would enable easy packaging of a tested combination.  It would 
also make it easy to modify selected components, improving developer 
productivity, since it would set up a development environment that 
closely matches the production environment.

Using Arrow on RHEL/CentOS/Rocky and related linux distros

2022-10-30 Thread Benson Muite

Arrow releases are distributed as RPM packages for these
distributions. However, many dependencies are bundled with the released
RPMs, which may make using them in other software problematic. Software
collections[1] are similar to Python virtual envs for RPM-based
distributions. They would enable unbundling dependencies and make it
easier to build on top of Arrow releases for RHEL and related
distributions. As an example, Milvus uses Arrow, but the Arrow release
it uses often lags behind the latest Arrow release[2]. For those who
use or build on top of RPM-based distributions, would you consider
using or building on top of an Arrow software collection?


1) https://www.softwarecollections.org/en/
2) 
https://github.com/milvus-io/milvus/blob/master/internal/core/thirdparty/arrow/CMakeLists.txt

Re: [VOTE][Julia] Release Apache Arrow Julia 2.4.0 RC1

2022-10-26 Thread Benson Muite


+1 (non-binding), Rocky Linux 9
$ dev/release/verify_rc.sh 2.4.0 1
and also CentOS 7 (signature verification fails due to older libraries)
$ VERIFY_SIGN=0 dev/release/verify_rc.sh 2.4.0 1

Re: [ANNOUNCE] New Arrow PMC member: Nicola Crane

2022-10-25 Thread Benson Muite


Congratulations Nic!
On 10/26/22 04:11, Vibhatha Abeykoon wrote:

Congrats Nic!

On Wed, Oct 26, 2022 at 5:30 AM Ashish  wrote:


Congrats !

On Wednesday, October 26, 2022, Anja  wrote:


Congrats!!

On Tue, 25 Oct 2022 at 15:45, Rok Mihevc  wrote:


Congrats Nic!

Rok

On Tue, Oct 25, 2022 at 11:16 PM Will Jones 
wrote:


Congrats Nic!

On Tue, Oct 25, 2022 at 2:14 PM David Li 

wrote:



Congrats & welcome Nic!

On Tue, Oct 25, 2022, at 17:07, Matt Topol wrote:

Congrats!!

On Tue, Oct 25, 2022 at 5:06 PM Sutou Kouhei 


wrote:



The Project Management Committee (PMC) for Apache Arrow has invited
Nicola Crane to become a PMC member and we are pleased to announce
that Nicola Crane has accepted.

Congratulations and welcome!












--
thanks
ashish

Re: [ANNOUNCE] New Arrow committer: Bogumił Kamiński

2022-10-25 Thread Benson Muite


Congratulations Bogumił!
On 10/26/22 04:13, Vibhatha Abeykoon wrote:

Congrats Bogumił!

On Wed, Oct 26, 2022 at 4:24 AM Rok Mihevc  wrote:


Congrats Bogumił!

Rok

On Tue, Oct 25, 2022 at 11:15 PM David Li  wrote:


Welcome Bogumił!

On Tue, Oct 25, 2022, at 17:05, Sutou Kouhei wrote:

Hi,

On behalf of the Arrow PMC, I'm happy to announce that Bogumił Kamiński
has accepted an invitation to become a committer on Apache
Arrow. Welcome, and thank you for your contributions!

Thanks,
--
kou

Re: [ANNOUNCE] New Arrow PMC member: Jacob Quinn

2022-10-25 Thread Benson Muite


Congratulations Jacob!
On 10/26/22 04:12, Vibhatha Abeykoon wrote:

Congratulations Jacob!

On Wed, Oct 26, 2022 at 4:23 AM Rok Mihevc  wrote:


Congratulations Jacob!

Rok

On Tue, Oct 25, 2022 at 11:15 PM David Li  wrote:


Congrats Jacob!!

On Tue, Oct 25, 2022, at 17:06, Sutou Kouhei wrote:

The Project Management Committee (PMC) for Apache Arrow has invited
Jacob Quinn to become a PMC member and we are pleased to announce
that Jacob Quinn has accepted.

Congratulations and welcome!

Re: [DISCUSS] Move issue tracking to

2022-10-23 Thread Benson Muite

It is unclear why the infrastructure team cannot allow a variety of
authentication mechanisms - Gitee for example enables SMS authentication
and validation by any validated Gitee user to obtain basic
functionality. My expectation is that any committer or validated
contributor (not just PMC) can validate potential Jira users. This
would lessen the burden on PMC members.

Making it easy to report issues is great. In addition to the user and
developer mailing lists, GitHub issues have been enabled for this
purpose. A little friction may also mean more thought-out issues that
should and will be worked on.

Some people may find GitHub inconvenient[1,2,3]. With GitHub one cannot
easily modify either the forge or the issue tracking system. Other issue
tracking systems, for example Redmine or Bugzilla, should perhaps be
considered, especially if better validation mechanisms cannot be used
with Jira. This does place more burden on the Apache Software
Foundation, but if project contributors also want to improve issue
tracking, it gives an opportunity to do so.

Finally, can the data on GitHub issues be mirrored somewhere else? If
so where and with what metadata?

Benson

1) https://man.sr.ht/~etalab/logiciels-libres/why-sourcehut.md
2) https://www.gnu.org/software/repo-criteria-evaluation.html
3) https://postmarketos.org/blog/2022/07/25/considering-sourcehut/

On 10/23/22 15:01, Andrew Lamb wrote:

It is my opinion that github issues have served us very well in the Arrow
Rust implementation. I haven't heard anyone complain about github or
suggest we should look for alternatives.

One of the major benefits is that everyone who contributes to the project
already has to have a github account, so using github issues needs no other
setup and the UI is very familiar (e.g. markdown syntax, etc).

We also use github issues in my job and it works well in that context as
well.

After using Jira (as I had for 10+ years), it certainly takes time getting
used to a new interface but I found it is "good enough" for all project
planning and coordination needs I have had.

In terms of migration, when we switched from Jira to GitHub issues I wrote
a simple Python script that copied content from Jira to GitHub and left
bidirectional pointers -- it wasn't perfect but seems to have gotten the
job done.
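
A migration of this kind could be sketched roughly as below. This is not the script mentioned above; the shape of the exported issue dict, the label prefixes, and the helper names `to_github_issue` and `jira_back_pointer` are assumptions made for illustration, and no Jira or GitHub API is called. The sketch only shows how Jira fields might map onto labels and how the two pointers could be worded.

```python
# Hypothetical sketch: turn one exported Jira issue (a plain dict) into the
# title, body and labels of a GitHub issue, with a pointer back to Jira.

# Assumed URL prefix for viewing a single issue on the ASF Jira instance.
JIRA_BROWSE_URL = "https://issues.apache.org/jira/browse/"


def to_github_issue(jira_issue):
    """Return the title, body and labels for the migrated GitHub issue."""
    key = jira_issue["key"]
    # Jira categorization is flattened into prefixed labels (an assumption).
    labels = ["Type: " + jira_issue["type"], "Priority: " + jira_issue["priority"]]
    labels += ["Component: " + name for name in jira_issue.get("components", [])]
    # Forward pointer: the GitHub issue body links to the original Jira issue.
    body = jira_issue.get("description", "") + "\n\nMigrated from " + JIRA_BROWSE_URL + key
    return {"title": "[" + key + "] " + jira_issue["summary"], "body": body, "labels": labels}


def jira_back_pointer(github_issue_url):
    """Comment text for the Jira side, completing the bidirectional link."""
    return "This issue has been migrated to " + github_issue_url
```

Prefixed labels keep the categorization searchable, at the cost of the large label count mentioned elsewhere in this thread.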

Andrew

On Sat, Oct 22, 2022 at 2:47 PM Todd Farmer
wrote:

I can't claim experience migrating to GitHub issues, but I've done some
work with Jira APIs and import/export tools. I'm happy to help draft a
proposal or proof-of-concept to validate if nobody else expresses interest,
or to assist anybody who does.

Todd

On Sat, Oct 22, 2022 at 12:08 PM Neal Richardson <
neal.p.richard...@gmail.com> wrote:

I would guess that mostly could be handled with labels, though that does
turn into lots of labels. GitHub also seems to have grown some useful
features for this, like Projects [1] and Milestones, and we should look
into how to make them work for us.

While I agree that those features in the current Jira instance are nice and
we should seek to preserve them, I fear that if we chose to adopt a
different issue tracker other than GitHub, we wouldn't solve the
barrier-to-entry problem--we'd only be moving it.

Regarding migration, I'm sure there are export and import APIs, the
question is how much effort/code we'd have to put in to make it happen.
Does anyone here have experience doing this?

Neal

[1]:

https://docs.github.com/en/issues/planning-and-tracking-with-projects/learning-about-projects/about-projects

On Sat, Oct 22, 2022 at 10:19 AM Antoine Pitrou
wrote:

Hi Neal,

Le 22/10/2022 à 15:35, Neal Richardson a écrit :

Their email says:

Infra knows this process change places an increasing burden on PMC members
for managing contributors, and makes it harder for people to contribute
bug reports.

We suggest projects consider using GitHub Issues for customer-facing
questions/bug reports/etc., while maintaining development issues on Jira.

but I think that having a two-tiered system for issue tracking presents
some notable downsides for us, including:

* Increased barriers to entry for new contributors and a sense of
inequality between "us" and "them". There's already too much friction IMO,
and this pushes that up significantly.
* Maintenance burden of triaging and synchronizing issues across trackers
sounds like a lot for us to take on. I'd prefer the active maintainers of
the project spend their time shipping useful, reliable software, not doing
bookkeeping.

I fully agree with your concerns. So I'm +1 on migrating to *something
else*.

The one thing I would not want to lose, though, is the categorization
facilities we currently have in Jira. Namely: Component, Affects
version, Fix version, Type (bug/improvement/task...), Issue links
(superseded by/relates to/is caused by...), Priority (at least
Minor/Major/Blocker).

How much of that can be recreated in GitHub Issues, or any other
alternative?

A seco

Re: [VOTE] Release Apache Arrow 10.0.0 - RC0

2022-10-23 Thread Benson Muite

rpc_message":"Can't 
prepare statement: near "(": syntax error","grpc_status":3}. Client 
context: IOError: Server never sent a data message. Detail: Internal

[  FAILED  ] TestFlightSqlServer.TestCommandGetImportedKeys (4 ms)

/root/apache-arrow-10.0.0/cpp/src/arrow/flight/sql/server_test.cc:674: 
Failure

Failed
'_error_or_value110.status()' failed with Invalid: Can't prepare 
statement: near "(": syntax error. gRPC client debug context: 
{"created":"@1666547640.657013835","description":"Error received from 
peer 
ipv6:[::1]:36523","file":"/tmp/arrow-HEAD.bOtfP/cpp-build/grpc_ep-prefix/src/grpc_ep/src/core/lib/surface/call.cc","file_line":952,"grpc_message":"Can't 
prepare statement: near "(": syntax error","grpc_status":3}. Client 
context: IOError: Server never sent a data message. Detail: Internal

[  FAILED  ] TestFlightSqlServer.TestCommandGetExportedKeys (5 ms)

/root/apache-arrow-10.0.0/cpp/src/arrow/flight/sql/server_test.cc:708: 
Failure

Failed
'_error_or_value113.status()' failed with Invalid: Can't prepare 
statement: near "(": syntax error. gRPC client debug context: 
{"created":"@1666547640.662347626","description":"Error received from 
peer 
ipv6:[::1]:43394","file":"/tmp/arrow-HEAD.bOtfP/cpp-build/grpc_ep-prefix/src/grpc_ep/src/core/lib/surface/call.cc","file_line":952,"grpc_message":"Can't 
prepare statement: near "(": syntax error","grpc_status":3}. Client 
context: IOError: Server never sent a data message. Detail: Internal

[  FAILED  ] TestFlightSqlServer.TestCommandGetCrossReference (3 ms)

On 10/23/22 10:31, Benson Muite wrote:
WIP but source verification fails for me on CentOS 7 due to unsigned key 
from Neville Dipale:


TEST_DEFAULT=0 TEST_SOURCE=1 dev/release/verify-release-candidate.sh 10.0.0 0


gpg: key 717D3FB2: no valid user IDs
gpg: this may be caused by a missing self-signature
...
gpg: Total number processed: 14
gpg:   w/o user IDs: 1
gpg:  unchanged: 13
Failed to verify release candidate. See /tmp/arrow-10.0.0.gOoKw for 
details.


On 10/22/22 22:32, David Li wrote:

Still WIP for me. Verified:
- C++, Python, Java, binaries on Ubuntu Linux 18.04/AMD64
- C++, Python, Java on MacOS 12.3/AArch64

* MacOS required Rosetta installed to generate Protobuf sources for Java
* I needed https://github.com/apache/arrow/pull/14477 to verify APT 
packages on Linux
* I needed https://github.com/apache/arrow/pull/14479 to verify native 
wheels on MacOS


I cannot verify universal2 wheels on MacOS as the binaries are for 
macosx_10_14 but the script hardcodes macosx_11_0. And if I edit the 
filename in the script, I get "...macosx_10_14_universal2.whl is not a 
supported wheel on this platform". Is this intended?


On Fri, Oct 21, 2022, at 14:01, Jacob Wujciak wrote:

+1 (non-binding) verified on Manjaro with CUDA:

TEST_DEFAULT=0 \
   TEST_SOURCE=0 \
   TEST_INTEGRATION_CPP=1 \
   TEST_CPP=1 \
   TEST_PYTHON=1 \
   dev/release/verify-release-candidate.sh 10.0.0 0

TEST_DEFAULT=0 \
   TEST_SOURCE=0 \
   TEST_BINARY=1 \
   dev/release/verify-release-candidate.sh 10.0.0 0

with:
   gcc 12.2.2
   cuda_11.7.r11.7/compiler.31442593_0
   python 3.10.7

Thanks!

On Fri, Oct 21, 2022 at 8:07 AM Sutou Kouhei  wrote:


Hi,

I would like to propose the following release candidate (RC0) of Apache
Arrow version 10.0.0. This is a release consisting of 470
resolved JIRA issues[1].

This release candidate is based on commit:
89f9a0948961f6e94f1ef5e4f310b707d22a3c11 [2]

The source release rc0 is hosted at [3].
The binary artifacts are hosted at [4][5][6][7][8][9][10][11].
The changelog is located at [12].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. See [13] for how to validate a release 
candidate.


See also a verification result on GitHub pull request [14].

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow 10.0.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow 10.0.0 because...

[1]:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%2010.0.0
[2]:
https://github.com/apache/arrow/tree/89f9a0948961f6e94f1ef5e4f310b707d22a3c11
[3]: 
https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-10.0.0-rc0

[4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
[5]: https://apache.jfrog.io/artifactory/arrow/amazon-linux-rc/
[6]: https://apache.jfrog.io/artifactory/arrow/centos-rc/
[7]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
[8]: https://apache.jfrog.io/artifactory/arrow/java-rc/10.0.0-rc0
[9]: https://apache.jfrog.io/artifactory/arrow/nuget-rc/10.0.0-rc0
[10]: https://apache.jfrog.io/artifactory/arrow/python-rc/10.0.0-rc0
[11]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
[12]:
https://github.com/apache/arrow/blob/89f9a0948961f6e94f1ef5e4f310b707d22a3c11/CHANGELOG.md
[13]:
https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
[14]: https://github.com/apache/arrow/pull/14466
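
The "verify checksums" step requested above can be illustrated with a minimal sketch. It is not what dev/release/verify-release-candidate.sh does internally (that script also checks GPG signatures and runs builds and tests). The helper names are hypothetical, and the checksum line is assumed to use the `<hex digest>  <file name>` layout written by tools such as sha512sum.

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, checksum_line):
    """Compare a file against one line of a checksum file.

    checksum_line is assumed to look like '<hex digest>  <file name>'.
    """
    expected = checksum_line.split()[0].lower()
    return sha512_of(path) == expected
```

A matching checksum only shows that the download is intact; the signature check against the release manager's key is what ties the artifact to its author.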

Re: [VOTE] Release Apache Arrow 10.0.0 - RC0

2022-10-23 Thread Benson Muite

WIP but source verification fails for me on CentOS 7 due to unsigned key 
from Neville Dipale:


TEST_DEFAULT=0 TEST_SOURCE=1 dev/release/verify-release-candidate.sh 10.0.0 0


gpg: key 717D3FB2: no valid user IDs
gpg: this may be caused by a missing self-signature
...
gpg: Total number processed: 14
gpg:   w/o user IDs: 1
gpg:  unchanged: 13
Failed to verify release candidate. See /tmp/arrow-10.0.0.gOoKw for details.

On 10/22/22 22:32, David Li wrote:

Still WIP for me. Verified:
- C++, Python, Java, binaries on Ubuntu Linux 18.04/AMD64
- C++, Python, Java on MacOS 12.3/AArch64

* MacOS required Rosetta installed to generate Protobuf sources for Java
* I needed https://github.com/apache/arrow/pull/14477 to verify APT packages on 
Linux
* I needed https://github.com/apache/arrow/pull/14479 to verify native wheels 
on MacOS

I cannot verify universal2 wheels on MacOS as the binaries are for macosx_10_14 but the 
script hardcodes macosx_11_0. And if I edit the filename in the script, I get 
"...macosx_10_14_universal2.whl is not a supported wheel on this platform". Is 
this intended?

On Fri, Oct 21, 2022, at 14:01, Jacob Wujciak wrote:

+1 (non-binding) verified on Manjaro with CUDA:

TEST_DEFAULT=0 \
   TEST_SOURCE=0 \
   TEST_INTEGRATION_CPP=1 \
   TEST_CPP=1 \
   TEST_PYTHON=1 \
   dev/release/verify-release-candidate.sh 10.0.0 0

TEST_DEFAULT=0 \
   TEST_SOURCE=0 \
   TEST_BINARY=1 \
   dev/release/verify-release-candidate.sh 10.0.0 0

with:
   gcc 12.2.2
   cuda_11.7.r11.7/compiler.31442593_0
   python 3.10.7

Thanks!

On Fri, Oct 21, 2022 at 8:07 AM Sutou Kouhei  wrote:


Hi,

I would like to propose the following release candidate (RC0) of Apache
Arrow version 10.0.0. This is a release consisting of 470
resolved JIRA issues[1].

This release candidate is based on commit:
89f9a0948961f6e94f1ef5e4f310b707d22a3c11 [2]

The source release rc0 is hosted at [3].
The binary artifacts are hosted at [4][5][6][7][8][9][10][11].
The changelog is located at [12].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. See [13] for how to validate a release candidate.

See also a verification result on GitHub pull request [14].

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow 10.0.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow 10.0.0 because...

[1]:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%2010.0.0
[2]:
https://github.com/apache/arrow/tree/89f9a0948961f6e94f1ef5e4f310b707d22a3c11
[3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-10.0.0-rc0
[4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
[5]: https://apache.jfrog.io/artifactory/arrow/amazon-linux-rc/
[6]: https://apache.jfrog.io/artifactory/arrow/centos-rc/
[7]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
[8]: https://apache.jfrog.io/artifactory/arrow/java-rc/10.0.0-rc0
[9]: https://apache.jfrog.io/artifactory/arrow/nuget-rc/10.0.0-rc0
[10]: https://apache.jfrog.io/artifactory/arrow/python-rc/10.0.0-rc0
[11]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
[12]:
https://github.com/apache/arrow/blob/89f9a0948961f6e94f1ef5e4f310b707d22a3c11/CHANGELOG.md
[13]:
https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
[14]: https://github.com/apache/arrow/pull/14466

Re: [VOTE][RUST] Release Apache Arrow Rust 22.0.0 RC1

2022-09-03 Thread Benson Muite


This is probably now binding. Congratulations.

On 9/3/22 03:08, L. C. Hsieh wrote:

+1 (non-binding)

Verified on Intel Mac.

Thanks, Andrew.

On Fri, Sep 2, 2022 at 3:45 PM Andy Grove  wrote:


+1 (binding)

Verified on Ubuntu 20.04.4 LTS

Thanks, Andrew.

On Fri, Sep 2, 2022 at 12:25 PM Ian Joiner  wrote:


+1 (Non-binding)

Tested on macOS 12.2.1 / Apple M1

On Fri, Sep 2, 2022 at 2:11 PM Andrew Lamb  wrote:


Hi,

I would like to propose a release of Apache Arrow Rust Implementation,
version 22.0.0.

This release candidate is based on commit:
e5b9d05ec50807666efe401729708d53216d79fc [1]

The proposed release tarball and signatures are hosted at [2].

The changelog is located at [3].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. There is a script [4] that automates some of
the verification.

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow Rust
[ ] +0
[ ] -1 Do not release this as Apache Arrow Rust  because...

[1]:
https://github.com/apache/arrow-rs/tree/e5b9d05ec50807666efe401729708d53216d79fc
[2]:
https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-22.0.0-rc1
[3]:
https://github.com/apache/arrow-rs/blob/e5b9d05ec50807666efe401729708d53216d79fc/CHANGELOG.md
[4]:
https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh

Re: Proposal: Unassign idle issues

2022-07-08 Thread Benson Muite

On 7/8/22 18:49, Todd Farmer wrote:

Hello,

The backlog of ARROW issues currently stands at 2585 open issues [1]. The
size of the backlog presents challenges to users and developers alike, and
I believe the project would benefit from establishing guidance around issue
handling. I'll be submitting a series of proposals for discussion,
including this one focusing specifically on assigned issues that have gone
dormant.

It's my belief that issue assignment implies the assignee is actively
working on a task, or intends to start working on it in short order. In
cases where well-intentioned developers are assigned tasks, but no progress
is reported for extended periods of time, I propose that we unassign the
issue and add a comment explaining why the action was taken. I believe any
assigned issue that has been idle for 90 days or more is a good threshold
to use, and that currently maps to 370 such issues [2], or 14% of the
current backlog. (Note: 153 of those have not been updated in the past
year.)

I would also like to expand the existing "Report bugs and propose features"
documentation [3] to cover issue handling in general, including the ask
that assigned issues be actively worked within the 90 day period or be
unassigned.

In summary, here are the actions I propose:

1. Establish a threshold at which assigned, idle issues should be
unassigned and comment added.
2. Define that threshold to be 90 days.
3. Document the above as a project policy for issue handling (PR against
docs)
4. Automate 1 and 2 above.

Thoughts on this?

[1]
https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20resolution%20%3D%20Unresolved
[2]
https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20resolution%20%3D%20Unresolved%20%20AND%20assignee%20IS%20NOT%20EMPTY%20AND%20updated%20%3C%20-90d
[3] https://arrow.apache.org/docs/developers/bug_reports.html

Best regards,

Todd Farmer

This seems reasonable. One might change the policy according to the
importance of an issue, with low priority issues given more time.
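
The 90-day rule from the proposal, combined with the priority-dependent variation suggested here, might look like the sketch below. Only the 90-day default and the 370 and 2585 issue counts come from the thread; the 180-day window for low-priority issues, the priority names, and the issue dict layout are assumptions for illustration.

```python
from datetime import datetime, timedelta

DEFAULT_IDLE_DAYS = 90  # threshold from the proposal
# Hypothetical longer windows for low-priority issues; the values are made up.
IDLE_DAYS_BY_PRIORITY = {"Minor": 180, "Trivial": 180}


def should_unassign(issue, now):
    """True if an assigned issue has been idle for its threshold or longer."""
    if issue["assignee"] is None:
        return False
    limit = IDLE_DAYS_BY_PRIORITY.get(issue["priority"], DEFAULT_IDLE_DAYS)
    return now - issue["updated"] >= timedelta(days=limit)


def idle_share(idle_count, backlog_count):
    """Idle issues as a percentage of the backlog."""
    return 100.0 * idle_count / backlog_count


# With the figures quoted in the proposal, 370 idle issues out of 2585 open
# issues is a little over 14% of the backlog.
share = idle_share(370, 2585)
```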

Re: [ANNOUNCE] New Arrow committers: Dewey Dunnington, Alenka Frim, and Rok Mihevc

2022-06-22 Thread Benson Muite


Well deserved! Congratulations!
On 6/22/22 21:02, Andrew Lamb wrote:

Congratulations!

On Wed, Jun 22, 2022 at 1:27 PM Dragoș Moldovan-Grünfeld <
dragos.m...@gmail.com> wrote:


Congratulations!

Sent from my iPhone


On 22 Jun 2022, at 18:13, Neal Richardson wrote:


On behalf of the Arrow PMC, I'm happy to announce that

Dewey Dunnington
Alenka Frim
Rok Mihevc

have all accepted invitations to become committers on Apache Arrow!
Welcome, thank you for all your contributions so far, and we look forward
to continuing to drive Apache Arrow forward to an even better place in the
future.

Neal

Re: Arrow sync call April 27 at 12:00 US/Eastern, 16:00 UTC

2022-04-27 Thread Benson Muite


Attendees:
Ian Joiner
Matthew Topol
Benson Muite

Discussion points:
1) New book on Arrow - covers C++, Python and Go, out in June
2) Building ORC bindings in R would be useful, extensions to parallel R?
3) Comparing ORC and Parquet for IO
4) IO optimization vs SIMD optimization - Parquet seems well optimized,
so SIMD would be more helpful, but SIMD requires some care from Go.
5) Substrait would be great to use from Go if it is developed more fully.
6) A developer guide to developing Arrow on the cloud may be useful

Re: Arrow sync call April 13 at 12:00 US/Eastern, 16:00 UTC

2022-04-27 Thread Benson Muite


On 4/25/22 2:49 PM, David Li wrote:

Following up here:


N.B. The Voltron Data folks have a scheduling conflict on 4/27 and will not be 
able to host the fortnightly sync call. Is anyone available to run the meeting 
that day?


Is anyone available to run the sync call this Wednesday?

On Wed, Apr 13, 2022, at 12:59, David Li wrote:

Attendees:

- David Li
- Eduardo Ponce
- Gavin Ray
- Ian Cook
- James Duong
- Matthew Topol
- Nic
- Niranda
- Raul Cumplido
- Rok
- Weston Pace
- Will Jones

N.B. The Voltron Data folks have a scheduling conflict on 4/27 and will
not be able to host the fortnightly sync call. Is anyone available to
run the meeting that day?

Agenda:

8.0.0 Release: targeting 4/21, please try to get PRs wrapped up in the
next ~1-2 weeks. See the ML post [1] for details, including a wiki page
listing outstanding issues. In particular, there are some Go PRs that
could use attention from an interested Go developer [2], as well as
some temporal kernel PRs that could use a review [3].

Arrow C++ Compute Engine: Weston gave a status update;
APIs/documentation has been improved for users, though likely most will
use it through an API like Substrait; basic Substrait support has been
added with forthcoming improvements; more tooling to measure
performance is being worked on; general kernel execution overhead is
being addressed with an eye towards running smaller batches through the
engine. An asof join implementation is being worked on, and Go is
working towards Substrait bindings to be able to bind to the C++ engine.

Kernel vectorization/SIMD: Eduardo has been looking at making some of
the primitive kernels (e.g. arithmetic) more easily autovectorized by
the compiler, testing a variety of approaches. See related discussion
[4]. We do not have benchmarks to evaluate compiler performance in this
regard generally, but we have manually inspected some compiler output
and found that not all compilers manage to do this with the current
kernel implementations. We also don't have a holistic way to evaluate
this going forward, nor do we have a sense for current benchmark
coverage, though possibly we could generate benchmarks. However, it was
pointed out that general engine performance is likely more important,
and that current profiling indicates kernels are not yet a bottleneck,
though there may be low-hanging fruit here.

Flight/Flight SQL: we discussed the barriers to Flight SQL support in
Go; Flight SQL heavily uses union types which are not yet implemented.
A further proposal [5] has been submitted to extend the type metadata,
please take a look for those interested. The GetXdbcTypeInfo proposal
was merged, and the inline data proposal is still outstanding (but
probably ready to have a vote).

IPC/Format: it was asked if there's an IPC structure for serializing a
single array to reduce overhead. Current APIs likely suffice but
Niranda may submit a separate discussion to explain further.

[1]: https://lists.apache.org/thread/zk8hhynvy0bqvqpxk0868n5g0nmzbzbn
[2]: https://github.com/apache/arrow/pull/12158
[3]: https://github.com/apache/arrow/pull/12657
[4]: https://lists.apache.org/thread/8o7k4dt23chx3gn13rwkms38syyms489
[5]: https://lists.apache.org/thread/thvn89wg29gyctwycx2zjr4vvm2g80o6

On Tue, Apr 12, 2022, at 16:17, Ian Cook wrote:

Hi all,

Our biweekly sync call is tomorrow at 12:00 noon Eastern time.

The Zoom meeting URL for this and other biweekly Arrow sync calls is:
https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09

Alternatively, enter this information into the Zoom website or app to
join the call:
Meeting ID: 876 4903 3008
Passcode: 958092

Thanks,
Ian

Re: Arrow sync call April 27 at 12:00 US/Eastern, 16:00 UTC

2022-04-27 Thread Benson Muite


Hi,

Can host if required, though the timing is not ideal for me. It may be 
helpful to vary the timing in future.


Benson

On 4/25/22 2:49 PM, David Li wrote:

Following up here:


N.B. The Voltron Data folks have a scheduling conflict on 4/27 and will not be 
able to host the fortnightly sync call. Is anyone available to run the meeting 
that day?


Is anyone available to run the sync call this Wednesday?

On Wed, Apr 13, 2022, at 12:59, David Li wrote:

Attendees:

- David Li
- Eduardo Ponce
- Gavin Ray
- Ian Cook
- James Duong
- Matthew Topol
- Nic
- Niranda
- Raul Cumplido
- Rok
- Weston Pace
- Will Jones

N.B. The Voltron Data folks have a scheduling conflict on 4/27 and will
not be able to host the fortnightly sync call. Is anyone available to
run the meeting that day?

Agenda:

8.0.0 Release: targeting 4/21, please try to get PRs wrapped up in the
next ~1-2 weeks. See the ML post [1] for details, including a wiki page
listing outstanding issues. In particular, there are some Go PRs that
could use attention from an interested Go developer [2], as well as
some temporal kernel PRs that could use a review [3].

Arrow C++ Compute Engine: Weston gave a status update;
APIs/documentation has been improved for users, though likely most will
use it through an API like Substrait; basic Substrait support has been
added with forthcoming improvements; more tooling to measure
performance is being worked on; general kernel execution overhead is
being addressed with an eye towards running smaller batches through the
engine. An asof join implementation is being worked on, and Go is
working towards Substrait bindings to be able to bind to the C++ engine.

Kernel vectorization/SIMD: Eduardo has been looking at making some of
the primitive kernels (e.g. arithmetic) more easily autovectorized by
the compiler, testing a variety of approaches. See related discussion
[4]. We do not have benchmarks to evaluate compiler performance in this
regard generally, but we have manually inspected some compiler output
and found that not all compilers manage to do this with the current
kernel implementations. We also don't have a holistic way to evaluate
this going forward, nor do we have a sense for current benchmark
coverage, though possibly we could generate benchmarks. However, it was
pointed out that general engine performance is likely more important,
and that current profiling indicates kernels are not yet a bottleneck,
though there may be low-hanging fruit here.

Flight/Flight SQL: we discussed the barriers to Flight SQL support in
Go; Flight SQL heavily uses union types which are not yet implemented.
A further proposal [5] has been submitted to extend the type metadata,
please take a look for those interested. The GetXdbcTypeInfo proposal
was merged, and the inline data proposal is still outstanding (but
probably ready to have a vote).

IPC/Format: it was asked if there's an IPC structure for serializing a
single array to reduce overhead. Current APIs likely suffice but
Niranda may submit a separate discussion to explain further.

[1]: https://lists.apache.org/thread/zk8hhynvy0bqvqpxk0868n5g0nmzbzbn
[2]: https://github.com/apache/arrow/pull/12158
[3]: https://github.com/apache/arrow/pull/12657
[4]: https://lists.apache.org/thread/8o7k4dt23chx3gn13rwkms38syyms489
[5]: https://lists.apache.org/thread/thvn89wg29gyctwycx2zjr4vvm2g80o6

On Tue, Apr 12, 2022, at 16:17, Ian Cook wrote:

Hi all,

Our biweekly sync call is tomorrow at 12:00 noon Eastern time.

The Zoom meeting URL for this and other biweekly Arrow sync calls is:
https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09

Alternatively, enter this information into the Zoom website or app to
join the call:
Meeting ID: 876 4903 3008
Passcode: 958092

Thanks,
Ian

Re: Perf/Benchmark for temporal operations

2022-04-17 Thread Benson Muite


On 4/13/22 7:58 PM, Rok Mihevc wrote:

Thanks for describing the use case Li!


The examples we ran are on UTC timestamp without any timezone
complications, perhaps there is room for short circuits when there are no
timezone complications...


I think using UTC zoned timestamp array might currently behave as a
regular timezoned timestamp array and use the zoned path.
However, setting timezone="" should use a non-zoned computation path.
See here: 
https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/temporal_internal.h#L233

Rok



For many of the kernels, comparing achieved throughput with memory
bandwidth, for example as measured using Likwid[1], would be a good test
of the implementation's performance. However, reaching memory bandwidth
typically requires SIMD, and many initial implementations do not use SIMD
operations, which at the moment are mostly done through the XSIMD
library[2]. Maybe this is something to add to the developer
documentation? There has been a related discussion on the list about
xsimd adoption in the Arrow codebase.


[1] https://github.com/RRZE-HPC/likwid/wiki/Likwid-Bench
[2] https://github.com/xtensor-stack/xsimd
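
The comparison suggested above comes down to simple arithmetic: bytes moved divided by elapsed time, set against a measured peak. The sketch below is only an illustration in plain Python. The helper names are hypothetical, the copy loop is a stand-in for a real kernel, and the peak figure would have to come from a tool such as Likwid on the machine in question; interpreter overhead means the numbers it produces say nothing about Arrow itself.

```python
import time
from array import array


def effective_bandwidth_gb_s(bytes_moved, seconds):
    """Achieved bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return bytes_moved / seconds / 1e9


def fraction_of_peak(effective_gb_s, peak_gb_s):
    """How close a kernel gets to the measured peak memory bandwidth."""
    return effective_gb_s / peak_gb_s


def time_copy(n_values):
    """Time one copy of n_values 64-bit integers.

    Returns (bytes_moved, seconds); every element is read once and
    written once, so both arrays count towards the bytes moved.
    """
    src = array("q", range(n_values))
    start = time.perf_counter()
    dst = array("q", src)
    elapsed = time.perf_counter() - start
    return (len(src) + len(dst)) * src.itemsize, elapsed
```

A kernel that reaches only a small fraction of peak on large, cache-exceeding inputs is a candidate for vectorization; one already near peak is memory bound and unlikely to gain much from SIMD.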

Re: [ANNOUNCE] New Arrow committers: Raphael Taylor-Davies, Wang Xudong, Yijie Shen, and Kun Liu

2022-03-09 Thread Benson Muite


Congratulations!

On 3/9/22 9:56 PM, David Li wrote:

Congrats everyone!

On Wed, Mar 9, 2022, at 13:47, Rok Mihevc wrote:

Congrats all!

Rok

On Wed, Mar 9, 2022 at 7:16 PM QP Hou  wrote:


Congratulations to all, well deserved!

On Wed, Mar 9, 2022 at 9:37 AM Daniël Heres  wrote:


Congratulations!

On Wed, Mar 9, 2022, 18:26 LM  wrote:


Congrats to you all!

On Wed, Mar 9, 2022 at 9:19 AM Chao Sun  wrote:


Congrats all!

On Wed, Mar 9, 2022 at 9:16 AM Micah Kornfield 
wrote:


Congrats!

On Wed, Mar 9, 2022 at 8:36 AM Weston Pace 

wrote:



Congratulations to all of you!

On Wed, Mar 9, 2022, 4:52 AM Matthew Turner <

matthew.m.tur...@outlook.com>

wrote:


Congrats all and thank you for your contributions! It's been great to work
with and learn from you all.

-Original Message-
From: Andrew Lamb 
Sent: Wednesday, March 9, 2022 8:59 AM
To: dev 
Subject: [ANNOUNCE] New Arrow committers: Raphael Taylor-Davies, Wang
Xudong, Yijie Shen, and Kun Liu

On behalf of the Arrow PMC, I'm happy to announce that

Raphael Taylor-Davies
Wang Xudong
Yijie Shen
Kun Liu

Have all accepted invitations to become committers on Apache Arrow!
Welcome, thank you for all your contributions so far, and we look

forward

to continuing to drive Apache Arrow forward to an even better place

in

the

future.

This exciting growth in committers mirrors the growth of the Arrow

Rust

community.

Andrew

p.s. sorry for the somewhat impersonal email; I was trying to avoid
several very similar emails. I am truly excited for each of these
individuals.

Re: [DISCUSS][Rust] Biweekly sync call for arrow/datafusion again?

2022-03-02 Thread Benson Muite

Interested in learning more about this. Can work through the code and 
discuss on 17 March either 4:00 or 16:00 UTC.


Benson

On 3/3/22 12:03 AM, Andrew Lamb wrote:

I noticed that Matthew Turner added a note to the agenda[1] for a walk
through of the JIT code. I would be interested in this as well -- would
anyone plan to be on the call and discuss it?

I don't think I have time to prepare that content prior

Andrew

[1]
https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit#

Re: [ANNOUNCE] New Arrow PMC member: QP Hou

2022-02-17 Thread Benson Muite


Congratulations QP!
On 2/18/22 8:35 AM, Jiayu Liu wrote:

Congratulations QP!

On Fri, Feb 18, 2022 at 1:32 PM Micah Kornfield 
wrote:


Congrats!

On Thu, Feb 17, 2022 at 7:27 PM Weston Pace  wrote:


Congratulations QP!

On Thu, Feb 17, 2022 at 3:22 PM hao Yang <1371656737...@gmail.com>

wrote:


Congratulations QP!

On Fri, 18 Feb 2022 at 09:14, Vibhatha Abeykoon 

wrote:



Congratulations!

On Fri, Feb 18, 2022 at 5:51 AM Yijie Shen <

henry.yijies...@gmail.com>

wrote:


Congratulations QP!

On Fri, Feb 18, 2022 at 6:17 AM Phillip Cloud 

wrote:



Congratulations!!

On Thu, Feb 17, 2022 at 5:12 PM Neal Richardson <
neal.p.richard...@gmail.com>
wrote:


Congratulations!

Neal

On Thu, Feb 17, 2022 at 4:48 PM Rok Mihevc <

rok.mih...@gmail.com



wrote:



Congrats QP!

Rok

On Thu, Feb 17, 2022 at 10:41 PM David Li <

lidav...@apache.org



wrote:


Congrats QP!

On Thu, Feb 17, 2022, at 16:26, Matthew Turner wrote:

Congratulations, QP! Appreciate all of your contributions and guidance.


From: Sutou Kouhei 
Date: Thursday, February 17, 2022 at 3:17 PM
To: dev@arrow.apache.org 
Subject: [ANNOUNCE] New Arrow PMC member: QP Hou
The Project Management Committee (PMC) for Apache Arrow has invited
QP Hou to become a PMC member and we are pleased to announce
that QP Hou has accepted.

Congratulations and welcome!









--
Vibhatha Abeykoon

Re: Building Arrow Cpp: Cannot find Boost on MacOS

2022-02-02 Thread Benson Muite


On 2/2/22 7:40 PM, Li Jin wrote:

David - Will give it a try. I am using Apple clang 12 on MacOS.


Related issue https://issues.apache.org/jira/browse/ARROW-15531

Re: Building Arrow Cpp: Cannot find Boost on MacOS

2022-02-02 Thread Benson Muite

Perhaps try a fresh build in a clean build directory. Most testing has
been on Clang 11. Also check your other options: is -DARROW_COMPUTE=ON set?


On 2/2/22 6:46 PM, Li Jin wrote:

Thanks!

-DARROW_DEPENDENCY_SOURCE=BUNDLED

seems to do the trick - I can build without adding find_package!

Although, any idea how to get past this?

"

/Users/icexelloss/workspace/arrow/cpp/src/arrow/compute/kernels/scalar_string_internal.h:216:20:
error: unused function 'StringClassifyDoc' [-Werror,-Wunused-function]

static FunctionDoc StringClassifyDoc(std::string class_summary, std::string
class_desc,

"

On Wed, Feb 2, 2022 at 10:11 AM Benson Muite 
wrote:


Can you try using one of the CMAKE options:
-DARROW_DEPENDENCY_SOURCE=BREW
-DARROW_DEPENDENCY_SOURCE=BUNDLED

see https://arrow.apache.org/docs/developers/cpp/building.html

Re: Building Arrow Cpp: Cannot find Boost on MacOS

2022-02-02 Thread Benson Muite


Can you try using one of the CMAKE options:
-DARROW_DEPENDENCY_SOURCE=BREW
-DARROW_DEPENDENCY_SOURCE=BUNDLED

see https://arrow.apache.org/docs/developers/cpp/building.html

On 2/2/22 5:44 PM, Li Jin wrote:

Also tried to test a basic CMake file with boost on my machine and it
appears to find it

CMakeLists.txt
"

find_package(Boost COMPONENTS program_options REQUIRED)


add_executable(main main.cpp)


target_link_libraries(main Boost::program_options)

"


Log:
"

-- The C compiler identification is AppleClang 12.0.0.1232
-- The CXX compiler identification is AppleClang 12.0.0.1232
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler:
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
- skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler:
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
- skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Boost: /usr/local/lib/cmake/Boost-1.76.0/BoostConfig.cmake (found
version "1.76.0") found components: program_options

"

On Wed, Feb 2, 2022 at 9:32 AM Li Jin  wrote:


Yep, Here it is!

https://gist.github.com/icexelloss/db0e5df214addd63dc4ab0570ca7ee30

On Tue, Feb 1, 2022 at 6:28 PM Sutou Kouhei  wrote:


Hi,

Could you upload the log to something such as
https://gist.github.com/ and share the URL?


Thanks,
--
kou

In 
   "Re: Building Arrow Cpp: Cannot find Boost on MacOS" on Tue, 1 Feb 2022
16:47:45 -0500,
   Li Jin  wrote:


Hi!

I ran

"cmake .. -DARROW_BUILD_TESTS=ON -DARROW_COMPUTE=ON -DARROW_DATASET=ON
-DCMAKE_BUILD_TYPE=Debug -DCMAKE_FIND_DEBUG_MODE=ON"

and here is the log.

Perhaps Cmake cannot find where Brew installed this by default? (Just
guessing, new to CMake too..)

Li

On Tue, Feb 1, 2022 at 4:30 PM Sutou Kouhei  wrote:


Hi,

Could you run cmake with -DCMAKE_FIND_DEBUG_MODE=ON and
share log of it?


FYI: Boost 1.76.0 is found in our CI:




https://github.com/apache/arrow/runs/5017148285?check_suite_focus=true#step:7:183


   -- Found Boost: /usr/local/lib/cmake/Boost-1.76.0/BoostConfig.cmake
(found suitable version "1.76.0", minimum required is "1.64") found
components: system filesystem
   -- Boost include dir: /usr/local/include
   -- Boost libraries: Boost::system;Boost::filesystem


Thanks,
--
kou

In 


   "Building Arrow Cpp: Cannot find Boost on MacOS" on Tue, 1 Feb 2022
16:18:13 -0500,
   Li Jin  wrote:


Hello!

I am new to the Arrow cpp code and am playing with it a little.
Unfortunately I hit this error when trying to cmake with preset
"ninja-debug-basic". I wonder if anyone else has hit a similar issue?

cmake .. --preset ninja-debug-basic

...

-- ARROW_ZSTD_BUILD_VERSION: v1.5.1

-- ARROW_ZSTD_BUILD_SHA256_CHECKSUM:
dc05773342b28f11658604381afd22cb0a13e8ba17ff2bd7516df377060c18dd

CMake Error at
/usr/local/Cellar/cmake/3.22.2/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230
(message):
   Could NOT find Boost (missing: Boost_INCLUDE_DIR system filesystem)
   (Required is at least version "1.58")
Call Stack (most recent call first):
   /usr/local/Cellar/cmake/3.22.2/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:594
(_FPHSA_FAILURE_MESSAGE)
   /usr/local/Cellar/cmake/3.22.2/share/cmake/Modules/FindBoost.cmake:2375
(find_package_handle_standard_args)
   cmake_modules/FindBoostAlt.cmake:41 (find_package)
   cmake_modules/ThirdpartyToolchain.cmake:241 (find_package)
   cmake_modules/ThirdpartyToolchain.cmake:956 (resolve_dependency)
   CMakeLists.txt:554 (include)


I installed boost via HomeBrew under "/usr/local/Cellar/boost/1.76.0/" but
I am not really familiar with where cmake looks for boost dependency..



Much appreciated,

Li

Re: [VOTE] Release Apache Arrow 7.0.0 - RC10

2022-01-30 Thread Benson Muite


+1 Non binding

Checks on Rocky Linux 8, x86-64, GNU 8.5

Source  C++, Python, GLib, Ruby, Java, Go, Javascript
Wheels
Binaries

Some of the warnings below:

/tmp/arrow-7.0.0.v8rVF/apache-arrow-7.0.0/cpp/src/arrow/compute/exec/ir_consumer.cc: 
In function ‘arrow::Result 
arrow::compute::Convert(const 
org::apache::arrow::computeir::flatbuf::Relation&)’:
/tmp/arrow-7.0.0.v8rVF/apache-arrow-7.0.0/cpp/src/arrow/compute/exec/ir_consumer.cc:649:30: 
warning: ‘key_null_placement’ may be used uninitialized in this function 
[-Wmaybe-uninitialized]
  SortOptions{std::move(sort_keys), 
*null_placement}, nullptr},


/tmp/arrow-7.0.0.v8rVF/apache-arrow-7.0.0/cpp/src/plasma/test/serialization_tests.cc: 
In member function ‘int 
plasma::TestPlasmaSerialization::CreateTemporaryFile()’:
/tmp/arrow-7.0.0.v8rVF/apache-arrow-7.0.0/cpp/src/plasma/test/serialization_tests.cc:85:12: 
warning: ‘char* strncpy(char*, const char*, size_t)’ specified bound 
1024 equals destination size [-Wstringop-truncation]

 strncpy(path, ss.str().c_str(), sizeof(path));
 ~~~^~
In member function ‘int 
plasma::TestPlasmaSerialization::CreateTemporaryFile()’,
inlined from ‘virtual void 
plasma::TestPlasmaSerialization_EvictRequest_Test::TestBody()’ at 
/tmp/arrow-7.0.0.v8rVF/apache-arrow-7.0.0/cpp/src/plasma/test/serialization_tests.cc:275:31:
/tmp/arrow-7.0.0.v8rVF/apache-arrow-7.0.0/cpp/src/plasma/test/serialization_tests.cc:85:12: 
warning: ‘char* strncpy(char*, const char*, size_t)’ specified bound 
1024 equals destination size [-Wstringop-truncation]

 strncpy(path, ss.str().c_str(), sizeof(path));
 ~~~^~


In file included from 
/tmp/arrow-7.0.0.v8rVF/apache-arrow-7.0.0/cpp/src/arrow/util/bit_run_reader.h:26,
 from 
/tmp/arrow-7.0.0.v8rVF/apache-arrow-7.0.0/cpp/src/arrow/util/bit_util_test.cc:45:
/tmp/arrow-7.0.0.v8rVF/apache-arrow-7.0.0/cpp/src/arrow/util/bitmap_reader.h: 
In function ‘void arrow::TestBitmapUInt64Reader::AssertWords(const 
arrow::Buffer&, int64_t, int64_t, const std::vector&)’:
/tmp/arrow-7.0.0.v8rVF/apache-arrow-7.0.0/cpp/src/arrow/util/bitmap_reader.h:99:16: 
warning: ‘reader.arrow::internal::BitmapUInt64Reader::carry_bits_’ may 
be used uninitialized in this function [-Wmaybe-uninitialized]

   uint64_t word = carry_bits_ | (next_word << num_carry_bits_);
^~~~
/tmp/arrow-7.0.0.v8rVF/apache-arrow-7.0.0/cpp/src/arrow/util/bit_util_test.cc:245:34: 
note: ‘reader.arrow::internal::BitmapUInt64Reader::carry_bits_’ was 
declared here
 internal::BitmapUInt64Reader reader(buffer.data(), start_offset, 
length);



+ bundle --version
Bundler version 2.2.24
+ bundle install --path vendor/bundle
[DEPRECATED] The `--path` flag is deprecated because it relies on being 
remembered across bundler invocations, which bundler will no longer do 
in future versions. Instead please use `bundle config set --local path 
'vendor/bundle'`, and stop using this flag
Don't run Bundler as root. Bundler can ask for sudo if it is needed, and 
installing your bundle as root will break this application

for all non-root users on this machine.


In file included from 
/usr/include/c++/8/x86_64-redhat-linux/bits/os_defines.h:39,
 from 
/usr/include/c++/8/x86_64-redhat-linux/bits/c++config.h:2458,

 from /usr/include/c++/8/cstdint:38,
 from 
/tmp/arrow-7.0.0.v8rVF/install/include/arrow/array/array_base.h:20,
 from 
/tmp/arrow-7.0.0.v8rVF/install/include/arrow/array.h:37,
 from 
/tmp/arrow-7.0.0.v8rVF/install/include/arrow/api.h:22,

 from red-arrow.hpp:22,
 from arrow.cpp:20:
/usr/include/features.h:381:4: warning: #warning _FORTIFY_SOURCE 
requires compiling with optimization (-O) [-Wcpp]

 #  warning _FORTIFY_SOURCE requires compiling with optimization (-O)
^~~
In file included from 
/usr/include/c++/8/x86_64-redhat-linux/bits/os_defines.h:39,
 from 
/usr/include/c++/8/x86_64-redhat-linux/bits/c++config.h:2458,

 from /usr/include/c++/8/cstdint:38,
 from 
/tmp/arrow-7.0.0.v8rVF/install/include/arrow/array/array_base.h:20,
 from 
/tmp/arrow-7.0.0.v8rVF/install/include/arrow/array.h:37,
 from 
/tmp/arrow-7.0.0.v8rVF/install/include/arrow/api.h:22,

 from red-arrow.hpp:22,
 from converters.hpp:20,
 from converters.cpp:20:
/usr/include/features.h:381:4: warning: #warning _FORTIFY_SOURCE 
requires compiling with optimization (-O) [-Wcpp]

 #  warning _FORTIFY_SOURCE requires compiling with optimization (-O)
^~~
In file included from /usr/include/bits/libc-header-start.h:33,
 from /usr/include/stdio.h:27,
 from /usr/include/ruby/defines.h:126,
 from /usr/include/ruby/ruby.h:29,
 fr

Re: [ANNOUNCE] New Arrow PMC chair: Kouhei Sutou

2022-01-25 Thread Benson Muite


Congratulations Kou!
On 1/25/22 8:44 PM, Vibhatha Abeykoon wrote:

Congrats Kou!


On Tue, Jan 25, 2022 at 11:13 PM Ian Joiner  wrote:


Congrats Kou!

On Tuesday, January 25, 2022, Wes McKinney  wrote:


I am pleased to announce that we have a new PMC chair and VP as per
our newly started tradition of rotating the chair once a year. I have
resigned and Kouhei was duly elected by the PMC and approved
unanimously by the board. Please join me in congratulating Kou!

Thanks,
Wes

Re: [DISCUSS] Annual rotation of Arrow PMC chair

2022-01-05 Thread Benson Muite


Congratulations!
On 1/6/22 12:54 AM, Sutou Kouhei wrote:

Hi,

Thanks for nominating me. I'm happy to serve.

Thanks,

Re: [ANNOUNCE] New Arrow committer: Alessandro Molina

2022-01-05 Thread Benson Muite


Congratulations

On 1/5/22 8:39 PM, Vibhatha Abeykoon wrote:

Congratulations

On Wed, Jan 5, 2022 at 9:29 PM Supun Kamburugamuve  wrote:


Congratulations!

On Wed, Jan 5, 2022 at 10:17 AM Niranda Perera 
wrote:


Congrats Alessandro! :-)

On Wed, Jan 5, 2022 at 9:54 AM David Li  wrote:


Congrats Alessandro!

On Wed, Jan 5, 2022, at 09:45, Ian Cook wrote:

Congratulations Alessandro!

On Wed, Jan 5, 2022 at 9:40 AM Rok Mihevc 

wrote:


Congrats Alessandro!

On Wed, Jan 5, 2022 at 2:24 PM Eduardo Ponce 

wrote:


Great addition to PMC. Congratulations!

~Eduardo

On Wed, Jan 5, 2022 at 7:34 AM Wes McKinney 


wrote:



On behalf of the Arrow PMC, I'm happy to announce that Alessandro
Molina has accepted an invitation to become a committer on Apache
Arrow. Welcome, and thank you for your contributions!

Wes







--
Niranda Perera
https://niranda.dev/
@n1r44 




--
Supun Kamburugamuve

Re: [ANNOUNCE] New Arrow PMC member: Yibo Cai

2022-01-04 Thread Benson Muite


Congratulations!

On 1/4/22 6:00 PM, Wang Xudong wrote:

Congratulations！

xudong

On Tue, Jan 4, 2022 at 21:43, Andrew Lamb wrote:


Congratulations, Yibo!

Andrew

On Tue, Jan 4, 2022 at 8:14 AM Neal Richardson <
neal.p.richard...@gmail.com>
wrote:


Congratulations, Yibo!

Neal

On Tue, Jan 4, 2022 at 7:15 AM Jacky Lee  wrote:


Congratulations Yibo!

On Tue, Jan 4, 2022 at 20:07, Rok Mihevc wrote:


Congratulations Yibo!

On Tue, Jan 4, 2022 at 9:54 AM Eduardo Ponce 

wrote:


Congratulations Yibo! Thanks for all your contributions and guidance.


On Tue, Jan 4, 2022 at 3:52 AM Wes McKinney 

wrote:



The Project Management Committee (PMC) for Apache Arrow has invited
Yibo Cai to become a PMC member and we are pleased to announce
that Yibo has accepted.

Congratulations and welcome!

Re: Arrow in HPC

2021-12-28 Thread Benson Muite

This is very nice. Look forward to trying it out. One should get
performance improvements on hardware with better interconnects, so
performance just with TCP is not illustrative of all cases.

On 12/28/21 11:41 PM, David Li wrote:

Thanks for the feedback!

Collective operations: unfortunately, these look very different from the model
Flight provides (which is just RPC). If there's interest, we could consider
implementing or exposing them in the future, or looking at making sure Arrow's
IPC APIs play well with the MPI APIs.

Non-blocking operations: I was thinking about this. In Flight, this would
probably mean async APIs, which have been discussed before (either here or on
JIRA). The UCX APIs lend themselves naturally to implementing an async API and
I would like to explore this further.

Serialization: what you probably want is GetRecordBatchPayload[1] and related
functions, which is also what Flight uses as part of its zero-copy
optimizations. This function will allocate a buffer for the IPC metadata and
return that along with a list of buffers to be written; you can then
individually send the buffers. You do have to remember to add padding yourself.
(It's what this proof-of-concept does: [2])

@Antoine: sorry, I was a little loose with things there. I don't see an HTTP/2
implementation for UCX, unfortunately.

[1]:
https://github.com/apache/arrow/blob/06b10133e486ff736e657f79ffad7b029150cfcd/cpp/src/arrow/ipc/writer.h#L389
[2]:
https://github.com/lidavidm/arrow/blob/cf804e3505b6dab996d03f8fab658aea02504090/cpp/src/arrow/flight/transport/ucx/ucx_internal.cc#L341
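
The padding step mentioned above can be sketched as follows. This assumes 8-byte alignment, which is what the Arrow IPC format is understood to require; `padded_length` and `pad_buffers` are hypothetical helpers for illustration, not part of the Arrow API, and the real payload handling is in the C++ code linked at [1] and [2].

```python
def padded_length(n_bytes, alignment=8):
    """Round n_bytes up to the next multiple of `alignment` (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (n_bytes + alignment - 1) & ~(alignment - 1)


def pad_buffers(buffers, alignment=8):
    """Yield each buffer, followed by the zero bytes that keep the next one aligned."""
    for buf in buffers:
        yield buf
        padding = padded_length(len(buf), alignment) - len(buf)
        if padding:
            yield bytes(padding)  # bytes(n) is n zero bytes


# Hypothetical body buffers standing in for what a payload would list.
body = b"".join(pad_buffers([b"abc", b"12345678"]))
print(len(body))
```

Sending the padding as separate small writes, as here, keeps the original buffers untouched, which matters if the point is to avoid copying them.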

On Tue, Dec 28, 2021, at 15:35, Antoine Pitrou wrote:

On 28/12/2021 at 20:09, David Li wrote:

Antoine/Micah raised the possibility of extending gRPC instead. That would
be preferable, frankly, given otherwise we might have to re-implement a
lot of what gRPC and HTTP2 provide by ourselves. However, the necessary
proposal stalled and was dropped without much discussion:
https://groups.google.com/g/grpc-io/c/oIbBfPVO0lY

I'm not sure whether I proposed extending gRPC :-) Is there an HTTP2
implementation above UCX? If so, we could devise a Flight
implementation over REST/HTTP2, which might also make the TCP backend
faster than with gRPC.

Regards

Antoine.

Re: Arrow vs Artus

2021-12-28 Thread Benson Muite

The paper[1] is helpful. Compression may also be helpful - but it may be 
difficult to standardize this.


[1] https://vldb.org/pvldb/vol12/p2022-chattopadhyay.pdf


On 12/25/21 5:37 AM, Micah Kornfield wrote:

What exactly are you looking for?  To my knowledge neither Capacitor nor
Artus have been described in enough detail external to Google to allow for
external benchmarking, so the details would probably only be relevant to
Google.

Both formats have more complicated encodings and embedded data-structures
making them closer to Parquet (which is loosely based on a precursor to
Capacitor) and ORC than Arrow.  There are interesting ideas from the
Procella paper which covers Artus that might be worth thinking about in the
context of these formats (or a new one).

Arrow has not spent much focus on optimizing storage size.

Cheers,
Micah

On Wednesday, December 22, 2021, Benson Muite 
wrote:


On 12/23/21 7:14 AM, Hayden Livingston wrote:


Has anyone been able to benchmark the Artus file format vs Arrow?

It seems that the Artus file format is gaining traction inside Google,
replacing their current columnar format Capacitor.

Hayden,

Do you have a link to a specification or implementation of Artus?
Performance may also be related to disk type, network etc.

Re: Jira Access

2021-12-22 Thread Benson Muite


On 12/23/21 8:01 AM, Dulvin Witharane wrote:

Hi,

I would love to have access to JIRA. Please enroll me or let me know the
due process.

Thanks and regards,

You should be able to create a JIRA account yourself, but you do need to
request contributor permissions, as described in:


https://github.com/apache/arrow/blob/master/docs/source/developers/contributing.rst

Re: Arrow sync call December 22 at 12:00 US/Eastern, 17:00 UTC

2021-12-22 Thread Benson Muite


On 12/22/21 11:04 PM, Ian Cook wrote:


Discussion of how best to use this meeting in 2022
- Consider changing the day/time? How can we best accommodate time
zones, people doing Arrow dev work in day jobs vs. on evenings and
weekends, etc? Further discussion about this is welcome

Mining the Arrow repositories using [0] shows that over the past year
most commits in the Arrow and Arrow-RS repositories are made between
9:00 and 16:00 UTC (see [1] and [2]). Nevertheless, given the
distribution of people, some variation may be helpful.

- Idea from Eduardo to invite participants to present five-minute
lightning talks on pertinent Arrow topics in this call. Further
discussion about this is welcome

This might be helpful.




[0] https://github.com/gotec/git2net
[1] https://postimg.cc/TKHBxbLv
[2] https://postimg.cc/F7YBZRCP
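
For anyone who wants to repeat this kind of count without git2net, a minimal sketch: bucket commit timestamps by hour of day after converting them to UTC. It assumes the timestamps are ISO 8601 strings carrying a UTC offset, such as `git log --format=%aI` prints; the sample values are made up and the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def commits_per_utc_hour(iso_timestamps):
    """Count commits by hour of day after converting each timestamp to UTC."""
    counts = Counter()
    for stamp in iso_timestamps:
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        when = datetime.fromisoformat(stamp.strip().replace("Z", "+00:00"))
        if when.tzinfo is None:
            raise ValueError(f"timestamp has no UTC offset: {stamp!r}")
        counts[when.astimezone(timezone.utc).hour] += 1
    return counts


# Made-up sample timestamps in three different local time zones.
sample = [
    "2021-12-01T10:15:00+01:00",
    "2021-12-01T09:30:00-05:00",
    "2021-12-02T23:45:00+09:00",
]
hist = commits_per_utc_hour(sample)
for hour in sorted(hist):
    print(f"{hour:02d}:00 UTC  {hist[hour]}")
```

Converting to UTC before bucketing is the important step: the same three commits would look spread across the whole day if counted in each author's local time.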

Re: Arrow vs Artus

2021-12-22 Thread Benson Muite


On 12/23/21 7:14 AM, Hayden Livingston wrote:

Has anyone been able to benchmark the Artus file format vs Arrow?

It seems that the Artus file format is gaining traction inside Google,
replacing their current columnar format Capacitor.


Hayden,
Do you have a link to a specification or implementation of Artus? 
Performance may also be related to disk type, network etc.

Re: [ANNOUNCE] New Arrow PMC member: Daniël Heres

2021-12-21 Thread Benson Muite


Congratulations!

On 12/22/21 9:23 AM, QP Hou wrote:

Congrats Daniël! Thank you for all your awesome work on the rust
implementation and datafusion!

On Tue, Dec 21, 2021 at 9:49 PM Eduardo Ponce  wrote:


Congrats!


On Dec 21, 2021, at 12:18 PM, Wes McKinney  wrote:

The Project Management Committee (PMC) for Apache Arrow has invited
Daniël Heres to become a PMC member and we are pleased to announce
that Daniël has accepted.

Congratulations and welcome!

Re: [VOTE][RUST] Release Apache Arrow Rust 6.4.0 RC1

2021-12-12 Thread Benson Muite


+1 non binding. Ran script on Rocky Linux 8.

Steps:
dnf -y update
dnf -y install gcc tar git
git clone https://github.com/apache/arrow-rs
cd arrow-rs/dev/release
bash verify-release-candidate.sh 6.4.0 1
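
One piece of what the vote email asks for, checking a tarball against its published SHA-256 checksum, can be sketched as below. This is not a description of what verify-release-candidate.sh does line for line: the helper names are hypothetical, the checksum line is assumed to be in `sha256sum` output format, and GPG signature verification is not covered.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large tarballs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, checksum_line):
    """Compare a file with a line of the form '<hex digest>  <file name>'."""
    expected = checksum_line.split()[0].lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

With a downloaded tarball and the contents of its `.sha256` file, `checksum_matches(tarball_path, sha_file_text)` returns True only when the digests agree.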

On 12/10/21 10:30 PM, Andrew Lamb wrote:

Hi,

I would like to propose a release of Apache Arrow Rust Implementation,
version 6.4.0. I am personally excited about this release as it contains
the input validation needed to close out the currently open RUSTSEC issues
for Arrow [5].

This release candidate is based on commit:
7a0bca35239f1d4fc3a1dca410384a1e5e962147 [1].

The proposed release tarball and signatures are hosted at [2].

The changelog is located at [3].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. There is a script [4] that automates some of
the verification.

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow Rust
[ ] +0
[ ] -1 Do not release this as Apache Arrow Rust  because...

[1]:
https://github.com/apache/arrow-rs/tree/7a0bca35239f1d4fc3a1dca410384a1e5e962147
[2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-6.4.0-rc1
[3]:
https://github.com/apache/arrow-rs/blob/7a0bca35239f1d4fc3a1dca410384a1e5e962147/CHANGELOG.md
[4]:
https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh
[5]: https://github.com/rustsec/advisory-db/tree/main/crates/arrow

Re: [DISCUSS][Rust] Biweekly sync call for arrow/datafusion again?

2021-12-08 Thread Benson Muite


The slides are very helpful. Is the timing correct, 4 UTC?

On 12/9/21 12:16 AM, Andrew Lamb wrote:

I plan to do the biweekly call tomorrow

We are experimenting a bit with the agenda[1] for this one.

After discussing any other agenda items people want, I plan to do a code
walkthrough with Matthew Turner and we'll discuss various parts of the
DataFusion and Arrow codebases and see where the conversation goes

Andrew



[1]
https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit#

Re: [ANNOUNCE] New Arrow committer: Rémi Dattai

2021-12-08 Thread Benson Muite


Congratulations Rémi

On 12/9/21 3:39 AM, QP Hou wrote:

Congrats Rémi, thank you for your epic work on datafusion :)

On Wed, Dec 8, 2021 at 9:00 AM Andrew Lamb  wrote:


Congratulations Rémi!

On Wed, Dec 8, 2021 at 10:56 AM Nic  wrote:


Congratulations! :)

On Wed, 8 Dec 2021 at 07:20, Jorge Cardoso Leitão <
jorgecarlei...@gmail.com>
wrote:


Congrats!

On Wed, Dec 8, 2021 at 8:14 AM Daniël Heres 

wrote:



Congrats Rémi!

On Wed, Dec 8, 2021, 04:27 Ian Joiner  wrote:


Congrats!

On Tuesday, December 7, 2021, Wes McKinney 

wrote:



On behalf of the Arrow PMC, I'm happy to announce that Rémi Dattai has
accepted an invitation to become a committer on Apache Arrow. Welcome,
and thank you for your contributions!

Wes

Re: [VOTE] Release Apache Arrow JS 6.0.2

2021-12-07 Thread Benson Muite

At the moment, the release is not packaged or signed, so one can only
run the tests on the branch in the git repository. A script to do that
on Linux is available at:

https://github.com/bkmgit/arrow/blob/ARROW-14801/dev/release/verify-js.sh

My understanding is that only PMC members can sign, and at the moment
not many of them seem to use Javascript extensively. I can create a
script for generating a Javascript-only release source package based on
the current source packaging and release scripts, but a PMC member would
need to sign and upload it.


@Dominik - was not aware of arrow-wasm, thanks.

The Arrow Rust implementation is in another repository and has support
for Javascript/WebAssembly:

https://github.com/apache/arrow-rs/tree/master/arrow

The release cadence for the Rust implementation is much higher than for
the C++ implementation. Efficiencies might be gained by making Rust and
Javascript point releases together, since sharing the process of
creating and verifying signed software would reduce the PMC workload.


Benson

On 12/6/21 1:01 AM, Wes McKinney wrote:

hi Dominik — can you provide instructions for how we should verify the
release, aside from checking the GPG signature and checksums?

On Sun, Nov 28, 2021 at 12:41 PM Dominik Moritz  wrote:


Are you talking about https://github.com/domoritz/arrow-wasm? It definitely
isn’t ready for prime time. The overhead of WASM, some issues with the Rust
implementation (some of which I think will be addressed with the Arrow2
Rust migration), and the much larger bundle size make it not practical
right now. As the WASM ecosystem matures, we can reevaluate and maybe also
consider moving only some of the processing in WASM and leave the rest in
JS. I’m pretty excited about WASM and what it could bring to Arrow
especially when combined with WebGPU.

Either way, I think we should release the 6.0.2 version soon. @PMC, could
you vote on the patch release?

On Nov 28, 2021 at 04:33:41, Benson Muite 
wrote:


Rust implementation can be compiled to WebAssembly and is released
biweekly. The Javascript version compiled from Rust may not satisfy all
Javascript users, but maybe there could be some collaboration to reduce
duplicated efforts?


On 11/23/21 9:52 PM, Dominik Moritz wrote:

Ahh, thank you for the clarification. There are no breaking changes in
this point release, just fixes.

@PMC, could you please vote on this point release.

Would anyone volunteer as the release manager with me to give me a better
understanding of the process?

On Nov 23, 2021 at 13:09:47, Benson Muite
wrote:

https://issues.apache.org/jira/browse/ARROW-14801

Rust has its own repository and does frequent point releases:
https://github.com/apache/arrow-rs/tree/master/dev/release

however, even point releases require 3 PMC binding +1 votes and API
breaking changes can only take place on major releases.

Many of the tests for releases can be automated, possibly relieving some
of the PMC burden in the current process.  Judgement on code quality and
software license is still required though[1]. Similarly, releases need
to be signed.

[1] https://infra.apache.org/release-publishing.html

On 11/23/21 7:33 PM, Dominik Moritz wrote:

I tested Node v14.18.1 and tests pass. I think we can go ahead and make a
release.

@Benson, could you help me update the script to work off of branches. I
don’t know what the expected process for release verification is. I’d be
happy to adopt another process.

On Nov 20, 2021 at 09:57:53, Dominik Moritz  wrote:

Thanks for catching that.

Jest is used for running the tests and jest supports node 14.15. Could we
switch to node 14.15 instead of 14.0 for this test?

On Nov 20, 2021 at 05:37:00, Benson Muite
wrote:

Hi,

Tested this on AlmaLinux 8. Following steps:

   export NVM_DIR="`pwd`/.nvm"
   mkdir -p $NVM_DIR
   curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.35.3/install.sh | \
     PROFILE=/dev/null bash
   [ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"

   nvm install --lts
   npm install -g yarn
   git clone https://github.com/apache/arrow
   cd arrow
   git checkout release-6.0.2-js
   cd js
   yarn --frozen-lockfile
   yarn run-s clean:all lint build
   yarn test

Tests pass.

yarn 1.22.17
npm 8.1.0
node 16.13.0

Tests also pass on
node 17.0.0

Node 14 is supported until 2023, however if one tries to use Node 14,
one gets the error:

jest@27.0.6: The engine "node" is incompatible with this module.
Ex

Re: [VOTE][RUST] Release Apache Arrow Rust 6.3.0 RC1

2021-11-28 Thread Benson Muite


+1 non-binding

tests pass, steps on Rocky Linux 8
 dnf -y update
 dnf -y install git tar gcc
 git clone https://github.com/apache/arrow-rs
 cd arrow-rs/dev/release
 ./verify-release-candidate.sh 6.3.0 1


On 11/26/21 6:15 PM, Jörn Horstmann wrote:

+1 non-binding

Updated our query engine (which was already on an earlier commit of the
active_release branch) and everything works fine

On Fri, Nov 26, 2021 at 3:22 PM Wang Xudong  wrote:


+1 non-binding

"Release candidate looks good!"

---
xudong963

On Fri, Nov 26, 2021 at 9:45 PM, Andrew Lamb wrote:


Hi,

I would like to propose a release of Apache Arrow Rust Implementation,
version 6.3.0.

This release candidate is based on commit:
686ac184c10f99c89ef555cc97b2231fba4d4cee [1]

The proposed release tarball and signatures are hosted at [2].

The changelog is located at [3].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. There is a script [4] that automates some of
the verification.

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow Rust
[ ] +0
[ ] -1 Do not release this as Apache Arrow Rust  because...

[1]:



https://github.com/apache/arrow-rs/tree/686ac184c10f99c89ef555cc97b2231fba4d4cee

[2]:
https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-6.3.0-rc1
[3]:



https://github.com/apache/arrow-rs/blob/686ac184c10f99c89ef555cc97b2231fba4d4cee/CHANGELOG.md

[4]:



https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh

Re: [VOTE] Release Apache Arrow JS 6.0.2

2021-11-28 Thread Benson Muite

The Rust implementation can be compiled to WebAssembly and is released
biweekly. The Javascript version compiled from Rust may not satisfy all
Javascript users, but maybe there could be some collaboration to reduce
duplicated effort?



On 11/23/21 9:52 PM, Dominik Moritz wrote:

  Ahh, thank you for the clarification. There are no breaking changes in
this point release, just fixes.

@PMC, could you please vote on this point release.

Would anyone volunteer as the release manager with me to give me a better
understanding of the process?

On Nov 23, 2021 at 13:09:47, Benson Muite 
wrote:


https://issues.apache.org/jira/browse/ARROW-14801

Rust has its own repository and does frequent point releases:
https://github.com/apache/arrow-rs/tree/master/dev/release

however, even point releases require 3 PMC binding +1 votes and API
breaking changes can only take place on major releases.

Many of the tests for releases can be automated, possibly relieving some
of the PMC burden in the current process.  Judgement on code quality and
software license is still required though[1]. Similarly, releases need
to be signed.


[1] https://infra.apache.org/release-publishing.html

On 11/23/21 7:33 PM, Dominik Moritz wrote:

I tested Node v14.18.1 and tests pass. I think we can go ahead and make a
release.

@Benson, could you help me update the script to work off of branches. I
don’t know what the expected process for release verification is. I’d be
happy to adopt another process.

On Nov 20, 2021 at 09:57:53, Dominik Moritz  wrote:



Thanks for catching that.







Jest is used for running the tests and jest supports node 14.15. Could we



switch to node 14.15 instead of 14.0 for this test?







On Nov 20, 2021 at 05:37:00, Benson Muite 



wrote:







Hi,







Tested this on AlmaLinux 8. Following steps:







  export NVM_DIR="`pwd`/.nvm"



  mkdir -p $NVM_DIR



  curl -o-



https://raw.githubusercontent.com/nvm-sh/nvm/v0.35.3/install.sh | \



PROFILE=/dev/null bash



  [ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"







  nvm install --lts



  npm install -g yarn



  git clone https://github.com/apache/arrow



  cd arrow



  git checkout release-6.0.2-js



  cd js



  yarn --frozen-lockfile



  yarn run-s clean:all lint build



  yarn test







Tests pass.







yarn 1.22.17



npm 8.1.0



node 16.13.0







Tests also pass on



node 17.0.0











Node 14 is supported until 2023, however if one tries to use Node 14,



one gets the error:







jest@27.0.6: The engine "node" is incompatible with this module.



Expected version "^10.13.0 || ^12.13.0 || ^14.15.0 || >=15.0.0". Got



"14.0.0"



error Found incompatible module.











The current release verification script could be updated to support
testing directly from a branch if this will be the point release process
in future.

On 11/20/21 12:25 AM, Dominik Moritz wrote:

Hi,

I would like to propose a patch release for Arrow JS. The release is forked
off of maint-6.0.x and available at
https://github.com/apache/arrow/tree/release-6.0.2-js.

The release contains two fixes for the js bundle:

ARROW-14773: [JS] Fix sourcemap paths
<https://github.com/apache/arrow/pull/11741>

ARROW-14774: [JS] Correct package exports
<https://github.com/apache/arrow/pull/11742>

[ ] +1 Release this as Apache Arrow JS 6.0.2
[ ] +0
[ ] -1 Do not release this as Apache Arrow JS 6.0.2 because...

Thank you,
Dominik

Re: [VOTE] Release Apache Arrow JS 6.0.2

2021-11-23 Thread Benson Muite


https://issues.apache.org/jira/browse/ARROW-14801

Rust has its own repository and does frequent point releases:
https://github.com/apache/arrow-rs/tree/master/dev/release

however, even point releases require 3 PMC binding +1 votes and API 
breaking changes can only take place on major releases.


Many of the tests for releases can be automated, possibly relieving some 
of the PMC burden in the current process.  Judgement on code quality and 
software license is still required though[1]. Similarly, releases need 
to be signed.
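
As an illustration of the voting rule mentioned above, here is a minimal
sketch of how a vote thread could be tallied. It assumes the usual ASF
release approval rule as I understand it (at least three binding +1 votes
and more binding +1 than binding -1 votes). The Vote class, the function
name and the voter names are hypothetical and not part of any Arrow tooling.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    voter: str      # hypothetical voter name
    value: int      # +1, 0 or -1
    binding: bool   # True only for PMC members


def release_vote_passes(votes, required_binding=3):
    """Return True if the binding votes are enough to approve a release."""
    binding_plus = sum(1 for v in votes if v.binding and v.value > 0)
    binding_minus = sum(1 for v in votes if v.binding and v.value < 0)
    # Non-binding votes are informative but do not count towards the result.
    return binding_plus >= required_binding and binding_plus > binding_minus


if __name__ == "__main__":
    thread = [
        Vote("pmc-a", +1, True),
        Vote("pmc-b", +1, True),
        Vote("tester", +1, False),
    ]
    print(release_vote_passes(thread))  # two binding +1 votes only
```

Counting votes is the easy part to automate; the judgement behind each
vote is not.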



[1] https://infra.apache.org/release-publishing.html

On 11/23/21 7:33 PM, Dominik Moritz wrote:

  I tested Node v14.18.1 and tests pass. I think we can go ahead and make a
release.

@Benson, could you help me update the script to work off of branches. I
don’t know what the expected process for release verification is. I’d be
happy to adopt another process.

On Nov 20, 2021 at 09:57:53, Dominik Moritz  wrote:


Thanks for catching that.

Jest is used for running the tests and jest supports node 14.15. Could we
switch to node 14.15 instead of 14.0 for this test?

On Nov 20, 2021 at 05:37:00, Benson Muite 
wrote:


Hi,

Tested this on AlmaLinux 8. Following steps:

 export NVM_DIR="`pwd`/.nvm"
 mkdir -p $NVM_DIR
 curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.35.3/install.sh | \
   PROFILE=/dev/null bash
 [ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"

 nvm install --lts
 npm install -g yarn
 git clone https://github.com/apache/arrow
 cd arrow
 git checkout release-6.0.2-js
 cd js
 yarn --frozen-lockfile
 yarn run-s clean:all lint build
 yarn test

Tests pass.

yarn 1.22.17
npm 8.1.0
node 16.13.0

Tests also pass on
node 17.0.0


Node 14 is supported until 2023, however if one tries to use Node 14,
one gets the error:

jest@27.0.6: The engine "node" is incompatible with this module.
Expected version "^10.13.0 || ^12.13.0 || ^14.15.0 || >=15.0.0". Got
"14.0.0"
error Found incompatible module.


The current release verification script could be updated to support
testing directly from a branch if this will be the point release process
in future.

On 11/20/21 12:25 AM, Dominik Moritz wrote:

Hi,

I would like to propose a patch release for Arrow JS. The release is forked
off of maint-6.0.x and available at
https://github.com/apache/arrow/tree/release-6.0.2-js.

The release contains two fixes for the js bundle:

ARROW-14773: [JS] Fix sourcemap paths
<https://github.com/apache/arrow/pull/11741>

ARROW-14774: [JS] Correct package exports
<https://github.com/apache/arrow/pull/11742>

[ ] +1 Release this as Apache Arrow JS 6.0.2
[ ] +0
[ ] -1 Do not release this as Apache Arrow JS 6.0.2 because...

Thank you,
Dominik

Re: [VOTE] Release Apache Arrow JS 6.0.2

2021-11-20 Thread Benson Muite


Hi,

Tested this on AlmaLinux 8. Following steps:

export NVM_DIR="`pwd`/.nvm"
mkdir -p $NVM_DIR
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.35.3/install.sh | \
  PROFILE=/dev/null bash
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"

nvm install --lts
npm install -g yarn
git clone https://github.com/apache/arrow
cd arrow
git checkout release-6.0.2-js
cd js
yarn --frozen-lockfile
yarn run-s clean:all lint build
yarn test

Tests pass.

yarn 1.22.17
npm 8.1.0
node 16.13.0

Tests also pass on
node 17.0.0


Node 14 is supported until 2023, however if one tries to use Node 14, 
one gets the error:


jest@27.0.6: The engine "node" is incompatible with this module. 
Expected version "^10.13.0 || ^12.13.0 || ^14.15.0 || >=15.0.0". Got 
"14.0.0"

error Found incompatible module.
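
To make the engine check above concrete, here is a small sketch that
evaluates a Node version against the range quoted in the error message.
It is a simplified reading of the range syntax (a caret range is treated
as "same major version, at least the given version", and pre-release tags
are not handled), not the resolver that yarn actually uses. The function
names are made up for this example.

```python
def parse(version):
    """Turn '14.18.1' or 'v14.18.1' into a tuple of integers."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def satisfies_caret(version, base):
    """Simplified '^base' check, valid for major versions >= 1."""
    v, b = parse(version), parse(base)
    return v[0] == b[0] and v >= b


def node_ok_for_jest(version):
    """Check against '^10.13.0 || ^12.13.0 || ^14.15.0 || >=15.0.0'."""
    caret_bases = ["10.13.0", "12.13.0", "14.15.0"]
    if any(satisfies_caret(version, base) for base in caret_bases):
        return True
    return parse(version) >= (15, 0, 0)


if __name__ == "__main__":
    for candidate in ["14.0.0", "14.18.1", "16.13.0", "17.0.0"]:
        print(candidate, node_ok_for_jest(candidate))
```

Under this reading 14.0.0 is rejected because it is below 14.15.0, while
14.15.0 and later 14.x releases are accepted.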


The current release verification script could be updated to support 
testing directly from a branch if this will be the point release process 
in future.


On 11/20/21 12:25 AM, Dominik Moritz wrote:

Hi,

I would like to propose a patch release for Arrow JS. The release is forked
off of maint-6.0.x and available at
https://github.com/apache/arrow/tree/release-6.0.2-js.

The release contains two fixes for the js bundle:
ARROW-14773: [JS] Fix sourcemap paths

ARROW-14774: [JS] Correct package exports


[ ] +1 Release this as Apache Arrow JS 6.0.2
[ ] +0
[ ] -1 Do not release this as Apache Arrow JS 6.0.2 because...

Thank you,
Dominik

Re: [DISCUSS] A repository for collaborative prototyping + algorithms / performance research?

2021-11-18 Thread Benson Muite


On 11/18/21 6:29 PM, Wes McKinney wrote:

On Thu, Nov 18, 2021 at 2:25 AM Antoine Pitrou  wrote:



On 18/11/2021 at 02:54, Wes McKinney wrote:


In short I wanted to propose creating a separate git repository under
apache/arrow-* for this purpose, to invite these kinds of
contributions to our project and to help more R&D work happen inside
the Arrow umbrella so we have clean IP lineage. I can't imagine we
would ever make releases from this repository but it could serve as a
flexible place to put stuff (even in branches that are independent
from each other) that may or may not be ready to make its home in one
of our production repositories.


What would be the rules for contributing? Is it just a place where
people store source code?


People would make pull requests like any other repository, but it
would be a bit more free form than our other repositories. The goal is
to get this kind of collaboration (code and the discussions) happening
on Arrow community channels.

This may be helpful. Some of it might also lead to documentation for 
developers and interested users, perhaps similar to the R Journal 
(https://journal.r-project.org/) but with less formality.

Re: [ANNOUNCE] New Arrow PMC member: Joris Van den Bossche

2021-11-18 Thread Benson Muite


Congratulations!

On 11/18/21 2:17 PM, Rok Mihevc wrote:

Congrats Joris!

On Thu, Nov 18, 2021 at 11:40 AM Krisztián Szűcs
 wrote:


Congrats Joris!

On Thu, Nov 18, 2021 at 10:03 AM Maarten Breddels
 wrote:


Nice Joris, congratulations!





On Thu, Nov 18, 2021 at 9:34 AM Nic  wrote:


Congratulations, great news!

On Thu, 18 Nov 2021 at 07:27, Joris Van den Bossche <
jorisvandenboss...@gmail.com> wrote:


Thanks all!

On Thu, 18 Nov 2021 at 08:10, Jorge Cardoso Leitão <
jorgecarlei...@gmail.com>
wrote:


Congratulations!

On Thu, Nov 18, 2021 at 3:34 AM Ian Joiner wrote:

Congrats Joris and really thanks for your effort in integrating ORC and
dataset!

Ian

On Nov 17, 2021, at 5:55 PM, Wes McKinney wrote:


The Project Management Committee (PMC) for Apache Arrow has invited
Joris Van den Bossche to become a PMC member and we are pleased to
announce that Joris has accepted.

Congratulations and welcome!

Re: [VOTE][RUST][Datafusion] Release Apache Arrow Datafusion 6.0.0 RC0

2021-11-14 Thread Benson Muite


+1 (non-binding)

Tested on Rocky Linux 8 using the verification script

On 11/14/21 2:43 PM, Wang Xudong wrote:

+1 (non-binding)

I checked on macOS Monterey.

Thanks,
—
xudong963

On Sun, Nov 14, 2021 at 5:03 PM, QP Hou wrote:


Hi,

I would like to propose a release of Apache Arrow Datafusion Implementation,
version 6.0.0.

This release candidate is based on commit:
7824a8d74093374da8a4f040d23a81b8436b7380 [1]
The proposed release tarball and signatures are hosted at [2].
The changelog is located at [3].

Please download, verify checksums and signatures, run the unit tests, and
vote on the release. The vote will be open for at least 72 hours.

Only votes from PMC members are binding, but all members of the community
are encouraged to test the release and vote with "(non-binding)".

The standard verification procedure is documented at
https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md#verifying-release-candidates

[ ] +1 Release this as Apache Arrow Datafusion 6.0.0
[ ] +0
[ ] -1 Do not release this as Apache Arrow Datafusion 6.0.0 because...

[1]:
https://github.com/apache/arrow-datafusion/tree/7824a8d74093374da8a4f040d23a81b8436b7380
[2]:
https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-datafusion-6.0.0-rc0
[3]:
https://github.com/apache/arrow-datafusion/blob/7824a8d74093374da8a4f040d23a81b8436b7380/CHANGELOG.md

Thanks,
QP

Re: [VOTE][RUST] Release Apache Arrow Rust 6.2.0 RC1

2021-11-13 Thread Benson Muite


+1 (non-binding)
Checked signature and ran release verification script on Rocky Linux 8
Benson
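
For anyone who wants to see what the checksum part of the verification
amounts to, here is a minimal sketch. It assumes the checksum file uses
the common "<hex digest>  <file name>" layout written by sha512sum; the
function names are hypothetical and this is not the verification script
itself. Checking the GPG signature is a separate step that this sketch
does not cover.

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file without loading it whole."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, checksum_text):
    """Compare a file against the first field of a checksum file's text."""
    fields = checksum_text.split()
    if not fields:
        return False  # empty checksum file, nothing to compare against
    return sha512_of(path) == fields[0].lower()
```

A typical use would be to read the text of the .sha512 file published next
to the tarball and pass it in together with the path of the downloaded
tarball.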

On 11/13/21 9:00 AM, dong a wrote:

+1 (non-binding)

I checked signatures and ran the release verification script on macOS Monterey.

Thanks,
—
xudong963

On 2021/11/12 12:24:16 Andrew Lamb wrote:

Hi,

I know these emails feel somewhat automated and are easy to ignore, but I
think the Apache process overhead is worth the effort for several reasons.
Thank you for taking the time if you have it to keep the releases flowing.

With that preamble aside, I would like to propose a release of Apache Arrow
Rust Implementation, version 6.2.0.

This release candidate is based on commit:
311d59f2e9f541938b9fcb7c0a8b800a893437dc [1]

The proposed release tarball and signatures are hosted at [2].

The changelog is located at [3].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. There is a script [4] that automates some of
the verification.

The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow Rust
[ ] +0
[ ] -1 Do not release this as Apache Arrow Rust  because...

[1]:
https://github.com/apache/arrow-rs/tree/311d59f2e9f541938b9fcb7c0a8b800a893437dc
[2]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-rs-6.2.0-rc1
[3]:
https://github.com/apache/arrow-rs/blob/311d59f2e9f541938b9fcb7c0a8b800a893437dc/CHANGELOG.md
[4]:
https://github.com/apache/arrow-rs/blob/master/dev/release/verify-release-candidate.sh

Re: [VOTE] Release Apache Arrow 6.0.1 - RC1

2021-11-12 Thread Benson Muite


+1 (non-binding)

Verified C++/Go/Java/Javascript/Python/Ruby sources (Rocky Linux 8)

bash verify-release-candidate.sh source 6.0.1 1
bash verify-release-candidate.sh wheels 6.0.1 1
bash verify-release-candidate.sh binaries 6.0.1 1

g++ (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1)
ruby 2.7.4p191 (2021-07-07 revision a21a3b7d23) [x86_64-linux]
openjdk version "1.8.0_312"
Python 3.6.8

On 11/11/21 8:22 PM, David Li wrote:

+1

Verified C++/Python/Java sources (Ubuntu 16.04 AMD64), wheels, and binaries.

-David

On Thu, Nov 11, 2021, at 04:00, Yibo Cai wrote:

+1.

Verified c++ and python source, on ubuntu 20.04, aarch64.

CC=clang-10 CXX=clang++-10 \
TEST_SOURCE=1 TEST_DEFAULT=0 TEST_CPP=1 TEST_PYTHON=1 \
dev/release/verify-release-candidate.sh source 6.0.1 1


On 11/11/21 10:39 AM, Sutou Kouhei wrote:

Hi,

I would like to propose the following release candidate (RC1) of Apache
Arrow version 6.0.1. This is a release consisting of 29
resolved JIRA issues[1].

This release candidate is based on commit:
347a88ff9d20e2a4061eec0b455b8ea1aa8335dc [2]

The source release rc1 is hosted at [3].
The binary artifacts are hosted at [4][5][6][7][8][9][10][11].
The changelog is located at [12].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. See [13] for how to validate a release candidate.

See also verification results by GitHub Actions:

https://github.com/apache/arrow/pull/11671

There are some known failures:

* verify-rc-source-integration-linux-amd64
* verify-rc-source-python-macos-arm64
* verify-rc-wheels-macos-11-amd64
* verify-rc-wheels-macos-11-arm64

All of these except verify-rc-source-integration-linux-amd64
also failed with 6.0.0 RC3:

https://github.com/apache/arrow/pull/11511

Here is the verify-rc-source-integration-linux-amd64 log:


https://github.com/ursacomputing/crossbow/runs/4172486523?check_suite_focus=true

I'm not sure whether this is a blocker or not.

Note that the verification passed on my local machine.


The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow 6.0.1
[ ] +0
[ ] -1 Do not release this as Apache Arrow 6.0.1 because...

[1]: 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%206.0.1
[2]: 
https://github.com/apache/arrow/tree/347a88ff9d20e2a4061eec0b455b8ea1aa8335dc
[3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-6.0.1-rc1
[4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
[5]: https://apache.jfrog.io/artifactory/arrow/amazon-linux-rc/
[6]: https://apache.jfrog.io/artifactory/arrow/centos-rc/
[7]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
[8]: https://apache.jfrog.io/artifactory/arrow/java-rc/6.0.1-rc1
[9]: https://apache.jfrog.io/artifactory/arrow/nuget-rc/6.0.1-rc1
[10]: https://apache.jfrog.io/artifactory/arrow/python-rc/6.0.1-rc1
[11]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
[12]: 
https://github.com/apache/arrow/blob/347a88ff9d20e2a4061eec0b455b8ea1aa8335dc/CHANGELOG.md
[13]: 
https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates

Re: [VOTE] Release Apache Arrow 6.0.1 - RC0

2021-11-08 Thread Benson Muite


On AlmaLinux 8

Python 3.6.8
openjdk version "1.8.0_312"
gcc (GCC) 8.4.1 20200928
ruby 2.7.4p191
Docker version 20.10.10, build b485636

dev/release/verify-release-candidate.sh source 6.0.1 0 passes

dev/release/verify-release-candidate.sh wheels 6.0.1 0
The verification script cannot be run as-is. It needs a change from
python pytest -r s --pyargs pyarrow
to
python -m pytest -r s --pyargs pyarrow
An issue has been raised.

with export PYARROW_TEST_PARQUET=OFF tests pass

with export PYARROW_TEST_PARQUET=ON tests fail, issue raised
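
The difference between the two invocations above is that "python pytest"
looks for a file named pytest in the current directory, while
"python -m pytest" runs the installed pytest module. A minimal sketch of
building the corrected invocation, with the PYARROW_TEST_PARQUET setting
passed through the environment, could look like this. The function name is
hypothetical and this is not how the verification script is written.

```python
import os
import sys


def pytest_invocation(test_parquet):
    """Return (command, env) for running the pyarrow tests via 'python -m pytest'."""
    command = [sys.executable, "-m", "pytest", "-r", "s", "--pyargs", "pyarrow"]
    env = dict(os.environ)
    env["PYARROW_TEST_PARQUET"] = "ON" if test_parquet else "OFF"
    return command, env
```

The pair can then be passed to subprocess.run(command, env=env, check=True)
in an environment where pyarrow and pytest are installed.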


dev/release/verify-release-candidate.sh binaries 6.0.1 0 fails

Reading package lists...E: Invalid archive signature
E: Internal error, could not locate member 
control.tar{.zst,.lz4,.gz,.xz,.bz2,.lzma,}

E: Could not read meta data from /apache-arrow-apt-source-latest-impish.deb
E: The package lists or status file could not be parsed or opened.

+ echo 'Failed to verify the APT repository for ubuntu:impish'
Failed to verify the APT repository for ubuntu:impish


On 11/8/21 12:15 PM, Joris Van den Bossche wrote:

Although causing more delay for the release, I would also vote for
including Weston's PR. Otherwise it would be very unfortunate that users
can't preserve the existing behaviour (of 5.0).

Joris

On Sun, 7 Nov 2021 at 22:17, Sutou Kouhei  wrote:


Hi,

Python developers, what do you think about this?


Thanks,
--
kou

In "Re: [VOTE] Release Apache Arrow 6.0.1 - RC0" on Fri, 5 Nov 2021
16:42:41 -1000, Weston Pace wrote:


-0.5

Unfortunately, I think ARROW-14620[1] might be blocking.  It is a
regression that was detected from StackOverflow[2] earlier today.
There is not a great workaround for users that were using the pyarrow
datasets for an 'append files' workflow (with a custom basename
template).  I just added a PR which is passing CI (Travis hasn't run
yet and it could use review)[3].  Apologies for the last minute issue,
I know this creates additional work for everyone.  If others think
this isn't a blocker I will change my vote.

[1] https://issues.apache.org/jira/browse/ARROW-14620
[2]
https://stackoverflow.com/questions/69854919/how-do-you-set-existing-data-behavior-in-pyarrow/69860362#69860362
[3] https://github.com/apache/arrow/pull/11632

On Fri, Nov 5, 2021 at 4:15 PM Sutou Kouhei  wrote:


Hi,

I would like to propose the following release candidate (RC0) of Apache
Arrow version 6.0.1. This is a release consisting of 19
resolved JIRA issues[1].

This release candidate is based on commit:
2233a108b212656464e02203b111c9b991d3c95d [2]

The source release rc0 is hosted at [3].
The binary artifacts are hosted at [4][5][6][7][8][9][10].
The changelog is located at [11].

Please download, verify checksums and signatures, run the unit tests,
and vote on the release. See [12] for how to validate a release candidate.

See also verification results by GitHub Actions:

   https://github.com/apache/arrow/pull/11628

There are some known failures:

   * verify-rc-source-python-macos-arm64
   * verify-rc-wheels-macos-11-amd64
   * verify-rc-wheels-macos-11-arm64

They also failed with 6.0.0 RC3:

   https://github.com/apache/arrow/pull/11511


The vote will be open for at least 72 hours.

[ ] +1 Release this as Apache Arrow 6.0.1
[ ] +0
[ ] -1 Do not release this as Apache Arrow 6.0.1 because...

[1]:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%206.0.1
[2]:
https://github.com/apache/arrow/tree/2233a108b212656464e02203b111c9b991d3c95d
[3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-6.0.1-rc0
[4]: https://apache.jfrog.io/artifactory/arrow/almalinux-rc/
[5]: https://apache.jfrog.io/artifactory/arrow/amazon-linux-rc/
[6]: https://apache.jfrog.io/artifactory/arrow/centos-rc/
[7]: https://apache.jfrog.io/artifactory/arrow/debian-rc/
[8]: https://apache.jfrog.io/artifactory/arrow/nuget-rc/6.0.1-rc0
[9]: https://apache.jfrog.io/artifactory/arrow/python-rc/6.0.1-rc0
[10]: https://apache.jfrog.io/artifactory/arrow/ubuntu-rc/
[11]:
https://github.com/apache/arrow/blob/2233a108b212656464e02203b111c9b991d3c95d/CHANGELOG.md
[12]:
https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates

Re: [DISCUSS] Community maintained extension repos for Datafusion

2021-11-07 Thread Benson Muite

A community-owned GitHub organization would be helpful, perhaps covering all 
Arrow-related projects and not just Datafusion. This would make the projects 
easier to find and easier for community members to contribute to. It could 
also include a listing of relevant projects hosted elsewhere.


On 11/7/21 9:40 AM, Jiayu Liu wrote:

FWIW if there's a way to contribute code pertaining to datafusion I can
contribute my version of Java bindings to it.

IMO having a central place (instead of linking) for all bindings, 3rd
libraries, etc. for datafusion would mean more synergy across different
languages but I won't go as far as a monorepo because the CI/CD process
and release process are unlikely to benefit from it. Maybe a community
owned GitHub org?

On 2021/11/07 00:52:49 QP Hou wrote:

Hi all,

I would like to propose a new and more community friendly governance
model for community contributed and maintained extensions for the
datafusion project.

Over the last year, many datafusion extensions have been proposed and
created by the community including the java binding, s3 and hdfs[1]
object storage implementations, etc. Right now this code is or will
be hosted in individual github namespaces due to the following
reasons:

* Most of these extensions are not considered part of the Datafusion
core, so the current maintainers prefer to not have them managed in
the main repository. The current python binding and ballista code base
is already adding a decent amount of overhead to our development
process. Adding more dependent crates will slow us down further
without much upside.

* Considering the overhead of the official Apache release process,
current Datafusion PMC members don't have the bandwidth to manage
individual releases for these extensions. None of the authors of these
extensions are Arrow PMC members, so they won't have the access to
drive the Apache releases by themselves.

Therefore, I am proposing that we create an unofficial shared Github
organization to host these Datafusion contrib type projects that are
only maintained by non-PMC community members. I think this is strictly
better than hosting these extensions projects in personal github
namespaces. If any of these extensions end up getting significant
involvement or interest from Datafusion committers, then we can
promote them into official projects and provide official Apache style
release support.

Other alternatives I have considered are:

* Keep these projects under personal namespaces and only link them in
Datafusion's documentation.

* Manage these extensions using experimental repos. But as far as I
know, the code owners still need to be PMC members in order to
perform crates.io releases and it's not intended for long running
projects with no goal of eventual archival.

* Create a dedicated mono repo named apache/datafusion-contrib to host
these extensions. However, this approach also requires PMC members to
get involved for crates.io releases if I understand it correctly.

I am curious whether this is something that could be done under the
Apache governance model. My main goal is to create an unofficial
incubator type space for community members to develop and collaborate on
extensions that may or may not be adopted as official extensions in
the future.

[1]: https://github.com/apache/arrow-datafusion/pull/1223

Thanks,
QP

Re: [RUST] 6.0.0 Release Communication

2021-10-28 Thread Benson Muite


Andrew,

I can write something over the weekend.

Benson

On 10/28/21 2:29 PM, Andrew Lamb wrote:

Does anyone want to write up a blog post or more details on the 6.0.0 Rust
Release?

The  6.0.0 arrow blog post[1]  is about to ship — I added a brief summary
of the Rust content, but additional content and feedback are welcome.

In the past we have also done Rust specific release blog posts (e.g. [2]),
but I don't currently have the bandwidth to write one. Perhaps someone else
is interested in doing so.

Thanks,
Andrew


[1] https://github.com/apache/arrow-site/pull/153/files#r737688716
[2] https://arrow.apache.org/blog/2021/07/29/5.0.0-rs-release/

Re: [DISCUSS][Rust] Biweekly sync call for arrow/datafusion again?

2021-10-27 Thread Benson Muite

I cannot host at 4:00 UTC on 28 October, but can host at 4:00 UTC on 11 
November.


Benson

On 10/27/21 1:02 PM, Andrew Lamb wrote:

We have some proposed agenda items[1] for the Rust sync[2] this week so I
will plan to see anyone who is interested tomorrow.

Andrew

[1]
https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit#
[2]
https://arrow.apache.org/datafusion/community/communication.html#sync-up-zoom-calls

Re: Arrow in HPC

2021-10-27 Thread Benson Muite

UCX is interesting, relatively new and seems like it may be easier to 
integrate. MPI is the most commonly used backend for HPC. Influencing 
the development of UCX is more difficult than influencing the 
development of MPI, but both have a slower pace of development than 
Arrow. One may also want to consider support for multiple accelerators. 
Arrow has CUDA support, but SYCL support seems like it would fit well with 
a C++ base for parallelism at the compute unit/node level, with UCX and/or 
MPI support for multinode parallelism.


OLCF does support RAPIDS 
https://docs.olcf.ornl.gov/software/analytics/nvidia-rapids.html so HPC 
in the commercial cloud could also make use of Arrow.


On 10/27/21 5:26 AM, Keith Kraus wrote:

Outside of just HPC, integrating UCX would potentially allow taking
advantage of its shared memory backend which would be interesting from a
performance perspective in the single-node, multi-process case in many
situations.

Not sure it's worth the UCX dependency in the long run, but would allow us
to experiment with a lot of different transport backends.

On Tue, Oct 26, 2021 at 10:10 PM Yibo Cai  wrote:



On 10/26/21 10:02 PM, David Li wrote:

Hi Yibo,

Just curious, has there been more thought on this from your/the HPC side?


Yes. I will investigate the possible approach. Maybe build a quick (and
dirty) POC test at first.



I also realized we never asked, what is motivating Flight in this space
in the first place? Presumably broader Arrow support in general?

No special reason. It will be great if something useful comes out of it, or
an interesting experiment otherwise.



-David

On Fri, Sep 10, 2021, at 12:27, Micah Kornfield wrote:


I would support doing the work necessary to get UCX (or really any other
transport) supported, even if it is a lot of work. (I'm hoping this clears
the path to supporting a Flight-to-browser transport as well; a few
projects seem to have rolled their own approaches but I think Flight itself
should really handle this, too.)

Another possible technical approach is to investigate whether a custom
gRPC "channel" implementation could be written for new transports.
Searching around it seems like there were some defunct PRs trying to
enable UCX as one, I didn't look closely enough at why they might have
failed.

On Thu, Sep 9, 2021 at 11:07 AM David Li  wrote:


I would support doing the work necessary to get UCX (or really any other
transport) supported, even if it is a lot of work. (I'm hoping this clears
the path to supporting a Flight-to-browser transport as well; a few
projects seem to have rolled their own approaches but I think Flight itself
should really handle this, too.)

From what I understand, you could tunnel gRPC over UCX as Keith mentions,
or directly use UCX, which is what it sounds like you are thinking about.
One idea we had previously was to stick to gRPC for 'control plane'
methods, and support alternate protocols only for 'data plane' methods like
DoGet - this might be more manageable, depending on what you have in mind.

In general - there's quite a bit of work here, so it would help to
separate the work into phases, and share some more detailed
design/implementation plans, to make review more manageable. (I realize of
course this is just a general interest check right now.) Just splitting
gRPC/Flight is going to take a decent amount of work, and (from what little
I understand) using UCX means choosing from various communication methods
it offers and writing a decent amount of scaffolding code, so it would be
good to establish what exactly a 'UCX' transport means. (For instance,
presumably there's no need to stick to the Protobuf-based wire format, but
what format would we use?)

It would also be good to expand the benchmarks, to validate the
performance we get from UCX and have a way to compare it against gRPC.
Anecdotally I've found gRPC isn't quite able to saturate a connection so it
would be interesting to see what other transports can do.

Jed - how would you see MPI and Flight interacting? As another
transport/alternative to UCX? I admit I'm not familiar with the HPC space.

About transferring commands with data: Flight already has an app_metadata
field in various places to allow things like this, it may be interesting to
combine with the ComputeIR proposal on this mailing list, and hopefully you
& your colleagues can take a look there as well.

-David

On Thu, Sep 9, 2021, at 11:24, Jed Brown wrote:

Yibo Cai  writes:


HPC infrastructure normally leverages RDMA for fast data transfer among
storage nodes and compute nodes. Computation tasks are dispatched to
compute nodes with best fit resources.

Concretely, we are investigating porting UCX as Flight transport layer.
UCX is a communication framework for modern networks. [1]
Besides HPC usage, many projects (spark, dask, blazingsql, etc) also
adopt UCX to accelerate network transmission. [2][3]


I'm interested in this topic and think it's impor

Re: [VOTE] Release Apache Arrow 6.0.0 - RC3

2021-10-26 Thread Benson Muite


Ok. Thanks for the feedback.

Javascript may have problems when using nohup

so directly running

env "TEST_DEFAULT=0" env "TEST_JS=1" bash dev/release/verify-release-candidate.sh source 6.0.0 3


seems to work, but

nohup env "TEST_DEFAULT=0" env "TEST_JS=1" bash dev/release/verify-release-candidate.sh source 6.0.0 3 > log.out &


may not work [1].

[1] 
https://stackoverflow.com/questions/16604176/error-ebadf-bad-file-descriptor-when-running-node-using-nohup-of-forever
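
One thing that might be worth trying for the nohup case is to give the
background process an explicit stdin instead of letting it inherit one
that may be closed. The sketch below does that from Python: stdin is
connected to the null device, output goes to a log file and the child is
started in its own session. This is an untested idea with respect to the
verification script, not a confirmed fix, and the helper name is made up.

```python
import os
import subprocess


def run_detached(command, log_path, extra_env=None):
    """Start command in the background with stdin from the null device."""
    env = dict(os.environ)
    env.update(extra_env or {})
    log = open(log_path, "wb")
    try:
        return subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,   # a valid, empty stdin for the child
            stdout=log,
            stderr=subprocess.STDOUT,   # keep errors in the same log
            env=env,
            start_new_session=True,     # detach from the terminal's session (POSIX)
        )
    finally:
        log.close()  # the child keeps its own copy of the descriptor
```

For the commands above this could be called as
run_detached(["bash", "dev/release/verify-release-candidate.sh", "source",
"6.0.0", "3"], "log.out", {"TEST_DEFAULT": "0", "TEST_JS": "1"}).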


On 10/26/21 2:32 PM, Krisztián Szűcs wrote:

Thanks Benson for verifying!

Created a jira to track the deprecation warnings [1] and seems like
you've already created a PR for the javascript issue [2].
Luckily, these issues are not blockers.

[1]: https://issues.apache.org/jira/browse/ARROW-14468
[2]: 
https://github.com/apache/arrow/commit/b4bc846fcdf189ae0443b8445c3ef69fc4131764


On Sat, Oct 23, 2021 at 1:59 AM Benson Muite  wrote:


on Ubuntu 20.04 x86

Checked sources (C++, Python, Java, Ruby, Glib, C#, Javascript)

bash dev/release/verify-release-candidate.sh source 6.0.0 3

gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Ubuntu clang version
10.0.1-++20211003085942+ef32c611aa21-1~exp1~20211003090334.2
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux]
node v14.18.1
openjdk version "1.8.0_292"
Python 3.8.10

C++, Python, Java, Ruby, Glib and C# pass tests. I get a failure with
Javascript, though this is likely a setup error on my part:

+ yarn run-s clean:all lint build
yarn run v1.22.17
$ /tmp/arrow-6.0.0.BDnN3/apache-arrow-6.0.0/js/node_modules/.bin/run-s
clean:all lint build
events.js:377
throw er; // Unhandled 'error' event
^

Error: EBADF: bad file descriptor, read
Emitted 'error' event on ReadStream instance at:
  at internal/fs/streams.js:173:14
  at FSReqCallback.wrapper [as oncomplete] (fs.js:562:5) {
errno: -9,
code: 'EBADF',
syscall: 'read'
}
error Command failed with exit code 1.


When running the tests directly in arrow/js using

nvm install --lts
npm install -g yarn
yarn --frozen-lockfile
yarn run-s clean:all lint build
yarn test

Tests pass.

There are some compilation warnings due to uninitialized variables and
due to deprecations, for example:

/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:
In function ‘PyObject*
__pyx_pf_7pyarrow_8_parquet_12FileMetaData_14format_version___get__(__pyx_obj_7pyarrow_8_parquet_FileMetaData*)’:
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:14168:36:
warning: ‘parquet::ParquetVersion::PARQUET_2_0’ is deprecated: use
PARQUET_2_4 or PARQUET_2_6 for fine-grained feature selection
[-Wdeprecated-declarations]
14168 | case  parquet::ParquetVersion::PARQUET_2_0:
|^~~
In file included from
/tmp/arrow-6.0.0.theE2/install/include/parquet/types.h:30,
   from
/tmp/arrow-6.0.0.theE2/install/include/parquet/schema.h:32,
   from
/tmp/arrow-6.0.0.theE2/install/include/parquet/api/schema.h:21,
   from
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:734:
/tmp/arrow-6.0.0.theE2/install/include/parquet/type_fwd.h:44:5: note:
declared here
 44 | PARQUET_2_0 ARROW_DEPRECATED_ENUM_VALUE("use PARQUET_2_4 or
PARQUET_2_6 "
| ^~~
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:14168:36:
warning: ‘parquet::ParquetVersion::PARQUET_2_0’ is deprecated: use
PARQUET_2_4 or PARQUET_2_6 for fine-grained feature selection
[-Wdeprecated-declarations]
14168 | case  parquet::ParquetVersion::PARQUET_2_0:
|^~~
In file included from
/tmp/arrow-6.0.0.theE2/install/include/parquet/types.h:30,
   from
/tmp/arrow-6.0.0.theE2/install/include/parquet/schema.h:32,
   from
/tmp/arrow-6.0.0.theE2/install/include/parquet/api/schema.h:21,
   from
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:734:
/tmp/arrow-6.0.0.theE2/install/include/parquet/type_fwd.h:44:5: note:
declared here
 44 | PARQUET_2_0 ARROW_DEPRECATED_ENUM_VALUE("use PARQUET_2_4 or
PARQUET_2_6 "
| ^~~
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:
In function ‘std::shared_ptr
__pyx_f_7pyarrow_8_parquet__create_writer_properties(__pyx_opt_args_7pyarrow_8_parquet__create_writer_properties*)’:
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:23800:62:
warning: ‘parquet::ParquetVersion::PARQUET_2_0’ is deprecated: use
PARQUET_2_4 or PARQUET_2_6 for fine-grained feature selection
[-Wdeprecated-declarations]
23800 |   (void)(__pyx_v_props.version(
parquet:

Re: [VOTE] Release Apache Arrow 6.0.0 - RC3

2021-10-22 Thread Benson Muite


on Ubuntu 20.04 x86

Checked sources (C++, Python, Java, Ruby, Glib, C#, Javascript)

bash dev/release/verify-release-candidate.sh source 6.0.0 3

gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Ubuntu clang version 
10.0.1-++20211003085942+ef32c611aa21-1~exp1~20211003090334.2

ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux]
node v14.18.1
openjdk version "1.8.0_292"
Python 3.8.10

C++, Python, Java, Ruby, Glib and C# pass tests. Get a failure with
Javascript, though this is likely a setup error on my part:


+ yarn run-s clean:all lint build
yarn run v1.22.17
$ /tmp/arrow-6.0.0.BDnN3/apache-arrow-6.0.0/js/node_modules/.bin/run-s 
clean:all lint build

events.js:377
  throw er; // Unhandled 'error' event
  ^

Error: EBADF: bad file descriptor, read
Emitted 'error' event on ReadStream instance at:
at internal/fs/streams.js:173:14
at FSReqCallback.wrapper [as oncomplete] (fs.js:562:5) {
  errno: -9,
  code: 'EBADF',
  syscall: 'read'
}
error Command failed with exit code 1.


When running the tests directly in arrow/js using

nvm install --lts
npm install -g yarn
yarn --frozen-lockfile
yarn run-s clean:all lint build
yarn test

Tests pass.

There are some compilation warnings due to uninitialized variables and 
due to deprecations, for example:


/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp: 
In function ‘PyObject* 
__pyx_pf_7pyarrow_8_parquet_12FileMetaData_14format_version___get__(__pyx_obj_7pyarrow_8_parquet_FileMetaData*)’:
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:14168:36: 
warning: ‘parquet::ParquetVersion::PARQUET_2_0’ is deprecated: use 
PARQUET_2_4 or PARQUET_2_6 for fine-grained feature selection 
[-Wdeprecated-declarations]

14168 | case  parquet::ParquetVersion::PARQUET_2_0:
  |^~~
In file included from 
/tmp/arrow-6.0.0.theE2/install/include/parquet/types.h:30,
 from 
/tmp/arrow-6.0.0.theE2/install/include/parquet/schema.h:32,
 from 
/tmp/arrow-6.0.0.theE2/install/include/parquet/api/schema.h:21,
 from 
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:734:
/tmp/arrow-6.0.0.theE2/install/include/parquet/type_fwd.h:44:5: note: 
declared here
   44 | PARQUET_2_0 ARROW_DEPRECATED_ENUM_VALUE("use PARQUET_2_4 or 
PARQUET_2_6 "

  | ^~~
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:14168:36: 
warning: ‘parquet::ParquetVersion::PARQUET_2_0’ is deprecated: use 
PARQUET_2_4 or PARQUET_2_6 for fine-grained feature selection 
[-Wdeprecated-declarations]

14168 | case  parquet::ParquetVersion::PARQUET_2_0:
  |^~~
In file included from 
/tmp/arrow-6.0.0.theE2/install/include/parquet/types.h:30,
 from 
/tmp/arrow-6.0.0.theE2/install/include/parquet/schema.h:32,
 from 
/tmp/arrow-6.0.0.theE2/install/include/parquet/api/schema.h:21,
 from 
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:734:
/tmp/arrow-6.0.0.theE2/install/include/parquet/type_fwd.h:44:5: note: 
declared here
   44 | PARQUET_2_0 ARROW_DEPRECATED_ENUM_VALUE("use PARQUET_2_4 or 
PARQUET_2_6 "

  | ^~~
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp: 
In function ‘std::shared_ptr 
__pyx_f_7pyarrow_8_parquet__create_writer_properties(__pyx_opt_args_7pyarrow_8_parquet__create_writer_properties*)’:
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:23800:62: 
warning: ‘parquet::ParquetVersion::PARQUET_2_0’ is deprecated: use 
PARQUET_2_4 or PARQUET_2_6 for fine-grained feature selection 
[-Wdeprecated-declarations]
23800 |   (void)(__pyx_v_props.version( 
parquet::ParquetVersion::PARQUET_2_0));
  | 
^~~
In file included from 
/tmp/arrow-6.0.0.theE2/install/include/parquet/types.h:30,
 from 
/tmp/arrow-6.0.0.theE2/install/include/parquet/schema.h:32,
 from 
/tmp/arrow-6.0.0.theE2/install/include/parquet/api/schema.h:21,
 from 
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:734:
/tmp/arrow-6.0.0.theE2/install/include/parquet/type_fwd.h:44:5: note: 
declared here
   44 | PARQUET_2_0 ARROW_DEPRECATED_ENUM_VALUE("use PARQUET_2_4 or 
PARQUET_2_6 "

  | ^~~
/tmp/arrow-6.0.0.theE2/apache-arrow-6.0.0/python/build/temp.linux-x86_64-3.8/_parquet.cpp:23800:62: 
warning: ‘parquet::ParquetVersion::PARQUET_2_0’ is deprecated: use 
PARQUET_2_4 or PARQUET_2_6 for fine-grained feature selection 
[-Wdeprecated-declarations]
23800 |   (void)(__pyx_v_props.version( 
parquet::ParquetVersion::PARQUET_2_0));
  | 
^~~
In file included from 
/tmp/arrow-6.0.0.theE2/install/include/pa
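
The deprecation warnings in the log above are the kind a compiler emits when
code names an enumerator that carries a deprecation attribute. A minimal
sketch follows, assuming a C++17 compiler; FormatVersion and Describe are
hypothetical names for illustration and are not the Parquet definitions.

```cpp
#include <string>

// Hypothetical enum, shaped like the one in the warning above.
// The attribute on V2_0 makes any code that names it emit
// -Wdeprecated-declarations, while the other values stay silent.
enum class FormatVersion {
  V1_0,
  V2_0 [[deprecated("use V2_4 or V2_6 for fine-grained feature selection")]],
  V2_4,
  V2_6
};

// Maps a version to a label without naming the deprecated enumerator.
// Writing `case FormatVersion::V2_0:` here would reproduce the warning.
inline std::string Describe(FormatVersion v) {
  switch (v) {
    case FormatVersion::V1_0:
      return "1.0";
    case FormatVersion::V2_4:
      return "2.4";
    case FormatVersion::V2_6:
      return "2.6";
    default:
      return "2.0 (deprecated)";
  }
}
```

Such warnings do not stop the build unless warnings are treated as errors,
which matches the report above that the tests still pass.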

[DISCUSS][Rust] Biweekly sync call for arrow/datafusion

2021-10-13 Thread Benson Muite


In case there is a need for a UTC 4:00 sync:
https://trybbb.ml/#/rust-arrow-call

[C++] Comparison functions for strings in Between Ternary Kernel

2021-10-11 Thread Benson Muite

When comparing strings in C++, the default behavior is to order by
UTF-8 code points, which affects comparisons of the form a < b < c
[1][2]. This may not be appropriate in all cases and, as in the sort
function [3], it may be helpful to have an optional field for
comparison keys. An example in C++ is at the end of this message. Are
there any suggestions for or objections to adding an optional field with
comparison keys?


[1] https://issues.apache.org/jira/browse/ARROW-9843
[2] https://issues.apache.org/jira/browse/ARROW-14290
[3] 
https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/vector_sort.cc


/* Follows
 * http://www.localizingjapan.com/blog/2011/02/13/sorting-in-japanese-%E2%80%94-an-unsolved-problem/
 * https://stackoverflow.com/questions/2803071/c-sort-array-of-strings
 */

#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>

int main()
{
  // Control panel item names mixing katakana, kanji, and Latin script.
  std::string settings[19] = {"システム", "画面", "Windows ファイウォール",
                              "インターネット オプション", "キーボード", "メール",
                              "音声認識", "管理ツール", "自動更新", "日付と時刻",
                              "タスク", "プログラムの追加と削除", "フォント",
                              "電源オプション", "マウス", "地域と言語オプション",
                              "電話とモデムのオプション", "Java", "NVIDIA"};
  // The same two names written in Latin, katakana, hiragana, and kanji.
  std::string names[8] = {"Ayumi", "アユミ", "あゆみ", "歩美",
                          "Tanaka", "タナカ", "たなか", "田中"};
  // std::sort uses operator< on std::string: byte-wise order, which for
  // UTF-8 text is the same as code point order.
  std::sort(std::begin(settings), std::end(settings));
  std::cout << "Settings" << std::endl;
  for (const auto& word : settings) {
    std::cout << word << std::endl;
  }
  std::sort(std::begin(names), std::end(names));
  std::cout << "Names" << std::endl;
  for (const auto& name : names) {
    std::cout << name << std::endl;
  }
  return 0;
}
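
One way to picture the optional comparison keys proposed above is to pair
each string with a caller-supplied key and order by the key instead of the
raw bytes. The sketch below is only an illustration under that assumption:
KeyedString, SortByKey, and ExampleNames are hypothetical names, not Arrow
APIs, and the keys are hiragana readings written by hand.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical pairing of a display string with a comparison key.
// Here the key is the hiragana reading of the name.
struct KeyedString {
  std::string text;
  std::string key;
};

// Orders items by key rather than by text. std::stable_sort keeps the
// input order for items whose keys compare equal.
inline void SortByKey(std::vector<KeyedString>& items) {
  std::stable_sort(items.begin(), items.end(),
                   [](const KeyedString& a, const KeyedString& b) {
                     return a.key < b.key;
                   });
}

// Four of the names from the program above, deliberately out of order.
inline std::vector<KeyedString> ExampleNames() {
  return {{"田中", "たなか"},
          {"Ayumi", "あゆみ"},
          {"タナカ", "たなか"},
          {"歩美", "あゆみ"}};
}
```

After SortByKey, both spellings of Ayumi come before both spellings of
Tanaka. Sorting the same names with plain std::sort on text would instead
group them by script, since Latin letters, hiragana, katakana, and kanji
occupy increasing code point ranges.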

Re: [ANNOUNCE] New Arrow committer: Jiayu Liu

2021-10-07 Thread Benson Muite


Congratulations Jiayu Liu!

On 10/7/21 1:56 PM, Andrew Lamb wrote:

Hi,

On behalf of the Arrow PMC, I'm happy to announce that
Jiayu Liu has accepted an invitation to become a
committer on Apache Arrow. Welcome, and thank you for your
contributions!


Andrew

Re: [DISCUSS] Deprecate user@ in favor for github issues/discussions

2021-10-01 Thread Benson Muite

Mail is archived at [1] and [2], which use Pony Mail [3][4]. I
contributed to an issue to make the archive more search-engine
friendly [5]. Search is really helpful for finding answers as a user
before posting a question.

Arrow is developing rapidly, at present with greater engagement among
the developers building the project than with end users who are not
primarily developers. As a new contributor, I really appreciate the
feedback I have received from developers. In future, one may expect more
users, so a workflow that encourages users to contribute as well would
be helpful, especially since many core developers may not be able to
give feedback.


If GitHub issues/discussions is used, maybe the output can also be piped 
into u...@arrow.apache.org or some other archived mailing list in case 
migration to some other platform is needed in future?


[1] https://lists.apache.org/list.html?u...@arrow.apache.org
[2] https://lists.apache.org/list.html?dev@arrow.apache.org
[3] https://github.com/apache/incubator-ponymail
[4] https://ponymail.incubator.apache.org
[5] https://github.com/apache/incubator-ponymail/issues/494

On 9/30/21 12:29 PM, Nic wrote:

I'm +1 for GH issues due to it lowering the barrier for participation. As
someone who is sometimes a bit nervous about interacting with new open
source projects/communities, adding a GH Issue is fairly familiar and feels
inconsequential, whereas emailing everyone on a mailing list is
intimidating.

On Thu, 30 Sept 2021 at 09:24, Jarek Potiuk  wrote:


Just a comment on discussions: They already have answered/unanswered
filters and they have most of the same properties that "stack overflow"
questions have.

You do not need to "track" discussions. It's great to answer and react
quickly and if you have more discussions all the community might get more
involved and start answering. It happened for us after about a month/two of
using discussions.

The important thing is that "discussion" is a discussion - if it gets no
answer, that's perfectly fine - means that the discussion did not pick
anyone's interest. Author can still follow-up, ping other people etc. but
there is no "expectation" that discussion will reach a conclusion - it can
remain unanswered forever and simply disappear.

Also it is no coincidence that discussions have no "total count". They
are meant to grow "forever"; unlike issues, discussions are meant to
just "be there" - sometimes with, sometimes without answers. You can see
the discussions from the last day/week/month or search them via keywords
but that is about it - there is nothing like "x discussions opened". This
is what makes them a fantastic counterpart to issues, because you can
convert issues to discussions (and back) as maintainer/committer when you
see that you are missing information, or that it's unclear whether this is
an issue and you have no idea what to do next. They might simply "go away"
if the author and others are not interested - or, if more information
becomes available or someone else has a similar observation and chimes in,
they can be revived at any time. But the great thing about a discussion is
that it does not leave you with the impression that you have a big number
of "open issues" that are unhandled. Sometimes leaving the discussion open
is the right "final state" for it.

I did not realize that when we first started to use discussion but "convert
issue to discussion" is the single best feature of GitHub issues for me. It
does not really "close" the issue (which might be seen as rude and you have
to have strong arguments to close an issue), but it gives a clear
information to the author and whoever is looking at the discussion that it
needs extra effort, clarification, digging (usually from the author but
maybe from other interested parties) to qualify it as real issue.

We went down from ~880 to 814 opened issues over the last month or so (and
we continue our downward route in Apache Airflow) once we made it a bit more
difficult to enter the issue (via detailed issue template) and started to
promote discussions in the templates and started to actively convert issues
into discussion when they qualify as such.

J.


On Thu, Sep 30, 2021 at 4:04 AM Weston Pace  wrote:


+1 for issues because I believe it would lower the barrier for entry.

I'm +0 on discussions, they can work but would require more active
curation / labeling as they cannot be closed so an "answered /
unanswered" label would probably be needed.


I think I already get e-mails from issues but
have them filtered out with the rest of other github messages, I'm not sure
if it is easy to split them out.


Issues will absolutely be lost in the flood of notifications you would
get from watching the arrow repo.  However, you can do a custom watch
that targets only issues.  This may be an alternative for those that
prefer an issue-like workflow.  For me personally, I've monitored
issues in the Zulip feed for Github.  That being said I went ahead and
turned on an issues-only watch to try that out

Re: [DISCUSS][Rust] Biweekly sync call for arrow/datafusion again?

2021-09-29 Thread Benson Muite


Attendees:

Ruihang
Benson

Discussion items:

Self-introduction
OpenVidu seemed to work
Data Fusion introduction
Speed of Arrow development process and intended use cases
Maybe get time zones of attendees?

On 9/30/21 6:58 AM, Benson Muite wrote:

Join link:

https://mkutano.nairuby.org/#/soft-amaranth-alpaca


Sorry it is late. Meeting should be short, as it seems there is a 
preference for one meeting.


On 9/29/21 10:59 AM, Benson Muite wrote:

Hi,

Will send a link to a BigBlueButton/OpenVidu instance at 3:45 UTC 
tomorrow.


Update the google doc [1]

Would be helpful to know if having 2 meetings on the same day, or 
alternating the meeting time will work best for most people.


Regards,
Benson

[1] 
https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit 



On 9/24/21 4:26 PM, Andrew Lamb wrote:

Thank you!


On Thu, Sep 23, 2021 at 4:17 PM Benson Muite wrote:


Can host 4:00 UTC, will likely use a self-hosted video conferencing
solution that should just work in the browser.

Benson


On 9/22/21 11:15 PM, Andrew Lamb wrote:
The idea of time variation sounds great. As I am not typically available at
4:00 UTC I would appreciate it if someone else could please arrange that.
Thus, I propose the following as an initial call and we can adjust
schedules or technology as needed:

Date/Time: Alternating Thursdays at 16:00 UTC starting September 30, 2021
Location: Zoom [2]
Agenda: Google docs [1]

We will send a summary of all sync ups to the mailing list. I have also
proposed adding this information to the website [3]

Thanks,
Andrew

[1]
https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit

[2]
Topic: Apache Arrow Rust Syncup (arrow, datafusion ballista, etc) Time: Sep
30, 2021 04:00 PM Universal Time UTC
https://influxdata.zoom.us/j/94666921249

[3] https://github.com/apache/arrow-datafusion/pull/1042/files

Re: [DISCUSS][Rust] Biweekly sync call for arrow/datafusion again?

2021-09-29 Thread Benson Muite


Join link:

https://mkutano.nairuby.org/#/soft-amaranth-alpaca


Sorry it is late. Meeting should be short, as it seems there is a 
preference for one meeting.


On 9/29/21 10:59 AM, Benson Muite wrote:

Hi,

Will send a link to a BigBlueButton/OpenVidu instance at 3:45 UTC tomorrow.

Update the google doc [1]

Would be helpful to know if having 2 meetings on the same day, or 
alternating the meeting time will work best for most people.


Regards,
Benson

[1] 
https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit 



On 9/24/21 4:26 PM, Andrew Lamb wrote:

Thank you!


On Thu, Sep 23, 2021 at 4:17 PM Benson Muite 
wrote:


Can host 4:00 UTC, will likely use a self-hosted video conferencing
solution that should just work in the browser.

Benson


On 9/22/21 11:15 PM, Andrew Lamb wrote:
The idea of time variation sounds great. As I am not typically available at
4:00 UTC I would appreciate it if someone else could please arrange that.
Thus, I propose the following as an initial call and we can adjust
schedules or technology as needed:

Date/Time: Alternating Thursdays at 16:00 UTC starting September 30, 2021
Location: Zoom [2]
Agenda: Google docs [1]

We will send a summary of all sync ups to the mailing list. I have also
proposed adding this information to the website [3]

Thanks,
Andrew

[1]
https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit

[2]
Topic: Apache Arrow Rust Syncup (arrow, datafusion ballista, etc) Time: Sep
30, 2021 04:00 PM Universal Time UTC
https://influxdata.zoom.us/j/94666921249

[3] https://github.com/apache/arrow-datafusion/pull/1042/files

Re: [DISCUSS][Rust] Biweekly sync call for arrow/datafusion again?

2021-09-29 Thread Benson Muite


Hi,

Will send a link to a BigBlueButton/OpenVidu instance at 3:45 UTC tomorrow.

Update the google doc [1]

Would be helpful to know if having 2 meetings on the same day, or 
alternating the meeting time will work best for most people.


Regards,
Benson

[1] 
https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit


On 9/24/21 4:26 PM, Andrew Lamb wrote:

Thank you!


On Thu, Sep 23, 2021 at 4:17 PM Benson Muite 
wrote:


Can host 4:00 UTC, will likely use a self-hosted video conferencing
solution that should just work in the browser.

Benson


On 9/22/21 11:15 PM, Andrew Lamb wrote:

The idea of time variation sounds great. As I am not typically available at
4:00 UTC I would appreciate it if someone else could please arrange that.
Thus, I propose the following as an initial call and we can adjust
schedules or technology as needed:

Date/Time: Alternating Thursdays at 16:00 UTC starting September 30, 2021
Location: Zoom [2]
Agenda: Google docs [1]

We will send a summary of all sync ups to the mailing list. I have also
proposed adding this information to the website [3]

Thanks,
Andrew

[1]
https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit

[2]
Topic: Apache Arrow Rust Syncup (arrow, datafusion ballista, etc) Time: Sep
30, 2021 04:00 PM Universal Time UTC
https://influxdata.zoom.us/j/94666921249

[3] https://github.com/apache/arrow-datafusion/pull/1042/files

Re: C++ Boost GitHub URL in ThirdpartyToolchain.cmake

2021-09-28 Thread Benson Muite


Hmm, it should. Can you open a JIRA with the full build logs?

In the meantime though, you can also install the Developer Toolset to
get a much newer gcc version:
https://www.softwarecollections.org/en/scls/rhscl/devtoolset-8/

Regards

Antoine.



Ticket created:
https://issues.apache.org/jira/browse/ARROW-14152
Will add logs.

Re: C++ Boost GitHub URL in ThirdpartyToolchain.cmake

2021-09-28 Thread Benson Muite


On 9/28/21 10:47 AM, Antoine Pitrou wrote:


Le 28/09/2021 à 09:41, Benson Muite a écrit :

Sorry, second one should have -DARROW_BUILD_TESTS=ON instead of
-DARROW_BUILD_TESTS=OFF


I see. What is the gcc version?

4.8 (will need to rebuild to get the minor version) is the default on
CentOS 7. I expect this should be adequate, as indicated at
https://arrow.apache.org/docs/developers/cpp/building.html#system-setup
though maybe some newer C++ features are missing and this needs to be
updated.

Re: C++ Boost GitHub URL in ThirdpartyToolchain.cmake

2021-09-28 Thread Benson Muite


On 9/28/21 10:36 AM, Antoine Pitrou wrote:


Hi,

Le 28/09/2021 à 09:25, Benson Muite a écrit :

Maybe helpful to create a ticket at:
https://issues.apache.org/jira/projects/ARROW
for more documentation on setup with Cent OS 7

Currently trying this on commit 1f481d9 (tagged as
apache-arrow-6.0.0.dev) - having some trouble with current head of the
development repository. Installed

yum install gcc-c++ gcc bison flex git python3 make
yum groupinstall "Development tools"

Then build and install a recent version of cmake from source (tried with
3.21.3).

Then

git clone https://github.com/apache/arrow.git
cd arrow
git submodule init
git submodule update
export PARQUET_TEST_DATA="${PWD}/cpp/submodules/parquet-testing/data"
export ARROW_TEST_DATA="${PWD}/testing/data"
git checkout  1f481d9
mkdir build
cd build
PATH/TO/INSTALLED/cmake .. -DARROW_PARQUET=ON -DARROW_COMPUTE=ON
-DARROW_CSV=ON -DCMAKE_BUILD_TYPE=Release -DARROW_BUILD_TESTS=OFF
-DThrift_SOURCE=BUNDLED -DPARQUET_REQUIRE_ENCRYPTION=ON

This seems to build.

When using,

PATH/TO/INSTALLED/cmake .. -DARROW_PARQUET=ON -DARROW_COMPUTE=ON
-DARROW_CSV=ON -DCMAKE_BUILD_TYPE=Release -DARROW_BUILD_TESTS=OFF
-DThrift_SOURCE=BUNDLED -DPARQUET_REQUIRE_ENCRYPTION=ON

am investigating why the build fails with: [...]


I don't understand: is there a difference between the two cmake commands 
above?


Sorry, second one should have -DARROW_BUILD_TESTS=ON instead of 
-DARROW_BUILD_TESTS=OFF

Re: C++ Boost GitHub URL in ThirdpartyToolchain.cmake

2021-09-28 Thread Benson Muite


The lines
> mkdir build
> cd build
should be
mkdir cpp/build
cd cpp/build

Let me know if other configurations/bindings are needed.

On 9/28/21 10:25 AM, Benson Muite wrote:

Maybe helpful to create a ticket at:
https://issues.apache.org/jira/projects/ARROW
for more documentation on setup with Cent OS 7

Currently trying this on commit 1f481d9 (tagged as 
apache-arrow-6.0.0.dev) - having some trouble with current head of the 
development repository. Installed


yum install gcc-c++ gcc bison flex git python3 make
yum groupinstall "Development tools"

Then build and install a recent version of cmake from source (tried with 
3.21.3).


Then

git clone https://github.com/apache/arrow.git
cd arrow
git submodule init
git submodule update
export PARQUET_TEST_DATA="${PWD}/cpp/submodules/parquet-testing/data"
export ARROW_TEST_DATA="${PWD}/testing/data"
git checkout  1f481d9
mkdir build
cd build
PATH/TO/INSTALLED/cmake .. -DARROW_PARQUET=ON -DARROW_COMPUTE=ON 
-DARROW_CSV=ON -DCMAKE_BUILD_TYPE=Release -DARROW_BUILD_TESTS=OFF 
-DThrift_SOURCE=BUNDLED -DPARQUET_REQUIRE_ENCRYPTION=ON


This seems to build.

When using,

PATH/TO/INSTALLED/cmake .. -DARROW_PARQUET=ON -DARROW_COMPUTE=ON 
-DARROW_CSV=ON -DCMAKE_BUILD_TYPE=Release -DARROW_BUILD_TESTS=OFF 
-DThrift_SOURCE=BUNDLED -DPARQUET_REQUIRE_ENCRYPTION=ON


am investigating why the build fails with:

In file included from 
/root/arrow/cpp/src/arrow/util/reflection_test.cc:23:0:
/root/arrow/cpp/src/arrow/util/enum.h: In instantiation of ‘static 
constexpr bool arrow::EnumStrings::assert_count() [with int M = 1; 
int N = 3]’:
/root/arrow/cpp/src/arrow/util/enum.h:57:28:   required from ‘constexpr 
arrow::EnumStrings::EnumStrings(const Strs& ...) [with Strs = 
{testing::internal::ElementsAreMatcherstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits > > >}; int N = 3]’
/usr/include/c++/4.8.2/type_traits:1305:35:   required by substitution 
of ‘template static decltype 
((__test_aux<_To1>(declval<_From1>()), std::__sfinae_types::__one())) 
std::__is_convertible_helper<_From, _To, false>::__test(int) [with 
_From1 = _From1; _To1 = _To1; _From = 
testing::internal::ElementsAreMatcherstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits > > >; _To = const arrow::EnumStrings<3>&] [with 
_From1 = 
testing::internal::ElementsAreMatcherstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits > > >; _To1 = const arrow::EnumStrings<3>&]’
/usr/include/c++/4.8.2/type_traits:1312:50:   required from ‘constexpr 
const bool 
std::__is_convertible_helperstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits > > >, const arrow::EnumStrings<3>&, false>::value’
/usr/include/c++/4.8.2/type_traits:1317:12:   required from ‘struct 
std::is_convertiblestd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits > > >, const arrow::EnumStrings<3>&>’
/root/arrow/cpp/release/googletest_ep-prefix/include/gmock/gmock-matchers.h:137:48: 
   required from ‘static testing::Matcher 
testing::internal::MatcherCastImpl::Cast(const M&) [with T = const 
arrow::EnumStrings<3>&; M = 
testing::internal::ElementsAreMatcherstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits > > >]’
/root/arrow/cpp/release/googletest_ep-prefix/include/gmock/gmock-matchers.h:260:78: 
   required from ‘static testing::Matcher 
testing::SafeMatcherCastImpl::Cast(const M&) [with M = 
testing::internal::ElementsAreMatcherstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits > > >; T = const arrow::EnumStrings<3>&]’
/root/arrow/cpp/release/googletest_ep-prefix/include/gmock/gmock-matchers.h:298:58: 
   required from ‘testing::Matcher testing::SafeMatcherCast(const M&) 
[with T = const arrow::EnumStrings<3>&; M = 
testing::internal::ElementsAreMatcherstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits > > >]’
/root/arrow/cpp/release/googletest_ep-prefix/include/gmock/gmock-matchers.h:1313:73: 
   required from ‘testing::AssertionResult 
testing::internal::PredicateFormatterFromMatcher::operator()(const 
char*, const T&) const [with T = arrow::EnumStrings<3>; M = 
testing::internal::ElementsAreMatcherstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits >, nonstd::sv_lite::basic_string

Re: C++ Boost GitHub URL in ThirdpartyToolchain.cmake

2021-09-28 Thread Benson Muite

nst & s ) nssv_noexcept

 ^
/root/arrow/cpp/src/arrow/vendored/string_view.hpp:914:5: note: 
template argument deduction/substitution failed:
In file included from 
/root/arrow/cpp/src/arrow/util/reflection_test.cc:23:0:
/root/arrow/cpp/src/arrow/util/enum.h:57:85: note:   ‘const 
testing::internal::ElementsAreMatcherstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits > > >’ is not derived from ‘const 
std::basic_string, Allocator>’
   : dummy_{assert_count()}, 
strings_{util::string_view(strs)...} {}


  ^
In file included from /root/arrow/cpp/src/arrow/util/string_view.h:25:0,
 from /root/arrow/cpp/src/arrow/buffer.h:31,
 from /root/arrow/cpp/src/arrow/array/data.h:26,
 from /root/arrow/cpp/src/arrow/array/array_base.h:26,
 from /root/arrow/cpp/src/arrow/array/builder_binary.h:30,
 from /root/arrow/cpp/src/arrow/testing/gtest_util.h:33,
 from /root/arrow/cpp/src/arrow/testing/future_util.h:20,
 from /root/arrow/cpp/src/arrow/testing/matchers.h:24,
 from /root/arrow/cpp/src/arrow/util/reflection_test.cc:22:
/root/arrow/cpp/src/arrow/vendored/string_view.hpp:502:20: note: 
constexpr nonstd::sv_lite::basic_string_viewTraits>::basic_string_view(const CharT*) [with CharT = char; Traits = 
std::char_traits]
 nssv_constexpr basic_string_view( CharT const * s) nssv_noexcept 
// non-standard noexcept

^
/root/arrow/cpp/src/arrow/vendored/string_view.hpp:502:20: note:   no 
known conversion for argument 1 from ‘const 
testing::internal::ElementsAreMatcherstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits > > >’ to ‘const char*’
/root/arrow/cpp/src/arrow/vendored/string_view.hpp:497:20: note: 
constexpr nonstd::sv_lite::basic_string_viewTraits>::basic_string_view(const CharT*, 
nonstd::sv_lite::basic_string_view::size_type) [with 
CharT = char; Traits = std::char_traits; 
nonstd::sv_lite::basic_string_view::size_type = long 
unsigned int]
 nssv_constexpr basic_string_view( CharT const * s, size_type count 
) nssv_noexcept // non-standard noexcept

^
/root/arrow/cpp/src/arrow/vendored/string_view.hpp:497:20: note: 
candidate expects 2 arguments, 1 provided
/root/arrow/cpp/src/arrow/vendored/string_view.hpp:489:20: note: 
constexpr nonstd::sv_lite::basic_string_viewTraits>::basic_string_view(const 
nonstd::sv_lite::basic_string_view&) [with CharT = char; 
Traits = std::char_traits]
 nssv_constexpr basic_string_view( basic_string_view const & other 
) nssv_noexcept = default;

^
/root/arrow/cpp/src/arrow/vendored/string_view.hpp:489:20: note:   no 
known conversion for argument 1 from ‘const 
testing::internal::ElementsAreMatcherstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits >, nonstd::sv_lite::basic_string_viewstd::char_traits > > >’ to ‘const 
nonstd::sv_lite::basic_string_view&’
/root/arrow/cpp/src/arrow/vendored/string_view.hpp:483:20: note: 
constexpr nonstd::sv_lite::basic_string_viewTraits>::basic_string_view() [with CharT = char; Traits = 
std::char_traits]

 nssv_constexpr basic_string_view() nssv_noexcept
^
/root/arrow/cpp/src/arrow/vendored/string_view.hpp:483:20: note: 
candidate expects 0 arguments, 1 provided
cc1plus: warning: unrecognized command line option 
"-Wno-unknown-warning-option" [enabled by default]
make[2]: *** 
[src/arrow/util/CMakeFiles/arrow-utility-test.dir/reflection_test.cc.o] 
Error 1

make[1]: *** [src/arrow/util/CMakeFiles/arrow-utility-test.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs

On 9/28/21 8:17 AM, Rares Vernica wrote:

CentOS 7

On Mon, Sep 27, 2021 at 10:06 PM Benson Muite 
wrote:


Hi Rares,
What operating system are you using?
Benson
On 9/28/21 7:38 AM, Rares Vernica wrote:

Hello,

I'm still struggling to build Arrow with Parquet. I compiled Thrift myself
but I'm running into dependency issues with Boost.

It looks like the Boost download URL provided in ThirdpartyToolchain.cmake
here
https://github.com/apache/arrow/blob/ef4e92982054fcc723729ab968296d799d3108dd/cpp/cmake_modules/ThirdpartyToolchain.cmake#L405
links to GitHub Releases https://github.com/boostorg/boost/releases. The
.tar.gz provided there does not contain the Boost headers.

Cheers,
Rares

Re: C++ Boost GitHub URL in ThirdpartyToolchain.cmake

2021-09-27 Thread Benson Muite


Hi Rares,
What operating system are you using?
Benson
On 9/28/21 7:38 AM, Rares Vernica wrote:

Hello,

I'm still struggling to build Arrow with Parquet. I compiled Thrift myself
but I'm running into dependency issues with Boost.

It looks like the Boost download URL provided in ThirdpartyToolchain.cmake
here
https://github.com/apache/arrow/blob/ef4e92982054fcc723729ab968296d799d3108dd/cpp/cmake_modules/ThirdpartyToolchain.cmake#L405
links to GitHub Releases https://github.com/boostorg/boost/releases. The
.tar.gz provided there does not contain the Boost headers.

Cheers,
Rares

Re: [DISCUSS][Rust] Biweekly sync call for arrow/datafusion again?

2021-09-23 Thread Benson Muite

Can host 4:00 UTC, will likely use a self-hosted video conferencing 
solution that should just work in the browser.


Benson


On 9/22/21 11:15 PM, Andrew Lamb wrote:

The idea of time variation sounds great. As I am not typically available at
4:00 UTC I would appreciate it if someone else could please arrange that.
Thus, I propose the following as an initial call and we can adjust
schedules or technology as needed:

Date/Time: Alternating Thursdays at 16:00 UTC starting September 30, 2021
Location: Zoom [2]
Agenda: Google docs [1]

We will send a summary of all sync ups to the mailing list. I have also
proposed adding this information to the website [3]

Thanks,
Andrew

[1]
https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit

[2]
Topic: Apache Arrow Rust Syncup (arrow, datafusion, ballista, etc.)
Time: Sep 30, 2021 04:00 PM Universal Time UTC
https://influxdata.zoom.us/j/94666921249

[3] https://github.com/apache/arrow-datafusion/pull/1042/files
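
For anyone adding the proposed schedule to a calendar, the dates can be sketched as below. This is a minimal illustration that assumes "alternating Thursdays" means every 14 days from the stated start. The function name `sync_call_times` is hypothetical, and the 4:00 UTC calls are not covered because their dates were not fixed in this thread.

```python
from datetime import datetime, timedelta, timezone

# First call as stated in the proposal: Thursday, September 30, 2021, 16:00 UTC.
FIRST_CALL = datetime(2021, 9, 30, 16, 0, tzinfo=timezone.utc)


def sync_call_times(count: int, first: datetime = FIRST_CALL) -> list[datetime]:
    """Return the first `count` call times, spaced two weeks apart."""
    return [first + timedelta(weeks=2 * i) for i in range(count)]


if __name__ == "__main__":
    for call in sync_call_times(4):
        # astimezone() with no argument converts to the local time zone.
        print(call.isoformat(), "->", call.astimezone().isoformat())
```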

Re: [DISCUSS][Rust] Biweekly sync call for arrow/datafusion again?

2021-09-19 Thread Benson Muite

I am new to this. One suggestion is to use two of the times, e.g. 4:00
UTC and 16:00 UTC, perhaps alternating, so that joining is convenient for
people in different regions.

On 9/20/21 6:45 AM, QP Hou wrote:

16 UTC works for me too.

On Sun, Sep 19, 2021 at 10:00 AM zied bf wrote:

Hi everyone,

I am still new to the stack and, as Remi mentioned, a few details on the
internal design of DataFusion, in documents and discussion, could help
newcomers understand how to contribute, or at least where to start in
order to grasp the full implementation details and potentially contribute.

I would vote for 16:00 UTC.

Best

On Sun, Sep 19, 2021 at 5:50 PM Rémi Dettai wrote:

I would also vote for 16:00 UTC.

Remi

On Sun, Sep 19, 2021, 2:01 PM Yijie Shen wrote:

4:00, 10:00, and 16:00 in UTC works for me as well :D

On 2021/09/19 10:32:03 Wayne Xia wrote:

I vote for 4:00, 10:00 and 16:00 (in UTC)

On Sun, Sep 19, 2021 at 6:27 PM Andrew Lamb wrote:

It sounds like there are enough people to make it worth organizing at least
a few.

I would be happy to organize a zoom meeting to facilitate this. Previously
the schedule was every other Wednesday at 12 Noon Eastern Time, which I
realize may be a tough time for some, especially those in Asia.

Could people please respond with their preference:
A) 4:00 UTC
B) 10:00 UTC
C) 16:00 UTC
D) 22:00 UTC

Thanks,
Andrew

On Fri, Sep 17, 2021 at 6:07 AM Yijie Shen <henry.yijies...@gmail.com> wrote:

I have received a lot of great help since I started working on DataFusion.
It will be fantastic to have an opportunity to communicate with community
members "face to face".

Best,
Yijie

On 2021/09/17 03:58:10 QP Hou wrote:

I would be interested in meeting with more contributors "face to face"
and chime in to help move these major initiatives forward in any way I
can :)

--
QP

On Thu, Sep 16, 2021 at 6:55 AM Rémi Dettai wrote:

I am also very interested in reinstating these events, at least
occasionally.

I do think that sharing some higher level goals and ideas in *informal*
discussions could help us understand each other better in our asynchronous
work (design documents, issues, PRs).

I also agree that no decision should be taken during these calls. An
interesting format could be that each time, one or two participants, on a
volunteering basis, share a small presentation of their work on/around
Arrow/Datafusion, the time they have available to spend on it, maybe their
overall vision of what they would like the project to become...

Remi

On Thu, Sep 16, 2021 at 15:37, Andrew Lamb <al...@influxdata.com> wrote:

A lot has been happening in DataFusion and Arrow since we stopped the
Rust specific sync calls (see mailing list thread [1] on the topic).

I would like to gauge interest in restarting the calls.

I think a call could be valuable to:
1. Help "put a face to the name" of some of the other contributors we are
working with
2. Discuss / synchronize on the goals and major initiatives from different
stakeholders to identify areas where more alignment is needed

Recent areas I am thinking about that might benefit from some in-person
discussion are the object store API [2] and table provider splits [3].

As always, we would ensure that minutes are sent out, no decisions are
made on the call and anything of substance is discussed on this mailing
list or in github issues / google docs.

Andrew

[1]

https://lists.apache.org/thread.html/rbeadc3b11bce8731c69617c8e0fe780a97055de0fcd739c378d9c0e1%40%3Cdev.arrow.apache.org%3E

[2] https://github.com/apache/arrow-datafusion/pull/950

[3] https://github.com/apache/arrow-datafusion/issues/1009
