[jira] [Created] (ARROW-7903) [Rust] Upgrade SQLParser dependency for DataFusion?

2020-02-20 Thread Max Burke (Jira)
Max Burke created ARROW-7903:


 Summary: [Rust] Upgrade SQLParser dependency for DataFusion?
 Key: ARROW-7903
 URL: https://issues.apache.org/jira/browse/ARROW-7903
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Max Burke


We've been running into a couple of issues that seem to stem from the sqlparser 
crate, such as it not supporting column names that begin with an underscore.

Unfortunately, upgrading DataFusion to sqlparser-0.5 (or even 0.3) seems to be 
non-trivial.

Is this planned?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Integration testing

2020-02-20 Thread Neal Richardson
Hi all,
To help us reach 1.0 with implementations of the Arrow format that are as
complete and thoroughly tested as possible, I've surveyed our integration
test suite and open issues and collected the information here:
https://docs.google.com/spreadsheets/d/1Yu68rn2XMBpAArUfCOP9LC7uHb06CQrtqKE5vQ4bQx4/edit#gid=782909347

I'll happily grant edit privileges on the doc to anyone who requests them.

This replaces the content on
https://cwiki.apache.org/confluence/display/ARROW/Columnar+Format+1.0+Milestone,
which was a bit stale. I carried over some notes from there as appropriate,
but most were no longer accurate. Hopefully this new document helps us
revise our understanding of what is implemented and makes clear what's left
to do.

Most of the outstanding issues (at least for C++ and Java) are already
ticketed in Jira and marked as blockers for 1.0, but let me know if you see
something missing.

Neal


[jira] [Created] (ARROW-7902) [Integration] Unskip nested dictionary integration tests

2020-02-20 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7902:
--

 Summary: [Integration] Unskip nested dictionary integration tests
 Key: ARROW-7902
 URL: https://issues.apache.org/jira/browse/ARROW-7902
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Integration
Reporter: Neal Richardson
 Fix For: 1.0.0








[jira] [Created] (ARROW-7901) [Integration][Go] Add null type (and integration test)

2020-02-20 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7901:
--

 Summary: [Integration][Go] Add null type (and integration test)
 Key: ARROW-7901
 URL: https://issues.apache.org/jira/browse/ARROW-7901
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go, Integration
Reporter: Neal Richardson
 Fix For: 1.0.0








[jira] [Created] (ARROW-7900) [Integration][JavaScript] Add null type integration test

2020-02-20 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7900:
--

 Summary: [Integration][JavaScript] Add null type integration test
 Key: ARROW-7900
 URL: https://issues.apache.org/jira/browse/ARROW-7900
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Integration, JavaScript
Reporter: Neal Richardson
 Fix For: 1.0.0








[jira] [Created] (ARROW-7899) [Integration][Java] null type integration test

2020-02-20 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7899:
--

 Summary: [Integration][Java] null type integration test
 Key: ARROW-7899
 URL: https://issues.apache.org/jira/browse/ARROW-7899
 Project: Apache Arrow
  Issue Type: Bug
  Components: Integration, Java
Reporter: Neal Richardson
 Fix For: 1.0.0








[jira] [Created] (ARROW-7898) [Python] Fix docstring formatting issues using numpydoc

2020-02-20 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7898:
--

 Summary: [Python] Fix docstring formatting issues using numpydoc
 Key: ARROW-7898
 URL: https://issues.apache.org/jira/browse/ARROW-7898
 Project: Apache Arrow
  Issue Type: Task
  Components: Documentation, Python
Reporter: Krisztian Szucs


This is going to require more than one patch, because we have more than a 
thousand violations, but we need to start somewhere.
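
For reference, here is a minimal sketch of the numpydoc section layout that such a cleanup would move docstrings toward. The function `scale` and its parameters are hypothetical and not taken from pyarrow; only the section names and underline style follow the numpydoc convention.

```python
def scale(values, factor=2):
    """
    Multiply every element of a sequence by a constant factor.

    Parameters
    ----------
    values : sequence of float
        The numbers to scale.
    factor : float, default 2
        The multiplier applied to each element.

    Returns
    -------
    list of float
        A new list holding the scaled values, in the original order.

    Examples
    --------
    >>> scale([1, 2, 3], factor=10)
    [10, 20, 30]
    """
    # Build a new list rather than mutating the input.
    return [v * factor for v in values]
```

Typical violations are a missing blank line before a section, an underline that does not match the section title length, or a parameter documented without its type.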





[jira] [Created] (ARROW-7897) [Packaging] Temporarily disable artifact uploading until we fix the deployment issues

2020-02-20 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7897:
--

 Summary: [Packaging] Temporarily disable artifact uploading until 
we fix the deployment issues
 Key: ARROW-7897
 URL: https://issues.apache.org/jira/browse/ARROW-7897
 Project: Apache Arrow
  Issue Type: Task
  Components: Packaging
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs


This will filter out the false negatives from the nightly build report until we 
fix the deployment errors in https://github.com/apache/arrow/pull/6458





[jira] [Created] (ARROW-7896) [C++] Refactor from #include guards to #pragma once

2020-02-20 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-7896:
---

 Summary: [C++] Refactor from #include guards to #pragma once
 Key: ARROW-7896
 URL: https://issues.apache.org/jira/browse/ARROW-7896
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.16.0
Reporter: Ben Kietzman
Assignee: Ben Kietzman
 Fix For: 1.0.0


All compilers we support handle {{#pragma once}} correctly, and it reduces our 
header boilerplate.
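
As an illustration of what the refactor amounts to per header, here is a rough Python sketch that rewrites a classic include guard into a `#pragma once` line. It is not the script used for the actual patch; the function name `to_pragma_once` and the guard macro `ARROW_FOO_H` are made up for the example, and it only handles the simple case where one guard wraps the whole file.

```python
def to_pragma_once(text):
    """Replace a whole-file #ifndef/#define/#endif guard with #pragma once.

    Returns the text unchanged when no such guard is recognised.
    """
    lines = text.splitlines()

    # Skip leading blank lines and // comments (e.g. a license header).
    i = 0
    while i < len(lines) and (
        not lines[i].strip() or lines[i].lstrip().startswith("//")
    ):
        i += 1

    # Find the last non-blank line, which should be the closing #endif.
    j = len(lines) - 1
    while j >= 0 and not lines[j].strip():
        j -= 1
    if i + 1 >= j:
        return text

    ifndef = lines[i].split()
    define = lines[i + 1].split()
    is_guard = (
        len(ifndef) == 2 and ifndef[0] == "#ifndef"
        and len(define) == 2 and define[0] == "#define"
        and define[1] == ifndef[1]
        and lines[j].lstrip().startswith("#endif")
    )
    if not is_guard:
        return text

    # Make sure the final #endif really closes the opening #ifndef,
    # i.e. the conditional nesting never returns to zero before line j.
    depth = 0
    for k in range(i, j + 1):
        stripped = lines[k].lstrip()
        if stripped.startswith("#if"):  # covers #if, #ifdef, #ifndef
            depth += 1
        elif stripped.startswith("#endif"):
            depth -= 1
            if depth == 0 and k != j:
                return text
    if depth != 0:
        return text

    # Keep the guarded body, dropping blank lines left before the #endif.
    body = lines[i + 2:j]
    while body and not body[-1].strip():
        body.pop()
    return "\n".join(lines[:i] + ["#pragma once"] + body) + "\n"


header = (
    "// License\n\n#ifndef ARROW_FOO_H\n#define ARROW_FOO_H\n\n"
    "int foo();\n\n#endif  // ARROW_FOO_H\n"
)
converted = to_pragma_once(header)
```

Running the function a second time on its own output leaves the text untouched, so a sketch like this could be applied repeatedly over a source tree without harm.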





[jira] [Created] (ARROW-7895) [Python] Remove more python 2.7 cruft

2020-02-20 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7895:
--

 Summary: [Python] Remove more python 2.7 cruft
 Key: ARROW-7895
 URL: https://issues.apache.org/jira/browse/ARROW-7895
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0








[jira] [Created] (ARROW-7894) [C++] DefineOptions should invoke add_definitions

2020-02-20 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-7894:
---

 Summary: [C++] DefineOptions should invoke add_definitions
 Key: ARROW-7894
 URL: https://issues.apache.org/jira/browse/ARROW-7894
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 0.16.0
Reporter: Ben Kietzman
Assignee: Ben Kietzman
 Fix For: 1.0.0


Several build options are mirrored as preprocessor definitions, for example 
{{ARROW_JEMALLOC}}. This could be made more consistent by requiring that every 
option in DefineOptions also define a preprocessor macro with 
{{add_definitions}}.





Re: CI setup on dedicated arm hardware

2020-02-20 Thread Krisztián Szűcs
On Thu, Feb 20, 2020 at 12:14 PM Wes McKinney  wrote:
>
> hi Ganesh,
>
> Thanks for writing.
>
> I've been working on setting up Buildkite (BK) as a way for third
> parties to attach machines to run builds on, with a free organization
> at
>
> https://buildkite.com/apache-arrow
>
> Configuring a new machine to accept builds is very easy [1] and takes
> less than 60 seconds on Linux or macOS (though maybe a bit more work
> on Windows). Currently I've attached 6 machines:
>
> * 2 CUDA-capable Linux x86
> * 3 armhf machines (not super high-powered), 1 CUDA-capable
> * 1 macOS
>
> We're still waiting on ASF Infra to twiddle some bits so that builds
> triggered in BK can report commit statuses on GitHub [2]
>
> It's possible we can use self-hosted GitHub Actions (GHA) for this
> also but the workflow for new machines to be contributed needs to be
> proven out.
I've already tried it out, and setting up self-hosted GitHub runners is just as
easy as with Buildkite. The drawbacks:
- I'm unsure how the tagging selection would work in practice [1]
- We won't have access to the runners dashboard because we lack admin rights
  for the apache/arrow repository, so we need to test out the workflow.

I've created an INFRA ticket to get some information and to track it:
https://issues.apache.org/jira/browse/INFRA-19875

[1] 
https://help.github.com/en/actions/configuring-and-managing-workflows/configuring-a-workflow#using-a-self-hosted-runner
>
> Thanks,
> Wes
>
> [1]: 
> https://github.com/ursa-labs/dev-tools/blob/master/buildkite/debian_agent_bootstrap.sh
> [2]: https://issues.apache.org/jira/browse/INFRA-19217
>
> On Wed, Feb 19, 2020 at 3:38 PM Ganesh Raju  wrote:
> >
> > Hi,
> > I am following up on the discussion from here
> > , with interest to have
> > dedicated arm hardware for CI setup. We can surely help with that if we get
> > a go-ahead from the project.
> >
> > Thanks,
> > Ganesh
> >
> > --
> > IRC: ganeshraju@#linaro on irc.freenode.net


[jira] [Created] (ARROW-7892) [Python] Expose FilesystemSource.format attribute

2020-02-20 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7892:


 Summary: [Python] Expose FilesystemSource.format attribute
 Key: ARROW-7892
 URL: https://issues.apache.org/jira/browse/ARROW-7892
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Joris Van den Bossche
Assignee: Joris Van den Bossche








Re: Using GitHub Actions to automate style and other fixes

2020-02-20 Thread Joris Van den Bossche
I am personally also not in favor of automatic fixes, but a bot you can
call would be a nice feature. It might be possible to let a workflow be
triggered by an issue comment? (
https://help.github.com/en/actions/reference/events-that-trigger-workflows#issue-comment-event-issue_comment
)

In addition, I think it would be nice to make the pre-commit hooks a
smoother experience. I started using it for pandas, and I really love it
that I don't have to worry anymore about failing CI for some style check
(but, in pandas, the pre-commit hook is simpler of course, since it's only
python checks).

Joris



On Thu, 20 Feb 2020 at 10:52, Wes McKinney  wrote:

> I'm also against _automatic_ (user-not-in-the-loop) changes. I think
> the model used by conda-forge works pretty well, where a bot lints each
> patch and lets you know if any changes are needed
>
>
> https://github.com/conda-forge/pyarrow-feedstock/pull/99#issuecomment-585264694
>
> However, note that bot-generated commits could risk running afoul of
> the ASF's code provenance requirements. If a commit to a patch is
> requested / opted-in-to by a contributor then I think there is less of
> an issue, and during the squash everything is rolled up and attributed
> to the contributor(s).
>
> On Thu, Feb 20, 2020 at 3:43 AM Jacek Pliszka wrote:
> >
> > As a beginner contributor I believe I can vote for linting as part of
> > the build.
> >
> > For me the best would be BEGINNER/ALL_CHECKS option in the Makefile
> > that does all the linting and all checks done in the build.
> >
> > And in the instruction it would be clearly suggested to use it.
> >
> > BR,
> >
> > Jacek
> >
> >
> > On Wed, 19 Feb 2020 at 20:40, Antoine Pitrou wrote:
> > >
> > >
> > > Hi,
> > >
> > > On Wed, 19 Feb 2020 09:59:04 -0800
> > > >
> > > > It doesn't have to be this way. With GitHub Actions, we can run
> > > > workflows that fix style and other violations and push the fix in a
> > > > commit back to the branch.
> > >
> > > I'm rather opposed to this.  Doing automated pushes behind the user's
> > > back will feel confusing and slightly obnoxious.
> > >
> > > > Style guides and linting are important for large projects like
> > > > Arrow, but we don't want to add unnecessary friction to the dev
> > > > process, particularly for new contributors--it's challenging enough
> > > > without it.
> > >
> > > Well, at worst, we can push fixes ourselves before merging a PR if the
> > > only remaining ones are style fixes.
> > >
> > > > What are your thoughts? If anyone objects to using GitHub Actions in
> > > > this way, would you be satisfied with blacklisting your fork (i.e. you
> > > > don't want it running on your branches but you don't mind if others
> > > > do)?
> > >
> > > Egoistically, that would satisfy me, but I'm not sure it would be less
> > > confusing to beginner contributors.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
>


Re: CI setup on dedicated arm hardware

2020-02-20 Thread Wes McKinney
hi Ganesh,

Thanks for writing.

I've been working on setting up Buildkite (BK) as a way for third
parties to attach machines to run builds on, with a free organization
at

https://buildkite.com/apache-arrow

Configuring a new machine to accept builds is very easy [1] and takes
less than 60 seconds on Linux or macOS (though maybe a bit more work
on Windows). Currently I've attached 6 machines:

* 2 CUDA-capable Linux x86
* 3 armhf machines (not super high-powered), 1 CUDA-capable
* 1 macOS

We're still waiting on ASF Infra to twiddle some bits so that builds
triggered in BK can report commit statuses on GitHub [2]

It's possible we can use self-hosted GitHub Actions (GHA) for this
also but the workflow for new machines to be contributed needs to be
proven out.

Thanks,
Wes

[1]: 
https://github.com/ursa-labs/dev-tools/blob/master/buildkite/debian_agent_bootstrap.sh
[2]: https://issues.apache.org/jira/browse/INFRA-19217

On Wed, Feb 19, 2020 at 3:38 PM Ganesh Raju  wrote:
>
> Hi,
> I am following up on the discussion from here
> , with interest to have
> dedicated arm hardware for CI setup. We can surely help with that if we get
> a go-ahead from the project.
>
> Thanks,
> Ganesh
>
> --
> IRC: ganeshraju@#linaro on irc.freenode.net


Re: Using GitHub Actions to automate style and other fixes

2020-02-20 Thread Wes McKinney
I'm also against _automatic_ (user-not-in-the-loop) changes. I think
the model used by conda-forge works pretty well, where a bot lints each
patch and lets you know if any changes are needed

https://github.com/conda-forge/pyarrow-feedstock/pull/99#issuecomment-585264694

However, note that bot-generated commits could risk running afoul of
the ASF's code provenance requirements. If a commit to a patch is
requested / opted-in-to by a contributor then I think there is less of
an issue, and during the squash everything is rolled up and attributed
to the contributor(s).

On Thu, Feb 20, 2020 at 3:43 AM Jacek Pliszka  wrote:
>
> As a beginner contributor I believe I can vote for linting as part of the 
> build.
>
> For me the best would be BEGINNER/ALL_CHECKS option in the Makefile
> that does all the linting and all checks done in the build.
>
> And in the instruction it would be clearly suggested to use it.
>
> BR,
>
> Jacek
>
>
> On Wed, 19 Feb 2020 at 20:40, Antoine Pitrou wrote:
> >
> >
> > Hi,
> >
> > On Wed, 19 Feb 2020 09:59:04 -0800
> > >
> > > It doesn't have to be this way. With GitHub Actions, we can run workflows
> > > that fix style and other violations and push the fix in a commit back to
> > > the branch.
> >
> > I'm rather opposed to this.  Doing automated pushes behind the user's
> > back will feel confusing and slightly obnoxious.
> >
> > > Style guides and linting are important for large projects like Arrow, but
> > > we don't want to add unnecessary friction to the dev process, particularly
> > > for new contributors--it's challenging enough without it.
> >
> > Well, at worst, we can push fixes ourselves before merging a PR if the
> > only remaining ones are style fixes.
> >
> > > What are your thoughts? If anyone objects to using GitHub Actions in this
> > > way, would you be satisfied with blacklisting your fork (i.e. you don't
> > > want it running on your branches but you don't mind if others do)?
> >
> > Egoistically, that would satisfy me, but I'm not sure it would be less
> > confusing to beginner contributors.
> >
> > Regards
> >
> > Antoine.
> >
> >


Re: Using GitHub Actions to automate style and other fixes

2020-02-20 Thread Jacek Pliszka
As a beginner contributor I believe I can vote for linting as part of the build.

For me the best would be a BEGINNER/ALL_CHECKS option in the Makefile
that runs all the linting and all the checks done in the build.

The instructions should then clearly suggest using it.

BR,

Jacek


On Wed, 19 Feb 2020 at 20:40, Antoine Pitrou wrote:
>
>
> Hi,
>
> On Wed, 19 Feb 2020 09:59:04 -0800
> >
> > It doesn't have to be this way. With GitHub Actions, we can run workflows
> > that fix style and other violations and push the fix in a commit back to
> > the branch.
>
> I'm rather opposed to this.  Doing automated pushes behind the user's
> back will feel confusing and slightly obnoxious.
>
> > Style guides and linting are important for large projects like Arrow, but
> > we don't want to add unnecessary friction to the dev process, particularly
> > for new contributors--it's challenging enough without it.
>
> Well, at worst, we can push fixes ourselves before merging a PR if the
> only remaining ones are style fixes.
>
> > What are your thoughts? If anyone objects to using GitHub Actions in this
> > way, would you be satisfied with blacklisting your fork (i.e. you don't
> > want it running on your branches but you don't mind if others do)?
>
> Egoistically, that would satisfy me, but I'm not sure it would be less
> confusing to beginner contributors.
>
> Regards
>
> Antoine.
>
>


Re: Python 2.7 support removed

2020-02-20 Thread Antoine Pitrou


Hi Micah,

Unlike 2.7, it's not onerous at all, so we can definitely maintain it
for a couple more months if desired.

Regards

Antoine.


On 20/02/2020 at 04:47, Micah Kornfield wrote:
> Hi Antoine,
> Do you have a timeline for the 3.5 support?  If possible could it maybe
> wait until after the next release or has it become onerous to maintain?
> 
> Thanks,
> Micah
> 
> On Wed, Feb 19, 2020 at 1:24 AM Antoine Pitrou  wrote:
> 
>>
>> Hello,
>>
>> Following the previous discussions on this mailing-list, we have
>> entirely removed Python 2.7 support from the codebase (see ARROW-5757 on
>> JIRA).  This deleted a lot of compatibility code that was spread around
>> the C++ and Python codebases.
>>
>> As a reminder, Python 2.7 has stopped being supported by the upstream
>> CPython project (symbolically, a last 2.7 release will be made around
>> the next PyCon US, in April).
>>
>> PyArrow now supports Python versions from 3.5 to 3.8, but there's also
>> an issue open to remove 3.5 support: see ARROW-5679 on JIRA.
>>
>> Regards
>>
>> Antoine.
>>
>