[jira] [Created] (ARROW-7566) [CI] Use more recent Miniconda on AppVeyor

2020-01-14 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7566:
-

 Summary: [CI] Use more recent Miniconda on AppVeyor
 Key: ARROW-7566
 URL: https://issues.apache.org/jira/browse/ARROW-7566
 Project: Apache Arrow
  Issue Type: Wish
  Components: Continuous Integration
Reporter: Antoine Pitrou


A newer conda might improve setup speed because of the new package format.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-7567) Bump Checkstyle from 6.19 to 8.18

2020-01-14 Thread Fokko Driesprong (Jira)
Fokko Driesprong created ARROW-7567:
---

 Summary: Bump Checkstyle from 6.19 to 8.18
 Key: ARROW-7567
 URL: https://issues.apache.org/jira/browse/ARROW-7567
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Affects Versions: 0.15.1
Reporter: Fokko Driesprong
 Fix For: 0.16.0








[jira] [Created] (ARROW-7568) Bump Apache Avro from 1.9.0 to 1.9.1

2020-01-14 Thread Fokko Driesprong (Jira)
Fokko Driesprong created ARROW-7568:
---

 Summary: Bump Apache Avro from 1.9.0 to 1.9.1
 Key: ARROW-7568
 URL: https://issues.apache.org/jira/browse/ARROW-7568
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Affects Versions: 0.15.1
Reporter: Fokko Driesprong
 Fix For: 0.16.0


Apache Avro 1.9.1 contains some bugfixes.





[jira] [Created] (ARROW-7569) [Python] Add API to map Arrow types to pandas ExtensionDtypes for to_pandas conversions

2020-01-14 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7569:


 Summary: [Python] Add API to map Arrow types to pandas 
ExtensionDtypes for to_pandas conversions
 Key: ARROW-7569
 URL: https://issues.apache.org/jira/browse/ARROW-7569
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Joris Van den Bossche
 Fix For: 0.16.0


ARROW-2428 was about adding such a mapping, and described three use cases (see 
this [comment|https://issues.apache.org/jira/browse/ARROW-2428?focusedCommentId=16914231&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16914231] 
for details):

* Basic roundtrip based on the pandas_metadata (in {{to_pandas}}, we check if 
the pandas_metadata specifies pandas extension dtypes, and if so, use these as 
the target dtype for the corresponding columns)
* Conversion for pyarrow extension types that can define their equivalent 
pandas extension dtype
* A way to override the default conversion (e.g. for the built-in types, or in 
the absence of pandas_metadata in the schema). This would require the user to 
be able to specify some mapping of pyarrow type or column name to the pandas 
extension dtype to use.

The PR that closed ARROW-2428 (https://github.com/apache/arrow/pull/5512) only 
covered the first two cases, and not the third case.

I think it is still interesting to also cover the third case in some way.  

An example use case is the new nullable dtypes that are being introduced in 
pandas (e.g. the nullable integer dtype).  Assume I want to read a parquet file 
into a pandas DataFrame using this nullable integer dtype. The pyarrow Table 
has no pandas_metadata indicating that this dtype should be used (unless it was 
created from a pandas DataFrame that was already using this dtype, but that 
will often not be the case), and the pyarrow.int64() type is also not an 
extension type that can define its equivalent pandas extension dtype. 
Currently, the only solution is to first read it into a pandas DataFrame (which 
will use floats for the integers if there are nulls), and then afterwards 
convert those floats back to a nullable integer dtype. 

A possible API for this could look like:

{code}
table.to_pandas(types_mapping={pa.int64(): pd.Int64Dtype()})
{code}

to indicate that you want to convert all columns of the pyarrow table with 
int64 type to a pandas column using the nullable Int64 dtype.
 










[jira] [Created] (ARROW-7570) Fix high severity issues

2020-01-14 Thread Fokko Driesprong (Jira)
Fokko Driesprong created ARROW-7570:
---

 Summary: Fix high severity issues
 Key: ARROW-7570
 URL: https://issues.apache.org/jira/browse/ARROW-7570
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Affects Versions: 0.15.1
Reporter: Fokko Driesprong
 Fix For: 0.16.0


Fixes high severity issues reported by LGTM:

[https://lgtm.com/projects/g/apache/arrow/?mode=list&lang=java&severity=error]





[jira] [Created] (ARROW-7571) Correct minimal java version on README

2020-01-14 Thread Fokko Driesprong (Jira)
Fokko Driesprong created ARROW-7571:
---

 Summary: Correct minimal java version on README
 Key: ARROW-7571
 URL: https://issues.apache.org/jira/browse/ARROW-7571
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Affects Versions: 0.15.1
Reporter: Fokko Driesprong
 Fix For: 0.16.0








[jira] [Created] (ARROW-7572) Enforce Maven 3.3+ as mentioned in README

2020-01-14 Thread Fokko Driesprong (Jira)
Fokko Driesprong created ARROW-7572:
---

 Summary: Enforce Maven 3.3+ as mentioned in README
 Key: ARROW-7572
 URL: https://issues.apache.org/jira/browse/ARROW-7572
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Affects Versions: 0.15.1
Reporter: Fokko Driesprong
 Fix For: 0.16.0








[NIGHTLY] Arrow Build Report for Job nightly-2020-01-14-0

2020-01-14 Thread Crossbow


Arrow Build Report for Job nightly-2020-01-14-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0

Failed Tasks:
- centos-6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-azure-centos-6
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-travis-gandiva-jar-osx
- test-conda-r-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-circle-test-conda-r-3.6

Succeeded Tasks:
- centos-7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-azure-centos-7
- centos-8:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-azure-centos-8
- conda-linux-gcc-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-azure-conda-linux-gcc-py27
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-azure-conda-osx-clang-py27
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-azure-conda-win-vs2015-py38
- debian-buster:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-azure-debian-buster
- debian-stretch:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-azure-debian-stretch
- gandiva-jar-trusty:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-travis-gandiva-jar-trusty
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-travis-homebrew-cpp
- macos-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-travis-macos-r-autobrew
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-circle-test-conda-cpp
- test-conda-python-2.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-circle-test-conda-python-2.7-pandas-latest
- test-conda-python-2.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-circle-test-conda-python-2.7
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-circle-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-circle-test-conda-python-3.7-spark-master
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-circle-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-circle-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-circle-test-conda-python-3.7
- test-conda-python-3.8-dask-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-01-14-0-circle-test-conda-python-3.8-dask-master
- test-conda-python-3.8-pandas-

[jira] [Created] (ARROW-7573) [Rust] Reduce boxing and cleanup

2020-01-14 Thread Gurwinder Singh (Jira)
Gurwinder Singh created ARROW-7573:
--

 Summary: [Rust] Reduce boxing and cleanup
 Key: ARROW-7573
 URL: https://issues.apache.org/jira/browse/ARROW-7573
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Gurwinder Singh
Assignee: Gurwinder Singh








[jira] [Created] (ARROW-7574) [Rust] FileSource read implementation is seeking for each single byte

2020-01-14 Thread Jörn Horstmann (Jira)
Jörn Horstmann created ARROW-7574:
-

 Summary: [Rust] FileSource read implementation is seeking for each 
single byte
 Key: ARROW-7574
 URL: https://issues.apache.org/jira/browse/ARROW-7574
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 0.16.0
Reporter: Jörn Horstmann


On the current master branch:
{code:java}
$ RUST_BACKTRACE=1 strace target/debug/parquet-read tripdata.parquet{code}
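
To illustrate why a seek before every single-byte read is costly, here is a 
rough sketch in Python. It is not the Rust FileSource code; the class and 
function names are hypothetical, and an in-memory buffer that counts seek 
calls stands in for a real file.

```python
import io


class CountingSource(io.BytesIO):
    """In-memory stand-in for a file that counts seek() calls."""

    def __init__(self, data):
        self.seeks = 0
        super().__init__(data)

    def seek(self, offset, whence=io.SEEK_SET):
        self.seeks += 1
        return super().seek(offset, whence)


def read_seek_per_byte(src, start, length):
    """Seek before each one-byte read (the pattern named in the summary)."""
    out = bytearray()
    for pos in range(start, start + length):
        src.seek(pos)
        out += src.read(1)
    return bytes(out)


def read_chunked(src, start, length, chunk_size=8192):
    """Seek once, then read the requested range in large chunks."""
    src.seek(start)
    out = bytearray()
    while len(out) < length:
        chunk = src.read(min(chunk_size, length - len(out)))
        if not chunk:  # end of data reached early
            break
        out += chunk
    return bytes(out)


data = bytes(range(256)) * 4
per_byte = CountingSource(data)
chunked = CountingSource(data)
same_bytes = read_seek_per_byte(per_byte, 10, 100) == read_chunked(chunked, 10, 100)
```

Both helpers return the same bytes, but the first issues one seek per byte 
while the second issues a single seek for the whole range.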
 





[jira] [Created] (ARROW-7575) [R] Linux binary packaging followup

2020-01-14 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7575:
--

 Summary: [R] Linux binary packaging followup
 Key: ARROW-7575
 URL: https://issues.apache.org/jira/browse/ARROW-7575
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 0.16.0


After ARROW-6793 merged, I set up some nightly binary building CI and need to 
iterate on the install script and documentation to reflect what is available 
there.





[jira] [Created] (ARROW-7576) [C++][Dev] Improve fuzzing setup

2020-01-14 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7576:
-

 Summary: [C++][Dev] Improve fuzzing setup
 Key: ARROW-7576
 URL: https://issues.apache.org/jira/browse/ARROW-7576
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++, Developer Tools
Reporter: Antoine Pitrou








[jira] [Created] (ARROW-7577) [C++][CI] Check fuzzer setup in CI

2020-01-14 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7577:
-

 Summary: [C++][CI] Check fuzzer setup in CI
 Key: ARROW-7577
 URL: https://issues.apache.org/jira/browse/ARROW-7577
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++, Continuous Integration
Reporter: Antoine Pitrou


Perhaps as a cron job.





[jira] [Created] (ARROW-7578) [R] Add support for datasets with IPC files and with multiple sources

2020-01-14 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7578:
--

 Summary: [R] Add support for datasets with IPC files and with 
multiple sources
 Key: ARROW-7578
 URL: https://issues.apache.org/jira/browse/ARROW-7578
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Dataset, R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 0.16.0








[jira] [Created] (ARROW-7579) [FlightRPC] Make Handshake optional

2020-01-14 Thread David Li (Jira)
David Li created ARROW-7579:
---

 Summary: [FlightRPC] Make Handshake optional
 Key: ARROW-7579
 URL: https://issues.apache.org/jira/browse/ARROW-7579
 Project: Apache Arrow
  Issue Type: Bug
  Components: FlightRPC
Reporter: David Li
 Fix For: 1.0.0


We should make it possible to _not_ invoke Handshake for services that don't 
want it. This matters especially with flight-grpc, where the standard gRPC 
authentication mechanisms don't know about Flight and try to authenticate the 
Handshake endpoint; it's easy to forget to configure this endpoint to bypass 
authentication.





[jira] [Created] (ARROW-7580) [Website] 0.16 release post

2020-01-14 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7580:
--

 Summary: [Website] 0.16 release post
 Key: ARROW-7580
 URL: https://issues.apache.org/jira/browse/ARROW-7580
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Website
Reporter: Neal Richardson








Re: Timeline for next major release [was Re: Looking to 1.0]

2020-01-14 Thread Neal Richardson
Hi all, to help us get ready, I've started a draft blog post for the 0.16
release: https://github.com/apache/arrow-site/pull/41

We'll need to fill in the sections. Feel free to push edits to my branch,
or you can also email me (personally is fine) and I can paste them in.

Neal


On Thu, Jan 9, 2020 at 5:37 PM Jacques Nadeau  wrote:

> Understood and appreciated. Yeah, it can become a bit of a mess.
>
> On Thu, Jan 9, 2020 at 12:22 PM Wes McKinney  wrote:
>
> > Will do -- there were many C++ and Python-related issues that I think
> > were put in 1.0.0 / 0.16.0 overly optimistically and so I removed the
> > Fix Version entirely (some of these had been pushed off 3-4 major
> > releases ago). I may have removed some Fix Versions from other
> > components that should have been rolled over -- sorry about that. It's
> > hard to judge on some issues that have been open for 6-12 months or
> > more.
> >
> > In general I think we should try to be more conservative about what
> > issues we pre-emptively assign fix versions -- there may be a more
> > constructive way that we can prioritize issues and distinguish between
> > "optimistic" / nice-to-have issues and "must do to release" issues.
> >
> > On Thu, Jan 9, 2020 at 12:42 PM Jacques Nadeau wrote:
> > >
> > > It would be helpful that when something is assigned to a release and you
> > > want to push it out, you push it to the next release as opposed to removing
> > > a fix version entirely. Thanks!
> > >
> > > On Tue, Jan 7, 2020 at 10:26 AM Wes McKinney wrote:
> > >
> > > > I just renamed the 1.0.0 release version in JIRA to 0.16.0 and will
> > > > work on removing issues that are not necessary to be able to release
> > > > (others, please help). If we make miraculous progress with the 1.0.0
> > > > columnar format blockers (per discussion below), we can change this
> > > > back, but I think either way we should put ourselves on a critical
> > > > path to have an RC cut by Friday January 24. Does that seem doable?
> > > >
> > > > On Tue, Jan 7, 2020 at 10:25 AM Wes McKinney wrote:
> > > > >
> > > > > We absolutely should have a list of exactly what needs to be done to
> > > > > put out the 1.0.0 release, but based on what we know needs to be done
> > > > > I am not optimistic that it can all be accomplished before the end of
> > > > > January. That doesn't mean that we should assume these things won't
> > > > > get done before March/April time frame. If they get done sooner, let's
> > > > > release 1.0.0 sooner.
> > > > >
> > > > > On Mon, Jan 6, 2020 at 6:03 PM Neal Richardson wrote:
> > > > > >
> > > > > > I'm all for maintaining a regular cadence of releases, but before we cast
> > > > > > aside the idea of 1.0, I'd still encourage us to do the work of enumerating
> > > > > > what truly must happen before we call a release 1.0 so that we can get it
> > > > > > done. Otherwise, in April we're going to be talking about doing a 0.17
> > > > > > release.
> > > > > >
> > > > > > I believe I've found the issues that Wes referenced and added them as
> > > > > > "blockers" to 1.0.0. That brings the total blocker count listed on
> > > > > > https://cwiki.apache.org/confluence/display/ARROW/Arrow+1.0.0+Release to 10
> > > > > > issues, though some may be overlapping/redundant. Do we think this is an
> > > > > > exhaustive list of blockers? Should some of these be downgraded to
> > > > > > not-blocking? If we were to resolve all 10 of these issues, would we have
> > > > > > consensus that we're ready for 1.0?
> > > > > >
> > > > > > Would it help to update this wiki, which seems pretty stale at this point?
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/ARROW/Columnar+Format+1.0+Milestone
> > > > > >
> > > > > > Thanks,
> > > > > > Neal
> > > > > >
> > > > > >
> > > > > > On Mon, Jan 6, 2020 at 11:40 AM Bryan Cutler wrote:
> > > > > >
> > > > > > > I agree on a 0.16.0 release. In the meantime I'll try to help out with
> > > > > > > getting the Java side ready for 1.0.
> > > > > > >
> > > > > > > On Sat, Jan 4, 2020 at 7:21 PM Fan Liya wrote:
> > > > > > >
> > > > > > > > Hi Jacques,
> > > > > > > >
> > > > > > > > ARROW-4526 is interesting. I would like to try to resolve it.
> > > > > > > > Thanks a lot for the information.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Liya Fan
> > > > > > > >
> > > > > > > >
> > > > > > > > On Sun, Jan 5, 2020 at 6:14 AM Jacques Nadeau <jacq...@apache.org> wrote:
> > > > > > > >
> > > > > > > > > The third ticket I was commenting on was ARROW-4526.
> > > > > > > > >
> > > > > > > > > Fan, do you want to take a shot at that one?
> > > > > > > > >
> > > > > > > > > On Fri, Jan 3, 2020 at 8:16 PM Fan Liya <liya.fa...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > >   Hi Jacques,
> > > > > > > > > >
> > > > > >

[jira] [Created] (ARROW-7581) [R] Documentation/polishing for 0.16 release

2020-01-14 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-7581:
--

 Summary: [R] Documentation/polishing for 0.16 release
 Key: ARROW-7581
 URL: https://issues.apache.org/jira/browse/ARROW-7581
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 0.16.0


Includes updating NEWS.md





[jira] [Created] (ARROW-7582) [Rust][Flight] Unable to compile arrow.flight.protocol.rs

2020-01-14 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7582:
--

 Summary: [Rust][Flight] Unable to compile arrow.flight.protocol.rs
 Key: ARROW-7582
 URL: https://issues.apache.org/jira/browse/ARROW-7582
 Project: Apache Arrow
  Issue Type: Bug
  Components: Rust
Reporter: Krisztian Szucs


Not sure exactly why; perhaps it has something to do with the recently updated 
dependencies: https://github.com/apache/arrow/runs/389937707

cc [~andygrove] 


