[jira] [Created] (ARROW-7224) [C++][Datasets] Partition level filters should be able to provide filtering to file systems

2019-11-20 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-7224:
--

 Summary: [C++][Datasets] Partition level filters should be able to 
provide filtering to file systems
 Key: ARROW-7224
 URL: https://issues.apache.org/jira/browse/ARROW-7224
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, C++ - Dataset
Reporter: Micah Kornfield


When providing a filter for partitions, it should be possible in some cases to 
use it to optimize file system list calls.  This can greatly improve the speed 
of reading data from partitions because fewer directories/files need 
to be explored/expanded.  I've fallen behind on the dataset code, but I want to 
make sure this issue is tracked someplace.  This came up in the SO question linked 
below (feel free to correct my analysis if I missed the functionality 
someplace).
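
A minimal sketch of the idea, not the Arrow dataset implementation: the
"file system" here is a toy nested dict, and the names list_files,
partition_filter and counter are hypothetical. It only illustrates how a
filter on hive-style "key=value" directory names can prevent whole
subtrees from ever being listed.

```python
def list_files(tree, partition_filter, prefix="", counter=None):
    """Yield file paths, skipping partition directories the filter rejects.

    tree: nested dict; a value of None marks a file, a dict marks a directory.
    partition_filter(key, value): returns False to prune a "key=value" directory.
    counter: optional list that records every directory actually listed.
    """
    if counter is not None:
        counter.append(prefix or "/")
    for name, child in sorted(tree.items()):
        path = f"{prefix}/{name}"
        if child is None:  # a file: always reported
            yield path
            continue
        key, sep, value = name.partition("=")
        if sep and not partition_filter(key, value):
            continue  # pruned: this subtree is never listed
        yield from list_files(child, partition_filter, path, counter)


dataset = {
    "year=2018": {
        "month=11": {"part-0.parquet": None},
        "month=12": {"part-0.parquet": None},
    },
    "year=2019": {"month=11": {"part-0.parquet": None}},
}

listed = []
only_2019 = lambda key, value: key != "year" or value == "2019"
files = list(list_files(dataset, only_2019, counter=listed))
print(files)   # files under year=2019 only
print(listed)  # directories that had to be listed
```

With the filter applied during traversal, the year=2018 subtree costs no
list calls at all, which is where the savings on a remote store such as
S3 would come from.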

Reference: 
[https://stackoverflow.com/questions/58868584/pyarrow-parquetdataset-read-is-slow-on-a-hive-partitioned-s3-dataset-despite-u/58951477#58951477]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [Discuss][Java] Provide default for io.netty.tryReflectionSetAccessible to prevent errors

2019-11-20 Thread Bryan Cutler
I'm not sure what the best way to handle this is.  Ideally we would use an
alternative that doesn't require setting a property, but I don't know Netty
well enough to make a recommendation. I also want to be careful not to
introduce anything that would hurt performance or cause any other side
effects. I made https://issues.apache.org/jira/browse/ARROW-7223 to track
this, we can continue the discussion there and I will try to do some
research into possible solutions.

On Wed, Nov 20, 2019 at 2:51 AM Fan Liya  wrote:

> Hi Bryan,
>
> Thanks for bringing this up.
> +1 for the change.
>
> I am not sure what the right place to override the JVM property is.
> It is possible that when we try to override it (possibly in a static
> block), the old property value has already been read by the Netty library.
> To avoid this problem, do we need to control the order of class loading?
>
> Best,
> Liya Fan
>
> On Mon, Nov 18, 2019 at 3:17 PM Micah Kornfield 
> wrote:
>
> > This sounds reasonable to me.  At this point I think having our consumers
> > have a better experience is more important than the library purity concerns
> > I've had in the past.
> >
> > Do we need to handle jdk8 as a special case?  Do you think it pays to try
> > to find an alternate library that doesn't require special flags for
> > whatever we are using this functionality for?
> >
> > Thanks,
> > Micah
> >
> > On Sunday, November 17, 2019, Bryan Cutler  wrote:
> >
> > > After ARROW-3191 [1], consumers of Arrow Java with JDK 9 and above are
> > > required to set the JVM property
> > > "io.netty.tryReflectionSetAccessible=true" at startup, each time Arrow
> > > code is run, as documented at [2]. Not doing this will result in the
> > > error "java.lang.UnsupportedOperationException: sun.misc.Unsafe or
> > > java.nio.DirectByteBuffer.<init>(long, int) not available". This is due
> > > to a part of the Netty codebase, and I'm not sure if there is any way
> > > around it, but I don't think it's the correct behavior for Arrow
> > > out-of-the-box to fail with a 3rd party error by default. This has come
> > > up before in our own unit testing [3], and most recently when upgrading
> > > Arrow in Spark [4].
> > >
> > > I'd like to propose that Arrow Java change to the following behavior:
> > >
> > > 1) check to see if the property io.netty.tryReflectionSetAccessible has
> > > been set
> > > 2) if not set, automatically set to "true"
> > > 3) else if set to "false", catch the Netty error and prepend the error
> > > message with the suggested setting of "true"
> > >
> > > What do other devs think?
> > >
> > > [1] https://issues.apache.org/jira/browse/ARROW-3191
> > > [2] https://github.com/apache/arrow/tree/master/java#java-properties
> > > [3] https://issues.apache.org/jira/browse/ARROW-5412
> > > [4] https://github.com/apache/spark/pull/26552
> > >
> >
>


[jira] [Created] (ARROW-7223) [Java] Provide default setting of io.netty.tryReflectionSetAccessible=true

2019-11-20 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-7223:
---

 Summary: [Java] Provide default setting of 
io.netty.tryReflectionSetAccessible=true
 Key: ARROW-7223
 URL: https://issues.apache.org/jira/browse/ARROW-7223
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Bryan Cutler


After ARROW-3191, consumers of Arrow Java with JDK 9 and above are required 
to set the JVM property "io.netty.tryReflectionSetAccessible=true" at startup, 
each time Arrow code is run, as documented at 
https://github.com/apache/arrow/tree/master/java#java-properties. Not doing 
this will result in the error "java.lang.UnsupportedOperationException: 
sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available", 
making Arrow unusable out-of-the-box.

This proposes to automatically set the property if not already set in the 
following steps:

1) check to see if the property io.netty.tryReflectionSetAccessible has been set
2) if not set, automatically set to "true"
3) else if set to "false", catch the Netty error and prepend the error message 
with the suggested setting of "true"
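
The three steps can be sketched in a language-neutral way. This is only a
model, written in Python for brevity: the dict props stands in for the JVM
system properties (in Java these would be read and written with
System.getProperty / System.setProperty), and the names apply_default,
call_with_hint and UnsupportedOperation are hypothetical. It does not
address whether Netty has already read the property before the default is
applied.

```python
PROPERTY = "io.netty.tryReflectionSetAccessible"


class UnsupportedOperation(Exception):
    """Stand-in for java.lang.UnsupportedOperationException."""


def apply_default(props):
    """Steps 1 and 2: set the property to "true" only if the user left it unset."""
    if props.get(PROPERTY) is None:
        props[PROPERTY] = "true"


def call_with_hint(props, action):
    """Step 3: if the user explicitly chose "false", prepend a hint to the error."""
    try:
        return action()
    except UnsupportedOperation as err:
        if props.get(PROPERTY) == "false":
            hint = f"Try setting -D{PROPERTY}=true. "
            raise UnsupportedOperation(hint + str(err)) from err
        raise


unset = {}
apply_default(unset)  # property becomes "true"

explicit = {PROPERTY: "false"}
apply_default(explicit)  # the user's choice is left alone
```

An explicit "false" is respected rather than overwritten, so the only
change a user who set the flag would see is a more helpful error message.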





[DISCUSS][C++] Pointer name aliasing

2019-11-20 Thread Micah Kornfield
A recent PR for datasets [1] seems to have introduced the convention of
aliasing "std::shared_ptr<Type>" with "TypePtr" for some type.  I think in
the past we had decided not to use a convention like this, but I can't find
the thread.  IMO this generally makes the code less understandable,
but this is a matter of taste.

Before the pattern gets too ingrained in the code base I just wanted to
make sure we discussed using this on the mailing list and made sure there
was a consensus on when to use the pattern.

Thanks,
Micah

[1] https://github.com/apache/arrow/pull/5857/files


[VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)

2019-11-20 Thread Micah Kornfield
Hello,
As discussed on [1], I've proposed a PR [2] with the following
clarifications and changes:

1.  It is not required that all dictionary batches occur at the beginning
of the IPC stream format (if the first record batch has an all-null
dictionary-encoded column, the null column's dictionary might not be sent
until later in the stream).

2.  A second dictionary batch for the same ID that is not a "delta batch"
in an IPC stream indicates the dictionary should be replaced.

3.  The file format can only contain one "NON-delta" dictionary batch and
multiple "delta" dictionary batches. Dictionary replacement is not
supported in the file format.

4.  An enum is added to the dictionary metadata to allow for possible
future changes in the format in which dictionary batches can be sent (the
most likely addition would be an array Map).  The enum is needed as a
placeholder to allow forward compatibility past the 1.0.0 release.
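
As an illustration of points 1 and 2 (a sketch only, not code from any
Arrow implementation), a stream reader could keep its dictionaries in a
map keyed by ID and update it whenever a dictionary batch arrives. The
function names and the use of plain Python lists are assumptions made for
the example, and raising on a delta for an unknown ID is this sketch's
choice.

```python
def apply_dictionary_batch(dictionaries, dict_id, values, is_delta):
    """Update reader state for one dictionary batch received on an IPC stream."""
    if is_delta:
        if dict_id not in dictionaries:
            # Sketch's choice: a delta needs an existing dictionary to extend.
            raise ValueError(f"delta batch for unknown dictionary id {dict_id}")
        dictionaries[dict_id] = dictionaries[dict_id] + list(values)  # append
    else:
        # First batch for this ID, or a replacement of the previous dictionary.
        dictionaries[dict_id] = list(values)
    return dictionaries


def decode(dictionaries, dict_id, indices):
    """Look up indices in the current dictionary; None stays null."""
    current = dictionaries[dict_id]
    return [None if i is None else current[i] for i in indices]


state = {}
apply_dictionary_batch(state, 0, ["a", "b"], is_delta=False)
apply_dictionary_batch(state, 0, ["c"], is_delta=True)   # dictionary grows
after_delta = decode(state, 0, [0, 2, None])
apply_dictionary_batch(state, 0, ["x"], is_delta=False)  # dictionary replaced
after_replace = decode(state, 0, [0])
```

Because a dictionary batch may arrive at any point in the stream, the map
is consulted at decode time rather than being fixed up front.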

If accepted there will be work in all implementations to make sure that
they cover the edge cases highlighted and additional integration testing
will be needed.

Please vote whether to accept these additions. The vote will be open for at
least 72 hours.

[ ] +1 Accept these changes to the specification
[ ] +0
[ ] -1 Do not accept the changes because...

Thanks,
Micah


[1]
https://lists.apache.org/thread.html/d0f137e9db0abfcfde2ef879ca517a710f620e5be4dd749923d22c37@%3Cdev.arrow.apache.org%3E
[2] https://github.com/apache/arrow/pull/5585


[jira] [Created] (ARROW-7222) [Python] Wipe any existing generated Python API documentation when updating website

2019-11-20 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-7222:
---

 Summary: [Python] Wipe any existing generated Python API 
documentation when updating website
 Key: ARROW-7222
 URL: https://issues.apache.org/jira/browse/ARROW-7222
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney
 Fix For: 1.0.0


Removed APIs are persisting in Google searches, e.g.

https://arrow.apache.org/docs/python/generated/pyarrow.Column.html





Re: ConcatenateTables APIs

2019-11-20 Thread Wes McKinney
I agree with introducing ConcatenateTables with an options argument
(which can have parameters added to it without disrupting public APIs
too much). It would be good to do this sooner rather than later.
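
To make the shape of that API concrete, here is a rough sketch. It is
written in Python rather than C++, tables are modeled as plain dicts of
column lists, and ConcatenateOptions.promote_missing_to_null is a
hypothetical field name; the actual promotion rules in the PR are richer
than filling missing columns with nulls.

```python
from dataclasses import dataclass


@dataclass
class ConcatenateOptions:
    # Hypothetical option: when True, a column missing from a table is
    # filled with nulls instead of being treated as a schema mismatch.
    promote_missing_to_null: bool = False


def concatenate_tables(tables, options=None):
    """Concatenate dict-of-list tables; behavior is selected via options."""
    options = options or ConcatenateOptions()
    tables = list(tables)
    if not tables:
        raise ValueError("need at least one table")
    columns = list(tables[0])
    for table in tables[1:]:
        if list(table) != columns:
            if not options.promote_missing_to_null:
                raise ValueError("schemas differ; enable promotion to unify them")
            columns += [name for name in table if name not in columns]
    result = {name: [] for name in columns}
    for table in tables:
        num_rows = len(next(iter(table.values()), []))
        for name in columns:
            result[name].extend(table.get(name, [None] * num_rows))
    return result


t1 = {"a": [1, 2]}
t2 = {"a": [3], "b": ["x"]}
merged = concatenate_tables([t1, t2], ConcatenateOptions(promote_missing_to_null=True))
```

New behaviors then become new fields on the options object with safe
defaults, so existing callers of the single function keep working.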


On Fri, Nov 15, 2019 at 12:22 AM Micah Kornfield  wrote:
>
> This sounds like a reasonable design to me.  One question I had about
> SchemaUnificationOptions: will those only be applicable to Arrow schemas, or
> does it make sense to extend them to other use cases (like the DataSet APIs)?
>
> Cheers,
> Micah
>
> On Fri, Nov 8, 2019 at 10:27 AM Zhuo Peng  wrote:
>
> > Hi,
> >
> > https://github.com/apache/arrow/pull/5534 introduced
> > ConcatenateTablesWithPromotion(). And there is already a
> > ConcatenateTables() function which behaves differently (it requires the
> > tables to have the same schema). Wes raised a concern in that PR [1] that we
> > might end up having many ConcatenateTables*() variants as there are various
> > things that can be tweaked and he suggested to introduce a
> > ConcatenateOptions so there is only one ConcatenateTables() function.
> >
> > While I'm onboard with that idea, I wanted to double check that there is a
> > consensus that we should (as of today) merge ConcatenateTables() and
> > ConcatenateTablesWithPromotion(), and have an option to do promotion or not
> > (as in an earlier comment in the PR, @bkietz advised otherwise, but maybe
> > at that point we didn't realize there were potentially many variants).
> >
> > [1] https://github.com/apache/arrow/pull/5534#discussion_r343745573
> >
> >
> > Thanks,
> >
> > Zhuo
> >


Re: Building Arrow 0.15.1 using dependencies in local source folder

2019-11-20 Thread Wes McKinney
I agree that the *_ROOT variables should be the way. If you find one
that does not work, please open a JIRA issue.

I don't think this is documented well enough in

http://arrow.apache.org/docs/developers/cpp.html#build-dependency-management

so I'm opening an issue

https://issues.apache.org/jira/browse/ARROW-7221

On Thu, Nov 14, 2019 at 2:49 PM Neal Richardson
 wrote:
>
> I am not an expert on this, but it seems you can specify `*_ROOT` arguments
> to cmake, like
> https://github.com/apache/arrow/blob/master/ci/PKGBUILD#L90-L91
>
> Maybe that does what you need?
>
> Neal
>
>
> On Thu, Nov 14, 2019 at 12:45 PM Tahsin Hassan 
> wrote:
>
> > Hi all,
> >
> > I am trying to build Arrow 0.15.1. The dependencies for Arrow, e.g.
> > thrift and double-conversion, are in a local source folder and we need to
> > build the dependencies from that location.
> >
> > I read up on
> >
> > https://github.com/apache/arrow/blob/master/docs/source/developers/cpp.rst#offline-builds
> >
> >   *   BUNDLED: Building the dependency automatically from source
> >   *   SYSTEM: Finding the dependency in system paths using CMake's
> > built-in find_package function, or using pkg-config for packages that do
> > not have this feature
> > Unfortunately, that’s not exactly what I want.
> > and
> >
> > https://github.com/apache/arrow/blob/master/docs/source/developers/cpp.rst#offline-builds
> > but, that basically downloads the tar(s) into a folder extracts them and
> > sets up build using that.
> > e.g.
> > $./download_dependencies.sh /sandbox/someArrowStuff/
> > # Environment variables for offline Arrow build
> > export ARROW_AWSSDK_URL=/sandbox/someArrowStuff/aws-sdk-cpp-1.7.160.tar.gz
> > export ARROW_BOOST_URL=/sandbox/someArrowStuff/boost-1.67.0.tar.gz
> > export ARROW_BROTLI_URL=/sandbox/someArrowStuff/brotli-v1.0.7.tar.gz
> > …
> >
> >
> > What I wanted was a set of environment variables that allow setting a
> > source folder path, e.g.
> > export ARROW_BOOST_MYPATH=/sandbox/someArrowStuff/ 3p/boost/
> > where /sandbox/someArrowStuff/ 3p/boost/ already holds the necessary boost
> > source folder and ARROW_BOOST_MYPATH is some kind of variable to help locate
> > the necessary source folder.
> >
> > Is there some option like that? Where can I dig for more information
> > regarding that?
> >
> > Thanks,
> > Tahsin
> >
> >
> >
> >
> >
> >


[jira] [Created] (ARROW-7221) [C++][Documentation] Document how to set installed location for individual toolchain components

2019-11-20 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-7221:
---

 Summary: [C++][Documentation] Document how to set installed 
location for individual toolchain components
 Key: ARROW-7221
 URL: https://issues.apache.org/jira/browse/ARROW-7221
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


This is not well documented in 

http://arrow.apache.org/docs/developers/cpp.html#build-dependency-management

The CMake variables are of the form {{$DEPENDENCY_NAME_ROOT}}.





Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-11-20-0

2019-11-20 Thread Krisztián Szűcs
Currently I'm busy with migrating our development docker
images to the apache organization [1], but afterwards I'll go
through the failures. Any help is appreciated!

[1]: https://issues.apache.org/jira/browse/ARROW-7116

On Wed, Nov 20, 2019 at 2:01 PM Crossbow  wrote:
>
>
> Arrow Build Report for Job nightly-2019-11-20-0
>
> All tasks: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0
>
> Failed Tasks:
> - conda-osx-clang-py27:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-osx-clang-py27
> - conda-osx-clang-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-osx-clang-py36
> - conda-osx-clang-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-osx-clang-py37
> - test-conda-python-2.7-pandas-master:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-conda-python-2.7-pandas-master
> - test-conda-python-3.7-dask-master:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-conda-python-3.7-dask-master
> - test-conda-python-3.7-pandas-master:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-conda-python-3.7-pandas-master
> - test-conda-python-3.7-spark-master:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-conda-python-3.7-spark-master
> - test-fedora-29-python-3:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-fedora-29-python-3
> - test-ubuntu-14.04-cpp:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-ubuntu-14.04-cpp
> - test-ubuntu-18.04-python-3:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-ubuntu-18.04-python-3
> - test-ubuntu-fuzzit:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-ubuntu-fuzzit
> - wheel-manylinux1-cp27m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux1-cp27m
> - wheel-manylinux1-cp27mu:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux1-cp27mu
> - wheel-manylinux1-cp35m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux1-cp35m
> - wheel-manylinux1-cp36m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux1-cp36m
> - wheel-manylinux1-cp37m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux1-cp37m
> - wheel-manylinux2010-cp27m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux2010-cp27m
> - wheel-manylinux2010-cp27mu:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux2010-cp27mu
> - wheel-manylinux2010-cp35m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux2010-cp35m
> - wheel-manylinux2010-cp36m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux2010-cp36m
> - wheel-manylinux2010-cp37m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux2010-cp37m
> - wheel-osx-cp35m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-osx-cp35m
> - wheel-osx-cp36m:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-osx-cp36m
>
> Succeeded Tasks:
> - centos-6:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-centos-6
> - centos-7:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-centos-7
> - centos-8:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-centos-8
> - conda-linux-gcc-py27:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-linux-gcc-py27
> - conda-linux-gcc-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-linux-gcc-py36
> - conda-linux-gcc-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-linux-gcc-py37
> - conda-win-vs2015-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-win-vs2015-py36
> - conda-win-vs2015-py37:
>   URL: 
> 

[jira] [Created] (ARROW-7220) [CI] Docker compose (github actions) Mac Python 3 build is using Python 2

2019-11-20 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7220:


 Summary: [CI] Docker compose (github actions) Mac Python 3 build 
is using Python 2
 Key: ARROW-7220
 URL: https://issues.apache.org/jira/browse/ARROW-7220
 Project: Apache Arrow
  Issue Type: Test
Reporter: Joris Van den Bossche


The "AMD64 MacOS 10.15 Python 3" build is also running in python 2.

Possibly related to how brew is installing python 2 / 3, or because it is using 
the system python, ... (not familiar with mac)





[jira] [Created] (ARROW-7219) [CI][Python] Install pickle5 in the conda-python docker image for python version 3.6

2019-11-20 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7219:
--

 Summary: [CI][Python] Install pickle5 in the conda-python docker 
image for python version 3.6
 Key: ARROW-7219
 URL: https://issues.apache.org/jira/browse/ARROW-7219
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, Python
Reporter: Krisztian Szucs
 Fix For: 1.0.0


See conversation https://github.com/apache/arrow/pull/5873#discussion_r348510729





[jira] [Created] (ARROW-7218) [Python] Conversion from boolean numpy scalars not working

2019-11-20 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7218:


 Summary: [Python] Conversion from boolean numpy scalars not working
 Key: ARROW-7218
 URL: https://issues.apache.org/jira/browse/ARROW-7218
 Project: Apache Arrow
  Issue Type: Test
  Components: Python
Reporter: Joris Van den Bossche


In general, we are fine to accept a list of numpy scalars:

{code}
In [12]: type(list(np.array([1, 2]))[0])
Out[12]: numpy.int64

In [13]: pa.array(list(np.array([1, 2])))
Out[13]: 
[
  1,
  2
]
{code}

But for booleans, this doesn't work:

{code}
In [14]: pa.array(list(np.array([True, False])))
---
ArrowInvalid  Traceback (most recent call last)
<ipython-input-14-...> in <module>
----> 1 pa.array(list(np.array([True, False])))

~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()

~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()

~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._ndarray_to_array()

ArrowInvalid: Could not convert True with type numpy.bool_: tried to convert to boolean
{code}





[NIGHTLY] Arrow Build Report for Job nightly-2019-11-20-0

2019-11-20 Thread Crossbow


Arrow Build Report for Job nightly-2019-11-20-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0

Failed Tasks:
- conda-osx-clang-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-osx-clang-py27
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-osx-clang-py37
- test-conda-python-2.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-conda-python-2.7-pandas-master
- test-conda-python-3.7-dask-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-conda-python-3.7-dask-master
- test-conda-python-3.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-conda-python-3.7-spark-master
- test-fedora-29-python-3:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-fedora-29-python-3
- test-ubuntu-14.04-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-ubuntu-14.04-cpp
- test-ubuntu-18.04-python-3:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-ubuntu-18.04-python-3
- test-ubuntu-fuzzit:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-ubuntu-fuzzit
- wheel-manylinux1-cp27m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux1-cp27m
- wheel-manylinux1-cp27mu:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux1-cp27mu
- wheel-manylinux1-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux1-cp35m
- wheel-manylinux1-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux1-cp36m
- wheel-manylinux1-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux1-cp37m
- wheel-manylinux2010-cp27m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux2010-cp27m
- wheel-manylinux2010-cp27mu:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux2010-cp27mu
- wheel-manylinux2010-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux2010-cp35m
- wheel-manylinux2010-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux2010-cp36m
- wheel-manylinux2010-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux2010-cp37m
- wheel-osx-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-osx-cp35m
- wheel-osx-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-osx-cp36m

Succeeded Tasks:
- centos-6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-centos-6
- centos-7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-centos-7
- centos-8:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-centos-8
- conda-linux-gcc-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-linux-gcc-py27
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-linux-gcc-py37
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-win-vs2015-py37
- debian-buster:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-debian-buster
- debian-stretch:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-debian-stretch
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-gandiva-jar-osx
- gandiva-jar-trusty:
  URL: 

Re: MIME type

2019-11-20 Thread Antoine Pitrou


If it's not standardized, shouldn't it be prefixed with x-?

e.g. application/x-apache-arrow-stream


Le 20/11/2019 à 08:29, Micah Kornfield a écrit :
> I would propose:
> application/apache-arrow-stream
> application/apache-arrow-file
> 
> I'm not attached to those names but I think there should be two different
> mime-types, since the formats are not interchangeable.
> 
> On Tue, Nov 19, 2019 at 10:31 PM Sutou Kouhei  wrote:
> 
>> Hi,
>>
>> What MIME type should be used for Apache Arrow data?
>> application/arrow?
>>
>> Should we use the same MIME type for IPC Streaming Format[1]
>> and IPC File Format[2]? Or should we use different MIME
>> types for them?
>>
>> [1]
>> https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format
>> [2] https://arrow.apache.org/docs/format/Columnar.html#ipc-file-format
>>
>>
>> Thanks,
>> --
>> kou
>>
> 


Re: [Discuss][Java] Provide default for io.netty.tryReflectionSetAccessible to prevent errors

2019-11-20 Thread Fan Liya
Hi Bryan,

Thanks for bringing this up.
+1 for the change.

I am not sure what the right place to override the JVM property is.
It is possible that when we try to override it (possibly in a static
block), the old property value has already been read by the Netty library.
To avoid this problem, do we need to control the order of class loading?

Best,
Liya Fan

On Mon, Nov 18, 2019 at 3:17 PM Micah Kornfield 
wrote:

> This sounds reasonable to me.  At this point I think having our consumers
> have a better experience is more important than the library purity concerns
> I've had in the past.
>
> Do we need to handle jdk8 as a special case?  Do you think it pays to try
> to find an alternate library that doesn't require special flags for
> whatever we are using this functionality for?
>
> Thanks,
> Micah
>
> On Sunday, November 17, 2019, Bryan Cutler  wrote:
>
> > After ARROW-3191 [1], consumers of Arrow Java with JDK 9 and above are
> > required to set the JVM property
> > "io.netty.tryReflectionSetAccessible=true" at startup, each time Arrow
> > code is run, as documented at [2]. Not doing this will result in the
> > error "java.lang.UnsupportedOperationException: sun.misc.Unsafe or
> > java.nio.DirectByteBuffer.<init>(long, int) not available". This is due
> > to a part of the Netty codebase, and I'm not sure if there is any way
> > around it, but I don't think it's the correct behavior for Arrow
> > out-of-the-box to fail with a 3rd party error by default. This has come
> > up before in our own unit testing [3], and most recently when upgrading
> > Arrow in Spark [4].
> >
> > I'd like to propose that Arrow Java change to the following behavior:
> >
> > 1) check to see if the property io.netty.tryReflectionSetAccessible has
> > been set
> > 2) if not set, automatically set to "true"
> > 3) else if set to "false", catch the Netty error and prepend the error
> > message with the suggested setting of "true"
> >
> > What do other devs think?
> >
> > [1] https://issues.apache.org/jira/browse/ARROW-3191
> > [2] https://github.com/apache/arrow/tree/master/java#java-properties
> > [3] https://issues.apache.org/jira/browse/ARROW-5412
> > [4] https://github.com/apache/spark/pull/26552
> >
>


[jira] [Created] (ARROW-7217) Docker compose / github actions ignores PYTHON env

2019-11-20 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7217:


 Summary: Docker compose / github actions ignores PYTHON env
 Key: ARROW-7217
 URL: https://issues.apache.org/jira/browse/ARROW-7217
 Project: Apache Arrow
  Issue Type: Test
  Components: CI
Reporter: Joris Van den Bossche


The "AMD64 Conda Python 2.7" build is actually using Python 3.6. 

This Python 3.6 version is specified in conda-python.dockerfile: 
https://github.com/apache/arrow/blob/master/ci/docker/conda-python.dockerfile#L24
and I am not sure whether the ENV variable overrides it.

cc [~kszucs]






[jira] [Created] (ARROW-7216) [Java] Improve the performance of setting/clearing individual bits

2019-11-20 Thread Liya Fan (Jira)
Liya Fan created ARROW-7216:
---

 Summary: [Java] Improve the performance of setting/clearing 
individual bits
 Key: ARROW-7216
 URL: https://issues.apache.org/jira/browse/ARROW-7216
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Liya Fan
Assignee: Liya Fan


Setting and clearing individual bits are key operations for Arrow. In this 
issue, we improve the performance of these operations by:

1. replacing arithmetic operations with bit-wise operations
2. removing unnecessary casts between int/byte
3. providing a new API to remove the if branch

Benchmark results show that for clearing a bit, the performance improves by 
11%, and for the general set/clear operation, the performance improves by 4.7%:

before:
BitVectorHelperBenchmarks.setValidityBitBenchmark        avgt  5  4.524 ± 0.015  us/op

after:
BitVectorHelperBenchmarks.setValidityBitBenchmark        avgt  5  4.313 ± 0.011  us/op
BitVectorHelperBenchmarks.setValidityBitToZeroBenchmark  avgt  5  4.020 ± 0.016  us/op
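
For readers unfamiliar with the techniques, here is a small sketch of
points 1 and 3. It is written in Python on a bytearray purely to show the
logic; it is not the Java code from the patch, the function names are
hypothetical, and it says nothing about performance (point 2, removing
int/byte casts, is specific to Java and has no counterpart here). Bits
are numbered least-significant first within each byte, as in Arrow
validity bitmaps.

```python
def set_bit_arithmetic(buf, index, value):
    """Baseline: locate the bit with division and modulo, branch on the value."""
    byte_index, bit_index = index // 8, index % 8
    if value:
        buf[byte_index] |= 1 << bit_index
    else:
        buf[byte_index] &= ~(1 << bit_index) & 0xFF


def set_bit_bitwise(buf, index, value):
    """Point 1: shift and mask replace division and modulo."""
    byte_index, bit_index = index >> 3, index & 7
    if value:
        buf[byte_index] |= 1 << bit_index
    else:
        buf[byte_index] &= ~(1 << bit_index) & 0xFF


def clear_bit(buf, index):
    """Point 3: a dedicated clear operation needs no branch on the value."""
    buf[index >> 3] &= ~(1 << (index & 7)) & 0xFF


a = bytearray(2)
b = bytearray(2)
for i in (0, 9, 15):
    set_bit_arithmetic(a, i, 1)
    set_bit_bitwise(b, i, 1)
clear_bit(b, 9)
set_bit_arithmetic(a, 9, 0)
```

Both variants produce identical buffers; the difference the issue is
after lies only in how cheaply the byte and bit positions are computed
and whether a branch is taken.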







[jira] [Created] (ARROW-7215) [C++][Gandiva] Implement castVARCHAR(integer_type) functions in Gandiva

2019-11-20 Thread Projjal Chanda (Jira)
Projjal Chanda created ARROW-7215:
-

 Summary: [C++][Gandiva] Implement castVARCHAR(integer_type) 
functions in Gandiva
 Key: ARROW-7215
 URL: https://issues.apache.org/jira/browse/ARROW-7215
 Project: Apache Arrow
  Issue Type: Task
  Components: C++ - Gandiva
Reporter: Projjal Chanda
Assignee: Projjal Chanda


Support the following function signature in Gandiva:
FunctionSignature{name=castVARCHAR, return type=Utf8, param types=[integer_type, Int(64, true)]}


