[jira] [Created] (ARROW-7224) [C++][Datasets] Partition level filters should be able to provide filtering to file systems
Micah Kornfield created ARROW-7224: -- Summary: [C++][Datasets] Partition level filters should be able to provide filtering to file systems Key: ARROW-7224 URL: https://issues.apache.org/jira/browse/ARROW-7224 Project: Apache Arrow Issue Type: Improvement Components: C++, C++ - Dataset Reporter: Micah Kornfield When providing a filter for partitions, it should be possible in some cases to use it to optimize file system list calls. This can greatly improve the speed of reading data from partitions because fewer directories/files need to be explored/expanded. I've fallen behind on the dataset code, but I want to make sure this issue is tracked somewhere. This came up in the SO question linked below (feel free to correct my analysis if I missed the functionality somewhere). Reference: [https://stackoverflow.com/questions/58868584/pyarrow-parquetdataset-read-is-slow-on-a-hive-partitioned-s3-dataset-despite-u/58951477#58951477] -- This message was sent by Atlassian Jira (v8.3.4#803005)
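To illustrate the idea in ARROW-7224, here is a minimal sketch of pruning list calls on a Hive-style partitioned layout. It does not use the Arrow dataset API; the nested-dict "file system", the function name list_with_pruning and the shape of the filters argument are all assumptions made for the example.

```python
def list_with_pruning(tree, filters, prefix="", stats=None):
    """Yield file paths, descending only into matching partition directories.

    `tree` is a hypothetical in-memory stand-in for a file system: a directory
    maps names to nested dicts, a file maps its name to None.
    `filters` maps a partition key to the set of accepted values, e.g.
    {"year": {"2019"}}. A directory named "key=value" whose key is filtered
    and whose value is not accepted is skipped without being listed.
    """
    if stats is not None:
        # Count one list call per directory that is actually expanded.
        stats["list_calls"] = stats.get("list_calls", 0) + 1
    for name, child in sorted(tree.items()):
        path = f"{prefix}/{name}" if prefix else name
        if child is None:  # a file
            yield path
            continue
        key, sep, value = name.partition("=")
        if sep and key in filters and value not in filters[key]:
            continue  # pruned: this subtree is never listed
        yield from list_with_pruning(child, filters, path, stats)


tree = {
    "year=2018": {"month=12": {"a.parquet": None}},
    "year=2019": {
        "month=10": {"b.parquet": None},
        "month=11": {"c.parquet": None},
    },
}
pruned_stats, full_stats = {}, {}
pruned = list(list_with_pruning(tree, {"year": {"2019"}, "month": {"11"}}, stats=pruned_stats))
everything = list(list_with_pruning(tree, {}, stats=full_stats))
```

In this toy layout the filtered walk expands 3 directories instead of 6. On an object store such as S3, where each list call is a network round trip, that difference is where the speed-up described in the issue would come from.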
Re: [Discuss][Java] Provide default for io.netty.tryReflectionSetAccessible to prevent errors
I'm not sure what the best way to handle this is. Ideally we would use an alternative that doesn't require setting a property, but I don't know Netty well enough to make a recommendation. I also want to be careful not to introduce anything that would hurt performance or cause any other side effects. I made https://issues.apache.org/jira/browse/ARROW-7223 to track this; we can continue the discussion there and I will try to do some research into possible solutions.

On Wed, Nov 20, 2019 at 2:51 AM Fan Liya wrote:
> Hi Bryan,
>
> Thanks for bringing this up.
> +1 for the change.
>
> I am not clear what is the right place to override the jvm property.
> It is possible that when we try to override it (possibly in a static block), the old property value has already been read by netty library.
> To avoid this problem, do we need to control the order of class loading?
>
> Best,
> Liya Fan
[jira] [Created] (ARROW-7223) [Java] Provide default setting of io.netty.tryReflectionSetAccessible=true
Bryan Cutler created ARROW-7223: --- Summary: [Java] Provide default setting of io.netty.tryReflectionSetAccessible=true Key: ARROW-7223 URL: https://issues.apache.org/jira/browse/ARROW-7223 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Bryan Cutler

After ARROW-3191, consumers of Arrow Java running on JDK 9 and above are required to set the JVM property "io.netty.tryReflectionSetAccessible=true" at startup, each time Arrow code is run, as documented at https://github.com/apache/arrow/tree/master/java#java-properties. Not doing this will result in the error "java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available", making Arrow unusable out-of-the-box.

This proposes to automatically set the property if it is not already set, in the following steps:
1) check to see if the property io.netty.tryReflectionSetAccessible has been set
2) if not set, automatically set it to "true"
3) else if set to "false", catch the Netty error and prepend the error message with the suggested setting of "true"
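The three steps can be modelled as follows. This is a language-neutral sketch written in Python, not Arrow Java code: a plain dict stands in for the JVM system properties, RuntimeError stands in for the Netty UnsupportedOperationException, and the names resolve_property and run_with_hint are hypothetical. It also leaves aside the class-loading order question raised in the discussion thread.

```python
PROPERTY = "io.netty.tryReflectionSetAccessible"


def resolve_property(props):
    """Steps 1 and 2: keep an explicit value, otherwise default to "true"."""
    if PROPERTY not in props:      # step 1: has the property been set?
        props[PROPERTY] = "true"   # step 2: not set, so default it
    return props[PROPERTY]


def run_with_hint(props, action):
    """Step 3: if the user chose "false", prepend a hint to the failure."""
    value = resolve_property(props)
    try:
        return action()
    except RuntimeError as exc:  # stand-in for the Netty error
        if value == "false":
            raise RuntimeError(f"Try setting {PROPERTY}=true. {exc}") from exc
        raise
```

An explicit "false" is respected rather than overridden; the only change in that case is a more helpful error message.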
[DISCUSS][C++] Pointer name aliasing
A recent PR for datasets [1] seems to have introduced the convention of aliasing "std::shared_ptr<Type>" as "TypePtr" for some type. I think we decided in the past not to use a convention like this, but I can't find the thread. IMO this generally makes the code less understandable, but it is a matter of taste. Before the pattern gets too ingrained in the code base, I wanted to make sure we discuss it on the mailing list and reach a consensus on when to use it. Thanks, Micah [1] https://github.com/apache/arrow/pull/5857/files
[VOTE] Clarifications and forward compatibility changes for Dictionary Encoding (second iteration)
Hello,

As discussed on [1], I've proposed a PR [2] that makes the following clarifications and changes:

1. It is not required that all dictionary batches occur at the beginning of the IPC stream format (if the first record batch has an all-null dictionary-encoded column, the null column's dictionary might not be sent until later in the stream).
2. A second dictionary batch for the same ID that is not a "delta batch" in an IPC stream indicates the dictionary should be replaced.
3. The file format can only contain 1 "NON-delta" dictionary batch and multiple "delta" dictionary batches. Dictionary replacement is not supported in the file format.
4. Add an enum to dictionary metadata for possible future changes in what format dictionary batches can be sent (the most likely would be an array Map). An enum is needed as a placeholder to allow for forward compatibility past the release 1.0.0.

If accepted, there will be work in all implementations to make sure that they cover the edge cases highlighted, and additional integration testing will be needed.

Please vote whether to accept these additions. The vote will be open for at least 72 hours.

[ ] +1 Accept these changes to the specification
[ ] +0
[ ] -1 Do not accept the changes because...

Thanks,
Micah

[1] https://lists.apache.org/thread.html/d0f137e9db0abfcfde2ef879ca517a710f620e5be4dd749923d22c37@%3Cdev.arrow.apache.org%3E
[2] https://github.com/apache/arrow/pull/5585
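As a rough illustration of points 1 and 2, a stream reader could track dictionaries like this. The sketch uses plain Python lists rather than any Arrow implementation; the class name DictionaryMemo and its methods are hypothetical, and treating a delta for an unknown ID as an error is an assumption, not something the proposal states.

```python
class DictionaryMemo:
    """Tracks the current dictionary for each dictionary ID in a stream."""

    def __init__(self):
        self._dicts = {}

    def on_dictionary_batch(self, dict_id, values, is_delta):
        if is_delta:
            # A delta batch appends to the existing dictionary.
            if dict_id not in self._dicts:
                # Assumption of this sketch: a delta needs a base dictionary.
                raise ValueError(f"delta for unknown dictionary {dict_id}")
            self._dicts[dict_id] = self._dicts[dict_id] + list(values)
        else:
            # A non-delta batch for an ID that was seen before replaces it.
            self._dicts[dict_id] = list(values)

    def decode(self, dict_id, indices):
        """Map indices to values; None stands for a null slot."""
        dictionary = self._dicts.get(dict_id, [])
        return [None if i is None else dictionary[i] for i in indices]


memo = DictionaryMemo()
# Point 1: an all-null column can be decoded before its dictionary arrives.
first = memo.decode(0, [None, None])
memo.on_dictionary_batch(0, ["a", "b"], is_delta=False)
memo.on_dictionary_batch(0, ["c"], is_delta=True)
after_delta = memo.decode(0, [0, 2])
memo.on_dictionary_batch(0, ["x", "y"], is_delta=False)
after_replace = memo.decode(0, [0, 1])
```

Under point 3, a file-format reader would reject the second non-delta batch instead of replacing the dictionary.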
[jira] [Created] (ARROW-7222) [Python] Wipe any existing generated Python API documentation when updating website
Wes McKinney created ARROW-7222: --- Summary: [Python] Wipe any existing generated Python API documentation when updating website Key: ARROW-7222 URL: https://issues.apache.org/jira/browse/ARROW-7222 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Wes McKinney Fix For: 1.0.0 Removed APIs are persisting in Google searches, e.g. https://arrow.apache.org/docs/python/generated/pyarrow.Column.html
Re: ConcatenateTables APIs
I agree with introducing ConcatenateTables with an options argument (which can have parameters added to it without disrupting public APIs too much). It would be good to do this sooner rather than later.

On Fri, Nov 15, 2019 at 12:22 AM Micah Kornfield wrote:
> This sounds like a reasonable design to me. One question I had: for SchemaUnificationOptions, will those only be applicable to Arrow schemas, or does it make sense to extend them for other use-cases (like DataSet APIs)?
>
> Cheers,
> Micah
>
> On Fri, Nov 8, 2019 at 10:27 AM Zhuo Peng wrote:
> > Hi,
> >
> > https://github.com/apache/arrow/pull/5534 introduced ConcatenateTablesWithPromotion(). And there is already a ConcatenateTables() function which behaves differently (it requires the tables to have the same schema). Wes raised a concern in that PR [1] that we might end up having many ConcatenateTables*() variants, as there are various things that can be tweaked, and he suggested introducing a ConcatenateOptions so there is only one ConcatenateTables() function.
> >
> > While I'm on board with that idea, I wanted to double check that there is a consensus that we should (as of today) merge ConcatenateTables() and ConcatenateTablesWithPromotion(), and have an option to do promotion or not (as in an earlier comment in the PR, @bkietz advised otherwise, but maybe at that point we didn't realize there were potentially many variants).
> >
> > [1] https://github.com/apache/arrow/pull/5534#discussion_r343745573
> >
> > Thanks,
> >
> > Zhuo
Re: Building Arrow 0.15.1 using dependencies in local source folder
I agree that the *_ROOT variables should be the way. If you find one that does not work, please open a JIRA issue. I don't think this is documented well enough in http://arrow.apache.org/docs/developers/cpp.html#build-dependency-management so I'm opening an issue: https://issues.apache.org/jira/browse/ARROW-7221

On Thu, Nov 14, 2019 at 2:49 PM Neal Richardson wrote:
> I am not an expert on this, but it seems you can specify `*_ROOT` arguments to cmake, like https://github.com/apache/arrow/blob/master/ci/PKGBUILD#L90-L91
>
> Maybe that does what you need?
>
> Neal
>
> On Thu, Nov 14, 2019 at 12:45 PM Tahsin Hassan wrote:
> > Hi all,
> >
> > I am trying to build Arrow 0.15.1. The dependencies for Arrow, e.g. thrift and double-conversion, are in a local source folder and we need to build the dependencies from that location.
> >
> > I read up on
> > https://github.com/apache/arrow/blob/master/docs/source/developers/cpp.rst#offline-builds
> > * BUNDLED: Building the dependency automatically from source
> > * SYSTEM: Finding the dependency in system paths using CMake's built-in find_package function, or using pkg-config for packages that do not have this feature
> > Unfortunately, that's not exactly what I want.
> >
> > I also read
> > https://github.com/apache/arrow/blob/master/docs/source/developers/cpp.rst#offline-builds
> > but that basically downloads the tar(s) into a folder, extracts them and sets up the build using that, e.g.
> > $./download_dependencies.sh /sandbox/someArrowStuff/
> > # Environment variables for offline Arrow build
> > export ARROW_AWSSDK_URL=/sandbox/someArrowStuff/aws-sdk-cpp-1.7.160.tar.gz
> > export ARROW_BOOST_URL=/sandbox/someArrowStuff/boost-1.67.0.tar.gz
> > export ARROW_BROTLI_URL=/sandbox/someArrowStuff/brotli-v1.0.7.tar.gz
> > …
> >
> > What I wanted was a set of environment variables that allow setting a source folder path, e.g.
> > export ARROW_BOOST_MYPATH=/sandbox/someArrowStuff/ 3p/boost/
> > where /sandbox/someArrowStuff/ 3p/boost/ already holds the necessary boost source folder and ARROW_BOOST_MYPATH is some kind of variable to help locate the necessary source folder.
> >
> > Is there some option like that? Where can I dig for more information regarding that?
> >
> > Thanks,
> > Tahsin
[jira] [Created] (ARROW-7221) [C++][Documentation] Document how to set installed location for individual toolchain components
Wes McKinney created ARROW-7221: --- Summary: [C++][Documentation] Document how to set installed location for individual toolchain components Key: ARROW-7221 URL: https://issues.apache.org/jira/browse/ARROW-7221 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Fix For: 1.0.0 This is not well documented in http://arrow.apache.org/docs/developers/cpp.html#build-dependency-management. The CMake variables are {{$DEPENDENCY_NAME_ROOT}}.
Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-11-20-0
Currently I'm busy with migrating our development docker images to the apache organization [1], but afterwards I'll go through the failures. Any help is appreciated! [1]: https://issues.apache.org/jira/browse/ARROW-7116

On Wed, Nov 20, 2019 at 2:01 PM Crossbow wrote:
> Arrow Build Report for Job nightly-2019-11-20-0
>
> All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0
[jira] [Created] (ARROW-7220) [CI] Docker compose (github actions) Mac Python 3 build is using Python 2
Joris Van den Bossche created ARROW-7220: Summary: [CI] Docker compose (github actions) Mac Python 3 build is using Python 2 Key: ARROW-7220 URL: https://issues.apache.org/jira/browse/ARROW-7220 Project: Apache Arrow Issue Type: Test Reporter: Joris Van den Bossche The "AMD64 MacOS 10.15 Python 3" build is also running in python 2. Possibly related to how brew is installing python 2 / 3, or because it is using the system python, ... (not familiar with mac)
[jira] [Created] (ARROW-7219) [CI][Python] Install pickle5 in the conda-python docker image for python version 3.6
Krisztian Szucs created ARROW-7219: -- Summary: [CI][Python] Install pickle5 in the conda-python docker image for python version 3.6 Key: ARROW-7219 URL: https://issues.apache.org/jira/browse/ARROW-7219 Project: Apache Arrow Issue Type: Improvement Components: Continuous Integration, Python Reporter: Krisztian Szucs Fix For: 1.0.0 See conversation https://github.com/apache/arrow/pull/5873#discussion_r348510729
[jira] [Created] (ARROW-7218) [Python] Conversion from boolean numpy scalars not working
Joris Van den Bossche created ARROW-7218: Summary: [Python] Conversion from boolean numpy scalars not working Key: ARROW-7218 URL: https://issues.apache.org/jira/browse/ARROW-7218 Project: Apache Arrow Issue Type: Test Components: Python Reporter: Joris Van den Bossche

In general, we are fine to accept a list of numpy scalars:

{code}
In [12]: type(list(np.array([1, 2]))[0])
Out[12]: numpy.int64

In [13]: pa.array(list(np.array([1, 2])))
Out[13]:
[
  1,
  2
]
{code}

But for booleans, this doesn't work:

{code}
In [14]: pa.array(list(np.array([True, False])))
---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
----> 1 pa.array(list(np.array([True, False])))

~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()

~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()

~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._ndarray_to_array()

ArrowInvalid: Could not convert True with type numpy.bool_: tried to convert to boolean
{code}
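Until the conversion handles numpy.bool_ directly, one possible user-side workaround is to unwrap NumPy scalars into plain Python objects before handing the list to pyarrow. The helper below is a sketch, not part of pyarrow; the name to_python_scalars is hypothetical, and it relies only on the .item() method that NumPy scalars provide.

```python
def to_python_scalars(values):
    """Replace NumPy-style scalars with plain Python objects via .item().

    Objects without an .item() method (ordinary Python values) pass through
    unchanged.
    """
    return [v.item() if hasattr(v, "item") else v for v in values]


# Assumed usage, with numpy imported as np and pyarrow as pa:
#   pa.array(to_python_scalars(list(np.array([True, False]))))
```

This sidesteps the failing numpy.bool_ path at the cost of an extra pass over the list; it does not address the underlying bug reported here.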
[NIGHTLY] Arrow Build Report for Job nightly-2019-11-20-0
Arrow Build Report for Job nightly-2019-11-20-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0 Failed Tasks: - conda-osx-clang-py27: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-osx-clang-py27 - conda-osx-clang-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-osx-clang-py36 - conda-osx-clang-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-osx-clang-py37 - test-conda-python-2.7-pandas-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-conda-python-2.7-pandas-master - test-conda-python-3.7-dask-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-conda-python-3.7-dask-master - test-conda-python-3.7-pandas-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-conda-python-3.7-pandas-master - test-conda-python-3.7-spark-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-conda-python-3.7-spark-master - test-fedora-29-python-3: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-fedora-29-python-3 - test-ubuntu-14.04-cpp: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-ubuntu-14.04-cpp - test-ubuntu-18.04-python-3: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-ubuntu-18.04-python-3 - test-ubuntu-fuzzit: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-circle-test-ubuntu-fuzzit - wheel-manylinux1-cp27m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux1-cp27m - wheel-manylinux1-cp27mu: URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux1-cp27mu - wheel-manylinux1-cp35m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux1-cp35m - wheel-manylinux1-cp36m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux1-cp36m - wheel-manylinux1-cp37m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux1-cp37m - wheel-manylinux2010-cp27m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux2010-cp27m - wheel-manylinux2010-cp27mu: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux2010-cp27mu - wheel-manylinux2010-cp35m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux2010-cp35m - wheel-manylinux2010-cp36m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux2010-cp36m - wheel-manylinux2010-cp37m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-manylinux2010-cp37m - wheel-osx-cp35m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-osx-cp35m - wheel-osx-cp36m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-wheel-osx-cp36m Succeeded Tasks: - centos-6: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-centos-6 - centos-7: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-centos-7 - centos-8: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-centos-8 - conda-linux-gcc-py27: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-linux-gcc-py27 - 
conda-linux-gcc-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-linux-gcc-py36 - conda-linux-gcc-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-linux-gcc-py37 - conda-win-vs2015-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-win-vs2015-py36 - conda-win-vs2015-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-conda-win-vs2015-py37 - debian-buster: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-debian-buster - debian-stretch: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-azure-debian-stretch - gandiva-jar-osx: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-20-0-travis-gandiva-jar-osx - gandiva-jar-trusty: URL:
Re: MIME type
If it's not standardized, shouldn't it be prefixed with x-? e.g. application/x-apache-arrow-stream

On 20/11/2019 at 08:29, Micah Kornfield wrote:
> I would propose:
> application/apache-arrow-stream
> application/apache-arrow-file
>
> I'm not attached to those names but I think there should be two different mime-types, since the formats are not interchangeable.
>
> On Tue, Nov 19, 2019 at 10:31 PM Sutou Kouhei wrote:
>> Hi,
>>
>> What MIME type should be used for Apache Arrow data? application/arrow?
>>
>> Should we use the same MIME type for IPC Streaming Format[1] and IPC File Format[2]? Or should we use different MIME types for them?
>>
>> [1] https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format
>> [2] https://arrow.apache.org/docs/format/Columnar.html#ipc-file-format
>>
>> Thanks,
>> --
>> kou
Re: [Discuss][Java] Provide default for io.netty.tryReflectionSetAccessible to prevent errors
Hi Bryan,

Thanks for bringing this up.
+1 for the change.

I am not clear what is the right place to override the jvm property. It is possible that when we try to override it (possibly in a static block), the old property value has already been read by the netty library. To avoid this problem, do we need to control the order of class loading?

Best,
Liya Fan

On Mon, Nov 18, 2019 at 3:17 PM Micah Kornfield wrote:
> This sounds reasonable to me. At this point I think having our consumers have a better experience is more important than library purity concerns I've had in the past.
>
> Do we need to handle jdk8 as a special case? Do you think it pays to try to find an alternate library that doesn't require special flags for whatever we are using this functionality for?
>
> Thanks,
> Micah
>
> On Sunday, November 17, 2019, Bryan Cutler wrote:
> > After ARROW-3191 [1], consumers of Arrow Java running on JDK 9 and above are required to set the JVM property "io.netty.tryReflectionSetAccessible=true" at startup, each time Arrow code is run, as documented at [2]. Not doing this will result in the error "java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available". This is due to a part of the Netty codebase, and I'm not sure if there is any way around it, but I don't think it's the correct behavior for Arrow out-of-the-box to fail with a 3rd party error by default. This has come up before in our own unit testing [3], and most recently when upgrading Arrow in Spark [4].
> >
> > I'd like to propose that Arrow Java change to the following behavior:
> >
> > 1) check to see if the property io.netty.tryReflectionSetAccessible has been set
> > 2) if not set, automatically set to "true"
> > 3) else if set to "false", catch the Netty error and prepend the error message with the suggested setting of "true"
> >
> > What do other devs think?
> >
> > [1] https://issues.apache.org/jira/browse/ARROW-3191
> > [2] https://github.com/apache/arrow/tree/master/java#java-properties
> > [3] https://issues.apache.org/jira/browse/ARROW-5412
> > [4] https://github.com/apache/spark/pull/26552
[jira] [Created] (ARROW-7217) Docker compose / github actions ignores PYTHON env
Joris Van den Bossche created ARROW-7217: Summary: Docker compose / github actions ignores PYTHON env Key: ARROW-7217 URL: https://issues.apache.org/jira/browse/ARROW-7217 Project: Apache Arrow Issue Type: Test Components: CI Reporter: Joris Van den Bossche The "AMD64 Conda Python 2.7" build is actually using Python 3.6. This Python 3.6 version is written in the conda-python.dockerfile: https://github.com/apache/arrow/blob/master/ci/docker/conda-python.dockerfile#L24 and I am not fully sure whether or how the PYTHON env variable is supposed to override it. cc [~kszucs]
[jira] [Created] (ARROW-7216) [Java] Improve the performance of setting/clearing individual bits
Liya Fan created ARROW-7216: --- Summary: [Java] Improve the performance of setting/clearing individual bits Key: ARROW-7216 URL: https://issues.apache.org/jira/browse/ARROW-7216 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Liya Fan Assignee: Liya Fan

Setting/clearing individual bits are key operations for Arrow. In this issue, we improve the performance of these operations by:
1. replacing arithmetic operations with bit-wise operations
2. removing unnecessary casts between int/byte
3. providing a new API to remove the if branch

Benchmark results show that for clearing a bit, the performance improves by 11%, and for the general set/clear operation, the performance improves by 4.7%:

before:
BitVectorHelperBenchmarks.setValidityBitBenchmark        avgt  5  4.524 ± 0.015  us/op

after:
BitVectorHelperBenchmarks.setValidityBitBenchmark        avgt  5  4.313 ± 0.011  us/op
BitVectorHelperBenchmarks.setValidityBitToZeroBenchmark  avgt  5  4.020 ± 0.016  us/op
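The kind of bit-wise formulation described in items 1 and 3 can be sketched as follows. This is Python over a bytearray rather than the Java BitVectorHelper code, so it shows the arithmetic only and says nothing about performance; the function names are hypothetical, and the branch-free write_bit is just one possible form, not necessarily what the patch does. Bits are numbered least-significant first within each byte, as in Arrow validity bitmaps.

```python
def set_bit(buf: bytearray, index: int) -> None:
    """Set bit `index` using shifts and masks instead of // and %."""
    buf[index >> 3] |= 1 << (index & 7)


def clear_bit(buf: bytearray, index: int) -> None:
    """Clear bit `index`; a dedicated call needs no test of the new value."""
    buf[index >> 3] &= ~(1 << (index & 7)) & 0xFF


def write_bit(buf: bytearray, index: int, value: int) -> None:
    """Write `value` (0 or 1) to bit `index` without an if branch."""
    byte_index = index >> 3
    mask = 1 << (index & 7)
    # -value is 0 for value == 0 and all ones for value == 1, so
    # (-value & mask) is either 0 or mask.
    buf[byte_index] = (buf[byte_index] & ~mask & 0xFF) | (-value & mask)


bits = bytearray(2)
set_bit(bits, 0)
set_bit(bits, 9)
```

Here index >> 3 replaces index / 8 and index & 7 replaces index % 8, which is the substitution item 1 refers to.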
[jira] [Created] (ARROW-7215) [C++][Gandiva] Implement castVARCHAR(integer_type) functions in Gandiva
Projjal Chanda created ARROW-7215: - Summary: [C++][Gandiva] Implement castVARCHAR(integer_type) functions in Gandiva Key: ARROW-7215 URL: https://issues.apache.org/jira/browse/ARROW-7215 Project: Apache Arrow Issue Type: Task Components: C++ - Gandiva Reporter: Projjal Chanda Assignee: Projjal Chanda Support the following function signature in Gandiva: FunctionSignature{name = castVARCHAR, return type = Utf8, param types = [integer_type, Int(64, true)]}