[jira] [Updated] (ARROW-6790) [Release] Automatically disable integration test cases in release verification
[ https://issues.apache.org/jira/browse/ARROW-6790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-6790:
----------------------------------
    Labels: pull-request-available  (was: )

> [Release] Automatically disable integration test cases in release verification
>
> Key: ARROW-6790
> URL: https://issues.apache.org/jira/browse/ARROW-6790
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Developer Tools
> Reporter: Bryan Cutler
> Assignee: Bryan Cutler
> Priority: Minor
> Labels: pull-request-available
>
> If dev/release/verify-release-candidate.sh is run with selective testing and includes integration tests, the selected implementations should be the only ones enabled when running the integration test portion. For example:
>
> TEST_DEFAULT=0 \
> TEST_CPP=1 \
> TEST_JAVA=1 \
> TEST_INTEGRATION=1 \
> dev/release/verify-release-candidate.sh source 0.15.0 2
>
> This should run integration tests only for C++ and Java.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6790) [Release] Automatically disable integration test cases in release verification
Bryan Cutler created ARROW-6790:
-----------------------------------
    Summary: [Release] Automatically disable integration test cases in release verification
    Key: ARROW-6790
    URL: https://issues.apache.org/jira/browse/ARROW-6790
    Project: Apache Arrow
    Issue Type: Improvement
    Components: Developer Tools
    Reporter: Bryan Cutler
    Assignee: Bryan Cutler

If dev/release/verify-release-candidate.sh is run with selective testing and includes integration tests, the selected implementations should be the only ones enabled when running the integration test portion. For example:

TEST_DEFAULT=0 \
TEST_CPP=1 \
TEST_JAVA=1 \
TEST_INTEGRATION=1 \
dev/release/verify-release-candidate.sh source 0.15.0 2

This should run integration tests only for C++ and Java.
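The selection semantics described above can be sketched as follows. This is a hypothetical Python rendering of the env-var gating (the real script is Bash, and the list of implementation names here is illustrative, not taken from the script): each TEST_* flag falls back to TEST_DEFAULT, so setting TEST_DEFAULT=0 and enabling flags individually selects exactly those implementations for the integration run.

```python
def enabled_implementations(env):
    """Return the implementations whose TEST_* flag resolves to "1".

    Each TEST_<IMPL> flag defaults to TEST_DEFAULT (itself defaulting to
    "1"), so TEST_DEFAULT=0 plus selective TEST_* flags enables only the
    chosen implementations.
    """
    default = env.get("TEST_DEFAULT", "1")
    impls = ("CPP", "JAVA", "JS", "GO")  # illustrative subset
    return [i for i in impls if env.get("TEST_" + i, default) == "1"]

# Mirrors the example invocation in the issue description:
print(enabled_implementations({"TEST_DEFAULT": "0", "TEST_CPP": "1", "TEST_JAVA": "1"}))
# ['CPP', 'JAVA']
```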
[jira] [Created] (ARROW-6789) [Python] Automatically box bytes/buffer-like values yielded from `FlightServerBase.do_action` in Result values
Wes McKinney created ARROW-6789:
-----------------------------------
    Summary: [Python] Automatically box bytes/buffer-like values yielded from `FlightServerBase.do_action` in Result values
    Key: ARROW-6789
    URL: https://issues.apache.org/jira/browse/ARROW-6789
    Project: Apache Arrow
    Issue Type: Improvement
    Components: Python
    Reporter: Wes McKinney
    Fix For: 1.0.0

This will reduce boilerplate in server implementations.
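A minimal sketch of the proposed convenience: bytes-like values yielded from a do_action implementation get wrapped in a Result automatically, so servers can yield raw payloads directly. The Result class and box_results helper below are stand-ins for illustration, not the actual pyarrow.flight API.

```python
class Result:
    """Stand-in for a Flight action result holding a body payload."""
    def __init__(self, body):
        self.body = body

def box_results(values):
    """Wrap bytes/buffer-like yielded values in Result; pass Results through."""
    for v in values:
        yield v if isinstance(v, Result) else Result(bytes(v))

# A server generator can now mix raw payloads and pre-built Results:
out = list(box_results([b"ok", bytearray(b"raw"), Result(b"already")]))
print([r.body for r in out])  # [b'ok', b'raw', b'already']
```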
[jira] [Resolved] (ARROW-6736) [Rust] [DataFusion] Aggregate expressions get evaluated repeatedly
[ https://issues.apache.org/jira/browse/ARROW-6736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Grove resolved ARROW-6736.
-------------------------------
    Resolution: Fixed

Issue resolved by pull request 5542
[https://github.com/apache/arrow/pull/5542]

> [Rust] [DataFusion] Aggregate expressions get evaluated repeatedly
>
> Key: ARROW-6736
> URL: https://issues.apache.org/jira/browse/ARROW-6736
> Project: Apache Arrow
> Issue Type: Bug
> Components: Rust, Rust - DataFusion
> Affects Versions: 0.15.0
> Reporter: Andy Grove
> Assignee: Andy Grove
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> There is a design flaw in the new aggregate expression traits and implementations where the input to the aggregate expression gets evaluated against the whole batch once for each row in the batch. For example, if the batch has 1024 rows then the expression gets evaluated 1024 times instead of once.
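The flaw described above can be illustrated in a few lines. This is a pure-Python sketch, not DataFusion code: evaluating the aggregate's input expression inside the per-row loop re-scans the whole batch once per row, whereas hoisting the evaluation out of the loop does it once per batch.

```python
calls = {"n": 0}

def evaluate(batch):
    """Stand-in for evaluating an input expression against a whole batch."""
    calls["n"] += 1
    return [x * 2 for x in batch]

def sum_buggy(batch):
    # Re-evaluates the expression against the whole batch for every row.
    total = 0
    for i in range(len(batch)):
        total += evaluate(batch)[i]
    return total

def sum_fixed(batch):
    # Evaluates the expression once per batch, then aggregates the column.
    return sum(evaluate(batch))

batch = list(range(1024))
calls["n"] = 0; assert sum_buggy(batch) == sum_fixed(batch)
```

With a 1024-row batch, the buggy version calls `evaluate` 1024 times and the fixed version once, matching the description in the issue.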
[jira] [Updated] (ARROW-3808) [R] Implement [.arrow::Array
[ https://issues.apache.org/jira/browse/ARROW-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson updated ARROW-3808:
-----------------------------------
    Fix Version/s: (was: 0.15.0)
                   1.0.0

> [R] Implement [.arrow::Array
>
> Key: ARROW-3808
> URL: https://issues.apache.org/jira/browse/ARROW-3808
> Project: Apache Arrow
> Issue Type: New Feature
> Components: R
> Reporter: Romain Francois
> Assignee: Neal Richardson
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
> Time Spent: 4h 20m
> Remaining Estimate: 0h
[jira] [Resolved] (ARROW-3808) [R] Implement [.arrow::Array
[ https://issues.apache.org/jira/browse/ARROW-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson resolved ARROW-3808.
------------------------------------
    Fix Version/s: (was: 1.0.0)
                   0.15.0
    Resolution: Fixed

Issue resolved by pull request 5531
[https://github.com/apache/arrow/pull/5531]

> [R] Implement [.arrow::Array
>
> Key: ARROW-3808
> URL: https://issues.apache.org/jira/browse/ARROW-3808
> Project: Apache Arrow
> Issue Type: New Feature
> Components: R
> Reporter: Romain Francois
> Assignee: Neal Richardson
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0
> Time Spent: 4h 20m
> Remaining Estimate: 0h
[jira] [Resolved] (ARROW-6688) [Packaging] Include s3 support in the conda packages
[ https://issues.apache.org/jira/browse/ARROW-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-6688.
---------------------------------
    Fix Version/s: 1.0.0
    Resolution: Fixed

Issue resolved by pull request 5484
[https://github.com/apache/arrow/pull/5484]

> [Packaging] Include S3 support in the conda packages
>
> Key: ARROW-6688
> URL: https://issues.apache.org/jira/browse/ARROW-6688
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Packaging
> Reporter: Krisztian Szucs
> Assignee: Krisztian Szucs
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
> Time Spent: 0.5h
> Remaining Estimate: 0h
[jira] [Created] (ARROW-6788) [CI] Migrate Travis CI lint job to GitHub Actions
Wes McKinney created ARROW-6788:
-----------------------------------
    Summary: [CI] Migrate Travis CI lint job to GitHub Actions
    Key: ARROW-6788
    URL: https://issues.apache.org/jira/browse/ARROW-6788
    Project: Apache Arrow
    Issue Type: Improvement
    Components: Continuous Integration
    Reporter: Wes McKinney
    Fix For: 1.0.0

Depends on ARROW-5802. As far as I can tell, GitHub Actions jobs run more or less immediately, so this will give more prompt feedback to contributors.
[jira] [Created] (ARROW-6787) [CI] Decommission "C++ with clang 7 and system packages" Travis CI job
Wes McKinney created ARROW-6787:
-----------------------------------
    Summary: [CI] Decommission "C++ with clang 7 and system packages" Travis CI job
    Key: ARROW-6787
    URL: https://issues.apache.org/jira/browse/ARROW-6787
    Project: Apache Arrow
    Issue Type: Improvement
    Components: C++
    Reporter: Wes McKinney
    Fix For: 1.0.0

Now that this is running in GitHub Actions, we can probably skip it in Travis CI? Are there any other barriers to turning this off and saving the CI build time?
[jira] [Resolved] (ARROW-6634) [C++] Do not require flatbuffers or flatbuffers_ep to build
[ https://issues.apache.org/jira/browse/ARROW-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-6634.
---------------------------------
    Resolution: Fixed

Issue resolved by pull request 5464
[https://github.com/apache/arrow/pull/5464]

> [C++] Do not require flatbuffers or flatbuffers_ep to build
>
> Key: ARROW-6634
> URL: https://issues.apache.org/jira/browse/ARROW-6634
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> Flatbuffers is small enough that we can vendor {{flatbuffers/flatbuffers.h}} and check in the compiled files to make flatbuffers_ep unneeded.
[jira] [Resolved] (ARROW-6091) [Rust] [DataFusion] Implement parallel execution for limit
[ https://issues.apache.org/jira/browse/ARROW-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Grove resolved ARROW-6091.
-------------------------------
    Fix Version/s: 1.0.0
    Resolution: Fixed

Issue resolved by pull request 5509
[https://github.com/apache/arrow/pull/5509]

> [Rust] [DataFusion] Implement parallel execution for limit
>
> Key: ARROW-6091
> URL: https://issues.apache.org/jira/browse/ARROW-6091
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: Rust - DataFusion
> Reporter: Andy Grove
> Assignee: Andy Grove
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
> Time Spent: 1h 10m
> Remaining Estimate: 0h
[jira] [Resolved] (ARROW-6744) [Rust] Export JsonEqual trait in the array module
[ https://issues.apache.org/jira/browse/ARROW-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paddy Horan resolved ARROW-6744.
--------------------------------
    Fix Version/s: 1.0.0
    Resolution: Fixed

Issue resolved by pull request 5549
[https://github.com/apache/arrow/pull/5549]

> [Rust] Export JsonEqual trait in the array module
>
> Key: ARROW-6744
> URL: https://issues.apache.org/jira/browse/ARROW-6744
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Rust
> Reporter: Kyle McCarthy
> Assignee: Kyle McCarthy
> Priority: Trivial
> Labels: easyfix, pull-request-available
> Fix For: 1.0.0
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> ARROW-5901 added checking for array equality with JSON arrays. This added the JsonEqual trait bound to the Array trait, but it isn't exported, making it private.
> JsonEqual is a public trait, but the equal module is private and the JsonEqual trait isn't exported the way the ArrayEqual trait is.
> AFAIK this makes it impossible to implement your own arrays that are bound by the Array trait.
> I suggest that JsonEqual be exported from the array module with pub use, like the ArrayEqual trait.
[jira] [Assigned] (ARROW-6744) [Rust] Export JsonEqual trait in the array module
[ https://issues.apache.org/jira/browse/ARROW-6744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paddy Horan reassigned ARROW-6744:
----------------------------------
    Assignee: Kyle McCarthy

> [Rust] Export JsonEqual trait in the array module
>
> Key: ARROW-6744
> URL: https://issues.apache.org/jira/browse/ARROW-6744
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Rust
> Reporter: Kyle McCarthy
> Assignee: Kyle McCarthy
> Priority: Trivial
> Labels: easyfix, pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> ARROW-5901 added checking for array equality with JSON arrays. This added the JsonEqual trait bound to the Array trait, but it isn't exported, making it private.
> JsonEqual is a public trait, but the equal module is private and the JsonEqual trait isn't exported the way the ArrayEqual trait is.
> AFAIK this makes it impossible to implement your own arrays that are bound by the Array trait.
> I suggest that JsonEqual be exported from the array module with pub use, like the ArrayEqual trait.
[jira] [Updated] (ARROW-6681) [C# -> R] - Record Batches in reverse order?
[ https://issues.apache.org/jira/browse/ARROW-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-6681:
----------------------------------
    Labels: pull-request-available  (was: )

> [C# -> R] - Record Batches in reverse order?
>
> Key: ARROW-6681
> URL: https://issues.apache.org/jira/browse/ARROW-6681
> Project: Apache Arrow
> Issue Type: Bug
> Components: C#, R
> Affects Versions: 0.14.1
> Reporter: Anthony Abate
> Priority: Minor
> Labels: pull-request-available
>
> Are RecordBatches in C# being written in reverse order?
> I made a simple test which creates a single row per record batch with values 0 to 99 and attempted to read this in R. To my surprise, batch(0) in R had the value 99, not 0.
> This may not seem like a big deal; however, when dealing with huge files, it's more efficient to use record batches / index lookup than attempting to load the entire file into memory.
> Keeping the order consistent across the different language APIs only makes sense. For now I can work around this by reversing the order before writing.
>
> https://github.com/apache/arrow/issues/5475
[jira] [Updated] (ARROW-6580) [Java] Support comparison for unsigned integers
[ https://issues.apache.org/jira/browse/ARROW-6580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-6580:
----------------------------------
    Labels: pull-request-available  (was: )

> [Java] Support comparison for unsigned integers
>
> Key: ARROW-6580
> URL: https://issues.apache.org/jira/browse/ARROW-6580
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Java
> Reporter: Liya Fan
> Assignee: Liya Fan
> Priority: Minor
> Labels: pull-request-available
>
> In this issue, we support the comparison of unsigned integer vectors, including UInt1Vector, UInt2Vector, UInt4Vector, and UInt8Vector.
> With comparison support for these vectors, sorting them is also supported automatically.
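The reason unsigned comparison needs explicit support is that Java has no unsigned primitive types, so e.g. a UInt1 value is stored in a signed byte and must be masked before comparing. A minimal sketch of that masking, in Python for illustration (not the actual Java implementation):

```python
def compare_unsigned_byte(a, b):
    """Compare two values stored as signed 8-bit bytes as if unsigned.

    Masking with 0xFF reinterprets the two's-complement bit pattern as a
    value in 0..255, which is the unsigned ordering the vectors need.
    """
    ua, ub = a & 0xFF, b & 0xFF
    return (ua > ub) - (ua < ub)

# -1 as a signed byte is bit pattern 0xFF = 255 unsigned, so it compares
# greater than 1 under unsigned ordering (but less than 1 under signed).
print(compare_unsigned_byte(-1, 1))  # 1
```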
[jira] [Closed] (ARROW-6757) [Python] Creating csv.ParseOptions() causes "Windows fatal exception: access violation" with Visual Studio 2017
[ https://issues.apache.org/jira/browse/ARROW-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney closed ARROW-6757.
-------------------------------
    Resolution: Cannot Reproduce

I was having other problems with my Miniconda -- I installed a new Miniconda, re-bootstrapped the dev environment, and then was not able to reproduce.

> [Python] Creating csv.ParseOptions() causes "Windows fatal exception: access violation" with Visual Studio 2017
>
> Key: ARROW-6757
> URL: https://issues.apache.org/jira/browse/ARROW-6757
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Wes McKinney
> Priority: Major
>
> I encountered this when trying to verify the release with MSVC 2017. It may be particular to this machine or build (though it's 100% reproducible for me). I will check the Windows wheels to see if it occurs there, too
> {code}
> (C:\tmp\arrow-verify-release\conda-env) λ python
> Python 3.7.3 | packaged by conda-forge | (default, Jul  1 2019, 22:01:29) [MSC v.1900 64 bit (AMD64)] :: Anaconda, Inc. on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyarrow.csv as pc
> >>> pc.ParseOptions()
> {code}
[jira] [Updated] (ARROW-6757) [Python] Creating csv.ParseOptions() causes "Windows fatal exception: access violation" with Visual Studio 2017
[ https://issues.apache.org/jira/browse/ARROW-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-6757:
--------------------------------
    Fix Version/s: (was: 1.0.0)

> [Python] Creating csv.ParseOptions() causes "Windows fatal exception: access violation" with Visual Studio 2017
>
> Key: ARROW-6757
> URL: https://issues.apache.org/jira/browse/ARROW-6757
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Wes McKinney
> Priority: Major
>
> I encountered this when trying to verify the release with MSVC 2017. It may be particular to this machine or build (though it's 100% reproducible for me). I will check the Windows wheels to see if it occurs there, too
> {code}
> (C:\tmp\arrow-verify-release\conda-env) λ python
> Python 3.7.3 | packaged by conda-forge | (default, Jul  1 2019, 22:01:29) [MSC v.1900 64 bit (AMD64)] :: Anaconda, Inc. on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyarrow.csv as pc
> >>> pc.ParseOptions()
> {code}
[jira] [Created] (ARROW-6786) [C++] arrow-dataset-file-parquet-test is slow
Antoine Pitrou created ARROW-6786:
-------------------------------------
    Summary: [C++] arrow-dataset-file-parquet-test is slow
    Key: ARROW-6786
    URL: https://issues.apache.org/jira/browse/ARROW-6786
    Project: Apache Arrow
    Issue Type: Bug
    Components: C++
    Reporter: Antoine Pitrou

It takes 15 seconds in debug mode (probably more with ASAN / UBSAN, etc.) to run 2 tests that simply iterate through a generated in-memory dataset:

{code}
$ ./build-test/debug/arrow-dataset-file-parquet-test
Running main() from /home/conda/feedstock_root/build_artifacts/gtest_1551008230529/work/googletest/src/gtest_main.cc
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from TestParquetFileFormat
[ RUN      ] TestParquetFileFormat.ScanRecordBatchReader
[       OK ] TestParquetFileFormat.ScanRecordBatchReader (7338 ms)
[ RUN      ] TestParquetFileFormat.Inspect
[       OK ] TestParquetFileFormat.Inspect (6222 ms)
[----------] 2 tests from TestParquetFileFormat (13560 ms total)
[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (13560 ms total)
[  PASSED  ] 2 tests.
{code}

Unless it is stressing something in particular, the number of repetitions or the batch size can probably be reduced dramatically.
[jira] [Resolved] (ARROW-6785) [JS] Remove superfluous child assignment
[ https://issues.apache.org/jira/browse/ARROW-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-6785.
---------------------------------
    Resolution: Fixed

Issue resolved by pull request 5394
[https://github.com/apache/arrow/pull/5394]

> [JS] Remove superfluous child assignment
>
> Key: ARROW-6785
> URL: https://issues.apache.org/jira/browse/ARROW-6785
> Project: Apache Arrow
> Issue Type: Bug
> Components: JavaScript
> Reporter: Wes McKinney
> Assignee: Adam M Krebs
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Per PR
[jira] [Assigned] (ARROW-6785) [JS] Remove superfluous child assignment
[ https://issues.apache.org/jira/browse/ARROW-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney reassigned ARROW-6785:
-----------------------------------
    Assignee: Adam M Krebs

> [JS] Remove superfluous child assignment
>
> Key: ARROW-6785
> URL: https://issues.apache.org/jira/browse/ARROW-6785
> Project: Apache Arrow
> Issue Type: Bug
> Components: JavaScript
> Reporter: Wes McKinney
> Assignee: Adam M Krebs
> Priority: Major
> Fix For: 1.0.0
>
> Per PR
[jira] [Updated] (ARROW-6785) [JS] Remove superfluous child assignment
[ https://issues.apache.org/jira/browse/ARROW-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-6785:
----------------------------------
    Labels: pull-request-available  (was: )

> [JS] Remove superfluous child assignment
>
> Key: ARROW-6785
> URL: https://issues.apache.org/jira/browse/ARROW-6785
> Project: Apache Arrow
> Issue Type: Bug
> Components: JavaScript
> Reporter: Wes McKinney
> Assignee: Adam M Krebs
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Per PR
[jira] [Created] (ARROW-6785) [JS] Remove superfluous child assignment
Wes McKinney created ARROW-6785:
-----------------------------------
    Summary: [JS] Remove superfluous child assignment
    Key: ARROW-6785
    URL: https://issues.apache.org/jira/browse/ARROW-6785
    Project: Apache Arrow
    Issue Type: Bug
    Components: JavaScript
    Reporter: Wes McKinney
    Fix For: 1.0.0

Per PR
[jira] [Updated] (ARROW-6764) [C++] Add readahead iterator
[ https://issues.apache.org/jira/browse/ARROW-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-6764:
----------------------------------
    Labels: pull-request-available  (was: )

> [C++] Add readahead iterator
>
> Key: ARROW-6764
> URL: https://issues.apache.org/jira/browse/ARROW-6764
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Antoine Pitrou
> Assignee: Antoine Pitrou
> Priority: Major
> Labels: pull-request-available
>
> This could replace the current ad-hoc ReadaheadSpooler, at least for JSON. CSV currently uses non-zero padding, but it could switch to the same strategy as JSON (i.e. keep track of partial / completion blocks).
[jira] [Commented] (ARROW-1900) [C++] Add kernel functions for determining value range (maximum and minimum) of integer arrays
[ https://issues.apache.org/jira/browse/ARROW-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943794#comment-16943794 ]

Neal Richardson commented on ARROW-1900:
----------------------------------------
This would have been helpful in ARROW-3808, not as an optimization but because I literally wanted the min and max of an integer array.

> [C++] Add kernel functions for determining value range (maximum and minimum) of integer arrays
>
> Key: ARROW-1900
> URL: https://issues.apache.org/jira/browse/ARROW-1900
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Wes McKinney
> Priority: Major
> Labels: Analytics
> Fix For: 1.0.0
>
> These functions can be useful internally for determining when a "small range" alternative to a hash table can be used for integer arrays. The maximum and minimum is determined in a single scan.
> We already have infrastructure for aggregate kernels, so this would be an easy addition.
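The single-scan idea in the issue description can be sketched briefly. This is an illustrative Python version, not the proposed C++ kernel: both aggregates are updated in one pass over the values, which is what makes a cheap "small range" check possible before falling back to a hash table.

```python
def min_max(values):
    """Return (min, max) of an iterable in a single scan; None if empty."""
    it = iter(values)
    try:
        lo = hi = next(it)
    except StopIteration:
        return None
    for v in it:
        if v < lo:
            lo = v
        elif v > hi:
            hi = v
    return lo, hi

print(min_max([5, -2, 9, 3]))  # (-2, 9)
```

The range `hi - lo` can then decide whether a dense array indexed by value is feasible instead of a hash table.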
[jira] [Created] (ARROW-6784) [C++][R] Move filter, take, select C++ code from Rcpp to C++ library
Neal Richardson created ARROW-6784:
--------------------------------------
    Summary: [C++][R] Move filter, take, select C++ code from Rcpp to C++ library
    Key: ARROW-6784
    URL: https://issues.apache.org/jira/browse/ARROW-6784
    Project: Apache Arrow
    Issue Type: Improvement
    Components: C++
    Reporter: Neal Richardson
    Fix For: 1.0.0

Followup to ARROW-3808 and some other previous work. Of particular interest:

* Filter and Take methods for ChunkedArray, in r/src/compute.cpp
* Methods for that and some other things that apply Array and ChunkedArray methods across the columns of a RecordBatch or Table, respectively
* RecordBatch__select and Table__select to take columns
[jira] [Resolved] (ARROW-6771) [Packaging][Python] Missing pytest dependency from conda and wheel builds
[ https://issues.apache.org/jira/browse/ARROW-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-6771.
---------------------------------
    Fix Version/s: (was: 1.0.0)
                   0.15.0
    Resolution: Fixed

Issue resolved by pull request 5569
[https://github.com/apache/arrow/pull/5569]

> [Packaging][Python] Missing pytest dependency from conda and wheel builds
>
> Key: ARROW-6771
> URL: https://issues.apache.org/jira/browse/ARROW-6771
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Packaging, Python
> Reporter: Krisztian Szucs
> Assignee: Krisztian Szucs
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Multiple python packaging nightlies are failing:
> {code}
> Failed Tasks:
> - conda-osx-clang-py36:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-osx-clang-py36
> - conda-osx-clang-py37:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-osx-clang-py37
> - conda-win-vs2015-py36:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-win-vs2015-py36
> - wheel-manylinux1-cp27mu:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-travis-wheel-manylinux1-cp27mu
> - conda-linux-gcc-py27:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-linux-gcc-py27
> - wheel-osx-cp27m:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-travis-wheel-osx-cp27m
> - docker-spark-integration:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-circle-docker-spark-integration
> - wheel-win-cp35m:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-appveyor-wheel-win-cp35m
> - conda-win-vs2015-py37:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-win-vs2015-py37
> - conda-linux-gcc-py37:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-linux-gcc-py37
> - wheel-manylinux2010-cp27mu:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-travis-wheel-manylinux2010-cp27mu
> - conda-linux-gcc-py36:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-linux-gcc-py36
> - wheel-win-cp37m:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-appveyor-wheel-win-cp37m
> - wheel-win-cp36m:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-appveyor-wheel-win-cp36m
> - gandiva-jar-osx:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-travis-gandiva-jar-osx
> - conda-osx-clang-py27:
>   URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-02-0-azure-conda-osx-clang-py27
> {code}
> Because of the missing, recently introduced pytest-lazy-fixture test dependency:
> {code}
> + pytest -m 'not requires_testing_data' --pyargs pyarrow
> ============================= test session starts ==============================
> platform linux -- Python 3.7.3, pytest-5.2.0, py-1.8.0, pluggy-0.13.0
> hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('$SRC_DIR/.hypothesis/examples')
> rootdir: $SRC_DIR
> plugins: hypothesis-4.38.1
> collected 1437 items / 1 errors / 3 deselected / 5 skipped / 1428 selected
> ERRORS
> __ ERROR collecting tests/test_fs.py ___
> ../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/python3.7/site-packages/pyarrow/tests/test_fs.py:91: in
>     pytest.lazy_fixture('localfs'),
> E   AttributeError: module 'pytest' has no attribute 'lazy_fixture'
> =============================== warnings summary ===============================
> $PREFIX/lib/python3.7/site-packages/_pytest/mark/structures.py:324
>   $PREFIX/lib/python3.7/site-packages/_pytest/mark/structures.py:324: PytestUnknownMarkWarning: Unknown pytest.mark.s3 - is this a typo? You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/latest/mark.html
>     PytestUnknownMarkWarning,
> -- Docs: https://docs.pytest.org/en/latest/warnings.html
> !!! Interrupted: 1 errors during collection >
[jira] [Updated] (ARROW-6764) [C++] Add readahead iterator
[ https://issues.apache.org/jira/browse/ARROW-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou updated ARROW-6764:
----------------------------------
    Description:
        This could replace the current ad-hoc ReadaheadSpooler, at least for JSON. CSV currently uses non-zero padding, but it could switch to the same strategy as JSON (i.e. keep track of partial / completion blocks).

      was:
        The current implementation is very ad-hoc and allows unused padding arguments. We could refactor it using the Iterator facility.

> [C++] Add readahead iterator
>
> Key: ARROW-6764
> URL: https://issues.apache.org/jira/browse/ARROW-6764
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Antoine Pitrou
> Assignee: Antoine Pitrou
> Priority: Major
>
> This could replace the current ad-hoc ReadaheadSpooler, at least for JSON. CSV currently uses non-zero padding, but it could switch to the same strategy as JSON (i.e. keep track of partial / completion blocks).
[jira] [Resolved] (ARROW-6494) [C++][Dataset] Implement basic PartitionScheme
[ https://issues.apache.org/jira/browse/ARROW-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Kietzman resolved ARROW-6494.
---------------------------------
    Fix Version/s: 0.15.0
    Resolution: Fixed

Issue resolved by pull request 5443
[https://github.com/apache/arrow/pull/5443]

> [C++][Dataset] Implement basic PartitionScheme
>
> Key: ARROW-6494
> URL: https://issues.apache.org/jira/browse/ARROW-6494
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Ben Kietzman
> Assignee: Ben Kietzman
> Priority: Major
> Labels: dataset, pull-request-available
> Fix For: 0.15.0
> Time Spent: 6h 10m
> Remaining Estimate: 0h
>
> The PartitionScheme interface parses paths and yields the partition expressions which are encoded in those paths. For example, the Hive partition scheme would yield {{"a"_ = 2 and "b"_ = 3}} from "a=2/b=3/*.parquet".
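The Hive partition scheme mentioned in the description can be sketched in a few lines. This is an illustrative Python version, not the C++ PartitionScheme API: each "key=value" path segment contributes one equality term, represented here as a plain dict rather than expression objects.

```python
def parse_hive_path(path):
    """Extract partition key/value pairs from a Hive-style path.

    "a=2/b=3/part-0.parquet" -> {"a": 2, "b": 3}; segments without "="
    (directories, file names) contribute nothing.
    """
    exprs = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            exprs[key] = int(value) if value.isdigit() else value
    return exprs

print(parse_hive_path("a=2/b=3/part-0.parquet"))  # {'a': 2, 'b': 3}
```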
[jira] [Updated] (ARROW-6764) [C++] Add readahead iterator
[ https://issues.apache.org/jira/browse/ARROW-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou updated ARROW-6764:
----------------------------------
    Summary: [C++] Add readahead iterator  (was: [C++] Simplify readahead implementation)

> [C++] Add readahead iterator
>
> Key: ARROW-6764
> URL: https://issues.apache.org/jira/browse/ARROW-6764
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Antoine Pitrou
> Assignee: Antoine Pitrou
> Priority: Major
>
> The current implementation is very ad-hoc and allows unused padding arguments.
> We could refactor it using the Iterator facility.
[jira] [Commented] (ARROW-5611) [C++] Improve clang-tidy speed
[ https://issues.apache.org/jira/browse/ARROW-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943757#comment-16943757 ]

Francois Saint-Jacques commented on ARROW-5611:
-----------------------------------------------
One major win would be to scope only to modified files (in the current branch) instead of the whole directory, like the iwyu wrapper does.

> [C++] Improve clang-tidy speed
>
> Key: ARROW-5611
> URL: https://issues.apache.org/jira/browse/ARROW-5611
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Developer Tools
> Reporter: Francois Saint-Jacques
> Priority: Minor
> Fix For: 1.0.0
>
> See https://github.com/apache/arrow/pull/4293#issuecomment-501950675
[jira] [Commented] (ARROW-6774) [Rust] Reading parquet file is slow
[ https://issues.apache.org/jira/browse/ARROW-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943755#comment-16943755 ]

Adam Lippai commented on ARROW-6774:
------------------------------------
I've seen some nice work in [https://github.com/apache/arrow/blob/master/rust/parquet/src/column/reader.rs] and [https://github.com/apache/arrow/blob/master/rust/parquet/src/arrow/array_reader.rs] but I couldn't figure out how to use it. [~liurenjie1024] can you help me perhaps?

> [Rust] Reading parquet file is slow
>
> Key: ARROW-6774
> URL: https://issues.apache.org/jira/browse/ARROW-6774
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust
> Affects Versions: 0.15.0
> Reporter: Adam Lippai
> Priority: Major
>
> Using the example at [https://github.com/apache/arrow/tree/master/rust/parquet] is slow.
> The snippet
> {code:none}
> let reader = SerializedFileReader::new(file).unwrap();
> let mut iter = reader.get_row_iter(None).unwrap();
> let start = Instant::now();
> while let Some(record) = iter.next() {}
> let duration = start.elapsed();
> println!("{:?}", duration);
> {code}
> runs for 17 seconds on a ~160 MB parquet file.
> If there is a more effective way to load a parquet file, it would be nice to add it to the readme.
> P.S.: My goal is to construct an ndarray from it; I'd be happy for any tips.
[jira] [Created] (ARROW-6783) [C++] Provide API for reconstruction of RecordBatch from Flatbuffer containing process memory addresses instead of relative offsets into an IPC message
Wes McKinney created ARROW-6783: --- Summary: [C++] Provide API for reconstruction of RecordBatch from Flatbuffer containing process memory addresses instead of relative offsets into an IPC message Key: ARROW-6783 URL: https://issues.apache.org/jira/browse/ARROW-6783 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney Fix For: 1.0.0 A lot of our development has focused on _inter_process communication rather than _in_process. We should start by making sure we have disassembly and reassembly implemented where the Buffer Flatbuffer values contain process memory addresses rather than offsets. This may require a bit of refactoring so that we can use the same reassembly code path for both use cases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6782) [C++] Build minimal core Arrow libraries without any Boost headers
Wes McKinney created ARROW-6782: --- Summary: [C++] Build minimal core Arrow libraries without any Boost headers Key: ARROW-6782 URL: https://issues.apache.org/jira/browse/ARROW-6782 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Wes McKinney Fix For: 1.0.0 We have a couple of places where these are used. It would be good to be able to build without any Boost headers available -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-6613) [C++] Remove dependency on boost::filesystem
[ https://issues.apache.org/jira/browse/ARROW-6613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-6613. --- Fix Version/s: (was: 1.0.0) 0.15.0 Resolution: Fixed Issue resolved by pull request 5545 [https://github.com/apache/arrow/pull/5545] > [C++] Remove dependency on boost::filesystem > > > Key: ARROW-6613 > URL: https://issues.apache.org/jira/browse/ARROW-6613 > Project: Apache Arrow > Issue Type: Wish > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0 > > Time Spent: 4h > Remaining Estimate: 0h > > See ARROW-2196 for details. > boost::filesystem should not be required for base functionality at least > (including filesystems, probably). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6774) [Rust] Reading parquet file is slow
[ https://issues.apache.org/jira/browse/ARROW-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943703#comment-16943703 ] Wes McKinney commented on ARROW-6774: - Row-by-row iteration is going to be slow compared with vectorized / column-by-column reads. This unfinished PR was related to this (I think?), but there are Arrow-based readers available that don't require row-by-row iteration: https://github.com/apache/arrow/pull/3461 > [Rust] Reading parquet file is slow > --- > > Key: ARROW-6774 > URL: https://issues.apache.org/jira/browse/ARROW-6774 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Affects Versions: 0.15.0 >Reporter: Adam Lippai >Priority: Major > > Using the example at > [https://github.com/apache/arrow/tree/master/rust/parquet] is slow. > The snippet > {code:none} > let reader = SerializedFileReader::new(file).unwrap(); > let mut iter = reader.get_row_iter(None).unwrap(); > let start = Instant::now(); > while let Some(record) = iter.next() {} > let duration = start.elapsed(); > println!("{:?}", duration); > {code} > Runs for 17sec for a ~160MB parquet file. > If there is a more effective way to load a parquet file, it would be nice to > add it to the readme. > P.S.: My goal is to construct an ndarray from it, I'd be happy for any tips. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-2428) [Python] Add API to map Arrow types (including extension types) to pandas ExtensionArray instances for to_pandas conversions
[ https://issues.apache.org/jira/browse/ARROW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16943671#comment-16943671 ] Joris Van den Bossche commented on ARROW-2428: -- Any more thoughts on this? I implemented a POC for case 1 described above in https://github.com/apache/arrow/pull/5512. This allows round-tripping pandas ExtensionArrays, assuming the pandas.ExtensionDtype implements a {{\_\_from_arrow\_\_}} method to convert an Arrow array into a pandas ExtensionArray of that dtype (so it can be put in the resulting DataFrame as an extension array). It doesn't yet handle the other cases described above, though. > [Python] Add API to map Arrow types (including extension types) to pandas > ExtensionArray instances for to_pandas conversions > > > Key: ARROW-2428 > URL: https://issues.apache.org/jira/browse/ARROW-2428 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Uwe Korn >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > With the next release of Pandas, it will be possible to define custom column > types that back a {{pandas.Series}}. Thus we will not be able to cover all > possible column types in the {{to_pandas}} conversion by default as we won't > be aware of all extension arrays. > To enable users to create {{ExtensionArray}} instances from Arrow columns in > the {{to_pandas}} conversion, we should provide a hook in the {{to_pandas}} > call where they can overload the default conversion routines with the ones > that produce their {{ExtensionArray}} instances. > This should avoid additional copies in the case where we would nowadays first > convert the Arrow column into a default Pandas column (probably of object > type) and the user would afterwards convert it to a more efficient > {{ExtensionArray}}. This hook here will be especially useful when you build > {{ExtensionArrays}} where the storage is backed by Arrow. 
> The meta-issue that tracks the implementation inside of Pandas is: > https://github.com/pandas-dev/pandas/issues/19696 -- This message was sent by Atlassian Jira (v8.3.4#803005)
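As a rough illustration of the dispatch such a hook amounts to, here is a pure-Python sketch. The class and function names are stand-ins, not pyarrow's actual API; the point is only that the conversion defers to the dtype's {{\_\_from_arrow\_\_}} when one is defined:

```python
class MockExtensionDtype:
    # Stand-in for a pandas ExtensionDtype implementing __from_arrow__;
    # the real protocol would build an ExtensionArray from a pyarrow array.
    def __from_arrow__(self, arrow_array):
        return ["ext:%s" % v for v in arrow_array]

def convert_column(arrow_array, pandas_dtype, default=list):
    # Dispatch sketch: prefer the dtype's __from_arrow__ hook when present,
    # otherwise fall back to the default (object-dtype) conversion.
    hook = getattr(pandas_dtype, "__from_arrow__", None)
    return hook(arrow_array) if hook is not None else default(arrow_array)
```

This is also where the copy saving comes from: the dtype builds its ExtensionArray directly from the Arrow data instead of going through an intermediate object column.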
[jira] [Created] (ARROW-6781) [C++] Improve and consolidate ARROW_CHECK, DCHECK macros
Ben Kietzman created ARROW-6781: --- Summary: [C++] Improve and consolidate ARROW_CHECK, DCHECK macros Key: ARROW-6781 URL: https://issues.apache.org/jira/browse/ARROW-6781 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Ben Kietzman Assignee: Ben Kietzman Currently we have multiple macros like {{DCHECK_EQ}} and {{DCHECK_LT}} which check various comparisons but don't report anything about their operands. Furthermore, the "stream to assertion" pattern for appending extra info has proven fragile. I propose a new unified macro which can capture operands of comparisons and report them: {code:cpp} int three = 3; int five = 5; DCHECK(three == five, "extra: ", 1, 2, five); {code} Results in check failure messages like: {code} F1003 11:12:46.174767 4166 logging_test.cc:141] Check failed: three == five LHS: 3 RHS: 5 extra: 125 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
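The proposal is for a C++ macro; as a loose Python analog of the intended semantics (here the operands are passed explicitly, whereas the proposed macro would capture them from the comparison, and the extra values are streamed without separators, matching the "extra: 125" in the example output):

```python
def check(condition, lhs, rhs, *extra):
    # Python analog of the proposed unified DCHECK: on failure, report
    # both operands plus any extra context values.
    if not condition:
        msg = "Check failed: LHS: %r RHS: %r" % (lhs, rhs)
        if extra:
            msg += " extra: " + "".join(str(e) for e in extra)
        raise AssertionError(msg)

# three, five = 3, 5
# check(three == five, three, five, 1, 2, five)
# -> AssertionError: Check failed: LHS: 3 RHS: 5 extra: 125
```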
[jira] [Resolved] (ARROW-6762) [C++] JSON reader segfaults on newline
[ https://issues.apache.org/jira/browse/ARROW-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-6762. --- Fix Version/s: 0.15.0 Resolution: Fixed Issue resolved by pull request 5564 [https://github.com/apache/arrow/pull/5564] > [C++] JSON reader segfaults on newline > -- > > Key: ARROW-6762 > URL: https://issues.apache.org/jira/browse/ARROW-6762 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Joris Van den Bossche >Assignee: Antoine Pitrou >Priority: Major > Labels: json, pull-request-available > Fix For: 0.15.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Using the {{SampleRecord.jl}} attachment from ARROW-6737, I notice that > trying to read this file on master results in a segfault: > {code} > In [1]: from pyarrow import json >...: import pyarrow.parquet as pq >...: >...: r = json.read_json('SampleRecord.jl') > WARNING: Logging before InitGoogleLogging() is written to STDERR > F1002 09:56:55.362766 13035 reader.cc:93] Check failed: > (string_view(*next_partial).find_first_not_of(" \t\n\r")) == > (string_view::npos) > *** Check failure stack trace: *** > Aborted (core dumped) > {code} > while with 0.14.1 this works fine: > {code} > In [24]: from pyarrow import json > ...: import pyarrow.parquet as pq > ...: > ...: r = json.read_json('SampleRecord.jl') > In [25]: r > Out[25]: > pyarrow.Table > _type: string > provider_name: string > arrival: timestamp[s] > berthed: timestamp[s] > berth: null > cargoes: list<item: struct<movement: string, product: string, volume: string, volume_unit: string, buyer: null, seller: null>> > child 0, item: struct<movement: string, product: string, volume: string, volume_unit: string, buyer: null, seller: null> > child 0, movement: string > child 1, product: string > child 2, volume: string > child 3, volume_unit: string > child 4, buyer: null > child 5, seller: null > departure: timestamp[s] > eta: null > installation: null > port_name: string > next_zone: null > reported_date: timestamp[s] > shipping_agent: null > vessel: struct<beam: null, build_year: null, call_sign: null, dead_weight: null, dwt: null, flag_code: null, flag_name: null, gross_tonnage: null, imo: string, length: int64, mmsi: null, name: string, type: null, vessel_type: null> > child 0, beam: null > child 1, build_year: null > child 2, call_sign: null > child 3, dead_weight: null > child 4, dwt: null > child 5, flag_code: null > child 6, flag_name: null > child 7, gross_tonnage: null > child 8, imo: string > child 9, length: int64 > child 10, mmsi: null > child 11, name: string > child 12, type: null > child 13, vessel_type: null > In [26]: pa.__version__ > Out[26]: '0.14.1' > {code} > cc [~apitrou] [~bkietz] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-6764) [C++] Simplify readahead implementation
[ https://issues.apache.org/jira/browse/ARROW-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned ARROW-6764: - Assignee: Antoine Pitrou > [C++] Simplify readahead implementation > --- > > Key: ARROW-6764 > URL: https://issues.apache.org/jira/browse/ARROW-6764 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > > The current implementation is very ad-hoc and allows unused padding arguments. > We could refactor it using the Iterator facility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-6780) [C++][Parquet] Support DurationType in writing/reading parquet
Joris Van den Bossche created ARROW-6780: Summary: [C++][Parquet] Support DurationType in writing/reading parquet Key: ARROW-6780 URL: https://issues.apache.org/jira/browse/ARROW-6780 Project: Apache Arrow Issue Type: Improvement Reporter: Joris Van den Bossche Currently this is not supported: {code} In [37]: table = pa.table({'a': pa.array([1, 2], pa.duration('s'))}) In [39]: table Out[39]: pyarrow.Table a: duration[s] In [41]: pq.write_table(table, 'test_duration.parquet') ... ArrowNotImplementedError: Unhandled type for Arrow to Parquet schema conversion: duration[s] {code} There is no direct mapping to a Parquet logical type. Parquet has an INTERVAL type, but that more closely matches Arrow's interval type (YEAR_MONTH or DAY_TIME). However, the duration values could be stored as plain integers, and the logical type could be restored from the serialized Arrow schema when reading back in. -- This message was sent by Atlassian Jira (v8.3.4#803005)
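The proposed workaround can be sketched in Python. The helper names are hypothetical; in the real implementation the unit would travel in the serialized Arrow schema that the Parquet writer already embeds in the file metadata:

```python
def encode_duration(values, unit):
    # "Write" side sketch: the raw integers, plus the type information
    # that would be recorded in the serialized Arrow schema.
    return list(values), {"logical_type": "duration", "unit": unit}

def decode_duration(ints, metadata):
    # "Read" side sketch: reattach the logical type recorded in the
    # metadata to the plain integers stored in the Parquet column.
    assert metadata["logical_type"] == "duration"
    return [(v, metadata["unit"]) for v in ints]
```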
[jira] [Created] (ARROW-6779) [Python] Conversion from datetime.datetime to timestamp('ns') can overflow
Joris Van den Bossche created ARROW-6779: Summary: [Python] Conversion from datetime.datetime to timestamp('ns') can overflow Key: ARROW-6779 URL: https://issues.apache.org/jira/browse/ARROW-6779 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Joris Van den Bossche In the Python conversion of datetime scalars, there is no check for integer overflow: {code} In [32]: pa.array([datetime.datetime(3000, 1, 1)], pa.timestamp('ns')) Out[32]: [ 1830-11-23 00:50:52.580896768 ] {code} So when the target type has nanosecond unit, this can give wrong results (I don't think the other resolutions can reach overflow, given the limited range of years of datetime.datetime). We should probably check for this case and raise an error. -- This message was sent by Atlassian Jira (v8.3.4#803005)
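The missing check amounts to bounding the converted value by the int64 range before storing it. A pure-Python sketch of the intended behavior (assumed semantics, naive datetimes only; the actual conversion happens in C++):

```python
from datetime import datetime

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
EPOCH = datetime(1970, 1, 1)

def to_epoch_ns(dt):
    # Convert a naive datetime to epoch nanoseconds, raising instead of
    # silently wrapping (the current behavior) on int64 overflow.
    delta = dt - EPOCH
    ns = ((delta.days * 86400 + delta.seconds) * 10**6 + delta.microseconds) * 1000
    if not INT64_MIN <= ns <= INT64_MAX:
        raise OverflowError("%s does not fit in a nanosecond timestamp" % dt)
    return ns
```

With this check, `datetime(3000, 1, 1)` raises instead of producing a wrapped-around 1830 date.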
[jira] [Updated] (ARROW-6778) [C++] Support DurationType in Cast kernel
[ https://issues.apache.org/jira/browse/ARROW-6778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-6778: - Description: Currently, duration is not yet supported in basic cast operations (using the python binding from ARROW-5855, currently from my branch, not yet merged): {code} In [25]: arr = pa.array([1, 2]) In [26]: arr.cast(pa.duration('s')) ... ArrowNotImplementedError: No cast implemented from int64 to duration[s] In [27]: arr = pa.array([1, 2], pa.duration('s')) In [28]: arr.cast(pa.duration('ms')) ... ArrowNotImplementedError: No cast implemented from duration[s] to duration[ms] {code} > [C++] Support DurationType in Cast kernel > - > > Key: ARROW-6778 > URL: https://issues.apache.org/jira/browse/ARROW-6778 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Joris Van den Bossche >Priority: Major > > Currently, duration is not yet supported in basic cast operations (using the > python binding from ARROW-5855, currently from my branch, not yet merged): > {code} > In [25]: arr = pa.array([1, 2]) > In [26]: arr.cast(pa.duration('s')) > ... > ArrowNotImplementedError: No cast implemented from int64 to duration[s] > In [27]: arr = pa.array([1, 2], pa.duration('s')) > In [28]: arr.cast(pa.duration('ms')) > ... > ArrowNotImplementedError: No cast implemented from duration[s] to duration[ms] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
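Semantically, a duration-to-duration cast is just an integer rescale between time units. A Python sketch of what the kernel would compute (assumed semantics; whether a downcast truncates or raises on loss is a policy choice left open here):

```python
# Scale factors relative to one second for Arrow's duration units.
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}

def cast_duration(values, from_unit, to_unit):
    # Upcast (e.g. s -> ms) multiplies; downcast divides, truncating here.
    num, den = UNITS_PER_SECOND[to_unit], UNITS_PER_SECOND[from_unit]
    if num >= den:
        return [v * (num // den) for v in values]
    return [v // (den // num) for v in values]
```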
[jira] [Created] (ARROW-6778) [C++] Support DurationType in Cast kernel
Joris Van den Bossche created ARROW-6778: Summary: [C++] Support DurationType in Cast kernel Key: ARROW-6778 URL: https://issues.apache.org/jira/browse/ARROW-6778 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Joris Van den Bossche -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6773) [C++] Filter kernel returns invalid data when filtering with an Array slice
[ https://issues.apache.org/jira/browse/ARROW-6773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-6773: -- Labels: pull-request-available (was: ) > [C++] Filter kernel returns invalid data when filtering with an Array slice > --- > > Key: ARROW-6773 > URL: https://issues.apache.org/jira/browse/ARROW-6773 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > See ARROW-3808. This failing test reproduces the issue: > {code:java} > --- a/cpp/src/arrow/compute/kernels/filter_test.cc > +++ b/cpp/src/arrow/compute/kernels/filter_test.cc > @@ -151,6 +151,12 @@ TYPED_TEST(TestFilterKernelWithNumeric, FilterNumeric) { >this->AssertFilter("[7, 8, 9]", "[null, 1, 0]", "[null, 8]"); >this->AssertFilter("[7, 8, 9]", "[1, null, 1]", "[7, null, 9]"); > > + this->AssertFilterArrays( > +ArrayFromJSON(this->type_singleton(), "[7, 8, 9]"), > +ArrayFromJSON(boolean(), "[0, 1, 1, 1, 0, 1]")->Slice(3, 3), > +ArrayFromJSON(this->type_singleton(), "[7, 9]") > + ); > + > {code} > {code:java} > arrow/cpp/src/arrow/testing/gtest_util.cc:82: Failure > Failed > @@ -2, +2 @@ > +0 > [ FAILED ] TestFilterKernelWithNumeric/9.FilterNumeric, where TypeParam = > arrow::DoubleType (0 ms) > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-6773) [C++] Filter kernel returns invalid data when filtering with an Array slice
[ https://issues.apache.org/jira/browse/ARROW-6773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Kietzman resolved ARROW-6773. - Fix Version/s: (was: 1.0.0) 0.15.0 Resolution: Fixed Issue resolved by pull request 5570 [https://github.com/apache/arrow/pull/5570] > [C++] Filter kernel returns invalid data when filtering with an Array slice > --- > > Key: ARROW-6773 > URL: https://issues.apache.org/jira/browse/ARROW-6773 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0 > > Time Spent: 10m > Remaining Estimate: 0h > > See ARROW-3808. This failing test reproduces the issue: > {code:java} > --- a/cpp/src/arrow/compute/kernels/filter_test.cc > +++ b/cpp/src/arrow/compute/kernels/filter_test.cc > @@ -151,6 +151,12 @@ TYPED_TEST(TestFilterKernelWithNumeric, FilterNumeric) { >this->AssertFilter("[7, 8, 9]", "[null, 1, 0]", "[null, 8]"); >this->AssertFilter("[7, 8, 9]", "[1, null, 1]", "[7, null, 9]"); > > + this->AssertFilterArrays( > +ArrayFromJSON(this->type_singleton(), "[7, 8, 9]"), > +ArrayFromJSON(boolean(), "[0, 1, 1, 1, 0, 1]")->Slice(3, 3), > +ArrayFromJSON(this->type_singleton(), "[7, 9]") > + ); > + > {code} > {code:java} > arrow/cpp/src/arrow/testing/gtest_util.cc:82: Failure > Failed > @@ -2, +2 @@ > +0 > [ FAILED ] TestFilterKernelWithNumeric/9.FilterNumeric, where TypeParam = > arrow::DoubleType (0 ms) > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
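The fix amounts to honoring the boolean array's slice offset when reading mask bits. A pure-Python model of the corrected behavior, using the same inputs as the failing test above:

```python
def filter_with_offset(values, mask_bits, offset, length):
    # The filter kernel must read the mask starting at its slice offset,
    # not at position zero; the bug produced garbage by ignoring it.
    window = mask_bits[offset:offset + length]
    return [v for v, keep in zip(values, window) if keep]
```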
[jira] [Resolved] (ARROW-6770) [CI][Travis] Download Minio quietly
[ https://issues.apache.org/jira/browse/ARROW-6770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-6770. Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 5568 [https://github.com/apache/arrow/pull/5568] > [CI][Travis] Download Minio quietly > --- > > Key: ARROW-6770 > URL: https://issues.apache.org/jira/browse/ARROW-6770 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > To remove verbose output > https://travis-ci.org/pitrou/arrow/jobs/592577525#L191 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-6686) [CI] Pull and push docker images to speed up the nightly builds
[ https://issues.apache.org/jira/browse/ARROW-6686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-6686. Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 5485 [https://github.com/apache/arrow/pull/5485] > [CI] Pull and push docker images to speed up the nightly builds > > > Key: ARROW-6686 > URL: https://issues.apache.org/jira/browse/ARROW-6686 > Project: Apache Arrow > Issue Type: Improvement > Components: CI >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ARROW-6767) [JS] lazily bind batches in scan/scanReverse
[ https://issues.apache.org/jira/browse/ARROW-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs reassigned ARROW-6767: -- Assignee: Taylor Baldwin > [JS] lazily bind batches in scan/scanReverse > > > Key: ARROW-6767 > URL: https://issues.apache.org/jira/browse/ARROW-6767 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Taylor Baldwin >Assignee: Taylor Baldwin >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Call {{bind(batch)}} lazily in {{scan}} and {{scanReverse}}, that is, only > when the predicate has matched a record in a batch. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-6767) [JS] lazily bind batches in scan/scanReverse
[ https://issues.apache.org/jira/browse/ARROW-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-6767. Fix Version/s: 1.0.0 Resolution: Fixed Issue resolved by pull request 5565 [https://github.com/apache/arrow/pull/5565] > [JS] lazily bind batches in scan/scanReverse > > > Key: ARROW-6767 > URL: https://issues.apache.org/jira/browse/ARROW-6767 > Project: Apache Arrow > Issue Type: Improvement > Components: JavaScript >Reporter: Taylor Baldwin >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Call {{bind(batch)}} lazily in {{scan}} and {{scanReverse}}, that is, only > when the predicate has matched a record in a batch. -- This message was sent by Atlassian Jira (v8.3.4#803005)