[jira] [Created] (ARROW-7004) [Plasma] Make it possible to bump up object in LRU cache
Philipp Moritz created ARROW-7004: Summary: [Plasma] Make it possible to bump up object in LRU cache Key: ARROW-7004 URL: https://issues.apache.org/jira/browse/ARROW-7004 Project: Apache Arrow Issue Type: Improvement Components: C++ - Plasma Reporter: Philipp Moritz Assignee: Philipp Moritz To avoid evicting objects too early, we sometimes want to bump a number of objects up in the LRU cache. While it would be possible to call Get() on these objects, this can be undesirable, since it blocks on the objects if they are not available and makes it necessary to call Release() on them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
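To make the request concrete, here is a minimal sketch of an LRU cache with a non-blocking "touch" operation that refreshes recency without fetching the object. It is plain Python and not the Plasma API; the class `LRUCache` and its `touch` method are hypothetical names chosen for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache; hypothetical, not the Plasma eviction policy."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def put(self, key, value):
        # Insert or overwrite, then mark the key as most recently used.
        self._items[key] = value
        self._items.move_to_end(key)
        # Evict least recently used entries until the cache fits again.
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)

    def touch(self, keys):
        """Bump the given keys to most-recently-used without reading them.

        Keys that are not present are skipped instead of waited on, so the
        call never blocks and leaves nothing to release afterwards.
        Returns the keys that were actually bumped.
        """
        bumped = []
        for key in keys:
            if key in self._items:
                self._items.move_to_end(key)
                bumped.append(key)
        return bumped

    def keys(self):
        # Keys in eviction order: the first one would be evicted next.
        return list(self._items)
```

Unlike a Get()-style call, `touch` hands out no reference to the object, which is why no matching Release() is needed in this sketch.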
State of decimal support in Arrow (from/to Parquet Decimal Logicaltype)
Hi everyone, I have a question about the state of decimal support in Arrow when reading from/writing to Parquet. * Is writing decimals to parquet supposed to work? Are there any examples on how to do this in C++? * When reading decimals in a parquet file with pyarrow and converting the resulting table to a pandas dataframe, datatype in the cells is "object". As a consequence, performance when doing analysis on this table is suboptimal. Can I somehow directly get the decimals from the parquet file into floats/doubles in a pandas dataframe? Thanks in advance, Roman
[NIGHTLY] Arrow Build Report for Job nightly-2019-10-28-0
Arrow Build Report for Job nightly-2019-10-28-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0 Failed Tasks: - docker-clang-format: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-clang-format - docker-r-sanitizer: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-r-sanitizer Succeeded Tasks: - centos-6: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-azure-centos-6 - centos-7: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-azure-centos-7 - centos-8: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-azure-centos-8 - conda-linux-gcc-py27: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-azure-conda-linux-gcc-py27 - conda-linux-gcc-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-azure-conda-linux-gcc-py36 - conda-linux-gcc-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-azure-conda-linux-gcc-py37 - conda-osx-clang-py27: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-azure-conda-osx-clang-py27 - conda-osx-clang-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-azure-conda-osx-clang-py36 - conda-osx-clang-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-azure-conda-osx-clang-py37 - conda-win-vs2015-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-azure-conda-win-vs2015-py36 - conda-win-vs2015-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-azure-conda-win-vs2015-py37 - debian-buster: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-azure-debian-buster - debian-stretch: URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-azure-debian-stretch - docker-c_glib: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-c_glib - docker-cpp-cmake32: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-cpp-cmake32 - docker-cpp-release: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-cpp-release - docker-cpp-static-only: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-cpp-static-only - docker-cpp: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-cpp - docker-dask-integration: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-dask-integration - docker-docs: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-docs - docker-go: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-go - docker-hdfs-integration: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-hdfs-integration - docker-iwyu: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-iwyu - docker-java: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-java - docker-js: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-js - docker-lint: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-lint - docker-pandas-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-pandas-master - docker-python-2.7-nopandas: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-python-2.7-nopandas - 
docker-python-2.7: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-python-2.7 - docker-python-3.6-nopandas: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-python-3.6-nopandas - docker-python-3.6: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-python-3.6 - docker-python-3.7: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-python-3.7 - docker-r-conda: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-r-conda - docker-r: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-r - docker-rust: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-28-0-circle-docker-rust - docker-spark-integra
[jira] [Created] (ARROW-7005) [Rust] run "cargo audit" in CI
Paddy Horan created ARROW-7005: Summary: [Rust] run "cargo audit" in CI Key: ARROW-7005 URL: https://issues.apache.org/jira/browse/ARROW-7005 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Paddy Horan
[jira] [Created] (ARROW-7006) [Rust] Bump flatbuffers version to avoid vulnerability
Paddy Horan created ARROW-7006: Summary: [Rust] Bump flatbuffers version to avoid vulnerability Key: ARROW-7006 URL: https://issues.apache.org/jira/browse/ARROW-7006 Project: Apache Arrow Issue Type: Improvement Affects Versions: 0.15.0 Reporter: Paddy Horan From GitHub user emilk: [{{cargo audit}}|https://github.com/RustSec/cargo-audit] output: {{ID: RUSTSEC-2019-0028 Crate: flatbuffers Version: 0.5.0 Date: 2019-10-20 URL: https://github.com/google/flatbuffers/issues/5530 Title: Unsound `impl Follow for bool`}} The fix should be as simple as editing [https://github.com/apache/arrow/blob/master/rust/arrow/Cargo.toml] from {{flatbuffers = "0.5.0"}} to {{flatbuffers = "0.6.0"}}. A more long-term improvement is to add a call to {{cargo audit}} in your CI to catch these problems as early as possible.
[jira] [Created] (ARROW-7007) [C++] Enable mmap option for LocalFs
Francois Saint-Jacques created ARROW-7007: Summary: [C++] Enable mmap option for LocalFs Key: ARROW-7007 URL: https://issues.apache.org/jira/browse/ARROW-7007 Project: Apache Arrow Issue Type: Improvement Reporter: Francois Saint-Jacques
[jira] [Created] (ARROW-7008) [Python] pyarrow.chunked_array([array]) fails on array with
Uwe Korn created ARROW-7008: Summary: [Python] pyarrow.chunked_array([array]) fails on array with Key: ARROW-7008 URL: https://issues.apache.org/jira/browse/ARROW-7008 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 0.15.0 Reporter: Uwe Korn
Minimal reproducer:
{code}
import pyarrow as pa
pa.chunked_array([pa.array([], type=pa.string()).dictionary_encode().dictionary])
{code}
Traceback:
{code}
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x20)
  * frame #0: 0x000112cd5d0e libarrow.15.dylib`arrow::Status arrow::internal::ValidateVisitor::ValidateOffsets(arrow::BinaryArray const&) + 94
    frame #1: 0x000112cc79a3 libarrow.15.dylib`arrow::Status arrow::VisitArrayInline(arrow::Array const&, arrow::internal::ValidateVisitor*) + 915
    frame #2: 0x000112cc747d libarrow.15.dylib`arrow::Array::Validate() const + 829
    frame #3: 0x000112e3ea19 libarrow.15.dylib`arrow::ChunkedArray::Validate() const + 89
    frame #4: 0x000112b8eb7d lib.cpython-37m-darwin.so`__pyx_pw_7pyarrow_3lib_135chunked_array(_object*, _object*, _object*) + 3661
{code}
[jira] [Created] (ARROW-7009) [C++] Refactor filter/take kernels to use Datum instead of overloads
Neal Richardson created ARROW-7009: Summary: [C++] Refactor filter/take kernels to use Datum instead of overloads Key: ARROW-7009 URL: https://issues.apache.org/jira/browse/ARROW-7009 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Neal Richardson Fix For: 1.0.0 Followup to ARROW-6784. See discussion on [https://github.com/apache/arrow/pull/5686] as well as ARROW-6959.
[jira] [Created] (ARROW-7010) [C++] Support lossy casts from decimal128 to float32 and float64/double
Wes McKinney created ARROW-7010: Summary: [C++] Support lossy casts from decimal128 to float32 and float64/double Key: ARROW-7010 URL: https://issues.apache.org/jira/browse/ARROW-7010 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney Fix For: 1.0.0 I do not believe such casts are implemented. This can be helpful for people analyzing data where the precision of decimal128 is not needed
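As a rough illustration of what such a lossy cast computes, the sketch below treats a decimal128 value as an unscaled integer plus a scale and converts it to a Python float (a float64). This is not the Arrow C++ kernel; the function names `decimal128_to_float` and `cast_column` are hypothetical, and a column is modelled as a plain list in which None stands for null.

```python
def decimal128_to_float(unscaled, scale):
    """Lossy cast of the decimal value unscaled * 10**-scale to float64.

    Python's integer true division rounds correctly, so the result is the
    nearest double to the exact decimal value.
    """
    if scale >= 0:
        return unscaled / (10 ** scale)
    # A negative scale means the stored integer is multiplied, not divided.
    return float(unscaled * 10 ** (-scale))


def cast_column(unscaled_values, scale):
    """Cast a list of unscaled integers (None = null) sharing one scale."""
    return [
        None if value is None else decimal128_to_float(value, scale)
        for value in unscaled_values
    ]
```

The cast is lossy because a float64 carries about 15 to 17 significant decimal digits while decimal128 allows up to 38; for example, `decimal128_to_float(10**20 + 1, 0)` cannot keep the trailing 1.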
[jira] [Created] (ARROW-7011) [C++] Implement casts from float/double to decimal128
Wes McKinney created ARROW-7011: Summary: [C++] Implement casts from float/double to decimal128 Key: ARROW-7011 URL: https://issues.apache.org/jira/browse/ARROW-7011 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Wes McKinney see also ARROW-5905, ARROW-7010
[jira] [Created] (ARROW-7012) [C++] Clarify ChunkedArray chunking strategy and policy
Neal Richardson created ARROW-7012: Summary: [C++] Clarify ChunkedArray chunking strategy and policy Key: ARROW-7012 URL: https://issues.apache.org/jira/browse/ARROW-7012 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Neal Richardson See discussion on ARROW-6784 and [https://github.com/apache/arrow/pull/5686]. Among the questions: * Do Arrow users control the chunking, or is it an internal implementation detail they should not manage? * If users control it, how do they control it? E.g. if I call Take and use a ChunkedArray for the indices to take, does the chunking follow how the indices are chunked? Or should we attempt to preserve the mapping of data to their chunks in the input table/chunked array? * If it's an implementation detail, what is the optimal chunk size? And when is it worth reshaping (concatenating, slicing) input data to attain this optimal size?
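To make the second question concrete, here is a small sketch of one of the candidate policies: the output of Take is chunked the same way as the indices. It models chunked arrays as lists of Python lists and is not the Arrow implementation; `take_chunked` is a hypothetical name, and the other policy (preserving the data's own chunk layout) is deliberately not shown.

```python
from bisect import bisect_right
from itertools import accumulate


def take_chunked(data_chunks, index_chunks):
    """Take from chunked data; the output chunking mirrors the index chunking."""
    lengths = [len(chunk) for chunk in data_chunks]
    # starts[i] is the logical position of the first element of data chunk i.
    starts = [0] + list(accumulate(lengths))[:-1]
    total = sum(lengths)

    def lookup(position):
        if not 0 <= position < total:
            raise IndexError(position)
        # Last chunk whose start is <= position; empty chunks are skipped
        # because a later chunk with the same start wins the search.
        chunk = bisect_right(starts, position) - 1
        return data_chunks[chunk][position - starts[chunk]]

    return [[lookup(i) for i in indices] for indices in index_chunks]
```

With this policy the caller controls the output layout purely through the indices: `take_chunked([[10, 11], [], [12, 13, 14]], [[4, 0], [2]])` yields two chunks, one per index chunk, regardless of how the data was chunked.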
Re: State of decimal support in Arrow (from/to Parquet Decimal Logicaltype)
hi Roman, On Mon, Oct 28, 2019 at 5:56 AM wrote: > > Hi everyone, > > > > I have a question about the state of decimal support in Arrow when reading > from/writing to Parquet. > > * Is writing decimals to parquet supposed to work? Are there any > examples on how to do this in C++? Yes, it's supported, the details are here https://github.com/apache/arrow/blob/46cdf557eb710f17f71a10609e5f497ca585ae1c/cpp/src/parquet/column_writer.cc#L1511 > * When reading decimals in a parquet file with pyarrow and converting > the resulting table to a pandas dataframe, datatype in the cells is > "object". As a consequence, performance when doing analysis on this table is > suboptimal. Can I somehow directly get the decimals from the parquet file > into floats/doubles in a pandas dataframe? Some work will be required. The cleanest way would be to cast decimal128 columns to float32/float64 prior to converting to pandas. I didn't see an issue for this right away so I opened https://issues.apache.org/jira/browse/ARROW-7010 I also opened https://issues.apache.org/jira/browse/ARROW-7011 about going the other way. This would be a useful thing to contribute to the project. Thanks Wes > > > Thanks in advance, > > Roman > > >
Re: Achieving parity with Java extension types in Python
Adding dev@ I don't believe we have APIs yet for plugging in user-defined Array subtypes. I assume you've read http://arrow.apache.org/docs/python/extending_types.html#defining-extension-types-user-defined-types There may be some JIRA issues already about this (defining subclasses of pa.Array with custom behavior) -- since Joris has been working on this I'm interested in more comments On Mon, Oct 28, 2019 at 3:56 PM Justin Polchlopek wrote: > > Hi! > > I've been working through understanding extension types in Arrow. It's a > great feature, and I've had no problems getting things working in Java/Scala; > however, Python has been a bit of a different story. Not that I am unable to > create and register extension types in Python, but rather that I can't seem > to recreate the functionality provided by the Java API's ExtensionTypeVector > class. > > In Java, ExtensionType::getNewVector() provides a clear pathway from the > registered type to output a vector in something other than the underlying > vector type, and I am at a loss for how to get this same functionality in > Python. Am I missing something? > > Thanks for any hints. > -Justin
Re: [VOTE] Release Apache Arrow 0.15.1 - RC0
I started looking at some of the Python wheels and found that the macOS Python 3.7 wheel is corrupted. Note that it's only 101KB while the other macOS wheels are ~35MB. Eyeballing the file list at https://bintray.com/apache/arrow/python-rc/0.15.1-rc0#files/python-rc/0.15.1-rc0 it seems this is the only wheel with this issue, but this suggests that we should prioritize some kind of wheel integrity check using Crossbow jobs. An issue for this is https://issues.apache.org/jira/browse/ARROW-2880 I'm going to check out some other wheels to see if they are OK, but maybe just this one wheel can be regenerated? On Sun, Oct 27, 2019 at 4:31 PM Sutou Kouhei wrote: > > +1 (binding) > > I ran the followings on Debian GNU/Linux sid: > > * TEST_CSHARP=0 \ > JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 \ > CUDA_TOOLKIT_ROOT=/usr \ > dev/release/verify-release-candidate.sh source 0.15.1 0 > * dev/release/verify-release-candidate.sh binaries 0.15.1 0 > > with: > > * gcc (Debian 9.2.1-8) 9.2.1 20190909 > * openjdk version "1.8.0_232-ea" > * Node.JS v12.1.0 > * go version go1.12.10 linux/amd64 > * nvidia-cuda-dev 10.1.105-3+b1 > > Notes: > > * C# sourcelink is failed as usual. > > * We can't use dev/release/verify-release-candidate.sh on > master to verify source because it depends on the latest > archery. We need to use > dev/release/verify-release-candidate.sh in 0.15.1. > > > Thanks, > -- > kou > > In > "[VOTE] Release Apache Arrow 0.15.1 - RC0" on Fri, 25 Oct 2019 20:43:07 > +0200, > Krisztián Szűcs wrote: > > > Hi, > > > > I would like to propose the following release candidate (RC0) of Apache > > Arrow version 0.15.1. This is a patch release consisting of 36 resolved > > JIRA issues[1]. > > > > This release candidate is based on commit: > > b789226ccb2124285792107c758bb3b40b3d082a [2] > > > > The source release rc0 is hosted at [3]. > > The binary artifacts are hosted at [4][5][6][7]. > > The changelog is located at [8]. 
> > > > Please download, verify checksums and signatures, run the unit tests, > > and vote on the release. See [9] for how to validate a release candidate. > > > > The vote will be open for at least 72 hours. > > > > [ ] +1 Release this as Apache Arrow 0.15.1 > > [ ] +0 > > [ ] -1 Do not release this as Apache Arrow 0.15.1 because... > > > > [1]: > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20fixVersion%20%3D%200.15.1 > > [2]: > > https://github.com/apache/arrow/tree/b789226ccb2124285792107c758bb3b40b3d082a > > [3]: https://dist.apache.org/repos/dist/dev/arrow/apache-arrow-0.15.1-rc0 > > [4]: https://bintray.com/apache/arrow/centos-rc/0.15.1-rc0 > > [5]: https://bintray.com/apache/arrow/debian-rc/0.15.1-rc0 > > [6]: https://bintray.com/apache/arrow/python-rc/0.15.1-rc0 > > [7]: https://bintray.com/apache/arrow/ubuntu-rc/0.15.1-rc0 > > [8]: > > https://github.com/apache/arrow/blob/b789226ccb2124285792107c758bb3b40b3d082a/CHANGELOG.md > > [9]: > > https://cwiki.apache.org/confluence/display/ARROW/How+to+Verify+Release+Candidates
[jira] [Created] (ARROW-7013) [C++] arrow-dataset pkgconfig is incomplete
Neal Richardson created ARROW-7013: Summary: [C++] arrow-dataset pkgconfig is incomplete Key: ARROW-7013 URL: https://issues.apache.org/jira/browse/ARROW-7013 Project: Apache Arrow Issue Type: Bug Components: C++ - Dataset Reporter: Neal Richardson Assignee: Neal Richardson Fix For: 1.0.0 Unlike the other *.pc.in files, it doesn't include a {{Libs}} field, so passing the result of what is found by pkgconfig results in the lib still not being found.
[jira] [Created] (ARROW-7014) [Developer] Write script to verify Linux wheels given local environment with conda or virtualenv
Wes McKinney created ARROW-7014: Summary: [Developer] Write script to verify Linux wheels given local environment with conda or virtualenv Key: ARROW-7014 URL: https://issues.apache.org/jira/browse/ARROW-7014 Project: Apache Arrow Issue Type: New Feature Components: Developer Tools, Python Reporter: Wes McKinney Fix For: 1.0.0 Facilitate testing RC wheels. Also test checksum and sig
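A rough sketch of the platform-independent part of such a check, using only the Python standard library: it flags a wheel that is suspiciously small (like the 101KB macOS wheel reported in the 0.15.1 RC0 vote thread), is not a readable zip archive, or does not match an expected SHA-256 checksum. The function names `sha256_of` and `check_wheel` and the `min_bytes` threshold are assumptions for illustration; signature verification is left out because it needs an external tool such as gpg, and installing and importing the wheel is also not covered.

```python
import hashlib
import os
import zipfile


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def check_wheel(path, expected_sha256=None, min_bytes=1_000_000):
    """Return a list of problems found with the wheel (empty if none)."""
    problems = []
    size = os.path.getsize(path)
    if size < min_bytes:
        problems.append(f"suspiciously small: {size} bytes")
    if expected_sha256 is not None:
        if sha256_of(path) != expected_sha256.strip().lower():
            problems.append("checksum mismatch")
    if not zipfile.is_zipfile(path):
        problems.append("not a valid zip archive")
        return problems
    with zipfile.ZipFile(path) as wheel:
        # testzip() returns the first member with a bad CRC, or None.
        bad_member = wheel.testzip()
        if bad_member is not None:
            problems.append(f"corrupt member: {bad_member}")
        names = wheel.namelist()
        if not any(name.endswith(".dist-info/METADATA") for name in names):
            problems.append("missing .dist-info/METADATA")
    return problems
```

A size threshold is crude, but it would have caught the truncated wheel without needing any platform-specific tooling.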
[jira] [Created] (ARROW-7015) [Developer] Write script to verify macOS wheels given local environment with conda or virtualenv
Wes McKinney created ARROW-7015: Summary: [Developer] Write script to verify macOS wheels given local environment with conda or virtualenv Key: ARROW-7015 URL: https://issues.apache.org/jira/browse/ARROW-7015 Project: Apache Arrow Issue Type: New Feature Components: Developer Tools, Python Reporter: Wes McKinney Fix For: 1.0.0 macOS analogue to ARROW-7014
[jira] [Created] (ARROW-7016) [Developer][Python] Write script to verify Windows wheels given local environment with conda
Wes McKinney created ARROW-7016: Summary: [Developer][Python] Write script to verify Windows wheels given local environment with conda Key: ARROW-7016 URL: https://issues.apache.org/jira/browse/ARROW-7016 Project: Apache Arrow Issue Type: New Feature Components: Developer Tools, Python Reporter: Wes McKinney Fix For: 1.0.0 Windows version of ARROW-7014
[jira] [Created] (ARROW-7017) [C++] Refactor AddKernel to support other operations and types
Francois Saint-Jacques created ARROW-7017: Summary: [C++] Refactor AddKernel to support other operations and types Key: ARROW-7017 URL: https://issues.apache.org/jira/browse/ARROW-7017 Project: Apache Arrow Issue Type: Improvement Components: C++ - Compute Reporter: Francois Saint-Jacques * Should avoid using builders (and/or NULLs) since the output shape is known at compute time. * Should be refactored to support other operations, e.g. Subtraction, Multiplication. * Should have an overflow/underflow detection mode.
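The three points above can be sketched together in a few lines of Python. This is only an illustration of the idea, not the Arrow C++ kernel: `binary_int64` is a hypothetical name, arrays are modelled as lists of int64 values with None for null, and the wrap-around branch is an assumed stand-in for what an unchecked mode might do.

```python
import operator

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def binary_int64(op, left, right, check_overflow=True):
    """Apply a binary integer operation element-wise to two int64 lists.

    `op` is any two-argument function on ints, e.g. operator.add,
    operator.sub or operator.mul. None marks a null slot.
    """
    if len(left) != len(right):
        raise ValueError("inputs must have the same length")
    # The output length is known up front, so preallocate instead of
    # appending element by element the way a builder would.
    out = [None] * len(left)
    for i, (a, b) in enumerate(zip(left, right)):
        if a is None or b is None:
            continue  # null in, null out
        result = op(a, b)
        if not INT64_MIN <= result <= INT64_MAX:
            if check_overflow:
                raise OverflowError(f"int64 overflow at index {i}")
            # Unchecked mode: wrap around like two's-complement arithmetic.
            result = (result - INT64_MIN) % (2 ** 64) + INT64_MIN
        out[i] = result
    return out
```

Passing the operation as a parameter is one way to share the null handling and the overflow check between addition, subtraction and multiplication; for example, `binary_int64(operator.sub, [5, None], [2, 1])` gives `[3, None]`.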