[jira] [Updated] (ARROW-9127) [Rust] Update thrift library dependencies
[ https://issues.apache.org/jira/browse/ARROW-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-9127:
----------------------------------
    Labels: pull-request-available  (was: )

> [Rust] Update thrift library dependencies
> -----------------------------------------
>
> Key: ARROW-9127
> URL: https://issues.apache.org/jira/browse/ARROW-9127
> Project: Apache Arrow
> Issue Type: Bug
> Reporter: Andrew Lamb
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Update to the latest version of Apache Thrift (1.3).
>
> Rationale:
> We were trying to update the version of `byteorder` that an internal project
> used, but arrow/parquet depends on parquet-format-rs, which depends on thrift.
>
> [~sunchao] recently updated the thrift pin in parquet-format in
> [https://github.com/apache/arrow/pull/6626], so it is now possible to update
> the thrift version here as well.
>
> The thrift dependency update was postponed when the dependencies were last
> updated. See:
> [https://github.com/apache/arrow/pull/6626]
> https://issues.apache.org/jira/browse/ARROW-8124

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-9126) [C++] Trimmed Boost bundle fails to build on Windows
[ https://issues.apache.org/jira/browse/ARROW-9126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-9126:
----------------------------------
    Labels: pull-request-available  (was: )

> [C++] Trimmed Boost bundle fails to build on Windows
> ----------------------------------------------------
>
> Key: ARROW-9126
> URL: https://issues.apache.org/jira/browse/ARROW-9126
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Cuong Nguyen
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Build with the following commands
> {code:java}
> mkdir build
> cd build
> cmake .. -DARROW_PARQUET=ON
> cmake --build .{code}
> Error from build log
> {code:java}
> .\boost/graph/two_bit_color_map.hpp(106): fatal error C1083: Cannot open
> include file: 'boost/graph/detail/empty_header.hpp': No such file or directory
> {code}
> This was because configuring Boost to build a subset of libraries doesn't
> work on Windows as it does on Linux. As a result, all libraries, including
> those being trimmed, were built:
> {code:java}
> Component configuration:
> - atomic : building
> - chrono : building
> - container : building
> - date_time : building
> - exception : building
> - filesystem : building
> - headers : building
> - iostreams : building
> - locale : building
> - log : building
> - mpi : building
> - program_options : building
> - python : building
> - random : building
> - regex : building
> - serialization : building
> - system : building
> - test : building
> - thread : building
> - timer : building
> - wave : building
> {code}
[jira] [Updated] (ARROW-9125) [C++] Add missing include for arrow::internal::ZeroMemory() for Valgrind
[ https://issues.apache.org/jira/browse/ARROW-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9125: -- Labels: pull-request-available (was: ) > [C++] Add missing include for arrow::internal::ZeroMemory() for Valgrind > > > Key: ARROW-9125 > URL: https://issues.apache.org/jira/browse/ARROW-9125 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9124) [Rust][Datafusion] DFParser should consume sql query as &str instead of String
[ https://issues.apache.org/jira/browse/ARROW-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9124: -- Labels: pull-request-available (was: ) > [Rust][Datafusion] DFParser should consume sql query as &str instead of String > -- > > Key: ARROW-9124 > URL: https://issues.apache.org/jira/browse/ARROW-9124 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust - DataFusion >Reporter: QP Hou >Assignee: QP Hou >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > It's more efficient to use &str instead of String -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9123) [Python][wheel] Use libzstd.a explicitly
[ https://issues.apache.org/jira/browse/ARROW-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9123: -- Labels: pull-request-available (was: ) > [Python][wheel] Use libzstd.a explicitly > > > Key: ARROW-9123 > URL: https://issues.apache.org/jira/browse/ARROW-9123 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging, Python >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > {{ARROW_ZSTD_USE_SHARED}} is introduced by ARROW-9084. We need to set > {{ARROW_ZSTD_USE_SHARED=OFF}} explicitly to use static zstd library. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9116) [C++] Add BinaryArray::total_values_length()
[ https://issues.apache.org/jira/browse/ARROW-9116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-9116:
----------------------------------
    Labels: pull-request-available  (was: )

> [C++] Add BinaryArray::total_values_length()
> --------------------------------------------
>
> Key: ARROW-9116
> URL: https://issues.apache.org/jira/browse/ARROW-9116
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Antoine Pitrou
> Assignee: Wes McKinney
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> It's often useful to compute the total data size of a binary array.
> Sample implementation:
> {code:c++}
> int64_t total_values_length() const {
>   return raw_value_offsets_[length() + data_->offset] -
>          raw_value_offsets_[data_->offset];
> }
> {code}
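
The offset arithmetic in the sample implementation can be illustrated outside of Arrow. The following is a minimal Python sketch, not the Arrow API: the function and parameter names (`value_offsets`, `array_offset`, `length`) are assumptions chosen to mirror `raw_value_offsets_`, `data_->offset`, and `length()` from the C++ snippet.

```python
def total_values_length(value_offsets, array_offset, length):
    """Total number of value bytes covered by a (possibly sliced) binary array.

    value_offsets has one more entry than the unsliced array has values;
    array_offset and length describe the logical slice (hypothetical names).
    """
    # Last offset of the slice minus first offset of the slice.
    return value_offsets[array_offset + length] - value_offsets[array_offset]


# Values b"abc", b"", b"hello", b"hi" give the offsets 0, 3, 3, 8, 10.
offsets = [0, 3, 3, 8, 10]
print(total_values_length(offsets, 0, 4))  # whole array
print(total_values_length(offsets, 1, 2))  # slice covering b"" and b"hello"
```

Only two offsets are read regardless of the array length, which is why this is cheaper than summing the individual value sizes.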
[jira] [Updated] (ARROW-9120) [C++] Lint and Format _internal headers
[ https://issues.apache.org/jira/browse/ARROW-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9120: -- Labels: pull-request-available (was: ) > [C++] Lint and Format _internal headers > --- > > Key: ARROW-9120 > URL: https://issues.apache.org/jira/browse/ARROW-9120 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.17.1 >Reporter: Ben Kietzman >Assignee: Wes McKinney >Priority: Critical > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently, headers named /*_internal.h/ are neither clang-formatted nor > cpplinted. Since they're not exported, CLI lint (forbid , nullptr, > ...) need not be applied -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8942) [R] Detect compression in reading CSV/JSON
[ https://issues.apache.org/jira/browse/ARROW-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-8942:
----------------------------------
    Labels: pull-request-available  (was: )

> [R] Detect compression in reading CSV/JSON
> ------------------------------------------
>
> Key: ARROW-8942
> URL: https://issues.apache.org/jira/browse/ARROW-8942
> Project: Apache Arrow
> Issue Type: New Feature
> Components: R
> Reporter: Dyfan Jones
> Assignee: Neal Richardson
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Hi all,
> Apologies if this has already been covered by another ticket. Is it possible
> for arrow to read in compressed delimited files (for example gzip)?
> Currently I get an error when trying to read in a compressed delimited file:
>
> {code:java}
> vroom::vroom_write(iris, "iris.csv.gz", delim = ",")
> arrow::read_csv_arrow("iris.csv.gz")
> # Error in csv__TableReader_Read(self) :
> #   Invalid: CSV parse error: Expected 1 columns, got 4{code}
> However, it can be read in by vroom and readr:
> {code:java}
> vroom::vroom("iris.csv.gz")
> readr::read_csv("iris.csv.gz")
> {code}
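
One way a reader can detect compression is by inspecting the first bytes of the file rather than trusting its name. The sketch below uses only the Python standard library and is an illustration of that idea, not how the arrow R package implements it; `open_maybe_gzip` and `read_csv_rows` are hypothetical helper names.

```python
import csv
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzip(path):
    """Open path as text, decompressing transparently if it is gzip data."""
    with open(path, "rb") as raw:
        head = raw.read(2)
    if head == GZIP_MAGIC:
        return gzip.open(path, "rt", newline="")
    return open(path, "rt", newline="")


def read_csv_rows(path):
    """Read all rows of a CSV file that may or may not be gzip-compressed."""
    with open_maybe_gzip(path) as stream:
        return list(csv.reader(stream))
```

Checking the magic bytes also handles compressed files that lack a `.gz` extension; a real reader would need similar checks for each codec it supports.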
[jira] [Updated] (ARROW-9119) [C++] Add support for building with system static gRPC
[ https://issues.apache.org/jira/browse/ARROW-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9119: -- Labels: pull-request-available (was: ) > [C++] Add support for building with system static gRPC > -- > > Key: ARROW-9119 > URL: https://issues.apache.org/jira/browse/ARROW-9119 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9030) [Python] Clean up some usages of pyarrow.compat, move some common functions/symbols to lib.pyx
[ https://issues.apache.org/jira/browse/ARROW-9030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9030: -- Labels: pull-request-available (was: ) > [Python] Clean up some usages of pyarrow.compat, move some common > functions/symbols to lib.pyx > -- > > Key: ARROW-9030 > URL: https://issues.apache.org/jira/browse/ARROW-9030 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > I started doing this while looking into ARROW-4633 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8510) [C++] arrow/dataset/file_base.cc fails to compile with internal compiler error with "Visual Studio 15 2017 Win64" generator
[ https://issues.apache.org/jira/browse/ARROW-8510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8510: -- Labels: pull-request-available (was: ) > [C++] arrow/dataset/file_base.cc fails to compile with internal compiler > error with "Visual Studio 15 2017 Win64" generator > --- > > Key: ARROW-8510 > URL: https://issues.apache.org/jira/browse/ARROW-8510 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Developer Tools >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Blocker > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > I discovered this while running the release verification on Windows. There > was an obscuring issue which is that if the build fails, the verification > script continues. I will fix that -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9115) [C++] Process data buffers in batch in ascii_lower / ascii_upper kernels rather than using string_view value iteration
[ https://issues.apache.org/jira/browse/ARROW-9115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9115: -- Labels: pull-request-available (was: ) > [C++] Process data buffers in batch in ascii_lower / ascii_upper kernels > rather than using string_view value iteration > -- > > Key: ARROW-9115 > URL: https://issues.apache.org/jira/browse/ARROW-9115 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Also add a benchmark -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9079) [C++] Write benchmark for arithmetic kernels
[ https://issues.apache.org/jira/browse/ARROW-9079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-9079:
----------------------------------
    Labels: pull-request-available  (was: )

> [C++] Write benchmark for arithmetic kernels
> --------------------------------------------
>
> Key: ARROW-9079
> URL: https://issues.apache.org/jira/browse/ARROW-9079
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Krisztian Szucs
> Assignee: Krisztian Szucs
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The add kernel's implementation was changed in
> https://github.com/apache/arrow/pull/7341. To ensure that no performance
> regression was introduced, write a benchmark for the kernels and compare the
> results with the previous implementation.
[jira] [Updated] (ARROW-9113) Fix exception causes in cli.py
[ https://issues.apache.org/jira/browse/ARROW-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-9113:
----------------------------------
    Labels: pull-request-available  (was: )

> Fix exception causes in cli.py
> ------------------------------
>
> Key: ARROW-9113
> URL: https://issues.apache.org/jira/browse/ARROW-9113
> Project: Apache Arrow
> Issue Type: Bug
> Components: Archery
> Reporter: Ram Rachum
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I recently went over
> [Matplotlib](https://github.com/matplotlib/matplotlib/pull/16706),
> [Pandas](https://github.com/pandas-dev/pandas/pull/32322) and
> [NumPy](https://github.com/numpy/numpy/pull/15731), fixing a small mistake in
> the way that Python 3's exception chaining is used. If you're interested, I
> can do it here too. I've done it on just one file right now.
>
> The mistake is this: In some parts of the code, an exception is being caught
> and replaced with a more user-friendly error. In these cases the syntax
> `raise new_error from old_error` needs to be used.
>
> Python 3's exception chaining means it shows not only the traceback of the
> current exception, but also that of the original exception (and possibly
> more). This happens regardless of `raise from`. Using `raise from` tells
> Python to put a more accurate message between the tracebacks. Instead of this:
>
>     During handling of the above exception, another exception occurred:
>
> You'll get this:
>
>     The above exception was the direct cause of the following exception:
>
> The first is inaccurate, because it signifies a bug in the exception-handling
> code itself, which is a different situation from wrapping an exception.
>
> Let me know what you think!
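
The pattern described in the report can be shown with a small, self-contained example. This is a generic sketch rather than code from Archery's cli.py; `parse_port` and the `settings` dictionary are made up for illustration.

```python
def parse_port(settings):
    """Look up and convert a port number, wrapping low-level errors."""
    try:
        return int(settings["port"])
    except (KeyError, ValueError) as err:
        # `from err` records the original exception as __cause__, so the
        # traceback says "The above exception was the direct cause of the
        # following exception" instead of "During handling of the above
        # exception, another exception occurred".
        raise RuntimeError("invalid or missing 'port' setting") from err


try:
    parse_port({})
except RuntimeError as exc:
    # The wrapped KeyError is available programmatically as well.
    print(type(exc.__cause__).__name__)
```

Without `from err`, the original exception would still be attached as `__context__`, but `__cause__` would be `None` and the traceback would use the misleading wording.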
[jira] [Updated] (ARROW-7028) [R] Date roundtrip results in different R storage mode
[ https://issues.apache.org/jira/browse/ARROW-7028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-7028:
----------------------------------
    Labels: pull-request-available  (was: )

> [R] Date roundtrip results in different R storage mode
> ------------------------------------------------------
>
> Key: ARROW-7028
> URL: https://issues.apache.org/jira/browse/ARROW-7028
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 0.15.0
> Reporter: Sascha
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Attachments: image-2019-10-30-23-08-17-296.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When saving R data frames with parquet and loading them again, the internal
> representation of Dates changes, leading e.g. to errors when comparing them
> in dplyr::if_else.
> {code}
> library(dplyr)
> #>
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #>
> #>     filter, lag
> #> The following objects are masked from 'package:base':
> #>
> #>     intersect, setdiff, setequal, union
> tmp = tempdir()
> dat = tibble(tag = as.Date("2018-01-01"))
> dat2 = tibble(tag2 = as.Date("2019-01-01"))
> arrow::write_parquet(dat, file.path(tmp, "dat.parquet"))
> dat = arrow::read_parquet(file.path(tmp, "dat.parquet"))
> typeof(dat$tag)
> #> [1] "integer"
> typeof(dat2$tag2)
> #> [1] "double"
> bind_cols(dat, dat2) %>%
>   mutate(comparison = if_else(TRUE, tag, tag2))
> #> `false` must be a `Date` object, not a `Date` object
> {code}
> Created on 2019-10-30 by the [reprex package](https://reprex.tidyverse.org)
> (v0.3.0)
[jira] [Updated] (ARROW-6645) [Python] Dictionary indices are boundschecked unconditionally in CategoricalBlock.to_pandas
[ https://issues.apache.org/jira/browse/ARROW-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-6645: -- Labels: pull-request-available (was: ) > [Python] Dictionary indices are boundschecked unconditionally in > CategoricalBlock.to_pandas > --- > > Key: ARROW-6645 > URL: https://issues.apache.org/jira/browse/ARROW-6645 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > This was added at some point to fix a bug. I suspect we might want to move > this check somewhere else rather than do it every time {{to_pandas}} is called -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9112) [R] Update autobrew script location
[ https://issues.apache.org/jira/browse/ARROW-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9112: -- Labels: pull-request-available (was: ) > [R] Update autobrew script location > --- > > Key: ARROW-9112 > URL: https://issues.apache.org/jira/browse/ARROW-9112 > Project: Apache Arrow > Issue Type: Task > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Jeroen is moving it to a different location. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8826) [Crossbow] remote URL should always have .git
[ https://issues.apache.org/jira/browse/ARROW-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-8826:
----------------------------------
    Labels: pull-request-available  (was: )

> [Crossbow] remote URL should always have .git
> ---------------------------------------------
>
> Key: ARROW-8826
> URL: https://issues.apache.org/jira/browse/ARROW-8826
> Project: Apache Arrow
> Issue Type: Bug
> Components: Continuous Integration, Developer Tools
> Reporter: Neal Richardson
> Assignee: Neal Richardson
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In ARROW-7803, I edited the crossbow templates for the homebrew jobs to
> substitute in the correct fork of arrow and append the current git SHA so
> that the code under test corresponds to the requested git commit.
> Unfortunately, this caused the nightly builds to fail.
>
> Comparing a successful on-demand run
> (https://github.com/ursa-labs/crossbow/blob/actions-266-travis-homebrew-r-autobrew/.travis.yml)
> with a nightly run
> (https://github.com/ursa-labs/crossbow/blob/nightly-2020-05-16-0-travis-homebrew-cpp/.travis.yml),
> it appears that the default "remote" URL that crossbow uses when not on a
> fork/PR does not contain the ".git" suffix. I suspect that Homebrew needs
> that suffix to identify the source as a git repo and clone it correctly.
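
A normalization step along the lines suggested by the title could look like the following. This is only a sketch of the idea: `ensure_git_suffix` is a hypothetical helper, not a function from crossbow, and the example URL is purely illustrative.

```python
def ensure_git_suffix(remote_url):
    """Return remote_url with exactly one trailing '.git' (hypothetical helper)."""
    url = remote_url.rstrip("/")  # tolerate a trailing slash
    return url if url.endswith(".git") else url + ".git"


print(ensure_git_suffix("https://github.com/apache/arrow"))
print(ensure_git_suffix("https://github.com/apache/arrow.git"))
```

Applying the helper to every remote, whether or not it comes from a fork, keeps the URL form consistent between on-demand and nightly runs.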
[jira] [Updated] (ARROW-971) [C++/Python] Implement Array.isvalid/notnull/isnull as scalar functions
[ https://issues.apache.org/jira/browse/ARROW-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-971: - Labels: dataframe pull-request-available (was: dataframe) > [C++/Python] Implement Array.isvalid/notnull/isnull as scalar functions > --- > > Key: ARROW-971 > URL: https://issues.apache.org/jira/browse/ARROW-971 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Ben Kietzman >Priority: Major > Labels: dataframe, pull-request-available > Fix For: 2.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > For arrays with nulls, this amounts to returning the validity bitmap. Without > nulls, an array of all 1 bits must be constructed. For isnull, the bits must > be flipped (in this case, the un-set part of the new bitmap must stay 0, > though). -- This message was sent by Atlassian Jira (v8.3.4#803005)
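
The bitmap handling described above can be sketched in plain Python. This is an illustration under stated assumptions, not the Arrow C++ implementation: bitmaps are modelled as `bytes` with least-significant-bit numbering (as in the Arrow columnar format), a missing validity bitmap is modelled as `None`, and all function names are made up.

```python
def all_valid_bitmap(length):
    """Bitmap with the first `length` bits set and trailing padding bits 0."""
    nbytes = (length + 7) // 8
    bitmap = bytearray(b"\xff" * nbytes)
    if length % 8:
        # Clear the unused high bits of the final byte.
        bitmap[-1] = (1 << (length % 8)) - 1
    return bytes(bitmap)


def is_valid(validity, length):
    """With nulls: the validity bitmap itself. Without: all bits set."""
    return validity if validity is not None else all_valid_bitmap(length)


def is_null(validity, length):
    """Flip the validity bits, keeping bits past `length` at 0."""
    valid = is_valid(validity, length)
    mask = all_valid_bitmap(length)
    return bytes((~v & m) & 0xFF for v, m in zip(valid, mask))
```

Masking with `all_valid_bitmap(length)` is what keeps the unused part of the flipped bitmap at 0, as the description requires.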
[jira] [Updated] (ARROW-8649) [Java] [Website] Java documentation on website is hidden
[ https://issues.apache.org/jira/browse/ARROW-8649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8649: -- Labels: pull-request-available (was: ) > [Java] [Website] Java documentation on website is hidden > > > Key: ARROW-8649 > URL: https://issues.apache.org/jira/browse/ARROW-8649 > Project: Apache Arrow > Issue Type: Bug > Components: Java, Website >Reporter: Andy Grove >Assignee: Neal Richardson >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > There is some excellent Java documentation on the web site that is hard to > find because the Java documentation link [1] goes straight to the generated > javadocs. > > [1] https://arrow.apache.org/docs/java -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9110) [C++] Fix CPU cache size detection on macOS
[ https://issues.apache.org/jira/browse/ARROW-9110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9110: -- Labels: pull-request-available (was: ) > [C++] Fix CPU cache size detection on macOS > --- > > Key: ARROW-9110 > URL: https://issues.apache.org/jira/browse/ARROW-9110 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Running certain benchmarks on macOS never ends because CpuInfo detects the > RAM size as the size of L1 cache. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9101) [Doc][C++][Python] Document encoding expected by CSV and JSON readers
[ https://issues.apache.org/jira/browse/ARROW-9101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9101: -- Labels: pull-request-available (was: ) > [Doc][C++][Python] Document encoding expected by CSV and JSON readers > - > > Key: ARROW-9101 > URL: https://issues.apache.org/jira/browse/ARROW-9101 > Project: Apache Arrow > Issue Type: Task > Components: C++, Documentation, Python >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9093) [FlightRPC][C++][Python] Allow setting gRPC client options
[ https://issues.apache.org/jira/browse/ARROW-9093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9093: -- Labels: pull-request-available (was: ) > [FlightRPC][C++][Python] Allow setting gRPC client options > -- > > Key: ARROW-9093 > URL: https://issues.apache.org/jira/browse/ARROW-9093 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, FlightRPC, Python >Reporter: David Li >Assignee: David Li >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > There's no way to set generic gRPC options which are useful for tuning > behavior (e.g. round-robin load balancing). Rather than bind all of these one > by one, gRPC allows setting arguments as generic string-string or > string-integer pairs; we could expose this (and leave the interpretation > implementation-dependent). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-7676) [Packaging][Python] Ensure that the static libraries are not built in the wheel scripts
[ https://issues.apache.org/jira/browse/ARROW-7676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-7676: -- Labels: pull-request-available (was: ) > [Packaging][Python] Ensure that the static libraries are not built in the > wheel scripts > --- > > Key: ARROW-7676 > URL: https://issues.apache.org/jira/browse/ARROW-7676 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging, Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Even though we don't bundle them with the wheels. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9102) [Packaging] Upload built manylinux docker images
[ https://issues.apache.org/jira/browse/ARROW-9102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-9102:
----------------------------------
    Labels: pull-request-available  (was: )

> [Packaging] Upload built manylinux docker images
> ------------------------------------------------
>
> Key: ARROW-9102
> URL: https://issues.apache.org/jira/browse/ARROW-9102
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Packaging
> Reporter: Krisztian Szucs
> Assignee: Krisztian Szucs
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Although the secrets were set on Azure Pipelines, the upload step is failing:
> https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=13104&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=d9b15392-e4ce-5e4c-0c8c-b69645229181
> As a result, the manylinux builds take more than two hours. This is due to
> Azure's secret handling: we need to explicitly export the Azure secret
> variables as environment variables.
[jira] [Updated] (ARROW-9100) Add ascii_lower kernel
[ https://issues.apache.org/jira/browse/ARROW-9100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9100: -- Labels: pull-request-available (was: ) > Add ascii_lower kernel > -- > > Key: ARROW-9100 > URL: https://issues.apache.org/jira/browse/ARROW-9100 > Project: Apache Arrow > Issue Type: Task > Components: C++ >Reporter: Maarten Breddels >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9099) [C++][Gandiva] Add TRIM function for string
[ https://issues.apache.org/jira/browse/ARROW-9099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9099: -- Labels: pull-request-available (was: ) > [C++][Gandiva] Add TRIM function for string > --- > > Key: ARROW-9099 > URL: https://issues.apache.org/jira/browse/ARROW-9099 > Project: Apache Arrow > Issue Type: Task >Reporter: Sagnik Chakraborty >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9098) RecordBatch::ToStructArray cannot handle record batches with 0 column
[ https://issues.apache.org/jira/browse/ARROW-9098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9098: -- Labels: pull-request-available (was: ) > RecordBatch::ToStructArray cannot handle record batches with 0 column > - > > Key: ARROW-9098 > URL: https://issues.apache.org/jira/browse/ARROW-9098 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.17.1 >Reporter: Zhuo Peng >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > If RecordBatch::ToStructArray is called against a record batch with 0 column, > the following error will be raised: > Invalid: Can't infer struct array length with 0 child arrays -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9088) [Rust] Recent version of arrow crate does not compile into wasm target
[ https://issues.apache.org/jira/browse/ARROW-9088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-9088:
----------------------------------
    Labels: pull-request-available  (was: )

> [Rust] Recent version of arrow crate does not compile into wasm target
> ----------------------------------------------------------------------
>
> Key: ARROW-9088
> URL: https://issues.apache.org/jira/browse/ARROW-9088
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust
> Reporter: Sergey Todyshev
> Assignee: Neville Dipale
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> arrow 0.16 compiles successfully into wasm32-unknown-unknown, but the recent
> git version does not. It would be nice to fix that.
>
> Compiler errors:
>
> {noformat}
> error[E0433]: failed to resolve: could not find `unix` in `os`
>   --> /home/regl/.cargo/registry/src/github.com-1ecc6299db9ec823/dirs-1.0.5/src/lin.rs:41:18
>    |
> 41 | use std::os::unix::ffi::OsStringExt;
>    | could not find `unix` in `os`
>
> error[E0432]: unresolved import `unix`
>  --> /home/regl/.cargo/registry/src/github.com-1ecc6299db9ec823/dirs-1.0.5/src/lin.rs:6:5
>   |
> 6 | use unix;
>   | no `unix` in the root{noformat}
>
> The problem is that the prettytable-rs dependency depends on term -> dirs,
> which causes this error.
>
> Consider making prettytable-rs a dev dependency.
[jira] [Updated] (ARROW-9095) [Rust] Fix NullArray to comply with spec
[ https://issues.apache.org/jira/browse/ARROW-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9095: -- Labels: pull-request-available (was: ) > [Rust] Fix NullArray to comply with spec > > > Key: ARROW-9095 > URL: https://issues.apache.org/jira/browse/ARROW-9095 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust >Affects Versions: 0.17.0 >Reporter: Neville Dipale >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > When I implemented the NullArray, I didn't comply with the spec under the > premise that I'd handle reading and writing IPC in a spec-compliant way as > that looked like the easier approach. > After some integration testing, I realised that I wasn't doing it correctly, > so it's better to comply with the spec by not allocating any buffers for the > array. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9090) [C++] Bump versions of bundled libraries
[ https://issues.apache.org/jira/browse/ARROW-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9090: -- Labels: pull-request-available (was: ) > [C++] Bump versions of bundled libraries > > > Key: ARROW-9090 > URL: https://issues.apache.org/jira/browse/ARROW-9090 > Project: Apache Arrow > Issue Type: Task > Components: C++ >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > We should bump the versions of bundled dependencies, wherever possible, to > ensure that users get bugfixes and improvements made in those third-party > libraries. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9092) [C++] gandiva-decimal-test hangs with LLVM 9
[ https://issues.apache.org/jira/browse/ARROW-9092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9092: -- Labels: pull-request-available (was: ) > [C++] gandiva-decimal-test hangs with LLVM 9 > > > Key: ARROW-9092 > URL: https://issues.apache.org/jira/browse/ARROW-9092 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > I built Gandiva C++ unittests with LLVM 9 on Ubuntu 18.04 and > gandiva-decimal-test hangs forever -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9089) [Python] A PyFileSystem handler for fsspec-based filesystems
[ https://issues.apache.org/jira/browse/ARROW-9089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9089: -- Labels: pull-request-available (was: ) > [Python] A PyFileSystem handler for fsspec-based filesystems > > > Key: ARROW-9089 > URL: https://issues.apache.org/jira/browse/ARROW-9089 > Project: Apache Arrow > Issue Type: Sub-task > Components: Python >Reporter: Joris Van den Bossche >Assignee: Joris Van den Bossche >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Follow-up on ARROW-8766 to use this machinery to add an FSSpecHandler -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8785) [Python][Packaging] Build the windows wheels with MIMALLOC enabled
[ https://issues.apache.org/jira/browse/ARROW-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8785: -- Labels: pull-request-available (was: ) > [Python][Packaging] Build the windows wheels with MIMALLOC enabled > -- > > Key: ARROW-8785 > URL: https://issues.apache.org/jira/browse/ARROW-8785 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging, Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Already set the flag, but there is a typo in it: ARROW_MIMA"ll"OC -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9087) Missing HDFS options parsing
[ https://issues.apache.org/jira/browse/ARROW-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9087: -- Labels: pull-request-available (was: ) > Missing HDFS options parsing > > > Key: ARROW-9087 > URL: https://issues.apache.org/jira/browse/ARROW-9087 > Project: Apache Arrow > Issue Type: Bug >Reporter: Yuan Zhou >Assignee: Yuan Zhou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > HDFS options for the Kerberos ticket and extra conf are not parsed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9086) [CI][Homebrew] Enable Gandiva
[ https://issues.apache.org/jira/browse/ARROW-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9086: -- Labels: pull-request-available (was: ) > [CI][Homebrew] Enable Gandiva > - > > Key: ARROW-9086 > URL: https://issues.apache.org/jira/browse/ARROW-9086 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9085) [C++][CI] Appveyor CI test failures
[ https://issues.apache.org/jira/browse/ARROW-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9085: -- Labels: pull-request-available (was: ) > [C++][CI] Appveyor CI test failures > --- > > Key: ARROW-9085 > URL: https://issues.apache.org/jira/browse/ARROW-9085 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Blocker > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > See > https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/33417919 > These seem to have been introduced by > https://github.com/apache/arrow/commit/b058cf0d1c26ad7984c104bb84322cc7dcc66f00 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9075) [C++] Optimize Filter implementation
[ https://issues.apache.org/jira/browse/ARROW-9075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9075: -- Labels: pull-request-available (was: ) > [C++] Optimize Filter implementation > > > Key: ARROW-9075 > URL: https://issues.apache.org/jira/browse/ARROW-9075 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > I split this off from ARROW-5760 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9084) [C++] cmake is unable to find zstd target when ZSTD_SOURCE=SYSTEM
[ https://issues.apache.org/jira/browse/ARROW-9084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9084: -- Labels: pull-request-available (was: ) > [C++] cmake is unable to find zstd target when ZSTD_SOURCE=SYSTEM > - > > Key: ARROW-9084 > URL: https://issues.apache.org/jira/browse/ARROW-9084 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.17.1 > Environment: zstd 1.4.5 >Reporter: Dmitry Kalinkin >Assignee: Dmitry Kalinkin >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The following problem occurs when arrow-cpp is built against system zstd: > {noformat} > CMake Error at cmake_modules/ThirdpartyToolchain.cmake:1860 > (get_target_property): > get_target_property() called with non-existent target "ZSTD::zstd". > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5377) [C++] Develop interface for writing a RecordBatch IPC stream into pre-allocated space (e.g. memory map) that avoids unnecessary serialization
[ https://issues.apache.org/jira/browse/ARROW-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-5377: -- Labels: pull-request-available (was: ) > [C++] Develop interface for writing a RecordBatch IPC stream into > pre-allocated space (e.g. memory map) that avoids unnecessary serialization > - > > Key: ARROW-5377 > URL: https://issues.apache.org/jira/browse/ARROW-5377 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > As discussed in recent mailing list thread > https://lists.apache.org/thread.html/b756209052fecb8c28a5eb37db7aecb82a5f5351fa79a9d86f0dba3e@%3Cuser.arrow.apache.org%3E > The only viable process at the moment for getting an accurate report of > stream size is to write a simulated stream using {{MockOutputStream}}. This > is suboptimal for a couple of reasons: > * Flatbuffers metadata must be created twice > * Record batch disassembly into IpcPayload must be performed twice > It seems like an interface with a very constrained public API could be > provided to deconstruct a sequence of RecordBatches and report the size of > the produced IPC stream (based on metadata sizes, and padding), and then this > deconstructed set of IPC payloads can be written out to a stream (e.g. using > {{FixedSizeBufferWriter}}) -- This message was sent by Atlassian Jira (v8.3.4#803005)
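To make the size-accounting idea in ARROW-5377 concrete, here is a minimal sketch in Python rather than C++. It assumes a simplified view of the IPC stream framing: an 8-byte prefix per message (continuation marker plus metadata length), metadata padded so that prefix plus metadata ends on an 8-byte boundary, and each body buffer padded to 8 bytes. The function names and the `(metadata_len, buffer_lens)` payload shape are hypothetical and are not Arrow APIs.

```python
def padded(n, alignment=8):
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def message_size(metadata_len, buffer_lens):
    """Size of one framed message under the simplified assumptions above."""
    prefix = 8  # 4-byte continuation marker + 4-byte metadata length
    body = sum(padded(b) for b in buffer_lens)
    return padded(prefix + metadata_len) + body


def stream_size(payloads, with_eos=True):
    """Total size of a stream of (metadata_len, buffer_lens) payloads.

    The optional end-of-stream marker adds 8 bytes.
    """
    total = sum(message_size(m, bufs) for m, bufs in payloads)
    return total + (8 if with_eos else 0)
```

The point of the sketch is that the size can be computed from the deconstructed payloads alone, so the same payloads could then be written out without being rebuilt.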
[jira] [Updated] (ARROW-9071) [C++] MakeArrayOfNull makes invalid ListArray
[ https://issues.apache.org/jira/browse/ARROW-9071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9071: -- Labels: pull-request-available (was: ) > [C++] MakeArrayOfNull makes invalid ListArray > - > > Key: ARROW-9071 > URL: https://issues.apache.org/jira/browse/ARROW-9071 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Python >Reporter: Zhuo Peng >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > One way to reproduce this bug is: > > >>> a = pa.array([[1, 2]]) > >>> b = pa.array([None, None], type=pa.null()) > >>> t1 = pa.Table.from_arrays([a], ["a"]) > >>> t2 = pa.Table.from_arrays([b], ["b"]) > > >>> pa.concat_tables([t1, t2], promote=True) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File "pyarrow/table.pxi", line 2138, in pyarrow.lib.concat_tables > File "pyarrow/public-api.pxi", line 390, in pyarrow.lib.pyarrow_wrap_table > File "pyarrow/error.pxi", line 85, in pyarrow.lib.check_status > pyarrow.lib.ArrowInvalid: Column 0: In chunk 1: Invalid: List child array > invalid: Invalid: Buffer #1 too small in array of type int64 and length 2: > expected at least 16 byte(s), got 12 > (because concat_tables(promote=True) will call MakeArrayOfNull > ([https://github.com/apache/arrow/blob/ec3bae18157723411bb772fca628cbd02eea5c25/cpp/src/arrow/table.cc#L647])) > > The code here seems incorrect: > [https://github.com/apache/arrow/blob/ec3bae18157723411bb772fca628cbd02eea5c25/cpp/src/arrow/array/util.cc#L218] > the length of the child array of a ListArray may not be equal to the length of > the ListArray. -- This message was sent by Atlassian Jira (v8.3.4#803005)
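The closing remark in ARROW-9071 (a list array's child length is independent of the list array's length) can be illustrated with a small pure-Python model of the list layout. This is only a sketch of the invariants, not the Arrow C++ code: `validate_list_layout` and `all_null_list_layout` are hypothetical helpers, and offsets are modelled as a plain Python list.

```python
def validate_list_layout(offsets, child_length):
    """Check the basic list-layout invariants and return the list length.

    A list array of length n carries n + 1 offsets; the child array must
    hold at least offsets[-1] values, whatever n is.
    """
    if len(offsets) < 1:
        raise ValueError("offsets must contain at least one entry")
    if any(b < a for a, b in zip(offsets, offsets[1:])):
        raise ValueError("offsets must be non-decreasing")
    if offsets[-1] > child_length:
        raise ValueError("child array is shorter than the last offset")
    return len(offsets) - 1


def all_null_list_layout(length):
    """Layout for `length` null lists: all-zero offsets and an empty child."""
    offsets = [0] * (length + 1)
    child_length = 0
    return offsets, child_length
```

In the reproduction above, `pa.array([[1, 2]])` has list length 1 but child length 2, while two null lists need three offsets and can use an empty child, so sizing the child from the parent's length is not generally valid.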
[jira] [Updated] (ARROW-8430) [CI] Configure self-hosted runners for Github Actions
[ https://issues.apache.org/jira/browse/ARROW-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8430: -- Labels: pull-request-available (was: ) > [CI] Configure self-hosted runners for Github Actions > - > > Key: ARROW-8430 > URL: https://issues.apache.org/jira/browse/ARROW-8430 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Set up Ubuntu C++ ARMv8 builders and perhaps AMD64 builder to run on > self-hosted github runners. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9082) [Rust] - Stream reader fail when steam not ended with (optional) 0xFFFFFFFF 0x00000000"
[ https://issues.apache.org/jira/browse/ARROW-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9082: -- Labels: pull-request-available (was: ) > [Rust] - Stream reader fail when steam not ended with (optional) 0xFFFFFFFF > 0x00000000" > > > Key: ARROW-9082 > URL: https://issues.apache.org/jira/browse/ARROW-9082 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.17.1 >Reporter: Eyal Leshem >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > According to the spec > ([https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format]), > the 0xFFFFFFFF 0x00000000 end-of-stream marker is optional in the Arrow response stream, but > currently, when a client receives a response without it, the reader reads all the batches correctly > but returns an error at the end (instead of Ok(None)) -- This message was sent by Atlassian Jira (v8.3.4#803005)
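The behaviour requested in ARROW-9082 can be sketched as follows, in Python rather than Rust. The framing is deliberately simplified: each message is an 8-byte prefix (0xFFFFFFFF continuation marker plus a little-endian int32 metadata length) followed by the metadata bytes only; real messages also carry a body whose length is recorded in the metadata. `read_messages` and `frame` are hypothetical names, not part of any Arrow library.

```python
import io
import struct

CONTINUATION = 0xFFFFFFFF


def frame(metadata):
    """Frame metadata bytes with the continuation marker and length prefix."""
    return struct.pack("<Ii", CONTINUATION, len(metadata)) + metadata


def read_messages(stream):
    """Read framed messages until the end-of-stream marker or a clean EOF."""
    messages = []
    while True:
        prefix = stream.read(8)
        if len(prefix) == 0:
            break  # stream closed without the optional marker: not an error
        if len(prefix) < 8:
            raise ValueError("truncated message prefix")
        marker, size = struct.unpack("<Ii", prefix)
        if marker != CONTINUATION:
            raise ValueError("missing continuation marker")
        if size == 0:
            break  # explicit end-of-stream marker (0xFFFFFFFF 0x00000000)
        metadata = stream.read(size)
        if len(metadata) < size:
            raise ValueError("truncated message")
        messages.append(metadata)
    return messages
```

Both termination paths end the loop normally, which corresponds to returning Ok(None) in the Rust reader; only a partial prefix or a partial message is treated as an error.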
[jira] [Updated] (ARROW-9074) [GLib] Add missing arrow-json check
[ https://issues.apache.org/jira/browse/ARROW-9074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9074: -- Labels: pull-request-available (was: ) > [GLib] Add missing arrow-json check > --- > > Key: ARROW-9074 > URL: https://issues.apache.org/jira/browse/ARROW-9074 > Project: Apache Arrow > Issue Type: Improvement > Components: GLib >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5760) [C++] Optimize Take implementation
[ https://issues.apache.org/jira/browse/ARROW-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-5760: -- Labels: pull-request-available (was: ) > [C++] Optimize Take implementation > -- > > Key: ARROW-5760 > URL: https://issues.apache.org/jira/browse/ARROW-5760 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Ben Kietzman >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > There is some question of whether these kernels allocate optimally- for > example when Filtering or Taking strings it might be more efficient to pass > over the filter/indices twice, first to determine how much character storage > will be needed then again into allocated memory: > https://github.com/apache/arrow/pull/4531#discussion_r297160457 > Additionally, these kernels could probably make good use of scatter/gather > SIMD instructions. > Furthermore, Filter's bitmap is currently lazily expanded into the indices of > elements to be appended to the output array. It would probably be more > efficient to expand to indices in batches, then gather using an index batch. -- This message was sent by Atlassian Jira (v8.3.4#803005)
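The two-pass allocation strategy mentioned in ARROW-5760 can be sketched in Python as below. This is not the Arrow kernel; it models a string array as a list of `bytes` values and a filter as a list of booleans, and `filter_strings` is a hypothetical helper that returns the offsets and character data of the filtered result.

```python
def filter_strings(values, mask):
    """Filter byte strings using two passes over the mask.

    Pass 1 computes how much character storage the output needs, so the
    data buffer is allocated exactly once. Pass 2 copies the selected
    values into that buffer and records their offsets.
    """
    # Pass 1: total number of bytes selected by the mask.
    total = sum(len(v) for v, keep in zip(values, mask) if keep)

    # Pass 2: copy into the pre-allocated buffer.
    data = bytearray(total)
    offsets = [0]
    pos = 0
    for v, keep in zip(values, mask):
        if keep:
            data[pos:pos + len(v)] = v
            pos += len(v)
            offsets.append(pos)
    return offsets, bytes(data)
```

The trade-off is one extra pass over the filter in exchange for avoiding repeated reallocation of the character buffer as it grows.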
[jira] [Updated] (ARROW-8726) [C++][Dataset] Mis-specified DirectoryPartitioning incorrectly uses the file name as value
[ https://issues.apache.org/jira/browse/ARROW-8726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8726: -- Labels: dataset pull-request-available (was: dataset) > [C++][Dataset] Mis-specified DirectoryPartitioning incorrectly uses the file > name as value > -- > > Key: ARROW-8726 > URL: https://issues.apache.org/jira/browse/ARROW-8726 > Project: Apache Arrow > Issue Type: Bug > Components: R >Reporter: Jonathan Keane >Assignee: Francois Saint-Jacques >Priority: Major > Labels: dataset, pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Calling filter + collect on a dataset with a mis-specified partitioning > causes a segfault. Though this is clearly input error, it would be nice if > there was some guidance that something was wrong with the partitioning. > {code:r} > library(arrow) > library(dplyr) > dir.create("multi_mtcars/one", recursive = TRUE) > dir.create("multi_mtcars/two", recursive = TRUE) > write_parquet(mtcars, "multi_mtcars/one/mtcars.parquet") > write_parquet(mtcars, "multi_mtcars/two/mtcars.parquet") > ds <- open_dataset("multi_mtcars", partitioning = c("level", "nothing")) > # the following will segfault > ds %>% > filter(cyl > 8) %>% > collect() > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
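As an illustration of the kind of guidance ARROW-8726 asks for, the sketch below parses directory partition values from the directory segments only and raises a descriptive error when there are fewer segments than partition fields. It is written in Python and is an assumption about one possible behaviour, not what Arrow's DirectoryPartitioning does; `parse_directory_partition` is a hypothetical helper.

```python
import posixpath


def parse_directory_partition(base_dir, file_path, field_names):
    """Map directory segments under base_dir to partition field names.

    The file name itself is never used as a partition value.
    """
    rel = posixpath.relpath(file_path, base_dir)
    directory = posixpath.dirname(rel)
    segments = directory.split("/") if directory else []
    if len(segments) < len(field_names):
        raise ValueError(
            "partitioning declares %d field(s) but %r has only %d "
            "directory level(s)" % (len(field_names), rel, len(segments))
        )
    return dict(zip(field_names, segments))
```

With the layout from the report, `multi_mtcars/one/mtcars.parquet` yields `{"level": "one"}` for a single field, while the two-field partitioning `c("level", "nothing")` is rejected instead of treating `mtcars.parquet` as the value of `nothing`.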
[jira] [Updated] (ARROW-9073) [C++] RapidJSON include directory detection doesn't work with RapidJSONConfig.cmake
[ https://issues.apache.org/jira/browse/ARROW-9073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9073: -- Labels: pull-request-available (was: ) > [C++] RapidJSON include directory detection doesn't work with > RapidJSONConfig.cmake > --- > > Key: ARROW-9073 > URL: https://issues.apache.org/jira/browse/ARROW-9073 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9077) [C++] Fix aggregate/scalar-compare benchmark null_percent calculation
[ https://issues.apache.org/jira/browse/ARROW-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9077: -- Labels: pull-request-available (was: ) > [C++] Fix aggregate/scalar-compare benchmark null_percent calculation > - > > Key: ARROW-9077 > URL: https://issues.apache.org/jira/browse/ARROW-9077 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Frank Du >Assignee: Frank Du >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The aggregate and scalar-compare benchmarks report a wrong null percent following the changes in > benchmark_util.h. Correct both to use the newly defined boilerplate. > ./release/arrow-compute-aggregate-benchmark > > -- > Benchmark Time CPU Iterations UserCounters... > > -- > SumKernelFloat/32768/1 5.38 us 5.38 us 129832 > bytes_per_second=5.67524G/s {color:#FF}null_percent=10k{color} > size=32.768k > SumKernelFloat/32768/1000 5.36 us 5.35 us 130069 bytes_per_second=5.6994G/s > null_percent=1000 size=32.768k > SumKernelFloat/32768/100 5.35 us 5.35 us 131071 bytes_per_second=5.70903G/s > null_percent=100 size=32.768k > SumKernelFloat/32768/50 10.8 us 10.7 us 65504 bytes_per_second=2.84073G/s > null_percent=50 size=32.768k > SumKernelFloat/32768/10 4.94 us 4.93 us 141624 bytes_per_second=6.18964G/s > null_percent=10 size=32.768k > SumKernelFloat/32768/1 4.41 us 4.40 us 158949 bytes_per_second=6.92913G/s > null_percent=1 size=32.768k -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8866) [C++] Split Type::UNION into Type::SPARSE_UNION and Type::DENSE_UNION
[ https://issues.apache.org/jira/browse/ARROW-8866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8866: -- Labels: pull-request-available (was: ) > [C++] Split Type::UNION into Type::SPARSE_UNION and Type::DENSE_UNION > - > > Key: ARROW-8866 > URL: https://issues.apache.org/jira/browse/ARROW-8866 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Ben Kietzman >Priority: Blocker > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Similar to the recent {{Type::INTERVAL}} split, having these two array types > which have different memory layouts under the same {{Type::type}} value makes > function dispatch somewhat more complicated. This issue is less critical than the > INTERVAL one, so it may not be urgent, but it seems like a good pre-1.0 change -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9062) [Rust] Support to read JSON into dictionary type
[ https://issues.apache.org/jira/browse/ARROW-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9062: -- Labels: pull-request-available (was: ) > [Rust] Support to read JSON into dictionary type > > > Key: ARROW-9062 > URL: https://issues.apache.org/jira/browse/ARROW-9062 > Project: Apache Arrow > Issue Type: Sub-task > Components: Rust >Reporter: Sven Wagner-Boysen >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Currently a JSON reader built from a schema that uses the dictionary type for one > of its fields will fail with JsonError("struct types are not > yet supported") > {code:java} > let builder = ReaderBuilder::new().with_schema(..); > let mut reader: Reader<File> = > builder.build::<File>(File::open(path).unwrap()).unwrap(); > let rb = reader.next().unwrap(); > {code} > > Suggested solution: > Support reading into a dictionary in the JSON reader: > [https://github.com/apache/arrow/blob/master/rust/arrow/src/json/reader.rs#L368] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9066) [Python] Raise correct error in isnull()
[ https://issues.apache.org/jira/browse/ARROW-9066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9066: -- Labels: pull-request-available (was: ) > [Python] Raise correct error in isnull() > > > Key: ARROW-9066 > URL: https://issues.apache.org/jira/browse/ARROW-9066 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.17.1 >Reporter: Uwe Korn >Assignee: Uwe Korn >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9064) optimization debian package manager tweaks
[ https://issues.apache.org/jira/browse/ARROW-9064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9064: -- Labels: pull-request-available (was: ) > optimization debian package manager tweaks > -- > > Key: ARROW-9064 > URL: https://issues.apache.org/jira/browse/ARROW-9064 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Pratik Raj >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > By default, "apt" and "apt-get" on Ubuntu and other Debian-based systems install > recommended (but not suggested) packages. > Passing the "--no-install-recommends" option tells apt-get not > to treat recommended packages as dependencies to install. > This results in smaller downloads and fewer installed packages. > See the Ubuntu blog post at > https://ubuntu.com/blog/we-reduced-our-docker-images-by-60-with-no-install-recommends -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8974) [C++] Refine TransferBitmap template parameters
[ https://issues.apache.org/jira/browse/ARROW-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8974: -- Labels: pull-request-available (was: ) > [C++] Refine TransferBitmap template parameters > --- > > Key: ARROW-8974 > URL: https://issues.apache.org/jira/browse/ARROW-8974 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Yibo Cai >Assignee: Yibo Cai >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > [TransferBitmap|https://github.com/apache/arrow/blob/44e723d9ac7c64739d419ad66618d2d56003d1b7/cpp/src/arrow/util/bit_util.cc#L110] > has two bool template parameters, giving four combinations. > Changing them to function parameters can reduce code size. I think > "restore_trailing_bits" cannot impact performance; "invert_bits" needs a > benchmark. > Also, a bare bool argument is hard to interpret at the [caller > side|https://github.com/apache/arrow/blob/44e723d9ac7c64739d419ad66618d2d56003d1b7/cpp/src/arrow/util/bit_util.cc#L208], > so it is better to use meaningful defines. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9061) [Packaging][APT][Yum][GLib] Add Apache Arrow Datasets GLib
[ https://issues.apache.org/jira/browse/ARROW-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9061: -- Labels: pull-request-available (was: ) > [Packaging][APT][Yum][GLib] Add Apache Arrow Datasets GLib > -- > > Key: ARROW-9061 > URL: https://issues.apache.org/jira/browse/ARROW-9061 > Project: Apache Arrow > Issue Type: Improvement > Components: GLib, Packaging >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8736) [Rust] [DataFusion] Table API should provide a schema() method
[ https://issues.apache.org/jira/browse/ARROW-8736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8736: -- Labels: pull-request-available (was: ) > [Rust] [DataFusion] Table API should provide a schema() method > -- > > Key: ARROW-8736 > URL: https://issues.apache.org/jira/browse/ARROW-8736 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust, Rust - DataFusion >Reporter: Andy Grove >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Table API should provide a schema() method. It is currently not possible to > examine the schema of a registered table without getting it via the logical > schema but that isn't intuitive. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9060) [GLib] Add support for building Apache Arrow Datasets GLib with non-installed Apache Arrow Datasets
[ https://issues.apache.org/jira/browse/ARROW-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9060: -- Labels: pull-request-available (was: ) > [GLib] Add support for building Apache Arrow Datasets GLib with non-installed > Apache Arrow Datasets > --- > > Key: ARROW-9060 > URL: https://issues.apache.org/jira/browse/ARROW-9060 > Project: Apache Arrow > Issue Type: Improvement > Components: GLib >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > It's required for packaging: > https://travis-ci.org/github/ursa-labs/crossbow/builds/695595159 > {noformat} > CXX libarrow_dataset_glib_la-scanner.lo > scanner.cpp:24:33: fatal error: arrow/util/iterator.h: No such file or > directory > #include <arrow/util/iterator.h> > ^ > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9059) [Rust] Documentation for slicing array data has the wrong sign
[ https://issues.apache.org/jira/browse/ARROW-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9059: -- Labels: pull-request-available (was: ) > [Rust] Documentation for slicing array data has the wrong sign > -- > > Key: ARROW-9059 > URL: https://issues.apache.org/jira/browse/ARROW-9059 > Project: Apache Arrow > Issue Type: Bug > Components: Rust >Reporter: Bobby Wagner >Priority: Trivial > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > In the slice_data function in array.rs, the docstring says it panics if > offset + length is less than data.len(), but the code actually panics if offset + > length is greater than data.len() -- This message was sent by Atlassian Jira (v8.3.4#803005)
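The corrected condition from ARROW-9059 can be written out as a small Python sketch. It models the Rust function only loosely: a Python list stands in for the array data, and an exception stands in for the panic. The name `slice_data` is reused from the report for readability; the body is an assumption, not the code in array.rs.

```python
def slice_data(data, offset, length):
    """Return data[offset : offset + length].

    Fails when offset + length is GREATER than len(data); a slice that
    ends exactly at len(data) is valid.
    """
    if offset + length > len(data):
        raise IndexError(
            "offset + length (%d) exceeds data length (%d)"
            % (offset + length, len(data))
        )
    return data[offset:offset + length]
```

A slice whose end equals the data length is accepted, which is why the documented condition needs "greater than" and not "less than".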
[jira] [Updated] (ARROW-9058) [Packaging][wheel] Boost download is failed
[ https://issues.apache.org/jira/browse/ARROW-9058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9058: -- Labels: pull-request-available (was: ) > [Packaging][wheel] Boost download is failed > --- > > Key: ARROW-9058 > URL: https://issues.apache.org/jira/browse/ARROW-9058 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging, Python >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=12893&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=d9b15392-e4ce-5e4c-0c8c-b69645229181 > {noformat} > + curl -sL > https://dl.bintray.com/boostorg/release/1.68.0/source/boost_1_68_0.tar.gz -o > /boost_1_68_0.tar.gz > + tar xf boost_1_68_0.tar.gz > tar: This does not look like a tar archive > tar: Error exit delayed from previous errors > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9057) Projection should work on InMemoryScan without error
[ https://issues.apache.org/jira/browse/ARROW-9057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9057: -- Labels: pull-request-available (was: ) > Projection should work on InMemoryScan without error > > > Key: ARROW-9057 > URL: https://issues.apache.org/jira/browse/ARROW-9057 > Project: Apache Arrow > Issue Type: Bug > Components: Rust - DataFusion >Reporter: QP Hou >Assignee: QP Hou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8781) [CI][C++] Enable ccache on GHA MinGW jobs
[ https://issues.apache.org/jira/browse/ARROW-8781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8781: -- Labels: pull-request-available (was: ) > [CI][C++] Enable ccache on GHA MinGW jobs > - > > Key: ARROW-8781 > URL: https://issues.apache.org/jira/browse/ARROW-8781 > Project: Apache Arrow > Issue Type: Wish > Components: C++, Continuous Integration >Reporter: Antoine Pitrou >Assignee: Kouhei Sutou >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > It would be nice to enable caching with ccache on the MinGW Github Actions > jobs. They're currently quite slow... -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9050) [Release] Use 1.0.0 as the next version
[ https://issues.apache.org/jira/browse/ARROW-9050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9050: -- Labels: pull-request-available (was: ) > [Release] Use 1.0.0 as the next version > --- > > Key: ARROW-9050 > URL: https://issues.apache.org/jira/browse/ARROW-9050 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9052) [CI][MinGW] Enable Gandiva
[ https://issues.apache.org/jira/browse/ARROW-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9052: -- Labels: pull-request-available (was: ) > [CI][MinGW] Enable Gandiva > -- > > Key: ARROW-9052 > URL: https://issues.apache.org/jira/browse/ARROW-9052 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Gandiva, Continuous Integration, GLib >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9047) [Rust] Setting 0-bits of a 0-length bitset segfaults
[ https://issues.apache.org/jira/browse/ARROW-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9047: -- Labels: pull-request-available (was: ) > [Rust] Setting 0-bits of a 0-length bitset segfaults > > > Key: ARROW-9047 > URL: https://issues.apache.org/jira/browse/ARROW-9047 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Max Burke >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > See PR for details -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9007) [Rust] Support appending arrays by merging array data
[ https://issues.apache.org/jira/browse/ARROW-9007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9007: -- Labels: pull-request-available (was: ) > [Rust] Support appending arrays by merging array data > - > > Key: ARROW-9007 > URL: https://issues.apache.org/jira/browse/ARROW-9007 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Affects Versions: 0.17.0 >Reporter: Neville Dipale >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > ARROW-9005 introduces a concat kernel which allows for concatenating multiple > arrays of the same type into a single array. This is useful for sorting on > multiple arrays, among other things. > The concat kernel is implemented for most array types, but not yet for nested > arrays (lists, structs, etc). > This Jira is for creating a way of appending/merging all array types, so that > concat (and functionality that depends on it) can support all array types. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6917) [Developer] Implement Python script to generate git cherry-pick commands needed to create patch build branch for maint releases
[ https://issues.apache.org/jira/browse/ARROW-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-6917: -- Labels: pull-request-available (was: ) > [Developer] Implement Python script to generate git cherry-pick commands > needed to create patch build branch for maint releases > --- > > Key: ARROW-6917 > URL: https://issues.apache.org/jira/browse/ARROW-6917 > Project: Apache Arrow > Issue Type: Improvement > Components: Developer Tools >Reporter: Wes McKinney >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > For 0.14.1, I maintained this script by hand. It would be less failure-prone > (maybe) to generate it based on the fix versions set in JIRA -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9051) [GLib] Refer Array related objects from Array
[ https://issues.apache.org/jira/browse/ARROW-9051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9051: -- Labels: pull-request-available (was: ) > [GLib] Refer Array related objects from Array > - > > Key: ARROW-9051 > URL: https://issues.apache.org/jira/browse/ARROW-9051 > Project: Apache Arrow > Issue Type: Improvement > Components: GLib >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9046) [C++][R] Put more things in type_fwds
[ https://issues.apache.org/jira/browse/ARROW-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9046: -- Labels: pull-request-available (was: ) > [C++][R] Put more things in type_fwds > - > > Key: ARROW-9046 > URL: https://issues.apache.org/jira/browse/ARROW-9046 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, R >Reporter: Neal Richardson >Assignee: Ben Kietzman >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Hopefully to reduce compile time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8723) [Rust] Remove SIMD specific benchmark code
[ https://issues.apache.org/jira/browse/ARROW-8723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8723: -- Labels: pull-request-available (was: ) > [Rust] Remove SIMD specific benchmark code > -- > > Key: ARROW-8723 > URL: https://issues.apache.org/jira/browse/ARROW-8723 > Project: Apache Arrow > Issue Type: New Feature > Components: Rust >Reporter: Paddy Horan >Assignee: Paddy Horan >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Now that SIMD is behind a feature flag it's trivial to compare SIMD vs > non-SIMD and the SIMD versions of benchmarks can be removed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9045) [C++] Improve and expand Take/Filter benchmarks
[ https://issues.apache.org/jira/browse/ARROW-9045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9045: -- Labels: pull-request-available (was: ) > [C++] Improve and expand Take/Filter benchmarks > --- > > Key: ARROW-9045 > URL: https://issues.apache.org/jira/browse/ARROW-9045 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > I'm putting this up as a separate patch for review -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-555) [C++] String algorithm library for StringArray/BinaryArray
[ https://issues.apache.org/jira/browse/ARROW-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-555: - Labels: Analytics pull-request-available (was: Analytics) > [C++] String algorithm library for StringArray/BinaryArray > -- > > Key: ARROW-555 > URL: https://issues.apache.org/jira/browse/ARROW-555 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Labels: Analytics, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This is a parent JIRA for starting a module for processing strings in-memory > arranged in Arrow format. This will include using the re2 C++ regular > expression library and other standard string manipulations (such as those > found on Python's string objects) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9034) [C++] Implement binary (two bitmap) version of BitBlockCounter
[ https://issues.apache.org/jira/browse/ARROW-9034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9034: -- Labels: pull-request-available (was: ) > [C++] Implement binary (two bitmap) version of BitBlockCounter > -- > > Key: ARROW-9034 > URL: https://issues.apache.org/jira/browse/ARROW-9034 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The current BitBlockCounter from ARROW-9029 is useful for unary operations. > Some operations involve multiple bitmaps and so it's useful to be able to > determine the block popcounts of the AND of the respective words in the > bitmaps. So each returned block would contain the number of bits that are set > in both bitmaps at the same locations -- This message was sent by Atlassian Jira (v8.3.4#803005)
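The binary block counting described in ARROW-9034 can be sketched as follows. This is an illustration in Python rather than the C++ of the actual patch; the function name `binary_block_popcounts`, the 64-bit word size, and the four-word block size are assumptions made for the example, not Arrow's API.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def binary_block_popcounts(left_words, right_words, words_per_block=4):
    """Return (block_length_in_bits, popcount) pairs for the AND of two bitmaps.

    Each bitmap is a list of 64-bit integers. Hypothetical helper for
    illustration only; it is not Arrow's BitBlockCounter interface.
    """
    if len(left_words) != len(right_words):
        raise ValueError("bitmaps must have the same number of words")
    blocks = []
    for start in range(0, len(left_words), words_per_block):
        stop = min(start + words_per_block, len(left_words))
        popcount = 0
        for i in range(start, stop):
            # A bit survives the AND only if it is set in both bitmaps.
            both = left_words[i] & right_words[i] & WORD_MASK
            popcount += bin(both).count("1")
        blocks.append(((stop - start) * WORD_BITS, popcount))
    return blocks
```

A block whose popcount equals its length has no nulls in either input, so a kernel could process it without checking individual bits.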
[jira] [Updated] (ARROW-9043) [Go] Temporarily copy LICENSE.txt to go/
[ https://issues.apache.org/jira/browse/ARROW-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9043: -- Labels: pull-request-available (was: ) > [Go] Temporarily copy LICENSE.txt to go/ > > > Key: ARROW-9043 > URL: https://issues.apache.org/jira/browse/ARROW-9043 > Project: Apache Arrow > Issue Type: Improvement > Components: Go >Reporter: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > {{go mod}} needs to find a license file in the root of the Go module. In the > future "go mod" may be able to follow symlinks in which case this can be > replaced by a symlink. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9042) [C++] Add Subtract and Multiply arithmetic kernels with wrap-around behavior
[ https://issues.apache.org/jira/browse/ARROW-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9042: -- Labels: pull-request-available (was: ) > [C++] Add Subtract and Multiply arithmetic kernels with wrap-around behavior > > > Key: ARROW-9042 > URL: https://issues.apache.org/jira/browse/ARROW-9042 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Also avoid undefined behaviour caused by signed integer overflow. -- This message was sent by Atlassian Jira (v8.3.4#803005)
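To make "wrap-around behavior" concrete, here is a small Python sketch. Python integers are unbounded, so the two's-complement wrap is emulated with a mask; the helper names `wrap_signed`, `subtract_wrapping`, and `multiply_wrapping` are hypothetical and do not correspond to the Arrow kernels themselves.

```python
def wrap_signed(value, bits):
    """Reduce an unbounded Python int to a two's-complement signed integer."""
    mask = (1 << bits) - 1
    value &= mask
    # Values with the top bit set represent negative numbers.
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def subtract_wrapping(a, b, bits=8):
    """Subtract with wrap-around instead of overflow (illustrative only)."""
    return wrap_signed(a - b, bits)


def multiply_wrapping(a, b, bits=8):
    """Multiply with wrap-around instead of overflow (illustrative only)."""
    return wrap_signed(a * b, bits)
```

For int8, `subtract_wrapping(-128, 1)` wraps to 127 and `multiply_wrapping(64, 2)` wraps to -128. In C++, one common way to get the same result without undefined behaviour is to do the arithmetic on the corresponding unsigned type; whether the patch does exactly that is not stated in the issue.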
[jira] [Updated] (ARROW-8909) [Java] Out of order writes using setSafe
[ https://issues.apache.org/jira/browse/ARROW-8909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-8909:
--
Labels: pull-request-available (was: )

> [Java] Out of order writes using setSafe
>
> Key: ARROW-8909
> URL: https://issues.apache.org/jira/browse/ARROW-8909
> Project: Apache Arrow
> Issue Type: Bug
> Components: Java
> Reporter: Saurabh
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I noticed that calling setSafe on a VarCharVector with indices not in
> increasing order causes the lastIndex to be set to the index in the last call
> to setSafe.
> Is this documented and expected behavior?
> Sample code:
> {code:java}
> import java.util.Collections;
> import lombok.extern.slf4j.Slf4j;
> import org.apache.arrow.memory.RootAllocator;
> import org.apache.arrow.vector.VarCharVector;
> import org.apache.arrow.vector.VectorSchemaRoot;
> import org.apache.arrow.vector.types.pojo.ArrowType;
> import org.apache.arrow.vector.types.pojo.Field;
> import org.apache.arrow.vector.types.pojo.Schema;
> import org.apache.arrow.vector.util.Text;
>
> @Slf4j
> public class ATest {
>   public static void main(String[] args) {
>     Schema schema = new Schema(Collections.singletonList(
>         Field.nullable("Data", new ArrowType.Utf8())));
>     try (VectorSchemaRoot vroot = VectorSchemaRoot.create(schema, new RootAllocator())) {
>       VarCharVector vec = (VarCharVector) vroot.getVector("Data");
>       // Fill indices 0..9 in increasing order.
>       for (int i = 0; i < 10; i++) {
>         vec.setSafe(i, new Text(Integer.toString(i) + "_mtest"));
>       }
>       // Out-of-order write: overwrite index 7 after the loop.
>       vec.setSafe(7, new Text(Integer.toString(7) + "_new"));
>       log.info("Data at index 8 Before {}", vec.getObject(8));
>       vroot.setRowCount(10);
>       log.info("Data at index 8 After {}", vec.getObject(8));
>       log.info(vroot.contentToTSVString());
>     }
>   }
> }
> {code}
>
> If I don't set the index 7 after the loop, I get all the 0_mtest, 1_mtest,
> ..., 9_mtest entries.
> If I set index 7 after the loop, I see 0_mtest, ..., 5_mtest, 6_mtest, 7_new,
> Before the setRowCount, the data at index 8 is -> *st8_mtest* ; index 9
> is *9_mtest*
> After the setRowCount, the data at index 8 is -> "" ; index 9 is ""
> With a replacement text longer than the 4 characters of _new, it eats
> further into the data at the following indices.
>

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9037) [C++/C-ABI] unable to import array with null count == -1 (which could be exported)
[ https://issues.apache.org/jira/browse/ARROW-9037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9037: -- Labels: pull-request-available (was: ) > [C++/C-ABI] unable to import array with null count == -1 (which could be > exported) > -- > > Key: ARROW-9037 > URL: https://issues.apache.org/jira/browse/ARROW-9037 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.17.1 >Reporter: Zhuo Peng >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > If an Array is created with null_count == -1 but without any null (and thus > no null bitmap buffer), then ArrayData.null_count will remain -1 when > exporting if null_count is never computed. The exported C struct also has > null_count == -1 [1]. But when importing, if null_count != 0, an error [2] > will be raised. > [1] > https://github.com/apache/arrow/blob/5389008df0267ba8d57edb7d6bb6ec0bfa10ff9a/cpp/src/arrow/c/bridge.cc#L560 > [2] > https://github.com/apache/arrow/blob/5389008df0267ba8d57edb7d6bb6ec0bfa10ff9a/cpp/src/arrow/c/bridge.cc#L1404 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
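One way an importer could handle the case reported in ARROW-9037 is sketched below in Python. This is only an assumption about a possible resolution, not the change made in the linked pull request; the function `resolve_null_count` and its parameters are hypothetical. It relies on two facts: the C data interface permits `null_count == -1` to mean "not yet computed", and Arrow validity bitmaps number bits least-significant-first within each byte.

```python
def resolve_null_count(null_count, validity_bitmap, length):
    """Return a concrete null count for an imported array (illustrative only).

    null_count      -- value from the exported struct; -1 means "not computed"
    validity_bitmap -- bytes, or None when no validity buffer was exported
    length          -- number of elements in the array
    """
    if null_count < -1:
        raise ValueError("null_count must be -1 or non-negative")
    if validity_bitmap is None:
        # Without a validity buffer there cannot be any nulls, so an
        # unknown count (-1) can safely be resolved to 0.
        if null_count > 0:
            raise ValueError("positive null_count without a validity bitmap")
        return 0
    if null_count == -1:
        # Count set (valid) bits, least-significant bit first in each byte.
        set_bits = sum((validity_bitmap[i // 8] >> (i % 8)) & 1 for i in range(length))
        return length - set_bits
    return null_count
```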
[jira] [Updated] (ARROW-9032) [C++] Split arrow/util/bit_util.h into multiple header files
[ https://issues.apache.org/jira/browse/ARROW-9032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9032: -- Labels: pull-request-available (was: ) > [C++] Split arrow/util/bit_util.h into multiple header files > > > Key: ARROW-9032 > URL: https://issues.apache.org/jira/browse/ARROW-9032 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > This header has grown quite large and any given compilation unit's use of it > is likely limited to only a couple of functions or classes. I suspect it > would improve compilation time to split up this header into a few headers > organized by frequency of code use. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6602) [Doc] Add feature / implementation matrix
[ https://issues.apache.org/jira/browse/ARROW-6602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-6602: -- Labels: pull-request-available (was: ) > [Doc] Add feature / implementation matrix > - > > Key: ARROW-6602 > URL: https://issues.apache.org/jira/browse/ARROW-6602 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > We have many different implementations and each implementation makes a > different set of features available. It would be nice to have a top-level doc > page making it clear which implementation supports what. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8766) [Python] A FileSystem implementation based on Python callbacks
[ https://issues.apache.org/jira/browse/ARROW-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8766: -- Labels: dataset-dask-integration filesystem pull-request-available (was: dataset-dask-integration filesystem) > [Python] A FileSystem implementation based on Python callbacks > -- > > Key: ARROW-8766 > URL: https://issues.apache.org/jira/browse/ARROW-8766 > Project: Apache Arrow > Issue Type: Sub-task > Components: Python >Reporter: Joris Van den Bossche >Assignee: Antoine Pitrou >Priority: Major > Labels: dataset-dask-integration, filesystem, > pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The new {{pyarrow.fs}} filesystems are now actual C++ objects, and no longer > "just" a Python interface. So they can't easily be extended from the Python > side, and the existing integration with {{fsspec}} filesystems is therefore > no longer working. > One possible solution is to have a C++ filesystem that calls back into a > Python object for each of its methods (possibly similar to how you can > implement a flight server in Python, I suppose). > Such a FileSystem implementation would make it possible to write a {{pyarrow.fs}} wrapper > for {{fsspec}} filesystems, and thus allow such filesystems to be used in > pyarrow where new filesystems are expected. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-3154) [Python][C++] Document how to write _metadata, _common_metadata files with Parquet datasets
[ https://issues.apache.org/jira/browse/ARROW-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3154: -- Labels: dataset parquet pull-request-available (was: dataset parquet) > [Python][C++] Document how to write _metadata, _common_metadata files with > Parquet datasets > --- > > Key: ARROW-3154 > URL: https://issues.apache.org/jira/browse/ARROW-3154 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Wes McKinney >Priority: Major > Labels: dataset, parquet, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This is not mentioned in great detail in > http://arrow.apache.org/docs/python/parquet.html -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9029) [C++] Implement BitmapScanner interface to accelerate processing of mostly-not-null data
[ https://issues.apache.org/jira/browse/ARROW-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-9029:
--
Labels: pull-request-available (was: )

> [C++] Implement BitmapScanner interface to accelerate processing of
> mostly-not-null data
>
> Key: ARROW-9029
> URL: https://issues.apache.org/jira/browse/ARROW-9029
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In analytics, it is common for data to be all not-null or mostly not-null.
> Data with > 50% nulls tends to be more exceptional. In this light, our
> {{BitmapReader}} class, which allows iteration over each bit in a bitmap, can be
> computationally suboptimal for mostly set validity bitmaps.
> I propose instead a new interface for use in kernel implementations, for lack
> of a better term {{BitmapScanner}}. This works as follows:
> * Uses popcount to accumulate consecutive 64-bit words from a bitmap where
>   all values are set, up to some limit (e.g. anywhere from 8 to 128 words or
>   more -- we can use benchmarks to determine what is a good limit). The length
>   of this "all-on" run is returned to the caller in a single function call, so
>   that this "run" of data can be processed without any bit-by-bit bitmap
>   checking
> * If a word containing unset bits is encountered, the scanner will similarly
>   accumulate non-full words until the next full word is encountered or a limit
>   is hit. The length of this "has nulls" run is returned to the caller, which
>   then proceeds bit-by-bit to process the data
> For data with a lot of nulls, this may degrade performance somewhat but
> probably not that much empirically. However, data that is mostly-not-null
> should benefit from this.
> This BitmapScanner utility can probably also be used to accelerate the
> implementation of Filter for mostly-not-null data

--
This message was sent by Atlassian Jira (v8.3.4#803005)
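The two-step scanning scheme proposed in ARROW-9029 can be sketched as follows. This is a Python illustration under stated assumptions, not the proposed C++ interface: the function name `scan_runs`, the run labels, and the default limit of 8 words are all made up for the example.

```python
WORD_BITS = 64
FULL_WORD = (1 << WORD_BITS) - 1


def scan_runs(words, max_words_per_run=8):
    """Return ("all_set" | "has_nulls", length_in_bits) runs over a validity bitmap.

    `words` is a list of 64-bit integers. Hypothetical helper, illustration only.
    """
    def is_full(word):
        # A word is "all-on" when its popcount equals the word size.
        return bin(word & FULL_WORD).count("1") == WORD_BITS

    runs = []
    i = 0
    while i < len(words):
        start = i
        run_is_full = is_full(words[i])
        # Extend the run while the words stay of the same kind and the
        # per-run limit has not been reached.
        while (i < len(words)
               and i - start < max_words_per_run
               and is_full(words[i]) == run_is_full):
            i += 1
        kind = "all_set" if run_is_full else "has_nulls"
        runs.append((kind, (i - start) * WORD_BITS))
    return runs
```

A caller would process an "all_set" run without touching the bitmap at all and fall back to bit-by-bit checks only for "has_nulls" runs.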
[jira] [Updated] (ARROW-8946) [Python] Add tests for parquet.write_metadata metadata_collector
[ https://issues.apache.org/jira/browse/ARROW-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8946: -- Labels: pull-request-available (was: ) > [Python] Add tests for parquet.write_metadata metadata_collector > > > Key: ARROW-8946 > URL: https://issues.apache.org/jira/browse/ARROW-8946 > Project: Apache Arrow > Issue Type: Test > Components: Python >Reporter: Joris Van den Bossche >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Follow-up on ARROW-8062: the PR added functionality to > {{parquet.write_metadata}} to pass a collection of metadata objects to be > concatenated. We should add some specific tests for this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9026) [C++/Python] Force package removal from arrow-nightlies conda repository
[ https://issues.apache.org/jira/browse/ARROW-9026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9026: -- Labels: pull-request-available (was: ) > [C++/Python] Force package removal from arrow-nightlies conda repository > > > Key: ARROW-9026 > URL: https://issues.apache.org/jira/browse/ARROW-9026 > Project: Apache Arrow > Issue Type: Bug > Components: Packaging >Reporter: Uwe Korn >Assignee: Uwe Korn >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9024) [C++/Python] Install anaconda-client in conda-clean job
[ https://issues.apache.org/jira/browse/ARROW-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9024: -- Labels: pull-request-available (was: ) > [C++/Python] Install anaconda-client in conda-clean job > --- > > Key: ARROW-9024 > URL: https://issues.apache.org/jira/browse/ARROW-9024 > Project: Apache Arrow > Issue Type: Bug > Components: Packaging >Reporter: Uwe Korn >Assignee: Uwe Korn >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9023) [C++] Use mimalloc conda package
[ https://issues.apache.org/jira/browse/ARROW-9023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9023: -- Labels: pull-request-available (was: ) > [C++] Use mimalloc conda package > > > Key: ARROW-9023 > URL: https://issues.apache.org/jira/browse/ARROW-9023 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Packaging >Reporter: Uwe Korn >Assignee: Uwe Korn >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9022) [C++][Compute] Make Add function safe for numeric limits
[ https://issues.apache.org/jira/browse/ARROW-9022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9022: -- Labels: pull-request-available (was: ) > [C++][Compute] Make Add function safe for numeric limits > > > Key: ARROW-9022 > URL: https://issues.apache.org/jira/browse/ARROW-9022 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently the output type of the Add function is identical to the argument > types, which makes it unsafe to add numeric limit values, so instead of using > the {{(int8, int8) -> int8}} signature we should use {{(int8, int8) -> int16}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
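The reasoning behind the wider output type can be checked with a few lines of Python. The helpers `fits` and `add_widening` are hypothetical names used only for this sketch; they model the proposed `(int8, int8) -> int16` signature and are not the Arrow compute API.

```python
def fits(value, bits):
    """True if `value` is representable as a signed integer of `bits` bits."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def add_widening(left, right, in_bits=8):
    """Add two signed `in_bits` values; return (sum, output_bit_width)."""
    if not (fits(left, in_bits) and fits(right, in_bits)):
        raise OverflowError("inputs must fit in the input type")
    out_bits = in_bits * 2
    result = left + right
    # The sum of two n-bit signed values needs at most n + 1 bits,
    # so doubling the width always leaves enough room.
    assert fits(result, out_bits)
    return result, out_bits
```

For example, 127 + 127 = 254 does not fit in int8 but fits comfortably in int16, which is why adding two limit values is safe with the wider output.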
[jira] [Updated] (ARROW-9021) [Python] The filesystem keyword in parquet.read_table is not documented
[ https://issues.apache.org/jira/browse/ARROW-9021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9021: -- Labels: pull-request-available (was: ) > [Python] The filesystem keyword in parquet.read_table is not documented > --- > > Key: ARROW-9021 > URL: https://issues.apache.org/jira/browse/ARROW-9021 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Joris Van den Bossche >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8979) [C++] Implement bitmap word reader and writer
[ https://issues.apache.org/jira/browse/ARROW-8979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8979: -- Labels: pull-request-available (was: ) > [C++] Implement bitmap word reader and writer > - > > Key: ARROW-8979 > URL: https://issues.apache.org/jira/browse/ARROW-8979 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Yibo Cai >Assignee: Yibo Cai >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The three Jira tasks below optimize the unaligned case of bitmap operations > (logical, copy, compare, etc.). They use a word-by-word approach instead of > bit-by-bit to improve performance. > They share some common code for reading and writing bitmaps in words. It would be better to > implement a word-based bitmap reader and writer to wrap the shared functionality and > reduce code redundancy. > https://issues.apache.org/jira/browse/ARROW-8553 > https://issues.apache.org/jira/browse/ARROW-8843 > https://issues.apache.org/jira/browse/ARROW-8844 -- This message was sent by Atlassian Jira (v8.3.4#803005)
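A word-based reader for the unaligned case might look roughly like the sketch below. It is written in Python for illustration and is not the C++ reader from the patch; the name `read_unaligned_words` is hypothetical. It assumes Arrow's bitmap layout, where bits are numbered least-significant-first within each byte.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def read_unaligned_words(bitmap, bit_offset, n_words):
    """Read `n_words` 64-bit words from `bitmap` starting at `bit_offset`.

    Bits past the end of the buffer read as zero. Illustrative helper only.
    """
    byte_start, shift = divmod(bit_offset, 8)
    words = []
    for k in range(n_words):
        pos = byte_start + 8 * k
        # Take 8 bytes plus one carry byte so that, after shifting out the
        # `shift` leading bits, a full 64-bit word remains.
        chunk = bitmap[pos:pos + 9]
        value = int.from_bytes(chunk, "little")
        words.append((value >> shift) & WORD_MASK)
    return words
```

Each word is produced with one shift and one mask instead of 64 single-bit reads, which is the point of the word-by-word approach.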
[jira] [Updated] (ARROW-4633) [Python] ParquetFile.read(use_threads=False) creates ThreadPool anyway
[ https://issues.apache.org/jira/browse/ARROW-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-4633:
--
Labels: dataset-parquet-read newbie parquet pull-request-available (was: dataset-parquet-read newbie parquet)

> [Python] ParquetFile.read(use_threads=False) creates ThreadPool anyway
>
> Key: ARROW-4633
> URL: https://issues.apache.org/jira/browse/ARROW-4633
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.11.1, 0.12.0
> Environment: Linux, Python 3.7.1, pyarrow.__version__ = 0.12.0
> Reporter: Taylor Johnson
> Assignee: Wes McKinney
> Priority: Minor
> Labels: dataset-parquet-read, newbie, parquet,
> pull-request-available
> Fix For: 1.0.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The following code seems to suggest that ParquetFile.read(use_threads=False)
> still creates a ThreadPool. This is observed in
> ParquetFile.read_row_group(use_threads=False) as well.
> This does not appear to be a problem in
> pyarrow.Table.to_pandas(use_threads=False).
> I've tried tracing the error. Starting in python/pyarrow/parquet.py, both
> ParquetReader.read_all() and ParquetReader.read_row_group() pass the
> use_threads input along to self.reader which is a ParquetReader imported from
> _parquet.pyx
> Following the calls into python/pyarrow/_parquet.pyx, we see that
> ParquetReader.read_all() and ParquetReader.read_row_group() have the
> following code which seems a bit suspicious
> {quote}
> if use_threads:
>     self.set_use_threads(use_threads)
> {quote}
> Why not just always call self.set_use_threads(use_threads)?
> The ParquetReader.set_use_threads simply calls
> self.reader.get().set_use_threads(use_threads). This self.reader is assigned
> as unique_ptr[FileReader]. I think this points to
> cpp/src/parquet/arrow/reader.cc, but I'm not sure about that. The
> FileReader::Impl::ReadRowGroup logic looks ok, as a call to
> ::arrow::internal::GetCpuThreadPool() is only called if use_threads is True.
> The same is true for ReadTable.
> So when is the ThreadPool getting created?
> Example code:
> --
> {quote}
> import pandas as pd
> import psutil
> import pyarrow as pa
> import pyarrow.parquet as pq
>
> use_threads = False
> p = psutil.Process()
> print('Starting with {} threads'.format(p.num_threads()))
> df = pd.DataFrame({'x': [0]})
> table = pa.Table.from_pandas(df)
> print('After table creation, {} threads'.format(p.num_threads()))
> df = table.to_pandas(use_threads=use_threads)
> print('table.to_pandas(use_threads={}), {} threads'.format(use_threads,
>       p.num_threads()))
> writer = pq.ParquetWriter('tmp.parquet', table.schema)
> writer.write_table(table)
> writer.close()
> print('After writing parquet file, {} threads'.format(p.num_threads()))
> pf = pq.ParquetFile('tmp.parquet')
> print('After ParquetFile, {} threads'.format(p.num_threads()))
> df = pf.read(use_threads=use_threads).to_pandas()
> print('After pf.read(use_threads={}), {} threads'.format(use_threads,
>       p.num_threads()))
> {quote}
> ---
> $ python pyarrow_test.py
> Starting with 1 threads
> After table creation, 1 threads
> table.to_pandas(use_threads=False), 1 threads
> After writing parquet file, 1 threads
> After ParquetFile, 1 threads
> After pf.read(use_threads=False), 5 threads

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-2702) [Python] Examine usages of Invalid and TypeError errors in numpy_to_arrow.cc to see if we are using the right error type in each instance
[ https://issues.apache.org/jira/browse/ARROW-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-2702: -- Labels: pull-request-available (was: ) > [Python] Examine usages of Invalid and TypeError errors in numpy_to_arrow.cc > to see if we are using the right error type in each instance > - > > Key: ARROW-2702 > URL: https://issues.apache.org/jira/browse/ARROW-2702 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > See discussion in [https://github.com/apache/arrow/pull/2075] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9018) [C++] Remove APIs that were deprecated in 0.17.x and prior
[ https://issues.apache.org/jira/browse/ARROW-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9018: -- Labels: pull-request-available (was: ) > [C++] Remove APIs that were deprecated in 0.17.x and prior > -- > > Key: ARROW-9018 > URL: https://issues.apache.org/jira/browse/ARROW-9018 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8904) [Python] Fix usages of deprecated C++ APIs related to child/field
[ https://issues.apache.org/jira/browse/ARROW-8904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8904: -- Labels: pull-request-available (was: ) > [Python] Fix usages of deprecated C++ APIs related to child/field > - > > Key: ARROW-8904 > URL: https://issues.apache.org/jira/browse/ARROW-8904 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > {code} > -- Running cmake --build for pyarrow > cmake --build . --config debug -- -j16 > [19/20] Building CXX object CMakeFiles/lib.dir/lib.cpp.o > lib.cpp:20265:85: warning: 'num_children' is deprecated: Use num_fields() > [-Wdeprecated-declarations] > __pyx_t_1 = __pyx_f_7pyarrow_3lib__normalize_index(__pyx_v_i, > __pyx_v_self->type->num_children()); if (unlikely(__pyx_t_1 == > ((Py_ssize_t)-1L))) __PYX_ERR(1, 119, __pyx_L1_error) > > ^ > /home/wesm/local/include/arrow/type.h:263:3: note: 'num_children' has been > explicitly marked deprecated here > ARROW_DEPRECATED("Use num_fields()") > ^ > /home/wesm/local/include/arrow/util/macros.h:104:48: note: expanded from > macro 'ARROW_DEPRECATED' > # define ARROW_DEPRECATED(...) __attribute__((deprecated(__VA_ARGS__))) >^ > lib.cpp:20276:76: warning: 'child' is deprecated: Use field(i) > [-Wdeprecated-declarations] > __pyx_t_2 = > __pyx_f_7pyarrow_3lib_pyarrow_wrap_field(__pyx_v_self->type->child(__pyx_v_index)); > if (unlikely(!__pyx_t_2)) __PYX_ERR(1, 120, __pyx_L1_error) >^ > /home/wesm/local/include/arrow/type.h:251:3: note: 'child' has been > explicitly marked deprecated here > ARROW_DEPRECATED("Use field(i)") > ^ > /home/wesm/local/include/arrow/util/macros.h:104:48: note: expanded from > macro 'ARROW_DEPRECATED' > # define ARROW_DEPRECATED(...) 
__attribute__((deprecated(__VA_ARGS__))) >^ > lib.cpp:20507:56: warning: 'num_children' is deprecated: Use num_fields() > [-Wdeprecated-declarations] > __pyx_t_1 = __Pyx_PyInt_From_int(__pyx_v_self->type->num_children()); if > (unlikely(!__pyx_t_1)) __PYX_ERR(1, 139, __pyx_L1_error) >^ > /home/wesm/local/include/arrow/type.h:263:3: note: 'num_children' has been > explicitly marked deprecated here > ARROW_DEPRECATED("Use num_fields()") > ^ > /home/wesm/local/include/arrow/util/macros.h:104:48: note: expanded from > macro 'ARROW_DEPRECATED' > # define ARROW_DEPRECATED(...) __attribute__((deprecated(__VA_ARGS__))) >^ > lib.cpp:23361:44: warning: 'num_children' is deprecated: Use num_fields() > [-Wdeprecated-declarations] > __pyx_r = __pyx_v_self->__pyx_base.type->num_children(); >^ > /home/wesm/local/include/arrow/type.h:263:3: note: 'num_children' has been > explicitly marked deprecated here > ARROW_DEPRECATED("Use num_fields()") > ^ > /home/wesm/local/include/arrow/util/macros.h:104:48: note: expanded from > macro 'ARROW_DEPRECATED' > # define ARROW_DEPRECATED(...) __attribute__((deprecated(__VA_ARGS__))) >^ > lib.cpp:24039:44: warning: 'num_children' is deprecated: Use num_fields() > [-Wdeprecated-declarations] > __pyx_r = __pyx_v_self->__pyx_base.type->num_children(); >^ > /home/wesm/local/include/arrow/type.h:263:3: note: 'num_children' has been > explicitly marked deprecated here > ARROW_DEPRECATED("Use num_fields()") > ^ > /home/wesm/local/include/arrow/util/macros.h:104:48: note: expanded from > macro 'ARROW_DEPRECATED' > # define ARROW_DEPRECATED(...) 
__attribute__((deprecated(__VA_ARGS__))) >^ > lib.cpp:58220:37: warning: 'child' is deprecated: Use field(pos) > [-Wdeprecated-declarations] > __pyx_v_child = __pyx_v_self->ap->child(__pyx_v_child_id); > ^ > /home/wesm/local/include/arrow/array.h:1281:3: note: 'child' has been > explicitly marked deprecated here > ARROW_DEPRECATED("Use field(pos)") > ^ > /home/wesm/local/include/arrow/util/macros.h:104:48: note: expanded from > macro 'ARROW_DEPRECATED' > # define ARROW_DEPRECATED(...) __attribute__((deprecated(__VA_ARGS__))) >^ > lib.cpp:58956:74: warning: 'children' is de
[jira] [Updated] (ARROW-8951) [C++] Fix compiler warning in compute/kernels/scalar_cast_temporal.cc
[ https://issues.apache.org/jira/browse/ARROW-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8951: -- Labels: pull-request-available (was: ) > [C++] Fix compiler warning in compute/kernels/scalar_cast_temporal.cc > - > > Key: ARROW-8951 > URL: https://issues.apache.org/jira/browse/ARROW-8951 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The kernel functor can return an uninitialized value on errors > {code} > ../src/arrow/compute/kernels/scalar_cast_temporal.cc: In member function ‘OUT > arrow::compute::internal::ParseTimestamp::Call(arrow::compute::KernelContext*, > ARG0) const [with OUT = long int; ARG0 = > nonstd::sv_lite::basic_string_view]’: > ../src/arrow/compute/kernels/scalar_cast_temporal.cc:267:12: warning: > ‘result’ may be used uninitialized in this function [-Wmaybe-uninitialized] > return result; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9016) [Java] Remove direct references to Netty/Unsafe Allocators
[ https://issues.apache.org/jira/browse/ARROW-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9016: -- Labels: pull-request-available (was: ) > [Java] Remove direct references to Netty/Unsafe Allocators > -- > > Key: ARROW-9016 > URL: https://issues.apache.org/jira/browse/ARROW-9016 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Ryan Murray >Assignee: Ryan Murray >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > As part of ARROW-8230 this removes direct references to Netty and Unsafe > Allocation managers in the `DefaultAllocationManagerOption` -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9015) [Java] Make BaseAllocator package private
[ https://issues.apache.org/jira/browse/ARROW-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9015: -- Labels: pull-request-available (was: ) > [Java] Make BaseAllocator package private > - > > Key: ARROW-9015 > URL: https://issues.apache.org/jira/browse/ARROW-9015 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Ryan Murray >Assignee: Ryan Murray >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > As part of the netty work in ARROW-8230 it became clear that BaseAllocator > should be package private -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9014) [Packaging] Bump the minor part of the automatically generated version in crossbow
[ https://issues.apache.org/jira/browse/ARROW-9014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9014: -- Labels: pull-request-available (was: ) > [Packaging] Bump the minor part of the automatically generated version in > crossbow > -- > > Key: ARROW-9014 > URL: https://issues.apache.org/jira/browse/ARROW-9014 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging, Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Crossbow uses setuptools_scm to generate a development version number using > the git describe command. This means that it finds the latest {{reachable}} tag > from the current commit on master. > The minor releases are created from the master branch, whereas the patch > release tags point to commits on maintenance branches (like 0.17.x), which > means that if we have already released a patch version, like 0.17.1, then > crossbow generates a version number like > 0.17.0.dev{number-of-commits-from-0.17.0} and bumps its patch tag, eventually > creating binary packages with version 0.17.1.dev123. > The main problem with this is that the produced nightly Python wheels are not > picked up by pip, because on PyPI we already have that patch release > available and pip doesn't consider 0.17.1.dev123 newer than 0.17.1 (even with > the --pre option passed). > So to force pip to install the newer nightly packages we need to bump the > minor version instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
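The version-ordering problem described in ARROW-9014 can be illustrated with a small sketch. It assumes a deliberately simplified form of the PEP 440 ordering rules that covers only plain `N.N.N` releases and `N.N.N.devM` development releases. `sort_key` and `next_minor_dev` are hypothetical helpers written for this illustration; they are not part of crossbow, setuptools_scm, or pip.

```python
import re

# Accepts only "1.2.3" or "1.2.3.dev45"; real PEP 440 allows much more.
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:\.dev(\d+))?$")


def sort_key(version):
    """Simplified ordering key for release and .dev versions (assumption)."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    dev = match.group(2)
    # A .devM version sorts before the final release with the same numbers.
    if dev is not None:
        return (release, 0, int(dev))
    return (release, 1, 0)


def next_minor_dev(last_tag, distance):
    """Hypothetical helper: bump the minor part and reset the patch part."""
    major, minor, _patch = (int(part) for part in last_tag.split("."))
    return f"{major}.{minor + 1}.0.dev{distance}"


released = "0.17.1"                            # patch release already published
patch_bumped = "0.17.1.dev123"                 # nightly version from the patch bump
minor_bumped = next_minor_dev("0.17.0", 123)   # nightly version from a minor bump

# The patch-bumped nightly sorts below the published patch release,
# while the minor-bumped nightly sorts above it.
print(sort_key(patch_bumped) < sort_key(released))
print(sort_key(minor_bumped) > sort_key(released))
```

Under these assumptions the minor-bumped nightly is the only one of the two that an installer would treat as newer than the published 0.17.1.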
[jira] [Updated] (ARROW-9010) [Java] Framework and interface changes for RecordBatch IPC buffer compression
[ https://issues.apache.org/jira/browse/ARROW-9010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9010: -- Labels: pull-request-available (was: ) > [Java] Framework and interface changes for RecordBatch IPC buffer compression > - > > Key: ARROW-9010 > URL: https://issues.apache.org/jira/browse/ARROW-9010 > Project: Apache Arrow > Issue Type: New Feature > Components: Java >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > This is the first sub-work item of ARROW-8672 ( > [Java] Implement RecordBatch IPC buffer compression from ARROW-300). However, > it does not involve any concrete compression algorithms. The purpose of this > PR is to establish basic interfaces for data compression, and make changes to > the IPC framework so that different compression algorithms can be plugged in > smoothly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
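The idea in ARROW-9010 of separating a compression interface from concrete algorithms might look roughly like the sketch below. Everything here is an assumption made for illustration: `CompressionCodec`, `NoCompressionCodec`, `ZlibCodec`, `write_buffers`, and `read_buffers` are hypothetical names, not Arrow's Java API, and zlib is only a stand-in for whatever algorithms are plugged in later.

```python
import zlib
from abc import ABC, abstractmethod


class CompressionCodec(ABC):
    """Hypothetical codec interface; the framework depends only on this."""

    @abstractmethod
    def compress(self, buffer: bytes) -> bytes:
        """Return the encoded form of one buffer."""

    @abstractmethod
    def decompress(self, buffer: bytes) -> bytes:
        """Return the original bytes of one encoded buffer."""


class NoCompressionCodec(CompressionCodec):
    """Pass-through codec, useful before any real algorithm exists."""

    def compress(self, buffer: bytes) -> bytes:
        return buffer

    def decompress(self, buffer: bytes) -> bytes:
        return buffer


class ZlibCodec(CompressionCodec):
    """Stand-in algorithm from the standard library (assumption)."""

    def compress(self, buffer: bytes) -> bytes:
        return zlib.compress(buffer)

    def decompress(self, buffer: bytes) -> bytes:
        return zlib.decompress(buffer)


def write_buffers(buffers, codec):
    """Framework-side write path: it never names a concrete algorithm."""
    return [codec.compress(buffer) for buffer in buffers]


def read_buffers(encoded, codec):
    """Framework-side read path, mirroring write_buffers."""
    return [codec.decompress(buffer) for buffer in encoded]


batch = [b"a" * 64, b"validity", b""]
for codec in (NoCompressionCodec(), ZlibCodec()):
    # Both codecs round-trip the same buffers through the same framework code.
    print(type(codec).__name__, read_buffers(write_buffers(batch, codec), codec) == batch)
```

In this sketch a new algorithm only needs a new subclass; the read and write paths stay unchanged.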
[jira] [Updated] (ARROW-9005) Support sort expression
[ https://issues.apache.org/jira/browse/ARROW-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9005: -- Labels: pull-request-available (was: ) > Support sort expression > --- > > Key: ARROW-9005 > URL: https://issues.apache.org/jira/browse/ARROW-9005 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust - DataFusion >Reporter: QP Hou >Assignee: QP Hou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9004) [C++][Gandiva] Upgrade to LLVM 10
[ https://issues.apache.org/jira/browse/ARROW-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-9004: -- Labels: pull-request-available (was: ) > [C++][Gandiva] Upgrade to LLVM 10 > - > > Key: ARROW-9004 > URL: https://issues.apache.org/jira/browse/ARROW-9004 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Gandiva >Reporter: Kouhei Sutou >Assignee: Kouhei Sutou >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-8929) [C++] Change compute::Arity::VarArgs min_args default to 0
[ https://issues.apache.org/jira/browse/ARROW-8929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8929: -- Labels: pull-request-available (was: ) > [C++] Change compute::Arity::VarArgs min_args default to 0 > - > > Key: ARROW-8929 > URL: https://issues.apache.org/jira/browse/ARROW-8929 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The issue of minimum number of arguments is separate from providing an > {{InputType}} for input type checking. -- This message was sent by Atlassian Jira (v8.3.4#803005)
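The separation that ARROW-8929 describes, between how many arguments a function accepts and what type each argument must have, can be modelled with a short sketch. This is a hypothetical Python model, not the Arrow C++ API: the `Arity` class, its `var_args` constructor, and `check_call` are names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arity:
    """Hypothetical model of a function arity."""

    num_args: int             # exact count, or the minimum count for varargs
    is_varargs: bool = False

    @classmethod
    def var_args(cls, min_args=0):
        # Default of 0: the arity alone imposes no minimum argument count.
        return cls(num_args=min_args, is_varargs=True)

    def accepts(self, count):
        """Check only the number of arguments, never their types."""
        if self.is_varargs:
            return count >= self.num_args
        return count == self.num_args


def check_call(arity, input_type, args):
    """Run the count check and the type check as two independent steps."""
    if not arity.accepts(len(args)):
        raise TypeError(f"wrong number of arguments: {len(args)}")
    for arg in args:
        if not isinstance(arg, input_type):
            raise TypeError(f"expected {input_type.__name__}, got {type(arg).__name__}")
    return True


# A varargs function with one input type still type-checks with zero arguments.
print(check_call(Arity.var_args(), int, []))
print(check_call(Arity.var_args(), int, [1, 2, 3]))
```

A caller that does need at least one argument can still ask for it explicitly, for example `Arity.var_args(min_args=1)` in this model.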
[jira] [Updated] (ARROW-8985) [Format] Add "byte width" field with default of 16 to Decimal Flatbuffers type for forward compatibility
[ https://issues.apache.org/jira/browse/ARROW-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-8985: -- Labels: pull-request-available (was: ) > [Format] Add "byte width" field with default of 16 to Decimal Flatbuffers > type for forward compatibility > > > Key: ARROW-8985 > URL: https://issues.apache.org/jira/browse/ARROW-8985 > Project: Apache Arrow > Issue Type: Improvement > Components: Format >Reporter: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > This will permit larger or smaller decimals to be added to the format later > without having to add a new Type union value -- This message was sent by Atlassian Jira (v8.3.4#803005)
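The forward-compatibility argument in ARROW-8985 rests on a field with a default value: metadata written before the field existed is read as if it carried the default. A minimal sketch of that behaviour follows, using a plain dict as a stand-in for the serialized Flatbuffers table. `DecimalType`, `decode_decimal`, and the `byte_width` key are hypothetical names for this illustration, not the Arrow schema definitions.

```python
from dataclasses import dataclass

# Default named in the issue title: 16 bytes, i.e. a 128-bit decimal.
DEFAULT_BYTE_WIDTH = 16


@dataclass(frozen=True)
class DecimalType:
    """Hypothetical in-memory form of the Decimal type metadata."""

    precision: int
    scale: int
    byte_width: int = DEFAULT_BYTE_WIDTH


def decode_decimal(fields):
    """Decode metadata; a missing byte width falls back to the default."""
    return DecimalType(
        precision=fields["precision"],
        scale=fields["scale"],
        byte_width=fields.get("byte_width", DEFAULT_BYTE_WIDTH),
    )


# Metadata from an older writer that predates the new field.
old_metadata = {"precision": 38, "scale": 10}
# Metadata from a hypothetical future writer using a wider decimal.
new_metadata = {"precision": 76, "scale": 10, "byte_width": 32}

print(decode_decimal(old_metadata))
print(decode_decimal(new_metadata))
```

In this model both records decode as the same type, so no new Type union value is needed to distinguish them.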