[jira] [Created] (ARROW-9344) [C++][Flight] measure latency quantile in flight benchmark
Yibo Cai created ARROW-9344:
Summary: [C++][Flight] measure latency quantile in flight benchmark
Key: ARROW-9344
URL: https://issues.apache.org/jira/browse/ARROW-9344
Project: Apache Arrow
Issue Type: Improvement
Components: C++, FlightRPC
Reporter: Yibo Cai
Assignee: Yibo Cai

ARROW-9206 measures average latency in the Flight benchmark. In practice, latency quantiles (e.g., the 99% quantile, max, and median) are needed to show the whole picture of RPC performance.

The naive approach of saving the latencies of all batches is not practical. Boost accumulator_set implements the P-square quantile algorithm, which uses O(1) space with trivial computation overhead for each batch. It can be used to calculate latency quantiles.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
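To make the O(1)-space claim concrete, here is a minimal Python sketch of the P-square idea: five markers track the minimum, the maximum, and three interior points, and the middle marker estimates the requested quantile. It is an illustration only, not Boost's accumulator_set and not the Flight benchmark code; the class name `P2Quantile` and its methods are hypothetical.

```python
import random


class P2Quantile:
    """Streaming p-quantile estimate using five markers (P-square sketch)."""

    def __init__(self, p):
        if not 0.0 < p < 1.0:
            raise ValueError("p must be strictly between 0 and 1")
        self.p = p
        self.heights = []                                     # marker heights
        self.pos = [1, 2, 3, 4, 5]                            # actual marker positions
        self.want = [1, 1 + 2 * p, 1 + 4 * p, 3 + 2 * p, 5]   # desired positions
        self.step = [0, p / 2, p, (1 + p) / 2, 1]             # desired-position increments

    def _parabolic(self, i, s):
        q, n = self.heights, self.pos
        return q[i] + s / (n[i + 1] - n[i - 1]) * (
            (n[i] - n[i - 1] + s) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
            + (n[i + 1] - n[i] - s) * (q[i] - q[i - 1]) / (n[i] - n[i - 1])
        )

    def _linear(self, i, s):
        q, n = self.heights, self.pos
        return q[i] + s * (q[i + s] - q[i]) / (n[i + s] - n[i])

    def add(self, x):
        q, n = self.heights, self.pos
        if len(q) < 5:
            # Bootstrap: the first five samples become the sorted markers.
            q.append(x)
            q.sort()
            return
        # Find the cell k with q[k] <= x < q[k + 1], widening the extremes if needed.
        if x < q[0]:
            q[0] = x
            k = 0
        elif x >= q[4]:
            q[4] = x
            k = 3
        else:
            k = max(i for i in range(4) if q[i] <= x)
        for i in range(k + 1, 5):
            n[i] += 1
        for i in range(5):
            self.want[i] += self.step[i]
        # Move each interior marker by at most one position toward its target.
        for i in (1, 2, 3):
            d = self.want[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                s = 1 if d > 0 else -1
                candidate = self._parabolic(i, s)
                if not q[i - 1] < candidate < q[i + 1]:
                    candidate = self._linear(i, s)
                q[i] = candidate
                n[i] += s

    def value(self):
        q = self.heights
        if not q:
            raise ValueError("no samples added")
        if len(q) < 5:
            # Too few samples for the markers: fall back to the stored values.
            return q[min(len(q) - 1, int(self.p * len(q)))]
        return q[2]


if __name__ == "__main__":
    rng = random.Random(0)
    latencies = [rng.expovariate(1.0) for _ in range(100_000)]  # made-up sample data
    p99 = P2Quantile(0.99)
    for latency in latencies:
        p99.add(latency)
    print("estimated p99:", p99.value())
```

The estimator keeps only fifteen numbers however many samples arrive, which is the property the issue relies on; the result is an approximation, so the accuracy for a given latency distribution would need to be checked against exact quantiles on a recorded sample.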
[jira] [Created] (ARROW-9343) [C++][Gandiva] CastINT/Float functions from string should handle leading/trailing white spaces
Projjal Chanda created ARROW-9343:
Summary: [C++][Gandiva] CastINT/Float functions from string should handle leading/trailing white spaces
Key: ARROW-9343
URL: https://issues.apache.org/jira/browse/ARROW-9343
Project: Apache Arrow
Issue Type: Bug
Reporter: Projjal Chanda
[jira] [Created] (ARROW-9342) [C++][Gandiva] Add LTRIM, RTRIM, BTRIM functions with optional trimtext argument for string
Sagnik Chakraborty created ARROW-9342:
Summary: [C++][Gandiva] Add LTRIM, RTRIM, BTRIM functions with optional trimtext argument for string
Key: ARROW-9342
URL: https://issues.apache.org/jira/browse/ARROW-9342
Project: Apache Arrow
Issue Type: Task
Reporter: Sagnik Chakraborty
[jira] [Created] (ARROW-9341) [GLib] Use arrow::Datum version Take()
Kouhei Sutou created ARROW-9341:
Summary: [GLib] Use arrow::Datum version Take()
Key: ARROW-9341
URL: https://issues.apache.org/jira/browse/ARROW-9341
Project: Apache Arrow
Issue Type: Improvement
Components: GLib
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
[jira] [Created] (ARROW-9340) [R] Use CRAN version of decor package
Neal Richardson created ARROW-9340:
Summary: [R] Use CRAN version of decor package
Key: ARROW-9340
URL: https://issues.apache.org/jira/browse/ARROW-9340
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
Fix For: 1.0.0
[jira] [Created] (ARROW-9339) [Rust] Comments on SIMD in Arrow README are incorrect
Paddy Horan created ARROW-9339:
Summary: [Rust] Comments on SIMD in Arrow README are incorrect
Key: ARROW-9339
URL: https://issues.apache.org/jira/browse/ARROW-9339
Project: Apache Arrow
Issue Type: Improvement
Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan
[jira] [Created] (ARROW-9338) [Rust] Add instructions for running clippy locally
Paddy Horan created ARROW-9338:
Summary: [Rust] Add instructions for running clippy locally
Key: ARROW-9338
URL: https://issues.apache.org/jira/browse/ARROW-9338
Project: Apache Arrow
Issue Type: Improvement
Components: Rust
Reporter: Paddy Horan

Similar to the "Code Formatting" section in the top-level README, it would be useful to add instructions for running clippy locally, to avoid wasting CI time.
[jira] [Created] (ARROW-9337) [R] On C++ library build failure, give an unambiguous message
Neal Richardson created ARROW-9337:
Summary: [R] On C++ library build failure, give an unambiguous message
Key: ARROW-9337
URL: https://issues.apache.org/jira/browse/ARROW-9337
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
Fix For: 1.0.0

See e.g. ARROW-9303, where the downstream error message is misleading.
[jira] [Created] (ARROW-9336) Creating RecordBatch with structs missing keys results in a malformed table
Steven Willis created ARROW-9336:
Summary: Creating RecordBatch with structs missing keys results in a malformed table
Key: ARROW-9336
URL: https://issues.apache.org/jira/browse/ARROW-9336
Project: Apache Arrow
Issue Type: Bug
Components: Ruby
Affects Versions: 0.17.1
Reporter: Steven Willis

Using {{::Arrow::RecordBatch.new(schema, data)}} (which uses the {{RecordBatchBuilder}}) appears to handle a record that is missing an entry for a top-level column, but it does not handle a record that is missing an entry within a struct column.

For example, I'd expect the following code to print {{true}} for each {{puts}}, but 2 of them print {{false}}:

{code:ruby}
require 'parquet'
require 'arrow'

schema = [
  {name: "a", type: "string"},
  {name: "b", type: "struct", fields: [
      {name: "c", type: "string"},
      {name: "d", type: "string"},
    ]
  },
]

arrow_schema = ::Arrow::Schema.new(schema)

record_batch = ::Arrow::RecordBatch.new(
  arrow_schema,
  [
    {"a" => "a", "b" => {"c" => "c", }},
    {"b" => {"c" => "c", }},
    {"b" => {"d" => "d"}},
  ]
)
table = record_batch.to_table

puts(table['a'][0] == 'a')
puts(table['a'][1].nil?)
puts(table['a'][2].nil?)

puts(table['b'][0].key?('c'))
puts(table['b'][0]['c'] == 'c')
puts(table['b'][0].key?('d'))
puts(table['b'][0]['d'].nil?) # False ?
puts(!table['b'][0].key?('e'))

puts(table['b'][1].key?('c'))
puts(table['b'][1]['c'] == 'c')
puts(table['b'][1].key?('d'))
puts(table['b'][1]['d'].nil?)
puts(!table['b'][1].key?('e'))

puts(table['b'][2].key?('c'))
puts(table['b'][2]['c'].nil?)
puts(table['b'][2].key?('d'))
puts(table['b'][2]['d'] == 'd') # False ?
puts(!table['b'][2].key?('e'))
{code}

I'd expect {{puts(table)}} to print this representation:

{noformat}
	a	b
0	a	{"c"=>"c", "d"=>nil}
1	 	{"c"=>"c", "d"=>nil}
2	 	{"c"=>nil, "d"=>"d"}
{noformat}

But it prints this instead:

{noformat}
	a	b
0	a	{"c"=>"c", "d"=>"d"}
1	 	{"c"=>"c", "d"=>nil}
2	 	{"c"=>nil, "d"=>nil}
{noformat}
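The expected behaviour in this report can be stated as a rule: every field named in the schema appears in every row, and a field missing from a record, including one nested inside a struct, becomes null in that same row. The sketch below illustrates that rule with plain Python dictionaries. It does not use or reproduce the Ruby bindings; `normalize_record` is a hypothetical helper, and the schema layout simply mirrors the one in the report.

```python
def normalize_record(record, schema):
    """Return a copy of record with every schema field present (None for gaps)."""
    out = {}
    for field in schema:
        name = field["name"]
        value = record.get(name)
        # Recurse into struct columns so missing nested keys are filled per row.
        if field["type"] == "struct" and value is not None:
            value = normalize_record(value, field["fields"])
        out[name] = value
    return out


schema = [
    {"name": "a", "type": "string"},
    {"name": "b", "type": "struct", "fields": [
        {"name": "c", "type": "string"},
        {"name": "d", "type": "string"},
    ]},
]

records = [
    {"a": "a", "b": {"c": "c"}},
    {"b": {"c": "c"}},
    {"b": {"d": "d"}},
]

rows = [normalize_record(record, schema) for record in records]
for index, row in enumerate(rows):
    print(index, row)
```

In the observed output the value "d" appears in row 0 instead of row 2, which is the kind of shift this per-row filling rules out.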
[jira] [Created] (ARROW-9335) [Website] Update website for 1.0
Neal Richardson created ARROW-9335:
Summary: [Website] Update website for 1.0
Key: ARROW-9335
URL: https://issues.apache.org/jira/browse/ARROW-9335
Project: Apache Arrow
Issue Type: Improvement
Components: Website
Reporter: Neal Richardson
Assignee: Neal Richardson
Fix For: 1.0.0

Umbrella issue for various others.
[jira] [Created] (ARROW-9334) [Dev][Archery] debian-c-glib and ubuntu-c-glib lack utf8proc
Antoine Pitrou created ARROW-9334:
Summary: [Dev][Archery] debian-c-glib and ubuntu-c-glib lack utf8proc
Key: ARROW-9334
URL: https://issues.apache.org/jira/browse/ARROW-9334
Project: Apache Arrow
Issue Type: Bug
Components: Archery, C, Developer Tools, GLib
Reporter: Antoine Pitrou

The "debian-c-glib" and "ubuntu-c-glib" docker-compose configurations fail with the following message:

{code:java}
CMake Error at /usr/share/cmake-3.13/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
  Could NOT find utf8proc (missing: UTF8PROC_LIB UTF8PROC_INCLUDE_DIR)
Call Stack (most recent call first):
  /usr/share/cmake-3.13/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  cmake_modules/Findutf8proc.cmake:41 (find_package_handle_standard_args)
  cmake_modules/ThirdpartyToolchain.cmake:159 (find_package)
  cmake_modules/ThirdpartyToolchain.cmake:2096 (resolve_dependency)
  CMakeLists.txt:467 (include)
{code}
[jira] [Created] (ARROW-9333) [Python] Expose IPC write options in Python
Antoine Pitrou created ARROW-9333:
Summary: [Python] Expose IPC write options in Python
Key: ARROW-9333
URL: https://issues.apache.org/jira/browse/ARROW-9333
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Antoine Pitrou

We want to allow Python users to use the latest metadata version and/or enable buffer compression.
[jira] [Created] (ARROW-9332) [Python][Dataset] Support pickling of ParquetFileFragment's RowGroupInfo
Joris Van den Bossche created ARROW-9332:
Summary: [Python][Dataset] Support pickling of ParquetFileFragment's RowGroupInfo
Key: ARROW-9332
URL: https://issues.apache.org/jira/browse/ARROW-9332
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Joris Van den Bossche

Follow-up on ARROW-8651 to ensure we can also preserve the statistics information of {{RowGroupInfo}} of a {{ParquetFileFragment}}.
[jira] [Created] (ARROW-9331) [C++] Improve the performance of Tensor-to-SparseTensor conversion
Kenta Murata created ARROW-9331:
Summary: [C++] Improve the performance of Tensor-to-SparseTensor conversion
Key: ARROW-9331
URL: https://issues.apache.org/jira/browse/ARROW-9331
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Kenta Murata
Assignee: Kenta Murata
[GitHub] [arrow-testing] pitrou opened a new pull request #34: ARROW-9330: Add IPC fuzz regression files
pitrou opened a new pull request #34:
URL: https://github.com/apache/arrow-testing/pull/34

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-testing] pitrou merged pull request #34: ARROW-9330: Add IPC fuzz regression files
pitrou merged pull request #34:
URL: https://github.com/apache/arrow-testing/pull/34
[jira] [Created] (ARROW-9330) [C++] Fix crashes on corrupt IPC input (OSS-Fuzz)
Antoine Pitrou created ARROW-9330:
Summary: [C++] Fix crashes on corrupt IPC input (OSS-Fuzz)
Key: ARROW-9330
URL: https://issues.apache.org/jira/browse/ARROW-9330
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou
Fix For: 1.0.0
[jira] [Created] (ARROW-9329) [C++][Gandiva] Implement castTimestampToDate function
Projjal Chanda created ARROW-9329:
Summary: [C++][Gandiva] Implement castTimestampToDate function
Key: ARROW-9329
URL: https://issues.apache.org/jira/browse/ARROW-9329
Project: Apache Arrow
Issue Type: Task
Reporter: Projjal Chanda
Assignee: Projjal Chanda