[jira] [Created] (ARROW-9344) [C++][Flight] measure latency quantile in flight benchmark

2020-07-06 Thread Yibo Cai (Jira)
Yibo Cai created ARROW-9344:
---

 Summary: [C++][Flight] measure latency quantile in flight benchmark
 Key: ARROW-9344
 URL: https://issues.apache.org/jira/browse/ARROW-9344
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC
Reporter: Yibo Cai
Assignee: Yibo Cai


ARROW-9206 measures average latency in the Flight benchmark.
In practice, latency quantiles (e.g. the 99% quantile and the median) and the 
maximum are needed to show the whole picture of RPC performance.
The naive approach of saving the latency of every batch is not practical. 
Boost accumulator_set implements the P-square quantile algorithm, which uses 
O(1) space and adds only trivial computation per batch, so it can be used to 
calculate the latency quantiles.
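
To illustrate the O(1)-space idea, here is a minimal Python sketch of a P-square estimator, which keeps five markers per tracked quantile instead of the full sample. It is not the Boost implementation and not the benchmark code: the class name P2Quantile and the summarize helper are made up for this example, and the latencies in the demo are simulated random numbers, not measurements.

```python
import random


class P2Quantile:
    """Streaming P-square estimate of a single quantile p, using five markers."""

    def __init__(self, p):
        self.p = p
        self.q = []                # marker heights (kept in ascending order)
        self.n = [1, 2, 3, 4, 5]   # current marker positions
        self.want = [1, 1 + 2 * p, 1 + 4 * p, 3 + 2 * p, 5]  # desired positions
        self.step = [0, p / 2, p, (1 + p) / 2, 1]            # growth per sample

    def add(self, x):
        q, n = self.q, self.n
        if len(q) < 5:             # bootstrap: the first five samples are the markers
            q.append(x)
            q.sort()
            return
        # Find the cell k with q[k] <= x < q[k+1], widening the extremes if needed.
        if x < q[0]:
            q[0], k = x, 0
        elif x >= q[4]:
            q[4], k = x, 3
        else:
            k = max(i for i in range(4) if q[i] <= x)
        for i in range(k + 1, 5):
            n[i] += 1
        for i in range(5):
            self.want[i] += self.step[i]
        # Move each interior marker by at most one position toward its target.
        for i in (1, 2, 3):
            d = self.want[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                d = 1 if d > 0 else -1
                new = self._parabolic(i, d)
                if not q[i - 1] < new < q[i + 1]:  # fall back to linear interpolation
                    new = q[i] + d * (q[i + d] - q[i]) / (n[i + d] - n[i])
                q[i] = new
                n[i] += d

    def _parabolic(self, i, d):
        q, n = self.q, self.n
        return q[i] + d / (n[i + 1] - n[i - 1]) * (
            (n[i] - n[i - 1] + d) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
            + (n[i + 1] - n[i] - d) * (q[i] - q[i - 1]) / (n[i] - n[i - 1])
        )

    def value(self):
        if not self.q:
            raise ValueError("no samples")
        if len(self.q) < 5:        # too few samples: nearest rank on what we have
            return self.q[min(len(self.q) - 1, int(self.p * len(self.q)))]
        return self.q[2]           # the middle marker tracks the p-quantile


def summarize(latencies):
    """Median, 99% quantile and max of an iterable, without storing it."""
    median, p99 = P2Quantile(0.5), P2Quantile(0.99)
    worst = float("-inf")
    for x in latencies:
        median.add(x)
        p99.add(x)
        worst = max(worst, x)
    return {"median": median.value(), "p99": p99.value(), "max": worst}


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated per-batch latencies in microseconds (made-up distribution).
    print(summarize(rng.expovariate(1 / 200) for _ in range(20_000)))
```

The quantiles are estimates; only the maximum is exact. Each tracked quantile needs its own set of five markers, so the memory cost stays constant regardless of the number of batches.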



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9343) [C++][Gandiva] CastINT/Float functions from string should handle leading/trailing white spaces

2020-07-06 Thread Projjal Chanda (Jira)
Projjal Chanda created ARROW-9343:
-

 Summary: [C++][Gandiva] CastINT/Float functions from string should 
handle leading/trailing white spaces
 Key: ARROW-9343
 URL: https://issues.apache.org/jira/browse/ARROW-9343
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Projjal Chanda








[jira] [Created] (ARROW-9342) [C++][Gandiva] Add LTRIM, RTRIM, BTRIM functions with optional trimtext argument for string

2020-07-06 Thread Sagnik Chakraborty (Jira)
Sagnik Chakraborty created ARROW-9342:
-

 Summary: [C++][Gandiva] Add LTRIM, RTRIM, BTRIM functions with 
optional trimtext argument for string
 Key: ARROW-9342
 URL: https://issues.apache.org/jira/browse/ARROW-9342
 Project: Apache Arrow
  Issue Type: Task
Reporter: Sagnik Chakraborty








[jira] [Created] (ARROW-9341) [GLib] Use arrow::Datum version Take()

2020-07-06 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-9341:
---

 Summary: [GLib] Use arrow::Datum version Take()
 Key: ARROW-9341
 URL: https://issues.apache.org/jira/browse/ARROW-9341
 Project: Apache Arrow
  Issue Type: Improvement
  Components: GLib
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-9340) [R] Use CRAN version of decor package

2020-07-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-9340:
--

 Summary: [R] Use CRAN version of decor package
 Key: ARROW-9340
 URL: https://issues.apache.org/jira/browse/ARROW-9340
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0








[jira] [Created] (ARROW-9339) [Rust] Comments on SIMD in Arrow README are incorrect

2020-07-06 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-9339:
--

 Summary: [Rust] Comments on SIMD in Arrow README are incorrect
 Key: ARROW-9339
 URL: https://issues.apache.org/jira/browse/ARROW-9339
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan
Assignee: Paddy Horan








[jira] [Created] (ARROW-9338) [Rust] Add instructions for running clippy locally

2020-07-06 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-9338:
--

 Summary: [Rust] Add instructions for running clippy locally
 Key: ARROW-9338
 URL: https://issues.apache.org/jira/browse/ARROW-9338
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Paddy Horan


Similar to the "Code Formatting" section in the top-level README, it would be 
useful to add instructions for running clippy locally to avoid wasted CI time.





[jira] [Created] (ARROW-9337) [R] On C++ library build failure, give an unambiguous message

2020-07-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-9337:
--

 Summary: [R] On C++ library build failure, give an unambiguous 
message
 Key: ARROW-9337
 URL: https://issues.apache.org/jira/browse/ARROW-9337
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


See e.g. ARROW-9303, where the downstream error message is misleading.





[jira] [Created] (ARROW-9336) Creating RecordBatch with structs missing keys results in a malformed table

2020-07-06 Thread Steven Willis (Jira)
Steven Willis created ARROW-9336:


 Summary: Creating RecordBatch with structs missing keys results in 
a malformed table
 Key: ARROW-9336
 URL: https://issues.apache.org/jira/browse/ARROW-9336
 Project: Apache Arrow
  Issue Type: Bug
  Components: Ruby
Affects Versions: 0.17.1
Reporter: Steven Willis


{{::Arrow::RecordBatch.new(schema, data)}} (which uses the 
{{RecordBatchBuilder}}) appears to handle a record that is missing an entry for 
a top-level column, but not a record that is missing an entry within a struct 
column. For example, I'd expect the following code to print {{true}} for each 
{{puts}}, but two of them print {{false}}:

{code:ruby}
require 'parquet'
require 'arrow'

schema = [
  {name: "a", type: "string"},
  {name: "b", type: "struct", fields: [
      {name: "c", type: "string"},
      {name: "d", type: "string"},
    ]
  },
]

arrow_schema = ::Arrow::Schema.new(schema)

record_batch = ::Arrow::RecordBatch.new(
  arrow_schema,
  [
    {"a" => "a", "b" => {"c" => "c"}},
    {"b" => {"c" => "c"}},
    {"b" => {"d" => "d"}},
  ]
)
table = record_batch.to_table

puts(table['a'][0] == 'a')
puts(table['a'][1].nil?)
puts(table['a'][2].nil?)

puts(table['b'][0].key?('c'))
puts(table['b'][0]['c'] == 'c')
puts(table['b'][0].key?('d'))
puts(table['b'][0]['d'].nil?) # False ?
puts(!table['b'][0].key?('e'))

puts(table['b'][1].key?('c'))
puts(table['b'][1]['c'] == 'c')
puts(table['b'][1].key?('d'))
puts(table['b'][1]['d'].nil?)
puts(!table['b'][1].key?('e'))

puts(table['b'][2].key?('c'))
puts(table['b'][2]['c'].nil?)
puts(table['b'][2].key?('d'))
puts(table['b'][2]['d'] == 'd') # False ?
puts(!table['b'][2].key?('e'))
{code}

I'd expect {{puts(table)}} to print this representation:

{noformat}
    a   b
0   a   {"c"=>"c", "d"=>nil}
1       {"c"=>"c", "d"=>nil}
2       {"c"=>nil, "d"=>"d"}
{noformat}

But it prints this instead:

{noformat}
    a   b
0   a   {"c"=>"c", "d"=>"d"}
1       {"c"=>"c", "d"=>nil}
2       {"c"=>nil, "d"=>nil}
{noformat}
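
The expected per-record behaviour can be restated as a small plain-Python sketch: every field named in the schema is present in the result, and anything the record omits becomes a null, including fields nested inside a struct. This is only an illustration of the semantics described above, not red-arrow code and not a proposed fix; the helper name fill_missing is hypothetical, and the schema and records simply mirror the Ruby example.

```python
def fill_missing(record, fields):
    """Copy `record`, adding None for every schema field it lacks (recursing into structs)."""
    out = {}
    for field in fields:
        value = record.get(field["name"])
        if field["type"] == "struct" and value is not None:
            value = fill_missing(value, field["fields"])
        out[field["name"]] = value
    return out


# Same schema and records as in the Ruby reproduction.
schema = [
    {"name": "a", "type": "string"},
    {"name": "b", "type": "struct", "fields": [
        {"name": "c", "type": "string"},
        {"name": "d", "type": "string"},
    ]},
]
records = [
    {"a": "a", "b": {"c": "c"}},
    {"b": {"c": "c"}},
    {"b": {"d": "d"}},
]

rows = [fill_missing(r, schema) for r in records]
for i, row in enumerate(rows):
    print(i, row)
```

Each missing key is resolved against its own record only, so a value supplied by one record can never end up in another record's row.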
 





[jira] [Created] (ARROW-9335) [Website] Update website for 1.0

2020-07-06 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-9335:
--

 Summary: [Website] Update website for 1.0
 Key: ARROW-9335
 URL: https://issues.apache.org/jira/browse/ARROW-9335
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Website
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


Umbrella issue for various others.





[jira] [Created] (ARROW-9334) [Dev][Archery] debian-c-glib and ubuntu-c-glib lack utf8proc

2020-07-06 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-9334:
-

 Summary: [Dev][Archery] debian-c-glib and ubuntu-c-glib lack 
utf8proc
 Key: ARROW-9334
 URL: https://issues.apache.org/jira/browse/ARROW-9334
 Project: Apache Arrow
  Issue Type: Bug
  Components: Archery, C, Developer Tools, GLib
Reporter: Antoine Pitrou


The "debian-c-glib" and "ubuntu-c-glib" docker-compose configurations fail with 
the following message:
{code:java}
CMake Error at /usr/share/cmake-3.13/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
  Could NOT find utf8proc (missing: UTF8PROC_LIB UTF8PROC_INCLUDE_DIR)
Call Stack (most recent call first):
  /usr/share/cmake-3.13/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  cmake_modules/Findutf8proc.cmake:41 (find_package_handle_standard_args)
  cmake_modules/ThirdpartyToolchain.cmake:159 (find_package)
  cmake_modules/ThirdpartyToolchain.cmake:2096 (resolve_dependency)
  CMakeLists.txt:467 (include)

 {code}





[jira] [Created] (ARROW-9333) [Python] Expose IPC write options in Python

2020-07-06 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-9333:
-

 Summary: [Python] Expose IPC write options in Python
 Key: ARROW-9333
 URL: https://issues.apache.org/jira/browse/ARROW-9333
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Antoine Pitrou


We want to allow Python users to use the latest metadata version and/or enable 
buffer compression.





[jira] [Created] (ARROW-9332) [Python][Dataset] Support pickling of ParquetFileFragment's RowGroupInfo

2020-07-06 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-9332:


 Summary: [Python][Dataset] Support pickling of 
ParquetFileFragment's RowGroupInfo
 Key: ARROW-9332
 URL: https://issues.apache.org/jira/browse/ARROW-9332
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Joris Van den Bossche


Follow-up to ARROW-8651 to ensure that pickling also preserves the statistics 
information of the {{RowGroupInfo}} of a {{ParquetFileFragment}}.





[jira] [Created] (ARROW-9331) [C++] Improve the performance of Tensor-to-SparseTensor conversion

2020-07-06 Thread Kenta Murata (Jira)
Kenta Murata created ARROW-9331:
---

 Summary: [C++] Improve the performance of Tensor-to-SparseTensor 
conversion
 Key: ARROW-9331
 URL: https://issues.apache.org/jira/browse/ARROW-9331
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Kenta Murata
Assignee: Kenta Murata








[GitHub] [arrow-testing] pitrou opened a new pull request #34: ARROW-9330: Add IPC fuzz regression files

2020-07-06 Thread GitBox


pitrou opened a new pull request #34:
URL: https://github.com/apache/arrow-testing/pull/34


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow-testing] pitrou merged pull request #34: ARROW-9330: Add IPC fuzz regression files

2020-07-06 Thread GitBox


pitrou merged pull request #34:
URL: https://github.com/apache/arrow-testing/pull/34


   







[jira] [Created] (ARROW-9330) [C++] Fix crashes on corrupt IPC input (OSS-Fuzz)

2020-07-06 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-9330:
-

 Summary: [C++] Fix crashes on corrupt IPC input (OSS-Fuzz)
 Key: ARROW-9330
 URL: https://issues.apache.org/jira/browse/ARROW-9330
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou
 Fix For: 1.0.0








[jira] [Created] (ARROW-9329) [C++][Gandiva] Implement castTimestampToDate function

2020-07-06 Thread Projjal Chanda (Jira)
Projjal Chanda created ARROW-9329:
-

 Summary: [C++][Gandiva] Implement castTimestampToDate function
 Key: ARROW-9329
 URL: https://issues.apache.org/jira/browse/ARROW-9329
 Project: Apache Arrow
  Issue Type: Task
Reporter: Projjal Chanda
Assignee: Projjal Chanda





