[jira] [Updated] (ARROW-13125) [R] Throw error when 2+ args passed to desc() in arrange()

2021-06-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-13125:
---
Labels: pull-request-available  (was: )

> [R] Throw error when 2+ args passed to desc() in arrange()
> --
>
> Key: ARROW-13125
> URL: https://issues.apache.org/jira/browse/ARROW-13125
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Affects Versions: 4.0.1
>Reporter: Ian Cook
>Assignee: Ian Cook
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 5.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently this does not result in an error, but it should:
> {code:r}Table$create(x = 1:3, y = 4:6) %>% arrange(desc(x, y)){code}
> The same problem affects dplyr on R data frames. I opened 
> https://github.com/tidyverse/dplyr/issues/5921 for that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-13125) [R] Throw error when 2+ args passed to desc() in arrange()

2021-06-18 Thread Ian Cook (Jira)
Ian Cook created ARROW-13125:


 Summary: [R] Throw error when 2+ args passed to desc() in arrange()
 Key: ARROW-13125
 URL: https://issues.apache.org/jira/browse/ARROW-13125
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Affects Versions: 4.0.1
Reporter: Ian Cook
Assignee: Ian Cook
 Fix For: 5.0.0


Currently this does not result in an error, but it should:
{code:r}Table$create(x = 1:3, y = 4:6) %>% arrange(desc(x, y)){code}
The same problem affects dplyr on R data frames. I opened 
https://github.com/tidyverse/dplyr/issues/5921 for that.
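For illustration, here is a minimal Python sketch of the validation the issue asks for (a hypothetical analog, not the arrow R package's implementation): a desc()-like helper that errors when given more than one column instead of silently ignoring the extras.

```python
# Hypothetical sketch of the requested check; desc() here is a stand-in
# for the R helper, not Arrow's actual implementation.
def desc(*columns):
    """Mark exactly one column for descending sort order."""
    if len(columns) != 1:
        raise TypeError(
            f"desc() expects a single column, got {len(columns)}; "
            "wrap each sort column in its own desc() call"
        )
    return ("descending", columns[0])
```

Under this sketch, desc("x", "y") raises rather than sorting by x and quietly dropping y.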





[jira] [Updated] (ARROW-13124) [Ruby] Add support for memory view

2021-06-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-13124:
---
Labels: pull-request-available  (was: )

> [Ruby] Add support for memory view
> --
>
> Key: ARROW-13124
> URL: https://issues.apache.org/jira/browse/ARROW-13124
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Ruby
>Reporter: Kouhei Sutou
>Assignee: Kouhei Sutou
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-13124) [Ruby] Add support for memory view

2021-06-18 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-13124:


 Summary: [Ruby] Add support for memory view
 Key: ARROW-13124
 URL: https://issues.apache.org/jira/browse/ARROW-13124
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Ruby
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-13123) Cannot build Pyarrow on python38 docker image

2021-06-18 Thread Alexandre Campino (Jira)
Alexandre Campino created ARROW-13123:
-

 Summary: Cannot build Pyarrow on python38 docker image
 Key: ARROW-13123
 URL: https://issues.apache.org/jira/browse/ARROW-13123
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Alexandre Campino
 Attachments: Dockerfile

Hi all,

I am trying to build pyarrow on a python38 docker image in order to build an 
AWS Lambda layer.

I had success doing this a couple of months back with the attached Dockerfile 
(Python 3.7), but now I get the error below with either Python version:
{code:java}
FROM lambci/lambda:build-python3.7{code}
{code:java}
FROM lambci/lambda:build-python3.8{code}
This leads me to think that something else has changed.
{noformat}
#8 6.648 Scanning dependencies of target boost_ep
#8 6.657 [  0%] Creating directories for 'boost_ep'
#8 6.689 [  1%] Performing download step (download, verify and extract) for 
'boost_ep'
#8 8.038 -- boost_ep download command succeeded.  See also 
/arrow/cpp/build/boost_ep-prefix/src/boost_ep-stamp/boost_ep-download-*.log
#8 8.049 [  1%] No patch step for 'boost_ep'
#8 8.060 [  1%] No update step for 'boost_ep'
#8 8.071 [  2%] Performing configure step for 'boost_ep'
#8 8.088 CMake Error at 
/arrow/cpp/build/boost_ep-prefix/src/boost_ep-stamp/boost_ep-configure-RELEASE.cmake:16
 (message):
#8 8.088   Command failed: 1
#8 8.088
#8 8.088'./bootstrap.sh' 
'--prefix=/arrow/cpp/build/boost_ep-prefix/src/boost_ep' 
'--with-libraries=filesystem,regex,system'
#8 8.088
#8 8.088   See also
#8 8.088
#8 8.088 
/arrow/cpp/build/boost_ep-prefix/src/boost_ep-stamp/boost_ep-configure-*.log
#8 8.088
#8 8.088
#8 8.089 make[2]: *** [boost_ep-prefix/src/boost_ep-stamp/boost_ep-configure] 
Error 1
#8 8.089 make[1]: *** [CMakeFiles/boost_ep.dir/all] Error 2
#8 8.090 make: *** [all] Error 2{noformat}
{noformat}
executor failed running [/bin/sh -c mkdir /arrow && curl -o 
/tmp/apache-arrow.tar.gz -SL 
https://github.com/apache/arrow/archive/apache-arrow-${ARROW_VERSION}.tar.gz
 && tar -xvf /tmp/apache-arrow.tar.gz -C /arrow --strip-components 1 && 
mkdir /arrow/dist && export LD_LIBRARY_PATH=/dist/lib:$LD_LIBRARY_PATH 
&& mkdir -p /arrow/cpp/build && cd /arrow/cpp/build && cmake 
-DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE -DCMAKE_INSTALL_LIBDIR=lib 
-DCMAKE_INSTALL_PREFIX=$ARROW_HOME -DARROW_PARQUET=on -DARROW_PYTHON=on 
-DARROW_PLASMA=on  
   -DARROW_WITH_SNAPPY=on -DARROW_BUILD_TESTS=OFF .. && make && 
make install]: exit code: 2
The terminal process "C:\WINDOWS\System32\cmd.exe /K 
C:\tools\cmder\vendor\bin\vscode_init.cmd /d /c docker build --pull --rm -f 
"Docker\pyarrow\Dockerfile" -t pyarrow37:lambci-lambda "Docker\pyarrow"" 
terminated with exit code: 1.{noformat}
Note: the Dockerfile contains plenty of commented-out code. The output above 
corresponds to a build attempt with only the non-commented-out code.

 

Thank you

Alex

 





[jira] [Updated] (ARROW-13110) [C++] Deadlock can happen when using BackgroundGenerator without transferring callbacks

2021-06-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-13110:
---
Labels: pull-request-available  (was: )

> [C++] Deadlock can happen when using BackgroundGenerator without transferring 
> callbacks
> ---
>
> Key: ARROW-13110
> URL: https://issues.apache.org/jira/browse/ARROW-13110
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Weston Pace
>Assignee: Weston Pace
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-13064) [C++] Add a general "if, ifelse, ..., else" kernel

2021-06-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-13064:
---
Labels: pull-request-available  (was: )

> [C++] Add a general "if, ifelse, ..., else" kernel
> --
>
> Key: ARROW-13064
> URL: https://issues.apache.org/jira/browse/ARROW-13064
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Ian Cook
>Assignee: David Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> ARROW-10640 added a ternary {{if_else}} kernel. Add another kernel that 
> extends this concept to an arbitrary number of conditions and associated 
> results, like a vectorized {{if-ifelse-...-else}} with an arbitrary number of 
> {{ifelse}} and with the {{else}} optional. This is like a SQL {{CASE}} 
> statement.
> How best to achieve this is not obvious. To enable SQL-style uses, it would 
> be most efficient to implement this as a variadic kernel where the 
> even-number arguments (0, 2, ...) are the arrays of boolean conditions, the 
> odd-number arguments (1, 3, ...) are the corresponding arrays of results, and 
> the final argument is the {{else}} result. But I'm not sure if this is 
> practical. Maybe instead we should implement this to operate on listarrays, 
> like NumPy's 
> {{[np.where|https://numpy.org/doc/stable/reference/generated/numpy.where.html]}}
>  or 
> {{[np.select|https://numpy.org/doc/stable/reference/generated/numpy.select.html]}}.
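As a concrete illustration of the np.select semantics mentioned above (a sketch of the desired behavior, not Arrow's API): even positions carry boolean condition arrays, odd positions carry the paired result arrays, and an optional default plays the role of the {{else}} branch.

```python
import numpy as np

# CASE-like evaluation: the first condition that is True for an element
# selects the corresponding choice; the default covers the "else" branch.
x = np.array([-2, -1, 0, 1, 2])
conditions = [x < 0, x == 0]       # "if", "ifelse" branches
choices    = [-x,    x * 0]        # results paired with each condition
result = np.select(conditions, choices, default=x)  # "else" branch
print(result.tolist())  # [2, 1, 0, 1, 2]
```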





[jira] [Updated] (ARROW-11514) [R][C++] Bindings for paste(), paste0(), str_c()

2021-06-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-11514:
---
Labels: pull-request-available  (was: )

> [R][C++] Bindings for paste(), paste0(), str_c()
> 
>
> Key: ARROW-11514
> URL: https://issues.apache.org/jira/browse/ARROW-11514
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Neal Richardson
>Assignee: Ian Cook
>Priority: Major
>  Labels: pull-request-available
> Fix For: 5.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * In {{paste()}} and {{paste0()}}, use the {{REPLACE}} null handling behavior 
> with replacement string {{"NA"}} (for consistency with base R)
>  * In {{str_c()}}, use the {{EMIT_NULL}} null handling behavior (for 
> consistency with stringr)
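The two null-handling modes can be contrasted with a plain-Python sketch (not the C++ kernels themselves): REPLACE substitutes the string "NA" for nulls before joining, matching base R's paste(), while EMIT_NULL makes the whole result null whenever any input is null, matching stringr's str_c().

```python
# REPLACE behavior (base-R paste): nulls become the replacement string.
def join_replace(values, sep=" ", replacement="NA"):
    return sep.join(replacement if v is None else v for v in values)

# EMIT_NULL behavior (stringr str_c): any null input nullifies the result.
def join_emit_null(values, sep=" "):
    if any(v is None for v in values):
        return None
    return sep.join(values)

print(join_replace(["a", None, "c"]))    # a NA c
print(join_emit_null(["a", None, "c"]))  # None
```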





[jira] [Updated] (ARROW-13092) [C++] CreateDir should fail if the target exists and is not a directory

2021-06-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-13092:
---
Labels: pull-request-available  (was: )

> [C++] CreateDir should fail if the target exists and is not a directory
> ---
>
> Key: ARROW-13092
> URL: https://issues.apache.org/jira/browse/ARROW-13092
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 5.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As discussed in 
> https://github.com/apache/arrow/pull/10540#issuecomment-862284472 .
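The proposed semantics mirror Python's standard library, which makes for a quick illustration: os.makedirs(..., exist_ok=True) tolerates an existing directory but still raises when the target exists as a regular file.

```python
import os
import tempfile

# Demonstrate the desired CreateDir behavior using os.makedirs: an
# existing *file* at the target path is an error even with exist_ok=True.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "name")
    open(target, "w").close()           # occupy the path with a plain file
    error_raised = False
    try:
        os.makedirs(target, exist_ok=True)
    except FileExistsError:
        error_raised = True
    print("refused non-directory target:", error_raised)
```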





[jira] [Updated] (ARROW-13034) [Python][Docs] Update outdated examples for hdfs/azure on the Parquet doc page

2021-06-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-13034:
---
Labels: pull-request-available  (was: )

> [Python][Docs] Update outdated examples for hdfs/azure on the Parquet doc page
> --
>
> Key: ARROW-13034
> URL: https://issues.apache.org/jira/browse/ARROW-13034
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Joris Van den Bossche
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> From https://github.com/apache/arrow/issues/10492
> - The chapter "Writing to Partitioned Datasets" still presents a "solution" 
> using "hdfs.connect", but since that API is deprecated it is no longer a good 
> idea to mention it.
> - The chapter "Reading a Parquet File from Azure Blob storage" is based on an 
> old version of the "azure.storage.blob" package; the current 
> "azure-sdk-for-python" no longer has methods like get_blob_to_stream(). This 
> part could be updated with the new blob storage APIs, possibly with another 
> section covering the same concept with Delta Lake (similar principle, though 
> there are differences).





[jira] [Updated] (ARROW-13095) [C++] Implement trigonometric compute functions

2021-06-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-13095:
---
Labels: beginner pull-request-available  (was: beginner)

> [C++] Implement trigonometric compute functions
> ---
>
> Key: ARROW-13095
> URL: https://issues.apache.org/jira/browse/ARROW-13095
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: David Li
>Assignee: David Li
>Priority: Major
>  Labels: beginner, pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> sin, cos, asin, acos, tan, atan, cotan, atan2
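The listed kernels map onto standard math-library semantics; a short reference for the two less common ones (this is ordinary Python math, not Arrow's compute API): cotan is simply 1/tan, since most math libraries have no dedicated function for it, and atan2 takes y before x so it can recover the correct quadrant.

```python
import math

angle = math.pi / 4
cotan = 1 / math.tan(angle)              # cotan(pi/4) is ~1.0
quadrant_aware = math.atan2(-1.0, -1.0)  # -3*pi/4: atan2 keeps the quadrant
plain_atan = math.atan(-1.0 / -1.0)      # pi/4: the quadrant is lost
```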





[jira] [Updated] (ARROW-13116) [R] Test for RecordBatchReader to C-interface fails on arrow-r-minimal due to missing dependencies

2021-06-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-13116:
---
Labels: pull-request-available  (was: )

> [R] Test for RecordBatchReader to C-interface fails on arrow-r-minimal due to 
> missing dependencies
> --
>
> Key: ARROW-13116
> URL: https://issues.apache.org/jira/browse/ARROW-13116
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: R
>Reporter: Nic Crane
>Assignee: Nic Crane
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The test just needs updating with skip_if_not_available("dataset")





[jira] [Updated] (ARROW-13042) [C++] Automatic checks that kernels don't leave uninitialized data in output

2021-06-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-13042:
---
Labels: pull-request-available  (was: )

> [C++] Automatic checks that kernels don't leave uninitialized data in output
> 
>
> Key: ARROW-13042
> URL: https://issues.apache.org/jira/browse/ARROW-13042
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 5.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> To minimize the risk of issues such as ARROW-13041, perhaps our compute 
> kernel test harness should include a check that allocated data is always 
> initialized (using Valgrind).





[jira] [Updated] (ARROW-13097) [C++] Provide a simple reflection utility for {{struct}}s

2021-06-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-13097:
---
Labels: pull-request-available  (was: )

> [C++] Provide a simple reflection utility for {{struct}}s
> -
>
> Key: ARROW-13097
> URL: https://issues.apache.org/jira/browse/ARROW-13097
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Ben Kietzman
>Assignee: Ben Kietzman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In cases such as ARROW-13025 it's advantageous to avoid boilerplate when 
> dealing with objects which are basic structs of data members. A simple 
> reflection utility (get/set the value of a data member, print the name of a 
> member to string) would allow writing functionality generically in terms of a 
> tuple of properties, greatly reducing boilerplate.
> See a sketch of one such utility here 
> https://gist.github.com/bkietz/7899f477e86df49f21ab17201c518d74
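A rough Python analog of the idea (the real sketch is C++ in the linked gist; ScannerOptions here is a hypothetical example class): each class exposes a tuple of (name, getter) pairs, and generic helpers such as a repr are written once against that tuple instead of per class.

```python
# Hypothetical options struct; the `properties` tuple is the reflection data.
class ScannerOptions:
    def __init__(self, batch_size, use_threads):
        self.batch_size = batch_size
        self.use_threads = use_threads

    properties = (
        ("batch_size", lambda self: self.batch_size),
        ("use_threads", lambda self: self.use_threads),
    )

# Written once, works for any class exposing a `properties` tuple.
def generic_repr(obj):
    fields = ", ".join(f"{name}={get(obj)!r}" for name, get in obj.properties)
    return f"{type(obj).__name__}({fields})"

print(generic_repr(ScannerOptions(1024, True)))
# ScannerOptions(batch_size=1024, use_threads=True)
```

Equality, hashing, and printing can all be derived the same way, which is the boilerplate reduction the issue is after.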





[jira] [Resolved] (ARROW-12074) [C++][Compute] Add scalar arithmetic kernels for decimal inputs

2021-06-18 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-12074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman resolved ARROW-12074.
--
Resolution: Fixed

Issue resolved by pull request 10364
[https://github.com/apache/arrow/pull/10364]

> [C++][Compute] Add scalar arithmetic kernels for decimal inputs
> ---
>
> Key: ARROW-12074
> URL: https://issues.apache.org/jira/browse/ARROW-12074
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Yibo Cai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 5.0.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-13122) [C++][Compute] Dispatch* should examine options as well as input types

2021-06-18 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-13122:


 Summary: [C++][Compute] Dispatch* should examine options as well 
as input types
 Key: ARROW-13122
 URL: https://issues.apache.org/jira/browse/ARROW-13122
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Ben Kietzman


{{Function::Dispatch*}} should have access to options as well as argument types.

This will allow kernel authors to write kernels which are specific to certain 
configurations of options. Otherwise we may be leaving performance on the table 
when for example a function's output *could* be contiguously preallocated, but 
only for the default FunctionOptions. Currently the author would have no choice 
but to choose the lowest-common-denominator flags for the kernel.

In another vein, "cast" is currently a MetaFunction instead of a ScalarFunction 
since it derives its output type from CastOptions. This requires a special case 
in Expressions since Expressions can only represent calls to scalar functions. 
Ideally a function which is semantically scalar, like "cast", wouldn't need to 
resort to using a MetaFunction for dispatch.

See also: https://github.com/apache/arrow/pull/10547#discussion_r654573800
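An illustrative sketch of dispatch that consults options as well as input types (plain Python, not Arrow's API; the kernel names and registry shape are invented for illustration): kernels register a predicate over the options, and dispatch picks the first kernel whose predicate accepts them, falling back to a lowest-common-denominator kernel.

```python
# Registry of (input_type, options_predicate, implementation) entries.
kernels = []

def register(input_type, options_pred, impl):
    kernels.append((input_type, options_pred, impl))

def dispatch(input_type, options):
    # First kernel whose type matches AND whose predicate accepts the
    # options wins; specialized kernels are registered before generic ones.
    for ty, pred, impl in kernels:
        if ty == input_type and pred(options):
            return impl
    raise LookupError("no kernel matches these types and options")

register("int64", lambda o: o.get("default", True),
         lambda: "fast preallocated kernel")
register("int64", lambda o: True,
         lambda: "generic kernel")

print(dispatch("int64", {"default": True})())   # fast preallocated kernel
print(dispatch("int64", {"default": False})())  # generic kernel
```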





[jira] [Commented] (ARROW-13121) [C++][Compute] Extract preallocation logic from KernelExecutor

2021-06-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17365633#comment-17365633
 ] 

Antoine Pitrou commented on ARROW-13121:


I'd also mention that when I worked on ARROW-13042, I spent a lot of time 
trying to figure out exactly which code paths got executed in {{exec.cc}}, and 
I never fully figured it out (one particular case was implicit casting and 
broadcasting with a NullScalar LHS and a ChunkedArray RHS on a scalar kernel). 
I ended up trying to find clues in other places instead.

> [C++][Compute] Extract preallocation logic from KernelExecutor
> --
>
> Key: ARROW-13121
> URL: https://issues.apache.org/jira/browse/ARROW-13121
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Ben Kietzman
>Priority: Major
>
> Currently KernelExecutor handles preallocation of null bitmaps and other 
> buffers based on simple flags on each Kernel. This is not very flexible and 
> we end up leaving a lot of performance on the table in cases where we can 
> preallocate but the behavior can't be captured in the available flags. For 
> example, in the case of {{binary_string_join_element_wise}}, it would be 
> possible to preallocate all buffers (even the character buffer) and write 
> output into slices.
> Having this as a public function would enable us to unit test it directly 
> (currently Executors are only tested indirectly through calling of 
> compute::Functions) and reuse it, for example to correctly preallocate a 
> small temporary for pipelined execution
> One way this could be added is as a new method on each Kernel:
> {code}
> // Output preallocated Datums sufficient for execution of the kernel on each 
> ExecBatch.
> // The output Datums may not be identically chunked to the input batches, for 
> example
> // kernels which support contiguous output preallocation will preallocate a 
> single Datum
> // (and can then output into slices of that Datum).
> Result<std::vector<Datum>> Kernel::prepare_output(
>   const Kernel*,
>   KernelContext*,
>   const std::vector<ExecBatch>& inputs)
> {code}





[jira] [Updated] (ARROW-13121) [C++][Compute] Extract preallocation logic from KernelExecutor

2021-06-18 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman updated ARROW-13121:
-
Summary: [C++][Compute] Extract preallocation logic from KernelExecutor  
(was: [C++][Compute] Extract preallocation logic to a public function)

> [C++][Compute] Extract preallocation logic from KernelExecutor
> --
>
> Key: ARROW-13121
> URL: https://issues.apache.org/jira/browse/ARROW-13121
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Ben Kietzman
>Priority: Major
>
> Currently KernelExecutor handles preallocation of null bitmaps and other 
> buffers based on simple flags on each Kernel. This is not very flexible and 
> we end up leaving a lot of performance on the table in cases where we can 
> preallocate but the behavior can't be captured in the available flags. For 
> example, in the case of {{binary_string_join_element_wise}}, it would be 
> possible to preallocate all buffers (even the character buffer) and write 
> output into slices.
> Having this as a public function would enable us to unit test it directly 
> (currently Executors are only tested indirectly through calling of 
> compute::Functions) and reuse it, for example to correctly preallocate a 
> small temporary for pipelined execution
> One way this could be added is as a new method on each Kernel:
> {code}
> // Output preallocated Datums sufficient for execution of the kernel on each 
> ExecBatch.
> // The output Datums may not be identically chunked to the input batches, for 
> example
> // kernels which support contiguous output preallocation will preallocate a 
> single Datum
> // (and can then output into slices of that Datum).
> Result<std::vector<Datum>> Kernel::prepare_output(
>   const Kernel*,
>   KernelContext*,
>   const std::vector<ExecBatch>& inputs)
> {code}





[jira] [Updated] (ARROW-13121) [C++][Compute] Extract preallocation logic to a public function

2021-06-18 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman updated ARROW-13121:
-
Summary: [C++][Compute] Extract preallocation logic to a public function  
(was: [C++][Compute] Extract preallocation logic to a method of kernels)

> [C++][Compute] Extract preallocation logic to a public function
> ---
>
> Key: ARROW-13121
> URL: https://issues.apache.org/jira/browse/ARROW-13121
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Ben Kietzman
>Priority: Major
>
> Currently KernelExecutor handles preallocation of null bitmaps and other 
> buffers based on simple flags on each Kernel. This is not very flexible and 
> we end up leaving a lot of performance on the table in cases where we can 
> preallocate but the behavior can't be captured in the available flags. For 
> example, in the case of {{binary_string_join_element_wise}}, it would be 
> possible to preallocate all buffers (even the character buffer) and write 
> output into slices.
> Having this as a public function would enable us to unit test it directly 
> (currently Executors are only tested indirectly through calling of 
> compute::Functions) and reuse it, for example to correctly preallocate a 
> small temporary for pipelined execution
> One way this could be added is as a new method on each Kernel:
> {code}
> // Output preallocated Datums sufficient for execution of the kernel on each 
> ExecBatch.
> // The output Datums may not be identically chunked to the input batches, for 
> example
> // kernels which support contiguous output preallocation will preallocate a 
> single Datum
> // (and can then output into slices of that Datum).
> Result<std::vector<Datum>> Kernel::prepare_output(
>   const Kernel*,
>   KernelContext*,
>   const std::vector<ExecBatch>& inputs)
> {code}





[jira] [Commented] (ARROW-13121) [C++][Compute] Extract preallocation logic to a method of kernels

2021-06-18 Thread Ben Kietzman (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17365627#comment-17365627
 ] 

Ben Kietzman commented on ARROW-13121:
--

[~wesm]

> [C++][Compute] Extract preallocation logic to a method of kernels
> -
>
> Key: ARROW-13121
> URL: https://issues.apache.org/jira/browse/ARROW-13121
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Ben Kietzman
>Priority: Major
>
> Currently KernelExecutor handles preallocation of null bitmaps and other 
> buffers based on simple flags on each Kernel. This is not very flexible and 
> we end up leaving a lot of performance on the table in cases where we can 
> preallocate but the behavior can't be captured in the available flags. For 
> example, in the case of {{binary_string_join_element_wise}}, it would be 
> possible to preallocate all buffers (even the character buffer) and write 
> output into slices.
> Having this as a public function would enable us to unit test it directly 
> (currently Executors are only tested indirectly through calling of 
> compute::Functions) and reuse it, for example to correctly preallocate a 
> small temporary for pipelined execution
> One way this could be added is as a new method on each Kernel:
> {code}
> // Output preallocated Datums sufficient for execution of the kernel on each 
> ExecBatch.
> // The output Datums may not be identically chunked to the input batches, for 
> example
> // kernels which support contiguous output preallocation will preallocate a 
> single Datum
> // (and can then output into slices of that Datum).
> Result<std::vector<Datum>> Kernel::prepare_output(
>   const Kernel*,
>   KernelContext*,
>   const std::vector<ExecBatch>& inputs)
> {code}





[jira] [Created] (ARROW-13121) [C++][Compute] Extract preallocation logic to a method of kernels

2021-06-18 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-13121:


 Summary: [C++][Compute] Extract preallocation logic to a method of 
kernels
 Key: ARROW-13121
 URL: https://issues.apache.org/jira/browse/ARROW-13121
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Ben Kietzman


Currently KernelExecutor handles preallocation of null bitmaps and other 
buffers based on simple flags on each Kernel. This is not very flexible and we 
end up leaving a lot of performance on the table in cases where we can 
preallocate but the behavior can't be captured in the available flags. For 
example, in the case of {{binary_string_join_element_wise}}, it would be 
possible to preallocate all buffers (even the character buffer) and write 
output into slices.

Having this as a public function would enable us to unit test it directly 
(currently Executors are only tested indirectly through calling of 
compute::Functions) and reuse it, for example to correctly preallocate a small 
temporary for pipelined execution.

One way this could be added is as a new method on each Kernel:

{code}
// Output preallocated Datums sufficient for execution of the kernel on each 
ExecBatch.
// The output Datums may not be identically chunked to the input batches, for 
example
// kernels which support contiguous output preallocation will preallocate a 
single Datum
// (and can then output into slices of that Datum).
Result<std::vector<Datum>> Kernel::prepare_output(
  const Kernel*,
  KernelContext*,
  const std::vector<ExecBatch>& inputs)
{code}
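The "contiguous preallocation, output into slices" idea above can be sketched in a few lines of Python with numpy standing in for the kernel machinery (an analogy, not Arrow's executor): one buffer is allocated for the whole result up front, and each batch's kernel invocation writes into its own slice of it.

```python
import numpy as np

# Three input "ExecBatches" of different lengths.
batches = [np.arange(4), np.arange(4, 7), np.arange(7, 12)]
total = sum(len(b) for b in batches)
out = np.empty(total, dtype=np.int64)       # single contiguous preallocation

offset = 0
for batch in batches:
    view = out[offset:offset + len(batch)]  # slice of the shared output
    np.multiply(batch, 2, out=view)         # "kernel" writes in place
    offset += len(batch)

print(out.tolist())  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]
```

No per-batch allocation happens in the loop, which is exactly the performance the issue says is being left on the table.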






[jira] [Updated] (ARROW-13120) [Rust][Parquet] Cannot read multiple batches from parquet with string list column

2021-06-18 Thread Morgan Cassels (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Morgan Cassels updated ARROW-13120:
---
Description: 
This issue only occurs when the batch size < the number of rows in the table. 
The attached parquet `test.parquet` has 31430 rows and a single column 
containing string lists. This issue does not appear to occur for parquets with 
integer list columns.

  
{code:java}
#[test]
 fn failing_test() {
 let parquet_file_reader = get_test_reader("test.parquet");
 let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
 let mut record_batches = Vec::new();
 let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
 for batch in record_batch_reader {
   record_batches.push(batch);
 }
}
{code}
 
{code:java}
 arrow::arrow_reader::tests::failing_test stdout 
thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected 
infallable creation of GenericListArray from ArrayDataRef failed: 
InvalidArgumentError("offsets do not start at zero")', 
arrow/src/array/array_list.rs:195:45
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
{code}
 

 

  was:
This issue only occurs when the batch size < the number of rows in the table. 
The attached parquet `test.parquet` has 31430 rows and a single column 
containing string lists. This issue does not appear to occur for parquets with 
integer list columns.

 

 
{code:java}

#[test]
 fn failing_test() {
 let parquet_file_reader = get_test_reader("test.parquet");
 let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
 let mut record_batches = Vec::new();
 let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
 for batch in record_batch_reader {
   record_batches.push(batch);
 }
}
{code}
 
{code:java}
 arrow::arrow_reader::tests::failing_test stdout 
thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected 
infallable creation of GenericListArray from ArrayDataRef failed: 
InvalidArgumentError("offsets do not start at zero")', 
arrow/src/array/array_list.rs:195:45
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
{code}
 

 


> [Rust][Parquet] Cannot read multiple batches from parquet with string list 
> column
> -
>
> Key: ARROW-13120
> URL: https://issues.apache.org/jira/browse/ARROW-13120
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Morgan Cassels
>Priority: Major
> Attachments: test.parquet
>
>
> This issue only occurs when the batch size < the number of rows in the table. 
> The attached parquet `test.parquet` has 31430 rows and a single column 
> containing string lists. This issue does not appear to occur for parquets 
> with integer list columns.
>   
> {code:java}
> #[test]
>  fn failing_test() {
>  let parquet_file_reader = get_test_reader("test.parquet");
>  let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
>  let mut record_batches = Vec::new();
>  let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
>  for batch in record_batch_reader {
>record_batches.push(batch);
>  }
> }
> {code}
>  
> {code:java}
>  arrow::arrow_reader::tests::failing_test stdout 
> thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected 
> infallable creation of GenericListArray from ArrayDataRef failed: 
> InvalidArgumentError("offsets do not start at zero")', 
> arrow/src/array/array_list.rs:195:45
> note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
> {code}
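An illustrative plain-Python sketch of the invariant behind the panic message (not the arrow-rs code): a list array stores offsets into a child values buffer, and a slice taken mid-array has offsets that no longer start at zero. Rebasing them, by subtracting the first offset and slicing the values to match, restores the invariant that the constructor checks.

```python
values  = list("abcdefg")
offsets = [0, 2, 4, 7]           # three lists: [a b], [c d], [e f g]

# Take the slice covering the last two lists: its offsets are [2, 4, 7].
sliced_offsets = offsets[1:]
assert sliced_offsets[0] != 0    # this is what trips the "offsets do not
                                 # start at zero" check

base = sliced_offsets[0]
rebased_offsets = [o - base for o in sliced_offsets]
rebased_values = values[base:sliced_offsets[-1]]
print(rebased_offsets, rebased_values)
# [0, 2, 5] ['c', 'd', 'e', 'f', 'g']
```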
>  
>  





[jira] [Updated] (ARROW-13120) [Rust][Parquet] Cannot read multiple batches from parquet with string list column

2021-06-18 Thread Morgan Cassels (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Morgan Cassels updated ARROW-13120:
---
Description: 
This issue only occurs when the batch size < the number of rows in the table. 
The attached parquet `test.parquet` has 31430 rows and a single column 
containing string lists. This issue does not appear to occur for parquets with 
integer list columns.

 

 
{code:java}

#[test]
 fn failing_test() {
 let parquet_file_reader = get_test_reader("test.parquet");
 let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
 let mut record_batches = Vec::new();
 let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
 for batch in record_batch_reader {
   record_batches.push(batch);
 }
}
{code}
 
{code:java}
 arrow::arrow_reader::tests::failing_test stdout 
thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected 
infallable creation of GenericListArray from ArrayDataRef failed: 
InvalidArgumentError("offsets do not start at zero")', 
arrow/src/array/array_list.rs:195:45
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
{code}
 

 

  was:
This issue only occurs when the batch size < the number of rows in the table. 
The attached parquet `test.parquet` has 31430 rows and a single column 
containing string lists. This issue does not appear to occur for parquets with 
integer list columns.

 

```
#[test]
fn failing_test() {
    let parquet_file_reader = get_test_reader("test.parquet");
    let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);

    let mut record_batches = Vec::new();

    let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();

    for batch in record_batch_reader {
        record_batches.push(batch);
    }
}
```
```
 arrow::arrow_reader::tests::failing_test stdout 

thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected 
infallable creation of GenericListArray from ArrayDataRef failed: 
InvalidArgumentError("offsets do not start at zero")', 
arrow/src/array/array_list.rs:195:45

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

```

> [Rust][Parquet] Cannot read multiple batches from parquet with string list 
> column
> -
>
> Key: ARROW-13120
> URL: https://issues.apache.org/jira/browse/ARROW-13120
> Project: Apache Arrow
>  Issue Type: Bug
>Reporter: Morgan Cassels
>Priority: Major
> Attachments: test.parquet
>
>
> This issue only occurs when the batch size < the number of rows in the table. 
> The attached parquet `test.parquet` has 31430 rows and a single column 
> containing string lists. This issue does not appear to occur for parquets 
> with integer list columns.
>  
>  
> {code:java}
> #[test]
>  fn failing_test() {
>  let parquet_file_reader = get_test_reader("test.parquet");
>  let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);
>  let mut record_batches = Vec::new();
>  let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();
>  for batch in record_batch_reader {
>record_batches.push(batch);
>  }
> }
> {code}
>  
> {code:java}
>  arrow::arrow_reader::tests::failing_test stdout 
> thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected 
> infallable creation of GenericListArray from ArrayDataRef failed: 
> InvalidArgumentError("offsets do not start at zero")', 
> arrow/src/array/array_list.rs:195:45
> note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-13120) [Rust][Parquet] Cannot read multiple batches from parquet with string list column

2021-06-18 Thread Morgan Cassels (Jira)
Morgan Cassels created ARROW-13120:
--

 Summary: [Rust][Parquet] Cannot read multiple batches from parquet 
with string list column
 Key: ARROW-13120
 URL: https://issues.apache.org/jira/browse/ARROW-13120
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Morgan Cassels
 Attachments: test.parquet

This issue only occurs when the batch size < the number of rows in the table. 
The attached parquet `test.parquet` has 31430 rows and a single column 
containing string lists. This issue does not appear to occur for parquets with 
integer list columns.

 

```
#[test]
fn failing_test() {
    let parquet_file_reader = get_test_reader("test.parquet");
    let mut arrow_reader = ParquetFileArrowReader::new(parquet_file_reader);

    let mut record_batches = Vec::new();

    let record_batch_reader = arrow_reader.get_record_reader(1024).unwrap();

    for batch in record_batch_reader {
        record_batches.push(batch);
    }
}
```
```
 arrow::arrow_reader::tests::failing_test stdout 

thread 'arrow::arrow_reader::tests::failing_test' panicked at 'Expected 
infallable creation of GenericListArray from ArrayDataRef failed: 
InvalidArgumentError("offsets do not start at zero")', 
arrow/src/array/array_list.rs:195:45

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-11514) [R][C++] Bindings for paste(), paste0(), str_c()

2021-06-18 Thread Ian Cook (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-11514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Cook updated ARROW-11514:
-
Summary: [R][C++] Bindings for paste(), paste0(), str_c()  (was: [R] 
Bindings for paste(), paste0(), str_c())

> [R][C++] Bindings for paste(), paste0(), str_c()
> 
>
> Key: ARROW-11514
> URL: https://issues.apache.org/jira/browse/ARROW-11514
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Neal Richardson
>Assignee: Ian Cook
>Priority: Major
> Fix For: 5.0.0
>
>
> * In {{paste()}} and {{paste0()}}, use the {{REPLACE}} null handling behavior 
> with replacement string {{"NA"}} (for consistency with base R)
>  * In {{str_c()}}, use the {{EMIT_NULL}} null handling behavior (for 
> consistency with stringr)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-13119) [R] Set empty schema in scalar Expressions

2021-06-18 Thread Ian Cook (Jira)
Ian Cook created ARROW-13119:


 Summary: [R] Set empty schema in scalar Expressions
 Key: ARROW-13119
 URL: https://issues.apache.org/jira/browse/ARROW-13119
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Ian Cook
Assignee: Ian Cook
 Fix For: 5.0.0


Closely related to ARROW-13117 is the problem of {{type()}} and {{type_id()}} 
not working for scalar expressions. For example, currently this happens:

{code:r}> Expression$scalar("foo")$type()
 Error: !is.null(schema) is not TRUE

> Expression$scalar(42L)$type()
 Error: !is.null(schema) is not TRUE{code}

This is what we want to happen:
{code:r}> Expression$scalar("foo")$type()
Utf8
string

> Expression$scalar(42L)$type()
Int32
int32{code}
This is simple to solve; we just need to set {{schema}} to an empty schema for 
all scalar expressions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-13119) [R] Set empty schema in scalar Expressions

2021-06-18 Thread Ian Cook (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Cook updated ARROW-13119:
-
Description: 
Closely related to ARROW-13117 is the problem of {{type()}} and {{type_id()}} 
not working for scalar expressions. For example, currently this happens:

{code:r}> Expression$scalar("foo")$type()
Error: !is.null(schema) is not TRUE

> Expression$scalar(42L)$type()
Error: !is.null(schema) is not TRUE{code}

This is what we want to happen:
{code:r}> Expression$scalar("foo")$type()
Utf8
string

> Expression$scalar(42L)$type()
Int32
int32{code}
This is simple to solve; we just need to set {{schema}} to an empty schema for 
all scalar expressions.

  was:
Closely related to ARROW-13117 is the problem of {{type()}} and {{type_id()}} 
not working for scalar expressions. For example, currently this happens:

{code:r}> Expression$scalar("foo")$type()
 Error: !is.null(schema) is not TRUE

> Expression$scalar(42L)$type()
 Error: !is.null(schema) is not TRUE{code}

This is what we want to happen:
{code:r}> Expression$scalar("foo")$type()
Utf8
string

> Expression$scalar(42L)$type()
Int32
int32{code}
This is simple to solve; we just need to set {{schema}} to an empty schema for 
all scalar expressions.


> [R] Set empty schema in scalar Expressions
> --
>
> Key: ARROW-13119
> URL: https://issues.apache.org/jira/browse/ARROW-13119
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Ian Cook
>Assignee: Ian Cook
>Priority: Major
> Fix For: 5.0.0
>
>
> Closely related to ARROW-13117 is the problem of {{type()}} and {{type_id()}} 
> not working for scalar expressions. For example, currently this happens:
> {code:r}> Expression$scalar("foo")$type()
> Error: !is.null(schema) is not TRUE
> > Expression$scalar(42L)$type()
> Error: !is.null(schema) is not TRUE{code}
> This is what we want to happen:
> {code:r}> Expression$scalar("foo")$type()
> Utf8
> string
> > Expression$scalar(42L)$type()
> Int32
> int32{code}
> This is simple to solve; we just need to set {{schema}} to an empty schema 
> for all scalar expressions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-13118) [R] Improve handling of R scalars in some nse_funcs

2021-06-18 Thread Ian Cook (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-13118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17365552#comment-17365552
 ] 

Ian Cook commented on ARROW-13118:
--

[~npr] what do you think? Does the solution described above (creating a 
{{wrap_r_scalar}} function) seem reasonable?

> [R] Improve handling of R scalars in some nse_funcs
> ---
>
> Key: ARROW-13118
> URL: https://issues.apache.org/jira/browse/ARROW-13118
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Ian Cook
>Assignee: Ian Cook
>Priority: Major
> Fix For: 5.0.0
>
>
> Some of the functions in {{nse_funcs}} do not behave properly when passed R 
> scalar input in expressions in dplyr verbs. Some examples:
> {code:r}
> Table$create(x = 1) %>% mutate(as.character(42))
> Table$create(x = 1) %>% mutate(is.character(("foo")))
> Table$create(x = 1) %>% mutate(nchar("foo"))
> Table$create(x = 1) %>% mutate(is.infinite(Inf))
> {code}
> This could be resolved by using {{build_expr()}} instead of 
> {{Expression$create()}}, but {{build_expr()}} is awfully heavy. The only part 
> of it we really need to make this work is this:
> {code:r}
> args <- lapply(args, function(x) {
>   if (!inherits(x, "Expression")) {
> x <- Expression$scalar(x)
>   }
>   x
> }){code}
> Maybe we could make a function called {{wrap_r_scalar}}, like this:
> {code:r}
> wrap_r_scalar <- function(x) {
>   if (!inherits(x, "Expression")) {
> assert_that(
>   length(x) == 1,
>   msg = "Literal vectors of length != 1 not supported"
> )
> Expression$scalar(x)
>   } else {
> x
>   }
> }
> {code}
> and use it as needed in various of the {{nse_funcs}} functions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-13118) [R] Improve handling of R scalars in some nse_funcs

2021-06-18 Thread Ian Cook (Jira)
Ian Cook created ARROW-13118:


 Summary: [R] Improve handling of R scalars in some nse_funcs
 Key: ARROW-13118
 URL: https://issues.apache.org/jira/browse/ARROW-13118
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Ian Cook
Assignee: Ian Cook
 Fix For: 5.0.0


Some of the functions in {{nse_funcs}} do not behave properly when passed R 
scalar input in expressions in dplyr verbs. Some examples:
{code:r}
Table$create(x = 1) %>% mutate(as.character(42))
Table$create(x = 1) %>% mutate(is.character(("foo")))
Table$create(x = 1) %>% mutate(nchar("foo"))
Table$create(x = 1) %>% mutate(is.infinite(Inf))
{code}
This could be resolved by using {{build_expr()}} instead of 
{{Expression$create()}}, but {{build_expr()}} is awfully heavy. The only part 
of it we really need to make this work is this:
{code:r}
args <- lapply(args, function(x) {
  if (!inherits(x, "Expression")) {
x <- Expression$scalar(x)
  }
  x
}){code}
Maybe we could make a function called {{wrap_r_scalar}}, like this:
{code:r}
wrap_r_scalar <- function(x) {
  if (!inherits(x, "Expression")) {
assert_that(
  length(x) == 1,
  msg = "Literal vectors of length != 1 not supported"
)
Expression$scalar(x)
  } else {
x
  }
}
{code}
and use it as needed in various of the {{nse_funcs}} functions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-13117) [R] Retain schema in new Expressions

2021-06-18 Thread Ian Cook (Jira)
Ian Cook created ARROW-13117:


 Summary: [R] Retain schema in new Expressions
 Key: ARROW-13117
 URL: https://issues.apache.org/jira/browse/ARROW-13117
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Ian Cook
Assignee: Ian Cook
 Fix For: 5.0.0


When a new Expression is created, {{schema}} should be retained from the 
expression(s) it was created from. That way, the {{type()}} and {{type_id()}} 
methods of the new Expression will work. For example, currently this happens:
{code:r}
> x <- Expression$field_ref("x")
> x$schema <- Schema$create(x = int32())
> 
> y <- Expression$field_ref("y")
> y$schema <- Schema$create(x = int32())
> 
> Expression$create("add_checked", x, y)$type()
Error: !is.null(schema) is not TRUE {code}

This is what we want to happen:
{code:r}
> Expression$create("add_checked", x, y)$type()
Int32
int32
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (ARROW-13097) [C++] Provide a simple reflection utility for {{struct}}s

2021-06-18 Thread Ben Kietzman (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Kietzman reassigned ARROW-13097:


Assignee: Ben Kietzman

> [C++] Provide a simple reflection utility for {{struct}}s
> -
>
> Key: ARROW-13097
> URL: https://issues.apache.org/jira/browse/ARROW-13097
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Ben Kietzman
>Assignee: Ben Kietzman
>Priority: Major
>
> In cases such as ARROW-13025 it's advantageous to avoid boilerplate when 
> dealing with objects which are basic structs of data members. A simple 
> reflection utility (get/set the value of a data member, print the name of a 
> member to string) would allow writing functionality generically in terms of a 
> tuple of properties, greatly reducing boilerplate.
> See a sketch of one such utility here 
> https://gist.github.com/bkietz/7899f477e86df49f21ab17201c518d74
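The proposal above (a tuple of properties driving generic get/print of struct members) can be illustrated with a short stand-alone sketch. This is a hypothetical Python analogue of the C++ idea, not the proposed utility itself; the `Options` struct and its members are invented for illustration:

```python
from operator import attrgetter

# Hypothetical sketch: describe a plain struct as a tuple of
# (name, getter) properties, then operate on its members generically
# instead of writing per-member boilerplate.
class Options:
    def __init__(self, batch_size, use_threads):
        self.batch_size = batch_size
        self.use_threads = use_threads

PROPERTIES = (
    ("batch_size", attrgetter("batch_size")),
    ("use_threads", attrgetter("use_threads")),
)

def to_string(obj, properties):
    # Generic "print the name and value of each member" utility.
    return ", ".join(f"{name}={get(obj)}" for name, get in properties)

print(to_string(Options(1024, True), PROPERTIES))
# batch_size=1024, use_threads=True
```

Any function written against the property tuple (printing, equality, copying) then works for every such struct without repeating the member list.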



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-13116) [R] Test for RecordBatchReader to C-interface fails on arrow-r-minimal due to missing dependencies

2021-06-18 Thread Nic Crane (Jira)
Nic Crane created ARROW-13116:
-

 Summary: [R] Test for RecordBatchReader to C-interface fails on 
arrow-r-minimal due to missing dependencies
 Key: ARROW-13116
 URL: https://issues.apache.org/jira/browse/ARROW-13116
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Nic Crane
Assignee: Nic Crane


The test just needs updating with skip_if_not_available("dataset")



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-13115) plasma.PlasmaClient do not disconnect when user tried to delete it

2021-06-18 Thread Yuxian Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuxian Meng updated ARROW-13115:

Description: 
```
import pyarrow.plasma as plasma
for _ in range(1):
c = plasma.connect("/tmp/plasma")
del c
```
The code above does not call c.disconnect() automatically, and eventually 
causes a `Connection to IPC socket failed` error.

  was:
```
import pyarrow.plasma as plasma
c = plasma.connect("/tmp.plasma")
del c
```


> plasma.PlasmaClient do not disconnect when user tried to delete it
> --
>
> Key: ARROW-13115
> URL: https://issues.apache.org/jira/browse/ARROW-13115
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 4.0.0
>Reporter: Yuxian Meng
>Priority: Critical
>
> ```
> import pyarrow.plasma as plasma
> for _ in range(1):
> c = plasma.connect("/tmp/plasma")
> del c
> ```
> The code above does not call c.disconnect() automatically, and eventually 
> causes a `Connection to IPC socket failed` error.
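Until the client disconnects on destruction, the usual workaround is to call disconnect() explicitly instead of relying on `del`. A minimal sketch of that pattern, using a hypothetical FakeClient/connect() stand-in for plasma.PlasmaClient/plasma.connect() so it runs without a plasma store:

```python
# Workaround sketch: always pair connect() with an explicit disconnect()
# in a try/finally, rather than relying on `del`/garbage collection to
# release the IPC socket. FakeClient and connect() are hypothetical
# stand-ins for plasma.PlasmaClient and plasma.connect("/tmp/plasma").
class FakeClient:
    def __init__(self):
        self.connected = True

    def disconnect(self):
        self.connected = False


def connect(path):
    return FakeClient()


clients = []
for _ in range(100):
    c = connect("/tmp/plasma")
    try:
        clients.append(c)  # do work with the client here
    finally:
        c.disconnect()     # guaranteed cleanup on every iteration

assert all(not c.connected for c in clients)
```

With the real client, the same try/finally (or a small context manager wrapping disconnect()) avoids exhausting socket connections in a loop.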



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-13115) plasma.PlasmaClient do not disconnect when user tried to delete it

2021-06-18 Thread Yuxian Meng (Jira)
Yuxian Meng created ARROW-13115:
---

 Summary: plasma.PlasmaClient do not disconnect when user tried to 
delete it
 Key: ARROW-13115
 URL: https://issues.apache.org/jira/browse/ARROW-13115
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 4.0.0
Reporter: Yuxian Meng


```
import pyarrow.plasma as plasma
c = plasma.connect("/tmp.plasma")
del c
```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-13114) [R] use altrep when possible for (RecordBatch, Table) -> data.frame

2021-06-18 Thread Romain Francois (Jira)
Romain Francois created ARROW-13114:
---

 Summary: [R] use altrep when possible for (RecordBatch, Table) -> 
data.frame
 Key: ARROW-13114
 URL: https://issues.apache.org/jira/browse/ARROW-13114
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Romain Francois
Assignee: Romain Francois






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-13113) [R] use RTasks to manage parallel in converting arrow to R

2021-06-18 Thread Romain Francois (Jira)
Romain Francois created ARROW-13113:
---

 Summary: [R] use RTasks to manage parallel in converting arrow to R
 Key: ARROW-13113
 URL: https://issues.apache.org/jira/browse/ARROW-13113
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Romain Francois
Assignee: Romain Francois






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-13112) [R] altrep vectors for strings

2021-06-18 Thread Romain Francois (Jira)
Romain Francois created ARROW-13112:
---

 Summary: [R] altrep vectors for strings
 Key: ARROW-13112
 URL: https://issues.apache.org/jira/browse/ARROW-13112
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Romain Francois
Assignee: Romain Francois






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-13111) [R] altrep vectors for ChunkedArray

2021-06-18 Thread Romain Francois (Jira)
Romain Francois created ARROW-13111:
---

 Summary: [R] altrep vectors for ChunkedArray
 Key: ARROW-13111
 URL: https://issues.apache.org/jira/browse/ARROW-13111
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Reporter: Romain Francois
Assignee: Romain Francois






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-13046) [Release] JS package failing test prior to publish

2021-06-18 Thread Kouhei Sutou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kouhei Sutou resolved ARROW-13046.
--
Fix Version/s: 5.0.0
 Assignee: Kouhei Sutou
   Resolution: Fixed

I've updated 
https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide .

> [Release] JS package failing test prior to publish
> --
>
> Key: ARROW-13046
> URL: https://issues.apache.org/jira/browse/ARROW-13046
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Jorge Leitão
>Assignee: Kouhei Sutou
>Priority: Major
> Fix For: 5.0.0
>
>
> While trying to publish the JS, I am getting an error when running the tests 
> (on mac).
> To reproduce, run `dev/release/post-05-js.sh 4.0.1` on branch 
> `release-arrow-4.0.1`:
> {code:java}
> ~/projects/arrow/apache-arrow-4.0.1/js ~/projects/arrow
> yarn install v1.22.1
> [1/5]   Validating package.json...
> [2/5]   Resolving packages...
> [3/5]   Fetching packages...
> info google-closure-compiler-linux@20210406.0.0: The platform "darwin" is 
> incompatible with this module.
> info "google-closure-compiler-linux@20210406.0.0" is an optional dependency 
> and failed compatibility check. Excluding it from installation.
> info google-closure-compiler-windows@20210406.0.0: The platform "darwin" is 
> incompatible with this module.
> info "google-closure-compiler-windows@20210406.0.0" is an optional dependency 
> and failed compatibility check. Excluding it from installation.
> [4/5]   Linking dependencies...
> warning "lerna > @lerna/version > @lerna/github-client > @octokit/rest > 
> @octokit/plugin-request-log@1.0.3" has unmet peer dependency 
> "@octokit/core@>=3".
> [5/5]   Building fresh packages...
> warning Your current version of Yarn is out of date. The latest version is 
> "1.22.5", while you're on "1.22.1".
> info To upgrade, run the following command:
> $ brew upgrade yarn
> ✨  Done in 121.72s.
> yarn run v1.22.1
> $ 
> /Users/jorgecarleitao/projects/arrow/apache-arrow-4.0.1/js/node_modules/.bin/gulp
> [05:39:21] Using gulpfile ~/projects/arrow/apache-arrow-4.0.1/js/gulpfile.js
> [05:39:21] Starting 'default'...
> [05:39:21] Starting 'clean'...
> [05:39:21] Starting 'clean:ts'...
> [05:39:21] Starting 'clean:apache-arrow'...
> [05:39:21] Starting 'clean:es5:cjs'...
> [05:39:21] Starting 'clean:es2015:cjs'...
> [05:39:21] Starting 'clean:esnext:cjs'...
> [05:39:21] Starting 'clean:es5:esm'...
> [05:39:21] Starting 'clean:es2015:esm'...
> [05:39:21] Starting 'clean:esnext:esm'...
> [05:39:21] Starting 'clean:es5:cls'...
> [05:39:21] Starting 'clean:es2015:cls'...
> [05:39:21] Starting 'clean:esnext:cls'...
> [05:39:21] Starting 'clean:es5:umd'...
> [05:39:21] Starting 'clean:es2015:umd'...
> [05:39:21] Starting 'clean:esnext:umd'...
> [05:39:21] Finished 'clean:ts' after 211 ms
> [05:39:21] Finished 'clean:apache-arrow' after 199 ms
> [05:39:21] Finished 'clean:es5:cjs' after 195 ms
> [05:39:21] Finished 'clean:es2015:cjs' after 196 ms
> [05:39:21] Finished 'clean:esnext:cjs' after 190 ms
> [05:39:21] Finished 'clean:es5:esm' after 180 ms
> [05:39:21] Finished 'clean:es2015:esm' after 172 ms
> [05:39:21] Finished 'clean:esnext:esm' after 169 ms
> [05:39:21] Finished 'clean:es5:cls' after 151 ms
> [05:39:21] Finished 'clean:es2015:cls' after 146 ms
> [05:39:22] Finished 'clean:esnext:cls' after 163 ms
> [05:39:22] Finished 'clean:es5:umd' after 149 ms
> [05:39:22] Finished 'clean:es2015:umd' after 146 ms
> [05:39:22] Finished 'clean:esnext:umd' after 142 ms
> [05:39:22] Finished 'clean' after 293 ms
> [05:39:22] Starting 'build'...
> [05:39:22] Starting 'build:ts'...
> [05:39:22] Starting 'build:apache-arrow'...
> [05:39:22] Starting 'build:es5:cjs'...
> [05:39:22] Starting 'clean:ts'...
> [05:39:22] Starting 'clean:es5:cjs'...
> [05:39:22] Finished 'clean:ts' after 728 μs
> [05:39:22] Starting 'compile:ts'...
> [05:39:22] Starting 'build:es2015:umd'...
> [05:39:22] Starting 'build:esnext:cjs'...
> [05:39:22] Starting 'build:esnext:esm'...
> [05:39:22] Starting 'build:esnext:umd'...
> [05:39:22] Finished 'clean:es5:cjs' after 11 ms
> [05:39:22] Starting 'compile:es5:cjs'...
> [05:39:22] Starting 'build:es2015:cls'...
> [05:39:22] Starting 'clean:esnext:cjs'...
> [05:39:22] Starting 'clean:esnext:esm'...
> [05:39:22] Starting 'build:esnext:cls'...
> [05:39:22] Starting 'clean:es2015:cls'...
> [05:39:22] Finished 'clean:esnext:cjs' after 30 ms
> [05:39:22] Starting 'compile:esnext:cjs'...
> [05:39:22] Finished 'clean:esnext:esm' after 28 ms
> [05:39:22] Starting 'compile:esnext:esm'...
> [05:39:22] Starting 'clean:esnext:cls'...
> [05:39:22] Finished 'clean:es2015:cls' after 53 ms
> [05:39:22] Starting 'compile:es2015:cls'...
> [05:39:22] Finished 'clean:esnext:cls' after 43 ms
> 

[jira] [Commented] (ARROW-13046) [Release] JS package failing test prior to publish

2021-06-18 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-13046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17365257#comment-17365257
 ] 

Kouhei Sutou commented on ARROW-13046:
--

Published.

I needed to be logged in to {{registry.yarnpkg.com}} not 
{{registry.npmjs.org}}: {{npm login --registry=https://registry.yarnpkg.com/}}

> [Release] JS package failing test prior to publish
> --
>
> Key: ARROW-13046
> URL: https://issues.apache.org/jira/browse/ARROW-13046
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: JavaScript
>Reporter: Jorge Leitão
>Priority: Major
>
> While trying to publish the JS, I am getting an error when running the tests 
> (on mac).
> To reproduce, run `dev/release/post-05-js.sh 4.0.1` on branch 
> `release-arrow-4.0.1`:
> {code:java}
> ~/projects/arrow/apache-arrow-4.0.1/js ~/projects/arrow
> yarn install v1.22.1
> [1/5]   Validating package.json...
> [2/5]   Resolving packages...
> [3/5]   Fetching packages...
> info google-closure-compiler-linux@20210406.0.0: The platform "darwin" is 
> incompatible with this module.
> info "google-closure-compiler-linux@20210406.0.0" is an optional dependency 
> and failed compatibility check. Excluding it from installation.
> info google-closure-compiler-windows@20210406.0.0: The platform "darwin" is 
> incompatible with this module.
> info "google-closure-compiler-windows@20210406.0.0" is an optional dependency 
> and failed compatibility check. Excluding it from installation.
> [4/5]   Linking dependencies...
> warning "lerna > @lerna/version > @lerna/github-client > @octokit/rest > 
> @octokit/plugin-request-log@1.0.3" has unmet peer dependency 
> "@octokit/core@>=3".
> [5/5]   Building fresh packages...
> warning Your current version of Yarn is out of date. The latest version is 
> "1.22.5", while you're on "1.22.1".
> info To upgrade, run the following command:
> $ brew upgrade yarn
> ✨  Done in 121.72s.
> yarn run v1.22.1
> $ 
> /Users/jorgecarleitao/projects/arrow/apache-arrow-4.0.1/js/node_modules/.bin/gulp
> [05:39:21] Using gulpfile ~/projects/arrow/apache-arrow-4.0.1/js/gulpfile.js
> [05:39:21] Starting 'default'...
> [05:39:21] Starting 'clean'...
> [05:39:21] Starting 'clean:ts'...
> [05:39:21] Starting 'clean:apache-arrow'...
> [05:39:21] Starting 'clean:es5:cjs'...
> [05:39:21] Starting 'clean:es2015:cjs'...
> [05:39:21] Starting 'clean:esnext:cjs'...
> [05:39:21] Starting 'clean:es5:esm'...
> [05:39:21] Starting 'clean:es2015:esm'...
> [05:39:21] Starting 'clean:esnext:esm'...
> [05:39:21] Starting 'clean:es5:cls'...
> [05:39:21] Starting 'clean:es2015:cls'...
> [05:39:21] Starting 'clean:esnext:cls'...
> [05:39:21] Starting 'clean:es5:umd'...
> [05:39:21] Starting 'clean:es2015:umd'...
> [05:39:21] Starting 'clean:esnext:umd'...
> [05:39:21] Finished 'clean:ts' after 211 ms
> [05:39:21] Finished 'clean:apache-arrow' after 199 ms
> [05:39:21] Finished 'clean:es5:cjs' after 195 ms
> [05:39:21] Finished 'clean:es2015:cjs' after 196 ms
> [05:39:21] Finished 'clean:esnext:cjs' after 190 ms
> [05:39:21] Finished 'clean:es5:esm' after 180 ms
> [05:39:21] Finished 'clean:es2015:esm' after 172 ms
> [05:39:21] Finished 'clean:esnext:esm' after 169 ms
> [05:39:21] Finished 'clean:es5:cls' after 151 ms
> [05:39:21] Finished 'clean:es2015:cls' after 146 ms
> [05:39:22] Finished 'clean:esnext:cls' after 163 ms
> [05:39:22] Finished 'clean:es5:umd' after 149 ms
> [05:39:22] Finished 'clean:es2015:umd' after 146 ms
> [05:39:22] Finished 'clean:esnext:umd' after 142 ms
> [05:39:22] Finished 'clean' after 293 ms
> [05:39:22] Starting 'build'...
> [05:39:22] Starting 'build:ts'...
> [05:39:22] Starting 'build:apache-arrow'...
> [05:39:22] Starting 'build:es5:cjs'...
> [05:39:22] Starting 'clean:ts'...
> [05:39:22] Starting 'clean:es5:cjs'...
> [05:39:22] Finished 'clean:ts' after 728 μs
> [05:39:22] Starting 'compile:ts'...
> [05:39:22] Starting 'build:es2015:umd'...
> [05:39:22] Starting 'build:esnext:cjs'...
> [05:39:22] Starting 'build:esnext:esm'...
> [05:39:22] Starting 'build:esnext:umd'...
> [05:39:22] Finished 'clean:es5:cjs' after 11 ms
> [05:39:22] Starting 'compile:es5:cjs'...
> [05:39:22] Starting 'build:es2015:cls'...
> [05:39:22] Starting 'clean:esnext:cjs'...
> [05:39:22] Starting 'clean:esnext:esm'...
> [05:39:22] Starting 'build:esnext:cls'...
> [05:39:22] Starting 'clean:es2015:cls'...
> [05:39:22] Finished 'clean:esnext:cjs' after 30 ms
> [05:39:22] Starting 'compile:esnext:cjs'...
> [05:39:22] Finished 'clean:esnext:esm' after 28 ms
> [05:39:22] Starting 'compile:esnext:esm'...
> [05:39:22] Starting 'clean:esnext:cls'...
> [05:39:22] Finished 'clean:es2015:cls' after 53 ms
> [05:39:22] Starting 'compile:es2015:cls'...
> [05:39:22] Finished 'clean:esnext:cls' after 43 ms
> [05:39:22] Starting 'compile:esnext:cls'...
> [05:39:23]