subject:"\[jira\] \[Commented\] \(ARROW\-13121\) \[C\+\+\]\[Compute\] Extract preallocation logic from KernelExecutor"

[jira] [Commented] (ARROW-13121) [C++][Compute] Extract preallocation logic from KernelExecutor

2021-06-21 Thread Ben Kietzman (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366621#comment-17366621
 ] 

Ben Kietzman commented on ARROW-13121:
--

Another point which we might like to optimize: 
https://github.com/apache/arrow/pull/10550#discussion_r654521795

> [C++][Compute] Extract preallocation logic from KernelExecutor
> --
>
> Key: ARROW-13121
> URL: https://issues.apache.org/jira/browse/ARROW-13121
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Ben Kietzman
>Priority: Major
>
> Currently KernelExecutor handles preallocation of null bitmaps and other 
> buffers based on simple flags on each Kernel. This is not very flexible and 
> we end up leaving a lot of performance on the table in cases where we can 
> preallocate but the behavior can't be captured in the available flags. For 
> example, in the case of {{binary_string_join_element_wise}}, it would be 
> possible to preallocate all buffers (even the character buffer) and write 
> output into slices.
> Having this as a public function would enable us to unit test it directly 
> (currently Executors are only tested indirectly through calling of 
> compute::Functions) and reuse it, for example to correctly preallocate a 
> small temporary for pipelined execution
> One way this could be added is as a new method on each Kernel:
> {code}
> // Output preallocated Datums sufficient for execution of the kernel on each 
> ExecBatch.
> // The output Datums may not be identically chunked to the input batches, for 
> example
> // kernels which support contiguous output preallocation will preallocate a 
> single Datum
> // (and can then output into slices of that Datum).
> Result> Kernel::prepare_output(
>   const Kernel*,
>   KernelContext*,
>   const std::vector& inputs)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (ARROW-13121) [C++][Compute] Extract preallocation logic from KernelExecutor

2021-06-18 Thread Antoine Pitrou (Jira)



[ 
https://issues.apache.org/jira/browse/ARROW-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365633#comment-17365633
 ] 

Antoine Pitrou commented on ARROW-13121:


I'd also mention that when I worked on ARROW-13042, I spent a lot of time 
trying to figure out which code paths exactly got executed in {{exec.cc}} and I 
never fully figured it out (one particularly case was implicit casting and 
broadcasting with a NullScalar LHS and a ChunkedArray RHS on a scalar kernel). 
I ended up trying to find clues in other places instead.

> [C++][Compute] Extract preallocation logic from KernelExecutor
> --
>
> Key: ARROW-13121
> URL: https://issues.apache.org/jira/browse/ARROW-13121
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Ben Kietzman
>Priority: Major
>
> Currently KernelExecutor handles preallocation of null bitmaps and other 
> buffers based on simple flags on each Kernel. This is not very flexible and 
> we end up leaving a lot of performance on the table in cases where we can 
> preallocate but the behavior can't be captured in the available flags. For 
> example, in the case of {{binary_string_join_element_wise}}, it would be 
> possible to preallocate all buffers (even the character buffer) and write 
> output into slices.
> Having this as a public function would enable us to unit test it directly 
> (currently Executors are only tested indirectly through calling of 
> compute::Functions) and reuse it, for example to correctly preallocate a 
> small temporary for pipelined execution
> One way this could be added is as a new method on each Kernel:
> {code}
> // Output preallocated Datums sufficient for execution of the kernel on each 
> ExecBatch.
> // The output Datums may not be identically chunked to the input batches, for 
> example
> // kernels which support contiguous output preallocation will preallocate a 
> single Datum
> // (and can then output into slices of that Datum).
> Result> Kernel::prepare_output(
>   const Kernel*,
>   KernelContext*,
>   const std::vector& inputs)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (ARROW-13121) [C++][Compute] Extract preallocation logic from KernelExecutor

[jira] [Commented] (ARROW-13121) [C++][Compute] Extract preallocation logic from KernelExecutor

2 matches

Site Navigation

Mail list logo

Footer information