[ https://issues.apache.org/jira/browse/ARROW-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365627#comment-17365627 ]
Ben Kietzman commented on ARROW-13121:
--------------------------------------

[~wesm]

> [C++][Compute] Extract preallocation logic to a method of kernels
> -----------------------------------------------------------------
>
>                 Key: ARROW-13121
>                 URL: https://issues.apache.org/jira/browse/ARROW-13121
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Ben Kietzman
>            Priority: Major
>
> Currently KernelExecutor handles preallocation of null bitmaps and other
> buffers based on simple flags on each Kernel. This is not very flexible, and
> we end up leaving a lot of performance on the table in cases where we can
> preallocate but the behavior can't be captured in the available flags. For
> example, in the case of {{binary_string_join_element_wise}}, it would be
> possible to preallocate all buffers (even the character buffer) and write
> output into slices.
> Having this as a public function would enable us to unit test it directly
> (currently Executors are only tested indirectly, by calling
> compute::Functions) and to reuse it, for example to correctly preallocate a
> small temporary for pipelined execution.
> One way this could be added is as a new method on each Kernel:
> {code}
> // Output preallocated Datums sufficient for execution of the kernel on each
> // ExecBatch.
> // The output Datums may not be identically chunked to the input batches, for
> // example kernels which support contiguous output preallocation will
> // preallocate a single Datum (and can then output into slices of that Datum).
> Result<std::vector<Datum>> Kernel::prepare_output(
>     const Kernel*,
>     KernelContext*,
>     const std::vector<ExecBatch>& inputs);
> {code}
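
The preallocation described above for {{binary_string_join_element_wise}} can be sketched outside of Arrow roughly as follows. This is only an illustration under stated assumptions, not Arrow code: StringColumn, MakeColumn and JoinElementWise are hypothetical names, nulls are ignored, and the separator is a single scalar string. A first pass computes every output offset, which gives the exact size of the character buffer; a second pass writes each row into its own slice of that single allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the buffers of a string array: an offsets buffer
// plus one contiguous character buffer. This is not an Arrow type.
struct StringColumn {
  std::vector<int32_t> offsets;  // length + 1 entries, offsets[0] == 0
  std::string data;              // all characters, back to back

  std::size_t length() const { return offsets.size() - 1; }

  std::string value(std::size_t i) const {
    return data.substr(static_cast<std::size_t>(offsets[i]),
                       static_cast<std::size_t>(offsets[i + 1] - offsets[i]));
  }
};

// Helper for building example inputs.
StringColumn MakeColumn(const std::vector<std::string>& values) {
  StringColumn col;
  col.offsets.push_back(0);
  for (const std::string& v : values) {
    col.data += v;
    col.offsets.push_back(static_cast<int32_t>(col.data.size()));
  }
  return col;
}

// Element-wise join of equally long columns with a scalar separator.
StringColumn JoinElementWise(const std::vector<StringColumn>& columns,
                             const std::string& sep) {
  assert(!columns.empty());
  const std::size_t length = columns[0].length();
  const int32_t sep_size = static_cast<int32_t>(sep.size());

  // Pass 1: compute all output offsets, and with them the exact total size.
  StringColumn out;
  out.offsets.resize(length + 1);
  out.offsets[0] = 0;
  for (std::size_t i = 0; i < length; ++i) {
    int32_t row_size = sep_size * static_cast<int32_t>(columns.size() - 1);
    for (const StringColumn& col : columns) {
      assert(col.length() == length);
      row_size += col.offsets[i + 1] - col.offsets[i];
    }
    out.offsets[i + 1] = out.offsets[i] + row_size;
  }

  // Preallocate the whole character buffer in a single allocation.
  out.data.resize(static_cast<std::size_t>(out.offsets[length]));

  // Pass 2: write each row into its preallocated slice.
  for (std::size_t i = 0; i < length; ++i) {
    char* dst = &out.data[0] + out.offsets[i];
    for (std::size_t c = 0; c < columns.size(); ++c) {
      if (c > 0) {
        std::memcpy(dst, sep.data(), sep.size());
        dst += sep.size();
      }
      const StringColumn& col = columns[c];
      const std::size_t n =
          static_cast<std::size_t>(col.offsets[i + 1] - col.offsets[i]);
      std::memcpy(dst, col.data.data() + col.offsets[i], n);
      dst += n;
    }
  }
  return out;
}
```

For instance, joining the columns ("a", "bc", "") and ("x", "", "yz") with "-" produces "a-x", "bc-" and "-yz" in one 9-byte character buffer, with no reallocation while writing.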