[ https://issues.apache.org/jira/browse/ARROW-15215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Li updated ARROW-15215: ----------------------------- Description: All six kernels use two sets of otherwise very similar kernel utilities for copying slices of an array into an output array. However, there's no reason they can't use the same utilities. The first set are here: "CopyFixedWidth" https://github.com/apache/arrow/blob/bd356295f6beaba744a2c6b498455701f53a64f8/cpp/src/arrow/compute/kernels/scalar_if_else.cc#L1282-L1284 The second set are here: "ReplaceWithMask::CopyData" https://github.com/apache/arrow/blob/bd356295f6beaba744a2c6b498455701f53a64f8/cpp/src/arrow/compute/kernels/vector_replace.cc#L208-L209 (This is a little confusing because the utilities are intertwined into the kernel implementation) They would need to be moved into a new header to share them between the codegen units. Also, their interfaces would need to be consolidated. Additionally, the utilities may be excessively verbose, or generate too much code for what they do. For instance, some of the utilities are templated out for every Arrow type. Instead, we could replace all instantiations for numbers, decimals, temporal types, and so on with a single one for FixedWidthType (an abstract base class). Care should be taken to evaluate the benchmarks for these kernels to ensure there is not a regression. was:All four kernels (and soon to be fill_null_forward/backward) make use of a set of very similar utilities for copying data between arrays; we should consolidate those into a single set of helpers instead of duplicating them, and consider whether they could be further consolidated (e.g. making use of the FixedWidthType hierarchy instead of specializing for every type) > [C++] Consolidate kernel data-copy utilities between replace_with_mask, > case_when, coalesce, choose, fill_null_forward, fill_null_backward > ------------------------------------------------------------------------------------------------------------------------------------------ > > Key: ARROW-15215 > URL: https://issues.apache.org/jira/browse/ARROW-15215 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ > Reporter: David Li > Assignee: Jabari Booker > Priority: Major > Labels: good-second-issue > > All six kernels use two sets of otherwise very similar kernel utilities for > copying slices of an array into an output array. However, there's no reason > they can't use the same utilities. > The first set are here: "CopyFixedWidth" > https://github.com/apache/arrow/blob/bd356295f6beaba744a2c6b498455701f53a64f8/cpp/src/arrow/compute/kernels/scalar_if_else.cc#L1282-L1284 > The second set are here: "ReplaceWithMask::CopyData" > https://github.com/apache/arrow/blob/bd356295f6beaba744a2c6b498455701f53a64f8/cpp/src/arrow/compute/kernels/vector_replace.cc#L208-L209 > (This is a little confusing because the utilities are intertwined into the > kernel implementation) > They would need to be moved into a new header to share them between the > codegen units. Also, their interfaces would need to be consolidated. > Additionally, the utilities may be excessively verbose, or generate too much > code for what they do. For instance, some of the utilities are templated out > for every Arrow type. Instead, we could replace all instantiations for > numbers, decimals, temporal types, and so on with a single one for > FixedWidthType (an abstract base class). Care should be taken to evaluate the > benchmarks for these kernels to ensure there is not a regression. -- This message was sent by Atlassian Jira (v8.20.1#820001)