This is a very good question.
I agree with @Antoine and would like to add that compute functions are
meant to be part of the public API,
while utility functions are for internal use.

Structural transformations [1] such as "list_flatten" [2], which makes use
of a memory pool, are similar to the operation in ARROW-12739. Based on
this, I would consider it a compute kernel, since a query engine can
benefit from it. To be more precise, compute functions are defined as
"analytical functions that process primarily columnar data for either
scalar or array inputs. These are intended for use inside query engines,
data frames, etc."
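
To illustrate why an operation of this kind needs both allocation and
iteration, here is a minimal sketch in plain Python. It is not Arrow's
implementation: the function name flatten_lists and the offsets / values /
valid representation are assumptions made only for this example, and the
output list merely stands in for a buffer that a real kernel would
allocate from a memory pool.

```python
def flatten_lists(offsets, values, valid):
    """Concatenate the child values of every non-null list slot.

    Hypothetical helper for illustration only, not an Arrow API.
    offsets: len(valid) + 1 monotonically increasing indices into values
    values:  the child values shared by all list slots
    valid:   one boolean per list slot (False means the slot is null)
    """
    out = []  # stands in for a freshly allocated output buffer
    for i, is_valid in enumerate(valid):
        if is_valid:
            # Copy this slot's segment into the contiguous output.
            out.extend(values[offsets[i]:offsets[i + 1]])
    return out


# Three list slots: [1, 2], null (its segment is skipped), [3]
offsets = [0, 2, 4, 5]
values = [1, 2, 98, 99, 3]
valid = [True, False, True]
flat = flatten_lists(offsets, values, valid)
```

The cost here grows with the number of child values copied, which is the
kind of non-trivial computation that points toward a compute function.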

Nevertheless, there are utility functions that make use of memory pools
(e.g., bitmap operations), so I do not think that the use of a memory
pool should decide whether something is a utility or a compute function.

~Eduardo

[1] https://arrow.apache.org/docs/cpp/compute.html#id2
[2] https://github.com/edponce/arrow/blob/master/cpp/src/arrow/compute/kernels/vector_nested.cc

On Tue, May 11, 2021 at 4:13 PM Antoine Pitrou <anto...@python.org> wrote:

>
> On 11/05/2021 at 22:10, Weston Pace wrote:
> > How does one decide between "utility function" and "compute function"?
> >    For example, https://issues.apache.org/jira/browse/ARROW-12739 is
> > very similar to StructArray::Make which is implemented as a static
> > function.  However, 12739 would require pool allocation (to
> > concatenate the list items into one large contiguous array) and array
> > iteration (to copy into the allocated array).  Does that make it a
> > compute function?
>
> If it's useful internally as a building block, then IMHO it should
> probably be a utility function.
>
> In this case it is a user request, and it has a non-trivial computation
> cost, so I'd say it should be a compute function.
>
> Regards
>
> Antoine.
>
