This is a very good question. I agree with @Antoine, and would add that compute functions are meant to be a public API, while utility functions are for internal use.
Operations similar to ARROW-12739 are the structural transformations [1] such as "list_flatten" [2], which makes use of a memory pool. Based on this, I would consider it a compute kernel, since a query engine can benefit from it. To be more precise, compute functions are defined as "analytical functions that process primarily columnar data for either scalar or array inputs. These are intended for use inside query engines, data frames, etc."

Nevertheless, there are utility functions that make use of memory pools (e.g., bitmap operations), so I do not think that the use of a memory pool should decide between utility and compute functions.

~Eduardo

[1] https://arrow.apache.org/docs/cpp/compute.html#id2
[2] https://github.com/edponce/arrow/blob/master/cpp/src/arrow/compute/kernels/vector_nested.cc

On Tue, May 11, 2021 at 4:13 PM Antoine Pitrou <anto...@python.org> wrote:
>
> On 11/05/2021 at 22:10, Weston Pace wrote:
> > How does one decide between "utility function" and "compute function"?
> > For example, https://issues.apache.org/jira/browse/ARROW-12739 is
> > very similar to StructArray::Make which is implemented as a static
> > function. However, 12739 would require pool allocation (to
> > concatenate the list items into one large contiguous array) and array
> > iteration (to copy into the allocated array). Does that make it a
> > compute function?
>
> If it's useful internally as a building block, then IMHO it should
> probably be a utility function.
>
> In this case it is a user request, and it has a non-trivial computation
> cost, so I'd say it should be a compute function.
>
> Regards
>
> Antoine.
>