Hi Wes,
On 18/04/2020 at 23:41, Wes McKinney wrote:
> There are some problems with our current collection of kernels in the
> context of array-expr evaluation in query processing:
>
> * For efficiency, kernels used for array-expr evaluation should write
>   into preallocated memory as their default mode. This enables the
>   interpreter to avoid temporary memory allocations and improve CPU
>   cache utilization. Almost none of our kernels are implemented this
>   way currently.
> * The current approach for expr-type kernels of having a top-level
>   memory-allocating function is not scalable for binding developers.
>   I believe instead that kernels should be selected and invoked
>   generically by using the string name of the kernel.
>
> On this last point, what I am suggesting is that we do something more
> like
>
> ASSIGN_OR_RAISE(auto kernel, compute::GetKernel("greater", {type0, type1}));
> ArrayData* out = ... ;
> RETURN_NOT_OK(kernel->Call({arg0, arg1}, &out));

Sounds good to me.

> In particular, when we reason that successive kernel invocations can
> reuse memory, we can have code that is doing in essence
>
> k1->Call({arg0, arg1}, &out)
> k2->Call({out}, &out)
> k3->Call({arg2, out}, &out)

This assumes that all these kernels can safely write into one of their
inputs. That should be true for trivial ones, but not if e.g. a kernel
makes two passes over its input. For example, the SortToIndices kernel
first scans the input for its min and max values, then switches between
two different sorting algorithms depending on those statistics (using
an O(n) counting sort if the values are in a small enough range).

(I'm also not sure how C++ handles this: if you have a `const T*` input
and a `T*` output, does C++ assume that the two pointers may point to
the same memory?)

It would be interesting to know how costly repeated
allocation/deallocation actually is. Modern allocators like jemalloc
do their own caching instead of always returning memory to the system.
We could also have our own caching layer.
Regards

Antoine.