tustvold commented on issue #4729: URL: https://github.com/apache/arrow-rs/issues/4729#issuecomment-1690743797
> Like you said, the new model forces all the kernels to support dictionary, but wouldn't this better than having the inconsistencies

I think kernels materializing dictionaries implicitly is a worse UX than DF explicitly doing this with type coercion. In both cases the performance will be poor, but at least currently it is explicit and DF can avoid doing this more than once.

Broadly speaking, I think all the kernels where it makes sense to accommodate dictionaries now support dictionaries in some form?

> Because the filter predicate only need to be evaluated on the dictionary values, whose cardinality could be much lower? especially if the predicate is a complex one like a UDF.

Aah sorry, I thought you meant filter in arrow parlance, i.e. a selection kernel. Yes, there are broadly speaking two types of kernels where dictionaries work well:

* Unary kernels - where the function can just be applied to the dictionary values
* Binary kernels with a scalar argument, i.e. effectively a unary kernel

However, this requires explicit handling of the "dictionary" case. The proposed new model, as far as I understand it, would not achieve this, and would be no better than DF coercing both inputs to non-dictionary types?

> I can still see SIMD getting triggered

Oh it is getting triggered, it is just generating very sub-optimal code :smile: There's a veritable wall of memory shuffle operators, I honestly have a hard time following what LLVM is doing...

> which in turn is called by Array::GetScalar, which seems like a pretty fundamental method.

I would not expect array kernels to be calling such a method; rather, I'd expect them to be vectorised and operating on the underlying buffers. I could be wrong though...
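To make the two dictionary-friendly kernel shapes concrete, here is a minimal sketch. It assumes a simplified, hypothetical `DictColumn` type with no null handling; it is not the arrow-rs `DictionaryArray` API, and the method names `unary`, `eq_scalar` and `materialize` are illustrative only.

```rust
/// Hypothetical, simplified dictionary-encoded column.
/// This is NOT the arrow-rs `DictionaryArray`; nulls are ignored for brevity.
#[derive(Debug, Clone, PartialEq)]
struct DictColumn<T> {
    /// One key per row, indexing into `values`.
    keys: Vec<usize>,
    /// The dictionary values, typically far fewer than the number of rows.
    values: Vec<T>,
}

impl<T> DictColumn<T> {
    /// Unary kernel: apply `f` once per dictionary value and reuse the keys.
    /// The cost scales with the dictionary size, not the row count.
    fn unary<U>(&self, f: impl Fn(&T) -> U) -> DictColumn<U> {
        DictColumn {
            keys: self.keys.clone(),
            values: self.values.iter().map(f).collect(),
        }
    }

    /// Binary kernel with a scalar argument, which is effectively unary:
    /// compare each dictionary value once, then look the result up per row.
    fn eq_scalar(&self, scalar: &T) -> Vec<bool>
    where
        T: PartialEq,
    {
        let hits: Vec<bool> = self.values.iter().map(|v| v == scalar).collect();
        self.keys.iter().map(|&k| hits[k]).collect()
    }

    /// Materializing the dictionary: what coercing to a non-dictionary type
    /// amounts to, producing one value per row.
    fn materialize(&self) -> Vec<T>
    where
        T: Clone,
    {
        self.keys.iter().map(|&k| self.values[k].clone()).collect()
    }
}
```

With keys `[0, 1, 0, 2, 1]` over three string values, `unary` and `eq_scalar` each evaluate their function three times rather than five. A binary kernel over two dictionary columns has no such shortcut in this sketch, because the two key sets index different dictionaries; that is the case where materializing both sides is the fallback.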
