tustvold commented on issue #4729:
URL: https://github.com/apache/arrow-rs/issues/4729#issuecomment-1690743797

   > Like you said, the new model forces all the kernels to support dictionary, 
but wouldn't this better than having the inconsistencies
   
   I think kernels materializing dictionaries implicitly is a worse UX than DF 
explicitly doing this with type coercion. In both cases the performance will be 
poor, but at least currently it is explicit and DF can avoid doing it more than 
once. Broadly speaking, I think all the kernels where it makes sense to 
accommodate dictionaries now support them in some form?
   
   > Because the filter predicate only need to be evaluated on the dictionary 
values, whose cardinality could be much lower? especially if the predicate is a 
complex one like a UDF.
   
   Aah sorry I thought you meant filter in arrow parlance, i.e. a selection 
kernel. 
   
   Yes there are broadly speaking two types of kernels where dictionaries work 
well:
   
   * Unary kernels - where the function can just be applied to the dictionary 
values
   * Binary kernels with a scalar argument, i.e. effectively a unary kernel
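   
   As a rough illustration of those two cases, here is a minimal std-only 
sketch. `DictColumn`, `map_values`, `eq_scalar` and `materialize` are 
hypothetical names made up for this example, not arrow-rs APIs, and nulls are 
ignored for brevity. The point is only that the function or predicate runs once 
per dictionary value rather than once per row:

```rust
/// Hypothetical stand-in for a dictionary-encoded column.
/// This is NOT the arrow-rs `DictionaryArray`; nulls are not modelled.
#[derive(Debug, Clone, PartialEq)]
pub struct DictColumn<V> {
    /// One index into `values` per logical row.
    pub keys: Vec<usize>,
    /// The dictionary values the keys point at.
    pub values: Vec<V>,
}

impl<V> DictColumn<V> {
    /// Unary kernel: apply `f` once per dictionary value and reuse the keys.
    pub fn map_values<U>(&self, f: impl Fn(&V) -> U) -> DictColumn<U> {
        DictColumn {
            keys: self.keys.clone(),
            values: self.values.iter().map(f).collect(),
        }
    }

    /// Binary kernel with a scalar argument: evaluate the comparison once per
    /// dictionary value, then expand the result to one bool per row via the keys.
    pub fn eq_scalar(&self, scalar: &V) -> Vec<bool>
    where
        V: PartialEq,
    {
        let hits: Vec<bool> = self.values.iter().map(|v| v == scalar).collect();
        self.keys.iter().map(|&k| hits[k]).collect()
    }

    /// Materialize ("unpack") the dictionary into one value per row. This is
    /// the expensive step that the two kernels above avoid.
    pub fn materialize(&self) -> Vec<V>
    where
        V: Clone,
    {
        self.keys.iter().map(|&k| self.values[k].clone()).collect()
    }
}
```

   With three dictionary values and five rows, `map_values` and `eq_scalar` 
each call the closure or comparison three times, whereas materializing first 
would do the work five times.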
   
   However, this requires explicit handling of the "dictionary" case. The 
proposed new model, as far as I understand it, would not achieve this, and 
would be no better than DF coercing both inputs to non-dictionary types?
   
   > I can still see SIMD getting triggered
   
   Oh it is getting triggered, it is just generating very sub-optimal code 
:smile: There's a veritable wall of memory shuffle operators, I honestly have a 
hard time following what LLVM is doing... 
   
   > which in turn is called by Array::GetScalar, which seems like a pretty 
fundamental method.
   
   I would not expect array kernels to be calling such a method, rather I'd 
expect them to be vectorised and operating on the underlying buffers, I could 
be wrong though...
   

