viirya commented on PR #6269: URL: https://github.com/apache/arrow-datafusion/pull/6269#issuecomment-1539052195
> Yeah, I guess I was thinking it would be nice to avoid the unpacking of the dictionary result into a primitive array (when possible)

I meant that for mathematical numeric kernels (e.g. add, subtract, etc.), the result of an operation between two dictionary arrays is a primitive array. We don't unpack a dictionary result into a primitive array there; the kernel output is already primitive. This is why the coercion rule specifies the result type of such an op as the primitive type instead of a dictionary of it.

But for the same op between a dictionary and a scalar, the result is a dictionary array, because the op can simply be applied to the dictionary values. That is not possible in the dictionary/dictionary case above, since the keys of the two arrays generally index into different value arrays. So the inconsistency (primitive for dictionary/dictionary, dictionary for dictionary/scalar) leads to the bug we saw.

We can either change the primitive result of an op on dictionary/dictionary to a dictionary, or change the dictionary result of an op on dictionary/scalar to a primitive. This PR takes the latter as the fix. One reason is that it is simple to apply and fixes the issue now. Another reason is that I'm not sure packing the result of a dictionary/dictionary op as a dictionary makes sense. It is doable, but re-encoding the result as a dictionary during a numeric op might introduce a performance penalty. I'll find some time to try that.
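To make the inconsistency concrete, here is a minimal sketch in plain Rust, without the `arrow` crate. `DictArray`, `decode`, `add_dict_scalar`, `add_dict_dict` and `add_dict_scalar_primitive` are hypothetical names for illustration only, not arrow-rs or DataFusion APIs, and the model assumes `usize` keys and `i64` values while ignoring nulls and overflow. It shows why a dictionary/scalar op naturally yields a dictionary, why a dictionary/dictionary op yields a primitive array, and roughly what returning a primitive array in both cases looks like (a model of the direction taken here, not the PR's actual code).

```rust
/// Hypothetical, simplified stand-in for an Arrow dictionary array
/// (not arrow-rs's DictionaryArray): each row stores a key into `values`.
#[derive(Debug, Clone, PartialEq)]
struct DictArray {
    keys: Vec<usize>,
    values: Vec<i64>,
}

impl DictArray {
    /// Decode ("unpack") the dictionary into a plain primitive vector.
    fn decode(&self) -> Vec<i64> {
        self.keys.iter().map(|&k| self.values[k]).collect()
    }
}

/// dictionary + scalar: only the dictionary values are touched and the keys
/// are reused unchanged, so the natural result is still a dictionary.
fn add_dict_scalar(dict: &DictArray, scalar: i64) -> DictArray {
    DictArray {
        keys: dict.keys.clone(),
        values: dict.values.iter().map(|v| v + scalar).collect(),
    }
}

/// dictionary + dictionary: the two key arrays generally point into different
/// value arrays, so the op is evaluated row by row and yields a primitive array.
fn add_dict_dict(lhs: &DictArray, rhs: &DictArray) -> Vec<i64> {
    lhs.decode()
        .iter()
        .zip(rhs.decode())
        .map(|(a, b)| a + b)
        .collect()
}

/// Hypothetical model of the chosen fix: dictionary + scalar also returns a
/// primitive array, so both cases match the primitive coerced result type.
fn add_dict_scalar_primitive(dict: &DictArray, scalar: i64) -> Vec<i64> {
    add_dict_scalar(dict, scalar).decode()
}

fn main() {
    let a = DictArray { keys: vec![0, 1, 0, 1], values: vec![10, 20] };
    let b = DictArray { keys: vec![1, 1, 0, 0], values: vec![1, 2] };

    println!("{:?}", add_dict_scalar(&a, 5)); // still a dictionary
    println!("{:?}", add_dict_dict(&a, &b)); // primitive
    println!("{:?}", add_dict_scalar_primitive(&a, 5)); // primitive, as in the fix
}
```

Keeping `add_dict_scalar` as a dictionary only touches the distinct values, which is cheap when there are few of them; decoding gives up that saving in exchange for a single, consistent result type across both cases.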
