theirix commented on PR #19369:
URL: https://github.com/apache/datafusion/pull/19369#issuecomment-3706941265
> Seeing all this logic introduced, I'm beginning to question whether there is actual benefit to having a native log implementation 🤔
>
> Perhaps we should just revert to casting it to float and accept the accuracy loss
>
> Thoughts @theirix ?

Fair enough, the logic is becoming more convoluted.

The original idea was to introduce common decimal operations. Scale-preserving operations like abs, round, gcd, etc., are easy to implement and support. Some other operations that map naturally to decimals (like log10, pow10) adjust scales and have no natural analogue in arrow-buffer, which leads to more complex logic. These operations are typical for data analytics, and applications could benefit from them. So base-ten operations can be calculated precisely, while for the rest, and for more complicated operations, it is of course fine to lose precision by using a native float implementation.

First, we should reuse arrow's foundational primitives as much as possible. If there is an `OP_checked`, it's better to piggyback on it. A few num traits were recently added to decimals in arrow-buffer, which makes this easier for us.

Second, I believe more logic should be isolated in `calculate_binary_decimal_math`, especially for handling different scales, to shift responsibility from UDF implementers (like pow) to middleware. It is in progress, and I'll submit it shortly.
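To make the base-ten point concrete, here is a minimal sketch (not the code from this PR) of how log10 can be computed exactly for a decimal that is a power of ten, falling back to a float otherwise. The `(unscaled, scale)` representation with value `unscaled * 10^(-scale)` and the helper names `exact_log10` and `log10_decimal` are assumptions for illustration only; the sketch relies solely on the standard library's checked integer operations rather than arrow's.

```rust
/// Hypothetical helper: exact log10 of a decimal stored as
/// `unscaled * 10^(-scale)`. Returns `Some(exponent)` only when the
/// value is an exact power of ten, otherwise `None`.
fn exact_log10(unscaled: i128, scale: i8) -> Option<i32> {
    // `checked_ilog10` returns None for zero and negative inputs.
    let digits = unscaled.checked_ilog10()?; // floor(log10(unscaled))
    // The result is exact only if unscaled == 10^digits.
    if 10i128.checked_pow(digits)? != unscaled {
        return None;
    }
    // log10(unscaled * 10^(-scale)) = log10(unscaled) - scale
    Some(digits as i32 - scale as i32)
}

/// Hypothetical wrapper: use the exact path when possible, otherwise
/// accept the precision loss of a float computation.
fn log10_decimal(unscaled: i128, scale: i8) -> f64 {
    match exact_log10(unscaled, scale) {
        Some(exp) => exp as f64,
        None => (unscaled as f64 / 10f64.powi(scale as i32)).log10(),
    }
}
```

For example, under these assumptions `exact_log10(1, 2)` treats the input as 0.01 and yields `Some(-2)` with no float involved, while `exact_log10(1234, 2)` (12.34) yields `None` and `log10_decimal` takes the float path.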
