felipecrv commented on PR #36800:
URL: https://github.com/apache/arrow/pull/36800#issuecomment-1648934117

   This implementation is very elegant and simple, but it's giving up some 
accuracy we can reclaim with a bit more careful handling of the units.
   
   If I understand correctly, each DURATION operand is converted to a 
FLOAT64 (in seconds) before the division, so `x NANO / y MILLI` first divides 
`x` by `1e9`, then `y` by `1e3`, and then divides the first result by the 
second.
   
   There is a considerable accuracy loss for small values of `x` [1]. An 
alternative is to first build a single scale factor from the two units. In this 
`NANO/MILLI` example that would be `1e-9/1e-3 = 1e-6`.
   
   So the actual computation would be `(x * 1e-6) / y`. This is also more 
efficient: one `*` and one `/` instead of 3 `/`.
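
   To make the two orderings concrete, here is a minimal Python sketch (not the 
PR's C++ code). The unit table and the names `divide_naive`, `divide_ratio` and 
`relative_error` are hypothetical, introduced only for illustration. It 
measures each result against an exact rational reference, so the error can be 
checked for specific inputs rather than assumed.

```python
from fractions import Fraction

# Hypothetical unit table: how many of each unit fit in one second.
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def divide_naive(x, x_unit, y, y_unit):
    """Convert both operands to float seconds, then divide (three divisions)."""
    x_seconds = x / float(UNITS_PER_SECOND[x_unit])
    y_seconds = y / float(UNITS_PER_SECOND[y_unit])
    return x_seconds / y_seconds


def divide_ratio(x, x_unit, y, y_unit):
    """Fold the two units into one scale factor, then do one '*' and one '/'."""
    # For ns / ms this is 1e3 / 1e9 = 1e-6, i.e. 1e-9 / 1e-3.
    scale = UNITS_PER_SECOND[y_unit] / UNITS_PER_SECOND[x_unit]
    return (x * scale) / y


def relative_error(approx, x, x_unit, y, y_unit):
    """Relative error of `approx` against the exact rational quotient."""
    exact = Fraction(x, UNITS_PER_SECOND[x_unit]) / Fraction(y, UNITS_PER_SECOND[y_unit])
    return abs(Fraction(approx) - exact) / abs(exact)


if __name__ == "__main__":
    for x in (1, 3, 5, 7, 123456789):
        naive = divide_naive(x, "ns", 7, "ms")
        ratio = divide_ratio(x, "ns", 7, "ms")
        print(
            x,
            float(relative_error(naive, x, "ns", 7, "ms")),
            float(relative_error(ratio, x, "ns", 7, "ms")),
        )
```

   In a real kernel the scale factor depends only on the two unit types, so it 
could be computed once per call rather than once per element.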
   
   My concern can probably be ignored if numpy/pandas also naively scales both 
operands and then divides.
   
   @pitrou what do you think?
   
   
![image](https://github.com/apache/arrow/assets/207795/aac4f042-4e43-4b0c-bd45-b072ed80d6b6)
   
   [1] 
https://herbie.uwplse.org/demo/8908317b92cc5eb8646c2968ab4956e4e96c9cea.2.0/graph.html


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

