On 1 June 2017 at 20:53, Armin Rigo <armin.r...@gmail.com> wrote:

> Hi,
>
> On 31 May 2017 at 17:11, Tuom Larsen <tuom.lar...@gmail.com> wrote:
> >     # k = i//j        # 2.12 seconds
> >     k = int(i/j)      # 0.98 seconds
>
> Note first that if you don't do anything with 'k', it might be optimized
> away.
>
> I just wrote a pure C example doing the same thing, and indeed
> converting the integers to float, dividing, and then converting back
> to integer... is 2.2x times faster there too.
>
> Go figure it out.  I have no idea why the CPU behaves like that.
> Maybe Neal can provide a clue.
>
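For anyone who wants to reproduce the comparison quoted above, here is a minimal sketch of such a benchmark. The function names, the choice of divisor and the loop bound are my own assumptions, not taken from the original post. Each loop accumulates its result into a total so that 'k' is actually used and cannot be optimized away.

```python
import time


def floor_div_sum(n):
    """Sum i // j over a range, using integer floor division."""
    total = 0
    for i in range(1, n):
        j = (i % 7) + 1          # hypothetical divisor, always in 1..7
        total += i // j
    return total


def float_div_sum(n):
    """Same sum, but via float division followed by truncation."""
    total = 0
    for i in range(1, n):
        j = (i % 7) + 1
        total += int(i / j)
    return total


def timed(fn, n):
    """Return (result, elapsed seconds) for a single call of fn(n)."""
    start = time.perf_counter()
    result = fn(n)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    n = 200000
    for fn in (floor_div_sum, float_div_sum):
        result, elapsed = timed(fn, n)
        print(fn.__name__, result, "%.3f s" % elapsed)
```

The two forms only agree for non-negative operands that are small enough to be represented exactly as floats: int() truncates toward zero, while // rounds toward negative infinity, so -7 // 2 is -4 but int(-7 / 2) is -3. Timings will vary with the interpreter and the CPU.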
I took a look at this earlier today. It turns out that on Skylake (and I think it's similar on other recent x86_64 implementations), 80-bit floating-point division has a latency of 14 cycles, whereas 32-bit integer division has a latency of 26 cycles. I expect this is because there are only two hardware division units, and both are on the floating-point path.

-- 
William Leslie

Notice:
Likely much of this email is, by the nature of copyright, covered under copyright law. You absolutely MAY reproduce any part of it in accordance with the copyright law of the nation you are reading this in. Any attempt to DENY YOU THOSE RIGHTS would be illegal without prior contractual agreement.