Timing 1000 copies of mul rcx on an i7, I get 1 mul per 3 cycles, so this is the latency to get rax, the lower half of the product. Doing the same with mul rdx I get 1 mul per 8 or 9 cycles. I don't know if I'm doing this right, as it seems a lot slower than the low part of the product. Doing the same for mov $1,%rax ; mul %rcx, where the mov breaks the dependency on rax, I get 1.7 to 2 cycles per mul.
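
The reasoning behind these measurements can be sketched with a toy model: a chain of dependent instructions runs at the instruction's latency, while independent copies run at its reciprocal throughput. The function name and the single-execution-unit model below are assumptions made for illustration only; the figures plugged in (3 cycles latency, about 2 cycles between issues) are simply the numbers reported above, not a description of how the i7 actually schedules mul. The model says nothing about the 8 or 9 cycles seen for mul rdx.

```python
def cycles_for_run(n, latency, recip_throughput, dependent):
    """Cycle at which the last of n identical instructions completes.

    Toy model (an assumption, not real hardware): one execution unit
    accepts a new instruction every `recip_throughput` cycles, and each
    result is ready `latency` cycles after issue. If `dependent` is True,
    every instruction needs the previous result as an input.
    """
    ready = 0       # cycle at which the previous result becomes available
    next_slot = 0   # earliest cycle at which the unit accepts another instruction
    for _ in range(n):
        # A dependent instruction must also wait for the previous result.
        issue = max(next_slot, ready) if dependent else next_slot
        ready = issue + latency
        next_slot = issue + recip_throughput
    return ready


n = 1000

# Like 1000 copies of "mul rcx": each mul reads the rax written by the last one.
chained = cycles_for_run(n, latency=3, recip_throughput=1, dependent=True)
print(chained / n)   # 3.0 cycles per mul: the run measures latency

# Like "mov $1,%rax ; mul %rcx": the mov breaks the chain through rax.
unchained = cycles_for_run(n, latency=3, recip_throughput=2, dependent=False)
print(unchained / n)  # 2.001 cycles per mul: the run measures throughput
```

In the chained case the throughput parameter makes no difference as long as it is below the latency, which is why a run of dependent muls cannot show whether 1 mul per cycle is achievable.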

Even if it does do 1c/l, it may be impossible to use, as other bottlenecks will cut in first: add and adc seem to be 1 and 2 cycles respectively.

On Mar 6, 3:47 am, Bill Hart <goodwillh...@googlemail.com> wrote:
> Given the support MPIR already has with more on the way, I won't be
> surprised if hardware companies soon start giving us access to, or
> just outright giving us recent hardware to help us support them.
>
> 1 mul per cycle is huge, if that is correct. If that is per core, then
> it is outstanding. Some of the stats I have seen published for that
> chip, specifically for the sort of maths we do, are off the chart.
> It's the chip I am most enthusiastic about presently.
>
> Bill.
>
> 2009/3/6 Jason Martin <jason.worth.mar...@gmail.com>:
>
> > On Thu, Mar 5, 2009 at 7:38 PM, <ja...@njkfrudils.plus.com> wrote:
>
> >> On Thursday 05 March 2009 22:54:34 Jason Martin wrote:
> >>> > I've got a 4.4c/l , Agner Fog's says the thruput on 64 bit mul is 4c/l ,
> >>> > but in another section it says you can issue one every cycle , if the
> >>> > latter section means 32bit then , 4c/l could be right ( I say could not
> >>> > would as consider the K8) , I expect I can do a bit better than 4.4c/l ,
> >>> > but who knows.A lot of the speed comes from reducing the overhead in
> >>> > mul_basecase.
>
> >>> > It quite a wide cpu , I would expect the combined functions(like
> >>> > addlsh1) to make a real difference on it.
>
> >>> I've measure it directly to be 4 cycles per mul for sustained 64-bit
> >>> multiplies on core2, which agrees with Agner Fog. Even though there
> >>> are 3 ALU issue ports on the core2, only one of them can issue the
> >>> 64-bit mul.
>
> >>> By the way, I ended up sitting next to an Intel hardware engineer on a
> >>> recent trans-atlantic flight. He assured me that the Core i7 can
> >>> perform sustained 64-bit multiplies at 1 cycle each. Of course, I'll
> >>> need to try this myself to believe it, but I'm pretty excited to get
> >>> my hands on the hardware.
>
> >>> --jason
>
> >> 64bit x 64bit --> 128 bit ???
>
> > That's what the man said (and you know who you are, so feel free to
> > chime in on this discussion if you're reading it!). Of course, that's
> > a pretty significant improvement over core 2, so I'm going to need
> > actual hardware to verify it.
>
> > --jason

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com
To unsubscribe from this group, send email to mpir-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---