Oh that's nonsense, there's no such thing.

Bill.
2009/3/14 Bill Hart <goodwillh...@googlemail.com>:
> Presumably those are dependent muls. What about independent?
>
> e.g.
>
> mul rcx
> xchg rbx, rax
> mul rcx
> xchg
>
> etc.
>
> Bill.
>
> 2009/3/14 jason <ja...@njkfrudils.plus.com>:
>>
>> Timing 1000 copies of mul rcx on an i7, I get 1 mul per 3 cycles, so
>> this is the latency to get rax, the lower product. Doing the same with
>> mul rdx, I get 1 mul per 8 or 9 cycles. I don't know if I'm doing this
>> right, as it seems a lot slower than the low part of the product.
>> Doing the same for
>> mov $1,%rax
>> mul %rcx
>> I get 1.7 to 2 cycles per mul.
>>
>> Even if it does do 1c/l, it may be impossible to use, as other
>> bottlenecks will cut in first.
>>
>> add, adc seem to be 1/2 cycles respectively.
>>
>>
>> On Mar 6, 3:47 am, Bill Hart <goodwillh...@googlemail.com> wrote:
>>> Given the support MPIR already has, with more on the way, I won't be
>>> surprised if hardware companies soon start giving us access to, or
>>> just outright giving us, recent hardware to help us support them.
>>>
>>> 1 mul per cycle is huge, if that is correct. If that is per core, then
>>> it is outstanding. Some of the stats I have seen published for that
>>> chip, specifically for the sort of maths we do, are off the chart.
>>> It's the chip I am most enthusiastic about presently.
>>>
>>> Bill.
>>>
>>> 2009/3/6 Jason Martin <jason.worth.mar...@gmail.com>:
>>>
>>> > On Thu, Mar 5, 2009 at 7:38 PM, <ja...@njkfrudils.plus.com> wrote:
>>> >> On Thursday 05 March 2009 22:54:34 Jason Martin wrote:
>>> >>> > I've got a 4.4c/l. Agner Fog's tables say the throughput on 64-bit
>>> >>> > mul is 4c/l, but in another section they say you can issue one
>>> >>> > every cycle. If the latter section means 32-bit, then 4c/l could be
>>> >>> > right (I say could, not would, considering the K8). I expect I can
>>> >>> > do a bit better than 4.4c/l, but who knows. A lot of the speed
>>> >>> > comes from reducing the overhead in mul_basecase.
>>> >>> >
>>> >>> > It's quite a wide CPU; I would expect the combined functions (like
>>> >>> > addlsh1) to make a real difference on it.
>>> >>>
>>> >>> I've measured it directly to be 4 cycles per mul for sustained 64-bit
>>> >>> multiplies on Core 2, which agrees with Agner Fog. Even though there
>>> >>> are 3 ALU issue ports on the Core 2, only one of them can issue the
>>> >>> 64-bit mul.
>>> >>>
>>> >>> By the way, I ended up sitting next to an Intel hardware engineer on a
>>> >>> recent transatlantic flight. He assured me that the Core i7 can
>>> >>> perform sustained 64-bit multiplies at 1 cycle each. Of course, I'll
>>> >>> need to try this myself to believe it, but I'm pretty excited to get
>>> >>> my hands on the hardware.
>>> >>>
>>> >>> --jason
>>> >>
>>> >> 64bit x 64bit --> 128 bit ???
>>> >
>>> > That's what the man said (and you know who you are, so feel free to
>>> > chime in on this discussion if you're reading it!). Of course, that's
>>> > a pretty significant improvement over Core 2, so I'm going to need
>>> > actual hardware to verify it.
>>> >
>>> > --jason

You received this message because you are subscribed to the Google Groups "mpir-devel" group.
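The "64bit x 64bit --> 128 bit ???" exchange is about the full double-width product: on x86-64 a single `mul` leaves the low 64 bits in rax and the high 64 bits in rdx, which is what jason's rax vs rdx timings distinguish. The C sketch below computes the same pair portably from four 32 x 32 -> 64-bit partial products. The name `mul_64x64_128` is hypothetical and is not an MPIR function; compilers that support `unsigned __int128` (GCC and Clang on 64-bit targets) typically turn `(unsigned __int128)a * b` into a single `mul` instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full 64 x 64 -> 128-bit unsigned product. Returns the low word and
 * stores the high word in *hi, mirroring rax (low) and rdx (high) after
 * an x86-64 `mul`. Hypothetical helper, built from 32-bit halves so it
 * needs no compiler extensions. */
static uint64_t mul_64x64_128(uint64_t a, uint64_t b, uint64_t *hi)
{
    assert(hi != NULL);

    const uint64_t mask = 0xFFFFFFFFu;
    uint64_t a_lo = a & mask, a_hi = a >> 32;
    uint64_t b_lo = b & mask, b_hi = b >> 32;

    /* Four partial products; each fits in 64 bits. */
    uint64_t p_ll = a_lo * b_lo;   /* weight 2^0  */
    uint64_t p_lh = a_lo * b_hi;   /* weight 2^32 */
    uint64_t p_hl = a_hi * b_lo;   /* weight 2^32 */
    uint64_t p_hh = a_hi * b_hi;   /* weight 2^64 */

    /* Column at weight 2^32: at most 3 * (2^32 - 1), so no overflow. */
    uint64_t mid = (p_ll >> 32) + (p_lh & mask) + (p_hl & mask);

    *hi = p_hh + (p_lh >> 32) + (p_hl >> 32) + (mid >> 32);
    return (mid << 32) | (p_ll & mask);
}
```

For example, `mul_64x64_128(UINT64_MAX, UINT64_MAX, &hi)` returns a low word of 1 and sets `hi` to `UINT64_MAX - 1`, since (2^64 - 1)^2 = 2^128 - 2^65 + 1.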
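The c/l (cycles per limb) figures in the thread refer to inner loops such as the one in mul_basecase. As an illustration only, and not MPIR's actual implementation, the plain C sketch below does schoolbook multiplication with one addmul_1-style pass over `up` for each limb of `vp`. It assumes GCC/Clang's `unsigned __int128`, and the names `sketch_addmul_1` and `sketch_mul_basecase` are hypothetical, loosely following the mpn naming. Each inner iteration needs exactly one 64 x 64 -> 128-bit multiply, so if a 64-bit mul can only issue every 4 cycles, as measured on Core 2 in the thread, the inner loop cannot go below roughly 4 c/l.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef __SIZEOF_INT128__
#error "this sketch assumes unsigned __int128 (e.g. GCC or Clang on a 64-bit target)"
#endif

typedef uint64_t limb_t;  /* one 64-bit limb */

/* rp[0..n-1] += up[0..n-1] * v; returns the carry-out limb.
 * Each iteration performs exactly one 64 x 64 -> 128-bit multiply. */
static limb_t sketch_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* up[i]*v + rp[i] + carry <= 2^128 - 1, so this cannot overflow. */
        unsigned __int128 t = (unsigned __int128)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 64);
    }
    return carry;
}

/* Schoolbook product rp[0..un+vn-1] = up * vp, one addmul_1 row per limb
 * of vp. Requires un >= vn >= 1; rp must not overlap up or vp. */
static void sketch_mul_basecase(limb_t *rp, const limb_t *up, size_t un,
                                const limb_t *vp, size_t vn)
{
    assert(un >= vn && vn >= 1);
    for (size_t i = 0; i < un; i++)
        rp[i] = 0;
    for (size_t j = 0; j < vn; j++)
        /* rp[un + j] has not been written by any earlier row. */
        rp[un + j] = sketch_addmul_1(rp + j, up, un, vp[j]);
}
```

Storing each row's carry directly into `rp[un + j]`, rather than adding it, is what lets the sketch skip zeroing the top `vn` limbs.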
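jason expects combined functions such as addlsh1 to pay off on a wide core. Assuming addlsh1 follows the usual mpn naming and computes {up,n} + 2*{vp,n} with a carry-out of 0 to 2, a portable C sketch of the idea looks like this (the name `sketch_addlsh1_n` is hypothetical). Fusing the one-bit shift into the addition needs a single pass over the operands instead of a shift pass followed by an add pass, which saves a round of loads and stores and one loop's overhead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;  /* one 64-bit limb */

/* rp[0..n-1] = up[0..n-1] + 2 * vp[0..n-1]; returns the carry-out (0, 1 or 2).
 * The shift and the add happen in the same loop iteration. Safe when rp
 * aliases up or vp, since vp[i] and up[i] are read before rp[i] is written. */
static limb_t sketch_addlsh1_n(limb_t *rp, const limb_t *up,
                               const limb_t *vp, size_t n)
{
    assert(n >= 1);
    limb_t shift_in = 0;  /* top bit of the previous vp limb */
    limb_t carry = 0;     /* carry out of the previous limb's addition */
    for (size_t i = 0; i < n; i++) {
        limb_t s = (vp[i] << 1) | shift_in;  /* limb i of 2 * vp */
        shift_in = vp[i] >> 63;
        limb_t t = up[i] + s;
        limb_t c1 = t < s;                   /* carry from up[i] + s */
        limb_t r = t + carry;
        limb_t c2 = r < t;                   /* carry from adding the old carry */
        rp[i] = r;
        carry = c1 + c2;                     /* at most 1 */
    }
    return carry + shift_in;
}
```

For example, with up = {1, 0} and vp = {UINT64_MAX, UINT64_MAX}, the sum 1 + 2*(2^128 - 1) = 2^129 - 1 comes back as rp = {UINT64_MAX, UINT64_MAX} with a carry of 1.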