Timing 1000 copies of mul rcx on an i7, I get 1 mul per 3 cycles, so
this is the latency to get rax, the low half of the product.
Doing the same with mul rdx I get 1 mul per 8 or 9 cycles. I don't
know if I'm doing this right, as it seems a lot slower than the low part
of the product. Doing the same for
mov $1,%rax
mul %rcx
I get 1.7 to 2 cycles per mul.
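
To spell out the dependencies behind those three numbers: mul with a
64-bit operand writes rdx:rax = rax * operand, so back-to-back copies of
mul rcx are chained through rax, copies of mul rdx are chained through
rdx as well, and resetting rax with mov $1,%rax before each mul breaks
the chain, so that figure should be closer to throughput. Here is a
minimal C model of the three sequences. It is not the timing harness used
for the numbers above. It assumes a GCC/Clang-style unsigned __int128,
and regs_t, mul64, chain_low, chain_high and independent are names made
up for illustration; a compiler will not necessarily emit these exact
instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of the two registers that x86-64 `mul` writes. */
typedef struct { uint64_t rax, rdx; } regs_t;

/* Model of `mul src`: rdx:rax = rax * src (unsigned 64x64 -> 128). */
static regs_t mul64(regs_t r, uint64_t src) {
    unsigned __int128 p = (unsigned __int128)r.rax * src;
    r.rax = (uint64_t)p;          /* low half  */
    r.rdx = (uint64_t)(p >> 64);  /* high half */
    return r;
}

/* n copies of `mul %rcx`: each mul needs the previous low half (rax). */
static regs_t chain_low(regs_t r, uint64_t rcx, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        r = mul64(r, rcx);
    return r;
}

/* n copies of `mul %rdx`: each mul needs the previous rax and rdx. */
static regs_t chain_high(regs_t r, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        r = mul64(r, r.rdx);
    return r;
}

/* n copies of `mov $1,%rax; mul %rcx`: rax is reset each time,
   so no mul depends on the result of an earlier one. */
static regs_t independent(regs_t r, uint64_t rcx, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        r.rax = 1;
        r = mul64(r, rcx);
    }
    return r;
}
```

Wrapping each of these in a cycle counter and dividing by n would give
numbers comparable to the ones above only if the generated code really
is the plain mul sequence, which would need checking in the disassembly.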

Even if it does do 1 c/l, it may be impossible to exploit, as other
bottlenecks will cut in first.

add and adc seem to be 1 and 2 cycles respectively.
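
As a rough picture of why mul speed alone may not set the c/l figure,
here is a C sketch of a schoolbook inner loop in the style of addmul_1.
Per limb it needs two loads, a mul, an add/adc pair to fold in the carry
and the old limb, and a store. This is not MPIR's code: the name
addmul_1_sketch is hypothetical, and it assumes a GCC/Clang-style
unsigned __int128.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* rp[0..n-1] += up[0..n-1] * v; returns the carry-out limb.
   The 128-bit sum cannot overflow:
   (2^64-1)^2 + 2*(2^64-1) = 2^128 - 1. */
static uint64_t addmul_1_sketch(uint64_t *rp, const uint64_t *up,
                                size_t n, uint64_t v) {
    assert(n == 0 || (rp != NULL && up != NULL));
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* one 64x64 -> 128 mul, then add the old limb and the carry */
        unsigned __int128 t = (unsigned __int128)up[i] * v + rp[i] + carry;
        rp[i] = (uint64_t)t;           /* store the low limb           */
        carry = (uint64_t)(t >> 64);   /* high limb feeds the next one */
    }
    return carry;
}
```

The carry runs from one limb to the next, so the add/adc latency sits on
the loop-carried path even when the multiplies themselves are independent.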


On Mar 6, 3:47 am, Bill Hart <goodwillh...@googlemail.com> wrote:
> Given the support MPIR already has with more on the way, I won't be
> surprised if hardware companies soon start giving us access to, or
> just outright giving us recent hardware to help us support them.
>
> 1 mul per cycle is huge, if that is correct. If that is per core, then
> it is outstanding. Some of the stats I have seen published for that
> chip, specifically for the sort of maths we do, are off the chart.
> It's the chip I am most enthusiastic about presently.
>
> Bill.
>
> 2009/3/6 Jason Martin <jason.worth.mar...@gmail.com>:
>
>
>
> > On Thu, Mar 5, 2009 at 7:38 PM,  <ja...@njkfrudils.plus.com> wrote:
>
> >> On Thursday 05 March 2009 22:54:34 Jason Martin wrote:
> >>> > I've got 4.4c/l. Agner Fog says the throughput of 64-bit mul is 4c/l,
> >>> > but another section says you can issue one every cycle. If the
> >>> > latter section means 32-bit, then 4c/l could be right (I say could, not
> >>> > would, considering the K8). I expect I can do a bit better than 4.4c/l,
> >>> > but who knows. A lot of the speed comes from reducing the overhead in
> >>> > mul_basecase.
>
> >>> > It's quite a wide CPU; I would expect the combined functions (like
> >>> > addlsh1) to make a real difference on it.
>
> >>> I've measured it directly to be 4 cycles per mul for sustained 64-bit
> >>> multiplies on core2, which agrees with Agner Fog.  Even though there
> >>> are 3 ALU issue ports on the core2, only one of them can issue the
> >>> 64-bit mul.
>
> >>> By the way, I ended up sitting next to an Intel hardware engineer on a
> >>> recent trans-atlantic flight.  He assured me that the Core i7 can
> >>> perform sustained 64-bit multiplies at 1 cycle each.  Of course, I'll
> >>> need to try this myself to believe it, but I'm pretty excited to get
> >>> my hands on the hardware.
>
> >>> --jason
>
> >> 64bit x 64bit --> 128 bit  ???
>
> > That's what the man said (and you know who you are, so feel free to
> > chime in on this discussion if you're reading it!).  Of course, that's
> > a pretty significant improvement over core 2, so I'm going to need
> > actual hardware to verify it.
>
> > --jason
