On Sunday 23 November 2008 18:53:46 Bill Hart wrote:
> That's very impressive!
>
> What do you mean by a slot?
>

Whatever I want it to mean, seeing how vague most asm docs are!

Here it means a macro-op.


> I presume by ax you mean rax, etc.
>

Yeah, I was just being lazy.

> There's also going to be some loop overhead right?

It is already included.

I've just been thinking about the timings I got from the different unrolling
factors.

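mov $0,%r9                # clear r9 (mov leaves the flags alone)
mul %rcx                  # rdx:rax = rax * rcx
add %rax,%r8              # add the low half of the product to r8
mov 8(%rsi,%rbx,8),%rax   # load the next source limb early
adc %rdx,%r9              # r9 = high half + carry flag from the add
mov %r8,(%rdi,%rbx,8)     # store the first result limb
mul %rcx                  # rdx:rax = rax * rcx
mov $0,%r10               # clear r10
add %rax,%r9              # add the low half of the product to r9
mov 16(%rsi,%rbx,8),%rax  # load the source limb for the next block
adc %rdx,%r10             # r10 = high half + carry flag from the add
mov %r9,8(%rdi,%rbx,8)    # store the second result limb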

The above is a "basic block2" for two limbs. It is just two of the earlier
"basic block" stuck together, but with the loads shifted up.
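
As a rough reference for what the block computes, here is a small Python
model. It assumes the block is the body of a "multiply a limb vector by one
limb" loop with 64-bit limbs, which the code suggests but which is not stated
above; the name mul_1_model and its parameters are made up for this sketch.

```python
MASK = (1 << 64) - 1  # assumed limb size: 64 bits


def mul_1_model(src, multiplier, carry_in=0):
    """Model of the per-limb step: multiply, add incoming carry, store.

    src is a list of 64-bit limbs, least significant first.
    Returns (result limbs, final carry limb).
    """
    dst = []
    carry = carry_in
    for limb in src:
        product = limb * multiplier            # mul %rcx -> rdx:rax
        low, high = product & MASK, product >> 64
        total = carry + low                    # add %rax,%r8
        dst.append(total & MASK)               # mov %r8,(%rdi,%rbx,8)
        carry = high + (total >> 64)           # adc %rdx,%r9
    return dst, carry


if __name__ == "__main__":
    limbs, carry = mul_1_model([MASK, 1], 3)
    print([hex(x) for x in limbs], hex(carry))
```

The model processes one limb per iteration; the assembly above does the same
step twice per block, alternating the carry register.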

If we assume this runs in 5 cycles, and the loop control takes 1 extra
cycle, then:
unroll by 2 is (1*5+1)/2=3 c/l
unroll by 4 is (2*5+1)/4=2.75 c/l
unroll by 8 is (4*5+1)/8=2.625 c/l
unroll by 16 is (8*5+1)/16=2.5625 c/l

This matches my timings, which come out just a little bit higher.
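
The same arithmetic as a short Python sketch, in case anyone wants to try
other numbers. The 5-cycle block and 1-cycle loop control are the assumptions
from above, not measured values, and the function name is made up here.

```python
from fractions import Fraction

BLOCK_CYCLES = 5  # assumed cost of one two-limb "basic block2"
LOOP_CYCLES = 1   # assumed extra cost of the loop control per iteration


def cycles_per_limb(unroll, block_cycles=BLOCK_CYCLES, loop_cycles=LOOP_CYCLES):
    """Predicted cycles per limb when the loop body handles `unroll` limbs."""
    if unroll < 2 or unroll % 2:
        raise ValueError("unroll must be a positive even number of limbs")
    blocks = unroll // 2  # each block handles two limbs
    return Fraction(blocks * block_cycles + loop_cycles, unroll)


if __name__ == "__main__":
    for unroll in (2, 4, 8, 16):
        print("unroll by", unroll, "is", float(cycles_per_limb(unroll)), "c/l")
```

Under this model the loop overhead is 1/unroll c/l, so the figure approaches
2.5 c/l as the unrolling grows.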

The loop control is:
add $4,%ebx
jnz loop

which is two slots, and each "basic block2" has a spare slot, so maybe
putting the loop control at a different place (say after the mul?) will
make a difference.




>
> Bill.

