On Sunday 23 November 2008 18:53:46 Bill Hart wrote:
> That's very impressive!
>
> What do you mean by a slot?
>
Whatever I want it to mean!! Seeing how vague most asm docs are. A macro-op.

> I presume by ax you mean rax, etc.
>

Yeah, just lazy.

> There's also going to be some loop overhead right?

Included already.

Just been thinking about the timings I got from the different unrolling.

        mov     $0,%r9
        mul     %rcx
        add     %rax,%r8
        mov     8(%rsi,%rbx,8),%rax
        adc     %rdx,%r9
        mov     %r8,(%rdi,%rbx,8)
        mul     %rcx
        mov     $0,%r10
        add     %rax,%r9
        mov     16(%rsi,%rbx,8),%rax
        adc     %rdx,%r10
        mov     %r9,8(%rdi,%rbx,8)

Above is a "basic block2" for two limbs. This is just two of the "basic block" from before, stuck together, but with the loads shifted up.

If we assume this runs in 5 cycles, and the loop control takes an extra 1 cycle, then

unroll by 2 is (1*5+1)/2=3 c/l
unroll by 4 is (2*5+1)/4=2.75 c/l
unroll by 8 is (4*5+1)/8=2.625 c/l
unroll by 16 is (8*5+1)/16=2.5625 c/l

which matches my timings exactly, plus a little bit.

The loop control is

        add     $4,%ebx
        jnz     loop

which is two slots, and each "basic block2" has a spare slot, so maybe putting the loop control at a different place (say after the mul?) will make a difference.

>
> Bill.
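For anyone who wants to play with the numbers, here is a rough sketch of the same arithmetic in Python. It only encodes the assumptions stated above (5 cycles per two-limb "basic block2", 1 extra cycle of loop control per iteration). The names cycles_per_limb and unroll_table are made up for illustration, and this is a model, not a measurement.

```python
from fractions import Fraction

# Assumptions taken from the estimate above, not measured values.
BLOCK_CYCLES = 5      # cycles for one two-limb "basic block2"
LOOP_CYCLES = 1       # extra cycles of loop control per iteration
LIMBS_PER_BLOCK = 2   # each "basic block2" handles two limbs


def cycles_per_limb(unroll):
    """Estimated cycles per limb for a loop unrolled by `unroll` limbs."""
    if unroll <= 0 or unroll % LIMBS_PER_BLOCK != 0:
        raise ValueError("unroll must be a positive multiple of 2")
    blocks = unroll // LIMBS_PER_BLOCK
    # One iteration costs blocks * 5 cycles plus the loop control,
    # and processes `unroll` limbs.
    return Fraction(blocks * BLOCK_CYCLES + LOOP_CYCLES, unroll)


def unroll_table(unrolls=(2, 4, 8, 16)):
    """Map each unroll factor to its estimated c/l as a float."""
    return {u: float(cycles_per_limb(u)) for u in unrolls}


if __name__ == "__main__":
    for u, cpl in unroll_table().items():
        print(f"unroll by {u:2d}: {cpl} c/l")
```

Under this model the estimate is 2.5 + 1/unroll c/l, so further unrolling only approaches 2.5 c/l. Anything better would have to come from the block itself or from hiding the loop control in a spare slot.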