It seems like unrolling your block2 by 2 could be made optimal in theory. You need 2 slots for the loop control. There are 14 slots in your block2.
2*14 + 2 = 30. That would give 10/4 = 2.5 c/l.

By the way, you suggest that perhaps moving the loop control up might help. If the processor has out-of-order capability, why would this help? Is there something else that prevents that from executing earlier regardless?

Bill.

2008/11/23 <[EMAIL PROTECTED]>:
>
> On Sunday 23 November 2008 18:53:46 Bill Hart wrote:
>> That's very impressive!
>>
>> What do you mean by a slot?
>>
>
> whatever I want it to mean!! seeing how vague most asm docs are.
>
> A macro-op.
>
>> I presume by ax you mean rax, etc.
>>
>
> yeah, just lazy
>
>> There's also going to be some loop overhead right?
>
> included already
>
> Just been thinking about the timings I got from the different unrolling.
>
> mov $0,%r9
> mul %rcx
> add %rax,%r8
> mov 8(%rsi,%rbx,8),%rax
> adc %rdx,%r9
> mov %r8,(%rdi,%rbx,8)
> mul %rcx
> mov $0,%r10
> add %rax,%r9
> mov 16(%rsi,%rbx,8),%rax
> adc %rdx,%r10
> mov %r9,8(%rdi,%rbx,8)
>
> above is a "basic block2" for two limbs, this is just two of the "basic block"
> before, stuck together, but with the loads shifted up.
>
> If we assume this runs in 5 cycles, and the loop control takes an extra 1
> cycle then
> unroll by 2 is (1*5+1)/2=3 c/l
> unroll by 4 is (2*5+1)/4=2.75 c/l
> unroll by 8 is (4*5+1)/8=2.625 c/l
> unroll by 16 is (8*5+1)/16=2.5625 c/l
>
> which matches my timings exactly plus a little bit
>
> the loop control is
> add $4,%ebx
> jnz loop
>
> which is two slots, and each "basic block2" has a spare slot, maybe putting
> the loop control at a different place (say after the mul?), will make a
> difference.
>
>>
>> Bill.
>
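For anyone wanting to check the arithmetic in this thread, here is a rough sketch of both estimates in Python. The function names and parameters are made up for illustration. The figure of 3 slots per cycle is an assumption, inferred only from 30 slots being counted as 10 cycles; the 5-cycle block and 1-cycle loop control are the guesses quoted above, not measurements.

```python
from fractions import Fraction


def cycles_per_limb(unroll, block_cycles=5, loop_cycles=1, limbs_per_block=2):
    """Timing model from the quoted mail: each two-limb "basic block2"
    costs block_cycles, and the loop control adds loop_cycles once per
    iteration of the unrolled loop."""
    blocks = unroll // limbs_per_block
    return Fraction(blocks * block_cycles + loop_cycles, unroll)


def slot_bound(blocks, slots_per_block=14, loop_slots=2,
               slots_per_cycle=3, limbs_per_block=2):
    """Slot-counting bound from the reply: total slots divided by the
    slots issued per cycle (assumed to be 3), spread over all the limbs
    handled in one iteration."""
    total_slots = blocks * slots_per_block + loop_slots
    cycles = Fraction(total_slots, slots_per_cycle)
    return cycles / (blocks * limbs_per_block)


# Timing model for the unrollings listed in the quoted mail.
for unroll in (2, 4, 8, 16):
    print(f"unroll by {unroll}: {float(cycles_per_limb(unroll))} c/l")

# Two block2s per iteration: 2*14 + 2 = 30 slots, 10 cycles, 4 limbs.
print(f"slot bound: {float(slot_bound(2))} c/l")
```

Under these assumptions the timing model approaches 2.5 c/l only as the unroll factor grows, while the slot count says 2.5 c/l would already be reachable with two block2s per iteration if every slot were filled.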