On Sunday 23 November 2008 22:49:21 Jason Martin wrote:
> > You assume OOO works perfectly.
> >
> >         mov $0,%r11
> >         mul %rcx
> >         add %rax,%r10
> >         mov 24(%rsi,%rbx,8),%rax
> >         adc %rdx,%r11
> >         mov %r10,16(%rdi,%rbx,8)
> >         mul %rcx
> > here    mov $0,%r8
> >         add %rax,%r11
> >         mov 32(%rsi,%rbx,8),%rax
> >         adc %rdx,%r8
> >         mov %r11,24(%rdi,%rbx,8)
> >
> > Moving the line at "here" up one, before the mul, slows things down from
> > 2.78 to 3.03 c/l, whereas if OOO was perfect it should not have any
> > effect. This may be due to a CPU scheduler bug, or perhaps the scheduler's
> > not perfect: mul being long latency, two macro ops, two pipes, only
> > pipe 0_1, etc.
> > If it's a bug then perhaps K10 is better?
>
> I've seen similar wackiness with the core 2 out-of-order engine. It's
> strange enough that sometimes sticking in a nop actually saves a
> cycle!

Another oddity:

loop:
        mov (%rdi),%rcx
        adc %rcx,%rcx
        mov %rcx,(%rdi)
        ...                     8-way unrolled lshift by 1
        mov 56(%rdi),%r9
        adc %r9,%r9
        mov %r9,56(%rdi)
        lea 64(%rdi),%rdi
        dec %rsi
        jnz loop

runs at 1.11 c/l, whereas the rshift by 1 (i.e. with rcr instead of adc)
does not. You have to bunch the instructions up into fours to get to
1.11 c/l:

loop:
        mov (%rdi),%rcx
        mov -8(%rdi),%r8
        mov -16(%rdi),%r9
        mov -24(%rdi),%r10
        rcr $1,%rcx
        rcr $1,%r8
        rcr $1,%r9
        rcr $1,%r10
        mov %rcx,(%rdi)
        mov %r8,-8(%rdi)
        mov %r9,-16(%rdi)
        mov %r10,-24(%rdi)
        mov -32(%rdi),%rcx
        mov -40(%rdi),%r8
        mov -48(%rdi),%r9
        mov -56(%rdi),%r10
        rcr $1,%rcx
        rcr $1,%r8
        rcr $1,%r9
        rcr $1,%r10
        mov %rcx,-32(%rdi)
        mov %r8,-40(%rdi)
        mov %r9,-48(%rdi)
        mov %r10,-56(%rdi)
        lea -64(%rdi),%rdi
        dec %rsi
        jnz loop

Again, it looks like the OOO is broken. But if you look at the gmp-4.2.4
mpn_mul_1, which runs at 3 c/l, the OOO has to pull work from three
separate iterations to fill out the slots, so it evidently copes well
enough there.

While I'm at it, I've got some more complaints :)

Timing mpn_add/sub_n with the gmp speed program, the results stay fairly
consistent. You may get, say, 24.5 cycles in one run and 24.6 in another.
OK, occasionally you get 200 cycles, but I assume that's an interrupt or
some such thing. But for my mpn_com_n, which is mind-numbingly simple
(mov, not, mov), sometimes I get 20 cycles, sometimes 40, sometimes 30...
What's going on there? I don't know.

Confused.
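
For reference, here is a rough C model of what the two shift-by-1 loops
above compute. It is only a sketch under a few assumptions: limbs are
64 bits and stored least significant first, and the names limb_t, lshift1
and rshift1 are made up for illustration (they are not MPIR or GMP
functions). It models the arithmetic only and says nothing about timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical name for one 64-bit limb */

/* Left shift by one bit, in place, walking from the low limb upwards.
   'carry' plays the role of CF: "adc %rcx,%rcx" computes 2*x + CF and
   leaves the old top bit of x in CF for the next limb. */
static limb_t lshift1(limb_t *p, size_t n, limb_t carry)
{
    assert(carry <= 1);
    for (size_t i = 0; i < n; i++) {
        limb_t x = p[i];
        p[i] = (x << 1) | carry;
        carry = x >> 63;
    }
    return carry;           /* bit shifted out of the top limb */
}

/* Right shift by one bit, in place, walking from the high limb downwards.
   "rcr $1,%rcx" moves CF into the top bit of x and the old low bit of x
   into CF for the next (lower) limb. */
static limb_t rshift1(limb_t *p, size_t n, limb_t carry)
{
    assert(carry <= 1);
    for (size_t i = n; i-- > 0; ) {
        limb_t x = p[i];
        p[i] = (x >> 1) | (carry << 63);
        carry = x & 1;
    }
    return carry;           /* bit shifted out of the bottom limb */
}
```

Both assembly loops rely on the carry flag surviving from one iteration to
the next. That works because neither lea nor dec modifies CF, which is why
the loop counter is decremented with dec rather than sub.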