On Nov 23, 11:16 pm, ja...@njkfrudils.plus.com wrote:
> On Sunday 23 November 2008 22:49:21 Jason Martin wrote:
>
>
>
> > > You assume OOO works perfectly.
>
> > >        mov $0,%r11
> > >        mul %rcx
> > >        add %rax,%r10
> > >        mov 24(%rsi,%rbx,8),%rax
> > >        adc %rdx,%r11
> > >        mov %r10,16(%rdi,%rbx,8)
> > >        mul %rcx
> > > here        mov $0,%r8
> > >        add %rax,%r11
> > >        mov 32(%rsi,%rbx,8),%rax
> > >        adc %rdx,%r8
> > >        mov %r11,24(%rdi,%rbx,8)
>
> > > Moving the line at "here" up one, before the mul, slows things down from
> > > 2.78 to 3.03 c/l, whereas if OOO were perfect it should have no effect.
> > > This may be due to a cpu scheduler bug, or perhaps the scheduler just
> > > isn't perfect: mul is long latency, two macro ops, two pipes, only
> > > pipe 0_1, etc.
> > > If it's a bug then perhaps K10 is better?
>
> > I've seen similar wackiness with the Core 2 out-of-order engine. It's
> > strange enough that sometimes sticking in a nop actually saves a
> > cycle!
>
> Another oddity:
>
> loop:
>         mov     (%rdi),%rcx
>         adc     %rcx,%rcx
>         mov     %rcx,(%rdi)
> ... 8 way unrolled lshift by 1
>         mov     56(%rdi),%r9
>         adc     %r9,%r9
>         mov     %r9,56(%rdi)
>         lea     64(%rdi),%rdi
>         dec     %rsi
>         jnz     loop
>
> runs at 1.11c/l
>
> whereas the rshift by 1 (i.e. with rcr instead of adc) does not; you have to
> bunch them up into groups of four to get to 1.11c/l
>
>         mov     (%rdi),%rcx
>         mov     -8(%rdi),%r8
>         mov     -16(%rdi),%r9
>         mov     -24(%rdi),%r10
>         rcr     $1,%rcx
>         rcr     $1,%r8
>         rcr     $1,%r9
>         rcr     $1,%r10
>         mov     %rcx,(%rdi)
>         mov     %r8,-8(%rdi)
>         mov     %r9,-16(%rdi)
>         mov     %r10,-24(%rdi)
>
>         mov     -32(%rdi),%rcx
>         mov     -40(%rdi),%r8
>         mov     -48(%rdi),%r9
>         mov     -56(%rdi),%r10
>         rcr     $1,%rcx
>         rcr     $1,%r8
>         rcr     $1,%r9
>         rcr     $1,%r10
>         mov     %rcx,-32(%rdi)
>         mov     %r8,-40(%rdi)
>         mov     %r9,-48(%rdi)
>         mov     %r10,-56(%rdi)
>
>         lea     -64(%rdi),%rdi
>         dec     %rsi
>         jnz     loop
>
> Again, it looks like the OOO is broken.
> But if you look at the gmp-4.2.4 mpn_mul_1, which runs at 3c/l, the OOO has
> to get work from three separate iterations to fill out the slots.
>
> While I'm at it, I've got some more complaints :)
>
> Timing mpn_add/sub_n with the gmp speed program, the results stay fairly
> consistent. You may get, say, 24.5 cycles in one run and 24.6 in another. OK,
> occasionally you get 200 cycles, but I assume that's an interrupt or some such
> thing. But for my mpn_com_n, which is mind-numbingly simple
> (mov, not, mov), sometimes I get 20 cycles, 40 cycles, 30 cycles... What's
> going on there, I don't know.
>
> Confused.

I think I understand what's going on (at least a bit more!). My mpn_com_n
above has a cache bank conflict.

My old one was
load not store
load not store
load not store
load not store
etc

and it ran at 1.3c/l, but for some alignments of src/dst it would
run at 2.0c/l. This also appears to make the timings with the speed
program vary a lot. (Note: this suggests that if your timings vary a lot,
perhaps you're having this kind of problem.)
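
For concreteness, a minimal sketch of what such a straight load/not/store
inner loop might look like (the register choice, two-way unroll and counter
handling here are illustrative guesses, not the actual MPIR code):

loop:
        mov     (%rsi),%rax        # load a limb
        not     %rax               # complement it
        mov     %rax,(%rdi)        # store it straight back
        mov     8(%rsi),%r8        # same again for the next limb
        not     %r8
        mov     %r8,8(%rdi)
        lea     16(%rsi),%rsi
        lea     16(%rdi),%rdi
        dec     %rcx               # %rcx = number of 2-limb groups
        jnz     loop
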
A new one is

load load not not store store
load load not not store store
etc

This runs at 1.3c/l for all alignments, and the timings don't vary
(just a little bit of jitter).
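
A sketch of the same loop with the loads grouped first, then the complements,
then the stores (same illustrative assumptions as above):

loop:
        mov     (%rsi),%rax        # both loads first
        mov     8(%rsi),%r8
        not     %rax               # then both complements
        not     %r8
        mov     %rax,(%rdi)        # both stores last
        mov     %r8,8(%rdi)
        lea     16(%rsi),%rsi
        lea     16(%rdi),%rdi
        dec     %rcx
        jnz     loop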

So far I've checked some of my existing asm functions and found no problems.
add/sub/addmul/submul/mul/lshift/rshift/addlsh1/sublsh1/com are done;
of the ones MPIR has, mul_basecase is still to do.


