Hi,

On Wed, 14 August 2019 1:21 am, Torbjörn Granlund wrote:
> I saw this change go in:
>
> diff -r 118627eed635 -r bb86e66536d5 mpn/x86_64/coreihwl/gcd_11.asm
> --- a/mpn/x86_64/coreihwl/gcd_11.asm  Tue Aug 13 22:20:06 2019 +0200
> +++ b/mpn/x86_64/coreihwl/gcd_11.asm  Wed Aug 14 01:06:08 2019 +0200
> @@ -79,10 +79,10 @@
>
>          ALIGN(16)                       C
>  L(top): bsf     v0, %rcx                C
> +        mov     u0, %r9                 C
>          sub     %rax, u0                C u - v
>          cmovc   v0, u0                  C u = |u - v|
>          cmovc   %r9, %rax               C v = min(u,v)
> -        shrx(   %rcx, u0, %r9)          C
>          shrx(   %rcx, u0, u0)           C
>          mov     %rax, v0                C
>          sub     u0, v0                  C v - u
>
> What's the purpose of this change?

Failing tests :-)

> Did you time it on hwl, bwl, skl to make sure it's not slower than the
> changed code?

No.

> The double shrx was not a mistake; it sped things up quite a bit.
> (I use the same trick for zen and zen2.)

Yes, changing the loop was not the best idea, but I would have liked to
insert the correct code before the nightly tests...

Another possible solution, without changing the loop, is:

PROLOGUE(mpn_gcd_11)
        FUNC_ENTRY(2)
        mov     v0, %rax                C
        sub     u0, v0                  C
        jz      L(end)                  C
        mov     u0, %r9                 C set %r9

        ALIGN(16)                       C
L(top): bsf     v0, %rcx                C
        sub     %rax, u0                C u - v
        cmovc   v0, u0                  C u = |u - v|
        cmovc   %r9, %rax               C v = min(u,v)
        shrx(   %rcx, u0, %r9)          C
        shrx(   %rcx, u0, u0)           C
        mov     %rax, v0                C
        sub     u0, v0                  C v - u
        jnz     L(top)                  C

L(end): FUNC_EXIT()
        ret

Ĝis,
m

-- 
http://bodrato.it/papers/
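
For anyone following the register usage, here is a rough C model of the variant above. It is only a sketch: the name gcd_11_model and the variable names are made up for illustration and are not part of GMP, and __builtin_ctzll assumes GCC or Clang. Each statement mirrors one instruction, which shows why %r9 has to hold u on entry: the first cmovc may read it before any shrx has written it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the loop; not a GMP function.
   Both operands must be odd, as mpn_gcd_11 requires. */
static uint64_t gcd_11_model(uint64_t u, uint64_t v)
{
    assert((u & 1) && (v & 1));

    uint64_t rax = v;            /* mov v0, %rax                    */
    uint64_t v0 = v - u;         /* sub u0, v0   (v0 = v - u)       */
    if (v0 == 0)                 /* jz L(end)                       */
        return rax;
    uint64_t r9 = u;             /* mov u0, %r9  (the "set %r9")    */

    do {
        /* bsf: count trailing zeros of v - u, which has the same
           trailing zeros as u - v. */
        unsigned cnt = (unsigned)__builtin_ctzll(v0);
        int borrow = u < rax;    /* carry flag of the sub below     */
        u = u - rax;             /* sub %rax, u0   (u - v)          */
        if (borrow) {
            u = v0;              /* cmovc v0, u0   (u = |u - v|)    */
            rax = r9;            /* cmovc %r9, %rax (v = min(u,v))  */
        }
        r9 = u >> cnt;           /* shrx %rcx, u0, %r9              */
        u = u >> cnt;            /* shrx %rcx, u0, u0               */
        v0 = rax - u;            /* mov %rax, v0 ; sub u0, v0       */
    } while (v0 != 0);           /* jnz L(top)                      */

    return rax;
}
```

In this model, dropping the r9 = u initialisation leaves the first rax = r9 reading an unset value whenever u < v on entry.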