Marco Bodrato <bodr...@mail.dm.unipi.it> writes:

> Using masks does not always give the fastest code. I tried the
> following variation on Niels' code, and, on my laptop with "g++-10 -O2
> -mtune=icelake-client -march=icelake-client", the resulting code is
> comparable (faster?) with the current asm.

Cool! 

As for assembly, it looks like we currently only have code for x86_64/
and x86_64/k8/. I think it's possible to do something more clever on
more recent processors with mulx; e.g., it would be pretty neat to keep
the u1 recurrence variable in the special %rdx register, which mulx
uses as its implicit multiplicand.
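
To make the mulx remark concrete, here is a minimal sketch in C, not
actual GMP code. It models what mulx computes (a full 64x64->128 bit
product with one operand taken implicitly from %rdx) and shows a loop
step in which u1 is the multiplicand of two products. The names
mulx_model, two_products, dinv and b2 are hypothetical, and the sketch
assumes a compiler with unsigned __int128 (GCC or Clang on a 64-bit
target).

```c
#include <stdint.h>

/* Portable model of the x86-64 mulx instruction: returns the low limb
   of the 128-bit product rdx * src and stores the high limb in *hi.
   The real instruction takes rdx as an implicit operand and leaves the
   flags untouched. */
static uint64_t
mulx_model (uint64_t rdx, uint64_t src, uint64_t *hi)
{
  unsigned __int128 p = (unsigned __int128) rdx * src;
  *hi = (uint64_t) (p >> 64);
  return (uint64_t) p;
}

/* Hypothetical loop step (an assumption for illustration only): both
   products share the multiplicand u1, so in assembly u1 could stay in
   %rdx across both mulx instructions. */
static void
two_products (uint64_t u1, uint64_t dinv, uint64_t b2,
              uint64_t q[2], uint64_t p[2])
{
  q[0] = mulx_model (u1, dinv, &q[1]);  /* q[1]:q[0] = u1 * dinv */
  p[0] = mulx_model (u1, b2, &p[1]);    /* p[1]:p[0] = u1 * b2 */
}
```

If the next value of u1 can also be produced directly in %rdx, the
loop needs no extra register moves before each multiplication.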

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
