Marco Bodrato <bodr...@mail.dm.unipi.it> writes:

> Using masks does not always give the fastest code. I tried the
> following variation on Niels' code, and, on my laptop with "g++-10 -O2
> -mtune=icelake-client -march=icelake-client", the resulting code is
> comparable (faster?) with the current asm.
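The code being compared is not included here. As a generic illustration only, the two styles of conditional adjustment under discussion (an explicit branch versus a computed all-ones/all-zeros mask) can be sketched in C. The function names `reduce_branch` and `reduce_mask` are hypothetical, and this is not Niels' or Marco's actual code. Which form compiles to faster code depends on the compiler, the flags, and the target CPU, which is the point of the measurement quoted above.

```c
#include <stdint.h>

/* Hypothetical helper: conditional subtraction written with a branch.
   The compiler may emit a conditional jump or a cmov for this. */
static uint64_t reduce_branch(uint64_t r, uint64_t d)
{
  if (r >= d)
    r -= d;
  return r;
}

/* Hypothetical helper: the same adjustment written with a mask.
   (r >= d) is 0 or 1; negating it gives 0 or all ones, so
   (d & mask) is either 0 or d, and no branch is needed. */
static uint64_t reduce_mask(uint64_t r, uint64_t d)
{
  uint64_t mask = -(uint64_t)(r >= d);
  return r - (d & mask);
}
```

Both functions compute the same result; they differ only in the instruction sequence the compiler is likely to generate.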
Cool! For assembly, it looks like we currently only have assembly for
x86_64/ and x86_64/k8/. I think it's possible to do something more
clever on more recent processors with mulx. For example, since mulx
reads one of its factors implicitly from %rdx, it will get pretty neat
to keep the u1 recurrence variable in that special register.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel