Re: Fast constant-time gcd computation and modular inversion

2022-05-28 Thread Albin Ahlbäck
On 24/05/22 11:59, Niels Möller wrote: > Hi, I've had a first look at the paper by djb and Bo-Yin Yang, > https://eprint.iacr.org/2019/266.pdf. Mainly focusing on the integer > case. Have you looked at https://eprint.iacr.org/2020/972.pdf, where the author seems to suggest an even faster

Re: mpn_mulhigh_basecase for Broadwell

2024-02-13 Thread Albin Ahlbäck
copyright claim for FLINT and put my name (spelled Albin Ahlbäck) in the GMP copyright claim instead. Just some notes: - We use our own M4 syntax for the beginning and ending of the function, but it should be easy to translate to GMP's syntax. - It currently only works for n >

mpn_mulhigh_basecase for Broadwell

2024-02-06 Thread Albin Ahlbäck
and put my name (spelled Albin Ahlbäck) in the GMP copyright claim instead. Just some notes: - We use our own M4 syntax for the beginning and ending of the function, but it should be easy to translate to GMP's syntax. - It currently only works for n > 5 (I believe) as we in FLINT have speciali

Options for x86_64 "versions" for system types

2024-03-07 Thread Albin Ahlbäck
Hi, I'm sure that many of you have heard that many Linux distributions are playing with the idea of setting the baseline for what CPU architecture is supported regarding x86_64 CPUs, mainly "x86_64-v3". I cannot see that GMP today allows specifying `--host=x86_64-v3-...' or something

Re: Options for x86_64 "versions" for system types

2024-03-08 Thread Albin Ahlbäck
On 3/8/24 07:31, Niels Möller wrote: Albin Ahlbäck writes: I cannot see that GMP today allows specifying `--host=x86_64-v3-...' or something similar. I suppose that packagers specifying such a host triplet could yield speedups for (most?) users downloading precompiled binaries

Re: How to calculate cycles/limb in assembly routines

2024-04-04 Thread Albin Ahlbäck
Thanks for the fast and helpful reply! I see, I definitely need to read up on the CPU pipelines. I also tested one of your automated scripts for measuring cycles per limb for a variety of functions, and it checks out. Anyway, with regard to the performance of multiplication: I did manage to

Re: How to calculate cycles/limb in assembly routines

2024-04-05 Thread Albin Ahlbäck
Thanks for the further explanation, Niels! > For an assembly loop, one can find out from properties of the > processor what cycle counts are implied by these three limits. It's > often possible (but tedious) to tweak scheduling to get an actual > speed pretty close to the limit. And it aids

How to calculate cycles/limb in assembly routines

2024-04-04 Thread Albin Ahlbäck
Hello, I am looking at Torbjörn's `aorsmul_1.asm' for Apple M1, and I am having trouble understanding how the cycles per limb number was calculated. As I understand it, the cycles per limb number represents the loop(s) in any routine. Looking at the main loop, it seems like it should scale