On 24/05/22 11:59, Niels Möller wrote:
> Hi, I've had a first look at the paper by djb and Bo-Yin Yang,
> https://eprint.iacr.org/2019/266.pdf. Mainly focusing on the integer
> case.
Have you looked at https://eprint.iacr.org/2020/972.pdf, where the
author seems to suggest an even faster
copyright
claim for FLINT and put my name (spelled Albin Ahlbäck) in the GMP
copyright claim instead.
Just some notes:
- We use our own M4 syntax for the beginning and ending of the
function, but it should be easy to translate to GMP's syntax.
- It currently only works for n > 5 (I believe) as we in FLINT have
speciali
Hi,
I'm sure that many of you have heard that many Linux distributions are
playing with the idea of setting a baseline for which x86_64 CPU
architecture level is supported, mainly "x86_64-v3".
I cannot see that GMP today allows specifying `--host=x86_64-v3-...' or
something
On 3/8/24 07:31, Niels Möller wrote:
> Albin Ahlbäck writes:
>> I cannot see that GMP today allows specifying `--host=x86_64-v3-...'
>> or something similar. I suppose that packagers specifying such a host
>> triplet could yield speedups for (most?) users downloading precompiled
>> binaries
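For readers unfamiliar with the "x86_64-v3" level mentioned above, here is a rough sketch of what such a baseline requires of a CPU. It is not part of GMP or its configure machinery. The flag names are the ones Linux prints on the "flags" line of /proc/cpuinfo (LZCNT shows up there as "abm", SSE3 as "pni"), and the helper names `supports_x86_64_v3` and `read_cpu_flags` are made up for this illustration. The plain x86-64 baseline features are assumed present and are not checked.

```python
# Features added by x86-64-v2 on top of the baseline, spelled the way
# Linux reports them in /proc/cpuinfo.
X86_64_V2_FLAGS = {"cx16", "lahf_lm", "popcnt", "pni",
                   "sse4_1", "sse4_2", "ssse3"}

# x86-64-v3 adds AVX, AVX2, BMI1, BMI2, F16C, FMA, LZCNT ("abm"),
# MOVBE and XSAVE on top of v2.
X86_64_V3_FLAGS = X86_64_V2_FLAGS | {"avx", "avx2", "bmi1", "bmi2", "f16c",
                                     "fma", "abm", "movbe", "xsave"}


def supports_x86_64_v3(flags):
    """Return True if every v2 and v3 feature flag is in `flags`."""
    return X86_64_V3_FLAGS <= set(flags)


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the CPU flags of the first core, or an empty set if the
    file is missing or has no "flags" line (e.g. on non-Linux systems)."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()
```

On a Linux machine, `supports_x86_64_v3(read_cpu_flags())` would tell whether a binary built for that level could be expected to run there.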
Thanks for the fast and helpful reply!
I see, I definitely need to read up on the CPU pipelines. I also tested
one of your automated scripts for measuring cycles per limb for a
variety of functions, and it checks out.
Anyway, with regard to the performance of multiplication: I did manage to
Thanks for the further explanation, Niels!
> For an assembly loop, one can find out from properties of the
> processor what cycle counts are implied by these three limits. It's
> often possible (but tedious) to tweak scheduling to get an actual
> speed pretty close to the limit. And it aids
Hello,
I am looking at Torbjörn's `aorsmul_1.asm' for Apple M1, and I am having
trouble understanding how the cycles per limb number was calculated.
As I understand it, the cycles per limb number describes the cost of the
main loop(s) of a routine. Looking at the main loop, it seems like it should scale
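
To make the "cycles per limb as a property of the loop" reading concrete, here is a small sketch. It assumes the total cycle count is roughly a fixed overhead plus a per-limb cost, so that measuring at two operand sizes cancels the overhead. This is only an illustration of that reading, not how the number in `aorsmul_1.asm' was obtained; the function name and the sample figures are made up.

```python
def cycles_per_limb(n1, cycles1, n2, cycles2):
    """Estimate the per-limb cost from two measurements.

    Assumes cycles(n) ~= overhead + c * n, so the slope between the two
    points (n1, cycles1) and (n2, cycles2) is c, the cycles per limb.
    """
    if n1 == n2:
        raise ValueError("need two different operand sizes")
    return (cycles2 - cycles1) / (n2 - n1)


# Hypothetical measurements: 250 cycles for 100 limbs, 400 for 200 limbs.
estimate = cycles_per_limb(100, 250, 200, 400)  # 1.5 cycles per limb
```

With these made-up figures the fixed overhead would be 100 cycles, which is why dividing a single total by n would overstate the loop cost.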