For a release, we need to fix a performance issue noted by Marcel Keller.
It shows up on Ivy Bridge when linking against MPIR with pthreads vs
without (yes it's very strange).

A program used by Marcel to get timings is attached.

Also attached is addmul_1.asm by Jens Nurmann, which tries to fix the
issue, along with a similar file, addmul_1.opt by Marcel, which also fixes
the issue.

Unfortunately, speed says Jens' code is faster, but Marcel's program says
his own is much faster. This is likely down to the aforementioned issues with
speed on Intel processors.
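
One way to cross-check the two measurements is to time each candidate several
times with a monotonic clock and keep the fastest run. The sketch below is only
an illustration of that idea: best_of and busy_work are hypothetical names, not
part of MPIR or of speed, and busy_work merely stands in for a loop that calls
mpn_addmul_1.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Seconds elapsed between two clock_gettime readings. */
static double elapsed_seconds(const struct timespec *start,
                              const struct timespec *stop)
{
    return (double)(stop->tv_sec - start->tv_sec)
         + 1e-9 * (double)(stop->tv_nsec - start->tv_nsec);
}

/* Hypothetical helper: run fn(arg) `trials` times and return the
   fastest wall-clock time in seconds. */
static double best_of(void (*fn)(void *), void *arg, int trials)
{
    double best = -1.0;

    assert(trials > 0);
    for (int k = 0; k < trials; k++) {
        struct timespec start, stop;
        /* CLOCK_MONOTONIC is not affected by wall-clock adjustments. */
        clock_gettime(CLOCK_MONOTONIC, &start);
        fn(arg);
        clock_gettime(CLOCK_MONOTONIC, &stop);
        double s = elapsed_seconds(&start, &stop);
        if (best < 0.0 || s < best)
            best = s;
    }
    return best;
}

/* Hypothetical stand-in workload; a real test would call
   mpn_addmul_1 in a loop here instead. */
static void busy_work(void *arg)
{
    volatile unsigned long *counter = arg;
    for (int i = 0; i < 1000; i++)
        (*counter)++;
}
```

Calling best_of(busy_work, &counter, 5) returns the shortest of five runs.
Taking the minimum rather than the mean is a design choice that discounts runs
disturbed by other activity on the machine.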

This should be resolved and the correct version included in MPIR (and the
Ivy Bridge tuning may need to be redone). Of course, try will also need to
be run for 24 hours on whichever code is chosen, to ensure it is correct.
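
As a quick sanity check before the long try run (not a substitute for it), a
candidate can be compared against a plain C reference for the documented
mpn_addmul_1 semantics: add {xp, n} * y into {rp, n} and return the carry-out
limb. The sketch below assumes an illustrative 32-bit limb type, limb_t, so it
stays portable. It does not use mp_limb_t, and ref_addmul_1 and limbs_equal are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative 32-bit limb (an assumption for this sketch);
   MPIR's mp_limb_t is normally 64 bits on x86-64. */
typedef uint32_t limb_t;

/* Reference for the addmul_1 operation:
   {rp, n} += {xp, n} * y, returning the carry-out limb. */
static limb_t ref_addmul_1(limb_t *rp, const limb_t *xp, size_t n, limb_t y)
{
    limb_t carry = 0;

    assert(rp != NULL && xp != NULL);
    for (size_t i = 0; i < n; i++) {
        /* xp[i]*y + rp[i] + carry is at most 2^64 - 1,
           so the 64-bit accumulator cannot overflow. */
        uint64_t acc = (uint64_t)xp[i] * y + rp[i] + carry;
        rp[i] = (limb_t)acc;          /* low limb of the sum  */
        carry = (limb_t)(acc >> 32);  /* high limb carries on */
    }
    return carry;
}

/* Returns 1 when two limb vectors of length n are identical. */
static int limbs_equal(const limb_t *a, const limb_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

A comparison would run the candidate and ref_addmul_1 on copies of the same
inputs, then check both the returned carry and the result limbs with
limbs_equal.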


Attachment: addmul_1.opt.s
Description: Binary data

#include <mpir.h>
#include <stdio.h>   /* printf */
#include <stdlib.h>
#include <time.h>

const int t = 5;     /* limbs per operand */
const int n = 1e8;   /* number of mpn_addmul_1 calls to time */

void mpn(mp_limb_t* zz, mp_limb_t* x, mp_limb_t y)
{
    struct timespec start, stop;
    clock_gettime(CLOCK_REALTIME, &start);
    for (int i = 0; i < n; i++)
        zz[t] = mpn_addmul_1(zz, x, t, y);
    clock_gettime(CLOCK_REALTIME, &stop);
    printf("mpn_addmul_1: %f\n", 1e-9 * (stop.tv_nsec - start.tv_nsec) +
            (stop.tv_sec - start.tv_sec));
}

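int main()
{
    /* Initialize the operands so the timed loop does not read
       indeterminate values. */
    mp_limb_t x[t+1], y = 3, z[t+1];
    for (int i = 0; i <= t; i++) {
        x[i] = i + 1;
        z[i] = 0;
    }
    mpn(z, x, y);
}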

Attachment: addmul_1.asm
Description: Binary data
