Hi,

I've dug a bit deeper, and it seems that there is an alignment issue within 
addmul_1. I've created two marginally different programs, of which one is 
much faster:

$ gcc -O3 addmul_1.s -o addmul_1.o -c; for i in a b; do gcc -O3 $i.cpp -o $i.out addmul_1.o; ./$i.out ; done
Time: 0.490647
Time: 0.681671

addmul_1.s is the Sandy Bridge-optimized assembly. When I change the 
alignment there a bit as in addmul_1.opt.s, the difference disappears:

$ gcc -O3 addmul_1.opt.s -o addmul_1.o -c; for i in a b; do gcc -O3 $i.cpp -o $i.out addmul_1.o; ./$i.out ; objdump -CSD $i.out > $i.dis ; done
Time: 0.505714
Time: 0.495640
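
For anyone who wants to check the placement on their own build, here is a 
minimal sketch (not part of the attached test) that prints where a code 
address falls relative to a few power-of-two boundaries. The helper names 
offset_in_block and report_alignment are made up for illustration, and the 
choice of 16, 32 and 64 bytes is an assumption to experiment with; I have 
not established which boundary actually matters here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Offset of `addr` inside an aligned block of `boundary` bytes.
// `boundary` must be a power of two; a result of 0 means "aligned".
std::uintptr_t offset_in_block(std::uintptr_t addr, std::uintptr_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return addr & (boundary - 1);
}

// Print where a code address falls relative to a few candidate boundaries.
// The boundaries (16, 32, 64 bytes) are assumptions to experiment with.
void report_alignment(const char* name, std::uintptr_t addr)
{
    const std::uintptr_t boundaries[] = {16, 32, 64};
    for (std::uintptr_t b : boundaries)
        std::printf("%s: offset within %2llu-byte block = %llu\n", name,
                    (unsigned long long)b,
                    (unsigned long long)offset_in_block(addr, b));
}
```

From common.h this could be called as 
report_alignment("addmul_1", reinterpret_cast<std::uintptr_t>(&__gmpn_addmul_1)). 
It only reports the entry point, so the position of the loop inside the 
routine still has to be read from the objdump output.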

Best regards,
Marcel


On Friday, August 11, 2017 at 7:00:10 PM UTC+1, Bill Hart wrote:
>
> We've noticed similar sorts of things. One possibility is that the loop in 
> your test code is not aligned as well in one version. Or perhaps your stack 
> is hitting the same location modulo 4096, which is a known issue on some 
> modern processors. There might be SSE code in the linker and AVX code in 
> the addmul_1 function. The kernel might pin the process to a different CPU 
> which is slightly slower or faster, when the pthreads library is used. You 
> might also hit some frequency scaling in the CPU due to the pthreads 
> library taking longer to link in. There are so many possibilities on a modern 
> CPU, it hardly bears thinking about.
>
> Also, in your code, you don't seem to set y anywhere and I wasn't aware 
> you could use 1e8 as an int constant.
>
> Bill.
>
> On 11 August 2017 at 18:10, Marcel Keller <m.ke...@bristol.ac.uk> wrote:
>
>> Hi,
>>
>> I've noticed that the performance of mpn_addmul_1 can depend considerably 
>> on whether I link against libpthread, which strikes me as very weird:
>>
>> $ g++ -O3 Time-addmul_1.cpp ~/src/mpir-3.0.0-ivybridge/mpn/addmul_1.o -o a.out
>>
>> $ g++ -O3 Time-addmul_1.cpp ~/src/mpir-3.0.0-ivybridge/mpn/addmul_1.o -o b.out -lpthread
>>
>> $ ./a.out
>> mpn_addmul_1: 0.506279
>>
>> $ ./b.out
>> mpn_addmul_1: 0.682086
>>
>> Disassembling the binaries shows that the mpn function in 
>> Time-addmul_1.cpp is compiled exactly the same way.
>>
>> I'm running CentOS 7 and GCC 6.2. The source as well as the outputs are 
>> attached.
>>
>> Does anyone have any idea why this could be?
>>
>> Best regards,
>> Marcel
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"mpir-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to mpir-devel+unsubscr...@googlegroups.com.
To post to this group, send email to mpir-devel@googlegroups.com.
Visit this group at https://groups.google.com/group/mpir-devel.
For more options, visit https://groups.google.com/d/optout.

Attachment: b.out
Description: Binary data

Attachment: a.out
Description: Binary data

#include "common.h"

void g()
{
}

int main()
{
    mp_limb_t x[t+1], y, z[t+1];
    mpn(x, y, z);
}

#include <mpir.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

const int t = 5;
const int n = 1e8;

void f();

void mpn(mp_limb_t* x, mp_limb_t y, mp_limb_t* z)
{
    //f();
    //mp_limb_t x[t+1], y, z[t+1];
    struct timespec start, stop;
    clock_gettime(CLOCK_REALTIME, &start);
    for (int i = 0; i < n; i++)
        z[t] = mpn_addmul_1(z, x, t, y);
    clock_gettime(CLOCK_REALTIME, &stop);
    printf("Time: %f\n", 1e-9 * (stop.tv_nsec - start.tv_nsec) +
            (stop.tv_sec - start.tv_sec));
}

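void f()
{
    // Print the code addresses of the caller and of addmul_1, plus their
    // distance. Pointers need %p; the difference is a long, hence %lx.
    printf("mpn: %p\n", (void*)mpn);
    printf("add_mul_1: %p\n", (void*)__gmpn_addmul_1);
    printf("diff: %lx\n", (long)__gmpn_addmul_1 - (long)mpn);
}

#include "common.h"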

int main()
{
    mp_limb_t x[t+1], y, z[t+1];
    mpn(x, y, z);
}

Attachment: addmul_1.opt.s
Description: Binary data

Attachment: addmul_1.s
Description: Binary data
