From: mpir-devel@googlegroups.com [mailto:mpir-devel@googlegroups.com] On 
Behalf Of Bill Hart
Sent: Wednesday, February 26, 2014 12:36 PM
To: mpir-devel
Subject: [mpir-devel] Chasing ghosts. Any ideas?

For two days now I've been trying to find out why the 8192 x 8064 bit division 
in our benchmark is slower in MPIR than in GMP on K10.

Here is what the benchmark calls:

* mpz_tdiv_q -- 128 x 126 limbs
* mpn_tdiv_q -- 128 x 126 limbs
* mpn_sb_divappr_q -- 7 x 4 limbs

Here is what I have tried:

* corrected some bugs in speed (not relevant to the benchmark)
* timed mpn_sb_divappr_q at 7 x 4 limbs using speed : GMP is about 5-10% slower
* timed mpn_tdiv_q (the code that is executed is identical to that in GMP) : 
GMP is 1% faster
* tried replacing the MPIR mpz_tdiv_q code with the GMP code : no change
* GMP does memory allocation at the mpz level and mpn level, we do it only at 
the mpn level. I tried changing this : no change
* I timed mpn_sub_n, mpn_lshift_n, mpn_copy (the functions called along the 
way) : GMP is slower or the same speed for all of these
* I found and removed an orphaned memory allocation in mpz_tdiv_q : no change
* I tried combining two memory allocations in our mpn code into one : this 
slowed it down even more
* I checked the precomputed inverse uses almost identical code, it's probably 1 
or 2 cycles slower in MPIR, but this is far too small to make a difference
* I made sure the same random numbers were being generated for MPIR and GMP
* I made sure our benchmark code generated new random numbers every 1024 
iterations to ensure the algorithms weren't affected by the choice of numbers 
(in fact the time varies a lot when the numbers change)
* I tried the same compiler flags as GMP uses : no significant change
* tried --enable-alloca : no change
* tried both static and dynamic linking : makes at most 1% difference
* checked that MPIR is faster at this benchmark than GMP on penryn as expected
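
To make the operand-generation points above concrete, here is a minimal sketch
in Python of a harness that uses a fixed seed and draws new operands every 1024
iterations. It is not the MPIR benchmark code: the names random_operands and
bench, the seed value, and the 64-bit limb size are assumptions for
illustration, and Python integers stand in for the mpn calls.

```python
import random
import time

LIMB_BITS = 64          # assumption: 64-bit limbs
REFRESH_EVERY = 1024    # new operands every 1024 iterations, as described above


def random_operands(rng, num_limbs=128, den_limbs=126):
    """Return (numerator, denominator) that fill exactly the requested limbs."""
    n_bits = num_limbs * LIMB_BITS
    d_bits = den_limbs * LIMB_BITS
    # Force the top bit so each operand really occupies all of its limbs.
    n = rng.getrandbits(n_bits) | (1 << (n_bits - 1))
    d = rng.getrandbits(d_bits) | (1 << (d_bits - 1))
    return n, d


def bench(iterations, seed=12345):
    """Time the divisions only; return (elapsed_ns, sum of quotients)."""
    rng = random.Random(seed)   # same seed gives the same operand stream
    checksum = 0
    elapsed = 0
    for i in range(iterations):
        if i % REFRESH_EVERY == 0:
            n, d = random_operands(rng)   # refresh outside the timed region
        start = time.perf_counter_ns()
        q = n // d                        # operands are positive, so this truncates
        elapsed += time.perf_counter_ns() - start
        checksum += q
    return elapsed, checksum
```

Comparing the returned checksum between two runs is a cheap way to confirm
that both libraries were fed the same numbers before comparing their timings.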

mpn_tdiv_q takes around 500 cycles, so we are talking about 30 cycles here. 
mpn_sb_divappr_q takes a little over 1/3 of that time.
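
The cycle budget can be spelled out with a few lines of Python. This is only
arithmetic on the approximate figures quoted above; the function name
cycle_budget is made up for illustration, and "a little over 1/3" is treated
as exactly 1/3.

```python
TDIV_Q_CYCLES = 500        # approximate cost of mpn_tdiv_q quoted above
GAP_CYCLES = 30            # approximate MPIR-vs-GMP difference quoted above
SB_DIVAPPR_SHARE = 1 / 3   # "a little over 1/3", so this is a lower bound


def cycle_budget(total=TDIV_Q_CYCLES, gap=GAP_CYCLES, share=SB_DIVAPPR_SHARE):
    """Split the total into mpn_sb_divappr_q and everything else."""
    sb_cycles = total * share
    rest_cycles = total - sb_cycles
    return {
        "gap_vs_total": gap / total,        # gap as a fraction of mpn_tdiv_q
        "sb_divappr_cycles": sb_cycles,     # cycles spent in mpn_sb_divappr_q
        "rest_cycles": rest_cycles,         # cycles spent everywhere else
        "gap_vs_rest": gap / rest_cycles,   # gap if it lies entirely outside sb_divappr
    }
```

With these inputs the gap is 6% of the whole call, but about 9% of the code
outside mpn_sb_divappr_q, which is where it has to be hiding if
mpn_sb_divappr_q itself is not slower.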

Anyone have any brainwaves? I'm completely and utterly out of ideas. I've tried 
absolutely everything. I cannot think of a single additional thing to try!

1-3% is believable due to random C compiler issues. 6-7% is just not 
believable. And in fact it is probably more than that since mpn_sb_divappr_q is 
faster in MPIR than GMP as are some of the other mpn functions called along the 
way.
>>
Suggestion:
In the debugger, go to assembly mode and step through both the GMP code and the 
MPIR code for the hotspot where GMP performs better.
Write out the actual instructions executed in both cases.
If GMP's code is faster, then some different instructions are being executed.
I think the only way to know for sure what is different is to examine it 
carefully yourself.
<<
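
Once both instruction traces are written out, the comparison itself can be
automated. The following is a rough sketch using Python's difflib; it assumes
each trace is a list of text lines, one instruction per line, with an address
prefix loosely shaped like debugger disassembly output. The line format and
the names normalize and diff_traces are assumptions for illustration.

```python
import difflib
import re

# Assumed line shape, loosely modelled on debugger disassembly output:
#   "0x00007ffff7b5a3c0 <+16>:  mov    (%rsi),%rax"
_ADDRESS_PREFIX = re.compile(
    r"^\s*(?:=>\s*)?0x[0-9a-fA-F]+\s*(?:<[^>]*>)?\s*:?\s*"
)


def normalize(line):
    """Drop the address prefix and collapse whitespace, keeping the instruction."""
    return " ".join(_ADDRESS_PREFIX.sub("", line).split())


def diff_traces(gmp_trace, mpir_trace):
    """Return (tag, gmp_instructions, mpir_instructions) for each differing run."""
    gmp = [normalize(line) for line in gmp_trace if line.strip()]
    mpir = [normalize(line) for line in mpir_trace if line.strip()]
    matcher = difflib.SequenceMatcher(a=gmp, b=mpir, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":   # keep only replaced, inserted, or deleted runs
            changes.append((tag, gmp[i1:i2], mpir[j1:j2]))
    return changes
```

Stripping the addresses first matters because the two libraries are loaded at
different locations, so every line would otherwise look different.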
