From: mpir-devel@googlegroups.com [mailto:mpir-devel@googlegroups.com] On Behalf Of Bill Hart
Sent: Wednesday, February 26, 2014 12:36 PM
To: mpir-devel
Subject: [mpir-devel] Chasing ghosts. Any ideas?
For two days now I've been trying to find out why the 8192 x 8064 bit division in our benchmark is slower on K10 with MPIR than with GMP.

Here is what the benchmark calls:

* mpz_tdiv_q -- 128 x 126 limbs
* mpn_tdiv_q -- 128 x 126 limbs
* mpn_sb_divappr_q -- 7 x 4 limbs

Here is what I have tried:

* corrected some bugs in speed (not relevant to the benchmark)
* timed mpn_sb_divappr_q at 7 x 4 limbs using speed: GMP is about 5-10% slower
* timed mpn_tdiv_q (the code that is executed is identical to that in GMP): GMP is 1% faster
* tried replacing the MPIR mpz_tdiv_q code with the GMP code: no change
* GMP does memory allocation at both the mpz level and the mpn level; we do it only at the mpn level. I tried changing this: no change
* timed mpn_sub_n, mpn_lshift_n and mpn_copy (the functions called along the way): GMP is slower or the same speed for all of these
* found and removed an orphaned memory allocation in mpz_tdiv_q: no change
* tried combining two memory allocations in our mpn code into one: this slowed it down even more
* checked that the precomputed inverse uses almost identical code; it's probably 1 or 2 cycles slower in MPIR, but this is far too small to make a difference
* made sure the same random numbers were being generated for MPIR and GMP
* made sure our benchmark code generated new random numbers every 1024 iterations, so that the timings weren't skewed by one particular choice of operands (in fact the time varies a lot when the numbers change)
* tried the same compiler flags as GMP uses: no significant change
* tried --enable-alloca: no change
* tried both static and dynamic linking: makes at most 1% difference
* checked that MPIR is faster than GMP at this benchmark on Penryn, as expected

mpn_tdiv_q takes around 500 cycles, so a gap of about 6% amounts to roughly 30 cycles. mpn_sb_divappr_q takes a little over 1/3 of that time.

Anyone have any brainwaves? I'm completely and utterly out of ideas. I've tried absolutely everything.
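The 7 x 4 size for mpn_sb_divappr_q is consistent with the 128 x 126 operands if mpn_tdiv_q truncates the way GMP's generic mpn_div_q does when the quotient is much shorter than the divisor: with qn = nn - dn + 1 quotient limbs, only the top 2qn+1 numerator limbs and qn+1 divisor limbs are needed for an approximate quotient. The sketch below assumes 64-bit limbs and ignores any extra limb added during normalisation; the helper names are hypothetical. It reproduces the sizes listed above and the cycle figure for the gap.

```python
# Hypothetical helpers; assumes 64-bit limbs (x86-64 builds such as K10/Penryn).
LIMB_BITS = 64


def limbs(bits: int) -> int:
    """Limbs needed to hold a `bits`-bit operand (ceiling division)."""
    return -(-bits // LIMB_BITS)


def call_chain_sizes(n_bits: int, d_bits: int):
    """Return ((nn, dn), divappr_sizes) for an n_bits / d_bits truncating division.

    divappr_sizes is (numerator limbs, divisor limbs) handed to the schoolbook
    approximate-quotient routine when the quotient is shorter than the divisor,
    assuming GMP-style truncation; None otherwise.
    """
    nn, dn = limbs(n_bits), limbs(d_bits)
    qn = nn - dn + 1  # quotient limbs; the high limb may turn out to be zero
    if qn < dn:
        # Only the top 2*qn + 1 limbs of N and qn + 1 limbs of D matter.
        return (nn, dn), (min(2 * qn + 1, nn), qn + 1)
    return (nn, dn), None


def gap_cycles(total_cycles: int, gap_percent: int) -> float:
    """Cycles corresponding to a percentage slowdown of a call."""
    return total_cycles * gap_percent / 100


if __name__ == "__main__":
    print(call_chain_sizes(8192, 8064))  # ((128, 126), (7, 4))
    print(gap_cycles(500, 6))            # 30.0
```

Ceiling division is used in limbs() so that a size that is not a whole number of limbs is rounded up; 8092 bits, for instance, would need 127 limbs rather than the 128 listed for this benchmark.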
I cannot think of a single additional thing to try! 1-3% is believable due to random C compiler issues. 6-7% is just not believable. And in fact the real gap is probably larger than that, since mpn_sb_divappr_q is faster in MPIR than in GMP, as are some of the other mpn functions called along the way.

>> Suggestion:
In the debugger, go to assembly mode and step through both the GMP code and the MPIR code for the hotspot where GMP performs better. Write out the actual instructions executed in both cases. If GMP's code is faster, then some different instructions are being executed. I think the only way to know for sure what is different is to examine it carefully yourself.
<<
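One way to follow this suggestion is to log every instruction executed in each build (for example by single-stepping with gdb's `stepi` while `display/i $pc` prints the current instruction) and then compare the two logs mechanically. The Python sketch below assumes each log has one instruction per line, optionally prefixed by a gdb-style `=> 0xADDR <symbol+offset>:`; the log format and the function names are assumptions, not part of GMP, MPIR or gdb. It compares mnemonics only, since addresses and register choices will legitimately differ between the two builds.

```python
import re
from collections import Counter

# Optional gdb-style "=> 0xADDR <symbol+offset>:" prefix (assumed log format).
_PREFIX = re.compile(r"^\s*(?:=>\s*)?0x[0-9a-fA-F]+(?:\s*<[^>]*>)?:\s*")


def mnemonics(lines):
    """Reduce trace lines to instruction mnemonics, dropping addresses/operands."""
    result = []
    for line in lines:
        text = _PREFIX.sub("", line, count=1).strip()
        if text:
            result.append(text.split()[0])
    return result


def compare_traces(a_lines, b_lines):
    """Compare two instruction traces.

    Returns (first_divergence, count_diff): the index of the first differing
    mnemonic (None if the sequences are identical) and a dict mapping each
    mnemonic whose counts differ to (count in a) - (count in b).
    """
    a, b = mnemonics(a_lines), mnemonics(b_lines)
    first = next((i for i, (x, y) in enumerate(zip(a, b)) if x != y), None)
    if first is None and len(a) != len(b):
        first = min(len(a), len(b))  # one trace is a prefix of the other
    diff = Counter(a)
    diff.subtract(Counter(b))
    return first, {m: c for m, c in diff.items() if c != 0}


if __name__ == "__main__":
    # Made-up illustrative traces, not real GMP or MPIR output.
    trace_a = [
        "=> 0x401000 <f+0>:\tmov    %rdi,%rax",
        "=> 0x401003 <f+3>:\tadd    $0x1,%rax",
        "=> 0x401007 <f+7>:\tret",
    ]
    trace_b = [
        "=> 0x402000 <g+0>:\tmov    %rdi,%rax",
        "=> 0x402003 <g+3>:\tlea    0x1(%rax),%rax",
        "=> 0x402007 <g+7>:\tadd    $0x0,%rax",
        "=> 0x40200b <g+11>:\tret",
    ]
    print(compare_traces(trace_a, trace_b))  # (1, {'lea': -1})
```

A non-zero count difference shows which mnemonics one build executes more often, and the first-divergence index tells you where to start reading the two logs side by side.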