Merry Christmas Brian (and all). I used to have an Athlon XP, but no longer. However, I do have an AMD Turion 64 X2 and of course have access to Opterons. I'm betting the tuning parameters are nearly the same for these machines, so we can give it a go. I'll send a file, hopefully later today. It can't hurt to try anyway. Of course I'll need to get make tune working again, which might be a mission.
Well done to Jason Moxham with the assembly improvements. That's pretty amazing. Loving the enthusiasm.

Bill.

2008/12/24 Cactus <rieman...@googlemail.com>:
>
> On Dec 24, 8:50 am, Cactus <rieman...@googlemail.com> wrote:
>> On Dec 23, 11:31 pm, ja...@njkfrudils.plus.com wrote:
>>
>> > On Tuesday 23 December 2008 22:52:10 Cactus wrote:
>>
>> > > On Dec 22, 11:55 pm, jason <ja...@njkfrudils.plus.com> wrote:
>> > > > On Dec 20, 1:13 pm, Cactus <rieman...@googlemail.com> wrote:
>> > > > > On Dec 20, 10:49 am, Cactus <rieman...@googlemail.com> wrote:
>> > > > > > On Dec 20, 3:56 am, "Bill Hart" <goodwillh...@googlemail.com>
>> > > > > > wrote:
>>
>> > > > > Following up my earlier results, I have now played with alignment and
>> > > > > jump decisions and I find that:
>>
>> > > > > jc .1
>> > > > > jmp .2
>>
>> > > > > align 16
>> > > > > .1:mov rax, [r10+r8*8]
>>
>> > > > > in which there is a jump to aligned code (rather than falling through
>> > > > > and hence executing the padding code) gives significantly better
>> > > > > results:
>>
>> > > > > Jason's Code (mp_add_n and mp_sub_n):
>> > > > > Jason's Code (mp_addmul_n and mp_submul_n):
>> > > > > Jason's Code (mp_mul_1):
>>
>> > > > > Running benchmarks
>> > > > > Category base
>> > > > > Program multiply
>> > > > > multiply 128 128
>> > > > > MPIRbench.base.multiply.128.128 result: 26701842
>> > > > > multiply 512 512
>> > > > > MPIRbench.base.multiply.512.512 result: 6455010
>> > > > > multiply 8192 8192
>> > > > > MPIRbench.base.multiply.8192.8192 result: 61537
>> > > > > multiply 131072 131072
>> > > > > MPIRbench.base.multiply.131072.131072 result: 938
>> > > > > multiply 2097152 2097152
>> > > > > MPIRbench.base.multiply.2097152.2097152 result: 23.0
>> > > > > MPIRbench.base.multiply result: 46978.70
>> > > > > Program divide
>> > > > > divide 8192 32
>> > > > > MPIRbench.base.divide.8192.32 result: 677900
>> > > > > divide 8192 64
>> > > > > MPIRbench.base.divide.8192.64 result: 689331
>> > > > > divide 8192 128
>> > > > > MPIRbench.base.divide.8192.128 result: 269308
>> > > > > divide 8192 4096
>> > > > > MPIRbench.base.divide.8192.4096 result: 116612
>> > > > > divide 8192 8064
>> > > > > MPIRbench.base.divide.8192.8064 result: 1027764
>> > > > > divide 131072 8192
>> > > > > MPIRbench.base.divide.131072.8192 result: 2667
>> > > > > divide 131072 65536
>> > > > > MPIRbench.base.divide.131072.65536 result: 1249
>> > > > > divide 8388608 4194304
>> > > > > MPIRbench.base.divide.8388608.4194304 result: 2.56
>> > > > > MPIRbench.base.divide result: 24471.64
>> > > > > MPIRbench.base result 33906.43
>> > > > > Category app
>> > > > > Program rsa
>> > > > > rsa 512
>> > > > > MPIRbench.app.rsa.512 result: 14055
>> > > > > rsa 1024
>> > > > > MPIRbench.app.rsa.1024 result: 2735
>> > > > > rsa 2048
>> > > > > MPIRbench.app.rsa.2048 result: 498
>> > > > > MPIRbench.app.rsa result: 2675.09
>> > > > > MPIRbench.app result 2675.09
>> > > > > MPIRbench result: 9523.81
>>
>> > > > > This is about 8% faster than my original Windows code.
>>
>> > > > > Well done Jason!
>>
>> > > > > Brian
>>
>> > > > I've put the mpn_mul_basecase in the mpir development branch, ready
>> > > > for conversion to Windows. http://www.digitalmischief.co.uk/fruitbowl/ is
>> > > > the latest, with a new mpn_sqr_basecase and mpn_redc_basecase, which
>> > > > overall gives me a 60% improvement over gmp-4.2.4 (which by coincidence
>> > > > is the same ratio as 4/2.5, the addmul ratios!!!). They are very much
>> > > > still cut&paste, so expect a few more % in time. I'm going to try a
>> > > > division_basecase and a mullow and mulhigh basecase next. There is
>> > > > also an addmul loop in bdivmod.c which does something, and may be
>> > > > worth doing.
>>
>> > > Hi Jason,
>>
>> > > Thanks for the mpn_mul_basecase code.
>>
>> > > I have converted this to Windows and it is slower than my old code -
>> > > the mpirbench score with the new code is 9350 whereas the current code
>> > > is 9550, which is a 2% performance loss. Only the mpn_mul_basecase
>> > > code is different - I have kept your other routines in place in making
>> > > this comparison.
>>
>> > Odd!!!
>> > Did you run tune? I assume your old code is the Gaudry code. It doesn't
>> > even sound like it's running!!
>>
>> > > In this case there is about the same prologue/epilogue overhead in
>> > > both versions so it will be interesting to see how it compares on
>> > > Linux.
>>
>> > > Brian
>>
>> I can't run tune under Windows so, no, the same tuning parameters are
>> being used in both runs.
>>
>> If you have tuning parameters that might be appropriate for an AMD
>> Athlon X2, I can try them.
>>
>> I am confident that the right code is being used in these comparisons
>> because I use a debugger to check this (I have been caught by this
>> sort of problem previously).
>>
>> My existing code is basically a translation of Pierrick Gaudry's code
>> for YASM with Intel syntax.
>>
>> Brian
>
> Hi All
>
> I have tracked down a problem in my Windows conversion of the
> mul_basecase code and this is now showing a good performance gain,
> from 9,520 to 10,100.
>
> In overall terms Jason's work takes my original Windows code from
> 8,800 to 10,100 - a 15% gain in performance. This is without any
> tuning so there may be more to be gained if the tuning parameters are
> adjusted.
>
> Does anyone have a comparison between the tuning needed for our old
> code and that for Jason's code? This would help me as I can then try
> out new parameters on Windows. I could try to get tune working on
> Windows but I am fearful that this is likely to be a big job.
>
> A happy Christmas to all.
>
> Brian

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com
To unsubscribe from this group, send email to mpir-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---
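
For anyone wanting to sanity-check the figures quoted in this thread: the printed scores look consistent with each program score being the geometric mean of its individual results, and each higher-level score being the geometric mean of the scores below it. That is an inference from the numbers, not something stated in the thread, and the helper names `geomean` and `percent_change` are purely illustrative. A minimal sketch in Python:

```python
import math

def geomean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old

# Individual results copied from the quoted benchmark run.
# The printed values are rounded, so recomputed scores match only approximately.
multiply = [26701842, 6455010, 61537, 938, 23.0]
divide = [677900, 689331, 269308, 116612, 1027764, 2667, 1249, 2.56]
rsa = [14055, 2735, 498]

# Assumption: scores combine by geometric mean at every level.
multiply_score = geomean(multiply)                     # thread reports 46978.70
divide_score = geomean(divide)                         # thread reports 24471.64
base_score = geomean([multiply_score, divide_score])   # thread reports 33906.43
app_score = geomean(rsa)                               # thread reports 2675.09
overall_score = geomean([base_score, app_score])       # thread reports 9523.81

# Gains and losses mentioned in the thread.
total_gain = percent_change(8800, 10100)     # described as "a 15% gain"
basecase_loss = percent_change(9550, 9350)   # described as "a 2% performance loss"

if __name__ == "__main__":
    print(f"multiply {multiply_score:.2f}, divide {divide_score:.2f}")
    print(f"base {base_score:.2f}, app {app_score:.2f}, overall {overall_score:.2f}")
    print(f"8800 -> 10100: {total_gain:+.1f}%, 9550 -> 9350: {basecase_loss:+.1f}%")
```

Because the overall score is a geometric mean, a change in a single routine moves it by less than the routine's own speedup, which is worth keeping in mind when comparing totals such as 9350 against 9550.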