On Tuesday 23 December 2008 23:31:33 ja...@njkfrudils.plus.com wrote:
> On Tuesday 23 December 2008 22:52:10 Cactus wrote:
> > On Dec 22, 11:55 pm, jason <ja...@njkfrudils.plus.com> wrote:
> > > On Dec 20, 1:13 pm, Cactus <rieman...@googlemail.com> wrote:
> > > > On Dec 20, 10:49 am, Cactus <rieman...@googlemail.com> wrote:
> > > > > On Dec 20, 3:56 am, "Bill Hart" <goodwillh...@googlemail.com>
> > > > > wrote:
> > > >
> > > > Following up my earlier results, I have now played with alignment and
> > > > jump decisions and I find that:
> > > >
> > > >     jc .1
> > > >     jmp .2
> > > >
> > > >     align 16
> > > >     .1: mov rax, [r10+r8*8]
> > > >
> > > > in which there is a jump to aligned code (rather than falling through
> > > > and hence executing the padding code) gives significantly better
> > > > results:
> > > >
> > > > Jason's Code (mp_add_n and mp_sub_n):
> > > > Jason's Code (mp_addmul_n and mp_submul_n):
> > > > Jason's Code (mp_mul_1):
> > > >
> > > > Running benchmarks
> > > > Category base
> > > > Program multiply
> > > > multiply 128 128
> > > > MPIRbench.base.multiply.128.128 result: 26701842
> > > > multiply 512 512
> > > > MPIRbench.base.multiply.512.512 result: 6455010
> > > > multiply 8192 8192
> > > > MPIRbench.base.multiply.8192.8192 result: 61537
> > > > multiply 131072 131072
> > > > MPIRbench.base.multiply.131072.131072 result: 938
> > > > multiply 2097152 2097152
> > > > MPIRbench.base.multiply.2097152.2097152 result: 23.0
> > > > MPIRbench.base.multiply result: 46978.70
> > > > Program divide
> > > > divide 8192 32
> > > > MPIRbench.base.divide.8192.32 result: 677900
> > > > divide 8192 64
> > > > MPIRbench.base.divide.8192.64 result: 689331
> > > > divide 8192 128
> > > > MPIRbench.base.divide.8192.128 result: 269308
> > > > divide 8192 4096
> > > > MPIRbench.base.divide.8192.4096 result: 116612
> > > > divide 8192 8064
> > > > MPIRbench.base.divide.8192.8064 result: 1027764
> > > > divide 131072 8192
> > > > MPIRbench.base.divide.131072.8192 result: 2667
> > > > divide 131072 65536
> > > > MPIRbench.base.divide.131072.65536 result: 1249
> > > > divide 8388608 4194304
> > > > MPIRbench.base.divide.8388608.4194304 result: 2.56
> > > > MPIRbench.base.divide result: 24471.64
> > > > MPIRbench.base result 33906.43
> > > > Category app
> > > > Program rsa
> > > > rsa 512
> > > > MPIRbench.app.rsa.512 result: 14055
> > > > rsa 1024
> > > > MPIRbench.app.rsa.1024 result: 2735
> > > > rsa 2048
> > > > MPIRbench.app.rsa.2048 result: 498
> > > > MPIRbench.app.rsa result: 2675.09
> > > > MPIRbench.app result 2675.09
> > > > MPIRbench result: 9523.81
> > > >
> > > > This is about 8% faster than my original Windows code.
> > > >
> > > > Well done Jason!
> > > >
> > > > Brian
> > > >
> > > I've put the mpn_mul_basecase in the mpir development branch, ready
> > > for conversion to Windows. http://www.digitalmischief.co.uk/fruitbowl/
> > > is the latest, with a new mpn_sqr_basecase and mpn_redc_basecase, which
> > > overall gives me a 60% improvement over gmp-4.2.4 (which by coincidence
> > > is the same ratio as 4/2.5, the addmul ratios!!!). They are very much
> > > still cut&paste, so expect a few more % in time. I'm going to try a
> > > division_basecase and a mullow and mulhigh basecase next. There is
> > > also an addmul loop in bdivmod.c which does something, and may be
> > > worth doing.
> >
> > Hi Jason,
> >
> > Thanks for the mpn_mul_basecase code.
> >
> > I have converted this to Windows and it is slower than my old code -
> > the mpirbench score with the new code is 9350 whereas the current code
> > is 9550, which is a 2% performance loss. Only the mpn_mul_basecase
> > code is different - I have kept your other routines in place in making
> > this comparison.
>
> Odd!!!
> Did you run tune? I assume your old code is the Gaudry code. Doesn't
> even sound like it's running!!

Can't run make speed on mpir trunk (gcd broke it).
On my K8 linux, ./speed -c -s 1-40 mpn_mul_basecase gives

gmp4.2.4
 1    26.18
 2    52.36
 3    82.56
 4   148.04
 5   192.37
 6   242.70
 7   307.24
 8   385.07
 9   552.10
10   647.35
11   766.93
12   874.69
13   990.73
14  1114.40
15  1278.22
16  1414.25
17  1559.25
18  1717.57
19  1917.33
20  2082.33
21  2258.80
22  2442.40
23  2684.40
24  2881.25
25  3085.25
26  3301.50
27  3584.67
28  3807.67
29  4040.67
30  4283.67
31  4604.67
32  4855.67
33  5131.50
34  5408.50
35  5765.50
36  6046.50
37  6338.50
38  6641.50
39  7041.50
40  7350.50

mpir toom3 branch
 1     8.06
 2    18.14
 3    59.44
 4    92.79
 5   137.01
 6   178.34
 7   227.98
 8   282.49
 9   357.27
10   424.28
11   508.00
12   592.33
13   676.88
14   767.20
15   864.15
16   968.50
17  1077.09
18  1346.75
19  1454.00
20  1460.75
21  1595.14
22  1734.14
23  1881.33
24  2034.00
25  2193.60
26  2356.60
27  2526.00
28  2702.60
29  2887.50
30  3074.50
31  3268.00
32  3468.50
33  3680.00
34  4163.33
35  4378.67
36  4403.00
37  4634.00
38  4871.00
39  5135.50
40  5385.50

mpir-k8 branch
 1     8.06
 2    21.16
 3    54.38
 4    76.55
 5   102.73
 6   136.97
 7   182.28
 8   219.56
 9   261.86
10   332.27
11   404.04
12   455.41
13   515.80
14   594.41
15   691.75
16   757.93
17   836.85
18   949.82
19  1073.60
20  1153.50
21  1249.44
22  1371.38
23  1518.50
24  1616.71
25  1729.86
26  1875.50
27  2050.67
28  2161.00
29  2292.20
30  2457.00
31  2656.20
32  2789.25
33  2938.00
34  3126.00
35  3353.00
36  3495.25
37  3667.67
38  4369.00
39  4636.33
40  4807.00

Sounds like some sort of configure problem. It should be about 25%
faster than Gaudry on mpirbench, and at least a few % faster than the
previous best.

Can someone else confirm the linux scores?

>
> > In this case there is about the same prologue/epilogue overhead in
> > both versions so it will be interesting to see how it compares on
> > Linux.
> >
> > Brian
> >

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "mpir-devel" group.
To post to this group, send email to mpir-devel@googlegroups.com
To unsubscribe from this group, send email to mpir-devel+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en
-~----------~----~----~----~------~----~------~--~---
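
For anyone wanting to compare the three ./speed listings above without eyeballing 120 numbers, here is a minimal sketch that turns a few sampled rows into speedup ratios, and also checks the "2% performance loss" quoted for the Windows mpirbench scores (9550 vs 9350). The helper names (speedup, score_change_percent, report) are made up for this illustration and are not part of mpir or its tune/speed program; only the numbers are taken from the thread.

```python
# Timings copied from the ./speed output in this thread for a few limb
# counts (smaller is better). Only a sample of the 40 rows is used here.
SPEED = {
    # limbs: (gmp4.2.4, mpir toom3 branch, mpir-k8 branch)
    1: (26.18, 8.06, 8.06),
    10: (647.35, 424.28, 332.27),
    20: (2082.33, 1460.75, 1153.50),
    30: (4283.67, 3074.50, 2457.00),
    40: (7350.50, 5385.50, 4807.00),
}


def speedup(old_time, new_time):
    """Return how many times faster new_time is than old_time."""
    return old_time / new_time


def score_change_percent(old_score, new_score):
    """Return the relative change of a benchmark score (larger is better)."""
    return 100.0 * (new_score - old_score) / old_score


def report():
    """Build one summary line per sampled limb count, plus the score check."""
    lines = []
    for limbs, (gmp, toom3, k8) in sorted(SPEED.items()):
        lines.append(
            f"{limbs:3d} limbs: k8 vs gmp4.2.4 x{speedup(gmp, k8):.2f}, "
            f"k8 vs toom3 x{speedup(toom3, k8):.2f}"
        )
    # mpirbench scores quoted by Brian: 9550 (current code) vs 9350 (new code).
    lines.append(
        f"Windows mpirbench change: {score_change_percent(9550, 9350):+.1f}%"
    )
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

On the sampled rows the mpir-k8 branch is at least as fast as the toom3 branch and faster than gmp4.2.4. The full tables should still be checked row by row, since the sample skips most limb counts.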