[mpir-devel] Re: K8 mul_basecase

jason Tue, 23 Dec 2008 15:52:50 -0800

On Tuesday 23 December 2008 23:31:33 ja...@njkfrudils.plus.com wrote:
> On Tuesday 23 December 2008 22:52:10 Cactus wrote:
> > On Dec 22, 11:55 pm, jason <ja...@njkfrudils.plus.com> wrote:
> > > On Dec 20, 1:13 pm, Cactus <rieman...@googlemail.com> wrote:
> > > > On Dec 20, 10:49 am, Cactus <rieman...@googlemail.com> wrote:
> > > > > On Dec 20, 3:56 am, "Bill Hart" <goodwillh...@googlemail.com>
> > > > > wrote:
> > > >
> > > > Following up on my earlier results, I have now played with alignment
> > > > and jump decisions and I find that:
> > > >
> > > >     jc      .1
> > > >     jmp     .2
> > > >
> > > >     align   16
> > > > .1:mov     rax, [r10+r8*8]
> > > >
> > > > in which there is a jump to aligned code (rather than falling through
> > > > and hence executing the padding code) gives significantly better
> > > > results:
> > > >
> > > >  Jason's Code (mp_add_n and mp_sub_n):
> > > > Jason's Code (mp_addmul_n and mp_submul_n):
> > > > Jason's Code (mp_mul_1):
> > > >
> > > > Running benchmarks
> > > >   Category base
> > > >     Program multiply
> > > >       multiply 128 128
> > > >       MPIRbench.base.multiply.128.128 result: 26701842
> > > >       multiply 512 512
> > > >       MPIRbench.base.multiply.512.512 result: 6455010
> > > >       multiply 8192 8192
> > > >       MPIRbench.base.multiply.8192.8192 result: 61537
> > > >       multiply 131072 131072
> > > >       MPIRbench.base.multiply.131072.131072 result: 938
> > > >       multiply 2097152 2097152
> > > >       MPIRbench.base.multiply.2097152.2097152 result: 23.0
> > > >     MPIRbench.base.multiply result: 46978.70
> > > >     Program divide
> > > >       divide 8192 32
> > > >       MPIRbench.base.divide.8192.32 result: 677900
> > > >       divide 8192 64
> > > >       MPIRbench.base.divide.8192.64 result: 689331
> > > >       divide 8192 128
> > > >       MPIRbench.base.divide.8192.128 result: 269308
> > > >       divide 8192 4096
> > > >       MPIRbench.base.divide.8192.4096 result: 116612
> > > >       divide 8192 8064
> > > >       MPIRbench.base.divide.8192.8064 result: 1027764
> > > >       divide 131072 8192
> > > >       MPIRbench.base.divide.131072.8192 result: 2667
> > > >       divide 131072 65536
> > > >       MPIRbench.base.divide.131072.65536 result: 1249
> > > >       divide 8388608 4194304
> > > >       MPIRbench.base.divide.8388608.4194304 result: 2.56
> > > >     MPIRbench.base.divide result: 24471.64
> > > >   MPIRbench.base result 33906.43
> > > >   Category app
> > > >     Program rsa
> > > >       rsa 512
> > > >       MPIRbench.app.rsa.512 result: 14055
> > > >       rsa 1024
> > > >       MPIRbench.app.rsa.1024 result: 2735
> > > >       rsa 2048
> > > >       MPIRbench.app.rsa.2048 result: 498
> > > >     MPIRbench.app.rsa result: 2675.09
> > > >   MPIRbench.app result 2675.09
> > > > MPIRbench result: 9523.81
> > > >
> > > > This is about 8% faster than my original Windows code.
> > > >
> > > > Well done Jason!
> > > >
> > > >      Brian
> > >
> > > I've put the mpn_mul_basecase in the mpir development branch, ready
> > > for conversion to Windows. http://www.digitalmischief.co.uk/fruitbowl/
> > > is the latest, with a new mpn_sqr_basecase and mpn_redc_basecase, which
> > > overall give me a 60% improvement over gmp-4.2.4 (which by coincidence
> > > is the same as 4/2.5, the addmul ratio!!!). They are still very much
> > > cut&paste, so expect a few more % in time. I'm going to try a
> > > division_basecase and a mullow and mulhigh basecase next. There is
> > > also an addmul loop in bdivmod.c which does something, and may be
> > > worth doing.
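For readers following the thread: a basecase multiply is normally plain schoolbook multiplication, one mul_1 row followed by one addmul_1 row per remaining limb of the second operand, which is why the addmul loop matters so much here. The portable C sketch below only illustrates that structure. It is not Jason's K8 assembly or MPIR's actual code; `limb_t`, `dlimb_t` and the `sketch_*` functions are hypothetical names, and it assumes 64-bit limbs and a compiler with `unsigned __int128` (GCC or Clang).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for this sketch: a 64-bit limb and a 128-bit
   double limb (unsigned __int128 is a GCC/Clang extension). */
typedef uint64_t limb_t;
typedef unsigned __int128 dlimb_t;

/* rp[0..n-1] = up[0..n-1] * v; returns the carry-out limb. */
static limb_t sketch_mul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        dlimb_t t = (dlimb_t)up[i] * v + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 64);
    }
    return carry;
}

/* rp[0..n-1] += up[0..n-1] * v; returns the carry-out limb.
   up[i]*v + rp[i] + carry never exceeds 2^128 - 1, so t cannot overflow. */
static limb_t sketch_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        dlimb_t t = (dlimb_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 64);
    }
    return carry;
}

/* Schoolbook product rp[0..un+vn-1] = up[0..un-1] * vp[0..vn-1].
   rp must not overlap up or vp. */
static void sketch_mul_basecase(limb_t *rp, const limb_t *up, size_t un,
                                const limb_t *vp, size_t vn)
{
    assert(un >= vn && vn >= 1);
    /* Row 0: a plain mul_1 initialises rp[0..un]. */
    rp[un] = sketch_mul_1(rp, up, un, vp[0]);
    /* Rows 1..vn-1: one addmul_1 each, shifted up by one limb, so almost
       all of the work is done inside addmul_1. */
    for (size_t j = 1; j < vn; j++)
        rp[un + j] = sketch_addmul_1(rp + j, up, un, vp[j]);
}
```

An un x vn product costs one mul_1 and vn - 1 addmul_1 passes over un limbs each. If addmul_1 dominates (and sqr_basecase and redc_basecase are built the same way), a faster addmul_1 would speed up the whole basecase by roughly the same factor, which may be why the overall 60% lines up with 4/2.5 = 1.6.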
> >
> > Hi Jason,
> >
> > Thanks for the mpn_mul_basecase code.
> >
> > I have converted this to Windows and it is slower than my old code:
> > the mpirbench score with the new code is 9350 whereas with the current
> > code it is 9550, a 2% performance loss. Only the mpn_mul_basecase
> > code is different; I have kept your other routines in place in making
> > this comparison.
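As a quick check of the 2% figure from the 9350 and 9550 mpirbench scores: a higher mpirbench score is better, so a negative relative change means slower code. The helper name `score_change_pct` is hypothetical and only illustrates the arithmetic.

```c
#include <assert.h>

/* Relative change of a benchmark score in percent; negative means the
   new code scored lower (i.e. is slower) than the old code. */
static double score_change_pct(double old_score, double new_score)
{
    assert(old_score > 0.0);
    return (new_score - old_score) / old_score * 100.0;
}
```

With the scores above, `score_change_pct(9550.0, 9350.0)` comes out at about -2.09, consistent with the 2% loss quoted.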
>
> Odd!!!
> Did you run tune? I assume your old code is the Gaudry code. It doesn't
> even sound like it's running!!


I can't run make speed on the mpir trunk (gcd broke it).
On my K8 Linux box, ./speed -c -s 1-40 mpn_mul_basecase gives (size in limbs,
time in cycles):
gmp-4.2.4
1               26.18
2               52.36
3               82.56
4              148.04
5              192.37
6              242.70
7              307.24
8              385.07
9              552.10
10             647.35
11             766.93
12             874.69
13             990.73
14            1114.40
15            1278.22
16            1414.25
17            1559.25
18            1717.57
19            1917.33
20            2082.33
21            2258.80
22            2442.40
23            2684.40
24            2881.25
25            3085.25
26            3301.50
27            3584.67
28            3807.67
29            4040.67
30            4283.67
31            4604.67
32            4855.67
33            5131.50
34            5408.50
35            5765.50
36            6046.50
37            6338.50
38            6641.50
39            7041.50
40            7350.50

mpir toom3 branch
1                8.06
2               18.14
3               59.44
4               92.79
5              137.01
6              178.34
7              227.98
8              282.49
9              357.27
10             424.28
11             508.00
12             592.33
13             676.88
14             767.20
15             864.15
16             968.50
17            1077.09
18            1346.75
19            1454.00
20            1460.75
21            1595.14
22            1734.14
23            1881.33
24            2034.00
25            2193.60
26            2356.60
27            2526.00
28            2702.60
29            2887.50
30            3074.50
31            3268.00
32            3468.50
33            3680.00
34            4163.33
35            4378.67
36            4403.00
37            4634.00
38            4871.00
39            5135.50
40            5385.50

mpir-k8 branch
1                8.06
2               21.16
3               54.38
4               76.55
5              102.73
6              136.97
7              182.28
8              219.56
9              261.86
10             332.27
11             404.04
12             455.41
13             515.80
14             594.41
15             691.75
16             757.93
17             836.85
18             949.82
19            1073.60
20            1153.50
21            1249.44
22            1371.38
23            1518.50
24            1616.71
25            1729.86
26            1875.50
27            2050.67
28            2161.00
29            2292.20
30            2457.00
31            2656.20
32            2789.25
33            2938.00
34            3126.00
35            3353.00
36            3495.25
37            3667.67
38            4369.00
39            4636.33
40            4807.00

Sounds like some sort of configure problem. It should be about 25% faster
than Gaudry on mpirbench, and at least a few % faster than the previous best.

Can someone else confirm the Linux scores?
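To put the three speed tables side by side, the sketch below takes a few rows copied from them (./speed -c reports CPU cycles, so smaller is faster) and prints how many times faster the mpir-k8 branch is than gmp-4.2.4 and the toom3 branch at each size. The struct and function names are hypothetical, and per-call cycle ratios at fixed sizes say nothing directly about the overall mpirbench score.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Rows copied from the ./speed -c -s 1-40 mpn_mul_basecase tables
   above (cycles per call; smaller is faster). */
struct speed_row {
    int limbs;
    double gmp_424;   /* gmp-4.2.4 */
    double toom3;     /* mpir toom3 branch */
    double k8;        /* mpir-k8 branch */
};

static const struct speed_row rows[] = {
    { 10,  647.35,  424.28,  332.27 },
    { 20, 2082.33, 1460.75, 1153.50 },
    { 30, 4283.67, 3074.50, 2457.00 },
    { 40, 7350.50, 5385.50, 4807.00 },
};

/* How many times faster `fast` is than `slow`, as a cycle ratio. */
static double speedup(double slow, double fast)
{
    assert(fast > 0.0);
    return slow / fast;
}

/* Print the k8 speedup over both other builds for every stored size. */
static void print_speedups(void)
{
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
        printf("%2d limbs: k8 vs gmp-4.2.4 %.2fx, k8 vs toom3 %.2fx\n",
               rows[i].limbs,
               speedup(rows[i].gmp_424, rows[i].k8),
               speedup(rows[i].toom3, rows[i].k8));
}
```

At 10 limbs this gives roughly 1.95x over gmp-4.2.4 and 1.28x over the toom3 branch; at 40 limbs, about 1.53x and 1.12x.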

>
> > In this case there is about the same prologue/epilogue overhead in
> > both versions so it will be interesting to see how it compares on
> > Linux.
> >
> >     Brian
>
> 

