At http://www.digitalmischief.co.uk/fruitbowl/
there is a GMP with a new mul_1/addmul_1 and mul_basecase running at 2.5 c/l
(cycles per limb).

On my machine, an 1800 MHz Sempron, I get a benchmark score of 8209, which when
scaled to a 2600 MHz CPU (for example) would give a score of about 11860.
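
The scaling above can be sketched as follows. This is only an illustration and assumes the benchmark score is proportional to clock frequency, which appears to be the assumption behind the figure quoted; the function name scale_score is made up for the example.

```python
def scale_score(score, measured_mhz, target_mhz):
    """Scale a benchmark score linearly with clock frequency (assumption)."""
    return score * target_mhz / measured_mhz

# Figures from the message: score 8209 on a nominal 1800 MHz Sempron,
# rescaled to a hypothetical 2600 MHz part.
scaled = scale_score(8209, 1800, 2600)
print(f"scaled score: {scaled:.0f}")
```

With the nominal 1800 MHz this lands within a few points of 11860; using the measured 1808.24 MHz from the speed output instead would give a slightly lower number.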

The new mul_basecase is actually slower than the old one until you hit about 8 
limbs, so obviously some more work needs to be done for the smaller sizes.
I rewrote mpn_mul_1 to use a variant which does not need pipelining. Comparing
this to mpn_addmul_1, which does use pipelining, at small sizes we get:

./speed -r -c -s 1-40 mpn_mul_1.23 mpn_addmul_1.23
overhead 6.04 cycles, precision 10000 units of 5.53e-10 secs, CPU freq 1808.24 MHz
         mpn_mul_1.23 mpn_addmul_1.23
1               15.11       #0.5334
2              #17.12        1.3529
3              #18.14        1.2774
4              #22.16        1.1364
5              #25.19        1.1999
6              #26.20        1.3074
7              #30.22        1.3001
8              #33.25        1.1826
9              #35.27        1.2290
10             #37.28        1.2428
11             #41.31        1.1464
12             #44.33        1.0916
13             #45.34        1.1780
14             #48.36        1.1663
15             #51.38        1.1177
16             #55.40        1.0555
17             #55.42        1.1456
18             #59.44        1.1184
19             #61.46        1.0992
20             #66.47        1.0312
21             #65.49        1.1233
22             #70.52        1.0856
23             #71.53        1.0851
24             #77.54        1.0139
25             #75.57        1.1071
26             #81.59        1.0616
27             #81.61        1.0746
28              88.72       #0.9995
29             #85.79        1.0926
30             #92.76        1.0437
31             #91.81        1.0664
32              99.78       #0.9911
33             #95.90        1.0833
34            #103.92        1.0292
35            #101.90        1.0598
36             124.72       #0.8738
37             119.83       #0.9510
38            #127.80        1.0157
39            #125.87        1.0395
40            #131.88        1.0148
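
For reading the table: as far as I understand the speed program, -c reports times in cycles, -r shows the other columns as ratios relative to the first routine, and # marks the fastest entry in each row. Under that assumption, a row can be turned back into absolute cycles and cycles per limb as below. The helper name row_to_cycles and the dictionary keys are made up for the example; the sample rows are copied from the table.

```python
def row_to_cycles(n, mul_1_cycles, addmul_1_ratio):
    """Convert one 'speed -r -c' row to absolute cycles and cycles per limb.

    Assumes the first column is mpn_mul_1 in cycles and the second column
    is mpn_addmul_1 expressed as a ratio of the first.
    """
    addmul_1_cycles = mul_1_cycles * addmul_1_ratio
    return {
        "n": n,
        "mul_1_cycles": mul_1_cycles,
        "addmul_1_cycles": addmul_1_cycles,
        "mul_1_per_limb": mul_1_cycles / n,
        "addmul_1_per_limb": addmul_1_cycles / n,
    }

# A few rows taken from the table: (limbs, mul_1 cycles, addmul_1 ratio).
rows = [(1, 15.11, 0.5334), (8, 33.25, 1.1826), (40, 131.88, 1.0148)]
for n, cycles, ratio in rows:
    r = row_to_cycles(n, cycles, ratio)
    print(f"n={r['n']:2d}  mul_1 {r['mul_1_per_limb']:.2f} c/l  "
          f"addmul_1 {r['addmul_1_per_limb']:.2f} c/l")
```

Note that these per-limb figures still include the call overhead of each routine, so at small n they sit well above the asymptotic rate.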

So to get better speed at small sizes we may have to use a non-pipelined
version. In any case, there are still other overheads that can be optimized
first.

The new mul_basecase is also just a "cut and paste" of mul_1/addmul_1, so I
wouldn't bother looking at it. I'll make a neat version once I have a faster
version of mul_1/addmul_1.
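
For anyone not familiar with how a basecase multiply is built from mul_1/addmul_1, here is a rough reference model in Python. It is not the assembly in the tarball, only the usual schoolbook structure: one mul_1 pass for the first limb of the second operand, then one addmul_1 pass per remaining limb. The 64-bit limb size, the Python signatures and the to_limbs/from_limbs helpers are assumptions for the sketch.

```python
LIMB_BITS = 64                  # assumption: 64-bit limbs
MASK = (1 << LIMB_BITS) - 1

def mul_1(rp, up, v):
    """rp[0:n] = up * v (single limb v); return the carry-out limb."""
    carry = 0
    for i, u in enumerate(up):
        t = u * v + carry
        rp[i] = t & MASK
        carry = t >> LIMB_BITS
    return carry

def addmul_1(rp, offset, up, v):
    """rp[offset:offset+n] += up * v; return the carry-out limb."""
    carry = 0
    for i, u in enumerate(up):
        t = rp[offset + i] + u * v + carry
        rp[offset + i] = t & MASK
        carry = t >> LIMB_BITS
    return carry

def mul_basecase(up, vp):
    """Schoolbook product of two limb lists (least significant limb first)."""
    rp = [0] * (len(up) + len(vp))
    rp[len(up)] = mul_1(rp, up, vp[0])           # first row: plain multiply
    for j in range(1, len(vp)):                  # remaining rows: accumulate
        rp[len(up) + j] = addmul_1(rp, j, up, vp[j])
    return rp

def to_limbs(x, n):
    """Split a non-negative integer into n limbs, least significant first."""
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(n)]

def from_limbs(limbs):
    """Recombine limbs (least significant first) into an integer."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))

a, b = 2**200 - 12345, 2**130 + 987
product = from_limbs(mul_basecase(to_limbs(a, 4), to_limbs(b, 3)))
print(product == a * b)
```

In this form every row pays the full call and loop setup cost of mul_1 or addmul_1, which is one reason a straight cut and paste loses at small sizes.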
