Running the k8/k10 asm code unchanged on the Core 2 machine sage, we get the
following:

popcount,hamdist   not usable: this CPU has no popcount instruction

slowdowns---------
add,sub are 0.50x
rshift1,lshift1 0.70x
k8 lshift,rshift 0.91x
addmul_1,submul_1  0.89x   but faster for <20 limbs...

speedups----------
and,ior,xor are 1.13x
nand,nior,xnor,andn,iorn are 1.50x
com is 2.00x
divebyff 1.40x  although not better until 12 limbs
diveby3  2.30x
addadd,addsub 1.50x
sumdiff 1.26x
addlsh1 1.50x
sublsh1 1.40x
k10 lshift,rshift 1.18x
mul_1   1.04x

for mul basecase we get
./speed -c -r -s 1-40 mpn_jaytest mpn_mul_basecase
overhead 6.12 cycles, precision 10000 units of 3.75e-10 secs, CPU freq 2666.76 MHz
          mpn_jaytest mpn_mul_basecase
1               #9.21        2.3531
2              #21.43        2.0070
3              #56.00        1.5234
4              #91.36        1.4960
5             #136.17        1.4320
6             #195.54        1.3769
7             #261.22        1.3718
8             #336.56        1.3482
9             #419.62        1.3441
10            #527.14        1.3054
11            #634.44        1.3105
12            #744.00        1.3172
13            #873.85        1.2931
14           #1024.55        1.1088
15           #1169.00        1.0873
16           #1328.89        1.0704
17           #1492.50        1.0672
18           #1710.00        1.0317
19           #1880.00        1.0488
20           #2112.00        1.0246
21           #2288.00        1.0385
22           #2547.50        1.0128
23           #2787.50        1.0063
24           #3012.50        1.0108
25           #3212.50        1.0241
26            3556.67       #0.9953
27            3836.67       #0.9939
28            4106.67       #0.9935
29            4370.00       #0.9908
30            4700.00       #0.9617
31            4996.67       #0.9973
32            5380.00       #0.9944
33            5685.00       #0.9727
34            6105.00       #0.9853
35            6375.00       #0.9906
36            6775.00       #0.9764
37            7540.00       #0.9151
38            7515.00       #0.9714
39           #7955.00        1.0578
40           #8750.00        1.0086
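
To pull the crossover out of a table like the one above without reading it by eye, something along these lines could work. It is only a sketch: it assumes the first column is the size in limbs, the second is cycles for mpn_jaytest, the third is the time of mpn_mul_basecase as a ratio to the first routine (my reading of the -c -r options), and that '#' marks the faster routine on each row. The helper names are made up for illustration.

```python
import re

def parse_speed_rows(text):
    """Parse rows of the form '<size> <cycles> <ratio>' from speed output.

    A leading '#' on a value marks the faster routine and is ignored here.
    Lines that do not look like data rows (headers, overhead line) are skipped.
    """
    rows = []
    for line in text.splitlines():
        m = re.match(r"\s*(\d+)\s+#?([\d.]+)\s+#?([\d.]+)\s*$", line)
        if m:
            rows.append((int(m.group(1)), float(m.group(2)), float(m.group(3))))
    return rows

def second_routine_wins(rows):
    """Sizes where the second routine is faster, i.e. its ratio is below 1."""
    return [size for size, _, ratio in rows if ratio < 1.0]

# A few rows copied from the table above.
sample = """\
          mpn_jaytest mpn_mul_basecase
25            #3212.50        1.0241
26             3556.67       #0.9953
38             7515.00       #0.9714
39            #7955.00        1.0578
"""

rows = parse_speed_rows(sample)
print(second_routine_wins(rows))  # [26, 38]

# Cycles per limb product (cycles / n^2) for the first routine.
for size, cycles, _ in rows:
    print(size, round(cycles / size**2, 2))
```

Fed the full table, second_routine_wins would list the sizes where mpn_mul_basecase comes out ahead, and the cycles / n^2 figure gives a size-independent number that is easier to compare between machines.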

This is all with no tweaking, on:
cpu family      : 6
model           : 29
model name      : Intel(R) Xeon(R) CPU           X7460  @ 2.66GHz
stepping        : 1
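
The cpuinfo excerpt above does not include the flags line, so as a rough way to confirm the missing popcount instruction on a given box, a check like the following might do. It is a sketch that assumes a Linux-style /proc/cpuinfo with a "flags" line and that the instruction is listed there as "popcnt"; has_cpu_flag is a hypothetical helper, not part of MPIR.

```python
def has_cpu_flag(cpuinfo_text, flag):
    """Return True if `flag` is listed on a 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Match whole flag names only, so 'popcnt' in a model name is ignored.
        if key.strip() == "flags" and flag in value.split():
            return True
    return False

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("popcnt available:", has_cpu_flag(f.read(), "popcnt"))
    except OSError:
        print("could not read /proc/cpuinfo (not Linux?)")
```

If this reports False, the popcount and hamdist code that relies on the instruction should not be selected for that machine.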




