Consider addmul on the K8. Running speed, we get:

        mpn_addmul_1.333
1                8.06
2               17.12
3               18.14
4               21.15
5               23.17
6               27.20
7               31.23
8               32.24
9               36.27
10              37.27
11              41.30
12              41.31
13              46.34
14              47.35
15              51.38
16              51.38
17              56.42
18              57.43
19              61.46
20              61.46
1000          2543.00

Calculating the overhead for 1000 limbs, we get 2543-2.5*1000=43 cycles,
and the overhead for 18 limbs is 57-2.5*18=12 cycles.
13 cycles is the branch mispredict you get on the final iteration of a loop
with a count >8, so that leaves 30 cycles of overhead at 1000 limbs.
Why?
Surely the overheads should be the same?
This is not restricted to addmul; all the functions behave the same way. The
overhead seems to be proportional to the total runtime. Could the time stamp
counter be at fault, or do we have a pipeline bubble every so often?
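
For anyone who wants to redo the arithmetic across the whole table, here is a
minimal sketch in Python. It assumes the asymptotic rate of 2.5 cycles per
limb used above and simply subtracts it from each measured time; the names
CYCLES, CYCLES_PER_LIMB and overhead are made up for this illustration and are
not part of MPIR or the speed program.

```python
# Measured cycle counts copied from the table above (limbs -> cycles).
CYCLES = {
    1: 8.06, 2: 17.12, 3: 18.14, 4: 21.15, 5: 23.17,
    6: 27.20, 7: 31.23, 8: 32.24, 9: 36.27, 10: 37.27,
    11: 41.30, 12: 41.31, 13: 46.34, 14: 47.35, 15: 51.38,
    16: 51.38, 17: 56.42, 18: 57.43, 19: 61.46, 20: 61.46,
    1000: 2543.00,
}

# Assumption: the inner loop runs at 2.5 cycles per limb, as in the message.
CYCLES_PER_LIMB = 2.5


def overhead(limbs, cycles, rate=CYCLES_PER_LIMB):
    """Return measured cycles minus the ideal loop cost rate * limbs."""
    return cycles - rate * limbs


# Overhead for every size in the table.
OVERHEAD = {n: overhead(n, c) for n, c in CYCLES.items()}

if __name__ == "__main__":
    print("limbs    cycles  overhead")
    for n in sorted(OVERHEAD):
        print(f"{n:5d} {CYCLES[n]:9.2f} {OVERHEAD[n]:9.2f}")
```

If the fixed overhead were really constant, the last column would level off
once the loop count passes 8, and the 1000-limb row would match the 18-limb
row.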

Jason
