Consider addmul on the K8. Running the speed program, we get:

limbs   mpn_addmul_1.333 (cycles)
    1       8.06
    2      17.12
    3      18.14
    4      21.15
    5      23.17
    6      27.20
    7      31.23
    8      32.24
    9      36.27
   10      37.27
   11      41.30
   12      41.31
   13      46.34
   14      47.35
   15      51.38
   16      51.38
   17      56.42
   18      57.43
   19      61.46
   20      61.46
 1000    2543.00

Calculating the overhead for 1000 limbs, we get 2543 - 2.5*1000 = 43 cycles, and the overhead for 18 limbs is 57 - 2.5*18 = 12 cycles. A branch mispredict, which you get on the final iteration of a loop with a count > 8, costs 13 cycles, so that leaves 43 - 13 = 30 cycles of overhead at 1000 limbs. Why? Surely the overheads should be the same?

This is not restricted to addmul; all the functions behave the same way. The overhead seems to be proportional to the total runtime. Could the time stamp counter be at fault, or do we have a pipeline bubble every so often?

Jason
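
For anyone who wants to repeat the overhead arithmetic on other sizes, here is a minimal sketch in Python. It assumes the figures quoted above: a steady-state cost of 2.5 cycles per limb and a 13-cycle branch mispredict on loop exit. The names `overhead`, `CYCLES_PER_LIMB` and `MISPREDICT` are made up for illustration and are not part of MPIR or the speed program.

```python
# Timings copied from the speed output above: limbs -> measured cycles.
timings = {18: 57.43, 1000: 2543.00}

CYCLES_PER_LIMB = 2.5  # assumed steady-state cost of the addmul loop on the K8
MISPREDICT = 13        # assumed cost of the loop-exit branch mispredict


def overhead(limbs, cycles, per_limb=CYCLES_PER_LIMB):
    """Cycles left over once the per-limb loop cost is subtracted."""
    return cycles - per_limb * limbs


for limbs, cycles in timings.items():
    extra = overhead(limbs, cycles)
    # What remains after also subtracting the mispredict penalty.
    remainder = extra - MISPREDICT
    print(f"{limbs:5d} limbs: overhead {extra:6.2f}, after mispredict {remainder:6.2f}")
```

If the fixed overhead really were constant, the two sizes would give roughly the same leftover; the gap between them is the discrepancy being asked about.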