Do you get consistent numbers if you run the timing for only a single value of n? That is, could this be an artifact of the way the buffers are allocated, or something similar?
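One way to check this is sketched below in C. It times one fixed n at a time, with the buffers placed at a known alignment and offset, so that allocation effects can be separated from the n%4 effect. Everything here is an assumption made for illustration: `ref_addlsh1_n` is a plain C stand-in for the attached assembler (computing rp = up + 2*vp), `place` and `time_single_n` are hypothetical helper names, and `clock()` is only a portable substitute for a real cycle counter.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

typedef uint64_t limb_t;

/* Stand-in for the routine under test (assumption: not the attached asm).
   Computes rp = up + 2*vp over n limbs and returns the carry out (0..2). */
static limb_t ref_addlsh1_n(limb_t *rp, const limb_t *up,
                            const limb_t *vp, size_t n)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t hi = vp[i] >> 63;      /* bit shifted out of this limb */
        limb_t sh = vp[i] << 1;       /* low 64 bits of 2*vp[i] */
        limb_t s  = up[i] + sh;
        limb_t c1 = s < sh;           /* overflow of up[i] + sh */
        limb_t t  = s + carry;
        limb_t c2 = t < s;            /* overflow of adding the carry in */
        rp[i] = t;
        carry = hi + c1 + c2;
    }
    return carry;
}

/* Return a pointer inside raw that is 64-byte aligned plus offset bytes.
   offset should be a multiple of sizeof(limb_t). */
static limb_t *place(void *raw, size_t offset)
{
    uintptr_t p = ((uintptr_t)raw + 63) & ~(uintptr_t)63;
    return (limb_t *)(p + offset);
}

/* Time a single size n with a fixed buffer placement.
   Returns the best-of-5 average time per call in nanoseconds. */
static double time_single_n(size_t n, size_t offset, int reps)
{
    size_t bytes = n * sizeof(limb_t) + 64 + offset;
    void *raw_r = malloc(bytes), *raw_u = malloc(bytes), *raw_v = malloc(bytes);
    assert(raw_r && raw_u && raw_v);

    limb_t *rp = place(raw_r, offset);
    limb_t *up = place(raw_u, offset);
    limb_t *vp = place(raw_v, offset);
    for (size_t i = 0; i < n; i++) {
        up[i] = (limb_t)i * 0x9E3779B97F4A7C15ULL;
        vp[i] = ~up[i];
    }

    volatile limb_t sink = 0;         /* keeps the calls from being dropped */
    sink += ref_addlsh1_n(rp, up, vp, n);   /* warm-up pass */

    double best = 1e300;
    for (int trial = 0; trial < 5; trial++) {
        clock_t t0 = clock();
        for (int r = 0; r < reps; r++)
            sink += ref_addlsh1_n(rp, up, vp, n);
        clock_t t1 = clock();
        double ns = (double)(t1 - t0) * 1e9 / CLOCKS_PER_SEC / reps;
        if (ns < best)
            best = ns;
    }

    free(raw_r);
    free(raw_u);
    free(raw_v);
    return best;
}
```

For example, comparing `time_single_n(1008, 0, 100000)` with `time_single_n(1009, 0, 100000)`, and then repeating with `offset` set to 8 or 16, would show whether the gap follows n or the buffer placement. To reproduce the original measurements, the stand-in routine would be replaced by the real one and `clock()` by a cycle counter.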
david

On May 4, 10:27 am, Jason Moxham <ja...@njkfrudils.plus.com> wrote:
> Hi
>
> I've been playing with some assembler for the Intel Core2 chips and have come
> across this timing oddity which I can't explain. Any ideas?
>
> Attached is an attempt at mpn_addlsh1_n
>
> Running timings for a few sizes:
>
> limbs   time in cycles
>  990    3358.04
>  991    3323.79
>  992    2787.45
>  993    3357.63
>  994    3358.74
>  995    3393.34
>  996    2798.41
>  997    3370.40
>  998    3389.18
>  999    3358.13
> 1000    2809.83
> 1001    3385.78
> 1002    3424.43
> 1003    3373.76
> 1004    2820.91
> 1005    3389.62
> 1006    3416.26
> 1007    3339.87
> 1008    2833.34
> 1009    3371.09
> 1010    3429.02
>
> As you can see, the timings when n%4=0 are much faster. As it's a 4-way unroll
> we expect it to be a little faster, but nothing like this. For example,
> going from 1008 to 1009 limbs takes an extra 538 cycles!
> You will also notice a useless push %rbp, and the alignment for the loop is
> 32 not 16; without this I could not get the fast speed for the n%4=0 case.
> This is on a Core2 and a Penryn.
>
> Jason
>
> addlsh1_n.asm
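To put the quoted figures on a per-limb basis, here is a small C sketch that averages cycles per limb separately for sizes with n%4=0 and for all other sizes. The data are copied from the table in the quoted message; the function name `mean_cycles_per_limb` is made up for this illustration.

```c
#include <stddef.h>

/* Timings copied from the quoted message: limb count and cycles. */
static const int limbs[] = {
    990, 991, 992, 993, 994, 995, 996, 997, 998, 999, 1000,
    1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010
};
static const double cycles[] = {
    3358.04, 3323.79, 2787.45, 3357.63, 3358.74, 3393.34, 2798.41,
    3370.40, 3389.18, 3358.13, 2809.83, 3385.78, 3424.43, 3373.76,
    2820.91, 3389.62, 3416.26, 3339.87, 2833.34, 3371.09, 3429.02
};
enum { NPOINTS = sizeof limbs / sizeof limbs[0] };

/* Mean cycles per limb over the rows where (n % 4 == 0) matches
   want_multiple_of_4 (nonzero selects the n%4=0 rows). */
static double mean_cycles_per_limb(int want_multiple_of_4)
{
    double sum = 0.0;
    int count = 0;
    for (int i = 0; i < NPOINTS; i++) {
        int is_multiple = (limbs[i] % 4 == 0);
        if (is_multiple == (want_multiple_of_4 != 0)) {
            sum += cycles[i] / limbs[i];
            count++;
        }
    }
    return count ? sum / count : 0.0;
}
```

On these numbers the n%4=0 sizes come out at about 2.8 cycles per limb and the other sizes at about 3.4, so the gap is spread over the whole operand rather than being a fixed cost for the leftover limbs.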