Do you get consistent numbers if you run the timing for only a single
value of n? That is, could this be an artifact of the way the buffers
are allocated, or something similar?
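
A minimal sketch of the kind of check I mean, under loud assumptions: the
names ref_addlsh1_n and min_ns_for_n are hypothetical, the C loop is only a
portable stand-in for the attached assembler, and it reports wall-clock
nanoseconds rather than cycles. The point is just the structure: one fixed n,
buffers allocated once and reused, many repetitions, keep the fastest.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical portable stand-in for the routine being timed:
   rp[] = up[] + 2*vp[] over n 64-bit limbs; returns the carry-out (0, 1 or 2). */
static uint64_t ref_addlsh1_n(uint64_t *rp, const uint64_t *up,
                              const uint64_t *vp, size_t n)
{
    uint64_t shift_out = 0, carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t v2 = (vp[i] << 1) | shift_out; /* 2*vp[i] plus bit from limb below */
        shift_out = vp[i] >> 63;                /* bit shifted out of this limb */
        uint64_t s = up[i] + v2;
        uint64_t c1 = s < v2;                   /* carry from up + 2*vp */
        uint64_t r = s + carry;
        uint64_t c2 = r < s;                    /* carry from adding carry-in */
        rp[i] = r;
        carry = c1 | c2;                        /* c1 and c2 are never both set */
    }
    return carry + shift_out;
}

static volatile uint64_t sink; /* keeps the compiler from discarding the calls */

/* Time one fixed size n, reusing the same three buffers for every repetition.
   Returns the fastest run in nanoseconds, or -1.0 if allocation fails. */
static double min_ns_for_n(size_t n, int reps)
{
    assert(n > 0 && reps > 0);
    uint64_t *rp = malloc(n * sizeof *rp);
    uint64_t *up = malloc(n * sizeof *up);
    uint64_t *vp = malloc(n * sizeof *vp);
    double best = -1.0;

    if (rp && up && vp) {
        for (size_t i = 0; i < n; i++) {        /* arbitrary fixed test data */
            up[i] = 0x9E3779B97F4A7C15ULL * (i + 1);
            vp[i] = ~up[i];
        }
        for (int r = 0; r < reps; r++) {
            struct timespec t0, t1;
            timespec_get(&t0, TIME_UTC);
            uint64_t cy = ref_addlsh1_n(rp, up, vp, n);
            timespec_get(&t1, TIME_UTC);
            sink += cy + rp[n - 1];
            double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                      + (double)(t1.tv_nsec - t0.tv_nsec);
            if (r == 0 || ns < best)
                best = ns;
        }
    }
    free(rp);
    free(up);
    free(vp);
    return best;
}
```

Calling min_ns_for_n(1008, 1000) and min_ns_for_n(1009, 1000) in separate
runs of the program, with the real routine and a cycle counter swapped in,
would show whether the gap survives when nothing else about the buffers
changes between measurements.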

david

On May 4, 10:27 am, Jason Moxham <ja...@njkfrudils.plus.com> wrote:
> Hi
>
> I've been playing with some assembler for the Intel Core2 chips and have come
> across this timing oddity which I can't explain. Any ideas?
>
> Attached is an attempt at mpn_addlsh1_n
>
> Running timings for a few sizes:
>  limbs       time in cycles
> 990           3358.04
> 991           3323.79
> 992           2787.45
> 993           3357.63
> 994           3358.74
> 995           3393.34
> 996           2798.41
> 997           3370.40
> 998           3389.18
> 999           3358.13
> 1000          2809.83
> 1001          3385.78
> 1002          3424.43
> 1003          3373.76
> 1004          2820.91
> 1005          3389.62
> 1006          3416.26
> 1007          3339.87
> 1008          2833.34
> 1009          3371.09
> 1010          3429.02
>
> As you can see, the timings when n%4=0 are much faster. As it's a 4-way unroll
> we expect it to be a little faster, but nothing like this. For example,
> going from 1008 to 1009 limbs takes an extra 538 cycles!
> You will also notice a useless push %rbp, and the alignment for the loop is
> 32, not 16; without this I could not get the fast speed for the n%4=0 case.
> This is on a Core2 and a Penryn.
>
> Jason
>
> Attachment: addlsh1_n.asm (2K)