On 2013-03-08 03:46, Torbjorn Granlund wrote:
I assume you mean that the destination ptrs are naturally aligned, while
the source ptrs are 32-bit aligned?

Yes.

My guess for the "jaggyness" is that of two src ptrs, you rarely strike
a case where they are 256-bit aligned, in particular not when both are
256-bit aligned.  But that happens much more often for 128-bit
alignment.  My copy was alignment insensitive, perhaps thanks to
scheduling, or that it stresses the unaligned load logic less, with its
one load-per-store?

I don't know. I do know there's something bizarre going on that probably needs some chip knowledge to figure out.

For instance, I tested the -128 patch I posted here, making no other change except *adding* :128 markers to both source operands, in the hope of determining what effect source alignment has on the loop. (This change is not generally correct, but it does work for the case of speed with specified alignment.)

The peak result is slightly *slower* than before.

                with align                       without align
            mpn_and_n    mpn_nand_n          mpn_and_n    mpn_nand_n
10            #1.7989        1.8987              1.7990        1.8989
50            #0.9393        1.0693              0.9395        1.0694
100           #1.2491        1.3891              1.2496        1.3893
500           #0.8154        0.9753              0.8156        0.9756
1000           0.8746        1.0642             #0.7787        0.9435
5000          #1.4067        1.4939              1.5012        1.5577
10000         #1.5454        1.6702              1.5521        1.5926

_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel