David Miller <da...@davemloft.net> writes:

  Technically we could use this on some chips we don't distinguish on
  a fine enough granularity yet. For example we can assume popc is
  available on T2 as well as UltraSPARC-IV.

  But for now, just T3 and later. I suppose we should mention this as
  a comment in the code.
I think that popc runs in the multiplier unit on T4, and thus has
similar characteristics. It fully pipelines but has a latency of 12
cycles.

That's one deep pipeline!

2013-03-22  David S. Miller  <da...@davemloft.net>

	* mpn/sparc64/ultrasparct3/hamdist.asm: New file.
	* mpn/sparc64/ultrasparct3/popcount.asm: New file.

The code is in. Thanks for this contribution!

I also updated the asm.html tables. You have a lot of work to do
before the T4 column is filled in with optimal code...

I actually wrote a v9 popcount a while back. It is about 5 times as
large as yours, and I don't think it runs enough faster to be worth
it. I attached it anyway.
sparc64-popcount.asm
Description: Binary data
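
For readers without a popc instruction at hand, here is a minimal C sketch of what the two routines compute. It is not the attached v9 assembly and not the T3 code. It assumes 64-bit limbs, and the names popcount64, limbs_popcount and limbs_hamdist are made up for illustration (GMP's own entry points are mpn_popcount and mpn_hamdist). The per-word count uses the well-known SWAR bit-summing trick in place of a hardware instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the set bits of one 64-bit word without a
   hardware popcount instruction, using the classic SWAR reduction. */
static uint64_t popcount64(uint64_t x)
{
    /* Sum adjacent bits into 2-bit fields. */
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    /* Sum adjacent 2-bit fields into 4-bit fields. */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    /* Sum adjacent 4-bit fields into bytes. */
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    /* Add all eight bytes together; the total lands in the top byte. */
    return (x * 0x0101010101010101ULL) >> 56;
}

/* Assumed semantics, modelled on mpn_popcount: number of set bits in
   the n limbs starting at p. */
static uint64_t limbs_popcount(const uint64_t *p, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += popcount64(p[i]);
    return total;
}

/* Assumed semantics, modelled on mpn_hamdist: number of bit positions
   in which the two n-limb operands differ. */
static uint64_t limbs_hamdist(const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += popcount64(a[i] ^ b[i]);
    return total;
}
```

On a chip with popc, each popcount64 call would collapse to a single instruction, and the Hamming distance is the same loop with an xor in front.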
-- Torbjörn
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel