David Miller <da...@davemloft.net> writes:

  Technically we could use this on some chips we don't distinguish on
  a fine enough granularity yet.  For example we can assume popc is
  available on T2 as well as UltraSPARC-IV.
  
  But for now, just T3 and later.
  
I suppose we should mention this as a comment in the code.

  I think that popc runs in the multiplier unit on T4, and thus has
  similar characteristics.  It fully pipelines but has a latency of
  12 cycles.
  
That's one deep pipeline!
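
To put "fully pipelines but has a latency of 12 cycles" in numbers, here
is a rough sketch of an idealized model. It assumes one independent popc
can issue every cycle and that nothing else stalls, which is a
simplification and not a measurement of T4. The function names are
made up for illustration.

```c
#include <stdint.h>

/* Idealized model only: a fully pipelined unit accepts one new,
   independent operation per cycle, so n operations complete after
   latency + (n - 1) cycles. */
static uint64_t pipelined_cycles(uint64_t n, uint64_t latency)
{
    return n == 0 ? 0 : latency + n - 1;
}

/* For comparison: if each operation had to wait for the previous
   result, the full latency would be paid n times. */
static uint64_t serialized_cycles(uint64_t n, uint64_t latency)
{
    return n * latency;
}
```

Under these assumptions, the popc of each limb is independent of the
others, so a long operand should approach one limb per cycle as long as
the loop keeps enough popc instructions in flight.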

  2013-03-22  David S. Miller  <da...@davemloft.net>
  
        * mpn/sparc64/ultrasparct3/hamdist.asm: New file.
        * mpn/sparc64/ultrasparct3/popcount.asm: New file.
  
The code is in.  Thanks for this contribution!  I also updated the
asm.html tables.  You have a lot of work to do before the T4 column is
filled in with optimal code...

I actually wrote a v9 popcount a while back.  It is about 5 times as
large as yours, and I don't think it runs enough faster to be worth it.
I attached it anyway.
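
For readers who have not seen a popcount written without a popc
instruction, a minimal C sketch of one common bit-parallel technique
follows. It is not the attached assembly and makes no claim about how
that code works. The function names are hypothetical, and limbs are
assumed to be 64 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the set bits in one 64-bit limb using only shifts, masks,
   adds and one multiply (no popc instruction). */
static unsigned popcount64_swar(uint64_t x)
{
    /* Each 2-bit field now holds the count of its own two bits. */
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    /* Sum adjacent 2-bit fields into 4-bit fields. */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    /* Sum adjacent 4-bit fields into bytes (each byte is at most 8). */
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    /* The multiply adds all eight bytes into the top byte. */
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}

/* Total number of set bits in an n-limb operand. */
static uint64_t popcount_limbs(const uint64_t *p, size_t n)
{
    uint64_t total = 0;
    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += popcount64_swar(p[i]);
    return total;
}

/* Hamming distance of two n-limb operands: popcount of their xor. */
static uint64_t hamdist_limbs(const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t total = 0;
    assert((a != NULL && b != NULL) || n == 0);
    for (size_t i = 0; i < n; i++)
        total += popcount64_swar(a[i] ^ b[i]);
    return total;
}
```

Each limb costs a dozen or so simple operations this way, which is the
kind of overhead a single hardware popc avoids.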

Attachment: sparc64-popcount.asm

-- 
Torbjörn