Hi,

New karasub/add for Nehalem. I did a re-shuffle and it was pretty much optimal, but the feed-in/wind-down code killed it, so at the moment this is just the existing K10 code with the inc's replaced by an add and lea's; just do a diff. Going to a 3-way unroll on karasub frees up 2 registers, which we can use to store the carries, so the false dependence is gone and the latency bound is much reduced, but in practice I couldn't find anything better :( and storing the carry in a memory location was even worse.
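For anyone not following the carry business, here is a rough C model of the idea of giving each add chain its own saved carry. This is only a sketch, not the actual assembler: `add_limb` and `add_two_chains` are made-up names, and in the real code the carries live in the flags and in spare registers, not in C variables.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an MPIR function): add two 64-bit limbs plus a
   carry-in of 0 or 1, store the low 64 bits in *sum, return the carry-out. */
static uint64_t add_limb(uint64_t a, uint64_t b, uint64_t cin, uint64_t *sum)
{
    uint64_t t = a + b;
    uint64_t c1 = t < a;      /* carry out of a + b */
    uint64_t s = t + cin;
    uint64_t c2 = s < t;      /* carry out of adding the carry-in */
    *sum = s;
    return c1 | c2;           /* at most one of c1, c2 can be set */
}

/* Hypothetical sketch: two interleaved n-limb additions, x = a + b and
   y = c + d. Each chain keeps its carry in its own variable (cx, cy), so
   neither chain has to wait on, or clobber, the other chain's carry. */
static void add_two_chains(uint64_t *x, const uint64_t *a, const uint64_t *b,
                           uint64_t *y, const uint64_t *c, const uint64_t *d,
                           size_t n, uint64_t *carry_x, uint64_t *carry_y)
{
    uint64_t cx = 0, cy = 0;
    for (size_t i = 0; i < n; i++) {
        cx = add_limb(a[i], b[i], cx, &x[i]);
        cy = add_limb(c[i], d[i], cy, &y[i]);
    }
    *carry_x = cx;
    *carry_y = cy;
}
```

On the real chip there is only one carry flag, so every saved carry costs a register (or a memory slot, which was the slower variant mentioned above).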
I expect the other Intel chips will benefit as well from changing inc's to lea's; I'll check it out later.

Jason