Hi

New karasub/add for Nehalem. I did try a re-shuffle and it was pretty much 
optimal, but the feed-in/wind-down code killed it, so at the moment this is just 
the existing K10 code with the inc's replaced by an add and lea's; just do a 
diff to see the changes.
Going to a 3-way unroll on karasub frees up 2 registers, which we can use to 
store the carries, so the false dependence is gone and the latency bound 
is much reduced. In practice, though, I couldn't find anything better :( and 
storing the carry in a memory location was even worse.
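
As a rough illustration of the carry bookkeeping discussed above, here is a portable C model, not the actual assembly. It assumes a routine that runs two carry chains in the same pass, which is the situation where carries have to be parked somewhere between steps. The names limb_t, adc, sbb and add_and_sub_n are made up for this example and are not MPIR functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t; /* hypothetical limb type for this sketch */

/* One add-with-carry step: returns a + b + *carry and updates *carry (0 or 1). */
static limb_t adc(limb_t a, limb_t b, limb_t *carry)
{
    assert(*carry <= 1);
    limb_t s = a + b;
    limb_t c1 = s < a;          /* overflow from a + b */
    limb_t r = s + *carry;
    limb_t c2 = r < s;          /* overflow from adding the incoming carry */
    *carry = c1 | c2;           /* at most one of the two can be set */
    return r;
}

/* One subtract-with-borrow step: returns a - b - *borrow and updates *borrow. */
static limb_t sbb(limb_t a, limb_t b, limb_t *borrow)
{
    assert(*borrow <= 1);
    limb_t d = a - b;
    limb_t b1 = a < b;          /* borrow from a - b */
    limb_t r = d - *borrow;
    limb_t b2 = d < *borrow;    /* borrow from subtracting the incoming borrow */
    *borrow = b1 | b2;
    return r;
}

/* Computes sum = x + y and diff = x - y over n limbs in a single pass.
 * Each chain keeps its carry in its own variable, so neither chain has to
 * wait on the other. On x86 there is only one carry flag, so the assembly
 * equivalent must save and restore it (in a spare register or in memory). */
static void add_and_sub_n(limb_t *sum, limb_t *diff,
                          const limb_t *x, const limb_t *y, size_t n,
                          limb_t *carry_out, limb_t *borrow_out)
{
    limb_t carry = 0, borrow = 0;
    for (size_t i = 0; i < n; i++) {
        sum[i]  = adc(x[i], y[i], &carry);
        diff[i] = sbb(x[i], y[i], &borrow);
    }
    *carry_out = carry;
    *borrow_out = borrow;
}
```

In the C model the two carries are independent by construction. In the assembly, each one has to be moved in and out of the single carry flag, and that traffic is the cost the spare registers are meant to cut.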

I expect the other Intel chips will also benefit from replacing inc's with 
lea's; I'll check that later.

Jason
