On Wednesday 13 July 2011 10:06:50 Jason wrote:
> Hi
>
> New karasub/add for Nehalem. I did do a re-shuffle and it was pretty much
> optimal, but the feed-in/wind-down code killed it, so at the moment this is
> just the existing K10 code with the inc instructions replaced by an add and
> leas; just do a diff.
> Going to a 3-way unroll on karasub frees up 2 registers, which we can use
> to store the carries, so the false dependence is gone and the latency
> bound is much reduced, but in practice I couldn't find anything better :(
> Storing the carry in a memory location was even worse.
>
> I expect the other Intel chips will also benefit from replacing inc with
> lea; I'll check it out later.
>
> Jason

Hi

I added two new asm functions for the K8/K10:

mpn_double(mp_ptr rp, mp_size_t n)
mpn_half(mp_ptr rp, mp_size_t n)

These are in-place left and right shifts by 1. They can be used unconditionally, as they fall back via macros to lshift1 or even lshift.

On the K8/K10 they are 30% faster than our existing lshift1/rshift1, running at the optimal 1.0 cycle per word.

I'll add the relevant C code later today, as I've only done a few quick tests on them so far.

Jason
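
For anyone who wants to check the semantics, here is a rough portable C sketch of what an in-place shift by 1 over n limbs computes. It is not the asm and not the C code promised above. The names ref_double and ref_half are hypothetical, the typedefs are stand-ins for the real MPIR types assuming 64-bit limbs, and the return values (the bit shifted out, following the mpn_lshift/mpn_rshift convention) are an assumption, since the message does not say what the new functions return.

```c
#include <stdint.h>

/* Stand-ins for the MPIR types (assumption: 64-bit limbs). */
typedef uint64_t mp_limb_t;
typedef mp_limb_t *mp_ptr;
typedef long mp_size_t;

/* Hypothetical reference: shift {rp, n} left by one bit in place.
   Returns the bit shifted out of the top limb (assumed convention). */
static mp_limb_t ref_double(mp_ptr rp, mp_size_t n)
{
    mp_limb_t carry = 0;
    for (mp_size_t i = 0; i < n; i++) {
        mp_limb_t out = rp[i] >> 63;      /* top bit leaving this limb */
        rp[i] = (rp[i] << 1) | carry;     /* bring in bit from the limb below */
        carry = out;
    }
    return carry;
}

/* Hypothetical reference: shift {rp, n} right by one bit in place.
   Returns the bit shifted out, placed in the top bit of the result
   (assumed convention, as with mpn_rshift). */
static mp_limb_t ref_half(mp_ptr rp, mp_size_t n)
{
    mp_limb_t carry = 0;
    for (mp_size_t i = n - 1; i >= 0; i--) {
        mp_limb_t out = rp[i] << 63;      /* low bit, moved to the top position */
        rp[i] = (rp[i] >> 1) | carry;     /* bring in bit from the limb above */
        carry = out;
    }
    return carry;
}
```

On targets without the asm version, the macro fallback mentioned above would presumably amount to something like mpn_lshift(rp, rp, n, 1) and mpn_rshift(rp, rp, n, 1), though the exact macro definitions are not given here.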