On Wednesday 13 July 2011 10:06:50 Jason wrote:
> Hi
> 
> New karasub/add for Nehalem. I did do a re-shuffle and it was pretty much
> optimal, but the feed-in/wind-down code killed it, so at the moment this is
> just the existing K10 code with the incs replaced by an add and leas;
> just do a diff.
> Going to a 3-way unroll on karasub frees up 2 registers, which we can use
> to store the carries, so the false dependence is gone and the latency
> bound is much reduced, but in practice I couldn't find anything better :(
> and storing the carry in a memory location was even worse.
> 
> I expect the other Intel chips will benefit as well from replacing incs
> with leas; I'll check it out later.
> 
> Jason

Hi

I added two new asm functions for the K8/K10:

mpn_double(mp_ptr rp,mp_size_t n)

mpn_half(mp_ptr rp,mp_size_t n)

which are in-place left and right shifts by 1 bit. These functions can be used
unconditionally, as they fall back via macros to lshift1 or even lshift
(and correspondingly to rshift1 or rshift for mpn_half).
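
As a rough illustration of the fallback idea (not the actual MPIR code), here
is a portable C sketch. The names limb_t, ref_lshift, ref_rshift, ref_double,
ref_half and the HAVE_NATIVE_* flags are all hypothetical stand-ins, and the
returned shifted-out bits are an assumption, since the prototypes above give
no return type.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t limb_t;  /* hypothetical stand-in for a 64-bit mp_limb_t */

/* Shift {sp, n} left by cnt bits (1 <= cnt <= 63), least significant limb
   first, writing to {rp, n}. Works in place (rp == sp) because each source
   limb is read before its slot is written. Returns the bits shifted out of
   the top limb, in the low bits of the result. */
static limb_t ref_lshift(limb_t *rp, const limb_t *sp, long n, unsigned cnt)
{
    limb_t carry = 0;
    assert(n >= 0 && cnt >= 1 && cnt <= 63);
    for (long i = 0; i < n; i++) {
        limb_t s = sp[i];
        rp[i] = (s << cnt) | carry;
        carry = s >> (64 - cnt);
    }
    return carry;
}

/* Shift {sp, n} right by cnt bits, walking from the top limb down. Returns
   the bits shifted out of the bottom limb, in the high bits of the result. */
static limb_t ref_rshift(limb_t *rp, const limb_t *sp, long n, unsigned cnt)
{
    limb_t carry = 0;
    assert(n >= 0 && cnt >= 1 && cnt <= 63);
    for (long i = n - 1; i >= 0; i--) {
        limb_t s = sp[i];
        rp[i] = (s >> cnt) | carry;
        carry = s << (64 - cnt);
    }
    return carry;
}

/* Fallback macros: if no native (asm) version is configured, the in-place
   double/half simply call the general shift with source == destination. */
#ifndef HAVE_NATIVE_DOUBLE  /* hypothetical configuration flag */
#define ref_double(rp, n) ref_lshift((rp), (rp), (n), 1)
#endif
#ifndef HAVE_NATIVE_HALF    /* hypothetical configuration flag */
#define ref_half(rp, n) ref_rshift((rp), (rp), (n), 1)
#endif
```

With this arrangement callers can always write ref_double(rp, n) or
ref_half(rp, n); a CPU-specific asm version would only need to define the
corresponding flag and supply its own function of the same name.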

On the K8/K10 they are 30% faster than our existing lshift1/rshift1, running at
the optimal 1.0 cycles per word.

I'll add the relevant C code later today, as I've only done a few quick tests
on them so far.

Jason
