On Sunday 19 June 2011 23:54:39 Bill Hart wrote:
> Looks promising.
> 
> I've started working on FFT's again. So maybe this MPIR release will
> have some new multiplication speedups at last!
> 
> Bill.
> 
> On 19 June 2011 23:24, Jason <ja...@njkfrudils.plus.com> wrote:
> >> 24) New Toom22 code. The new code is smaller if we let the high part
> >> >= low part, which is the opposite of the current code, so it's
> >> probably easier just to rewrite the whole thing.
> > 
> > Hi
> > 
> > Here is an outline of the new toom22_n code. There are obvious O(1)
> > speedups still to do, but I'll leave them until I've tested the new
> > assembler code, as the linear O(n) part is what has improved. I rewrote
> > all the code, as that was the easiest way given the other slight minor
> > differences (and I do so hate reading other people's code). The original
> > code takes the differences between the high and low parts, and this has
> > not changed; what has changed is the last section, where we add/subtract
> > the sub-products together to form the desired full product. Originally
> > this consisted of three adds, which on the K8 run at 4.5 cycles per
> > word. This was improved with the new mpn_addadd_n function, which ran at
> > 3.5 cycles per word, and now I have a new mpn_karaadd (i.e.
> > mpn_addaddadd) function which runs at 2.5 cycles per word. The addadd
> > function gave us a 2-7% speedup, and I pretty much expect the same
> > again. The lower bound is actually 2.0 cycles per word, and I think I
> > may be able to reach it without too much pipelining. Similar
> > improvements are possible on the Intel CPUs and many others (the RISC
> > CPUs are probably easier). I've only written the inner loop of the
> > addaddadd function so far, but I don't foresee any difficulties; I
> > should be able to finish it next week. In case you are wondering, the
> > new asm code won't be general code like mpn_addadd_n was, but will be
> > specific to toom22 multiplication, as it has to cope with operand
> > overlap and the "odd" cases.
> > 
> > 
> > Jason
> > 
> > --
> > You received this message because you are subscribed to the Google Groups
> > "mpir-devel" group. To post to this group, send email to
> > mpir-devel@googlegroups.com. To unsubscribe from this group, send email
> > to mpir-devel+unsubscr...@googlegroups.com. For more options, visit this
> > group at http://groups.google.com/group/mpir-devel?hl=en.
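For reference, the fused three-operand add that the quoted message calls mpn_karaadd (i.e. addaddadd) can be sketched in portable C. The function name, signature and carry handling below are illustrative only; the real routine is hand-written assembler and additionally copes with operand overlap and the odd sizes:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb;

/* r[] = a[] + b[] + c[] over n limbs in a single pass, returning the
   carry-out.  A portable stand-in for the fused assembler inner loop:
   the point is one pass over memory instead of the three passes that
   three separate mpn_add_n calls would make. */
limb addaddadd_n(limb *r, const limb *a, const limb *b,
                 const limb *c, size_t n)
{
    limb cy = 0;                  /* incoming carry, 0..3 */
    for (size_t i = 0; i < n; i++) {
        limb s1 = a[i] + b[i];
        limb c1 = s1 < a[i];      /* carry from a + b */
        limb s2 = s1 + c[i];
        limb c2 = s2 < s1;        /* carry from + c */
        limb s3 = s2 + cy;
        limb c3 = s3 < s2;        /* carry from + incoming carry */
        r[i] = s3;
        cy = c1 + c2 + c3;
    }
    return cy;
}
```

In Karatsuba/toom22 recombination this fused pass forms the middle word range of the product, which is exactly where the three separate adds used to run at 4.5 cycles per word on the K8.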


Perhaps we should consider a floating-point FFT, as it looks like all future 
CPUs will have much improved floating-point capability (i.e. GPUs), so even if 
the algorithm is less efficient the result may still run faster than an 
integer FFT, at least over some ranges.

Also, our selection mechanism (i.e. thresholds) can be more sophisticated at 
the FFT sizes: we could use a large array to store the best parameters (which 
we may already be using some form of now?)
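A table-driven selection could look something like the sketch below; the struct layout, table indices and parameter values are all invented for illustration and are not MPIR's actual tuning data:

```c
#include <stddef.h>

/* Hypothetical per-size-class FFT parameters, indexed by floor(log2(n))
   of the operand size in limbs.  The values are placeholders; a real
   table would be filled in by the tuning program.  Indices below 10
   (i.e. operands too small for the FFT range) are left zero. */
struct fft_params { int depth; int sqr_depth; };

static const struct fft_params fft_tab[] = {
    [10] = { 6, 5 },
    [11] = { 7, 6 },
    [12] = { 7, 7 },
    [13] = { 8, 7 },
};

struct fft_params select_fft_params(size_t n_limbs)
{
    size_t k = 0;
    while (n_limbs >>= 1)
        k++;                           /* k = floor(log2(n_limbs)) */
    if (k >= sizeof fft_tab / sizeof fft_tab[0])
        k = sizeof fft_tab / sizeof fft_tab[0] - 1;
    return fft_tab[k];
}
```

The appeal over a single crossover threshold is that each size class gets its own measured-best parameters, at the cost of a larger tuning table.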

Another thing is fixed-size mpn's, mostly (I imagine) for ECM. I have current 
code which runs a little bit (2%) faster for small sizes, and I think I can do 
a bit better (5%) for base-case sizes. I don't know if this is practical for 
MPIR, but I'm pretty certain it could benefit ECM. I'm quoting AMD chips here 
as I'm more familiar with them; Intel chips should see a benefit too, but 
without actual code I'm speculating.
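The idea behind fixed-size mpn routines can be shown with a trivial C sketch: when the limb count is a compile-time constant, the compiler can fully unroll the loop and drop all size bookkeeping. The name add_4 is made up here, and the real gains quoted above come from hand-written assembler, not from C:

```c
#include <stdint.h>

typedef uint64_t limb;

/* r[] = a[] + b[] for exactly 4 limbs, returning the carry-out.
   With the trip count fixed at compile time there is no loop-control
   overhead and everything can live in registers -- the advantage a
   fixed-size routine has over the general mpn_add_n. */
limb add_4(limb r[4], const limb a[4], const limb b[4])
{
    limb cy = 0;
    for (int i = 0; i < 4; i++) {      /* constant bound: unrolled */
        limb t  = a[i] + cy;
        limb c1 = t < cy;              /* carry from + incoming carry */
        limb s  = t + b[i];
        limb c2 = s < t;               /* carry from + b */
        r[i] = s;
        cy = c1 + c2;                  /* at most one of c1, c2 is set */
    }
    return cy;
}
```

For ECM, where all residues share the modulus size of the number being factored, every mpn call in the inner loops could in principle be replaced by such a fixed-size variant.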

Jason
