On Monday 19 May 2008, Bill Hart wrote:
> You seemed to be getting up to 8% at points there. That's definitely
> worth it. I'll be interested to see this evening how it comes out,
> though I recommend optimising my combine3 function (which I suppose
> should now be combine8), even including it inline rather than have it
> in a separate file.
>
> Of course on the Opteron, SSE should be switched off, since it is
> definitely slower by about 5%-10% even with careful optimisation.
>
> Bill.

Okay, so a good compromise is to remove all the SSE2 code from the main function 
_mzd_mul_m4rm_impl and put it in a static inline function, _mzd_combine8, that 
is tailored specifically to this application. That way the code still looks 
reasonably clean and we keep SSE2 support, which can be compiled out on 
machines like the Opteron where it is slower.
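
To make the idea concrete, here is a rough sketch of what such a helper could 
look like. It is not the actual M4RI code: the name combine8_sketch, its 
signature, and the use of the compiler's __SSE2__ macro as the on/off switch 
are all assumptions for illustration. It XORs eight table rows into a 
destination row, two 64-bit words at a time when SSE2 is available, and falls 
back to plain word-wise XOR otherwise.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical stand-in for the _mzd_combine8 helper discussed above.
 * Computes dst[i] ^= t[0][i] ^ ... ^ t[7][i] for 0 <= i < wide.
 * All SSE2-specific code lives here, so the caller stays free of intrinsics. */
static inline void combine8_sketch(uint64_t *dst,
                                   const uint64_t *const t[8],
                                   size_t wide) {
    size_t i = 0;
#ifdef __SSE2__
    /* Process two 64-bit words per iteration. Unaligned loads and stores
     * are used so the sketch makes no assumption about row alignment. */
    for (; i + 2 <= wide; i += 2) {
        __m128i acc = _mm_loadu_si128((const __m128i *)(dst + i));
        for (int k = 0; k < 8; k++) {
            acc = _mm_xor_si128(acc,
                                _mm_loadu_si128((const __m128i *)(t[k] + i)));
        }
        _mm_storeu_si128((__m128i *)(dst + i), acc);
    }
#endif
    /* Scalar path: handles everything without SSE2, and the odd
     * trailing word with it. */
    for (; i < wide; i++) {
        uint64_t acc = dst[i];
        for (int k = 0; k < 8; k++) {
            acc ^= t[k][i];
        }
        dst[i] = acc;
    }
}
```

Building without SSE2 enabled (or undefining the macro by hand) leaves only 
the scalar loop, which is one way the Opteron case could be handled without 
touching the caller.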

Martin


-- 
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_www: http://www.informatik.uni-bremen.de/~malb
_jab: [EMAIL PROTECTED]


--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to sage-devel@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
-~----------~----~----~----~------~----~------~--~---