On 16 January 2012 19:01, Timon Gehr <timon.g...@gmx.ch> wrote:
> On 01/16/2012 05:59 PM, Manu wrote:
>
>> On 16 January 2012 18:48, Andrei Alexandrescu
>> <seewebsiteforem...@erdani.org> wrote:
>>
>>     On 1/16/12 10:46 AM, Manu wrote:
>>
>>         A function using float arrays and a function using hardware
>>         vectors should certainly not be the same speed.
>>
>>     My point was that the version using float arrays should
>>     opportunistically use hardware ops whenever possible.
>>
>> I think this is a mistake, because such a piece of code never exists
>> outside of some context. If the context it exists within is all FPU
>> code (and it is, it's a float array), then swapping between FPU and
>> SIMD execution units will probably result in the function being
>> slower than the original (also, the float array is unaligned). The
>> SIMD version, however, must exist within a SIMD context, since the
>> API can't implicitly interact with floats; this guarantees that the
>> context of each function matches that within which it lives.
>> This is fundamental to fast vector performance. Using SIMD is an
>> all-or-nothing decision; you can't just mix it in here and there.
>> You don't go casting back and forth between floats and ints on every
>> other line... obviously it's imprecise, but it's also a major
>> performance hazard. There is no difference here, except the
>> performance hazard is much worse.
>
> I think DMD now uses XMM registers for scalar floating point
> arithmetic on x86_64.
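To make the quoted distinction concrete, here's a rough D sketch of the
two styles (assuming the core.simd module and its float4 type that are
in the works; the function names are just illustrative):

import core.simd;

// Scalar version: element-wise maths on an unaligned float array. The
// compiler emits FPU (or scalar SSE) code here, and "opportunistically"
// vectorising it means shuffling data between execution units and
// dealing with alignment and odd lengths.
void scaleScalar(float[] a, float s)
{
    foreach (ref x; a)
        x *= s;
}

// SIMD version: the float4 signature forces the caller to already
// hold the data in a hardware vector, so the whole call chain stays
// in a SIMD context and no unit-to-unit transfer is ever needed.
void scaleSimd(ref float4 v, float4 s)
{
    v *= s;   // one packed multiply (e.g. mulps on x86)
}

Usage would be something like float4 v = [1.0f, 2.0f, 3.0f, 4.0f];
float4 s = 2.0f; scaleSimd(v, s); (a scalar initialiser broadcasts to
all four lanes, so the data never leaves the vector registers).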
x64 can do the swapping too with no penalty, but it is the only architecture that can. So it might be a viable optimisation, but only for x64 codegen, which means any logic to detect and apply it should live in the back end, not in the front end as a higher-level semantic.
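For reference, a sketch of why x64 is the exception (this describes
conventional x86_64 codegen in general, not DMD's output specifically):

// A plain scalar loop. A 32-bit x86 compiler may evaluate this on the
// x87 FPU, and moving those results into SSE registers costs a
// round-trip through memory. An x86_64 compiler already lowers it to
// scalar SSE (mulss/addss on xmm registers), the same register file
// that packed SSE (mulps/addps) uses, so substituting packed ops
// incurs no unit-to-unit penalty. That substitution is exactly the
// kind of decision only the backend can make.
float dot(const(float)[] a, const(float)[] b)
{
    float sum = 0.0f;
    foreach (i; 0 .. a.length)
        sum += a[i] * b[i];
    return sum;
}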