On 16 January 2012 19:01, Timon Gehr <timon.g...@gmx.ch> wrote:

> On 01/16/2012 05:59 PM, Manu wrote:
>
>> On 16 January 2012 18:48, Andrei Alexandrescu
>> <seewebsiteforem...@erdani.org> wrote:
>>
>>    On 1/16/12 10:46 AM, Manu wrote:
>>
>>        A function using float arrays and a function using hardware vectors
>>        should certainly not be the same speed.
>>
>>
>>    My point was that the version using float arrays should
>>    opportunistically use hardware ops whenever possible.
>>
>>
>> I think this is a mistake, because such a piece of code never exists
>> outside of some context. If the context it exists within is all FPU code
>> (and it is, it's a float array), then swapping between the FPU and SIMD
>> execution units will probably result in the function being slower than
>> the original (also, the float array is unaligned). The SIMD version,
>> however, must exist within a SIMD context; since its API can't implicitly
>> interact with floats, this guarantees that the context of each function
>> matches that within which it lives.
>> This is fundamental to fast vector performance. Using SIMD is an
>> all-or-nothing decision; you can't just mix it in here and there.
>> You don't go casting back and forth between floats and ints on every
>> other line... obviously it's imprecise, but it's also a major
>> performance hazard. There is no difference here, except the performance
>> hazard is much worse.
>>
>
> I think DMD now uses XMM registers for scalar floating point arithmetic on
> x86_64.
>

x64 can do that swapping with no penalty (scalar floats already live in the
XMM registers there), but it is the only architecture that can. So it might
be a viable optimisation, but only for x64 codegen, which means any logic to
detect and apply the optimisation should live in the back end, not in the
front end as a higher-level semantic.
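
To illustrate the two contexts, here's a minimal sketch in D (assuming
core.simd's float4 type; the function names are made up for the example):

import core.simd;

// FPU/scalar context: works on a plain (unaligned) float array.
void scaleScalar(float[] data, float factor)
{
    foreach (ref x; data)
        x *= factor;
}

// SIMD context: the API only accepts float4, so callers are forced to
// already be in SIMD-land; nothing drops back to scalar mid-stream.
void scaleSimd(float4[] data, float4 factor)
{
    foreach (ref v; data)
        v *= factor; // one packed multiply per iteration
}

The point is the signature: scaleSimd can't be handed loose floats, so any
FPU<->SIMD transition happens at a deliberate boundary chosen by the caller,
never implicitly inside the loop.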
