On 2 October 2012 05:28, F i L <witte2...@gmail.com> wrote:
> Not to resurrect the dead, I just wanted to share an article I came across
> concerning SIMD with Manu.
>
> http://www.gamasutra.com/view/feature/4248/designing_fast_crossplatform_simd_.php
>
> QUOTE:
>
> 1. Returning results by value
>
> By observing the intrinsics interface a vector library must imitate that
> interface to maximize performance. Therefore, you must return the results
> by value and not by reference, as such:
>
>     //correct
>     inline Vec4 VAdd(Vec4 va, Vec4 vb)
>     {
>         return(_mm_add_ps(va, vb));
>     };
>
> On the other hand if the data is returned by reference the interface will
> generate code bloat. The incorrect version below:
>
>     //incorrect (code bloat!)
>     inline void VAddSlow(Vec4& vr, Vec4 va, Vec4 vb)
>     {
>         vr = _mm_add_ps(va, vb);
>     };
>
> The reason you must return data by value is because the quad-word
> (128-bit) fits nicely inside one SIMD register. And one of the key factors
> of a vector library is to keep the data inside these registers as much as
> possible. By doing that, you avoid unnecessary load and store operations
> from SIMD registers to memory or FPU registers. When combining multiple
> vector operations the "returned by value" interface allows the compiler to
> optimize these loads and stores easily by minimizing SIMD to FPU or memory
> transfers.
>
> 2. Data Declared "Purely"
>
> Here, "pure data" is defined as data declared outside a "class" or
> "struct" by a simple "typedef" or "define". When I was researching various
> vector libraries before coding VMath, I observed one common pattern among
> all libraries I looked at during that time. In all cases, developers
> wrapped the basic quad-word type inside a "class" or "struct" instead of
> declaring it purely, as follows:
>
>     class Vec4
>     {
>         ...
> private: > __m128 xyzw; > }; > > This type of data encapsulation is a common practice among C++ developers > to make the architecture of the software robust. The data is protected and > can be accessed only by the class interface functions. Nonetheless, this > design causes code bloat by many different compilers in different > platforms, especially if some sort of GCC port is being used. > > An approach that is much friendlier to the compiler is to declare the > vector data "purely", as follows: > > typedef __m128 Vec4; > > ENDQUOTE; > > > > > The article is 2 years old, but It appears my earlier performance issue > wasn't D related at all, but an issue with C as well. I think in this > situation, it might be best (most optimized) to handle simd "the C way" by > creating and alias or union of a simd intrinsic. D has a big advantage over > C/C++ here because of UFCS, in that we can write external functions that > appear no different to encapsulated object methods. That combined with > public-aliasing means the end-user only sees our pretty functions, but > we're not sacrificing performance at all. >
These are indeed common gotchas. But they don't necessarily apply to D, and if they do, then they should be bugged and hopefully addressed. There is no reason that D needs to follow these typical performance patterns from C.

It's worth noting that not all C compilers suffer from this problem. Many compilers (most, actually) can recognise a struct with a single member and treat it as if it were an instance of that member directly when it is passed by value. It only tends to be a problem on older games-console compilers.

As I said earlier, when I get back to finishing std.simd off (I presume this will be some time after Walter has finished Win64 support), I'll go through and scrutinise the code-gen for the API very thoroughly. We'll see what that reveals. But I don't think there's any reason we should suffer the same legacy C by-value code-gen problems in D... (hopefully I won't eat those words ;)
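The single-member-struct case described in this reply can be sketched in C++ as follows, assuming an x86 target with SSE. The wrapper keeps the `xyzw` member from the quoted class (made public here for brevity) and exposes free functions that take and return the struct by value; `make`, `add`, `mul` and `lane` are hypothetical names. D's UFCS would also allow `add(a, b)` to be written as `a.add(b)`; C++ has no equivalent, so the sketch uses plain calls. Whether a particular compiler really keeps this struct in a register, rather than spilling it to memory, is exactly what inspecting the code-gen would reveal.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 only)
#include <cassert>

// Encapsulated style: a struct whose only member is the SIMD value.
// Many compilers pass/return such a struct like the bare __m128, but this
// is toolchain- and ABI-dependent; check the generated assembly.
struct Vec4 {
    __m128 xyzw;
};

// Hypothetical free functions, all taking and returning Vec4 by value.
inline Vec4 make(float x, float y, float z, float w) {
    return Vec4{_mm_setr_ps(x, y, z, w)};  // setr: lanes in x, y, z, w order
}
inline Vec4 add(Vec4 a, Vec4 b) { return Vec4{_mm_add_ps(a.xyzw, b.xyzw)}; }
inline Vec4 mul(Vec4 a, Vec4 b) { return Vec4{_mm_mul_ps(a.xyzw, b.xyzw)}; }

// Hypothetical helper for inspecting a single lane (i in 0..3).
inline float lane(Vec4 v, int i) {
    assert(i >= 0 && i < 4);
    alignas(16) float out[4];  // _mm_store_ps requires 16-byte alignment
    _mm_store_ps(out, v.xyzw);
    return out[i];
}
```

The wrapper adds no storage of its own (`sizeof(Vec4) == sizeof(__m128)`), so any difference from the `typedef` version comes down to how the compiler treats the struct in calls.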