Re: OOP, faster data layouts, compilers

Don Tue, 26 Apr 2011 01:06:04 -0700

Sean Cavanaugh wrote:

On 4/22/2011 2:20 PM, bearophile wrote:
Kai Meyer:
The purpose of the original post was to indicate that some low level
research shows that underlying data structures (as applied to video game
development) can have an impact on the performance of the application,
which D (I think) cares very much about.
The idea of the original post was a bit more complex: how can we invent new/better ways to express semantics in D code that will not forbid future D compilers from making some changes to the layout of data structures to increase code performance? Complex transforms of the data layout seem far too complex even for a good compiler, but maybe simpler ones will be possible. And I think to do this the D code needs some more semantics. I was suggesting an annotation that forbids inbound pointers, which allows the compiler to move data around a little, but this is just a start.
Bye,
bearophile
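
A minimal sketch of one layout change of the kind discussed above, done by hand: converting an array of structures into a structure of arrays. All names here (ParticleAoS, ParticlesSoA, toSoA, totalMass) are hypothetical and chosen only for illustration; the thread does not say which transforms a compiler would actually attempt.

```d
import std.stdio;

// Array-of-structures layout: the fields of one particle sit together.
struct ParticleAoS
{
    float x, y, z;
    float mass;
}

// Structure-of-arrays layout: each field gets its own contiguous array.
struct ParticlesSoA
{
    float[] x, y, z;
    float[] mass;
}

// Hypothetical helper: the transform written out manually. A compiler
// could only do something like this on its own if it knew that no
// outside pointers refer into individual elements.
ParticlesSoA toSoA(const(ParticleAoS)[] src)
{
    ParticlesSoA dst;
    dst.x = new float[src.length];
    dst.y = new float[src.length];
    dst.z = new float[src.length];
    dst.mass = new float[src.length];
    foreach (i, p; src)
    {
        dst.x[i] = p.x;
        dst.y[i] = p.y;
        dst.z[i] = p.z;
        dst.mass[i] = p.mass;
    }
    return dst;
}

// Sums one field; in the SoA layout this walks a single dense array.
float totalMass(const ParticlesSoA p)
{
    float sum = 0; // floats default to NaN in D, so initialise explicitly
    foreach (m; p.mass)
        sum += m;
    return sum;
}

void main()
{
    auto aos = [ParticleAoS(0, 0, 0, 1.5f), ParticleAoS(1, 2, 3, 2.5f)];
    auto soa = toSoA(aos);
    writeln(totalMass(soa));
}
```

A pointer to a single ParticleAoS has no equivalent after the conversion, which is presumably why an annotation forbidding inbound pointers would be a precondition.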
In many ways the biggest thing I use regularly in game development that I would lose by moving to D would be good built-in SIMD support. The PC compilers from MS and Intel both have intrinsic data types and instructions that cover all the operations from SSE1 up to AVX. The intrinsics are nice in that the job of register allocation and scheduling is given to the compiler, and generally the code it outputs is good enough (though it needs to be watched at times).
Unlike ASM, intrinsics can be inlined, so your math library can provide a platform abstraction at that layer before building up to larger operations (like vectorized forms of sin, cos, etc.) and algorithms (like frustum cull checks, k-DOP polygon collision, etc.). This makes porting and reusing the algorithms on other platforms much, much easier, as only the low-level layer needs to be ported, and only outliers at the algorithm level need to be tweaked after you get it up and running.
On the consoles there is AltiVec (VMX), which is very similar to SSE in many ways. The common ground is basically SSE1-tier operations: 128-bit values with 4x32-bit integer and 4x32-bit float support. 64-bit AMD/Intel makes SSE2 the minimum standard, and a systems language on those platforms should reflect that.

Yes. It is primarily for this reason that we made static arrays return-by-value. It is intended that on x86, float[4] will be an SSE1 register. So it should be possible to write SIMD code with standard array operations. (Note that this is *much* easier for the compiler than trying to vectorize scalar code.)


This gives syntax like:
float[4] a, b, c;
a[] += b[] * c[];
(currently works, but doesn't use SSE, so has dismal performance).
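
A small runnable sketch of the two points above, assuming a D2 compiler: a float[4] returned by value from a function, and the element-wise array operation from the snippet. The function name mulAdd is hypothetical. Whether the compiler maps this to SSE registers is not something this code can show; it only demonstrates the semantics.

```d
import std.stdio;

// Static arrays are value types, so float[4] can be passed and returned
// by value. `a` here is a local copy of the caller's array.
float[4] mulAdd(float[4] a, float[4] b, float[4] c)
{
    a[] += b[] * c[]; // element-wise: a[i] += b[i] * c[i]
    return a;
}

void main()
{
    float[4] a = [1, 2, 3, 4];
    float[4] b = [2, 2, 2, 2];
    float[4] c = [0.5, 0.5, 0.5, 0.5];

    float[4] r = mulAdd(a, b, c);

    writeln(r); // result of the element-wise multiply-add
    writeln(a); // the caller's array is unchanged by the call
}
```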

Loading and storing is comparable across platforms with similar alignment restrictions or penalties for working with unaligned data. Packing/swizzle/shuffle/permuting are different but this is not a huge problem for most algorithms. The lack of fused multiply and add on the Intel side can be worked around or abstracted (i.e. always write code as if it existed, have the Intel version expand to multiple ops).
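
The "write code as if it existed" approach might look roughly like this in D. This is only a sketch: madd, lerp and the version identifier HypotheticalFusedMadd are made-up names, and the fused branch is left unimplemented because no real intrinsic is assumed here.

```d
import std.stdio;

// Low-level layer: callers always use madd(a, b, c) = a * b + c as if a
// fused multiply-add existed on every target.
float[4] madd(float[4] a, float[4] b, float[4] c)
{
    float[4] r;
    version (HypotheticalFusedMadd)
    {
        // A port to hardware with a native fused op would fill this in.
        static assert(false, "fused path not provided in this sketch");
    }
    else
    {
        // Fallback: expands to a separate multiply and add.
        r[] = a[] * b[] + c[];
    }
    return r;
}

// Algorithm layer: written once against madd, so it needs no changes
// when the low-level layer is ported.
float[4] lerp(float[4] lo, float[4] hi, float[4] t)
{
    float[4] delta;
    delta[] = hi[] - lo[];
    return madd(delta, t, lo); // lo + (hi - lo) * t
}

void main()
{
    float[4] lo = [0, 0, 0, 0];
    float[4] hi = [2, 4, 6, 8];
    float[4] t = [0.5, 0.5, 0.5, 0.5];
    writeln(lerp(lo, hi, t));
}
```

Only madd would need to change per platform; lerp and anything built on top of it stay the same.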
And now my wish list:
If you have worked with shader programming through HLSL or Cg, the expressiveness of doing the work in SIMD is very high. If I could write something that looked exactly like HLSL but was integrated perfectly in a language like D or C++, it would be pretty huge to me. The amount of math you can have in a line or two in HLSL is mind-boggling at times, yet extremely intuitive and rather easy to debug.
