On 2/4/2012 7:37 PM, Martin Nowak wrote:
On 05.02.2012 02:13, Manu <turkey...@gmail.com> wrote:

On 5 February 2012 03:08, Martin Nowak <d...@dawgfoto.de> wrote:

Let me restate the main point.
Your approach to a higher level module wraps intrinsics with named
functions.
There is little gain in turning simd(AND, f, f2) into and(f, f2) when you
can easily take this to the level GLSL achieves.


What is missing to reach that level, in your opinion? I think I basically
offer that (with some more work).
It's not clear to me what you object to...
I'm not prohibiting the operators, just adding the explicit functions,
which may be more efficient in certain cases (they receive the version).

Also, the 'gains' of wrapping an intrinsic in an almost identical function
are portability and potential optimisation for hardware versioning. I'm
specifically trying to build something that's barely above the intrinsics
here, although a lot of the more arcane intrinsics are being collated into
their typically useful functionality.

Are you just focused on the primitive math ops, or something broader?

GLSL achieves very clear and simple-to-write construction and conversion
of values.

I think wrapping the core.simd vector types in an alias this struct makes
it a snap to define conversion through constructors and swizzling through
properties/opDispatch. Then you can overload operators to do the
implementation-specific stuff and add named methods for the rest.
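
For illustration, a minimal, untested sketch of that kind of wrapper (the Vec4 name is hypothetical, and the swizzle handling is deliberately simplified to 4-letter xyzw accesses only) could look roughly like this:

import core.simd;

// Sketch: wrap a core.simd float4, forward everything to the raw vector
// via 'alias this', and turn vec.wzyx-style member accesses into shuffles
// through opDispatch on the compile-time member name.
struct Vec4
{
    float4 v;
    alias v this;               // fall back to the raw vector for anything else

    // conversion through a constructor, as described above
    this(float x, float y, float z, float w)
    {
        v.array[0] = x; v.array[1] = y; v.array[2] = z; v.array[3] = w;
    }

    // vec.xyzw, vec.wzyx, ... (only 4-letter swizzles handled here)
    Vec4 opDispatch(string swiz)() const
        if (swiz.length == 4)
    {
        Vec4 r;
        foreach (i, c; swiz)
            r.v.array[i] = v.array[c == 'x' ? 0 :
                                   c == 'y' ? 1 :
                                   c == 'z' ? 2 : 3];
        return r;
    }
}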


The GLSL or HLSL syntax is fairly nice, but it enjoys a few advantages that are harder to come by with PC SIMD:

The hardware that runs HLSL can operate on data types 'smaller' than the register, either handled natively or by turning the instructions into a mass of scalar ops that are then run in parallel as best as possible. In SIMD land on CPUs the design is much more rigid: we are effectively stuck using float and float4 data types, and emulating float2 and float3. For a very long time there was not even a dot product instruction, as from Intel's point of view your data is transposed incorrectly if you need to do one (plus they would have to handle dot2, dot3, dot4, etc.).
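
To make that concrete, without a dedicated instruction a 4-wide dot product ends up as a lane-wise multiply followed by a horizontal add; a rough sketch (the dot4 helper name is hypothetical):

import core.simd;

// Sketch: dot product built from a per-lane multiply plus a horizontal
// add, since there is no single instruction to lean on.
float dot4(float4 a, float4 b)
{
    float4 p = a * b;   // per-lane multiply
    return p.array[0] + p.array[1] + p.array[2] + p.array[3];
}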

The cost of this emulation of float2 and float3 types is that we have to put 'some data' in the unused slots of the SIMD register on swizzle operations, which will usually lead to the SIMD instructions generating INFs and NaNs in those slots and hurting performance.
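
One common workaround (again just an illustrative sketch, with a hypothetical helper name) is to force the spare lane to a known value when a float3 is loaded into a float4 register, so later SIMD ops don't chew on garbage there:

import core.simd;

// Sketch: load a float3 into a float4 register with the unused lane
// zeroed, rather than leaving whatever garbage happens to be there.
float4 loadFloat3(const float[3] src)
{
    float4 r = 0;          // scalar init broadcasts 0 to every lane
    r.array[0] = src[0];
    r.array[1] = src[1];
    r.array[2] = src[2];
    return r;              // lane 3 is 0, not a potential INF/NaN
}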

The other major problem with the shader swizzle syntax is that it doesn't scale. If you are using a 128-bit register holding 8 shorts or 16 bytes, what are the letters here? Shaders assume 4 is the limit, so you have either xyzw or rgba. Then there are platform considerations (e.g. you can't swizzle 8-bit data on SSE, you have to use a series of pack/unpack and shuffles, but VMX can do it easily).

That said, shader swizzle syntax is very nice; it can certainly reduce the amount of code you write by a huge factor (though the codegen is another matter). Even silly tricks with swizzling literals in HLSL are useful, like the following code to sum up some numbers:

if (dot(a, 1.f.xxx) > 0)

