I have found a nice paper, "Extending a C-like Language for
Portable SIMD
Programming", (2012), by Roland L., Sebastian Hack and Ingo
Wald:
http://www.cdl.uni-saarland.**de/projects/vecimp/vecimp_tr.**pdf<http://www.cdl.uni-saarland.de/projects/vecimp/vecimp_tr.pdf>
SIMD programming is necessary in a system language, or in any
language
that wants to use the modern CPUs well. So languages like C,
C++, D (and
Mono-C#) support such wider registers.
The authors of this paper have understood that it's also
important to
make SIMD programming easy, almost as easy as scalar code, so
most
programmers are able to write such kind of correct code.
So this this paper presents ideas to better express SIMD
semantics in a
C-like language. They introduce few new constructs in a large
subset of C
language, with few ideas. The result coding patterns seem
easy enough (they
are surely look simpler than most multi-core coding patterns
I've seen,
including Cilk+).
They present a simple scalar program in C:
struct data_t {
int key;
int other;
};
int search(data_t* data , int N) {
for (int i = 0; i < N; i++) {
int x = data[i].key;
if (4 < x & x <= 8) return x;
}
return -1;
}
Then they explain the three most common ways to represent an
array of
structs, here a struct that contains 3 values:
x0 y0 z0 x1 y1 z1 x2 y2 z2 x3 y3 z3 x4 y4 z4 x5 y5 z5 x6 y6
z6 x7 y7 z7
(a) Array of Structures (AoS)
x0 x1 x2 x3 x4 x5 x6 x7 y0 y1 y2 y3 y4 y5 y6 y7 z0 z1 z2
z3 z4 z5 z6
z7
(b) Structure of Arrays (SoA)
x0 x1 x2 x3 y0 y1 y2 y3 z0 z1 z2 z3 x4 x5 x6 x7 y4 y5 y6 y7
z4 z5 z6 z7
(c) Hybrid Structure of Arrays (Hybrid SoA)
They explain how the (c) is the preferred pattern in SIMD
programming.
Using the (c) data pattern they show how in C with (nice)
SIMD intrinsics
you write vectorized code (a simd_data_t struct instance
contains 8 int
values):
struct simd_data_t {
simd_int key;
simd_int other;
};
int search(simd_data_t* data , int N) {
for (int i = 0; i < N/L; ++i) {
simd_int x = load(data[i].key);
simd_int cmp = simd_and(simd_lt(4, x),
simd_le(x, 8));
int mask = simd_to_mask(cmp);
if (mask != 0) {
simd_int result = simd_and(mask , x);
for (int j = 0; j < log2(L); j++)
result = simd_or(result ,
whole_reg_shr(result , 1 << j));
return simd_extract(result , 0);
}
}
return -1;
}
D should do become able to do this (that is not too much
bad), or better.
Their C language extensions allow to write nicer code like:
struct data_t {
int key;
int other;
};
int search(data_t *scalar data , int scalar N) {
int L = lengthof(*data);
for (int i = 0; i < N/L; ++i) {
int x = data[i].key;
if (4 < x & x <= 8)
int block[L] result = [x, 0];
scalar {
for (int j = 0; j < log2(L); ++j)
result |= whole_reg_shr(result , 1 << j);
return get(x, 0);
}
}
return -1;
}
This is based on just few simple ideas, explained in the
paper (they are
interesting, but quoting here those parts of the paper is not
a good idea).
Such ideas are not directly portable to D (unless the
front-end is changed.
Their compiler works by lowering, and emits regular C++ code
with
intrinsics).
Near the end of the paper they also propose some C++ library
code:
the C++ template mechanism would allow to define a hybrid
SoA container
class: Similar to std::vector which abstracts a traditional
C array, one
could implement a wrapper around a T block[N]*:<
// scalar context throughout this example
struct vec3 { float x, y, z; };
// vec3 block[N]* pointing to ceil(n/N) elements
hsoa <vec3 > vecs(n);
// preferred vector length of vec3 automatically derived
static const int N = hsoa <vec3 >::vector_length;
int i = /*...*/
hsoa <vec3 >::block_index ii = /*...*/
vec3 v = vecs[i]; // gather
vecs[i] = v; // scatter
vec3 block[N] w = vecs[ii]; // fetch whole block
hsoa <vec3 >::ref r = vecs[i]; // get proxy to a scalar
r = v; // pipe through proxy
// for each element
vecs.foreach([](vec3& scalar v) { /*...*/ });
Regardless of the other ideas of their C-like language, a
similar struct
should be added to Phobos once a bit higher level SIMD
support is in better
shape in D. Supporting Hybrid-SoA and few operations on it
will be an
important but probably quite short and simple addition to
Phobos
collections (it's essentially an struct that acts like an
array, with few
simple extra operations).
I think no commonly used language allows both very simple and
quite
efficient SIMD programming (Scala, CUDA, C, C++, C#, Java,
Go, and
currently Rust too, are not able to support SIMD programming
well. I think
currently Haskell too is not supporting it well, but Haskell
is very
flexible, and it's compiled by a native compiler, so such
things are maybe
possible to add). So supporting it well in D will be an
interesting selling
point of D. (Supporting a very simple SIMD coding in D will
make D more
widespread, but such kind of programming will probably keep
being a small
niche).
Bye,
bearophile