Re: Very simple SIMD programming

Paulo Pinto Wed, 24 Oct 2012 07:20:48 -0700

On Wednesday, 24 October 2012 at 12:50:44 UTC, Manu wrote:

On 24 October 2012 15:39, Paulo Pinto <pj...@progtools.org>wrote:
On Wednesday, 24 October 2012 at 02:41:53 UTC, bearophilewrote:
I have found a nice paper, "Extending a C-like Language forPortable SIMDProgramming", (2012), by Roland L., Sebastian Hack and IngoWald:
http://www.cdl.uni-saarland.**de/projects/vecimp/vecimp_tr.**pdf<http://www.cdl.uni-saarland.de/projects/vecimp/vecimp_tr.pdf>
SIMD programming is necessary in a system language, or in anylanguagethat wants to use the modern CPUs well. So languages like C,C++, D (and
Mono-C#) support such wider registers.
The authors of this paper have understood that it's alsoimportant tomake SIMD programming easy, almost as easy as scalar code, somost
programmers are able to write such kind of correct code.
So this this paper presents ideas to better express SIMDsemantics in aC-like language. They introduce few new constructs in a largesubset of Clanguage, with few ideas. The result coding patterns seemeasy enough (theyare surely look simpler than most multi-core coding patternsI've seen,
including Cilk+).


They present a simple scalar program in C:

struct data_t {
    int key;
    int other;
};

int search(data_t* data , int N) {
    for (int i = 0; i < N; i++) {
        int x = data[i].key;
        if (4 < x & x <= 8) return x;
    }
    return -1;
}
Then they explain the three most common ways to represent anarray of
structs, here a struct that contains 3 values:
x0 y0 z0 x1 y1 z1 x2 y2 z2 x3 y3 z3 x4 y4 z4 x5 y5 z5 x6 y6z6 x7 y7 z7
(a) Array of Structures (AoS)
x0 x1 x2 x3 x4 x5 x6 x7 y0 y1 y2 y3 y4 y5 y6 y7 z0 z1 z2z3 z4 z5 z6
z7
(b) Structure of Arrays (SoA)
x0 x1 x2 x3 y0 y1 y2 y3 z0 z1 z2 z3 x4 x5 x6 x7 y4 y5 y6 y7z4 z5 z6 z7
(c) Hybrid Structure of Arrays (Hybrid SoA)
They explain how the (c) is the preferred pattern in SIMDprogramming.
Using the (c) data pattern they show how in C with (nice)SIMD intrinsicsyou write vectorized code (a simd_data_t struct instancecontains 8 int
values):

struct simd_data_t {
    simd_int key;
    simd_int other;
};

int search(simd_data_t* data , int N) {
    for (int i = 0; i < N/L; ++i) {
        simd_int x = load(data[i].key);
        simd_int cmp = simd_and(simd_lt(4, x),
        simd_le(x, 8));
        int mask = simd_to_mask(cmp);
        if (mask != 0) {
            simd_int result = simd_and(mask , x);
            for (int j = 0; j < log2(L); j++)
                result = simd_or(result ,
                whole_reg_shr(result , 1 << j));
                return simd_extract(result , 0);
            }
        }
    return -1;
}
D should do become able to do this (that is not too muchbad), or better.
Their C language extensions allow to write nicer code like:

struct data_t {
    int key;
    int other;
};

int search(data_t *scalar data , int scalar N) {
    int L = lengthof(*data);
    for (int i = 0; i < N/L; ++i) {
        int x = data[i].key;
        if (4 < x & x <= 8)
            int block[L] result = [x, 0];
        scalar {
            for (int j = 0; j < log2(L); ++j)
                result |= whole_reg_shr(result , 1 << j);
            return get(x, 0);
        }
    }
    return -1;
}
This is based on just few simple ideas, explained in thepaper (they areinteresting, but quoting here those parts of the paper is nota good idea).Such ideas are not directly portable to D (unless thefront-end is changed.Their compiler works by lowering, and emits regular C++ codewith
intrinsics).
Near the end of the paper they also propose some C++ librarycode:
the C++ template mechanism would allow to define a hybridSoA container
class: Similar to std::vector which abstracts a traditionalC array, one
could implement a wrapper around a T block[N]*:<
// scalar context throughout this example
struct vec3 { float x, y, z; };
// vec3 block[N]* pointing to ceil(n/N) elements
hsoa <vec3 > vecs(n);
// preferred vector length of vec3 automatically derived
static const int N = hsoa <vec3 >::vector_length;
int i = /*...*/
hsoa <vec3 >::block_index ii = /*...*/
vec3 v = vecs[i]; // gather
vecs[i] = v; // scatter
vec3 block[N] w = vecs[ii]; // fetch whole block
hsoa <vec3 >::ref r = vecs[i]; // get proxy to a scalar
r = v; // pipe through proxy
// for each element
vecs.foreach([](vec3& scalar v) { /*...*/ });
Regardless of the other ideas of their C-like language, asimilar structshould be added to Phobos once a bit higher level SIMDsupport is in bettershape in D. Supporting Hybrid-SoA and few operations on itwill be animportant but probably quite short and simple addition toPhoboscollections (it's essentially an struct that acts like anarray, with few
simple extra operations).
I think no commonly used language allows both very simple andquiteefficient SIMD programming (Scala, CUDA, C, C++, C#, Java,Go, andcurrently Rust too, are not able to support SIMD programmingwell. I thinkcurrently Haskell too is not supporting it well, but Haskellis veryflexible, and it's compiled by a native compiler, so suchthings are maybepossible to add). So supporting it well in D will be aninteresting sellingpoint of D. (Supporting a very simple SIMD coding in D willmake D morewidespread, but such kind of programming will probably keepbeing a small
niche).

Bye,
bearophile
Actually, I am yet to see any language that has SIMD as partof thelanguage standard and not as an extension where each vendordoes its own
way.
HLSL, GLSL, Cg? :)

I was thinking about general purpose programming languages, notdomain specific ones.


--
Paulo

Re: Very simple SIMD programming

Reply via email to