I've been thinking about this a bit, and as usual, Julia's multiple dispatch might make such a thing possible in a novel way. The heart of ISPC is allowing a function that looks like
```c
int addScalar(int a, int b) { return a + b; }
```

and have it effectively become

```c
vector<int> addVector(vector<int> a, vector<int> b) { return /* AVX version of */ a + b; }
```

This is what vectorizing compilers do, but they don't handle control flow the way ISPC does. Also, ISPC's "foreach" and "foreach_tiled" allow these vectorized functions to be consumed more efficiently, for instance by handling the ragged/unaligned front and back of arrays with scalar versions, and the middle with vectorized functions.

With support for hardware vectors in Julia, you can start to imagine writing macros that automatically generate the relevant functions, e.g. generating addVector from addScalar. However, to do anything cleverer than the (already extremely clever) LLVM vectorizer, you have to expose masking operations. To handle incoherent/divergent control flow, you issue vector operations that are masked, allowing some lanes of the vector to stop participating in the program for a period. In a contrived example,

```c
int addScalar(int a, int b) { return a % 2 ? a + b : a - b; }
```

would be turned into something like:

```c
vector<int> addVector(vector<int> a, vector<int> b) {
    mask = all;                       // a register of all 1s: every lane participates
    vector<int> mod = a % 2;          // vectorized, using mask
    mask = maskwhere(mod != 0);
    vector<int> result = a + b;       // vectorized; writes only the lanes where mask is set
    mask = invert(mask);
    result = a - b;                   // vectorized; writes only the remaining lanes
    return result;
}
```

If you look at it closely, you've got versions generated for each function that are:

- scalar
- vector-enabled, but for arbitrary-length vectors
- specialized for one or more hardware vector sizes
- specialized by alignment (as vector sizes get bigger, e.g. with the 32- and 64-byte AVX versions coming out, you can't just rely on the runtime to align everything properly; it would be too wasteful)

So, I think it's a big ask, but I think it could be produced incrementally. We'd need help from the Julia language/standard library itself to expose masked vector operations.
*Sebastian Good*

On Tue, Sep 23, 2014 at 2:52 PM, Jeff Waller <truth...@gmail.com> wrote:

> Could this theoretical thing be approached incrementally? Meaning here's
> a project and here are some intermediate results and now it's 1.5x faster,
> and now here's something better and it's 2.7x, all the while the goal is
> apparent but difficult.
>
> Or would it kind of be all works or doesn't?