> I don't know much about SIMD, it looks like your approach is to figure out 
> how to take nim code and SIMDify it?

No, not quite. You write explicit SIMD instructions, and it automatically 
maps them to the best instruction set available, based on runtime 
detection. So you can write a loop that iterates over an array of 32-bit 
integers and, say, adds 1 to each value. If runtime detection sees that SSE2 
is available, it will add 4 integers at a time. If it finds AVX2, it will do 
8 integers at a time. This is a much easier problem than auto-vectorization, 
which, yes, C compilers can do in some simple cases.

So for instance:
    
    
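    var
        # 16 is a multiple of both 4 and 8 lanes, so no tail loop is needed
        a = newSeq[float32](16)
        b = newSeq[float32](16)
        r = newSeq[float32](16)
    
    for i in 0 ..< a.len:
        a[i] = float32(i)
        b[i] = 2.0'f32
    
    # `width` is the vector width in bytes: 16 for SSE2, 32 for AVX2
    SIMD(width):
        # step by the number of float32 lanes per vector
        for i in countup(0, a.len - 1, width div 4):
            let av = simd.loadu_ps(addr a[i])
            let bv = simd.loadu_ps(addr b[i])
            let rv = simd.add_ps(av, bv)
            simd.storeu_ps(addr r[i], rv)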
    

If SSE2 is detected, it will use the SSE2 versions of loadu, add, and storeu, 
and iterate over the array 4 elements at a time (16-byte width divided by 4 
bytes per float32). If AVX2 is detected, it will use the AVX2 versions of 
loadu, add, and storeu, and iterate over the array 8 elements at a time 
(32-byte width divided by 4 bytes per float32).
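
To make the dispatch idea concrete, here is a rough scalar model of it in plain Nim. It is only a sketch, not how the library is implemented: the names `SimdLevel`, `addKernel`, and `addDispatch` are made up for illustration, the inner `for j` loop stands in for one loadu/add/storeu triple, and the detected level is passed in by hand instead of coming from a real CPUID check. It also adds a scalar tail, which is needed whenever the length is not a multiple of the lane count.

```nim
# Scalar model of width dispatch. All names here are hypothetical.
type SimdLevel = enum
    slScalar, slSse2, slAvx2

proc addKernel[lanes: static int](a, b: seq[float32], r: var seq[float32]) =
    ## One kernel body, instantiated once per lane count at compile time.
    var i = 0
    while i + lanes <= a.len:
        # Stands in for one loadu_ps / add_ps / storeu_ps triple.
        for j in 0 ..< lanes:
            r[i + j] = a[i + j] + b[i + j]
        i += lanes
    # Scalar tail for lengths that are not a multiple of `lanes`.
    while i < a.len:
        r[i] = a[i] + b[i]
        inc i

proc addDispatch(level: SimdLevel, a, b: seq[float32], r: var seq[float32]) =
    ## Pick the instantiation that matches the (assumed) detected level.
    case level
    of slAvx2: addKernel[8](a, b, r)     # 32 bytes / 4 bytes per float32
    of slSse2: addKernel[4](a, b, r)     # 16 bytes / 4 bytes per float32
    of slScalar: addKernel[1](a, b, r)

var
    a = newSeq[float32](12)
    b = newSeq[float32](12)
    r = newSeq[float32](12)

for i in 0 ..< a.len:
    a[i] = float32(i)
    b[i] = 2.0'f32

# In real code the level would come from a CPU feature check at startup.
addDispatch(slAvx2, a, b, r)
doAssert r[0] == 2.0'f32 and r[11] == 13.0'f32
```

The point of the `static int` parameter is that each width gets its own compiled copy of the same loop body, so the runtime cost of choosing is a single branch outside the loop.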

.NET / C# has a similar abstraction, which it accomplishes with the JIT.
