> I don't know much about SIMD, it looks like your approach is to figure out how to take nim code and SIMDify it?
No, not quite. You write explicit SIMD instructions, and they are automatically transformed to use the best option that runtime detection finds. So you can write a loop that iterates over an array of integers and, say, adds 1 to each value. If runtime detection sees that SSE2 is available, it will add 4 integers at a time. If it finds that AVX2 is available, it will do 8 integers at a time. This is a much easier problem than auto-vectorization, which, yes, the C compilers can do in some simple cases.

So for instance:

    var
      # 16 elements: a multiple of both 4 and 8 lanes, so no tail handling is needed
      a = newSeq[float32](16)
      b = newSeq[float32](16)
      r = newSeq[float32](16)

    for i in 0 ..< a.len:
      a[i] = float32(i)
      b[i] = 2.0'f32

    SIMD(width):
      # width is the vector width in bytes; a float32 is 4 bytes
      for i in countup(0, a.len - 1, width div 4):
        let av = simd.loadu_ps(addr a[i])
        let bv = simd.loadu_ps(addr b[i])
        let rv = simd.add_ps(av, bv)
        simd.storeu_ps(addr r[i], rv)

If SSE2 is detected, it will use the SSE2 versions of loadu, add, and storeu, and iterate over the array 4 at a time (16-byte width divided by 4 bytes per float32).

If AVX2 is detected, it will use the AVX2 versions of loadu, add, and storeu, and iterate over the array 8 at a time (32-byte width divided by 4 bytes per float32).

.NET / C# has a similar abstraction, which it accomplishes with the JIT.
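To make the width arithmetic concrete, here is a plain scalar sketch of the same idea with no SIMD library involved. It is only an illustration, not how the macro is implemented: `pickWidth` and `addBlocks` are hypothetical names, and the two boolean flags stand in for real runtime CPU detection. The inner loop plays the role of one vector add, and a tail loop handles lengths that are not a multiple of the lane count.

```nim
# Scalar stand-in for width-based dispatch. All names are hypothetical.

const bytesPerFloat32 = 4

proc pickWidth(hasSse2, hasAvx2: bool): int =
  ## Vector width in bytes for the best available feature.
  ## The flags are assumptions standing in for runtime detection.
  if hasAvx2: 32
  elif hasSse2: 16
  else: bytesPerFloat32   # scalar fallback: one float32 per step

proc addBlocks(a, b: seq[float32], width: int): seq[float32] =
  ## Adds a and b in blocks of `width div 4` elements, mimicking how
  ## one vector instruction processes a whole block at once.
  assert a.len == b.len
  result = newSeq[float32](a.len)
  let lanes = width div bytesPerFloat32
  var i = 0
  # Full blocks: this inner loop is what a single vector add would cover.
  while i + lanes <= a.len:
    for lane in 0 ..< lanes:
      result[i + lane] = a[i + lane] + b[i + lane]
    i += lanes
  # Tail: leftover elements that do not fill a whole block.
  while i < a.len:
    result[i] = a[i] + b[i]
    inc i

when isMainModule:
  var a = newSeq[float32](12)
  var b = newSeq[float32](12)
  for i in 0 ..< a.len:
    a[i] = float32(i)
    b[i] = 2.0'f32
  let sse = addBlocks(a, b, pickWidth(hasSse2 = true, hasAvx2 = false))
  let avx = addBlocks(a, b, pickWidth(hasSse2 = true, hasAvx2 = true))
  doAssert sse == avx   # same result, only the step size differs
  echo avx
```

With 12 elements, the 16-byte path runs three full blocks of 4, while the 32-byte path runs one full block of 8 and sends the last 4 elements through the tail loop. Real unaligned vector loads would read past the end of the seq in that second case, which is why the tail has to be handled separately or the length padded to a multiple of the lane count.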