I am learning some Nim, and have a hunch that its metaprogramming features may allow for a user-friendly SIMD library. The primary challenge with SIMD is that different processors support different SIMD instruction sets. So to write code that runs as fast as possible on every CPU, you have to write many versions of the same function, then detect CPU features at runtime and call the appropriate one.
I would like to approximate the APIs available in Boost.SIMD or .NET, both of which let you write the code for your algorithm once and have the appropriate thing happen at runtime. In C#, the way this works is that if you have an array of, say, 32-bit floats, you can use Vector<float>, which has a Count property. Its value is determined when the code is JIT-compiled, and it tells you how many elements of that type fit in a SIMD register on that CPU. You can then write your algorithm by looping over your array Vector<float>.Count elements at a time and doing whatever SIMD operations you want inside the loop. I know that in Nim we don't have the benefit of a JIT, but Boost.SIMD is able to do something similar with C++ templates. I'm not entirely sure what the best way to approach this is in Nim, so I am just looking for some high-level guidance and ideas.
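To make the pattern concrete, here is a rough C++ sketch of what I have in mind. It is not how Boost.SIMD or .NET actually implement it. `Vec`, `addArrays`, `detectLanes`, `pickAdd` and `add` are names I made up for illustration, plain loops stand in for real intrinsics, and the CPU check assumes the GCC/Clang builtin `__builtin_cpu_supports` (it falls back to 4 lanes everywhere else).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a SIMD register holding W floats.
// A real library would wrap intrinsics here instead of plain loops.
template <std::size_t W>
struct Vec {
    static constexpr std::size_t Count = W;  // lanes, known at compile time
    float lane[W];

    static Vec load(const float* p) {
        Vec v;
        for (std::size_t i = 0; i < W; ++i) v.lane[i] = p[i];
        return v;
    }
    void store(float* p) const {
        for (std::size_t i = 0; i < W; ++i) p[i] = lane[i];
    }
    friend Vec operator+(const Vec& a, const Vec& b) {
        Vec r;
        for (std::size_t i = 0; i < W; ++i) r.lane[i] = a.lane[i] + b.lane[i];
        return r;
    }
};

// The algorithm, written once in terms of Vec<W>::Count.
template <std::size_t W>
void addArrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + Vec<W>::Count <= n; i += Vec<W>::Count) {
        (Vec<W>::load(a + i) + Vec<W>::load(b + i)).store(out + i);
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];  // scalar tail for leftovers
}

// Hypothetical runtime probe: 8 lanes if AVX2 is reported, otherwise 4.
std::size_t detectLanes() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx2")) return 8;
#endif
    return 4;
}

using AddFn = void (*)(const float*, const float*, float*, std::size_t);

// Map the detected width to one of the precompiled instantiations.
AddFn pickAdd(std::size_t lanes) {
    return lanes >= 8 ? &addArrays<8> : &addArrays<4>;
}

// User-facing entry point: the dispatch decision is made once, on first call.
std::vector<float> add(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    static const AddFn fn = pickAdd(detectLanes());
    std::vector<float> out(a.size());
    fn(a.data(), b.data(), out.data(), a.size());
    return out;
}
```

The part I would hope Nim's templates or macros could take over is the instantiation and dispatch boilerplate: generate one copy of the kernel per supported width, and pick one at startup, so the user only ever writes the body of `addArrays`.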