I'll follow your development and provide input. The library may prove quite useful for some handcrafted computations in Arraymancer.
My first remark: put build output in a `./bin` or `./out` folder and add that folder to `.gitignore`. Right now the compiled library is committed to your git repo.

Second, instead of asking for "sse" or "avx", I think you should use compile-time defines with `when defined(sse)`, as I do [here](https://github.com/mratsim/Arraymancer/blob/master/src/tensor/backend/openmp.nim#L18) for OpenMP and CUDA. You then choose the instruction set at compilation, e.g. `nim c -d:sse -o:out/yourproject yourproject.nim`.

Lastly, I think the killer feature would be runtime CPU feature detection. You might want to check out the Rust library [Faster](https://github.com/AdamNiederer/faster). Many multimedia libraries, such as FFmpeg and VLC, also do runtime CPU feature detection, but I think the cleanest codebase implementing it is the [SIMD image library](https://github.com/ermig1979/Simd).
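A minimal sketch of the compile-time approach. It assumes the defines are named `sse` and `avx` as above. `SimdBackend` and `addVectors` are hypothetical names, and the SIMD branches are placeholders that reuse a scalar loop so the file compiles and runs on any target:

```nim
# Sketch: pick a SIMD backend at compile time via -d:sse / -d:avx.
# `SimdBackend` and `addVectors` are hypothetical names for illustration.

when defined(avx):
  const SimdBackend* = "avx"
elif defined(sse):
  const SimdBackend* = "sse"
else:
  const SimdBackend* = "scalar"

proc addVectors*(a, b: openArray[float32]): seq[float32] =
  ## Element-wise addition of two equally sized vectors.
  doAssert a.len == b.len, "inputs must have the same length"
  result = newSeq[float32](a.len)
  when defined(avx) or defined(sse):
    # Placeholder: an intrinsics kernel for the selected instruction set
    # would go here; this sketch reuses the scalar loop so it always compiles.
    for i in 0 ..< a.len:
      result[i] = a[i] + b[i]
  else:
    # Portable scalar fallback.
    for i in 0 ..< a.len:
      result[i] = a[i] + b[i]

when isMainModule:
  echo "SIMD backend: ", SimdBackend
  echo addVectors([1'f32, 2'f32, 3'f32], [4'f32, 5'f32, 6'f32])
```

Building with `nim c -d:avx -r demo.nim` (file name hypothetical) prints `SIMD backend: avx`; without any define it reports `scalar`. Because untaken `when` branches are discarded at compile time, there is no dispatch cost at runtime. The flip side is that a binary built this way cannot adapt to the CPU it actually runs on, which is the gap runtime feature detection fills.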