Yeah, I mean, I think this is exactly where Nim shines compared to Rust, as you point out: Nim keeps the simple things simple and the hard things possible, whereas Rust is trying to market a Lego brick with 3.46 dimensions.
Nimpy is intuitive to work with, fast, two-way, etc., so use that for 98% of what you're asking. If you care about optimizations, take a look at some of Treeform's work with Guzba, especially on the SIMD side. I've never seen optimizations be so mature (as opposed to premature), i.e. not dug deep into the fractional-dimensional Lego house that you are trying to keep stable in int := 3 dimensional space. Parallelism is relatively mature in this ecosystem. The tricky bit is that while Nim keeps easy things easy, difficult things stay difficult; crucially, though, and hopefully within reach of your wits, macros should make this a temporary phenomenon until you have refined your spec and DSL.