(I should perhaps qualify that further: beyond -d:release, you also want gcc or clang with high optimization levels on the back end, since jil210 revealed in [another thread](https://forum.nim-lang.org/t/3198) that the baseline 5X performance mystery arose from using tcc as a back end.)
Also, for the curious: the reason the C++ STL will usually underperform Nim's hash tables in benchmarks like this is that the STL's semantics for iterators and references across deletions make external chaining the natural collision resolution choice for an STL hash table. Those extra linked-list indirections add latency, especially when the table does not fit in CPU cache.
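
To make that concrete, here is a rough C++ sketch of the two sides. `referenceSurvivesErase` exercises the guarantee that erasing from `std::unordered_map` leaves references to the other elements valid, which is easy to provide when every element lives in its own node. `ProbeTable` is a hypothetical toy of my own (fixed capacity, no deletion, no resizing), not Nim's actual implementation; it only shows what a lookup looks like when colliding entries sit in one flat array instead of behind list pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <vector>

// STL side: a reference to one element stays valid while every other
// element is erased. Node-per-element (chained) storage gives this cheaply.
bool referenceSurvivesErase() {
    std::unordered_map<int, int> m;
    for (int k = 0; k < 100; ++k) m[k] = k * 10;
    int& kept = m.at(42);
    for (int k = 0; k < 100; ++k)
        if (k != 42) m.erase(k);
    return kept == 420 && &kept == &m.at(42);
}

// Hypothetical toy open-addressing table (linear probing), int -> int.
// Illustration only: fixed capacity, no deletion, no resizing.
class ProbeTable {
    struct Slot {
        bool used;
        int key;
        int value;
    };
    std::vector<Slot> slots_;

    std::size_t home(int key) const {
        return std::hash<int>{}(key) % slots_.size();
    }

public:
    explicit ProbeTable(std::size_t capacity) : slots_(capacity, Slot{false, 0, 0}) {
        assert(capacity > 0);
    }

    // Returns false only when the table is full and the key is absent.
    bool insert(int key, int value) {
        std::size_t i = home(key);
        for (std::size_t probes = 0; probes < slots_.size(); ++probes) {
            Slot& s = slots_[i];
            if (!s.used || s.key == key) {
                s.used = true;
                s.key = key;
                s.value = value;
                return true;
            }
            i = (i + 1) % slots_.size();  // collision: try the adjacent slot
        }
        return false;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* find(int key) const {
        std::size_t i = home(key);
        for (std::size_t probes = 0; probes < slots_.size(); ++probes) {
            const Slot& s = slots_[i];
            if (!s.used) return nullptr;       // empty slot ends the probe run
            if (s.key == key) return &s.value;
            i = (i + 1) % slots_.size();       // neighbors are contiguous in memory
        }
        return nullptr;
    }
};
```

A probe run in `ProbeTable::find` walks adjacent array slots, so a collision usually costs a read from a cache line that is already loaded. A chained table instead follows a pointer to a separately allocated node for each colliding entry, and each of those hops can be a cache miss once the table is larger than cache.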