# Parallelism

Parallelism is a hard problem, and it really depends on what kind of parallelism you need.
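For loop-level data parallelism, Nim already exposes OpenMP out of the box. As a minimal sketch (a hypothetical `axpy` kernel, not taken from any of my libraries): the standard library's `||` iterator annotates the loop with an OpenMP `parallel for` pragma, which takes effect when you compile with the OpenMP flags (and degrades to a plain sequential loop without them).

```nim
# Hypothetical `axpy` kernel. The `||` iterator from Nim's system module
# emits `#pragma omp parallel for`; enable it with:
#   nim c -d:release --passC:-fopenmp --passL:-fopenmp axpy.nim
proc axpy(alpha: float, x: openArray[float], y: var openArray[float]) =
  ## y[i] <- alpha * x[i] + y[i], iterations distributed across threads
  for i in 0 || (x.len - 1):
    y[i] += alpha * x[i]

var
  x = newSeq[float](1_000_000)
  y = newSeq[float](1_000_000)
for i in 0 ..< x.len:
  x[i] = float(i)

axpy(2.0, x, y)
echo y[10]  # 20.0
```

Because `||` is just an iterator, macros can build on top of it, which is how a high-level wrapper can hide the OpenMP plumbing entirely.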
For my problem domain (numerical computing, machine learning, deep learning), Nim macros + OpenMP offer me unparalleled flexibility. For example, I can reach the speed of OpenBLAS, a matrix multiplication library that has been tuned for 10+ years and builds on 20+ years of research on optimisations. I plan to carry this success over to convolutions, which are still lagging on CPU. I still have to benchmark actual implementations, but naive convolutions are incredibly slow and would need to be optimised by about 40x to reach 80% of theoretical CPU peak performance. (Only a few operations, like matrix multiplication and convolution, can saturate the CPU's theoretical performance; 99% of the others are bound by memory bandwidth.) Convolutions are used every time you need to blur, sharpen, edge-enhance or edge-detect on images, and they are key to image, sound and speech perception in deep learning.

C, C++ and Fortran can also use OpenMP, but they are severely lacking in dev productivity and in the potential to offer a nice high-level wrapper.

# Replacing or expanding

Furthermore, there is no need to "replace". There are problem domains with no established language, especially for production. (This is what strategy consulting calls a `blue ocean`: don't compete where there are already a lot of fish, migrate to a new, uncharted place.) Interestingly, in the two I'm thinking of, research is done in Python.

## Blockchain

The first one is blockchain. Beyond Status, others have been using Nim due to its easy interface with C++ and easy translation of Python research:

* [EmberCrypto](https://github.com/EmberCrypto/Ember)
* [KIP Foundation](https://github.com/KIPFoundation/nim-ewasm-contracts)

## Reinforcement learning

The second one is reinforcement learning.
Everything is done in Python, and there is no standard way to produce and ship a generic AI that can play [platformers, for example](https://www.youtube.com/watch?v=qv6UVOQ0F44). (Note that this is quite different from deep learning: with reinforcement learning we don't know the correct solution, and the neural network in the video is also different in a fundamental way.)

Contrary to traditional machine learning and deep learning, not every language under the sun has a graveyard of failed reinforcement learning projects. Furthermore, while most languages have matrix libraries, most compiled languages do not have a basic 4-dimensional tensor library, which is needed to go beyond simple statistical or evolutionary reinforcement learning (think genetic algorithms) and add visual perception to the mix.

And lastly, the only way to validate reinforcement learning successes is through controlled experiments. It is easy to implement toy examples, but advanced examples basically need emulator bindings, and most emulators are written in C++. For example, I easily wrapped the [Arcade Learning Environment](https://github.com/mgbellemare/Arcade-Learning-Environment) to do controlled experiments on Atari games, [from C++ to Nim](https://github.com/numforge/agent-smith).

## GPU computing

One note on GPU computing: CUDA, OpenCL, AMD ROCm and Vulkan Compute are very painful to deal with (I'm not sure if that is also the case with OpenGL or DirectX), so being able to produce C or C++ code is a killer advantage to abstract away all the GPU mess.
## JIT, VMs and interpreters

I've implemented several VMs in the past year ([Nimbus VM](https://github.com/status-im/nimbus/blob/6a24701bbf0dab12ddbcf76560ecf3f745429823/nimbus/vm/interpreter_dispatch.nim#L22-L36) for blockchain, [Glyph for SuperNES emulation (incomplete)](https://github.com/mratsim/glyph/blob/8b278c5e76c3f1053a196173a93686afda0596cc/glyph/snes/opcodes.nim#L16-L32), and [Photon JIT](https://github.com/numforge/laser/blob/9fbb8d2a573d950573c7249e3a5d6cdd784a639e/laser/photon_jit/x86_64/x86_64_ops.nim#L24-L51), an x86_64 JIT assembler), and I don't see any language competing with Nim in this space. Thanks to metaprogramming, opcode mapping is a breeze, and you can cleanly separate your [dispatch technique](https://github.com/status-im/nimbus/wiki/Interpreter-optimization-resources) from your opcode implementations while avoiding the function call/vtable overhead that kills your cache.
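To make the dispatch point concrete, here is a toy stack VM (hypothetical opcodes, unrelated to Nimbus or Glyph). The dispatch loop is a single `case` over the opcode enum, which the C compiler can compile into a jump table; in a real interpreter, a macro would generate this `case` from a declarative opcode table, keeping the dispatch technique separate from the opcode implementations.

```nim
# Toy stack VM sketch with made-up opcodes.
# Dispatch is one `case` per instruction: no indirect calls, no vtables,
# so the hot loop stays friendly to the instruction cache.

type
  Opcode = enum
    opPush, opAdd, opMul, opHalt
  Instr = object
    op: Opcode
    arg: int

proc run(program: openArray[Instr]): int =
  var stack: seq[int]
  var pc = 0
  while true:
    let ins = program[pc]
    case ins.op              # dispatch: one branch per opcode
    of opPush:
      stack.add ins.arg
    of opAdd:
      let b = stack.pop
      let a = stack.pop
      stack.add a + b
    of opMul:
      let b = stack.pop
      let a = stack.pop
      stack.add a * b
    of opHalt:
      return stack.pop
    inc pc

# Compute (2 + 3) * 4
let prog = [
  Instr(op: opPush, arg: 2),
  Instr(op: opPush, arg: 3),
  Instr(op: opAdd),
  Instr(op: opPush, arg: 4),
  Instr(op: opMul),
  Instr(op: opHalt)]
echo run(prog)  # 20
```

Swapping in another dispatch technique (e.g. computed goto via `{.emit.}`) only touches the loop, not the opcode bodies, which is exactly the separation metaprogramming buys you.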