Since no one here has actually suggested any changes to the code itself, I tried PGO (profile-guided optimization) just to see for myself how much of a difference it can make. I tried both LLVM and GCC; the former gave negative results, but GCC works great:

```
#!/usr/bin/env bash
set -e

# Substitute your paths to the Nim lib and program's C files
NIMLIB="/home/user/.choosenim/toolchains/nim-#devel/lib/"
SRC=(~/.cache/nim/optimized_r/*.c)

# Regular compilation to generate .c files and a baseline executable:
nim c --gc:orc -d:danger --passC:"-flto" --passL:"-flto" -o:optimized_nim optimized.nim

# Compilation of an exec which will produce the profiling data
gcc -O3 -flto -fprofile-generate -I"$NIMLIB" "${SRC[@]}" -o pg

# Generate a test file
BIBLE=/tmp/kjvbible_x10.txt
if [ ! -f "$BIBLE" ]; then
  for i in {1..10}; do
    cat kjvbible.txt >>"$BIBLE"
  done
fi

# Run the profiling
./pg <"$BIBLE" >/dev/null
# This produces "@moptimized.nim.gcda" and a few .gcda files for the stdlib

# Compile an optimized program
gcc -O3 -flto -fprofile-use -I"$NIMLIB" "${SRC[@]}" -o optimized_nim_pgo

# And an even more optimized one
gcc -O3 -march=native -mtune=native -flto -fprofile-use -I"$NIMLIB" "${SRC[@]}" -o optimized_nim_pgo_native

# You obviously need hyperfine installed for this
echo "-Go optimized"
hyperfine -r 10 "./optimized-go <$BIBLE >/dev/null"
echo "-Nim optimized"
hyperfine -r 10 "./optimized_nim <$BIBLE >/dev/null"
echo "-Nim optimized + pgo"
hyperfine -r 10 "./optimized_nim_pgo <$BIBLE >/dev/null"
echo "-Nim optimized + pgo + native"
hyperfine -r 10 "./optimized_nim_pgo_native <$BIBLE >/dev/null"
```
(By the way, why is there no Bash highlighting support here?)

I ran the script on the latest version of the code (from the `hashrearhatch` branch), and the condensed results are:

```
-Go optimized
  Time (mean ± σ):     2.075 s ±  0.048 s    [User: 1.994 s, System: 0.100 s]
-Nim optimized
  Time (mean ± σ):     1.924 s ±  0.068 s    [User: 1.838 s, System: 0.068 s]
-Nim optimized + pgo
  Time (mean ± σ):     1.785 s ±  0.033 s    [User: 1.697 s, System: 0.067 s]
-Nim optimized + pgo + native
  Time (mean ± σ):     1.739 s ±  0.028 s    [User: 1.658 s, System: 0.063 s]
```

That is a **10%** speed bump for the Nim version (1.924 s down to 1.739 s), which makes Go **19%** slower on test data of this size. Pretty neat for zero manual work.
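
For anyone who wants to check the percentages against the mean times above, here is a minimal sketch. The helper name `pct_slower` is made up for this illustration (it is not part of the benchmark script), and it assumes "X% slower" means the ratio of the two mean times minus one.

```shell
# pct_slower SLOW FAST
# Hypothetical helper: prints how much longer SLOW takes than FAST,
# as a percentage with one decimal place. Both arguments are mean
# wall-clock times in seconds, taken from the hyperfine output.
pct_slower() {
    awk -v slow="$1" -v fast="$2" \
        'BEGIN { printf "%.1f\n", (slow / fast - 1) * 100 }'
}

# Baseline Nim build vs. Nim + PGO + native
pct_slower 1.924 1.739

# Go build vs. Nim + PGO + native
pct_slower 2.075 1.739

# Baseline Nim build vs. Nim + PGO (without -march=native)
pct_slower 1.924 1.785
```

This prints 10.6, 19.3 and 7.8, so the rounded 10% and 19% figures hold, and most of the gain appears to come from PGO itself rather than from `-march=native`. Keep in mind that the standard deviations reported by hyperfine (up to about 0.07 s) are not negligible next to these differences.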