Since no one here has suggested any changes to the code itself, I tried PGO
(profile-guided optimization) to see for myself how much of a difference it
makes. I tried both LLVM and GCC: the former gave negative results, but GCC works great:
    
    
    #!/usr/bin/env bash
    set -e
    # Substitute your paths to the Nim lib and program's C files
    NIMLIB="/home/user/.choosenim/toolchains/nim-#devel/lib/"
    SRC=(~/.cache/nim/optimized_r/*.c)
    
    # Regular compilation to generate .c files and a baseline executable:
    nim c --gc:orc -d:danger --passC:"-flto" --passL:"-flto" -o:optimized_nim optimized.nim
    
    # Compilation of an exec which will produce the profiling data
    gcc -O3 -flto -fprofile-generate -I"$NIMLIB" "${SRC[@]}" -o pg
    
    # Generate a test file
    BIBLE=/tmp/kjvbible_x10.txt
    if [ ! -f "$BIBLE" ]; then
        for i in {1..10}; do
            cat kjvbible.txt >>"$BIBLE"
        done
    fi
    
    # Run the profiling
    ./pg <"$BIBLE" >/dev/null
    # This produces "@moptimized.nim.gcda" and a few .gcda files for the stdlib
    
    # Compile an optimized program
    gcc -O3 -flto -fprofile-use -I"$NIMLIB" "${SRC[@]}" -o optimized_nim_pgo
    # And an even more optimized one
    gcc -O3 -march=native -mtune=native -flto -fprofile-use -I"$NIMLIB" "${SRC[@]}" -o optimized_nim_pgo_native
    
    # You obviously need hyperfine installed for this
    echo -Go optimized
    hyperfine -r 10 "./optimized-go <$BIBLE >/dev/null"
    
    echo -Nim optimized
    hyperfine -r 10 "./optimized_nim <$BIBLE >/dev/null"
    echo -Nim optimized + pgo
    hyperfine -r 10 "./optimized_nim_pgo <$BIBLE >/dev/null"
    echo -Nim optimized + pgo + native
    hyperfine -r 10 "./optimized_nim_pgo_native <$BIBLE >/dev/null"
    
    

(btw, why no Bash highlighting support here?)
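
One thing the script above does not do is check that the profiling run actually left any `.gcda` files behind before the `-fprofile-use` builds start. A minimal guard could look like the sketch below. `require_gcda` is a hypothetical helper, not part of the original script, and the default of looking in the current directory is an assumption: where GCC writes the files depends on its version and options.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original script): fail unless at least
# one .gcda profile file exists in the given directory.
require_gcda() {
    local dir="${1:-.}"            # assumption: default to the current directory
    local files=("$dir"/*.gcda)    # an unmatched glob stays as a literal string
    if [ ! -e "${files[0]}" ]; then
        echo "no .gcda files in $dir; run the instrumented binary first" >&2
        return 1
    fi
    echo "found ${#files[@]} .gcda file(s) in $dir"
}
```

Calling `require_gcda` right after the `./pg` run would make the script stop (because of `set -e`) instead of building the "pgo" binaries without profile data.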

I ran the script on the latest version of the code (from the `hashrearhatch` 
branch) and the condensed results are: 
    
    
    -Go optimized
      Time (mean ± σ):      2.075 s ±  0.048 s    [User: 1.994 s, System: 0.100 s]
    -Nim optimized
      Time (mean ± σ):      1.924 s ±  0.068 s    [User: 1.838 s, System: 0.068 s]
    -Nim optimized + pgo
      Time (mean ± σ):      1.785 s ±  0.033 s    [User: 1.697 s, System: 0.067 s]
    -Nim optimized + pgo + native
      Time (mean ± σ):      1.739 s ±  0.028 s    [User: 1.658 s, System: 0.063 s]
    
    

That is a **10%** speed bump for the Nim version (PGO + native versus the
plain optimized build), which makes Go **19%** slower on test data of this
size. Pretty neat for zero manual work.
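
For anyone who wants to check the percentages, they come straight from the mean times in the hyperfine output. Here is a small sketch of the arithmetic; `percent_slower` is just an illustrative helper name, and it assumes `awk` is available:

```shell
#!/usr/bin/env bash
# Hypothetical helper: by how many percent is run time $1 slower than run time $2?
# Both arguments are mean run times in seconds.
percent_slower() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", (slow / fast - 1) * 100 }'
}

percent_slower 1.924 1.739   # Nim optimized vs. Nim optimized + pgo + native
percent_slower 2.075 1.739   # Go optimized  vs. Nim optimized + pgo + native
```

This prints 10.6 and 19.3, which round to the 10% and 19% quoted above.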
