Just wanted to share some info about using LTO + PGO with Clang for Nim. First of all, you should know that PGO is not always a win: it optimizes the code paths that were exercised during the profiling run, so corner cases the run didn't cover may end up even slower.
The process:

* Compile your application with instrumentation enabled: `nim c -d:danger --cc:clang --passC:"-flto -fprofile-instr-generate" --passL:"-flto -fprofile-instr-generate" file.nim`
* Run it with your typical workloads to generate the profiling data for PGO: `./file`. After that you should have a file named `default.profraw` in the folder where you ran your program.
* Use `llvm-profdata merge default.profraw -output data.profdata` to process the profiling data for Clang to use.
* Compile your program again, this time like so (you should be in the same folder as the `data.profdata` file): `nim c -d:danger --cc:clang --passC:"-flto -fprofile-instr-use=data.profdata" --passL:"-flto -fprofile-instr-use=data.profdata" file.nim`

After that the process is done, and you can test your binary to see if you got any performance boost :)

I tried doing that for my `mathexpr` library:

    # Don't mind the nimbench, I know I shouldn't use it :P
    import mathexpr, nimbench

    let e = newEvaluator()
    e.addVars({"a": 3.0, "b": 5.7})

    bench("test", m):
      for x in 1..m:
        var c = e.eval("(a^a + b * 2 - 3*4.2412+5335^2-4e3)^2")
        if c == 0:
          echo "can't"

    runBenchmarks()

No LTO/PGO (also yeah, I'm using gc:arc since it's faster :P) - `nim c -d:danger --gc:arc --cc:clang -r tests/bench.nim`:

    ============================================================================
    bench.nim                                       relative  time/iter  iters/s
    ============================================================================
    "test"                                                     435.24ns    2.30M

LTO only - `nim c -d:danger --gc:arc --cc:clang --passC:"-flto" --passL:"-flto" -r tests/bench.nim`:

    "test"                                                     332.45ns    3.01M

LTO+PGO (I won't show all commands, just the last one) - `nim c -r -d:danger --gc:arc --cc:clang --passC:"-flto -fprofile-instr-use=perf.profdata" --passL:"-flto -fprofile-instr-use=perf.profdata" tests/bench.nim`:

    "test"                                                     266.02ns    3.76M

Thanks for reading :)
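P.S. To put the three timings above side by side, here is a small sketch that turns them into speedup factors relative to the plain `-d:danger` build. The `speedup` helper and the variable names are made up for illustration, and the numbers are just the time/iter values from my runs, so expect different figures on your own machine and workload.

```shell
#!/bin/sh
# Mean time per iteration (ns), copied from the three benchmark runs above.
base=435.24      # no LTO/PGO
lto=332.45       # LTO only
lto_pgo=266.02   # LTO + PGO

# speedup BASE NEW: print how many times faster NEW is than BASE,
# rounded to two decimals. (Hypothetical helper, not part of any tool.)
speedup() {
  awk -v b="$1" -v n="$2" 'BEGIN { printf "%.2f", b / n }'
}

echo "LTO only: $(speedup "$base" "$lto")x"
echo "LTO+PGO:  $(speedup "$base" "$lto_pgo")x"
```

With these numbers that comes out to about 1.31x for LTO alone and about 1.64x for LTO+PGO.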