Just wanted to share some info about using LTO + PGO with Clang for Nim.

First of all, keep in mind that PGO is not always a win: it optimizes the code 
paths exercised during the profiling run, so inputs that take other paths may 
end up slower than without PGO.

The process:

  * Compile your application with LTO and profile instrumentation enabled:

`nim c -d:danger --cc:clang --passC:"-flto -fprofile-instr-generate" 
--passL:"-flto -fprofile-instr-generate" file.nim`

  * Run it with your typical workloads to generate the profiling data for PGO: 
`./file`.

After that you should have a file named `default.profraw` in the folder where 
you ran your program.

  * Process the raw profiling data into a form Clang can use:

`llvm-profdata merge -o data.profdata default.profraw`

  * Compile your program again, this time like so (you should be in the same 
folder as the `data.profdata` file):

`nim c -d:danger --cc:clang --passC:"-flto -fprofile-instr-use=data.profdata" 
--passL:"-flto -fprofile-instr-use=data.profdata" file.nim`

That's the whole process. You can now test your binary to see whether you got 
any performance boost :)
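If you'd rather not retype the commands, the four steps can be chained in a small shell helper. This is only a rough sketch: `pgo_build`, `run` and the `DRY_RUN` switch are names I made up for illustration, and it assumes that Nim puts the binary next to the source file (its default), that the program needs no arguments, and that `nim`, `clang` and `llvm-profdata` are on your `PATH`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nim or LLVM): chains the four steps above.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: pgo_build path/to/file.nim
pgo_build() {
  src="$1"
  bin="${src%.nim}"        # assumption: binary is written next to the source
  profile="data.profdata"

  # 1. Instrumented build with LTO.
  run nim c -d:danger --cc:clang \
    --passC:"-flto -fprofile-instr-generate" \
    --passL:"-flto -fprofile-instr-generate" "$src" &&
  # 2. Run the workload; this writes default.profraw into the current folder.
  run "./$bin" &&
  # 3. Turn the raw profile into the format Clang reads.
  run llvm-profdata merge -o "$profile" default.profraw &&
  # 4. Rebuild using the collected profile.
  run nim c -d:danger --cc:clang \
    --passC:"-flto -fprofile-instr-use=$profile" \
    --passL:"-flto -fprofile-instr-use=$profile" "$src"
}
```

After sourcing it, `pgo_build file.nim` runs the steps in order and stops at the first one that fails. Setting `DRY_RUN=1` first only prints the commands, which is handy for checking the flags before a long build.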

I tried doing that for my `mathexpr` library: 
    
    
    # Don't mind the nimbench, I know I shouldn't use it :P
    import mathexpr, nimbench
    
    let e = newEvaluator()
    e.addVars({"a": 3.0, "b": 5.7})
    
    bench("test", m):
      for x in 1..m:
        var c = e.eval("(a^a + b * 2 - 3*4.2412+5335^2-4e3)^2")
        if c == 0:
          echo "can't"
    
    runBenchmarks()
    
    

No LTO/PGO (also yeah, I'm using gc:arc since it's faster :P) - `nim c 
-d:danger --gc:arc --cc:clang -r tests/bench.nim`: 
    
    
    ============================================================================
    bench.nim                                       relative  time/iter  iters/s
    ============================================================================
    "test"                                                     435.24ns    2.30M
    
    

LTO only - `nim c -d:danger --gc:arc --cc:clang --passC:"-flto" --passL:"-flto" 
-r tests/bench.nim`: 
    
    
    "test"                                                     332.45ns    3.01M
    
    

LTO+PGO (I won't show all commands, just the last one) - `nim c -r -d:danger 
--gc:arc --cc:clang --passC:"-flto -fprofile-instr-use=perf.profdata" 
--passL:"-flto -fprofile-instr-use=perf.profdata" tests/bench.nim`: 
    
    
    "test"                                                     266.02ns    3.76M
    
    
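To put the three timings side by side, here is a quick way to turn the time/iter values into speedup ratios. The `speedup` function is just a throwaway helper I'm naming for this example; the numbers are copied from the runs above.

```shell
# Hypothetical helper: ratio of two time/iter values (old / new), 2 decimals.
speedup() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.2f\n", old / new }'
}

# time/iter in ns: baseline 435.24, LTO only 332.45, LTO+PGO 266.02
echo "LTO only vs baseline:  $(speedup 435.24 332.45)x"   # 1.31x
echo "LTO+PGO vs baseline:   $(speedup 435.24 266.02)x"   # 1.64x
echo "PGO on top of LTO:     $(speedup 332.45 266.02)x"   # 1.25x
```

So on this particular benchmark, LTO alone gives roughly 1.3x and PGO adds about another 1.25x on top of it. This is a single micro-benchmark, so other workloads may behave differently.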

Thanks for reading :) 
