Re: A microbenchmarking library
Thanks to @dm1try's latest PR it now works fine on OSX!
Re: A microbenchmarking library
I've left it running for longer; it's definitely stuck. The scale factor is 1.0.
Re: A microbenchmarking library
Two minutes tops on a very busy machine. I guess there's something wrong in the `getMonotonicTime` implementation for OSX then... What is the computed `scaleFactor` in `timer.nim`?
Re: A microbenchmarking library
How long is it supposed to run? It's also freezing for me on MacOS, killed it after a minute.
Re: A microbenchmarking library
Good news, criterion is now available on nimble! I also need your help, if you're on Windows or OSX can you please check if the `tfib.nim` example runs fine for you? I tried setting up Travis to run the test suite on OSX too but got a timeout after 10m so either the timing code is wrong for that OS or the machine is very slow. Thank you in advance!
Re: A microbenchmarking library
Oh, I see what you mean. As usual one must be smart enough to set the CPU governor to 'performance' and pin the benchmarking thread to a single core to prevent measurement errors due to SMP migration. Benchmarking is twice as hard as writing code :)
Re: A microbenchmarking library
Indeed, gcc 8.2 generates perfectly optimized code using the `cmov` instruction for the "goto label" version, as [https://www.godbolt.org](https://www.godbolt.org) proves with compiler option `-O3`:

```c
int w1(int64_t a, int64_t b) {
    int r = 0;
    if (a == b) r = 10; else r = 17;
    return r;
}

int w2(int64_t a, int64_t b) {
    int r = 10;
    if (a == b) goto LW2;
    r = 17;
LW2:
    return r;
}
```

```
w1(long, long):
        cmp     rdi, rsi
        mov     edx, 17
        mov     eax, 10
        cmovne  eax, edx
        ret
w2(long, long):
        cmp     rdi, rsi
        mov     edx, 10
        mov     eax, 17
        cmove   eax, edx
        ret
```
Re: A microbenchmarking library
I don't see an issue with the gotos, those are direct jumps so there is no cost there. The real question is `if` branching vs `cmov`. I don't know the impact of the Meltdown, Spectre and L1TF mitigations on Haswell branch predictors though, as one of their main goals was fixing speculative execution, and much of Haswell's performance came from its very impressive predictors. Here is a benchmark suite for branching vs branchless code: [https://github.com/xiadz/cmov](https://github.com/xiadz/cmov). On past architectures (circa 2010) if/else was faster than `cmov` for predictable branches (like testing for nil); that seems to have changed with Skylake (security patches might change it again). Also ARM and MIPS architectures, and even AMD processors, might behave completely differently. Finally, `cmov` only makes sense for assignments like this:

```nim
let foo = if bool_a: 10 else: 20
```
Re: A microbenchmarking library
Yes, as I already said, 100 cycles is fine for me. mratsim, what do you generally think about all the GOTOs? I had the feeling they make it a bit hard for the C compiler to optimize the code 100%, for example it may be difficult to apply `cmov` instructions to get branchless code. I think it will make no difference in practice, and of course no one intends to change the code generator, but I still wonder whether my assumption is fully wrong.
Re: A microbenchmarking library
100 cycles is the cost of a cache miss, so allocating and then comparing two strings in 100 cycles seems pretty reasonable.
Re: A microbenchmarking library
For your example, my output is:

```
$ nim c t.nim
$ ./t
Benchmark: fib5()
Collected 241 samples
Warning: Found 6 mild and 15 extreme outliers in the time measurements
Warning: Found 8 mild and 10 extreme outliers in the cycles measurements
Time
  Mean:  251.7599ns (249.2240ns .. 255.4843ns)
  Std:   24.0680ns (7.8357ns .. 37.5615ns)
  Slope: 249.0758ns (248.7008ns .. 249.5158ns)
  r^2:   1. (1. .. 1.)
Cycles
  Mean:  647cycles (642cycles .. 654cycles)
  Std:   46cycles (15cycles .. 72cycles)
  Slope: 645cycles (644cycles .. 646cycles)
  r^2:   1. (1. .. 1.)
```

Watch the relation of cycles to ns: it is in the range 1..10, as expected. For your GitHub page it seems to be approximately 1000? Wrong scaling for me, or do I miss something?
Re: A microbenchmarking library
> The example output on your github page is still wrong, the cycles count is much too high.

Is it? I've probably benchmarked it without the release switch. As usual, beware of the optimizer: make sure the benchmarked function doesn't elide the comparison completely. Introducing an argument using `measureArg` is a nice way to prevent this class of problems.
Re: A microbenchmarking library
OK, here is the assembler listing...

```
# nim c -d:release t.nim
# gcc.options.speed = "-save-temps -march=native -O3 -fno-strict-aliasing"
$ cat t.nim
proc t(a, b: string): bool = a == b

proc main =
  var a = "Rust"
  var b = "Nim"
  echo t(a, b)

main()
```

```
t_IxGYsz1VoA2HIiGBY5mgGw:
.LFB20:
        .cfi_startproc
        movl    $1, %eax
        cmpq    %rsi, %rdi
        je      .L32
        testq   %rdi, %rdi
        je      .L35
        movq    (%rdi), %rdx
        testq   %rsi, %rsi
        je      .L36
        xorl    %eax, %eax
        cmpq    (%rsi), %rdx
        je      .L37
.L32:
        ret
        .p2align 4,,10
        .p2align 3
.L35:
        testq   %rsi, %rsi
        je      .L32
        cmpq    $0, (%rsi)
        sete    %al
        ret
        .p2align 4,,10
        .p2align 3
.L37:
        testq   %rdx, %rdx
        je      .L28
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        addq    $16, %rsi
        addq    $16, %rdi
        call    memcmp@PLT
        testl   %eax, %eax
        sete    %al
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret
        .p2align 4,,10
        .p2align 3
.L36:
        testq   %rdx, %rdx
        sete    %al
        ret
.L28:
.L21:
.L24:
        movl    $1, %eax
        ret
        .cfi_endproc
```
Re: A microbenchmarking library
> C code looks like

And what does the produced assembler code look like?
Re: A microbenchmarking library
Nice. The example output on your github page is still wrong, the cycles count is much too high. The minimum cycle count seems to be 4 for most simple procs, but that is no problem. I have just tested a plain string comparison -- about 100 cycles, as expected from its C code. So string comparison is not a very cheap operation in Nim -- not too surprising, as the special nil case has to be considered.

```nim
import criterion

var cfg = newDefaultConfig()

benchmark cfg:
  proc t0() {.measure.} =
    var a = "Rust"
    var b = "Nim"
    doAssert a != b

  proc t1() {.measure.} =
    var a = "Rust"
    var b = "Nim"
    doAssert a > b
```

```
$ ./t
Benchmark: t0()
Collected 277 samples
Warning: Found 4 mild and 8 extreme outliers in the time measurements
Warning: Found 4 mild and 6 extreme outliers in the cycles measurements
Time
  Mean:  42.4753ns (41.2772ns .. 43.8444ns)
  Std:   11.6368ns (7.6329ns .. 16.4890ns)
  Slope: 42.2136ns (41.7204ns .. 42.9674ns)
  r^2:   0.9977 (0.9960 .. 0.9988)
Cycles
  Mean:  108cycles (105cycles .. 111cycles)
  Std:   25cycles (18cycles .. 33cycles)
  Slope: 109cycles (108cycles .. 111cycles)
  r^2:   0.9977 (0.9958 .. 0.9988)
Benchmark: t1()
Collected 275 samples
Warning: Found 21 mild and 5 extreme outliers in the time measurements
Warning: Found 2 mild and 3 extreme outliers in the cycles measurements
Time
  Mean:  42.1040ns (41.2619ns .. 42.9831ns)
  Std:   7.4068ns (5.1946ns .. 10.0585ns)
  Slope: 48.4274ns (46.7242ns .. 49.9679ns)
  r^2:   0.9934 (0.9891 .. 0.9968)
Cycles
  Mean:  107cycles (105cycles .. 109cycles)
  Std:   15cycles (13cycles .. 17cycles)
  Slope: 125cycles (121cycles .. 129cycles)
  r^2:   0.9934 (0.9886 .. 0.9966)
```

The C code looks like:

```c
static N_INLINE(NIM_BOOL, eqStrings)(NimStringDesc* a, NimStringDesc* b) {
    NIM_BOOL result;
    NI alen;
    NI blen;
    {
        result = (NIM_BOOL)0;
        {
            if (!(a == b)) goto LA3_;
            result = NIM_TRUE;
            goto BeforeRet_;
        }
        LA3_: ;
        {
            if (!(a == NIM_NIL)) goto LA7_;
            alen = ((NI) 0);
        }
        goto LA5_;
        LA7_: ;
        {
            alen = (*a).Sup.len;
        }
        LA5_: ;
        {
            if (!(b == NIM_NIL)) goto LA12_;
            blen = ((NI) 0);
        }
        goto LA10_;
        LA12_: ;
        {
            blen = (*b).Sup.len;
        }
        LA10_: ;
        {
            if (!(alen == blen)) goto LA17_;
            {
                if (!(alen == ((NI) 0))) goto LA21_;
                result = NIM_TRUE;
                goto BeforeRet_;
            }
            LA21_: ;
            result = equalMem_fmeFeLBvgmAHG9bC8ETS9bYQt(
                ((void*) ((*a).data)), ((void*) ((*b).data)), ((NI) (alen)));
            goto BeforeRet_;
        }
        LA17_: ;
    }
    BeforeRet_: ;
    return result;
}
```
Re: A microbenchmarking library
That's now fixed! Thanks for the heads up!
Re: A microbenchmarking library
Cool! Where is the dip module? :) (it's referenced in `statistics.nim`)
A microbenchmarking library
Dear Nimmers, [Here's](https://github.com/LemonBoy/criterion.nim) a nice little library for all your microbenchmarking needs. Benchmarking is hard and the aim of this library is to abstract away most of the complexity and pitfalls and provide the user (hopefully) meaningful results. If you ever used Haskell's [criterion](http://www.serpentine.com/criterion) package then you may already be familiar with how the library works. I've been chipping away at the API in order to make the library as ergonomic as possible and now I think it's right about time to ask for some feedback from some third party users. * * * Have fun and keep on hacking, LemonBoy