Thank you, guys. I couldn't have imagined how many things can go wrong in a computation session under Windows. I rebooted my PC, and now the benchmarks run 3 times faster (!), and I see no real differences between the cases, except in the global context.
I agree that annotating "pure" functions could be very useful for high-performance code. I miss STATIC variables even more, though. My functions use a bunch of small constant arrays, which I'd like to declare as static, loaded together with the function code. Is there a way to do this? (Currently I put my functions into modules, and outside of the functions I write const array definitions. Inside the functions these arrays are declared global. It works, but I ended up with many small modules and a lot of "using" statements.)
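In case it is useful to others, here is a minimal sketch of the workaround I described; the module, array, and function names are made up for illustration:

module ConstTables

export weighted_sum

# Small constant array, built once when the module loads,
# in lieu of a static variable.
const WEIGHTS = [1, 2, 4, 2, 1]

function weighted_sum(xs)
    global WEIGHTS              # declared global inside the function, as described above
    s = 0
    for i in 1:length(WEIGHTS)
        s += WEIGHTS[i] * xs[i]
    end
    return s
end

end # module

using ConstTables

The const at module level is what lets the compiler treat the array binding as fixed; the cost is exactly the proliferation of small modules and "using" statements I mentioned.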
On Fri, Mar 28, 2014 at 12:09 PM, Stefan Karpinski <ste...@karpinski.org> wrote:

> Either way, one thing is quite unfortunate about this code. The
> compilation process isn't able to figure out that 10^8 is a constant, so it
> recomputes it on every loop iteration. We really need a way to annotate
> functions as being pure in the very specific sense that the compiler is
> free to evaluate them at compile time if all of their arguments are known
> at compile time (or partially evaluate when some of the arguments are
> known).
>
>
> On Fri, Mar 28, 2014 at 11:24 AM, John Myles White <johnmyleswh...@gmail.com> wrote:
>
>> Yeah, that's true. I didn't read the IR carefully enough.
>>
>> Laszlo, are you on the latest Julia? I worry that it's hard to make
>> comparisons if you're running an older version of Julia.
>>
>> -- John
>>
>> On Mar 28, 2014, at 8:18 AM, Stefan Karpinski <ste...@karpinski.org> wrote:
>>
>> Perhaps I should have said "isomorphic" - the only differences there are
>> names. It's more obvious that the native code is the same - only the
>> source line annotations differ.
>>
>>
>> On Fri, Mar 28, 2014 at 11:16 AM, John Myles White <johnmyleswh...@gmail.com> wrote:
>>
>>> On my system, the two functions produce different LLVM IR:
>>>
>>> julia> code_llvm(f1, ())
>>>
>>> define i64 @julia_f115727() {
>>> top:
>>>   %0 = call i64 @julia_power_by_squaring1373(i64 10, i64 8), !dbg !726
>>>   %1 = icmp slt i64 %0, 1, !dbg !726
>>>   br i1 %1, label %L2, label %if, !dbg !726
>>>
>>> if:                                   ; preds = %top, %if
>>>   %j.04 = phi i64 [ %3, %if ], [ 1, %top ]
>>>   %k.03 = phi i64 [ %4, %if ], [ 1, %top ]
>>>   %2 = and i64 %k.03, 1, !dbg !727
>>>   %3 = add i64 %j.04, %2, !dbg !727
>>>   %4 = add i64 %k.03, 1, !dbg !728
>>>   %5 = call i64 @julia_power_by_squaring1373(i64 10, i64 8), !dbg !726
>>>   %6 = icmp sgt i64 %4, %5, !dbg !726
>>>   br i1 %6, label %L2, label %if, !dbg !726
>>>
>>> L2:                                   ; preds = %if, %top
>>>   %j.0.lcssa = phi i64 [ 1, %top ], [ %3, %if ]
>>>   ret i64 %j.0.lcssa, !dbg !729
>>> }
>>>
>>> julia> code_llvm(f2, ())
>>>
>>> define i64 @julia_f215728() {
>>> top:
>>>   %0 = call i64 @julia_power_by_squaring1373(i64 10, i64 8), !dbg !729
>>>   %1 = icmp slt i64 %0, 1, !dbg !729
>>>   br i1 %1, label %L6, label %L3, !dbg !729
>>>
>>> L3:                                   ; preds = %top, %L3
>>>   %j.08 = phi i64 [ %3, %L3 ], [ 1, %top ]
>>>   %k.07 = phi i64 [ %4, %L3 ], [ 1, %top ]
>>>   %2 = and i64 %k.07, 1, !dbg !730
>>>   %3 = add i64 %j.08, %2, !dbg !730
>>>   %4 = add i64 %k.07, 1, !dbg !731
>>>   %5 = call i64 @julia_power_by_squaring1373(i64 10, i64 8), !dbg !729
>>>   %6 = icmp slt i64 %5, %4, !dbg !729
>>>   br i1 %6, label %L6, label %L3, !dbg !729
>>>
>>> L6:                                   ; preds = %L3, %top
>>>   %j.0.lcssa = phi i64 [ 1, %top ], [ %3, %L3 ]
>>>   ret i64 %j.0.lcssa, !dbg !732
>>> }
>>>
>>> But the performance is identical or slightly in favor of f1.
>>>
>>> -- John
>>>
>>> On Mar 28, 2014, at 8:02 AM, Stefan Karpinski <ste...@karpinski.org> wrote:
>>>
>>> > Both ways of writing a while loop should be the same. If you're seeing
>>> > a difference, something else is going on. I'm not able to reproduce this:
>>> >
>>> > function f1()
>>> >     j = k = 1
>>> >     while k <= 10^8
>>> >         j += k & 1
>>> >         k += 1
>>> >     end
>>> >     return j
>>> > end
>>> >
>>> > function f2()
>>> >     j = k = 1
>>> >     while true
>>> >         k <= 10^8 || break
>>> >         j += k & 1
>>> >         k += 1
>>> >     end
>>> >     return j
>>> > end
>>> >
>>> > function f3()
>>> >     j = k = 1
>>> >     while true
>>> >         k > 10^8 && break
>>> >         j += k & 1
>>> >         k += 1
>>> >     end
>>> >     return j
>>> > end
>>> >
>>> > julia> @time f1()
>>> > elapsed time: 0.644661304 seconds (64 bytes allocated)
>>> > 50000001
>>> >
>>> > julia> @time f2()
>>> > elapsed time: 0.640951585 seconds (64 bytes allocated)
>>> > 50000001
>>> >
>>> > julia> @time f3()
>>> > elapsed time: 0.639177183 seconds (64 bytes allocated)
>>> > 50000001
>>> >
>>> > All three functions produce identical native code. Can you send
>>> > exactly what your function definitions are, how you're timing them,
>>> > and perhaps the output of code_native(f1, ())?
>>> >
>>> >
>>> > On Fri, Mar 28, 2014 at 10:48 AM, Laszlo Hars <laszloh...@gmail.com> wrote:
>>> > Thanks, John, for your replies. On my system your code gives reliable
>>> > results, too, if we increase the loop limits to 10^9:
>>> >
>>> > julia> mean(t1s ./ t2s)
>>> > 11.924373323658703
>>> >
>>> > This 12% makes a significant difference in my function of nested loops
>>> > (it could add up to a factor-of-2 slowdown). So, the question remains:
>>> >
>>> > - what is the fastest coding of a while loop?
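P.S. Regarding the 10^8 recomputation Stefan mentions above: until such a purity annotation exists, the bound can be hoisted into a local variable, so the power is evaluated once per call instead of on every loop test. A minimal sketch (the name f1_hoisted is made up for illustration):

function f1_hoisted()
    j = k = 1
    n = 10^8            # evaluated once here, not on every iteration
    while k <= n
        j += k & 1
        k += 1
    end
    return j
end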