Either way, one thing is quite unfortunate about this code. The compiler
isn't able to figure out that 10^8 is a constant, so it recomputes it on
every loop iteration. We really need a way to annotate functions as being
pure in the very specific sense that the compiler is free to evaluate them
at compile time if all of their arguments are known at compile time (or to
partially evaluate them when only some of the arguments are known).
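
Until something like that exists, the workaround is to hoist the constant
by hand. A minimal sketch, based on the f1 from the quoted message below;
the name f1_hoisted and the local variable limit are made up here for
illustration:

```julia
# Sketch: compute the loop bound once, outside the loop, so that
# power_by_squaring is not called on every iteration.
# f1_hoisted and limit are hypothetical names, not from the thread.
function f1_hoisted()
    limit = 10^8          # evaluated a single time, before the loop
    j = k = 1
    while k <= limit      # compares against a plain local Int
        j += k & 1        # adds 1 for every odd k
        k += 1
    end
    return j
end
```

This should return the same 50000001 as f1. I haven't timed it here, so
treat any speedup as something to measure rather than a given.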


On Fri, Mar 28, 2014 at 11:24 AM, John Myles White <johnmyleswh...@gmail.com
> wrote:

> Yeah, that's true. I didn't read the IR carefully enough.
>
> Laszlo, are you on the latest Julia? I worry that it's hard to make
> comparisons if you're running an older version of Julia.
>
>  -- John
>
> On Mar 28, 2014, at 8:18 AM, Stefan Karpinski <ste...@karpinski.org>
> wrote:
>
> Perhaps I should have said "isomorphic" – the only differences there are
> names. It's more obvious that the native code is the same – only the source
> line annotations are different at all.
>
>
> On Fri, Mar 28, 2014 at 11:16 AM, John Myles White <
> johnmyleswh...@gmail.com> wrote:
>
>> On my system, the two functions produce different LLVM IR:
>>
>> julia> code_llvm(f1, ())
>>
>> define i64 @julia_f115727() {
>> top:
>>   %0 = call i64 @julia_power_by_squaring1373(i64 10, i64 8), !dbg !726
>>   %1 = icmp slt i64 %0, 1, !dbg !726
>>   br i1 %1, label %L2, label %if, !dbg !726
>>
>> if:                                               ; preds = %top, %if
>>   %j.04 = phi i64 [ %3, %if ], [ 1, %top ]
>>   %k.03 = phi i64 [ %4, %if ], [ 1, %top ]
>>   %2 = and i64 %k.03, 1, !dbg !727
>>   %3 = add i64 %j.04, %2, !dbg !727
>>   %4 = add i64 %k.03, 1, !dbg !728
>>   %5 = call i64 @julia_power_by_squaring1373(i64 10, i64 8), !dbg !726
>>   %6 = icmp sgt i64 %4, %5, !dbg !726
>>   br i1 %6, label %L2, label %if, !dbg !726
>>
>> L2:                                               ; preds = %if, %top
>>   %j.0.lcssa = phi i64 [ 1, %top ], [ %3, %if ]
>>   ret i64 %j.0.lcssa, !dbg !729
>> }
>>
>> julia> code_llvm(f2, ())
>>
>> define i64 @julia_f215728() {
>> top:
>>   %0 = call i64 @julia_power_by_squaring1373(i64 10, i64 8), !dbg !729
>>   %1 = icmp slt i64 %0, 1, !dbg !729
>>   br i1 %1, label %L6, label %L3, !dbg !729
>>
>> L3:                                               ; preds = %top, %L3
>>   %j.08 = phi i64 [ %3, %L3 ], [ 1, %top ]
>>   %k.07 = phi i64 [ %4, %L3 ], [ 1, %top ]
>>   %2 = and i64 %k.07, 1, !dbg !730
>>   %3 = add i64 %j.08, %2, !dbg !730
>>   %4 = add i64 %k.07, 1, !dbg !731
>>   %5 = call i64 @julia_power_by_squaring1373(i64 10, i64 8), !dbg !729
>>   %6 = icmp slt i64 %5, %4, !dbg !729
>>   br i1 %6, label %L6, label %L3, !dbg !729
>>
>> L6:                                               ; preds = %L3, %top
>>   %j.0.lcssa = phi i64 [ 1, %top ], [ %3, %L3 ]
>>   ret i64 %j.0.lcssa, !dbg !732
>> }
>>
>> But the performance is identical or slightly in favor of f1.
>>
>>  -- John
>>
>> On Mar 28, 2014, at 8:02 AM, Stefan Karpinski <ste...@karpinski.org>
>> wrote:
>>
>> > Both ways of writing a while loop should be the same. If you're seeing a
>> difference, something else is going on. I'm not able to reproduce this:
>> >
>> > function f1()
>> >   j = k = 1
>> >   while k <= 10^8
>> >     j += k & 1
>> >     k += 1
>> >   end
>> >   return j
>> > end
>> >
>> > function f2()
>> >   j = k = 1
>> >   while true
>> >     k <= 10^8 || break
>> >     j += k & 1
>> >     k += 1
>> >   end
>> >   return j
>> > end
>> >
>> > function f3()
>> >   j = k = 1
>> >   while true
>> >     k > 10^8 && break
>> >     j += k & 1
>> >     k += 1
>> >   end
>> >   return j
>> > end
>> >
>> > julia> @time f1()
>> > elapsed time: 0.644661304 seconds (64 bytes allocated)
>> > 50000001
>> >
>> > julia> @time f2()
>> > elapsed time: 0.640951585 seconds (64 bytes allocated)
>> > 50000001
>> >
>> > julia> @time f3()
>> > elapsed time: 0.639177183 seconds (64 bytes allocated)
>> > 50000001
>> >
>> > All three functions produce identical native code. Can you send exactly
>> what your function definitions are, how you're timing them and perhaps the
>> output of code_native(f1,())?
>> >
>> >
>> > On Fri, Mar 28, 2014 at 10:48 AM, Laszlo Hars <laszloh...@gmail.com>
>> wrote:
>> > Thanks, John, for your replies. In my system your code gives reliable
>> results, too, if we increase the loop limits to 10^9:
>> >
>> > julia> mean(t1s ./ t2s)
>> > 11.924373323658703
>> >
>> > This 12% makes a significant difference in my function of nested loops
>> (could add up to a factor of 2 slow down). So, the question remains:
>> >
>> > - what is the fastest coding of a while loop?
>> >
>>
>>
>
>
