Hey Keno,

Thanks - that's correct. I was just remarking that those two functions
should generate the same code, but they don't. Hopefully this small test
case will be an easy way to figure out why Julia still generates slightly
suboptimal code for my original f() method: it loads that zero from
memory instead of treating it as a constant (and propagating it
appropriately).
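
In case it helps anyone reproduce this, here is a minimal sketch of the
comparison. It reuses foo and foo2 from my earlier message quoted below.
The `using InteractiveUtils` line is an assumption: it is only needed on
Julia versions where code_native lives in that module and you are running
outside the REPL; on the build used in this thread, drop it. The asserts
only confirm Keno's point that both functions return the same value. The
generated assembly depends on the Julia version and CPU, so no expected
output is shown.

```julia
using InteractiveUtils  # assumption: needed outside the REPL on newer Julia versions

# Typed local assigned a literal zero, then returned explicitly.
function foo()
    result::Int = 0
    result
end

# Same assignment, but the assignment itself is the last expression.
# An assignment evaluates to its right-hand side, so this also returns 0.
function foo2()
    result::Int = 0
end

# Both functions should be semantically identical.
@assert foo() == 0
@assert foo2() == 0

# Dump the machine code for each so the two can be compared side by side.
code_native(foo, ())
code_native(foo2, ())
```

If the two listings differ only in how the zero gets into RAX (a load
from memory versus an xor), that is the discrepancy I am describing.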

Thanks!
Rak.


On Wed, Apr 9, 2014 at 7:16 PM, Keno Fischer
<kfisc...@college.harvard.edu> wrote:

> The reason is fairly straightforward (though of course it should be
> properly optimized). Assignments always yield the RHS so `foo2` is
> equivalent to
>
> function foo3()
>     result::Int = 0
>     return 0
> end
>
> To see this you can do:
>
> julia> @code_typed foo2()
> 1-element Array{Any,1}:
>  :($(Expr(:lambda, {}, {{:result},{{:result,Int64,18}},{}}, :(begin  # none, line 2:
>         result = top(typeassert)(0,Int)::Int64
>          return 0
>     end::Int64))))
>
> while for some reason the compiler doesn't know that the value is constant
> in the case of foo (though it should).
>
>
>
> On Wed, Apr 9, 2014 at 7:06 PM, Rak Rok <rak...@gmail.com> wrote:
>
>> I think the following code is more illustrative of the issue:
>>
>> julia> function foo()
>>            result::Int = 0
>>            result
>>        end
>> foo (generic function with 1 method)
>> julia> code_native(foo, ())
>>     .section __TEXT,__text,regular,pure_instructions
>> Filename: none
>> Source line: 2
>>     push RBP
>>     mov RBP, RSP
>>     movabs RAX, 140263599429192
>> Source line: 2
>>     mov RAX, QWORD PTR [RAX]
>> Source line: 3
>>     pop RBP
>>     ret
>> julia> function foo2()
>>            result::Int = 0
>>        end
>> foo2 (generic function with 1 method)
>> julia> code_native(foo2, ())
>>     .section __TEXT,__text,regular,pure_instructions
>> Filename: none
>> Source line: 2
>>     push RBP
>>     mov RBP, RSP
>>     xor EAX, EAX
>> Source line: 2
>>     pop RBP
>>     ret
>>
>>
>> For some reason in the first case Julia loads the 0 from memory whereas
>> in the second case it simply xors EAX (instead of RAX, oddly, but not sure
>> if it matters here).
>>
>> Rak.
>>
>>
>>
>> On Wed, Apr 9, 2014 at 6:32 PM, Rak Rok <rak...@gmail.com> wrote:
>>
>>> Excellent! Thanks guys, this is a TON better!
>>>
>>> I find it a little strange that it fetches the value 0 from some
>>> area in memory instead of just doing, say, xor RAX, RAX. I'm guessing it
>>> doesn't know that the value is actually numeric 0? Is that perhaps
>>> because zero(T) isn't being fully inlined?
>>>
>>> If it could realize that zero(T) is actually 0, it would probably avoid
>>> the add altogether and simply do a mov into RAX.
>>>
>>> Interestingly, the code generated for Uints implies that it knows
>>> that zero(T) is actually just 0, and we get:
>>> julia> code_native(f, (Uint,))
>>>     .section __TEXT,__text,regular,pure_instructions
>>> Filename: none
>>> Source line: 3
>>>     push RBP
>>>     mov RBP, RSP
>>>     test RDI, RDI
>>>     je 9
>>> Source line: 3
>>>     imul RDI, RDI
>>>     jmpq 2
>>>     xor EDI, EDI
>>> Source line: 6
>>>     mov RAX, RDI
>>>     pop RBP
>>>     ret
>>>
>>> Which is a little strange in its own way, since the xor EDI, EDI is
>>> unnecessary (and the jmpq 2 as well), but is at least better in that it
>>> doesn't load a 0 from memory. Any clues?
>>>
>>> Thanks for your help guys,
>>> Rak.
>>>
>>> On Thursday, April 3, 2014 5:24:27 PM UTC-4, Matt Bauman wrote:
>>>>
>>>> Talk about a perfect storm of fixes and enhancements!  With the latest
>>>> commit by Arch Robison, the native code is remarkably similar to clang's
>>>> output — just load, test, and multiply and add.  It's funny to read folks
>>>> in the issue talking about how much faster the 0.3 version is… when the
>>>> final result ended up being something like > 50 million fold faster than
>>>> 0.2.  So very impressive.
>>>>
>>>> You should complain about strange LLVM code more often, Rak.  Here's
>>>> the native code now:
>>>>
>>>> julia> code_native(f, (Int,))
>>>>     .section __TEXT,__text,regular,pure_instructions
>>>> Filename: none
>>>> Source line: 2
>>>>     push RBP
>>>>     mov RBP, RSP
>>>>     movabs RAX, 140508429342280
>>>> Source line: 2
>>>>     mov RAX, QWORD PTR [RAX]
>>>>     test RDI, RDI
>>>>     jle 7
>>>> Source line: 3
>>>>     imul RDI, RDI
>>>>     add RAX, RDI
>>>> Source line: 6
>>>>     pop RBP
>>>>     ret
>>>>
>>>> On Wednesday, April 2, 2014 9:26:44 PM UTC-4, andrew cooke wrote:
>>>>>
>>>>>
>>>>> thanks!  it's just possible this will fix a performance issue of mine
>>>>> :o)
>>>>>
>>>>> On Wednesday, 2 April 2014 16:57:36 UTC-3, Steven G. Johnson wrote:
>>>>>>
>>>>>> Just filed this as an issue:
>>>>>>     https://github.com/JuliaLang/julia/issues/6382
>>>>>>
>>>
>>>
>>
>
