Re: [julia-users] Re: Precompilation and functions with keyword arguments

2015-11-28 Thread Tim Holy
Nice detective work, Dan! Quite helpful.

Best,
--Tim

On Wednesday, November 25, 2015 02:08:47 PM Dan wrote:
> Further investigation suggests recompilation is happening. This raises
> questions:
> 1) why is there recompilation?
> 2) why are the associated memory allocations not reported?
> 3) is there some memory leak?
> 
> The following run suggests the recompilation:
>   Version 0.4.2-pre+15 (2015-11-20 12:12 UTC)
>   Commit c4bbb89* (5 days old release-0.4)
>   x86_64-linux-gnu
> 
> julia> testfunction2(n) = (x = 0.0; for i=1:n x += sin(i) ; end ; x)
> testfunction2 (generic function with 1 method)
> 
> 
> julia> @time precompile(testfunction2,(Int64,))
>   0.004197 seconds (2.84 k allocations: 146.004 KB)
> 
> 
> julia> a = Base._dump_function(testfunction2,(Int64,),false,false,true,false)
> 
> define double @julia_testfunction2_21229(i64) {
> top:
>   %1 = icmp sgt i64 %0, 0
>   %2 = select i1 %1, i64 %0, i64 0
>   %3 = icmp eq i64 %2, 0
>   br i1 %3, label %L3, label %L
> 
> L:                                      ; preds = %pass, %top
>   %x.0 = phi double [ %11, %pass ], [ 0.000000e+00, %top ]
>   %"#s1.0" = phi i64 [ %10, %pass ], [ 1, %top ]
>   %4 = sitofp i64 %"#s1.0" to double
>   %5 = call double inttoptr (i64 139659687619792 to double (double)*)(double inreg %4)
>   %6 = fcmp uno double %4, 0.000000e+00
>   %7 = fcmp ord double %5, 0.000000e+00
>   %8 = or i1 %7, %6
>   br i1 %8, label %pass, label %fail
> 
> fail:                                   ; preds = %L
>   %9 = load %jl_value_t** @jl_domain_exception, align 8
>   call void @jl_throw_with_superfluous_argument(%jl_value_t* %9, i32 1)
>   unreachable
> 
> pass:                                   ; preds = %L
>   %10 = add i64 %"#s1.0", 1
>   %11 = fadd double %x.0, %5
>   %12 = icmp eq i64 %"#s1.0", %2
>   br i1 %12, label %L3, label %L
> 
> L3:                                     ; preds = %pass, %top
>   %x.1 = phi double [ 0.000000e+00, %top ], [ %11, %pass ]
>   ret double %x.1
> }
> 
> 
> julia> @time testfunction2(100_000)
>   0.005485 seconds (5 allocations: 176 bytes)
> 1.84103630354
> 
> 
> julia> a = Base._dump_function(testfunction2,(Int64,),false,false,true,false)
> 
> define double @julia_testfunction2_21293(i64) {
> top:
>   %1 = icmp sgt i64 %0, 0
>   %2 = select i1 %1, i64 %0, i64 0
>   %3 = icmp eq i64 %2, 0
>   br i1 %3, label %L3, label %L
> 
> L:                                      ; preds = %pass, %top
>   %x.0 = phi double [ %11, %pass ], [ 0.000000e+00, %top ]
>   %"#s1.0" = phi i64 [ %10, %pass ], [ 1, %top ]
>   %4 = sitofp i64 %"#s1.0" to double
>   %5 = call double inttoptr (i64 139659687619792 to double (double)*)(double inreg %4)
>   %6 = fcmp uno double %4, 0.000000e+00
>   %7 = fcmp ord double %5, 0.000000e+00
>   %8 = or i1 %7, %6
>   br i1 %8, label %pass, label %fail
> 
> fail:                                   ; preds = %L
>   %9 = load %jl_value_t** @jl_domain_exception, align 8
>   call void @jl_throw_with_superfluous_argument(%jl_value_t* %9, i32 1)
>   unreachable
> 
> pass:                                   ; preds = %L
>   %10 = add i64 %"#s1.0", 1
>   %11 = fadd double %x.0, %5
>   %12 = icmp eq i64 %"#s1.0", %2
>   br i1 %12, label %L3, label %L
> 
> L3:                                     ; preds = %pass, %top
>   %x.1 = phi double [ 0.000000e+00, %top ], [ %11, %pass ]
>   ret double %x.1
> }
> 
> 
> As can be seen, the first version is julia_testfunction2_21229 and the
> second is julia_testfunction2_21293. The signatures are the same.
> 
> Disclaimer: I'm very unfamiliar with the Julia codegen code (and much of
> the rest).
> Intuition: it might be related to the following snippet from src/codegen.cpp:
> 
> 
> extern "C" DLLEXPORT
> void *jl_get_llvmf(jl_function_t *f, jl_tupletype_t *tt, bool getwrapper)
> {
>     ...
>     if (sf->linfo->functionObject != NULL) {
>         // found in the system image: force a recompile
>         Function *llvmf = (Function*)sf->linfo->functionObject;
>         if (llvmf->isDeclaration()) {
>             sf->linfo->specFunctionObject = NULL;
>             sf->linfo->functionObject = NULL;
>         }
>     }
> 
> 
> Note the "force a recompile" comment.
> 
> On Wednesday, November 25, 2015 at 10:23:04 PM UTC+2, Dan wrote:
> > I've managed to reproduce the timing peculiarities you described, with a
> > different function. In fact, I get the same first-run-slower behavior even
> > with regular functions with -no- keywords. It is almost as if `precompile`
> > is not doing the entire job.
> > Have you tried just testing the basic precompile / 1st run / 2nd run
> > timing of a no-keyword generic function?
> > 
> > On Tuesday, November 24, 2015 at 9:43:41 PM UTC+2, Tim Holy wrote:
> >> I've been experimenting further with SnoopCompile and Immerse/Gadfly,
> >> trying to shave off more of the time-to-first-plot. If I pull out all
> >> the stops (using the "userimg.jl" strategy), I can get it down to about
> >> 2.5 seconds.

[julia-users] Re: Precompilation and functions with keyword arguments

2015-11-25 Thread Dan
Further investigation suggests recompilation is happening. This raises 
questions:
1) why is there recompilation?
2) why are the associated memory allocations not reported?
3) is there some memory leak?

The following run suggests the recompilation:

  Version 0.4.2-pre+15 (2015-11-20 12:12 UTC)
  Commit c4bbb89* (5 days old release-0.4)
  x86_64-linux-gnu


julia> testfunction2(n) = (x = 0.0; for i=1:n x += sin(i) ; end ; x)
testfunction2 (generic function with 1 method)


julia> @time precompile(testfunction2,(Int64,))
  0.004197 seconds (2.84 k allocations: 146.004 KB)


julia> a = Base._dump_function(testfunction2,(Int64,),false,false,true,false)

define double @julia_testfunction2_21229(i64) {
top:
  %1 = icmp sgt i64 %0, 0
  %2 = select i1 %1, i64 %0, i64 0
  %3 = icmp eq i64 %2, 0
  br i1 %3, label %L3, label %L

L:                                      ; preds = %pass, %top
  %x.0 = phi double [ %11, %pass ], [ 0.000000e+00, %top ]
  %"#s1.0" = phi i64 [ %10, %pass ], [ 1, %top ]
  %4 = sitofp i64 %"#s1.0" to double
  %5 = call double inttoptr (i64 139659687619792 to double (double)*)(double inreg %4)
  %6 = fcmp uno double %4, 0.000000e+00
  %7 = fcmp ord double %5, 0.000000e+00
  %8 = or i1 %7, %6
  br i1 %8, label %pass, label %fail

fail:                                   ; preds = %L
  %9 = load %jl_value_t** @jl_domain_exception, align 8
  call void @jl_throw_with_superfluous_argument(%jl_value_t* %9, i32 1)
  unreachable

pass:                                   ; preds = %L
  %10 = add i64 %"#s1.0", 1
  %11 = fadd double %x.0, %5
  %12 = icmp eq i64 %"#s1.0", %2
  br i1 %12, label %L3, label %L

L3:                                     ; preds = %pass, %top
  %x.1 = phi double [ 0.000000e+00, %top ], [ %11, %pass ]
  ret double %x.1
}


julia> @time testfunction2(100_000)
  0.005485 seconds (5 allocations: 176 bytes)
1.84103630354


julia> a = Base._dump_function(testfunction2,(Int64,),false,false,true,false)

define double @julia_testfunction2_21293(i64) {
top:
  %1 = icmp sgt i64 %0, 0
  %2 = select i1 %1, i64 %0, i64 0
  %3 = icmp eq i64 %2, 0
  br i1 %3, label %L3, label %L

L:                                      ; preds = %pass, %top
  %x.0 = phi double [ %11, %pass ], [ 0.000000e+00, %top ]
  %"#s1.0" = phi i64 [ %10, %pass ], [ 1, %top ]
  %4 = sitofp i64 %"#s1.0" to double
  %5 = call double inttoptr (i64 139659687619792 to double (double)*)(double inreg %4)
  %6 = fcmp uno double %4, 0.000000e+00
  %7 = fcmp ord double %5, 0.000000e+00
  %8 = or i1 %7, %6
  br i1 %8, label %pass, label %fail

fail:                                   ; preds = %L
  %9 = load %jl_value_t** @jl_domain_exception, align 8
  call void @jl_throw_with_superfluous_argument(%jl_value_t* %9, i32 1)
  unreachable

pass:                                   ; preds = %L
  %10 = add i64 %"#s1.0", 1
  %11 = fadd double %x.0, %5
  %12 = icmp eq i64 %"#s1.0", %2
  br i1 %12, label %L3, label %L

L3:                                     ; preds = %pass, %top
  %x.1 = phi double [ 0.000000e+00, %top ], [ %11, %pass ]
  ret double %x.1
}


As can be seen, the first version is julia_testfunction2_21229 and the 
second is julia_testfunction2_21293. The signatures are the same.
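
To compare the two dumps without eyeballing the IR, something like the
following sketch should work (`llvm_symbol` is a helper invented here for
illustration; it just regex-matches the first @julia_... name in the string
returned by Base._dump_function):

# Hypothetical helper, not part of Base: extract the specialized
# function's symbol name from the IR dump string.
llvm_symbol(ir::AbstractString) = match(r"@julia_\w+", ir).match

a = Base._dump_function(testfunction2, (Int64,), false, false, true, false)
testfunction2(100_000)                  # run the function once in between
b = Base._dump_function(testfunction2, (Int64,), false, false, true, false)
println(llvm_symbol(a), " vs ", llvm_symbol(b))   # suffixes differ on recompile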

Disclaimer: I'm very unfamiliar with the Julia codegen code (and much of
the rest).
Intuition: it might be related to the following snippet from src/codegen.cpp:

extern "C" DLLEXPORT
void *jl_get_llvmf(jl_function_t *f, jl_tupletype_t *tt, bool getwrapper)
{
    ...
    if (sf->linfo->functionObject != NULL) {
        // found in the system image: force a recompile
        Function *llvmf = (Function*)sf->linfo->functionObject;
        if (llvmf->isDeclaration()) {
            sf->linfo->specFunctionObject = NULL;
            sf->linfo->functionObject = NULL;
        }
    }


Note the "force a recompile" comment.
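
If that reading is right, the recompilation may be an artifact of the
inspection itself: jl_get_llvmf is the routine backing Base._dump_function,
so dumping the IR could be what invalidates the cached specialization. A
quick control experiment (a sketch only, untested) would be to dump twice
with no call in between and see whether the suffix already changes:

# Two back-to-back dumps, with no execution of testfunction2 in between.
# If _dump_function itself forces the recompile, the names should differ.
a = Base._dump_function(testfunction2, (Int64,), false, false, true, false)
b = Base._dump_function(testfunction2, (Int64,), false, false, true, false)
println(match(r"@julia_\w+", a).match)
println(match(r"@julia_\w+", b).match)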


On Wednesday, November 25, 2015 at 10:23:04 PM UTC+2, Dan wrote:
>
> I've managed to reproduce the timing peculiarities you described, with a
> different function. In fact, I get the same first-run-slower behavior even
> with regular functions with -no- keywords. It is almost as if `precompile`
> is not doing the entire job.
> Have you tried just testing the basic precompile / 1st run / 2nd run timing
> of a no-keyword generic function?
>
> On Tuesday, November 24, 2015 at 9:43:41 PM UTC+2, Tim Holy wrote:
>>
>> I've been experimenting further with SnoopCompile and Immerse/Gadfly,
>> trying to shave off more of the time-to-first-plot. If I pull out all
>> the stops (using the "userimg.jl" strategy), I can get it down to about
>> 2.5 seconds. However, doing the same plot a second time is about 0.02s.
>> This indicates that despite my efforts, there's still a lot that's not
>> being cached.

[julia-users] Re: Precompilation and functions with keyword arguments

2015-11-25 Thread Dan
Aha! I tested the same sequence in a non-REPL run, and the 1st and 2nd
runtimes after a `precompile` are the same.
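
For reference, a script along these lines reproduces the check (file name
invented; run it with `julia timing_test.jl`):

# timing_test.jl -- compare the 1st and 2nd runtimes after precompile,
# outside the REPL
testfunction2(n) = (x = 0.0; for i=1:n x += sin(i); end; x)

precompile(testfunction2, (Int64,))
@time testfunction2(100_000)   # 1st run after precompile
@time testfunction2(100_000)   # 2nd run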

On Wednesday, November 25, 2015 at 10:23:04 PM UTC+2, Dan wrote:
>
> I've managed to reproduce the timing peculiarities you described, with a
> different function. In fact, I get the same first-run-slower behavior even
> with regular functions with -no- keywords. It is almost as if `precompile`
> is not doing the entire job.
> Have you tried just testing the basic precompile / 1st run / 2nd run timing
> of a no-keyword generic function?
>
> On Tuesday, November 24, 2015 at 9:43:41 PM UTC+2, Tim Holy wrote:
>>
>> I've been experimenting further with SnoopCompile and Immerse/Gadfly, 
>> trying to 
>> shave off more of the time-to-first-plot. If I pull out all the stops 
>> (using the 
>> "userimg.jl" strategy), I can get it down to about 2.5 seconds. However, 
>> doing 
>> the same plot a second time is about 0.02s. This indicates that despite 
>> my 
>> efforts, there's still a lot that's not being cached. 
>>
>> About 0.3s of that total (not much, but it's all I have data on) can be 
>> observed via snooping as re-compilation of functions that you might 
>> imagine 
>> should have been precompiled. The biggest offenders are all functions 
>> with 
>> keyword arguments. In miniature, I think you can see this here: 
>>
>> julia> function foo(X; thin=true) 
>>svdfact(X) 
>>end 
>> foo (generic function with 1 method) 
>>
>> # Before compiling this, let's make sure the work compiling foo isn't 
>> hidden 
>> # by other compilation needs: 
>> julia> A = rand(3,3) 
>> 3x3 Array{Float64,2}: 
>>  0.57078    0.33557   0.56497
>>  0.0679035  0.944406  0.816098 
>>  0.0922775  0.404697  0.0900726 
>>
>> julia> svdfact(A) 
>> Base.LinAlg.SVD{Float64,Float64,Array{Float64,2}}(3x3 Array{Float64,2}: 
>>  -0.507226   0.861331   0.0288001 
>>  -0.825227  -0.475789  -0.304344 
>>  -0.248438  -0.178138   0.952127 , 
>> [1.4844598265207638,0.5068781079827415,0.19995120630810712],3x3 
>> Array{Float64,2}: 
>>  -0.248222  -0.707397  -0.661797 
>>   0.873751  -0.458478   0.162348 
>>   0.418264   0.537948  -0.731893) 
>>
>> # OK, let's precompile foo 
>> julia> @time precompile(foo, (Matrix{Float64},)) 
>>   0.000469 seconds (541 allocations: 35.650 KB) 
>>
>> julia> @time foo(A) 
>>   0.001174 seconds (18 allocations: 3.063 KB) 
>> Base.LinAlg.SVD{Float64,Float64,Array{Float64,2}}(3x3 Array{Float64,2}: 
>>  -0.507226   0.861331   0.0288001 
>>  -0.825227  -0.475789  -0.304344 
>>  -0.248438  -0.178138   0.952127 , 
>> [1.4844598265207638,0.5068781079827415,0.19995120630810712],3x3 
>> Array{Float64,2}: 
>>  -0.248222  -0.707397  -0.661797 
>>   0.873751  -0.458478   0.162348 
>>   0.418264   0.537948  -0.731893) 
>>
>> # Note the 2nd call is 10x faster, despite precompilation 
>> julia> @time foo(A) 
>>   0.000164 seconds (18 allocations: 3.063 KB) 
>> Base.LinAlg.SVD{Float64,Float64,Array{Float64,2}}(3x3 Array{Float64,2}: 
>>  -0.507226   0.861331   0.0288001 
>>  -0.825227  -0.475789  -0.304344 
>>  -0.248438  -0.178138   0.952127 , 
>> [1.4844598265207638,0.5068781079827415,0.19995120630810712],3x3 
>> Array{Float64,2}: 
>>  -0.248222  -0.707397  -0.661797 
>>   0.873751  -0.458478   0.162348 
>>   0.418264   0.537948  -0.731893) 
>>
>> # Note adding a keyword argument to the call causes a further 10x 
>> slowdown... 
>> julia> @time foo(A; thin=true) 
>>   0.014787 seconds (3.36 k allocations: 166.622 KB) 
>> Base.LinAlg.SVD{Float64,Float64,Array{Float64,2}}(3x3 Array{Float64,2}: 
>>  -0.507226   0.861331   0.0288001 
>>  -0.825227  -0.475789  -0.304344 
>>  -0.248438  -0.178138   0.952127 , 
>> [1.4844598265207638,0.5068781079827415,0.19995120630810712],3x3 
>> Array{Float64,2}: 
>>  -0.248222  -0.707397  -0.661797 
>>   0.873751  -0.458478   0.162348 
>>   0.418264   0.537948  -0.731893) 
>>
>> # ...but only for the first call 
>> julia> @time foo(A; thin=true) 
>>   0.000209 seconds (19 allocations: 3.141 KB) 
>> Base.LinAlg.SVD{Float64,Float64,Array{Float64,2}}(3x3 Array{Float64,2}: 
>>  -0.507226   0.861331   0.0288001 
>>  -0.825227  -0.475789  -0.304344 
>>  -0.248438  -0.178138   0.952127 , 
>> [1.4844598265207638,0.5068781079827415,0.19995120630810712],3x3 
>> Array{Float64,2}: 
>>  -0.248222  -0.707397  -0.661797 
>>   0.873751  -0.458478   0.162348 
>>   0.418264   0.537948  -0.731893) 
>>
>>
>> Obviously the times here don't add up to much, but for a project the size 
>> of 
>> Gadfly it might matter. 
>>
>> I should add that 
>> precompile(foo, (Vector{Any}, Matrix{Float64})) 
>> doesn't seem to do anything useful. 
>>
>> Any ideas? 
>>
>> Best, 
>> --Tim 
>>
>

[julia-users] Re: Precompilation and functions with keyword arguments

2015-11-25 Thread Dan
I've managed to reproduce the timing peculiarities you described, with a
different function. In fact, I get the same first-run-slower behavior even
with regular functions with -no- keywords. It is almost as if `precompile`
is not doing the entire job.
Have you tried just testing the basic precompile / 1st run / 2nd run timing
of a no-keyword generic function?
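
Concretely, the test I have in mind is something like this (`bar` is just a
stand-in for any keyword-free generic function):

bar(v) = sum(v)                      # plain function, no keywords

precompile(bar, (Vector{Float64},))
v = rand(10^6)
@time bar(v)   # 1st run after precompile
@time bar(v)   # 2nd run, for comparison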

On Tuesday, November 24, 2015 at 9:43:41 PM UTC+2, Tim Holy wrote:
>
> I've been experimenting further with SnoopCompile and Immerse/Gadfly, 
> trying to 
> shave off more of the time-to-first-plot. If I pull out all the stops 
> (using the 
> "userimg.jl" strategy), I can get it down to about 2.5 seconds. However, 
> doing 
> the same plot a second time is about 0.02s. This indicates that despite my 
> efforts, there's still a lot that's not being cached. 
>
> About 0.3s of that total (not much, but it's all I have data on) can be 
> observed via snooping as re-compilation of functions that you might 
> imagine 
> should have been precompiled. The biggest offenders are all functions with 
> keyword arguments. In miniature, I think you can see this here: 
>
> julia> function foo(X; thin=true) 
>svdfact(X) 
>end 
> foo (generic function with 1 method) 
>
> # Before compiling this, let's make sure the work compiling foo isn't 
> hidden 
> # by other compilation needs: 
> julia> A = rand(3,3) 
> 3x3 Array{Float64,2}: 
>  0.57078    0.33557   0.56497
>  0.0679035  0.944406  0.816098 
>  0.0922775  0.404697  0.0900726 
>
> julia> svdfact(A) 
> Base.LinAlg.SVD{Float64,Float64,Array{Float64,2}}(3x3 Array{Float64,2}: 
>  -0.507226   0.861331   0.0288001 
>  -0.825227  -0.475789  -0.304344 
>  -0.248438  -0.178138   0.952127 , 
> [1.4844598265207638,0.5068781079827415,0.19995120630810712],3x3 
> Array{Float64,2}: 
>  -0.248222  -0.707397  -0.661797 
>   0.873751  -0.458478   0.162348 
>   0.418264   0.537948  -0.731893) 
>
> # OK, let's precompile foo 
> julia> @time precompile(foo, (Matrix{Float64},)) 
>   0.000469 seconds (541 allocations: 35.650 KB) 
>
> julia> @time foo(A) 
>   0.001174 seconds (18 allocations: 3.063 KB) 
> Base.LinAlg.SVD{Float64,Float64,Array{Float64,2}}(3x3 Array{Float64,2}: 
>  -0.507226   0.861331   0.0288001 
>  -0.825227  -0.475789  -0.304344 
>  -0.248438  -0.178138   0.952127 , 
> [1.4844598265207638,0.5068781079827415,0.19995120630810712],3x3 
> Array{Float64,2}: 
>  -0.248222  -0.707397  -0.661797 
>   0.873751  -0.458478   0.162348 
>   0.418264   0.537948  -0.731893) 
>
> # Note the 2nd call is 10x faster, despite precompilation 
> julia> @time foo(A) 
>   0.000164 seconds (18 allocations: 3.063 KB) 
> Base.LinAlg.SVD{Float64,Float64,Array{Float64,2}}(3x3 Array{Float64,2}: 
>  -0.507226   0.861331   0.0288001 
>  -0.825227  -0.475789  -0.304344 
>  -0.248438  -0.178138   0.952127 , 
> [1.4844598265207638,0.5068781079827415,0.19995120630810712],3x3 
> Array{Float64,2}: 
>  -0.248222  -0.707397  -0.661797 
>   0.873751  -0.458478   0.162348 
>   0.418264   0.537948  -0.731893) 
>
> # Note adding a keyword argument to the call causes a further 10x 
> slowdown... 
> julia> @time foo(A; thin=true) 
>   0.014787 seconds (3.36 k allocations: 166.622 KB) 
> Base.LinAlg.SVD{Float64,Float64,Array{Float64,2}}(3x3 Array{Float64,2}: 
>  -0.507226   0.861331   0.0288001 
>  -0.825227  -0.475789  -0.304344 
>  -0.248438  -0.178138   0.952127 , 
> [1.4844598265207638,0.5068781079827415,0.19995120630810712],3x3 
> Array{Float64,2}: 
>  -0.248222  -0.707397  -0.661797 
>   0.873751  -0.458478   0.162348 
>   0.418264   0.537948  -0.731893) 
>
> # ...but only for the first call 
> julia> @time foo(A; thin=true) 
>   0.000209 seconds (19 allocations: 3.141 KB) 
> Base.LinAlg.SVD{Float64,Float64,Array{Float64,2}}(3x3 Array{Float64,2}: 
>  -0.507226   0.861331   0.0288001 
>  -0.825227  -0.475789  -0.304344 
>  -0.248438  -0.178138   0.952127 , 
> [1.4844598265207638,0.5068781079827415,0.19995120630810712],3x3 
> Array{Float64,2}: 
>  -0.248222  -0.707397  -0.661797 
>   0.873751  -0.458478   0.162348 
>   0.418264   0.537948  -0.731893) 
>
>
> Obviously the times here don't add up to much, but for a project the size 
> of 
> Gadfly it might matter. 
>
> I should add that 
> precompile(foo, (Vector{Any}, Matrix{Float64})) 
> doesn't seem to do anything useful. 
>
> Any ideas? 
>
> Best, 
> --Tim 
>