Re: [julia-users] Re: Precompilation and functions with keyword arguments
Nice detective work, Dan! Quite helpful.

Best,
--Tim

On Wednesday, November 25, 2015 02:08:47 PM Dan wrote:
> Further investigation suggests recompilation is happening. [...]
[julia-users] Re: Precompilation and functions with keyword arguments
Further investigation suggests recompilation is happening. This raises
questions:
1) Why is there recompilation?
2) Why are the associated memory allocations not reported?
3) Is there some memory leak?

The following run suggests the recompilation (Version 0.4.2-pre+15
(2015-11-20 12:12 UTC), commit c4bbb89* (5 days old release-0.4),
x86_64-linux-gnu):

julia> testfunction2(n) = (x = 0.0; for i=1:n x += sin(i); end; x)
testfunction2 (generic function with 1 method)

julia> @time precompile(testfunction2, (Int64,))
  0.004197 seconds (2.84 k allocations: 146.004 KB)

julia> a = Base._dump_function(testfunction2, (Int64,), false, false, true, false)
"\ndefine double @julia_testfunction2_21229(i64) {\ntop:\n  %1 = icmp sgt i64 %0, 0\n  %2 = select i1 %1, i64 %0, i64 0\n  %3 = icmp eq i64 %2, 0\n  br i1 %3, label %L3, label %L\n\nL:      ; preds = %pass, %top\n  %x.0 = phi double [ %11, %pass ], [ 0.00e+00, %top ]\n  %\"#s1.0\" = phi i64 [ %10, %pass ], [ 1, %top ]\n  %4 = sitofp i64 %\"#s1.0\" to double\n  %5 = call double inttoptr (i64 139659687619792 to double (double)*)(double inreg %4)\n  %6 = fcmp uno double %4, 0.00e+00\n  %7 = fcmp ord double %5, 0.00e+00\n  %8 = or i1 %7, %6\n  br i1 %8, label %pass, label %fail\n\nfail:      ; preds = %L\n  %9 = load %jl_value_t** @jl_domain_exception, align 8\n  call void @jl_throw_with_superfluous_argument(%jl_value_t* %9, i32 1)\n  unreachable\n\npass:      ; preds = %L\n  %10 = add i64 %\"#s1.0\", 1\n  %11 = fadd double %x.0, %5\n  %12 = icmp eq i64 %\"#s1.0\", %2\n  br i1 %12, label %L3, label %L\n\nL3:      ; preds = %pass, %top\n  %x.1 = phi double [ 0.00e+00, %top ], [ %11, %pass ]\n  ret double %x.1\n}\n"

julia> @time testfunction2(100_000)
  0.005485 seconds (5 allocations: 176 bytes)
1.84103630354

julia> a = Base._dump_function(testfunction2, (Int64,), false, false, true, false)
"\ndefine double @julia_testfunction2_21293(i64) {\ntop:\n  %1 = icmp sgt i64 %0, 0\n  %2 = select i1 %1, i64 %0, i64 0\n  %3 = icmp eq i64 %2, 0\n  br i1 %3, label %L3, label %L\n\nL:      ; preds = %pass, %top\n  %x.0 = phi double [ %11, %pass ], [ 0.00e+00, %top ]\n  %\"#s1.0\" = phi i64 [ %10, %pass ], [ 1, %top ]\n  %4 = sitofp i64 %\"#s1.0\" to double\n  %5 = call double inttoptr (i64 139659687619792 to double (double)*)(double inreg %4)\n  %6 = fcmp uno double %4, 0.00e+00\n  %7 = fcmp ord double %5, 0.00e+00\n  %8 = or i1 %7, %6\n  br i1 %8, label %pass, label %fail\n\nfail:      ; preds = %L\n  %9 = load %jl_value_t** @jl_domain_exception, align 8\n  call void @jl_throw_with_superfluous_argument(%jl_value_t* %9, i32 1)\n  unreachable\n\npass:      ; preds = %L\n  %10 = add i64 %\"#s1.0\", 1\n  %11 = fadd double %x.0, %5\n  %12 = icmp eq i64 %\"#s1.0\", %2\n  br i1 %12, label %L3, label %L\n\nL3:      ; preds = %pass, %top\n  %x.1 = phi double [ 0.00e+00, %top ], [ %11, %pass ]\n  ret double %x.1\n}\n"

As can be seen, the first version is julia_testfunction2_21229 and the
second is julia_testfunction2_21293, even though the signatures are the
same; the method was compiled twice.

Disclaimer: I'm very unfamiliar with the Julia codegen code (and much of
the rest).

Intuition: it might be related to the following snippet from
src/codegen.cpp:

extern "C" DLLEXPORT
void *jl_get_llvmf(jl_function_t *f, jl_tupletype_t *tt, bool getwrapper)
{
    :
    :
    :
    if (sf->linfo->functionObject != NULL) {
        // found in the system image: force a recompile
        Function *llvmf = (Function*)sf->linfo->functionObject;
        if (llvmf->isDeclaration()) {
            sf->linfo->specFunctionObject = NULL;
            sf->linfo->functionObject = NULL;
        }
    }

Note the "force a recompile" bit.
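For anyone who wants to reproduce the check, here is the sequence as a
standalone snippet (a sketch only: Base._dump_function is internal, so its
positional flags may differ between builds, and per the jl_get_llvmf
snippet above it may itself force the recompile it is observing):

    # Sketch of the recompilation check above (Julia 0.4 era).
    testfunction2(n) = (x = 0.0; for i=1:n x += sin(i); end; x)

    precompile(testfunction2, (Int64,))
    ir1 = Base._dump_function(testfunction2, (Int64,), false, false, true, false)

    testfunction2(100_000)   # first real call
    ir2 = Base._dump_function(testfunction2, (Int64,), false, false, true, false)

    # The specialized name looks like @julia_testfunction2_NNNNN; if the
    # two names differ, the method was compiled more than once.
    specname(ir) = match(r"@julia_testfunction2_\d+", ir).match
    println(specname(ir1), " vs ", specname(ir2))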
On Wednesday, November 25, 2015 at 10:23:04 PM UTC+2, Dan wrote:
> I've managed to reproduce the timing peculiarities you described, with a
> different function. [...]
[julia-users] Re: Precompilation and functions with keyword arguments
Aha! Tested the same sequence in a non-REPL run, and the 1st and 2nd run
times after a `precompile` are the same. A standalone version of that
check is sketched below.
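A minimal non-REPL version of the sequence (a sketch; the file name
timing_test.jl is just for illustration, run it with `julia timing_test.jl`):

    # timing_test.jl -- the precompile / 1st run / 2nd run sequence
    # outside the REPL. In this setting the two post-precompile timings
    # should match.
    testfunction2(n) = (x = 0.0; for i=1:n x += sin(i); end; x)

    @time precompile(testfunction2, (Int64,))
    @time testfunction2(100_000)   # 1st run
    @time testfunction2(100_000)   # 2nd run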
On Wednesday, November 25, 2015 at 10:23:04 PM UTC+2, Dan wrote:
> I've managed to reproduce the timing peculiarities you described, with a
> different function. [...]
[julia-users] Re: Precompilation and functions with keyword arguments
I've managed to reproduce the timing peculiarities you described, with a
different function. In fact, I get the same first-run-slower behavior even
with regular functions with -no- keywords. It is almost as if `precompile`
is not doing the entire job.
Have you tried just testing the basic precompile / 1st run / 2nd run
timing of a no-keyword generic function?

On Tuesday, November 24, 2015 at 9:43:41 PM UTC+2, Tim Holy wrote:
>
> I've been experimenting further with SnoopCompile and Immerse/Gadfly,
> trying to shave off more of the time-to-first-plot. If I pull out all
> the stops (using the "userimg.jl" strategy), I can get it down to about
> 2.5 seconds. However, doing the same plot a second time is about 0.02s.
> This indicates that despite my efforts, there's still a lot that's not
> being cached.
>
> About 0.3s of that total (not much, but it's all I have data on) can be
> observed via snooping as re-compilation of functions that you might
> imagine should have been precompiled. The biggest offenders are all
> functions with keyword arguments. In miniature, I think you can see
> this here:
>
> julia> function foo(X; thin=true)
>            svdfact(X)
>        end
> foo (generic function with 1 method)
>
> # Before compiling this, let's make sure the work compiling foo isn't
> # hidden by other compilation needs:
> julia> A = rand(3,3)
> 3x3 Array{Float64,2}:
>  0.57078    0.33557   0.56497
>  0.0679035  0.944406  0.816098
>  0.0922775  0.404697  0.0900726
>
> julia> svdfact(A)
> Base.LinAlg.SVD{Float64,Float64,Array{Float64,2}}(3x3 Array{Float64,2}:
>  -0.507226   0.861331   0.0288001
>  -0.825227  -0.475789  -0.304344
>  -0.248438  -0.178138   0.952127,
> [1.4844598265207638,0.5068781079827415,0.19995120630810712],
> 3x3 Array{Float64,2}:
>  -0.248222  -0.707397  -0.661797
>   0.873751  -0.458478   0.162348
>   0.418264   0.537948  -0.731893)
>
> # OK, let's precompile foo
> julia> @time precompile(foo, (Matrix{Float64},))
>   0.000469 seconds (541 allocations: 35.650 KB)
>
> julia> @time foo(A)
>   0.001174 seconds (18 allocations: 3.063 KB)
> Base.LinAlg.SVD{Float64,Float64,Array{Float64,2}}(3x3 Array{Float64,2}:
>  -0.507226   0.861331   0.0288001
>  -0.825227  -0.475789  -0.304344
>  -0.248438  -0.178138   0.952127,
> [1.4844598265207638,0.5068781079827415,0.19995120630810712],
> 3x3 Array{Float64,2}:
>  -0.248222  -0.707397  -0.661797
>   0.873751  -0.458478   0.162348
>   0.418264   0.537948  -0.731893)
>
> # Note the 2nd call is 10x faster, despite precompilation
> julia> @time foo(A)
>   0.000164 seconds (18 allocations: 3.063 KB)
> Base.LinAlg.SVD{Float64,Float64,Array{Float64,2}}(3x3 Array{Float64,2}:
>  -0.507226   0.861331   0.0288001
>  -0.825227  -0.475789  -0.304344
>  -0.248438  -0.178138   0.952127,
> [1.4844598265207638,0.5068781079827415,0.19995120630810712],
> 3x3 Array{Float64,2}:
>  -0.248222  -0.707397  -0.661797
>   0.873751  -0.458478   0.162348
>   0.418264   0.537948  -0.731893)
>
> # Note adding a keyword argument to the call causes a further 10x
> # slowdown...
> julia> @time foo(A; thin=true)
>   0.014787 seconds (3.36 k allocations: 166.622 KB)
> Base.LinAlg.SVD{Float64,Float64,Array{Float64,2}}(3x3 Array{Float64,2}:
>  -0.507226   0.861331   0.0288001
>  -0.825227  -0.475789  -0.304344
>  -0.248438  -0.178138   0.952127,
> [1.4844598265207638,0.5068781079827415,0.19995120630810712],
> 3x3 Array{Float64,2}:
>  -0.248222  -0.707397  -0.661797
>   0.873751  -0.458478   0.162348
>   0.418264   0.537948  -0.731893)
>
> # ...but only for the first call
> julia> @time foo(A; thin=true)
>   0.000209 seconds (19 allocations: 3.141 KB)
> Base.LinAlg.SVD{Float64,Float64,Array{Float64,2}}(3x3 Array{Float64,2}:
>  -0.507226   0.861331   0.0288001
>  -0.825227  -0.475789  -0.304344
>  -0.248438  -0.178138   0.952127,
> [1.4844598265207638,0.5068781079827415,0.19995120630810712],
> 3x3 Array{Float64,2}:
>  -0.248222  -0.707397  -0.661797
>   0.873751  -0.458478   0.162348
>   0.418264   0.537948  -0.731893)
>
> Obviously the times here don't add up to much, but for a project the
> size of Gadfly it might matter.
>
> I should add that
>     precompile(foo, (Vector{Any}, Matrix{Float64}))
> doesn't seem to do anything useful.
>
> Any ideas?
>
> Best,
> --Tim
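Condensed, the experiment quoted above amounts to the following standalone
script (a sketch, assuming a 0.4-era Julia where `svdfact` is available
from Base; on modern Julia the equivalent is `svd` from the LinearAlgebra
stdlib):

    # Keyword-argument first-call overhead, condensed from Tim's message.
    function foo(X; thin=true)   # `thin` is unused, as in the original
        svdfact(X)
    end

    A = rand(3,3)
    svdfact(A)                   # compile the callee first
    @time precompile(foo, (Matrix{Float64},))
    @time foo(A)              # 1st positional call: slower despite precompile
    @time foo(A)              # 2nd positional call: ~10x faster
    @time foo(A; thin=true)   # 1st keyword call: slow again
    @time foo(A; thin=true)   # subsequent keyword calls: fast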