I think one has to distinguish between the Julia core dependencies and the 
runtime dependencies. The latter (like OpenBLAS) don't tell us much about how 
fast "Julia" is. The libm issue discussed in this thread is of that nature.

On Tuesday, June 17, 2014 10:03:51 PM UTC+2, Tony Kelman wrote:
>
> We're diverging from the topic of the thread, but anyway...
>
> No, MSVC OpenBLAS will probably never happen; you'd have to CMake-ify the 
> whole thing and probably translate all of the assembly to Intel syntax, and 
> skip the Fortran or use Intel's compiler. I don't think they have the 
> resources to do that.
>
> There's a C99-only optimized BLAS implementation under development by the 
> FLAME group at the University of Texas, https://github.com/flame/blis, 
> that does aim to eventually support MSVC. It's nowhere near as mature as 
> OpenBLAS in terms of automatically detecting architecture, cache sizes, 
> etc., but their papers look very promising. They could use more people 
> poking at it and submitting patches to get it to the usability level we'd 
> need.
>
> The rest of the dependencies vary significantly in how painful they would 
> be to build with MSVC. GMP in particular was forked into a new project 
> called MPIR, with MSVC compatibility being one of the major reasons.
>
>
>
> On Tuesday, June 17, 2014 12:47:49 PM UTC-7, David Anthoff wrote:
>>
>> I was thinking more that this might make a difference for some of the 
>> dependencies, like OpenBLAS? But I’m not even sure that can be compiled at 
>> all using MS compilers…
>>
>>  
>>
>> *From:* julia...@googlegroups.com [mailto:julia...@googlegroups.com] *On 
>> Behalf Of *Tobias Knopp
>> *Sent:* Tuesday, June 17, 2014 12:42 PM
>> *To:* julia...@googlegroups.com
>> *Subject:* Re: [julia-users] Benchmarking study: C++ < Fortran < Numba < 
>> Julia < Java < Matlab < the rest
>>
>>  
>>
>> There are some remaining issues, but compilation with MSVC is almost 
>> possible. I did some initial work, and Tony Kelman made lots of progress in 
>> https://github.com/JuliaLang/julia/pull/6230. But there have not been 
>> any speed comparisons as far as I know. Note that Julia uses JIT 
>> compilation, so I would not expect the source compiler to have a huge 
>> impact.
>>
>>  
>>
>>
>> On Tuesday, June 17, 2014 9:25:50 PM UTC+2, David Anthoff wrote:
>>
>> Another interesting result from the paper is how much faster Visual C++ 
>> 2010-generated code is than gcc's on Windows. For their example, the gcc 
>> runtime is 2.29 times the runtime of the MS-compiled version. The difference 
>> might be even larger with Visual C++ 2013, because that is when MS added an 
>> auto-vectorizer that is on by default.
>>
>>  
>>
>> I vaguely remember a discussion about compiling julia itself with the MS 
>> compiler on Windows, is that working and is that making a performance 
>> difference?
>>
>>  
>>
>> *From:* julia...@googlegroups.com [mailto:julia...@googlegroups.com] *On 
>> Behalf Of *Peter Simon
>> *Sent:* Tuesday, June 17, 2014 12:08 PM
>> *To:* julia...@googlegroups.com
>> *Subject:* Re: [julia-users] Benchmarking study: C++ < Fortran < Numba < 
>> Julia < Java < Matlab < the rest
>>
>>  
>>
>> Sorry, Florian and David, for not seeing that you were way ahead of me.
>>
>>  
>>
>> On the subject of the log function: I tried implementing mylog() as 
>> defined by Andreas on Julia running on CentOS, and the result was a 
>> significant slowdown! (Yes, I defined the mylog function outside of main, 
>> at the module level.) Not sure if this is due to variation in the quality 
>> of the libm function on various systems or what. If so, then it makes 
>> sense that Julia wants a uniformly accurate and fast implementation via 
>> openlibm. But for the fastest transcendental function performance, I assume 
>> that one must use the micro-coded versions built into the processor's 
>> FPU--is that what the fast libm implementations do? In that case, how 
>> could one hope to compete when using a C-coded version?
>>
>>  
>>
>> --Peter
>>
>>
>>
>> On Tuesday, June 17, 2014 10:57:47 AM UTC-7, David Anthoff wrote:
>>
>> I submitted three pull requests to the original repo that get rid of 
>> three different array allocations in loops and that make things a fair bit 
>> faster altogether:
>>
>>  
>>
>>
>> https://github.com/jesusfv/Comparison-Programming-Languages-Economics/pulls
>>
>>  
>>
>> I think it would also make sense to run these benchmarks on Julia 0.3.0 
>> instead of 0.2.1, given that there have been a fair number of performance 
>> improvements.
>>
>>  
>>
>> *From:* julia...@googlegroups.com [mailto:julia...@googlegroups.com] *On 
>> Behalf Of *Florian Oswald
>> *Sent:* Tuesday, June 17, 2014 10:50 AM
>> *To:* julia...@googlegroups.com
>> *Subject:* Re: [julia-users] Benchmarking study: C++ < Fortran < Numba < 
>> Julia < Java < Matlab < the rest
>>
>>  
>>
>> Thanks, Peter. I made that devectorizing change after Dahua suggested it. 
>> It made a massive difference!
>>
>> On Tuesday, 17 June 2014, Peter Simon <psimo...@gmail.com> wrote:
>>
>> You're right.  Replacing the NumericExtensions function calls with a 
>> small loop
>>
>>  
>>
>>         maxDifference  = 0.0
>>         for k = 1:length(mValueFunction)
>>             maxDifference = max(maxDifference, abs(mValueFunction[k]- 
>> mValueFunctionNew[k]))
>>         end
>>
>>
>> makes no significant difference in execution time or memory allocation 
>> and eliminates the dependency.
>>
>>  
>>
>> --Peter
>>
>>
>>
>> On Tuesday, June 17, 2014 10:05:03 AM UTC-7, Andreas Noack Jensen wrote:
>>
>> ...but the Numba version doesn't use tricks like that. 
>>
>>  
>>
>> The uniform metric can also be calculated with a small loop. I think that 
>> requiring dependencies is against the purpose of the exercise.
>>
>>  
>>
>> 2014-06-17 18:56 GMT+02:00 Peter Simon <psimo...@gmail.com>:
>>
>> As pointed out by Dahua, there is a lot of unnecessary memory allocation. 
>>  This can be reduced significantly by replacing the lines
>>
>>  
>>
>>         maxDifference  = maximum(abs(mValueFunctionNew-mValueFunction))
>>         mValueFunction    = mValueFunctionNew
>>         mValueFunctionNew = zeros(nGridCapital,nGridProductivity)
>>
>>  
>>
>>  
>>
>> with
>>
>>  
>>
>>         maxDifference  = maximum(abs!(subtract!(mValueFunction, 
>> mValueFunctionNew)))
>>         (mValueFunction, mValueFunctionNew) = (mValueFunctionNew, 
>> mValueFunction)
>>         fill!(mValueFunctionNew, 0.0)
>>
>>  
>>
>> abs! and subtract! require adding the line
>>
>>  
>>
>> using NumericExtensions
>>
>>  
>>
>> prior to the function line.  I think the OP used Julia 0.2; I don't 
>> believe that NumericExtensions will work with that old version.  When I 
>> combine these changes with adding 
>>
>>  
>>
>> @inbounds begin
>> ...
>> end
>>
>>  
>>
>> block around the "while" loop, I get about a 25% reduction in execution 
>> time, and a reduction of memory allocation from roughly 700 MByte to 180 
>> MByte.
>>
>>  
>>
>> --Peter
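
A dependency-free variant of Peter's change (a sketch reusing the array names 
from the repo's Julia code; not benchmarked against the original script) 
replaces the NumericExtensions calls with a plain loop, which also addresses 
Andreas's point about avoiding external packages:

```julia
# Compute the uniform metric with a loop instead of allocating
# abs(mValueFunctionNew - mValueFunction) as a temporary matrix.
maxDifference = 0.0
@inbounds for k = 1:length(mValueFunction)
    d = abs(mValueFunction[k] - mValueFunctionNew[k])
    if d > maxDifference
        maxDifference = d
    end
end

# Swap the two buffers and clear the scratch one in place,
# instead of allocating a fresh zeros(...) matrix every iteration.
(mValueFunction, mValueFunctionNew) = (mValueFunctionNew, mValueFunction)
fill!(mValueFunctionNew, 0.0)
```

This keeps the exercise free of dependencies while getting the same reduction 
in allocation.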
>>
>>
>>
>> On Tuesday, June 17, 2014 9:32:34 AM UTC-7, John Myles White wrote:
>>
>> Sounds like we need to rerun these benchmarks after the new GC branch 
>> gets updated.
>>
>>  
>>
>>  -- John
>>
>>  
>>
>> On Jun 17, 2014, at 9:31 AM, Stefan Karpinski <ste...@karpinski.org> 
>> wrote:
>>
>>  
>>
>> That definitely smells like a GC issue. Python doesn't have this 
>> particular problem since it uses reference counting.
>>
>>  
>>
>> On Tue, Jun 17, 2014 at 12:21 PM, Cristóvão Duarte Sousa <
>> cri...@gmail.com> wrote:
>>
>> I've just done measurements of the algorithm's inner-loop times on my 
>> machine by changing the code as shown in this commit 
>> <https://github.com/cdsousa/Comparison-Programming-Languages-Economics/commit/4f6198ad24adc146c268a1c2eeac14d5ae0f300c>
>> .
>>
>>  
>>
>> I've found out something... see for yourself:
>>
>>  
>>
>> using Winston
>> numba_times = readdlm("numba_times.dat")[10:end];
>> plot(numba_times)
>>
>>
>> <https://lh6.googleusercontent.com/-m1c6SAbijVM/U6BpmBmFbqI/AAAAAAAADdc/wtxnKuGFDy0/s1600/numba_times.png>
>>
>> julia_times = readdlm("julia_times.dat")[10:end];
>> plot(julia_times)
>>
>>  
>>
>>
>> <https://lh4.googleusercontent.com/-7iprMnjyZQY/U6Bp8gHVNJI/AAAAAAAADdk/yUgu8RyZ-Kw/s1600/julia_times.png>
>>
>> println((median(numba_times), mean(numba_times), var(numba_times)))
>>
>> (0.0028225183486938477,0.0028575707378805993,2.4830103817464292e-8)
>>
>>  
>>
>> println((median(julia_times), mean(julia_times), var(julia_times)))
>>
>> (0.0028240440000000004,0.0034863882123824454,1.7058255003790299e-6)
>>
>>  
>>
>> So, while the inner-loop times have more or less the same median in both 
>> the Julia and Numba tests, the mean and variance are higher for Julia.
>>
>>  
>>
>> Can that be due to the garbage collector kicking in?
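
One way to test that hypothesis (a sketch; `gc_disable`/`gc_enable` and 
`tic`/`toq` are the Julia 0.2/0.3-era names, replaced by `GC.enable` and 
`@elapsed`-style timing in later versions) is to repeat the measurement with 
the collector paused and compare the variance against the normal run:

```julia
# Time the inner loop with the GC paused; if the extra variance
# disappears, the collector was the culprit.
gc_disable()                  # old API; newer Julia uses GC.enable(false)
times = Float64[]
for i = 1:1000
    tic()
    # ... one inner-loop iteration of the algorithm goes here ...
    push!(times, toq())
end
gc_enable()
println((median(times), mean(times), var(times)))
```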
>>
>>
>>
>> On Monday, June 16, 2014 4:52:07 PM UTC+1, Florian Oswald wrote:
>>
>> Dear all,
>>
>>  
>>
>> I thought you might find this paper interesting: 
>> http://economics.sas.upenn.edu/~jesusfv/comparison_languages.pdf
>>
>>  
>>
>> It takes a standard model from macroeconomics and computes its solution 
>> with an identical algorithm in several languages. Julia is roughly 2.6 
>> times slower than the best C++ executable. I was a bit puzzled by the 
>> result, since in the benchmarks on http://julialang.org/, the slowest test 
>> is 1.66 times C. I realize that those benchmarks can't cover all possible 
>> situations. That said, I couldn't really find anything unusual in the Julia 
>> code; I did some profiling and removed type instabilities, but that's as 
>> fast as I got it. That's not to say that I'm disappointed; I still think 
>> this is great. Did I miss something obvious here, or is there something 
>> specific to this algorithm? 
>>
>>  
>>
>> The codes are on github at 
>>
>>  
>>
>> https://github.com/jesusfv/Comparison-Programming-Languages-Economics
>>
>>  
>>
>>  
>>
>>  
>>
>>  
>>
>>
>>
>>  
>>
>> -- 
>> Kind regards,
>>
>> Andreas Noack Jensen
>>
>>
