[julia-users] Re: Matlab bench in Julia

2014-09-18 Thread Jason Riedy
And Elliot Saba writes:
 The first thing you should do is run your code once to warm up the
 JIT, and then run it again to measure the actual run time, rather
 than compile time + run time.

To be fair, he seems to be timing MATLAB in the same way, so he's
comparing systems appropriately at that level.

It's just the tuned BLAS+LAPACK and FFTW vs. the default ones.  This
is one reason why MATLAB bundles so much.  (Another reason being
the differences in numerical results causing support calls.  It took
a long time before MATLAB gave in to per-platform-tuned libraries.)
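The warm-up pattern Elliot describes might look like this (a minimal sketch, not from the thread; written for current Julia, where `@elapsed` has replaced the `tic()`/`toc()` used elsewhere in this discussion):

```julia
using LinearAlgebra

# Run the workload once so the JIT compiles it, then time the steady state.
function timed_lu(A, nruns)
    lu(A)                          # warm-up run: pays compile time once
    t = @elapsed for _ in 1:nruns
        lu(A)
    end
    return t / nruns               # average seconds per run, compilation excluded
end

A = rand(500, 500)
println(timed_lu(A, 10))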



Re: [julia-users] Re: Matlab bench in Julia

2014-09-18 Thread Andreas Noack
In addition, our lu computes a partially pivoted LU factorization and returns the L and
U matrices and the vector of permutations. To get something comparable in
MATLAB you'll have to write

[L,U,p] = lu(A,'vector')

On my old Mac where Julia is compiled with OpenBLAS the timings are

MATLAB:
>> tic();for i = 1:10
[L,U,p] = qr(A, 'vector');
end;toc()/10

ans =

3.4801

Julia:
julia> tic(); for i = 1:10
           qr(A);
       end;toc()/10
elapsed time: 14.758491472 seconds
1.4758491472
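The factorization Andreas describes can be checked directly. A small sketch (field access as in current Julia; on the 0.3-era Julia of this thread, lu(A) returned a tuple (L, U, p) instead):

```julia
using LinearAlgebra

A = rand(4, 4)
F = lu(A)                  # partially pivoted LU, like MATLAB's lu(A,'vector')
L, U, p = F.L, F.U, F.p    # unit lower triangular, upper triangular, permutation vector
@assert L * U ≈ A[p, :]    # rows of A permuted by p reproduce L*U
```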

Kind regards

Andreas Noack





Re: [julia-users] Re: Matlab bench in Julia

2014-09-18 Thread Stefan Karpinski
I'm slightly confused – does that mean Julia is 2.4x faster in this case?






Re: [julia-users] Re: Matlab bench in Julia

2014-09-18 Thread Andreas Noack
Yes. It appears so on my Mac. I just redid the timings with the same result.

Kind regards

Andreas Noack







Re: [julia-users] Re: Matlab bench in Julia

2014-09-18 Thread Stefan Karpinski
Nice :-)








Re: [julia-users] Re: Matlab bench in Julia

2014-09-18 Thread Andreas Noack
I knew something was not right. I typed qr, not lu. Hence, in that case
MATLAB did pivoting and Julia didn't. Sorry for that.

Here are the right timings for lu, which are as expected. MKL is slightly
faster than OpenBLAS.

MATLAB:
>> tic();for i = 1:10
[L,U,p] = lu(A, 'vector');
end;toc()/10

ans =

0.2314


Julia:
julia> tic(); for i = 1:10
           lu(A);
       end;toc()/10
elapsed time: 3.147632455 seconds

0.3147632455

Kind regards

Andreas Noack









[julia-users] Re: Matlab bench in Julia

2014-09-18 Thread Stephan Buchert
Thanks for the tips. I have now compiled Julia on my laptop, and the 
results are:

julia> versioninfo()
Julia Version 0.3.0+6
Commit 7681878* (2014-08-20 20:43 UTC)
Platform Info:
  System: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

julia> include("code/julia/bench.jl")
LU decomposition, elapsed time: 0.123349203 seconds
FFT , elapsed time: 0.20440579 seconds

Matlab R2014a, with [L,U,P] = lu(A); instead of just lu(A);
LU decomposition, elapsed time: 0.0586 seconds 
FFT  elapsed time: 0.0809 seconds

So a great improvement, but Julia still seems 2-3 times slower than Matlab 
(or rather, than the respective underlying linear algebra libraries) for 
these two very limited benchmarks. Perhaps Matlab found a way to speed up 
their linear algebra recently?

The Fedora-precompiled OpenBLAS was already installed at the first test 
(and was presumably used by Julia), but, as Andreas has also pointed out, it 
seems to be significantly slower than the OpenBLAS library compiled along with 
the Julia installation.
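The bench.jl file itself isn't shown in the thread; a hypothetical minimal version matching the two printed lines might look like this (the matrix size and repetition count are guesses, and on current Julia FFT lives in the separate FFTW package):

```julia
using LinearAlgebra

A = rand(1000, 1000)
lu(A)                                        # warm up the JIT and the library
t_lu = @elapsed for _ in 1:10; lu(A); end
println("LU decomposition, elapsed time: ", t_lu / 10, " seconds")

# The FFT timing would follow the same pattern, e.g. with the FFTW package:
#   using FFTW
#   x = rand(2^20)
#   fft(x)                                   # warm up
#   t_fft = @elapsed for _ in 1:10; fft(x); end
#   println("FFT, elapsed time: ", t_fft / 10, " seconds")
```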



[julia-users] Re: Matlab bench in Julia

2014-09-18 Thread Peter Simon
I have found that I get better performance from some OpenBLAS routines by 
setting the number of BLAS threads to the number of physical CPU cores 
(half the number returned by CPU_CORES when hyperthreading is enabled):

 Base.blas_set_num_threads(div(CPU_CORES,2))
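On current Julia the same tip reads roughly as follows (Sys.CPU_THREADS has replaced CPU_CORES, and BLAS.set_num_threads has replaced Base.blas_set_num_threads; halving assumes hyperthreading doubles the logical count):

```julia
using LinearAlgebra

# Use one BLAS thread per physical core rather than per logical core.
nphysical = max(1, div(Sys.CPU_THREADS, 2))
BLAS.set_num_threads(nphysical)
println(nphysical)
```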

--Peter





Re: [julia-users] Re: Matlab bench in Julia

2014-09-18 Thread Andreas Noack
As Douglas Bates wrote, these benchmarks mainly measure the speed of the
underlying libraries. MATLAB uses MKL from Intel, which is often the fastest
library. However, the speed of OpenBLAS can be very different on different
architectures, and sometimes it is faster than MKL. I just tried the
benchmarks on a Linux server where that is the case.

Milan, unfortunately I don't remember which distribution it was. I think it
was a couple of months ago, but I'm not sure.
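Which BLAS a given Julia build uses can be checked from the REPL; a small sketch (BLAS.get_config is the current API; older Julia versions exposed BLAS.vendor() instead):

```julia
using LinearAlgebra

# Print the BLAS libraries the running Julia is linked against,
# e.g. libopenblas, or an MKL library when MKL.jl is loaded.
cfg = BLAS.get_config()
for lib in cfg.loaded_libs
    println(lib.libname)
end
```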

Kind regards

Andreas Noack
