Hi

This is my first experiment with Julia, and I wanted to share some results.
I have ported the STREAM benchmark (http://www.cs.virginia.edu/stream/) to
Julia. The code is available on GitHub
(https://github.com/kapiliitr/JuliaBenchmarks/blob/master/streamp.jl).
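For reference, here is a C sketch of the four kernels being timed. It follows the definitions used by the reference stream.c (Copy: c = a; Scale: b = q*c; Add: c = a + b; Triad: a = b + q*c). It is not the actual streamp.jl port or the stream.c source, and the function names are hypothetical. The `#pragma omp parallel for` lines reflect how the OpenMP build of STREAM parallelizes each loop. If you compile without -fopenmp, the pragmas are ignored and each loop runs on one thread.

```c
#include <stddef.h>

/* Hypothetical names; q is the STREAM scalar. Arrays hold n doubles. */

/* Copy: c = a (reads 1 array, writes 1) */
void stream_copy(double *c, const double *a, long n) {
    #pragma omp parallel for
    for (long j = 0; j < n; j++)
        c[j] = a[j];
}

/* Scale: b = q * c (reads 1 array, writes 1) */
void stream_scale(double *b, const double *c, double q, long n) {
    #pragma omp parallel for
    for (long j = 0; j < n; j++)
        b[j] = q * c[j];
}

/* Add: c = a + b (reads 2 arrays, writes 1) */
void stream_add(double *c, const double *a, const double *b, long n) {
    #pragma omp parallel for
    for (long j = 0; j < n; j++)
        c[j] = a[j] + b[j];
}

/* Triad: a = b + q * c (reads 2 arrays, writes 1) */
void stream_triad(double *a, const double *b, const double *c, double q, long n) {
    #pragma omp parallel for
    for (long j = 0; j < n; j++)
        a[j] = b[j] + q * c[j];
}
```

Scale and Triad are the two kernels that slow down in the 4-process Julia run below. They are also the only two kernels that use the scalar q.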

I am getting the following performance results in Julia (single process):

Array size = 5000000 (elements), Offset = 0 (elements)
Memory per array = 38.14697265625 MiB (= 0.03725290298461914 GiB)
Total memory required = 114.44091796875 MiB (= 0.11175870895385742 GiB)
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:              43.0     1.885108     1.861376     1.908840
Scale:             37.1     2.166505     2.155083     2.177926
Add:               48.2     2.532873     2.487158     2.578587
Triad:             43.1     2.787225     2.784426     2.790023

I am getting the following performance results in C (single thread):

Array size = 5000000 (elements), Offset = 0 (elements)
Memory per array = 38.1 MiB (= 0.0 GiB).
Total memory required = 114.4 MiB (= 0.1 GiB).
Each kernel will be executed 3 times.
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:            8553.3     0.009360     0.009353     0.009366
Scale:           8248.4     0.009712     0.009699     0.009726
Add:             9490.6     0.012987     0.012644     0.013329
Triad:           9032.0     0.013540     0.013286     0.013793
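The Best Rate column can be reproduced from the Min time column. STREAM counts 2 arrays of 8-byte doubles per element for Copy and Scale, and 3 for Add and Triad. It reports MB/s with 1 MB = 10^6 bytes. Below is a small sketch of that arithmetic, assuming N = 5000000 and the byte counting just described. The helper name is hypothetical.

```c
#include <assert.h>

enum { COPY, SCALE, ADD, TRIAD };

/* Arrays of doubles touched per element: Copy/Scale read 1 + write 1,
   Add/Triad read 2 + write 1. */
static const int arrays_touched[4] = {2, 2, 3, 3};

/* Best rate in MB/s (1 MB = 1e6 bytes), derived from the minimum time. */
double stream_best_rate_mbs(int kernel, long n, double min_time_s) {
    assert(kernel >= COPY && kernel <= TRIAD);
    assert(min_time_s > 0.0);
    double bytes = (double)arrays_touched[kernel] * (double)sizeof(double) * (double)n;
    return 1.0e-6 * bytes / min_time_s;
}
```

Take Copy as an example. It moves 2 × 8 × 5000000 = 80,000,000 bytes. The C minimum of 0.009353 s therefore gives about 8553 MB/s. The single-process Julia minimum of 1.861376 s gives about 43.0 MB/s, a gap of roughly 200x.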


Here are the results with 4 processes in Julia:

Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           11122.2     0.007308     0.007193     0.007423
Scale:            465.5     0.217924     0.171840     0.264008
Add:            12481.8     0.009678     0.009614     0.009742
Triad:            471.3     0.267199     0.254624     0.279775


And here are the results with 4 OpenMP threads in C:

Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           11077.0     0.007228     0.007222     0.007233
Scale:          10552.7     0.007587     0.007581     0.007594
Add:            11986.9     0.010023     0.010011     0.010036
Triad:          12173.0     0.009865     0.009858     0.009872

As can be seen, with a single thread/process Julia is far slower than C for
every kernel (roughly 200x lower bandwidth). With 4 processes, however,
Julia performs on par with C for Copy and Add, but its performance drops
sharply for Scale and Triad (about 470 MB/s, versus over 10,000 MB/s in C).

What could be the reason for this? Is it a problem with my
implementation, or is it just the way Julia is implemented?

Thanks

--
Kapil
