Your code is long enough that I, for one, don't have time to dig into it 
myself. But as a guideline, Julia should not be massively slower than C, 
particularly on what seem (upon casual inspection) like very straightforward 
benchmarks.

Have you read the "Performance tips" section of the manual and used the tools 
there to investigate it yourself?

http://docs.julialang.org/en/latest/manual/performance-tips/
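
Two tips from that section that often matter for loop kernels like these are to keep the hot loop inside a function (rather than operating on non-constant globals) and to time it only after a warm-up call, so compilation is not counted. The sketch below is only an illustration under those assumptions: `triad!`, `triad_rate_mbs` and `run_triad` are hypothetical names, not taken from streamp.jl, and whether that code has either issue is not known. The rate follows STREAM's convention of counting three 8-byte arrays per Triad pass.

```julia
# Hypothetical Triad kernel (illustrative; not the code from streamp.jl).
# Keeping the loop in a function lets the compiler specialize on argument types.
function triad!(a, b, c, scalar)
    @inbounds for i in 1:length(a)
        a[i] = b[i] + scalar * c[i]
    end
    return a
end

# STREAM-style rate: Triad touches 3 arrays of 8-byte elements per pass.
triad_rate_mbs(n, seconds) = 1.0e-6 * 3 * 8 * n / seconds

# Set up the arrays locally, warm up once, then keep the best of ntimes runs.
function run_triad(n, ntimes)
    a = zeros(n)
    b = fill(2.0, n)
    c = fill(1.0, n)
    scalar = 3.0
    triad!(a, b, c, scalar)          # warm-up call so compilation is not timed
    best = Inf
    for k in 1:ntimes
        t = @elapsed triad!(a, b, c, scalar)
        best = min(best, t)
    end
    return a, best
end

a, best = run_triad(5_000_000, 3)
println("Triad best rate: ", triad_rate_mbs(5_000_000, best), " MB/s")
```

As a sanity check on the formula, plugging in the single-process Julia Triad minimum time quoted below (2.784426 s) gives roughly 43.1 MB/s, which matches the reported figure.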

--Tim

On Friday, October 31, 2014 11:16:44 AM Kapil Agarwal wrote:
> Hi
> 
> This is my first experiment with Julia and I wanted to share some results.
> I have ported the STREAM benchmark (http://www.cs.virginia.edu/stream/) to
> Julia. The code is available on github
> (https://github.com/kapiliitr/JuliaBenchmarks/blob/master/streamp.jl).
> 
> I am getting the following performance results in Julia -
> 
> Array size = 5000000 (elements), Offset = 0 (elements)
> Memory per array = 38.14697265625 MiB (= 0.03725290298461914 GiB)
> Total memory required = 114.44091796875 MiB (= 0.11175870895385742 GiB)
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:              43.0     1.885108     1.861376     1.908840
> Scale:             37.1     2.166505     2.155083     2.177926
> Add:               48.2     2.532873     2.487158     2.578587
> Triad:             43.1     2.787225     2.784426     2.790023
> 
> I am getting the following performance results in C -
> 
> Array size = 5000000 (elements), Offset = 0 (elements)
> Memory per array = 38.1 MiB (= 0.0 GiB).
> Total memory required = 114.4 MiB (= 0.1 GiB).
> Each kernel will be executed 3 times.
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:            8553.3     0.009360     0.009353     0.009366
> Scale:           8248.4     0.009712     0.009699     0.009726
> Add:             9490.6     0.012987     0.012644     0.013329
> Triad:           9032.0     0.013540     0.013286     0.013793
> 
> 
> Following are the results with 4 processors in Julia -
> 
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:           11122.2     0.007308     0.007193     0.007423
> Scale:            465.5     0.217924     0.171840     0.264008
> Add:            12481.8     0.009678     0.009614     0.009742
> Triad:            471.3     0.267199     0.254624     0.279775
> 
> 
> Following are the results with 4 OpenMP threads in C -
> 
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:           11077.0     0.007228     0.007222     0.007233
> Scale:          10552.7     0.007587     0.007581     0.007594
> Add:            11986.9     0.010023     0.010011     0.010036
> Triad:          12173.0     0.009865     0.009858     0.009872
> 
> As can be seen, with one thread/process, Julia is much slower than C for
> all the functions. However, for multi-process runs, Julia performs similarly
> to C for the Copy and Add functions, but its performance drops sharply for
> the Scale and Triad functions.
> 
> What could be the reason behind this? Could this be a problem in my
> implementation, or is this just the way Julia is implemented?
> 
> Thanks
> 
> --
> Kapil
