To be clear, you need to compare the final 'z', not the final 'A', to check whether your calculations are consistent: the matrix A does not change throughout this calculation, but the matrix z does. Also, there is no parallelism in the @parallel loop unless you start Julia with 'julia -p N', where N is the number of worker processes you'd like to use (or add them later with addprocs(N)). Note as well that without a reduction operator, @parallel launches the loop asynchronously and returns immediately, so toc() only measures the time to spawn the tasks, not to run them; that is why the second version appears to finish in milliseconds.
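For reference, here is a minimal sketch of how the serial and parallel versions could be made to compute the same thing, assuming a handful of worker processes and a much smaller iteration count (both of those choices are mine, purely to make the consistency check quick). The reduction form of @parallel combines the partial products from each worker, which is valid here because matrix multiplication is associative:

# Minimal consistency check (Julia 0.4-era syntax, as used in this thread).
# The loop count is cut from 10^9 to 10^3 just to keep the check fast.
addprocs(4)                        # or start Julia with: julia -p 4

A = [1.0 1.0001; 1.0002 1.0003]

# Serial reference: z_serial = A^1001
z_serial = A
for i in 1:1000
    z_serial *= A
end

# Parallel version: the (*) reducer multiplies the per-worker partial
# products, giving A^1000; one extra factor of A matches the serial result.
p = @parallel (*) for i in 1:1000
    A
end
z_parallel = A * p

println(norm(z_serial - z_parallel))   # should be ~0, up to round-off

With that change both versions return the same z, and any remaining difference in elapsed time reflects actual parallelism rather than each worker silently doing only a fraction of the work on its own local copy.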
On Thursday, 21 July 2016 12:45:17 UTC-4, Ferran Mazzanti wrote:
>
> Hi Nathan,
>
> I posted the codes, so you can check whether they do the same thing or not.
> These went into separate cells in Jupyter, nothing more and nothing less;
> not even a single line I didn't post. And yes, I understand your line of
> reasoning, so that's why I was astonished too. But I can't see what is
> making this huge difference, and I'd like to know :)
>
> Best,
>
> Ferran.
>
> On Thursday, July 21, 2016 at 6:31:57 PM UTC+2, Nathan Smith wrote:
>>
>> Hey Ferran,
>>
>> You should be suspicious when your apparent speed-up surpasses the level
>> of parallelism available on your CPU. It looks like your codes don't
>> actually compute the same thing.
>>
>> I'm assuming you're trying to compute the matrix power A^1000000000 by
>> repeatedly multiplying by A. In your parallel code, each process gets a
>> local copy of 'z' and uses that. This means each process is computing
>> something like A^(1000000000 / # of procs). Check out this section
>> <http://docs.julialang.org/en/release-0.4/manual/parallel-computing/#parallel-map-and-loops>
>> of the documentation on parallel map and loops to see what I mean.
>>
>> That said, that doesn't explain your speed-up completely; you should also
>> make sure that each part of your script is wrapped in a function and that
>> you 'warm up' each function by running it once before comparing.
>>
>> Cheers,
>> Nathan
>>
>> On Thursday, 21 July 2016 12:00:47 UTC-4, Ferran Mazzanti wrote:
>>>
>>> Hi,
>>>
>>> mostly showing my astonishment, but I can't even understand the figures
>>> in this stupid parallelization code
>>>
>>> A = [[1.0 1.0001];[1.0002 1.0003]]
>>> z = A
>>> tic()
>>> for i in 1:1000000000
>>>     z *= A
>>> end
>>> toc()
>>> A
>>>
>>> produces
>>>
>>> elapsed time: 105.458639263 seconds
>>>
>>> 2x2 Array{Float64,2}:
>>>  1.0     1.0001
>>>  1.0002  1.0003
>>>
>>> But then add @parallel in the for loop
>>>
>>> A = [[1.0 1.0001];[1.0002 1.0003]]
>>> z = A
>>> tic()
>>> @parallel for i in 1:1000000000
>>>     z *= A
>>> end
>>> toc()
>>> A
>>>
>>> and get
>>>
>>> elapsed time: 0.008912282 seconds
>>>
>>> 2x2 Array{Float64,2}:
>>>  1.0     1.0001
>>>  1.0002  1.0003
>>>
>>> Look at the elapsed time differences! And I'm running this on my Xeon
>>> desktop, not even a cluster.
>>> Of course A-B reports
>>>
>>> 2x2 Array{Float64,2}:
>>>  0.0  0.0
>>>  0.0  0.0
>>>
>>> So is this what one should expect from this kind of simple
>>> parallelization? If so, I'm definitely *in love* with Julia :):):)
>>>
>>> Best,
>>>
>>> Ferran.
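Along the lines of Nathan's warm-up advice, one way the timing could be set up so that compilation is not counted looks roughly like this; the function name matpow and the reduced loop count are my own choices for illustration, not something from the original posts:

# Wrap the work in a function and call it once first, so that compilation
# time is excluded from the measurement of the second call.
function matpow(A, n)
    z = A
    for i in 1:n
        z *= A
    end
    return z
end

A = [1.0 1.0001; 1.0002 1.0003]
matpow(A, 10)               # warm-up run: triggers compilation of matpow
@time z = matpow(A, 10^6)   # timed run on the already-compiled function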