I wonder if we should provide access to DSFMT's random array generation, so 
that one can use an array generator. The requirements are that at least 384 
random numbers must be generated at a time, and the length of the array must 
be even.

We should not allow this with the global seed; it could be exposed through a 
randarray!() function. We could even avoid exporting the function by default, 
since it comes with a number of conditions, but it gives really 
high performance.
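
A rough sketch of what such a wrapper might look like (the names here are 
only illustrative, and the final rand! call stands in for dSFMT's batch 
fill routine):

using Random   # rand! lives in the Random stdlib on current Julia

function randarray!(r::MersenneTwister, A::Vector{Float64})
    n = length(A)
    # dSFMT's array API requires at least 384 values and an even length
    n >= 384  || throw(ArgumentError("need at least 384 elements"))
    iseven(n) || throw(ArgumentError("array length must be even"))
    # placeholder: a real implementation would call dSFMT's fill-array
    # routine directly on r's state rather than the generic rand!
    rand!(r, A)
    return A
end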

-viral

On Friday, September 12, 2014 6:56:39 PM UTC+5:30, Ján Dolinský wrote:
>
> Yes, 6581 sounds like it. Thanks for the clarification.
>
> Jan 
>
> On Friday, September 12, 2014 14:12:46 UTC+2, Andreas Noack wrote:
>>
>> I think the reason for the slowdown in rand since 0.2.1 is this:
>>
>> https://github.com/JuliaLang/julia/pull/6581
>>
>> Right now we are filling the array one element at a time, which is not 
>> efficient, but unfortunately it is our best option at the moment. In 
>> applications where you draw one random variate at a time there shouldn't 
>> be a difference.
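>>
>> To illustrate the gap (a sketch in current syntax; exact numbers will vary 
>> with the Julia version and machine):
>>
>> using Random
>>
>> # one call into the generator per element
>> function fill_one_by_one!(A)
>>     for i in eachindex(A)
>>         A[i] = rand()
>>     end
>>     return A
>> end
>>
>> A = Vector{Float64}(undef, 10^7)
>> @time fill_one_by_one!(A)   # element-by-element path
>> @time rand!(A)              # single whole-array fill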
>>
>> Kind regards
>>
>> Andreas Noack
>>
>> 2014-09-12 4:46 GMT-04:00 Ján Dolinský:
>>
>>> Finally, I found that Octave has an equivalent to sumabs2() called 
>>> sumsq(). Just for the sake of completeness, here are the timings:
>>>
>>> Octave
>>> X = rand(7000);
>>> tic; sumsq(X); toc;
>>> Elapsed time is 0.0616651 seconds.
>>>
>>> Julia v0.3
>>> @time X = rand(7000,7000);
>>> elapsed time: 0.285218597 seconds (392000160 bytes allocated)
>>> @time sumabs2(X, 1);
>>> elapsed time: 0.05705666 seconds (56496 bytes allocated)
>>>
>>>
>>> Essentially, the speed is about the same, with Julia being a little faster.
>>>
>>> It was, however, interesting to observe that X = rand(7000,7000) is 
>>> about 2.5 times slower in Julia 0.3 than it was in Julia 0.2 ...
>>>
>>> In Julia (v0.2.1):
>>>  @time X = rand(7000,7000);
>>> elapsed time: 0.114418731 seconds (392000128 bytes allocated)
>>>  
>>>
>>> Jan
>>>
>>> On Tuesday, September 9, 2014 17:06:59 UTC+2, Ján Dolinský wrote:
>>>>
>>>>  Hello Andreas,
>>>>
>>>> Thanks for the tip. I'll check it out. Thumbs up for 0.4!
>>>>
>>>> Jan
>>>>
>>>>  On 09.09.2014 17:04, Andreas Noack wrote:
>>>>  
>>>> If you need the speed now, you can try one of the packages ArrayViews or 
>>>> ArrayViewsAPL. It is functionality similar to what these packages provide 
>>>> that we are trying to include in Base.
>>>>
>>>> Kind regards
>>>>
>>>> Andreas Noack
>>>>  
>>>> 2014-09-09 9:38 GMT-04:00 Ján Dolinský:
>>>>
>>>>> OK, so basically there is nothing wrong with the syntax
>>>>> d = sumabs2(X[:,1001:end], 1);
>>>>> and I should just wait until v0.4 is available (perhaps soon in the 
>>>>> Julia Nightlies PPA)?
>>>>>
>>>>> I did the benchmark with the floating-point power function based on 
>>>>> Simon's comment. Here are my results (after a couple of repeated runs):
>>>>>  @time X.^2;
>>>>> elapsed time: 0.511988142 seconds (392000256 bytes allocated, 2.52% gc 
>>>>> time)
>>>>> @time X.^2.0;
>>>>> elapsed time: 0.411791612 seconds (392000256 bytes allocated, 3.12% gc 
>>>>> time)
>>>>>  
>>>>> Thanks, 
>>>>> Jan Dolinsky
>>>>>
>>>>>   On 09.09.2014 14:06, Andreas Noack wrote:
>>>>>  
>>>>> The problem is that right now X[:,1001:end] makes a copy of the array. 
>>>>> However, in 0.4 this will instead be a view of the original matrix, and 
>>>>> therefore the computing time should be almost the same.
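>>>>>
>>>>> In current Julia syntax the difference looks roughly like this (a sketch; 
>>>>> sumabs2 corresponds to sum(abs2, ...) here, and timings will vary):
>>>>>
>>>>> X = rand(7000, 7000)
>>>>>
>>>>> # copying slice: allocates a 7000x6000 temporary before summing
>>>>> @time sum(abs2, X[:, 1001:end], dims=1)
>>>>>
>>>>> # view: no copy, sums directly over the original data
>>>>> @time sum(abs2, @view(X[:, 1001:end]), dims=1)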
>>>>>
>>>>> It might also be worth repeating Simon's comment that the floating-point 
>>>>> power function has special handling of the exponent 2. The result is that
>>>>>
>>>>> julia> @time A.^2;
>>>>> elapsed time: 1.402791357 seconds (200000256 bytes allocated, 5.90% gc 
>>>>> time)
>>>>>
>>>>> julia> @time A.^2.0;
>>>>> elapsed time: 0.554241105 seconds (200000256 bytes allocated, 15.04% 
>>>>> gc time) 
>>>>>
>>>>> I tend to agree with Simon that special-casing the integer exponent 2 
>>>>> would be reasonable.
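>>>>>
>>>>> For anyone who wants to time the elementwise square on their own machine, 
>>>>> these are the variants being compared (current broadcast syntax; the 
>>>>> relative numbers depend heavily on the Julia version):
>>>>>
>>>>> A = rand(5000, 5000)
>>>>>
>>>>> @time A .^ 2      # integer exponent
>>>>> @time A .^ 2.0    # floating-point exponent
>>>>> @time A .* A      # explicit multiplication
>>>>> @time abs2.(A)    # abs2(x) == x*x for real x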
>>>>>  
>>>>> Kind regards
>>>>>
>>>>> Andreas Noack
>>>>>  
>>>>> 2014-09-09 4:24 GMT-04:00 Ján Dolinský:
>>>>>
>>>>>> Hello guys,
>>>>>>
>>>>>> Thanks a lot for the lengthy discussions. They helped me a lot to get a 
>>>>>> feeling for what Julia is like. I did some more performance comparisons 
>>>>>> as suggested by the first two posts (thanks a lot for the tips). In the 
>>>>>> meantime I upgraded to v0.3.
>>>>>>  X = rand(7000,7000);
>>>>>> @time d = sum(X.^2, 1);
>>>>>> elapsed time: 0.573125833 seconds (392056672 bytes allocated, 2.25% 
>>>>>> gc time)
>>>>>> @time d = sum(X.*X, 1);
>>>>>> elapsed time: 0.178715901 seconds (392057080 bytes allocated, 14.06% 
>>>>>> gc time)
>>>>>> @time d = sumabs2(X, 1);
>>>>>> elapsed time: 0.067431808 seconds (56496 bytes allocated)
>>>>>>  
>>>>>> In Octave then
>>>>>>  X = rand(7000);
>>>>>> tic; d = sum(X.^2); toc;
>>>>>> Elapsed time is 0.167578 seconds.
>>>>>>  
>>>>>> So the ultimate solution is the sumabs2 function, which is a blast. I 
>>>>>> am coming from Matlab/Octave and I would expect X.^2 to be fast "out of 
>>>>>> the box", but if I can get excellent performance by learning some new 
>>>>>> paradigms I will go for it.
>>>>>>
>>>>>> The above tests led me to another question. I often need to 
>>>>>> calculate the "self" dot product over a portion of a matrix, e.g.
>>>>>>  @time d = sumabs2(X[:,1001:end], 1);
>>>>>> elapsed time: 0.175333366 seconds (336048688 bytes allocated, 7.01% 
>>>>>> gc time)
>>>>>>  
>>>>>> Apparently this is not the way to do it in Julia, because working on a 
>>>>>> smaller 7000x6000 matrix gives more than double the computing time and, 
>>>>>> furthermore, seems to allocate unnecessary memory.
>>>>>>
>>>>>> Best Regards,
>>>>>> Jan
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Monday, September 8, 2014 10:36:02 UTC+2, Ján Dolinský wrote:
>>>>>>  
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am a new Julia user. I am trying to write a function for computing 
>>>>>>> the "self" dot product of all columns in a matrix, i.e. squaring each 
>>>>>>> element of the matrix and then computing a column-wise sum. I am 
>>>>>>> interested in the proper way of doing it because I often need to 
>>>>>>> process large matrices.
>>>>>>>
>>>>>>> I first focused on calculating the squares. For testing purposes I use 
>>>>>>> a 7000x7000 matrix of random floats. All timings here are taken after 
>>>>>>> several repeated runs.
>>>>>>>
>>>>>>> I used to do it in Octave (v3.8.1) as follows:
>>>>>>>  tic; X = rand(7000); toc;
>>>>>>> Elapsed time is 0.579093 seconds.
>>>>>>> tic; XX = X.^2; toc;
>>>>>>> Elapsed time is 0.114737 seconds.
>>>>>>>  
>>>>>>>
>>>>>>> I tried to do the same in Julia (v0.2.1):
>>>>>>>  @time X = rand(7000,7000);
>>>>>>> elapsed time: 0.114418731 seconds (392000128 bytes allocated)
>>>>>>> @time XX = X.^2;
>>>>>>> elapsed time: 0.369641268 seconds (392000224 bytes allocated)
>>>>>>>  
>>>>>>> I was surprised to see that Julia is about 3 times slower at 
>>>>>>> calculating the squares than my original routine in Octave. I then read 
>>>>>>> the "Performance Tips" section and found out that one should use * 
>>>>>>> instead of raising to small integer powers, for example x*x*x instead 
>>>>>>> of x^3. I therefore tested the following.
>>>>>>>  @time XX = X.*X;
>>>>>>> elapsed time: 0.146059577 seconds (392000968 bytes allocated)
>>>>>>>  
>>>>>>> This approach indeed resulted in a much shorter computing time. It is 
>>>>>>> still, however, a little slower than my code in Octave. Can someone 
>>>>>>> advise on any performance tips?
>>>>>>>
>>>>>>> I will then finally sum over all columns of XX to get the "self" dot 
>>>>>>> product, but first I'd like to fix the squaring part.
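>>>>>>>
>>>>>>> To state the goal precisely, here is the full computation written as a 
>>>>>>> plain double loop (just a sketch of what I am after, not tuned):
>>>>>>>
>>>>>>> # column-wise sum of squares, without building the XX temporary
>>>>>>> function colsumsq(X::Matrix{Float64})
>>>>>>>     m, n = size(X)
>>>>>>>     d = zeros(n)
>>>>>>>     for j in 1:n
>>>>>>>         s = 0.0
>>>>>>>         for i in 1:m
>>>>>>>             s += X[i, j]^2
>>>>>>>         end
>>>>>>>         d[j] = s
>>>>>>>     end
>>>>>>>     return d
>>>>>>> end
>>>>>>>
>>>>>>> d = colsumsq(rand(7000, 7000))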
>>>>>>>
>>>>>>> Thanks a lot. 
>>>>>>> Best Regards,
>>>>>>> Jan 
>>>>>>>
>>>>>>> P.S. A while ago I found in the Julia manual an example of using a 
>>>>>>> @vectorize macro with a squaring function, but I cannot find it any 
>>>>>>> more. Perhaps the name of the macro was different ...
