The order of operations is from left to right so the parentheses can be 
important here. We have discussed ways of executing more efficiently for 
matrix products, see https://github.com/JuliaLang/julia/issues/12065, but 
so far nothing has been implemented. In that issue, you can also see the 
reason why an explicit transpose is computed in one of your examples.

On Monday, October 26, 2015 at 3:26:12 PM UTC-4, Michael Lindon wrote:
>
> I have a somewhat large matrix-vector product to calculate, where the 
> matrix is not full rank. I can perform a low rank factorization of the nxn 
> Matrix M as M=LL' where L is nxp.  I'd like to use this information to 
> speed up the computation. Here is a minimal example:
>
> x=rand(10000,100)                                                         
>                                                                             
>                                                              
> v=rand(10000)                                                             
>                                                                             
>                                                              
> xx=x*x'                                                                   
>                                                                             
>                                                              
> xt=x'                                                                     
>                                                                             
>                                                              
> @time xx*v;                                                               
>                                                                             
>                                                              
> @time x*x'*v;                                                             
>                                                                             
>                                                              
> @time x*(x'*v);                                                           
>                                                                             
>                                                              
> @time x*(xt*v);
>
> here are the timings:
>
> julia> @time xx*v;
>   0.038634 seconds (8 allocations: 78.375 KB)
> julia> @time x*x'*v;
>   0.385250 seconds (17 allocations: 770.646 MB, 13.14% gc time)
> julia> @time x*(x'*v);
>   0.000662 seconds (10 allocations: 79.266 KB)
> julia> @time x*(xt*v);
>   0.000819 seconds (10 allocations: 79.266 KB)
>
> I would like to understand why x*(x'*v) is much faster than  x*x'*v, and 
> uses less memory. I would have thought the order of operations would have 
> gone from right to left but it seems the parentheses made a huge 
> difference. I also might have expected that x' would just flick a TRANS='T" 
> switch in the BLAS implementation but by the looks of it there is a very 
> large amount of memory allocation going on in x*x'*v.
>

Reply via email to