I get similar results with OpenBLAS. I expect axpy! to gain more from
vectorization than dot, since dot has to reduce all of its products to a
single scalar while axpy! is purely elementwise.
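
For reference, here is a minimal way to time the two calls in isolation (a
sketch only, not code from this thread: M, w, v, and the repetition count are
placeholders, it works on plain Vector{Float64}s instead of pointers, and on
0.4/0.5 the threading call is blas_set_num_threads rather than
BLAS.set_num_threads):

    using LinearAlgebra                  # BLAS wrappers (Julia >= 0.7; earlier they live in Base)

    BLAS.set_num_threads(1)              # single-threaded, matching the test above

    M = 10_000                           # placeholder size; the real M comes from the routine
    w = rand(M)
    v = rand(M)

    function bench(M, w, v, reps)
        dt = 0.0
        @time for _ in 1:reps            # dot: reads 2M numbers, reduces them to one scalar
            dt = BLAS.dot(M, w, 1, v, 1)
        end
        @time for _ in 1:reps            # axpy!: reads 2M numbers, writes M, no reduction
            BLAS.axpy!(M, -2*dt, w, 1, v, 1)
        end
    end

    bench(M, w, v, 1)                    # warm-up run (compilation)
    bench(M, w, v, 10_000)               # timings to compare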

On Fri, Sep 9, 2016 at 5:31 PM, Sheehan Olver <dlfivefi...@gmail.com> wrote:

> I did blas_set_num_threads(1) and got the same profile numbers.  This is
> using Apple’s BLAS.
>
> Maybe I’ll try Julia 0.5 and OpenBLAS for comparison.
>
> On 10 Sep 2016, at 2:34 AM, Andreas Noack <andreasnoackjen...@gmail.com>
> wrote:
>
> Try timing it again with threading disabled. Sometimes the
> threading heuristics can cause counterintuitive performance differences.
>
> On Friday, September 9, 2016 at 6:39:13 AM UTC-4, Sheehan Olver wrote:
>>
>>
>> I have the following code that is part of a Householder routine, where 
>> j::Int64,
>> N::Int64, R.cols::Vector{Int64}, wp::Ptr{Float64}, M::Int64,
>> v::Ptr{Float64}:
>>
>>   …
>>         for j = k:N
>>             v = r + (R.cols[j]+k-2)*sz         # pointer into the stored data for column j
>>             dt = BLAS.dot(M, wp, 1, v, 1)      # dt = dot(w, v)
>>             BLAS.axpy!(M, -2*dt, wp, 1, v, 1)  # v := v - 2*dt*w  (apply the reflection)
>>         end
>>     …
>>
>>
>>
>> For some reason, the BLAS.dot call takes 3x as long as the BLAS.axpy!
>> call.  Is this expected, or is there something wrong?
>>
>>
>>
>
