I think it is on Julia 0.5, but it does not help. Although it produces some
SIMD instructions for moving memory, it still uses scalar floating-point
instructions for this loop in the @simd version (the same happens with just
@inbounds, and with julia -O on 0.4):

movsd (%r15,%r11,8), %xmm0    # xmm0 = mem[0],zero
movsd (%rbx,%r11,8), %xmm2    # xmm2 = mem[0],zero
subsd (%rax), %xmm0
mulsd %xmm0, %xmm0
addsd %xmm1, %xmm0
subsd 8(%rax), %xmm2
mulsd %xmm2, %xmm2
addsd %xmm0, %xmm2
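
For anyone who wants to repeat this kind of check, here is a minimal sketch. It assumes a current Julia 1.x, where `code_native` lives in `InteractiveUtils`, rather than the 0.4/0.5 builds discussed in this thread. The `sumsq` kernel is a hypothetical stand-in (a sum of squared differences), not the loop from the original test program, and the regex only looks for x86 packed-double mnemonics such as `subpd` or `vmulpd`, so it says nothing useful on other architectures.

```julia
using InteractiveUtils  # provides code_native on Julia 1.x

# Hypothetical kernel for illustration only (not the loop from this thread):
# sum of squared differences between two vectors.
function sumsq(a::Vector{Float64}, b::Vector{Float64})
    s = 0.0
    # eachindex(a, b) throws if the index sets differ, which keeps @inbounds safe.
    @inbounds @simd for i in eachindex(a, b)
        s += (a[i] - b[i])^2
    end
    return s
end

# Capture the generated native code as a string, in AT&T syntax like the
# listing above.
asm = sprint() do io
    code_native(io, sumsq, (Vector{Float64}, Vector{Float64}); syntax = :att)
end

# Scalar double instructions end in "sd" (subsd, mulsd, addsd); packed ones
# end in "pd", optionally with a "v" prefix when AVX is in use.
has_packed = occursin(r"v?(sub|mul|add)pd", asm)
println("packed double-precision instructions found: ", has_packed)
```

If the check prints `false` on an x86 machine, the loop body was most likely compiled to scalar instructions like the ones listed above, although reading the full `asm` string is the only way to be sure.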


On Friday, June 3, 2016 at 6:15:50 PM UTC+3, Stefan Karpinski wrote:
>
> Julia also has an `-O3` option – you could try that.
>
> On Fri, Jun 3, 2016 at 10:54 AM, Angel de Vicente <angel.vice...@gmail.com>
> wrote:
>
>> Lutfullah Tomak <tomak...@gmail.com> writes:
>> > It may be because ifort uses proper simd instructions. Erik's
>> > suggestion for @simd does not use simd instructions on my laptop.
>>
>> I don't see any improvement in Julia by using @simd either.
>>
>> On the other hand, vectorization with the Intel compiler definitely
>> helps, but even with vectorization off, somehow it manages to go faster
>>
>> [angelv@duna TESTS]$ ifort -no-vec -O3 -o test_ifort test.F90
>> [angelv@duna TESTS]$ ./test_ifort
>>   0.065636 seconds
>>    9363171.53179644
>> [angelv@duna TESTS]$
>>
>> --
>> Ángel de Vicente
>> http://www.iac.es/galeria/angelv/
>>
>
>
