I think it is on Julia 0.5, but it does not help. Although it produces some SIMD instructions for moving memory, the @simd version still uses scalar floating-point instructions for this loop (the same happens with just @inbounds and julia -O on 0.4):

    movsd   (%r15,%r11,8), %xmm0    # xmm0 = mem[0],zero
    movsd   (%rbx,%r11,8), %xmm2    # xmm2 = mem[0],zero
    subsd   (%rax), %xmm0
    mulsd   %xmm0, %xmm0
    addsd   %xmm1, %xmm0
    subsd   8(%rax), %xmm2
    mulsd   %xmm2, %xmm2
    addsd   %xmm0, %xmm2

On Friday, June 3, 2016 at 6:15:50 PM UTC+3, Stefan Karpinski wrote:
>
> Julia also has an `-O3` option – you could try that.
>
> On Fri, Jun 3, 2016 at 10:54 AM, Angel de Vicente <angel.vice...@gmail.com> wrote:
>
>> Lutfullah Tomak <tomak...@gmail.com> writes:
>> > It may be because ifort uses proper simd instructions. Eriks's suggestion for @simd
>> > does not use simd instructions in my laptop.
>>
>> I don't see any improvement in Julia by using @simd either.
>>
>> On the other hand, vectorization with the Intel compiler definitely
>> helps, but even with vectorization off, somehow it manages to go faster
>>
>> [angelv@duna TESTS]$ ifort -no-vec -O3 -o test_ifort test.F90
>> [angelv@duna TESTS]$ ./test_ifort
>> 0.065636 seconds
>> 9363171.53179644
>> [angelv@duna TESTS]$
>>
>> --
>> Ángel de Vicente
>> http://www.iac.es/galeria/angelv/
>>
>
>