kromanova added inline comments.
================ Comment at: emmintrin.h:1607 +/// +/// This intrinsic corresponds to the <c> VMOVSD / MOVSD </c> instruction. +/// ---------------- probinson wrote: > should this be VMOVQ/MOVQ instead? Probably yes. Let me know if you have a different opinion. If I use this intrinsic by itself, clang generates VMOVSD instruction. It happens because the default domain is chooses to generate smaller instruction code. I got confused because I couldn't find Intel's documentation about _mm_loadu_si64, so I just wrote a test like the one below and looked what instructions got generated. ``` __m128i foo22 (void const * __a) { return _mm_loadu_si64 (__a); } ``` However, if I change the test and use an intrisic to add 2 64-bit integers after the load intrinsics, I can see that VMOVQ instruction gets generated. ``` __m128d foo44 (double const * __a) { __m128i first = _mm_loadu_si64 (__a); __m128i second = _mm_loadu_si64 (__a); return _mm_add_epi64(first, second); } ``` So, as you see clang could generate either VMOVSD/MOVSD or VMOVSQ/MOVSQ. I think it makes sense to change the documentation as Paul suggested: /// This intrinsic corresponds to the VMOVSQ/MOVSQ. Or, alternatively, we could list all the instructions that correspond to this intrinsics: /// This intrinsic corresponds to the VMOVSQ/MOVSQ/VMOVSD/MOVSD. ================ Comment at: emmintrin.h:1607 +/// +/// This intrinsic corresponds to the <c> VMOVSD / MOVSD </c> instruction. +/// ---------------- kromanova wrote: > probinson wrote: > > should this be VMOVQ/MOVQ instead? > Probably yes. Let me know if you have a different opinion. > > If I use this intrinsic by itself, clang generates VMOVSD instruction. > It happens because the default domain is chooses to generate smaller > instruction code. > I got confused because I couldn't find Intel's documentation about > _mm_loadu_si64, so I just wrote a test like the one below and looked what > instructions got generated. > > ``` > __m128i foo22 (void const * __a) > { > return _mm_loadu_si64 (__a); > } > ``` > > However, if I change the test and use an intrisic to add 2 64-bit integers > after the load intrinsics, I can see that VMOVQ instruction gets generated. > > ``` > __m128d foo44 (double const * __a) > { > __m128i first = _mm_loadu_si64 (__a); > __m128i second = _mm_loadu_si64 (__a); > return _mm_add_epi64(first, second); > > } > ``` > > So, as you see clang could generate either VMOVSD/MOVSD or VMOVSQ/MOVSQ. I > think it makes sense to change the documentation as Paul suggested: > > /// This intrinsic corresponds to the VMOVSQ/MOVSQ. > > Or, alternatively, we could list all the instructions that correspond to this > intrinsics: > > /// This intrinsic corresponds to the VMOVSQ/MOVSQ/VMOVSD/MOVSD. > > It will be interesting to hear Asaf Badoug opinion, since he added this intrisic. He probably has access to Intel's documentation for this intrinsic too (which I wasn't able to find online). ================ Comment at: emmintrin.h:1607 +/// +/// This intrinsic corresponds to the <c> VMOVSD / MOVSD </c> instruction. +/// ---------------- kromanova wrote: > kromanova wrote: > > probinson wrote: > > > should this be VMOVQ/MOVQ instead? > > Probably yes. Let me know if you have a different opinion. > > > > If I use this intrinsic by itself, clang generates VMOVSD instruction. > > It happens because the default domain is chooses to generate smaller > > instruction code. > > I got confused because I couldn't find Intel's documentation about > > _mm_loadu_si64, so I just wrote a test like the one below and looked what > > instructions got generated. > > > > ``` > > __m128i foo22 (void const * __a) > > { > > return _mm_loadu_si64 (__a); > > } > > ``` > > > > However, if I change the test and use an intrisic to add 2 64-bit integers > > after the load intrinsics, I can see that VMOVQ instruction gets generated. > > > > ``` > > __m128d foo44 (double const * __a) > > { > > __m128i first = _mm_loadu_si64 (__a); > > __m128i second = _mm_loadu_si64 (__a); > > return _mm_add_epi64(first, second); > > > > } > > ``` > > > > So, as you see clang could generate either VMOVSD/MOVSD or VMOVSQ/MOVSQ. I > > think it makes sense to change the documentation as Paul suggested: > > > > /// This intrinsic corresponds to the VMOVSQ/MOVSQ. > > > > Or, alternatively, we could list all the instructions that correspond to > > this intrinsics: > > > > /// This intrinsic corresponds to the VMOVSQ/MOVSQ/VMOVSD/MOVSD. > > > > > It will be interesting to hear Asaf Badoug opinion, since he added this > intrisic. He probably has access to Intel's documentation for this intrinsic > too (which I wasn't able to find online). There is a similar situation for one intrisic just a few lines above, namely _mm_loadu_pd. It could generate either VMOVUPD / MOVUPD or VMOVUPS/MOVUPS instructions. I have actually asked Simon question about it offline just a couple of days ago. I decided to kept referring to VMOVUPD / MOVUPD as a corresponding instruction for _mm_loadu_pd. However, if we end up doing things differently for _mm_loadu_si64, we need to do a similar change to _mm_loadu_pd (and probably to some other intrinsics). Repository: rL LLVM https://reviews.llvm.org/D28503 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits