[PATCH] D47175: [DOXYGEN] Formatting changes for better intrinsics documentation rendering

2018-05-21 Thread Katya Romanova via Phabricator via cfe-commits
kromanova created this revision.
Herald added a subscriber: cfe-commits.

Below are a few Doxygen intrinsics documentation changes requested by our 
documentation team:

(1) I added some \see cross-references to a few select intrinsics that are 
related (and have the same or very similar semantics). 
https://www.stack.nl/~dimitri/doxygen/manual/commands.html#cmdsee

If someone's version of doxygen doesn't support the \see directive, please speak 
up! We will try to come up with a different solution.

bmiintrin.h
__bextr_u32 and _bextr_u32
__bextr_u64 and _bextr_u64

lzcntintrin.h:
__lzcnt32 and _lzcnt_u32
__lzcnt64 and _lzcnt_u64
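
For readers unfamiliar with the directive, here is a minimal sketch of what 
such a cross-reference looks like in a doxygen comment. The function names 
count_leading_zeros32 and count_leading_zeros_u32 are hypothetical stand-ins 
chosen for illustration; they are not the contents of lzcntintrin.h.

```c
#include <assert.h>
#include <stdint.h>

/// Counts the number of leading zero bits in the operand.
///
/// \param x
///    An unsigned 32-bit integer whose leading zeros are to be counted.
/// \returns The number of leading zero bits, or 32 if \a x is zero.
/// \see count_leading_zeros_u32
static unsigned count_leading_zeros32(uint32_t x) {
  unsigned n = 0;
  if (x == 0)
    return 32;
  /* Shift left until the most significant bit is set. */
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    ++n;
  }
  return n;
}

/// Alternate spelling with the same semantics.
///
/// \param x
///    An unsigned 32-bit integer whose leading zeros are to be counted.
/// \returns The number of leading zero bits, or 32 if \a x is zero.
/// \see count_leading_zeros32
static unsigned count_leading_zeros_u32(uint32_t x) {
  return count_leading_zeros32(x);
}

/* Small self-check showing that both spellings agree. */
static void see_demo(void) {
  assert(count_leading_zeros32(1u) == 31);
  assert(count_leading_zeros_u32(0x80000000u) == 0);
}
```

In the rendered output each function's page should then carry a "See also" 
entry that links to its counterpart.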

(2) pmmintrin.h, smmintrin.h, and xmmintrin.h have a few minor formatting 
changes. They make our intrinsics documentation render better. I don't 
foresee these changes affecting anyone's documentation negatively.
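
For reference, the two expressions that the xmmintrin.h change wraps in 
\code ... \endcode can be exercised as follows. This is a minimal sketch, 
assuming an x86 target with SSE; the helper names sse_overflow_occurred and 
sse_rounding_mode are made up for illustration.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Returns nonzero if the overflow exception flag is set in MXCSR. */
static int sse_overflow_occurred(void) {
  return (_mm_getcsr() & _MM_EXCEPT_OVERFLOW) != 0;
}

/* Returns the current rounding mode, one of the _MM_ROUND_* values. */
static unsigned int sse_rounding_mode(void) {
  return _MM_GET_ROUNDING_MODE();
}

/* Changes the rounding mode, reads it back, then restores MXCSR. */
static void mxcsr_demo(void) {
  unsigned int saved = _mm_getcsr();

  _MM_SET_ROUNDING_MODE(_MM_ROUND_TOWARD_ZERO);
  assert(sse_rounding_mode() == _MM_ROUND_TOWARD_ZERO);

  /* Explicitly clear the overflow flag and confirm it reads as clear. */
  _mm_setcsr(_mm_getcsr() & ~(unsigned int)_MM_EXCEPT_OVERFLOW);
  assert(!sse_overflow_occurred());

  _mm_setcsr(saved); /* restore the caller's MXCSR */
}
```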


Repository:
  rC Clang

https://reviews.llvm.org/D47175

Files:
  lib/Headers/bmiintrin.h
  lib/Headers/lzcntintrin.h
  lib/Headers/pmmintrin.h
  lib/Headers/smmintrin.h
  lib/Headers/xmmintrin.h

Index: lib/Headers/xmmintrin.h
===
--- lib/Headers/xmmintrin.h
+++ lib/Headers/xmmintrin.h
@@ -2495,10 +2495,14 @@
 ///
 ///For example, the following expression checks if an overflow exception has
 ///occurred:
+///\code
 ///  ( _mm_getcsr() & _MM_EXCEPT_OVERFLOW )
+///\endcode
 ///
 ///The following expression gets the current rounding mode:
+///\code
 ///  _MM_GET_ROUNDING_MODE()
+///\endcode
 ///
 /// \headerfile 
 ///
Index: lib/Headers/smmintrin.h
===
--- lib/Headers/smmintrin.h
+++ lib/Headers/smmintrin.h
@@ -493,7 +493,7 @@
 /// \param __V2
 ///A 128-bit vector of [16 x i8].
 /// \param __M
-///A 128-bit vector operand, with mask bits 127, 119, 111 ... 7 specifying
+///A 128-bit vector operand, with mask bits 127, 119, 111...7 specifying
 ///how the values are to be copied. The position of the mask bit corresponds
 ///to the most significant bit of a copied value. When a mask bit is 0, the
 ///corresponding 8-bit element in operand \a __V1 is copied to the same
@@ -1277,8 +1277,8 @@
 /// This intrinsic corresponds to the  VPMOVSXBD / PMOVSXBD  instruction.
 ///
 /// \param __V
-///A 128-bit vector of [16 x i8]. The lower four 8-bit elements are sign-
-///extended to 32-bit values.
+///A 128-bit vector of [16 x i8]. The lower four 8-bit elements are
+///sign-extended to 32-bit values.
 /// \returns A 128-bit vector of [4 x i32] containing the sign-extended values.
 static __inline__ __m128i __DEFAULT_FN_ATTRS
 _mm_cvtepi8_epi32(__m128i __V)
@@ -1298,8 +1298,8 @@
 /// This intrinsic corresponds to the  VPMOVSXBQ / PMOVSXBQ  instruction.
 ///
 /// \param __V
-///A 128-bit vector of [16 x i8]. The lower two 8-bit elements are sign-
-///extended to 64-bit values.
+///A 128-bit vector of [16 x i8]. The lower two 8-bit elements are
+///sign-extended to 64-bit values.
 /// \returns A 128-bit vector of [2 x i64] containing the sign-extended values.
 static __inline__ __m128i __DEFAULT_FN_ATTRS
 _mm_cvtepi8_epi64(__m128i __V)
@@ -1319,8 +1319,8 @@
 /// This intrinsic corresponds to the  VPMOVSXWD / PMOVSXWD  instruction.
 ///
 /// \param __V
-///A 128-bit vector of [8 x i16]. The lower four 16-bit elements are sign-
-///extended to 32-bit values.
+///A 128-bit vector of [8 x i16]. The lower four 16-bit elements are
+///sign-extended to 32-bit values.
 /// \returns A 128-bit vector of [4 x i32] containing the sign-extended values.
 static __inline__ __m128i __DEFAULT_FN_ATTRS
 _mm_cvtepi16_epi32(__m128i __V)
@@ -1338,8 +1338,8 @@
 /// This intrinsic corresponds to the  VPMOVSXWQ / PMOVSXWQ  instruction.
 ///
 /// \param __V
-///A 128-bit vector of [8 x i16]. The lower two 16-bit elements are sign-
-///extended to 64-bit values.
+///A 128-bit vector of [8 x i16]. The lower two 16-bit elements are
+/// sign-extended to 64-bit values.
 /// \returns A 128-bit vector of [2 x i64] containing the sign-extended values.
 static __inline__ __m128i __DEFAULT_FN_ATTRS
 _mm_cvtepi16_epi64(__m128i __V)
@@ -1357,8 +1357,8 @@
 /// This intrinsic corresponds to the  VPMOVSXDQ / PMOVSXDQ  instruction.
 ///
 /// \param __V
-///A 128-bit vector of [4 x i32]. The lower two 32-bit elements are sign-
-///extended to 64-bit values.
+///A 128-bit vector of [4 x i32]. The lower two 32-bit elements are
+///sign-extended to 64-bit values.
 /// \returns A 128-bit vector of [2 x i64] containing the sign-extended values.
 static __inline__ __m128i __DEFAULT_FN_ATTRS
 _mm_cvtepi32_epi64(__m128i __V)
@@ -1377,8 +1377,8 @@
 /// This intrinsic corresponds to the  VPMOVZXBW / PMOVZXBW  instruction.
 ///
 /// \param __V
-///A 128-bit vector of [16 x i8]. The lower eight 8-bit elements are zero-
-///extended to 16-bit values.
+///A 128-bit 

[PATCH] D41888: [DOXYGEN] documentation changes to emmintrin.h and tmmintrin.h

2018-02-15 Thread Katya Romanova via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL325312: [DOXYGEN] There was a request in the review D41507 
to change the notation for… (authored by kromanova, committed by ).
Herald added a subscriber: llvm-commits.

Changed prior to commit:
  https://reviews.llvm.org/D41888?vs=134537&id=134548#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D41888

Files:
  cfe/trunk/lib/Headers/emmintrin.h
  cfe/trunk/lib/Headers/tmmintrin.h

Index: cfe/trunk/lib/Headers/tmmintrin.h
===
--- cfe/trunk/lib/Headers/tmmintrin.h
+++ cfe/trunk/lib/Headers/tmmintrin.h
@@ -276,8 +276,9 @@
 }
 
 /// \brief Horizontally adds the adjacent pairs of values contained in 2 packed
-///128-bit vectors of [8 x i16]. Positive sums greater than 7FFFh are
-///saturated to 7FFFh. Negative sums less than 8000h are saturated to 8000h.
+///128-bit vectors of [8 x i16]. Positive sums greater than 0x7FFF are
+///saturated to 0x7FFF. Negative sums less than 0x8000 are saturated to
+///0x8000.
 ///
 /// \headerfile 
 ///
@@ -300,8 +301,9 @@
 }
 
 /// \brief Horizontally adds the adjacent pairs of values contained in 2 packed
-///64-bit vectors of [4 x i16]. Positive sums greater than 7FFFh are
-///saturated to 7FFFh. Negative sums less than 8000h are saturated to 8000h.
+///64-bit vectors of [4 x i16]. Positive sums greater than 0x7FFF are
+///saturated to 0x7FFF. Negative sums less than 0x8000 are saturated to
+///0x8000.
 ///
 /// \headerfile 
 ///
@@ -417,8 +419,8 @@
 
 /// \brief Horizontally subtracts the adjacent pairs of values contained in 2
 ///packed 128-bit vectors of [8 x i16]. Positive differences greater than
-///7FFFh are saturated to 7FFFh. Negative differences less than 8000h are
-///saturated to 8000h.
+///0x7FFF are saturated to 0x7FFF. Negative differences less than 0x8000 are
+///saturated to 0x8000.
 ///
 /// \headerfile 
 ///
@@ -442,8 +444,8 @@
 
 /// \brief Horizontally subtracts the adjacent pairs of values contained in 2
 ///packed 64-bit vectors of [4 x i16]. Positive differences greater than
-///7FFFh are saturated to 7FFFh. Negative differences less than 8000h are
-///saturated to 8000h.
+///0x7FFF are saturated to 0x7FFF. Negative differences less than 0x8000 are
+///saturated to 0x8000.
 ///
 /// \headerfile 
 ///
Index: cfe/trunk/lib/Headers/emmintrin.h
===
--- cfe/trunk/lib/Headers/emmintrin.h
+++ cfe/trunk/lib/Headers/emmintrin.h
@@ -422,8 +422,8 @@
 }
 
 /// \brief Compares each of the corresponding double-precision values of the
-///128-bit vectors of [2 x double] for equality. Each comparison yields 0h
-///for false, FFFFFFFFFFFFFFFFh for true.
+///128-bit vectors of [2 x double] for equality. Each comparison yields 0x0
+///for false, 0xFFFFFFFFFFFFFFFF for true.
 ///
 /// \headerfile 
 ///
@@ -443,7 +443,7 @@
 /// \brief Compares each of the corresponding double-precision values of the
 ///128-bit vectors of [2 x double] to determine if the values in the first
 ///operand are less than those in the second operand. Each comparison
-///yields 0h for false, FFFFFFFFFFFFFFFFh for true.
+///yields 0x0 for false, 0xFFFFFFFFFFFFFFFF for true.
 ///
 /// \headerfile 
 ///
@@ -464,7 +464,7 @@
 ///128-bit vectors of [2 x double] to determine if the values in the first
 ///operand are less than or equal to those in the second operand.
 ///
-///Each comparison yields 0h for false, FFFFFFFFFFFFFFFFh for true.
+///Each comparison yields 0x0 for false, 0xFFFFFFFFFFFFFFFF for true.
 ///
 /// \headerfile 
 ///
@@ -485,7 +485,7 @@
 ///128-bit vectors of [2 x double] to determine if the values in the first
 ///operand are greater than those in the second operand.
 ///
-///Each comparison yields 0h for false, FFFFFFFFFFFFFFFFh for true.
+///Each comparison yields 0x0 for false, 0xFFFFFFFFFFFFFFFF for true.
 ///
 /// \headerfile 
 ///
@@ -506,7 +506,7 @@
 ///128-bit vectors of [2 x double] to determine if the values in the first
 ///operand are greater than or equal to those in the second operand.
 ///
-///Each comparison yields 0h for false, FFFFFFFFFFFFFFFFh for true.
+///Each comparison yields 0x0 for false, 0xFFFFFFFFFFFFFFFF for true.
 ///
 /// \headerfile 
 ///
@@ -528,8 +528,8 @@
 ///operand are ordered with respect to those in the second operand.
 ///
 ///A pair of double-precision values are "ordered" with respect to each
-///other if neither value is a NaN. Each comparison yields 0h for false,
-///FFFFFFFFFFFFFFFFh for true.
+///other if neither value is a NaN. Each comparison yields 0x0 for false,
+///0xFFFFFFFFFFFFFFFF for true.
 ///
 /// \headerfile 
 ///
@@ -551,8 +551,8 @@
 ///operand are unordered with respect to those in the second operand.
 ///
 ///A 

[PATCH] D41888: [DOXYGEN] documentation changes to emmintrin.h and tmmintrin.h

2018-02-15 Thread Katya Romanova via Phabricator via cfe-commits
kromanova updated this revision to Diff 134537.
kromanova added a comment.

Doug Yung (the reviewer for this patch) noticed that some lines exceeded the 
80-character limit. I fixed that.


https://reviews.llvm.org/D41888

Files:
  lib/Headers/emmintrin.h
  lib/Headers/tmmintrin.h

Index: lib/Headers/tmmintrin.h
===
--- lib/Headers/tmmintrin.h
+++ lib/Headers/tmmintrin.h
@@ -276,8 +276,9 @@
 }
 
 /// \brief Horizontally adds the adjacent pairs of values contained in 2 packed
-///128-bit vectors of [8 x i16]. Positive sums greater than 7FFFh are
-///saturated to 7FFFh. Negative sums less than 8000h are saturated to 8000h.
+///128-bit vectors of [8 x i16]. Positive sums greater than 0x7FFF are
+///saturated to 0x7FFF. Negative sums less than 0x8000 are saturated to
+///0x8000.
 ///
 /// \headerfile 
 ///
@@ -300,8 +301,9 @@
 }
 
 /// \brief Horizontally adds the adjacent pairs of values contained in 2 packed
-///64-bit vectors of [4 x i16]. Positive sums greater than 7FFFh are
-///saturated to 7FFFh. Negative sums less than 8000h are saturated to 8000h.
+///64-bit vectors of [4 x i16]. Positive sums greater than 0x7FFF are
+///saturated to 0x7FFF. Negative sums less than 0x8000 are saturated to
+///0x8000.
 ///
 /// \headerfile 
 ///
@@ -417,8 +419,8 @@
 
 /// \brief Horizontally subtracts the adjacent pairs of values contained in 2
 ///packed 128-bit vectors of [8 x i16]. Positive differences greater than
-///7FFFh are saturated to 7FFFh. Negative differences less than 8000h are
-///saturated to 8000h.
+///0x7FFF are saturated to 0x7FFF. Negative differences less than 0x8000 are
+///saturated to 0x8000.
 ///
 /// \headerfile 
 ///
@@ -442,8 +444,8 @@
 
 /// \brief Horizontally subtracts the adjacent pairs of values contained in 2
 ///packed 64-bit vectors of [4 x i16]. Positive differences greater than
-///7FFFh are saturated to 7FFFh. Negative differences less than 8000h are
-///saturated to 8000h.
+///0x7FFF are saturated to 0x7FFF. Negative differences less than 0x8000 are
+///saturated to 0x8000.
 ///
 /// \headerfile 
 ///
Index: lib/Headers/emmintrin.h
===
--- lib/Headers/emmintrin.h
+++ lib/Headers/emmintrin.h
@@ -422,8 +422,8 @@
 }
 
 /// \brief Compares each of the corresponding double-precision values of the
-///128-bit vectors of [2 x double] for equality. Each comparison yields 0h
-///for false, FFFFFFFFFFFFFFFFh for true.
+///128-bit vectors of [2 x double] for equality. Each comparison yields 0x0
+///for false, 0xFFFFFFFFFFFFFFFF for true.
 ///
 /// \headerfile 
 ///
@@ -443,7 +443,7 @@
 /// \brief Compares each of the corresponding double-precision values of the
 ///128-bit vectors of [2 x double] to determine if the values in the first
 ///operand are less than those in the second operand. Each comparison
-///yields 0h for false, FFFFFFFFFFFFFFFFh for true.
+///yields 0x0 for false, 0xFFFFFFFFFFFFFFFF for true.
 ///
 /// \headerfile 
 ///
@@ -464,7 +464,7 @@
 ///128-bit vectors of [2 x double] to determine if the values in the first
 ///operand are less than or equal to those in the second operand.
 ///
-///Each comparison yields 0h for false, FFFFFFFFFFFFFFFFh for true.
+///Each comparison yields 0x0 for false, 0xFFFFFFFFFFFFFFFF for true.
 ///
 /// \headerfile 
 ///
@@ -485,7 +485,7 @@
 ///128-bit vectors of [2 x double] to determine if the values in the first
 ///operand are greater than those in the second operand.
 ///
-///Each comparison yields 0h for false, FFFFFFFFFFFFFFFFh for true.
+///Each comparison yields 0x0 for false, 0xFFFFFFFFFFFFFFFF for true.
 ///
 /// \headerfile 
 ///
@@ -506,7 +506,7 @@
 ///128-bit vectors of [2 x double] to determine if the values in the first
 ///operand are greater than or equal to those in the second operand.
 ///
-///Each comparison yields 0h for false, FFFFFFFFFFFFFFFFh for true.
+///Each comparison yields 0x0 for false, 0xFFFFFFFFFFFFFFFF for true.
 ///
 /// \headerfile 
 ///
@@ -528,8 +528,8 @@
 ///operand are ordered with respect to those in the second operand.
 ///
 ///A pair of double-precision values are "ordered" with respect to each
-///other if neither value is a NaN. Each comparison yields 0h for false,
-///FFFFFFFFFFFFFFFFh for true.
+///other if neither value is a NaN. Each comparison yields 0x0 for false,
+///0xFFFFFFFFFFFFFFFF for true.
 ///
 /// \headerfile 
 ///
@@ -551,8 +551,8 @@
 ///operand are unordered with respect to those in the second operand.
 ///
 ///A pair of double-precision values are "unordered" with respect to each
-///other if one or both values are NaN. Each comparison yields 0h for false,
-///FFFFFFFFFFFFFFFFh for true.
+///other if one or both values are NaN. Each comparison yields 0x0 

[PATCH] D28462: clang-format: Add new style option AlignConsecutiveMacros

2018-02-01 Thread Katya Romanova via Phabricator via cfe-commits
kromanova added subscribers: kromanova, alexfh.
kromanova added a comment.

We have a request for this clang-format feature at Sony.


Repository:
  rL LLVM

https://reviews.llvm.org/D28462



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D41523: xmmintrin.h documentation fixes and updates

2018-01-08 Thread Katya Romanova via Phabricator via cfe-commits
kromanova added a comment.

In https://reviews.llvm.org/D41523#968776, @craig.topper wrote:

> The builtins are tested in tests like test/CodeGen/sse-builtins.c,


Thank you!

I wonder whether -Wdocumentation is working.
I enabled it for a few tests, such as avx-builtins.c and sse-builtins.c, and 
re-ran them; everything passed with no errors.

I suspected that -Wdocumentation might not be catching errors, so I 
intentionally broke a few doxygen comments in the avxintrin.h header for 
_mm256_add_pd and _mm256_add_ps (the first intrinsics used in 
avx-builtins.c): first by mismatching the parameter names between the doxygen 
comments and the definitions, then by removing the doxygen section that 
describes the parameters, and finally by removing the entire doxygen comment 
for these intrinsics. However, -Wdocumentation -Werror didn't report any errors.

Am I missing something? What kinds of "documentation" problems does the 
-Wdocumentation option catch?
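
As a point of comparison, here is a minimal sketch of the kind of mismatch 
the option is meant to flag in ordinary user code. The function add_ints is 
hypothetical. Checking the file with clang -fsyntax-only -Wdocumentation 
should produce a warning about the \param name that does not match any 
parameter. One thing that may be worth checking (an assumption, not 
something verified in this thread): clang normally suppresses warnings that 
originate in system headers, and the intrinsic headers are usually found 
through a system include path, which could explain the silence.

```c
#include <assert.h>

/// Adds two integers.
///
/// \param lhs
///    The first operand.
/// \param rsh
///    The second operand. (Deliberate typo: the parameter is named rhs.)
/// \returns The sum of the two operands.
static int add_ints(int lhs, int rhs) {
  return lhs + rhs;
}

/* The comment bug does not affect behaviour; only the diagnostics differ. */
static void wdoc_demo(void) {
  assert(add_ints(2, 3) == 5);
}
```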


https://reviews.llvm.org/D41523





[PATCH] D41516: emmintrin.h documentation fixes and updates

2018-01-08 Thread Katya Romanova via Phabricator via cfe-commits
kromanova added inline comments.



Comment at: cfe/trunk/lib/Headers/emmintrin.h:4683
 ///
-/// This intrinsic has no corresponding instruction.
+/// This intrinsic corresponds to the  MOVDQ2Q  instruction.
 ///

efriedma wrote:
> kromanova wrote:
> > kromanova wrote:
> > > I'm not sure about this change.
> > > 
> > > Intel documentation says they generate MOVDQ2Q (don't have icc handy to 
> > > try).
> > > However, I've tried on Linux/X86_64 with clang and gcc, - and we just 
> > > return.
> > > 
> > Though I suspect it's possible to generate movdq2q, I couldn't come up with 
> > a test to trigger this instruction generation.
> > Should we revert this change?
> > 
> > 
> > ```
> > __m64 fooepi64_pi64 (__m128i a, __m128 c)
> > {
> >   __m64 x;
> > 
> >   x = _mm_movepi64_pi64 (a);
> >   return x;
> > }
> > 
> > ```
> > 
> > On Linux we generate just a return instruction. 
> > I would expect (v)movq %xmm0,%rax to be generated instead of just retq. 
> > Am I missing something? Why do we return a 64-bit integer in an xmm register 
> > rather than in %rax?
> > 
> The x86-64 calling convention rules say that __m64 is passed/returned in SSE 
> registers.
> 
> Try the following, which generates movdq2q:
> ```
> __m64 foo(__m128i a, __m128 c)
> {
>   return _mm_add_pi8(_mm_movepi64_pi64(a), _mm_set1_pi8(5));
> }
> ```
Thanks! That explains it :)
I can see that MOVDQ2Q gets generated. 

What about the intrinsic below, _mm_movpi64_epi64? Can we ever generate 
MOVD+VMOVQ as stated in the review? 
Or should we write VMOVQ / MOVQ?
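
For anyone who wants to reproduce this, here is a self-contained variant of 
the test above. It is a sketch that assumes an x86-64 target; the function 
name low_bytes_plus_five is made up. Whether movdq2q actually shows up in the 
assembly depends on the compiler version and flags, so the sketch only checks 
the computed value.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_movepi64_pi64, _mm_set_epi64x */
#include <mmintrin.h>  /* MMX: _mm_add_pi8, _mm_set1_pi8, _mm_empty */

/* Moves the low 64 bits of an SSE value into an MMX value and adds 5 to each
   byte. The MMX add gives the compiler a reason to place the value in an MMX
   register instead of leaving it in %xmm0. */
static long long low_bytes_plus_five(__m128i a) {
  __m64 r = _mm_add_pi8(_mm_movepi64_pi64(a), _mm_set1_pi8(5));
  long long out = _mm_cvtm64_si64(r);
  _mm_empty(); /* leave MMX state before any x87 code runs */
  return out;
}

static void movdq2q_demo(void) {
  /* _mm_set_epi64x takes the high half first, then the low half. */
  __m128i a = _mm_set_epi64x(0x1111111111111111LL, 0x0102030405060708LL);
  assert(low_bytes_plus_five(a) == 0x060708090A0B0C0DLL);
}
```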


Repository:
  rL LLVM

https://reviews.llvm.org/D41516





[PATCH] D41516: emmintrin.h documentation fixes and updates

2018-01-08 Thread Katya Romanova via Phabricator via cfe-commits
kromanova added inline comments.
Herald added a subscriber: llvm-commits.



Comment at: cfe/trunk/lib/Headers/emmintrin.h:3865
 ///
-/// This intrinsic corresponds to the  VPUNPCKLQDQ / PUNPCKLQDQ 
-///   instruction.
+/// This intrinsic does not correspond to a specific instruction.
 ///

It's better to use the same wording as many of the intrinsics before and after 
this one, for consistency:

```
/// This intrinsic is a utility function and does not correspond to a specific
///instruction.
```


Comment at: cfe/trunk/lib/Headers/emmintrin.h:4683
 ///
-/// This intrinsic has no corresponding instruction.
+/// This intrinsic corresponds to the  MOVDQ2Q  instruction.
 ///

kromanova wrote:
> I'm not sure about this change.
> 
> Intel documentation says they generate MOVDQ2Q (don't have icc handy to try).
> However, I've tried on Linux/X86_64 with clang and gcc, - and we just return.
> 
Though I suspect it's possible to generate movdq2q, I couldn't come up with a 
test to trigger this instruction generation.
Should we revert this change?


```
__m64 fooepi64_pi64 (__m128i a, __m128 c)
{
  __m64 x;

  x = _mm_movepi64_pi64 (a);
  return x;
}

```

On Linux we generate just a return instruction. 
I would expect (v)movq %xmm0,%rax to be generated instead of just retq. 
Am I missing something? Why do we return a 64-bit integer in an xmm register 
rather than in %rax?




Comment at: cfe/trunk/lib/Headers/emmintrin.h:4700
 ///
-/// This intrinsic corresponds to the  VMOVQ / MOVQ / MOVD  instruction.
+/// This intrinsic corresponds to the  MOVD+VMOVQ  instruction.
 ///

For Linux x86_64 I can only generate VMOVQ or MOVQ, for the AVX and non-AVX 
cases respectively.
Can we even generate MOVD+VMOVQ?
How do we want to document this intrinsic?


I have a similar question as above.
```
__m128i foopi64_epi64 (__m64 a)
{
  __m128i x;

  x = _mm_movpi64_epi64 (a);
  return x;
}
```

Why do we generate this code
```
vmovq   %xmm0, %rax
vmovq   %rax, %xmm0
retq
```
instead of something simple like vmovq %rdi, %xmm0?
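
Whichever instructions get selected, the observable behaviour of 
_mm_movpi64_epi64 is that the 64-bit operand lands in the lower half of the 
result and the upper half is zeroed. A minimal sketch that checks this, 
assuming an x86-64 target; the helper name widen_m64 is made up for 
illustration:

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_movpi64_epi64, _mm_storeu_si128 */
#include <mmintrin.h>  /* MMX: _mm_cvtsi64_m64, _mm_empty */

/* Widens a 64-bit value to 128 bits through _mm_movpi64_epi64 and stores the
   two 64-bit halves to out[0] (low) and out[1] (high). */
static void widen_m64(long long in, long long out[2]) {
  __m64 m = _mm_cvtsi64_m64(in);
  __m128i x = _mm_movpi64_epi64(m);
  _mm_empty(); /* leave MMX state before any x87 code runs */
  _mm_storeu_si128((__m128i *)out, x);
}

static void movpi64_demo(void) {
  long long halves[2];
  widen_m64(0x0123456789ABCDEFLL, halves);
  assert(halves[0] == 0x0123456789ABCDEFLL); /* low half is the operand */
  assert(halves[1] == 0);                    /* high half is zeroed */
}
```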





Repository:
  rL LLVM

https://reviews.llvm.org/D41516





[PATCH] D41507: avxintrin.h documentation fixes and updates

2018-01-08 Thread Katya Romanova via Phabricator via cfe-commits
kromanova accepted this revision.
kromanova added a comment.

LGTM too.


https://reviews.llvm.org/D41507





[PATCH] D41523: xmmintrin.h documentation fixes and updates

2018-01-05 Thread Katya Romanova via Phabricator via cfe-commits
kromanova added a comment.

In https://reviews.llvm.org/D41523#968359, @RKSimon wrote:

> Sort of related - should we enable -Wdocumentation (it's currently -Wall and 
> -Weverything might be too much) on the respective clang builtin tests? 
> Doesn't have to be part of this patch.


Good idea, Simon. I will give it a shot. Could you please let me know the name 
of the directory, where clang builtin tests are located?


https://reviews.llvm.org/D41523





[PATCH] D41516: emmintrin.h documentation fixes and updates

2018-01-04 Thread Katya Romanova via Phabricator via cfe-commits
kromanova added inline comments.



Comment at: cfe/trunk/lib/Headers/emmintrin.h:1143
 ///
-///If either of the two lower double-precision values is NaN, 1 is returned.
+///If either of the two lower double-precision values is NaN, 0 is returned.
 ///

The formatting is inconsistent with the rest of the changes above and below. 
Here a single sentence is set off by empty lines, whereas everywhere else the 
paragraph has two sentences.



Comment at: cfe/trunk/lib/Headers/emmintrin.h:4683
 ///
-/// This intrinsic has no corresponding instruction.
+/// This intrinsic corresponds to the  MOVDQ2Q  instruction.
 ///

I'm not sure about this change.

Intel's documentation says they generate MOVDQ2Q (I don't have icc handy to try).
However, I've tried on Linux/x86_64 with clang and gcc, and we just return.



Repository:
  rL LLVM

https://reviews.llvm.org/D41516





[PATCH] D41523: xmmintrin.h documentation fixes and updates

2018-01-04 Thread Katya Romanova via Phabricator via cfe-commits
kromanova added inline comments.



Comment at: lib/Headers/xmmintrin.h:2199
 ///
-/// This intrinsic corresponds to the  VPINSRW / PINSRW  instruction.
+/// This intrinsic corresponds to the  PINSRW  instruction.
 ///

craig.topper wrote:
> Why is VPINSRW removed?
I suspect the rationale is the same one I described in the mmintrin.h review. 
This intrinsic should use MMX registers and shouldn't have corresponding AVX 
instruction(s).

I've tried this with and without -mavx for Linux/x86_64, and we generate PINSRW 
in both cases (i.e. I wasn't able to trigger generation of a VEX-prefixed 
instruction).

__m64 foo (__m64 a, int b)
{
  __m64 x;
  x = _mm_insert_pi16 (a, b, 0);
  return x;
}




Comment at: lib/Headers/xmmintrin.h:2659
 ///
-/// This intrinsic corresponds to the  VMOVSS / MOVSS  instruction.
+/// This intrinsic corresponds to the  VBLENDPS / BLENDPS  instruction.
 ///

craig.topper wrote:
> MOVSS is correct for pre SSE4.1 targets.
That's correct.
Doug, I think we should write:
VBLENDPS / BLENDPS / MOVSS
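
Whichever of these instructions the compiler selects, the result should be 
the same. A minimal sketch of the lane behaviour, assuming the intrinsic under 
discussion is _mm_move_ss (the comment only gives a line number) and an x86 
target with SSE; the helper name move_ss_lanes is made up. Compiling it with 
and without -msse4.1 and inspecting the assembly is one way to see which 
instruction is chosen.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Applies _mm_move_ss to two 4-float arrays: lane 0 of the result comes from
   b, lanes 1..3 come from a. */
static void move_ss_lanes(const float a[4], const float b[4], float out[4]) {
  __m128 r = _mm_move_ss(_mm_loadu_ps(a), _mm_loadu_ps(b));
  _mm_storeu_ps(out, r);
}

static void move_ss_demo(void) {
  const float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
  const float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
  float out[4];

  move_ss_lanes(a, b, out);
  assert(out[0] == 10.0f); /* taken from b */
  assert(out[1] == 2.0f);  /* taken from a */
  assert(out[2] == 3.0f);
  assert(out[3] == 4.0f);
}
```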


https://reviews.llvm.org/D41523





[PATCH] D41517: mmintrin.h documentation fixes and updates

2018-01-04 Thread Katya Romanova via Phabricator via cfe-commits
kromanova added inline comments.



Comment at: lib/Headers/mmintrin.h:55
 ///
-/// This intrinsic corresponds to the  VMOVD / MOVD  instruction.
+/// This intrinsic corresponds to the  MOVD  instruction.
 ///

I tried clang on Linux x86_64: if the -mavx option is passed, we generate 
VMOVD; if it is omitted, we generate MOVD.
I think I understand the rationale behind this change (namely, to keep MOVD but 
remove VMOVD),
since this intrinsic should use MMX registers and shouldn't have a corresponding 
AVX instruction.

However, this is what we generate at the moment when -mavx is passed (I suspect 
because our MMX support is limited):
vmovd   %edi, %xmm0

Since we are writing the documentation for the clang compiler, we should document 
what the clang compiler is doing, not what it should be doing.
Craig, what do you think? Should we revert to VMOVD/MOVD?




Comment at: lib/Headers/mmintrin.h:72
 ///
-/// This intrinsic corresponds to the  VMOVD / MOVD  instruction.
+/// This intrinsic corresponds to the  MOVD  instruction.
 ///

Same as above.



Comment at: lib/Headers/mmintrin.h:88
 ///
-/// This intrinsic corresponds to the  VMOVQ / MOVD  instruction.
+/// This intrinsic corresponds to the  MOVD  instruction.
 ///

craig.topper wrote:
> Shouldn't this be MOVQ?
Yes, that's correct (it should be MOVQ), plus the same question as above about 
whether we should keep VMOVQ/MOVQ.



Comment at: lib/Headers/mmintrin.h:104
 ///
-/// This intrinsic corresponds to the  VMOVQ / MOVD  instruction.
+/// This intrinsic corresponds to the  MOVD  instruction.
 ///

craig.topper wrote:
> Shouldn't this be MOVQ?
Same as above.


https://reviews.llvm.org/D41517





[PATCH] D41517: mmintrin.h documentation fixes and updates

2018-01-03 Thread Katya Romanova via Phabricator via cfe-commits
kromanova added inline comments.



Comment at: lib/Headers/mmintrin.h:1292
 ///
-/// This intrinsic corresponds to the  VXORPS / XORPS  instruction.
+/// This intrinsic corresponds to the  XOR  instruction.
 ///

craig.topper wrote:
> PXOR?
For which platform/compiler? 
 
I checked: for x86_64 Linux, XORPS (without AVX) or VXORPS (with -mavx) is 
generated.
For PS4 we generate XORL.

I guess we need to write something more generic, implying that an appropriate 
platform-specific XOR instruction is generated.



Comment at: lib/Headers/mmintrin.h:1384
 ///
-/// This intrinsic corresponds to the  VPSHUFD / PSHUFD  instruction.
+/// This intrinsic corresponds to the  PSHUFD  instruction.
 ///

craig.topper wrote:
> This is overly specific there is no guarantee we'd use those instructions. If 
> it was a constant we'd probably just use a load.
That's right. I think we should use the following wording to match other 
_mm_set* intrinsics documentation in this file.

/// This intrinsic is a utility function and does not correspond to a specific
///instruction.



https://reviews.llvm.org/D41517





[PATCH] D41507: avxintrin.h documentation fixes and updates

2017-12-21 Thread Katya Romanova via Phabricator via cfe-commits
kromanova added inline comments.



Comment at: lib/Headers/avxintrin.h:1668
 ///operation to use: \n
-///0x00 : Equal (ordered, non-signaling)
-///0x01 : Less-than (ordered, signaling)
-///0x02 : Less-than-or-equal (ordered, signaling)
-///0x03 : Unordered (non-signaling)
-///0x04 : Not-equal (unordered, non-signaling)
-///0x05 : Not-less-than (unordered, signaling)
-///0x06 : Not-less-than-or-equal (unordered, signaling)
-///0x07 : Ordered (non-signaling)
-///0x08 : Equal (unordered, non-signaling)
-///0x09 : Not-greater-than-or-equal (unordered, signaling)
-///0x0a : Not-greater-than (unordered, signaling)
-///0x0b : False (ordered, non-signaling)
-///0x0c : Not-equal (ordered, non-signaling)
-///0x0d : Greater-than-or-equal (ordered, signaling)
-///0x0e : Greater-than (ordered, signaling)
-///0x0f : True (unordered, non-signaling)
-///0x10 : Equal (ordered, signaling)
-///0x11 : Less-than (ordered, non-signaling)
-///0x12 : Less-than-or-equal (ordered, non-signaling)
-///0x13 : Unordered (signaling)
-///0x14 : Not-equal (unordered, signaling)
-///0x15 : Not-less-than (unordered, non-signaling)
-///0x16 : Not-less-than-or-equal (unordered, non-signaling)
-///0x17 : Ordered (signaling)
-///0x18 : Equal (unordered, signaling)
-///0x19 : Not-greater-than-or-equal (unordered, non-signaling)
-///0x1a : Not-greater-than (unordered, non-signaling)
-///0x1b : False (ordered, signaling)
-///0x1c : Not-equal (ordered, signaling)
-///0x1d : Greater-than-or-equal (ordered, non-signaling)
-///0x1e : Greater-than (ordered, non-signaling)
-///0x1f : True (unordered, signaling)
+///00h: Equal (ordered, non-signaling) \n
+///01h: Less-than (ordered, signaling) \n

lebedev.ri wrote:
> While i'm incompetent to actually review this, i have a question.
> //Why// replace code-friendly `0x00` with `00h`?
> And, //why// is this hex in the first place? Why not decimals?
> 
>> why is this hex in the first place? Why not decimals?
There are a few reasons that I could think of: 

(1) for consistency with the defines in the header file describing these values:
#define _CMP_EQ_OQ    0x00 /* Equal (ordered, non-signaling)  */
#define _CMP_LT_OS    0x01 /* Less-than (ordered, signaling)  */
#define _CMP_LE_OS    0x02 /* Less-than-or-equal (ordered, signaling)  */
#define _CMP_UNORD_Q  0x03 /* Unordered (non-signaling)  */
#define _CMP_NEQ_UQ   0x04 /* Not-equal (unordered, non-signaling)  */

(2) This immediate is 5 bits, so when using hex, it's much easier to tell which 
bits are set or not set.
(3) For consistency with Intel's and AMD's documentation for intrinsics and 
corresponding instructions.


If developers prefer the 0x format, note that this change was made for 
consistency: the AMD and Intel manuals use the 'h' suffix style.
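
To illustrate point (2): in the table quoted above, each entry in the range 
0x10..0x1f is the same comparison as the entry 0x10 below it with the 
signaling behaviour flipped, which is easy to see in hex and much less obvious 
in decimal. A minimal sketch; the CMP_ names and both helper functions are 
local to this example, and only the numeric values come from the table.

```c
#include <assert.h>

/* Values copied from the table quoted above. */
enum {
  CMP_EQ_OQ = 0x00, /* Equal (ordered, non-signaling) */
  CMP_LT_OS = 0x01, /* Less-than (ordered, signaling) */
  CMP_EQ_OS = 0x10, /* Equal (ordered, signaling) */
  CMP_LT_OQ = 0x11  /* Less-than (ordered, non-signaling) */
};

/* Bit 4 (0x10) selects the other signaling variant of the same comparison. */
static int toggle_signaling(int predicate) {
  return predicate ^ 0x10;
}

/* The predicate is a 5-bit immediate, so valid values are 0x00..0x1f. */
static int is_valid_predicate(int predicate) {
  return (predicate & ~0x1f) == 0;
}

static void predicate_demo(void) {
  assert(toggle_signaling(CMP_EQ_OQ) == CMP_EQ_OS);
  assert(toggle_signaling(CMP_LT_OQ) == CMP_LT_OS);
  assert(is_valid_predicate(0x1f));
  assert(!is_valid_predicate(0x20));
}
```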



Comment at: lib/Headers/avxintrin.h:1668
 ///operation to use: \n
-///0x00 : Equal (ordered, non-signaling)
-///0x01 : Less-than (ordered, signaling)
-///0x02 : Less-than-or-equal (ordered, signaling)
-///0x03 : Unordered (non-signaling)
-///0x04 : Not-equal (unordered, non-signaling)
-///0x05 : Not-less-than (unordered, signaling)
-///0x06 : Not-less-than-or-equal (unordered, signaling)
-///0x07 : Ordered (non-signaling)
-///0x08 : Equal (unordered, non-signaling)
-///0x09 : Not-greater-than-or-equal (unordered, signaling)
-///0x0a : Not-greater-than (unordered, signaling)
-///0x0b : False (ordered, non-signaling)
-///0x0c : Not-equal (ordered, non-signaling)
-///0x0d : Greater-than-or-equal (ordered, signaling)
-///0x0e : Greater-than (ordered, signaling)
-///0x0f : True (unordered, non-signaling)
-///0x10 : Equal (ordered, signaling)
-///0x11 : Less-than (ordered, non-signaling)
-///0x12 : Less-than-or-equal (ordered, non-signaling)
-///0x13 : Unordered (signaling)
-///0x14 : Not-equal (unordered, signaling)
-///0x15 : Not-less-than (unordered, non-signaling)
-///0x16 : Not-less-than-or-equal (unordered, non-signaling)
-///0x17 : Ordered (signaling)
-///0x18 : Equal (unordered, signaling)
-///0x19 : Not-greater-than-or-equal (unordered, non-signaling)
-///0x1a : Not-greater-than (unordered, non-signaling)
-///0x1b : False (ordered, signaling)
-///0x1c : Not-equal (ordered, signaling)
-///0x1d : Greater-than-or-equal (ordered, non-signaling)
-///0x1e : Greater-than (ordered, non-signaling)
-///0x1f : True (unordered, signaling)
+///00h: Equal (ordered, non-signaling) \n
+///01h: Less-than (ordered, signaling) \n

kromanova wrote:
> lebedev.ri wrote:
> > While i'm incompetent to actually review this, i have a question.
> > //Why// replace code-friendly `0x00` with `00h`?
> > And, //why// is this hex in the first place? Why not decimals?
> > 
> >> why is this hex 

[PATCH] D28503: Documentation for the newly added x86 intrinsics.

2017-01-11 Thread Katya Romanova via Phabricator via cfe-commits
kromanova updated this revision to Diff 84038.
kromanova added a comment.

Changed the instruction name from  VMOVSD to  VMOVQ for _mm_loadu_si64


Repository:
  rL LLVM

https://reviews.llvm.org/D28503

Files:
  avxintrin.h
  emmintrin.h
  mmintrin.h
  pmmintrin.h
  xmmintrin.h

Index: xmmintrin.h
===
--- xmmintrin.h
+++ xmmintrin.h
@@ -2067,7 +2067,7 @@
 ///_MM_HINT_T1: Move data using the T1 hint. The PREFETCHT1 instruction will
 ///be generated. \n
 ///_MM_HINT_T2: Move data using the T2 hint. The PREFETCHT2 instruction will
-///be generated.   
+///be generated.
 #define _mm_prefetch(a, sel) (__builtin_prefetch((void *)(a), 0, (sel)))
 #endif
 
@@ -2435,17 +2435,17 @@
 ///  For checking exception masks: _MM_MASK_UNDERFLOW, _MM_MASK_OVERFLOW,
 ///  _MM_MASK_INVALID, _MM_MASK_DENORM, _MM_MASK_DIV_ZERO, _MM_MASK_INEXACT.
 ///  There is a convenience wrapper _MM_GET_EXCEPTION_MASK().
-///
+///
 ///
 ///  For checking rounding modes: _MM_ROUND_NEAREST, _MM_ROUND_DOWN,
 ///  _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO. There is a convenience wrapper
 ///  _MM_GET_ROUNDING_MODE(x) where x is one of these macros.
 ///
-/// 
+///
 ///  For checking flush-to-zero mode: _MM_FLUSH_ZERO_ON, _MM_FLUSH_ZERO_OFF.
 ///  There is a convenience wrapper _MM_GET_FLUSH_ZERO_MODE().
 ///
-/// 
+///
 ///  For checking denormals-are-zero mode: _MM_DENORMALS_ZERO_ON,
 ///  _MM_DENORMALS_ZERO_OFF. There is a convenience wrapper
 ///  _MM_GET_DENORMALS_ZERO_MODE().
@@ -2468,11 +2468,11 @@
 unsigned int _mm_getcsr(void);
 
 /// \brief Sets the MXCSR register with the 32-bit unsigned integer value.
-///   
+///
 ///There are several groups of macros associated with this intrinsic,
 ///including:
 ///
-/// 
+///
 ///  For setting exception states: _MM_EXCEPT_INVALID, _MM_EXCEPT_DIV_ZERO,
 ///  _MM_EXCEPT_DENORM, _MM_EXCEPT_OVERFLOW, _MM_EXCEPT_UNDERFLOW,
 ///  _MM_EXCEPT_INEXACT. There is a convenience wrapper
@@ -2517,7 +2517,7 @@
 ///
 /// \param __i
 ///A 32-bit unsigned integer value to be written to the MXCSR register.
-void _mm_setcsr(unsigned int);
+void _mm_setcsr(unsigned int __i);
 
 #if defined(__cplusplus)
 } // extern "C"
Index: pmmintrin.h
===
--- pmmintrin.h
+++ pmmintrin.h
@@ -115,7 +115,7 @@
 
 /// \brief Moves and duplicates high-order (odd-indexed) values from a 128-bit
 ///vector of [4 x float] to float values stored in a 128-bit vector of
-///[4 x float]. 
+///[4 x float].
 ///
 /// \headerfile 
 ///
@@ -136,7 +136,7 @@
 }
 
 /// \brief Duplicates low-order (even-indexed) values from a 128-bit vector of
-///[4 x float] to float values stored in a 128-bit vector of [4 x float]. 
+///[4 x float] to float values stored in a 128-bit vector of [4 x float].
 ///
 /// \headerfile 
 ///
Index: mmintrin.h
===
--- mmintrin.h
+++ mmintrin.h
@@ -211,7 +211,7 @@
 /// This intrinsic corresponds to the  PUNPCKHBW  instruction.
 ///
 /// \param __m1
-///A 64-bit integer vector of [8 x i8]. \n 
+///A 64-bit integer vector of [8 x i8]. \n
 ///Bits [39:32] are written to bits [7:0] of the result. \n
 ///Bits [47:40] are written to bits [23:16] of the result. \n
 ///Bits [55:48] are written to bits [39:32] of the result. \n
Index: emmintrin.h
===
--- emmintrin.h
+++ emmintrin.h
@@ -1599,6 +1599,17 @@
   return ((struct __loadu_pd*)__dp)->__v;
 }
 
+/// \brief Loads a 64-bit integer value to the low element of a 128-bit integer
+///vector and clears the upper element.
+///
+/// \headerfile 
+///
+/// This intrinsic corresponds to the  VMOVQ / MOVQ  instruction.
+///
+/// \param __dp
+///A pointer to a 64-bit memory location. The address of the memory
+///location does not have to be aligned.
+/// \returns A 128-bit vector of [2 x i64] containing the loaded value.
 static __inline__ __m128i __DEFAULT_FN_ATTRS
 _mm_loadu_si64(void const *__a)
 {
@@ -1609,6 +1620,17 @@
   return (__m128i){__u, 0L};
 }
 
+/// \brief Loads a 64-bit double-precision value to the low element of a
+///128-bit integer vector and clears the upper element.
+///
+/// \headerfile 
+///
+/// This intrinsic corresponds to the  VMOVSD / MOVSD  instruction.
+///
+/// \param __dp
+///An pointer to a memory location containing a double-precision value.
+///The address of the memory location does not have to be aligned.
+/// \returns A 128-bit vector of [2 x double] containing the loaded value.
 static __inline__ __m128d __DEFAULT_FN_ATTRS
 _mm_load_sd(double const *__dp)
 {
@@ -4019,7 +4041,7 @@
 /// \param __p
 ///A pointer to the memory location used to identify the cache line 

[PATCH] D28503: Documentation for the newly added x86 intrinsics.

2017-01-11 Thread Katya Romanova via Phabricator via cfe-commits
kromanova added inline comments.



Comment at: emmintrin.h:1607
+///
+/// This intrinsic corresponds to the  VMOVSD / MOVSD  instruction.
+///

RKSimon wrote:
> kromanova wrote:
> > kromanova wrote:
> > > kromanova wrote:
> > > > probinson wrote:
> > > > > should this be VMOVQ/MOVQ instead?
> > > > Probably yes. Let me know if you have a different opinion.
> > > >  
> > > > If I use this intrinsic by itself, clang generates a VMOVSD instruction.
> > > > This happens because the default domain is chosen to generate smaller
> > > > instruction code.
> > > > I got confused because I couldn't find Intel's documentation about
> > > > _mm_loadu_si64, so I just wrote a test like the one below and looked
> > > > at what instructions got generated.
> > > > 
> > > > ```
> > > > __m128i foo22 (void const * __a)
> > > > {
> > > >   return _mm_loadu_si64 (__a);
> > > > }
> > > > ```
> > > > 
> > > > However, if I change the test and use an intrinsic to add two 64-bit
> > > > integers after the load intrinsics, I can see that a VMOVQ instruction
> > > > gets generated.
> > > > 
> > > > ```
> > > > __m128i foo44 (double const * __a)
> > > > {
> > > >   __m128i first  = _mm_loadu_si64 (__a);
> > > >   __m128i second = _mm_loadu_si64 (__a);
> > > >   return _mm_add_epi64(first, second);
> > > > 
> > > > }
> > > > ```
> > > > 
> > > > So, as you see clang could generate either VMOVSD/MOVSD or 
> > > > VMOVSQ/MOVSQ. I think it makes sense to change the documentation as 
> > > > Paul suggested:
> > > > 
> > > > /// This intrinsic corresponds to the VMOVSQ/MOVSQ.
> > > > 
> > > > Or, alternatively, we could list all the instructions that correspond
> > > > to this intrinsic:
> > > > 
> > > > /// This intrinsic corresponds to the VMOVSQ/MOVSQ/VMOVSD/MOVSD.
> > > > 
> > > >   
> > > It will be interesting to hear Asaf Badoug's opinion, since he added this
> > > intrinsic. He probably has access to Intel's documentation for this
> > > intrinsic too (which I wasn't able to find online).
> > There is a similar situation for one intrinsic just a few lines above,
> > namely _mm_loadu_pd. It could generate either VMOVUPD / MOVUPD or
> > VMOVUPS/MOVUPS instructions.
> > I actually asked Simon a question about it offline just a couple of days
> > ago.
> > 
> > I decided to keep referring to VMOVUPD / MOVUPD as the corresponding
> > instruction for _mm_loadu_pd. However, if we end up doing things
> > differently for _mm_loadu_si64, we need to make a similar change to
> > _mm_loadu_pd (and probably to some other intrinsics).
> It should be VMOVQ/MOVQ (note NOT VMOVSQ/MOVSQ!). Whatever the domain fixup
> code does to it, that was the original intent of the code and matches what
> other compilers say it will (probably) be.
Yep, sorry, inaccurate editing after copy and paste. Thank you for noticing.
I agree it should say VMOVQ/MOVQ (similar to what is done for _mm_loadu_pd that
we discussed a few days ago).

I will do this change and reload the review shortly.
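
For reference, the behavior that the new doc comment describes for _mm_loadu_si64 (load 64 bits from a possibly unaligned address into the low element, clear the upper element) can be modeled in portable C. This is only a sketch under stated assumptions: `vec128i_model`, `loadu_si64_model` and `loadu_si64_model_demo` are hypothetical names that are not part of emmintrin.h, and the model says nothing about which instruction (VMOVQ or VMOVSD) a compiler ends up selecting.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __m128i viewed as [2 x i64]; not from emmintrin.h. */
typedef struct {
  int64_t lane[2];
} vec128i_model;

/* Models the documented behavior of _mm_loadu_si64: read 64 bits from a
   possibly unaligned address into the low element and clear the upper one. */
static vec128i_model loadu_si64_model(const void *p) {
  vec128i_model v;
  memcpy(&v.lane[0], p, sizeof v.lane[0]); /* memcpy tolerates any alignment */
  v.lane[1] = 0;
  return v;
}

/* Usage: load from a deliberately misaligned offset inside a byte buffer. */
static void loadu_si64_model_demo(void) {
  unsigned char buf[16] = {0};
  int64_t value = 0x1122334455667788LL;
  memcpy(buf + 1, &value, sizeof value);
  vec128i_model v = loadu_si64_model(buf + 1);
  assert(v.lane[0] == value); /* low element holds the loaded value */
  assert(v.lane[1] == 0);     /* upper element is cleared */
}
```

The model uses memcpy rather than a pointer cast so that the unaligned read is well defined in standard C, which mirrors the "does not have to be aligned" wording in the \param description.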


Repository:
  rL LLVM

https://reviews.llvm.org/D28503



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D28503: Documentation for the newly added x86 intrinsics.

2017-01-11 Thread Katya Romanova via Phabricator via cfe-commits
kromanova added inline comments.



Comment at: emmintrin.h:1607
+///
+/// This intrinsic corresponds to the  VMOVSD / MOVSD  instruction.
+///

probinson wrote:
> should this be VMOVQ/MOVQ instead?
Probably yes. Let me know if you have a different opinion.
 
If I use this intrinsic by itself, clang generates a VMOVSD instruction.
This happens because the default domain is chosen to generate smaller
instruction code.
I got confused because I couldn't find Intel's documentation about
_mm_loadu_si64, so I just wrote a test like the one below and looked at what
instructions got generated.

```
__m128i foo22 (void const * __a)
{
  return _mm_loadu_si64 (__a);
}
```

However, if I change the test and use an intrinsic to add two 64-bit integers
after the load intrinsics, I can see that a VMOVQ instruction gets generated.

```
__m128i foo44 (double const * __a)
{
  __m128i first  = _mm_loadu_si64 (__a);
  __m128i second = _mm_loadu_si64 (__a);
  return _mm_add_epi64(first, second);

}
```

So, as you see clang could generate either VMOVSD/MOVSD or VMOVSQ/MOVSQ. I 
think it makes sense to change the documentation as Paul suggested:

/// This intrinsic corresponds to the VMOVSQ/MOVSQ.

Or, alternatively, we could list all the instructions that correspond to this
intrinsic:

/// This intrinsic corresponds to the VMOVSQ/MOVSQ/VMOVSD/MOVSD.

  



Comment at: emmintrin.h:1607
+///
+/// This intrinsic corresponds to the  VMOVSD / MOVSD  instruction.
+///

kromanova wrote:
> probinson wrote:
> > should this be VMOVQ/MOVQ instead?
> Probably yes. Let me know if you have a different opinion.
>  
> If I use this intrinsic by itself, clang generates a VMOVSD instruction.
> This happens because the default domain is chosen to generate smaller
> instruction code.
> I got confused because I couldn't find Intel's documentation about
> _mm_loadu_si64, so I just wrote a test like the one below and looked at what
> instructions got generated.
> 
> ```
> __m128i foo22 (void const * __a)
> {
>   return _mm_loadu_si64 (__a);
> }
> ```
> 
> However, if I change the test and use an intrinsic to add two 64-bit integers
> after the load intrinsics, I can see that a VMOVQ instruction gets generated.
> 
> ```
> __m128i foo44 (double const * __a)
> {
>   __m128i first  = _mm_loadu_si64 (__a);
>   __m128i second = _mm_loadu_si64 (__a);
>   return _mm_add_epi64(first, second);
> 
> }
> ```
> 
> So, as you see clang could generate either VMOVSD/MOVSD or VMOVSQ/MOVSQ. I 
> think it makes sense to change the documentation as Paul suggested:
> 
> /// This intrinsic corresponds to the VMOVSQ/MOVSQ.
> 
> Or, alternatively, we could list all the instructions that correspond to this
> intrinsic:
> 
> /// This intrinsic corresponds to the VMOVSQ/MOVSQ/VMOVSD/MOVSD.
> 
>   
It will be interesting to hear Asaf Badoug's opinion, since he added this
intrinsic. He probably has access to Intel's documentation for this intrinsic
too (which I wasn't able to find online).



Comment at: emmintrin.h:1607
+///
+/// This intrinsic corresponds to the  VMOVSD / MOVSD  instruction.
+///

kromanova wrote:
> kromanova wrote:
> > probinson wrote:
> > > should this be VMOVQ/MOVQ instead?
> > Probably yes. Let me know if you have a different opinion.
> >  
> > If I use this intrinsic by itself, clang generates a VMOVSD instruction.
> > This happens because the default domain is chosen to generate smaller
> > instruction code.
> > I got confused because I couldn't find Intel's documentation about
> > _mm_loadu_si64, so I just wrote a test like the one below and looked at what
> > instructions got generated.
> > 
> > ```
> > __m128i foo22 (void const * __a)
> > {
> >   return _mm_loadu_si64 (__a);
> > }
> > ```
> > 
> > However, if I change the test and use an intrinsic to add two 64-bit integers
> > after the load intrinsics, I can see that a VMOVQ instruction gets generated.
> > 
> > ```
> > __m128i foo44 (double const * __a)
> > {
> >   __m128i first  = _mm_loadu_si64 (__a);
> >   __m128i second = _mm_loadu_si64 (__a);
> >   return _mm_add_epi64(first, second);
> > 
> > }
> > ```
> > 
> > So, as you see clang could generate either VMOVSD/MOVSD or VMOVSQ/MOVSQ. I 
> > think it makes sense to change the documentation as Paul suggested:
> > 
> > /// This intrinsic corresponds to the VMOVSQ/MOVSQ.
> > 
> > Or, alternatively, we could list all the instructions that correspond to
> > this intrinsic:
> > 
> > /// This intrinsic corresponds to the VMOVSQ/MOVSQ/VMOVSD/MOVSD.
> > 
> >   
> It will be interesting to hear Asaf Badoug's opinion, since he added this
> intrinsic. He probably has access to Intel's documentation for this intrinsic
> too (which I wasn't able to find online).
There is a similar situation for one intrinsic just a few lines above, namely
_mm_loadu_pd. It could generate either VMOVUPD / MOVUPD or VMOVUPS/MOVUPS 
instructions. 
I have actually asked