[PATCH] D28503: Documentation for the newly added x86 intrinsics.

2017-01-12 Thread Asaf Badouh via Phabricator via cfe-commits
AsafBadouh added a comment.

LGTM




Comment at: emmintrin.h:1607
+///
+/// This intrinsic corresponds to the  VMOVSD / MOVSD  instruction.
+///

kromanova wrote:
> RKSimon wrote:
> > kromanova wrote:
> > > kromanova wrote:
> > > > kromanova wrote:
> > > > > probinson wrote:
> > > > > > should this be VMOVQ/MOVQ instead?
> > > > > Probably yes. Let me know if you have a different opinion.
> > > > >  
> > > > > If I use this intrinsic by itself, clang generates VMOVSD 
> > > > > instruction. 
> > > > > It happens because the default domain is chooses to generate smaller 
> > > > > instruction code. 
> > > > > I got confused because I couldn't find Intel's documentation about 
> > > > > _mm_loadu_si64, so I just wrote a test like the one below and looked 
> > > > > what instructions got generated.
> > > > > 
> > > > > ```
> > > > > __m128i foo22 (void const * __a)
> > > > > {
> > > > >   return _mm_loadu_si64 (__a);
> > > > > }
> > > > > ```
> > > > > 
> > > > > However, if I change the test and use an intrisic to add 2 64-bit 
> > > > > integers after the load intrinsics, I can see that VMOVQ instruction 
> > > > > gets generated.
> > > > > 
> > > > > ```
> > > > > __m128d foo44 (double const * __a)
> > > > > {
> > > > >   __m128i first  = _mm_loadu_si64 (__a);
> > > > >   __m128i second = _mm_loadu_si64 (__a);
> > > > >   return _mm_add_epi64(first, second);
> > > > > 
> > > > > }
> > > > > ```
> > > > > 
> > > > > So, as you see clang could generate either VMOVSD/MOVSD or 
> > > > > VMOVSQ/MOVSQ. I think it makes sense to change the documentation as 
> > > > > Paul suggested:
> > > > > 
> > > > > /// This intrinsic corresponds to the VMOVSQ/MOVSQ.
> > > > > 
> > > > > Or, alternatively, we could list all the instructions that correspond 
> > > > > to this intrinsics:
> > > > > 
> > > > > /// This intrinsic corresponds to the VMOVSQ/MOVSQ/VMOVSD/MOVSD.
> > > > > 
> > > > >   
> > > > It will be interesting to hear Asaf Badoug opinion, since he added this 
> > > > intrisic. He probably has access to Intel's documentation for this 
> > > > intrinsic too (which I wasn't able to find online).
> > > There is a similar situation for one intrisic just a few lines above, 
> > > namely _mm_loadu_pd. It could generate either VMOVUPD / MOVUPD or 
> > > VMOVUPS/MOVUPS instructions. 
> > > I have actually asked Simon question about it offline just a couple of 
> > > days ago. 
> > > 
> > > I decided to kept referring to VMOVUPD / MOVUPD as a corresponding 
> > > instruction for _mm_loadu_pd. However, if we end up doing things 
> > > differently for _mm_loadu_si64, we need to do a similar change to 
> > > _mm_loadu_pd (and probably to some other intrinsics).
> > It should be VMOVQ/MOVQ (note NOT VMOVSQ/MOVSQ!). Whatever the domain fixup 
> > code does to it, that was the original intent of the code and matches what 
> > other compilers says it will (probably) be.
> Yep, sorry, inaccurate editing after copy and paste. Thank you for noticing.
> I agree should say VMOVQ/MOVQ (similar to what is done for _mm_loadu_pd that 
> we discussed a few days ago).
> 
> I will do this change and reload the review shortly.
sorry for the late response.
In general, not all the intrinsics will lowered to the exact instruction that 
described in the software manual guide.
We do make effort that the intrinsics will be implemented as C-style functions, 
it can help the compiler to optimized the code.
you can see that in all the arithmetic intrinsics as example.

it seems that you already got the answer to your questions from the Simon.

BTW, Intel has nice tool that contain descriptions for all the intrinsics.
https://software.intel.com/sites/landingpage/IntrinsicsGuide/




Repository:
  rL LLVM

https://reviews.llvm.org/D28503



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D28503: Documentation for the newly added x86 intrinsics.

2017-01-11 Thread Katya Romanova via Phabricator via cfe-commits
kromanova updated this revision to Diff 84038.
kromanova added a comment.

Changed the instruction name from  VMOVSD to  VMOVQ for _mm_loadu_si64


Repository:
  rL LLVM

https://reviews.llvm.org/D28503

Files:
  avxintrin.h
  emmintrin.h
  mmintrin.h
  pmmintrin.h
  xmmintrin.h

Index: xmmintrin.h
===
--- xmmintrin.h
+++ xmmintrin.h
@@ -2067,7 +2067,7 @@
 ///_MM_HINT_T1: Move data using the T1 hint. The PREFETCHT1 instruction will
 ///be generated. \n
 ///_MM_HINT_T2: Move data using the T2 hint. The PREFETCHT2 instruction will
-///be generated.   
+///be generated.
 #define _mm_prefetch(a, sel) (__builtin_prefetch((void *)(a), 0, (sel)))
 #endif
 
@@ -2435,17 +2435,17 @@
 ///  For checking exception masks: _MM_MASK_UNDERFLOW, _MM_MASK_OVERFLOW,
 ///  _MM_MASK_INVALID, _MM_MASK_DENORM, _MM_MASK_DIV_ZERO, _MM_MASK_INEXACT.
 ///  There is a convenience wrapper _MM_GET_EXCEPTION_MASK().
-///
+///
 ///
 ///  For checking rounding modes: _MM_ROUND_NEAREST, _MM_ROUND_DOWN,
 ///  _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO. There is a convenience wrapper
 ///  _MM_GET_ROUNDING_MODE(x) where x is one of these macros.
 ///
-/// 
+///
 ///  For checking flush-to-zero mode: _MM_FLUSH_ZERO_ON, _MM_FLUSH_ZERO_OFF.
 ///  There is a convenience wrapper _MM_GET_FLUSH_ZERO_MODE().
 ///
-/// 
+///
 ///  For checking denormals-are-zero mode: _MM_DENORMALS_ZERO_ON,
 ///  _MM_DENORMALS_ZERO_OFF. There is a convenience wrapper
 ///  _MM_GET_DENORMALS_ZERO_MODE().
@@ -2468,11 +2468,11 @@
 unsigned int _mm_getcsr(void);
 
 /// \brief Sets the MXCSR register with the 32-bit unsigned integer value.
-///   
+///
 ///There are several groups of macros associated with this intrinsic,
 ///including:
 ///
-/// 
+///
 ///  For setting exception states: _MM_EXCEPT_INVALID, _MM_EXCEPT_DIV_ZERO,
 ///  _MM_EXCEPT_DENORM, _MM_EXCEPT_OVERFLOW, _MM_EXCEPT_UNDERFLOW,
 ///  _MM_EXCEPT_INEXACT. There is a convenience wrapper
@@ -2517,7 +2517,7 @@
 ///
 /// \param __i
 ///A 32-bit unsigned integer value to be written to the MXCSR register.
-void _mm_setcsr(unsigned int);
+void _mm_setcsr(unsigned int __i);
 
 #if defined(__cplusplus)
 } // extern "C"
Index: pmmintrin.h
===
--- pmmintrin.h
+++ pmmintrin.h
@@ -115,7 +115,7 @@
 
 /// \brief Moves and duplicates high-order (odd-indexed) values from a 128-bit
 ///vector of [4 x float] to float values stored in a 128-bit vector of
-///[4 x float]. 
+///[4 x float].
 ///
 /// \headerfile 
 ///
@@ -136,7 +136,7 @@
 }
 
 /// \brief Duplicates low-order (even-indexed) values from a 128-bit vector of
-///[4 x float] to float values stored in a 128-bit vector of [4 x float]. 
+///[4 x float] to float values stored in a 128-bit vector of [4 x float].
 ///
 /// \headerfile 
 ///
Index: mmintrin.h
===
--- mmintrin.h
+++ mmintrin.h
@@ -211,7 +211,7 @@
 /// This intrinsic corresponds to the  PUNPCKHBW  instruction.
 ///
 /// \param __m1
-///A 64-bit integer vector of [8 x i8]. \n 
+///A 64-bit integer vector of [8 x i8]. \n
 ///Bits [39:32] are written to bits [7:0] of the result. \n
 ///Bits [47:40] are written to bits [23:16] of the result. \n
 ///Bits [55:48] are written to bits [39:32] of the result. \n
Index: emmintrin.h
===
--- emmintrin.h
+++ emmintrin.h
@@ -1599,6 +1599,17 @@
   return ((struct __loadu_pd*)__dp)->__v;
 }
 
+/// \brief Loads a 64-bit integer value to the low element of a 128-bit integer
+///vector and clears the upper element.
+///
+/// \headerfile 
+///
+/// This intrinsic corresponds to the  VMOVQ / MOVQ  instruction.
+///
+/// \param __dp
+///A pointer to a 64-bit memory location. The address of the memory
+///location does not have to be aligned.
+/// \returns A 128-bit vector of [2 x i64] containing the loaded value.
 static __inline__ __m128i __DEFAULT_FN_ATTRS
 _mm_loadu_si64(void const *__a)
 {
@@ -1609,6 +1620,17 @@
   return (__m128i){__u, 0L};
 }
 
+/// \brief Loads a 64-bit double-precision value to the low element of a
+///128-bit integer vector and clears the upper element.
+///
+/// \headerfile 
+///
+/// This intrinsic corresponds to the  VMOVSD / MOVSD  instruction.
+///
+/// \param __dp
+///An pointer to a memory location containing a double-precision value.
+///The address of the memory location does not have to be aligned.
+/// \returns A 128-bit vector of [2 x double] containing the loaded value.
 static __inline__ __m128d __DEFAULT_FN_ATTRS
 _mm_load_sd(double const *__dp)
 {
@@ -4019,7 +4041,7 @@
 /// \param __p
 ///A pointer to the memory location used to identify the cache line 

[PATCH] D28503: Documentation for the newly added x86 intrinsics.

2017-01-11 Thread Katya Romanova via Phabricator via cfe-commits
kromanova added inline comments.



Comment at: emmintrin.h:1607
+///
+/// This intrinsic corresponds to the  VMOVSD / MOVSD  instruction.
+///

RKSimon wrote:
> kromanova wrote:
> > kromanova wrote:
> > > kromanova wrote:
> > > > probinson wrote:
> > > > > should this be VMOVQ/MOVQ instead?
> > > > Probably yes. Let me know if you have a different opinion.
> > > >  
> > > > If I use this intrinsic by itself, clang generates VMOVSD instruction. 
> > > > It happens because the default domain is chooses to generate smaller 
> > > > instruction code. 
> > > > I got confused because I couldn't find Intel's documentation about 
> > > > _mm_loadu_si64, so I just wrote a test like the one below and looked 
> > > > what instructions got generated.
> > > > 
> > > > ```
> > > > __m128i foo22 (void const * __a)
> > > > {
> > > >   return _mm_loadu_si64 (__a);
> > > > }
> > > > ```
> > > > 
> > > > However, if I change the test and use an intrisic to add 2 64-bit 
> > > > integers after the load intrinsics, I can see that VMOVQ instruction 
> > > > gets generated.
> > > > 
> > > > ```
> > > > __m128d foo44 (double const * __a)
> > > > {
> > > >   __m128i first  = _mm_loadu_si64 (__a);
> > > >   __m128i second = _mm_loadu_si64 (__a);
> > > >   return _mm_add_epi64(first, second);
> > > > 
> > > > }
> > > > ```
> > > > 
> > > > So, as you see clang could generate either VMOVSD/MOVSD or 
> > > > VMOVSQ/MOVSQ. I think it makes sense to change the documentation as 
> > > > Paul suggested:
> > > > 
> > > > /// This intrinsic corresponds to the VMOVSQ/MOVSQ.
> > > > 
> > > > Or, alternatively, we could list all the instructions that correspond 
> > > > to this intrinsics:
> > > > 
> > > > /// This intrinsic corresponds to the VMOVSQ/MOVSQ/VMOVSD/MOVSD.
> > > > 
> > > >   
> > > It will be interesting to hear Asaf Badoug opinion, since he added this 
> > > intrisic. He probably has access to Intel's documentation for this 
> > > intrinsic too (which I wasn't able to find online).
> > There is a similar situation for one intrisic just a few lines above, 
> > namely _mm_loadu_pd. It could generate either VMOVUPD / MOVUPD or 
> > VMOVUPS/MOVUPS instructions. 
> > I have actually asked Simon question about it offline just a couple of days 
> > ago. 
> > 
> > I decided to kept referring to VMOVUPD / MOVUPD as a corresponding 
> > instruction for _mm_loadu_pd. However, if we end up doing things 
> > differently for _mm_loadu_si64, we need to do a similar change to 
> > _mm_loadu_pd (and probably to some other intrinsics).
> It should be VMOVQ/MOVQ (note NOT VMOVSQ/MOVSQ!). Whatever the domain fixup 
> code does to it, that was the original intent of the code and matches what 
> other compilers says it will (probably) be.
Yep, sorry, inaccurate editing after copy and paste. Thank you for noticing.
I agree should say VMOVQ/MOVQ (similar to what is done for _mm_loadu_pd that we 
discussed a few days ago).

I will do this change and reload the review shortly.


Repository:
  rL LLVM

https://reviews.llvm.org/D28503



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D28503: Documentation for the newly added x86 intrinsics.

2017-01-11 Thread Simon Pilgrim via Phabricator via cfe-commits
RKSimon added inline comments.



Comment at: emmintrin.h:1607
+///
+/// This intrinsic corresponds to the  VMOVSD / MOVSD  instruction.
+///

kromanova wrote:
> kromanova wrote:
> > kromanova wrote:
> > > probinson wrote:
> > > > should this be VMOVQ/MOVQ instead?
> > > Probably yes. Let me know if you have a different opinion.
> > >  
> > > If I use this intrinsic by itself, clang generates VMOVSD instruction. 
> > > It happens because the default domain is chooses to generate smaller 
> > > instruction code. 
> > > I got confused because I couldn't find Intel's documentation about 
> > > _mm_loadu_si64, so I just wrote a test like the one below and looked what 
> > > instructions got generated.
> > > 
> > > ```
> > > __m128i foo22 (void const * __a)
> > > {
> > >   return _mm_loadu_si64 (__a);
> > > }
> > > ```
> > > 
> > > However, if I change the test and use an intrisic to add 2 64-bit 
> > > integers after the load intrinsics, I can see that VMOVQ instruction gets 
> > > generated.
> > > 
> > > ```
> > > __m128d foo44 (double const * __a)
> > > {
> > >   __m128i first  = _mm_loadu_si64 (__a);
> > >   __m128i second = _mm_loadu_si64 (__a);
> > >   return _mm_add_epi64(first, second);
> > > 
> > > }
> > > ```
> > > 
> > > So, as you see clang could generate either VMOVSD/MOVSD or VMOVSQ/MOVSQ. 
> > > I think it makes sense to change the documentation as Paul suggested:
> > > 
> > > /// This intrinsic corresponds to the VMOVSQ/MOVSQ.
> > > 
> > > Or, alternatively, we could list all the instructions that correspond to 
> > > this intrinsics:
> > > 
> > > /// This intrinsic corresponds to the VMOVSQ/MOVSQ/VMOVSD/MOVSD.
> > > 
> > >   
> > It will be interesting to hear Asaf Badoug opinion, since he added this 
> > intrisic. He probably has access to Intel's documentation for this 
> > intrinsic too (which I wasn't able to find online).
> There is a similar situation for one intrisic just a few lines above, namely 
> _mm_loadu_pd. It could generate either VMOVUPD / MOVUPD or VMOVUPS/MOVUPS 
> instructions. 
> I have actually asked Simon question about it offline just a couple of days 
> ago. 
> 
> I decided to kept referring to VMOVUPD / MOVUPD as a corresponding 
> instruction for _mm_loadu_pd. However, if we end up doing things differently 
> for _mm_loadu_si64, we need to do a similar change to _mm_loadu_pd (and 
> probably to some other intrinsics).
It should be VMOVQ/MOVQ (note NOT VMOVSQ/MOVSQ!). Whatever the domain fixup 
code does to it, that was the original intent of the code and matches what 
other compilers says it will (probably) be.


Repository:
  rL LLVM

https://reviews.llvm.org/D28503



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D28503: Documentation for the newly added x86 intrinsics.

2017-01-11 Thread Katya Romanova via Phabricator via cfe-commits
kromanova added inline comments.



Comment at: emmintrin.h:1607
+///
+/// This intrinsic corresponds to the  VMOVSD / MOVSD  instruction.
+///

probinson wrote:
> should this be VMOVQ/MOVQ instead?
Probably yes. Let me know if you have a different opinion.
 
If I use this intrinsic by itself, clang generates VMOVSD instruction. 
It happens because the default domain is chooses to generate smaller 
instruction code. 
I got confused because I couldn't find Intel's documentation about 
_mm_loadu_si64, so I just wrote a test like the one below and looked what 
instructions got generated.

```
__m128i foo22 (void const * __a)
{
  return _mm_loadu_si64 (__a);
}
```

However, if I change the test and use an intrisic to add 2 64-bit integers 
after the load intrinsics, I can see that VMOVQ instruction gets generated.

```
__m128d foo44 (double const * __a)
{
  __m128i first  = _mm_loadu_si64 (__a);
  __m128i second = _mm_loadu_si64 (__a);
  return _mm_add_epi64(first, second);

}
```

So, as you see clang could generate either VMOVSD/MOVSD or VMOVSQ/MOVSQ. I 
think it makes sense to change the documentation as Paul suggested:

/// This intrinsic corresponds to the VMOVSQ/MOVSQ.

Or, alternatively, we could list all the instructions that correspond to this 
intrinsics:

/// This intrinsic corresponds to the VMOVSQ/MOVSQ/VMOVSD/MOVSD.

  



Comment at: emmintrin.h:1607
+///
+/// This intrinsic corresponds to the  VMOVSD / MOVSD  instruction.
+///

kromanova wrote:
> probinson wrote:
> > should this be VMOVQ/MOVQ instead?
> Probably yes. Let me know if you have a different opinion.
>  
> If I use this intrinsic by itself, clang generates VMOVSD instruction. 
> It happens because the default domain is chooses to generate smaller 
> instruction code. 
> I got confused because I couldn't find Intel's documentation about 
> _mm_loadu_si64, so I just wrote a test like the one below and looked what 
> instructions got generated.
> 
> ```
> __m128i foo22 (void const * __a)
> {
>   return _mm_loadu_si64 (__a);
> }
> ```
> 
> However, if I change the test and use an intrisic to add 2 64-bit integers 
> after the load intrinsics, I can see that VMOVQ instruction gets generated.
> 
> ```
> __m128d foo44 (double const * __a)
> {
>   __m128i first  = _mm_loadu_si64 (__a);
>   __m128i second = _mm_loadu_si64 (__a);
>   return _mm_add_epi64(first, second);
> 
> }
> ```
> 
> So, as you see clang could generate either VMOVSD/MOVSD or VMOVSQ/MOVSQ. I 
> think it makes sense to change the documentation as Paul suggested:
> 
> /// This intrinsic corresponds to the VMOVSQ/MOVSQ.
> 
> Or, alternatively, we could list all the instructions that correspond to this 
> intrinsics:
> 
> /// This intrinsic corresponds to the VMOVSQ/MOVSQ/VMOVSD/MOVSD.
> 
>   
It will be interesting to hear Asaf Badoug opinion, since he added this 
intrisic. He probably has access to Intel's documentation for this intrinsic 
too (which I wasn't able to find online).



Comment at: emmintrin.h:1607
+///
+/// This intrinsic corresponds to the  VMOVSD / MOVSD  instruction.
+///

kromanova wrote:
> kromanova wrote:
> > probinson wrote:
> > > should this be VMOVQ/MOVQ instead?
> > Probably yes. Let me know if you have a different opinion.
> >  
> > If I use this intrinsic by itself, clang generates VMOVSD instruction. 
> > It happens because the default domain is chooses to generate smaller 
> > instruction code. 
> > I got confused because I couldn't find Intel's documentation about 
> > _mm_loadu_si64, so I just wrote a test like the one below and looked what 
> > instructions got generated.
> > 
> > ```
> > __m128i foo22 (void const * __a)
> > {
> >   return _mm_loadu_si64 (__a);
> > }
> > ```
> > 
> > However, if I change the test and use an intrisic to add 2 64-bit integers 
> > after the load intrinsics, I can see that VMOVQ instruction gets generated.
> > 
> > ```
> > __m128d foo44 (double const * __a)
> > {
> >   __m128i first  = _mm_loadu_si64 (__a);
> >   __m128i second = _mm_loadu_si64 (__a);
> >   return _mm_add_epi64(first, second);
> > 
> > }
> > ```
> > 
> > So, as you see clang could generate either VMOVSD/MOVSD or VMOVSQ/MOVSQ. I 
> > think it makes sense to change the documentation as Paul suggested:
> > 
> > /// This intrinsic corresponds to the VMOVSQ/MOVSQ.
> > 
> > Or, alternatively, we could list all the instructions that correspond to 
> > this intrinsics:
> > 
> > /// This intrinsic corresponds to the VMOVSQ/MOVSQ/VMOVSD/MOVSD.
> > 
> >   
> It will be interesting to hear Asaf Badoug opinion, since he added this 
> intrisic. He probably has access to Intel's documentation for this intrinsic 
> too (which I wasn't able to find online).
There is a similar situation for one intrisic just a few lines above, namely 
_mm_loadu_pd. It could generate either VMOVUPD / MOVUPD or VMOVUPS/MOVUPS 
instructions. 
I have actually asked 

[PATCH] D28503: Documentation for the newly added x86 intrinsics.

2017-01-11 Thread Paul Robinson via Phabricator via cfe-commits
probinson added inline comments.



Comment at: emmintrin.h:1607
+///
+/// This intrinsic corresponds to the  VMOVSD / MOVSD  instruction.
+///

should this be VMOVQ/MOVQ instead?


Repository:
  rL LLVM

https://reviews.llvm.org/D28503



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D28503: Documentation for the newly added x86 intrinsics.

2017-01-10 Thread Albert Gutowski via Phabricator via cfe-commits
agutowski added a comment.

For my part, LGTM.


Repository:
  rL LLVM

https://reviews.llvm.org/D28503



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D28503: Documentation for the newly added x86 intrinsics.

2017-01-10 Thread michael zuckerman via Phabricator via cfe-commits
m_zuckerman added a comment.

For my intrinsics ( _mm256_cvtsd_f64, _mm256_cvtsi256_si32 and 
_mm256_cvtss_f32) - ** LGTM**.


Repository:
  rL LLVM

https://reviews.llvm.org/D28503



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits