On Monday 24 April 2017, Jakub Jelinek wrote:
> On Mon, Apr 24, 2017 at 11:01:29AM +0200, Allan Sandfeld Jensen wrote:
> > On Monday 24 April 2017, Jakub Jelinek wrote:
> > > On Mon, Apr 24, 2017 at 10:34:58AM +0200, Allan Sandfeld Jensen wrote:
> > > > That is a different instruction. That is the vpsllw not vpsllwi
> > > >
> > > > The intrinsics I changed is the immediate version, I didn't change
> > > > the non-immediate version. It is probably a bug if you can give
> > > > non-immediate values to the immediate only intrinsic. At least both
> > > > versions handles it, if in different ways, but is is illegal
> > > > arguments.
> > >
> > > The documentation is unclear on that and I've only recently fixed up
> > > some cases where these intrinsics weren't able to handle non-constant
> > > arguments in some cases, while both ICC and clang coped with that
> > > fine.
> > > So it is clearly allowed and handled by all the compilers and needs to
> > > be supported, people use that in real-world code.
> >
> > Undoubtedly it happens. I just make a mistake myself that created that
> > case. But it is rather unfortunate, and means we make wrong code
> > currently for corner case values.
>
> The intrinsic documentation is poor, usually you have a good documentation
> on what the instructions do, and then you just have to guess what the
> intrinsics do. You can of course ask Intel for clarification.
>
> If you try:
> #include <x86intrin.h>
>
> __m128i
> foo (__m128i a, int b)
> {
>   return _mm_slli_epi16 (a, b);
> }
> and call it with 257 from somewhere else, you can see that all the
> compilers will give you zero vector. And similarly if you use 257
> literally instead of b. So what the intrinsic (unlike the instruction)
> actually does is that it compares all bits of the imm8 argument (supposedly
> using unsigned comparison) and if it is bigger than 15 (or 7 or 31 or 63
> depending on the bitsize of element) it yields 0 vector.
>

Good point.
I was using Intel's documentation at https://software.intel.com/sites/landingpage/IntrinsicsGuide/, but if all compilers, including us, do something else, practicality wins.
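
To make the difference concrete, here is a minimal scalar sketch of a single 16-bit lane. It is only a model of the two interpretations discussed above, not the real intrinsic or anything taken from GCC; the function names lane_shift_full_compare and lane_shift_imm8 are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical model of the behaviour described in the quoted mail:
   the whole count is compared as an unsigned value, and anything
   above 15 yields zero for a 16-bit element. */
static uint16_t lane_shift_full_compare(uint16_t x, int count)
{
    unsigned int c = (unsigned int)count;
    return c > 15 ? 0 : (uint16_t)((unsigned int)x << c);
}

/* Hypothetical model of the alternative reading, where only the low
   8 bits of the count (an imm8) are looked at before the range check. */
static uint16_t lane_shift_imm8(uint16_t x, int count)
{
    unsigned int c = (unsigned int)count & 0xffu;
    return c > 15 ? 0 : (uint16_t)((unsigned int)x << c);
}
```

Under the first model a count of 257 gives 0, which matches what the compilers are reported to do; under the second it is truncated to 1 and the lane is shifted left by one bit.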

It did make me curious, so I tested what _mm_slli_epi16(v, -250) compiles to. For some reason that becomes an undefined shift using the non-immediate sll in gcc, but returns the zero vector in clang. With my patch it was a 6-bit shift (the low 8 bits of -250), but that is apparently not the de facto standard.

`Allan