[Bug target/63678] __mm256_blend_epi16 only accepts 8-bit masks (should accept 16-bit)

2014-10-29 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63678

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #6 from Jakub Jelinek  ---
.


2014-10-29 Thread jakub at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek  ---
Trying icc 14.0.2.144 Build 2014012, I see that
a) it indeed fails to report the bug in your source
b) when using -c, it silently discards the upper 8 bits of the immediate, so
   you end up with:
   0:   c4 e3 7d 0e c1 cd       vpblendw $0xcd,%ymm1,%ymm0,%ymm0
c) when using -S, it generates invalid assembly:
   vpblendw  $43981, %ymm1, %ymm0, %ymm0   #4.16
   which doesn't assemble at least with gas.
So, I believe erroring out on this is significantly better than what icc does
with it.
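
The truncation in (b) can be sketched as follows.  This is only an
illustration, assuming the encoder keeps just the low byte of the constant;
the helper name vpblendw_encoded_imm is hypothetical and is not part of icc,
gcc or gas.

```c
#include <stdint.h>

/* Hypothetical helper: VPBLENDW carries its immediate in a single byte,
   so at most the low 8 bits of a wider constant can survive encoding. */
static uint8_t vpblendw_encoded_imm(unsigned int imm)
{
    return (uint8_t)(imm & 0xffu);
}
```

With the constant from (c), 43981 is 0xabcd, and the low byte is 0xcd, which
matches the immediate in the disassembly shown in (b).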


2014-10-29 Thread peter.bumbulis at ianywhere dot com

--- Comment #4 from Peter Bumbulis  ---
(In reply to Peter Bumbulis from comment #2)
> The referenced web page is incorrect.  Look in the instruction set reference
> manual
> (https://software.intel.com/sites/default/files/managed/c6/a9/319433-020.pdf,
> search for VPBLENDMW) or the intrinsics guide
> (https://software.intel.com/sites/landingpage/IntrinsicsGuide/).
> 
> These instructions blend 16 bit quantities:  you can fit 16 of these in a
> 256 bit register.  For AVX512 it's a 32-bit constant.

My mistake:  it looks like the generated code only uses the low 8 bits.  Sorry
for any wasted bandwidth.


2014-10-29 Thread jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
(In reply to Peter Bumbulis from comment #2)
> The referenced web page is incorrect.  Look in the instruction set reference
> manual
> (https://software.intel.com/sites/default/files/managed/c6/a9/319433-020.pdf,
> search for VPBLENDMW) or the intrinsics guide
> (https://software.intel.com/sites/landingpage/IntrinsicsGuide/).
> 
> These instructions blend 16 bit quantities:  you can fit 16 of these in a
> 256 bit register.  For AVX512 it's a 32-bit constant.

Your first reference is the AVX512 documentation; _mm256_blend_epi16 is not
_mm256_mask_blend_epi16.  _mm256_blend_epi16 maps to the VPBLENDW instruction,
and https://software.intel.com/sites/landingpage/IntrinsicsGuide/ looks
incorrect, because it doesn't describe what the VPBLENDW instruction does.  In
particular, VPBLENDW only has an 8-bit immediate, and both 128-bit lanes are
blended the same way using that mask:
IF (imm8[0] == 1) THEN DEST[15:0] <- SRC2[15:0]
ELSE DEST[15:0] <- SRC1[15:0]
IF (imm8[1] == 1) THEN DEST[31:16] <- SRC2[31:16]
ELSE DEST[31:16] <- SRC1[31:16]
IF (imm8[2] == 1) THEN DEST[47:32] <- SRC2[47:32]
ELSE DEST[47:32] <- SRC1[47:32]
IF (imm8[3] == 1) THEN DEST[63:48] <- SRC2[63:48]
ELSE DEST[63:48] <- SRC1[63:48]
IF (imm8[4] == 1) THEN DEST[79:64] <- SRC2[79:64]
ELSE DEST[79:64] <- SRC1[79:64]
IF (imm8[5] == 1) THEN DEST[95:80] <- SRC2[95:80]
ELSE DEST[95:80] <- SRC1[95:80]
IF (imm8[6] == 1) THEN DEST[111:96] <- SRC2[111:96]
ELSE DEST[111:96] <- SRC1[111:96]
IF (imm8[7] == 1) THEN DEST[127:112] <- SRC2[127:112]
ELSE DEST[127:112] <- SRC1[127:112]
IF (imm8[0] == 1) THEN DEST[143:128] <- SRC2[143:128]
ELSE DEST[143:128] <- SRC1[143:128]
IF (imm8[1] == 1) THEN DEST[159:144] <- SRC2[159:144]
ELSE DEST[159:144] <- SRC1[159:144]
IF (imm8[2] == 1) THEN DEST[175:160] <- SRC2[175:160]
ELSE DEST[175:160] <- SRC1[175:160]
IF (imm8[3] == 1) THEN DEST[191:176] <- SRC2[191:176]
ELSE DEST[191:176] <- SRC1[191:176]
IF (imm8[4] == 1) THEN DEST[207:192] <- SRC2[207:192]
ELSE DEST[207:192] <- SRC1[207:192]
IF (imm8[5] == 1) THEN DEST[223:208] <- SRC2[223:208]
ELSE DEST[223:208] <- SRC1[223:208]
IF (imm8[6] == 1) THEN DEST[239:224] <- SRC2[239:224]
ELSE DEST[239:224] <- SRC1[239:224]
IF (imm8[7] == 1) THEN DEST[255:240] <- SRC2[255:240]
ELSE DEST[255:240] <- SRC1[255:240]
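
The pseudocode above can be condensed into a scalar model.  This is a sketch
for illustration only, not code from GCC or from Intel's documentation; the
function name vpblendw256_model and the array-of-words representation are
assumptions made here.

```c
#include <stdint.h>

/* Scalar model of the 256-bit VPBLENDW pseudocode.
   Word i of the result is taken from src2 when bit (i % 8) of imm8 is set,
   otherwise from src1, so both 128-bit lanes use the same 8-bit mask. */
static void vpblendw256_model(uint16_t dst[16],
                              const uint16_t src1[16],
                              const uint16_t src2[16],
                              uint8_t imm8)
{
    for (int i = 0; i < 16; i++) {
        int bit = (imm8 >> (i % 8)) & 1;
        dst[i] = bit ? src2[i] : src1[i];
    }
}
```

Because the selector index wraps at 8, words 8..15 can never be chosen
independently of words 0..7, which is why a 16-bit mask has no meaning for
this instruction.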


2014-10-29 Thread peter.bumbulis at ianywhere dot com

--- Comment #2 from Peter Bumbulis  ---
The referenced web page is incorrect.  Look in the instruction set reference
manual
(https://software.intel.com/sites/default/files/managed/c6/a9/319433-020.pdf,
search for VPBLENDMW) or the intrinsics guide
(https://software.intel.com/sites/landingpage/IntrinsicsGuide/).

These instructions blend 16 bit quantities:  you can fit 16 of these in a 256
bit register.  For AVX512 it's a 32-bit constant.