Re: [RFC] Slightly fix up vgather* patterns
On 10/08/2011 08:43 AM, Jakub Jelinek wrote: > (define_expand "avx2_gathersi" > - [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "") > - (unspec:VEC_GATHER_MODE > - [(match_operand:VEC_GATHER_MODE 1 "register_operand" "") > -(match_operand: 2 "memory_operand" "") > -(match_operand: 3 "register_operand" "") > -(match_operand:VEC_GATHER_MODE 4 "register_operand" "") > -(match_operand:SI 5 "const1248_operand " "")] > - UNSPEC_GATHER))] > + [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "") > +(unspec:VEC_GATHER_MODE > + [(match_operand:VEC_GATHER_MODE 1 "register_operand" "") > + (match_operand: 2 "memory_operand" "") > + (match_operand: 3 "register_operand" "") > + (match_operand:VEC_GATHER_MODE 4 "register_operand" "") > + (match_operand:SI 5 "const1248_operand " "")] > + UNSPEC_GATHER)) > + (clobber (match_dup 4))])] >"TARGET_AVX2") The use of match_dup in the clobber is wrong. We should not be clobbering the user-visible copy of the operand. That does not make sense when dealing with the user-visible builtin. > > (define_insn "*avx2_gathersi" > - [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "=x") > + [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "=&x") > (unspec:VEC_GATHER_MODE > - [(match_operand:VEC_GATHER_MODE 1 "register_operand" "0") > + [(match_operand:VEC_GATHER_MODE 2 "register_operand" "0") > (mem: > - (match_operand:P 2 "register_operand" "r")) > -(match_operand: 3 "register_operand" "x") > -(match_operand:VEC_GATHER_MODE 4 "register_operand" "x") > -(match_operand:SI 5 "const1248_operand" "n")] > - UNSPEC_GATHER))] > + (match_operand:P 3 "register_operand" "r")) > +(match_operand: 4 "register_operand" "x") > +(match_operand:VEC_GATHER_MODE 5 "register_operand" "1") > +(match_operand:SI 6 "const1248_operand" "n")] > + UNSPEC_GATHER)) > + (clobber (match_operand:VEC_GATHER_MODE 1 "register_operand" "=&x"))] >"TARGET_AVX2" > - "vgatherd\t{%4, (%2, %3, %c5), %0|%0, (%2, %3, > %c5), %4}" > + "vgatherd\t{%1, (%3, %4, %c6), %0|%0, (%3, %4, > %c6), %1}" >[(set_attr "type" "ssemov") > (set_attr "prefix" "vex") > (set_attr "mode" "")]) Instead, use (clobber (match_scratch)) and matching constraints with operand 4. > Still, the insn description is imprecise, saying that it loads from mem > at the address register is wrong and perhaps some DCE might delete > what shouldn't be deleted. So, either it should (use (mem (scratch))) > or something similar, or in the unspec list all the memory locations > that are being read > (mem: (plus:SI (reg:SI) (vec_select:SI (match_operand:V4SI) > (parallel [(const_int N)] > for N 0 through something (but it is complicated by Pmode size vs. > the need to do nothing/truncate/sign_extend the vec_select to the right > mode). I think that a (mem (scratch)) as input to the unspec is probably best. The exact memory usage is almost certainly too complex to describe in a useful way. r~
Re: [RFC] Slightly fix up vgather* patterns
On Sun, Oct 09, 2011 at 12:55:40PM +0200, Uros Bizjak wrote: > About memory - can't we use (mem:BLK (match_operand:P > "register_operand" "r")) here? I don't think it is sufficient. Consider e.g. _mm_i32gather_pd (NULL, index, 1); where index is initialized from loading consecutive (32-bit) double * pointers from an array. Then it loads for elt 0 through 1 *(double *)(0 + index[elt]). Describing this as mem:BLK (register initialized to 0) is wrong. But even with non-zero base, say if base is a pointer pointing into a middle of some array and some offsets are positive and some negative using mem:BLK of the base would just mean non-negative offsets from it. OT, seems avx2intrin.h is weird for many of the gather patterns: E.g. the _mm_i32gather_pd inline uses: __v2df src = _mm_setzero_pd (); __v2df mask = _mm_cmpeq_pd (src, src); which will work and set mask to all ones floating point vector, but e.g. _mm256_i32gather_pd uses __v4df src = _mm256_setzero_pd (); __v4df mask = _mm256_set1_pd((double)(long long int) -1); which I believe will create a { -1.0, -1.0, -1.0, -1.0 }; vector. Either it could be __v4df src = _mm256_setzero_pd (); __v4df mask = _mm256_cmp_pd (src, src, _CMP_EQ_OQ); or it would need to be something like #define __MM_ALL_ONES_DOUBLE \ (__extension__ ((union { long long int __l; double __d; }) { __l: -1 }).__d) __v4df src = _mm256_setzero_pd (); __v4df mask = _mm256_set1_pd (__MM_ALL_ONES_DOUBLE); Though, only the most significant bit of the mask is used by the instruction and thus perhaps -1.0 is useful too. Though, it is certainly more expensive than the _mm256_cmp_pd alternative (needs to be loaded from memory). BTW, the expander probably needs some help to emit code for the second case for the third case, it loads it from memory too. > BTW: No need to use %c modifier: > > /* Meaning of CODE: >L,W,B,Q,S,T -- print the opcode suffix for specified size of operand. >C -- print opcode suffix for set/cmov insn. >c -- like C, but print reversed condition >... > */ Ok. Jakub
Re: [RFC] Slightly fix up vgather* patterns
On Sat, Oct 8, 2011 at 5:43 PM, Jakub Jelinek wrote: > The AVX2 docs say that the insns will #UD if any of the mask, src and index > registers are the same, but e.g. on > #include > > __m256 m; > float f[1024]; > > __m256 > foo (void) > { > __m256i mi = (__m256i) m; > return _mm256_mask_i32gather_ps (m, f, mi, m, 4); > } > > which is IMHO valid and should for m being zero vector just return a > zero vector and clear mask (in this case it was already cleared) we compile > it as > vmovdqa m(%rip), %ymm1 > vmovaps %ymm1, %ymm0 > vgatherdps %ymm1, (%rax, %ymm1, 4), %ymm0 > and thus IMHO it will #UD. Also, the insns should make it clear that > the mask register is modified too (the patch clobbers it, perhaps > we could instead say that it zeros the register (which is true if > it doesn't segfault), but then what if a segfault handler chooses to > continue with the next insn and doesn't clear the mask register?). > Still, the insn description is imprecise, saying that it loads from mem > at the address register is wrong and perhaps some DCE might delete > what shouldn't be deleted. So, either it should (use (mem (scratch))) > or something similar, or in the unspec list all the memory locations > that are being read > (mem: (plus:SI (reg:SI) (vec_select:SI (match_operand:V4SI) > (parallel [(const_int N)] > for N 0 through something (but it is complicated by Pmode size vs. > the need to do nothing/truncate/sign_extend the vec_select to the right > mode). > > What do you think? Regarding the clear of mask operand: I agree that this should be modelled as a clobber. Zeroing can't be guaranteed due to the fact you described above. About memory - can't we use (mem:BLK (match_operand:P "register_operand" "r")) here? BTW: No need to use %c modifier: /* Meaning of CODE: L,W,B,Q,S,T -- print the opcode suffix for specified size of operand. C -- print opcode suffix for set/cmov insn. c -- like C, but print reversed condition ... */ Uros.
[RFC] Slightly fix up vgather* patterns
Hi! The AVX2 docs say that the insns will #UD if any of the mask, src and index registers are the same, but e.g. on #include __m256 m; float f[1024]; __m256 foo (void) { __m256i mi = (__m256i) m; return _mm256_mask_i32gather_ps (m, f, mi, m, 4); } which is IMHO valid and should for m being zero vector just return a zero vector and clear mask (in this case it was already cleared) we compile it as vmovdqa m(%rip), %ymm1 vmovaps %ymm1, %ymm0 vgatherdps %ymm1, (%rax, %ymm1, 4), %ymm0 and thus IMHO it will #UD. Also, the insns should make it clear that the mask register is modified too (the patch clobbers it, perhaps we could instead say that it zeros the register (which is true if it doesn't segfault), but then what if a segfault handler chooses to continue with the next insn and doesn't clear the mask register?). Still, the insn description is imprecise, saying that it loads from mem at the address register is wrong and perhaps some DCE might delete what shouldn't be deleted. So, either it should (use (mem (scratch))) or something similar, or in the unspec list all the memory locations that are being read (mem: (plus:SI (reg:SI) (vec_select:SI (match_operand:V4SI) (parallel [(const_int N)] for N 0 through something (but it is complicated by Pmode size vs. the need to do nothing/truncate/sign_extend the vec_select to the right mode). What do you think? 2011-10-08 Jakub Jelinek * config/i386/sse.md (avx2_gathersi, avx2_gatherdi, avx2_gatherdi256): Add clobber of operand 4. (*avx2_gathersi, *avx2_gatherdi, *avx2_gatherdi256): Add clobber of the mask register, add earlyclobber to both output operands. --- gcc/config/i386/sse.md.jj 2011-10-07 10:03:27.0 +0200 +++ gcc/config/i386/sse.md 2011-10-08 17:14:50.0 +0200 @@ -12521,55 +12521,59 @@ (define_mode_attr VEC_GATHER_MODE (V8SI "V8SI") (V8SF "V8SI")]) (define_expand "avx2_gathersi" - [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "") - (unspec:VEC_GATHER_MODE - [(match_operand:VEC_GATHER_MODE 1 "register_operand" "") - (match_operand: 2 "memory_operand" "") - (match_operand: 3 "register_operand" "") - (match_operand:VEC_GATHER_MODE 4 "register_operand" "") - (match_operand:SI 5 "const1248_operand " "")] - UNSPEC_GATHER))] + [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "") + (unspec:VEC_GATHER_MODE +[(match_operand:VEC_GATHER_MODE 1 "register_operand" "") + (match_operand: 2 "memory_operand" "") + (match_operand: 3 "register_operand" "") + (match_operand:VEC_GATHER_MODE 4 "register_operand" "") + (match_operand:SI 5 "const1248_operand " "")] +UNSPEC_GATHER)) + (clobber (match_dup 4))])] "TARGET_AVX2") (define_insn "*avx2_gathersi" - [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "=x") + [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "=&x") (unspec:VEC_GATHER_MODE - [(match_operand:VEC_GATHER_MODE 1 "register_operand" "0") + [(match_operand:VEC_GATHER_MODE 2 "register_operand" "0") (mem: -(match_operand:P 2 "register_operand" "r")) - (match_operand: 3 "register_operand" "x") - (match_operand:VEC_GATHER_MODE 4 "register_operand" "x") - (match_operand:SI 5 "const1248_operand" "n")] - UNSPEC_GATHER))] +(match_operand:P 3 "register_operand" "r")) + (match_operand: 4 "register_operand" "x") + (match_operand:VEC_GATHER_MODE 5 "register_operand" "1") + (match_operand:SI 6 "const1248_operand" "n")] + UNSPEC_GATHER)) + (clobber (match_operand:VEC_GATHER_MODE 1 "register_operand" "=&x"))] "TARGET_AVX2" - "vgatherd\t{%4, (%2, %3, %c5), %0|%0, (%2, %3, %c5), %4}" + "vgatherd\t{%1, (%3, %4, %c6), %0|%0, (%3, %4, %c6), %1}" [(set_attr "type" "ssemov") (set_attr "prefix" "vex") (set_attr "mode" "")]) (define_expand "avx2_gatherdi" - [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "") - (unspec:VEC_GATHER_MODE - [(match_operand:VEC_GATHER_MODE 1 "register_operand" "") - (match_operand: 2 "memory_operand" "") - (match_operand: 3 "register_operand" "") - (match_operand:VEC_GATHER_MODE 4 "register_operand" "") - (match_operand:SI 5 "const1248_operand " "")] - UNSPEC_GATHER))] + [(parallel [(set (match_operand:VEC_GATHER_MODE 0 "register_operand" "") + (unspec:VEC_GATHER_MODE +[(match_operand:VEC_GATHER_MODE 1 "register_operand" "") + (match_operand: 2 "memory_operand" "") + (match_operand: 3 "register_operand" "") + (match_operand:VEC_GATHER_MODE 4 "register_operand