On Sun, Oct 09, 2011 at 12:55:40PM +0200, Uros Bizjak wrote:
> About memory - can't we use (mem:BLK (match_operand:P
> "register_operand" "r")) here?
I don't think it is sufficient. Consider e.g.

  _mm_i32gather_pd (NULL, index, 1);

where index is initialized by loading consecutive (32-bit) double * pointers
from an array. Then, for elements 0 through 1, it loads
*(double *)(0 + index[elt]). Describing this as mem:BLK of a register
initialized to 0 is wrong. And even with a non-zero base, say a pointer into
the middle of some array with some offsets positive and some negative, using
mem:BLK of the base would only describe accesses at non-negative offsets
from it.

OT, avx2intrin.h seems weird for many of the gather patterns. E.g. the
_mm_i32gather_pd inline uses:

  __v2df src = _mm_setzero_pd ();
  __v2df mask = _mm_cmpeq_pd (src, src);

which works and sets mask to an all-ones floating-point vector, but e.g.
_mm256_i32gather_pd uses

  __v4df src = _mm256_setzero_pd ();
  __v4df mask = _mm256_set1_pd((double)(long long int) -1);

which I believe creates a { -1.0, -1.0, -1.0, -1.0 } vector. Either it
could be

  __v4df src = _mm256_setzero_pd ();
  __v4df mask = _mm256_cmp_pd (src, src, _CMP_EQ_OQ);

or it would need to be something like

  #define __MM_ALL_ONES_DOUBLE \
    (__extension__ ((union { long long int __l; double __d; }) { .__l = -1 }).__d)
  __v4df src = _mm256_setzero_pd ();
  __v4df mask = _mm256_set1_pd (__MM_ALL_ONES_DOUBLE);

That said, the instruction only uses the most significant (sign) bit of each
mask element, so -1.0, whose sign bit is set, works as well. It is, however,
certainly more expensive than the _mm256_cmp_pd alternative, as it needs to
be loaded from memory. BTW, the expander probably needs some help to emit
better code for the third case too, since it loads that constant from memory
as well.

> BTW: No need to use %c modifier:
>
> /* Meaning of CODE:
>    L,W,B,Q,S,T -- print the opcode suffix for specified size of operand.
>    C -- print opcode suffix for set/cmov insn.
>    c -- like C, but print reversed condition
>    ...
> */

Ok.

	Jakub