On 16 Mar 07:12, Ulrich Drepper wrote:
> [This patch is so far really meant for commenting.  I haven't tested it
> at all yet.]
> 
> Intel's intrinsic specification includes one set which currently is not
> defined in gcc's headers: the _mm*_undefined_* intrinsics.
What specification are you talking about? As far as I know they are present
in ICC headers, but not in manuals such as:
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
> The purpose of these intrinsics (currently three classes, three formats
> each) is to create a pseudo-value the compiler does not assume is
> uninitialized without incurring any code doing so.  The purpose is to
> use these intrinsics in places where it is known the value of a register
> is never used.  This is already important with AVX2 and becomes really
> crucial with AVX512.
> 
> Currently three different techniques are used:
> 
> - _mm*_setzero_*() is used.  Even though the XOR operation does not
>   cost anything it still messes with the instruction scheduling and
>   more code is generated.
> 
> - another parameter is duplicated.  This leads most of the time to
>   one additional move instruction.
> 
> - uninitialized variables are used (this is in new AVX512 code).  The
>   compiler should generate warnings for these headers.  I haven't
>   tried it.
Uninitialized variables certainly are bad. Replacing them with
setzero/undefined is a good idea.
Also in most AVX512 cases those values shouldn't be present in code.
They are either optimized away in case of -1 mask or result in
zero-masking being applied. Do you know of any cases where xor is
generated (except for destination in gather/scatter)?
> 
> Using the _mm*_undefined_*() intrinsics is much cleaner and also
> potentially allows to generate better code.
> 
> For now the implementation uses an inline asm to suggest to the compiler
> that the variable is initialized.  This does not prevent a real register
> from being allocated for this purpose but it saves the XOR instruction.
> 
> The correct and optimal implementation will require a compiler built-in
> which will do something different based on how the value is used:
> 
> - if the value is never modified then any register should be picked.
>   In function/intrinsic calls the parameter simply need not be loaded at
>   all.
> 
> - if the value is modified (and allocated to a register or memory
>   location) no initialization for the variable is needed (equivalent
>   to the asm now).
> 
> 
> The questions are:
> 
> - is there interest in adding the necessary compiler built-in?
> 
> - if yes, anyone interested in working on this?
> 
> - and: is it worth adding a patch like the one here in the meantime?
> 
> As it stands now gcc's intrinsics are not complete and programs following
> Intel's manuals can fail to compile.
>
Compatibility with ICC is certainly good. I tried your patch, and
undefined is similar in behavior to setzero, but it also clobbers
flags. Maybe just define it to setzero for now?
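
To make the two options concrete, here is a minimal sketch for the 128-bit
integer case only. The function names are hypothetical, and the asm variant is
only my assumption of what an "empty inline asm" definition looks like, not a
copy of the patch. The consumer at the end is an example of a use that never
depends on the input value.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2, part of the x86-64 baseline */

/* Hypothetical name. Assumed shape of the asm-based variant: an empty asm
   with an SSE register output makes the compiler treat y as initialized
   without emitting an instruction. GCC treats every x86 asm statement as
   clobbering the flags, which is the side effect mentioned above. */
static inline __m128i sketch_undefined_asm(void)
{
    __m128i y;
    __asm__ ("" : "=x" (y));
    return y;
}

/* Hypothetical name. The stopgap suggested above: plain setzero. */
static inline __m128i sketch_undefined_zero(void)
{
    return _mm_setzero_si128();
}

/* A consumer whose result does not depend on the input: comparing a
   register with itself sets every lane to all-ones. Returns the 16-bit
   byte mask of the comparison result. */
static inline int all_ones_mask(__m128i u)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi32(u, u));
}

/* Both variants must be interchangeable for such a consumer. */
static inline void sketch_self_check(void)
{
    assert(all_ones_mask(sketch_undefined_zero()) == 0xFFFF);
    assert(all_ones_mask(sketch_undefined_asm()) == 0xFFFF);
}
```

With the setzero definition the compiler sees a known constant and is free to
fold or schedule it; with the asm definition the value is opaque to it, so the
two differ only in generated code, not in the result for callers like
all_ones_mask().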
> 
> 
> 2014-03-16  Ulrich Drepper  <drep...@gmail.com>
> 
>       * config/i386/avxintrin.h (_mm256_undefined_si256): Define.
>       (_mm256_undefined_ps): Define.
>       (_mm256_undefined_pd): Define.
>       * config/i386/emmintrin.h (_mm_undefined_si128): Define.
>       (_mm_undefined_pd): Define.
>       * config/i386/xmmintrin.h (_mm_undefined_ps): Define.
>       * config/i386/avx512fintrin.h (_mm512_undefined_si512): Define.
>       (_mm512_undefined_ps): Define.
>       (_mm512_undefined_pd): Define.
>       Use _mm*_undefined_*.
>       * config/i386/avx2intrin.h: Use _mm*_undefined_*.
> 
