[Bug target/82897] Unnecessary zero-extension when loading mask register from memory

2017-11-08 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

Uroš Bizjak  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
           Assignee|ubizjak at gmail dot com     |unassigned at gcc dot gnu.org

--- Comment #9 from Uroš Bizjak  ---
Oh well ... it looks like the implementation wandered into areas of the
compiler I'm not familiar with ...

Unassigning myself, considering that, at the end of the day, the prototype patch
looks more like a band-aid for some different problem.

[Bug target/82897] Unnecessary zero-extension when loading mask register from memory

2017-11-08 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

--- Comment #8 from Jakub Jelinek  ---
That does something different though.  But there is in C:
  if (targetm.calls.promote_prototypes (fundecl ? TREE_TYPE (fundecl) : 0)
      && INTEGRAL_TYPE_P (type)
      && (TYPE_PRECISION (type) < TYPE_PRECISION (integer_type_node)))
    parmval = default_conversion (parmval);
and in C++:
  else if (targetm.calls.promote_prototypes (type)
           && INTEGRAL_TYPE_P (type)
           && COMPLETE_TYPE_P (type)
           && tree_int_cst_lt (TYPE_SIZE (type), TYPE_SIZE (integer_type_node)))
    type = integer_type_node;
and
  else if (targetm.calls.promote_prototypes (type)
           && INTEGRAL_TYPE_P (type)
           && COMPLETE_TYPE_P (type)
           && tree_int_cst_lt (TYPE_SIZE (type), TYPE_SIZE (integer_type_node)))
    val = cp_perform_integral_promotions (val, complain);
This shows a clear inconsistency between C and C++: C passes the FUNCTION_TYPE,
while C++ passes the argument type.
If all the FEs at least passed the FUNCTION_TYPE/METHOD_TYPE, then the i386
target hook could decide, say, based on some custom attribute you'd use on those
builtins.
Or we could change the target hook further and pass a fndecl (if known) and
type to the target hook.
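
To make the suggestion above concrete, here is a toy model in plain C of a promotion hook that is handed the function type and consults an attribute on it. Everything in it is hypothetical and for illustration only: `struct toy_fntype`, `toy_has_attr`, `toy_promote_prototypes` and the attribute name "nopromote" are not GCC's tree API, and the real hook works on `tree` nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a FUNCTION_TYPE node (hypothetical, not GCC's tree).  */
struct toy_fntype
{
  const char *const *attrs;	/* NULL-terminated list of attribute names.  */
};

/* Return true if NAME appears in the attribute list of T.  */
bool
toy_has_attr (const struct toy_fntype *t, const char *name)
{
  for (size_t i = 0; t->attrs[i] != NULL; i++)
    if (strcmp (t->attrs[i], name) == 0)
      return true;
  return false;
}

/* Toy promotion hook: promote narrow integral arguments unless the
   callee's type carries the (hypothetical) "nopromote" attribute.
   A NULL type models a call where the callee is not known.  */
bool
toy_promote_prototypes (const struct toy_fntype *fntype)
{
  if (fntype == NULL)
    return true;
  return !toy_has_attr (fntype, "nopromote");
}

/* Small usage example.  */
void
toy_hook_self_check (void)
{
  static const char *const masked_attrs[] = { "nopromote", NULL };
  static const char *const plain_attrs[] = { NULL };
  struct toy_fntype masked_builtin = { masked_attrs };
  struct toy_fntype ordinary_fn = { plain_attrs };

  assert (!toy_promote_prototypes (&masked_builtin));
  assert (toy_promote_prototypes (&ordinary_fn));
  assert (toy_promote_prototypes (NULL));
}
```

The point of the sketch is only that such a decision needs the function type (or decl) at hand. A hook that sees just the argument type, as the C++ call sites above pass it, has nothing to look the attribute up on.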

[Bug target/82897] Unnecessary zero-extension when loading mask register from memory

2017-11-08 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

--- Comment #7 from Uroš Bizjak  ---
Another idea is to add a "nopromote" attribute to the builtin decl (in
ix86_add_new_builtins), detect this attribute in TARGET_PROMOTE_PROTOTYPES and
disable promotion in this case.

If this approach works, we can perhaps selectively add "nopromote" attribute to
masked builtins.

[Bug target/82897] Unnecessary zero-extension when loading mask register from memory

2017-11-08 Thread mpolacek at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

--- Comment #6 from Marek Polacek  ---
There is a "type generic" attribute which, I think, disables certain promotions
(or at least float -> double).

[Bug target/82897] Unnecessary zero-extension when loading mask register from memory

2017-11-08 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

--- Comment #5 from Uroš Bizjak  ---
Also,

#include <immintrin.h>

__mmask16 m;

__m512i zzz;

__m512i
foo (__m512i x, __m512i y, int a)
{
  zzz = _mm512_mask_slli_epi32 (y, m, x, a);
  return _mm512_mask_srai_epi32 (y, m, x, a);
}

defeats the proposed prototype patch, resulting in:

vmovd   %edi, %xmm2
vmovdqa64   %zmm1, %zmm3
movzwl  m(%rip), %eax
vmovdqa64   %zmm1, %zmm4
kmovw   %eax, %k1
vpslld  %xmm2, %zmm0, %zmm3{%k1}
vpsrad  %xmm2, %zmm0, %zmm4{%k1}
vmovdqa64   %zmm3, zzz(%rip)
vmovdqa64   %zmm4, %zmm0
ret

[Bug target/82897] Unnecessary zero-extension when loading mask register from memory

2017-11-08 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

--- Comment #4 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #3)
> Because builtins are treated like any other function calls, and if
> short/char args are promoted for normal calls, they are promoted for
> builtins too.

Indeed. Disabling TARGET_PROMOTE_PROTOTYPES gets us the direct move from
memory.

So... do we really need to promote arguments for all these builtins, which
usually result in a well-known instruction? Is there a way to mark (some of)
them with a "nopromote" flag?

[Bug target/82897] Unnecessary zero-extension when loading mask register from memory

2017-11-08 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
Because builtins are treated like any other function calls, and if short/char
args are promoted for normal calls, they are promoted for builtins too.
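
As a value-level illustration of that promotion, the sketch below passes a 16-bit unsigned value to a function that receives it as an int. The names `toy_builtin`, `toy_mask` and `call_toy_builtin` are hypothetical stand-ins rather than anything from GCC or the intrinsics headers, and the example assumes int is wider than 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a builtin whose narrow mask operand
   arrives already widened to int.  */
int
toy_builtin (int mask)
{
  return mask;
}

/* Hypothetical 16-bit mask in memory, like "__mmask16 m" in the test case.  */
uint16_t toy_mask = 0x8001;

int
call_toy_builtin (void)
{
  /* The uint16_t argument is converted to int at the call.  The value is
     preserved, so the upper bits are zero even though bit 15 is set.  */
  return toy_builtin (toy_mask);
}

/* Small usage example.  */
void
promotion_self_check (void)
{
  assert (call_toy_builtin () == 0x8001);
}
```

Because the source type is unsigned, widening it amounts to a zero-extension, which is the extra step that shows up ahead of the mask-register move.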

[Bug target/82897] Unnecessary zero-extension when loading mask register from memory

2017-11-08 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

--- Comment #2 from Uroš Bizjak  ---
Hm, why does the middle-end convert to integer in the first place? The
.optimized tree dump reads:

foo (__m512i x, __m512i y, int a)
{
  short unsigned int m.1_1;
  int _2;
  vector(16) int _7;
  vector(16) int _8;
  vector(16) int _9;
  vector(8) long long int _10;

  <bb 2> [local count: 1]:
  m.1_1 = m;
  _2 = (int) m.1_1;
  _7 = VIEW_CONVERT_EXPR<vector(16) int>(y_5(D));
  _8 = VIEW_CONVERT_EXPR<vector(16) int>(x_6(D));
  _9 = __builtin_ia32_psradi512_mask (_8, a_3(D), _7, _2);
  _10 = VIEW_CONVERT_EXPR<__m512i>(_9);
  return _10;

}

[Bug target/82897] Unnecessary zero-extension when loading mask register from memory

2017-11-08 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

Uroš Bizjak  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2017-11-08
           Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com
     Ever confirmed|0                           |1

--- Comment #1 from Uroš Bizjak  ---
Created attachment 42561
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42561&action=edit
Prototype patch

This patch changes the predicate for the mask register to nonimmediate operand.
This way, combine is able to simplify sequences of zero-extend and subreg
operators to a simple move.
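
The reason such a sequence can collapse to a plain move can be checked at the value level: zero-extending a 16-bit value to 32 bits and then taking the low 16 bits gives back the original value. The sketch below verifies this exhaustively. The helper names are hypothetical, and it models only the arithmetic, not GCC's RTL or the combine pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of a 16-bit to 32-bit zero-extension.  */
uint32_t
zero_extend_16_to_32 (uint16_t v)
{
  return (uint32_t) v;
}

/* Model of taking the low 16-bit part of a 32-bit value.  */
uint16_t
low_half (uint32_t v)
{
  return (uint16_t) (v & 0xffffu);
}

/* Check every 16-bit value: extending and then narrowing is the identity.  */
bool
roundtrip_is_identity (void)
{
  for (uint32_t i = 0; i <= UINT16_MAX; i++)
    if (low_half (zero_extend_16_to_32 ((uint16_t) i)) != (uint16_t) i)
      return false;
  return true;
}

/* Small usage example.  */
void
roundtrip_self_check (void)
{
  assert (roundtrip_is_identity ());
}
```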