Re: [PATCH] i386: Use EXT_REX_SSE_REG_P in *movoi_internal_avx/movti_internal

2019-02-11 Thread Jakub Jelinek
On Mon, Feb 11, 2019 at 04:56:45PM +0100, Uros Bizjak wrote:
> > Let's first define what MODE_XI means in standard_sse_constant_opcode
> > as well as in all these mov patterns for with and without AVX512VL.   
> > Without
> > a clear definition, we can't get out of this mess.
> 
> INT_MODE (OI, 32);
> INT_MODE (XI, 64);
> 
> So, XI_MODE represents 64 INTEGER bytes = 64 * 8 = 512 bit operation,
> in case of const_1, all 512 bits set.
> 
> We can load zeros with narrower instruction, (e.g. 256 bit by inherent
> zeroing of highpart in case of 128 bit xor), so TImode in this case.
> 
> Some targets prefer V4SF mode, so they will emit float xorps for zeroing
> 
> Then the introduction of AVX512F fubared everything by overloading the
> meaning of insn mode.

I don't see much changes in AVX512F here, most of the behavior has been
there already in AVX.
Most of the SSE/AVX/AVX512 instructions affect the whole register,
usually there is DEST[MAX_VL-1:VL] <- 0 at the end of each instruction.
But, using the MAX_VL to determine get_attr_mode doesn't seem really useful,
because that changes dynamically at runtime based on the actual hw, not on
what we've been compiled for.
So, I believe we want to use that VL value to determine the bitsize of the
mode corresponding to get_attr_mode.  And in that case, for
*movoi_internal_avx and *movti_internal, I believe the right mode is MODE_OI
resp. MODE_TI for AVX512VL, because e.g.
vmovdqa32 %ymm12, %ymm23
is a VL = 256 instruction, not VL = 512.  Similarly, if we want to set
%ymm25 to all ones, i.e. movoi_internal_avx, we use
vpternlogd  $0xFF, %ymm25, %ymm25, %ymm25
which is again VL = 256 instruction, so should use MODE_OI.
We'd need to use
vmovdqa32 %zmm12, %zmm23
or
vpternlogd  $0xFF, %zmm25, %zmm25, %zmm25
instructions for AVX512F without AVX512VL, but as has been discussed, this
won't really happen, because hard_regno_mode_ok refuses to allocate 256-bit
or 128-bit modes in ext sse registers.

Jakub


Re: [PATCH] i386: Use EXT_REX_SSE_REG_P in *movoi_internal_avx/movti_internal

2019-02-11 Thread H.J. Lu
In Mon, Feb 11, 2019 at 7:56 AM Uros Bizjak  wrote:
>
> On Mon, Feb 11, 2019 at 3:32 PM H.J. Lu  wrote:
> >
> > On Mon, Feb 11, 2019 at 5:51 AM Uros Bizjak  wrote:
> > >
> > > On Mon, Feb 11, 2019 at 2:29 PM H.J. Lu  wrote:
> > >
> > > > > No. As said, please correctly set mode to XImode in mode attribute 
> > > > > calculation.
> > > >
> > > > There is
> > > >
> > > >  switch (get_attr_type (insn))
> > > > {
> > > > case TYPE_SSELOG1:
> > > >   return standard_sse_constant_opcode (insn, operands);
> > > >
> > > > standard_sse_constant_opcode has
> > > >
> > > > else if (x == constm1_rtx || vector_all_ones_operand (x, mode))
> > > > {
> > > >   enum attr_mode insn_mode = get_attr_mode (insn);
> > > >
> > > >   switch (insn_mode)
> > > > {
> > > > case MODE_XI:
> > > > case MODE_V8DF:
> > > > case MODE_V16SF:
> > > >   gcc_assert (TARGET_AVX512F);
> > > >   return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 
> > > > 0xFF}";
> > >
> > > If there is something wrong with standard_sse_constant_opcode, then
> > > fix the problem in the function itself. With your previous patch, you
> > > introduced a regression, and the presented fix is another kludge to
> > > fix a stack of kludges inside standard_sse_constant_opcode.
> > >
> > > Please take your time and propose some acceptable solution that would
> > > put some logic into const_0/const_1 handling. The situation is not OK
> > > and your patch makes it even worse.
> > >
> >
> > Let's first define what MODE_XI means in standard_sse_constant_opcode
> > as well as in all these mov patterns for with and without AVX512VL.   
> > Without
> > a clear definition, we can't get out of this mess.
>
> INT_MODE (OI, 32);
> INT_MODE (XI, 64);
>
> So, XI_MODE represents 64 INTEGER bytes = 64 * 8 = 512 bit operation,
> in case of const_1, all 512 bits set.
>
> We can load zeros with narrower instruction, (e.g. 256 bit by inherent
> zeroing of highpart in case of 128 bit xor), so TImode in this case.
>
> Some targets prefer V4SF mode, so they will emit float xorps for zeroing
>
> Then the introduction of AVX512F fubared everything by overloading the
> meaning of insn mode.

Exactly.

How should we use INSN mode,  MODE_XI, in standard_sse_constant_opcode
and patterns which use standard_sse_constant_opcode? 2 options:

1.  MODE_XI should only used to check if EXT_REX_SSE_REG_P is true
in any register operand.  The operand size must be determined by operand
itself , not by MODE_XI.  The operand encoding size should be determined
by the operand size, EXT_REX_SSE_REG_P and AVX512VL.
2. MODE_XI should be used to determine the operand encoding size.
EXT_REX_SSE_REG_P and AVX512VL should be checked for encoding
instructions.

Which way should we go?

-- 
H.J.


Re: [PATCH] i386: Use EXT_REX_SSE_REG_P in *movoi_internal_avx/movti_internal

2019-02-11 Thread Uros Bizjak
On Mon, Feb 11, 2019 at 3:32 PM H.J. Lu  wrote:
>
> On Mon, Feb 11, 2019 at 5:51 AM Uros Bizjak  wrote:
> >
> > On Mon, Feb 11, 2019 at 2:29 PM H.J. Lu  wrote:
> >
> > > > No. As said, please correctly set mode to XImode in mode attribute 
> > > > calculation.
> > >
> > > There is
> > >
> > >  switch (get_attr_type (insn))
> > > {
> > > case TYPE_SSELOG1:
> > >   return standard_sse_constant_opcode (insn, operands);
> > >
> > > standard_sse_constant_opcode has
> > >
> > > else if (x == constm1_rtx || vector_all_ones_operand (x, mode))
> > > {
> > >   enum attr_mode insn_mode = get_attr_mode (insn);
> > >
> > >   switch (insn_mode)
> > > {
> > > case MODE_XI:
> > > case MODE_V8DF:
> > > case MODE_V16SF:
> > >   gcc_assert (TARGET_AVX512F);
> > >   return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
> >
> > If there is something wrong with standard_sse_constant_opcode, then
> > fix the problem in the function itself. With your previous patch, you
> > introduced a regression, and the presented fix is another kludge to
> > fix a stack of kludges inside standard_sse_constant_opcode.
> >
> > Please take your time and propose some acceptable solution that would
> > put some logic into const_0/const_1 handling. The situation is not OK
> > and your patch makes it even worse.
> >
>
> Let's first define what MODE_XI means in standard_sse_constant_opcode
> as well as in all these mov patterns for with and without AVX512VL.   Without
> a clear definition, we can't get out of this mess.

INT_MODE (OI, 32);
INT_MODE (XI, 64);

So, XI_MODE represents 64 INTEGER bytes = 64 * 8 = 512 bit operation,
in case of const_1, all 512 bits set.

We can load zeros with narrower instruction, (e.g. 256 bit by inherent
zeroing of highpart in case of 128 bit xor), so TImode in this case.

Some targets prefer V4SF mode, so they will emit float xorps for zeroing

Then the introduction of AVX512F fubared everything by overloading the
meaning of insn mode.

Uros.


Re: [PATCH] i386: Use EXT_REX_SSE_REG_P in *movoi_internal_avx/movti_internal

2019-02-11 Thread H.J. Lu
On Mon, Feb 11, 2019 at 5:51 AM Uros Bizjak  wrote:
>
> On Mon, Feb 11, 2019 at 2:29 PM H.J. Lu  wrote:
>
> > > No. As said, please correctly set mode to XImode in mode attribute 
> > > calculation.
> >
> > There is
> >
> >  switch (get_attr_type (insn))
> > {
> > case TYPE_SSELOG1:
> >   return standard_sse_constant_opcode (insn, operands);
> >
> > standard_sse_constant_opcode has
> >
> > else if (x == constm1_rtx || vector_all_ones_operand (x, mode))
> > {
> >   enum attr_mode insn_mode = get_attr_mode (insn);
> >
> >   switch (insn_mode)
> > {
> > case MODE_XI:
> > case MODE_V8DF:
> > case MODE_V16SF:
> >   gcc_assert (TARGET_AVX512F);
> >   return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
>
> If there is something wrong with standard_sse_constant_opcode, then
> fix the problem in the function itself. With your previous patch, you
> introduced a regression, and the presented fix is another kludge to
> fix a stack of kludges inside standard_sse_constant_opcode.
>
> Please take your time and propose some acceptable solution that would
> put some logic into const_0/const_1 handling. The situation is not OK
> and your patch makes it even worse.
>

Let's first define what MODE_XI means in standard_sse_constant_opcode
as well as in all these mov patterns for with and without AVX512VL.   Without
a clear definition, we can't get out of this mess.

-- 
H.J.


Re: [PATCH] i386: Use EXT_REX_SSE_REG_P in *movoi_internal_avx/movti_internal

2019-02-11 Thread Uros Bizjak
On Mon, Feb 11, 2019 at 2:29 PM H.J. Lu  wrote:

> > No. As said, please correctly set mode to XImode in mode attribute 
> > calculation.
>
> There is
>
>  switch (get_attr_type (insn))
> {
> case TYPE_SSELOG1:
>   return standard_sse_constant_opcode (insn, operands);
>
> standard_sse_constant_opcode has
>
> else if (x == constm1_rtx || vector_all_ones_operand (x, mode))
> {
>   enum attr_mode insn_mode = get_attr_mode (insn);
>
>   switch (insn_mode)
> {
> case MODE_XI:
> case MODE_V8DF:
> case MODE_V16SF:
>   gcc_assert (TARGET_AVX512F);
>   return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";

If there is something wrong with standard_sse_constant_opcode, then
fix the problem in the function itself. With your previous patch, you
introduced a regression, and the presented fix is another kludge to
fix a stack of kludges inside standard_sse_constant_opcode.

Please take your time and propose some acceptable solution that would
put some logic into const_0/const_1 handling. The situation is not OK
and your patch makes it even worse.

Uros.


Re: [PATCH] i386: Use EXT_REX_SSE_REG_P in *movoi_internal_avx/movti_internal

2019-02-11 Thread Jakub Jelinek
On Mon, Feb 11, 2019 at 02:15:18PM +0100, Uros Bizjak wrote:
> > Here is the updated patch to remove ext_sse_reg_operand check with a
> > testcase.
> >
> > OK for trunk?
> 
> No. As said, please correctly set mode to XImode in mode attribute 
> calculation.

The instructions in question are
vmovdqu32 mem, %{x,y}mm{1[6-9],2[0-9],3[01]}
vmovdqu32 %{x,y}mm{1[6-9],2[0-9],3[01]}, mem
vmovdqa32 mem, %{x,y}mm{1[6-9],2[0-9],3[01]}
vmovdqa32 %{x,y}mm{1[6-9],2[0-9],3[01]}, mem
vmovdqa32 %{x,y}mm{[0-9],[12][0-9],3[01]}, %{x,y}mm{1[6-9],2[0-9],3[01]}
vmovdqa32 %{x,y}mm{1[6-9],2[0-9],3[01]}, %{x,y}mm{[0-9],[12][0-9],3[01]}
Why should those instructions be XImode?  They have 16 or 32 byte operands,
never 64 byte.

Using EXT_REX_SSE_REG_P is what *movdi_internal uses too:
case MODE_TI:
  /* Handle AVX512 registers set.  */
  if (EXT_REX_SSE_REG_P (operands[0])
  || EXT_REX_SSE_REG_P (operands[1]))
return "vmovdqa64\t{%1, %0|%0, %1}";
  return "%vmovdqa\t{%1, %0|%0, %1}";
Sure, in that case it is not MODE_DI, but MODE_TI, because the move is
actually 128-bit, not 64-bit, but we do not claim it is 512-bit.

*movsi_internal is incorrect (and inefficient):
case MODE_TI:
  return "%vmovdqa\t{%1, %0|%0, %1}";
case MODE_XI:
  return "vmovdqa32\t{%g1, %g0|%g0, %g1}";
...
(eq_attr "alternative" "8,9")
  (cond [(ior (match_operand 0 "ext_sse_reg_operand")
  (match_operand 1 "ext_sse_reg_operand"))
   (const_string "XI")
 (ior (not (match_test "TARGET_SSE2"))
  (match_test "TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL"))
   (const_string "V4SF")
 (match_test "TARGET_AVX")
   (const_string "TI")
 (match_test "optimize_function_for_size_p (cfun)")
   (const_string "V4SF")
]
(const_string "TI"))
In my reading, for (set (reg:SI xmm16) (reg:SI xmm17)) the above will
emit vmovdqa32  %zmm17, %zmm16 even for -mavx512vl, which looks wrong.
So, I'd suggest (and (not (match_test "TARGET_AVX512VL"))
 (ior (match_operand 0 "ext_sse_reg_operand")
  (match_operand 1 "ext_sse_reg_operand"))
   (const_string "XI")

I see other wierdo stuff e.g. in *movdf_internal:
   /* movaps is one byte shorter for non-AVX targets.  */
   (eq_attr "alternative" "13,17")
 (cond [(and (ior (not (match_test "TARGET_PREFER_AVX256"))
  (not (match_test "TARGET_AVX512VL")))
 (ior (match_operand 0 "ext_sse_reg_operand")
  (match_operand 1 "ext_sse_reg_operand")))
  (const_string "V8DF")
If !TARGET_AVX512VL and one or both of the operands is (are)
ext_sse_reg_operand, obviously MODE_V8DF needs to be used.  But doesn't the
above force use of vmovapd  %zmm16, %zmm17 even if just -mavx512vl
-mprefer-vector-width=512?  I don't see any reason not to use
vmovapd %xmm16, %xmm17 in that case.  -mprefer-vector-width=512 is not you
must use ZMM all the time, but it is fine to use even EVEX instructions with
512-bit width.  Ditto *movsf_internal.

Jakub


Re: [PATCH] i386: Use EXT_REX_SSE_REG_P in *movoi_internal_avx/movti_internal

2019-02-11 Thread H.J. Lu
On Mon, Feb 11, 2019 at 5:15 AM Uros Bizjak  wrote:
>
> On Mon, Feb 11, 2019 at 2:10 PM H.J. Lu  wrote:
> >
> > On Sat, Feb 09, 2019 at 02:39:30PM +0100, Jakub Jelinek wrote:
> > > On Sat, Feb 09, 2019 at 01:22:30PM +0100, Jakub Jelinek wrote:
> > > > On Sat, Feb 09, 2019 at 04:11:43AM -0800, H.J. Lu wrote:
> > > > > I believe all usages of
> > > > >
> > > > > (ior (match_operand 0 "ext_sse_reg_operand")
> > > > >   (match_operand 1 "ext_sse_reg_operand"))
> > > > >
> > > > > should be checked.  I am not sure if they should be there at all.
> > > >
> > > > E.g. in i386.md all the other spots look fine, because {DI,SI,DF,SF}mode
> > > > is allowed in ext sse regs even with -mavx512f.  And sse.md doesn't use 
> > > > this
> > > > at all.  What I'm wondering is if we need the sse.md 
> > > > (*mov_internal)
> > > > code I've cited earlier, doing bootstrap/regtest now with 
> > > > gcc_unreachable in
> > > > there (and in *mov{o,x}i_internal* for MODE_XI too) too see if it ever
> > > > triggers.
> > >
> > > The following didn't ICE on anything, which is not a proof, but given that
> > > hard_regno_mode_ok should return false for ext_sse_reg_operand regs for
> > > avx512f && !avx512vl, it matches my expectations, on the other hand, it 
> > > was
> > > a normal defaults bootstrap, don't have a knl which might be best for this
> > > to test -mavx512f -mno-avx512vl on everything.
> > > So perhaps we can also nuke the large if from mov_internal.
> > >
> > > --- gcc/config/i386/i386.md.jj2019-02-09 12:35:57.971475641 +0100
> > > +++ gcc/config/i386/i386.md   2019-02-09 12:37:40.776802962 +0100
> > > @@ -1905,6 +1905,7 @@ (define_insn "*movoi_internal_avx"
> > >return standard_sse_constant_opcode (insn, operands);
> > >
> > >  case TYPE_SSEMOV:
> > > +  gcc_assert (get_attr_mode (insn) != MODE_XI);
> > >if (misaligned_operand (operands[0], OImode)
> > > || misaligned_operand (operands[1], OImode))
> > >   {
> > > @@ -1970,6 +1971,7 @@ (define_insn "*movti_internal"
> > >  case TYPE_SSEMOV:
> > >/* TDmode values are passed as TImode on the stack.  Moving them
> > >to stack may result in unaligned memory access.  */
> > > +  gcc_assert (get_attr_mode (insn) != MODE_XI);
> > >if (misaligned_operand (operands[0], TImode)
> > > || misaligned_operand (operands[1], TImode))
> > >   {
> > > --- gcc/config/i386/sse.md.jj 2019-01-28 21:57:39.301110220 +0100
> > > +++ gcc/config/i386/sse.md2019-02-09 12:36:45.863696416 +0100
> > > @@ -989,6 +989,7 @@ (define_insn "mov_internal"
> > > && (EXT_REX_SSE_REG_P (operands[0])
> > > || EXT_REX_SSE_REG_P (operands[1])))
> > >   {
> > > +   gcc_unreachable ();
> > > if (memory_operand (operands[0], mode))
> > >   {
> > > if ( == 32)
> > >
> >
> > Here is the updated patch to remove ext_sse_reg_operand check with a
> > testcase.
> >
> > OK for trunk?
>
> No. As said, please correctly set mode to XImode in mode attribute 
> calculation.

There is

 switch (get_attr_type (insn))
{
case TYPE_SSELOG1:
  return standard_sse_constant_opcode (insn, operands);

standard_sse_constant_opcode has

else if (x == constm1_rtx || vector_all_ones_operand (x, mode))
{
  enum attr_mode insn_mode = get_attr_mode (insn);

  switch (insn_mode)
{
case MODE_XI:
case MODE_V8DF:
case MODE_V16SF:
  gcc_assert (TARGET_AVX512F);
  return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";

What mode should be used to set %xmm23 to -1 with AVX512VL?  What mode
should be used to load %xmm23 with AVX512VL? There is no need to
check ext_sse_reg_operand here the same as in

(define_insn "mov_internal"
  [(set (match_operand:VMOVE 0 "nonimmediate_operand"
 "=v,v ,v ,m")
(match_operand:VMOVE 1 "nonimmediate_or_sse_const_operand"
 " C,BC,vm,v"))]
  "TARGET_SSE
   && (register_operand (operands[0], mode)
   || register_operand (operands[1], mode))"
{

> Uros.
>
> > Thanks.
> >
> > H.J.
> > ---
> > Since hard_regno_mode_ok only allows xmm16-xmm31 in OImode or TImode
> > with TARGET_AVX512VL:
> >
> >   /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
> >   if (TARGET_AVX512VL
> >   && (mode == OImode
> >   || mode == TImode
> >   || VALID_AVX256_REG_MODE (mode)
> >   || VALID_AVX512VL_128_REG_MODE (mode)))
> > return true;
> >
> >   /* xmm16-xmm31 are only available for AVX-512.  */
> >   if (EXT_REX_SSE_REGNO_P (regno))
> > return false;
> >
> > there is no need to check ext_sse_reg_operand in *movoi_internal_avx nor
> > *movti_internal.  Instead, we should check EXT_REX_SSE_REG_P for upper 16
> > vector registers.
> >
> > 2019-02-11  H.J. Lu  
> > Jakub Jelinek  
> >
> > gcc/
> >
> > PR target/89229
> > * config/i386/i386.md (*movoi_internal_avx): Check
> >

Re: [PATCH] i386: Use EXT_REX_SSE_REG_P in *movoi_internal_avx/movti_internal

2019-02-11 Thread Uros Bizjak
On Mon, Feb 11, 2019 at 2:10 PM H.J. Lu  wrote:
>
> On Sat, Feb 09, 2019 at 02:39:30PM +0100, Jakub Jelinek wrote:
> > On Sat, Feb 09, 2019 at 01:22:30PM +0100, Jakub Jelinek wrote:
> > > On Sat, Feb 09, 2019 at 04:11:43AM -0800, H.J. Lu wrote:
> > > > I believe all usages of
> > > >
> > > > (ior (match_operand 0 "ext_sse_reg_operand")
> > > >   (match_operand 1 "ext_sse_reg_operand"))
> > > >
> > > > should be checked.  I am not sure if they should be there at all.
> > >
> > > E.g. in i386.md all the other spots look fine, because {DI,SI,DF,SF}mode
> > > is allowed in ext sse regs even with -mavx512f.  And sse.md doesn't use 
> > > this
> > > at all.  What I'm wondering is if we need the sse.md (*mov_internal)
> > > code I've cited earlier, doing bootstrap/regtest now with gcc_unreachable 
> > > in
> > > there (and in *mov{o,x}i_internal* for MODE_XI too) too see if it ever
> > > triggers.
> >
> > The following didn't ICE on anything, which is not a proof, but given that
> > hard_regno_mode_ok should return false for ext_sse_reg_operand regs for
> > avx512f && !avx512vl, it matches my expectations, on the other hand, it was
> > a normal defaults bootstrap, don't have a knl which might be best for this
> > to test -mavx512f -mno-avx512vl on everything.
> > So perhaps we can also nuke the large if from mov_internal.
> >
> > --- gcc/config/i386/i386.md.jj2019-02-09 12:35:57.971475641 +0100
> > +++ gcc/config/i386/i386.md   2019-02-09 12:37:40.776802962 +0100
> > @@ -1905,6 +1905,7 @@ (define_insn "*movoi_internal_avx"
> >return standard_sse_constant_opcode (insn, operands);
> >
> >  case TYPE_SSEMOV:
> > +  gcc_assert (get_attr_mode (insn) != MODE_XI);
> >if (misaligned_operand (operands[0], OImode)
> > || misaligned_operand (operands[1], OImode))
> >   {
> > @@ -1970,6 +1971,7 @@ (define_insn "*movti_internal"
> >  case TYPE_SSEMOV:
> >/* TDmode values are passed as TImode on the stack.  Moving them
> >to stack may result in unaligned memory access.  */
> > +  gcc_assert (get_attr_mode (insn) != MODE_XI);
> >if (misaligned_operand (operands[0], TImode)
> > || misaligned_operand (operands[1], TImode))
> >   {
> > --- gcc/config/i386/sse.md.jj 2019-01-28 21:57:39.301110220 +0100
> > +++ gcc/config/i386/sse.md2019-02-09 12:36:45.863696416 +0100
> > @@ -989,6 +989,7 @@ (define_insn "mov_internal"
> > && (EXT_REX_SSE_REG_P (operands[0])
> > || EXT_REX_SSE_REG_P (operands[1])))
> >   {
> > +   gcc_unreachable ();
> > if (memory_operand (operands[0], mode))
> >   {
> > if ( == 32)
> >
>
> Here is the updated patch to remove ext_sse_reg_operand check with a
> testcase.
>
> OK for trunk?

No. As said, please correctly set mode to XImode in mode attribute calculation.

Uros.

> Thanks.
>
> H.J.
> ---
> Since hard_regno_mode_ok only allows xmm16-xmm31 in OImode or TImode
> with TARGET_AVX512VL:
>
>   /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
>   if (TARGET_AVX512VL
>   && (mode == OImode
>   || mode == TImode
>   || VALID_AVX256_REG_MODE (mode)
>   || VALID_AVX512VL_128_REG_MODE (mode)))
> return true;
>
>   /* xmm16-xmm31 are only available for AVX-512.  */
>   if (EXT_REX_SSE_REGNO_P (regno))
> return false;
>
> there is no need to check ext_sse_reg_operand in *movoi_internal_avx nor
> *movti_internal.  Instead, we should check EXT_REX_SSE_REG_P for upper 16
> vector registers.
>
> 2019-02-11  H.J. Lu  
> Jakub Jelinek  
>
> gcc/
>
> PR target/89229
> * config/i386/i386.md (*movoi_internal_avx): Check
> EXT_REX_SSE_REG_P instead of MODE_XI for upper 16 vector
> registers.
> (*movti_internal): Likewise.
>
> gcc/testsuite/
>
> PR target/89229
> * gcc.target/i386/pr89229-1.c: New test.
> ---
>  gcc/config/i386/i386.md   | 22 +--
>  gcc/testsuite/gcc.target/i386/pr89229-1.c | 47 +++
>  2 files changed, 56 insertions(+), 13 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-1.c
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 3d9141ae450..5b89e52493e 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -1910,7 +1910,8 @@
> {
>   if (get_attr_mode (insn) == MODE_V8SF)
> return "vmovups\t{%1, %0|%0, %1}";
> - else if (get_attr_mode (insn) == MODE_XI)
> + else if (EXT_REX_SSE_REG_P (operands[0])
> +  || EXT_REX_SSE_REG_P (operands[1]))
> return "vmovdqu32\t{%1, %0|%0, %1}";
>   else
> return "vmovdqu\t{%1, %0|%0, %1}";
> @@ -1919,7 +1920,8 @@
> {
>   if (get_attr_mode (insn) == MODE_V8SF)
> return "vmovaps\t{%1, %0|%0, %1}";
> - else if (get_attr_mode (insn) == MODE_XI)
> +  

Re: [PATCH] i386: Use EXT_REX_SSE_REG_P in *movoi_internal_avx/movti_internal

2019-02-11 Thread H.J. Lu
On Sat, Feb 09, 2019 at 02:39:30PM +0100, Jakub Jelinek wrote:
> On Sat, Feb 09, 2019 at 01:22:30PM +0100, Jakub Jelinek wrote:
> > On Sat, Feb 09, 2019 at 04:11:43AM -0800, H.J. Lu wrote:
> > > I believe all usages of
> > > 
> > > (ior (match_operand 0 "ext_sse_reg_operand")
> > >   (match_operand 1 "ext_sse_reg_operand"))
> > > 
> > > should be checked.  I am not sure if they should be there at all.
> > 
> > E.g. in i386.md all the other spots look fine, because {DI,SI,DF,SF}mode
> > is allowed in ext sse regs even with -mavx512f.  And sse.md doesn't use this
> > at all.  What I'm wondering is if we need the sse.md (*mov_internal)
> > code I've cited earlier, doing bootstrap/regtest now with gcc_unreachable in
> > there (and in *mov{o,x}i_internal* for MODE_XI too) too see if it ever
> > triggers.
> 
> The following didn't ICE on anything, which is not a proof, but given that
> hard_regno_mode_ok should return false for ext_sse_reg_operand regs for
> avx512f && !avx512vl, it matches my expectations, on the other hand, it was
> a normal defaults bootstrap, don't have a knl which might be best for this
> to test -mavx512f -mno-avx512vl on everything.
> So perhaps we can also nuke the large if from mov_internal.
> 
> --- gcc/config/i386/i386.md.jj2019-02-09 12:35:57.971475641 +0100
> +++ gcc/config/i386/i386.md   2019-02-09 12:37:40.776802962 +0100
> @@ -1905,6 +1905,7 @@ (define_insn "*movoi_internal_avx"
>return standard_sse_constant_opcode (insn, operands);
>  
>  case TYPE_SSEMOV:
> +  gcc_assert (get_attr_mode (insn) != MODE_XI);
>if (misaligned_operand (operands[0], OImode)
> || misaligned_operand (operands[1], OImode))
>   {
> @@ -1970,6 +1971,7 @@ (define_insn "*movti_internal"
>  case TYPE_SSEMOV:
>/* TDmode values are passed as TImode on the stack.  Moving them
>to stack may result in unaligned memory access.  */
> +  gcc_assert (get_attr_mode (insn) != MODE_XI);
>if (misaligned_operand (operands[0], TImode)
> || misaligned_operand (operands[1], TImode))
>   {
> --- gcc/config/i386/sse.md.jj 2019-01-28 21:57:39.301110220 +0100
> +++ gcc/config/i386/sse.md2019-02-09 12:36:45.863696416 +0100
> @@ -989,6 +989,7 @@ (define_insn "mov_internal"
> && (EXT_REX_SSE_REG_P (operands[0])
> || EXT_REX_SSE_REG_P (operands[1])))
>   {
> +   gcc_unreachable ();
> if (memory_operand (operands[0], mode))
>   {
> if ( == 32)
> 

Here is the updated patch to remove ext_sse_reg_operand check with a
testcase.

OK for trunk?

Thanks.

H.J.
---
Since hard_regno_mode_ok only allows xmm16-xmm31 in OImode or TImode
with TARGET_AVX512VL:

  /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
  if (TARGET_AVX512VL
  && (mode == OImode
  || mode == TImode
  || VALID_AVX256_REG_MODE (mode)
  || VALID_AVX512VL_128_REG_MODE (mode)))
return true;

  /* xmm16-xmm31 are only available for AVX-512.  */
  if (EXT_REX_SSE_REGNO_P (regno))
return false;

there is no need to check ext_sse_reg_operand in *movoi_internal_avx nor
*movti_internal.  Instead, we should check EXT_REX_SSE_REG_P for upper 16
vector registers.

2019-02-11  H.J. Lu  
Jakub Jelinek  

gcc/

PR target/89229
* config/i386/i386.md (*movoi_internal_avx): Check
EXT_REX_SSE_REG_P instead of MODE_XI for upper 16 vector
registers.
(*movti_internal): Likewise.

gcc/testsuite/

PR target/89229
* gcc.target/i386/pr89229-1.c: New test.
---
 gcc/config/i386/i386.md   | 22 +--
 gcc/testsuite/gcc.target/i386/pr89229-1.c | 47 +++
 2 files changed, 56 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-1.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 3d9141ae450..5b89e52493e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1910,7 +1910,8 @@
{
  if (get_attr_mode (insn) == MODE_V8SF)
return "vmovups\t{%1, %0|%0, %1}";
- else if (get_attr_mode (insn) == MODE_XI)
+ else if (EXT_REX_SSE_REG_P (operands[0])
+  || EXT_REX_SSE_REG_P (operands[1]))
return "vmovdqu32\t{%1, %0|%0, %1}";
  else
return "vmovdqu\t{%1, %0|%0, %1}";
@@ -1919,7 +1920,8 @@
{
  if (get_attr_mode (insn) == MODE_V8SF)
return "vmovaps\t{%1, %0|%0, %1}";
- else if (get_attr_mode (insn) == MODE_XI)
+ else if (EXT_REX_SSE_REG_P (operands[0])
+  || EXT_REX_SSE_REG_P (operands[1]))
return "vmovdqa32\t{%1, %0|%0, %1}";
  else
return "vmovdqa\t{%1, %0|%0, %1}";
@@ -1933,11 +1935,7 @@
(set_attr "type" "sselog1,sselog1,ssemov,ssemov")
(set_attr "prefix" "vex")
(set (attr "mode")
-   (cond [(and (not 

Re: [PATCH] i386: Use EXT_REX_SSE_REG_P in *movoi_internal_avx/movti_internal

2019-02-09 Thread Jakub Jelinek
On Sat, Feb 09, 2019 at 01:22:30PM +0100, Jakub Jelinek wrote:
> On Sat, Feb 09, 2019 at 04:11:43AM -0800, H.J. Lu wrote:
> > I believe all usages of
> > 
> > (ior (match_operand 0 "ext_sse_reg_operand")
> >   (match_operand 1 "ext_sse_reg_operand"))
> > 
> > should be checked.  I am not sure if they should be there at all.
> 
> E.g. in i386.md all the other spots look fine, because {DI,SI,DF,SF}mode
> is allowed in ext sse regs even with -mavx512f.  And sse.md doesn't use this
> at all.  What I'm wondering is if we need the sse.md (*mov_internal)
> code I've cited earlier, doing bootstrap/regtest now with gcc_unreachable in
> there (and in *mov{o,x}i_internal* for MODE_XI too) too see if it ever
> triggers.

The following didn't ICE on anything, which is not a proof, but given that
hard_regno_mode_ok should return false for ext_sse_reg_operand regs for
avx512f && !avx512vl, it matches my expectations, on the other hand, it was
a normal defaults bootstrap, don't have a knl which might be best for this
to test -mavx512f -mno-avx512vl on everything.
So perhaps we can also nuke the large if from mov_internal.

--- gcc/config/i386/i386.md.jj  2019-02-09 12:35:57.971475641 +0100
+++ gcc/config/i386/i386.md 2019-02-09 12:37:40.776802962 +0100
@@ -1905,6 +1905,7 @@ (define_insn "*movoi_internal_avx"
   return standard_sse_constant_opcode (insn, operands);
 
 case TYPE_SSEMOV:
+  gcc_assert (get_attr_mode (insn) != MODE_XI);
   if (misaligned_operand (operands[0], OImode)
  || misaligned_operand (operands[1], OImode))
{
@@ -1970,6 +1971,7 @@ (define_insn "*movti_internal"
 case TYPE_SSEMOV:
   /* TDmode values are passed as TImode on the stack.  Moving them
 to stack may result in unaligned memory access.  */
+  gcc_assert (get_attr_mode (insn) != MODE_XI);
   if (misaligned_operand (operands[0], TImode)
  || misaligned_operand (operands[1], TImode))
{
--- gcc/config/i386/sse.md.jj   2019-01-28 21:57:39.301110220 +0100
+++ gcc/config/i386/sse.md  2019-02-09 12:36:45.863696416 +0100
@@ -989,6 +989,7 @@ (define_insn "mov_internal"
  && (EXT_REX_SSE_REG_P (operands[0])
  || EXT_REX_SSE_REG_P (operands[1])))
{
+ gcc_unreachable ();
  if (memory_operand (operands[0], mode))
{
  if ( == 32)

Jakub


Re: [PATCH] i386: Use EXT_REX_SSE_REG_P in *movoi_internal_avx/movti_internal

2019-02-09 Thread Jakub Jelinek
On Sat, Feb 09, 2019 at 04:11:43AM -0800, H.J. Lu wrote:
> I believe all usages of
> 
> (ior (match_operand 0 "ext_sse_reg_operand")
>   (match_operand 1 "ext_sse_reg_operand"))
> 
> should be checked.  I am not sure if they should be there at all.

E.g. in i386.md all the other spots look fine, because {DI,SI,DF,SF}mode
is allowed in ext sse regs even with -mavx512f.  And sse.md doesn't use this
at all.  What I'm wondering is if we need the sse.md (*mov_internal)
code I've cited earlier, doing bootstrap/regtest now with gcc_unreachable in
there (and in *mov{o,x}i_internal* for MODE_XI too) too see if it ever
triggers.

Jakub


Re: [PATCH] i386: Use EXT_REX_SSE_REG_P in *movoi_internal_avx/movti_internal

2019-02-09 Thread H.J. Lu
On Sat, Feb 9, 2019 at 2:50 AM Jakub Jelinek  wrote:
>
> On Sat, Feb 09, 2019 at 11:40:49AM +0100, Jakub Jelinek wrote:
> > 2019-02-09  Jakub Jelinek  
> >
> >   PR target/89229
> >   * config/i386/i386.md (*movoi_internal_avx, *movti_internal): Handle
> >   MODE_XI properly.
>
> Actually, I believe this shouldn't be needed, basically I think MODE_XI
> should never be the case for these instructions, because hard_regno_mode_ok
> shouldn't allow that:
>
>   /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
>   if (TARGET_AVX512VL
>   && (mode == OImode
>   || mode == TImode
>   || VALID_AVX256_REG_MODE (mode)
>   || VALID_AVX512VL_128_REG_MODE (mode)))
> return true;
>
>   /* xmm16-xmm31 are only available for AVX-512.  */
>   if (EXT_REX_SSE_REGNO_P (regno))
> return false;
>
> but then the question is if we really need:
> (and (not (match_test "TARGET_AVX512VL"))
> (ior (match_operand 0 "ext_sse_reg_operand")
>  (match_operand 1 "ext_sse_reg_operand")))
>  (const_string "XI")
> on both of the instructions, not avx512vl, the above shouldn't allow
> ext_sse_reg_operand through with OImode or TImode.
> We still need the MODE_XI -> EXT_REX_SSE_REGNO_P patch H.J. posted.
>
> Jakub

I believe all usages of

(ior (match_operand 0 "ext_sse_reg_operand")
  (match_operand 1 "ext_sse_reg_operand"))

should be checked.  I am not sure if they should be there at all.

-- 
H.J.


Re: [PATCH] i386: Use EXT_REX_SSE_REG_P in *movoi_internal_avx/movti_internal

2019-02-09 Thread Jakub Jelinek
On Sat, Feb 09, 2019 at 11:40:49AM +0100, Jakub Jelinek wrote:
> 2019-02-09  Jakub Jelinek  
> 
>   PR target/89229
>   * config/i386/i386.md (*movoi_internal_avx, *movti_internal): Handle
>   MODE_XI properly.

Actually, I believe this shouldn't be needed, basically I think MODE_XI
should never be the case for these instructions, because hard_regno_mode_ok
shouldn't allow that:

  /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
  if (TARGET_AVX512VL
  && (mode == OImode
  || mode == TImode
  || VALID_AVX256_REG_MODE (mode)
  || VALID_AVX512VL_128_REG_MODE (mode)))
return true;

  /* xmm16-xmm31 are only available for AVX-512.  */
  if (EXT_REX_SSE_REGNO_P (regno))
return false;

but then the question is if we really need:
(and (not (match_test "TARGET_AVX512VL"))
(ior (match_operand 0 "ext_sse_reg_operand")
 (match_operand 1 "ext_sse_reg_operand")))
 (const_string "XI")
on both of the instructions, not avx512vl, the above shouldn't allow
ext_sse_reg_operand through with OImode or TImode.
We still need the MODE_XI -> EXT_REX_SSE_REGNO_P patch H.J. posted.

Jakub


Re: [PATCH] i386: Use EXT_REX_SSE_REG_P in *movoi_internal_avx/movti_internal

2019-02-09 Thread Jakub Jelinek
On Sat, Feb 09, 2019 at 10:56:38AM +0100, Jakub Jelinek wrote:
> On Sat, Feb 09, 2019 at 10:50:43AM +0100, Uros Bizjak wrote:
> > > Also need this patch since we no longer set MODE_XI for
> > > AVX512VL.
> > 
> > No. Please figure out correct condition to set mode attribute to XImode 
> > instead.
> 
> If it is AVX512VL, isn't MODE_OI or MODE_TI correct in those cases though?
> While the instructions need EVEX encoding if they have [xy]mm{16,...31}
> operands, they operate just on 256 or 128 bits.

That said, mov{oi,ti}_internal is severely broken for avx512f without
avx512vl even after this patch.

I think the following patch, incremental to H.J.'s patch, should fix that.
It is pretty much a copy of what sse.md (*mov_internal) pattern does,
just specialized to the particular instructions (i.e. that it is integral,
not floating, and always 32-byte or always 16-byte).  sse.md has:
  /* There is no evex-encoded vmov* for sizes smaller than 64-bytes
 in avx512f, so we need to use workarounds, to access sse registers
 16-31, which are evex-only. In avx512vl we don't need workarounds.  */
  if (TARGET_AVX512F &&  < 64 && !TARGET_AVX512VL
  && (EXT_REX_SSE_REG_P (operands[0])
  || EXT_REX_SSE_REG_P (operands[1])))
{
  if (memory_operand (operands[0], mode))
{
  if ( == 32)
return "vextract64x4\t{$0x0, %g1, %0|%0, %g1, 
0x0}";
  else if ( == 16)
return "vextract32x4\t{$0x0, %g1, %0|%0, %g1, 
0x0}";
  else
gcc_unreachable ();
}
  else if (memory_operand (operands[1], mode))
{
  if ( == 32)
return "vbroadcast64x4\t{%1, %g0|%g0, %1}";
  else if ( == 16)
return "vbroadcast32x4\t{%1, %g0|%g0, %1}";
  else
gcc_unreachable ();
}
  else
/* Reg -> reg move is always aligned.  Just use wider move.  */
switch (get_attr_mode (insn))
  {
  case MODE_V8SF:
  case MODE_V4SF:
return "vmovaps\t{%g1, %g0|%g0, %g1}";
  case MODE_V4DF:
  case MODE_V2DF:
return "vmovapd\t{%g1, %g0|%g0, %g1}";
  case MODE_OI:
  case MODE_TI:
return "vmovdqa64\t{%g1, %g0|%g0, %g1}";
  default:
gcc_unreachable ();
  }
}
before it tries to handle the normal cases.  Ok for trunk if it passes
bootstrap/regtest?

2019-02-09  Jakub Jelinek  

PR target/89229
* config/i386/i386.md (*movoi_internal_avx, *movti_internal): Handle
MODE_XI properly.

--- gcc/config/i386/i386.md.jj  2019-02-09 11:18:53.995450055 +0100
+++ gcc/config/i386/i386.md 2019-02-09 11:26:04.364342306 +0100
@@ -1905,6 +1905,18 @@ (define_insn "*movoi_internal_avx"
   return standard_sse_constant_opcode (insn, operands);
 
 case TYPE_SSEMOV:
+  /* There is no evex-encoded vmov* for sizes smaller than 64-bytes
+in avx512f, so we need to use workarounds to access sse registers
+16-31, which are evex-only. In avx512vl we don't need workarounds.  */
+  if (get_attr_mode (insn) == MODE_XI)
+   {
+ if (memory_operand (operands[0], OImode))
+   return "vextracti64x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
+ else if (memory_operand (operands[1], OImode))
+   return "vbroadcasti64x4\t{%1, %g0|%g0, %1}";
+ else
+   return "vmovdqa64\t{%g1, %g0|%g0, %g1}";
+   }
   if (misaligned_operand (operands[0], OImode)
  || misaligned_operand (operands[1], OImode))
{
@@ -1968,6 +1980,18 @@ (define_insn "*movti_internal"
   return standard_sse_constant_opcode (insn, operands);
 
 case TYPE_SSEMOV:
+  /* There is no evex-encoded vmov* for sizes smaller than 64-bytes
+in avx512f, so we need to use workarounds to access sse registers
+16-31, which are evex-only. In avx512vl we don't need workarounds.  */
+  if (get_attr_mode (insn) == MODE_XI)
+   {
+ if (memory_operand (operands[0], TImode))
+   return "vextracti32x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
+ else if (memory_operand (operands[1], TImode))
+   return "vbroadcasti32x4\t{%1, %g0|%g0, %1}";
+ else
+   return "vmovdqa64\t{%g1, %g0|%g0, %g1}";
+   }
   /* TDmode values are passed as TImode on the stack.  Moving them
 to stack may result in unaligned memory access.  */
   if (misaligned_operand (operands[0], TImode)


Jakub


Re: [PATCH] i386: Use EXT_REX_SSE_REG_P in *movoi_internal_avx/movti_internal

2019-02-09 Thread Jakub Jelinek
On Sat, Feb 09, 2019 at 10:50:43AM +0100, Uros Bizjak wrote:
> > Also need this patch since we no longer set MODE_XI for
> > AVX512VL.
> 
> No. Please figure out correct condition to set mode attribute to XImode 
> instead.

If it is AVX512VL, isn't MODE_OI or MODE_TI correct in those cases though?
While the instructions need EVEX encoding if they have [xy]mm{16,...31}
operands, they operate just on 256 or 128 bits.

Jakub


Re: [PATCH] i386: Use EXT_REX_SSE_REG_P in *movoi_internal_avx/movti_internal

2019-02-09 Thread Uros Bizjak
On 2/9/19, H.J. Lu  wrote:
> On Fri, Feb 8, 2019 at 3:28 AM H.J. Lu  wrote:
>>
>> On Fri, Feb 8, 2019 at 1:51 AM Uros Bizjak  wrote:
>> >
>> > On Thu, Feb 7, 2019 at 10:11 PM H.J. Lu  wrote:
>> > >
>> > > OImode and TImode moves must be done in XImode to access upper 16
>> > > vector registers without AVX512VL.  With AVX512VL, we can access
>> > > upper 16 vector registers in OImode and TImode.
>> > >
>> > > PR target/89229
>> > > * config/i386/i386.md (*movoi_internal_avx): Set mode to XI
>> > > for
>> > > upper 16 vector registers without TARGET_AVX512VL.
>> > > (*movti_internal): Likewise.
>> >
>> > Please use (not (match_test "...")) instead of (match_test "!...") and
>> > put the new test as the first argument of the AND rtx.
>> >
>> > LGTM with the above change.
>>
>> This is the patch I am checking in.
>>
>> Thanks.
>>
>> H.J.
>> ---
>> OImode and TImode moves must be done in XImode to access upper 16
>> vector registers without AVX512VL.  With AVX512VL, we can access
>> upper 16 vector registers in OImode and TImode.
>>
>> PR target/89229
>> * config/i386/i386.md (*movoi_internal_avx): Set mode to XI for
>> upper 16 vector registers without TARGET_AVX512VL.
>> (*movti_internal): Likewise.
>> ---
>>  gcc/config/i386/i386.md | 10 ++
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
>> index c1492363bca..3d9141ae450 100644
>> --- a/gcc/config/i386/i386.md
>> +++ b/gcc/config/i386/i386.md
>> @@ -1933,8 +1933,9 @@
>> (set_attr "type" "sselog1,sselog1,ssemov,ssemov")
>> (set_attr "prefix" "vex")
>> (set (attr "mode")
>> - (cond [(ior (match_operand 0 "ext_sse_reg_operand")
>> - (match_operand 1 "ext_sse_reg_operand"))
>> + (cond [(and (not (match_test "TARGET_AVX512VL"))
>> + (ior (match_operand 0 "ext_sse_reg_operand")
>> + (match_operand 1 "ext_sse_reg_operand")))
>>   (const_string "XI")
>>  (and (eq_attr "alternative" "1")
>>   (match_test "TARGET_AVX512VL"))
>> @@ -2012,8 +2013,9 @@
>> (set (attr "mode")
>>   (cond [(eq_attr "alternative" "0,1")
>>   (const_string "DI")
>> -(ior (match_operand 0 "ext_sse_reg_operand")
>> - (match_operand 1 "ext_sse_reg_operand"))
>> +(and (not (match_test "TARGET_AVX512VL"))
>> + (ior (match_operand 0 "ext_sse_reg_operand")
>> + (match_operand 1 "ext_sse_reg_operand")))
>>   (const_string "XI")
>>  (and (eq_attr "alternative" "3")
>>   (match_test "TARGET_AVX512VL"))
>> --
>
> Also need this patch since we no longer set MODE_XI for
> AVX512VL.

No. Please figure out correct condition to set mode attribute to XImode instead.

Uros.