Mode change for bswap pattern expansion

2014-01-27 Thread Paulo Matos
Hello,

On a vector processor we can do a bswapsi with two instructions, by first 
rotating half-words (16 bits) by 8 and then rotating full words by 16. 
However, this means expanding:
(set (match_operand:SI 0 register_operand )
 (bswap:SI (match_operand:SI 1 register_operand )))

to:
(set (match_dup:V2HI 0)
 (rotate:V2HI (match_dup:V2HI 1)
  (const_int 8)))
(set (match_dup:SI 0)
 (rotate:SI (match_dup:SI 0)
(const_int 16)))

This is obviously not correct, because match_dup cannot set the mode. The point 
I am trying to make is that I can't find a good way to deal with the mode 
changes. I don't think GCC is too happy if I change the modes of the same 
operand from one instruction to the other right? The only other way is to emit 
paradoxical subregs. So something along these lines:
(set (subreg:V2HI (match_dup 0) 0)
 (rotate:V2HI (subreg:V2HI (match_dup 1) 0)
  (const_int 8)))
(set (match_dup 0)
 (rotate:SI (match_dup 0)
(const_int 16)))

Is there a better way to handle a situation like this?

Cheers,

Paulo Matos




Re: Mode change for bswap pattern expansion

2014-01-27 Thread Richard Sandiford
Paulo Matos pma...@broadcom.com writes:
 On a vector processor we can do a bswapsi with two instructions, by first 
 rotating half-words (16 bits) by 8 and then rotating full words by 16. 
 However, this means expanding:
 (set (match_operand:SI 0 register_operand )
  (bswap:SI (match_operand:SI 1 register_operand )))

 to:
 (set (match_dup:V2HI 0)
  (rotate:V2HI (match_dup:V2HI 1)
   (const_int 8)))
 (set (match_dup:SI 0)
  (rotate:SI (match_dup:SI 0)
 (const_int 16)))

 This is obviously not correct, because match_dup cannot set the mode. The 
 point I am trying to make is that I can't find a good way to deal with the 
 mode changes. I don't think GCC is too happy if I change the modes of the 
 same operand from one instruction to the other right? The only other way is 
 to emit paradoxical subregs. So something along these lines:
 (set (subreg:V2HI (match_dup 0) 0)
  (rotate:V2HI (subreg:V2HI (match_dup 1) 0)
   (const_int 8)))
 (set (match_dup 0)
  (rotate:SI (match_dup 0)
 (const_int 16)))

It's usually better not to hard-code the subregs in the pattern.
Instead you could use C code to create the subregs, e.g.:

  [(set (match_dup 3)
(rotate:V2HI (match_dup 2)
 (const_int 8)))
   (set (match_dup 0)
(rotate:SI (match_dup 4)
   (const_int 16)))]
  
{
  operands[2] = gen_lowpart (V2HImode, operands[1]);
  operands[3] = gen_reg_rtx (V2HImode);
  operands[4] = gen_lowpart (SImode, operands[3]);
}

so that any hard regs are correctly handled.  Or it might be easier to code
it using emit_insn (gen_* (...))s instead.

BTW, paradoxical subregs are where the outer mode is strictly larger
than the inner mode.

MIPS uses essentially the same sequence, except that it has a special
instruction to do the first rotate (WSBH), rather than it being an instance
of a general vector rotate.  For MIPS we just model it as an unspec SImode
operation.  Maybe that would be easier here too.

Thanks,
Richard


RE: Mode change for bswap pattern expansion

2014-01-27 Thread Paulo Matos


 -Original Message-
 From: Richard Sandiford [mailto:rdsandif...@googlemail.com]
 Sent: 27 January 2014 16:06
 To: Paulo Matos
 Cc: gcc@gcc.gnu.org
 Subject: Re: Mode change for bswap pattern expansion
 
 Paulo Matos pma...@broadcom.com writes:
  On a vector processor we can do a bswapsi with two instructions, by first
 rotating half-words (16 bits) by 8 and then rotating full words by 16.
  However, this means expanding:
  (set (match_operand:SI 0 register_operand )
   (bswap:SI (match_operand:SI 1 register_operand )))
 
  to:
  (set (match_dup:V2HI 0)
   (rotate:V2HI (match_dup:V2HI 1)
(const_int 8)))
  (set (match_dup:SI 0)
   (rotate:SI (match_dup:SI 0)
  (const_int 16)))
 
  This is obviously not correct, because match_dup cannot set the mode. The 
  point
 I am trying to make is that I can't find a good way to deal with the mode
 changes. I don't think GCC is too happy if I change the modes of the same 
 operand
 from one instruction to the other right? The only other way is to emit
 paradoxical subregs. So something along these lines:
  (set (subreg:V2HI (match_dup 0) 0)
   (rotate:V2HI (subreg:V2HI (match_dup 1) 0)
(const_int 8)))
  (set (match_dup 0)
   (rotate:SI (match_dup 0)
  (const_int 16)))
 
 It's usually better not to hard-code the subregs in the pattern.
 Instead you could use C code to create the subregs, e.g.:
 
   [(set (match_dup 3)
 (rotate:V2HI (match_dup 2)
  (const_int 8)))
(set (match_dup 0)
 (rotate:SI (match_dup 4)
(const_int 16)))]
   
 {
   operands[2] = gen_lowpart (V2HImode, operands[1]);
   operands[3] = gen_reg_rtx (V2HImode);
   operands[4] = gen_lowpart (SImode, operands[3]);
 }
 
 so that any hard regs are correctly handled.  Or it might be easier to code
 it using emit_insn (gen_* (...))s instead.
 
 BTW, paradoxical subregs are where the outer mode is strictly larger
 than the inner mode.


That's right. My mis-understanding.
 
 MIPS uses essentially the same sequence, except that it has a special
 instruction to do the first rotate (WSBH), rather than it being an instance
 of a general vector rotate.  For MIPS we just model it as an unspec SImode
 operation.  Maybe that would be easier here too.
 

I will look at how MIPS is doing it.

However, the unspec SI has severe performance penalties on my port since it is 
able to issue more that one instruction per cycle, therefore having each 
instruction separately allows the scheduler to issue each of the bswapsi parts 
into different slots with other instructions.


Thanks,

Paulo Matos

 Thanks,
 Richard


Re: Mode change for bswap pattern expansion

2014-01-27 Thread Richard Sandiford
Paulo Matos pma...@broadcom.com writes:
 -Original Message-
 From: Richard Sandiford [mailto:rdsandif...@googlemail.com]
 Sent: 27 January 2014 16:06
 To: Paulo Matos
 Cc: gcc@gcc.gnu.org
 Subject: Re: Mode change for bswap pattern expansion
 
 Paulo Matos pma...@broadcom.com writes:
  On a vector processor we can do a bswapsi with two instructions, by first
 rotating half-words (16 bits) by 8 and then rotating full words by 16.
  However, this means expanding:
  (set (match_operand:SI 0 register_operand )
   (bswap:SI (match_operand:SI 1 register_operand )))
 
  to:
  (set (match_dup:V2HI 0)
   (rotate:V2HI (match_dup:V2HI 1)
(const_int 8)))
  (set (match_dup:SI 0)
   (rotate:SI (match_dup:SI 0)
  (const_int 16)))
 
  This is obviously not correct, because match_dup cannot set the mode. The 
  point
 I am trying to make is that I can't find a good way to deal with the mode
 changes. I don't think GCC is too happy if I change the modes of the
 same operand
 from one instruction to the other right? The only other way is to emit
 paradoxical subregs. So something along these lines:
  (set (subreg:V2HI (match_dup 0) 0)
   (rotate:V2HI (subreg:V2HI (match_dup 1) 0)
(const_int 8)))
  (set (match_dup 0)
   (rotate:SI (match_dup 0)
  (const_int 16)))
 
 It's usually better not to hard-code the subregs in the pattern.
 Instead you could use C code to create the subregs, e.g.:
 
   [(set (match_dup 3)
 (rotate:V2HI (match_dup 2)
  (const_int 8)))
(set (match_dup 0)
 (rotate:SI (match_dup 4)
(const_int 16)))]
   
 {
   operands[2] = gen_lowpart (V2HImode, operands[1]);
   operands[3] = gen_reg_rtx (V2HImode);
   operands[4] = gen_lowpart (SImode, operands[3]);
 }
 
 so that any hard regs are correctly handled.  Or it might be easier to code
 it using emit_insn (gen_* (...))s instead.
 
 BTW, paradoxical subregs are where the outer mode is strictly larger
 than the inner mode.


 That's right. My mis-understanding.
  
 MIPS uses essentially the same sequence, except that it has a special
 instruction to do the first rotate (WSBH), rather than it being an instance
 of a general vector rotate.  For MIPS we just model it as an unspec SImode
 operation.  Maybe that would be easier here too.
 

 I will look at how MIPS is doing it.

 However, the unspec SI has severe performance penalties on my port since
 it is able to issue more that one instruction per cycle, therefore
 having each instruction separately allows the scheduler to issue each of
 the bswapsi parts into different slots with other instructions.

Sorry, I meant we use an unspec for the first (V2HI) rotate.
I.e. rather than:

  (set (subreg:V2HI (match_dup 2) 0)
   (rotate:V2HI (subreg:V2HI (match_dup 1) 0)
(const_int 8)))
  (set (match_dup 0)
   (rotate:SI (match_dup 2)
  (const_int 16)))

we have:

  (set (match_dup 2) (unspec:SI [(match_dup 1)] UNSPEC_FOO))
  (set (match_dup 0)
   (rotate:SI (match_dup 2)
  (const_int 16)))

In your case the define_insn for the UNSPEC_FOO pattern would have the
same attributes as a V2HI rotate, so it should get scheduled in the same way.

Thanks,
Richard



RE: Mode change for bswap pattern expansion

2014-01-27 Thread Paulo Matos
 -Original Message-
 From: Richard Sandiford [mailto:rsand...@linux.vnet.ibm.com]
 Sent: 27 January 2014 16:50
 To: Paulo Matos
 Cc: gcc@gcc.gnu.org
 Subject: Re: Mode change for bswap pattern expansion
  
 Sorry, I meant we use an unspec for the first (V2HI) rotate.
 I.e. rather than:
 
   (set (subreg:V2HI (match_dup 2) 0)
(rotate:V2HI (subreg:V2HI (match_dup 1) 0)
 (const_int 8)))
   (set (match_dup 0)
(rotate:SI (match_dup 2)
   (const_int 16)))
 
 we have:
 
   (set (match_dup 2) (unspec:SI [(match_dup 1)] UNSPEC_FOO))
   (set (match_dup 0)
(rotate:SI (match_dup 2)
   (const_int 16)))
 
 In your case the define_insn for the UNSPEC_FOO pattern would have the
 same attributes as a V2HI rotate, so it should get scheduled in the same way.
 

In that case it would work. My only concern would then be if it prevents 
further optimizations. On the other hand I am not sure if GCC would try to 
optimize a rotate with vector V2HI mode...
Might give both solutions a try and see what results I get.

Thanks,

Paulo Matos

 Thanks,
 Richard



Re: Mode change for bswap pattern expansion

2014-01-27 Thread pinskia


On Jan 27, 2014, at 8:59 AM, Paulo Matos pma...@broadcom.com wrote:

 -Original Message-
 From: Richard Sandiford [mailto:rsand...@linux.vnet.ibm.com]
 Sent: 27 January 2014 16:50
 To: Paulo Matos
 Cc: gcc@gcc.gnu.org
 Subject: Re: Mode change for bswap pattern expansion
 
 Sorry, I meant we use an unspec for the first (V2HI) rotate.
 I.e. rather than:
 
  (set (subreg:V2HI (match_dup 2) 0)
   (rotate:V2HI (subreg:V2HI (match_dup 1) 0)
(const_int 8)))
  (set (match_dup 0)
   (rotate:SI (match_dup 2)
  (const_int 16)))
 
 we have:
 
  (set (match_dup 2) (unspec:SI [(match_dup 1)] UNSPEC_FOO))
  (set (match_dup 0)
   (rotate:SI (match_dup 2)
  (const_int 16)))
 
 In your case the define_insn for the UNSPEC_FOO pattern would have the
 same attributes as a V2HI rotate, so it should get scheduled in the same way.
 
 In that case it would work. My only concern would then be if it prevents 
 further optimizations. On the other hand I am not sure if GCC would try to 
 optimize a rotate with vector V2HI mode...
 Might give both solutions a try and see what results I get.

Maybe do a split after reload instead?

Thanks,
Andrew

 
 Thanks,
 
 Paulo Matos
 
 Thanks,
 Richard