Re: [x86 PATCH] PR target/110588: Add *bt_setncqi_2 to generate btl

2023-07-14 Thread Uros Bizjak via Gcc-patches
On Fri, Jul 14, 2023 at 11:27 AM Roger Sayle  wrote:
>
>
> > From: Uros Bizjak 
> > Sent: 13 July 2023 19:21
> >
> > On Thu, Jul 13, 2023 at 7:10 PM Roger Sayle 
> > wrote:
> > >
> > > This patch resolves PR target/110588 to catch another case in combine
> > > where the i386 backend should be generating a btl instruction.  This
> > > adds another define_insn_and_split to recognize the RTL representation
> > > for this case.
> > >
> > > I also noticed that two related define_insn_and_split weren't using
> > > the preferred string style for single statement
> > > preparation-statements, so I've reformatted these to be consistent in 
> > > style with
> > the new one.
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32}
> > > with no new failures.  Ok for mainline?
> > >
> > >
> > > 2023-07-13  Roger Sayle  
> > >
> > > gcc/ChangeLog
> > > PR target/110588
> > > * config/i386/i386.md (*bt_setcqi): Prefer string form
> > > preparation statement over braces for a single statement.
> > > (*bt_setncqi): Likewise.
> > > (*bt_setncqi_2): New define_insn_and_split.
> > >
> > > gcc/testsuite/ChangeLog
> > > PR target/110588
> > > * gcc.target/i386/pr110588.c: New test case.
> >
> > +;; Help combine recognize bt followed by setnc (PR target/110588)
> > +(define_insn_and_split "*bt_setncqi_2"
> > +  [(set (match_operand:QI 0 "register_operand")  (eq:QI
> > +  (zero_extract:SWI48
> > +(match_operand:SWI48 1 "register_operand")
> > +(const_int 1)
> > +(zero_extend:SI (match_operand:QI 2 "register_operand")))
> > +  (const_int 0)))
> > +   (clobber (reg:CC FLAGS_REG))]
> > +  "TARGET_USE_BT && ix86_pre_reload_split ()"
> > +  "#"
> > +  "&& 1"
> > +  [(set (reg:CCC FLAGS_REG)
> > +(compare:CCC
> > + (zero_extract:SWI48 (match_dup 1) (const_int 1) (match_dup 2))
> > + (const_int 0)))
> > +   (set (match_dup 0)
> > +(ne:QI (reg:CCC FLAGS_REG) (const_int 0)))]
> > +  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")
> >
> > I don't think the above transformation is 100% correct, mainly due to the 
> > use of
> > paradoxical subreg.
> >
> > The combined instruction is operating with a zero_extended QImode register, 
> > so
> > all bits of the register are well defined. You are splitting using 
> > paradoxical subreg,
> > so you don't know what garbage is there in the highpart of the count 
> > register.
> > However, BTL/BTQ uses modulo 64 (or 32) of this register, so even with a 
> > slightly
> > invalid RTX, everything checks out.
> >
> > +  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")
> >
> > You probably need mode instead of SImode here.
>
> The define_insn for *bt is:
>
> (define_insn "*bt"
>   [(set (reg:CCC FLAGS_REG)
> (compare:CCC
>   (zero_extract:SWI48
> (match_operand:SWI48 0 "nonimmediate_operand" "r,m")
> (const_int 1)
> (match_operand:SI 1 "nonmemory_operand" "r,"))
>   (const_int 0)))]
>
> So  isn't appropriate here.
>
> But now you've made me think about it, it's inconsistent that all of the 
> shifts
> and rotates in i386.md standardize on QImode for shift counts, but the bit 
> test
> instructions use SImode?  I think this explains where the paradoxical SUBREGs
> come from, and in theory any_extend from QImode to SImode here could/should
> be handled/unnecessary.
>
> Is it worth investigating a follow-up patch to convert all ZERO_EXTRACTs and
> SIGN_EXTRACTs in i386.md to use QImode (instead of SImode)?

IIRC, zero_extract was moved from modeless to a pattern with defined
mode a while ago. Perhaps SImode is just because of these ancient
times, and BT pattern was written that way to satisfy combine. I think
it is definitely worth investigating; perhaps some BT-related pattern
will become obsolete because of the change.

Uros.


RE: [x86 PATCH] PR target/110588: Add *bt_setncqi_2 to generate btl

2023-07-14 Thread Roger Sayle


> From: Uros Bizjak 
> Sent: 13 July 2023 19:21
> 
> On Thu, Jul 13, 2023 at 7:10 PM Roger Sayle 
> wrote:
> >
> > This patch resolves PR target/110588 to catch another case in combine
> > where the i386 backend should be generating a btl instruction.  This
> > adds another define_insn_and_split to recognize the RTL representation
> > for this case.
> >
> > I also noticed that two related define_insn_and_split weren't using
> > the preferred string style for single statement
> > preparation-statements, so I've reformatted these to be consistent in style 
> > with
> the new one.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> >
> > 2023-07-13  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR target/110588
> > * config/i386/i386.md (*bt_setcqi): Prefer string form
> > preparation statement over braces for a single statement.
> > (*bt_setncqi): Likewise.
> > (*bt_setncqi_2): New define_insn_and_split.
> >
> > gcc/testsuite/ChangeLog
> > PR target/110588
> > * gcc.target/i386/pr110588.c: New test case.
> 
> +;; Help combine recognize bt followed by setnc (PR target/110588)
> +(define_insn_and_split "*bt_setncqi_2"
> +  [(set (match_operand:QI 0 "register_operand")  (eq:QI
> +  (zero_extract:SWI48
> +(match_operand:SWI48 1 "register_operand")
> +(const_int 1)
> +(zero_extend:SI (match_operand:QI 2 "register_operand")))
> +  (const_int 0)))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "TARGET_USE_BT && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(set (reg:CCC FLAGS_REG)
> +(compare:CCC
> + (zero_extract:SWI48 (match_dup 1) (const_int 1) (match_dup 2))
> + (const_int 0)))
> +   (set (match_dup 0)
> +(ne:QI (reg:CCC FLAGS_REG) (const_int 0)))]
> +  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")
> 
> I don't think the above transformation is 100% correct, mainly due to the use 
> of
> paradoxical subreg.
> 
> The combined instruction is operating with a zero_extended QImode register, so
> all bits of the register are well defined. You are splitting using 
> paradoxical subreg,
> so you don't know what garbage is there in the highpart of the count register.
> However, BTL/BTQ uses modulo 64 (or 32) of this register, so even with a 
> slightly
> invalid RTX, everything checks out.
> 
> +  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")
> 
> You probably need mode instead of SImode here.

The define_insn for *bt is:

(define_insn "*bt"
  [(set (reg:CCC FLAGS_REG)
(compare:CCC
  (zero_extract:SWI48
(match_operand:SWI48 0 "nonimmediate_operand" "r,m")
(const_int 1)
(match_operand:SI 1 "nonmemory_operand" "r,"))
  (const_int 0)))]

So  isn't appropriate here.

But now you've made me think about it, it's inconsistent that all of the shifts
and rotates in i386.md standardize on QImode for shift counts, but the bit test
instructions use SImode?  I think this explains where the paradoxical SUBREGs
come from, and in theory any_extend from QImode to SImode here could/should 
be handled/unnecessary.

Is it worth investigating a follow-up patch to convert all ZERO_EXTRACTs and
SIGN_EXTRACTs in i386.md to use QImode (instead of SImode)?

Thanks in advance,
Roger
--




Re: [x86 PATCH] PR target/110588: Add *bt_setncqi_2 to generate btl

2023-07-13 Thread Uros Bizjak via Gcc-patches
On Thu, Jul 13, 2023 at 7:10 PM Roger Sayle  wrote:
>
>
> This patch resolves PR target/110588 to catch another case in combine
> where the i386 backend should be generating a btl instruction.  This adds
> another define_insn_and_split to recognize the RTL representation for this
> case.
>
> I also noticed that two related define_insn_and_split weren't using the
> preferred string style for single statement preparation-statements, so
> I've reformatted these to be consistent in style with the new one.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-07-13  Roger Sayle  
>
> gcc/ChangeLog
> PR target/110588
> * config/i386/i386.md (*bt_setcqi): Prefer string form
> preparation statement over braces for a single statement.
> (*bt_setncqi): Likewise.
> (*bt_setncqi_2): New define_insn_and_split.
>
> gcc/testsuite/ChangeLog
> PR target/110588
> * gcc.target/i386/pr110588.c: New test case.

+;; Help combine recognize bt followed by setnc (PR target/110588)
+(define_insn_and_split "*bt_setncqi_2"
+  [(set (match_operand:QI 0 "register_operand")
+ (eq:QI
+  (zero_extract:SWI48
+(match_operand:SWI48 1 "register_operand")
+(const_int 1)
+(zero_extend:SI (match_operand:QI 2 "register_operand")))
+  (const_int 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_USE_BT && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG)
+(compare:CCC
+ (zero_extract:SWI48 (match_dup 1) (const_int 1) (match_dup 2))
+ (const_int 0)))
+   (set (match_dup 0)
+(ne:QI (reg:CCC FLAGS_REG) (const_int 0)))]
+  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")

I don't think the above transformation is 100% correct, mainly due to
the use of paradoxical subreg.

The combined instruction is operating with a zero_extended QImode
register, so all bits of the register are well defined. You are
splitting using paradoxical subreg, so you don't know what garbage is
there in the highpart of the count register. However, BTL/BTQ uses
modulo 64 (or 32) of this register, so even with a slightly invalid
RTX, everything checks out.

+  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")

You probably need mode instead of SImode here.

Otherwise OK.

Thanks,
Uros.


[x86 PATCH] PR target/110588: Add *bt_setncqi_2 to generate btl

2023-07-13 Thread Roger Sayle

This patch resolves PR target/110588 to catch another case in combine
where the i386 backend should be generating a btl instruction.  This adds
another define_insn_and_split to recognize the RTL representation for this
case.

I also noticed that two related define_insn_and_split weren't using the
preferred string style for single statement preparation-statements, so
I've reformatted these to be consistent in style with the new one.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-07-13  Roger Sayle  

gcc/ChangeLog
PR target/110588
* config/i386/i386.md (*bt_setcqi): Prefer string form
preparation statement over braces for a single statement.
(*bt_setncqi): Likewise.
(*bt_setncqi_2): New define_insn_and_split.

gcc/testsuite/ChangeLog
PR target/110588
* gcc.target/i386/pr110588.c: New test case.


Thanks again,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e47ced1..04eca049 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16170,9 +16170,7 @@
  (const_int 0)))
(set (match_dup 0)
 (eq:QI (reg:CCC FLAGS_REG) (const_int 0)))]
-{
-  operands[2] = lowpart_subreg (SImode, operands[2], QImode);
-})
+  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")
 
 ;; Help combine recognize bt followed by setnc
 (define_insn_and_split "*bt_setncqi"
@@ -16193,9 +16191,7 @@
  (const_int 0)))
(set (match_dup 0)
 (ne:QI (reg:CCC FLAGS_REG) (const_int 0)))]
-{
-  operands[2] = lowpart_subreg (SImode, operands[2], QImode);
-})
+  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")
 
 (define_insn_and_split "*bt_setnc"
   [(set (match_operand:SWI48 0 "register_operand")
@@ -16219,6 +16215,27 @@
   operands[2] = lowpart_subreg (SImode, operands[2], QImode);
   operands[3] = gen_reg_rtx (QImode);
 })
+
+;; Help combine recognize bt followed by setnc (PR target/110588)
+(define_insn_and_split "*bt_setncqi_2"
+  [(set (match_operand:QI 0 "register_operand")
+   (eq:QI
+ (zero_extract:SWI48
+   (match_operand:SWI48 1 "register_operand")
+   (const_int 1)
+   (zero_extend:SI (match_operand:QI 2 "register_operand")))
+ (const_int 0)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_USE_BT && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (reg:CCC FLAGS_REG)
+(compare:CCC
+ (zero_extract:SWI48 (match_dup 1) (const_int 1) (match_dup 2))
+ (const_int 0)))
+   (set (match_dup 0)
+(ne:QI (reg:CCC FLAGS_REG) (const_int 0)))]
+  "operands[2] = lowpart_subreg (SImode, operands[2], QImode);")
 
 ;; Store-flag instructions.
 
diff --git a/gcc/testsuite/gcc.target/i386/pr110588.c 
b/gcc/testsuite/gcc.target/i386/pr110588.c
new file mode 100644
index 000..4505c87
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110588.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mtune=core2" } */
+
+unsigned char foo (unsigned char x, int y)
+{
+  int _1 = (int) x;
+  int _2 = _1 >> y;
+  int _3 = _2 & 1;
+  unsigned char _8 = (unsigned char) _3;
+  unsigned char _6 = _8 ^ 1;
+  return _6;
+}
+
+/* { dg-final { scan-assembler "btl" } } */
+/* { dg-final { scan-assembler "setnc" } } */
+/* { dg-final { scan-assembler-not "sarl" } } */
+/* { dg-final { scan-assembler-not "andl" } } */
+/* { dg-final { scan-assembler-not "xorl" } } */