Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]

2021-01-05 Thread Uros Bizjak via Gcc-patches
On Tue, Jan 5, 2021 at 11:25 AM Hongtao Liu  wrote:
>
> On Tue, Jan 5, 2021 at 3:20 PM Uros Bizjak  wrote:
> >
> > On Tue, Jan 5, 2021 at 8:04 AM Uros Bizjak  wrote:
> > > >
> > > > +(define_split
> > > > +  [(set (match_operand:SI 0 "register_operand")
> > > > +(zero_extend:SI
> > > > +  (not:HI
> > > > +(subreg:HI
> > > > +  (unspec:SI
> > > > +[(match_operand:V16QI 1 "register_operand")]
> > > > +UNSPEC_MOVMSK) 0]
> > > > +  "TARGET_SSE2"
> > > > +  [(set (match_dup 2)
> > > > +(unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))
> > > > +   (set (match_dup 0)
> > > > +(match_dup 3))]
> > >
> > > Just write:
> > >
> > > (set (match_dup 0)
> > > (xor:SI (match_dup 2)(const_int 65535))
> >
>
> Yes, changed.
>
> > BTW: This could be a universal combine splitter to simplify
> >
> > unsigned int foo (unsigned short z)
> > {
> > return (unsigned short)~z;
> > }
> >
> > Trying 7 -> 8:
> >7: r87:HI=~r88:SI#0
> >  REG_DEAD r88:SI
> >8: r86:SI=zero_extend(r87:HI)
> >  REG_DEAD r87:HI
> > Failed to match this instruction:
> > (set (reg:SI 86)
> >(zero_extend:SI (not:HI (subreg:HI (reg:SI 88) 0
> >
> > But combine does not "split" to one insns.
>
> Yes, according to PSabi, the top half of the register is not
> necessarily 0, so if you add the splitter, it just changes from notl +
> movzwl to xor + movzwl, which doesn't look better?

Indeed.

The patch is OK.

Uros.


Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]

2021-01-05 Thread Hongtao Liu via Gcc-patches
On Tue, Jan 5, 2021 at 3:20 PM Uros Bizjak  wrote:
>
> On Tue, Jan 5, 2021 at 8:04 AM Uros Bizjak  wrote:
> > >
> > > +(define_split
> > > +  [(set (match_operand:SI 0 "register_operand")
> > > +(zero_extend:SI
> > > +  (not:HI
> > > +(subreg:HI
> > > +  (unspec:SI
> > > +[(match_operand:V16QI 1 "register_operand")]
> > > +UNSPEC_MOVMSK) 0]
> > > +  "TARGET_SSE2"
> > > +  [(set (match_dup 2)
> > > +(unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))
> > > +   (set (match_dup 0)
> > > +(match_dup 3))]
> >
> > Just write:
> >
> > (set (match_dup 0)
> > (xor:SI (match_dup 2)(const_int 65535))
>

Yes, changed.

> BTW: This could be a universal combine splitter to simplify
>
> unsigned int foo (unsigned short z)
> {
> return (unsigned short)~z;
> }
>
> Trying 7 -> 8:
>7: r87:HI=~r88:SI#0
>  REG_DEAD r88:SI
>8: r86:SI=zero_extend(r87:HI)
>  REG_DEAD r87:HI
> Failed to match this instruction:
> (set (reg:SI 86)
>(zero_extend:SI (not:HI (subreg:HI (reg:SI 88) 0
>
> But combine does not "split" to one insns.

Yes, according to PSabi, the top half of the register is not
necessarily 0, so if you add the splitter, it just changes from notl +
movzwl to xor + movzwl, which doesn't look better?

>
> Uros.



-- 
BR,
Hongtao


Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]

2021-01-04 Thread Uros Bizjak via Gcc-patches
On Tue, Jan 5, 2021 at 8:04 AM Uros Bizjak  wrote:
> >
> > +(define_split
> > +  [(set (match_operand:SI 0 "register_operand")
> > +(zero_extend:SI
> > +  (not:HI
> > +(subreg:HI
> > +  (unspec:SI
> > +[(match_operand:V16QI 1 "register_operand")]
> > +UNSPEC_MOVMSK) 0]
> > +  "TARGET_SSE2"
> > +  [(set (match_dup 2)
> > +(unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))
> > +   (set (match_dup 0)
> > +(match_dup 3))]
>
> Just write:
>
> (set (match_dup 0)
> (xor:SI (match_dup 2)(const_int 65535))

BTW: This could be a universal combine splitter to simplify

unsigned int foo (unsigned short z)
{
return (unsigned short)~z;
}

Trying 7 -> 8:
   7: r87:HI=~r88:SI#0
 REG_DEAD r88:SI
   8: r86:SI=zero_extend(r87:HI)
 REG_DEAD r87:HI
Failed to match this instruction:
(set (reg:SI 86)
   (zero_extend:SI (not:HI (subreg:HI (reg:SI 88) 0

But combine does not "split" to one insns.

Uros.


Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]

2021-01-04 Thread Uros Bizjak via Gcc-patches
On Tue, Jan 5, 2021 at 7:30 AM Hongtao Liu  wrote:
>
> On Mon, Jan 4, 2021 at 4:59 PM Hongtao Liu  wrote:
> >
> > On Mon, Jan 4, 2021 at 4:49 PM Jakub Jelinek  wrote:
> > >
> > > On Mon, Jan 04, 2021 at 01:56:44PM +0800, Hongtao Liu via Gcc-patches 
> > > wrote:
> > > > +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> > > > +  [(set (match_operand:SI 0 "register_operand")
> > > > +   (zero_extend:SI (subreg:HI (unspec:SI
> > > > + [(match_operand:V16QI 1 "register_operand")]
> > > > +  UNSPEC_MOVMSK) 0)))]
> > >
> > > Also, please fix up formatting.  Should be:
> > > (zero_extend:SI
> > >   (subreg:HI
> > > (unspec:SI
> > >   [(match_operand:V16QI 1 "register_operand")]
> > >   UNSPEC_MOVMSK) 0)))]
> > > I think.
> > >
> > > Jakub
> > >
> >
> > Yes, thanks for the review both, and happy new year!
> >
> > --
> > BR,
> > Hongtao
>
> Sorry for the bother, this is an incremental patch to split
> (zero_extend:SI (not:HI (subreg:HI (pmovmskb result:SI to
>
> pmovmskb%xmm0, %eax
> -   notl%eax
> -   movzwl  %ax, %eax
> +   xorl$65535, %eax
>
>
> The patch is below, regtestes and bootstrapped on x86_64-linux-gnu{-m32,}.
>   Ok for trunk?
>
> The following patch adds define_insn_and_split to optimize
>
>vpmovmskb   %xmm0, %eax
> -   movzwl  %ax, %eax
> notl%eax
>
> and combine splitter to optimize
>
> pmovmskb%xmm0, %eax
> -   notl%eax
> -   movzwl  %ax, %eax
> +   xorl$65535, %eax
>
> gcc/ChangeLog
> PR target/98461
> * config/i386/sse.md (*sse2_pmovskb_zexthisi): New
> define_insn_and_split for zero_extend of subreg HI of pmovskb
> result.
> (*sse2_pmovskb_zexthisi): Add new combine splitters for
> zero_extend of not of subreg HI of pmovskb result.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/sse-pr98461-2.c: New test.
> ---
>  gcc/config/i386/sse.md| 32 +++
>  .../gcc.target/i386/sse2-pr98461-2.c  | 25 +++
>  2 files changed, 57 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index d84103807ff..4fcff0800c0 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -16099,6 +16099,38 @@ (define_insn "*sse2_pmovmskb_ext"
> (set_attr "prefix" "maybe_vex")
> (set_attr "mode" "SI")])
>
> +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> +  [(set (match_operand:SI 0 "register_operand")
> +(zero_extend:SI
> +  (subreg:HI
> +(unspec:SI
> +  [(match_operand:V16QI 1 "register_operand")]
> +  UNSPEC_MOVMSK) 0)))]
> +  "TARGET_SSE2 && ix86_pre_reload_split ()"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +(unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))])
> +
> +(define_split
> +  [(set (match_operand:SI 0 "register_operand")
> +(zero_extend:SI
> +  (not:HI
> +(subreg:HI
> +  (unspec:SI
> +[(match_operand:V16QI 1 "register_operand")]
> +UNSPEC_MOVMSK) 0]
> +  "TARGET_SSE2"
> +  [(set (match_dup 2)
> +(unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))
> +   (set (match_dup 0)
> +(match_dup 3))]

Just write:

(set (match_dup 0)
(xor:SI (match_dup 2)(const_int 65535))

Uros.

>
> +{
> +  operands[2] = gen_reg_rtx (SImode);
> +  operands[3] = gen_int_mode ((HOST_WIDE_INT_1 << 16) - 1, SImode);
> +  operands[3] = gen_rtx_XOR (SImode, operands[2], operands[3]);
> +})
> +
>  (define_split
>[(set (match_operand:SI 0 "register_operand")
>  (unspec:SI
> diff --git a/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> new file mode 100644
> index 000..330272c69bc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> @@ -0,0 +1,25 @@
> +/* PR target/98461 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -mno-sse3 -masm=att" } */
> +/* { dg-final { scan-assembler-times "\tpmovmskb\t" 3 } } */
> +/* { dg-final { scan-assembler-not "\tmovzwl" } } */
> +/* { dg-final { scan-assembler-times "\tnotl" 1 } } *
> +/* { dg-final { scan-assembler-times "\txorl" 1 } } */
> +
> +#include 
> +
> +unsigned int movemask_not1(__m128i logical) {
> +  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
> +  return ~res;
> +}
> +
> +unsigned int movemask_not2(__m128i logical) {
> +  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
> +  res = ~res;
> +  return res;
> +}
> +
> +unsigned int movemask_zero_extend(__m128i logical) {
> +  unsigned int res = _mm_movemask_epi8(logical);
> +  return res & 0x;
> +}
> --
> 2.18.1
>
>
> --
> BR,
> Hongtao


Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]

2021-01-04 Thread Hongtao Liu via Gcc-patches
On Mon, Jan 4, 2021 at 4:59 PM Hongtao Liu  wrote:
>
> On Mon, Jan 4, 2021 at 4:49 PM Jakub Jelinek  wrote:
> >
> > On Mon, Jan 04, 2021 at 01:56:44PM +0800, Hongtao Liu via Gcc-patches wrote:
> > > +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> > > +  [(set (match_operand:SI 0 "register_operand")
> > > +   (zero_extend:SI (subreg:HI (unspec:SI
> > > + [(match_operand:V16QI 1 "register_operand")]
> > > +  UNSPEC_MOVMSK) 0)))]
> >
> > Also, please fix up formatting.  Should be:
> > (zero_extend:SI
> >   (subreg:HI
> > (unspec:SI
> >   [(match_operand:V16QI 1 "register_operand")]
> >   UNSPEC_MOVMSK) 0)))]
> > I think.
> >
> > Jakub
> >
>
> Yes, thanks for the review both, and happy new year!
>
> --
> BR,
> Hongtao

Sorry for the bother, this is an incremental patch to split
(zero_extend:SI (not:HI (subreg:HI (pmovmskb result:SI to

pmovmskb%xmm0, %eax
-   notl%eax
-   movzwl  %ax, %eax
+   xorl$65535, %eax


The patch is below, regtestes and bootstrapped on x86_64-linux-gnu{-m32,}.
  Ok for trunk?

The following patch adds define_insn_and_split to optimize

   vpmovmskb   %xmm0, %eax
-   movzwl  %ax, %eax
notl%eax

and combine splitter to optimize

pmovmskb%xmm0, %eax
-   notl%eax
-   movzwl  %ax, %eax
+   xorl$65535, %eax

gcc/ChangeLog
PR target/98461
* config/i386/sse.md (*sse2_pmovskb_zexthisi): New
define_insn_and_split for zero_extend of subreg HI of pmovskb
result.
(*sse2_pmovskb_zexthisi): Add new combine splitters for
zero_extend of not of subreg HI of pmovskb result.

gcc/testsuite/ChangeLog
* gcc.target/i386/sse-pr98461-2.c: New test.
---
 gcc/config/i386/sse.md| 32 +++
 .../gcc.target/i386/sse2-pr98461-2.c  | 25 +++
 2 files changed, 57 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d84103807ff..4fcff0800c0 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -16099,6 +16099,38 @@ (define_insn "*sse2_pmovmskb_ext"
(set_attr "prefix" "maybe_vex")
(set_attr "mode" "SI")])

+(define_insn_and_split "*sse2_pmovskb_zexthisi"
+  [(set (match_operand:SI 0 "register_operand")
+(zero_extend:SI
+  (subreg:HI
+(unspec:SI
+  [(match_operand:V16QI 1 "register_operand")]
+  UNSPEC_MOVMSK) 0)))]
+  "TARGET_SSE2 && ix86_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+(unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))])
+
+(define_split
+  [(set (match_operand:SI 0 "register_operand")
+(zero_extend:SI
+  (not:HI
+(subreg:HI
+  (unspec:SI
+[(match_operand:V16QI 1 "register_operand")]
+UNSPEC_MOVMSK) 0]
+  "TARGET_SSE2"
+  [(set (match_dup 2)
+(unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))
+   (set (match_dup 0)
+(match_dup 3))]
+{
+  operands[2] = gen_reg_rtx (SImode);
+  operands[3] = gen_int_mode ((HOST_WIDE_INT_1 << 16) - 1, SImode);
+  operands[3] = gen_rtx_XOR (SImode, operands[2], operands[3]);
+})
+
 (define_split
   [(set (match_operand:SI 0 "register_operand")
 (unspec:SI
diff --git a/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
new file mode 100644
index 000..330272c69bc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
@@ -0,0 +1,25 @@
+/* PR target/98461 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-sse3 -masm=att" } */
+/* { dg-final { scan-assembler-times "\tpmovmskb\t" 3 } } */
+/* { dg-final { scan-assembler-not "\tmovzwl" } } */
+/* { dg-final { scan-assembler-times "\tnotl" 1 } } *
+/* { dg-final { scan-assembler-times "\txorl" 1 } } */
+
+#include 
+
+unsigned int movemask_not1(__m128i logical) {
+  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
+  return ~res;
+}
+
+unsigned int movemask_not2(__m128i logical) {
+  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
+  res = ~res;
+  return res;
+}
+
+unsigned int movemask_zero_extend(__m128i logical) {
+  unsigned int res = _mm_movemask_epi8(logical);
+  return res & 0x;
+}
-- 
2.18.1


-- 
BR,
Hongtao


Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]

2021-01-04 Thread Hongtao Liu via Gcc-patches
On Mon, Jan 4, 2021 at 4:49 PM Jakub Jelinek  wrote:
>
> On Mon, Jan 04, 2021 at 01:56:44PM +0800, Hongtao Liu via Gcc-patches wrote:
> > +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> > +  [(set (match_operand:SI 0 "register_operand")
> > +   (zero_extend:SI (subreg:HI (unspec:SI
> > + [(match_operand:V16QI 1 "register_operand")]
> > +  UNSPEC_MOVMSK) 0)))]
>
> Also, please fix up formatting.  Should be:
> (zero_extend:SI
>   (subreg:HI
> (unspec:SI
>   [(match_operand:V16QI 1 "register_operand")]
>   UNSPEC_MOVMSK) 0)))]
> I think.
>
> Jakub
>

Yes, thanks for the review both, and happy new year!

-- 
BR,
Hongtao


Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]

2021-01-04 Thread Jakub Jelinek via Gcc-patches
On Mon, Jan 04, 2021 at 01:56:44PM +0800, Hongtao Liu via Gcc-patches wrote:
> +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> +  [(set (match_operand:SI 0 "register_operand")
> +   (zero_extend:SI (subreg:HI (unspec:SI
> + [(match_operand:V16QI 1 "register_operand")]
> +  UNSPEC_MOVMSK) 0)))]

Also, please fix up formatting.  Should be:
(zero_extend:SI
  (subreg:HI
(unspec:SI
  [(match_operand:V16QI 1 "register_operand")]
  UNSPEC_MOVMSK) 0)))]
I think.

Jakub



Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]

2021-01-04 Thread Uros Bizjak via Gcc-patches
On Mon, Jan 4, 2021 at 8:52 AM Hongtao Liu  wrote:
>
> On Mon, Jan 4, 2021 at 3:40 PM Uros Bizjak  wrote:
> >
> > On Mon, Jan 4, 2021 at 6:54 AM Hongtao Liu  wrote:
> > >
> > > Hi:
> > >   The following patch adds define_insn_and_split to optimize
> > >
> > >vpmovmskb   %xmm0, %eax
> > > -   movzwl  %ax, %eax
> > > notl%eax
> > >
> > >   Bootstrapped/regtested on x86_64-linux-gnu {,-m32}.
> > >   Ok for trunk?
> > >
> > > gcc/ChangeLog
> > > PR target/98461
> > > * config/i386/sse.md (*sse2_pmovskb_zexthisi): New
> > > define_insn_and_split for zero_extend of subreg HI of pmovskb
> > > result.
> > >
> > > gcc/testsuite/ChangeLog
> > > * gcc.target/i386/sse-pr98461-2.c: New test.
> > > ---
> > >  gcc/config/i386/sse.md | 11 +++
> > >  gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c | 13 +
> > >  2 files changed, 24 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> > >
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > index d84103807ff..4ed6b9ae476 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -16099,6 +16099,17 @@ (define_insn "*sse2_pmovmskb_ext"
> > > (set_attr "prefix" "maybe_vex")
> > > (set_attr "mode" "SI")])
> > >
> > > +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> > > +  [(set (match_operand:SI 0 "register_operand")
> > > +   (zero_extend:SI (subreg:HI (unspec:SI
> > > + [(match_operand:V16QI 1 "register_operand")]
> > > +  UNSPEC_MOVMSK) 0)))]
> > > +  "TARGET_SSE2"
> >
> > This needs ix86_pre_reload_split () in insn predicate.
> >
>
> Yes, there's subreg in the pattern.

Also the insn pattern does not have operand constraints.

> Assume patch is pre-approved with that change and
> regtested/bootstrapped on x86_64-linux-gnu{-m32,}.

LGTM with the above addition.

Uros.


Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]

2021-01-03 Thread Hongtao Liu via Gcc-patches
On Mon, Jan 4, 2021 at 3:40 PM Uros Bizjak  wrote:
>
> On Mon, Jan 4, 2021 at 6:54 AM Hongtao Liu  wrote:
> >
> > Hi:
> >   The following patch adds define_insn_and_split to optimize
> >
> >vpmovmskb   %xmm0, %eax
> > -   movzwl  %ax, %eax
> > notl%eax
> >
> >   Bootstrapped/regtested on x86_64-linux-gnu {,-m32}.
> >   Ok for trunk?
> >
> > gcc/ChangeLog
> > PR target/98461
> > * config/i386/sse.md (*sse2_pmovskb_zexthisi): New
> > define_insn_and_split for zero_extend of subreg HI of pmovskb
> > result.
> >
> > gcc/testsuite/ChangeLog
> > * gcc.target/i386/sse-pr98461-2.c: New test.
> > ---
> >  gcc/config/i386/sse.md | 11 +++
> >  gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c | 13 +
> >  2 files changed, 24 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index d84103807ff..4ed6b9ae476 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -16099,6 +16099,17 @@ (define_insn "*sse2_pmovmskb_ext"
> > (set_attr "prefix" "maybe_vex")
> > (set_attr "mode" "SI")])
> >
> > +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> > +  [(set (match_operand:SI 0 "register_operand")
> > +   (zero_extend:SI (subreg:HI (unspec:SI
> > + [(match_operand:V16QI 1 "register_operand")]
> > +  UNSPEC_MOVMSK) 0)))]
> > +  "TARGET_SSE2"
>
> This needs ix86_pre_reload_split () in insn predicate.
>

Yes, there's subreg in the pattern.
Assume patch is pre-approved with that change and
regtested/bootstrapped on x86_64-linux-gnu{-m32,}.

> Uros.
>
> > +  "#"
> > +  "&& 1"
> > +  [(set (match_dup 0)
> > +   (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))])
> > +
> >  (define_split
> >[(set (match_operand:SI 0 "register_operand")
> > (unspec:SI
> > diff --git a/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> > b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> > new file mode 100644
> > index 000..60fc1f3e9c1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> > @@ -0,0 +1,13 @@
> > +/* PR target/98461 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -msse2 -mno-sse3 -masm=att" } */
> > +/* { dg-final { scan-assembler-times "\tpmovmskb\t" 1 } } */
> > +/* { dg-final { scan-assembler-not "\tmovzwl" } } */
> > +/* { dg-final { scan-assembler-times "\tnotl" 1 } } */
> > +
> > +#include 
> > +
> > +unsigned int movemask_not1(__m128i logical) {
> > +  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
> > +  return ~res;
> > +}
> > --
> > 2.18.1
> >
> >
> > --
> > BR,
> > Hongtao



-- 
BR,
Hongtao


Re: [PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]

2021-01-03 Thread Uros Bizjak via Gcc-patches
On Mon, Jan 4, 2021 at 6:54 AM Hongtao Liu  wrote:
>
> Hi:
>   The following patch adds define_insn_and_split to optimize
>
>vpmovmskb   %xmm0, %eax
> -   movzwl  %ax, %eax
> notl%eax
>
>   Bootstrapped/regtested on x86_64-linux-gnu {,-m32}.
>   Ok for trunk?
>
> gcc/ChangeLog
> PR target/98461
> * config/i386/sse.md (*sse2_pmovskb_zexthisi): New
> define_insn_and_split for zero_extend of subreg HI of pmovskb
> result.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/sse-pr98461-2.c: New test.
> ---
>  gcc/config/i386/sse.md | 11 +++
>  gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c | 13 +
>  2 files changed, 24 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index d84103807ff..4ed6b9ae476 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -16099,6 +16099,17 @@ (define_insn "*sse2_pmovmskb_ext"
> (set_attr "prefix" "maybe_vex")
> (set_attr "mode" "SI")])
>
> +(define_insn_and_split "*sse2_pmovskb_zexthisi"
> +  [(set (match_operand:SI 0 "register_operand")
> +   (zero_extend:SI (subreg:HI (unspec:SI
> + [(match_operand:V16QI 1 "register_operand")]
> +  UNSPEC_MOVMSK) 0)))]
> +  "TARGET_SSE2"

This needs ix86_pre_reload_split () in insn predicate.

Uros.

> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +   (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))])
> +
>  (define_split
>[(set (match_operand:SI 0 "register_operand")
> (unspec:SI
> diff --git a/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> new file mode 100644
> index 000..60fc1f3e9c1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
> @@ -0,0 +1,13 @@
> +/* PR target/98461 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -mno-sse3 -masm=att" } */
> +/* { dg-final { scan-assembler-times "\tpmovmskb\t" 1 } } */
> +/* { dg-final { scan-assembler-not "\tmovzwl" } } */
> +/* { dg-final { scan-assembler-times "\tnotl" 1 } } */
> +
> +#include 
> +
> +unsigned int movemask_not1(__m128i logical) {
> +  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
> +  return ~res;
> +}
> --
> 2.18.1
>
>
> --
> BR,
> Hongtao


[PATCH]i386: Optimize pmovskb on zero_extend of subreg HI of the result [PR98461]

2021-01-03 Thread Hongtao Liu via Gcc-patches
Hi:
  The following patch adds define_insn_and_split to optimize

   vpmovmskb   %xmm0, %eax
-   movzwl  %ax, %eax
notl%eax

  Bootstrapped/regtested on x86_64-linux-gnu {,-m32}.
  Ok for trunk?

gcc/ChangeLog
PR target/98461
* config/i386/sse.md (*sse2_pmovskb_zexthisi): New
define_insn_and_split for zero_extend of subreg HI of pmovskb
result.

gcc/testsuite/ChangeLog
* gcc.target/i386/sse-pr98461-2.c: New test.
---
 gcc/config/i386/sse.md | 11 +++
 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c | 13 +
 2 files changed, 24 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d84103807ff..4ed6b9ae476 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -16099,6 +16099,17 @@ (define_insn "*sse2_pmovmskb_ext"
(set_attr "prefix" "maybe_vex")
(set_attr "mode" "SI")])

+(define_insn_and_split "*sse2_pmovskb_zexthisi"
+  [(set (match_operand:SI 0 "register_operand")
+   (zero_extend:SI (subreg:HI (unspec:SI
+ [(match_operand:V16QI 1 "register_operand")]
+  UNSPEC_MOVMSK) 0)))]
+  "TARGET_SSE2"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))])
+
 (define_split
   [(set (match_operand:SI 0 "register_operand")
(unspec:SI
diff --git a/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
new file mode 100644
index 000..60fc1f3e9c1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-pr98461-2.c
@@ -0,0 +1,13 @@
+/* PR target/98461 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-sse3 -masm=att" } */
+/* { dg-final { scan-assembler-times "\tpmovmskb\t" 1 } } */
+/* { dg-final { scan-assembler-not "\tmovzwl" } } */
+/* { dg-final { scan-assembler-times "\tnotl" 1 } } */
+
+#include 
+
+unsigned int movemask_not1(__m128i logical) {
+  unsigned short res = (unsigned short)(_mm_movemask_epi8(logical));
+  return ~res;
+}
-- 
2.18.1


-- 
BR,
Hongtao