On Thu, Aug 20, 2020 at 3:40 PM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Thu, Aug 20, 2020 at 9:31 AM Hongtao Liu <crazy...@gmail.com> wrote:
> >
> > On Thu, Aug 20, 2020 at 3:24 PM Hongtao Liu <crazy...@gmail.com> wrote:
> > >
> > > On Wed, Aug 19, 2020 at 3:05 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> > > >
> > > > On Wed, Aug 19, 2020 at 4:25 AM Hongtao Liu <crazy...@gmail.com> wrote:
> > > > >
> > > > > On Mon, Aug 17, 2020 at 6:08 PM Uros Bizjak <ubiz...@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, Aug 14, 2020 at 10:26 AM Hongtao Liu <crazy...@gmail.com> 
> > > > > > wrote:
> > > > > > >
> > > > > > > Enable operator or/xor/and/andn/not for mask register, kxnor is 
> > > > > > > not
> > > > > > > enabled since there's no corresponding instruction for general
> > > > > > > registers.
> > > > > > >
> > > > > > > gcc/
> > > > > > >         PR target/88808
> > > > > > >         * config/i386/i386.md: (*movsi_internal): Adjust 
> > > > > > > constraints
> > > > > > >         for mask registers.
> > > > > > >         (*movhi_internal): Ditto.
> > > > > > >         (*movqi_internal): Ditto.
> > > > > > >         (*anddi_1): Support mask register operations
> > > > > > >         (*and<mode>_1): Ditto.
> > > > > > >         (*andqi_1): Ditto.
> > > > > > >         (*andn<mode>_1): Ditto.
> > > > > > >         (*<code><mode>_1): Ditto.
> > > > > > >         (*<code>qi_1): Ditto.
> > > > > > >         (*one_cmpl<mode>2_1): Ditto.
> > > > > > >         (*one_cmplsi2_1_zext): Ditto.
> > > > > > >         (*one_cmplqi2_1): Ditto.
> > > > > > >
> > > > > > > gcc/testsuite/
> > > > > > >         * gcc.target/i386/bitwise_mask_op-1.c: New test.
> > > > > > >         * gcc.target/i386/bitwise_mask_op-2.c: New test.
> > > > > > >         * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> > > > > > >         * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> > > > > > >         * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> > > > > > >         * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
> > > > > >
> > > > > > index 74d207c3711..e8ad79d1b0a 100644
> > > > > > --- a/gcc/config/i386/i386.md
> > > > > > +++ b/gcc/config/i386/i386.md
> > > > > > @@ -2294,7 +2294,7 @@
> > > > > >
> > > > > >  (define_insn "*movsi_internal"
> > > > > >    [(set (match_operand:SI 0 "nonimmediate_operand"
> > > > > > -    "=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,*k")
> > > > > > +    "=r,m ,*y,*y,?*y,?m,?r,?*y,*v,*v,*v,m ,?r,?*v,*k,*k ,*rm,k")
> > > > > >      (match_operand:SI 1 "general_operand"
> > > > > >      "g ,re,C ,*y,m  ,*y,*y,r  ,C ,*v,m ,*v,*v,r  ,*r,*km,*k 
> > > > > > ,CBC"))]
> > > > > >    "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
> > > > > >
> > > > > > I'd rather see *k everywhere, also with *movqi_internal and
> > > > > > *movhi_internal patterns. The "*" means that the allocator won't
> > > > > > allocate a mask register by default, but it will be used to optimize
> > > > > > moves. With the above change, you are risking that during integer
> > > > > > register pressure, the register allocator will allocate zero to a 
> > > > > > mask
> > > > > > register, and later "optimize" the move with a direct maskreg-intreg
> > > > > > move.
> > > > > >
> > > > > > The current strategy is that only general registers get allocated 
> > > > > > for
> > > > > > integer modes. Let's keep it this way for now.
> > > > > >
> > > > >
> > > > > Yes,  though it would fail gcc.target/i386/avx512dq-pr88465.c and
> > > > > gcc.target/i386/avx512f-pr88465.c, i think it's more reasonable not to
> > > > > move zero into mask register directly.
> > > >
> > > > Although it would be nice if the register allocator was smart enough,
> > > > the current strategy is to introduce peephole2 patterns to fix these
> > > > problems, similar to [1]. These peepholes can be introduced in a
> > > > follow-up patch.
> > > >
> > > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551744.html
> > > >
> > >
> > > peephole2 added.
> > >
> > > > > > Otherwise, the patchset LGTM, but please test the suggested changes 
> > > > > > and repost.
> > > > > >
> > > > > > BTW: Do you plan to remove mask operations from sse.md? ATM, they 
> > > > > > are
> > > > > > used to distinguish mask operations, generated from builtins from
> > > > > > generic operations, so I'd like to keep them for a while. The 
> > > > > > drawback
> > > > > > is, that they are not combined with other operations, but at the end
> > > > > > of the day, this is what the programmer asked for by using builtins.
> > > > >
> > > > > Agree, I prefer to keep them.
> > > >
> > > > Thinking some more about the approach, it looks to me that the optimal
> > > > solution is a post-reload splitter that would convert "generic"
> > > > patterns to mask operations from sse.md. The mask operations don't set
> > > > flags, so we can substantially improve post reload scheduling of these
> > > > instructions by removing flags clobber.
> > > >
> > > > So, simply add "#" to relevant alternatives of logic patterns and add
> > > > something like:
> > > >
> > > > --cut here--
> > > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > > index 41c6dbfa668..ad49bdc7583 100644
> > > > --- a/gcc/config/i386/sse.md
> > > > +++ b/gcc/config/i386/sse.md
> > > > @@ -1470,6 +1470,18 @@
> > > >            ]
> > > >            (const_string "<MODE>")))])
> > > >
> > > > +(define_split
> > > > +  [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand")
> > > > +       (any_logic:SWI1248_AVX512BW
> > > > +         (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand")
> > > > +         (match_operand:SWI1248_AVX512BW 2 "mask_reg_operand")))
> > > > +   (clobber (reg:CC FLAGS_REG))]
> > > > +  "TARGET_AVX512F && reload_completed"
> > > > +  [(parallel
> > > > +     [(set (match_dup 0)
> > > > +          (any_logic:SWI1248_AVX512BW (match_dup 1) (match_dup 2)))
> > > > +      (unspec [(const_int 0)] UNSPEC_MASKOP)])])
> > > > +
> > > >  (define_insn "kandn<mode>"
> > > >    [(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k")
> > > >         (and:SWI1248_AVX512BW
> > > > --cut here--
> > > >
> > > > and similar for kandn and knot in sse.md. You will have to add
> > > > mask_reg_operand predicate, see e.g. sse_reg_operand in predicates.md
> > > > for example.
> > > >
> > > > We don't lose anything, because all important transformations,
> > > > propagations and simplifications with these patterns happen before
> > > > reload.
> > >
> > > define_splits are added for those bitwise operations.
> > >
> > > >
> > > > Uros.
> > >
> > > Also add bellow part which will pass gcc.target/i386/bitwise_mask_op-3.c
> > >
> > > -     must go into Q_REGS.  */
> > > +     must go into Q_REGS or ALL_MASK_REGS.  */
> > >    if (GET_MODE (x) == QImode && !CONSTANT_P (x))
> > >      {
> > >        if (Q_CLASS_P (regclass))
> > >         return regclass;
> > >        else if (reg_class_subset_p (Q_REGS, regclass))
> > >         return Q_REGS;
> > > +      else if (MASK_CLASS_P (regclass))
> > > +       return regclass;
> > >        else
> > >         return NO_REGS;
> > >
> > >
> > > Update patch.
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
> >
> > networking is slow to send out mail with attachment, so i copy the
> > patch into mail.
> >
> > gcc/
> >         PR target/88808
> >         * config/i386/i386.c (ix86_preferred_reload_class): Allow
> >         QImode data go into mask registers.
> >         * config/i386/i386.md: (*movhi_internal): Adjust constraints
> >         for mask registers.
> >         (*movqi_internal): Ditto.
> >         (*anddi_1): Support mask register operations
> >         (*and<mode>_1): Ditto.
> >         (*andqi_1): Ditto.
> >         (*andn<mode>_1): Ditto.
> >         (*<code><mode>_1): Ditto.
> >         (*<code>qi_1): Ditto.
> >         (*one_cmpl<mode>2_1): Ditto.
> >         (*one_cmplsi2_1_zext): Ditto.
> >         (*one_cmplqi2_1): Ditto.
> >         (define_peephole2): Move constant 0/-1 directly into mask
> >         registers.
> >         * config/i386/predicates.md (mask_reg_operand): New predicate.
> >         * config/i386/sse.md (define_split): Add post-reload splitters
> >         that would convert "generic" patterns to mask patterns.
> >         (*knotsi_1_zext): New define_insn.
> >
> > gcc/testsuite/
> >         * gcc.target/i386/bitwise_mask_op-1.c: New test.
> >         * gcc.target/i386/bitwise_mask_op-2.c: New test.
> >         * gcc.target/i386/bitwise_mask_op-3.c: New test.
> >         * gcc.target/i386/avx512bw-pr88465.c: New testcase.
> >         * gcc.target/i386/avx512bw-kunpckwd-1.c: Adjust testcase.
> >         * gcc.target/i386/avx512bw-kunpckwd-3.c: Ditto.
> >         * gcc.target/i386/avx512dq-kmovb-5.c: Ditto.
> >         * gcc.target/i386/avx512f-kmovw-5.c: Ditto.
>
> A little nit, please put new splitters after the instruction pattern.
>
> OK for the whole patch set with the above change,
>

Yes, thanks for the review.

> Thanks,
> Uros.
>
> > ---
> >  gcc/config/i386/i386.c                        |   4 +-
> >  gcc/config/i386/i386.md                       | 209 ++++++++++++------
> >  gcc/config/i386/predicates.md                 |   5 +
> >  gcc/config/i386/sse.md                        |  59 +++++
> >  .../gcc.target/i386/avx512bw-kunpckwd-1.c     |   2 +-
> >  .../gcc.target/i386/avx512bw-kunpckwd-3.c     |   2 +-
> >  .../gcc.target/i386/avx512bw-pr88465.c        |  23 ++
> >  .../gcc.target/i386/avx512dq-kmovb-5.c        |   2 +-
> >  .../gcc.target/i386/avx512f-kmovw-5.c         |   2 +-
> >  .../gcc.target/i386/bitwise_mask_op-1.c       | 178 +++++++++++++++
> >  .../gcc.target/i386/bitwise_mask_op-2.c       |   8 +
> >  .../gcc.target/i386/bitwise_mask_op-3.c       |  44 ++++
> >  12 files changed, 471 insertions(+), 67 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr88465.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/bitwise_mask_op-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/bitwise_mask_op-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c
> >
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > index d71d6d55be6..e8a2182ceb0 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -18407,13 +18407,15 @@ ix86_preferred_reload_class (rtx x,
> > reg_class_t regclass)
> >      return INTEGER_CLASS_P (regclass) ? regclass : NO_REGS;
> >
> >    /* QImode constants are easy to load, but non-constant QImode data
> > -     must go into Q_REGS.  */
> > +     must go into Q_REGS or ALL_MASK_REGS.  */
> >    if (GET_MODE (x) == QImode && !CONSTANT_P (x))
> >      {
> >        if (Q_CLASS_P (regclass))
> >          return regclass;
> >        else if (reg_class_subset_p (Q_REGS, regclass))
> >          return Q_REGS;
> > +      else if (MASK_CLASS_P (regclass))
> > +        return regclass;
> >        else
> >          return NO_REGS;
> >      }
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index 3a15941c3e8..676525fbc1f 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -2403,8 +2403,8 @@
> >             (symbol_ref "true")))])
> >
> >  (define_insn "*movhi_internal"
> > -  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,k,k 
> > ,r,m,k")
> > -        (match_operand:HI 1 "general_operand"      "r
> > ,rn,rm,rn,r,km,k,k,CBC"))]
> > +  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,*k,*k
> > ,*r,*m,*k")
> > +        (match_operand:HI 1 "general_operand"      "r
> > ,rn,rm,rn,*r,*km,*k,*k,CBC"))]
> >    "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
> >  {
> >    switch (get_attr_type (insn))
> > @@ -2491,9 +2491,9 @@
> >
> >  (define_insn "*movqi_internal"
> >    [(set (match_operand:QI 0 "nonimmediate_operand"
> > -                        "=Q,R,r,q,q,r,r ,?r,m ,k,k,r,m,k,k,k")
> > +                        "=Q,R,r,q,q,r,r ,?r,m ,*k,*k,*r,*m,*k,*k,*k")
> >          (match_operand:QI 1 "general_operand"
> > -                        "Q ,R,r,n,m,q,rn, m,qn,r,k,k,k,m,C,BC"))]
> > +                        "Q ,R,r,n,m,q,rn, m,qn,*r,*k,*k,*k,*m,C,BC"))]
> >    "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
> >  {
> >    char buf[128];
> > @@ -2624,6 +2624,19 @@
> >             ]
> >             (const_string "QI")))])
> >
> > +/* Reload dislikes loading 0/-1 directly into mask registers.
> > +   Try to tidy things up here.  */
> > +(define_peephole2
> > +  [(set (match_operand:SWI 0 "general_reg_operand")
> > +        (match_operand:SWI 1 "immediate_operand"))
> > +   (set (match_operand:SWI 2 "mask_reg_operand")
> > +        (match_dup 0))]
> > +  "peep2_reg_dead_p (2, operands[0])
> > +   && (const0_operand (operands[1], <MODE>mode)
> > +       || (constm1_operand (operands[1], <MODE>mode)
> > +           && (<MODE_SIZE> > 1 || TARGET_AVX512DQ)))"
> > +  [(set (match_dup 2) (match_dup 1))])
> > +
> >  ;; Stores and loads of ax to arbitrary constant address.
> >  ;; We fake an second form of instruction to force reload to load address
> >  ;; into register when rax is not available
> > @@ -9044,19 +9057,21 @@
> >  })
> >
> >  (define_insn "*anddi_1"
> > -  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,rm,r,r")
> > +  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,rm,r,r,k")
> >          (and:DI
> > -         (match_operand:DI 1 "nonimmediate_operand" "%0,0,0,qm")
> > -         (match_operand:DI 2 "x86_64_szext_general_operand" "Z,re,m,L")))
> > +         (match_operand:DI 1 "nonimmediate_operand" "%0,0,0,qm,k")
> > +         (match_operand:DI 2 "x86_64_szext_general_operand" "Z,re,m,L,k")))
> >     (clobber (reg:CC FLAGS_REG))]
> >    "TARGET_64BIT && ix86_binary_operator_ok (AND, DImode, operands)"
> >    "@
> >     and{l}\t{%k2, %k0|%k0, %k2}
> >     and{q}\t{%2, %0|%0, %2}
> >     and{q}\t{%2, %0|%0, %2}
> > +   #
> >     #"
> > -  [(set_attr "type" "alu,alu,alu,imovx")
> > -   (set_attr "length_immediate" "*,*,*,0")
> > +  [(set_attr "isa" "x64,x64,x64,x64,avx512bw")
> > +   (set_attr "type" "alu,alu,alu,imovx,msklog")
> > +   (set_attr "length_immediate" "*,*,*,0,*")
> >     (set (attr "prefix_rex")
> >       (if_then_else
> >         (and (eq_attr "type" "imovx")
> > @@ -9064,7 +9079,7 @@
> >                   (match_operand 1 "ext_QIreg_operand")))
> >         (const_string "1")
> >         (const_string "*")))
> > -   (set_attr "mode" "SI,DI,DI,SI")])
> > +   (set_attr "mode" "SI,DI,DI,SI,DI")])
> >
> >  (define_insn_and_split "*anddi_1_btr"
> >    [(set (match_operand:DI 0 "nonimmediate_operand" "=rm")
> > @@ -9130,17 +9145,25 @@
> >     (set_attr "mode" "SI")])
> >
> >  (define_insn "*and<mode>_1"
> > -  [(set (match_operand:SWI24 0 "nonimmediate_operand" "=rm,r,Ya")
> > -        (and:SWI24 (match_operand:SWI24 1 "nonimmediate_operand" "%0,0,qm")
> > -                   (match_operand:SWI24 2 "<general_operand>" "r<i>,m,L")))
> > +  [(set (match_operand:SWI24 0 "nonimmediate_operand" "=rm,r,Ya,k")
> > +        (and:SWI24 (match_operand:SWI24 1 "nonimmediate_operand" 
> > "%0,0,qm,k")
> > +                   (match_operand:SWI24 2 "<general_operand>" 
> > "r<i>,m,L,k")))
> >     (clobber (reg:CC FLAGS_REG))]
> >    "ix86_binary_operator_ok (AND, <MODE>mode, operands)"
> >    "@
> >     and{<imodesuffix>}\t{%2, %0|%0, %2}
> >     and{<imodesuffix>}\t{%2, %0|%0, %2}
> > +   #
> >     #"
> > -  [(set_attr "type" "alu,alu,imovx")
> > -   (set_attr "length_immediate" "*,*,0")
> > +  [(set (attr "isa")
> > +        (cond [(eq_attr "alternative" "3")
> > +                 (if_then_else (eq_attr "mode" "SI")
> > +                   (const_string "avx512bw")
> > +                   (const_string "avx512f"))
> > +              ]
> > +              (const_string "*")))
> > +   (set_attr "type" "alu,alu,imovx,msklog")
> > +   (set_attr "length_immediate" "*,*,0,*")
> >     (set (attr "prefix_rex")
> >       (if_then_else
> >         (and (eq_attr "type" "imovx")
> > @@ -9148,20 +9171,28 @@
> >                   (match_operand 1 "ext_QIreg_operand")))
> >         (const_string "1")
> >         (const_string "*")))
> > -   (set_attr "mode" "<MODE>,<MODE>,SI")])
> > +   (set_attr "mode" "<MODE>,<MODE>,SI,<MODE>")])
> >
> >  (define_insn "*andqi_1"
> > -  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r")
> > -        (and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0")
> > -                (match_operand:QI 2 "general_operand" "qn,m,rn")))
> > +  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,k")
> > +        (and:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,k")
> > +                (match_operand:QI 2 "general_operand" "qn,m,rn,k")))
> >     (clobber (reg:CC FLAGS_REG))]
> >    "ix86_binary_operator_ok (AND, QImode, operands)"
> >    "@
> >     and{b}\t{%2, %0|%0, %2}
> >     and{b}\t{%2, %0|%0, %2}
> > -   and{l}\t{%k2, %k0|%k0, %k2}"
> > -  [(set_attr "type" "alu")
> > -   (set_attr "mode" "QI,QI,SI")
> > +   and{l}\t{%k2, %k0|%k0, %k2}
> > +   #"
> > +  [(set_attr "type" "alu,alu,alu,msklog")
> > +   (set (attr "mode")
> > +        (cond [(eq_attr "alternative" "2")
> > +                 (const_string "SI")
> > +                (and (eq_attr "alternative" "3")
> > +                     (match_test "!TARGET_AVX512DQ"))
> > +                 (const_string "HI")
> > +               ]
> > +               (const_string "QI")))
> >     ;; Potential partial reg stall on alternative 2.
> >     (set (attr "preferred_for_speed")
> >       (cond [(eq_attr "alternative" "2")
> > @@ -9539,28 +9570,42 @@
> >  })
> >
> >  (define_insn "*andn<mode>_1"
> > -  [(set (match_operand:SWI48 0 "register_operand" "=r,r")
> > +  [(set (match_operand:SWI48 0 "register_operand" "=r,r,k")
> >          (and:SWI48
> > -          (not:SWI48 (match_operand:SWI48 1 "register_operand" "r,r"))
> > -          (match_operand:SWI48 2 "nonimmediate_operand" "r,m")))
> > +          (not:SWI48 (match_operand:SWI48 1 "register_operand" "r,r,k"))
> > +          (match_operand:SWI48 2 "nonimmediate_operand" "r,m,k")))
> >     (clobber (reg:CC FLAGS_REG))]
> > -  "TARGET_BMI"
> > -  "andn\t{%2, %1, %0|%0, %1, %2}"
> > -  [(set_attr "type" "bitmanip")
> > -   (set_attr "btver2_decode" "direct, double")
> > +  "TARGET_BMI || TARGET_AVX512BW"
> > +  "@
> > +   andn\t{%2, %1, %0|%0, %1, %2}
> > +   andn\t{%2, %1, %0|%0, %1, %2}
> > +   #"
> > +  [(set_attr "isa" "bmi,bmi,avx512bw")
> > +   (set_attr "type" "bitmanip,bitmanip,msklog")
> > +   (set_attr "btver2_decode" "direct, double,*")
> >     (set_attr "mode" "<MODE>")])
> >
> >  (define_insn "*andn<mode>_1"
> > -  [(set (match_operand:SWI12 0 "register_operand" "=r")
> > +  [(set (match_operand:SWI12 0 "register_operand" "=r,k")
> >          (and:SWI12
> > -          (not:SWI12 (match_operand:SWI12 1 "register_operand" "r"))
> > -          (match_operand:SWI12 2 "register_operand" "r")))
> > +          (not:SWI12 (match_operand:SWI12 1 "register_operand" "r,k"))
> > +          (match_operand:SWI12 2 "register_operand" "r,k")))
> >     (clobber (reg:CC FLAGS_REG))]
> > -  "TARGET_BMI"
> > -  "andn\t{%k2, %k1, %k0|%k0, %k1, %k2}"
> > -  [(set_attr "type" "bitmanip")
> > -   (set_attr "btver2_decode" "direct")
> > -   (set_attr "mode" "SI")])
> > +  "TARGET_BMI || TARGET_AVX512BW"
> > +  "@
> > +   andn\t{%k2, %k1, %k0|%k0, %k1, %k2}
> > +   #"
> > +  [(set_attr "isa" "bmi,avx512f")
> > +   (set_attr "type" "bitmanip,msklog")
> > +   (set_attr "btver2_decode" "direct,*")
> > +   (set (attr "mode")
> > +        (cond [(eq_attr "alternative" "0")
> > +                 (const_string "SI")
> > +               (and (eq_attr "alternative" "1")
> > +                    (match_test "!TARGET_AVX512DQ"))
> > +                  (const_string "HI")
> > +              ]
> > +              (const_string "<MODE>")))])
> >
> >  (define_insn "*andn_<mode>_ccno"
> >    [(set (reg FLAGS_REG)
> > @@ -9631,14 +9676,24 @@
> >  })
> >
> >  (define_insn "*<code><mode>_1"
> > -  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r")
> > +  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,r,k")
> >          (any_or:SWI248
> > -         (match_operand:SWI248 1 "nonimmediate_operand" "%0,0")
> > -         (match_operand:SWI248 2 "<general_operand>" "r<i>,m")))
> > +         (match_operand:SWI248 1 "nonimmediate_operand" "%0,0,k")
> > +         (match_operand:SWI248 2 "<general_operand>" "r<i>,m,k")))
> >     (clobber (reg:CC FLAGS_REG))]
> >    "ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
> > -  "<logic>{<imodesuffix>}\t{%2, %0|%0, %2}"
> > -  [(set_attr "type" "alu")
> > +  "@
> > +   <logic>{<imodesuffix>}\t{%2, %0|%0, %2}
> > +   <logic>{<imodesuffix>}\t{%2, %0|%0, %2}
> > +   #"
> > +  [(set (attr "isa")
> > +        (cond [(eq_attr "alternative" "2")
> > +                 (if_then_else (eq_attr "mode" "SI,DI")
> > +                   (const_string "avx512bw")
> > +                   (const_string "avx512f"))
> > +              ]
> > +              (const_string "*")))
> > +   (set_attr "type" "alu, alu, msklog")
> >     (set_attr "mode" "<MODE>")])
> >
> >  (define_insn_and_split "*iordi_1_bts"
> > @@ -9711,17 +9766,26 @@
> >     (set_attr "mode" "SI")])
> >
> >  (define_insn "*<code>qi_1"
> > -  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r")
> > -        (any_or:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0")
> > -                   (match_operand:QI 2 "general_operand" "qn,m,rn")))
> > +  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,q,r,k")
> > +        (any_or:QI (match_operand:QI 1 "nonimmediate_operand" "%0,0,0,k")
> > +                   (match_operand:QI 2 "general_operand" "qn,m,rn,k")))
> >     (clobber (reg:CC FLAGS_REG))]
> >    "ix86_binary_operator_ok (<CODE>, QImode, operands)"
> >    "@
> >     <logic>{b}\t{%2, %0|%0, %2}
> >     <logic>{b}\t{%2, %0|%0, %2}
> > -   <logic>{l}\t{%k2, %k0|%k0, %k2}"
> > -  [(set_attr "type" "alu")
> > -   (set_attr "mode" "QI,QI,SI")
> > +   <logic>{l}\t{%k2, %k0|%k0, %k2}
> > +   #"
> > +  [(set_attr "isa" "*,*,*,avx512f")
> > +   (set_attr "type" "alu,alu,alu,msklog")
> > +   (set (attr "mode")
> > +        (cond [(eq_attr "alternative" "2")
> > +                 (const_string "SI")
> > +                (and (eq_attr "alternative" "3")
> > +                     (match_test "!TARGET_AVX512DQ"))
> > +                 (const_string "HI")
> > +               ]
> > +               (const_string "QI")))
> >     ;; Potential partial reg stall on alternative 2.
> >     (set (attr "preferred_for_speed")
> >       (cond [(eq_attr "alternative" "2")
> > @@ -10370,31 +10434,52 @@
> >    "split_double_mode (DImode, &operands[0], 2, &operands[0], 
> > &operands[2]);")
> >
> >  (define_insn "*one_cmpl<mode>2_1"
> > -  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm")
> > -        (not:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" "0")))]
> > +  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm,k")
> > +        (not:SWI248 (match_operand:SWI248 1 "nonimmediate_operand" 
> > "0,k")))]
> >    "ix86_unary_operator_ok (NOT, <MODE>mode, operands)"
> > -  "not{<imodesuffix>}\t%0"
> > -  [(set_attr "type" "negnot")
> > +  "@
> > +   not{<imodesuffix>}\t%0
> > +   #"
> > +  [(set (attr "isa")
> > +        (cond [(eq_attr "alternative" "2")
> > +                 (if_then_else (eq_attr "mode" "SI,DI")
> > +                   (const_string "avx512bw")
> > +                   (const_string "avx512f"))
> > +              ]
> > +              (const_string "*")))
> > +   (set_attr "type" "negnot,msklog")
> >     (set_attr "mode" "<MODE>")])
> >
> >  (define_insn "*one_cmplsi2_1_zext"
> > -  [(set (match_operand:DI 0 "register_operand" "=r")
> > +  [(set (match_operand:DI 0 "register_operand" "=r,k")
> >          (zero_extend:DI
> > -          (not:SI (match_operand:SI 1 "register_operand" "0"))))]
> > +          (not:SI (match_operand:SI 1 "register_operand" "0,k"))))]
> >    "TARGET_64BIT && ix86_unary_operator_ok (NOT, SImode, operands)"
> > -  "not{l}\t%k0"
> > -  [(set_attr "type" "negnot")
> > -   (set_attr "mode" "SI")])
> > +  "@
> > +   not{l}\t%k0
> > +   #"
> > +  [(set_attr "isa" "x64,avx512bw")
> > +   (set_attr "type" "negnot,msklog")
> > +   (set_attr "mode" "SI,SI")])
> >
> >  (define_insn "*one_cmplqi2_1"
> > -  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r")
> > -        (not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0")))]
> > +  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm,r,k")
> > +        (not:QI (match_operand:QI 1 "nonimmediate_operand" "0,0,k")))]
> >    "ix86_unary_operator_ok (NOT, QImode, operands)"
> >    "@
> >     not{b}\t%0
> > -   not{l}\t%k0"
> > -  [(set_attr "type" "negnot")
> > -   (set_attr "mode" "QI,SI")
> > +   not{l}\t%k0
> > +   #"
> > +  [(set_attr "isa" "*,*,avx512f")
> > +   (set_attr "type" "negnot,negnot,msklog")
> > +   (set (attr "mode")
> > +        (cond [(eq_attr "alternative" "1")
> > +                 (const_string "SI")
> > +                (and (eq_attr "alternative" "2")
> > +                     (match_test "!TARGET_AVX512DQ"))
> > +                 (const_string "HI")
> > +               ]
> > +               (const_string "QI")))
> >     ;; Potential partial reg stall on alternative 1.
> >     (set (attr "preferred_for_speed")
> >       (cond [(eq_attr "alternative" "1")
> > diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
> > index 07e69d555c0..dd1b31479f5 100644
> > --- a/gcc/config/i386/predicates.md
> > +++ b/gcc/config/i386/predicates.md
> > @@ -87,6 +87,11 @@
> >    (and (match_code "reg")
> >         (match_test "REGNO (op) == FLAGS_REG")))
> >
> > +;; True if the operand is a MASK register.
> > +(define_predicate "mask_reg_operand"
> > +  (and (match_code "reg")
> > +       (match_test "MASK_REGNO_P (REGNO (op))")))
> > +
> >  ;; Match a DI, SI, HI or QImode nonimmediate_operand.
> >  (define_special_predicate "int_nonimmediate_operand"
> >    (and (match_operand 0 "nonimmediate_operand")
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index b6348de67cb..4372a9fd785 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -1452,6 +1452,18 @@
> >    "TARGET_AVX512F
> >     && !(MEM_P (operands[0]) && MEM_P (operands[1]))")
> >
> > +(define_split
> > +  [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand")
> > +        (any_logic:SWI1248_AVX512BW
> > +          (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand")
> > +          (match_operand:SWI1248_AVX512BW 2 "mask_reg_operand")))
> > +   (clobber (reg:CC FLAGS_REG))]
> > +  "TARGET_AVX512F && reload_completed"
> > +  [(parallel
> > +     [(set (match_dup 0)
> > +           (any_logic:SWI1248_AVX512BW (match_dup 1) (match_dup 2)))
> > +      (unspec [(const_int 0)] UNSPEC_MASKOP)])])
> > +
> >  (define_insn "k<code><mode>"
> >    [(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k")
> >          (any_logic:SWI1248_AVX512BW
> > @@ -1474,6 +1486,21 @@
> >             ]
> >             (const_string "<MODE>")))])
> >
> > +(define_split
> > +  [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand")
> > +        (and:SWI1248_AVX512BW
> > +          (not:SWI1248_AVX512BW
> > +            (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand"))
> > +          (match_operand:SWI1248_AVX512BW 2 "mask_reg_operand")))
> > +   (clobber (reg:CC FLAGS_REG))]
> > +  "TARGET_AVX512F && reload_completed"
> > +  [(parallel
> > +     [(set (match_dup 0)
> > +           (and:SWI1248_AVX512BW
> > +             (not:SWI1248_AVX512BW (match_dup 1))
> > +             (match_dup 2)))
> > +      (unspec [(const_int 0)] UNSPEC_MASKOP)])])
> > +
> >  (define_insn "kandn<mode>"
> >    [(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k")
> >          (and:SWI1248_AVX512BW
> > @@ -1520,6 +1547,16 @@
> >             ]
> >             (const_string "<MODE>")))])
> >
> > +(define_split
> > +  [(set (match_operand:SWI1248_AVX512BW 0 "mask_reg_operand")
> > +        (not:SWI1248_AVX512BW
> > +          (match_operand:SWI1248_AVX512BW 1 "mask_reg_operand")))]
> > +  "TARGET_AVX512F && reload_completed"
> > +  [(parallel
> > +     [(set (match_dup 0)
> > +           (not:SWI1248_AVX512BW (match_dup 1)))
> > +      (unspec [(const_int 0)] UNSPEC_MASKOP)])])
> > +
> >  (define_insn "knot<mode>"
> >    [(set (match_operand:SWI1248_AVX512BW 0 "register_operand" "=k")
> >          (not:SWI1248_AVX512BW
> > @@ -1541,6 +1578,28 @@
> >             ]
> >             (const_string "<MODE>")))])
> >
> > +(define_split
> > +  [(set (match_operand:DI 0 "mask_reg_operand")
> > +        (zero_extend:DI
> > +          (not:DI (match_operand:SI 1 "mask_reg_operand"))))]
> > +  "TARGET_AVX512BW && reload_completed"
> > +  [(parallel
> > +     [(set (match_dup 0)
> > +           (zero_extend:DI
> > +             (not:SI (match_dup 1))))
> > +      (unspec [(const_int 0)] UNSPEC_MASKOP)])])
> > +
> > +(define_insn "*knotsi_1_zext"
> > +  [(set (match_operand:DI 0 "register_operand" "=k")
> > +        (zero_extend:DI
> > +          (not:SI (match_operand:SI 1 "register_operand" "k"))))
> > +   (unspec [(const_int 0)] UNSPEC_MASKOP)]
> > +  "TARGET_AVX512BW"
> > +  "knotd\t{%1, %0|%0, %1}";
> > +  [(set_attr "type" "msklog")
> > +   (set_attr "prefix" "vex")
> > +   (set_attr "mode" "SI")])
> > +
> >  (define_insn "kadd<mode>"
> >    [(set (match_operand:SWI1248_AVX512BWDQ2 0 "register_operand" "=k")
> >          (plus:SWI1248_AVX512BWDQ2
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-1.c
> > b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-1.c
> > index 94422f36010..46d9351f275 100644
> > --- a/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-1.c
> > +++ b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-1.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-do compile } */
> >  /* { dg-options "-mavx512bw -O2" } */
> > -/* { dg-final { scan-assembler-times "kunpckwd\[
> > \\t\]+\[^\{\n\]*%k\[1-7\](?:\n|\[ \\t\]+#)" 1 } } */
> > +/* { dg-final { scan-assembler-times "kunpckwd\[
> > \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
> >
> >  #include <immintrin.h>
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c
> > b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c
> > index c68ad8cc1f7..fe13f4f33fc 100644
> > --- a/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c
> > +++ b/gcc/testsuite/gcc.target/i386/avx512bw-kunpckwd-3.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-do compile } */
> >  /* { dg-options "-mavx512bw -O2" } */
> > -/* { dg-final { scan-assembler-times "kunpckwd\[
> > \\t\]+\[^\{\n\]*%k\[1-7\](?:\n|\[ \\t\]+#)" 1 } } */
> > +/* { dg-final { scan-assembler-times "kunpckwd\[
> > \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
> >
> >  #include <immintrin.h>
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr88465.c
> > b/gcc/testsuite/gcc.target/i386/avx512bw-pr88465.c
> > new file mode 100644
> > index 00000000000..8e34bf45365
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/avx512bw-pr88465.c
> > @@ -0,0 +1,23 @@
> > +/* PR target/88465 */
> > +/* { dg-do compile { target { ! ia32 } } } */
> > +/* { dg-options "-O2 -mavx512bw" } */
> > +/* { dg-final { scan-assembler-times "kxor\[qd\]\[ \t]" 2 } } */
> > +/* { dg-final { scan-assembler-times "kxnor\[dq\]\[ \t]" 2 } } */
> > +
> > +void
> > +foo (void)
> > +{
> > +  unsigned int k = 0;
> > +  __asm volatile ("" : : "k" (k));
> > +  k = -1;
> > +  __asm volatile ("" : : "k" (k));
> > +}
> > +
> > +void
> > +bar (void)
> > +{
> > +  unsigned long long k = 0;
> > +  __asm volatile ("" : : "k" (k));
> > +  k = -1;
> > +  __asm volatile ("" : : "k" (k));
> > +}
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c
> > b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c
> > index 49817097e26..114e03ee93d 100644
> > --- a/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c
> > +++ b/gcc/testsuite/gcc.target/i386/avx512dq-kmovb-5.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-mavx512dq -O2" } */
> > +/* { dg-options "-mavx512dq -mno-avx512bw -O2" } */
> >  /* { dg-final { scan-assembler-times "kmovb\[
> > \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
> >
> >  #include <immintrin.h>
> > diff --git a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c
> > b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c
> > index 7bb34d34d8d..79d37394b36 100644
> > --- a/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c
> > +++ b/gcc/testsuite/gcc.target/i386/avx512f-kmovw-5.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-mavx512f -O2" } */
> > +/* { dg-options "-mavx512f -mno-avx512bw -O2" } */
> >  /* { dg-final { scan-assembler-times "kmovw\[
> > \\t\]+\[^\{\n\]*%k\[0-7\](?:\n|\[ \\t\]+#)" 1 } } */
> >
> >  #include <immintrin.h>
> > diff --git a/gcc/testsuite/gcc.target/i386/bitwise_mask_op-1.c
> > b/gcc/testsuite/gcc.target/i386/bitwise_mask_op-1.c
> > new file mode 100644
> > index 00000000000..61f71ab8b23
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/bitwise_mask_op-1.c
> > @@ -0,0 +1,178 @@
> > +/* PR target/88808  */
> > +/* { dg-do compile } */
> > +/* { dg-options "-mavx512bw -mno-avx512dq -O2" } */
> > +
> > +#include <immintrin.h>
> > +__m512i
> > +foo_orq (__m512i a, __m512i b, __m512i c, __m512i d)
> > +{
> > +  __mmask64 m1 = _mm512_cmpeq_epi8_mask (a, b);
> > +  __mmask64 m2 = _mm512_cmpeq_epi8_mask (c, d);
> > +  return _mm512_mask_add_epi8 (c, m1 | m2, a, d);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "korq" "1" { target { ! ia32 } } } } 
> > */
> > +
> > +__m512i
> > +foo_ord (__m512i a, __m512i b, __m512i c, __m512i d)
> > +{
> > +  __mmask32 m1 = _mm512_cmpeq_epi16_mask (a, b);
> > +  __mmask32 m2 = _mm512_cmpeq_epi16_mask (c, d);
> > +  return _mm512_mask_add_epi16 (c, m1 | m2, a, d);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "kord" "1" } }  */
> > +
> > +__m512i
> > +foo_orw (__m512i a, __m512i b, __m512i c, __m512i d)
> > +{
> > +  __mmask16 m1 = _mm512_cmpeq_epi32_mask (a, b);
> > +  __mmask16 m2 = _mm512_cmpeq_epi32_mask (c, d);
> > +  return _mm512_mask_add_epi32 (c, m1 | m2, a, d);
> > +}
> > +
> > +__m512i
> > +foo_orb (__m512i a, __m512i b, __m512i c, __m512i d)
> > +{
> > +  __mmask8 m1 = _mm512_cmpeq_epi64_mask (a, b);
> > +  __mmask8 m2 = _mm512_cmpeq_epi64_mask (c, d);
> > +  return _mm512_mask_add_epi64 (c, m1 | m2, a, d);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "korw" "2" } }  */
> > +
> > +__m512i
> > +foo_xorq (__m512i a, __m512i b, __m512i c, __m512i d)
> > +{
> > +  __mmask64 m1 = _mm512_cmpeq_epi8_mask (a, b);
> > +  __mmask64 m2 = _mm512_cmpeq_epi8_mask (c, d);
> > +  return _mm512_mask_add_epi8 (c, m1 ^ m2, a, d);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "kxorq" "1" { target { ! ia32 } } } } 
> >  */
> > +
> > +__m512i
> > +foo_xord (__m512i a, __m512i b, __m512i c, __m512i d)
> > +{
> > +  __mmask32 m1 = _mm512_cmpeq_epi16_mask (a, b);
> > +  __mmask32 m2 = _mm512_cmpeq_epi16_mask (c, d);
> > +  return _mm512_mask_add_epi16 (c, m1 ^ m2, a, d);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "kxord" "1" } }  */
> > +
> > +__m512i
> > +foo_xorw (__m512i a, __m512i b, __m512i c, __m512i d)
> > +{
> > +  __mmask16 m1 = _mm512_cmpeq_epi32_mask (a, b);
> > +  __mmask16 m2 = _mm512_cmpeq_epi32_mask (c, d);
> > +  return _mm512_mask_add_epi32 (c, m1 ^ m2, a, d);
> > +}
> > +
> > +__m512i
> > +foo_xorb (__m512i a, __m512i b, __m512i c, __m512i d)
> > +{
> > +  __mmask8 m1 = _mm512_cmpeq_epi64_mask (a, b);
> > +  __mmask8 m2 = _mm512_cmpeq_epi64_mask (c, d);
> > +  return _mm512_mask_add_epi64 (c, m1 ^ m2, a, d);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "korw" "2" } }  */
> > +
> > +__m512i
> > +foo_andq (__m512i a, __m512i b, __m512i c, __m512i d)
> > +{
> > +  __mmask64 m1 = _mm512_cmpeq_epi8_mask (a, b);
> > +  __mmask64 m2 = _mm512_cmpeq_epi8_mask (c, d);
> > +  return _mm512_mask_add_epi8 (c, m1 & m2, a, d);
> > +}
> > +
> > +__m512i
> > +foo_andd (__m512i a, __m512i b, __m512i c, __m512i d)
> > +{
> > +  __mmask32 m1 = _mm512_cmpeq_epi16_mask (a, b);
> > +  __mmask32 m2 = _mm512_cmpeq_epi16_mask (c, d);
> > +  return _mm512_mask_add_epi16 (c, m1 & m2, a, d);
> > +}
> > +
> > +__m512i
> > +foo_andw (__m512i a, __m512i b, __m512i c, __m512i d)
> > +{
> > +  __mmask16 m1 = _mm512_cmpeq_epi32_mask (a, b);
> > +  __mmask16 m2 = _mm512_cmpeq_epi32_mask (c, d);
> > +  return _mm512_mask_add_epi32 (c, m1 & m2, a, d);
> > +}
> > +
> > +__m512i
> > +foo_andb (__m512i a, __m512i b, __m512i c, __m512i d)
> > +{
> > +  __mmask8 m1 = _mm512_cmpeq_epi64_mask (a, b);
> > +  __mmask8 m2 = _mm512_cmpeq_epi64_mask (c, d);
> > +  return _mm512_mask_add_epi64 (c, m1 & m2, a, d);
> > +}
> > +
> > +__m512i
> > +foo_andnq (__m512i a, __m512i b, __m512i c, __m512i d)
> > +{
> > +  __mmask64 m1 = _mm512_cmpeq_epi8_mask (a, b);
> > +  __mmask64 m2 = _mm512_cmpeq_epi8_mask (c, d);
> > +  return _mm512_mask_add_epi8 (c, m1 & ~m2, a, d);
> > +}
> > +
> > +__m512i
> > +foo_andnd (__m512i a, __m512i b, __m512i c, __m512i d)
> > +{
> > +  __mmask32 m1 = _mm512_cmpeq_epi16_mask (a, b);
> > +  __mmask32 m2 = _mm512_cmpeq_epi16_mask (c, d);
> > +  return _mm512_mask_add_epi16 (c, m1 & ~m2, a, d);
> > +}
> > +
> > +__m512i
> > +foo_andnw (__m512i a, __m512i b, __m512i c, __m512i d)
> > +{
> > +  __mmask16 m1 = _mm512_cmpeq_epi32_mask (a, b);
> > +  __mmask16 m2 = _mm512_cmpeq_epi32_mask (c, d);
> > +  return _mm512_mask_add_epi32 (c, m1 & ~m2, a, d);
> > +}
> > +
> > +__m512i
> > +foo_andnb (__m512i a, __m512i b, __m512i c, __m512i d)
> > +{
> > +  __mmask8 m1 = _mm512_cmpeq_epi64_mask (a, b);
> > +  __mmask8 m2 = _mm512_cmpeq_epi64_mask (c, d);
> > +  return _mm512_mask_add_epi64 (c, m1 & ~m2, a, d);
> > +}
> > +
> > +__m512i
> > +foo_notq (__m512i a, __m512i b, __m512i c, __m512i d)
> > +{
> > +  __mmask64 m1 = _mm512_cmpeq_epi8_mask (a, b);
> > +  return _mm512_mask_add_epi8 (c, ~m1, a, d);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "knotq" "2" { target { ! ia32 } } } } 
> >  */
> > +
> > +__m512i
> > +foo_notd (__m512i a, __m512i b, __m512i c, __m512i d)
> > +{
> > +  __mmask32 m1 = _mm512_cmpeq_epi16_mask (a, b);
> > +  return _mm512_mask_add_epi16 (c, ~m1, a, d);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "knotd" "2" { target { ! ia32 } } } } 
> >  */
> > +
> > +__m512i
> > +foo_notw (__m512i a, __m512i b, __m512i c, __m512i d)
> > +{
> > +  __mmask16 m1 = _mm512_cmpeq_epi32_mask (a, b);
> > +  return _mm512_mask_add_epi32 (c, ~m1, a, d);
> > +}
> > +
> > +__m512i
> > +foo_notb (__m512i a, __m512i b, __m512i c, __m512i d)
> > +{
> > +  __mmask8 m1 = _mm512_cmpeq_epi64_mask (a, b);
> > +  return _mm512_mask_add_epi64 (c, ~m1, a, d);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "knotw" "4" } }  */
> > diff --git a/gcc/testsuite/gcc.target/i386/bitwise_mask_op-2.c
> > b/gcc/testsuite/gcc.target/i386/bitwise_mask_op-2.c
> > new file mode 100644
> > index 00000000000..850f0b42652
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/bitwise_mask_op-2.c
> > @@ -0,0 +1,8 @@
> > +/* PR target/88808  */
> > +/* { dg-do compile } */
> > +/* { dg-options "-mavx512bw -mavx512dq -O2" } */
> > +/* { dg-final { scan-assembler-times "knotb" "2" } }  */
> > +/* { dg-final { scan-assembler-times "korb" "1" } }  */
> > +/* { dg-final { scan-assembler-times "kxorb" "1" } }  */
> > +#include "bitwise_mask_op-1.c"
> > +
> > diff --git a/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c
> > b/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c
> > new file mode 100644
> > index 00000000000..18bf4f0d768
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/bitwise_mask_op-3.c
> > @@ -0,0 +1,44 @@
> > +/* PR target/88808  */
> > +/* { dg-do compile } */
> > +/* { dg-options "-mavx512bw -mavx512dq -O2" } */
> > +
> > +#include <immintrin.h>
> > +volatile __mmask8 foo;
> > +void
> > +foo_orb (__m512i a, __m512i b)
> > +{
> > +  __mmask8 m1 = _mm512_cmp_epi64_mask (a, b, 2);
> > +  __mmask8 m2 = _mm512_cmp_epi64_mask (a, b, 4);
> > +  foo = m1 | m2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "korb\[\t \]" "1" } }  */
> > +
> > +void
> > +foo_xorb (__m512i a, __m512i b)
> > +{
> > +  __mmask8 m1 = _mm512_cmp_epi64_mask (a, b, 2);
> > +  __mmask8 m2 = _mm512_cmp_epi64_mask (a, b, 4);
> > +  foo = m1 ^ m2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "kxorb\[\t \]" "1" } }  */
> > +
> > +void
> > +foo_andb (__m512i a, __m512i b)
> > +{
> > +  __mmask8 m1 = _mm512_cmp_epi64_mask (a, b, 2);
> > +  __mmask8 m2 = _mm512_cmp_epi64_mask (a, b, 4);
> > +  foo = m1 & m2;
> > +}
> > +
> > +void
> > +foo_andnb (__m512i a, __m512i b)
> > +{
> > +  __mmask8 m1 = _mm512_cmp_epi64_mask (a, b, 2);
> > +  __mmask8 m2 = _mm512_cmp_epi64_mask (a, b, 4);
> > +  foo = m1 & ~m2;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "knotb\[\t \]" "1" } }  */
> > +/* { dg-final { scan-assembler-times "kmovb\[\t \]" "4"} }  */
> > --
> > 2.18.1
> >
> >
> > --
> > BR,
> > Hongtao



-- 
BR,
Hongtao

Reply via email to