Re: [PATCH] Improve rotate fold-const pattern matching (PR target/82498)
On Thu, 12 Oct 2017, Jakub Jelinek wrote: > Hi! > > Marc in the PR mentioned that it is not really good that the recommended > rotate pattern is recognized only during forwprop1 and later, which is after > einline and that inlining or early opts could have changed stuff too much so > that we wouldn't recogize it anymore. Hmm, but the only thing functions see is inlining early optimized bodies into them and then constant propagation performed, so I don't see how we could confuse the pattern in a way to be indetectable. Also early inlining is performed on early optimized bodies so cost metrics see rotates, not the unrecognized form. > The following patch handles that pattern in fold_binary_loc too, and while > I've touched it, it cleans a lot of ugliness/duplication in that code. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Still looks like an improvement, thus ok. Thanks, Richard. > 2017-10-12 Jakub Jelinek> > PR target/82498 > * fold-const.c (fold_binary_loc) : Code cleanups, > instead of handling MINUS_EXPR twice (once for each argument), > canonicalize operand order and handle just once, use rtype where > possible. Handle (A << B) | (A >> (-B & (Z - 1))). > > * gcc.dg/tree-ssa/pr82498.c: New test. > > --- gcc/fold-const.c.jj 2017-10-11 22:37:51.0 +0200 > +++ gcc/fold-const.c 2017-10-12 13:17:45.360589554 +0200 > @@ -9429,7 +9429,10 @@ fold_binary_loc (location_t loc, >/* (A << C1) + (A >> C2) if A is unsigned and C1+C2 is the size of A >is a rotate of A by C1 bits. */ >/* (A << B) + (A >> (Z - B)) if A is unsigned and Z is the size of A > - is a rotate of A by B bits. */ > + is a rotate of A by B bits. > + Similarly for (A << B) | (A >> (-B & C3)) where C3 is Z-1, > + though in this case CODE must be | and not + or ^, otherwise > + it doesn't return A when B is 0. */ >{ > enum tree_code code0, code1; > tree rtype; > @@ -9447,25 +9450,32 @@ fold_binary_loc (location_t loc, > == GET_MODE_UNIT_PRECISION (TYPE_MODE (rtype > { > tree tree01, tree11; > + tree orig_tree01, orig_tree11; > enum tree_code code01, code11; > > - tree01 = TREE_OPERAND (arg0, 1); > - tree11 = TREE_OPERAND (arg1, 1); > + tree01 = orig_tree01 = TREE_OPERAND (arg0, 1); > + tree11 = orig_tree11 = TREE_OPERAND (arg1, 1); > STRIP_NOPS (tree01); > STRIP_NOPS (tree11); > code01 = TREE_CODE (tree01); > code11 = TREE_CODE (tree11); > + if (code11 != MINUS_EXPR > + && (code01 == MINUS_EXPR || code01 == BIT_AND_EXPR)) > + { > + std::swap (code0, code1); > + std::swap (code01, code11); > + std::swap (tree01, tree11); > + std::swap (orig_tree01, orig_tree11); > + } > if (code01 == INTEGER_CST > && code11 == INTEGER_CST > && (wi::to_widest (tree01) + wi::to_widest (tree11) > - == element_precision (TREE_TYPE (TREE_OPERAND (arg0, 0) > + == element_precision (rtype))) > { > tem = build2_loc (loc, LROTATE_EXPR, > - TREE_TYPE (TREE_OPERAND (arg0, 0)), > - TREE_OPERAND (arg0, 0), > + rtype, TREE_OPERAND (arg0, 0), > code0 == LSHIFT_EXPR > - ? TREE_OPERAND (arg0, 1) > - : TREE_OPERAND (arg1, 1)); > + ? orig_tree01 : orig_tree11); > return fold_convert_loc (loc, type, tem); > } > else if (code11 == MINUS_EXPR) > @@ -9477,39 +9487,37 @@ fold_binary_loc (location_t loc, > STRIP_NOPS (tree111); > if (TREE_CODE (tree110) == INTEGER_CST > && 0 == compare_tree_int (tree110, > - element_precision > - (TREE_TYPE (TREE_OPERAND > - (arg0, 0 > + element_precision (rtype)) > && operand_equal_p (tree01, tree111, 0)) > - return > - fold_convert_loc (loc, type, > - build2 ((code0 == LSHIFT_EXPR > -? LROTATE_EXPR > -: RROTATE_EXPR), > - TREE_TYPE (TREE_OPERAND (arg0, > 0)), > - TREE_OPERAND (arg0, 0), > - TREE_OPERAND (arg0, 1))); > + { > + tem = build2_loc (loc, (code0 == LSHIFT_EXPR > +
Re: [x86] GFNI enabling[1/4]
On Fri, Oct 13, 2017 at 07:03:14AM +, Koval, Julia wrote: --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -753,6 +753,10 @@ mrdpid Target Report Mask(ISA_RDPID) Var(ix86_isa_flags2) Save Support RDPID built-in functions and code generation. +mgfni +Target Report Mask(ISA_GFNI) Var(ix86_isa_flags2) Save +Support RDPID built-in functions and code generation. + Pasto? It would surprise me if the description was meant to be exactly the same as -mrdpid. Jakub 0001-gfni-option.patch Description: 0001-gfni-option.patch
[x86] GFNI enabling[1/4]
Hi, gcc/ * gcc/common/config/i386/i386-common.c (OPTION_MASK_ISA_GFNI_SET, (OPTION_MASK_ISA_GFNI_UNSET): New. (ix86_handle_option): Handle OPT_mgfni. * gcc/config/i386/cpuid.h (bit_GFNI): New. * gcc/config/i386/driver-i386.c (host_detect_local_cpu): Detect gfni. * gcc/config/i386/i386-c.c (ix86_target_macros_internal): Define __GFNI__. * gcc/config/i386/i386.c (ix86_target_string): Add -mgfni. (ix86_valid_target_attribute_inner_p): Add OPT_mgfni. * gcc/config/i386/i386.h (TARGET_GFNI, TARGET_GFNI_P): New. * gcc/config/i386/i386.opt: Add mgfni. Here is the first patch of GFNI isaset enabling. It adds new option -mgfni for GFNI isaset and cpuid bit. Docs for new instructions and isasets are here: https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf Ok for trunk? Thanks, Julia 0001-gfni-option.patch Description: 0001-gfni-option.patch
Re: 0004-Part-4.-Update-x86-backend-to-enable-Intel-CET
On Thu, Oct 12, 2017 at 8:45 PM, Tsimbalist, Igor Vwrote: > Uros, > > Attached is an updated patch. The main difference is in option name and > attribute name change. Other code is the same. Trivial changes (if they fall under "trivial" rule) don't need extra approval, but OK nevertheless. Thanks, Uros. > Igor > > >> -Original Message- >> From: Tsimbalist, Igor V >> Sent: Tuesday, September 19, 2017 5:06 PM >> To: Uros Bizjak ; gcc-patches@gcc.gnu.org >> Cc: Tsimbalist, Igor V >> Subject: RE: 0004-Part-4.-Update-x86-backend-to-enable-Intel-CET >> >> Uros, thank you for the approval. Based on the approval of the first 3 >> patches >> (I've submitted them today), I need to adjust option and attribute names. I >> will resubmit the patch when I fix option and attribute names. >> >> Thanks, >> Igor >> >> >> > -Original Message- >> > From: Uros Bizjak [mailto:ubiz...@gmail.com] >> > Sent: Monday, September 18, 2017 11:58 AM >> > To: gcc-patches@gcc.gnu.org >> > Cc: Tsimbalist, Igor V ; Tsimbalist, Igor >> > V >> > Subject: Re: 0004-Part-4.-Update-x86-backend-to-enable-Intel-CET >> > >> > Hello! >> > >> > > gcc/ >> > > >> > > * common/config/i386/i386-common.c (OPTION_MASK_ISA_IBT_SET): >> > New. >> > > (OPTION_MASK_ISA_SHSTK_SET): Likewise. >> > > (OPTION_MASK_ISA_IBT_UNSET): Likewise. >> > > (OPTION_MASK_ISA_SHSTK_UNSET): Likewise. >> > > (ix86_handle_option): Add -mibt, -mshstk, -mcet handling. >> > > * config.gcc (extra_headers): Add cetintrin.h for x86 targets. >> > > (extra_objs): Add cet.o for Linux/x86 targets. >> > > (tmake_file): Add i386/t-cet for Linux/x86 targets. >> > > * config/i386/cet.c: New file. >> > > * config/i386/cetintrin.h: Likewise. >> > > * config/i386/t-cet: Likewise. >> > > * config/i386/cpuid.h (bit_SHSTK): New. >> > > (bit_IBT): Likewise. >> > > * config/i386/driver-i386.c (host_detect_local_cpu): Detect and pass >> > > IBT and SHSTK bits. >> > > * config/i386/i386-builtin-types.def >> > > (VOID_FTYPE_UNSIGNED_PVOID): New. >> > > (VOID_FTYPE_UINT64_PVOID): Likewise. >> > > * config/i386/i386-builtin.def: Add CET intrinsics. >> > > * config/i386/i386-c.c (ix86_target_macros_internal): Add >> > > OPTION_MASK_ISA_IBT, OPTION_MASK_ISA_SHSTK handling. >> > > * config/i386/i386-passes.def: Add pass_insert_endbranch pass. >> > > * config/i386/i386-protos.h (make_pass_insert_endbranch): New >> > > prototype. >> > > * config/i386/i386.c (rest_of_insert_endbranch): New. >> > > (pass_data_insert_endbranch): Likewise. >> > > (pass_insert_endbranch): Likewise. >> > > (make_pass_insert_endbranch): Likewise. >> > > (ix86_notrack_prefixed_insn_p): Likewise. >> > > (ix86_target_string): Add -mibt, -mshstk flags. >> > > (ix86_option_override_internal): Add flag_instrument_control_flow >> > > processing. >> > > (ix86_valid_target_attribute_inner_p): Set OPT_mibt, OPT_mshstk. >> > > (ix86_print_operand): Add 'notrack' prefix output. >> > > (ix86_init_mmx_sse_builtins): Add CET intrinsics. >> > > (ix86_expand_builtin): Expand CET intrinsics. >> > > (x86_output_mi_thunk): Add 'endbranch' instruction. >> > > * config/i386/i386.h (TARGET_IBT): New. >> > > (TARGET_IBT_P): Likewise. >> > > (TARGET_SHSTK): Likewise. >> > > (TARGET_SHSTK_P): Likewise. >> > > * config/i386/i386.md (unspecv): Add UNSPECV_NOP_RDSSP, >> > > UNSPECV_INCSSP, UNSPECV_SAVEPREVSSP, UNSPECV_RSTORSSP, >> > UNSPECV_WRSS, >> > > UNSPECV_WRUSS, UNSPECV_SETSSBSY, UNSPECV_CLRSSBSY. >> > > (builtin_setjmp_setup): New pattern. >> > > (builtin_longjmp): Likewise. >> > > (rdssp): Likewise. >> > > (incssp): Likewise. >> > > (saveprevssp): Likewise. >> > > (rstorssp): Likewise. >> > > (wrss): Likewise. >> > > (wruss): Likewise. >> > > (setssbsy): Likewise. >> > > (clrssbsy): Likewise. >> > > (nop_endbr): Likewise. >> > > * config/i386/i386.opt: Add -mcet, -mibt, -mshstk and -mcet-switch >> > > options. >> > > * config/i386/immintrin.h: Include . >> > > * config/i386/linux-common.h >> > > (file_end_indicate_exec_stack_and_cet): New prototype. >> > > (TARGET_ASM_FILE_END): New. >> > >> > LGTM. >> > >> > OK for mainline. >> > >> > Thanks, >> > Uros.
Re: [PATCH] Avoid UB in ia32intrin.h rotate patterns (PR target/82498)
On Thu, Oct 12, 2017 at 9:39 PM, Jakub Jelinekwrote: > Hi! > > The ia32intrin.h rotate intrinsics require the second argument to be > in between 1 and 31 (or 63), otherwise they invoke UB. But, we can do much > better while generating the same instruction when optimizing, so the > following patch uses the patterns we pattern recognize well and where > the second argument can be any value. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2017-10-12 Jakub Jelinek > > PR target/82498 > * config/i386/ia32intrin.h (__rold, __rord, __rolq, __rorq): Allow > any values of __C while still being pattern recognizable as a simple > rotate instruction. > > * gcc.dg/ubsan/pr82498.c: New test. LGTM. Thanks, Uros. > --- gcc/config/i386/ia32intrin.h.jj 2017-01-01 12:45:42.0 +0100 > +++ gcc/config/i386/ia32intrin.h2017-10-12 09:55:24.235602737 +0200 > @@ -147,7 +147,8 @@ extern __inline unsigned int > __attribute__((__gnu_inline__, __always_inline__, __artificial__)) > __rold (unsigned int __X, int __C) > { > - return (__X << __C) | (__X >> (32 - __C)); > + __C &= 31; > + return (__X << __C) | (__X >> (-__C & 31)); > } > > /* 8bit ror */ > @@ -171,7 +172,8 @@ extern __inline unsigned int > __attribute__((__gnu_inline__, __always_inline__, __artificial__)) > __rord (unsigned int __X, int __C) > { > - return (__X >> __C) | (__X << (32 - __C)); > + __C &= 31; > + return (__X >> __C) | (__X << (-__C & 31)); > } > > /* Pause */ > @@ -239,7 +241,8 @@ extern __inline unsigned long long > __attribute__((__gnu_inline__, __always_inline__, __artificial__)) > __rolq (unsigned long long __X, int __C) > { > - return (__X << __C) | (__X >> (64 - __C)); > + __C &= 63; > + return (__X << __C) | (__X >> (-__C & 63)); > } > > /* 64bit ror */ > @@ -247,7 +250,8 @@ extern __inline unsigned long long > __attribute__((__gnu_inline__, __always_inline__, __artificial__)) > __rorq (unsigned long long __X, int __C) > { > - return (__X >> __C) | (__X << (64 - __C)); > + __C &= 63; > + return (__X >> __C) | (__X << (-__C & 63)); > } > > /* Read flags register */ > --- gcc/testsuite/gcc.dg/ubsan/pr82498.c.jj 2017-10-12 09:40:36.025438511 > +0200 > +++ gcc/testsuite/gcc.dg/ubsan/pr82498.c2017-10-12 10:06:06.636790077 > +0200 > @@ -0,0 +1,159 @@ > +/* PR target/82498 */ > +/* { dg-do run { target i?86-*-* x86_64-*-* } } */ > +/* { dg-options "-fsanitize=undefined -fno-sanitize-recover=undefined" } */ > + > +#include > + > +volatile unsigned int a; > +volatile unsigned long long b; > +volatile int c; > + > +int > +main () > +{ > + a = 0x12345678U; > + a = __rold (a, 0); > + if (a != 0x12345678U) > +__builtin_abort (); > + a = __rold (a, 32); > + if (a != 0x12345678U) > +__builtin_abort (); > + a = __rold (a, -32); > + if (a != 0x12345678U) > +__builtin_abort (); > + a = __rold (a, 37); > + if (a != 0x468acf02U) > +__builtin_abort (); > + a = __rold (a, -5); > + if (a != 0x12345678U) > +__builtin_abort (); > + a = __rord (a, 0); > + if (a != 0x12345678U) > +__builtin_abort (); > + a = __rord (a, 32); > + if (a != 0x12345678U) > +__builtin_abort (); > + a = __rord (a, -32); > + if (a != 0x12345678U) > +__builtin_abort (); > + a = __rord (a, -37); > + if (a != 0x468acf02U) > +__builtin_abort (); > + a = __rord (a, 5); > + if (a != 0x12345678U) > +__builtin_abort (); > + c = 0; > + a = __rold (a, c); > + if (a != 0x12345678U) > +__builtin_abort (); > + c = 32; > + a = __rold (a, c); > + if (a != 0x12345678U) > +__builtin_abort (); > + c = -32; > + a = __rold (a, c); > + if (a != 0x12345678U) > +__builtin_abort (); > + c = 37; > + a = __rold (a, c); > + if (a != 0x468acf02U) > +__builtin_abort (); > + c = -5; > + a = __rold (a, c); > + if (a != 0x12345678U) > +__builtin_abort (); > + c = 0; > + a = __rord (a, c); > + if (a != 0x12345678U) > +__builtin_abort (); > + c = 32; > + a = __rord (a, c); > + if (a != 0x12345678U) > +__builtin_abort (); > + c = -32; > + a = __rord (a, c); > + if (a != 0x12345678U) > +__builtin_abort (); > + c = -37; > + a = __rord (a, c); > + if (a != 0x468acf02U) > +__builtin_abort (); > + c = 5; > + a = __rord (a, c); > + if (a != 0x12345678U) > +__builtin_abort (); > +#ifdef __x86_64__ > + b = 0x123456789abcdef1ULL; > + b = __rolq (b, 0); > + if (b != 0x123456789abcdef1ULL) > +__builtin_abort (); > + b = __rolq (b, 64); > + if (b != 0x123456789abcdef1ULL) > +__builtin_abort (); > + b = __rolq (b, -64); > + if (b != 0x123456789abcdef1ULL) > +__builtin_abort (); > + b = __rolq (b, 69); > + if (b != 0x468acf13579bde22ULL) > +__builtin_abort (); > + b = __rolq (b, -5); > + if (b != 0x123456789abcdef1ULL) > +__builtin_abort (); > + b = __rorq (b, 0); > + if (b != 0x123456789abcdef1ULL) > +
Re: [PATCH] Fix various arithmetic patterns with %[abcd]h destination (PR target/82524)
On Thu, Oct 12, 2017 at 9:49 PM, Jakub Jelinekwrote: > Hi! > > As mentioned in the PR, there are two bugs in these. One is that > the zero_extract destination is effectively another input operand (for the > remaining bits that are unchanged) and thus the constraint can't be =Q, > but has to be +Q. > And the other problem is that then LRA ICEs whenever it has 3 different > operands, because it is unable to reload it properly. Uros mentioned > that it could be reloaded by using *insvqi_2-like insn to move the > 8 bits from the operand that should use "0" constraint into the destination > register, but LRA isn't able to do that right now. > So this patch instead adds insn conditions that either the destination > is the same as the first input operand or as one of the input operands > (the latter for commutative patterns). > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/7.3? > > 2017-10-12 Jakub Jelinek > > PR target/82524 > * config/i386/i386.md (addqi_ext_1, andqi_ext_1, > *andqi_ext_1_cc, *qi_ext_1, *xorqi_ext_1_cc): Change > =Q constraints to +Q and into insn condition add check > that operands[0] and operands[1] are equal. > (*addqi_ext_2, *andqi_ext_2, *qi_ext_2): Change > =Q constraints to +Q and into insn condition add check > that operands[0] is equal to either operands[1] or operands[2]. > > * gcc.c-torture/execute/pr82524.c: New test. OK for mainline and gcc-7 branch. Thanks, Uros. > --- gcc/config/i386/i386.md.jj 2017-10-12 14:05:15.0 +0200 > +++ gcc/config/i386/i386.md 2017-10-12 17:07:11.723151868 +0200 > @@ -6264,7 +6264,7 @@ (define_insn "*add_5" > (set_attr "mode" "")]) > > (define_insn "addqi_ext_1" > - [(set (zero_extract:SI (match_operand 0 "ext_register_operand" "=Q,Q") > + [(set (zero_extract:SI (match_operand 0 "ext_register_operand" "+Q,Q") > (const_int 8) > (const_int 8)) > (subreg:SI > @@ -6275,7 +6275,8 @@ (define_insn "addqi_ext_1" >(const_int 8)) 0) > (match_operand:QI 2 "general_operand" "QnBc,m")) 0)) > (clobber (reg:CC FLAGS_REG))] > - "" > + "/* FIXME: without this LRA can't reload this pattern, see PR82524. */ > + rtx_equal_p (operands[0], operands[1])" > { >switch (get_attr_type (insn)) > { > @@ -6300,7 +6301,7 @@ (define_insn "addqi_ext_1" > (set_attr "mode" "QI")]) > > (define_insn "*addqi_ext_2" > - [(set (zero_extract:SI (match_operand 0 "ext_register_operand" "=Q") > + [(set (zero_extract:SI (match_operand 0 "ext_register_operand" "+Q") > (const_int 8) > (const_int 8)) > (subreg:SI > @@ -6314,7 +6315,9 @@ (define_insn "*addqi_ext_2" >(const_int 8) >(const_int 8)) 0)) 0)) >(clobber (reg:CC FLAGS_REG))] > - "" > + "/* FIXME: without this LRA can't reload this pattern, see PR82524. */ > + rtx_equal_p (operands[0], operands[1]) > + || rtx_equal_p (operands[0], operands[2])" >"add{b}\t{%h2, %h0|%h0, %h2}" >[(set_attr "type" "alu") > (set_attr "mode" "QI")]) > @@ -8998,7 +9001,7 @@ (define_insn "*andqi_2_slp" > (set_attr "mode" "QI")]) > > (define_insn "andqi_ext_1" > - [(set (zero_extract:SI (match_operand 0 "ext_register_operand" "=Q,Q") > + [(set (zero_extract:SI (match_operand 0 "ext_register_operand" "+Q,Q") > (const_int 8) > (const_int 8)) > (subreg:SI > @@ -9009,7 +9012,8 @@ (define_insn "andqi_ext_1" >(const_int 8)) 0) > (match_operand:QI 2 "general_operand" "QnBc,m")) 0)) > (clobber (reg:CC FLAGS_REG))] > - "" > + "/* FIXME: without this LRA can't reload this pattern, see PR82524. */ > + rtx_equal_p (operands[0], operands[1])" >"and{b}\t{%2, %h0|%h0, %2}" >[(set_attr "isa" "*,nox64") > (set_attr "type" "alu") > @@ -9027,7 +9031,7 @@ (define_insn "*andqi_ext_1_cc" >(const_int 8)) 0) > (match_operand:QI 2 "general_operand" "QnBc,m")) > (const_int 0))) > - (set (zero_extract:SI (match_operand 0 "ext_register_operand" "=Q,Q") > + (set (zero_extract:SI (match_operand 0 "ext_register_operand" "+Q,Q") > (const_int 8) > (const_int 8)) > (subreg:SI > @@ -9037,14 +9041,16 @@ (define_insn "*andqi_ext_1_cc" >(const_int 8) >(const_int 8)) 0) > (match_dup 2)) 0))] > - "ix86_match_ccmode (insn, CCNOmode)" > + "ix86_match_ccmode (insn, CCNOmode) > + /* FIXME: without this LRA can't reload this pattern, see PR82524. */ > + && rtx_equal_p (operands[0], operands[1])" >"and{b}\t{%2, %h0|%h0, %2}" >[(set_attr "isa" "*,nox64") >
Re: [PATCH] Improve x86 and + rotate (PR target/82498)
On Thu, Oct 12, 2017 at 9:11 PM, Jakub Jelinekwrote: > On Thu, Oct 12, 2017 at 10:40:22AM +0200, Uros Bizjak wrote: >> > So, if you aren't against it, I can extend the patch to handle the 4 >> > other mask patterns; as for other modes, SImode is what is being handled >> > already, DImode is not a problem, because the FEs truncate the shift counts >> > to integer_type_node already, and for HImode I haven't seen problem >> > probably because most tunings avoid HImode math and so it isn't worth >> > optimizing. >> >> OK, I think that we can live wtih 4 new patterns. Since these are all >> written in the same way (as in the patch you posted), the ammended >> patch is pre-approved for mainline. > > Thanks, here is what I've committed to trunk after another bootstrap/regtest > on x86_64-linux and i686-linux: > > 2017-10-12 Jakub Jelinek > > PR target/82498 > * config/i386/i386.md (*ashl3_mask_1, > *3_mask_1, *3_mask_1, > *_mask_1, *btr_mask_1): New define_insn_and_split > patterns. > > * gcc.target/i386/pr82498-1.c: New test. > * gcc.target/i386/pr82498-2.c: New test. OK for mainline. Thanks, Uros. > --- gcc/config/i386/i386.md.jj 2017-10-11 22:37:55.933863355 +0200 > +++ gcc/config/i386/i386.md 2017-10-12 11:30:38.191535974 +0200 > @@ -10228,6 +10228,26 @@ (define_insn_and_split "*ashl3_mas >(clobber (reg:CC FLAGS_REG))])] >"operands[2] = gen_lowpart (QImode, operands[2]);") > > +(define_insn_and_split "*ashl3_mask_1" > + [(set (match_operand:SWI48 0 "nonimmediate_operand") > + (ashift:SWI48 > + (match_operand:SWI48 1 "nonimmediate_operand") > + (and:QI > + (match_operand:QI 2 "register_operand") > + (match_operand:QI 3 "const_int_operand" > + (clobber (reg:CC FLAGS_REG))] > + "ix86_binary_operator_ok (ASHIFT, mode, operands) > + && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) > + == GET_MODE_BITSIZE (mode)-1 > + && can_create_pseudo_p ()" > + "#" > + "&& 1" > + [(parallel > + [(set (match_dup 0) > + (ashift:SWI48 (match_dup 1) > +(match_dup 2))) > + (clobber (reg:CC FLAGS_REG))])]) > + > (define_insn "*bmi2_ashl3_1" >[(set (match_operand:SWI48 0 "register_operand" "=r") > (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "rm") > @@ -10728,6 +10748,26 @@ (define_insn_and_split "*(clobber (reg:CC FLAGS_REG))])] >"operands[2] = gen_lowpart (QImode, operands[2]);") > > +(define_insn_and_split "*3_mask_1" > + [(set (match_operand:SWI48 0 "nonimmediate_operand") > + (any_shiftrt:SWI48 > + (match_operand:SWI48 1 "nonimmediate_operand") > + (and:QI > + (match_operand:QI 2 "register_operand") > + (match_operand:QI 3 "const_int_operand" > + (clobber (reg:CC FLAGS_REG))] > + "ix86_binary_operator_ok (, mode, operands) > + && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) > + == GET_MODE_BITSIZE (mode)-1 > + && can_create_pseudo_p ()" > + "#" > + "&& 1" > + [(parallel > + [(set (match_dup 0) > + (any_shiftrt:SWI48 (match_dup 1) > + (match_dup 2))) > + (clobber (reg:CC FLAGS_REG))])]) > + > (define_insn_and_split "*3_doubleword" >[(set (match_operand:DWI 0 "register_operand" "=") > (any_shiftrt:DWI (match_operand:DWI 1 "register_operand" "0") > @@ -11187,6 +11227,26 @@ (define_insn_and_split "*(clobber (reg:CC FLAGS_REG))])] >"operands[2] = gen_lowpart (QImode, operands[2]);") > > +(define_insn_and_split "*3_mask_1" > + [(set (match_operand:SWI48 0 "nonimmediate_operand") > + (any_rotate:SWI48 > + (match_operand:SWI48 1 "nonimmediate_operand") > + (and:QI > + (match_operand:QI 2 "register_operand") > + (match_operand:QI 3 "const_int_operand" > + (clobber (reg:CC FLAGS_REG))] > + "ix86_binary_operator_ok (, mode, operands) > + && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) > + == GET_MODE_BITSIZE (mode)-1 > + && can_create_pseudo_p ()" > + "#" > + "&& 1" > + [(parallel > + [(set (match_dup 0) > + (any_rotate:SWI48 (match_dup 1) > +(match_dup 2))) > + (clobber (reg:CC FLAGS_REG))])]) > + > ;; Implement rotation using two double-precision > ;; shift instructions and a scratch register. > > @@ -11494,6 +11554,30 @@ (define_insn_and_split "*_ma >(clobber (reg:CC FLAGS_REG))])] >"operands[1] = gen_lowpart (QImode, operands[1]);") > > +(define_insn_and_split "*_mask_1" > + [(set (match_operand:SWI48 0 "register_operand") > + (any_or:SWI48 > + (ashift:SWI48 > + (const_int 1) > + (and:QI > + (match_operand:QI 1 "register_operand") > + (match_operand:QI 2 "const_int_operand"))) > + (match_operand:SWI48 3 "register_operand"))) > + (clobber