Re: [PATCH] Improve rotate fold-const pattern matching (PR target/82498)

2017-10-13 Thread Richard Biener
On Thu, 12 Oct 2017, Jakub Jelinek wrote:

> Hi!
> 
> Marc in the PR mentioned that it is not really good that the recommended
> rotate pattern is recognized only during forwprop1 and later, which is after
> einline, and that inlining or early opts could have changed things too much
> for us to still recognize it.

Hmm, but the only thing functions see is inlining of early-optimized bodies
into them, followed by constant propagation, so I don't see how
we could confuse the pattern in a way that makes it undetectable.  Also,
early inlining is performed on early-optimized bodies, so cost metrics
see rotates, not the unrecognized form.

> The following patch handles that pattern in fold_binary_loc too, and while
> I've touched it, it cleans up a lot of ugliness/duplication in that code.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Still looks like an improvement, thus ok.

Thanks,
Richard.

> 2017-10-12  Jakub Jelinek  
> 
>   PR target/82498
>   * fold-const.c (fold_binary_loc) : Code cleanups,
>   instead of handling MINUS_EXPR twice (once for each argument),
>   canonicalize operand order and handle just once, use rtype where
>   possible.  Handle (A << B) | (A >> (-B & (Z - 1))).
> 
>   * gcc.dg/tree-ssa/pr82498.c: New test.
> 
> --- gcc/fold-const.c.jj   2017-10-11 22:37:51.0 +0200
> +++ gcc/fold-const.c  2017-10-12 13:17:45.360589554 +0200
> @@ -9429,7 +9429,10 @@ fold_binary_loc (location_t loc,
>/* (A << C1) + (A >> C2) if A is unsigned and C1+C2 is the size of A
>is a rotate of A by C1 bits.  */
>/* (A << B) + (A >> (Z - B)) if A is unsigned and Z is the size of A
> -  is a rotate of A by B bits.  */
> +  is a rotate of A by B bits.
> +  Similarly for (A << B) | (A >> (-B & C3)) where C3 is Z-1,
> +  though in this case CODE must be | and not + or ^, otherwise
> +  it doesn't return A when B is 0.  */
>{
>   enum tree_code code0, code1;
>   tree rtype;
> @@ -9447,25 +9450,32 @@ fold_binary_loc (location_t loc,
>   == GET_MODE_UNIT_PRECISION (TYPE_MODE (rtype
> {
>   tree tree01, tree11;
> + tree orig_tree01, orig_tree11;
>   enum tree_code code01, code11;
>  
> - tree01 = TREE_OPERAND (arg0, 1);
> - tree11 = TREE_OPERAND (arg1, 1);
> + tree01 = orig_tree01 = TREE_OPERAND (arg0, 1);
> + tree11 = orig_tree11 = TREE_OPERAND (arg1, 1);
>   STRIP_NOPS (tree01);
>   STRIP_NOPS (tree11);
>   code01 = TREE_CODE (tree01);
>   code11 = TREE_CODE (tree11);
> + if (code11 != MINUS_EXPR
> + && (code01 == MINUS_EXPR || code01 == BIT_AND_EXPR))
> +   {
> + std::swap (code0, code1);
> + std::swap (code01, code11);
> + std::swap (tree01, tree11);
> + std::swap (orig_tree01, orig_tree11);
> +   }
>   if (code01 == INTEGER_CST
>   && code11 == INTEGER_CST
>   && (wi::to_widest (tree01) + wi::to_widest (tree11)
> - == element_precision (TREE_TYPE (TREE_OPERAND (arg0, 0)
> + == element_precision (rtype)))
> {
>   tem = build2_loc (loc, LROTATE_EXPR,
> -   TREE_TYPE (TREE_OPERAND (arg0, 0)),
> -   TREE_OPERAND (arg0, 0),
> +   rtype, TREE_OPERAND (arg0, 0),
> code0 == LSHIFT_EXPR
> -   ? TREE_OPERAND (arg0, 1)
> -   : TREE_OPERAND (arg1, 1));
> +   ? orig_tree01 : orig_tree11);
>   return fold_convert_loc (loc, type, tem);
> }
>   else if (code11 == MINUS_EXPR)
> @@ -9477,39 +9487,37 @@ fold_binary_loc (location_t loc,
>   STRIP_NOPS (tree111);
>   if (TREE_CODE (tree110) == INTEGER_CST
>   && 0 == compare_tree_int (tree110,
> -   element_precision
> -   (TREE_TYPE (TREE_OPERAND
> -   (arg0, 0
> +   element_precision (rtype))
>   && operand_equal_p (tree01, tree111, 0))
> -   return
> - fold_convert_loc (loc, type,
> -   build2 ((code0 == LSHIFT_EXPR
> -? LROTATE_EXPR
> -: RROTATE_EXPR),
> -   TREE_TYPE (TREE_OPERAND (arg0, 0)),
> -   TREE_OPERAND (arg0, 0),
> -   TREE_OPERAND (arg0, 1)));
> +   {
> + tem = build2_loc (loc, (code0 == LSHIFT_EXPR
> +  
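For reference, the rotate idiom the comment in the patch describes, written
out as a minimal C sketch (an illustration, not part of the patch), assuming a
32-bit unsigned int so Z is 32 and C3 is 31:

  unsigned int
  rotl32 (unsigned int x, unsigned int b)
  {
    b &= 31;                            /* keep the count in [0, 31] */
    return (x << b) | (x >> (-b & 31)); /* for b == 0: x | x == x */
  }

As the comment notes, only | degenerates correctly at b == 0: both shifts then
produce x, and x | x is x, whereas x + x or x ^ x would not return x.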

Re: [x86] GFNI enabling[1/4]

2017-10-13 Thread Jakub Jelinek
On Fri, Oct 13, 2017 at 07:03:14AM +, Koval, Julia wrote:

--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -753,6 +753,10 @@ mrdpid
 Target Report Mask(ISA_RDPID) Var(ix86_isa_flags2) Save
 Support RDPID built-in functions and code generation.
 
+mgfni
+Target Report Mask(ISA_GFNI) Var(ix86_isa_flags2) Save
+Support RDPID built-in functions and code generation.
+

Pasto?  It would surprise me if the description was meant to be
exactly the same as -mrdpid.
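Presumably the intended entry mirrors the -mrdpid one with GFNI substituted; a
sketch of what it would look like (an assumption, the GFNI wording is a guess):

  mgfni
  Target Report Mask(ISA_GFNI) Var(ix86_isa_flags2) Save
  Support GFNI built-in functions and code generation.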

Jakub




[x86] GFNI enabling[1/4]

2017-10-13 Thread Koval, Julia
Hi,

gcc/
* gcc/common/config/i386/i386-common.c (OPTION_MASK_ISA_GFNI_SET,
OPTION_MASK_ISA_GFNI_UNSET): New.
(ix86_handle_option): Handle OPT_mgfni.
* gcc/config/i386/cpuid.h (bit_GFNI): New.
* gcc/config/i386/driver-i386.c (host_detect_local_cpu): Detect gfni.
* gcc/config/i386/i386-c.c (ix86_target_macros_internal): Define 
__GFNI__.
* gcc/config/i386/i386.c (ix86_target_string): Add -mgfni.
(ix86_valid_target_attribute_inner_p): Add OPT_mgfni.
* gcc/config/i386/i386.h (TARGET_GFNI, TARGET_GFNI_P): New.
* gcc/config/i386/i386.opt: Add mgfni.

Here is the first patch of the GFNI ISA set enabling. It adds the new option
-mgfni for the GFNI ISA set and its CPUID bit.
Docs for the new instructions and ISA sets are here:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
Ok for trunk? 

Thanks,
Julia




Re: 0004-Part-4.-Update-x86-backend-to-enable-Intel-CET

2017-10-13 Thread Uros Bizjak
On Thu, Oct 12, 2017 at 8:45 PM, Tsimbalist, Igor V
 wrote:
> Uros,
>
> Attached is an updated patch. The main difference is the option name and
> attribute name change; the other code is the same.

Trivial changes (if they fall under the "trivial" rule) don't need extra
approval, but OK nevertheless.

Thanks,
Uros.

> Igor
>
>
>> -Original Message-
>> From: Tsimbalist, Igor V
>> Sent: Tuesday, September 19, 2017 5:06 PM
>> To: Uros Bizjak ; gcc-patches@gcc.gnu.org
>> Cc: Tsimbalist, Igor V 
>> Subject: RE: 0004-Part-4.-Update-x86-backend-to-enable-Intel-CET
>>
>> Uros, thank you for the approval. Based on the approval of the first 3
>> patches (I've submitted them today), I need to adjust the option and
>> attribute names. I will resubmit the patch once they are fixed.
>>
>> Thanks,
>> Igor
>>
>>
>> > -Original Message-
>> > From: Uros Bizjak [mailto:ubiz...@gmail.com]
>> > Sent: Monday, September 18, 2017 11:58 AM
>> > To: gcc-patches@gcc.gnu.org
>> > Cc: Tsimbalist, Igor V ; Tsimbalist, Igor
>> > V 
>> > Subject: Re: 0004-Part-4.-Update-x86-backend-to-enable-Intel-CET
>> >
>> > Hello!
>> >
>> > > gcc/
>> > >
>> > > * common/config/i386/i386-common.c (OPTION_MASK_ISA_IBT_SET):
>> > New.
>> > > (OPTION_MASK_ISA_SHSTK_SET): Likewise.
>> > > (OPTION_MASK_ISA_IBT_UNSET): Likewise.
>> > > (OPTION_MASK_ISA_SHSTK_UNSET): Likewise.
>> > > (ix86_handle_option): Add -mibt, -mshstk, -mcet handling.
>> > > * config.gcc (extra_headers): Add cetintrin.h for x86 targets.
>> > > (extra_objs): Add cet.o for Linux/x86 targets.
>> > > (tmake_file): Add i386/t-cet for Linux/x86 targets.
>> > > * config/i386/cet.c: New file.
>> > > * config/i386/cetintrin.h: Likewise.
>> > > * config/i386/t-cet: Likewise.
>> > > * config/i386/cpuid.h (bit_SHSTK): New.
>> > > (bit_IBT): Likewise.
>> > > * config/i386/driver-i386.c (host_detect_local_cpu): Detect and pass
>> > > IBT and SHSTK bits.
>> > > * config/i386/i386-builtin-types.def
>> > > (VOID_FTYPE_UNSIGNED_PVOID): New.
>> > > (VOID_FTYPE_UINT64_PVOID): Likewise.
>> > > * config/i386/i386-builtin.def: Add CET intrinsics.
>> > > * config/i386/i386-c.c (ix86_target_macros_internal): Add
>> > > OPTION_MASK_ISA_IBT, OPTION_MASK_ISA_SHSTK handling.
>> > > * config/i386/i386-passes.def: Add pass_insert_endbranch pass.
>> > > * config/i386/i386-protos.h (make_pass_insert_endbranch): New
>> > > prototype.
>> > > * config/i386/i386.c (rest_of_insert_endbranch): New.
>> > > (pass_data_insert_endbranch): Likewise.
>> > > (pass_insert_endbranch): Likewise.
>> > > (make_pass_insert_endbranch): Likewise.
>> > > (ix86_notrack_prefixed_insn_p): Likewise.
>> > > (ix86_target_string): Add -mibt, -mshstk flags.
>> > > (ix86_option_override_internal): Add flag_instrument_control_flow
>> > > processing.
>> > > (ix86_valid_target_attribute_inner_p): Set OPT_mibt, OPT_mshstk.
>> > > (ix86_print_operand): Add 'notrack' prefix output.
>> > > (ix86_init_mmx_sse_builtins): Add CET intrinsics.
>> > > (ix86_expand_builtin): Expand CET intrinsics.
>> > > (x86_output_mi_thunk): Add 'endbranch' instruction.
>> > > * config/i386/i386.h (TARGET_IBT): New.
>> > > (TARGET_IBT_P): Likewise.
>> > > (TARGET_SHSTK): Likewise.
>> > > (TARGET_SHSTK_P): Likewise.
>> > > * config/i386/i386.md (unspecv): Add UNSPECV_NOP_RDSSP,
>> > > UNSPECV_INCSSP, UNSPECV_SAVEPREVSSP, UNSPECV_RSTORSSP,
>> > UNSPECV_WRSS,
>> > > UNSPECV_WRUSS, UNSPECV_SETSSBSY, UNSPECV_CLRSSBSY.
>> > > (builtin_setjmp_setup): New pattern.
>> > > (builtin_longjmp): Likewise.
>> > > (rdssp): Likewise.
>> > > (incssp): Likewise.
>> > > (saveprevssp): Likewise.
>> > > (rstorssp): Likewise.
>> > > (wrss): Likewise.
>> > > (wruss): Likewise.
>> > > (setssbsy): Likewise.
>> > > (clrssbsy): Likewise.
>> > > (nop_endbr): Likewise.
>> > > * config/i386/i386.opt: Add -mcet, -mibt, -mshstk and -mcet-switch
>> > > options.
>> > > * config/i386/immintrin.h: Include <cetintrin.h>.
>> > > * config/i386/linux-common.h
>> > > (file_end_indicate_exec_stack_and_cet): New prototype.
>> > > (TARGET_ASM_FILE_END): New.
>> >
>> > LGTM.
>> >
>> > OK for mainline.
>> >
>> > Thanks,
>> > Uros.


Re: [PATCH] Avoid UB in ia32intrin.h rotate patterns (PR target/82498)

2017-10-13 Thread Uros Bizjak
On Thu, Oct 12, 2017 at 9:39 PM, Jakub Jelinek  wrote:
> Hi!
>
> The ia32intrin.h rotate intrinsics require the second argument to be
> between 1 and 31 (or 63), otherwise they invoke UB.  But we can do much
> better while still generating the same instruction when optimizing, so the
> following patch uses the patterns we pattern recognize well, where
> the second argument can be any value.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2017-10-12  Jakub Jelinek  
>
> PR target/82498
> * config/i386/ia32intrin.h (__rold, __rord, __rolq, __rorq): Allow
> any values of __C while still being pattern recognizable as a simple
> rotate instruction.
>
> * gcc.dg/ubsan/pr82498.c: New test.

LGTM.

Thanks,
Uros.
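For context, a minimal sketch (an illustration, not from the patch) of why the
old expansion was undefined for a count of 0 while the new one is well defined
for any count:

  /* Old form: for c == 0 this computes x >> 32, a shift by the full
     width of a 32-bit unsigned int, which is undefined behavior.  */
  unsigned int
  old_rold (unsigned int x, int c)
  {
    return (x << c) | (x >> (32 - c));
  }

  /* New form: c is first reduced mod 32; for c == 0 the right shift
     count is -0 & 31 == 0, so the function simply returns x.  */
  unsigned int
  new_rold (unsigned int x, int c)
  {
    c &= 31;
    return (x << c) | (x >> (-c & 31));
  }

Both forms are pattern-recognized into a single rol instruction on x86, which
is why the same code is generated while the UB disappears.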

> --- gcc/config/i386/ia32intrin.h.jj 2017-01-01 12:45:42.0 +0100
> +++ gcc/config/i386/ia32intrin.h 2017-10-12 09:55:24.235602737 +0200
> @@ -147,7 +147,8 @@ extern __inline unsigned int
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  __rold (unsigned int __X, int __C)
>  {
> -  return (__X << __C) | (__X >> (32 - __C));
> +  __C &= 31;
> +  return (__X << __C) | (__X >> (-__C & 31));
>  }
>
>  /* 8bit ror */
> @@ -171,7 +172,8 @@ extern __inline unsigned int
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  __rord (unsigned int __X, int __C)
>  {
> -  return (__X >> __C) | (__X << (32 - __C));
> +  __C &= 31;
> +  return (__X >> __C) | (__X << (-__C & 31));
>  }
>
>  /* Pause */
> @@ -239,7 +241,8 @@ extern __inline unsigned long long
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  __rolq (unsigned long long __X, int __C)
>  {
> -  return (__X << __C) | (__X >> (64 - __C));
> +  __C &= 63;
> +  return (__X << __C) | (__X >> (-__C & 63));
>  }
>
>  /* 64bit ror */
> @@ -247,7 +250,8 @@ extern __inline unsigned long long
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  __rorq (unsigned long long __X, int __C)
>  {
> -  return (__X >> __C) | (__X << (64 - __C));
> +  __C &= 63;
> +  return (__X >> __C) | (__X << (-__C & 63));
>  }
>
>  /* Read flags register */
> --- gcc/testsuite/gcc.dg/ubsan/pr82498.c.jj 2017-10-12 09:40:36.025438511 +0200
> +++ gcc/testsuite/gcc.dg/ubsan/pr82498.c 2017-10-12 10:06:06.636790077 +0200
> @@ -0,0 +1,159 @@
> +/* PR target/82498 */
> +/* { dg-do run { target i?86-*-* x86_64-*-* } } */
> +/* { dg-options "-fsanitize=undefined -fno-sanitize-recover=undefined" } */
> +
> +#include <x86intrin.h>
> +
> +volatile unsigned int a;
> +volatile unsigned long long b;
> +volatile int c;
> +
> +int
> +main ()
> +{
> +  a = 0x12345678U;
> +  a = __rold (a, 0);
> +  if (a != 0x12345678U)
> +__builtin_abort ();
> +  a = __rold (a, 32);
> +  if (a != 0x12345678U)
> +__builtin_abort ();
> +  a = __rold (a, -32);
> +  if (a != 0x12345678U)
> +__builtin_abort ();
> +  a = __rold (a, 37);
> +  if (a != 0x468acf02U)
> +__builtin_abort ();
> +  a = __rold (a, -5);
> +  if (a != 0x12345678U)
> +__builtin_abort ();
> +  a = __rord (a, 0);
> +  if (a != 0x12345678U)
> +__builtin_abort ();
> +  a = __rord (a, 32);
> +  if (a != 0x12345678U)
> +__builtin_abort ();
> +  a = __rord (a, -32);
> +  if (a != 0x12345678U)
> +__builtin_abort ();
> +  a = __rord (a, -37);
> +  if (a != 0x468acf02U)
> +__builtin_abort ();
> +  a = __rord (a, 5);
> +  if (a != 0x12345678U)
> +__builtin_abort ();
> +  c = 0;
> +  a = __rold (a, c);
> +  if (a != 0x12345678U)
> +__builtin_abort ();
> +  c = 32;
> +  a = __rold (a, c);
> +  if (a != 0x12345678U)
> +__builtin_abort ();
> +  c = -32;
> +  a = __rold (a, c);
> +  if (a != 0x12345678U)
> +__builtin_abort ();
> +  c = 37;
> +  a = __rold (a, c);
> +  if (a != 0x468acf02U)
> +__builtin_abort ();
> +  c = -5;
> +  a = __rold (a, c);
> +  if (a != 0x12345678U)
> +__builtin_abort ();
> +  c = 0;
> +  a = __rord (a, c);
> +  if (a != 0x12345678U)
> +__builtin_abort ();
> +  c = 32;
> +  a = __rord (a, c);
> +  if (a != 0x12345678U)
> +__builtin_abort ();
> +  c = -32;
> +  a = __rord (a, c);
> +  if (a != 0x12345678U)
> +__builtin_abort ();
> +  c = -37;
> +  a = __rord (a, c);
> +  if (a != 0x468acf02U)
> +__builtin_abort ();
> +  c = 5;
> +  a = __rord (a, c);
> +  if (a != 0x12345678U)
> +__builtin_abort ();
> +#ifdef __x86_64__
> +  b = 0x123456789abcdef1ULL;
> +  b = __rolq (b, 0);
> +  if (b != 0x123456789abcdef1ULL)
> +__builtin_abort ();
> +  b = __rolq (b, 64);
> +  if (b != 0x123456789abcdef1ULL)
> +__builtin_abort ();
> +  b = __rolq (b, -64);
> +  if (b != 0x123456789abcdef1ULL)
> +__builtin_abort ();
> +  b = __rolq (b, 69);
> +  if (b != 0x468acf13579bde22ULL)
> +__builtin_abort ();
> +  b = __rolq (b, -5);
> +  if (b != 0x123456789abcdef1ULL)
> +__builtin_abort ();
> +  b = __rorq (b, 0);
> +  if (b != 0x123456789abcdef1ULL)
> +

Re: [PATCH] Fix various arithmetic patterns with %[abcd]h destination (PR target/82524)

2017-10-13 Thread Uros Bizjak
On Thu, Oct 12, 2017 at 9:49 PM, Jakub Jelinek  wrote:
> Hi!
>
> As mentioned in the PR, there are two bugs in these.  One is that
> the zero_extract destination is effectively another input operand (for the
> remaining bits that are unchanged) and thus the constraint can't be =Q,
> but has to be +Q.
> The other problem is that LRA then ICEs whenever it has 3 different
> operands, because it is unable to reload them properly.  Uros mentioned
> that it could be reloaded by using an *insvqi_2-like insn to move the
> 8 bits from the operand that should use the "0" constraint into the
> destination register, but LRA isn't able to do that right now.
> So this patch instead adds insn conditions requiring that the destination
> be the same as the first input operand, or as one of the input operands
> (the latter for commutative patterns).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/7.3?
>
> 2017-10-12  Jakub Jelinek  
>
> PR target/82524
> * config/i386/i386.md (addqi_ext_1, andqi_ext_1,
> *andqi_ext_1_cc, *qi_ext_1, *xorqi_ext_1_cc): Change
> =Q constraints to +Q and into insn condition add check
> that operands[0] and operands[1] are equal.
> (*addqi_ext_2, *andqi_ext_2, *qi_ext_2): Change
> =Q constraints to +Q and into insn condition add check
> that operands[0] is equal to either operands[1] or operands[2].
>
> * gcc.c-torture/execute/pr82524.c: New test.

OK for mainline and gcc-7 branch.

Thanks,
Uros.
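To illustrate the situation (a hypothetical example, not the actual
gcc.c-torture/execute/pr82524.c test): code that modifies only bits 8..15 of a
value still has the remaining bits flow from input to output, so the
zero_extract destination is also an input:

  /* Add v to the second-lowest byte of x, leaving the other 24 bits
     untouched; on x86 this can become a single add to %ah/%bh/%ch/%dh,
     but because the rest of the register must be preserved, the
     destination register effectively acts as an input operand too.  */
  unsigned int
  add_to_high_byte (unsigned int x, unsigned char v)
  {
    unsigned int hb = ((x >> 8) + v) & 0xff;
    return (x & ~0xff00u) | (hb << 8);
  }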

> --- gcc/config/i386/i386.md.jj  2017-10-12 14:05:15.0 +0200
> +++ gcc/config/i386/i386.md 2017-10-12 17:07:11.723151868 +0200
> @@ -6264,7 +6264,7 @@ (define_insn "*add_5"
> (set_attr "mode" "")])
>
>  (define_insn "addqi_ext_1"
> -  [(set (zero_extract:SI (match_operand 0 "ext_register_operand" "=Q,Q")
> +  [(set (zero_extract:SI (match_operand 0 "ext_register_operand" "+Q,Q")
>  (const_int 8)
>  (const_int 8))
> (subreg:SI
> @@ -6275,7 +6275,8 @@ (define_insn "addqi_ext_1"
>(const_int 8)) 0)
> (match_operand:QI 2 "general_operand" "QnBc,m")) 0))
> (clobber (reg:CC FLAGS_REG))]
> -  ""
> +  "/* FIXME: without this LRA can't reload this pattern, see PR82524.  */
> +   rtx_equal_p (operands[0], operands[1])"
>  {
>switch (get_attr_type (insn))
>  {
> @@ -6300,7 +6301,7 @@ (define_insn "addqi_ext_1"
> (set_attr "mode" "QI")])
>
>  (define_insn "*addqi_ext_2"
> -  [(set (zero_extract:SI (match_operand 0 "ext_register_operand" "=Q")
> +  [(set (zero_extract:SI (match_operand 0 "ext_register_operand" "+Q")
>  (const_int 8)
>  (const_int 8))
> (subreg:SI
> @@ -6314,7 +6315,9 @@ (define_insn "*addqi_ext_2"
>(const_int 8)
>(const_int 8)) 0)) 0))
>(clobber (reg:CC FLAGS_REG))]
> -  ""
> +  "/* FIXME: without this LRA can't reload this pattern, see PR82524.  */
> +   rtx_equal_p (operands[0], operands[1])
> +   || rtx_equal_p (operands[0], operands[2])"
>"add{b}\t{%h2, %h0|%h0, %h2}"
>[(set_attr "type" "alu")
> (set_attr "mode" "QI")])
> @@ -8998,7 +9001,7 @@ (define_insn "*andqi_2_slp"
> (set_attr "mode" "QI")])
>
>  (define_insn "andqi_ext_1"
> -  [(set (zero_extract:SI (match_operand 0 "ext_register_operand" "=Q,Q")
> +  [(set (zero_extract:SI (match_operand 0 "ext_register_operand" "+Q,Q")
>  (const_int 8)
>  (const_int 8))
> (subreg:SI
> @@ -9009,7 +9012,8 @@ (define_insn "andqi_ext_1"
>(const_int 8)) 0)
> (match_operand:QI 2 "general_operand" "QnBc,m")) 0))
> (clobber (reg:CC FLAGS_REG))]
> -  ""
> +  "/* FIXME: without this LRA can't reload this pattern, see PR82524.  */
> +   rtx_equal_p (operands[0], operands[1])"
>"and{b}\t{%2, %h0|%h0, %2}"
>[(set_attr "isa" "*,nox64")
> (set_attr "type" "alu")
> @@ -9027,7 +9031,7 @@ (define_insn "*andqi_ext_1_cc"
>(const_int 8)) 0)
> (match_operand:QI 2 "general_operand" "QnBc,m"))
>   (const_int 0)))
> -   (set (zero_extract:SI (match_operand 0 "ext_register_operand" "=Q,Q")
> +   (set (zero_extract:SI (match_operand 0 "ext_register_operand" "+Q,Q")
>  (const_int 8)
>  (const_int 8))
> (subreg:SI
> @@ -9037,14 +9041,16 @@ (define_insn "*andqi_ext_1_cc"
>(const_int 8)
>(const_int 8)) 0)
> (match_dup 2)) 0))]
> -  "ix86_match_ccmode (insn, CCNOmode)"
> +  "ix86_match_ccmode (insn, CCNOmode)
> +   /* FIXME: without this LRA can't reload this pattern, see PR82524.  */
> +   && rtx_equal_p (operands[0], operands[1])"
>"and{b}\t{%2, %h0|%h0, %2}"
>[(set_attr "isa" "*,nox64")
>

Re: [PATCH] Improve x86 and + rotate (PR target/82498)

2017-10-13 Thread Uros Bizjak
On Thu, Oct 12, 2017 at 9:11 PM, Jakub Jelinek  wrote:
> On Thu, Oct 12, 2017 at 10:40:22AM +0200, Uros Bizjak wrote:
>> > So, if you aren't against it, I can extend the patch to handle the 4
>> > other mask patterns; as for other modes, SImode is what is being handled
>> > already, DImode is not a problem, because the FEs truncate the shift counts
>> > to integer_type_node already, and for HImode I haven't seen a problem,
>> > probably because most tunings avoid HImode math and so it isn't worth
>> > optimizing.
>>
>> OK, I think that we can live with 4 new patterns. Since these are all
>> written in the same way (as in the patch you posted), the amended
>> patch is pre-approved for mainline.
>
> Thanks, here is what I've committed to trunk after another bootstrap/regtest
> on x86_64-linux and i686-linux:
>
> 2017-10-12  Jakub Jelinek  
>
> PR target/82498
> * config/i386/i386.md (*ashl3_mask_1,
> *3_mask_1, *3_mask_1,
> *_mask_1, *btr_mask_1): New define_insn_and_split
> patterns.
>
> * gcc.target/i386/pr82498-1.c: New test.
> * gcc.target/i386/pr82498-2.c: New test.

OK for mainline.

Thanks,
Uros.
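A minimal sketch of the kind of source the new patterns help with (an
illustration, not from the patch): x86 variable shifts and rotates already
truncate the count register to the low 5 bits (6 for 64-bit operands), so an
explicit mask equal to the mode bitsize minus 1 is redundant and the split can
drop it:

  /* With the new *_mask_1 patterns the & 31 below no longer emits a
     separate AND instruction: the hardware shl already uses only the
     low 5 bits of the count for a 32-bit operand.  */
  unsigned int
  shift_masked (unsigned int x, unsigned int c)
  {
    return x << (c & 31);
  }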

> --- gcc/config/i386/i386.md.jj  2017-10-11 22:37:55.933863355 +0200
> +++ gcc/config/i386/i386.md 2017-10-12 11:30:38.191535974 +0200
> @@ -10228,6 +10228,26 @@ (define_insn_and_split "*ashl3_mas
>(clobber (reg:CC FLAGS_REG))])]
>"operands[2] = gen_lowpart (QImode, operands[2]);")
>
> +(define_insn_and_split "*ashl3_mask_1"
> +  [(set (match_operand:SWI48 0 "nonimmediate_operand")
> +   (ashift:SWI48
> + (match_operand:SWI48 1 "nonimmediate_operand")
> + (and:QI
> +   (match_operand:QI 2 "register_operand")
> +   (match_operand:QI 3 "const_int_operand"
> +   (clobber (reg:CC FLAGS_REG))]
> +  "ix86_binary_operator_ok (ASHIFT, mode, operands)
> +   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1))
> +  == GET_MODE_BITSIZE (mode)-1
> +   && can_create_pseudo_p ()"
> +  "#"
> +  "&& 1"
> +  [(parallel
> + [(set (match_dup 0)
> +  (ashift:SWI48 (match_dup 1)
> +(match_dup 2)))
> +  (clobber (reg:CC FLAGS_REG))])])
> +
>  (define_insn "*bmi2_ashl3_1"
>[(set (match_operand:SWI48 0 "register_operand" "=r")
> (ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" "rm")
> @@ -10728,6 +10748,26 @@ (define_insn_and_split "*
> (clobber (reg:CC FLAGS_REG))])]
>"operands[2] = gen_lowpart (QImode, operands[2]);")
>
> +(define_insn_and_split "*3_mask_1"
> +  [(set (match_operand:SWI48 0 "nonimmediate_operand")
> +   (any_shiftrt:SWI48
> + (match_operand:SWI48 1 "nonimmediate_operand")
> + (and:QI
> +   (match_operand:QI 2 "register_operand")
> +   (match_operand:QI 3 "const_int_operand"
> +   (clobber (reg:CC FLAGS_REG))]
> +  "ix86_binary_operator_ok (, mode, operands)
> +   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1))
> +  == GET_MODE_BITSIZE (mode)-1
> +   && can_create_pseudo_p ()"
> +  "#"
> +  "&& 1"
> +  [(parallel
> + [(set (match_dup 0)
> +  (any_shiftrt:SWI48 (match_dup 1)
> + (match_dup 2)))
> +  (clobber (reg:CC FLAGS_REG))])])
> +
>  (define_insn_and_split "*3_doubleword"
>[(set (match_operand:DWI 0 "register_operand" "=")
> (any_shiftrt:DWI (match_operand:DWI 1 "register_operand" "0")
> @@ -11187,6 +11227,26 @@ (define_insn_and_split "*
> (clobber (reg:CC FLAGS_REG))])]
>"operands[2] = gen_lowpart (QImode, operands[2]);")
>
> +(define_insn_and_split "*3_mask_1"
> +  [(set (match_operand:SWI48 0 "nonimmediate_operand")
> +   (any_rotate:SWI48
> + (match_operand:SWI48 1 "nonimmediate_operand")
> + (and:QI
> +   (match_operand:QI 2 "register_operand")
> +   (match_operand:QI 3 "const_int_operand"
> +   (clobber (reg:CC FLAGS_REG))]
> +  "ix86_binary_operator_ok (, mode, operands)
> +   && (INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1))
> +  == GET_MODE_BITSIZE (mode)-1
> +   && can_create_pseudo_p ()"
> +  "#"
> +  "&& 1"
> +  [(parallel
> + [(set (match_dup 0)
> +  (any_rotate:SWI48 (match_dup 1)
> +(match_dup 2)))
> +  (clobber (reg:CC FLAGS_REG))])])
> +
>  ;; Implement rotation using two double-precision
>  ;; shift instructions and a scratch register.
>
> @@ -11494,6 +11554,30 @@ (define_insn_and_split "*_ma
>(clobber (reg:CC FLAGS_REG))])]
>"operands[1] = gen_lowpart (QImode, operands[1]);")
>
> +(define_insn_and_split "*_mask_1"
> +  [(set (match_operand:SWI48 0 "register_operand")
> +   (any_or:SWI48
> + (ashift:SWI48
> +   (const_int 1)
> +   (and:QI
> + (match_operand:QI 1 "register_operand")
> + (match_operand:QI 2 "const_int_operand")))
> + (match_operand:SWI48 3 "register_operand")))
> +   (clobber 
