Re: [PATCH] Improve x86-64 32-bit div/mod followed by zero-extension to 64-bit (PR target/82361)

2017-09-30 Thread Uros Bizjak
On Fri, Sep 29, 2017 at 11:05 PM, Jakub Jelinek  wrote:
> Hi!
>
> The following patch adds patterns and splitters for {,u}divmodsi4 followed
> by zero-extension.  Like other instructions with 32-bit operands, divl and
> idivl zero-extend both results to 64 bits, so there is no need to extend
> them again.  The REE pass ignores instructions that have more than one SET,
> but the combiner, at least, does not.  The patch adds both
> patterns/splitters that zero-extend the quotient and patterns/splitters
> that zero-extend the modulo (in that case the combiner wants the modulo to
> be the first operation).  I have a patch, which I'll attach to the PR, that
> also has patterns for both results zero-extended, but as neither the
> combiner nor anything else is able to match them right now, I'm not
> including it here.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2017-09-29  Jakub Jelinek  
>
> PR target/82361
> * config/i386/i386.md
> (TARGET_USE_8BIT_IDIV zext divmodsi4 splitter): New define_split.
> (divmodsi4_zext_1, divmodsi4_zext_2, *divmodsi4_zext_1,
> *divmodsi4_zext_2): New define_insn_and_split.
> (*divmodsi4_noext_zext_1, *divmodsi4_noext_zext_2): New define_insn.
> (TARGET_USE_8BIT_IDIV zext udivmodsi4 splitter): New define_split.
> (udivmodsi4_zext_1, udivmodsi4_zext_2, *udivmodsi4_zext_1,
> *udivmodsi4_zext_2, *udivmodsi4_pow2_zext_1, *udivmodsi4_pow2_zext_2):
> New define_insn_and_split.
> (*udivmodsi4_noext_zext_1, *udivmodsi4_noext_zext_2): New define_insn.
> * config/i386/i386.c (ix86_split_idivmod): Handle operands[0] or
> operands[1] having DImode when mode is SImode.
>
> * gcc.target/i386/pr82361-1.c: New test.
> * gcc.target/i386/pr82361-2.c: New test.

OK, although this is quite some work for a relatively small gain. The
reason that zext for divisions was not implemented was that a zext was
relatively cheap compared to an idiv insn, so it was not a pressing
issue.

Thanks,
Uros.

[PATCH] Improve x86-64 32-bit div/mod followed by zero-extension to 64-bit (PR target/82361)

2017-09-29 Thread Jakub Jelinek
Hi!

The following patch adds patterns and splitters for {,u}divmodsi4 followed
by zero-extension.  Like other instructions with 32-bit operands, divl and
idivl zero-extend both results to 64 bits, so there is no need to extend
them again.  The REE pass ignores instructions that have more than one SET,
but the combiner, at least, does not.  The patch adds both
patterns/splitters that zero-extend the quotient and patterns/splitters
that zero-extend the modulo (in that case the combiner wants the modulo to
be the first operation).  I have a patch, which I'll attach to the PR, that
also has patterns for both results zero-extended, but as neither the
combiner nor anything else is able to match them right now, I'm not
including it here.
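
As a rough illustration, the C below shows the kind of source these patterns target. The function names are hypothetical and are not taken from pr82361-1.c or pr82361-2.c. The sketch relies on the x86-64 rule that writing a 32-bit general-purpose register clears bits 32-63 of the full register; divl and idivl write %eax and %edx, so the widening casts should need no separate extension once the new patterns match. Whether a given GCC release actually drops the extra instruction is best checked with -O2 -DNDEBUG -S rather than assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical examples (not the pr82361 tests).  Casting the 32-bit
   result to uint32_t before widening should make each return value a
   zero_extend:DI of a div:SI or mod:SI (udiv/umod for the unsigned ones).  */

uint64_t quot_zext (int32_t a, int32_t b)
{
  /* idivl faults on b == 0 and on INT32_MIN / -1; both are undefined
     behavior in C as well.  */
  assert (b != 0 && !(a == INT32_MIN && b == -1));
  return (uint32_t) (a / b);   /* signed quotient, zero-extended */
}

uint64_t rem_zext (int32_t a, int32_t b)
{
  assert (b != 0 && !(a == INT32_MIN && b == -1));
  return (uint32_t) (a % b);   /* signed remainder, zero-extended */
}

uint64_t uquot_zext (uint32_t a, uint32_t b)
{
  assert (b != 0);
  return a / b;                /* divl quotient, implicitly zero-extended */
}

uint64_t urem_zext (uint32_t a, uint32_t b)
{
  assert (b != 0);
  return a % b;                /* divl remainder, implicitly zero-extended */
}
```

For instance, quot_zext (-7, 2) returns 0xfffffffd: C truncates -7 / 2 toward zero to -3, and zero-extending its 32-bit pattern leaves the upper half clear.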

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-09-29  Jakub Jelinek  

PR target/82361
* config/i386/i386.md
(TARGET_USE_8BIT_IDIV zext divmodsi4 splitter): New define_split.
(divmodsi4_zext_1, divmodsi4_zext_2, *divmodsi4_zext_1,
*divmodsi4_zext_2): New define_insn_and_split.
(*divmodsi4_noext_zext_1, *divmodsi4_noext_zext_2): New define_insn.
(TARGET_USE_8BIT_IDIV zext udivmodsi4 splitter): New define_split.
(udivmodsi4_zext_1, udivmodsi4_zext_2, *udivmodsi4_zext_1,
*udivmodsi4_zext_2, *udivmodsi4_pow2_zext_1, *udivmodsi4_pow2_zext_2):
New define_insn_and_split.
(*udivmodsi4_noext_zext_1, *udivmodsi4_noext_zext_2): New define_insn.
* config/i386/i386.c (ix86_split_idivmod): Handle operands[0] or
operands[1] having DImode when mode is SImode.

* gcc.target/i386/pr82361-1.c: New test.
* gcc.target/i386/pr82361-2.c: New test.
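
The TARGET_USE_8BIT_IDIV splitters hand these zero-extended forms to ix86_split_idivmod, which per the ChangeLog now accepts DImode operands[0] or operands[1]. The following C is only a behavioral model, assuming the splitter emits an OR-and-mask range test that selects an unsigned 8-bit divide when both operands lie in [0, 255] and otherwise falls back to the 32-bit divide. The struct and function names are hypothetical; this is not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model (not GCC code) of the run-time dispatch that the
   TARGET_USE_8BIT_IDIV splitters are assumed to request from
   ix86_split_idivmod for a zero-extended SImode divmod.  */
typedef struct
{
  uint64_t quot;   /* models a DImode operands[0] */
  uint64_t rem;    /* models a DImode operands[1] */
} divmod_zext;

divmod_zext divmodsi4_zext_model (int32_t a, int32_t b)
{
  divmod_zext r;

  /* Both hardware paths fault on these inputs.  */
  assert (b != 0 && !(a == INT32_MIN && b == -1));

  /* (a | b) has no bits set above bit 7 only when both a and b lie in
     [0, 255]; a negative operand sets bit 31 and fails the test.  */
  if ((((uint32_t) a | (uint32_t) b) & ~0xffu) == 0)
    {
      /* Fast path: an unsigned 8-bit divide (divb) of %ax, which leaves
         the quotient in %al and the remainder in %ah.  */
      r.quot = (uint8_t) ((uint8_t) a / (uint8_t) b);
      r.rem = (uint8_t) ((uint8_t) a % (uint8_t) b);
    }
  else
    {
      /* Slow path: the full 32-bit idivl.  Writing %eax and %edx clears
         bits 32-63, so the 64-bit results are already zero-extended.  */
      r.quot = (uint32_t) (a / b);
      r.rem = (uint32_t) (a % b);
    }
  return r;
}
```

For operands in [0, 255] both paths give identical results, so the run-time test changes only speed, not semantics.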

--- gcc/config/i386/i386.md.jj  2017-09-29 09:19:42.0 +0200
+++ gcc/config/i386/i386.md 2017-09-29 19:19:34.795293575 +0200
@@ -7635,6 +7635,36 @@ (define_split
   [(const_int 0)]
   "ix86_split_idivmod (mode, operands, true); DONE;")
 
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+        (zero_extend:DI
+          (div:SI (match_operand:SI 2 "register_operand")
+                  (match_operand:SI 3 "nonimmediate_operand"))))
+   (set (match_operand:SI 1 "register_operand")
+        (mod:SI (match_dup 2) (match_dup 3)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_USE_8BIT_IDIV
+   && TARGET_QIMODE_MATH
+   && can_create_pseudo_p ()
+   && !optimize_insn_for_size_p ()"
+  [(const_int 0)]
+  "ix86_split_idivmod (SImode, operands, true); DONE;")
+
+(define_split
+  [(set (match_operand:DI 1 "register_operand")
+        (zero_extend:DI
+          (mod:SI (match_operand:SI 2 "register_operand")
+                  (match_operand:SI 3 "nonimmediate_operand"))))
+   (set (match_operand:SI 0 "register_operand")
+        (div:SI  (match_dup 2) (match_dup 3)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_USE_8BIT_IDIV
+   && TARGET_QIMODE_MATH
+   && can_create_pseudo_p ()
+   && !optimize_insn_for_size_p ()"
+  [(const_int 0)]
+  "ix86_split_idivmod (SImode, operands, true); DONE;")
+
 (define_insn_and_split "divmod4_1"
   [(set (match_operand:SWI48 0 "register_operand" "=a")
         (div:SWI48 (match_operand:SWI48 2 "register_operand" "0")
@@ -7670,6 +7700,79 @@ (define_insn_and_split "divmod4_1"
   [(set_attr "type" "multi")
    (set_attr "mode" "")])
 
+(define_insn_and_split "divmodsi4_zext_1"
+  [(set (match_operand:DI 0 "register_operand" "=a")
+        (zero_extend:DI
+          (div:SI (match_operand:SI 2 "register_operand" "0")
+                  (match_operand:SI 3 "nonimmediate_operand" "rm"))))
+   (set (match_operand:SI 1 "register_operand" "=")
+        (mod:SI (match_dup 2) (match_dup 3)))
+   (unspec [(const_int 0)] UNSPEC_DIV_ALREADY_SPLIT)
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_64BIT"
+  "#"
+  "reload_completed"
+  [(parallel [(set (match_dup 1)
+                   (ashiftrt:SI (match_dup 4) (match_dup 5)))
+              (clobber (reg:CC FLAGS_REG))])
+   (parallel [(set (match_dup 0)
+                   (zero_extend:DI (div:SI (match_dup 2) (match_dup 3))))
+              (set (match_dup 1)
+                   (mod:SI (match_dup 2) (match_dup 3)))
+              (use (match_dup 1))
+              (clobber (reg:CC FLAGS_REG))])]
+{
+  operands[5] = GEN_INT (GET_MODE_BITSIZE (SImode)-1);
+
+  if (optimize_function_for_size_p (cfun) || TARGET_USE_CLTD)
+    operands[4] = operands[2];
+  else
+    {
+      /* Avoid use of cltd in favor of a mov+shift.  */
+      emit_move_insn (operands[1], operands[2]);
+      operands[4] = operands[1];
+    }
+}
+  [(set_attr "type" "multi")
+   (set_attr "mode" "SI")])
+
+(define_insn_and_split "divmodsi4_zext_2"
+  [(set (match_operand:DI 1 "register_operand" "=")
+        (zero_extend:DI
+          (mod:SI (match_operand:SI 2 "register_operand" "0")
+                  (match_operand:SI 3 "nonimmediate_operand" "rm"))))
+   (set (match_operand:SI 0 "register_operand" "=a")
+        (div:SI (match_dup 2) (match_dup 3)))
+   (unspec