Hi Roger!
On 2022-02-03T21:00:50+0000, "Roger Sayle" <[email protected]> wrote:
> This patch
Thanks!
> addresses the "increased register pressure" regression on
> nvptx-none caused by my change to transition the backend to a
> STORE_FLAG_VALUE = 1 target.
Yes, "addresses", but unfortunately doesn't "resolve". ;-|
> This improved code generation for the
> more common case of producing 0/1 Boolean values, but unfortunately
> made things marginally worse when a 0/-1 mask value is desired.
> Unfortunately, nvptx kernels are extremely sensitive to changes in
> register usage, which was observable in the reported PR.
>
> This patch provides optimizations for -(cond ? 1 : 0), effectively
> simplify this into cond ? -1 : 0, where these ternary operators are
> provided by nvptx's selp instruction, and for the specific case of
> SImode, using (restoring) nvptx's "set" instruction (which avoids
> the need for a predicate register).
I'm confirming the improved code generation (less registers used, less
instructions emitted) in cases where it triggers -- but unfortunately it
doesn't in the PR104345 'libgomp.oacc-c-c++-common/reduction-cplx-dbl.c'
scenario.
> This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
> with a "make" and "make -k check" with no new failures. Unfortunately,
> the exact register usage of a nvptx kernel depends upon the version of
> the Cuda drivers being used (and the hardware), but I believe this
> change should resolve the PR (for Thomas) by improving code generation
> for the cases that regressed. Ok for mainline?
So, testing your patch in isolation, it does *not* resolve PR104345,
unfortunately. I'll next test in combination with your other pending
patches:
- "nvptx: Expand QI mode operations using SI mode instructions".
- "nvptx: Fix and use BI mode logic instructions (e.g. and.pred)"
Grüße
Thomas
> gcc/ChangeLog
> PR target/104345
> * config/nvptx/nvptx.md (sel_true<mode>): Fix indentation.
> (sel_false<mode>): Likewise.
> (define_code_iterator eqne): New code iterator for EQ and NE.
> (*selp<mode>_neg_<code>): New define_insn_and_split to optimize
> the negation of a selp instruction.
> (*selp<mode>_not_<code>): New define_insn_and_split to optimize
> the bitwise not of a selp instruction.
> (*setcc_int<mode>): Use set instruction for neg:SI of a selp.
>
> gcc/testsuite/ChangeLog
> PR target/104345
> * gcc.target/nvptx/neg-selp.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>
> diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
> index 92768dd..651ba20 100644
> --- a/gcc/config/nvptx/nvptx.md
> +++ b/gcc/config/nvptx/nvptx.md
> @@ -892,7 +892,7 @@
>
> (define_insn "sel_true<mode>"
> [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
> - (if_then_else:HSDIM
> + (if_then_else:HSDIM
> (ne (match_operand:BI 1 "nvptx_register_operand" "R") (const_int 0))
> (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")
> (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")))]
> @@ -901,7 +901,7 @@
>
> (define_insn "sel_true<mode>"
> [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
> - (if_then_else:SDFM
> + (if_then_else:SDFM
> (ne (match_operand:BI 1 "nvptx_register_operand" "R") (const_int 0))
> (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")
> (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")))]
> @@ -910,7 +910,7 @@
>
> (define_insn "sel_false<mode>"
> [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
> - (if_then_else:HSDIM
> + (if_then_else:HSDIM
> (eq (match_operand:BI 1 "nvptx_register_operand" "R") (const_int 0))
> (match_operand:HSDIM 2 "nvptx_nonmemory_operand" "Ri")
> (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")))]
> @@ -919,13 +919,63 @@
>
> (define_insn "sel_false<mode>"
> [(set (match_operand:SDFM 0 "nvptx_register_operand" "=R")
> - (if_then_else:SDFM
> + (if_then_else:SDFM
> (eq (match_operand:BI 1 "nvptx_register_operand" "R") (const_int 0))
> (match_operand:SDFM 2 "nvptx_nonmemory_operand" "RF")
> (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")))]
> ""
> "%.\\tselp%t0\\t%0, %3, %2, %1;")
>
> +(define_code_iterator eqne [eq ne])
> +
> +;; Split negation of a predicate into a conditional move.
> +(define_insn_and_split "*selp<mode>_neg_<code>"
> + [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
> + (neg:HSDIM (eqne:HSDIM
> + (match_operand:BI 1 "nvptx_register_operand" "R")
> + (const_int 0))))]
> + ""
> + "#"
> + "&& 1"
> + [(set (match_dup 0)
> + (if_then_else:HSDIM
> + (eqne (match_dup 1) (const_int 0))
> + (const_int -1)
> + (const_int 0)))])
> +
> +;; Split bitwise not of a predicate into a conditional move.
> +(define_insn_and_split "*selp<mode>_not_<code>"
> + [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
> + (not:HSDIM (eqne:HSDIM
> + (match_operand:BI 1 "nvptx_register_operand" "R")
> + (const_int 0))))]
> + ""
> + "#"
> + "&& 1"
> + [(set (match_dup 0)
> + (if_then_else:HSDIM
> + (eqne (match_dup 1) (const_int 0))
> + (const_int -2)
> + (const_int -1)))])
> +
> +(define_insn "*setcc_int<mode>"
> + [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
> + (neg:SI
> + (match_operator:SI 1 "nvptx_comparison_operator"
> + [(match_operand:HSDIM 2 "nvptx_register_operand" "R")
> + (match_operand:HSDIM 3 "nvptx_nonmemory_operand" "Ri")])))]
> + ""
> + "%.\\tset%t0%c1\\t%0, %2, %3;")
> +
> +(define_insn "*setcc_int<mode>"
> + [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
> + (neg:SI
> + (match_operator:SI 1 "nvptx_float_comparison_operator"
> + [(match_operand:SDFM 2 "nvptx_register_operand" "R")
> + (match_operand:SDFM 3 "nvptx_nonmemory_operand" "RF")])))]
> + ""
> + "%.\\tset%t0%c1\\t%0, %2, %3;")
> +
> (define_insn "setcc_float<mode>"
> [(set (match_operand:SF 0 "nvptx_register_operand" "=R")
> (match_operator:SF 1 "nvptx_comparison_operator"
> diff --git a/gcc/testsuite/gcc.target/nvptx/neg-selp.c
> b/gcc/testsuite/gcc.target/nvptx/neg-selp.c
> new file mode 100644
> index 0000000..a8f0118
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/nvptx/neg-selp.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int neg(int x, int y)
> +{
> + int t = (x == y) ? 1 : 0;
> + return -t;
> +}
> +
> +int not(int x, int y)
> +{
> + int t = (x == y) ? 1 : 0;
> + return ~t;
> +}
> +
> +/* { dg-final { scan-assembler-not "neg.s32" } } */
> +/* { dg-final { scan-assembler-not "not.b32" } } */
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht
München, HRB 106955