Tiny cleanup patch - as fallout of trying to understand MI300 and
its fails better
(A) Replace 's_nop 0x0; s_nop 0x0; s_nop 0x0' by 's_nop 0x2'.
Advantage: fewer instructions - this helps on the hardware side by
permitting more follow-up instructions in the instruction cache and
it reduces the file size a bit. The latter is nice by itself but as
to many instructions will cause issues with debugging, it avoids some
assembler errors - admittedly only in near-edge cases, but still.
The latter is issue is https://gcc.gnu.org/PR119367 which is about
location views that are computed as, e.g., '.2byte .LM6-.LM5 and if
this overflows, there is an assembler error.
[I hit this issue during debugging when adding at least two 's_nop'
between all instructions - doing for at least two or (at least) five
will trigger this for libgomp's complex matmul, only. Workaround:
compile those files without '-g'.]
(B) Extend two comments:
Add a reminder (comment) that 'sc0' (scope 0) has a special
meaning for atomics (same as 'glc' in MI200). This is to
prevent messing around with TARGET_GLC_NAME by, e.g., replacing
'sc0' by 'sc1' - which has unintended effects for atomics ...
For template '%' formatting: avoid accidental reuse of letters
'V' and 'R' and make it quicker to understand them when encountered.
Granted, as those appear as 'case' in the switch statement and
are documented there; still, it helps both with understanding a
'%R' more quickly and moves a compile-time fail for already
written code to a handling it correctly already during coding.
OK for mainline?
Tobias
PS: I checked that s_nop 0x0 to 0xf is documented also for old Vega
and for RDNA; additionally, I don't see bootstrap issues in terms
due to llvm-mc complaining about the assembly - neither for a standard
build nor for a gfx942 build.
PPS: I still have to send a fixed version of the second 's_nop' patch,
addressing the review comments.
gcn: Add 'nops' insn, extend comments
Use 's_nops' with a number instead of multiple of 's_nop' when
manually adding 1 to 5 wait state. This helps with
the instruction cache and helps a tiny bit with PR119367 where
a two-byte variable overflows in the debugging location view handling.
Add a comment about 'sc0' to TARGET_GLC_NAME as for atomics it is
unrelated to the scope but to whether the result is stored; i.e.
using e.g. 'sc1' instead of 'sc0' will have undesired consequences!
Update the comment above print_operand_address to document 'R' and 'V';
those are used below as "Temporary hack.", but it makes sense to see
them in the list.
gcc/ChangeLog:
* config/gcn/gcn-opts.h (enum hsaco_attr_type): Add comment
about 'sc0'.
* config/gcn/gcn.cc (gcn_md_reorg): Use gen_nops instead of gen_nop.
(print_operand_address): Document 'R' and 'V' in the
pre- comment as well.
* config/gcn/gcn.md (nops): Add.
gcc/config/gcn/gcn-opts.h | 2 ++
gcc/config/gcn/gcn.cc | 9 +++++++--
gcc/config/gcn/gcn.md | 9 +++++++++
3 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h
index fe68678bd02..0287400fd0c 100644
--- a/gcc/config/gcn/gcn-opts.h
+++ b/gcc/config/gcn/gcn-opts.h
@@ -92,6 +92,8 @@ enum hsaco_attr_type
/* Whether to use the 'globally coherent' (glc) or the 'scope' (sc0) flag
for non-scalar memory operations. The string starts on purpose with a space.
Note: for scalar memory operations (i.e. 's_...'), 'glc' is still used.
+ Note: on atomics, glc/sc0 denotes whether the pre-op operation should
+ be used.
CDNA3 also uses 'nt' instead of 'slc' and 'sc1' instead of 'scc'; however,
there is no non-scalar user so far. */
#define TARGET_GLC_NAME (TARGET_CDNA3 ? " sc0" : " glc")
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 8959118b869..75b0936dce8 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -6436,8 +6436,8 @@ gcn_md_reorg (void)
}
/* Insert the required number of NOPs. */
- for (int i = nops_rqd; i > 0; i--)
- emit_insn_after (gen_nop (), last_insn);
+ if (nops_rqd > 0)
+ emit_insn_after (gen_nops (GEN_INT (nops_rqd-1)), last_insn);
/* Age the previous instructions. We can also ignore writes to
registers subsequently overwritten. */
@@ -7283,6 +7283,11 @@ print_operand_address (FILE *file, rtx mem)
H - print second part of a multi-reg value (high-part of 2-reg value)
J - print third part of a multi-reg value
K - print fourth part of a multi-reg value
+ R Print a scalar register number as an integer. Temporary hack.
+ V - Print a vector register number as an integer. Temporary hack.
+
+ Additionally, the standard builtin c, n, a, and l exist; see gccint's
+ "Output Templates and Operand Substitution" for details.
*/
void
diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index fad42e6a6bf..9172db08b2e 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -424,6 +424,15 @@ (define_insn "nop"
"s_nop\t0x0"
[(set_attr "type" "sopp")])
+; Variant of 'nop' that accepts a count argument.
+; s_nop accepts 0x0 to 0xf for 1 to 16 nops; however,
+; as %0 prints decimals, only 0 to 9 (= 1 to 10 nops) can be used.
+(define_insn "nops"
+ [(match_operand 0 "const_int_operand")]
+ ""
+ "s_nop\t0x%0"
+ [(set_attr "type" "sopp")])
+
; FIXME: What should the value of the immediate be? Zero is disallowed, so
; pick 1 for now.
(define_insn "trap"