From: Claudiu Zissulescu <claudiu.zissulescu-iancule...@oracle.com>

The MEMTAG sanitizer, which is based on the HWASAN sanitizer, invokes
target-specific hooks to create a random tag, add a tag to a memory
address, and finally tag and untag memory.

Implement these target hooks to emit MTE instructions when the MEMTAG
sanitizer is in effect, and continue to use the default implementations
when HWASAN is in use.  The following target hooks are implemented:
   - TARGET_MEMTAG_INSERT_RANDOM_TAG
   - TARGET_MEMTAG_ADD_TAG
   - TARGET_MEMTAG_EXTRACT_TAG

Apart from the target-specific hooks, set the following hooks to the
values defined by the Memory Tagging Extension (MTE) on aarch64:
   - TARGET_MEMTAG_TAG_BITSIZE
   - TARGET_MEMTAG_GRANULE_SIZE
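
For reference, MTE keeps the 4-bit allocation tag in pointer bits
[59:56], and each tag covers one 16-byte granule of memory.  Extracting
a pointer's tag, for instance, is a single bitfield extract
(illustrative only, not part of this patch):

    ubfx    x1, x0, #56, #4    // x1 = allocation tag of pointer in x0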

The following instruction patterns are defined or updated:
   - addg/subg (used by the TARGET_MEMTAG_ADD_TAG and
     TARGET_MEMTAG_COMPOSE_OFFSET_TAG hooks)
   - stg/st2g Used to tag/untag one or two memory granules.
   - tag_memory A target-specific pattern that emits MTE instructions
     to tag/untag memory of a given size.
   - compose_tag A target-specific pattern that computes a tagged
     address as an offset from a base (tagged) address.
   - gmi Used to build the exclusion mask for random tag generation.
   - irg Used to insert a random tag into an address.
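
As a rough illustration (a hand-written sketch, not output from this
patch; the registers and offsets are invented), the sequence these
hooks combine to emit for one 16-byte protected stack object looks
roughly like:

    irg     x0, sp, xzr        // insert a random tag into the frame base
    addg    x1, x0, #16, #1    // object address, tag bumped by offset 1
    stg     x1, [x1]           // tag the object's 16-byte granule
    ...
    stg     x0, [x1]           // restore the frame tag on scope exit

The exact registers, offsets, and the choice of stg vs. st2g depend on
the frame layout and object sizes.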

gcc/

        * config/aarch64/aarch64.md (addg): Update pattern to use
        addg/subg instructions.
        (stg): Update pattern.
        (st2g): New pattern.
        (tag_memory): Likewise.
        (compose_tag): Likewise.
        (irg): Update pattern to accept the xzr register.
        (gmi): Likewise.
        (UNSPECV_TAG_SPACE): Define.
        * config/aarch64/aarch64.cc (AARCH64_MEMTAG_GRANULE_SIZE):
        Define.
        (AARCH64_MEMTAG_TAG_BITSIZE): Likewise.
        (AARCH64_TAG_MEMORY_LOOP_THRESHOLD): Likewise.
        (aarch64_override_options_internal): Error out if MTE instructions
        are not available.
        (aarch64_post_cfi_startproc): Emit .cfi_mte_tagged_frame.
        (aarch64_can_tag_addresses): Add MEMTAG specific handling.
        (aarch64_memtag_tag_bitsize): New function.
        (aarch64_memtag_granule_size): Likewise.
        (aarch64_memtag_insert_random_tag): Likewise.
        (aarch64_memtag_add_tag): Likewise.
        (aarch64_memtag_extract_tag): Likewise.
        (aarch64_granule16_memory_address_p): Likewise.
        (aarch64_emit_stxg_insn): Likewise.
        (aarch64_gen_tag_memory_postindex): Likewise.
        (aarch64_tag_memory_via_loop): Likewise.
        (aarch64_expand_tag_memory): Likewise.
        (TARGET_MEMTAG_TAG_BITSIZE): Define.
        (TARGET_MEMTAG_GRANULE_SIZE): Likewise.
        (TARGET_MEMTAG_INSERT_RANDOM_TAG): Likewise.
        (TARGET_MEMTAG_ADD_TAG): Likewise.
        (TARGET_MEMTAG_EXTRACT_TAG): Likewise.
        * config/aarch64/aarch64-builtins.cc
        (aarch64_expand_builtin_memtag): Update the set-tag builtin expansion.
        * config/aarch64/aarch64-linux.h (LINUX_TARGET_LINK_SPEC): Pass
        memtag-stack sanitizer specific options to the linker.
        * config/aarch64/aarch64-protos.h
        (aarch64_granule16_memory_address_p): New prototype.
        (aarch64_expand_tag_memory): Likewise.
        * config/aarch64/constraints.md (Umg): New memory constraint.
        (Uag): New constraint.
        (Ung): Likewise.
        * config/aarch64/predicates.md (aarch64_memtag_tag_offset):
        Refactor it.
        (aarch64_granule16_imm6): Rename from aarch64_granule16_uimm6 and
        refactor it.
        (aarch64_granule16_memory_operand): New predicate.
        * config/aarch64/iterators.md (MTE_PP): New code iterator used
        by the MTE instructions.
        (stg_ops): New code attribute.
        (st2g_ops): Likewise.
        (mte_name): Likewise.
        * doc/invoke.texi: Document -fsanitize=memtag-stack support.

gcc/testsuite/

        * gcc.target/aarch64/acle/memtag_1.c: Update test.

Co-authored-by: Indu Bhagat <indu.bha...@oracle.com>
Signed-off-by: Claudiu Zissulescu <claudiu.zissulescu-iancule...@oracle.com>

UPDATE: aarch64: Add support for memtag-stack sanitizer using MTE insns
---
 gcc/config/aarch64/aarch64-builtins.cc        |   7 +-
 gcc/config/aarch64/aarch64-linux.h            |   4 +-
 gcc/config/aarch64/aarch64-protos.h           |   3 +
 gcc/config/aarch64/aarch64.cc                 | 333 +++++++++++++++++-
 gcc/config/aarch64/aarch64.md                 | 127 +++++--
 gcc/config/aarch64/constraints.md             |  21 ++
 gcc/config/aarch64/iterators.md               |  20 ++
 gcc/config/aarch64/predicates.md              |  13 +-
 gcc/doc/invoke.texi                           |   6 +-
 .../gcc.target/aarch64/acle/memtag_1.c        |   4 +-
 10 files changed, 498 insertions(+), 40 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 408099a50e8..31431693cf2 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -3680,8 +3680,11 @@ aarch64_expand_builtin_memtag (int fcode, tree exp, rtx target)
        pat = GEN_FCN (icode) (target, op0, const0_rtx);
        break;
       case AARCH64_MEMTAG_BUILTIN_SET_TAG:
-       pat = GEN_FCN (icode) (op0, op0, const0_rtx);
-       break;
+       {
+         rtx mem = gen_rtx_MEM (TImode, op0);
+         pat = GEN_FCN (icode) (mem, op0);
+         break;
+       }
       default:
        gcc_unreachable();
     }
diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
index 116bb4e69f3..4fa78e0b2f5 100644
--- a/gcc/config/aarch64/aarch64-linux.h
+++ b/gcc/config/aarch64/aarch64-linux.h
@@ -48,7 +48,9 @@
    %{static-pie:-Bstatic -pie --no-dynamic-linker -z text} \
    -X                                          \
    %{mbig-endian:-EB} %{mlittle-endian:-EL}     \
-   -maarch64linux%{mabi=ilp32:32}%{mbig-endian:b}"
+   -maarch64linux%{mabi=ilp32:32}%{mbig-endian:b} \
+   %{%:sanitize(memtag-stack):%{!fsanitize-memtag-mode:-z memtag-stack -z memtag-mode=sync}} \
+   %{%:sanitize(memtag-stack):%{fsanitize-memtag-mode=*:-z memtag-stack -z memtag-mode=%*}}"
 
 
 #define LINK_SPEC LINUX_TARGET_LINK_SPEC AARCH64_ERRATA_LINK_SPEC
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 56efcf2c7f2..ab627792963 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1116,6 +1116,9 @@ void aarch64_expand_sve_vec_cmp_float (rtx, rtx_code, rtx, rtx);
 
 bool aarch64_prepare_sve_int_fma (rtx *, rtx_code);
 bool aarch64_prepare_sve_cond_int_fma (rtx *, rtx_code);
+
+bool aarch64_granule16_memory_address_p (rtx mem);
+void aarch64_expand_tag_memory (rtx, rtx, rtx);
 #endif /* RTX_CODE */
 
 bool aarch64_process_target_attr (tree);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index c5110566215..b29142e105b 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -19109,6 +19109,10 @@ aarch64_override_options_internal (struct gcc_options *opts)
 #endif
     }
 
+  if (flag_sanitize & SANITIZE_MEMTAG_STACK && !TARGET_MEMTAG)
+    error ("%<-fsanitize=memtag-stack%> requires the ISA extension %qs",
+          "memtag");
+
   aarch64_feature_flags isa_flags = aarch64_get_isa_flags (opts);
   if ((isa_flags & (AARCH64_FL_SM_ON | AARCH64_FL_ZA_ON))
       && !(isa_flags & AARCH64_FL_SME))
@@ -25718,6 +25722,8 @@ aarch64_asm_output_external (FILE *stream, tree decl, const char* name)
   aarch64_asm_output_variant_pcs (stream, decl, name);
 }
 
+bool aarch64_can_tag_addresses (void);
+
 /* Triggered after a .cfi_startproc directive is emitted into the assembly file.
    Used to output the .cfi_b_key_frame directive when signing the current
    function with the B key.  */
@@ -25728,6 +25734,10 @@ aarch64_post_cfi_startproc (FILE *f, tree ignored ATTRIBUTE_UNUSED)
   if (cfun->machine->frame.laid_out && aarch64_return_address_signing_enabled ()
       && aarch64_ra_sign_key == AARCH64_KEY_B)
        asm_fprintf (f, "\t.cfi_b_key_frame\n");
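+  /* Mark this function's frame as MTE-tagged so that consumers such as
+     the unwinder know its stack memory carries allocation tags.  */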
+  if (cfun->machine->frame.laid_out && aarch64_can_tag_addresses ()
+      && memtag_sanitize_p ()
+      && !known_eq (cfun->machine->frame.frame_size, 0))
+    asm_fprintf (f, "\t.cfi_mte_tagged_frame\n");
 }
 
 /* Implements TARGET_ASM_FILE_START.  Output the assembly header.  */
@@ -30404,15 +30414,321 @@ aarch64_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
   return NULL;
 }
 
+#define AARCH64_MEMTAG_GRANULE_SIZE  16
+#define AARCH64_MEMTAG_TAG_BITSIZE    4
+
 /* Implement TARGET_MEMTAG_CAN_TAG_ADDRESSES.  Here we tell the rest of the
    compiler that we automatically ignore the top byte of our pointers, which
-   allows using -fsanitize=hwaddress.  */
+   allows using -fsanitize=hwaddress.  In case of -fsanitize=memtag, we
+   additionally ensure that target supports MEMTAG insns.  */
 bool
 aarch64_can_tag_addresses ()
 {
+  if (memtag_sanitize_p ())
+    return !TARGET_ILP32 && TARGET_MEMTAG;
   return !TARGET_ILP32;
 }
 
+/* Implement TARGET_MEMTAG_TAG_BITSIZE.  */
+unsigned char
+aarch64_memtag_tag_bitsize ()
+{
+  if (memtag_sanitize_p ())
+    return AARCH64_MEMTAG_TAG_BITSIZE;
+  return default_memtag_tag_bitsize ();
+}
+
+/* Implement TARGET_MEMTAG_GRANULE_SIZE.  */
+unsigned char
+aarch64_memtag_granule_size ()
+{
+  if (memtag_sanitize_p ())
+    return AARCH64_MEMTAG_GRANULE_SIZE;
+  return default_memtag_granule_size ();
+}
+
+/* Implement TARGET_MEMTAG_INSERT_RANDOM_TAG.  Emit the gmi and irg
+   instructions when -fsanitize=memtag-stack is used.  The first argument
+   UNTAGGED may be a tagged pointer; its tag is added to the exclusion set,
+   so the returned pointer is guaranteed not to reuse the same tag.  */
+rtx
+aarch64_memtag_insert_random_tag (rtx untagged, rtx target)
+{
+  if (memtag_sanitize_p ())
+    {
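+      /* Build an exclusion mask from the tag of UNTAGGED (gmi), then
+         insert a random tag chosen outside that mask (irg).  */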
+      insn_code icode = CODE_FOR_gmi;
+      expand_operand ops_gmi[3];
+      rtx tmp = gen_reg_rtx (Pmode);
+      create_output_operand (&ops_gmi[0], tmp, Pmode);
+      create_input_operand  (&ops_gmi[1], untagged, Pmode);
+      create_integer_operand  (&ops_gmi[2], 0);
+      expand_insn (icode, 3, ops_gmi);
+
+      icode = CODE_FOR_irg;
+      expand_operand ops_irg[3];
+      create_output_operand (&ops_irg[0], target, Pmode);
+      create_input_operand  (&ops_irg[1], untagged, Pmode);
+      create_input_operand  (&ops_irg[2], ops_gmi[0].value, Pmode);
+      expand_insn (icode, 3, ops_irg);
+      return ops_irg[0].value;
+    }
+  else
+    return default_memtag_insert_random_tag (untagged, target);
+}
+
+/* Implement TARGET_MEMTAG_ADD_TAG.  For memtag sanitizer, emit addg/subg
+   instructions, otherwise fall back on the default implementation.  */
+rtx
+aarch64_memtag_add_tag (rtx base, poly_int64 offset, uint8_t tag_offset)
+{
+  if (memtag_sanitize_p ())
+    {
+      rtx target = NULL;
+      poly_int64 addr_offset = offset;
+      rtx offset_rtx = gen_int_mode (addr_offset, DImode);
+
+      if (!aarch64_granule16_imm6 (offset_rtx, DImode))
+       {
+         /* Emit addr arithmetic prior to addg/subg.  */
+         base = expand_simple_binop (Pmode, PLUS, base, offset_rtx,
+                                     NULL, true, OPTAB_LIB_WIDEN);
+         addr_offset = 0;
+       }
+
+      insn_code icode = CODE_FOR_addg;
+      expand_operand ops[4];
+      create_output_operand (&ops[0], target, DImode);
+      create_input_operand (&ops[1], base, DImode);
+      create_integer_operand (&ops[2], addr_offset);
+      create_integer_operand (&ops[3], tag_offset);
+      /* Addr offset and tag offset must be within bounds at this time.  */
+      gcc_assert (aarch64_memtag_tag_offset (ops[3].value, DImode));
+
+      expand_insn (icode, 4, ops);
+      return ops[0].value;
+    }
+  else
+    return default_memtag_add_tag (base, offset, tag_offset);
+}
+
+/* Implement TARGET_MEMTAG_EXTRACT_TAG.  With the memtag sanitizer, the MTE
+   instructions let us work with the tag-address tuple directly, so there is
+   no need to extract the tag; emit a simple move.  */
+rtx
+aarch64_memtag_extract_tag (rtx tagged_pointer, rtx target)
+{
+  if (memtag_sanitize_p ())
+    {
+      rtx ret = gen_reg_rtx (DImode);
+      emit_move_insn (ret, gen_lowpart (DImode, tagged_pointer));
+      return ret;
+    }
+  else
+    return default_memtag_extract_tag (tagged_pointer, target);
+}
+
+/* Return TRUE if X is a memory address that is valid for the MTE tag
+   load/store instructions.  */
+bool
+aarch64_granule16_memory_address_p (rtx x)
+{
+  struct aarch64_address_info addr;
+
+  if (!MEM_P (x)
+      || !aarch64_classify_address (&addr, XEXP (x, 0), GET_MODE (x), false))
+    return false;
+
+  /* Check that the offset, if any, is encodable as 9-bit immediate.  */
+  switch (addr.type)
+    {
+    case ADDRESS_REG_IMM:
+      return aarch64_granule16_simm9 (gen_int_mode (addr.const_offset, DImode),
+                                     DImode);
+
+    case ADDRESS_REG_REG:
+      return addr.shift == 0;
+
+    default:
+      break;
+    }
+  return false;
+}
+
+/* Helper to emit either stg or st2g instruction.  */
+static void
+aarch64_emit_stxg_insn (machine_mode mode, rtx nxt, rtx addr, rtx tagp)
+{
+  rtx mem_addr = gen_rtx_MEM (mode, nxt);
+  rtvec vec = gen_rtvec (2, gen_rtx_MEM (mode, addr), tagp);
+  rtx unspec = gen_rtx_UNSPEC_VOLATILE (mode, vec, UNSPECV_TAG_SPACE);
+  emit_set_insn (mem_addr, unspec);
+}
+
+/* Generate a post-index stg or st2g depending on whether OFFSET covers one
+   or two granules respectively.  */
+static void
+aarch64_gen_tag_memory_postindex (rtx addr, rtx tagged_pointer, int offset)
+{
+  machine_mode stgmode = TImode;
+
+  if (abs (offset) == (AARCH64_MEMTAG_GRANULE_SIZE * 2))
+    stgmode = OImode;
+  gcc_assert (abs (offset) == GET_MODE_SIZE (stgmode).to_constant ());
+
+  rtx next;
+  if (offset < 0)
+    next = gen_rtx_POST_DEC (Pmode, addr);
+  else
+    next = gen_rtx_POST_INC (Pmode, addr);
+
+  aarch64_emit_stxg_insn (stgmode, next, addr, tagged_pointer);
+}
+
+/* Tag the memory via an explicit loop.  This is used when tag_memory expand
+   is invoked for:
+     - non-constant size, or
+     - constant but not encodable size (!aarch64_granule16_simm9 ()), or
+     - constant and encodable size (aarch64_granule16_simm9 ()), but over the
+       unroll threshold (AARCH64_TAG_MEMORY_LOOP_THRESHOLD).  */
+static void
+aarch64_tag_memory_via_loop (rtx base, rtx size, rtx tagged_pointer)
+{
+  rtx_code_label *top_label, *bottom_label;
+  machine_mode iter_mode;
+  unsigned HOST_WIDE_INT len;
+  unsigned HOST_WIDE_INT granule_size;
+  unsigned HOST_WIDE_INT iters;
+  rtx iter_limit = NULL_RTX;
+  granule_size = (HOST_WIDE_INT) AARCH64_MEMTAG_GRANULE_SIZE;
+  unsigned int factor = 1;
+
+  iter_mode = GET_MODE (size);
+  if (iter_mode == VOIDmode)
+    iter_mode = word_mode;
+
+  if (CONST_INT_P (size))
+    {
+      len = INTVAL (size);
+      /* The amount of memory to tag must be aligned to granule size by now.  */
+      gcc_assert (abs_hwi (len) % granule_size == 0);
+      iters = abs_hwi (len) / granule_size;
+      /* Using st2g is always a faster way to tag/untag memory when compared
+        to stg.  */
+      if (iters % 2 == 0)
+       factor = 2;
+      iter_limit = GEN_INT (abs_hwi (len));
+    }
+  else
+    iter_limit = size;
+
+  rtx x_addr = base;
+
+  /* Generate the following loop (stg example):
+        mov     x8, #size
+        cbz     x8, .L3
+     .L2:
+        stg     x3, [x3], #16
+        subs    x8, x8, #16
+        b.ne    .L2
+     .L3:
+      */
+  int offset = granule_size * factor;
+  rtx iter_incr = GEN_INT (offset);
+  /* Emit ITER.  */
+  rtx iter = gen_reg_rtx (iter_mode);
+  emit_move_insn (iter, iter_limit);
+
+  /* Check if size is zero.  */
+  bottom_label = gen_label_rtx ();
+  if (!CONST_INT_P (size))
+    {
+      rtx branch = aarch64_gen_compare_zero_and_branch (EQ, iter, bottom_label);
+      aarch64_emit_unlikely_jump (branch);
+    }
+
+  /* Prepare the addr operand for tagging memory.  */
+  rtx addr_reg = gen_reg_rtx (Pmode);
+  emit_move_insn (addr_reg, x_addr);
+
+  top_label = gen_label_rtx ();
+  /* Emit top label.  */
+  emit_label (top_label);
+
+  /* Tag Memory using post-index stg/st2g.  */
+  aarch64_gen_tag_memory_postindex (addr_reg, tagged_pointer, offset);
+
+  /* Decrement ITER by ITER_INCR.  */
+  emit_insn (gen_subdi3_compare1_imm (iter, iter, iter_incr,
+                                   GEN_INT (-UINTVAL (iter_incr))));
+
+  rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+  rtx x = gen_rtx_fmt_ee (NE, CCmode, cc_reg, const0_rtx);
+  auto jump = emit_jump_insn (gen_aarch64_bcond (x, cc_reg, top_label));
+  JUMP_LABEL (jump) = top_label;
+
+  /* Emit bottom label.  */
+  if (!CONST_INT_P (size))
+    emit_label (bottom_label);
+}
+
+/* Threshold in number of granules beyond which an explicit loop for
+   tagging a memory block is emitted.  */
+#define AARCH64_TAG_MEMORY_LOOP_THRESHOLD 10
+
+/* Implement expand for tag_memory.  */
+void
+aarch64_expand_tag_memory (rtx base, rtx tagged_pointer, rtx size)
+{
+  rtx addr;
+  HOST_WIDE_INT len, offset;
+  unsigned HOST_WIDE_INT granule_size;
+  unsigned HOST_WIDE_INT iters = 0;
+
+  granule_size = (HOST_WIDE_INT) AARCH64_MEMTAG_GRANULE_SIZE;
+
+  if (!REG_P (tagged_pointer))
+    tagged_pointer = force_reg (Pmode, tagged_pointer);
+
+  /* If the size is small enough, we can unroll the loop using stg/st2g
+     instructions.  */
+  if (CONST_INT_P (size))
+    {
+      len = INTVAL (size);
+      /* The amount of memory to tag must be aligned to granule size by now.  */
+      gcc_assert (abs_hwi (len) % granule_size == 0);
+
+      iters = abs_hwi (len) / granule_size;
+    }
+
+  /* Check predicate on max offset possible: offset (in base rtx) + size.  */
+  rtx end_addr = simplify_gen_binary (PLUS, Pmode, base, size);
+  end_addr = gen_rtx_MEM (TImode, end_addr);
+  if (iters > 0
+      && iters <= AARCH64_TAG_MEMORY_LOOP_THRESHOLD
+      && aarch64_granule16_memory_address_p (end_addr))
+    {
+      offset = 0;
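+      /* Unrolled tagging: use st2g (two granules at a time) while at
+         least two granules remain, and stg for a final odd granule.  */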
+      while (iters)
+       {
+         machine_mode mode = TImode;
+         if (iters / 2)
+           {
+             mode = OImode;
+             iters--;
+           }
+         iters--;
+         addr = plus_constant (Pmode, base, offset);
+         offset +=  GET_MODE_SIZE (mode).to_constant ();
+         aarch64_emit_stxg_insn (mode, addr, addr, tagged_pointer);
+       }
+    }
+  else
+    aarch64_tag_memory_via_loop (base, size, tagged_pointer);
+}
+
 /* Implement TARGET_ASM_FILE_END for AArch64.  This adds the AArch64 GNU NOTE
    section at the end if needed.  */
 void
@@ -32789,6 +33105,21 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_MEMTAG_CAN_TAG_ADDRESSES
 #define TARGET_MEMTAG_CAN_TAG_ADDRESSES aarch64_can_tag_addresses
 
+#undef TARGET_MEMTAG_TAG_BITSIZE
+#define TARGET_MEMTAG_TAG_BITSIZE aarch64_memtag_tag_bitsize
+
+#undef TARGET_MEMTAG_GRANULE_SIZE
+#define TARGET_MEMTAG_GRANULE_SIZE aarch64_memtag_granule_size
+
+#undef TARGET_MEMTAG_INSERT_RANDOM_TAG
+#define TARGET_MEMTAG_INSERT_RANDOM_TAG aarch64_memtag_insert_random_tag
+
+#undef TARGET_MEMTAG_ADD_TAG
+#define TARGET_MEMTAG_ADD_TAG aarch64_memtag_add_tag
+
+#undef TARGET_MEMTAG_EXTRACT_TAG
+#define TARGET_MEMTAG_EXTRACT_TAG aarch64_memtag_extract_tag
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index fedbd4026a0..1769f9563c8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -412,6 +412,7 @@ (define_c_enum "unspecv" [
     UNSPECV_GCSPOPM            ; Represent GCSPOPM.
     UNSPECV_GCSSS1             ; Represent GCSSS1 Xt.
     UNSPECV_GCSSS2             ; Represent GCSSS2 Xt.
+    UNSPECV_TAG_SPACE          ; Represent MTE tag memory space.
     UNSPECV_TSTART             ; Represent transaction start.
     UNSPECV_TCOMMIT            ; Represent transaction commit.
     UNSPECV_TCANCEL            ; Represent transaction cancel.
@@ -8537,46 +8538,48 @@ (define_insn "aarch64_rndrrs"
 ;; Memory Tagging Extension (MTE) instructions.
 
 (define_insn "irg"
-  [(set (match_operand:DI 0 "register_operand" "=rk")
+  [(set (match_operand:DI 0 "register_operand")
        (ior:DI
-        (and:DI (match_operand:DI 1 "register_operand" "rk")
+        (and:DI (match_operand:DI 1 "register_operand")
                 (const_int MEMTAG_TAG_MASK))
-        (ashift:DI (unspec:QI [(match_operand:DI 2 "register_operand" "r")]
+        (ashift:DI (unspec:QI [(match_operand:DI 2 "aarch64_reg_or_zero")]
                     UNSPEC_GEN_TAG_RND)
                    (const_int 56))))]
   "TARGET_MEMTAG"
-  "irg\\t%0, %1, %2"
-  [(set_attr "type" "memtag")]
+  {@ [ cons: =0, 1, 2 ; attrs: type ]
+     [ rk      , rk, r  ; memtag ] irg\\t%0, %1, %2
+     [ rk      , rk, Z  ; memtag ] irg\\t%0, %1
+  }
 )
 
 (define_insn "gmi"
   [(set (match_operand:DI 0 "register_operand" "=r")
-       (ior:DI (ashift:DI
-                (const_int 1)
-                (and:QI (lshiftrt:DI
-                         (match_operand:DI 1 "register_operand" "rk")
-                         (const_int 56)) (const_int 15)))
-               (match_operand:DI 2 "register_operand" "r")))]
+       (ior:DI
+        (unspec:DI [(match_operand:DI 1 "register_operand" "rk")
+                    (const_int 0)]
+                   UNSPEC_GEN_TAG)
+        (match_operand:DI 2 "aarch64_reg_or_zero" "rZ")))]
   "TARGET_MEMTAG"
-  "gmi\\t%0, %1, %2"
+  "gmi\\t%0, %1, %x2"
   [(set_attr "type" "memtag")]
 )
 
 (define_insn "addg"
-  [(set (match_operand:DI 0 "register_operand" "=rk")
+  [(set (match_operand:DI 0 "register_operand")
        (ior:DI
-        (and:DI (plus:DI (match_operand:DI 1 "register_operand" "rk")
-                         (match_operand:DI 2 "aarch64_granule16_uimm6" "i"))
-                (const_int -1080863910568919041)) ;; 0xf0ff...
+        (and:DI (plus:DI (match_operand:DI 1 "register_operand")
+                         (match_operand:DI 2 "aarch64_granule16_imm6"))
+                (const_int MEMTAG_TAG_MASK))
         (ashift:DI
-         (unspec:QI
-          [(and:QI (lshiftrt:DI (match_dup 1) (const_int 56)) (const_int 15))
-           (match_operand:QI 3 "aarch64_memtag_tag_offset" "i")]
-          UNSPEC_GEN_TAG)
+           (unspec:DI [(match_dup 1)
+                       (match_operand:QI 3 "aarch64_memtag_tag_offset")]
+                       UNSPEC_GEN_TAG)
          (const_int 56))))]
   "TARGET_MEMTAG"
-  "addg\\t%0, %1, #%2, #%3"
-  [(set_attr "type" "memtag")]
+  {@ [ cons: =0 , 1  , 2 , 3 ; attrs: type ]
+     [ rk       , rk , Uag ,  ; memtag   ] addg\t%0, %1, #%2, #%3
+     [ rk       , rk , Ung ,  ; memtag   ] subg\t%0, %1, #%n2, #%3
+  }
 )
 
 (define_insn "subp"
@@ -8610,17 +8613,83 @@ (define_insn "ldg"
 ;; STG doesn't align the address but aborts with alignment fault
 ;; when the address is not 16-byte aligned.
 (define_insn "stg"
-  [(set (mem:QI (unspec:DI
-        [(plus:DI (match_operand:DI 1 "register_operand" "rk")
-                  (match_operand:DI 2 "aarch64_granule16_simm9" "i"))]
-        UNSPEC_TAG_SPACE))
-       (and:QI (lshiftrt:DI (match_operand:DI 0 "register_operand" "rk")
-                            (const_int 56)) (const_int 15)))]
+  [(set (match_operand:TI 0 "aarch64_granule16_memory_operand" "+Umg")
+      (unspec_volatile:TI
+       [(match_dup 0)
+        (match_operand:DI 1 "register_operand" "rk")]
+       UNSPECV_TAG_SPACE))]
+  "TARGET_MEMTAG"
+  "stg\\t%1, %0"
+  [(set_attr "type" "memtag")]
+)
+
+(define_insn "stg_<mte_name>"
+  [(set (mem:TI (MTE_PP:DI (match_operand:DI 0 "register_operand" "+r")))
+       (unspec_volatile:TI
+        [(mem:TI (match_dup 0))
+         (match_operand:DI 1 "register_operand" "rk")]
+        UNSPECV_TAG_SPACE))]
   "TARGET_MEMTAG"
-  "stg\\t%0, [%1, #%2]"
+  "stg\\t%1, <stg_ops>"
   [(set_attr "type" "memtag")]
 )
 
+;; ST2G updates allocation tags for two memory granules (i.e. 32 bytes) at
+;; once, without zero initialization.
+(define_insn "st2g"
+  [(set (match_operand:OI 0 "aarch64_granule16_memory_operand" "=Umg")
+      (unspec_volatile:OI
+       [(match_dup 0)
+        (match_operand:DI 1 "register_operand" "rk")]
+       UNSPECV_TAG_SPACE))]
+  "TARGET_MEMTAG"
+  "st2g\\t%1, %0"
+  [(set_attr "type" "memtag")]
+)
+
+(define_insn "st2g_<mte_name>"
+  [(set (mem:OI (MTE_PP:DI (match_operand:DI 0 "register_operand" "+r")))
+       (unspec_volatile:OI
+        [(mem:OI (match_dup 0))
+         (match_operand:DI 1 "register_operand" "rk")]
+       UNSPECV_TAG_SPACE))]
+  "TARGET_MEMTAG"
+  "st2g\\t%1, <st2g_ops>"
+  [(set_attr "type" "memtag")]
+)
+
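+;; Tag a block of memory: operand 0 is the base address, operand 1 the
+;; pointer carrying the tag to be stored, operand 2 the size in bytes.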
+(define_expand "tag_memory"
+  [(match_operand:DI 0 "general_operand" "")
+   (match_operand:DI 1 "nonmemory_operand" "")
+   (match_operand:DI 2 "nonmemory_operand" "")]
+  ""
+  "
+{
+  aarch64_expand_tag_memory (operands[0], operands[1], operands[2]);
+  DONE;
+}")
+
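+;; Compose a tagged address: derive operand 0 from the (tagged) base
+;; address in operand 1, adjusting the tag by the immediate in operand 2.
+;; A zero tag offset reduces to a plain move.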
+(define_expand "compose_tag"
+  [(set (match_operand:DI 0 "register_operand")
+       (ior:DI
+        (and:DI (plus:DI (match_operand:DI 1 "register_operand")
+                         (const_int 0))
+                (const_int MEMTAG_TAG_MASK))
+        (ashift:DI
+         (unspec:DI [(match_dup 1)
+                    (match_operand 2 "immediate_operand")]
+                    UNSPEC_GEN_TAG)
+         (const_int 56))))]
+  ""
+  "
+{
+  if (INTVAL (operands[2]) == 0)
+    {
+      emit_move_insn (operands[0], operands[1]);
+      DONE;
+    }
+}")
+
 ;; Load/Store 64-bit (LS64) instructions.
 (define_insn "ld64b"
   [(set (match_operand:V8DI 0 "register_operand" "=r")
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 7b9e5583bc7..94d2ff4d847 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -346,6 +346,12 @@ (define_memory_constraint "Ump"
        (match_test "aarch64_legitimate_address_p (GET_MODE (op), XEXP (op, 0),
                                                  true, ADDR_QUERY_LDP_STP)")))
 
+(define_memory_constraint "Umg"
+  "@internal
+  A memory address for MTE load/store tag operation."
+  (and (match_code "mem")
+       (match_test "aarch64_granule16_memory_address_p (op)")))
+
 ;; Used for storing or loading pairs in an AdvSIMD register using an STP/LDP
 ;; as a vector-concat.  The address mode uses the same constraints as if it
 ;; were for a single value.
@@ -600,6 +606,21 @@ (define_address_constraint "Dp"
  An address valid for a prefetch instruction."
  (match_test "aarch64_address_valid_for_prefetch_p (op, true)"))
 
+(define_constraint "Uag"
+  "@internal
+  A constant that can be used as an address offset for an ADDG operation."
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (ival, 0, 1008)
+                   && !(ival & 0xf)")))
+
+(define_constraint "Ung"
+  "@internal
+  A constant that can be used as an address offset for a SUBG operation (once
+  negated)."
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (ival, -1008, -1)
+                   && !(ival & 0xf)")))
+
 (define_constraint "vgb"
   "@internal
    A constraint that matches an immediate offset valid for SVE LD1B
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index c3771d9402b..33dd13e0eab 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -2879,6 +2879,9 @@ (define_code_iterator SVE_UNPRED_FP_BINARY [plus minus mult])
 ;; SVE integer comparisons.
 (define_code_iterator SVE_INT_CMP [lt le eq ne ge gt ltu leu geu gtu])
 
+;; Pre/post-{inc,dec} codes for the MTE instructions.
+(define_code_iterator MTE_PP [post_inc post_dec pre_inc pre_dec])
+
 ;; -------------------------------------------------------------------
 ;; Code Attributes
 ;; -------------------------------------------------------------------
@@ -3225,6 +3228,23 @@ (define_code_attr SVE_COND_FP [(plus "UNSPEC_COND_FADD")
                               (minus "UNSPEC_COND_FSUB")
                               (mult "UNSPEC_COND_FMUL")])
 
+;; Map MTE pre/post to the right asm format
+(define_code_attr stg_ops [(post_inc "[%0], 16")
+                          (post_dec "[%0], -16")
+                          (pre_inc  "[%0, 16]!")
+                          (pre_dec  "[%0, -16]!")])
+
+(define_code_attr st2g_ops [(post_inc "[%0], 32")
+                           (post_dec "[%0], -32")
+                           (pre_inc  "[%0, 32]!")
+                           (pre_dec  "[%0, -32]!")])
+
+;; Map MTE pre/post to names
+(define_code_attr mte_name [(post_inc "postinc")
+                           (post_dec "postdec")
+                           (pre_inc "preinc")
+                           (pre_dec "predec")])
+
 ;; -------------------------------------------------------------------
 ;; Int Iterators.
 ;; -------------------------------------------------------------------
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 42304cef439..dca0baf75e0 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -1066,13 +1066,20 @@ (define_predicate "aarch64_bytes_per_sve_vector_operand"
        (match_test "known_eq (wi::to_poly_wide (op, mode),
                              BYTES_PER_SVE_VECTOR)")))
 
+;; The uimm4 tag offset field only accepts immediates in the range 0..15.
 (define_predicate "aarch64_memtag_tag_offset"
   (and (match_code "const_int")
-       (match_test "IN_RANGE (INTVAL (op), 0, 15)")))
+       (match_test "UINTVAL (op) <= 15")))
+
+(define_predicate "aarch64_granule16_memory_operand"
+  (and (match_test "TARGET_MEMTAG")
+       (match_code "mem")
+       (match_test "aarch64_granule16_memory_address_p (op)")))
 
-(define_predicate "aarch64_granule16_uimm6"
+(define_predicate "aarch64_granule16_imm6"
   (and (match_code "const_int")
-       (match_test "IN_RANGE (INTVAL (op), 0, 1008)
+       (match_test "IN_RANGE (INTVAL (op), -1008, 1008)
                    && !(INTVAL (op) & 0xf)")))
 
 (define_predicate "aarch64_granule16_simm9"
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 31067b611b9..b12da70eb7c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -18333,8 +18333,10 @@ for a list of supported options.
 The option cannot be combined with @option{-fsanitize=thread} or
 @option{-fsanitize=hwaddress}.  Note that the only targets
 @option{-fsanitize=hwaddress} is currently supported on are x86-64
-(only with @code{-mlam=u48} or @code{-mlam=u57} options) and AArch64,
-in both cases only in ABIs with 64-bit pointers.
+(only with @code{-mlam=u48} or @code{-mlam=u57} options) and AArch64, in both
+cases only in ABIs with 64-bit pointers.  Similarly,
+@option{-fsanitize=memtag-stack} is currently only supported on AArch64 ABIs
+with 64-bit pointers.
 
 When compiling with @option{-fsanitize=address}, you should also
 use @option{-g} to produce more meaningful output.
diff --git a/gcc/testsuite/gcc.target/aarch64/acle/memtag_1.c b/gcc/testsuite/gcc.target/aarch64/acle/memtag_1.c
index f8368690032..e94a2220fe3 100644
--- a/gcc/testsuite/gcc.target/aarch64/acle/memtag_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/acle/memtag_1.c
@@ -54,9 +54,9 @@ test_memtag_6 (void *p)
   __arm_mte_set_tag (p);
 }
 
-/* { dg-final { scan-assembler-times {irg\tx..?, x..?, x..?\n} 1 } } */
+/* { dg-final { scan-assembler-times {irg\tx..?, x..?\n} 1 } } */
 /* { dg-final { scan-assembler-times {gmi\tx..?, x..?, x..?\n} 1 } } */
 /* { dg-final { scan-assembler-times {subp\tx..?, x..?, x..?\n} 1 } } */
 /* { dg-final { scan-assembler-times {addg\tx..?, x..?, #0, #1\n} 1 } } */
 /* { dg-final { scan-assembler-times {ldg\tx..?, \[x..?, #0\]\n} 1 } } */
-/* { dg-final { scan-assembler-times {stg\tx..?, \[x..?, #0\]\n} 1 } } */
\ No newline at end of file
+/* { dg-final { scan-assembler-times {stg\tx..?, \[x..?\]\n} 1 } } */
-- 
2.51.0

