Hi Tamar,

On 12/2/25 11:10 AM, Tamar Christina wrote:
Hi Claudiu,

Sorry again for the long delay in getting to this.

Thank you for coming back and have it reviewed :)


+bool aarch64_can_tag_addresses (void);
+

Can you just move this up instead of adding the prototype?

Done.

+/* Helper to emit either stg or st2g instruction.  */
+static void
+aarch64_emit_stxg_insn (machine_mode mode, rtx nxt, rtx addr, rtx tagp)
+{
+  rtx mem_addr = gen_rtx_MEM (mode, nxt);
+  rtvec vec = gen_rtvec (2, gen_rtx_MEM (mode, addr), tagp);
+  rtx unspec = gen_rtx_UNSPEC_VOLATILE (mode, vec,
UNSPECV_TAG_SPACE);
+  emit_set_insn (mem_addr, unspec);
+}

I guess this indirection is needed since stg and st2g are different named 
patterns.
And what's stopping unification here is the constraint modifier differences 
between
the two patterns? Which I'll come back to in a bit.


The above helper is used for both situations, namely, for tagging the memory using a loop, and for the situation when we fully unroll the loop for small size values.

+      iter_limit = GEN_INT (abs_hwi (len));

When can len be negative?

Thank you for catching this, I've fixed the issue.

+  /* Check if size is zero.  */
+  bottom_label = gen_label_rtx ();
+  if (!CONST_INT_P (size))

This just checks if size is a const int though, doesn't check for zero?
Think you're missing || CONST0_RTX (iter_mode) here?


Done.

+    {
+      rtx branch = aarch64_gen_compare_zero_and_branch (EQ, iter,
bottom_label);
+      aarch64_emit_unlikely_jump (branch);
+    }
+
+  /* Prepare the addr operand for tagging memory.  */
+  rtx addr_reg = gen_reg_rtx (Pmode);
+  emit_move_insn (addr_reg, x_addr);
+
+  top_label = gen_label_rtx ();
+  /* Emit top label.  */
+  emit_label (top_label);
+
+  /* Tag Memory using post-index stg/st2g.  */
+  aarch64_gen_tag_memory_postindex (addr_reg, tagged_pointer, offset);
+

Ok so if iters  % 2 we do st2g storing two tags at a time and if not we do stg.
If there any reason we can't always use st2g in the loop and only enter the
loop if iters > 1. And then outside of the loop if iters % 2 != 0 use a single
stg?


This loop construction can be inseted inserted many times in a function. In the prologue, for each sanitized memory region, and in the epilogue for reseting the tags. Hence, I was driven to have this loop designed using a minimal amount of instructions. On the other hand, by always using st2g in the inner loop may improve the speed. I'll have a small prototype as you suggested, and I'll measure the impact of it.

+  /* Decrement ITER by ITER_INCR.  */
+  emit_insn (gen_subdi3_compare1_imm (iter, iter, iter_incr,
+                                   GEN_INT (-UINTVAL (iter_incr))));
+
+  rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
+  rtx x = gen_rtx_fmt_ee (NE, CCmode, cc_reg, const0_rtx);
+  auto jump = emit_jump_insn (gen_aarch64_bcond (x, cc_reg, top_label));
+  JUMP_LABEL (jump) = top_label;
+
+  /* Emit bottom label.  */
+  if (!CONST_INT_P (size))
+    emit_label (bottom_label);
+}
+
+/* Threshold in number of granules beyond which an explicit loop for
+   tagging a memory block is emitted.  */
+#define AARCH64_TAG_MEMORY_LOOP_THRESHOLD 10
+

Think we might want this as a target param in aarch64.opt

Done.

+(define_insn "st2g"
+  [(set (match_operand:OI 0 "aarch64_granule16_memory_operand"
"=Umg")

Should this not be + as well as with the rest? Since st2g also has a writeback 
form.

Done.


+      (unspec_volatile:OI
+       [(match_dup 0)
+        (match_operand:DI 1 "register_operand" "rk")]
+       UNSPECV_TAG_SPACE))]
+  "TARGET_MEMTAG"
+  "st2g\\t%1, %0"
+  [(set_attr "type" "memtag")]
+)
+
+(define_insn "st2g_<mte_name>"
+  [(set (mem:OI (MTE_PP:DI (match_operand:DI 0 "register_operand" "+r")))
+       (unspec_volatile:OI
+        [(mem:OI (match_dup 0))
+         (match_operand:DI 1 "register_operand" "rk")]
+       UNSPECV_TAG_SPACE))]
+  "TARGET_MEMTAG"
+  "st2g\\t%1, <st2g_ops>"
+  [(set_attr "type" "memtag")]
+)
+
+(define_expand "tag_memory"
+  [(match_operand:DI 0 "general_operand" "")
+   (match_operand:DI 1 "nonmemory_operand" "")
+   (match_operand:DI 2 "nonmemory_operand" "")]
+  ""
+  "
+{
+  aarch64_expand_tag_memory (operands[0], operands[1], operands[2]);
+  DONE;
+}")

Drop the " around the { and }. The { and } indicate it's C code so we don't 
need to
quote it as a string.

Done.

+
+(define_expand "compose_tag"
+  [(set (match_operand:DI 0 "register_operand")
+       (ior:DI
+        (and:DI (plus:DI (match_operand:DI 1 "register_operand")
+                         (const_int 0))
+                (const_int MEMTAG_TAG_MASK))
+        (ashift:DI
+         (unspec:DI [(match_dup 1)
+                    (match_operand 2 "immediate_operand")]
+                    UNSPEC_GEN_TAG)
+         (const_int 56))))]
+  ""
+  "
+{
+  if (INTVAL (operands[2]) == 0)
+    {
+     emit_move_insn (operands[0], operands[1]);
+     DONE;
+    }
+}")

Same here.

Done>
Thanks for working on this,
Tamar

Thank you for your review, I'll come back to you with the modified patch asap,
Claudiu


Reply via email to