Matthew Malcomson <matthew.malcom...@arm.com> writes:
> @@ -75,6 +89,31 @@ extern hash_set <tree> *asan_used_labels;
>  
>  #define ASAN_USE_AFTER_SCOPE_ATTRIBUTE       "use after scope memory"
>  
> +/* NOTE: The values below define an ABI and are hard-coded to these values in
> +   libhwasan, hence they can't be changed independently here.  */
> +/* How many bits are used to store a tag in a pointer.
> +   HWASAN uses the entire top byte of a pointer (i.e. 8 bits).  */
> +#define HWASAN_TAG_SIZE 8
> +/* Tag Granule of HWASAN shadow stack.
> +   This is the size in real memory that each byte in the shadow memory refers
> +   to.  I.e. if a variable is X bytes long in memory then it's tag in shadow

s/it's/its/

> +   memory will span X / HWASAN_TAG_GRANULE_SIZE bytes.
> +   Most variables will need to be aligned to this amount since two variables
> +   that are neighbours in memory and share a tag granule would need to share

“neighbors” (alas)

> +   the same tag (the shared tag granule can only store one tag).  */
> +#define HWASAN_TAG_SHIFT_SIZE 4
> +#define HWASAN_TAG_GRANULE_SIZE (1ULL << HWASAN_TAG_SHIFT_SIZE)
> +/* Define the tag for the stack background.
> +   This defines what tag the stack pointer will be and hence what tag all
> +   variables that are not given special tags are (e.g. spilled registers,
> +   and parameters passed on the stack).  */
> +#define HWASAN_STACK_BACKGROUND 0
> +/* How many bits to shift in order to access the tag bits.
> +   The tag is stored in the top 8 bits of a pointer hence shifting 56 bits will
> +   leave just the tag.  */
> +#define HWASAN_SHIFT 56
> +#define HWASAN_SHIFT_RTX const_int_rtx[MAX_SAVED_CONST_INT + HWASAN_SHIFT]
> +
>  /* Various flags for Asan builtins.  */
>  enum asan_check_flags
>  {
> diff --git a/gcc/asan.c b/gcc/asan.c
> index 9c9aa4cae35832c1534a2cffac1d3d13eed0e687..f755a3290f1091be14fbe4c51d9579389e5eb245 100644
> --- a/gcc/asan.c
> +++ b/gcc/asan.c
> @@ -257,6 +257,15 @@ hash_set<tree> *asan_handled_variables = NULL;
>  
>  hash_set <tree> *asan_used_labels = NULL;
>  
> +/* Global variables for HWASAN stack tagging.  */
> +/* tag_offset records the offset from the frame base tag that the next object
> +   should have.  */
> +static uint8_t tag_offset = 0;
> +/* hwasan_base_ptr is a pointer with the same address as
> +   `virtual_stack_vars_rtx` for the current frame, and with the frame base tag
> +   stored in it.  */
> +static rtx hwasan_base_ptr = NULL_RTX;

Was initially surprised that this didn't need to be GTY, but I guess
there are no ggc_collect calls during the relevant parts of cfgexpand.

It's normally easy to tell when GTY is needed if all the users of the
state are in the same file, but for cases like this it's a little harder.
Might be worth a comment.

Might also be worth having “stack” or “frame” in these names, and similarly
for the public functions.

> @@ -1352,6 +1361,28 @@ asan_redzone_buffer::flush_if_full (void)
>      flush_redzone_payload ();
>  }
>  
> +/* Returns whether we are tagging pointers and checking those tags on memory
> +   access.  */
> +bool
> +hwasan_sanitize_p ()
> +{
> +    return sanitize_flags_p (SANITIZE_HWADDRESS);

Nit: excess indentation.

> @@ -2901,6 +2932,11 @@ initialize_sanitizer_builtins (void)
>      = build_function_type_list (void_type_node, uint64_type_node,
>                               ptr_type_node, NULL_TREE);
>  
> +  tree BT_FN_VOID_PTR_UINT8_SIZE
> +    = build_function_type_list (void_type_node, ptr_type_node,
> +                             unsigned_char_type_node, size_type_node,
> +                             NULL_TREE);

The function prototype seems to be:

  void __hwasan_tag_memory(uptr p, u8 tag, uptr sz)

and size_t doesn't have the same precision as pointers on all targets.
Maybe pointer_sized_int_node would be more accurate.  (Despite its name,
it's unsigned rather than signed.)

> @@ -3702,4 +3740,269 @@ make_pass_asan_O0 (gcc::context *ctxt)
>    return new pass_asan_O0 (ctxt);
>  }
>  
> +/* For stack tagging:
> +     Initialise tag of the base register.

“Initialize”

Very minor, but I've not seen this style of comment elsewhere in GCC,
where the main comment is indented by two extra spaces.  Think a blank
line and no extra indentation is more usual.

> +     This has to be done as soon as the stack is getting expanded to ensure
> +     anything emitted with `get_dynamic_stack_base` will use the value set here
> +     instead of using a register without a tag.
> +     Especially note that RTL expansion of large aligned values does that.  */
> +void
> +hwasan_record_base (rtx base)
> +{
> +  targetm.memtag.gentag (base, virtual_stack_vars_rtx);
> +  hwasan_base_ptr = base;
> +}
> +
> +/* For stack tagging:
> +     Return the offset from the frame base tag that the "next" expanded object
> +     should have.  */
> +uint8_t
> +hwasan_current_tag ()
> +{
> +  return tag_offset;
> +}
> +
> +/* For stack tagging:
> +     Increment the tag offset modulo the size a tag can represent.  */
> +void
> +hwasan_increment_tag ()
> +{
> +  uint8_t tag_bits = HWASAN_TAG_SIZE;
> +  STATIC_ASSERT (HWASAN_TAG_SIZE == sizeof (tag_offset) * CHAR_BIT);
> +  tag_offset = (tag_offset + 1) % (1 << tag_bits);

Might be misunderstanding, but doesn't the static assert make the %
redundant?

> +  /* The "background tag" of the stack is zero by definition.
> +     This is the tag that objects like parameters passed on the stack and
> +     spilled registers are given.  It is handy to avoid this for objects we
> +     decide the tags ourselves, partly to ensure that buffer overruns can't

Maybe “for objects whose tags we decide ourselves”?

> +     affect these important variables (e.g. saved link register, saved stack
> +     pointer etc) and partly to make debugging easier (everything with a tag of
> +     zero is space allocated automatically by the compiler).
> +
> +     This is not feasible when using random frame tags (the default
> +     configuration for hwasan) since the tag for the given frame is randomly
> +     chosen at runtime.  In order to avoid any tags matching the stack
> +     background we would need to decide tag offsets at runtime instead of
> +     compile time (and pay the resulting performance cost).
> +
> +     When not using random base tags for each frame (i.e. when compiled with
> +     `--param hwasan-random-frame-tag=0`) the base tag for each frame is zero.
> +     This means the tag that each object gets is equal to the tag_offset used
> +     in determining it.
> +     When this is the case we *can* ensure no object gets the tag of zero by
> +     simply ensuring no object has the tag_offset of zero.  */

Nice comment, thanks for explaining this.

> +  if (tag_offset == HWASAN_STACK_BACKGROUND
> +      && ! param_hwasan_random_frame_tag)
> +    tag_offset += 1;
> +}
> +
> +/* For stack tagging:
> +     Return an RTX representing `base + offset` address
> +     and `tag_of(base) + tag_offset` tag.  */
> +rtx
> +hwasan_with_tag (rtx base, poly_int64 offset)
> +{
> +  gcc_assert (tag_offset < (1 << HWASAN_TAG_SIZE));
> +  return targetm.memtag.addtag (base, offset, tag_offset);
> +}
> +
> +/* Clear internal state for the next function.
> +   This function is called before variables on the stack get expanded, in
> +   `init_vars_expansion`.  */
> +void
> +hwasan_tag_init ()
> +{
> +  delete asan_used_labels;
> +  asan_used_labels = NULL;
> +
> +  hwasan_base_ptr = NULL_RTX;
> +  /* When not using a random frame tag we can avoid the background stack
> +     colour which gives the user a little better debug output upon a crash.

color

> +     Meanwhile, when using a random frame tag it will be nice to avoid adding
> +     tags for the first object since that is unnecessary extra work.
> +     Hence set the initial tag_offset to be 0 if using a random frame tag and 1
> +     otherwise.  */
> +  tag_offset = param_hwasan_random_frame_tag
> +    ? 0
> +    : HWASAN_STACK_BACKGROUND + 1;
> +}
> +
> +/* Return an RTX for the tag of TAGGED_POINTER.  */
> +rtx
> +hwasan_extract_tag (rtx tagged_pointer)
> +{
> +  rtx tag = expand_simple_binop (Pmode,
> +                              LSHIFTRT,
> +                              tagged_pointer,
> +                              HWASAN_SHIFT_RTX,
> +                              NULL_RTX,
> +                              /* unsignedp = */0,
> +                              OPTAB_DIRECT);

Think it's more usual to put multiple arguments on the same line.

> +  return gen_lowpart (QImode, tag);
> +}
> +
> +/* For stack tagging:
> +      Does HWASAN equivalent of `asan_emit_stack_protection`.
> +
> +   Prologue sequence should be emitted directly, while the epilogue sequence is
> +   returned.  The epilogue sequence is what should be used if we're not
> +   protecting alloca objects.
> +
> +   BASES is an array containing the tagged base registers for each object.
> +   We map each object to a given base since large aligned objects have a
> +   different base to others and we need to know which objects use which base.
> +
> +   UNTAGGED_BASES contains the same information as above except without tags.
> +   This is needed since libhwasan only accepts untagged pointers in
> +   __hwasan_tag_memory.
> +
> +   OFFSETS is an array with the start and end offsets for each object stored on
> +   the stack in this frame.  This array is hence twice the length of the other
> +   array arguments (given it has two entries for each stack object).
> +
> +   TAGS is an array containing the tag *offset* each object should have from
> +   the tag in its base pointer.
> +
> +   LENGTH contains the length of the OFFSETS array.  */
> +rtx_insn *
> +hwasan_emit_prologue (rtx *bases,
> +                   rtx *untagged_bases,
> +                   rtx frame_extent,
> +                   poly_int64 *offsets,
> +                   uint8_t *tags,
> +                   size_t length)
> +{
> +  /* We need untagged base pointers since libhwasan only accepts untagged
> +    pointers in __hwasan_tag_memory.  We need the tagged base pointer to obtain
> +    the base tag for an offset.  */
> +
> +  if (length < 2)
> +    return NULL;
> +
> +  poly_int64 bot = 0, top = 0;
> +  size_t i = 0;
> +  for (i = 0; (i * 2) + 1 < length; i++)
> +    {
> +      poly_int64 start = offsets[i * 2];
> +      poly_int64 end = offsets[(i * 2) + 1];
> +
> +      if (known_ge (start, end))
> +     {
> +       top = start;
> +       bot = end;
> +     }
> +      else
> +     {
> +       /* Given how these values are calculated, one must be known greater
> +          than the other.  */
> +       gcc_assert (known_lt (start, end));

Think this should be known_le.  E.g.

  known_ge (-2X - 2, -2)
  known_lt (-2X - 2, -2)

are both false for an unknown X >= 0.

Alternatively, you could assert ordered_p (start, end) before the
known_ge check.

> +       top = end;
> +       bot = start;
> +     }
> +      poly_int64 size = (top - bot);
> +
> +      /* Can't check that all poly_int64's are aligned, but still nice
> +      to check those that are compile-time constants.  */

Wouldn't using multiple_p work?  If an assert of that form triggers
for SVE offsets then it would be worth looking into why.

> +      HOST_WIDE_INT tmp;
> +      if (top.is_constant (&tmp))
> +     gcc_assert (tmp % HWASAN_TAG_GRANULE_SIZE == 0);
> +      if (bot.is_constant (&tmp))
> +     gcc_assert (tmp % HWASAN_TAG_GRANULE_SIZE == 0);
> +      if (size.is_constant (&tmp))
> +     gcc_assert (tmp % HWASAN_TAG_GRANULE_SIZE == 0);
> +
> +      rtx ret = init_one_libfunc ("__hwasan_tag_memory");
> +      rtx base_tag = hwasan_extract_tag (bases[i]);
> +      /* In the case of tag overflow we would want modulo wrapping -- which
> +      should be given from the `plus_constant` in QImode.  */
> +      rtx tag = plus_constant (QImode, base_tag, tags[i]);

Might be worth a gcc_assert that GET_MODE_PRECISION (QImode)
== HWASAN_TAG_SIZE.

> +      rtx bottom = convert_memory_address (ptr_mode,
> +                                        plus_constant (Pmode,
> +                                                       untagged_bases[i],
> +                                                       bot));
> +      emit_library_call (ret, LCT_NORMAL, VOIDmode,
> +                      bottom, ptr_mode,
> +                      tag, QImode,
> +                      gen_int_mode (size, ptr_mode), ptr_mode);
> +    }
> +  return hwasan_emit_untag_frame (frame_extent, virtual_stack_vars_rtx);

This call doesn't seem to rely on the prologue processing in the same
way that the asan version does.  I think it would be clearer if this
function emitted only the prologue and if the caller called
hwasan_emit_untag_frame directly.  It would also avoid creating
garbage rtl when var_end_seq needs to be replaced.

> +}
> +
> +/* For stack tagging:
> +     Return RTL insns to clear the tags between DYNAMIC and VARS pointers
> +     into the stack.  These instructions should be emitted at the end of
> +     every function.  */
> +rtx_insn *
> +hwasan_emit_untag_frame (rtx dynamic, rtx vars)
> +{
> +  start_sequence ();
> +
> +  dynamic = convert_memory_address (ptr_mode, dynamic);
> +  vars = convert_memory_address (ptr_mode, vars);
> +
> +  rtx top_rtx;
> +  rtx bot_rtx;
> +  if (FRAME_GROWS_DOWNWARD)
> +    {
> +      top_rtx = vars;
> +      bot_rtx = dynamic;
> +    }
> +  else
> +    {
> +      top_rtx = dynamic;
> +      bot_rtx = vars;
> +    }
> +
> +  rtx size_rtx = expand_simple_binop (ptr_mode, MINUS, top_rtx, bot_rtx,
> +                                   NULL_RTX, /* unsignedp = */0,
> +                                   OPTAB_DIRECT);
> +
> +  rtx ret = init_one_libfunc ("__hwasan_tag_memory");
> +  emit_library_call (ret, LCT_NORMAL, VOIDmode,
> +      bot_rtx, ptr_mode,
> +      const0_rtx, QImode,
> +      size_rtx, ptr_mode);

Nit: should be indented under “ret”.

> +
> +  do_pending_stack_adjust ();
> +  rtx_insn *insns = get_insns ();
> +  end_sequence ();
> +  return insns;
> +}
> +
> +/* For stack tagging:
> +     Return an RTX representing ORIG_BASE without a tag.  */
> +rtx
> +hwasan_create_untagged_base (rtx orig_base)
> +{
> +  rtx untagged_base = gen_reg_rtx (Pmode);
> +  rtx tag_mask = gen_int_mode ((1ULL << HWASAN_SHIFT) - 1, Pmode);

Not sure how much it matters these days, but: HOST_WIDE_INT_1U instead
of 1ULL.

> +  untagged_base = expand_binop (Pmode, and_optab,
> +                             orig_base, tag_mask,
> +                             untagged_base, true, OPTAB_DIRECT);

It's better to pass null as the “target” parameter instead of creating
a register up front.

> +  gcc_assert (untagged_base);
> +  return untagged_base;
> +}
> +
> +/* Needs to be GTY(()), because cgraph_build_static_cdtor may
> +   invoke ggc_collect.  */
> +static GTY(()) tree hwasan_ctor_statements;
> +
> +/* Insert module initialisation into this TU.  This initialisation calls the
> +   initialisation code for libhwasan.  */

“initialization”, here and in the body.

> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index b270a4ddb9db469ba52e42f36a1bc2f02d8f03fc..13a7cb3877c27c4c45a445b7d2c068038d9c4568 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -377,7 +377,13 @@ align_local_variable (tree decl, bool really_expand)
>        if (really_expand)
>       SET_DECL_ALIGN (decl, align);
>      }
> -  return align / BITS_PER_UNIT;
> +
> +  unsigned int ret_align = align / BITS_PER_UNIT;
> +
> +  if (hwasan_sanitize_stack_p ())
> +    ret_align = MAX (ret_align, HWASAN_TAG_GRANULE_SIZE);

I think it'd be better to update SET_DECL_ALIGN too.

> +
> +  return ret_align;
>  }
>  
>  /* Align given offset BASE with ALIGN.  Truncate up if ALIGN_UP is true,
> @@ -988,7 +994,7 @@ dump_stack_var_partition (void)
>  
>  static void
>  expand_one_stack_var_at (tree decl, rtx base, unsigned base_align,
> -                      poly_int64 offset)
> +                      poly_int64 offset, rtx stack_base)

The function comment should describe the new variable.
But I wonder if it would be “better” to export hwasan_(stack_?)base_ptr
from asan.c so that code can treat it similarly to virtual_stack_vars_rtx
where necessary.  E.g. we might eventually want to know this in
find_temp_slot_from_address, although I realise we don't use tagging
for temp slots as things stand.

> @@ -1032,9 +1042,23 @@ public:
>       The vector is in reversed, highest offset pairs come first.  */
>    auto_vec<HOST_WIDE_INT> asan_vec;
>  
> +  /* HWASAN records the poly_int64 so it can handle any stack variable.  */
> +  auto_vec<poly_int64> hwasan_vec;
> +  auto_vec<rtx> hwasan_untagged_base_vec;
> +  auto_vec<rtx> hwasan_base_vec;
> +
>    /* Vector of partition representative decls in between the paddings.  */
>    auto_vec<tree> asan_decl_vec;
>  
> +  /* Vector of tag offsets representing the tag for each stack variable.
> +     Each offset determines the difference between the randomly generated
> +     tag for the current frame and the tag for this stack variable.  */
> +  auto_vec<uint8_t> hwasan_tag_vec;

Did you consider grouping the information into a struct, rather than
having separate vecs for each bit of information?  I.e.:

  untagged base
  tagged base
  nearest offset
  furthest offset
  tag

(Or alternatively, use start offset and end offset, with expand_stack_vars
rather than hwasan_emit_prologue doing the swap.)

> @@ -1123,10 +1153,36 @@ expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
>        if (pred && !pred (i))
>       continue;
>  
> +      base = hwasan_sanitize_stack_p ()
> +     ? data->asan_base
> +     : virtual_stack_vars_rtx;
>        alignb = stack_vars[i].alignb;
>        if (alignb * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT)
>       {
> -       base = virtual_stack_vars_rtx;
> +       if (hwasan_sanitize_stack_p ())
> +         {
> +           /* Allocate zero bytes to take advantage of the
> +              alloc_stack_frame_space logic of ensuring the stack is aligned
> +              despite having poly_int64's to deal with.

Was initially a bit confused by this.  I don't think poly_int64s are the
only (or even the main) reason we need to use alloc_stack_frame_space,
and e.g. the plain asan code uses a 0-byte allocation despite requiring
fixed-size objects.

If the comment is explaining the strangeness of “allocating” zero bytes
(which I agree is non-obvious) then I think it would be better to have a
trivial wrapper around alloc_stack_frame_space, e.g. align_frame_offset,
and use that for the existing call too.

> +
> +              There must be no tag granule "shared" between different
> +              objects.  This means that no HWASAN_TAG_GRANULE_SIZE byte
> +              chunk can have more than one object in it.
> +
> +              We ensure this by forcing the end of the last bit of data to
> +              be aligned to HWASAN_TAG_GRANULE_SIZE bytes here, and setting
> +              the start of each variable to be aligned to
> +              HWASAN_TAG_GRANULE_SIZE bytes in `align_local_variable`.
> +
> +              We can't align just one of the start or end, since there are
> +              untagged things stored on the stack that we have no control on
> +              the alignment and these can't share a tag granule with a
> +              tagged variable.  */

FAOD, please keep the rest of the comment though. :-)

> +           gcc_assert (stack_vars[i].alignb >= HWASAN_TAG_GRANULE_SIZE);
> +           offset = alloc_stack_frame_space (0, HWASAN_TAG_GRANULE_SIZE);
> +           data->hwasan_vec.safe_push (offset);
> +           data->hwasan_untagged_base_vec.safe_push (virtual_stack_vars_rtx);
> +         }
>         /* ASAN description strings don't yet have a syntax for expressing
>            polynomial offsets.  */
>         HOST_WIDE_INT prev_offset;
> @@ -1206,6 +1262,14 @@ expand_stack_vars (bool (*pred) (size_t), class stack_vars_data *data)
>             offset = alloc_stack_frame_space (stack_vars[i].size, alignb);
>             base_align = crtl->max_used_stack_slot_alignment;
>           }
> +
> +       if (hwasan_sanitize_stack_p ())
> +         {
> +           data->hwasan_vec.safe_push (offset);
> +           data->hwasan_frame_extent = plus_constant (Pmode,
> +                                                      virtual_stack_vars_rtx,
> +                                                      offset);
> +         }

Since only the final value of this variable is used, it would be
better to do only one plus_constant, to avoid creating unused rtl.

> index 43b491b027615523f0202fe1ded430cac0b4bef4..145a631ceb5239a9f099ef531668603cc2f7aea6 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -6839,6 +6839,37 @@ DEFHOOK
>   pointers.  This feature means that @option{-fsanitize=hwaddress} can work.",
>   bool, (), default_memtag_can_tag_addresses)
>  
> +DEFHOOK
> +(addtag,
> + "Emit an RTX representing BASE offset in value by ADDR_OFFSET and in tag by\

All lines of the documentation should end in “\n\”.

The documentation should use texinfo markup, e.g. @var{base} and
@samp{addtag_force_operand} etc.  

> + TAG_OFFSET.\n\

Very picky, sorry, but it took me a couple of attempts to parse this,
initially tripping over whether “offset” was a noun or not.  Also,
“emit” for rtl normally means “add to the instruction stream”.  How about
something like:

  Return an expression that represents the result of adding
  @var{addr_offset} to the address in pointer @var{base} and
  @var{tag_offset} to the tag in pointer @var{base}.

Equally picky, but it looks more natural to me to add
an underscore between all the words in the function name, i.e.
“add_tag” and “gen_tag”.

> +The resulting RTX must either be a valid memory address or be able to get\n\
> +put into an operand with force_operand.  If overridden the more common case\n\
> +is that we force this into an operand using the backend hook\n\
> +\"addtag_force_operand\" that is called in force_operand.\n\
> +\n\
> +It is expected that that \"addtag_force_operand\" recognises the RTX\n\
> +generated by \"addtag\" and emits code to force that RTX into an operand.",

> +rtx, (rtx base, poly_int64 addr_offset, uint8_t tag_offset),
> +default_memtag_addtag)
> +
> +DEFHOOK
> +(addtag_force_operand,
> + "If the RTL expression OPER is of the form generated by\n\
> +targetm.memtag.addtag, then emit instructions to move the value into an\n\
> +operand (i.e. for force_operand).\n\
> +TARGET is an RTX suggestion of where to generate the value.\n\
> +This hook is most often implemented by emitting instructions to put the\n\
> +expression into a pseudo register, then returning that pseudo register.",
> +rtx, (rtx oper, rtx target), NULL)

Why's this needed?  What does the addtag result look like for AArch64?

> +DEFHOOK
> +(gentag,
> + "Set the BASE argument to UNTAGGED with some random tag.\n\
> +This function is used to generate a tagged base for the current stack frame.",
> +  void, (rtx base, rtx untagged),
> +  default_memtag_gentag)

Again being picky, but “gentag” sounds like an operation that generates
a tag rather than something that tags a value.  Would “add_random_tag”
or “add_runtime_tag” make sense?

> @@ -2385,4 +2388,89 @@ default_memtag_can_tag_addresses ()
>    return false;
>  }
>  
> +/* Takes a REG rtx as BASE and some value as UNTAGGED.
> +   Ensures the register in BASE is given the value of UNTAGGED with the
> +   (possibly random) base frame tag on it.  */

Just:

  /* The default implementation of TARGET_MEMTAG_GENTAG.  */

is enough here, and helps to ensure that the comment and documentation
don't get out of sync.

> +void
> +default_memtag_gentag (rtx base, rtx untagged)
> +{
> +  gcc_assert (param_hwasan_instrument_stack);
> +  if (param_hwasan_random_frame_tag)
> +    {
> +    rtx temp = gen_reg_rtx (QImode);
> +    rtx ret = init_one_libfunc ("__hwasan_generate_tag");
> +    rtx new_tag = emit_library_call_value (ret, temp, LCT_NORMAL, QImode);
> +    emit_move_insn (base, untagged);
> +    /* We know that `base` is not the stack pointer, since we never want to put
> +      a randomly generated tag into the stack pointer.  Hence we can use
> +      `store_bit_field` which on aarch64 generates a `bfi` which can not act on
> +      the stack pointer.  */
> +    store_bit_field (base, 8, 56, 0, 0, QImode, new_tag, false);

Is the comment addressing a correctness or an optimisation concern?

Perhaps the easiest thing would be to make the interface return
a fresh register, so that there's no question that store_bit_field
is appropriate.  The kernel case could then just return “untagged”.

> +rtx
> +default_memtag_addtag (rtx base, poly_int64 offset, uint8_t tag_offset)
> +{
> +  /* Need to look into what the most efficient code sequence is.
> +     This is a code sequence that would be emitted *many* times, so we
> +     want it as small as possible.
> +
> +     If the tag offset is greater that (1 << 7) then the most efficient

s/that/than/

> +     sequence here would give UB from signed integer overflow in the
> +     poly_int64.  Hence in that case we emit the slightly less efficient
> +     sequence.

Do you mean compile-time or run-time UB?  

If it's compile-time UB in poly_int64 arithmetic then we can just
use poly_uint64 to do the addition.  But it looks like that's what
the code already does.

> +
> +     There are two places where tag overflow is a question:
> +       - Tagging the shadow stack.
> +       (both tagging and untagging).
> +       - Tagging addressable pointers.
> +
> +     We need to ensure both behaviours are the same (i.e. that the tag that
> +     ends up in a pointer after "overflowing" the tag bits with a tag addition
> +     is the same that ends up in the shadow space).
> +
> +     The aim is that the behaviour of tag addition should follow modulo

behavior :-)

> +     wrapping in both instances.
> +
> +     The libhwasan code doesn't have any path that increments a pointers tag,

pointer's

Thanks,
Richard
