I think the question is not whether this could be done somehow,
but whether it should.  Why design a language feature that requires 
storing tokens and parsing it outside the original context?  

For an attribute this may still be acceptable, but we need the same
thing for array sizes in C.  For a proper language feature, you
would  need to specify the context used to semantically interpret
the moved part of the program.  If you shuffle tokens around - which
we would also need to be able to do recursively, this becomes messy 
quickly. And a C compiler would not just have to store tokens, but
has store and recreate the complete original semantic context, not
just for the current version of the language, but also for all 
future versions. That simply sucks.  


Martin

Am Sonntag, dem 20.07.2025 um 11:20 +0000 schrieb mo...@google.com:
> From: Bill Wendling <mo...@google.com>
> 
> Also, this code doesn't go further than parsing. I.e., it doesn't generate the
> internal gimple code that accesses the struct fields. The code is meant to 
> show
> that it *is* possible to perform a delayed parsing with no "double parsing" 
> and
> still be performant.
> 
> Minor Nomenclature Change
> -------------------------
> 
> I (try to) use "bounds safety" instead of "counted_by" in the code. This is
> because Clang has many more bounds safety attributes in the pipeline and
> "bounds safety" seems like a better overall name for the set of attributes.
> 
> Token-Based Delayed Parsing
> ---------------------------
> 
> The solution introduces a token capture and delayed parsing mechanism that
> stores tokens during initial parsing and performs the actual parse later when
> the struct context is available. This isn't a novel idea, as the OMP code does
> something similar for directives.
> 
> After storing the tokens, the parsing continues as if the expression was 
> parsed
> (but it hasn't been yet). The parsing is delayed until the
> 'verify_counted_by_attribute ()' call, which performs the actual parse and
> reports any semantic errors. (The actual parse is done simply by creating a 
> new
> 'c_parser' object and filling it with the delayed tokens.)
> 
> On a successful parse, the 'C_TOKEN_VEC' tree node is converted into a tree
> node holding an expression. The token vector is freed afterwards.
> 
> If this approach is deemed worthy, I would like to incorporate the "single
> identifier" parsing into the delayed parsing mechanism (a single identifier is
> just a simple expression).
> 
> RFC
> ---
> 
> - Is this something that GCC is interested in to get us past this hurdle
>   that's delaying the upstreaming of more bounds safety checks?
> - Will this solve the issues at hand? (For this implementation, I'm
>   specifically focusing struct field attributes. I haven't thought much about
>   parameter list attributes.)
> - I think there's a layering violation. The 'c-decl.cc' "verifies" the
>   counted_by argument, but it's in essence performing an identifier 
> resolution,
>   which is something the parser should be doing. Perhaps that code should be
>   moved into "finish_struct ()", or somewhere else in "c/c-parser.cc"?
> - My GCC-fu isn't strong and I may have abused some parts of the parser in
>   unholy ways. (I blame beer.)
> - Anything else you'd like to say or recommend.
> 
> Disclaimer
> ----------
> 
> No AI's were harmed in the making of this RFC, though one did have its 
> feelings
> hurt.
> 
> Share and enjoy!
> -bw
> 
> 
> [1] 
> https://discourse.llvm.org/t/rfc-bounds-safety-in-c-syntax-compatibility-with-gcc/85885
> 
> ---
> To: gcc-patches@gcc.gnu.org
> Cc: Kees Cook <k...@kernel.org>
> Cc: Nick Desaulniers <nick.desaulni...@gmail.com>
> Cc: Martin Uecker <uec...@tugraz.at>
> Cc: Qing Zhao <qing.z...@oracle.com>
> Cc: Jakub Jelinek <ja...@redhat.com>
> Cc: Richard Biener <richard.guent...@gmail.com>
> Cc: Yeoul Na <yeoul...@apple.com>
> Cc: John McCall <rjmcc...@apple.com>
> Cc: Aaron Ballman <aa...@aaronballman.com>
> Cc: Justin Stitt <justinst...@google.com>
> 
> v1: - Initial RFC submitted.
>     - Kees Cook's Linus asbestos suit borrowed.
> ---
>  gcc/attribs.cc            |   8 ++-
>  gcc/c-family/c-attribs.cc |  37 +++++++++---
>  gcc/c/c-decl.cc           |  97 ++++++++++++++++++++++--------
>  gcc/c/c-parser.cc         | 123 ++++++++++++++++++++++++++++++++++++--
>  gcc/c/c-parser.h          |   1 +
>  gcc/c/c-typeck.cc         |  21 +++++++
>  6 files changed, 246 insertions(+), 41 deletions(-)
> 
> diff --git a/gcc/attribs.cc b/gcc/attribs.cc
> index 3fce9d625256..74001889c72a 100644
> --- a/gcc/attribs.cc
> +++ b/gcc/attribs.cc
> @@ -752,8 +752,10 @@ decl_attributes (tree *node, tree attributes, int flags,
>       }
>        else
>       {
> -       int nargs = list_length (args);
> -       if (nargs < spec->min_length
> +          int nargs = args && TREE_CODE (args) == C_TOKEN_VEC
> +                          ? 1
> +                          : list_length (args);
> +          if (nargs < spec->min_length
>             || (spec->max_length >= 0
>                 && nargs > spec->max_length))
>           {
> @@ -771,7 +773,7 @@ decl_attributes (tree *node, tree attributes, int flags,
>                       spec->min_length, spec->max_length, nargs);
>             continue;
>           }
> -     }
> +        }
>        gcc_assert (is_attribute_p (spec->name, name));
>  
>        if (spec->decl_required && !DECL_P (*anode))
> diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
> index 1f4a0df12051..0e8c17440b9a 100644
> --- a/gcc/c-family/c-attribs.cc
> +++ b/gcc/c-family/c-attribs.cc
> @@ -2887,7 +2887,14 @@ handle_counted_by_attribute (tree *node, tree name,
>                            bool *no_add_attrs)
>  {
>    tree decl = *node;
> -  tree argval = TREE_VALUE (args);
> +  tree argval;
> +
> +  /* Handle both C_TOKEN_VEC and regular argument formats.  */
> +  if (TREE_CODE (args) == C_TOKEN_VEC)
> +    argval = args;
> +  else
> +    argval = TREE_VALUE (args);
> +
>    tree old_counted_by = lookup_attribute ("counted_by", DECL_ATTRIBUTES 
> (decl));
>  
>    /* This attribute is not supported in C++.  */
> @@ -2922,11 +2929,13 @@ handle_counted_by_attribute (tree *node, tree name,
>               " array member field", name);
>        *no_add_attrs = true;
>      }
> -  /* The argument should be an identifier.  */
> -  else if (TREE_CODE (argval) != IDENTIFIER_NODE)
> +  /* The argument should be an identifier or a C_TOKEN_VEC for complex
> +     expressions.  */
> +  else if (TREE_CODE (argval) != IDENTIFIER_NODE
> +           && TREE_CODE (argval) != C_TOKEN_VEC)
>      {
>        error_at (DECL_SOURCE_LOCATION (decl),
> -             "%<counted_by%> argument is not an identifier");
> +                "%<counted_by%> argument is not an identifier or 
> expression");
>        *no_add_attrs = true;
>      }
>    /* Issue error when there is a counted_by attribute with a different
> @@ -2934,14 +2943,28 @@ handle_counted_by_attribute (tree *node, tree name,
>    else if (old_counted_by != NULL_TREE)
>      {
>        tree old_fieldname = TREE_VALUE (TREE_VALUE (old_counted_by));
> -      if (strcmp (IDENTIFIER_POINTER (old_fieldname),
> -               IDENTIFIER_POINTER (argval)) != 0)
> -     {
> +      /* Only check conflicts for simple identifiers; complex expressions
> +         will be validated during struct completion.  */
> +      if (TREE_CODE (argval) == IDENTIFIER_NODE
> +          && TREE_CODE (old_fieldname) == IDENTIFIER_NODE
> +          && strcmp (IDENTIFIER_POINTER (old_fieldname),
> +                     IDENTIFIER_POINTER (argval))
> +                 != 0)
> +        {
>         error_at (DECL_SOURCE_LOCATION (decl),
>                   "%<counted_by%> argument %qE conflicts with"
>                   " previous declaration %qE", argval, old_fieldname);
>         *no_add_attrs = true;
>       }
> +      else if (TREE_CODE (argval) == C_TOKEN_VEC
> +               || TREE_CODE (old_fieldname) == C_TOKEN_VEC)
> +        {
> +          /* For complex expressions, defer conflict checking to struct
> +             completion.  */
> +          warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wattributes,
> +                      "multiple %<counted_by%> attributes on the same field; 
> "
> +                      "only the last one will be used");
> +        }
>      }
>  
>    return NULL_TREE;
> diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
> index acbe2b88e674..615d0f5a1e88 100644
> --- a/gcc/c/c-decl.cc
> +++ b/gcc/c/c-decl.cc
> @@ -9445,43 +9445,90 @@ verify_counted_by_attribute (tree struct_type, tree 
> field_decl)
>    if (!attr_counted_by)
>      return;
>  
> -  /* If there is an counted_by attribute attached to the field,
> +  /* If there is a counted_by attribute attached to the field,
>       verify it.  */
>  
> -  tree fieldname = TREE_VALUE (TREE_VALUE (attr_counted_by));
> +  tree argument;
> +  tree parsed_expr = NULL_TREE;
>  
> -  /* Verify the argument of the attrbute is a valid field of the
> -     containing structure.  */
> -
> -  tree counted_by_field = lookup_field (struct_type, fieldname);
> -
> -  /* Error when the field is not found in the containing structure and
> -     remove the corresponding counted_by attribute from the field_decl.  */
> -  if (!counted_by_field)
> -    {
> -      error_at (DECL_SOURCE_LOCATION (field_decl),
> -             "argument %qE to the %<counted_by%> attribute"
> -             " is not a field declaration in the same structure"
> -             " as %qD", fieldname, field_decl);
> -      DECL_ATTRIBUTES (field_decl)
> -     = remove_attribute ("counted_by", DECL_ATTRIBUTES (field_decl));
> -    }
> +  /* Handle both C_TOKEN_VEC and regular argument formats */
> +  tree args = TREE_VALUE (attr_counted_by);
> +  if (TREE_CODE (args) == C_TOKEN_VEC)
> +    argument = args; /* Direct C_TOKEN_VEC */
>    else
> -  /* Error when the field is not with an integer type.  */
> +    argument = TREE_VALUE (args); /* Standard tree list format */
> +
> +  /* Handle both simple identifiers and complex expressions.  */
> +  if (TREE_CODE (argument) == IDENTIFIER_NODE)
>      {
> +      /* Simple identifier case - existing logic.  */
> +      tree counted_by_field = lookup_field (struct_type, argument);
> +
> +      if (!counted_by_field)
> +        {
> +          error_at (DECL_SOURCE_LOCATION (field_decl),
> +                    "argument %qE to the %<counted_by%> attribute"
> +                    " is not a field declaration in the same structure"
> +                    " as %qD",
> +                    argument, field_decl);
> +          DECL_ATTRIBUTES (field_decl)
> +              = remove_attribute ("counted_by", DECL_ATTRIBUTES 
> (field_decl));
> +          return;
> +        }
> +
> +      /* Find the actual field declaration.  */
>        while (TREE_CHAIN (counted_by_field))
>       counted_by_field = TREE_CHAIN (counted_by_field);
>        tree real_field = TREE_VALUE (counted_by_field);
>  
>        if (!INTEGRAL_TYPE_P (TREE_TYPE (real_field)))
>       {
> -       error_at (DECL_SOURCE_LOCATION (field_decl),
> -                 "argument %qE to the %<counted_by%> attribute"
> -                 " is not a field declaration with an integer type",
> -                 fieldname);
> -       DECL_ATTRIBUTES (field_decl)
> +          error_at (DECL_SOURCE_LOCATION (field_decl),
> +                    "argument %qE to the %<counted_by%> attribute"
> +                    " is not a field declaration with an integer type",
> +                    argument);
> +          DECL_ATTRIBUTES (field_decl)
> +              = remove_attribute ("counted_by", DECL_ATTRIBUTES 
> (field_decl));
> +          return;
> +        }
> +    }
> +  else if (TREE_CODE (argument) == C_TOKEN_VEC)
> +    {
> +      /* Complex expression case - parse the stored tokens.  */
> +      parsed_expr
> +          = c_parse_token_vec_expression_with_struct (argument, struct_type);
> +
> +      if (!parsed_expr)
> +        {
> +          error_at (DECL_SOURCE_LOCATION (field_decl),
> +                    "invalid expression in %<counted_by%> attribute");
> +          DECL_ATTRIBUTES (field_decl)
>           = remove_attribute ("counted_by", DECL_ATTRIBUTES (field_decl));
> -     }
> +          return;
> +        }
> +
> +      /* Validate that the expression result is integer type.  */
> +      if (!INTEGRAL_TYPE_P (TREE_TYPE (parsed_expr)))
> +        {
> +          error_at (DECL_SOURCE_LOCATION (field_decl),
> +                    "expression in %<counted_by%> attribute does not"
> +                    " have integer type");
> +          DECL_ATTRIBUTES (field_decl)
> +              = remove_attribute ("counted_by", DECL_ATTRIBUTES 
> (field_decl));
> +          return;
> +        }
> +
> +      /* If we successfully parsed the expression, replace the C_TOKEN_VEC
> +         with the parsed expression for later use.  */
> +      TREE_VALUE (attr_counted_by) = build_tree_list (NULL_TREE, 
> parsed_expr);
> +    }
> +  else
> +    {
> +      error_at (DECL_SOURCE_LOCATION (field_decl),
> +                "invalid argument to %<counted_by%> attribute");
> +      DECL_ATTRIBUTES (field_decl)
> +          = remove_attribute ("counted_by", DECL_ATTRIBUTES (field_decl));
> +      return;
>      }
>  }
>  
> diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
> index 5119841a5895..a9c641151d39 100644
> --- a/gcc/c/c-parser.cc
> +++ b/gcc/c/c-parser.cc
> @@ -5623,6 +5623,30 @@ c_parser_attribute_arguments (c_parser *parser, bool 
> takes_identifier,
>    return attr_args;
>  }
>  
> +static bool
> +c_parser_bounds_safety_attr (tree attr_name)
> +{
> +  const char *name = IDENTIFIER_POINTER (attr_name);
> +  return !strcmp (name, "counted_by");
> +}
> +
> +/* Check if the bounds safety attribute needs complex expression parsing.  */
> +static bool
> +c_parser_needs_complex_bounds_parsing (c_parser *parser)
> +{
> +  /* Look ahead to see if this is a simple identifier or a complex
> +     expression.  */
> +  c_token *tok = c_parser_peek_token (parser);
> +  if (tok->type != CPP_NAME)
> +    return true;
> +
> +  c_token *next = c_parser_peek_2nd_token (parser);
> +  if (next->type == CPP_CLOSE_PAREN)
> +    return false;
> +
> +  return true;
> +}
> +
>  /* Parse (possibly empty) gnu-attributes.  This is a GNU extension.
>  
>     gnu-attributes:
> @@ -5692,12 +5716,56 @@ c_parser_gnu_attribute (c_parser *parser, tree attrs,
>      }
>    c_parser_consume_token (parser);
>  
> -  tree attr_args
> -    = c_parser_attribute_arguments (parser,
> -                                 attribute_takes_identifier_p (attr_name),
> -                                 false,
> -                                 is_attribute_p ("assume", attr_name),
> -                                 true);
> +  tree attr_args;
> +  if (!c_parser_bounds_safety_attr (attr_name)
> +      || !c_parser_needs_complex_bounds_parsing (parser))
> +    attr_args = c_parser_attribute_arguments (
> +        parser, attribute_takes_identifier_p (attr_name), false,
> +        is_attribute_p ("assume", attr_name), true);
> +  else
> +    {
> +      unsigned int n = 1;
> +      unsigned int paren_count = 0;
> +
> +      for (; true; ++n)
> +        {
> +          c_token *tok = c_parser_peek_nth_token (parser, n);
> +
> +          /* Safety check to avoid infinite loops.  */
> +          if (tok->type == CPP_EOF)
> +            {
> +              c_parser_error (parser, "expected identifier or expression");
> +              return error_mark_node;
> +            }
> +
> +          if (tok->type == CPP_CLOSE_PAREN)
> +            {
> +              if (!paren_count)
> +                break;
> +              paren_count--;
> +            }
> +          else if (tok->type == CPP_OPEN_PAREN)
> +            paren_count++;
> +          else if (tok->type == CPP_SEMICOLON)
> +            {
> +              c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
> +                                         "expected identifier or 
> expression");
> +              return error_mark_node;
> +            }
> +        }
> +
> +      vec<c_token, va_gc> *v;
> +      vec_alloc (v, n - 1);
> +      for (--n; n; --n)
> +        {
> +          c_token *tok = c_parser_peek_token (parser);
> +          v->quick_push (*tok);
> +          c_parser_consume_token (parser);
> +        }
> +
> +      attr_args = make_node (C_TOKEN_VEC);
> +      C_TOKEN_VEC_TOKENS (attr_args) = v;
> +    }
>  
>    attr = build_tree_list (attr_name, attr_args);
>    if (c_parser_next_token_is (parser, CPP_CLOSE_PAREN))
> @@ -27746,6 +27814,49 @@ c_maybe_parse_omp_decl (tree decl, tree d)
>    return true;
>  }
>  
> +/* Context for parsing counted_by expressions within struct scope.  */
> +tree bounds_safety_attr_struct_type = NULL_TREE;
> +
> +/* Parse an expression from a C_TOKEN_VEC.  This is used for delayed
> +   parsing of counted_by attribute expressions.  */
> +
> +tree
> +c_parse_token_vec_expression (tree token_vec)
> +{
> +  vec<c_token, va_gc> *tokens = C_TOKEN_VEC_TOKENS (token_vec);
> +
> +  if (!tokens || vec_safe_is_empty (tokens))
> +    return NULL_TREE;
> +
> +  /* Create a parser for just the list of tokens.  */
> +  c_parser delay_parser = {};
> +  delay_parser.tokens = tokens->address ();
> +  delay_parser.tokens_avail = tokens->length ();
> +  delay_parser.raw_tokens = tokens;
> +
> +  /* Parse the expression.  */
> +  c_expr expr = c_parser_expr_no_commas (&delay_parser, NULL);
> +
> +  if (expr.value == error_mark_node)
> +    return NULL_TREE;
> +
> +  return expr.value;
> +}
> +
> +/* Parse an expression from a C_TOKEN_VEC with struct context.  */
> +
> +tree
> +c_parse_token_vec_expression_with_struct (tree token_vec, tree struct_type)
> +{
> +  tree saved_struct_type = bounds_safety_attr_struct_type;
> +  bounds_safety_attr_struct_type = struct_type;
> +
> +  tree result = c_parse_token_vec_expression (token_vec);
> +
> +  bounds_safety_attr_struct_type = saved_struct_type;
> +  return result;
> +}
> +
>  /* OpenMP 4.0:
>     # pragma omp declare target new-line
>     declarations and definitions
> diff --git a/gcc/c/c-parser.h b/gcc/c/c-parser.h
> index a84779bcbf83..577644cef13a 100644
> --- a/gcc/c/c-parser.h
> +++ b/gcc/c/c-parser.h
> @@ -205,5 +205,6 @@ extern void c_parser_declspecs (c_parser *, struct 
> c_declspecs *, bool, bool,
>                               enum c_lookahead_kind);
>  extern struct c_type_name *c_parser_type_name (c_parser *, bool = false);
>  extern bool c_maybe_parse_omp_decl (tree, tree);
> +extern tree c_parse_token_vec_expression_with_struct (tree, tree);
>  
>  #endif
> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index f161bd9d0e74..788c0fe3763f 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -56,6 +56,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-pretty-print-markup.h"
>  #include "gcc-urlifier.h"
>  
> +/* Forward declaration for bounds safety attribute context.  */
> +extern tree bounds_safety_attr_struct_type;
> +
>  /* Possible cases of implicit conversions.  Used to select diagnostic 
> messages
>     and control folding initializers in convert_for_assignment.  */
>  enum impl_conv {
> @@ -3528,6 +3531,24 @@ build_external_ref (location_t loc, tree id, bool fun, 
> tree *type)
>      return error_mark_node;
>    else
>      {
> +      /* Check if we're parsing a counted_by expression and can resolve
> +         the identifier as a struct field.  */
> +      if (bounds_safety_attr_struct_type)
> +        {
> +          tree field = lookup_field (bounds_safety_attr_struct_type, id);
> +          if (field)
> +            {
> +              /* We found a field - create a reference to it */
> +              while (TREE_CHAIN (field))
> +                field = TREE_CHAIN (field);
> +              tree real_field = TREE_VALUE (field);
> +
> +              ref = real_field;
> +              *type = TREE_TYPE (ref);
> +              return ref;
> +            }
> +        }
> +
>        undeclared_variable (loc, id);
>        return error_mark_node;
>      }

Reply via email to