On Thu, Sep 18, 2025 at 06:48:03PM +0000, Qing Zhao wrote:
>
>
> > On Sep 18, 2025, at 14:20, Kees Cook <[email protected]> wrote:
> >
> >>>>> +- External functions that are address-taken have a weak __kcfi_typeid_$func
> >>>>> + symbol added with the typeid value available so that the typeid can be
> >>>>> + referenced from assembly linkages, etc, where the typeid values cannot be
> >>>>> + calculated (i.e where C type information is missing):
> >>>>> +
> >>>>> + .weak __kcfi_typeid_$func
> >>>>> + .set __kcfi_typeid_$func, $typeid
> >>>>> +
> >>>>
> >>>> From my previous understanding, the above weak symbol is emitted for
> >>>> external functions that are address-taken AND do not have a definition
> >>>> in the compilation. So the weak symbol is emitted at the declaration
> >>>> site of the external function. Is this true?
> >>>>
> >>>> If so, could you please clarify this in the above?
> >>>
> >>> Yes, this happens via assemble_external_real, which can be called under
> >>> a few conditions in gcc/varasm.cc.
> >>
> >> Okay. Please clarify this in the design doc.
> >
> > I mention it later in the "behavioral" section:
> >
> > - assemble_external_real calls kcfi_emit_typeid_symbol to add the
> > __kcfi_typeid_$func symbols.
> >
> > I had left off implementation details (i.e. "called from
> > assemble_external_real") in the "constraints" section. How would you
> > like this arranged?
>
> The original arrangement is good. -:)
>
> I guess I didn't make myself clear at the beginning. The following is a
> modified version of your previous paragraph:
>
> +- An external function that is address-taken but does not have a definition has
> + a weak __kcfi_typeid_$func symbol added at the declaration site. This weak
> + symbol has the typeid value available so that the typeid can be
> + referenced from assembly linkages, etc, where the typeid values cannot be
> + calculated (i.e where C type information is missing):
> +
> + .weak __kcfi_typeid_$func
> + .set __kcfi_typeid_$func, $typeid
> +
>
> Is the above the correct understanding?

Ah! I see, yes, that's correct. I will update it. :)
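
For readers following along, here is a rough standalone sketch of the kind
of output being described. It is not the patch's kcfi_emit_typeid_symbol:
the helper name, its signature, and the 0x%08x formatting of the typeid are
all assumptions made for illustration. It only formats the .weak/.set pair
for an external, address-taken function with no definition in the
translation unit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (NOT the function from the patch): write the two
   assembler directives for FUNC into BUF.  The hex formatting of TYPE_ID
   is an assumption.  Returns what snprintf returns, i.e. the number of
   characters the full output needs, excluding the terminating NUL.  */
static int
format_typeid_symbol (char *buf, size_t len, const char *func,
		      uint32_t type_id)
{
  assert (buf != NULL && func != NULL && strlen (func) > 0);
  return snprintf (buf, len,
		   "\t.weak\t__kcfi_typeid_%s\n"
		   "\t.set\t__kcfi_typeid_%s, 0x%08x\n",
		   func, func, (unsigned int) type_id);
}
```

Calling it with "foo" and 0x12345678 yields a .weak line and a .set line
for __kcfi_typeid_foo, which assembly code could then reference by name.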
>
> >>>
> >>>>> +static uint32_t
> >>>>> +kcfi_get_type_id (tree fn_type)
> >>>>> +{
> >>>>> + uint32_t type_id;
> >>>>> +
> >>>>> + /* Cache the attribute identifier. */
> >>>>> + if (!kcfi_type_id_attr)
> >>>>> + kcfi_type_id_attr = get_identifier ("kcfi_type_id");
> >>>>> +
> >>>>> + tree attr = lookup_attribute (IDENTIFIER_POINTER (kcfi_type_id_attr),
> >>>>> + TYPE_ATTRIBUTES (fn_type));
> >>>>
> >>>> The above can be simplified as:
> >>>> + tree attr = lookup_attribute ("kcfi_type_id", TYPE_ATTRIBUTES (fn_type));
> >>>
> >>> Ugh, I totally misunderstood the examples I saw of this. I thought they
> >>> were caching the string lookup, but now that I look more closely, I see:
> >>>
> >>> #define IDENTIFIER_POINTER(NODE) \
> >>> ((const char *) IDENTIFIER_NODE_CHECK (NODE)->identifier.id.str)
> >>>
> >>> it's just returning the string!
> >>>
> >>> I will throw away the "caching" I was doing. I thought it would actually
> >>> look up the attribute using the tree returned by get_identifier, but I
> >>> see there is no overloaded lookup_attribute that takes a tree argument.
> >>>
> >>> *face palm*
> >>
> >> -:)
> >
> > Okay, so I tried to remove this and remembered that it's actually cached
> > not for lookup_attribute, but for the build_tree_list call:
> >
> > tree attr = build_tree_list (kcfi_type_id_attr, type_id_tree);
> >
> > TYPE_ATTRIBUTES (fn_type) = chainon (TYPE_ATTRIBUTES (fn_type), attr);
> >
> > For _that_, I need a "tree" argument. So instead of building it each
> > time, I have it built already, and I can get at its string for
> > lookup_attribute too. So I think this code is good as-is.
>
> Right, the kcfi_type_id_attr is still needed for building the new type_id
> attribute.
>
> But, for the following
>
> > + tree attr = lookup_attribute (IDENTIFIER_POINTER (kcfi_type_id_attr),
> > + TYPE_ATTRIBUTES (fn_type));
>
> The above can be simplified as:
> + tree attr = lookup_attribute ("kcfi_type_id", TYPE_ATTRIBUTES (fn_type));
>
> No need to call IDENTIFIER_POINTER (kcfi_type_id_attr) as the first argument
> for the above call.
>
> Hope this is clear.

Right, I did this because it seemed weird to me to open-code the same
literal string twice.
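
To make the trade-off concrete, here is a toy standalone model of the
pattern. None of it is GCC code: struct ident, struct attr, type_id_name,
toy_lookup and toy_attach are made-up stand-ins for the identifier node,
the attribute list, the cached kcfi_type_id_attr, lookup_attribute and
build_tree_list/chainon. The point is only that the attach path needs the
node while the lookup path needs a string, and that both can come from one
cached object. Unlike chainon, the toy attach prepends.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins, NOT GCC's tree types.  */
struct ident { const char *str; };
struct attr { const struct ident *name; unsigned value; struct attr *next; };

/* Plays the role of the cached kcfi_type_id_attr.  */
static const struct ident *cached_name;

/* Stand-in for "get_identifier once, then reuse".  */
static const struct ident *
type_id_name (void)
{
  static const struct ident the_ident = { "kcfi_type_id" };
  if (!cached_name)
    cached_name = &the_ident;
  return cached_name;
}

/* Stand-in for lookup_attribute: the search key is a C string.  */
static struct attr *
toy_lookup (const char *name, struct attr *list)
{
  for (; list; list = list->next)
    if (strcmp (list->name->str, name) == 0)
      return list;
  return NULL;
}

/* Stand-in for build_tree_list + chainon: the key is the node itself.
   Prepends NODE to LIST and returns the new head.  */
static struct attr *
toy_attach (struct attr *list, struct attr *node, unsigned value)
{
  assert (node != NULL);
  node->name = type_id_name ();
  node->value = value;
  node->next = list;
  return node;
}
```

In this model, toy_lookup (type_id_name ()->str, list) and
toy_lookup ("kcfi_type_id", list) find the same entry; the first form just
avoids spelling the literal a second time.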
--
Kees Cook