> On 6/17/22 10:18 AM, Jose E. Marchesi wrote:
>> Hi Yonghong.
>> 
>>> On 6/15/22 1:57 PM, David Faust wrote:
>>>>
>>>> On 6/14/22 22:53, Yonghong Song wrote:
>>>>>
>>>>>
>>>>> On 6/7/22 2:43 PM, David Faust wrote:
>>>>>> Hello,
>>>>>>
>>>>>> This patch series adds support for:
>>>>>>
>>>>>> - Two new C-language-level attributes that allow associating (to
>>>>>>      "annotate" or to "tag") particular declarations and types with
>>>>>>      arbitrary strings. As explained below, this is intended to be used
>>>>>>      to, for example, characterize certain pointer types.
>>>>>>
>>>>>> - The conveyance of that information in the DWARF output in the form of
>>>>>>      a new DIE: DW_TAG_GNU_annotation.
>>>>>>
>>>>>> - The conveyance of that information in the BTF output in the form of
>>>>>>      two new kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.
>>>>>>
>>>>>> All of these facilities are being added to the eBPF ecosystem, and
>>>>>> support for them exists in some form in LLVM.
>>>>>>
>>>>>> Purpose
>>>>>> =======
>>>>>>
>>>>>> 1)  Addition of C-family language constructs (attributes) to specify
>>>>>>        free-text tags on certain language elements, such as struct fields.
>>>>>>
>>>>>>        The purpose of these annotations is to provide additional
>>>>>>        information about types, variables, and function parameters of
>>>>>>        interest to the kernel. A driving use case is to tag pointer types
>>>>>>        within the Linux kernel and eBPF programs with additional semantic
>>>>>>        information, such as '__user' or '__rcu'.
>>>>>>
>>>>>>        For example, consider the linux kernel function do_execve with the
>>>>>>        following declaration:
>>>>>>
>>>>>>          static int do_execve(struct filename *filename,
>>>>>>             const char __user *const __user *__argv,
>>>>>>             const char __user *const __user *__envp);
>>>>>>
>>>>>>        Here, __user could be defined with these annotations to record
>>>>>>        semantic information about the pointer parameters (e.g., they are
>>>>>>        user-provided) in DWARF and BTF information. Other kernel
>>>>>>        facilities such as the eBPF verifier can read the tags and make
>>>>>>        use of the information.
>>>>>>
>>>>>> 2)  Conveying the tags in the generated DWARF debug info.
>>>>>>
>>>>>>        The main motivation for emitting the tags in DWARF is that the
>>>>>>        Linux kernel generates its BTF information via pahole, using DWARF
>>>>>>        as a source:
>>>>>>
>>>>>>            +--------+  BTF                  BTF   +----------+
>>>>>>            | pahole |-------> vmlinux.btf ------->| verifier |
>>>>>>            +--------+                             +----------+
>>>>>>                ^                                        ^
>>>>>>                |                                        |
>>>>>>          DWARF |                                    BTF |
>>>>>>                |                                        |
>>>>>>             vmlinux                              +-------------+
>>>>>>             module1.ko                           | BPF program |
>>>>>>             module2.ko                           +-------------+
>>>>>>               ...
>>>>>>
>>>>>>        This is because:
>>>>>>
>>>>>>        a)  Unlike GCC, LLVM will only generate BTF for BPF programs.
>>>>>>
>>>>>>        b)  GCC can generate BTF for whatever target with -gbtf, but there
>>>>>>            is no support for linking/deduplicating BTF in the linker.
>>>>>>
>>>>>>        In the scenario above, the verifier needs access to the pointer
>>>>>>        tags of both the kernel types/declarations (conveyed in the DWARF
>>>>>>        and translated to BTF by pahole) and those of the BPF program
>>>>>>        (available directly in BTF).
>>>>>>
>>>>>>        Another motivation for having the tag information in DWARF,
>>>>>>        unrelated to BPF and BTF, is that the drgn project (another DWARF
>>>>>>        consumer) also wants to benefit from these tags in order to
>>>>>>        differentiate between different kinds of pointers in the kernel.
>>>>>>
>>>>>> 3)  Conveying the tags in the generated BTF debug info.
>>>>>>
>>>>>>        This is easy: the main purpose of having this info in BTF is for
>>>>>>        the compiled eBPF programs. The kernel verifier can then access
>>>>>>        the tags of pointers used by the eBPF programs.
>>>>>>
>>>>>>
>>>>>> For more information about these tags and the motivation behind them,
>>>>>> please refer to the following Linux kernel discussions:
>>>>>>
>>>>>>      https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/
>>>>>>      https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/
>>>>>>      https://lore.kernel.org/bpf/20211112012604.1504583-1-...@fb.com/
>>>>>>
>>>>>>
>>>>>> Implementation Overview
>>>>>> =======================
>>>>>>
>>>>>> To enable these annotations, two new C language attributes are added:
>>>>>> __attribute__((debug_annotate_decl("foo"))) and
>>>>>> __attribute__((debug_annotate_type("bar"))). Both attributes accept a
>>>>>> single arbitrary string constant argument, which will be recorded in the
>>>>>> generated DWARF and/or BTF debug information. They have no effect on code
>>>>>> generation.
>>>>>>
>>>>>> Note that we are not using the same attribute names as LLVM (btf_decl_tag
>>>>>> and btf_type_tag, respectively). While these attributes are functionally
>>>>>> very similar, they have grown beyond purely BTF-specific uses, so
>>>>>> inclusion of "btf" in the attribute name seems misleading.
>>>>>>
>>>>>> DWARF support is enabled via a new DW_TAG_GNU_annotation. When generating
>>>>>> DWARF, declarations and types will be checked for the corresponding
>>>>>> attributes. If present, a DW_TAG_GNU_annotation DIE will be created as a
>>>>>> child of the DIE for the annotated type or declaration, one for each tag.
>>>>>> These DIEs link the arbitrary tag value to the item they annotate.
>>>>>>
>>>>>> For example, the following variable declaration:
>>>>>>
>>>>>>      #define __typetag1 __attribute__((debug_annotate_type ("typetag1")))
>>>>>>
>>>>>>      #define __decltag1 __attribute__((debug_annotate_decl ("decltag1")))
>>>>>>      #define __decltag2 __attribute__((debug_annotate_decl ("decltag2")))
>>>>>>
>>>>>>      int * __typetag1 x __decltag1 __decltag2;
>>>>>
>>>>> Based on the above example
>>>>>            static int do_execve(struct filename *filename,
>>>>>              const char __user *const __user *__argv,
>>>>>              const char __user *const __user *__envp);
>>>>>
>>>>> Should the above example instead be the following?
>>>>>        int __typetag1 * x __decltag1 __decltag2
>>>>>
>>>> This example is not related to the one above. It is just meant to
>>>> show the behavior of both attributes. My apologies for not making
>>>> that clear.
>>>
>>> Okay, it should be fine if the dwarf debug_info is shown.
>>>
>>>>
>>>>>>
>>>>>> Produces the following DWARF information:
>>>>>>
>>>>>>     <1><1e>: Abbrev Number: 3 (DW_TAG_variable)
>>>>>>        <1f>   DW_AT_name        : x
>>>>>>        <21>   DW_AT_decl_file   : 1
>>>>>>        <22>   DW_AT_decl_line   : 7
>>>>>>        <23>   DW_AT_decl_column : 18
>>>>>>        <24>   DW_AT_type        : <0x49>
>>>>>>        <28>   DW_AT_external    : 1
>>>>>>        <28>   DW_AT_location    : 9 byte block: 3 0 0 0 0 0 0 0 0 (DW_OP_addr: 0)
>>>>>>        <32>   DW_AT_sibling     : <0x49>
>>>>>>     <2><36>: Abbrev Number: 1 (User TAG value: 0x6000)
>>>>>>        <37>   DW_AT_name        : (indirect string, offset: 0xd6): debug_annotate_decl
>>>>>>        <3b>   DW_AT_const_value : (indirect string, offset: 0xcd): decltag2
>>>>>>     <2><3f>: Abbrev Number: 1 (User TAG value: 0x6000)
>>>>>>        <40>   DW_AT_name        : (indirect string, offset: 0xd6): debug_annotate_decl
>>>>>>        <44>   DW_AT_const_value : (indirect string, offset: 0x0): decltag1
>>>>>>     <2><48>: Abbrev Number: 0
>>>>>>     <1><49>: Abbrev Number: 4 (DW_TAG_pointer_type)
>>>>>>        <4a>   DW_AT_byte_size   : 8
>>>>>>        <4b>   DW_AT_type        : <0x5d>
>>>>>>        <4f>   DW_AT_sibling     : <0x5d>
>>>>>>     <2><53>: Abbrev Number: 1 (User TAG value: 0x6000)
>>>>>>        <54>   DW_AT_name        : (indirect string, offset: 0x9): debug_annotate_type
>>>>>>        <58>   DW_AT_const_value : (indirect string, offset: 0x1d): typetag1
>>>>>>     <2><5c>: Abbrev Number: 0
>>>>>>     <1><5d>: Abbrev Number: 5 (DW_TAG_base_type)
>>>>>>        <5e>   DW_AT_byte_size   : 4
>>>>>>        <5f>   DW_AT_encoding    : 5      (signed)
>>>>>>        <60>   DW_AT_name        : int
>>>>>>     <1><64>: Abbrev Number: 0
>>>
>>> This shows the info in .debug_abbrev. What I mean is to show the
>>> related info in the .debug_info section, which seems more useful for
>>> understanding the relationships between different tags. Maybe this is
>>> because I do not fully understand what <1>/<2> means in <1><49> and
>>> <2><53> etc.
>> I think that dump actually shows .debug_info, with the abbrevs
>> expanded...
>> Anyway, it seems to us that the root of this problem is the fact that
>> the kernel sparse annotations, such as address_space(__user), are:
>> 1) To be processed by an external kernel-specific tool (
>>     https://sparse.docs.kernel.org/en/latest/annotations.html) and not a
>>     C compiler, and therefore,
>> 2) Not quite the same as compiler attributes (despite the way they
>>     look.)  In particular, they seem to assume an ordering different from
>>     that of GNU attributes: in some cases, given the same written order,
>>     they refer to different things!  Which is quite unfortunate :(
>
> Yes, currently __user/__kernel macros (implemented with address_space
> attribute) are processed by macros.
>
>> Now, if I understood properly, you plan to change the definition of
>> __user and __kernel in the kernel sources in order to generate the tag
>> compiler attributes, correct?
>
> Right. The original __user definition looks like:
>   # define __user         __attribute__((noderef, address_space(__user)))
>
> The new attribute looks like
>   # define BTF_TYPE_TAG(value) __attribute__((btf_type_tag(#value)))
>   #  define __user        BTF_TYPE_TAG(user)

Ok I see.  So the kernel will stop using sparse attributes to implement
__user and __kernel and start using compiler attributes for tags
instead.
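
A minimal sketch of what that switch might look like in a header,
assuming the macro names quoted above.  The __has_attribute fallback
and the count_user_strings helper are hypothetical, added only for
illustration; they are not taken from the kernel sources.  On a
compiler that does not know btf_type_tag, the macro expands to
nothing, so the declaration still compiles.

```c
#include <assert.h>
#include <stddef.h>

/* Older compilers may lack __has_attribute; treat every query as "no". */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* Tag macro as quoted in the thread, guarded so that compilers
   without btf_type_tag see an empty expansion (assumption). */
#if __has_attribute(btf_type_tag)
# define BTF_TYPE_TAG(value) __attribute__((btf_type_tag(#value)))
#else
# define BTF_TYPE_TAG(value)
#endif

#define __user BTF_TYPE_TAG(user)

/* Hypothetical helper with do_execve-style parameter tagging.
   The tags are meant to show up in debug info only; the loop
   itself is ordinary C. */
size_t count_user_strings(const char __user *const __user *argv)
{
    size_t n = 0;

    assert(argv != NULL);
    while (argv[n] != NULL)
        n++;
    return n;
}
```

Whether a given compiler attaches the tag to the pointer or to the
pointee in such a declaration is exactly the open question in this
thread, so the sketch makes no claim about the resulting DWARF or BTF.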

>> Is that the reason why LLVM implements what we assume to be the
>> sparse
>> ordering, and not the correct GNU attributes ordering, for the tag
>> attributes?
>
> Note that __user attributes apply to pointees and not pointers.
> Just like
>    const int *p;
> the 'const' is not applied to pointer 'p', but the pointee of 'p'.
>
> The current LLVM DWARF generation with
>    pointer
>      <--- btf_type_tag
> is just ONE implementation. As I said earlier, I am okay with
> having a DWARF implementation like
>    p->btf_type_tag->const->int.
> If you can propose an implementation like this in DWARF, I can propose
> to change the implementation in LLVM.

I think we are miscommunicating.

Looks like there is a divergence on what attributes apply to what
language entities between the sparse compiler and GCC/LLVM.  How to
represent that in DWARF is a different matter.

For this example:

  int __typetag1 * __typetag2 __typetag3 * g;

a) GCC associates __typetag1 with the pointer-to-pointer-to-int.
b) LLVM associates __typetag1 with pointer-to-int.

Where:

a) Is the expected behavior of a compiler attribute, as documented in
   the GCC manual.

b) Is presumably what the sparse compiler expects, but _not_ the
   ordering expected for a compiler GNU attribute.

So, if the kernel source __user and __kernel annotations (which
currently expand to sparse attributes) follow the sparse ordering, and
you want to implement __user and __kernel in terms of compiler
attributes instead (the annotation attributes) then you will have to:

1) Fix LLVM to implement the usual ordering for these attributes, and
2) Fix the kernel sources to use that ordering.

[Incidentally, the same applies to another "ex-sparse" attribute you
 have in the kernel and also implemented in LLVM with a weird ordering:
 the address_space attribute.]

For 2), it may be possible to write a Coccinelle script to generate the
patch...

Does this make sense?
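
For reference, the 'const' analogy quoted above can be checked with
standard C11 alone.  This is only a sketch of how ordinary qualifiers
bind, using hypothetical helper names; it says nothing about how GCC
or LLVM place the tag attributes, which is the point under dispute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when the expression has type "pointer to const int". */
#define IS_PTR_TO_CONST_INT(e) \
    _Generic((e), const int *: true, default: false)

/* True when the expression has type "pointer to const pointer to int". */
#define IS_PTR_TO_CONST_PTR(e) \
    _Generic((e), int *const *: true, default: false)

/* 'const' written before the '*' qualifies the pointee. */
bool qualifier_before_star_hits_pointee(void)
{
    const int *p = NULL;

    p = NULL; /* p itself is not const, so it can be reassigned */
    return IS_PTR_TO_CONST_INT(p);
}

/* 'const' written after the '*' qualifies the pointer object. */
bool qualifier_after_star_hits_pointer(void)
{
    int *const q = NULL;

    /* &q has type 'int *const *'; q is not a pointer to const int. */
    return IS_PTR_TO_CONST_PTR(&q) && !IS_PTR_TO_CONST_INT(q);
}

void check_qualifier_placement(void)
{
    assert(qualifier_before_star_hits_pointee());
    assert(qualifier_after_star_hits_pointer());
}
```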

>> If that is so, we have quite a problem here: I don't think we can
>> change
>> the way GCC handles GNU-like attributes just because the kernel sources
>> want to hook on these __user/__kernel sparse annotations to generate the
>> compiler tags, even if we could perhaps get GCC to handle
>> debug_annotate_type and debug_annotate_decl differently.  Some would say
>> doing so would perpetuate the mistake instead of fixing it...
>> Is my understanding correct?
>
> Let us just say that the btf_type_tag attribute applies to pointees.
> Does this help?
>
>> 
>>>>>
>>>>> Maybe you can also show what dwarf debug_info looks like
>>>> I am not sure what you mean. This is the .debug_info section as output
>>>> by readelf -w. I did trim some information not relevant to the discussion
>>>> such as the DW_TAG_compile_unit DIE, for brevity.
>>>>
>>>>>
>>>>>>
>>>>>> In the case of BTF, the annotations are recorded in two type kinds
>>>>>> recently added to the BTF specification: BTF_KIND_DECL_TAG and
>>>>>> BTF_KIND_TYPE_TAG. The above example declaration produces the following
>>>>>> BTF information:
>>>>>>
>>>>>> [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
>>>>>> [2] PTR '(anon)' type_id=3
>>>>>> [3] TYPE_TAG 'typetag1' type_id=1
>>>>>> [4] DECL_TAG 'decltag1' type_id=6 component_idx=-1
>>>>>> [5] DECL_TAG 'decltag2' type_id=6 component_idx=-1
>>>>>> [6] VAR 'x' type_id=2, linkage=global
>>>>>> [7] DATASEC '.bss' size=0 vlen=1
>>>>>>  type_id=6 offset=0 size=8 (VAR 'x')
>>>>>>
>>>>>>
>>>>> [...]
