> On 6/17/22 10:18 AM, Jose E. Marchesi wrote: >> Hi Yonghong. >> >>> On 6/15/22 1:57 PM, David Faust wrote: >>>> >>>> On 6/14/22 22:53, Yonghong Song wrote: >>>>> >>>>> >>>>> On 6/7/22 2:43 PM, David Faust wrote: >>>>>> Hello, >>>>>> >>>>>> This patch series adds support for: >>>>>> >>>>>> - Two new C-language-level attributes that allow to associate (to >>>>>> "annotate" or >>>>>> to "tag") particular declarations and types with arbitrary strings. >>>>>> As >>>>>> explained below, this is intended to be used to, for example, >>>>>> characterize >>>>>> certain pointer types. >>>>>> >>>>>> - The conveyance of that information in the DWARF output in the form of >>>>>> a new >>>>>> DIE: DW_TAG_GNU_annotation. >>>>>> >>>>>> - The conveyance of that information in the BTF output in the form of >>>>>> two new >>>>>> kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG. >>>>>> >>>>>> All of these facilities are being added to the eBPF ecosystem, and >>>>>> support for >>>>>> them exists in some form in LLVM. >>>>>> >>>>>> Purpose >>>>>> ======= >>>>>> >>>>>> 1) Addition of C-family language constructs (attributes) to specify >>>>>> free-text >>>>>> tags on certain language elements, such as struct fields. >>>>>> >>>>>> The purpose of these annotations is to provide additional >>>>>> information about >>>>>> types, variables, and function parameters of interest to the >>>>>> kernel. A >>>>>> driving use case is to tag pointer types within the linux kernel >>>>>> and eBPF >>>>>> programs with additional semantic information, such as '__user' >>>>>> or '__rcu'. >>>>>> >>>>>> For example, consider the linux kernel function do_execve with the >>>>>> following declaration: >>>>>> >>>>>> static int do_execve(struct filename *filename, >>>>>> const char __user *const __user *__argv, >>>>>> const char __user *const __user *__envp); >>>>>> >>>>>> Here, __user could be defined with these annotations to record >>>>>> semantic >>>>>> information about the pointer parameters (e.g., they are >>>>>> user-provided) in >>>>>> DWARF and BTF information. Other kernel facilites such as the >>>>>> eBPF verifier >>>>>> can read the tags and make use of the information. >>>>>> >>>>>> 2) Conveying the tags in the generated DWARF debug info. >>>>>> >>>>>> The main motivation for emitting the tags in DWARF is that the >>>>>> Linux kernel >>>>>> generates its BTF information via pahole, using DWARF as a source: >>>>>> >>>>>> +--------+ BTF BTF +----------+ >>>>>> | pahole |-------> vmlinux.btf ------->| verifier | >>>>>> +--------+ +----------+ >>>>>> ^ ^ >>>>>> | | >>>>>> DWARF | BTF | >>>>>> | | >>>>>> vmlinux +-------------+ >>>>>> module1.ko | BPF program | >>>>>> module2.ko +-------------+ >>>>>> ... >>>>>> >>>>>> This is because: >>>>>> >>>>>> a) Unlike GCC, LLVM will only generate BTF for BPF programs. >>>>>> >>>>>> b) GCC can generate BTF for whatever target with -gbtf, but >>>>>> there is no >>>>>> support for linking/deduplicating BTF in the linker. >>>>>> >>>>>> In the scenario above, the verifier needs access to the pointer >>>>>> tags of >>>>>> both the kernel types/declarations (conveyed in the DWARF and >>>>>> translated >>>>>> to BTF by pahole) and those of the BPF program (available >>>>>> directly in BTF). >>>>>> >>>>>> Another motivation for having the tag information in DWARF, >>>>>> unrelated to >>>>>> BPF and BTF, is that the drgn project (another DWARF consumer) >>>>>> also wants >>>>>> to benefit from these tags in order to differentiate between >>>>>> different >>>>>> kinds of pointers in the kernel. >>>>>> >>>>>> 3) Conveying the tags in the generated BTF debug info. >>>>>> >>>>>> This is easy: the main purpose of having this info in BTF is for >>>>>> the >>>>>> compiled eBPF programs. The kernel verifier can then access the >>>>>> tags >>>>>> of pointers used by the eBPF programs. >>>>>> >>>>>> >>>>>> For more information about these tags and the motivation behind them, >>>>>> please >>>>>> refer to the following linux kernel discussions: >>>>>> >>>>>> https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/ >>>>>> https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/ >>>>>> https://lore.kernel.org/bpf/20211112012604.1504583-1-...@fb.com/ >>>>>> >>>>>> >>>>>> Implementation Overview >>>>>> ======================= >>>>>> >>>>>> To enable these annotations, two new C language attributes are added: >>>>>> __attribute__((debug_annotate_decl("foo"))) and >>>>>> __attribute__((debug_annotate_type("bar"))). Both attributes accept a >>>>>> single >>>>>> arbitrary string constant argument, which will be recorded in the >>>>>> generated >>>>>> DWARF and/or BTF debug information. They have no effect on code >>>>>> generation. >>>>>> >>>>>> Note that we are not using the same attribute names as LLVM >>>>>> (btf_decl_tag and >>>>>> btf_type_tag, respectively). While these attributes are functionally very >>>>>> similar, they have grown beyond purely BTF-specific uses, so inclusion >>>>>> of "btf" >>>>>> in the attribute name seems misleading. >>>>>> >>>>>> DWARF support is enabled via a new DW_TAG_GNU_annotation. When >>>>>> generating DWARF, >>>>>> declarations and types will be checked for the corresponding attributes. >>>>>> If >>>>>> present, a DW_TAG_GNU_annotation DIE will be created as a child of the >>>>>> DIE for >>>>>> the annotated type or declaration, one for each tag. These DIEs link the >>>>>> arbitrary tag value to the item they annotate. >>>>>> >>>>>> For example, the following variable declaration: >>>>>> >>>>>> #define __typetag1 __attribute__((debug_annotate_type ("typetag1"))) >>>>>> >>>>>> #define __decltag1 __attribute__((debug_annotate_decl ("decltag1"))) >>>>>> #define __decltag2 __attribute__((debug_annotate_decl ("decltag2"))) >>>>>> >>>>>> int * __typetag1 x __decltag1 __decltag2; >>>>> >>>>> Based on the above example >>>>> static int do_execve(struct filename *filename, >>>>> const char __user *const __user *__argv, >>>>> const char __user *const __user *__envp); >>>>> >>>>> Should the above example should be the below? >>>>> int __typetag1 * x __decltag1 __decltag2 >>>>> >>>> This example is not related to the one above. It is just meant to >>>> show the behavior of both attributes. My apologies for not making >>>> that clear. >>> >>> Okay, it should be fine if the dwarf debug_info is shown. >>> >>>> >>>>>> >>>>>> Produces the following DWARF information: >>>>>> >>>>>> <1><1e>: Abbrev Number: 3 (DW_TAG_variable) >>>>>> <1f> DW_AT_name : x >>>>>> <21> DW_AT_decl_file : 1 >>>>>> <22> DW_AT_decl_line : 7 >>>>>> <23> DW_AT_decl_column : 18 >>>>>> <24> DW_AT_type : <0x49> >>>>>> <28> DW_AT_external : 1 >>>>>> <28> DW_AT_location : 9 byte block: 3 0 0 0 0 0 0 0 0 >>>>>> (DW_OP_addr: 0) >>>>>> <32> DW_AT_sibling : <0x49> >>>>>> <2><36>: Abbrev Number: 1 (User TAG value: 0x6000) >>>>>> <37> DW_AT_name : (indirect string, offset: 0xd6): >>>>>> debug_annotate_decl >>>>>> <3b> DW_AT_const_value : (indirect string, offset: 0xcd): >>>>>> decltag2 >>>>>> <2><3f>: Abbrev Number: 1 (User TAG value: 0x6000) >>>>>> <40> DW_AT_name : (indirect string, offset: 0xd6): >>>>>> debug_annotate_decl >>>>>> <44> DW_AT_const_value : (indirect string, offset: 0x0): >>>>>> decltag1 >>>>>> <2><48>: Abbrev Number: 0 >>>>>> <1><49>: Abbrev Number: 4 (DW_TAG_pointer_type) >>>>>> <4a> DW_AT_byte_size : 8 >>>>>> <4b> DW_AT_type : <0x5d> >>>>>> <4f> DW_AT_sibling : <0x5d> >>>>>> <2><53>: Abbrev Number: 1 (User TAG value: 0x6000) >>>>>> <54> DW_AT_name : (indirect string, offset: 0x9): >>>>>> debug_annotate_type >>>>>> <58> DW_AT_const_value : (indirect string, offset: 0x1d): >>>>>> typetag1 >>>>>> <2><5c>: Abbrev Number: 0 >>>>>> <1><5d>: Abbrev Number: 5 (DW_TAG_base_type) >>>>>> <5e> DW_AT_byte_size : 4 >>>>>> <5f> DW_AT_encoding : 5 (signed) >>>>>> <60> DW_AT_name : int >>>>>> <1><64>: Abbrev Number: 0 >>> >>> This shows the info in .debug_abbrev. What I mean is to >>> show the related info in .debug_info section which seems more useful to >>> understand the relationships between different tags. Maybe this is due >>> to that I am not fully understanding what <1>/<2> means in <1><49> and >>> <2><53> etc. >> I think that dump actually shows .debug_info, with the abbrevs >> expanded... >> Anyway, it seems to us that the root of this problem is the fact the >> kernel sparse annotations, such as address_space(__user), are: >> 1) To be processed by an external kernel-specific tool ( >> https://sparse.docs.kernel.org/en/latest/annotations.html) and not a >> C compiler, and therefore, >> 2) Not quite the same than compiler attributes (despite the way they >> look.) In particular, they seem to assume an ordering different than >> of GNU attributes: in some cases given the same written order, they >> refer to different things!. Which is quite unfortunate :( > > Yes, currently __user/__kernel macros (implemented with address_space > attribute) are processed by macros. > >> Now, if I understood properly, you plan to change the definition of >> __user and __kernel in the kernel sources in order to generate the tag >> compiler attributes, correct? > > Right. The original __user definition likes: > # define __user __attribute__((noderef, address_space(__user))) > > The new attribute looks like > # define BTF_TYPE_TAG(value) __attribute__((btf_type_tag(#value))) > # define __user BTF_TYPE_TAG(user)
Ok I see. So the kernel will stop using sparse attributes to implement __user and __kernel and start using compiler attributes for tags instead. >> Is that the reason why LLVM implements what we assume to be the >> sparse >> ordering, and not the correct GNU attributes ordering, for the tag >> attributes? > > Note that __user attributes apply to pointee's and not pointers. > Just like > const int *p; > the 'const' is not applied to pointer 'p', but the pointee of 'p'. > > What current llvm dwarf generation with > pointer > <--- btf_type_tag > is just ONE implementation. As I said earlier, I am okay to > have dwarf implementation like > p->btf_type_tag->const->int. > If you can propose an implementation like this in dwarf. I can propose > to change implementation in llvm. I think we are miscommunicating. Looks like there is a divergence on what attributes apply to what language entities between the sparse compiler and GCC/LLVM. How to represent that in DWARF is a different matter. For this example: int __typetag1 * __typetag2 __typetag3 * g; a) GCC associates __typetag1 with the pointer-to-pointer-to-int. b) LLVM associates __typetag1 to pointer-to-int. Where: a) Is the expected behavior of a compiler attributes, as documented in the GCC manual. b) Is presumably what the sparse compiler expects, but _not_ the ordering expected for a compiler GNU attribute. So, if the kernel source __user and __kernel annotations (which currently expand to sparse attributes) follow the sparse ordering, and you want to implement __user and __kernel in terms of compiler attributes instead (the annotation attributes) then you will have to: 1) Fix LLVM to implement the usual ordering for these attributes and 2) fix the kernel sources to use that ordering [Incidentally, the same applies to another "ex-sparse" attribute you have in the kernel and also implemented in LLVM with a weird ordering: the address_space attribute.] For 2), it may be possible to write a coccinnelle script to generate the patch... Does this make sense? >> If that is so, we have quite a problem here: I don't think we can >> change >> the way GCC handles GNU-like attributes just because the kernel sources >> want to hook on these __user/__kernel sparse annotations to generate the >> compiler tags, even if we could mayhaps get GCC to handle >> debug_annotate_type and debug_annotate_decl differently. Some would say >> doing so would perpetuate the mistake instead of fixing it... >> Is my understanding correct? > > Let us just say that the btf_type_tag attribute applies to pointees. > Does this help? > >> >>>>> >>>>> Maybe you can also show what dwarf debug_info looks like >>>> I am not sure what you mean. This is the .debug_info section as output >>>> by readelf -w. I did trim some information not relevant to the discussion >>>> such as the DW_TAG_compile_unit DIE, for brevity. >>>> >>>>> >>>>>> >>>>>> In the case of BTF, the annotations are recorded in two type kinds >>>>>> recently >>>>>> added to the BTF specification: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG. >>>>>> The above example declaration prodcues the following BTF information: >>>>>> >>>>>> [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED >>>>>> [2] PTR '(anon)' type_id=3 >>>>>> [3] TYPE_TAG 'typetag1' type_id=1 >>>>>> [4] DECL_TAG 'decltag1' type_id=6 component_idx=-1 >>>>>> [5] DECL_TAG 'decltag2' type_id=6 component_idx=-1 >>>>>> [6] VAR 'x' type_id=2, linkage=global >>>>>> [7] DATASEC '.bss' size=0 vlen=1 >>>>>> type_id=6 offset=0 size=8 (VAR 'x') >>>>>> >>>>>> >>>>> [...]