On Thu, May 14, 2020 at 12:29 AM Yonghong Song <y...@fb.com> wrote: > > > > On 5/12/20 11:51 PM, Daniel T. Lee wrote: > > On Wed, May 13, 2020 at 10:40 AM Yonghong Song <y...@fb.com> wrote: > >> > >> > >> > >> On 5/12/20 7:43 AM, Daniel T. Lee wrote: > >>> Currently, the kprobe BPF program attachment method for bpf_load is > >>> quite old. The implementation of bpf_load "directly" controls and > >>> manages(create, delete) the kprobe events of DEBUGFS. On the other hand, > >>> using using the libbpf automatically manages the kprobe event. > >>> (under bpf_link interface) > >>> > >>> By calling bpf_program__attach(_kprobe) in libbpf, the corresponding > >>> kprobe is created and the BPF program will be attached to this kprobe. > >>> To remove this, by simply invoking bpf_link__destroy will clean up the > >>> event. > >>> > >>> This commit refactors kprobe tracing programs (tracex{1~7}_user.c) with > >>> libbpf using bpf_link interface and bpf_program__attach. > >>> > >>> tracex2_kern.c, which tracks system calls (sys_*), has been modified to > >>> append prefix depending on architecture. > >>> > >>> Signed-off-by: Daniel T. Lee <danieltim...@gmail.com> > >>> --- > >>> samples/bpf/Makefile | 12 +++---- > >>> samples/bpf/tracex1_user.c | 41 ++++++++++++++++++++---- > >>> samples/bpf/tracex2_kern.c | 8 ++++- > >>> samples/bpf/tracex2_user.c | 55 ++++++++++++++++++++++++++------ > >>> samples/bpf/tracex3_user.c | 65 ++++++++++++++++++++++++++++---------- > >>> samples/bpf/tracex4_user.c | 55 +++++++++++++++++++++++++------- > >>> samples/bpf/tracex6_user.c | 53 +++++++++++++++++++++++++++---- > >>> samples/bpf/tracex7_user.c | 43 ++++++++++++++++++++----- > >>> 8 files changed, 268 insertions(+), 64 deletions(-) > >>> > >>> diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile > >>> index 424f6fe7ce38..4c91e5914329 100644 > >>> --- a/samples/bpf/Makefile > >>> +++ b/samples/bpf/Makefile > >>> @@ -64,13 +64,13 @@ fds_example-objs := fds_example.o > >>> sockex1-objs := sockex1_user.o > >>> sockex2-objs := sockex2_user.o > >>> sockex3-objs := bpf_load.o sockex3_user.o > >>> -tracex1-objs := bpf_load.o tracex1_user.o $(TRACE_HELPERS) > >>> -tracex2-objs := bpf_load.o tracex2_user.o > >>> -tracex3-objs := bpf_load.o tracex3_user.o > >>> -tracex4-objs := bpf_load.o tracex4_user.o > >>> +tracex1-objs := tracex1_user.o $(TRACE_HELPERS) > >>> +tracex2-objs := tracex2_user.o > >>> +tracex3-objs := tracex3_user.o > >>> +tracex4-objs := tracex4_user.o > >>> tracex5-objs := bpf_load.o tracex5_user.o $(TRACE_HELPERS) > >>> -tracex6-objs := bpf_load.o tracex6_user.o > >>> -tracex7-objs := bpf_load.o tracex7_user.o > >>> +tracex6-objs := tracex6_user.o > >>> +tracex7-objs := tracex7_user.o > >>> test_probe_write_user-objs := bpf_load.o test_probe_write_user_user.o > >>> trace_output-objs := bpf_load.o trace_output_user.o $(TRACE_HELPERS) > >>> lathist-objs := bpf_load.o lathist_user.o > >>> diff --git a/samples/bpf/tracex1_user.c b/samples/bpf/tracex1_user.c > >>> index 55fddbd08702..1b15ab98f7d3 100644 > >>> --- a/samples/bpf/tracex1_user.c > >>> +++ b/samples/bpf/tracex1_user.c > >>> @@ -1,21 +1,45 @@ > >>> // SPDX-License-Identifier: GPL-2.0 > >>> #include <stdio.h> > >>> -#include <linux/bpf.h> > >>> #include <unistd.h> > >>> -#include <bpf/bpf.h> > >>> -#include "bpf_load.h" > >>> +#include <bpf/libbpf.h> > >>> #include "trace_helpers.h" > >>> > >>> +#define __must_check > >> > >> This is not very user friendly. > >> Maybe not including linux/err.h and > >> use libbpf API libbpf_get_error() instead? > >> > > > > This approach looks more apparent and can stick with the libbpf API. > > I'll update code using this way. > > > >>> +#include <linux/err.h> > >>> + > >>> int main(int ac, char **argv) > >>> { > >>> - FILE *f; > >>> + struct bpf_link *link = NULL; > >>> + struct bpf_program *prog; > >>> + struct bpf_object *obj; > >>> char filename[256]; > >>> + FILE *f; > >>> > >>> snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]); > >>> + obj = bpf_object__open_file(filename, NULL); > >>> + if (IS_ERR(obj)) { > >>> + fprintf(stderr, "ERROR: opening BPF object file failed\n"); > >>> + obj = NULL; > >>> + goto cleanup; > >> > >> You do not need to goto cleanup, directly return 0 is okay here. > >> The same for other files in this patch. > >> > > > > As you said, it would be better to return right away than to proceed > > any further. I'll apply the code at next patch. > > > >>> + } > >>> + > >>> + prog = bpf_object__find_program_by_name(obj, "bpf_prog1"); > >>> + if (!prog) { > >>> + fprintf(stderr, "ERROR: finding a prog in obj file > >>> failed\n"); > >>> + goto cleanup; > >>> + } > >>> + > >>> + /* load BPF program */ > >>> + if (bpf_object__load(obj)) { > >>> + fprintf(stderr, "ERROR: loading BPF object file failed\n"); > >>> + goto cleanup; > >>> + } > >>> > >>> - if (load_bpf_file(filename)) { > >>> - printf("%s", bpf_log_buf); > >>> - return 1; > >>> + link = bpf_program__attach(prog); > >>> + if (IS_ERR(link)) { > >>> + fprintf(stderr, "ERROR: bpf_program__attach failed\n"); > >>> + link = NULL; > >>> + goto cleanup; > >>> } > >>> > >>> f = popen("taskset 1 ping -c5 localhost", "r"); > >>> @@ -23,5 +47,8 @@ int main(int ac, char **argv) > >>> > >>> read_trace_pipe(); > >>> > >>> +cleanup: > >>> + bpf_link__destroy(link); > >>> + bpf_object__close(obj); > >> > >> Typically in kernel, we do multiple labels for such cases > >> like > >> destroy_link: > >> bpf_link__destroy(link); > >> close_object: > >> bpf_object__close(obj); > >> > >> The error path in the main() function jumps to proper label. > >> This is more clean and less confusion. > >> > >> The same for other cases in this file. > >> > > > > I totally agree that multiple labels are much more intuitive. > > But It's not very common to jump to the destroy_link label. > > > > Either when on the routine is completed successfully and jumps to the > > destroy_link branch, or an error occurred while bpf_program__attach > > was called "several" times and jumps to the destroy_link branch. > > > > Single bpf_program__attach like this tracex1 sample doesn't really have > > to destroy link, since the link has been set to NULL on attach error and > > bpf_link__destroy() is designed to do nothing if passed NULL to it. > > > > So I think current approach will keep consistent between samples since > > most of the sample won't need to jump to destroy_link. > > Since this is the sample code, I won't enforce that. So yes, you can > keep your current approach. > > > > >>> return 0; > >>> } > >>> diff --git a/samples/bpf/tracex2_kern.c b/samples/bpf/tracex2_kern.c > >>> index d865bb309bcb..ff5d00916733 100644 > >>> --- a/samples/bpf/tracex2_kern.c > >>> +++ b/samples/bpf/tracex2_kern.c > >>> @@ -11,6 +11,12 @@ > >>> #include <bpf/bpf_helpers.h> > >>> #include <bpf/bpf_tracing.h> > >>> > >>> +#ifdef __x86_64__ > >>> +#define SYSCALL "__x64_" > >>> +#else > >>> +#define SYSCALL > >>> +#endif > >> > >> See test_progs.h, one more case to handle: > >> #ifdef __x86_64__ > >> #define SYS_NANOSLEEP_KPROBE_NAME "__x64_sys_nanosleep" > >> #elif defined(__s390x__) > >> #define SYS_NANOSLEEP_KPROBE_NAME "__s390x_sys_nanosleep" > >> #else > >> #define SYS_NANOSLEEP_KPROBE_NAME "sys_nanosleep" > >> #endif > >> > > > > That was also one of the considerations when writing patches. > > I'm planning to refactor most of the programs in the sample using > > libbpf, and found out that there are bunch of samples that tracks > > syscall with kprobe. Replacing all of them will take lots of macros > > and I thought using prefix will be better idea. > > > > Actually, my initial plan was to create macro of SYSCALL() > > > > #ifdef __x86_64__ > > #define PREFIX "__x64_" > > #elif defined(__s390x__) > > #define PREFIX "__s390x_" > > #else > > #define PREFIX "" > > #endif > > > > #define SYSCALL(SYS) PREFIX ## SYS > > > > And to use this macro universally without creating additional headers, > > I was trying to add this to samples/bpf/syscall_nrs.c which later > > compiles to samples/bpf/syscall_nrs.h. But it was pretty hacky way and > > it won't work properly. So I ended up with just adding prefix to syscall. > > I think it is okay to create a trace_common.h to have this definition > defined in one place and use them in bpf programs. > > > > > Is it necessary to define all of the macro for each architecture? > > Yes, if we define in trace_common.h, let us do for x64/x390x/others > similar to the above. >
Sounds great. I'll add trace_common.h and apply the syscall macro. I'll send a new version of the patch soon. > > > >>> + > >>> struct bpf_map_def SEC("maps") my_map = { > >>> .type = BPF_MAP_TYPE_HASH, > >>> .key_size = sizeof(long), > >>> @@ -77,7 +83,7 @@ struct bpf_map_def SEC("maps") my_hist_map = { > >>> .max_entries = 1024, > >>> }; > >>> > >>> -SEC("kprobe/sys_write") > >>> +SEC("kprobe/" SYSCALL "sys_write") > >>> int bpf_prog3(struct pt_regs *ctx) > >>> { > >>> long write_size = PT_REGS_PARM3(ctx); > >> [...] > > > > > > Thank you for your time and effort for the review :) > > > > Best, > > Daniel > > Thank you for your time and effort for the review :) -- Best, Daniel T. Lee