Hi, > On Tue, Nov 15, 2022 at 2:04 AM TATSUKAWA KOSUKE(立川 江介) > <tatsu-...@nec.com> wrote: >> >> Hi, >> >> I was using nfsslower in bcc tools and noticed that it doesn't print out >> slow commit operations, which sometimes cause long delays in NFS writes. >> I tried adding code to trace slow commit operations and created the >> attached patch. > > The concept and code looks okay to me. One thing is you could use > RAW_TRACEPOINT_PROBE macro to be consistent with some other > bcc tools raw tracepoint usages. > > Please submit a pull request.
thank you for the reply. I'll change the code to use RAW_TRACEPOINT_PROBE macro used in other bcc tools. While testing on some old environments which doesn't support raw tracepoints, I noticed that linux/nfs_fs.h header file causes bcc to print the error message "use of undeclared identifier 'KBUILD_MODNAME'". This problem seems to affect other bcc tools, which #define this symbol to work around it. I'll submit a pull request once I've done a little more testing. Best regards. >> Please let me hear your opinions. >> >> NFS commits are quite slow and are not performed often, so the patch >> shouldn't incur visible overhead. >> >> Below is a sample output from the modified tool when writing a large >> file using dd command. Most commits are issued from kworker threads, >> so the commit output may be mostly dropped when filtered by PID. >> >> TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME >> ... >> 16:46:11 dd 1212 W 1048576 651264 46.58 testfile >> 16:46:11 dd 1212 W 1048576 653312 54.41 testfile >> 16:46:11 dd 1212 W 1048576 654336 18.96 testfile >> 16:46:11 dd 1212 W 1048576 655360 49.05 testfile >> 16:46:11 dd 1212 W 1048576 657408 82.96 testfile >> 16:46:11 dd 1212 W 1048576 659456 109.25 testfile >> 16:46:12 dd 1212 W 1048576 660480 163.55 testfile >> 16:46:12 dd 1212 W 1048576 662528 205.44 testfile >> 16:46:13 dd 1212 W 1048576 663552 751.02 testfile >> 16:46:37 kworker/u8:5 1207 C 0 0 27449.05 testfile >> 16:46:37 kworker/u8:5 1207 C 0 0 26725.16 testfile >> 16:46:37 kworker/u8:5 1207 C 0 0 27592.04 testfile >> 16:46:37 kworker/u8:4 1206 C 0 0 22188.75 testfile >> 16:46:37 kworker/u8:4 1206 C 0 0 26092.59 testfile >> 16:46:37 kworker/u8:4 1206 C 0 0 27268.90 testfile >> 16:46:37 kworker/u8:4 1206 C 0 0 22303.24 testfile >> 16:46:37 kworker/u8:4 1206 C 0 0 27081.34 testfile >> 16:46:37 dd 1212 W 1048576 664576 24337.80 testfile >> 16:46:38 dd 1212 W 1048576 958464 61.77 testfile >> 16:46:38 dd 1212 W 1048576 960512 56.60 testfile >> 16:46:38 dd 1212 W 1048576 963584 55.75 testfile >> 16:46:38 dd 1212 W 1048576 965632 54.84 testfile >> ... >> >> Best regards. >> --- >> Kosuke TATSUKAWA | 1st Platform Software Division >> | NEC Solution Innovators >> | tatsu-...@nec.com >> >> ------------------------------------------------------------------------ >> diff -rup a/tools/nfsslower.py b/tools/nfsslower.py >> --- a/tools/nfsslower 2022-11-14 12:39:23.677073240 +0900 >> +++ b/tools/nfsslower 2022-11-14 14:44:49.960073240 +0900 >> @@ -9,6 +9,8 @@ >> # This script traces some common NFS operations: read, write, opens and >> # getattr. It measures the time spent in these operations, and prints >> details >> # for each that exceeded a threshold. >> +# The script also traces commit operations, which is specific to nfs and >> could >> +# be pretty slow. >> # >> # WARNING: This adds low-overhead instrumentation to these NFS operations, >> # including reads and writes from the file system cache. Such reads and >> writes >> @@ -25,6 +27,8 @@ >> # Currently there aren't any tracepoints available for nfs_read_file, >> # nfs_write_file and nfs_open_file, nfs_getattr does have entry and exit >> # tracepoints but we chose to use kprobes for consistency >> +# Raw tracepoints are used to trace nfs:nfs_initiate_commit and >> +# nfs:nfs_commit_done. >> # >> # 31-Aug-2017 Samuel Nair created this. Should work with NFSv{3,4} >> >> @@ -41,8 +45,8 @@ examples = """ >> ./nfsslower -p 121 # trace pid 121 only >> """ >> parser = argparse.ArgumentParser( >> - description="""Trace READ, WRITE, OPEN \ >> -and GETATTR NFS calls slower than a threshold,\ >> + description="""Trace READ, WRITE, OPEN, GETATTR \ >> +and COMMIT NFS calls slower than a threshold,\ >> supports NFSv{3,4}""", >> formatter_class=argparse.RawDescriptionHelpFormatter, >> epilog=examples) >> @@ -66,11 +70,13 @@ bpf_text = """ >> #include <linux/fs.h> >> #include <linux/sched.h> >> #include <linux/dcache.h> >> +#include <linux/nfs_fs.h> >> >> #define TRACE_READ 0 >> #define TRACE_WRITE 1 >> #define TRACE_OPEN 2 >> #define TRACE_GETATTR 3 >> +#define TRACE_COMMIT 4 >> >> struct val_t { >> u64 ts; >> @@ -79,6 +85,12 @@ struct val_t { >> struct dentry *d; >> }; >> >> +struct commit_t { >> + u64 ts; >> + u64 offset; >> + u64 count; >> +}; >> + >> struct data_t { >> // XXX: switch some to u32's when supported >> u64 ts_us; >> @@ -93,6 +105,7 @@ struct data_t { >> >> BPF_HASH(entryinfo, u64, struct val_t); >> BPF_PERF_OUTPUT(events); >> +BPF_HASH(commitinfo, u64, struct commit_t); >> >> int trace_rw_entry(struct pt_regs *ctx, struct kiocb *iocb, >> struct iov_iter *data) >> @@ -227,6 +240,56 @@ int trace_getattr_return(struct pt_regs >> return trace_exit(ctx, TRACE_GETATTR); >> } >> >> +int raw_nfs_initiate_commit(struct bpf_raw_tracepoint_args *ctx) { >> + u64 key = (u64)ctx->args[0]; >> + struct nfs_commit_data *cd = (struct nfs_commit_data *)key; >> + struct commit_t c = { 0 }; >> + >> + c.ts = bpf_ktime_get_ns(); >> + bpf_probe_read_kernel(&c.offset, sizeof(cd->args.offset), >> + &cd->args.offset); >> + bpf_probe_read_kernel(&c.count, sizeof(cd->args.count), >> &cd->args.count); >> + commitinfo.update(&key, &c); >> + return 0; >> +} >> + >> +int raw_nfs_commit_done(struct bpf_raw_tracepoint_args *ctx) { >> + u64 key = (u64)ctx->args[1]; >> + struct commit_t *cp = commitinfo.lookup(&key); >> + >> + if (cp) { >> + struct nfs_commit_data *cd = (struct nfs_commit_data *)key; >> + struct nfs_open_context *p; >> + struct dentry *de; >> + struct qstr qs; >> + u64 ts = bpf_ktime_get_ns(); >> + u64 delta_us = (ts - cp->ts) / 1000; >> + u32 pid = bpf_get_current_pid_tgid() >> 32; >> + struct data_t data = {.type = TRACE_COMMIT, .offset = cp->offset, >> + .size = cp->count, .ts_us = ts/1000, .delta_us = delta_us, >> + .pid = pid}; >> + >> + commitinfo.delete(&key); >> + bpf_get_current_comm(&data.task, sizeof(data.task)); >> + >> + if(FILTER_PID) >> + return 0; >> + >> + if (FILTER_US) >> + return 0; >> + >> + bpf_probe_read_kernel(&p, sizeof(p), &cd->context); >> + bpf_probe_read_kernel(&de, sizeof(de), &p->dentry); >> + bpf_probe_read_kernel(&qs, sizeof(qs), &de->d_name); >> + if (qs.len) { >> + bpf_probe_read_kernel(&data.file, sizeof(data.file), >> + (void *)qs.name); >> + events.perf_submit(ctx, &data, sizeof(data)); >> + } >> + } >> + return 0; >> +} >> + >> """ >> if min_ms == 0: >> bpf_text = bpf_text.replace('FILTER_US', '0') >> @@ -253,6 +316,8 @@ def print_event(cpu, data, size): >> type = 'O' >> elif event.type == 3: >> type = 'G' >> + elif event.type == 4: >> + type = 'C' >> >> if(csv): >> print("%d,%s,%d,%s,%d,%d,%d,%s" % ( >> @@ -273,7 +338,18 @@ def print_event(cpu, data, size): >> # Currently specifically works for NFSv4, the other kprobes are generic >> # so it should work with earlier NFS versions >> >> -b = BPF(text=bpf_text) >> +# The following warning is shown on kernels after linux-5.18 when using bcc. >> +# Add compile option to silence it. >> +# In file included from /virtual/main.c:7: >> +# In file included from include/linux/nfs_fs.h:31: >> +# In file included from include/linux/sunrpc/auth.h:13: >> +# In file included from include/linux/sunrpc/sched.h:19: >> +# include/linux/sunrpc/xdr.h:751:10: warning: result of comparison of >> constant 4611686018427387903 with expression of type '__u32' (aka 'unsigned >> int') is always false [-Wtautological-constant-out-of-range-compare] >> +# if (len > SIZE_MAX / sizeof(*p)) >> +# ~~~ ^ ~~~~~~~~~~~~~~~~~~~~~ >> +# 1 warning generated. >> +b = BPF(text=bpf_text, >> + cflags=["-Wno-tautological-constant-out-of-range-compare"]) >> b.attach_kprobe(event="nfs_file_read", fn_name="trace_rw_entry") >> b.attach_kprobe(event="nfs_file_write", fn_name="trace_rw_entry") >> b.attach_kprobe(event="nfs4_file_open", fn_name="trace_file_open_entry") >> @@ -286,6 +362,9 @@ b.attach_kretprobe(event="nfs4_file_open >> b.attach_kretprobe(event="nfs_file_open", fn_name="trace_file_open_return") >> b.attach_kretprobe(event="nfs_getattr", fn_name="trace_getattr_return") >> >> +b.attach_raw_tracepoint("nfs_initiate_commit", "raw_nfs_initiate_commit") >> +b.attach_raw_tracepoint("nfs_commit_done", "raw_nfs_commit_done") >> + >> if(csv): >> print("ENDTIME_us,TASK,PID,TYPE,BYTES,OFFSET_b,LATENCY_us,FILE") >> else: >> >> --- Kosuke TATSUKAWA | 1st Platform Software Division | NEC Solution Innovators | tatsu-...@nec.com -=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#2025): https://lists.iovisor.org/g/iovisor-dev/message/2025 Mute This Topic: https://lists.iovisor.org/mt/95039760/21656 Group Owner: iovisor-dev+ow...@lists.iovisor.org Unsubscribe: https://lists.iovisor.org/g/iovisor-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-