On Thu, 2017-02-16 at 22:24 +0100, Daniel Borkmann wrote:
> Long standing issue with JITed programs is that stack traces from
> function tracing check whether a given address is kernel code
> through {__,}kernel_text_address(), which checks for code in core
> kernel, modules and dynamically allocated ftrace trampolines. But
> what is still missing is BPF JITed programs (interpreted programs
> are not an issue as __bpf_prog_run() will be attributed to them),
> thus when a stack trace is triggered, the code walking the stack
> won't see any of the JITed ones. The same for address correlation
> done from user space via reading /proc/kallsyms. This is read by
> tools like perf, but the latter is also useful for permanent live
> tracing with eBPF itself in combination with stack maps when other
> eBPF types are part of the callchain. See offwaketime example on
> dumping stack from a map.
>
> This work tries to tackle that issue by making the addresses and
> symbols known to the kernel. The lookup from *kernel_text_address()
> is implemented through a latched RB tree that can be read under
> RCU in fast-path that is also shared for symbol/size/offset lookup
> for a specific given address in kallsyms. The slow-path iteration
> through all symbols in the seq file done via RCU list, which holds
> a tiny fraction of all exported ksyms, usually below 0.1 percent.
> Function symbols are exported as bpf_prog_<tag>, in order to aide
> debugging and attribution. This facility is currently enabled for
> root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
> is active in any mode. The rationale behind this is that still a lot
> of systems ship with world read permissions on kallsyms thus addresses
> should not get suddenly exposed for them. If that situation gets
> much better in future, we always have the option to change the
> default on this. Likewise, unprivileged programs are not allowed
> to add entries there either, but that is less of a concern as most
> such programs types relevant in this context are for root-only anyway.
> If enabled, call graphs and stack traces will then show a correct
> attribution; one example is illustrated below, where the trace is
> now visible in tooling such as perf script --kallsyms=/proc/kallsyms
> and friends.
>
> Before:
>
>   7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
>          f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
>
> After:
>
>   7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
>   7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
>   7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
>   7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
>   7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
>   7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
>   7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
>   7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
>   7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
>   7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
>   7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
>   7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
>   7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
>   7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
>   7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
>   [...]
>   7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
>   7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
>          f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
>
> Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
> Acked-by: Alexei Starovoitov <a...@kernel.org>
> Cc: linux-kernel@vger.kernel.org
> ---
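
For reference, the symbol/offset attribution described in the commit message can be mimicked from user space once the bpf_prog_<tag> entries are visible in /proc/kallsyms. The following is only a rough sketch in Python, not the kernel's latched RB tree lookup: the sample lines and their addresses are invented for illustration (only the bpf_prog_<tag> naming and the tag itself come from the quoted text), and parse_kallsyms, resolve and jited_programs are hypothetical helper names.

```python
import bisect

# Hypothetical /proc/kallsyms excerpt. The addresses are made up; only the
# bpf_prog_<tag> naming scheme and the tag come from the quoted commit message.
SAMPLE_KALLSYMS = """\
ffffffff81668830 T bpf_clone_redirect
ffffffff81678af0 T tc_classify
ffffffffa0575000 t bpf_prog_33c45a467c9e061a
"""


def parse_kallsyms(text):
    """Return a list of (address, name) tuples sorted by address."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(fields[0], 16), fields[2]))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Map addr to (name, offset) of the nearest symbol starting at or below it."""
    starts = [start for start, _ in symbols]
    idx = bisect.bisect_right(starts, addr) - 1
    if idx < 0:
        return None  # addr lies below the first known symbol
    start, name = symbols[idx]
    return name, addr - start


def jited_programs(symbols):
    """List the symbols that follow the bpf_prog_<tag> naming scheme."""
    return [name for _, name in symbols if name.startswith("bpf_prog_")]


if __name__ == "__main__":
    syms = parse_kallsyms(SAMPLE_KALLSYMS)
    print(jited_programs(syms))
    print(resolve(syms, 0xFFFFFFFFA0575728))
```

On a real system the text would be read from /proc/kallsyms as root with bpf_jit_kallsyms set to 1. Since kallsyms does not list symbol sizes, a nearest-preceding-symbol lookup like this one can mis-attribute an address that falls in a gap after a symbol's end.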
Latest net-next tree dies on my hosts, and my bisection came to this commit.

[   90.045546] BUG: unable to handle kernel paging request at ffff881fef01a000
[   90.052535] IP: __tlb_remove_page_size+0x57/0x90
[   90.057152] PGD 2247067
[   90.057153] PUD 1fdaadc063
[   90.059691] PMD 1fefb0b063
[   90.062491] PTE 8000001fef01a161
[   90.065287]
[   90.070011] Oops: 0003 [#1] SMP
[   90.073478] gsmi: Log Shutdown Reason 0x03
[   90.077584] Modules linked in: w1_therm wire cdc_acm ehci_pci ehci_hcd mlx4_en ib_uverbs mlx4_ib ib_core mlx4_core
[   90.087972] CPU: 34 PID: 9747 Comm: sshd Not tainted 4.10.0-smp-DEV #14
[   90.101580] task: ffff881fda56a300 task.stack: ffffc900337d4000
[   90.107515] RIP: 0010:__tlb_remove_page_size+0x57/0x90
[   90.112651] RSP: 0018:ffffc900337d7c98 EFLAGS: 00010202
[   90.117896] RAX: ffff881fef01a000 RBX: ffffc900337d7df8 RCX: 0000000000000001
[   90.125086] RDX: ffff880000000000 RSI: 0000000000000011 RDI: ffff88207fffe4c0
[   90.132234] RBP: ffffc900337d7ca0 R08: 0000000000000010 R09: ffffc900337d7bd8
[   90.139371] R10: 0000000000000020 R11: 0000000000000001 R12: ffff881fda064520
[   90.146544] R13: ffffea00ffb28f40 R14: 00007f84584a5000 R15: ffffc900337d7df8
[   90.153703] FS:  0000000000000000(0000) GS:ffff881fffd80000(0000) knlGS:0000000000000000
[   90.161802] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   90.167548] CR2: ffff881fef01a000 CR3: 0000000001c09000 CR4: 00000000001406e0
[   90.174680] Call Trace:
[   90.177144]  unmap_page_range+0x679/0x840
[   90.181154]  unmap_single_vma+0x7f/0xf0
[   90.184984]  unmap_vmas+0x4a/0xa0
[   90.188292]  exit_mmap+0xa2/0x160
[   90.191605]  mmput+0x3d/0x100
[   90.194584]  do_exit+0x325/0xbc0
[   90.197810]  ? vfs_read+0x95/0x140
[   90.201230]  do_group_exit+0x49/0xc0
[   90.204818]  SyS_exit_group+0x14/0x20
[   90.208492]  entry_SYSCALL_64_fastpath+0x13/0x94
[   90.213127] RIP: 0033:0x7f8457b10279
[   90.216723] RSP: 002b:00007ffef283a8f0 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[   90.224286] RAX: ffffffffffffffda RBX: 000055692c599640 RCX: 00007f8457b10279
[   90.231432] RDX: 0000000000000000 RSI: 00000000000000ff RDI: 00000000000000ff
[   90.238575] RBP: 00007ffef283a9f0 R08: 000000000000003c R09: 00000000000000e7
[   90.245728] R10: ffffffffffffff90 R11: 0000000000000246 R12: 000055692c599640
[   90.252875] R13: 0000000000002614 R14: 000000000000ac60 R15: 00007ffef283aa90
[   90.260018] Code: 89 47 20 31 c0 c3 83 7f 78 13 74 45 55 53 31 f6 48 89 fb bf 00 02 00 01 48 8d 6c 24 08 e8 c2 05 fd ff 48 85 c0 74 30 83 43 78 01 <48> c7 00 00 00 00 00 c7 40 08 00 00 00 00 c7 40 0c fe 01 00 00
[   90.278939] RIP: __tlb_remove_page_size+0x57/0x90 RSP: ffffc900337d7c98
[   90.285550] CR2: ffff881fef01a000
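
To make the oops easier to read, here is a small Python sketch that decodes the "Oops: 0003" error code and the logged PTE value using the architectural x86-64 bit definitions. The helper names decode_error_code and decode_pte are hypothetical, and the script only interprets the two numbers from the log; it does not by itself establish the root cause.

```python
# x86 page-fault error code bits (defined by the architecture).
PF_PROT = 1 << 0   # 1: fault on a present page (protection violation)
PF_WRITE = 1 << 1  # 1: the access was a write
PF_USER = 1 << 2   # 1: the access came from user mode

# x86-64 page table entry flag bits.
PTE_PRESENT = 1 << 0
PTE_RW = 1 << 1
PTE_USER = 1 << 2
PTE_ACCESSED = 1 << 5
PTE_DIRTY = 1 << 6
PTE_GLOBAL = 1 << 8
PTE_NX = 1 << 63


def decode_error_code(code):
    """Split a page-fault error code into its three low flag bits."""
    return {
        "page_present": bool(code & PF_PROT),
        "write": bool(code & PF_WRITE),
        "user_mode": bool(code & PF_USER),
    }


def decode_pte(pte):
    """Report the permission and status flags of a 64-bit PTE value."""
    return {
        "present": bool(pte & PTE_PRESENT),
        "writable": bool(pte & PTE_RW),
        "user": bool(pte & PTE_USER),
        "accessed": bool(pte & PTE_ACCESSED),
        "dirty": bool(pte & PTE_DIRTY),
        "global": bool(pte & PTE_GLOBAL),
        "no_execute": bool(pte & PTE_NX),
    }


if __name__ == "__main__":
    print(decode_error_code(0x0003))        # value from "Oops: 0003"
    print(decode_pte(0x8000001FEF01A161))   # value from the "PTE" line
```

With the values from the log, the error code reads as a kernel-mode write to a present page, and the PTE is present but has the RW bit clear. That matches the faulting instruction in the Code: line (48 c7 00 00 00 00 00, a store through RAX), with RAX equal to CR2.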