2016-08-26 17:42 GMT+03:00 Dmitry Safonov <0x7f454...@gmail.com>: > 2016-08-26 17:32 GMT+03:00 Andy Lutomirski <l...@amacapital.net>: >> On Fri, Aug 26, 2016 at 4:16 AM, Dmitry Safonov <0x7f454...@gmail.com> wrote: >>> 2016-08-26 2:00 GMT+03:00 H. Peter Anvin <h...@zytor.com>: >>>> On August 25, 2016 3:53:43 PM PDT, Dmitry Safonov <0x7f454...@gmail.com> >>>> wrote: >>>>>2016-08-25 23:49 GMT+03:00 H. Peter Anvin <h...@zytor.com>: >>>>>> On August 25, 2016 8:21:07 AM PDT, Dmitry Safonov >>>>><dsafo...@virtuozzo.com> wrote: >>>>>>>This patches set is cleanly RFC and is not supposed to be applied. >>>>>>>Also for RFC time it builds only on x86_64. >>>>>>> >>>>>>>So, in a mail thread Oleg told that it would be worth to introduce >>>>>>>vm_file >>>>>>>for vdso mappings as currently uprobes can not be placed on vDSO VMAs >>>>>>>[1]. >>>>>>>In this patches set I introduce in-kernel filesystem for vdso files. >>>>>>>After patches vDSO VMA now has inode and is just a private file >>>>>>>mapping: >>>>>>>7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0 >>>>>>> [vvar] >>>>>>>7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18 >>>>>>> [vdso] >>>>>>> >>>>>>>Then I introduce interface in uprobe_events to insert uprobes in >>>>>vdso. >>>>>>>FWIW: >>>>>>> [~]# cd kernel/linux >>>>>>> [linux]# readelf --syms arch/x86/entry/vdso/vdso64.so >>>>>>>Symbol table '.dynsym' contains 11 entries: >>>>>>> Num: Value Size Type Bind Vis Ndx Name >>>>>>> 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND >>>>>>> 1: 0000000000000470 0 SECTION LOCAL DEFAULT 8 >>>>>>>2: 00000000000008d0 885 FUNC WEAK DEFAULT 12 >>>>>>>clock_gettime@@LINUX_2.6 >>>>>>>3: 0000000000000c50 472 FUNC GLOBAL DEFAULT 12 >>>>>>>__vdso_gettimeofday@@LINUX_2.6 >>>>>>>4: 0000000000000c50 472 FUNC WEAK DEFAULT 12 >>>>>>>gettimeofday@@LINUX_2.6 >>>>>>>5: 0000000000000e30 21 FUNC GLOBAL DEFAULT 12 >>>>>>>__vdso_time@@LINUX_2.6 >>>>>>> 6: 0000000000000e30 21 FUNC WEAK DEFAULT 12 >>>>>time@@LINUX_2.6 >>>>>>>7: 00000000000008d0 885 FUNC GLOBAL DEFAULT 12 >>>>>>>__vdso_clock_gettime@@LINUX_2.6 >>>>>>> 8: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LINUX_2.6 >>>>>>>9: 0000000000000e50 41 FUNC GLOBAL DEFAULT 12 >>>>>>>__vdso_getcpu@@LINUX_2.6 >>>>>>>10: 0000000000000e50 41 FUNC WEAK DEFAULT 12 >>>>>>>getcpu@@LINUX_2.6 >>>>>>> [~]# cd /sys/kernel/debug/tracing/ >>>>>>> [tracing]# echo 'p:clock_gettime :vdso:/64:0x8d0' > uprobe_events >>>>>>> [tracing]# echo 'p:gettimeofday :vdso:/64:0xc50' >> uprobe_events >>>>>>> [tracing]# echo 'p:time :vdso:/64:0xe30' >> uprobe_events >>>>>>> [tracing]# echo 1 > events/uprobes/enable >>>>>>> [tracing]# su test # it has UID=1001 >>>>>>> [tracing]$ date >>>>>>> Thu Aug 25 17:19:29 MSK 2016 >>>>>>> [tracing]$ exit >>>>>>> [tracing]# cat trace >>>>>>> # tracer: nop >>>>>>> # >>>>>>> # entries-in-buffer/entries-written: 175/175 #P:4 >>>>>>> # >>>>>>> # _-----=> irqs-off >>>>>>> # / _----=> need-resched >>>>>>> # | / _---=> hardirq/softirq >>>>>>> # || / _--=> preempt-depth >>>>>>> # ||| / delay >>>>>>> # TASK-PID CPU# |||| TIMESTAMP FUNCTION >>>>>>> # | | | |||| | | >>>>>>> bash-11560 [001] d... 316.470236: time: >>>>>(0x7ffcacebae30) >>>>>>> bash-11560 [001] d... 316.471436: gettimeofday: >>>>>(0x7ffcacebac50) >>>>>>> bash-11560 [001] d... 316.477550: time: >>>>>(0x7ffcacebae30) >>>>>>> bash-11560 [001] d... 316.477655: time: >>>>>(0x7ffcacebae30) >>>>>>> mktemp-11568 [001] d... 316.479589: gettimeofday: >>>>>(0x7ffc603f0c50) >>>>>>> date-11571 [001] d... 316.481890: clock_gettime: >>>>>(0x7ffec9db58d0) >>>>>>>[...] >>>>>>> >>>>>>>If this approach will be decided as fine, I will prepare a better >>>>>>>version, >>>>>>>fixing the following things: >>>>>>>o put vdsofs in generic fs/* dir >>>>>>>o support other archs and vdso blobs >>>>>>>o remove BUG_ON()'s and UID==1001 check >>>>>>>o remove extern's and use headers only >>>>>>>o refactor code in create_trace_uprobe() >>>>>>>o add some state to (struct trace_uprobe), so i.e., `cat >>>>>uprobe_events` >>>>>>>will >>>>>>> print those uprobes as vdso-based >>>>>>>o document this interface in Documentation/trace/uprobetracer.txt >>>>>>>o prepare nice patches set? >>>>>>> >>>>>>>So, opinions? Is it worth to add something like this? >>>>>>> >>>>>>>[1]: https://lkml.org/lkml/2016/7/12/346 >>>>>>> >>>>>>>Dmitry Safonov (3): >>>>>>> x86/vdso: create vdso file, use it for mapping >>>>>>> uprobe: drop isdigit() check in create_trace_uprobe >>>>>>> uprobe: add vdso support >>>>>>> >>>>>>>Cc: Oleg Nesterov <o...@redhat.com> >>>>>>>Cc: Al Viro <v...@zeniv.linux.org.uk> >>>>>>>Cc: Steven Rostedt <rost...@goodmis.org> >>>>>>>Cc: Andy Lutomirski <l...@amacapital.net> >>>>>>>Cc: Thomas Gleixner <t...@linutronix.de> >>>>>>>Cc: Ingo Molnar <mi...@redhat.com> >>>>>>>Cc: "H. Peter Anvin" <h...@zytor.com> >>>>>>>Cc: x...@kernel.org >>>>>>>Cc: Dmitry Safonov <0x7f454...@gmail.com> >>>>>>> >>>>>>>arch/x86/entry/vdso/vma.c | 148 >>>>>>>++++++++++++++++++++++++++++++++++++++++++-- >>>>>>> kernel/trace/trace_uprobe.c | 50 +++++++++++---- >>>>>>> 2 files changed, 180 insertions(+), 18 deletions(-) >>>>>> >>>>>> I think there is a lot to be said for this idea. However, a private >>>>>mapping is definitely wrong for the vvar data; for the vdso code it >>>>>could be considered either way I suppose. >>>>> >>>>>Thanks on your reply. >>>>>As you could see, I preserved pure mapping of pfn for vvar: >>>>>7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0 >>>>> [vvar] >>>>>7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18 >>>>> [vdso] >>>>>(no inode number). >>>>>I also think it would be useless to do the same to vvar as it >>>>>has just data and there is no point in probing it. >>>> >>>> Well, it would things like mremap() just work and so on. Let's get rid of >>>> special cases if we are. >>> >>> Well, for RFC it wouldn't move context.vdso pointer on mremap(), >>> but as RFC is for x86_64 only, it will work on it. >>> Anyway, I don't think it would be hard to fix and make mremap() work on >>> other archs on post-RFC. >>> >>> The only corner-case I see for now is that /proc/self/map_files/<vdso_range> >>> will point to [vdso] which is broken link. But one could read this file >>> and dump/read vdso blob. >>> So, in the other words: if some program assumes that /proc/self/map_files/* >>> should always point to correct file, it may be confused. Not sure, maybe >>> it would be confused by orphane-file mappings, so having dangling link >>> there is just fine. >> >> I don't see anything a priori wrong with having map_files point >> somewhere, but it could be worth special casing it for special >> mappings to preserve existing behavior (no file at all). > > Yep, that could be easily done, will do. > Anyway, just curious - what may it break? > > Thanks on the reply, Andy. Does the patches set look sane for you?
I mean, the idea, not current big TODO list and nitpicks :) -- Dmitry