2016-08-26 2:00 GMT+03:00 H. Peter Anvin <h...@zytor.com>:
> On August 25, 2016 3:53:43 PM PDT, Dmitry Safonov <0x7f454...@gmail.com> 
> wrote:
>>2016-08-25 23:49 GMT+03:00 H. Peter Anvin <h...@zytor.com>:
>>> On August 25, 2016 8:21:07 AM PDT, Dmitry Safonov
>><dsafo...@virtuozzo.com> wrote:
>>>>This patches set is cleanly RFC and is not supposed to be applied.
>>>>Also for RFC time it builds only on x86_64.
>>>>
>>>>So, in a mail thread Oleg told that it would be worth to introduce
>>>>vm_file
>>>>for vdso mappings as currently uprobes can not be placed on vDSO VMAs
>>>>[1].
>>>>In this patches set I introduce in-kernel filesystem for vdso files.
>>>>After patches vDSO VMA now has inode and is just a private file
>>>>mapping:
>>>>7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0
>>>> [vvar]
>>>>7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18
>>>> [vdso]
>>>>
>>>>Then I introduce interface in uprobe_events to insert uprobes in
>>vdso.
>>>>FWIW:
>>>>  [~]# cd kernel/linux
>>>>  [linux]# readelf --syms arch/x86/entry/vdso/vdso64.so
>>>>Symbol table '.dynsym' contains 11 entries:
>>>>   Num:    Value          Size Type    Bind   Vis      Ndx Name
>>>>     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
>>>>     1: 0000000000000470     0 SECTION LOCAL  DEFAULT    8
>>>>2: 00000000000008d0   885 FUNC    WEAK   DEFAULT   12
>>>>clock_gettime@@LINUX_2.6
>>>>3: 0000000000000c50   472 FUNC    GLOBAL DEFAULT   12
>>>>__vdso_gettimeofday@@LINUX_2.6
>>>>4: 0000000000000c50   472 FUNC    WEAK   DEFAULT   12
>>>>gettimeofday@@LINUX_2.6
>>>>5: 0000000000000e30    21 FUNC    GLOBAL DEFAULT   12
>>>>__vdso_time@@LINUX_2.6
>>>>  6: 0000000000000e30    21 FUNC    WEAK   DEFAULT   12
>>time@@LINUX_2.6
>>>>7: 00000000000008d0   885 FUNC    GLOBAL DEFAULT   12
>>>>__vdso_clock_gettime@@LINUX_2.6
>>>>     8: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LINUX_2.6
>>>>9: 0000000000000e50    41 FUNC    GLOBAL DEFAULT   12
>>>>__vdso_getcpu@@LINUX_2.6
>>>>10: 0000000000000e50    41 FUNC    WEAK   DEFAULT   12
>>>>getcpu@@LINUX_2.6
>>>>  [~]# cd /sys/kernel/debug/tracing/
>>>>  [tracing]# echo 'p:clock_gettime :vdso:/64:0x8d0' > uprobe_events
>>>>  [tracing]# echo 'p:gettimeofday :vdso:/64:0xc50' >> uprobe_events
>>>>  [tracing]# echo 'p:time :vdso:/64:0xe30' >> uprobe_events
>>>>  [tracing]# echo 1 > events/uprobes/enable
>>>>  [tracing]# su test # it has UID=1001
>>>>  [tracing]$ date
>>>>  Thu Aug 25 17:19:29 MSK 2016
>>>>  [tracing]$ exit
>>>>  [tracing]# cat trace
>>>>  # tracer: nop
>>>>  #
>>>>  # entries-in-buffer/entries-written: 175/175   #P:4
>>>>  #
>>>>  #                              _-----=> irqs-off
>>>>  #                             / _----=> need-resched
>>>>  #                            | / _---=> hardirq/softirq
>>>>  #                            || / _--=> preempt-depth
>>>>  #                            ||| /     delay
>>>>  #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>>>>  #              | |       |   ||||       |         |
>>>>             bash-11560 [001] d...   316.470236: time:
>>(0x7ffcacebae30)
>>>>     bash-11560 [001] d...   316.471436: gettimeofday:
>>(0x7ffcacebac50)
>>>>             bash-11560 [001] d...   316.477550: time:
>>(0x7ffcacebae30)
>>>>             bash-11560 [001] d...   316.477655: time:
>>(0x7ffcacebae30)
>>>>   mktemp-11568 [001] d...   316.479589: gettimeofday:
>>(0x7ffc603f0c50)
>>>>    date-11571 [001] d...   316.481890: clock_gettime:
>>(0x7ffec9db58d0)
>>>>[...]
>>>>
>>>>If this approach will be decided as fine, I will prepare a better
>>>>version,
>>>>fixing the following things:
>>>>o put vdsofs in generic fs/* dir
>>>>o support other archs and vdso blobs
>>>>o remove BUG_ON()'s and UID==1001 check
>>>>o remove extern's and use headers only
>>>>o refactor code in create_trace_uprobe()
>>>>o add some state to (struct trace_uprobe), so i.e., `cat
>>uprobe_events`
>>>>will
>>>>  print those uprobes as vdso-based
>>>>o document this interface in Documentation/trace/uprobetracer.txt
>>>>o prepare nice patches set?
>>>>
>>>>So, opinions? Is it worth to add something like this?
>>>>
>>>>[1]: https://lkml.org/lkml/2016/7/12/346
>>>>
>>>>Dmitry Safonov (3):
>>>>  x86/vdso: create vdso file, use it for mapping
>>>>  uprobe: drop isdigit() check in create_trace_uprobe
>>>>  uprobe: add vdso support
>>>>
>>>>Cc: Oleg Nesterov <o...@redhat.com>
>>>>Cc: Al Viro <v...@zeniv.linux.org.uk>
>>>>Cc: Steven Rostedt <rost...@goodmis.org>
>>>>Cc: Andy Lutomirski <l...@amacapital.net>
>>>>Cc: Thomas Gleixner <t...@linutronix.de>
>>>>Cc: Ingo Molnar <mi...@redhat.com>
>>>>Cc: "H. Peter Anvin" <h...@zytor.com>
>>>>Cc: x...@kernel.org
>>>>Cc: Dmitry Safonov <0x7f454...@gmail.com>
>>>>
>>>>arch/x86/entry/vdso/vma.c   | 148
>>>>++++++++++++++++++++++++++++++++++++++++++--
>>>> kernel/trace/trace_uprobe.c |  50 +++++++++++----
>>>> 2 files changed, 180 insertions(+), 18 deletions(-)
>>>
>>> I think there is a lot to be said for this idea.  However, a private
>>mapping is definitely wrong for the vvar data; for the vdso code it
>>could be considered either way I suppose.
>>
>>Thanks on your reply.
>>As you could see, I preserved pure mapping of pfn for vvar:
>>7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0
>> [vvar]
>>7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18
>> [vdso]
>>(no inode number).
>>I also think it would be useless to do the same to vvar as it
>>has just data and there is no point in probing it.
>
> Well, it would things like mremap() just work and so on.  Let's get rid of 
> special cases if we are.

Well, for RFC it wouldn't move context.vdso pointer on mremap(),
but as RFC is for x86_64 only, it will work on it.
Anyway, I don't think it would be hard to fix and make mremap() work on
other archs on post-RFC.

The only corner-case I see for now is that /proc/self/map_files/<vdso_range>
will point to [vdso] which is broken link. But one could read this file
and dump/read vdso blob.
So, in the other words: if some program assumes that /proc/self/map_files/*
should always point to correct file, it may be confused. Not sure, maybe
it would be confused by orphane-file mappings, so having dangling link
there is just fine.

-- 
             Dmitry

Reply via email to