On Mon, Feb 25, 2019 at 11:04:42PM +0900, Masami Hiramatsu wrote: > Hi, > > Here is an RFC series of probe-event to support user-space access > methods, which we discussed in previous thread. > > https://lkml.kernel.org/r/20190220171019.5e81a4946b56982f324f7...@kernel.org > > So in this thread, it is clear that current probe_kernel_read() > and strncpy_from_unsafe() are not enough to access user-space > variables from kprobe events on some arch. On such arch, user > address space and kernel address space can overlap so we have > to change the memory segment to user-mode before copying. > But probe_kernel_read() is designed to access primarily kernel > memory, it may fail to get, or get unexpected value on such > arch. So we need to expand kprobe fetcharg to support new options > for such case. > > For user-space access extension, this series adds 2 features, > "ustring" type and user-space dereference syntax. "ustring" is > used for recording a null-terminated string in user-space from > kprobe events. > > "ustring" type is easy, it is able to use instead of "string" > type, so if you want to record a user-space string via > "__user char *", you can use ustring type instead of string. > For example, > > echo 'p do_sys_open path=+0($arg2):ustring' >> kprobe_events > > will record the path string from user-space. > > The user-space dereference syntax is also simple. Thi just > adds 'u' prefix before an offset value. > > +|-u<OFFSET>(<FETCHARG>) > > e.g. +u8(%ax), +u0(+0(%si)) > > This is more generic. If you want to refer the variable in user- > space from its address or access a field in data structure in > user-space, you need to use this. > > For example, if you probe do_sched_setscheduler(pid, policy, > param) and record param->sched_priority, you can add new > probe as below; > > p do_sched_setscheduler priority=+u0($arg3) > > Actually, with this feature, "ustring" type is not absolutely > necessary, because these are same meanings. > > +0($arg2):ustring == +u0($arg2):string > > Perhups, we may be better removing "ustring" and just introducing > this user-space dereference syntax... > > Note that kprobe event provides these methods, but it doesn't > change it from kernel to user automatically because we do not > know whether the given address is in userspace or kernel on > some arch. > Moreover, from perf-probe, at this moment it is not able to > switch. Since __user is not for compiler but checker, we have > no clue which data structure is in user-space, in debuginfo. > > BTW, according to Linus's comment, I implemented probe_user_read() > and strncpy_from_unsafe_user() APIs. And since those use > "access_ok()" inside it, if CONFIG_DEBUG_ATOMIC_SLEEP=y on x86, > it will get a warn message at once. It should be solved before > merging this series.
I was wondering why access_ok() can sleep. In the arm64 and x86 implementation, I don't see access_ok() itself causing a user pointer dereference access that can cause a page fault. It seems to just be checking the validity of the ranges. Any idea why the access_ok() code has these comments? "Context: User context only. This function may sleep if pagefaults are enabled." My _guess_ is this is because whatever calls access_ok() may also call something else that *does* fault next, if that's the case then that WARN_ON_IN_IRQ() in access_ok() is fine, but at least I guess the comments should be more clear that it is not access_ok() itself that sleeps. thanks for any help on understanding this, - Joel > > Thank you, > > --- > > Masami Hiramatsu (4): > uaccess: Make sure kernel_uaccess_faults_ok is updated before pagefault > uaccess: Add non-pagefault user-space read functions > tracing/probe: Add ustring type for user-space string > tracing/probe: Support user-space dereference > > > Documentation/trace/kprobetrace.rst | 13 ++- > Documentation/trace/uprobetracer.rst | 9 +- > fs/namespace.c | 2 > include/linux/uaccess.h | 13 +++ > kernel/trace/trace.c | 7 +- > kernel/trace/trace_kprobe.c | 65 ++++++++++++++++ > kernel/trace/trace_probe.c | 39 ++++++++-- > kernel/trace/trace_probe.h | 3 + > kernel/trace/trace_probe_tmpl.h | 36 +++++++-- > kernel/trace/trace_uprobe.c | 19 +++++ > mm/maccess.c | 138 > ++++++++++++++++++++++++++++++---- > 11 files changed, 302 insertions(+), 42 deletions(-) > > -- > Masami Hiramatsu (Linaro) <mhira...@kernel.org>