Hi, I'm going to keep this short.
Objective: include perf data (specifically, AUX/Intel PT) in process core dumps.

Obstacles and how this patchset deals with them:

(1) Perf events need to be able to run in the background without a consumer (perf record) running.
    Detached events: a new flag to the perf syscall creates a 'detached' event, which continues to exist after its file descriptor is released. Not all detached events are per-thread AUX events: this also tries to take into account the need for system-wide persistent events.

(2) We need to be able to kill those events, so they must remain accessible after they are created.
    Event files: detached events exist as files in tracefs (at the moment), which can be opened, mmapped, read and removed.

(3) Ring buffer contents from these events need to end up in the core dump file.
    The perf ring buffer is injected into the target task's address space.

(4) For this feature to be useful, inheritance has to allocate ring buffers for such events.
    Upon inheritance, a parentless detached event is created with a ring buffer. There is no output redirection: each event has its own ring buffer.

(5) A side effect of (4) is that we can't use GFP_KERNEL pages for such ring buffers, or else we'd have to fail inherit_event() (and therefore the user's fork()) once the user exhausts their mlock limit.
    Such ring buffers use shmemfs-backed pages, which are pinned only while the corresponding target task is running. At other times these pages can be swapped out.

(6) Ring buffer memory accounting needs to take this new arrangement into account: one user can use up at most NR_CPUS * buffer_size memory at any given point in time.
    Only the first such event is accounted, and the accounting is undone when the last event is gone.

(7) We also need to supply everything the [PT] decoder normally finds out via sysfs attributes (clock ratios, capabilities, etc.), so that this information also finds its way into the core dump file.
    A "PMU info" structure is appended to the user page.
I've also hacked the perf tool to support all this; everything can be found at [1]. I'm not posting the tooling patches, though, as they are thoroughly ugly proof-of-concept code. In short, perf record will create detached events with '--detached', and will afterwards open detached events via their path in tracefs.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/ash/linux.git/log/?h=perf-detached-shmem-wip

Alexander Shishkin (17):
  perf: Allow mmapping only user page
  perf: Factor out mlock accounting
  tracefs: De-globalize instances' callbacks
  tracefs: Add ->unlink callback to tracefs_dir_ops
  perf: Introduce detached events
  perf: Add buffers to the detached events
  perf: Add pmu_info to user page
  perf: Allow inheritance for detached events
  perf: Use shmemfs pages for userspace-only per-thread detached events
  perf: Implement pinning and scheduling for SHMEM events
  perf: Implement mlock accounting for shmem ring buffers
  perf: Track pinned events per user
  perf: Re-inject shmem buffers after exec
  perf: Add ioctl(REATTACH) for detached events
  perf: Allow controlled non-root access to detached events
  perf/x86/intel/pt: Add PMU info
  perf/x86/intel/bts: Add PMU info

 arch/x86/events/intel/bts.c     |  20 +-
 arch/x86/events/intel/pt.c      |  23 +-
 arch/x86/events/intel/pt.h      |  11 +
 fs/tracefs/inode.c              |  71 +++-
 include/linux/perf_event.h      |  33 ++
 include/linux/sched/user.h      |   6 +
 include/linux/tracefs.h         |   3 +-
 include/uapi/linux/perf_event.h |  15 +
 kernel/events/core.c            | 526 +++++++++++++++++++++++------
 kernel/events/internal.h        |  27 +-
 kernel/events/ring_buffer.c     | 730 ++++++++++++++++++++++++++++++++++++--
 kernel/trace/trace.c            |   8 +-
 kernel/user.c                   |   1 +
 13 files changed, 1315 insertions(+), 159 deletions(-)