On Wed, Apr 2, 2025 at 12:20 AM Michal Hocko <[email protected]> wrote: > > On Tue 01-04-25 15:04:11, Andrii Nakryiko wrote: > > On Tue, Apr 1, 2025 at 2:31 PM Steven Rostedt <[email protected]> wrote: > > > > > > On Tue, 1 Apr 2025 11:40:21 -0700 > > > Andrii Nakryiko <[email protected]> wrote: > > > > > > Hi Andrii, > > > > > > > It is useful to be able to access current->mm to, say, record a bunch of > > > > VMA information right before the task exits (e.g., for stack > > > > symbolization reasons when dealing with short-lived processes that exit > > > > in the middle of profiling session). We currently do have > > > > trace_sched_process_exit() in the exit path, but it is called a bit too > > > > late, after exit_mm() resets current->mm to NULL, which makes it > > > > unsuitable for inspecting and recording task's mm_struct-related data > > > > when tracing process lifetimes. > > > > > > My fear of adding another task exit trace event is that it will get a > > > bit confusing as that we now have trace_sched_process_exit() and also > > > trace_task_exit() with slightly different semantics. > > > > > > How about adding a trace_exit_mm()? Add that to the exit_mm() code? > > > > This is kind of the worst of both worlds, no? We still have a new > > tracepoint, but this one can't tell if it's a `group_dead` situation > > or not... I can pass group_dead into exit_mm(), but it will be just > > for the sake of that new tracepoint. > > Is it important to tell the difference between thread and the > whole process group exiting?
Yes, it often is important. In the sense that process group exiting would trigger extra information gathering/aggregation/sending compared to just process (thread) existing. Both are important to track, but it's useful to be able to differentiate. > > Please keep in mind that even group exit doesn't really imply the mm is > going away (clone allows CLONE_VM without CLONE_SIGNAL - i.e. mm could > be shared outside of thread group). Yep, I realize that, and as Steven said, for cases like that a dedicated mm-specific tracepoint might be useful. But I'm not really aware of use cases caring about mm_struct itself, in isolation. It's all usually in the context of thread/process exit, and mm-related information is one of a bunch of extra information that's useful at that point. It's just so that with current sched_process_exit() tracepoint placement we (e.g., BPF program) gets control a bit too late. But it seems like everyone is OK just shifting sched_process_exit() to before exit_mm(), which should cover the use cases I had in mind. We can always add mm-specific tracepoint when the need arises. Thanks everyone for discussion and suggestions! > -- > Michal Hocko > SUSE Labs
