[observability-discuss] pmap -L, agent LWP and ^C

Michael Shapiro Thu, 22 Dec 2005 10:36:34 -0800 (PST)

> The current implementation of pmap -L and plgrp from the NUMA observability
> toolkit uses an agent LWP to issue system calls on behalf of another process.
> If the creator of the agent LWP is killed midway, the target process remains
> stopped and requires an explicit kick with prun.
>
> Is there some way to automatically restart the target process when the
> originator of the agent LWP dies?
>
> I tried calling Psetflags(PR_RLC) before Pcreate_agent() but it doesn't seem
> to help.
> 
> - Alexander Kolbasov


PR_RLC has no effect on the agent.  In Solaris 9, after we ran into other
issues like this due to certain ptool bugs, Adam enhanced prun(1) to clear
the agent.  One could imagine making PR_RLC have the effect in the kernel
of setting the agent running and making it immediately lwp_exit(), but there
are some potential issues there.  Namely, if the thing that injected the
agent was having it do something quite delicate to the process and died in
the middle of it, then you may have an intermediate state where removing
the agent and continuing the process actually causes deadlock or disaster.

Examples would include if the agent had grabbed locks, was manipulating
memory mapping protections, was performing text instrumentation, or was
partway through a sequence of address space modifications which hadn't yet
reached a sane state.  In any of those scenarios, you're kind of screwed.
And in some sense, leaving the agent in place makes the failure mode much
clearer: you can use prun(1), but you had better be aware of the consequences.

The fundamental issue is that the kernel only knows how to run-on-last-close
the mechanisms that it fully implements, namely the tracing bits (system
calls, signals, faults) and watchpoints.  Anything involving more complex
interactions between debugger and victim means that the kernel can't know
how to restore the state of the world.  The analogy here is to the more
well-known case of a debugger leaving breakpoint instructions behind: if a
debugger dies and a process with no /proc controller hits a FLT_BPT, we kill
the process with SIGTRAP.  So in some sense, the only other logical choice
besides leaving the process frozen (for prun or kill -9) would be for us to
do the same thing: if you set RLC and the agent exists, kill the process
with SIGTRAP as part of setting it running.
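
To make the choices above concrete, here is a toy model in C of what could
happen to the target when its last /proc controller goes away.  This is only
a sketch of the reasoning, not kernel or libproc code: the enum, the struct,
on_last_close() and the sigtrap_policy switch are all hypothetical names made
up for illustration.

```c
#include <stdbool.h>

/* Toy model only: none of these types exist in the kernel or in libproc. */
enum last_close_action {
    LEAVE_STOPPED,      /* target stays frozen until prun(1) or kill -9 */
    SET_RUNNING,        /* run-on-last-close: tracing state cleared, resume */
    KILL_WITH_SIGTRAP   /* the alternative floated above */
};

struct target_state {
    bool rlc_set;        /* controller asked for run-on-last-close (PR_RLC) */
    bool agent_present;  /* an agent LWP was left behind in the target */
};

/*
 * Decide what happens to the target when the controller dies.
 * sigtrap_policy selects the hypothetical "kill with SIGTRAP" variant;
 * with it off, the function models the behavior described in this thread.
 */
enum last_close_action
on_last_close(const struct target_state *t, bool sigtrap_policy)
{
    if (t->agent_present) {
        /* The kernel cannot know what the agent was in the middle of. */
        if (t->rlc_set && sigtrap_policy)
            return KILL_WITH_SIGTRAP;
        return LEAVE_STOPPED;
    }

    /* Only mechanisms the kernel fully implements are in play here. */
    return t->rlc_set ? SET_RUNNING : LEAVE_STOPPED;
}
```

With sigtrap_policy off, a target with RLC set and an agent left behind comes
back as LEAVE_STOPPED, which is the situation Alexander is seeing; with it on,
the same target comes back as KILL_WITH_SIGTRAP.  In neither variant does a
leftover agent lead to SET_RUNNING.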

-Mike

-- 
Mike Shapiro, Solaris Kernel Development. blogs.sun.com/mws/
