> Current implementation of pmap -L and plgrp from the NUMA observability
> toolkit uses agent LWP to issue system calls on behalf of another process.
> If the creator of the agent LWP is killed mid-way, the target process remains
> stopped and requires explicit kick with prun.
>
> Is there some way to automatically restart the target process when the
> originator of agent LWP dies?
>
> I tried calling Psetflags(PR_RLC) before Pcreate_agent() but it doesn't seem
> to help.
>
> - Alexander Kolbasov

PR_RLC has no effect on the agent. In Solaris 9, after we were having other
issues like this due to certain ptool bugs, Adam enhanced prun(1) to clear
the agent.

One could imagine making PR_RLC have the effect in the kernel of setting the
agent running and making it immediately lwp_exit(), but there are some
potential issues there. Namely, if the thing that injected the agent was
having it do something quite delicate to the process and died in the middle
of it, then you may have an intermediate state where removing the agent and
continuing the process actually causes deadlock or disaster. Examples would
include if the agent had grabbed locks, was doing some manipulation of memory
mapping protections, performing text instrumentation, or performing a
sequence of address space modifications which hadn't yet reached a sane
state. In any of those scenarios, you're kind of screwed. And in some sense,
leaving the agent in place makes the failure mode much more clear: you can
use prun(1), but you had better be aware of the consequences.

The fundamental issue is that the kernel can only know about, and know how to
run-on-last-close, mechanisms that it fully implements, namely any tracing
bits (system calls, signals, faults) and watchpoints. Anything involving more
complex interactions between debugger and victim means that the kernel can't
know how to restore the state of the world.

The analogy here is to the more well-known case of a debugger leaving
breakpoint instructions behind: if a debugger dies and a process with no
/proc controller hits a FLT_BPT, we kill the process with SIGTRAP. So in some
sense, the only other logical choice besides leaving the process frozen (to
be resumed with prun or killed with kill -9) would be for us to do the same
thing: if you set RLC and the agent exists, kill the process with SIGTRAP as
part of setting it running.

-Mike

--
Mike Shapiro, Solaris Kernel Development. blogs.sun.com/mws/
