Roland,

On Thu, Apr 05, 2007 at 02:41:39PM -0700, Roland McGrath wrote:
> > I am running into some problems with pfmon when attaching to a thread
> > inside a multi-threaded program.  I pass the tid (not pid) to the
> > ptrace(), wait*() calls. 
> > 
> > During the attachment, I do the following sequence:
> >      ptrace(PTRACE_ATTACH, tid, NULL, NULL);
> >      ret = waitpid(pid, &status, WUNTRACED);
> 
> Do you mean:
>       ret = waitpid(tid, &status, WUNTRACED);
> here?
> 
Yes.

Here is the strace I get:

ptrace(PTRACE_ATTACH, 19779, 0, 0)      = 0
--- SIGCHLD (Child exited) @ a000000000010621 (2aff00004d43) ---
wait4(19779, 0x60000fffffb7b4b4, WUNTRACED, NULL) = -1 ECHILD (No child processes)

The ps command shows:
  PID  SPID TTY          TIME CMD
19652 19652 pts/4    00:00:00 tcsh
19661 19661 pts/4    00:00:00 vim
19778 19778 pts/4    00:00:00 mytest 
19778 19779 pts/4    00:00:00 mytest
19778 19780 pts/4    00:00:00 mytest
19786 19786 pts/4    00:00:00 ps

> > The wait fails with errno=10 (ECHILD). If I remove it, I can go past
> > this point.
> 
> Which kernel is this?  Once PTRACE_ATTACH returns success for TID, then
> waitpid on that TID should never produce ECHILD (unless maybe the traced
> process is doing a multithreaded exec right then).  A notable exception is
> that a security module like SELinux can refuse the security_task_wait, and
> this leads to false ECHILD failures.  I recently posted a patch on lkml for
> this (so you'd get EPERM or something else instead of ECHILD).  If you are
> using SELinux, check for avc messages in dmesg.  (I discovered this false
> ECHILD behavior because of a bug in SELinux and/or its standard policy that
> broke using gdb on certain processes.)

This is with 2.6.21-rc5/ia64. I have seen the same behavior with 2.6.20 on i386.
The thread is not doing exec(); it is most likely blocked in sleep(). SELinux
is not in use.
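
One thing worth checking, and this is an assumption rather than something
confirmed on these kernels: per wait(2), a plain waitpid() only considers
children whose exit signal is SIGCHLD, and threads other than the group
leader fall into the "clone" category, which needs __WCLONE or __WALL. A
minimal sketch of the attach sequence with __WALL follows; attach_and_wait()
is a hypothetical helper name, not pfmon code.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: attach to a single thread and wait for its
 * ptrace stop. Returns 0 once the thread is stopped, or -1 with
 * errno set if the attach or the wait fails.
 */
static int attach_and_wait(pid_t tid)
{
    int status;
    pid_t ret;

    if (ptrace(PTRACE_ATTACH, tid, NULL, NULL) == -1)
        return -1;

    /* __WALL asks waitpid() to consider "clone" children as well. */
    do {
        ret = waitpid(tid, &status, __WALL);
    } while (ret == -1 && errno == EINTR);

    if (ret == -1)
        return -1;

    if (!WIFSTOPPED(status)) {
        /* The thread exited instead of stopping. */
        errno = ESRCH;
        return -1;
    }
    return 0;
}
```

With the ps output above, this would be called as attach_and_wait(19779).
If the ECHILD goes away with __WALL, the wait flags were the problem rather
than the signal delivery.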

> 
> PTRACE_ATTACH generates a SIGSTOP for the TID attached, as if by
> tgkill/tkill.  Hence, if it's not already stopped, then it should dequeue
> that SIGSTOP and get to a ptrace signal stop soon, so that a waitpid by the
> ptracer should return.
> 
> > I do understand that SIGSTOP probably applies to the entire process and
> > not just that one thread. Yet it seems strange that the notification is
> > not propagated.
> 
> The SIGSTOP is queued for the one thread if you use tgkill/tkill or ptrace
> to send it, and for the process as a whole if you use kill to send it.  The

Well, I am using regular kill(); I did not know about tkill(). It seems
to accept a regular pid as well, right?


> only instantaneous effect it has is to clear all pending SIGCONT signals
> from all queues.  In the latter case, any thread in the process might be
> the first that happens to dequeue it, but in the former it is on the one
> thread's private queue.  When a thread dequeues a SIGSTOP, if that thread
> is ptrace'd it will stop for ptrace; only when the signal is delivered, by
> a PTRACE_CONT or similar call, or if there is no ptracer for that thread,
> will the process-wide effect of the SIGSTOP take place.
> 
> I hope that helps, though I don't think I know for sure precisely what you
> are doing and what you are seeing so as to try to explain a specific scenario.
> 
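
To make the distinction above concrete, here is a small sketch of the two
ways of sending the signal. The helper names signal_thread() and
signal_process() are hypothetical; the sketch goes through syscall(2)
because I am assuming the C library provides no tgkill() wrapper.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: queue sig on one thread's private queue.
 * tgid is the process id (PID column in ps), tid is the thread id
 * (SPID column in ps).
 */
static int signal_thread(pid_t tgid, pid_t tid, int sig)
{
    return (int)syscall(SYS_tgkill, tgid, tid, sig);
}

/*
 * Hypothetical helper: queue sig for the process as a whole, so any
 * thread in the process may be the one that dequeues it.
 */
static int signal_process(pid_t pid, int sig)
{
    return kill(pid, sig);
}
```

With the ps output above, signal_thread(19778, 19779, SIGSTOP) would target
only the second thread, while signal_process(19778, SIGSTOP) targets the
whole process. The thread group leader has a tid equal to the pid, so
passing a regular pid as the tid addresses that thread.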

-- 

-Stephane
_______________________________________________
perfmon mailing list
[email protected]
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/
