Re: truss is buggy?

2008-12-23 Thread Laszlo Nagy



> It looks like the ptrace() syscall is the problem:
>
> DESCRIPTION
>  The ptrace() system call provides tracing and debugging
>  facilities.  It allows one process (the tracing process) to
>  control another (the traced process).  The tracing process must
>  first attach to the traced process, and then issue a series of
>  ptrace() system calls to control the execution of the process, as
>  well as access process memory and register state.  For the
>  duration of the tracing session, the traced process will be
>  ``re-parented'', with its parent process ID (and resulting
>  behavior) changed to the tracing process.
>
> I imagine that also explains why a truss'ed program will die if you
> kill -9 the truss process.  It looks like the reset-parent-when-trussing
> behaviour appeared back in 1996 (sys_process.c r1.21).  The fix would
> probably be to store the pid of the tracing process somewhere other
> than p_ppid...
  
My problem is that there is a process (namely, the PostgreSQL stats 
collector) that may have a bug. I was asked on the devel list to send 
in some traces so they can figure out why it is stuck in an infinite 
loop, eating 100% CPU time.


However, when I start truss-ing this process, the return value of its 
getppid() call changes. The PostgreSQL stats collector periodically 
checks whether the postmaster (its parent process) is still alive, and 
exits unconditionally if the postmaster has died. So as soon as I start 
truss-ing, the stats collector exits, making it impossible to debug the 
problem.
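
To make the failure mode concrete, here is a minimal sketch of the kind of check I mean. It is not PostgreSQL's actual code; the names remember_parent() and parent_looks_alive() are made up for illustration. The idea is only that a child which compares getppid() against a value saved at startup cannot tell "my parent died" apart from "a ptrace()-based tracer attached and I was re-parented".

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helpers, not taken from the PostgreSQL source. */
static pid_t saved_parent;          /* parent PID recorded at startup */
static bool  parent_recorded = false;

/* Record the current parent PID once, early in the child's life. */
static void remember_parent(void)
{
    saved_parent = getppid();
    parent_recorded = true;
}

/*
 * Returns true while getppid() still reports the PID recorded at startup.
 * The value changes when the real parent exits (the child is re-parented),
 * but on a system where attaching a tracer rewrites the parent PID it
 * changes in that case as well, so this check then reports a dead parent.
 */
static bool parent_looks_alive(void)
{
    assert(parent_recorded);
    return getppid() == saved_parent;
}
```

A daemon written this way would call remember_parent() once after fork() and then exit as soon as parent_looks_alive() returns false, which matches what I see when truss attaches.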


I'm not able to change the stats collector's source code, because I'm 
not a C programmer, and because it is a production server and this would 
be too risky.


I also tried to install strace, but it is not available on my platform 
(amd64). I cannot move to i386, because (apparently) the problem exists 
on amd64 only. Is this a hopeless situation?


BTW I'm not an expert, but I believe that the process being debugged 
should not see any difference, and it should not be able to tell 
whether it is being debugged. I think this is indeed a bug.



___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


truss is buggy?

2008-12-22 Thread Laszlo Nagy
Apparently, the truss trace tool has a bug. At least I was told that 
the tracer program should not change the return value of the getppid() 
call inside the traced process. Here is an example program:


%cat test.c
#include <stdio.h>
#include <unistd.h>

/* Print the parent PID every 5 seconds. */
int main() {
  while(1) {
    sleep(5);
    printf("ppid = %d\n", (int)getppid());
  }
}

%gcc -o test test.c
%./test
ppid = 47653
ppid = 47653
ppid = 47653 # Started truss -p 48864 here!
ppid = 49073
ppid = 49073
ppid = 49073


I cannot install strace, because my platform is amd64. What other 
options do I have?


Thanks

  Laszlo




Re: truss is buggy?

2008-12-22 Thread Dan Nelson
In the last episode (Dec 22), Laszlo Nagy said:
> Apparently, the truss trace tool has a bug. At least I was told
> that the tracer program should not change the return value of the
> getppid() call inside the traced process. Here is an example program:

It looks like the ptrace() syscall is the problem:

DESCRIPTION
 The ptrace() system call provides tracing and debugging
 facilities.  It allows one process (the tracing process) to
 control another (the traced process).  The tracing process must
 first attach to the traced process, and then issue a series of
 ptrace() system calls to control the execution of the process, as
 well as access process memory and register state.  For the
 duration of the tracing session, the traced process will be
 ``re-parented'', with its parent process ID (and resulting
 behavior) changed to the tracing process.

I imagine that also explains why a truss'ed program will die if you
kill -9 the truss process.  It looks like the reset-parent-when-trussing
behaviour appeared back in 1996 (sys_process.c r1.21).  The fix would
probably be to store the pid of the tracing process somewhere other
than p_ppid...
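
As a rough illustration of that idea, here is a toy user-space model, not kernel code. Only p_ppid is a real field name; struct toy_proc, p_tracer and the toy_* functions are hypothetical names made up for this sketch. The point is just that if the tracer's pid lives in its own field, getppid() can keep returning the real parent while stop notifications are routed to the tracer.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Toy model of a process entry; NOT the kernel's struct proc. */
struct toy_proc {
    pid_t p_pid;     /* this process */
    pid_t p_ppid;    /* real parent, never modified by tracing */
    pid_t p_tracer;  /* hypothetical field: tracer pid, 0 if not traced */
};

/* Attach a tracer without touching p_ppid. */
static void toy_attach(struct toy_proc *p, pid_t tracer)
{
    assert(p != NULL);
    p->p_tracer = tracer;
}

/* Detach the tracer again. */
static void toy_detach(struct toy_proc *p)
{
    assert(p != NULL);
    p->p_tracer = 0;
}

/* What getppid() would report: always the real parent. */
static pid_t toy_getppid(const struct toy_proc *p)
{
    assert(p != NULL);
    return p->p_ppid;
}

/* Who should be told about stops: the tracer if any, else the parent. */
static pid_t toy_wait_target(const struct toy_proc *p)
{
    assert(p != NULL);
    return p->p_tracer != 0 ? p->p_tracer : p->p_ppid;
}
```

With the pids from your example (test program 48864, shell 47653, truss 49073), toy_getppid() would keep returning 47653 after the attach, while toy_wait_target() would return 49073 until the detach. A real change would also have to deal with wait(), signal delivery and what happens when the tracer dies, none of which this sketch covers.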

-- 
Dan Nelson
dnel...@allantgroup.com