Hello! Here is a riddle for you :) We are experiencing a strange problem when using Psyscall:
We have a MASTER process that grabs a SLAVE process in order to monitor hardware counters on behalf of the SLAVE. It does so via the following steps:

/* Initialization */
1. Grab the SLAVE process: pctx_capture()
2. Set up the SLAVE to count its instructions: cpc_bind_pctx plus other cpc library init routines
3. Set the MASTER to detect when the SLAVE stops: write_cm(pid, PCSTRACE, NULL); (the SLAVE will stop on the SIGEMT signal, which is raised when the instruction counter in the SLAVE overflows)
4. Set the SLAVE running: write_cm(pid, PCRUN, NULL)

When the SLAVE's instruction counter overflows, the SLAVE is stopped and the MASTER detects this. Then the following happens:

/* Control loop */
1. MASTER calls Psyscall on the SLAVE
2. Psyscall is set up to call cpc_request_preset() and cpc_set_restart() in the SLAVE
3. MASTER sets the SLAVE running again: write_cm(pid, PCRUN, NULL)
4. SLAVE runs until its instruction counter overflows again, at which point the control loop repeats.

The control loop usually executes successfully about a dozen times. After that, the SLAVE either crashes with a segmentation fault or exits prematurely. We don't know why.

When the SLAVE runs by itself, it never crashes (it's a simple program that adds up a bunch of numbers in a loop), so we figure this is due to the MASTER interfering with the SLAVE. The core file does not give us much information.

We tried replacing step 2 of the control loop with a routine that re-initializes the hardware counters from scratch. The result was the same: the SLAVE crashes or exits. This is little wonder: the counters are re-initialized by calling cpc_bind_pctx, and cpc_bind_pctx itself calls Psyscall! So we blame the crash on our use of Psyscall.

Any ideas how we might approach debugging this? Are we doing something illegal?

Thank you.

--
This message posted from opensolaris.org
