Hello! Here is a riddle for you :)

We are experiencing a strange problem when using Psyscall:

We have a MASTER process that grabs a SLAVE process in order to monitor 
hardware counters on behalf of the SLAVE. It does so via the following commands:

/* Initialization */
1.      Grab the SLAVE process: pctx_capture()
2.      Set up SLAVE to count its instructions: cpc_bind_pctx() plus the other 
cpc library init routines
3.      Set MASTER to detect when SLAVE stops: write_cm(pid, PCSTRACE, NULL); 
(SLAVE will stop on the SIGEMT signal, which is raised when the instruction 
counter in the SLAVE overflows)
4.      Set the SLAVE running: write_cm(pid, PCRUN, NULL)
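
As far as we understand proc(4), a control message written to the ctl file is 
a long command code followed by a command-specific operand: a sigset_t for 
PCSTRACE and a long of flags for PCRUN. Below is a minimal sketch of that 
framing, not our actual code. The command codes are placeholders (the real 
values come from <sys/procfs.h>), and the helper names pack_ctl, 
pack_trace_signal and pack_run are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Placeholder command codes; on Solaris use PCSTRACE and PCRUN
 * from <sys/procfs.h> instead. */
enum { DEMO_PCSTRACE = 1, DEMO_PCRUN = 2 };

/* Pack "long code + operand" into buf.
 * Returns the number of bytes used, or 0 if buf is too small. */
static size_t pack_ctl(void *buf, size_t cap, long code,
                       const void *operand, size_t operand_len)
{
    assert(buf != NULL);
    assert(operand != NULL || operand_len == 0);

    if (cap < sizeof code + operand_len)
        return 0;

    memcpy(buf, &code, sizeof code);
    if (operand_len > 0)
        memcpy((char *)buf + sizeof code, operand, operand_len);
    return sizeof code + operand_len;
}

/* PCSTRACE-style message: the operand is the set of signals to trace. */
static size_t pack_trace_signal(void *buf, size_t cap, int signo)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, signo);
    return pack_ctl(buf, cap, DEMO_PCSTRACE, &set, sizeof set);
}

/* PCRUN-style message: the operand is a long of run flags (0 = plain resume). */
static size_t pack_run(void *buf, size_t cap, long flags)
{
    return pack_ctl(buf, cap, DEMO_PCRUN, &flags, sizeof flags);
}
```

In the real MASTER the packed bytes would go to the SLAVE's ctl file in a 
single write(), with SIGEMT as the traced signal.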

Now, when SLAVE's instruction counter overflows, SLAVE is stopped and MASTER 
detects this. Then the following happens:

/* Control loop */
1.      MASTER calls Psyscall on the SLAVE
2.      Psyscall is set up to call cpc_request_preset() and cpc_set_restart() 
in the SLAVE
3.      MASTER sets SLAVE running again: write_cm(pid, PCRUN, NULL)
4.      SLAVE runs until its instruction counter overflows again, at which 
point the sequence executed in the control loop repeats.

The control loop usually executes successfully a dozen times. After that, SLAVE 
either crashes with a segmentation fault (SIGSEGV) or exits prematurely. We 
don't know why. When SLAVE runs by itself, it never crashes (it's a simple 
program that adds up a bunch of numbers in a loop). So we figure this is due to 
MASTER messing with SLAVE. The core file does not give us much information.

We tried replacing item #2 in the control loop with a routine that 
re-initializes the hardware counters from scratch. The result was the same: 
SLAVE crashes or exits. This is no surprise: the counters are re-initialized by 
calling cpc_bind_pctx, and cpc_bind_pctx calls Psyscall!

So we blame the crash on the fact that we use Psyscall.

Any ideas how we might approach debugging this? Are we doing something illegal?

Thank you.
 
 
--
This message posted from opensolaris.org
