Re: [9fans] 9vx, kproc and *double sleep*
> FYI, I've wondered if they had the same problem in go runtime because
> I suspected the code to be quite similar. And I think go team fixed the
> problem in ready() equivalent in go runtime, by adding a flag in Proc
> equivalent so the proc (G in go) is put back to the run queue ...

are you sure? that big scheduler lock looks like it's doing
the job of splhi() to me.

- erik
Re: [9fans] unknown error message
On Sat Jun 12 17:46:52 EDT 2010, mathieu.lonja...@gmail.com wrote:
> Hello,
>
> I doubt it will help me debug my problems but out of curiosity, can
> anyone tell me where this assertion is coming from please? Even better,
> if you have a clue about what I was doing wrong at this point ;)
>
> 15014.35 a < t->size && t->size < b: assertion failed
> btfs 15014: suicide: sys: trap: page fault pc=0x0001079b
>

ron's excellent suggestion notwithstanding, this looks to be
a memory screw:

; g 'a < t->size && t->size < b' /sys/src
/sys/src/libc/port/pool.c:218: 	assert(a < t->size && t->size < b);

setting the various paranoia flags (pool(2)) will probably help.

- erik
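For readers wondering what a check of this shape guards against, here is a minimal sketch in portable C. It assumes the assertion is a bounds check on a binary tree ordered by size, which is a guess from the expression alone; Node, treeok and checktree are hypothetical names for illustration, not the code in pool.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tree node keyed by size; not the pool.c type. */
typedef struct Node Node;
struct Node {
	unsigned long size;
	Node *left;
	Node *right;
};

/*
 * Return 1 if every node's size lies strictly between a and b and
 * both subtrees respect binary-search order by size, 0 otherwise.
 */
int
treeok(Node *t, unsigned long a, unsigned long b)
{
	if(t == NULL)
		return 1;
	if(!(a < t->size && t->size < b))
		return 0;
	return treeok(t->left, a, t->size) && treeok(t->right, t->size, b);
}

/* Abort as soon as the ordering is violated, as an assert-based check would. */
void
checktree(Node *t, unsigned long a, unsigned long b)
{
	assert(treeok(t, a, b));
}
```

Under that assumption, a stray write that changes one node's size field makes the walk fail even though the program never touched the tree through its proper interface, which is consistent with a memory corruption diagnosis.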
Re: [9fans] 9vx, kproc and *double sleep*
On Sat, Jun 12, 2010 at 3:15 PM, Charles Forsyth wrote:
>>in Linux parlance, Plan 9 is a "preemptible" kernel. Interrupt handlers can
>>be interrupted, so to speak.
>
> interrupt handlers are not normally interruptible during the interrupt
> processing, but rather at the end (eg, when anyhigher, anyready or preempted
> is called).

Yes, I was not careful enough in how I said that. For those who wonder
what I was trying to say, see trap(); note what happens after the
isr() is called and look where preempted() is called.

But, all this said, the problems we're seeing on 9vx are strangely
similar to the ones I had on Xen when code that was not supposed to be
interrupted got interrupted. There may be no real connection at all
however.

ron
Re: [9fans] unknown error message
On Sat, Jun 12, 2010 at 9:42 PM, Mathieu Lonjaret wrote:
> Hello,
>
> I doubt it will help me debug my problems but out of curiosity, can
> anyone tell me where this assertion is coming from please? Even better,
> if you have a clue about what I was doing wrong at this point ;)
>
> 15014.35 a < t->size && t->size < b: assertion failed
> btfs 15014: suicide: sys: trap: page fault pc=0x0001079b

The proc will still be there in 'broken' state (I love that feature of
Plan 9 ... actually implemented it on Linux but doubt I could ever get
them to take that patch).

acid 15014
stk()

it's great.

ron
Re: [9fans] 9vx, kproc and *double sleep*
>in Linux parlance, Plan 9 is a "preemptible" kernel. Interrupt handlers can be
>interrupted, so to speak.

interrupt handlers are not normally interruptible during the interrupt
processing, but rather at the end (eg, when anyhigher, anyready or preempted
is called). processes running at non-interrupt level in the kernel can be
interrupted unless they are splhi or using ilock.

>Except for the clock interrupt handler

that's only because the clock interrupt handler directly or indirectly
(eg, via sched) calls spllo, and other trap or interrupt handlers could
do that.
[9fans] unknown error message
Hello,

I doubt it will help me debug my problems but out of curiosity, can
anyone tell me where this assertion is coming from please? Even better,
if you have a clue about what I was doing wrong at this point ;)

15014.35 a < t->size && t->size < b: assertion failed
btfs 15014: suicide: sys: trap: page fault pc=0x0001079b

Cheers,
Mathieu
Re: [9fans] 9vx, kproc and *double sleep*
There's kind of an interesting similarity here to what I had to deal
with on the Xen port.

So, a few random thoughts, probably useless, from early problems of
this sort I've had.

- in Linux parlance, Plan 9 is a "preemptible" kernel. Interrupt
  handlers can be interrupted, so to speak. Except for the clock
  interrupt handler: you have to check the interrupt number to make
  sure you are not pre-empting the clock interrupt handler. Sorry if
  I'm not saying this very well. On Xen and lguest I had to make sure
  of this (I mention this in the lguest port talk slides)
- splhi -- it's not a true splhi in some sense; is it possible that
  some code is sneaking in and running even when you splhi()? Could
  this explain it?
- What other aspect of the transition from hardware to
  software-in-sandbox might explain a non-preemptible bit of code
  getting pre-empted?

OK, back to fixing my 1990 civic :-)

ron
Re: [9fans] 9vx, kproc and *double sleep*
> - does richard miller's alternate implementation of wakeup
> solve this problem.

No, it doesn't.
Re: [9fans] 9vx, kproc and *double sleep*
9fans,

FYI, I've wondered if they had the same problem in go runtime because
I suspected the code to be quite similar. And I think go team fixed the
problem in ready() equivalent in go runtime, by adding a flag in Proc
equivalent so the proc (G in go) is put back to the run queue ...

Phil;

In go/src/pkg/runtime/proc.c:
>-<
// Mark g ready to run.  Sched is already locked.  G might be running
// already and about to stop.  The sched lock protects g->status from
// changing underfoot.
static void
readylocked(G *g)
{
	if(g->m){
		// Running on another machine.
		// Ready it when it stops.
		g->readyonstop = 1;
		return;
	}
	...
>-<
// Scheduler loop: find g to run, run it, repeat.
static void
scheduler(void)
{
	lock(&sched);
	...
	if(gp->readyonstop){
		gp->readyonstop = 0;
		readylocked(gp);
	}
	...
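The deferred-ready idea in the excerpt above can be sketched as a small single-threaded model in portable C. This is an illustration under assumptions, not the Go runtime: the sched lock is left out because nothing runs concurrently here, and the fields onm and queued, the status values, and the function stopped() are hypothetical stand-ins (onm plays the role of g->m being set).

```c
#include <assert.h>
#include <stddef.h>

enum { Idle, Runnable, Running };

/* Hypothetical, cut-down G; only readyonstop is taken from the excerpt. */
typedef struct G G;
struct G {
	int status;
	int onm;		/* nonzero while some M is still running this G */
	int readyonstop;	/* a ready request arrived while onm was set */
	int queued;		/* how many times G sits on the run queue */
};

/* Mark g ready to run, or remember the request if g has not stopped yet. */
void
readylocked(G *g)
{
	if(g->onm){
		g->readyonstop = 1;
		return;
	}
	assert(g->status != Runnable);
	g->status = Runnable;
	g->queued++;
}

/* Called by the scheduler once g has really stopped running. */
void
stopped(G *g)
{
	g->onm = 0;
	g->status = Idle;
	if(g->readyonstop){
		g->readyonstop = 0;
		readylocked(g);
	}
}
```

In this model a wakeup that arrives while g is still on a machine only sets the flag; g reaches the run queue exactly once, and only after the scheduler has seen it stop.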
Re: [9fans] 9vx, kproc and *double sleep*
Hi,

I really think the spin model is good. And in fact, I really think
current sleep/wakeup/postnote code is good. However, this model makes
the assumption that plan9 processes are really Machs and not coroutines.

I think we need a larger model, which includes the scheduler. I mean a
model that describes a set of processes (in spin meaning), picking one
kind of coroutine object from a run queue (shared by all spin
processes) and then calling sleep/wakeup/postnote a few times before
putting the coroutine object back to the run queue. These spin
processes would represent the cpus (or Machs) while coroutine objects
would represent the plan9 processes. I even think we don't have to
simulate the fact these processes can be interrupted.

Again, the change I proposed is not about sleep/wakeup/postnote, but
because wakeup() is ready()'ing the awakened process while the mach on
which sleep() runs is still holding a pointer (up) to the awakened
process and can later (in schedinit()) assume it is safe to access
(up)->state. Because of this, schedinit() can try to call ready() on
(up), because (up)->state may have been changed to Running by a third
mach entity.

This change only updates schedinit(), and tries to make (up)->state
access safe when it happens after a sleep() is awakened.

Phil;

in any event, given the long history with sleep/wakeup, changes should
be justified with a promela model. the current model omits the spl*
and the second lock. (http://swtch.com/spin/sleep_wakeup.txt).

- erik
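The interleaving Phil describes can be replayed step by step in a small deterministic model. This is a sketch in portable C under simplifying assumptions: the function names mirror the kernel routines mentioned in the thread, but the bodies, the state values, and the fields nqueued and onmach are hypothetical, and locks and spl levels are left out entirely.

```c
#include <assert.h>

enum { Running, Wakeme, Ready };

/* Hypothetical, cut-down Proc for the model. */
typedef struct Proc Proc;
struct Proc {
	int state;
	int nqueued;	/* times the proc is on the run queue */
	int onmach;	/* nonzero while another mach executes it */
};

void
ready(Proc *p)
{
	p->state = Ready;
	p->nqueued++;
}

/* mach A: sleep() marks the proc, but A has not switched away yet */
void
sleepstart(Proc *up)
{
	up->state = Wakeme;
}

/* mach B: wakeup() readies the sleeper */
void
wakeup(Proc *p)
{
	if(p->state == Wakeme)
		ready(p);
}

/* mach C: takes the proc off the run queue and runs it */
void
runproc(Proc *p)
{
	p->nqueued--;
	p->state = Running;
	p->onmach = 1;
}

/* mach A again: schedinit() still holds up and trusts up->state */
void
schedinit(Proc *up)
{
	if(up->state == Running)
		ready(up);
}
```

Calling sleepstart, wakeup, runproc and schedinit in that order on one Proc leaves it on the run queue while mach C is still executing it, which is the double scheduling under discussion.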