Re: [9fans] 9vx, kproc and *double sleep*
it's interesting that neither of philippe's changes, however justified, make any visible difference to 9vx on my ubuntu 10.04LTS system: 9vx still fails almost immediately. that's consistent with 9vx behaving itself as well as on any other platform until i changed the linux and/or ubuntu version. i'll see if i can brave gdb some more to find out.
Re: [9fans] 9vx, kproc and *double sleep*
Charles,

Can you please give us stack information with gdb?

Phil;

On Mon, 2010-06-14 at 20:15 +0100, Charles Forsyth wrote:
> it's interesting that neither of philippe's changes, however justified,
> make any visible difference to 9vx on my ubuntu 10.04LTS system: 9vx
> still fails almost immediately. that's consistent with 9vx behaving
> itself as well as on any other platform until i changed the linux and/or
> ubuntu version. i'll see if i can brave gdb some more to find out.
Re: [9fans] 9vx, kproc and *double sleep*
If anyone can help me with some valgrind patches we can see if valgrind can be useful. Charles, I am really puzzled about your ubuntu experience. Oh, wait, can you set LANG=C and try again? Or is it? BTW when you get the immediate explosion does a window even ever come up or does it die before that? ron
Re: [9fans] 9vx, kproc and *double sleep*
On Sun, Jun 13, 2010 at 11:03 AM, Philippe Anel x...@bouyapop.org wrote:
> I tried with adding :
>
>     while (p->mach)
>         sched_yield();
>
> at the end of sched.c:^runproc(), before the return.
> It seems to work well. What do you think ?

Not sure I understand all the implications but I'll try anything at
this point :-)

I'm trying now with -O3 back on. The -O3 was a red herring.

ron
Re: [9fans] 9vx, kproc and *double sleep*
> - splhi -- it's not a true splhi in some sense; is it possible that
> some code is sneaking in and running even when you splhi()? Could this
> explain it?

The error Philippe has found is only indirectly related to splhi().
It's a race between a process in sleep() returning to the scheduler on
cpu A, and the same process being readied and rescheduled on cpu B
after the wakeup.

On native plan 9, A always wins the race because it runs splhi() and
the code path from sleep to schedinit (where up->state==Running is
checked) is shorter than the code path from runproc to the point in
sched where up->state is set to Running. But the fact that this works
is timing-dependent: if cpu A for some reason ran slower than cpu B, it
could lose the race even without being interrupted.

As Philippe explained, in 9vx the cpus are being simulated by threads.
Because these threads are being scheduled by the host operating system,
the virtual cpus can appear to be running at different speeds or to
pause at awkward moments. Even without any preemption at the plan 9
level of abstraction, the timing assumption which prevents the
sleep -> reschedule race is no longer guaranteed.
Re: [9fans] 9vx, kproc and *double sleep*
> that's only because the clock interrupt handler directly or indirectly
> (eg, via sched) calls spllo, and other trap or interrupt handlers could
> do that.

wouldn't that be fatal with shared 8259 interrupts?

- erik
Re: [9fans] 9vx, kproc and *double sleep*
In fact you're right, and this shows us this would only happen in 9vx.
Indeed, the proc is a kproc and thus is not scheduled by the
9vx/a/proc.c scheduler, but by the one in 9vx/sched.c ... where
dequeueproc() is not called and where p->mach is not checked.

Thank you !

Phil;

Richard Miller wrote:
> Philippe said:
>
>> Again, the change I proposed is not about sleep/wakeup/postnote, but
>> because wakeup() is ready()'ing the awakened process while the mach on
>> which sleep() runs is still holding a pointer (up) to the awakened
>> process and can later (in schedinit()) assume it is safe to access
>> (up)->state. Because of this, schedinit() can try to call ready() on
>> (up), because (up)->state may have been changed to Running by a third
>> mach entity.
>
> and I tried to summarize:
>
>> It's a race between a process in sleep() returning to the scheduler on
>> cpu A, and the same process being readied and rescheduled on cpu B
>> after the wakeup.
>
> But after further study of proc.c, I now believe we were both wrong.
>
> A process on the ready queue can only be taken off the queue and
> scheduled by calling dequeueproc(), which contains this:
>
>     /*
>      *  p->mach==0 only when process state is saved
>      */
>     if(p == 0 || p->mach){
>         unlock(runq);
>         return nil;
>     }
>
> So the process p can only be scheduled (i.e. p->state set to Running)
> if p->mach==nil. The only place p->mach gets set to nil is in
> schedinit(), *after* the test for p->state==Running.
>
> This seems to mean there isn't a race after all, and Philippe's
> thought experiment is impossible.
>
> Am I missing something?
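The argument in the message above can be replayed step by step in a small, single-threaded C model. This is only a sketch under stated assumptions: the Proc fields, the CpuA/CpuB constants and the helper names (sleep_commit, wakeup_sketch, runproc_sketch, schedinit_tail, replay) are hypothetical reductions made up for illustration, not the Plan 9 or 9vx source. It shows that a scheduler which ignores p->mach can end up with two cpus running the same proc, while a scheduler that applies the dequeueproc() test cannot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced model; not the Plan 9 or 9vx source. */
enum { Running, Wakeme, Ready };
enum { NoMach = -1, CpuA = 0, CpuB = 1 };

typedef struct Proc Proc;
struct Proc {
	int state;
	int mach;	/* cpu that still holds the proc's context, or NoMach */
	int onrunq;	/* how many times the proc sits on the run queue */
};

/* cpu A: sleep() has committed and is about to jump to the scheduler. */
static void
sleep_commit(Proc *p)
{
	p->state = Wakeme;
}

/* cpu B: wakeup() readies the sleeping proc. */
static void
wakeup_sketch(Proc *p)
{
	if(p->state == Wakeme){
		p->state = Ready;
		p->onrunq++;
	}
}

/* Pick p off the run queue; check_mach models the dequeueproc() test. */
static int
runproc_sketch(Proc *p, int cpu, int check_mach)
{
	if(p->onrunq == 0)
		return 0;
	if(check_mach && p->mach != NoMach)
		return 0;	/* context not saved yet: refuse */
	p->onrunq--;
	p->state = Running;
	p->mach = cpu;
	return 1;
}

/* cpu A: arrives late in schedinit() and inspects up->state. */
static void
schedinit_tail(Proc *p)
{
	if(p->state == Running){	/* misread as "up was preempted" */
		p->state = Ready;
		p->onrunq++;
	}
	p->mach = NoMach;
}

/* Replay the bad interleaving; return how many cpus started running p. */
static int
replay(int check_mach)
{
	Proc p = { Running, CpuA, 0 };
	int started = 0;

	sleep_commit(&p);				/* A */
	wakeup_sketch(&p);				/* B */
	started += runproc_sketch(&p, CpuB, check_mach);	/* B */
	schedinit_tail(&p);				/* A, delayed */
	started += runproc_sketch(&p, CpuA, check_mach);	/* A */
	return started;
}
```

In this model replay(0) reports two cpus starting the proc, which is the precondition for the *double sleep* panic, while replay(1) reports one. The model says nothing about how likely the delay on cpu A is; it only fixes one interleaving and follows it through.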
Re: [9fans] 9vx, kproc and *double sleep*
Should be, at least I think so. But I don't even know yet how we can do
this. :)

I think we can use the go trick and postpone the call to ready() while
p->mach != 0. But I don't know where yet.

Phil;

ron minnich wrote:
> On Sun, Jun 13, 2010 at 7:26 AM, Philippe Anel x...@bouyapop.org wrote:
>> In fact you're right, and this shows us this would only happen in 9vx.
>> Indeed, the proc is a kproc and thus is not scheduled by the
>> 9vx/a/proc.c scheduler, but by the one in 9vx/sched.c ... where
>> dequeueproc() is not called and where p->mach is not checked.
>
> So is changing 9vx/sched.c to do these two steps the real fix?
>
> ron
Re: [9fans] 9vx, kproc and *double sleep*
Hi,

The solution is not that simple. I mean when kprocs go to sleep through
the call to psleep(), a pwakeup() is required. We cannot simply change
the following sched.c:^runproc() part :

    while((p = kprocq.head) == nil){

by:

    while(((p = kprocq.head) == nil) || p->mach){

The a/proc.c scheduler is different as it goes idle and can be awakened
by an interrupt (or a working kproc in 9vx).

Phil;

> So is changing 9vx/sched.c to do these two steps the real fix?
>
> ron
Re: [9fans] 9vx, kproc and *double sleep*
Hi,

I really think the spin model is good. And in fact, I really think the
current sleep/wakeup/postnote code is good. However, this model makes
the assumption that plan9 processes are really Machs and not
coroutines.

I think we need a larger model, which includes the scheduler. I mean a
model that describes a set of processes (in the spin sense), each
picking a coroutine object from a run queue (shared by all spin
processes) and then calling sleep/wakeup/postnote a few times before
putting the coroutine object back on the run queue. These spin
processes would represent the cpus (or Machs) while the coroutine
objects would represent the plan9 processes.

I even think we don't have to simulate the fact these processes can be
interrupted.

Again, the change I proposed is not about sleep/wakeup/postnote, but
because wakeup() is ready()'ing the awakened process while the mach on
which sleep() runs is still holding a pointer (up) to the awakened
process and can later (in schedinit()) assume it is safe to access
(up)->state. Because of this, schedinit() can try to call ready() on
(up), because (up)->state may have been changed to Running by a third
mach entity.

This change only updates schedinit() (and tries) to make (up)->state
access safe when it happens after a sleep() is awakened.

Phil;

> in any event, given the long history with sleep/wakeup, changes should
> be justified with a promela model. the current model omits the spl*
> and the second lock. (http://swtch.com/spin/sleep_wakeup.txt).
>
> - erik
Re: [9fans] 9vx, kproc and *double sleep*
9fans,

FYI, I've wondered if they had the same problem in go runtime because I
suspected the code to be quite similar. And I think go team fixed the
problem in ready() equivalent in go runtime, by adding a flag in Proc
equivalent so the proc (G in go) is put back to the run queue ...

Phil;

In go/src/pkg/runtime/proc.c:

    // Mark g ready to run.  Sched is already locked.  G might be running
    // already and about to stop.  The sched lock protects g->status from
    // changing underfoot.
    static void
    readylocked(G *g)
    {
        if(g->m){
            // Running on another machine.
            // Ready it when it stops.
            g->readyonstop = 1;
            return;
        }
        ...

    // Scheduler loop: find g to run, run it, repeat.
    static void
    scheduler(void)
    {
        lock(&sched);
        ...
        if(gp->readyonstop){
            gp->readyonstop = 0;
            readylocked(gp);
        }
        ...
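The "ready it when it stops" idea quoted above can be isolated in a few lines of C. This is a minimal sketch, assuming a caller that already holds the scheduler lock; the struct layout and the names running_on, queued and stopped_sketch are hypothetical stand-ins, not the Go runtime's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduction of the deferred-ready flag; not the Go runtime. */
typedef struct G G;
struct G {
	int running_on;		/* stand-in for g->m: nonzero while a machine still runs g */
	int readyonstop;	/* a ready request arrived while g was still running */
	int queued;		/* number of times g was put on the run queue */
};

/* Mark g ready to run. The caller is assumed to hold the scheduler lock. */
static void
readylocked_sketch(G *g)
{
	if(g->running_on){
		/* Still on another machine: remember the request, do not queue. */
		g->readyonstop = 1;
		return;
	}
	g->queued++;
}

/* Called by the scheduler once g's context has been fully saved. */
static void
stopped_sketch(G *g)
{
	g->running_on = 0;
	if(g->readyonstop){
		g->readyonstop = 0;
		readylocked_sketch(g);
	}
}
```

With this shape, a wakeup that arrives before the sleeper has finished switching away never puts the sleeper on the run queue early; the queueing happens exactly once, after the switch. Whether the same flag could be added to 9vx's ready() without other changes is left open in the thread.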
Re: [9fans] 9vx, kproc and *double sleep*
> - does richard miller's alternate implementation of wakeup
> solve this problem.

No, it doesn't.
Re: [9fans] 9vx, kproc and *double sleep*
There's kind of an interesting similarity here to what I had to deal
with on the Xen port. So, a few random thoughts, probably useless, from
early problems of this sort I've had.

- in Linux parlance, Plan 9 is a preemptible kernel. Interrupt handlers
  can be interrupted, so to speak. Except for the clock interrupt
  handler: you have to check the interrupt number to make sure you are
  not pre-empting the clock interrupt handler. Sorry if I'm not saying
  this very well. On Xen and lguest I had to make sure of this (I
  mention this in the lguest port talk slides)

- splhi -- it's not a true splhi in some sense; is it possible that
  some code is sneaking in and running even when you splhi()? Could
  this explain it?

- What other aspect of the transition from hardware to
  software-in-sandbox might explain a non-preemptible bit of code
  getting pre-empted?

OK, back to fixing my 1990 civic :-)

ron
Re: [9fans] 9vx, kproc and *double sleep*
> in Linux parlance, Plan 9 is a preemptible kernel. Interrupt handlers
> can be interrupted, so to speak.

interrupt handlers are not normally interruptible during the interrupt
processing, but rather at the end (eg, when anyhigher, anyready or
preempted is called). processes running at non-interrupt level in the
kernel can be interrupted unless they are splhi or using ilock.

> Except for the clock interrupt handler

that's only because the clock interrupt handler directly or indirectly
(eg, via sched) calls spllo, and other trap or interrupt handlers could
do that.
Re: [9fans] 9vx, kproc and *double sleep*
> FYI, I've wondered if they had the same problem in go runtime because
> I suspected the code to be quite similar. And I think go team fixed
> the problem in ready() equivalent in go runtime, by adding a flag in
> Proc equivalent so the proc (G in go) is put back to the run queue ...

are you sure? that big scheduler lock looks like it's doing the job of
splhi() to me.

- erik
Re: [9fans] 9vx, kproc and *double sleep*
I'll look but one thing doesn't make sense to me:

On Fri, Jun 11, 2010 at 2:06 PM, Philippe Anel x...@bouyapop.org wrote:
> // xigh: move unlocking to schedinit()

schedinit only runs once and sleep runs all the time. That's the part I
don't get.

But you might have found something, I sure wish I understood it all
better :-)

ron
Re: [9fans] 9vx, kproc and *double sleep*
Schedinit() initializes the scheduler label ... to which sleep() jumps
when calling gotolabel(&m->sched).

Phil;

ron minnich wrote:
> I'll look but one thing doesn't make sense to me:
>
> On Fri, Jun 11, 2010 at 2:06 PM, Philippe Anel x...@bouyapop.org wrote:
>> // xigh: move unlocking to schedinit()
>
> schedinit only runs once and sleep runs all the time. That's the part
> I don't get.
>
> But you might have found something, I sure wish I understood it all
> better :-)
>
> ron
Re: [9fans] 9vx, kproc and *double sleep*
> schedinit only runs once and sleep runs all the time. That's the part
> I don't get.

gotolabel in sleep sends you back to the setlabel at the top of
schedinit.

> But you might have found something, I sure wish I understood it all
> better :-)

i'm not entirely convinced that the problem isn't the fact that splhi()
doesn't do anything. here's what i wonder:

- does richard miller's alternate implementation of wakeup solve this
  problem.

- does changing spl* to manipulation of a per-cpu lock solve the
  problem? sometimes preventing anything else from running on your mach
  is exactly what you want.

in any event, given the long history with sleep/wakeup, changes should
be justified with a promela model. the current model omits the spl* and
the second lock. (http://swtch.com/spin/sleep_wakeup.txt).

- erik
Re: [9fans] 9vx, kproc and *double sleep*
I don't think splhi fixes the problem either ... it only hides it in
99.9% of cases.

Phil;

erik quanstrom wrote:
>> schedinit only runs once and sleep runs all the time. That's the part
>> I don't get.
>
> gotolabel in sleep sends you back to the setlabel at the top of
> schedinit.
>
>> But you might have found something, I sure wish I understood it all
>> better :-)
>
> i'm not entirely convinced that the problem isn't the fact that
> splhi() doesn't do anything. here's what i wonder:
>
> - does richard miller's alternate implementation of wakeup solve this
>   problem.
>
> - does changing spl* to manipulation of a per-cpu lock solve the
>   problem? sometimes preventing anything else from running on your
>   mach is exactly what you want.
>
> in any event, given the long history with sleep/wakeup, changes should
> be justified with a promela model. the current model omits the spl*
> and the second lock. (http://swtch.com/spin/sleep_wakeup.txt).
>
> - erik
Re: [9fans] 9vx, kproc and *double sleep*
Ooops I forgot to answer this :

> - does changing spl* to manipulation of a per-cpu lock solve the
>   problem? sometimes preventing anything else from running on your
>   mach is exactly what you want.

No ... I don't think so. I think the problem comes from the fact the
process is no longer exclusively tied to the current Mach when going
(back) to schedinit() ... hence the change I did.

Phil;
Re: [9fans] 9vx, kproc and *double sleep*
On Fri, Jun 11, 2010 at 2:49 PM, Philippe Anel x...@bouyapop.org wrote:
> Schedinit() initialize the scheduler label ... on which sleep() goes
> when calling gotolabel(&m->sched).
>
> Phil;

yep. Toldja I was not awake yet.

ron
Re: [9fans] 9vx, kproc and *double sleep*
I'm going to put this change into my hg repo for 9vx and do some testing; others are welcome to as well. That's a pretty interesting catch. ron
Re: [9fans] 9vx, kproc and *double sleep*
On Fri Jun 11 11:03:32 EDT 2010, rminn...@gmail.com wrote:
> I'm going to put this change into my hg repo for 9vx and do some
> testing; others are welcome to as well.
>
> That's a pretty interesting catch.

please wait. we still don't understand this problem very well. (why
does this work on real hardware?) i'd hate to just swap buggy
implementations of sleep/wakeup.

- erik
Re: [9fans] 9vx, kproc and *double sleep*
On Fri Jun 11 10:54:40 EDT 2010, x...@bouyapop.org wrote:
> I don't think either splhi fixes the problem ... it only hides it for
> the 99.9% cases.

on a casual reading, i agree. unfortunately, the current simplified
promela model disagrees, and coraid has run millions of cpu-hrs on quad
processor machines running near 100% load with up to 1500 procs, and
never seen this.

unless you have a good reason why we've never seen such a deadlock, i'm
inclined to believe we're missing something. we need better reasons for
sticking locks in than guesswork.

multiple locks can easily lead to deadlock. have you tried your
solution with a single Mach?

> No ... I don't think so. I think the problem comes from the fact the
> process is no longer exclusively tied to the current Mach when going
> (back) to schedinit() ... hence the change I did.

have you tried? worst case is you'll have more information on the
problem.

- erik
Re: [9fans] 9vx, kproc and *double sleep*
I have never seen it on real hardware, but I think that does not mean it
cannot happen. The problem in 9vx comes from the fact that 9vx Machs are
simulated by pthreads, which can be descheduled just before calling
gotolabel in sleep(). This gives another Mach (or pthread) the time to
ready the proc A.

I have never seen it on real hardware, but I think that does not mean it
cannot happen. The problem in 9vx comes from the fact that 9vx Machs are
simulated by pthreads, which can be descheduled just before calling
gotolabel in sleep(). This gives another Mach (or pthread) the time to
ready the proc A.

I think it does not happen on real hardware because the cpu just
doesn't stop while calling gotolabel() and executing the scheduler. It
does not happen because the cpu is not interrupted (thanks to splhi).
But still, I feel the problem is here, and we can imagine ... why not,
the cpu running proc A blocking on a bus request or something else. I
don't know if the model is good or not ... and I wrote this is only a
thought experiment ... with my poor brain :)

I think it does not happen on real hardware because the cpu just
doesn't stop while calling gotolabel() and executing the scheduler. It
does not happen because the cpu is not interrupted (thanks to splhi).
But still, I feel the problem is here, and we can imagine ... why not,
the cpu running proc A blocking on a bus request or something else. I
don't know if the model is good or not ... and I wrote this is only a
thought experiment ... with my poor brain :)

Phil;

erik quanstrom wrote:
> On Fri Jun 11 10:54:40 EDT 2010, x...@bouyapop.org wrote:
>> I don't think either splhi fixes the problem ... it only hides it for
>> the 99.9% cases.
>
> on a casual reading, i agree. unfortunately, the current simplified
> promela model disagrees, and coraid has run millions of cpu-hrs on
> quad processor machines running near 100% load with up to 1500 procs,
> and never seen this.
>
> unless you have a good reason why we've never seen such a deadlock,
> i'm inclined to believe we're missing something. we need better
> reasons for sticking locks in than guesswork.
>
> multiple locks can easily lead to deadlock. have you tried your
> solution with a single Mach?
>
>> No ... I don't think so. I think the problem comes from the fact the
>> process is no longer exclusively tied to the current Mach when going
>> (back) to schedinit() ... hence the change I did.
>
> have you tried? worst case is you'll have more information on the
> problem.
>
> - erik
Re: [9fans] 9vx, kproc and *double sleep*
Oooops ... sorry for double copy :)

The post was supposed to be :

I have never seen it on real hardware, but I think that does not mean it
cannot happen. The problem in 9vx comes from the fact that 9vx Machs are
simulated by pthreads, which can be descheduled just before calling
gotolabel in sleep(). This gives another Mach (or pthread) the time to
ready the proc A.

I think it does not happen on real hardware because the cpu just
doesn't stop while calling gotolabel() and executing the scheduler. It
does not happen because the cpu is not interrupted (thanks to splhi).
But still, I feel the problem is here, and we can imagine ... why not,
the cpu running proc A blocking on a bus request or something else. I
don't know if the model is good or not ... and as I wrote, this is only
a thought experiment ... with my poor brain :)

Phil;
Re: [9fans] 9vx, kproc and *double sleep*
On Fri, Jun 11, 2010 at 8:04 AM, erik quanstrom quans...@quanstro.net wrote:
> please wait. we still don't understand this problem very well. (why
> does this work on real hardware?)

all the 9vx failures I have seen are with the kexec threads. This is a
major 9vx change from 9.

I do think that there is something in what Philippe is saying.

ron
Re: [9fans] 9vx, kproc and *double sleep*
On Fri, 11 Jun 2010 16:59:42 +0200 Philippe Anel x...@bouyapop.org wrote:
> Ooops I forgot to answer this :
>
>> - does changing spl* to manipulation of a per-cpu lock solve the
>>   problem? sometimes preventing anything else from running on your
>>   mach is exactly what you want.
>
> No ... I don't think so. I think the problem comes from the fact the
> process is no longer exclusively tied to the current Mach when going
> (back) to schedinit() ... hence the change I did.

Were you able to verify your hypothesis by adding a bit of trapping
code + assertion(s) in the original sources? At the point of double
sleep one can check state to see if the expected preconditions are
true. Alternatively one can check when the expected conditions become
true, set a variable and test it where the double sleep print occurs.
Then one can sort of walk back to the earliest point where things go
wrong.
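The "set a variable when the condition becomes true, test it later" technique described above might look roughly like this in C. It is a sketch only: the Trap record, trapif() and the reduced Proc struct are hypothetical names invented for illustration, and where such checks would best be placed in the 9vx sources is not something the thread settles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reduced Proc; only the fields the check looks at. */
typedef struct Proc Proc;
struct Proc {
	int pid;
	int state;
	void *mach;	/* non-nil while some Mach still holds the context */
};

/* Remembers the earliest point where a suspect condition was seen. */
typedef struct Trap Trap;
struct Trap {
	int fired;
	int pid;
	const char *where;
};

static Trap firsttrap;

/*
 * Evaluate a suspect condition. Only the first hit is recorded, so that
 * a later panic site can report where things first went wrong.
 */
static int
trapif(int cond, Proc *p, const char *where)
{
	if(cond && !firsttrap.fired){
		firsttrap.fired = 1;
		firsttrap.pid = p->pid;
		firsttrap.where = where;
	}
	return cond;
}
```

A call such as trapif(p->mach != NULL, p, "ready: proc still on a mach") at the top of a ready()-like function, plus a print of firsttrap next to the *double sleep* panic, would then tie the panic back to the first violated precondition. In a multi-threaded build the record would need a lock or an atomic compare-and-set; the sketch omits that.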
Re: [9fans] 9vx, kproc and *double sleep*
I only did my tests on 9vx. I have a version that I instrumented with a
circular log buffer, and I have some gdb macros which dump the buffer.

I can put the whole source somewhere and even a log with my comments of
the bug if you want to see it. But please note that I made several
changes (because I had to understand how it works) and I would rather
copy my changes to the latest 9vx source tree so that everyone can read
it. What do you think ?

At the end of this post I added some part of the instruments.

Please, I would like to insist on the fact I'm not saying the promela
model is wrong. And I realize that the fix I propose might not be the
good one. Maybe the problem is even elsewhere. All this is just
feelings, logs and headache.

Phil;

The logs look like this :

(gdb) k9log 10

kernel history size: 524288 element size: 64 element count: 8192
History stack: 56300 elems:

dbeb: m= 3 pc=44e235 sp=51f02020 up=0 :0 xp=0 :0 r= 0 a= 0 # kproc: runproc: psleep
dbea: m= 3 pc=44e235 sp=51f02020 up=0 :0 xp=0 :0 r= 0 a= 0 # kproc: runproc: search
dbe9: m= 3 pc=44e235 sp=51f02020 up=0 :0 xp=0 :0 r= 0 a= 0 # kproc: runproc
dbe8: m= 3 pc=44dfb2 sp=51f02070 up=0 :0 xp=4 :8 r= 0 a= 0 # proc: sched: calling runproc
dbe7: m= 3 pc=44dfb2 sp=51f02070 up=0 :0 xp=4 :8 r= 0 a= 0 # proc: sched
dbe6: m= 3 pc=40e41b sp=51f020b0 up=4 :8 xp=0 :0 r= 0 a= 0 # proc: schedinit: disable up
dbe5: m= 3 pc=40e41b sp=51f020b0 up=4 :8 xp=0 :0 r= 983360 a= 0 # proc: sleep: unlock r
dbe4: m= 3 pc=40e41b sp=51f020b0 up=4 :8 xp=0 :0 r= 983360 a= 0 # proc: sleep: unlock up
dbe3: m= 3 pc=40e41b sp=51f020b0 up=4 :8 xp=0 :0 r= 0 a= 0 # proc: schedinit: up still active
dbe2: m= 3 pc=40e41b sp=51f020b0 up=4 :8 xp=0 :0 r= 0 a= 0 # proc: schedinit: start

where the first column is the serial of the logged operation, and 'a'
is just an int.
void
sleep(Rendez *r, int (*ftest)(void*), void *arg)
{
	int s;
	void (*pt)(Proc*, int, vlong);
	void * pc;

	pc = getcallerpc(&r);
	k9log("proc: sleep", pc, 0, r, 0);

	s = splhi();

	if (up == nil)
		dbgbreak();

	k9log("proc: sleep: lock r", pc, 0, r, 0);
	lock(&r->xlk);
	if(r->xp){
		k9log("proc: sleep: ** double sleep **", pc, r->xp, r, 0);
		panic("cpu%d: up=%d *double sleep* r=%p r->p=%d caller=%p\n",
			m->machno, up ? up->pid : 0, r,
			r->xp ? r->xp->pid : 0, pc);
	}

	/*
	 *  Wakeup only knows there may be something to do by testing
	 *  r->p in order to get something to lock on.
	 *  Flush that information out to memory in case the sleep is
	 *  committed.
	 */
	r->xp = up;
	r->xm = m;
	r->xpc = pc;

	k9log("proc: sleep: lock up", pc, 0, r, 0);
	lock(&up->rlock);

	//if (up->state != Running)
	//	dbgbreak();

	k9log("proc: sleep: condition", pc, 0, r, up->nlocks.ref);
	if ((*ftest)(arg)) {
		k9log("proc: sleep: happened", pc, 0, r, 0);
done:
		/*
		 *  if condition happened or a note is pending
		 *  never mind
		 */
		r->xp = nil;
		k9log("proc: sleep: unlock up", pc, 0, r, 0);
		unlock(&up->rlock);
		k9log("proc: sleep: unlock r", pc, 0, r, 0);
		unlock(&r->xlk);
	} else if (up->notepending) {
		k9log("proc: sleep: note pending", pc, 0, r, 0);
		goto done;
	} else {
		/*
		 *  now we are committed to
		 *  change state and call scheduler
		 */
		pt = proctrace;
		if(pt)
			pt(up, SSleep, 0);
		up->state = Wakeme;
		up->rx = r;

		/* statistics */
		m->cs++;

		k9log("proc: sleep: sleeping", pc, 0, r, 0);
		procsave(up);
		if(setlabel(&up->sched)) {
			/*
			 *  here when the process is awakened
			 */
			k9log("proc: sleep: awakened", pc, r->xp, r, 0);
			procrestore(up);
			spllo();
		} else {
			/*
			 *  here to go to sleep (i.e. stop Running)
			 */
			//up->rmu = 1;
			//k9log("proc: sleep: unlock up", pc, 0, r, 0);
			//unlock(&up->rlock);
			//k9log("proc: sleep: unlock r", pc, 0, r, 0);
			//unlock(&r->xlk);
			k9log("proc: sleep: going sched", pc, 0, r, 0);
			gotolabel(&m->sched);
		}
	}

	k9log("proc: sleep: done", pc, 0, r, 0);
	if(up->notepending) {
		k9log("proc: sleep: forward note", pc, 0, r, 0);
		up->notepending = 0;
		splx(s);
		if(up->procctl == Proc_exitme && up->closingfgrp)
			forceclosefgrp();
		error(Eintr);
	}

	splx(s);
}

struct {
	int32 count;
	struct {
		char * op;
		void * pc;
		void *
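The log structure in the message above is cut off in the archive, so the following does not try to reconstruct it. It is an independent, minimal sketch of a circular log buffer of the kind the message describes, with a serial number per entry. The names Logent, klog, klog_add and klog_recent, and the tiny buffer size, are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { Nlog = 8 };	/* illustrative size; the quoted log reports 8192 elements */

/* One logged operation. Fields are a hypothetical subset. */
typedef struct Logent Logent;
struct Logent {
	unsigned long serial;	/* position in the overall history */
	const char *op;		/* text such as "proc: sleep: lock r" */
	void *pc;		/* caller's pc */
};

static struct {
	unsigned long count;	/* total entries ever logged */
	Logent ent[Nlog];
} klog;

/* Append one entry, overwriting the oldest once the buffer is full. */
static void
klog_add(const char *op, void *pc)
{
	Logent *e;

	e = &klog.ent[klog.count % Nlog];
	e->serial = klog.count++;
	e->op = op;
	e->pc = pc;
}

/* Return the i-th most recent entry (0 is the newest), or NULL. */
static Logent*
klog_recent(unsigned long i)
{
	if(i >= klog.count || i >= Nlog)
		return NULL;
	return &klog.ent[(klog.count - 1 - i) % Nlog];
}
```

Walking klog_recent(0), klog_recent(1), ... yields entries newest first, which matches the descending serials in the dump shown earlier. The sketch is not safe with several Machs logging at once; a real version would need an atomic increment of the counter or a lock, and taking a lock inside a logger that instruments lock() itself needs care.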
Re: [9fans] 9vx, kproc and *double sleep*
On Fri, 11 Jun 2010 19:31:58 +0200 Philippe Anel x...@bouyapop.org wrote:
> I only did my tests on 9vx. I have a version that I instrumented with
> a circular log buffer, and I have some gdb macros which dumps the
> buffer.
>
> I can put the whole source somewhere and even a log with my comments
> of the bug if you want to see it. But please note that I made several

Yes, please. Thanks!

> changes (because I had to understand how it works) and I would rather
> copy my changes to the latest 9vx source tree so that everyone can
> read it. What do you think ?

Agreed. Best to check this in on a separate branch though.
Branching/merging is cheap in hg.

> Please, I would like to insist on the fact I'm not saying the promela
> model is wrong. And I realize that the fix I propose might not be the
> good one. Maybe the problem is even elsewhere. All these is just
> feelings, logs and headache.

I haven't used promela so can't say anything about it. sleep() is
pretty complicated so figuring it out will take some time and effort
but I first have to understand the cause and from past experience I
know that code to check a cause hypothesis can be quite valuable (hence
my earlier question). An unambiguous proof of what went wrong somehow
frees my mind to better focus on the solution!

Thanks for your thought experiments and code!
Re: [9fans] 9vx, kproc and *double sleep*
You can download my own (ugly) 9vx source code here :

http://www.bouyapop.org/9vxigh.tar.bz2

In 9vx you'll find .gdbinit and crash.c. Just copy it to vx32 and
replace the 9vx folder, compile it and execute it under gdb with your
own 9vx env.

(gdb) r -F -r <your folder>

then compile and execute crash.c with 8c/8l. When it crashes, you can
watch the latest logs with the gdb command k9logs 100 (it will show you
the 100 last ops).

Phil;

Bakul Shah wrote:
> On Fri, 11 Jun 2010 19:31:58 +0200 Philippe Anel x...@bouyapop.org wrote:
>> I only did my tests on 9vx. I have a version that I instrumented with
>> a circular log buffer, and I have some gdb macros which dumps the
>> buffer.
>>
>> I can put the whole source somewhere and even a log with my comments
>> of the bug if you want to see it. But please note that I made several
>
> Yes, please. Thanks!
>
>> changes (because I had to understand how it works) and I would rather
>> copy my changes to the latest 9vx source tree so that everyone can
>> read it. What do you think ?
>
> Agreed. Best to check this in on a separate branch though.
> Branching/merging is cheap in hg.
>
>> Please, I would like to insist on the fact I'm not saying the promela
>> model is wrong. And I realize that the fix I propose might not be the
>> good one. Maybe the problem is even elsewhere. All these is just
>> feelings, logs and headache.
>
> I haven't used promela so can't say anything about it. sleep() is
> pretty complicated so figuring it out will take some time and effort
> but I first have to understand the cause and from past experience I
> know that code to check a cause hypothesis can be quite valuable
> (hence my earlier question). An unambiguous proof of what went wrong
> somehow frees my mind to better focus on the solution!
>
> Thanks for your thought experiments and code!