Re: [9fans] 9vx, kproc and *double sleep*

2010-06-14 Thread Charles Forsyth
it's interesting that neither of philippe's changes,
however justified, make any visible difference
to 9vx on my ubuntu 10.04LTS system: 9vx still
fails almost immediately. that's consistent with
9vx behaving itself as well as on any other platform
until i changed the linux and/or ubuntu version.
i'll see if i can brave gdb some more to find out.



Re: [9fans] 9vx, kproc and *double sleep*

2010-06-14 Thread Philippe Anel
Charles,

Can you please give us stack information with gdb ?

Phil;

On Mon, 2010-06-14 at 20:15 +0100, Charles Forsyth wrote:
 it's interesting that neither of philippe's changes,
 however justified, make any visible difference
 to 9vx on my ubuntu 10.04LTS system: 9vx still
 fails almost immediately. that's consistent with
 9vx behaving itself as well as on any other platform
 until i changed the linux and/or ubuntu version.
 i'll see if i can brave gdb some more to find out.
 





Re: [9fans] 9vx, kproc and *double sleep*

2010-06-14 Thread ron minnich
If anyone can help me with some valgrind patches we can see if
valgrind can be useful.

Charles, I am really puzzled about your ubuntu experience.

Oh, wait, can you set

LANG=C

and try again? Or is it?

BTW when you get the immediate explosion does a window even ever come
up or does it die before that?

ron



Re: [9fans] 9vx, kproc and *double sleep*

2010-06-14 Thread ron minnich
On Sun, Jun 13, 2010 at 11:03 AM, Philippe Anel x...@bouyapop.org wrote:
 I tried with adding :

   while (p->mach)
       sched_yield();


 at the end of sched.c:^runproc(), before the return.

 It seems to work well.

 What do you think ?

Not sure I understand all the implications but I'll try anything at
this point :-)

I'm trying now with -O3 back on. The -O3 was a red herring.

ron



Re: [9fans] 9vx, kproc and *double sleep*

2010-06-13 Thread Richard Miller
 - splhi -- it's not a true splhi in some sense; is it possible that
 some code is sneaking in and running even when you splhi()? Could this
 explain it?

The error Philippe has found is only indirectly related to splhi().
It's a race between a process in sleep() returning to the scheduler on
cpu A, and the same process being readied and rescheduled on cpu B
after the wakeup.

On native plan 9, A always wins the race because it runs splhi() and
the code path from sleep to schedinit (where up->state==Running is
checked) is shorter than the code path from runproc to the point in
sched where up->state is set to Running.  But the fact that this works
is timing-dependent: if cpu A for some reason ran slower than cpu B,
it could lose the race even without being interrupted.

As Philippe explained, in 9vx the cpus are being simulated by
threads.  Because these threads are being scheduled by the host
operating system, the virtual cpus can appear to be running at
different speeds or to pause at awkward moments.  Even without
any preemption at the plan 9 level of abstraction, the timing
assumption which prevents the sleep - reschedule race is no longer
guaranteed.
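
The interleaving described above can be modelled in a few lines of
ordinary C. This is only a sketch under assumptions: the Proc structure,
the nqueued counter and the names cpu_a_schedinit, cpu_b_wakeup_and_run
and race are hypothetical, the two cpus are simulated by calling their
steps in a chosen order, and no p->mach guard is modelled. It shows that
when cpu B finishes first, the check made by cpu A sees Running and
readies a process that is already running.

```c
enum { Wakeme, Ready, Running };

typedef struct Proc {
    int state;
    int nqueued;    /* how many times the proc sits on the run queue */
} Proc;

static void ready(Proc *p)
{
    p->state = Ready;
    p->nqueued++;
}

/* cpu B: wakeup() readies the proc, then the scheduler picks it up. */
static void cpu_b_wakeup_and_run(Proc *p)
{
    ready(p);
    p->nqueued--;           /* taken off the run queue */
    p->state = Running;     /* now running on cpu B */
}

/* cpu A: the state check reached after sleep() jumps to the scheduler. */
static void cpu_a_schedinit(Proc *p)
{
    if (p->state == Running)
        ready(p);           /* intended for a proc that merely yielded */
}

/* Returns the number of stale run-queue entries left behind. */
int race(int a_wins)
{
    Proc p = { Wakeme, 0 }; /* sleep() has already committed to Wakeme */

    if (a_wins) {
        cpu_a_schedinit(&p);
        cpu_b_wakeup_and_run(&p);
    } else {
        cpu_b_wakeup_and_run(&p);
        cpu_a_schedinit(&p);
    }
    return p.nqueued;
}
```

In this model race(1) returns 0 (A wins, nothing is left on the queue)
and race(0) returns 1 (B wins, and the running process has been queued a
second time).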




Re: [9fans] 9vx, kproc and *double sleep*

2010-06-13 Thread erik quanstrom
 that's only because the clock interrupt handler directly or indirectly (eg,
 via sched) calls spllo, and other trap or interrupt handlers could do that.

wouldn't that be fatal with shared 8259 interrupts?

- erik



Re: [9fans] 9vx, kproc and *double sleep*

2010-06-13 Thread Philippe Anel

In fact you're right, and this shows us this would only happen in 9vx.
Indeed, the proc is a kproc and thus is not scheduled by the
9vx/a/proc.c scheduler, but by the one in 9vx/sched.c ... where
dequeueproc() is not called and where p->mach is not checked.

Thank you !

Phil;

Richard Miller wrote:

Philippe said:

  

Again, the change I proposed is not about sleep/wakeup/postnote, but
about the fact that wakeup() is ready()'ing the awakened process while the
mach on which sleep() runs is still holding a pointer (up) to the awakened
process and can later (in schedinit()) assume it is safe to access
(up)->state. Because of this, schedinit() can try to call ready() on
(up), because (up)->state may have been changed to Running by
a third mach entity.



and I tried to summarize:

  

It's a race between a process in sleep() returning to the scheduler on
cpu A, and the same process being readied and rescheduled on cpu B
after the wakeup.



But after further study of proc.c, I now believe we were both wrong.

A process on the ready queue can only be taken off the queue and
scheduled by calling dequeueproc(), which contains this:
   /*
    *  p->mach==0 only when process state is saved
    */
   if(p == 0 || p->mach){
      unlock(runq);
      return nil;
   }
So the process p can only be scheduled (i.e. p->state set to Running)
if p->mach==nil.

The only place p->mach gets set to nil is in schedinit(), *after*
the test for p->state==Running.

This seems to mean there isn't a race after all, and Philippe's
thought experiment is impossible.

Am I missing something?
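
The effect of that guard can be illustrated with a toy run queue. This
is a sketch under assumptions, not the proc.c code: Runq, the integer
mach field (0 standing for nil) and dequeue_guarded are hypothetical
names, and locking is left out entirely. It shows that a queued process
cannot be picked up while some cpu still owns it.

```c
#include <stddef.h>

enum { Wakeme, Ready, Running };

typedef struct Proc Proc;
struct Proc {
    int   state;
    int   mach;     /* 0 once the saved state is complete, else owning cpu */
    Proc *next;
};

typedef struct Runq {
    Proc *head;
} Runq;

/*
 * Take the head of the queue only if no cpu still owns it,
 * mirroring the "p == 0 || p->mach" test quoted above.
 */
Proc *dequeue_guarded(Runq *q)
{
    Proc *p = q->head;

    if (p == NULL || p->mach != 0)
        return NULL;        /* nothing queued, or still in use elsewhere */
    q->head = p->next;
    p->next = NULL;
    return p;
}
```

With a process whose mach field is still set at the head of the queue,
dequeue_guarded returns NULL and leaves the queue untouched; once the
field is cleared (the step schedinit() performs after its state test),
the same call returns the process.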



  





Re: [9fans] 9vx, kproc and *double sleep*

2010-06-13 Thread Philippe Anel
Should be, at least I think so. But I don't even know yet how we  can do 
this.  :)
I think we can use the go trick and postpone the call to ready() while
p->mach != 0.

But I don't know where yet.

Phil;

ron minnich wrote:

On Sun, Jun 13, 2010 at 7:26 AM, Philippe Anel x...@bouyapop.org wrote:
  

In fact you're right, and this shows us this would only happen in 9vx.
Indeed, the proc is a kproc and thus is not scheduled by the 9vx/a/proc.c
scheduler, but by the one in 9vx/sched.c ... where dequeueproc() is not
called and where p->mach is not checked.



So is changing 9vx/sched.c to do these two steps the real fix?

ron


  





Re: [9fans] 9vx, kproc and *double sleep*

2010-06-13 Thread Philippe Anel

Hi,

The solution is not that simple. I mean when kprocs go to sleep
through the call to psleep(), a pwakeup() is required. We cannot
simply change the following sched.c:^runproc() part :

   while((p = kprocq.head) == nil){

by:

   while(((p = kprocq.head) == nil) || p->mach){

The a/proc.c scheduler is different as it goes idle and can be
awakened by an interrupt (or a working kproc in 9vx).

Phil;


So is changing 9vx/sched.c to do these two steps the real fix?

ron





Re: [9fans] 9vx, kproc and *double sleep*

2010-06-12 Thread Philippe Anel


Hi,

I really think the spin model is good. And in fact, I really think
current sleep/wakeup/postnote code is good.  However, this model makes
the assumption that plan9 processes are really Machs and not coroutines.

I think we need a larger model, which includes the scheduler.

I mean a model that describes a set of processes (in spin meaning),
picking one kind of coroutine objects from a run queue (shared by all
spin processes) and then calling sleep/wakeup/postnote a few times
before putting the coroutine object back to the run queue. These spin
processes would represent the cpus (or Machs) while coroutine objects
would represent the plan9 processes.
I even think we don't have to simulate the fact these processes can be
interrupted.

Again, the change I proposed is not about sleep/wakeup/postnote, but
about the fact that wakeup() is ready()'ing the awakened process while the
mach on which sleep() runs is still holding a pointer (up) to the awakened
process and can later (in schedinit()) assume it is safe to access
(up)->state. Because of this, schedinit() can try to call ready() on
(up), because (up)->state may have been changed to Running by
a third mach entity.

This change only updates schedinit() (and tries) to make (up)->state
access safe when it happens after a sleep() is awakened.

Phil;



in any event, given the long history with sleep/wakeup, changes should
be justified with a promela model.  the current model omits the spl*
and the second lock.  (http://swtch.com/spin/sleep_wakeup.txt).

- erik







Re: [9fans] 9vx, kproc and *double sleep*

2010-06-12 Thread Philippe Anel


9fans,

FYI, I've wondered if they had the same problem in go runtime because
I suspected the code to be quite similar. And I think go team fixed the
problem in ready() equivalent in go runtime, by adding a flag in Proc
equivalent so the proc (G in go) is put back to the run queue ...


Phil;

In go/src/pkg/runtime/proc.c:

-

// Mark g ready to run.  Sched is already locked.  G might be running
// already and about to stop.  The sched lock protects g->status from
// changing underfoot.
static void
readylocked(G *g)
{
   if(g->m){
      // Running on another machine.
      // Ready it when it stops.
      g->readyonstop = 1;
      return;
   }
   ...

-

// Scheduler loop: find g to run, run it, repeat.
static void
scheduler(void)
{
   lock(&sched);
   ...

   if(gp->readyonstop){
      gp->readyonstop = 0;
      readylocked(gp);
   }
   ...
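
To make the deferral easier to follow, here is a small stand-alone model
of it in C. It is a sketch under assumptions rather than the go runtime
code: the G structure is reduced to four fields, onmach stands in for
g->m, queued and stopped() are hypothetical, and the scheduler lock that
the quoted comment relies on is not modelled at all.

```c
enum { Gwaiting, Grunnable, Grunning };

typedef struct G {
    int status;
    int onmach;         /* nonzero while a machine still runs this G */
    int readyonstop;    /* a ready request arrived too early */
    int queued;         /* times placed on the run queue */
} G;

/* Ready g, or remember the request if g has not stopped yet. */
void readylocked(G *g)
{
    if (g->onmach) {
        g->readyonstop = 1;
        return;
    }
    g->status = Grunnable;
    g->queued++;
}

/* Scheduler side: g has fully stopped, so replay any deferred ready. */
void stopped(G *g)
{
    g->onmach = 0;      /* assumption: cleared before the flag is tested */
    if (g->readyonstop) {
        g->readyonstop = 0;
        readylocked(g);
    }
}
```

If readylocked() is called while onmach is set, nothing is queued and
only the flag is raised; the later stopped() call queues the G exactly
once.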




Re: [9fans] 9vx, kproc and *double sleep*

2010-06-12 Thread Richard Miller
 - does richard miller's alternate implementation of wakeup
 solve this problem.

No, it doesn't.




Re: [9fans] 9vx, kproc and *double sleep*

2010-06-12 Thread ron minnich
There's kind of an interesting similarity here to what I had to deal
with on the Xen port.

So, a few random thoughts, probably useless, from early problems of
this sort I've had.

- in Linux parlance, Plan 9 is a preemptible kernel. Interrupt
handlers can be interrupted, so to speak. Except for the clock
interrupt handler: you have to check the interrupt number to make sure
you are not pre-empting the clock interrupt handler. Sorry if I'm not
saying this very well. On Xen and lguest I had to make sure of this (I
mention this in the lguest port talk slides)
- splhi -- it's not a true splhi in some sense; is it possible that
some code is sneaking in and running even when you splhi()? Could this
explain it?
- What other aspect of the transition from hardware to
software-in-sandbox might explain a non-preemptible bit of code getting
pre-empted?

OK, back to fixing my 1990 civic :-)

ron
-



Re: [9fans] 9vx, kproc and *double sleep*

2010-06-12 Thread Charles Forsyth
in Linux parlance, Plan 9 is a preemptible kernel. Interrupt handlers can be 
interrupted, so to speak.

interrupt handlers are not normally interruptible during the interrupt
processing, but rather at the end (eg, when anyhigher, anyready or preempted
is called).

processes running at non-interrupt level in the kernel can be interrupted 
unless they are splhi
or using ilock.

Except for the clock interrupt handler

that's only because the clock interrupt handler directly or indirectly (eg,
via sched) calls spllo, and other trap or interrupt handlers could do that.



Re: [9fans] 9vx, kproc and *double sleep*

2010-06-12 Thread erik quanstrom
 FYI, I've wondered if they had the same problem in go runtime because
 I suspected the code to be quite similar. And I think go team fixed the
 problem in ready() equivalent in go runtime, by adding a flag in Proc
 equivalent so the proc (G in go) is put back to the run queue ...

are you sure?  that big scheduler lock looks like it's doing
the job of splhi() to me.

- erik



Re: [9fans] 9vx, kproc and *double sleep*

2010-06-11 Thread ron minnich
I'll look but one thing doesn't make sense to me:


On Fri, Jun 11, 2010 at 2:06 PM, Philippe Anel x...@bouyapop.org wrote:

           // xigh: move unlocking to schedinit()


schedinit only runs once and sleep runs all the time. That's the part
I don't get.

But you might have found something, I sure wish I understood it all better :-)

ron



Re: [9fans] 9vx, kproc and *double sleep*

2010-06-11 Thread Philippe Anel
Schedinit() initializes the scheduler label ... to which sleep() jumps
when calling gotolabel(&m->sched).


Phil;

ron minnich wrote:

I'll look but one thing doesn't make sense to me:


On Fri, Jun 11, 2010 at 2:06 PM, Philippe Anel x...@bouyapop.org wrote:

  

  // xigh: move unlocking to schedinit()




schedinit only runs once and sleep runs all the time. That's the part
I don't get.

But you might have found something, I sure wish I understood it all better :-)

ron


  





Re: [9fans] 9vx, kproc and *double sleep*

2010-06-11 Thread erik quanstrom
 schedinit only runs once and sleep runs all the time. That's the part
 I don't get.

gotolabel in sleep sends you back to the
setlabel at the top of schedinit.
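
The same control flow can be tried in portable C, with setjmp and
longjmp standing in for setlabel and gotolabel. This is only a sketch
under assumptions: sched_label, fake_sleep and scheduler_loop are
hypothetical names, and everything runs on one stack, whereas the real
kernel also switches stacks. It shows that the function containing the
label is called once, yet the code after the label runs once per jump.

```c
#include <setjmp.h>

static jmp_buf sched_label;     /* plays the role of m->sched */
static int entries;             /* times the code after the label ran */

/* Plays the role of sleep(): jump back to the scheduler label. */
static void fake_sleep(void)
{
    longjmp(sched_label, 1);
}

/*
 * Plays the role of schedinit(): called once, but the statements
 * after setjmp() run again after every longjmp().
 */
int scheduler_loop(int nsleeps)
{
    volatile int remaining = nsleeps;   /* volatile: changed between setjmp and longjmp */

    entries = 0;
    setjmp(sched_label);
    entries++;
    if (remaining > 0) {
        remaining--;
        fake_sleep();           /* does not return here */
    }
    return entries;
}
```

A single call scheduler_loop(3) returns 4: one pass for the initial
setjmp and one more for each of the three jumps.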

 But you might have found something, I sure wish I understood it all better :-)

i'm not entirely convinced that the problem isn't the fact that splhi()
doesn't do anything.

here's what i wonder:
- does richard miller's alternate implementation of wakeup
solve this problem.
- does changing spl* to manipulation of a per-cpu lock solve the problem?
sometimes preventing anything else from running on your mach is
exactly what you want.

in any event, given the long history with sleep/wakeup, changes should
be justified with a promela model.  the current model omits the spl*
and the second lock.  (http://swtch.com/spin/sleep_wakeup.txt).

- erik



Re: [9fans] 9vx, kproc and *double sleep*

2010-06-11 Thread Philippe Anel
I don't think splhi fixes the problem either ... it only hides it in
99.9% of cases.


Phil;

erik quanstrom wrote:

schedinit only runs once and sleep runs all the time. That's the part
I don't get.



gotolabel in sleep sends you back to the
setlabel at the top of schedinit.

  

But you might have found something, I sure wish I understood it all better :-)



i'm not entirely convinced that the problem isn't the fact that splhi()
doesn't do anything.

here's what i wonder:
- does richard miller's alternate implementation of wakeup
solve this problem.
- does changing spl* to manipulation of a per-cpu lock solve the problem?
sometimes preventing anything else from running on your mach is
exactly what you want.

in any event, given the long history with sleep/wakeup, changes should
be justified with a promela model.  the current model omits the spl*
and the second lock.  (http://swtch.com/spin/sleep_wakeup.txt).

- erik


  





Re: [9fans] 9vx, kproc and *double sleep*

2010-06-11 Thread Philippe Anel

Ooops I forgot to answer this :

- does changing spl* to manipulation of a per-cpu lock solve the problem?
sometimes preventing anything else from running on your mach is
exactly what you want.
  

No ... I don't think so. I think the problem comes from the fact the
process is no longer exclusively tied to the current Mach when going
(back) to schedinit() ... hence the change I did.

Phil;



Re: [9fans] 9vx, kproc and *double sleep*

2010-06-11 Thread ron minnich
On Fri, Jun 11, 2010 at 2:49 PM, Philippe Anel x...@bouyapop.org wrote:
 Schedinit() initializes the scheduler label ... to which sleep() jumps when
 calling gotolabel(&m->sched).

 Phil;


yep. Toldja I was not awake yet.

ron



Re: [9fans] 9vx, kproc and *double sleep*

2010-06-11 Thread ron minnich
I'm going to put this change into my hg repo for 9vx and do some
testing; others are welcome to as well.

That's a pretty interesting catch.

ron



Re: [9fans] 9vx, kproc and *double sleep*

2010-06-11 Thread erik quanstrom
On Fri Jun 11 11:03:32 EDT 2010, rminn...@gmail.com wrote:
 I'm going to put this change into my hg repo for 9vx and do some
 testing; others are welcome to as well.
 
 That's a pretty interesting catch.

please wait.  we still don't understand this problem
very well.  (why does this work on real hardware?)
i'd hate to just swap buggy implementations
of sleep/wakeup.

- erik



Re: [9fans] 9vx, kproc and *double sleep*

2010-06-11 Thread erik quanstrom
On Fri Jun 11 10:54:40 EDT 2010, x...@bouyapop.org wrote:
 I don't think either splhi fixes the problem ... it only hides it for 
 the 99.9% cases.

on a casual reading, i agree.  unfortunately,
the current simplified promela model disagrees,
and coraid has run millions of cpu-hrs on quad
processor machines running near 100% load
with up to 1500 procs, and never seen this.

unless you have a good reason why we've never
seen such a deadlock, i'm inclined to believe
we're missing something.  we need better reasons
for sticking locks in than guesswork.
multiple locks can easily lead to deadlock.

have you tried your solution with a single Mach?

 No ... I don't think so. I think the problem comes from the fact the
 process is no longer exclusively tied to the current Mach when going
 (back) to schedinit() ... hence the change I did.

have you tried?  worst case is you'll have more
information on the problem.

- erik



Re: [9fans] 9vx, kproc and *double sleep*

2010-06-11 Thread Philippe Anel


I never seen it on real hardware but I think it does not mean it
cannot happen.  The problem in 9vx comes from the fact 9vx Mach are
simulated by pthreads which can be scheduled just before calling
gotolabel in sleep(). This gives the time to another Mach (or pthread) 
to 'readies' the proc A.


I never seen it on real hardware but I think it does not mean it
cannot happen.  The problem in 9vx comes from the fact 9vx Mach are
simulated by pthreads which can be scheduled just before calling
gotolabel in sleep(). This gives the time to another Mach (or pthread) 
to 'readies' the proc A.


I think it does not happen on real hardware because the cpu just don't
stop while calling gotolabel() and executes the scheduler. It does not
happen because the cpu is not interupted (thanks to splhi). But still,
I feel the problem is here, and we can imagine ... why not, the cpu
running proc A blocking on a bus request or something else.

I don't know if the model is good or not ... and I wrote this is only
a thougth experiment ... with my poor brain :)

I think it does not happen on real hardware because the cpu just don't
stop while calling gotolabel() and executes the scheduler. It does not
happen because the cpu is not interupted (thanks to splhi). But still,
I feel the problem is here, and we can imagine ... why not, the cpu
running proc A blocking on a bus request or something else.

I don't know if the model is good or not ... and I wrote this is only
a thougth experiment ... with my poor brain :)

Phil;


erik quanstrom wrote:

On Fri Jun 11 10:54:40 EDT 2010, x...@bouyapop.org wrote:
  
I don't think either splhi fixes the problem ... it only hides it for 
the 99.9% cases.



on a casual reading, i agree.  unfortunately,
the current simplified promela model disagrees,
and coraid has run millions of cpu-hrs on quad
processor machines running near 100% load
with up to 1500 procs, and never seen this.

unless you have a good reason why we've never
seen such a deadlock, i'm inclined to believe
we're missing something.  we need better reasons
for sticking locks in than guesswork.
multiple locks can easily lead to deadlock.

have you tried your solution with a single Mach?

  

No ... I don't think so. I think the problem comes from the fact the
process is no longer exclusively tied to the current Mach when going
(back) to schedinit() ... hence the change I did.



have you tried?  worst case is you'll have more
information on the problem.

- erik


  





Re: [9fans] 9vx, kproc and *double sleep*

2010-06-11 Thread Philippe Anel

Oooops ... sorry for double copy :) The post was supposed to be :

I have never seen it on real hardware, but I think that does not mean it
cannot happen.  The problem in 9vx comes from the fact that 9vx Machs are
simulated by pthreads, which the host can preempt just before the call to
gotolabel in sleep(). This gives another Mach (or pthread) the time to
ready proc A.


I think it does not happen on real hardware because the cpu just doesn't
stop while calling gotolabel() and executing the scheduler. It does not
happen because the cpu is not interrupted (thanks to splhi). But still,
I feel the problem is here, and we can imagine ... why not, the cpu
running proc A blocking on a bus request or something else.

I don't know if the model is good or not ... and as I wrote, this is only
a thought experiment ... with my poor brain :)


Phil;












Re: [9fans] 9vx, kproc and *double sleep*

2010-06-11 Thread ron minnich
On Fri, Jun 11, 2010 at 8:04 AM, erik quanstrom quans...@quanstro.net wrote:

 please wait.  we still don't understand this problem
 very well.  (why does this work on real hardware?)

all the 9vx failures I have seen are with the kexec threads. This is a
major 9vx change from 9. I do think that there is something in what
Philippe is saying.

ron



Re: [9fans] 9vx, kproc and *double sleep*

2010-06-11 Thread Bakul Shah
On Fri, 11 Jun 2010 16:59:42 +0200 Philippe Anel x...@bouyapop.org  wrote:
 Ooops I forgot to answer this :
  - does changing spl* to manipulation of a per-cpu lock solve the problem?
  sometimes preventing anything else from running on your mach is
  exactly what you want.

 No ... I don't think so. I think the problem comes from the fact the
 process is no longer exclusively tied to the current Mach when going
 (back) to schedinit() ... hence the change I did.

Were you able to verify your hypothesis by adding a bit of
trapping code + assertion(s) in the original sources?  At the
point of double sleep one can check state to see if the
expected preconditions are true.  Alternatively one can check
when the expected conditions become true, set a variable and
test it where the double sleep print occurs.  Then one can sort
of walk back to the earliest point where things go wrong.



Re: [9fans] 9vx, kproc and *double sleep*

2010-06-11 Thread Philippe Anel


I only did my tests on 9vx. I have a version that I instrumented with
a circular log buffer, and I have some gdb macros which dump the
buffer.

I can put the whole source somewhere and even a log with my comments
of the bug if you want to see it. But please note that I made several
changes (because I had to understand how it works) and I would rather
copy my changes to the latest 9vx source tree so that everyone can
read it. What do you think ?

At the end of this post I added part of the instrumentation.

Please, I would like to insist on the fact I'm not saying the promela
model is wrong. And I realize that the fix I propose might not be the
right one. Maybe the problem is even elsewhere. All this is just
feelings, logs and headache.

Phil;


The logs look like this :

(gdb) k9log 10
- kernel history size: 524288
 element size: 64
element count: 8192
  History stack: 56300 elems:
   dbeb: m= 3 pc=44e235 sp=51f02020 up=0 :0  xp=0 :0  r=     0 a=   0 # kproc: runproc: psleep
   dbea: m= 3 pc=44e235 sp=51f02020 up=0 :0  xp=0 :0  r=     0 a=   0 # kproc: runproc: search
   dbe9: m= 3 pc=44e235 sp=51f02020 up=0 :0  xp=0 :0  r=     0 a=   0 # kproc: runproc
   dbe8: m= 3 pc=44dfb2 sp=51f02070 up=0 :0  xp=4 :8  r=     0 a=   0 # proc: sched: calling runproc
   dbe7: m= 3 pc=44dfb2 sp=51f02070 up=0 :0  xp=4 :8  r=     0 a=   0 # proc: sched
   dbe6: m= 3 pc=40e41b sp=51f020b0 up=4 :8  xp=0 :0  r=     0 a=   0 # proc: schedinit: disable up
   dbe5: m= 3 pc=40e41b sp=51f020b0 up=4 :8  xp=0 :0  r=983360 a=   0 # proc: sleep: unlock r
   dbe4: m= 3 pc=40e41b sp=51f020b0 up=4 :8  xp=0 :0  r=983360 a=   0 # proc: sleep: unlock up
   dbe3: m= 3 pc=40e41b sp=51f020b0 up=4 :8  xp=0 :0  r=     0 a=   0 # proc: schedinit: up still active
   dbe2: m= 3 pc=40e41b sp=51f020b0 up=4 :8  xp=0 :0  r=     0 a=   0 # proc: schedinit: start


where the first column is the serial of the logged operation, and 'a'
is just an int.

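
For readers who have not built such a log before, a circular buffer of
this kind can be sketched as follows. This is a hypothetical
illustration, not the 9vx instrumentation itself: LogEnt, logput and
logrecent are invented names, only three of the fields shown in the dump
are kept, the buffer is shrunk to 8 slots, and there is no locking, which
a version shared by several Machs would need.

```c
#include <stddef.h>

enum { NLOG = 8 };      /* tiny for illustration; the dump above reports 8192 */

typedef struct LogEnt {
    unsigned long serial;   /* first column of the dump */
    int           mach;     /* the m= column */
    const char   *op;       /* the text after '#' */
} LogEnt;

static LogEnt logbuf[NLOG];
static unsigned long nextserial;

/* Record one operation, overwriting the oldest slot when full. */
void logput(int mach, const char *op)
{
    LogEnt *e = &logbuf[nextserial % NLOG];

    e->serial = nextserial++;
    e->mach = mach;
    e->op = op;
}

/*
 * Return the i-th most recent entry (0 is the newest), or NULL if it
 * was never written or has already been overwritten.
 */
const LogEnt *logrecent(unsigned long i)
{
    if (i >= nextserial || i >= NLOG)
        return NULL;
    return &logbuf[(nextserial - 1 - i) % NLOG];
}
```

Walking logrecent(0), logrecent(1), ... gives the newest-first order seen
in the dump; after ten logput() calls the newest entry carries serial 9
and the oldest one still available carries serial 2.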
-
void
sleep(Rendez *r, int (*ftest)(void*), void *arg)
{
   int s;
   void (*pt)(Proc*, int, vlong);
   void * pc;

   pc = getcallerpc(&r);
   k9log("proc: sleep", pc, 0, r, 0);

   s = splhi();

   if (up == nil)
      dbgbreak();

   k9log("proc: sleep: lock r", pc, 0, r, 0);

   lock(&r->xlk);

   if(r->xp){
      k9log("proc: sleep: ** double sleep **", pc, r->xp, r, 0);
      panic("cpu%d: up=%d *double sleep* r=%p r->p=%d caller=%p\n",
         m->machno, up ? up->pid : 0, r, r->xp ? r->xp->pid : 0, pc);
   }

   /*
    *  Wakeup only knows there may be something to do by testing
    *  r->p in order to get something to lock on.
    *  Flush that information out to memory in case the sleep is
    *  committed.
    */
   r->xp = up;
   r->xm = m;
   r->xpc = pc;

   k9log("proc: sleep: lock up", pc, 0, r, 0);

   lock(&up->rlock);

   //if (up->state != Running)
   //   dbgbreak();

   k9log("proc: sleep: condition", pc, 0, r, up->nlocks.ref);

   if ((*ftest)(arg)) {
      k9log("proc: sleep: happened", pc, 0, r, 0);

   done:
      /*
       *  if condition happened or a note is pending
       *  never mind
       */
      r->xp = nil;

      k9log("proc: sleep: unlock up", pc, 0, r, 0);

      unlock(&up->rlock);

      k9log("proc: sleep: unlock r", pc, 0, r, 0);

      unlock(&r->xlk);
   }
   else if (up->notepending) {
      k9log("proc: sleep: note pending", pc, 0, r, 0);
      goto done;
   }
   else {
      /*
       *  now we are committed to
       *  change state and call scheduler
       */
      pt = proctrace;
      if(pt)
         pt(up, SSleep, 0);
      up->state = Wakeme;
      up->rx = r;

      /* statistics */
      m->cs++;

      k9log("proc: sleep: sleeping", pc, 0, r, 0);

      procsave(up);
      if(setlabel(&up->sched)) {
         /*
          *  here when the process is awakened
          */
         k9log("proc: sleep: awakened", pc, r->xp, r, 0);

         procrestore(up);
         spllo();
      } else {
         /*
          *  here to go to sleep (i.e. stop Running)
          */

         //up->rmu = 1;

         //k9log("proc: sleep: unlock up", pc, 0, r, 0);

         //unlock(&up->rlock);

         //k9log("proc: sleep: unlock r", pc, 0, r, 0);

         //unlock(&r->xlk);

         k9log("proc: sleep: going sched", pc, 0, r, 0);

         gotolabel(&m->sched);
      }
   }

   k9log("proc: sleep: done", pc, 0, r, 0);

   if(up->notepending) {
      k9log("proc: sleep: forward note", pc, 0, r, 0);

      up->notepending = 0;
      splx(s);
      if(up->procctl == Proc_exitme && up->closingfgrp)
         forceclosefgrp();
      error(Eintr);
   }

   splx(s);
}



struct {
   int32 count;
   struct {
   char * op;
   void * pc;
   void *  

Re: [9fans] 9vx, kproc and *double sleep*

2010-06-11 Thread Bakul Shah
On Fri, 11 Jun 2010 19:31:58 +0200 Philippe Anel x...@bouyapop.org  wrote:
 
 I only did my tests on 9vx. I have a version that I instrumented with
 a circular log buffer, and I have some gdb macros which dumps the
 buffer.
 
 I can put the whole source somewhere and even a log with my comments
 of the bug if you want to see it. But please note that I made several

Yes, please. Thanks!

 changes (because I had to understand how it works) and I would rather
 copy my changes to the latest 9vx source tree so that everyone can
 read it. What do you think ?

Agreed.  Best to check this in on a separate branch though.
Branching/merging is cheap in hg.

 Please, I would like to insist on the fact I'm not saying the promela
 model is wrong. And I realize that the fix I propose might not be the
 good one. Maybe the problem is even elsewhere. All these is just
 feelings, logs and headache.

I haven't used promela so can't say anything about it.
sleep() is pretty complicated so figuring it out will take
some time and effort but I first have to understand the cause
and from past experience I know that code to check a cause
hypothesis can be quite valuable (hence my earlier question).
An unambiguous proof of what went wrong somehow frees my mind
to better focus on the solution!

Thanks for your thought experiments & code!



Re: [9fans] 9vx, kproc and *double sleep*

2010-06-11 Thread Philippe Anel
You can download my own (ugly) 9vx source code here : 
http://www.bouyapop.org/9vxigh.tar.bz2


In 9vx you'll find .gdbinit and crash.c.

Just copy it to vx32 and replace the 9vx folder, compile it and execute it
under gdb with your own 9vx env.


(gdb) r -F -r <your folder>

then compile  and execute  crash.c  with 8c/8l.

When it crashes, you can watch the latest logs with the gdb command 
k9logs 100 (it will show you 100 last ops).


Phil;

Bakul Shah wrote:

On Fri, 11 Jun 2010 19:31:58 +0200 Philippe Anel x...@bouyapop.org  wrote:
  

I only did my tests on 9vx. I have a version that I instrumented with
a circular log buffer, and I have some gdb macros which dumps the
buffer.

I can put the whole source somewhere and even a log with my comments
of the bug if you want to see it. But please note that I made several



Yes, please. Thanks!

  

changes (because I had to understand how it works) and I would rather
copy my changes to the latest 9vx source tree so that everyone can
read it. What do you think ?



Agreed.  Best to check this in on a separate branch though.
Branching/merging is cheap in hg.

  

Please, I would like to insist on the fact I'm not saying the promela
model is wrong. And I realize that the fix I propose might not be the
good one. Maybe the problem is even elsewhere. All these is just
feelings, logs and headache.



I haven't used promela so can't say anything about it.
sleep() is pretty complicated so figuring it out will take
some time and effort but I first have to understand the cause
and from past experience I know that code to check a cause
hypothesis can be quite valuable (hence my earlier question).
An unambiguous proof of what went wrong somehow frees my mind
to better focus on the solution!

Thanks for your thought experiments & code!