Re: [9fans] 9vx, kproc and *double sleep*

2010-06-12 Thread erik quanstrom
> FYI, I've wondered if they had the same problem in the go runtime because
> I suspected the code to be quite similar. And I think the go team fixed the
> problem in the ready() equivalent in the go runtime, by adding a flag in the Proc
> equivalent so the proc (G in go) is put back on the run queue ...

are you sure?  that big scheduler lock looks like it's doing
the job of splhi() to me.

- erik



Re: [9fans] unknown error message

2010-06-12 Thread erik quanstrom
On Sat Jun 12 17:46:52 EDT 2010, mathieu.lonja...@gmail.com wrote:
> Hello,
> 
> I doubt it will help me debug my problems but out of curiosity, can
> anyone tell me where this assertion is coming from please? Even better,
> if you have a clue about what I was doing wrong at this point ;)
> 
> 15014.35 a < t->size && t->size < b: assertion failed
> btfs 15014: suicide: sys: trap: page fault pc=0x0001079b
>

ron's excellent suggestion notwithstanding, this looks to
be a memory screw:

; g 'a < t->size && t->size < b' /sys/src
/sys/src/libc/port/pool.c:218:  assert(a < t->size && t->size < b);


setting the various paranoia flags (pool(2)) will probably help.

- erik



Re: [9fans] 9vx, kproc and *double sleep*

2010-06-12 Thread ron minnich
On Sat, Jun 12, 2010 at 3:15 PM, Charles Forsyth wrote:
>>in Linux parlance, Plan 9 is a "preemptible" kernel. Interrupt handlers can 
>>be interrupted, so to speak.
>
> interrupt handlers are not normally interruptible during the interrupt
> processing, but rather at the end (eg, when anyhigher, anyready or preempted
> is called).

Yes, I was not careful enough in how I said that.

For those who wonder what I was trying to say, see trap(); note what
happens after the isr() is called and look where preempted() is
called.

But, all this said, the problems we're seeing on 9vx are strangely
similar to the ones I had on Xen when code that was not supposed to be
interrupted got interrupted. There may be no real connection at all
however.

ron



Re: [9fans] unknown error message

2010-06-12 Thread ron minnich
On Sat, Jun 12, 2010 at 9:42 PM, Mathieu Lonjaret wrote:
> Hello,
>
> I doubt it will help me debug my problems but out of curiosity, can
> anyone tell me where this assertion is coming from please? Even better,
> if you have a clue about what I was doing wrong at this point ;)
>
> 15014.35 a < t->size && t->size < b: assertion failed
> btfs 15014: suicide: sys: trap: page fault pc=0x0001079b


The proc will still be there in 'broken' state (I love that feature of
Plan 9 ... actually implemented it on Linux but doubt I could ever get
them to take that patch).

acid 15014
stk()

it's great.

ron



Re: [9fans] 9vx, kproc and *double sleep*

2010-06-12 Thread Charles Forsyth
>in Linux parlance, Plan 9 is a "preemptible" kernel. Interrupt handlers can be 
>interrupted, so to speak.

interrupt handlers are not normally interruptible during the interrupt
processing, but rather at the end (eg, when anyhigher, anyready or preempted
is called).

processes running at non-interrupt level in the kernel can be interrupted
unless they are splhi or using ilock.

>Except for the clock interrupt handler

that's only because the clock interrupt handler directly or indirectly (eg,
via sched) calls spllo, and other trap or interrupt handlers could do that.



[9fans] unknown error message

2010-06-12 Thread Mathieu Lonjaret
Hello,

I doubt it will help me debug my problems but out of curiosity, can
anyone tell me where this assertion is coming from please? Even better,
if you have a clue about what I was doing wrong at this point ;)

15014.35 a < t->size && t->size < b: assertion failed
btfs 15014: suicide: sys: trap: page fault pc=0x0001079b

Cheers,
Mathieu




Re: [9fans] 9vx, kproc and *double sleep*

2010-06-12 Thread ron minnich
There's kind of an interesting similarity here to what I had to deal
with on the Xen port.

So, a few random thoughts, probably useless, from early problems of
this sort I've had.

- in Linux parlance, Plan 9 is a "preemptible" kernel. Interrupt
handlers can be interrupted, so to speak. Except for the clock
interrupt handler: you have to check the interrupt number to make sure
you are not pre-empting the clock interrupt handler. Sorry if I'm not
saying this very well. On Xen and lguest I had to make sure of this (I
mention this in the lguest port talk slides)
- splhi -- it's not a true splhi in some sense; is it possible that
some code is sneaking in and running even when you splhi()? Could this
explain it?
- What other aspect of the transition from hardware to
software-in-sandbox might explain a non-preemptible bit of code getting
pre-empted?

OK, back to fixing my 1990 civic :-)

ron



Re: [9fans] 9vx, kproc and *double sleep*

2010-06-12 Thread Richard Miller
> - does richard miller's alternate implementation of wakeup
> solve this problem.

No, it doesn't.




Re: [9fans] 9vx, kproc and *double sleep*

2010-06-12 Thread Philippe Anel


9fans,

FYI, I've wondered if they had the same problem in the go runtime because
I suspected the code to be quite similar. And I think the go team fixed the
problem in the ready() equivalent in the go runtime, by adding a flag in the Proc
equivalent so the proc (G in go) is put back on the run queue ...


Phil;

In go/src/pkg/runtime/proc.c:

>-<

// Mark g ready to run.  Sched is already locked.  G might be running
// already and about to stop.  The sched lock protects g->status from
// changing underfoot.
static void
readylocked(G *g)
{
   if(g->m){
      // Running on another machine.
      // Ready it when it stops.
      g->readyonstop = 1;
      return;
   }
   ...

>-<

// Scheduler loop: find g to run, run it, repeat.
static void
scheduler(void)
{
   lock(&sched);
   ...
  
   if(gp->readyonstop){
      gp->readyonstop = 0;
      readylocked(gp);
   }
   ...
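
A minimal sketch of the deferred-ready pattern in the excerpt above, written as a single-threaded C model with the scheduler lock left out. Only `readylocked` and `readyonstop` come from the excerpt; `oncpu` (standing in for `g->m`), `nqueued`, `stopped()` and `deferredready()` are hypothetical names added for illustration. It shows the pattern only and makes no claim about the 9vx problem.

```c
#include <assert.h>

enum { Runnable, Running, Waiting };

typedef struct G G;
struct G {
	int status;
	int oncpu;		/* assumption: stands in for g->m != nil */
	int readyonstop;
	int nqueued;		/* run-queue entries pointing at g */
};

/* Mark g ready to run; the caller is assumed to hold the scheduler lock. */
static void
readylocked(G *g)
{
	if(g->oncpu){
		/* Still on a cpu: only record the wakeup. */
		g->readyonstop = 1;
		return;
	}
	g->status = Runnable;
	g->nqueued++;
}

/* Hypothetical helper: what the scheduler does once g has left its cpu. */
static void
stopped(G *g)
{
	g->oncpu = 0;
	if(g->readyonstop){
		g->readyonstop = 0;
		readylocked(g);
	}
}

/* Replay one wakeup that arrives before g has stopped; return 1 if it was deferred. */
int
deferredready(void)
{
	G g = { Running, 1, 0, 0 };
	int early;

	g.status = Waiting;	/* g decides to block but is still on its cpu */
	readylocked(&g);	/* a waker arrives in that window */
	early = g.nqueued;	/* nothing queued yet */
	stopped(&g);		/* g leaves the cpu; the recorded wakeup is applied */
	assert(g.readyonstop == 0);
	return early == 0 && g.nqueued == 1 && g.status == Runnable;
}
```

In this model the waker never queues a G that is still on a cpu, so the G reaches the run queue exactly once, after it has stopped.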




Re: [9fans] 9vx, kproc and *double sleep*

2010-06-12 Thread Philippe Anel


Hi,

I really think the spin model is good. And in fact, I really think the
current sleep/wakeup/postnote code is good.  However, this model makes
the assumption that plan9 processes are really Machs and not coroutines.

I think we need a larger model, which includes the scheduler.

I mean a model that describes a set of processes (in the spin sense),
each picking a coroutine object from a run queue (shared by all
spin processes) and then calling sleep/wakeup/postnote a few times
before putting the coroutine object back on the run queue. These spin
processes would represent the cpus (or Machs) while the coroutine objects
would represent the plan9 processes.
I even think we don't have to simulate the fact that these processes can be
interrupted.

Again, the change I proposed is not about sleep/wakeup/postnote itself.
It is needed because wakeup() is ready()'ing the awakened process while
the mach on which sleep() runs is still holding a pointer (up) to the
awakened process and can later (in schedinit()) assume it is safe to
access (up)->state. Because of this, schedinit() can try to call ready()
on (up), because (up)->state may have been changed to Running by
a third mach entity.

This change only updates schedinit(), and tries to make (up)->state
access safe when it happens after a sleep() is awakened.
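
The interleaving described above can be replayed by hand in a small single-threaded C model. This is a sketch under stated assumptions, not the kernel code: the names `Proc`, `Mach`, `up`, `ready` and `schedinit` are taken from the discussion, but their bodies here are simplified stand-ins, and `nqueued`, `runproc()` and `replayrace()` are hypothetical helpers added for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { Running, Wakeme, Ready };

typedef struct Proc Proc;
struct Proc {
	int state;
	int nqueued;	/* run-queue entries pointing at this proc */
};

typedef struct Mach Mach;
struct Mach {
	Proc *up;	/* proc this cpu still holds a pointer to */
};

/* Put p on the (modelled) run queue. */
static void
ready(Proc *p)
{
	p->state = Ready;
	p->nqueued++;
}

/* Hypothetical helper: a cpu takes p off the run queue and runs it. */
static void
runproc(Mach *m, Proc *p)
{
	p->nqueued--;
	p->state = Running;
	m->up = p;
}

/* Simplified scheduler entry: a proc found Running is assumed to have yielded. */
static void
schedinit(Mach *m)
{
	if(m->up != NULL){
		if(m->up->state == Running)
			ready(m->up);
		m->up = NULL;
	}
}

/* Replay the interleaving; return 1 if p ends up queued while mach2 runs it. */
int
replayrace(void)
{
	Proc p = { Running, 0 };
	Mach mach0 = { &p }, mach2 = { NULL };

	p.state = Wakeme;	/* mach0: sleep() commits to sleeping; up is still p */
	ready(&p);		/* second mach: wakeup() readies p */
	runproc(&mach2, &p);	/* third mach: picks p up and runs it */
	schedinit(&mach0);	/* mach0: sees Running through its stale up */
	assert(mach0.up == NULL);

	return mach2.up == &p && p.nqueued == 1;
}
```

In this model `replayrace()` returns 1: the proc sits on the run queue again while the third mach is still running it, which is the state the proposed schedinit() change is meant to rule out.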

Phil;



in any event, given the long history with sleep/wakeup, changes should
be justified with a promela model.  the current model omits the spl*
and the second lock.  (http://swtch.com/spin/sleep_wakeup.txt).

- erik