Re: [9fans] notes and traps

2008-08-30 Thread cinap_lenrek
> i'm not convinced that you've explained why this change is
> correct or why it could solve the problem. after all, the
> NNOTED array is only 5 entries long.  what if one gets 5
> user notes before the system note?
> do you kill the process (isn't this how it works now?),
> make notes unreliable or block the sender? none of these
> options seems like an improvement to me.

Hm, i think here would be a better way to handle it without
losing user notes while make sure internal system notes
get posted (and handled) in any case.

What if we change postnote to:

int
postnote(Proc *p, int dolock, char *n, int flag)
{
int x, s, ret;
Rendez *r;
Proc *d, **l;

if(dolock)
qlock(&p->debug);

if(flag != NUser){
x = 0;
if(p->nnote < NNOTE){
if(p->nnote)
memmove(&p->note[0], &p->note[1], p->nnote * 
sizeof(p->note[0]));
p->nnote++;
}
} else {
x = -1;
if(p->nnote+1 < NNOTE)
x = p->nnote++;
}

ret = 0;
if(x >= 0) {
strcpy(p->note[x].msg, n);
p->note[x].flag = flag;
ret = 1;
}
p->notepending = 1;
if(dolock)
qunlock(&p->debug);
...

In that case, if a external note is posted, we make sure here is
always room for one further internal note in the array. (see the
p->nnote+1 < NNOTE)

On arrival of a internal note, we move the rest down the array
and put our note in the first entry.

So the following conditions hold true:

- internal notes get *always* posted (so the process gets terminated
  if its not handled or while in the note handler)
- we do not drop external notes on internal note arrival and the sender
  can detect if posting the note failed.
- internal notes get not queued after external notes so they will be
  handled by the next call to notify() and not tick notify() to think
  that while processing a previous user note, that this caused an
  internal note and kill the process before it has a chance to see the
  that note.

Anything wrong here?

--
cinap




Re: [9fans] notes and traps

2008-08-30 Thread cinap_lenrek
> > I'm not claiming anyone to be wrong. I want to solve
> > this problem.
> 
> i'm not sure there is a problem with notes.  but i really can't
> answer that question.  someone with a better understanding
> of what notes are supposed to be could answer this.
>
> > I also noted that i think this is the correct behavior in
> > the previous mail. My patch is on the other side inpostnote(),
> > where the note is put in the up->note[] array, not where
> > it decides to kill the proc.
> 
> i'm not convinced that you've explained why this change is
> correct or why it could solve the problem. after all, the
> NNOTED array is only 5 entries long.  what if one gets 5
> user notes before the system note?

assuming now that up->note[] is filled up and up->nnote = 5.
the trap happens... we are in sys/src/9/pc/trap.c trpa():
we detect the fault and do a postnote(up, 1, "sys: trap: bla", NDebug)
postnote() will see that p->notified is 0 becasue we are not in the
note handler yet, so it will skip the:

if(flag != NUser && (p->notify == 0 || p->notified))
p->nnote = 0;

then it checks if here is any room to store the note, which is not
the case. then we set p->notepending and exit.

So now, our trap note got lost! We will *not* get killed then... and happily
process the pending 5 user notes.

With my suggested patch, it would have discarded all the notes
and *have* the NDebug note posted. Notify will pick it up and kill
us if here is no handler installed or if it happend while inside the
note handler.

> do you kill the process (isn't this how it works now?),

no, it would lose the note with the current implementation.

> make notes unreliable or block the sender?

yes, trowing usernotes away when something important as an
internal note happend to make sure it gets put in the note array
so we have a chance to process it:

if(n->flag!=NUser && (up->notified || up->notify==0)){
if(n->flag == NDebug)
pprint("suicide: %s\n", n->msg);
qunlock(&up->debug);
pexit(n->msg, n->flag!=NDebug);
}

This was already in the code (see postnote()).  The only thing that i
claim to change is that it doesnt matter if we are notified or not
when posting the note.  We check for it in notify() ayway so the
process *will* be terminated if it causes another trap while inside
its note handler. 

> none of these
> options seems like an improvement to me.

losing the trap note and *not* handling it at all looks like
a bug to me.

> > >  sleeping in a note handler seems
> > > like it's pushing the limits to me. 
> > 
> > It doesnt matter if we sleep() here. You could do a
> > write() or any other syscall instead. Or do the looping 
> > and wait for the timer interrupt.
> 
> i know.  but sleep gives other processes a large window
> to deliver more notes.  what is the external note, by the
> way?  a keyboard interrupt?

External notes are the ones posted to /proc/pid/note[pg].
Like if you hit [DEL], rio writes "interrupt" the the programs
note file running in the window.

> > Being curious, why is sleep() in a note handler a bad
> > idea?
> 
> as i see it (please chime in if you disagree), a process in
> a note handler is in a funny state.  the kernel treats this
> state as funny as it keeps up->notified around
> to note (ha!) this fact.

As i see it, up->notified just means we have put the process
into its note handler, and any further notes that are in the
up->note[] array (that are NUser) get not handled until the
user process calls noted().

Thats exactly how it is documented in notify(2):

Until the program calls noted, any further externally-generated notes
(e.g., hangup or alarm) *[1] will be held off, and any further notes
generated by erroneous behavior by the program (such as divide by
zero) *[2] will kill the program.

[1] that are, NUser notes
[2] NDebug for traps

> as a general rule, i would think that anything prolonging
> this funny state would tend to cause problems.

> > Dont tell me... But i cant think of any other way to catch
> > linux syscalls for now.
> 
> without changing your mechanism, you could move the
> work out of the note handler and let the note handler
> just set state.

Not in this case. I need both notes to interrupt the linux
machine code (which i cant change) and to catch the
syscalls.

With the current handling, the only place where it would be save to
send (external) notes to a process from another is where i'm not in
linux user code (which can trigger a trap when its doing a syscall).
I dont generate traps in the linuxemu code.  (Which would kill me as
documneted in the manpage)

> for example, byron's rc ^c handler just set a global
> interrupted variable.  there were only three or four
> places where this variable was checked.  these locations
> were carefully vetted for longjmp safety. and lonjmp'ed
> back to the command loop (if interactive).  it doesn't
> sound like it would work well, but it worked

Re: [9fans] notes and traps

2008-08-30 Thread erik quanstrom
> I'm not claiming anyone to be wrong. I want to solve
> this problem.

i'm not sure there is a problem with notes.  but i really can't
answer that question.  someone with a better understanding
of what notes are supposed to be could answer this.

> I also noted that i think this is the correct behavior in
> the previous mail. My patch is on the other side inpostnote(),
> where the note is put in the up->note[] array, not where
> it decides to kill the proc.

i'm not convinced that you've explained why this change is
correct or why it could solve the problem. after all, the
NNOTED array is only 5 entries long.  what if one gets 5
user notes before the system note?

do you kill the process (isn't this how it works now?),
make notes unreliable or block the sender?  none of these
options seems like an improvement to me.

> >  sleeping in a note handler seems
> > like it's pushing the limits to me. 
> 
> It doesnt matter if we sleep() here. You could do a
> write() or any other syscall instead. Or do the looping 
> and wait for the timer interrupt.

i know.  but sleep gives other processes a large window
to deliver more notes.  what is the external note, by the
way?  a keyboard interrupt?

> Being curious, why is sleep() in a note handler a bad
> idea?

as i see it (please chime in if you disagree), a process in
a note handler is in a funny state.  the kernel treats this
state as funny as it keeps up->notified around
to note (ha!) this fact.

as a general rule, i would think that anything prolonging
this funny state would tend to cause problems.

> Dont tell me... But i cant think of any other way to catch
> linux syscalls for now.

without changing your mechanism, you could move the
work out of the note handler and let the note handler
just set state.

for example, byron's rc ^c handler just set a global
interrupted variable.  there were only three or four
places where this variable was checked.  these locations
were carefully vetted for longjmp safety. and lonjmp'ed
back to the command loop (if interactive).  it doesn't
sound like it would work well, but it worked perfectly.

about the same time byron was working on rc, i wrote
my own shell.  i wasn't convinced that byron's technique
would work or would be more reliable than doing things
from the signal handler.  ... experience says that i was
wrong.

there's also a special dodge of a note continuation handler
for ape available.  this hints that the note interface might
be pushed too far by ape.

- erik



Re: [9fans] notes and traps

2008-08-30 Thread Steve Simon
purely as an aside

this detailed conversation on how notes are handled in the
kernel is very interesting, I was trying to understand this myself
recently and gave up; I will now try again.

Thanks,

-Steve



Re: [9fans] notes and traps

2008-08-29 Thread cinap_lenrek
> i think you may be misunderstanding what i'm saying.
> the test for n->flag does not appear to be accidental.
> i am not quite sure why that test is there, but i go on
> the assumption that presotto knew what he was doing.
> if you're going to claim he was wrong, i think you'd better
> have good reasons.

I'm not claiming anyone to be wrong. I want to solve
this problem.

> including explaining why it's okay
> to not terminate a process which has not handed a system
> note when it gets another.

which has not handled what? All it checks is if here is *any* note
handled currently. And if the *next* note to be handled is
a system note.

If here are any other implications in the check i dont see
here please tell.

I also noted that i think this is the correct behavior in
the previous mail. My patch is on the other side inpostnote(),
where the note is put in the up->note[] array, not where
it decides to kill the proc.

> the example program is carefully constructed to abuse
> notes beyond reason.

Its the extract of what linuxemu needs to do. Its not just
constructed to annoy anyone or prove someone wrong.

>  sleeping in a note handler seems
> like it's pushing the limits to me. 

It doesnt matter if we sleep() here. You could do a
write() or any other syscall instead. Or do the looping 
and wait for the timer interrupt.

Being curious, why is sleep() in a note handler a bad
idea?

>  having two threads
> sending notes to the same proc including one generating
> a constent stream of gpf's seems like it's a bit beyond what
> notes were intended for.  if you remove the sleep from the
> note handler, all the notes are handled.

Not the case for me... enabling the for loops results in the
same crash here. (It takes a little bit longer than doing the
sleep() thing, maybe you need to increase loop count to
trigger it on a faster machine)

> i don't think by itself
> it's particularly compelling example.  your vm example
> may be compelling.  i don't understand it yet, though.
> so i can't say.

> i'm not arguing for broken system calls.  but notes, like
> signals, tend to be convienent but fraught interfaces.
> maybe now that plan 9 has given in to multithreading,
> it's time to replace notes.

Dont tell me... But i cant think of any other way to catch
linux syscalls for now.

> - erik

--
cinap




Re: [9fans] notes and traps

2008-08-29 Thread erik quanstrom
> No! Notes are bufferd in the up->note[] array. If you are in the note handler,
> another process *can* send you further (NUser) notes without doing any harm.
> 
> If we are in the note handler (up->notified == 1) and notify() gets hit,
> it will do nothing and return 0 see:
> 
> /sys/src/9/pc/trap.c: notify()
> ...
>   if(n->flag!=NUser && (up->notified || up->notify==0)){
>   if(n->flag == NDebug)
>   pprint("suicide: %s\n", n->msg);
>   qunlock(&up->debug);
>   pexit(n->msg, n->flag!=NDebug);
>   }
> 
>   if(up->notified){
>   qunlock(&up->debug);
>   splhi();
>   return 0;
>   }
> ...
> 
> The problem is when we get a NDebug note *after* an NUser note. Then

i think you may be misunderstanding what i'm saying.
the test for n->flag does not appear to be accidental.
i am not quite sure why that test is there, but i go on
the assumption that presotto knew what he was doing.
if you're going to claim he was wrong, i think you'd better
have good reasons.  including explaining why it's okay
to not terminate a process which has not handed a system
note when it gets another.

the example program is carefully constructed to abuse
notes beyond reason.  sleeping in a note handler seems
like it's pushing the limits to me.  having two threads
sending notes to the same proc including one generating
a constent stream of gpf's seems like it's a bit beyond what
notes were intended for.  if you remove the sleep from the
note handler, all the notes are handled.  i don't think by itself
it's particularly compelling example.  your vm example
may be compelling.  i don't understand it yet, though.
so i can't say.

i'm not arguing for broken system calls.  but notes, like
signals, tend to be convienent but fraught interfaces.
maybe now that plan 9 has given in to multithreading,
it's time to replace notes.

- erik



Re: [9fans] notes and traps

2008-08-29 Thread cinap_lenrek
> > i can reproduce it with this:
> > 
> > http://cm.bell-labs.com/sources/contrib/cinap_lenrek/traptest/
> > 
> > 8c test.c
> > 8a int80.s
> > 8l test.8 int80.8
> > ./8.out
> > 
> > 8.out 12490667: suicide: sys: trap: general protection violation 
> > pc=0x1333
> 
> okay.  it seems pretty clear from the code that you're dead meat
> if you receive a note while you're in the note handler.  that is,
> up->notified = 1. 

No! Notes are bufferd in the up->note[] array. If you are in the note handler,
another process *can* send you further (NUser) notes without doing any harm.

If we are in the note handler (up->notified == 1) and notify() gets hit,
it will do nothing and return 0 see:

/sys/src/9/pc/trap.c: notify()
...
if(n->flag!=NUser && (up->notified || up->notify==0)){
if(n->flag == NDebug)
pprint("suicide: %s\n", n->msg);
qunlock(&up->debug);
pexit(n->msg, n->flag!=NDebug);
}

if(up->notified){
qunlock(&up->debug);
splhi();
return 0;
}
...

The problem is when we get a NDebug note *after* an NUser note. Then
after notify() poped the first NUser note and putting the process into
the user handler, the NDebug note will be the next/first (up->note[0]) and then,
any (indirect) call to notify() will kill us because now it thinks while 
handling the last
note (up->notified == 1) it caused some trap/fatal event (up->note[0].flag != 
NUser).
but this was *not* the case here! We just traped after some other process
put a note in our queue.

The notify() code for detecting trap in note handler is fine i think.
Whats wrong is that the trap got put after the NUser note.

> it looks pretty clear that this is intentional.
> i don't see why one couldn't get 3-4 note before the note handler
> is called, however.
> 
> given this, calling sleep() from the note handler is an especially
> bad idea.
> 
> however, on a multiprocessor (or if you get scheduled by a clock
> tick on a up), you're still vulnerable.  this is akin to hitting ^c
> twice quickly — and watching one's shell exit.
> 
> it would be good to track down what's really going on in your
> vm.  how many processors does plan 9 think it has?

just one :-)

> i did some looking to see if i could find any discussions on the
> implementation of notes and didn't find anything in my quick scan.
> it would be very interesting to have a little perspective from someone
> who was there.

I have done further experiments and changed postnote() in
/sys/src/9/port/proc.c from:
...
if(flag != NUser && (p->notify == 0 || p->notified))
p->nnote = 0;
...
to:
...
if(flag != NUser)
p->nnote = 0;
...
which lets the testcase run without any suicides.

What it does is to ensure (in a harsh way) that not only
if the destination process is currently inside
the notehandler but always, the trap will end up as the first
entry in the up->note array. so no matter what NUser-notes
we received before.

A trap caused by a note handler will still suicide the
process which is correct.

This is just a hack. It would be better to keep the
other notes and move the tail one step down and then
putting the new note on the first entry if its != NUser.

What do you think?

> - erik

--
cinap
--- Begin Message ---
> i can reproduce it with this:
> 
> http://cm.bell-labs.com/sources/contrib/cinap_lenrek/traptest/
> 
> 8c test.c
> 8a int80.s
> 8l test.8 int80.8
> ./8.out
> 
> 8.out 12490667: suicide: sys: trap: general protection violation 
> pc=0x1333

okay.  it seems pretty clear from the code that you're dead meat
if you receive a note while you're in the note handler.  that is,
up->notified = 1.  it looks pretty clear that this is intentional.
i don't see why one couldn't get 3-4 note before the note handler
is called, however.

given this, calling sleep() from the note handler is an especially
bad idea.

however, on a multiprocessor (or if you get scheduled by a clock
tick on a up), you're still vulnerable.  this is akin to hitting ^c
twice quickly — and watching one's shell exit.

it would be good to track down what's really going on in your
vm.  how many processors does plan 9 think it has?

i did some looking to see if i could find any discussions on the
implementation of notes and didn't find anything in my quick scan.
it would be very interesting to have a little perspective from someone
who was there.

- erik
--- End Message ---


Re: [9fans] notes and traps

2008-08-29 Thread erik quanstrom
> i can reproduce it with this:
> 
> http://cm.bell-labs.com/sources/contrib/cinap_lenrek/traptest/
> 
> 8c test.c
> 8a int80.s
> 8l test.8 int80.8
> ./8.out
> 
> 8.out 12490667: suicide: sys: trap: general protection violation 
> pc=0x1333

okay.  it seems pretty clear from the code that you're dead meat
if you receive a note while you're in the note handler.  that is,
up->notified = 1.  it looks pretty clear that this is intentional.
i don't see why one couldn't get 3-4 note before the note handler
is called, however.

given this, calling sleep() from the note handler is an especially
bad idea.

however, on a multiprocessor (or if you get scheduled by a clock
tick on a up), you're still vulnerable.  this is akin to hitting ^c
twice quickly — and watching one's shell exit.

it would be good to track down what's really going on in your
vm.  how many processors does plan 9 think it has?

i did some looking to see if i could find any discussions on the
implementation of notes and didn't find anything in my quick scan.
it would be very interesting to have a little perspective from someone
who was there.

- erik




Re: [9fans] notes and traps

2008-08-29 Thread Kernel Panic

Kernel Panic wrote:

http://cm.bell-labs.com/sources/contrib/cinap_lenrek/traptest/

8c test.c
8a int80.s
8l test.8 int80.8
./8.out

8.out 12490667: suicide: sys: trap: general protection violation 
pc=0x1333


the parent process loops and does "fake" syscalls while
the child posts notes to it.

arg... swap parent <-> child... but you'll see it in the code.

--
cinap



Re: [9fans] notes and traps

2008-08-29 Thread Kernel Panic

erik quanstrom wrote:

it would also be interesting to know if you are
seeing this randomly or if you can reliable reproduce
this condition.
  

i can reproduce it with this:

http://cm.bell-labs.com/sources/contrib/cinap_lenrek/traptest/

8c test.c
8a int80.s
8l test.8 int80.8
./8.out

8.out 12490667: suicide: sys: trap: general protection violation 
pc=0x1333


the parent process loops and does "fake" syscalls while
the child posts notes to it.


- erik
  

--
cinap




Re: [9fans] notes and traps

2008-08-28 Thread Kernel Panic

erik quanstrom wrote:

it looks like you get the second trap while
you are still in your notify handler, since
i think this test
(up->notified || up->notify==0)
is for a proc in a notify handler getting a system
trap (or a proc with no notify handler).
  

right, the problem is, my note handler never sees the
*real* trap. the message passed to the notehandler is the
string posted from the other process. "sig" in the case
of linuxemu. the ureg has the trap field set to 0xD. the
pc is right on the INT 0x80 instruction., but we get the
"sig", not "sys: trap: general protection ..."

(Of course! because that was the context/event that brougth
it into the kernel in the first place. But i think it picked the wrong
note for delivery, and keeps the NDebug note pending.)

If i ignore the "sig" and do a noted(NCONT); the real trap
comes in. (most of the time) But if i do any syscall in the
handler of "sig", we get killed

(maybe caused by notify() detecting the condition:
up->note[0].flag != NUser && up->notified
while we process the "sig" note)

it would be very interesting to know what the system
trap is. 
it would also be interesting to know if you are

seeing this randomly or if you can reliable reproduce
this condition.
  

I can reproduce it outside of linuxemu. I'm at work currently...
I will provide a testcase.

Many thanks for your response.

- erik
  


--
cinap



Re: [9fans] notes and traps

2008-08-28 Thread erik quanstrom
it looks like you get the second trap while
you are still in your notify handler, since
i think this test
(up->notified || up->notify==0)
is for a proc in a notify handler getting a system
trap (or a proc with no notify handler).

it would be very interesting to know what the system
trap is.  it would also be interesting to know if you are
seeing this randomly or if you can reliable reproduce
this condition.

- erik