Re: [9fans] Race condition in /sys/src/9/pc/trap.c?

2009-07-31 Thread Charles Forsyth
this resulted in a little side discussion.
to save someone else from having to break a strong oath about 9fans,
i'll sum it up. the existing code handles this situation.

when several processes share a segment, and any one of them
decides to shrink the segment, all the processes must see
the change before the pages are made available for re-use.
any one of them could have any of those pages currently in their mmu state.
and any  must be removed from the mmu state local to the process,
to ensure that no further access to the pages is possible before they are freed.
the critical point is that it's irrelevant whether traps or syscalls are
involved: ordinary store instructions are clearly bad enough.

thus /sys/src/9/port/segment.c:/^mfreeseg contains (with s locked)
/* flush this seg in all other processes */
if(s-ref  1)
procflushseg(s);
and procflushseg finds all processes that share s, sets them all up
to flush their mmu states, and also sets any processor running such a process
to flush its state (that's picked up by a clock interrupt).

procflushseg will not proceed until all processes and processors that
might need to flush state have done so. (s remains locked throughout.)

after the flush, no process or processor can have a reference
to any of the pages in its mmu state.  it is safe to free them,
which mfreeseg does.

now, a process might still be executing a system call or some other trap
that might refer to that segment, to an address that's now been removed
by another process. to access the memory, it must ultimately issue
a load or a store instruction (even for syscalls, such as read or write).
that instruction will trap, because as described above there is no longer
a valid mmu mapping for that address within the process. normally, the
trap will find the right page in the software mmu structures and install the 
map.
in this case, however, it can't find the address, so it will raise an exception
in the process (ie, send a note). (all such searches are done with the
segment properly locked.)---BeginMessage---
just stop processes with s-ref  1 from freeing parts of s with ibrk.
it's not as if anything ever does in practice.---End Message---


Re: [9fans] Race condition in /sys/src/9/pc/trap.c?

2009-07-31 Thread Richard Miller
 ... procflushseg finds all processes that share s, sets them all up
 to flush their mmu states, and also sets any processor running such a process
 to flush its state (that's picked up by a clock interrupt).

 procflushseg will not proceed until all processes and processors that
 might need to flush state have done so. (s remains locked throughout.)

Coincidentally, I spent last Sunday debugging a deadlock in precisely
this spot.  I had absentmindedly tried to use VESA vga on a multiprocessor.
The aux/vga -l apparently succeeded and the screen looked great, but
as a stealthy side-effect the CPU which had done the VESA call stopped
responding to interrupts -- including the local APIC clock interrupt
required for the mmu flush as described above.

So, some time later when another process (upas/fs as it happens) on
the other CPU wanted to adjust a segment size, procflushseg was called
and never returned.

Debugging can be challenging when cause and effect are minutes or
hours apart ...




Re: [9fans] Race condition in /sys/src/9/pc/trap.c?

2009-07-31 Thread cinap_lenrek
very interesting. thanks for the sum up.

but my testcase crashes a uniprocessor system, so here is no
waiting for mmuflushes on other processors going on.

any other process that shares the segment and was suspended
in the kernel may potentialy hold a pointer to that freed memory
area of the segment and may cause a fault in the kernel on resume.

process1:
in kernel:
read(buf)
validaddr(buf)
...
*context switch*

process2:
in kernel:
...
ibrk()
mfreeseg()
procflushseg()
flushmmu()
...
*context switch*

process1:
still in kernel:
memmove(buf, ...)
*fault*
trap()
fault386()
fault() = -1
if(!user){
panic()
*panic*
}
...
postnote()


we cant really wait for all processes sharing the segment to be
in userspace as they may already be waiting in a long blocking
syscall.

how do we fix this?

--
cinap
---BeginMessage---
this resulted in a little side discussion.
to save someone else from having to break a strong oath about 9fans,
i'll sum it up. the existing code handles this situation.

when several processes share a segment, and any one of them
decides to shrink the segment, all the processes must see
the change before the pages are made available for re-use.
any one of them could have any of those pages currently in their mmu state.
and any  must be removed from the mmu state local to the process,
to ensure that no further access to the pages is possible before they are freed.
the critical point is that it's irrelevant whether traps or syscalls are
involved: ordinary store instructions are clearly bad enough.

thus /sys/src/9/port/segment.c:/^mfreeseg contains (with s locked)
/* flush this seg in all other processes */
if(s-ref  1)
procflushseg(s);
and procflushseg finds all processes that share s, sets them all up
to flush their mmu states, and also sets any processor running such a process
to flush its state (that's picked up by a clock interrupt).

procflushseg will not proceed until all processes and processors that
might need to flush state have done so. (s remains locked throughout.)

after the flush, no process or processor can have a reference
to any of the pages in its mmu state.  it is safe to free them,
which mfreeseg does.

now, a process might still be executing a system call or some other trap
that might refer to that segment, to an address that's now been removed
by another process. to access the memory, it must ultimately issue
a load or a store instruction (even for syscalls, such as read or write).
that instruction will trap, because as described above there is no longer
a valid mmu mapping for that address within the process. normally, the
trap will find the right page in the software mmu structures and install the 
map.
in this case, however, it can't find the address, so it will raise an exception
in the process (ie, send a note). (all such searches are done with the
segment properly locked.)---BeginMessage---
just stop processes with s-ref  1 from freeing parts of s with ibrk.
it's not as if anything ever does in practice.---End Message---
---End Message---


Re: [9fans] Race condition in /sys/src/9/pc/trap.c?

2009-07-31 Thread erik quanstrom
 process1:
   still in kernel:
   memmove(buf, ...)
   *fault*
   trap()
   fault386()
   fault() = -1
   if(!user){
   panic()
   *panic*
   }
   ...
   postnote()

could you be more specific.  what is your test program,
where is it crashing (if you know), and what is the panic
message, if any?  i must be dense, but i'm confused by
your process diagram.

- erik



Re: [9fans] Race condition in /sys/src/9/pc/trap.c?

2009-07-31 Thread Devon H. O'Dell
2009/7/31 erik quanstrom quans...@coraid.com:
 process1:
       still in kernel:
               memmove(buf, ...)
               *fault*
               trap()
                       fault386()
                               fault() = -1
                               if(!user){
                                       panic()
                                       *panic*
                               }
                               ...
                               postnote()

 could you be more specific.  what is your test program,
 where is it crashing (if you know), and what is the panic
 message, if any?  i must be dense, but i'm confused by
 your process diagram.

He posted it earlier in this thread

--dho

 - erik




Re: [9fans] Race condition in /sys/src/9/pc/trap.c?

2009-07-31 Thread Devon H. O'Dell
2009/7/31 erik quanstrom quans...@coraid.com:
  could you be more specific.  what is your test program,
  where is it crashing (if you know), and what is the panic
  message, if any?  i must be dense, but i'm confused by
  your process diagram.

 He posted it earlier in this thread

 --dho

 please post a reference.  i do not see either the crash
 code or the panic message on 9fans.net/archive/2009/07.
 i don't recall it in the originals, either.

 - erik

Was received as an attachment. Inline:

#include u.h
#include libc.h

void
main(int argc, char **argv)
{
char *buf;
int fd;

if((fd = open(/dev/zero, OREAD))  0)
sysfatal(open);

buf = (char*)0x60;
segattach(0, memory, buf, 2*4096);

switch(rfork(RFPROC|RFMEM)){
case -1:
sysfatal(fork);
break;
case 0:
for(;;){
read(fd, buf+4096, 4096);
}
}
sleep(1000);
segbrk(buf, buf+4096);
}

--dho



Re: [9fans] Race condition in /sys/src/9/pc/trap.c?

2009-07-31 Thread Charles Forsyth
but my testcase crashes a uniprocessor system, so here is no
waiting for mmuflushes on other processors going on.

it ensures mmuflushes in all other processes (sharing that segment) as well.
in fact, the crash you describe just emphasises that point:
the page reference no longer exists, hence the fault.

the problem (which frankly doesn't bother me) is that fault386
is being overly cautious in assuming that a page fault that occurs
in system mode but can't map a page successfully is necessarily a kernel bug:
that's not true. it could just note the process instead.
(it doesn't bother me because since unix days i've seen less than a handful
of programs that SHRINK their existing data segments, and i think that's the
only case that can cause the panic you're seeing.)



Re: [9fans] Race condition in /sys/src/9/pc/trap.c?

2009-07-31 Thread erik quanstrom
 it ensures mmuflushes in all other processes (sharing that segment) as well.
 in fact, the crash you describe just emphasises that point:
 the page reference no longer exists, hence the fault.
 
 the problem (which frankly doesn't bother me) is that fault386
 is being overly cautious in assuming that a page fault that occurs
 in system mode but can't map a page successfully is necessarily a kernel bug:
 that's not true. it could just note the process instead.
 (it doesn't bother me because since unix days i've seen less than a handful
 of programs that SHRINK their existing data segments, and i think that's the
 only case that can cause the panic you're seeing.)

if this case is really not important, would it make sense
to disallow shrinking segments? it might be worth it just
to be able to define Eshrinkage.

- erik



Re: [9fans] Race condition in /sys/src/9/pc/trap.c?

2009-07-31 Thread Charles Forsyth
you can get similar effects by remapping things.
i meant that it isn't likely to happen by accident, so am i bovvered?
fault386 needs to be fixed mainly by or for people running a shared cpu
server with hostile users (ie, students).
for the rest of us it might be more useful to have the panic
to prevent real kernel bugs (ie, just bad pointers in device driver 
implementations)
from postnoting a process instead of stopping the system.
having said that, it could be argued that even in that case
a postnote to the invoking process would allow the rest of the system
to run and `might not' mean that the broken driver has wrecked other
data structures outside it in kernel memory.---BeginMessage---
 it ensures mmuflushes in all other processes (sharing that segment) as well.
 in fact, the crash you describe just emphasises that point:
 the page reference no longer exists, hence the fault.
 
 the problem (which frankly doesn't bother me) is that fault386
 is being overly cautious in assuming that a page fault that occurs
 in system mode but can't map a page successfully is necessarily a kernel bug:
 that's not true. it could just note the process instead.
 (it doesn't bother me because since unix days i've seen less than a handful
 of programs that SHRINK their existing data segments, and i think that's the
 only case that can cause the panic you're seeing.)

if this case is really not important, would it make sense
to disallow shrinking segments? it might be worth it just
to be able to define Eshrinkage.

- erik---End Message---


Re: [9fans] Race condition in /sys/src/9/pc/trap.c?

2009-07-31 Thread cinap_lenrek
i attached it in a previous mail... try again...

--
cinap
---BeginMessage---
 process1:
   still in kernel:
   memmove(buf, ...)
   *fault*
   trap()
   fault386()
   fault() = -1
   if(!user){
   panic()
   *panic*
   }
   ...
   postnote()

could you be more specific.  what is your test program,
where is it crashing (if you know), and what is the panic
message, if any?  i must be dense, but i'm confused by
your process diagram.

- erik---End Message---
#include u.h
#include libc.h

void
main(int argc, char **argv)
{
char *buf;
int fd;

if((fd = open(/dev/zero, OREAD))  0)
sysfatal(open);

buf = (char*)0x60;
segattach(0, memory, buf, 2*4096);

switch(rfork(RFPROC|RFMEM)){
case -1:
sysfatal(fork);
break;
case 0:
for(;;){
read(fd, buf+4096, 4096);
}
}
sleep(1000);
segbrk(buf, buf+4096);
}


Re: [9fans] Race condition in /sys/src/9/pc/trap.c?

2009-07-30 Thread erik quanstrom
On Thu Jul 30 00:05:45 EDT 2009, el...@andrew.cmu.edu wrote:
 My familiarity with the kernel source code is superficial to say the 
 least, but it seems to me that this code (from /sys/src/9/pc/trap.c) 
 contains a race condition:
 
   702 if(sp(USTKTOP-BY2PG) || sp(USTKTOP-sizeof(Sargs)-BY2WD))
   703 validaddr(sp, sizeof(Sargs)+BY2WD, 0);
   704
   705 up-s = *((Sargs*)(sp+BY2WD));
 
 We verify that the address is correct; is there any reason another thread 
 in the same address space cannot start running after line 703 completes 
 and unmap that memory, causing us to access unmapped memory at line 705? 
 The system call entry is itself an interrupt gate, but line 689 is 
 spllo(), and we appear to hold no locks at this point.

plan 9 threads are cooperatively scheduled.  so
the correct term is proc.  but you are correct,
another proc sharing memory with this one
could be running.  however, that proc would
not have access to this proc's stack.  (rfork
doesn't allow shared stack.)  and even if it
did, plan 9 stacks don't shrink.

let's suppose that the address is invalid later.
the kernel always moves data to/from user
buffers outside of any locks because even
valid targets may be paged out.  if the address
is truely invalid, waserror() will be true and
the else branch starting at 714 will be executed

- erik



Re: [9fans] Race condition in /sys/src/9/pc/trap.c?

2009-07-30 Thread Sape Mullender
 On Thu, 30 Jul 2009, erik quanstrom wrote:
 
 On Thu Jul 30 00:05:45 EDT 2009, el...@andrew.cmu.edu wrote:
 My familiarity with the kernel source code is superficial to say the
 least, but it seems to me that this code (from /sys/src/9/pc/trap.c)
 contains a race condition:

   702 if(sp(USTKTOP-BY2PG) || sp(USTKTOP-sizeof(Sargs)-BY2WD))
   703 validaddr(sp, sizeof(Sargs)+BY2WD, 0);
   704
   705 up-s = *((Sargs*)(sp+BY2WD));

 We verify that the address is correct; is there any reason another thread
 in the same address space cannot start running after line 703 completes
 and unmap that memory, causing us to access unmapped memory at line 705?
 The system call entry is itself an interrupt gate, but line 689 is
 spllo(), and we appear to hold no locks at this point.

 plan 9 threads are cooperatively scheduled.  so
 the correct term is proc.  but you are correct,
 another proc sharing memory with this one
 could be running.  however, that proc would
 not have access to this proc's stack.  (rfork
 doesn't allow shared stack.)  and even if it
 did, plan 9 stacks don't shrink.
 
 What if sp points inside a segment which is not the actual stack segment? 
 Then could someone else come along and segdetach() it in between the two 
 mentioned lines?
 
 let's suppose that the address is invalid later.
 the kernel always moves data to/from user
 buffers outside of any locks because even
 valid targets may be paged out.  if the address
 is truely invalid, waserror() will be true and
 the else branch starting at 714 will be executed

 - erik
 
 -- Elly

I think you may be right, Elly.  Multithreaded programs indeed have their
stack running outside the stack segment, so this could happen there.
splhi won't even do on a multiprocessor.  One should probably lock down
the segment.
We've never seen this happen, of course — or rather, we haven't noticed
this as the cause of a crash.

Sape




Re: [9fans] Race condition in /sys/src/9/pc/trap.c?

2009-07-30 Thread erik quanstrom
  plan 9 threads are cooperatively scheduled.  so
  the correct term is proc.  but you are correct,
  another proc sharing memory with this one
  could be running.  however, that proc would
  not have access to this proc's stack.  (rfork
  doesn't allow shared stack.)  and even if it
  did, plan 9 stacks don't shrink.
 
 What if sp points inside a segment which is not the actual stack segment? 
 Then could someone else come along and segdetach() it in between the two 
 mentioned lines?

see below:

  let's suppose that the address is invalid later.
  the kernel always moves data to/from user
  buffers outside of any locks because even
  valid targets may be paged out.  if the address
  is truely invalid, waserror() will be true and
  the else branch starting at 714 will be executed

if you think this is possible, why don't you build a
test case and prove that it can happen.  the easiest
way would be to disable the check completely.

- erik



Re: [9fans] Race condition in /sys/src/9/pc/trap.c?

2009-07-30 Thread cinap_lenrek
hm... what about the other stuff port/sysfile.c?

they also just checks pointers with validaddr and then call
into the device handler without locking anything / holding
anything.

--
cinap

---BeginMessage---
 On Thu, 30 Jul 2009, erik quanstrom wrote:
 
 On Thu Jul 30 00:05:45 EDT 2009, el...@andrew.cmu.edu wrote:
 My familiarity with the kernel source code is superficial to say the
 least, but it seems to me that this code (from /sys/src/9/pc/trap.c)
 contains a race condition:

   702 if(sp(USTKTOP-BY2PG) || sp(USTKTOP-sizeof(Sargs)-BY2WD))
   703 validaddr(sp, sizeof(Sargs)+BY2WD, 0);
   704
   705 up-s = *((Sargs*)(sp+BY2WD));

 We verify that the address is correct; is there any reason another thread
 in the same address space cannot start running after line 703 completes
 and unmap that memory, causing us to access unmapped memory at line 705?
 The system call entry is itself an interrupt gate, but line 689 is
 spllo(), and we appear to hold no locks at this point.

 plan 9 threads are cooperatively scheduled.  so
 the correct term is proc.  but you are correct,
 another proc sharing memory with this one
 could be running.  however, that proc would
 not have access to this proc's stack.  (rfork
 doesn't allow shared stack.)  and even if it
 did, plan 9 stacks don't shrink.
 
 What if sp points inside a segment which is not the actual stack segment? 
 Then could someone else come along and segdetach() it in between the two 
 mentioned lines?
 
 let's suppose that the address is invalid later.
 the kernel always moves data to/from user
 buffers outside of any locks because even
 valid targets may be paged out.  if the address
 is truely invalid, waserror() will be true and
 the else branch starting at 714 will be executed

 - erik
 
 -- Elly

I think you may be right, Elly.  Multithreaded programs indeed have their
stack running outside the stack segment, so this could happen there.
splhi won't even do on a multiprocessor.  One should probably lock down
the segment.
We've never seen this happen, of course — or rather, we haven't noticed
this as the cause of a crash.

Sape
---End Message---


Re: [9fans] Race condition in /sys/src/9/pc/trap.c?

2009-07-30 Thread cinap_lenrek
ok, forget it... the segments are actualy reference counted.
so a process would not be able to unmap a segment in
another process.

--
cinap
---BeginMessage---
 On Thu, 30 Jul 2009, erik quanstrom wrote:
 
 On Thu Jul 30 00:05:45 EDT 2009, el...@andrew.cmu.edu wrote:
 My familiarity with the kernel source code is superficial to say the
 least, but it seems to me that this code (from /sys/src/9/pc/trap.c)
 contains a race condition:

   702 if(sp(USTKTOP-BY2PG) || sp(USTKTOP-sizeof(Sargs)-BY2WD))
   703 validaddr(sp, sizeof(Sargs)+BY2WD, 0);
   704
   705 up-s = *((Sargs*)(sp+BY2WD));

 We verify that the address is correct; is there any reason another thread
 in the same address space cannot start running after line 703 completes
 and unmap that memory, causing us to access unmapped memory at line 705?
 The system call entry is itself an interrupt gate, but line 689 is
 spllo(), and we appear to hold no locks at this point.

 plan 9 threads are cooperatively scheduled.  so
 the correct term is proc.  but you are correct,
 another proc sharing memory with this one
 could be running.  however, that proc would
 not have access to this proc's stack.  (rfork
 doesn't allow shared stack.)  and even if it
 did, plan 9 stacks don't shrink.
 
 What if sp points inside a segment which is not the actual stack segment? 
 Then could someone else come along and segdetach() it in between the two 
 mentioned lines?
 
 let's suppose that the address is invalid later.
 the kernel always moves data to/from user
 buffers outside of any locks because even
 valid targets may be paged out.  if the address
 is truely invalid, waserror() will be true and
 the else branch starting at 714 will be executed

 - erik
 
 -- Elly

I think you may be right, Elly.  Multithreaded programs indeed have their
stack running outside the stack segment, so this could happen there.
splhi won't even do on a multiprocessor.  One should probably lock down
the segment.
We've never seen this happen, of course — or rather, we haven't noticed
this as the cause of a crash.

Sape
---End Message---


Re: [9fans] Race condition in /sys/src/9/pc/trap.c?

2009-07-30 Thread cinap_lenrek
... but you can still shrink segments for other processes :-)

(this crashes the kernel)

--
cinap
---BeginMessage---
hm... what about the other stuff port/sysfile.c?

they also just checks pointers with validaddr and then call
into the device handler without locking anything / holding
anything.

--
cinap

---BeginMessage---
 On Thu, 30 Jul 2009, erik quanstrom wrote:
 
 On Thu Jul 30 00:05:45 EDT 2009, el...@andrew.cmu.edu wrote:
 My familiarity with the kernel source code is superficial to say the
 least, but it seems to me that this code (from /sys/src/9/pc/trap.c)
 contains a race condition:

   702 if(sp(USTKTOP-BY2PG) || sp(USTKTOP-sizeof(Sargs)-BY2WD))
   703 validaddr(sp, sizeof(Sargs)+BY2WD, 0);
   704
   705 up-s = *((Sargs*)(sp+BY2WD));

 We verify that the address is correct; is there any reason another thread
 in the same address space cannot start running after line 703 completes
 and unmap that memory, causing us to access unmapped memory at line 705?
 The system call entry is itself an interrupt gate, but line 689 is
 spllo(), and we appear to hold no locks at this point.

 plan 9 threads are cooperatively scheduled.  so
 the correct term is proc.  but you are correct,
 another proc sharing memory with this one
 could be running.  however, that proc would
 not have access to this proc's stack.  (rfork
 doesn't allow shared stack.)  and even if it
 did, plan 9 stacks don't shrink.
 
 What if sp points inside a segment which is not the actual stack segment? 
 Then could someone else come along and segdetach() it in between the two 
 mentioned lines?
 
 let's suppose that the address is invalid later.
 the kernel always moves data to/from user
 buffers outside of any locks because even
 valid targets may be paged out.  if the address
 is truely invalid, waserror() will be true and
 the else branch starting at 714 will be executed

 - erik
 
 -- Elly

I think you may be right, Elly.  Multithreaded programs indeed have their
stack running outside the stack segment, so this could happen there.
splhi won't even do on a multiprocessor.  One should probably lock down
the segment.
We've never seen this happen, of course — or rather, we haven't noticed
this as the cause of a crash.

Sape
---End Message---
---End Message---
#include u.h
#include libc.h

void
main(int argc, char **argv)
{
char *buf;
int fd;

if((fd = open(/dev/zero, OREAD))  0)
sysfatal(open);

buf = (char*)0x60;
segattach(0, memory, buf, 2*4096);

switch(rfork(RFPROC|RFMEM)){
case -1:
sysfatal(fork);
break;
case 0:
for(;;){
read(fd, buf+4096, 4096);
}
}
sleep(1000);
segbrk(buf, buf+4096);
}


Re: [9fans] Race condition in /sys/src/9/pc/trap.c?

2009-07-30 Thread erik quanstrom
 I think you may be right, Elly.  Multithreaded programs indeed have their
 stack running outside the stack segment, so this could happen there.
 splhi won't even do on a multiprocessor.  One should probably lock down
 the segment.
 We've never seen this happen, of course — or rather, we haven't noticed
 this as the cause of a crash.

just to beat a dead horse, i disabled the check in question
and ran the following program with an invalid address.
the program faulted and the kernel did not care.

; cat evil.c
#include u.h
#include libc.h

extern void evil(void);

void
main(void)
{
evil();
exits();
}
; cat evil.s
TEXT evil(SB), $0
PUSHL   SP
MOVL$0xa000, SP
MOVL$1, AX
INT $64
POPLSP
RET
% 8.out
8.out 78: suicide: invalid address 0xa000/24 in sys call pc=0x1046

- erik



Re: [9fans] Race condition in /sys/src/9/pc/trap.c?

2009-07-30 Thread Charles Forsyth
just stop processes with s-ref  1 from freeing parts of s with ibrk.
it's not as if anything ever does in practice.



[9fans] Race condition in /sys/src/9/pc/trap.c?

2009-07-29 Thread Matthew J Jones
My familiarity with the kernel source code is superficial to say the 
least, but it seems to me that this code (from /sys/src/9/pc/trap.c) 
contains a race condition:


 702 if(sp(USTKTOP-BY2PG) || sp(USTKTOP-sizeof(Sargs)-BY2WD))
 703 validaddr(sp, sizeof(Sargs)+BY2WD, 0);
 704
 705 up-s = *((Sargs*)(sp+BY2WD));

We verify that the address is correct; is there any reason another thread 
in the same address space cannot start running after line 703 completes 
and unmap that memory, causing us to access unmapped memory at line 705? 
The system call entry is itself an interrupt gate, but line 689 is 
spllo(), and we appear to hold no locks at this point.


(Please forgive me if my terminology is not in line: as I said, my 
experience is very limited and I am just beginning to explore the source).


-- Elly