Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
i attached it in a previous mail... try again... -- cinap --- Begin Message --- > process1: > still in kernel: > memmove(buf, ...) > *fault* > trap() > fault386() > fault() => -1 > if(!user){ > panic() > *panic* > } > ... > postnote() could you be more specific. what is your test program, where is it crashing (if you know), and what is the panic message, if any? i must be dense, but i'm confused by your process diagram. - erik--- End Message --- #include #include void main(int argc, char **argv) { char *buf; int fd; if((fd = open("/dev/zero", OREAD)) < 0) sysfatal("open"); buf = (char*)0x60; segattach(0, "memory", buf, 2*4096); switch(rfork(RFPROC|RFMEM)){ case -1: sysfatal("fork"); break; case 0: for(;;){ read(fd, buf+4096, 4096); } } sleep(1000); segbrk(buf, buf+4096); }
Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
you can get similar effects by remapping things. i meant that it isn't likely to happen by accident, so am i bovvered? fault386 needs to be fixed mainly by or for people running a shared cpu server with hostile users (ie, students). for the rest of us it might be more useful to have the panic to prevent real kernel bugs (ie, just bad pointers in device driver implementations) from postnoting a process instead of stopping the system. having said that, it could be argued that even in that case a postnote to the invoking process would allow the rest of the system to run and `might not' mean that the broken driver has wrecked other data structures outside it in kernel memory.--- Begin Message --- > it ensures mmuflushes in all other processes (sharing that segment) as well. > in fact, the crash you describe just emphasises that point: > the page reference no longer exists, hence the fault. > > the problem (which frankly doesn't bother me) is that fault386 > is being overly cautious in assuming that a page fault that occurs > in system mode but can't map a page successfully is necessarily a kernel bug: > that's not true. it could just note the process instead. > (it doesn't bother me because since unix days i've seen less than a handful > of programs that SHRINK their existing data segments, and i think that's the > only case that can cause the panic you're seeing.) if this case is really not important, would it make sense to disallow shrinking segments? it might be worth it just to be able to define Eshrinkage. - erik--- End Message ---
Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
> it ensures mmuflushes in all other processes (sharing that segment) as well. > in fact, the crash you describe just emphasises that point: > the page reference no longer exists, hence the fault. > > the problem (which frankly doesn't bother me) is that fault386 > is being overly cautious in assuming that a page fault that occurs > in system mode but can't map a page successfully is necessarily a kernel bug: > that's not true. it could just note the process instead. > (it doesn't bother me because since unix days i've seen less than a handful > of programs that SHRINK their existing data segments, and i think that's the > only case that can cause the panic you're seeing.) if this case is really not important, would it make sense to disallow shrinking segments? it might be worth it just to be able to define Eshrinkage. - erik
Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
>but my testcase crashes a uniprocessor system, so here is no >waiting for mmuflushes on other processors going on. it ensures mmuflushes in all other processes (sharing that segment) as well. in fact, the crash you describe just emphasises that point: the page reference no longer exists, hence the fault. the problem (which frankly doesn't bother me) is that fault386 is being overly cautious in assuming that a page fault that occurs in system mode but can't map a page successfully is necessarily a kernel bug: that's not true. it could just note the process instead. (it doesn't bother me because since unix days i've seen less than a handful of programs that SHRINK their existing data segments, and i think that's the only case that can cause the panic you're seeing.)
Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
2009/7/31 erik quanstrom : >> > could you be more specific. what is your test program, >> > where is it crashing (if you know), and what is the panic >> > message, if any? i must be dense, but i'm confused by >> > your process diagram. >> >> He posted it earlier in this thread >> >> --dho > > please post a reference. i do not see either the crash > code or the panic message on 9fans.net/archive/2009/07. > i don't recall it in the originals, either. > > - erik Was received as an attachment. Inline: #include #include void main(int argc, char **argv) { char *buf; int fd; if((fd = open("/dev/zero", OREAD)) < 0) sysfatal("open"); buf = (char*)0x60; segattach(0, "memory", buf, 2*4096); switch(rfork(RFPROC|RFMEM)){ case -1: sysfatal("fork"); break; case 0: for(;;){ read(fd, buf+4096, 4096); } } sleep(1000); segbrk(buf, buf+4096); } --dho
Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
> > could you be more specific. what is your test program, > > where is it crashing (if you know), and what is the panic > > message, if any? i must be dense, but i'm confused by > > your process diagram. > > He posted it earlier in this thread > > --dho please post a reference. i do not see either the crash code or the panic message on 9fans.net/archive/2009/07. i don't recall it in the originals, either. - erik
Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
2009/7/31 erik quanstrom : >> process1: >> still in kernel: >> memmove(buf, ...) >> *fault* >> trap() >> fault386() >> fault() => -1 >> if(!user){ >> panic() >> *panic* >> } >> ... >> postnote() > > could you be more specific. what is your test program, > where is it crashing (if you know), and what is the panic > message, if any? i must be dense, but i'm confused by > your process diagram. He posted it earlier in this thread --dho > - erik > >
Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
> process1: > still in kernel: > memmove(buf, ...) > *fault* > trap() > fault386() > fault() => -1 > if(!user){ > panic() > *panic* > } > ... > postnote() could you be more specific. what is your test program, where is it crashing (if you know), and what is the panic message, if any? i must be dense, but i'm confused by your process diagram. - erik
Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
very interesting. thanks for the sum up. but my testcase crashes a uniprocessor system, so here is no waiting for mmuflushes on other processors going on. any other process that shares the segment and was suspended in the kernel may potentialy hold a pointer to that freed memory area of the segment and may cause a fault in the kernel on resume. process1: in kernel: read(buf) validaddr(buf) ... *context switch* process2: in kernel: ... ibrk() mfreeseg() procflushseg() flushmmu() ... *context switch* process1: still in kernel: memmove(buf, ...) *fault* trap() fault386() fault() => -1 if(!user){ panic() *panic* } ... postnote() we cant really wait for all processes sharing the segment to be in userspace as they may already be waiting in a long blocking syscall. how do we fix this? -- cinap --- Begin Message --- this resulted in a little side discussion. to save someone else from having to break a strong oath about 9fans, i'll sum it up. the existing code handles this situation. when several processes share a segment, and any one of them decides to shrink the segment, all the processes must see the change before the pages are made available for re-use. any one of them could have any of those pages currently in their mmu state. and any must be removed from the mmu state local to the process, to ensure that no further access to the pages is possible before they are freed. the critical point is that it's irrelevant whether traps or syscalls are involved: ordinary store instructions are clearly bad enough. thus /sys/src/9/port/segment.c:/^mfreeseg contains (with s locked) /* flush this seg in all other processes */ if(s->ref > 1) procflushseg(s); and procflushseg finds all processes that share s, sets them all up to flush their mmu states, and also sets any processor running such a process to flush its state (that's picked up by a clock interrupt). procflushseg will not proceed until all processes and processors that might need to flush state have done so. (s remains locked throughout.) after the flush, no process or processor can have a reference to any of the pages in its mmu state. it is safe to free them, which mfreeseg does. now, a process might still be executing a system call or some other trap that might refer to that segment, to an address that's now been removed by another process. to access the memory, it must ultimately issue a load or a store instruction (even for syscalls, such as read or write). that instruction will trap, because as described above there is no longer a valid mmu mapping for that address within the process. normally, the trap will find the right page in the software mmu structures and install the map. in this case, however, it can't find the address, so it will raise an exception in the process (ie, send a note). (all such searches are done with the segment properly locked.)--- Begin Message --- just stop processes with s->ref > 1 from freeing parts of s with ibrk. it's not as if anything ever does in practice.--- End Message --- --- End Message ---
Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
> ... procflushseg finds all processes that share s, sets them all up > to flush their mmu states, and also sets any processor running such a process > to flush its state (that's picked up by a clock interrupt). > > procflushseg will not proceed until all processes and processors that > might need to flush state have done so. (s remains locked throughout.) Coincidentally, I spent last Sunday debugging a deadlock in precisely this spot. I had absentmindedly tried to use VESA vga on a multiprocessor. The aux/vga -l apparently succeeded and the screen looked great, but as a stealthy side-effect the CPU which had done the VESA call stopped responding to interrupts -- including the local APIC clock interrupt required for the mmu flush as described above. So, some time later when another process (upas/fs as it happens) on the other CPU wanted to adjust a segment size, procflushseg was called and never returned. Debugging can be challenging when cause and effect are minutes or hours apart ...
Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
this resulted in a little side discussion. to save someone else from having to break a strong oath about 9fans, i'll sum it up. the existing code handles this situation. when several processes share a segment, and any one of them decides to shrink the segment, all the processes must see the change before the pages are made available for re-use. any one of them could have any of those pages currently in their mmu state. and any must be removed from the mmu state local to the process, to ensure that no further access to the pages is possible before they are freed. the critical point is that it's irrelevant whether traps or syscalls are involved: ordinary store instructions are clearly bad enough. thus /sys/src/9/port/segment.c:/^mfreeseg contains (with s locked) /* flush this seg in all other processes */ if(s->ref > 1) procflushseg(s); and procflushseg finds all processes that share s, sets them all up to flush their mmu states, and also sets any processor running such a process to flush its state (that's picked up by a clock interrupt). procflushseg will not proceed until all processes and processors that might need to flush state have done so. (s remains locked throughout.) after the flush, no process or processor can have a reference to any of the pages in its mmu state. it is safe to free them, which mfreeseg does. now, a process might still be executing a system call or some other trap that might refer to that segment, to an address that's now been removed by another process. to access the memory, it must ultimately issue a load or a store instruction (even for syscalls, such as read or write). that instruction will trap, because as described above there is no longer a valid mmu mapping for that address within the process. normally, the trap will find the right page in the software mmu structures and install the map. in this case, however, it can't find the address, so it will raise an exception in the process (ie, send a note). (all such searches are done with the segment properly locked.)--- Begin Message --- just stop processes with s->ref > 1 from freeing parts of s with ibrk. it's not as if anything ever does in practice.--- End Message ---
Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
just stop processes with s->ref > 1 from freeing parts of s with ibrk. it's not as if anything ever does in practice.
Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
> I think you may be right, Elly. Multithreaded programs indeed have their > stack running outside the stack segment, so this could happen there. > splhi won't even do on a multiprocessor. One should probably lock down > the segment. > We've never seen this happen, of course — or rather, we haven't noticed > this as the cause of a crash. just to beat a dead horse, i disabled the check in question and ran the following program with an invalid address. the program faulted and the kernel did not care. ; cat evil.c #include #include extern void evil(void); void main(void) { evil(); exits(""); } ; cat evil.s TEXT evil(SB), $0 PUSHL SP MOVL$0xa000, SP MOVL$1, AX INT $64 POPLSP RET % 8.out 8.out 78: suicide: invalid address 0xa000/24 in sys call pc=0x1046 - erik
Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
... but you can still shrink segments for other processes :-) (this crashes the kernel) -- cinap --- Begin Message --- hm... what about the other stuff port/sysfile.c? they also just checks pointers with validaddr and then call into the device handler without locking anything / holding anything. -- cinap --- Begin Message --- > On Thu, 30 Jul 2009, erik quanstrom wrote: > >> On Thu Jul 30 00:05:45 EDT 2009, el...@andrew.cmu.edu wrote: >>> My familiarity with the kernel source code is superficial to say the >>> least, but it seems to me that this code (from /sys/src/9/pc/trap.c) >>> contains a race condition: >>> >>> 702 if(sp<(USTKTOP-BY2PG) || sp>(USTKTOP-sizeof(Sargs)-BY2WD)) >>> 703 validaddr(sp, sizeof(Sargs)+BY2WD, 0); >>> 704 >>> 705 up->s = *((Sargs*)(sp+BY2WD)); >>> >>> We verify that the address is correct; is there any reason another thread >>> in the same address space cannot start running after line 703 completes >>> and unmap that memory, causing us to access unmapped memory at line 705? >>> The system call entry is itself an interrupt gate, but line 689 is >>> spllo(), and we appear to hold no locks at this point. >> >> plan 9 threads are cooperatively scheduled. so >> the correct term is proc. but you are correct, >> another proc sharing memory with this one >> could be running. however, that proc would >> not have access to this proc's stack. (rfork >> doesn't allow shared stack.) and even if it >> did, plan 9 stacks don't shrink. > > What if sp points inside a segment which is not the actual stack segment? > Then could someone else come along and segdetach() it in between the two > mentioned lines? > >> let's suppose that the address is invalid later. >> the kernel always moves data to/from user >> buffers outside of any locks because even >> valid targets may be paged out. if the address >> is truely invalid, waserror() will be true and >> the else branch starting at 714 will be executed >> >> - erik > > -- Elly I think you may be right, Elly. Multithreaded programs indeed have their stack running outside the stack segment, so this could happen there. splhi won't even do on a multiprocessor. One should probably lock down the segment. We've never seen this happen, of course — or rather, we haven't noticed this as the cause of a crash. Sape --- End Message --- --- End Message --- #include #include void main(int argc, char **argv) { char *buf; int fd; if((fd = open("/dev/zero", OREAD)) < 0) sysfatal("open"); buf = (char*)0x60; segattach(0, "memory", buf, 2*4096); switch(rfork(RFPROC|RFMEM)){ case -1: sysfatal("fork"); break; case 0: for(;;){ read(fd, buf+4096, 4096); } } sleep(1000); segbrk(buf, buf+4096); }
Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
ok, forget it... the segments are actualy reference counted. so a process would not be able to unmap a segment in another process. -- cinap --- Begin Message --- > On Thu, 30 Jul 2009, erik quanstrom wrote: > >> On Thu Jul 30 00:05:45 EDT 2009, el...@andrew.cmu.edu wrote: >>> My familiarity with the kernel source code is superficial to say the >>> least, but it seems to me that this code (from /sys/src/9/pc/trap.c) >>> contains a race condition: >>> >>> 702 if(sp<(USTKTOP-BY2PG) || sp>(USTKTOP-sizeof(Sargs)-BY2WD)) >>> 703 validaddr(sp, sizeof(Sargs)+BY2WD, 0); >>> 704 >>> 705 up->s = *((Sargs*)(sp+BY2WD)); >>> >>> We verify that the address is correct; is there any reason another thread >>> in the same address space cannot start running after line 703 completes >>> and unmap that memory, causing us to access unmapped memory at line 705? >>> The system call entry is itself an interrupt gate, but line 689 is >>> spllo(), and we appear to hold no locks at this point. >> >> plan 9 threads are cooperatively scheduled. so >> the correct term is proc. but you are correct, >> another proc sharing memory with this one >> could be running. however, that proc would >> not have access to this proc's stack. (rfork >> doesn't allow shared stack.) and even if it >> did, plan 9 stacks don't shrink. > > What if sp points inside a segment which is not the actual stack segment? > Then could someone else come along and segdetach() it in between the two > mentioned lines? > >> let's suppose that the address is invalid later. >> the kernel always moves data to/from user >> buffers outside of any locks because even >> valid targets may be paged out. if the address >> is truely invalid, waserror() will be true and >> the else branch starting at 714 will be executed >> >> - erik > > -- Elly I think you may be right, Elly. Multithreaded programs indeed have their stack running outside the stack segment, so this could happen there. splhi won't even do on a multiprocessor. One should probably lock down the segment. We've never seen this happen, of course — or rather, we haven't noticed this as the cause of a crash. Sape --- End Message ---
Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
hm... what about the other stuff port/sysfile.c? they also just checks pointers with validaddr and then call into the device handler without locking anything / holding anything. -- cinap --- Begin Message --- > On Thu, 30 Jul 2009, erik quanstrom wrote: > >> On Thu Jul 30 00:05:45 EDT 2009, el...@andrew.cmu.edu wrote: >>> My familiarity with the kernel source code is superficial to say the >>> least, but it seems to me that this code (from /sys/src/9/pc/trap.c) >>> contains a race condition: >>> >>> 702 if(sp<(USTKTOP-BY2PG) || sp>(USTKTOP-sizeof(Sargs)-BY2WD)) >>> 703 validaddr(sp, sizeof(Sargs)+BY2WD, 0); >>> 704 >>> 705 up->s = *((Sargs*)(sp+BY2WD)); >>> >>> We verify that the address is correct; is there any reason another thread >>> in the same address space cannot start running after line 703 completes >>> and unmap that memory, causing us to access unmapped memory at line 705? >>> The system call entry is itself an interrupt gate, but line 689 is >>> spllo(), and we appear to hold no locks at this point. >> >> plan 9 threads are cooperatively scheduled. so >> the correct term is proc. but you are correct, >> another proc sharing memory with this one >> could be running. however, that proc would >> not have access to this proc's stack. (rfork >> doesn't allow shared stack.) and even if it >> did, plan 9 stacks don't shrink. > > What if sp points inside a segment which is not the actual stack segment? > Then could someone else come along and segdetach() it in between the two > mentioned lines? > >> let's suppose that the address is invalid later. >> the kernel always moves data to/from user >> buffers outside of any locks because even >> valid targets may be paged out. if the address >> is truely invalid, waserror() will be true and >> the else branch starting at 714 will be executed >> >> - erik > > -- Elly I think you may be right, Elly. Multithreaded programs indeed have their stack running outside the stack segment, so this could happen there. splhi won't even do on a multiprocessor. One should probably lock down the segment. We've never seen this happen, of course — or rather, we haven't noticed this as the cause of a crash. Sape --- End Message ---
Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
> > plan 9 threads are cooperatively scheduled. so > > the correct term is proc. but you are correct, > > another proc sharing memory with this one > > could be running. however, that proc would > > not have access to this proc's stack. (rfork > > doesn't allow shared stack.) and even if it > > did, plan 9 stacks don't shrink. > > What if sp points inside a segment which is not the actual stack segment? > Then could someone else come along and segdetach() it in between the two > mentioned lines? see below: > > let's suppose that the address is invalid later. > > the kernel always moves data to/from user > > buffers outside of any locks because even > > valid targets may be paged out. if the address > > is truely invalid, waserror() will be true and > > the else branch starting at 714 will be executed if you think this is possible, why don't you build a test case and prove that it can happen. the easiest way would be to disable the check completely. - erik
Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
> On Thu, 30 Jul 2009, erik quanstrom wrote: > >> On Thu Jul 30 00:05:45 EDT 2009, el...@andrew.cmu.edu wrote: >>> My familiarity with the kernel source code is superficial to say the >>> least, but it seems to me that this code (from /sys/src/9/pc/trap.c) >>> contains a race condition: >>> >>> 702 if(sp<(USTKTOP-BY2PG) || sp>(USTKTOP-sizeof(Sargs)-BY2WD)) >>> 703 validaddr(sp, sizeof(Sargs)+BY2WD, 0); >>> 704 >>> 705 up->s = *((Sargs*)(sp+BY2WD)); >>> >>> We verify that the address is correct; is there any reason another thread >>> in the same address space cannot start running after line 703 completes >>> and unmap that memory, causing us to access unmapped memory at line 705? >>> The system call entry is itself an interrupt gate, but line 689 is >>> spllo(), and we appear to hold no locks at this point. >> >> plan 9 threads are cooperatively scheduled. so >> the correct term is proc. but you are correct, >> another proc sharing memory with this one >> could be running. however, that proc would >> not have access to this proc's stack. (rfork >> doesn't allow shared stack.) and even if it >> did, plan 9 stacks don't shrink. > > What if sp points inside a segment which is not the actual stack segment? > Then could someone else come along and segdetach() it in between the two > mentioned lines? > >> let's suppose that the address is invalid later. >> the kernel always moves data to/from user >> buffers outside of any locks because even >> valid targets may be paged out. if the address >> is truely invalid, waserror() will be true and >> the else branch starting at 714 will be executed >> >> - erik > > -- Elly I think you may be right, Elly. Multithreaded programs indeed have their stack running outside the stack segment, so this could happen there. splhi won't even do on a multiprocessor. One should probably lock down the segment. We've never seen this happen, of course — or rather, we haven't noticed this as the cause of a crash. Sape
Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
On Thu, 30 Jul 2009, erik quanstrom wrote: On Thu Jul 30 00:05:45 EDT 2009, el...@andrew.cmu.edu wrote: My familiarity with the kernel source code is superficial to say the least, but it seems to me that this code (from /sys/src/9/pc/trap.c) contains a race condition: 702 if(sp<(USTKTOP-BY2PG) || sp>(USTKTOP-sizeof(Sargs)-BY2WD)) 703 validaddr(sp, sizeof(Sargs)+BY2WD, 0); 704 705 up->s = *((Sargs*)(sp+BY2WD)); We verify that the address is correct; is there any reason another thread in the same address space cannot start running after line 703 completes and unmap that memory, causing us to access unmapped memory at line 705? The system call entry is itself an interrupt gate, but line 689 is spllo(), and we appear to hold no locks at this point. plan 9 threads are cooperatively scheduled. so the correct term is proc. but you are correct, another proc sharing memory with this one could be running. however, that proc would not have access to this proc's stack. (rfork doesn't allow shared stack.) and even if it did, plan 9 stacks don't shrink. What if sp points inside a segment which is not the actual stack segment? Then could someone else come along and segdetach() it in between the two mentioned lines? let's suppose that the address is invalid later. the kernel always moves data to/from user buffers outside of any locks because even valid targets may be paged out. if the address is truely invalid, waserror() will be true and the else branch starting at 714 will be executed - erik -- Elly
Re: [9fans] Race condition in /sys/src/9/pc/trap.c?
On Thu Jul 30 00:05:45 EDT 2009, el...@andrew.cmu.edu wrote: > My familiarity with the kernel source code is superficial to say the > least, but it seems to me that this code (from /sys/src/9/pc/trap.c) > contains a race condition: > > 702 if(sp<(USTKTOP-BY2PG) || sp>(USTKTOP-sizeof(Sargs)-BY2WD)) > 703 validaddr(sp, sizeof(Sargs)+BY2WD, 0); > 704 > 705 up->s = *((Sargs*)(sp+BY2WD)); > > We verify that the address is correct; is there any reason another thread > in the same address space cannot start running after line 703 completes > and unmap that memory, causing us to access unmapped memory at line 705? > The system call entry is itself an interrupt gate, but line 689 is > spllo(), and we appear to hold no locks at this point. plan 9 threads are cooperatively scheduled. so the correct term is proc. but you are correct, another proc sharing memory with this one could be running. however, that proc would not have access to this proc's stack. (rfork doesn't allow shared stack.) and even if it did, plan 9 stacks don't shrink. let's suppose that the address is invalid later. the kernel always moves data to/from user buffers outside of any locks because even valid targets may be paged out. if the address is truely invalid, waserror() will be true and the else branch starting at 714 will be executed - erik
[9fans] Race condition in /sys/src/9/pc/trap.c?
My familiarity with the kernel source code is superficial to say the least, but it seems to me that this code (from /sys/src/9/pc/trap.c) contains a race condition: 702 if(sp<(USTKTOP-BY2PG) || sp>(USTKTOP-sizeof(Sargs)-BY2WD)) 703 validaddr(sp, sizeof(Sargs)+BY2WD, 0); 704 705 up->s = *((Sargs*)(sp+BY2WD)); We verify that the address is correct; is there any reason another thread in the same address space cannot start running after line 703 completes and unmap that memory, causing us to access unmapped memory at line 705? The system call entry is itself an interrupt gate, but line 689 is spllo(), and we appear to hold no locks at this point. (Please forgive me if my terminology is not in line: as I said, my experience is very limited and I am just beginning to explore the source). -- Elly