On Wed, Aug 17, 2011 at 11:21:42PM +0300, Andriy Gapon wrote:
[skip]

> But I also would like to use this opportunity to discuss how we can
> make it easier to debug such issue as this. I think that this problem
> demonstrates that when we treat certain junk in kernel address value
> as a userland address value, we throw additional heaps of irrelevant
> stuff on top of an actual problem. One solution could be to use a
> special flag that would mark all actual attempts to access userland
> address (e.g. setting the flag on entrance to copyin and clearing it
> upon return), so that in the page fault handler we could distinguish
> actual faults on userland addresses from faults on garbage kernel
> addresses. I am sure that there could be other clever techniques to
> catch such garbage addresses early.

We already have such a mechanism: kernel code that is aware of a
usermode page access sets pcb_onfault. See the end of the trap_pfault()
handler. In fact, we can catch it earlier, even before calling vm_fault().
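
To make the condition concrete, here is a rough userspace model of the
check, not kernel code. The names struct pcb_model, struct thread_model,
fault_is_fatal(), recovery_stub() and copyin_model_fault() are
hypothetical and exist only for this illustration; the fields merely
stand in for pcb_onfault and td_intr_nesting_level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; illustration only. */
typedef void (*onfault_fn)(void);

struct pcb_model {
	onfault_fn onfault;		/* models pcb_onfault */
};

struct thread_model {
	int intr_nesting_level;		/* models td_intr_nesting_level */
	struct pcb_model pcb;
};

/*
 * Return 1 when a page fault on a userland address should be treated
 * as fatal: the fault was taken in kernel mode and either we are in
 * interrupt context or no recovery handler has been installed.
 */
static int
fault_is_fatal(const struct thread_model *td, int usermode)
{
	if (usermode)
		return (0);
	return (td->intr_nesting_level != 0 || td->pcb.onfault == NULL);
}

/* Placeholder for the recovery path a copyin-style routine installs. */
static void
recovery_stub(void)
{
}

/*
 * Model a copyin-style access that faults: install the handler, ask
 * whether the fault would be fatal, then clear the handler again.
 */
static int
copyin_model_fault(struct thread_model *td)
{
	int fatal;

	assert(td->pcb.onfault == NULL);	/* no nested handler */
	td->pcb.onfault = recovery_stub;
	fatal = fault_is_fatal(td, 0);
	td->pcb.onfault = NULL;
	return (fatal);
}
```

In this model a kernel-mode fault with no handler installed is fatal,
the same fault inside copyin_model_fault() is not, and any kernel-mode
fault taken with a nonzero interrupt nesting level is fatal regardless
of the handler.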

BTW, I think this is especially useful in combination with the SMEP
support in recent Intel CPUs.

commit 2e1b36fa93f9499e37acf04a66ff0646d4f13536
Author: Konstantin Belousov <kos...@pooma.home>
Date:   Thu Aug 18 00:08:50 2011 +0300

    Assert that the exiting process does not return to usermode.
    On x86, do not call vm_fault() when the kernel is not prepared
    to handle an unsuccessful page fault.

diff --git a/sys/amd64/amd64/trap.c b/sys/amd64/amd64/trap.c
index 4e5f8b8..55e1e5a 100644
--- a/sys/amd64/amd64/trap.c
+++ b/sys/amd64/amd64/trap.c
@@ -674,6 +674,19 @@ trap_pfault(frame, usermode)
                        goto nogo;
 
                map = &vm->vm_map;
+
+               /*
+                * When accessing a usermode address, kernel must be
+                * ready to accept the page fault, and provide a
+                * handling routine.  Since accessing the address
+                * without the handler is a bug, do not try to handle
+                * it normally, and panic immediately.
+                */
+               if (!usermode && (td->td_intr_nesting_level != 0 ||
+                   PCPU_GET(curpcb)->pcb_onfault == NULL)) {
+                       trap_fatal(frame, eva);
+                       return (-1);
+               }
        }
 
        /*
diff --git a/sys/i386/i386/trap.c b/sys/i386/i386/trap.c
index 5a8016c..e6d2b5a 100644
--- a/sys/i386/i386/trap.c
+++ b/sys/i386/i386/trap.c
@@ -831,6 +831,11 @@ trap_pfault(frame, usermode, eva)
                        goto nogo;
 
                map = &vm->vm_map;
+               if (!usermode && (td->td_intr_nesting_level != 0 ||
+                   PCPU_GET(curpcb)->pcb_onfault == NULL)) {
+                       trap_fatal(frame, eva);
+                       return (-1);
+               }
        }
 
        /*
diff --git a/sys/kern/subr_trap.c b/sys/kern/subr_trap.c
index 3527ed1..a69b7b8 100644
--- a/sys/kern/subr_trap.c
+++ b/sys/kern/subr_trap.c
@@ -99,6 +99,8 @@ userret(struct thread *td, struct trapframe *frame)
 
        CTR3(KTR_SYSC, "userret: thread %p (pid %d, %s)", td, p->p_pid,
             td->td_name);
+       KASSERT((p->p_flag & P_WEXIT) == 0,
+           ("Exiting process returns to usermode"));
 #if 0
 #ifdef DIAGNOSTIC
        /* Check that we called signotify() enough. */
