I recently fixed a bug on FreeBSD related to DTrace's emulation of the
leave instruction.  A full explanation can be found at [1], but the
gist of the problem was that there is a small window in which critical
state is held above the current stack pointer.  If an interrupt was
taken in that window, the trap frame would corrupt this state, leading
to a crash.

I checked the Illumos source([2]), and Illumos' version of the
emulation code was identical(no surprise, as the FreeBSD
implementation was straight port from OpenSolaris).  To be honest, I'd
be rather surprised if Solaris and Illumos has been subject to the
same bug for so long -- I first reproduced the crash on FreeBSD by
accident, and the D script that I had written trying to debug a
different problem was triggering a crash within minutes under my
workload.  The only things that I can figure are either few people
were running Solaris/OpenSolaris/Illumos on i386 and using
DTrace(dubious), the Solaris/OpenSolaris/Illumos compiler rarely
generates leave instructions, or the invalid opcode fault handler in
Solaris/OpenSolaris/Illumos runs with interrupts disabled.  I don't
see a cli instruction in the Illumos fault handler, but I'm really not
familiar with the low-level details of i386 fault handling and it's
possible that the task descriptor for the fault handler can specify
that it runs with interrupts disabled.

I'm especially worried about the latter case, as there may well be
many other places in DTrace that are implicitly assuming that
interrupts are disabled.  For example, I know that DTrace makes
extensive use of per-CPU buffers for the data that it gathers.  That
code could easily be assuming that is being run in a critical section,
and a miss-timed interrupt might lead to the data being corrupted.

Any advice that could be offered on this would be most welcome.

If, by chance, Illumos is running the invalid opcode handler with
interrupts enabled, I believe that the following patch against
uts/intel/ia32/ml/exception.s will fix the problem(Disclaimer: I
haven't even tried to compile this, let alone test this.  But
essentially the same patch fixed the FreeBSD crash that I was seeing).

--- a/exception.s       2011-11-12 13:21:27.000000000 -0500
+++ b/exception.s       2011-11-12 13:23:13.000000000 -0500
@@ -587,11 +587,11 @@
        movl    8(%esp), %eax           /* load calling EIP */
        incl    %eax                    /* increment over LOCK prefix */
        movl    %eax, -8(%ebx)          /* store calling EIP */
-       movl    %ebx, -4(%esp)          /* temporarily store new %esp */
+       subl    $8, %ebx                /* adjust for three pushes, one pop */
+       movl    %ebx, 8(%esp)            /* temporarily store new %esp */
        popl    %ebx                    /* pop off temp */
        popl    %eax                    /* pop off temp */
-       movl    -12(%esp), %esp         /* set stack pointer */
-       subl    $8, %esp                /* adjust for three pushes, one pop */
+       movl    (%esp), %esp            /* set stack pointer */
        jmp     _emul_done
 4:
        /*


[1] - 
http://lists.freebsd.org/pipermail/freebsd-current/2011-November/029007.html
[2] - 
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/intel/ia32/ml/exception.s#572


-------------------------------------------
illumos-discuss
Archives: https://www.listbox.com/member/archive/182180/=now
RSS Feed: https://www.listbox.com/member/archive/rss/182180/21175430-2e6923be
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=21175430&id_secret=21175430-6a77cda4
Powered by Listbox: http://www.listbox.com

Reply via email to