This patch provides a way to optionally suppress spurious interrupts, as a workaround for systems described below:
Some old operating systems do not handle spurious interrupts well,
and qemu tends to generate them significantly more often than
real hardware.

Examples:
  - Microport UNIX System V/386 v 2.1 (ca 1987)
    (The main problem I'm fixing: Without this patch, it panics
    sporadically when accessing the hard disk.)
  - AT&T UNIX System V/386 Release 4.0 Version 2.1a (ca 1991)
    See screenshot in "QEMU Official OS Support List":
    http://www.claunia.com/qemu/objectManager.php?sClass=application&iId=9
    (I don't have this system to test.)
  - A report about OS/2 boot lockup from 2004 by Hampa Hug:
    http://lists.nongnu.org/archive/html/qemu-devel/2004-09/msg00367.html
    (My patch was partially inspired by his.)
    Also: http://lists.nongnu.org/archive/html/qemu-devel/2005-06/msg00243.html
    (I don't have this system to test.)

Signed-off-by: Matthew Ogilvie <mmogilvi_q...@miniinfo.net>
---

Note: checkpatch.pl gives an error about initializing the global
"int no_spurious_interrupt_hack = 0;", even though existing lines
near it are doing the same thing.  Should I give precedence to
checkpatch.pl, or nearby code?

There was no version 1 of this patch; this was the last thing I had to
work around to get UNIX running.

High level symptoms:
  1. Despite using this UNIX system for nearly 10 years (ca 1987-1996)
     on an early 80386, I don't remember ever seeing any crash like
     this.  I vaguely remember I may have had one or two crashes for
     which I don't have other explanations that perhaps could have
     been this, but I don't remember the error messages to confirm it.
  2. It is somewhat random when UNIX crashes when running in qemu.
     - Sometimes it crashes the first time the floppy-based installer
       tries to access the hard disk (partition table?).
     - Other times (though fairly rarely), it actually finishes
       formatting and copying the first disk's files to the hard disk
       without crashing.
     - On the other hand, I've never seen it successfully boot from
       the hard disk without this patch.  An attempt to boot from
       the hard drive always panics quite early.
  3. I tried -win2k-hack instead, thinking maybe the hard disk is just
     responding faster than UNIX expected.  But it doesn't seem to
     have any effect.  UNIX still panics sporadically the same way.
     - TANGENT: I was going to see if my patch provides an
       alternative fix for installing Windows 2000, but I was unable
       to reproduce the original -win2k-hack problem at all (with
       neither -win2k-hack NOR this patch).  Maybe some other change
       has fixed it some other way?  Or maybe it is only an issue in
       configurations I didn't test?  (KVM instead of TCG?  Less RAM?
       Something else?)  It might be worth doing a little more
       investigation, and eliminating the -win2k-hack option if
       appropriate.
  4. If I enable KVM, I get a different error very early in bootup
     (in splx function instead of splint), and this patch
     doesn't help.

============

My low level analysis of what is going on:

It is hard to track down all the details, but based on logging a
lot of qemu IRQ stuff, and setting a breakpoint in the earliest
panic-related UNIX function using gdb, it looks like:

  1. It is near the end of servicing a previous IRQ14 from the
     hard disk.
  2. The processor has interrupts disabled (I think), while UNIX
     clears the slave 8259's IMR (mask) register (sets it to 0),
     allowing all interrupts to be passed on to the master.
  3. While in that state, IRQ14 is raised (on the slave), which
     gets propagated to the master (IRQ2), but the CPU is not
     interrupted yet.
  4. UNIX then masks the slave 8259's IMR register completely
     (sets to 0xff).
  5. Because the master elcr register is set (by BIOS; UNIX never
     touches it) to edge trigger for IRQ2, the master latched on
     to IRQ2 earlier, and continues to assert the processor's INT
     line (the env->interrupt_request&CPU_INTERRUPT_HARD bit) even
     after all slave IRQs have been masked off (clearing the input
     IRQ2).
  6. Finally, UNIX enables CPU interrupts and the interrupt is
     delivered to the CPU, which ends up as a spurious IRQ15 due to
     the slave's imr register.  UNIX doesn't know what to do with
     that, and panics/halts.

I'm not sure why it only sporadically hits this sequence of events.
There don't seem to be other IRQs asserted or serviced anywhere in
the near past; the last several were all IRQ14's.  But I can't help
feeling I'm not reading the log output correctly or something,
because that doesn't make sense.  Maybe there is some kind of
a-few-instructions delay before a CPU interrupt is actually
delivered after interrupts are enabled, or some delay in raising
IRQ14 after a hard drive operation is requested, and such delays
need to fall into a narrow window of opportunity left by UNIX?

I can get a disassembly of the UNIX kernel using a "coff"-enabled
build of GNU objdump, giving function names but not much else.  But
I haven't studied it in enough detail to actually find the relevant
code path that is manipulating imr as described above.  However,
this old post outlines some of the high level theory of UNIX
spl*() functions:
http://www.linuxmisc.com/29-unix-internals/4e6c1f6fa2e41670.htm

If anyone wants to look into this further, I can provide access to
the initial boot install floppy, at least.  Email me.  (Without the
rest of the install disks, it isn't much use for anything except
testing virtual machines like qemu against rare corner cases...)

============

Alternative Approaches:

An alternative to this patch that might work (I haven't tried) would
be to have BIOS set the master's elcr register 0x04 bit, making IRQ2
level triggered instead of edge triggered.  I'm not sure what other
effects this might have.  Maybe it would actually be a more accurate
model (I haven't checked documentation; maybe "slave mode" of an IRQ
line into the master is supposed to be level triggered?)
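
To make the edge-versus-level question concrete, here is a minimal
sketch in C of steps 2-6 of the analysis above.  It is a toy model
under stated assumptions, not the code in hw/i8259.c: the toy_pic
struct and every toy_* function are hypothetical names made up for
illustration, and only the latch behaviour of one input line is
modelled (no priorities, no ISR, no EOI).

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not qemu's PIC state. */
struct toy_pic {
    uint8_t irr;      /* latched interrupt requests */
    uint8_t last_irr; /* previous input levels, for edge detection */
    uint8_t imr;      /* interrupt mask register */
    uint8_t elcr;     /* bit set = level triggered, clear = edge */
};

/* Drive one input line of a toy PIC high or low. */
static void toy_set_input(struct toy_pic *p, int line, bool level)
{
    uint8_t mask = (uint8_t)(1u << line);

    if (p->elcr & mask) {
        /* level triggered: the request simply follows the input */
        if (level) {
            p->irr |= mask;
        } else {
            p->irr = (uint8_t)(p->irr & ~mask);
        }
    } else if (level && !(p->last_irr & mask)) {
        /* edge triggered: a rising edge latches the request, and a
         * later falling edge does not clear it */
        p->irr |= mask;
    }
    if (level) {
        p->last_irr |= mask;
    } else {
        p->last_irr = (uint8_t)(p->last_irr & ~mask);
    }
}

/* True if the PIC has at least one unmasked pending request. */
static bool toy_output(const struct toy_pic *p)
{
    return (p->irr & ~p->imr) != 0;
}

/* Replay steps 2-6; returns true if the master still requests an
 * interrupt while the slave has nothing deliverable (spurious). */
bool toy_spurious_after_mask(bool irq2_level_triggered)
{
    struct toy_pic master = {0}, slave = {0};

    if (irq2_level_triggered) {
        master.elcr = 0x04;                        /* IRQ2 level mode */
    }
    slave.imr = 0x00;                              /* step 2 */
    toy_set_input(&slave, 6, true);                /* step 3: IRQ14 */
    toy_set_input(&master, 2, toy_output(&slave)); /* cascade to IRQ2 */
    slave.imr = 0xff;                              /* step 4 */
    toy_set_input(&master, 2, toy_output(&slave)); /* IRQ2 input drops */
    return toy_output(&master) && !toy_output(&slave); /* steps 5-6 */
}
```

In this toy model, toy_spurious_after_mask(false) reports a spurious
interrupt and toy_spurious_after_mask(true) does not.  That only shows
the idea is self-consistent in a simplified model; it says nothing
about what other effects the elcr change would have on real guests.
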
Or perhaps find a way to model the minimum timescale that an interrupt
request needs to be active to be recognized?

Or maybe my analysis isn't correct; I wasn't able to find the
relevant code path in the UNIX kernel.

============

 cpu-exec.c      | 12 +++++++-----
 hw/i8259.c      | 18 ++++++++++++++++++
 qemu-options.hx | 12 ++++++++++++
 sysemu.h        |  1 +
 vl.c            |  4 ++++
 5 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 134b3c4..c309847 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -329,11 +329,15 @@ int cpu_exec(CPUArchState *env)
                                                           0);
                             env->interrupt_request &= ~(CPU_INTERRUPT_HARD | CPU_INTERRUPT_VIRQ);
                             intno = cpu_get_pic_interrupt(env);
-                            qemu_log_mask(CPU_LOG_TB_IN_ASM, "Servicing hardware INT=0x%02x\n", intno);
-                            do_interrupt_x86_hardirq(env, intno, 1);
-                            /* ensure that no TB jump will be modified as
-                               the program flow was changed */
-                            next_tb = 0;
+                            if (intno >= 0) {
+                                qemu_log_mask(CPU_LOG_TB_IN_ASM,
+                                              "Servicing hardware INT=0x%02x\n",
+                                              intno);
+                                do_interrupt_x86_hardirq(env, intno, 1);
+                                /* ensure that no TB jump will be modified as
+                                   the program flow was changed */
+                                next_tb = 0;
+                            }
 #if !defined(CONFIG_USER_ONLY)
                         } else if ((interrupt_request & CPU_INTERRUPT_VIRQ) &&
                                    (env->eflags & IF_MASK) &&
diff --git a/hw/i8259.c b/hw/i8259.c
index 6587666..7ecb7e1 100644
--- a/hw/i8259.c
+++ b/hw/i8259.c
@@ -26,6 +26,7 @@
 #include "isa.h"
 #include "monitor.h"
 #include "qemu-timer.h"
+#include "sysemu.h"
 #include "i8259_internal.h"
 
 /* debug PIC */
@@ -193,6 +194,20 @@ int pic_read_irq(DeviceState *d)
                 pic_intack(slave_pic, irq2);
             } else {
                 /* spurious IRQ on slave controller */
+                if (no_spurious_interrupt_hack) {
+                    /* Pretend it was delivered and acknowledged.  If
+                     * it was spurious due to slave_pic->imr, then
+                     * as soon as the mask is cleared, the slave will
+                     * re-trigger IRQ2 on the master.  If it is spurious for
+                     * some other reason, make sure we don't keep trying
+                     * to half-process the same spurious interrupt over
+                     * and over again.
+                     */
+                    s->irr &= ~(1<<irq);
+                    s->last_irr &= ~(1<<irq);
+                    s->isr &= ~(1<<irq);
+                    return -1;
+                }
                 irq2 = 7;
             }
             intno = slave_pic->irq_base + irq2;
@@ -202,6 +217,9 @@ int pic_read_irq(DeviceState *d)
         pic_intack(s, irq);
     } else {
         /* spurious IRQ on host controller */
+        if (no_spurious_interrupt_hack) {
+            return -1;
+        }
         irq = 7;
         intno = s->irq_base + irq;
     }
diff --git a/qemu-options.hx b/qemu-options.hx
index 03e13ec..57bb0b4 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1188,6 +1188,18 @@ Windows 2000 is installed, you no longer need this option (this option slows
 down the IDE transfers).
 ETEXI
 
+DEF("no-spurious-interrupt-hack", 0, QEMU_OPTION_no_spurious_interrupt_hack,
+    "-no-spurious-interrupt-hack disable delivery of spurious interrupts\n",
+    QEMU_ARCH_I386)
+STEXI
+@item -no-spurious-interrupt-hack
+@findex -no-spurious-interrupt-hack
+Use it as a workaround for operating systems that drive PICs in a way that
+can generate spurious interrupts, but the OS doesn't handle spurious
+interrupts gracefully.  (e.g. late 80s/early 90s versions of ATT UNIX
+and derivatives)
+ETEXI
+
 HXCOMM Deprecated by -rtc
 DEF("rtc-td-hack", 0, QEMU_OPTION_rtc_td_hack, "", QEMU_ARCH_I386)
 
diff --git a/sysemu.h b/sysemu.h
index 65552ac..0170109 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -117,6 +117,7 @@ extern int graphic_depth;
 extern DisplayType display_type;
 extern const char *keyboard_layout;
 extern int win2k_install_hack;
+extern int no_spurious_interrupt_hack;
 extern int alt_grab;
 extern int ctrl_grab;
 extern int usb_enabled;
diff --git a/vl.c b/vl.c
index 16d04a2..6de41c1 100644
--- a/vl.c
+++ b/vl.c
@@ -204,6 +204,7 @@ CharDriverState *serial_hds[MAX_SERIAL_PORTS];
 CharDriverState *parallel_hds[MAX_PARALLEL_PORTS];
 CharDriverState *virtcon_hds[MAX_VIRTIO_CONSOLES];
 int win2k_install_hack = 0;
+int no_spurious_interrupt_hack = 0;
 int usb_enabled = 0;
 int singlestep = 0;
 int smp_cpus = 1;
@@ -3046,6 +3047,9 @@ int main(int argc, char **argv, char **envp)
             case QEMU_OPTION_win2k_hack:
                 win2k_install_hack = 1;
                 break;
+            case QEMU_OPTION_no_spurious_interrupt_hack:
+                no_spurious_interrupt_hack = 1;
+                break;
             case QEMU_OPTION_rtc_td_hack: {
                 static GlobalProperty slew_lost_ticks[] = {
                     {
-- 
1.7.10.2.484.gcd07cc5