On Mon, 2010-11-15 at 20:31 +0100, Jan Kiszka wrote:
> Hi Philippe,
>
> debugging some variant of I-pipe over an x86-32 target, I think I found
> some fairly old flaw in the IRQ virtualization that causes rescheduling
> delays (up to deadlocks) for Linux:
>
>  - we are in sysenter_tail (other exit paths should be affected as well)
>  - we DISABLE_INTERRUPTS, but only virtually
>  - we go past "testl $_TIF_ALLWORK_MASK, %ecx", nothing to be done
>  - an IRQ for Linux arrives, it is pushed to the backlog
>  - __ipipe_unstall_iret_root replays the IRQ as the regs we are about to
>    return to have IF set (obviously, we return from a syscall)
>  - the Linux IRQ handler sets _TIF_NEED_RESCHED, but doesn't perform the
>    work on return as __ipipe_sync_stage set the stall flag for the Linux
>    domain before calling the handler
>  - but now the preempted sysenter return also does no reschedule as it
>    already passed the check - bang!

Ouch. You must have had a really busy Monday to find this one.

>
> Another variant of this Linux rescheduling issue:
>
>  - we are in a lengthy loop inside the kernel, but we are preemptible
>    most of the time
>  - after disabling Linux IRQs briefly, we are calling
>    local_irq_enable() again
>  - in the meantime, we received a Linux IRQ which is now pending in the
>    backlog
>  - __ipipe_unstall_root triggers __ipipe_sync_stage
>  - Linux handler is called, sets NEED_RESCHED but does not reschedule
>    (see above)
>  - we do not test for resched again as we are not returning to user
>    space, and that for quite some time - bang!
>
> I think both issues are only related to virtualizing DISABLE_INTERRUPTS
> for entry_32.S and I wonder if this doesn't finally qualify for a switch
> to the 64-bit model. Or do you see simpler fixes?
>

We could probably use hw masking from sysenter_tail and on, but quite
frankly, I think this time, enough is enough and this bug calls for a
radical fix, which is indeed getting rid of interrupt virtualization in
the kernel entry/exit paths for x86_32, which no other arch ever
implemented anyway.

The decision to virtualize there as well was taken circa 2.4.18, when
upstream did not care that much about latency yet. Things have changed,
and there is no more reason to virtualize interrupts in very short
critical sections, at the expense of a lot more complexity.

- __ipipe_unstall_iret_root
- __ipipe_kpreempt_root

and much of the nonsense we do to track linux's interrupt state would go
away.

--
Philippe.
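
For readers following the thread, the lost-reschedule window from the first
scenario can be sketched as a small user-space model. This is only an
illustration under stated assumptions: struct root_model, sync_stage(),
check_work() and the two exit_with_*() functions are hypothetical names made
up for this sketch, not I-pipe or kernel code, and the hardware-masking
variant assumes the pending IRQ is taken right after the return to user
space, so that its own return path re-checks the work flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the root (Linux) domain state. Hypothetical names only. */
struct root_model {
    bool stalled;        /* virtual interrupt mask (the stall flag) */
    bool irq_logged;     /* a Linux IRQ is waiting in the backlog */
    bool need_resched;   /* stands in for _TIF_NEED_RESCHED */
    int  schedule_calls; /* how many times we would have rescheduled */
};

/* The Linux IRQ handler wakes a task, so it requests a reschedule. */
static void linux_irq_handler(struct root_model *m)
{
    m->need_resched = true;
}

/* Replay the backlog: the handler runs with the stall flag set, and
 * nothing checks need_resched once it returns. */
static void sync_stage(struct root_model *m)
{
    assert(!m->stalled);  /* the backlog is only replayed once unstalled */
    if (!m->irq_logged)
        return;
    m->irq_logged = false;
    m->stalled = true;
    linux_irq_handler(m);
    m->stalled = false;
}

/* Models "testl $_TIF_ALLWORK_MASK, %ecx" followed by the work loop. */
static void check_work(struct root_model *m)
{
    if (m->need_resched) {
        m->need_resched = false;
        m->schedule_calls++;
    }
}

/* Virtual masking: the IRQ lands in the backlog after the work check.
 * Returns true if we reach user space with need_resched still set. */
bool exit_with_virtual_masking(void)
{
    struct root_model m = {0};

    m.stalled = true;     /* DISABLE_INTERRUPTS, but only virtually */
    check_work(&m);       /* nothing to be done yet */
    m.irq_logged = true;  /* hardware IRQ arrives, pushed to the backlog */
    m.stalled = false;    /* unstall on the way out ... */
    sync_stage(&m);       /* ... which replays the IRQ */
    return m.need_resched;
}

/* Hardware masking: the IRQ stays pending in the interrupt controller
 * and is taken after the return, where its exit path checks for work. */
bool exit_with_hw_masking(void)
{
    struct root_model m = {0};

    check_work(&m);         /* interrupts are really off here */
    linux_irq_handler(&m);  /* IRQ taken once interrupts are back on */
    check_work(&m);         /* the IRQ return path re-checks the flags */
    return m.need_resched;
}
```

In this model exit_with_virtual_masking() comes back with the reschedule
request still pending, while exit_with_hw_masking() does not. The second
scenario (local_irq_enable() inside a long kernel loop) follows the same
pattern, with sync_stage() reached through the unstall rather than through
the return path.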
