Jan Kiszka wrote:
> Hi Philippe,
>
> as already indicated, I'm starting to understand the ipipe bug Roman
> sees. It seems to melt down to the following path:
>
>  - exception raised over non-root domain (__rt_event_wait...)
>  - root domain is stalled on entry of __ipipe_handle_exception
>  - fault causing task is first relaxed, then scheduled away under Linux
>  - scheduled-in Linux task was interrupted in __ipipe_divert_exception,
>    shortly before __fixup_if
>  - __fixup_if finds root domain stalled and propagates this to the
>    register set of the interrupted context (user space task running on
>    its first fpu instruction, having triggered device_not_available).
>  - return to user space task with irqs disable - bang!
>
Good catch.

> Two ways to approach this:
> 1. Do we actually have to stall the root domain in
>    __ipipe_handle_exception before ipipe_trap_notify? I don't see why we
>    should be better off with doing this afterwards.

We do, because the root domain may install an I-pipe event handler on
exceptions as well, and the callee may assume that the virtual interrupt
state is correct.

> 2. Avoid that __ipipe_divert_exception is interruptible and can pick up
>    the stall flag from a different Linux task. But I don't know if there
>    aren't more race windows like that.
>

Since the core of the issue is about a preemption point that may be
introduced by a thread migration to secondary, the same goes with
__ipipe_syscall_root; this is what I stumbled upon on a different trace
set. The way to fix this properly is to decouple fixup_if() from the
current global interrupt state at call time, and rather make such state
context-dependent, so that iret emulation always uses the proper state
value. A typical approach would be to record the stall bit value on the
caller's stack, and feed fixup_if() with it.

> Jan
>

--

Philippe.
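
For illustration only, here is a rough user-space sketch of the idea of
recording the stall bit on the caller's stack and feeding it to the
fixup routine. It is not I-pipe code: struct fake_regs, root_stalled,
fixup_if_global(), fixup_if_saved() and irqs_on_after_fixup() are
hypothetical names made up for this model, and the preemption window is
simulated with a plain assignment. Only the X86_EFLAGS_IF value (bit 9
of EFLAGS) is taken from the real architecture.

```c
#include <stdbool.h>

/* Real x86 EFLAGS interrupt-enable bit; everything else here is made up. */
#define X86_EFLAGS_IF 0x200UL

/* Hypothetical stand-in for the saved register frame of a context. */
struct fake_regs {
    unsigned long flags;
};

/* Hypothetical stand-in for the root domain's global stall bit. */
static bool root_stalled;

/* Racy variant: samples the global stall bit at call time. */
static void fixup_if_global(struct fake_regs *regs)
{
    if (root_stalled)
        regs->flags &= ~X86_EFLAGS_IF;
    else
        regs->flags |= X86_EFLAGS_IF;
}

/* Context-dependent variant: the caller feeds in the bit it recorded. */
static void fixup_if_saved(bool stalled, struct fake_regs *regs)
{
    if (stalled)
        regs->flags &= ~X86_EFLAGS_IF;
    else
        regs->flags |= X86_EFLAGS_IF;
}

/*
 * Replays the interleaving from the trace in a single thread: a context
 * enters with the root domain unstalled, is "preempted" while another
 * task leaves the root domain stalled, then resumes and runs the fixup.
 * Returns true if the frame it returns to still has interrupts enabled.
 */
bool irqs_on_after_fixup(bool use_saved)
{
    struct fake_regs regs = { .flags = X86_EFLAGS_IF };
    bool saved_stall;

    root_stalled = false;
    saved_stall = root_stalled;   /* recorded on this context's stack */

    /* Simulated preemption window: someone else stalls the root domain. */
    root_stalled = true;

    if (use_saved)
        fixup_if_saved(saved_stall, &regs);
    else
        fixup_if_global(&regs);

    return (regs.flags & X86_EFLAGS_IF) != 0;
}
```

In this model, irqs_on_after_fixup(false) returns false, which mirrors
the "return to user space with irqs disabled" symptom, while
irqs_on_after_fixup(true) returns true because the value used for the
fixup belongs to the resuming context rather than to whoever touched
the global bit last.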
