Jan Kiszka wrote:
> Hi Philippe,
> 
> as already indicated, I'm starting to understand the ipipe bug Roman
> sees. It seems to boil down to the following path:
> 
> - exception raised over non-root domain (__rt_event_wait...)
> - root domain is stalled on entry of __ipipe_handle_exception
> - fault-causing task is first relaxed, then scheduled away under Linux
> - scheduled-in Linux task was interrupted in __ipipe_divert_exception,
>   shortly before __fixup_if
> - __fixup_if finds root domain stalled and propagates this to the
>   register set of the interrupted context (user space task running on
>   its first fpu instruction, having triggered device_not_available).
> - return to user space task with irqs disabled - bang!
>

Good catch.

> Two ways to approach this:
> 1. Do we actually have to stall the root domain in
>    __ipipe_handle_exception before ipipe_trap_notify? I don't see why we
>    should be better off with doing this afterwards.

We do, because the root domain may install an I-pipe event handler on exceptions
as well, and the callee may assume that the virtual interrupt state is correct.

> 2. Avoid that __ipipe_divert_exception is interruptible and can pick up
>    the stall flag from a different Linux task. But I don't know if there
>    aren't more race windows like that.
> 

Since the core of the issue is a preemption point that may be introduced by a
thread migration to secondary mode, the same goes for __ipipe_syscall_root;
this is what I stumbled upon in a different trace set.

The way to fix this properly is to decouple fixup_if() from the current global
interrupt state at call time, and rather make such state context-dependent, so
that iret emulation always uses the proper state value. A typical approach would
be to record the stall bit value on the caller's stack, and feed fixup_if() 
with it.
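
A rough user-space sketch of that idea, not actual I-pipe code: struct regs,
root_stalled, fixup_if_global(), fixup_if_saved() and
resumed_task_has_irqs_on() are all hypothetical stand-ins made up for
illustration. It only replays the sequence described above and compares a
fixup that reads the global stall bit at call time with one that is fed the
value recorded on the caller's stack.

```c
#include <stdbool.h>

#define X86_EFLAGS_IF 0x200UL   /* interrupt-enable bit in EFLAGS */

/* Hypothetical stand-in for the saved register frame. */
struct regs {
    unsigned long eflags;
};

/* Models the global (per-CPU) stall bit of the root domain. */
static bool root_stalled;

/* Fixup that reads whatever the global state happens to be at call time. */
static void fixup_if_global(struct regs *regs)
{
    if (root_stalled)
        regs->eflags &= ~X86_EFLAGS_IF;
    else
        regs->eflags |= X86_EFLAGS_IF;
}

/* Fixup that is fed the stall bit the caller recorded on its own stack. */
static void fixup_if_saved(bool stalled, struct regs *regs)
{
    if (stalled)
        regs->eflags &= ~X86_EFLAGS_IF;
    else
        regs->eflags |= X86_EFLAGS_IF;
}

/*
 * Replays the reported sequence and returns 1 if the resumed user-space
 * task ends up with interrupts enabled, 0 otherwise.
 */
int resumed_task_has_irqs_on(bool use_saved_state)
{
    /* Task B's interrupted user-space context, irqs enabled. */
    struct regs b_regs = { .eflags = X86_EFLAGS_IF };

    /* B enters the exception path with the root domain unstalled
     * and records that state in a local, i.e. on its own stack. */
    root_stalled = false;
    bool b_saved_stall = root_stalled;

    /* B is preempted shortly before the fixup. Meanwhile task A faults
     * over a non-root domain, the root domain gets stalled on entry,
     * and A is relaxed and scheduled away with the bit still set. */
    root_stalled = true;

    /* B is scheduled back in and runs its fixup. */
    if (use_saved_state)
        fixup_if_saved(b_saved_stall, &b_regs);
    else
        fixup_if_global(&b_regs);

    return (b_regs.eflags & X86_EFLAGS_IF) != 0;
}
```

With the global variant, B inherits A's stall bit and goes back to user space
with irqs off; with the saved variant, the stale global value no longer
matters to B's iret emulation.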

> Jan
> 


-- 
Philippe.


_______________________________________________
Adeos-main mailing list
[email protected]
https://mail.gna.org/listinfo/adeos-main