Jan Kiszka wrote:
> Philippe Gerum wrote:
>> Jan Kiszka wrote:
>>> Jan Kiszka wrote:
>>>> Philippe Gerum wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Philippe Gerum wrote:
>>>>>>> Jan Kiszka wrote:
>>>>>>>> This is an attempt to fix the broken root domain state adjustment in
>>>>>>>> __ipipe_handle_exception. The patch below fixes the issues recently
>>>>>>>> reported by Roman Pisl. It also currently makes much more sense to me
>>>>>>>> than what we have so far.
>>>>>>>>
>>>>>>>> In short, this patch propagates the hardware irq state into the root
>>>>>>>> domain's stall flag shortly before calling into the Linux handler, and
>>>>>>>> only then. This avoids spurious root domain stalls that end up over the
>>>>>>>> wrong Linux context due to context switches between entry and exit of
>>>>>>>> __ipipe_handle_exception. This patch also drops the bogus
>>>>>>>> local_irq_save/restore pair that doesn't account for Linux irq state
>>>>>>>> changes inside its fault handler.
>>>>>>>>
>>>>>>> Actually, it is not bogus at all, it is even mandatory on x86_64, given
>>>>>>> that we don't branch to any sysretq/iretq emulation, unlike with x86_32.
>>>>>>> So if we don't restore the stall bit for the root domain properly there,
>>>>>>> we could end up running with interrupts off in user-space.
>>>>>>>
>>>>>>> However, the way the interrupt state is currently saved is wrong: we
>>>>>>> should not local_irq_disable() over non-root domains. Here is some
>>>>>>> on-line documentation to explain why:
>>>>>>>
>>>>>>> The main difference between x86_32 and 64 is that the former does
>>>>>>> virtualize the interrupt state in entry_32.S, unlike the latter. For
>>>>>>> that reason, x86_64 does not require (actually, we should not be doing)
>>>>>>> any fixup. So, to sum up:
>>>>>>>
>>>>>>> - we use fixup_if() to restore the virtual interrupt state properly when
>>>>>>> control is given back to the code that triggered the fault/exception
>>>>>>> (x86_32). We need to do that because of task migrations between primary
>>>>>>> and secondary modes.
>>>>>>>
>>>>>>> - we must clear the virtual interrupt flag before calling the I-pipe
>>>>>>> handler / Linux regular exception handler, because our callee may/must
>>>>>>> run in the root domain as well, and expects that interrupt state to
>>>>>>> reflect the hw one, as set by the x86 exception gate / fault prologue
>>>>>>> in entry_*.S.
>>>>>>>
>>>>>>> - because of the above, we must use
>>>>>>> local_irq_save()/local_irq_restore_nosync() in our fault handler to make
>>>>>>> sure we restore the virtual interrupt flag properly between this routine
>>>>>>> and the exception return statement (i.e. during the Linux fault epilogue
>>>>>>> in entry_*.S).
>>>>>>>
>>>>>> OK, if there is a reason to enforce a stalled root domain while calling
>>>>>> into the exception hook, this makes some sense. But I don't think it is
>>>>>> formally correct to save the root state on entry and blindly restore it
>>>>>> _after_ calling the Linux handler. I rather think we should keep the
>>>>>> state that Linux leaves behind to remain transparent to it. Maybe no
>>>>>> practical issue ATM, but it makes the code at least illogical.
>>>>>>
>>>>> Please re-read the explanations, and you will find the logic. I cannot do
>>>>> anything more than re-hash what I just said. What you perceive as
>>>>> illogical is actually the only sane way to do this.
>>>>> Formally speaking, a Linux fault handler may NOT alter the interrupt
>>>>> state blindly, so we must be able to assume that we ought to restore it
>>>>> the way the lower code set it.
>>>>>
>>>> I got your first and second point, but they don't imply to me that the
>>>> third should be correct as well. "...to make sure to restore the virtual
>>>> interrupt flag properly" is not directly a clear explanation (for me) of
>>>> why we have to restore the flag across calls to the _Linux_ handler. We
>>>> can demand that the hook handler leaves the root state untouched, but
>>>> requiring the same from Linux is a restriction that you don't find in the
>>>> ipipe-less case, nor do I see the reason for it under ipipe control.
>>>>
>>> To make my question a bit more concrete (and help me write the right
>>> comments around these lines): what makes the following change bogus,
>>> which scenario will fail?
>>>
>>> Index: b/arch/x86/kernel/ipipe.c
>>> ===================================================================
>>> --- a/arch/x86/kernel/ipipe.c
>>> +++ b/arch/x86/kernel/ipipe.c
>>> @@ -685,7 +685,9 @@ int __ipipe_handle_exception(struct pt_r
>>>  	}
>>>
>>>  	__ipipe_std_extable[vector](regs, error_code);
>>> -	local_irq_restore_nosync(flags);
>>> +
>>> +	__fixup_if(test_bit(IPIPE_STALL_FLAG, &ipipe_root_cpudom_var(status)),
>>> +		   regs);
>>>
>>>  	return 0;
>>>  }
>>>
>> This would break the interrupt state on x86_64, because it is not
>> virtualized by
>
> Hmm, __fixup_if is void on x86-64, so this was way off what I was trying to
> express.
>
>> the low level code (latency wise, this is not worth the burden). So your
>> exception path would stall the root domain, and never unstall it because
>> you do not have any iretq/sysretq emulation; actually, you do not have any
>> fixup. This would work on x86_32 for the converse reason though.
>
> OK, I finally understood this difference. (I guess it comes from a different
> code structure of entry_32.S compared to entry_64.S, right? So nothing we
> could unify on our own?)
>
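(For readers following along: below is a heavily simplified, illustrative-only
sketch of the local_irq_save()/local_irq_restore_nosync() bracket that the
diff above removes and that Philippe describes as mandatory on x86_64. The
exact signature and the dispatch details are assumptions; the real
__ipipe_handle_exception() in arch/x86/kernel/ipipe.c also contains the
non-root-domain handling and the exception notification hook, both omitted
here.)

/*
 * Illustrative sketch only -- NOT the actual arch/x86/kernel/ipipe.c code.
 * It shows the pattern under discussion: save the root domain's virtual
 * interrupt state as established by the fault prologue in entry_*.S, run
 * the regular Linux handler with the virtual state reflecting the hardware
 * one, then put the saved state back without replaying pending interrupts,
 * so the fault epilogue sees exactly what the lower code set on entry.
 */
int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)
{
	unsigned long flags;

	/* Save (and stall) the virtual interrupt state of the root domain. */
	local_irq_save(flags);

	/* Call the regular Linux exception handler for this vector. */
	__ipipe_std_extable[vector](regs, error_code);

	/*
	 * Restore the saved virtual state without syncing the interrupt
	 * log (_nosync): on x86_64 there is no iretq/sysretq emulation
	 * downstream that could fix the root domain state up later.
	 */
	local_irq_restore_nosync(flags);

	return 0;
}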
Exactly, or at least something that would be overkill to unify when it comes
to full virtualization of the interrupt state in entry_64.S, for basically no
gain, and maybe worse performance.

> To clarify this for me: for 32 bit, the pipeline state after iret/sysret is
> calculated in the entry layer (in __ipipe_unstall_iret_root, more
> precisely), so the local_irq_restore_nosync is actually of minor importance
> here, isn't it? Couldn't we skip it for this arch then?
>

Yes, this would work with the current upstream code; but at the same time,
not restoring the stall bit the way we got it on entry of the exception
handler would break the assumption the I-pipe may make in further revisions
that no routine will interfere with the root domain state.

> On 64 bit, we have to set the right pipeline state before returning to the
> entry layer because it won't be touched there at all. We currently do this
> based on the state found on exception entry, but we could also do it based
> on the regs.flags state that the Linux handler left behind (like we do on
> 32 bit). But the scenario I had in mind where this would actually make a
> difference turned out to be a red herring. I don't think Linux modifies
> regs.flags in its exception handlers. The ipipe pattern remains
> inconsistent IMHO, but it is practically irrelevant.
>

I don't think it is inconsistent, it is just about defining who owns the
interrupt state. When the I-pipe is enabled, the fundamental design choice is
that our low level code has precedence over what the upstream code does.
Obviously, we try to do that in a way that keeps Linux happy, but if you
consider the way x86_64 works for us, we actually choose to enforce interrupt
protection using the hw state, as in the I-pipe-less case, instead of relying
on a virtualized state while running the epilogue code, with some exceptions
like calling preempt_schedule_irq() with (virtualized) interrupts off.

> Final question to explain the __fixup_if in __ipipe_handle_exception: that's
> due to the scenario where we migrate to the root domain while running the
> notify handler? We may return from that migration with some IF state in
> regs.flags that no longer matches the one found on exception entry, correct?

Correct.

> Will stuff all this into a few lines of comments soon.
>
> Thanks for your patience,
> Jan
>

-- 
Philippe.
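(A closing illustration, to go with the comments Jan mentions: the x86_32-only
__fixup_if() step discussed above propagates the root domain's virtual
interrupt state into the saved flags image of the faulting context, e.g. after
a migration to the root domain inside the notify handler. The body below is
only an assumed sketch of the general shape of such a helper, not the actual
I-pipe implementation.)

#include <asm/ptrace.h>			/* struct pt_regs */
#include <asm/processor-flags.h>	/* X86_EFLAGS_IF */

/*
 * Rough, assumed sketch of the x86_32 fixup idea -- not the real
 * __fixup_if(). If the root domain is stalled by the time we return to the
 * faulting context, make the saved flags say "interrupts off"; if it is
 * unstalled, set IF again, so the iret/sysret emulation in the entry layer
 * derives the pipeline state the migration left behind.
 */
static inline void __fixup_if(int stalled, struct pt_regs *regs)
{
	if (stalled)
		regs->flags &= ~X86_EFLAGS_IF;
	else
		regs->flags |= X86_EFLAGS_IF;
}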
