On 25.09.20 16:44, Jan Kiszka wrote:
> On 25.09.20 16:21, Jan Kiszka wrote:
>> On 17.09.20 10:36, Oliver Schwartz wrote:
>>>
>>>
>>>> On 17 Sep 2020, at 09:31, Jan Kiszka <jan.kis...@siemens.com> wrote:
>>>>
>>>> On 17.09.20 09:16, Oliver Schwartz wrote:
>>>>>> On 15 Sep 2020, at 11:00, Jan Kiszka <jan.kis...@siemens.com
>>>>>> <mailto:jan.kis...@siemens.com>> wrote:
>>>>>>
>>>>>> On 15.09.20 09:07, Oliver Schwartz wrote:
>>>>>>> I’m currently trying out the arm64-zero-exits branch and got stuck.
>>>>>>> System is a Xilinx ZU9EG on a custom board, similar to zcu102. I’ve
>>>>>>> brought ATF up to date and patched it with Jan’s patch to enable SDEI.
>>>>>>> If I don’t enable SDEI in ATF everything works as expected (with VM
>>>>>>> exits for interrupts, of course). Jailhouse source is the tip of branch
>>>>>>> arm64-zero-exits.
>>>>>>> If I enable SDEI in ATF, jailhouse works most of the time, except for
>>>>>>> when it doesn’t. Sometimes, ‘jailhouse enable’ results in:
>>>>>>>> Initializing processors:
>>>>>>>> CPU 1... OK
>>>>>>>> CPU 0...
>>>>>>>> /home/oliver/0.12-gitAUTOINC+98061469d0-r0/git/hypervisor/arch/arm64/setup.c:73:
>>>>>>>> returning error -EIO
>>>>>>
>>>>>> Weird - that's the SDEI event enable call.
>>>>>>
>>>>>>>> FAILED
>>>>>>>> JAILHOUSE_ENABLE: Input/output error
>>>>>>> I’ve seen this error only when I enable jailhouse through some init
>>>>>>> script during the boot process, when the system is also busy otherwise.
>>>>>>> When starting jailhouse on an idle system I haven’t seen this.
>>>>>>
>>>>>> Possibly a regression of my recent refactoring which I didn't manage to
>>>>>> test yet. Could you try if
>>>>>>
>>>>>> https://github.com/siemens/jailhouse/commits/e0ef829c85895dc6387d5ea11b08aa65a456255f
>>>>>>
>>>>>> was any better?
>>>>>>
>>>>>>> Sometimes it may hang later during ‘jailhouse enable’:
>>>>>>>> Initializing processors:
>>>>>>>> CPU 1... OK
>>>>>>>> CPU 0... OK
>>>>>>>> CPU 2... OK
>>>>>>>> CPU 3... OK
>>>>>>>> Initializing unit: irqchip
>>>>>>>> Using SDEI-based management interrupt
>>>>>>>> Initializing unit: ARM SMMU v3
>>>>>>>> Initializing unit: PVU IOMMU
>>>>>>>> Initializing unit: PCI
>>>>>>>> Adding virtual PCI device 00:00.0 to cell "root"
>>>>>>>> Page pool usage after late setup: mem 67/992, remap 5/131072
>>>>>>>> Activating hypervisor
>>>>>>>> [ 5.847540] The Jailhouse is opening.
>>>>>>> Using a JTAG debugger I see that one or more cores are stuck in
>>>>>>> hypervisor/arch/arm-common/psci.c, line 105.
>>>>>>> It may also succeed in stopping one or more CPUs and then hang (again
>>>>>>> with one or more cores stuck in psci.c, line 105):
>>>>>>>> [ 5.810220] The Jailhouse is opening.
>>>>>>>> [ 5.860054] CPU1: shutdown
>>>>>>>> [ 5.862677] psci: CPU1 killed.
>>>>> Now, with the first problem solved I’ve dug into the second one. It’s
>>>>> actually a bit worse than in my initial description: If I just do
>>>>> ‘jailhouse enable’ the system will always hang a few milliseconds after
>>>>> the command completes - the only exception is when ‘jailhouse create’ is
>>>>> executed immediately afterwards (which creates an inmate that uses 3 of 4
>>>>> CPU cores, leaving just one for Linux), which succeeds roughly on every
>>>>> second try. I didn’t notice this initially because I usually start
>>>>> jailhouse with a script that does ‘enable’ and ‘create’.
>>>>> The reason for the hangs seems to be the psci emulation in Jailhouse, in
>>>>> particular the CPU_SUSPEND calls. These are issued from my (Xilinx-)
>>>>> kernel frequently if Linux has more than one core available. With SDEI
>>>>> disabled the core can be woken up again by some interrupt. With SDEI
>>>>> enabled, the core waits forever on the wfi instruction. Because a
>>>>> suspended core never wakes up again the whole system hangs at some point.
>>>>> Any ideas why no interrupts are seen anymore in psci?
>>>>> My guess is that
>>>>> it’s because the inmate (Linux) now has full control over the GIC, so it
>>>>> may disable any interrupts before suspending a core, without Jailhouse
>>>>> noticing. If this is the case, it may be necessary to re-enable the IRQs
>>>>> before executing wfi. But I’m missing the big picture here - what
>>>>> interrupt is the core waiting for in the first place? Any other thoughts?
>>>>
>>>> You likely found a bug in the SDEI feature of Jailhouse. The CPU_SUSPEND
>>>> emulation assumes non-SDEI operation, i.e. interception of interrupts by
>>>> the hypervisor, but that is not true in this mode.
>>>>
>>>> We need a way to wait for interrupts without actually receiving them when
>>>> they arrive, but rather return to EL1 then. Maybe re-enabling
>>>> interception, waiting, and then disabling it again before returning would
>>>> do the trick. But then I also do not understand yet why
>>>> https://github.com/bao-project/bao-hypervisor/blob/master/src/arch/armv8/psci.c
>>>> gets away with wfi. Possibly, they run with interrupts on through the
>>>> hypervisor, though that would not be straightforward either.
>>>
>>> The good news is that there’s an easy workaround, at least on my system:
>>> disabling suspend calls before starting jailhouse
>>> (echo 1 > /sys/devices/system/cpu/cpu<n>/cpuidle/state1/disable).
>>>
>>
>> Seems the reason I was not seeing this so far is that my config [1] was
>> lacking CONFIG_ARM_PSCI_CPUIDLE. Seeing it now as well, let's debug.
>>
>
> My idea seems to work (quick hack):
>
> diff --git a/hypervisor/arch/arm-common/psci.c b/hypervisor/arch/arm-common/psci.c
> index 6a9abf60..3bb3f6a8 100644
> --- a/hypervisor/arch/arm-common/psci.c
> +++ b/hypervisor/arch/arm-common/psci.c
> @@ -101,6 +101,14 @@ long psci_dispatch(struct trap_context *ctx)
>
>      case PSCI_0_2_FN_CPU_SUSPEND:
>      case PSCI_0_2_FN64_CPU_SUSPEND:
> +        if (sdei_available) {
> +            unsigned long hcr;
> +            arm_read_sysreg(HCR_EL2, hcr);
> +            arm_write_sysreg(HCR_EL2,
> +                             hcr | HCR_IMO_BIT | HCR_FMO_BIT);
> +            asm volatile("wfi" : : : "memory");
> +            arm_read_sysreg(HCR_EL2, hcr);
> +        } else
>          if (!irqchip_has_pending_irqs()) {
>              asm volatile("wfi" : : : "memory");
>              irqchip_handle_irq();
>
> Now, if someone with more architectural knowledge than I could explain
> why that's the case and if that will work on all platforms, with both
> GICv2 and v3 (and maybe even v4), we could convert that into a real patch.
> Trying my luck on the CC list...
>
Nah, I was too quick: wfi works, i.e. the hypervisor is woken up on
pending interrupts, but more than simply clearing IMO/FMO in HCR is
needed in order to forward that pending irq event to EL1 when returning
to it.

Jan

PS: I strongly suspect this is just broken under SDEI in bao as well.

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux