On 17.09.20 10:36, Oliver Schwartz wrote:
>
>> On 17 Sep 2020, at 09:31, Jan Kiszka <jan.kis...@siemens.com> wrote:
>>
>> On 17.09.20 09:16, Oliver Schwartz wrote:
>>>> On 15 Sep 2020, at 11:00, Jan Kiszka <jan.kis...@siemens.com> wrote:
>>>>
>>>> On 15.09.20 09:07, Oliver Schwartz wrote:
>>>>> I’m currently trying out the arm64-zero-exits branch and got stuck.
>>>>> System is a Xilinx ZU9EG on a custom board, similar to zcu102. I’ve
>>>>> brought ATF up to date and patched it with Jan’s patch to enable SDEI.
>>>>> If I don’t enable SDEI in ATF everything works as expected (with VM
>>>>> exits for interrupts, of course). Jailhouse source is the tip of branch
>>>>> arm64-zero-exits.
>>>>> If I enable SDEI in ATF, jailhouse works most of the time, except for
>>>>> when it doesn’t. Sometimes, ‘jailhouse enable’ results in:
>>>>>> Initializing processors:
>>>>>> CPU 1... OK
>>>>>> CPU 0...
>>>>>> /home/oliver/0.12-gitAUTOINC+98061469d0-r0/git/hypervisor/arch/arm64/setup.c:73:
>>>>>> returning error -EIO
>>>>
>>>> Weird - that’s the SDEI event enable call.
>>>>
>>>>>> FAILED
>>>>>> JAILHOUSE_ENABLE: Input/output error
>>>>> I’ve seen this error only when I enable jailhouse through some init
>>>>> script during the boot process, when the system is also busy otherwise.
>>>>> When starting jailhouse on an idle system I haven’t seen this.
>>>>
>>>> Possibly a regression of my recent refactoring which I didn't manage to
>>>> test yet. Could you check whether
>>>>
>>>> https://github.com/siemens/jailhouse/commits/e0ef829c85895dc6387d5ea11b08aa65a456255f
>>>>
>>>> is any better?
>>>>
>>>>> Sometimes it may hang later during ‘jailhouse enable’:
>>>>>> Initializing processors:
>>>>>> CPU 1... OK
>>>>>> CPU 0... OK
>>>>>> CPU 2... OK
>>>>>> CPU 3... OK
>>>>>> Initializing unit: irqchip
>>>>>> Using SDEI-based management interrupt
>>>>>> Initializing unit: ARM SMMU v3
>>>>>> Initializing unit: PVU IOMMU
>>>>>> Initializing unit: PCI
>>>>>> Adding virtual PCI device 00:00.0 to cell "root"
>>>>>> Page pool usage after late setup: mem 67/992, remap 5/131072
>>>>>> Activating hypervisor
>>>>>> [ 5.847540] The Jailhouse is opening.
>>>>> Using a JTAG debugger I see that one or more cores are stuck in
>>>>> hypervisor/arch/arm-common/psci.c, line 105.
>>>>> It may also succeed in stopping one or more CPUs and then hang (again
>>>>> with one or more cores stuck in psci.c, line 105):
>>>>>> [ 5.810220] The Jailhouse is opening.
>>>>>> [ 5.860054] CPU1: shutdown
>>>>>> [ 5.862677] psci: CPU1 killed.
>>> Now, with the first problem solved, I’ve dug into the second one. It’s
>>> actually a bit worse than in my initial description: If I just do
>>> ‘jailhouse enable’ the system will always hang a few milliseconds after
>>> the command completes. The only exception is when ‘jailhouse create’ is
>>> executed immediately afterwards (which creates an inmate that uses 3 of 4
>>> CPU cores, leaving just one for Linux), which succeeds roughly on every
>>> second try. I didn’t notice this initially because I usually start
>>> jailhouse with a script that does ‘enable’ and ‘create’.
>>> The reason for the hangs seems to be the PSCI emulation in Jailhouse, in
>>> particular the CPU_SUSPEND calls. These are issued frequently by my
>>> (Xilinx) kernel if Linux has more than one core available. With SDEI
>>> disabled the core can be woken up again by some interrupt. With SDEI
>>> enabled, the core waits forever on the wfi instruction. Because a
>>> suspended core never wakes up again, the whole system hangs at some point.
>>> Any ideas why no interrupts are seen anymore in psci? My guess is that
>>> it’s because the inmate (Linux) now has full control over the GIC, so it
>>> may disable any interrupts before suspending a core, without Jailhouse
>>> noticing. If this is the case, it may be necessary to re-enable the IRQs
>>> before executing wfi. But I’m missing the big picture here - what
>>> interrupt is the core waiting for in the first place? Any other thoughts?
>>
>> You likely found a bug in the SDEI feature of Jailhouse. The CPU_SUSPEND
>> emulation assumes non-SDEI operation, i.e. interception of interrupts by
>> the hypervisor, but that is not true in this mode.
>>
>> We need a way to wait for interrupts without actually receiving them when
>> they arrive, but rather return to EL1 then. Maybe re-enabling interception,
>> waiting, and then disabling it again before returning would do the trick.
>> But then I also do not understand yet why
>> https://github.com/bao-project/bao-hypervisor/blob/master/src/arch/armv8/psci.c
>> gets away with wfi. Possibly, they run with interrupts on through the
>> hypervisor, though that would not be straightforward either.
>
> The good news is that there’s an easy workaround, at least on my system:
> disabling suspend calls before starting jailhouse
> (echo 1 > /sys/devices/system/cpu/cpu<n>/cpuidle/state1/disable).
>

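For reference, the workaround quoted above can be applied to every core in one go. This is only a minimal sketch: it assumes state1 is the PSCI CPU_SUSPEND idle state on the target (the `name` file next to `disable` shows what each state is), and `disable_idle_state` with its optional second argument is an illustrative helper, not part of Jailhouse.

```sh
#!/bin/sh
# Sketch: disable one cpuidle state on all CPUs via sysfs.
# disable_idle_state is a hypothetical helper name, not a Jailhouse tool.
#   $1: cpuidle state index (assumed to be 1 for PSCI CPU_SUSPEND here)
#   $2: optional sysfs CPU root, defaults to /sys/devices/system/cpu
disable_idle_state() {
    state="$1"
    root="${2:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/cpuidle/state"$state"/disable; do
        # If the glob matched nothing, $f is the literal pattern: skip it.
        [ -e "$f" ] || continue
        # Writing 1 tells the cpuidle governor not to pick this state.
        echo 1 > "$f"
    done
}
```

Run as root before ‘jailhouse enable’, e.g. `disable_idle_state 1`. Writing 0 to the same files re-enables the state.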
It seems the reason I had not seen this so far is that my config [1] was
lacking CONFIG_ARM_PSCI_CPUIDLE. I am seeing it now as well, so let's debug.

Jan

[1] https://github.com/siemens/jailhouse-images/blob/master/recipes-kernel/linux/files/arm64_defconfig_5.4

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux