> On 17 Sep 2020, at 09:31, Jan Kiszka <jan.kis...@siemens.com> wrote:
>
> On 17.09.20 09:16, Oliver Schwartz wrote:
>>> On 15 Sep 2020, at 11:00, Jan Kiszka <jan.kis...@siemens.com
>>> <mailto:jan.kis...@siemens.com>> wrote:
>>>
>>> On 15.09.20 09:07, Oliver Schwartz wrote:
>>>> I’m currently trying out the arm64-zero-exits branch and got stuck.
>>>> System is a Xilinx ZU9EG on a custom board, similar to zcu102. I’ve
>>>> brought ATF up to date and patched it with Jan’s patch to enable SDEI. If I
>>>> don’t enable SDEI in ATF everything works as expected (with VM exits for
>>>> interrupts, of course). Jailhouse source is the tip of branch
>>>> arm64-zero-exits.
>>>> If I enable SDEI in ATF, jailhouse works most of the time, except for when
>>>> it doesn’t. Sometimes, ‘jailhouse enable’ results in:
>>>>> Initializing processors:
>>>>> CPU 1... OK
>>>>> CPU 0...
>>>>> /home/oliver/0.12-gitAUTOINC+98061469d0-r0/git/hypervisor/arch/arm64/setup.c:73:
>>>>> returning error -EIO
>>>
>>> Weird - that's the SDEI event enable call.
>>>
>>>>> FAILED
>>>>> JAILHOUSE_ENABLE: Input/output error
>>>> I’ve seen this error only when I enable jailhouse through some init script
>>>> during the boot process, when the system is also busy otherwise. When
>>>> starting jailhouse on an idle system I haven’t seen this.
>>>
>>> Possibly a regression of my recent refactoring which I didn't manage to
>>> test yet. Could you try if
>>>
>>> https://github.com/siemens/jailhouse/commits/e0ef829c85895dc6387d5ea11b08aa65a456255f
>>>
>>> was any better?
>>>
>>>> Sometimes it may hang later during ‘jailhouse enable’:
>>>>> Initializing processors:
>>>>> CPU 1... OK
>>>>> CPU 0... OK
>>>>> CPU 2... OK
>>>>> CPU 3... OK
>>>>> Initializing unit: irqchip
>>>>> Using SDEI-based management interrupt
>>>>> Initializing unit: ARM SMMU v3
>>>>> Initializing unit: PVU IOMMU
>>>>> Initializing unit: PCI
>>>>> Adding virtual PCI device 00:00.0 to cell "root"
>>>>> Page pool usage after late setup: mem 67/992, remap 5/131072
>>>>> Activating hypervisor
>>>>> [ 5.847540] The Jailhouse is opening.
>>>> Using a JTAG debugger I see that one or more cores are stuck in
>>>> hypervisor/arch/arm-common/psci.c, line 105.
>>>> It may also succeed in stopping one or more CPUs and then hang (again with
>>>> one or more cores stuck in psci.c, line 105):
>>>>> [ 5.810220] The Jailhouse is opening.
>>>>> [ 5.860054] CPU1: shutdown
>>>>> [ 5.862677] psci: CPU1 killed.
>> Now, with the first problem solved I’ve dug into the second one. It’s
>> actually a bit worse than in my initial description: If I just do ‘jailhouse
>> enable’ the system will always hang a few milliseconds after the command
>> completes - the only exception is when ‘jailhouse create’ is executed
>> immediately afterwards (which creates an inmate that uses 3 of 4 CPU cores,
>> leaving just one for Linux), which succeeds roughly on every second try. I
>> didn’t notice this initially because I usually start jailhouse with a script
>> that does ‘enable’ and ‘create’.
>> The reason for the hangs seems to be the psci emulation in Jailhouse, in
>> particular the CPU_SUSPEND calls. These are issued from my (Xilinx-) kernel
>> frequently if Linux has more than one core available. With SDEI disabled the
>> core can be woken up again by some interrupt. With SDEI enabled, the core
>> waits forever on the wfi instruction. Because a suspended core never wakes
>> up again the whole system hangs at some point.
>> Any ideas why no interrupts are seen anymore in psci? My guess is that it’s
>> because the inmate (Linux) now has full control over the GIC, so it may
>> disable any interrupts before suspending a core, without Jailhouse noticing.
>> If this is the case, it may be necessary to re-enable the IRQs before
>> executing wfi. But I’m missing the big picture here - what interrupt is the
>> core waiting for in the first place? Any other thoughts?
>
> You likely found a bug in the SDEI feature of Jailhouse. The CPU_SUSPEND
> emulation assumes non-SDEI operation, i.e. interception of interrupts by the
> hypervisor, but that is not true in this mode.
>
> We need a way to wait for interrupts without actually receiving them when
> they arrive, but rather return to EL1 then. Maybe re-enabling interception,
> waiting, and then disabling it again before returning would do the trick. But
> then I also do not understand yet why
> https://github.com/bao-project/bao-hypervisor/blob/master/src/arch/armv8/psci.c
> gets away with wfi. Possibly, they run with interrupts on through the
> hypervisor, though that would not be straightforward either.
The good news is that there’s an easy workaround, at least on my system:
disabling the suspend state on each CPU before starting jailhouse
(echo 1 > /sys/devices/system/cpu/cpu<n>/cpuidle/state1/disable).
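To apply this to all cores at once, something like the following should do. It is only a sketch: the function name and its optional second argument (an alternative sysfs root, handy for a dry run) are made up for illustration, and it assumes that state1 is the state that triggers CPU_SUSPEND, as it is on my board. Other kernels or device trees may number the idle states differently.

```shell
#!/bin/sh
# Sketch: disable one cpuidle state on every CPU found under a sysfs root.
# disable_cpuidle_state is a hypothetical helper name, not part of Jailhouse.
#   $1: cpuidle state index (assumed default: 1, as on my ZU9EG system)
#   $2: sysfs CPU root (default: /sys/devices/system/cpu)
disable_cpuidle_state() {
    state="${1:-1}"
    root="${2:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/cpuidle/state"$state"/disable; do
        # Skip CPUs without this state, and the unexpanded glob pattern
        # when nothing matches at all.
        [ -w "$f" ] || continue
        echo 1 > "$f"
    done
}
```

Run `disable_cpuidle_state 1` as root before `jailhouse enable`. Writing 0 to the same files re-enables the state.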
Oliver