On 25.09.20 16:44, Jan Kiszka wrote:
> On 25.09.20 16:21, Jan Kiszka wrote:
>> On 17.09.20 10:36, Oliver Schwartz wrote:
>>>
>>>
>>>> On 17 Sep 2020, at 09:31, Jan Kiszka <jan.kis...@siemens.com> wrote:
>>>>
>>>> On 17.09.20 09:16, Oliver Schwartz wrote:
>>>>>> On 15 Sep 2020, at 11:00, Jan Kiszka <jan.kis...@siemens.com 
>>>>>> <mailto:jan.kis...@siemens.com>> wrote:
>>>>>>
>>>>>> On 15.09.20 09:07, Oliver Schwartz wrote:
>>>>>>> I’m currently trying out the arm64-zero-exits branch and got stuck.
>>>>>>> System is a Xilinx ZU9EG on a custom board, similar to zcu102. I’ve 
>>>>>>> brought ATF up to date and patched it with Jans patch to enable SDEI. 
>>>>>>> If I don’t enable SDEI in ATF everything works as expected (with VM 
>>>>>>> exits for interrupts, of course). Jailhouse source is the tip of branch 
>>>>>>> arm64-zero-exits.
>>>>>>> If I enable SDEI in ATF, jailhouse works most of the time, except for 
>>>>>>> when it doesn’t. Sometimes, ‘jailhouse enable’ results in:
>>>>>>>> Initializing processors:
>>>>>>>>  CPU 1... OK
>>>>>>>>  CPU 0... 
>>>>>>>> /home/oliver/0.12-gitAUTOINC+98061469d0-r0/git/hypervisor/arch/arm64/setup.c:73:
>>>>>>>>  returning error -EIO
>>>>>>
>>>>>> Weird - that the SDEI event enable call.
>>>>>>
>>>>>>>> FAILED
>>>>>>>> JAILHOUSE_ENABLE: Input/output error
>>>>>>> I’ve seen this error only when I enable jailhouse through some init 
>>>>>>> script during the boot process, when the system is also busy otherwise. 
>>>>>>> When starting jailhouse on an idle system I haven’t seen this.
>>>>>>
>>>>>> Possibly a regression of my recent refactoring which I didn't manage to 
>>>>>> test yet. Could you try if
>>>>>>
>>>>>> https://github.com/siemens/jailhouse/commits/e0ef829c85895dc6387d5ea11b08aa65a456255f
>>>>>>
>>>>>> was any better?
>>>>>>
>>>>>>> Sometimes it may hang later during ‘jailhouse enable’:
>>>>>>>> Initializing processors:
>>>>>>>>  CPU 1... OK
>>>>>>>>  CPU 0... OK
>>>>>>>>  CPU 2... OK
>>>>>>>>  CPU 3... OK
>>>>>>>> Initializing unit: irqchip
>>>>>>>> Using SDEI-based management interrupt
>>>>>>>> Initializing unit: ARM SMMU v3
>>>>>>>> Initializing unit: PVU IOMMU
>>>>>>>> Initializing unit: PCI
>>>>>>>> Adding virtual PCI device 00:00.0 to cell "root"
>>>>>>>> Page pool usage after late setup: mem 67/992, remap 5/131072
>>>>>>>> Activating hypervisor
>>>>>>>> [    5.847540] The Jailhouse is opening.
>>>>>>> Using a JTAG debugger I see that one or more cores are stuck in 
>>>>>>> hypervisor/arch/arm-common/psci.c, line 105.
>>>>>>> It may also succeed in stopping one or more CPUs and then hang (again 
>>>>>>> with one or more cores stuck in psci.c, line 105):
>>>>>>>> [    5.810220] The Jailhouse is opening.
>>>>>>>> [    5.860054] CPU1: shutdown
>>>>>>>> [    5.862677] psci: CPU1 killed.
>>>>> Now, with the first problem solved I’ve digged into the second one. It’s 
>>>>> actually a bit worse than in my initial description: If I just do 
>>>>> ‘jailhouse enable’ the system will always hang a few milliseconds after 
>>>>> the command completes - the only exception is when ‘jailhouse create’ is 
>>>>> executed immediately afterwards (which creates an inmate that uses 3 of 4 
>>>>> CPU cores, leaving just one for Linux), which succeeds roughly on every 
>>>>> second try. I didn’t notice this initially because I usually start 
>>>>> jailhouse with a script that does ‘enable’ and ‘create’.
>>>>> The reason for the hangs seems to be the psci emulation in Jailhouse, in 
>>>>> particular the CPU_SUSPEND calls. These are issued from my (Xilinx-) 
>>>>> kernel frequently if Linux has more than one core available. With SDEI 
>>>>> disabled the core can be woken up again by some interrupt. With SDEI 
>>>>> enabled, the core waits forever on the wfi intstruction. Because a 
>>>>> suspended core never wakes up again the whole system hangs at some point.
>>>>> Any ideas why no interrupts are seen anymore in psci? My guess is that 
>>>>> it’s because the inmate (Linux) now has full control over the GIC, so it 
>>>>> may disable any interrupts before suspending a core, without Jailhouse 
>>>>> noticing. If this is the case, it may be necessary to re-enable the IRQs 
>>>>> before executing wfi. But I’m missing the big picture here - what 
>>>>> interrupt is the core waiting for in the first place? Any other thoughts?
>>>>
>>>> You likely found a bug in the SDEI feature of Jailhouse. The CPU_SUSPEND 
>>>> emulation assumes non-SDEI operation, i.e. interception of interrupts by 
>>>> the hypervisor, but that is not true in this mode.
>>>>
>>>> We need a way to wait for interrupts without actually receiving them when 
>>>> they arrive, but rather return to EL1 then. Maybe re-enabling 
>>>> interception, waiting, and then disabling it again before returning would 
>>>> do the trick. But then I also do not understand yet why 
>>>> https://github.com/bao-project/bao-hypervisor/blob/master/src/arch/armv8/psci.c
>>>>  gets away with wfi. Possibly, they run with interrupts on through the 
>>>> hypervisor, though that would not be straightforward either.
>>>
>>> The good news is that there’s an easy workaround, at least on my system: 
>>> disabling suspend calls before starting jailhouse 
>>> (echo 1 >  /sys/devices//system/cpu/cpu<n>/cpuidle/state1/disable).
>>>
>>
>> Seems the reason I was not seeing this so far is that my config [1] was
>> lacking CONFIG_ARM_PSCI_CPUIDLE. Seeing it now as well, let's debug.
>>
> 
> My ideas seems to work (quick hack):
> 
> diff --git a/hypervisor/arch/arm-common/psci.c 
> b/hypervisor/arch/arm-common/psci.c
> index 6a9abf60..3bb3f6a8 100644
> --- a/hypervisor/arch/arm-common/psci.c
> +++ b/hypervisor/arch/arm-common/psci.c
> @@ -101,6 +101,14 @@ long psci_dispatch(struct trap_context *ctx)
>  
>       case PSCI_0_2_FN_CPU_SUSPEND:
>       case PSCI_0_2_FN64_CPU_SUSPEND:
> +             if (sdei_available) {
> +                     unsigned long hcr;
> +                     arm_read_sysreg(HCR_EL2, hcr);
> +                     arm_write_sysreg(HCR_EL2,
> +                                      hcr | HCR_IMO_BIT | HCR_FMO_BIT);
> +                     asm volatile("wfi" : : : "memory");
> +                     arm_read_sysreg(HCR_EL2, hcr);
> +             } else
>               if (!irqchip_has_pending_irqs()) {
>                       asm volatile("wfi" : : : "memory");
>                       irqchip_handle_irq();
> 
> Now, if someone with more architectural knowledge than I could explain 
> why that's the case and if that will work on all platforms, with both 
> GICv2 and v3 (and maybe even v4), we could convert that into real patch.
> Trying my luck on the CC list...
> 

Nää, I was too quick: wfi works, i.e. the hypervisor is woken up on
pending interrupts, but some more bits than simply clearing IMO/FMO in
HCR are needed in order to forward that pending irq event to EL1 when
returning to it.

Jan

PS: I strongly suspect this is just broken under SDEI in bao as well.

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux

-- 
You received this message because you are subscribed to the Google Groups 
"Jailhouse" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to jailhouse-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jailhouse-dev/2a039ddc-bd8f-eaf9-2494-7f62efb9aa80%40siemens.com.

Reply via email to