> On 15 Sep 2020, at 17:23, Jan Kiszka <jan.kis...@siemens.com> wrote:
> 
> On 15.09.20 16:09, Oliver Schwartz wrote:
>>> On 15 Sep 2020, at 14:02, Jan Kiszka <jan.kis...@siemens.com> wrote:
>>> 
>>> On 15.09.20 13:49, Oliver Schwartz wrote:
>>>>> On 15 Sep 2020, at 11:00, Jan Kiszka <jan.kis...@siemens.com> wrote:
>>>>> 
>>>>> On 15.09.20 09:07, Oliver Schwartz wrote:
>>>>>> I’m currently trying out the arm64-zero-exits branch and got stuck.
>>>>>> System is a Xilinx ZU9EG on a custom board, similar to zcu102. I’ve 
>>>>>> brought ATF up to date and patched it with Jan’s patch to enable SDEI. If 
>>>>>> I don’t enable SDEI in ATF everything works as expected (with VM exits 
>>>>>> for interrupts, of course). Jailhouse source is the tip of branch 
>>>>>> arm64-zero-exits.
>>>>>> If I enable SDEI in ATF, jailhouse works most of the time, except for 
>>>>>> when it doesn’t. Sometimes, ‘jailhouse enable’ results in:
>>>>>>> Initializing processors:
>>>>>>>  CPU 1... OK
>>>>>>>  CPU 0... 
>>>>>>> /home/oliver/0.12-gitAUTOINC+98061469d0-r0/git/hypervisor/arch/arm64/setup.c:73:
>>>>>>>  returning error -EIO
>>>>> 
>>>>> Weird - that's the SDEI event enable call.
>>>> Yes, that’s a bit scary. The code involved in ATF is limited - I’m pretty 
>>>> sure that I’m up-to-date with upstream there.
>>>>>>> FAILED
>>>>>>> JAILHOUSE_ENABLE: Input/output error
>>>>>> I’ve seen this error only when I enable jailhouse through some init 
>>>>>> script during the boot process, when the system is also busy otherwise. 
>>>>>> When starting jailhouse on an idle system I haven’t seen this.
>>>>> 
>>>>> Possibly a regression of my recent refactoring which I didn't manage to 
>>>>> test yet. Could you try if
>>>>> 
>>>>> https://github.com/siemens/jailhouse/commits/e0ef829c85895dc6387d5ea11b08aa65a456255f
>>>>> 
>>>>> was any better?
>>>> No, I don’t see any difference with that version.
>>> 
>>> Good and bad news at the same time, unfortunately ruling out a quick 
>>> solution.
>>> 
>>>>> 
>>>>>> Sometimes it may hang later during ‘jailhouse enable’:
>>>>>>> Initializing processors:
>>>>>>>  CPU 1... OK
>>>>>>>  CPU 0... OK
>>>>>>>  CPU 2... OK
>>>>>>>  CPU 3... OK
>>>>>>> Initializing unit: irqchip
>>>>>>> Using SDEI-based management interrupt
>>>>>>> Initializing unit: ARM SMMU v3
>>>>>>> Initializing unit: PVU IOMMU
>>>>>>> Initializing unit: PCI
>>>>>>> Adding virtual PCI device 00:00.0 to cell "root"
>>>>>>> Page pool usage after late setup: mem 67/992, remap 5/131072
>>>>>>> Activating hypervisor
>>>>>>> [    5.847540] The Jailhouse is opening.
>>>>>> Using a JTAG debugger I see that one or more cores are stuck in 
>>>>>> hypervisor/arch/arm-common/psci.c, line 105.
>>>>>> It may also succeed in stopping one or more CPUs and then hang (again 
>>>>>> with one or more cores stuck in psci.c, line 105):
>>>>>>> [    5.810220] The Jailhouse is opening.
>>>>>>> [    5.860054] CPU1: shutdown
>>>>>>> [    5.862677] psci: CPU1 killed.
>>>>>> Has anyone else observed this? Any ideas on what may cause this? My gut 
>>>>>> feeling is that there’s a race condition somewhere, but it feels like 
>>>>>> looking for a needle in a haystack.
>>>>>> Finally, if ‘jailhouse enable’ succeeds, a subsequent ‘jailhouse 
>>>>>> disable’ followed by another ‘jailhouse enable’ always hangs the system 
>>>>>> (cores stuck in psci.c).
>>>>>> Otherwise, once ‘jailhouse enable’ succeeds the system is working fine 
>>>>>> and I can start, stop, load and unload my guest inmate at will.
>>>>>> To make matters a bit more complicated: My system is based on Xilinx 
>>>>>> Petalinux 2018.2. For various reasons I’m stuck with the ATF version 
>>>>>> that comes with Petalinux 
>>>>>> (https://github.com/Xilinx/arm-trusted-firmware/tree/xilinx-v2018.2), 
>>>>>> which is a bit dated. To get SDEI to work I had to backport a number of 
>>>>>> patches from later releases. I am quite confident that SDEI and EHF 
>>>>>> handling are now up-to-date with the master branch from Arm’s ATF 
>>>>>> repository, but there remains a chance that I missed something and the 
>>>>>> issues above result from something in ATF.
>>>>> 
>>>>> OK, obviously that different ATF is another critical point, also in the 
>>>>> light of SDEI_EVENT_ENABLE failing.
>>>> Sure. If you or others haven’t ever seen the above behaviour then the 
>>>> issue is most likely on this side and I have to do another comparison of 
>>>> my ATF sources to upstream.
>>> 
>>> Theoretically, it might also be a hidden issue in the ATF patch itself, 
>>> just exposed by your different setup.
>>> 
>>>>> Can't you get your board running with the upstream ATF version, just for 
>>>>> the Jailhouse test? Then we would know better where to dig.
>>>> That was my first approach, but I didn’t get very far. With upstream ATF 
>>>> from Arm my (Xilinx enhanced) kernel doesn’t boot. Exchanging the kernel 
>>>> opens another can of worms, but I’ll see what I can do.
>>>> I managed to boot with ATF from master in the Xilinx repository. I also 
>>>> had to update the PMU Firmware to make this work. The resulting system was 
>>>> acting strange in a number of ways. Jailhouse showed the same occasional 
>>>> hangs during initial CPU shutdown, but given the overall unstable system I 
>>>> abandoned any further investigations and resorted to patching the working 
>>>> ATF.
>>> 
>>> OK, sounds frightening, indeed. The overall degree of adjustments you have 
>>> to apply to even get booting systems is, well, demotivating with that 
>>> platform.
>>> 
>>> Anyway, pick the most reproducible effect, probably that failing 
>>> EVENT_ENABLE, and try to debug that in depth in the hope to find a single 
>>> magic root cause. Nasty things come with multi-cause problems, though, and 
>>> I've seen too many already.
>> True, unfortunately.
>> Some update on the EVENT_ENABLE: I’ve enabled logging in the SDEI part of 
>> ATF. A successful init looks like this:
>>> Initializing Jailhouse hypervisor v0.12 (105-g5352a61b-dirty) on CPU 3
>>> Code location: 0x0000ffffc0200800
>>> Page pool usage after early setup: mem 39/992, remap 0/131072
>>> Initializing processors:
>>>  CPU 3... INFO:    SDEI: > VER
>>> INFO:    SDEI: < VER:1000000000000
>>> INFO:    SDEI: > REG(n:0 e:ffffc0206784 a:ff0000010000 f:0 m:0)
>>> INFO:    SDEI:  event state 0x0 => 0x1
>>> INFO:    SDEI: < REG:0
>>> INFO:    SDEI: > ENABLE(n:0)
>>> INFO:    SDEI:  event state 0x1 => 0x3
>>> INFO:    SDEI: < ENABLE:0
>>> INFO:    SDEI: > UNMASK:80000003
>>> INFO:    SDEI: < UNMASK:0
>>> OK
>>>  CPU 1... INFO:    SDEI: > VER
>>> INFO:    SDEI: < VER:1000000000000
>>> INFO:    SDEI: > REG(n:0 e:ffffc0206784 a:ff0000010000 f:0 m:0)
>>> INFO:    SDEI:  event state 0x0 => 0x1
>>> INFO:    SDEI: < REG:0
>>> INFO:    SDEI: > ENABLE(n:0)
>>> INFO:    SDEI:  event state 0x1 => 0x3
>>> INFO:    SDEI: < ENABLE:0
>>> INFO:    SDEI: > UNMASK:80000001
>>> INFO:    SDEI: < UNMASK:0
>>> OK
>>>  CPU 0... INFO:    SDEI: > VER
>>> INFO:    SDEI: < VER:1000000000000
>>> INFO:    SDEI: > REG(n:0 e:ffffc0206784 a:ff0000010000 f:0 m:0)
>>> INFO:    SDEI:  event state 0x0 => 0x1
>>> INFO:    SDEI: < REG:0
>>> INFO:    SDEI: > ENABLE(n:0)
>>> INFO:    SDEI:  event state 0x1 => 0x3
>>> INFO:    SDEI: < ENABLE:0
>>> INFO:    SDEI: > UNMASK:80000000
>>> INFO:    SDEI: < UNMASK:0
>>> OK
>>>  CPU 2... INFO:    SDEI: > VER
>>> INFO:    SDEI: < VER:1000000000000
>>> INFO:    SDEI: > REG(n:0 e:ffffc0206784 a:ff0000010000 f:0 m:0)
>>> INFO:    SDEI:  event state 0x0 => 0x1
>>> INFO:    SDEI: < REG:0
>>> INFO:    SDEI: > ENABLE(n:0)
>>> INFO:    SDEI:  event state 0x1 => 0x3
>>> INFO:    SDEI: < ENABLE:0
>>> INFO:    SDEI: > UNMASK:80000002
>>> INFO:    SDEI: < UNMASK:0
>>> OK
>>> Initializing unit: irqchip
>>> Using SDEI-based management interrupt
>> In case of an error in setup.c, I get this:
>>> Initializing Jailhouse hypervisor v0.12 (105-g5352a61b-dirty) on CPU 0
>>> Code location: 0x0000ffffc0200800
>>> Page pool usage after early setup: mem 39/992, remap 0/131072
>>> Initializing processors:
>>>  CPU 0... INFO:    SDEI: > VER
>>> INFO:    SDEI: < VER:1000000000000
>>> INFO:    SDEI: > REG(n:0 e:ffffc0206784 a:ff0000010000 f:0 m:0)
>>> INFO:    SDEI:  event state 0x0 => 0x1
>>> INFO:    SDEI: < REG:0
>>> INFO:    SDEI: > ENABLE(n:536872211)
>>> INFO:    SDEI: < ENABLE:-2
>>> /home/oliver/hil_sw/petalinux/rtbox2/build/tmp/work/plnx_zynqmp-xilinx-linux/jailhouse/0.12-gitAUTOINC+5352a61ba5-r0/git/hypervisor/arch/arm64/setup.c:74:
>>>  returning error err
>>> FAILED
>>> 10ININFNNFFFOOOO:: : :             SDS SDSDEEDIEE:I:I I>:   >:MA>   > 
>>> SMMMKAAA:SSSK8KK:80:0:080008000000000000300
>>> N00I02
>>> O0FI
>>> F NO
>>>  S    :F SOI D NF:E  IS O:D: E   I< : S  D S<EM IAD: MSEA<IKSK ::M 0:A<0
>>> K
>>> SO:INM0NAFF
>>> O KOI:::N 0  F
>>> 0nG) I  NS SFDDO: E E IISD :E:    I>>:S  DUN EU>INR: E URG>E( 
>>> NGnUR:(NEnRG0:E()
>>> (:nNF
>>>      :00OI): )
>>>   eE FIOSN:DFEO I: :     S< D  UESNDIER:E IG:< : U- N3evR
>>>   nGt:- s3t
>>> ate 0x1 => 0x0
>>> 3INNFFOO::        SSDDEEII::  <<  UUNNRREEGG::-0
>>>  JAILHOUSE_ENABLE: No such file or directory
>>> JAILHOUSE_CELL_CREATE: Invalid argument
>> The interesting part is this:
>>> INFO:    SDEI: > ENABLE(n:536872211)
>> The argument seen by ATF for the ENABLE call is 536872211, but in setup.c 
>> it’s hardcoded to 0. The problem doesn’t seem to be with the ATF version, 
>> but with the SMC calling per se, which is very scary.
>> The garbled output also made me wonder whether there is a concurrency 
>> issue in ATF.
> 
> Something is interrupting this synchronous call, mangling registers and then 
> returning as if nothing happened - I would say…


Found it. The mangling of the registers is caused by the SMC calls themselves. 

Looking at the disassembly of setup.c I noticed that x1 was not reset to zero 
between the call to smc_arg5 (event_register) and smc_arg1 (event_handle). I 
believe this is because of the combination of ‘inline’ declaration and register 
arguments for the smc calls, which makes the compiler assume that the input 
registers are constant and maintain their values during the smc call. To fix 
this, the relevant registers (x0 to x3, if used) must also be declared as 
output arguments in the asm instruction in smc.h. With this change, the SMC 
calls always succeed on my system. I’ll send a patch tomorrow.
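
To illustrate the idea, here is a minimal sketch of the kind of change I mean. It is not the actual patch and not the real contents of smc.h. The names smc_arg1_sketch, SMC_INSN and SMC_SKETCH_REAL are made up for this example, and the SDEI_EVENT_ENABLE function ID is quoted from memory, so please check it against the SDEI spec. So that it can be compiled and run outside EL2, the arm64 path uses two mov instructions as a stand-in for firmware that returns 0 in x0 and scribbles on x1, and other architectures get a plain C stub.

```c
#include <stdint.h>

/* Assumed SMC64 function ID of SDEI_EVENT_ENABLE; verify against the spec. */
#define SDEI_EVENT_ENABLE_ID	0xC4000022UL

#if defined(__aarch64__) && defined(SMC_SKETCH_REAL)
/* Real conduit; only usable from a privileged exception level. */
#define SMC_INSN "smc #0"
#elif defined(__aarch64__)
/* User-space stand-in: "firmware" returns 0 in x0 and clobbers x1. */
#define SMC_INSN "mov x0, #0\n\tmov x1, #0x333"
#endif

#ifdef SMC_INSN
static inline long smc_arg1_sketch(unsigned long id, unsigned long par1)
{
	register unsigned long __id asm("x0") = id;
	register unsigned long __par1 asm("x1") = par1;

	/*
	 * "+r" declares x0 and x1 as both input and output. With x1 listed
	 * only as an input ("r"), the compiler may assume it still holds
	 * par1 after the call and reuse it for the next inlined SMC.
	 * x2 and x3 are unused here, so they go into the clobber list.
	 */
	asm volatile (SMC_INSN
		: "+r" (__id), "+r" (__par1)
		:
		: "memory", "x2", "x3");

	return __id;
}
#else
/* Hypothetical stub so the sketch also builds on non-arm64 hosts. */
static inline long smc_arg1_sketch(unsigned long id, unsigned long par1)
{
	(void)id;
	(void)par1;
	return 0;
}
#endif
```

The same pattern would apply to the variants taking more arguments: every register that carries an argument becomes a "+r" operand, and the remaining ones of x0 to x3 stay in the clobber list.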

The bad news is that it doesn’t solve the other problem where the cores hang on 
initial shutdown.

Oliver
