On 15.09.20 16:09, Oliver Schwartz wrote:


On 15 Sep 2020, at 14:02, Jan Kiszka <jan.kis...@siemens.com> wrote:

On 15.09.20 13:49, Oliver Schwartz wrote:
On 15 Sep 2020, at 11:00, Jan Kiszka <jan.kis...@siemens.com> wrote:

On 15.09.20 09:07, Oliver Schwartz wrote:
I’m currently trying out the arm64-zero-exits branch and got stuck.
System is a Xilinx ZU9EG on a custom board, similar to zcu102. I’ve brought ATF 
up to date and patched it with Jan's patch to enable SDEI. If I don’t enable 
SDEI in ATF everything works as expected (with VM exits for interrupts, of 
course). Jailhouse source is the tip of branch arm64-zero-exits.
If I enable SDEI in ATF, jailhouse works most of the time, except for when it 
doesn’t. Sometimes, ‘jailhouse enable’ results in:
Initializing processors:
  CPU 1... OK
  CPU 0... 
/home/oliver/0.12-gitAUTOINC+98061469d0-r0/git/hypervisor/arch/arm64/setup.c:73:
 returning error -EIO

Weird - that's the SDEI event enable call.
Yes, that’s a bit scary. The code involved in ATF is limited - I’m pretty sure 
that I’m up-to-date with upstream there.
FAILED
JAILHOUSE_ENABLE: Input/output error
I’ve seen this error only when I enable jailhouse through some init script 
during the boot process, when the system is also busy otherwise. When starting 
jailhouse on an idle system I haven’t seen this.

Possibly a regression from my recent refactoring, which I haven't managed to 
test yet. Could you check whether

https://github.com/siemens/jailhouse/commits/e0ef829c85895dc6387d5ea11b08aa65a456255f

is any better?
No, I don’t see any difference with that version.

Good and bad news at the same time, unfortunately ruling out a quick solution.


Sometimes it may hang later during ‘jailhouse enable’:
Initializing processors:
  CPU 1... OK
  CPU 0... OK
  CPU 2... OK
  CPU 3... OK
Initializing unit: irqchip
Using SDEI-based management interrupt
Initializing unit: ARM SMMU v3
Initializing unit: PVU IOMMU
Initializing unit: PCI
Adding virtual PCI device 00:00.0 to cell "root"
Page pool usage after late setup: mem 67/992, remap 5/131072
Activating hypervisor
[    5.847540] The Jailhouse is opening.
Using a JTAG debugger I see that one or more cores are stuck in 
hypervisor/arch/arm-common/psci.c, line 105.
It may also succeed in stopping one or more CPUs and then hang (again with one 
or more cores stuck in psci.c, line 105):
[    5.810220] The Jailhouse is opening.
[    5.860054] CPU1: shutdown
[    5.862677] psci: CPU1 killed.
Has anyone else observed this? Any ideas on what may cause this? My gut feeling 
is that there’s a race condition somewhere, but it feels like looking for a 
needle in a haystack.
Finally, if ‘jailhouse enable’ succeeds, a subsequent ‘jailhouse disable’ 
followed by another ‘jailhouse enable’ always hangs the system (cores stuck in 
psci.c).
Otherwise, once ‘jailhouse enable’ succeeds the system is working fine and I 
can start, stop, load and unload my guest inmate at will.
To make matters a bit more complicated: My system is based on Xilinx Petalinux 
2018.2. For various reasons I’m stuck with the ATF version that comes with 
Petalinux (https://github.com/Xilinx/arm-trusted-firmware/tree/xilinx-v2018.2), 
which is a bit dated. To get SDEI to work I had to backport a number of patches 
from later releases. I am quite confident that SDEI and EHF handling are now 
up-to-date with the master branch from Arm's ATF repository, but there remains a 
chance that I missed something and the issues above result from something in 
ATF.

OK, obviously that different ATF is another critical point, also in the light 
of SDEI_EVENT_ENABLE failing.
Sure. If you or others haven’t ever seen the above behaviour then the issue is 
most likely on this side and I have to do another comparison of my ATF sources 
to upstream.

Theoretically, it might also be a hidden issue in the ATF patch itself, just 
exposed by your different setup.

Can't you get your board running with the upstream ATF version, just for the 
Jailhouse test? Then we would know better where to dig.
That was my first approach, but I didn’t get very far. With upstream ATF from 
Arm my (Xilinx enhanced) kernel doesn’t boot. Exchanging the kernel opens 
another can of worms, but I’ll see what I can do.
I managed to boot with ATF from master in the Xilinx repository. I also had to 
update the PMU Firmware to make this work. The resulting system was acting 
strange in a number of ways. Jailhouse showed the same occasional hangs during 
initial CPU shutdown, but given the overall unstable system I abandoned any 
further investigations and resorted to patching the working ATF.

OK, sounds frightening, indeed. The overall degree of adjustments you have to 
apply to even get booting systems is, well, demotivating with that platform.

Anyway, pick the most reproducible effect, probably that failing EVENT_ENABLE, 
and debug that in depth in the hope of finding a single magic root cause. 
Nasty things come with multi-cause problems, though, and I've seen too many 
already.

True, unfortunately.

Some update on the EVENT_ENABLE: I’ve enabled logging in the SDEI part of ATF. 
A successful init looks like this:

Initializing Jailhouse hypervisor v0.12 (105-g5352a61b-dirty) on CPU 3
Code location: 0x0000ffffc0200800
Page pool usage after early setup: mem 39/992, remap 0/131072
Initializing processors:
  CPU 3... INFO:    SDEI: > VER
INFO:    SDEI: < VER:1000000000000
INFO:    SDEI: > REG(n:0 e:ffffc0206784 a:ff0000010000 f:0 m:0)
INFO:    SDEI:  event state 0x0 => 0x1
INFO:    SDEI: < REG:0
INFO:    SDEI: > ENABLE(n:0)
INFO:    SDEI:  event state 0x1 => 0x3
INFO:    SDEI: < ENABLE:0
INFO:    SDEI: > UNMASK:80000003
INFO:    SDEI: < UNMASK:0
OK
  CPU 1... INFO:    SDEI: > VER
INFO:    SDEI: < VER:1000000000000
INFO:    SDEI: > REG(n:0 e:ffffc0206784 a:ff0000010000 f:0 m:0)
INFO:    SDEI:  event state 0x0 => 0x1
INFO:    SDEI: < REG:0
INFO:    SDEI: > ENABLE(n:0)
INFO:    SDEI:  event state 0x1 => 0x3
INFO:    SDEI: < ENABLE:0
INFO:    SDEI: > UNMASK:80000001
INFO:    SDEI: < UNMASK:0
OK
  CPU 0... INFO:    SDEI: > VER
INFO:    SDEI: < VER:1000000000000
INFO:    SDEI: > REG(n:0 e:ffffc0206784 a:ff0000010000 f:0 m:0)
INFO:    SDEI:  event state 0x0 => 0x1
INFO:    SDEI: < REG:0
INFO:    SDEI: > ENABLE(n:0)
INFO:    SDEI:  event state 0x1 => 0x3
INFO:    SDEI: < ENABLE:0
INFO:    SDEI: > UNMASK:80000000
INFO:    SDEI: < UNMASK:0
OK
  CPU 2... INFO:    SDEI: > VER
INFO:    SDEI: < VER:1000000000000
INFO:    SDEI: > REG(n:0 e:ffffc0206784 a:ff0000010000 f:0 m:0)
INFO:    SDEI:  event state 0x0 => 0x1
INFO:    SDEI: < REG:0
INFO:    SDEI: > ENABLE(n:0)
INFO:    SDEI:  event state 0x1 => 0x3
INFO:    SDEI: < ENABLE:0
INFO:    SDEI: > UNMASK:80000002
INFO:    SDEI: < UNMASK:0
OK
Initializing unit: irqchip
Using SDEI-based management interrupt
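
As a reading aid for the VER lines in the trace above, here is a minimal sketch that splits the returned value into its fields. It assumes the trace prints the value in hexadecimal and that the layout follows the SDEI specification's SDEI_VERSION return value (major in bits 62:48, minor in bits 47:32, vendor-defined in bits 31:0). The function name is made up for illustration.

```python
def decode_sdei_version(value):
    """Split an SDEI_VERSION return value into (major, minor, vendor).

    Assumed layout: bits 62:48 major, bits 47:32 minor, bits 31:0 vendor.
    """
    major = (value >> 48) & 0x7FFF
    minor = (value >> 32) & 0xFFFF
    vendor = value & 0xFFFFFFFF
    return major, minor, vendor


# Assumption: the trace value has no 0x prefix but is hexadecimal.
print(decode_sdei_version(int("1000000000000", 16)))  # (1, 0, 0)
```

Under these assumptions, all four CPUs report SDEI version 1.0 with a zero vendor field.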

In case of an error in setup.c, I get this:

Initializing Jailhouse hypervisor v0.12 (105-g5352a61b-dirty) on CPU 0
Code location: 0x0000ffffc0200800
Page pool usage after early setup: mem 39/992, remap 0/131072
Initializing processors:
  CPU 0... INFO:    SDEI: > VER
INFO:    SDEI: < VER:1000000000000
INFO:    SDEI: > REG(n:0 e:ffffc0206784 a:ff0000010000 f:0 m:0)
INFO:    SDEI:  event state 0x0 => 0x1
INFO:    SDEI: < REG:0
INFO:    SDEI: > ENABLE(n:536872211)
INFO:    SDEI: < ENABLE:-2
/home/oliver/hil_sw/petalinux/rtbox2/build/tmp/work/plnx_zynqmp-xilinx-linux/jailhouse/0.12-gitAUTOINC+5352a61ba5-r0/git/hypervisor/arch/arm64/setup.c:74:
 returning error err
FAILED
10ININFNNFFFOOOO:: : :             SDS SDSDEEDIEE:I:I I>:   >:MA>   > 
SMMMKAAA:SSSK8KK:80:0:080008000000000000300
N00I02
O0FI
F NO
  S    :F SOI D NF:E  IS O:D: E   I< : S  D S<EM IAD: MSEA<IKSK ::M 0:A<0
K
SO:INM0NAFF
O KOI:::N 0  F
0nG) I  NS SFDDO: E E IISD :E:    I>>:S  DUN EU>INR: E URG>E( NGnUR:(NEnRG0:E()
(:nNF
      :00OI): )
eE FIOSN:DFEO I: : S< D UESNDIER:E IG:< : U- N3evR
   nGt:- s3t
ate 0x1 => 0x0
3INNFFOO::        SSDDEEII::  <<  UUNNRREEGG::-0
JAILHOUSE_ENABLE: No such file or directory
JAILHOUSE_CELL_CREATE: Invalid argument

The interesting part is this:

INFO:    SDEI: > ENABLE(n:536872211)

The argument seen by ATF for the ENABLE call is 536872211, but in setup.c it’s 
hardcoded to 0. The problem doesn’t seem to be with the ATF version, but with 
the SMC calling per se, which is very scary.
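
To make the mismatch easier to spot in longer traces, here is a small sketch of a checker for the ATF log lines quoted above. It flags an ENABLE call whose event number differs from the one just registered, and any non-zero return value. The status names are taken from the SDEI specification. The helper name and the regular expressions are assumptions based only on the log format shown in this mail.

```python
import re

# Status codes as defined by the SDEI specification.
SDEI_STATUS = {
    0: "SDEI_SUCCESS",
    -1: "SDEI_NOT_SUPPORTED",
    -2: "SDEI_INVALID_PARAMETERS",
    -3: "SDEI_DENIED",
    -5: "SDEI_PENDING",
    -10: "SDEI_OUT_OF_RESOURCE",
}

# Patterns assumed from the trace format shown above.
CALL_RE = re.compile(r"SDEI: > (REG|ENABLE)\(n:(\d+)")
RET_RE = re.compile(r"SDEI: < (\w+):(-?\d+)")


def check_trace(lines):
    """Return a list of findings for suspicious lines in an SDEI trace."""
    findings = []
    registered = None
    for line in lines:
        call = CALL_RE.search(line)
        if call:
            name, event = call.group(1), int(call.group(2))
            if name == "REG":
                registered = event
            elif event != registered:
                findings.append(
                    f"ENABLE called for event {event} ({event:#x}), "
                    f"but event {registered} was registered"
                )
            continue
        ret = RET_RE.search(line)
        # VER returns a version number, not a status code, so skip it.
        if ret and ret.group(1) != "VER" and int(ret.group(2)) != 0:
            code = int(ret.group(2))
            status = SDEI_STATUS.get(code, "unknown status")
            findings.append(f"{ret.group(1)} returned {code} ({status})")
    return findings


failing_trace = [
    "  CPU 0... INFO:    SDEI: > VER",
    "INFO:    SDEI: < VER:1000000000000",
    "INFO:    SDEI: > REG(n:0 e:ffffc0206784 a:ff0000010000 f:0 m:0)",
    "INFO:    SDEI:  event state 0x0 => 0x1",
    "INFO:    SDEI: < REG:0",
    "INFO:    SDEI: > ENABLE(n:536872211)",
    "INFO:    SDEI: < ENABLE:-2",
]

for finding in check_trace(failing_trace):
    print(finding)
```

For the failing trace this reports the bogus event number as 0x20000513 and the -2 return as SDEI_INVALID_PARAMETERS. That fits ATF rejecting an event number it does not know, rather than the enable operation itself failing.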

The garbled output also made me wonder whether there might be a concurrency 
issue in ATF.

Something is interrupting this synchronous call, mangling registers and then returning as if nothing happened - I would say...

I have no idea, though, what could interrupt here. Also a bit weird is that this happens frequently in that enabling path... In Jailhouse, we have interrupts off. Maybe something changes at the point we leave for EL3? That would shift the interruption mostly to the entry path.

Wild guesses, still.

Jan

--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
