Hello Christian,
On 2026-01-13 16:16, Christian Bruel via Devel wrote:
I would like to share an issue found with the port of seL4 and VMs to
the STM32MP2 SoC,
which features Armv8-A dual Cortex-A35 cores:
When activating the secondary core, The RET instruction crashes while
executing the
arm_enable_mmu(). The exception stack and ESR register show no useful
information.
Thanks for sharing this. We have a similar issue in 32-bit mode on the
zynqmp,
but there it only fails sometimes. Does it consistently fail for you or
just
sometimes? I suspect the latter, as Cortex-A35 is mostly in-order (with
some
out-of-order memory accesses).
We didn't run into any of these issues with the tqma board, which also
has a
Cortex-A35, but is an imx8qxp SoC.
Simply replacing the RET instruction with BR X30 resolves the issue.
Interesting.
In “Requirements for branch predictor maintenance” [2], branch
prediction
maintenance operations must be performed after enabling or disabling
the MMU.
However, these operations translate as a NOP if the branch predictor
hardware
is not architecturally visible.
That's a link to an ARMv7 document. The 32-bit ARM Elfoader code does a
BPIALL
before enabling the MMU, but the 64-bit one doesn't.
However, other than having unnecessarily wrong branch predictions, I
don't see
how clearing branch predictor state would be architecturally required.
It seems
more likely that the correct branch prediction causes issues that don't
show up
when mispredicting the branch. Do you have an aarch64 reference?
Therefore, the ‘BL/RET’ behavior from different translation regime
seems to
to be implementation-dependent or dependent on optional architectural
features
That seems unlikely: The instructions are well-defined.
I suspect we are seeing the effect of a missing synchronisation or other
subtle
problem, not the cause of the problem here.
is enough to start seL4 with SMP=on NUM_NODES=2 and run a dual core
Linux on the CAmkES VM.
Do you see the same problem when booting with one core, with unmodified
code?
It would help to limit this to unicore to rule out any SMP problems.
The hyp and non-hyp boot code is noticeably different, so it is a bit
surprising
that both fail the same way.
This is semantically equivalent to the BL X30 workaround, but this is
more
explicit regarding the MMU translation regime around BL/RET.
But like the BR work-around, it's a work-around and doesn't fix the
source
of the problem.
Does adding BPIALL make it boot too? I guess it will because it has the
same
effect.
It seems strange that any of this makes a difference though, considering
that there is both a DSB as well as an ISB before the RET, at least in
src/arch-arm/armv/armv8-a/64/mmu-hyp.S:
enable_mmu sctlr_el2, x8
ic ialluis
dsb ish
isb
tlbi alle2is
dsb ish
isb
ldp x29, x30, [sp], #16
ret
END_FUNC(arm_enable_hyp_mmu)
So to double check, do your elfloader files match these:
https://github.com/seL4/seL4_tools/blob/master/elfloader-tool/src/arch-arm/armv/armv8-a/64/mmu.S
https://github.com/seL4/seL4_tools/blob/master/elfloader-tool/src/arch-arm/armv/armv8-a/64/mmu-hyp.S
Or are you running some old version?
Greetings,
Indan
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]