On 27 May 2024 10:58:54 UTC, Peter Maydell <peter.mayd...@linaro.org> wrote:
>On Mon, 27 May 2024 at 03:36, Richard Henderson
><richard.hender...@linaro.org> wrote:
>>
>> On 5/25/24 13:50, Bernhard Beschow wrote:
>> >
>> >
> >> > On 25 May 2024 13:41:54 UTC, Bernhard Beschow <shen...@gmail.com> wrote:
>> >>
>> >>
> >> >> On 5 March 2024 13:52:34 UTC, Peter Maydell
> >> >> <peter.mayd...@linaro.org> wrote:
>> >>> From: Richard Henderson <richard.hender...@linaro.org>
>> >>>
>> >>> If translation is disabled, the default memory type is Device, which
>> >>> requires alignment checking.  This is more optimally done early via
>> >>> the MemOp given to the TCG memory operation.
>> >>>
>> >>> Reviewed-by: Philippe Mathieu-Daudé <phi...@linaro.org>
>> >>> Reported-by: Idan Horowitz <idan.horow...@gmail.com>
>> >>> Signed-off-by: Richard Henderson <richard.hender...@linaro.org>
>> >>> Message-id: 20240301204110.656742-6-richard.hender...@linaro.org
>> >>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1204
>> >>> Signed-off-by: Richard Henderson <richard.hender...@linaro.org>
>> >>> Signed-off-by: Peter Maydell <peter.mayd...@linaro.org>
>> >>
>> >> Hi,
>> >>
>> >> This change causes an old 4.14.40 Linux kernel to panic on boot using the 
>> >> sabrelite machine:
>> >>
>> >> [snip]
> >> >> Alignment trap: init (1) PC=0x76f1e3d4 Instr=0x14913004 Address=0x76f34f3e FSR 0x001
> >> >> Alignment trap: init (1) PC=0x76f1e3d8 Instr=0x148c3004 Address=0x7e8492bd FSR 0x801
> >> >> Alignment trap: init (1) PC=0x76f0dab0 Instr=0x6823 Address=0x7e849fbb FSR 0x001
> >> >> Alignment trap: init (1) PC=0x76f0dab2 Instr=0x6864 Address=0x7e849fbf FSR 0x001
> >> >> scsi 0:0:0:0: Direct-Access     QEMU     QEMU HARDDISK    2.5+ PQ: 0 ANSI: 5
> >> >> fsl-asoc-card sound: ASoC: CODEC DAI sgtl5000 not registered
> >> >> imx-sgtl5000 sound: ASoC: CODEC DAI sgtl5000 not registered
> >> >> imx-sgtl5000 sound: snd_soc_register_card failed (-517)
> >> >> Alignment trap: init (1) PC=0x76eac95a Instr=0xf8dd5015 Address=0x7e849b05 FSR 0x001
> >> >> Alignment trap: not handling instruction f8dd5015 at [<76eac95a>]
> >> >> Unhandled fault: alignment exception (0x001) at 0x7e849b05
> >> >> pgd = 9c59c000
> >> >> [7e849b05] *pgd=2c552831, *pte=109eb34f, *ppte=109eb83f
> >> >> Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
> >> >>
> >> >> ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
>> >>
> >> >> As you can see, some alignment exceptions are handled by the kernel, but
> >> >> the last one isn't. I added some additional printk()s and traced it down
> >> >> to this location in the kernel:
> >> >> <https://github.com/torvalds/linux/blob/v4.14/arch/arm/mm/alignment.c#L762>
> >> >> which claims that ARMv6+ CPUs can handle unaligned accesses of up to word
> >> >> size in hardware, so no fixup is needed.
>> >>
> >> >> I hope this is enough information to develop a fix. Let me know if you
> >> >> need anything else.
>> >
>> > I'm performing a direct kernel boot. On real hardware, a bootloader is 
>> > involved which probably enables unaligned access. This may explain why it 
>> > works there but not in QEMU any longer.
>> >
>> > To fix direct kernel boot, it seems as if the "built-in bootloader" would 
>> > need to be adapted/extended [1]. Any ideas?
>>
>> I strongly suspect a kernel bug.  Either mmu disabled or attempting 
>> unaligned access on
>> pages mapped as Device instead of Normal.
>
>The MMU surely must be enabled by this point in guest boot.
>This change doesn't affect whether we do alignment checks based
>on SCTLR.A being set, so it's not a simple "the bootloader was
>supposed to clear that and it didn't" (besides, A=0 means no
>checks, so that's the default anyway). So the failure is kind
>of weird.

I think the kernel's output indicates that the MMU is active:

  [7e849b05] *pgd=2c552831, *pte=109eb34f, *ppte=109eb83f

AFAIU, the value in brackets is a virtual address, while the page-table
entries contain physical addresses. Furthermore, the `info mtree` monitor
command shows that these physical addresses belong to RAM:

  0000000010000000-000000002fffffff (prio 0, ram): sabrelite.ram

So I think we can conclude that this is "Normal" memory, in ARM terms.
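
To double-check the address reasoning, here is a minimal C sketch that applies the ARMv7 short-descriptor small-page format to the *ppte value from the log. It assumes that *ppte is the hardware (not the Linux) page-table entry and that the system does not use LPAE. Under those assumptions, bit 1 marks a small page and bits [31:12] hold the physical page base. The RAM bounds are the ones `info mtree` printed above, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: translate a virtual address through an ARMv7
 * short-descriptor (non-LPAE) second-level small-page entry. */
static bool small_page_pa(uint32_t hw_pte, uint32_t va, uint32_t *pa)
{
    /* Bits [1:0]: 0b00 = fault, 0b01 = large page, 0b1x = small page. */
    if ((hw_pte & 0x2u) == 0)
        return false;
    /* Physical page base from bits [31:12], page offset from the VA. */
    *pa = (hw_pte & 0xFFFFF000u) | (va & 0xFFFu);
    return true;
}

/* sabrelite.ram as reported by "info mtree": 0x10000000-0x2fffffff. */
static bool in_sabrelite_ram(uint32_t pa)
{
    return pa >= 0x10000000u && pa <= 0x2FFFFFFFu;
}
```

With *ppte=0x109eb83f and the faulting address 0x7e849b05, this yields physical address 0x109ebb05, which lies inside sabrelite.ram. The memory type itself is determined by the descriptor's attribute bits (and, with TEX remap enabled, by PRRR/NMRR), which this sketch does not decode.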

Regarding the Linux kernel, it seems to me that it expects unaligned accesses
(up to word size) to be handled by the hardware. On ARMv7 it can assume this,
because the SCTLR.U bit is always 1 there [1]. The kernel then seems to deal
only with the cases that the hardware can't handle. In the case above, the
unhandled instruction is (output from the execlog plugin):

  0, 0x76ecc95a, 0x5015f8dd, "ldr.w r5, [sp, #0x15]"

Note that the plugin prints the two Thumb-2 halfwords swapped; in instruction
order the encoding is 0xf8dd5015, matching the Instr= value in the kernel log.
The kernel's fixup code doesn't handle this encoding, presumably because it
expects the hardware to do so, hence the "not handling instruction f8dd5015"
message.
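
For reference, here is a small C sketch of how the logged word maps to the disassembly. It swaps the halfwords back into architectural order and then decodes the fields of LDR (immediate), encoding T3 (1111 1000 1101 Rn | Rt imm12). It assumes that execlog reads the instruction bytes as one little-endian 32-bit word. The function and struct names are hypothetical, and only this single encoding is handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Thumb-2 stores hw1 then hw2; reading them as one little-endian word puts
 * hw1 in the low 16 bits, so swap the halfwords to get hw1:hw2. */
static uint32_t thumb2_swap_halfwords(uint32_t logged)
{
    return (logged << 16) | (logged >> 16);
}

struct ldr_imm_t3 {
    unsigned rt;    /* destination register */
    unsigned rn;    /* base register (13 == sp) */
    uint32_t imm12; /* unsigned byte offset, offset addressing */
};

/* Decode LDR (immediate) encoding T3. Rn == 15 would be LDR (literal),
 * so that case is rejected here. */
static bool decode_ldr_imm_t3(uint32_t insn, struct ldr_imm_t3 *out)
{
    uint32_t hw1 = insn >> 16;
    uint32_t hw2 = insn & 0xFFFFu;

    if ((hw1 & 0xFFF0u) != 0xF8D0u || (hw1 & 0xFu) == 0xFu)
        return false;
    out->rn = hw1 & 0xFu;
    out->rt = hw2 >> 12;
    out->imm12 = hw2 & 0xFFFu;
    return true;
}
```

Decoding 0xf8dd5015 gives Rn = 13 (sp), Rt = 5 and imm12 = 0x15. Because T3 uses offset addressing, the faulting address 0x7e849b05 implies sp = 0x7e849af0. That value is word-aligned, so the 0x15 offset alone makes this word load unaligned.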

I have the impression that real hardware only traps when the hardware can't 
handle the unaligned access, and only when the SCTLR.A bit is set.
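
To make this concrete, here is a simplified C sketch of the ARMv7 alignment rules for Normal memory with SCTLR.U = 1. It reflects my reading of the alignment-requirements table in the ARMv7-A/R Architecture Reference Manual, not QEMU's implementation. It covers only three instruction classes and does not model Device memory. According to that table, some classes (LDM/STM, LDRD/STRD, the exclusives) fault on misalignment even when SCTLR.A = 0. The enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified subset of the ARMv7 alignment table (Normal memory, U == 1). */
enum access_class {
    ACC_BYTE,           /* LDRB, STRB, ...: no alignment check */
    ACC_UNALIGNED_OK,   /* LDR, STR, LDRH, STRH, ...: unaligned allowed if A == 0 */
    ACC_ALWAYS_CHECKED, /* LDM, STM, LDRD, STRD, LDREX, ...: always checked */
};

/* align: required alignment in bytes (a power of two), e.g. 4 for LDR. */
static bool alignment_fault(enum access_class cls, uint32_t align,
                            uint32_t addr, bool sctlr_a)
{
    bool misaligned = (addr & (align - 1u)) != 0;

    switch (cls) {
    case ACC_BYTE:
        return false;
    case ACC_UNALIGNED_OK:
        return sctlr_a && misaligned;
    case ACC_ALWAYS_CHECKED:
        return misaligned;
    }
    return true; /* not reached for valid enum values */
}
```

Under this model, the `ldr.w r5, [sp, #0x15]` above (a word load from Normal memory with SCTLR.A = 0) should not fault. That would be consistent with the kernel leaving it to the hardware.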

I'm not an ARM expert, so take this with a grain of salt.

Best regards,
Bernhard

[1] https://developer.arm.com/documentation/ddi0406/cb/Appendixes/ARMv6-Differences/Application-level-memory-support/Alignment


>
>-- PMM
