Quoting Ard Biesheuvel (2023-05-19 23:36:53)
> On Fri, 19 May 2023 at 18:32, Oliver Steffen <ostef...@redhat.com> wrote:
> >
> > Hi all,
> >
> > I had another look at this and I can now reproduce the issue consistently,
> > with a quite minimal setup, on recent Linux kernel, Qemu, and EDK2.
> > It requires rebooting the guest in a tight loop. It happens in silent and
> > verbose builds alike, but since the verbose ones are slowed down by the
> > serial output, it takes longer to hit the issue.
> > It is possible to reproduce it with the silent builds within a few minutes.
> > For the verbose case I recommend running multiple Qemu instances in
> > parallel (as many as the machine allows, in my case ~100).
> >
>
> Thanks a lot for all these details, this is extremely helpful.
>
> So what appears to be happening is that we split the 2M block mapping
> that covers the code that we were called from, and hit a level 2
> translation fault because the updated page table entry is still
> observed to be in its transient 'invalid' state as we return to it.
>
> Could you please check whether this makes a difference?
>
> --- a/ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibReplaceEntry.S
> +++ b/ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibReplaceEntry.S
> @@ -65,6 +65,7 @@
>    // write updated entry
>    str   x1, [x0]
>    dsb   nshst
> +  isb
>
>  .L2_\@:
>    .endm
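
As a cross-check of the diagnosis above, here is a minimal Python sketch (not part of the original thread) that decodes the ESR value from the exception dump quoted further down and checks that the faulting address lies in the 2M block that the last lines of the serial log show being updated. The constants are copied from that dump. The helper names (`decode_esr`, `fault_status`, `block_base`) are made up for illustration, and the field layout assumes the standard AArch64 ESR_ELx encoding with a 4 KiB translation granule.

```python
# Values copied from the exception dump quoted later in this message.
ESR = 0x86000006
FAR = 0x000000037FD3C0A8

# Range from the last UpdateRegionMappingRecursive(3) line before the crash.
SPLIT_START = 0x37FC00000
SPLIT_END = 0x37FD39000

# Assumption: 4 KiB granule, so one level 2 block entry maps 2 MiB.
BLOCK_SIZE_2M = 1 << 21


def decode_esr(esr):
    """Split an ESR_ELx value into its EC, IL and ISS fields."""
    ec = (esr >> 26) & 0x3F   # exception class, bits [31:26]
    il = (esr >> 25) & 0x1    # instruction length, bit [25]
    iss = esr & 0x1FFFFFF     # instruction specific syndrome, bits [24:0]
    return ec, il, iss


def fault_status(iss):
    """Classify the fault status code held in ISS bits [5:0]."""
    ifsc = iss & 0x3F
    if (ifsc >> 2) == 0b0001:  # 0b0001LL encodes a translation fault, level LL
        return "translation fault", ifsc & 0x3
    return "other", None


def block_base(addr):
    """Return the base address of the 2 MiB block containing addr."""
    return addr & ~(BLOCK_SIZE_2M - 1)


if __name__ == "__main__":
    ec, il, iss = decode_esr(ESR)
    kind, level = fault_status(iss)
    print(f"EC {ec:#x} IL {il:#x} ISS {iss:#010x}: {kind}, level {level}")

    base = block_base(FAR)
    print(f"2M block containing FAR: {base:#x} - {base + BLOCK_SIZE_2M:#x}")
    print("same block as the range being updated:", base == block_base(SPLIT_START))
    print("FAR outside the updated range:", FAR >= SPLIT_END)
```

With these inputs the decoded fields agree with the `ESR : EC 0x21 IL 0x1 ISS 0x00000006` line in the dump, and the faulting PC falls inside the 2M block starting at 0x37FC00000, which is consistent with the explanation that the code was executing from the block being split.
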

That fixes it - no crash observed within 150k iterations.

Thanks, Ard!

- Oliver

> >
> > Details:
> >
> > CPU: Cavium ThunderX2(R) CPU CN9975
> > Tested on 3 different machines:
> >   HPE apache, HPE apollo, Gigabyte R181
> > Kernels tested:
> >   - 6.2.15-100.fc36.aarch64
> >   - 5.14.0-312.el9.aarch64
> >     (contains 406504c7b0405d74d74c15a667cd4c4620c3e7a9,
> >     "KVM: arm64: Fix S1PTW handling on RO memslots")
> > Qemu v8.0.0 (RHEL version and build from upstream repo)
> > EDK2: master branch from 2023-05-16 (cafb4f3f)
> > gcc 11.3.1
> >
> > EDK2 build command line:
> >   build \
> >     -a AARCH64 \
> >     -p ArmVirtPkg/ArmVirtQemu.dsc \
> >     -t GCC5 -b DEBUG \
> >     -D NETWORK_IP6_ENABLE \
> >     -D NETWORK_HTTP_BOOT_ENABLE \
> >     -D NETWORK_TLS_ENABLE \
> >     -D NETWORK_ISCSI_ENABLE \
> >     -D NETWORK_ALLOW_HTTP_CONNECTIONS \
> >     -D CAVIUM_ERRATUM_27456=TRUE \
> >     -D TPM2_ENABLE=TRUE \
> >     -D TPM1_ENABLE=FALSE \
> >     -D DEBUG_PRINT_ERROR_LEVEL=0x80000000 \
> >     -D BUILD_SHELL=TRUE \
> >     --pcd="gEfiShellPkgTokenSpaceGuid.PcdShellDefaultDelay=0" \
> >     --pcd="gEfiMdePkgTokenSpaceGuid.PcdPlatformBootTimeOut=0" \
> >     --hash --cmd-len=65536
> >
> > To reproduce the issue I launched the firmware in Qemu and had it
> > reboot, via a startup.nsh on the ESP, once it finished booting up.
> >
> > Qemu command line:
> >   qemu-system-aarch64 \
> >     -machine virt,accel=kvm -m 13G \
> >     -boot menu=off \
> >     -cpu host \
> >     -blockdev node-name=code,driver=file,filename="${FW_CODE}",read-only=on \
> >     -blockdev node-name=vars,driver=file,filename="${FW_VARS}" \
> >     -machine pflash0=code \
> >     -machine pflash1=vars \
> >     -serial stdio \
> >     -net none \
> >     -drive file=esp.img,snapshot=on
> >
> > Other things like number of CPUs or the presence of a vTPM have no
> > influence. I did not try different amounts of RAM yet.
> >
> > Serial output:
> > [...]
> > InitializeDxeNxMemoryProtectionPolicy: StackBase = 0x00000000476C5000 StackSize = 0x0000000000020000
> > InitializeDxeNxMemoryProtectionPolicy: applying strict permissions to active memory regions
> > SetUefiImageMemoryAttributes - 0x0000000040000000 - 0x00000000076E5000 (0x0000000000004000)
> > UpdateRegionMappingRecursive(0): 40000000 - 476E5000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(1): 40000000 - 476E5000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(2): 40000000 - 476E5000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(3): 47600000 - 476E5000 set 60000000000400 clr FF9F000000000B3F
> > SetUefiImageMemoryAttributes - 0x00000000476C5000 - 0x0000000000001000 (0x0000000000006000)
> > UpdateRegionMappingRecursive(0): 476C5000 - 476C6000 set 60000000000000 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(1): 476C5000 - 476C6000 set 60000000000000 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(2): 476C5000 - 476C6000 set 60000000000000 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(3): 476C5000 - 476C6000 set 60000000000000 clr FF9F000000000B3F
> > SetUefiImageMemoryAttributes - 0x000000004772B000 - 0x00000000007C0000 (0x0000000000004000)
> > UpdateRegionMappingRecursive(0): 4772B000 - 47EEB000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(1): 4772B000 - 47EEB000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(2): 4772B000 - 47EEB000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(3): 4772B000 - 47800000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(3): 47E00000 - 47EEB000 set 60000000000400 clr FF9F000000000B3F
> > SetUefiImageMemoryAttributes - 0x0000000047EF3000 - 0x0000000000101000 (0x0000000000004000)
> > UpdateRegionMappingRecursive(0): 47EF3000 - 47FF4000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(1): 47EF3000 - 47FF4000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(2): 47EF3000 - 47FF4000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(3): 47EF3000 - 47FF4000 set 60000000000400 clr FF9F000000000B3F
> > SetUefiImageMemoryAttributes - 0x0000000047FFA000 - 0x0000000334AA6000 (0x0000000000004000)
> > UpdateRegionMappingRecursive(0): 47FFA000 - 37CAA0000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(1): 47FFA000 - 37CAA0000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(2): 47FFA000 - 80000000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(3): 47FFA000 - 48000000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(2): 340000000 - 380000000 set 70C clr 0
> > UpdateRegionMappingRecursive(3): 37F000000 - 37F200000 set 70C clr 0
> > UpdateRegionMappingRecursive(2): 340000000 - 37CAA0000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(3): 37CA00000 - 37CC00000 set 70C clr 0
> > UpdateRegionMappingRecursive(3): 37CA00000 - 37CAA0000 set 60000000000400 clr FF9F000000000B3F
> > SetUefiImageMemoryAttributes - 0x000000037CB40000 - 0x00000000031F9000 (0x0000000000004000)
> > UpdateRegionMappingRecursive(0): 37CB40000 - 37FD39000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(1): 37CB40000 - 37FD39000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(2): 37CB40000 - 37FD39000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(3): 37CB40000 - 37CC00000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(3): 37F000000 - 37F200000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(3): 37FC00000 - 37FE00000 set 70C clr 0
> > UpdateRegionMappingRecursive(3): 37FC00000 - 37FD39000 set 60000000000400 clr FF9F000000000B3F
> >
> >
> > Synchronous Exception at 0x000000037FD3C0A8
> > PC 0x00037FD3C0A8 (0x00037FD39000+0x000030A8) [ 0] ArmCpuDxe.dll
> > PC 0x00037FD3C0A8 (0x00037FD39000+0x000030A8) [ 0] ArmCpuDxe.dll
> > PC 0x00037FD3BE70 (0x00037FD39000+0x00002E70) [ 0] ArmCpuDxe.dll
> > PC 0x00037FD3BE70 (0x00037FD39000+0x00002E70) [ 0] ArmCpuDxe.dll
> > PC 0x00037FD3C2E4 (0x00037FD39000+0x000032E4) [ 0] ArmCpuDxe.dll
> > PC 0x0000476E78F8 (0x0000476E5000+0x000028F8) [ 1] DxeCore.dll
> > PC 0x0000476ED680 (0x0000476E5000+0x00008680) [ 1] DxeCore.dll
> > PC 0x0000476F2744 (0x0000476E5000+0x0000D744) [ 1] DxeCore.dll
> > PC 0x0000476ECDE8 (0x0000476E5000+0x00007DE8) [ 1] DxeCore.dll
> > PC 0x00037FD3D2DC (0x00037FD39000+0x000042DC) [ 2] ArmCpuDxe.dll
> > PC 0x0000476EC788 (0x0000476E5000+0x00007788) [ 3] DxeCore.dll
> > PC 0x0000476F9CA8 (0x0000476E5000+0x00014CA8) [ 3] DxeCore.dll
> > PC 0x0000476EFEF0 (0x0000476E5000+0x0000AEF0) [ 3] DxeCore.dll
> >
> > [ 0] /root/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmPkg/Drivers/CpuDxe/CpuDxe/DEBUG/ArmCpuDxe.dll
> > [ 1] /root/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll
> > [ 2] /root/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmPkg/Drivers/CpuDxe/CpuDxe/DEBUG/ArmCpuDxe.dll
> > [ 3] /root/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll
> >
> > X0 0x000000037F10BFF0 X1 0x000000037F106003 X2 0x000000000037FC00 X3 0x0000000000000000
> > X4 0x0000000000000200 X5 0x0000000000000004 X6 0x0000000000000000 X7 0x000000037FD3F4B5
> > X8 0x0000000000000000 X9 0x0000000000000002 X10 0x0000000000000000 X11 0x0000000000000000
> > X12 0x0000000000000002 X13 0x0000000000000002 X14 0x0000000000000001 X15 0x0000000000000002
> > X16 0x000000037FD3A268 X17 0x00000000007AFA10 X18 0x0000000000000000 X19 0x000000037FC00000
> > X20 0x0000000000000002 X21 0x000000037F106003 X22 0x000000037F10B000 X23 0x000000037FD42000
> > X24 0x00000000001FFFFF X25 0x000000037FD39000 X26 0x000000037F106000 X27 0x0000000000000003
> > X28 0x000000037F10BFF0 FP 0x00000000476E4780 LR 0x000000037FD3C0A8
> >
> > V0 0x0000000000000000 0000000000000000 V1 0x0000000000000000 0000000000000000
> > V2 0x0000000000000000 0000000000000000 V3 0x0000000000000000 0000000000000000
> > V4 0x0000000000000000 0000000000000000 V5 0x0000000000000000 0000000000000000
> > V6 0x0000000000000000 0000000000000000 V7 0x0000000000000000 0000000000000000
> > V8 0x0000000000000000 0000000000000000 V9 0x0000000000000000 0000000000000000
> > V10 0x0000000000000000 0000000000000000 V11 0x0000000000000000 0000000000000000
> > V12 0x0000000000000000 0000000000000000 V13 0x0000000000000000 0000000000000000
> > V14 0x0000000000000000 0000000000000000 V15 0x0000000000000000 0000000000000000
> > V16 0x0000000000000000 0000000000000000 V17 0x0000000000000000 0000000000000000
> > V18 0x0000000000000000 0000000000000000 V19 0x0000000000000000 0000000000000000
> > V20 0x0000000000000000 0000000000000000 V21 0x0000000000000000 0000000000000000
> > V22 0x0000000000000000 0000000000000000 V23 0x0000000000000000 0000000000000000
> > V24 0x0000000000000000 0000000000000000 V25 0x0000000000000000 0000000000000000
> > V26 0x0000000000000000 0000000000000000 V27 0x0000000000000000 0000000000000000
> > V28 0x0000000000000000 0000000000000000 V29 0x0000000000000000 0000000000000000
> > V30 0x0000000000000000 0000000000000000 V31 0x0000000000000000 0000000000000000
> >
> > SP 0x00000000476E4780 ELR 0x000000037FD3C0A8 SPSR 0x80000205 FPSR 0x00000000
> > ESR 0x86000006 FAR 0x000000037FD3C0A8
> >
> > ESR : EC 0x21 IL 0x1 ISS 0x00000006
> >
> > Instruction abort: Translation fault, second level
> >
> > Stack dump:
> >   00000476E4680: 0000000000000001 0000000000000004 00000000476E4700 00000000476F3980
> >   00000476E46A0: 000000037FD40CBD 0000000000000003 000000037FC00000 000000037FD39000
> >   00000476E46C0: 0060000000000400 FF9F000000000B3F 00000000476E4780 000000037FD3BE70
> >   00000476E46E0: 000000037FC00000 0000000000000002 000000037F106000 000000037F10B000
> >   00000476E4700: 0000000000000FF0 00000000001FFFFF 000000037FD39000 000000037F106000
> >   00000476E4720: 0000000000000003 000000037F10BFF0 0060000000000400 FF9F000000000B3F
> >   00000476E4740: 000000037FD39000 000000037FD39000 00000000476E4780 0060000000000403
> >   00000476E4760: 0000000C00000001 000000037FD3F90E 0000000000000400 000000037F10B000
> > > 00000476E4780: 00000000476E4830 000000037FD3BE70 000000037CB40000 0000000000000001
> >   00000476E47A0: 000000037F10B000 0000000047FFE000 0000000000000068 000000003FFFFFFF
> >   00000476E47C0: 000000037FD39000 000000037F10C528 0000000000000002 0000000047FFE068
> >   00000476E47E0: 0060000000000400 FF9F000000000B3F 0000000300000001 000000037FD39000
> >   00000476E4800: 000000017FD40CBD 0060000000000401 0000001500000001 000000037FD3F90E
> >   00000476E4820: 0060000000000400 000000037F106000 00000000476E48E0 000000037FD3BE70
> >   00000476E4840: 000000037CB40000 0000000000000000 0000000047FFE000 0000000047FFF000
> >   00000476E4860: 0000000000000000 0000007FFFFFFFFF 000000037FD39000 000000037F10C528
> > ASSERT [ArmCpuDxe] /root/edk2/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c(333): ((BOOLEAN)(0==1))
> >
> >
> > The full log is available here:
> > https://gitlab.com/osteffen/thunderx2-debug/-/raw/main/2023-05-19/85.log?inline=false
> >
> > Debug files, firmware binaries, and the full build tree are here:
> > https://gitlab.com/osteffen/thunderx2-debug/-/tree/main/2023-05-19
> >
> > I am able to reproduce this quickly, so any ideas for what I can try
> > are welcome :-)
> >
> > Thanks
> > -Oliver
> >
>

--
🎩Oliver Steffen (he/him) - Software Engineer, Virtualization
Red Hat GmbH
<https://www.redhat.com/de/global/dach>,
Registered seat: Werner-von-Siemens-Ring 12, D-85630 Grasbrunn, Germany
Commercial register: Amtsgericht München/Munich, HRB 153243,
Managing Directors: Ryan Barnhart, Charles Cachera, Michael O'Neill, Amy Ross

View/Reply Online (#105091): https://edk2.groups.io/g/devel/message/105091