Quoting Ard Biesheuvel (2023-05-19 23:36:53)
> On Fri, 19 May 2023 at 18:32, Oliver Steffen <ostef...@redhat.com> wrote:
> >
> >
> > Hi all,
> >
> > I had another look at this and can now reproduce the issue consistently,
> > with a fairly minimal setup, on recent versions of the Linux kernel, Qemu,
> > and EDK2. It requires rebooting the guest in a tight loop. It happens in
> > silent and verbose builds alike, but since the verbose builds are slowed
> > down by the serial output, they take longer to hit the issue.
> > With the silent builds it can be reproduced within a few minutes.
> > For the verbose case I recommend running multiple Qemu instances in
> > parallel (as many as the machine allows; in my case ~100).
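When many instances run in parallel, it helps to check each captured serial log for the crash automatically. The C sketch below counts boot iterations in a log up to the first exception banner. Both marker strings are assumptions: CRASH_MARKER is the banner seen in the log quoted further down, and BOOT_MARKER is a line from the verbose log that presumably appears once per boot. A silent build may not print it, so substitute any line your build does print once per boot. count_boots_before_crash is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed markers: adjust BOOT_MARKER to a line your build prints once per boot. */
#define BOOT_MARKER  "InitializeDxeNxMemoryProtectionPolicy: StackBase"
#define CRASH_MARKER "Synchronous Exception at"

/*
 * Count BOOT_MARKER occurrences in 'log' before the first CRASH_MARKER.
 * Sets *crashed to 1 if a crash banner was found, 0 otherwise; without a
 * crash, every boot marker in the log is counted.
 */
static size_t count_boots_before_crash(const char *log, int *crashed)
{
    assert(log != NULL && crashed != NULL);

    const char *crash = strstr(log, CRASH_MARKER);
    const char *end = crash != NULL ? crash : log + strlen(log);
    size_t boots = 0;

    for (const char *p = strstr(log, BOOT_MARKER); p != NULL && p < end;
         p = strstr(p + 1, BOOT_MARKER)) {
        boots++;
    }
    *crashed = crash != NULL;
    return boots;
}
```

The count includes the boot that crashed, because the boot marker is printed before the exception.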
> >
>
> Thanks a lot for all these details, this is extremely helpful.
>
> So what appears to be happening is that we split the 2M block mapping
> that covers the code that we were called from, and hit a level 2
> translation fault because the updated page table entry is still
> observed to be in its transient 'invalid' state as we return to it.
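Ard's reading can be cross-checked against the register dump quoted further down. The following is a minimal sketch, assuming a 4 KiB translation granule (12 offset bits, then 9 index bits per table level). It computes which 2 MiB block and which table slots cover a virtual address. The helper names are hypothetical, not ArmMmuLib functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumes a 4 KiB granule: 12 offset bits, then 9 index bits per level. */
#define GRANULE_SHIFT 12u
#define LEVEL_BITS    9u
#define BLOCK_2M      (UINT64_C(1) << 21)

/* Index into the level 'level' (0..3) translation table for address 'va'. */
static unsigned table_index(uint64_t va, unsigned level)
{
    assert(level <= 3u);
    unsigned shift = GRANULE_SHIFT + LEVEL_BITS * (3u - level);
    return (unsigned)((va >> shift) & ((1u << LEVEL_BITS) - 1u));
}

/* Base of the 2 MiB region a level 2 block descriptor would map for 'va'. */
static uint64_t block_2m_base(uint64_t va)
{
    return va & ~(BLOCK_2M - 1u);
}
```

For FAR 0x37FD3C0A8, this gives a 2 MiB block base of 0x37FC00000, the same range as the `UpdateRegionMappingRecursive(3): 37FC00000 - 37FE00000` line in the log. The ArmCpuDxe image base (0x37FD39000) falls in that block too, so the code performing the update runs from inside the block being split. The level 2 index is 510, i.e. byte offset 0xFF0 into the level 2 table. That is consistent with X0 (0x37F10BFF0) pointing at the entry being replaced, at X22 (0x37F10B000) + 0xFF0, and with X1 (0x37F106003) being the new table descriptor.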
>
> Could you please check whether this makes a difference?
>
> --- a/ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibReplaceEntry.S
> +++ b/ArmPkg/Library/ArmMmuLib/AArch64/ArmMmuLibReplaceEntry.S
> @@ -65,6 +65,7 @@
>    // write updated entry
>    str   x1, [x0]
>    dsb   nshst
> +  isb
>
>  .L2_\@:
>    .endm
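The ordering this patch establishes can be sketched in C with GCC/Clang inline assembly. This is an illustration under assumptions, not the ArmMmuLib code. write_entry_and_sync is a hypothetical name, and the sketch leaves out the rest of the replacement sequence in ArmMmuLibReplaceEntry.S. As I understand the architecture, DSB NSHST waits for the descriptor store to complete. The added ISB is a context synchronization event, so any instruction fetched earlier (possibly while the entry was still seen as invalid) is discarded and refetched after the update.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: publish a new translation table descriptor, then
 * (on AArch64) complete the store with DSB NSHST and resynchronize the
 * instruction stream with ISB, the ordering the patch establishes.
 * On other architectures only a compiler barrier is emitted, so the
 * sketch still builds and runs there.
 */
static inline void write_entry_and_sync(volatile uint64_t *entry, uint64_t value)
{
    assert(((uintptr_t)entry & 7u) == 0); /* descriptors are 8-byte aligned */
    *entry = value;
#if defined(__aarch64__)
    __asm__ volatile("dsb nshst" ::: "memory"); /* wait for the store to complete */
    __asm__ volatile("isb" ::: "memory");       /* refetch later instructions */
#else
    __asm__ volatile("" ::: "memory");
#endif
}
```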

That fixes it - no crash observed within 150k iterations.
Thanks, Ard!

- Oliver

>
>
> > Details:
> >
> > CPU: Cavium ThunderX2(R) CPU CN9975
> > Tested on 3 different machines:
> >     HPE apache, HPE apollo, Gigabyte R181
> > Kernels tested:
> >  - 6.2.15-100.fc36.aarch64
> >  - 5.14.0-312.el9.aarch64
> >    (contains 406504c7b0405d74d74c15a667cd4c4620c3e7a9,
> >    "KVM: arm64: Fix S1PTW handling on RO memslots")
> > Qemu v8.0.0 (RHEL version and build from upstream repo)
> > EDK2: master branch from 2023-05-16 (cafb4f3f)
> > gcc 11.3.1
> >
> > EDK2 build command line:
> > build \
> >   -a AARCH64 \
> >   -p ArmVirtPkg/ArmVirtQemu.dsc \
> >   -t GCC5 -b DEBUG \
> >   -D NETWORK_IP6_ENABLE \
> >   -D NETWORK_HTTP_BOOT_ENABLE \
> >   -D NETWORK_TLS_ENABLE \
> >   -D NETWORK_ISCSI_ENABLE \
> >   -D NETWORK_ALLOW_HTTP_CONNECTIONS \
> >   -D CAVIUM_ERRATUM_27456=TRUE \
> >   -D TPM2_ENABLE=TRUE \
> >   -D TPM1_ENABLE=FALSE \
> >   -D DEBUG_PRINT_ERROR_LEVEL=0x80000000  \
> >   -D BUILD_SHELL=TRUE \
> >   --pcd="gEfiShellPkgTokenSpaceGuid.PcdShellDefaultDelay=0" \
> >   --pcd="gEfiMdePkgTokenSpaceGuid.PcdPlatformBootTimeOut=0" \
> >   --hash --cmd-len=65536
> >
> > To reproduce the issue, I launched the firmware in Qemu and had it reboot
> > once it finished booting, via a startup.nsh on the ESP.
> >
> > Qemu command line:
> > qemu-system-aarch64 \
> >     -machine virt,accel=kvm -m 13G \
> >     -boot menu=off \
> >     -cpu host \
> >     -blockdev node-name=code,driver=file,filename="${FW_CODE}",read-only=on \
> >     -blockdev node-name=vars,driver=file,filename="${FW_VARS}" \
> >     -machine pflash0=code \
> >     -machine pflash1=vars \
> >     -serial stdio \
> >     -net none \
> >     -drive file=esp.img,snapshot=on
> >
> > Other factors, such as the number of CPUs or the presence of a vTPM, have
> > no influence. I have not tried different amounts of RAM yet.
> >
> > Serial output:
> > [...]
> > InitializeDxeNxMemoryProtectionPolicy: StackBase = 0x00000000476C5000 StackSize = 0x0000000000020000
> > InitializeDxeNxMemoryProtectionPolicy: applying strict permissions to active memory regions
> > SetUefiImageMemoryAttributes - 0x0000000040000000 - 0x00000000076E5000 (0x0000000000004000)
> > UpdateRegionMappingRecursive(0): 40000000 - 476E5000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(1): 40000000 - 476E5000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(2): 40000000 - 476E5000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(3): 47600000 - 476E5000 set 60000000000400 clr FF9F000000000B3F
> > SetUefiImageMemoryAttributes - 0x00000000476C5000 - 0x0000000000001000 (0x0000000000006000)
> > UpdateRegionMappingRecursive(0): 476C5000 - 476C6000 set 60000000000000 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(1): 476C5000 - 476C6000 set 60000000000000 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(2): 476C5000 - 476C6000 set 60000000000000 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(3): 476C5000 - 476C6000 set 60000000000000 clr FF9F000000000B3F
> > SetUefiImageMemoryAttributes - 0x000000004772B000 - 0x00000000007C0000 (0x0000000000004000)
> > UpdateRegionMappingRecursive(0): 4772B000 - 47EEB000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(1): 4772B000 - 47EEB000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(2): 4772B000 - 47EEB000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(3): 4772B000 - 47800000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(3): 47E00000 - 47EEB000 set 60000000000400 clr FF9F000000000B3F
> > SetUefiImageMemoryAttributes - 0x0000000047EF3000 - 0x0000000000101000 (0x0000000000004000)
> > UpdateRegionMappingRecursive(0): 47EF3000 - 47FF4000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(1): 47EF3000 - 47FF4000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(2): 47EF3000 - 47FF4000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(3): 47EF3000 - 47FF4000 set 60000000000400 clr FF9F000000000B3F
> > SetUefiImageMemoryAttributes - 0x0000000047FFA000 - 0x0000000334AA6000 (0x0000000000004000)
> > UpdateRegionMappingRecursive(0): 47FFA000 - 37CAA0000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(1): 47FFA000 - 37CAA0000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(2): 47FFA000 - 80000000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(3): 47FFA000 - 48000000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(2): 340000000 - 380000000 set 70C clr 0
> > UpdateRegionMappingRecursive(3): 37F000000 - 37F200000 set 70C clr 0
> > UpdateRegionMappingRecursive(2): 340000000 - 37CAA0000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(3): 37CA00000 - 37CC00000 set 70C clr 0
> > UpdateRegionMappingRecursive(3): 37CA00000 - 37CAA0000 set 60000000000400 clr FF9F000000000B3F
> > SetUefiImageMemoryAttributes - 0x000000037CB40000 - 0x00000000031F9000 (0x0000000000004000)
> > UpdateRegionMappingRecursive(0): 37CB40000 - 37FD39000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(1): 37CB40000 - 37FD39000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(2): 37CB40000 - 37FD39000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(3): 37CB40000 - 37CC00000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(3): 37F000000 - 37F200000 set 60000000000400 clr FF9F000000000B3F
> > UpdateRegionMappingRecursive(3): 37FC00000 - 37FE00000 set 70C clr 0
> > UpdateRegionMappingRecursive(3): 37FC00000 - 37FD39000 set 60000000000400 clr FF9F000000000B3F
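The set/clr values in these lines read as masks over the translation descriptor attribute bits. The sketch below decodes a 64-bit value using the Armv8-A stage 1 descriptor layout (4 KiB granule). The struct and function names are hypothetical, and the PXN/UXN fields are only meaningful for block and page descriptors.

```c
#include <assert.h>
#include <stdint.h>

/* Selected fields of an Armv8-A stage 1 descriptor, 4 KiB granule. */
struct desc_fields {
    unsigned valid;     /* bit 0 */
    unsigned type;      /* bit 1: 1 = table (levels 0-2) or page (level 3), 0 = block */
    unsigned attr_indx; /* bits [4:2]: MAIR attribute index */
    unsigned ap;        /* bits [7:6]: access permissions */
    unsigned sh;        /* bits [9:8]: shareability, 0b11 = Inner Shareable */
    unsigned af;        /* bit 10: access flag */
    unsigned pxn;       /* bit 53: privileged execute-never (block/page only) */
    unsigned uxn;       /* bit 54: unprivileged execute-never (block/page only) */
    uint64_t oa;        /* bits [47:12]: output address or next-level table */
};

/* Extract 'width' bits of 'd' starting at bit 'lsb'. */
static unsigned field(uint64_t d, unsigned lsb, unsigned width)
{
    assert(width > 0u && width < 32u && lsb + width <= 64u);
    return (unsigned)((d >> lsb) & ((UINT64_C(1) << width) - 1u));
}

static struct desc_fields desc_decode(uint64_t d)
{
    struct desc_fields f;
    f.valid     = field(d, 0, 1);
    f.type      = field(d, 1, 1);
    f.attr_indx = field(d, 2, 3);
    f.ap        = field(d, 6, 2);
    f.sh        = field(d, 8, 2);
    f.af        = field(d, 10, 1);
    f.pxn       = field(d, 53, 1);
    f.uxn       = field(d, 54, 1);
    f.oa        = d & UINT64_C(0x0000FFFFFFFFF000);
    return f;
}
```

Decoded this way, `set 60000000000400` sets PXN, UXN and AF, i.e. it maps the range non-executable. `set 70C clr 0` carries AttrIndx 3, SH 0b11 (Inner Shareable) and AF. The 70C lines precede level 3 updates of the same 2 MiB ranges (for example 37FC00000 - 37FE00000), which fits those blocks being split into page tables first. That reading is an inference from the log, not from the code.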
> >
> >
> > Synchronous Exception at 0x000000037FD3C0A8
> > PC 0x00037FD3C0A8 (0x00037FD39000+0x000030A8) [ 0] ArmCpuDxe.dll
> > PC 0x00037FD3C0A8 (0x00037FD39000+0x000030A8) [ 0] ArmCpuDxe.dll
> > PC 0x00037FD3BE70 (0x00037FD39000+0x00002E70) [ 0] ArmCpuDxe.dll
> > PC 0x00037FD3BE70 (0x00037FD39000+0x00002E70) [ 0] ArmCpuDxe.dll
> > PC 0x00037FD3C2E4 (0x00037FD39000+0x000032E4) [ 0] ArmCpuDxe.dll
> > PC 0x0000476E78F8 (0x0000476E5000+0x000028F8) [ 1] DxeCore.dll
> > PC 0x0000476ED680 (0x0000476E5000+0x00008680) [ 1] DxeCore.dll
> > PC 0x0000476F2744 (0x0000476E5000+0x0000D744) [ 1] DxeCore.dll
> > PC 0x0000476ECDE8 (0x0000476E5000+0x00007DE8) [ 1] DxeCore.dll
> > PC 0x00037FD3D2DC (0x00037FD39000+0x000042DC) [ 2] ArmCpuDxe.dll
> > PC 0x0000476EC788 (0x0000476E5000+0x00007788) [ 3] DxeCore.dll
> > PC 0x0000476F9CA8 (0x0000476E5000+0x00014CA8) [ 3] DxeCore.dll
> > PC 0x0000476EFEF0 (0x0000476E5000+0x0000AEF0) [ 3] DxeCore.dll
> >
> > [ 0] /root/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmPkg/Drivers/CpuDxe/CpuDxe/DEBUG/ArmCpuDxe.dll
> > [ 1] /root/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll
> > [ 2] /root/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmPkg/Drivers/CpuDxe/CpuDxe/DEBUG/ArmCpuDxe.dll
> > [ 3] /root/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll
> >
> >   X0 0x000000037F10BFF0   X1 0x000000037F106003   X2 0x000000000037FC00   X3 0x0000000000000000
> >   X4 0x0000000000000200   X5 0x0000000000000004   X6 0x0000000000000000   X7 0x000000037FD3F4B5
> >   X8 0x0000000000000000   X9 0x0000000000000002  X10 0x0000000000000000  X11 0x0000000000000000
> >  X12 0x0000000000000002  X13 0x0000000000000002  X14 0x0000000000000001  X15 0x0000000000000002
> >  X16 0x000000037FD3A268  X17 0x00000000007AFA10  X18 0x0000000000000000  X19 0x000000037FC00000
> >  X20 0x0000000000000002  X21 0x000000037F106003  X22 0x000000037F10B000  X23 0x000000037FD42000
> >  X24 0x00000000001FFFFF  X25 0x000000037FD39000  X26 0x000000037F106000  X27 0x0000000000000003
> >  X28 0x000000037F10BFF0   FP 0x00000000476E4780   LR 0x000000037FD3C0A8
> >
> >   V0 0x0000000000000000 0000000000000000   V1 0x0000000000000000 0000000000000000
> >   V2 0x0000000000000000 0000000000000000   V3 0x0000000000000000 0000000000000000
> >   V4 0x0000000000000000 0000000000000000   V5 0x0000000000000000 0000000000000000
> >   V6 0x0000000000000000 0000000000000000   V7 0x0000000000000000 0000000000000000
> >   V8 0x0000000000000000 0000000000000000   V9 0x0000000000000000 0000000000000000
> >  V10 0x0000000000000000 0000000000000000  V11 0x0000000000000000 0000000000000000
> >  V12 0x0000000000000000 0000000000000000  V13 0x0000000000000000 0000000000000000
> >  V14 0x0000000000000000 0000000000000000  V15 0x0000000000000000 0000000000000000
> >  V16 0x0000000000000000 0000000000000000  V17 0x0000000000000000 0000000000000000
> >  V18 0x0000000000000000 0000000000000000  V19 0x0000000000000000 0000000000000000
> >  V20 0x0000000000000000 0000000000000000  V21 0x0000000000000000 0000000000000000
> >  V22 0x0000000000000000 0000000000000000  V23 0x0000000000000000 0000000000000000
> >  V24 0x0000000000000000 0000000000000000  V25 0x0000000000000000 0000000000000000
> >  V26 0x0000000000000000 0000000000000000  V27 0x0000000000000000 0000000000000000
> >  V28 0x0000000000000000 0000000000000000  V29 0x0000000000000000 0000000000000000
> >  V30 0x0000000000000000 0000000000000000  V31 0x0000000000000000 0000000000000000
> >
> >   SP 0x00000000476E4780  ELR 0x000000037FD3C0A8  SPSR 0x80000205  FPSR 0x00000000
> >  ESR 0x86000006          FAR 0x000000037FD3C0A8
> >
> >  ESR : EC 0x21  IL 0x1  ISS 0x00000006
> >
> > Instruction abort: Translation fault, second level
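The ESR line above follows the Armv8-A ESR_ELx layout. The sketch below reproduces that decoding using hypothetical helper names. EC 0x21 is an instruction abort taken without a change in exception level. For instruction aborts, ISS[5:0] is the IFSC, and 0b000110 is a translation fault at level 2. In other words, the walk for the PC found an invalid level 2 descriptor.

```c
#include <assert.h>
#include <stdint.h>

struct esr_fields {
    unsigned ec;  /* Exception Class, bits [31:26] */
    unsigned il;  /* Instruction Length, bit 25 (1 = 32-bit instruction) */
    uint32_t iss; /* Instruction Specific Syndrome, bits [24:0] */
};

static struct esr_fields esr_decode(uint64_t esr)
{
    struct esr_fields f;
    f.ec  = (unsigned)((esr >> 26) & 0x3Fu);
    f.il  = (unsigned)((esr >> 25) & 0x1u);
    f.iss = (uint32_t)(esr & 0x1FFFFFFu);
    return f;
}

/*
 * For an instruction abort (EC 0x20 from a lower EL, 0x21 from the same EL),
 * IFSC 0b0001LL is a translation fault at level LL. Returns the level,
 * or -1 for any other fault status code.
 */
static int translation_fault_level(const struct esr_fields *f)
{
    assert(f->ec == 0x20u || f->ec == 0x21u);
    unsigned ifsc = f->iss & 0x3Fu;
    if ((ifsc & 0x3Cu) == 0x04u)
        return (int)(ifsc & 0x3u);
    return -1;
}
```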
> >
> > Stack dump:
> >   00000476E4680: 0000000000000001 0000000000000004 00000000476E4700 00000000476F3980
> >   00000476E46A0: 000000037FD40CBD 0000000000000003 000000037FC00000 000000037FD39000
> >   00000476E46C0: 0060000000000400 FF9F000000000B3F 00000000476E4780 000000037FD3BE70
> >   00000476E46E0: 000000037FC00000 0000000000000002 000000037F106000 000000037F10B000
> >   00000476E4700: 0000000000000FF0 00000000001FFFFF 000000037FD39000 000000037F106000
> >   00000476E4720: 0000000000000003 000000037F10BFF0 0060000000000400 FF9F000000000B3F
> >   00000476E4740: 000000037FD39000 000000037FD39000 00000000476E4780 0060000000000403
> >   00000476E4760: 0000000C00000001 000000037FD3F90E 0000000000000400 000000037F10B000
> > > 00000476E4780: 00000000476E4830 000000037FD3BE70 000000037CB40000 0000000000000001
> >   00000476E47A0: 000000037F10B000 0000000047FFE000 0000000000000068 000000003FFFFFFF
> >   00000476E47C0: 000000037FD39000 000000037F10C528 0000000000000002 0000000047FFE068
> >   00000476E47E0: 0060000000000400 FF9F000000000B3F 0000000300000001 000000037FD39000
> >   00000476E4800: 000000017FD40CBD 0060000000000401 0000001500000001 000000037FD3F90E
> >   00000476E4820: 0060000000000400 000000037F106000 00000000476E48E0 000000037FD3BE70
> >   00000476E4840: 000000037CB40000 0000000000000000 0000000047FFE000 0000000047FFF000
> >   00000476E4860: 0000000000000000 0000007FFFFFFFFF 000000037FD39000 000000037F10C528
> > ASSERT [ArmCpuDxe] /root/edk2/ArmPkg/Library/DefaultExceptionHandlerLib/AArch64/DefaultExceptionHandler.c(333): ((BOOLEAN)(0==1))
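The backtrace can be related to the stack dump by following AArch64 frame records. At the address held in FP, the first word is the caller's FP and the second is the return address. The sketch below walks those records through a captured dump. walk_frame_records is a hypothetical helper, not the DefaultExceptionHandlerLib implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Walk AArch64 frame records in a captured stack dump. Each record is two
 * 64-bit words at address FP: [FP] = caller's FP, [FP + 8] = return address.
 * 'mem' holds 'nwords' words starting at address 'base'. Stops at FP == 0,
 * a misaligned FP, or a record lying outside the dump. Returns the number
 * of return addresses stored in 'lrs'.
 */
static size_t walk_frame_records(const uint64_t *mem, uint64_t base, size_t nwords,
                                 uint64_t fp, uint64_t *lrs, size_t max_lrs)
{
    assert(mem != NULL && lrs != NULL);
    size_t n = 0;
    while (n < max_lrs && fp != 0 && (fp & 0x7u) == 0 && fp >= base) {
        uint64_t idx = (fp - base) / 8u;
        if (idx + 1u >= nwords)
            break;              /* record not fully inside the dump */
        lrs[n++] = mem[idx + 1u];
        fp = mem[idx];          /* follow the chain to the caller's record */
    }
    return n;
}
```

Starting from FP 0x476E4780 (the dump line marked '>'), the records yield 0x37FD3BE70 twice before the next FP, 0x476E48E0, leaves the dumped range. This matches the two 0x37FD3BE70 entries in the backtrace and is consistent with a recursive call.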
> >
> >
> >
> > The full log is available here:
> > https://gitlab.com/osteffen/thunderx2-debug/-/raw/main/2023-05-19/85.log?inline=false
> >
> > Debug files, firmware binaries, and the full build tree are here:
> > https://gitlab.com/osteffen/thunderx2-debug/-/tree/main/2023-05-19
> >
> > I am able to reproduce this quickly, so any ideas for what I can try
> > are welcome :-)
> >
> > Thanks
> > -Oliver
> >
>

--
🎩Oliver Steffen (he/him) - Software Engineer, Virtualization
Red Hat GmbH <https://www.redhat.com/de/global/dach>,
Registered seat: Werner-von-Siemens-Ring 12, D-85630 Grasbrunn, Germany
Commercial register: Amtsgericht München/Munich, HRB 153243,
Managing Directors: Ryan Barnhart, Charles Cachera, Michael O'Neill,
Amy Ross

Everyone has different working hours… Please do not feel obligated to
reply outside of your normal work schedule.



-=-=-=-=-=-=-=-=-=-=-=-
Groups.io Links: You receive all messages sent to this group.
View/Reply Online (#105091): https://edk2.groups.io/g/devel/message/105091
Mute This Topic: https://groups.io/mt/96075174/21656
Group Owner: devel+ow...@edk2.groups.io
Unsubscribe: https://edk2.groups.io/g/devel/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-

