On Fri, 16 Nov 2018 at 00:21, Peter Maydell <peter.mayd...@linaro.org> wrote:
>
> On 19 October 2018 at 09:55, Hongbo Zhang <hongbo.zh...@linaro.org> wrote:
> > there are two commit reverts I have to do to boot system currently, these
> > block not only my new 'sbsa-ref', but also the 'virt'.
> > (other two workarounds can be ignored, they are just for temp using before
> > firmware porting is fully finished)
> >
> > I am not saying the comments themselves have problem, maybe firmware need
> > to be adapted accordingly too. But before they are fixed, I just simply
> > revert them to not block my run.
> > (And, I've mentioned in v3 list that there are still problem of booting SMP
> > too, but I won't mention it here this time, otherwise this patch/cover
> > letter becomes too complicated -- at least we can boot one core, I can
> > fix/discuss it later separately.)
>
> We do need to investigate and at least understand all these issues
> before we can take this new board. Thanks for the repro instructions
> for the virt board.
>

As for SMP booting: when GICv2 is used there is no problem. The maximum
of 8 CPUs can be booted in all three cases: kernel only, UEFI+kernel,
and ATF+UEFI+kernel.
With GICv3, the kernel-only and UEFI+kernel cases still work, but
ATF+UEFI+kernel fails to boot more than 4 cores.

The original ATF didn't support GICv3, so I added the support:
http://git.linaro.org/people/hongbo.zhang/atf-sbsa.git/log/?h=sbsa_gicv3

The root cause of the failure to boot more than 4 cores with
ATF+UEFI+kernel and my GICv3 support enabled is this.
In QEMU, we have this definition:
#define ARM_DEFAULT_CPUS_PER_CLUSTER 8
But in ATF, the definition is:
#define PLATFORM_MAX_CPUS_PER_CLUSTER 4
So when we pass -smp 6, for example, QEMU generates MPIDR values that
place all 6 cores in cluster 0, but when ATF parses such an MPIDR, the
function plat_core_pos_by_mpidr() in plat/qemu/topology.c returns an
error, since it expects no more than 4 cores per cluster.

I think we should change the definition in QEMU to 4 rather than
changing ATF's, because I checked the Cortex-A57/A72/A73/A75 specs, and
they say there are at most 4 cores in one cluster.

> > Steps to reproduce issues:
> > 1. Compile ARMTF
> > make CROSS_COMPILE=aarch64-linux-gnu- PLAT=qemu all DEBUG=1
>
> What source tree do I need to build this EDK ?
>

I use https://github.com/ARM-software/arm-trusted-firmware.git
You can also use my
http://git.linaro.org/people/hongbo.zhang/atf-sbsa.git/log/?h=sbsa_gicv3
with GICv3 enabled.
Use the additional build parameter QEMU_USE_GIC_DRIVER to select GICv3:
"make PLAT=qemu all DEBUG=1 QEMU_USE_GIC_DRIVER=QEMU_GICV3"
GICv2 is the default, so no parameter is needed to select it.
(But I found that if you switch between GICv2 and GICv3 from time to
time when compiling, the build system may not pick up the change
correctly every time, so it is better to do a clean before compiling.)

> > 2. Compile edk2
> > make -C BaseTools
> > . edksetup.sh
> > export GCC49_AARCH64_PREFIX=aarch64-linux-gnu-
> > build -a AARCH64 -t GCC49 -p ArmVirtPkg/ArmVirtQemuKernel.dsc
> >
> > 3. Run QEMU
> > 3a. copy or link ARMTF and edk2 images to the directory where you want to
> > launch QEMU
> > bl1.bin -> /home/hongbo/work/arm-trusted-firmware/build/qemu/debug/bl1.bin*
> > bl2.bin -> /home/hongbo/work/arm-trusted-firmware/build/qemu/debug/bl2.bin*
> > bl31.bin ->
> > /home/hongbo/work/arm-trusted-firmware/build/qemu/debug/bl31.bin*
> > bl33.bin ->
> > /home/hongbo/work/edk2/Build/ArmVirtQemuKernel-AARCH64/DEBUG_GCC49/FV/QEMU_EFI.fd
> >
> > 3b. command to launch QEMU
> > command1 to load a whole system
> > qemu-system-aarch64 -machine virt,secure=on,virtualization=on -cpu
> > cortex-a57 -m 1024 -bios bl1.bin -semihosting -serial stdio -device
> > virtio-scsi-device,id=scsi -drive
> > file=../qemu-imgs/deb9_arm64_netinst_uefi.raw,id=rootimg,if=none -device
> > scsi-hd,drive=rootimg -netdev user,id=unet -device
> > virtio-net-device,netdev=unet -net user
> >
> > or command2 simply load a kernel
> > qemu-system-aarch64 -machine virt,secure=on,virtualization=on -cpu
> > cortex-a57 -m 1024 -bios bl1.bin -semihosting -serial stdio -kernel Image
> > -initrd xxx -append "root=/dev/xxx console=ttyAMA0"
> >
> > 4a. system halt with error message
> > ASSERT_EFI_ERROR (Status = Not Found)
> > ASSERT [ResetSystemRuntimeDxe]
> > /home/hongbo/work/edk2/Build/ArmVirtQemuKernel-AARCH64/DEBUG_GCC49/AARCH64/MdeModulePkg/Universal/ResetSystemRuntimeDxe/ResetSystemRuntimeDxe/DEBUG/AutoGen.c(370):
> > !EFI_ERROR (Status)
> >
> > 4b. Revert "device_tree: Increase FDT_MAX_SIZE to 1 MiB"
> > command1 can run further to halt at nother place, see 5a and 5b
> > command2 can load kernel successfully
>
> I'm not sure what's going on here. Some debugging of what the
> assertion is checking and why we've hit it would be required.
> I didn't expect changing FDT_MAX_SIZE would affect much but
> perhaps it changes where the fdt winds up in memory or how
> big it is so it overlaps with something else.
> There is an fdt_pack() function which should compress a
> created dtb, and which QEMU uses for some board models but
> not others; but I would want to find out what's actually
> happening here before looking at whether that is the right fix.
>
> > 5a. 2nd system halt with message
> > Synchronous Exception at 0x0000000078A152F0
> > PC 0x000078A152F0 (0x000078A00000+0x000152F0) [ 0] ArmVeNorFlashDxe.dll
> > PC 0x000078A152A0 (0x000078A00000+0x000152A0) [ 0] ArmVeNorFlashDxe.dll
> > PC 0x000078A11DF0 (0x000078A00000+0x00011DF0) [ 0] ArmVeNorFlashDxe.dll
> > [...snip...]
> > PC 0x0000600088C4
> > PC 0x000060008230
> > PC 0x580B24C2580B24A1
> >
> > Recursive exception occurred while dumping the CPU state
> >
> > 5b Revert "target/arm: Implement new do_transaction_failed hook"
> > then no halt, command1 can boot OS successfully
>
> The bug here will be that the firmware is attempting to access
> an address which has no device present there. We need to
> find out what code in the firmware is doing that, and what
> device it is trying to access. Then we can find out if it's
> a firmware bug, or if there needs to be some device present,
> or if we've given the wrong information in the device tree
> or ACPI tables.
>

I think the firmware is checking the mass storage devices to find a
bootable OS at this stage.

> Can EDK be made to give a backtrace with source filenames
> and line numbers for the exception ?
>
> thanks
> -- PMM
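
Coming back to the GICv3 SMP failure described earlier in this mail, here is a rough C sketch of the cluster-size mismatch. It is only a simplified model under my own assumptions, not the actual QEMU or ATF code: the helper names model_mpidr() and model_core_pos() are made up for illustration, Aff0 and Aff1 are assumed to sit in MPIDR bits [7:0] and [15:8], and any check on the number of clusters is left out. Only the two #define values come from the sources mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted earlier in this mail. */
#define ARM_DEFAULT_CPUS_PER_CLUSTER   8   /* QEMU side */
#define PLATFORM_MAX_CPUS_PER_CLUSTER  4   /* ATF side (PLAT=qemu) */

/* Assumption: Aff0 is MPIDR bits [7:0], Aff1 is MPIDR bits [15:8]. */
#define AFF1_SHIFT 8
#define AFF_MASK   0xffu

/*
 * Hypothetical model of the emulator side: turn a linear CPU index
 * into an MPIDR affinity value for a given cluster size.
 */
uint64_t model_mpidr(unsigned int cpu_index, unsigned int cluster_size)
{
    assert(cluster_size > 0);
    uint64_t aff1 = cpu_index / cluster_size;   /* cluster number */
    uint64_t aff0 = cpu_index % cluster_size;   /* core within cluster */
    return (aff1 << AFF1_SHIFT) | aff0;
}

/*
 * Hypothetical model of the firmware side: return the linear core
 * position, or -1 if the core number within its cluster is larger
 * than the firmware allows.
 */
int model_core_pos(uint64_t mpidr)
{
    unsigned int cluster = (unsigned int)((mpidr >> AFF1_SHIFT) & AFF_MASK);
    unsigned int core = (unsigned int)(mpidr & AFF_MASK);

    if (core >= PLATFORM_MAX_CPUS_PER_CLUSTER)
        return -1;
    return (int)(cluster * PLATFORM_MAX_CPUS_PER_CLUSTER + core);
}
```

In this model, with a cluster size of 8, the sixth CPU (index 5) gets MPIDR 0x5, i.e. cluster 0, core 5, and model_core_pos() rejects it. With a cluster size of 4, the same CPU gets MPIDR 0x101, i.e. cluster 1, core 1, and maps to position 5.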