(+ Leif)

Exec summary: a strange QEMU bug is triggered by RELEASE_GCC5 code that performs a spurious write to the NOR flash at runtime. That write is itself a bug, in Tianocore.
On 18 August 2016 at 16:36, Peter Maydell <peter.mayd...@linaro.org> wrote:
> On 18 August 2016 at 15:15, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
>> On 18 August 2016 at 16:10, Peter Maydell <peter.mayd...@linaro.org> wrote:
>>> On 16 August 2016 at 13:08, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote:
>>>> Bad ram pointer 0x54
>>>> Aborted (core dumped)
>>>
>>> So the reason this happens is that get_page_addr_code() doesn't
>>> correctly handle the case of the memory region being a
>>> ROM that's not in ROMD mode. That is, the flash memory can
>>> be either in "reads map directly to guest memory" (normal, ROMD)
>>> mode or "reads are MMIO to a device" (not-ROMD) mode. QEMU
>>> can't execute from devices, so the best case here would
>>> be that we print the "Sorry, we can't execute from a device"
>>> message and stop execution.
>>>
>>
>> So is there a spurious write somewhere that causes the ROM to switch
>> out of ROMD mode? Because it executes happily from ROM (until it
>> doesn't, of course)
>
> The write that causes us to go into not-ROMD mode is in this block:
>
> 0x00000000000096ac: cb000294 sub x20, x20, x0
> 0x00000000000096b0: f9000a74 str x20, [x19, #16]
> 0x00000000000096b4: 9100627c add x28, x19, #0x18 (24)
> 0x00000000000096b8: b9400780 ldr w0, [x28, #4]
> 0x00000000000096bc: 35002cc0 cbnz w0, #+0x598 (addr 0x9c54)
>
> which is executed with
>
> PC=00000000000096ac SP=000000004007f590
> X00=0000000000000160 X01=0000000000000095 X02=000000003031424e X03=0000000000001b40
> X04=0000000000010b64 X05=0000000000000160 X06=0000000000000188 X07=000000004007c268
> X08=00000000000149a0 X09=000000004007fe58 X10=000000004007f793 X11=0000000000000002
> X12=00000000707fe07a X13=0000000000000002 X14=0000000000000000 X15=0000000000000000
> X16=0000000000000000 X17=0000000000000000 X18=0000000000000000 X19=00000000000149a0
> X20=00000000000149a0 X21=00000000000149a0 X22=0000000000000001 X23=0000000000000160
> X24=000000004007fa24 X25=000000004007fa38 X26=0000000000000000 X27=0000000000014840
> X28=00000000000149a0 X29=0000000000000000 X30=0000000000009364
> PSTATE=200003c5 --C- EL1h
>
> so you write 0x14840 to address 0x149b0, which is in the flash.
>
> (This is the last TB we execute, because trying to find the
> next one hits the problem of the flash not being in ROMD mode.
> So it's the very last thing in the log if you run QEMU with
> -d in_asm,out_asm,exec,cpu,int -D /tmp/q.log)
>

OK, this rabbit hole goes pretty deep :-)

Normally, the uncompressed PE/COFF images in the NOR flash (the ones that set up the MMU etc.) are relocated at build time, so that they can execute from the offset they end up at in the NOR image. The relocation code sets the base address in the image header, and applies all fixups in the .reloc PE/COFF section.

As it turns out, LTO is so effective that it optimizes away all absolute symbol references, leaving us with no .reloc section at all (i.e., the module turns out to be completely position independent, but purely by accident). The runtime loader does not cope well with this, and ends up writing to the NOR flash, triggering the issue above.

Thanks,
Ard.
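
As a sanity check, the store in the last TB can be recomputed from the quoted register dump. The following is only a sketch in Python that models the first two instructions of the block by hand, using the X0, X19 and X20 values from the dump; the variable names are illustrative and none of this is QEMU output.

```python
# Register values copied from the quoted CPU dump (state at PC=0x96ac).
x0 = 0x0000000000000160
x19 = 0x00000000000149A0
x20 = 0x00000000000149A0

MASK64 = 0xFFFFFFFFFFFFFFFF  # AArch64 X registers are 64 bits wide

# 0x96ac: sub x20, x20, x0
x20 = (x20 - x0) & MASK64

# 0x96b0: str x20, [x19, #16]
store_addr = (x19 + 16) & MASK64
store_value = x20

print(f"str writes {store_value:#x} to {store_addr:#x}")
```

This reproduces the value 0x14840 and the address 0x149b0 given in the quoted analysis.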