[Qemu-devel] [Bug 1180970] *** affects all x86_64 soft emulation

2013-05-24 Thread Duane Voth
qemu: fatal: Trying to execute code outside RAM or ROM; worked in 1.4.0,
fails in 1.4.92

Want to bring a little attention to this bug - the break is in
target-i386/translate.c which affects all x86_64 soft emulation in a fairly
subtle way (ie. users will report a wide variety of problems none of which
seem to be related).  I can't find a way to elevate bug importance in
launchpad.

4a6fd938f5457ee161d2acbd9364608a2a68b7a1 is the offending commit.  There
have been numerous changes after this commit over top of the change that
broke emulation, so backing out this commit is not trivial.

I can reproduce the problem that is the subject of bug 1180970 for testing
easily.


Re: [Qemu-devel] [Bug 1180970] *** affects all x86_64 soft emulation

2013-05-24 Thread Laszlo Ersek
On 05/24/13 19:25, Duane Voth wrote:
> qemu: fatal: Trying to execute code outside RAM or ROM; worked in
> 1.4.0, fails in 1.4.92
>
> Want to bring a little attention to this bug - the break is in
> target-i386/translate.c which affects all x86_64 soft emulation in a
> fairly subtle way (ie. users will report a wide variety of problems
> none of which seem to be related).  I can't find a way to elevate bug
> importance in launchpad.
>
> 4a6fd938f5457ee161d2acbd9364608a2a68b7a1 is the offending commit.
> There have been numerous changes after this commit over top of the
> change that broke emulation, so backing out this commit is not
> trivial.
>
> I can reproduce the problem that is the subject of bug 1180970 for
> testing easily.

I can also reproduce this bug with my OVMF build, when KVM is disabled
(current master).

x86_64-softmmu/qemu-system-x86_64 -S -monitor stdio -m 1024 \
-vga cirrus -debugcon file:debug.log \
-global isa-debugcon.iobase=0x402 \
-bios /home/lacos/src/upstream/edk2-git-svn/out/OVMF.fd

Again, this is how qemu aborts:

> (qemu) qemu: fatal: Trying to execute code outside RAM or ROM at
> 0x0001
>
> RAX=3e084da8 RBX=3e084868 RCX= 
> RDX=3e084f00
> RSI=0001 RDI=3e085000 RBP=3e084708 
> RSP=3fac8510
> R8 = R9 =3e14c3e3 R10=0033 
> R11=00d3
> R12=3e0848a0 R13= R14= 
> R15=
> RIP=ffe4 RFL=0046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0008   00cf9300 DPL=0 DS   [-WA]
> CS =0028   00af9b00 DPL=0 CS64 [-RA]
> SS =0008   00cf9300 DPL=0 DS   [-WA]
> DS =0008   00cf9300 DPL=0 DS   [-WA]
> FS =0008   00cf9300 DPL=0 DS   [-WA]
> GS =0008   00cf9300 DPL=0 DS   [-WA]
> LDT=   8200 DPL=0 LDT
> TR =   8b00 DPL=0 TSS64-busy
> GDT= 3fa50e98 003f
> IDT= 3f9d6e20 0fff
> CR0=8033 CR2= CR3=3fa67000 CR4=0668
> [...]

Repeating from last time, we found it interesting that
RIP=ffe4 but the problem address is 0x0001.

I made some lame attempts to find out what code is running there and,
since I've recently read the term "nop slide", I'll call it just that:

---[ debug patch]---
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 0aeccdb..0e0356f 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -7197,6 +7197,7 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
 /* misc */
 case 0x90: /* nop */
 /* XXX: correct lock test for all insn */
+fprintf(stderr, "nop @ %016lx\n", pc_start);
 if (prefixes & PREFIX_LOCK) {
 goto illegal_op;
 }
---[ debug patch]---

The output it produces leading up to the abort quoted above is:

  nop @ ffe4
  nop @ ffe5
  nop @ ffe6
  nop @ ffe7
  nop @ fff5
  nop @ fff6
  nop @ fff7
  nop @ fff8
  nop @ fff9
  nop @ fffa
  nop @ fffb
  nop @ fffc
  nop @ fffd
  nop @ fffe
  nop @ 
  qemu: fatal: Trying to execute code outside RAM or ROM at 0x0001

Hence "nop slide".

Peeking into the coredump triggered by abort(), the backtrace is as follows:

  #0  0x7fd53c02b8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
  #1  0x7fd53c02d085 in abort () at abort.c:92
  #2  0x7fd5428d0333 in cpu_abort (env=0x7fd544b89c10, fmt=
      0x7fd542a47440 "Trying to execute code outside RAM or ROM at 0x%016lx\n")
      at /home/lacos/src/upstream/qemu/exec.c:542
  #3  0x7fd5428c9aa4 in get_page_addr_code (env1=0x7fd544b89c10, addr=4294967296)
      at /home/lacos/src/upstream/qemu/cputlb.c:338
  #4  0x7fd5429de266 in tb_gen_code (env=0x7fd544b89c10, pc=4294967268, cs_base=0, flags=4244148, cflags=0)
      at /home/lacos/src/upstream/qemu/translate-all.c:966
  #5  0x7fd5428c431b in tb_find_slow (env=0x7fd544b89c10, pc=4294967268, cs_base=0, flags=4244148)
      at /home/lacos/src/upstream/qemu/cpu-exec.c:139
  #6  0x7fd5428c44c4 in tb_find_fast (env=0x7fd544b89c10) at /home/lacos/src/upstream/qemu/cpu-exec.c:166
  #7  0x7fd5428c4c78 in cpu_x86_exec (env=0x7fd544b89c10) at /home/lacos/src/upstream/qemu/cpu-exec.c:593
  #8  0x7fd5428c8058 in tcg_cpu_exec (env=0x7fd544b89c10) at /home/lacos/src/upstream/qemu/cpus.c:1144
  #9  0x7fd5428c81a3 in tcg_exec_all () at /home/lacos/src/upstream/qemu/cpus.c:1177
  #10 0x7fd5428c7321
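
The decimal arguments in frames #3 and #4 are easier to read in hex. The
following is only a small sketch that converts them; the two constants are
copied from the backtrace, while the macro and function names are made up
for this illustration and are not QEMU identifiers. It suggests that
translation started just below the 4 GiB boundary and that the failing
fetch was the first byte at 4 GiB.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Values copied from the backtrace (names are hypothetical). */
#define TB_START_PC   UINT64_C(4294967268) /* pc in tb_gen_code(): 0xffffffe4 */
#define FAULTING_ADDR UINT64_C(4294967296) /* addr in get_page_addr_code(): 0x100000000 */

/* Bytes between the start of the translation block and the failing fetch. */
uint64_t bytes_before_fault(uint64_t start_pc, uint64_t fault_addr)
{
    assert(fault_addr >= start_pc);
    return fault_addr - start_pc;
}

/* Print both backtrace values in hex, plus the distance between them. */
void print_backtrace_addresses(void)
{
    printf("tb start pc   = 0x%016" PRIx64 "\n", TB_START_PC);
    printf("faulting addr = 0x%016" PRIx64 "\n", FAULTING_ADDR);
    printf("distance      = %" PRIu64 " bytes\n",
           bytes_before_fault(TB_START_PC, FAULTING_ADDR));
}
```

Calling print_backtrace_addresses() from a main() shows the start pc as
0xffffffe4 and the faulting address as 0x100000000, 28 bytes apart, which is
consistent with the translator walking off the top of the 32-bit address
range.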

Re: [Qemu-devel] [Bug 1180970] *** affects all x86_64 soft emulation

2013-05-27 Thread Luiz Capitulino
On Fri, 24 May 2013 23:23:02 +0200
Laszlo Ersek wrote:

> --[ proposed fix ]--
> diff --git a/target-i386/translate.c b/target-i386/translate.c
> index 0e0356f..4fbd6c0 100644
> --- a/target-i386/translate.c
> +++ b/target-i386/translate.c
> @@ -4813,7 +4813,11 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
>  /* 0x66 is ignored if rex.w is set */
>  dflag = 2;
>  }
> -if (!(prefixes & PREFIX_ADR)) {
> +if (prefixes & PREFIX_ADR) {
> +/* flip it back, 0x67 should have no effect */
> +aflag ^= 1;
> +}
> +else {
>  aflag = 2;
>  }
>  }
> --[ proposed fix ]--
> 
> I'll post it separately to the list for review.
> 
> Luiz, can you please test it with Windows guests?

On Windows 8 I can get past the boot loop point and even see Windows' boot
logo, but then I get a black screen (which I guess is the evolution of the
blue screen) asking me to reboot the PC saying "Error Code: 0x005D".

That error code is what I get with Windows 2008, with or without your patch.
I googled a bit about it, and it seems to be related to some CPU
incompatibility, which makes me think that this is a different issue
(meaning that your patch does fix the boot loop bug).
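
For readers following the quoted proposed fix, here is a minimal standalone
model of what the changed branch does to aflag. It is a sketch under
assumptions, not the QEMU code: it assumes the encoding 0 = 16-bit,
1 = 32-bit, 2 = 64-bit, and it assumes (inferred from the "flip it back"
comment in the patch) that earlier prefix handling has already done
"aflag ^= 1" for a 0x67 prefix, starting from aflag = 1. All function names
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encoding of the address-size flag. */
enum { AFLAG_16 = 0, AFLAG_32 = 1, AFLAG_64 = 2 };

/* Address width in bits for an aflag value under the assumed encoding. */
int aflag_bits(int aflag)
{
    assert(aflag >= AFLAG_16 && aflag <= AFLAG_64);
    return 16 << aflag;
}

/* Assumption: common prefix handling toggles aflag once when 0x67 is seen. */
int aflag_after_prefixes(bool has_adr_prefix)
{
    int aflag = AFLAG_32;
    if (has_adr_prefix) {
        aflag ^= 1;
    }
    return aflag;
}

/* 64-bit branch as it stood before the proposed fix (the removed line). */
int aflag_code64_before(bool has_adr_prefix)
{
    int aflag = aflag_after_prefixes(has_adr_prefix);
    if (!has_adr_prefix) {
        aflag = AFLAG_64;
    }
    return aflag;
}

/* 64-bit branch with the proposed fix applied (the added lines). */
int aflag_code64_after(bool has_adr_prefix)
{
    int aflag = aflag_after_prefixes(has_adr_prefix);
    if (has_adr_prefix) {
        aflag ^= 1; /* flip it back */
    } else {
        aflag = AFLAG_64;
    }
    return aflag;
}
```

Under these assumptions, an instruction with a 0x67 prefix in 64-bit code
ends up with aflag = 0 before the fix and aflag = 1 after it, while
instructions without the prefix get aflag = 2 in both versions.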