Re: [Qemu-devel] [PATCH] cputlb: don't cpu_abort() if guest tries to execute outside RAM or RAM
On 28/06/2016 17:42, Peter Maydell wrote:
> Ping for review?

The patch is trivial, the hard part was coming up with the message for
the user. :)

Go ahead!

Paolo

> thanks
> -- PMM
>
> On 20 June 2016 at 18:07, Peter Maydell wrote:
>> In get_page_addr_code(), if the guest program counter turns out not to
>> be in ROM or RAM, we can't handle executing from it, and we call
>> cpu_abort(). This results in the message
>> qemu: fatal: Trying to execute code outside RAM or ROM at 0x0800
>> followed by a guest register dump, and then QEMU dumps core.
>>
>> This situation happens in one of two cases:
>> (1) a guest kernel bug, where it jumped off into nowhere
>> (2) a user command line mistake, where they tried to run an image for
>>     board A on a QEMU model of board B, or where they didn't provide
>>     an image at all, and QEMU executed through a ROM or RAM full of
>>     NOP instructions and then fell off the end
>>
>> In either case, a core dump of QEMU itself is entirely useless, and
>> only confuses users into thinking that this is a bug in QEMU rather
>> than a bug in the guest or a problem with their command line. (This
>> is a variation on the general idea that we shouldn't assert() on
>> something the user can accidentally provoke.)
>>
>> Replace the cpu_abort() with something that explains the situation
>> a bit better and exits QEMU without dumping core.
>>
>> (See LP:1062220 for several examples of confused users.)
>>
>> Signed-off-by: Peter Maydell
>> ---
>> I've been meaning to do this for a while now...hopefully the
>> expanded error message should reduce user confusion.
>>
>>  cputlb.c | 39 +--
>>  1 file changed, 37 insertions(+), 2 deletions(-)
>>
>> diff --git a/cputlb.c b/cputlb.c
>> index 23c9b91..079e497 100644
>> --- a/cputlb.c
>> +++ b/cputlb.c
>> @@ -30,6 +30,8 @@
>>  #include "exec/ram_addr.h"
>>  #include "exec/exec-all.h"
>>  #include "tcg/tcg.h"
>> +#include "qemu/error-report.h"
>> +#include "exec/log.h"
>>
>>  /* DEBUG defines, enable DEBUG_TLB_LOG to log to the CPU_LOG_MMU target */
>>  /* #define DEBUG_TLB */
>> @@ -427,6 +429,39 @@ void tlb_set_page(CPUState *cpu, target_ulong vaddr,
>>                              prot, mmu_idx, size);
>>  }
>>
>> +static void report_bad_exec(CPUState *cpu, target_ulong addr)
>> +{
>> +    /* Accidentally executing outside RAM or ROM is quite common for
>> +     * several user-error situations, so report it in a way that
>> +     * makes it clear that this isn't a QEMU bug and provide suggestions
>> +     * about what a user could do to fix things.
>> +     */
>> +    error_report("Trying to execute code outside RAM or ROM at 0x"
>> +                 TARGET_FMT_lx, addr);
>> +    error_printf("This usually means one of the following happened:\n\n"
>> +                 "(1) You told QEMU to execute a kernel for the wrong machine "
>> +                 "type, and it crashed on startup (eg trying to run a "
>> +                 "raspberry pi kernel on a versatilepb QEMU machine)\n"
>> +                 "(2) You didn't give QEMU a kernel or BIOS filename at all, "
>> +                 "and QEMU executed a ROM full of no-op instructions until "
>> +                 "it fell off the end\n"
>> +                 "(3) Your guest kernel has a bug and crashed by jumping "
>> +                 "off into nowhere\n\n"
>> +                 "This is almost always one of the first two, so check your "
>> +                 "command line and that you are using the right type of kernel "
>> +                 "for this machine.\n"
>> +                 "If you think option (3) is likely then you can try debugging "
>> +                 "your guest with the -d debug options; in particular "
>> +                 "-d guest_errors will cause the log to include a dump of the "
>> +                 "guest register state at this point.\n\n"
>> +                 "Execution cannot continue; stopping here.\n\n");
>> +
>> +    /* Report also to the logs, with more detail including register dump */
>> +    qemu_log_mask(LOG_GUEST_ERROR, "qemu: fatal: Trying to execute code "
>> +                  "outside RAM or ROM at 0x" TARGET_FMT_lx "\n", addr);
>> +    log_cpu_state_mask(LOG_GUEST_ERROR, cpu, CPU_DUMP_FPU | CPU_DUMP_CCOP);
>> +}
>> +
>>  /* NOTE: this function can trigger an exception */
>>  /* NOTE2: the returned address is not exactly the physical address: it
>>   * is actually a ram_addr_t (in system mode; the user mode emulation
>> @@ -455,8 +490,8 @@ tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
>>          if (cc->do_unassigned_access) {
>>              cc->do_unassigned_access(cpu, addr, false, true, 0, 4);
>>          } else {
>> -            cpu_abort(cpu, "Trying to execute code outside RAM or ROM at 0x"
>> -                      TARGET_FMT_lx "\n", addr);
>> +            report_bad_exec(cpu, addr);
>> +            exit(1);
>>          }
>>      }
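The commit message describes the flow: get_page_addr_code() finds that the guest PC lies in neither RAM nor ROM, and the patch replaces cpu_abort() with a user-facing explanation followed by exit(1), leaving the register dump to the guest-error log. The standalone C sketch below mirrors that control flow under simplifying assumptions. Guest memory is modelled as a small table of [base, base + size) regions, and guest_addr, struct region, sketch_log_mask, SKETCH_LOG_GUEST_ERROR and every function name are hypothetical illustrations, not QEMU APIs (QEMU performs the real lookup through its TLB and memory-region code).

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a guest address; QEMU uses target_ulong. */
typedef uint64_t guest_addr;

/* Hypothetical RAM/ROM region descriptor covering [base, base + size). */
struct region {
    const char *name;
    guest_addr base;
    guest_addr size;
};

/* Hypothetical log-mask bit, loosely modelled on LOG_GUEST_ERROR. */
#define SKETCH_LOG_GUEST_ERROR (1u << 0)
static unsigned sketch_log_mask; /* would be set by a "-d guest_errors"-style option */

/* Return true if addr lies inside one of the RAM/ROM regions. */
static bool addr_is_executable(const struct region *map, size_t n,
                               guest_addr addr)
{
    for (size_t i = 0; i < n; i++) {
        /* Subtract first so that base + size can never overflow. */
        if (addr >= map[i].base && addr - map[i].base < map[i].size) {
            return true;
        }
    }
    return false;
}

/* User-facing report: always printed, and no core dump involved. */
static void report_bad_exec_sketch(FILE *out, guest_addr addr)
{
    fprintf(out, "Trying to execute code outside RAM or ROM at 0x%llx\n",
            (unsigned long long)addr);
    fprintf(out, "Check your command line and that the kernel matches "
                 "this machine type.\n");
    /* Extra detail only when the user asked for guest-error logging. */
    if (sketch_log_mask & SKETCH_LOG_GUEST_ERROR) {
        fprintf(out, "(a guest register dump would be logged here)\n");
    }
}

/* Caller pattern from the patch: report, then exit(1) instead of aborting. */
static void check_exec_or_exit(const struct region *map, size_t n,
                               guest_addr addr)
{
    if (!addr_is_executable(map, n, addr)) {
        report_bad_exec_sketch(stderr, addr);
        exit(1); /* normal termination with failure status: no SIGABRT, no core */
    }
}
```

The design point is in the last function: exit(1) ends the process normally with a failure status, whereas cpu_abort() finishes by calling abort(), which raises SIGABRT and is what produced the core dump the commit message objects to.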
Re: [Qemu-devel] [PATCH] cputlb: don't cpu_abort() if guest tries to execute outside RAM or RAM
On 28 June 2016 at 18:49, Paolo Bonzini wrote:
> On 28/06/2016 17:42, Peter Maydell wrote:
>> Ping for review?
>
> The patch is trivial, the hard part was coming up with the message for
> the user. :)

Sure, but review includes whether the message makes sense :-)

> Go ahead!

I'll push it to master in a bit.

thanks
-- PMM
Re: [Qemu-devel] [PATCH] cputlb: don't cpu_abort() if guest tries to execute outside RAM or RAM
On 06/20/2016 10:07 AM, Peter Maydell wrote:
> In get_page_addr_code(), if the guest program counter turns out not to
> be in ROM or RAM, we can't handle executing from it, and we call
> cpu_abort().

Reviewed-by: Richard Henderson

r~
Re: [Qemu-devel] [PATCH] cputlb: don't cpu_abort() if guest tries to execute outside RAM or RAM
Ping for review?

thanks
-- PMM

On 20 June 2016 at 18:07, Peter Maydell wrote:
> In get_page_addr_code(), if the guest program counter turns out not to
> be in ROM or RAM, we can't handle executing from it, and we call
> cpu_abort().
Re: [Qemu-devel] [PATCH] cputlb: don't cpu_abort() if guest tries to execute outside RAM or RAM
On 20 June 2016 at 20:16, Mark Cave-Ayland wrote:
> Excellent! Another use case I see here is with HelenOS/ppc whose
> bootloader is fixed at address 0x800 (128Mb) and so if you don't
> increase the memory above the default then you end up with this panic,
> which as you rightly point out is often confusing.

For that one, if the real-life machine always has more RAM and we don't
mind breaking migration back-compat for it, then we could set its
default_ram_size to something other than 128MB.

thanks
-- PMM
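Peter's suggestion rests on a simple bounds check. If RAM is modelled as one block starting at guest address 0 (an assumption made only for this sketch; the real machine's memory map may differ), it covers [0, ram_size), so an image loaded at exactly the 128 MiB mark starts one byte past the end of a 128MB default. The helpers below are hypothetical and only illustrate that arithmetic; they are not QEMU's default_ram_size handling.

```c
#include <stdbool.h>
#include <stdint.h>

#define MiB (UINT64_C(1) << 20)

/*
 * Hypothetical helper. Assumes RAM is a single block at guest address 0,
 * so valid RAM addresses are [0, ram_size).
 */
static bool image_fits_in_ram(uint64_t ram_size, uint64_t load_addr,
                              uint64_t image_size)
{
    /* Written this way to avoid overflow in load_addr + image_size. */
    return load_addr <= ram_size && image_size <= ram_size - load_addr;
}

/* Smallest RAM size, under the same assumption, that holds the image. */
static uint64_t min_ram_for_image(uint64_t load_addr, uint64_t image_size)
{
    return load_addr + image_size;
}
```

Under this assumption, a 1 MiB image placed at 128 MiB needs at least 129 MiB of RAM, which is why either raising the machine's default or passing a larger -m avoids the error.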
Re: [Qemu-devel] [PATCH] cputlb: don't cpu_abort() if guest tries to execute outside RAM or RAM
On 20/06/16 18:07, Peter Maydell wrote:
> In get_page_addr_code(), if the guest program counter turns out not to
> be in ROM or RAM, we can't handle executing from it, and we call
> cpu_abort().

Excellent! Another use case I see here is with HelenOS/ppc whose
bootloader is fixed at address 0x800 (128Mb) and so if you don't
increase the memory above the default then you end up with this panic,