I've encountered what appears to be a race condition in arm-linux-user: sometimes the program qemu is running will hang forever and sometimes it will complete. After digging about, there appear to be at least two problems:
1) An initial cause of a SIGSEGV
2) The incorrect handling of the SIGSEGV

When #2 happens the symptom is an infinite loop in handling the SIGSEGV. Using GDB I've stepped through the signal handling and have cause to question a section of code that I would appreciate comments on.

The target-specific cpu-exec.c:handle_cpu_signal() appears to have a different logical flow for arm than for all other architectures, and I wonder if it is incorrectly implemented. All architecture versions of handle_cpu_signal() will do one of four things:

1) return 0 or 1 at the start for certain conditions
2) not return, by calling (do_)raise_exception_err(), which eventually calls cpu_loop_exit()
3) not return, by calling cpu_resume_from_signal(), which eventually calls longjmp()
4) call cpu_loop_exit() directly (which eventually calls longjmp())

After these various code paths there's usually a comment that says "never comes here".

The arm target is an exception to the above outline. raise_exception_err() has been commented out, and the cpu_loop_exit() is incorrectly indented, giving a false impression that it will always be called at the end of the function. I'm hypothesizing that for arm, handle_cpu_signal() incorrectly returns and doesn't break an infinite loop. Please comment!

Below is the questionable handle_cpu_signal():

static inline int handle_cpu_signal(unsigned long pc, unsigned long address,
                                    int is_write, sigset_t *old_set,
                                    void *puc)
{
    TranslationBlock *tb;
    int ret;

    if (cpu_single_env)
        env = cpu_single_env; /* XXX: find a correct solution for multithread */
#if defined(DEBUG_SIGNAL)
    printf("qemu: SIGSEGV pc=0x%08lx address=%08lx w=%d oldset=0x%08lx\n",
           pc, address, is_write, *(unsigned long *)old_set);
#endif
    /* XXX: locking issue */
    if (is_write && page_unprotect(h2g(address), pc, puc)) {
        return 1;
    }
    /* see if it is an MMU fault */
    ret = cpu_arm_handle_mmu_fault(env, address, is_write, 1, 0);
    if (ret < 0)
        return 0; /* not an MMU fault */
    if (ret == 0)
        return 1; /* the MMU fault was handled without causing real CPU fault */
    /* now we have a real cpu fault */
    tb = tb_find_pc(pc);
    if (tb) {
        /* the PC is inside the translated code. It means that we have
           a virtual CPU fault */
        cpu_restore_state(tb, env, pc, puc);
    }
    if( ret == 1 ) {
        sigprocmask(SIG_SETMASK, old_set, NULL);
        //raise_exception_err(env->exception_index, env->error_code);
    } else {
        /* we restore the process signal mask as the sigreturn should
           do it (XXX: use sigsetjmp) */
        sigprocmask(SIG_SETMASK, old_set, NULL);
        cpu_loop_exit();
    }
}
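
To illustrate why a returning handler matters, here is a minimal host-side sketch. It is not QEMU code: on_segv(), count_faults() and loop_env are hypothetical names, and it assumes a Linux/POSIX host with mmap() and MAP_ANONYMOUS. When a SIGSEGV handler simply returns without fixing the fault, the kernel restarts the faulting instruction and the handler is entered again; only a non-local exit such as siglongjmp() (standing in here for what cpu_loop_exit() eventually does) breaks the cycle. The sketch bounds the number of plain returns so that it terminates.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical stand-in for the jmp_buf that cpu_loop_exit() jumps to. */
static sigjmp_buf loop_env;
static volatile sig_atomic_t faults;      /* times the handler was entered */
static volatile sig_atomic_t max_returns; /* plain returns allowed before escaping */

static void on_segv(int sig)
{
    (void)sig;
    faults++;
    if (faults <= max_returns)
        return;              /* plain return: the faulting store is retried */
    siglongjmp(loop_env, 1); /* non-local exit: the store is abandoned */
}

/* Returns how many times the handler ran, or -1 on setup failure. */
int count_faults(int returns_before_escape)
{
    struct sigaction sa;
    volatile char *page;

    assert(returns_before_escape >= 0);
    faults = 0;
    max_returns = returns_before_escape;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    /* An inaccessible page, so that the store below reliably faults. */
    page = mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    if (sigsetjmp(loop_env, 1) == 0)
        page[0] = 1; /* faults; re-executed after every plain return */

    munmap((void *)page, 4096);
    return (int)faults;
}
```

Calling count_faults(0) enters the handler once, while count_faults(3) enters it four times: each plain return re-runs the same faulting store. With no escape at all the loop would never end, which matches the hang symptom I described, assuming the ret == 1 path really does fall off the end of handle_cpu_signal().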