On Wed, Jul 19, 2017 at 09:20:52AM +0530, Nikunj A Dadhania wrote: > David Gibson <da...@gibson.dropbear.id.au> writes: > > > On Tue, Jul 18, 2017 at 10:53:01AM +0530, Nikunj A Dadhania wrote: > >> David Gibson <da...@gibson.dropbear.id.au> writes: > >> > >> > On Mon, Jul 17, 2017 at 09:46:39AM +0530, Nikunj A Dadhania wrote: > >> >> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG. > >> >> > >> >> When reset happens, all the CPUs are in halted state. First CPU is > >> >> brought out > >> >> of reset and secondary CPUs would be initialized by the guest kernel > >> >> using a > >> >> rtas call start-cpu. > >> >> > >> >> However, in case of TCG, decrementer interrupts keep on coming and > >> >> waking the > >> >> secondary CPUs up. > >> >> > >> >> These secondary CPUs would see the decrementer interrupt pending, which > >> >> makes > >> >> cpu::has_work() to bring them out of wait loop and start executing > >> >> tcg_exec_cpu(). > >> >> > >> >> The problem with this is all the CPUs wake up and start booting SLOF > >> >> image, > >> >> causing the following exception(4 CPUs TCG VM): > >> > > >> > Ok, I'm still trying to understand why the behaviour on reboot is > >> > different from the first boot. > >> > >> During first boot, the cpu is in the stopped state, so > >> cpus.c:cpu_thread_is_idle returns true and CPU remains in halted state > >> until rtas start-cpu. Therefore, we never check the cpu_has_work() > >> > >> In case of reboot, all CPUs are resumed after reboot. So we check the > >> next condition cpu_has_work() in cpu_thread_is_idle(), where we see a > >> DECR interrupt and remove the CPU from halted state as the CPU has > >> work. > > > > Ok, so it sounds like we should set stopped on all the secondary CPUs > > on reset as well. What's causing them to be resumed after the reset > > at the moment? > > That is part of the main loop in vl.c, when reset is requested. All the > vcpus are paused (stopped == true) then system reset is issued, and all > cpus are resumed (stopped == false). Which is correct.
is it? Seems we have a different value of 'stopped' on the first boot compared to reoboots, which doesn't seem right. > static bool main_loop_should_exit(void) > { > [...] > request = qemu_reset_requested(); > if (request) { > pause_all_vcpus(); > qemu_system_reset(request); > resume_all_vcpus(); > if (!runstate_check(RUN_STATE_RUNNING) && > !runstate_check(RUN_STATE_INMIGRATE)) { > runstate_set(RUN_STATE_PRELAUNCH); > } > } > [...] > } > > The CPUs are in halted state, i.e. cpu::halted = 1. We have set > cpu::halted = 0 for the primary CPU, aka first_cpu, in > spapr.c::ppc_spapr_reset() > > In case of TCG, we have a decr callback (per CPU) scheduled once the > machine is started which keeps on running and interrupting irrespective > of the state of the machine. Right. The thing is "halted" means waiting-for-interrupt; it's mostly used for things like x86 hlt instruction, H_CEDE, short-term nap modes and so forth. We're abusing it a little for keeping the secondary CPUs offline until they're explicitly started. Trouble is, there isn't a clearly better option - stopped is automatically turned off after reset as above, so it can't be used for "stopped under firmware / hypervisor authority". > tb_env->decr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, &cpu_ppc_decr_cb, cpu); > > Which keeps on queueing the interrupt to the CPUs. All the other CPUs > are in a tight loop checking cpu_thread_is_idle(), which returns false > when it sees the decrementer interrupt, the cpu starts executing: > > cpu_exec() > -> cpu_handle_halt() > -> sees cpu_has_work() and sets cpu->halted = 0 > > And the execution resumes, when it shouldnt have. Ideally, for secondary > CPUs, cpu::halted flag is cleared in rtas start-cpu call. > > Initially, I thought we should not have interrupt during reset state. > That was the reason of ignoring decr interrupt when msr_ee was disabled > in my previous patch. BenH suggested that it is wrong from HW > perspective. > > Regards, > Nikunj > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature