On 01/18/17 11:19, Laszlo Ersek wrote:
> On 01/18/17 11:03, Igor Mammedov wrote:
>> On Tue, 17 Jan 2017 19:53:21 +0100
>> Laszlo Ersek <ler...@redhat.com> wrote:
>>
[snip]
>>> This is the code I wrote (extracting a new ich9_apm_broadcast_smi()
>>> function from ich9_apm_ctrl_changed(), due to size reasons):
>>>
>>>> static void ich9_apm_broadcast_smi(void)
>>>> {
>>>>     CPUState *cs;
>>>>
>>>>     cpu_synchronize_all_states(); /* <--------- mark this */
>> it probably should be executed after pause_all_vcpus(),
>> maybe it will help.
>>
>>>>     CPU_FOREACH(cs) {
>>>>         X86CPU *cpu = X86_CPU(cs);
>>>>         CPUX86State *env = &cpu->env;
>>>>
>>>>         if (env->smbase == 0x30000 && env->eip == 0xfff0) {
>>>>             CPUClass *k = CPU_GET_CLASS(cs);
>>>>             uint64_t cpu_arch_id = k->get_arch_id(cs);
>>>>
>>>>             trace_ich9_apm_broadcast_smi_skip(cpu_arch_id);
>>>>             continue;
>>>>         }
>>>>
>>>>         cpu_interrupt(cs, CPU_INTERRUPT_SMI);
>>>>     }
>>>> }
>>>
>> [...]
>>> (b) If I add the cpu_synchronize_all_states() call before the loop, then
>>> the incorrect filter matches go away; QEMU sees the KVM VCPU states
>>> correctly, and the SMI is broadcast.
>>>
>>> However, in this case, the boot slows to a *crawl*. It's unbearable. All
>>> VCPUs are pegged at 100%, and you can easily follow the OVMF debug log
>>> with the naked eye, almost line by line.
>>>
>>> I didn't expect that cpu_synchronize_all_states() would be so costly,
>>> but it makes the idea of checking VCPU state in the APM_CNT handler a
>>> non-starter.
>> I wonder where the bottleneck is that slows down the guest so much.
>> What is the actual cost of calling cpu_synchronize_all_states()?
>>
>> Maybe calling pause_all_vcpus() before cpu_synchronize_all_states()
>> would help.
>
> If I prepend just a pause_all_vcpus() function call, at the top, then
> the VM freezes (quite understandably) when the first SMI is raised via
> APM_CNT.
>
> If I, instead, bracket the function body with pause_all_vcpus() and
> resume_all_vcpus(), like this:
>
>     static void ich9_apm_broadcast_smi(void)
>     {
>         CPUState *cs;
>
>         pause_all_vcpus();
>         cpu_synchronize_all_states();
>         CPU_FOREACH(cs) {
>             X86CPU *cpu = X86_CPU(cs);
>             CPUX86State *env = &cpu->env;
>
>             if (env->smbase == 0x30000 && env->eip == 0xfff0) {
>                 CPUClass *k = CPU_GET_CLASS(cs);
>                 uint64_t cpu_arch_id = k->get_arch_id(cs);
>
>                 trace_ich9_apm_broadcast_smi_skip(cpu_arch_id);
>                 continue;
>             }
>
>             cpu_interrupt(cs, CPU_INTERRUPT_SMI);
>         }
>         resume_all_vcpus();
>     }
>
> then I see the following symptoms:
> - rather than 4 VCPUs being pegged at 100%, only one VCPU is pegged at
>   100%
> - the boot process, as observed from the OVMF log, is just as slow
>   (= crawling) as without the VCPU pausing/resuming (i.e., like with
>   cpu_synchronize_all_states() only); so ultimately the
>   pausing/resuming doesn't help.

I also tried this variant (bracketing only the sync call with
pause/resume):

    static void ich9_apm_broadcast_smi(void)
    {
        CPUState *cs;

        pause_all_vcpus();
        cpu_synchronize_all_states();
        resume_all_vcpus();
        CPU_FOREACH(cs) {
            X86CPU *cpu = X86_CPU(cs);
            CPUX86State *env = &cpu->env;

            if (env->smbase == 0x30000 && env->eip == 0xfff0) {
                CPUClass *k = CPU_GET_CLASS(cs);
                uint64_t cpu_arch_id = k->get_arch_id(cs);

                trace_ich9_apm_broadcast_smi_skip(cpu_arch_id);
                continue;
            }

            cpu_interrupt(cs, CPU_INTERRUPT_SMI);
        }
    }

This behaves identically to the version where pause/resume bracket the
entire function body (i.e., one VCPU pegged, super-slow boot progress).

Laszlo