Re: [PATCH] xen/sched: fix get_cpu_idle_time() for smt=0 suspend/resume
On Wed, 2021-08-18 at 12:21 +0200, Juergen Gross wrote: > Fix that by letting get_cpu_idle_time() deal with this case. > > Fixes: 132cbe8f35632fb2 ("sched: fix get_cpu_idle_time() with core > scheduling") > Reported-by: Marek Marczykowski-Górecki > > Signed-off-by: Juergen Gross > Tested-by: Marek Marczykowski-Górecki > > Mmm... This is an interesting one. In fact, this fix is not only correct, it's also simple, effective and (I guess) easy enough to backport. Considering all these things together with the fact that we have an open issue, I'm going to provide my: Acked-by: Dario Faggioli (and this remains valid with Jan's proposed change done upon committing.) That said, in the long run... > --- > An alternative way to fix the issue would be to keep the > sched_resource > of offline cpus allocated like we already do with idle vcpus and > units. > This fix would be more intrusive, but it would avoid similar other > bugs > like this one. > ... it would be probably interesting to go this route, as it looks both more consistent and future proof (I mean implement it proactively, independently of issues... when/if we have time, of course!) Thanks and Regards -- Dario Faggioli, Ph.D http://about.me/dario.faggioli Virtualization Software Engineer SUSE Labs, SUSE https://www.suse.com/ --- <> (Raistlin Majere) signature.asc Description: This is a digitally signed message part
Re: [PATCH] xen/sched: fix get_cpu_idle_time() for smt=0 suspend/resume
On 18.08.21 12:35, Jan Beulich wrote: On 18.08.2021 12:21, Juergen Gross wrote: With smt=0 during a suspend/resume cycle of the machine the threads which have been parked before will briefly come up again. This can result in problems e.g. with cpufreq driver being active as this will call into get_cpu_idle_time() for a cpu without initialized scheduler data. Fix that by letting get_cpu_idle_time() deal with this case. Fixes: 132cbe8f35632fb2 ("sched: fix get_cpu_idle_time() with core scheduling") Reported-by: Marek Marczykowski-Górecki Signed-off-by: Juergen Gross Tested-by: Marek Marczykowski-Górecki Reviewed-by: Jan Beulich --- a/xen/common/sched/core.c +++ b/xen/common/sched/core.c @@ -337,7 +337,7 @@ uint64_t get_cpu_idle_time(unsigned int cpu) struct vcpu_runstate_info state = { 0 }; const struct vcpu *v = idle_vcpu[cpu]; -if ( cpu_online(cpu) && v ) +if ( cpu_online(cpu) && v && get_sched_res(cpu) ) vcpu_runstate_get(v, ); My earlier question was aiming at getting rid of the (now) middle part of the condition; I thought this may be okay to do as a secondary change here. But perhaps you intentionally left it there, so I'm unsure whether to suggest to make the adjustment while committing (awaiting a maintainer ack first anyway). Ah, okay. Yes, I think the test of v being non-NULL can be removed. Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH] xen/sched: fix get_cpu_idle_time() for smt=0 suspend/resume
On 18.08.2021 12:21, Juergen Gross wrote: > With smt=0 during a suspend/resume cycle of the machine the threads > which have been parked before will briefly come up again. This can > result in problems e.g. with cpufreq driver being active as this will > call into get_cpu_idle_time() for a cpu without initialized scheduler > data. > > Fix that by letting get_cpu_idle_time() deal with this case. > > Fixes: 132cbe8f35632fb2 ("sched: fix get_cpu_idle_time() with core > scheduling") > Reported-by: Marek Marczykowski-Górecki > Signed-off-by: Juergen Gross > Tested-by: Marek Marczykowski-Górecki Reviewed-by: Jan Beulich > --- a/xen/common/sched/core.c > +++ b/xen/common/sched/core.c > @@ -337,7 +337,7 @@ uint64_t get_cpu_idle_time(unsigned int cpu) > struct vcpu_runstate_info state = { 0 }; > const struct vcpu *v = idle_vcpu[cpu]; > > -if ( cpu_online(cpu) && v ) > +if ( cpu_online(cpu) && v && get_sched_res(cpu) ) > vcpu_runstate_get(v, ); My earlier question was aiming at getting rid of the (now) middle part of the condition; I thought this may be okay to do as a secondary change here. But perhaps you intentionally left it there, so I'm unsure whether to suggest to make the adjustment while committing (awaiting a maintainer ack first anyway). Jan
[PATCH] xen/sched: fix get_cpu_idle_time() for smt=0 suspend/resume
With smt=0 during a suspend/resume cycle of the machine the threads which have been parked before will briefly come up again. This can result in problems e.g. with cpufreq driver being active as this will call into get_cpu_idle_time() for a cpu without initialized scheduler data. Fix that by letting get_cpu_idle_time() deal with this case. Fixes: 132cbe8f35632fb2 ("sched: fix get_cpu_idle_time() with core scheduling") Reported-by: Marek Marczykowski-Górecki Signed-off-by: Juergen Gross Tested-by: Marek Marczykowski-Górecki --- An alternative way to fix the issue would be to keep the sched_resource of offline cpus allocated like we already do with idle vcpus and units. This fix would be more intrusive, but it would avoid similar other bugs like this one. --- xen/common/sched/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c index 6d34764d38..9ac1b01ca8 100644 --- a/xen/common/sched/core.c +++ b/xen/common/sched/core.c @@ -337,7 +337,7 @@ uint64_t get_cpu_idle_time(unsigned int cpu) struct vcpu_runstate_info state = { 0 }; const struct vcpu *v = idle_vcpu[cpu]; -if ( cpu_online(cpu) && v ) +if ( cpu_online(cpu) && v && get_sched_res(cpu) ) vcpu_runstate_get(v, ); return state.time[RUNSTATE_running]; -- 2.26.2