Nathan Lynch's on May 2, 2019 12:57 am:
> Hi Thiago,
>
> Thiago Jung Bauermann <bauer...@linux.ibm.com> writes:
>> Nathan Lynch <nath...@linux.ibm.com> writes:
>>> Thiago Jung Bauermann <bauer...@linux.ibm.com> writes:
>>>> +	while (true) {
>>>>  		cpu_status = smp_query_cpu_stopped(pcpu);
>>>>  		if (cpu_status == QCSS_STOPPED ||
>>>>  		    cpu_status == QCSS_HARDWARE_ERROR)
>>>>  			break;
>>>> -		cpu_relax();
>>>> +		udelay(100);
>>>>  	}
>>>>  }
>>>
>>> I agree with looping indefinitely but doesn't it need a cond_resched()
>>> or similar check?
>>
>> If there's no kernel or hypervisor bug, it shouldn't take more than a
>> few tens of ms for this loop to complete (Gautham measured a maximum of
>> 10 ms on a POWER9 with an earlier version of this patch).
>
> 10ms is twice the default scheduler quantum...
>
>> In case of bugs related to CPU hotplug (either in the kernel or the
>> hypervisor), I was hoping that the resulting lockup warnings would be a
>> good indicator that something is wrong. :-)
>
> Not convinced we should assume something is wrong if it takes a few
> dozen ms to complete the operation.

Right, and if there is no kernel or hypervisor bug then it will stop
eventually :)
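To make Nathan's earlier cond_resched() question concrete, the loop from
the hunk above with a resched point in each iteration would look
something like this (a sketch of the idea only, not the actual patch):

	while (true) {
		cpu_status = smp_query_cpu_stopped(pcpu);
		if (cpu_status == QCSS_STOPPED ||
		    cpu_status == QCSS_HARDWARE_ERROR)
			break;
		cond_resched();	/* yield if another task wants this CPU */
		udelay(100);
	}

Since pseries_cpu_die() is called in process context on the CPU driving
the offline operation, a voluntary resched point here should be legal
and cheap.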
> AFAIK we don't have any guarantees
> about the maximum latency of stop-self, and it can be affected by other
> activity in the system, whether we're in shared processor mode, etc. Not
> to mention smp_query_cpu_stopped has to acquire the global RTAS lock and
> be serialized with other tasks calling into RTAS. So I am concerned
> about generating spurious warnings here.

Agreed.

> If for whatever reason the operation is taking too long, drmgr or
> whichever application is initiating the change will appear to stop
> making progress. It's not too hard to find out what's going on with
> facilities like perf or /proc/pid/stack.
>
>> Though perhaps adding a cond_resched() every 10 ms or so, with a
>> WARN_ON() if it loops for more than 50 ms would be better.
>
> A warning doesn't seem appropriate to me, and cond_resched should be
> invoked in each iteration. Or just msleep(1) in each iteration would be
> fine, I think.
>
> But I'd like to bring in some more context -- here is the body of
> pseries_cpu_die:
>
> static void pseries_cpu_die(unsigned int cpu)
> {
> 	int tries;
> 	int cpu_status = 1;
> 	unsigned int pcpu = get_hard_smp_processor_id(cpu);
>
> 	if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
> 		cpu_status = 1;
> 		for (tries = 0; tries < 5000; tries++) {
> 			if (get_cpu_current_state(cpu) == CPU_STATE_INACTIVE) {
> 				cpu_status = 0;
> 				break;
> 			}
> 			msleep(1);
> 		}
> 	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
>
> 		for (tries = 0; tries < 25; tries++) {
> 			cpu_status = smp_query_cpu_stopped(pcpu);
> 			if (cpu_status == QCSS_STOPPED ||
> 			    cpu_status == QCSS_HARDWARE_ERROR)
> 				break;
> 			cpu_relax();
> 		}
> 	}
>
> This patch alters the behavior of the second loop (the CPU_STATE_OFFLINE
> branch). The CPU_STATE_INACTIVE branch is used when the offline behavior
> is to use H_CEDE instead of stop-self, correct?
>
> And isn't entering H_CEDE expected to be quite a bit faster than
> stop-self? If so, why does that path get five whole seconds[*] while
> we're bikeshedding about tens of milliseconds for stop-self? :-)
>
> [*] And should it be made to retry indefinitely as well?

I think so.

Thanks,
Nick
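For illustration, pseries_cpu_die() with the suggestions above applied
-- msleep(1) per iteration and no retry cap in either branch -- would
look roughly like this (a sketch only, not the code that was eventually
merged):

	static void pseries_cpu_die(unsigned int cpu)
	{
		int cpu_status = 1;
		unsigned int pcpu = get_hard_smp_processor_id(cpu);

		if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
			/* H_CEDE path: wait until the CPU reports inactive. */
			while (get_cpu_current_state(cpu) != CPU_STATE_INACTIVE)
				msleep(1);
			cpu_status = 0;
		} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
			/* stop-self path: poll RTAS until the CPU is stopped. */
			while (true) {
				cpu_status = smp_query_cpu_stopped(pcpu);
				if (cpu_status == QCSS_STOPPED ||
				    cpu_status == QCSS_HARDWARE_ERROR)
					break;
				msleep(1);
			}
		}
		/* ... */
	}

Sleeping between smp_query_cpu_stopped() calls both yields the processor
and avoids hammering the global RTAS lock, which speaks to the
serialization concern raised above.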