Nathan Lynch's on May 2, 2019 12:57 am:
> Hi Thiago,
>
> Thiago Jung Bauermann <bauer...@linux.ibm.com> writes:
>> Nathan Lynch <nath...@linux.ibm.com> writes:
>>> Thiago Jung Bauermann <bauer...@linux.ibm.com> writes:
>>>> +	while (true) {
>>>>  		cpu_status = smp_query_cpu_stopped(pcpu);
>>>>  		if (cpu_status == QCSS_STOPPED ||
>>>>  		    cpu_status == QCSS_HARDWARE_ERROR)
>>>>  			break;
>>>> -		cpu_relax();
>>>> +		udelay(100);
>>>>  	}
>>>>  }
>>>
>>> I agree with looping indefinitely but doesn't it need a cond_resched()
>>> or similar check?
>>
>> If there's no kernel or hypervisor bug, it shouldn't take more than a
>> few tens of ms for this loop to complete (Gautham measured a maximum of
>> 10 ms on a POWER9 with an earlier version of this patch).
>
> 10ms is twice the default scheduler quantum...
>
>> In case of bugs related to CPU hotplug (either in the kernel or the
>> hypervisor), I was hoping that the resulting lockup warnings would be a
>> good indicator that something is wrong. :-)
>
> Not convinced we should assume something is wrong if it takes a few
> dozen ms to complete the operation.

Right, and if there is no kernel or hypervisor bug then it will stop
eventually :)
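To make Nathan's earlier cond_resched() question concrete, the loop from
the hunk above with a resched point in each iteration would look
something like this (a sketch of the idea only, not the actual patch):

	while (true) {
		cpu_status = smp_query_cpu_stopped(pcpu);
		if (cpu_status == QCSS_STOPPED ||
		    cpu_status == QCSS_HARDWARE_ERROR)
			break;
		cond_resched();	/* yield if another task wants this CPU */
		udelay(100);
	}

Since pseries_cpu_die() is called in process context on the CPU driving
the offline operation, a voluntary resched point here should be legal
and cheap.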
> AFAIK we don't have any guarantees
> about the maximum latency of stop-self, and it can be affected by other
> activity in the system, whether we're in shared processor mode, etc. Not
> to mention smp_query_cpu_stopped has to acquire the global RTAS lock and
> be serialized with other tasks calling into RTAS. So I am concerned
> about generating spurious warnings here.

Agreed.

> If for whatever reason the operation is taking too long, drmgr or
> whichever application is initiating the change will appear to stop
> making progress. It's not too hard to find out what's going on with
> facilities like perf or /proc/pid/stack.
>
>> Though perhaps adding a cond_resched() every 10 ms or so, with a
>> WARN_ON() if it loops for more than 50 ms would be better.
>
> A warning doesn't seem appropriate to me, and cond_resched should be
> invoked in each iteration. Or just msleep(1) in each iteration would be
> fine, I think.
>
> But I'd like to bring in some more context -- here is the body of
> pseries_cpu_die:
>
> static void pseries_cpu_die(unsigned int cpu)
> {
> 	int tries;
> 	int cpu_status = 1;
> 	unsigned int pcpu = get_hard_smp_processor_id(cpu);
>
> 	if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
> 		cpu_status = 1;
> 		for (tries = 0; tries < 5000; tries++) {
> 			if (get_cpu_current_state(cpu) == CPU_STATE_INACTIVE) {
> 				cpu_status = 0;
> 				break;
> 			}
> 			msleep(1);
> 		}
> 	} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
>
> 		for (tries = 0; tries < 25; tries++) {
> 			cpu_status = smp_query_cpu_stopped(pcpu);
> 			if (cpu_status == QCSS_STOPPED ||
> 			    cpu_status == QCSS_HARDWARE_ERROR)
> 				break;
> 			cpu_relax();
> 		}
> 	}
>
> This patch alters the behavior of the second loop (the CPU_STATE_OFFLINE
> branch). The CPU_STATE_INACTIVE branch is used when the offline behavior
> is to use H_CEDE instead of stop-self, correct?
>
> And isn't entering H_CEDE expected to be quite a bit faster than
> stop-self? If so, why does that path get five whole seconds[*] while
> we're bikeshedding about tens of milliseconds for stop-self? :-)
>
> [*] And should it be made to retry indefinitely as well?

I think so.

Thanks,
Nick
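For illustration, pseries_cpu_die() with the suggestions above applied
-- msleep(1) per iteration and no retry cap in either branch -- would
look roughly like this (a sketch only, not the code that was eventually
merged):

	static void pseries_cpu_die(unsigned int cpu)
	{
		int cpu_status = 1;
		unsigned int pcpu = get_hard_smp_processor_id(cpu);

		if (get_preferred_offline_state(cpu) == CPU_STATE_INACTIVE) {
			/* H_CEDE path: wait until the CPU reports inactive. */
			while (get_cpu_current_state(cpu) != CPU_STATE_INACTIVE)
				msleep(1);
			cpu_status = 0;
		} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
			/* stop-self path: poll RTAS until the CPU is stopped. */
			while (true) {
				cpu_status = smp_query_cpu_stopped(pcpu);
				if (cpu_status == QCSS_STOPPED ||
				    cpu_status == QCSS_HARDWARE_ERROR)
					break;
				msleep(1);
			}
		}
		/* ... */
	}

Sleeping between smp_query_cpu_stopped() calls both yields the processor
and avoids hammering the global RTAS lock, which speaks to the
serialization concern raised above.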