Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2019-01-16 Thread Gautham R Shenoy
Hello Michael,

On Mon, Jan 14, 2019 at 12:11:44PM -0600, Michael Bringmann wrote:
> On 1/9/19 12:08 AM, Gautham R Shenoy wrote:
> 
> > I did some testing during the holidays. Here are the observations:
> > 
> > 1) With just your patch (without any additional debug patch), if I run
> > DLPAR on /off operations on a system that has SMT=off, I am able to
> > see a crash involving RTAS stack corruption within an hour's time.
> > 
> > 2) With the debug patch (appended below) which has additional debug to
> > capture the callers of stop-self, start-cpu, set-power-levels, the
> > system is able to perform DLPAR on/off operations on a system with
> > SMT=off for three days. And then, it crashed with the dead CPU showing
> > a "Bad kernel stack pointer". From this log, I can clearly
> > see that there were no concurrent calls to stop-self, start-cpu,
> > set-power-levels. The only concurrent RTAS calls were the dying CPU
> > calling "stop-self", and the CPU running the DLPAR operation calling
> > "query-cpu-stopped-state". The crash signature is appended below as
> > well.
> > 
> > 3) Modifying your patch to remove the udelay and increase the loop
> > count from 25 to 1000 doesn't improve the situation. We are still able
> > to see the crash.
> > 
> > 4) With my patch, even without any additional debug, I was able to
> > observe the system run the tests successfully for over a week (I
> > started the tests before the Christmas weekend, and forgot to turn it
> > off!)
> 
> So does this mean that the problem is fixed with your patch?

No. On the contrary I think my patch unable to exploit the possible
race window.  From a technical point of view, Thiago's patch does the
right things

  - It waits for the target CPU to come out of CEDE and set its state
to CPU_STATE_OFFLINE.
  
  - Only then it then makes the "query-cpu-stopped-state" rtas call in
a loop, while giving sufficient delay between successive
queries. This reduces the unnecessary rtas-calls.

In my patch, I don't do any of this, but simply keep calling
"query-cpu-stopped-state" call in a loop for 4000 iterations.

That said, if I incorporate the wait for the target CPU to set its
state to CPU_STATE_OFFLINE in my patch and then make the
"query-cpu-stopped-state" rtas call, then, even I am able to get the
crash with the "RTAS CALL BUFFER CORRUPTION" message.

I think that incorporating the wait for the target CPU set its state
to CPU_STATE_OFFLINE increases the probability that
"query-cpu-stopped-state" and "stop-self" rtas calls get called more
or less at the same time. However since concurrent invocations of
these rtas-calls is allowed by the PAPR, it should not result in a
"RTAS CALL BUFFER CORRUPTION". Am I missing something here ?

> 
> > 
> > It appears that there is a narrow race window involving rtas-stop-self
> > and rtas-query-cpu-stopped-state calls that can be observed with your
> > patch. Adding any printk's seems to reduce the probability of hitting
> > this race window. It might be worth the while to check with RTAS
> > folks, if they suspect something here.
> 
> What would the RTAS folks be looking at here?  The 'narrow race window'
> is with respect to a patch that it sound like we should not be
> using.

IMHO, if the race-window exists, it would be better to confirm and fix
it, given that we have a patch that can exploit it consistently.

> 
> Thanks.
> Michael
> 
> -- 
> Michael W. Bringmann
> Linux Technology Center
> IBM Corporation
> Tie-Line  363-5196
> External: (512) 286-5196
> Cell:   (512) 466-0650
> m...@linux.vnet.ibm.com

--
Thanks and Regards
gautham.



Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2019-01-14 Thread Michael Bringmann
On 1/9/19 12:08 AM, Gautham R Shenoy wrote:

> I did some testing during the holidays. Here are the observations:
> 
> 1) With just your patch (without any additional debug patch), if I run
> DLPAR on /off operations on a system that has SMT=off, I am able to
> see a crash involving RTAS stack corruption within an hour's time.
> 
> 2) With the debug patch (appended below) which has additional debug to
> capture the callers of stop-self, start-cpu, set-power-levels, the
> system is able to perform DLPAR on/off operations on a system with
> SMT=off for three days. And then, it crashed with the dead CPU showing
> a "Bad kernel stack pointer". From this log, I can clearly
> see that there were no concurrent calls to stop-self, start-cpu,
> set-power-levels. The only concurrent RTAS calls were the dying CPU
> calling "stop-self", and the CPU running the DLPAR operation calling
> "query-cpu-stopped-state". The crash signature is appended below as
> well.
> 
> 3) Modifying your patch to remove the udelay and increase the loop
> count from 25 to 1000 doesn't improve the situation. We are still able
> to see the crash.
> 
> 4) With my patch, even without any additional debug, I was able to
> observe the system run the tests successfully for over a week (I
> started the tests before the Christmas weekend, and forgot to turn it
> off!)

So does this mean that the problem is fixed with your patch?

> 
> It appears that there is a narrow race window involving rtas-stop-self
> and rtas-query-cpu-stopped-state calls that can be observed with your
> patch. Adding any printk's seems to reduce the probability of hitting
> this race window. It might be worth the while to check with RTAS
> folks, if they suspect something here.

What would the RTAS folks be looking at here?  The 'narrow race window'
is with respect to a patch that it sound like we should not be using.

Thanks.
Michael

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:   (512) 466-0650
m...@linux.vnet.ibm.com



Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2019-01-08 Thread Gautham R Shenoy
Hello Thiago,

Wish you a happy 2019!

On Sat, Dec 08, 2018 at 12:40:52AM -0200, Thiago Jung Bauermann wrote:
> 
> Gautham R Shenoy  writes:
> > On Fri, Dec 07, 2018 at 04:13:11PM +0530, Gautham R Shenoy wrote:
> >> Sure. I will test the patch and report back.
> >
> > I added the following debug patch on top of your patch, and after an
> > hour's run, the system crashed. Appending the log at the end.
> 
> Thank you very much for testing! Your debug patch was very helpful as
> well.
> 
> > I suppose we still need to increase the number of tries since we wait
> > only for 2.5ms looping before giving up.
> 
> Do you think it would have helped? In the debug output you posted I
> would have expected the following message to show up if the loop
> finished too early, and it didn't:
> 
> "Querying DEAD? cpu %i (%i) shows %i\n"
> 
> So I don't think increasing the loop length would have made a
> difference. In fact, the call to smp_query_cpu_stopped() always
> succeeded on the first iteration.

I did some testing during the holidays. Here are the observations:

1) With just your patch (without any additional debug patch), if I run
DLPAR on /off operations on a system that has SMT=off, I am able to
see a crash involving RTAS stack corruption within an hour's time.

2) With the debug patch (appended below) which has additional debug to
capture the callers of stop-self, start-cpu, set-power-levels, the
system is able to perform DLPAR on/off operations on a system with
SMT=off for three days. And then, it crashed with the dead CPU showing
a "Bad kernel stack pointer". From this log, I can clearly
see that there were no concurrent calls to stop-self, start-cpu,
set-power-levels. The only concurrent RTAS calls were the dying CPU
calling "stop-self", and the CPU running the DLPAR operation calling
"query-cpu-stopped-state". The crash signature is appended below as
well.

3) Modifying your patch to remove the udelay and increase the loop
count from 25 to 1000 doesn't improve the situation. We are still able
to see the crash.

4) With my patch, even without any additional debug, I was able to
observe the system run the tests successfully for over a week (I
started the tests before the Christmas weekend, and forgot to turn it
off!)

It appears that there is a narrow race window involving rtas-stop-self
and rtas-query-cpu-stopped-state calls that can be observed with your
patch. Adding any printk's seems to reduce the probability of hitting
this race window. It might be worth the while to check with RTAS
folks, if they suspect something here.

The Crash log with this debug patch is as follows

[DEBUG] CPU 32 waiting for CPU 130 to enter rtas
cpu 130 (hwid 130) Ready to die...
[DEBUG] CPU 32 noticed CPU 130 enter rtas: tries=0, time=539
[DEBUG] CPU 32 waiting for CPU 131 to enter rtas
cpu 131 (hwid 131) Ready to die...
[DEBUG] CPU 32 noticed CPU 131 enter rtas: tries=0, time=137
[DEBUG] CPU 32 waiting for CPU 132 to enter rtas
cpu 132 (hwid 132) Ready to die...
[DEBUG] CPU 32 noticed CPU 132 enter rtas: tries=0, time=1238
[DEBUG] CPU 32 waiting for CPU 133 to enter rtas
cpu 133 (hwid 133) Ready to die...
[DEBUG] CPU 32 noticed CPU 133 enter rtas: tries=1, time=1259
[DEBUG] CPU 32 waiting for CPU 134 to enter rtas
cpu 134 (hwid 134) Ready to die...
[DEBUG] CPU 32 noticed CPU 134 enter rtas: tries=0, time=1141
[DEBUG] CPU 32 waiting for CPU 135 to enter rtas
cpu 135 (hwid 135) Ready to die...
[DEBUG] CPU 32 noticed CPU 135 enter rtas: tries=0, time=636
cpu 120 (hwid 120) Ready to die...
[DEBUG] CPU 32 waiting for CPU 120 to enter rtas
[DEBUG] CPU 32 noticed CPU 120 enter rtas: tries=0, time=53
[DEBUG] CPU 32 waiting for CPU 121 to enter rtas
cpu 121 (hwid 121) Ready to die...
Bad kernel stack pointer 1fafb400 at 1fafb4b0
Bad kernel stack pointer 1fafb4b0 at 1ec15270
Oops: Bad kernel stack pointer, sig: 6 [#1]
LE SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 121 PID: 0 Comm: swapper/121 Not tainted 
4.20.0-thiago-original-with-debug+ #57
NIP:  1ec15270 LR: 1ec1526c CTR: 1ecd26c4
REGS: c0001e577d30 TRAP: 0300   Not tainted  
(4.20.0-thiago-original-with-debug+)
MSR:  80001000   CR: 4800  XER: 0020
CFAR: 1ecd27a8 DAR: 8108 DSISR: 4000 IRQMASK: 
80009033
GPR00: 1ec1526c 1fafb4b0  
GPR04: 01a40968 00e0 dfe8 0018
GPR08: 1f82ae00 1ed025d4 1f827850 
GPR12: 01b6d998 c0001eb4e080 c003a1bdbf90 1eea3020
GPR16:  c006fdc45100 c004c8b0 c13d5300
GPR20: c006fdc45300 0008 c19d2cf8 c13d6888
GPR24: 0079 c13d688c 0002 c13d688c
GPR28: c19cecf0 03c8  1fafb4b0
NIP [1ec15270] 0x1ec15270
LR [1ec1526c] 0x1ec1526c
Call Trace:
Instruction dump:
 

Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2018-12-11 Thread Michael Bringmann
Note from Scott Mayes on latest crash:

Michael,

Since the partition crashed, I was able to get the last .2 seconds worth of 
RTAS call trace leading up to the crash.

Best I could tell from that bit of trace was that the removal of a processor 
involved the following steps:
-- Call to stop-self for a given thread
-- Repeated calls to query-cpu-stopped-state (which eventually indicated the 
thread was stopped)
-- Call to get-sensor-state for the thread to check its entity-state (9003) 
sensor which returned 'dr-entity-present'
-- Call to set-indicator to set the isolation-state (9001) indicator to ISOLATE 
state
-- Call to set-indicator to set the allocation-state (9003) indicator to 
UNUSABLE state

I noticed one example of thread x28 getting through all of these steps just 
fine, but for thread x20, although the
query-cpu-stopped state returned 0 status (STOPPED), a subsequent call to 
set-indicator to ISOLATE
failed.  This failure was near the end of the trace, but was not the very last 
RTAS call made in the trace.
The set-indicator failure reported to Linux was a -9001 (Valid outstanding 
translation) which was mapped
from a 0x502 (Invalid thread state) return code from PHYP's H_SET_DR_STATE 
h-call.

On 12/10/2018 02:31 PM, Thiago Jung Bauermann wrote:
> 
> Hello Michael,
> 
> Michael Bringmann  writes:
> 
>> I have asked Scott Mayes to take a look at one of these crashes from
>> the phyp side.  I will let you know if he finds anything notable.
> 
> Thanks! It might make sense to test whether booting with
> cede_offline=off makes the bug go away.

Scott is looking at the system.  I will try once he is finished.

> 
> One suspicion I have is regarding the code handling CPU_STATE_INACTIVE.
>>From what I understand, it is a powerpc-specific CPU state and from the
> perspective of the generic CPU hotplug state machine, inactive CPUs are
> already fully offline. Which means that the locking performed by the
> generic code state machine doesn't apply to transitioning CPUs from
> INACTIVE to OFFLINE state. Perhaps the bug is that there is more than
> one CPU making that transition at the same time? That would cause two
> CPUs to call RTAS stop-self.
> 
> I haven't checked whether this is really possible or not, though. It's
> just a conjecture.

Michael

> 
> --
> Thiago Jung Bauermann
> IBM Linux Technology Center
> 
> 

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:   (512) 466-0650
m...@linux.vnet.ibm.com



Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2018-12-10 Thread Thiago Jung Bauermann


Hello Michael,

Michael Bringmann  writes:

> I have asked Scott Mayes to take a look at one of these crashes from
> the phyp side.  I will let you know if he finds anything notable.

Thanks! It might make sense to test whether booting with
cede_offline=off makes the bug go away.

One suspicion I have is regarding the code handling CPU_STATE_INACTIVE.
>From what I understand, it is a powerpc-specific CPU state and from the
perspective of the generic CPU hotplug state machine, inactive CPUs are
already fully offline. Which means that the locking performed by the
generic code state machine doesn't apply to transitioning CPUs from
INACTIVE to OFFLINE state. Perhaps the bug is that there is more than
one CPU making that transition at the same time? That would cause two
CPUs to call RTAS stop-self.

I haven't checked whether this is really possible or not, though. It's
just a conjecture.

--
Thiago Jung Bauermann
IBM Linux Technology Center



Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2018-12-10 Thread Michael Bringmann
I have asked Scott Mayes to take a look at one of these crashes from
the phyp side.  I will let you know if he finds anything notable.

Michael

On 12/07/2018 08:40 PM, Thiago Jung Bauermann wrote:
> 
> Gautham R Shenoy  writes:
>> On Fri, Dec 07, 2018 at 04:13:11PM +0530, Gautham R Shenoy wrote:
>>> Sure. I will test the patch and report back.
>>
>> I added the following debug patch on top of your patch, and after an
>> hour's run, the system crashed. Appending the log at the end.
> 
> Thank you very much for testing! Your debug patch was very helpful as
> well.
> 
>> I suppose we still need to increase the number of tries since we wait
>> only for 2.5ms looping before giving up.
> 
> Do you think it would have helped? In the debug output you posted I
> would have expected the following message to show up if the loop
> finished too early, and it didn't:
> 
> "Querying DEAD? cpu %i (%i) shows %i\n"
> 
> So I don't think increasing the loop length would have made a
> difference. In fact, the call to smp_query_cpu_stopped() always
> succeeded on the first iteration.
> 
> I think there is something else going on which we don't fully understand
> yet. From your other email:
> 
>> I agree that the Kernel has to respect RTAS's restriction. The PAPR
>> v2.8.1, Requirement R1-7.2.3-8 under section 7.2.3 says the following:
>>
>> "The stop-self service needs to be serialized with calls to the
>>  stop-self, start-cpu, and set-power-level services. The OS must
>>  be able to call RTAS services on other processors while the
>>  processor is stopped or being stopped"
>>
>> Thus the onus is on the OS to ensure that there are no concurrent rtas
>> calls with "stop-self" token.
> 
> As you say perhaps there's another call to stop-self, start-cpu or
> set-power-level being made concurrently. I don't currently see how more
> than one stop-self or start-cpu call could be in flight at the same time
> given that there are a number of locks being grabbed during CPU hotplug
> and unplug. OTOH the CPU that actually calls stop-self doesn't seem to
> grab any locks itself so it's a possibility.
> 
> As for set-power-level, it's only used in the case of PCI hotplug from
> what I can see, and that isn't part of the picture in this case, right?
> 
> We could address this problem directly by adding another lock separate
> from rtas.lock to serialize just these calls. The challenge is with
> stop-self, because the CPU calling it will never return to release the
> lock. Is it possible to grab a lock (or down a semaphore) in the CPU
> calling stop-self and then release the lock (or up the semaphore) in the
> CPU running pseries_cpu_die()?
> 
>>> There's also a race between the CPU driving the unplug and the CPU
>>> being unplugged which I think is not easy for the CPU being
>>> unplugged to win, which makes the busy loop in pseries_cpu_die() a
>>> bit fragile. I describe the race in the patch description.
>>>
>>> My solution to make the race less tight is to make the CPU driving
>>> the unplug to only start the busy loop only after the CPU being
>>> unplugged is in the CPU_STATE_OFFLINE state. At that point, we know
>>> that it either is about to call RTAS or it already has.
>>
>> Ah, yes this is good optimization. Though, I think we ought to
>> unconditionally wait until the target CPU has woken up from CEDE and
>> changed its state to CPU_STATE_OFFLINE. After if PROD failed, then we
>> would have caught it in dlpar_offline_cpu() itself.
> 
> I recently saw a QEMU-implemented hcall (H_LOGICAL_CI_LOAD) return
> success when it had been given an invalid memory address to load from,
> so my confidence in the error reporting of hcalls is a bit shaken. :-)
> 
> In that case the CPU would wait forever for the CPU state to change. If
> you believe 100 ms is too short a timeout, we could make it 500 ms or
> even 1s. What do you think?
> 
>> cpu 112 (hwid 112) Ready to die...
>> [DEBUG] Waited for CPU 112 to enter rtas: tries=0, time=65
>> cpu 113 (hwid 113) Ready to die...
>> [DEBUG] Waited for CPU 113 to enter rtas: tries=0, time=1139
>> cpu 114 (hwid 114) Ready to die...
>> [DEBUG] Waited for CPU 114 to enter rtas: tries=0, time=1036
>> cpu 115 (hwid 115) Ready to die...
>> [DEBUG] Waited for CPU 115 to enter rtas: tries=0, time=133
>> cpu 116 (hwid 116) Ready to die...
>> [DEBUG] Waited for CPU 116 to enter rtas: tries=0, time=1231
>> cpu 117 (hwid 117) Ready to die...
>> [DEBUG] Waited for CPU 117 to enter rtas: tries=0, time=1231
>> cpu 118 (hwid 118) Ready to die...
>> [DEBUG] Waited for CPU 118 to enter rtas: tries=0, time=1231
>> cpu 119 (hwid 119) Ready to die...
>> [DEBUG] Waited for CPU 119 to enter rtas: tries=0, time=1131
>> cpu 104 (hwid 104) Ready to die...
>> [DEBUG] Waited for CPU 104 to enter rtas: tries=0, time=40
> 
> Interesting, so 1.2 ms can pass before the dying CPU actually gets close
> to making the stop-self call. And even in those cases the retry loop is
> succeeding on the first try! So this shows 

Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2018-12-07 Thread Thiago Jung Bauermann


Gautham R Shenoy  writes:
> On Fri, Dec 07, 2018 at 04:13:11PM +0530, Gautham R Shenoy wrote:
>> Sure. I will test the patch and report back.
>
> I added the following debug patch on top of your patch, and after an
> hour's run, the system crashed. Appending the log at the end.

Thank you very much for testing! Your debug patch was very helpful as
well.

> I suppose we still need to increase the number of tries since we wait
> only for 2.5ms looping before giving up.

Do you think it would have helped? In the debug output you posted I
would have expected the following message to show up if the loop
finished too early, and it didn't:

"Querying DEAD? cpu %i (%i) shows %i\n"

So I don't think increasing the loop length would have made a
difference. In fact, the call to smp_query_cpu_stopped() always
succeeded on the first iteration.

I think there is something else going on which we don't fully understand
yet. From your other email:

> I agree that the Kernel has to respect RTAS's restriction. The PAPR
> v2.8.1, Requirement R1-7.2.3-8 under section 7.2.3 says the following:
>
> "The stop-self service needs to be serialized with calls to the
>  stop-self, start-cpu, and set-power-level services. The OS must
>  be able to call RTAS services on other processors while the
>  processor is stopped or being stopped"
>
> Thus the onus is on the OS to ensure that there are no concurrent rtas
> calls with "stop-self" token.

As you say perhaps there's another call to stop-self, start-cpu or
set-power-level being made concurrently. I don't currently see how more
than one stop-self or start-cpu call could be in flight at the same time
given that there are a number of locks being grabbed during CPU hotplug
and unplug. OTOH the CPU that actually calls stop-self doesn't seem to
grab any locks itself so it's a possibility.

As for set-power-level, it's only used in the case of PCI hotplug from
what I can see, and that isn't part of the picture in this case, right?

We could address this problem directly by adding another lock separate
from rtas.lock to serialize just these calls. The challenge is with
stop-self, because the CPU calling it will never return to release the
lock. Is it possible to grab a lock (or down a semaphore) in the CPU
calling stop-self and then release the lock (or up the semaphore) in the
CPU running pseries_cpu_die()?

> > There's also a race between the CPU driving the unplug and the CPU
> > being unplugged which I think is not easy for the CPU being
> > unplugged to win, which makes the busy loop in pseries_cpu_die() a
> > bit fragile. I describe the race in the patch description.
> >
> > My solution to make the race less tight is to make the CPU driving
> > the unplug to only start the busy loop only after the CPU being
> > unplugged is in the CPU_STATE_OFFLINE state. At that point, we know
> > that it either is about to call RTAS or it already has.
>
> Ah, yes this is good optimization. Though, I think we ought to
> unconditionally wait until the target CPU has woken up from CEDE and
> changed its state to CPU_STATE_OFFLINE. After if PROD failed, then we
> would have caught it in dlpar_offline_cpu() itself.

I recently saw a QEMU-implemented hcall (H_LOGICAL_CI_LOAD) return
success when it had been given an invalid memory address to load from,
so my confidence in the error reporting of hcalls is a bit shaken. :-)

In that case the CPU would wait forever for the CPU state to change. If
you believe 100 ms is too short a timeout, we could make it 500 ms or
even 1s. What do you think?

> cpu 112 (hwid 112) Ready to die...
> [DEBUG] Waited for CPU 112 to enter rtas: tries=0, time=65
> cpu 113 (hwid 113) Ready to die...
> [DEBUG] Waited for CPU 113 to enter rtas: tries=0, time=1139
> cpu 114 (hwid 114) Ready to die...
> [DEBUG] Waited for CPU 114 to enter rtas: tries=0, time=1036
> cpu 115 (hwid 115) Ready to die...
> [DEBUG] Waited for CPU 115 to enter rtas: tries=0, time=133
> cpu 116 (hwid 116) Ready to die...
> [DEBUG] Waited for CPU 116 to enter rtas: tries=0, time=1231
> cpu 117 (hwid 117) Ready to die...
> [DEBUG] Waited for CPU 117 to enter rtas: tries=0, time=1231
> cpu 118 (hwid 118) Ready to die...
> [DEBUG] Waited for CPU 118 to enter rtas: tries=0, time=1231
> cpu 119 (hwid 119) Ready to die...
> [DEBUG] Waited for CPU 119 to enter rtas: tries=0, time=1131
> cpu 104 (hwid 104) Ready to die...
> [DEBUG] Waited for CPU 104 to enter rtas: tries=0, time=40

Interesting, so 1.2 ms can pass before the dying CPU actually gets close
to making the stop-self call. And even in those cases the retry loop is
succeeding on the first try! So this shows that changing the code to
wait for the CPU_STATE_OFFLINE state is worth it.

> *** RTAS CALL BUFFER CORRUPTION ***
> 393: rtas32_call_buff_ptr=
>  0060  0060  0060  0060 [...`...`...`...`]
>  0060  0060  0060  0060 [...`...`...`...`]
>  0060  0060  0060  0060 

Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2018-12-07 Thread Gautham R Shenoy
On Fri, Dec 07, 2018 at 04:13:11PM +0530, Gautham R Shenoy wrote:
> Hi Thiago,
> 
> 
> Sure. I will test the patch and report back.

I added the following debug patch on top of your patch, and after an
hour's run, the system crashed. Appending the log at the end.

I suppose we still need to increase the number of tries since we wait
only for 2.5ms looping before giving up.

x8-
>From 9e9e1e459cdc63540a73575af359a7bc0d43e455 Mon Sep 17 00:00:00 2001
From: "Gautham R. Shenoy" 
Date: Fri, 7 Dec 2018 16:47:25 +0530
Subject: [PATCH] Add debug on top of Thiago's patch

Signed-off-by: Gautham R. Shenoy 
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index fad3ceb..2545aab 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -214,6 +214,9 @@ static void pseries_cpu_die(unsigned int cpu)
msleep(1);
}
} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
+   u64 tb_before = mftb();
+   u64 tb_after;
+
/*
 * If the current state is not offline yet, it means that the
 * dying CPU (which is in pseries_mach_cpu_die) didn't have a
@@ -230,6 +233,11 @@ static void pseries_cpu_die(unsigned int cpu)
break;
udelay(100);
}
+
+   tb_after = mftb();
+
+   printk("[DEBUG] Waited for CPU %d to enter rtas: tries=%d, 
time=%llu\n", cpu, tries,
+   div_u64(tb_after - tb_before, tb_ticks_per_usec));
}
 
if (cpu_status != 0) {
-- 
1.9.4
x8-

cpu 112 (hwid 112) Ready to die...
[DEBUG] Waited for CPU 112 to enter rtas: tries=0, time=65
cpu 113 (hwid 113) Ready to die...
[DEBUG] Waited for CPU 113 to enter rtas: tries=0, time=1139
cpu 114 (hwid 114) Ready to die...
[DEBUG] Waited for CPU 114 to enter rtas: tries=0, time=1036
cpu 115 (hwid 115) Ready to die...
[DEBUG] Waited for CPU 115 to enter rtas: tries=0, time=133
cpu 116 (hwid 116) Ready to die...
[DEBUG] Waited for CPU 116 to enter rtas: tries=0, time=1231
cpu 117 (hwid 117) Ready to die...
[DEBUG] Waited for CPU 117 to enter rtas: tries=0, time=1231
cpu 118 (hwid 118) Ready to die...
[DEBUG] Waited for CPU 118 to enter rtas: tries=0, time=1231
cpu 119 (hwid 119) Ready to die...
[DEBUG] Waited for CPU 119 to enter rtas: tries=0, time=1131
cpu 104 (hwid 104) Ready to die...
[DEBUG] Waited for CPU 104 to enter rtas: tries=0, time=40  

*** RTAS CALL BUFFER CORRUPTION ***
393: rtas32_call_buff_ptr=
 0060  0060  0060  0060 [...`...`...`...`]
 0060  0060  0060  0060 [...`...`...`...`]
 0060  0060  0060  0060 [...`...`...`...`]
 0060 0800 E07F ACA7   00C0 [...`]
2500        [%...]
      306E 7572 [0nur]
4800 0008       [H...0nur]
394: rtas64_map_buff_ptr=
  5046 5743   4F44 4500 [PFWCODE.]
  6000     0069 [`..i]
        []
   0005    0001 []
  1A00      []
  8018 6398   0300 00C0 [..c.]
        [..c.]
cpu 105 (hwid 105) Ready to die...
Bad kernel stack pointer 1fafb6c0 at 0
Oops: Bad kernel stack pointer, sig: 6 [#1]
LE SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 105 PID: 0 Comm: swapper/105 Not tainted 4.20.0-rc5-thiago+ #45
NIP:   LR:  CTR: 007829c8
REGS: c0001e63bd30 TRAP: 0700   Not tainted  (4.20.0-rc5-thiago+)
MSR:  80081000   CR: 2804  XER: 0010
CFAR: 1ec153f0 IRQMASK: 80009033
GPR00:  1fafb6c0 1ec236a0 0040
GPR04: 00c0 0080 00046c4fb4842557 00cd
GPR08: 0001f400 1ed035dc  
GPR12:  c0001eb5e480 c003a1b53f90 1eea3e20
GPR16:  c006fd845100 c004c8b0 c13d5300
GPR20: c006fd845300 0008 c19d2cf8 c13d6888
GPR24: 0069 c13d688c 0002 c13d688c
GPR28: c19cecf0 0348  
NIP []   (null)
LR []   (null)
Call Trace:
Instruction dump:
  

Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2018-12-07 Thread Gautham R Shenoy
Hi Thiago,

On Thu, Dec 06, 2018 at 03:28:17PM -0200, Thiago Jung Bauermann wrote:

[..snip..]

> 
> 
> I posted a similar patch last year, but I wasn't able to arrive at a
> root cause analysis like you did:
> 
>
https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-February/153734.html

Ah! Nice. So this is a known problem.

> 
> One thing I realized after I posted the patch was that in my case, the
> CPU was crashing inside RTAS. From the NIP and LR in the trace above it
> looks like it's crashing in RTAS in your case as well.
> 
> Michael Ellerman had two comments on my patch:
> 
> 1. Regardless of the underlying bug, the kernel shouldn't crash so we
> need a patch making it more resilient to this failure.
> 
> 2. The wait loop should use udelay() so that the loop will actually take
> a set amount of wall time, rather than just cycles.
> 
> Regarding 1. if the problem is that the kernel is causing RTAS to crash
> because it calls it in a way that's unsupported, then I don't see how we
> can make the kernel more resilient. We have to make the kernel respect
> RTAS' restrictions (or alternatively, poke RTAS devs to make RTAS fail
> gracefuly in these conditions).


I agree that the Kernel has to respect RTAS's restriction. The PAPR
v2.8.1, Requirement R1-7.2.3-8 under section 7.2.3 says the following:

"The stop-self service needs to be serialized with calls to the
 stop-self, start-cpu, and set-power-level services. The OS must
 be able to call RTAS services on other processors while the
 processor is stopped or being stopped"

Thus the onus is on the OS to ensure that there are no concurrent rtas
calls with "stop-self" token.

> 
> Regarding 2. I implemented a new version of my patch (posted below) but
> I was never able to test it because I couldn't access a system where the
> problem was reproducible anymore.
> 
> There's also a race between the CPU driving the unplug and the CPU being
> unplugged which I think is not easy for the CPU being unplugged to win,
> which makes the busy loop in pseries_cpu_die() a bit fragile. I describe
> the race in the patch description.
> 
> My solution to make the race less tight is to make the CPU driving the
> unplug to only start the busy loop only after the CPU being unplugged is
> in the CPU_STATE_OFFLINE state. At that point, we know that it either is
> about to call RTAS or it already has.

Ah, yes this is good optimization. Though, I think we ought to
unconditionally wait until the target CPU has woken up from CEDE and
changed its state to CPU_STATE_OFFLINE. After if PROD failed, then we
would have caught it in dlpar_offline_cpu() itself.

> 
> Do you think this makes sense? If you do, would you mind testing my
> patch? You can change the timeouts and delays if you want. To be honest
> they're just guesses on my part...

Sure. I will test the patch and report back.

--
Thanks and Regards
gautham.



Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2018-12-06 Thread Thiago Jung Bauermann


Hello Gautham,

Gautham R. Shenoy  writes:

> From: "Gautham R. Shenoy" 
>
> Currently running DLPAR offline/online operations in a loop on a
> POWER9 system with SMT=off results in the following crash:
>
> [  223.321032] cpu 112 (hwid 112) Ready to die...
> [  223.355963] Querying DEAD? cpu 113 (113) shows 2
> [  223.356233] cpu 114 (hwid 114) Ready to die...
> [  223.356236] cpu 113 (hwid 113) Ready to die...
> [  223.356241] Bad kernel stack pointer 1faf6ca0 at 1faf6d50
> [  223.356243] Oops: Bad kernel stack pointer, sig: 6 [#1]
> [  223.356244] LE SMP NR_CPUS=2048 NUMA pSeries
> [  223.356247] Modules linked in:
> [  223.356255] CPU: 114 PID: 0 Comm: swapper/114 Not tainted 4.20.0-rc3 #39
> [  223.356258] NIP:  1faf6d50 LR: 1faf6d50 CTR: 
> 1ec6d06c
> [  223.356259] REGS: c0001e5cbd30 TRAP: 0700   Not tainted  (4.20.0-rc3)
> [  223.356260] MSR:  80081000   CR: 2804  XER: 0020
> [  223.356263] CFAR: 1ec98590 IRQMASK: 80009033
>GPR00: 1faf6d50 1faf6ca0 1ed1c448 
> 0267e6a273c3
>GPR04:  00e0 dfe8 
> 1faf6d30
>GPR08: 1faf6d28 0267e6a273c3 1ec1b108 
> 
>GPR12: 01b6d998 c0001eb55080 c003a1b8bf90 
> 1eea3f40
>GPR16:  c006fda85100 c004c8b0 
> c13d5300
>GPR20: c006fda85300 0008 c19d2cf8 
> c13d6888
>GPR24: 0072 c13d688c 0002 
> c13d688c
>GPR28: c19cecf0 0390  
> 1faf6ca0
> [  223.356281] NIP [1faf6d50] 0x1faf6d50
> [  223.356281] LR [1faf6d50] 0x1faf6d50
> [  223.356282] Call Trace:
> [  223.356283] Instruction dump:
> [  223.356285]        
> 
> [  223.356286]        
> 
> [  223.356290] ---[ end trace f46a4e046b564d1f ]---
>
> This is due to multiple offlined CPUs (CPU 113 and CPU 114 above)
> concurrently (within 3us) trying to make the rtas-call with the
> "stop-self" token, something that is prohibited by the PAPR.
>
> The concurrent calls can happen as follows.
>
>   o In dlpar_offline_cpu() we prod an offline CPU X (offline due to
> SMT=off) and loop for 25 tries in pseries_cpu_die() querying if
> the target CPU X has been stopped in RTAS. After 25 tries, we
> prints the message "Querying DEAD? cpu X (X) shows 2" and return
> to dlpar_offline_cpu(). Note that at this point CPU X has not yet
> called rtas with the "stop-self" token, but can do so anytime now.
>
>   o Back in dlpar_offline_cpu(), we prod the next offline CPU Y. CPU Y
> promptly wakes up and calls RTAS with the "stop-self" token.
>
>   o Before RTAS can stop CPU Y, CPU X also calls RTAS with the
> "stop-self" token.
>
> The problem occurs due to the short number of tries (25) in
> pseries_cpu_die() which covers 90% of the cases. For the remaining 10%
> of the cases, it was observed that we would need to loop for a few
> hundred iterations before the target CPU calls rtas with "stop-self"
> token.Experiments show that each try takes roughly ~25us.
>
> In this patch we fix the problem by increasing the number of tries
> from 25 to 4000 (which roughly corresponds to 100ms) before bailing
> out and declaring that we have failed to observe the target CPU call
> rtas-stop-self. This provides sufficient serialization to ensure that
> there no concurrent rtas calls with "stop-self" token.
>
>
>
> Reported-by: Michael Bringmann 
> Signed-off-by: Gautham R. Shenoy 
> ---
>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index 2f8e621..c913c44 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -214,14 +214,25 @@ static void pseries_cpu_die(unsigned int cpu)
>   msleep(1);
>   }
>   } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
> + int max_tries = 4000; /* Roughly corresponds to 100ms */
> + u64 tb_before = mftb();
>
> - for (tries = 0; tries < 25; tries++) {
> + for (tries = 0; tries < max_tries; tries++) {
>   cpu_status = smp_query_cpu_stopped(pcpu);
>   if (cpu_status == QCSS_STOPPED ||
>   cpu_status == QCSS_HARDWARE_ERROR)
>   break;
>   cpu_relax();
>   }
> +
> + if (tries == max_tries) {
> + u64 

Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop

2018-12-06 Thread Thiago Jung Bauermann


Hello Gautham,

Gautham R. Shenoy  writes:

> From: "Gautham R. Shenoy" 
>
> Currently running DLPAR offline/online operations in a loop on a
> POWER9 system with SMT=off results in the following crash:
>
> [  223.321032] cpu 112 (hwid 112) Ready to die...
> [  223.355963] Querying DEAD? cpu 113 (113) shows 2
> [  223.356233] cpu 114 (hwid 114) Ready to die...
> [  223.356236] cpu 113 (hwid 113) Ready to die...
> [  223.356241] Bad kernel stack pointer 1faf6ca0 at 1faf6d50
> [  223.356243] Oops: Bad kernel stack pointer, sig: 6 [#1]
> [  223.356244] LE SMP NR_CPUS=2048 NUMA pSeries
> [  223.356247] Modules linked in:
> [  223.356255] CPU: 114 PID: 0 Comm: swapper/114 Not tainted 4.20.0-rc3 #39
> [  223.356258] NIP:  1faf6d50 LR: 1faf6d50 CTR: 
> 1ec6d06c
> [  223.356259] REGS: c0001e5cbd30 TRAP: 0700   Not tainted  (4.20.0-rc3)
> [  223.356260] MSR:  80081000   CR: 2804  XER: 0020
> [  223.356263] CFAR: 1ec98590 IRQMASK: 80009033
>GPR00: 1faf6d50 1faf6ca0 1ed1c448 
> 0267e6a273c3
>GPR04:  00e0 dfe8 
> 1faf6d30
>GPR08: 1faf6d28 0267e6a273c3 1ec1b108 
> 
>GPR12: 01b6d998 c0001eb55080 c003a1b8bf90 
> 1eea3f40
>GPR16:  c006fda85100 c004c8b0 
> c13d5300
>GPR20: c006fda85300 0008 c19d2cf8 
> c13d6888
>GPR24: 0072 c13d688c 0002 
> c13d688c
>GPR28: c19cecf0 0390  
> 1faf6ca0
> [  223.356281] NIP [1faf6d50] 0x1faf6d50
> [  223.356281] LR [1faf6d50] 0x1faf6d50
> [  223.356282] Call Trace:
> [  223.356283] Instruction dump:
> [  223.356285]        
> 
> [  223.356286]        
> 
> [  223.356290] ---[ end trace f46a4e046b564d1f ]---
>
> This is due to multiple offlined CPUs (CPU 113 and CPU 114 above)
> concurrently (within 3us) trying to make the rtas-call with the
> "stop-self" token, something that is prohibited by the PAPR.
>
> The concurrent calls can happen as follows.
>
>   o In dlpar_offline_cpu() we prod an offline CPU X (offline due to
> SMT=off) and loop for 25 tries in pseries_cpu_die() querying if
> the target CPU X has been stopped in RTAS. After 25 tries, we
> prints the message "Querying DEAD? cpu X (X) shows 2" and return
> to dlpar_offline_cpu(). Note that at this point CPU X has not yet
> called rtas with the "stop-self" token, but can do so anytime now.
>
>   o Back in dlpar_offline_cpu(), we prod the next offline CPU Y. CPU Y
> promptly wakes up and calls RTAS with the "stop-self" token.
>
>   o Before RTAS can stop CPU Y, CPU X also calls RTAS with the
> "stop-self" token.
>
> The problem occurs due to the short number of tries (25) in
> pseries_cpu_die() which covers 90% of the cases. For the remaining 10%
> of the cases, it was observed that we would need to loop for a few
> hundred iterations before the target CPU calls rtas with "stop-self"
> token.Experiments show that each try takes roughly ~25us.
>
> In this patch we fix the problem by increasing the number of tries
> from 25 to 4000 (which roughly corresponds to 100ms) before bailing
> out and declaring that we have failed to observe the target CPU call
> rtas-stop-self. This provides sufficient serialization to ensure that
> there no concurrent rtas calls with "stop-self" token.
>
>
>
> Reported-by: Michael Bringmann 
> Signed-off-by: Gautham R. Shenoy 
> ---
>  arch/powerpc/platforms/pseries/hotplug-cpu.c | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
> b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> index 2f8e621..c913c44 100644
> --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
> +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
> @@ -214,14 +214,25 @@ static void pseries_cpu_die(unsigned int cpu)
>   msleep(1);
>   }
>   } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
> + int max_tries = 4000; /* Roughly corresponds to 100ms */
> + u64 tb_before = mftb();
>
> - for (tries = 0; tries < 25; tries++) {
> + for (tries = 0; tries < max_tries; tries++) {
>   cpu_status = smp_query_cpu_stopped(pcpu);
>   if (cpu_status == QCSS_STOPPED ||
>   cpu_status == QCSS_HARDWARE_ERROR)
>   break;
>   cpu_relax();
>   }
> +
> + if (tries == max_tries) {
> + u64