Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop
Hello Michael, On Mon, Jan 14, 2019 at 12:11:44PM -0600, Michael Bringmann wrote: > On 1/9/19 12:08 AM, Gautham R Shenoy wrote: > > > I did some testing during the holidays. Here are the observations: > > > > 1) With just your patch (without any additional debug patch), if I run > > DLPAR on /off operations on a system that has SMT=off, I am able to > > see a crash involving RTAS stack corruption within an hour's time. > > > > 2) With the debug patch (appended below) which has additional debug to > > capture the callers of stop-self, start-cpu, set-power-levels, the > > system is able to perform DLPAR on/off operations on a system with > > SMT=off for three days. And then, it crashed with the dead CPU showing > > a "Bad kernel stack pointer". From this log, I can clearly > > see that there were no concurrent calls to stop-self, start-cpu, > > set-power-levels. The only concurrent RTAS calls were the dying CPU > > calling "stop-self", and the CPU running the DLPAR operation calling > > "query-cpu-stopped-state". The crash signature is appended below as > > well. > > > > 3) Modifying your patch to remove the udelay and increase the loop > > count from 25 to 1000 doesn't improve the situation. We are still able > > to see the crash. > > > > 4) With my patch, even without any additional debug, I was able to > > observe the system run the tests successfully for over a week (I > > started the tests before the Christmas weekend, and forgot to turn it > > off!) > > So does this mean that the problem is fixed with your patch? No. On the contrary I think my patch unable to exploit the possible race window. From a technical point of view, Thiago's patch does the right things - It waits for the target CPU to come out of CEDE and set its state to CPU_STATE_OFFLINE. - Only then it then makes the "query-cpu-stopped-state" rtas call in a loop, while giving sufficient delay between successive queries. This reduces the unnecessary rtas-calls. In my patch, I don't do any of this, but simply keep calling "query-cpu-stopped-state" call in a loop for 4000 iterations. That said, if I incorporate the wait for the target CPU to set its state to CPU_STATE_OFFLINE in my patch and then make the "query-cpu-stopped-state" rtas call, then, even I am able to get the crash with the "RTAS CALL BUFFER CORRUPTION" message. I think that incorporating the wait for the target CPU set its state to CPU_STATE_OFFLINE increases the probability that "query-cpu-stopped-state" and "stop-self" rtas calls get called more or less at the same time. However since concurrent invocations of these rtas-calls is allowed by the PAPR, it should not result in a "RTAS CALL BUFFER CORRUPTION". Am I missing something here ? > > > > > It appears that there is a narrow race window involving rtas-stop-self > > and rtas-query-cpu-stopped-state calls that can be observed with your > > patch. Adding any printk's seems to reduce the probability of hitting > > this race window. It might be worth the while to check with RTAS > > folks, if they suspect something here. > > What would the RTAS folks be looking at here? The 'narrow race window' > is with respect to a patch that it sound like we should not be > using. IMHO, if the race-window exists, it would be better to confirm and fix it, given that we have a patch that can exploit it consistently. > > Thanks. > Michael > > -- > Michael W. Bringmann > Linux Technology Center > IBM Corporation > Tie-Line 363-5196 > External: (512) 286-5196 > Cell: (512) 466-0650 > m...@linux.vnet.ibm.com -- Thanks and Regards gautham.
Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop
On 1/9/19 12:08 AM, Gautham R Shenoy wrote: > I did some testing during the holidays. Here are the observations: > > 1) With just your patch (without any additional debug patch), if I run > DLPAR on /off operations on a system that has SMT=off, I am able to > see a crash involving RTAS stack corruption within an hour's time. > > 2) With the debug patch (appended below) which has additional debug to > capture the callers of stop-self, start-cpu, set-power-levels, the > system is able to perform DLPAR on/off operations on a system with > SMT=off for three days. And then, it crashed with the dead CPU showing > a "Bad kernel stack pointer". From this log, I can clearly > see that there were no concurrent calls to stop-self, start-cpu, > set-power-levels. The only concurrent RTAS calls were the dying CPU > calling "stop-self", and the CPU running the DLPAR operation calling > "query-cpu-stopped-state". The crash signature is appended below as > well. > > 3) Modifying your patch to remove the udelay and increase the loop > count from 25 to 1000 doesn't improve the situation. We are still able > to see the crash. > > 4) With my patch, even without any additional debug, I was able to > observe the system run the tests successfully for over a week (I > started the tests before the Christmas weekend, and forgot to turn it > off!) So does this mean that the problem is fixed with your patch? > > It appears that there is a narrow race window involving rtas-stop-self > and rtas-query-cpu-stopped-state calls that can be observed with your > patch. Adding any printk's seems to reduce the probability of hitting > this race window. It might be worth the while to check with RTAS > folks, if they suspect something here. What would the RTAS folks be looking at here? The 'narrow race window' is with respect to a patch that it sound like we should not be using. Thanks. Michael -- Michael W. Bringmann Linux Technology Center IBM Corporation Tie-Line 363-5196 External: (512) 286-5196 Cell: (512) 466-0650 m...@linux.vnet.ibm.com
Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop
Hello Thiago, Wish you a happy 2019! On Sat, Dec 08, 2018 at 12:40:52AM -0200, Thiago Jung Bauermann wrote: > > Gautham R Shenoy writes: > > On Fri, Dec 07, 2018 at 04:13:11PM +0530, Gautham R Shenoy wrote: > >> Sure. I will test the patch and report back. > > > > I added the following debug patch on top of your patch, and after an > > hour's run, the system crashed. Appending the log at the end. > > Thank you very much for testing! Your debug patch was very helpful as > well. > > > I suppose we still need to increase the number of tries since we wait > > only for 2.5ms looping before giving up. > > Do you think it would have helped? In the debug output you posted I > would have expected the following message to show up if the loop > finished too early, and it didn't: > > "Querying DEAD? cpu %i (%i) shows %i\n" > > So I don't think increasing the loop length would have made a > difference. In fact, the call to smp_query_cpu_stopped() always > succeeded on the first iteration. I did some testing during the holidays. Here are the observations: 1) With just your patch (without any additional debug patch), if I run DLPAR on /off operations on a system that has SMT=off, I am able to see a crash involving RTAS stack corruption within an hour's time. 2) With the debug patch (appended below) which has additional debug to capture the callers of stop-self, start-cpu, set-power-levels, the system is able to perform DLPAR on/off operations on a system with SMT=off for three days. And then, it crashed with the dead CPU showing a "Bad kernel stack pointer". From this log, I can clearly see that there were no concurrent calls to stop-self, start-cpu, set-power-levels. The only concurrent RTAS calls were the dying CPU calling "stop-self", and the CPU running the DLPAR operation calling "query-cpu-stopped-state". The crash signature is appended below as well. 3) Modifying your patch to remove the udelay and increase the loop count from 25 to 1000 doesn't improve the situation. We are still able to see the crash. 4) With my patch, even without any additional debug, I was able to observe the system run the tests successfully for over a week (I started the tests before the Christmas weekend, and forgot to turn it off!) It appears that there is a narrow race window involving rtas-stop-self and rtas-query-cpu-stopped-state calls that can be observed with your patch. Adding any printk's seems to reduce the probability of hitting this race window. It might be worth the while to check with RTAS folks, if they suspect something here. The Crash log with this debug patch is as follows [DEBUG] CPU 32 waiting for CPU 130 to enter rtas cpu 130 (hwid 130) Ready to die... [DEBUG] CPU 32 noticed CPU 130 enter rtas: tries=0, time=539 [DEBUG] CPU 32 waiting for CPU 131 to enter rtas cpu 131 (hwid 131) Ready to die... [DEBUG] CPU 32 noticed CPU 131 enter rtas: tries=0, time=137 [DEBUG] CPU 32 waiting for CPU 132 to enter rtas cpu 132 (hwid 132) Ready to die... [DEBUG] CPU 32 noticed CPU 132 enter rtas: tries=0, time=1238 [DEBUG] CPU 32 waiting for CPU 133 to enter rtas cpu 133 (hwid 133) Ready to die... [DEBUG] CPU 32 noticed CPU 133 enter rtas: tries=1, time=1259 [DEBUG] CPU 32 waiting for CPU 134 to enter rtas cpu 134 (hwid 134) Ready to die... [DEBUG] CPU 32 noticed CPU 134 enter rtas: tries=0, time=1141 [DEBUG] CPU 32 waiting for CPU 135 to enter rtas cpu 135 (hwid 135) Ready to die... [DEBUG] CPU 32 noticed CPU 135 enter rtas: tries=0, time=636 cpu 120 (hwid 120) Ready to die... [DEBUG] CPU 32 waiting for CPU 120 to enter rtas [DEBUG] CPU 32 noticed CPU 120 enter rtas: tries=0, time=53 [DEBUG] CPU 32 waiting for CPU 121 to enter rtas cpu 121 (hwid 121) Ready to die... Bad kernel stack pointer 1fafb400 at 1fafb4b0 Bad kernel stack pointer 1fafb4b0 at 1ec15270 Oops: Bad kernel stack pointer, sig: 6 [#1] LE SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 121 PID: 0 Comm: swapper/121 Not tainted 4.20.0-thiago-original-with-debug+ #57 NIP: 1ec15270 LR: 1ec1526c CTR: 1ecd26c4 REGS: c0001e577d30 TRAP: 0300 Not tainted (4.20.0-thiago-original-with-debug+) MSR: 80001000 CR: 4800 XER: 0020 CFAR: 1ecd27a8 DAR: 8108 DSISR: 4000 IRQMASK: 80009033 GPR00: 1ec1526c 1fafb4b0 GPR04: 01a40968 00e0 dfe8 0018 GPR08: 1f82ae00 1ed025d4 1f827850 GPR12: 01b6d998 c0001eb4e080 c003a1bdbf90 1eea3020 GPR16: c006fdc45100 c004c8b0 c13d5300 GPR20: c006fdc45300 0008 c19d2cf8 c13d6888 GPR24: 0079 c13d688c 0002 c13d688c GPR28: c19cecf0 03c8 1fafb4b0 NIP [1ec15270] 0x1ec15270 LR [1ec1526c] 0x1ec1526c Call Trace: Instruction dump:
Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop
Note from Scott Mayes on latest crash: Michael, Since the partition crashed, I was able to get the last .2 seconds worth of RTAS call trace leading up to the crash. Best I could tell from that bit of trace was that the removal of a processor involved the following steps: -- Call to stop-self for a given thread -- Repeated calls to query-cpu-stopped-state (which eventually indicated the thread was stopped) -- Call to get-sensor-state for the thread to check its entity-state (9003) sensor which returned 'dr-entity-present' -- Call to set-indicator to set the isolation-state (9001) indicator to ISOLATE state -- Call to set-indicator to set the allocation-state (9003) indicator to UNUSABLE state I noticed one example of thread x28 getting through all of these steps just fine, but for thread x20, although the query-cpu-stopped state returned 0 status (STOPPED), a subsequent call to set-indicator to ISOLATE failed. This failure was near the end of the trace, but was not the very last RTAS call made in the trace. The set-indicator failure reported to Linux was a -9001 (Valid outstanding translation) which was mapped from a 0x502 (Invalid thread state) return code from PHYP's H_SET_DR_STATE h-call. On 12/10/2018 02:31 PM, Thiago Jung Bauermann wrote: > > Hello Michael, > > Michael Bringmann writes: > >> I have asked Scott Mayes to take a look at one of these crashes from >> the phyp side. I will let you know if he finds anything notable. > > Thanks! It might make sense to test whether booting with > cede_offline=off makes the bug go away. Scott is looking at the system. I will try once he is finished. > > One suspicion I have is regarding the code handling CPU_STATE_INACTIVE. >>From what I understand, it is a powerpc-specific CPU state and from the > perspective of the generic CPU hotplug state machine, inactive CPUs are > already fully offline. Which means that the locking performed by the > generic code state machine doesn't apply to transitioning CPUs from > INACTIVE to OFFLINE state. Perhaps the bug is that there is more than > one CPU making that transition at the same time? That would cause two > CPUs to call RTAS stop-self. > > I haven't checked whether this is really possible or not, though. It's > just a conjecture. Michael > > -- > Thiago Jung Bauermann > IBM Linux Technology Center > > -- Michael W. Bringmann Linux Technology Center IBM Corporation Tie-Line 363-5196 External: (512) 286-5196 Cell: (512) 466-0650 m...@linux.vnet.ibm.com
Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop
Hello Michael, Michael Bringmann writes: > I have asked Scott Mayes to take a look at one of these crashes from > the phyp side. I will let you know if he finds anything notable. Thanks! It might make sense to test whether booting with cede_offline=off makes the bug go away. One suspicion I have is regarding the code handling CPU_STATE_INACTIVE. >From what I understand, it is a powerpc-specific CPU state and from the perspective of the generic CPU hotplug state machine, inactive CPUs are already fully offline. Which means that the locking performed by the generic code state machine doesn't apply to transitioning CPUs from INACTIVE to OFFLINE state. Perhaps the bug is that there is more than one CPU making that transition at the same time? That would cause two CPUs to call RTAS stop-self. I haven't checked whether this is really possible or not, though. It's just a conjecture. -- Thiago Jung Bauermann IBM Linux Technology Center
Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop
I have asked Scott Mayes to take a look at one of these crashes from the phyp side. I will let you know if he finds anything notable. Michael On 12/07/2018 08:40 PM, Thiago Jung Bauermann wrote: > > Gautham R Shenoy writes: >> On Fri, Dec 07, 2018 at 04:13:11PM +0530, Gautham R Shenoy wrote: >>> Sure. I will test the patch and report back. >> >> I added the following debug patch on top of your patch, and after an >> hour's run, the system crashed. Appending the log at the end. > > Thank you very much for testing! Your debug patch was very helpful as > well. > >> I suppose we still need to increase the number of tries since we wait >> only for 2.5ms looping before giving up. > > Do you think it would have helped? In the debug output you posted I > would have expected the following message to show up if the loop > finished too early, and it didn't: > > "Querying DEAD? cpu %i (%i) shows %i\n" > > So I don't think increasing the loop length would have made a > difference. In fact, the call to smp_query_cpu_stopped() always > succeeded on the first iteration. > > I think there is something else going on which we don't fully understand > yet. From your other email: > >> I agree that the Kernel has to respect RTAS's restriction. The PAPR >> v2.8.1, Requirement R1-7.2.3-8 under section 7.2.3 says the following: >> >> "The stop-self service needs to be serialized with calls to the >> stop-self, start-cpu, and set-power-level services. The OS must >> be able to call RTAS services on other processors while the >> processor is stopped or being stopped" >> >> Thus the onus is on the OS to ensure that there are no concurrent rtas >> calls with "stop-self" token. > > As you say perhaps there's another call to stop-self, start-cpu or > set-power-level being made concurrently. I don't currently see how more > than one stop-self or start-cpu call could be in flight at the same time > given that there are a number of locks being grabbed during CPU hotplug > and unplug. OTOH the CPU that actually calls stop-self doesn't seem to > grab any locks itself so it's a possibility. > > As for set-power-level, it's only used in the case of PCI hotplug from > what I can see, and that isn't part of the picture in this case, right? > > We could address this problem directly by adding another lock separate > from rtas.lock to serialize just these calls. The challenge is with > stop-self, because the CPU calling it will never return to release the > lock. Is it possible to grab a lock (or down a semaphore) in the CPU > calling stop-self and then release the lock (or up the semaphore) in the > CPU running pseries_cpu_die()? > >>> There's also a race between the CPU driving the unplug and the CPU >>> being unplugged which I think is not easy for the CPU being >>> unplugged to win, which makes the busy loop in pseries_cpu_die() a >>> bit fragile. I describe the race in the patch description. >>> >>> My solution to make the race less tight is to make the CPU driving >>> the unplug to only start the busy loop only after the CPU being >>> unplugged is in the CPU_STATE_OFFLINE state. At that point, we know >>> that it either is about to call RTAS or it already has. >> >> Ah, yes this is good optimization. Though, I think we ought to >> unconditionally wait until the target CPU has woken up from CEDE and >> changed its state to CPU_STATE_OFFLINE. After if PROD failed, then we >> would have caught it in dlpar_offline_cpu() itself. > > I recently saw a QEMU-implemented hcall (H_LOGICAL_CI_LOAD) return > success when it had been given an invalid memory address to load from, > so my confidence in the error reporting of hcalls is a bit shaken. :-) > > In that case the CPU would wait forever for the CPU state to change. If > you believe 100 ms is too short a timeout, we could make it 500 ms or > even 1s. What do you think? > >> cpu 112 (hwid 112) Ready to die... >> [DEBUG] Waited for CPU 112 to enter rtas: tries=0, time=65 >> cpu 113 (hwid 113) Ready to die... >> [DEBUG] Waited for CPU 113 to enter rtas: tries=0, time=1139 >> cpu 114 (hwid 114) Ready to die... >> [DEBUG] Waited for CPU 114 to enter rtas: tries=0, time=1036 >> cpu 115 (hwid 115) Ready to die... >> [DEBUG] Waited for CPU 115 to enter rtas: tries=0, time=133 >> cpu 116 (hwid 116) Ready to die... >> [DEBUG] Waited for CPU 116 to enter rtas: tries=0, time=1231 >> cpu 117 (hwid 117) Ready to die... >> [DEBUG] Waited for CPU 117 to enter rtas: tries=0, time=1231 >> cpu 118 (hwid 118) Ready to die... >> [DEBUG] Waited for CPU 118 to enter rtas: tries=0, time=1231 >> cpu 119 (hwid 119) Ready to die... >> [DEBUG] Waited for CPU 119 to enter rtas: tries=0, time=1131 >> cpu 104 (hwid 104) Ready to die... >> [DEBUG] Waited for CPU 104 to enter rtas: tries=0, time=40 > > Interesting, so 1.2 ms can pass before the dying CPU actually gets close > to making the stop-self call. And even in those cases the retry loop is > succeeding on the first try! So this shows
Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop
Gautham R Shenoy writes: > On Fri, Dec 07, 2018 at 04:13:11PM +0530, Gautham R Shenoy wrote: >> Sure. I will test the patch and report back. > > I added the following debug patch on top of your patch, and after an > hour's run, the system crashed. Appending the log at the end. Thank you very much for testing! Your debug patch was very helpful as well. > I suppose we still need to increase the number of tries since we wait > only for 2.5ms looping before giving up. Do you think it would have helped? In the debug output you posted I would have expected the following message to show up if the loop finished too early, and it didn't: "Querying DEAD? cpu %i (%i) shows %i\n" So I don't think increasing the loop length would have made a difference. In fact, the call to smp_query_cpu_stopped() always succeeded on the first iteration. I think there is something else going on which we don't fully understand yet. From your other email: > I agree that the Kernel has to respect RTAS's restriction. The PAPR > v2.8.1, Requirement R1-7.2.3-8 under section 7.2.3 says the following: > > "The stop-self service needs to be serialized with calls to the > stop-self, start-cpu, and set-power-level services. The OS must > be able to call RTAS services on other processors while the > processor is stopped or being stopped" > > Thus the onus is on the OS to ensure that there are no concurrent rtas > calls with "stop-self" token. As you say perhaps there's another call to stop-self, start-cpu or set-power-level being made concurrently. I don't currently see how more than one stop-self or start-cpu call could be in flight at the same time given that there are a number of locks being grabbed during CPU hotplug and unplug. OTOH the CPU that actually calls stop-self doesn't seem to grab any locks itself so it's a possibility. As for set-power-level, it's only used in the case of PCI hotplug from what I can see, and that isn't part of the picture in this case, right? We could address this problem directly by adding another lock separate from rtas.lock to serialize just these calls. The challenge is with stop-self, because the CPU calling it will never return to release the lock. Is it possible to grab a lock (or down a semaphore) in the CPU calling stop-self and then release the lock (or up the semaphore) in the CPU running pseries_cpu_die()? > > There's also a race between the CPU driving the unplug and the CPU > > being unplugged which I think is not easy for the CPU being > > unplugged to win, which makes the busy loop in pseries_cpu_die() a > > bit fragile. I describe the race in the patch description. > > > > My solution to make the race less tight is to make the CPU driving > > the unplug to only start the busy loop only after the CPU being > > unplugged is in the CPU_STATE_OFFLINE state. At that point, we know > > that it either is about to call RTAS or it already has. > > Ah, yes this is good optimization. Though, I think we ought to > unconditionally wait until the target CPU has woken up from CEDE and > changed its state to CPU_STATE_OFFLINE. After if PROD failed, then we > would have caught it in dlpar_offline_cpu() itself. I recently saw a QEMU-implemented hcall (H_LOGICAL_CI_LOAD) return success when it had been given an invalid memory address to load from, so my confidence in the error reporting of hcalls is a bit shaken. :-) In that case the CPU would wait forever for the CPU state to change. If you believe 100 ms is too short a timeout, we could make it 500 ms or even 1s. What do you think? > cpu 112 (hwid 112) Ready to die... > [DEBUG] Waited for CPU 112 to enter rtas: tries=0, time=65 > cpu 113 (hwid 113) Ready to die... > [DEBUG] Waited for CPU 113 to enter rtas: tries=0, time=1139 > cpu 114 (hwid 114) Ready to die... > [DEBUG] Waited for CPU 114 to enter rtas: tries=0, time=1036 > cpu 115 (hwid 115) Ready to die... > [DEBUG] Waited for CPU 115 to enter rtas: tries=0, time=133 > cpu 116 (hwid 116) Ready to die... > [DEBUG] Waited for CPU 116 to enter rtas: tries=0, time=1231 > cpu 117 (hwid 117) Ready to die... > [DEBUG] Waited for CPU 117 to enter rtas: tries=0, time=1231 > cpu 118 (hwid 118) Ready to die... > [DEBUG] Waited for CPU 118 to enter rtas: tries=0, time=1231 > cpu 119 (hwid 119) Ready to die... > [DEBUG] Waited for CPU 119 to enter rtas: tries=0, time=1131 > cpu 104 (hwid 104) Ready to die... > [DEBUG] Waited for CPU 104 to enter rtas: tries=0, time=40 Interesting, so 1.2 ms can pass before the dying CPU actually gets close to making the stop-self call. And even in those cases the retry loop is succeeding on the first try! So this shows that changing the code to wait for the CPU_STATE_OFFLINE state is worth it. > *** RTAS CALL BUFFER CORRUPTION *** > 393: rtas32_call_buff_ptr= > 0060 0060 0060 0060 [...`...`...`...`] > 0060 0060 0060 0060 [...`...`...`...`] > 0060 0060 0060 0060
Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop
On Fri, Dec 07, 2018 at 04:13:11PM +0530, Gautham R Shenoy wrote: > Hi Thiago, > > > Sure. I will test the patch and report back. I added the following debug patch on top of your patch, and after an hour's run, the system crashed. Appending the log at the end. I suppose we still need to increase the number of tries since we wait only for 2.5ms looping before giving up. x8- >From 9e9e1e459cdc63540a73575af359a7bc0d43e455 Mon Sep 17 00:00:00 2001 From: "Gautham R. Shenoy" Date: Fri, 7 Dec 2018 16:47:25 +0530 Subject: [PATCH] Add debug on top of Thiago's patch Signed-off-by: Gautham R. Shenoy --- arch/powerpc/platforms/pseries/hotplug-cpu.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c index fad3ceb..2545aab 100644 --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c @@ -214,6 +214,9 @@ static void pseries_cpu_die(unsigned int cpu) msleep(1); } } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) { + u64 tb_before = mftb(); + u64 tb_after; + /* * If the current state is not offline yet, it means that the * dying CPU (which is in pseries_mach_cpu_die) didn't have a @@ -230,6 +233,11 @@ static void pseries_cpu_die(unsigned int cpu) break; udelay(100); } + + tb_after = mftb(); + + printk("[DEBUG] Waited for CPU %d to enter rtas: tries=%d, time=%llu\n", cpu, tries, + div_u64(tb_after - tb_before, tb_ticks_per_usec)); } if (cpu_status != 0) { -- 1.9.4 x8- cpu 112 (hwid 112) Ready to die... [DEBUG] Waited for CPU 112 to enter rtas: tries=0, time=65 cpu 113 (hwid 113) Ready to die... [DEBUG] Waited for CPU 113 to enter rtas: tries=0, time=1139 cpu 114 (hwid 114) Ready to die... [DEBUG] Waited for CPU 114 to enter rtas: tries=0, time=1036 cpu 115 (hwid 115) Ready to die... [DEBUG] Waited for CPU 115 to enter rtas: tries=0, time=133 cpu 116 (hwid 116) Ready to die... [DEBUG] Waited for CPU 116 to enter rtas: tries=0, time=1231 cpu 117 (hwid 117) Ready to die... [DEBUG] Waited for CPU 117 to enter rtas: tries=0, time=1231 cpu 118 (hwid 118) Ready to die... [DEBUG] Waited for CPU 118 to enter rtas: tries=0, time=1231 cpu 119 (hwid 119) Ready to die... [DEBUG] Waited for CPU 119 to enter rtas: tries=0, time=1131 cpu 104 (hwid 104) Ready to die... [DEBUG] Waited for CPU 104 to enter rtas: tries=0, time=40 *** RTAS CALL BUFFER CORRUPTION *** 393: rtas32_call_buff_ptr= 0060 0060 0060 0060 [...`...`...`...`] 0060 0060 0060 0060 [...`...`...`...`] 0060 0060 0060 0060 [...`...`...`...`] 0060 0800 E07F ACA7 00C0 [...`] 2500 [%...] 306E 7572 [0nur] 4800 0008 [H...0nur] 394: rtas64_map_buff_ptr= 5046 5743 4F44 4500 [PFWCODE.] 6000 0069 [`..i] [] 0005 0001 [] 1A00 [] 8018 6398 0300 00C0 [..c.] [..c.] cpu 105 (hwid 105) Ready to die... Bad kernel stack pointer 1fafb6c0 at 0 Oops: Bad kernel stack pointer, sig: 6 [#1] LE SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 105 PID: 0 Comm: swapper/105 Not tainted 4.20.0-rc5-thiago+ #45 NIP: LR: CTR: 007829c8 REGS: c0001e63bd30 TRAP: 0700 Not tainted (4.20.0-rc5-thiago+) MSR: 80081000 CR: 2804 XER: 0010 CFAR: 1ec153f0 IRQMASK: 80009033 GPR00: 1fafb6c0 1ec236a0 0040 GPR04: 00c0 0080 00046c4fb4842557 00cd GPR08: 0001f400 1ed035dc GPR12: c0001eb5e480 c003a1b53f90 1eea3e20 GPR16: c006fd845100 c004c8b0 c13d5300 GPR20: c006fd845300 0008 c19d2cf8 c13d6888 GPR24: 0069 c13d688c 0002 c13d688c GPR28: c19cecf0 0348 NIP [] (null) LR [] (null) Call Trace: Instruction dump:
Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop
Hi Thiago, On Thu, Dec 06, 2018 at 03:28:17PM -0200, Thiago Jung Bauermann wrote: [..snip..] > > > I posted a similar patch last year, but I wasn't able to arrive at a > root cause analysis like you did: > > https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-February/153734.html Ah! Nice. So this is a known problem. > > One thing I realized after I posted the patch was that in my case, the > CPU was crashing inside RTAS. From the NIP and LR in the trace above it > looks like it's crashing in RTAS in your case as well. > > Michael Ellerman had two comments on my patch: > > 1. Regardless of the underlying bug, the kernel shouldn't crash so we > need a patch making it more resilient to this failure. > > 2. The wait loop should use udelay() so that the loop will actually take > a set amount of wall time, rather than just cycles. > > Regarding 1. if the problem is that the kernel is causing RTAS to crash > because it calls it in a way that's unsupported, then I don't see how we > can make the kernel more resilient. We have to make the kernel respect > RTAS' restrictions (or alternatively, poke RTAS devs to make RTAS fail > gracefuly in these conditions). I agree that the Kernel has to respect RTAS's restriction. The PAPR v2.8.1, Requirement R1-7.2.3-8 under section 7.2.3 says the following: "The stop-self service needs to be serialized with calls to the stop-self, start-cpu, and set-power-level services. The OS must be able to call RTAS services on other processors while the processor is stopped or being stopped" Thus the onus is on the OS to ensure that there are no concurrent rtas calls with "stop-self" token. > > Regarding 2. I implemented a new version of my patch (posted below) but > I was never able to test it because I couldn't access a system where the > problem was reproducible anymore. > > There's also a race between the CPU driving the unplug and the CPU being > unplugged which I think is not easy for the CPU being unplugged to win, > which makes the busy loop in pseries_cpu_die() a bit fragile. I describe > the race in the patch description. > > My solution to make the race less tight is to make the CPU driving the > unplug to only start the busy loop only after the CPU being unplugged is > in the CPU_STATE_OFFLINE state. At that point, we know that it either is > about to call RTAS or it already has. Ah, yes this is good optimization. Though, I think we ought to unconditionally wait until the target CPU has woken up from CEDE and changed its state to CPU_STATE_OFFLINE. After if PROD failed, then we would have caught it in dlpar_offline_cpu() itself. > > Do you think this makes sense? If you do, would you mind testing my > patch? You can change the timeouts and delays if you want. To be honest > they're just guesses on my part... Sure. I will test the patch and report back. -- Thanks and Regards gautham.
Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop
Hello Gautham, Gautham R. Shenoy writes: > From: "Gautham R. Shenoy" > > Currently running DLPAR offline/online operations in a loop on a > POWER9 system with SMT=off results in the following crash: > > [ 223.321032] cpu 112 (hwid 112) Ready to die... > [ 223.355963] Querying DEAD? cpu 113 (113) shows 2 > [ 223.356233] cpu 114 (hwid 114) Ready to die... > [ 223.356236] cpu 113 (hwid 113) Ready to die... > [ 223.356241] Bad kernel stack pointer 1faf6ca0 at 1faf6d50 > [ 223.356243] Oops: Bad kernel stack pointer, sig: 6 [#1] > [ 223.356244] LE SMP NR_CPUS=2048 NUMA pSeries > [ 223.356247] Modules linked in: > [ 223.356255] CPU: 114 PID: 0 Comm: swapper/114 Not tainted 4.20.0-rc3 #39 > [ 223.356258] NIP: 1faf6d50 LR: 1faf6d50 CTR: > 1ec6d06c > [ 223.356259] REGS: c0001e5cbd30 TRAP: 0700 Not tainted (4.20.0-rc3) > [ 223.356260] MSR: 80081000 CR: 2804 XER: 0020 > [ 223.356263] CFAR: 1ec98590 IRQMASK: 80009033 >GPR00: 1faf6d50 1faf6ca0 1ed1c448 > 0267e6a273c3 >GPR04: 00e0 dfe8 > 1faf6d30 >GPR08: 1faf6d28 0267e6a273c3 1ec1b108 > >GPR12: 01b6d998 c0001eb55080 c003a1b8bf90 > 1eea3f40 >GPR16: c006fda85100 c004c8b0 > c13d5300 >GPR20: c006fda85300 0008 c19d2cf8 > c13d6888 >GPR24: 0072 c13d688c 0002 > c13d688c >GPR28: c19cecf0 0390 > 1faf6ca0 > [ 223.356281] NIP [1faf6d50] 0x1faf6d50 > [ 223.356281] LR [1faf6d50] 0x1faf6d50 > [ 223.356282] Call Trace: > [ 223.356283] Instruction dump: > [ 223.356285] > > [ 223.356286] > > [ 223.356290] ---[ end trace f46a4e046b564d1f ]--- > > This is due to multiple offlined CPUs (CPU 113 and CPU 114 above) > concurrently (within 3us) trying to make the rtas-call with the > "stop-self" token, something that is prohibited by the PAPR. > > The concurrent calls can happen as follows. > > o In dlpar_offline_cpu() we prod an offline CPU X (offline due to > SMT=off) and loop for 25 tries in pseries_cpu_die() querying if > the target CPU X has been stopped in RTAS. After 25 tries, we > prints the message "Querying DEAD? cpu X (X) shows 2" and return > to dlpar_offline_cpu(). Note that at this point CPU X has not yet > called rtas with the "stop-self" token, but can do so anytime now. > > o Back in dlpar_offline_cpu(), we prod the next offline CPU Y. CPU Y > promptly wakes up and calls RTAS with the "stop-self" token. > > o Before RTAS can stop CPU Y, CPU X also calls RTAS with the > "stop-self" token. > > The problem occurs due to the short number of tries (25) in > pseries_cpu_die() which covers 90% of the cases. For the remaining 10% > of the cases, it was observed that we would need to loop for a few > hundred iterations before the target CPU calls rtas with "stop-self" > token.Experiments show that each try takes roughly ~25us. > > In this patch we fix the problem by increasing the number of tries > from 25 to 4000 (which roughly corresponds to 100ms) before bailing > out and declaring that we have failed to observe the target CPU call > rtas-stop-self. This provides sufficient serialization to ensure that > there no concurrent rtas calls with "stop-self" token. > > > > Reported-by: Michael Bringmann > Signed-off-by: Gautham R. Shenoy > --- > arch/powerpc/platforms/pseries/hotplug-cpu.c | 13 - > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c > b/arch/powerpc/platforms/pseries/hotplug-cpu.c > index 2f8e621..c913c44 100644 > --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c > @@ -214,14 +214,25 @@ static void pseries_cpu_die(unsigned int cpu) > msleep(1); > } > } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) { > + int max_tries = 4000; /* Roughly corresponds to 100ms */ > + u64 tb_before = mftb(); > > - for (tries = 0; tries < 25; tries++) { > + for (tries = 0; tries < max_tries; tries++) { > cpu_status = smp_query_cpu_stopped(pcpu); > if (cpu_status == QCSS_STOPPED || > cpu_status == QCSS_HARDWARE_ERROR) > break; > cpu_relax(); > } > + > + if (tries == max_tries) { > + u64
Re: [PATCH] pseries/hotplug: Add more delay in pseries_cpu_die while waiting for rtas-stop
Hello Gautham, Gautham R. Shenoy writes: > From: "Gautham R. Shenoy" > > Currently running DLPAR offline/online operations in a loop on a > POWER9 system with SMT=off results in the following crash: > > [ 223.321032] cpu 112 (hwid 112) Ready to die... > [ 223.355963] Querying DEAD? cpu 113 (113) shows 2 > [ 223.356233] cpu 114 (hwid 114) Ready to die... > [ 223.356236] cpu 113 (hwid 113) Ready to die... > [ 223.356241] Bad kernel stack pointer 1faf6ca0 at 1faf6d50 > [ 223.356243] Oops: Bad kernel stack pointer, sig: 6 [#1] > [ 223.356244] LE SMP NR_CPUS=2048 NUMA pSeries > [ 223.356247] Modules linked in: > [ 223.356255] CPU: 114 PID: 0 Comm: swapper/114 Not tainted 4.20.0-rc3 #39 > [ 223.356258] NIP: 1faf6d50 LR: 1faf6d50 CTR: > 1ec6d06c > [ 223.356259] REGS: c0001e5cbd30 TRAP: 0700 Not tainted (4.20.0-rc3) > [ 223.356260] MSR: 80081000 CR: 2804 XER: 0020 > [ 223.356263] CFAR: 1ec98590 IRQMASK: 80009033 >GPR00: 1faf6d50 1faf6ca0 1ed1c448 > 0267e6a273c3 >GPR04: 00e0 dfe8 > 1faf6d30 >GPR08: 1faf6d28 0267e6a273c3 1ec1b108 > >GPR12: 01b6d998 c0001eb55080 c003a1b8bf90 > 1eea3f40 >GPR16: c006fda85100 c004c8b0 > c13d5300 >GPR20: c006fda85300 0008 c19d2cf8 > c13d6888 >GPR24: 0072 c13d688c 0002 > c13d688c >GPR28: c19cecf0 0390 > 1faf6ca0 > [ 223.356281] NIP [1faf6d50] 0x1faf6d50 > [ 223.356281] LR [1faf6d50] 0x1faf6d50 > [ 223.356282] Call Trace: > [ 223.356283] Instruction dump: > [ 223.356285] > > [ 223.356286] > > [ 223.356290] ---[ end trace f46a4e046b564d1f ]--- > > This is due to multiple offlined CPUs (CPU 113 and CPU 114 above) > concurrently (within 3us) trying to make the rtas-call with the > "stop-self" token, something that is prohibited by the PAPR. > > The concurrent calls can happen as follows. > > o In dlpar_offline_cpu() we prod an offline CPU X (offline due to > SMT=off) and loop for 25 tries in pseries_cpu_die() querying if > the target CPU X has been stopped in RTAS. After 25 tries, we > prints the message "Querying DEAD? cpu X (X) shows 2" and return > to dlpar_offline_cpu(). Note that at this point CPU X has not yet > called rtas with the "stop-self" token, but can do so anytime now. > > o Back in dlpar_offline_cpu(), we prod the next offline CPU Y. CPU Y > promptly wakes up and calls RTAS with the "stop-self" token. > > o Before RTAS can stop CPU Y, CPU X also calls RTAS with the > "stop-self" token. > > The problem occurs due to the short number of tries (25) in > pseries_cpu_die() which covers 90% of the cases. For the remaining 10% > of the cases, it was observed that we would need to loop for a few > hundred iterations before the target CPU calls rtas with "stop-self" > token.Experiments show that each try takes roughly ~25us. > > In this patch we fix the problem by increasing the number of tries > from 25 to 4000 (which roughly corresponds to 100ms) before bailing > out and declaring that we have failed to observe the target CPU call > rtas-stop-self. This provides sufficient serialization to ensure that > there no concurrent rtas calls with "stop-self" token. > > > > Reported-by: Michael Bringmann > Signed-off-by: Gautham R. Shenoy > --- > arch/powerpc/platforms/pseries/hotplug-cpu.c | 13 - > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c > b/arch/powerpc/platforms/pseries/hotplug-cpu.c > index 2f8e621..c913c44 100644 > --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c > +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c > @@ -214,14 +214,25 @@ static void pseries_cpu_die(unsigned int cpu) > msleep(1); > } > } else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) { > + int max_tries = 4000; /* Roughly corresponds to 100ms */ > + u64 tb_before = mftb(); > > - for (tries = 0; tries < 25; tries++) { > + for (tries = 0; tries < max_tries; tries++) { > cpu_status = smp_query_cpu_stopped(pcpu); > if (cpu_status == QCSS_STOPPED || > cpu_status == QCSS_HARDWARE_ERROR) > break; > cpu_relax(); > } > + > + if (tries == max_tries) { > + u64