RE: [patch 0/2] tsc/adjust: Cure suspend/resume issues and prevent TSC deadline timer irq storm
On Thurs, 15 Dec 2016, Thomas Gleixner wrote: >See the SDM. It suggests that the deadline is compared to the TSC value. >I don't care how it is implemented, but I very much care about it being >documented in the way it is implemented, which is obviously not the case. The TSC_DEADLINE behavior definitely diverges from the SDM and we've root caused the issue and are finalizing a fix that addresses the problem. Glad that a software workaround was also possible. We also plan to expand our documentation on what good behavior of a BIOS is as it relates to adjusting TSC at boot (in almost every circumstance you shouldn't do it). The official errata will be forthcoming. Thank you for helping to identify this. Kevin
Re: [patch 0/2] tsc/adjust: Cure suspend/resume issues and prevent TSC deadline timer irq storm
On Wed, 14 Dec 2016, Roland Scheidegger wrote: > Am 14.12.2016 um 22:40 schrieb Thomas Gleixner: > > And the whole mess stems from the fact that the deadline is not as one > > would expect simply compared against the sum of the counter and the adjust > > MSR. > Why would it be compared against the sum? As far as I can tell the adjust > value should never come into play when using deadline timer (other than > indirectly because the TSC would change). See the SDM. It suggests that the deadline is compared to the TSC value. I don't care how it is implemented, but I very much care about it being documented in the way it is implemented, which is obviously not the case. And even if the adjust value is subtracted once when the timer is armed the whole thing should not blow up due to 32bit/sign extension bugs or whatever they decided to cobble together. Thanks, tglx
Re: [patch 0/2] tsc/adjust: Cure suspend/resume issues and prevent TSC deadline timer irq storm
Am 14.12.2016 um 22:40 schrieb Thomas Gleixner: > On Wed, 14 Dec 2016, Thomas Gleixner wrote: >> Positive space, results in timer not firing anymore - at least not in a >> time frame you are willing to wait for. >> >> 0x 8000 >> >> Negative space, results in an interrupt storm. >> >> 0x >> 0x fffe >> 0x fffd >> 0x fffc >> 0x fffb >> >> >> These points are independent of the underlying counter value (cold boot, >> warm boot) and even reproduce after hours of power on reliably. >> >> And looking at the values makes me wonder about 32bit vs. 64bit wreckage >> combined with sign expansion done wrong. Im really impressed! > > And the whole mess stems from the fact that the deadline is not as one > would expect simply compared against the sum of the counter and the adjust > MSR. Why would it be compared against the sum? As far as I can tell the adjust value should never come into play when using deadline timer (other than indirectly because the TSC would change). (And I'd guess better avoid an armed deadline timer while changing TSC_ADJ...) In any case, I've tested the two patches on top of x86-timers and they work just fine - all TSC_ADJ values get set to zero both on boot and resume, no lockups, and tsc clocksource active, with some whining in the log of course :-). So, Tested-by: Roland Scheidegger > No, they subtract the adjust value from the MSR when you write the deadline > and latch the result to compare it against the counter. > > So when the following happens: > >ADJUST = 0 >RDTSC = 1000 >DEADLINE = 1100 > >ADJUST = 100 > >INTERRUPT >RDTSC = 1200 > >DEADLINE = 1300 > >ADJUST =0 > >INTERRUPT >RDTSC = 1200 > > So depending on the direction of the adjustment the timer fires late or > early. > > Combined with that math wreckage this is a complete disaster. And of course > nothing is documented anywhere and the SDM is outright wrong: > > 10.5.4.1 TSC-Deadline Mode > > The processor generates a timer interrupt when the value of time-stamp > counter is greater than or equal to that of IA32_TSC_DEADLINE. It then > disarms the timer and clear the IA32_TSC_DEADLINE MSR. (Both the time-stamp > counter and the IA32_TSC_DEADLINE MSR are 64-bit unsigned integers.) > > See the example above. 120 is neither equal nor greater than 130, at > least not in my universe. > > I serioulsy doubt that Intel manages it to design at least ONE functional > non broken timer before I retire. > > Thanks, > > tglx >
Re: [patch 0/2] tsc/adjust: Cure suspend/resume issues and prevent TSC deadline timer irq storm
On Wed, 14 Dec 2016, Thomas Gleixner wrote: > Positive space, results in timer not firing anymore - at least not in a > time frame you are willing to wait for. > > 0x 8000 > > Negative space, results in an interrupt storm. > > 0x > 0x fffe > 0x fffd > 0x fffc > 0x fffb > > > These points are independent of the underlying counter value (cold boot, > warm boot) and even reproduce after hours of power on reliably. > > And looking at the values makes me wonder about 32bit vs. 64bit wreckage > combined with sign expansion done wrong. Im really impressed! And the whole mess stems from the fact that the deadline is not as one would expect simply compared against the sum of the counter and the adjust MSR. No, they subtract the adjust value from the MSR when you write the deadline and latch the result to compare it against the counter. So when the following happens: ADJUST = 0 RDTSC= 1000 DEADLINE = 1100 ADJUST = 100 INTERRUPT RDTSC= 1200 DEADLINE = 1300 ADJUST =0 INTERRUPT RDTSC= 1200 So depending on the direction of the adjustment the timer fires late or early. Combined with that math wreckage this is a complete disaster. And of course nothing is documented anywhere and the SDM is outright wrong: 10.5.4.1 TSC-Deadline Mode The processor generates a timer interrupt when the value of time-stamp counter is greater than or equal to that of IA32_TSC_DEADLINE. It then disarms the timer and clear the IA32_TSC_DEADLINE MSR. (Both the time-stamp counter and the IA32_TSC_DEADLINE MSR are 64-bit unsigned integers.) See the example above. 120 is neither equal nor greater than 130, at least not in my universe. I serioulsy doubt that Intel manages it to design at least ONE functional non broken timer before I retire. Thanks, tglx
Re: [patch 0/2] tsc/adjust: Cure suspend/resume issues and prevent TSC deadline timer irq storm
On Wed, 14 Dec 2016, Thomas Gleixner wrote: > On Wed, 14 Dec 2016, Roland Scheidegger wrote: > > Am 13.12.2016 um 17:46 schrieb Thomas Gleixner: > > > What are the adjust values after a warm boot? > > > > So, after cold boot with a kernel which doesn't adjust TSCs, then warm > > boot I got: > > [0.00] TSC ADJUST: CPU0: -602358264300 176072418728 > > [0.00] TSC ADJUST: Boot CPU0: -602358264300 > > [0.172245] TSC ADJUST: CPU1: -602360207584 176587932558 > > [0.172245] TSC ADJUST differs: Reference CPU0: -602358264300 CPU1: > > -602360207584 > > [0.172246] TSC ADJUST synchronize: Reference CPU0: -602358264300 > > CPU1: -602360207584 > > [0.252663] TSC ADJUST: CPU2: -602359000822 176828627154 > > [0.252663] TSC ADJUST differs: Reference CPU0: -602358264300 CPU2: > > -602359000822 > > [0.252664] TSC ADJUST synchronize: Reference CPU0: -602358264300 > > CPU2: -602359000822 > > [0.337014] TSC ADJUST: CPU3: -602360177680 177081093132 > > [0.337014] TSC ADJUST differs: Reference CPU0: -602358264300 CPU3: > > -602360177680 > > [0.337015] TSC ADJUST synchronize: Reference CPU0: -602358264300 > > CPU3: -602360177680 > > > > and so on. > > > > Albeit after another reboot (some minutes later), it actually straight > > locked up again: > > > > TSC ADJUST: CPU1: -8257481427958 165112676430 > > TSC ADJUST differs: Reference CPU0: -8257479484330 CPU1: -8257481427958 > > TSC ADJUST synchronize: Reference CPU0: -8257479484330 CPU1: -8254781427958 > > TSC target sync skip > > ... > > smpboot: Target CPU is online > > > > So, actually I thought the TSC would get reset too on warm boot, but > > clearly looks like that isn't the case... > > But I don't know what's the difference between first and second reboot - > > the adjust values have just more magnitude, but otherwise even the > > direction of the adjustments and everything looks all the same (just > > like cold boot, which also looks all the same to me). > > I haven't found a pattern for the lockups yet and we have to wait for Intel > to provide useful information about that issue. All we know so far is that > negative adjust values are dangerous. Did some futher investigation. The values which cause the interrupt storms have very clear identifiable points which reliably reproduce: Positive space, results in timer not firing anymore - at least not in a time frame you are willing to wait for. 0x 8000 Negative space, results in an interrupt storm. 0x 0x fffe 0x fffd 0x fffc 0x fffb These points are independent of the underlying counter value (cold boot, warm boot) and even reproduce after hours of power on reliably. And looking at the values makes me wonder about 32bit vs. 64bit wreckage combined with sign expansion done wrong. Im really impressed! In the negative space there is something else going on which is dependent on the counter value. Right after cold boot the space is closer to zero than after hours of power on. So the approach of forbidding negative values is definitely not wrong. Thanks, tglx
Re: [patch 0/2] tsc/adjust: Cure suspend/resume issues and prevent TSC deadline timer irq storm
On Wed, 14 Dec 2016, Roland Scheidegger wrote: > Am 13.12.2016 um 17:46 schrieb Thomas Gleixner: > > What are the adjust values after a warm boot? > > So, after cold boot with a kernel which doesn't adjust TSCs, then warm > boot I got: > [0.00] TSC ADJUST: CPU0: -602358264300 176072418728 > [0.00] TSC ADJUST: Boot CPU0: -602358264300 > [0.172245] TSC ADJUST: CPU1: -602360207584 176587932558 > [0.172245] TSC ADJUST differs: Reference CPU0: -602358264300 CPU1: > -602360207584 > [0.172246] TSC ADJUST synchronize: Reference CPU0: -602358264300 > CPU1: -602360207584 > [0.252663] TSC ADJUST: CPU2: -602359000822 176828627154 > [0.252663] TSC ADJUST differs: Reference CPU0: -602358264300 CPU2: > -602359000822 > [0.252664] TSC ADJUST synchronize: Reference CPU0: -602358264300 > CPU2: -602359000822 > [0.337014] TSC ADJUST: CPU3: -602360177680 177081093132 > [0.337014] TSC ADJUST differs: Reference CPU0: -602358264300 CPU3: > -602360177680 > [0.337015] TSC ADJUST synchronize: Reference CPU0: -602358264300 > CPU3: -602360177680 > > and so on. > > Albeit after another reboot (some minutes later), it actually straight > locked up again: > > TSC ADJUST: CPU1: -8257481427958 165112676430 > TSC ADJUST differs: Reference CPU0: -8257479484330 CPU1: -8257481427958 > TSC ADJUST synchronize: Reference CPU0: -8257479484330 CPU1: -8254781427958 > TSC target sync skip > ... > smpboot: Target CPU is online > > So, actually I thought the TSC would get reset too on warm boot, but > clearly looks like that isn't the case... > But I don't know what's the difference between first and second reboot - > the adjust values have just more magnitude, but otherwise even the > direction of the adjustments and everything looks all the same (just > like cold boot, which also looks all the same to me). I haven't found a pattern for the lockups yet and we have to wait for Intel to provide useful information about that issue. All we know so far is that negative adjust values are dangerous. Could you test the two patches on top of tip x86/timers branch so we can make progress with that whole disaster while waiting for Intel to come forth with a proper explanation? Thanks, tglx
Re: [patch 0/2] tsc/adjust: Cure suspend/resume issues and prevent TSC deadline timer irq storm
Am 13.12.2016 um 17:46 schrieb Thomas Gleixner: > On Tue, 13 Dec 2016, Roland Scheidegger wrote: > >> Am 13.12.2016 um 14:14 schrieb Thomas Gleixner: >>> Roland reported interesting TSC ADJUST register wreckage on his DELL >>> machine, which seems to populate that MSR with a random number generator. >> >> FWIW, I thought about the actual values some more and I don't actually >> think they are all that random any more: the behavior is consistent with >> the bios trying to zero the TSC of all cpus. If I understand this right, >> writing a zero to TSC would cause somewhat small negative values in the >> TSC_ADJ register at boot time, and larger negative values at suspend >> time (at least if the TSC just stops when suspended and isn't reset) - >> exactly what I'm seeing. >> (And of course the different TSC_ADJ values would be because the bios is >> writing TSC without any thoughts of synchronization, just one cpu after >> another). > > Yeah, that might be. Still it looks like random nonsense and definitely the > BIOS developers did not follow the secrit boot protocol. > >>> Deeper investagation into fixing this wreckage unearthed another special >>> feature which is designed by Intel: Negative TSC adjuste values cause >>> interrupt storms on the TSC deadline timer. Further details in patch 2/2 >> >> This actually looks like quite a serious hw bug to me, shouldn't there >> be an errata for such a bug? >> >> And I still don't quite understand why the lockup doesn't happen after a >> warm boot, there must be something different there... > > What are the adjust values after a warm boot? > So, after cold boot with a kernel which doesn't adjust TSCs, then warm boot I got: [0.00] TSC ADJUST: CPU0: -602358264300 176072418728 [0.00] TSC ADJUST: Boot CPU0: -602358264300 [0.172245] TSC ADJUST: CPU1: -602360207584 176587932558 [0.172245] TSC ADJUST differs: Reference CPU0: -602358264300 CPU1: -602360207584 [0.172246] TSC ADJUST synchronize: Reference CPU0: -602358264300 CPU1: -602360207584 [0.252663] TSC ADJUST: CPU2: -602359000822 176828627154 [0.252663] TSC ADJUST differs: Reference CPU0: -602358264300 CPU2: -602359000822 [0.252664] TSC ADJUST synchronize: Reference CPU0: -602358264300 CPU2: -602359000822 [0.337014] TSC ADJUST: CPU3: -602360177680 177081093132 [0.337014] TSC ADJUST differs: Reference CPU0: -602358264300 CPU3: -602360177680 [0.337015] TSC ADJUST synchronize: Reference CPU0: -602358264300 CPU3: -602360177680 and so on. Albeit after another reboot (some minutes later), it actually straight locked up again: TSC ADJUST: CPU1: -8257481427958 165112676430 TSC ADJUST differs: Reference CPU0: -8257479484330 CPU1: -8257481427958 TSC ADJUST synchronize: Reference CPU0: -8257479484330 CPU1: -8254781427958 TSC target sync skip ... smpboot: Target CPU is online So, actually I thought the TSC would get reset too on warm boot, but clearly looks like that isn't the case... But I don't know what's the difference between first and second reboot - the adjust values have just more magnitude, but otherwise even the direction of the adjustments and everything looks all the same (just like cold boot, which also looks all the same to me). Roland
Re: [patch 0/2] tsc/adjust: Cure suspend/resume issues and prevent TSC deadline timer irq storm
On Tue, 13 Dec 2016, Roland Scheidegger wrote: > Am 13.12.2016 um 14:14 schrieb Thomas Gleixner: > > Roland reported interesting TSC ADJUST register wreckage on his DELL > > machine, which seems to populate that MSR with a random number generator. > > FWIW, I thought about the actual values some more and I don't actually > think they are all that random any more: the behavior is consistent with > the bios trying to zero the TSC of all cpus. If I understand this right, > writing a zero to TSC would cause somewhat small negative values in the > TSC_ADJ register at boot time, and larger negative values at suspend > time (at least if the TSC just stops when suspended and isn't reset) - > exactly what I'm seeing. > (And of course the different TSC_ADJ values would be because the bios is > writing TSC without any thoughts of synchronization, just one cpu after > another). Yeah, that might be. Still it looks like random nonsense and definitely the BIOS developers did not follow the secrit boot protocol. > > Deeper investagation into fixing this wreckage unearthed another special > > feature which is designed by Intel: Negative TSC adjuste values cause > > interrupt storms on the TSC deadline timer. Further details in patch 2/2 > > This actually looks like quite a serious hw bug to me, shouldn't there > be an errata for such a bug? > > And I still don't quite understand why the lockup doesn't happen after a > warm boot, there must be something different there... What are the adjust values after a warm boot? Thanks, tglx
Re: [patch 0/2] tsc/adjust: Cure suspend/resume issues and prevent TSC deadline timer irq storm
Am 13.12.2016 um 14:14 schrieb Thomas Gleixner: > Roland reported interesting TSC ADJUST register wreckage on his DELL > machine, which seems to populate that MSR with a random number generator. FWIW, I thought about the actual values some more and I don't actually think they are all that random any more: the behavior is consistent with the bios trying to zero the TSC of all cpus. If I understand this right, writing a zero to TSC would cause somewhat small negative values in the TSC_ADJ register at boot time, and larger negative values at suspend time (at least if the TSC just stops when suspended and isn't reset) - exactly what I'm seeing. (And of course the different TSC_ADJ values would be because the bios is writing TSC without any thoughts of synchronization, just one cpu after another). > > Deeper investagation into fixing this wreckage unearthed another special > feature which is designed by Intel: Negative TSC adjuste values cause > interrupt storms on the TSC deadline timer. Further details in patch 2/2 This actually looks like quite a serious hw bug to me, shouldn't there be an errata for such a bug? And I still don't quite understand why the lockup doesn't happen after a warm boot, there must be something different there... (I didn't have the chance to test the patch yet.) Roland
[patch 0/2] tsc/adjust: Cure suspend/resume issues and prevent TSC deadline timer irq storm
Roland reported interesting TSC ADJUST register wreckage on his DELL machine, which seems to populate that MSR with a random number generator. Deeper investagation into fixing this wreckage unearthed another special feature which is designed by Intel: Negative TSC adjuste values cause interrupt storms on the TSC deadline timer. Further details in patch 2/2 Thanks tglx