On 2017.09.30 at 10:20 -0400, Brian Gerst wrote:
> On Sat, Sep 30, 2017 at 8:47 AM, Markus Trippelsdorf
> <mar...@trippelsdorf.de> wrote:
> > On 2017.09.30 at 13:53 +0200, Borislav Petkov wrote:
> >> On Sat, Sep 30, 2017 at 01:29:03PM +0200, Adam Borowski wrote:
> >> > On Sat, Sep 30, 2017 at 01:11:37PM +0200, Borislav Petkov wrote:
> >> > > On Sat, Sep 30, 2017 at 04:05:16AM +0200, Adam Borowski wrote:
> >> > > > Any hints how to debug this?
> >> > >
> >> > > Do
> >> > > rdmsr -a 0xc0010015
> >> > > as root and paste it here.
> >> >
> >> > 1000010
> >> > 1000010
> >> > 1000010
> >> > 1000010
> >> > 1000010
> >> > 1000010
> >> >
> >> > on both 4.13.4 and 4.14-rc2+.
> >>
> >> Boot into -rc2+ and do as root:
> >>
> >> # wrmsr -a 0xc0010015 0x1000018
> >>
> >> If the issue gets fixed then Mr. Luto better revert the new lazy TLB
> >> flushing fun'n'games for 4.14 before it is too late and that kernel
> >> releases b0rked.
> >
> > The issue does get fixed by setting TlbCacheDis to 1. I have been
> > running it for the last few weeks without any problems.
> > Performance is not affected at all. So it might be easier to just set
> > the bit for older AMD processors as a boot quirk.
> > Changing the TLB code so late might not be a good idea...
>
> Looking at the AMD K10 revision guide
> (http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf), errata #298
> that this fixes should only apply to revisions DR-BA and DR-B2, which
> include the original Phenom, but not Phenom II. The Phenom II X6 is
> revision PH-E0, which does not have this errata.
It has nothing to do with errata #298. The new lazy TLB code causes MCEs,
because the page tables may now contain garbage. See the long "Current
mainline git (24e700e291d52bd2) hangs when building e.g. perf" LKML thread.

-- 
Markus
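The value passed to wrmsr in the thread, 0x1000018, is the HWCR value Adam reported (0x1000010) with bit 3 set. On AMD family 10h parts, bit 3 of HWCR (MSR 0xC0010015) is TlbCacheDis. The following is a minimal sketch of that arithmetic for a POSIX shell. The variable names are illustrative only, and the sketch assumes every CPU reports the same value, as in Adam's output. Actually performing the write requires root, the msr driver, and the msr-tools `wrmsr` utility, exactly as in Borislav's command.

```shell
#!/bin/sh
# HWCR (MSR 0xC0010015) value reported by `rdmsr -a 0xc0010015` in the thread
hwcr=0x1000010

# TlbCacheDis is bit 3 of HWCR on AMD family 10h (K10) parts
tlbcachedis=$((1 << 3))

# Set TlbCacheDis while leaving every other HWCR bit unchanged
new=$(printf '0x%x' $(( $hwcr | $tlbcachedis )))
echo "$new"   # prints 0x1000018, the value passed to wrmsr in the thread

# On a real K10 system (as root, msr driver loaded) the write would then be:
#   wrmsr -a 0xc0010015 "$new"
```

OR-ing in the single bit, rather than hard-coding 0x1000018, keeps the other HWCR bits as the firmware left them.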