Re: [all better] Re: regression: massive trouble with fpu rework
On Tue, 30 Jun 2015, Ingo Molnar wrote:
> And I'd consider us hanging a separate (but not high prio) bug: the kernel should
> be robust as long as the CPUID data is stable. In that sense the original fix is
> right (we really want to unmask all available CPUID leaves), but it also masked
> another (less severe) kernel bug.
>
> For example virtualization is known to tweak CPUID details creatively, and
> firmware (as this example shows it) can mess it up as well, so we generally want to
> treat it as untrusted input data that needs to be validated.

Processor microcode updates can also change cpuid information, at least on
Intel. There are Intel microcode updates in the field that do this.

Specific Intel MSR writes *should* be able to change cpuid information as
well, as they enable/disable features that are reflected by a cpuid bit.

I have no idea about AMD, though.

--
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [all better] Re: regression: massive trouble with fpu rework
On 06/29/2015 10:16 PM, Ingo Molnar wrote:
>
> * Borislav Petkov wrote:
>
>> On Mon, Jun 29, 2015 at 02:27:23PM +0200, Mike Galbraith wrote:
>>> With it commented out, and fpu__init_system() either back at previously
>>> booting position [5] or at original [0], doesn't matter, box is dead,
>>> but differently. It stalls after setting clocksource to tsc, and just
>>> sits there.
>>
>> ... which means that unmasking the CPUID features is absolutely needed
>> on Linux. Not unmasking probably triggers this original bug which
>>
>> 066941bd4eeb ("x86: unmask CPUID levels on Intel CPUs")
>>
>> fixed.
>
> Yes.
>
> And I'd consider us hanging a separate (but not high prio) bug: the kernel should
> be robust as long as the CPUID data is stable. In that sense the original fix is
> right (we really want to unmask all available CPUID leaves), but it also masked
> another (less severe) kernel bug.
>
> For example virtualization is known to tweak CPUID details creatively, and
> firmware (as this example shows it) can mess it up as well, so we generally want to
> treat it as untrusted input data that needs to be validated.
>

Well, that is not *entirely* possible, since if the data is just plain
wrong, we're screwed no matter what.

However, we could deal with CPUID level capping. The best way to do
that is probably to have a table of CPU features and the minimum
required CPUID level for each. If maximum CPUID level < that level,
disable that feature.

        -hpa
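The table idea in the message above can be sketched in user-space C. This is only an illustration under stated assumptions: the FEAT_* flags, struct leaf_dependency and filter_by_cpuid_level() are hypothetical names made up for this sketch, not the kernel's X86_FEATURE_* bits or its actual implementation. The leaf numbers are the basic CPUID leaves that describe each feature (0x5 for MONITOR/MWAIT, 0x9 for DCA, 0xd for XSAVE state).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature flags for this sketch; not the kernel's X86_FEATURE_* values. */
enum {
	FEAT_MWAIT = 1u << 0,
	FEAT_DCA   = 1u << 1,
	FEAT_XSAVE = 1u << 2,
};

struct leaf_dependency {
	uint32_t feature;	/* flag that needs a higher CPUID leaf */
	uint32_t min_level;	/* lowest maximum basic leaf that must be reported */
};

/* MONITOR/MWAIT details live in leaf 0x5, DCA in 0x9, XSAVE state in 0xd. */
static const struct leaf_dependency leaf_dependencies[] = {
	{ FEAT_MWAIT, 0x05 },
	{ FEAT_DCA,   0x09 },
	{ FEAT_XSAVE, 0x0d },
};

/* Return 'features' with every flag cleared whose required leaf is out of reach. */
static uint32_t filter_by_cpuid_level(uint32_t features, uint32_t max_level)
{
	size_t i;
	size_t n = sizeof(leaf_dependencies) / sizeof(leaf_dependencies[0]);

	for (i = 0; i < n; i++) {
		if (max_level < leaf_dependencies[i].min_level)
			features &= ~leaf_dependencies[i].feature;
	}
	return features;
}

/* Usage example: a CPU capped at leaf 3 loses all three features. */
void filter_example(void)
{
	uint32_t all = FEAT_MWAIT | FEAT_DCA | FEAT_XSAVE;

	assert(filter_by_cpuid_level(all, 3) == 0);
	assert(filter_by_cpuid_level(all, 0x0d) == all);
}
```

With such a table a capped CPUID level degrades into "feature off" rather than "feature on but its leaf unreadable", which is the inconsistent state discussed in the thread.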
Re: [all better] Re: regression: massive trouble with fpu rework
* H. Peter Anvin wrote:

> On 06/28/2015 11:40 PM, Ingo Molnar wrote:
> >
> > Ok, so could you please move the fpu__init_system() further up and see which
> > position is that starts breaking with the BIOS option set?
> >
> > here's the current, broken layout of the code:
> >
> >         get_cpu_cap(c);
> > [0]     fpu__init_system(c);
> >
> >         if (this_cpu->c_early_init)
> >                 this_cpu->c_early_init(c);
> >
> > [1]
> >         c->cpu_index = 0;
> > [2]
> >         filter_cpuid_features(c, false);
> >
> > [3]
> >         if (this_cpu->c_bsp_init)
> >                 this_cpu->c_bsp_init(c);
> >
> > [4]
> >         setup_force_cpu_cap(X86_FEATURE_ALWAYS);
> > [5]
> > }
> >
> > and we know it from your testing that moving [0] to [5] fixes the crash.
> >
> > The question is, can we move it to [4], [3], [2] or even [1] instead, without
> > breaking the system?
> >
> > I still don't see where the breakage comes from, but this would help us narrow it
> > down.
> >
>
> It should be moved to [4] or [5]. I would argue that the line setting
> X86_FEATURE_ALWAYS should be moved up and then fpu__init_system(c) should
> be moved after the c_bsp_init() line.

Yeah, so the patch I sent to Mike (and which solved the bug) moved it to [5].

Thanks,

        Ingo
Re: [all better] Re: regression: massive trouble with fpu rework
* Borislav Petkov wrote:

> On Mon, Jun 29, 2015 at 02:27:23PM +0200, Mike Galbraith wrote:
> > With it commented out, and fpu__init_system() either back at previously
> > booting position [5] or at original [0], doesn't matter, box is dead,
> > but differently. It stalls after setting clocksource to tsc, and just
> > sits there.
>
> ... which means that unmasking the CPUID features is absolutely needed
> on Linux. Not unmasking probably triggers this original bug which
>
> 066941bd4eeb ("x86: unmask CPUID levels on Intel CPUs")
>
> fixed.

Yes.

And I'd consider us hanging a separate (but not high prio) bug: the kernel should
be robust as long as the CPUID data is stable. In that sense the original fix is
right (we really want to unmask all available CPUID leaves), but it also masked
another (less severe) kernel bug.

For example virtualization is known to tweak CPUID details creatively, and
firmware (as this example shows it) can mess it up as well, so we generally want to
treat it as untrusted input data that needs to be validated.

Thanks,

        Ingo
Re: [all better] Re: regression: massive trouble with fpu rework
* H. Peter Anvin wrote:

> On 06/29/2015 02:35 AM, Ingo Molnar wrote:
> >
> > Indeed, I bet that makes a difference!
> >
> > I wish that 'unmasking' logic came with more comments:
> >
> > - Why do BIOSen ever mask CPUIDs?
> >
>
> To work around bugs in legacy operating systems.
>
> > - Why do we unmask the masking?
>
> Because we don't have those specific bugs.

Great - would be nice to put those reasons between /* */ markers, to keep future
generations (and overworked maintainers!) from wondering.

> > - Why doesn't the kernel keep on working just fine even if certain CPUID aspects
> >   are turned off?
>
> Because it exercises code paths that are otherwise impossible, for example, it
> exposes the XSAVE capability without exposing the XSAVE information in higher
> CPUID leaves.
>
> The other option would be to have a list of CPU features that should be turned
> off whenever the CPUID leaf maximum is too low, but it gives a better user
> experience to just override the BIOS capping and then we have fewer code paths
> in the kernel to worry about.

1)

As a side note, I think we should generally be robust enough to recognize pretty
much any CPUID 'mischief' and at minimum not crash.

2)

But this FPU crash is different, here the reason for the crash is the following
bug in the FPU code:

        fpu__init_system();     /* inits the FPU based on masked CPUID */
        ...
        CPUID *extends*
        ...
        fpu__init_cpu();        /* Actually uses the FPU now based on the expanded CPUID */

        *KABOOM*

I.e. we (obviously) should not base half of the FPU logic on different CPUID bits
than the other half of the FPU logic.

I'll queue up the fix, which is to do the early FPU init after our CPUID state
stabilizes. (i.e. the second patch I sent to Mike.)

Thanks,

        Ingo
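The ordering bug described in the message above can be reproduced in a small user-space model. Everything here is an assumption made for illustration: struct model_cpu, reported_max_leaf(), unmask_cpuid() and fpu_views_agree() are hypothetical names, not kernel code. The cap value of 3 follows the SDM text quoted elsewhere in the thread. The model checks one thing only: whether the system-wide FPU setup and the per-CPU FPU setup looked at the same CPUID view.

```c
#include <assert.h>
#include <stdbool.h>

#define XSAVE_LEAF 0x0du	/* CPUID leaf that enumerates XSAVE state */

/* Hypothetical stand-in for the boot CPU; not the kernel's struct cpuinfo_x86. */
struct model_cpu {
	unsigned int real_max_leaf;	/* what the hardware supports */
	bool limit_cpuid;		/* BIOS "limit CPUID maxval" option */
};

/* Maximum basic leaf as CPUID.00H would report it (3 while the cap is set). */
static unsigned int reported_max_leaf(const struct model_cpu *cpu)
{
	if (cpu->limit_cpuid && cpu->real_max_leaf > 3)
		return 3;
	return cpu->real_max_leaf;
}

/* Models the vendor early init lifting the BIOS cap. */
static void unmask_cpuid(struct model_cpu *cpu)
{
	cpu->limit_cpuid = false;
}

/* The decision both FPU init stages have to agree on. */
static bool xsave_leaf_visible(const struct model_cpu *cpu)
{
	return reported_max_leaf(cpu) >= XSAVE_LEAF;
}

/*
 * Returns true when both FPU init stages saw the same CPUID view.
 * early_fpu_init == true models position [0] (before the unmasking),
 * early_fpu_init == false models position [5] (after it).
 */
static bool fpu_views_agree(struct model_cpu cpu, bool early_fpu_init)
{
	bool system_view, cpu_view;

	if (early_fpu_init) {
		system_view = xsave_leaf_visible(&cpu);	/* masked view */
		unmask_cpuid(&cpu);
	} else {
		unmask_cpuid(&cpu);
		system_view = xsave_leaf_visible(&cpu);	/* expanded view */
	}
	cpu_view = xsave_leaf_visible(&cpu);		/* always expanded */
	return system_view == cpu_view;
}

/* Usage example: only the capped CPU with the early init disagrees. */
void ordering_example(void)
{
	struct model_cpu capped = { 0x0d, true };

	assert(!fpu_views_agree(capped, true));
	assert(fpu_views_agree(capped, false));
}
```

In this model the mismatch appears only when the BIOS cap is set and the FPU decision is taken before the cap is lifted, which matches the report that [0] was the only position that broke the box.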
Re: [all better] Re: regression: massive trouble with fpu rework
On 06/28/2015 11:40 PM, Ingo Molnar wrote:
>
> Ok, so could you please move the fpu__init_system() further up and see which
> position is that starts breaking with the BIOS option set?
>
> here's the current, broken layout of the code:
>
>         get_cpu_cap(c);
> [0]     fpu__init_system(c);
>
>         if (this_cpu->c_early_init)
>                 this_cpu->c_early_init(c);
>
> [1]
>         c->cpu_index = 0;
> [2]
>         filter_cpuid_features(c, false);
>
> [3]
>         if (this_cpu->c_bsp_init)
>                 this_cpu->c_bsp_init(c);
>
> [4]
>         setup_force_cpu_cap(X86_FEATURE_ALWAYS);
> [5]
> }
>
> and we know it from your testing that moving [0] to [5] fixes the crash.
>
> The question is, can we move it to [4], [3], [2] or even [1] instead, without
> breaking the system?
>
> I still don't see where the breakage comes from, but this would help us narrow it
> down.
>

It should be moved to [4] or [5]. I would argue that the line setting
X86_FEATURE_ALWAYS should be moved up and then fpu__init_system(c) should
be moved after the c_bsp_init() line.

        -hpa
Re: [all better] Re: regression: massive trouble with fpu rework
On 06/29/2015 02:35 AM, Ingo Molnar wrote:
>
> Indeed, I bet that makes a difference!
>
> I wish that 'unmasking' logic came with more comments:
>
> - Why do BIOSen ever mask CPUIDs?
>

To work around bugs in legacy operating systems.

> - Why do we unmask the masking?

Because we don't have those specific bugs.

> - Why doesn't the kernel keep on working just fine even if certain CPUID aspects
>   are turned off?

Because it exercises code paths that are otherwise impossible, for
example, it exposes the XSAVE capability without exposing the XSAVE
information in higher CPUID leaves.

The other option would be to have a list of CPU features that should be
turned off whenever the CPUID leaf maximum is too low, but it gives a
better user experience to just override the BIOS capping and then we
have fewer code paths in the kernel to worry about.

        -hpa
Re: [all better] Re: regression: massive trouble with fpu rework
On Mon, Jun 29, 2015 at 02:27:23PM +0200, Mike Galbraith wrote:
> With it commented out, and fpu__init_system() either back at previously
> booting position [5] or at original [0], doesn't matter, box is dead,
> but differently. It stalls after setting clocksource to tsc, and just
> sits there.

... which means that unmasking the CPUID features is absolutely needed
on Linux. Not unmasking probably triggers this original bug which

066941bd4eeb ("x86: unmask CPUID levels on Intel CPUs")

fixed.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
Re: [all better] Re: regression: massive trouble with fpu rework
On Mon, 2015-06-29 at 10:33 +0200, Borislav Petkov wrote:

> I bet it is that
>
>         /* Unmask CPUID levels if masked: */
>         if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
>                 if (msr_clear_bit(MSR_IA32_MISC_ENABLE,
>                                   MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT) > 0) {
>                         c->cpuid_level = cpuid_eax(0);
>                         get_cpu_cap(c);
>                 }
>         }
>
> in early_init_intel(). If you feel like playing, you might comment it
> out to see what happens.

With it commented out, and fpu__init_system() either back at previously
booting position [5] or at original [0], doesn't matter, box is dead,
but differently. It stalls after setting clocksource to tsc, and just
sits there.

        -Mike
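The quoted unmasking snippet only re-reads the CPUID data when the clear-bit helper reports that it actually changed something (the "> 0" test). A user-space model of that convention could look like the sketch below. It operates on a plain 64-bit variable instead of a real MSR, and model_clear_bit() and unmask_needs_rescan() are hypothetical names for this illustration, not the kernel's msr_clear_bit(). The bit position 22 is the "Limit CPUID Maxval" bit of IA32_MISC_ENABLE.

```c
#include <assert.h>
#include <stdint.h>

#define LIMIT_CPUID_BIT 22	/* "Limit CPUID Maxval" in IA32_MISC_ENABLE */

/*
 * Clear one bit in a fake MSR value.
 * Returns 1 if the bit was set and has now been cleared,
 * 0 if it was already clear (nothing changed).
 * This return convention is an assumption chosen to match the "> 0" test
 * in the quoted snippet.
 */
static int model_clear_bit(uint64_t *msr, unsigned int bit)
{
	uint64_t mask;

	assert(bit < 64);
	mask = (uint64_t)1 << bit;

	if (!(*msr & mask))
		return 0;
	*msr &= ~mask;
	return 1;
}

/* Returns 1 when capability data must be re-read because the cap was lifted. */
static int unmask_needs_rescan(uint64_t *misc_enable)
{
	return model_clear_bit(misc_enable, LIMIT_CPUID_BIT) > 0;
}
```

On a machine where the BIOS option is off, the helper reports no change and the rescan is skipped, so any code that ran before this point saw the same CPUID data as code that runs after it. Only with the option set does the view change in the middle of early init.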
Re: [all better] Re: regression: massive trouble with fpu rework
On Mon, Jun 29, 2015 at 11:35:04AM +0200, Ingo Molnar wrote:
> I wish that 'unmasking' logic came with more comments:
>
> - Why do BIOSen ever mask CPUIDs?

Doesn't say a thing why:

066941bd4eeb ("x86: unmask CPUID levels on Intel CPUs")

SDM doesn't say why either:

"Limit CPUID Maxval (R/W)

When this bit is set to 1, CPUID.00H returns a maximum value in EAX[7:0]
of 3. BIOS should contain a setup question that allows users to specify
when the installed OS does not support CPUID functions greater than 3.

...

Setting this bit may cause unexpected behavior in software that depends
on the availability of CPUID leaves greater than 3."

In the case of hiding XSAVE from windoze ninety-old, probably something
was exploding there, there wasn't a fix to the software so the hardware
had to become soft. Purely hypothetical, of course.

The last sentence from the SDM quote above also explains why the Linux
workaround of clearing that bit again, exists.

> - Why do we unmask the masking?

Also purely hypothetical: because Linux doesn't have the windoze
problem.

> - Why doesn't the kernel keep on working just fine even if certain
>   CPUID aspects are turned off?

I guess that should be doable but one has to get such a box, enable that
BIOS feature and fix all the fallout that happens. Provided it can be
fixed. hpa went and reenabled those CPUID leafs instead in
066941bd4eeb1.

I guess we simply shouldn't do any CPUID-dependent stuff before:

        if (this_cpu->c_early_init)
                this_cpu->c_early_init(c);

and slap a big fat comment above it.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
Re: [all better] Re: regression: massive trouble with fpu rework
* Borislav Petkov wrote:

> On Mon, Jun 29, 2015 at 10:25:29AM +0200, Mike Galbraith wrote:
> > On Mon, 2015-06-29 at 08:40 +0200, Ingo Molnar wrote:
> > > *
> > > Ok, so could you please move the fpu__init_system() further up and see which
> > > position is that starts breaking with the BIOS option set?
> > >
> > > here's the current, broken layout of the code:
> > >
> > >         get_cpu_cap(c);
> > > [0]     fpu__init_system(c);
> > >
> > >         if (this_cpu->c_early_init)
> > >                 this_cpu->c_early_init(c);
>
> > [0] is the only spot that breaks box.
>
> I bet it is that
>
>         /* Unmask CPUID levels if masked: */
>         if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
>                 if (msr_clear_bit(MSR_IA32_MISC_ENABLE,
>                                   MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT) > 0) {
>                         c->cpuid_level = cpuid_eax(0);
>                         get_cpu_cap(c);
>                 }
>         }
>
> in early_init_intel(). If you feel like playing, you might comment it
> out to see what happens.
>
> :-)

Indeed, I bet that makes a difference!

I wish that 'unmasking' logic came with more comments:

 - Why do BIOSen ever mask CPUIDs?

 - Why do we unmask the masking?

 - Why doesn't the kernel keep on working just fine even if certain CPUID aspects
   are turned off?

Thanks,

        Ingo
Re: [all better] Re: regression: massive trouble with fpu rework
On Mon, 2015-06-29 at 10:33 +0200, Borislav Petkov wrote:
> On Mon, Jun 29, 2015 at 10:25:29AM +0200, Mike Galbraith wrote:
> > On Mon, 2015-06-29 at 08:40 +0200, Ingo Molnar wrote:
> > > *
> > > Ok, so could you please move the fpu__init_system() further up and see which
> > > position is that starts breaking with the BIOS option set?
> > >
> > > here's the current, broken layout of the code:
> > >
> > >         get_cpu_cap(c);
> > > [0]     fpu__init_system(c);
> > >
> > >         if (this_cpu->c_early_init)
> > >                 this_cpu->c_early_init(c);
>
> > [0] is the only spot that breaks box.
>
> I bet it is that
>
>         /* Unmask CPUID levels if masked: */
>         if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
>                 if (msr_clear_bit(MSR_IA32_MISC_ENABLE,
>                                   MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT) > 0) {
>                         c->cpuid_level = cpuid_eax(0);
>                         get_cpu_cap(c);
>                 }
>         }
>
> in early_init_intel(). If you feel like playing, you might comment it
> out to see what happens.

I'll poke after I get back from physical tort^Wtherapy.

        -Mike
Re: [all better] Re: regression: massive trouble with fpu rework
On Mon, Jun 29, 2015 at 10:25:29AM +0200, Mike Galbraith wrote:
> On Mon, 2015-06-29 at 08:40 +0200, Ingo Molnar wrote:
> > *
> > Ok, so could you please move the fpu__init_system() further up and see which
> > position is that starts breaking with the BIOS option set?
> >
> > here's the current, broken layout of the code:
> >
> >         get_cpu_cap(c);
> > [0]     fpu__init_system(c);
> >
> >         if (this_cpu->c_early_init)
> >                 this_cpu->c_early_init(c);

> [0] is the only spot that breaks box.

I bet it is that

        /* Unmask CPUID levels if masked: */
        if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
                if (msr_clear_bit(MSR_IA32_MISC_ENABLE,
                                  MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT) > 0) {
                        c->cpuid_level = cpuid_eax(0);
                        get_cpu_cap(c);
                }
        }

in early_init_intel(). If you feel like playing, you might comment it
out to see what happens.

:-)

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
Re: [all better] Re: regression: massive trouble with fpu rework
On Mon, 2015-06-29 at 08:40 +0200, Ingo Molnar wrote:
> *
> Ok, so could you please move the fpu__init_system() further up and see which
> position is that starts breaking with the BIOS option set?
>
> here's the current, broken layout of the code:
>
>         get_cpu_cap(c);
> [0]     fpu__init_system(c);
>
>         if (this_cpu->c_early_init)
>                 this_cpu->c_early_init(c);
>
> [1]
>         c->cpu_index = 0;
> [2]
>         filter_cpuid_features(c, false);
>
> [3]
>         if (this_cpu->c_bsp_init)
>                 this_cpu->c_bsp_init(c);
>
> [4]
>         setup_force_cpu_cap(X86_FEATURE_ALWAYS);
> [5]
> }
>
> and we know it from your testing that moving [0] to [5] fixes the crash.
>
> The question is, can we move it to [4], [3], [2] or even [1] instead, without
> breaking the system?
>
> I still don't see where the breakage comes from, but this would help us narrow it
> down.

[0] is the only spot that breaks box.

        -Mike
Re: [all better] Re: regression: massive trouble with fpu rework
* Mike Galbraith wrote:

> > This would suggest sensitivity on CPUID details, i.e. that doing
> > fpu__init_system() before other CPU init sequences is causing the bug.
> >
> > Does the patch below perhaps make a difference? (I'd suggest to apply it
> > _without_ the other patch I sent.)
>
> Yup, that made it not care about the BIOS setting.. again.

> > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> > index 9fc5e3d9d9c8..922c5e0cea4c 100644
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -742,7 +742,6 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
> >          cpu_detect(c);
> >          get_cpu_vendor(c);
> >          get_cpu_cap(c);
> > -        fpu__init_system(c);
> >
> >          if (this_cpu->c_early_init)
> >                  this_cpu->c_early_init(c);
> > @@ -754,6 +753,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
> >                  this_cpu->c_bsp_init(c);
> >
> >          setup_force_cpu_cap(X86_FEATURE_ALWAYS);
> > +        fpu__init_system(c);
> >  }

Ok, so could you please move the fpu__init_system() further up and see which
position is that starts breaking with the BIOS option set?

here's the current, broken layout of the code:

        get_cpu_cap(c);
[0]     fpu__init_system(c);

        if (this_cpu->c_early_init)
                this_cpu->c_early_init(c);

[1]
        c->cpu_index = 0;
[2]
        filter_cpuid_features(c, false);

[3]
        if (this_cpu->c_bsp_init)
                this_cpu->c_bsp_init(c);

[4]
        setup_force_cpu_cap(X86_FEATURE_ALWAYS);
[5]
}

and we know it from your testing that moving [0] to [5] fixes the crash.

The question is, can we move it to [4], [3], [2] or even [1] instead, without
breaking the system?

I still don't see where the breakage comes from, but this would help us narrow it
down.

Thanks,

        Ingo
Re: [all better] Re: regression: massive trouble with fpu rework
On Mon, Jun 29, 2015 at 02:27:23PM +0200, Mike Galbraith wrote: With it commented out, and fpu__init_system() either back at previously booting position [5] or at original [0], doesn't matter, box is dead, but differently. It stalls after setting clocksource to tsc, and just sits there. ... which means that unmasking the CPUID features is absolutely needed on Linux. Not unmasking probably triggers this original bug which 066941bd4eeb (x86: unmask CPUID levels on Intel CPUs) fixed. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [all better] Re: regression: massive trouble with fpu rework
On Mon, Jun 29, 2015 at 11:35:04AM +0200, Ingo Molnar wrote: I wish that 'unmasking' logic came with more comments: - Why do BIOSen ever mask CPUIDs? Doesn't say a thing why: 066941bd4eeb (x86: unmask CPUID levels on Intel CPUs) SDM doesn't say why either: Limit CPUID Maxval (R/W) When this bit is set to 1, CPUID.00H returns a maximum value in EAX[7:0] of 3. BIOS should contain a setup question that allows users to specify when the installed OS does not support CPUID functions greater than 3. ... Setting this bit may cause unexpected behavior in software that depends on the availability of CPUID leaves greater than 3. In the case of hiding XSAVE from windoze ninety-old, probably something was exploding there, there wasn't a fix to the software so the hardware had to become soft. Purely hypothetical, of course. The last sentence from the SDM quote above also explains why the Linux workaround of clearing that bit again, exists. - Why do we unmask the masking? Also purely hypothetical: because Linux doesn't have the windoze problem. - Why doesn't the kernel keep on working just fine even if certain CPUID aspects are turned off? I guess that should be doable but one has to get such a box, enable that BIOS feature and fix all the fallout that happens. Provided it can be fixed. hpa went and reenabled those CPUID leafs instead in 066941bd4eeb1. I guess we simply shouldn't do any CPUID-dependent stuff before: if (this_cpu-c_early_init) this_cpu-c_early_init(c); and slap a big fat comment above it. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [all better] Re: regression: massive trouble with fpu rework
On Mon, 2015-06-29 at 10:33 +0200, Borislav Petkov wrote: I bet it is that /* Unmask CPUID levels if masked: */ if (c-x86 6 || (c-x86 == 6 c-x86_model = 0xd)) { if (msr_clear_bit(MSR_IA32_MISC_ENABLE, MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT) 0) { c-cpuid_level = cpuid_eax(0); get_cpu_cap(c); } } in early_init_intel(). If you feel like playing, you might comment it out to see what happens. With it commented out, and fpu__init_system() either back at previously booting position [5] or at original [0], doesn't matter, box is dead, but differently. It stalls after setting clocksource to tsc, and just sits there. -Mike -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [all better] Re: regression: massive trouble with fpu rework
* Borislav Petkov b...@alien8.de wrote: On Mon, Jun 29, 2015 at 10:25:29AM +0200, Mike Galbraith wrote: On Mon, 2015-06-29 at 08:40 +0200, Ingo Molnar wrote: * Ok, so could you please move the fpu__init_system() further up and see which position is that starts breaking with the BIOS option set? here's the current, broken layout of the code: get_cpu_cap(c); [0] fpu__init_system(c); if (this_cpu-c_early_init) this_cpu-c_early_init(c); [0] is the only spot that breaks box. I bet it is that /* Unmask CPUID levels if masked: */ if (c-x86 6 || (c-x86 == 6 c-x86_model = 0xd)) { if (msr_clear_bit(MSR_IA32_MISC_ENABLE, MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT) 0) { c-cpuid_level = cpuid_eax(0); get_cpu_cap(c); } } in early_init_intel(). If you feel like playing, you might comment it out to see what happens. :-) Indeed, I bet that makes a difference! I wish that 'unmasking' logic came with more comments: - Why do BIOSen ever mask CPUIDs? - Why do we unmask the masking? - Why doesn't the kernel keep on working just fine even if certain CPUID aspects are turned off? Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [all better] Re: regression: massive trouble with fpu rework
On 06/28/2015 11:40 PM, Ingo Molnar wrote: Ok, so could you please move the fpu__init_system() further up and see which position is that starts breaking with the BIOS option set? here's the current, broken layout of the code: get_cpu_cap(c); [0] fpu__init_system(c); if (this_cpu-c_early_init) this_cpu-c_early_init(c); [1] c-cpu_index = 0; [2] filter_cpuid_features(c, false); [3] if (this_cpu-c_bsp_init) this_cpu-c_bsp_init(c); [4] setup_force_cpu_cap(X86_FEATURE_ALWAYS); [5] } and we know it from your testing that moving [0] to [5] fixes the crash. The question is, can we move it to [4], [3], [2] or even [1] instead, without breaking the system? I still don't see where the breakage comes from, but this would help us narrow it down. It should be moved to [4] or [5]. I would argue that the line setting X86_FEATURE_ALWAYS should moved up and then fpu__init_system(c) should be moved after the c_bsp_init() line. -hpa -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [all better] Re: regression: massive trouble with fpu rework
On 06/29/2015 02:35 AM, Ingo Molnar wrote: Indeed, I bet that makes a difference! I wish that 'unmasking' logic came with more comments: - Why do BIOSen ever mask CPUIDs? To work around bugs in legacy operating systems. - Why do we unmask the masking? Because we don't have those specific bugs. - Why doesn't the kernel keep on working just fine even if certain CPUID aspects are turned off? Because it exercises code paths that are otherwise impossible, for example, it exposes the XSAVE capability without exposing the XSAVE information in higher CPUID leaves. The other option would be to have a list of CPU features that should be turned off whenever the CPUID leaf maximum is too low, but it gives a better user experience to just override the BIOS capping and then we have fewer code paths in the kernel to worry about. -hpa -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [all better] Re: regression: massive trouble with fpu rework
On Mon, Jun 29, 2015 at 10:25:29AM +0200, Mike Galbraith wrote:
> On Mon, 2015-06-29 at 08:40 +0200, Ingo Molnar wrote:
> > Ok, so could you please move the fpu__init_system() further up and see
> > which position is that starts breaking with the BIOS option set?
> >
> > here's the current, broken layout of the code:
> >
> >         get_cpu_cap(c);
> > [0]     fpu__init_system(c);
> >
> >         if (this_cpu->c_early_init)
> >                 this_cpu->c_early_init(c);
>
> [0] is the only spot that breaks box.

I bet it is that

        /* Unmask CPUID levels if masked: */
        if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
                if (msr_clear_bit(MSR_IA32_MISC_ENABLE,
                                  MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT) > 0) {
                        c->cpuid_level = cpuid_eax(0);
                        get_cpu_cap(c);
                }
        }

in early_init_intel(). If you feel like playing, you might comment it out to
see what happens. :-)

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
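The pattern in that snippet is "clear the limit bit, and only if it was actually set, re-read the cached CPUID level". A user-space toy model of that pattern, for illustration only, might look like the following. Everything prefixed `fake_` is hypothetical and made up for this sketch; the cap of 3 follows the "disables CPUID levels >= 0x04" description given elsewhere in this thread, and bit position 22 is assumed here to be the "Limit CPUID Maxval" bit of IA32_MISC_ENABLE.

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_LIMIT_CPUID_BIT 22	/* assumed position of the limit bit */
#define FAKE_CAPPED_MAX_LEAF 3	/* cap taken from the thread's description */

/* Toy CPU: one MSR word plus the true maximum basic CPUID leaf. */
struct fake_cpu {
	uint64_t misc_enable;
	uint32_t real_max_leaf;
};

/* Model of CPUID leaf 0 EAX: capped while the limit bit is set. */
uint32_t fake_cpuid_max_leaf(const struct fake_cpu *cpu)
{
	uint64_t mask = (uint64_t)1 << FAKE_LIMIT_CPUID_BIT;

	if ((cpu->misc_enable & mask) && cpu->real_max_leaf > FAKE_CAPPED_MAX_LEAF)
		return FAKE_CAPPED_MAX_LEAF;
	return cpu->real_max_leaf;
}

/* Clear 'bit' in '*msr'; return 1 if it was set before, 0 if already clear. */
int fake_msr_clear_bit(uint64_t *msr, unsigned int bit)
{
	uint64_t mask;

	assert(bit < 64);
	mask = (uint64_t)1 << bit;
	if (!(*msr & mask))
		return 0;
	*msr &= ~mask;
	return 1;
}

/* Lift the cap if present; return the level the caller should cache. */
uint32_t fake_unmask_and_reread(struct fake_cpu *cpu, uint32_t cached_level)
{
	if (fake_msr_clear_bit(&cpu->misc_enable, FAKE_LIMIT_CPUID_BIT) > 0)
		cached_level = fake_cpuid_max_leaf(cpu);
	return cached_level;
}
```

The point of the model is that anything computed from the cached level before `fake_unmask_and_reread()` runs is computed from a smaller leaf range than anything computed afterwards.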
Re: [all better] Re: regression: massive trouble with fpu rework
* Mike Galbraith <umgwanakikb...@gmail.com> wrote:

> > This would suggest sensitivity on CPUID details, i.e. that doing
> > fpu__init_system() before other CPU init sequences is causing the bug.
> >
> > Does the patch below perhaps make a difference? (I'd suggest to apply it
> > _without_ the other patch I sent.)
>
> Yup, that made it not care about the BIOS setting.. again.
>
> > diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> > index 9fc5e3d9d9c8..922c5e0cea4c 100644
> > --- a/arch/x86/kernel/cpu/common.c
> > +++ b/arch/x86/kernel/cpu/common.c
> > @@ -742,7 +742,6 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
> >  	cpu_detect(c);
> >  	get_cpu_vendor(c);
> >  	get_cpu_cap(c);
> > -	fpu__init_system(c);
> >
> >  	if (this_cpu->c_early_init)
> >  		this_cpu->c_early_init(c);
> > @@ -754,6 +753,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
> >  		this_cpu->c_bsp_init(c);
> >
> >  	setup_force_cpu_cap(X86_FEATURE_ALWAYS);
> > +	fpu__init_system(c);
> >  }

Ok, so could you please move the fpu__init_system() further up and see which
position is that starts breaking with the BIOS option set?

here's the current, broken layout of the code:

        get_cpu_cap(c);
[0]     fpu__init_system(c);

        if (this_cpu->c_early_init)
                this_cpu->c_early_init(c);
[1]
        c->cpu_index = 0;
[2]
        filter_cpuid_features(c, false);
[3]
        if (this_cpu->c_bsp_init)
                this_cpu->c_bsp_init(c);
[4]
        setup_force_cpu_cap(X86_FEATURE_ALWAYS);
[5]
}

and we know it from your testing that moving [0] to [5] fixes the crash. The
question is, can we move it to [4], [3], [2] or even [1] instead, without
breaking the system?

I still don't see where the breakage comes from, but this would help us
narrow it down.

Thanks,

        Ingo
Re: [all better] Re: regression: massive trouble with fpu rework
On Mon, 2015-06-29 at 10:33 +0200, Borislav Petkov wrote:
> On Mon, Jun 29, 2015 at 10:25:29AM +0200, Mike Galbraith wrote:
> > On Mon, 2015-06-29 at 08:40 +0200, Ingo Molnar wrote:
> > > Ok, so could you please move the fpu__init_system() further up and see
> > > which position is that starts breaking with the BIOS option set?
> > >
> > > here's the current, broken layout of the code:
> > >
> > >         get_cpu_cap(c);
> > > [0]     fpu__init_system(c);
> > >
> > >         if (this_cpu->c_early_init)
> > >                 this_cpu->c_early_init(c);
> >
> > [0] is the only spot that breaks box.
>
> I bet it is that
>
>         /* Unmask CPUID levels if masked: */
>         if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
>                 if (msr_clear_bit(MSR_IA32_MISC_ENABLE,
>                                   MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT) > 0) {
>                         c->cpuid_level = cpuid_eax(0);
>                         get_cpu_cap(c);
>                 }
>         }
>
> in early_init_intel(). If you feel like playing, you might comment it out
> to see what happens.

I'll poke after I get back from physical tort^Wtherapy.

        -Mike
Re: [all better] Re: regression: massive trouble with fpu rework
On Mon, 2015-06-29 at 08:40 +0200, Ingo Molnar wrote:
> Ok, so could you please move the fpu__init_system() further up and see which
> position is that starts breaking with the BIOS option set?
>
> here's the current, broken layout of the code:
>
>         get_cpu_cap(c);
> [0]     fpu__init_system(c);
>
>         if (this_cpu->c_early_init)
>                 this_cpu->c_early_init(c);
> [1]
>         c->cpu_index = 0;
> [2]
>         filter_cpuid_features(c, false);
> [3]
>         if (this_cpu->c_bsp_init)
>                 this_cpu->c_bsp_init(c);
> [4]
>         setup_force_cpu_cap(X86_FEATURE_ALWAYS);
> [5]
> }
>
> and we know it from your testing that moving [0] to [5] fixes the crash. The
> question is, can we move it to [4], [3], [2] or even [1] instead, without
> breaking the system?
>
> I still don't see where the breakage comes from, but this would help us
> narrow it down.

[0] is the only spot that breaks box.

        -Mike
Re: [all better] Re: regression: massive trouble with fpu rework
* H. Peter Anvin <h...@zytor.com> wrote:

> On 06/29/2015 02:35 AM, Ingo Molnar wrote:
> > Indeed, I bet that makes a difference! I wish that 'unmasking' logic came
> > with more comments:
> >
> >  - Why do BIOSen ever mask CPUIDs?
>
> To work around bugs in legacy operating systems.
>
> >  - Why do we unmask the masking?
>
> Because we don't have those specific bugs.

Great - would be nice to put those reasons between /* */ markers, to keep
future generations (and overworked maintainers!) from wondering.

> >  - Why doesn't the kernel keep on working just fine even if certain CPUID
> >    aspects are turned off?
>
> Because it exercises code paths that are otherwise impossible, for example,
> it exposes the XSAVE capability without exposing the XSAVE information in
> higher CPUID leaves.
>
> The other option would be to have a list of CPU features that should be
> turned off whenever the CPUID leaf maximum is too low, but it gives a better
> user experience to just override the BIOS capping and then we have fewer
> code paths in the kernel to worry about.

1) As a side note, I think we should generally be robust enough to recognize
   pretty much any CPUID 'mischief' and at minimum not crash.

2) But this FPU crash is different, here the reason for the crash is the
   following bug in the FPU code:

        fpu__init_system();     /* inits the FPU based on masked CPUID */
        ...
        CPUID *extends*
        ...
        fpu__init_cpu();        /* Actually uses the FPU now based on the expanded CPUID */

        *KABOOM*

I.e. we (obviously) should not base half of the FPU logic on different CPUID
bits than the other half of the FPU logic.

I'll queue up the fix, which is to do the early FPU init after our CPUID
state stabilizes. (i.e. the second patch I sent to Mike.)

Thanks,

        Ingo
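The ordering problem described in point 2) can be reproduced in a few lines of plain C. The sketch below is a deliberately simplified toy model and not kernel code: all `toy_` names are hypothetical, and it collapses "XSAVE is usable" into "leaf 0xd is readable", which is an assumption of the model rather than how the real detection works. It only shows why running the system-wide init before the CPUID range grows leaves the two init steps disagreeing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_XSTATE_LEAF 0x0000000dU	/* leaf that enumerates XSAVE state */

/* Toy CPU whose firmware capped the maximum CPUID leaf. */
struct toy_cpu {
	uint32_t max_leaf;	/* what CPUID leaf 0 reports right now */
	uint32_t real_max_leaf;	/* what it reports once the cap is lifted */
};

/* Toy FPU setup state. */
struct toy_fpu {
	bool layout_known;	/* set by the system-wide init step */
};

/* Step 1: system-wide init; learns the save-area layout from CPUID. */
void toy_fpu_init_system(struct toy_fpu *fpu, const struct toy_cpu *cpu)
{
	fpu->layout_known = cpu->max_leaf >= TOY_XSTATE_LEAF;
}

/* Vendor early init: lifts the firmware cap, so CPUID "extends". */
void toy_unmask_cpuid(struct toy_cpu *cpu)
{
	cpu->max_leaf = cpu->real_max_leaf;
}

/*
 * Step 2: per-CPU init; decides from the current CPUID view whether to
 * use XSAVE. Returns false when it would use XSAVE without a known
 * layout, i.e. the inconsistent state that goes *KABOOM*.
 */
bool toy_fpu_init_cpu(const struct toy_fpu *fpu, const struct toy_cpu *cpu)
{
	bool wants_xsave = cpu->max_leaf >= TOY_XSTATE_LEAF;

	return !wants_xsave || fpu->layout_known;
}

/* Boot once; 'fixed' moves the system init after the unmasking. */
bool toy_boot(uint32_t capped_leaf, uint32_t real_leaf, bool fixed)
{
	struct toy_cpu cpu = { capped_leaf, real_leaf };
	struct toy_fpu fpu = { false };

	assert(capped_leaf <= real_leaf);
	if (!fixed)
		toy_fpu_init_system(&fpu, &cpu);
	toy_unmask_cpuid(&cpu);
	if (fixed)
		toy_fpu_init_system(&fpu, &cpu);
	return toy_fpu_init_cpu(&fpu, &cpu);
}
```

In this model, `toy_boot(0x3, 0xd, false)` ends up inconsistent while `toy_boot(0x3, 0xd, true)` does not, and a machine without the cap is fine with either ordering, which matches the observation that only the BIOS option exposed the problem.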
Re: [all better] Re: regression: massive trouble with fpu rework
* Borislav Petkov <b...@alien8.de> wrote:

> On Mon, Jun 29, 2015 at 02:27:23PM +0200, Mike Galbraith wrote:
> > With it commented out, and fpu__init_system() either back at previously
> > booting position [5] or at original [0], doesn't matter, box is dead, but
> > differently. It stalls after setting clocksource to tsc, and just sits
> > there.
>
> ... which means that unmasking the CPUID features is absolutely needed on
> Linux. Not unmasking probably triggers this original bug which
>
>   066941bd4eeb (x86: unmask CPUID levels on Intel CPUs)
>
> fixed.

Yes.

And I'd consider us hanging a separate (but not high prio) bug: the kernel
should be robust as long as the CPUID data is stable. In that sense the
original fix is right (we really want to unmask all available CPUID leaves),
but it also masked another (less severe) kernel bug.

For example virtualization is known to tweak CPUID details creatively, and
firmware (as this example shows) can mess it up as well, so we generally want
to treat it as untrusted input data that needs to be validated.

Thanks,

        Ingo
Re: [all better] Re: regression: massive trouble with fpu rework
* H. Peter Anvin <h...@zytor.com> wrote:

> On 06/28/2015 11:40 PM, Ingo Molnar wrote:
> > Ok, so could you please move the fpu__init_system() further up and see
> > which position is that starts breaking with the BIOS option set?
> >
> > here's the current, broken layout of the code:
> >
> >         get_cpu_cap(c);
> > [0]     fpu__init_system(c);
> >
> >         if (this_cpu->c_early_init)
> >                 this_cpu->c_early_init(c);
> > [1]
> >         c->cpu_index = 0;
> > [2]
> >         filter_cpuid_features(c, false);
> > [3]
> >         if (this_cpu->c_bsp_init)
> >                 this_cpu->c_bsp_init(c);
> > [4]
> >         setup_force_cpu_cap(X86_FEATURE_ALWAYS);
> > [5]
> > }
> >
> > and we know it from your testing that moving [0] to [5] fixes the crash.
> > The question is, can we move it to [4], [3], [2] or even [1] instead,
> > without breaking the system?
> >
> > I still don't see where the breakage comes from, but this would help us
> > narrow it down.
>
> It should be moved to [4] or [5]. I would argue that the line setting
> X86_FEATURE_ALWAYS should be moved up and then fpu__init_system(c) should be
> moved after the c_bsp_init() line.

Yeah, so the patch I sent to Mike (and which solved the bug) moved it to [5].

Thanks,

        Ingo
Re: [all better] Re: regression: massive trouble with fpu rework
On Sun, 28 Jun 2015, Mike Galbraith wrote:
> On Sun, 2015-06-28 at 12:06 -0300, Henrique de Moraes Holschuh wrote:
> > It is just that this kind of breakage should not be subtle if we can help
> > it, because people will use a crippled system for years without noticing...
>
> If you can use it without noticing for years, it ain't crippled, or? My
> point being that severity seems more akin to the box having a zit behind
> its left ear, in which case lobotomizing it seems a tad extreme.

Noted.

However if it does boot with cpuid limited (and we don't "unlimit" it
somehow) on a recent processor, at *best* the user paid good money for a lot
of stuff that is going to not be used to enhance system performance and
system security.

It is not nice to the user to just limp along silently about this.

--
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
Re: [all better] Re: regression: massive trouble with fpu rework
On Sun, 2015-06-28 at 12:06 -0300, Henrique de Moraes Holschuh wrote:
> On Sun, 28 Jun 2015, Mike Galbraith wrote:
> > > > > > BIOS setting "Limit CPUID Maximum" upsets new fpu code mightily.
> > >
> > > Well, it is supposed to disable CPUID levels >= 0x04. This thing should
> > > *NEVER* be enabled, the last operating system that required it to be
> > > enabled was Windows 98.
> > >
> > > Can/do we override that crap during cpu init? If we cannot/don't, maybe
> > > instead of limping along with CPUID crippled, it would be better to either
> > > output a very nasty warning, or outright stop booting [with an appropriate
> > > error message] ?
> >
> > Why get all upset? We didn't even notice before, nor did/does that
> > other OS. A casual "BTW, your BIOS sucks.." should suffice, no?
>
> Oh, I am not upset, although I suppose my reply did look like it. Sorry
> about that.

I didn't mean you personally of course, I meant the kernel ;-)

> It is just that this kind of breakage should not be subtle if we can help
> it, because people will use a crippled system for years without noticing...

If you can use it without noticing for years, it ain't crippled, or? My
point being that severity seems more akin to the box having a zit behind
its left ear, in which case lobotomizing it seems a tad extreme.

        -Mike
Re: [all better] Re: regression: massive trouble with fpu rework
On Sun, 28 Jun 2015, Mike Galbraith wrote:
> > > > > BIOS setting "Limit CPUID Maximum" upsets new fpu code mightily.
> >
> > Well, it is supposed to disable CPUID levels >= 0x04. This thing should
> > *NEVER* be enabled, the last operating system that required it to be enabled
> > was Windows 98.
> >
> > Can/do we override that crap during cpu init? If we cannot/don't, maybe
> > instead of limping along with CPUID crippled, it would be better to either
> > output a very nasty warning, or outright stop booting [with an appropriate
> > error message] ?
>
> Why get all upset? We didn't even notice before, nor did/does that
> other OS. A casual "BTW, your BIOS sucks.." should suffice, no?

Oh, I am not upset, although I suppose my reply did look like it. Sorry
about that.

It is just that this kind of breakage should not be subtle if we can help
it, because people will use a crippled system for years without noticing...

--
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
Re: [all better] Re: regression: massive trouble with fpu rework
On Sat, 2015-06-27 at 18:02 -0300, Henrique de Moraes Holschuh wrote:
> On Sat, 27 Jun 2015, Mike Galbraith wrote:
> > > > BIOS setting "Limit CPUID Maximum" upsets new fpu code mightily.
> >
> > That BIOS setting is annotated with the helpful text "Disabled for
> > Windows XP". It makes box say interesting things during boot, like...
> >
> > x86/fpu: XSTATE_CPUID missing!
> >
> > ..or with HEAD, it triggers warning..
> >
> >         if (boot_cpu_data.cpuid_level < XSTATE_CPUID) {
> >                 WARN_ON_FPU(1);
> >                 return;
> >         }
> >
> > ..and all kinds of bad juju follows. I have no idea what the thing does
> > beyond what I can interpolate from the word 'limit'.
>
> Well, it is supposed to disable CPUID levels >= 0x04. This thing should
> *NEVER* be enabled, the last operating system that required it to be enabled
> was Windows 98.
>
> Can/do we override that crap during cpu init? If we cannot/don't, maybe
> instead of limping along with CPUID crippled, it would be better to either
> output a very nasty warning, or outright stop booting [with an appropriate
> error message] ?

Why get all upset? We didn't even notice before, nor did/does that
other OS. A casual "BTW, your BIOS sucks.." should suffice, no?

        -Mike
Re: [all better] Re: regression: massive trouble with fpu rework
On Sat, 27 Jun 2015, Mike Galbraith wrote:
> > > BIOS setting "Limit CPUID Maximum" upsets new fpu code mightily.
>
> That BIOS setting is annotated with the helpful text "Disabled for
> Windows XP". It makes box say interesting things during boot, like...
>
> x86/fpu: XSTATE_CPUID missing!
>
> ..or with HEAD, it triggers warning..
>
>         if (boot_cpu_data.cpuid_level < XSTATE_CPUID) {
>                 WARN_ON_FPU(1);
>                 return;
>         }
>
> ..and all kinds of bad juju follows. I have no idea what the thing does
> beyond what I can interpolate from the word 'limit'.

Well, it is supposed to disable CPUID levels >= 0x04. This thing should
*NEVER* be enabled, the last operating system that required it to be enabled
was Windows 98.

Can/do we override that crap during cpu init? If we cannot/don't, maybe
instead of limping along with CPUID crippled, it would be better to either
output a very nasty warning, or outright stop booting [with an appropriate
error message] ?

--
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
Re: [all better] Re: regression: massive trouble with fpu rework
On Sat, 2015-06-27 at 11:37 +0200, Borislav Petkov wrote:
> On Sat, Jun 27, 2015 at 10:55:28AM +0200, Mike Galbraith wrote:
> > Yup, that made it not care about the BIOS setting.. again.
>
> Does it say
>
> "x86/fpu: Legacy x87 FPU detected."
>
> with Ingo's patch?

Nope.

> Or do you see that "x86/fpu: Enabled xstate features... " print out from
> the end of fpu__init_system_xstate()?

[0.00] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
[0.00] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'
[0.00] x86/fpu: Enabled xstate features 0x7, context size is 0x340 bytes, using 'standard' format.
[0.00] x86/fpu: Using 'eager' FPU context switches.
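As a sanity check on the numbers in that boot log: in the standard (non-compacted) XSAVE format, the legacy x87/SSE region is 512 bytes and is followed by a 64-byte header, so the first extended component starts at 0x240; adding the 0x100-byte AVX component gives the reported 0x340. A small sketch of that arithmetic follows; the `xsave_` function and macro names are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define XSAVE_LEGACY_SIZE 512u	/* x87 + SSE region (FXSAVE image) */
#define XSAVE_HEADER_SIZE  64u	/* XSAVE header that follows it */

/*
 * Size of a standard-format save area whose last enabled component sits at
 * 'last_offset' and is 'last_size' bytes long. Extended components cannot
 * start before the end of the legacy region plus header.
 */
uint32_t xsave_standard_size(uint32_t last_offset, uint32_t last_size)
{
	assert(last_offset >= XSAVE_LEGACY_SIZE + XSAVE_HEADER_SIZE);
	return last_offset + last_size;
}
```

Plugging in the logged values, `xsave_standard_size(0x240, 0x100)` gives 0x340, which agrees with the "context size is 0x340 bytes" line.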
Re: [all better] Re: regression: massive trouble with fpu rework
On Sat, Jun 27, 2015 at 10:55:28AM +0200, Mike Galbraith wrote:
> Yup, that made it not care about the BIOS setting.. again.

Does it say

"x86/fpu: Legacy x87 FPU detected."

with Ingo's patch?

Or do you see that "x86/fpu: Enabled xstate features... " print out from
the end of fpu__init_system_xstate()?

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
Re: [all better] Re: regression: massive trouble with fpu rework
On Sat, 2015-06-27 at 10:25 +0200, Ingo Molnar wrote:
> * Mike Galbraith wrote:
>
> > On Sat, 2015-06-27 at 08:25 +0200, Mike Galbraith wrote:
> > > Hi Ingo,
> > >
> > > My i7-4790 box is having one hell of a time with this merge window, is
> > > dead in the water.
> >
> > BIOS setting "Limit CPUID Maximum" upsets new fpu code mightily.
>
> Ok, that's interesting. Mind explaining it a bit more verbosely - which
> setting is causing what?

That BIOS setting is annotated with the helpful text "Disabled for
Windows XP". It makes box say interesting things during boot, like...

x86/fpu: XSTATE_CPUID missing!

..or with HEAD, it triggers warning..

        if (boot_cpu_data.cpuid_level < XSTATE_CPUID) {
                WARN_ON_FPU(1);
                return;
        }

..and all kinds of bad juju follows. I have no idea what the thing does
beyond what I can interpolate from the word 'limit'.

> This would suggest sensitivity on CPUID details, i.e. that doing
> fpu__init_system() before other CPU init sequences is causing the bug.
>
> Does the patch below perhaps make a difference? (I'd suggest to apply it
> _without_ the other patch I sent.)

Yup, that made it not care about the BIOS setting.. again.

> Thanks,
>
> 	Ingo
>
>  arch/x86/kernel/cpu/common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 9fc5e3d9d9c8..922c5e0cea4c 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -742,7 +742,6 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
>  	cpu_detect(c);
>  	get_cpu_vendor(c);
>  	get_cpu_cap(c);
> -	fpu__init_system(c);
>
>  	if (this_cpu->c_early_init)
>  		this_cpu->c_early_init(c);
> @@ -754,6 +753,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
>  		this_cpu->c_bsp_init(c);
>
>  	setup_force_cpu_cap(X86_FEATURE_ALWAYS);
> +	fpu__init_system(c);
>  }
>
>  void __init early_cpu_init(void)
Re: [all better] Re: regression: massive trouble with fpu rework
* Mike Galbraith wrote:

> On Sat, 2015-06-27 at 08:25 +0200, Mike Galbraith wrote:
> > Hi Ingo,
> >
> > My i7-4790 box is having one hell of a time with this merge window, is
> > dead in the water.
>
> BIOS setting "Limit CPUID Maximum" upsets new fpu code mightily.

Ok, that's interesting. Mind explaining it a bit more verbosely - which
setting is causing what?

This would suggest sensitivity on CPUID details, i.e. that doing
fpu__init_system() before other CPU init sequences is causing the bug.

Does the patch below perhaps make a difference? (I'd suggest to apply it
_without_ the other patch I sent.)

Thanks,

	Ingo

 arch/x86/kernel/cpu/common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9fc5e3d9d9c8..922c5e0cea4c 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -742,7 +742,6 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 	cpu_detect(c);
 	get_cpu_vendor(c);
 	get_cpu_cap(c);
-	fpu__init_system(c);

 	if (this_cpu->c_early_init)
 		this_cpu->c_early_init(c);
@@ -754,6 +753,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 		this_cpu->c_bsp_init(c);

 	setup_force_cpu_cap(X86_FEATURE_ALWAYS);
+	fpu__init_system(c);
 }

 void __init early_cpu_init(void)
Re: [all better] Re: regression: massive trouble with fpu rework
On Sat, 2015-06-27 at 10:25 +0200, Ingo Molnar wrote: * Mike Galbraith umgwanakikb...@gmail.com wrote: On Sat, 2015-06-27 at 08:25 +0200, Mike Galbraith wrote: Hi Ingo, My i7-4790 box is having one hell of a time with this merge window, is dead in the water. BIOS setting Limit CPUID Maximum upsets new fpu code mightily. Ok, that's interesting. Mind explaining it a bit more verbosely - which setting is causing what? That BIOS setting is annotated with the helpful text Disabled for Windows XP. It makes box say interesting things during boot, like... x86/fpu: XSTATE_CPUID missing! ..or with HEAD, it triggers warning.. if (boot_cpu_data.cpuid_level XSTATE_CPUID) { WARN_ON_FPU(1); return; } ..and all kinds of bad juju follows. I have no idea what the thing does beyond what I can interpolate from the word 'limit'. This would suggest sensitivity on CPUID details, i.e. that doing fpu__init_system() before other CPU init sequences is causing the bug. Does the patch below perhaps make a difference? (I'd suggest to apply it _without_ the other patch I sent.) Yup, that made it not care about the BIOS setting.. again. 
Thanks, Ingo arch/x86/kernel/cpu/common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 9fc5e3d9d9c8..922c5e0cea4c 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -742,7 +742,6 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c) cpu_detect(c); get_cpu_vendor(c); get_cpu_cap(c); - fpu__init_system(c); if (this_cpu-c_early_init) this_cpu-c_early_init(c); @@ -754,6 +753,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c) this_cpu-c_bsp_init(c); setup_force_cpu_cap(X86_FEATURE_ALWAYS); + fpu__init_system(c); } void __init early_cpu_init(void) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [all better] Re: regression: massive trouble with fpu rework
On Sat, Jun 27, 2015 at 10:55:28AM +0200, Mike Galbraith wrote: Yup, that made it not care about the BIOS setting.. again. Does it say x86/fpu: Legacy x87 FPU detected. with Ingo's patch? Or do you see that x86/fpu: Enabled xstate features... print out from the end of fpu__init_system_xstate()? -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [all better] Re: regression: massive trouble with fpu rework
* Mike Galbraith umgwanakikb...@gmail.com wrote: On Sat, 2015-06-27 at 08:25 +0200, Mike Galbraith wrote: Hi Ingo, My i7-4790 box is having one hell of a time with this merge window, is dead in the water. BIOS setting Limit CPUID Maximum upsets new fpu code mightily. Ok, that's interesting. Mind explaining it a bit more verbosely - which setting is causing what? This would suggest sensitivity on CPUID details, i.e. that doing fpu__init_system() before other CPU init sequences is causing the bug. Does the patch below perhaps make a difference? (I'd suggest to apply it _without_ the other patch I sent.) Thanks, Ingo arch/x86/kernel/cpu/common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 9fc5e3d9d9c8..922c5e0cea4c 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -742,7 +742,6 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c) cpu_detect(c); get_cpu_vendor(c); get_cpu_cap(c); - fpu__init_system(c); if (this_cpu-c_early_init) this_cpu-c_early_init(c); @@ -754,6 +753,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c) this_cpu-c_bsp_init(c); setup_force_cpu_cap(X86_FEATURE_ALWAYS); + fpu__init_system(c); } void __init early_cpu_init(void) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [all better] Re: regression: massive trouble with fpu rework
On Sat, 2015-06-27 at 11:37 +0200, Borislav Petkov wrote: On Sat, Jun 27, 2015 at 10:55:28AM +0200, Mike Galbraith wrote: Yup, that made it not care about the BIOS setting.. again. Does it say x86/fpu: Legacy x87 FPU detected. with Ingo's patch? Nope. Or do you see that x86/fpu: Enabled xstate features... print out from the end of fpu__init_system_xstate()? [0.00] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100 [0.00] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers' [0.00] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers' [0.00] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers' [0.00] x86/fpu: Enabled xstate features 0x7, context size is 0x340 bytes, using 'standard' format. [0.00] x86/fpu: Using 'eager' FPU context switches. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [all better] Re: regression: massive trouble with fpu rework
On Sat, 27 Jun 2015, Mike Galbraith wrote:
> BIOS setting "Limit CPUID Maximum" upsets new fpu code mightily.
>
> That BIOS setting is annotated with the helpful text "Disabled for
> Windows XP". It makes box say interesting things during boot, like...
>
> x86/fpu: XSTATE_CPUID missing!
>
> ..or with HEAD, it triggers warning..
>
> 	if (boot_cpu_data.cpuid_level < XSTATE_CPUID) {
> 		WARN_ON_FPU(1);
> 		return;
> 	}
>
> ..and all kinds of bad juju follows. I have no idea what the thing does
> beyond what I can interpolate from the word 'limit'.

Well, it is supposed to disable CPUID levels >= 0x04. This thing should *NEVER* be
enabled, the last operating system that required it to be enabled was Windows 98.

Can/do we override that crap during cpu init? If we cannot/don't, maybe instead of
limping along with CPUID crippled, it would be better to either output a very
nasty warning, or outright stop booting [with an appropriate error message]?

--
  Henrique Holschuh
Re: [all better] Re: regression: massive trouble with fpu rework
On Sat, 2015-06-27 at 18:02 -0300, Henrique de Moraes Holschuh wrote:
> On Sat, 27 Jun 2015, Mike Galbraith wrote:
> > BIOS setting "Limit CPUID Maximum" upsets new fpu code mightily.
> >
> > That BIOS setting is annotated with the helpful text "Disabled for
> > Windows XP". It makes box say interesting things during boot, like...
> >
> > x86/fpu: XSTATE_CPUID missing!
> >
> > ..or with HEAD, it triggers warning..
> >
> > 	if (boot_cpu_data.cpuid_level < XSTATE_CPUID) {
> > 		WARN_ON_FPU(1);
> > 		return;
> > 	}
> >
> > ..and all kinds of bad juju follows. I have no idea what the thing does
> > beyond what I can interpolate from the word 'limit'.
>
> Well, it is supposed to disable CPUID levels >= 0x04. This thing should
> *NEVER* be enabled, the last operating system that required it to be
> enabled was Windows 98.
>
> Can/do we override that crap during cpu init? If we cannot/don't, maybe
> instead of limping along with CPUID crippled, it would be better to either
> output a very nasty warning, or outright stop booting [with an appropriate
> error message]?

Why get all upset? We didn't even notice before, nor did/does that other OS. A
casual "BTW, your BIOS sucks.." should suffice, no?

	-Mike