Re: [BUG] GPF on reboot of box

2017-10-08 Thread Steven Rostedt
On Sat, 7 Oct 2017 15:49:33 -0700
Andy Lutomirski  wrote:


> Howdy, batman!  Can you try the attached patch?

Yep and it worked.

Reported-by: Steven Rostedt (VMware) 
Tested-by: Steven Rostedt (VMware) 

Thanks!

-- Steve


Re: [BUG] GPF on reboot of box

2017-10-07 Thread Andy Lutomirski
On Sat, Oct 7, 2017 at 6:36 AM, Steven Rostedt  wrote:
>
> [ Replying from an actual computer this time ]
>
> On Fri, 6 Oct 2017 19:21:26 -0700
> Nadav Amit  wrote:
>
>> IIRC, there was a problem in rc1, which should be resolved in newer rcs.
>> If you need to run rc1, you can try to use the kernel parameter "nopcid".
>
> I noticed the bug when I rebased my work on top of rc3 and started
> testing that. I only mentioned rc1 because that's the first tagged
> release that had the bug in it. If it is fixed in a later release, then
> great! But I don't see that as the case (I haven't tested work on top
> of rc3).
>
> I'd like to have this fixed without having to do workarounds like
> adding nopcid to the kernel command line. If it is a bug for me, I'm
> sure it's going to be a bug for many others that won't know how to
> complain about it.
>

Howdy, batman!  Can you try the attached patch?
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 54180fa6f66f..add33f600531 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -105,6 +105,10 @@ void __noreturn machine_real_restart(unsigned int type)
 	load_cr3(initial_page_table);
 #else
 	write_cr3(real_mode_header->trampoline_pgd);
+
+	/* Exiting long mode will fail if CR4.PCIDE is set. */
+	if (static_cpu_has(X86_FEATURE_PCID))
+		cr4_clear_bits(X86_CR4_PCIDE);
 #endif
 
 	/* Jump to the identity-mapped low memory code */
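
For readers following the patch, here is a minimal user-space sketch in C of what the added lines do. It assumes CR4.PCIDE is bit 17 of CR4, matching the kernel's X86_CR4_PCIDE definition. The struct and every sketch_-prefixed name are hypothetical stand-ins for illustration, not kernel APIs, and the register is modelled as a plain integer rather than touched for real.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: CR4.PCIDE is bit 17 of CR4 (as in the kernel's X86_CR4_PCIDE). */
#define SKETCH_CR4_PCIDE (UINT64_C(1) << 17)

/* Hypothetical model of the CPU state the patch cares about. */
struct sketch_cpu {
	uint64_t cr4;   /* stand-in for the real CR4 register */
	bool has_pcid;  /* stand-in for static_cpu_has(X86_FEATURE_PCID) */
};

/* Return cr4 with every bit in mask cleared; other bits are untouched. */
static uint64_t sketch_cr4_clear_bits(uint64_t cr4, uint64_t mask)
{
	return cr4 & ~mask;
}

/*
 * Mirror of the logic the patch adds before the jump to real mode:
 * only on PCID-capable CPUs, drop PCIDE so that paging can later be
 * disabled without faulting.
 */
static void sketch_prepare_real_mode(struct sketch_cpu *cpu)
{
	assert(cpu != NULL);
	if (cpu->has_pcid)
		cpu->cr4 = sketch_cr4_clear_bits(cpu->cr4, SKETCH_CR4_PCIDE);
}
```

Calling sketch_prepare_real_mode() on a model with PCIDE and one other CR4 bit set should leave only the other bit set; on a model without PCID it should leave cr4 unchanged.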


Re: [BUG] GPF on reboot of box

2017-10-07 Thread Steven Rostedt

[ Replying from an actual computer this time ]

On Fri, 6 Oct 2017 19:21:26 -0700
Nadav Amit  wrote:

> IIRC, there was a problem in rc1, which should be resolved in newer rcs.
> If you need to run rc1, you can try to use the kernel parameter "nopcid".

I noticed the bug when I rebased my work on top of rc3 and started
testing that. I only mentioned rc1 because that's the first tagged
release that had the bug in it. If it is fixed in a later release, then
great! But I don't see that as the case (I haven't tested work on top
of rc3).

I'd like to have this fixed without having to do workarounds like
adding nopcid to the kernel command line. If it is a bug for me, I'm
sure it's going to be a bug for many others that won't know how to
complain about it.

-- Steve


Re: [BUG] GPF on reboot of box

2017-10-06 Thread Steven Rostedt
This bug still crashes my box on rc3

-- Steve


On October 6, 2017 10:21:26 PM EDT, Nadav Amit  wrote:
>Steven Rostedt  wrote:
>
>> Starting with v4.14-rc1 my box crashes during a reboot. All I get is
>> the following:
>> 
>> [  250.615793] sd 0:0:0:0: [sda] Synchronizing SCSI cache
>> [  250.635639] reboot: Restarting system
>> [  250.639671] reboot: machine restart
>> [  250.643842] general protection fault:  [#1] PREEMPT SMP KASAN
>> 
>> Then nothing. No reboot, it just hangs there.
>> 
>> I bisected it down to this commit:
>> 
>> 660da7c9228f ("x86/mm: Enable CR4.PCIDE on supported systems")
>> 
>> If I comment out the setting of X86_CR4_PCIDE in that commit, it
>> boots fine. But...
>> 
>> I noticed that commit c7ad5ad29 ("x86/mm/64: Initialize CR4.PCIDE
>> early") moved the problem code from arch/x86/kernel/cpu/common.c to
>> arch/x86/mm/init.c
>> 
>> If I comment it out in v4.14-rc1, it crashes on boot up.
>> 
>> It appears that my box doesn't like having X86_CR4_PCIDE set. It
>> causes reboot to fail. But it also appears that setting that is
>> required for some other code to work.
>
>IIRC, there was a problem in rc1, which should be resolved in newer rcs.
>If you need to run rc1, you can try to use the kernel parameter "nopcid".
>
>Nadav

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [BUG] GPF on reboot of box

2017-10-06 Thread Nadav Amit
Steven Rostedt  wrote:

> Starting with v4.14-rc1 my box crashes during a reboot. All I get is
> the following:
> 
> [  250.615793] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> [  250.635639] reboot: Restarting system
> [  250.639671] reboot: machine restart
> [  250.643842] general protection fault:  [#1] PREEMPT SMP KASAN
> 
> Then nothing. No reboot, it just hangs there.
> 
> I bisected it down to this commit:
> 
> 660da7c9228f ("x86/mm: Enable CR4.PCIDE on supported systems")
> 
> If I comment out the setting of X86_CR4_PCIDE in that commit, it boots
> fine. But...
> 
> I noticed that commit c7ad5ad29 ("x86/mm/64: Initialize CR4.PCIDE
> early") moved the problem code from arch/x86/kernel/cpu/common.c to
> arch/x86/mm/init.c
> 
> If I comment it out in v4.14-rc1, it crashes on boot up.
> 
> It appears that my box doesn't like having X86_CR4_PCIDE set. It causes
> reboot to fail. But it also appears that setting that is required for
> some other code to work.

IIRC, there was a problem in rc1, which should be resolved in newer rcs.
If you need to run rc1, you can try to use the kernel parameter "nopcid".

Nadav
