On Tue, Jun 11, 2024 at 11:26:17AM -0700, H. Peter Anvin wrote:
> On 6/4/24 08:21, Kirill A. Shutemov wrote:
> > 
> >  From b45fe48092abad2612c2bafbb199e4de80c99545 Mon Sep 17 00:00:00 2001
> > From: "Kirill A. Shutemov" <kirill.shute...@linux.intel.com>
> > Date: Fri, 10 Feb 2023 12:53:11 +0300
> > Subject: [PATCHv11.1 06/19] x86/kexec: Keep CR4.MCE set during kexec for TDX guest
> > 
> > TDX guests run with MCA enabled (CR4.MCE=1b) from the very start. If
> > that bit is cleared during CR4 register reprogramming during boot or
> > kexec flows, a #VE exception will be raised which the guest kernel
> > cannot handle.
> > 
> > Therefore, make sure the CR4.MCE setting is preserved over kexec too and
> > avoid raising any #VEs.
> > 
> > The change doesn't affect non-TDX-guest environments.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com>
> > ---
> >   arch/x86/kernel/relocate_kernel_64.S | 17 ++++++++++-------
> >   1 file changed, 10 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
> > index 085eef5c3904..9c2cf70c5f54 100644
> > --- a/arch/x86/kernel/relocate_kernel_64.S
> > +++ b/arch/x86/kernel/relocate_kernel_64.S
> > @@ -5,6 +5,8 @@
> >    */
> >   #include <linux/linkage.h>
> > +#include <linux/stringify.h>
> > +#include <asm/alternative.h>
> >   #include <asm/page_types.h>
> >   #include <asm/kexec.h>
> >   #include <asm/processor-flags.h>
> > @@ -145,14 +147,15 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
> >      * Set cr4 to a known state:
> >      *  - physical address extension enabled
> >      *  - 5-level paging, if it was enabled before
> > +    *  - Machine check exception on TDX guest, if it was enabled before.
> > +    *    Clearing MCE might not be allowed in TDX guests, depending on setup.
> > +    *
> > +    * Use R13 that contains the original CR4 value, read in relocate_kernel().
> > +    * PAE is always set in the original CR4.
> >      */
> > -   movl    $X86_CR4_PAE, %eax
> > -   testq   $X86_CR4_LA57, %r13
> > -   jz      .Lno_la57
> > -   orl     $X86_CR4_LA57, %eax
> > -.Lno_la57:
> > -
> > -   movq    %rax, %cr4
> > +   andl    $(X86_CR4_PAE | X86_CR4_LA57), %r13d
> > +   ALTERNATIVE "", __stringify(orl $X86_CR4_MCE, %r13d), X86_FEATURE_TDX_GUEST
> > +   movq    %r13, %cr4
> 
> If this is the case, I don't really see a reason to clear MCE per se as I'm
> guessing a machine check here will be fatal anyway? It just changes the
> method of death.

Andrew had a strong opinion on method of death here.

https://lore.kernel.org/all/1144340e-dd95-ee3b-dabb-579f9a65b...@citrix.com

> Also, is there a reason to save %cr4, run code, and *then* clear the
> relevant bits? Wouldn't it be better to sanitize %cr4 as soon as possible?

You mean set new CR4 directly in relocate_kernel() before switching CR3?
I guess it is possible.

But I can't say I see a huge benefit in changing it. Such a change would
have its own risks.
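
For reference, the value the patched sequence loads into CR4 can be modeled
in C roughly as below. This is only an illustrative sketch, not kernel code:
kexec_cr4() and its tdx_guest parameter are hypothetical names, with
tdx_guest standing in for the X86_FEATURE_TDX_GUEST alternative, and the
bit positions are the architectural CR4 ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR4 bit positions. */
#define X86_CR4_PAE   (1u << 5)   /* physical address extension */
#define X86_CR4_MCE   (1u << 6)   /* machine check enable */
#define X86_CR4_LA57  (1u << 12)  /* 5-level paging */

/*
 * Hypothetical model of the patched code in identity_mapped():
 *   andl $(X86_CR4_PAE | X86_CR4_LA57), %r13d
 *   ALTERNATIVE "", "orl $X86_CR4_MCE, %r13d", X86_FEATURE_TDX_GUEST
 *   movq %r13, %cr4
 * 'original' is the CR4 value saved in relocate_kernel().
 */
static uint64_t kexec_cr4(uint64_t original, bool tdx_guest)
{
	/* andl on %r13d works on the low 32 bits and zero-extends the result. */
	uint32_t cr4 = (uint32_t)original & (X86_CR4_PAE | X86_CR4_LA57);

	/* On a TDX guest the alternative is patched in and MCE is set. */
	if (tdx_guest)
		cr4 |= X86_CR4_MCE;

	return cr4;
}
```

Note that in this model MCE is set on a TDX guest whether or not it was set
in the original CR4, and is always cleared everywhere else, which is what
the ALTERNATIVE line in the patch does as written.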

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
