Re: [PATCH V7 08/18] x86/entry: Preserve PKRS MSR across exceptions

2021-12-07 Thread Thomas Gleixner
Ira,

On Mon, Dec 06 2021 at 17:54, Ira Weiny wrote:
> On Thu, Nov 25, 2021 at 03:12:47PM +0100, Thomas Gleixner wrote:
>> > +.macro __call_ext_ptregs cfunc annotate_retpoline_safe:req
>> > +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
>> > +  /* add space for extended_pt_regs */
>> > +  subq$EXTENDED_PT_REGS_SIZE, %rsp
>> > +#endif
>> > +  .if \annotate_retpoline_safe == 1
>> > +  ANNOTATE_RETPOLINE_SAFE
>> > +  .endif
>> 
>> This annotation is new and nowhere mentioned why it is part of this
>> patch.
>
> I don't understand.  ANNOTATE_RETPOLINE_SAFE has been around since:
>
> 9e0e3c5130e9 x86/speculation, objtool: Annotate indirect calls/jumps
> for objtool

Sorry, I misread that macro maze. It's conditional obviously.

> I can split it if you prefer.  How about a patch with just the x86 extended
> pt_regs stuff but that would leave a zero size for the extended stuff?  Then
> followed by the pks bits?

Whatever makes sense and does one thing per patch.

>> I really have to ask the question whether this #ifdeffery has any value
>> at all. 8 bytes extra stack usage is not going to be the end of the
>> world and distro kernels will enable that config anyway.
>
> My goal with this has always been 0 overhead if turned off.  So this seemed
> like a logical addition.  Furthermore, ARCH_ENABLE_SUPERVISOR_PKEYS is
> predicated on ARCH_HAS_SUPERVISOR_PKEYS which is only available with x86_64.
> This removes the space for x86 when not needed.

The question is not about 64 vs. 32bit. The question is whether the
conditional makes sense for 64bit in the first place. Whether this
matters for 32bit has to be determined. It makes some sense, but less
#ifdeffery and less obfuscation makes sense too.

>> If we really want to save the space then certainly not by sprinkling
>> CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS all over the place and hiding the
>> extra sized ptregs in the pkrs header.
>> 
>> You are changing generic architecture code so you better think about
>> making such a change generic and extensible.
>
> I agree.  And I tried to do so.  The generic entry code is modified only by 
> the
> addition of pkrs_[save|restore]_irq().  These are only defined if the arch
> defines ARCH_HAS_SUPERVISOR_PKEYS and furthermore, if something triggers
> enabling ARCH_ENABLE_SUPERVISOR_PKEYS.

I'm talking about generic _architecture_ code, i.e. the code in
arch/x86/ which affects all vendors and systems.

> ARCH_HAS_SUPERVISOR_PKEYS is restricted to x86_64 at the moment.  All other
> arch's including x86 should not see any changes in the generic code.

That was not the question and I'm well aware of that.

>> If the next feature comes around which needs to save something in that
>> extended area then we are going to change the world again, right?
>
> I'm not sure what you mean by 'change the world'.  I would anticipate the 
> entry
> code to be modified with something similar to pks_[save|restore]_irq() and let
> the arch deal with the specifics.

If on X86 the next X86 specific feature comes around which needs extra
reg space then someone has to change world in arch/x86 again, replace
all the ARCH_ENABLE_SUPERVISOR_PKEYS #ifdefs with something else, right?

Instead of adding a new field to pt_regs_aux and be done with it.

> Also in [1] I thought Peter and Andy agreed that placing additional generic
> state in the extended pt_regs was not needed and does not buy us anything.  I
> specifically asked if that was something we wanted to do in.[2]

This was about a generic representation which affects the common entry
code in kernel/entry/... Can you spot the difference?

What I suggested is _solely_ x86 specific and does not trickle into
anything outside of arch/x86.

>> See? No magic hardcoded constant, no build time error checking for that
>> constant. Nothing, it just works.
>
> Yes agreed definitely an improvement.

It's the right thing to do.

>> That's one part, but let me come back to this:
>> 
>> > +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
>> > +  /* add space for extended_pt_regs */
>> > +  subq$EXTENDED_PT_REGS_SIZE, %rsp
>> 
>> What guarantees that RSP points to pt_regs at this point?  Nothing at
>> all. It's just pure luck and a question of time until this explodes in
>> hard to diagnose ways.
>
> It took me a bit to wrap my head around what I think you mean.  My initial
> response was that rsp should be the stack pointer for __call_ext_ptregs() just
> like it was for call.  But I think I see that it is better to open code this
> since others may want to play the same trick without using this code and
> therefore we may not be getting the extended pt_regs structure on the stack
> like we think.  For example if someone did...
>
>   movq%rsp, %rdi
>   RSP_ADD_OTHER_STACK_STUFF
>   __call_ext_ptregs   ...
>   RSP_REMOVE_OTHER_STACK_STUFF
>
> ... it would be broken.
>
> My assumption was that would be illegal after this patch.  But indeed there is
> no way to easily see that in the future.

Re: [PATCH V7 08/18] x86/entry: Preserve PKRS MSR across exceptions

2021-12-06 Thread Ira Weiny
On Mon, Dec 06, 2021 at 05:54:23PM -0800, 'Ira Weiny' wrote:

[snip]

> > 
> > Though, if you look at the xen_pv_evtchn_do_upcall() part where you
> > added this extra invocation you might figure out that adding
> > pkrs_restore_irq() to irqentry_exit_cond_resched() and explicitely to
> > the 'else' path in irqentry_exit() makes it magically consistent for
> > both use cases.
> > 
> 
> Thank you, yes good catch.  However, I think I need at least 1 more call in 
> the
> !regs_irqs_disabled() && state.exit_rcu case right?
> 
> 11:29:48 > git di
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 717091910ebc..667676ebc129 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -363,8 +363,6 @@ __visible noinstr void xen_pv_evtchn_do_upcall(struct 
> pt_regs *regs)
>  
> inhcall = get_and_clear_inhcall();
> if (inhcall && !WARN_ON_ONCE(state.exit_rcu)) {
> -   /* Normally called by irqentry_exit, restore pkrs here */
> -   pkrs_restore_irq(regs);
> irqentry_exit_cond_resched();
> instrumentation_end();
> restore_inhcall(inhcall);
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index 1c0a70a17e93..60ae3d4c9cc0 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -385,6 +385,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs 
> *regs)
>  
>  void irqentry_exit_cond_resched(void)

Opps...  Of course regs will need to be passed in here now...

Ira

>  {
> +   pkrs_restore_irq(regs);
> if (!preempt_count()) {
> /* Sanity check RCU and thread stack */
> rcu_irq_exit_check_preempt();
> @@ -408,8 +409,6 @@ noinstr void irqentry_exit(struct pt_regs *regs, 
> irqentry_state_t state)
> return;
> }
>  
> -   pkrs_restore_irq(regs);
> -
> if (!regs_irqs_disabled(regs)) {
> /*
>  * If RCU was not watching on entry this needs to be done
> @@ -421,6 +420,7 @@ noinstr void irqentry_exit(struct pt_regs *regs, 
> irqentry_state_t state)
> /* Tell the tracer that IRET will enable interrupts */
> trace_hardirqs_on_prepare();
> lockdep_hardirqs_on_prepare(CALLER_ADDR0);
> +   pkrs_restore_irq(regs);
> instrumentation_end();
> rcu_irq_exit();
> lockdep_hardirqs_on(CALLER_ADDR0);
> @@ -439,6 +439,10 @@ noinstr void irqentry_exit(struct pt_regs *regs, 
> irqentry_state_t state)
> trace_hardirqs_on();
> instrumentation_end();
> } else {
> +   instrumentation_begin();
> +   pkrs_restore_irq(regs);
> +   instrumentation_end();
> +
> /*
>  * IRQ flags state is correct already. Just tell RCU if it
>  * was not watching on entry.
> @@ -470,8 +474,8 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct 
> pt_regs *regs)
>  
>  void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t 
> irq_state)
>  {
> -   pkrs_restore_irq(regs);
> instrumentation_begin();
> +   pkrs_restore_irq(regs);
> ftrace_nmi_exit();
> if (irq_state.lockdep) {
> trace_hardirqs_on_prepare();
> 
> 
> Thank you again for the review,
> Ira
> 
> 
> [0] https://lore.kernel.org/lkml/20210804043231.2655537-5-ira.we...@intel.com/
> [1] 
> https://lore.kernel.org/lkml/20201217131924.gw3...@hirez.programming.kicks-ass.net/
> [2] 
> https://lore.kernel.org/lkml/20201216013202.gy1563...@iweiny-desk2.sc.intel.com/
> [3] https://lore.kernel.org/lkml/87y2hwqwng@nanos.tec.linutronix.de/
> 
> > Thanks,
> > 
> > tglx



Re: [PATCH V7 08/18] x86/entry: Preserve PKRS MSR across exceptions

2021-12-06 Thread Ira Weiny
Thomas,

Thanks for the review.  Sorry for being so late to respond I was sick all last
week and so it took me longer to figure out some of this stuff.

On Thu, Nov 25, 2021 at 03:12:47PM +0100, Thomas Gleixner wrote:
> Ira,
> 
> On Tue, Aug 03 2021 at 21:32, ira weiny wrote:
> > +/*
> > + * __call_ext_ptregs - Helper macro to call into C with extended pt_regs
> > + * @cfunc: C function to be called
> > + *
> > + * This will ensure that extended_ptregs is added and removed as needed 
> > during
> > + * a call into C code.
> > + */
> > +.macro __call_ext_ptregs cfunc annotate_retpoline_safe:req
> > +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
> > +   /* add space for extended_pt_regs */
> > +   subq$EXTENDED_PT_REGS_SIZE, %rsp
> > +#endif
> > +   .if \annotate_retpoline_safe == 1
> > +   ANNOTATE_RETPOLINE_SAFE
> > +   .endif
> 
> This annotation is new and nowhere mentioned why it is part of this
> patch.

I don't understand.  ANNOTATE_RETPOLINE_SAFE has been around since:

9e0e3c5130e9 x86/speculation, objtool: Annotate indirect calls/jumps for objtool

> 
> Can you please do _ONE_ functional change per patch and not a
> unreviewable pile of changes in one go? Same applies for the ASM and the
> C code changes. The ASM change has to go first and then the C code can
> build upon it.

I'm sorry for having the ASM and C code together but this all seemed like 1
change to me.

I can split it if you prefer.  How about a patch with just the x86 extended
pt_regs stuff but that would leave a zero size for the extended stuff?  Then
followed by the pks bits?

> 
> > +   call\cfunc
> > +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
> > +   /* remove space for extended_pt_regs */
> > +   addq$EXTENDED_PT_REGS_SIZE, %rsp
> > +#endif
> 
> I really have to ask the question whether this #ifdeffery has any value
> at all. 8 bytes extra stack usage is not going to be the end of the
> world and distro kernels will enable that config anyway.

My goal with this has always been 0 overhead if turned off.  So this seemed
like a logical addition.  Furthermore, ARCH_ENABLE_SUPERVISOR_PKEYS is
predicated on ARCH_HAS_SUPERVISOR_PKEYS which is only available with x86_64.
This removes the space for x86 when not needed.

All the config stuff was introduced in patch 04/18.[0]

> 
> If we really want to save the space then certainly not by sprinkling
> CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS all over the place and hiding the
> extra sized ptregs in the pkrs header.
> 
> You are changing generic architecture code so you better think about
> making such a change generic and extensible.

I agree.  And I tried to do so.  The generic entry code is modified only by the
addition of pkrs_[save|restore]_irq().  These are only defined if the arch
defines ARCH_HAS_SUPERVISOR_PKEYS and furthermore, if something triggers
enabling ARCH_ENABLE_SUPERVISOR_PKEYS.

ARCH_HAS_SUPERVISOR_PKEYS is restricted to x86_64 at the moment.  All other
arch's including x86 should not see any changes in the generic code.

I thought we had agreed that it was ok for me to restrict the addition of the
extended pt_regs to what was required for PKS when these changes were
discussed.  Because at the time I was concerned about my lack of knowledge of
all the other architectures.[1]

>
> Can folks please start
> thinking beyond the brim of their teacup and not pretend that the
> feature they are working on is the unicorn which requires unique magic
> brandnamed after the unicorn of the day.
> 
> If the next feature comes around which needs to save something in that
> extended area then we are going to change the world again, right?

I'm not sure what you mean by 'change the world'.  I would anticipate the entry
code to be modified with something similar to pks_[save|restore]_irq() and let
the arch deal with the specifics.

Also in [1] I thought Peter and Andy agreed that placing additional generic
state in the extended pt_regs was not needed and does not buy us anything.  I
specifically asked if that was something we wanted to do in.[2]

> Certainly not.
> 
> This wants to go into asm/ptrace.h:
> 
> struct pt_regs_aux {
>   u32 pkrs;
> };
> 
> struct pt_regs_extended {
>   struct pt_regs_aux  aux;
> struct pt_regsregs __attribute__((aligned(8)));
> };

Ok the aligned attribute does what I was doing much more gracefully.  This is a
good idea yes, thank you.

> 
> and then have in asm-offset:
> 
>DEFINE(PT_REGS_AUX_SIZE, sizeof(struct pt_regs_extended) - sizeof(struct 
> pt_regs));
> 
> which does the right thing whatever the size of pt_regs_aux is. So for
> the above it will have:
> 
>  #define PT_REGS_AUX_SIZE 8 /* sizeof(struct pt_regs_extended) - 
> sizeof(struct pt_regs) */
> 
> Even, if you do
> 
> struct pt_regs_aux {
> #ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
>   u32 pkrs;
> #endif
> };
> 
> and the config switch is disabled. It's still correct:
> 
>  #define PT_REGS_AUX_SIZE 0 /* 

Re: [PATCH V7 08/18] x86/entry: Preserve PKRS MSR across exceptions

2021-12-02 Thread Andy Lutomirski

On 11/12/21 16:50, Ira Weiny wrote:

On Tue, Aug 03, 2021 at 09:32:21PM -0700, 'Ira Weiny' wrote:

From: Ira Weiny 

The PKRS MSR is not managed by XSAVE.  It is preserved through a context
switch but this support leaves exception handling code open to memory
accesses during exceptions.

2 possible places for preserving this state were considered,
irqentry_state_t or pt_regs.[1]  pt_regs was much more complicated and
was potentially fraught with unintended consequences.[2]  However, Andy
came up with a way to hide additional values on the stack which could be
accessed as "extended_pt_regs".[3]


Andy,

I'm preparing to send V8 of this PKS work.  But I have not seen any feed back
since I originally implemented this in V4[1].

Does this meets your expectations?  Are there any issues you can see with this
code?


I think I'm generally okay with the approach to allocating space.  All 
of Thomas' comments still apply, though.  (Sorry, I'm horribly behind.)




Re: [PATCH V7 08/18] x86/entry: Preserve PKRS MSR across exceptions

2021-11-25 Thread Thomas Gleixner
Ira,

On Tue, Aug 03 2021 at 21:32, ira weiny wrote:
> +/*
> + * __call_ext_ptregs - Helper macro to call into C with extended pt_regs
> + * @cfunc:   C function to be called
> + *
> + * This will ensure that extended_ptregs is added and removed as needed 
> during
> + * a call into C code.
> + */
> +.macro __call_ext_ptregs cfunc annotate_retpoline_safe:req
> +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
> + /* add space for extended_pt_regs */
> + subq$EXTENDED_PT_REGS_SIZE, %rsp
> +#endif
> + .if \annotate_retpoline_safe == 1
> + ANNOTATE_RETPOLINE_SAFE
> + .endif

This annotation is new and nowhere mentioned why it is part of this
patch.

Can you please do _ONE_ functional change per patch and not a
unreviewable pile of changes in one go? Same applies for the ASM and the
C code changes. The ASM change has to go first and then the C code can
build upon it.

> + call\cfunc
> +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
> + /* remove space for extended_pt_regs */
> + addq$EXTENDED_PT_REGS_SIZE, %rsp
> +#endif

I really have to ask the question whether this #ifdeffery has any value
at all. 8 bytes extra stack usage is not going to be the end of the
world and distro kernels will enable that config anyway.

If we really want to save the space then certainly not by sprinkling
CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS all over the place and hiding the
extra sized ptregs in the pkrs header.

You are changing generic architecture code so you better think about
making such a change generic and extensible. Can folks please start
thinking beyond the brim of their teacup and not pretend that the
feature they are working on is the unicorn which requires unique magic
brandnamed after the unicorn of the day.

If the next feature comes around which needs to save something in that
extended area then we are going to change the world again, right?
Certainly not.

This wants to go into asm/ptrace.h:

struct pt_regs_aux {
u32 pkrs;
};

struct pt_regs_extended {
struct pt_regs_aux  aux;
struct pt_regs  regs __attribute__((aligned(8)));
};

and then have in asm-offset:

   DEFINE(PT_REGS_AUX_SIZE, sizeof(struct pt_regs_extended) - sizeof(struct 
pt_regs));

which does the right thing whatever the size of pt_regs_aux is. So for
the above it will have:

 #define PT_REGS_AUX_SIZE 8 /* sizeof(struct pt_regs_extended) - sizeof(struct 
pt_regs) */

Even, if you do

struct pt_regs_aux {
#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
u32 pkrs;
#endif
};

and the config switch is disabled. It's still correct:

 #define PT_REGS_AUX_SIZE 0 /* sizeof(struct pt_regs_extended) - sizeof(struct 
pt_regs) */

See? No magic hardcoded constant, no build time error checking for that
constant. Nothing, it just works.

That's one part, but let me come back to this:

> +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
> + /* add space for extended_pt_regs */
> + subq$EXTENDED_PT_REGS_SIZE, %rsp

What guarantees that RSP points to pt_regs at this point?  Nothing at
all. It's just pure luck and a question of time until this explodes in
hard to diagnose ways.

Because between

movq%rsp, %rdi
and
call

can legitimately be other code which causes the stack pointer to
change. It's not the case today, but nothing prevents this in the
future.

The correct thing to do is:

movq%rsp, %rdi
RSP_MAKE_PT_REGS_AUX_SPACE
call...
RSP_REMOVE_PT_REGS_AUX_SPACE

The few extra macro lines in the actual code are way better as they make
it completely obvious what's going on and any misuse can be spotted
easily.

> +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
> +/*
> + * PKRS is a per-logical-processor MSR which overlays additional protection 
> for
> + * pages which have been mapped with a protection key.
> + *
> + * Context switches save the MSR in the task struct thus taking that value to
> + * other processors if necessary.
> + *
> + * To protect against exceptions having access to this memory save the 
> current
> + * thread value and set the PKRS value to be used during the exception.
> + */
> +void pkrs_save_irq(struct pt_regs *regs)

That's a misnomer as this is invoked for _any_ exception not just
interrupts.

>  #ifdef CONFIG_XEN_PV
>  #ifndef CONFIG_PREEMPTION
>  /*
> @@ -309,6 +361,8 @@ __visible noinstr void xen_pv_evtchn_do_upcall(struct 
> pt_regs *regs)
>  
>   inhcall = get_and_clear_inhcall();
>   if (inhcall && !WARN_ON_ONCE(state.exit_rcu)) {
> + /* Normally called by irqentry_exit, restore pkrs here */
> + pkrs_restore_irq(regs);
>   irqentry_exit_cond_resched();

Sigh. Consistency is overrated

> +
>  void setup_pks(void);
>  void pkrs_write_current(void);
>  void pks_init_task(struct task_struct *task);
> +void write_pkrs(u32 new_pkrs);

So we have pkrs_write_current() and write_pkrs() now. Can you please
stick to a common 

Re: [PATCH V7 08/18] x86/entry: Preserve PKRS MSR across exceptions

2021-11-25 Thread Thomas Gleixner
On Fri, Nov 12 2021 at 16:50, Ira Weiny wrote:
> On Tue, Aug 03, 2021 at 09:32:21PM -0700, 'Ira Weiny' wrote:
>> From: Ira Weiny 
>> 
>> The PKRS MSR is not managed by XSAVE.  It is preserved through a context
>> switch but this support leaves exception handling code open to memory
>> accesses during exceptions.
>> 
>> 2 possible places for preserving this state were considered,
>> irqentry_state_t or pt_regs.[1]  pt_regs was much more complicated and
>> was potentially fraught with unintended consequences.[2]  However, Andy
>> came up with a way to hide additional values on the stack which could be
>> accessed as "extended_pt_regs".[3]
>
> Andy,
>
> I'm preparing to send V8 of this PKS work.  But I have not seen any feed back
> since I originally implemented this in V4[1].
>
> Does this meets your expectations?  Are there any issues you can see with this
> code?
>
> I would appreciate your feedback.

Not Andy here, but I'll respond to the patch in a minute.

Thanks,

tglx



Re: [PATCH V7 08/18] x86/entry: Preserve PKRS MSR across exceptions

2021-11-12 Thread Ira Weiny
On Tue, Aug 03, 2021 at 09:32:21PM -0700, 'Ira Weiny' wrote:
> From: Ira Weiny 
> 
> The PKRS MSR is not managed by XSAVE.  It is preserved through a context
> switch but this support leaves exception handling code open to memory
> accesses during exceptions.
> 
> 2 possible places for preserving this state were considered,
> irqentry_state_t or pt_regs.[1]  pt_regs was much more complicated and
> was potentially fraught with unintended consequences.[2]  However, Andy
> came up with a way to hide additional values on the stack which could be
> accessed as "extended_pt_regs".[3]

Andy,

I'm preparing to send V8 of this PKS work.  But I have not seen any feed back
since I originally implemented this in V4[1].

Does this meets your expectations?  Are there any issues you can see with this
code?

I would appreciate your feedback.

Thanks,
Ira

[1] https://lore.kernel.org/lkml/20210322053020.2287058-9-ira.we...@intel.com/

> This method allows for; any place
> which has struct pt_regs can get access to the extra information; no
> extra information is added to irq_state; and pt_regs is left intact for
> compatibility with outside tools like BPF.
> 
> To simplify, the assembly code only adds space on the stack.  The
> setting or use of any needed values are left to the C code.  While some
> entry points may not use this space it is still added where ever pt_regs
> is passed to the C code for consistency.
> 
> Each nested exception gets another copy of this extended space allowing
> for any number of levels of exception handling.
> 
> In the assembly, a macro is defined to allow a central place to add
> space for other uses should the need arise.
> 
> Finally export pkrs_{save|restore}_irq to the common code to allow
> it to preserve the current task's PKRS in the new extended pt_regs if
> enabled.
> 
> Peter, Thomas, Andy, Dave, and Dan all suggested parts of the patch or
> aided in the development of the patch..
> 
> [1] 
> https://lore.kernel.org/lkml/calcetrve1i5jdyzd_bcctxqjn+ze3t38efpgjxn1f577m36...@mail.gmail.com/
> [2] https://lore.kernel.org/lkml/874kpxx4jf@nanos.tec.linutronix.de/#t
> [3] 
> https://lore.kernel.org/lkml/CALCETrUHwZPic89oExMMe-WyDY8-O3W68NcZvse3=PGW+iW5=w...@mail.gmail.com/
> 
> Cc: Dave Hansen 
> Cc: Dan Williams 
> Suggested-by: Dave Hansen 
> Suggested-by: Dan Williams 
> Suggested-by: Peter Zijlstra 
> Suggested-by: Thomas Gleixner 
> Suggested-by: Andy Lutomirski 
> Signed-off-by: Ira Weiny 
> 
> ---
> Changes for V7:
>   Rebased to 5.14 entry code
>   declare write_pkrs() in pks.h
>   s/INIT_PKRS_VALUE/pkrs_init_value
>   Remove unnecessary INIT_PKRS_VALUE def
>   s/pkrs_save_set_irq/pkrs_save_irq/
>   The inital value for exceptions is best managed
>   completely within the pkey code.
> ---
>  arch/x86/entry/calling.h   | 26 +
>  arch/x86/entry/common.c| 54 ++
>  arch/x86/entry/entry_64.S  | 22 ++-
>  arch/x86/entry/entry_64_compat.S   |  6 +--
>  arch/x86/include/asm/pks.h | 18 +
>  arch/x86/include/asm/processor-flags.h |  2 +
>  arch/x86/kernel/head_64.S  |  7 ++--
>  arch/x86/mm/fault.c|  3 ++
>  include/linux/pkeys.h  | 11 +-
>  kernel/entry/common.c  | 14 ++-
>  10 files changed, 143 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
> index a4c061fb7c6e..a2f94677c3fd 100644
> --- a/arch/x86/entry/calling.h
> +++ b/arch/x86/entry/calling.h
> @@ -63,6 +63,32 @@ For 32-bit we have the following conventions - kernel is 
> built with
>   * for assembly code:
>   */
>  
> +/*
> + * __call_ext_ptregs - Helper macro to call into C with extended pt_regs
> + * @cfunc:   C function to be called
> + *
> + * This will ensure that extended_ptregs is added and removed as needed 
> during
> + * a call into C code.
> + */
> +.macro __call_ext_ptregs cfunc annotate_retpoline_safe:req
> +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
> + /* add space for extended_pt_regs */
> + subq$EXTENDED_PT_REGS_SIZE, %rsp
> +#endif
> + .if \annotate_retpoline_safe == 1
> + ANNOTATE_RETPOLINE_SAFE
> + .endif
> + call\cfunc
> +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
> + /* remove space for extended_pt_regs */
> + addq$EXTENDED_PT_REGS_SIZE, %rsp
> +#endif
> +.endm
> +
> +.macro call_ext_ptregs cfunc
> + __call_ext_ptregs \cfunc, annotate_retpoline_safe=0
> +.endm
> +
>  .macro PUSH_REGS rdx=%rdx rax=%rax save_ret=0
>   .if \save_ret
>   pushq   %rsi/* pt_regs->si */
> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
> index 6c2826417b33..a0d1d5519dba 100644
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #ifdef CONFIG_XEN_PV
>  #include 
> @@