from:"Borislav Petkov"

Re: [PATCH v13 08/35] x86/fred: Disable FRED by default in its early stage

2024-01-22 Thread Borislav Petkov

On Tue, Dec 05, 2023 at 02:49:57AM -0800, Xin Li wrote:
>   Warning: use of this parameter will taint the kernel
>   and may cause unknown problems.
>  
> + fred[X86-64]
> + Enable flexible return and event delivery

Let's make it accept multiple options from the get-go:

fred=on,disable-when,foo,bar,bla...

in case we need to tweak its behavior.

If it is only "fred" it will propagate this way downstream and it'll
lead to confusion later when people have to update their scripts and
config files when "fred" alone doesn't do what they're expecting
anymore.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v13 07/35] x86/fred: Disable FRED support if CONFIG_X86_FRED is disabled

2024-01-22 Thread Borislav Petkov

On Tue, Dec 05, 2023 at 02:49:56AM -0800, Xin Li wrote:
> From: "H. Peter Anvin (Intel)" 
> 
> Add CONFIG_X86_FRED to  to make
> cpu_feature_enabled() work correctly with FRED.
> 
> Originally-by: Megha Dey 
> Signed-off-by: H. Peter Anvin (Intel) 
> Tested-by: Shan Kang 
> Signed-off-by: Xin Li 
> ---
> 
> Changes since v10:
> * FRED feature is defined in cpuid word 12, not 13 (Nikolay Borisov).
> ---
>  arch/x86/include/asm/disabled-features.h   | 8 +++-
>  tools/arch/x86/include/asm/disabled-features.h | 8 +++-
>  2 files changed, 14 insertions(+), 2 deletions(-)

Whoever applies this: this one and the previous one can be merged into
one patch.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v13 01/35] x86/cpufeatures,opcode,msr: Add the WRMSRNS instruction support

2024-01-03 Thread Borislav Petkov

On Tue, Jan 02, 2024 at 10:06:27PM +, Li, Xin3 wrote:
> Do I need to send an updated patch?

> Or just leave it to the maintainer who is going to take care of it?

While waiting, please take a look at this:

https://kernel.org/doc/html/latest/process/submitting-patches.html#don-t-get-discouraged-or-impatient

Might want to read the whole doc too.

But to answer your question: you wait a few weeks and collect all
comments and review feedback that you've received and incorporate them
into the patchset.

Then, after the time passes you send a new revision and explain in the
0th message what has changed.

Ok?

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v13 01/35] x86/cpufeatures,opcode,msr: Add the WRMSRNS instruction support

2024-01-02 Thread Borislav Petkov

On Tue, Dec 05, 2023 at 02:49:50AM -0800, Xin Li wrote:

> Subject: Re: [PATCH v13 01/35] x86/cpufeatures,opcode,msr: Add the WRMSRNS 
> instruction support

Or simply "x86/fred: Add ... "

Other than that,

Acked-by: Borislav Petkov (AMD) 

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v12 24/37] x86/idtentry: Incorporate definitions/declarations of the FRED entries

2023-11-28 Thread Borislav Petkov

On Tue, Nov 28, 2023 at 10:58:50AM -0800, H. Peter Anvin wrote:
> >You have a very good sense 😊

I've been reading code of a couple of people for a couple of years. :-)

> Remember that Signed-off-by: relates to the *patch flow*.

Yes, you should take the time to read Documentation/process/ and
especially submitting-patches.rst.

We have it all written down and I point new(-er) people at that
directory at least once a week. :-\

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v12 24/37] x86/idtentry: Incorporate definitions/declarations of the FRED entries

2023-11-28 Thread Borislav Petkov

On Mon, Oct 02, 2023 at 11:24:45PM -0700, Xin Li wrote:
> FRED and IDT can share most of the definitions and declarations so
> that in the majority of cases the actual handler implementation is the
> same.
> 
> The differences are the exceptions where FRED stores exception related
> information on the stack and the sysvec implementations as FRED can
> handle irqentry/exit() in the dispatcher instead of having it in each
> handler.
> 
> Also add stub defines for vectors which are not used due to Kconfig
> decisions to spare the ifdeffery in the actual FRED dispatch code.
> 
> Tested-by: Shan Kang 
> Signed-off-by: Thomas Gleixner 
> Signed-off-by: Xin Li 

This makes me wonder too who the author is. The commit message text
sounds like tglx. :)

> @@ -137,6 +141,17 @@ static __always_inline void __##func(struct pt_regs 
> *regs,   \
>  #define DEFINE_IDTENTRY_RAW(func)\
>  __visible noinstr void func(struct pt_regs *regs)
>  
> +/**
> + * DEFINE_FREDENTRY_RAW - Emit code for raw FRED entry points

LOL, "FREDENTRY"

...

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v12 20/37] x86/fred: Disallow the swapgs instruction when FRED is enabled

2023-11-28 Thread Borislav Petkov

On Mon, Oct 02, 2023 at 11:24:41PM -0700, Xin Li wrote:
> +  * Note, LKGS loads the GS base address into the IA32_KERNEL_GS_BASE
> +  * MSR instead of the GS segment’s descriptor cache. As such, the

:verify_diff: WARNING: Unicode char [’] (0x8217 in line: + * MSR instead of 
the GS segment’s descriptor cache. As such, the

Just do a normal ' - char number 0x27.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v12 16/37] x86/ptrace: Add FRED additional information to the pt_regs structure

2023-11-28 Thread Borislav Petkov

On Mon, Oct 02, 2023 at 11:24:37PM -0700, Xin Li wrote:
> FRED defines additional information in the upper 48 bits of cs/ss
> fields. Therefore add the information definitions into the pt_regs
> structure.
> 
> Specially introduce a new structure fred_ss to denote the FRED flags
> above SS selector, which avoids FRED_SSX_ macros and makes the code
> simpler and easier to read.
> 
> Signed-off-by: H. Peter Anvin (Intel) 

You and hpa need to go through all the patches and figure out who's the
author that's going to land in git.

Because this and others have hpa's SOB first, suggesting he's the
author. However, the mail doesn't start with

From: H. Peter Anvin (Intel) 

and then git will make *you* the author.

> Tested-by: Shan Kang 
> Signed-off-by: Thomas Gleixner 
> Signed-off-by: Xin Li 

...

>   union {
> - u64 ssx;// The full 64-bit data slot containing SS
> - u16 ss; // SS selector
> + /* SS selector */
> + u16 ss;
> + /* The extended 64-bit data slot containing SS */
> + u64 ssx;
> + /* The FRED SS extension */
> + struct fred_ss  fred_ss;

Aha, sanity about the right comments has come to your mind in this next
patch. :-P

Just do them right in the previous one.

>   /*
> -  * Top of stack on IDT systems.
> +  * Top of stack on IDT systems, while FRED systems have extra fields
> +  * defined above for storing exception related information, e.g. CR2 or
> +  * DR6.

Btw, I really appreciate the good commenting - thanks for that!

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

[PATCH] docs: submitting-patches: improve the base commit explanation

2023-11-15 Thread Borislav Petkov

From: "Borislav Petkov (AMD)" 

After receiving a second patchset this week without knowing which tree
it applies on and trying to apply it on the obvious ones and failing,
make sure the base tree information which needs to be supplied in the
0th message of the patchset is spelled out more explicitly.

Also, make the formulations stronger as this really is a requirement and
not only a useful thing anymore.

Signed-off-by: Borislav Petkov (AMD) 
---
 Documentation/process/submitting-patches.rst | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/Documentation/process/submitting-patches.rst 
b/Documentation/process/submitting-patches.rst
index 86d346bcb8ef..6602b587 100644
--- a/Documentation/process/submitting-patches.rst
+++ b/Documentation/process/submitting-patches.rst
@@ -790,10 +790,14 @@ Providing base tree information
 ---
 
 When other developers receive your patches and start the review process,
-it is often useful for them to know where in the tree history they
-should place your work. This is particularly useful for automated CI
-processes that attempt to run a series of tests in order to establish
-the quality of your submission before the maintainer starts the review.
+it is absolutely necessary for them to know what is the base
+commit/branch your work applies on, considering the sheer amount of
+maintainer trees present nowadays. Note again the **T:** entry in the
+MAINTAINERS file explained above.
+
+This is even more important for automated CI processes that attempt to
+run a series of tests in order to establish the quality of your
+submission before the maintainer starts the review.
 
 If you are using ``git format-patch`` to generate your patches, you can
 automatically include the base tree information in your submission by
@@ -836,6 +840,9 @@ letter or in the first patch of the series and it should be 
placed
 either below the ``---`` line or at the very bottom of all other
 content, right before your email signature.
 
+Make sure that base commit is in an official maintainer/mainline tree
+and not in some internal, accessible only to you tree - otherwise it
+would be worthless.
 
 References
 --
-- 
2.42.0.rc0.25.ga82fb66fed25

Re: [PATCH v12 01/37] x86/cpufeatures: Add the cpu feature bit for WRMSRNS

2023-11-13 Thread Borislav Petkov

On Tue, Nov 14, 2023 at 12:43:38AM +, Li, Xin3 wrote:
> No.  tglx asked for it:
> https://lkml.kernel.org/kvm/87y1h81ht4.ffs@tglx/

Aha

"According to the CPU folks FRED systems are guaranteed to have WRMSRNS -
I asked for that :). It's just not yet documented."

so I'm going to expect that to appear in the next FRED spec revision...

> Because we are doing 
>   wrmsrns(MSR_IA32_FRED_RSP0, ...)
> here, and X86_FEATURE_WRMSRNS doesn't guarantee MSR_IA32_FRED_RSP0 exists.
> 
> Or I missed something?

Well, according to what I'm hearing and reading so far:

FRED means WRMSRNS
FRED means MSR_IA32_FRED_RSP0

and if you had to be precise, the code should do:

if (cpu_feature_enabled(X86_FEATURE_FRED)) {
if (cpu_feature_enabled(X86_FEATURE_WRMSRNS))
wrmsrns(MSR_IA32_FRED_RSP0, (unsigned 
long)task_stack_page(task) + THREAD_SIZE);
else
wrmsr(MSR_IA32_FRED_RSP0, (unsigned 
long)task_stack_page(task) + THREAD_SIZE);
}

but apparently FRED implies WRMSRNS - not documented anywhere currently
- so you can save yourself one check.

But your version checks FRED if it can do WRMSRNS while there's
a separate WRMSRNS flag and that made me wonder...

> Another patch set should replace WRMSR with WRMSRNS, with SERIALIZE added
> when needed.

I sense someone wants to optimize MSR writes ... :-)

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v12 19/37] x86/fred: Update MSR_IA32_FRED_RSP0 during task switch

2023-11-13 Thread Borislav Petkov

On Mon, Nov 13, 2023 at 12:36:04PM -0500, H. Peter Anvin wrote:
> A resource cannot be consumed after the value has been written; this
> is the only necessary level of serialization, equivalent to, say, RAX.

Lemme see if I understand this correctly using this context as an
example: after this MSR_IA32_FRED_RSP0 write, any FRED events determined
to be delivered to level 0 will use this new task stack ptr?

And since the new task is not running yet and the old one isn't running
either, we're fine here. So the "serialization point" I was talking
about above is bollocks.

Close? :)

> A serializing instruction stops the entire pipeline until everything
> has retired and any stores have become globally visible.

Right, we don't need that here.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v12 19/37] x86/fred: Update MSR_IA32_FRED_RSP0 during task switch

2023-11-13 Thread Borislav Petkov

On Mon, Oct 02, 2023 at 11:24:40PM -0700, Xin Li wrote:
> From: "H. Peter Anvin (Intel)" 
> 
> MSR_IA32_FRED_RSP0 is used during ring 3 event delivery, and needs to
> be updated to point to the top of next task stack during task switch.
> 
> Signed-off-by: H. Peter Anvin (Intel) 
> Tested-by: Shan Kang 
> Signed-off-by: Xin Li 
> ---
>  arch/x86/include/asm/switch_to.h | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/switch_to.h 
> b/arch/x86/include/asm/switch_to.h
> index f42dbf17f52b..c3bd0c0758c9 100644
> --- a/arch/x86/include/asm/switch_to.h
> +++ b/arch/x86/include/asm/switch_to.h
> @@ -70,9 +70,13 @@ static inline void update_task_stack(struct task_struct 
> *task)
>  #ifdef CONFIG_X86_32
>   this_cpu_write(cpu_tss_rw.x86_tss.sp1, task->thread.sp0);
>  #else
> - /* Xen PV enters the kernel on the thread stack. */
> - if (cpu_feature_enabled(X86_FEATURE_XENPV))
> + if (cpu_feature_enabled(X86_FEATURE_FRED)) {
> + /* WRMSRNS is a baseline feature for FRED. */
> + wrmsrns(MSR_IA32_FRED_RSP0, (unsigned 
> long)task_stack_page(task) + THREAD_SIZE);

If this non-serializing write happens now and, AFAICT, the CR3 write
during the task switch has already happened in switch_mm* earlier, what
is the serialization point that's going to make sure that write is
committed before the new task starts executing?

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v12 01/37] x86/cpufeatures: Add the cpu feature bit for WRMSRNS

2023-11-08 Thread Borislav Petkov

On Mon, Oct 02, 2023 at 11:24:22PM -0700, Xin Li wrote:
> Subject: Re: [PATCH v12 01/37] x86/cpufeatures: Add the cpu feature bit for 
> WRMSRNS
  

For all your text:

s/cpu/CPU/g

> WRMSRNS is an instruction that behaves exactly like WRMSR, with
> the only difference being that it is not a serializing instruction
> by default. Under certain conditions, WRMSRNS may replace WRMSR to
> improve performance.
> 
> Add the CPU feature bit for WRMSRNS.
> 
> Tested-by: Shan Kang 
> Signed-off-by: Xin Li 
> ---
>  arch/x86/include/asm/cpufeatures.h   | 1 +
>  tools/arch/x86/include/asm/cpufeatures.h | 1 +
>  2 files changed, 2 insertions(+)

It looks to me like you can merge the first three patches into one as
all they do is add that insn support.

Then, further down in the patchset, it says:

+   if (cpu_feature_enabled(X86_FEATURE_FRED)) {
+   /* WRMSRNS is a baseline feature for FRED. */

but WRMSRNS is not mentioned in the FRED spec "Document Number:
346446-005US, Revision: 5.0" which, according to

https://www.intel.com/content/www/us/en/content-details/780121/flexible-return-and-event-delivery-fred-specification.html

is the latest.

Am I looking at the wrong one?

> diff --git a/arch/x86/include/asm/cpufeatures.h 
> b/arch/x86/include/asm/cpufeatures.h
> index 58cb9495e40f..330876d34b68 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -322,6 +322,7 @@
>  #define X86_FEATURE_FSRS (12*32+11) /* "" Fast short REP STOSB */
>  #define X86_FEATURE_FSRC (12*32+12) /* "" Fast short REP 
> {CMPSB,SCASB} */
>  #define X86_FEATURE_LKGS (12*32+18) /* "" Load "kernel" 
> (userspace) GS */
> +#define X86_FEATURE_WRMSRNS  (12*32+19) /* "" Non-Serializing Write 
> to Model Specific Register instruction */

  /* "" Non-serializing WRMSR */

is more than enough.

And now I'm wondering: when you're adding a separate CPUID bit, then the
above should be

+   if (cpu_feature_enabled(X86_FEATURE_WRMSRNS)) {
+   /* WRMSRNS is a baseline feature for FRED. */

I see that you're adding a dependency:

+   { X86_FEATURE_FRED, X86_FEATURE_WRMSRNS   },

which then means you don't need the X86_FEATURE_WRMSRNS definition at
all and can use X86_FEATURE_FRED only.

So, what's up?

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH 4.19 72/98] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h

2019-08-28 Thread Borislav Petkov

On Wed, Aug 28, 2019 at 02:29:13PM +0200, Pavel Machek wrote:
> This is not a way to have an inteligent conversation.

No, this *is* the way to keep the conversation sane, without veering
off into some absurd claims.

So, to cut to the chase: you can simply add "rdrand=force" to your
cmdline parameters and get back to using RDRAND.

And yet if you still feel this fix does not meet your expectations,
you were told already to either produce patches or who to contact. I'm
afraid complaining on this thread won't get you anywhere but that's your
call.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 4.19 72/98] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h

2019-08-28 Thread Borislav Petkov

On Wed, Aug 28, 2019 at 02:09:36PM +0200, Pavel Machek wrote:
> Yes, and now AMD has patch to break it on *all* machines.

It doesn't break all machines - you need to look at that patch again.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 4.19 72/98] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h

2019-08-28 Thread Borislav Petkov

On Wed, Aug 28, 2019 at 01:49:47PM +0200, Pavel Machek wrote:
> AMD screwed this up,

Except that it wasn't AMD who screwed up but BIOS on *some* laptops.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [tip: x86/urgent] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h

2019-08-24 Thread Borislav Petkov

On Sat, Aug 24, 2019 at 09:50:28AM -0400, Sasha Levin wrote:
> Why is this being removed (along with supporting code)?

This was only a temporary bug in the new tip-bot which is fixed now. The
commit in tip is fine:

c49a0a80137c ("x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h")

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v8 03/27] x86/fpu/xstate: Change names to separate XSAVES system and user states

2019-08-21 Thread Borislav Petkov

On Tue, Aug 13, 2019 at 01:52:01PM -0700, Yu-cheng Yu wrote:
> Control-flow Enforcement (CET) MSR contents are XSAVES system states.
> To support CET, introduce XSAVES system states first.
> 
> XSAVES is a "supervisor" instruction and, comparing to XSAVE, saves
> additional "supervisor" states that can be modified only from CPL 0.
> However, these states are per-task and not kernel's own.  Rename
> "supervisor" states to "system" states to clearly separate them from
> "user" states.
> 
> Signed-off-by: Yu-cheng Yu 
> ---
>  arch/x86/include/asm/fpu/internal.h |  4 +-
>  arch/x86/include/asm/fpu/xstate.h   | 20 +++
>  arch/x86/kernel/fpu/init.c  |  2 +-
>  arch/x86/kernel/fpu/signal.c| 10 ++--
>  arch/x86/kernel/fpu/xstate.c| 86 ++---
>  5 files changed, 60 insertions(+), 62 deletions(-)

...

> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index e5cb67d67c03..d560e8861a3c 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -54,13 +54,16 @@ static short xsave_cpuid_features[] __initdata = {
>  };
>  
>  /*
> - * Mask of xstate features supported by the CPU and the kernel:
> + * XSAVES system states can only be modified from CPL 0 and saved by
> + * XSAVES.  The rest are user states.  The following is a mask of
> + * supported user state features derived from boot_cpu_has() and

...derived from detected CPUID feature flags and
SUPPORTED_XFEATURES_MASK.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v8 02/27] x86/cpufeatures: Add CET CPU feature flags for Control-flow Enforcement Technology (CET)

2019-08-21 Thread Borislav Petkov

On Tue, Aug 13, 2019 at 01:52:00PM -0700, Yu-cheng Yu wrote:
> Add CPU feature flags for Control-flow Enforcement Technology (CET).
> 
> CPUID.(EAX=7,ECX=0):ECX[bit 7] Shadow stack
> CPUID.(EAX=7,ECX=0):EDX[bit 20] Indirect branch tracking
> 
> Reviewed-by: Borislav Petkov 
> Signed-off-by: Yu-cheng Yu 
> ---
>  arch/x86/include/asm/cpufeatures.h | 2 ++
>  arch/x86/kernel/cpu/cpuid-deps.c   | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h 
> b/arch/x86/include/asm/cpufeatures.h
> index e880f2408e29..122265ab46c1 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -334,6 +334,7 @@
>  #define X86_FEATURE_OSPKE(16*32+ 4) /* OS Protection Keys Enable 
> */
>  #define X86_FEATURE_WAITPKG  (16*32+ 5) /* UMONITOR/UMWAIT/TPAUSE 
> Instructions */
>  #define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector 
> Bit Manipulation Instructions */
> +#define X86_FEATURE_SHSTK(16*32+ 7) /* Shadow Stack */
>  #define X86_FEATURE_GFNI (16*32+ 8) /* Galois Field New 
> Instructions */
>  #define X86_FEATURE_VAES (16*32+ 9) /* Vector AES */
>  #define X86_FEATURE_VPCLMULQDQ   (16*32+10) /* Carry-Less 
> Multiplication Double Quadword */
> @@ -358,6 +359,7 @@
>  #define X86_FEATURE_MD_CLEAR (18*32+10) /* VERW clears CPU buffers */
>  #define X86_FEATURE_TSX_FORCE_ABORT  (18*32+13) /* "" TSX_FORCE_ABORT */
>  #define X86_FEATURE_PCONFIG  (18*32+18) /* Intel PCONFIG */
> +#define X86_FEATURE_IBT  (18*32+20) /* Indirect Branch 
> Tracking */
>  #define X86_FEATURE_SPEC_CTRL(18*32+26) /* "" Speculation 
> Control (IBRS + IBPB) */
>  #define X86_FEATURE_INTEL_STIBP  (18*32+27) /* "" Single Thread 
> Indirect Branch Predictors */
>  #define X86_FEATURE_FLUSH_L1D(18*32+28) /* Flush L1D cache */
> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c 
> b/arch/x86/kernel/cpu/cpuid-deps.c
> index b5353244749b..9bf35f081080 100644
> --- a/arch/x86/kernel/cpu/cpuid-deps.c
> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> @@ -68,6 +68,8 @@ static const struct cpuid_dep cpuid_deps[] = {
>   { X86_FEATURE_CQM_MBM_TOTAL,X86_FEATURE_CQM_LLC   },
>   { X86_FEATURE_CQM_MBM_LOCAL,X86_FEATURE_CQM_LLC   },
>   { X86_FEATURE_AVX512_BF16,  X86_FEATURE_AVX512VL  },
> + { X86_FEATURE_SHSTK,X86_FEATURE_XSAVES},
> + { X86_FEATURE_IBT,  X86_FEATURE_XSAVES},

This hunk needs re-tabbing after:

1e0c08e3034d ("cpu/cpuid-deps: Add a tab to cpuid dependent features")

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h

2019-08-17 Thread Borislav Petkov

On Thu, Aug 15, 2019 at 10:25:24PM +0100, Andrew Cooper wrote:
> I'm afraid that a number of hypervisors do write-discard, given the
> propensity of OSes (certainly traditionally) to go poking at bits like
> this without wrmsr_safe().
> 
> You either need to read the MSR back and observe that the bit has really
> changed, or in this case as Thomas suggests, look at CPUID again (which
> will likely be the faster option for the non-virtualised case).

One thing I didn't think of when we talked about this: this happens only
after you resume the hypervisor. And the words "resume the hypervisor"
already means an improbable use case. Yeah, yeah, one can close the
laptop lid of her/his F15h or F16h machine while guests are running and
when the HV resumes, those guests won't get randomness but I can't seem
to find it in myself to say, uuh, that's an important use case...

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h

2019-08-15 Thread Borislav Petkov

On Thu, Aug 15, 2019 at 09:59:03PM +0100, Andrew Cooper wrote:
> If you're virtualised, the write to MSR_AMD64_CPUID_FN_1 almost
> certainly won't take effect, which means userspace will still be able to
> see the bit.

msr_clear_bit() has a return value. We should check it before
doing anything further. I hope the HV actually signals the write
success/failure properly so that we get a correct return value.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h

2019-08-15 Thread Borislav Petkov

On Thu, Aug 15, 2019 at 01:47:24PM +, Lendacky, Thomas wrote:
> Sure, I can do that. Do we want to tie this into the nordrand option and
> add rdrand=off or keep that separate?

Yeah, I was looking at that this morning and I'd say keep 'em separate
because if you have to tie, you need to export functions and then
there's

setup_clear_cpu_cap(X86_FEATURE_RDSEED);

in the nordrand callback but then F15h and F16h don't have RDSEED and
people would wonder, why clear RDSEED on AMD, blabla... so keeping them
separate saves us all that.

> I think this is a clearer indication that the action has taken place.

Yeah, but what does that bring us? You wanna know this now, while
testing. Once that whole effort is done, it is a useless printing of
info which you have in cpuinfo already.

> Not sure what you mean. We can't use the DMI stuff for this. So now, with
> the x86 family checks, if anyone adds some DMI stuff or x86 family stuff
> in the future that matches both the DMI and x86 family checks, this will
> be called more than once and so you need to copy any previous settings and
> add the new ones.

I had a suspicion that it was something like that. Ok, this is not a
big structure currently so I guess it is fine but if it keeps growing,
it would need a proper redesign like making it a list and callbacks
doing list_add_tail() for MSRs which get added. It would avoid that
kmalloc and copying which is silly. Please put a comment ontop why we're
copying.

> Except that X86_FEATURE_RDRAND isn't set anymore. I could create a new
> software feature that is set when the CPUID bit is cleared if that's
> preferred.

Nah, let's leave it like you had it.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH] x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h

2019-08-15 Thread Borislav Petkov

On Wed, Aug 14, 2019 at 09:17:41PM +, Lendacky, Thomas wrote:
> From: Tom Lendacky 
> 
> There have been reports of RDRAND issues after resuming from suspend on
> some AMD family 15h and family 16h systems. This issue stems from BIOS
> not performing the proper steps during resume to ensure RDRAND continues
> to function properly.

If this happens only during suspend/resume, this probably should
be done only on configurations which have CONFIG_SUSPEND and/or
CONFIG_HIBERNATION enabled. I'm assuming BIOS does init it properly
at least during boot - I mean, they should've passed some sort of a
certification.

OTOH, if the breakage happens on resume, they clearly didn't test the
BIOS suspend/resume. I mean, I'm not at all surprised - it is f*cking
BIOS. News at 11.

> RDRAND support is indicated by CPUID Fn0001_ECX[30]. This bit can be
> reset by clearing MSR C001_1004[62]. Any software that checks for RDRAND
> support using CPUID, including the kernel,  will believe that RDRAND is
> not supported.
> 
> Update the CPU initialization to clear the RDRAND CPUID bit for any family
> 15h and 16h processor that supports RDRAND. If it is known that the family
> 15h or family 16h system does not have an RDRAND resume issue or that the
> system will not be placed in suspend, the "rdrand_force" kernel parameter
> can be used to stop the clearing of the RDRAND CPUID bit.
> 
> Additionally, update the suspend and resume path to save and restore the
> MSR C001_1004 value to ensure that the RDRAND CPUID setting remains in
> place after resuming from suspend.
> 
> Note, that clearing the RDRAND CPUID bit does not prevent a processor
> that normally supports the RDRAND instruction from executing the RDRAND
> instruction. So any code that determined the support based on family and
> model won't #UD.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  .../admin-guide/kernel-parameters.txt |  8 ++
>  arch/x86/include/asm/msr-index.h  |  1 +
>  arch/x86/kernel/cpu/amd.c | 42 ++
>  arch/x86/power/cpu.c  | 83 ---
>  4 files changed, 121 insertions(+), 13 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index 47d981a86e2f..f47eb33958c1 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4090,6 +4090,14 @@
>   Run specified binary instead of /init from the ramdisk,
>   used for early userspace startup. See initrd.
>  
> + rdrand_force[X86]
> + On certain AMD processors, the advertisement of the
> + RDRAND instruction has been disabled by the kernel
> + because of buggy BIOS support, specifically around the
> + suspend/resume path. This option allows for overriding
> + that decision if it is known that the BIOS support for
> + RDRAND is not buggy on the system.
> +
>   rdt=[HW,X86,RDT]
>   Turn on/off individual RDT features. List is:
>   cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
> diff --git a/arch/x86/include/asm/msr-index.h 
> b/arch/x86/include/asm/msr-index.h
> index 6b4fc2788078..29ae2b66b9e9 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -381,6 +381,7 @@
>  #define MSR_AMD64_PATCH_LEVEL0x008b
>  #define MSR_AMD64_TSC_RATIO  0xc104
>  #define MSR_AMD64_NB_CFG 0xc001001f
> +#define MSR_AMD64_CPUID_FN_0001  0xc0011004

I know the PPR has all the 0s but let's write it

MSR_AMD64_CPUID_FN_1

so that it is readable in the kernel.

>  #define MSR_AMD64_PATCH_LOADER   0xc0010020
>  #define MSR_AMD64_OSVW_ID_LENGTH 0xc0010140
>  #define MSR_AMD64_OSVW_STATUS0xc0010141
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 3afe07d602dd..86ff1464302b 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -804,6 +804,40 @@ static void init_amd_ln(struct cpuinfo_x86 *c)
>   msr_set_bit(MSR_AMD64_DE_CFG, 31);
>  }
>  
> +static bool rdrand_force;
> +
> +static int __init rdrand_force_cmdline(char *str)
> +{
> + rdrand_force = true;
> +
> + return 0;
> +}
> +early_param("rdrand_force", rdrand_force_cmdline);

Let's make this a more generic param:

rdrand=force[, ...]

in case we wanna add some more opts here later.

> +
> +static void init_hide_rdrand(struct cpuinfo_x86 *c)

clear_rdrand_cpuid_bit()

is what this function does.

> +{
> + /*
> +  * The nordrand option can clear X86_FEATURE_RDRAND, so check for
> +  * RDRAND support using the CPUID function directly.
> +  */
> + if (!(cpuid_ecx(1) & BIT(30)) || rdrand_force)
> + return;
> +
> + ms

Re: [PATCH v2] replace timeconst bc script with an sh script

2019-06-20 Thread Borislav Petkov

On Thu, Jun 20, 2019 at 04:29:19AM -0400, Ethan Sommer wrote:
> Ah sorry about that, I accidentally replied to Kieran only instead of to
> all, my response was "I will upload a patch with those issues fixed
> shortly, in terms of the dependency as far as I know commands only required
> for running tests don't count as kernel compilation dependencies, and I
> don't see any other uses of bc except for Documentation/EDID/Makefile, so I
> believe that bc can be removed from the kernel compilation section of the
> process document and will include that change with the updated patch that
> fixes the 2 issues you pointed out."

Sounds like parts of it should be in your commit message as a
justification *why* you're doing it. You can do that for your next
revision once you've waited a couple of days to gather feedback.

Also, please do not top-post.

Thx.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Mary Higgins, Sri Rasiah, HRB 21284 (AG 
Nürnberg)

Re: [PATCH v2] replace timeconst bc script with an sh script

2019-06-20 Thread Borislav Petkov

On Thu, Jun 20, 2019 at 04:11:32AM -0400, Ethan Sommer wrote:
> removes the bc build dependency introduced when timeconst.pl was
> replaced by timeconst.bc:
> 70730bca1331 ("kernel: Replace timeconst.pl with a bc script")

I don't see you answering Kieran's questions anywhere...

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Mary Higgins, Sri Rasiah, HRB 21284 (AG 
Nürnberg)

Re: [PATCH 3/3] LICENSES: Rename other to deprecated

2019-05-14 Thread Borislav Petkov

On Tue, May 14, 2019 at 03:49:43PM +0200, Christoph Hellwig wrote:
> On Tue, May 14, 2019 at 12:26:32PM +0200, Borislav Petkov wrote:
> > This breaks scripts/spdxcheck.py, it needs below hunk. Also, should
> > "dual" be added to license_dirs too?
> 
> Yes.  In fact two people already submitted patches for that before
> you, just waiting for them to get picked up.

Yeah, it is definitely worth mentioning how minor stuff like that gets
noticed and fixed immediately in recent times. We've become better... :-)

Thx!

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 3/3] LICENSES: Rename other to deprecated

2019-05-14 Thread Borislav Petkov

On Tue, Apr 30, 2019 at 06:51:30AM -0400, Christoph Hellwig wrote:
> Make it clear in the directory name that these are not intended for new
> code.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  Documentation/process/license-rules.rst | 8 
>  LICENSES/{other => deprecated}/GPL-1.0  | 0
>  LICENSES/{other => deprecated}/ISC  | 0
>  LICENSES/{other => deprecated}/Linux-OpenIB | 0
>  LICENSES/{other => deprecated}/X11  | 0
>  5 files changed, 4 insertions(+), 4 deletions(-)
>  rename LICENSES/{other => deprecated}/GPL-1.0 (100%)
>  rename LICENSES/{other => deprecated}/ISC (100%)
>  rename LICENSES/{other => deprecated}/Linux-OpenIB (100%)
>  rename LICENSES/{other => deprecated}/X11 (100%)

This breaks scripts/spdxcheck.py, it needs below hunk. Also, should
"dual" be added to license_dirs too?

diff --git a/scripts/spdxcheck.py b/scripts/spdxcheck.py
index 4fe392e507fb..1a39b34588b7 100755
--- a/scripts/spdxcheck.py
+++ b/scripts/spdxcheck.py
@@ -32,7 +32,7 @@ class SPDXdata(object):
 def read_spdxdata(repo):
 
 # The subdirectories of LICENSES in the kernel source
-license_dirs = [ "preferred", "other", "exceptions" ]
+license_dirs = [ "preferred", "deprecated", "exceptions" ]
 lictree = repo.head.commit.tree['LICENSES']
 
 spdx = SPDXdata()

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-15 Thread Borislav Petkov

On Mon, Jan 14, 2019 at 03:49:14PM -0500, Dave Anderson wrote:
> Yeah, I've been watching the thread, and the document looks fine to me.
> It's just that when I saw the discussion of this one being removed that
> I felt the need to respond...  ;-)

Good. :-)

Ok, I've amended the init_uts_ns.name.release description with the
additional important info.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Borislav Petkov

On Mon, Jan 14, 2019 at 03:26:32PM -0500, Dave Anderson wrote:
> No.  It needs *both* the vmlinux file and the vmcore file in order to read 
> kernel
> virtual memory, so just having a kernel virtual address is insufficient.
> 
> So it's a chicken-and-egg situation.  This particular --osrelease option is 
> used
> to determine *what* vmlinux file would be required for an actual crash 
> analysis 
> session.

Ok, that makes sense. I could've used that explanation when reviewing
the documentation. Do you mind skimming through this:

https://lkml.kernel.org/r/2019045650.gg4...@zn.tnic

in case we've missed explaining relevant usage - like that above - of
some of the vmcoreinfo members?

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Borislav Petkov

On Mon, Jan 14, 2019 at 03:07:33PM -0500, Dave Anderson wrote:
> That's what it *does* utilize -- it takes a standalone vmcore dumpfile, and 
> pulls out the OSRELEASE string from it, so that a user can determine what
> vmlinux file should be used with that vmcore for normal crash analysis.

And the vmcoreinfo is part of the vmcore, right?

So it can just as well read out the address of init_uts_ns and get the
kernel version from there.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Borislav Petkov

On Mon, Jan 14, 2019 at 02:36:47PM -0500, Dave Anderson wrote:
> There's no reading of the dumpfile's memory involved, and that being the case,
> the vmlinux file is not utilized.  That's the whole point of the crash 
> option, i.e.,
> taking a vmcore file, and trying to determine what kernel should be used with 
> it:
> 
>   $ man crash
>   ...
>--osrelease dumpfile
>   Display the OSRELEASE vmcoreinfo string from a kdump dumpfile 
> header.

I don't understand - if you have the vmcoreinfo (which I assume is part
of the vmcore, yes, no?) you can go and dig out the kernel version from
it, no?

Why should you not utilize the vmcore file?

(I'm most likely missing something.)

> Well, I just don't agree that the OSRELEASE item is "frivolous". It's
> been in place, and depended upon, for many years.

Yeah, no. The ABI argument is moot in this case as in the last couple
of months people have been persuading me that vmcoreinfo is not ABI. So
you guys need to make up your mind what is it. And if it is an ABI, it
wasn't documented anywhere.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Borislav Petkov

On Mon, Jan 14, 2019 at 01:58:32PM -0500, Dave Anderson wrote:
> Preferably it would be left as-is.  The crash utility has a "crash 
> --osrelease vmcore"
> option that only looks at the dumpfile header, and just dump the string.  
> With respect
> to compressed kdumps, crash could alternatively look at the utsname data that 
> is stored
> in the diskdump_header.utsname field, but with ELF vmcores, there is no such 
> back-up.

Well, there is:

  4f 53 52 45 4c 45 41 53  45 3d 35 2e 30 2e 30 2d  |OSRELEASE=5.0.0-|
0010  72 63 32 2b 0a 50 41 47  45 53 49 5a 45 3d 34 30  |rc2+.PAGESIZE=40|
0020  39 36 0a 53 59 4d 42 4f  4c 28 6d 65 6d 5f 73 65  |96.SYMBOL(mem_se|
0030  63 74 69 6f 6e 29 3d 66  66 66 66 66 66 66 66 38  |ction)=8|
0040  34 35 31 61 31 61 38 0a  53 59 4d 42 4f 4c 28 69  |451a1a8.SYMBOL(i|
0050  6e 69 74 5f 75 74 73 5f  6e 73 29 3d 66 66 66 66  |nit_uts_ns)=|

0060  66 66 66 66 38 32 30 31  33 35 34 30 0a 53 59 4d  |82013540

This address has it.

> What's the problem with leaving it alone?

The problem is that I'd like to get all those vmcoreinfo exports under
control and to not have people frivolously export whatever they feel
like, for obvious reasons, and to get rid of the duplicate/unused pieces
being part of vmcoreinfo.

I'm guessing removing OSRELEASE would simplify the kernel a bit by
getting rid of the VMCOREINFO_OSRELEASE define and export, and userspace
can read out the kernel version from init_uts_ns which is also exported.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Borislav Petkov

On Mon, Jan 14, 2019 at 05:48:48PM +, Kazuhito Hagio wrote:
> As for makedumpfile, it will be not impossible to remove the
> init_uts_ns.name.relase (OSRELEASE), but some fixes are needed.
> Because historically OSRELEASE has been a kind of a mandatory entry
> in vmcoreinfo from the beginning of vmcoreinfo, so makedumpfile uses
> its existence to check whether a vmcoreinfo is sane.

Well, init_uts_ns is exported in vmcoreinfo anyway - makedumpfile
can simply test init_uts_ns.name.release just as well. And the
"historically" argument doesn't matter because vmcoreinfo is not an ABI.

So makedumpfile needs to be changed to check that new export.

> Also, I think crash also will need to be fixed if it is removed.

Yes, I'm expecting user tools to be fixed and then exports removed.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Borislav Petkov

On Mon, Jan 14, 2019 at 01:30:30PM +0800, lijiang wrote:
> I noticed that the checkpatch was coded in Perl. But i am not familiar with
> the Perl program language, that would be beyond my ability to do this, i have
> to learn the Perl program language step by step. :-)

You could give it a try - it is not hard :-)

And there's no hurry for this, take your time.

> Do you mean this one 'KERNEL_IMAGE_SIZE'?

I mean, all those which are unused. Optimally, you should look at the
tools and see whether they're using those exports and if not, remove
them. But no hurry here too, take your time.

My final goal is to have this up-to-date documentation of what is
exported and what is used by user tools so that people can look at it
first before carelessly exporting yet another thing.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-14 Thread Borislav Petkov

On Mon, Jan 14, 2019 at 09:52:14AM +0800, lijiang wrote:
> I would like to remove this variable and post again.

No, you should remove the vmcoreinfo export too:

kernel/crash_core.c:398:VMCOREINFO_OSRELEASE(init_uts_ns.name.release);

after making sure userspace is not using it and *then* remove the
documentation.

But you can do that in a separate patch, so that it can be reverted if
trouble.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-11 Thread Borislav Petkov

On Thu, Jan 10, 2019 at 08:19:43PM +0800, Lianbo Jiang wrote:
> This document lists some variables that export to vmcoreinfo, and briefly
> describles what these variables indicate. It should be instructive for
> many people who do not know the vmcoreinfo.
> 
> Suggested-by: Borislav Petkov 
> Signed-off-by: Lianbo Jiang 
> ---
>  Documentation/kdump/vmcoreinfo.txt | 500 +
>  1 file changed, 500 insertions(+)
>  create mode 100644 Documentation/kdump/vmcoreinfo.txt

Ok, below is what I'm going to commit if no one complains. I hope you'd
find some time to work on adding the checkpatch check for patches which
add vmcoreinfo members but do not document them and also remove those
vmcoreinfo members which are unused.

Which should be easy because we don't have to be backwards-compatible
with makedumpfile as this is not an ABI.

Thx.

---
From: Lianbo Jiang 
Date: Thu, 10 Jan 2019 20:19:43 +0800
Subject: [PATCH] kdump: Document kernel data exported in the vmcoreinfo note

Document data exported in vmcoreinfo and briefly describe its use by
userspace tools.

 [ bp: heavily massage and redact the text. ]

Suggested-by: Borislav Petkov 
Signed-off-by: Lianbo Jiang 
Signed-off-by: Borislav Petkov 
Cc: Andrew Morton 
Cc: Baoquan He 
Cc: Dave Young 
Cc: Jonathan Corbet 
Cc: Thomas Gleixner 
Cc: Vivek Goyal 
Cc: ander...@redhat.com
Cc: k-ha...@ab.jp.nec.com
Cc: ke...@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: mi...@redhat.com
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190110121944.6050-2-liji...@redhat.com
---
 Documentation/kdump/vmcoreinfo.txt | 494 +
 1 file changed, 494 insertions(+)
 create mode 100644 Documentation/kdump/vmcoreinfo.txt

diff --git a/Documentation/kdump/vmcoreinfo.txt 
b/Documentation/kdump/vmcoreinfo.txt
new file mode 100644
index ..2dc3797940a3
--- /dev/null
+++ b/Documentation/kdump/vmcoreinfo.txt
@@ -0,0 +1,494 @@
+
+   VMCOREINFO
+
+
+===
+What is it?
+===
+
+VMCOREINFO is a special ELF note section. It contains various
+information from the kernel like structure size, page size, symbol
+values, field offsets, etc. These data are packed into an ELF note
+section and used by user-space tools like crash and makedumpfile to
+analyze a kernel's memory layout.
+
+
+Common variables
+
+
+init_uts_ns.name.release
+
+
+The version of the Linux kernel. Used to find the corresponding source
+code from which the kernel has been built.
+
+PAGE_SIZE
+-
+
+The size of a page. It is the smallest unit of data used by the memory
+management facilities. It is usually 4096 bytes of size and a page is
+aligned on 4096 bytes. Used for computing page addresses.
+
+init_uts_ns
+---
+
+The UTS namespace which is used to isolate two specific elements of the
+system that relate to the uname(2) system call. It is named after the
+data structure used to store information returned by the uname(2) system
+call.
+
+User-space tools can get the kernel name, host name, kernel release
+number, kernel version, architecture name and OS type from it.
+
+node_online_map
+---
+
+An array node_states[N_ONLINE] which represents the set of online nodes
+in a system, one bit position per node number. Used to keep track of
+which nodes are in the system and online.
+
+swapper_pg_dir
+-
+
+The global page directory pointer of the kernel. Used to translate
+virtual to physical addresses.
+
+_stext
+--
+
+Defines the beginning of the text section. In general, _stext indicates
+the kernel start address. Used to convert a virtual address from the
+direct kernel map to a physical address.
+
+vmap_area_list
+--
+
+Stores the virtual area list. makedumpfile gets the vmalloc start value
+from this variable and its value is necessary for vmalloc translation.
+
+mem_map
+---
+
+Physical addresses are translated to struct pages by treating them as
+an index into the mem_map array. Right-shifting a physical address
+PAGE_SHIFT bits converts it into a page frame number which is an index
+into that mem_map array.
+
+Used to map an address to the corresponding struct page.
+
+contig_page_data
+
+
+Makedumpfile gets the pglist_data structure from this symbol, which is
+used to describe the memory layout.
+
+User-space tools use this to exclude free pages when dumping memory.
+
+mem_section|(mem_section, NR_SECTION_ROOTS)|(mem_section, section_mem_map)
+--
+
+The address of the mem_section array, its length, structure size, and
+the section_mem_map offset.
+
+It exists in the sparse memory mapping model, and it is also somewhat
+similar to the mem_map variab

Re: [PATCH 1/2 v6] kdump: add the vmcoreinfo documentation

2019-01-11 Thread Borislav Petkov

On Thu, Jan 10, 2019 at 08:19:43PM +0800, Lianbo Jiang wrote:
> +init_uts_ns.name.release
> +
> +
> +The version of the Linux kernel. Used to find the corresponding source
> +code from which the kernel has been built.
> +

...

> +
> +init_uts_ns
> +---
> +
> +This is the UTS namespace, which is used to isolate two specific
> +elements of the system that relate to the uname(2) system call. The UTS
> +namespace is named after the data structure used to store information
> +returned by the uname(2) system call.
> +
> +User-space tools can get the kernel name, host name, kernel release
> +number, kernel version, architecture name and OS type from it.

Already asked this but no reply so lemme paste my question again:

"And this document already fulfills its purpose - those two vmcoreinfo
exports are redundant and the first one can be removed.

And now that we agreed that VMCOREINFO is not an ABI and is very tightly
coupled to the kernel version, init_uts_ns.name.release can be removed,
yes?

Or is there anything speaking against that?"

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation

2018-12-18 Thread Borislav Petkov

On Tue, Dec 18, 2018 at 03:31:32PM +0800, lijiang wrote:
> The printk_log is used to output human readable text, it will encapsulate 
> header
> information for log_buf, such as timestamp, syslog level, etc.

Me asking those questions is supposed to hint that the explanations need
improvement. But you get the idea...

> >> +PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|
> >> +PG_hwpoision|PG_head_mask
> >> +=
> >> +It means the attribute of a page. These flags will be used to filter
> >> +the free pages.
> >> +
> >> +PAGE_BUDDY_MAPCOUNT_VALUE or ~PG_buddy
> >> +==
> >> +The 'PG_buddy' flag indicates that the page is free and in the buddy
> >> +system. Makedumpfile can exclude the free pages managed by a buddy.
> > 
> > That text belongs with the one above?
> > 
> It exported the value of (~PG_buddy), so it is placed here independently.

Then make that obvious in the description. The one above talks about the
PG flags and this one should talk about PAGE_BUDDY_MAPCOUNT_VALUE and
what it is used for. The fact that it is computed by negating PG_buddy
is an implementation detail.

> These two variables are somewhat similar, but they are used in
> different scenarios.

Those different scenarious need to be part of the description.

> >> +KERNEL_IMAGE_SIZE
> >> +=
> >> +The size of 'KERNEL_IMAGE_SIZE', currently unused.
> > 
> > So remove?
> > 
> 
> I'm not sure whether it should be removed, so i keep it.

If it is unused, it should be removed as an VMCOREINFO export and from
the docs. But that can be done later, as a separate patch.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 2/2 v3] kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo

2018-12-17 Thread Borislav Petkov

On Sun, Dec 16, 2018 at 09:16:17PM +0800, Lianbo Jiang wrote:
> For AMD machine with SME feature, makedumpfile tools need to know
> whether the crash kernel was encrypted or not. If SME is enabled
> in the first kernel, the crash kernel's page table(pgd/pud/pmd/pte)
> contains the memory encryption mask, so need to remove the sme mask
> to obtain the true physical address.
> 
> Signed-off-by: Lianbo Jiang 
> ---
>  arch/x86/kernel/machine_kexec_64.c | 14 ++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/arch/x86/kernel/machine_kexec_64.c 
> b/arch/x86/kernel/machine_kexec_64.c
> index 4c8acdfdc5a7..1860fe24117d 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -352,10 +352,24 @@ void machine_kexec(struct kimage *image)
>  
>  void arch_crash_save_vmcoreinfo(void)
>  {
> + u64 sme_mask = sme_me_mask;
> +
>   VMCOREINFO_NUMBER(phys_base);
>   VMCOREINFO_SYMBOL(init_top_pgt);
>   vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n",
>   pgtable_l5_enabled());
> + /*
> +  * Currently, the local variable 'sme_mask' stores the value of
> +  * sme_me_mask(bit 47), and also write the value of sme_mask to
> +  * the vmcoreinfo.
> +  * If need, the bit(sme_mask) might be redefined in the future,
> +  * but the 'bit63' will be reserved.
> +  * For example:
> +  * [ misc  ][ enc bit  ][ other misc SME info   ]
> +  * ____1000______..._
> +  * 63   59   55   51   47   43   39   35   31   27   ... 3
> +  */

This text belongs into the document.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation

2018-12-17 Thread Borislav Petkov

On Sun, Dec 16, 2018 at 09:16:16PM +0800, Lianbo Jiang wrote:
> +clear_idx
> +=
> +The index that the next printk record to read after the last 'clear'
> +command. It indicates the first record after the last SYSLOG_ACTION
> +_CLEAR, like issued by 'dmesg -c'.

What is that used for by the userspace tools?

> +
> +log_next_idx
> +
> +The index of the next record to store in the buffer 'log_buf'. It helps
> +to compute the index of current strings position.
> +
> +printk_log
> +==
> +The size of a structure 'printk_log'. It helps to compute the size of
> +messages, and extract dmesg log.

What is the difference between that and log_buf?



> +
> +(printk_log, ts_nsec|len|text_len|dict_len)
> +===
> +It represents these field offsets in the structure 'printk_log'. User
> +space tools can parse it and detect any changes to structure down the
> +line.

What does that mean? "any changes down the line"?

> +
> +(free_area.free_list, MIGRATE_TYPES)
> +
> +The number of migrate types for pages. The free_list is divided into
> +the array, it needs to know the number of the array.

... for?

> +
> +NR_FREE_PAGES
> +=
> +On linux-2.6.21 or later, the number of free_pages is in
> +vm_stat[NR_FREE_PAGES]. It can get the number of free pages from the
> +array.
> +
> +PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|
> +PG_hwpoision|PG_head_mask
> +=
> +It means the attribute of a page. These flags will be used to filter
> +the free pages.
> +
> +PAGE_BUDDY_MAPCOUNT_VALUE or ~PG_buddy
> +==
> +The 'PG_buddy' flag indicates that the page is free and in the buddy
> +system. Makedumpfile can exclude the free pages managed by a buddy.

That text belongs with the one above?

> +
> +HUGETLB_PAGE_DTOR
> +=
> +The 'HUGETLB_PAGE_DTOR' flag indicates the hugetlbfs pages. Makedumpfile
> +will exclude these pages.
> +
> +
> +x86_64 variables
> +
> +
> +phys_base
> +=
> +In x86_64, the 'phys_base' is necessary to convert virtual address of
> +exported kernel symbol to physical address.
> +
> +init_top_pgt
> +
> +The 'init_top_pgt' used to walk through the whole page table and convert
> +virtual address to physical address.

This is the same as swapper_pg_dir?

> +
> +pgtable_l5_enabled
> +==
> +User-space tools need to know whether the crash kernel was in 5-level
> +paging mode or not.
> +
> +node_data
> +=
> +This is a struct 'pglist_data' array, it stores all numa nodes information.
> +In general, Makedumpfile can get the pglist_data structure from symbol
> +'node_data'.
> +
> +(node_data, MAX_NUMNODES)
> +=
> +The number of this 'node_data' array. It means the maximum number of the
> +nodes in system.
> +
> +KERNELOFFSET
> +
> +Randomize the address of the kernel image. This is the offset of KASLR in
> +VMCOREINFO ELF notes. It is used to compute the page offset in x86_64. If
> +KASLE is disabled, this value is zero.
> +
> +KERNEL_IMAGE_SIZE
> +=
> +The size of 'KERNEL_IMAGE_SIZE', currently unused.

So remove?

> +
> +The old MODULES_VADDR need be decided by KERNEL_IMAGE_SIZE when kaslr
> +enabled. Now MODULES_VADDR is not needed any more since Pratyush makes
> +all VA to PA converting done by page table lookup.

Also, I did clean this up considerably - please include in your next
version:

---
diff --git a/Documentation/kdump/vmcoreinfo.txt 
b/Documentation/kdump/vmcoreinfo.txt
index d71260bf383a..2ce34d952bfd 100644
--- a/Documentation/kdump/vmcoreinfo.txt
+++ b/Documentation/kdump/vmcoreinfo.txt
@@ -1,18 +1,19 @@
 
-   Documentation for VMCOREINFO
+   VMCOREINFO
 
 
 ===
 What is the VMCOREINFO?
 ===
-It is a special ELF note section. The VMCOREINFO contains the first
-kernel's various information, for example, structure size, page size,
-symbol values and field offset, etc. These data are packed into an ELF
-note section, and these data will also help user-space tools(e.g. crash
-makedumpfile) analyze the first kernel's memory usage.
-
-In general, makedumpfile can dump the VMCOREINFO contents from vmlinux
-in the first kernel. For example:
+
+VMCOREINFO is a special ELF note section. It contains various
+information from the kernel like structure size, page size, symbol
+values, field offsets, etc. These data are packed into an ELF note
+section and used by user-space tools like crash and makedumpfile to
+analyze a kernel's memory layout.
+
+To dump the VMCOREINFO contents, one can do:
+
 # makedumpfile -g VMCOREINFO -x vmlinux
 
 
@@ -20,123 +21,132 @@ Common variabl

Re: [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation

2018-12-17 Thread Borislav Petkov

On Sun, Dec 16, 2018 at 09:16:16PM +0800, Lianbo Jiang wrote:

This...

> +node_online_map
> +===
> +It is a macro definition, actually it is an array node_states[N_ONLINE],
> +and it represents the set of online node in a system, one bit position
> +per node number.
> +
> +This is used to keep track of which nodes are in the system and online.

... and this...

> +nodemask_t
> +==
> +The size of a 'nodemask_t' type. This value is used to compute the number
> +of online nodes.

sound redundant too?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 0/2 v3] kdump,vmcoreinfo: Export the value of sme mask to vmcoreinfo

2018-12-17 Thread Borislav Petkov

On Sun, Dec 16, 2018 at 09:16:15PM +0800, Lianbo Jiang wrote:
> This patchset did two things:
> a. add a new document for vmcoreinfo
> 
> This document lists some variables that export to vmcoreinfo, and briefly
> describles what these variables indicate. It should be instructive for
> many people who do not know the vmcoreinfo, and it would normalize the
> exported variable as a standard ABI between kernel and use-space.
> 
> b. export the value of sme mask to vmcoreinfo
> 
> For AMD machine with SME feature, makedumpfile tools need to know whether
> the crash kernel was encrypted or not. If SME is enabled in the first
> kernel, the crash kernel's page table(pgd/pud/pmd/pte) contains the
> memory encryption mask, so need to remove the sme mask to obtain the true
> physical address.
> 
> Changes since v1:
> 1. No need to export a kernel-internal mask to userspace, so copy the
> value of sme_me_mask to a local variable 'sme_mask' and write the value
> of sme_mask to vmcoreinfo.
> 2. Add comment for the code.
> 3. Improve the patch log.
> 4. Add the vmcoreinfo documentation.
> 
> Changes since v2:
> 1. Improve the vmcoreinfo document, add more descripts for these
> variables exported.
> 2. Fix spelling errors in the document.

Yes, it is starting to look better.

The last thing that's missing is a checkpatch.pl check which verifies
whether a new VMCOREINFO export is not being documented and warn if so.
But you can do that later.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 1/2 v3] kdump: add the vmcoreinfo documentation

2018-12-17 Thread Borislav Petkov

On Sun, Dec 16, 2018 at 09:16:16PM +0800, Lianbo Jiang wrote:
> +
> +Common variables
> +
> +
> +init_uts_ns.name.release
> +
> +The number of OS release. Based on this version number, people can find
> +the source code for the corresponding version. When analyzing the vmcore,
> +people must read the source code to find the reason why the kernel crashed.
> +

> +
> +init_uts_ns
> +===
> +This is the UTS namespace, which is used to isolate two specific elements
> +of the system that relate to the uname system call. The UTS namespace is
> +named after the data structure used to store information returned by the
> +uname system call.
> +
> +User-space tools can get the kernel name, host name, kernel release number,
> +kernel version, architecture name and OS type from the 'init_uts_ns'.

And this document already fulfills its purpose - those two vmcoreinfo
exports are redundant and the first one can be removed.

And now that we agreed that VMCOREINFO is not an ABI and is very tightly
coupled to the kernel version, init_uts_ns.name.release can be removed,
yes?

Or is there anything speaking against that?

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 1/2 v2] kdump: add the vmcoreinfo documentation

2018-12-10 Thread Borislav Petkov

On Wed, Dec 05, 2018 at 08:29:14PM +, Kazuhito Hagio wrote:
> Please note that this VMCOREINFO is generated from the information in
> the vmlinux only, not from the running kernel and /proc/kcore. So if
> we add a command to dump it from running kernel, it's not suitable.

Sure, I used a vmlinux for that.

> makedumpfile doesn't have any switch which dumps VMCOREINFO from kcore
> for now. (I'm thinking to have makedumpfile dump it as debug message,
> though.)

Might be useful as people are looking into using VMCOREINFO when
debugging a live kernel...

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 1/2 v2] kdump: add the vmcoreinfo documentation

2018-12-05 Thread Borislav Petkov

On Tue, Dec 04, 2018 at 05:35:09PM +0800, lijiang wrote:
> There are more people to review and improve this document together, that would
> be fine.

That's basically kernel development :)

> Generating VMCOREINFO is easy in the first kernel, for example:
> # makedumpfile -g VMCOREINFO -x vmlinux

I get:

$ makedumpfile -g VMCOREINFO -x vmlinux
The kernel version is not supported.
The makedumpfile operation may be incomplete.

The vmcoreinfo is saved to VMCOREINFO.

makedumpfile Completed.

But the text file looks ok AFAICT. Please add that command to the
documentation file.

> For these two *why*, it should be easy to understand. Because user-space tools
> need to know basic information, such as the symbol values, field offset, 
> structure
> size, etc. Otherwise, these tools won't know how to analyze the memory of the 
> crash
> kernel.

That's clear. The question is *why* *exactly* is every piece of export
needed.

> For the second question 'how they are used', we can get the answer
> from user-space tools, such as makedumpfile, crash tools. Therefore,
> it may not need to explain any more in kernel document. On the other
> hand, if we must put these contents into kernel document, i have to
> say, that would be a hard task.

You can put one or two sentences for each, stating what they're used
for. This way, people can go and look up makedumpfile source code for
further info and people trying to add new items to VMCOREINFO can look
in this documentation here first to figure out whether maybe the info
they need has been exported already.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [RFC PATCH v6 04/26] x86/fpu/xstate: Introduce XSAVES system states

2018-12-04 Thread Borislav Petkov

On Tue, Dec 04, 2018 at 09:08:11AM -0800, Yu-cheng Yu wrote:
> Then we will do this very often.  Why don't we create all three in the
> beginning: xfeatures_mask_all, xfeatures_mask_user, and xfeatures_mask_system?

Because the _all thing is the OR-ed product of the two and then you
don't have to update it when the _user and the _system ones change
because you'll be creating it on the fly each time.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [RFC PATCH v6 04/26] x86/fpu/xstate: Introduce XSAVES system states

2018-12-04 Thread Borislav Petkov

On Mon, Nov 19, 2018 at 01:47:47PM -0800, Yu-cheng Yu wrote:
> Control-flow Enforcement (CET) MSR contents are XSAVES system states.
> To support CET, introduce XSAVES system states first.
> 
> Signed-off-by: Yu-cheng Yu 
> ---
>  arch/x86/include/asm/fpu/internal.h |  3 +-
>  arch/x86/include/asm/fpu/xstate.h   |  4 +-
>  arch/x86/kernel/fpu/core.c  |  6 +-
>  arch/x86/kernel/fpu/init.c  | 10 ---
>  arch/x86/kernel/fpu/xstate.c| 94 +++--
>  5 files changed, 69 insertions(+), 48 deletions(-)

...

> @@ -704,6 +710,7 @@ static int init_xstate_size(void)
>   */
>  static void fpu__init_disable_system_xstate(void)
>  {
> + xfeatures_mask_all = 0;
>   xfeatures_mask_user = 0;
>   cr4_clear_bits(X86_CR4_OSXSAVE);
>   fpu__xstate_clear_all_cpu_caps();
> @@ -717,6 +724,8 @@ void __init fpu__init_system_xstate(void)
>  {
>   unsigned int eax, ebx, ecx, edx;
>   static int on_boot_cpu __initdata = 1;
> + u64 cpu_system_xfeatures_mask;
> + u64 cpu_user_xfeatures_mask;

So what I had in mind is to not have those local vars but use
xfeatures_mask_user and xfeatures_mask_system here directly...

>   int err;
>   int i;
>  
> @@ -739,10 +748,23 @@ void __init fpu__init_system_xstate(void)
>   return;
>   }
>  
> + /*
> +  * Find user states supported by the processor.
> +  * Only these bits can be set in XCR0.
> +  */
>   cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx);
> - xfeatures_mask_user = eax + ((u64)edx << 32);
> + cpu_user_xfeatures_mask = eax + ((u64)edx << 32);
>  
> - if ((xfeatures_mask_user & XFEATURE_MASK_FPSSE) != XFEATURE_MASK_FPSSE) 
> {
> + /*
> +  * Find system states supported by the processor.
> +  * Only these bits can be set in IA32_XSS MSR.
> +  */
> + cpuid_count(XSTATE_CPUID, 1, &eax, &ebx, &ecx, &edx);
> + cpu_system_xfeatures_mask = ecx + ((u64)edx << 32);
> +
> + xfeatures_mask_all = cpu_user_xfeatures_mask | 
> cpu_system_xfeatures_mask;

... and not introduce xfeatures_mask_all at all but everywhere you need
all features, to do:

(xfeatures_mask_user | xfeatures_mask_system)

and work with that.

...

> @@ -1178,7 +1208,7 @@ int copy_kernel_to_xstate(struct xregs_state *xsave, 
> const void *kbuf)
>* The state that came in from userspace was user-state only.
>* Mask all the user states out of 'xfeatures':
>*/
> - xsave->header.xfeatures &= XFEATURE_MASK_SUPERVISOR;
> + xsave->header.xfeatures &= (xfeatures_mask_all & ~xfeatures_mask_user);

... and this would be

xsave->header.xfeatures &= xfeatures_mask_system;

>  
>   /*
>* Add back in the features that came in from userspace:
> @@ -1234,7 +1264,7 @@ int copy_user_to_xstate(struct xregs_state *xsave, 
> const void __user *ubuf)
>* The state that came in from userspace was user-state only.
>* Mask all the user states out of 'xfeatures':
>*/
> - xsave->header.xfeatures &= XFEATURE_MASK_SUPERVISOR;
> + xsave->header.xfeatures &= (xfeatures_mask_all & ~xfeatures_mask_user);

Ditto here.

This way you have *two* mask variables and code queries them only.

Hmmm?

Or am I missing something?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH 1/2 v2] kdump: add the vmcoreinfo documentation

2018-12-03 Thread Borislav Petkov

Add some more Ccs.

On Sun, Dec 02, 2018 at 11:08:38AM +0800, Lianbo Jiang wrote:
> This document lists some variables that export to vmcoreinfo, and briefly
> describles what these variables indicate. It should be instructive for
> many people who do not know the vmcoreinfo, and it also normalizes the
> exported variable as a standard ABI between kernel and use-space.

Yeah, I'm not sure about it being an ABI. Apparently, it is considered
too tightly coupled to the kernel for it to be an ABI.

Regardless, thanks for doing that.

> Suggested-by: Borislav Petkov 
> Signed-off-by: Lianbo Jiang 
> ---
>  Documentation/kdump/vmcoreinfo.txt | 400 +
>  1 file changed, 400 insertions(+)
>  create mode 100644 Documentation/kdump/vmcoreinfo.txt
> 
> diff --git a/Documentation/kdump/vmcoreinfo.txt 
> b/Documentation/kdump/vmcoreinfo.txt

Aren't we adding new docs in rst format only or what is the logic there?

Jon?

> new file mode 100644
> index ..c6759be14af7
> --- /dev/null
> +++ b/Documentation/kdump/vmcoreinfo.txt
> @@ -0,0 +1,400 @@
> +
> + Documentation for Vmcoreinfo
> +
> +
> +===
> +What is the vmcoreinfo?
> +===
> +The vmcoreinfo contains the first kernel's various information, for

The first sentence here should be explaining what VMCOREINFO is: it is
an ELF PT_NOTE section. So that people can go, oh ok, it is a special
ELF section, when reading.

Then, MAKEDUMPFILE(8) spells VMCOREINFO in all caps and I think we
should do that too here, for ease of recognition.

Btw, do we have a makedumpfile switch or a tool/script which dumps
VMCOREINFO contents in human-readable form?

Maybe something nicer than:

$ hexdump -C /proc/kcore

> +example, structure size, page size, symbol values and field offset,
> +etc. These data are encapsulated into an elf format, and these data
> +will also help user-space tools(e.g. makedumpfile, crash) analyze the
> +first kernel's memory usage.
> +
> +
> +Common variables
> +
> +
> +init_uts_ns.name.release
> +
> +The number of OS release.
> +
> +PAGE_SIZE
> +=
> +The size of a page. It is usually 4k bytes.
> +
> +init_uts_ns
> +===
> +This is the UTS namespace, which is used to isolate two specific elements
> +of the system that relate to the uname system call. The UTS namespace is
> +named after the data structure used to store information returned by the
> +uname system call.

Those non-obvious exports should also have a short explanation why
they're part of VMCOREINFO.

> +
> +node_online_map
> +===
> +It is a macro definition, actually it is an arrary node_states[N_ONLINE],
> +and it represents the set of online node in a system, one bit position
> +per node number.

Ditto.

So yeah, people can find out what those things are but I think it is
more important to state here *why* they're part of VMCOREINFO and how
they're used and why they're exported.

Who knows, some might turn out to be not needed anymore. :)

> +
> +swapper_pg_dir
> +=
> +It is always an array, it gerenally stands for the pgd for the kernel.
> +When mmu is enabled in config file, the 'swapper_pg_dir' is valid.
> +
> +_stext
> +==
> +It is an assemble directive that defines the beginning of the text section.

That's an assembly symbol.

> +In gerenal, the '_stext' indicates the kernel start address.
> +
> +vmap_area_list
> +==
> +It stores the virtual area list, makedumpfile can get the vmalloc start
> +value according to this variable.

"... from this variable."

> +
> +mem_map
> +===
> +Physical addresses are translated to struct pages by treating them as an
> +index into the mem_map array. Shifting a physical address PAGE_SHIFT bits
> +to the right will treat it as a PFN from physical address 0, which is also
> +an index within the mem_map array.
> +
> +In a word, it can map the address to struct page.

"In short, ... "

> +
> +contig_page_data
> +
> +Makedumpfile can get the pglist_data structure according to this symbol

Please look up in the dictionary what "according" means. Using it in
this context is at least weird.

> +'contig_page_data'. The pglist_data structure is used to describe the
> +memory layout.
> +
> +mem_section|(mem_section, NR_SECTION_ROOTS)|(mem_section, section_mem_map)
> +==
> +Export the

Re: [PATCH v9 10/13] x86/resctrl: Fix the messages in rdt_last_cmd_printf and rdt_last_cmd_puts

2018-11-26 Thread Borislav Petkov

On Mon, Nov 26, 2018 at 02:16:24PM -0800, Reinette Chatre wrote:
> Hi Babu and Borislav,
> 
> Two typos seemed to have slipped through into the merged commit ...
> 
> On 11/21/2018 12:28 PM, Moger, Babu wrote:
> > @@ -163,14 +163,14 @@ int parse_cbm(struct rdt_parse_data *data, struct 
> > rdt_resource *r,
> >  * either is exclusive.
> >  */
> > if (rdtgroup_cbm_overlaps(r, d, cbm_val, rdtgrp->closid, true)) {
> > -   rdt_last_cmd_printf("overlaps with exclusive group\n");
> > +   rdt_last_cmd_printf("Overlaps with exclusive group\n");
> > return -EINVAL;
> > }
> >  
> > if (rdtgroup_cbm_overlaps(r, d, cbm_val, rdtgrp->closid, false)) {
> > if (rdtgrp->mode == RDT_MODE_EXCLUSIVE ||
> > rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
> > -   rdt_last_cmd_printf("overlaps with other group\n");
> > +   rdt_last_cmd_printf("0verlaps with other group\n");
> > return -EINVAL;
> > }
> > }
> 
> There is a zero instead of O used in second "Overlaps".

LOL!

That's so l337 :-)

> > @@ -1235,7 +1235,7 @@ static ssize_t rdtgroup_mode_write(struct 
> > kernfs_open_file *of,
> > goto out;
> > rdtgrp->mode = RDT_MODE_PSEUDO_LOCKSETUP;
> > } else {
> > -   rdt_last_cmd_printf("unknown/unsupported mode\n");
> > +   rdt_last_cmd_printf("Unknown orunsupported mode\n");
> > ret = -EINVAL;
> > }
> 
> Missing a space.

Whoops, nice catch. Can you send a fix pls?

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v9 01/13] x86/resctrl: Rename and move rdt files to new directory

2018-11-26 Thread Borislav Petkov

On Mon, Nov 26, 2018 at 10:31:02AM -0800, Luck, Tony wrote:
> Is this talking about renaming /sys/fs/resctrl?
> 
> If so NAK to that. It is ABI now. Lots of scripts depend
> on that name.

No no, that is cast in stone. The kernel source dir is called "resctrl"
now too:

arch/x86/kernel/cpu/resctrl

so even less confusion.

:)

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v9 01/13] x86/resctrl: Rename and move rdt files to new directory

2018-11-23 Thread Borislav Petkov

On Fri, Nov 23, 2018 at 09:41:17AM +0100, Ingo Molnar wrote:
> Then at least make the directory name resource_control/, which is only 
> marginally longer and a lot more readable.
> 
> We really don't have to fit directly names into the 8 character DOS limit 
> anymore. ;-)

How about

resource_ctl

?

resource_control/ is kinda long-ish and the other names we have there
are nice and short, see below.

BTW, while we're talking renaming, I have a patch which renames the MCE
pile and am planning to slap it in around -rc6 timeframe since we don't
have a lot of RAS commits this time around, see also the end of this
mail. It makes the naming there all nicely regular. :)

> Also, the Kconfig space, when it gets extended with the AMD bits, should 
> probably follow the same nomenclature: CONFIG_X86_CPU_RESOURCE_CONTROL=y 
> or such.

Sure, I can do that together with the directory rename once we've agreed
on the name.

Thx.

arch/x86/kernel/cpu/
|-- amd.c
|-- aperfmperf.c
|-- bugs.c
|-- cacheinfo.c
|-- centaur.c
|-- common.c
|-- cpu.h
|-- cpuid-deps.c
|-- cyrix.c
|-- hygon.c
|-- hypervisor.c
|-- intel.c
|-- intel_pconfig.c
|-- Makefile
|-- match.c
|-- mcheck
|   |-- dev-mcelog.c
|   |-- Makefile
|   |-- mce_amd.c
|   |-- mce-apei.c
|   |-- mce.c
|   |-- mce-genpool.c
|   |-- mce-inject.c
|   |-- mce_intel.c
|   |-- mce-internal.h
|   |-- mce-severity.c
|   |-- p5.c
|   |-- therm_throt.c
|   |-- threshold.c
|   `-- winchip.c
|-- microcode
|   |-- amd.c
|   |-- core.c
|   |-- intel.c
|   `-- Makefile
|-- mkcapflags.sh
|-- mshyperv.c
|-- mtrr
|   |-- amd.c
|   |-- centaur.c
|   |-- cleanup.c
|   |-- cyrix.c
|   |-- generic.c
|   |-- if.c
|   |-- Makefile
|   |-- mtrr.c
|   `-- mtrr.h
|-- perfctr-watchdog.c
|-- powerflags.c
|-- proc.c
|-- rdrand.c
|-- resctrl
|   |-- core.c
|   |-- ctrlmondata.c
|   |-- internal.h
|   |-- Makefile
|   |-- monitor.c
|   |-- pseudo_lock.c
|   |-- pseudo_lock_event.h
|   `-- rdtgroup.c
|-- scattered.c
|-- topology.c
|-- transmeta.c
|-- umc.c
`-- vmware.c

4 directories, 61 files


---
From: Borislav Petkov 
Date: Sun, 18 Nov 2018 15:15:05 +0100
Subject: [PATCH] x86/mce: Streamline MCE subsystem's naming

Rename the containing folder to "mce" which is the most widespread name.
Drop the "mce[-_]" filename prefix of some compilation units (while
others don't have it).

This unifies the file naming in the MCE subsystem:

mce/
|-- amd.c
|-- apei.c
|-- core.c
|-- dev-mcelog.c
|-- genpool.c
|-- inject.c
|-- intel.c
|-- internal.h
|-- Makefile
|-- p5.c
|-- severity.c
|-- therm_throt.c
|-- threshold.c
`-- winchip.c

No functional changes.

Signed-off-by: Borislav Petkov 
---
 arch/x86/kernel/cpu/Makefile   |  2 +-
 arch/x86/kernel/cpu/{mcheck => mce}/Makefile   | 10 +-
 arch/x86/kernel/cpu/{mcheck/mce_amd.c => mce/amd.c}|  2 +-
 arch/x86/kernel/cpu/{mcheck/mce-apei.c => mce/apei.c}  |  2 +-
 arch/x86/kernel/cpu/{mcheck/mce.c => mce/core.c}   |  2 +-
 arch/x86/kernel/cpu/{mcheck => mce}/dev-mcelog.c   |  2 +-
 .../kernel/cpu/{mcheck/mce-genpool.c => mce/genpool.c} |  2 +-
 .../kernel/cpu/{mcheck/mce-inject.c => mce/inject.c}   |  2 +-
 .../x86/kernel/cpu/{mcheck/mce_intel.c => mce/intel.c} |  2 +-
 .../cpu/{mcheck/mce-internal.h => mce/internal.h}  |  0
 arch/x86/kernel/cpu/{mcheck => mce}/p5.c   |  0
 .../cpu/{mcheck/mce-severity.c => mce/severity.c}  |  2 +-
 arch/x86/kernel/cpu/{mcheck => mce}/therm_throt.c  |  0
 arch/x86/kernel/cpu/{mcheck => mce}/threshold.c|  0
 arch/x86/kernel/cpu/{mcheck => mce}/winchip.c  |  0
 15 files changed, 14 insertions(+), 14 deletions(-)
 rename arch/x86/kernel/cpu/{mcheck => mce}/Makefile (52%)
 rename arch/x86/kernel/cpu/{mcheck/mce_amd.c => mce/amd.c} (99%)
 rename arch/x86/kernel/cpu/{mcheck/mce-apei.c => mce/apei.c} (99%)
 rename arch/x86/kernel/cpu/{mcheck/mce.c => mce/core.c} (99%)
 rename arch/x86/kernel/cpu/{mcheck => mce}/dev-mcelog.c (99%)
 rename arch/x86/kernel/cpu/{mcheck/mce-genpool.c => mce/genpool.c} (99%)
 rename arch/x86/kernel/cpu/{mcheck/mce-inject.c => mce/inject.c} (99%)
 rename arch/x86/kernel/cpu/{mcheck/mce_intel.c => mce/intel.c} (99%)
 rename arch/x86/kernel/cpu/{mcheck/mce-internal.h => mce/internal.h} (100%)
 rename arch/x86/kernel/cpu/{mcheck => mce}/p5.c (100%)
 rename arch/x86/kernel/cpu/{mcheck/mce-severity.c => mce/severity.c} (99%)
 rename arch/x86/kernel/cpu/{mcheck => mce}/therm_throt.c (100%)
 rename arch/x86/kernel/cpu/{mcheck => mce}/threshold.c (100%)
 rename arch/x86/kernel/cpu/{mcheck => mce}/winchip.c (100%)

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 1f5d2291c31e..43afe707c6fb 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -40,7 +40,7 @@ obj-$(CONFIG_INTEL_RDT)   += intel_rdt.o 
intel_rdt_rdtgro

Re: [PATCH v9 01/13] x86/resctrl: Rename and move rdt files to new directory

2018-11-23 Thread Borislav Petkov

On Fri, Nov 23, 2018 at 08:28:39AM +0100, Ingo Molnar wrote:
> Ugh, violent NAK on this unreadable directory naming: 'resctrl' is an 
> ugly double/triple abbreviation that nobody recognizes for what it is to 
> begin with, and even the long form 'resource control' is an overly 
> generic naming - *everything* the kernel does is in essence 'resource 
> control' ...

Well, the fs this thing uses is called "resctrl".

Documentation/x86/resctrl_ui.txt:1075:the resctrl will still mount but cannot 
create CTRL_MON directories.
Documentation/x86/resctrl_ui.txt:1082:# mount -t resctrl resctrl /sys/fs/resctrl
Documentation/x86/resctrl_ui.txt:1083:# cd /sys/fs/resctrl

Are you saying that the fs should be renamed now too?

> So please find some better name and standardize the namespace around it. 
> A couple of suggestions:
> 
>  -'Hardware Quality of Service', i.e. HW_QOS, hw_qos
>  - or 'CPU bandwidth control',   i.e. CPU_BW, cpu_bw
>  - or 'Hardware Bandwidth Control',  i.e. HW_BW,  hw_bw

How are those *abbreviations* better? "hw_bw" is especially cryptic and
the others are no better.

"resctrl" to mean "resource control" is much better IMO. And it is
different from the "other" resource controlling the kernel does because
it is under arch/x86/kernel/cpu/ which tells you it is a *CPU* resource
control.

And also matches the user-visible "resctrl" filesystem.

But I don't have the energy to bikeshed this morning so whatever, as
long as it is short...

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v9 08/13] x86/resctrl: Rename config parameter INTEL_RDT to RESCTRL

2018-11-22 Thread Borislav Petkov

On Wed, Nov 21, 2018 at 08:28:39PM +, Moger, Babu wrote:
> Resource control feature is supported by both Intel and AMD.
> So, rename the INTEL_RDT to vendor neutral RESCTRL.
> 
> Now CONFIG_RESCTRL will be used for both Intel and AMD to enable
> Resource Control support. Update the texts in config and condition
> accordingly.
> 
> Signed-off-by: Babu Moger 
> ---
>  arch/x86/Kconfig | 23 ---
>  arch/x86/include/asm/resctrl_sched.h |  4 ++--
>  arch/x86/kernel/cpu/Makefile |  2 +-
>  arch/x86/kernel/cpu/resctrl/Makefile |  4 ++--
>  include/linux/sched.h|  2 +-
>  5 files changed, 22 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 1a0be022f91d..36aad997caf8 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -445,16 +445,25 @@ config RETPOLINE
> code are eliminated. Since this includes the syscall entry path,
> it is not entirely pointless.
>  
> -config INTEL_RDT
> - bool "Intel Resource Director Technology support"
> +config RESCTRL
> + bool "Resource Control support"
>   default n
> - depends on X86 && CPU_SUP_INTEL
> + depends on X86 && (CPU_SUP_INTEL || CPU_SUP_AMD)
>   select KERNFS
>   help
> -   Select to enable resource allocation and monitoring which are
> -   sub-features of Intel Resource Director Technology(RDT). More
> -   information about RDT can be found in the Intel x86
> -   Architecture Software Developer Manual.
> +   Select to enable Resource Control feature support.
> +
> +   These features are intended to provide the support for the allocation
> +   and monitoring of the usage of certain system resources by one or more
> +   processors.
> +
> +   Intel refers to this feature as Intel Resource Director Technology
> +   (Intel(R) RDT). More information about RDT can be found in the
> +   Intel x86 Architecture Software Developer Manual.
> +
> +   AMD refers to this feature as AMD Platform Quality of Service(AMD 
> QoS).
> +   More information about AMD QoS can be found in AMD64 Technology
> +   Platform Quality of Service Extensions manual.

I've simplified this to:

  Enable Resource Control support.

  Provide support for the allocation and monitoring of system resources 
  usage by the CPU.

  Intel calls this Intel Resource Director Technology
  (Intel(R) RDT). More information about RDT can be found in the
  Intel x86 Architecture Software Developer Manual.

  AMD calls this AMD Platform Quality of Service (AMD QoS).
  More information about AMD QoS can be found in the AMD64 Technology
  Platform Quality of Service Extensions manual.

  Say N if unsure.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v9 01/13] x86/resctrl: Rename and move rdt files to new directory

2018-11-22 Thread Borislav Petkov

On Wed, Nov 21, 2018 at 08:28:25PM +, Moger, Babu wrote:
> New generation of AMD processors start supporting RDT(or QOS)
> features. Together these features will be called as RESCTRL.
> With more than one vendors supporting these features, it seems
> more appropriate to rename these files.
> 
> Create a new directory with the name 'resctrl' and move all the
> intel_rdt files to the new directory. This way all the resctrl
> related code resides inside one directory.
> 
> Suggested-by: Borislav Petkov 
> Signed-off-by: Babu Moger 
> ---

...

> diff --git a/arch/x86/kernel/cpu/resctrl/Makefile 
> b/arch/x86/kernel/cpu/resctrl/Makefile
> new file mode 100644
> index ..04c9fd67fb3e
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/resctrl/Makefile
> @@ -0,0 +1,7 @@
> +#
> +# Makefile for resource control feature code
> +#

I've replaced that comment with

# SPDX-License-Identifier: GPL-2.0

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v8 11/13] arch/resctrl: Introduce QOS feature for AMD

2018-11-20 Thread Borislav Petkov

On Fri, Nov 16, 2018 at 08:54:43PM +, Moger, Babu wrote:
> Enables QOS feature on AMD.

>From Documentation/process/submitting-patches.rst:

"Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change
its behaviour."

> Following QoS sub-features are supported in AMD if the underlying
> hardware supports it.
>  - L3 Cache allocation enforcement
>  - L3 Cache occupancy monitoring
>  - L3 Code-Data Prioritization support
>  - Memory Bandwidth Enforcement(Allocation)
> 
> The specification for this feature is available at
> https://developer.amd.com/wp-content/resources/56375.pdf

I hope that URL is stable.

> There are differences in the way some of the features are implemented.
> Separate those functions and add those as vendor specific functions.
> The major difference is in MBA feature.
>  - AMD uses CPUID leaf 0x8020 to initialize the MBA features.
>  - AMD uses direct bandwidth value instead of delay based on bandwidth
>values.
>  - MSR register base addresses are different for MBA.
>  - Also AMD allows non-contiguous L3 cache bit masks.
> 
> Adds following functions to take care of the differences.
> rdt_get_mem_config_amd : MBA initialization function
> parse_bw_amd : Bandwidth parsing
> mba_wrmsr_amd: Writes bandwidth value
> cbm_validate_amd : L3 cache bitmask validation

This paragraph is not needed - what you do is visible in the patch
itself.

> Signed-off-by: Babu Moger 
> ---
>  arch/x86/kernel/cpu/resctrl.c | 68 +-
>  arch/x86/kernel/cpu/resctrl.h |  5 ++
>  arch/x86/kernel/cpu/resctrl_ctrlmondata.c | 70 +++
>  3 files changed, 141 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl.c b/arch/x86/kernel/cpu/resctrl.c
> index 3f26c7c114e7..0d700ab7fcf9 100644
> --- a/arch/x86/kernel/cpu/resctrl.c
> +++ b/arch/x86/kernel/cpu/resctrl.c
> @@ -61,6 +61,9 @@ mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
>   struct rdt_resource *r);
>  static void
>  cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r);
> +static void
> +mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m,
> +   struct rdt_resource *r);
>  
>  #define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].domains)
>  
> @@ -280,6 +283,31 @@ static bool rdt_get_mem_config(struct rdt_resource *r)
>   return true;
>  }
>  
> +static bool rdt_get_mem_config_amd(struct rdt_resource *r)
> +{
> + union cpuid_0x10_3_eax eax;
> + union cpuid_0x10_x_edx edx;
> + u32 ebx, ecx;
> +
> + cpuid_count(0x8020, 1, &eax.full, &ebx, &ecx, &edx.full);
> + r->num_closid = edx.split.cos_max + 1;
> + r->default_ctrl = MAX_MBA_BW_AMD;
> +
> + /* AMD does not use delay. Set delay_linear to false by default */

You don't need to write in the comment *what* you do - that's obvious.
"AMD does not use delay" is more than enough.

> + r->membw.delay_linear = false;
> +
> + /* FIX ME - May need to be read from MSR */

FIX ME?

> + r->membw.min_bw = 0;
> + r->membw.bw_gran = 1;
> + /* Max value is 2048, Data width should be 4 in decimal */
> + r->data_width = 4;
> +
> + r->alloc_capable = true;
> + r->alloc_enabled = true;
> +
> + return true;
> +}
> +
>  static void rdt_get_cache_alloc_cfg(int idx, struct rdt_resource *r)
>  {
>   union cpuid_0x10_1_eax eax;
> @@ -339,6 +367,16 @@ static int get_cache_id(int cpu, int level)
>   return -1;
>  }
>  
> +static void
> +mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m, struct rdt_resource 
> *r)
> +{
> + unsigned int i;
> +
> + /*  Write the bw values for mba. */

That's an obvious comment. Drop it.

> + for (i = m->low; i < m->high; i++)
> + wrmsrl(r->msr_base + i, d->ctrl_val[i]);
> +}
> +
>  /*
>   * Map the memory b/w percentage value to delay values
>   * that can be written to QOS_MSRs.
> @@ -792,8 +830,12 @@ static bool __init rdt_cpu_has(int flag)
>  
>  static __init bool rdt_mba_config(void)
>  {
> - if (rdt_cpu_has(X86_FEATURE_MBA))
> - return rdt_get_mem_config(&rdt_resources_all[RDT_RESOURCE_MBA]);
> + if (rdt_cpu_has(X86_FEATURE_MBA)) {

Save yourself an indentation level:

if (!rdt_cpu_has(X86_FEATURE_MBA))
return false;

if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
...


> + return 
> rdt_get_mem_config(&rdt_resources_all[RDT_RESOURCE_MBA]);
> + else if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
> + return 
> rdt_get_mem_config_amd(&rdt_resources_all[RDT_RESOURCE_MBA]);
> + }
>  
>   return false;
>  }

...

> diff --git a/arch/x86/kernel/cpu/resctrl.h b/arch/x86/kernel/cpu/resctrl.h
> index 102bcffbefd7..54ba21b7de2c 100644
> --- a/arch/x86/kernel/cpu/resctrl.h
>

Re: [PATCH v8 06/13] arch/resctrl: Initialize the resource functions that are different

2018-11-20 Thread Borislav Petkov

On Tue, Nov 20, 2018 at 10:59:18AM +0100, Borislav Petkov wrote:
> So I'm wondering: instead of having mba_wrmsr_intel() and
> mba_wrmsr_amd() and adding those per-vendor initialization functions,
> why don't you push down the vendor differentiation into mba_wrmsr()?
> 
> Then in that function you do
> 
>   if (vendor == X86_VENDOR_INTEL)
>   __mba_wrmsr_intel();
>   else if (vendor == X86_VENDOR_AMD)
>   __mba_wrmsr_amd();
> 
> and so on and then you don't have to do any of that initialization dance
> here and the struct rdt_resource assignment for the MBA will remain
> nicely similar to the other ones...
> 
> Hmmm?

Yeah, after having look at the patchset further, that might not be a
good idea as you need to assign more per-vendor stuff than just an MSR
accessor function...

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v8 06/13] arch/resctrl: Initialize the resource functions that are different

2018-11-20 Thread Borislav Petkov

On Fri, Nov 16, 2018 at 08:54:32PM +, Moger, Babu wrote:
> Initialize the resource functions that are different between the
> vendors. Some features are initialized differently between the vendors.
> Add _intel suffix to Intel specific functions.
> 
> For example, MBA feature varies significantly between Intel and AMD.
> Separate the initialization of these resource functions. That way we
> can easily add AMD's functions later.
> 
> Signed-off-by: Babu Moger 
> ---
>  arch/x86/kernel/cpu/resctrl.c | 34 +++
>  arch/x86/kernel/cpu/resctrl.h |  8 --
>  arch/x86/kernel/cpu/resctrl_ctrlmondata.c |  4 +--
>  3 files changed, 37 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl.c b/arch/x86/kernel/cpu/resctrl.c
> index 18c8222f326c..eeb7e0e4883e 100644
> --- a/arch/x86/kernel/cpu/resctrl.c
> +++ b/arch/x86/kernel/cpu/resctrl.c
> @@ -57,7 +57,8 @@ int max_name_width, max_data_width;
>  bool rdt_alloc_capable;
>  
>  static void
> -mba_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r);
> +mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
> + struct rdt_resource *r);
>  static void
>  cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r);
>  
> @@ -171,10 +172,7 @@ struct rdt_resource rdt_resources_all[] = {
>   .rid= RDT_RESOURCE_MBA,
>   .name   = "MB",
>   .domains= domain_init(RDT_RESOURCE_MBA),
> - .msr_base   = IA32_MBA_THRTL_BASE,
> - .msr_update = mba_wrmsr,
>   .cache_level= 3,
> - .parse_ctrlval  = parse_bw,
>   .format_str = "%d=%*u",
>   .fflags = RFTYPE_RES_MB,
>   },
> @@ -356,7 +354,8 @@ u32 delay_bw_map(unsigned long bw, struct rdt_resource *r)
>  }
>  
>  static void
> -mba_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
> +mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
> + struct rdt_resource *r)
>  {
>   unsigned int i;
>  
> @@ -873,6 +872,25 @@ static __init bool get_rdt_resources(void)
>   return (rdt_mon_capable || rdt_alloc_capable);
>  }
>  
> +static __init void rdt_init_res_defs_intel(void)
> +{
> + struct rdt_resource *r;
> +
> + for_each_rdt_resource(r) {
> + if (r->rid == RDT_RESOURCE_MBA) {
> + r->msr_base = IA32_MBA_THRTL_BASE;
> + r->msr_update = mba_wrmsr_intel;
> + r->parse_ctrlval = parse_bw_intel;
> + }
> + }
> +}
> +
> +static __init void rdt_init_res_defs(void)
> +{
> + if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
> + rdt_init_res_defs_intel();
> +}

So I'm wondering: instead of having mba_wrmsr_intel() and
mba_wrmsr_amd() and adding those per-vendor initialization functions,
why don't you push down the vendor differentiation into mba_wrmsr()?

Then in that function you do

if (vendor == X86_VENDOR_INTEL)
__mba_wrmsr_intel();
else if (vendor == X86_VENDOR_AMD)
__mba_wrmsr_amd();

and so on and then you don't have to do any of that initialization dance
here and the struct rdt_resource assignment for the MBA will remain
nicely similar to the other ones...

Hmmm?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v8 05/13] arch/resctrl: Rename config parameter INTEL_RDT to RESCTRL

2018-11-20 Thread Borislav Petkov

On Fri, Nov 16, 2018 at 08:54:30PM +, Moger, Babu wrote:
> Now Resource control feature is supported by both Intel and AMD.
> Rename the INTEL_RDT to vendor neutral RESCTRL.
> 
> Signed-off-by: Babu Moger 
> ---
>  arch/x86/Kconfig | 4 ++--
>  arch/x86/include/asm/resctrl_sched.h | 4 ++--
>  arch/x86/kernel/cpu/Makefile | 4 ++--
>  include/linux/sched.h| 2 +-
>  4 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 1a0be022f91d..4aae7aba4d61 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -445,8 +445,8 @@ config RETPOLINE
> code are eliminated. Since this includes the syscall entry path,
> it is not entirely pointless.
>  
> -config INTEL_RDT
> - bool "Intel Resource Director Technology support"
> +config RESCTRL
> + bool "Resource Control feature support"

s/feature // - they're all features. :)

>   default n
>   depends on X86 && CPU_SUP_INTEL

Patch 9/13 is begging to be merged with this one.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v8 04/13] arch/resctrl: Bring all the macros to resctrl.h

2018-11-20 Thread Borislav Petkov

On Fri, Nov 16, 2018 at 08:54:28PM +, Moger, Babu wrote:
> diff --git a/arch/x86/kernel/cpu/resctrl_monitor.c 
> b/arch/x86/kernel/cpu/resctrl_monitor.c
> index 6d654f7b0eba..9fd62263dabd 100644
> --- a/arch/x86/kernel/cpu/resctrl_monitor.c
> +++ b/arch/x86/kernel/cpu/resctrl_monitor.c
> @@ -28,9 +28,6 @@
>  #include 
>  #include "resctrl.h"
>  
> -#define MSR_IA32_QM_CTR  0x0c8e
> -#define MSR_IA32_QM_EVTSEL   0x0c8d
> -
>  struct rmid_entry {
>   u32 rmid;
>   int busy;
> @@ -97,8 +94,8 @@ static u64 __rmid_read(u32 rmid, u32 eventid)
>* IA32_QM_CTR.Error (bit 63) and IA32_QM_CTR.Unavailable (bit 62)
>* are error bits.
>*/
> - wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
> - rdmsrl(MSR_IA32_QM_CTR, val);
> + wrmsr(IA32_QM_EVTSEL, eventid, rmid);
> + rdmsrl(IA32_QM_CTR, val);

Well, if you have a look at arch/x86/include/asm/msr-index.h, you'll see
that all MSR defines are prefixed with "MSR_" so dropping that prefix
here loses information and staring at the define doesn't tell me what it
is.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v8 03/13] arch/resctrl: Re-arrange RDT init code

2018-11-20 Thread Borislav Petkov

On Fri, Nov 16, 2018 at 08:54:26PM +, Moger, Babu wrote:
> Separate the call sequence for rdt_quirks and MBA feature.
> This is in preparation to handle vendor differences in these
> call sequences.
> 
> Signed-off-by: Babu Moger 
> ---
>  arch/x86/kernel/cpu/resctrl.c | 27 +--
>  1 file changed, 21 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl.c b/arch/x86/kernel/cpu/resctrl.c
> index 5d526dc45751..4cea275c7c57 100644
> --- a/arch/x86/kernel/cpu/resctrl.c
> +++ b/arch/x86/kernel/cpu/resctrl.c
> @@ -794,6 +794,14 @@ static bool __init rdt_cpu_has(int flag)
>   return ret;
>  }

Just nitpicks:

> +static __init bool rdt_mba_config(void)

That function doesn't have a verb in its name but it needs to have one
stating what it does. You could do

mv rdt_get_mem_config() -> __rdt_get_mem_config()
mv rdt_mba_config() -> rdt_get_mem_config()

to have the hierarchy of what calls what. And then the AMD variant will
be called __rdt_get_mem_config_amd().

Also, those are all static functions so you can just as well drop the
"rdt" prefix, I'd say.

> +{
> + if (rdt_cpu_has(X86_FEATURE_MBA))
> + return rdt_get_mem_config(&rdt_resources_all[RDT_RESOURCE_MBA]);
> +
> + return false;
> +}
> +
>  static __init bool get_rdt_alloc_resources(void)
>  {
>   bool ret = false;
> @@ -818,10 +826,9 @@ static __init bool get_rdt_alloc_resources(void)
>   ret = true;
>   }
>  
> - if (rdt_cpu_has(X86_FEATURE_MBA)) {
> - if (rdt_get_mem_config(&rdt_resources_all[RDT_RESOURCE_MBA]))
> - ret = true;
> - }
> + if (rdt_mba_config())
> + ret = true;
> +
>   return ret;
>  }
>  
> @@ -840,7 +847,7 @@ static __init bool get_rdt_mon_resources(void)
>   return !rdt_get_mon_l3_config(&rdt_resources_all[RDT_RESOURCE_L3]);
>  }
>  
> -static __init void rdt_quirks(void)
> +static __init void rdt_quirks_intel(void)
>  {
>   switch (boot_cpu_data.x86_model) {
>   case INTEL_FAM6_HASWELL_X:
> @@ -855,9 +862,14 @@ static __init void rdt_quirks(void)
>   }
>  }
>  
> +static __init void rdt_quirks(void)

Those functions also need to have a verb in the name stating what they
do.

> +{
> + if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
> + rdt_quirks_intel();
> +}
> +
>  static __init bool get_rdt_resources(void)
>  {
> - rdt_quirks();
>   rdt_alloc_capable = get_rdt_alloc_resources();
>   rdt_mon_capable = get_rdt_mon_resources();
>  
> @@ -871,6 +883,9 @@ static int __init resctrl_late_init(void)
>   struct rdt_resource *r;
>   int state, ret;
>  
> + /* Run quirks first */
> + rdt_quirks();

If the comment wasn't there, seeing "rdt_quirks();" doesn't say much and
makes me go look at what that function does.

> +
>   if (!get_rdt_resources())

Unlike here, where it is clear that this gets the rdt resources.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v8 01/13] arch/resctrl: Start renaming the rdt files to more generic names

2018-11-19 Thread Borislav Petkov

On Mon, Nov 19, 2018 at 08:11:36PM +, Moger, Babu wrote:
> Changed core.c and internel.h to res.c and res.h respectively. Both of
> these files are about resource definitions and initialization.

I guess but the core.c thing we do a lot in the kernel:

$ git ls-files | grep -E "/core\.c" | wc -l
100

which denotes where the core functionality is located. res.c
doesn't tell me that. res.h doesn't tell me either that it is an
resctrl-internal header...

IOW, calling it core.c and internal.h gives you an additional
information what those files contain. And I think that's useful.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v8 01/13] arch/resctrl: Start renaming the rdt files to more generic names

2018-11-18 Thread Borislav Petkov

Just to state it here on the thread: prefix should be "x86/resctrl" as
we said earlier. But that will be addressed in the next iteration, as we
agreed offlist.

On Fri, Nov 16, 2018 at 08:54:22PM +, Moger, Babu wrote:
> New generation of AMD processors start supporting RDT(or QOS) features.
> With more than one vendors supporting these features, it seems more
> appropriate to rename these files.
> 
> Changed intel_rdt to resctrl where applicable.
> 
> Signed-off-by: Babu Moger 
> Reviewed-by: Borislav Petkov 
> ---
>  arch/x86/include/asm/{intel_rdt_sched.h => resctrl_sched.h} | 0
>  arch/x86/kernel/cpu/Makefile| 6 +++---
>  arch/x86/kernel/cpu/{intel_rdt.c => resctrl.c}  | 4 ++--
>  arch/x86/kernel/cpu/{intel_rdt.h => resctrl.h}  | 6 +++---
>  .../cpu/{intel_rdt_ctrlmondata.c => resctrl_ctrlmondata.c}  | 2 +-
>  .../kernel/cpu/{intel_rdt_monitor.c => resctrl_monitor.c}   | 2 +-
>  .../cpu/{intel_rdt_pseudo_lock.c => resctrl_pseudo_lock.c}  | 6 +++---
>  ..._rdt_pseudo_lock_event.h => resctrl_pseudo_lock_event.h} | 2 +-
>  .../kernel/cpu/{intel_rdt_rdtgroup.c => resctrl_rdtgroup.c} | 4 ++--
>  arch/x86/kernel/process_32.c| 2 +-
>  arch/x86/kernel/process_64.c| 2 +-
>  11 files changed, 18 insertions(+), 18 deletions(-)
>  rename arch/x86/include/asm/{intel_rdt_sched.h => resctrl_sched.h} (100%)
>  rename arch/x86/kernel/cpu/{intel_rdt.c => resctrl.c} (99%)
>  rename arch/x86/kernel/cpu/{intel_rdt.h => resctrl.h} (99%)
>  rename arch/x86/kernel/cpu/{intel_rdt_ctrlmondata.c => 
> resctrl_ctrlmondata.c} (99%)
>  rename arch/x86/kernel/cpu/{intel_rdt_monitor.c => resctrl_monitor.c} (99%)
>  rename arch/x86/kernel/cpu/{intel_rdt_pseudo_lock.c => 
> resctrl_pseudo_lock.c} (99%)
>  rename arch/x86/kernel/cpu/{intel_rdt_pseudo_lock_event.h => 
> resctrl_pseudo_lock_event.h} (95%)
>  rename arch/x86/kernel/cpu/{intel_rdt_rdtgroup.c => resctrl_rdtgroup.c} (99%)

Ok, looking at this more, it looks like this:

ls -l arch/x86/kernel/cpu/resctrl*
 arch/x86/kernel/cpu/resctrl.c
 arch/x86/kernel/cpu/resctrl_ctrlmondata.c
 arch/x86/kernel/cpu/resctrl.h
 arch/x86/kernel/cpu/resctrl_monitor.c
 arch/x86/kernel/cpu/resctrl_pseudo_lock.c
 arch/x86/kernel/cpu/resctrl_pseudo_lock_event.h
 arch/x86/kernel/cpu/resctrl_rdtgroup.c

Now, it looks to me like the whole resctrl thing begs for its own dir:

 arch/x86/kernel/cpu/resctrl/core.c
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c
 arch/x86/kernel/cpu/resctrl/internal.h
 arch/x86/kernel/cpu/resctrl/monitor.c
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c
 arch/x86/kernel/cpu/resctrl/pseudo_lock_event.h
 arch/x86/kernel/cpu/resctrl/rdtgroup.c

This way, the whole RDT machinery will be confined to its own directory.

Hmm.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO

2018-11-15 Thread Borislav Petkov

On Thu, Nov 15, 2018 at 01:01:17PM +0100, David Hildenbrand wrote:
> Just saying that "I'm not the first to do it, don't hit me with a stick" :)

:-)

> Indeed. And we still have without makedumpfile. I think you are aware of
> this, but I'll explain it just for consistency: PG_hwpoison

No, I appreciate an explanation very much! So thanks for that. :)

> At some point we detect a HW error and mask a page as PG_hwpoison.
> 
> makedumpfile knows how to treat that flag and can exclude it from the
> dump (== not access it). No crash.
> 
> kdump itself has no clue about old "struct pages". Especially:
> a) Where they are located in memory (e.g. SPARSE)
> b) What their format is ("where are the flags")
> c) What the meaning of flags is ("what does bit X mean")
> 
> In order to know such information, we would have to do parsing of quite
> some information inside the kernel in kdump. Basically what makedumpfile
> does just now. Is this feasible? I don't think so.
> 
> So we would need another approach to communicate such information as you
> said. I can't think of any, but if anybody reading this has an idea,
> please speak up. I am interested.

Yeah but that ship has sailed. And even if we had a great idea, we'd
have to support kdump before and after the idea. And that would be a
serious mess.

And if you have a huge box with gazillion piles of memory and an alpha
particle passes through a bunch of them on its way down to the earth's
core, and while doing so, flips a bunch of bits, you need to go and
collect all those regions and update some list which you then need to
shove into the second kernel.

And you probably need to do all that through perhaps a piece of memory
which is used for communication between first and second kernel and that
list better fit in there, or you need to realloc. And that piece of
memory's layout needs to be properly defined so that the second kernel
can parse it correctly.

And so on...

> The *only* way right now we would have to handle such scenarios:
> 
> 1. While dumping memory and we get a machine check, fake reading a zero
> page instead of crashing.
> 2. While dumping memory and we get a fault, fake reading a zero page
> instead of crashing.

Yap.

> Indeed, and the basic design is to export these flags. (let's say
> "unfortunately", being able to handle such stuff in kdump directly would
> be the dream).

Well, AFAICT, the minimum work you need to always do before starting the
dumping is somehow generate that list of pages or ranges to not dump.
And that work needs to be done by the first or the second kernel, I'd
say.

If the first kernel would do it, then you'd have to probably have
callbacks to certain operations which go and add ranges or pages to
exclude, to a list which is then readily accessible to the second
kernel. Which means, when you reserve memory for the second kernel,
you'd have to reserve memory also for such a list.

But then what do you do when that memory gets filled up...?

So I guess exporting those things in vmcoreinfo is probably the only
thing we *can* do in the end.

Oh well, enough rambling... :)

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO

2018-11-15 Thread Borislav Petkov

On Thu, Nov 15, 2018 at 01:11:02PM +0100, Michal Hocko wrote:
> I am not familiar with kexec to judge this particular patch but we
> cannot simply define any range for these pages (same as for hwpoison
> ones) because they can be almost anywhere in the available memory range.
> Then there can be countless of them. There is no other way to rule them
> out but to check the page state.

I guess, especially if it is a monster box with a lot of memory in it.

> I am not really sure what is the concern here exactly. Kdump is so
> closly tight to the specific kernel version that the api exported
> specifically for its purpose cannot be seriously considered an ABI.
> Kdump has to adopt all the time.

Right...

Except, when people start ogling vmcoreinfo for other things and start
exporting all kinds of kernel internals in there, my alarm bells start
ringing.

But ok, kdump *is* special and I guess that's fine.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO

2018-11-15 Thread Borislav Petkov

On Thu, Nov 15, 2018 at 12:20:40PM +0100, David Hildenbrand wrote:
> Sorry to say, but that is the current practice without which
> makedumpfile would not be able to work at all. (exclude user pages,
> exclude page cache, exclude buddy pages). Let's not reinvent the wheel
> here. This is how dumping works forever.

Sorry, but "we've always done this in the past" doesn't make it better.

> I don't see how there should be "set of pages which do not have
> PG_offline".

It doesn't have to be a set of pages. Think a (mmconfig perhaps) region
which the kdump kernel should completely skip because poking in it in
the kdump kernel, causes all kinds of havoc like machine checks. etc.
We've had and still have one issue like that.

But let me clarify my note: I don't want to be discussing with you the
design of makedumpfile and how it should or should not work - that ship
has already sailed. Apparently there are valid reasons to do it this
way.

I was *simply* stating that it feels wrong to export mm flags like that.

But as I said already, that is mm guys' call and looking at how we're
already exporting a bunch of stuff in the vmcoreinfo - including other
mm flags - I guess one more flag doesn't matter anymore.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH RFC 3/6] kexec: export PG_offline to VMCOREINFO

2018-11-15 Thread Borislav Petkov

On Thu, Nov 15, 2018 at 02:19:23PM +0800, Dave Young wrote:
> It would be good to copy some background info from cover letter to the
> patch description so that we can get better understanding why this is
> needed now.
> 
> BTW, Lianbo is working on a documentation of the vmcoreinfo exported
> fields. Ccing him so that he is aware of this.
> 
> Also cc Boris,  although I do not want the doc changes blocks this
> he might have different opinion :)

Yeah, my initial reaction is that exporting an mm-internal flag to
userspace is a no-no.

What would be better, IMHO, is having a general method of telling the
kdump kernel - maybe ranges of physical addresses - which to skip.

Because the moment there's a set of pages which do not have PG_offline
set but kdump would still like to skip, this breaks.

But that's mm guys' call.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v5 06/27] x86/cet: Control protection exception handler

2018-11-14 Thread Borislav Petkov

On Wed, Nov 14, 2018 at 12:19:42PM -0800, Yu-cheng Yu wrote:
> Yes, I was not sure if the addition should follow the existing style (which 
> does
> not have identifier names).  What do you think is right?

Yeah, we've converted them all now to named params:

https://git.kernel.org/tip/8e1599fcac2efda8b7d433ef69d2492f0b351e3f

It probably would be easier if you redo your patchset ontop of current
tip/master.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v7 03/13] arch/x86: Re-arrange RDT init code

2018-11-14 Thread Borislav Petkov

On Fri, Nov 09, 2018 at 08:52:29PM +, Moger, Babu wrote:
> Separate the call sequence for rdt_quirks and MBA feature.
> This is in preparation to handle vendor differences in these
> call sequences.
> 
> Signed-off-by: Babu Moger 
> ---
>  arch/x86/kernel/cpu/resctrl.c | 29 +++--
>  1 file changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl.c b/arch/x86/kernel/cpu/resctrl.c
> index 5d526dc45751..fd5320dc 100644
> --- a/arch/x86/kernel/cpu/resctrl.c
> +++ b/arch/x86/kernel/cpu/resctrl.c
> @@ -794,6 +794,16 @@ static bool __init rdt_cpu_has(int flag)
>   return ret;
>  }
>  
> +static __init bool rdt_mba_config(void)
> +{
> + if (rdt_cpu_has(X86_FEATURE_MBA)) {
> + if (rdt_get_mem_config(&rdt_resources_all[RDT_RESOURCE_MBA]))
> + return true;

Or simpler:

if (rdt_cpu_has(X86_FEATURE_MBA))
return rdt_get_mem_config(&rdt_resources_all[RDT_RESOURCE_MBA]);

return false;

as rdt_get_mem_config() already returns bool.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v5 06/27] x86/cet: Control protection exception handler

2018-11-14 Thread Borislav Petkov

That subject needs a verb:

Subject: [PATCH v5 06/27] x86/cet: Add control protection exception handler

On Thu, Oct 11, 2018 at 08:15:02AM -0700, Yu-cheng Yu wrote:
> A control protection exception is triggered when a control flow transfer
> attempt violated shadow stack or indirect branch tracking constraints.
> For example, the return address for a RET instruction differs from the
> safe copy on the shadow stack; or a JMP instruction arrives at a non-
> ENDBR instruction.
> 
> The control protection exception handler works in a similar way as the
> general protection fault handler.
> 
> Signed-off-by: Yu-cheng Yu 
> ---
>  arch/x86/entry/entry_64.S  |  2 +-
>  arch/x86/include/asm/traps.h   |  3 ++
>  arch/x86/kernel/idt.c  |  4 ++
>  arch/x86/kernel/signal_compat.c|  2 +-
>  arch/x86/kernel/traps.c| 64 ++
>  include/uapi/asm-generic/siginfo.h |  3 +-
>  6 files changed, 75 insertions(+), 3 deletions(-)

A *lot* of style problems here. Please use checkpatch and then common
sense to check your patches before sending. All those below are valid,
AFAICT:

WARNING: function definition argument 'struct pt_regs *' should also have an 
identifier name
#76: FILE: arch/x86/include/asm/traps.h:81:
+dotraplinkage void do_control_protection(struct pt_regs *, long);

WARNING: function definition argument 'long' should also have an identifier name
#76: FILE: arch/x86/include/asm/traps.h:81:
+dotraplinkage void do_control_protection(struct pt_regs *, long);

WARNING: static const char * array should probably be static const char * const
#124: FILE: arch/x86/kernel/traps.c:581:
+static const char *control_protection_err[] =

ERROR: that open brace { should be on the previous line
#125: FILE: arch/x86/kernel/traps.c:582:
+static const char *control_protection_err[] =
+{

WARNING: quoted string split across lines
#158: FILE: arch/x86/kernel/traps.c:615:
+   WARN_ONCE(1, "CET is disabled but got control "
+ "protection fault\n");

WARNING: Prefer printk_ratelimited or pr__ratelimited to printk_ratelimit
#165: FILE: arch/x86/kernel/traps.c:622:
+   printk_ratelimit()) {

WARNING: Avoid logging continuation uses where feasible
#176: FILE: arch/x86/kernel/traps.c:633:
+   pr_cont("\n");

ERROR: "(foo*)" should be "(foo *)"
#183: FILE: arch/x86/kernel/traps.c:640:
+   info.si_addr= (void __user*)uprobe_get_trap_addr(regs);


And now that patch doesn't even build anymore because of the siginfo
changes which came in during the merge window. I guess I'll wait for
your v6 patchset.

---
arch/x86/kernel/traps.c: In function ‘do_control_protection’:
arch/x86/kernel/traps.c:627:16: error: passing argument 1 of ‘clear_siginfo’ 
from incompatible pointer type [-Werror=incompatible-pointer-types]
  clear_siginfo(&info);
^
In file included from ./include/linux/sched/signal.h:6,
 from ./include/linux/ptrace.h:7,
 from ./include/linux/ftrace.h:14,
 from ./include/linux/kprobes.h:42,
 from arch/x86/kernel/traps.c:19:
./include/linux/signal.h:20:52: note: expected ‘kernel_siginfo_t *’ {aka 
‘struct kernel_siginfo *’} but argument is of type ‘siginfo_t *’ {aka ‘struct 
siginfo *’}
 static inline void clear_siginfo(kernel_siginfo_t *info)
  ~~^~~~
arch/x86/kernel/traps.c:632:26: error: passing argument 2 of ‘force_sig_info’ 
from incompatible pointer type [-Werror=incompatible-pointer-types]
  force_sig_info(SIGSEGV, &info, tsk);
  ^
In file included from ./include/linux/ptrace.h:7,
 from ./include/linux/ftrace.h:14,
 from ./include/linux/kprobes.h:42,
 from arch/x86/kernel/traps.c:19:
./include/linux/sched/signal.h:327:32: note: expected ‘struct kernel_siginfo *’ 
but argument is of type ‘siginfo_t *’ {aka ‘struct siginfo *’}
 extern int force_sig_info(int, struct kernel_siginfo *, struct task_struct *);
^~~
cc1: some warnings being treated as errors
make[2]: *** [scripts/Makefile.build:291: arch/x86/kernel/traps.o] Error 1
make[1]: *** [scripts/Makefile.build:516: arch/x86/kernel] Error 2
make[1]: *** Waiting for unfinished jobs
make: *** [Makefile:1060: arch/x86] Error 2
make: *** Waiting for unfinished jobs

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v7 01/13] arch/x86: Start renaming the rdt files to more generic names

2018-11-13 Thread Borislav Petkov

On Tue, Nov 13, 2018 at 09:35:40PM +, Yu, Fenghua wrote:
> Is "x86/resctrl" a better subject prefix?

Doh, of course. Diffstat is all arch/x86/.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v5 05/27] Documentation/x86: Add CET description

2018-11-13 Thread Borislav Petkov

On Thu, Oct 11, 2018 at 08:15:01AM -0700, Yu-cheng Yu wrote:
> Explain how CET works and the no_cet_shstk/no_cet_ibt kernel
> parameters.
> 
> Signed-off-by: Yu-cheng Yu 
> ---
>  .../admin-guide/kernel-parameters.txt |   6 +
>  Documentation/index.rst   |   1 +
>  Documentation/x86/index.rst   |  11 +
>  Documentation/x86/intel_cet.rst   | 266 ++
>  4 files changed, 284 insertions(+)
>  create mode 100644 Documentation/x86/index.rst
>  create mode 100644 Documentation/x86/intel_cet.rst

So this patch should probably come first in the series so that a reader
can know what to expect...

> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index 92eb1f42240d..3854423f7c86 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2764,6 +2764,12 @@
>   noexec=on: enable non-executable mappings (default)
>   noexec=off: disable non-executable mappings
>  
> + no_cet_ibt  [X86-64] Disable indirect branch tracking for user-mode
> + applications
> +
> + no_cet_shstk[X86-64] Disable shadow stack support for user-mode
> + applications
> +
>   nosmap  [X86]
>   Disable SMAP (Supervisor Mode Access Prevention)
>   even if it is supported by processor.
> diff --git a/Documentation/index.rst b/Documentation/index.rst
> index 5db7e87c7cb1..1cdc139adb40 100644
> --- a/Documentation/index.rst
> +++ b/Documentation/index.rst

Please integrate scripts/checkpatch.pl into your patch creation
workflow. Some of the warnings/errors *actually* make sense:

WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
#76: FILE: Documentation/x86/index.rst:1:
+===

WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
#93: FILE: Documentation/x86/intel_cet.rst:1:
+=

> @@ -104,6 +104,7 @@ implementation.
> :maxdepth: 2
>  
> sh/index
> +   x86/index
>  
>  Filesystem Documentation
>  
> diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
> new file mode 100644
> index ..9c34d8cbc8f0
> --- /dev/null
> +++ b/Documentation/x86/index.rst
> @@ -0,0 +1,11 @@
> +===
> +X86 Documentation
> +===
> +
> +Control Flow Enforcement
> +
> +
> +.. toctree::
> +   :maxdepth: 1
> +
> +   intel_cet
> diff --git a/Documentation/x86/intel_cet.rst b/Documentation/x86/intel_cet.rst
> new file mode 100644
> index ..946f4802a51f
> --- /dev/null
> +++ b/Documentation/x86/intel_cet.rst
> @@ -0,0 +1,266 @@
> +=
> +Control Flow Enforcement Technology (CET)
> +=
> +
> +[1] Overview
> +
> +
> +Control Flow Enforcement Technology (CET) provides protection against
> +return/jump-oriented programming (ROP) attacks.  It can be implemented
> +to protect both the kernel and applications.  In the first phase,
> +only the user-mode protection is implemented on the 64-bit kernel.

s/the//  is implemented in 64-bit mode.

> +However, 32-bit applications are supported under the compatibility
> +mode.

Drop "However":

"32-bit applications are, of course, supported in compatibility mode."

> +
> +CET includes shadow stack (SHSTK) and indirect branch tracking (IBT).

"CET introduces two a shadow stack and an indirect branch tracking mechanism."

> +The SHSTK is a secondary stack allocated from memory.  The processor

s/The//

> +automatically pushes/pops a secure copy to the SHSTK every return
> +address and,

that reads funny - pls reorganize. Also, what is a "secure copy"?

You mean a copy of every return address which software cannot access?

> by comparing the secure copy to the program stack copy,
> +verifies function returns are as intended. 

 ... have not been corrupted/modified."

> The IBT verifies all
> +indirect CALL/JMP targets are intended and marked by the compiler with
> +'ENDBR' op codes.

"opcode" - one word. And before you use "ENDBR" you need to explain it
above what it is.

/me reads further... encounters ENDBR's definition...

ah, ok, so you should say something like

"... and marked by the compiler with the ENDBR opcode (see below)."

> +
> +There are two kernel configuration options:
> +
> +INTEL_X86_SHADOW_STACK_USER, and
> +INTEL_X86_BRANCH_TRACKING_USER.
> +
> +To build a CET-enabled kernel, Binutils v2.31 and GCC v8.1 or later
> +are required.  To build a CET-enabled application, GLIBC v2.28 or
> +later is also required.
> +
> +There are two command-line options for disabling CET features:
> +
> +no_cet_shstk - disables SHSTK, and
> +

Re: [PATCH v7 02/13] arch/x86: Rename the RDT functions and definitions

2018-11-12 Thread Borislav Petkov

On Mon, Nov 12, 2018 at 07:25:02PM +, Moger, Babu wrote:
> >> @@ -637,10 +637,11 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
> >> *
> >> * For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
> >> */
> >> -  intel_cqm_threshold = boot_cpu_data.x86_cache_size * 1024 / r->num_rmid;
> >> +  resctrl_cqm_threshold = boot_cpu_data.x86_cache_size * 1024 /
> >> +  r->num_rmid;
> > 
> > No need to break that line here.
> 
> Without the line break, checkpatch complains for "line over 80
> characters".  Do you think we can ignore those?

To quote from the tip handbook which is in preparation right now:

"+The 80 character rule is not a strict rule, so please use common sense when
 +breaking lines."

In this particular case, it is more readable IMO to leave the line unbroken:

resctrl_cqm_threshold = boot_cpu_data.x86_cache_size * 1024 / 
r->num_rmid;

What is even more readable though is:

unsigned int cl_size = boot_cpu_data.x86_cache_size;

...

resctrl_cqm_threshold = cl_size * 1024 / r->num_rmid;

and now you have a win-win situation. :)

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v7 02/13] arch/x86: Rename the RDT functions and definitions

2018-11-12 Thread Borislav Petkov

On Fri, Nov 09, 2018 at 08:52:27PM +, Moger, Babu wrote:
> As AMD is starting to support RDT(or QOS) features, rename
> the RDT functions and definitions to more generic names.
> 
> Replace intel_rdt to resctrl where applicable.
> 
> Signed-off-by: Babu Moger 
> ---
>  arch/x86/include/asm/resctrl_sched.h   | 24 
>  arch/x86/kernel/cpu/resctrl.c  | 26 +-
>  arch/x86/kernel/cpu/resctrl.h  |  2 +-
>  arch/x86/kernel/cpu/resctrl_monitor.c  | 11 ++-
>  arch/x86/kernel/cpu/resctrl_rdtgroup.c | 10 +-
>  arch/x86/kernel/process_32.c   |  2 +-
>  arch/x86/kernel/process_64.c   |  2 +-
>  7 files changed, 39 insertions(+), 38 deletions(-)

As with patch 1, pls fix subject prefix.

> @@ -637,10 +637,11 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
>*
>* For a 35MB LLC and 56 RMIDs, this is ~1.8% of the LLC.
>*/
> - intel_cqm_threshold = boot_cpu_data.x86_cache_size * 1024 / r->num_rmid;
> + resctrl_cqm_threshold = boot_cpu_data.x86_cache_size * 1024 /
> + r->num_rmid;

No need to break that line here.

With that addressed and FWIW:

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v7 01/13] arch/x86: Start renaming the rdt files to more generic names

2018-11-12 Thread Borislav Petkov



 Subject: [PATCH v7 01/13] arch/x86: Start renaming the rdt files to more 
generic names

I guess the subject prefix for all those should be "arch/resctrl:" or so now.

On Fri, Nov 09, 2018 at 08:52:25PM +, Moger, Babu wrote:
> New generation of AMD processors start supporting RDT(or QOS) features.
> With more than one vendors supporting these features, it seems more
> appropriate to rename these files.
> 
> Changed intel_rdt to resctrl where applicable.
> 
> Signed-off-by: Babu Moger 
> ---
>  arch/x86/include/asm/{intel_rdt_sched.h => resctrl_sched.h} | 0
>  arch/x86/kernel/cpu/Makefile| 6 +++---
>  arch/x86/kernel/cpu/{intel_rdt.c => resctrl.c}  | 4 ++--
>  arch/x86/kernel/cpu/{intel_rdt.h => resctrl.h}  | 6 +++---
>  .../cpu/{intel_rdt_ctrlmondata.c => resctrl_ctrlmondata.c}  | 2 +-
>  .../kernel/cpu/{intel_rdt_monitor.c => resctrl_monitor.c}   | 2 +-
>  .../cpu/{intel_rdt_pseudo_lock.c => resctrl_pseudo_lock.c}  | 6 +++---
>  ..._rdt_pseudo_lock_event.h => resctrl_pseudo_lock_event.h} | 2 +-
>  .../kernel/cpu/{intel_rdt_rdtgroup.c => resctrl_rdtgroup.c} | 4 ++--
>  arch/x86/kernel/process_32.c| 2 +-
>  arch/x86/kernel/process_64.c| 2 +-
>  11 files changed, 18 insertions(+), 18 deletions(-)
>  rename arch/x86/include/asm/{intel_rdt_sched.h => resctrl_sched.h} (100%)
>  rename arch/x86/kernel/cpu/{intel_rdt.c => resctrl.c} (99%)
>  rename arch/x86/kernel/cpu/{intel_rdt.h => resctrl.h} (99%)
>  rename arch/x86/kernel/cpu/{intel_rdt_ctrlmondata.c => 
> resctrl_ctrlmondata.c} (99%)
>  rename arch/x86/kernel/cpu/{intel_rdt_monitor.c => resctrl_monitor.c} (99%)
>  rename arch/x86/kernel/cpu/{intel_rdt_pseudo_lock.c => 
> resctrl_pseudo_lock.c} (99%)
>  rename arch/x86/kernel/cpu/{intel_rdt_pseudo_lock_event.h => 
> resctrl_pseudo_lock_event.h} (95%)
>  rename arch/x86/kernel/cpu/{intel_rdt_rdtgroup.c => resctrl_rdtgroup.c} (99%)

Other than that and FWIW:

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v5 04/27] x86/fpu/xstate: Add XSAVES system states for shadow stack

2018-11-08 Thread Borislav Petkov

On Thu, Nov 08, 2018 at 12:40:02PM -0800, Yu-cheng Yu wrote:
> In fpu_init_system_xstate(), we test and clear features that are not enabled.
> There we depend on the order of these elements.  This is the tenth "unknown
> xstate feature".

Aha, those are *reserved* bits - not unused, in XCR0.

Do an s/unused/reserved/g pls.

Now let's see, you have 0 for the 10th bit which happens to be

#define X86_FEATURE_FPU ( 0*32+ 0) /* Onboard FPU */

too. And if we look at the code:

for (i = 0; i < ARRAY_SIZE(xsave_cpuid_features); i++) {
if (!boot_cpu_has(xsave_cpuid_features[i]))
xfeatures_mask_all &= ~BIT_ULL(i);

guess what happens if i == 10.

I know, the subsequent & SUPPORTED_XFEATURES_MASK saves you from the
#GP but that's still not good enough. The loop should not even call
boot_cpu_has() for reserved feature bits.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v5 04/27] x86/fpu/xstate: Add XSAVES system states for shadow stack

2018-11-08 Thread Borislav Petkov

On Thu, Oct 11, 2018 at 08:15:00AM -0700, Yu-cheng Yu wrote:
> Intel Control-flow Enforcement Technology (CET) introduces the
> following MSRs into the XSAVES system states.
> 
> IA32_U_CET (user-mode CET settings),
> IA32_PL3_SSP (user-mode shadow stack),
> IA32_PL0_SSP (kernel-mode shadow stack),
> IA32_PL1_SSP (ring-1 shadow stack),
> IA32_PL2_SSP (ring-2 shadow stack).

And?

That commit message got chopped off here, it seems.

> Signed-off-by: Yu-cheng Yu 
> ---
>  arch/x86/include/asm/fpu/types.h| 22 +
>  arch/x86/include/asm/fpu/xstate.h   |  4 +++-
>  arch/x86/include/uapi/asm/processor-flags.h |  2 ++
>  arch/x86/kernel/fpu/xstate.c| 10 ++
>  4 files changed, 37 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/fpu/types.h 
> b/arch/x86/include/asm/fpu/types.h
> index 202c53918ecf..e55d51d172f1 100644
> --- a/arch/x86/include/asm/fpu/types.h
> +++ b/arch/x86/include/asm/fpu/types.h
> @@ -114,6 +114,9 @@ enum xfeature {
>   XFEATURE_Hi16_ZMM,
>   XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
>   XFEATURE_PKRU,
> + XFEATURE_RESERVED,
> + XFEATURE_SHSTK_USER,
> + XFEATURE_SHSTK_KERNEL,
>  
>   XFEATURE_MAX,
>  };
> @@ -128,6 +131,8 @@ enum xfeature {
>  #define XFEATURE_MASK_Hi16_ZMM   (1 << XFEATURE_Hi16_ZMM)
>  #define XFEATURE_MASK_PT (1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
>  #define XFEATURE_MASK_PKRU   (1 << XFEATURE_PKRU)
> +#define XFEATURE_MASK_SHSTK_USER (1 << XFEATURE_SHSTK_USER)
> +#define XFEATURE_MASK_SHSTK_KERNEL   (1 << XFEATURE_SHSTK_KERNEL)
>  
>  #define XFEATURE_MASK_FPSSE  (XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
>  #define XFEATURE_MASK_AVX512 (XFEATURE_MASK_OPMASK \
> @@ -229,6 +234,23 @@ struct pkru_state {
>   u32 pad;
>  } __packed;
>  
> +/*
> + * State component 11 is Control flow Enforcement user states

Why the Camel-cased naming?

"Control" then "flow" then capitalized again "Enforcement".

Fix all occurrences pls, especially the user-visible strings.

> + */
> +struct cet_user_state {
> + u64 u_cet;  /* user control flow settings */
> + u64 user_ssp;   /* user shadow stack pointer */

Prefix both with "usr_" instead.

> +} __packed;
> +
> +/*
> + * State component 12 is Control flow Enforcement kernel states
> + */
> +struct cet_kernel_state {
> + u64 kernel_ssp; /* kernel shadow stack */
> + u64 pl1_ssp;/* ring-1 shadow stack */
> + u64 pl2_ssp;/* ring-2 shadow stack */

Just write "privilege level" everywhere - not "ring".

Btw, do you see how the type and the name of all those other fields in
that file are tabulated? Except yours...

> +} __packed;
> +
>  struct xstate_header {
>   u64 xfeatures;
>   u64 xcomp_bv;
> diff --git a/arch/x86/include/asm/fpu/xstate.h 
> b/arch/x86/include/asm/fpu/xstate.h
> index d8e2ec99f635..18b60748a34d 100644
> --- a/arch/x86/include/asm/fpu/xstate.h
> +++ b/arch/x86/include/asm/fpu/xstate.h
> @@ -28,7 +28,9 @@
> XFEATURE_MASK_Hi16_ZMM | \
> XFEATURE_MASK_PKRU | \
> XFEATURE_MASK_BNDREGS | \
> -   XFEATURE_MASK_BNDCSR)
> +   XFEATURE_MASK_BNDCSR | \
> +   XFEATURE_MASK_SHSTK_USER | \
> +   XFEATURE_MASK_SHSTK_KERNEL)
>  
>  #ifdef CONFIG_X86_64
>  #define REX_PREFIX   "0x48, "
> diff --git a/arch/x86/include/uapi/asm/processor-flags.h 
> b/arch/x86/include/uapi/asm/processor-flags.h
> index bcba3c643e63..25311ec4b731 100644
> --- a/arch/x86/include/uapi/asm/processor-flags.h
> +++ b/arch/x86/include/uapi/asm/processor-flags.h
> @@ -130,6 +130,8 @@
>  #define X86_CR4_SMAP _BITUL(X86_CR4_SMAP_BIT)
>  #define X86_CR4_PKE_BIT  22 /* enable Protection Keys support */
>  #define X86_CR4_PKE  _BITUL(X86_CR4_PKE_BIT)
> +#define X86_CR4_CET_BIT  23 /* enable Control flow Enforcement */
> +#define X86_CR4_CET  _BITUL(X86_CR4_CET_BIT)
>  
>  /*
>   * x86-64 Task Priority Register, CR8
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 605ec6decf3e..ad36ea28bfd1 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -35,6 +35,9 @@ static const char *xfeature_names[] =
>   "Processor Trace (unused)"  ,
>   "Protection Keys User registers",
>   "unknown xstate feature",
> + "Control flow User registers"   ,
> + "Control flow Kernel registers" ,
> + "unknown xstate feature",

So there are two "unknown xstate feature" array elems now...

>  static short xsave_cpuid_features[] __initdata = {
> @@ -48,6 +51,9 @@ static short xsave_cpuid_features[] __initdata = {
>   X86_FEATURE_AVX512F,
>   X86_FE

Re: [PATCH v5 03/27] x86/fpu/xstate: Introduce XSAVES system states

2018-10-18 Thread Borislav Petkov

On Thu, Oct 18, 2018 at 11:31:25AM +0200, Pavel Machek wrote:
> We want readable sources, not neat ascii art everywhere.

And we want pink ponies.

Reverse xmas tree order is and has been the usual variable sorting in
the tip tree for years.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v5 03/27] x86/fpu/xstate: Introduce XSAVES system states

2018-10-18 Thread Borislav Petkov

On Wed, Oct 17, 2018 at 04:17:01PM -0700, Randy Dunlap wrote:
> I asked what I really wanted to know.

Then the answer is a bit better readability, I'd guess.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v5 03/27] x86/fpu/xstate: Introduce XSAVES system states

2018-10-17 Thread Borislav Petkov

On Wed, Oct 17, 2018 at 03:39:47PM -0700, Randy Dunlap wrote:
> Would you mind explaining this request? (requirement?)
> Other than to say that it is the preference of some maintainers,
> please say Why it is preferred.
> 
> and since the s above won't typically be the same length,
> it's not for variable name alignment, right?

Searching the net a little, it shows you have asked that question
before. So what is it you really wanna know?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v5 03/27] x86/fpu/xstate: Introduce XSAVES system states

2018-10-17 Thread Borislav Petkov

On Thu, Oct 11, 2018 at 08:14:59AM -0700, Yu-cheng Yu wrote:
> Control Flow Enforcement (CET) MSRs are XSAVES system states.

That sentence needs massaging. MSRs are system states?!?!

> To support CET, we introduce XSAVES system states first.

Pls drop the "we" in all commit messages and convert the tone to
impartial and passive.

Also, this commit message needs to explain *why* you're doing this - it
is too laconic.

> Signed-off-by: Yu-cheng Yu 
> ---
>  arch/x86/include/asm/fpu/internal.h |  3 +-
>  arch/x86/include/asm/fpu/xstate.h   |  4 +-
>  arch/x86/kernel/fpu/core.c  |  6 +-
>  arch/x86/kernel/fpu/init.c  | 10 
>  arch/x86/kernel/fpu/xstate.c| 86 ++---
>  5 files changed, 62 insertions(+), 47 deletions(-)
> 
> diff --git a/arch/x86/include/asm/fpu/internal.h 
> b/arch/x86/include/asm/fpu/internal.h
> index 02c4296478c8..9a5db5a63f60 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -45,7 +45,6 @@ extern void fpu__init_cpu_xstate(void);
>  extern void fpu__init_system(struct cpuinfo_x86 *c);
>  extern void fpu__init_check_bugs(void);
>  extern void fpu__resume_cpu(void);
> -extern u64 fpu__get_supported_xfeatures_mask(void);
>  
>  /*
>   * Debugging facility:
> @@ -93,7 +92,7 @@ static inline void fpstate_init_xstate(struct xregs_state 
> *xsave)
>* XRSTORS requires these bits set in xcomp_bv, or it will
>* trigger #GP:
>*/
> - xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT | 
> xfeatures_mask_user;
> + xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT | xfeatures_mask_all;
>  }
>  
>  static inline void fpstate_init_fxstate(struct fxregs_state *fx)
> diff --git a/arch/x86/include/asm/fpu/xstate.h 
> b/arch/x86/include/asm/fpu/xstate.h
> index 76f83d2ac10e..d8e2ec99f635 100644
> --- a/arch/x86/include/asm/fpu/xstate.h
> +++ b/arch/x86/include/asm/fpu/xstate.h
> @@ -19,9 +19,6 @@
>  #define XSAVE_YMM_SIZE   256
>  #define XSAVE_YMM_OFFSET(XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET)
>  
> -/* Supervisor features */
> -#define XFEATURE_MASK_SUPERVISOR (XFEATURE_MASK_PT)
> -
>  /* All currently supported features */
>  #define SUPPORTED_XFEATURES_MASK (XFEATURE_MASK_FP | \
> XFEATURE_MASK_SSE | \
> @@ -40,6 +37,7 @@
>  #endif
>  
>  extern u64 xfeatures_mask_user;
> +extern u64 xfeatures_mask_all;

You have a bunch of places where you generate the system mask by doing
~xfeatures_mask_user.

Why not define

xfeatures_mask_system

instead and generate the _all mask at the places you need it by doing

xfeatures_mask_user | xfeatures_mask_system

?

We are differentiating user and system states now so it is only logical
to have that mirrored in the variables, right?

You even do that in fpu__init_system_xstate().

...

> @@ -225,20 +230,19 @@ void fpu__init_cpu_xstate(void)
>* set here.
>*/
>  
> - xfeatures_mask_user &= ~XFEATURE_MASK_SUPERVISOR;
> -
>   cr4_set_bits(X86_CR4_OSXSAVE);
>   xsetbv(XCR_XFEATURE_ENABLED_MASK, xfeatures_mask_user);

< newline here.

> + /*
> +  * MSR_IA32_XSS sets which XSAVES system states to be managed by

Improve:

"MSR_IA32_XSS controls which system (not user) states are going to be
managed by XSAVES."

> @@ -702,6 +703,7 @@ static int init_xstate_size(void)
>   */
>  static void fpu__init_disable_system_xstate(void)
>  {
> + xfeatures_mask_all = 0;
>   xfeatures_mask_user = 0;
>   cr4_clear_bits(X86_CR4_OSXSAVE);
>   fpu__xstate_clear_all_cpu_caps();
> @@ -717,6 +719,8 @@ void __init fpu__init_system_xstate(void)
>   static int on_boot_cpu __initdata = 1;
>   int err;
>   int i;
> + u64 cpu_user_xfeatures_mask;
> + u64 cpu_system_xfeatures_mask;

Please sort function local variables declaration in a reverse christmas
tree order:

 longest_variable_name;
 shorter_var_name;
 even_shorter;
 i;

>  
>   WARN_ON_FPU(!on_boot_cpu);
>   on_boot_cpu = 0;

...

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v5 02/27] x86/fpu/xstate: Change names to separate XSAVES system and user states

2018-10-15 Thread Borislav Petkov

On Thu, Oct 11, 2018 at 08:14:58AM -0700, Yu-cheng Yu wrote:
> Control Flow Enforcement (CET) MSRs are XSAVES system/supervisor
> states.  To support CET, we introduce XSAVES system states first.
> 
> XSAVES is a "supervisor" instruction and, comparing to XSAVE, saves
> additional "supervisor" states that can be modified only from CPL 0.
> However, these states are per-task and not kernel's own.  Rename
> "supervisor" states to "system" states to clearly separate them from
> "user" states.
> 
> Signed-off-by: Yu-cheng Yu 
> ---
>  arch/x86/include/asm/fpu/internal.h |  4 +-
>  arch/x86/include/asm/fpu/xstate.h   | 20 +++
>  arch/x86/kernel/fpu/core.c  |  4 +-
>  arch/x86/kernel/fpu/init.c  |  2 +-
>  arch/x86/kernel/fpu/signal.c|  6 +--
>  arch/x86/kernel/fpu/xstate.c| 82 ++---
>  6 files changed, 57 insertions(+), 61 deletions(-)

...

> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 87a57b7642d3..e7cbaed12ef1 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -51,13 +51,14 @@ static short xsave_cpuid_features[] __initdata = {
>  };
>  
>  /*
> - * Mask of xstate features supported by the CPU and the kernel:
> + * Mask of supported 'user' xstate features derived from boot_cpu_has() and
> + * SUPPORTED_XFEATURES_MASK.

<--- This comment here looks like a good place to put some blurb about
user and system states, what they are, what the distinction is and so
on.

>   */
> -u64 xfeatures_mask __read_mostly;
> +u64 xfeatures_mask_user __read_mostly;
>  
>  static unsigned int xstate_offsets[XFEATURE_MAX] = { [ 0 ... XFEATURE_MAX - 
> 1] = -1};
>  static unsigned int xstate_sizes[XFEATURE_MAX]   = { [ 0 ... XFEATURE_MAX - 
> 1] = -1};
> -static unsigned int xstate_comp_offsets[sizeof(xfeatures_mask)*8];
> +static unsigned int xstate_comp_offsets[sizeof(xfeatures_mask_user)*8];
>  
>  /*
>   * The XSAVE area of kernel can be in standard or compacted format;
> @@ -82,7 +83,7 @@ void fpu__xstate_clear_all_cpu_caps(void)
>   */
>  int cpu_has_xfeatures(u64 xfeatures_needed, const char **feature_name)
>  {
> - u64 xfeatures_missing = xfeatures_needed & ~xfeatures_mask;
> + u64 xfeatures_missing = xfeatures_needed & ~xfeatures_mask_user;
>  
>   if (unlikely(feature_name)) {
>   long xfeature_idx, max_idx;
> @@ -113,14 +114,11 @@ int cpu_has_xfeatures(u64 xfeatures_needed, const char 
> **feature_name)
>  }
>  EXPORT_SYMBOL_GPL(cpu_has_xfeatures);
>  
> -static int xfeature_is_supervisor(int xfeature_nr)
> +static int xfeature_is_system(int xfeature_nr)
>  {
>   /*
> -  * We currently do not support supervisor states, but if
> -  * we did, we could find out like this.
> -  *
>* SDM says: If state component 'i' is a user state component,
> -  * ECX[0] return 0; if state component i is a supervisor
> +  * ECX[0] return 0; if state component i is a system

  is 0

>* state component, ECX[0] returns 1.

   is 1.

>*/
>   u32 eax, ebx, ecx, edx;

...

> @@ -242,7 +238,7 @@ void fpu__init_cpu_xstate(void)
>   */
>  static int xfeature_enabled(enum xfeature xfeature)
>  {
> - return !!(xfeatures_mask & (1UL << xfeature));
> + return !!(xfeatures_mask_user & BIT_ULL(xfeature));
>  }
>  
>  /*
> @@ -272,7 +268,7 @@ static void __init setup_xstate_features(void)
>   cpuid_count(XSTATE_CPUID, i, &eax, &ebx, &ecx, &edx);
>  
>   /*
> -  * If an xfeature is supervisor state, the offset
> +  * If an xfeature is system state, the offset

is a system state, ...

>* in EBX is invalid. We leave it to -1.
>*/
>   if (xfeature_is_user(i))

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH v5 01/27] x86/cpufeatures: Add CPUIDs for Control Flow Enforcement Technology (CET)

2018-10-11 Thread Borislav Petkov

On Thu, Oct 11, 2018 at 08:14:57AM -0700, Yu-cheng Yu wrote:
> Add CPUIDs for Control Flow Enforcement Technology (CET).

This is not "CPUIDs" but feature flags. Fix the subject too pls.

> CPUID.(EAX=7,ECX=0):ECX[bit 7] Shadow stack
> CPUID.(EAX=7,ECX=0):EDX[bit 20] Indirect branch tracking
> 
> Signed-off-by: Yu-cheng Yu 
> ---
>  arch/x86/include/asm/cpufeatures.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h 
> b/arch/x86/include/asm/cpufeatures.h
> index 89a048c2faec..142b15da06fd 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -321,6 +321,7 @@
>  #define X86_FEATURE_PKU  (16*32+ 3) /* Protection Keys 
> for Userspace */
>  #define X86_FEATURE_OSPKE(16*32+ 4) /* OS Protection Keys Enable 
> */
>  #define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector 
> Bit Manipulation Instructions */
> +#define X86_FEATURE_SHSTK(16*32+ 7) /* Shadow Stack */
>  #define X86_FEATURE_GFNI (16*32+ 8) /* Galois Field New 
> Instructions */
>  #define X86_FEATURE_VAES (16*32+ 9) /* Vector AES */
>  #define X86_FEATURE_VPCLMULQDQ   (16*32+10) /* Carry-Less 
> Multiplication Double Quadword */
> @@ -341,6 +342,7 @@
>  #define X86_FEATURE_AVX512_4VNNIW(18*32+ 2) /* AVX-512 Neural Network 
> Instructions */
>  #define X86_FEATURE_AVX512_4FMAPS(18*32+ 3) /* AVX-512 Multiply 
> Accumulation Single precision */
>  #define X86_FEATURE_PCONFIG  (18*32+18) /* Intel PCONFIG */
> +#define X86_FEATURE_IBT  (18*32+20) /* Indirect Branch 
> Tracking */
>  #define X86_FEATURE_SPEC_CTRL(18*32+26) /* "" Speculation 
> Control (IBRS + IBPB) */
>  #define X86_FEATURE_INTEL_STIBP  (18*32+27) /* "" Single Thread 
> Indirect Branch Predictors */
>  #define X86_FEATURE_FLUSH_L1D(18*32+28) /* Flush L1D cache */
> -- 

With that addressed:

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [RFC PATCH v4 03/27] x86/fpu/xstate: Enable XSAVES system states

2018-10-02 Thread Borislav Petkov

On Fri, Sep 21, 2018 at 08:03:27AM -0700, Yu-cheng Yu wrote:
> XSAVES saves both system and user states.  The Linux kernel
> currently does not save/restore any system states.  This patch
> creates the framework for supporting system states.

... and needs a lot more text explaining *why* it is doing that.

> 
> Signed-off-by: Yu-cheng Yu 
> ---
>  arch/x86/include/asm/fpu/internal.h |   3 +-
>  arch/x86/include/asm/fpu/xstate.h   |   9 ++-
>  arch/x86/kernel/fpu/core.c  |   7 +-
>  arch/x86/kernel/fpu/init.c  |  10 ---
>  arch/x86/kernel/fpu/xstate.c| 112 +---
>  5 files changed, 80 insertions(+), 61 deletions(-)
> 
> diff --git a/arch/x86/include/asm/fpu/internal.h 
> b/arch/x86/include/asm/fpu/internal.h
> index f1f9bf91a0ab..1f447865db3a 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -45,7 +45,6 @@ extern void fpu__init_cpu_xstate(void);
>  extern void fpu__init_system(struct cpuinfo_x86 *c);
>  extern void fpu__init_check_bugs(void);
>  extern void fpu__resume_cpu(void);
> -extern u64 fpu__get_supported_xfeatures_mask(void);
>  
>  /*
>   * Debugging facility:
> @@ -94,7 +93,7 @@ static inline void fpstate_init_xstate(struct xregs_state 
> *xsave)
>* trigger #GP:
>*/
>   xsave->header.xcomp_bv = XCOMP_BV_COMPACTED_FORMAT |
> - xfeatures_mask_user;
> + xfeatures_mask_all;
>  }
>  
>  static inline void fpstate_init_fxstate(struct fxregs_state *fx)
> diff --git a/arch/x86/include/asm/fpu/xstate.h 
> b/arch/x86/include/asm/fpu/xstate.h
> index 9b382e5157ed..a32dc5f8c963 100644
> --- a/arch/x86/include/asm/fpu/xstate.h
> +++ b/arch/x86/include/asm/fpu/xstate.h
> @@ -19,10 +19,10 @@
>  #define XSAVE_YMM_SIZE   256
>  #define XSAVE_YMM_OFFSET(XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET)
>  
> -/* System features */
> -#define XFEATURE_MASK_SYSTEM (XFEATURE_MASK_PT)

Previous patch renames it, this patch deletes it. Why do we need all
that unnecessary churn?

Also, this patch is trying to do a couple of things at once and
reviewing it is not trivial. Please split the changes logically.

> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 19f8df54c72a..dd2c561c4544 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -51,13 +51,16 @@ static short xsave_cpuid_features[] __initdata = {
>  };
>  
>  /*
> - * Mask of xstate features supported by the CPU and the kernel:
> + * Mask of xstate features supported by the CPU and the kernel.
> + * This is the result from CPUID query, SUPPORTED_XFEATURES_MASK,
> + * and boot_cpu_has().
>   */

This needs to explain what both masks are - user and system. "CPU" and
"kernel" is not "user" and "all".

>  u64 xfeatures_mask_user __read_mostly;
> +u64 xfeatures_mask_all __read_mostly;



> @@ -219,30 +222,31 @@ void fpstate_sanitize_xstate(struct fpu *fpu)
>   */
>  void fpu__init_cpu_xstate(void)
>  {
> - if (!boot_cpu_has(X86_FEATURE_XSAVE) || !xfeatures_mask_user)
> + if (!boot_cpu_has(X86_FEATURE_XSAVE) || !xfeatures_mask_all)
>   return;
> +
> + cr4_set_bits(X86_CR4_OSXSAVE);
> +
>   /*
> -  * Make it clear that XSAVES system states are not yet
> -  * implemented should anyone expect it to work by changing
> -  * bits in XFEATURE_MASK_* macros and XCR0.
> +  * XCR_XFEATURE_ENABLED_MASK sets the features that are managed
> +  * by XSAVE{C, OPT} and XRSTOR.  Only XSAVE user states can be
> +  * set here.
>*/
> - WARN_ONCE((xfeatures_mask_user & XFEATURE_MASK_SYSTEM),
> - "x86/fpu: XSAVES system states are not yet implemented.\n");
> + xsetbv(XCR_XFEATURE_ENABLED_MASK,
> +xfeatures_mask_user);

No need to break the line here.

Also, you have a couple more places in your patches where you
unnecessarily break lines. Please don't do that, even if it exceeds 80
cols by a couple of chars.

>  
> - xfeatures_mask_user &= ~XFEATURE_MASK_SYSTEM;
> -
> - cr4_set_bits(X86_CR4_OSXSAVE);
> - xsetbv(XCR_XFEATURE_ENABLED_MASK, xfeatures_mask_user);
> + /*
> +  * MSR_IA32_XSS sets which XSAVES system states to be managed by
> +  * XSAVES.  Only XSAVES system states can be set here.
> +  */
> + if (boot_cpu_has(X86_FEATURE_XSAVES))
> + wrmsrl(MSR_IA32_XSS,
> +xfeatures_mask_all & ~xfeatures_mask_user);

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [RFC PATCH v4 02/27] x86/fpu/xstate: Change some names to separate XSAVES system and user states

2018-10-02 Thread Borislav Petkov

On Tue, Oct 02, 2018 at 09:30:52AM -0700, Dave Hansen wrote:
> > Good point.  However, "system" is more indicative; CET states are per-task 
> > and
> > not "Supervisor".  Do we want to go back to "Supervisor" or add comments?
> 
> This is one of those things where the SDM language does not match what
> we use in the kernel.  I think it's fine to call them "system" or
> "kernel" states to make it consistent with our existing in-kernel
> nomenclature.
> 
> I say add comments to clarify what the SDM calls it vs. what we do.

So AFAIU, the difference is that XSAVES is a CPL0 insn. Thus the
supervisor thing, I'd guess.

Now it looks like CET uses XSAVES (from skimming the patchset forward)
but then what our nomenclature is and how it all gets tied together,
needs to be explained somewhere prominent so that we're all on the same
page.

This patch's commit message is not even close. So I'd very much
appreciate a more verbose explanation, even if it repeats itself at
places.

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [RFC PATCH v4 02/27] x86/fpu/xstate: Change some names to separate XSAVES system and user states

2018-10-02 Thread Borislav Petkov

On Fri, Sep 21, 2018 at 08:03:26AM -0700, Yu-cheng Yu wrote:
> To support XSAVES system states, change some names to distinguish
> user and system states.

I don't understand what the logic here is. SDM says:

XSAVES—Save Processor Extended States Supervisor

the stress being on "Supervisor" - why does it need to be renamed to
"system" now?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [RFC PATCH v4 01/27] x86/cpufeatures: Add CPUIDs for Control-flow Enforcement Technology (CET)

2018-09-28 Thread Borislav Petkov

On Fri, Sep 21, 2018 at 08:03:25AM -0700, Yu-cheng Yu wrote:
> Add CPUIDs for Control-flow Enforcement Technology (CET).
> 
> CPUID.(EAX=7,ECX=0):ECX[bit 7] Shadow stack
> CPUID.(EAX=7,ECX=0):EDX[bit 20] Indirect branch tracking
> 
> Signed-off-by: Yu-cheng Yu 
> ---
>  arch/x86/include/asm/cpufeatures.h | 2 ++
>  arch/x86/kernel/cpu/scattered.c| 1 +
>  2 files changed, 3 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeatures.h 
> b/arch/x86/include/asm/cpufeatures.h
> index 89a048c2faec..fa69651a017e 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -221,6 +221,7 @@
>  #define X86_FEATURE_ZEN  ( 7*32+28) /* "" CPU is AMD 
> family 0x17 (Zen) */
>  #define X86_FEATURE_L1TF_PTEINV  ( 7*32+29) /* "" L1TF 
> workaround PTE inversion */
>  #define X86_FEATURE_IBRS_ENHANCED( 7*32+30) /* Enhanced IBRS */
> +#define X86_FEATURE_IBT  ( 7*32+31) /* Indirect Branch 
> Tracking */
>  
>  /* Virtualization flags: Linux defined, word 8 */
>  #define X86_FEATURE_TPR_SHADOW   ( 8*32+ 0) /* Intel TPR Shadow 
> */
> @@ -321,6 +322,7 @@
>  #define X86_FEATURE_PKU  (16*32+ 3) /* Protection Keys 
> for Userspace */
>  #define X86_FEATURE_OSPKE(16*32+ 4) /* OS Protection Keys Enable 
> */
>  #define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector 
> Bit Manipulation Instructions */
> +#define X86_FEATURE_SHSTK(16*32+ 7) /* Shadow Stack */
>  #define X86_FEATURE_GFNI (16*32+ 8) /* Galois Field New 
> Instructions */
>  #define X86_FEATURE_VAES (16*32+ 9) /* Vector AES */
>  #define X86_FEATURE_VPCLMULQDQ   (16*32+10) /* Carry-Less 
> Multiplication Double Quadword */
> diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
> index 772c219b6889..63cbb4d9938e 100644
> --- a/arch/x86/kernel/cpu/scattered.c
> +++ b/arch/x86/kernel/cpu/scattered.c
> @@ -21,6 +21,7 @@ struct cpuid_bit {
>  static const struct cpuid_bit cpuid_bits[] = {
>   { X86_FEATURE_APERFMPERF,   CPUID_ECX,  0, 0x0006, 0 },
>   { X86_FEATURE_EPB,  CPUID_ECX,  3, 0x0006, 0 },
> + { X86_FEATURE_IBT,  CPUID_EDX, 20, 0x0007, 0},

If you haven't noticed, there's already a separate leaf:

/* Intel-defined CPU features, CPUID level 0x0007:0 (EDX), word 18 */

in arch/x86/include/asm/cpufeatures.h

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

Re: [PATCH] Documentation/process: add Co-Developed-by: tag for patches with multiple authors

2017-11-17 Thread Borislav Petkov

On Thu, Nov 16, 2017 at 02:23:09PM +0100, Greg Kroah-Hartman wrote:
> Sometimes a single patch is the result of multiple authors.  As git only
> can have one "author" of a patch, it is still good to properly give
> credit to the other developers of a commit.  To address this, document
> the "Co-Developed-by:" tag which can be used to show other authors of
> the patch.
> 
> Note, these other authors must also provide a Signed-off-by: tag as it
> is their work that is being submitted here.
> 
> Reported-by: Thomas Gleixner 
> Signed-off-by: Greg Kroah-Hartman 
> 
> 
> diff --git a/Documentation/process/5.Posting.rst 
> b/Documentation/process/5.Posting.rst
> index 1b7728b19ea7..645fa9c7388a 100644
> --- a/Documentation/process/5.Posting.rst
> +++ b/Documentation/process/5.Posting.rst
> @@ -213,6 +213,11 @@ The tags in common use are:
> which can be found in Documentation/process/submitting-patches.rst.  Code 
> without a
> proper signoff cannot be merged into the mainline.
>  
> + - Co-Developed-by: states that the patch was also created by another 
> developer
> +   along with the original author.  This is useful at times when multiple
> +   people work on a single patch.  Note, this person also needs to have a
> +   Signed-off-by: line in the patch as well.
> +
>   - Acked-by: indicates an agreement by another developer (often a
> maintainer of the relevant code) that the patch is appropriate for
> inclusion into the kernel.

Having a second or third author and specifying that fact keeps popping
up during review and people keep screwing up the SOB chains, trying to
express that. So yes, it is a good idea to have a special tag for this.

Acked-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Documentation: Improve softlockup_panic= description text

2017-10-03 Thread Borislav Petkov

On Tue, Oct 03, 2017 at 03:13:23PM -0600, Jonathan Corbet wrote:
> Man that's a lot of work...much easier to just whine about it on the
> mailing lists...:)

You could've just said "nah, you do it" :)

> (Yes, I tweaked it, thanks).

Thanks!

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Documentation: Improve softlockup_panic= description text

2017-10-03 Thread Borislav Petkov

On Tue, Oct 03, 2017 at 02:39:17PM -0600, Jonathan Corbet wrote:
> So I hate to be fussy.  (OK, perhaps I like it, but I'll act reluctant
> anyway...:)  If it's an integer value, how can it be null?  Can we say "a
> non-zero value" instead?

non-zero yap, I was looking for that version but somehow it evaded
me. Can you change it while applying or should I send a new one?

> I'd take out the quotes too, but that's extra credit...

Well, a value can't really instruct, thus the quotes...

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Documentation: Improve softlockup_panic= description text

2017-10-03 Thread Borislav Petkov

From: Borislav Petkov 

It should say what that  range is and what that integer value
means. I had to look at the code...

Signed-off-by: Borislav Petkov 
Cc: Jonathan Corbet 
Cc: linux-doc@vger.kernel.org
---
 Documentation/admin-guide/kernel-parameters.txt | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 9a84483db403..c14cd2645c1f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3887,6 +3887,12 @@
[KNL] Should the soft-lockup detector generate panics.
Format: 
 
+   A non-null value "instructs" the soft-lockup detector
+   to panic the machine when a soft-lockup occurs. This
+   is also controlled by CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC
+   which is the respective build-time switch to that
+   functionality.
+
softlockup_all_cpu_backtrace=
[KNL] Should the soft-lockup detector generate
backtraces on all cpus.
-- 
2.13.0

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH v2 3/7] sched/idle: Add poll before enter real idle path

2017-09-14 Thread Borislav Petkov

On Thu, Sep 14, 2017 at 04:41:39PM +0800, Quan Xu wrote:
> > > On Tue, Aug 29, 2017 at 11:46:37AM +, Yang Zhang wrote:
> > > > Add poll in do_idle. For UP VM, if there are running task, it will not
> > > > goes into idle path, so we only enable poll in SMP VM.
> > > > 
> > > > Signed-off-by: Yang Zhang 
> > > > Signed-off-by: Quan Xu 
> > > Broken SoB chain.
> >   Peter,  I can't follow 'Broken SoB chain'.. could you explain more
> > about it?
> > 
>     Peter, Ping..

The SOB chain needs to show the path from the author to the upstream
maintainer. Yours has Yang before you, which doesn't say what his role
is.

Read Documentation/process/submitting-patches.rst, section 11.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH v2 3/7] sched/idle: Add poll before enter real idle path

2017-08-29 Thread Borislav Petkov

On Tue, Aug 29, 2017 at 11:46:37AM +, Yang Zhang wrote:
> Add poll in do_idle. For UP VM, if there are running task, it will not
> goes into idle path, so we only enable poll in SMP VM.
> 
> Signed-off-by: Yang Zhang 
> Signed-off-by: Quan Xu 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: x...@kernel.org
> Cc: Peter Zijlstra 
> Cc: Borislav Petkov 
> Cc: Kyle Huey 
> Cc: Andy Lutomirski 
> Cc: Len Brown 
> Cc: linux-ker...@vger.kernel.org
> ---
>  arch/x86/kernel/process.c | 7 +++
>  kernel/sched/idle.c   | 2 ++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 3ca1980..def4113 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -332,6 +332,13 @@ void arch_cpu_idle(void)
>   x86_idle();
>  }
>  
> +#if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT)
> +void arch_cpu_idle_poll(void)
> +{
> + paravirt_idle_poll();
> +}
> +#endif

So this will get called on any system which has CONFIG_PARAVIRT enabled
*even* if they're not running any guests.

Huh?

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v9 04/38] x86/CPU/AMD: Add the Secure Memory Encryption CPU feature

2017-07-10 Thread Borislav Petkov

On Tue, Jul 11, 2017 at 01:07:46AM -0400, Brian Gerst wrote:
> > If I make the scattered feature support conditional on CONFIG_X86_64
> > (based on comment below) then cpu_has() will always be false unless
> > CONFIG_X86_64 is enabled. So this won't need to be wrapped by the
> > #ifdef.
> 
> If you change it to use cpu_feature_enabled(), gcc will see that it is
> disabled and eliminate the dead code at compile time.

Just do this:

   if (cpu_has(c, X86_FEATURE_SME)) {
   if (IS_ENABLED(CONFIG_X86_32)) {
   clear_cpu_cap(c, X86_FEATURE_SME);
   } else {
   u64 msr;

   /* Check if SME is enabled */
  rdmsrl(MSR_K8_SYSCFG, msr);
  if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
  clear_cpu_cap(c, X86_FEATURE_SME);
   }
   }

so that it is explicit that we disable it on 32-bit and we can save us
the ifdeffery elsewhere.

Thanks.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v7 34/36] x86/mm: Add support to encrypt the kernel in-place

2017-06-26 Thread Borislav Petkov

On Fri, Jun 23, 2017 at 12:44:46PM -0500, Tom Lendacky wrote:
> Normally the __p4d() macro would be used and that would be ok whether
> CONFIG_X86_5LEVEL is defined or not. But since __p4d() is part of the
> paravirt ops path I have to use native_make_p4d().

So __p4d is in !CONFIG_PARAVIRT path.

Regardless, we use the native_* variants in generic code to mean, not
paravirt. Just define it in a separate patch like the rest of the p4*
machinery and use it in your code. Sooner or later someone else will
need it.

> True, 5-level will only be turned on for specific hardware which is why
> I originally had this as only 4-level pagetables. But in a comment from
> you back on the v5 version you said it needed to support 5-level. I
> guess we should have discussed this more,

AFAIR, I said something along the lines of "what about 5-level page
tables?" and whether we care.

> but I also thought that should our hardware ever support 5-level
> paging in the future then this would be good to go.

There it is :-)

> The macros work great if you are not running identity mapped. You could
> use p*d_offset() to move easily through the tables, but those functions
> use __va() to generate table virtual addresses. I've seen where
> boot/compressed/pagetable.c #defines __va() to work with identity mapped
> pages but that would only work if I create a separate file just for this
> function.
> 
> Given when this occurs it's very similar to what __startup_64() does in
> regards to the IS_ENABLED(CONFIG_X86_5LEVEL) checks.

Ok.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v7 36/36] x86/mm: Add support to make use of Secure Memory Encryption

2017-06-23 Thread Borislav Petkov

On Fri, Jun 16, 2017 at 01:56:39PM -0500, Tom Lendacky wrote:
> Add support to check if SME has been enabled and if memory encryption
> should be activated (checking of command line option based on the
> configuration of the default state).  If memory encryption is to be
> activated, then the encryption mask is set and the kernel is encrypted
> "in place."
> 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/mem_encrypt.h |6 ++-
>  arch/x86/kernel/head64.c   |4 +-
>  arch/x86/mm/mem_encrypt.c  |   86 
> +++-
>  3 files changed, 90 insertions(+), 6 deletions(-)

...

> +/*
> + * Some SME functions run very early causing issues with the stack-protector
> + * support. Provide a way to turn off this support on a per-function basis.
> + */
> +#define SME_NOSTACKP __attribute__((__optimize__("no-stack-protector")))

__nostackp

just like the others in include/linux/compiler-gcc.h.

> +
> +static char sme_cmdline_arg[] __initdata = "mem_encrypt";
> +static char sme_cmdline_on[]  __initdata = "on";
> +static char sme_cmdline_off[] __initdata = "off";
>  
>  /*
>   * Since SME related variables are set early in the boot process they must
> @@ -200,6 +215,8 @@ void __init mem_encrypt_init(void)
>  
>   /* Call into SWIOTLB to update the SWIOTLB DMA buffers */
>   swiotlb_update_mem_attributes();
> +
> + pr_info("AMD Secure Memory Encryption (SME) active\n");
>  }
>  
>  void swiotlb_set_mem_attributes(void *vaddr, unsigned long size)
> @@ -527,8 +544,73 @@ void __init sme_encrypt_kernel(void)
>   native_write_cr3(__native_read_cr3());
>  }
>  
> -void __init sme_enable(void)
> +void __init SME_NOSTACKP sme_enable(struct boot_params *bp)
>  {
> + const char *cmdline_ptr, *cmdline_arg, *cmdline_on, *cmdline_off;
> + unsigned int eax, ebx, ecx, edx;
> + bool active_by_default;
> + unsigned long me_mask;
> + char buffer[16];
> + u64 msr;
> +
> + /* Check for the SME support leaf */
> + eax = 0x8000;
> + ecx = 0;
> + native_cpuid(&eax, &ebx, &ecx, &edx);
> + if (eax < 0x801f)
> + return;
> +
> + /*
> +  * Check for the SME feature:
> +  *   CPUID Fn8000_001F[EAX] - Bit 0
> +  * Secure Memory Encryption support
> +  *   CPUID Fn8000_001F[EBX] - Bits 5:0
> +  * Pagetable bit position used to indicate encryption
> +  */
> + eax = 0x801f;
> + ecx = 0;
> + native_cpuid(&eax, &ebx, &ecx, &edx);
> + if (!(eax & 1))
> + return;
> +
> + me_mask = 1UL << (ebx & 0x3f);
> +
> + /* Check if SME is enabled */
> + msr = __rdmsr(MSR_K8_SYSCFG);
> + if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
> + return;
> +
> + /*
> +  * Fixups have not been applied to phys_base yet and we're running
> +  * identity mapped, so we must obtain the address to the SME command
> +  * line argument data using rip-relative addressing.
> +  */
> + asm ("lea sme_cmdline_arg(%%rip), %0"
> +  : "=r" (cmdline_arg)
> +  : "p" (sme_cmdline_arg));
> + asm ("lea sme_cmdline_on(%%rip), %0"
> +  : "=r" (cmdline_on)
> +  : "p" (sme_cmdline_on));
> + asm ("lea sme_cmdline_off(%%rip), %0"
> +  : "=r" (cmdline_off)
> +  : "p" (sme_cmdline_off));
> +
> + if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT))
> + active_by_default = true;
> + else
> + active_by_default = false;
> +
> + cmdline_ptr = (const char *)((u64)bp->hdr.cmd_line_ptr |
> +  ((u64)bp->ext_cmd_line_ptr << 32));
> +
> + cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer));
> +
> + if (strncmp(buffer, cmdline_on, sizeof(buffer)) == 0)

if (!strncmp(...

> + sme_me_mask = me_mask;
> + else if (strncmp(buffer, cmdline_off, sizeof(buffer)) == 0)

else if (!strncmp(...

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v7 35/36] x86/boot: Add early cmdline parsing for options with arguments

2017-06-23 Thread Borislav Petkov

On Fri, Jun 16, 2017 at 01:56:30PM -0500, Tom Lendacky wrote:
> Add a cmdline_find_option() function to look for cmdline options that
> take arguments. The argument is returned in a supplied buffer and the
> argument length (regardless of whether it fits in the supplied buffer)
> is returned, with -1 indicating not found.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/cmdline.h |2 +
>  arch/x86/lib/cmdline.c |  105 
> 
>  2 files changed, 107 insertions(+)

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v7 34/36] x86/mm: Add support to encrypt the kernel in-place

2017-06-23 Thread Borislav Petkov

On Fri, Jun 16, 2017 at 01:56:19PM -0500, Tom Lendacky wrote:
> Add the support to encrypt the kernel in-place. This is done by creating
> new page mappings for the kernel - a decrypted write-protected mapping
> and an encrypted mapping. The kernel is encrypted by copying it through
> a temporary buffer.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/mem_encrypt.h |6 +
>  arch/x86/mm/Makefile   |2 
>  arch/x86/mm/mem_encrypt.c  |  314 
> 
>  arch/x86/mm/mem_encrypt_boot.S |  150 +
>  4 files changed, 472 insertions(+)
>  create mode 100644 arch/x86/mm/mem_encrypt_boot.S
> 
> diff --git a/arch/x86/include/asm/mem_encrypt.h 
> b/arch/x86/include/asm/mem_encrypt.h
> index af835cf..7da6de3 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -21,6 +21,12 @@
>  
>  extern unsigned long sme_me_mask;
>  
> +void sme_encrypt_execute(unsigned long encrypted_kernel_vaddr,
> +  unsigned long decrypted_kernel_vaddr,
> +  unsigned long kernel_len,
> +  unsigned long encryption_wa,
> +  unsigned long encryption_pgd);
> +
>  void __init sme_early_encrypt(resource_size_t paddr,
> unsigned long size);
>  void __init sme_early_decrypt(resource_size_t paddr,
> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
> index 9e13841..0633142 100644
> --- a/arch/x86/mm/Makefile
> +++ b/arch/x86/mm/Makefile
> @@ -38,3 +38,5 @@ obj-$(CONFIG_NUMA_EMU)  += numa_emulation.o
>  obj-$(CONFIG_X86_INTEL_MPX)  += mpx.o
>  obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
>  obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o
> +
> +obj-$(CONFIG_AMD_MEM_ENCRYPT)+= mem_encrypt_boot.o
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 842c8a6..6e87662 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -24,6 +24,8 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  
>  /*
>   * Since SME related variables are set early in the boot process they must
> @@ -209,8 +211,320 @@ void swiotlb_set_mem_attributes(void *vaddr, unsigned 
> long size)
>   set_memory_decrypted((unsigned long)vaddr, size >> PAGE_SHIFT);
>  }
>  
> +static void __init sme_clear_pgd(pgd_t *pgd_base, unsigned long start,
> +  unsigned long end)
> +{
> + unsigned long pgd_start, pgd_end, pgd_size;
> + pgd_t *pgd_p;
> +
> + pgd_start = start & PGDIR_MASK;
> + pgd_end = end & PGDIR_MASK;
> +
> + pgd_size = (((pgd_end - pgd_start) / PGDIR_SIZE) + 1);
> + pgd_size *= sizeof(pgd_t);
> +
> + pgd_p = pgd_base + pgd_index(start);
> +
> + memset(pgd_p, 0, pgd_size);
> +}
> +
> +#ifndef CONFIG_X86_5LEVEL
> +#define native_make_p4d(_x)  (p4d_t) { .pgd = native_make_pgd(_x) }
> +#endif

Huh, why isn't this in arch/x86/include/asm/pgtable_types.h in the #else
branch of #if CONFIG_PGTABLE_LEVELS > 4 ?

Also

ERROR: Macros with complex values should be enclosed in parentheses
#105: FILE: arch/x86/mm/mem_encrypt.c:232:
+#define native_make_p4d(_x)(p4d_t) { .pgd = native_make_pgd(_x) }

so why isn't it a function?

> +
> +#define PGD_FLAGS_KERNPG_TABLE_NOENC
> +#define P4D_FLAGS_KERNPG_TABLE_NOENC
> +#define PUD_FLAGS_KERNPG_TABLE_NOENC
> +#define PMD_FLAGS(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL)
> +
> +static void __init *sme_populate_pgd(pgd_t *pgd_base, void *pgtable_area,
> +  unsigned long vaddr, pmdval_t pmd_val)
> +{
> + pgd_t *pgd_p;
> + p4d_t *p4d_p;
> + pud_t *pud_p;
> + pmd_t *pmd_p;
> +
> + pgd_p = pgd_base + pgd_index(vaddr);
> + if (native_pgd_val(*pgd_p)) {
> + if (IS_ENABLED(CONFIG_X86_5LEVEL))

Err, I don't understand: so this is a Kconfig symbol and when it is
enabled at build time, you do a 5level pagetable.

But you can't stick a 5level pagetable to a hardware which doesn't know
about it.

Or do you mean that p4d layer folding at runtime to happen? (I admit, I
haven't looked at that in detail.) But then I'd hope that the generic
macros/functions would give you the ability to not care whether we have
a p4d or not and not add a whole bunch of ifdeffery to this code.

Hmmm.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v7 33/36] x86/mm: Use proper encryption attributes with /dev/mem

2017-06-23 Thread Borislav Petkov

On Fri, Jun 16, 2017 at 01:56:07PM -0500, Tom Lendacky wrote:
> When accessing memory using /dev/mem (or /dev/kmem) use the proper
> encryption attributes when mapping the memory.
> 
> To insure the proper attributes are applied when reading or writing
> /dev/mem, update the xlate_dev_mem_ptr() function to use memremap()
> which will essentially perform the same steps of applying __va for
> RAM or using ioremap() for if not RAM.
> 
> To insure the proper attributes are applied when mmapping /dev/mem,
> update the phys_mem_access_prot() to call phys_mem_access_encrypted(),
> a new function which will check if the memory should be mapped encrypted
> or not. If it is not to be mapped encrypted then the VMA protection
> value is updated to remove the encryption bit.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/io.h |3 +++
>  arch/x86/mm/ioremap.c |   18 +-
>  arch/x86/mm/pat.c |3 +++
>  3 files changed, 15 insertions(+), 9 deletions(-)

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v7 32/36] xen/x86: Remove SME feature in PV guests

2017-06-23 Thread Borislav Petkov

On Fri, Jun 16, 2017 at 01:55:54PM -0500, Tom Lendacky wrote:
> Xen does not currently support SME for PV guests. Clear the SME cpu

nitpick: s/cpu/CPU/

> capability in order to avoid any ambiguity.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/xen/enlighten_pv.c |1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> index f33eef4..e6ecf42 100644
> --- a/arch/x86/xen/enlighten_pv.c
> +++ b/arch/x86/xen/enlighten_pv.c
> @@ -294,6 +294,7 @@ static void __init xen_init_capabilities(void)
>   setup_clear_cpu_cap(X86_FEATURE_MTRR);
>   setup_clear_cpu_cap(X86_FEATURE_ACC);
>   setup_clear_cpu_cap(X86_FEATURE_X2APIC);
> + setup_clear_cpu_cap(X86_FEATURE_SME);
>  
>   if (!xen_initial_domain())
>   setup_clear_cpu_cap(X86_FEATURE_ACPI);

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v7 31/36] x86/mm, kexec: Allow kexec to be used with SME

2017-06-23 Thread Borislav Petkov

On Fri, Jun 16, 2017 at 01:55:45PM -0500, Tom Lendacky wrote:
> Provide support so that kexec can be used to boot a kernel when SME is
> enabled.
> 
> Support is needed to allocate pages for kexec without encryption.  This
> is needed in order to be able to reboot in the kernel in the same manner
> as originally booted.
> 
> Additionally, when shutting down all of the CPUs we need to be sure to
> flush the caches and then halt. This is needed when booting from a state
> where SME was not active into a state where SME is active (or vice-versa).
> Without these steps, it is possible for cache lines to exist for the same
> physical location but tagged both with and without the encryption bit. This
> can cause random memory corruption when caches are flushed depending on
> which cacheline is written last.
> 
> Signed-off-by: Tom Lendacky 
> ---
>  arch/x86/include/asm/init.h  |1 +
>  arch/x86/include/asm/kexec.h |8 
>  arch/x86/include/asm/pgtable_types.h |1 +
>  arch/x86/kernel/machine_kexec_64.c   |   22 +-
>  arch/x86/kernel/process.c|   17 +++--
>  arch/x86/mm/ident_map.c  |   12 
>  include/linux/kexec.h|   14 ++
>  kernel/kexec_core.c  |   12 +++-
>  8 files changed, 79 insertions(+), 8 deletions(-)

...

> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index c9481eb..5d17fd6 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -334,6 +334,20 @@ static inline void *boot_phys_to_virt(unsigned long 
> entry)
>   return phys_to_virt(boot_phys_to_phys(entry));
>  }
>  
> +#ifndef arch_kexec_post_alloc_pages
> +static inline int arch_kexec_post_alloc_pages(void *vaddr, unsigned int 
> pages,
> +   gfp_t gfp)
> +{
> + return 0;
> +}
> +#endif

Just a nitpick:

static inline int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, 
gfp_t gfp) { return 0; }
static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { 
}

Other than that:

Reviewed-by: Borislav Petkov 

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

1 2 3 4 >

1 - 100 of 313 matches

Mail list logo