Re: [PATCH] sched: pull tasks when CPU is about to run SCHED_IDLE tasks

2021-01-11 Thread He Chen
On Wed, Dec 23, 2020 at 12:30:26PM +0100, Vincent Guittot wrote:
> On Wed, 23 Dec 2020 at 09:32,  wrote:
> >
> > From: Chen Xiaoguang 
> >
> > Before a CPU switches from running SCHED_NORMAL task to
> > SCHED_IDLE task, trying to pull SCHED_NORMAL tasks from other
> 
> Could you explain more in detail why you only care about this use case
> in particular and not the general case?
> 

We want to run online tasks with the SCHED_NORMAL policy and offline tasks
with the SCHED_IDLE policy. Online and offline tasks share the same machine
so that it is used efficiently.
The online tasks sleep most of the time but must respond quickly once they
wake up. The offline tasks have low priority and should run only when no
online task is runnable.

The online tasks are more important than the offline tasks and are latency
sensitive, so we should make sure that online tasks preempt offline tasks
as soon as possible whenever online tasks are waiting to run.
So in our situation we want a SCHED_NORMAL task to run whenever one is
runnable.

Let's assume we have 2 CPUs:
on CPU1 we have 2 SCHED_NORMAL tasks;
on CPU2 we have 1 SCHED_NORMAL task and 2 SCHED_IDLE tasks.

          CPU1                      CPU2
     curr       rq1            curr          rq2
   +------+ | +------+       +------+ | +----+ +----+
t0 |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
   +------+ | +------+       +------+ | +----+ +----+

                             NORMAL exits or blocked
   +------+ | +------+                | +----+ +----+
t1 |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
   +------+ | +------+                | +----+ +----+

                             pick_next_task_fair
   +------+ | +------+         +----+ | +----+
t2 |NORMAL| | |NORMAL|         |IDLE| | |IDLE|
   +------+ | +------+         +----+ | +----+

                             SCHED_IDLE running
   +------+ | +------+         +----+ | +----+
t3 |NORMAL| | |NORMAL|         |IDLE| | |IDLE|
   +------+ | +------+         +----+ | +----+

                             run_rebalance_domains
   +------+ |                +------+ | +----+ +----+
t4 |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
   +------+ |                +------+ | +----+ +----+

As we can see:
t1: the NORMAL task on CPU2 exits or blocks.
t2: pick_next_task_fair on CPU2 picks a SCHED_IDLE task to run while
another SCHED_NORMAL task is waiting in rq1.
t3: the SCHED_IDLE task runs on CPU2 while a SCHED_NORMAL task waits on CPU1.
t4: after a short time, the periodic load_balance is triggered and pulls
the SCHED_NORMAL task from rq1 to rq2, where it likely preempts the
SCHED_IDLE task.

In this scenario, a SCHED_IDLE task is running while a SCHED_NORMAL task is
waiting to run.
The latency of this SCHED_NORMAL task will be high, which is not acceptable.

Doing a load_balance before running the SCHED_IDLE task may fix this problem.

The patch works as follows:

          CPU1                      CPU2
     curr       rq1            curr          rq2
   +------+ | +------+       +------+ | +----+ +----+
t0 |NORMAL| | |NORMAL|       |NORMAL| | |IDLE| |IDLE|
   +------+ | +------+       +------+ | +----+ +----+

                             NORMAL exits or blocked
   +------+ | +------+                | +----+ +----+
t1 |NORMAL| | |NORMAL|                | |IDLE| |IDLE|
   +------+ | +------+                | +----+ +----+

t2                           pick_next_task_fair (all se are SCHED_IDLE)

                             newidle_balance
   +------+ |                +------+ | +----+ +----+
t3 |NORMAL| |                |NORMAL| | |IDLE| |IDLE|
   +------+ |                +------+ | +----+ +----+


t1: the NORMAL task on CPU2 exits or blocks.
t2: pick_next_task_fair sees that all se in the rbtree are SCHED_IDLE and
calls newidle_balance, which tries to pull a SCHED_NORMAL task (if there is one).
t3: pick_next_task_fair then (likely) picks a SCHED_NORMAL task to run
instead of a SCHED_IDLE one.
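
The t1..t3 sequence above can be modelled in user space. What follows is only a rough sketch under stated assumptions, not the kernel code: `toy_rq`, `rq_all_idle`, `pull_one_normal` and `toy_pick_next` are hypothetical names, a run queue is reduced to a list of task policies, and "pulling" simply moves one waiting NORMAL entry from the other queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified policies standing in for SCHED_NORMAL / SCHED_IDLE. */
enum policy { POLICY_NORMAL, POLICY_IDLE };

#define MAX_TASKS 8

/* Toy run queue: only the policy of each queued (waiting) task is tracked. */
struct toy_rq {
	enum policy queued[MAX_TASKS];
	size_t nr;
};

/* True if no NORMAL task is queued (also true for an empty queue). */
static bool rq_all_idle(const struct toy_rq *rq)
{
	for (size_t i = 0; i < rq->nr; i++)
		if (rq->queued[i] == POLICY_NORMAL)
			return false;
	return true;
}

/* Move one waiting NORMAL task from src to dst; returns true if one moved. */
static bool pull_one_normal(struct toy_rq *dst, struct toy_rq *src)
{
	assert(dst->nr < MAX_TASKS);
	for (size_t i = 0; i < src->nr; i++) {
		if (src->queued[i] != POLICY_NORMAL)
			continue;
		dst->queued[dst->nr++] = POLICY_NORMAL;
		src->queued[i] = src->queued[--src->nr];	/* swap-remove */
		return true;
	}
	return false;
}

/*
 * Policy of the task that would run next on 'self'. When only IDLE-policy
 * tasks are queued locally, try to pull a NORMAL task from 'other' first.
 * POLICY_IDLE is returned when no NORMAL task could be found anywhere.
 */
static enum policy toy_pick_next(struct toy_rq *self, struct toy_rq *other)
{
	if (rq_all_idle(self))
		pull_one_normal(self, other);
	return rq_all_idle(self) ? POLICY_IDLE : POLICY_NORMAL;
}
```

With rq1 holding one waiting NORMAL task and rq2 holding two IDLE tasks, `toy_pick_next(&rq2, &rq1)` moves the NORMAL task to rq2 and reports it as the next task to run, which corresponds to step t3.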

> > CPU by doing load_balance first.
> >
> > Signed-off-by: Chen Xiaoguang 
> > Signed-off-by: Chen He 
> > ---
> >  kernel/sched/fair.c | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index ae7ceba..0a26132 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7004,6 +7004,11 @@ struct task_struct *
> > struct task_struct *p;
> > int new_tasks;
> >
> > +   if (prev &&
> > +   fair_policy(prev->policy) &&
> 
> Why do you need a prev and fair task  ? You seem to target the special
> case of pick_next_task  but in this case why not only testing rf!=null
>  to make sure to not return immediately after jumping to the idle
> label?
> 

We want to do load_balance only when the CPU switches from SCHED_NORMAL
to SCHED_IDLE.
If we did not check prev, then whenever all the runnable tasks are SCHED_IDLE
we would call newidle_balance every time in pick_next_task_fair, which
makes no sense and is wasteful.
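
The intent of the prev check can be written as a small predicate. This is a sketch of the reasoning only; the quoted hunk is cut off after `fair_policy(prev->policy) &&`, so the rest of the real condition is not visible here. `toy_task`, `should_try_pull` and the two counters are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified policies standing in for SCHED_NORMAL / SCHED_IDLE. */
enum policy { POLICY_NORMAL, POLICY_IDLE };

struct toy_task {
	enum policy policy;
};

/*
 * Decide whether an extra pull attempt is worthwhile before picking the
 * next task.
 *   prev           - task that was running until now (may be NULL)
 *   nr_queued      - number of tasks queued on this CPU
 *   nr_idle_queued - how many of those have the IDLE policy
 *
 * The pull is attempted only on a NORMAL -> IDLE transition: prev was a
 * NORMAL task and everything left in the queue is IDLE. If prev was already
 * an IDLE task, the pull is skipped so it is not repeated on every pick.
 */
static bool should_try_pull(const struct toy_task *prev,
			    unsigned int nr_queued,
			    unsigned int nr_idle_queued)
{
	assert(nr_idle_queued <= nr_queued);

	if (prev == NULL || prev->policy != POLICY_NORMAL)
		return false;

	/* An empty queue is assumed to be handled by the ordinary idle path. */
	return nr_queued > 0 && nr_queued == nr_idle_queued;
}
```

For example, `should_try_pull(&normal_task, 2, 2)` is true, while the same queue with an IDLE `prev` gives false.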

> Also why not doing 

[tip:x86/cpufeature] x86/cpuid: Provide get_scattered_cpuid_leaf()

2016-11-16 Thread tip-bot for He Chen
Commit-ID:  47bdf3378d62a627cfb8a54e1180c08d67078b61
Gitweb: http://git.kernel.org/tip/47bdf3378d62a627cfb8a54e1180c08d67078b61
Author: He Chen <he.c...@linux.intel.com>
AuthorDate: Fri, 11 Nov 2016 17:25:35 +0800
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Wed, 16 Nov 2016 11:13:09 +0100

x86/cpuid: Provide get_scattered_cpuid_leaf()

Sparse populated CPUID leafs are collected in a software provided leaf to
avoid bloat of the x86_capability array, but there is no way to rebuild the
real leafs (e.g. for KVM CPUID enumeration) other than rereading the CPUID
leaf from the CPU. While this is possible it is problematic as it does not
take software disabled features into account. If a feature is disabled on
the host it should not be exposed to a guest either.

Add get_scattered_cpuid_leaf() which rebuilds the leaf from the scattered
cpuid table information and the active CPU features.

[ tglx: Rewrote changelog ]

Signed-off-by: He Chen <he.c...@linux.intel.com>
Reviewed-by: Borislav Petkov <b...@suse.de>
Cc: Luwei Kang <luwei.k...@intel.com>
Cc: k...@vger.kernel.org
Cc: Radim Krčmář <rkrc...@redhat.com>
Cc: Piotr Luc <piotr@intel.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Link: 
http://lkml.kernel.org/r/1478856336-9388-3-git-send-email-he.c...@linux.intel.com
Signed-off-by: Thomas Gleixner <t...@linutronix.de>

---
 arch/x86/include/asm/processor.h |  3 +++
 arch/x86/kernel/cpu/scattered.c  | 49 ++--
 2 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 8f6ac5b..e7f8c62 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -189,6 +189,9 @@ extern void identify_secondary_cpu(struct cpuinfo_x86 *);
 extern void print_cpu_info(struct cpuinfo_x86 *);
 void print_cpu_msr(struct cpuinfo_x86 *);
 extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
+extern u32 get_scattered_cpuid_leaf(unsigned int level,
+   unsigned int sub_leaf,
+   enum cpuid_regs_idx reg);
 extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c);
 extern void init_amd_cacheinfo(struct cpuinfo_x86 *c);
 
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index dbb470e..d1316f9 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -17,24 +17,25 @@ struct cpuid_bit {
u32 sub_leaf;
 };
 
+/* Please keep the leaf sorted by cpuid_bit.level for faster search. */
+static const struct cpuid_bit cpuid_bits[] = {
+   { X86_FEATURE_APERFMPERF,    CPUID_ECX,  0, 0x00000006, 0 },
+   { X86_FEATURE_EPB,           CPUID_ECX,  3, 0x00000006, 0 },
+   { X86_FEATURE_INTEL_PT,      CPUID_EBX, 25, 0x00000007, 0 },
+   { X86_FEATURE_AVX512_4VNNIW, CPUID_EDX,  2, 0x00000007, 0 },
+   { X86_FEATURE_AVX512_4FMAPS, CPUID_EDX,  3, 0x00000007, 0 },
+   { X86_FEATURE_HW_PSTATE,     CPUID_EDX,  7, 0x80000007, 0 },
+   { X86_FEATURE_CPB,           CPUID_EDX,  9, 0x80000007, 0 },
+   { X86_FEATURE_PROC_FEEDBACK, CPUID_EDX, 11, 0x80000007, 0 },
+   { 0, 0, 0, 0, 0 }
+};
+
 void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
 {
u32 max_level;
u32 regs[4];
const struct cpuid_bit *cb;
 
-   static const struct cpuid_bit cpuid_bits[] = {
-   { X86_FEATURE_INTEL_PT,      CPUID_EBX, 25, 0x00000007, 0 },
-   { X86_FEATURE_AVX512_4VNNIW, CPUID_EDX,  2, 0x00000007, 0 },
-   { X86_FEATURE_AVX512_4FMAPS, CPUID_EDX,  3, 0x00000007, 0 },
-   { X86_FEATURE_APERFMPERF,    CPUID_ECX,  0, 0x00000006, 0 },
-   { X86_FEATURE_EPB,           CPUID_ECX,  3, 0x00000006, 0 },
-   { X86_FEATURE_HW_PSTATE,     CPUID_EDX,  7, 0x80000007, 0 },
-   { X86_FEATURE_CPB,           CPUID_EDX,  9, 0x80000007, 0 },
-   { X86_FEATURE_PROC_FEEDBACK, CPUID_EDX, 11, 0x80000007, 0 },
-   { 0, 0, 0, 0, 0 }
-   };
-
for (cb = cpuid_bits; cb->feature; cb++) {
 
/* Verify that the level is valid */
@@ -51,3 +52,27 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
set_cpu_cap(c, cb->feature);
}
 }
+
+u32 get_scattered_cpuid_leaf(unsigned int level, unsigned int sub_leaf,
+enum cpuid_regs_idx reg)
+{
+   const struct cpuid_bit *cb;
+   u32 cpuid_val = 0;
+
+   for (cb = cpuid_bits; cb->feature; cb++) {
+
+   if (level > cb->level)
+   continue;
+
+   if (level < cb->level)
+   break;
+
+   if (reg == cb->reg && sub_leaf == cb->sub_leaf) {
+

[tip:x86/cpufeature] x86/cpuid: Provide get_scattered_cpuid_leaf()

2016-11-16 Thread tip-bot for He Chen
Commit-ID:  47bdf3378d62a627cfb8a54e1180c08d67078b61
Gitweb: http://git.kernel.org/tip/47bdf3378d62a627cfb8a54e1180c08d67078b61
Author: He Chen 
AuthorDate: Fri, 11 Nov 2016 17:25:35 +0800
Committer:  Thomas Gleixner 
CommitDate: Wed, 16 Nov 2016 11:13:09 +0100

x86/cpuid: Provide get_scattered_cpuid_leaf()

Sparse populated CPUID leafs are collected in a software provided leaf to
avoid bloat of the x86_capability array, but there is no way to rebuild the
real leafs (e.g. for KVM CPUID enumeration) other than rereading the CPUID
leaf from the CPU. While this is possible it is problematic as it does not
take software disabled features into account. If a feature is disabled on
the host it should not be exposed to a guest either.

Add get_scattered_cpuid_leaf() which rebuilds the leaf from the scattered
cpuid table information and the active CPU features.

[ tglx: Rewrote changelog ]

Signed-off-by: He Chen 
Reviewed-by: Borislav Petkov 
Cc: Luwei Kang 
Cc: k...@vger.kernel.org
Cc: Radim Krčmář 
Cc: Piotr Luc 
Cc: Borislav Petkov 
Cc: Paolo Bonzini 
Link: 
http://lkml.kernel.org/r/1478856336-9388-3-git-send-email-he.c...@linux.intel.com
Signed-off-by: Thomas Gleixner 

---
 arch/x86/include/asm/processor.h |  3 +++
 arch/x86/kernel/cpu/scattered.c  | 49 ++--
 2 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 8f6ac5b..e7f8c62 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -189,6 +189,9 @@ extern void identify_secondary_cpu(struct cpuinfo_x86 *);
 extern void print_cpu_info(struct cpuinfo_x86 *);
 void print_cpu_msr(struct cpuinfo_x86 *);
 extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
+extern u32 get_scattered_cpuid_leaf(unsigned int level,
+   unsigned int sub_leaf,
+   enum cpuid_regs_idx reg);
 extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c);
 extern void init_amd_cacheinfo(struct cpuinfo_x86 *c);
 
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index dbb470e..d1316f9 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -17,24 +17,25 @@ struct cpuid_bit {
u32 sub_leaf;
 };
 
+/* Please keep the leaf sorted by cpuid_bit.level for faster search. */
+static const struct cpuid_bit cpuid_bits[] = {
+   { X86_FEATURE_APERFMPERF,   CPUID_ECX,  0, 0x0006, 0 },
+   { X86_FEATURE_EPB,  CPUID_ECX,  3, 0x0006, 0 },
+   { X86_FEATURE_INTEL_PT, CPUID_EBX, 25, 0x0007, 0 },
+   { X86_FEATURE_AVX512_4VNNIW,CPUID_EDX,  2, 0x0007, 0 },
+   { X86_FEATURE_AVX512_4FMAPS,CPUID_EDX,  3, 0x0007, 0 },
+   { X86_FEATURE_HW_PSTATE,CPUID_EDX,  7, 0x8007, 0 },
+   { X86_FEATURE_CPB,  CPUID_EDX,  9, 0x8007, 0 },
+   { X86_FEATURE_PROC_FEEDBACK,CPUID_EDX, 11, 0x8007, 0 },
+   { 0, 0, 0, 0, 0 }
+};
+
 void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
 {
u32 max_level;
u32 regs[4];
const struct cpuid_bit *cb;
 
-   static const struct cpuid_bit cpuid_bits[] = {
-   { X86_FEATURE_INTEL_PT, CPUID_EBX, 25, 0x0007, 0 },
-   { X86_FEATURE_AVX512_4VNNIW,CPUID_EDX,  2, 0x0007, 0 },
-   { X86_FEATURE_AVX512_4FMAPS,CPUID_EDX,  3, 0x0007, 0 },
-   { X86_FEATURE_APERFMPERF,   CPUID_ECX,  0, 0x0006, 0 },
-   { X86_FEATURE_EPB,  CPUID_ECX,  3, 0x0006, 0 },
-   { X86_FEATURE_HW_PSTATE,CPUID_EDX,  7, 0x8007, 0 },
-   { X86_FEATURE_CPB,  CPUID_EDX,  9, 0x8007, 0 },
-   { X86_FEATURE_PROC_FEEDBACK,CPUID_EDX, 11, 0x8007, 0 },
-   { 0, 0, 0, 0, 0 }
-   };
-
for (cb = cpuid_bits; cb->feature; cb++) {
 
/* Verify that the level is valid */
@@ -51,3 +52,27 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
set_cpu_cap(c, cb->feature);
}
 }
+
+u32 get_scattered_cpuid_leaf(unsigned int level, unsigned int sub_leaf,
+enum cpuid_regs_idx reg)
+{
+   const struct cpuid_bit *cb;
+   u32 cpuid_val = 0;
+
+   for (cb = cpuid_bits; cb->feature; cb++) {
+
+   if (level > cb->level)
+   continue;
+
+   if (level < cb->level)
+   break;
+
+   if (reg == cb->reg && sub_leaf == cb->sub_leaf) {
+   if (cpu_has(&boot_cpu_data, cb->feature))
+   cpuid_val |= BIT(cb->bit);
+   }
+   }
+
+   return cpuid_val;
+}
+EXPORT_SYMBOL_GPL(get_scattered_cpuid_leaf);
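
As an illustration of the lookup described in the changelog, here is a minimal user-space model. It is a sketch, not the kernel implementation: the feature IDs, `toy_enabled` and `toy_has` are hypothetical stand-ins for the X86_FEATURE_* bits and `cpu_has()`, and the table holds only three made-up entries. Like the patch, it relies on the table being sorted by level so the loop can stop early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum reg_idx { REG_EAX = 0, REG_EBX, REG_ECX, REG_EDX };

/* Hypothetical feature IDs; 0 is reserved as the table terminator. */
enum toy_feature { FEAT_NONE = 0, FEAT_A, FEAT_B, FEAT_C };

struct toy_cpuid_bit {
	enum toy_feature feature;
	enum reg_idx reg;
	unsigned int bit;
	uint32_t level;
	uint32_t sub_leaf;
};

/* Must stay sorted by level: the lookup below breaks out early. */
static const struct toy_cpuid_bit toy_bits[] = {
	{ FEAT_A, REG_ECX, 0, 0x00000006, 0 },
	{ FEAT_B, REG_EDX, 2, 0x00000007, 0 },
	{ FEAT_C, REG_EDX, 3, 0x00000007, 0 },
	{ FEAT_NONE, REG_EAX, 0, 0, 0 }
};

/* Bit n set means feature n is enabled on the (imaginary) host. */
static uint32_t toy_enabled;

static bool toy_has(enum toy_feature f)
{
	return (toy_enabled & (1u << f)) != 0;
}

/* Rebuild one register of one leaf from the table and the enabled set. */
static uint32_t toy_scattered_leaf(uint32_t level, uint32_t sub_leaf,
				   enum reg_idx reg)
{
	uint32_t val = 0;

	for (const struct toy_cpuid_bit *cb = toy_bits; cb->feature; cb++) {
		if (level > cb->level)
			continue;	/* not there yet */
		if (level < cb->level)
			break;		/* sorted table: nothing further matches */

		assert(cb->bit < 32);
		if (reg == cb->reg && sub_leaf == cb->sub_leaf &&
		    toy_has(cb->feature))
			val |= 1u << cb->bit;
	}
	return val;
}
```

A feature that is cleared in `toy_enabled` never shows up in the rebuilt register, which is the property the changelog asks for: a feature disabled on the host is not exposed to a guest.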


[tip:x86/cpufeature] x86/cpuid: Cleanup cpuid_regs definitions

2016-11-16 Thread tip-bot for He Chen
Commit-ID:  47f10a36003eaf493125a5e6687dd1ff775bfd8c
Gitweb: http://git.kernel.org/tip/47f10a36003eaf493125a5e6687dd1ff775bfd8c
Author: He Chen <he.c...@linux.intel.com>
AuthorDate: Fri, 11 Nov 2016 17:25:34 +0800
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Wed, 16 Nov 2016 11:13:09 +0100

x86/cpuid: Cleanup cpuid_regs definitions

cpuid_regs is defined multiple times as structure and enum. Rename the enum
and move all of it to processor.h so we don't end up with more instances.

Rename the misnomed register enumeration from CR_* to the obvious CPUID_*.

[ tglx: Rewrote changelog ]

Signed-off-by: He Chen <he.c...@linux.intel.com>
Reviewed-by: Borislav Petkov <b...@alien8.de>
Cc: Luwei Kang <luwei.k...@intel.com>
Cc: k...@vger.kernel.org
Cc: Radim Krčmář <rkrc...@redhat.com>
Cc: Piotr Luc <piotr@intel.com>
Cc: Paolo Bonzini <pbonz...@redhat.com>
Link: 
http://lkml.kernel.org/r/1478856336-9388-2-git-send-email-he.c...@linux.intel.com
Signed-off-by: Thomas Gleixner <t...@linutronix.de>

---
 arch/x86/events/intel/pt.c   | 45 +---
 arch/x86/include/asm/processor.h | 11 ++
 arch/x86/kernel/cpu/scattered.c  | 28 ++---
 arch/x86/kernel/cpuid.c  |  4 
 4 files changed, 41 insertions(+), 47 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index c5047b8..1c1b9fe 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -36,13 +36,6 @@ static DEFINE_PER_CPU(struct pt, pt_ctx);
 
 static struct pt_pmu pt_pmu;
 
-enum cpuid_regs {
-   CR_EAX = 0,
-   CR_ECX,
-   CR_EDX,
-   CR_EBX
-};
-
 /*
  * Capabilities of Intel PT hardware, such as number of address bits or
  * supported output schemes, are cached and exported to userspace as "caps"
@@ -64,21 +57,21 @@ static struct pt_cap_desc {
u8  reg;
u32 mask;
 } pt_caps[] = {
-   PT_CAP(max_subleaf, 0, CR_EAX, 0xffffffff),
-   PT_CAP(cr3_filtering,   0, CR_EBX, BIT(0)),
-   PT_CAP(psb_cyc, 0, CR_EBX, BIT(1)),
-   PT_CAP(ip_filtering,0, CR_EBX, BIT(2)),
-   PT_CAP(mtc, 0, CR_EBX, BIT(3)),
-   PT_CAP(ptwrite, 0, CR_EBX, BIT(4)),
-   PT_CAP(power_event_trace,   0, CR_EBX, BIT(5)),
-   PT_CAP(topa_output, 0, CR_ECX, BIT(0)),
-   PT_CAP(topa_multiple_entries,   0, CR_ECX, BIT(1)),
-   PT_CAP(single_range_output, 0, CR_ECX, BIT(2)),
-   PT_CAP(payloads_lip,0, CR_ECX, BIT(31)),
-   PT_CAP(num_address_ranges,  1, CR_EAX, 0x3),
-   PT_CAP(mtc_periods, 1, CR_EAX, 0xffff0000),
-   PT_CAP(cycle_thresholds,1, CR_EBX, 0xffff),
-   PT_CAP(psb_periods, 1, CR_EBX, 0xffff0000),
+   PT_CAP(max_subleaf, 0, CPUID_EAX, 0xffffffff),
+   PT_CAP(cr3_filtering,   0, CPUID_EBX, BIT(0)),
+   PT_CAP(psb_cyc, 0, CPUID_EBX, BIT(1)),
+   PT_CAP(ip_filtering,0, CPUID_EBX, BIT(2)),
+   PT_CAP(mtc, 0, CPUID_EBX, BIT(3)),
+   PT_CAP(ptwrite, 0, CPUID_EBX, BIT(4)),
+   PT_CAP(power_event_trace,   0, CPUID_EBX, BIT(5)),
+   PT_CAP(topa_output, 0, CPUID_ECX, BIT(0)),
+   PT_CAP(topa_multiple_entries,   0, CPUID_ECX, BIT(1)),
+   PT_CAP(single_range_output, 0, CPUID_ECX, BIT(2)),
+   PT_CAP(payloads_lip,0, CPUID_ECX, BIT(31)),
+   PT_CAP(num_address_ranges,  1, CPUID_EAX, 0x3),
+   PT_CAP(mtc_periods, 1, CPUID_EAX, 0xffff0000),
+   PT_CAP(cycle_thresholds,1, CPUID_EBX, 0xffff),
+   PT_CAP(psb_periods, 1, CPUID_EBX, 0xffff0000),
 };
 
 static u32 pt_cap_get(enum pt_capabilities cap)
@@ -213,10 +206,10 @@ static int __init pt_pmu_hw_init(void)
 
for (i = 0; i < PT_CPUID_LEAVES; i++) {
cpuid_count(20, i,
-   &pt_pmu.caps[CR_EAX + i*PT_CPUID_REGS_NUM],
-   &pt_pmu.caps[CR_EBX + i*PT_CPUID_REGS_NUM],
-   &pt_pmu.caps[CR_ECX + i*PT_CPUID_REGS_NUM],
-   &pt_pmu.caps[CR_EDX + i*PT_CPUID_REGS_NUM]);
+   &pt_pmu.caps[CPUID_EAX + i*PT_CPUID_REGS_NUM],
+   &pt_pmu.caps[CPUID_EBX + i*PT_CPUID_REGS_NUM],
+   &pt_pmu.caps[CPUID_ECX + i*PT_CPUID_REGS_NUM],
+   &pt_pmu.caps[CPUID_EDX + i*PT_CPUID_REGS_NUM]);
}
 
ret = -ENOMEM;
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 984a7bf..8f6ac5b 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -137,6 +137,17 @@ struct cpuinfo_x86 {
u32 microcode;
 };
 
+struct cpuid_regs {
+   u32 eax, ebx, ecx, edx;
+};
+
+enum cpuid_regs_idx {
+   CPUID_EAX = 0,
+   CPUID_EBX,
+   CPUID_ECX,
+   CPUID_EDX,
+};
+
 #define X86_VENDOR_INTEL   0
 #define X86_VENDOR_CYRIX   1
 #define X86_VENDOR_AMD
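
One visible effect of the cleanup is that the new enum lists the registers in the same order as the fields of `struct cpuid_regs` (EAX, EBX, ECX, EDX), whereas the removed enum in pt.c used EAX, ECX, EDX, EBX. A small user-space sketch of selecting a register by index follows; the `toy_` names are hypothetical, and the switch-based accessor is one possible way to do it rather than anything the patch itself contains.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the layout added to processor.h, under toy names. */
struct toy_cpuid_regs {
	uint32_t eax, ebx, ecx, edx;
};

enum toy_cpuid_regs_idx {
	TOY_CPUID_EAX = 0,
	TOY_CPUID_EBX,
	TOY_CPUID_ECX,
	TOY_CPUID_EDX,
};

/* Select one register by index; an explicit switch avoids pointer tricks. */
static uint32_t toy_cpuid_reg(const struct toy_cpuid_regs *r,
			      enum toy_cpuid_regs_idx idx)
{
	switch (idx) {
	case TOY_CPUID_EAX: return r->eax;
	case TOY_CPUID_EBX: return r->ebx;
	case TOY_CPUID_ECX: return r->ecx;
	case TOY_CPUID_EDX: return r->edx;
	}
	assert(0 && "invalid register index");
	return 0;
}
```

Given `struct toy_cpuid_regs r = { 1, 2, 3, 4 };`, `toy_cpuid_reg(&r, TOY_CPUID_EDX)` yields 4.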

Re: [Patch v6.1] x86/kvm: Add AVX512_4VNNIW and AVX512_4FMAPS support

2016-11-14 Thread He Chen
On Tue, Nov 15, 2016 at 04:24:39AM +0800, kbuild test robot wrote:
> Hi He,
> 
> [auto build test ERROR on kvm/linux-next]
> [also build test ERROR on v4.9-rc5]
> [cannot apply to next-20161114]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/He-Chen/x86-kvm-Add-AVX512_4VNNIW-and-AVX512_4FMAPS-support/20161114-170941
> base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
> config: x86_64-kexec (attached as .config)
> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64 
> 
> All errors (new ones prefixed by >>):
> 
>arch/x86/kvm/cpuid.c: In function '__do_cpuid_ent':
> >> arch/x86/kvm/cpuid.c:472:18: error: implicit declaration of function 
> >> 'get_scattered_cpuid_leaf' [-Werror=implicit-function-declaration]
>entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
>  ^~~~
> >> arch/x86/kvm/cpuid.c:472:49: error: 'CPUID_EDX' undeclared (first use in 
> >> this function)
>entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
> ^
>arch/x86/kvm/cpuid.c:472:49: note: each undeclared identifier is reported 
> only once for each function it appears in
>cc1: some warnings being treated as errors
>
I downloaded the attached .config.gz and used the .config in it to build
the kernel on my local branch again, and I don't see any warning or
error message.

I wonder whether the previous 0001 and 0002 patches were applied before
running this test? Or is there something wrong with my compiler or patches?

Thanks,
-He


Re: [PATCH v6 3/3] x86/kvm: Add AVX512_4VNNIW and AVX512_4FMAPS support

2016-11-14 Thread He Chen
On Mon, Nov 14, 2016 at 06:58:22AM +0100, Borislav Petkov wrote:
> On Mon, Nov 14, 2016 at 09:41:04AM +0800, He Chen wrote:
> > Yep, Luwei wrote it and I send it on behalf of him.
> 
> Then it needs to have the following format so that tools can pick up the
> proper author:
> 
> "From: Luwei ...
> 
> 
> 
> Signed-off-by: He Chen...
> Signed-off-by: Luwei...
> ...
> "
> 
> git format-patch gives that formatting.
> 
> If you want to change the ownership, do the following on the local
> commit:
> 
> $ git commit --amend --author="Luwei Kang <luwei.k...@intel.com>"
> 
> in case it lists you locally as author.
> 
> HTH.
> 
I am not sure whether it is OK to reply with the amended patch in this
thread, or should I send another [Patch v6.1] patchset?

Thanks,
-He


[Patch v6.1] x86/kvm: Add AVX512_4VNNIW and AVX512_4FMAPS support

2016-11-14 Thread He Chen
From 2daa60b3c6ab5aa6414ebb33119a34403dad2048 Mon Sep 17 00:00:00 2001
From: Luwei Kang <luwei.k...@intel.com>
Date: Mon, 7 Nov 2016 14:03:20 +0800
Subject: [Patch v6.1] x86/kvm: Add AVX512_4VNNIW and AVX512_4FMAPS support

Add two new AVX512 subfeatures support for KVM guest.

AVX512_4VNNIW:
Vector instructions for deep learning enhanced word variable precision.

AVX512_4FMAPS:
Vector instructions for deep learning floating-point single precision.

Reviewed-by: Borislav Petkov <b...@suse.de>
Signed-off-by: He Chen <he.c...@linux.intel.com>
Signed-off-by: Luwei Kang <luwei.k...@intel.com>
---
 arch/x86/kvm/cpuid.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index afa7bbb..ddcdf7c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include  /* For use_eager_fpu.  Ugh! */
 #include 
 #include 
@@ -65,6 +66,11 @@ u64 kvm_supported_xcr0(void)
 
 #define F(x) bit(X86_FEATURE_##x)
 
+/* These are scattered features in cpufeatures.h. */
+#define KVM_CPUID_BIT_AVX512_4VNNIW 2
+#define KVM_CPUID_BIT_AVX512_4FMAPS 3
+#define KF(x) bit(KVM_CPUID_BIT_##x)
+
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
 {
struct kvm_cpuid_entry2 *best;
@@ -376,6 +382,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
/* cpuid 7.0.ecx*/
const u32 kvm_cpuid_7_0_ecx_x86_features = F(PKU) | 0 /*OSPKE*/;
 
+   /* cpuid 7.0.edx*/
+   const u32 kvm_cpuid_7_0_edx_x86_features =
+   KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
 
@@ -458,12 +468,14 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
/* PKU is not yet implemented for shadow paging. */
if (!tdp_enabled)
entry->ecx &= ~F(PKU);
+   entry->edx &= kvm_cpuid_7_0_edx_x86_features;
+   entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
} else {
entry->ebx = 0;
entry->ecx = 0;
+   entry->edx = 0;
}
entry->eax = 0;
-   entry->edx = 0;
break;
}
case 9:
-- 
2.7.4
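
The two masking steps added for CPUID.7.0:EDX can be sketched in user space as follows. This is an illustration under assumptions, not KVM code: `toy_bit` and `toy_guest_edx` are hypothetical names, and the host-enabled mask is passed in as a parameter instead of being obtained from `get_scattered_cpuid_leaf(7, 0, CPUID_EDX)`.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions taken from the patch (KVM_CPUID_BIT_*). */
#define TOY_BIT_AVX512_4VNNIW 2
#define TOY_BIT_AVX512_4FMAPS 3

static uint32_t toy_bit(unsigned int n)
{
	assert(n < 32);
	return 1u << n;
}

/*
 * Compute the EDX value a guest would see for leaf 7, sub-leaf 0.
 *   hw_edx       - raw EDX as reported by the CPUID instruction
 *   host_enabled - EDX rebuilt from the features enabled on the host
 * A bit survives only if the hardware reports it, KVM knows about it,
 * and the host has not disabled the feature.
 */
static uint32_t toy_guest_edx(uint32_t hw_edx, uint32_t host_enabled)
{
	const uint32_t kvm_supported = toy_bit(TOY_BIT_AVX512_4VNNIW) |
				       toy_bit(TOY_BIT_AVX512_4FMAPS);
	uint32_t edx = hw_edx;

	edx &= kvm_supported;	/* drop bits KVM does not virtualize */
	edx &= host_enabled;	/* drop bits disabled on the host */
	return edx;
}
```

For instance, with every hardware bit set and only AVX512_4VNNIW enabled on the host, `toy_guest_edx(0xffffffffu, 0x4u)` gives `0x4`.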



Re: [PATCH v6 3/3] x86/kvm: Add AVX512_4VNNIW and AVX512_4FMAPS support

2016-11-13 Thread He Chen
On Sat, Nov 12, 2016 at 01:53:29PM +0100, Borislav Petkov wrote:
> On Fri, Nov 11, 2016 at 05:25:36PM +0800, He Chen wrote:
> > Add two new AVX512 subfeatures support for KVM guest.
> > 
> > AVX512_4VNNIW:
> > Vector instructions for deep learning enhanced word variable precision.
> > 
> > AVX512_4FMAPS:
> > Vector instructions for deep learning floating-point single precision.
> > 
> > Reviewed-by: Borislav Petkov <b...@suse.de>
> > Signed-off-by: Luwei Kang <luwei.k...@intel.com>
> > Signed-off-by: He Chen <he.c...@linux.intel.com>
> > ---
> 
> Whoops, I said it looked ok but missed that SOB chain above.
> 
> What does it mean? Did Luwei wrote the patch and you're sending it or
> ...?
> 
Yep, Luwei wrote it and I am sending it on his behalf.

Thanks,
-He


[PATCH v6 1/3] x86/cpuid: Cleanup cpuid_regs definitions

2016-11-11 Thread He Chen
Make the cpuid_regs definitions clearer and avoid a potential name clash.

Signed-off-by: He Chen <he.c...@linux.intel.com>
---
 arch/x86/events/intel/pt.c   | 45 +---
 arch/x86/include/asm/processor.h | 11 ++
 arch/x86/kernel/cpu/scattered.c  | 28 ++---
 arch/x86/kernel/cpuid.c  |  4 
 4 files changed, 41 insertions(+), 47 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index c5047b8..1c1b9fe 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -36,13 +36,6 @@ static DEFINE_PER_CPU(struct pt, pt_ctx);
 
 static struct pt_pmu pt_pmu;
 
-enum cpuid_regs {
-   CR_EAX = 0,
-   CR_ECX,
-   CR_EDX,
-   CR_EBX
-};
-
 /*
  * Capabilities of Intel PT hardware, such as number of address bits or
  * supported output schemes, are cached and exported to userspace as "caps"
@@ -64,21 +57,21 @@ static struct pt_cap_desc {
u8  reg;
u32 mask;
 } pt_caps[] = {
-   PT_CAP(max_subleaf, 0, CR_EAX, 0x),
-   PT_CAP(cr3_filtering,   0, CR_EBX, BIT(0)),
-   PT_CAP(psb_cyc, 0, CR_EBX, BIT(1)),
-   PT_CAP(ip_filtering,0, CR_EBX, BIT(2)),
-   PT_CAP(mtc, 0, CR_EBX, BIT(3)),
-   PT_CAP(ptwrite, 0, CR_EBX, BIT(4)),
-   PT_CAP(power_event_trace,   0, CR_EBX, BIT(5)),
-   PT_CAP(topa_output, 0, CR_ECX, BIT(0)),
-   PT_CAP(topa_multiple_entries,   0, CR_ECX, BIT(1)),
-   PT_CAP(single_range_output, 0, CR_ECX, BIT(2)),
-   PT_CAP(payloads_lip,0, CR_ECX, BIT(31)),
-   PT_CAP(num_address_ranges,  1, CR_EAX, 0x3),
-   PT_CAP(mtc_periods, 1, CR_EAX, 0x),
-   PT_CAP(cycle_thresholds,1, CR_EBX, 0x),
-   PT_CAP(psb_periods, 1, CR_EBX, 0x),
+   PT_CAP(max_subleaf, 0, CPUID_EAX, 0x),
+   PT_CAP(cr3_filtering,   0, CPUID_EBX, BIT(0)),
+   PT_CAP(psb_cyc, 0, CPUID_EBX, BIT(1)),
+   PT_CAP(ip_filtering,0, CPUID_EBX, BIT(2)),
+   PT_CAP(mtc, 0, CPUID_EBX, BIT(3)),
+   PT_CAP(ptwrite, 0, CPUID_EBX, BIT(4)),
+   PT_CAP(power_event_trace,   0, CPUID_EBX, BIT(5)),
+   PT_CAP(topa_output, 0, CPUID_ECX, BIT(0)),
+   PT_CAP(topa_multiple_entries,   0, CPUID_ECX, BIT(1)),
+   PT_CAP(single_range_output, 0, CPUID_ECX, BIT(2)),
+   PT_CAP(payloads_lip,0, CPUID_ECX, BIT(31)),
+   PT_CAP(num_address_ranges,  1, CPUID_EAX, 0x3),
+   PT_CAP(mtc_periods, 1, CPUID_EAX, 0x),
+   PT_CAP(cycle_thresholds,1, CPUID_EBX, 0x),
+   PT_CAP(psb_periods, 1, CPUID_EBX, 0x),
 };
 
 static u32 pt_cap_get(enum pt_capabilities cap)
@@ -213,10 +206,10 @@ static int __init pt_pmu_hw_init(void)
 
for (i = 0; i < PT_CPUID_LEAVES; i++) {
cpuid_count(20, i,
-   &pt_pmu.caps[CR_EAX + i*PT_CPUID_REGS_NUM],
-   &pt_pmu.caps[CR_EBX + i*PT_CPUID_REGS_NUM],
-   &pt_pmu.caps[CR_ECX + i*PT_CPUID_REGS_NUM],
-   &pt_pmu.caps[CR_EDX + i*PT_CPUID_REGS_NUM]);
+   &pt_pmu.caps[CPUID_EAX + i*PT_CPUID_REGS_NUM],
+   &pt_pmu.caps[CPUID_EBX + i*PT_CPUID_REGS_NUM],
+   &pt_pmu.caps[CPUID_ECX + i*PT_CPUID_REGS_NUM],
+   &pt_pmu.caps[CPUID_EDX + i*PT_CPUID_REGS_NUM]);
}
 
ret = -ENOMEM;
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 984a7bf..8f6ac5b 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -137,6 +137,17 @@ struct cpuinfo_x86 {
u32 microcode;
 };
 
+struct cpuid_regs {
+   u32 eax, ebx, ecx, edx;
+};
+
+enum cpuid_regs_idx {
+   CPUID_EAX = 0,
+   CPUID_EBX,
+   CPUID_ECX,
+   CPUID_EDX,
+};
+
 #define X86_VENDOR_INTEL   0
 #define X86_VENDOR_CYRIX   1
 #define X86_VENDOR_AMD 2
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 1db8dc4..dbb470e 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -17,13 +17,6 @@ struct cpuid_bit {
u32 sub_leaf;
 };
 
-enum cpuid_regs {
-   CR_EAX = 0,
-   CR_ECX,
-   CR_EDX,
-   CR_EBX
-};
-
 void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
 {
u32 max_level;
@@ -31,14 +24,14 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
const struct cpuid_bit *cb;
 
static const struct cpuid_bit cpuid_bits[] = {
-   { X86_FEATURE_INTEL_PT, CR_EBX,25, 0x0007, 0 },
-   { X86_FEATURE_AVX512_4VNNIW,CR
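
A note on the rename in this patch: the per-file enum cpuid_regs listed the
registers as EAX, ECX, EDX, EBX, while the shared enum cpuid_regs_idx uses
the order EAX, EBX, ECX, EDX, which matches the argument order of
cpuid_count(). The following is only a minimal userspace sketch of indexing a
four-register array through the new enum. fake_cpuid_count() and
reg_bit_set() are hypothetical names that do not appear in the patch, and the
register values are made up.

```c
#include <stdint.h>

/* Same ordering as the enum added to processor.h by the patch. */
enum cpuid_regs_idx {
	CPUID_EAX = 0,
	CPUID_EBX,
	CPUID_ECX,
	CPUID_EDX,
};

/*
 * Hypothetical stand-in for the kernel's cpuid_count(): it does not execute
 * CPUID, it only fills the four outputs with fixed, made-up values.
 */
static void fake_cpuid_count(uint32_t leaf, uint32_t sub_leaf,
			     uint32_t *eax, uint32_t *ebx,
			     uint32_t *ecx, uint32_t *edx)
{
	*eax = leaf;                    /* echo the leaf number */
	*ebx = (uint32_t)1 << 25;       /* pretend bit 25 is set in EBX */
	*ecx = sub_leaf;                /* echo the sub-leaf number */
	*edx = ((uint32_t)1 << 2) | ((uint32_t)1 << 3); /* bits 2 and 3 */
}

/* Test one bit of one register, selecting the register by enum index. */
static int reg_bit_set(const uint32_t regs[4], enum cpuid_regs_idx reg,
		       unsigned int bit)
{
	return (regs[reg] >> bit) & 1u;
}
```

Because the enum values double as array indexes, a caller can pass
&regs[CPUID_EAX] through &regs[CPUID_EDX] in the natural order and later
look up any register by name.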

[PATCH v6 3/3] x86/kvm: Add AVX512_4VNNIW and AVX512_4FMAPS support

2016-11-11 Thread He Chen
Add support for two new AVX512 subfeatures for KVM guests.

AVX512_4VNNIW:
Vector instructions for deep learning enhanced word variable precision.

AVX512_4FMAPS:
Vector instructions for deep learning floating-point single precision.

Reviewed-by: Borislav Petkov <b...@suse.de>
Signed-off-by: Luwei Kang <luwei.k...@intel.com>
Signed-off-by: He Chen <he.c...@linux.intel.com>
---
 arch/x86/kvm/cpuid.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index afa7bbb..ddcdf7c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include  /* For use_eager_fpu.  Ugh! */
 #include 
 #include 
@@ -65,6 +66,11 @@ u64 kvm_supported_xcr0(void)
 
 #define F(x) bit(X86_FEATURE_##x)
 
+/* These are scattered features in cpufeatures.h. */
+#define KVM_CPUID_BIT_AVX512_4VNNIW 2
+#define KVM_CPUID_BIT_AVX512_4FMAPS 3
+#define KF(x) bit(KVM_CPUID_BIT_##x)
+
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
 {
struct kvm_cpuid_entry2 *best;
@@ -376,6 +382,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
/* cpuid 7.0.ecx*/
const u32 kvm_cpuid_7_0_ecx_x86_features = F(PKU) | 0 /*OSPKE*/;
 
+   /* cpuid 7.0.edx*/
+   const u32 kvm_cpuid_7_0_edx_x86_features =
+   KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
 
@@ -458,12 +468,14 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 
*entry, u32 function,
/* PKU is not yet implemented for shadow paging. */
if (!tdp_enabled)
entry->ecx &= ~F(PKU);
+   entry->edx &= kvm_cpuid_7_0_edx_x86_features;
+   entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
} else {
entry->ebx = 0;
entry->ecx = 0;
+   entry->edx = 0;
}
entry->eax = 0;
-   entry->edx = 0;
break;
}
case 9:
-- 
2.7.4
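
The two "entry->edx &=" lines in the last hunk restrict the guest-visible EDX
twice: first to the bits KVM knows about, then to the bits the host actually
reports. The following is a minimal userspace sketch of that masking, not the
KVM code itself. guest_cpuid_7_0_edx() and BIT_U32() are hypothetical names,
and host_scattered_edx stands in for the value returned by
get_scattered_cpuid_leaf(7, 0, CPUID_EDX).

```c
#include <stdint.h>

#define BIT_U32(n) ((uint32_t)1 << (n))

/* Bit positions in CPUID.(EAX=7,ECX=0):EDX, as defined in the patch. */
#define KVM_CPUID_BIT_AVX512_4VNNIW 2
#define KVM_CPUID_BIT_AVX512_4FMAPS 3

/*
 * Hypothetical helper: compute the EDX value a guest would see.
 * raw_edx is the unfiltered register value; host_scattered_edx is the
 * leaf rebuilt from the host's scattered feature bits.
 */
static uint32_t guest_cpuid_7_0_edx(uint32_t raw_edx,
				    uint32_t host_scattered_edx)
{
	const uint32_t kvm_supported =
		BIT_U32(KVM_CPUID_BIT_AVX512_4VNNIW) |
		BIT_U32(KVM_CPUID_BIT_AVX512_4FMAPS);
	uint32_t edx = raw_edx;

	edx &= kvm_supported;       /* drop bits KVM does not expose */
	edx &= host_scattered_edx;  /* drop bits the host does not have */
	return edx;
}
```

A bit reaches the guest only if it is set in all three inputs, so a feature
missing on the host is never advertised even when KVM supports it.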



[PATCH v6 2/3] x86/cpuid: Add a helper in scattered.c to return cpuid

2016-11-11 Thread He Chen
Some sparse CPUID leaves are gathered into a synthetic leaf to keep the
x86_capability array small, but the kernel or its modules (e.g. KVM
CPUID enumeration) sometimes need the actual hardware leaf
information.

This patch adds a helper, get_scattered_cpuid_leaf(), that rebuilds the
actual CPUID leaf register from the scattered feature bits. The helper
is exported so that modules can call it.

Reviewed-by: Borislav Petkov <b...@suse.de>
Signed-off-by: He Chen <he.c...@linux.intel.com>
---
 arch/x86/include/asm/processor.h |  3 +++
 arch/x86/kernel/cpu/scattered.c  | 49 ++--
 2 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 8f6ac5b..e7f8c62 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -189,6 +189,9 @@ extern void identify_secondary_cpu(struct cpuinfo_x86 *);
 extern void print_cpu_info(struct cpuinfo_x86 *);
 void print_cpu_msr(struct cpuinfo_x86 *);
 extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
+extern u32 get_scattered_cpuid_leaf(unsigned int level,
+   unsigned int sub_leaf,
+   enum cpuid_regs_idx reg);
 extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c);
 extern void init_amd_cacheinfo(struct cpuinfo_x86 *c);
 
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index dbb470e..d1316f9 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -17,24 +17,25 @@ struct cpuid_bit {
u32 sub_leaf;
 };
 
+/* Please keep the leaf sorted by cpuid_bit.level for faster search. */
+static const struct cpuid_bit cpuid_bits[] = {
+   { X86_FEATURE_APERFMPERF,    CPUID_ECX,  0, 0x00000006, 0 },
+   { X86_FEATURE_EPB,           CPUID_ECX,  3, 0x00000006, 0 },
+   { X86_FEATURE_INTEL_PT,      CPUID_EBX, 25, 0x00000007, 0 },
+   { X86_FEATURE_AVX512_4VNNIW, CPUID_EDX,  2, 0x00000007, 0 },
+   { X86_FEATURE_AVX512_4FMAPS, CPUID_EDX,  3, 0x00000007, 0 },
+   { X86_FEATURE_HW_PSTATE,     CPUID_EDX,  7, 0x80000007, 0 },
+   { X86_FEATURE_CPB,           CPUID_EDX,  9, 0x80000007, 0 },
+   { X86_FEATURE_PROC_FEEDBACK, CPUID_EDX, 11, 0x80000007, 0 },
+   { 0, 0, 0, 0, 0 }
+};
+
 void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
 {
u32 max_level;
u32 regs[4];
const struct cpuid_bit *cb;
 
-   static const struct cpuid_bit cpuid_bits[] = {
-   { X86_FEATURE_INTEL_PT,      CPUID_EBX, 25, 0x00000007, 0 },
-   { X86_FEATURE_AVX512_4VNNIW, CPUID_EDX,  2, 0x00000007, 0 },
-   { X86_FEATURE_AVX512_4FMAPS, CPUID_EDX,  3, 0x00000007, 0 },
-   { X86_FEATURE_APERFMPERF,    CPUID_ECX,  0, 0x00000006, 0 },
-   { X86_FEATURE_EPB,           CPUID_ECX,  3, 0x00000006, 0 },
-   { X86_FEATURE_HW_PSTATE,     CPUID_EDX,  7, 0x80000007, 0 },
-   { X86_FEATURE_CPB,           CPUID_EDX,  9, 0x80000007, 0 },
-   { X86_FEATURE_PROC_FEEDBACK, CPUID_EDX, 11, 0x80000007, 0 },
-   { 0, 0, 0, 0, 0 }
-   };
-
for (cb = cpuid_bits; cb->feature; cb++) {
 
/* Verify that the level is valid */
@@ -51,3 +52,27 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
set_cpu_cap(c, cb->feature);
}
 }
+
+u32 get_scattered_cpuid_leaf(unsigned int level, unsigned int sub_leaf,
+enum cpuid_regs_idx reg)
+{
+   const struct cpuid_bit *cb;
+   u32 cpuid_val = 0;
+
+   for (cb = cpuid_bits; cb->feature; cb++) {
+
+   if (level > cb->level)
+   continue;
+
+   if (level < cb->level)
+   break;
+
+   if (reg == cb->reg && sub_leaf == cb->sub_leaf) {
+   if (cpu_has(&boot_cpu_data, cb->feature))
+   cpuid_val |= BIT(cb->bit);
+   }
+   }
+
+   return cpuid_val;
+}
+EXPORT_SYMBOL_GPL(get_scattered_cpuid_leaf);
-- 
2.7.4
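
The early exit in get_scattered_cpuid_leaf() depends on the table being
sorted by level: entries below the requested level are skipped, and the loop
stops at the first entry above it. The following is a minimal userspace
sketch of that lookup, not the kernel code. rebuild_leaf(), has_feature(),
the FEAT_* ids and the "present" bitmask are hypothetical stand-ins for the
kernel's helper, cpu_has(), the X86_FEATURE_* constants and boot_cpu_data.

```c
#include <stdint.h>

enum cpuid_regs_idx { CPUID_EAX = 0, CPUID_EBX, CPUID_ECX, CPUID_EDX };

struct cpuid_bit {
	uint16_t feature;   /* nonzero feature id; 0 terminates the table */
	uint8_t  reg;       /* register index, an enum cpuid_regs_idx value */
	uint8_t  bit;       /* bit position within that register */
	uint32_t level;     /* CPUID leaf */
	uint32_t sub_leaf;  /* CPUID sub-leaf */
};

/* Illustrative feature ids, not the kernel's X86_FEATURE_* values. */
enum {
	FEAT_APERFMPERF = 1, FEAT_EPB, FEAT_INTEL_PT,
	FEAT_4VNNIW, FEAT_4FMAPS, FEAT_HW_PSTATE,
};

/* Must stay sorted by level for the early exit below to be correct. */
static const struct cpuid_bit cpuid_bits[] = {
	{ FEAT_APERFMPERF, CPUID_ECX,  0, 0x00000006, 0 },
	{ FEAT_EPB,        CPUID_ECX,  3, 0x00000006, 0 },
	{ FEAT_INTEL_PT,   CPUID_EBX, 25, 0x00000007, 0 },
	{ FEAT_4VNNIW,     CPUID_EDX,  2, 0x00000007, 0 },
	{ FEAT_4FMAPS,     CPUID_EDX,  3, 0x00000007, 0 },
	{ FEAT_HW_PSTATE,  CPUID_EDX,  7, 0x80000007, 0 },
	{ 0, 0, 0, 0, 0 }
};

/* Stand-in for cpu_has(): "present" has bit N set if feature id N exists. */
static int has_feature(uint32_t present, uint16_t feature)
{
	return (present >> feature) & 1u;
}

/* Rebuild one register of one leaf from the scattered feature bits. */
static uint32_t rebuild_leaf(uint32_t present, uint32_t level,
			     uint32_t sub_leaf, enum cpuid_regs_idx reg)
{
	const struct cpuid_bit *cb;
	uint32_t val = 0;

	for (cb = cpuid_bits; cb->feature; cb++) {
		if (level > cb->level)
			continue;   /* not reached the requested leaf yet */
		if (level < cb->level)
			break;      /* sorted table: no later entry can match */
		if (reg == cb->reg && sub_leaf == cb->sub_leaf &&
		    has_feature(present, cb->feature))
			val |= (uint32_t)1 << cb->bit;
	}
	return val;
}
```

If the table were not sorted, the break could skip matching entries that
appear later, which is why the patch moves the leaf 6 entries ahead of the
leaf 7 entries.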



[PATCH v6 0/3] x86/kvm: Support AVX512_4VNNIW and AVX512_4FMAPS for KVM guest

2016-11-11 Thread He Chen
This patch series adds two new AVX512 features to KVM guests.
Since these two features are defined as scattered features in the kernel,
some extra changes to the kernel are included.

BTW, sorry for sending patches so frequently; I really appreciate your
kind review.

---
Changes in v6:
* refine commit messages.

Changes in v5:
* divide the whole patchset into 3 parts.
* refine commit messages.

Changes in v4:
* divide patch into 2 parts, including modification in scattered.c and
  support new AVX512 instructions for KVM.
* coding style.
* refine commit message.

Changes in v3:
* add a helper in scattered.c to get scattered leaf.

Changes in v2:
* add new macros for new AVX512 scattered features.
* add a cpuid_count_edx function to processor.h

He Chen (3):
  x86/cpuid: Cleanup cpuid_regs definitions
  x86/cpuid: Add a helper in scattered.c to return cpuid
  x86/kvm: Add AVX512_4VNNIW and AVX512_4FMAPS subfeatures support

 arch/x86/events/intel/pt.c   | 45 ++-
 arch/x86/include/asm/processor.h | 14 ++
 arch/x86/kernel/cpu/scattered.c  | 57 ++--
 arch/x86/kernel/cpuid.c  |  4 ---
 arch/x86/kvm/cpuid.c | 14 +-
 5 files changed, 84 insertions(+), 50 deletions(-)

-- 
2.7.4



[PATCH v5 0/3] cpuid: Support AVX512_4VNNIW and AVX512_4FMAPS for KVM guest

2016-11-09 Thread He Chen
This patch series adds two new AVX512 features to KVM guests.
Since these two features are defined as scattered features in the kernel,
some extra changes to the kernel are included.

---
Changes in v5:
* divide the whole patchset into 3 parts.
* refine commit messages.

Changes in v4:
* divide patch into 2 parts, including modification in scattered.c and
  support new AVX512 instructions for KVM.
* coding style.
* refine commit message.

Changes in v3:
* add a helper in scattered.c to get scattered leaf.

Changes in v2:
* add new macros for new AVX512 scattered features.
* add a cpuid_count_edx function to processor.h

He Chen (3):
  cpuid: cleanup cpuid_regs definitions
  cpuid: Add a helper in scattered.c to return cpuid
  cpuid: add AVX512_4VNNIW and AVX512_4FMAPS instructions support

 arch/x86/events/intel/pt.c   | 45 ++-
 arch/x86/include/asm/processor.h | 14 ++
 arch/x86/kernel/cpu/scattered.c  | 57 ++--
 arch/x86/kernel/cpuid.c  |  4 ---
 arch/x86/kvm/cpuid.c | 14 +-
 5 files changed, 84 insertions(+), 50 deletions(-)

-- 
2.7.4



[PATCH v5 1/3] cpuid: cleanup cpuid_regs definitions

2016-11-09 Thread He Chen
Make the cpuid_regs definitions clearer and avoid a potential name clash.

Signed-off-by: He Chen <he.c...@linux.intel.com>
---
 arch/x86/events/intel/pt.c   | 45 +---
 arch/x86/include/asm/processor.h | 11 ++
 arch/x86/kernel/cpu/scattered.c  | 28 ++---
 arch/x86/kernel/cpuid.c  |  4 
 4 files changed, 41 insertions(+), 47 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index c5047b8..1c1b9fe 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -36,13 +36,6 @@ static DEFINE_PER_CPU(struct pt, pt_ctx);
 
 static struct pt_pmu pt_pmu;
 
-enum cpuid_regs {
-   CR_EAX = 0,
-   CR_ECX,
-   CR_EDX,
-   CR_EBX
-};
-
 /*
  * Capabilities of Intel PT hardware, such as number of address bits or
  * supported output schemes, are cached and exported to userspace as "caps"
@@ -64,21 +57,21 @@ static struct pt_cap_desc {
u8  reg;
u32 mask;
 } pt_caps[] = {
-   PT_CAP(max_subleaf, 0, CR_EAX, 0x),
-   PT_CAP(cr3_filtering,   0, CR_EBX, BIT(0)),
-   PT_CAP(psb_cyc, 0, CR_EBX, BIT(1)),
-   PT_CAP(ip_filtering,0, CR_EBX, BIT(2)),
-   PT_CAP(mtc, 0, CR_EBX, BIT(3)),
-   PT_CAP(ptwrite, 0, CR_EBX, BIT(4)),
-   PT_CAP(power_event_trace,   0, CR_EBX, BIT(5)),
-   PT_CAP(topa_output, 0, CR_ECX, BIT(0)),
-   PT_CAP(topa_multiple_entries,   0, CR_ECX, BIT(1)),
-   PT_CAP(single_range_output, 0, CR_ECX, BIT(2)),
-   PT_CAP(payloads_lip,0, CR_ECX, BIT(31)),
-   PT_CAP(num_address_ranges,  1, CR_EAX, 0x3),
-   PT_CAP(mtc_periods, 1, CR_EAX, 0x),
-   PT_CAP(cycle_thresholds,1, CR_EBX, 0x),
-   PT_CAP(psb_periods, 1, CR_EBX, 0x),
+   PT_CAP(max_subleaf, 0, CPUID_EAX, 0x),
+   PT_CAP(cr3_filtering,   0, CPUID_EBX, BIT(0)),
+   PT_CAP(psb_cyc, 0, CPUID_EBX, BIT(1)),
+   PT_CAP(ip_filtering,0, CPUID_EBX, BIT(2)),
+   PT_CAP(mtc, 0, CPUID_EBX, BIT(3)),
+   PT_CAP(ptwrite, 0, CPUID_EBX, BIT(4)),
+   PT_CAP(power_event_trace,   0, CPUID_EBX, BIT(5)),
+   PT_CAP(topa_output, 0, CPUID_ECX, BIT(0)),
+   PT_CAP(topa_multiple_entries,   0, CPUID_ECX, BIT(1)),
+   PT_CAP(single_range_output, 0, CPUID_ECX, BIT(2)),
+   PT_CAP(payloads_lip,0, CPUID_ECX, BIT(31)),
+   PT_CAP(num_address_ranges,  1, CPUID_EAX, 0x3),
+   PT_CAP(mtc_periods, 1, CPUID_EAX, 0x),
+   PT_CAP(cycle_thresholds,1, CPUID_EBX, 0x),
+   PT_CAP(psb_periods, 1, CPUID_EBX, 0x),
 };
 
 static u32 pt_cap_get(enum pt_capabilities cap)
@@ -213,10 +206,10 @@ static int __init pt_pmu_hw_init(void)
 
for (i = 0; i < PT_CPUID_LEAVES; i++) {
cpuid_count(20, i,
-   &pt_pmu.caps[CR_EAX + i*PT_CPUID_REGS_NUM],
-   &pt_pmu.caps[CR_EBX + i*PT_CPUID_REGS_NUM],
-   &pt_pmu.caps[CR_ECX + i*PT_CPUID_REGS_NUM],
-   &pt_pmu.caps[CR_EDX + i*PT_CPUID_REGS_NUM]);
+   &pt_pmu.caps[CPUID_EAX + i*PT_CPUID_REGS_NUM],
+   &pt_pmu.caps[CPUID_EBX + i*PT_CPUID_REGS_NUM],
+   &pt_pmu.caps[CPUID_ECX + i*PT_CPUID_REGS_NUM],
+   &pt_pmu.caps[CPUID_EDX + i*PT_CPUID_REGS_NUM]);
}
 
ret = -ENOMEM;
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 984a7bf..8f6ac5b 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -137,6 +137,17 @@ struct cpuinfo_x86 {
u32 microcode;
 };
 
+struct cpuid_regs {
+   u32 eax, ebx, ecx, edx;
+};
+
+enum cpuid_regs_idx {
+   CPUID_EAX = 0,
+   CPUID_EBX,
+   CPUID_ECX,
+   CPUID_EDX,
+};
+
 #define X86_VENDOR_INTEL   0
 #define X86_VENDOR_CYRIX   1
 #define X86_VENDOR_AMD 2
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 1db8dc4..5dbdd0b 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -17,13 +17,6 @@ struct cpuid_bit {
u32 sub_leaf;
 };
 
-enum cpuid_regs {
-   CR_EAX = 0,
-   CR_ECX,
-   CR_EDX,
-   CR_EBX
-};
-
 void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
 {
u32 max_level;
@@ -31,14 +24,14 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
const struct cpuid_bit *cb;
 
static const struct cpuid_bit cpuid_bits[] = {
-   { X86_FEATURE_INTEL_PT, CR_EBX,25, 0x0007, 0 },
-   { X86_FEATURE_AVX512_4VNNIW,CR

[PATCH v5 2/3] cpuid: Add a helper in scattered.c to return cpuid

2016-11-09 Thread He Chen
Some sparse CPUID leaves are gathered into a synthetic leaf to keep the
x86_capability array small, but the kernel or its modules (e.g. KVM
cpuid enumeration) sometimes need the actual hardware leaf
information.

This patch adds a helper, get_scattered_cpuid_leaf(), that rebuilds the
actual CPUID leaf register from the scattered feature bits. The helper
is exported so that modules can call it.

Signed-off-by: He Chen <he.c...@linux.intel.com>
---
 arch/x86/include/asm/processor.h |  3 +++
 arch/x86/kernel/cpu/scattered.c  | 49 ++--
 2 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 8f6ac5b..e7f8c62 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -189,6 +189,9 @@ extern void identify_secondary_cpu(struct cpuinfo_x86 *);
 extern void print_cpu_info(struct cpuinfo_x86 *);
 void print_cpu_msr(struct cpuinfo_x86 *);
 extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
+extern u32 get_scattered_cpuid_leaf(unsigned int level,
+   unsigned int sub_leaf,
+   enum cpuid_regs_idx reg);
 extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c);
 extern void init_amd_cacheinfo(struct cpuinfo_x86 *c);
 
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 5dbdd0b..d1316f9 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -17,24 +17,25 @@ struct cpuid_bit {
u32 sub_leaf;
 };
 
+/* Please keep the leaf sorted by cpuid_bit.level for faster search. */
+static const struct cpuid_bit cpuid_bits[] = {
+   { X86_FEATURE_APERFMPERF,    CPUID_ECX,  0, 0x00000006, 0 },
+   { X86_FEATURE_EPB,           CPUID_ECX,  3, 0x00000006, 0 },
+   { X86_FEATURE_INTEL_PT,      CPUID_EBX, 25, 0x00000007, 0 },
+   { X86_FEATURE_AVX512_4VNNIW, CPUID_EDX,  2, 0x00000007, 0 },
+   { X86_FEATURE_AVX512_4FMAPS, CPUID_EDX,  3, 0x00000007, 0 },
+   { X86_FEATURE_HW_PSTATE,     CPUID_EDX,  7, 0x80000007, 0 },
+   { X86_FEATURE_CPB,           CPUID_EDX,  9, 0x80000007, 0 },
+   { X86_FEATURE_PROC_FEEDBACK, CPUID_EDX, 11, 0x80000007, 0 },
+   { 0, 0, 0, 0, 0 }
+};
+
 void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
 {
u32 max_level;
u32 regs[4];
const struct cpuid_bit *cb;
 
-   static const struct cpuid_bit cpuid_bits[] = {
-   { X86_FEATURE_INTEL_PT,      CPUID_EBX,25, 0x00000007, 0 },
-   { X86_FEATURE_AVX512_4VNNIW, CPUID_EDX, 2, 0x00000007, 0 },
-   { X86_FEATURE_AVX512_4FMAPS, CPUID_EDX, 3, 0x00000007, 0 },
-   { X86_FEATURE_APERFMPERF,    CPUID_ECX, 0, 0x00000006, 0 },
-   { X86_FEATURE_EPB,           CPUID_ECX, 3, 0x00000006, 0 },
-   { X86_FEATURE_HW_PSTATE,     CPUID_EDX, 7, 0x80000007, 0 },
-   { X86_FEATURE_CPB,           CPUID_EDX, 9, 0x80000007, 0 },
-   { X86_FEATURE_PROC_FEEDBACK, CPUID_EDX,11, 0x80000007, 0 },
-   { 0, 0, 0, 0, 0 }
-   };
-
for (cb = cpuid_bits; cb->feature; cb++) {
 
/* Verify that the level is valid */
@@ -51,3 +52,27 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
set_cpu_cap(c, cb->feature);
}
 }
+
+u32 get_scattered_cpuid_leaf(unsigned int level, unsigned int sub_leaf,
+enum cpuid_regs_idx reg)
+{
+   const struct cpuid_bit *cb;
+   u32 cpuid_val = 0;
+
+   for (cb = cpuid_bits; cb->feature; cb++) {
+
+   if (level > cb->level)
+   continue;
+
+   if (level < cb->level)
+   break;
+
+   if (reg == cb->reg && sub_leaf == cb->sub_leaf) {
+   if (cpu_has(&boot_cpu_data, cb->feature))
+   cpuid_val |= BIT(cb->bit);
+   }
+   }
+
+   return cpuid_val;
+}
+EXPORT_SYMBOL_GPL(get_scattered_cpuid_leaf);
-- 
2.7.4



[PATCH v5 1/3] cpuid: cleanup cpuid_regs definitions

2016-11-09 Thread He Chen
Make the cpuid_regs definitions clearer and avoid a potential name clash.

Signed-off-by: He Chen 
---
 arch/x86/events/intel/pt.c   | 45 +---
 arch/x86/include/asm/processor.h | 11 ++
 arch/x86/kernel/cpu/scattered.c  | 28 ++---
 arch/x86/kernel/cpuid.c  |  4 
 4 files changed, 41 insertions(+), 47 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index c5047b8..1c1b9fe 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -36,13 +36,6 @@ static DEFINE_PER_CPU(struct pt, pt_ctx);
 
 static struct pt_pmu pt_pmu;
 
-enum cpuid_regs {
-   CR_EAX = 0,
-   CR_ECX,
-   CR_EDX,
-   CR_EBX
-};
-
 /*
  * Capabilities of Intel PT hardware, such as number of address bits or
  * supported output schemes, are cached and exported to userspace as "caps"
@@ -64,21 +57,21 @@ static struct pt_cap_desc {
u8  reg;
u32 mask;
 } pt_caps[] = {
-   PT_CAP(max_subleaf, 0, CR_EAX, 0xffffffff),
-   PT_CAP(cr3_filtering,   0, CR_EBX, BIT(0)),
-   PT_CAP(psb_cyc, 0, CR_EBX, BIT(1)),
-   PT_CAP(ip_filtering,0, CR_EBX, BIT(2)),
-   PT_CAP(mtc, 0, CR_EBX, BIT(3)),
-   PT_CAP(ptwrite, 0, CR_EBX, BIT(4)),
-   PT_CAP(power_event_trace,   0, CR_EBX, BIT(5)),
-   PT_CAP(topa_output, 0, CR_ECX, BIT(0)),
-   PT_CAP(topa_multiple_entries,   0, CR_ECX, BIT(1)),
-   PT_CAP(single_range_output, 0, CR_ECX, BIT(2)),
-   PT_CAP(payloads_lip,0, CR_ECX, BIT(31)),
-   PT_CAP(num_address_ranges,  1, CR_EAX, 0x3),
-   PT_CAP(mtc_periods, 1, CR_EAX, 0xffff0000),
-   PT_CAP(cycle_thresholds,1, CR_EBX, 0xffff),
-   PT_CAP(psb_periods, 1, CR_EBX, 0xffff0000),
+   PT_CAP(max_subleaf, 0, CPUID_EAX, 0xffffffff),
+   PT_CAP(cr3_filtering,   0, CPUID_EBX, BIT(0)),
+   PT_CAP(psb_cyc, 0, CPUID_EBX, BIT(1)),
+   PT_CAP(ip_filtering,0, CPUID_EBX, BIT(2)),
+   PT_CAP(mtc, 0, CPUID_EBX, BIT(3)),
+   PT_CAP(ptwrite, 0, CPUID_EBX, BIT(4)),
+   PT_CAP(power_event_trace,   0, CPUID_EBX, BIT(5)),
+   PT_CAP(topa_output, 0, CPUID_ECX, BIT(0)),
+   PT_CAP(topa_multiple_entries,   0, CPUID_ECX, BIT(1)),
+   PT_CAP(single_range_output, 0, CPUID_ECX, BIT(2)),
+   PT_CAP(payloads_lip,0, CPUID_ECX, BIT(31)),
+   PT_CAP(num_address_ranges,  1, CPUID_EAX, 0x3),
+   PT_CAP(mtc_periods, 1, CPUID_EAX, 0xffff0000),
+   PT_CAP(cycle_thresholds,1, CPUID_EBX, 0xffff),
+   PT_CAP(psb_periods, 1, CPUID_EBX, 0xffff0000),
 };
 
 static u32 pt_cap_get(enum pt_capabilities cap)
@@ -213,10 +206,10 @@ static int __init pt_pmu_hw_init(void)
 
for (i = 0; i < PT_CPUID_LEAVES; i++) {
cpuid_count(20, i,
-   &pt_pmu.caps[CR_EAX + i*PT_CPUID_REGS_NUM],
-   &pt_pmu.caps[CR_EBX + i*PT_CPUID_REGS_NUM],
-   &pt_pmu.caps[CR_ECX + i*PT_CPUID_REGS_NUM],
-   &pt_pmu.caps[CR_EDX + i*PT_CPUID_REGS_NUM]);
+   &pt_pmu.caps[CPUID_EAX + i*PT_CPUID_REGS_NUM],
+   &pt_pmu.caps[CPUID_EBX + i*PT_CPUID_REGS_NUM],
+   &pt_pmu.caps[CPUID_ECX + i*PT_CPUID_REGS_NUM],
+   &pt_pmu.caps[CPUID_EDX + i*PT_CPUID_REGS_NUM]);
}
 
ret = -ENOMEM;
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 984a7bf..8f6ac5b 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -137,6 +137,17 @@ struct cpuinfo_x86 {
u32 microcode;
 };
 
+struct cpuid_regs {
+   u32 eax, ebx, ecx, edx;
+};
+
+enum cpuid_regs_idx {
+   CPUID_EAX = 0,
+   CPUID_EBX,
+   CPUID_ECX,
+   CPUID_EDX,
+};
+
 #define X86_VENDOR_INTEL   0
 #define X86_VENDOR_CYRIX   1
 #define X86_VENDOR_AMD 2
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 1db8dc4..5dbdd0b 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -17,13 +17,6 @@ struct cpuid_bit {
u32 sub_leaf;
 };
 
-enum cpuid_regs {
-   CR_EAX = 0,
-   CR_ECX,
-   CR_EDX,
-   CR_EBX
-};
-
 void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
 {
u32 max_level;
@@ -31,14 +24,14 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
const struct cpuid_bit *cb;
 
static const struct cpuid_bit cpuid_bits[] = {
-   { X86_FEATURE_INTEL_PT, CR_EBX,25, 0x0007, 0 },
-   { X86_FEATURE_AVX512_4VNNIW,CR_EDX, 2, 

[PATCH v5 3/3] cpuid: add AVX512_4VNNIW and AVX512_4FMAPS instructions support

2016-11-09 Thread He Chen
Add support for two new AVX512 instruction sets to KVM guests.

AVX512_4VNNIW:
Vector instructions for deep learning enhanced word variable precision.

AVX512_4FMAPS:
Vector instructions for deep learning floating-point single precision.

Signed-off-by: Luwei Kang <luwei.k...@intel.com>
Signed-off-by: He Chen <he.c...@linux.intel.com>
---
 arch/x86/kvm/cpuid.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index afa7bbb..ddcdf7c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include  /* For use_eager_fpu.  Ugh! */
 #include 
 #include 
@@ -65,6 +66,11 @@ u64 kvm_supported_xcr0(void)
 
 #define F(x) bit(X86_FEATURE_##x)
 
+/* These are scattered features in cpufeatures.h. */
+#define KVM_CPUID_BIT_AVX512_4VNNIW 2
+#define KVM_CPUID_BIT_AVX512_4FMAPS 3
+#define KF(x) bit(KVM_CPUID_BIT_##x)
+
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
 {
struct kvm_cpuid_entry2 *best;
@@ -376,6 +382,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
/* cpuid 7.0.ecx*/
const u32 kvm_cpuid_7_0_ecx_x86_features = F(PKU) | 0 /*OSPKE*/;
 
+   /* cpuid 7.0.edx*/
+   const u32 kvm_cpuid_7_0_edx_x86_features =
+   KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
 
@@ -458,12 +468,14 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
/* PKU is not yet implemented for shadow paging. */
if (!tdp_enabled)
entry->ecx &= ~F(PKU);
+   entry->edx &= kvm_cpuid_7_0_edx_x86_features;
+   entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
} else {
entry->ebx = 0;
entry->ecx = 0;
+   entry->edx = 0;
}
entry->eax = 0;
-   entry->edx = 0;
break;
}
case 9:
-- 
2.7.4
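
The two AND operations in the last hunk can be read as a filter: a feature bit reaches the guest only if KVM lists it as supported and the host kernel reports it as enabled. The following is a minimal sketch in plain C, assuming that entry->edx initially holds the raw hardware value. The names demo_bit() and demo_guest_edx() and the parameters hw_edx and host_edx are hypothetical; host_edx stands in for the return value of get_scattered_cpuid_leaf(7, 0, CPUID_EDX).

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in CPUID.(EAX=7,ECX=0):EDX, as defined in the patch. */
#define KVM_CPUID_BIT_AVX512_4VNNIW 2
#define KVM_CPUID_BIT_AVX512_4FMAPS 3

/* Illustrative replacement for the bit() helper used by KF() in the patch. */
static uint32_t demo_bit(unsigned int nr)
{
	assert(nr < 32);	/* a CPUID register is 32 bits wide */
	return UINT32_C(1) << nr;
}

#define KF(x) demo_bit(KVM_CPUID_BIT_##x)

/*
 * hw_edx:   raw EDX value as read from the CPU (assumed input)
 * host_edx: features the host kernel has enabled for this leaf (assumed input)
 */
static uint32_t demo_guest_edx(uint32_t hw_edx, uint32_t host_edx)
{
	const uint32_t kvm_supported = KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
	uint32_t edx = hw_edx;

	edx &= kvm_supported;	/* drop bits KVM does not virtualize */
	edx &= host_edx;	/* drop bits the host kernel has not enabled */
	return edx;
}
```

For example, if the hardware reports every bit set but the host kernel has only AVX512_4VNNIW enabled (host_edx == 0x4), the guest sees 0x4.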



[PATCH v4 1/2] cpuid: Add a helper in scattered.c to return cpuid leaf info

2016-11-08 Thread He Chen
Some sparse CPUID leaves are gathered into a fake leaf in the current code
to reduce the size of the x86_capability array, but sometimes the kernel or
modules (e.g. KVM CPUID enumeration) need the actual hardware leaf
information.

This patch adds a helper, get_scattered_cpuid_leaf(), that rebuilds the
actual CPUID leaf and can be called from modules. It also replaces the
private enum cpuid_regs in pt.c and scattered.c with a shared enum
cpuid_regs_idx in processor.h.
---
 arch/x86/events/intel/pt.c   | 45 ++--
 arch/x86/include/asm/processor.h | 14 ++
 arch/x86/kernel/cpu/scattered.c  | 56 ++--
 arch/x86/kernel/cpuid.c  |  4 ---
 4 files changed, 70 insertions(+), 49 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index c5047b8..1c1b9fe 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -36,13 +36,6 @@ static DEFINE_PER_CPU(struct pt, pt_ctx);
 
 static struct pt_pmu pt_pmu;
 
-enum cpuid_regs {
-   CR_EAX = 0,
-   CR_ECX,
-   CR_EDX,
-   CR_EBX
-};
-
 /*
  * Capabilities of Intel PT hardware, such as number of address bits or
  * supported output schemes, are cached and exported to userspace as "caps"
@@ -64,21 +57,21 @@ static struct pt_cap_desc {
u8  reg;
u32 mask;
 } pt_caps[] = {
-   PT_CAP(max_subleaf, 0, CR_EAX, 0xffffffff),
-   PT_CAP(cr3_filtering,   0, CR_EBX, BIT(0)),
-   PT_CAP(psb_cyc, 0, CR_EBX, BIT(1)),
-   PT_CAP(ip_filtering,0, CR_EBX, BIT(2)),
-   PT_CAP(mtc, 0, CR_EBX, BIT(3)),
-   PT_CAP(ptwrite, 0, CR_EBX, BIT(4)),
-   PT_CAP(power_event_trace,   0, CR_EBX, BIT(5)),
-   PT_CAP(topa_output, 0, CR_ECX, BIT(0)),
-   PT_CAP(topa_multiple_entries,   0, CR_ECX, BIT(1)),
-   PT_CAP(single_range_output, 0, CR_ECX, BIT(2)),
-   PT_CAP(payloads_lip,0, CR_ECX, BIT(31)),
-   PT_CAP(num_address_ranges,  1, CR_EAX, 0x3),
-   PT_CAP(mtc_periods, 1, CR_EAX, 0xffff0000),
-   PT_CAP(cycle_thresholds,1, CR_EBX, 0xffff),
-   PT_CAP(psb_periods, 1, CR_EBX, 0xffff0000),
+   PT_CAP(max_subleaf, 0, CPUID_EAX, 0xffffffff),
+   PT_CAP(cr3_filtering,   0, CPUID_EBX, BIT(0)),
+   PT_CAP(psb_cyc, 0, CPUID_EBX, BIT(1)),
+   PT_CAP(ip_filtering,0, CPUID_EBX, BIT(2)),
+   PT_CAP(mtc, 0, CPUID_EBX, BIT(3)),
+   PT_CAP(ptwrite, 0, CPUID_EBX, BIT(4)),
+   PT_CAP(power_event_trace,   0, CPUID_EBX, BIT(5)),
+   PT_CAP(topa_output, 0, CPUID_ECX, BIT(0)),
+   PT_CAP(topa_multiple_entries,   0, CPUID_ECX, BIT(1)),
+   PT_CAP(single_range_output, 0, CPUID_ECX, BIT(2)),
+   PT_CAP(payloads_lip,0, CPUID_ECX, BIT(31)),
+   PT_CAP(num_address_ranges,  1, CPUID_EAX, 0x3),
+   PT_CAP(mtc_periods, 1, CPUID_EAX, 0xffff0000),
+   PT_CAP(cycle_thresholds,1, CPUID_EBX, 0xffff),
+   PT_CAP(psb_periods, 1, CPUID_EBX, 0xffff0000),
 };
 
 static u32 pt_cap_get(enum pt_capabilities cap)
@@ -213,10 +206,10 @@ static int __init pt_pmu_hw_init(void)
 
for (i = 0; i < PT_CPUID_LEAVES; i++) {
cpuid_count(20, i,
-   &pt_pmu.caps[CR_EAX + i*PT_CPUID_REGS_NUM],
-   &pt_pmu.caps[CR_EBX + i*PT_CPUID_REGS_NUM],
-   &pt_pmu.caps[CR_ECX + i*PT_CPUID_REGS_NUM],
-   &pt_pmu.caps[CR_EDX + i*PT_CPUID_REGS_NUM]);
+   &pt_pmu.caps[CPUID_EAX + i*PT_CPUID_REGS_NUM],
+   &pt_pmu.caps[CPUID_EBX + i*PT_CPUID_REGS_NUM],
+   &pt_pmu.caps[CPUID_ECX + i*PT_CPUID_REGS_NUM],
+   &pt_pmu.caps[CPUID_EDX + i*PT_CPUID_REGS_NUM]);
}
 
ret = -ENOMEM;
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 984a7bf..e7f8c62 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -137,6 +137,17 @@ struct cpuinfo_x86 {
u32 microcode;
 };
 
+struct cpuid_regs {
+   u32 eax, ebx, ecx, edx;
+};
+
+enum cpuid_regs_idx {
+   CPUID_EAX = 0,
+   CPUID_EBX,
+   CPUID_ECX,
+   CPUID_EDX,
+};
+
 #define X86_VENDOR_INTEL   0
 #define X86_VENDOR_CYRIX   1
 #define X86_VENDOR_AMD 2
@@ -178,6 +189,9 @@ extern void identify_secondary_cpu(struct cpuinfo_x86 *);
 extern void print_cpu_info(struct cpuinfo_x86 *);
 void print_cpu_msr(struct cpuinfo_x86 *);
 extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
+extern u32 get_scattered_cpuid_leaf(unsigned int level,
+   unsigned int sub_leaf,
+   enum cpuid_regs_idx reg);
 

[PATCH v4 2/2] cpuid: add AVX512_4VNNIW and AVX512_4FMAPS instructions support

2016-11-08 Thread He Chen
Add support for two new AVX512 instruction sets to KVM guests.

AVX512_4VNNIW:
Vector instructions for deep learning enhanced word variable precision.

AVX512_4FMAPS:
Vector instructions for deep learning floating-point single precision.
---
 arch/x86/kvm/cpuid.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index afa7bbb..ddcdf7c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include  /* For use_eager_fpu.  Ugh! */
 #include 
 #include 
@@ -65,6 +66,11 @@ u64 kvm_supported_xcr0(void)
 
 #define F(x) bit(X86_FEATURE_##x)
 
+/* These are scattered features in cpufeatures.h. */
+#define KVM_CPUID_BIT_AVX512_4VNNIW 2
+#define KVM_CPUID_BIT_AVX512_4FMAPS 3
+#define KF(x) bit(KVM_CPUID_BIT_##x)
+
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
 {
struct kvm_cpuid_entry2 *best;
@@ -376,6 +382,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
/* cpuid 7.0.ecx*/
const u32 kvm_cpuid_7_0_ecx_x86_features = F(PKU) | 0 /*OSPKE*/;
 
+   /* cpuid 7.0.edx*/
+   const u32 kvm_cpuid_7_0_edx_x86_features =
+   KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
 
@@ -458,12 +468,14 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
/* PKU is not yet implemented for shadow paging. */
if (!tdp_enabled)
entry->ecx &= ~F(PKU);
+   entry->edx &= kvm_cpuid_7_0_edx_x86_features;
+   entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX);
} else {
entry->ebx = 0;
entry->ecx = 0;
+   entry->edx = 0;
}
entry->eax = 0;
-   entry->edx = 0;
break;
}
case 9:
-- 
2.7.4



[PATCH v4 0/2] cpuid: Support AVX512_4VNNIW and AVX512_4FMAPS for KVM guest

2016-11-08 Thread He Chen
This patch series exposes two new AVX512 features to KVM guests.
Because both are defined as scattered features in the kernel, some
additional changes to the kernel's scattered-feature code are included.

---
Changes in v4:
* split the patch into 2 parts: the changes to scattered.c, and support
  for the new AVX512 instructions in KVM.
* fix coding style.
* refine commit message.

Changes in v3:
* add a helper in scattered.c to get scattered leaf.

Changes in v2:
* add new macros for new AVX512 scattered features.
* add a cpuid_count_edx function to processor.h

He Chen (2):
  cpuid: Add a helper in scattered.c to return cpuid leaf info
  cpuid: add AVX512_4VNNIW and AVX512_4FMAPS instructions support

 arch/x86/events/intel/pt.c   | 45 ++--
 arch/x86/include/asm/processor.h | 14 ++
 arch/x86/kernel/cpu/scattered.c  | 56 ++--
 arch/x86/kernel/cpuid.c  |  4 ---
 arch/x86/kvm/cpuid.c | 14 +-
 5 files changed, 83 insertions(+), 50 deletions(-)

--
2.7.4



Re: [PATCH v3] x86/cpuid: expose AVX512_4VNNIW and AVX512_4FMAPS features to kvm guest

2016-11-06 Thread He Chen
On Fri, Nov 04, 2016 at 11:52:35AM +0100, Borislav Petkov wrote:
> Please CC me on your future submissions, thanks.
> 

Sure.

> On Fri, Nov 04, 2016 at 03:07:19PM +0800, He Chen wrote:
> > The spec can be found in Intel Software Developer Manual or in
> > Instruction Set Extensions Programming Reference.
> 
> This commit message is completely useless. Write commit messages in
> the way as if you're explaining to another person *why* this change is
> needed and that other person doesn't have an idea what you're doing.
> 

My mistake; I will improve it in the next version. Thanks for the kind
advice.

> > Changes in v3:
> > * add a helper in scattered.c to get scattered leaf.
> 
> The modification to scattered et al without the kvm use should be a
> separate patch.
> 

Agreed.

> >   * Capabilities of Intel PT hardware, such as number of address bits or
> >   * supported output schemes, are cached and exported to userspace as "caps"
> > diff --git a/arch/x86/include/asm/processor.h 
> > b/arch/x86/include/asm/processor.h
> > index 984a7bf..47978b7 100644
> > --- a/arch/x86/include/asm/processor.h
> > +++ b/arch/x86/include/asm/processor.h
> > @@ -137,6 +137,13 @@ struct cpuinfo_x86 {
> > u32 microcode;
> >  };
> > 
> > +enum cpuid_regs_idx {
> 
> cpuid_regs was just fine.
> 

It should be, but I found that it conflicts with `struct cpuid_regs` in
`arch/x86/kernel/cpuid.c` once it is exported.

Thanks,
-He
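
The conflict comes from C itself: struct, union and enum tags share a single namespace, so one translation unit cannot declare both `struct cpuid_regs` and `enum cpuid_regs`. A minimal sketch of the two declarations coexisting once the enum tag is renamed follows. The layout mirrors the v5 revision in this thread; demo_reg() is a hypothetical helper added only for illustration and does not appear in the patches.

```c
#include <assert.h>
#include <stdint.h>

struct cpuid_regs {
	uint32_t eax, ebx, ecx, edx;
};

/*
 * Naming this enum 'cpuid_regs' as well would not compile, because the
 * struct above already owns that tag. A distinct tag avoids the clash.
 */
enum cpuid_regs_idx {
	CPUID_EAX = 0,
	CPUID_EBX,
	CPUID_ECX,
	CPUID_EDX,
};

/* Pick one register out of a struct cpuid_regs by index. */
static uint32_t demo_reg(const struct cpuid_regs *r, enum cpuid_regs_idx idx)
{
	switch (idx) {
	case CPUID_EAX: return r->eax;
	case CPUID_EBX: return r->ebx;
	case CPUID_ECX: return r->ecx;
	case CPUID_EDX: return r->edx;
	}
	assert(!"invalid register index");
	return 0;
}
```

Renaming the enum rather than the struct leaves existing users of `struct cpuid_regs` untouched.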


[PATCH v3] x86/cpuid: expose AVX512_4VNNIW and AVX512_4FMAPS features to kvm guest

2016-11-04 Thread He Chen
The spec can be found in Intel Software Developer Manual or in
Instruction Set Extensions Programming Reference.

Signed-off-by: Luwei Kang <luwei.k...@intel.com>
Signed-off-by: He Chen <he.c...@linux.intel.com>
---
Changes in v3:
* add a helper in scattered.c to get scattered leaf.

Changes in v2:
* add new macros for new AVX512 scattered features.
* add a cpuid_count_edx function to processor.h
---
 arch/x86/events/intel/pt.c   |  7 --
 arch/x86/include/asm/processor.h |  9 +++
 arch/x86/kernel/cpu/scattered.c  | 52 +++-
 arch/x86/kvm/cpuid.c | 14 ++-
 4 files changed, 57 insertions(+), 25 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index c5047b8..5b4b972 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -36,13 +36,6 @@ static DEFINE_PER_CPU(struct pt, pt_ctx);

 static struct pt_pmu pt_pmu;

-enum cpuid_regs {
-   CR_EAX = 0,
-   CR_ECX,
-   CR_EDX,
-   CR_EBX
-};
-
 /*
  * Capabilities of Intel PT hardware, such as number of address bits or
  * supported output schemes, are cached and exported to userspace as "caps"
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 984a7bf..47978b7 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -137,6 +137,13 @@ struct cpuinfo_x86 {
u32 microcode;
 };

+enum cpuid_regs_idx {
+   CR_EAX = 0,
+   CR_ECX,
+   CR_EDX,
+   CR_EBX
+};
+
 #define X86_VENDOR_INTEL   0
 #define X86_VENDOR_CYRIX   1
 #define X86_VENDOR_AMD 2
@@ -178,6 +185,8 @@ extern void identify_secondary_cpu(struct cpuinfo_x86 *);
 extern void print_cpu_info(struct cpuinfo_x86 *);
 void print_cpu_msr(struct cpuinfo_x86 *);
 extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
+extern u32 get_scattered_cpuid_leaf(unsigned int level,
+   unsigned int sub_leaf, enum cpuid_regs_idx reg);
 extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c);
 extern void init_amd_cacheinfo(struct cpuinfo_x86 *c);

diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 1db8dc4..ca3c605 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -17,11 +17,17 @@ struct cpuid_bit {
u32 sub_leaf;
 };

-enum cpuid_regs {
-   CR_EAX = 0,
-   CR_ECX,
-   CR_EDX,
-   CR_EBX
+/* Please keep the leaf sorted. */
+static const struct cpuid_bit cpuid_bits[] = {
+   { X86_FEATURE_APERFMPERF,   CR_ECX,  0, 0x00000006, 0 },
+   { X86_FEATURE_EPB,  CR_ECX,  3, 0x00000006, 0 },
+   { X86_FEATURE_INTEL_PT, CR_EBX, 25, 0x00000007, 0 },
+   { X86_FEATURE_AVX512_4VNNIW,CR_EDX,  2, 0x00000007, 0 },
+   { X86_FEATURE_AVX512_4FMAPS,CR_EDX,  3, 0x00000007, 0 },
+   { X86_FEATURE_HW_PSTATE,CR_EDX,  7, 0x80000007, 0 },
+   { X86_FEATURE_CPB,  CR_EDX,  9, 0x80000007, 0 },
+   { X86_FEATURE_PROC_FEEDBACK,CR_EDX, 11, 0x80000007, 0 },
+   { 0, 0, 0, 0, 0 }
 };

 void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
@@ -30,18 +36,6 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
u32 regs[4];
const struct cpuid_bit *cb;

-   static const struct cpuid_bit cpuid_bits[] = {
-   { X86_FEATURE_INTEL_PT, CR_EBX,25, 0x00000007, 0 },
-   { X86_FEATURE_AVX512_4VNNIW,CR_EDX, 2, 0x00000007, 0 },
-   { X86_FEATURE_AVX512_4FMAPS,CR_EDX, 3, 0x00000007, 0 },
-   { X86_FEATURE_APERFMPERF,   CR_ECX, 0, 0x00000006, 0 },
-   { X86_FEATURE_EPB,  CR_ECX, 3, 0x00000006, 0 },
-   { X86_FEATURE_HW_PSTATE,CR_EDX, 7, 0x80000007, 0 },
-   { X86_FEATURE_CPB,  CR_EDX, 9, 0x80000007, 0 },
-   { X86_FEATURE_PROC_FEEDBACK,CR_EDX,11, 0x80000007, 0 },
-   { 0, 0, 0, 0, 0 }
-   };
-
for (cb = cpuid_bits; cb->feature; cb++) {

/* Verify that the level is valid */
@@ -57,3 +51,27 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
set_cpu_cap(c, cb->feature);
}
 }
+
+u32 get_scattered_cpuid_leaf(unsigned int level, unsigned int sub_leaf,
+			     enum cpuid_regs_idx reg)
+{
+	u32 cpuid_val = 0;
+	const struct cpuid_bit *cb;
+
+	for (cb = cpuid_bits; cb->feature; cb++) {
+
+		if (level > cb->level)
+			continue;
+
+		if (level < cb->level)
+			break;
+
+		if (reg == cb->reg && sub_leaf == cb->sub_leaf) {
+			if (cpu_has(&boot_cpu_data, cb->feature))
+				cpuid_val |= BIT(cb->bit);
+		}
+	}
+
+	return cpuid_val;
+}
+EXPORT_SYMBOL_GPL(get_scattered_cpuid_leaf);
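
The lookup in the hunk above depends on cpuid_bits[] being sorted by leaf, so that the loop can stop at the first entry whose level is larger than the one requested. The following is a minimal user-space sketch of that idea, not the kernel code: the feature ids, the host_features bitmap, and the names host_has() and scattered_leaf() are hypothetical stand-ins for the X86_FEATURE_* constants, cpu_has(), and get_scattered_cpuid_leaf().

```c
#include <stdint.h>

/* Register indices, in the same order as the patch's enum cpuid_regs_idx. */
enum cpuid_regs_idx { CR_EAX = 0, CR_ECX, CR_EDX, CR_EBX };

struct cpuid_bit {
    uint16_t feature;   /* nonzero feature id; 0 terminates the table */
    uint8_t  reg;       /* which output register holds the bit */
    uint8_t  bit;       /* bit position inside that register */
    uint32_t level;     /* CPUID leaf (EAX input) */
    uint32_t sub_leaf;  /* CPUID sub-leaf (ECX input) */
};

/* Hypothetical feature ids, used only by this sketch. */
enum {
    FEAT_APERFMPERF = 1, FEAT_EPB, FEAT_INTEL_PT,
    FEAT_4VNNIW, FEAT_4FMAPS, FEAT_HW_PSTATE
};

/* Must stay sorted by level: the lookup stops at the first larger level. */
static const struct cpuid_bit cpuid_bits[] = {
    { FEAT_APERFMPERF, CR_ECX,  0, 0x00000006, 0 },
    { FEAT_EPB,        CR_ECX,  3, 0x00000006, 0 },
    { FEAT_INTEL_PT,   CR_EBX, 25, 0x00000007, 0 },
    { FEAT_4VNNIW,     CR_EDX,  2, 0x00000007, 0 },
    { FEAT_4FMAPS,     CR_EDX,  3, 0x00000007, 0 },
    { FEAT_HW_PSTATE,  CR_EDX,  7, 0x80000007, 0 },
    { 0, 0, 0, 0, 0 }
};

/* Stand-in for the boot CPU's capabilities: bit N set means feature id N is present. */
static uint32_t host_features;

static int host_has(unsigned int feature)
{
    return (host_features >> feature) & 1u;
}

/* Rebuild one register of one CPUID leaf from the features the host reports. */
uint32_t scattered_leaf(uint32_t level, uint32_t sub_leaf, enum cpuid_regs_idx reg)
{
    uint32_t val = 0;
    const struct cpuid_bit *cb;

    for (cb = cpuid_bits; cb->feature; cb++) {
        if (level > cb->level)
            continue;   /* entries for smaller leaves: keep scanning */
        if (level < cb->level)
            break;      /* sorted table: no later entry can match */
        if (reg == cb->reg && sub_leaf == cb->sub_leaf && host_has(cb->feature))
            val |= UINT32_C(1) << cb->bit;
    }
    return val;
}
```

With only FEAT_4VNNIW and FEAT_INTEL_PT marked present, scattered_leaf(7, 0, CR_EDX) yields 0x4 and scattered_leaf(7, 0, CR_EBX) yields bit 25. If the table were not sorted, the early break could skip matching entries, which is presumably why the patch adds the "keep the leaf sorted" comment.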

Re: [PATCH] x86/cpuid: expose AVX512_4VNNIW and AVX512_4FMAPS features to kvm guest

2016-11-01 Thread He Chen
On Mon, Oct 31, 2016 at 12:41:32PM +0100, Paolo Bonzini wrote:
> 
> 
> On 31/10/2016 12:05, Borislav Petkov wrote:
> > On Mon, Oct 31, 2016 at 11:47:48AM +0100, Paolo Bonzini wrote:
> >> The information is all in arch/x86/kernel/cpu/scattered.c's cpuid_bits
> >> array.  Borislav, would it be okay to export the cpuid_regs enum?
> > 
> > Yeah, and kill the duplicated one in arch/x86/events/intel/pt.c too
> > please, while at it.
> > 
> > I'd still put it all in arch/x86/kernel/cpu/scattered.c so that it is
> > close-by and call it from outside.
> 
> Good.  Chen, are you going to do this?
> 

Sure.

Before sending a patch, let me check that my understanding is right.
I will add a helper in scattered.c like:

unsigned int get_scattered_cpuid_features(unsigned int level,
		unsigned int sub_leaf, enum cpuid_regs reg)
{
	u32 val = 0;
	const struct cpuid_bit *cb;

	for (cb = cpuid_bits; cb->feature; cb++) {
		if (reg == cb->reg &&
		    level == cb->level &&
		    sub_leaf == cb->sub_leaf &&
		    boot_cpu_has(cb->feature))
			val |= BIT(cb->bit);
	}

	return val;
}

And when KVM wants to mask out features, it can call the helper like:

	entry->edx &= kvm_cpuid_7_0_edx_x86_features;
	entry->edx &= get_scattered_cpuid_features(7, 0, CR_EDX);
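
The two-step masking above can be sketched in isolation as follows. This is only an illustration under stated assumptions: mask_guest_edx() and the CPUID_7_0_EDX_* macros are hypothetical names, and host_mask stands for whatever the scattered-leaf helper would return on the host.

```c
#include <stdint.h>

/* Bit positions of the two features in CPUID.(EAX=7,ECX=0):EDX. */
#define CPUID_7_0_EDX_AVX512_4VNNIW  (UINT32_C(1) << 2)
#define CPUID_7_0_EDX_AVX512_4FMAPS  (UINT32_C(1) << 3)

/*
 * guest_edx: raw EDX value for leaf 7, sub-leaf 0
 * kvm_mask:  bits KVM is willing to expose to a guest
 * host_mask: bits the host CPU actually reports
 */
uint32_t mask_guest_edx(uint32_t guest_edx, uint32_t kvm_mask, uint32_t host_mask)
{
    guest_edx &= kvm_mask;   /* drop bits KVM does not expose */
    guest_edx &= host_mask;  /* drop bits the host CPU lacks */
    return guest_edx;
}
```

A guest therefore sees a feature only when it is both on KVM's allow-list and present on the host; either mask alone is not sufficient.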

Thanks,
-He


[PATCH v2] x86/cpuid: expose AVX512_4VNNIW and AVX512_4FMAPS features to kvm guest

2016-10-31 Thread He Chen
The spec can be found in the Intel Software Developer's Manual or in the
Instruction Set Extensions Programming Reference.

Signed-off-by: Luwei Kang <luwei.k...@intel.com>
Signed-off-by: He Chen <he.c...@linux.intel.com>
---
Changes in v2:
* add new macros for new AVX512 scattered features
* add a cpuid_count_edx function to processor.h
---
 arch/x86/include/asm/processor.h |  9 +++++++++
 arch/x86/kvm/cpuid.c             | 13 ++++++++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 984a7bf..e5ad7a74 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -577,6 +577,15 @@ static inline unsigned int cpuid_edx(unsigned int op)
 	return edx;
 }
 
+static inline unsigned int cpuid_count_edx(unsigned op, unsigned count)
+{
+	unsigned int eax, ebx, ecx, edx;
+
+	cpuid_count(op, count, &eax, &ebx, &ecx, &edx);
+
+	return edx;
+}
+
 /* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
 static __always_inline void rep_nop(void)
 {
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index afa7bbb..9990e7a 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -65,6 +65,11 @@ u64 kvm_supported_xcr0(void)
 
 #define F(x) bit(X86_FEATURE_##x)
 
+/* These are scattered features in cpufeatures.h. */
+#define KVM_CPUID_BIT_AVX512_4VNNIW 2
+#define KVM_CPUID_BIT_AVX512_4FMAPS 3
+#define KF(x) bit(KVM_CPUID_BIT_##x)
+
 int kvm_update_cpuid(struct kvm_vcpu *vcpu)
 {
struct kvm_cpuid_entry2 *best;
@@ -376,6 +381,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 	/* cpuid 7.0.ecx*/
 	const u32 kvm_cpuid_7_0_ecx_x86_features = F(PKU) | 0 /*OSPKE*/;
 
+	/* cpuid 7.0.edx*/
+	const u32 kvm_cpuid_7_0_edx_x86_features =
+		KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
+
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
 
@@ -458,12 +467,14 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			/* PKU is not yet implemented for shadow paging. */
 			if (!tdp_enabled)
 				entry->ecx &= ~F(PKU);
+			entry->edx &= kvm_cpuid_7_0_edx_x86_features;
+			entry->edx &= cpuid_count_edx(7, 0);
 		} else {
 			entry->ebx = 0;
 			entry->ecx = 0;
+			entry->edx = 0;
 		}
 		entry->eax = 0;
-		entry->edx = 0;
 		break;
 	}
 	case 9:
-- 
2.7.4
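
For readers who want to inspect the same register from user space, here is a rough equivalent of cpuid_count_edx() built on the compiler-provided <cpuid.h> header. It is a sketch, not part of the patch: cpuid_count_edx_user(), has_avx512_4vnniw() and has_avx512_4fmaps() are hypothetical names, and it assumes a GCC or Clang toolchain recent enough to provide __get_cpuid_count().

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Returns EDX of CPUID.(EAX=leaf, ECX=sub_leaf), or 0 if the leaf is not supported. */
uint32_t cpuid_count_edx_user(uint32_t leaf, uint32_t sub_leaf)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid_count() returns 0 when the leaf is above the CPU's maximum. */
    if (!__get_cpuid_count(leaf, sub_leaf, &eax, &ebx, &ecx, &edx))
        return 0;
    return edx;
#else
    (void)leaf;
    (void)sub_leaf;
    return 0;   /* not an x86 build: report nothing */
#endif
}

/* Bits 2 and 3 of CPUID.(EAX=7,ECX=0):EDX, matching KVM_CPUID_BIT_* in the patch. */
int has_avx512_4vnniw(void) { return (cpuid_count_edx_user(7, 0) >> 2) & 1; }
int has_avx512_4fmaps(void) { return (cpuid_count_edx_user(7, 0) >> 3) & 1; }
```

Unlike the kernel helper, this variant checks the maximum supported leaf before executing CPUID, so it returns 0 on processors that predate leaf 7.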



Re: [PATCH] x86/cpuid: expose AVX512_4VNNIW and AVX512_4FMAPS features to kvm guest

2016-10-28 Thread He Chen
On Fri, Oct 28, 2016 at 11:54:13AM +0200, Paolo Bonzini wrote:
> 
> 
> On 28/10/2016 11:46, He Chen wrote:
> > On Fri, Oct 28, 2016 at 11:31:05AM +0200, Paolo Bonzini wrote:
> >>
> >>
> >> On 28/10/2016 11:12, He Chen wrote:
> >>> The spec can be found in Intel Software Developer Manual or in
> >>> Instruction Set Extensions Programming Reference.
> >>>
> >>> Signed-off-by: Luwei Kang <luwei.k...@intel.com>
> >>> Signed-off-by: He Chen <he.c...@linux.intel.com>
> >>> ---
> >>>  arch/x86/kvm/cpuid.c | 7 ++-
> >>>  1 file changed, 6 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> >>> index afa7bbb..328b169 100644
> >>> --- a/arch/x86/kvm/cpuid.c
> >>> +++ b/arch/x86/kvm/cpuid.c
> >>> @@ -376,6 +376,10 @@ static inline int __do_cpuid_ent(struct 
> >>> kvm_cpuid_entry2 *entry, u32 function,
> >>>   /* cpuid 7.0.ecx*/
> >>>   const u32 kvm_cpuid_7_0_ecx_x86_features = F(PKU) | 0 /*OSPKE*/;
> >>>  
> >>> + /* cpuid 7.0.edx*/
> >>> + const u32 kvm_cpuid_7_0_edx_x86_features =
> >>> +0x4 /* AVX512-4VNNIW */ | 0x8 /* AVX512-4FMAPS */;
> >>
> >> Please define the new features in cpufeature.h first.
> >>
> > These 2 new features are defined as scattered features in the kernel.
> > In cpufeature.h, there are:
> > #define X86_FEATURE_AVX512_4VNNIW (7*32+16)
> > #define X86_FEATURE_AVX512_4FMAPS (7*32+17)
> > 
> > Please see the discussion here:
> > https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1250183.html
> 
> Uff, that sucks. :(  I'd agree with hpa's position in that thread.
> 
> Please do something like
> 
>   /* These are scattered features in cpufeature.h.  */
>   #define KVM_CPUID_BIT_AVX512_4VNNIW 2
>   #define KVM_CPUID_BIT_AVX512_4FMAPS 3
>   #define KF(x)   bit(KVM_CPUID_BIT_##x)
> 
> and then
> 
>   const u32 kvm_cpuid_7_0_edx_x86_features =
>   KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS)
> 
> I'll think of a trick to avoid using F for scattered features...
> 
Appreciate it :-)
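
The macro trick suggested above relies on token pasting: KF(NAME) glues NAME onto the KVM_CPUID_BIT_ prefix and turns the resulting bit number into a mask. A small stand-alone sketch follows; the bit() macro here is a local stand-in written for this example, not the kernel's own helper.

```c
#include <stdint.h>

/* Hardware bit positions in CPUID.(EAX=7,ECX=0):EDX, as in the suggestion above. */
#define KVM_CPUID_BIT_AVX512_4VNNIW 2
#define KVM_CPUID_BIT_AVX512_4FMAPS 3

/* Local stand-in for a bit-number-to-mask helper. */
#define bit(n)  (UINT32_C(1) << (n))

/* KF(NAME) pastes NAME onto the KVM_CPUID_BIT_ prefix, then builds the mask. */
#define KF(x)   bit(KVM_CPUID_BIT_##x)

static const uint32_t kvm_cpuid_7_0_edx_x86_features =
    KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS);
```

The resulting mask is 0x4 | 0x8, the same value the first version of the patch spelled out as literal constants.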


Re: [PATCH] x86/cpuid: expose AVX512_4VNNIW and AVX512_4FMAPS features to kvm guest

2016-10-28 Thread He Chen
On Fri, Oct 28, 2016 at 11:31:05AM +0200, Paolo Bonzini wrote:
> 
> 
> On 28/10/2016 11:12, He Chen wrote:
> > The spec can be found in Intel Software Developer Manual or in
> > Instruction Set Extensions Programming Reference.
> > 
> > Signed-off-by: Luwei Kang <luwei.k...@intel.com>
> > Signed-off-by: He Chen <he.c...@linux.intel.com>
> > ---
> >  arch/x86/kvm/cpuid.c | 7 ++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index afa7bbb..328b169 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -376,6 +376,10 @@ static inline int __do_cpuid_ent(struct 
> > kvm_cpuid_entry2 *entry, u32 function,
> > /* cpuid 7.0.ecx*/
> > const u32 kvm_cpuid_7_0_ecx_x86_features = F(PKU) | 0 /*OSPKE*/;
> >  
> > +   /* cpuid 7.0.edx*/
> > +   const u32 kvm_cpuid_7_0_edx_x86_features =
> > +0x4 /* AVX512-4VNNIW */ | 0x8 /* AVX512-4FMAPS */;
> 
> Please define the new features in cpufeature.h first.
> 
These 2 new features are defined as scattered features in the kernel.
In cpufeature.h, there are:
#define X86_FEATURE_AVX512_4VNNIW (7*32+16)
#define X86_FEATURE_AVX512_4FMAPS (7*32+17)

Please see the discussion here:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1250183.html
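
To make the mismatch concrete: a feature id of the form (word*32 + bit) records a position inside a Linux capability word, and for scattered features that position need not equal the bit in the hardware register. The sketch below decodes the two ids quoted above and compares them with the hardware positions used in the patch (bits 2 and 3 of EDX). The helper names are hypothetical.

```c
#include <stdint.h>

/* Values quoted above: word 7 of the capability array, bits 16 and 17. */
#define X86_FEATURE_AVX512_4VNNIW (7*32+16)
#define X86_FEATURE_AVX512_4FMAPS (7*32+17)

/* A feature id encodes (word * 32 + bit). */
static inline unsigned int feature_word(unsigned int f) { return f / 32; }
static inline unsigned int feature_bit(unsigned int f)  { return f % 32; }

/* Mask derived from the Linux-internal bit position of a feature id. */
static inline uint32_t linux_mask(unsigned int f)
{
    return UINT32_C(1) << feature_bit(f);
}

/* Mask derived from the hardware position in CPUID.(EAX=7,ECX=0):EDX. */
static inline uint32_t hw_mask(unsigned int hw_bit)
{
    return UINT32_C(1) << hw_bit;
}
```

Here linux_mask(X86_FEATURE_AVX512_4VNNIW) is 0x10000 while hw_mask(2) is 0x4, so a mask built from the feature id cannot be applied directly to the EDX value returned by CPUID.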

Thanks,
-He


[PATCH] x86/cpuid: expose AVX512_4VNNIW and AVX512_4FMAPS features to kvm guest

2016-10-28 Thread He Chen
The spec can be found in the Intel Software Developer's Manual or in the
Instruction Set Extensions Programming Reference.

Signed-off-by: Luwei Kang <luwei.k...@intel.com>
Signed-off-by: He Chen <he.c...@linux.intel.com>
---
 arch/x86/kvm/cpuid.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index afa7bbb..328b169 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -376,6 +376,10 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 	/* cpuid 7.0.ecx*/
 	const u32 kvm_cpuid_7_0_ecx_x86_features = F(PKU) | 0 /*OSPKE*/;
 
+	/* cpuid 7.0.edx*/
+	const u32 kvm_cpuid_7_0_edx_x86_features =
+		0x4 /* AVX512-4VNNIW */ | 0x8 /* AVX512-4FMAPS */;
+
 	/* all calls to cpuid_count() should be made on the same cpu */
 	get_cpu();
 
@@ -458,12 +462,13 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			/* PKU is not yet implemented for shadow paging. */
 			if (!tdp_enabled)
 				entry->ecx &= ~F(PKU);
+			entry->edx &= kvm_cpuid_7_0_edx_x86_features;
 		} else {
 			entry->ebx = 0;
 			entry->ecx = 0;
+			entry->edx = 0;
 		}
 		entry->eax = 0;
-		entry->edx = 0;
 		break;
 	}
 	case 9:
-- 
2.7.4


