On Sun, Jun 16, 2019 at 10:24:13PM +0200, Thomas Gleixner wrote:
> On Sun, 16 Jun 2019, Fenghua Yu wrote:
> 
> > Although x86_cache_max_rmid and x86_cache_occ_scale are only read once
> > during resctrl initialization, they are always stored in cpuinfo_x86 for
> > each CPU during run time without usage. Even if resctrl is not
> > configured, they still occupy space in cpuinfo_x86.
> 
> And that's a problem because?
> 
> > To save cpuinfo_x86 space and make CPU and resctrl initialization simpler,
> > remove the two fields from cpuinfo_x86 and get max rmid and occupancy
> 
> What is simpler? The fact that more code fiddles with CPUID? That's exactly
> the wrong direction.
> 
> The storage size of struct cpuinfo is largely uninteresting especially as
> long as we keep num_online_cpus copies of the same information around.
> 
> Just grep for places which invoke CPUID and then look how many of them do
> it over and over and even the code in arch/x86/kernel/cpu/ is an
> unpenetrable mess for exactly this reason.
> 
> The right thing to do is to have one instance of the CPUID information
> which is
> 
>   - a proper data structure with named fields and named bits
> 
>   - a single master instance which can be mapped to all CPUs
> 
> This data structure is filled in in one go by reading out all leaves and
> not by the maze we have today which puts together selected parts piecewise
> and never exposes a full and consistent picture.
> 
> This allows to remove all these custom copies of CPUID leaf readouts and
> allows proper filtering/disabling at a central place.
> 
> Making it a proper data structure with fields and bits gets rid of all that
> hex masking/shifting nonsense which is used to decode parts of those
> fields.
> 
> That's not a performance issue because all performance critical code should
> use static_cpu_has() anyway. For non critical code boot_cpu_has() is
> sufficient.
> 
> Upcoming secondary CPUs would do a sanity check on their CPUID content to
> check whether everything is symmetric. We should do that actually today
> because not detecting asymetric features early leads to exactly the issue
> which was fixed recently with loading the micro code earlier than perf.
> But we can't because the information retrieval is done in a gazillion of
> places.
>   
> Now you might argue that the upcoming asymetric processors (SIGH!) will
> require per CPU instances of the feature leafs. Sure that needs some
> thought, but it needs thought even with the current code and I'm absolutely
> not interested to duct tape that stuff into the current code.
> 
> The solution for this with the above scheme is to utilize the feature
> mismatch detection and have a proper classification which features are
> allowed to deviate and which are not. For those which can deviate, we can
> provide separate storage as this information needs to be propagated to
> other entities (fault handlers, placement code, xsave variants etc.). But
> that's a limited amount of information and the bulk will still be the same.
> 
> This mismatch detection is essential for dealing with future asymetric
> CPUs proper. When the kernel detects it, it can disable mismatching
> features which are not yet handled by code which has to be aware of them.
> 
> And disabling them means that with that scheme we can actually trap CPUID
> in userspace and provide it consistent and filtered information instead
> of having the mismatch between kernel view and user space view.
> 
> Borislav has experimented with that already, but thanks to the marvelous
> security features built into Intel (and other) CPUs this is still mostly a
> drawing board exercise.
> 
> Just for the record. Before this cleanup takes place, I'm not even looking
> at any patches which attempt to support asymetric processors. The current
> supply of duct tape engineering horrors is sufficient for bad mood. No need
> for more of that.

I see. Then this patch #1 doesn't make sense.

So in the next version, can I remove the patch #1, change the patch #2 as
follows, and keep the patch #3?

It's a waste for the four X86_FEATURE_CQM_* features to occupy two
pure feature bits words. To better utilize feature words, re-define
word 11 to host scattered features and move the four X86_FEATURE_CQM_*
features into Linux defined word 11. More scattered features can be added
in word 11 in the future.

Leaf 11 in cpuid_leafs is renamed as CPUID_LNX_4 to reflect it's a Linux
defined leaf.

Although word 12 doesn't have any feature now, leaf 12 in cpuid_leafs
still needs to be kept because cpuid_leafs must have NCAPINTS leafs.
Rename leaf 12 as CPUID_DUMMY which will be replaced by a meaningful name
in the next patch when CPUID.7.1:EAX occupies world 12.

KVM doesn't support resctrl now. So it's safe to move the
X86_FEATURE_CQM_* features to scattered features word 11 for KVM.

Signed-off-by: Fenghua Yu <[email protected]>
---
 arch/x86/include/asm/cpufeature.h  |  4 +--
 arch/x86/include/asm/cpufeatures.h | 17 ++++++----
 arch/x86/kernel/cpu/common.c       | 53 +++++++++++++++---------------
 arch/x86/kernel/cpu/cpuid-deps.c   |  3 ++
 arch/x86/kernel/cpu/scattered.c    |  4 +++
 arch/x86/kvm/cpuid.h               |  2 --
 6 files changed, 45 insertions(+), 38 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h 
b/arch/x86/include/asm/cpufeature.h
index 1d337c51f7e6..403f70c2e431 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -22,8 +22,8 @@ enum cpuid_leafs
        CPUID_LNX_3,
        CPUID_7_0_EBX,
        CPUID_D_1_EAX,
-       CPUID_F_0_EDX,
-       CPUID_F_1_EDX,
+       CPUID_LNX_4,
+       CPUID_DUMMY,
        CPUID_8000_0008_EBX,
        CPUID_6_EAX,
        CPUID_8000_000A_EDX,
diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index 75f27ee2c263..4f0a3d093794 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -269,13 +269,16 @@
 #define X86_FEATURE_XGETBV1            (10*32+ 2) /* XGETBV with ECX = 1 
instruction */
 #define X86_FEATURE_XSAVES             (10*32+ 3) /* XSAVES/XRSTORS 
instructions */
 
-/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:0 (EDX), word 11 */
-#define X86_FEATURE_CQM_LLC            (11*32+ 1) /* LLC QoS if 1 */
-
-/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:1 (EDX), word 12 */
-#define X86_FEATURE_CQM_OCCUP_LLC      (12*32+ 0) /* LLC occupancy monitoring 
*/
-#define X86_FEATURE_CQM_MBM_TOTAL      (12*32+ 1) /* LLC Total MBM monitoring 
*/
-#define X86_FEATURE_CQM_MBM_LOCAL      (12*32+ 2) /* LLC Local MBM monitoring 
*/
+/*
+ * Extended auxiliary flags: Linux defined - For features scattered in various
+ * CPUID levels like 0xf, word 11.
+ *
+ * Reuse free bits when adding new feature flags!
+ */
+#define X86_FEATURE_CQM_LLC            (11*32+ 0) /* LLC QoS if 1 */
+#define X86_FEATURE_CQM_OCCUP_LLC      (11*32+ 1) /* LLC occupancy monitoring 
*/
+#define X86_FEATURE_CQM_MBM_TOTAL      (11*32+ 2) /* LLC Total MBM monitoring 
*/
+#define X86_FEATURE_CQM_MBM_LOCAL      (11*32+ 3) /* LLC Local MBM monitoring 
*/
 
 /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */
 #define X86_FEATURE_CLZERO             (13*32+ 0) /* CLZERO instruction */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 2c57fffebf9b..f080be35da41 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -801,6 +801,31 @@ static void init_speculation_control(struct cpuinfo_x86 *c)
        }
 }
 
+static void get_cqm_info(struct cpuinfo_x86 *c)
+{
+       if (cpu_has(c, X86_FEATURE_CQM_LLC)) {
+               u32 eax, ebx, ecx, edx;
+
+               /* QoS sub-leaf, EAX=0Fh, ECX=0 */
+               cpuid_count(0x0000000F, 0, &eax, &ebx, &ecx, &edx);
+               /* will be overridden if occupancy monitoring exists */
+               c->x86_cache_max_rmid = ebx;
+
+               if (cpu_has(c, X86_FEATURE_CQM_OCCUP_LLC) ||
+                   cpu_has(c, X86_FEATURE_CQM_MBM_TOTAL) ||
+                   cpu_has(c, X86_FEATURE_CQM_MBM_LOCAL)) {
+                       /* QoS sub-leaf, EAX=0Fh, ECX=1 */
+                       cpuid_count(0x0000000F, 1, &eax, &ebx, &ecx, &edx);
+
+                       c->x86_cache_max_rmid = ecx;
+                       c->x86_cache_occ_scale = ebx;
+               }
+       } else {
+               c->x86_cache_max_rmid = -1;
+               c->x86_cache_occ_scale = -1;
+       }
+}
+
 void get_cpu_cap(struct cpuinfo_x86 *c)
 {
        u32 eax, ebx, ecx, edx;
@@ -832,33 +857,6 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
                c->x86_capability[CPUID_D_1_EAX] = eax;
        }
 
-       /* Additional Intel-defined flags: level 0x0000000F */
-       if (c->cpuid_level >= 0x0000000F) {
-
-               /* QoS sub-leaf, EAX=0Fh, ECX=0 */
-               cpuid_count(0x0000000F, 0, &eax, &ebx, &ecx, &edx);
-               c->x86_capability[CPUID_F_0_EDX] = edx;
-
-               if (cpu_has(c, X86_FEATURE_CQM_LLC)) {
-                       /* will be overridden if occupancy monitoring exists */
-                       c->x86_cache_max_rmid = ebx;
-
-                       /* QoS sub-leaf, EAX=0Fh, ECX=1 */
-                       cpuid_count(0x0000000F, 1, &eax, &ebx, &ecx, &edx);
-                       c->x86_capability[CPUID_F_1_EDX] = edx;
-
-                       if ((cpu_has(c, X86_FEATURE_CQM_OCCUP_LLC)) ||
-                             ((cpu_has(c, X86_FEATURE_CQM_MBM_TOTAL)) ||
-                              (cpu_has(c, X86_FEATURE_CQM_MBM_LOCAL)))) {
-                               c->x86_cache_max_rmid = ecx;
-                               c->x86_cache_occ_scale = ebx;
-                       }
-               } else {
-                       c->x86_cache_max_rmid = -1;
-                       c->x86_cache_occ_scale = -1;
-               }
-       }
-
        /* AMD-defined flags: level 0x80000001 */
        eax = cpuid_eax(0x80000000);
        c->extended_cpuid_level = eax;
@@ -889,6 +887,7 @@ void get_cpu_cap(struct cpuinfo_x86 *c)
 
        init_scattered_cpuid_features(c);
        init_speculation_control(c);
+       get_cqm_info(c);
 
        /*
         * Clear/Set all flags overridden by options, after probe.
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 2c0bd38a44ab..fa07a224e7b9 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -59,6 +59,9 @@ static const struct cpuid_dep cpuid_deps[] = {
        { X86_FEATURE_AVX512_4VNNIW,    X86_FEATURE_AVX512F   },
        { X86_FEATURE_AVX512_4FMAPS,    X86_FEATURE_AVX512F   },
        { X86_FEATURE_AVX512_VPOPCNTDQ, X86_FEATURE_AVX512F   },
+       { X86_FEATURE_CQM_OCCUP_LLC,    X86_FEATURE_CQM_LLC   },
+       { X86_FEATURE_CQM_MBM_TOTAL,    X86_FEATURE_CQM_LLC   },
+       { X86_FEATURE_CQM_MBM_LOCAL,    X86_FEATURE_CQM_LLC   },
        {}
 };
 
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 94aa1c72ca98..adf9b71386ef 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -26,6 +26,10 @@ struct cpuid_bit {
 static const struct cpuid_bit cpuid_bits[] = {
        { X86_FEATURE_APERFMPERF,       CPUID_ECX,  0, 0x00000006, 0 },
        { X86_FEATURE_EPB,              CPUID_ECX,  3, 0x00000006, 0 },
+       { X86_FEATURE_CQM_LLC,          CPUID_EDX,  1, 0x0000000f, 0 },
+       { X86_FEATURE_CQM_OCCUP_LLC,    CPUID_EDX,  0, 0x0000000f, 1 },
+       { X86_FEATURE_CQM_MBM_TOTAL,    CPUID_EDX,  1, 0x0000000f, 1 },
+       { X86_FEATURE_CQM_MBM_LOCAL,    CPUID_EDX,  2, 0x0000000f, 1 },
        { X86_FEATURE_CAT_L3,           CPUID_EBX,  1, 0x00000010, 0 },
        { X86_FEATURE_CAT_L2,           CPUID_EBX,  2, 0x00000010, 0 },
        { X86_FEATURE_CDP_L3,           CPUID_ECX,  2, 0x00000010, 1 },
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 9a327d5b6d1f..d78a61408243 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -47,8 +47,6 @@ static const struct cpuid_reg reverse_cpuid[] = {
        [CPUID_8000_0001_ECX] = {0x80000001, 0, CPUID_ECX},
        [CPUID_7_0_EBX]       = {         7, 0, CPUID_EBX},
        [CPUID_D_1_EAX]       = {       0xd, 1, CPUID_EAX},
-       [CPUID_F_0_EDX]       = {       0xf, 0, CPUID_EDX},
-       [CPUID_F_1_EDX]       = {       0xf, 1, CPUID_EDX},
        [CPUID_8000_0008_EBX] = {0x80000008, 0, CPUID_EBX},
        [CPUID_6_EAX]         = {         6, 0, CPUID_EAX},
        [CPUID_8000_000A_EDX] = {0x8000000a, 0, CPUID_EDX},
-- 
2.19.1


Thanks.

-Fenghua

Reply via email to