Re: [PATCH v2 01/24] x86/resctrl: Split struct rdt_resource

2021-04-08 Thread James Morse
Hi Reinette,

On 07/04/2021 00:42, Reinette Chatre wrote:
> On 4/6/2021 10:13 AM, James Morse wrote:
>> On 31/03/2021 22:35, Reinette Chatre wrote:
>>> On 3/12/2021 9:58 AM, James Morse wrote:
>>>> resctrl is the de facto Linux ABI for SoC resource partitioning features.
>>>> To support it on another architecture, it needs to be abstracted from
>>>> the features provided by Intel RDT and AMD PQoS, and moved to /fs/.
>>>>
>>>> Start by splitting struct rdt_resource, (the name is kept to keep the noise
>>>> down), and add some type-trickery to keep the foreach helpers working.

>>>> Move everything that is particular to resctrl into a new header
>>>> file, keeping the x86 hardware accessors where they are. resctrl code
>>>> paths touching a 'hw' struct indicate where an abstraction is needed.
>>>
>>> This establishes the significance of this patch. Here the rdt_resource
>>> struct is split up and it is this split that guides the subsequent
>>> abstraction. Considering this I find that this description does not
>>> explain the resulting split sufficiently.
>>>
>>> Specifically, after reading the above summary I expect fs information
>>> in rdt_resource and hw information in rdt_hw_resource but that does
>>> not seem to be the case. For example, num_rmid is a property obtained
>>> from hardware but is found in rdt_resource while other hardware
>>> properties initialized at the same time are found in rdt_hw_resource.
>>> It is interesting to look at when the hardware is discovered (for
>>> example, functions like cache_alloc_hsw_probe(),
>>> __get_mem_config_intel(), __rdt_get_mem_config_amd(),
>>> rdt_get_cache_alloc_cfg()). Note how some of the discovered values end
>>> up in rdt_resource and some in rdt_hw_resource.
>>
>>> I was expecting these properties discovered from hardware to
>>> be in rdt_hw_resource.
>>
>> Not all values discovered from the hardware are private to the architecture. 
>> They only
>> need to be private if there is some further abstraction involved.

> ok, but rdt_hw_resource is described as "hw attributes of a resctrl
> resource" so this can be very confusing if rdt_hw_resource does _not_
> actually contain (all of) the hw attributes of a resctrl resource.

Aha, right. I'm bad at naming things. This started as untangling the
hardware (cough: arch) specific bits, but some things have migrated back
the other way.

Do you think either of arch_rdt_resource or rdt_priv_resource is clearer?


> Could you please expand the kernel doc for rdt_hw_resource to explain
> that, apart from @resctrl (that I just noticed is missing a
> description),

I'll add one for mbm_width too,

> it contains attributes needing
> abstraction for different architectures as opposed to the actual hardware 
> attributes?

|/**
| * struct rdt_hw_resource - arch private attributes of a resctrl resource
| * @resctrl:   Attributes of the resource used directly by resctrl.
| * @num_closid: Number of CLOSIDs available.
| * @msr_base:  Base MSR address for CBMs
| * @msr_update: Function pointer to update QOS MSRs
| * @mon_scale: cqm counter * mon_scale = occupancy in bytes
| * @mbm_width: Monitor width, to detect and correct for overflow.
| *
| * Members of this structure are either private to the architecture
| * e.g. mbm_width, or accessed via helpers that provide abstraction. e.g.
| * msr_update and msr_base.
| */
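
For reference, the struct this kernel-doc describes ends up looking
roughly like this (a sketch assembled from the diffs later in this
thread, not the exact file contents):

| struct rdt_hw_resource {
| 	struct rdt_resource	resctrl;
| 	u32			num_closid;
| 	unsigned int		msr_base;
| 	void (*msr_update)(struct rdt_domain *d, struct msr_param *m,
| 			   struct rdt_resource *r);
| 	unsigned int		mon_scale;
| 	unsigned int		mbm_width;
| };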


>> On your specific example: the resctrl filesystem code allocates from
>> num_rmid. Its meaning doesn't change. num_closid on the other hand
>> changes depending on whether CDP is in use.
>>
>> Putting num_closid in resctrl's struct rdt_resource would work, but the
>> value is wrong once CDP is enabled. This would be annoying to debug;
>> hiding the hardware value and providing it via a helper avoids this, as
>> by the end of the series there is only one consumer:
>> schemata_list_create().
>>
>> For MPAM, the helper would return arm64's version of rdt_min_closid as
>> there is only one 'num_closid' for the system, regardless of the
>> resource. The driver has to duplicate the logic in closid_init() to
>> find the minimum common value of all the resources, as not all the
>> resources are exposed to resctrl, and an out-of-range closid value
>> triggers an error interrupt.
>>
>>
>>> It is also not clear to me how these structures are intended to be used ...

Re: [PATCH v2 02/24] x86/resctrl: Split struct rdt_domain

2021-04-08 Thread James Morse
Hi Reinette,

On 31/03/2021 22:36, Reinette Chatre wrote:
> On 3/12/2021 9:58 AM, James Morse wrote:
>> resctrl is the de facto Linux ABI for SoC resource partitioning features.
>> To support it on another architecture, it needs to be abstracted from
>> the features provided by Intel RDT and AMD PQoS, and moved to /fs/.
>>
>> Split struct rdt_domain up too. Move everything that that is particular
> 
> s/that that/that/
> 
>> to resctrl into a new header file. resctrl code paths touching a 'hw'
>> struct indicate where an abstraction is needed.
> 
> Similar to previous patch it would help to explain how this split was
> chosen. For example, why are the CPUs to which a resource is associated
> not considered a hardware property?

Similarly, because the meaning of those CPUs doesn't differ or change
between architectures.

I've expanded the middle paragraph in the commit message to explain why
the arch specific things are arch specific:
| Continue by splitting struct rdt_domain into an architecture private
| 'hw' struct, which contains the common resctrl structure that would be
| used by any architecture.
|
| The hardware values in ctrl_val and mbps_val need to be accessed via
| helpers to allow another architecture to convert these into a different
| format if necessary.
|
| After this split, filesystem code paths touching a 'hw' struct
| indicate where an abstraction is needed.

and similarly changed the kernel doc comment.
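
For reference, the resulting split looks roughly like this (a sketch;
the hardware views of the configuration sit behind the arch struct):

| struct rdt_hw_domain {
| 	struct rdt_domain	resctrl;	/* common, used by fs code */
| 	u32			*ctrl_val;	/* hw view of the configuration */
| 	u32			*mbps_val;	/* mba software controller values */
| };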


Let me know if you prefer some other struct name.


Thanks,

James


Re: [PATCH v2 00/24] x86/resctrl: Merge the CDP resources

2021-04-08 Thread James Morse
Hi Babu,

On 06/04/2021 22:37, Babu Moger wrote:
> On 4/6/21 12:19 PM, James Morse wrote:
>> On 30/03/2021 21:36, Babu Moger wrote:
>>> On 3/12/21 11:58 AM, James Morse wrote:
>>>> This series re-folds the resctrl code so the CDP resources (L3CODE et al)
>>>> behaviour is all contained in the filesystem parts, with a minimum amount
>>>> of arch specific code.

>>>> This series collapses the CODE/DATA resources, moving all the user-visible
>>>> resctrl ABI into what becomes the filesystem code. CDP becomes the type of
>>>> configuration being applied to a cache. This is done by adding a
>>>> struct resctrl_schema to the parts of resctrl that will move to fs. This
>>>> holds the arch-code resource that is in use for this schema, along with
>>>> other properties like the name, and whether the configuration being applied
>>>> is CODE/DATA/BOTH.
>>
>>
>>> I applied your patches on my AMD box.
>>
>> Great! Thanks for taking a look,
>>
>>
>>> Seeing some difference in the behavior.
>>
>> Ooer,
>>
>>
>>> Before these patches.
>>>
>>> # dmesg |grep -i resctrl
>>> [   13.076973] resctrl: L3 allocation detected
>>> [   13.087835] resctrl: L3DATA allocation detected
>>> [   13.092886] resctrl: L3CODE allocation detected
>>> [   13.097936] resctrl: MB allocation detected
>>> [   13.102599] resctrl: L3 monitoring detected
>>>
>>>
>>> After the patches.
>>>
>>> # dmesg |grep -i resctrl
>>> [   13.076973] resctrl: L3 allocation detected
>>> [   13.097936] resctrl: MB allocation detected
>>> [   13.102599] resctrl: L3 monitoring detected
>>>
>>> You can see that L3DATA and L3CODE disappeared. I think we should keep the
>>> behavior the same for x86 (at least).
>>
>> This is the kernel log ... what user-space software is parsing that
>> for an expected value? What happens if the resctrl strings have been
>> overwritten by more kernel log?
>>
>> I don't think user-space should be relying on this. I'd argue any
>> user-space doing this is already broken. Is it just the kernel
>> selftest's filter_dmesg()? It doesn't seem to do anything useful.
>>
>> Whether resctrl is supported can be read from /proc/filesystems. CDP
>> is probably a try-it-and-see. User-space could parse /proc/cpuinfo,
>> but it's probably not a good idea.

> Yes. Agree. Looking at the dmesg may not be the right way to figure out
> all the support details. As a normal practice, I searched for these
> texts and noticed the difference. That is why I felt it is best to keep
> those texts the same as before.

>> It's easy to fix, but it seems odd that the kernel has to print things
>> for user-space to try and parse. (I'd like to point at the user-space
>> software that depends on this)
> 
> I don't think there is any software that parses the dmesg for these
> details. These are info messages for the developers.

The kernel log changes all the time, messages at boot aren't something
you can depend on seeing later. Unless there is some user-space software
broken by this, I'm afraid I don't think it's a good idea to add extra
code to keep it the same.

Printing 'CDP supported by Lx' would be more useful to developers
perusing the log. Even more useful would be exposing feature attributes
via sysfs to say what resctrl supports without having to
mount-it-and-see. This can then be used by user-space too.
e.g.:
| cat /sys/fs/ext4/features/fast_commit


>>> I am still not clear why we needed resctrl_conf_type
>>>
>>> enum resctrl_conf_type {
>>> 	CDP_BOTH,
>>> 	CDP_CODE,
>>> 	CDP_DATA,
>>> };
>>>
>>> Right now, I see all the resources are initialized as CDP_BOTH.
>>>
>>> 	[RDT_RESOURCE_L3] =
>>> 	{
>>> 		.conf_type	= CDP_BOTH,
>>> 	[RDT_RESOURCE_L2] =
>>> 	{
>>> 		.conf_type	= CDP_BOTH,
>>> 	[RDT_RESOURCE_MBA] =
>>> 	{
>>> 		.conf_type	= CDP_BOTH,
>>
>> Ah, those should have been removed in patch 24. Once all the resources are 
>> the same, the
>> resource doesn't need to describe what kind it is.
>>
>>
>>> If all the resources are CDP_BOTH, then why we need separate CDP_CODE and
>>> CDP_DATA?
>>
>> The filesystem code for resctrl that will eventually move out of
>> arch/x86 needs to be able to describe the type of configuration change
>> being made back to the arch code. The enum gets used for that.

Re: [PATCH v2 00/24] x86/resctrl: Merge the CDP resources

2021-04-06 Thread James Morse
Hi Babu,

On 30/03/2021 21:36, Babu Moger wrote:
> On 3/12/21 11:58 AM, James Morse wrote:
>> This series re-folds the resctrl code so the CDP resources (L3CODE et al)
>> behaviour is all contained in the filesystem parts, with a minimum amount
>> of arch specific code.
>>
>> Arm have some CPU support for dividing caches into portions, and
>> applying bandwidth limits at various points in the SoC. The collective term
>> for these features is MPAM: Memory Partitioning and Monitoring.
>>
>> MPAM is similar enough to Intel RDT that it should use the de facto
>> Linux interface: resctrl. This filesystem currently lives under
>> arch/x86, and is tightly coupled to the architecture.
>> Ultimately, my plan is to split the existing resctrl code up to have an
>> arch<->fs abstraction, then move all the bits out to fs/resctrl. From there
>> MPAM can be wired up.
>>
>> x86 might have two resources with cache controls (L2 and L3), but has
>> extra copies for CDP: L{2,3}{CODE,DATA}, which are marked as enabled
>> if CDP is enabled for the corresponding cache.
>>
>> MPAM has an equivalent feature to CDP, but it's a property of the CPU,
>> not the cache. Resctrl needs to have x86's odd/even behaviour, as that
>> is the ABI, but this isn't how the MPAM hardware works. It is entirely
>> possible that an in-kernel user of MPAM would not be using CDP, whereas
>> resctrl is.
>> Pretending L3CODE and L3DATA are entirely separate resources is a neat
>> trick, but doing this is specific to x86.
>> Doing this leaves the arch code in control of various parts of the
>> filesystem ABI: the resources names, and the way the schemata are parsed.
>> Allowing this stuff to vary between architectures is bad for user space.
>>
>> This series collapses the CODE/DATA resources, moving all the user-visible
>> resctrl ABI into what becomes the filesystem code. CDP becomes the type of
>> configuration being applied to a cache. This is done by adding a
>> struct resctrl_schema to the parts of resctrl that will move to fs. This
>> holds the arch-code resource that is in use for this schema, along with
>> other properties like the name, and whether the configuration being applied
>> is CODE/DATA/BOTH.


> I applied your patches on my AMD box.

Great! Thanks for taking a look,


> Seeing some difference in the behavior.

Ooer,


> Before these patches.
> 
> # dmesg |grep -i resctrl
> [   13.076973] resctrl: L3 allocation detected
> [   13.087835] resctrl: L3DATA allocation detected
> [   13.092886] resctrl: L3CODE allocation detected
> [   13.097936] resctrl: MB allocation detected
> [   13.102599] resctrl: L3 monitoring detected
> 
> 
> After the patches.
> 
> # dmesg |grep -i resctrl
> [   13.076973] resctrl: L3 allocation detected
> [   13.097936] resctrl: MB allocation detected
> [   13.102599] resctrl: L3 monitoring detected
> 
> You can see that L3DATA and L3CODE disappeared. I think we should keep the
> behavior the same for x86 (at least).

This is the kernel log ... what user-space software is parsing that for
an expected value? What happens if the resctrl strings have been
overwritten by more kernel log?

I don't think user-space should be relying on this. I'd argue any
user-space doing this is already broken. Is it just the kernel
selftest's filter_dmesg()? It doesn't seem to do anything useful.

Whether resctrl is supported can be read from /proc/filesystems. CDP is
probably a try-it-and-see. User-space could parse /proc/cpuinfo, but
it's probably not a good idea.


It's easy to fix, but it seems odd that the kernel has to print things
for user-space to try and parse. (I'd like to point at the user-space
software that depends on this)


> I am still not clear why we needed resctrl_conf_type
> 
> enum resctrl_conf_type {
> 	CDP_BOTH,
> 	CDP_CODE,
> 	CDP_DATA,
> };
> 
> Right now, I see all the resources are initialized as CDP_BOTH.
> 
> 	[RDT_RESOURCE_L3] =
> 	{
> 		.conf_type	= CDP_BOTH,
> 	[RDT_RESOURCE_L2] =
> 	{
> 		.conf_type	= CDP_BOTH,
> 	[RDT_RESOURCE_MBA] =
> 	{
> 		.conf_type	= CDP_BOTH,

Ah, those should have been removed in patch 24. Once all the resources
are the same, the resource doesn't need to describe what kind it is.


> If all the resources are CDP_BOTH, then why do we need separate CDP_CODE and
> CDP_DATA?

The filesystem code for resctrl that will eventually move out of
arch/x86 needs to be able to describe the type of configuration change
being made back to the arch code. The enum gets used for that.

x86 needs this as it 

Re: [PATCH v2 01/24] x86/resctrl: Split struct rdt_resource

2021-04-06 Thread James Morse
Hi Reinette,

On 31/03/2021 22:35, Reinette Chatre wrote:
> On 3/12/2021 9:58 AM, James Morse wrote:
>> resctrl is the de facto Linux ABI for SoC resource partitioning features.
>> To support it on another architecture, it needs to be abstracted from
>> the features provided by Intel RDT and AMD PQoS, and moved to /fs/.
>>
>> Start by splitting struct rdt_resource, (the name is kept to keep the noise
>> down), and add some type-trickery to keep the foreach helpers working.

> Could you please replace "add some type-trickery" with a description of
> the changes (tricks?) referred to? Comments in the code would be helpful
> also ... helping to avoid frowning at what at first glance seems like an
> out-of-bounds access.

Sure, this paragraph is rephrased:
| Start by splitting struct rdt_resource into an arch specific 'hw'
| struct, which contains the common resctrl structure that would be used
| by any architecture.
|
| The foreach helpers are most commonly used by the filesystem code,
| and should return the common resctrl structure. for_each_rdt_resource()
| is changed to walk the common structure in its parent arch specific
| structure.

and a comment above for_each_rdt_resource():
| /*
|  * To return the common struct rdt_resource, which is contained in struct
|  * rdt_hw_resource, walk the resctrl member of struct rdt_hw_resource.
|  * This makes the loop limit the resctrl member just past the end of the
|  * array.
|  */
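
Put together, the walker ends up roughly as below (a sketch matching the
fragments visible in the later patches; resctrl_inc() steps over the
arch private members):

| static inline struct rdt_resource *resctrl_inc(struct rdt_resource *res)
| {
| 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(res);
|
| 	hw_res++;	/* next entry in the rdt_hw_resource array */
| 	return &hw_res->resctrl;
| }
|
| #define for_each_rdt_resource(r)					\
| 	for (r = &rdt_resources_all[0].resctrl;				\
| 	     r < &rdt_resources_all[RDT_NUM_RESOURCES].resctrl;	\
| 	     r = resctrl_inc(r))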


>> Move everything that is particular to resctrl into a new header
>> file, keeping the x86 hardware accessors where they are. resctrl code
>> paths touching a 'hw' struct indicate where an abstraction is needed.
> 
> This establishes the significance of this patch. Here the rdt_resource
> struct is split up and it is this split that guides the subsequent
> abstraction. Considering this I find that this description does not
> explain the resulting split sufficiently.
> 
> Specifically, after reading the above summary I expect fs information
> in rdt_resource and hw information in rdt_hw_resource but that does not
> seem to be the case. For example, num_rmid is a property obtained from
> hardware but is found in rdt_resource while other hardware properties
> initialized at the same time are found in rdt_hw_resource. It is
> interesting to look at when the hardware is discovered (for example,
> functions like cache_alloc_hsw_probe(), __get_mem_config_intel(),
> __rdt_get_mem_config_amd(), rdt_get_cache_alloc_cfg()). Note how some
> of the discovered values end up in rdt_resource and some in
> rdt_hw_resource.

> I was expecting these properties discovered from hardware to
> be in rdt_hw_resource.

Not all values discovered from the hardware are private to the
architecture. They only need to be private if there is some further
abstraction involved.

There is a trade-off here. Everything could be accessed via helpers, but
I think that would result in a lot of boilerplate.

On your specific example: the resctrl filesystem code allocates from
num_rmid. Its meaning doesn't change. num_closid on the other hand
changes depending on whether CDP is in use.

Putting num_closid in resctrl's struct rdt_resource would work, but the
value is wrong once CDP is enabled. This would be annoying to debug;
hiding the hardware value and providing it via a helper avoids this, as
by the end of the series there is only one consumer:
schemata_list_create().

For MPAM, the helper would return arm64's version of rdt_min_closid as
there is only one 'num_closid' for the system, regardless of the
resource. The driver has to duplicate the logic in closid_init() to find
the minimum common value of all the resources, as not all the resources
are exposed to resctrl, and an out-of-range closid value triggers an
error interrupt.
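
On x86 the helper ends up trivial; a sketch of its shape, using the
resctrl_arch_get_num_closid() name that appears later in this series:

| u32 resctrl_arch_get_num_closid(struct rdt_resource *r)
| {
| 	return resctrl_to_arch_res(r)->num_closid;
| }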


> It is also not clear to me how these structures are intended to be used
> for related hardware properties. For example, rdt_resource keeps the
> properties alloc_capable/alloc_enabled/mon_capable/mon_enabled - but in
> this series companion properties of cdp_capable/cdp_enabled are
> introduced and placed in rdt_hw_resource.

There needs to be further abstraction around cdp_enabled. For Arm's
MPAM, CDP is emulated by providing different closids for data-access and
instruction-fetch. This is done in the equivalent to IA32_PQR_ASSOC, so
it affects all the resources.

For MPAM all resources would be cdp_capable, so the field doesn't need
to exist. cdp_enabled has to be used via a helper, as it's a global
property for all the tasks that resctrl is in control of, not a
per-resource field.

(this is the reason the previous version tried to make the CDP state
global, on the assumption it would never appear on both L2 and L3 for
x86 systems)

(The next patch after these removes alloc_enabled, as it n

[PATCH v2 24/24] x86/resctrl: Merge the CDP resources

2021-03-12 Thread James Morse
Now that resctrl uses the schema's configuration type as the source of
CODE/DATA configuration styles, and there is only one configuration
array between the three views of the resource, remove the CODE and DATA
aliases.

This allows the alloc_ctrlval_array() and complications around free()ing
the ctrl_val arrays to be removed.

To continue providing the CDP user interface, the resctrl filesystem
code creates two schemata for one resource when CDP is enabled, and
generates the names itself. This ensures the user interface is the
same when another architecture emulates CDP's behaviour.
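
A sketch of the idea (the schemata_list_add() helper and its suffix
argument are illustrative, not the exact patch):

| static int schemata_list_add(struct rdt_resource *r,
| 			     enum resctrl_conf_type type, const char *suffix)
| {
| 	struct resctrl_schema *s;
|
| 	s = kzalloc(sizeof(*s), GFP_KERNEL);
| 	if (!s)
| 		return -ENOMEM;
|
| 	s->res = r;
| 	s->conf_type = type;
| 	snprintf(s->name, sizeof(s->name), "%s%s", r->name, suffix);
| 	list_add(&s->list, &resctrl_schema_all);
|
| 	return 0;
| }

With CDP enabled on L3, this would be called once with CDP_CODE/"CODE"
and once with CDP_DATA/"DATA", producing the familiar L3CODE and L3DATA
schemata from the single RDT_RESOURCE_L3 resource.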

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
Changes since v1:
 * rdt_get_cdp_config() is kept for its comment.
---
 arch/x86/kernel/cpu/resctrl/core.c | 174 ++---
 arch/x86/kernel/cpu/resctrl/internal.h |   4 -
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 122 -
 3 files changed, 75 insertions(+), 225 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index 5021a726e87d..7c20d0469b3a 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -78,42 +78,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.msr_base   = MSR_IA32_L3_CBM_BASE,
.msr_update = cat_wrmsr,
},
-   [RDT_RESOURCE_L3DATA] =
-   {
-   .conf_type  = CDP_DATA,
-   .resctrl = {
-   .rid= RDT_RESOURCE_L3DATA,
-   .name   = "L3DATA",
-   .cache_level= 3,
-   .cache = {
-   .min_cbm_bits   = 1,
-   },
-			.domains	= domain_init(RDT_RESOURCE_L3DATA),
-   .parse_ctrlval  = parse_cbm,
-   .format_str = "%d=%0*x",
-   .fflags = RFTYPE_RES_CACHE,
-   },
-   .msr_base   = MSR_IA32_L3_CBM_BASE,
-   .msr_update = cat_wrmsr,
-   },
-   [RDT_RESOURCE_L3CODE] =
-   {
-   .conf_type  = CDP_CODE,
-   .resctrl = {
-   .rid= RDT_RESOURCE_L3CODE,
-   .name   = "L3CODE",
-   .cache_level= 3,
-   .cache = {
-   .min_cbm_bits   = 1,
-   },
-			.domains	= domain_init(RDT_RESOURCE_L3CODE),
-   .parse_ctrlval  = parse_cbm,
-   .format_str = "%d=%0*x",
-   .fflags = RFTYPE_RES_CACHE,
-   },
-   .msr_base   = MSR_IA32_L3_CBM_BASE,
-   .msr_update = cat_wrmsr,
-   },
[RDT_RESOURCE_L2] =
{
.conf_type  = CDP_BOTH,
@@ -132,42 +96,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.msr_base   = MSR_IA32_L2_CBM_BASE,
.msr_update = cat_wrmsr,
},
-   [RDT_RESOURCE_L2DATA] =
-   {
-   .conf_type  = CDP_DATA,
-   .resctrl = {
-   .rid= RDT_RESOURCE_L2DATA,
-   .name   = "L2DATA",
-   .cache_level= 2,
-   .cache = {
-   .min_cbm_bits   = 1,
-   },
-			.domains	= domain_init(RDT_RESOURCE_L2DATA),
-   .parse_ctrlval  = parse_cbm,
-   .format_str = "%d=%0*x",
-   .fflags = RFTYPE_RES_CACHE,
-   },
-   .msr_base   = MSR_IA32_L2_CBM_BASE,
-   .msr_update = cat_wrmsr,
-   },
-   [RDT_RESOURCE_L2CODE] =
-   {
-   .conf_type  = CDP_CODE,
-   .resctrl = {
-   .rid= RDT_RESOURCE_L2CODE,
-   .name   = "L2CODE",
-   .cache_level= 2,
-   .cache = {
-   .min_cbm_bits   = 1,
-   },
-			.domains	= domain_init(RDT_RESOURCE_L2CODE),
-   .parse_ctrlval  = parse_cbm,
-   .format_str = "%d=%0*x",
-   .fflags = RFTYPE_RES_CACHE,
-

[PATCH v2 23/24] x86/resctrl: Remove rdt_cdp_peer_get()

2021-03-12 Thread James Morse
Now that the configuration can be read from either CDP peer,
rdt_cdp_peer_get() is not needed to map the resource and search for the
corresponding domain.

As __rdtgroup_cbm_overlaps() takes the configuration type from the schema,
this can be replaced with a second call for the other configuration type
if CDP is enabled.

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 

Changes since v1:
 * Expanded commit message.
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 99 --
 1 file changed, 14 insertions(+), 85 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 801bff59db06..36e2905f4da6 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1092,82 +1092,17 @@ static int rdtgroup_mode_show(struct kernfs_open_file 
*of,
return 0;
 }
 
-/**
- * rdt_cdp_peer_get - Retrieve CDP peer if it exists
- * @r: RDT resource to which RDT domain @d belongs
- * @d: Cache instance for which a CDP peer is requested
- * @r_cdp: RDT resource that shares hardware with @r (RDT resource peer)
- * Used to return the result.
- * @d_cdp: RDT domain that shares hardware with @d (RDT domain peer)
- * Used to return the result.
- * @peer_type: The CDP configuration type of the peer resource.
- *
- * RDT resources are managed independently and by extension the RDT domains
- * (RDT resource instances) are managed independently also. The Code and
- * Data Prioritization (CDP) RDT resources, while managed independently,
- * could refer to the same underlying hardware. For example,
- * RDT_RESOURCE_L2CODE and RDT_RESOURCE_L2DATA both refer to the L2 cache.
- *
- * When provided with an RDT resource @r and an instance of that RDT
- * resource @d rdt_cdp_peer_get() will return if there is a peer RDT
- * resource and the exact instance that shares the same hardware.
- *
- * Return: 0 if a CDP peer was found, <0 on error or if no CDP peer exists.
- * If a CDP peer was found, @r_cdp will point to the peer RDT resource
- * and @d_cdp will point to the peer RDT domain.
- */
-static int rdt_cdp_peer_get(struct rdt_resource *r, struct rdt_domain *d,
-   struct rdt_resource **r_cdp,
-   struct rdt_domain **d_cdp,
-   enum resctrl_conf_type *peer_type)
+static enum resctrl_conf_type resctrl_peer_type(enum resctrl_conf_type my_type)
 {
-   struct rdt_resource *_r_cdp = NULL;
-   struct rdt_domain *_d_cdp = NULL;
-   int ret = 0;
-
-   switch (r->rid) {
-   case RDT_RESOURCE_L3DATA:
-		_r_cdp = &rdt_resources_all[RDT_RESOURCE_L3CODE].resctrl;
-   *peer_type = CDP_CODE;
-   break;
-   case RDT_RESOURCE_L3CODE:
-		_r_cdp = &rdt_resources_all[RDT_RESOURCE_L3DATA].resctrl;
-   *peer_type = CDP_DATA;
-   break;
-   case RDT_RESOURCE_L2DATA:
-		_r_cdp = &rdt_resources_all[RDT_RESOURCE_L2CODE].resctrl;
-   *peer_type = CDP_CODE;
-   break;
-   case RDT_RESOURCE_L2CODE:
-		_r_cdp = &rdt_resources_all[RDT_RESOURCE_L2DATA].resctrl;
-   *peer_type = CDP_DATA;
-   break;
+   switch (my_type) {
+   case CDP_CODE:
+   return CDP_DATA;
+   case CDP_DATA:
+   return CDP_CODE;
default:
-   ret = -ENOENT;
-   goto out;
-   }
-
-   /*
-* When a new CPU comes online and CDP is enabled then the new
-* RDT domains (if any) associated with both CDP RDT resources
-* are added in the same CPU online routine while the
-* rdtgroup_mutex is held. It should thus not happen for one
-* RDT domain to exist and be associated with its RDT CDP
-* resource but there is no RDT domain associated with the
-* peer RDT CDP resource. Hence the WARN.
-*/
-   _d_cdp = rdt_find_domain(_r_cdp, d->id, NULL);
-   if (WARN_ON(IS_ERR_OR_NULL(_d_cdp))) {
-   _r_cdp = NULL;
-   _d_cdp = NULL;
-   ret = -EINVAL;
+   case CDP_BOTH:
+   return CDP_BOTH;
}
-
-out:
-   *r_cdp = _r_cdp;
-   *d_cdp = _d_cdp;
-
-   return ret;
 }
 
 /**
@@ -1248,19 +1183,16 @@ static bool __rdtgroup_cbm_overlaps(struct rdt_resource 
*r, struct rdt_domain *d
 bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_domain *d,
   unsigned long cbm, int closid, bool exclusive)
 {
-   enum resctrl_conf_type peer_type;
+   enum resctrl_conf_type peer_type = resctrl_peer_type(s->conf_type);
struct rdt_resource *r = s->res;
-   struct rdt_resource *r_cdp;
-   struct rdt_domain *d_cdp;
 
if (__rdtgroup_cbm_overlaps(r, d, cbm, closid, s->conf_type,
exclusive))
 

[PATCH v2 22/24] x86/resctrl: Merge the ctrl_val arrays

2021-03-12 Thread James Morse
Now that the CODE/DATA resources don't use overlapping slots in the
ctrl_val arrays, they can be merged. This allows the cdp_peer configuration
to be read from either resource's domain, instead of searching for the
matching flavour.

Doing this before merging the resources temporarily complicates
allocating and freeing the ctrl_val arrays. Add a helper to allocate
the ctrl_val array, that returns the value on the L2 or L3 resource
if it already exists. This gets removed once the resources are merged,
and there really is only one ctrl_val array.

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
Changes since v1:
 * Added underscores to ctrlval when it's not in a function name
 * Removed temporary free_ctrlval_arrays() function.
 * Reduced churn in domain_setup_ctrlval().

Doing this before the resources are merged allows any bugs introduced to
be bisected to a smaller change, and keeps the rdt_cdp_peer_get() change
separate.
---
 arch/x86/kernel/cpu/resctrl/core.c | 66 --
 1 file changed, 62 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index 8d5c1e9eefa1..5021a726e87d 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -509,6 +509,58 @@ void setup_default_ctrlval(struct rdt_resource *r, u32 
*dc, u32 *dm)
}
 }
 
+static u32 *alloc_ctrlval_array(struct rdt_resource *r, struct rdt_domain *d,
+   bool mba_sc)
+{
+   /* these are for the underlying hardware, they may not match r/d */
+   struct rdt_domain *underlying_domain;
+   struct rdt_hw_resource *hw_res;
+   struct rdt_hw_domain *hw_dom;
+   bool remapped;
+
+   switch (r->rid) {
+   case RDT_RESOURCE_L3DATA:
+   case RDT_RESOURCE_L3CODE:
+		hw_res = &rdt_resources_all[RDT_RESOURCE_L3];
+   remapped = true;
+   break;
+   case RDT_RESOURCE_L2DATA:
+   case RDT_RESOURCE_L2CODE:
+		hw_res = &rdt_resources_all[RDT_RESOURCE_L2];
+   remapped = true;
+   break;
+   default:
+   hw_res = resctrl_to_arch_res(r);
+   remapped = false;
+   }
+
+
+   /*
+* If we changed the resource, we need to search for the underlying
+* domain. Doing this for all resources would make it tricky to add the
+* first resource, as domains aren't added to a resource list until
+* after the ctrlval arrays have been allocated.
+*/
+   if (remapped)
+		underlying_domain = rdt_find_domain(&hw_res->resctrl, d->id,
+   NULL);
+   else
+   underlying_domain = d;
+   hw_dom = resctrl_to_arch_dom(underlying_domain);
+
+   if (mba_sc) {
+   if (hw_dom->mbps_val)
+   return hw_dom->mbps_val;
+   return kmalloc_array(hw_res->num_closid,
+sizeof(*hw_dom->mbps_val), GFP_KERNEL);
+   } else {
+   if (hw_dom->ctrl_val)
+   return hw_dom->ctrl_val;
+   return kmalloc_array(hw_res->num_closid,
+sizeof(*hw_dom->ctrl_val), GFP_KERNEL);
+   }
+}
+
 static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
 {
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
@@ -516,11 +568,11 @@ static int domain_setup_ctrlval(struct rdt_resource *r, 
struct rdt_domain *d)
struct msr_param m;
u32 *dc, *dm;
 
-   dc = kmalloc_array(hw_res->num_closid, sizeof(*hw_dom->ctrl_val), 
GFP_KERNEL);
+   dc = alloc_ctrlval_array(r, d, false);
if (!dc)
return -ENOMEM;
 
-   dm = kmalloc_array(hw_res->num_closid, sizeof(*hw_dom->mbps_val), 
GFP_KERNEL);
+   dm = alloc_ctrlval_array(r, d, true);
if (!dm) {
kfree(dc);
return -ENOMEM;
@@ -679,8 +731,14 @@ static void domain_remove_cpu(int cpu, struct rdt_resource 
*r)
if (d->plr)
d->plr->d = NULL;
 
-   kfree(hw_dom->ctrl_val);
-   kfree(hw_dom->mbps_val);
+   /* temporary: these four don't have a unique ctrlval array */
+   if ((r->rid != RDT_RESOURCE_L3CODE) &&
+   (r->rid != RDT_RESOURCE_L3DATA) &&
+   (r->rid != RDT_RESOURCE_L2CODE) &&
+   (r->rid != RDT_RESOURCE_L2DATA)) {
+   kfree(hw_dom->ctrl_val);
+   kfree(hw_dom->mbps_val);
+   }
bitmap_free(d->rmid_busy_llc);
kfree(d->mbm_total);
kfree(d->mbm_local);
-- 
2.30.0



[PATCH v2 21/24] x86/resctrl: Calculate the index from the configuration type

2021-03-12 Thread James Morse
resctrl uses cbm_idx() to map a closid to an index in the
configuration array. This is based on a multiplier and offset
that are held in the resource.

To merge the resources, the resctrl arch code needs to calculate
the index from something else, as there will only be one resource.

Decide based on the staged configuration type. This makes the
static mult and offset parameters redundant.
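
Worked through with the new get_config_index() (visible in the diff
below), the odd/even mapping is:

| get_config_index(1, CDP_BOTH);	/* 1: CDP off, identity mapping */
| get_config_index(1, CDP_DATA);	/* 2: DATA uses the even slots  */
| get_config_index(1, CDP_CODE);	/* 3: CODE uses the odd slots   */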

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/core.c| 12 
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 17 +++--
 include/linux/resctrl.h   |  6 --
 3 files changed, 11 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index cb3186bc248b..8d5c1e9eefa1 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -69,8 +69,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level= 3,
.cache = {
.min_cbm_bits   = 1,
-   .cbm_idx_mult   = 1,
-   .cbm_idx_offset = 0,
},
.domains= domain_init(RDT_RESOURCE_L3),
.parse_ctrlval  = parse_cbm,
@@ -89,8 +87,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level= 3,
.cache = {
.min_cbm_bits   = 1,
-   .cbm_idx_mult   = 2,
-   .cbm_idx_offset = 0,
},
 			.domains	= domain_init(RDT_RESOURCE_L3DATA),
.parse_ctrlval  = parse_cbm,
@@ -109,8 +105,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level= 3,
.cache = {
.min_cbm_bits   = 1,
-   .cbm_idx_mult   = 2,
-   .cbm_idx_offset = 1,
},
 			.domains	= domain_init(RDT_RESOURCE_L3CODE),
.parse_ctrlval  = parse_cbm,
@@ -129,8 +123,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level= 2,
.cache = {
.min_cbm_bits   = 1,
-   .cbm_idx_mult   = 1,
-   .cbm_idx_offset = 0,
},
.domains= domain_init(RDT_RESOURCE_L2),
.parse_ctrlval  = parse_cbm,
@@ -149,8 +141,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level= 2,
.cache = {
.min_cbm_bits   = 1,
-   .cbm_idx_mult   = 2,
-   .cbm_idx_offset = 0,
},
 			.domains	= domain_init(RDT_RESOURCE_L2DATA),
.parse_ctrlval  = parse_cbm,
@@ -169,8 +159,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level= 2,
.cache = {
.min_cbm_bits   = 1,
-   .cbm_idx_mult   = 2,
-   .cbm_idx_offset = 1,
},
 			.domains	= domain_init(RDT_RESOURCE_L2CODE),
.parse_ctrlval  = parse_cbm,
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c 
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 12a898d42689..50266b524222 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -246,12 +246,17 @@ static int parse_line(char *line, struct resctrl_schema 
*s,
return -EINVAL;
 }
 
-static unsigned int cbm_idx(struct rdt_resource *r, unsigned int closid)
+static u32 get_config_index(u32 closid, enum resctrl_conf_type type)
 {
-   if (r->rid == RDT_RESOURCE_MBA)
+   switch (type) {
+   default:
+   case CDP_BOTH:
return closid;
-
-   return closid * r->cache.cbm_idx_mult + r->cache.cbm_idx_offset;
+   case CDP_CODE:
+   return (closid * 2) + 1;
+   case CDP_DATA:
+   return (closid * 2);
+   }
 }
 
 static bool apply_config(struct rdt_hw_domain *hw_dom,
@@ -297,7 +302,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 
closid)
if (!cfg->have_new_ctrl)
continue;
 
-   idx = cbm_idx(r, closid);
+   idx = get_config_ind

[PATCH v2 20/24] x86/resctrl: Apply offset correction when config is staged

2021-03-12 Thread James Morse
When resctrl comes to write the CAT MSR values, it applies an
adjustment based on the style of the resource. CODE and DATA
resources have their closid mapped into an odd/even range. Previously
this decision was based on the resource.

Move this logic to apply_config() so that in future it can
be based on the style of the configuration, not the resource.

This makes it possible to merge the resources.

Once the resources are merged, there may be multiple configurations
to apply for a single closid and resource. resctrl_arch_update_domains()
should supply the low and high indexes based on the changes it has made
to the ctrl_val array as this allows the hardware to be updated once for
a set of changes.
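
The diff below is truncated where this happens; the bracketing is
roughly as follows (a sketch, assuming the min()/max() form):

| /* widen [low, high) to cover every index this closid changed */
| if (!msr_param_init) {
| 	msr_param.low = idx;
| 	msr_param.high = idx + 1;
| 	msr_param_init = true;
| } else {
| 	msr_param.low = min(msr_param.low, idx);
| 	msr_param.high = max(msr_param.high, idx + 1);
| }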

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
Changes since v1:
 * Removing the patch that moved the closid to the staged config means the
   min/max and return from apply_config() appears here.
---
 arch/x86/kernel/cpu/resctrl/core.c| 15 +---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 44 ++-
 arch/x86/kernel/cpu/resctrl/internal.h|  4 +--
 arch/x86/kernel/cpu/resctrl/rdtgroup.c|  7 
 4 files changed, 38 insertions(+), 32 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index f1d0d64e5d97..cb3186bc248b 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -195,11 +195,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
},
 };
 
-static unsigned int cbm_idx(struct rdt_resource *r, unsigned int closid)
-{
-   return closid * r->cache.cbm_idx_mult + r->cache.cbm_idx_offset;
-}
-
 /*
  * cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs
  * as they do not have CPUID enumeration support for Cache allocation.
@@ -438,7 +433,7 @@ cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct 
rdt_resource *r)
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 
for (i = m->low; i < m->high; i++)
-   wrmsrl(hw_res->msr_base + cbm_idx(r, i), hw_dom->ctrl_val[i]);
+   wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
 }
 
 struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
@@ -549,14 +544,6 @@ static int domain_setup_ctrlval(struct rdt_resource *r, 
struct rdt_domain *d)
 
m.low = 0;
m.high = hw_res->num_closid;
-
-   /*
-* temporary: the array is full-size, but cat_wrmsr() still re-maps
-* the index.
-*/
-   if (hw_res->conf_type != CDP_BOTH)
-   m.high /= 2;
-
	hw_res->msr_update(d, &m, r);
return 0;
 }
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c 
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 72a8cf52de47..12a898d42689 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -246,35 +246,47 @@ static int parse_line(char *line, struct resctrl_schema 
*s,
return -EINVAL;
 }
 
-static void apply_config(struct rdt_hw_domain *hw_dom,
-struct resctrl_staged_config *cfg, int closid,
+static unsigned int cbm_idx(struct rdt_resource *r, unsigned int closid)
+{
+   if (r->rid == RDT_RESOURCE_MBA)
+   return closid;
+
+   return closid * r->cache.cbm_idx_mult + r->cache.cbm_idx_offset;
+}
+
+static bool apply_config(struct rdt_hw_domain *hw_dom,
+struct resctrl_staged_config *cfg, u32 idx,
 cpumask_var_t cpu_mask, bool mba_sc)
 {
	struct rdt_domain *dom = &hw_dom->resctrl;
u32 *dc = !mba_sc ? hw_dom->ctrl_val : hw_dom->mbps_val;
 
-   if (cfg->new_ctrl != dc[closid]) {
+   if (cfg->new_ctrl != dc[idx]) {
		cpumask_set_cpu(cpumask_any(&dom->cpu_mask), cpu_mask);
-   dc[closid] = cfg->new_ctrl;
+   dc[idx] = cfg->new_ctrl;
+
+   return true;
}
+
+   return false;
 }
 
 int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
 {
struct resctrl_staged_config *cfg;
struct rdt_hw_domain *hw_dom;
+   bool msr_param_init = false;
struct msr_param msr_param;
enum resctrl_conf_type t;
cpumask_var_t cpu_mask;
struct rdt_domain *d;
bool mba_sc;
int cpu;
+   u32 idx;
 
	if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
return -ENOMEM;
 
-   msr_param.low = closid;
-   msr_param.high = msr_param.low + 1;
msr_param.res = r;
 
mba_sc = is_mba_sc(r);
@@ -285,10 +297,23 @@ int resctrl_arch_update_domains(struct rdt_resource *r, 
u32 closid)
if (!cfg->have_new_ctrl)
continue;
 
-   apply_config(hw_dom, cfg, closid, cpu_mask, mba_sc);
+   idx = cbm_idx(r, closid);
+   if (!apply_config(hw_do

[PATCH v2 19/24] x86/resctrl: Make ctrlval arrays the same size

2021-03-12 Thread James Morse
The CODE and DATA resources have their own ctrlval arrays which are half
the size because num_closid was already adjusted.

Prior to having one ctrlval array for the resource, move the num_closid
correction into resctrl, so that the ctrlval arrays are all the same
size.

A short-lived quirk of this is that the caches are reset twice, once
for CODE, once for DATA.

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/core.c | 10 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  9 +
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index 4db28ce114bd..f1d0d64e5d97 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -363,7 +363,7 @@ static void rdt_get_cdp_config(int level, int type)
	struct rdt_resource *r = &rdt_resources_all[type].resctrl;
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 
-   hw_res->num_closid = hw_res_l->num_closid / 2;
+   hw_res->num_closid = hw_res_l->num_closid;
r->cache.cbm_len = r_l->cache.cbm_len;
r->default_ctrl = r_l->default_ctrl;
r->cache.shareable_bits = r_l->cache.shareable_bits;
@@ -549,6 +549,14 @@ static int domain_setup_ctrlval(struct rdt_resource *r, 
struct rdt_domain *d)
 
m.low = 0;
m.high = hw_res->num_closid;
+
+   /*
+* temporary: the array is full-size, but cat_wrmsr() still re-maps
+* the index.
+*/
+   if (hw_res->conf_type != CDP_BOTH)
+   m.high /= 2;
+
	hw_res->msr_update(d, &m, r);
return 0;
 }
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 4ef8f90da043..a3e5c8b1b0cb 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2154,6 +2154,8 @@ static int schemata_list_create(void)
s->res = r;
s->conf_type = resctrl_to_arch_res(r)->conf_type;
s->num_closid = resctrl_arch_get_num_closid(r);
+   if (resctrl_arch_get_cdp_enabled(r->rid))
+   s->num_closid /= 2;
 
ret = snprintf(s->name, sizeof(s->name), r->name);
if (ret >= sizeof(s->name)) {
@@ -2366,6 +2368,13 @@ static int reset_all_ctrls(struct rdt_resource *r)
msr_param.low = 0;
msr_param.high = hw_res->num_closid;
 
+   /*
+* temporary: the array is full-sized, but cat_wrmsr() still re-maps
+* the index.
+*/
+   if (hw_res->cdp_enabled)
+   msr_param.high /= 2;
+
/*
 * Disable resource control for this resource by setting all
 * CBMs in all domains to the maximum mask value. Pick one CPU
-- 
2.30.0



[PATCH v2 18/24] x86/resctrl: Pass configuration type to resctrl_arch_get_config()

2021-03-12 Thread James Morse
Once the configuration arrays are merged, the get_config() helper needs to
be told whether the CODE, DATA or BOTH configuration is being retrieved.

Pass this information from the schema into resctrl_arch_get_config().

Nothing uses this yet, but it will later be used to map the closid
to the index in the configuration array.

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  5 ++--
 arch/x86/kernel/cpu/resctrl/monitor.c |  2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c| 35 +++
 include/linux/resctrl.h   |  3 +-
 4 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c 
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 1cd54402b02a..72a8cf52de47 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -402,7 +402,7 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 }
 
 void resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
-u32 closid, u32 *value)
+			     u32 closid, enum resctrl_conf_type type, u32 *value)
 {
struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
 
@@ -424,7 +424,8 @@ static void show_doms(struct seq_file *s, struct 
resctrl_schema *schema, int clo
if (sep)
seq_puts(s, ";");
 
-	resctrl_arch_get_config(r, dom, closid, &ctrl_val);
+   resctrl_arch_get_config(r, dom, closid, schema->conf_type,
+				&ctrl_val);
seq_printf(s, r->format_str, dom->id, max_data_width,
   ctrl_val);
sep = true;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c 
b/arch/x86/kernel/cpu/resctrl/monitor.c
index c4d146658b44..7cab888f0a9c 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -442,7 +442,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct 
rdt_domain *dom_mbm)
hw_dom_mba = resctrl_to_arch_dom(dom_mba);
 
cur_bw = pmbm_data->prev_bw;
-	resctrl_arch_get_config(r_mba, dom_mba, closid, &cur_bw);
+	resctrl_arch_get_config(r_mba, dom_mba, closid, CDP_BOTH, &cur_bw);
delta_bw = pmbm_data->delta_bw;
/*
 * resctrl_arch_get_config() chooses the mbps/ctrl value to return
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 378cad0020da..4ef8f90da043 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -923,7 +923,8 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
for (i = 0; i < closids_supported(); i++) {
if (!closid_allocated(i))
continue;
-		resctrl_arch_get_config(r, dom, i, &ctrl_val);
+		resctrl_arch_get_config(r, dom, i, s->conf_type,
+					&ctrl_val);
mode = rdtgroup_mode_by_closid(i);
switch (mode) {
case RDT_MODE_SHAREABLE:
@@ -1099,6 +1100,7 @@ static int rdtgroup_mode_show(struct kernfs_open_file *of,
  * Used to return the result.
  * @d_cdp: RDT domain that shares hardware with @d (RDT domain peer)
  * Used to return the result.
+ * @peer_type: The CDP configuration type of the peer resource.
  *
  * RDT resources are managed independently and by extension the RDT domains
  * (RDT resource instances) are managed independently also. The Code and
@@ -1116,7 +1118,8 @@ static int rdtgroup_mode_show(struct kernfs_open_file *of,
  */
 static int rdt_cdp_peer_get(struct rdt_resource *r, struct rdt_domain *d,
struct rdt_resource **r_cdp,
-   struct rdt_domain **d_cdp)
+   struct rdt_domain **d_cdp,
+   enum resctrl_conf_type *peer_type)
 {
struct rdt_resource *_r_cdp = NULL;
struct rdt_domain *_d_cdp = NULL;
@@ -1125,15 +1128,19 @@ static int rdt_cdp_peer_get(struct rdt_resource *r, 
struct rdt_domain *d,
switch (r->rid) {
case RDT_RESOURCE_L3DATA:
		_r_cdp = &rdt_resources_all[RDT_RESOURCE_L3CODE].resctrl;
+   *peer_type = CDP_CODE;
break;
case RDT_RESOURCE_L3CODE:
		_r_cdp = &rdt_resources_all[RDT_RESOURCE_L3DATA].resctrl;
+   *peer_type = CDP_DATA;
break;
case RDT_RESOURCE_L2DATA:
		_r_cdp = &rdt_resources_all[RDT_RESOURCE_L2CODE].resctrl;
+   *peer_type = CDP_CODE;
break;
case RDT_RESOURCE_L2CODE:
		_r_cdp = &rdt_resources_all[RDT_RESOURCE_L2DATA].resctrl;
+   *peer_type = CDP_DATA;
break;
defa

[PATCH v2 16/24] x86/resctrl: Add a helper to read/set the CDP configuration

2021-03-12 Thread James Morse
Previously whether CDP is enabled is described in the alloc_enabled
and alloc_capable flags, which are set differently between the L3
and L3CODE+L3DATA resources.

To merge these resources, to give us one configuration, the CDP state
of the resource needs tracking explicitly. Add cdp_capable and cdp_enabled
as something the arch code manages.

resctrl_arch_set_cdp_enabled() lets resctrl enable or disable CDP
on a resource. resctrl_arch_get_cdp_enabled() lets it read the
current state.

With Arm's MPAM, separate code and data closids are part of the
CPU configuration. Enabling CDP for one resource means it is enabled
for all resources, as all resources would need to configure both
interpretations of the closid value.

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
Changes since v1:
 * Added prototype for resctrl_arch_set_cdp_enabled()
 * s/Currently/Previously/
 * rdt_get_cdp_config() accesses the array directly as most of the code
   here disappears once the resources are merged.

It isn't practical for MPAM to hide the CDP emulation by applying the same
'L3' configuration to the two closid that are in use, as this would
then consume two monitors, which are likely to be in short supply.
---
 arch/x86/kernel/cpu/resctrl/core.c|  4 ++
 arch/x86/kernel/cpu/resctrl/internal.h| 13 -
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |  4 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c| 67 +--
 4 files changed, 57 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index 4898fb6f70a4..4db28ce114bd 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -374,6 +374,10 @@ static void rdt_get_cdp_config(int level, int type)
 * "cdp" during resctrl file system mount time.
 */
r->alloc_enabled = false;
+   rdt_resources_all[level].cdp_enabled = false;
+   rdt_resources_all[type].cdp_enabled = false;
+   rdt_resources_all[level].cdp_capable = true;
+   rdt_resources_all[type].cdp_capable = true;
 }
 
 static void rdt_get_cdp_l3_config(void)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h 
b/arch/x86/kernel/cpu/resctrl/internal.h
index ff1d58dee66b..9bafe3b32035 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -366,6 +366,8 @@ struct rdt_parse_data {
  * @msr_base:  Base MSR address for CBMs
  * @msr_update:Function pointer to update QOS MSRs
  * @mon_scale: cqm counter * mon_scale = occupancy in bytes
+ * @cdp_enabled:   CDP state of this resource
+ * @cdp_capable:   Is the CDP feature available on this resource
  */
 struct rdt_hw_resource {
enum resctrl_conf_type  conf_type;
@@ -376,6 +378,8 @@ struct rdt_hw_resource {
 struct rdt_resource *r);
unsigned intmon_scale;
unsigned intmbm_width;
+   boolcdp_enabled;
+   boolcdp_capable;
 };
 
 static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource 
*r)
@@ -396,7 +400,7 @@ DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
 
 extern struct dentry *debugfs_resctrl;
 
-enum {
+enum resctrl_res_level {
RDT_RESOURCE_L3,
RDT_RESOURCE_L3DATA,
RDT_RESOURCE_L3CODE,
@@ -417,6 +421,13 @@ static inline struct rdt_resource *resctrl_inc(struct 
rdt_resource *res)
	return &hw_res->resctrl;
 }
 
+static inline bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level l)
+{
+   return rdt_resources_all[l].cdp_enabled;
+}
+
+int resctrl_arch_set_cdp_enabled(enum resctrl_res_level l, bool enable);
+
 #define for_each_rdt_resource(r) \
	for (r = &rdt_resources_all[0].resctrl;			      \
	     r < &rdt_resources_all[RDT_NUM_RESOURCES].resctrl;      \
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c 
b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index 0ec6e335468c..d4ab1909a7bc 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -684,8 +684,8 @@ int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp)
 *   resource, the portion of cache used by it should be made
 *   unavailable to all future allocations from both resources.
 */
-   if (rdt_resources_all[RDT_RESOURCE_L3DATA].resctrl.alloc_enabled ||
-   rdt_resources_all[RDT_RESOURCE_L2DATA].resctrl.alloc_enabled) {
+   if (resctrl_arch_get_cdp_enabled(RDT_RESOURCE_L3) ||
+   resctrl_arch_get_cdp_enabled(RDT_RESOURCE_L2)) {
rdt_last_cmd_puts("CDP enabled\n");
return -EINVAL;
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 340bbeabb994..55c17adf763c 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgrou

[PATCH v2 12/24] x86/resctrl: Group staged configuration into a separate struct

2021-03-12 Thread James Morse
Arm's MPAM may have surprisingly large bitmaps for its cache
portions as the architecture allows up to 4K portions. The size
exposed via resctrl may not be the same; some scaling may
occur.

The values written to hardware may be unlike the values received
from resctrl, e.g. MBA percentages may be backed by a bitmap,
or a maximum value that isn't a percentage.

Previously the values to be used by hardware were written directly
to the ctrl_val array by the resctrl filesystem code before
rdt_ctrl_update() used a function pointer to write the values to
hardware. This is a problem if scaling or conversion is needed by the
architecture as the hardware values will be different, possibly
requiring a second copy of the array.

To avoid having two copies, the arch code should own the ctrl_val array
(to allow scaling and conversion), and provide helpers to expose
the values in the format resctrl expects. The existing update_domains()
will be part of the architecture specific code.

Currently each domain has a have_new_ctrl flag, which indicates the
domain's new_ctrl value should be written into the ctrl_val array and the
hardware updated if the value is different. For lack of a better term,
this is called staging the configuration. The changes are applied by
update_domains(), which takes a resource and a closid as arguments.
The have_new_ctrl flag is used to detect duplicate schemas being
specified by user-space. (e.g. "L3:0=0xff;0=0xff")

Once the resources are merged, staged configuration changes for
CODE and DATA types of resource would be written to the same domain.
The domain needs to hold a staged configuration for each type of
configuration.

Move the new_ctrl bitmap value and flag into a struct. Eventually
there will be an array of these, the values in the struct need to be
specified per type configuration, and there can be one configuration of
each type staged.
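
The struct itself is small; a sketch of what this patch introduces:

| struct resctrl_staged_config {
| 	u32	new_ctrl;
| 	bool	have_new_ctrl;
| };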

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
Changes since v1:
 * Expanded commit message
 * Removed explicit clearing of have_new_ctrl,
 * Moved ARRAY_SIZE() trickery to a later patch
 * Removed extra whitespace
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 43 +++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c| 22 +++-
 include/linux/resctrl.h   | 16 ++---
 3 files changed, 54 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c 
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 57c2b0e121d2..a47a792fdcb3 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -62,16 +62,17 @@ int parse_bw(struct rdt_parse_data *data, struct 
resctrl_schema *s,
 {
struct rdt_resource *r = s->res;
unsigned long bw_val;
+	struct resctrl_staged_config *cfg = &d->staged_config;
 
-   if (d->have_new_ctrl) {
+   if (cfg->have_new_ctrl) {
rdt_last_cmd_printf("Duplicate domain %d\n", d->id);
return -EINVAL;
}
 
	if (!bw_validate(data->buf, &bw_val, r))
return -EINVAL;
-   d->new_ctrl = bw_val;
-   d->have_new_ctrl = true;
+   cfg->new_ctrl = bw_val;
+   cfg->have_new_ctrl = true;
 
return 0;
 }
@@ -129,11 +130,12 @@ static bool cbm_validate(char *buf, u32 *data, struct 
rdt_resource *r)
 int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
  struct rdt_domain *d)
 {
+	struct resctrl_staged_config *cfg = &d->staged_config;
struct rdtgroup *rdtgrp = data->rdtgrp;
struct rdt_resource *r = s->res;
u32 cbm_val;
 
-   if (d->have_new_ctrl) {
+   if (cfg->have_new_ctrl) {
rdt_last_cmd_printf("Duplicate domain %d\n", d->id);
return -EINVAL;
}
@@ -175,8 +177,8 @@ int parse_cbm(struct rdt_parse_data *data, struct 
resctrl_schema *s,
}
}
 
-   d->new_ctrl = cbm_val;
-   d->have_new_ctrl = true;
+   cfg->new_ctrl = cbm_val;
+   cfg->have_new_ctrl = true;
 
return 0;
 }
@@ -190,6 +192,7 @@ int parse_cbm(struct rdt_parse_data *data, struct 
resctrl_schema *s,
 static int parse_line(char *line, struct resctrl_schema *s,
  struct rdtgroup *rdtgrp)
 {
+   struct resctrl_staged_config *cfg;
struct rdt_resource *r = s->res;
struct rdt_parse_data data;
char *dom = NULL, *id;
@@ -219,6 +222,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
		if (r->parse_ctrlval(&data, s, d))
return -EINVAL;
if (rdtgrp->mode ==  RDT_MODE_PSEUDO_LOCKSETUP) {
+			cfg = &d->staged_config;
/*
 * In pseudo-locking setup mode and just
 * parsed a v

[PATCH v2 17/24] x86/resctrl: Use cdp_enabled in rdt_domain_reconfigure_cdp()

2021-03-12 Thread James Morse
rdt_domain_reconfigure_cdp() infers whether CDP is enabled by
checking the alloc_capable and alloc_enabled flags of the data
resources.

Now that we have an explicit cdp_enabled, use that.

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 55c17adf763c..378cad0020da 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1933,14 +1933,16 @@ static int set_cache_qos_cfg(int level, bool enable)
 /* Restore the qos cfg state when a domain comes online */
 void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
 {
-   if (!r->alloc_capable)
+   struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+
+   if (!hw_res->cdp_capable)
return;
 
	if (r == &rdt_resources_all[RDT_RESOURCE_L2DATA].resctrl)
-		l2_qos_cfg_update(&r->alloc_enabled);
+		l2_qos_cfg_update(&hw_res->cdp_enabled);
 
	if (r == &rdt_resources_all[RDT_RESOURCE_L3DATA].resctrl)
-		l3_qos_cfg_update(&r->alloc_enabled);
+		l3_qos_cfg_update(&hw_res->cdp_enabled);
 }
 
 /*
-- 
2.30.0



[PATCH v2 13/24] x86/resctrl: Allow different CODE/DATA configurations to be staged

2021-03-12 Thread James Morse
Now that the staged configuration is grouped as a single struct, convert
it to an array to allow resctrl to stage more than one configuration at
a time for a single resource and closid.

To detect that the same schema is being specified twice when the schemata
file is written, the same slot in the staged_config array must be
used for each schema. Use the conf_type enum directly as an index.

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
Changes since v1:
 * Renamed max enum value CDP_NUM_TYPES
 * Whitespace and parenthesis
 * Missing word in the commit message
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 20 ++--
 arch/x86/kernel/cpu/resctrl/rdtgroup.c|  5 +++--
 include/linux/resctrl.h   |  5 -
 3 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c 
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index a47a792fdcb3..c46300bce210 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -60,10 +60,11 @@ static bool bw_validate(char *buf, unsigned long *data, 
struct rdt_resource *r)
 int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
 struct rdt_domain *d)
 {
+   struct resctrl_staged_config *cfg;
struct rdt_resource *r = s->res;
unsigned long bw_val;
-   struct resctrl_staged_config *cfg = &d->staged_config;

+   cfg = &d->staged_config[s->conf_type];
if (cfg->have_new_ctrl) {
rdt_last_cmd_printf("Duplicate domain %d\n", d->id);
return -EINVAL;
@@ -130,11 +131,12 @@ static bool cbm_validate(char *buf, u32 *data, struct 
rdt_resource *r)
 int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
  struct rdt_domain *d)
 {
-   struct resctrl_staged_config *cfg = &d->staged_config;
struct rdtgroup *rdtgrp = data->rdtgrp;
+   struct resctrl_staged_config *cfg;
struct rdt_resource *r = s->res;
u32 cbm_val;
 
+   cfg = &d->staged_config[s->conf_type];
if (cfg->have_new_ctrl) {
rdt_last_cmd_printf("Duplicate domain %d\n", d->id);
return -EINVAL;
@@ -192,6 +194,7 @@ int parse_cbm(struct rdt_parse_data *data, struct 
resctrl_schema *s,
 static int parse_line(char *line, struct resctrl_schema *s,
  struct rdtgroup *rdtgrp)
 {
+   enum resctrl_conf_type t = s->conf_type;
struct resctrl_staged_config *cfg;
struct rdt_resource *r = s->res;
struct rdt_parse_data data;
@@ -222,7 +225,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
if (r->parse_ctrlval(&data, s, d))
return -EINVAL;
if (rdtgrp->mode ==  RDT_MODE_PSEUDO_LOCKSETUP) {
-   cfg = &d->staged_config;
+   cfg = &d->staged_config[t];
/*
 * In pseudo-locking setup mode and just
 * parsed a valid CBM that should be
@@ -261,6 +264,7 @@ int update_domains(struct rdt_resource *r, int closid)
struct resctrl_staged_config *cfg;
struct rdt_hw_domain *hw_dom;
struct msr_param msr_param;
+   enum resctrl_conf_type t;
cpumask_var_t cpu_mask;
struct rdt_domain *d;
bool mba_sc;
@@ -276,9 +280,13 @@ int update_domains(struct rdt_resource *r, int closid)
mba_sc = is_mba_sc(r);
list_for_each_entry(d, &r->domains, list) {
hw_dom = resctrl_to_arch_dom(d);
-   cfg = &hw_dom->resctrl.staged_config;
-   if (cfg->have_new_ctrl)
+   for (t = 0; t < CDP_NUM_TYPES; t++) {
+   cfg = &hw_dom->resctrl.staged_config[t];
+   if (!cfg->have_new_ctrl)
+   continue;
+
apply_config(hw_dom, cfg, closid, cpu_mask, mba_sc);
+   }
}
 
/*
@@ -350,7 +358,7 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 
list_for_each_entry(s, _schema_all, list) {
list_for_each_entry(dom, >res->domains, list)
-   memset(&dom->staged_config, 0, sizeof(dom->staged_config));
+   memset(dom->staged_config, 0, sizeof(dom->staged_config));
}
 
while ((tok = strsep(, "\n")) != NULL) {
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 0867319cf5ec..f89cdb107297 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2728,6 +2728,7 @@ static u32 cbm_ensure_valid(u32 _val, struct rdt_resource 
*r)
 static int __init_one_rdt_domain(struct rdt_domain *d, struct resctrl_schema 
*s,
 

[PATCH v2 14/24] x86/resctrl: Rename update_domains() resctrl_arch_update_domains()

2021-03-12 Thread James Morse
update_domains() merges the staged configuration changes into the
arch code's configuration array. Rename it to make clear it is part of
the arch code's interface to resctrl.
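
The resctrl_arch_*() prefix marks the boundary the filesystem code calls
across; by this point in the series the interface in
include/linux/resctrl.h amounts to:

| u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
| int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);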

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
Changes since v1:
 * The closid is no longer staged, as from resctrl's point of view it's
   always going to be the same number, even with CDP.
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 4 ++--
 arch/x86/kernel/cpu/resctrl/internal.h| 1 -
 arch/x86/kernel/cpu/resctrl/rdtgroup.c| 2 +-
 include/linux/resctrl.h   | 1 +
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c 
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index c46300bce210..271f5d28412a 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -259,7 +259,7 @@ static void apply_config(struct rdt_hw_domain *hw_dom,
}
 }
 
-int update_domains(struct rdt_resource *r, int closid)
+int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid)
 {
struct resctrl_staged_config *cfg;
struct rdt_hw_domain *hw_dom;
@@ -380,7 +380,7 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 
list_for_each_entry(s, _schema_all, list) {
r = s->res;
-   ret = update_domains(r, rdtgrp->closid);
+   ret = resctrl_arch_update_domains(r, rdtgrp->closid);
if (ret)
goto out;
}
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h 
b/arch/x86/kernel/cpu/resctrl/internal.h
index 34b24a89eb6d..ff1d58dee66b 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -498,7 +498,6 @@ void rdt_pseudo_lock_release(void);
 int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp);
 void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp);
 struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
-int update_domains(struct rdt_resource *r, int closid);
 int closids_supported(void);
 void closid_free(int closid);
 int alloc_rmid(void);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f89cdb107297..a103ed4f29fa 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2847,7 +2847,7 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
return ret;
}
 
-   ret = update_domains(r, rdtgrp->closid);
+   ret = resctrl_arch_update_domains(r, rdtgrp->closid);
if (ret < 0) {
rdt_last_cmd_puts("Failed to initialize allocations\n");
return ret;
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 06654f9346a2..a6d2d7a25293 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -190,5 +190,6 @@ struct resctrl_schema {
 
 /* The number of closid supported by this resource regardless of CDP */
 u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
+int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
 
 #endif /* _RESCTRL_H */
-- 
2.30.0



[PATCH v2 15/24] x86/resctrl: Add a helper to read a closid's configuration

2021-03-12 Thread James Morse
The hardware configuration may look completely different to the
values resctrl gets from user-space. The staged configuration
and resctrl_arch_update_domains() allow the architecture to
convert or translate these values.
(e.g. Arm's MPAM may back MBA's percentage control using the
 'BWPBM' bitmap)

Resctrl shouldn't read or write the ctrl_val[] values directly. As a
step towards taking direct access away, add a helper to read the
current configuration.

This will allow another architecture to scale the bitmaps if
necessary, and possibly use controls that don't take the user-space
control format at all.
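
For illustration only, an MPAM-like implementation could translate on
read; bwpbm_to_percent() and the arch_priv() accessor here are
hypothetical:

| void resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
|                              u32 closid, u32 *value)
| {
|         /* Convert the hardware bandwidth bitmap back into the
|          * percentage format that user-space expects. */
|         *value = bwpbm_to_percent(arch_priv(d)->bwpbm[closid]);
| }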

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 16 ++---
 arch/x86/kernel/cpu/resctrl/monitor.c |  6 +++-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c| 43 ++-
 include/linux/resctrl.h   |  2 ++
 4 files changed, 37 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c 
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 271f5d28412a..1cd54402b02a 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -401,22 +401,30 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file 
*of,
return ret ?: nbytes;
 }
 
+void resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
+u32 closid, u32 *value)
+{
+   struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+
+   if (!is_mba_sc(r))
+   *value = hw_dom->ctrl_val[closid];
+   else
+   *value = hw_dom->mbps_val[closid];
+}
+
 static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int 
closid)
 {
struct rdt_resource *r = schema->res;
-   struct rdt_hw_domain *hw_dom;
struct rdt_domain *dom;
bool sep = false;
u32 ctrl_val;
 
seq_printf(s, "%*s:", max_name_width, schema->name);
list_for_each_entry(dom, &r->domains, list) {
-   hw_dom = resctrl_to_arch_dom(dom);
if (sep)
seq_puts(s, ";");
 
-   ctrl_val = (!is_mba_sc(r) ? hw_dom->ctrl_val[closid] :
-   hw_dom->mbps_val[closid]);
+   resctrl_arch_get_config(r, dom, closid, &ctrl_val);
seq_printf(s, r->format_str, dom->id, max_data_width,
   ctrl_val);
sep = true;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c 
b/arch/x86/kernel/cpu/resctrl/monitor.c
index 6f700ff1b64c..c4d146658b44 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -442,8 +442,12 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct 
rdt_domain *dom_mbm)
hw_dom_mba = resctrl_to_arch_dom(dom_mba);
 
cur_bw = pmbm_data->prev_bw;
-   user_bw = hw_dom_mba->mbps_val[closid];
+   resctrl_arch_get_config(r_mba, dom_mba, closid, &user_bw);
delta_bw = pmbm_data->delta_bw;
+   /*
+* resctrl_arch_get_config() chooses the mbps/ctrl value to return
+* based on is_mba_sc(). For now, reach into the hw_dom.
+*/
cur_msr_val = hw_dom_mba->ctrl_val[closid];
 
/*
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index a103ed4f29fa..340bbeabb994 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -910,27 +910,27 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
int i, hwb, swb, excl, psl;
enum rdtgrp_mode mode;
bool sep = false;
-   u32 *ctrl;
+   u32 ctrl_val;
 
mutex_lock(&rdtgroup_mutex);
hw_shareable = r->cache.shareable_bits;
list_for_each_entry(dom, &r->domains, list) {
if (sep)
seq_putc(seq, ';');
-   ctrl = resctrl_to_arch_dom(dom)->ctrl_val;
sw_shareable = 0;
exclusive = 0;
seq_printf(seq, "%d=", dom->id);
-   for (i = 0; i < closids_supported(); i++, ctrl++) {
+   for (i = 0; i < closids_supported(); i++) {
if (!closid_allocated(i))
continue;
+   resctrl_arch_get_config(r, dom, i, &ctrl_val);
mode = rdtgroup_mode_by_closid(i);
switch (mode) {
case RDT_MODE_SHAREABLE:
-   sw_shareable |= *ctrl;
+   sw_shareable |= ctrl_val;
break;
case RDT_MODE_EXCLUSIVE:
-   exclusive |= *ctrl;
+   exclusive |= ctrl_val;
break;
case RDT_MODE_PSEUDO_LOCKSETUP:

[PATCH v2 10/24] x86/resctrl: Swizzle rdt_resource and resctrl_schema in pseudo_lock_region

2021-03-12 Thread James Morse
struct pseudo_lock_region points to the rdt_resource. Once the
resources are merged, this won't be unique. The resource name
is moving into the schema, so that eventually resctrl can generate
it.

Swap pseudo_lock_region's rdt_resource pointer for a schema pointer.

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 4 ++--
 arch/x86/kernel/cpu/resctrl/internal.h| 6 +++---
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 8 
 arch/x86/kernel/cpu/resctrl/rdtgroup.c| 4 ++--
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c 
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index dbbdd9f275e9..4428ec499037 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -227,7 +227,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
 * the required initialization for single
 * region and return.
 */
-   rdtgrp->plr->r = r;
+   rdtgrp->plr->s = s;
rdtgrp->plr->d = d;
rdtgrp->plr->cbm = d->new_ctrl;
d->plr = rdtgrp->plr;
@@ -426,7 +426,7 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
ret = -ENODEV;
} else {
seq_printf(s, "%s:%d=%x\n",
-  rdtgrp->plr->r->name,
+  rdtgrp->plr->s->res->name,
   rdtgrp->plr->d->id,
   rdtgrp->plr->cbm);
}
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h 
b/arch/x86/kernel/cpu/resctrl/internal.h
index 97f1fff90147..34b24a89eb6d 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -158,8 +158,8 @@ struct mongroup {
 
 /**
  * struct pseudo_lock_region - pseudo-lock region information
- * @r: RDT resource to which this pseudo-locked region
- * belongs
+ * @s: Resctrl schema for the resource to which this
+ * pseudo-locked region belongs
  * @d: RDT domain to which this pseudo-locked region
  * belongs
  * @cbm:   bitmask of the pseudo-locked region
@@ -179,7 +179,7 @@ struct mongroup {
  * @pm_reqs:   Power management QoS requests related to this region
  */
 struct pseudo_lock_region {
-   struct rdt_resource *r;
+   struct resctrl_schema   *s;
struct rdt_domain   *d;
u32 cbm;
wait_queue_head_t   lock_thread_wq;
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c 
b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index f8c5af759c0d..0ec6e335468c 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -246,7 +246,7 @@ static void pseudo_lock_region_clear(struct 
pseudo_lock_region *plr)
plr->line_size = 0;
kfree(plr->kmem);
plr->kmem = NULL;
-   plr->r = NULL;
+   plr->s = NULL;
if (plr->d)
plr->d->plr = NULL;
plr->d = NULL;
@@ -290,10 +290,10 @@ static int pseudo_lock_region_init(struct 
pseudo_lock_region *plr)
 
ci = get_cpu_cacheinfo(plr->cpu);
 
-   plr->size = rdtgroup_cbm_to_size(plr->r, plr->d, plr->cbm);
+   plr->size = rdtgroup_cbm_to_size(plr->s->res, plr->d, plr->cbm);
 
for (i = 0; i < ci->num_leaves; i++) {
-   if (ci->info_list[i].level == plr->r->cache_level) {
+   if (ci->info_list[i].level == plr->s->res->cache_level) {
plr->line_size = ci->info_list[i].coherency_line_size;
return 0;
}
@@ -796,7 +796,7 @@ bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain 
*d, unsigned long cbm
unsigned long cbm_b;
 
if (d->plr) {
-   cbm_len = d->plr->r->cache.cbm_len;
+   cbm_len = d->plr->s->res->cache.cbm_len;
cbm_b = d->plr->cbm;
if (bitmap_intersects(&cbm, &cbm_b, cbm_len))
return true;
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 368cb3a16cfd..1e59d3e9b861 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1439,8 +1439,8 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
ret = -ENODEV;

[PATCH v2 09/24] x86/resctrl: Pass the schema to resctrl filesystem functions

2021-03-12 Thread James Morse
Once the arch code is abstracted from the resctrl filesystem code,
the separate schema for CDP are created by the filesystem code. This
means the same resource is used for different schema, or types of
configuration.

Helpers like rdtgroup_cbm_overlaps() will need the resctrl_schema
to retrieve the configuration (or configurations) as this is where
properties like the name, closid limit and CODE/DATA/BOTH type
are stored.

Change these functions to take a struct schema instead of the
struct rdt_resource.

All the modified functions eventually move to be part of the
filesystem code.

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
Changes since v1:
 * split from a larger patch
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 23 +--
 arch/x86/kernel/cpu/resctrl/internal.h|  6 +++---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c| 19 +++
 include/linux/resctrl.h   |  3 ++-
 4 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c 
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index fcd6ca73ac41..dbbdd9f275e9 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -57,9 +57,10 @@ static bool bw_validate(char *buf, unsigned long *data, 
struct rdt_resource *r)
return true;
 }
 
-int parse_bw(struct rdt_parse_data *data, struct rdt_resource *r,
+int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
 struct rdt_domain *d)
 {
+   struct rdt_resource *r = s->res;
unsigned long bw_val;
 
if (d->have_new_ctrl) {
@@ -125,10 +126,11 @@ static bool cbm_validate(char *buf, u32 *data, struct 
rdt_resource *r)
  * Read one cache bit mask (hex). Check that it is valid for the current
  * resource type.
  */
-int parse_cbm(struct rdt_parse_data *data, struct rdt_resource *r,
+int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
  struct rdt_domain *d)
 {
struct rdtgroup *rdtgrp = data->rdtgrp;
+   struct rdt_resource *r = s->res;
u32 cbm_val;
 
if (d->have_new_ctrl) {
@@ -160,12 +162,12 @@ int parse_cbm(struct rdt_parse_data *data, struct 
rdt_resource *r,
 * The CBM may not overlap with the CBM of another closid if
 * either is exclusive.
 */
-   if (rdtgroup_cbm_overlaps(r, d, cbm_val, rdtgrp->closid, true)) {
+   if (rdtgroup_cbm_overlaps(s, d, cbm_val, rdtgrp->closid, true)) {
rdt_last_cmd_puts("Overlaps with exclusive group\n");
return -EINVAL;
}
 
-   if (rdtgroup_cbm_overlaps(r, d, cbm_val, rdtgrp->closid, false)) {
+   if (rdtgroup_cbm_overlaps(s, d, cbm_val, rdtgrp->closid, false)) {
if (rdtgrp->mode == RDT_MODE_EXCLUSIVE ||
rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
rdt_last_cmd_puts("Overlaps with other group\n");
@@ -185,9 +187,10 @@ int parse_cbm(struct rdt_parse_data *data, struct 
rdt_resource *r,
  * separated by ";". The "id" is in decimal, and must match one of
  * the "id"s for this resource.
  */
-static int parse_line(char *line, struct rdt_resource *r,
+static int parse_line(char *line, struct resctrl_schema *s,
  struct rdtgroup *rdtgrp)
 {
+   struct rdt_resource *r = s->res;
struct rdt_parse_data data;
char *dom = NULL, *id;
struct rdt_domain *d;
@@ -213,7 +216,7 @@ static int parse_line(char *line, struct rdt_resource *r,
if (d->id == dom_id) {
data.buf = dom;
data.rdtgrp = rdtgrp;
-   if (r->parse_ctrlval(&data, r, d))
+   if (r->parse_ctrlval(&data, s, d))
return -EINVAL;
if (rdtgrp->mode ==  RDT_MODE_PSEUDO_LOCKSETUP) {
/*
@@ -292,7 +295,7 @@ static int rdtgroup_parse_resource(char *resname, char *tok,
list_for_each_entry(s, &resctrl_schema_all, list) {
r = s->res;
if (!strcmp(resname, r->name) && rdtgrp->closid < s->num_closid)
-   return parse_line(tok, r, rdtgrp);
+   return parse_line(tok, s, rdtgrp);
}
rdt_last_cmd_printf("Unknown or unsupported resource name '%s'\n", 
resname);
return -EINVAL;
@@ -377,8 +380,9 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
return ret ?: nbytes;
 }
 
-static void show_doms(struct seq_file *s, struct rdt_resource *r, int closid)
+static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int 
closid)
 {
+   struct rdt_resource *r = schema->res;
struct rdt_hw_domain *hw_dom;
struct rdt_domain *dom;
bool sep = false;
@@ -429,9 +433

[PATCH v2 11/24] x86/resctrl: Move the schemata names into struct resctrl_schema

2021-03-12 Thread James Morse
Move the names used for the schemata file out of the resource and
into struct resctrl_schema. This allows one resource to have two
different names, based on the other schema properties.

Currently the names are copied as they must match the resource, which
is currently separate. Eventually resctrl will generate two names
for one resource.

The filesystem code should calculate max_name_width for padding the
schemata file, so move this to live with the code that will generate
the names.
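
Eventually the name generation moves here too; a hypothetical sketch
(the suffix selection and the snprintf() are illustrative, not part of
this patch):

| const char *suffix = "";       /* "", "CODE" or "DATA", by conf_type */
|
| snprintf(s->name, sizeof(s->name), "%s%s", r->name, suffix);
| max_name_width = max(max_name_width, (int)strlen(s->name));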

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
Changes since v1;
 * Don't hardcode max_name_width, that leads to bugs
 * Move max_name_width to live with the code that will generate the name.
 * Fixed name/names typo
---
 arch/x86/kernel/cpu/resctrl/core.c|  5 -
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 10 +++---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c| 19 +++
 include/linux/resctrl.h   |  2 ++
 4 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index 078822bc58ae..4898fb6f70a4 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -778,13 +778,8 @@ static int resctrl_offline_cpu(unsigned int cpu)
 static __init void rdt_init_padding(void)
 {
struct rdt_resource *r;
-   int cl;
 
for_each_alloc_capable_rdt_resource(r) {
-   cl = strlen(r->name);
-   if (cl > max_name_width)
-   max_name_width = cl;
-
if (r->data_width > max_data_width)
max_data_width = r->data_width;
}
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c 
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 4428ec499037..57c2b0e121d2 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -290,11 +290,9 @@ static int rdtgroup_parse_resource(char *resname, char 
*tok,
   struct rdtgroup *rdtgrp)
 {
struct resctrl_schema *s;
-   struct rdt_resource *r;
 
list_for_each_entry(s, &resctrl_schema_all, list) {
-   r = s->res;
-   if (!strcmp(resname, r->name) && rdtgrp->closid < s->num_closid)
+   if (!strcmp(resname, s->name) && rdtgrp->closid < s->num_closid)
return parse_line(tok, s, rdtgrp);
}
rdt_last_cmd_printf("Unknown or unsupported resource name '%s'\n", 
resname);
@@ -388,7 +386,7 @@ static void show_doms(struct seq_file *s, struct 
resctrl_schema *schema, int clo
bool sep = false;
u32 ctrl_val;
 
-   seq_printf(s, "%*s:", max_name_width, r->name);
+   seq_printf(s, "%*s:", max_name_width, schema->name);
list_for_each_entry(dom, &r->domains, list) {
hw_dom = resctrl_to_arch_dom(dom);
if (sep)
@@ -408,7 +406,6 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
 {
struct resctrl_schema *schema;
struct rdtgroup *rdtgrp;
-   struct rdt_resource *r;
int ret = 0;
u32 closid;
 
@@ -416,8 +413,7 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
if (rdtgrp) {
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
list_for_each_entry(schema, &resctrl_schema_all, list) {
-   r = schema->res;
-   seq_printf(s, "%s:uninitialized\n", r->name);
+   seq_printf(s, "%s:uninitialized\n", 
schema->name);
}
} else if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
if (!rdtgrp->plr->d) {
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 1e59d3e9b861..1ff969d7de75 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1439,7 +1439,7 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
ret = -ENODEV;
} else {
seq_printf(s, "%*s:", max_name_width,
-  rdtgrp->plr->s->res->name);
+  rdtgrp->plr->s->name);
size = rdtgroup_cbm_to_size(rdtgrp->plr->s->res,
rdtgrp->plr->d,
rdtgrp->plr->cbm);
@@ -1451,7 +1451,7 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
list_for_each_entry(schema, &resctrl_schema_all, list) {
r = schema->res;
sep = false;
-   seq_printf(s, "%*s:", max_name_width, r->name);
+   seq_printf(s

[PATCH v2 08/24] x86/resctrl: Add resctrl_arch_get_num_closid()

2021-03-12 Thread James Morse
resctrl chooses whether to enable CDP; once it does, half the number
of closids are available. MPAM doesn't behave like this: an in-kernel
user of MPAM could be 'using CDP' while resctrl is not (e.g. a KVM
guest).

To move the 'half the closids' behaviour to be part of the filesystem
code, each schema would have a num_closid. This may be different from
the single resource's num_closid if CDP is in use. An abstraction
between the hardware property and the way resctrl is using it is needed.

Add a helper to read the resource's num_closid from the arch code. This
should return the number of closid that the resource supports, regardless
of whether CDP is in use.

This helper is used in the one remaining path that is specific to the
filesystem: schemata_list_create().

In contrast, reset_all_ctrls() sets up a structure for modifying the
hardware. It is part of the architecture code, so the maximum closid
should be the maximum value the hardware has, regardless of the way
resctrl is using it. All the uses in core.c are naturally part of the
architecture specific code.

For now return the hw_res->num_closid, which is already adjusted for CDP.
Once the CODE/DATA/BOTH resources are merged, resctrl can make the
adjustment when copying the value to the schema's num_closid.
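
That adjustment will look roughly like this (a sketch; the CDP test only
exists once the resources are merged):

| s->num_closid = resctrl_arch_get_num_closid(r);
| if (cdp_enabled)
|         /* Two hardware closids back each resctrl closid, so only
|          * half are available for control groups. */
|         s->num_closid /= 2;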

Using a type with an obvious size for the architecture specific helper
means changing the type of num_closid to u32, which matches the type
already used by struct rdtgroup.

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
Changes since v1:
 * Rewrote commit message
 * Whitespace fixes
 * num_closid becomes u32 in all occurrences to reduce surprises
---
 arch/x86/kernel/cpu/resctrl/core.c | 5 +
 arch/x86/kernel/cpu/resctrl/internal.h | 2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 4 ++--
 include/linux/resctrl.h| 6 +-
 4 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index 048c82e3baca..078822bc58ae 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -450,6 +450,11 @@ struct rdt_domain *get_domain_from_cpu(int cpu, struct 
rdt_resource *r)
return NULL;
 }
 
+u32 resctrl_arch_get_num_closid(struct rdt_resource *r)
+{
+   return resctrl_to_arch_res(r)->num_closid;
+}
+
 void rdt_ctrl_update(void *arg)
 {
struct msr_param *m = arg;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h 
b/arch/x86/kernel/cpu/resctrl/internal.h
index 57484d2f6214..51a6e5f2f035 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -370,7 +370,7 @@ struct rdt_parse_data {
 struct rdt_hw_resource {
enum resctrl_conf_type  conf_type;
struct rdt_resource resctrl;
-   int num_closid;
+   u32 num_closid;
unsigned intmsr_base;
void (*msr_update)  (struct rdt_domain *d, struct msr_param *m,
 struct rdt_resource *r);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 1ff883f68ee1..a8d8499e6919 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -104,7 +104,7 @@ int closids_supported(void)
 static void closid_init(void)
 {
struct resctrl_schema *s;
-   int rdt_min_closid = 32;
+   u32 rdt_min_closid = 32;
 
/* Compute rdt_min_closid across all resources */
list_for_each_entry(s, &resctrl_schema_all, list)
@@ -2134,7 +2134,7 @@ static int schemata_list_create(void)
 
s->res = r;
s->conf_type = resctrl_to_arch_res(r)->conf_type;
-   s->num_closid = resctrl_to_arch_res(r)->num_closid;
+   s->num_closid = resctrl_arch_get_num_closid(r);
 
INIT_LIST_HEAD(&s->list);
list_add(&s->list, &resctrl_schema_all);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 0ff10468940b..5bd48c1ed497 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -171,6 +171,10 @@ struct resctrl_schema {
struct list_headlist;
enum resctrl_conf_type  conf_type;
struct rdt_resource *res;
-   int num_closid;
+   u32 num_closid;
 };
+
+/* The number of closid supported by this resource regardless of CDP */
+u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
+
 #endif /* _RESCTRL_H */
-- 
2.30.0



[PATCH v2 06/24] x86/resctrl: Walk the resctrl schema list instead of an arch list

2021-03-12 Thread James Morse
Once the arch code is abstracted from the resctrl filesystem code,
the separate schema for CDP are created by the filesystem code. This
means the same resource is used for different schema, or types of
configuration.

Helpers like rdtgroup_cbm_overlaps() need the resctrl_schema to
retrieve the configuration (or configurations). Before these
helpers can be changed to take the schema instead of the resource,
their callers must have the schema on hand.

Change the users of for_each_alloc_enabled_rdt_resource() to walk
the schema instead. Schema were only created for alloc_enabled resources
so these two lists are currently equivalent.

schemata_list_create() and rdt_kill_sb() are ignored. The first
creates the schema list, and will eventually loop over the resource
indexes using an arch helper to retrieve the resource. rdt_kill_sb()
will eventually make use of an arch 'reset everything' helper.

After the filesystem code is moved, rdtgroup_pseudo_locked_in_hierarchy()
remains part of the x86-specific hooks to support pseudo-lock. This code
walks each domain, and still does this after the separate resources are
merged.

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
Changes since v1:
 * Expanded commit message
 * Split from a larger patch
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 23 +++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c| 18 --
 2 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c 
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 2e7466659af3..a6f9548a8a59 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -287,10 +287,12 @@ static int rdtgroup_parse_resource(char *resname, char 
*tok,
   struct rdtgroup *rdtgrp)
 {
struct rdt_hw_resource *hw_res;
+   struct resctrl_schema *s;
struct rdt_resource *r;
 
-   for_each_alloc_enabled_rdt_resource(r) {
-   hw_res = resctrl_to_arch_res(r);
+   list_for_each_entry(s, &resctrl_schema_all, list) {
+   r = s->res;
+   hw_res = resctrl_to_arch_res(s->res);
if (!strcmp(resname, r->name) && rdtgrp->closid < 
hw_res->num_closid)
return parse_line(tok, r, rdtgrp);
}
@@ -301,6 +303,7 @@ static int rdtgroup_parse_resource(char *resname, char *tok,
 ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
 {
+   struct resctrl_schema *s;
struct rdtgroup *rdtgrp;
struct rdt_domain *dom;
struct rdt_resource *r;
@@ -331,8 +334,8 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
goto out;
}
 
-   for_each_alloc_enabled_rdt_resource(r) {
-   list_for_each_entry(dom, &r->domains, list)
+   list_for_each_entry(s, &resctrl_schema_all, list) {
+   list_for_each_entry(dom, &s->res->domains, list)
dom->have_new_ctrl = false;
}
 
@@ -353,7 +356,8 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
goto out;
}
 
-   for_each_alloc_enabled_rdt_resource(r) {
+   list_for_each_entry(s, &resctrl_schema_all, list) {
+   r = s->res;
ret = update_domains(r, rdtgrp->closid);
if (ret)
goto out;
@@ -401,6 +405,7 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
   struct seq_file *s, void *v)
 {
struct rdt_hw_resource *hw_res;
+   struct resctrl_schema *schema;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
int ret = 0;
@@ -409,8 +414,10 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
rdtgrp = rdtgroup_kn_lock_live(of->kn);
if (rdtgrp) {
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
-   for_each_alloc_enabled_rdt_resource(r)
+   list_for_each_entry(schema, &resctrl_schema_all, list) {
+   r = schema->res;
seq_printf(s, "%s:uninitialized\n", r->name);
+   }
} else if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
if (!rdtgrp->plr->d) {
rdt_last_cmd_clear();
@@ -424,8 +431,8 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
}
} else {
closid = rdtgrp->closid;
-   for_each_alloc_enabled_rdt_resource(r) {
-   hw_res = resctrl_to_arch_res(r);
+   list_for_each_entry(schema, &resctrl_schema_all, list) {
+   hw_res = resctrl_to_arch_res

[PATCH v2 03/24] x86/resctrl: Add a separate schema list for resctrl

2021-03-12 Thread James Morse
To support multiple architectures, the resctrl code needs to be split
into a 'fs' specific part in core code, and an arch-specific backend.

It should be difficult for the arch-specific backends to diverge,
supporting slightly different ABIs for user-space. For example,
generating, parsing and validating the schema configuration values
should be done in what becomes the core code to prevent divergence.
Today, the schema emerge from which entries in the rdt_resources_all
array the arch code has chosen to enable.

Start by creating a struct resctrl_schema, which will eventually hold
the name and type of configuration values for resctrl.

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
Changes since v1:
 * Renamed resctrl_all_schema list
 * Used schemata_list as a prefix to make these easier to search for
 * Added kerneldoc string
 * Removed 'pending configuration' reference in commit message
---
 arch/x86/kernel/cpu/resctrl/internal.h |  1 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 43 +-
 include/linux/resctrl.h|  9 ++
 3 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h 
b/arch/x86/kernel/cpu/resctrl/internal.h
index d3e47fd51e3a..8a9da490134b 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -106,6 +106,7 @@ extern unsigned int resctrl_cqm_threshold;
 extern bool rdt_alloc_capable;
 extern bool rdt_mon_capable;
 extern unsigned int rdt_mon_features;
+extern struct list_head resctrl_schema_all;
 
 enum rdt_group_type {
RDTCTRL_GROUP = 0,
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 8c29304d3e01..73a695e7096d 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -39,6 +39,9 @@ static struct kernfs_root *rdt_root;
 struct rdtgroup rdtgroup_default;
 LIST_HEAD(rdt_all_groups);
 
+/* list of entries for the schemata file */
+LIST_HEAD(resctrl_schema_all);
+
 /* Kernel fs node for "info" directory under root */
 static struct kernfs_node *kn_info;
 
@@ -2109,6 +2112,35 @@ static int rdt_enable_ctx(struct rdt_fs_context *ctx)
return ret;
 }
 
+static int schemata_list_create(void)
+{
+   struct rdt_resource *r;
+   struct resctrl_schema *s;
+
+   for_each_alloc_enabled_rdt_resource(r) {
+   s = kzalloc(sizeof(*s), GFP_KERNEL);
+   if (!s)
+   return -ENOMEM;
+
+   s->res = r;
+
+   INIT_LIST_HEAD(&s->list);
+   list_add(&s->list, &resctrl_schema_all);
+   }
+
+   return 0;
+}
+
+static void schemata_list_destroy(void)
+{
+   struct resctrl_schema *s, *tmp;
+
+   list_for_each_entry_safe(s, tmp, &resctrl_schema_all, list) {
+   list_del(&s->list);
+   kfree(s);
+   }
+}
+
 static int rdt_get_tree(struct fs_context *fc)
 {
struct rdt_fs_context *ctx = rdt_fc2context(fc);
@@ -2130,11 +2162,17 @@ static int rdt_get_tree(struct fs_context *fc)
if (ret < 0)
goto out_cdp;
 
+   ret = schemata_list_create();
+   if (ret) {
+   schemata_list_destroy();
+   goto out_mba;
+   }
+
closid_init();
 
ret = rdtgroup_create_info_dir(rdtgroup_default.kn);
if (ret < 0)
-   goto out_mba;
+   goto out_schemata_free;
 
if (rdt_mon_capable) {
ret = mongroup_create_dir(rdtgroup_default.kn,
@@ -2184,6 +2222,8 @@ static int rdt_get_tree(struct fs_context *fc)
kernfs_remove(kn_mongrp);
 out_info:
kernfs_remove(kn_info);
+out_schemata_free:
+   schemata_list_destroy();
 out_mba:
if (ctx->enable_mba_mbps)
set_mba_sc(false);
@@ -2425,6 +2465,7 @@ static void rdt_kill_sb(struct super_block *sb)
rmdir_all_sub();
rdt_pseudo_lock_release();
rdtgroup_default.mode = RDT_MODE_SHAREABLE;
+   schemata_list_destroy();
static_branch_disable_cpuslocked(_alloc_enable_key);
static_branch_disable_cpuslocked(_mon_enable_key);
static_branch_disable_cpuslocked(_enable_key);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index be6f5df78e31..092ff0c13b9b 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -154,4 +154,13 @@ struct rdt_resource {
 
 };
 
+/**
+ * struct resctrl_schema - configuration abilities of a resource presented to 
user-space
+ * @list:  Member of resctrl's schema list
+ * @res:   The rdt_resource for this entry
+ */
+struct resctrl_schema {
+   struct list_headlist;
+   struct rdt_resource *res;
+};
 #endif /* _RESCTRL_H */
-- 
2.30.0



[PATCH v2 07/24] x86/resctrl: Store the effective num_closid in the schema

2021-03-12 Thread James Morse
The resctrl_schema struct holds properties that vary with the style of
configuration that resctrl applies to a resource.

There are already two values for the hardware's num_closid, depending on
whether the architecture presents the L3 or L3CODE/L3DATA resources.

As the way CDP changes the number of control groups that resctrl can create
is part of the user-space interface, it should be managed by the filesystem
parts of resctrl. This allows the architecture code to only describe the
value the hardware supports.

Add num_closid to resctrl_schema. This is the value seen by the filesystem,
and when the CDP resources are merged, will be different to the value
described by the arch code when CDP is enabled.

These functions operate on the num_closid value that is exposed to
user-space:
rdtgroup_parse_resource()
rdtgroup_schemata_show()
rdt_num_closids_show()
closid_init()

These are changed to use the schema value instead.

schemata_list_create() sets this value, and reaches into the architecture
specific structure to get the value. This will eventually be replaced with
a helper.

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
Changes since v1:
 * Added missing : in a comment.
 * Expanded commit message.
 * Reordered patches
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  9 +++--
 arch/x86/kernel/cpu/resctrl/rdtgroup.c| 13 -
 include/linux/resctrl.h   |  2 ++
 3 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c 
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index a6f9548a8a59..fcd6ca73ac41 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -286,14 +286,12 @@ int update_domains(struct rdt_resource *r, int closid)
 static int rdtgroup_parse_resource(char *resname, char *tok,
   struct rdtgroup *rdtgrp)
 {
-   struct rdt_hw_resource *hw_res;
struct resctrl_schema *s;
struct rdt_resource *r;
 
list_for_each_entry(s, &resctrl_schema_all, list) {
r = s->res;
-   hw_res = resctrl_to_arch_res(s->res);
-   if (!strcmp(resname, r->name) && rdtgrp->closid < 
hw_res->num_closid)
+   if (!strcmp(resname, r->name) && rdtgrp->closid < s->num_closid)
return parse_line(tok, r, rdtgrp);
}
rdt_last_cmd_printf("Unknown or unsupported resource name '%s'\n", 
resname);
@@ -404,7 +402,6 @@ static void show_doms(struct seq_file *s, struct 
rdt_resource *r, int closid)
 int rdtgroup_schemata_show(struct kernfs_open_file *of,
   struct seq_file *s, void *v)
 {
-   struct rdt_hw_resource *hw_res;
struct resctrl_schema *schema;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
@@ -432,8 +429,8 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
} else {
closid = rdtgrp->closid;
list_for_each_entry(schema, &resctrl_schema_all, list) {
-   hw_res = resctrl_to_arch_res(schema->res);
-   if (closid < hw_res->num_closid)
+   r = schema->res;
+   if (closid < schema->num_closid)
show_doms(s, r, closid);
}
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 4f25c3c6f6e8..1ff883f68ee1 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -103,15 +103,12 @@ int closids_supported(void)
 
 static void closid_init(void)
 {
-   struct rdt_hw_resource *hw_res;
struct resctrl_schema *s;
int rdt_min_closid = 32;
 
/* Compute rdt_min_closid across all resources */
-   list_for_each_entry(s, &resctrl_schema_all, list) {
-   hw_res = resctrl_to_arch_res(s->res);
-   rdt_min_closid = min(rdt_min_closid, hw_res->num_closid);
-   }
+   list_for_each_entry(s, &resctrl_schema_all, list)
+   rdt_min_closid = min(rdt_min_closid, s->num_closid);
 
closid_free_map = BIT_MASK(rdt_min_closid) - 1;
 
@@ -849,11 +846,8 @@ static int rdt_num_closids_show(struct kernfs_open_file 
*of,
struct seq_file *seq, void *v)
 {
struct resctrl_schema *s = of->kn->parent->priv;
-   struct rdt_resource *r = s->res;
-   struct rdt_hw_resource *hw_res;
 
-   hw_res = resctrl_to_arch_res(r);
-   seq_printf(seq, "%d\n", hw_res->num_closid);
+   seq_printf(seq, "%u\n", s->num_closid);
return 0;
 }
 
@@ -2140,6 +2134,7 @@ static int schemata_list_create(void)
 
s->res = r;
s->conf_type = resctrl_to_

[PATCH v2 05/24] x86/resctrl: Label the resources with their configuration type

2021-03-12 Thread James Morse
The names of resources are used for the schemata name presented to
user-space. These should be part of the filesystem code that is common
to all architectures, as otherwise different architectures could
accidentally support different schemata.

resctrl should be able to generate 'L3, L3CODE, L3DATA' from the
architecture's description of a cache at level 3 that supports CDP,
creating two separate struct resctrl_schema for the CDP case that
share the same resource, but differ in name and configuration type.
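
As a sketch of where this leads (add_schema() is a stand-in for the
later filesystem code):

| if (cdp_enabled) {
|         add_schema(r, CDP_CODE);        /* e.g. "L3CODE" */
|         add_schema(r, CDP_DATA);        /* e.g. "L3DATA" */
| } else {
|         add_schema(r, CDP_BOTH);        /* e.g. "L3" */
| }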

The configuration type is needed in struct resctrl_schema to generate
the name, and as an index into the array of per-domain staged
configurations that is added by a later patch.

Currently the resources are different for these types; the type
is encoded in the name (and cbm_idx_offset).

Label all the entries in rdt_resources_all[], and copy that value to
struct resctrl_schema.

Copying the value ensures there is no mismatch, but allows the filesystem
parts of resctrl to be modified to use the schema. Once the resources are
merged, the filesystem code can assign this value based on the schema
being created.

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
Changes since v1:
 * {cdp,conf}_type typo
 * Added kerneldoc comment
---
 arch/x86/kernel/cpu/resctrl/core.c | 7 +++
 arch/x86/kernel/cpu/resctrl/internal.h | 2 ++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 1 +
 include/linux/resctrl.h| 8 
 4 files changed, 18 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index ca43a7491fda..048c82e3baca 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -62,6 +62,7 @@ mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m,
 struct rdt_hw_resource rdt_resources_all[] = {
[RDT_RESOURCE_L3] =
{
+   .conf_type  = CDP_BOTH,
.resctrl = {
.rid= RDT_RESOURCE_L3,
.name   = "L3",
@@ -81,6 +82,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
},
[RDT_RESOURCE_L3DATA] =
{
+   .conf_type  = CDP_DATA,
.resctrl = {
.rid= RDT_RESOURCE_L3DATA,
.name   = "L3DATA",
@@ -100,6 +102,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
},
[RDT_RESOURCE_L3CODE] =
{
+   .conf_type  = CDP_CODE,
.resctrl = {
.rid= RDT_RESOURCE_L3CODE,
.name   = "L3CODE",
@@ -119,6 +122,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
},
[RDT_RESOURCE_L2] =
{
+   .conf_type  = CDP_BOTH,
.resctrl = {
.rid= RDT_RESOURCE_L2,
.name   = "L2",
@@ -138,6 +142,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
},
[RDT_RESOURCE_L2DATA] =
{
+   .conf_type  = CDP_DATA,
.resctrl = {
.rid= RDT_RESOURCE_L2DATA,
.name   = "L2DATA",
@@ -157,6 +162,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
},
[RDT_RESOURCE_L2CODE] =
{
+   .conf_type  = CDP_CODE,
.resctrl = {
.rid= RDT_RESOURCE_L2CODE,
.name   = "L2CODE",
@@ -176,6 +182,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
},
[RDT_RESOURCE_MBA] =
{
+   .conf_type  = CDP_BOTH,
.resctrl = {
.rid= RDT_RESOURCE_MBA,
.name   = "MB",
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h 
b/arch/x86/kernel/cpu/resctrl/internal.h
index 8a9da490134b..57484d2f6214 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -361,12 +361,14 @@ struct rdt_parse_data {
 
 /**
  * struct rdt_hw_resource - hw attributes of a resctrl resource
+ * @conf_type: The type that should be used when configuring. temporary
  * @num_closid:Number of CLOSIDs available.
  * @msr_base:  Base MSR address for CBMs
  * @msr_update:Function pointer to update QOS MSRs
  * @mon_scale: cqm counter * mon_scale = occupancy in bytes
  */
 struct rdt_hw_resource {
+   enum resctrl_conf_type  conf_type;
struct rdt_resource resctrl;
int num_c

[PATCH v2 04/24] x86/resctrl: Pass the schema in info dir's private pointer

2021-03-12 Thread James Morse
Moving properties that resctrl exposes to user-space into the core
'fs' code, (e.g. the name of the schema), means some of the functions
that back the filesystem need the schema struct (to where the properties
are moved), but currently take the resource.

Once the CDP resources are merged, the resource doesn't reflect the
right level of information.

For the info dirs that represent a control, the information needed
is in the schema, as this is how the resource is being used. For the
monitors, it's the resource, as L3CODE_MON doesn't make sense and would
monitor data too.

This difference means the type of the private pointers varies
between control and monitor info dirs.

If the flags are RF_MON_INFO, it's a struct rdt_resource. If the
flags are RF_CTRL_INFO, it's a struct resctrl_schema. Nothing in
res_common_files[] has both flags.
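
A file's show() handler then casts priv based on which flag set it was
registered under, e.g.:

| /* RF_CTRL_INFO file: priv is the schema */
| struct resctrl_schema *s = of->kn->parent->priv;
| struct rdt_resource *r = s->res;
|
| /* RF_MON_INFO file: priv is the resource itself */
| struct rdt_resource *r = of->kn->parent->priv;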

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
Changes since v1:
 * Added comment above removed for_each_alloc_enabled_rdt_resource() to hint
   at symmetry.
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 38 +-
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 73a695e7096d..92b94d85c689 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -848,7 +848,8 @@ static int rdt_last_cmd_status_show(struct kernfs_open_file 
*of,
 static int rdt_num_closids_show(struct kernfs_open_file *of,
struct seq_file *seq, void *v)
 {
-   struct rdt_resource *r = of->kn->parent->priv;
+   struct resctrl_schema *s = of->kn->parent->priv;
+   struct rdt_resource *r = s->res;
struct rdt_hw_resource *hw_res;
 
hw_res = resctrl_to_arch_res(r);
@@ -859,7 +860,8 @@ static int rdt_num_closids_show(struct kernfs_open_file *of,
 static int rdt_default_ctrl_show(struct kernfs_open_file *of,
 struct seq_file *seq, void *v)
 {
-   struct rdt_resource *r = of->kn->parent->priv;
+   struct resctrl_schema *s = of->kn->parent->priv;
+   struct rdt_resource *r = s->res;
 
seq_printf(seq, "%x\n", r->default_ctrl);
return 0;
@@ -868,7 +870,8 @@ static int rdt_default_ctrl_show(struct kernfs_open_file 
*of,
 static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
 struct seq_file *seq, void *v)
 {
-   struct rdt_resource *r = of->kn->parent->priv;
+   struct resctrl_schema *s = of->kn->parent->priv;
+   struct rdt_resource *r = s->res;
 
seq_printf(seq, "%u\n", r->cache.min_cbm_bits);
return 0;
@@ -877,7 +880,8 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file 
*of,
 static int rdt_shareable_bits_show(struct kernfs_open_file *of,
   struct seq_file *seq, void *v)
 {
-   struct rdt_resource *r = of->kn->parent->priv;
+   struct resctrl_schema *s = of->kn->parent->priv;
+   struct rdt_resource *r = s->res;
 
seq_printf(seq, "%x\n", r->cache.shareable_bits);
return 0;
@@ -900,13 +904,14 @@ static int rdt_shareable_bits_show(struct 
kernfs_open_file *of,
 static int rdt_bit_usage_show(struct kernfs_open_file *of,
  struct seq_file *seq, void *v)
 {
-   struct rdt_resource *r = of->kn->parent->priv;
+   struct resctrl_schema *s = of->kn->parent->priv;
/*
 * Use unsigned long even though only 32 bits are used to ensure
 * test_bit() is used safely.
 */
unsigned long sw_shareable = 0, hw_shareable = 0;
unsigned long exclusive = 0, pseudo_locked = 0;
+   struct rdt_resource *r = s->res;
struct rdt_domain *dom;
int i, hwb, swb, excl, psl;
enum rdtgrp_mode mode;
@@ -978,7 +983,8 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
 static int rdt_min_bw_show(struct kernfs_open_file *of,
 struct seq_file *seq, void *v)
 {
-   struct rdt_resource *r = of->kn->parent->priv;
+   struct resctrl_schema *s = of->kn->parent->priv;
+   struct rdt_resource *r = s->res;
 
seq_printf(seq, "%u\n", r->membw.min_bw);
return 0;
@@ -1009,7 +1015,8 @@ static int rdt_mon_features_show(struct kernfs_open_file 
*of,
 static int rdt_bw_gran_show(struct kernfs_open_file *of,
 struct seq_file *seq, void *v)
 {
-   struct rdt_resource *r = of->kn->parent->priv;
+   struct resctrl_schema *s = of->kn->parent->priv;
+   struct rdt_resource *r = s->res;
 
seq_printf(seq, "%u\n", r->membw.bw_gran);
return 0;
@@ -1018,7 +1025,8 @@ static int rdt_bw_gran_show(struct kernfs_open_file *of,
 static in

[PATCH v2 02/24] x86/resctrl: Split struct rdt_domain

2021-03-12 Thread James Morse
resctrl is the de facto Linux ABI for SoC resource partitioning features.
To support it on another architecture, it needs to be abstracted from
the features provided by Intel RDT and AMD PQoS, and moved to /fs/.

Split struct rdt_domain up too. Move everything that is particular
to resctrl into a new header file. resctrl code paths touching a 'hw'
struct indicate where an abstraction is needed.

Splitting this structure only moves types around, and should not lead
to any change in behaviour.
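
The resctrl<->arch conversion is the same container_of() trick as for
the resource, roughly:

| static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
| {
|         return container_of(r, struct rdt_hw_domain, resctrl);
| }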

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
Changes since v1:
 * Tabs space and other whitespace
 * cpu becomes CPU in all comments touched
---
 arch/x86/kernel/cpu/resctrl/core.c| 32 +++---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 10 --
 arch/x86/kernel/cpu/resctrl/internal.h| 40 +--
 arch/x86/kernel/cpu/resctrl/monitor.c |  8 +++--
 arch/x86/kernel/cpu/resctrl/rdtgroup.c| 29 ++--
 include/linux/resctrl.h   | 32 +-
 6 files changed, 91 insertions(+), 60 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index 6cd84d54086f..ca43a7491fda 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -385,10 +385,11 @@ static void
 mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m, struct rdt_resource 
*r)
 {
unsigned int i;
+   struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 
for (i = m->low; i < m->high; i++)
-   wrmsrl(hw_res->msr_base + i, d->ctrl_val[i]);
+   wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
 }
 
 /*
@@ -410,21 +411,23 @@ mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
struct rdt_resource *r)
 {
unsigned int i;
+   struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 
/*  Write the delay values for mba. */
for (i = m->low; i < m->high; i++)
-   wrmsrl(hw_res->msr_base + i, delay_bw_map(d->ctrl_val[i], r));
+   wrmsrl(hw_res->msr_base + i, delay_bw_map(hw_dom->ctrl_val[i], 
r));
 }
 
 static void
 cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
 {
unsigned int i;
+   struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 
for (i = m->low; i < m->high; i++)
-   wrmsrl(hw_res->msr_base + cbm_idx(r, i), d->ctrl_val[i]);
+   wrmsrl(hw_res->msr_base + cbm_idx(r, i), hw_dom->ctrl_val[i]);
 }
 
 struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
@@ -510,21 +513,22 @@ void setup_default_ctrlval(struct rdt_resource *r, u32 
*dc, u32 *dm)
 static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
 {
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+   struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
struct msr_param m;
u32 *dc, *dm;
 
-   dc = kmalloc_array(hw_res->num_closid, sizeof(*d->ctrl_val), 
GFP_KERNEL);
+   dc = kmalloc_array(hw_res->num_closid, sizeof(*hw_dom->ctrl_val), 
GFP_KERNEL);
if (!dc)
return -ENOMEM;
 
-   dm = kmalloc_array(hw_res->num_closid, sizeof(*d->mbps_val), 
GFP_KERNEL);
+   dm = kmalloc_array(hw_res->num_closid, sizeof(*hw_dom->mbps_val), 
GFP_KERNEL);
if (!dm) {
kfree(dc);
return -ENOMEM;
}
 
-   d->ctrl_val = dc;
-   d->mbps_val = dm;
+   hw_dom->ctrl_val = dc;
+   hw_dom->mbps_val = dm;
setup_default_ctrlval(r, dc, dm);
 
m.low = 0;
@@ -586,6 +590,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 {
int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
struct list_head *add_pos = NULL;
+   struct rdt_hw_domain *hw_dom;
struct rdt_domain *d;
 
d = rdt_find_domain(r, id, &add_pos);
@@ -601,10 +606,11 @@ static void domain_add_cpu(int cpu, struct rdt_resource 
*r)
return;
}
 
-   d = kzalloc_node(sizeof(*d), GFP_KERNEL, cpu_to_node(cpu));
-   if (!d)
+   hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
+   if (!hw_dom)
return;
 
+   d = &hw_dom->resctrl;
d->id = id;
cpumask_set_cpu(cpu, &d->cpu_mask);
 
@@ -633,6 +639,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 {
int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
+   struct rdt_hw_domain *hw_dom;
struct rdt_domain *d;
 
d = rdt_find_domain(r, id, NULL);
@@ -640,6 +647,7 

[PATCH v2 01/24] x86/resctrl: Split struct rdt_resource

2021-03-12 Thread James Morse
resctrl is the de facto Linux ABI for SoC resource partitioning features.
To support it on another architecture, it needs to be abstracted from
the features provided by Intel RDT and AMD PQoS, and moved to /fs/.

Start by splitting struct rdt_resource, (the name is kept to keep the noise
down), and add some type-trickery to keep the foreach helpers working.

Move everything that is particular to resctrl into a new header
file, keeping the x86 hardware accessors where they are. resctrl code
paths touching a 'hw' struct indicate where an abstraction is needed.

Splitting this structure only moves types around, and should not lead
to any change in behaviour.

Splitting rdt_domain up in a similar way happens in the next patch.
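
The type-trickery is a container_of() helper, plus foreach macros that
walk the arch code's array but hand out the embedded resctrl struct;
roughly:

| static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
| {
|         return container_of(r, struct rdt_hw_resource, resctrl);
| }
|
| /* Step to the next hw_resource, returning its resctrl member */
| static inline struct rdt_resource *resctrl_inc(struct rdt_resource *res)
| {
|         return &(resctrl_to_arch_res(res) + 1)->resctrl;
| }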

Reviewed-by: Jamie Iles 
Signed-off-by: James Morse 
---
Changes since v1:
 * Tabs space and other whitespace
 * Restored for_each_rdt_resource() calls in arch code
---
 arch/x86/kernel/cpu/resctrl/core.c| 250 --
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  14 +-
 arch/x86/kernel/cpu/resctrl/internal.h| 139 +++-
 arch/x86/kernel/cpu/resctrl/monitor.c |  32 +--
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |   4 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c|  68 +++---
 include/linux/resctrl.h   | 111 ++
 7 files changed, 347 insertions(+), 271 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index 698bb26aeb6e..6cd84d54086f 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -57,120 +57,134 @@ static void
 mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m,
  struct rdt_resource *r);
 
-#define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].domains)
+#define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].resctrl.domains)
 
-struct rdt_resource rdt_resources_all[] = {
+struct rdt_hw_resource rdt_resources_all[] = {
[RDT_RESOURCE_L3] =
{
-   .rid= RDT_RESOURCE_L3,
-   .name   = "L3",
-   .domains= domain_init(RDT_RESOURCE_L3),
+   .resctrl = {
+   .rid= RDT_RESOURCE_L3,
+   .name   = "L3",
+   .cache_level= 3,
+   .cache = {
+   .min_cbm_bits   = 1,
+   .cbm_idx_mult   = 1,
+   .cbm_idx_offset = 0,
+   },
+   .domains= domain_init(RDT_RESOURCE_L3),
+   .parse_ctrlval  = parse_cbm,
+   .format_str = "%d=%0*x",
+   .fflags = RFTYPE_RES_CACHE,
+   },
.msr_base   = MSR_IA32_L3_CBM_BASE,
.msr_update = cat_wrmsr,
-   .cache_level= 3,
-   .cache = {
-   .min_cbm_bits   = 1,
-   .cbm_idx_mult   = 1,
-   .cbm_idx_offset = 0,
-   },
-   .parse_ctrlval  = parse_cbm,
-   .format_str = "%d=%0*x",
-   .fflags = RFTYPE_RES_CACHE,
},
[RDT_RESOURCE_L3DATA] =
{
-   .rid= RDT_RESOURCE_L3DATA,
-   .name   = "L3DATA",
-   .domains= domain_init(RDT_RESOURCE_L3DATA),
+   .resctrl = {
+   .rid= RDT_RESOURCE_L3DATA,
+   .name   = "L3DATA",
+   .cache_level= 3,
+   .cache = {
+   .min_cbm_bits   = 1,
+   .cbm_idx_mult   = 2,
+   .cbm_idx_offset = 0,
+   },
+   .domains= 
domain_init(RDT_RESOURCE_L3DATA),
+   .parse_ctrlval  = parse_cbm,
+   .format_str = "%d=%0*x",
+   .fflags = RFTYPE_RES_CACHE,
+   },
.msr_base   = MSR_IA32_L3_CBM_BASE,
.msr_update = cat_wrmsr,
-   .cache_level= 3,
-   .cache = {
-   .min_cbm_bits   = 1,
-   .cbm_idx_mult   = 2,
-   .cbm_idx_offset = 0,
-   },
-   .parse_ctrlval  = parse_cbm,
-   .format_str = "%d=%0*x",
-   .fflags = RFTYPE_RES_CACHE,
},
[RDT_RESOURCE_L

[PATCH v2 00/24] x86/resctrl: Merge the CDP resources

2021-03-12 Thread James Morse
Hi folks,

Thanks to Reinette and Jamie for the comments on v1. Major changes in v2 are
to keep the closid in resctrl_arch_update_domains(), eliminating one patch,
splitting another that was making two sorts of change, and to re-order the
first few patches. See each patch's changelog for more.


This series re-folds the resctrl code so the CDP resources (L3CODE et al)
behaviour is all contained in the filesystem parts, with a minimum amount
of arch specific code.

Arm have some CPU support for dividing caches into portions, and
applying bandwidth limits at various points in the SoC. The collective term
for these features is MPAM: Memory Partitioning and Monitoring.

MPAM is similar enough to Intel RDT that it should use the de facto
Linux interface: resctrl. This filesystem currently lives under arch/x86, and is
tightly coupled to the architecture.
Ultimately, my plan is to split the existing resctrl code up to have an
arch<->fs abstraction, then move all the bits out to fs/resctrl. From there
MPAM can be wired up.

x86 might have two resources with cache controls (L2 and L3), but has
extra copies for CDP: L{2,3}{CODE,DATA}, which are marked as enabled
if CDP is enabled for the corresponding cache.

MPAM has an equivalent feature to CDP, but it is a property of the CPU,
not the cache. Resctrl needs to have x86's odd/even behaviour, as that
is the ABI, but this isn't how the MPAM hardware works. It is entirely
possible that an in-kernel user of MPAM would not be using CDP, whereas
resctrl is.
Pretending L3CODE and L3DATA are entirely separate resources is a neat
trick, but it is specific to x86.
It also leaves the arch code in control of various parts of the
filesystem ABI: the resource names, and the way the schemata are parsed.
Allowing this stuff to vary between architectures is bad for user space.

This series collapses the CODE/DATA resources, moving all the user-visible
resctrl ABI into what becomes the filesystem code. CDP becomes the type of
configuration being applied to a cache. This is done by adding a
struct resctrl_schema to the parts of resctrl that will move to fs. This
holds the arch-code resource that is in use for this schema, along with
other properties like the name, and whether the configuration being applied
is CODE/DATA/BOTH.
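
For illustration, the struct ends up shaped roughly like this (a sketch; the
exact fields are settled over the series):

| struct resctrl_schema {
|         struct list_head        list;
|         char                    name[8];        /* e.g. "L3CODE" */
|         enum resctrl_conf_type  conf_type;      /* CODE/DATA/BOTH */
|         struct rdt_resource     *res;           /* arch resource backing this schema */
|         u32                     num_closid;     /* as exposed to user-space */
| };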

This lets us fold the extra resources out of the arch code so that they
don't need to be duplicated if the equivalent feature to CDP is missing, or
implemented in a different way.


The first two patches split the resource and domain structs to have an
arch specific 'hw' portion, and the rest that is visible to resctrl.
Future series massage the resctrl code so there are no accesses to 'hw'
structures in the parts of resctrl that will move to fs, providing helpers
where necessary.
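
Roughly, the shape after the split is (a sketch; see the first patch for the
real thing):

| struct rdt_hw_resource {
|         struct rdt_resource     resctrl;        /* the part resctrl can see */
|
|         u32                     num_closid;
|         /* msr_base, msr_update and friends stay private to the arch code */
| };
|
| static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource *r)
| {
|         return container_of(r, struct rdt_hw_resource, resctrl);
| }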

This series adds temporary scaffolding, which it removes a few patches
later. This is to allow things like the ctrlval arrays and resources to be
merged separately, which should make it easier to bisect. These things
are marked temporary, and should all be gone by the end of the series.

This series is a little rough around the monitors; would a fake
struct resctrl_schema for the monitors simplify things, or be a source
of bugs?

This series is based on v5.12-rc2, and can be retrieved from:
git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git 
mpam/resctrl_merge_cdp/v2

v1 was posted here:
https://lore.kernel.org/lkml/20201030161120.227225-1-james.mo...@arm.com/

Parts were previously posted as an RFC here:
https://lore.kernel.org/lkml/20200214182947.39194-1-james.mo...@arm.com/


Thanks,

James Morse (24):
  x86/resctrl: Split struct rdt_resource
  x86/resctrl: Split struct rdt_domain
  x86/resctrl: Add a separate schema list for resctrl
  x86/resctrl: Pass the schema in info dir's private pointer
  x86/resctrl: Label the resources with their configuration type
  x86/resctrl: Walk the resctrl schema list instead of an arch list
  x86/resctrl: Store the effective num_closid in the schema
  x86/resctrl: Add resctrl_arch_get_num_closid()
  x86/resctrl: Pass the schema to resctrl filesystem functions
  x86/resctrl: Swizzle rdt_resource and resctrl_schema in
pseudo_lock_region
  x86/resctrl: Move the schemata names into struct resctrl_schema
  x86/resctrl: Group staged configuration into a separate struct
  x86/resctrl: Allow different CODE/DATA configurations to be staged
  x86/resctrl: Rename update_domains() resctrl_arch_update_domains()
  x86/resctrl: Add a helper to read a closid's configuration
  x86/resctrl: Add a helper to read/set the CDP configuration
  x86/resctrl: Use cdp_enabled in rdt_domain_reconfigure_cdp()
  x86/resctrl: Pass configuration type to resctrl_arch_get_config()
  x86/resctrl: Make ctrlval arrays the same size
  x86/resctrl: Apply offset correction when config is staged
  x86/resctrl: Calculate the index from the configuration type
  x86/resctrl: Merge the ctrl_val arrays
  x86/resctrl: Remove rdt_cdp_peer_get()
  x86/resctrl:

Re: [PATCH 12/24] x86/resctrl: Add closid to the staged config

2021-03-12 Thread James Morse
Hi Reinette,

On 17/11/2020 23:46, Reinette Chatre wrote:
> On 10/30/2020 9:11 AM, James Morse wrote:
>> Once the L2/L2CODE/L2DATA resources are merged, there may be two
>> configurations staged for one resource when CDP is enabled. The
>> closid should always be passed with the type of configuration to the
>> arch code.
>>
>> Because update_domains() will eventually apply a set of configurations,
>> it should take the closid from the same place, so they pair up.
>>
>> Move the closid to be a staged parameter.
> 
> Move implies that it is taken from one location and added to another. This 
> seems like a
> copy instead?

I'll rephrase it.


>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index 0c95ed83eb05..b107c0202cfb 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> @@ -72,6 +72,7 @@ int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
>>   if (!bw_validate(data->buf, &bw_val, r))
>>   return -EINVAL;
>>   cfg->new_ctrl = bw_val;
>> +    cfg->closid = data->rdtgrp->closid;
>>   cfg->have_new_ctrl = true;
>>     return 0;
>> @@ -178,6 +179,7 @@ int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
>>   }
>>     cfg->new_ctrl = cbm_val;
>> +    cfg->closid = data->rdtgrp->closid;
>>   cfg->have_new_ctrl = true;

> rdtgrp is already available so it could just be:
> cfg->closid = rdtgrp->closid?

Yes, this is just trying to be identical to the earlier version.
I'll change it.


>>   return 0;
>> @@ -245,15 +247,15 @@ static int parse_line(char *line, struct resctrl_schema *s,
>>   }
>>     static void apply_config(struct rdt_hw_domain *hw_dom,
>> - struct resctrl_staged_config *cfg, int closid,
>> + struct resctrl_staged_config *cfg,
>>    cpumask_var_t cpu_mask, bool mba_sc)
>>   {
>>   struct rdt_domain *dom = &hw_dom->resctrl;
>>   u32 *dc = mba_sc ? hw_dom->mbps_val : hw_dom->ctrl_val;
>>   -    if (cfg->new_ctrl != dc[closid]) {
>> +    if (cfg->new_ctrl != dc[cfg->closid]) {
>>   cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
>> -    dc[closid] = cfg->new_ctrl;
>> +    dc[cfg->closid] = cfg->new_ctrl;
>>   }
>>     cfg->have_new_ctrl = false;
>> @@ -284,7 +286,7 @@ int update_domains(struct rdt_resource *r, int closid)
>>   if (!cfg->have_new_ctrl)
>>   continue;
>>   -    apply_config(hw_dom, cfg, closid, cpu_mask, mba_sc);
>> +    apply_config(hw_dom, cfg, cpu_mask, mba_sc);
>>   }
>>   }

> It is not clear to me that storing the closid in the staged config is 
> necessary. A closid
> is associated with a resource group so when the user writes to the schemata 
> file all
> configurations would be (from the resource group perspective) for the same 
> closid. This is
> the value provided here to update domains. Looking ahead in this series this 
> closid is
> later used to compute the index (get_config_index()) that would use as input 
> cfg->closid,
> but that cfg->closid would be identical for all resources/staged configs and 
> then the new
> index would be computed based on the resource type. Having the closid in the 
> staged config
> thus does not seem to be necessary, it could remain as this function 
> parameter and be used
> for all staged configs?

Yes, when emulating CDP with MPAM, these can be two unrelated numbers, but it looks like I
lost track of the 'it's always going to be the same value' for each invocation through the
filesystem.

An earlier version tried to push separate code/data closid values further into 
the resctrl
code, but I thought it was getting too confusing.
The x86 CDP behaviour is either 'configuration is twice the size' or 'an odd/even
pair of closid' depending on how you look at it. For MPAM the equivalent feature is an
(arbitrary!) pair of closid/partid.
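
(For reference, the odd/even mapping ends up captured in one helper, roughly
like this - a sketch of what get_config_index() computes, enum names as used
in this series:)

| static u32 get_config_index(u32 closid, enum resctrl_conf_type type)
| {
|         switch (type) {
|         default:
|         case CDP_BOTH:
|                 return closid;
|         case CDP_CODE:
|                 return (closid * 2) + 1;
|         case CDP_DATA:
|                 return (closid * 2);
|         }
| }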

Yes, it's not necessary for resctrl, I'll see how much of this I can unpick!


Thanks,

James


Re: [PATCH 11/24] x86/resctrl: Group staged configuration into a separate struct

2021-03-12 Thread James Morse
Hi Reinette,

On 17/11/2020 23:28, Reinette Chatre wrote:
> On 10/30/2020 9:11 AM, James Morse wrote:
>> Arm's MPAM may have surprisingly large bitmaps for its cache
>> portions as the architecture allows up to 4K portions. The size
>> exposed via resctrl may not be the same, some scaling may
>> occur.
>>
>> The values written to hardware may be unlike the values received
>> from resctrl, e.g. MBA percentages may be backed by a bitmap,
>> or a maximum value that isn't a percentage.
>>
>> Today resctrl's ctrlval arrays are written to directly by the

> If using a cryptic word like "ctrlval" it would be easier to understand what 
> is meant if
> it matches the variable in the code, "ctrl_val".

I thought the non-underscore version was the preferred version, e.g.:
setup_default_ctrlval(), domain_setup_ctrlval() and parse_ctrlval. I'll switch to the
underscore version for things other than functions if you think it's clearer.


>> resctrl filesystem code. e.g. apply_config(). This is a problem
> 
> This sentence starts with "Today" implying what code does before this change 
> but the
> example function, apply_config() is introduced in this patch.

I don't follow the problem here: 'today' refers to what the code does before the patch is
applied. "Before this patch" would make me unpopular; I'll try 'previously'.


>> if scaling or conversion is needed by the architecture.
>>
>> The arch code should own the ctrlval array (to allow scaling and
>> conversion), and should only need a single copy of the array for the
>> values currently applied in hardware.

> ok, but that is the case, no?

It's true for x86. But whether it's true for MPAM depends on which side of the
divide this thing lands, as the value from user-space may be different from what
gets written to hardware.
If the filesystem code owned the list of values, there would need to be two 
copies to
allow MPAM to restore the values in hardware when CPUs come online.

(in particular, the MBA percentage control can be emulated with MPAMs bitmap or 
fractional
min/max, the driver has to choose at startup).
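
(To illustrate why the conversion has to live behind the arch helpers - the
names here are invented - the MPAM driver would do something like:)

| /* Hypothetical: scale resctrl's 0-100 MBA percentage onto MPAM's
|  * fixed-point bandwidth-max fraction, chosen at startup. */
| static u16 percent_to_mbw_max(u32 percent)
| {
|         return (percent * 0xffff) / 100;
| }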

I'll try and bundle this as a clearer explanation into the commit message.


>> Move the new_ctrl bitmap value and flag into a struct for staged
>> configuration changes. This is created as an array to allow one per type
> 
> This is a bit cryptic as the reader may not know while reading this commit 
> message what
> "new_ctrl" is or where it is currently hosted.

Sure, I'll add more explanation of the current code.


>> of configuration. Today there is only one element in the array, but
>> eventually resctrl will use the array slots for CODE/DATA/BOTH to detect
>> a duplicate schema being written.


>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index 28d69c78c29e..0c95ed83eb05 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> 
> ...
> 
>> @@ -240,15 +244,30 @@ static int parse_line(char *line, struct resctrl_schema *s,
>>   return -EINVAL;
>>   }
>>   +static void apply_config(struct rdt_hw_domain *hw_dom,
>> + struct resctrl_staged_config *cfg, int closid,
>> + cpumask_var_t cpu_mask, bool mba_sc)
>> +{
>> +    struct rdt_domain *dom = &hw_dom->resctrl;
>> +    u32 *dc = mba_sc ? hw_dom->mbps_val : hw_dom->ctrl_val;
>> +
>> +    if (cfg->new_ctrl != dc[closid]) {
>> +    cpumask_set_cpu(cpumask_any(&d->cpu_mask), cpu_mask);
>> +    dc[closid] = cfg->new_ctrl;
>> +    }
>> +
>> +    cfg->have_new_ctrl = false;
> 
> Why is this necessary?

(hmm, it's ended up in the wrong patch, but:) This was to ensure that once the
resources are merged, all the work for applying configuration changes is done by
the first IPI, ensuring that if update_domains() is called for a second schema
with the same resource, it finds no new work to do.
But without this, the empty_bitmap check would still catch it. I'll remove it.


>> +}
>> +
>>   int update_domains(struct rdt_resource *r, int closid)
>>   {
>> +    struct resctrl_staged_config *cfg;
>>   struct rdt_hw_domain *hw_dom;
>>   struct msr_param msr_param;
>>   cpumask_var_t cpu_mask;
>>   struct rdt_domain *d;
>>   bool mba_sc;
>> -    u32 *dc;
>> -    int cpu;
>> +    int cpu, i;
>>     if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
>>   return -ENOMEM;
>> @@ -260,10 +279,12 @@ int update_domains(struct rdt_resource *r, int closid)
>>   mba_sc = is_mba_sc(r);
&

Re: [PATCH 10/24] x86/resctrl: Move the schema names into struct resctrl_schema

2021-03-12 Thread James Morse
Hi Reinette,

On 17/11/2020 23:11, Reinette Chatre wrote:
> On 10/30/2020 9:11 AM, James Morse wrote:
>> Move the names used for the schemata file out of the resource and
>> into struct resctrl_schema. This allows one resource to have two
>> different names, based on the other schema properties.
>>
>> This patch copies the names, eventually resctrl will generate them.
> 
> Please remove "This patch".
> 
>>
>> Remove the arch code's max_name_width, this is now resctrl's
>> problem.

>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index a65ff53394ed..28d69c78c29e 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> 
> ...
> 
>> @@ -391,7 +389,7 @@ static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int clo
>>   bool sep = false;
>>   u32 ctrl_val;
>>   -    seq_printf(s, "%*s:", max_name_width, r->name);
>> +    seq_printf(s, "%*s:", RESCTRL_NAME_LEN, schema->name);
> 
> From what I understand this changes what some users will see. In the original 
> code the
> "max_name_width" is computed based on the maximum length of resources 
> supported. Systems
> that only support MBA would thus show a schemata of:
> 
> MB:0=100;1=100
> 
> I expect the above change would change the output to:
>     MB:0=100;1=100

Aha! Despite the comment - I've totally misunderstood what this code is doing.


Thanks!

James


Re: [PATCH 08/24] x86/resctrl: Walk the resctrl schema list instead of an arch list

2021-03-12 Thread James Morse
Hi Reinette,

On 17/11/2020 22:52, Reinette Chatre wrote:
> On 10/30/2020 9:11 AM, James Morse wrote:
>> Now that resctrl has its own list of resources it is using, walk that
>> list instead of the architectures list. This means resctrl has somewhere
>> to keep schema properties with the resource that is using them.
>>
>> Most users of for_each_alloc_enabled_rdt_resource() are per-schema,
>> and also want a schema property, like the conf_type. Switch these to
>> walk the schema list. Schema were only created for alloc_enabled
>> resources so these two lists are currently equivalent.
>>
> 
> From what I understand based on this description the patch will essentially 
> change
> instances of for_each_alloc_enabled_rdt_resource() to walking the schema list 
> 

>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index 8ac104c634fe..d3f9d142f58a 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> @@ -57,9 +57,10 @@ static bool bw_validate(char *buf, unsigned long *data, struct rdt_resource *r)
>>   return true;
>>   }
>>   -int parse_bw(struct rdt_parse_data *data, struct rdt_resource *r,
>> +int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
>>    struct rdt_domain *d)
>>   {
>> +    struct rdt_resource *r = s->res;
>>   unsigned long bw_val;
>>     if (d->have_new_ctrl) {
> 
> ... this change and also the ones to parse_cbm() and rdtgroup_cbm_overlaps() 
> are not clear
> to me because it seems they replace the rdt_resource parameter with 
> resctrl_schema, but
> all in turn use that to access rdt_resource again. That seems unnecessary?

I previously restructured the series to do the schema stuff first, as I thought
it would make it easier to follow. It looks like this patch has picked up other
stuff - I'll split this up so those changes get their own commit message.

By the end of the series, the rdt_resource isn't unique: the same 'L3' is
provided to L3CODE and L3DATA. The things that make these different are stored in
the schema. In both these cases the configuration is read/written; where that
should go depends on the type of the schema, which lives in the schema struct.
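
e.g. with CDP enabled, schemata_list_create() ends up building two entries
that share the one L3 arch resource, roughly (a sketch):

|         snprintf(s->name, sizeof(s->name), "L3CODE");
|         s->conf_type = CDP_CODE;
|         s->res = r;     /* the same RDT_RESOURCE_L3 entry that L3DATA uses */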


Thanks,

James


Re: [PATCH 07/24] x86/resctrl: Label the resources with their configuration type

2021-03-12 Thread James Morse
Hi Reinette,

On 17/11/2020 22:30, Reinette Chatre wrote:
> On 10/30/2020 9:11 AM, James Morse wrote:
>> Before the name for the schema can be generated, the type of the
>> configuration being applied to the resource needs to be known. Label
>> all the entries in rdt_resources_all[], and copy that value in to struct
> 
> s/in to/into/ or s/in to/to/ ?
> 
>> resctrl_schema.
>>
> 
> This commit message does not explain why it is needed to copy this value.
> 
>> Subsequent patches will generate the schema names in what will become
>> the fs code. Eventually the fs code will generate pairs of CODE/DATA if
>> the platform supports CDP for this resource.
> 
> Explaining how the copy is a step towards accomplishing this goal would be 
> very helpful.

(I've added text about what this is used for, and why it can't assign the 
values it wants
at this point in the series)


>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 1bd785b1920c..628e5eb4d7a9 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2141,6 +2141,7 @@ static int create_schemata_list(void)
>>     s->res = r;
>>   s->num_closid = resctrl_arch_get_num_closid(r);
> 
> Above seems to be last user of this helper remaining ... why is helper needed 
> instead of
> something similar to below?

Great question, resctrl_to_arch_res(r)->num_closid? That won't work for MPAM, 
or would at
least force all architectures to copy x86's arch-specific structure.

schemata_list_create() needs to be part of the filesystem code after the split, 
but it
can't touch the hw structure like the conf_type is doing here.

This is mentioned in the commit message of the first two patches:
| resctrl code paths touching a 'hw' struct indicates where an abstraction is 
needed.

I evidently need to make that clearer.


>> +    s->conf_type = resctrl_to_arch_res(r)->conf_type;

I'll do this temporarily; by the end of the series schemata_list_create() chooses
the value, so this disappears.


Thanks,

James


Re: [PATCH 06/24] x86/resctrl: Store the effective num_closid in the schema

2021-03-12 Thread James Morse
Hi Reinette,

On 17/11/2020 22:04, Reinette Chatre wrote:
> On 10/30/2020 9:11 AM, James Morse wrote:
>> resctrl_schema holds properties that vary with the style of configuration
>> that resctrl applies to a resource.
>>
>> Once the arch code has a single resource per cache that can be configured,
>> resctrl will need to keep track of the num_closid itself.
>>
>> Add num_closid to resctrl_schema. Change callers like
>> rdtgroup_schemata_show() to walk the schema instead.

> This is a significant patch in that it introduces a second num_closid 
> available for code
> to use. Even so, the commit message is treating it quite nonchalantly ... 
> essentially
> stating that "here is a new closid and change some code to use it".

The difference already exists: the number of closid that resctrl exposes to
user-space may be different from what the hardware supports. Currently the arch
code does a bait-and-switch with the L3CODE/L3DATA resources, but this is
specific to x86; another architecture shouldn't be required to copy it if it's
not necessary to support CDP.

I'll expand the commit message with this.

An earlier version tried to use different types for the 'hw' number of closid, 
and the
version used by resctrl, but I figured it was too noisy.


> Could you please elaborate how the callers needing to "walk the schema 
> instead" were chosen?

These all want the num_closid value that is exposed to user-space:
rdtgroup_parse_resource()
rdtgroup_schemata_show()
rdt_num_closids_show()
closid_init()

If resctrl is in control of that, it should come from the schema instead of 
being pulled
straight out of the architecture code.


> This seems almost a revert of the earlier patch that introduced the helper 
> and I wonder if
> it may not make this easier to understand if these areas do not receive the 
> temporary
> change to use that helper.

It's a trade-off between churn for a self-contained change (i.e. all the 'fs'
bits, regardless of whether they are around later), and keeping the patches that
are to do with the schema together.

I don't think it's a good idea to have "these are left alone as they will be
changed differently later", as that is liable to break (or just get weird) as the
series is restructured to fix it.
restructured to fix it.

It will probably be fewer lines of change to do it the other way round. v2 does 
this,
hopefully that is easier on the eye!


Thanks,

James


Re: [PATCH 05/24] x86/resctrl: Pass the schema in resdir's private pointer

2021-03-12 Thread James Morse
Hi Reinette,

On 17/11/2020 21:49, Reinette Chatre wrote:
> It is not clear what "resdir" mentioned in subject line refers to.

rdtgroup_mkdir_info_resdir(), it looks like I picked the wrong bit to identify it.
('info' in a name usually conveys no information at all!)


> Could it be changed to "info dir"?

Sure,


> On 10/30/2020 9:11 AM, James Morse wrote:
>> Moving properties that resctrl exposes to user-space into the core
>> 'fs' code, (e.g. the name of the schema), means some of the functions
>> that back the filesystem need the schema struct, but currently take the
>> resource.
> 
> I think a simple addition would help to parse the above ...
> 
> " ... need the schema struct (to where the properties are moved), ..."
> 
>>
>> Once the CDP resources are merged, the resource doesn't reflect the
>> right level of information.
>>
>> For the info dirs that represent a control, the information needed
>> is in the schema, as this is how the resource is being used. For the
>> monitors, its the resource as L3CODE_MON doesn't make sense, and would
>> monitor data too.
>>
>> This difference means the type of the private pointers varies
>> between control and monitor info dirs.
>>
>> If the flags are RF_MON_INFO, it's a struct rdt_resource. If the
>> flags are RF_CTRL_INFO, it's a struct resctrl_schema. Nothing in
>> res_common_files[] has both flags.

>> @@ -1794,6 +1803,7 @@ static int rdtgroup_mkdir_info_resdir(struct rdt_resource *r, char *name,
>>     static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
>>   {
>> +    struct resctrl_schema *s;
>>   struct rdt_resource *r;
>>   unsigned long fflags;
>>   char name[32];
>> @@ -1809,9 +1819,10 @@ static int rdtgroup_create_info_dir(struct kernfs_node *parent_kn)
>>   if (ret)
>>   goto out_destroy;
>>   -    for_each_alloc_enabled_rdt_resource(r) {
>> +    list_for_each_entry(s, &resctrl_all_schema, list) {
>> +    r = s->res;
>>   fflags =  r->fflags | RF_CTRL_INFO;
>> -    ret = rdtgroup_mkdir_info_resdir(r, r->name, fflags);
>> +    ret = rdtgroup_mkdir_info_resdir(s, r->name, fflags);
>>   if (ret)
>>   goto out_destroy;
>>   }

> I think it would be helpful to add a comment here to compensate for the 
> symmetry that is
> removed ("for_each_alloc_enabled_rdt_resource()" followed by a
> "for_each_mon_enabled_rdt_resource()").

Sure, the thing to convey is the first loop is for 'alloc_enabled' controls.
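
e.g. something like:

|         /* The schema list only holds entries for the alloc_enabled controls */
|         list_for_each_entry(s, &resctrl_all_schema, list) {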


Thanks,

James



Re: [PATCH 04/24] x86/resctrl: Add a separate schema list for resctrl

2021-03-12 Thread James Morse
Hi Reinette

On 17/11/2020 21:29, Reinette Chatre wrote:
> On 10/30/2020 9:11 AM, James Morse wrote:
>> To support multiple architectures, the resctrl code needs to be split
>> into a 'fs' specific part in core code, and an arch-specific backend.
>>
>> It should be difficult for the arch-specific backends to diverge,
>> supporting slightly different ABIs for user-space. For example,
>> generating, parsing and validating the schema configuration values
>> should be done in what becomes the core code to prevent divergence.
>> Today, the schema emerge from which entries in the rdt_resources_all
>> array the arch code has chosen to enable.

>> Start by creating a struct resctrl_schema, which will eventually hold
>> the name and pending configuration values for resctrl.
> 
> Looking ahead I am not able to identify the "pending configuration values" 
> that will
> eventually be held in resctrl_schema. With entire series applied there is the 
> name, type,
> num_closid, and pointer to the resource.

Looks like a vestige from an earlier version:
This referred to the staged configuration in a much earlier version, but this
has to stay in the domain structure as it's per-domain.

This should refer to the type that is used as an index into the domain's staged
configuration array.


Thanks,

James


Re: [PATCH 01/24] x86/resctrl: Split struct rdt_resource

2021-03-12 Thread James Morse
Hi Reinette,

On 17/11/2020 19:20, Reinette Chatre wrote:
> On 10/30/2020 9:10 AM, James Morse wrote:
>> Splitting rdt_domain up in a similar way happens in the next patch.
>> No change in behaviour, this patch just moves types around.
> 
> Please remove the "this patch" term.

>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index e5f4ee8f4c3b..470661f2eb68 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c

>> @@ -912,9 +938,14 @@ static __init bool get_rdt_resources(void)
>>     static __init void rdt_init_res_defs_intel(void)
>>   {
>> +    struct rdt_hw_resource *hw_res;
>>   struct rdt_resource *r;
>> +    int i;
>> +
>> +    for (i = 0; i < RDT_NUM_RESOURCES; i++) {
>> +    hw_res = &rdt_resources_all[i];
>> +    r = &rdt_resources_all[i].resctrl;
>>   -    for_each_rdt_resource(r) {
>>   if (r->rid == RDT_RESOURCE_L3 ||
>>   r->rid == RDT_RESOURCE_L3DATA ||
>>   r->rid == RDT_RESOURCE_L3CODE ||
> 
> Could using for_each_rdt_resource() remain with the additional assignment of 
> hw_res?
> Similar to the later usage of for_each_alloc_enabled_rdt_resource()?

Sure. I didn't do it here because of the back-to-back container_of(), even 
though the
array is defined in the same file. But the compiler will remove all of that.
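
i.e. something like (a sketch):

|         for_each_rdt_resource(r) {
|                 hw_res = resctrl_to_arch_res(r);
|                 /* ... the existing rid checks and assignments ... */
|         }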


>> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> index c877642e8a14..ab6e584c9d2d 100644
>> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
>> @@ -284,10 +284,12 @@ int update_domains(struct rdt_resource *r, int closid)
>>   static int rdtgroup_parse_resource(char *resname, char *tok,
>>  struct rdtgroup *rdtgrp)
>>   {
>> +    struct rdt_hw_resource *hw_res;
>>   struct rdt_resource *r;
>>     for_each_alloc_enabled_rdt_resource(r) {
>> -    if (!strcmp(resname, r->name) && rdtgrp->closid < r->num_closid)
>> +    hw_res = resctrl_to_arch_res(r);
>> +    if (!strcmp(resname, r->name) && rdtgrp->closid < hw_res->num_closid)
>>   return parse_line(tok, r, rdtgrp);
>>   }

> This is the usage of for_each_alloc_enabled_rdt_resource() I refer to earlier.

(if it helps, the difference in treatment is because this one is to do with the 
filesystem
interface, the others were clearly arch code)


Thanks,

James


Re: [PATCH 03/24] x86/resctrl: Add resctrl_arch_get_num_closid()

2021-03-12 Thread James Morse
Hi Reinette,

On 17/11/2020 19:57, Reinette Chatre wrote:
> On 10/30/2020 9:10 AM, James Morse wrote:
>> resctrl chooses whether to enable CDP, once it does, half the number
>> of closid are available. MPAM doesn't behave like this, an in-kernel user
>> of MPAM could be 'using CDP' while resctrl is not.
>>
>> To move the 'half the closids' behaviour to be part of the core code,
>> each schema would have a num_closids. This may be different from the
>> single resources num_closid if CDP is in use.
>>
>> Add a helper to read the resource's num_closid, this should return the
>> number of closid that the resource supports, regardless of whether CDP
>> is in use.
>>
>> For now return the hw_res->num_closid, which is already adjusted for CDP.
>> Once the CODE/DATA/BOTH resources are merged, resctrl can make the
>> adjustment when copying the value to the schema's num_closid.


>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
>> index 97040a54cc9a..5d5b566c4359 100644
>> --- a/arch/x86/kernel/cpu/resctrl/core.c
>> +++ b/arch/x86/kernel/cpu/resctrl/core.c
>> @@ -443,6 +443,11 @@ struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
>>   return NULL;
>>   }
>>   +u32 resctrl_arch_get_num_closid(struct rdt_resource *r)
>> +{
>> +    return resctrl_to_arch_res(r)->num_closid;
>> +}
>> +

> Helper returns the value but also changes the type. Could you please add 
> motivation for
> this in a comment?

That was just sloppiness. The values from the MPAM driver are unsigned, and 
this saved casts.

I do think any cross-architecture bits should use types like u32 to avoid nasty 
surprises.
(the value in struct rdtgroup is already unsigned)

I'll do this better, and fix the commit message.


>>   void rdt_ctrl_update(void *arg)
>>   {
>>   struct msr_param *m = arg;

>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index b55861ff4e34..df10135f021e 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -100,15 +100,13 @@ int closids_supported(void)
>>     static void closid_init(void)
>>   {
>> -    struct rdt_hw_resource *hw_res;
>> +    u32 rdt_min_closid = 32;
>>   struct rdt_resource *r;
>> -    int rdt_min_closid = 32;
>>     /* Compute rdt_min_closid across all resources */
>> -    for_each_alloc_enabled_rdt_resource(r) {
>> -    hw_res = resctrl_to_arch_res(r);
>> -    rdt_min_closid = min(rdt_min_closid, hw_res->num_closid);
>> -    }
>> +    for_each_alloc_enabled_rdt_resource(r)
>> +    rdt_min_closid = min(rdt_min_closid,
>> + resctrl_arch_get_num_closid(r));
>>     closid_free_map = BIT_MASK(rdt_min_closid) - 1;
>>   @@ -847,10 +845,8 @@ static int rdt_num_closids_show(struct kernfs_open_file *of,
>>   struct seq_file *seq, void *v)
>>   {
>>   struct rdt_resource *r = of->kn->parent->priv;
>> -    struct rdt_hw_resource *hw_res;
>>   -    hw_res = resctrl_to_arch_res(r);
>> -    seq_printf(seq, "%d\n", hw_res->num_closid);
>> +    seq_printf(seq, "%d\n", resctrl_arch_get_num_closid(r));
>>   return 0;
> 
> Now that this type is changed this will need to be %u?

Oops,


>>   diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
>> index f5af59b8f2a9..dfb0f32b73a1 100644
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -163,4 +163,7 @@ struct rdt_resource {
>>     };
>>   +/* The number of closid supported by this resource regardless of CDP */
>> +u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
>> +
>>   #endif /* _RESCTRL_H */


> The purpose of this change is unclear and introducing confusion. It 
> introduces a helper
> that returns the num_closid associated with a resource but it does not use 
> the helper in
> all the cases where this value is needed. After this change some code uses 
> this new helper
> while other code continue to access the value directly.

This is the split emerging between the bits that are fs/resctrl and the bits
that are architecture-specific.
Those functions that legitimately use struct rdt_hw_resource directly don't 
need the
helper. e.g. reset_all_ctrls() remains part of the architecture-specific code.

The aim is none of the code in fs/resctrl would ever touch those structs, it 
would always
use the helper.

I agree the commit message is over-focussed on why you can't reach in and grab 
a va

Re: How can a userspace program tell if the system supports the ACPI S4 state (Suspend-to-Disk)?

2021-02-09 Thread James Morse
Hi Dexuan,

On 05/02/2021 19:36, Dexuan Cui wrote:
>> From: Rafael J. Wysocki 
>> Sent: Friday, February 5, 2021 5:06 AM
>> To: Dexuan Cui 
>> Cc: linux-a...@vger.kernel.org; linux-kernel@vger.kernel.org;
>> linux-hyp...@vger.kernel.org; Michael Kelley 
>> Subject: Re: How can a userspace program tell if the system supports the ACPI
>> S4 state (Suspend-to-Disk)?
>>
>> On Sat, Dec 12, 2020 at 2:22 AM Dexuan Cui  wrote:
>>>
>>> Hi all,
>>> It looks like Linux can hibernate even if the system does not support the 
>>> ACPI
>>> S4 state, as long as the system can shut down, so "cat /sys/power/state"
>>> always contains "disk", unless we specify the kernel parameter "nohibernate"
>>> or we use LOCKDOWN_HIBERNATION.

>>> In some scenarios IMO it can still be useful if the userspace is able to 
>>> detect
>>> if the ACPI S4 state is supported or not, e.g. when a Linux guest runs on
>>> Hyper-V, Hyper-V uses the virtual ACPI S4 state as an indicator of the 
>>> proper
>>> support of the tool stack on the host, i.e. the guest is discouraged from
>>> trying hibernation if the state is not supported.

What goes wrong? This sounds like a funny way of signalling hypervisor policy.


>>> I know we can check the S4 state by 'dmesg':
>>>
>>> # dmesg |grep ACPI: | grep support
>>> [3.034134] ACPI: (supports S0 S4 S5)
>>>
>>> But this method is unreliable because the kernel msg buffer can be filled
>>> and overwritten. Is there any better method? If not, do you think if the
>>> below patch is appropriate? Thanks!
>>
>> Sorry for the delay.
>>
>> If ACPI S4 is supported, /sys/power/disk will list "platform" as one
>> of the options (and it will be the default one then).  Otherwise,
>> "platform" is not present in /sys/power/disk, because ACPI is the only
>> user of hibernation_ops.

> This works on x86. Thanks a lot!
> 
> BTW, does this also work on ARM64?

Not today. The S4/S5 stuff is part of 'ACPI_SYSTEM_POWER_STATES_SUPPORT', which 
arm64
doesn't enable as it has a firmware mechanism that covers this on both DT and 
ACPI
systems. That code is what calls hibernation_set_ops() to enable ACPI's 
platform mode.
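
(i.e. on an x86 box with ACPI S4 you'd see something like:
| # cat /sys/power/disk
| [platform] shutdown reboot suspend test_resume
whereas on arm64 "platform" is absent today.)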

Regardless: hibernate works fine. What does your hypervisor do that causes 
problems?
(I think all we expect from firmware is it doesn't randomise the placement of 
the ACPI
tables as they aren't necessarily part of the hibernate image)


Thanks,

James


Re: [PATCH v11 0/6] arm64: MMU enabled kexec relocation

2021-02-01 Thread James Morse
Hi Pavel,

On 27/01/2021 17:27, Pavel Tatashin wrote:
> Enable MMU during kexec relocation in order to improve reboot performance.
> 
> If kexec functionality is used for a fast system update, with a minimal
> downtime, the relocation of kernel + initramfs takes a significant portion
> of reboot.
> 
> The reason for slow relocation is because it is done without MMU, and thus
> not benefiting from D-Cache.
> 
> Performance data
> 
> For this experiment, the size of kernel plus initramfs is small, only 25M.
> If initramfs was larger, then the improvements would be greater, as time
> spent in relocation is proportional to the size of relocation.
> 
> Previously:
> kernel shutdown   0.022131328s
> relocation0.440510736s
> kernel startup0.294706768s
> 
> Relocation was taking: 58.2% of reboot time
> 
> Now:
> kernel shutdown   0.032066576s
> relocation0.022158152s
> kernel startup0.296055880s
> 
> Now: Relocation takes 6.3% of reboot time
> 
> Total reboot is x2.16 times faster.
> 
> With bigger userland (fitImage 380M), the reboot time is improved by 3.57s,
> and is reduced from 3.9s down to 0.33s

> Previous approaches and discussions
> ---

The problem I see with this is rewriting the relocation code. It needs to work 
whether the
machine has enough memory to enable the MMU during kexec, or not.

In off-list mail to Pavel I proposed an alternative implementation here:
https://gitlab.arm.com/linux-arm/linux-jm/-/tree/kexec+mmu/v0

By using a copy of the linear map, and passing the phys_to_virt offset into
arm64_relocate_new_kernel() its possible to use the same code when we fail to 
allocate the
page tables, and run with the MMU off as it does today.
I'm convinced someone will crawl out of the woodwork screaming 'regression' if 
we
substantially increase the amount of memory needed to kexec at all.

From that discussion: this didn't meet Pavel's timing needs.
If you depend on having all the src/dst pages lined up in a single line, it 
sounds like
you've over-tuned this to depend on the CPU's streaming mode. What causes the 
CPU to
start/stop that stuff is very implementation specific (and firmware 
configurable).
I don't think we should let this rule out systems that can kexec today, but 
don't have
enough extra memory for the page tables.
Having two copies of the relocation code is obviously a bad idea.


(as before: ) Instead of trying to make the relocations run quickly, can we 
reduce them?
This would benefit other architectures too.

Can the kexec core code allocate higher order pages, instead of doing 
everything page at
at time?

If you have a crash kernel reservation, can we use that to eliminate the 
relocations
completely?
(I think this suggestion has been lost in translation each time I make it.
I mean like this:
https://gitlab.arm.com/linux-arm/linux-jm/-/tree/kexec/kexec_in_crashk/v0
Runes to test it:
| sudo ./kexec -p -u
| sudo cat /proc/iomem | grep Crash
|  b020-f01f : Crash kernel
| sudo ./kexec --mem-min=0xb020 --mem-max=0xf01ff -l ~/Image --reuse-cmdline

I bet its even faster!)


I think 'as fast as possible' and 'memory constrained' are mutually exclusive
requirements. We need to make the page tables optional with a single 
implementation.


Thanks,

James


Re: [PATCH 2/3] x86/resctrl: Update PQR_ASSOC MSR synchronously when moving task to resource group

2020-12-09 Thread James Morse
Hi Reinette, Fenghua,

Subject nit: I think 'use IPI instead of task_work() to update PQR_ASSOC_MSR' 
conveys the
guts of this change more quickly!

On 03/12/2020 23:25, Reinette Chatre wrote:
> From: Fenghua Yu 
> 
> Currently when moving a task to a resource group the PQR_ASSOC MSR
> is updated with the new closid and rmid in an added task callback.
> If the task is running the work is run as soon as possible. If the
> task is not running the work is executed later

> in the kernel exit path when the kernel returns to the task again.

kernel exit makes me think of user-space... is it enough to just say:
"by __switch_to() when task is next run"?


> Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
> is running is the right thing to do. Queueing work for a task that is
> not running is unnecessary (the PQR_ASSOC MSR is already updated when the
> task is scheduled in) and causing system resource waste with the way in
> which it is implemented: Work to update the PQR_ASSOC register is queued
> every time the user writes a task id to the "tasks" file, even if the task
> already belongs to the resource group. This could result in multiple pending
> work items associated with a single task even if they are all identical and
> even though only a single update with most recent values is needed.
> Specifically, even if a task is moved between different resource groups
> while it is sleeping then it is only the last move that is relevant but
> yet a work item is queued during each move.
> This unnecessary queueing of work items could result in significant system
> resource waste, especially on tasks sleeping for a long time. For example,
> as demonstrated by Shakeel Butt in [1] writing the same task id to the
> "tasks" file can quickly consume significant memory. The same problem
> (wasted system resources) occurs when moving a task between different
> resource groups.
> 
> As pointed out by Valentin Schneider in [2] there is an additional issue with
> the way in which the queueing of work is done in that the task_struct update
> is currently done after the work is queued, resulting in a race with the
> register update possibly done before the data needed by the update is 
> available.
> 
> To solve these issues, the PQR_ASSOC MSR is updated in a synchronous way
> right after the new closid and rmid are ready during the task movement,
> only if the task is running. If a moved task is not running nothing is
> done since the PQR_ASSOC MSR will be updated next time the task is scheduled.
> This is the same way used to update the register when tasks are moved as
> part of resource group removal.

(as t->on_cpu is already used...)

Reviewed-by: James Morse 


> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 68db7d2dec8f..9d62f1fadcc3 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -525,6 +525,16 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)


> +static void update_task_closid_rmid(struct task_struct *t)
>  {
> + int cpu;
>  
> + if (task_on_cpu(t, &cpu))
> + smp_call_function_single(cpu, _update_task_closid_rmid, t, 1);


I think:
|   if (task_curr(t))
|   smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);

here would make for an easier backport as it doesn't depend on the previous 
patch.


> +}

[...]

>  static int __rdtgroup_move_task(struct task_struct *tsk,
>   struct rdtgroup *rdtgrp)
>  {

> + if (rdtgrp->type == RDTCTRL_GROUP) {
> + tsk->closid = rdtgrp->closid;
> + tsk->rmid = rdtgrp->mon.rmid;
> + } else if (rdtgrp->type == RDTMON_GROUP) {

[...]

> + } else {

> + rdt_last_cmd_puts("Invalid resource group type\n");
> + return -EINVAL;

Wouldn't this be a kernel bug?
I'd have thought there would be a WARN_ON_ONCE() here to make it clear this 
isn't
user-space's fault!


>   }
> - return ret;
> +
> + /*
> +  * By now, the task's closid and rmid are set. If the task is current
> +  * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
> +  * group go into effect. If the task is not current, the MSR will be
> +  * updated when the task is scheduled in.
> +  */
> + update_task_closid_rmid(tsk);
> +
> + return 0;
>  }


Thanks,

James


Re: [PATCH 1/3] x86/resctrl: Move setting task's active CPU in a mask into helpers

2020-12-09 Thread James Morse
Hi Reinette, Fenghua,

On 03/12/2020 23:25, Reinette Chatre wrote:
> From: Fenghua Yu 
> 
> The code of setting the CPU on which a task is running in a CPU mask is
> moved into a couple of helpers. The new helper task_on_cpu() will be
> reused shortly.

> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 6f4ca4bea625..68db7d2dec8f 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -525,6 +525,38 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)

> +#ifdef CONFIG_SMP

(using IS_ENABLED(CONFIG_SMP) lets the compiler check all the code in one go, then
dead-code-remove the stuff that will never happen... it's also easier on the eye!)


> +/* Get the CPU if the task is on it. */
> +static bool task_on_cpu(struct task_struct *t, int *cpu)
> +{
> + /*
> +  * This is safe on x86 w/o barriers as the ordering of writing to
> +  * task_cpu() and t->on_cpu is reverse to the reading here. The
> +  * detection is inaccurate as tasks might move or schedule before
> +  * the smp function call takes place. In such a case the function
> +  * call is pointless, but there is no other side effect.
> +  */

> + if (t->on_cpu) {

kernel/sched/core.c calls out that there can be two tasks on one CPU with this 
set.
(grep astute)
I think that means this series will falsely match the old task for a CPU while 
the
scheduler is running, and IPI it unnecessarily.

task_curr() is the helper that knows not to do this.


> + *cpu = task_cpu(t);
> +
> + return true;
> + }
> +
> + return false;
> +}


Thanks,

James


Re: [PATCH v4 2/2] arm64: kvm: Introduce MTE VCPU feature

2020-11-25 Thread James Morse
Hi Steven, Catalin,

On 18/11/2020 16:01, Steven Price wrote:
> On 17/11/2020 16:07, Catalin Marinas wrote:
>> On Mon, Oct 26, 2020 at 03:57:27PM +, Steven Price wrote:
>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>> index 19aacc7d64de..38fe25310ca1 100644
>>> --- a/arch/arm64/kvm/mmu.c
>>> +++ b/arch/arm64/kvm/mmu.c
>>> @@ -862,6 +862,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>   if (vma_pagesize == PAGE_SIZE && !force_pte)
>>>   vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>>>  &pfn, &fault_ipa);
>>> +
>>> +    /*
>>> + * The otherwise redundant test for system_supports_mte() allows the
>>> + * code to be compiled out when CONFIG_ARM64_MTE is not present.
>>> + */
>>> +    if (system_supports_mte() && kvm->arch.mte_enabled && pfn_valid(pfn)) {
>>> +    /*
>>> + * VM will be able to see the page's tags, so we must ensure
>>> + * they have been initialised.
>>> + */
>>> +    struct page *page = pfn_to_page(pfn);
>>> +    long i, nr_pages = compound_nr(page);
>>> +
>>> +    /* if PG_mte_tagged is set, tags have already been initialised */
>>> +    for (i = 0; i < nr_pages; i++, page++) {
>>> +    if (!test_and_set_bit(PG_mte_tagged, &page->flags))
>>> +    mte_clear_page_tags(page_address(page));
>>> +    }
>>> +    }
>>
>> If this page was swapped out and mapped back in, where does the
>> restoring from swap happen?
> 
> Restoring from swap happens above this in the call to gfn_to_pfn_prot()
> 
>> I may have asked in the past, is user_mem_abort() the only path for
>> mapping Normal pages into stage 2?
>>
> 
> That is my understanding (and yes you asked before) and no one has corrected 
> me! ;)

A recent discovery: copy-on-write will cause kvm_set_spte_handler() to fix up
the mapping (instead of just invalidating it) on the assumption the guest is
going to read whatever was written.

It's possible user_mem_abort() will go and stomp on that mapping a second time,
but if the VMM triggers this at stage1, you won't have a vcpu for the update.


Thanks,

James


Re: [PATCH 1/2] x86/intel_rdt: Check monitor group vs control group membership earlier

2020-11-20 Thread James Morse
Hi Valentin,

On 18/11/2020 18:00, Valentin Schneider wrote:
> A task can only be moved between monitor groups if both groups belong to
> the same control group. This is checked fairly late however: by that time
> we already have appended a task_work() callback.

(is that a problem? It's needed to do the kfree())


> Check the validity of the move before getting anywhere near task_work
> callbacks.

This saves the kzalloc()/task_work_add() if it wasn't going to be necessary.

Reviewed-by: James Morse 


Thanks,

James


Re: [PATCH 2/2] x86/intel_rdt: Plug task_work vs task_struct {rmid,closid} update race

2020-11-20 Thread James Morse
Hi Valentin,

On 18/11/2020 18:00, Valentin Schneider wrote:
> Upon moving a task to a new control / monitor group, said task's {closid,
> rmid} fields are updated *after* triggering the move_myself() task_work
> callback. This can cause said callback to miss the update, e.g. if the
> triggering thread got preempted before fiddling with task_struct, or if the
> targeted task was already on its way to return to userspace.

So, if move_myself() runs after task_work_add() but before tsk is written to.
Sounds fun!


> Update the task_struct's {closid, rmid} tuple *before* invoking
> task_work_add(). As they can happen concurrently, wrap {closid, rmid}
> accesses with READ_ONCE() and WRITE_ONCE(). Highlight the required ordering
> with a pair of comments.

... and this one is if move_myself() or __resctrl_sched_in() runs while tsk is 
being
written to on another CPU. It might get torn values, or multiple reads might see
different values.

The READ_ONCE/WRITE_ONCEry would have been easier to read as a separate patch 
as you touch
all sites, and move/change some of them.

Regardless:
Reviewed-by: James Morse 


I don't 'get' memory-ordering, so one curiosity below:

> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index b6b5b95df833..135a51529f70 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -524,11 +524,13 @@ static void move_myself(struct callback_head *head)
>* If resource group was deleted before this task work callback
>* was invoked, then assign the task to root group and free the
>* resource group.
> +  *
> +  * See pairing atomic_inc() in __rdtgroup_move_task()
>*/
>   if (atomic_dec_and_test(&rdtgrp->waitcount) &&
>   (rdtgrp->flags & RDT_DELETED)) {
> - current->closid = 0;
> - current->rmid = 0;
> + WRITE_ONCE(current->closid, 0);
> + WRITE_ONCE(current->rmid, 0);
>   kfree(rdtgrp);
>   }
>  
> @@ -553,14 +555,32 @@ static int __rdtgroup_move_task(struct task_struct *tsk,

>   /*
>* Take a refcount, so rdtgrp cannot be freed before the
>* callback has been invoked.
> +  *
> +  * Also ensures above {closid, rmid} writes are observed by
> +  * move_myself(), as it can run immediately after task_work_add().
> +  * Otherwise old values may be loaded, and the move will only actually
> +  * happen at the next context switch.

But __resctrl_sched_in() can still occur at any time and READ_ONCE() a pair of
values that don't go together?
I don't think this is a problem for RDT, as with the old rmid the task was a
member of that monitor-group previously, and 'freed' rmid are kept in limbo for
a while after.
(the old closid is the same as the task having not schedule()d since the change,
which is fine).

For MPAM, this is more annoying, as changing just the closid may put the task in
a monitoring group that never existed, meaning it shows up as surprise-dirty
later.

If this all makes sense, I guess the fix (for much later) is to union 
closid/rmid, and
WRITE_ONCE() them together where necessary.
(I've made a note for when I next pass that part of the MPAM tree)
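
Something like (a sketch of the idea, untested):

| union resctrl_ids {
|         struct {
|                 u32     closid;
|                 u32     rmid;
|         };
|         u64     both;
| };

so a writer can WRITE_ONCE() ids.both and __resctrl_sched_in() can READ_ONCE()
the pair without seeing a torn combination.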


> +  *
> +  * Pairs with atomic_dec() in move_myself().
>*/
>   atomic_inc(&rdtgrp->waitcount);
> +
>   ret = task_work_add(tsk, &rdtgrp->work, TWA_RESUME);
>   if (ret) {
>   /*


Thanks!

James


Re: [PATCH] arm64: kexec: Use smp_send_stop in machine_shutdown

2020-11-19 Thread James Morse
Hi Henry,

On 16/11/2020 21:11, Henry Willard wrote:
> On 11/11/20 10:11 AM, James Morse wrote:
>> On 06/11/2020 23:25, Henry Willard wrote:
>>> machine_shutdown() is called by kernel_kexec() to shutdown
>>> the non-boot CPUs prior to starting the new kernel. The
>>> implementation of machine_shutdown() varies by architecture.
>>> Many make an interprocessor call, such as smp_send_stop(),
>>> to stop the non-boot CPUs. On some architectures the CPUs make
>>> some sort of firmware call to stop the CPU. On some architectures
>>> without the necessary firmware support to stop the CPU, the CPUs
>>> go into a disabled loop, which is not suitable for supporting
>>> kexec. On Arm64 systems that support PSCI, CPUs can be stopped
>>> with a PSCI CPU_OFF call.
>> All this variation is because we want to get the CPU back in a sane 
>> state, as if we'd
>> just come from cold boot. Without the platform firmware doing its 
>> initialisation, the only
>> way we have of doing this is to run the cpuhp callbacks to take the CPU 
>> offline cleanly.

> If it is unsafe to call cpu_ops.cpu_die (or cpu_die) on Arm except from cpuhp 
> shouldn't
> something detect that?

It wouldn't be the first undocumented assumption in Linux!


>>> Arm64 machine_shutdown() uses the CPU hotplug infrastructure via
>>> smp_shutdown_nonboot_cpus() to stop each CPU. This is relatively
>>> slow and takes a best case of .02 to .03 seconds per CPU which are
>>> stopped sequentially.

>> Hmmm, looks like cpuhp doesn't have a way to run the callbacks in parallel...
>>
>>
>>> This can take the better part of a second for
>>> all the CPUs to be stopped depending on how many CPUs are present.
>>> If for some reason the CPUs are busy at the time of the kexec reboot,
>>> it can take several seconds to shut them all down.

>> Busy doing what?

> Executing user code

Nice. For EL0 you can always interrupt it, so that shouldn't matter.
I guess the issue is CPUs waiting for an irqsave spinlock that can't be 
interrupted until
they've finished the work they are doing.


>> I assume the problem is CPUs starting work on behalf of user-space, which is 
>> now
>> pointless, which prevents them from scheduling into the cpuhp work quickly.
>>
>> Does hoisting kexec's conditional call to freeze_processes() above the 
>> #ifdef - so that
>> user-space threads are no longer schedule-able improve things here?

> It might help the worst cases, but even on an idle system it takes a while.

>>> Each CPU shuts itself down by calling PSCI CPU_OFF.
>>> In some applications such as embedded systems, which need a very
>>> fast reboot (less than a second), this may be too slow.

>> Where does this requirement come from? Surely kexec is part of a software 
>> update, not
>> regular operation.

> The requirement comes from the owner of the larger environment of which this 
> embedded
> system is a part.

> So, yes, this is part of software maintenance of a component during
> regular operation.

Who does kexec as part of regular operation? kexec re-writes all of memory!
Surely it's only for software updates, which can't happen by surprise!

(This one-second number has come up before. Why not 2, or 10?)


Pavel had a similar requirement. He was looking at doing the kexec-image 
re-assembly with
the MMU enabled. This benefits a very large kexec-image on platforms with lots 
of memory,
but where the CPUs suffer when making non-cacheable accesses. The combined 
series is here:
https://gitlab.arm.com/linux-arm/linux-jm/-/commits/kexec+mmu/v0/

But this didn't show enough of an improvement on Pavel's platform.


>>> This patch reverts to using smp_send_stop() to signal all
>>> CPUs to stop immediately. Currently smp_send_stop() causes each cpu
>>> to call local_cpu_stop(), which goes into a disabled loop. This patch
>>> modifies local_cpu_stop() to call cpu_die() when kexec_in_progress
>>> is true, so that the CPU calls PSCI CPU_OFF just as in the case of
>>> smp_shutdown_nonboot_cpus().
>> This is appropriate for panic(), as we accept it may fail.
>>
>> For Kexec(), the CPU must go offline, otherwise we can't overwrite the code 
>> it was
>> running. The arch code can't just call CPU_OFF in any context. See 5.5 
>> CPU_OFF' of
>> https://developer.arm.com/documentation/den0022/d
>>
>> 5.5.2 describes what the OS must do first, in particular interrupts must be 
>> migrated away
>> from the CPU calling CPU_OFF. Currently the cpuhp notifiers do this, which 
>> after this
>> patch, no longer run.

> I

Re: [PATCH v6 1/7] arm64: mm: Move reserve_crashkernel() into mem_init()

2020-11-19 Thread James Morse
Hi,

(sorry for the late response)

On 06/11/2020 18:46, Nicolas Saenz Julienne wrote:
> On Thu, 2020-11-05 at 16:11 +0000, James Morse wrote:>> We also depend on 
> this when skipping the checksum code in purgatory, which can be
>> exceedingly slow.
> 
> This one I don't fully understand, so I'll lazily assume the prerequisite is
> the same WRT how memory is mapped. :)

The aim is it's never normally mapped by the kernel. This is so that if we can't 
get rid of
the secondary CPUs (e.g. they have IRQs masked), but they are busy scribbling 
all over
memory, we have a rough guarantee that they aren't scribbling over the kdump 
kernel.

We can skip the checksum in purgatory, as there is very little risk of the 
memory having
been corrupted.


> Ultimately there's also /sys/kernel/kexec_crash_size's handling. Same
> prerequisite.

Yeah, this lets you release PAGE_SIZEs back to the allocator, which means the
marked-invalid page tables we have hidden there need to be PAGE_SIZE mappings.


Thanks,

James


> Keeping in mind acpi_table_upgrade() and unflatten_device_tree() depend on
> having the linear mappings available. I don't see any simple way of solving
> this. Both moving the firmware description routines to use fixmap or 
> correcting
> the linear mapping further down the line so as to include kdump's regions, 
> seem
> excessive/impossible (feel free to correct me here). I'd be happy to hear
> suggestions. Otherwise we're back to hard-coding the information as we
> initially did.
> 
> Let me stress that knowing the DMA constraints in the system before reserving
> crashkernel's regions is necessary if we ever want it to work seamlessly on 
> all
> platforms. Be it small stuff like the Raspberry Pi or huge servers with TB of
> memory.


Re: [PATCH 00/24] x86/resctrl: Merge the CDP resources

2020-11-17 Thread James Morse
Hi Reinette,

On 16/11/2020 17:54, Reinette Chatre wrote:
> On 10/30/2020 9:10 AM, James Morse wrote:
>> MPAM has an equivalent feature to CDP, but it's a property of the CPU,
>> not the cache. Resctrl needs to have x86's odd/even behaviour, as that
>> is the ABI, but this isn't how the MPAM hardware works. It is entirely
>> possible that an in-kernel user of MPAM would not be using CDP, whereas
>> resctrl is.

> The above seems to distinguish between "in-kernel user of MPAM" and resctrl
> (now obtaining support for MPAM). Could you please provide more details on
> the "in-kernel user of MPAM" and elaborate on how these two usages are
> expected to interact with MPAM concurrently?

This is a badly phrased reference to all the bits of MPAM that are left on
the floor after the resctrl support is plumbed up.

Currently none of the software exists, but MPAM also has support for:
virtualisation, the interrupt-controller (GIC) and the IO-MMU. None of these
things are exposed via resctrl, so they either need new schema (which must
also work for x86), or handling 'invisibly' in the kernel.

Virtualisation is probably the easiest example: with MPAM, the guest may be
'using CDP' whereas the host is not, or vice-versa.
The guest will never be allowed to access the MMIO configuration directly,
it will be managed via the host's driver. Now the host's driver has to
handle CDP-on and CDP-off configurations.
Keeping the odd/even CDP stuff in resctrl means the arch-code/driver doesn't
need to know or care about this stuff if the hardware doesn't.

If the interrupt-controller or IO-MMU consume closid/rmid, then I'd describe
them as in-kernel users (as the kernel owns their configuration). These
would never use CDP as they don't fetch instructions.


How do I envision these things working concurrently?
(a) closid/rmid can be reserved before resctrl is mounted, or
(b) allocated by user-space and handed back to the kernel (e.g. virtualisation).

The ctrlval values move to belong to the arch-code/driver, so if 'something'
changes the configuration behind resctrl's back, the new schema values are
immediately visible via the corresponding schema file in case (b). In case
(a), resctrl would never look at those closid, but it wouldn't matter if it
did.

(the counter-example is mba_sc, which may need to convert the current
ctrlval back to a mbps_val if it's being managed by something other than
resctrl)



Thanks,

James


Re: [PATCH] arm64: kexec: Use smp_send_stop in machine_shutdown

2020-11-11 Thread James Morse
Hi Henry,

On 06/11/2020 23:25, Henry Willard wrote:
> machine_shutdown() is called by kernel_kexec() to shutdown
> the non-boot CPUs prior to starting the new kernel. The
> implementation of machine_shutdown() varies by architecture.
> Many make an interprocessor call, such as smp_send_stop(),
> to stop the non-boot CPUs. On some architectures the CPUs make
> some sort of firmware call to stop the CPU. On some architectures
> without the necessary firmware support to stop the CPU, the CPUs
> go into a disabled loop, which is not suitable for supporting
> kexec. On Arm64 systems that support PSCI, CPUs can be stopped
> with a PSCI CPU_OFF call.

All this variation is because we want to get the CPU back in a sane state,
as if we'd just come from cold boot. Without the platform firmware doing its
initialisation, the only way we have of doing this is to run the cpuhp
callbacks to take the CPU offline cleanly.


> Arm64 machine_shutdown() uses the CPU hotplug infrastructure via
> smp_shutdown_nonboot_cpus() to stop each CPU. This is relatively
> slow and takes a best case of .02 to .03 seconds per CPU which are
> stopped sequentially.

Hmmm, looks like cpuhp doesn't have a way to run the callbacks in parallel...


> This can take the better part of a second for
> all the CPUs to be stopped depending on how many CPUs are present.
> If for some reason the CPUs are busy at the time of the kexec reboot,
> it can take several seconds to shut them all down.

Busy doing what?

I assume the problem is CPUs starting work on behalf of user-space, which is
now pointless, and which prevents them from scheduling into the cpuhp work
quickly.

Does hoisting kexec's conditional call to freeze_processes() above the
#ifdef, so that user-space threads are no longer schedulable, improve things
here?
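
(Roughly what I mean, against kernel/kexec_core.c's kernel_kexec(); an
untested sketch with the error paths elided:

| int kernel_kexec(void)
| {
| 	int error;
| 	...
| 	/*
| 	 * Hoisted out of the CONFIG_KEXEC_JUMP-only path: freeze
| 	 * user-space for a normal kexec reboot too, so secondary CPUs
| 	 * aren't busy with soon-to-be-discarded work when we try to
| 	 * offline them.
| 	 */
| 	error = freeze_processes();
| 	if (error)
| 		goto Unlock;
| 	...
| }

...with the existing freeze_processes() call removed from the
preserve_context branch.)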


> Each CPU shuts itself down by calling PSCI CPU_OFF.

> In some applications such as embedded systems, which need a very
> fast reboot (less than a second), this may be too slow.

Where does this requirement come from? Surely kexec is part of a software
update, not regular operation.


> This patch reverts to using smp_send_stop() to signal all
> CPUs to stop immediately. Currently smp_send_stop() causes each cpu
> to call local_cpu_stop(), which goes into a disabled loop. This patch
> modifies local_cpu_stop() to call cpu_die() when kexec_in_progress
> is true, so that the CPU calls PSCI CPU_OFF just as in the case of
> smp_shutdown_nonboot_cpus().

This is appropriate for panic(), as we accept it may fail.

For Kexec(), the CPU must go offline, otherwise we can't overwrite the code
it was running. The arch code can't just call CPU_OFF in any context. See
section 5.5 'CPU_OFF' of
https://developer.arm.com/documentation/den0022/d

5.5.2 describes what the OS must do first, in particular interrupts must be
migrated away from the CPU calling CPU_OFF. Currently the cpuhp notifiers do
this, which after this patch, no longer run.
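
(For reference, a simplified sketch of the work cpuhp drives on arm64 via
__cpu_disable() before a CPU may call CPU_OFF; this is not the complete
teardown:

| static int takedown_this_cpu(void)	/* sketch of __cpu_disable() */
| {
| 	unsigned int cpu = smp_processor_id();
|
| 	set_cpu_online(cpu, false);	/* stop being an IRQ target */
| 	irq_migrate_all_off_this_cpu();	/* PSCI 5.5.2: move IRQs away */
|
| 	return 0;
| }

A bare smp_send_stop() + CPU_OFF path skips all of this.)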

You're going to need some duct-tape here, but I recall the proposed
'ARCH_OFFLINE_CPUS_ON_REBOOT', which would help, but isn't a complete thing.
From the discussion:
https://lore.kernel.org/lkml/87h80vwta7@nanos.tec.linutronix.de/
https://lore.kernel.org/lkml/alpine.deb.2.21.1908201321200.2...@nanos.tec.linutronix.de/

Using cpuhp to offline these CPUs is the right thing to do.
If the problem is it's too slow, can we tackle that instead?


> Using smp_send_stop() instead of
> smp_shutdown_nonboot_cpus() reduces the shutdown time for 23 CPUs
> from about .65 seconds on an idle system to less than 5 msecs. On a
> busy system smp_shutdown_nonboot_cpus() may take several seconds,
> while smp_send_stop() needs only the 5 msecs.

> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 4784011cecac..2568452a2417 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -142,12 +143,22 @@ void arch_cpu_idle_dead(void)
>   * This must completely disable all secondary CPUs; simply causing those CPUs
>   * to execute e.g. a RAM-based pin loop is not sufficient. This allows the
>   * kexec'd kernel to use any and all RAM as it sees fit, without having to
> - * avoid any code or data used by any SW CPU pin loop. The CPU hotplug
> - * functionality embodied in smpt_shutdown_nonboot_cpus() to achieve this.
> + * avoid any code or data used by any SW CPU pin loop. The target stop 
> function
> + * will call cpu_die() if kexec_in_progress is set.
>   */
>  void machine_shutdown(void)
>  {
> - smp_shutdown_nonboot_cpus(reboot_cpu);
> + unsigned long timeout;
> +
> + /*
> +  * Don't wait forever, but no longer than a second
> +  */

For kexec we must wait for the CPU to exit the current kernel. If it doesn't
we can't overwrite the current memory image with the kexec payload.

Features like CNP allow CPUs to share TLB entries. If a CPU is left behind
in the older kernel, the code it is executing will be overwritten and its
behaviour stops being

Re: [PATCH 10/24] x86/resctrl: Move the schema names into struct resctrl_schema

2020-11-11 Thread James Morse
Hi Jamie,

Thanks for taking a look,

On 10/11/2020 11:39, Jamie Iles wrote:
> On Fri, Oct 30, 2020 at 04:11:06PM +0000, James Morse wrote:
>> Move the names used for the schemata file out of the resource and
>> into struct resctrl_schema. This allows one resource to have two
>> different names, based on the other schema properties.
>>
>> This patch copies the names, eventually resctrl will generate them.
>>
>> Remove the arch code's max_name_width, this is now resctrl's
>> problem.

>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
>> b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 311a3890bc53..48f4d6783647 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -2150,6 +2151,12 @@ static int create_schemata_list(void)
>>  s->num_closid = resctrl_arch_get_num_closid(r);
>>  s->conf_type = resctrl_to_arch_res(r)->conf_type;
>>  
>> +ret = snprintf(s->name, sizeof(s->name), r->name);
>> +if (ret >= sizeof(s->name)) {
>> +kfree(s);
>> +return -EINVAL;
>> +}
>> +
> 
> How about:
> 
> + ret = strscpy(s->name, r->name, sizeof(s->name));
> + if (ret < 0) {
> + kfree(s);
> + return -EINVAL;
> + }

Never heard of it ... yup, that looks better. Thanks!
(I thought I knew not to write that bug!)


> So that there isn't a non-constant format specifier that'll trip
> Coverity+friends up later?

Heh, it's gone by the last patch.
Fixed locally.


Thanks,

James


Re: [RFC PATCH 1/4] ACPI: PPTT: Fix for a high level cache node detected in the low level

2020-11-06 Thread James Morse
Hi Shiju, Jonathan,

On 05/11/2020 17:42, Shiju Jose wrote:
> From: Jonathan Cameron 
> 
> According to the following sections of the PPTT definition in the
> ACPI specification (v6.3), a high level cache node (for example, an L2
> cache) could be represented simultaneously both in the private resource
> of a CPU node and via the next_level_of_cache pointer of a low level
> cache node.
> 1. Section 5.2.29.1 Processor hierarchy node structure (Type 0)
> "Each processor node includes a list of resources that are private
> to that node. Resources are described in other PPTT structures such as
> Type 1 cache structures. The processor node’s private resource list
> includes a reference to each of the structures that represent private
> resources to a given processor node. For example, an SoC level processor
> node might contain two references, one pointing to a Level 3 cache
> resource and another pointing to an ID structure."
> 
> 2. Section 5.2.29.2 Cache Type Structure - Type 1
>Figure 5-26 Cache Type Structure - Type 1 Example

'fix' in the subject makes me twitch ... is there a user-space visible bug
because of this?


> For the use case of creating EDAC device blocks for the CPU caches,
> we need to search for cache node types in all levels using
> acpi_find_cache_node(), as a platform independent solution to

I'm nervous to base the edac user-space view of caches on something other
than what is described in /sys/devices/system/cpu/cpu0/cache. These things
have to match, otherwise user-space can't work out which CPUs' L2s it should
add to get the value for the physical cache.

Getting the data from somewhere else risks making this more complicated.

Using the PPTT means this won't work on "HPE Server"s that use ghes_edac
too. I don't think we should have any arm64 specific behaviour here.


> retrieve the cache info from the ACPI PPTT. The reason is that
> cacheinfo in the drivers/base/cacheinfo.c would not be populated
> in this stage.

Because both ghes_init() and cacheinfo_sysfs_init() are device_initcall()?

Couldn't we fix this by making ghes_init() a device_initcall_sync() (with a
comment saying what it depends on)?
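
i.e. something like this in drivers/acpi/apei/ghes.c, as a sketch, assuming
nothing else needs ghes_init() to run earlier:

| /* Run after plain device_initcall()s such as cacheinfo_sysfs_init() */
| device_initcall_sync(ghes_init);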


I agree this means dealing with cpuhp, as the cacheinfo data is only
available for online CPUs.


> In this case, we found acpi_find_cache_node() mistakenly detecting a high
> level cache as a low level cache, when the cache node is in the processor
> node's private resource list.
> 
> To fix this issue, add a duplication check in acpi_find_cache_level() for a
> cache node found in the private resource of a CPU node, with all the next
> level of caches present in the other cache nodes.

I'm not overly familiar with the PPTT, is it possible this issue is visible in
/sys/devices/system/cpu/cpu0/cache?


Thanks,

James


> diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
> index 4ae93350b70d..de1dd605d3ad 100644
> --- a/drivers/acpi/pptt.c
> +++ b/drivers/acpi/pptt.c
> @@ -132,21 +132,80 @@ static unsigned int acpi_pptt_walk_cache(struct 
> acpi_table_header *table_hdr,
>   return local_level;
>  }
>  
> +/**
> + * acpi_pptt_walk_check_duplicate() - Check whether a cache resource is
> + * duplicated via the next_level_of_cache pointer of another cache.
> + * @table_hdr: Pointer to the head of the PPTT table
> + * @res: cache resource in the PPTT we want to walk
> + * @res_check: cache resource in the PPTT we want to check for duplication.
> + *
> + * Given both PPTT resources, verify that they are cache nodes, then walk
> + * down each level of cache @res, and check for the duplication.
> + *
> + * Return: true if duplication found, false otherwise.
> + */
> +static bool acpi_pptt_walk_check_duplicate(struct acpi_table_header 
> *table_hdr,
> +struct acpi_subtable_header *res,
> +struct acpi_subtable_header 
> *res_check)
> +{
> + struct acpi_pptt_cache *cache;
> + struct acpi_pptt_cache *check;
> +
> + if (res->type != ACPI_PPTT_TYPE_CACHE ||
> + res_check->type != ACPI_PPTT_TYPE_CACHE)
> + return false;
> +
> + cache = (struct acpi_pptt_cache *)res;
> + check = (struct acpi_pptt_cache *)res_check;
> + while (cache) {
> + if (cache == check)
> + return true;
> + cache = fetch_pptt_cache(table_hdr, cache->next_level_of_cache);
> + }
> +
> + return false;
> +}
> +
>  static struct acpi_pptt_cache *
>  acpi_find_cache_level(struct acpi_table_header *table_hdr,
> struct acpi_pptt_processor *cpu_node,
> unsigned int *starting_level, unsigned int level,
> int type)
>  {
> - struct acpi_subtable_header *res;
> + struct acpi_subtable_header *res, *res2;
>   unsigned int number_of_levels = *starting_level;
>   int resource = 0;
> + int resource2 = 0;
> + bool duplicate = false;
>   struct acpi_pptt_cache *ret = NULL;
>  

Re: [RFC PATCH 0/4] EDAC/ghes: Add EDAC device for recording the CPU error count

2020-11-06 Thread James Morse
Hi Shiju,

On 05/11/2020 17:42, Shiju Jose wrote:
> For the firmware-first error handling on ARM64 hardware platforms,
> the CPU cache corrected error count is not recorded.
> Create a CPU EDAC device and device blocks for the CPU caches
> for this purpose. The EDAC device blocks are created based on the
> CPU caches information represented in the ACPI PPTT.

Using the PPTT won't work on x86 systems. Can we use the core-code's common
data to learn about caches: struct cpu_cacheinfo and struct cacheinfo?
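
Something like this (a sketch only, ignoring the cpuhp ordering) would
enumerate the same caches on every architecture:

| #include <linux/cacheinfo.h>
|
| /* Sketch: walk the core code's view of a cpu's caches, not the PPTT */
| static void count_cpu_caches(unsigned int cpu)
| {
| 	struct cpu_cacheinfo *cci = get_cpu_cacheinfo(cpu);
| 	unsigned int i;
|
| 	for (i = 0; i < cci->num_leaves; i++) {
| 		struct cacheinfo *ci = &cci->info_list[i];
|
| 		pr_info("cpu%u: L%u %s cache\n", cpu, ci->level,
| 			ci->type == CACHE_TYPE_DATA ? "data" :
| 			ci->type == CACHE_TYPE_INST ? "inst" : "unified");
| 	}
| }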


Thanks,

James


Re: [PATCH v6 1/7] arm64: mm: Move reserve_crashkernel() into mem_init()

2020-11-05 Thread James Morse
Hi!

On 03/11/2020 17:31, Nicolas Saenz Julienne wrote:
> crashkernel might reserve memory located in ZONE_DMA. We plan to delay
> ZONE_DMA's initialization after unflattening the devicetree and ACPI's
> boot table initialization, so move it later in the boot process.
> Specifically into mem_init(), this is the last place crashkernel will be
> able to reserve the memory before the page allocator kicks in.

> There
> isn't any apparent reason for doing this earlier.

It's so that map_mem() can carve it out of the linear/direct map.
This is so that stray writes from a crashing kernel can't accidentally
corrupt the kdump kernel. We depend on this if we continue with kdump, but
failed to offline all the other CPUs. We also depend on this when skipping
the checksum code in purgatory, which can be exceedingly slow.

Grepping around, the current order is:

start_kernel()
-> setup_arch()
   -> arm64_memblock_init()	/* reserve */
   -> paging_init()
      -> map_mem()		/* carve out reservation */
[...]
-> mm_init()
   -> mem_init()


I agree we should add comments to make this apparent!


Thanks,

James


> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 095540667f0f..fc4ab0d6d5d2 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -386,8 +386,6 @@ void __init arm64_memblock_init(void)
>   else
>   arm64_dma32_phys_limit = PHYS_MASK + 1;
>  
> - reserve_crashkernel();
> -
>   reserve_elfcorehdr();
>  
>   high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
> @@ -508,6 +506,8 @@ void __init mem_init(void)
>   else
>   swiotlb_force = SWIOTLB_NO_FORCE;
>  
> + reserve_crashkernel();
> +
>   set_max_mapnr(max_pfn - PHYS_PFN_OFFSET);
>  
>  #ifndef CONFIG_SPARSEMEM_VMEMMAP
> 



[PATCH 23/24] x86/resctrl: Remove rdt_cdp_peer_get()

2020-10-30 Thread James Morse
Now that the configuration can be read from either resource, as they share
the ctrlval array, rdt_cdp_peer_get() is not needed to map the resource
and search for the corresponding domain.

Replace it with a helper to return the 'other' CODE/DATA type, and use
the existing get-config helper.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 99 --
 1 file changed, 14 insertions(+), 85 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 162e415d5d09..0d561679f7e8 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1094,82 +1094,17 @@ static int rdtgroup_mode_show(struct kernfs_open_file 
*of,
return 0;
 }
 
-/**
- * rdt_cdp_peer_get - Retrieve CDP peer if it exists
- * @r: RDT resource to which RDT domain @d belongs
- * @d: Cache instance for which a CDP peer is requested
- * @r_cdp: RDT resource that shares hardware with @r (RDT resource peer)
- * Used to return the result.
- * @d_cdp: RDT domain that shares hardware with @d (RDT domain peer)
- * Used to return the result.
- * @peer_type: The CDP configuration type of the peer resource.
- *
- * RDT resources are managed independently and by extension the RDT domains
- * (RDT resource instances) are managed independently also. The Code and
- * Data Prioritization (CDP) RDT resources, while managed independently,
- * could refer to the same underlying hardware. For example,
- * RDT_RESOURCE_L2CODE and RDT_RESOURCE_L2DATA both refer to the L2 cache.
- *
- * When provided with an RDT resource @r and an instance of that RDT
- * resource @d rdt_cdp_peer_get() will return if there is a peer RDT
- * resource and the exact instance that shares the same hardware.
- *
- * Return: 0 if a CDP peer was found, <0 on error or if no CDP peer exists.
- * If a CDP peer was found, @r_cdp will point to the peer RDT resource
- * and @d_cdp will point to the peer RDT domain.
- */
-static int rdt_cdp_peer_get(struct rdt_resource *r, struct rdt_domain *d,
-   struct rdt_resource **r_cdp,
-   struct rdt_domain **d_cdp,
-   enum resctrl_conf_type *peer_type)
+static enum resctrl_conf_type resctrl_peer_type(enum resctrl_conf_type my_type)
 {
-   struct rdt_resource *_r_cdp = NULL;
-   struct rdt_domain *_d_cdp = NULL;
-   int ret = 0;
-
-   switch (r->rid) {
-   case RDT_RESOURCE_L3DATA:
-   _r_cdp = &rdt_resources_all[RDT_RESOURCE_L3CODE].resctrl;
-   *peer_type = CDP_CODE;
-   break;
-   case RDT_RESOURCE_L3CODE:
-   _r_cdp = &rdt_resources_all[RDT_RESOURCE_L3DATA].resctrl;
-   *peer_type = CDP_DATA;
-   break;
-   case RDT_RESOURCE_L2DATA:
-   _r_cdp = &rdt_resources_all[RDT_RESOURCE_L2CODE].resctrl;
-   *peer_type = CDP_CODE;
-   break;
-   case RDT_RESOURCE_L2CODE:
-   _r_cdp = &rdt_resources_all[RDT_RESOURCE_L2DATA].resctrl;
-   *peer_type = CDP_DATA;
-   break;
+   switch (my_type) {
+   case CDP_CODE:
+   return CDP_DATA;
+   case CDP_DATA:
+   return CDP_CODE;
default:
-   ret = -ENOENT;
-   goto out;
-   }
-
-   /*
-* When a new CPU comes online and CDP is enabled then the new
-* RDT domains (if any) associated with both CDP RDT resources
-* are added in the same CPU online routine while the
-* rdtgroup_mutex is held. It should thus not happen for one
-* RDT domain to exist and be associated with its RDT CDP
-* resource but there is no RDT domain associated with the
-* peer RDT CDP resource. Hence the WARN.
-*/
-   _d_cdp = rdt_find_domain(_r_cdp, d->id, NULL);
-   if (WARN_ON(IS_ERR_OR_NULL(_d_cdp))) {
-   _r_cdp = NULL;
-   _d_cdp = NULL;
-   ret = -EINVAL;
+   case CDP_BOTH:
+   return CDP_BOTH;
}
-
-out:
-   *r_cdp = _r_cdp;
-   *d_cdp = _d_cdp;
-
-   return ret;
 }
 
 /**
@@ -1250,19 +1185,16 @@ static bool __rdtgroup_cbm_overlaps(struct rdt_resource 
*r, struct rdt_domain *d
 bool rdtgroup_cbm_overlaps(struct resctrl_schema *s, struct rdt_domain *d,
   unsigned long cbm, int closid, bool exclusive)
 {
-   enum resctrl_conf_type peer_type;
+   enum resctrl_conf_type peer_type = resctrl_peer_type(s->conf_type);
struct rdt_resource *r = s->res;
-   struct rdt_resource *r_cdp;
-   struct rdt_domain *d_cdp;
 
if (__rdtgroup_cbm_overlaps(r, d, cbm, closid, s->conf_type,
exclusive))
return true;
 
-   if (rdt_cdp_peer_get(r, d, &r_cdp, &d_cdp, &peer_type) < 0)
+   if (!resctrl_arch_g

[PATCH 16/24] x86/resctrl: Add a helper to read/set the CDP configuration

2020-10-30 Thread James Morse
Currently whether CDP is enabled is described in the alloc_enabled
and alloc_capable flags, which are set differently between the L3
and L3CODE+L3DATA resources.

To merge these resources, to give us one configuration, the CDP state
of the resource needs tracking explicitly. Add cdp_capable as something
visible to resctrl, and cdp_enabled as something the arch code manages.

resctrl_arch_set_cdp_enabled() lets resctrl enable or disable CDP
on a resource. resctrl_arch_get_cdp_enabled() lets it read the
current state.

With Arm's MPAM, the use of separate code and data closids is part of the
CPU configuration. Enabling CDP for one resource means all resources
see the different closid values.

Signed-off-by: James Morse 

---
It may be possible for MPAM to apply the same 'L3' configuration to
the two closids that are in use, giving the illusion that CDP is enabled
for some resources, but disabled for others ... but this will complicate
monitoring.
---
 arch/x86/kernel/cpu/resctrl/core.c|  4 ++
 arch/x86/kernel/cpu/resctrl/internal.h| 11 +++-
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |  4 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c| 67 +--
 4 files changed, 55 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index cda071009fed..7e98869ba006 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -369,11 +369,15 @@ static void rdt_get_cdp_config(int level, int type)
r->cache.shareable_bits = r_l->cache.shareable_bits;
r->data_width = (r->cache.cbm_len + 3) / 4;
r->alloc_capable = true;
+   hw_res_l->cdp_capable = true;
+   hw_res->cdp_capable = true;
/*
 * By default, CDP is disabled. CDP can be enabled by mount parameter
 * "cdp" during resctrl file system mount time.
 */
r->alloc_enabled = false;
+   hw_res_l->cdp_enabled = false;
+   hw_res->cdp_enabled = false;
 }
 
 static void rdt_get_cdp_l3_config(void)
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h 
b/arch/x86/kernel/cpu/resctrl/internal.h
index e86550d888cc..f039fd9f4f4f 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -365,6 +365,8 @@ struct rdt_parse_data {
  * @msr_base:  Base MSR address for CBMs
  * @msr_update:Function pointer to update QOS MSRs
  * @mon_scale: cqm counter * mon_scale = occupancy in bytes
+ * @cdp_capable:   Is the CDP feature available on this resource
+ * @cdp_enabled:   CDP state of this resource
  */
 struct rdt_hw_resource {
enum resctrl_conf_type  conf_type;
@@ -377,6 +379,8 @@ struct rdt_hw_resource {
 struct rdt_resource *r);
unsigned intmon_scale;
unsigned intmbm_width;
+   boolcdp_capable;
+   boolcdp_enabled;
 };
 
 static inline struct rdt_hw_resource *resctrl_to_arch_res(struct rdt_resource 
*r)
@@ -397,7 +401,7 @@ DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
 
 extern struct dentry *debugfs_resctrl;
 
-enum {
+enum resctrl_res_level {
RDT_RESOURCE_L3,
RDT_RESOURCE_L3DATA,
RDT_RESOURCE_L3CODE,
@@ -418,6 +422,11 @@ static inline struct rdt_resource *resctrl_inc(struct 
rdt_resource *res)
return &hw_res->resctrl;
 }
 
+static inline bool resctrl_arch_get_cdp_enabled(enum resctrl_res_level l)
+{
+   return rdt_resources_all[l].cdp_enabled;
+}
+
 #define for_each_rdt_resource(r) \
for (r = &rdt_resources_all[0].resctrl;   \
 r < &rdt_resources_all[RDT_NUM_RESOURCES].resctrl;   \
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c 
b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index d9d9861f244f..f126d442a65f 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -684,8 +684,8 @@ int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp)
 *   resource, the portion of cache used by it should be made
 *   unavailable to all future allocations from both resources.
 */
-   if (rdt_resources_all[RDT_RESOURCE_L3DATA].resctrl.alloc_enabled ||
-   rdt_resources_all[RDT_RESOURCE_L2DATA].resctrl.alloc_enabled) {
+   if (resctrl_arch_get_cdp_enabled(RDT_RESOURCE_L3) ||
+   resctrl_arch_get_cdp_enabled(RDT_RESOURCE_L2)) {
rdt_last_cmd_puts("CDP enabled\n");
return -EINVAL;
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f168f5a39242..6e150560c3c1 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1995,51 +1995,62 @@ static int cdp_enable(int level, int data_type, in

[PATCH 18/24] x86/resctrl: Pass configuration type to resctrl_arch_get_config()

2020-10-30 Thread James Morse
Once the configuration arrays are merged, the get_config() helper needs to
be told whether the CODE, DATA or BOTH configuration is being retrieved.

Pass this information from the schema into resctrl_arch_get_config().

Nothing uses this yet, but it will later be used to map the closid
to the index in the configuration array.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  5 ++--
 arch/x86/kernel/cpu/resctrl/monitor.c |  2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c| 35 +++
 include/linux/resctrl.h   |  3 +-
 4 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c 
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 0cf2f24e5c3b..f6b4049c67c2 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -429,7 +429,7 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 }
 
 void resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
-u32 closid, u32 *value)
+u32 closid, enum resctrl_conf_type type, u32 
*value)
 {
struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
 
@@ -451,7 +451,8 @@ static void show_doms(struct seq_file *s, struct 
resctrl_schema *schema, int clo
if (sep)
seq_puts(s, ";");
 
-   resctrl_arch_get_config(r, dom, closid, &ctrl_val);
+   resctrl_arch_get_config(r, dom, closid, schema->conf_type,
+   &ctrl_val);
seq_printf(s, r->format_str, dom->id, max_data_width,
   ctrl_val);
sep = true;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c 
b/arch/x86/kernel/cpu/resctrl/monitor.c
index 6a62f1323b27..ab6630b466d5 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -379,7 +379,7 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct 
rdt_domain *dom_mbm)
hw_dom_mba = resctrl_to_arch_dom(dom_mba);
 
cur_bw = pmbm_data->prev_bw;
-   resctrl_arch_get_config(r_mba, dom_mba, closid, &user_bw);
+   resctrl_arch_get_config(r_mba, dom_mba, closid, CDP_BOTH, &user_bw);
delta_bw = pmbm_data->delta_bw;
/*
 * resctrl_arch_get_config() chooses the mbps/ctrl value to return
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index eeedafa7d5e7..cb9ca56ce2e6 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -924,7 +924,8 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
for (i = 0; i < closids_supported(); i++) {
if (!closid_allocated(i))
continue;
-   resctrl_arch_get_config(r, dom, i, &ctrl_val);
+   resctrl_arch_get_config(r, dom, i, s->conf_type,
+   &ctrl_val);
mode = rdtgroup_mode_by_closid(i);
switch (mode) {
case RDT_MODE_SHAREABLE:
@@ -1101,6 +1102,7 @@ static int rdtgroup_mode_show(struct kernfs_open_file *of,
  * Used to return the result.
  * @d_cdp: RDT domain that shares hardware with @d (RDT domain peer)
  * Used to return the result.
+ * @peer_type: The CDP configuration type of the peer resource.
  *
  * RDT resources are managed independently and by extension the RDT domains
  * (RDT resource instances) are managed independently also. The Code and
@@ -1118,7 +1120,8 @@ static int rdtgroup_mode_show(struct kernfs_open_file *of,
  */
 static int rdt_cdp_peer_get(struct rdt_resource *r, struct rdt_domain *d,
struct rdt_resource **r_cdp,
-   struct rdt_domain **d_cdp)
+   struct rdt_domain **d_cdp,
+   enum resctrl_conf_type *peer_type)
 {
struct rdt_resource *_r_cdp = NULL;
struct rdt_domain *_d_cdp = NULL;
@@ -1127,15 +1130,19 @@ static int rdt_cdp_peer_get(struct rdt_resource *r, 
struct rdt_domain *d,
switch (r->rid) {
case RDT_RESOURCE_L3DATA:
_r_cdp = &rdt_resources_all[RDT_RESOURCE_L3CODE].resctrl;
+   *peer_type = CDP_CODE;
break;
case RDT_RESOURCE_L3CODE:
_r_cdp = &rdt_resources_all[RDT_RESOURCE_L3DATA].resctrl;
+   *peer_type = CDP_DATA;
break;
case RDT_RESOURCE_L2DATA:
_r_cdp = &rdt_resources_all[RDT_RESOURCE_L2CODE].resctrl;
+   *peer_type = CDP_CODE;
break;
case RDT_RESOURCE_L2CODE:
_r_cdp = &rdt_resources_all[RDT_RESOURCE_L2DATA].resctrl;
+   *peer_type = CDP_DATA;
break;
default:
ret = -EN

[PATCH 13/24] x86/resctrl: Allow different CODE/DATA configurations to be staged

2020-10-30 Thread James Morse
Now that the configuration is staged via an array, allow resctrl to
stage more than one configuration at a time for a single resource and
closid.

To detect that the same schema is being specified twice when the schemata
file is written, the same slot in the staged_configuration array must be
used for each schema. Use the conf_type enum directly as an index.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 16 ++--
 arch/x86/kernel/cpu/resctrl/rdtgroup.c|  5 +++--
 include/linux/resctrl.h   |  4 +++-
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c 
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index b107c0202cfb..f7152c7fdc1b 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -60,10 +60,11 @@ static bool bw_validate(char *buf, unsigned long *data, 
struct rdt_resource *r)
 int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
 struct rdt_domain *d)
 {
-   struct resctrl_staged_config *cfg = &d->staged_config[0];
+   struct resctrl_staged_config *cfg;
struct rdt_resource *r = s->res;
unsigned long bw_val;
 
cfg = &d->staged_config[s->conf_type];
if (cfg->have_new_ctrl) {
rdt_last_cmd_printf("Duplicate domain %d\n", d->id);
return -EINVAL;
@@ -131,11 +132,12 @@ static bool cbm_validate(char *buf, u32 *data, struct 
rdt_resource *r)
 int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
  struct rdt_domain *d)
 {
-   struct resctrl_staged_config *cfg = &d->staged_config[0];
struct rdtgroup *rdtgrp = data->rdtgrp;
+   struct resctrl_staged_config *cfg;
struct rdt_resource *r = s->res;
u32 cbm_val;
 
cfg = &d->staged_config[s->conf_type];
if (cfg->have_new_ctrl) {
rdt_last_cmd_printf("Duplicate domain %d\n", d->id);
return -EINVAL;
@@ -194,6 +196,7 @@ int parse_cbm(struct rdt_parse_data *data, struct 
resctrl_schema *s,
 static int parse_line(char *line, struct resctrl_schema *s,
  struct rdtgroup *rdtgrp)
 {
+   enum resctrl_conf_type t = s->conf_type;
struct resctrl_staged_config *cfg;
struct rdt_resource *r = s->res;
struct rdt_parse_data data;
@@ -225,7 +228,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
if (r->parse_ctrlval(, s, d))
return -EINVAL;
if (rdtgrp->mode ==  RDT_MODE_PSEUDO_LOCKSETUP) {
-   cfg = &d->staged_config[0];
+   cfg = &d->staged_config[t];
/*
 * In pseudo-locking setup mode and just
 * parsed a valid CBM that should be
@@ -266,10 +269,11 @@ int update_domains(struct rdt_resource *r, int closid)
struct resctrl_staged_config *cfg;
struct rdt_hw_domain *hw_dom;
struct msr_param msr_param;
+   enum resctrl_conf_type t;
cpumask_var_t cpu_mask;
struct rdt_domain *d;
bool mba_sc;
-   int cpu, i;
+   int cpu;
 
if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
return -ENOMEM;
@@ -281,8 +285,8 @@ int update_domains(struct rdt_resource *r, int closid)
mba_sc = is_mba_sc(r);
list_for_each_entry(d, &r->domains, list) {
hw_dom = resctrl_to_arch_dom(d);
-   for (i = 0; i < ARRAY_SIZE(d->staged_config); i++) {
-   cfg = &hw_dom->resctrl.staged_config[i];
+   for (t = 0; t < ARRAY_SIZE(d->staged_config); t++) {
+   cfg = &hw_dom->resctrl.staged_config[t];
if (!cfg->have_new_ctrl)
continue;
 
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 1092631ac0b3..5eb14dc9c579 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2747,6 +2747,7 @@ static u32 cbm_ensure_valid(u32 _val, struct rdt_resource 
*r)
 static int __init_one_rdt_domain(struct rdt_domain *d, struct resctrl_schema 
*s,
 u32 closid)
 {
+   enum resctrl_conf_type t = s->conf_type;
struct rdt_resource *r_cdp = NULL;
struct resctrl_staged_config *cfg;
struct rdt_domain *d_cdp = NULL;
@@ -2758,7 +2759,7 @@ static int __init_one_rdt_domain(struct rdt_domain *d, 
struct resctrl_schema *s,
int i;
 
rdt_cdp_peer_get(r, d, &r_cdp, &d_cdp);
-   cfg = &d->staged_config[0];
+   cfg = &d->staged_config[t];
cfg->have_new_ctrl = false;
cfg->new_ctrl = r->cache.shareable_bits;
us

[PATCH 15/24] x86/resctrl: Add a helper to read a closid's configuration

2020-10-30 Thread James Morse
The hardware configuration may look completely different to the
values resctrl gets from user-space. The staged configuration
and resctrl_arch_update_domains() allow the architecture to
convert or translate these values.
(e.g. Arm's MPAM may back MBA's percentage control using the
 'BWPBM' bitmap)

Resctrl shouldn't read or write these values directly. As a
step towards taking direct access away, add a helper to read
the current configuration.

This will allow another architecture to scale the bitmaps if
necessary, and possibly use controls that don't take the user-space
control format at all.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 16 ++---
 arch/x86/kernel/cpu/resctrl/monitor.c |  6 +++-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c| 43 ++-
 include/linux/resctrl.h   |  2 ++
 4 files changed, 37 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c 
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 91864c2e5795..0cf2f24e5c3b 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -428,22 +428,30 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file 
*of,
return ret ?: nbytes;
 }
 
+void resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
+u32 closid, u32 *value)
+{
+   struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+
+   if (!is_mba_sc(r))
+   *value = hw_dom->ctrl_val[closid];
+   else
+   *value = hw_dom->mbps_val[closid];
+}
+
 static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int 
closid)
 {
struct rdt_resource *r = schema->res;
-   struct rdt_hw_domain *hw_dom;
struct rdt_domain *dom;
bool sep = false;
u32 ctrl_val;
 
seq_printf(s, "%*s:", RESCTRL_NAME_LEN, schema->name);
list_for_each_entry(dom, &r->domains, list) {
-   hw_dom = resctrl_to_arch_dom(dom);
if (sep)
seq_puts(s, ";");
 
-   ctrl_val = (!is_mba_sc(r) ? hw_dom->ctrl_val[closid] :
-   hw_dom->mbps_val[closid]);
+   resctrl_arch_get_config(r, dom, closid, &ctrl_val);
seq_printf(s, r->format_str, dom->id, max_data_width,
   ctrl_val);
sep = true;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c 
b/arch/x86/kernel/cpu/resctrl/monitor.c
index 8b7d7ebfcd4b..6a62f1323b27 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -379,8 +379,12 @@ static void update_mba_bw(struct rdtgroup *rgrp, struct 
rdt_domain *dom_mbm)
hw_dom_mba = resctrl_to_arch_dom(dom_mba);
 
cur_bw = pmbm_data->prev_bw;
-   user_bw = hw_dom_mba->mbps_val[closid];
+   resctrl_arch_get_config(r_mba, dom_mba, closid, &user_bw);
delta_bw = pmbm_data->delta_bw;
+   /*
+* resctrl_arch_get_config() chooses the mbps/ctrl value to return
+* based on is_mba_sc(). For now, reach into the hw_dom.
+*/
cur_msr_val = hw_dom_mba->ctrl_val[closid];
 
/*
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index c6689cad1ce7..f168f5a39242 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -911,27 +911,27 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
int i, hwb, swb, excl, psl;
enum rdtgrp_mode mode;
bool sep = false;
-   u32 *ctrl;
+   u32 ctrl_val;
 
mutex_lock(_mutex);
hw_shareable = r->cache.shareable_bits;
list_for_each_entry(dom, &r->domains, list) {
if (sep)
seq_putc(seq, ';');
-   ctrl = resctrl_to_arch_dom(dom)->ctrl_val;
sw_shareable = 0;
exclusive = 0;
seq_printf(seq, "%d=", dom->id);
-   for (i = 0; i < closids_supported(); i++, ctrl++) {
+   for (i = 0; i < closids_supported(); i++) {
if (!closid_allocated(i))
continue;
+   resctrl_arch_get_config(r, dom, i, &ctrl_val);
mode = rdtgroup_mode_by_closid(i);
switch (mode) {
case RDT_MODE_SHAREABLE:
-   sw_shareable |= *ctrl;
+   sw_shareable |= ctrl_val;
break;
case RDT_MODE_EXCLUSIVE:
-   exclusive |= *ctrl;
+   exclusive |= ctrl_val;
break;
case RDT_MODE_PSEUDO_LOCKSETUP:
/*
@@ -1190,7 +1

[PATCH 09/24] x86/resctrl: Change rdt_resource to resctrl_schema in pseudo_lock_region

2020-10-30 Thread James Morse
struct pseudo_lock_region points to the rdt_resource. Once the
resources are merged, this won't be unique. The resource name
is moving into the schema, so that eventually resctrl can generate
it.

Change pseudo_lock_region's rdt_resource pointer for a schema pointer.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 4 ++--
 arch/x86/kernel/cpu/resctrl/internal.h| 6 +++---
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 8 
 arch/x86/kernel/cpu/resctrl/rdtgroup.c| 4 ++--
 4 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c 
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index d3f9d142f58a..a65ff53394ed 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -228,7 +228,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
 * the required initialization for single
 * region and return.
 */
-   rdtgrp->plr->r = r;
+   rdtgrp->plr->s = s;
rdtgrp->plr->d = d;
rdtgrp->plr->cbm = d->new_ctrl;
d->plr = rdtgrp->plr;
@@ -429,7 +429,7 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
ret = -ENODEV;
} else {
seq_printf(s, "%s:%d=%x\n",
-  rdtgrp->plr->r->name,
+  rdtgrp->plr->s->res->name,
   rdtgrp->plr->d->id,
   rdtgrp->plr->cbm);
}
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h 
b/arch/x86/kernel/cpu/resctrl/internal.h
index 1e1f2493a87f..27671a654f8b 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -158,8 +158,8 @@ struct mongroup {
 
 /**
  * struct pseudo_lock_region - pseudo-lock region information
- * @r: RDT resource to which this pseudo-locked region
- * belongs
+ * @s: Resctrl schema for the resource to which this
+ * pseudo-locked region belongs
  * @d: RDT domain to which this pseudo-locked region
  * belongs
  * @cbm:   bitmask of the pseudo-locked region
@@ -179,7 +179,7 @@ struct mongroup {
  * @pm_reqs:   Power management QoS requests related to this region
  */
 struct pseudo_lock_region {
-   struct rdt_resource *r;
+   struct resctrl_schema   *s;
struct rdt_domain   *d;
u32 cbm;
wait_queue_head_t   lock_thread_wq;
diff --git a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c 
b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
index d58f2ffa65e0..d9d9861f244f 100644
--- a/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
+++ b/arch/x86/kernel/cpu/resctrl/pseudo_lock.c
@@ -246,7 +246,7 @@ static void pseudo_lock_region_clear(struct 
pseudo_lock_region *plr)
plr->line_size = 0;
kfree(plr->kmem);
plr->kmem = NULL;
-   plr->r = NULL;
+   plr->s = NULL;
if (plr->d)
plr->d->plr = NULL;
plr->d = NULL;
@@ -290,10 +290,10 @@ static int pseudo_lock_region_init(struct 
pseudo_lock_region *plr)
 
ci = get_cpu_cacheinfo(plr->cpu);
 
-   plr->size = rdtgroup_cbm_to_size(plr->r, plr->d, plr->cbm);
+   plr->size = rdtgroup_cbm_to_size(plr->s->res, plr->d, plr->cbm);
 
for (i = 0; i < ci->num_leaves; i++) {
-   if (ci->info_list[i].level == plr->r->cache_level) {
+   if (ci->info_list[i].level == plr->s->res->cache_level) {
plr->line_size = ci->info_list[i].coherency_line_size;
return 0;
}
@@ -796,7 +796,7 @@ bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain 
*d, unsigned long cbm
unsigned long cbm_b;
 
if (d->plr) {
-   cbm_len = d->plr->r->cache.cbm_len;
+   cbm_len = d->plr->s->res->cache.cbm_len;
cbm_b = d->plr->cbm;
if (bitmap_intersects(&cbm, &cbm_b, cbm_len))
return true;
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 592a517afd6a..311a3890bc53 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1441,8 +1441,8 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
ret = -ENODEV;

[PATCH 21/24] x86/resctrl: Calculate the index from the configuration type

2020-10-30 Thread James Morse
resctrl uses cbm_idx() to map a closid to an index in the
configuration array. This is based on whether this is a CODE,
DATA or BOTH resource.

To merge the resources, resctrl needs to make this decision
based on something else, as there will only be one resource.
Decide based on the staged configuration type. This makes the
static mult and offset parameters set by the arch code redundant.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/core.c| 12 
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 17 +++--
 include/linux/resctrl.h   |  6 --
 3 files changed, 11 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index 79b17ece4528..e2f5ea129be2 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -69,8 +69,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level= 3,
.cache = {
.min_cbm_bits   = 1,
-   .cbm_idx_mult   = 1,
-   .cbm_idx_offset = 0,
},
.domains= domain_init(RDT_RESOURCE_L3),
.parse_ctrlval  = parse_cbm,
@@ -89,8 +87,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level= 3,
.cache = {
.min_cbm_bits   = 1,
-   .cbm_idx_mult   = 2,
-   .cbm_idx_offset = 0,
},
.domains= 
domain_init(RDT_RESOURCE_L3DATA),
.parse_ctrlval  = parse_cbm,
@@ -109,8 +105,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level= 3,
.cache = {
.min_cbm_bits   = 1,
-   .cbm_idx_mult   = 2,
-   .cbm_idx_offset = 1,
},
.domains= 
domain_init(RDT_RESOURCE_L3CODE),
.parse_ctrlval  = parse_cbm,
@@ -129,8 +123,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level= 2,
.cache = {
.min_cbm_bits   = 1,
-   .cbm_idx_mult   = 1,
-   .cbm_idx_offset = 0,
},
.domains= domain_init(RDT_RESOURCE_L2),
.parse_ctrlval  = parse_cbm,
@@ -149,8 +141,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level= 2,
.cache = {
.min_cbm_bits   = 1,
-   .cbm_idx_mult   = 2,
-   .cbm_idx_offset = 0,
},
.domains= 
domain_init(RDT_RESOURCE_L2DATA),
.parse_ctrlval  = parse_cbm,
@@ -169,8 +159,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.cache_level= 2,
.cache = {
.min_cbm_bits   = 1,
-   .cbm_idx_mult   = 2,
-   .cbm_idx_offset = 1,
},
.domains= 
domain_init(RDT_RESOURCE_L2CODE),
.parse_ctrlval  = parse_cbm,
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c 
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 28a251cf3c60..cb91dcd0f329 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -249,12 +249,17 @@ static int parse_line(char *line, struct resctrl_schema 
*s,
return -EINVAL;
 }
 
-static unsigned int cbm_idx(struct rdt_resource *r, unsigned int closid)
+static u32 get_config_index(u32 closid, enum resctrl_conf_type type)
 {
-   if (r->rid == RDT_RESOURCE_MBA)
+   switch (type) {
+   default:
+   case CDP_BOTH:
return closid;
-
-   return closid * r->cache.cbm_idx_mult + r->cache.cbm_idx_offset;
+   case CDP_CODE:
+   return (closid * 2) + 1;
+   case CDP_DATA:
+   return (closid * 2);
+   }
 }
 
 /*
@@ -305,7 +310,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r)
if (!cfg->have_new_ctrl)
continue;
 
-   idx = cbm_idx(r, cfg->closid);
+   idx = get_config_index(cfg->closid, t);
if (!apply_config(hw_dom, cfg, cpu

[PATCH 20/24] x86/resctrl: Apply offset correction when config is staged

2020-10-30 Thread James Morse
When resctrl comes to write the CAT MSR values, it applies an
adjustment based on the style of the resource. CODE and DATA
resources have their closid mapped into an odd/even range.

Previously the ctrlval array was increased to be the same size
regardless of CODE/DATA/BOTH. Move the arithmetic into apply_config()
so that odd/even slots in the ctrlval array are used.

This makes it possible to merge the resources.

In future, the arithmetic will be based on the style of the configuration,
not the resource.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/core.c| 15 +--
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 15 ---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c|  7 ---
 3 files changed, 13 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index b2fda4cd88ba..79b17ece4528 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -195,11 +195,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
},
 };
 
-static unsigned int cbm_idx(struct rdt_resource *r, unsigned int closid)
-{
-   return closid * r->cache.cbm_idx_mult + r->cache.cbm_idx_offset;
-}
-
 /*
  * cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs
  * as they do not have CPUID enumeration support for Cache allocation.
@@ -438,7 +433,7 @@ cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct 
rdt_resource *r)
unsigned int i;
 
for (i = m->low; i < m->high; i++)
-   wrmsrl(hw_res->msr_base + cbm_idx(r, i), hw_dom->ctrl_val[i]);
+   wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
 }
 
 struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
@@ -549,14 +544,6 @@ static int domain_setup_ctrlval(struct rdt_resource *r, 
struct rdt_domain *d)
 
m.low = 0;
m.high = hw_res->num_closid;
-
-   /*
-* temporary: the array is full-size, but cat_wrmsr() still re-maps
-* the index.
-*/
-   if (hw_res->conf_type != CDP_BOTH)
-   m.high /= 2;
-
hw_res->msr_update(d, &m, r);
return 0;
 }
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c 
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index f6b4049c67c2..28a251cf3c60 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -249,6 +249,14 @@ static int parse_line(char *line, struct resctrl_schema *s,
return -EINVAL;
 }
 
+static unsigned int cbm_idx(struct rdt_resource *r, unsigned int closid)
+{
+   if (r->rid == RDT_RESOURCE_MBA)
+   return closid;
+
+   return closid * r->cache.cbm_idx_mult + r->cache.cbm_idx_offset;
+}
+
 /*
  * Merge the staged config with the domains configuration array.
  * Return true if changes were made.
@@ -297,7 +305,7 @@ int resctrl_arch_update_domains(struct rdt_resource *r)
if (!cfg->have_new_ctrl)
continue;
 
-   idx = cfg->closid;
+   idx = cbm_idx(r, cfg->closid);
if (!apply_config(hw_dom, cfg, cpu_mask, idx, mba_sc))
continue;
 
@@ -432,11 +440,12 @@ void resctrl_arch_get_config(struct rdt_resource *r, 
struct rdt_domain *d,
 u32 closid, enum resctrl_conf_type type, u32 
*value)
 {
struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+   u32 idx = cbm_idx(r, closid);
 
if (!is_mba_sc(r))
-   *value = hw_dom->ctrl_val[closid];
+   *value = hw_dom->ctrl_val[idx];
else
-   *value = hw_dom->mbps_val[closid];
+   *value = hw_dom->mbps_val[idx];
 }
 
 static void show_doms(struct seq_file *s, struct resctrl_schema *schema, int 
closid)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 4fa6c386d751..162e415d5d09 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2379,13 +2379,6 @@ static int reset_all_ctrls(struct rdt_resource *r)
msr_param.low = 0;
msr_param.high = hw_res->num_closid;
 
-   /*
-* temporary: the array is full-sized, but cat_wrmsr() still re-maps
-* the index.
-*/
-   if (hw_res->cdp_enabled)
-   msr_param.high /= 2;
-
/*
 * Disable resource control for this resource by setting all
 * CBMs in all domains to the maximum mask value. Pick one CPU
-- 
2.28.0



[PATCH 10/24] x86/resctrl: Move the schema names into struct resctrl_schema

2020-10-30 Thread James Morse
Move the names used for the schemata file out of the resource and
into struct resctrl_schema. This allows one resource to have two
different names, based on the other schema properties.

This patch copies the names, eventually resctrl will generate them.

Remove the arch code's max_name_width, this is now resctrl's
problem.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/core.c|  9 ++---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 10 +++---
 arch/x86/kernel/cpu/resctrl/internal.h|  2 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c| 17 -
 include/linux/resctrl.h   |  7 +++
 5 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index 1ed5e04031e6..cda071009fed 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -37,10 +37,10 @@ DEFINE_MUTEX(rdtgroup_mutex);
 DEFINE_PER_CPU(struct resctrl_pqr_state, pqr_state);
 
 /*
- * Used to store the max resource name width and max resource data width
+ * Used to store the max resource data width
  * to display the schemata in a tabular format
  */
-int max_name_width, max_data_width;
+int max_data_width;
 
 /*
  * Global boolean for rdt_alloc which is true if any
@@ -776,13 +776,8 @@ static int resctrl_offline_cpu(unsigned int cpu)
 static __init void rdt_init_padding(void)
 {
struct rdt_resource *r;
-   int cl;
 
for_each_alloc_capable_rdt_resource(r) {
-   cl = strlen(r->name);
-   if (cl > max_name_width)
-   max_name_width = cl;
-
if (r->data_width > max_data_width)
max_data_width = r->data_width;
}
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c 
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index a65ff53394ed..28d69c78c29e 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -291,13 +291,11 @@ static int rdtgroup_parse_resource(char *resname, char 
*tok,
   struct rdtgroup *rdtgrp)
 {
struct resctrl_schema *s;
-   struct rdt_resource *r;
 
lockdep_assert_held(_mutex);
 
list_for_each_entry(s, &resctrl_all_schema, list) {
-   r = s->res;
-   if (!strcmp(resname, r->name) && rdtgrp->closid < s->num_closid)
+   if (!strcmp(resname, s->name) && rdtgrp->closid < s->num_closid)
return parse_line(tok, s, rdtgrp);
}
rdt_last_cmd_printf("Unknown or unsupported resource name '%s'\n", 
resname);
@@ -391,7 +389,7 @@ static void show_doms(struct seq_file *s, struct 
resctrl_schema *schema, int clo
bool sep = false;
u32 ctrl_val;
 
-   seq_printf(s, "%*s:", max_name_width, r->name);
+   seq_printf(s, "%*s:", RESCTRL_NAME_LEN, schema->name);
list_for_each_entry(dom, &r->domains, list) {
hw_dom = resctrl_to_arch_dom(dom);
if (sep)
@@ -411,7 +409,6 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
 {
struct resctrl_schema *schema;
struct rdtgroup *rdtgrp;
-   struct rdt_resource *r;
int ret = 0;
u32 closid;
 
@@ -419,8 +416,7 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
if (rdtgrp) {
if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
list_for_each_entry(schema, &resctrl_all_schema, list) {
-   r = schema->res;
-   seq_printf(s, "%s:uninitialized\n", r->name);
+   seq_printf(s, "%s:uninitialized\n", 
schema->name);
}
} else if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
if (!rdtgrp->plr->d) {
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h 
b/arch/x86/kernel/cpu/resctrl/internal.h
index 27671a654f8b..5294ae0c3ed9 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -248,7 +248,7 @@ struct rdtgroup {
 /* List of all resource groups */
 extern struct list_head rdt_all_groups;
 
-extern int max_name_width, max_data_width;
+extern int max_data_width;
 
 int __init rdtgroup_init(void);
 void __exit rdtgroup_exit(void);
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 311a3890bc53..48f4d6783647 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1440,8 +1440,8 @@ static int rdtgroup_size_show(struct kernfs_open_file *of,
rdt_last_cmd_puts("Cache domain offline\n");
ret = -ENODEV;
} else {
-   seq_printf(s, &

[PATCH 14/24] x86/resctrl: Make update_domains() learn the affected closids

2020-10-30 Thread James Morse
Now that the closid is present in the staged configuration,
update_domains() can learn which low/high values it should update,
instead of being explicitly told. This paves the way for multiple
configuration changes being staged, affecting different indexes
in the ctrlval array.

Remove the single passed in closid, and update msr_param as each
staged config is applied.

Once the L2/L2CODE/L2DATA resources are merged this will allow
update_domains() to be called once for the single resource, even
when CDP is in use. This results in both CODE and DATA
configurations being applied and the two consecutive closids being
updated with a single smp_call_function_many().

This keeps the CDP odd/even behaviour inside the arch code for resctrl,
so that architectures that don't do this don't need to emulate it.

As update_domains() applies the staged configuration to the hw_dom's
configuration array, and updates the hardware, make it part of the
arch code interface.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 40 +--
 arch/x86/kernel/cpu/resctrl/internal.h|  6 ++--
 arch/x86/kernel/cpu/resctrl/rdtgroup.c|  2 +-
 include/linux/resctrl.h   |  1 +
 4 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c 
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index f7152c7fdc1b..91864c2e5795 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -249,37 +249,44 @@ static int parse_line(char *line, struct resctrl_schema 
*s,
return -EINVAL;
 }
 
-static void apply_config(struct rdt_hw_domain *hw_dom,
+/*
+ * Merge the staged config with the domains configuration array.
+ * Return true if changes were made.
+ */
+static bool apply_config(struct rdt_hw_domain *hw_dom,
 struct resctrl_staged_config *cfg,
-cpumask_var_t cpu_mask, bool mba_sc)
+cpumask_var_t cpu_mask, u32 idx, bool mba_sc)
 {
struct rdt_domain *dom = &hw_dom->resctrl;
u32 *dc = mba_sc ? hw_dom->mbps_val : hw_dom->ctrl_val;
 
-   if (cfg->new_ctrl != dc[cfg->closid]) {
+   cfg->have_new_ctrl = false;
+   if (cfg->new_ctrl != dc[idx]) {
cpumask_set_cpu(cpumask_any(&dom->cpu_mask), cpu_mask);
-   dc[cfg->closid] = cfg->new_ctrl;
+   dc[idx] = cfg->new_ctrl;
+
+   return true;
}
 
-   cfg->have_new_ctrl = false;
+   return false;
 }
 
-int update_domains(struct rdt_resource *r, int closid)
+int resctrl_arch_update_domains(struct rdt_resource *r)
 {
struct resctrl_staged_config *cfg;
struct rdt_hw_domain *hw_dom;
+   bool msr_param_init = false;
struct msr_param msr_param;
enum resctrl_conf_type t;
cpumask_var_t cpu_mask;
struct rdt_domain *d;
bool mba_sc;
int cpu;
+   u32 idx;
 
if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
return -ENOMEM;
 
-   msr_param.low = closid;
-   msr_param.high = msr_param.low + 1;
msr_param.res = r;
 
mba_sc = is_mba_sc(r);
@@ -290,10 +297,23 @@ int update_domains(struct rdt_resource *r, int closid)
if (!cfg->have_new_ctrl)
continue;
 
-   apply_config(hw_dom, cfg, cpu_mask, mba_sc);
+   idx = cfg->closid;
+   if (!apply_config(hw_dom, cfg, cpu_mask, idx, mba_sc))
+   continue;
+
+   if (!msr_param_init) {
+   msr_param.low = idx;
+   msr_param.high = idx;
+   msr_param_init = true;
+   } else {
+   msr_param.low = min(msr_param.low, idx);
+   msr_param.high = max(msr_param.high, idx);
+   }
}
}
 
+   msr_param.high += 1;
+
/*
 * Avoid writing the control msr with control values when
 * MBA software controller is enabled
@@ -387,7 +407,7 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
 
list_for_each_entry(s, _all_schema, list) {
r = s->res;
-   ret = update_domains(r, rdtgrp->closid);
+   ret = resctrl_arch_update_domains(r);
if (ret)
goto out;
}
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 5294ae0c3ed9..e86550d888cc 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -324,8 +324,8 @@ static inline struct rdt_hw_domain *resctrl_to_arch_dom(struct rdt_domain *r)
  */
 struct msr_param {
struct rdt_resource *res;

[PATCH 12/24] x86/resctrl: Add closid to the staged config

2020-10-30 Thread James Morse
Once the L2/L2CODE/L2DATA resources are merged, there may be two
configurations staged for one resource when CDP is enabled. The
closid should always be passed with the type of configuration to the
arch code.

Because update_domains() will eventually apply a set of configurations,
it should take the closid from the same place, so they pair up.

Move the closid to be a staged parameter.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 10 ++
 arch/x86/kernel/cpu/resctrl/rdtgroup.c|  6 --
 include/linux/resctrl.h   |  2 ++
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 0c95ed83eb05..b107c0202cfb 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -72,6 +72,7 @@ int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
	if (!bw_validate(data->buf, &bw_val, r))
return -EINVAL;
cfg->new_ctrl = bw_val;
+   cfg->closid = data->rdtgrp->closid;
cfg->have_new_ctrl = true;
 
return 0;
@@ -178,6 +179,7 @@ int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
}
 
cfg->new_ctrl = cbm_val;
+   cfg->closid = data->rdtgrp->closid;
cfg->have_new_ctrl = true;
 
return 0;
@@ -245,15 +247,15 @@ static int parse_line(char *line, struct resctrl_schema *s,
 }
 
 static void apply_config(struct rdt_hw_domain *hw_dom,
-struct resctrl_staged_config *cfg, int closid,
+struct resctrl_staged_config *cfg,
 cpumask_var_t cpu_mask, bool mba_sc)
 {
	struct rdt_domain *dom = &hw_dom->resctrl;
u32 *dc = mba_sc ? hw_dom->mbps_val : hw_dom->ctrl_val;
 
-   if (cfg->new_ctrl != dc[closid]) {
+   if (cfg->new_ctrl != dc[cfg->closid]) {
		cpumask_set_cpu(cpumask_any(&dom->cpu_mask), cpu_mask);
-   dc[closid] = cfg->new_ctrl;
+   dc[cfg->closid] = cfg->new_ctrl;
}
 
cfg->have_new_ctrl = false;
@@ -284,7 +286,7 @@ int update_domains(struct rdt_resource *r, int closid)
if (!cfg->have_new_ctrl)
continue;
 
-   apply_config(hw_dom, cfg, closid, cpu_mask, mba_sc);
+   apply_config(hw_dom, cfg, cpu_mask, mba_sc);
}
}
 
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index c307170ee45f..1092631ac0b3 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2806,6 +2806,7 @@ static int __init_one_rdt_domain(struct rdt_domain *d, struct resctrl_schema *s,
rdt_last_cmd_printf("No space on %s:%d\n", s->name, d->id);
return -ENOSPC;
}
+   cfg->closid = closid;
cfg->have_new_ctrl = true;
 
return 0;
@@ -2836,7 +2837,7 @@ static int rdtgroup_init_cat(struct resctrl_schema *s, u32 closid)
 }
 
 /* Initialize MBA resource with default values. */
-static void rdtgroup_init_mba(struct rdt_resource *r)
+static void rdtgroup_init_mba(struct rdt_resource *r, u32 closid)
 {
struct resctrl_staged_config *cfg;
struct rdt_domain *d;
@@ -2844,6 +2845,7 @@ static void rdtgroup_init_mba(struct rdt_resource *r)
	list_for_each_entry(d, &r->domains, list) {
		cfg = &d->staged_config[0];
cfg->new_ctrl = is_mba_sc(r) ? MBA_MAX_MBPS : r->default_ctrl;
+   cfg->closid = closid;
cfg->have_new_ctrl = true;
}
 }
@@ -2860,7 +2862,7 @@ static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
list_for_each_entry(s, _all_schema, list) {
r = s->res;
if (r->rid == RDT_RESOURCE_MBA) {
-   rdtgroup_init_mba(r);
+   rdtgroup_init_mba(r, rdtgrp->closid);
} else {
ret = rdtgroup_init_cat(s, rdtgrp->closid);
if (ret < 0)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index f1164bbb66c5..695247c08ba3 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -28,10 +28,12 @@ enum resctrl_conf_type {
 
 /**
  * struct resctrl_staged_config - parsed configuration to be applied
+ * @closid:The closid the new configuration applies to
  * @new_ctrl:  new ctrl value to be loaded
  * @have_new_ctrl: did user provide new_ctrl for this domain
  */
 struct resctrl_staged_config {
+   u32 closid;
u32 new_ctrl;
boolhave_new_ctrl;
 };
-- 
2.28.0



[PATCH 17/24] x86/resctrl: Use cdp_enabled in rdt_domain_reconfigure_cdp()

2020-10-30 Thread James Morse
rdt_domain_reconfigure_cdp() infers whether CDP is enabled by
checking the alloc_capable and alloc_enabled flags of the data
resources.

Now that there is an explicit cdp_enabled, use that.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 6e150560c3c1..eeedafa7d5e7 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1944,14 +1944,16 @@ static int set_cache_qos_cfg(int level, bool enable)
 /* Restore the qos cfg state when a domain comes online */
 void rdt_domain_reconfigure_cdp(struct rdt_resource *r)
 {
-   if (!r->alloc_capable)
+   struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+
+   if (!hw_res->cdp_capable)
return;
 
	if (r == &rdt_resources_all[RDT_RESOURCE_L2DATA].resctrl)
-		l2_qos_cfg_update(&r->alloc_enabled);
+		l2_qos_cfg_update(&hw_res->cdp_enabled);
 
	if (r == &rdt_resources_all[RDT_RESOURCE_L3DATA].resctrl)
-		l3_qos_cfg_update(&r->alloc_enabled);
+		l3_qos_cfg_update(&hw_res->cdp_enabled);
 }
 
 /*
-- 
2.28.0



[PATCH 11/24] x86/resctrl: Group staged configuration into a separate struct

2020-10-30 Thread James Morse
Arm's MPAM may have surprisingly large bitmaps for its cache
portions as the architecture allows up to 4K portions. The size
exposed via resctrl may not be the same; some scaling may occur.

The values written to hardware may be unlike the values received
from resctrl, e.g. MBA percentages may be backed by a bitmap,
or a maximum value that isn't a percentage.

Today resctrl's ctrlval arrays are written to directly by the
resctrl filesystem code, e.g. apply_config(). This is a problem
if scaling or conversion is needed by the architecture.

The arch code should own the ctrlval array (to allow scaling and
conversion), and should only need a single copy of the array for the
values currently applied in hardware.

Move the new_ctrl bitmap value and flag into a struct for staged
configuration changes. This is created as an array to allow one per type
of configuration. Today there is only one element in the array, but
eventually resctrl will use the array slots for CODE/DATA/BOTH to detect
a duplicate schema being written.
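
To illustrate why the arch code should own the array, a sketch of a
hypothetical conversion point (resctrl_value_to_hw() is an assumed name,
not part of this series; the MPAM-style scaling is only an example):

| /*
|  * Sketch: once the arch code owns the ctrlval array, applying a
|  * staged value can convert from resctrl's view to the hardware's.
|  */
| static u32 resctrl_value_to_hw(struct rdt_resource *r, u32 resctrl_val)
| {
| 	/*
| 	 * e.g. scale a resctrl bitmap up to the hardware's 4K portions,
| 	 * or turn an MBA percentage into a hardware maximum.
| 	 */
| 	return resctrl_val;	/* x86: 1:1, no conversion needed */
| }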

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 49 ---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c| 22 +-
 include/linux/resctrl.h   | 17 +---
 3 files changed, 60 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 28d69c78c29e..0c95ed83eb05 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -60,18 +60,19 @@ static bool bw_validate(char *buf, unsigned long *data, struct rdt_resource *r)
 int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
 struct rdt_domain *d)
 {
+	struct resctrl_staged_config *cfg = &d->staged_config[0];
struct rdt_resource *r = s->res;
unsigned long bw_val;
 
-   if (d->have_new_ctrl) {
+   if (cfg->have_new_ctrl) {
rdt_last_cmd_printf("Duplicate domain %d\n", d->id);
return -EINVAL;
}
 
	if (!bw_validate(data->buf, &bw_val, r))
return -EINVAL;
-   d->new_ctrl = bw_val;
-   d->have_new_ctrl = true;
+   cfg->new_ctrl = bw_val;
+   cfg->have_new_ctrl = true;
 
return 0;
 }
@@ -129,11 +130,12 @@ static bool cbm_validate(char *buf, u32 *data, struct rdt_resource *r)
 int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
  struct rdt_domain *d)
 {
+	struct resctrl_staged_config *cfg = &d->staged_config[0];
struct rdtgroup *rdtgrp = data->rdtgrp;
struct rdt_resource *r = s->res;
u32 cbm_val;
 
-   if (d->have_new_ctrl) {
+   if (cfg->have_new_ctrl) {
rdt_last_cmd_printf("Duplicate domain %d\n", d->id);
return -EINVAL;
}
@@ -175,8 +177,8 @@ int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
}
}
 
-   d->new_ctrl = cbm_val;
-   d->have_new_ctrl = true;
+   cfg->new_ctrl = cbm_val;
+   cfg->have_new_ctrl = true;
 
return 0;
 }
@@ -190,6 +192,7 @@ int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
 static int parse_line(char *line, struct resctrl_schema *s,
  struct rdtgroup *rdtgrp)
 {
+   struct resctrl_staged_config *cfg;
struct rdt_resource *r = s->res;
struct rdt_parse_data data;
char *dom = NULL, *id;
@@ -220,6 +223,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
if (r->parse_ctrlval(, s, d))
return -EINVAL;
if (rdtgrp->mode ==  RDT_MODE_PSEUDO_LOCKSETUP) {
+			cfg = &d->staged_config[0];
/*
 * In pseudo-locking setup mode and just
 * parsed a valid CBM that should be
@@ -230,7 +234,7 @@ static int parse_line(char *line, struct resctrl_schema *s,
 */
rdtgrp->plr->s = s;
rdtgrp->plr->d = d;
-   rdtgrp->plr->cbm = d->new_ctrl;
+   rdtgrp->plr->cbm = cfg->new_ctrl;
d->plr = rdtgrp->plr;
return 0;
}
@@ -240,15 +244,30 @@ static int parse_line(char *line, struct resctrl_schema *s,
return -EINVAL;
 }
 
+static void apply_config(struct rdt_hw_domain *hw_dom,
+struct resctrl_staged_config *cfg, int closid,
+cpumask_var_t cpu_mask, bool mba_sc)
+{
+	struct rdt_domain *dom = &hw_dom->resctrl;
+   u32 *dc = mba_sc ? hw_dom->mbps_val : hw_dom->ctrl_val;

[PATCH 22/24] x86/resctrl: Merge the ctrlval arrays

2020-10-30 Thread James Morse
Now that the CODE/DATA resources don't use overlapping slots in the
ctrlval arrays, they can be merged. This allows the cdp_peer configuration
to be read from any resource's domain, instead of searching for the matching
flavour.

Add a helper to allocate the ctrlval array, which returns the existing
array on the L2 or L3 resource if one exists. This gets removed once the
resources are merged, and there really is only one ctrlval array.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/core.c | 79 +++---
 1 file changed, 72 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index e2f5ea129be2..01d010977367 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -509,6 +509,72 @@ void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
}
 }
 
+/*
+ * temporary.
+ * This relies on L2 or L3 being allocated before their CODE/DATA aliases
+ */
+static u32 *alloc_ctrlval_array(struct rdt_resource *r, struct rdt_domain *d,
+   bool mba_sc)
+{
+   /* these are for the underlying hardware, they may not match r/d */
+   struct rdt_domain *underlying_domain;
+   struct rdt_hw_resource *hw_res;
+   struct rdt_hw_domain *hw_dom;
+   bool remapped;
+
+   switch (r->rid) {
+   case RDT_RESOURCE_L3DATA:
+   case RDT_RESOURCE_L3CODE:
+		hw_res = &rdt_resources_all[RDT_RESOURCE_L3];
+   remapped = true;
+   break;
+   case RDT_RESOURCE_L2DATA:
+   case RDT_RESOURCE_L2CODE:
+		hw_res = &rdt_resources_all[RDT_RESOURCE_L2];
+   remapped = true;
+   break;
+   default:
+   hw_res = resctrl_to_arch_res(r);
+   remapped = false;
+   }
+
+   /*
+* If we changed the resource, we need to search for the underlying
+* domain. Doing this for all resources would make it tricky to add the
+* first resource, as domains aren't added to a resource list until
+* after the ctrlval arrays have been allocated.
+*/
+   if (remapped)
+		underlying_domain = rdt_find_domain(&hw_res->resctrl, d->id,
+   NULL);
+   else
+   underlying_domain = d;
+   hw_dom = resctrl_to_arch_dom(underlying_domain);
+
+   if (mba_sc) {
+   if (hw_dom->mbps_val)
+   return hw_dom->mbps_val;
+   return kmalloc_array(hw_res->num_closid,
+sizeof(*hw_dom->mbps_val), GFP_KERNEL);
+   } else {
+   if (hw_dom->ctrl_val)
+   return hw_dom->ctrl_val;
+   return kmalloc_array(hw_res->num_closid,
+sizeof(*hw_dom->ctrl_val), GFP_KERNEL);
+   }
+}
+
+/* Only kfree() for L2/L3, not the CODE/DATA aliases */
+static void free_ctrlval_arrays(struct rdt_resource *r, struct rdt_domain *d)
+{
+   struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
+
+   if (r->rid == RDT_RESOURCE_L2 || r->rid == RDT_RESOURCE_L3) {
+   kfree(hw_dom->ctrl_val);
+   kfree(hw_dom->mbps_val);
+   }
+}
+
 static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
 {
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
@@ -516,18 +582,18 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
struct msr_param m;
u32 *dc, *dm;
 
-	dc = kmalloc_array(hw_res->num_closid, sizeof(*hw_dom->ctrl_val), GFP_KERNEL);
+   dc = alloc_ctrlval_array(r, d, false);
if (!dc)
return -ENOMEM;
+   hw_dom->ctrl_val = dc;
 
-	dm = kmalloc_array(hw_res->num_closid, sizeof(*hw_dom->mbps_val), GFP_KERNEL);
+   dm = alloc_ctrlval_array(r, d, true);
if (!dm) {
-   kfree(dc);
+   free_ctrlval_arrays(r, d);
return -ENOMEM;
}
-
-   hw_dom->ctrl_val = dc;
hw_dom->mbps_val = dm;
+
setup_default_ctrlval(r, dc, dm);
 
m.low = 0;
@@ -677,8 +743,7 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
if (d->plr)
d->plr->d = NULL;
 
-   kfree(hw_dom->ctrl_val);
-   kfree(hw_dom->mbps_val);
+   free_ctrlval_arrays(r, d);
bitmap_free(d->rmid_busy_llc);
kfree(d->mbm_total);
kfree(d->mbm_local);
-- 
2.28.0



[PATCH 24/24] x86/resctrl: Merge the CDP resources

2020-10-30 Thread James Morse
Now that resctrl uses the schema's configuration type as the source of
CODE/DATA configuration styles, and there is only one configuration
array between the three views of the resource, remove the CODE and DATA
aliases.

This means the arch code only needs to describe the hardware to
resctrl, which will then create the separate CODE/DATA schema
for its ABI.

Add a helper to add schema with the CDP suffix if CDP is enabled.
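
Roughly, the naming idea looks like this (sketch only; the helper name
and exact format are assumed, the real version is in the rdtgroup.c
hunks of this patch, which the archive has truncated):

| /* Sketch: build "L3", "L3CODE" or "L3DATA" from one resource. */
| static int schema_name(char *buf, size_t len, struct rdt_resource *r,
| 		       enum resctrl_conf_type type)
| {
| 	const char *suffix = "";
|
| 	if (type == CDP_CODE)
| 		suffix = "CODE";
| 	else if (type == CDP_DATA)
| 		suffix = "DATA";
|
| 	return snprintf(buf, len, "%s%s", r->name, suffix);
| }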

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/core.c | 193 ++---
 arch/x86/kernel/cpu/resctrl/internal.h |   4 -
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 113 ---
 3 files changed, 72 insertions(+), 238 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 01d010977367..57d4131fdd80 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -78,42 +78,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.msr_base   = MSR_IA32_L3_CBM_BASE,
.msr_update = cat_wrmsr,
},
-   [RDT_RESOURCE_L3DATA] =
-   {
-   .conf_type  = CDP_DATA,
-   .resctrl = {
-   .rid= RDT_RESOURCE_L3DATA,
-   .name   = "L3DATA",
-   .cache_level= 3,
-   .cache = {
-   .min_cbm_bits   = 1,
-   },
-			.domains	= domain_init(RDT_RESOURCE_L3DATA),
-   .parse_ctrlval  = parse_cbm,
-   .format_str = "%d=%0*x",
-   .fflags = RFTYPE_RES_CACHE,
-   },
-   .msr_base   = MSR_IA32_L3_CBM_BASE,
-   .msr_update = cat_wrmsr,
-   },
-   [RDT_RESOURCE_L3CODE] =
-   {
-   .conf_type  = CDP_CODE,
-   .resctrl = {
-   .rid= RDT_RESOURCE_L3CODE,
-   .name   = "L3CODE",
-   .cache_level= 3,
-   .cache = {
-   .min_cbm_bits   = 1,
-   },
-			.domains	= domain_init(RDT_RESOURCE_L3CODE),
-   .parse_ctrlval  = parse_cbm,
-   .format_str = "%d=%0*x",
-   .fflags = RFTYPE_RES_CACHE,
-   },
-   .msr_base   = MSR_IA32_L3_CBM_BASE,
-   .msr_update = cat_wrmsr,
-   },
[RDT_RESOURCE_L2] =
{
.conf_type  = CDP_BOTH,
@@ -132,42 +96,6 @@ struct rdt_hw_resource rdt_resources_all[] = {
.msr_base   = MSR_IA32_L2_CBM_BASE,
.msr_update = cat_wrmsr,
},
-   [RDT_RESOURCE_L2DATA] =
-   {
-   .conf_type  = CDP_DATA,
-   .resctrl = {
-   .rid= RDT_RESOURCE_L2DATA,
-   .name   = "L2DATA",
-   .cache_level= 2,
-   .cache = {
-   .min_cbm_bits   = 1,
-   },
-			.domains	= domain_init(RDT_RESOURCE_L2DATA),
-   .parse_ctrlval  = parse_cbm,
-   .format_str = "%d=%0*x",
-   .fflags = RFTYPE_RES_CACHE,
-   },
-   .msr_base   = MSR_IA32_L2_CBM_BASE,
-   .msr_update = cat_wrmsr,
-   },
-   [RDT_RESOURCE_L2CODE] =
-   {
-   .conf_type  = CDP_CODE,
-   .resctrl = {
-   .rid= RDT_RESOURCE_L2CODE,
-   .name   = "L2CODE",
-   .cache_level= 2,
-   .cache = {
-   .min_cbm_bits   = 1,
-   },
-			.domains	= domain_init(RDT_RESOURCE_L2CODE),
-   .parse_ctrlval  = parse_cbm,
-   .format_str = "%d=%0*x",
-   .fflags = RFTYPE_RES_CACHE,
-   },
-   .msr_base   = MSR_IA32_L2_CBM_BASE,
-   .msr_update = cat_wrmsr,
-   },
[RDT_RESOURCE_MBA] =
{
.conf_type  = CDP_BOTH,
@@ -339,40

[PATCH 19/24] x86/resctrl: Make ctrlval arrays the same size

2020-10-30 Thread James Morse
The CODE and DATA resources have their own ctrlval arrays which are half
the size because num_closid was already adjusted.

Prior to having one ctrlval array for the resource, move the num_closid
correction into resctrl, so that the ctrlval arrays are all the same
size.

A short-lived quirk of this is that the caches are reset twice, once
for CODE and once for DATA.
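
With assumed numbers for illustration: if the L3 hardware exposes 16
CLOSIDs, then after this patch:

| /*
|  * CDP off: schema num_closid = 16, ctrlval[0..15] used directly.
|  * CDP on:  hw_res->num_closid stays 16 and the ctrlval array keeps
|  *          all 16 entries; resctrl halves its own copy, so the
|  *          schema reports 16 / 2 = 8, and the MSR write ranges
|  *          (msr_param.high) are halved to match, as below.
|  */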

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/core.c | 10 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 10 ++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 7e98869ba006..b2fda4cd88ba 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -363,7 +363,7 @@ static void rdt_get_cdp_config(int level, int type)
struct rdt_resource *r = _resources_all[type].resctrl;
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 
-   hw_res->num_closid = hw_res_l->num_closid / 2;
+   hw_res->num_closid = hw_res_l->num_closid;
r->cache.cbm_len = r_l->cache.cbm_len;
r->default_ctrl = r_l->default_ctrl;
r->cache.shareable_bits = r_l->cache.shareable_bits;
@@ -549,6 +549,14 @@ static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
 
m.low = 0;
m.high = hw_res->num_closid;
+
+   /*
+* temporary: the array is full-size, but cat_wrmsr() still re-maps
+* the index.
+*/
+   if (hw_res->conf_type != CDP_BOTH)
+   m.high /= 2;
+
hw_res->msr_update(d, , r);
return 0;
 }
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index cb9ca56ce2e6..4fa6c386d751 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2165,6 +2165,9 @@ static int create_schemata_list(void)
 
s->res = r;
s->num_closid = resctrl_arch_get_num_closid(r);
+   if (resctrl_arch_get_cdp_enabled(r->rid))
+   s->num_closid /= 2;
+
s->conf_type = resctrl_to_arch_res(r)->conf_type;
 
ret = snprintf(s->name, sizeof(s->name), r->name);
@@ -2376,6 +2379,13 @@ static int reset_all_ctrls(struct rdt_resource *r)
msr_param.low = 0;
msr_param.high = hw_res->num_closid;
 
+   /*
+* temporary: the array is full-sized, but cat_wrmsr() still re-maps
+* the index.
+*/
+   if (hw_res->cdp_enabled)
+   msr_param.high /= 2;
+
/*
 * Disable resource control for this resource by setting all
 * CBMs in all domains to the maximum mask value. Pick one CPU
-- 
2.28.0



[PATCH 07/24] x86/resctrl: Label the resources with their configuration type

2020-10-30 Thread James Morse
Before the name for the schema can be generated, the type of the
configuration being applied to the resource needs to be known. Label
all the entries in rdt_resources_all[], and copy that value into struct
resctrl_schema.

Subsequent patches will generate the schema names in what will become
the fs code. Eventually the fs code will generate pairs of CODE/DATA if
the platform supports CDP for this resource.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/core.c | 7 +++
 arch/x86/kernel/cpu/resctrl/internal.h | 1 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 1 +
 include/linux/resctrl.h| 8 
 4 files changed, 17 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 5d5b566c4359..1ed5e04031e6 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -62,6 +62,7 @@ mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m,
 struct rdt_hw_resource rdt_resources_all[] = {
[RDT_RESOURCE_L3] =
{
+   .conf_type  = CDP_BOTH,
.resctrl = {
.rid= RDT_RESOURCE_L3,
.name   = "L3",
@@ -81,6 +82,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
},
[RDT_RESOURCE_L3DATA] =
{
+   .conf_type  = CDP_DATA,
.resctrl = {
.rid= RDT_RESOURCE_L3DATA,
.name   = "L3DATA",
@@ -100,6 +102,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
},
[RDT_RESOURCE_L3CODE] =
{
+   .conf_type  = CDP_CODE,
.resctrl = {
.rid= RDT_RESOURCE_L3CODE,
.name   = "L3CODE",
@@ -119,6 +122,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
},
[RDT_RESOURCE_L2] =
{
+   .conf_type  = CDP_BOTH,
.resctrl = {
.rid= RDT_RESOURCE_L2,
.name   = "L2",
@@ -138,6 +142,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
},
[RDT_RESOURCE_L2DATA] =
{
+   .conf_type  = CDP_DATA,
.resctrl = {
.rid= RDT_RESOURCE_L2DATA,
.name   = "L2DATA",
@@ -157,6 +162,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
},
[RDT_RESOURCE_L2CODE] =
{
+   .conf_type  = CDP_CODE,
.resctrl = {
.rid= RDT_RESOURCE_L2CODE,
.name   = "L2CODE",
@@ -176,6 +182,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
},
[RDT_RESOURCE_MBA] =
{
+   .conf_type  = CDP_BOTH,
.resctrl = {
.rid= RDT_RESOURCE_MBA,
.name   = "MB",
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 682e84aebd14..6c87a81946b1 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -367,6 +367,7 @@ struct rdt_parse_data {
  * @mon_scale: cqm counter * mon_scale = occupancy in bytes
  */
 struct rdt_hw_resource {
+   enum resctrl_conf_type  conf_type;
struct rdt_resource resctrl;
 
int num_closid;
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 1bd785b1920c..628e5eb4d7a9 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2141,6 +2141,7 @@ static int create_schemata_list(void)
 
s->res = r;
s->num_closid = resctrl_arch_get_num_closid(r);
+   s->conf_type = resctrl_to_arch_res(r)->conf_type;
 
INIT_LIST_HEAD(>list);
list_add(>list, _all_schema);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index b32152968bca..20d8b6dd4af4 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -15,6 +15,12 @@ int proc_resctrl_show(struct seq_file *m,
 
 #endif
 
+enum resctrl_conf_type {
+   CDP_BOTH,
+   CDP_CODE,
+   CDP_DATA,
+};
+
 /**
  * struct rdt_domain - group of cpus sharing an RDT resource
  * @list:  all instances of this resource
@@ -165,11 +171,13 @@ struct rdt_resource {
 
 /**
  * @list:  Member of resctrl's schema list
+ * @conf_type: Whether this entry is for code/data/both
  *

[PATCH 03/24] x86/resctrl: Add resctrl_arch_get_num_closid()

2020-10-30 Thread James Morse
resctrl chooses whether to enable CDP; once it does, half the number
of closids are available. MPAM doesn't behave like this: an in-kernel user
of MPAM could be 'using CDP' while resctrl is not.

To move the 'half the closids' behaviour to be part of the core code,
each schema would have a num_closid. This may be different from the
single resource's num_closid if CDP is in use.

Add a helper to read the resource's num_closid, this should return the
number of closid that the resource supports, regardless of whether CDP
is in use.

For now return the hw_res->num_closid, which is already adjusted for CDP.
Once the CODE/DATA/BOTH resources are merged, resctrl can make the
adjustment when copying the value to the schema's num_closid.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/core.c|  5 +
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  9 +++--
 arch/x86/kernel/cpu/resctrl/rdtgroup.c| 14 +-
 include/linux/resctrl.h   |  3 +++
 4 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 97040a54cc9a..5d5b566c4359 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -443,6 +443,11 @@ struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
return NULL;
 }
 
+u32 resctrl_arch_get_num_closid(struct rdt_resource *r)
+{
+   return resctrl_to_arch_res(r)->num_closid;
+}
+
 void rdt_ctrl_update(void *arg)
 {
struct msr_param *m = arg;
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 2e7466659af3..14ea6a40993f 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -286,12 +286,11 @@ int update_domains(struct rdt_resource *r, int closid)
 static int rdtgroup_parse_resource(char *resname, char *tok,
   struct rdtgroup *rdtgrp)
 {
-   struct rdt_hw_resource *hw_res;
struct rdt_resource *r;
 
for_each_alloc_enabled_rdt_resource(r) {
-   hw_res = resctrl_to_arch_res(r);
-		if (!strcmp(resname, r->name) && rdtgrp->closid < hw_res->num_closid)
+   if (!strcmp(resname, r->name) &&
+rdtgrp->closid < resctrl_arch_get_num_closid(r))
return parse_line(tok, r, rdtgrp);
}
rdt_last_cmd_printf("Unknown or unsupported resource name '%s'\n", 
resname);
@@ -400,7 +399,6 @@ static void show_doms(struct seq_file *s, struct rdt_resource *r, int closid)
 int rdtgroup_schemata_show(struct kernfs_open_file *of,
   struct seq_file *s, void *v)
 {
-   struct rdt_hw_resource *hw_res;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
int ret = 0;
@@ -425,8 +423,7 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
} else {
closid = rdtgrp->closid;
for_each_alloc_enabled_rdt_resource(r) {
-   hw_res = resctrl_to_arch_res(r);
-   if (closid < hw_res->num_closid)
+   if (closid < resctrl_arch_get_num_closid(r))
show_doms(s, r, closid);
}
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index b55861ff4e34..df10135f021e 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -100,15 +100,13 @@ int closids_supported(void)
 
 static void closid_init(void)
 {
-   struct rdt_hw_resource *hw_res;
+   u32 rdt_min_closid = 32;
struct rdt_resource *r;
-   int rdt_min_closid = 32;
 
/* Compute rdt_min_closid across all resources */
-   for_each_alloc_enabled_rdt_resource(r) {
-   hw_res = resctrl_to_arch_res(r);
-   rdt_min_closid = min(rdt_min_closid, hw_res->num_closid);
-   }
+   for_each_alloc_enabled_rdt_resource(r)
+   rdt_min_closid = min(rdt_min_closid,
+resctrl_arch_get_num_closid(r));
 
closid_free_map = BIT_MASK(rdt_min_closid) - 1;
 
@@ -847,10 +845,8 @@ static int rdt_num_closids_show(struct kernfs_open_file *of,
struct seq_file *seq, void *v)
 {
struct rdt_resource *r = of->kn->parent->priv;
-   struct rdt_hw_resource *hw_res;
 
-   hw_res = resctrl_to_arch_res(r);
-   seq_printf(seq, "%d\n", hw_res->num_closid);
+   seq_printf(seq, "%d\n", resctrl_arch_get_num_closid(r));
return 0;
 }
 
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index f5af59b8f2a9..dfb0f32b73a1 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h

[PATCH 06/24] x86/resctrl: Store the effective num_closid in the schema

2020-10-30 Thread James Morse
resctrl_schema holds properties that vary with the style of configuration
that resctrl applies to a resource.

Once the arch code has a single resource per cache that can be configured,
resctrl will need to keep track of the num_closid itself.

Add num_closid to resctrl_schema. Change callers like
rdtgroup_schemata_show() to walk the schema instead.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 13 -
 arch/x86/kernel/cpu/resctrl/rdtgroup.c| 11 +--
 include/linux/resctrl.h   |  2 ++
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 14ea6a40993f..8ac104c634fe 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -286,11 +286,12 @@ int update_domains(struct rdt_resource *r, int closid)
 static int rdtgroup_parse_resource(char *resname, char *tok,
   struct rdtgroup *rdtgrp)
 {
+   struct resctrl_schema *s;
struct rdt_resource *r;
 
-   for_each_alloc_enabled_rdt_resource(r) {
-   if (!strcmp(resname, r->name) &&
-rdtgrp->closid < resctrl_arch_get_num_closid(r))
+   list_for_each_entry(s, _all_schema, list) {
+   r = s->res;
+   if (!strcmp(resname, r->name) && rdtgrp->closid < s->num_closid)
return parse_line(tok, r, rdtgrp);
}
rdt_last_cmd_printf("Unknown or unsupported resource name '%s'\n", 
resname);
@@ -399,6 +400,7 @@ static void show_doms(struct seq_file *s, struct rdt_resource *r, int closid)
 int rdtgroup_schemata_show(struct kernfs_open_file *of,
   struct seq_file *s, void *v)
 {
+   struct resctrl_schema *schema;
struct rdtgroup *rdtgrp;
struct rdt_resource *r;
int ret = 0;
@@ -422,8 +424,9 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
}
} else {
closid = rdtgrp->closid;
-   for_each_alloc_enabled_rdt_resource(r) {
-   if (closid < resctrl_arch_get_num_closid(r))
+   list_for_each_entry(schema, _all_schema, list) {
+   r = schema->res;
+   if (closid < schema->num_closid)
show_doms(s, r, closid);
}
}
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index cb16454a6b0e..1bd785b1920c 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -103,13 +103,12 @@ int closids_supported(void)
 
 static void closid_init(void)
 {
+   struct resctrl_schema *s;
u32 rdt_min_closid = 32;
-   struct rdt_resource *r;
 
/* Compute rdt_min_closid across all resources */
-   for_each_alloc_enabled_rdt_resource(r)
-   rdt_min_closid = min(rdt_min_closid,
-resctrl_arch_get_num_closid(r));
+   list_for_each_entry(s, _all_schema, list)
+   rdt_min_closid = min(rdt_min_closid, s->num_closid);
 
closid_free_map = BIT_MASK(rdt_min_closid) - 1;
 
@@ -848,9 +847,8 @@ static int rdt_num_closids_show(struct kernfs_open_file *of,
struct seq_file *seq, void *v)
 {
struct resctrl_schema *s = of->kn->parent->priv;
-   struct rdt_resource *r = s->res;
 
-   seq_printf(seq, "%d\n", resctrl_arch_get_num_closid(r));
+   seq_printf(seq, "%d\n", s->num_closid);
return 0;
 }
 
@@ -2142,6 +2140,7 @@ static int create_schemata_list(void)
return -ENOMEM;
 
s->res = r;
+   s->num_closid = resctrl_arch_get_num_closid(r);
 
INIT_LIST_HEAD(>list);
list_add(>list, _all_schema);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index de6cbc725753..b32152968bca 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -166,10 +166,12 @@ struct rdt_resource {
 /**
  * @list:  Member of resctrl's schema list
  * @res:   The rdt_resource for this entry
+ * @num_closid:	Number of CLOSIDs available for this resource
  */
 struct resctrl_schema {
struct list_headlist;
struct rdt_resource *res;
+   u32 num_closid;
 };
 
 /* The number of closid supported by this resource regardless of CDP */
-- 
2.28.0



[PATCH 02/24] x86/resctrl: Split struct rdt_domain

2020-10-30 Thread James Morse
resctrl is the defacto Linux ABI for SoC resource partitioning features.
To support it on another architecture, it needs to be abstracted from
Intel RDT, and moved to /fs/.

Split struct rdt_domain up too. Move everything that is particular
to resctrl into a new header file. resctrl code paths touching a 'hw'
struct indicates where an abstraction is needed.

No change in behaviour, this patch just moves types around.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/core.c| 32 +++---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 10 --
 arch/x86/kernel/cpu/resctrl/internal.h| 40 +--
 arch/x86/kernel/cpu/resctrl/monitor.c |  8 +++--
 arch/x86/kernel/cpu/resctrl/rdtgroup.c| 29 ++--
 include/linux/resctrl.h   | 35 +++-
 6 files changed, 94 insertions(+), 60 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 470661f2eb68..97040a54cc9a 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -385,10 +385,11 @@ static void
 mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
 {
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+   struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
unsigned int i;
 
for (i = m->low; i < m->high; i++)
-   wrmsrl(hw_res->msr_base + i, d->ctrl_val[i]);
+   wrmsrl(hw_res->msr_base + i, hw_dom->ctrl_val[i]);
 }
 
 /*
@@ -410,21 +411,23 @@ mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
struct rdt_resource *r)
 {
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+   struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
unsigned int i;
 
/*  Write the delay values for mba. */
for (i = m->low; i < m->high; i++)
-   wrmsrl(hw_res->msr_base + i, delay_bw_map(d->ctrl_val[i], r));
+		wrmsrl(hw_res->msr_base + i, delay_bw_map(hw_dom->ctrl_val[i], r));
 }
 
 static void
 cat_wrmsr(struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)
 {
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+   struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
unsigned int i;
 
for (i = m->low; i < m->high; i++)
-   wrmsrl(hw_res->msr_base + cbm_idx(r, i), d->ctrl_val[i]);
+   wrmsrl(hw_res->msr_base + cbm_idx(r, i), hw_dom->ctrl_val[i]);
 }
 
 struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r)
@@ -510,21 +513,22 @@ void setup_default_ctrlval(struct rdt_resource *r, u32 *dc, u32 *dm)
 static int domain_setup_ctrlval(struct rdt_resource *r, struct rdt_domain *d)
 {
struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
+   struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d);
struct msr_param m;
u32 *dc, *dm;
 
-	dc = kmalloc_array(hw_res->num_closid, sizeof(*d->ctrl_val), GFP_KERNEL);
+	dc = kmalloc_array(hw_res->num_closid, sizeof(*hw_dom->ctrl_val), GFP_KERNEL);
if (!dc)
return -ENOMEM;
 
-	dm = kmalloc_array(hw_res->num_closid, sizeof(*d->mbps_val), GFP_KERNEL);
+	dm = kmalloc_array(hw_res->num_closid, sizeof(*hw_dom->mbps_val), GFP_KERNEL);
if (!dm) {
kfree(dc);
return -ENOMEM;
}
 
-   d->ctrl_val = dc;
-   d->mbps_val = dm;
+   hw_dom->ctrl_val = dc;
+   hw_dom->mbps_val = dm;
setup_default_ctrlval(r, dc, dm);
 
m.low = 0;
@@ -586,6 +590,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 {
int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
struct list_head *add_pos = NULL;
+   struct rdt_hw_domain *hw_dom;
struct rdt_domain *d;
 
	d = rdt_find_domain(r, id, &add_pos);
@@ -599,10 +604,11 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
return;
}
 
-   d = kzalloc_node(sizeof(*d), GFP_KERNEL, cpu_to_node(cpu));
-   if (!d)
+   hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
+   if (!hw_dom)
return;
 
+	d = &hw_dom->resctrl;
	d->id = id;
	cpumask_set_cpu(cpu, &d->cpu_mask);
 
@@ -631,6 +637,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 {
int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
+   struct rdt_hw_domain *hw_dom;
struct rdt_domain *d;
 
d = rdt_find_domain(r, id, NULL);
@@ -638,6 +645,7 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
pr_warn("Couldn't find cache id for CPU %d\n", cpu);
return;
}
+   hw_dom = res

[PATCH 08/24] x86/resctrl: Walk the resctrl schema list instead of an arch list

2020-10-30 Thread James Morse
Now that resctrl has its own list of resources it is using, walk that
list instead of the architectures list. This means resctrl has somewhere
to keep schema properties with the resource that is using them.

Most users of for_each_alloc_enabled_rdt_resource() are per-schema,
and also want a schema property, like the conf_type. Switch these to
walk the schema list. Schema were only created for alloc_enabled
resources so these two lists are currently equivalent.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 38 ++-
 arch/x86/kernel/cpu/resctrl/internal.h|  6 ++--
 arch/x86/kernel/cpu/resctrl/rdtgroup.c| 34 +---
 include/linux/resctrl.h   |  5 +--
 4 files changed, 53 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 8ac104c634fe..d3f9d142f58a 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -57,9 +57,10 @@ static bool bw_validate(char *buf, unsigned long *data, struct rdt_resource *r)
return true;
 }
 
-int parse_bw(struct rdt_parse_data *data, struct rdt_resource *r,
+int parse_bw(struct rdt_parse_data *data, struct resctrl_schema *s,
 struct rdt_domain *d)
 {
+   struct rdt_resource *r = s->res;
unsigned long bw_val;
 
if (d->have_new_ctrl) {
@@ -125,10 +126,11 @@ static bool cbm_validate(char *buf, u32 *data, struct rdt_resource *r)
  * Read one cache bit mask (hex). Check that it is valid for the current
  * resource type.
  */
-int parse_cbm(struct rdt_parse_data *data, struct rdt_resource *r,
+int parse_cbm(struct rdt_parse_data *data, struct resctrl_schema *s,
  struct rdt_domain *d)
 {
struct rdtgroup *rdtgrp = data->rdtgrp;
+   struct rdt_resource *r = s->res;
u32 cbm_val;
 
if (d->have_new_ctrl) {
@@ -160,12 +162,12 @@ int parse_cbm(struct rdt_parse_data *data, struct rdt_resource *r,
 * The CBM may not overlap with the CBM of another closid if
 * either is exclusive.
 */
-   if (rdtgroup_cbm_overlaps(r, d, cbm_val, rdtgrp->closid, true)) {
+   if (rdtgroup_cbm_overlaps(s, d, cbm_val, rdtgrp->closid, true)) {
rdt_last_cmd_puts("Overlaps with exclusive group\n");
return -EINVAL;
}
 
-   if (rdtgroup_cbm_overlaps(r, d, cbm_val, rdtgrp->closid, false)) {
+   if (rdtgroup_cbm_overlaps(s, d, cbm_val, rdtgrp->closid, false)) {
if (rdtgrp->mode == RDT_MODE_EXCLUSIVE ||
rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
rdt_last_cmd_puts("Overlaps with other group\n");
@@ -185,9 +187,10 @@ int parse_cbm(struct rdt_parse_data *data, struct rdt_resource *r,
  * separated by ";". The "id" is in decimal, and must match one of
  * the "id"s for this resource.
  */
-static int parse_line(char *line, struct rdt_resource *r,
+static int parse_line(char *line, struct resctrl_schema *s,
  struct rdtgroup *rdtgrp)
 {
+   struct rdt_resource *r = s->res;
struct rdt_parse_data data;
char *dom = NULL, *id;
struct rdt_domain *d;
@@ -213,7 +216,8 @@ static int parse_line(char *line, struct rdt_resource *r,
if (d->id == dom_id) {
data.buf = dom;
data.rdtgrp = rdtgrp;
-   if (r->parse_ctrlval(, r, d))
+
+   if (r->parse_ctrlval(, s, d))
return -EINVAL;
if (rdtgrp->mode ==  RDT_MODE_PSEUDO_LOCKSETUP) {
/*
@@ -289,10 +293,12 @@ static int rdtgroup_parse_resource(char *resname, char *tok,
struct resctrl_schema *s;
struct rdt_resource *r;
 
+	lockdep_assert_held(&rdtgroup_mutex);
+
	list_for_each_entry(s, &resctrl_all_schema, list) {
r = s->res;
if (!strcmp(resname, r->name) && rdtgrp->closid < s->num_closid)
-   return parse_line(tok, r, rdtgrp);
+   return parse_line(tok, s, rdtgrp);
}
rdt_last_cmd_printf("Unknown or unsupported resource name '%s'\n", 
resname);
return -EINVAL;
@@ -301,6 +307,7 @@ static int rdtgroup_parse_resource(char *resname, char *tok,
 ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
 {
+   struct resctrl_schema *s;
struct rdtgroup *rdtgrp;
struct rdt_domain *dom;
struct rdt_resource *r;
@@ -331,8 +338,8 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
goto out;
}
 
-   for_each_alloc_enabled_rdt_resource(r) {
-   list_fo

[PATCH 05/24] x86/resctrl: Pass the schema in resdir's private pointer

2020-10-30 Thread James Morse
Moving properties that resctrl exposes to user-space into the core
'fs' code, (e.g. the name of the schema), means some of the functions
that back the filesystem need the schema struct, but currently take the
resource.

Once the CDP resources are merged, the resource doesn't reflect the
right level of information.

For the info dirs that represent a control, the information needed
is in the schema, as this is how the resource is being used. For the
monitors, it's the resource, as L3CODE_MON doesn't make sense and would
monitor data too.

This difference means the type of the private pointers varies
between control and monitor info dirs.

If the flags are RF_MON_INFO, it's a struct rdt_resource. If the
flags are RF_CTRL_INFO, it's a struct resctrl_schema. Nothing in
res_common_files[] has both flags.
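
A sketch of the resulting convention (names assumed, for illustration
only):

| /*
|  * Sketch: recover the right type from the info dir's private
|  * pointer, depending on which flag the entry was registered with.
|  */
| static struct rdt_resource *info_to_resource(void *priv,
| 					     unsigned long fflags)
| {
| 	if (fflags & RF_MON_INFO)
| 		return priv;			/* monitors: rdt_resource */
|
| 	if (fflags & RF_CTRL_INFO) {
| 		struct resctrl_schema *s = priv;	/* controls: schema */
|
| 		return s->res;
| 	}
|
| 	return NULL;
| }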

Signed-off-by: James Morse 

---
Fake schema for monitors may simplify this if anyone thinks that is
preferable.
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 37 +-
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index f79a5e548138..cb16454a6b0e 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -847,7 +847,8 @@ static int rdt_last_cmd_status_show(struct kernfs_open_file *of,
 static int rdt_num_closids_show(struct kernfs_open_file *of,
struct seq_file *seq, void *v)
 {
-   struct rdt_resource *r = of->kn->parent->priv;
+   struct resctrl_schema *s = of->kn->parent->priv;
+   struct rdt_resource *r = s->res;
 
seq_printf(seq, "%d\n", resctrl_arch_get_num_closid(r));
return 0;
@@ -856,7 +857,8 @@ static int rdt_num_closids_show(struct kernfs_open_file *of,
 static int rdt_default_ctrl_show(struct kernfs_open_file *of,
 struct seq_file *seq, void *v)
 {
-   struct rdt_resource *r = of->kn->parent->priv;
+   struct resctrl_schema *s = of->kn->parent->priv;
+   struct rdt_resource *r = s->res;
 
seq_printf(seq, "%x\n", r->default_ctrl);
return 0;
@@ -865,7 +867,8 @@ static int rdt_default_ctrl_show(struct kernfs_open_file *of,
 static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
 struct seq_file *seq, void *v)
 {
-   struct rdt_resource *r = of->kn->parent->priv;
+   struct resctrl_schema *s = of->kn->parent->priv;
+   struct rdt_resource *r = s->res;
 
seq_printf(seq, "%u\n", r->cache.min_cbm_bits);
return 0;
@@ -874,7 +877,8 @@ static int rdt_min_cbm_bits_show(struct kernfs_open_file *of,
 static int rdt_shareable_bits_show(struct kernfs_open_file *of,
   struct seq_file *seq, void *v)
 {
-   struct rdt_resource *r = of->kn->parent->priv;
+   struct resctrl_schema *s = of->kn->parent->priv;
+   struct rdt_resource *r = s->res;
 
seq_printf(seq, "%x\n", r->cache.shareable_bits);
return 0;
@@ -897,13 +901,14 @@ static int rdt_shareable_bits_show(struct kernfs_open_file *of,
 static int rdt_bit_usage_show(struct kernfs_open_file *of,
  struct seq_file *seq, void *v)
 {
-   struct rdt_resource *r = of->kn->parent->priv;
+   struct resctrl_schema *s = of->kn->parent->priv;
/*
 * Use unsigned long even though only 32 bits are used to ensure
 * test_bit() is used safely.
 */
unsigned long sw_shareable = 0, hw_shareable = 0;
unsigned long exclusive = 0, pseudo_locked = 0;
+   struct rdt_resource *r = s->res;
struct rdt_domain *dom;
int i, hwb, swb, excl, psl;
enum rdtgrp_mode mode;
@@ -975,7 +980,8 @@ static int rdt_bit_usage_show(struct kernfs_open_file *of,
 static int rdt_min_bw_show(struct kernfs_open_file *of,
 struct seq_file *seq, void *v)
 {
-   struct rdt_resource *r = of->kn->parent->priv;
+   struct resctrl_schema *s = of->kn->parent->priv;
+   struct rdt_resource *r = s->res;
 
seq_printf(seq, "%u\n", r->membw.min_bw);
return 0;
@@ -1006,7 +1012,8 @@ static int rdt_mon_features_show(struct kernfs_open_file *of,
 static int rdt_bw_gran_show(struct kernfs_open_file *of,
 struct seq_file *seq, void *v)
 {
-   struct rdt_resource *r = of->kn->parent->priv;
+   struct resctrl_schema *s = of->kn->parent->priv;
+   struct rdt_resource *r = s->res;
 
seq_printf(seq, "%u\n", r->membw.bw_gran);
return 0;
@@ -1015,7 +1022,8 @@ static int rdt_bw_gran_show(struct kernfs_open_file *of,
 static int rdt_delay_linear_show(struct kernfs_open_file *of,
 

[PATCH 01/24] x86/resctrl: Split struct rdt_resource

2020-10-30 Thread James Morse
resctrl is the defacto Linux ABI for SoC resource partitioning features.
To support it on another architecture, it needs to be abstracted from
Intel RDT, and moved to /fs/.

Start by splitting struct rdt_resource, (the name is kept to keep the noise
down), and add some type-trickery to keep the foreach helpers working.

Move everything that is particular
file, keeping the x86 hardware accessors where they are. resctrl code
paths touching a 'hw' struct indicates where an abstraction is needed.

Splitting rdt_domain up in a similar way happens in the next patch.
No change in behaviour, this patch just moves types around.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/core.c| 258 --
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  14 +-
 arch/x86/kernel/cpu/resctrl/internal.h| 138 +++-
 arch/x86/kernel/cpu/resctrl/monitor.c |  32 +--
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |   4 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c|  69 +++---
 include/linux/resctrl.h   | 117 ++
 7 files changed, 362 insertions(+), 270 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index e5f4ee8f4c3b..470661f2eb68 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -57,120 +57,134 @@ static void
 mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m,
  struct rdt_resource *r);
 
-#define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].domains)
+#define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].resctrl.domains)
 
-struct rdt_resource rdt_resources_all[] = {
+struct rdt_hw_resource rdt_resources_all[] = {
[RDT_RESOURCE_L3] =
{
-   .rid= RDT_RESOURCE_L3,
-   .name   = "L3",
-   .domains= domain_init(RDT_RESOURCE_L3),
+   .resctrl = {
+   .rid= RDT_RESOURCE_L3,
+   .name   = "L3",
+   .cache_level= 3,
+   .cache = {
+   .min_cbm_bits   = 1,
+   .cbm_idx_mult   = 1,
+   .cbm_idx_offset = 0,
+   },
+   .domains= domain_init(RDT_RESOURCE_L3),
+   .parse_ctrlval  = parse_cbm,
+   .format_str = "%d=%0*x",
+   .fflags = RFTYPE_RES_CACHE,
+   },
.msr_base   = MSR_IA32_L3_CBM_BASE,
.msr_update = cat_wrmsr,
-   .cache_level= 3,
-   .cache = {
-   .min_cbm_bits   = 1,
-   .cbm_idx_mult   = 1,
-   .cbm_idx_offset = 0,
-   },
-   .parse_ctrlval  = parse_cbm,
-   .format_str = "%d=%0*x",
-   .fflags = RFTYPE_RES_CACHE,
},
[RDT_RESOURCE_L3DATA] =
{
-   .rid= RDT_RESOURCE_L3DATA,
-   .name   = "L3DATA",
-   .domains= domain_init(RDT_RESOURCE_L3DATA),
+   .resctrl = {
+   .rid= RDT_RESOURCE_L3DATA,
+   .name   = "L3DATA",
+   .cache_level= 3,
+   .cache = {
+   .min_cbm_bits   = 1,
+   .cbm_idx_mult   = 2,
+   .cbm_idx_offset = 0,
+   },
+			.domains	= domain_init(RDT_RESOURCE_L3DATA),
+   .parse_ctrlval  = parse_cbm,
+   .format_str = "%d=%0*x",
+   .fflags = RFTYPE_RES_CACHE,
+   },
.msr_base   = MSR_IA32_L3_CBM_BASE,
.msr_update = cat_wrmsr,
-   .cache_level= 3,
-   .cache = {
-   .min_cbm_bits   = 1,
-   .cbm_idx_mult   = 2,
-   .cbm_idx_offset = 0,
-   },
-   .parse_ctrlval  = parse_cbm,
-   .format_str = "%d=%0*x",
-   .fflags = RFTYPE_RES_CACHE,
},
[RDT_RESOURCE_L3CODE] =
{
-   .rid= RDT_RESOURCE_L3CODE,
-   .name   = "L3CODE",
-   .domains= d

[PATCH 04/24] x86/resctrl: Add a separate schema list for resctrl

2020-10-30 Thread James Morse
To support multiple architectures, the resctrl code needs to be split
into a 'fs' specific part in core code, and an arch-specific backend.

It should be difficult for the arch-specific backends to diverge,
supporting slightly different ABIs for user-space. For example,
generating, parsing and validating the schema configuration values
should be done in what becomes the core code to prevent divergence.
Today, the schema emerge from which entries in the rdt_resources_all
array the arch code has chosen to enable.

Start by creating a struct resctrl_schema, which will eventually hold
the name and pending configuration values for resctrl.

Signed-off-by: James Morse 
---
 arch/x86/kernel/cpu/resctrl/internal.h |  1 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 43 +-
 include/linux/resctrl.h|  9 ++
 3 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index f7aab9245259..682e84aebd14 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -106,6 +106,7 @@ extern unsigned int resctrl_cqm_threshold;
 extern bool rdt_alloc_capable;
 extern bool rdt_mon_capable;
 extern unsigned int rdt_mon_features;
+extern struct list_head resctrl_all_schema;
 
 enum rdt_group_type {
RDTCTRL_GROUP = 0,
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index df10135f021e..f79a5e548138 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -39,6 +39,9 @@ static struct kernfs_root *rdt_root;
 struct rdtgroup rdtgroup_default;
 LIST_HEAD(rdt_all_groups);
 
+/* list of entries for the schemata file */
+LIST_HEAD(resctrl_all_schema);
+
 /* Kernel fs node for "info" directory under root */
 static struct kernfs_node *kn_info;
 
@@ -2117,6 +2120,35 @@ static int rdt_enable_ctx(struct rdt_fs_context *ctx)
return ret;
 }
 
+static int create_schemata_list(void)
+{
+   struct resctrl_schema *s;
+   struct rdt_resource *r;
+
+   for_each_alloc_enabled_rdt_resource(r) {
+   s = kzalloc(sizeof(*s), GFP_KERNEL);
+   if (!s)
+   return -ENOMEM;
+
+   s->res = r;
+
+		INIT_LIST_HEAD(&s->list);
+		list_add(&s->list, &resctrl_all_schema);
+   }
+
+   return 0;
+}
+
+static void destroy_schemata_list(void)
+{
+   struct resctrl_schema *s, *tmp;
+
+	list_for_each_entry_safe(s, tmp, &resctrl_all_schema, list) {
+		list_del(&s->list);
+   kfree(s);
+   }
+}
+
 static int rdt_get_tree(struct fs_context *fc)
 {
struct rdt_fs_context *ctx = rdt_fc2context(fc);
@@ -2138,11 +2170,17 @@ static int rdt_get_tree(struct fs_context *fc)
if (ret < 0)
goto out_cdp;
 
+   ret = create_schemata_list();
+   if (ret) {
+   destroy_schemata_list();
+   goto out_mba;
+   }
+
closid_init();
 
ret = rdtgroup_create_info_dir(rdtgroup_default.kn);
if (ret < 0)
-   goto out_mba;
+   goto out_schemata_free;
 
if (rdt_mon_capable) {
ret = mongroup_create_dir(rdtgroup_default.kn,
@@ -2194,6 +2232,8 @@ static int rdt_get_tree(struct fs_context *fc)
kernfs_remove(kn_mongrp);
 out_info:
kernfs_remove(kn_info);
+out_schemata_free:
+   destroy_schemata_list();
 out_mba:
if (ctx->enable_mba_mbps)
set_mba_sc(false);
@@ -2439,6 +2479,7 @@ static void rdt_kill_sb(struct super_block *sb)
rmdir_all_sub();
rdt_pseudo_lock_release();
rdtgroup_default.mode = RDT_MODE_SHAREABLE;
+   destroy_schemata_list();
static_branch_disable_cpuslocked(_alloc_enable_key);
static_branch_disable_cpuslocked(_mon_enable_key);
static_branch_disable_cpuslocked(_enable_key);
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index dfb0f32b73a1..de6cbc725753 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -163,6 +163,15 @@ struct rdt_resource {
 
 };
 
+/**
+ * @list:  Member of resctrl's schema list
+ * @res:   The rdt_resource for this entry
+ */
+struct resctrl_schema {
+   struct list_headlist;
+   struct rdt_resource *res;
+};
+
 /* The number of closid supported by this resource regardless of CDP */
 u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
 
-- 
2.28.0



[PATCH 00/24] x86/resctrl: Merge the CDP resources

2020-10-30 Thread James Morse
Hi folks,

This series re-folds the resctrl code so the CDP resources' (L3CODE et al)
behaviour is all contained in the filesystem parts, with a minimum amount
of arch specific code.

Arm have some CPU support for dividing caches into portions, and
applying bandwidth limits at various points in the SoC. The collective term
for these features is MPAM: Memory Partitioning and Monitoring.

MPAM is similar enough to Intel RDT that it should use the defacto Linux
interface: resctrl. This filesystem currently lives under arch/x86, and is
tightly coupled to the architecture.
Ultimately, my plan is to split the existing resctrl code up to have an
arch<->fs abstraction, then move all the bits out to fs/resctrl. From there
MPAM can be wired up.

x86 might have two resources with cache controls, (L2 and L3) but has
extra copies for CDP: L{2,3}{CODE,DATA}, which are marked as enabled
if CDP is enabled for the corresponding cache.

MPAM has an equivalent feature to CDP, but it's a property of the CPU,
not the cache. Resctrl needs to have x86's odd/even behaviour, as that
is the ABI, but this isn't how the MPAM hardware works. It is entirely
possible that an in-kernel user of MPAM would not be using CDP, whereas
resctrl is.
Pretending L3CODE and L3DATA are entirely separate resources is a neat
trick, but doing this is specific to x86.
It also leaves the arch code in control of various parts of the
filesystem ABI: the resource names, and the way the schemata are parsed.
Allowing this stuff to vary between architectures is bad for user space.


This series collapses the CODE/DATA resources, moving all the user-visible
resctrl ABI into the filesystem code. CDP becomes the type of configuration
being applied to a cache. This is done by adding a struct resctrl_schema to
the parts of resctrl that will move to fs. This holds the arch-code resource
that is in use for this schema, along with other properties like the name,
and whether the configuration being applied is CODE/DATA/BOTH.

This lets us fold the extra resources out of the arch code so that they
don't need to be duplicated if the equivalent feature to CDP is missing, or
implemented in a different way.


The first two patches split the resource and domain structs to have an
arch specific 'hw' portion, and the rest that is visible to resctrl.
Future series massage the resctrl code so there are no accesses to 'hw'
structures in the parts of resctrl that will move to fs, providing helpers
where necessary.


Since anyone last looked at this, the CDP property has been made per-resource
instead of global. MPAM will need to make this global in the arch code, as
CODE/DATA closid are based on how the CPU tags traffic, not how the cache
interprets it. resctrl sets CDP enabled on a resource, but reads it back on
each one.
The attempt to keep closids as-used-by-resctrl and closids as-written-to-hw
apart has been dropped.
There are two copies of num_closid. The version private to the arch code is
the value discovered from hardware. resctrl has its own version, which it
may write to, which is exposed to user-space. This lets resctrl do its
odd/even thing, even if that's not how the hardware works.
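
To make the odd/even scheme concrete, a sketch of the mapping (the
helper name is assumed; the values match the existing x86 cbm_idx()
mult/offset convention from patch 01):

| static u32 closid_to_hw_slot(u32 closid, enum resctrl_conf_type t)
| {
| 	switch (t) {
| 	case CDP_DATA:
| 		return closid * 2;	/* even slots */
| 	case CDP_CODE:
| 		return closid * 2 + 1;	/* odd slots */
| 	case CDP_BOTH:
| 	default:
| 		return closid;		/* CDP off: 1:1 */
| 	}
| }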

This series adds temporary scaffolding, which it removes a few patches
later. This is to allow things like the ctrlval arrays and resources to be
merged separately, which should make it easier to bisect. These things
are marked temporary, and should all be gone by the end of the series.

This series is a little rough around the monitors; would a fake
struct resctrl_schema for the monitors simplify things, or be a source
of bugs?

This series is based on v5.10-rc1, and can be retrieved from:
git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git 
mpam/resctrl_merge_cdp/v1

Parts were previously posted as an RFC here:
https://lore.kernel.org/lkml/20200214182947.39194-1-james.mo...@arm.com/


Thanks,

James Morse (24):
  x86/resctrl: Split struct rdt_resource
  x86/resctrl: Split struct rdt_domain
  x86/resctrl: Add resctrl_arch_get_num_closid()
  x86/resctrl: Add a separate schema list for resctrl
  x86/resctrl: Pass the schema in resdir's private pointer
  x86/resctrl: Store the effective num_closid in the schema
  x86/resctrl: Label the resources with their configuration type
  x86/resctrl: Walk the resctrl schema list instead of an arch list
  x86/resctrl: Change rdt_resource to resctrl_schema in
pseudo_lock_region
  x86/resctrl: Move the schema names into struct resctrl_schema
  x86/resctrl: Group staged configuration into a separate struct
  x86/resctrl: Add closid to the staged config
  x86/resctrl: Allow different CODE/DATA configurations to be staged
  x86/resctrl: Make update_domains() learn the affected closids
  x86/resctrl: Add a helper to read a closid's configuration
  x86/resctrl: Add a helper to read/set the CDP configuration
  x86/resctrl: Use cdp_enabled in rdt_domain_reconfigure_cdp()
  x86/resctrl: Pass configuration type to resctrl_arch_get_

Re: Queries on ARM SDEI Linux kernel code

2020-10-30 Thread James Morse
Hi Neeraj,

On 21/10/2020 18:31, Neeraj Upadhyay wrote:
> On 10/16/2020 9:57 PM, James Morse wrote:
>> On 15/10/2020 07:07, Neeraj Upadhyay wrote:
>>> 1. Looks like interrupt bind interface (SDEI_1_0_FN_SDEI_INTERRUPT_BIND) is 
>>> not available
>>> for clients to use; can you please share information on
>>> why it is not provided?
>>
>> There is no compelling use-case for it, and its very complex to support as 
>> the driver can
>> no longer hide things like hibernate.
>>
>> Last time I looked, it looked like the SDEI driver would need to ask the 
>> irqchip to
>> prevent modification while firmware re-configures the irq. I couldn't work 
>> out how this
>> would work if the irq is in-progress on another CPU.

> Got it. I will think in this direction, on how to achieve this.

I'm really not keen on supporting it! It's basically unusable.


>> The reasons to use bound-interrupts can equally be supported with an event 
>> provided by
>> firmware.
>>
>>
> Ok, I will explore in that direction.

Great!


[...]

>> Ideally the driver would register the event, and provide a call_on_cpu() 
>> helper to trigger
>> it. This should fit in with however the GIC's PMR based NMI does its PPI 
>> based
>> crash/stacktrace call so that the caller doesn't need to know if its back by 
>> IRQ, pNMI or
>> SDEI.

> Ok; I will explore how PMR based NMIs work; I thought it was SGI based. But 
> will recheck.

This is where the recent work has been. One of Julien's cover-letters
describes it as
supporting PPI and SPI: https://lwn.net/Articles/755906/


>>> 3. Can kernel panic() be triggered from sdei event handler?
>>
>> Yes,
>>
>>
>>> Is it a safe operation?
>>
>> panic() wipes out the machine... did you expect it to keep running?
> 
> I wanted to check the case where panic triggers kexec/kdump path into capture 
> kernel.
> 
>> What does safe mean here?

> I think I didn't put it correctly; I meant what possible scenarios can 
> happen in this case and you explained one below, thanks!

Ah, kdump. You will certainly get into the kdump kernel, but I think the SDEI 
reset calls
will fail as there is still an event in progress, so the kernel will leave it 
masked to
prevent any new events being taken.
This shouldn't affect kdump's work of dumping memory, and calling reset.


>> You should probably call nmi_panic() if there is the risk that the event 
>> occurred during
>> panic() on the same CPU, as it would otherwise just block.
>>
>>
>>> The spec says, synchronous exceptions should not be triggered; I think panic
>>> won't do it; but anything which triggers a WARN
>>> or other sync exception in that path can cause undefined behavior. Can you 
>>> share your
>>> thoughts on this?
>>
>> What do you mean by undefined behaviour?

> I was thinking, if SDEI event preempts EL1, at the point, where EL1 has just 
> entered an
> exception, and hasn't captured the registers like spsr_el1, elr_el1 and other 
> registers,
> what will be the behavior?

Exceptions to/from EL3 don't affect them, so those registers keep their value 
until the
next exception taken by EL1 (which overwrites them), or ERET from EL1 (which 
makes them
UNKNOWN). Kdump may change exception level on nVHE systems to do a reset.

If you need them for kdump, they should be saved in the crash handler... they 
aren't
needed in the general case as we learn what those values were by unwinding the 
stack,
which is also true for SDEI. (it gets the original PC from firmware to build a 
stack frame)

[...]

>>> "The handler code should not enable asynchronous exceptions by clearing any 
>>> of the
>>> PSTATE.DAIF bits, and should not cause synchronous exceptions to the client 
>>> Exception
>>> level."
>>
>>
>> What are you using this thing for?

> Usecase is, a watchdog SPI interrupt, which we want to bound to a SDEI event. 
> Below is the
> flow:
> 
> wdog expiry -> SDEI event -> HLOS panic -> trigger kexec/kdump

Having a common interface to this would be a good thing; that way firmware can
hide how it's implemented.


Thanks,

James


Re: Queries on ARM SDEI Linux kernel code

2020-10-16 Thread James Morse
Hi Neeraj,

On 15/10/2020 07:07, Neeraj Upadhyay wrote:
> 1. Looks like interrupt bind interface (SDEI_1_0_FN_SDEI_INTERRUPT_BIND) is 
> not available
> for clients to use; can you please share information on
> why it is not provided?

There is no compelling use-case for it, and it's very complex to support, as
the driver can no longer hide things like hibernate.

Last time I looked, it looked like the SDEI driver would need to ask the 
irqchip to
prevent modification while firmware re-configures the irq. I couldn't work out 
how this
would work if the irq is in-progress on another CPU.

The reasons to use bound-interrupts can equally be supported with an event 
provided by
firmware.


> While trying to dig information on this, I saw  that [1] says:
>   Now the hotplug callbacks save  nothing, and restore the OS-view of 
> registered/enabled.
> This makes bound-interrupts harder to work with.

> Based on this comment, the changes from v4 [2], which I could understand is, 
> cpu down path
> does not save the current event enable status, and we rely on the enable 
> status
> `event->reenable', which is set, when register/unregister, enable/disable 
> calls are made;
> this enable status is used during cpu up path, to decide whether to reenable 
> the interrupt.

> Does this make, bound-interrupts harder to work with? how? Can you please 
> explain? Or
> above save/restore is not the reason and you meant something else?

If you bind a level-triggered interrupt, how does firmware know how to clear 
the interrupt
from whatever is generating it?

What happens if the OS can't do this either, as it needs to allocate memory, or 
take a
lock, which it can't do in nmi context?


The people that wrote the SDEI spec's answer to this was that the handler can 
disable the
event from inside the handler... and firmware will do, something, to stop the 
interrupt
screaming.

So now an event can become disabled any time it's registered, which makes it more
complicated to save/restore.


> Also, does shared bound interrupts 

Shared-interrupts as an NMI made me jump. But I think you mean a bound 
interrupt as a
shared event, i.e. an SPI, not a PPI.


> also have the same problem, as save/restore behavior
> was only for private events?

See above, the problem is the event disabling itself.

Additionally those changes to unregister the private-event mean the code can't 
tell the
difference between cpuhp and hibernate... only hibernate additionally loses the 
state in
firmware.


> 2. SDEI_EVENT_SIGNAL api is not provided? What is the reason for it? Its 
> handling has the
> same problems, which are there for bound interrupts?

It's not supported as no-one showed up with a use-case.
While firmware is expected to back it with a PPI, it doesn't have the same
problems as bound-interrupts, as it's not an interrupt the OS ever knows about.


> Also, if it is provided, clients need to register event 0 ? Vendor events or 
> other event
> nums are not supported, as per spec.

Ideally the driver would register the event, and provide a call_on_cpu() helper 
to trigger
it. This should fit in with however the GIC's PMR based NMI does its PPI based
crash/stacktrace call so that the caller doesn't need to know if it's backed by 
IRQ, pNMI or
SDEI.
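Something like this un-tested sketch, using the driver's internal
invoke_sdei_fn() helper:
| static int sdei_event_signal_cpu(unsigned int cpu)
| {
| 	/* SDEI_EVENT_SIGNAL: event number 0, target PE by mpidr affinity */
| 	return invoke_sdei_fn(SDEI_1_0_FN_SDEI_EVENT_SIGNAL, 0,
| 			      cpu_logical_map(cpu), 0, 0, 0, NULL);
| }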


> 3. Can kernel panic() be triggered from sdei event handler?

Yes,


> Is it a safe operation?

panic() wipes out the machine... did you expect it to keep running?
What does safe mean here?

You should probably call nmi_panic() if there is the risk that the event 
occurred during
panic() on the same CPU, as it would otherwise just block.
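i.e. in the handler, something like:
| /* panic(), unless a panic() is already in progress on another CPU */
| nmi_panic(regs, "SDEI: fatal event");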


> The spec says, synchronous exceptions should not be triggered; I think panic
> won't do it; but anything which triggers a WARN
> or other sync exception in that path can cause undefined behavior. Can you 
> share your
> thoughts on this?

What do you mean by undefined behaviour?

SDEI was originally to report external abort to the OS in regions where the OS 
can't take
an exception because the exception-registers are live, just after an exception
and just before ERET.

If you take another exception from the NMI handler, chances are you're going to 
go back
round the loop again, only this time firmware can't inject the SDEI event, so 
it has to
reboot.

If you know it might cause an exception, you shouldn't do it in NMI context.


> "The handler code should not enable asynchronous exceptions by clearing any 
> of the
> PSTATE.DAIF bits, and should not cause synchronous exceptions to the client 
> Exception level."


What are you using this thing for?


Thanks,

James


Re: [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on short time period

2020-10-07 Thread James Morse
Hi Shiju,

On 06/10/2020 17:13, Shiju Jose wrote:

[...]

> Please find following pseudo code we added for the kernel side to make sure
> we correctly understand your suggestions.
> 
> 1. Create edac device and edac device sysfs entries for the online CPU caches.
> /drivers/edac/edac_device.c
> struct edac_device_ctl_info  *edac_device_add_cache(unsigned int id, u8 
> level, u8 type) {

Eh? Ah, you are adding helpers for devices that are a cache. As far as I can
see, edac only cares about 'devices'; I don't think this is needed unless there
are multiple users, or it makes a visible difference to user-space.

Otherwise this could just go into ghes_edac.c


How this looks to user-space probably needs discussing. We should avoid 
inventing anything
new. I'd expect user-space to see something like the structure described at the 
top of
edac_device.h... but I can't spot a driver using this.
(it's a shame it's not the other way up, to avoid duplicating shared caches)

Some archaeology may be needed!

(if there is some existing structure, I agree it should be wrapped up in
helpers to ensure it's easiest to keep the same. This may be what an
edac_device_block is...)


>  }


> /drivers/base/cacheinfo.c
> int cache_create_edac_entries(u64 mpidr, u8 cache_level, u8 cache_type)
> { 
>   ...
>   /* Get cacheinfo for each online cpus */
>   for_each_online_cpu(i) {
>   struct cpu_cacheinfo *cpu_ci = get_cpu_cacheinfo(i);

I agree the structure of the caches should come from cacheinfo, and you spotted
it only works for online CPUs! (This means there is an interaction with cpuhp
here.)


>   if (!cpu_ci || !cpu_ci->id)

cpu_ci->id? 0 is a valid id; there is an attribute flag to say whether it is valid.
This field exists in struct cacheinfo, not struct cpu_cacheinfo.


>   continue;
>   ... 
>   /*Add  the edac entry for the CPU cache */
>   edev_cache = edac_device_add_cache(cpu_ci->id, cpu_ci->level, cpu_ci->type)
>   if (!edev_cache)
>   break;

This would break all other edac users.
The edac driver for the platform should take care of creating this stuff, not 
the core
cacheinfo code.

The edac driver for the platform may know that L2 doesn't report RAS errors, so 
there is
no point exposing it.

For firmware-first, we can't know this until an error shows up, so we have to
create everything. This stuff should only be created/exported when ghes_edac.c
is determined to be this platform's edac driver. This code should live in
ghes_edac.c.


>   ...
>   }
>   ... 
> }

> unsigned int cache_get_cache_id(u64 proc_id, u8 cache_level, u8 cache_type)

See get_cpu_cacheinfo_id(int cpu, int level) in next. (something very similar 
to this
lived in arch/x86, bits of the MPAM tree that moved it got queued for next)


> { 
>   unsigned int cache_id = 0;
>   ...
>   /* Walk looking for matching cache node */   
>   for_each_online_cpu(i) {

(there is an interaction with cpuhp here)


>   struct cpu_cacheinfo *cpu_ci = get_cpu_cacheinfo(i);
>   if (!cpu_ci || !cpu_ci->id)
>   continue;


>   id = CONV(proc_id);  /* need to check */

No idea what is going on here.

(Deriving an ID from the CPUs that are attached to the cache is arm64 specific.
This has to work for x86 too.
The MPAM out-of-tree code does this as we don't have anything else. Feedback 
when it was
posted as RFC was that the id values should be compacted; I was hoping we would
get something like an id from the PPTT before needing this value as resctrl ABI
for MPAM)


>   if((id == cpu_ci->id) && (cache_level == cpu_ci->level) && 
> (cache_type == cpu_ci->type))  {
>   cache_id = cpu_ci->id;
>   break;
>   }
>   }
>   return cache_id;
> }


> 2. Store CPU CE count in the edac sysfs entry for the CPU cache.
> 
> drivers/edac/ghes_edac.c
> void ghes_edac_report_cpu_error(int cache_id, u8 cache_level, u8 cache_type , 
> uint32 ce_count)
> {
>   ...
>   /* Check edac entry for cache already present, if not add new entry */  
>  

You can't create devices at runtime! The notification comes in irq context, and
edac_device_add_device() takes a mutex, and you need to allocate memory.

This could be deferred to process context - but I bet it's a nuisance for
user-space to not know what counters are available until errors show up.
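(i.e. stash what the CPER record told you and kick a work_struct; a rough
sketch:)
| static void create_cache_devs(struct work_struct *work)
| {
| 	/* process context: safe to take mutexes and allocate memory */
| }
| static DECLARE_WORK(create_cache_devs_work, create_cache_devs);
|
| /* from the irq-context notification: */
| schedule_work(&create_cache_devs_work);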


>   edev_cache = find_edac_device_cache(cache_id, cache_level, cache_type);
>   if (!edev_cache) {
>   /*Add  the edac entry for the cache */
>   edev_cache = edac_device_add_cache(cache_id, cache_level, 
> cache_type);
>   if (!edev_cache)
>   return;
>   }

Better would be to look up the device based on the CPER. (It already looks up 
the DIMM
based on the CPER)


>   /* Store 

Re: [PATCH 2/2] arm64: Add support for SMCCC TRNG firmware interface

2020-10-07 Thread James Morse
Hi Andre,

On 06/10/2020 21:18, Andre Przywara wrote:
> The ARM architected TRNG firmware interface, described in ARM spec
> DEN0098[1], defines an ARM SMCCC based interface to a true random number
> generator, provided by firmware.
> This can be discovered via the SMCCC >=v1.1 interface, and provides
> up to 192 bits of entropy per call.
> 
> Hook this SMC call into arm64's arch_get_random_*() implementation,
> coming to the rescue when the CPU does not implement the ARM v8.5 RNG
> system registers.
> 
> For the detection, we piggy back on the PSCI/SMCCC discovery (which gives
> us the conduit to use: hvc or smc), then try to call the
> ARM_SMCCC_TRNG_VERSION function, which returns -1 if this interface is
> not implemented.

>  arch/arm64/include/asm/archrandom.h | 83 +
>  1 file changed, 73 insertions(+), 10 deletions(-)

> diff --git a/arch/arm64/include/asm/archrandom.h 
> b/arch/arm64/include/asm/archrandom.h
> index ffb1a40d5475..b6c291c42a48 100644
> --- a/arch/arm64/include/asm/archrandom.h
> +++ b/arch/arm64/include/asm/archrandom.h
> @@ -7,6 +7,13 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +
> +static enum smc_trng_status {
> + SMC_TRNG_UNKNOWN,
> + SMC_TRNG_NOT_SUPPORTED,
> + SMC_TRNG_SUPPORTED
> +} smc_trng_status = SMC_TRNG_UNKNOWN;

Doesn't this static variable in a header file mean each file that includes this 
has its
own copy? Is that intentional?
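e.g.:
| /* hdr.h */
| static int status;	/* internal linkage: each includer gets its own copy */
|
| /* a.c includes hdr.h and sets its private 'status'... */
| /* ...b.c includes hdr.h and still sees 0 in its own copy. */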


Thanks,

James


Re: [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on short time period

2020-10-02 Thread James Morse
Hi Shiju,

On 02/10/2020 16:38, Shiju Jose wrote:
>> -Original Message-
>> From: Borislav Petkov [mailto:b...@alien8.de]
>> Sent: 02 October 2020 13:44
>> To: Shiju Jose 
>> Cc: linux-e...@vger.kernel.org; linux-a...@vger.kernel.org; linux-
>> ker...@vger.kernel.org; tony.l...@intel.com; r...@rjwysocki.net;
>> james.mo...@arm.com; l...@kernel.org; Linuxarm
>> 
>> Subject: Re: [RFC PATCH 0/7] RAS/CEC: Extend CEC for errors count check on
>> short time period
>>
>> On Fri, Oct 02, 2020 at 01:22:28PM +0100, Shiju Jose wrote:
>>> Open Questions based on the feedback from Boris, 1. ARM processor
>>> error types are cache/TLB/bus errors.
>>>[Reference N2.4.4.1 ARM Processor Error Information UEFI Spec v2.8]
>>> Any of the above error types should not be consider for the error
>>> collection and CPU core isolation?

Boris' earlier example was that Bus errors have very little to do with the CPU.
It may just be that this CPU is handling the IRQs for a faulty device, and thus
receiving the errors. irqbalance could change that at any time.

I'd prefer we just stick with the caches for now.


>>> 2.If disabling entire CPU core is not acceptable, please suggest
>>> method to disable L1 and L2 cache on ARM64 core?

This is not something Linux can do. It may not be possible to do it at all.


>> More open questions:
>>
>>> This requirement is the part of the early fault prediction by taking
>>> action when large number of corrected errors reported on a CPU core
>>> before it causing serious faults.
>>
>> And do you know of actual real-life examples where this is really the case? 
>> Do
>> you have any users who report a large error count on ARM CPUs, originating
>>from the caches and that something like that would really help?
>>
>> Because from my x86 CPUs limited experience, the cache arrays are mostly
>> fine and errors reported there are not something that happens very
>> frequently so we don't even need to collect and count those.
>>
>> So is this something which you need to have in order to check a box
>> somewhere that there is some functionality or is there an actual real-life 
>> use
>> case behind it which a customer has requested?

> We have not got a real-life example for this case. However rare errors
> like this can occur frequently sometimes at scale, which would cause
> more serious issues if not handled.

Don't you need to look across all your 'at scale' machines to know what normal 
looks like?

I can't see how a reasonable prediction can be made from just one machine's 
behaviour
since boot. These are corrected errors, nothing has gone wrong.


>> Open question from James with my reply to it:
>>
>> On Thu, Oct 01, 2020 at 06:16:03PM +0100, James Morse wrote:
>>> If the corrected-count is available somewhere, can't this policy be
>>> made in user-space?

> The error count is present in the struct cper_arm_err_info, the fields of
> this structure  are not reported to the user-space through trace events?

> Presently the fields of table struct cper_sec_proc_arm only are reported 
> to the user-space through trace-arm-event.
> Also there can be multiple cper_arm_err_info per cper_sec_proc_arm.
> Thus I think this need reporting through a new trace event?

I think it would be more useful to feed this into edac like ghes.c already does
for memory errors. These would end up as corrected error counts on devices for
L3 or whatever.

This saves fixing your user-space component to the arm-specific CPER record
format, or even firmware-first, meaning it's useful to the widest number of
people.
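(Once the edac device exists, a corrected cache error becomes little more than
a counter bump; roughly, with 'edev' and 'cache_level' made up:)
| /* cache_level would be decoded from the CPER record's error_info bits */
| edac_device_handle_ce(edev, 0, cache_level, "corrected cache error");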


> Also the logical index of a CPU which I think need to extract from the 
> 'mpidr' field of
> struct cper_sec_proc_arm using platform dependent kernel function 
> get_logical_index().
> Thus cpu index also need to report to the user space.

I thought you were talking about caches. These structures have a 'level' for 
cache errors.

Certainly you need a way of knowing which cache it is, and from that number you
should also be able to work out which CPUs it is attached to.

x86 already has a way of doing this:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/x86/resctrl_ui.rst#n327

arm64 doesn't have anything equivalent, but my current proposal for MPAM is 
here:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git/commit/?h=mpam/snapshot/feb=ce3148bd39509ac8b12f5917f0f92ce014a5b22f

I was hoping the PPTT table would grow something we could use as an ID, but 
I've not seen
anything yet.


>> You mean rasdaemon goes and offlines CPUs when certain thresholds are
>> reached? Sure. It would be much more flexible too.

Re: [PATCH 1/1] RAS: Add CPU Correctable Error Collector to isolate an erroneous CPU core

2020-10-01 Thread James Morse
Hi guys,

On 17/09/2020 09:40, Borislav Petkov wrote:
> On Thu, Sep 10, 2020 at 03:29:56PM +, Shiju Jose wrote:

> You can't know what exactly you wanna do if you don't have a use case
> you're trying to address.
> 
>> According to the ARM Processor CPER definition the error types
>> reported are Cache Error, TLB Error, Bus Error and micro-architectural
>> Error.
> 
> Bus error sounds like not even originating in the CPU but the CPU only
> reporting it. Imagine if that really were the case, and you go disable
> the CPU but the error source is still there. You've just disabled the
> reporting of the error only and now you don't even know anymore that
> you're getting errors.
> 
>> Few thoughts on this,
>> 1. Not sure will a CPU core would work/perform as normal after disabling
>> a functional unit?
> 
> You can disable parts of caches, etc, so that you can have a somewhat
> functioning CPU until the replacement maintenance can take place.

This is implementation-specific stuff that only firmware can do...


>> 2. Support in the HW to disable a function unit alone may not available.
> 
> Yes.
> 
>> 3. If it is require to store and retrieve the error count based on
>> functional unit, then CEC will become more complex?
> 
> Depends on how it is designed. That's why we're first talking about what
> needs to be done exactly before going off and doing something.
> 
>> This requirement is the part of the early fault prediction by taking
>> action when large number of corrected errors reported on a CPU core
>> before it causing serious faults.
> 
> And do you know of actual real-life examples where this is really the
> case? Do you have any users who report a large error count on ARM CPUs,
> originating from the caches and that something like that would really
> help?
> 
> Because from my x86 CPUs limited experience, the cache arrays are mostly
> fine and errors reported there are not something that happens very
> frequently so we don't even need to collect and count those.
> 
> So is this something which you need to have in order to check a box
> somewhere that there is some functionality or is there an actual
> real-life use case behind it which a customer has requested?

If the corrected-count is available somewhere, can't this policy be made in 
user-space?


Thanks,

James


Re: [PATCH v3] ACPI / APEI: do memory failure on the physical address reported by ARM processor error section

2020-10-01 Thread James Morse
Hi Tanxiaofei,

(sorry for the late reply)

On 28/09/2020 03:02, Xiaofei Tan wrote:
> After the commit 8fcc4ae6faf8 ("arm64: acpi: Make apei_claim_sea()
> synchronise with APEI's irq work") applied, do_sea() return directly
> for user-mode if apei_claim_sea() handled any error record. Therefore,
> each error record reported by the user-mode SEA must be effectively
> processed in APEI GHES driver.
> 
> Currently, GHES driver only processes Memory Error Section.(Ignore PCIe
> Error Section, as it has nothing to do with SEA). It is not enough.
> Because ARM Processor Error could also be used for SEA in some hardware
> platforms, such as Kunpeng9xx series. We can't ask them to switch to
> use Memory Error Section for two reasons:
> 1)The server was delivered to customers, and it will introduce
> compatibility issue.

> 2)It make sense to use ARM Processor Error Section. Because either
> cache or memory errors could generate SEA when consumed by a processor.
> 
> Do memory failure handling for ARM Processor Error Section just like
> for Memory Error Section.


> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 99df00f..ca0aa97 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -441,28 +441,35 @@ static void ghes_kick_task_work(struct callback_head 
> *head)

> +static bool ghes_handle_arm_hw_error(struct acpi_hest_generic_data *gdata, 
> int sev)
> +{
> + struct cper_sec_proc_arm *err = acpi_hest_get_payload(gdata);
> + struct cper_arm_err_info *err_info;
> + bool queued = false;
> + int sec_sev, i;
> +
> + log_arm_hw_error(err);
> +
> + sec_sev = ghes_severity(gdata->error_severity);
> + if (sev != GHES_SEV_RECOVERABLE || sec_sev != GHES_SEV_RECOVERABLE)
> + return false;
> +
> + err_info = (struct cper_arm_err_info *) (err + 1);
> + for (i = 0; i < err->err_info_num; i++, err_info++) {

err_info has its own length; could we use that, in case someone comes up with a
new table version? (like this, old versions of the kernel will read mis-aligned
structures)
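e.g. something like:
| 	for (i = 0; i < err->err_info_num; i++,
| 	     err_info = (void *)err_info + err_info->length) {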


> + if (!(err_info->validation_bits & 
> CPER_ARM_INFO_VALID_PHYSICAL_ADDR))
> + continue;
> +
> + if (err_info->type != CPER_ARM_CACHE_ERROR) {
> + pr_warn_ratelimited(FW_WARN GHES_PFX
> + "Physical address should be invalid for %s\n",

Should? A bus-error could have a valid physical address. I can't see anything 
in the spec
that forbids this. In general we shouldn't try to validate what firmware is 
doing.


> + err_info->type < ARRAY_SIZE(cper_proc_error_type_strs) ?
> + cper_proc_error_type_strs[err_info->type] : "unknown 
> error type");
> + continue;
> + }

I think we should warn for the cases this handler doesn't cover, but we should 
try to
catch all of them. e.g.:

|   bool is_cache = (err_info->type == CPER_ARM_CACHE_ERROR);
|   bool has_pa = (err_info->validation_bits &
|                  CPER_ARM_INFO_VALID_PHYSICAL_ADDR);
|
|   if (!is_cache || !has_pa) {
|           pr_warn_ratelimited(..."Unhandled processor error type %s\n", ...);
|   continue;
|   }


For cache errors, (err_info->error_info & BIT(26)) has its own 
corrected/uncorrected flag.
You filter out 'overall corrected' section types earlier; could you check this
error record before invoking memory_failure()?

(sections may contain a set of errors. I'm not convinced a 'corrected section' 
can't
contain latent uncorrected errors, it just means the machine didn't need that 
data yet)
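(un-tested, and the polarity needs checking against the spec:)
| 	/* cache error structure: corrected flag in error_info */
| 	if (err_info->error_info & BIT(26))
| 		continue;	/* corrected, no need for memory_failure() */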



> + if (ghes_do_memory_failure(err_info->physical_fault_addr, 0))
> + queued = true;

May as well:
|   return ghes_do_memory_failure(...);


> + }
> +
> + return queued;

(and make this:
|   return false
)

> +}



Thanks,

James


Re: [PATCH v3 08/16] irqchip/gic: Configure SGIs as standard interrupts

2020-09-18 Thread James Morse
Hi Marc,

(CC: +Jon)

On 01/09/2020 15:43, Marc Zyngier wrote:
> Change the way we deal with GIC SGIs by turning them into proper
> IRQs, and calling into the arch code to register the interrupt range
> instead of a callback.

Your comment "This only works because we don't nest SGIs..." on this thread 
tripped some
bad memories from adding the irq-stack. Softirq causes us to nest irqs, but 
only once.


(I've messed with the below diff to remove the added stuff:)

> diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
> index 4ffd62af888f..4be2b62f816f 100644
> --- a/drivers/irqchip/irq-gic.c
> +++ b/drivers/irqchip/irq-gic.c
> @@ -335,31 +335,22 @@ static void __exception_irq_entry gic_handle_irq(struct 
> pt_regs *regs)
>   irqstat = readl_relaxed(cpu_base + GIC_CPU_INTACK);
>   irqnr = irqstat & GICC_IAR_INT_ID_MASK;
>  
> - if (likely(irqnr > 15 && irqnr < 1020)) {
> - if (static_branch_likely(_deactivate_key))
> - writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);
> - isb();
> - handle_domain_irq(gic->domain, irqnr, regs);
> - continue;
> - }
> - if (irqnr < 16) {
>   writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);
> - if (static_branch_likely(_deactivate_key))
> - writel_relaxed(irqstat, cpu_base + 
> GIC_CPU_DEACTIVATE);
> -#ifdef CONFIG_SMP
> - /*
> -  * Ensure any shared data written by the CPU sending
> -  * the IPI is read after we've read the ACK register
> -  * on the GIC.
> -  *
> -  * Pairs with the write barrier in gic_raise_softirq
> -  */
>   smp_rmb();
> - handle_IPI(irqnr, regs);

If I read this right, previously we would EOI the interrupt before calling 
handle_IPI().
Whereas now with the version of this series in your tree, we stuff the 
to-be-EOId value
in a percpu variable, which is only safe if these don't nest.

Hidden in irq_exit(), kernel/softirq.c::__irq_exit_rcu() has this:
|   preempt_count_sub(HARDIRQ_OFFSET);
|   if (!in_interrupt() && local_softirq_pending())
|   invoke_softirq();

The arch code doesn't raise the preempt counter by HARDIRQ, so once 
__irq_exit_rcu() has
dropped it, in_interrupt() returns false, and we invoke_softirq().

invoke_softirq() -> __do_softirq() -> local_irq_enable()!

Fortunately, __do_softirq() raises the softirq count first using 
__local_bh_disable_ip(),
which in-interrupt() checks too, so this can only happen once per IRQ.

Now the irq_exit() has moved from handle_IPI(), which ran after EOI, into
handle_domain_irq(), which runs before. I think it's possible SGIs nest, and the 
new percpu
variable becomes corrupted.

Presumably this isn't a problem for regular IRQ, as they don't need the 
sending-CPU in
order to EOI, which is why it wasn't a problem before.

Adding anything to preempt-count around the whole thing upsets RCU, and softirq 
seems to
expect this nesting, but evidently the GIC does not. I'm not sure what the 
right thing to
do would be. A dirty hack like [0] would confirm the theory.

/me runs

Thanks,

James



[0] A dirty hack
---%<---
diff --git a/kernel/softirq.c b/kernel/softirq.c
index bf88d7f62433..50e14d8cbec3 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -376,7 +376,7 @@ static inline void invoke_softirq(void)
if (ksoftirqd_running(local_softirq_pending()))
return;

-   if (!force_irqthreads) {
+   if (false) {
 #ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
/*
 * We can safely execute softirq on the current stack if
@@ -393,6 +393,7 @@ static inline void invoke_softirq(void)
do_softirq_own_stack();
 #endif
} else {
+   /* hack: force this */
wakeup_softirqd();
}
 }
---%<---


Re: [PATCH] arm64: traps: clean up arm64_ras_serror_get_severity()

2020-08-25 Thread James Morse
Hi Zhang,

On 12/08/2020 12:09, Liguang Zhang wrote:
> Function arm64_is_fatal_ras_serror() is always called after
> arm64_is_ras_serror(), so we should remove some needless
> arm64_is_ras_serror() call in function arm64_ras_serror_get_severity().

> diff --git a/arch/arm64/include/asm/traps.h b/arch/arm64/include/asm/traps.h
> index cee5928e1b7d..287b4d64dc67 100644
> --- a/arch/arm64/include/asm/traps.h
> +++ b/arch/arm64/include/asm/traps.h
> @@ -79,13 +79,6 @@ static inline bool arm64_is_ras_serror(u32 esr)
>   */
>  static inline u32 arm64_ras_serror_get_severity(u32 esr)
>  {
> - u32 aet = esr & ESR_ELx_AET;
> -
> - if (!arm64_is_ras_serror(esr)) {
> - /* Not a RAS error, we can't interpret the ESR. */
> - return ESR_ELx_AET_UC;
> - }

I agree this can go, it looks like I had it here as a sanity check while the 
KVM bits were
sorted out.

Please also remove the comment that says it does this:
| * Non-RAS SError's are reported as Uncontained/Uncategorized.

This becomes the callers problem.


>   /*
>* AET is RES0 if 'the value returned in the DFSC field is not
>* [ESR_ELx_FSC_SERROR]'

> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 13ebd5ca2070..635d4cca0a4b 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -913,7 +913,7 @@ bool arm64_is_fatal_ras_serror(struct pt_regs *regs, 
> unsigned int esr)
>   case ESR_ELx_AET_UC:/* Uncontainable or Uncategorized error */
>   default:
>   /* Error has been silently propagated */
> - arm64_serror_panic(regs, esr);
> + return true;

KVM depends on this; please don't remove it.

What does 'fatal' mean?
To the arch code it means panic(), as we don't (yet) have the information to 
fix the
error. But to KVM 'fatal' means kill-the-guest. KVM does this as without 
user-space's
involvement, there is very little else it can do.

KVM can only do this if the error is contained: it must have been contained by
stage2, so the host can keep running. But if the error was reported as
uncontained, KVM would need to panic() the host.

(An example of an Uncontained error is a store that went to the wrong address 
due to
corruption that wasn't caught in time. We can't trust any value in memory once 
we've seen
one of these.)


I agree it looks funny, but it was simpler for the arch code helper to do this,
instead of having an 'arm64_is_uncontained_ras_serror()', as then you'd always
have to check three things.


>   }
>  }


Thanks,

James


Re: [EXT] Re: [PATCH] edac: nxp: Add L1 and L2 error detection for A53 and A72 cores

2020-08-25 Thread James Morse
Hi Alison,

On 25/08/2020 03:31, Alison Wang wrote:
>> On 09/07/2020 09:22, Alison Wang wrote:
>>> Add error detection for A53 and A72 cores. Hardware error injection is
>>> supported on A53. Software error injection is supported on both.
>>
> 
>>
>> As we can't safely write to these registers from linux, so I think this 
>> means all
>> the error injection and maybe SMC stuff can disappear.

> I agreed with your opinion that CPUACTLR_EL1 and L2ACTLR can't be written in 
> Linux.

Well, we can't do what the TRM tells us we must before writing to that register.


> So the error injection can't be done in Linux. Do you mean the error 
> injection can
> only be done in firmware before Linux boots up? If so, the system is running 
> with error
> injection enabled all the time, it may be not a good idea too. Any suggestion?

These registers are expected to have one value, forever. The errata document 
sometimes
tells us to set or clear one of these bits to work around an issue. Because 
they can
only be written to when the system is idle, typically during boot, this is 
firmware's
responsibility.

I expect firmware to set the bits in ACTLR_EL3, to prevent lower exception 
levels from
touching any of these registers.


I don't know how the error injection on A53 or A72 works, so I don't know if 
you can leave
it enabled all the time. The bit you are setting is described as RES0 by the 
A53 and A72
TRMs. I suspect I had the wrong TRM open, as my 'L1DEIEN' comment seems to be 
what your
CPUACTLR_EL1[6] is called on A35. (35, 53? Guess how that happened!)

A35's error injection says:
| While this bit is set, double-bit errors are injected on all writes to the L1 
D-cache
| data RAMs for the first word of each 32-byte region.

You certainly can't leave this sort of thing enabled! And you can't change it 
at runtime,
so we can't use it.


I think features like this are intended to be used to check the integration, 
not to test
the software.


After I sent the original comments on this, I found Sascha's version, which has 
these
issues resolved:
https://lore.kernel.org/linux-arm-kernel/20200813075721.27981-1-s.ha...@pengutronix.de/

I think this version should work on your platform too.


Thanks,

James


Re: [PATCH] edac: nxp: Add L1 and L2 error detection for A53 and A72 cores

2020-08-21 Thread James Morse
Hi Alison,

On 09/07/2020 09:22, Alison Wang wrote:
> Add error detection for A53 and A72 cores. Hardware error injection is
> supported on A53. Software error injection is supported on both.

> For hardware error injection on A53 to work, proper access to
> L2ACTLR_EL1, CPUACTLR_EL1 needs to be granted by EL3 firmware.

Not just hardware error injection, any access to these registers needs to be 
granted by
each higher exception level. If you run as a KVM guest, all access to these
implementation-defined registers is disabled.

This means your driver doesn't work on:
| compatible = "arm,cortex-a53-edac" or "arm,cortex-a72-edac",

as it also depends on firmware settings.


Writing to CPUACTLR_EL1 isn't something we can do in linux. The TRM has this 
clanger:
| The CPU Auxiliary Control Register can be written only when the system is 
idle. ARM
| recommends that you write to this register after a powerup reset, before the 
MMU is
| enabled, and before any ACE or ACP traffic begins.

We can't make the system idle from linux. Only firmware can do this.

The same goes for L2ACTLR.


> This is
> done by making an SMC call in the driver. Failure to enable access
> disables hardware error injection. For error detection to work, another
> SMC call enables access to L2ECTLR_EL1.

Ewww. Surely this should either be enabled, or disabled. What is the point of 
letting the
OS toggle it?

Using these registers, you can do dangerous things like turn the L2 cache off. 
Does your
platform have any resident secure-world software?


> It is for NXP's Layerscape family LS1043A, LS1046A, LS2088A and LX2160A.

Please ensure your driver probes from the top-level platform compatible. This 
is the only
way to know the platform has your firmware, with the special settings and magic 
SMC call
that the driver depends on.


> Signed-off-by: York Sun 
> Signed-off-by: Alison Wang 

Who is the author of this patch?

If the first Signed-off-by tag isn't for the person posting the patch, there 
should be a
'From:' line in the patch so that git picks up the author properly when the 
maintainer
applies the patch.
If you set the author of the patch correctly in git, git format-patch will do 
the right
thing for you.


> diff --git a/Documentation/devicetree/bindings/edac/cortex-arm64-edac.txt 
> b/Documentation/devicetree/bindings/edac/cortex-arm64-edac.txt> new file mode 
> 100644
> index ..41c840993814
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/edac/cortex-arm64-edac.txt

The binding should be in a separate patch that appears first in the
series. The whole
series should be copied to the devicetree mailing list: 
devicet...@vger.kernel.org

You should also run get_maintainer.pl to ensure you copy the devicetree 
maintainers.


> @@ -0,0 +1,40 @@
> +ARM Cortex A53 and A72 L1/L2 cache error reporting
> +
> +CPU Memory Error Syndrome and L2 Memory Error Syndrome registers can be
> +used for checking L1 and L2 memory errors. However, only A53 supports
> +double-bit error injection to L1 and L2 memory. This driver uses the
> +hardware error injection when available, but also provides a way to
> +inject errors by software.
> +
> +To use hardware error injection and the interrupt, proper access needs
> +to be granted in ACTLR_EL3 (and/or ACTLR_EL2) register by EL3 firmware SMC 
> call.

Please describe this as "on $platforms this is done by".
On others it may be enabled already, or not possible to enable at all.


> +Correctable errors do not trigger such interrupt.

What interrupt?


> This driver uses
> +dynamic polling internal to check for errors. The more errors detected,
> +the more frequently it polls. Combining with interrupt, this driver can
> +detect correctable and uncorrectable errors. However, if the
> +uncorrectable errors cause system abort exception, this driver is not able to
> +report errors in time.

The driver isn't involved in correcting the errors; this was already done, or
not done, by the hardware.
Please describe these as 'corrected and uncorrected', as the state is pretty
final.


> +The SIP-specific SMC calls are only for NXP's Layerscape family LS1043A,
> +LS1046A, LS2088A and LX2160A.
> +
> +The following section describes the Cortex A53/A72 EDAC DT node binding.
> +
> +Required properties:
> +- compatible: Should be "arm,cortex-a53-edac" or "arm,cortex-a72-edac"
> +- cpus: Should be a list of compatible cores
> +
> +Optional properties:
> +- interrupts: Interrupt number if supported
> +
> +Example:
> + edac {
> + compatible = "arm,cortex-a53-edac";
> + cpus = <>,
> +<>,
> +<>,
> +<>;
> + interrupts = <0 108 0x4>;
> +
> + };

Because this depends on firmware, please make this depend on the top-level 
compatible for
the whole platform.

It might be worth describing the SMC calls in the binding if it's likely anyone 
else will
use the driver. (matching a platform, but no SMC defined 

[tip: x86/cache] x86/resctrl: Merge AMD/Intel parse_bw() calls

2020-08-19 Thread tip-bot2 for James Morse
The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 5df3ca9334d5603e4afbb95953d0affb37dcf86b
Gitweb: https://git.kernel.org/tip/5df3ca9334d5603e4afbb95953d0affb37dcf86b
Author: James Morse 
AuthorDate: Wed, 08 Jul 2020 16:39:27 
Committer: Borislav Petkov 
CommitterDate: Wed, 19 Aug 2020 09:38:57 +02:00

x86/resctrl: Merge AMD/Intel parse_bw() calls

Now after arch_needs_linear has been added, the parse_bw() calls are
almost the same between AMD and Intel.

The difference is '!is_mba_sc()', which is not checked on AMD. This
will always be true on AMD CPUs, as mba_sc cannot be enabled because
is_mba_linear() is false.

Removing this duplication means user-space visible behaviour and
error messages are not validated or generated in different places.

Reviewed-by: Babu Moger 
Signed-off-by: James Morse 
Signed-off-by: Borislav Petkov 
Reviewed-by: Reinette Chatre 
Link: https://lkml.kernel.org/r/20200708163929.2783-9-james.mo...@arm.com
---
 arch/x86/kernel/cpu/resctrl/core.c|  3 +-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 57 +--
 arch/x86/kernel/cpu/resctrl/internal.h|  6 +--
 3 files changed, 5 insertions(+), 61 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index 52b8991..10a52d1 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -168,6 +168,7 @@ struct rdt_resource rdt_resources_all[] = {
.name   = "MB",
.domains= domain_init(RDT_RESOURCE_MBA),
.cache_level= 3,
+   .parse_ctrlval  = parse_bw,
.format_str = "%d=%*u",
.fflags = RFTYPE_RES_MB,
},
@@ -926,7 +927,6 @@ static __init void rdt_init_res_defs_intel(void)
else if (r->rid == RDT_RESOURCE_MBA) {
r->msr_base = MSR_IA32_MBA_THRTL_BASE;
r->msr_update = mba_wrmsr_intel;
-   r->parse_ctrlval = parse_bw_intel;
}
}
 }
@@ -946,7 +946,6 @@ static __init void rdt_init_res_defs_amd(void)
else if (r->rid == RDT_RESOURCE_MBA) {
r->msr_base = MSR_IA32_MBA_BW_BASE;
r->msr_update = mba_wrmsr_amd;
-   r->parse_ctrlval = parse_bw_amd;
}
}
 }
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c 
b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index e3bcd77..b0e24cb 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -23,59 +23,6 @@
 
 /*
  * Check whether MBA bandwidth percentage value is correct. The value is
- * checked against the minimum and maximum bandwidth values specified by
- * the hardware. The allocated bandwidth percentage is rounded to the next
- * control step available on the hardware.
- */
-static bool bw_validate_amd(char *buf, unsigned long *data,
-   struct rdt_resource *r)
-{
-   unsigned long bw;
-   int ret;
-
-   /* temporary: always false on AMD */
-   if (!r->membw.delay_linear && r->membw.arch_needs_linear) {
-   rdt_last_cmd_puts("No support for non-linear MB domains\n");
-   return false;
-   }
-
-   ret = kstrtoul(buf, 10, &bw);
-   if (ret) {
-   rdt_last_cmd_printf("Non-decimal digit in MB value %s\n", buf);
-   return false;
-   }
-
-   if (bw < r->membw.min_bw || bw > r->default_ctrl) {
-   rdt_last_cmd_printf("MB value %ld out of range [%d,%d]\n", bw,
-   r->membw.min_bw, r->default_ctrl);
-   return false;
-   }
-
-   *data = roundup(bw, (unsigned long)r->membw.bw_gran);
-   return true;
-}
-
-int parse_bw_amd(struct rdt_parse_data *data, struct rdt_resource *r,
-struct rdt_domain *d)
-{
-   unsigned long bw_val;
-
-   if (d->have_new_ctrl) {
-   rdt_last_cmd_printf("Duplicate domain %d\n", d->id);
-   return -EINVAL;
-   }
-
-   if (!bw_validate_amd(data->buf, &bw_val, r))
-   return -EINVAL;
-
-   d->new_ctrl = bw_val;
-   d->have_new_ctrl = true;
-
-   return 0;
-}
-
-/*
- * Check whether MBA bandwidth percentage value is correct. The value is
  * checked against the minimum and max bandwidth values specified by the
  * hardware. The allocated bandwidth percentage is rounded to the next
  * control step available on the hardware.
@@ -110,8 +57,8 @@ static bool bw_validate(char *buf, unsigned long *data, 
struct rdt_resource *r)
return true;
 }
 
-int parse_bw_intel(struct rdt_parse_data *data, struct rdt_resource *r,
-  

[tip: x86/cache] cacheinfo: Move resctrl's get_cache_id() to the cacheinfo header file

2020-08-19 Thread tip-bot2 for James Morse
The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 709c4362725abb5fa1e36fd94893a9b0d049df82
Gitweb: https://git.kernel.org/tip/709c4362725abb5fa1e36fd94893a9b0d049df82
Author: James Morse 
AuthorDate: Wed, 08 Jul 2020 16:39:29 
Committer: Borislav Petkov 
CommitterDate: Wed, 19 Aug 2020 11:04:23 +02:00

cacheinfo: Move resctrl's get_cache_id() to the cacheinfo header file

resctrl/core.c defines get_cache_id() for use in its cpu-hotplug
callbacks. This gets the id attribute of the cache at the corresponding
level of a CPU.

Later rework means this private function needs to be shared. Move
it to the header file.

The name conflicts with a different definition in intel_cacheinfo.c,
name it get_cpu_cacheinfo_id() to show its relation with
get_cpu_cacheinfo().

Now this is visible on other architectures, check the id attribute
has actually been set.

Signed-off-by: James Morse 
Signed-off-by: Borislav Petkov 
Reviewed-by: Babu Moger 
Reviewed-by: Reinette Chatre 
Link: https://lkml.kernel.org/r/20200708163929.2783-11-james.mo...@arm.com
---
 arch/x86/kernel/cpu/resctrl/core.c | 17 ++---
 include/linux/cacheinfo.h  | 21 +
 2 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index cbbd751..1c00f2f 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -350,19 +350,6 @@ static void rdt_get_cdp_l2_config(void)
rdt_get_cdp_config(RDT_RESOURCE_L2, RDT_RESOURCE_L2CODE);
 }
 
-static int get_cache_id(int cpu, int level)
-{
-   struct cpu_cacheinfo *ci = get_cpu_cacheinfo(cpu);
-   int i;
-
-   for (i = 0; i < ci->num_leaves; i++) {
-   if (ci->info_list[i].level == level)
-   return ci->info_list[i].id;
-   }
-
-   return -1;
-}
-
 static void
 mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m, struct rdt_resource 
*r)
 {
@@ -560,7 +547,7 @@ static int domain_setup_mon_state(struct rdt_resource *r, 
struct rdt_domain *d)
  */
 static void domain_add_cpu(int cpu, struct rdt_resource *r)
 {
-   int id = get_cache_id(cpu, r->cache_level);
+   int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
struct list_head *add_pos = NULL;
struct rdt_domain *d;
 
@@ -606,7 +593,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 
 static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 {
-   int id = get_cache_id(cpu, r->cache_level);
+   int id = get_cpu_cacheinfo_id(cpu, r->cache_level);
struct rdt_domain *d;
 
d = rdt_find_domain(r, id, NULL);
diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h
index 46b92cd..4f72b47 100644
--- a/include/linux/cacheinfo.h
+++ b/include/linux/cacheinfo.h
@@ -3,6 +3,7 @@
 #define _LINUX_CACHEINFO_H
 
 #include 
+#include 
 #include 
 #include 
 
@@ -119,4 +120,24 @@ int acpi_find_last_cache_level(unsigned int cpu);
 
 const struct attribute_group *cache_get_priv_group(struct cacheinfo 
*this_leaf);
 
+/*
+ * Get the id of the cache associated with @cpu at level @level.
+ * cpuhp lock must be held.
+ */
+static inline int get_cpu_cacheinfo_id(int cpu, int level)
+{
+   struct cpu_cacheinfo *ci = get_cpu_cacheinfo(cpu);
+   int i;
+
+   for (i = 0; i < ci->num_leaves; i++) {
+   if (ci->info_list[i].level == level) {
+   if (ci->info_list[i].attributes & CACHE_ID)
+   return ci->info_list[i].id;
+   return -1;
+   }
+   }
+
+   return -1;
+}
+
 #endif /* _LINUX_CACHEINFO_H */


[tip: x86/cache] x86/resctrl: Remove struct rdt_membw::max_delay

2020-08-19 Thread tip-bot2 for James Morse
The following commit has been merged into the x86/cache branch of tip:

Commit-ID: e89f85b9171665c917dca59920884f3d4fe0b1ef
Gitweb: https://git.kernel.org/tip/e89f85b9171665c917dca59920884f3d4fe0b1ef
Author: James Morse 
AuthorDate: Wed, 08 Jul 2020 16:39:21 
Committer: Borislav Petkov 
CommitterDate: Tue, 18 Aug 2020 17:01:23 +02:00

x86/resctrl: Remove struct rdt_membw::max_delay

max_delay is only used by x86's __get_mem_config_intel(), which treats it as a
local variable. Remove the struct member, replacing it with a local variable.

Signed-off-by: James Morse 
Signed-off-by: Borislav Petkov 
Reviewed-by: Reinette Chatre 
Link: https://lkml.kernel.org/r/20200708163929.2783-3-james.mo...@arm.com
---
 arch/x86/kernel/cpu/resctrl/core.c | 8 
 arch/x86/kernel/cpu/resctrl/internal.h | 3 ---
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c 
b/arch/x86/kernel/cpu/resctrl/core.c
index 6a9df71..9225ee5 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -254,16 +254,16 @@ static bool __get_mem_config_intel(struct rdt_resource *r)
 {
union cpuid_0x10_3_eax eax;
union cpuid_0x10_x_edx edx;
-   u32 ebx, ecx;
+   u32 ebx, ecx, max_delay;
 
	cpuid_count(0x0010, 3, &eax.full, &ebx, &ecx, &edx.full);
r->num_closid = edx.split.cos_max + 1;
-   r->membw.max_delay = eax.split.max_delay + 1;
+   max_delay = eax.split.max_delay + 1;
r->default_ctrl = MAX_MBA_BW;
if (ecx & MBA_IS_LINEAR) {
r->membw.delay_linear = true;
-   r->membw.min_bw = MAX_MBA_BW - r->membw.max_delay;
-   r->membw.bw_gran = MAX_MBA_BW - r->membw.max_delay;
+   r->membw.min_bw = MAX_MBA_BW - max_delay;
+   r->membw.bw_gran = MAX_MBA_BW - max_delay;
} else {
if (!rdt_get_mb_table(r))
return false;
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h 
b/arch/x86/kernel/cpu/resctrl/internal.h
index 72bb210..1eb39bd 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -369,8 +369,6 @@ struct rdt_cache {
 
 /**
  * struct rdt_membw - Memory bandwidth allocation related data
- * @max_delay: Max throttle delay. Delay is the hardware
- * representation for memory bandwidth.
  * @min_bw:Minimum memory bandwidth percentage user can request
  * @bw_gran:   Granularity at which the memory bandwidth is allocated
  * @delay_linear:  True if memory B/W delay is in linear scale
@@ -378,7 +376,6 @@ struct rdt_cache {
  * @mb_map:Mapping of memory B/W percentage to memory B/W delay
  */
 struct rdt_membw {
-   u32 max_delay;
u32 min_bw;
u32 bw_gran;
u32 delay_linear;


[tip: x86/cache] x86/resctrl: Use is_closid_match() in more places

2020-08-19 Thread tip-bot2 for James Morse
The following commit has been merged into the x86/cache branch of tip:

Commit-ID: e6b2fac36fcc0b73cbef063d700a9841850e37a0
Gitweb: https://git.kernel.org/tip/e6b2fac36fcc0b73cbef063d700a9841850e37a0
Author: James Morse 
AuthorDate: Wed, 08 Jul 2020 16:39:25 
Committer: Borislav Petkov 
CommitterDate: Wed, 19 Aug 2020 09:08:36 +02:00

x86/resctrl: Use is_closid_match() in more places

rdtgroup_tasks_assigned() and show_rdt_tasks() loop over threads testing
for a CTRL/MON group match by closid/rmid with the provided rdtgrp.
Further down the file are helpers to do this; move these further up and
make use of them here.

These helpers additionally check for alloc/mon capable. This is harmless
as rdtgroup_mkdir() tests these capable flags before allowing the config
directories to be created.

Signed-off-by: James Morse 
Signed-off-by: Borislav Petkov 
Reviewed-by: Reinette Chatre 
Link: https://lkml.kernel.org/r/20200708163929.2783-7-james.mo...@arm.com
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 30 +++--
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index b044617..78f3be1 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -592,6 +592,18 @@ static int __rdtgroup_move_task(struct task_struct *tsk,
return ret;
 }
 
+static bool is_closid_match(struct task_struct *t, struct rdtgroup *r)
+{
+   return (rdt_alloc_capable &&
+  (r->type == RDTCTRL_GROUP) && (t->closid == r->closid));
+}
+
+static bool is_rmid_match(struct task_struct *t, struct rdtgroup *r)
+{
+   return (rdt_mon_capable &&
+  (r->type == RDTMON_GROUP) && (t->rmid == r->mon.rmid));
+}
+
 /**
  * rdtgroup_tasks_assigned - Test if tasks have been assigned to resource group
  * @r: Resource group
@@ -607,8 +619,7 @@ int rdtgroup_tasks_assigned(struct rdtgroup *r)
 
rcu_read_lock();
for_each_process_thread(p, t) {
-   if ((r->type == RDTCTRL_GROUP && t->closid == r->closid) ||
-   (r->type == RDTMON_GROUP && t->rmid == r->mon.rmid)) {
+   if (is_closid_match(t, r) || is_rmid_match(t, r)) {
ret = 1;
break;
}
@@ -706,8 +717,7 @@ static void show_rdt_tasks(struct rdtgroup *r, struct 
seq_file *s)
 
rcu_read_lock();
for_each_process_thread(p, t) {
-   if ((r->type == RDTCTRL_GROUP && t->closid == r->closid) ||
-   (r->type == RDTMON_GROUP && t->rmid == r->mon.rmid))
+   if (is_closid_match(t, r) || is_rmid_match(t, r))
seq_printf(s, "%d\n", t->pid);
}
rcu_read_unlock();
@@ -2245,18 +2255,6 @@ static int reset_all_ctrls(struct rdt_resource *r)
return 0;
 }
 
-static bool is_closid_match(struct task_struct *t, struct rdtgroup *r)
-{
-   return (rdt_alloc_capable &&
-   (r->type == RDTCTRL_GROUP) && (t->closid == r->closid));
-}
-
-static bool is_rmid_match(struct task_struct *t, struct rdtgroup *r)
-{
-   return (rdt_mon_capable &&
-   (r->type == RDTMON_GROUP) && (t->rmid == r->mon.rmid));
-}
-
 /*
  * Move tasks from one to the other group. If @from is NULL, then all tasks
  * in the systems are moved unconditionally (used for teardown).


[tip: x86/cache] x86/resctrl: Remove unused struct mbm_state::chunks_bw

2020-08-19 Thread tip-bot2 for James Morse
The following commit has been merged into the x86/cache branch of tip:

Commit-ID: abe8f12b44250d02937665033a8b750c1bfeb26e
Gitweb: https://git.kernel.org/tip/abe8f12b44250d02937665033a8b750c1bfeb26e
Author: James Morse 
AuthorDate: Wed, 08 Jul 2020 16:39:20 
Committer: Borislav Petkov 
CommitterDate: Tue, 18 Aug 2020 16:51:55 +02:00

x86/resctrl: Remove unused struct mbm_state::chunks_bw

Nothing reads struct mbm_state's chunks_bw value; it's a copy of chunks.
Remove it.

Signed-off-by: James Morse 
Signed-off-by: Borislav Petkov 
Reviewed-by: Reinette Chatre 
Link: https://lkml.kernel.org/r/20200708163929.2783-2-james.mo...@arm.com
---
 arch/x86/kernel/cpu/resctrl/internal.h | 2 --
 arch/x86/kernel/cpu/resctrl/monitor.c  | 3 +--
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h 
b/arch/x86/kernel/cpu/resctrl/internal.h
index 5ffa322..72bb210 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -283,7 +283,6 @@ struct rftype {
  * struct mbm_state - status for each MBM counter in each domain
  * @chunks:Total data moved (multiply by rdt_group.mon_scale to get bytes)
  * @prev_msr   Value of IA32_QM_CTR for this RMID last time we read it
- * @chunks_bw  Total local data moved. Used for bandwidth calculation
  * @prev_bw_msr:Value of previous IA32_QM_CTR for bandwidth counting
  * @prev_bwThe most recent bandwidth in MBps
  * @delta_bw   Difference between the current and previous bandwidth
@@ -292,7 +291,6 @@ struct rftype {
 struct mbm_state {
u64 chunks;
u64 prev_msr;
-   u64 chunks_bw;
u64 prev_bw_msr;
u32 prev_bw;
u32 delta_bw;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c 
b/arch/x86/kernel/cpu/resctrl/monitor.c
index 837d7d0..d6b92d7 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -279,8 +279,7 @@ static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
return;
 
chunks = mbm_overflow_count(m->prev_bw_msr, tval, rr->r->mbm_width);
-   m->chunks_bw += chunks;
-   m->chunks = m->chunks_bw;
+   m->chunks += chunks;
cur_bw = (chunks * r->mon_scale) >> 20;
 
if (m->delta_comp)


[tip: x86/cache] x86/resctrl: Fix stale comment

2020-08-19 Thread tip-bot2 for James Morse
The following commit has been merged into the x86/cache branch of tip:

Commit-ID: ae0fbedd2a18cd82a2c0c5605a0a60865bc54576
Gitweb: https://git.kernel.org/tip/ae0fbedd2a18cd82a2c0c5605a0a60865bc54576
Author: James Morse 
AuthorDate: Wed, 08 Jul 2020 16:39:22 
Committer: Borislav Petkov 
CommitterDate: Tue, 18 Aug 2020 17:02:24 +02:00

x86/resctrl: Fix stale comment

The comment in rdtgroup_init() refers to the non-existent function
rdt_mount(), which has now been renamed rdt_get_tree(). Fix the
comment.

Signed-off-by: James Morse 
Signed-off-by: Borislav Petkov 
Reviewed-by: Reinette Chatre 
Link: https://lkml.kernel.org/r/20200708163929.2783-4-james.mo...@arm.com
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c 
b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 3f844f1..b044617 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3196,7 +3196,7 @@ int __init rdtgroup_init(void)
 * It may also be ok since that would enable debugging of RDT before
 * resctrl is mounted.
 * The reason why the debugfs directory is created here and not in
-* rdt_mount() is because rdt_mount() takes rdtgroup_mutex and
+* rdt_get_tree() is because rdt_get_tree() takes rdtgroup_mutex and
during the debugfs directory creation also &sb->s_type->i_mutex_key
 * (the lockdep class of inode->i_rwsem). Other filesystem
 * interactions (eg. SyS_getdents) have the lock ordering:


[tip: x86/cache] x86/resctrl: Use container_of() in delayed_work handlers

2020-08-19 Thread tip-bot2 for James Morse
The following commit has been merged into the x86/cache branch of tip:

Commit-ID: f995801ba3a0660cb352c8beb794379c82781ca3
Gitweb: https://git.kernel.org/tip/f995801ba3a0660cb352c8beb794379c82781ca3
Author: James Morse 
AuthorDate: Wed, 08 Jul 2020 16:39:23 
Committer: Borislav Petkov 
CommitterDate: Tue, 18 Aug 2020 17:05:08 +02:00

x86/resctrl: Use container_of() in delayed_work handlers

mbm_handle_overflow() and cqm_handle_limbo() are both provided with
the domain's work_struct when called, but use get_domain_from_cpu()
to find the domain, which in turn needs its own error handling.

container_of() saves some list walking and bitmap testing; use it
instead.
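
As a userspace illustration of the idiom (the struct and member names
below are invented, not the resctrl ones):

#include <stddef.h>
#include <stdio.h>

/* Simplified container_of(); the kernel version adds type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct work { int pending; };

struct domain {
	int id;
	struct work over_work;	/* embedded member handed to the handler */
};

static void handler(struct work *w)
{
	/*
	 * Recover the enclosing domain directly from the work pointer:
	 * no CPU-to-domain lookup, no list walk, no error path.
	 */
	struct domain *d = container_of(w, struct domain, over_work);

	printf("domain %d\n", d->id);
}

int main(void)
{
	struct domain d = { .id = 3 };

	handler(&d.over_work);
	return 0;
}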

Signed-off-by: James Morse 
Signed-off-by: Borislav Petkov 
Reviewed-by: Reinette Chatre 
Link: https://lkml.kernel.org/r/20200708163929.2783-5-james.mo...@arm.com
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 13 ++---
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index d6b92d7..54dffe5 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -477,19 +477,13 @@ void cqm_handle_limbo(struct work_struct *work)
mutex_lock(&rdtgroup_mutex);
 
r = &rdt_resources_all[RDT_RESOURCE_L3];
-   d = get_domain_from_cpu(cpu, r);
-
-   if (!d) {
-   pr_warn_once("Failure to get domain for limbo worker\n");
-   goto out_unlock;
-   }
+   d = container_of(work, struct rdt_domain, cqm_limbo.work);
 
__check_limbo(d, false);
 
if (has_busy_rmid(r, d))
schedule_delayed_work_on(cpu, &d->cqm_limbo, delay);
 
-out_unlock:
mutex_unlock(&rdtgroup_mutex);
 }
 
@@ -519,10 +513,7 @@ void mbm_handle_overflow(struct work_struct *work)
goto out_unlock;
 
r = &rdt_resources_all[RDT_RESOURCE_L3];
-
-   d = get_domain_from_cpu(cpu, r);
-   if (!d)
-   goto out_unlock;
+   d = container_of(work, struct rdt_domain, mbm_over.work);
 
list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
mbm_update(r, d, prgrp->mon.rmid);


[tip: x86/cache] x86/resctrl: Include pid.h

2020-08-19 Thread tip-bot2 for James Morse
The following commit has been merged into the x86/cache branch of tip:

Commit-ID: a21a4391f20c0ab45db452e22bc3e8afe8b36e46
Gitweb: https://git.kernel.org/tip/a21a4391f20c0ab45db452e22bc3e8afe8b36e46
Author: James Morse 
AuthorDate: Wed, 08 Jul 2020 16:39:24 
Committer: Borislav Petkov 
CommitterDate: Tue, 18 Aug 2020 17:06:15 +02:00

x86/resctrl: Include pid.h

We are about to disturb the header soup. This header uses struct pid
and struct pid_namespace. Include their header.

Signed-off-by: James Morse 
Signed-off-by: Borislav Petkov 
Reviewed-by: Reinette Chatre 
Link: https://lkml.kernel.org/r/20200708163929.2783-6-james.mo...@arm.com
---
 include/linux/resctrl.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index daf5cf6..9b05af9 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -2,6 +2,8 @@
 #ifndef _RESCTRL_H
 #define _RESCTRL_H
 
+#include <linux/pid.h>
+
 #ifdef CONFIG_PROC_CPU_RESCTRL
 
 int proc_resctrl_show(struct seq_file *m,


[tip: x86/cache] x86/resctrl: Add struct rdt_membw::arch_needs_linear to explain AMD/Intel MBA difference

2020-08-19 Thread tip-bot2 for James Morse
The following commit has been merged into the x86/cache branch of tip:

Commit-ID: 41215b7947f1b1b86fd77a7bebd2320599aea7bd
Gitweb: https://git.kernel.org/tip/41215b7947f1b1b86fd77a7bebd2320599aea7bd
Author: James Morse 
AuthorDate: Wed, 08 Jul 2020 16:39:26 
Committer: Borislav Petkov 
CommitterDate: Wed, 19 Aug 2020 09:34:51 +02:00

x86/resctrl: Add struct rdt_membw::arch_needs_linear to explain AMD/Intel MBA difference

The configuration values user-space provides to the resctrl filesystem
are ABI. To make this work on another architecture, all the ABI bits
should be moved out of /arch/x86 and under /fs.

To do this, the differences between AMD and Intel CPUs need to be
explained to resctrl via resource properties, instead of function
pointers that let the arch code accept subtly different values on
different platforms/architectures.

For MBA, Intel CPUs reject configuration attempts for non-linear
resources, whereas AMD ignores this field as its MBA resource is never
linear. To merge the parse/validate functions, this difference needs to
be explained.

Add struct rdt_membw::arch_needs_linear to indicate the arch code needs
the linear property to be true to configure this resource. AMD can set
this and delay_linear to false. Intel can set arch_needs_linear to
true to keep the existing "No support for non-linear MB domains" error
message for affected platforms.
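
The intended shape of the merged check can be sketched in userspace as
follows; the struct and function names are illustrative only:

#include <stdbool.h>
#include <stdio.h>

struct membw_props {
	bool delay_linear;	/* hardware reported a linear scale */
	bool arch_needs_linear;	/* arch requires the linear property */
};

static bool mb_config_ok(const struct membw_props *p)
{
	/*
	 * Intel sets arch_needs_linear, so a non-linear resource is
	 * rejected. AMD leaves it clear, so the check never fires even
	 * though delay_linear is also false there.
	 */
	if (!p->delay_linear && p->arch_needs_linear) {
		fprintf(stderr, "No support for non-linear MB domains\n");
		return false;
	}
	return true;
}

int main(void)
{
	struct membw_props intel = { .delay_linear = false,
				     .arch_needs_linear = true };
	struct membw_props amd = { .delay_linear = false,
				   .arch_needs_linear = false };

	printf("intel: %d, amd: %d\n", mb_config_ok(&intel), mb_config_ok(&amd));
	return 0;
}

With the property in place, both vendors share one validate path and
the difference lives in data rather than in per-vendor function
pointers.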

 [ bp: convert "we" etc to passive voice. ]

Signed-off-by: James Morse 
Signed-off-by: Borislav Petkov 
Reviewed-by: Reinette Chatre 
Reviewed-by: Babu Moger 
Link: https://lkml.kernel.org/r/20200708163929.2783-8-james.mo...@arm.com
---
 arch/x86/kernel/cpu/resctrl/core.c| 3 +++
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 8 +++-
 arch/x86/kernel/cpu/resctrl/internal.h| 2 ++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 9225ee5..52b8991 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -260,6 +260,7 @@ static bool __get_mem_config_intel(struct rdt_resource *r)
r->num_closid = edx.split.cos_max + 1;
max_delay = eax.split.max_delay + 1;
r->default_ctrl = MAX_MBA_BW;
+   r->membw.arch_needs_linear = true;
if (ecx & MBA_IS_LINEAR) {
r->membw.delay_linear = true;
r->membw.min_bw = MAX_MBA_BW - max_delay;
@@ -267,6 +268,7 @@ static bool __get_mem_config_intel(struct rdt_resource *r)
} else {
if (!rdt_get_mb_table(r))
return false;
+   r->membw.arch_needs_linear = false;
}
r->data_width = 3;
 
@@ -288,6 +290,7 @@ static bool __rdt_get_mem_config_amd(struct rdt_resource *r)
 
/* AMD does not use delay */
r->membw.delay_linear = false;
+   r->membw.arch_needs_linear = false;
 
r->membw.min_bw = 0;
r->membw.bw_gran = 1;
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index 934c8fb..e3bcd77 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -33,6 +33,12 @@ static bool bw_validate_amd(char *buf, unsigned long *data,
unsigned long bw;
int ret;
 
+   /* temporary: always false on AMD */
+   if (!r->membw.delay_linear && r->membw.arch_needs_linear) {
+   rdt_last_cmd_puts("No support for non-linear MB domains\n");
+   return false;
+   }
+
ret = kstrtoul(buf, 10, &bw);
if (ret) {
rdt_last_cmd_printf("Non-decimal digit in MB value %s\n", buf);
@@ -82,7 +88,7 @@ static bool bw_validate(char *buf, unsigned long *data, struct rdt_resource *r)
/*
 * Only linear delay values is supported for current Intel SKUs.
 */
-   if (!r->membw.delay_linear) {
+   if (!r->membw.delay_linear && r->membw.arch_needs_linear) {
rdt_last_cmd_puts("No support for non-linear MB domains\n");
return false;
}
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 1eb39bd..7b00723 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -372,6 +372,7 @@ struct rdt_cache {
  * @min_bw:Minimum memory bandwidth percentage user can request
  * @bw_gran:   Granularity at which the memory bandwidth is allocated
  * @delay_linear:  True if memory B/W delay is in linear scale
+ * @arch_needs_linear: True if we can't configure non-linear resources
  * @mba_sc:True if MBA software controller(mba_sc) is enabled
  * @mb_map:Mapping of memory B/W percentage to memory B/W delay
  */
@@ -379,6 +380,7 @@ struct rdt_membw
