Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
On Wed, 6 May 2015, Peter Zijlstra wrote:

> On Mon, May 04, 2015 at 10:30:15AM -0700, Vikas Shivappa wrote:
> > Will fix the whitespace issues (including before return) and other
> > possible coding convention issues.
> >
> > It would make more sense to have this in checkpatch rather than having
> > to point it out manually. If you want to have fun with that, go for it
> > though.
>
> My main objection was that your coding style is entirely inconsistent
> with itself. Sometimes you have whitespace before a return, sometimes
> you do not. Sometimes you have exit labels with locks, sometimes you do
> not. etc..
>
> Pick one and stick to it; although we'd all much prefer if you pick the
> one that's common to the kernel.

Will fix the convention issues.

Thanks,
Vikas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
On Wed, 6 May 2015, Peter Zijlstra wrote:

> On Mon, May 04, 2015 at 10:30:15AM -0700, Vikas Shivappa wrote:
> > On Sat, 2 May 2015, Peter Zijlstra wrote:
> > >
> > > There's CAT in your subject, make up your minds already on what you
> > > want to call this stuff.
> >
> > We don't have control over the names. It is clear from patch 0/7 where it's
>
> If I read 0/n it's _after_ I've read all the other patches. The thing
> is, 0/n should not contain anything persistent. Patches should stand on
> their own.
>
> > explained that RDT is the umbrella term and CAT is a part of it, and
> > this patch series is only for CAT ... It also mentions what exact
> > section of the Intel manual this refers to. Is there still some lack
> > of clarification here?
>
> But we're not implementing an umbrella right? We're implementing Cache
> QoS Enforcement (CQE aka. CAT).

In some sense we are - the idea was that the same rdt cgroup would include other features in RDT, and cache allocation is one of them. Hence the cgroup name is RDT. Like Matt just commented, we found some naming issues with respect to the APIs, namely whether to use cat or rdt. I can plan to remove the 'cat' altogether and use cache alloc, as I just learnt it may not be liked because cat means an animal (in English :)). The only reason to do it now is that we can't change the cgroup name later.

> Why confuse things with calling it random other names?
>
> From what I understand the whole RDT thing is the umbrella term for
> Cache QoS Monitoring and Enforcement together. CQM is implemented
> elsewhere, this part is only implementing CQE.
>
> So just call it that; calling it RDT is actively misleading, because it
> explicitly does _NOT_ do the monitoring half of it.
>
> > If it's just your dislike of the term, that's already known.
>
> I think it's crazy to go CQE, no CAT, no RDT, but I could get over that
> in time. But now it turns out you need _both_, and that's even more
> silly.

I agree the changing of names has led to enough confusion, and we can try to make this better in coming features.

I will send fixes where I will try to be clearer on the names. Basically, use rdt for things which are common to cache alloc and all other features, and keep cache alloc for things specific to cache allocation (like the cache bit mask, which is only for cache alloc).
Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
On Wed, 2015-05-06 at 10:09 +0200, Peter Zijlstra wrote:
>
> But we're not implementing an umbrella right? We're implementing Cache
> QoS Enforcement (CQE aka. CAT).
>
> Why confuse things with calling it random other names?
>
> From what I understand the whole RDT thing is the umbrella term for
> Cache QoS Monitoring and Enforcement together. CQM is implemented
> elsewhere, this part is only implementing CQE.
>
> So just call it that, calling it RDT is actively misleading, because it
> explicitly does _NOT_ do the monitoring half of it.

Right, and we're already running into this problem where some of the function names contain "rdt" and some contain "cat".

How about we go with "intel cache alloc"? We avoid the dreaded TLA-fest, it clearly matches up with what's in the Software Developer's Manual (Cache Allocation Technology) and it's pretty simple for people who haven't read the SDM to understand.
Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
On Mon, May 04, 2015 at 10:30:15AM -0700, Vikas Shivappa wrote:
> Will fix the whitespace issues (including before return) and other
> possible coding convention issues.
>
> It would make more sense to have this in checkpatch rather than having
> to point it out manually. If you want to have fun with that, go for it
> though.

My main objection was that your coding style is entirely inconsistent with itself. Sometimes you have whitespace before a return, sometimes you do not. Sometimes you have exit labels with locks, sometimes you do not. etc..

Pick one and stick to it; although we'd all much prefer if you pick the one that's common to the kernel.
Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
On Mon, May 04, 2015 at 10:30:15AM -0700, Vikas Shivappa wrote:
> On Sat, 2 May 2015, Peter Zijlstra wrote:
> >
> > There's CAT in your subject, make up your minds already on what you
> > want to call this stuff.
>
> We don't have control over the names. It is clear from patch 0/7 where it's

If I read 0/n it's _after_ I've read all the other patches. The thing is, 0/n should not contain anything persistent. Patches should stand on their own.

> explained that RDT is the umbrella term and CAT is a part of it and this
> patch series is only for CAT ... It also mentions what exact section of
> the Intel manual this refers to. Is there still some lack of
> clarification here?

But we're not implementing an umbrella right? We're implementing Cache QoS Enforcement (CQE aka. CAT).

Why confuse things with calling it random other names?

From what I understand the whole RDT thing is the umbrella term for Cache QoS Monitoring and Enforcement together. CQM is implemented elsewhere, this part is only implementing CQE.

So just call it that, calling it RDT is actively misleading, because it explicitly does _NOT_ do the monitoring half of it.

> If it's just your dislike of the term, that's already known.

I think it's crazy to go CQE, no CAT, no RDT, but I could get over that in time. But now it turns out you need _both_, and that's even more silly.
Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
On Sat, 2 May 2015, Peter Zijlstra wrote:

> There's CAT in your subject, make up your minds already on what you want
> to call this stuff.

We don't have control over the names. It is clear from patch 0/7, where it is explained that RDT is the umbrella term and CAT is a part of it, and this patch series is only for CAT ... It also mentions what exact section of the Intel manual this refers to. Is there still some lack of clarification here? If it's just your dislike of the term, that's already known. Otherwise we would have received suggestions for some fancy names like vanilla or icecream or burger and spent time deciding on the names, but we can't do that; we need to stay consistent with the rest of the naming.

> On Fri, May 01, 2015 at 06:36:37PM -0700, Vikas Shivappa wrote:
> > +static void rdt_free_closid(unsigned int clos)
> > +{
> > +
>
> superfluous whitespace

Will fix the whitespace issues (including before return) and other possible coding convention issues.

It would make more sense to have this in checkpatch rather than having to point it out manually. If you want to have fun with that, go for it though. An automated approach would make the time taken much smaller, in terms of the reviewer's time and for people submitting the code as well. They are all run against checkpatch.

> > +        lockdep_assert_held(&rdt_group_mutex);
> > +
> > +        clear_bit(clos, rdtss_info.closmap);
> > +}
> > +static inline bool cbm_is_contiguous(unsigned long var)
> > +{
> > +        unsigned long first_bit, zero_bit;
> > +        unsigned long maxcbm = MAX_CBM_LENGTH;
>
> flip these two lines
>
> > +
> > +        if (!var)
> > +                return false;
> > +
> > +        first_bit = find_next_bit(&var, maxcbm, 0);
>
> What was wrong with find_first_bit() ?
>
> > +        zero_bit = find_next_zero_bit(&var, maxcbm, first_bit);
> > +
> > +        if (find_next_bit(&var, maxcbm, zero_bit) < maxcbm)
> > +                return false;
> > +
> > +        return true;
> > +}
> > +
> > +static int cat_cbm_read(struct seq_file *m, void *v)
> > +{
> > +        struct intel_rdt *ir = css_rdt(seq_css(m));
> > +
> > +        seq_printf(m, "%08lx\n", ccmap[ir->clos].cache_mask);
>
> inconsistent spacing, you mostly have a whitespace before the return
> statement, but here you have not.
>
> > +        return 0;
> > +}
> > +
> > +static int validate_cbm(struct intel_rdt *ir, unsigned long cbmvalue)
> > +{
> > +        struct intel_rdt *par, *c;
> > +        struct cgroup_subsys_state *css;
> > +        unsigned long *cbm_tmp;
>
> No reason not to order these on line length, is there?
>
> > +
> > +        if (!cbm_is_contiguous(cbmvalue)) {
> > +                pr_err("bitmask should have >= 1 bits and be contiguous\n");
> > +                return -EINVAL;
> > +        }
> > +
> > +        par = parent_rdt(ir);
> > +        cbm_tmp = &ccmap[par->clos].cache_mask;
> > +        if (!bitmap_subset(&cbmvalue, cbm_tmp, MAX_CBM_LENGTH))
> > +                return -EINVAL;
> > +
> > +        rcu_read_lock();
> > +        rdt_for_each_child(css, ir) {
> > +                c = css_rdt(css);
> > +                cbm_tmp = &ccmap[c->clos].cache_mask;
> > +                if (!bitmap_subset(cbm_tmp, &cbmvalue, MAX_CBM_LENGTH)) {
> > +                        rcu_read_unlock();
> > +                        pr_err("Children's mask not a subset\n");
> > +                        return -EINVAL;
> > +                }
> > +        }
> > +
> > +        rcu_read_unlock();
>
> Daft whitespace again.
>
> > +        return 0;
> > +}
> > +
> > +static bool cbm_search(unsigned long cbm, int *closid)
> > +{
> > +        int maxid = boot_cpu_data.x86_cat_closs;
> > +        unsigned int i;
> > +
> > +        for (i = 0; i < maxid; i++) {
> > +                if (bitmap_equal(&cbm, &ccmap[i].cache_mask, MAX_CBM_LENGTH)) {
> > +                        *closid = i;
> > +                        return true;
> > +                }
> > +        }
>
> and again
>
> > +        return false;
> > +}
> > +
> > +static void cbmmap_dump(void)
> > +{
> > +        int i;
> > +
> > +        pr_debug("CBMMAP\n");
> > +        for (i = 0; i < boot_cpu_data.x86_cat_closs; i++)
> > +                pr_debug("cache_mask: 0x%x,clos_refcnt: %u\n",
> > +                         (unsigned int)ccmap[i].cache_mask, ccmap[i].clos_refcnt);
>
> This is missing {}
>
> > +}
> > +
> > +static void __cpu_cbm_update(void *info)
> > +{
> > +        unsigned int closid = *((unsigned int *)info);
> > +
> > +        wrmsrl(CBM_FROM_INDEX(closid), ccmap[closid].cache_mask);
> > +}
> > +static int cat_cbm_write(struct cgroup_subsys_state *css,
> > +                         struct cftype *cft, u64 cbmvalue)
> > +{
> > +        struct intel_rdt *ir = css_rdt(css);
> > +        ssize_t err = 0;
> > +        unsigned long cache_mask, max_mask;
> > +        unsigned long *cbm_tmp;
> > +        unsigned int closid;
> > +        u32 max_cbm = boot_cpu_data.x86_cat_cbmlength;
>
> That's just a right mess, isn't it?
>
> > +
> > +        if (ir == &rdt_root_group)
> > +                return -EPERM;
> > +        bitmap_set(&max_mask, 0, max_cbm);
> > +
> > +        /*
> > +         * Need global mutex as cbm write may allocate a closid.
> > +         */
> > +        mutex_lock(&rdt_group_mutex);
> > +        bitmap_and(&cache_mask, (unsigned long *)&cbmvalue, &max_mask, max_cbm);
> > +        cbm_tmp = &ccmap[ir->clos].cache_mask;
> > +
> > +        if (bitmap_equal(&cache_mask, cbm_tmp, MAX_CBM_LENGTH))
> > +                goto out;
> > +
Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
There's CAT in your subject, make up your minds already on what you want to call this stuff.

On Fri, May 01, 2015 at 06:36:37PM -0700, Vikas Shivappa wrote:
> +static void rdt_free_closid(unsigned int clos)
> +{
> +

superfluous whitespace

> +        lockdep_assert_held(&rdt_group_mutex);
> +
> +        clear_bit(clos, rdtss_info.closmap);
> +}
> +static inline bool cbm_is_contiguous(unsigned long var)
> +{
> +        unsigned long first_bit, zero_bit;
> +        unsigned long maxcbm = MAX_CBM_LENGTH;

flip these two lines

> +
> +        if (!var)
> +                return false;
> +
> +        first_bit = find_next_bit(&var, maxcbm, 0);

What was wrong with find_first_bit() ?

> +        zero_bit = find_next_zero_bit(&var, maxcbm, first_bit);
> +
> +        if (find_next_bit(&var, maxcbm, zero_bit) < maxcbm)
> +                return false;
> +
> +        return true;
> +}
> +
> +static int cat_cbm_read(struct seq_file *m, void *v)
> +{
> +        struct intel_rdt *ir = css_rdt(seq_css(m));
> +
> +        seq_printf(m, "%08lx\n", ccmap[ir->clos].cache_mask);

inconsistent spacing, you mostly have a whitespace before the return statement, but here you have not.

> +        return 0;
> +}
> +
> +static int validate_cbm(struct intel_rdt *ir, unsigned long cbmvalue)
> +{
> +        struct intel_rdt *par, *c;
> +        struct cgroup_subsys_state *css;
> +        unsigned long *cbm_tmp;

No reason not to order these on line length, is there?

> +
> +        if (!cbm_is_contiguous(cbmvalue)) {
> +                pr_err("bitmask should have >= 1 bits and be contiguous\n");
> +                return -EINVAL;
> +        }
> +
> +        par = parent_rdt(ir);
> +        cbm_tmp = &ccmap[par->clos].cache_mask;
> +        if (!bitmap_subset(&cbmvalue, cbm_tmp, MAX_CBM_LENGTH))
> +                return -EINVAL;
> +
> +        rcu_read_lock();
> +        rdt_for_each_child(css, ir) {
> +                c = css_rdt(css);
> +                cbm_tmp = &ccmap[c->clos].cache_mask;
> +                if (!bitmap_subset(cbm_tmp, &cbmvalue, MAX_CBM_LENGTH)) {
> +                        rcu_read_unlock();
> +                        pr_err("Children's mask not a subset\n");
> +                        return -EINVAL;
> +                }
> +        }
> +
> +        rcu_read_unlock();

Daft whitespace again.

> +        return 0;
> +}
> +
> +static bool cbm_search(unsigned long cbm, int *closid)
> +{
> +        int maxid = boot_cpu_data.x86_cat_closs;
> +        unsigned int i;
> +
> +        for (i = 0; i < maxid; i++) {
> +                if (bitmap_equal(&cbm, &ccmap[i].cache_mask, MAX_CBM_LENGTH)) {
> +                        *closid = i;
> +                        return true;
> +                }
> +        }

and again

> +        return false;
> +}
> +
> +static void cbmmap_dump(void)
> +{
> +        int i;
> +
> +        pr_debug("CBMMAP\n");
> +        for (i = 0; i < boot_cpu_data.x86_cat_closs; i++)
> +                pr_debug("cache_mask: 0x%x,clos_refcnt: %u\n",
> +                         (unsigned int)ccmap[i].cache_mask, ccmap[i].clos_refcnt);

This is missing {}

> +}
> +
> +static void __cpu_cbm_update(void *info)
> +{
> +        unsigned int closid = *((unsigned int *)info);
> +
> +        wrmsrl(CBM_FROM_INDEX(closid), ccmap[closid].cache_mask);
> +}
> +static int cat_cbm_write(struct cgroup_subsys_state *css,
> +                         struct cftype *cft, u64 cbmvalue)
> +{
> +        struct intel_rdt *ir = css_rdt(css);
> +        ssize_t err = 0;
> +        unsigned long cache_mask, max_mask;
> +        unsigned long *cbm_tmp;
> +        unsigned int closid;
> +        u32 max_cbm = boot_cpu_data.x86_cat_cbmlength;

That's just a right mess, isn't it?

> +
> +        if (ir == &rdt_root_group)
> +                return -EPERM;
> +        bitmap_set(&max_mask, 0, max_cbm);
> +
> +        /*
> +         * Need global mutex as cbm write may allocate a closid.
> +         */
> +        mutex_lock(&rdt_group_mutex);
> +        bitmap_and(&cache_mask, (unsigned long *)&cbmvalue, &max_mask, max_cbm);
> +        cbm_tmp = &ccmap[ir->clos].cache_mask;
> +
> +        if (bitmap_equal(&cache_mask, cbm_tmp, MAX_CBM_LENGTH))
> +                goto out;
> +
> +        err = validate_cbm(ir, cache_mask);
> +        if (err)
> +                goto out;
> +
> +        /*
> +         * At this point we are sure to change the cache_mask. Hence release the
> +         * reference to the current CLOSid and try to get a reference for
> +         * a different CLOSid.
> +         */
> +        __clos_put(ir->clos);
> +
> +        if (cbm_search(cache_mask, &closid)) {
> +                ir->clos = closid;
> +                __clos_get(closid);
> +        } else {
> +                err = rdt_alloc_closid(ir);
> +                if (err)
> +                        goto out;
> +
> +                ccmap[ir->clos].cache_mask = cache_mask;
> +                cbm_update_all(ir->clos);
> +        }
> +
> +        cbmmap_dump();
> +out:
> +

Daft whitespace again.. Also inconsistent return paradigm: here you use an out label, whereas in validate_cbm() you did rcu_read_unlock() and returned from the middle.

> +        mutex_unlock(&rdt_group_mutex);
> +        return err;
> +}
> +
> +static inline bool rdt_u
[PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
Add support for cache bit mask manipulation. The change adds a file, cache_mask, to the RDT cgroup which represents the CBM (cache bit mask) for the cgroup. Updates to the CBM are done by writing to the IA32_L3_MASK_n MSRs.

The RDT cgroup follows cgroup hierarchy; mkdir and adding tasks to the cgroup never fails. When a child cgroup is created it inherits the CLOSid and the cache_mask from its parent. When a user changes the default CBM for a cgroup, a new CLOSid may be allocated if the cache_mask was not used before. If the new CBM is one that is already in use, the count for that CLOSid<->CBM pairing is incremented. Changing the 'cbm' may fail with -ENOSPC once the kernel runs out of the maximum number of CLOSids it can support. Users can create as many cgroups as they want, but having different CBMs at the same time is restricted by the maximum number of CLOSids (multiple cgroups can have the same CBM). The kernel maintains a CLOSid<->cbm mapping which keeps count of the cgroups using each CLOSid.

The tasks in the CAT cgroup get to fill the L3 cache represented by the cgroup's cache_mask file.

Reuse of CLOSids for cgroups with the same bitmask also has the following advantages:
- It helps to use the scant CLOSids optimally.
- It also implies that during context switch, the write to the PQR MSR is done only when a task with a different bitmask is scheduled in.

During cpu bringup due to a hotplug event, the IA32_L3_MASK_n MSRs are synchronized from the clos cbm map if they are used by any cgroup for the package.
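The contiguity constraint on the cache bit mask can be illustrated with a small userspace sketch. This is an illustration only, not the kernel code: the kernel's cbm_is_contiguous() walks the mask with find_next_bit()/find_next_zero_bit(), while here a bit-twiddling shortcut gives the same answer for a 32-bit mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * A valid CBM is a single contiguous run of set bits. After shifting out
 * the trailing zeros, a contiguous run looks like 0b0...011...1, i.e. one
 * less than a power of two, so mask & (mask + 1) must be zero.
 */
static bool cbm_is_contiguous(uint32_t mask)
{
    if (!mask)
        return false;

    while (!(mask & 1))      /* drop trailing zeros */
        mask >>= 1;

    return (mask & (mask + 1)) == 0;
}
```

So a mask like 0xf0 would be accepted, while 0xa (two separate runs) would be rejected with -EINVAL by the kernel's validation.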
Signed-off-by: Vikas Shivappa
---
 arch/x86/include/asm/intel_rdt.h |   7 +-
 arch/x86/kernel/cpu/intel_rdt.c  | 364 ---
 2 files changed, 346 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 87af1a5..9e9dbbe 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -4,6 +4,9 @@
 #ifdef CONFIG_CGROUP_RDT

 #include
+#define MAX_CBM_LENGTH 32
+#define IA32_L3_CBM_BASE 0xc90
+#define CBM_FROM_INDEX(x)        (IA32_L3_CBM_BASE + x)

 struct rdt_subsys_info {
         /* Clos Bitmap to keep track of available CLOSids.*/
@@ -17,8 +20,8 @@ struct intel_rdt {
 };

 struct clos_cbm_map {
-        unsigned long cbm;
-        unsigned int cgrp_count;
+        unsigned long cache_mask;
+        unsigned int clos_refcnt;
 };

 /*
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index eec57fe..58b39d6 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -24,16 +24,25 @@
 #include
 #include
 #include
+#include

 #include

 /*
- * ccmap maintains 1:1 mapping between CLOSid and cbm.
+ * ccmap maintains 1:1 mapping between CLOSid and cache_mask.
 */
 static struct clos_cbm_map *ccmap;
 static struct rdt_subsys_info rdtss_info;
 static DEFINE_MUTEX(rdt_group_mutex);
 struct intel_rdt rdt_root_group;

+/*
+ * Mask of CPUs for writing CBM values. We only need one per-socket.
+ */
+static cpumask_t rdt_cpumask;
+
+#define rdt_for_each_child(pos_css, parent_ir)                \
+        css_for_each_child((pos_css), &(parent_ir)->css)
+
 static inline bool cat_supported(struct cpuinfo_x86 *c)
 {
         if (cpu_has(c, X86_FEATURE_CAT_L3))
@@ -42,22 +51,66 @@ static inline bool cat_supported(struct cpuinfo_x86 *c)
         return false;
 }

+static void __clos_init(unsigned int closid)
+{
+        struct clos_cbm_map *ccm = &ccmap[closid];
+
+        lockdep_assert_held(&rdt_group_mutex);
+
+        ccm->clos_refcnt = 1;
+}
+
 /*
-* Called with the rdt_group_mutex held.
-*/
-static int rdt_free_closid(struct intel_rdt *ir)
+ * Allocates a new closid from unused closids.
+ */
+static int rdt_alloc_closid(struct intel_rdt *ir)
 {
+        unsigned int id;
+        unsigned int maxid;

         lockdep_assert_held(&rdt_group_mutex);

-        WARN_ON(!ccmap[ir->clos].cgrp_count);
-        ccmap[ir->clos].cgrp_count--;
-        if (!ccmap[ir->clos].cgrp_count)
-                clear_bit(ir->clos, rdtss_info.closmap);
+        maxid = boot_cpu_data.x86_cat_closs;
+        id = find_next_zero_bit(rdtss_info.closmap, maxid, 0);
+        if (id == maxid)
+                return -ENOSPC;
+
+        set_bit(id, rdtss_info.closmap);
+        __clos_init(id);
+        ir->clos = id;

         return 0;
 }

+static void rdt_free_closid(unsigned int clos)
+{
+
+        lockdep_assert_held(&rdt_group_mutex);
+
+        clear_bit(clos, rdtss_info.closmap);
+}
+
+static void __clos_get(unsigned int closid)
+{
+        struct clos_cbm_map *ccm = &ccmap[closid];
+
+        lockdep_assert_held(&rdt_group_mutex);
+
+        ccm->clos_refcnt += 1;
+}
+
+static void __clos_put(unsigned int closid)
+{
+        struct clos_cbm_map *ccm = &ccmap[closid];
+
+        lockdep_assert_held(&rdt_group_mutex);
+        WARN_ON(!ccm->clos_refcnt);
+
+        ccm->clos_refcnt -= 1;
+        if (!ccm->clos_refcnt)
+                rdt_free_closid(closid);
+}
+
 static struct cgroup_subsys_state
Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
On Thu, 9 Apr 2015, Marcelo Tosatti wrote:

> On Thu, Mar 12, 2015 at 04:16:03PM -0700, Vikas Shivappa wrote:
> > Add support for cache bit mask manipulation. The change adds a file to
> > the RDT cgroup which represents the CBM (cache bit mask) for the
> > cgroup.
> >
> > The RDT cgroup follows cgroup hierarchy; mkdir and adding tasks to the
> > cgroup never fails. When a child cgroup is created it inherits the
> > CLOSid and the CBM from its parent. When a user changes the default
> > CBM for a cgroup, a new CLOSid may be allocated if the CBM was not
> > used before. If the new CBM is one that is already in use, the count
> > for that CLOSid<->CBM pairing is incremented. Changing the 'cbm' may
> > fail with -ENOSPC once the kernel runs out of the maximum number of
> > CLOSids it can support.
> > Users can create as many cgroups as they want, but having different
> > CBMs at the same time is restricted by the maximum number of CLOSids
> > (multiple cgroups can have the same CBM).
> > The kernel maintains a CLOSid<->cbm mapping which keeps count of the
> > cgroups using each CLOSid.
> >
> > The tasks in the CAT cgroup get to fill the LLC cache represented by
> > the cgroup's 'cbm' file.
> >
> > Reuse of CLOSids for cgroups with the same bitmask also has the
> > following advantages:
> > - It helps to use the scant CLOSids optimally.
> > - It also implies that during context switch, the write to the PQR MSR
> >   is done only when a task with a different bitmask is scheduled in.
> >
> > Signed-off-by: Vikas Shivappa
> > ---
> >  arch/x86/include/asm/intel_rdt.h |   3 +
> >  arch/x86/kernel/cpu/intel_rdt.c  | 205 +++
> >  2 files changed, 208 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
> > index 87af1a5..0ed28d9 100644
> > --- a/arch/x86/include/asm/intel_rdt.h
> > +++ b/arch/x86/include/asm/intel_rdt.h
> > @@ -4,6 +4,9 @@
> >  #ifdef CONFIG_CGROUP_RDT
> >
> >  #include
> > +#define MAX_CBM_LENGTH 32
> > +#define IA32_L3_CBM_BASE 0xc90
> > +#define CBM_FROM_INDEX(x)        (IA32_L3_CBM_BASE + x)
> >
> >  struct rdt_subsys_info {
> >          /* Clos Bitmap to keep track of available CLOSids.*/
> > diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
> > index 3726f41..495497a 100644
> > --- a/arch/x86/kernel/cpu/intel_rdt.c
> > +++ b/arch/x86/kernel/cpu/intel_rdt.c
> > @@ -33,6 +33,9 @@ static struct rdt_subsys_info rdtss_info;
> >  static DEFINE_MUTEX(rdt_group_mutex);
> >  struct intel_rdt rdt_root_group;
> >
> > +#define rdt_for_each_child(pos_css, parent_ir)                \
> > +        css_for_each_child((pos_css), &(parent_ir)->css)
> > +
> >  static inline bool cat_supported(struct cpuinfo_x86 *c)
> >  {
> >          if (cpu_has(c, X86_FEATURE_CAT_L3))
> > @@ -83,6 +86,31 @@ static int __init rdt_late_init(void)
> >  late_initcall(rdt_late_init);
> >
> >  /*
> > + * Allocates a new closid from unused closids.
> > + * Called with the rdt_group_mutex held.
> > + */
> > +
> > +static int rdt_alloc_closid(struct intel_rdt *ir)
> > +{
> > +        unsigned int id;
> > +        unsigned int maxid;
> > +
> > +        lockdep_assert_held(&rdt_group_mutex);
> > +
> > +        maxid = boot_cpu_data.x86_cat_closs;
> > +        id = find_next_zero_bit(rdtss_info.closmap, maxid, 0);
> > +        if (id == maxid)
> > +                return -ENOSPC;
> > +
> > +        set_bit(id, rdtss_info.closmap);
> > +        WARN_ON(ccmap[id].cgrp_count);
> > +        ccmap[id].cgrp_count++;
> > +        ir->clos = id;
> > +
> > +        return 0;
> > +}
> > +
> > +/*
> >  * Called with the rdt_group_mutex held.
> >  */
> >  static int rdt_free_closid(struct intel_rdt *ir)
> > @@ -133,8 +161,185 @@ static void rdt_css_free(struct cgroup_subsys_state *css)
> >          mutex_unlock(&rdt_group_mutex);
> >  }
> >
> > +/*
> > + * Tests if atleast two contiguous bits are set.
> > + */
> > +
> > +static inline bool cbm_is_contiguous(unsigned long var)
> > +{
> > +        unsigned long first_bit, zero_bit;
> > +        unsigned long maxcbm = MAX_CBM_LENGTH;
> > +
> > +        if (bitmap_weight(&var, maxcbm) < 2)
> > +                return false;
> > +
> > +        first_bit = find_next_bit(&var, maxcbm, 0);
> > +        zero_bit = find_next_zero_bit(&var, maxcbm, first_bit);
> > +
> > +        if (find_next_bit(&var, maxcbm, zero_bit) < maxcbm)
> > +                return false;
> > +
> > +        return true;
> > +}
> > +
> > +static int cat_cbm_read(struct seq_file *m, void *v)
> > +{
> > +        struct intel_rdt *ir = css_rdt(seq_css(m));
> > +
> > +        seq_printf(m, "%08lx\n", ccmap[ir->clos].cbm);
> > +        return 0;
> > +}
> > +
> > +static int validate_cbm(struct intel_rdt *ir, unsigned long cbmvalue)
> > +{
> > +        struct intel_rdt *par, *c;
> > +        struct cgroup_subsys_state *css;
> > +        unsigned long *cbm_tmp;
> > +
> > +        if (!cbm_is_contiguous(cbmvalue)) {
> > +                pr_info("cbm should have >= 2 bits and be contiguous\n");
> > +                return -EINVAL;
> > +        }
> > +
> > +        par = parent_rdt(ir);
> > +        cbm_tmp = &ccmap[par->clos].cbm;
> > +        if (!bitmap_subset(&cbmvalue, cbm_tmp, MAX_CBM_LENGTH))
> > +                return -EINVAL;
>
> Can you have different errors for the different cases?

Could use -EPERM

> > +        rcu_read_lock();
> > +        rdt_for_each_child(css, ir) {
> > +                c = css_rdt(css);
> > +                cbm_tmp = &ccmap[c->clos].cbm;
> > +                if (!bitmap_subset(cbm_tmp, &cbmvalue, MAX_CBM_LE
Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
On Thu, Mar 12, 2015 at 04:16:03PM -0700, Vikas Shivappa wrote:
> Add support for cache bit mask manipulation. The change adds a file to
> the RDT cgroup which represents the CBM(cache bit mask) for the cgroup.
>
> The RDT cgroup follows cgroup hierarchy ,mkdir and adding tasks to the
> cgroup never fails. When a child cgroup is created it inherits the
> CLOSid and the CBM from its parent. When a user changes the default
> CBM for a cgroup, a new CLOSid may be allocated if the CBM was not
> used before. If the new CBM is the one that is already used, the
> count for that CLOSid<->CBM is incremented. The changing of 'cbm'
> may fail with -ENOSPC once the kernel runs out of maximum CLOSids it
> can support.
> User can create as many cgroups as he wants but having different CBMs
> at the same time is restricted by the maximum number of CLOSids
> (multiple cgroups can have the same CBM).
> Kernel maintains a CLOSid<->cbm mapping which keeps count
> of cgroups using a CLOSid.
>
> The tasks in the CAT cgroup would get to fill the LLC cache represented
> by the cgroup's 'cbm' file.
>
> Reuse of CLOSids for cgroups with same bitmask also has following
> advantages:
> - This helps to use the scant CLOSids optimally.
> - This also implies that during context switch, write to PQR-MSR is done
>   only when a task with a different bitmask is scheduled in.
>
> Signed-off-by: Vikas Shivappa
> ---
>  arch/x86/include/asm/intel_rdt.h |   3 +
>  arch/x86/kernel/cpu/intel_rdt.c  | 205 +++
>  2 files changed, 208 insertions(+)
>
> diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
> index 87af1a5..0ed28d9 100644
> --- a/arch/x86/include/asm/intel_rdt.h
> +++ b/arch/x86/include/asm/intel_rdt.h
> @@ -4,6 +4,9 @@
>  #ifdef CONFIG_CGROUP_RDT
>
>  #include
> +#define MAX_CBM_LENGTH 32
> +#define IA32_L3_CBM_BASE 0xc90
> +#define CBM_FROM_INDEX(x)        (IA32_L3_CBM_BASE + x)
>
>  struct rdt_subsys_info {
>          /* Clos Bitmap to keep track of available CLOSids.*/
> diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
> index 3726f41..495497a 100644
> --- a/arch/x86/kernel/cpu/intel_rdt.c
> +++ b/arch/x86/kernel/cpu/intel_rdt.c
> @@ -33,6 +33,9 @@ static struct rdt_subsys_info rdtss_info;
>  static DEFINE_MUTEX(rdt_group_mutex);
>  struct intel_rdt rdt_root_group;
>
> +#define rdt_for_each_child(pos_css, parent_ir)                \
> +        css_for_each_child((pos_css), &(parent_ir)->css)
> +
>  static inline bool cat_supported(struct cpuinfo_x86 *c)
>  {
>          if (cpu_has(c, X86_FEATURE_CAT_L3))
> @@ -83,6 +86,31 @@ static int __init rdt_late_init(void)
>  late_initcall(rdt_late_init);
>
>  /*
> + * Allocates a new closid from unused closids.
> + * Called with the rdt_group_mutex held.
> + */
> +
> +static int rdt_alloc_closid(struct intel_rdt *ir)
> +{
> +        unsigned int id;
> +        unsigned int maxid;
> +
> +        lockdep_assert_held(&rdt_group_mutex);
> +
> +        maxid = boot_cpu_data.x86_cat_closs;
> +        id = find_next_zero_bit(rdtss_info.closmap, maxid, 0);
> +        if (id == maxid)
> +                return -ENOSPC;
> +
> +        set_bit(id, rdtss_info.closmap);
> +        WARN_ON(ccmap[id].cgrp_count);
> +        ccmap[id].cgrp_count++;
> +        ir->clos = id;
> +
> +        return 0;
> +}
> +
> +/*
>  * Called with the rdt_group_mutex held.
>  */
>  static int rdt_free_closid(struct intel_rdt *ir)
> @@ -133,8 +161,185 @@ static void rdt_css_free(struct cgroup_subsys_state *css)
>          mutex_unlock(&rdt_group_mutex);
>  }
>
> +/*
> + * Tests if atleast two contiguous bits are set.
> + */
> +
> +static inline bool cbm_is_contiguous(unsigned long var)
> +{
> +        unsigned long first_bit, zero_bit;
> +        unsigned long maxcbm = MAX_CBM_LENGTH;
> +
> +        if (bitmap_weight(&var, maxcbm) < 2)
> +                return false;
> +
> +        first_bit = find_next_bit(&var, maxcbm, 0);
> +        zero_bit = find_next_zero_bit(&var, maxcbm, first_bit);
> +
> +        if (find_next_bit(&var, maxcbm, zero_bit) < maxcbm)
> +                return false;
> +
> +        return true;
> +}
> +
> +static int cat_cbm_read(struct seq_file *m, void *v)
> +{
> +        struct intel_rdt *ir = css_rdt(seq_css(m));
> +
> +        seq_printf(m, "%08lx\n", ccmap[ir->clos].cbm);
> +        return 0;
> +}
> +
> +static int validate_cbm(struct intel_rdt *ir, unsigned long cbmvalue)
> +{
> +        struct intel_rdt *par, *c;
> +        struct cgroup_subsys_state *css;
> +        unsigned long *cbm_tmp;
> +
> +        if (!cbm_is_contiguous(cbmvalue)) {
> +                pr_info("cbm should have >= 2 bits and be contiguous\n");
> +                return -EINVAL;
> +        }
> +
> +        par = parent_rdt(ir);
> +        cbm_tmp = &ccmap[par->clos].cbm;
> +        if (!bitmap_subset(&cbmvalue, cbm_tmp, MAX_CBM_LENGTH))
> +                return -EINVAL;

Can you have different errors for the different cases?

> +        rcu_read_lock();
> +        rdt_for_each_child(css,
[PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
Add support for cache bit mask manipulation. The change adds a file to the RDT cgroup which represents the CBM (cache bit mask) for the cgroup.

The RDT cgroup follows cgroup hierarchy; mkdir and adding tasks to the cgroup never fails. When a child cgroup is created it inherits the CLOSid and the CBM from its parent. When a user changes the default CBM for a cgroup, a new CLOSid may be allocated if the CBM was not used before. If the new CBM is one that is already in use, the count for that CLOSid<->CBM pairing is incremented. Changing the 'cbm' may fail with -ENOSPC once the kernel runs out of the maximum number of CLOSids it can support. Users can create as many cgroups as they want, but having different CBMs at the same time is restricted by the maximum number of CLOSids (multiple cgroups can have the same CBM). The kernel maintains a CLOSid<->cbm mapping which keeps count of the cgroups using each CLOSid.

The tasks in the CAT cgroup get to fill the LLC cache represented by the cgroup's 'cbm' file.

Reuse of CLOSids for cgroups with the same bitmask also has the following advantages:
- It helps to use the scant CLOSids optimally.
- It also implies that during context switch, the write to the PQR MSR is done only when a task with a different bitmask is scheduled in.
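The CLOSid<->CBM reuse described above - bump a refcount when a cgroup's new mask already has a CLOSid, allocate a fresh one otherwise, and fail like -ENOSPC when none are free - can be sketched in plain userspace C. The helper name closid_get() and the fixed-size array are illustrative assumptions; the kernel uses the closmap bitmap and ccmap[] under rdt_group_mutex instead.

```c
#include <assert.h>

#define MAX_CLOSIDS 4   /* hypothetical; the real count comes from CPUID */

struct clos_cbm_map {
    unsigned long cbm;      /* cache bit mask backed by this CLOSid */
    unsigned int refcnt;    /* number of cgroups using it */
};

static struct clos_cbm_map ccmap[MAX_CLOSIDS];

/*
 * Return an existing CLOSid whose mask equals cbm (bumping its refcount),
 * or claim a free one. Returning -1 models the -ENOSPC case: any number of
 * cgroups may exist, but only MAX_CLOSIDS distinct masks at a time.
 */
static int closid_get(unsigned long cbm)
{
    int i;

    for (i = 0; i < MAX_CLOSIDS; i++) {
        if (ccmap[i].refcnt && ccmap[i].cbm == cbm) {
            ccmap[i].refcnt++;  /* reuse: no new CLOSid consumed */
            return i;
        }
    }
    for (i = 0; i < MAX_CLOSIDS; i++) {
        if (!ccmap[i].refcnt) {
            ccmap[i].cbm = cbm;
            ccmap[i].refcnt = 1;
            return i;
        }
    }
    return -1;                  /* models -ENOSPC */
}
```

The reuse path is also what keeps context switches cheap: tasks sharing a mask share a CLOSid, so the PQR MSR only changes when the incoming task's CLOSid differs.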
Signed-off-by: Vikas Shivappa
---
 arch/x86/include/asm/intel_rdt.h |   3 +
 arch/x86/kernel/cpu/intel_rdt.c  | 205 +++
 2 files changed, 208 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 87af1a5..0ed28d9 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -4,6 +4,9 @@
 #ifdef CONFIG_CGROUP_RDT

 #include
+#define MAX_CBM_LENGTH 32
+#define IA32_L3_CBM_BASE 0xc90
+#define CBM_FROM_INDEX(x) (IA32_L3_CBM_BASE + x)

 struct rdt_subsys_info {
 	/* Clos Bitmap to keep track of available CLOSids.*/
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 3726f41..495497a 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -33,6 +33,9 @@ static struct rdt_subsys_info rdtss_info;
 static DEFINE_MUTEX(rdt_group_mutex);
 struct intel_rdt rdt_root_group;

+#define rdt_for_each_child(pos_css, parent_ir) \
+	css_for_each_child((pos_css), &(parent_ir)->css)
+
 static inline bool cat_supported(struct cpuinfo_x86 *c)
 {
 	if (cpu_has(c, X86_FEATURE_CAT_L3))
@@ -83,6 +86,31 @@ static int __init rdt_late_init(void)
 late_initcall(rdt_late_init);

 /*
+ * Allocates a new closid from unused closids.
+ * Called with the rdt_group_mutex held.
+ */
+
+static int rdt_alloc_closid(struct intel_rdt *ir)
+{
+	unsigned int id;
+	unsigned int maxid;
+
+	lockdep_assert_held(&rdt_group_mutex);
+
+	maxid = boot_cpu_data.x86_cat_closs;
+	id = find_next_zero_bit(rdtss_info.closmap, maxid, 0);
+	if (id == maxid)
+		return -ENOSPC;
+
+	set_bit(id, rdtss_info.closmap);
+	WARN_ON(ccmap[id].cgrp_count);
+	ccmap[id].cgrp_count++;
+	ir->clos = id;
+
+	return 0;
+}
+
+/*
  * Called with the rdt_group_mutex held.
  */
 static int rdt_free_closid(struct intel_rdt *ir)
@@ -133,8 +161,185 @@ static void rdt_css_free(struct cgroup_subsys_state *css)
 	mutex_unlock(&rdt_group_mutex);
 }

+/*
+ * Tests if at least two contiguous bits are set.
+ */
+
+static inline bool cbm_is_contiguous(unsigned long var)
+{
+	unsigned long first_bit, zero_bit;
+	unsigned long maxcbm = MAX_CBM_LENGTH;
+
+	if (bitmap_weight(&var, maxcbm) < 2)
+		return false;
+
+	first_bit = find_next_bit(&var, maxcbm, 0);
+	zero_bit = find_next_zero_bit(&var, maxcbm, first_bit);
+
+	if (find_next_bit(&var, maxcbm, zero_bit) < maxcbm)
+		return false;
+
+	return true;
+}
+
+static int cat_cbm_read(struct seq_file *m, void *v)
+{
+	struct intel_rdt *ir = css_rdt(seq_css(m));
+
+	seq_printf(m, "%08lx\n", ccmap[ir->clos].cbm);
+	return 0;
+}
+
+static int validate_cbm(struct intel_rdt *ir, unsigned long cbmvalue)
+{
+	struct intel_rdt *par, *c;
+	struct cgroup_subsys_state *css;
+	unsigned long *cbm_tmp;
+
+	if (!cbm_is_contiguous(cbmvalue)) {
+		pr_info("cbm should have >= 2 bits and be contiguous\n");
+		return -EINVAL;
+	}
+
+	par = parent_rdt(ir);
+	cbm_tmp = &ccmap[par->clos].cbm;
+	if (!bitmap_subset(&cbmvalue, cbm_tmp, MAX_CBM_LENGTH))
+		return -EINVAL;
+
+	rcu_read_lock();
+	rdt_for_each_child(css, ir) {
+		c = css_rdt(css);
+		cbm_tmp = &ccmap[c->clos].cbm;
+		if (!bitmap_subset(cbm_tmp, &cbmvalue, MAX_CBM_LENGTH)) {
+			pr_info("Children's mask not a subset\n");
+			rcu_read_unlock();
+			return -EINVAL;
+		}
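The CLOSid<->CBM refcounting the changelog describes (reuse a CLOSid when a cgroup asks for a bitmask that is already in use, allocate a fresh one otherwise, fail once the hardware limit is hit) can be sketched in plain userspace C. This is a hypothetical illustration, not the kernel code: `closid_get`, `closid_put`, and the fixed `MAX_CLOSIDS` are made-up names standing in for the patch's `rdt_alloc_closid`/`ccmap` machinery.

```c
#include <stdbool.h>

#define MAX_CLOSIDS 4	/* stand-in for boot_cpu_data.x86_cat_closs */

struct clos_cbm_map {
	unsigned long cbm;	/* cache bit mask bound to this CLOSid */
	unsigned int count;	/* number of cgroups using this CLOSid */
};

static struct clos_cbm_map ccmap[MAX_CLOSIDS];

/*
 * Return the CLOSid already bound to this CBM (bumping its refcount),
 * or bind a free CLOSid to it, or return -1 when all CLOSids are taken
 * (the -ENOSPC case in the patch).
 */
static int closid_get(unsigned long cbm)
{
	int free = -1;

	for (int i = 0; i < MAX_CLOSIDS; i++) {
		if (ccmap[i].count && ccmap[i].cbm == cbm) {
			ccmap[i].count++;	/* same mask: reuse the CLOSid */
			return i;
		}
		if (!ccmap[i].count && free < 0)
			free = i;
	}
	if (free < 0)
		return -1;
	ccmap[free].cbm = cbm;
	ccmap[free].count = 1;
	return free;
}

/* Drop one cgroup's reference on a CLOSid. */
static void closid_put(int id)
{
	if (id >= 0 && id < MAX_CLOSIDS && ccmap[id].count)
		ccmap[id].count--;
}
```

Because two cgroups with the same bitmask share one CLOSid, the scarce CLOSids stretch further, and a context switch between their tasks needs no PQR-MSR write.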
Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
On Fri, 27 Feb 2015, Tejun Heo wrote: Hello, Vikas. On Fri, Feb 27, 2015 at 11:34:16AM -0800, Vikas Shivappa wrote: This cgroup subsystem would basically let the user partition one of the platform shared resources, the LLC cache. This could be extended in future I suppose LLC means last level cache? It'd be great if you can spell out the full term when the abbreviation is first referenced in the comments or documentation. Yes, that's last level cache. Will update documentation/comments if any. to partition more shared resources when there is hardware support; that way we may eventually have more files in the cgroup. RDT is a generic term for platform resource sharing. For more information you can refer to section 17.15 of the Intel SDM. We did go through quite a bit of discussion on lkml regarding adding the cgroup interface for CAT, and the patches were posted only after that. This cgroup would not interact with other cgroups, in the sense that it would not modify or add any elements to existing cgroups - there was such a proposal, but it was removed as we did not get agreement on lkml. The original lkml thread is here from 10/2014 for your reference - https://lkml.org/lkml/2014/10/16/568 Yeap, I followed that thread and this being a separate controller definitely makes a lot more sense. I take it that the feature implemented is too coarse to allow for weight based distribution? Could you please clarify more on this? There is, however, a limitation from hardware that there has to be a minimum of 2 bits in the CBM, if that's what you referred to. Otherwise the bits in the CBM directly map to the number of cache ways and hence the cache capacity. Right, so the granularity is fairly coarse and specifying things like "distribute cache in 4:2:1 (or even in absolute bytes) to these three cgroups" wouldn't work at all. 
Specifying an amount in cache bytes would not be possible because the minimum granularity has to be at least one cache way, since the entire memory can be indexed into a single cache way. Providing bit mask granularity lets users avoid worrying about how many bytes a cache way is; they can specify allocations in terms of the bitmask. If we wanted to provide an interface in the cgroups where users could specify the size in bytes, then we would need to show the user the minimum granularity in bytes as well. Also note that these bit masks can overlap, so users have a way to specify overlapping regions in the cache, which may be very useful in a lot of scenarios where multiple cgroups want to share the capacity. The minimum granularity is 2 bits in the pre-production SKUs, and it does put a limitation on the scenarios you describe. We will issue a patch update once it hopefully gets updated in later SKUs. But note that the SDM also recommends using 2 bits from a performance standpoint, because an application using only one cache way would have a lot more conflicts. Say the max CBM is 20 bits; then the granularity is 10% of the total cache. Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
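The granularity arithmetic above can be made concrete. This is an illustrative sketch (the helper names are made up, not from the patch), assuming one CBM bit maps to one cache way and the hardware minimum is 2 contiguous bits as discussed:

```c
/* Hardware minimum CBM width per the discussion above. */
#define MIN_CBM_BITS 2

/* Smallest allocatable fraction of the LLC, in percent (integer math). */
static unsigned int min_alloc_percent(unsigned int max_cbm_len)
{
	return (MIN_CBM_BITS * 100) / max_cbm_len;
}

/* Bytes covered by one CBM bit, i.e. one cache way. */
static unsigned long bytes_per_way(unsigned long llc_bytes,
				   unsigned int max_cbm_len)
{
	return llc_bytes / max_cbm_len;
}
```

With a 20-bit max CBM, `min_alloc_percent(20)` gives the 10% figure from the mail, and a byte-based interface would have to expose `bytes_per_way()` so users know the rounding they get.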
Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
Hello, Vikas. On Fri, Feb 27, 2015 at 11:34:16AM -0800, Vikas Shivappa wrote: > This cgroup subsystem would basically let the user partition one of the > platform shared resources, the LLC cache. This could be extended in future I suppose LLC means last level cache? It'd be great if you can spell out the full term when the abbreviation is first referenced in the comments or documentation. > to partition more shared resources when there is hardware support; that way > we may eventually have more files in the cgroup. RDT is a generic term for > platform resource sharing. > For more information you can refer to section 17.15 of the Intel SDM. > We did go through quite a bit of discussion on lkml regarding adding the > cgroup interface for CAT, and the patches were posted only after that. > This cgroup would not interact with other cgroups, in the sense that it would not > modify or add any elements to existing cgroups - there was such a proposal > but it was removed as we did not get agreement on lkml. > > The original lkml thread is here from 10/2014 for your reference - > https://lkml.org/lkml/2014/10/16/568 Yeap, I followed that thread and this being a separate controller definitely makes a lot more sense. > I > >take it that the feature implemented is too coarse to allow for weight > >based distribution? > > > Could you please clarify more on this? There is, however, a limitation from > hardware that there has to be a minimum of 2 bits in the CBM, if that's what > you referred to. Otherwise the bits in the CBM directly map to the number of > cache ways and hence the cache capacity. Right, so the granularity is fairly coarse and specifying things like "distribute cache in 4:2:1 (or even in absolute bytes) to these three cgroups" wouldn't work at all. Thanks. 
-- tejun
Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
Hello Tejun, On Fri, 27 Feb 2015, Tejun Heo wrote: Hello, On Tue, Feb 24, 2015 at 03:16:40PM -0800, Vikas Shivappa wrote: Add support for cache bit mask manipulation. The change adds a file to the RDT cgroup which represents the CBM (cache bit mask) for the cgroup. The RDT cgroup follows the cgroup hierarchy; mkdir and adding tasks to the cgroup never fail. When a child cgroup is created it inherits the CLOSid and the CBM from its parent. When a user changes the default CBM for a cgroup, a new CLOSid may be allocated if the CBM was not used before. If the new CBM is already in use, the count for that CLOSid<->CBM is incremented. The changing of 'cbm' may fail with -ENOSPC once the kernel runs out of the maximum CLOSids it can support. Users can create as many cgroups as they want, but having different CBMs at the same time is restricted by the maximum number of CLOSids (multiple cgroups can have the same CBM). The kernel maintains a CLOSid<->cbm mapping which keeps count of the cgroups using a CLOSid. The tasks in the CAT cgroup get to fill the LLC cache represented by the cgroup's 'cbm' file. Reuse of CLOSids for cgroups with the same bitmask also has the following advantages: - This helps to use the scant CLOSids optimally. - This also implies that during context switch, the write to the PQR-MSR is done only when a task with a different bitmask is scheduled in. I feel a bit underwhelmed about this new controller and its interface. It is evidently at a lot lower level and way more niche than what other controllers are doing, even cpuset. At the same time, as long as it's well isolated, it piggybacking on cgroup should be okay. This cgroup subsystem would basically let the user partition one of the platform shared resources, the LLC cache. This could be extended in future to partition more shared resources when there is hardware support; that way we may eventually have more files in the cgroup. RDT is a generic term for platform resource sharing. 
For more information you can refer to section 17.15 of the Intel SDM. We did go through quite a bit of discussion on lkml regarding adding the cgroup interface for CAT, and the patches were posted only after that. This cgroup would not interact with other cgroups, in the sense that it would not modify or add any elements to existing cgroups - there was such a proposal, but it was removed as we did not get agreement on lkml. The original lkml thread is here from 10/2014 for your reference - https://lkml.org/lkml/2014/10/16/568 I take it that the feature implemented is too coarse to allow for weight based distribution? Could you please clarify more on this? There is, however, a limitation from hardware that there has to be a minimum of 2 bits in the CBM, if that's what you referred to. Otherwise the bits in the CBM directly map to the number of cache ways and hence the cache capacity. Thanks, Vikas Thanks. -- tejun
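The 2-bit minimum and contiguity constraint discussed in this exchange is what the patch's validate_cbm()/cbm_is_contiguous() enforce with the kernel bitmap API. A standalone userspace sketch of the same rule (hypothetical helper name, plain loop instead of find_next_bit and friends) is:

```c
#include <stdbool.h>

#define MAX_CBM_LENGTH 32

/*
 * A CBM is acceptable to the hardware only if it has at least two set
 * bits and the set bits form one contiguous run.
 */
static bool cbm_is_valid(unsigned long cbm)
{
	int first = -1, last = -1, bits = 0;

	for (int i = 0; i < MAX_CBM_LENGTH; i++) {
		if (cbm & (1ul << i)) {
			if (first < 0)
				first = i;
			last = i;
			bits++;
		}
	}
	/* contiguous iff the popcount equals the span of set bits */
	return bits >= 2 && bits == last - first + 1;
}
```

So 0x3 or 0xf0 are valid masks, while 0x1 (too narrow) and 0x5 (a hole in the run) are rejected, matching the constraint described above.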
Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
On Fri, Feb 27, 2015 at 07:12:22AM -0500, Tejun Heo wrote: > I feel a bit underwhelmed about this new controller and its interface. > It is evidently at a lot lower level and way more niche than what > other controllers are doing, even cpuset. At the same time, as long > as it's well isolated, it piggybacking on cgroup should be okay. I > take it that the feature implemented is too coarse to allow for weight > based distribution? And, Ingo, Peter, are you guys in general agreeing with this addition? As Tony said, we don't wanna be left way behind but that doesn't mean we wanna jump on everything giving off the faintest sign of movement, which sadly has happened often enough in the storage area at least. Thanks. -- tejun
Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
Hello, On Tue, Feb 24, 2015 at 03:16:40PM -0800, Vikas Shivappa wrote: > Add support for cache bit mask manipulation. The change adds a file to > the RDT cgroup which represents the CBM(cache bit mask) for the cgroup. > > The RDT cgroup follows cgroup hierarchy ,mkdir and adding tasks to the > cgroup never fails. When a child cgroup is created it inherits the > CLOSid and the CBM from its parent. When a user changes the default > CBM for a cgroup, a new CLOSid may be allocated if the CBM was not > used before. If the new CBM is the one that is already used, the > count for that CLOSid<->CBM is incremented. The changing of 'cbm' > may fail with -ENOSPC once the kernel runs out of maximum CLOSids it > can support. > User can create as many cgroups as he wants but having different CBMs > at the same time is restricted by the maximum number of CLOSids > (multiple cgroups can have the same CBM). > Kernel maintains a CLOSid<->cbm mapping which keeps count > of cgroups using a CLOSid. > > The tasks in the CAT cgroup would get to fill the LLC cache represented > by the cgroup's 'cbm' file. > > Reuse of CLOSids for cgroups with same bitmask also has following > advantages: > - This helps to use the scant CLOSids optimally. > - This also implies that during context switch, write to PQR-MSR is done > only when a task with a different bitmask is scheduled in. I feel a bit underwhelmed about this new controller and its interface. It is evidently at a lot lower level and way more niche than what other controllers are doing, even cpuset. At the same time, as long as it's well isolated, it piggybacking on cgroup should be okay. I take it that the feature implemented is too coarse to allow for weight based distribution? Thanks. -- tejun
[PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT
Add support for cache bit mask manipulation. The change adds a file to the RDT cgroup which represents the CBM (cache bit mask) for the cgroup. The RDT cgroup follows the cgroup hierarchy; mkdir and adding tasks to the cgroup never fail. When a child cgroup is created it inherits the CLOSid and the CBM from its parent. When a user changes the default CBM for a cgroup, a new CLOSid may be allocated if the CBM was not used before. If the new CBM is already in use, the count for that CLOSid<->CBM is incremented. The changing of 'cbm' may fail with -ENOSPC once the kernel runs out of the maximum CLOSids it can support. Users can create as many cgroups as they want, but having different CBMs at the same time is restricted by the maximum number of CLOSids (multiple cgroups can have the same CBM). The kernel maintains a CLOSid<->cbm mapping which keeps count of the cgroups using a CLOSid. The tasks in the CAT cgroup get to fill the LLC cache represented by the cgroup's 'cbm' file. Reuse of CLOSids for cgroups with the same bitmask also has the following advantages:
- This helps to use the scant CLOSids optimally.
- This also implies that during context switch, the write to the PQR-MSR is done only when a task with a different bitmask is scheduled in.
Signed-off-by: Vikas Shivappa
---
 arch/x86/include/asm/intel_rdt.h |   3 +
 arch/x86/kernel/cpu/intel_rdt.c  | 179 +++
 2 files changed, 182 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index ecd9664..a414771 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -4,6 +4,9 @@
 #ifdef CONFIG_CGROUP_RDT

 #include
+#define MAX_CBM_LENGTH 32
+#define IA32_L3_CBM_BASE 0xc90
+#define CBM_FROM_INDEX(x) (IA32_L3_CBM_BASE + x)

 struct rdt_subsys_info {
 	/* Clos Bitmap to keep track of available CLOSids.*/
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 6cf1a16..dd090a7 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -33,6 +33,9 @@ static struct rdt_subsys_info rdtss_info;
 static DEFINE_MUTEX(rdt_group_mutex);
 struct intel_rdt rdt_root_group;

+#define rdt_for_each_child(pos_css, parent_ir) \
+	css_for_each_child((pos_css), &(parent_ir)->css)
+
 static inline bool cat_supported(struct cpuinfo_x86 *c)
 {
 	if (cpu_has(c, X86_FEATURE_CAT_L3))
@@ -84,6 +87,30 @@ static int __init rdt_late_init(void)
 late_initcall(rdt_late_init);

 /*
+ * Allocates a new closid from unused closids.
+ * Called with the rdt_group_mutex held.
+ */
+
+static int rdt_alloc_closid(struct intel_rdt *ir)
+{
+	unsigned int id;
+	unsigned int maxid;
+
+	lockdep_assert_held(&rdt_group_mutex);
+
+	maxid = boot_cpu_data.x86_cat_closs;
+	id = find_next_zero_bit(rdtss_info.closmap, maxid, 0);
+	if (id == maxid)
+		return -ENOSPC;
+
+	set_bit(id, rdtss_info.closmap);
+	ccmap[id].cgrp_count++;
+	ir->clos = id;
+
+	return 0;
+}
+
+/*
  * Called with the rdt_group_mutex held.
  */
 static int rdt_free_closid(struct intel_rdt *ir)
@@ -135,8 +162,160 @@ static void rdt_css_free(struct cgroup_subsys_state *css)
 	mutex_unlock(&rdt_group_mutex);
 }

+/*
+ * Tests if at least two contiguous bits are set.
+ */
+
+static inline bool cbm_is_contiguous(unsigned long var)
+{
+	unsigned long first_bit, zero_bit;
+	unsigned long maxcbm = MAX_CBM_LENGTH;
+
+	if (bitmap_weight(&var, maxcbm) < 2)
+		return false;
+
+	first_bit = find_next_bit(&var, maxcbm, 0);
+	zero_bit = find_next_zero_bit(&var, maxcbm, first_bit);
+
+	if (find_next_bit(&var, maxcbm, zero_bit) < maxcbm)
+		return false;
+
+	return true;
+}
+
+static int cat_cbm_read(struct seq_file *m, void *v)
+{
+	struct intel_rdt *ir = css_rdt(seq_css(m));
+
+	seq_bitmap(m, ir->cbm, MAX_CBM_LENGTH);
+	seq_putc(m, '\n');
+	return 0;
+}
+
+static int validate_cbm(struct intel_rdt *ir, unsigned long cbmvalue)
+{
+	struct intel_rdt *par, *c;
+	struct cgroup_subsys_state *css;
+
+	if (!cbm_is_contiguous(cbmvalue)) {
+		pr_info("cbm should have >= 2 bits and be contiguous\n");
+		return -EINVAL;
+	}
+
+	par = parent_rdt(ir);
+	if (!bitmap_subset(&cbmvalue, par->cbm, MAX_CBM_LENGTH))
+		return -EINVAL;
+
+	rcu_read_lock();
+	rdt_for_each_child(css, ir) {
+		c = css_rdt(css);
+		if (!bitmap_subset(c->cbm, &cbmvalue, MAX_CBM_LENGTH)) {
+			pr_info("Children's mask not a subset\n");
+			rcu_read_unlock();
+			return -EINVAL;
+		}
+	}
+
+	rcu_read_unlock();
+	return 0;
+}
+
+static bool cbm_search(unsigned long cbm, int *closid)
+{
+	int maxid = boot_c