Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT

2015-05-06 Thread Vikas Shivappa



On Wed, 6 May 2015, Peter Zijlstra wrote:


On Mon, May 04, 2015 at 10:30:15AM -0700, Vikas Shivappa wrote:

Will fix the whitespace issues (including before return) or other possible
coding convention issues.

It would make more sense to have this in checkpatch rather than having to
point it out manually. If you want to have fun with that, go for it though.


My main objection was that your coding style is entirely inconsistent
with itself.

Sometimes you have a whitespace before return, sometimes you do not.

Sometimes you have exit labels with locks, sometimes you do not.

etc..

Pick one and stick to it; although we'd all much prefer if you pick the one
that's common to the kernel.


Will fix the convention issues.

Thanks,
Vikas




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT

2015-05-06 Thread Vikas Shivappa



On Wed, 6 May 2015, Peter Zijlstra wrote:


On Mon, May 04, 2015 at 10:30:15AM -0700, Vikas Shivappa wrote:

On Sat, 2 May 2015, Peter Zijlstra wrote:



There's CAT in your subject, make up your minds already on what you want
to call this stuff.


We don't have control over the names. It is clear from patch 0/7, where it's


If I read 0/n its _after_ I've read all the other patches. The thing is,
0/n should not contain anything persistent. Patches should stand on
their own.


explained that RDT is the umbrella term, CAT is a part of it, and this
patch series is only for CAT ... It also mentions the exact section of the
Intel manual this refers to. Is there still some lack of clarity here?


But we're not implementing an umbrella right? We're implementing Cache
QoS Enforcement (CQE aka. CAT).


In some sense we are: the idea was that the same rdt cgroup would include other
features in RDT, and cache allocation is one of them. Hence the cgroup name is
RDT. Like Matt just commented, we found some naming issues with respect to the
APIs, whether to use cat or rdt. I plan to remove 'cat' altogether and use
cache alloc, as I just learnt it may be disliked because cat means an animal
(in English :))


The only reason to do it now is that we can't change the cgroup name later.



Why confuse things with calling it random other names?

From what I understand the whole RDT thing is the umbrella term for
Cache QoS Monitoring and Enforcement together. CQM is implemented
elsewhere, this part is only implementing CQE.

So just call it that, calling it RDT is actively misleading, because it
explicitly does _NOT_ do the monitoring half of it.


If it's just that you dislike the term, that's already known.


I think it's crazy to go CQE, no CAT, no RDT, but I could get over that in
time. But now it turns out you need _both_, and that's even more silly.


I agree the changing of names has led to enough confusion, and we can try to
make this better in coming features. I will send fixes where I will try to be
clearer on names: basically use rdt for things common to cache alloc and all
other features, and keep cache alloc for what is specific to cache allocation
(like the cache bit mask, which is only for cache alloc).








Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT

2015-05-06 Thread Matt Fleming
On Wed, 2015-05-06 at 10:09 +0200, Peter Zijlstra wrote:
> 
> But we're not implementing an umbrella right? We're implementing Cache
> QoS Enforcement (CQE aka. CAT).
> 
> Why confuse things with calling it random other names?
> 
> From what I understand the whole RDT thing is the umbrella term for
> Cache QoS Monitoring and Enforcement together. CQM is implemented
> elsewhere, this part is only implementing CQE.
> 
> So just call it that, calling it RDT is actively misleading, because it
> explicitly does _NOT_ do the monitoring half of it.

Right, and we're already running into this problem where some of the
function names contain "rdt" and some contain "cat".

How about we go with "intel cache alloc"? We avoid the dreaded TLA-fest,
it clearly matches up with what's in the Software Developer's Manual
(Cache Allocation Technology) and it's pretty simple for people who
haven't read the SDM to understand.



Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT

2015-05-06 Thread Peter Zijlstra
On Mon, May 04, 2015 at 10:30:15AM -0700, Vikas Shivappa wrote:
> Will fix the whitespace issues (including before return) or other possible
> coding convention issues.
> 
> It would make more sense to have this in checkpatch rather than having to
> point it out manually. If you want to have fun with that, go for it though.

My main objection was that your coding style is entirely inconsistent
with itself.

Sometimes you have a whitespace before return, sometimes you do not.

Sometimes you have exit labels with locks, sometimes you do not.

etc..

Pick one and stick to it; although we'd all much prefer if you pick the one
that's common to the kernel.


Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT

2015-05-06 Thread Peter Zijlstra
On Mon, May 04, 2015 at 10:30:15AM -0700, Vikas Shivappa wrote:
> On Sat, 2 May 2015, Peter Zijlstra wrote:
> 
> >
> >There's CAT in your subject, make up your minds already on what you want
> >to call this stuff.
> 
> We don't have control over the names. It is clear from patch 0/7, where it's

If I read 0/n its _after_ I've read all the other patches. The thing is,
0/n should not contain anything persistent. Patches should stand on
their own.

> explained that RDT is the umbrella term, CAT is a part of it, and this
> patch series is only for CAT ... It also mentions the exact section of the
> Intel manual this refers to. Is there still some lack of clarity here?

But we're not implementing an umbrella right? We're implementing Cache
QoS Enforcement (CQE aka. CAT).

Why confuse things with calling it random other names?

From what I understand the whole RDT thing is the umbrella term for
Cache QoS Monitoring and Enforcement together. CQM is implemented
elsewhere, this part is only implementing CQE.

So just call it that, calling it RDT is actively misleading, because it
explicitly does _NOT_ do the monitoring half of it.

> If it's just that you dislike the term, that's already known.

I think it's crazy to go CQE, no CAT, no RDT, but I could get over that in
time. But now it turns out you need _both_, and that's even more silly.



Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT

2015-05-04 Thread Vikas Shivappa



On Sat, 2 May 2015, Peter Zijlstra wrote:



There's CAT in your subject, make up your minds already on what you want
to call this stuff.


We don't have control over the names. It is clear from patch 0/7, where it's
explained that RDT is the umbrella term, CAT is a part of it, and this patch
series is only for CAT ... It also mentions the exact section of the Intel
manual this refers to. Is there still some lack of clarity here?

If it's just that you dislike the term, that's already known.

Otherwise we would have received suggestions for fancy names like vanilla or
ice cream or burger and spent time deciding on the names, but we can't do that
and keep consistent with the rest of the naming.




On Fri, May 01, 2015 at 06:36:37PM -0700, Vikas Shivappa wrote:

+static void rdt_free_closid(unsigned int clos)
+{
+


superfluous whitespace


Will fix the whitespace issues (including before return) or other possible 
coding convention issues.


It would make more sense to have this in checkpatch rather than having to
point it out manually. If you want to have fun with that, go for it though.
An automated approach would save a lot of time, both for reviewers and for
people submitting the code.

They are all run against checkpatch.




+   lockdep_assert_held(&rdt_group_mutex);
+
+   clear_bit(clos, rdtss_info.closmap);
+}



+static inline bool cbm_is_contiguous(unsigned long var)
+{
+   unsigned long first_bit, zero_bit;
+   unsigned long maxcbm = MAX_CBM_LENGTH;


flip these two lines


+
+   if (!var)
+   return false;
+
+   first_bit = find_next_bit(&var, maxcbm, 0);


What was wrong with find_first_bit()?


+   zero_bit = find_next_zero_bit(&var, maxcbm, first_bit);
+
+   if (find_next_bit(&var, maxcbm, zero_bit) < maxcbm)
+   return false;
+
+   return true;
+}
+
+static int cat_cbm_read(struct seq_file *m, void *v)
+{
+   struct intel_rdt *ir = css_rdt(seq_css(m));
+
+   seq_printf(m, "%08lx\n", ccmap[ir->clos].cache_mask);


inconsistent spacing, you mostly have whitespace before the return
statement, but here you do not.


+   return 0;
+}
+
+static int validate_cbm(struct intel_rdt *ir, unsigned long cbmvalue)
+{
+   struct intel_rdt *par, *c;
+   struct cgroup_subsys_state *css;
+   unsigned long *cbm_tmp;


No reason not to order these by line length, is there?


+
+   if (!cbm_is_contiguous(cbmvalue)) {
+   pr_err("bitmask should have >= 1 bits and be contiguous\n");
+   return -EINVAL;
+   }
+
+   par = parent_rdt(ir);
+   cbm_tmp = &ccmap[par->clos].cache_mask;
+   if (!bitmap_subset(&cbmvalue, cbm_tmp, MAX_CBM_LENGTH))
+   return -EINVAL;
+
+   rcu_read_lock();
+   rdt_for_each_child(css, ir) {
+   c = css_rdt(css);
+   cbm_tmp = &ccmap[c->clos].cache_mask;
+   if (!bitmap_subset(cbm_tmp, &cbmvalue, MAX_CBM_LENGTH)) {
+   rcu_read_unlock();
+   pr_err("Children's mask not a subset\n");
+   return -EINVAL;
+   }
+   }
+
+   rcu_read_unlock();


Daft whitespace again.


+   return 0;
+}
+
+static bool cbm_search(unsigned long cbm, int *closid)
+{
+   int maxid = boot_cpu_data.x86_cat_closs;
+   unsigned int i;
+
+   for (i = 0; i < maxid; i++) {
+   if (bitmap_equal(&cbm, &ccmap[i].cache_mask, MAX_CBM_LENGTH)) {
+   *closid = i;
+   return true;
+   }
+   }


and again


+   return false;
+}
+
+static void cbmmap_dump(void)
+{
+   int i;
+
+   pr_debug("CBMMAP\n");
+   for (i = 0; i < boot_cpu_data.x86_cat_closs; i++)
+   pr_debug("cache_mask: 0x%x,clos_refcnt: %u\n",
+(unsigned int)ccmap[i].cache_mask, ccmap[i].clos_refcnt);


This is missing {}


+}
+
+static void __cpu_cbm_update(void *info)
+{
+   unsigned int closid = *((unsigned int *)info);
+
+   wrmsrl(CBM_FROM_INDEX(closid), ccmap[closid].cache_mask);
+}



+static int cat_cbm_write(struct cgroup_subsys_state *css,
+struct cftype *cft, u64 cbmvalue)
+{
+   struct intel_rdt *ir = css_rdt(css);
+   ssize_t err = 0;
+   unsigned long cache_mask, max_mask;
+   unsigned long *cbm_tmp;
+   unsigned int closid;
+   u32 max_cbm = boot_cpu_data.x86_cat_cbmlength;


That's just a right mess isn't it?


+
+   if (ir == &rdt_root_group)
+   return -EPERM;
+   bitmap_set(&max_mask, 0, max_cbm);
+
+   /*
+* Need global mutex as cbm write may allocate a closid.
+*/
+   mutex_lock(&rdt_group_mutex);
+   bitmap_and(&cache_mask, (unsigned long *)&cbmvalue, &max_mask, max_cbm);
+   cbm_tmp = &ccmap[ir->clos].cache_mask;
+
+   if (bitmap_equal(&cache_mask, cbm_tmp, MAX_CBM_LENGTH))
+   goto out;
+
+ 

Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT

2015-05-02 Thread Peter Zijlstra

There's CAT in your subject, make up your minds already on what you want
to call this stuff.

On Fri, May 01, 2015 at 06:36:37PM -0700, Vikas Shivappa wrote:
> +static void rdt_free_closid(unsigned int clos)
> +{
> +

superfluous whitespace

> + lockdep_assert_held(&rdt_group_mutex);
> +
> + clear_bit(clos, rdtss_info.closmap);
> +}

> +static inline bool cbm_is_contiguous(unsigned long var)
> +{
> + unsigned long first_bit, zero_bit;
> + unsigned long maxcbm = MAX_CBM_LENGTH;

flip these two lines

> +
> + if (!var)
> + return false;
> +
> + first_bit = find_next_bit(&var, maxcbm, 0);

What was wrong with find_first_bit()?

> + zero_bit = find_next_zero_bit(&var, maxcbm, first_bit);
> +
> + if (find_next_bit(&var, maxcbm, zero_bit) < maxcbm)
> + return false;
> +
> + return true;
> +}
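[As an aside, the contiguity requirement under review here can be modeled in plain userspace C. This is only a sketch: the kernel's find_next_bit()/find_first_bit() helpers are replaced with a standard bit trick, and the function name is illustrative, not from the patch.]

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CBM_LENGTH 32

/*
 * Illustrative model of cbm_is_contiguous(): true when the mask has at
 * least one set bit and all set bits form a single contiguous run.
 */
static bool cbm_is_contiguous_model(uint32_t mask)
{
	if (!mask)
		return false;

	/*
	 * Shift out the trailing zeros; a contiguous mask then looks
	 * like 0b0...0111...1, so adding 1 yields a power of two.
	 */
	mask >>= __builtin_ctz(mask);
	return (mask & (mask + 1)) == 0;
}
```

[Shifting out the trailing zeros first is effectively what starting the scan at find_first_bit() would do, which appears to be the substitution Peter is suggesting.]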
> +
> +static int cat_cbm_read(struct seq_file *m, void *v)
> +{
> + struct intel_rdt *ir = css_rdt(seq_css(m));
> +
> + seq_printf(m, "%08lx\n", ccmap[ir->clos].cache_mask);

inconsistent spacing, you mostly have whitespace before the return
statement, but here you do not.

> + return 0;
> +}
> +
> +static int validate_cbm(struct intel_rdt *ir, unsigned long cbmvalue)
> +{
> + struct intel_rdt *par, *c;
> + struct cgroup_subsys_state *css;
> + unsigned long *cbm_tmp;

No reason not to order these by line length, is there?

> +
> + if (!cbm_is_contiguous(cbmvalue)) {
> + pr_err("bitmask should have >= 1 bits and be contiguous\n");
> + return -EINVAL;
> + }
> +
> + par = parent_rdt(ir);
> + cbm_tmp = &ccmap[par->clos].cache_mask;
> + if (!bitmap_subset(&cbmvalue, cbm_tmp, MAX_CBM_LENGTH))
> + return -EINVAL;
> +
> + rcu_read_lock();
> + rdt_for_each_child(css, ir) {
> + c = css_rdt(css);
> + cbm_tmp = &ccmap[c->clos].cache_mask;
> + if (!bitmap_subset(cbm_tmp, &cbmvalue, MAX_CBM_LENGTH)) {
> + rcu_read_unlock();
> + pr_err("Children's mask not a subset\n");
> + return -EINVAL;
> + }
> + }
> +
> + rcu_read_unlock();

Daft whitespace again.

> + return 0;
> +}
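[The hierarchy rule that validate_cbm() enforces (the new mask must be a subset of the parent's mask and a superset of every child's) can be sketched with plain bitmasks in userspace C; the names and the flat child array are illustrative assumptions, not the patch's cgroup data structures.]

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* True when every set bit of sub is also set in super. */
static bool mask_subset(uint32_t sub, uint32_t super)
{
	return (sub & ~super) == 0;
}

/*
 * Illustrative model of validate_cbm(): a new mask for a group is legal
 * only if it stays within the parent's mask and still covers all of the
 * group's children.
 */
static bool validate_mask(uint32_t new_mask, uint32_t parent_mask,
			  const uint32_t *child_masks, size_t nr_children)
{
	size_t i;

	if (!mask_subset(new_mask, parent_mask))
		return false;

	for (i = 0; i < nr_children; i++)
		if (!mask_subset(child_masks[i], new_mask))
			return false;

	return true;
}
```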
> +
> +static bool cbm_search(unsigned long cbm, int *closid)
> +{
> + int maxid = boot_cpu_data.x86_cat_closs;
> + unsigned int i;
> +
> + for (i = 0; i < maxid; i++) {
> + if (bitmap_equal(&cbm, &ccmap[i].cache_mask, MAX_CBM_LENGTH)) {
> + *closid = i;
> + return true;
> + }
> + }

and again

> + return false;
> +}
> +
> +static void cbmmap_dump(void)
> +{
> + int i;
> +
> + pr_debug("CBMMAP\n");
> + for (i = 0; i < boot_cpu_data.x86_cat_closs; i++)
> + pr_debug("cache_mask: 0x%x,clos_refcnt: %u\n",
> +  (unsigned int)ccmap[i].cache_mask, ccmap[i].clos_refcnt);

This is missing {}

> +}
> +
> +static void __cpu_cbm_update(void *info)
> +{
> + unsigned int closid = *((unsigned int *)info);
> +
> + wrmsrl(CBM_FROM_INDEX(closid), ccmap[closid].cache_mask);
> +}

> +static int cat_cbm_write(struct cgroup_subsys_state *css,
> +  struct cftype *cft, u64 cbmvalue)
> +{
> + struct intel_rdt *ir = css_rdt(css);
> + ssize_t err = 0;
> + unsigned long cache_mask, max_mask;
> + unsigned long *cbm_tmp;
> + unsigned int closid;
> + u32 max_cbm = boot_cpu_data.x86_cat_cbmlength;

That's just a right mess isn't it?

> +
> + if (ir == &rdt_root_group)
> + return -EPERM;
> + bitmap_set(&max_mask, 0, max_cbm);
> +
> + /*
> +  * Need global mutex as cbm write may allocate a closid.
> +  */
> + mutex_lock(&rdt_group_mutex);
> + bitmap_and(&cache_mask, (unsigned long *)&cbmvalue, &max_mask, max_cbm);
> + cbm_tmp = &ccmap[ir->clos].cache_mask;
> +
> + if (bitmap_equal(&cache_mask, cbm_tmp, MAX_CBM_LENGTH))
> + goto out;
> +
> + err = validate_cbm(ir, cache_mask);
> + if (err)
> + goto out;
> +
> + /*
> +  * At this point we are sure to change the cache_mask. Hence release the
> +  * reference to the current CLOSid and try to get a reference for
> +  * a different CLOSid.
> +  */
> + __clos_put(ir->clos);
> +
> + if (cbm_search(cache_mask, &closid)) {
> + ir->clos = closid;
> + __clos_get(closid);
> + } else {
> + err = rdt_alloc_closid(ir);
> + if (err)
> + goto out;
> +
> + ccmap[ir->clos].cache_mask = cache_mask;
> + cbm_update_all(ir->clos);
> + }
> +
> + cbmmap_dump();
> +out:
> +

Daft whitespace again. Also inconsistent return paradigm: here you use
an out label, whereas in validate_cbm() you did rcu_read_unlock() and
returned from the middle.

> + mutex_unlock(&rdt_group_mutex);
> + return err;
> +}
> +
> +static inline bool rdt_u

[PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT

2015-05-01 Thread Vikas Shivappa
Add support for cache bit mask manipulation. The change adds a file
cache_mask to the RDT cgroup which represents the CBM (cache bit mask)
for the cgroup.

Update to the CBM is done by writing to the IA32_L3_MASK_n.
The RDT cgroup follows the cgroup hierarchy; mkdir and adding tasks to the
cgroup never fail.  When a child cgroup is created it inherits the
CLOSid and the cache_mask from its parent.  When a user changes the
default CBM for a cgroup, a new CLOSid may be allocated if the
cache_mask was not used before. If the new CBM is the one that is
already used, the count for that CLOSid<->CBM is incremented. The
changing of 'cbm' may fail with -ENOSPC once the kernel runs out of
maximum CLOSids it can support.
A user can create as many cgroups as they want, but having different CBMs
at the same time is restricted by the maximum number of CLOSids
(multiple cgroups can have the same CBM).
Kernel maintains a CLOSid<->cbm mapping which keeps count
of cgroups using a CLOSid.

The tasks in the CAT cgroup would get to fill the L3 cache represented
by the cgroup's cache_mask file.

Reuse of CLOSids for cgroups with the same bitmask also has the following
advantages:
- This helps to use the scant CLOSids optimally.
- This also implies that during context switch, write to PQR-MSR is done
only when a task with a different bitmask is scheduled in.

During cpu bringup due to a hotplug event, IA32_L3_MASK_n MSR is
synchronized from the clos cbm map if it is used by any cgroup for the
package.
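[The CLOSid reuse scheme described above (look up an existing CLOSid with an identical bitmask and bump its refcount, otherwise allocate a free one) can be modeled in userspace C roughly as follows; this is a sketch with illustrative names and a small fixed table, not the kernel's closmap/ccmap implementation.]

```c
#include <stdint.h>

#define NR_CLOSIDS 4	/* CLOSids are a scant hardware resource */

struct clos_entry {
	uint32_t cache_mask;
	unsigned int refcnt;	/* number of cgroups sharing this CLOSid */
};

static struct clos_entry cmap[NR_CLOSIDS];

/*
 * Return a CLOSid for @mask, reusing an existing one when the bitmask
 * is already in use; -1 when out of CLOSids (-ENOSPC in the patch).
 */
static int closid_get(uint32_t mask)
{
	int i;

	/* Reuse: same bitmask, same CLOSid; just bump the refcount. */
	for (i = 0; i < NR_CLOSIDS; i++) {
		if (cmap[i].refcnt && cmap[i].cache_mask == mask) {
			cmap[i].refcnt++;
			return i;
		}
	}
	/* Otherwise allocate a free slot, as rdt_alloc_closid() does. */
	for (i = 0; i < NR_CLOSIDS; i++) {
		if (!cmap[i].refcnt) {
			cmap[i].cache_mask = mask;
			cmap[i].refcnt = 1;
			return i;
		}
	}
	return -1;
}

static void closid_put(int id)
{
	if (id >= 0 && cmap[id].refcnt)
		cmap[id].refcnt--;	/* slot becomes free at zero */
}
```

[Because tasks sharing a bitmask share a CLOSid, the PQR-MSR only needs to be rewritten on context switch when the incoming task's CLOSid differs, which is the second advantage listed above.]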

Signed-off-by: Vikas Shivappa 
---
 arch/x86/include/asm/intel_rdt.h |   7 +-
 arch/x86/kernel/cpu/intel_rdt.c  | 364 ---
 2 files changed, 346 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 87af1a5..9e9dbbe 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -4,6 +4,9 @@
 #ifdef CONFIG_CGROUP_RDT
 
 #include 
+#define MAX_CBM_LENGTH 32
+#define IA32_L3_CBM_BASE   0xc90
+#define CBM_FROM_INDEX(x)  (IA32_L3_CBM_BASE + x)
 
 struct rdt_subsys_info {
/* Clos Bitmap to keep track of available CLOSids.*/
@@ -17,8 +20,8 @@ struct intel_rdt {
 };
 
 struct clos_cbm_map {
-   unsigned long cbm;
-   unsigned int cgrp_count;
+   unsigned long cache_mask;
+   unsigned int clos_refcnt;
 };
 
 /*
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index eec57fe..58b39d6 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -24,16 +24,25 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
- * ccmap maintains 1:1 mapping between CLOSid and cbm.
+ * ccmap maintains 1:1 mapping between CLOSid and cache_mask.
  */
 static struct clos_cbm_map *ccmap;
 static struct rdt_subsys_info rdtss_info;
 static DEFINE_MUTEX(rdt_group_mutex);
 struct intel_rdt rdt_root_group;
 
+/*
+ * Mask of CPUs for writing CBM values. We only need one per-socket.
+ */
+static cpumask_t rdt_cpumask;
+
+#define rdt_for_each_child(pos_css, parent_ir) \
+   css_for_each_child((pos_css), &(parent_ir)->css)
+
 static inline bool cat_supported(struct cpuinfo_x86 *c)
 {
if (cpu_has(c, X86_FEATURE_CAT_L3))
@@ -42,22 +51,66 @@ static inline bool cat_supported(struct cpuinfo_x86 *c)
return false;
 }
 
+static void __clos_init(unsigned int closid)
+{
+   struct clos_cbm_map *ccm = &ccmap[closid];
+
+   lockdep_assert_held(&rdt_group_mutex);
+
+   ccm->clos_refcnt = 1;
+}
+
 /*
-* Called with the rdt_group_mutex held.
-*/
-static int rdt_free_closid(struct intel_rdt *ir)
+ * Allocates a new closid from unused closids.
+ */
+static int rdt_alloc_closid(struct intel_rdt *ir)
 {
+   unsigned int id;
+   unsigned int maxid;
 
lockdep_assert_held(&rdt_group_mutex);
 
-   WARN_ON(!ccmap[ir->clos].cgrp_count);
-   ccmap[ir->clos].cgrp_count--;
-   if (!ccmap[ir->clos].cgrp_count)
-   clear_bit(ir->clos, rdtss_info.closmap);
+   maxid = boot_cpu_data.x86_cat_closs;
+   id = find_next_zero_bit(rdtss_info.closmap, maxid, 0);
+   if (id == maxid)
+   return -ENOSPC;
+
+   set_bit(id, rdtss_info.closmap);
+   __clos_init(id);
+   ir->clos = id;
 
return 0;
 }
 
+static void rdt_free_closid(unsigned int clos)
+{
+
+   lockdep_assert_held(&rdt_group_mutex);
+
+   clear_bit(clos, rdtss_info.closmap);
+}
+
+static void __clos_get(unsigned int closid)
+{
+   struct clos_cbm_map *ccm = &ccmap[closid];
+
+   lockdep_assert_held(&rdt_group_mutex);
+
+   ccm->clos_refcnt += 1;
+}
+
+static void __clos_put(unsigned int closid)
+{
+   struct clos_cbm_map *ccm = &ccmap[closid];
+
+   lockdep_assert_held(&rdt_group_mutex);
+   WARN_ON(!ccm->clos_refcnt);
+
+   ccm->clos_refcnt -= 1;
+   if (!ccm->clos_refcnt)
+   rdt_free_closid(closid);
+}
+
 static struct cgroup_subsys_state 

Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT

2015-04-12 Thread Vikas Shivappa



On Thu, 9 Apr 2015, Marcelo Tosatti wrote:


On Thu, Mar 12, 2015 at 04:16:03PM -0700, Vikas Shivappa wrote:

Add support for cache bit mask manipulation. The change adds a file to
the RDT cgroup which represents the CBM(cache bit mask) for the cgroup.

The RDT cgroup follows the cgroup hierarchy; mkdir and adding tasks to the
cgroup never fail.  When a child cgroup is created it inherits the
CLOSid and the CBM from its parent.  When a user changes the default
CBM for a cgroup, a new CLOSid may be allocated if the CBM was not
used before. If the new CBM is the one that is already used, the
count for that CLOSid<->CBM is incremented. The changing of 'cbm'
may fail with -ENOSPC once the kernel runs out of maximum CLOSids it
can support.
User can create as many cgroups as he wants but having different CBMs
at the same time is restricted by the maximum number of CLOSids
(multiple cgroups can have the same CBM).
Kernel maintains a CLOSid<->cbm mapping which keeps count
of cgroups using a CLOSid.

The tasks in the CAT cgroup would get to fill the LLC cache represented
by the cgroup's 'cbm' file.

Reuse of CLOSids for cgroups with the same bitmask also has the following
advantages:
- This helps to use the scant CLOSids optimally.
- This also implies that during context switch, write to PQR-MSR is done
only when a task with a different bitmask is scheduled in.

Signed-off-by: Vikas Shivappa 
---
 arch/x86/include/asm/intel_rdt.h |   3 +
 arch/x86/kernel/cpu/intel_rdt.c  | 205 +++
 2 files changed, 208 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 87af1a5..0ed28d9 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -4,6 +4,9 @@
 #ifdef CONFIG_CGROUP_RDT

 #include 
+#define MAX_CBM_LENGTH 32
+#define IA32_L3_CBM_BASE   0xc90
+#define CBM_FROM_INDEX(x)  (IA32_L3_CBM_BASE + x)

 struct rdt_subsys_info {
/* Clos Bitmap to keep track of available CLOSids.*/
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 3726f41..495497a 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -33,6 +33,9 @@ static struct rdt_subsys_info rdtss_info;
 static DEFINE_MUTEX(rdt_group_mutex);
 struct intel_rdt rdt_root_group;

+#define rdt_for_each_child(pos_css, parent_ir) \
+   css_for_each_child((pos_css), &(parent_ir)->css)
+
 static inline bool cat_supported(struct cpuinfo_x86 *c)
 {
if (cpu_has(c, X86_FEATURE_CAT_L3))
@@ -83,6 +86,31 @@ static int __init rdt_late_init(void)
 late_initcall(rdt_late_init);

 /*
+ * Allocates a new closid from unused closids.
+ * Called with the rdt_group_mutex held.
+ */
+
+static int rdt_alloc_closid(struct intel_rdt *ir)
+{
+   unsigned int id;
+   unsigned int maxid;
+
+   lockdep_assert_held(&rdt_group_mutex);
+
+   maxid = boot_cpu_data.x86_cat_closs;
+   id = find_next_zero_bit(rdtss_info.closmap, maxid, 0);
+   if (id == maxid)
+   return -ENOSPC;
+
+   set_bit(id, rdtss_info.closmap);
+   WARN_ON(ccmap[id].cgrp_count);
+   ccmap[id].cgrp_count++;
+   ir->clos = id;
+
+   return 0;
+}
+
+/*
 * Called with the rdt_group_mutex held.
 */
 static int rdt_free_closid(struct intel_rdt *ir)
@@ -133,8 +161,185 @@ static void rdt_css_free(struct cgroup_subsys_state *css)
mutex_unlock(&rdt_group_mutex);
 }

+/*
+ * Tests if at least two contiguous bits are set.
+ */
+
+static inline bool cbm_is_contiguous(unsigned long var)
+{
+   unsigned long first_bit, zero_bit;
+   unsigned long maxcbm = MAX_CBM_LENGTH;
+
+   if (bitmap_weight(&var, maxcbm) < 2)
+   return false;
+
+   first_bit = find_next_bit(&var, maxcbm, 0);
+   zero_bit = find_next_zero_bit(&var, maxcbm, first_bit);
+
+   if (find_next_bit(&var, maxcbm, zero_bit) < maxcbm)
+   return false;
+
+   return true;
+}
+
+static int cat_cbm_read(struct seq_file *m, void *v)
+{
+   struct intel_rdt *ir = css_rdt(seq_css(m));
+
+   seq_printf(m, "%08lx\n", ccmap[ir->clos].cbm);
+   return 0;
+}
+
+static int validate_cbm(struct intel_rdt *ir, unsigned long cbmvalue)
+{
+   struct intel_rdt *par, *c;
+   struct cgroup_subsys_state *css;
+   unsigned long *cbm_tmp;
+
+   if (!cbm_is_contiguous(cbmvalue)) {
+   pr_info("cbm should have >= 2 bits and be contiguous\n");
+   return -EINVAL;
+   }
+
+   par = parent_rdt(ir);
+   cbm_tmp = &ccmap[par->clos].cbm;
+   if (!bitmap_subset(&cbmvalue, cbm_tmp, MAX_CBM_LENGTH))
+   return -EINVAL;


Can you have different errors for the different cases?


Could use -EPER




+   rcu_read_lock();
+   rdt_for_each_child(css, ir) {
+   c = css_rdt(css);
+   cbm_tmp = &ccmap[c->clos].cbm;
+   if (!bitmap_subset(cbm_tmp, &cbmvalue, MAX_CBM_LE

Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT

2015-04-09 Thread Marcelo Tosatti
On Thu, Mar 12, 2015 at 04:16:03PM -0700, Vikas Shivappa wrote:
> Add support for cache bit mask manipulation. The change adds a file to
> the RDT cgroup which represents the CBM(cache bit mask) for the cgroup.
> 
> The RDT cgroup follows the cgroup hierarchy; mkdir and adding tasks to the
> cgroup never fail.  When a child cgroup is created it inherits the
> CLOSid and the CBM from its parent.  When a user changes the default
> CBM for a cgroup, a new CLOSid may be allocated if the CBM was not
> used before. If the new CBM is the one that is already used, the
> count for that CLOSid<->CBM is incremented. The changing of 'cbm'
> may fail with -ENOSPC once the kernel runs out of maximum CLOSids it
> can support.
> User can create as many cgroups as he wants but having different CBMs
> at the same time is restricted by the maximum number of CLOSids
> (multiple cgroups can have the same CBM).
> Kernel maintains a CLOSid<->cbm mapping which keeps count
> of cgroups using a CLOSid.
> 
> The tasks in the CAT cgroup would get to fill the LLC cache represented
> by the cgroup's 'cbm' file.
> 
> Reuse of CLOSids for cgroups with the same bitmask also has the following
> advantages:
> - This helps to use the scant CLOSids optimally.
> - This also implies that during context switch, write to PQR-MSR is done
> only when a task with a different bitmask is scheduled in.
> 
> Signed-off-by: Vikas Shivappa 
> ---
>  arch/x86/include/asm/intel_rdt.h |   3 +
>  arch/x86/kernel/cpu/intel_rdt.c  | 205 
> +++
>  2 files changed, 208 insertions(+)
> 
> diff --git a/arch/x86/include/asm/intel_rdt.h 
> b/arch/x86/include/asm/intel_rdt.h
> index 87af1a5..0ed28d9 100644
> --- a/arch/x86/include/asm/intel_rdt.h
> +++ b/arch/x86/include/asm/intel_rdt.h
> @@ -4,6 +4,9 @@
>  #ifdef CONFIG_CGROUP_RDT
>  
>  #include 
> +#define MAX_CBM_LENGTH   32
> +#define IA32_L3_CBM_BASE 0xc90
> +#define CBM_FROM_INDEX(x)(IA32_L3_CBM_BASE + x)
>  
>  struct rdt_subsys_info {
>   /* Clos Bitmap to keep track of available CLOSids.*/
> diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
> index 3726f41..495497a 100644
> --- a/arch/x86/kernel/cpu/intel_rdt.c
> +++ b/arch/x86/kernel/cpu/intel_rdt.c
> @@ -33,6 +33,9 @@ static struct rdt_subsys_info rdtss_info;
>  static DEFINE_MUTEX(rdt_group_mutex);
>  struct intel_rdt rdt_root_group;
>  
> +#define rdt_for_each_child(pos_css, parent_ir)   \
> + css_for_each_child((pos_css), &(parent_ir)->css)
> +
>  static inline bool cat_supported(struct cpuinfo_x86 *c)
>  {
>   if (cpu_has(c, X86_FEATURE_CAT_L3))
> @@ -83,6 +86,31 @@ static int __init rdt_late_init(void)
>  late_initcall(rdt_late_init);
>  
>  /*
> + * Allocates a new closid from unused closids.
> + * Called with the rdt_group_mutex held.
> + */
> +
> +static int rdt_alloc_closid(struct intel_rdt *ir)
> +{
> + unsigned int id;
> + unsigned int maxid;
> +
> + lockdep_assert_held(&rdt_group_mutex);
> +
> + maxid = boot_cpu_data.x86_cat_closs;
> + id = find_next_zero_bit(rdtss_info.closmap, maxid, 0);
> + if (id == maxid)
> + return -ENOSPC;
> +
> + set_bit(id, rdtss_info.closmap);
> + WARN_ON(ccmap[id].cgrp_count);
> + ccmap[id].cgrp_count++;
> + ir->clos = id;
> +
> + return 0;
> +}
> +
> +/*
>  * Called with the rdt_group_mutex held.
>  */
>  static int rdt_free_closid(struct intel_rdt *ir)
> @@ -133,8 +161,185 @@ static void rdt_css_free(struct cgroup_subsys_state 
> *css)
>   mutex_unlock(&rdt_group_mutex);
>  }
>  
> +/*
> + * Tests if at least two contiguous bits are set.
> + */
> +
> +static inline bool cbm_is_contiguous(unsigned long var)
> +{
> + unsigned long first_bit, zero_bit;
> + unsigned long maxcbm = MAX_CBM_LENGTH;
> +
> + if (bitmap_weight(&var, maxcbm) < 2)
> + return false;
> +
> + first_bit = find_next_bit(&var, maxcbm, 0);
> + zero_bit = find_next_zero_bit(&var, maxcbm, first_bit);
> +
> + if (find_next_bit(&var, maxcbm, zero_bit) < maxcbm)
> + return false;
> +
> + return true;
> +}
> +
> +static int cat_cbm_read(struct seq_file *m, void *v)
> +{
> + struct intel_rdt *ir = css_rdt(seq_css(m));
> +
> + seq_printf(m, "%08lx\n", ccmap[ir->clos].cbm);
> + return 0;
> +}
> +
> +static int validate_cbm(struct intel_rdt *ir, unsigned long cbmvalue)
> +{
> + struct intel_rdt *par, *c;
> + struct cgroup_subsys_state *css;
> + unsigned long *cbm_tmp;
> +
> + if (!cbm_is_contiguous(cbmvalue)) {
> + pr_info("cbm should have >= 2 bits and be contiguous\n");
> + return -EINVAL;
> + }
> +
> + par = parent_rdt(ir);
> + cbm_tmp = &ccmap[par->clos].cbm;
> + if (!bitmap_subset(&cbmvalue, cbm_tmp, MAX_CBM_LENGTH))
> + return -EINVAL;

Can you have different errors for the different cases?

> + rcu_read_lock();
> + rdt_for_each_child(css,

[PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT

2015-03-12 Thread Vikas Shivappa
Add support for cache bit mask manipulation. The change adds a file to
the RDT cgroup which represents the CBM(cache bit mask) for the cgroup.

The RDT cgroup follows the cgroup hierarchy; mkdir and adding tasks to the
cgroup never fail.  When a child cgroup is created it inherits the
CLOSid and the CBM from its parent.  When a user changes the default
CBM for a cgroup, a new CLOSid may be allocated if the CBM was not
used before. If the new CBM is the one that is already used, the
count for that CLOSid<->CBM is incremented. The changing of 'cbm'
may fail with -ENOSPC once the kernel runs out of maximum CLOSids it
can support.
A user can create as many cgroups as they want, but having different CBMs
at the same time is restricted by the maximum number of CLOSids
(multiple cgroups can have the same CBM).
Kernel maintains a CLOSid<->cbm mapping which keeps count
of cgroups using a CLOSid.

The tasks in the CAT cgroup would get to fill the LLC cache represented
by the cgroup's 'cbm' file.

Reuse of CLOSids for cgroups with same bitmask also has following
advantages:
- This helps to use the scant CLOSids optimally.
- This also implies that during context switch, write to PQR-MSR is done
only when a task with a different bitmask is scheduled in.

Signed-off-by: Vikas Shivappa 
---
 arch/x86/include/asm/intel_rdt.h |   3 +
 arch/x86/kernel/cpu/intel_rdt.c  | 205 +++
 2 files changed, 208 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index 87af1a5..0ed28d9 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -4,6 +4,9 @@
 #ifdef CONFIG_CGROUP_RDT
 
 #include 
+#define MAX_CBM_LENGTH 32
+#define IA32_L3_CBM_BASE   0xc90
+#define CBM_FROM_INDEX(x)  (IA32_L3_CBM_BASE + x)
 
 struct rdt_subsys_info {
/* Clos Bitmap to keep track of available CLOSids.*/
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 3726f41..495497a 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -33,6 +33,9 @@ static struct rdt_subsys_info rdtss_info;
 static DEFINE_MUTEX(rdt_group_mutex);
 struct intel_rdt rdt_root_group;
 
+#define rdt_for_each_child(pos_css, parent_ir) \
+   css_for_each_child((pos_css), &(parent_ir)->css)
+
 static inline bool cat_supported(struct cpuinfo_x86 *c)
 {
if (cpu_has(c, X86_FEATURE_CAT_L3))
@@ -83,6 +86,31 @@ static int __init rdt_late_init(void)
 late_initcall(rdt_late_init);
 
 /*
+ * Allocates a new closid from unused closids.
+ * Called with the rdt_group_mutex held.
+ */
+
+static int rdt_alloc_closid(struct intel_rdt *ir)
+{
+   unsigned int id;
+   unsigned int maxid;
+
+   lockdep_assert_held(&rdt_group_mutex);
+
+   maxid = boot_cpu_data.x86_cat_closs;
+   id = find_next_zero_bit(rdtss_info.closmap, maxid, 0);
+   if (id == maxid)
+   return -ENOSPC;
+
+   set_bit(id, rdtss_info.closmap);
+   WARN_ON(ccmap[id].cgrp_count);
+   ccmap[id].cgrp_count++;
+   ir->clos = id;
+
+   return 0;
+}
+
+/*
 * Called with the rdt_group_mutex held.
 */
 static int rdt_free_closid(struct intel_rdt *ir)
@@ -133,8 +161,185 @@ static void rdt_css_free(struct cgroup_subsys_state *css)
mutex_unlock(&rdt_group_mutex);
 }
 
+/*
+ * Tests whether at least two contiguous bits are set.
+ */
+
+static inline bool cbm_is_contiguous(unsigned long var)
+{
+   unsigned long first_bit, zero_bit;
+   unsigned long maxcbm = MAX_CBM_LENGTH;
+
+   if (bitmap_weight(&var, maxcbm) < 2)
+   return false;
+
+   first_bit = find_next_bit(&var, maxcbm, 0);
+   zero_bit = find_next_zero_bit(&var, maxcbm, first_bit);
+
+   if (find_next_bit(&var, maxcbm, zero_bit) < maxcbm)
+   return false;
+
+   return true;
+}
+
+static int cat_cbm_read(struct seq_file *m, void *v)
+{
+   struct intel_rdt *ir = css_rdt(seq_css(m));
+
+   seq_printf(m, "%08lx\n", ccmap[ir->clos].cbm);
+   return 0;
+}
+
+static int validate_cbm(struct intel_rdt *ir, unsigned long cbmvalue)
+{
+   struct intel_rdt *par, *c;
+   struct cgroup_subsys_state *css;
+   unsigned long *cbm_tmp;
+
+   if (!cbm_is_contiguous(cbmvalue)) {
+   pr_info("cbm should have >= 2 bits and be contiguous\n");
+   return -EINVAL;
+   }
+
+   par = parent_rdt(ir);
+   cbm_tmp = &ccmap[par->clos].cbm;
+   if (!bitmap_subset(&cbmvalue, cbm_tmp, MAX_CBM_LENGTH))
+   return -EINVAL;
+
+   rcu_read_lock();
+   rdt_for_each_child(css, ir) {
+   c = css_rdt(css);
+   cbm_tmp = &ccmap[c->clos].cbm;
+   if (!bitmap_subset(cbm_tmp, &cbmvalue, MAX_CBM_LENGTH)) {
+   pr_info("Children's mask not a subset\n");
+   rcu_read_unlock();
+   return -EINVAL;
+   }
+   

Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT

2015-02-27 Thread Vikas Shivappa



On Fri, 27 Feb 2015, Tejun Heo wrote:


Hello, Vikas.

On Fri, Feb 27, 2015 at 11:34:16AM -0800, Vikas Shivappa wrote:

This cgroup subsystem would basically let the user partition one of the
Platform shared resource , the LLC cache. This could be extended in future


I suppose LLC means last level cache?  It'd be great if you can spell
out the full term when the abbreviation is first referenced in the
comments or documentation.



Yes that's last level cache. Will update documentation/comments if any.


to partition more shared resources when there is hardware support that way
we may eventually have more files in the cgroup. RDT is a generic term for
platform resource sharing.



For more information you can refer to section 17.15 of Intel SDM.
We did go through quite a bit of discussion on lkml regarding adding the
cgroup interface for CAT and the patches were posted only after that.
This cgroup would not interact with other cgroups in the sense would not
modify or add any elements to existing cgroups - there was such a proposal
but was removed as we did not get agreement on lkml.

the original lkml thread is here from 10/2014 for your reference -
https://lkml.org/lkml/2014/10/16/568


Yeap, I followed that thread and this being a separate controller
definitely makes a lot more sense.


I take it that the feature implemented is too coarse to allow for
weight based distribution?


Could you please clarify more on this ? However there is a limitation from
hardware that there have to be a minimum of 2 bits in the cbm if thats what
you referred to. Otherwise the bits in the cbm directly map to the number of
cache ways and hence the cache capacity ..


Right, so the granularity is fairly coarse and specifying things like
"distribute cache in 4:2:1 (or even in absolute bytes) to these three
cgroups" wouldn't work at all.


Specifying an arbitrary number of cache bytes would not be possible because
the minimum granularity has to be at least one cache way; the entire memory
can be indexed into a single cache way.
Exposing the bit mask granularity means users do not have to worry about how
many bytes a cache way is and can specify allocations in terms of the
bitmask. If we wanted to provide a cgroup interface where users specify the
size in bytes, we would need to show the user the minimum granularity in
bytes as well. Also note that the bit masks can overlap, so users have a way
to specify overlapping regions in the cache, which may be very useful in
many scenarios where multiple cgroups want to share the capacity.


The minimum granularity is 2 bits in the pre-production SKUs, and it does
put a limitation on the scenarios you describe. We will issue a patch update
if it gets relaxed in later SKUs. But note that the SDM also recommends
using 2 bits from a performance standpoint, because an application using
only one cache way would see a lot more conflicts.

Say the max CBM is 20 bits; then the granularity is 10% of the total cache.



Thanks.

--
tejun


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT

2015-02-27 Thread Tejun Heo
Hello, Vikas.

On Fri, Feb 27, 2015 at 11:34:16AM -0800, Vikas Shivappa wrote:
> This cgroup subsystem would basically let the user partition one of the
> Platform shared resource , the LLC cache. This could be extended in future

I suppose LLC means last level cache?  It'd be great if you can spell
out the full term when the abbreviation is first referenced in the
comments or documentation.

> to partition more shared resources when there is hardware support that way
> we may eventually have more files in the cgroup. RDT is a generic term for
> platform resource sharing.

> For more information you can refer to section 17.15 of Intel SDM.
> We did go through quite a bit of discussion on lkml regarding adding the
> cgroup interface for CAT and the patches were posted only after that.
> This cgroup would not interact with other cgroups in the sense would not
> modify or add any elements to existing cgroups - there was such a proposal
> but was removed as we did not get agreement on lkml.
>
> the original lkml thread is here from 10/2014 for your reference -
> https://lkml.org/lkml/2014/10/16/568

Yeap, I followed that thread and this being a separate controller
definitely makes a lot more sense.

>   I
> >take it that the feature implemented is too coarse to allow for weight
> >based distribution?
> >
> Could you please clarify more on this ? However there is a limitation from
> hardware that there have to be a minimum of 2 bits in the cbm if thats what
> you referred to. Otherwise the bits in the cbm directly map to the number of
> cache ways and hence the cache capacity ..

Right, so the granularity is fairly coarse and specifying things like
"distribute cache in 4:2:1 (or even in absolute bytes) to these three
cgroups" wouldn't work at all.

Thanks.

-- 
tejun


Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT

2015-02-27 Thread Vikas Shivappa


Hello Tejun,

On Fri, 27 Feb 2015, Tejun Heo wrote:


Hello,

On Tue, Feb 24, 2015 at 03:16:40PM -0800, Vikas Shivappa wrote:

Add support for cache bit mask manipulation. The change adds a file to
the RDT cgroup which represents the CBM (cache bit mask) for the cgroup.

The RDT cgroup follows the cgroup hierarchy; mkdir and adding tasks to
the cgroup never fail.  When a child cgroup is created it inherits the
CLOSid and the CBM from its parent.  When a user changes the default
CBM for a cgroup, a new CLOSid may be allocated if the CBM was not
used before. If the new CBM is one that is already in use, the count
for that CLOSid<->CBM pair is incremented. Changing the 'cbm' may fail
with -ENOSPC once the kernel runs out of the maximum number of CLOSids
it can support.
Users can create as many cgroups as they want, but having different
CBMs at the same time is restricted by the maximum number of CLOSids
(multiple cgroups can have the same CBM).
The kernel maintains a CLOSid<->CBM mapping which keeps a count of the
cgroups using each CLOSid.

The tasks in the CAT cgroup get to fill the LLC cache represented
by the cgroup's 'cbm' file.

Reusing CLOSids for cgroups with the same bitmask also has the
following advantages:
- It helps to use the scant CLOSids optimally.
- It also implies that during context switch, the write to the PQR MSR
  is done only when a task with a different bitmask is scheduled in.


I feel a bit underwhelmed about this new controller and its interface.
It is evidently at a lot lower level and way more niche than what
other controllers are doing, even cpuset.  At the same time, as long
as it's well isolated, it piggybacking on cgroup should be okay.


This cgroup subsystem basically lets the user partition one of the platform
shared resources, the LLC cache. This could be extended in the future to
partition more shared resources when there is hardware support; that way we
may eventually have more files in the cgroup. RDT is a generic term for
platform resource sharing.

For more information you can refer to section 17.15 of Intel SDM.
We did go through quite a bit of discussion on lkml regarding adding
the cgroup interface for CAT, and the patches were posted only after that.
This cgroup would not interact with other cgroups in the sense that it
would not modify or add any elements to existing cgroups; there was such a
proposal, but it was removed as we did not get agreement on lkml.


the original lkml thread is here from 10/2014 for your reference -
https://lkml.org/lkml/2014/10/16/568

I take it that the feature implemented is too coarse to allow for
weight based distribution?

Could you please clarify more on this? There is a hardware limitation that
the CBM must have a minimum of 2 bits, if that's what you are referring to.
Otherwise, the bits in the CBM directly map to the number of cache ways and
hence to the cache capacity.


Thanks,
Vikas


Thanks.

--
tejun




Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT

2015-02-27 Thread Tejun Heo
On Fri, Feb 27, 2015 at 07:12:22AM -0500, Tejun Heo wrote:
> I feel a bit underwhelmed about this new controller and its interface.
> It is evidently at a lot lower level and way more niche than what
> other controllers are doing, even cpuset.  At the same time, as long
> as it's well isolated, it piggybacking on cgroup should be okay.  I
> take it that the feature implemented is too coarse to allow for weight
> based distribution?

And, Ingo, Peter, are you guys in general agreeing with this addition?
As Tony said, we don't wanna be left way behind but that doesn't mean
we wanna jump on everything giving off the faintest sign of movement,
which sadly has happened often enough in the storage area at least.

Thanks.

-- 
tejun


Re: [PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT

2015-02-27 Thread Tejun Heo
Hello,

On Tue, Feb 24, 2015 at 03:16:40PM -0800, Vikas Shivappa wrote:
> Add support for cache bit mask manipulation. The change adds a file to
> the RDT cgroup which represents the CBM(cache bit mask) for the cgroup.
> 
> The RDT cgroup follows cgroup hierarchy ,mkdir and adding tasks to the
> cgroup never fails.  When a child cgroup is created it inherits the
> CLOSid and the CBM from its parent.  When a user changes the default
> CBM for a cgroup, a new CLOSid may be allocated if the CBM was not
> used before. If the new CBM is the one that is already used, the
> count for that CLOSid<->CBM is incremented. The changing of 'cbm'
> may fail with -ENOSPC once the kernel runs out of maximum CLOSids it
> can support.
> User can create as many cgroups as he wants but having different CBMs
> at the same time is restricted by the maximum number of CLOSids
> (multiple cgroups can have the same CBM).
> Kernel maintains a CLOSid<->cbm mapping which keeps count
> of cgroups using a CLOSid.
> 
> The tasks in the CAT cgroup would get to fill the LLC cache represented
> by the cgroup's 'cbm' file.
> 
> Reuse of CLOSids for cgroups with same bitmask also has following
> advantages:
> - This helps to use the scant CLOSids optimally.
> - This also implies that during context switch, write to PQR-MSR is done
> only when a task with a different bitmask is scheduled in.

I feel a bit underwhelmed about this new controller and its interface.
It is evidently at a lot lower level and way more niche than what
other controllers are doing, even cpuset.  At the same time, as long
as it's well isolated, it piggybacking on cgroup should be okay.  I
take it that the feature implemented is too coarse to allow for weight
based distribution?

Thanks.

-- 
tejun


[PATCH 3/7] x86/intel_rdt: Support cache bit mask for Intel CAT

2015-02-24 Thread Vikas Shivappa
Add support for cache bit mask manipulation. The change adds a file to
the RDT cgroup which represents the CBM (cache bit mask) for the cgroup.

The RDT cgroup follows the cgroup hierarchy; mkdir and adding tasks to
the cgroup never fail.  When a child cgroup is created it inherits the
CLOSid and the CBM from its parent.  When a user changes the default
CBM for a cgroup, a new CLOSid may be allocated if the CBM was not
used before. If the new CBM is one that is already in use, the count
for that CLOSid<->CBM pair is incremented. Changing the 'cbm' may fail
with -ENOSPC once the kernel runs out of the maximum number of CLOSids
it can support.
Users can create as many cgroups as they want, but having different
CBMs at the same time is restricted by the maximum number of CLOSids
(multiple cgroups can have the same CBM).
The kernel maintains a CLOSid<->CBM mapping which keeps a count of the
cgroups using each CLOSid.

The tasks in the CAT cgroup get to fill the last level cache (LLC)
represented by the cgroup's 'cbm' file.

Reusing CLOSids for cgroups with the same bitmask also has the
following advantages:
- It helps to use the scant CLOSids optimally.
- It also implies that during context switch, the write to the PQR MSR
  is done only when a task with a different bitmask is scheduled in.

Signed-off-by: Vikas Shivappa 
---
 arch/x86/include/asm/intel_rdt.h |   3 +
 arch/x86/kernel/cpu/intel_rdt.c  | 179 +++
 2 files changed, 182 insertions(+)

diff --git a/arch/x86/include/asm/intel_rdt.h b/arch/x86/include/asm/intel_rdt.h
index ecd9664..a414771 100644
--- a/arch/x86/include/asm/intel_rdt.h
+++ b/arch/x86/include/asm/intel_rdt.h
@@ -4,6 +4,9 @@
 #ifdef CONFIG_CGROUP_RDT
 
 #include 
+#define MAX_CBM_LENGTH 32
+#define IA32_L3_CBM_BASE   0xc90
+#define CBM_FROM_INDEX(x)  (IA32_L3_CBM_BASE + x)
 
 struct rdt_subsys_info {
/* Clos Bitmap to keep track of available CLOSids.*/
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 6cf1a16..dd090a7 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -33,6 +33,9 @@ static struct rdt_subsys_info rdtss_info;
 static DEFINE_MUTEX(rdt_group_mutex);
 struct intel_rdt rdt_root_group;
 
+#define rdt_for_each_child(pos_css, parent_ir) \
+   css_for_each_child((pos_css), &(parent_ir)->css)
+
 static inline bool cat_supported(struct cpuinfo_x86 *c)
 {
if (cpu_has(c, X86_FEATURE_CAT_L3))
@@ -84,6 +87,30 @@ static int __init rdt_late_init(void)
 late_initcall(rdt_late_init);
 
 /*
+ * Allocates a new closid from unused closids.
+ * Called with the rdt_group_mutex held.
+ */
+
+static int rdt_alloc_closid(struct intel_rdt *ir)
+{
+   unsigned int id;
+   unsigned int maxid;
+
+   lockdep_assert_held(&rdt_group_mutex);
+
+   maxid = boot_cpu_data.x86_cat_closs;
+   id = find_next_zero_bit(rdtss_info.closmap, maxid, 0);
+   if (id == maxid)
+   return -ENOSPC;
+
+   set_bit(id, rdtss_info.closmap);
+   ccmap[id].cgrp_count++;
+   ir->clos = id;
+
+   return 0;
+}
+
+/*
 * Called with the rdt_group_mutex held.
 */
 static int rdt_free_closid(struct intel_rdt *ir)
@@ -135,8 +162,160 @@ static void rdt_css_free(struct cgroup_subsys_state *css)
mutex_unlock(&rdt_group_mutex);
 }
 
+/*
+ * Tests whether at least two contiguous bits are set.
+ */
+
+static inline bool cbm_is_contiguous(unsigned long var)
+{
+   unsigned long first_bit, zero_bit;
+   unsigned long maxcbm = MAX_CBM_LENGTH;
+
+   if (bitmap_weight(&var, maxcbm) < 2)
+   return false;
+
+   first_bit = find_next_bit(&var, maxcbm, 0);
+   zero_bit = find_next_zero_bit(&var, maxcbm, first_bit);
+
+   if (find_next_bit(&var, maxcbm, zero_bit) < maxcbm)
+   return false;
+
+   return true;
+}
+
+static int cat_cbm_read(struct seq_file *m, void *v)
+{
+   struct intel_rdt *ir = css_rdt(seq_css(m));
+
+   seq_bitmap(m, ir->cbm, MAX_CBM_LENGTH);
+   seq_putc(m, '\n');
+   return 0;
+}
+
+static int validate_cbm(struct intel_rdt *ir, unsigned long cbmvalue)
+{
+   struct intel_rdt *par, *c;
+   struct cgroup_subsys_state *css;
+
+   if (!cbm_is_contiguous(cbmvalue)) {
+   pr_info("cbm should have >= 2 bits and be contiguous\n");
+   return -EINVAL;
+   }
+
+   par = parent_rdt(ir);
+   if (!bitmap_subset(&cbmvalue, par->cbm, MAX_CBM_LENGTH))
+   return -EINVAL;
+
+   rcu_read_lock();
+   rdt_for_each_child(css, ir) {
+   c = css_rdt(css);
+   if (!bitmap_subset(c->cbm, &cbmvalue, MAX_CBM_LENGTH)) {
+   pr_info("Children's mask not a subset\n");
+   rcu_read_unlock();
+   return -EINVAL;
+   }
+   }
+
+   rcu_read_unlock();
+   return 0;
+}
+
+static bool cbm_search(unsigned long cbm, int *closid)
+{
+   int maxid = boot_c