Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-24 Thread Marcelo Tosatti
On Sun, Aug 23, 2015 at 11:47:49AM -0700, Vikas Shivappa wrote:
> 
> 
> On Fri, 21 Aug 2015, Marcelo Tosatti wrote:
> 
> >On Thu, Aug 20, 2015 at 05:06:51PM -0700, Vikas Shivappa wrote:
> >>
> >>
> >>On Mon, 17 Aug 2015, Marcelo Tosatti wrote:
> >>
> >>>Vikas, Tejun,
> >>>
> >>>This is an updated interface. It addresses all comments made
> >>>so far and also covers all use-cases the cgroup interface
> >>>covers.
> >>>
> >>>Let me know what you think. I'll proceed to writing
> >>>the test applications.
> >>>
> >>>Usage model:
> >>>
> >>>
> >>>This document details how CAT technology is
> >>>exposed to userspace.
> >>>
> >>>Each task has a list of task cache reservation entries (TCRE list).
> >>>
> >>>The init process is created with empty TCRE list.
> >>>
> >>>There is a system-wide unique ID space, each TCRE is assigned
> >>>an ID from this space. ID's can be reused (but no two TCREs
> >>>have the same ID at one time).
> >>>
> >>>The interface accommodates transient and independent cache allocation
> >>>adjustments from applications, as well as static cache partitioning
> >>>schemes.
> >>>
> >>>Allocation:
> >>>Usage of the system calls requires the CAP_SYS_CACHE_RESERVATION capability.
> >>>
> >>>A configurable percentage is reserved to tasks with empty TCRE list.
> >
> >Hi Vikas,
> >
> >>And how do you think you will do this without a system-controlled
> >>mechanism? Every time in your proposal you include these caveats,
> >>which actually mean including a system-controlled interface in the
> >>background, and your interfaces below make no mention of this really!
> >>Why do we want to confuse ourselves like this?
> >>A syscall-only interface does not seem to work on its own for the
> >>cache allocation scenario. It can only be a nice-to-have interface
> >>on top of a system-controlled mechanism like the cgroup interface. Sure,
> >>you can do all the things you did with cgroups with the
> >>syscall interface, but the point is what are the use cases that can't
> >>be done with this syscall-only interface. (ex: to deal with cases
> >>you brought up earlier, like when an app does cache-intensive work
> >>for some time and later changes - it could use the syscall interface
> >>to quickly relinquish the cache lines or change a CLOS associated
> >>with it)
> >
> >All use cases can be covered with the syscall interface.
> >
> >* How to convert from cgroups interface to syscall interface:
> >Cgroup: Partition cache in cgroups, add tasks to cgroups.
> >Syscall: Partition cache in TCRE, add TCREs to tasks.
> >
> >You build the same structure (task <--> CBM) either via syscall
> >or via cgroups.
> >
> >Please be more specific, can't really see any problem.
> 
> Well, at first you mentioned that the cgroup does not support
> specifying size in bytes or as a percentage, and then you eventually
> agreed with my explanation that you can easily write a bash script to
> do the same with cgroup bitmasks (although I had to go through the
> pain of reading all the proposals you sent without being given a chance
> to explain how it can be used).

Yes, we could write the (bytes --to--> cache ways) conversion in
userspace. But since we are going for a different interface, we can
also fix that problem in the kernel.
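As a rough illustration, the userspace conversion could look something like
the sketch below. The cache geometry is hardcoded as an assumption (a real
script would read it from CPUID or sysfs), and whether to round up or down is
exactly what the proposed CACHE_RSVT_ROUND_DOWN flag is about.

/* Illustrative only: convert a size request in bytes into a contiguous
 * cache bitmask (CBM).  The 20-way, 20MB L3 is an assumption; a real
 * tool would query the hardware. */
#include <stdio.h>

int main(void)
{
        unsigned long l3_bytes  = 20UL * 1024 * 1024;  /* assumed L3 size   */
        unsigned int  l3_ways   = 20;                  /* assumed way count */
        unsigned long way_bytes = l3_bytes / l3_ways;  /* bytes per way     */
        unsigned long request   = 3UL * 1024 * 1024;   /* app asks for ~3MB */

        /* Round up to whole ways; rounding down instead would mirror the
         * proposed CACHE_RSVT_ROUND_DOWN behaviour. */
        unsigned int  ways = (request + way_bytes - 1) / way_bytes;
        unsigned long cbm  = (1UL << ways) - 1;        /* contiguous low ways */

        printf("%lu bytes -> %u way(s) -> cbm 0x%lx\n", request, ways, cbm);
        return 0;
}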

> Then you were confused by
> how I explained the co-mounting of cpuset and intel_rdt, and
> instead of asking a question or pointing out an issue, you went ahead and
> wrote a whole proposal and in the end even said you would cook a patch
> before I even tried to explain it to you.

The syscall interface is more flexible.

Why not use a more flexible interface if possible?

> And then you sent proposal after proposal,
> which varied from
> modifying the cgroup interface itself to slightly modifying cgroups

Yes, trying to solve the problems our customers will be facing in the field.
So these proposals are not coming out of thin air.

> and adding syscalls and then also automatically controlling the
> cache alloc (with all your extend mask capabilities) without
> understanding what the framework is meant to do or just asking or
> specifically pointing out any issues in the patch. 

There is a practical problem the "extension" of mask capabilities is 
solving. Check item 6 of the attached text document.

> You had been
> reviewing the cgroup patches for many versions, unlike others who
> accepted they needed time to think about it or accepted that they
> may not understand the feature yet.
> So what is it that changed in the patches that is not acceptable now?

Tejun proposed a syscall interface. He is right: a syscall interface
is much more flexible. Blame him.

> Many things have been brought up multiple times even after you agreed
> to a solution already proposed. I was only suggesting that this can
> be better and less confusing if you point out the exact issue in the
> patch, just like Thomas and all the other reviewers have been doing.
>
> With the rest of the reviewers I either fix the issue or point out a
> 

Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-23 Thread Vikas Shivappa



On Fri, 21 Aug 2015, Marcelo Tosatti wrote:


On Thu, Aug 20, 2015 at 05:06:51PM -0700, Vikas Shivappa wrote:



On Mon, 17 Aug 2015, Marcelo Tosatti wrote:


Vikas, Tejun,

This is an updated interface. It addresses all comments made
so far and also covers all use-cases the cgroup interface
covers.

Let me know what you think. I'll proceed to writing
the test applications.

Usage model:


This document details how CAT technology is
exposed to userspace.

Each task has a list of task cache reservation entries (TCRE list).

The init process is created with empty TCRE list.

There is a system-wide unique ID space, each TCRE is assigned
an ID from this space. ID's can be reused (but no two TCREs
have the same ID at one time).

The interface accommodates transient and independent cache allocation
adjustments from applications, as well as static cache partitioning
schemes.

Allocation:
Usage of the system calls requires the CAP_SYS_CACHE_RESERVATION capability.

A configurable percentage is reserved to tasks with empty TCRE list.


Hi Vikas,


And how do you think you will do this without a system-controlled
mechanism? Every time in your proposal you include these caveats,
which actually mean including a system-controlled interface in the
background, and your interfaces below make no mention of this really!
Why do we want to confuse ourselves like this?
A syscall-only interface does not seem to work on its own for the
cache allocation scenario. It can only be a nice-to-have interface
on top of a system-controlled mechanism like the cgroup interface. Sure,
you can do all the things you did with cgroups with the
syscall interface, but the point is what are the use cases that can't
be done with this syscall-only interface. (ex: to deal with cases
you brought up earlier, like when an app does cache-intensive work
for some time and later changes - it could use the syscall interface
to quickly relinquish the cache lines or change a CLOS associated
with it)


All use cases can be covered with the syscall interface.

* How to convert from cgroups interface to syscall interface:
Cgroup: Partition cache in cgroups, add tasks to cgroups.
Syscall: Partition cache in TCRE, add TCREs to tasks.

You build the same structure (task <--> CBM) either via syscall
or via cgroups.

Please be more specific, can't really see any problem.


Well, at first you mentioned that the cgroup does not support specifying size
in bytes or as a percentage, and then you eventually agreed with my explanation
that you can easily write a bash script to do the same with cgroup bitmasks
(although I had to go through the pain of reading all the proposals you sent
without being given a chance to explain how it can be used). Then you were
confused by how I explained the co-mounting of cpuset and intel_rdt, and
instead of asking a question or pointing out an issue, you went ahead and wrote
a whole proposal and in the end even said you would cook a patch before I even
tried to explain it to you.
And then you sent proposal after proposal, which varied from modifying the
cgroup interface itself to slightly modifying cgroups and adding syscalls, and
then also automatically controlling the cache alloc (with all your extend mask
capabilities), without understanding what the framework is meant to do or just
asking or specifically pointing out any issues in the patch. You had been
reviewing the cgroup patches for many versions, unlike others who accepted they
needed time to think about it or accepted that they may not understand the
feature yet.
So what is it that changed in the patches that is not acceptable now? Many
things have been brought up multiple times even after you agreed to a solution
already proposed. I was only suggesting that this can be better and less
confusing if you point out the exact issue in the patch, just like Thomas and
all the other reviewers have been doing. With the rest of the reviewers I
either fix the issue or point out a flaw in the review.
If you don't like the cgroup interface now, it would be best to indicate or
discuss the specifics of the shortcomings clearly before sending new proposals.
That way we can come up with an interface which does better and works better in
Linux if we can. Otherwise we may just end up adding more code which just does
the same thing?


However, I have been working on an alternate interface as well and have just
sent it for your reference.





I have repeatedly listed the use cases that can be dealt with, with
this interface. How will you address cases like 1.1 and 1.2 with
your syscall-only interface?


Case 1.1:


 1.1> Exclusive access:  The task cannot give *itself* exclusive
access from using the cache. For this it needs to have visibility of
the cache allocation of other tasks and may need to reclaim or
override others' cache allocs, which is not feasible (isn't that the
ability of a system managing agent?).

Answer: if the application has CAP_SYS_CACHE_RESERVATION, it can
create cache 

Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-21 Thread Marcelo Tosatti
On Thu, Aug 20, 2015 at 05:06:51PM -0700, Vikas Shivappa wrote:
> 
> 
> On Mon, 17 Aug 2015, Marcelo Tosatti wrote:
> 
> >Vikas, Tejun,
> >
> >This is an updated interface. It addresses all comments made
> >so far and also covers all use-cases the cgroup interface
> >covers.
> >
> >Let me know what you think. I'll proceed to writing
> >the test applications.
> >
> >Usage model:
> >
> >
> >This document details how CAT technology is
> >exposed to userspace.
> >
> >Each task has a list of task cache reservation entries (TCRE list).
> >
> >The init process is created with empty TCRE list.
> >
> >There is a system-wide unique ID space, each TCRE is assigned
> >an ID from this space. ID's can be reused (but no two TCREs
> >have the same ID at one time).
> >
> >The interface accommodates transient and independent cache allocation
> >adjustments from applications, as well as static cache partitioning
> >schemes.
> >
> >Allocation:
> >Usage of the system calls requires the CAP_SYS_CACHE_RESERVATION capability.
> >
> >A configurable percentage is reserved to tasks with empty TCRE list.

Hi Vikas,

> And how do you think you will do this without a system-controlled
> mechanism? Every time in your proposal you include these caveats,
> which actually mean including a system-controlled interface in the
> background, and your interfaces below make no mention of this really!
> Why do we want to confuse ourselves like this?
> A syscall-only interface does not seem to work on its own for the
> cache allocation scenario. It can only be a nice-to-have interface
> on top of a system-controlled mechanism like the cgroup interface. Sure,
> you can do all the things you did with cgroups with the
> syscall interface, but the point is what are the use cases that can't
> be done with this syscall-only interface. (ex: to deal with cases
> you brought up earlier, like when an app does cache-intensive work
> for some time and later changes - it could use the syscall interface
> to quickly relinquish the cache lines or change a CLOS associated
> with it)

All use cases can be covered with the syscall interface.

* How to convert from cgroups interface to syscall interface:
Cgroup: Partition cache in cgroups, add tasks to cgroups.
Syscall: Partition cache in TCRE, add TCREs to tasks.

You build the same structure (task <--> CBM) either via syscall
or via cgroups.

Please be more specific, can't really see any problem.
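To make the task <--> CBM mapping above concrete, below is a sketch of the
cgroup half of that structure. The mount point and the cache-mask file name
are assumptions loosely modelled on the intel_rdt cgroup patches (only the
per-cgroup "tasks" file is standard cgroup v1); the TCRE half is sketched
further down in this thread.

/* Sketch: the admin partitions the cache by giving a cgroup a bitmask
 * and then adds tasks to that cgroup.  Paths and the cache-mask file
 * name are assumptions, not a merged interface. */
#include <stdio.h>

static int write_str(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (!f)
                return -1;
        fprintf(f, "%s\n", val);
        return fclose(f);
}

int main(void)
{
        /* Assumed layout: /sys/fs/cgroup/intel_rdt/grp1 created beforehand. */
        if (write_str("/sys/fs/cgroup/intel_rdt/grp1/intel_rdt.l3_cbm",
                      "0x3f") < 0)                 /* 6 cache ways for grp1 */
                perror("set cache bitmask");
        if (write_str("/sys/fs/cgroup/intel_rdt/grp1/tasks", "1234") < 0)
                perror("add task");                /* 1234 is an example pid */
        return 0;
}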

> I have repeatedly listed the use cases that can be dealt with, with
> this interface. How will you address cases like 1.1 and 1.2 with
> your syscall-only interface?

Case 1.1:


  1.1> Exclusive access: The task cannot give *itself* exclusive
access from using the cache. For this it needs to have visibility of
the cache allocation of other tasks and may need to reclaim or
override others' cache allocs, which is not feasible (isn't that the
ability of a system-managing agent?).

Answer: if the application has CAP_SYS_CACHE_RESERVATION, it can
create cache allocations and remove cache allocations from
other applications. So only the administrator could do it.

Case 1.2 answer below.

> So we expect all the millions of apps
> like SAP, Oracle, etc. and all the millions of app developers
> to magically learn our new syscall interface and also cooperate
> between themselves to decide a cache allocation that is agreeable to
> all? (which btw the interface doesn't list below how to do it) and

They don't have to: the administrator can use a "cacheset" application.

If an application wants to control the cache, it can.
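As a sketch of what such a "cacheset"-like helper could look like on top of
the proposed calls (the syscall numbers are placeholders and the calls
themselves were never merged, so this is purely illustrative):

/* cacheset sketch: "cacheset <pid> <kbytes>" creates a reservation and
 * attaches it to an existing task.  Needs CAP_SYS_CACHE_RESERVATION
 * per the proposal.  Syscall numbers below are placeholders. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

#define __NR_create_cache_reservation 1000     /* hypothetical */
#define __NR_attach_cache_reservation 1001     /* hypothetical */

struct cache_reservation {
        unsigned long kbytes;
        int type, flags, tcrid;
};

int main(int argc, char **argv)
{
        struct cache_reservation r = { 0 };
        int pid;

        if (argc != 3) {
                fprintf(stderr, "usage: %s <pid> <kbytes>\n", argv[0]);
                return 1;
        }
        pid = atoi(argv[1]);
        r.kbytes = strtoul(argv[2], NULL, 0);
        r.type = 2;                            /* CACHE_RSVT_TYPE_BOTH */

        /* Create the TCRE, then attach it to the target task. */
        if (syscall(__NR_create_cache_reservation, &r) < 0) {
                perror("create_cache_reservation");
                return 1;
        }
        if (syscall(__NR_attach_cache_reservation, pid, r.tcrid) < 0) {
                perror("attach_cache_reservation");
                return 1;
        }
        printf("tcrid %d (%lu kbytes) attached to pid %d\n",
               r.tcrid, r.kbytes, pid);
        return 0;
}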

> then by some godly powers the noisy neighbour will decide himself
> to give up the cache?

I suppose you imagine something like this:
http://arxiv.org/pdf/1410.6513.pdf

No, the syscall interface does not need to care about that because:

* If you can set cache (CAP_SYS_CACHE_RESERVATION capability), 
you can remove cache reservation from your neighbours.

So this problem does not exist (it assumes participants are
cooperative).

There is one confusion in the argument for cases 1.1 and 1.2:
that applications are supposed to include in their decision of cache
allocation size the status of the system as a whole. This is a flawed
argument. Please point out specifically if this is not the case or if there
is another case still not covered.

It would be possible to partition the cache into watermarks such
as: 

task group A - can reserve up to 20% of cache.
task group B - can reserve up to 25% of cache.
task group C - can reserve 50% of cache.

But i am not sure... Tejun, do you think that is necessary?
(CAP_SYS_CACHE_RESERVATION is good enough for our usecases).

> (that would be the first app ever to not request
> more resources for itself and hurt its own performance
> - they surely don't want to do social service!)
>
> And how do we do case 1.5, where the administrator wants to assign
> cache to specific VMs in a cloud etc. - with the hypothetical syscall
> 

Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-20 Thread Vikas Shivappa



On Thu, 20 Aug 2015, Vikas Shivappa wrote:




On Mon, 17 Aug 2015, Marcelo Tosatti wrote:


Vikas, Tejun,

This is an updated interface. It addresses all comments made
so far and also covers all use-cases the cgroup interface
covers.

Let me know what you think. I'll proceed to writing
the test applications.

Usage model:


This document details how CAT technology is
exposed to userspace.

Each task has a list of task cache reservation entries (TCRE list).

The init process is created with empty TCRE list.

There is a system-wide unique ID space, each TCRE is assigned
an ID from this space. ID's can be reused (but no two TCREs
have the same ID at one time).

The interface accommodates transient and independent cache allocation
adjustments from applications, as well as static cache partitioning
schemes.

Allocation:
Usage of the system calls requires the CAP_SYS_CACHE_RESERVATION capability.

A configurable percentage is reserved to tasks with empty TCRE list.


And how do you think you will do this without a system-controlled mechanism?
Every time in your proposal you include these caveats, which actually mean
including a system-controlled interface in the background, and your interfaces
below make no mention of this really! Why do we want to confuse ourselves like
this?


A syscall-only interface does not seem to work on its own for the cache
allocation scenario. It can only be a nice-to-have interface on top of a
system-controlled mechanism like the cgroup interface. Sure, you can do all the
things you did with cgroups with the syscall interface, but the point is what
are the use cases that can't be done with this syscall-only interface. (ex: to
deal with cases you brought up earlier, like when an app does cache-intensive
work for some time and later changes - it could use the syscall interface to
quickly relinquish the cache lines or change a CLOS associated with it)


I have repeatedly listed the use cases that can be dealt with, with this

big typo - 'use cases that cannot be dealt with'

interface. How will you address cases like 1.1 and 1.2 with your syscall-only
interface? So we expect all the millions of apps like SAP, Oracle, etc. and all
the millions of app developers to magically learn our new syscall interface and
also cooperate between themselves to decide a cache allocation that is
agreeable to all? (which btw the interface doesn't list below how to do it) and
then by some godly powers the noisy neighbour will decide himself to give up
the cache? (that would be the first app ever to not request more resources for
itself and hurt its own performance - they surely don't want to do social
service!)


And how do we do case 1.5, where the administrator wants to assign cache to
specific VMs in a cloud etc.? With the hypothetical syscall interface we now
expect all the apps to do the above, and now they also need to know where they
run (what VM, what socket, etc.) and then decide and cooperate on an
allocation. Compare this to a container environment like Rancher, where today
the admin can conveniently use Docker underneath to allocate
mem/storage/compute to containers and easily extend this to include shared L3.


http://marc.info/?l=linux-kernel&m=143889397419199

Without addressing the above, the details of the interface below are
irrelevant -


Your initial request was to extend the cgroup interface to include rounding
off the size of cache (which can easily be done with a bash script on top of
the cgroup interface!), and now you are proposing a syscall-only interface?
This is very confusing and will only unnecessarily delay the process without
adding any value.


However, like I mentioned, the syscall interface, or the user/app being able to
modify the cache alloc, could be used to address some very specific use cases
on top of an existing system-managed interface. This is not really a common
case in cloud or container environments, and neither is it a feasible,
deployable solution.
Just consider the millions of apps that would have to transition to such an
interface to even use it - if that's the only way to do it, that's dead on
arrival.


Also, please do not include the kernel automatically adjusting resources in
your reply, as that is totally irrelevant and again more confusing, as we have
already exchanged some >100 emails on this same patch version without it
meaning anything so far.


The debate is purely between a syscall-only interface and a system-manageable
interface (like cgroups, where the admin or a central entity controls the
resources). If not, define what it is first before going into details.


Thanks,
Vikas



On fork, the child inherits the TCR from its parent.

Semantics:
Once a TCRE is created and assigned to a task, that task has
guaranteed reservation on any CPU where it's scheduled in,
for the lifetime of the TCRE.

A task can have its TCR list modified without notification.

FIXME: Add a per-task flag to not copy the TCR list of a task but 

Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-20 Thread Vikas Shivappa



On Mon, 17 Aug 2015, Marcelo Tosatti wrote:


Vikas, Tejun,

This is an updated interface. It addresses all comments made
so far and also covers all use-cases the cgroup interface
covers.

Let me know what you think. I'll proceed to writing
the test applications.

Usage model:


This document details how CAT technology is
exposed to userspace.

Each task has a list of task cache reservation entries (TCRE list).

The init process is created with empty TCRE list.

There is a system-wide unique ID space, each TCRE is assigned
an ID from this space. ID's can be reused (but no two TCREs
have the same ID at one time).

The interface accommodates transient and independent cache allocation
adjustments from applications, as well as static cache partitioning
schemes.

Allocation:
Usage of the system calls requires the CAP_SYS_CACHE_RESERVATION capability.

A configurable percentage is reserved to tasks with empty TCRE list.


And how do you think you will do this without a system-controlled mechanism?
Every time in your proposal you include these caveats, which actually mean
including a system-controlled interface in the background, and your interfaces
below make no mention of this really! Why do we want to confuse ourselves like
this?


A syscall-only interface does not seem to work on its own for the cache
allocation scenario. It can only be a nice-to-have interface on top of a
system-controlled mechanism like the cgroup interface. Sure, you can do all the
things you did with cgroups with the syscall interface, but the point is what
are the use cases that can't be done with this syscall-only interface. (ex: to
deal with cases you brought up earlier, like when an app does cache-intensive
work for some time and later changes - it could use the syscall interface to
quickly relinquish the cache lines or change a CLOS associated with it)
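For illustration, that kind of transient adjustment on top of the proposed
calls could look roughly like the sketch below (syscall numbers are
placeholders; none of these calls exist in mainline):

/* Transient use: hold a reservation only for the cache-intensive phase,
 * then relinquish it.  Syscall numbers are placeholders. */
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#define __NR_create_cache_reservation 1000     /* hypothetical */
#define __NR_delete_cache_reservation 1002     /* hypothetical */

struct cache_reservation {
        unsigned long kbytes;
        int type, flags, tcrid;
};

static void cache_intensive_phase(void)
{
        /* ... the hot part of the workload ... */
}

int main(void)
{
        struct cache_reservation r = { .kbytes = 4096, .type = 2 /* BOTH */ };

        if (syscall(__NR_create_cache_reservation, &r) < 0)
                perror("create");              /* run unreserved if it fails */

        cache_intensive_phase();

        /* Phase over: give the cache ways back instead of holding them. */
        if (syscall(__NR_delete_cache_reservation, &r) < 0)
                perror("delete");
        return 0;
}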


I have repeatedly listed the use cases that can be dealt with, with this
interface. How will you address cases like 1.1 and 1.2 with your syscall-only
interface? So we expect all the millions of apps like SAP, Oracle, etc. and all
the millions of app developers to magically learn our new syscall interface and
also cooperate between themselves to decide a cache allocation that is
agreeable to all? (which btw the interface doesn't list below how to do it) and
then by some godly powers the noisy neighbour will decide himself to give up
the cache? (that would be the first app ever to not request more resources for
itself and hurt its own performance - they surely don't want to do social
service!)


And how do we do case 1.5, where the administrator wants to assign cache to
specific VMs in a cloud etc.? With the hypothetical syscall interface we now
expect all the apps to do the above, and now they also need to know where they
run (what VM, what socket, etc.) and then decide and cooperate on an
allocation. Compare this to a container environment like Rancher, where today
the admin can conveniently use Docker underneath to allocate
mem/storage/compute to containers and easily extend this to include shared L3.


http://marc.info/?l=linux-kernel&m=143889397419199

Without addressing the above, the details of the interface below are
irrelevant -

Your initial request was to extend the cgroup interface to include rounding
off the size of cache (which can easily be done with a bash script on top of
the cgroup interface!), and now you are proposing a syscall-only interface?
This is very confusing and will only unnecessarily delay the process without
adding any value.


However, like I mentioned, the syscall interface, or the user/app being able to
modify the cache alloc, could be used to address some very specific use cases
on top of an existing system-managed interface. This is not really a common
case in cloud or container environments, and neither is it a feasible,
deployable solution.
Just consider the millions of apps that would have to transition to such an
interface to even use it - if that's the only way to do it, that's dead on
arrival.


Also, please do not include the kernel automatically adjusting resources in
your reply, as that is totally irrelevant and again more confusing, as we have
already exchanged some >100 emails on this same patch version without it
meaning anything so far.


The debate is purely between a syscall-only interface and a system-manageable
interface (like cgroups, where the admin or a central entity controls the
resources). If not, define what it is first before going into details.


Thanks,
Vikas



On fork, the child inherits the TCR from its parent.

Semantics:
Once a TCRE is created and assigned to a task, that task has
guaranteed reservation on any CPU where it's scheduled in,
for the lifetime of the TCRE.

A task can have its TCR list modified without notification.

FIXME: Add a per-task flag to not copy the TCR list of a task but delete
all TCR's on fork.

Interface:

enum cache_rsvt_flags {
  CACHE_RSVT_ROUND_DOWN   =  (1 

Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-17 Thread Marcelo Tosatti
Vikas, Tejun,

This is an updated interface. It addresses all comments made 
so far and also covers all use-cases the cgroup interface
covers.

Let me know what you think. I'll proceed to writing 
the test applications.

Usage model:


This document details how CAT technology is 
exposed to userspace.

Each task has a list of task cache reservation entries (TCRE list).

The init process is created with empty TCRE list.

There is a system-wide unique ID space, each TCRE is assigned
an ID from this space. ID's can be reused (but no two TCREs 
have the same ID at one time).

The interface accommodates transient and independent cache allocation
adjustments from applications, as well as static cache partitioning
schemes.

Allocation:
Usage of the system calls requires the CAP_SYS_CACHE_RESERVATION capability.

A configurable percentage is reserved to tasks with empty TCRE list.

On fork, the child inherits the TCR from its parent.

Semantics:
Once a TCRE is created and assigned to a task, that task has 
guaranteed reservation on any CPU where it's scheduled in,
for the lifetime of the TCRE.

A task can have its TCR list modified without notification.

FIXME: Add a per-task flag to not copy the TCR list of a task but delete
all TCR's on fork.

Interface:

enum cache_rsvt_flags {
   CACHE_RSVT_ROUND_DOWN   =  (1 << 0),/* round "kbytes" down */
};

enum cache_rsvt_type {
   CACHE_RSVT_TYPE_CODE = 0,  /* cache reservation is for code */
   CACHE_RSVT_TYPE_DATA,  /* cache reservation is for data */
   CACHE_RSVT_TYPE_BOTH,  /* cache reservation is for code and data */
};

struct cache_reservation {
unsigned long kbytes;
int type;
int flags;
int tcrid;
};

The following syscalls modify the TCR of a task:

* int sys_create_cache_reservation(struct cache_reservation *rsvt);
DESCRIPTION: Creates a cache reservation entry, and assigns 
it to the current task.

Returns -ENOMEM if not enough space, -EPERM if no permission.
Returns 0 if the reservation has been successful, copying the actual
number of kbytes reserved to "kbytes", the type to "type", and the
assigned ID to "tcrid".

* int sys_delete_cache_reservation(struct cache_reservation *rsvt);
DESCRIPTION: Deletes a cache reservation entry, deassigning it
from any task.

Backward compatibility for processors with no support for code/data
differentiation: by default code and data cache allocation types
fall back to CACHE_RSVT_TYPE_BOTH on older processors (and return the
information that they have done so via "flags").

* int sys_attach_cache_reservation(pid_t pid, unsigned int tcrid);
DESCRIPTION: Attaches the cache reservation identified by "tcrid" to
the task identified by pid.
returns 0 if successful.

* int sys_detach_cache_reservation(pid_t pid, unsigned int tcrid);
DESCRIPTION: Detaches the cache reservation identified by "tcrid" from
the task identified by pid.

The following syscalls list the TCRs:
* int sys_get_cache_reservations(size_t size, struct cache_reservation list[]);
DESCRIPTION: Return all cache reservations in the system.
Size should be set to the maximum number of items that can be stored
in the buffer pointed to by list.

* int sys_get_tcrid_tasks(unsigned int tcrid, size_t size, pid_t list[]);
DESCRIPTION: Return which pids are associated to tcrid.

* sys_get_pid_cache_reservations(pid_t pid, size_t size,
 struct cache_reservation list[]);
DESCRIPTION: Return all cache reservations associated with "pid".
Size should be set to the maximum number of items that can be stored
in the buffer pointed to by list.

* sys_get_cache_reservation_info()
DESCRIPTION: ioctl to retrieve hardware info: cache round size, whether
code/data separation is supported.
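As an illustration of the listing calls above, an admin-side query might look
like the sketch below. The syscall numbers are placeholders, and the
assumption that the calls return the number of entries filled in is mine;
the proposal does not spell that out.

/* Sketch: dump all TCREs in the system, then the TCREs attached to one
 * pid.  Syscall numbers are placeholders for the proposed interface. */
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#define __NR_get_cache_reservations     1003   /* hypothetical */
#define __NR_get_pid_cache_reservations 1004   /* hypothetical */

struct cache_reservation {
        unsigned long kbytes;
        int type, flags, tcrid;
};

int main(void)
{
        struct cache_reservation list[64];
        int i, n;

        /* All reservations in the system. */
        n = syscall(__NR_get_cache_reservations, 64, list);
        for (i = 0; i < n; i++)
                printf("tcrid %d: %lu kbytes, type %d\n",
                       list[i].tcrid, list[i].kbytes, list[i].type);

        /* Reservations attached to pid 1234 (example pid). */
        n = syscall(__NR_get_pid_cache_reservations, 1234, 64, list);
        for (i = 0; i < n; i++)
                printf("pid 1234 -> tcrid %d\n", list[i].tcrid);

        return 0;
}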

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-17 Thread Marcelo Tosatti
Vikas, Tejun,

This is an updated interface. It addresses all comments made 
so far and also covers all use-cases the cgroup interface
covers.

Let me know what you think. I'll proceed to writing 
the test applications.

Usage model:


This document details how CAT technology is 
exposed to userspace.

Each task has a list of task cache reservation entries (TCRE list).

The init process is created with empty TCRE list.

There is a system-wide unique ID space, each TCRE is assigned
an ID from this space. ID's can be reused (but no two TCREs 
have the same ID at one time).

The interface accomodates transient and independent cache allocation
adjustments from applications, as well as static cache partitioning
schemes.

Allocation:
Usage of the system calls require CAP_SYS_CACHE_RESERVATION capability.

A configurable percentage is reserved to tasks with empty TCRE list.

On fork, the child inherits the TCR from its parent.

Semantics:
Once a TCRE is created and assigned to a task, that task has 
guaranteed reservation on any CPU where its scheduled in,
for the lifetime of the TCRE.

A task can have its TCR list modified without notification.

FIXME: Add a per-task flag to not copy the TCR list of a task but delete
all TCR's on fork.

Interface:

enum cache_rsvt_flags {
   CACHE_RSVT_ROUND_DOWN   =  (1  0),/* round kbytes down */
};

enum cache_rsvt_type {
   CACHE_RSVT_TYPE_CODE = 0,  /* cache reservation is for code */
   CACHE_RSVT_TYPE_DATA,  /* cache reservation is for data */
   CACHE_RSVT_TYPE_BOTH,  /* cache reservation is for code and data */
};

struct cache_reservation {
unsigned long kbytes;
int type;
int flags;
int trcid;
};

The following syscalls modify the TCR of a task:

* int sys_create_cache_reservation(struct cache_reservation *rsvt);
DESCRIPTION: Creates a cache reservation entry, and assigns 
it to the current task.

returns -ENOMEM if not enough space, -EPERM if no permission.
returns 0 if the reservation has been successful, copying the actual
number of kbytes reserved to "kbytes", the effective type to "type",
and the assigned ID to "tcrid".

* int sys_delete_cache_reservation(struct cache_reservation *rsvt);
DESCRIPTION: Deletes a cache reservation entry, deassigning it
from any task.

Backward compatibility for processors with no support for code/data
differentiation: by default code and data cache allocation types
fall back to CACHE_RSVT_TYPE_BOTH on older processors (and return the
information that they have done so via "flags").

* int sys_attach_cache_reservation(pid_t pid, unsigned int tcrid);
DESCRIPTION: Attaches the cache reservation identified by "tcrid" to the
task identified by "pid".
returns 0 if successful.

* int sys_detach_cache_reservation(pid_t pid, unsigned int tcrid);
DESCRIPTION: Detaches the cache reservation identified by "tcrid" from the
task identified by "pid".

The following syscalls list the TCRs:
* int sys_get_cache_reservations(size_t size, struct cache_reservation list[]);
DESCRIPTION: Return all cache reservations in the system.
Size should be set to the maximum number of items that can be stored
in the buffer pointed to by list.

* int sys_get_tcrid_tasks(unsigned int tcrid, size_t size, pid_t list[]);
DESCRIPTION: Return which pids are associated to tcrid.

* sys_get_pid_cache_reservations(pid_t pid, size_t size,
 struct cache_reservation list[]);
DESCRIPTION: Return all cache reservations associated with pid.
Size should be set to the maximum number of items that can be stored
in the buffer pointed to by list.

* sys_get_cache_reservation_info()
DESCRIPTION: ioctl to retrieve hardware info: cache round size, whether
code/data separation is supported.
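
For illustration, a minimal userspace sketch of the intended flow. The
syscall numbers and the locally repeated struct/enum definitions are
hypothetical placeholders, since nothing has been assigned or merged:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical syscall numbers; none exist for this proposal. */
#define __NR_create_cache_reservation 600
#define __NR_attach_cache_reservation 601

enum cache_rsvt_type {
        CACHE_RSVT_TYPE_CODE = 0,
        CACHE_RSVT_TYPE_DATA,
        CACHE_RSVT_TYPE_BOTH,
};

struct cache_reservation {
        unsigned long kbytes;
        int type;
        int flags;
        int tcrid;
};

int main(int argc, char *argv[])
{
        struct cache_reservation rsvt;
        pid_t target = (argc > 1) ? (pid_t)atoi(argv[1]) : getpid();

        memset(&rsvt, 0, sizeof(rsvt));
        rsvt.kbytes = 2048;                     /* request 2MB */
        rsvt.type = CACHE_RSVT_TYPE_BOTH;

        /* Create a TCRE and assign it to the current task; the kernel
         * copies back the actual kbytes, effective type and tcrid. */
        if (syscall(__NR_create_cache_reservation, &rsvt)) {
                perror("create_cache_reservation");
                return 1;
        }
        printf("reserved %lu kbytes, tcrid %d\n", rsvt.kbytes, rsvt.tcrid);

        /* Attach the same reservation to a second task as well. */
        if (syscall(__NR_attach_cache_reservation, target, rsvt.tcrid)) {
                perror("attach_cache_reservation");
                return 1;
        }
        return 0;
}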



Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-07 Thread Tejun Heo
Hello,

On Thu, Aug 06, 2015 at 01:58:39PM -0700, Vikas Shivappa wrote:
> >I'm having hard time believing that.  There definitely are use cases
> >where cachelines are trashed among service threads.  Are you
> >proclaiming that those cases aren't gonna be supported?
> 
> Please refer to the noisy neighbour example i give here to help resolve
> thrashing by a noisy neighbour -
> http://marc.info/?l=linux-kernel&m=143889397419199

I don't think that's relevant to the discussion.  Implement a taskset
like tool and the administrator can deal with it just fine.  As I
wrote multiple times now, people have been dealing with CPU affinity
fine w/o cgroups.  Sure, cgroups do add on top but it's a lot more
complex facility and not a replacement for a more basic control
mechanism.

> >>- This interface like you said can easily bolt-on. basically an easy to use
> >>interface without worrying about the architectural details.
> >
> >But it's ripe with architectural details.
> 
> If specifying the bitmask is an issue , it can easily be addressed by
> writing a script which calculates the bitmask to size - like mentioned here
> http://marc.info/?l=linux-kernel&m=143889397419199

Let's say we fully virtualize cache partitioning so that each user can
express what they want and the kernel can compute and manage the
closest mapping supportable by the underlying hardware.  That should
be doable but I don't think that's what we want at this point.  This,
at least for now, is a niche feature which requires specific
configurations to be useful and while useful to certain narrow use
cases unlikely to be used across the board.  Given that, we don't want
to overengineer the solution.  Implement something simple and
specific.  We don't yet even know the full usefulness or use cases of
the feature.  It doesn't make sense to overcommit to complex
abstractions and mechanisms when there's a fairly good chance that our
understanding of the problem itself is very porous.

This applies the same to making it part of cgroups.  It's a lot more
complex and we end up committing a lot more than implementing
something simple and specific.  Let's please keep it simple.

> >I'm not saying they are mutually exclusive but that we're going
> >overboard in this direction when programmable interface should be the
> >priority.  While this mostly happened naturally for other resources
> >because cgroups was introduced later but I think there's a general
> >rule to follow there.
> 
> Right , the cache allocation cannot be treated like memory like explained
> here in 1.3 and 1.4
> http://marc.info/?l=linux-kernel&m=143889397419199

Who said that it could be?  If it actually were a resource which is as
ubiquitous, flexible and dividable as memory, cgroups would be a
lot better fit.

> >If you factor in threads of a process, the above model is
> >fundamentally flawed.  How would root or any external entity find out
> >what threads are to be allocated what?
> 
> the process ID can be added to the cgroup together with all its threads as
> shown in example of cgroup usage in (2) here -

And how does an external entity find out which ID should be put where?
This is a knowledge only known to the process itself.  That's what I
meant by going this route requires individual applications
communicating with external agents.

> In most cases in the cloud you will be able to decide based on what
> workloads are running - see the example 1.5 here
> 
> http://marc.info/?l=linux-kernel&m=143889397419199

Sure, that's a way outer scope.  The point was that this can't handle
in-process scope.

> Each application would
> >constantly have to tell an external agent about what its intentions
> >are.  This might seem to work in a limited feature testing setup where
> >you know everything about who's doing what but is no way a widely
> >deployable solution.  This pretty much degenerates into #3 you listed
> >below.
> 
> App may not be the best one to decide 1.1 and 1.2 here
> http://marc.info/?l=linux-kernel&m=143889397419199

That paragraph just shows how little is understood, so you can't
imagine a situation where threads of a process agree upon how they'll
use cache to improve performance?  Threads of the same program do
things like this all the time with different types of resources.  This
is a large portion of what server software programmers do - making the
threads and other components behave in a way that maximizes the
efficacy of the underlying system.

Thanks.

-- 
tejun


Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-07 Thread Marcelo Tosatti
On Thu, Aug 06, 2015 at 01:46:06PM -0700, Vikas Shivappa wrote:
> 
> 
> On Wed, 5 Aug 2015, Marcelo Tosatti wrote:
> 
> >On Wed, Aug 05, 2015 at 01:22:57PM +0100, Matt Fleming wrote:
> >>On Sun, 02 Aug, at 12:31:57PM, Tejun Heo wrote:
> >>>
> >>>But we're doing it the wrong way around.  You can do most of what
> >>>cgroup interface can do with systemcall-like interface with some
> >>>inconvenience.  The other way doesn't really work.  As I wrote in the
> >>>other reply, cgroups is a horrible programmable interface and we don't
> >>>want individual applications to interact with it directly and CAT's
> >>>use cases most definitely include each application programming its own
> >>>cache mask.
> >>
> >>I wager that this assertion is wrong. Having individual applications
> >>program their own cache mask is not going to be the most common
> >>scenario.
> >
> >What i like about the syscall interface is that it moves the knowledge
> >of cache behaviour close to the application launching (or inside it),
> >which allows the following common scenario, say on a multi purpose
> >desktop:
> >
> >Event: launch high performance application: use cache reservation, finish
> >quickly.
> >Event: cache hog application: do not thrash the cache.
> >
> >The two cache reservations are logically unrelated in terms of
> >configuration, and configured separately do not affect each other.
> 
> There could be several issues to let apps allocate the cache
> themselves. We just cannot treat the cache alloc just like memory
> allocation, please consider the scenarios below:
> 
> all examples consider cache size : 10MB. cbm max bits : 10
> 
> 
>   (1)user programmable syscall:
> 
>   1.1> Exclusive access:  The task cannot give *itself* exclusive
> access from using the cache. For this it needs to have visibility of
> the cache allocation of other tasks and may need to reclaim or
> override others cache allocs which is not feasible (isnt that the
> ability of a system managing agent?).

Different allocation of the resource (cache in this case) causes 
different cache miss patterns and therefore different results.

>   eg:
> app1... 10 ask for 1MB of exclusive cache each.
> they get it as there was 10MB.
> 
> But now a large portion of tasks on the system will end up without any cache 
> ? -
> this is not possible
> or do they share a common pool or a default shared pool ? - if there is such a
> default pool  then that needs to be *managed* and this reduces the
> number of exclusive cache access given.

The proposal would be for the administrator to set up how much each user
can reserve via ulimit (per-user).
To change that per-user configuration, it's necessary to
stop the tasks.

However, that makes no sense, revoking crossed my mind as well.
To allow revoking it would be necessary to have a special capability
(which only root has by default).

The point here is that it should be possible to modify cache 
reservations.

Alternatively, use a priority system. So:

Revoking:

Privileged systemcall to list and invalidate cache reservations.
Assumes that reservations returned by "sys_cache_reservation" 
are persistent and that users of the "remove" system call
are aware of the consequences.

Priority:
-
Use some priority order (based on nice value, or a new separate
value to perform comparison), and use that to decide which 
reservations have priority.
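
A minimal sketch of such a privileged revoke pass, built on the
sys_get_cache_reservations()/sys_delete_cache_reservation() calls from the
updated proposal above (hypothetical syscall numbers; the "number of entries"
return convention of the list call and the size-based policy are assumptions
for illustration only):

#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#define __NR_get_cache_reservations   602     /* hypothetical */
#define __NR_delete_cache_reservation 603     /* hypothetical */

struct cache_reservation {
        unsigned long kbytes;
        int type;
        int flags;
        int tcrid;
};

/* Revoke every reservation larger than limit_kb (arbitrary example policy). */
static int revoke_large_reservations(unsigned long limit_kb)
{
        struct cache_reservation list[64];
        long i, n;

        /* assumed to return the number of entries copied out */
        n = syscall(__NR_get_cache_reservations,
                    sizeof(list) / sizeof(list[0]), list);
        if (n < 0)
                return -1;

        for (i = 0; i < n; i++) {
                if (list[i].kbytes <= limit_kb)
                        continue;
                if (syscall(__NR_delete_cache_reservation, &list[i]))
                        fprintf(stderr, "could not revoke tcrid %d\n",
                                list[i].tcrid);
        }
        return 0;
}

int main(void)
{
        return revoke_large_reservations(4096) ? 1 : 0;
}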

*I-1* (todo notes)


>   1.2> Noisy neighbour problem: how does the task itself decide its the noisy
> neighbor ? This is the
> key requirement the feature wants to address. We want to address the
> jitter and inconsistencies in the quality of service things like
> response times the apps get. If you read the SDM its mentioned
> clearly there as well. can the task voluntarily declare itself
> noisy neighbour(how ??) and relinquish the cache allocation (how
> much ?). But thats not even guaranteed.

I suppose this requires global information (how much cache each
application is using), and the goal: what is the end goal of 
a particular cache resource division.

Each cache division has an outcome: certain instruction sequences
execute faster than others.

Whether a given task is a "cache hog" (that is, evicting cachelines
of other tasks does not reduce execution time of the "cache hog" task
itself, and therefore does not benefit the performance of the system
as a whole) is probably not an ideal visualization: each task has 
different subparts that could be considered "cache hogs", and parts
that are not "cache hogs".

I think that for now, handling the static usecases is good enough.

> How can we expect every application coder to know what system the
> app is going to run and how much is the optimal amount of cache the
> app can get - its not like memory allocation for #3 and #4 below.

"Optimal" depends on what the desired end result is: execution time as
a whole, execution time of an individual task, etc.

In the case the applications are not aware of the cache, the OS should
divide the resource automatically by heuristics (in analogy with LRU).

For special applications, the programmer/compiler can 

Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-06 Thread Vikas Shivappa



On Wed, 5 Aug 2015, Tejun Heo wrote:


Hello,

On Tue, Aug 04, 2015 at 07:21:52PM -0700, Vikas Shivappa wrote:

I get that this would be an easier "bolt-on" solution but isn't a good
solution by itself in the long term.  As I wrote multiple times
before, this is a really bad programmable interface.  Unless you're
sure that this doesn't have to be programmable for threads of an
individual applications,


Yes, this doesn't have to be a programmable interface for threads. May not be
a good idea to let the threads decide the cache allocation by themselves
using this direct interface. We are transferring the decision maker
responsibility to the system administrator.


I'm having hard time believing that.  There definitely are use cases
where cachelines are trashed among service threads.  Are you
proclaiming that those cases aren't gonna be supported?


Please refer to the noisy neighbour example i give here to help resolve 
thrashing by a 
noisy neighbour -

http://marc.info/?l=linux-kernel&m=143889397419199

and the reference
http://www.intel.com/content/www/us/en/communications/cache-allocation-technology-white-paper.html





- This interface like you said can easily bolt-on. basically an easy to use
interface without worrying about the architectural details.


But it's ripe with architectural details.


If specifying the bitmask is an issue , it can easily be addressed by writing a 
script which calculates the bitmask to size - like mentioned here

http://marc.info/?l=linux-kernel&m=143889397419199
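
A minimal sketch of that bitmask calculation, using the example figures from
this thread (10MB cache, 10-bit CBM); the round-up policy and the contiguous
mask starting at bit 0 are arbitrary choices for illustration:

#include <stdio.h>

/*
 * Convert a size request in kbytes into a contiguous capacity bitmask (CBM).
 * cache_kb and cbm_bits would come from hardware enumeration; here they are
 * the 10MB / 10-bit example used in this discussion.
 */
static unsigned long kbytes_to_cbm(unsigned long kbytes, unsigned long cache_kb,
                                   unsigned int cbm_bits)
{
        unsigned long kb_per_bit = cache_kb / cbm_bits;
        unsigned int bits = (kbytes + kb_per_bit - 1) / kb_per_bit;

        if (bits == 0)
                bits = 1;
        if (bits > cbm_bits)
                bits = cbm_bits;
        return (1UL << bits) - 1;       /* contiguous mask from bit 0 */
}

int main(void)
{
        /* 2MB out of 10MB with a 10-bit CBM -> two ways -> 0x3 */
        printf("cbm=0x%lx\n", kbytes_to_cbm(2048, 10240, 10));
        return 0;
}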

 What I meant by bolt-on was

that this is a shortcut way of introducing this feature without
actually worrying about how this will be used by applications and
that's not a good thing.  We need to be worrying about that.


- But still does the job. root user can allocate exclusive or overlapping
cache lines to threads or group of threads.
- No major roadblocks for usage as we can make the allocations like
mentioned above and still keep the hierarchy etc and use it when needed.
- An important factor is that it can co-exist with other interfaces like #2
and #3 for the same easily. So I do not see a reason why we should not use
this.
This is not meant to be a programmable interface, however it does not
prevent co-existence.


I'm not saying they are mutually exclusive but that we're going
overboard in this direction when programmable interface should be the
priority.  While this mostly happened naturally for other resources
because cgroups was introduced later but I think there's a general
rule to follow there.


Right , the cache allocation cannot be treated like memory like explained here 
in 1.3 and 1.4

http://marc.info/?l=linux-kernel&m=143889397419199





- If root user has to set affinity of threads that he is allocating cache,
he can do so using other cgroups like cpuset or set the masks separately
using taskset. This would let him configure the cache allocation on a
socket.


Well, root can do whatever it wants with programmable interface too.
The way things are designed, even containment isn't an issue, assign
an ID to all processes by default and change the allocation on that.


this is a pretty bad interface by itself.



There is already a lot of such usage among different enterprise users at
Intel/google/cisco etc who have been testing the patches posted to lkml and
academically there is plenty of usage as well.


I mean, that's the tool you gave them.  Of course they'd be using it
but I suspect most of them would do fine with a programmable interface
too.  Again, please think of cpu affinity.


All the methodology to support the feature may need an arbitrator/agent to
decide the allocation.

1. Let the root user or system administrator be the one who decides the
allocation based on the current usage. We assume this to be one with
administrative privileges. He could use the cgroup interface to perform the
task. One way to do the cpu affinity is by mounting cpuset and rdt cgroup
together.


If you factor in threads of a process, the above model is
fundamentally flawed.  How would root or any external entity find out
what threads are to be allocated what?


the process ID can be added to the cgroup together with all its threads as shown 
in example of cgroup usage in (2) here -


In most cases in the cloud you will be able to decide based on what workloads 
are running - see the example 1.5 here


http://marc.info/?l=linux-kernel&m=143889397419199


Each application would

constantly have to tell an external agent about what its intentions
are.  This might seem to work in a limited feature testing setup where
you know everything about who's doing what but is no way a widely
deployable solution.  This pretty much degenerates into #3 you listed
below.


App may not be the best one to decide 
1.1 and 1.2 here

http://marc.info/?l=linux-kernel&m=143889397419199




2. Kernel automatically assigning the cache based on the priority of the apps
etc. This is something which could be designed to co-exist with the #1 above
much like how the cpusets cgroup co-exist with the kernel assigning cpus to
tasks. (the task could be having a cache capacity mask just like the cpu
affinity mask)

Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-06 Thread Vikas Shivappa



On Wed, 5 Aug 2015, Marcelo Tosatti wrote:


On Wed, Aug 05, 2015 at 01:22:57PM +0100, Matt Fleming wrote:

On Sun, 02 Aug, at 12:31:57PM, Tejun Heo wrote:


But we're doing it the wrong way around.  You can do most of what
cgroup interface can do with systemcall-like interface with some
inconvenience.  The other way doesn't really work.  As I wrote in the
other reply, cgroups is a horrible programmable interface and we don't
want individual applications to interact with it directly and CAT's
use cases most definitely include each application programming its own
cache mask.


I wager that this assertion is wrong. Having individual applications
program their own cache mask is not going to be the most common
scenario.


What i like about the syscall interface is that it moves the knowledge
of cache behaviour close to the application launching (or inside it),
which allows the following common scenario, say on a multi purpose
desktop:

Event: launch high performance application: use cache reservation, finish
quickly.
Event: cache hog application: do not thrash the cache.

The two cache reservations are logically unrelated in terms of
configuration, and configured separately do not affect each other.


There could be several issues to let apps allocate the cache themselves. We just 
cannot treat the cache alloc just like memory allocation, please consider the 
scenarios below:


all examples consider cache size : 10MB. cbm max bits : 10


(1)user programmable syscall:

  1.1> Exclusive access:  The task cannot give *itself* exclusive access from 
using the cache. For this it needs to have visibility of the cache allocation of 
other tasks and may need to reclaim or override others cache allocs which is not 
feasible (isnt that the ability of a system managing agent?).


  eg:
app1... 10 ask for 1MB of exclusive cache each.
they get it as there was 10MB.

But now a large portion of tasks on the system will end up without any cache ? -
this is not possible
or do they share a common pool or a default shared pool ? - if there is such a
default pool  then that needs to be *managed* and this reduces the number 
of exclusive cache access given.


  1.2> Noisy neighbour problem: how does the task itself decide its the noisy
neighbor ? This is the
key requirement the feature wants to address. We want to address the 
jitter and inconsistencies in the quality of service things like response times 
the apps get. If you read the SDM 
its mentioned clearly there as well. can the task voluntarily declare itself
noisy neighbour(how ??) and relinquish the cache allocation (how much ?). But 
thats not even guaranteed.
How can we expect every application coder to know what system the app is going 
to run and how much is the optimal amount of cache the app can get - its not 
like memory allocation for #3 and #4 below.


  1.3> cannot treat cache allocation similar to memory allocation.
there are system-call alternatives to do memory allocation apart from cgroups
like cpuset but we cannot treat both as the same.
(This is with reference to the point that there are alternatives to memory
allocation apart from using cpuset, but the whole point is you cant treat 
memory allocation and cache allocation as same)

1.3.1> memory is a very large pool in terms of GBs and we are talking
about only a few MBs (~10 - 20 MB), orders and orders of magnitude less. So this could 
easily get into a situation mentioned

above where a few first apps get all the exclusive cache and the rest have to
starve.
1.3.2> memory is virtualized : each process has its own space and we are
not even bound by the physical memory capacity as we can virtualize it so an app 
can indeed ask for more memory than the physical memory along with other apps 
doing the same - but we cant do the same here with cache allocation. Even if we 
evict the cache , that defeats the purpose of cache allocation to threads.


  1.4> specific h/w requirements : With code data prioritization(cdp) , the h/w
requires the OS to reset all the capacity bitmasks once we change mode
from cdp to legacy cache alloc. So
naturally we need to remove the tasks with all its allocations.  We cannot
easily take away all the cache allocations that users will be thinking is theirs
when they had allocated using the syscall. This is something like the tasks
malloc successfully and midway their allocation is no more there.
Also this would add to the logic that you need to treat the cache allocation and
other resource allocation like memory differently.

  1.5> In cloud and container environments , say we would need to allocate cache 
for entire VM which runs a specific real_time workload vs. allocate cache for VMs 
which run say noisy_workload - how can we achieve this by letting each app 
decide how much cache that needs to be allocated ? This is best done by an 
external system manager.


(2)cgroup interface:

 (2.1) compare above usage

1.1> and 1.2> above can easily be done with cgroup 

Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-05 Thread Marcelo Tosatti
On Wed, Aug 05, 2015 at 01:22:57PM +0100, Matt Fleming wrote:
> On Sun, 02 Aug, at 12:31:57PM, Tejun Heo wrote:
> > 
> > But we're doing it the wrong way around.  You can do most of what
> > cgroup interface can do with systemcall-like interface with some
> > inconvenience.  The other way doesn't really work.  As I wrote in the
> > other reply, cgroups is a horrible programmable interface and we don't
> > want individual applications to interact with it directly and CAT's
> > use cases most definitely include each application programming its own
> > cache mask.
> 
> I wager that this assertion is wrong. Having individual applications
> program their own cache mask is not going to be the most common
> scenario. 

What i like about the syscall interface is that it moves the knowledge
of cache behaviour close to the application launching (or inside it),
which allows the following common scenario, say on a multi purpose
desktop:

Event: launch high performance application: use cache reservation, finish
quickly.
Event: cache hog application: do not thrash the cache.

The two cache reservations are logically unrelated in terms of
configuration, and configured separately do not affect each other.

They should be configured separately.

Also, data/code reservation is specific to the application, so its
specification should be close to the application (it's just
cumbersome to maintain that data somewhere else).

> Only in very specific situations would you trust an
> application to do that.

Perhaps ulimit can be used to allow a certain limit on applications.

> A much more likely use case is having the sysadmin carve up the cache
> for a workload which may include multiple, uncooperating applications.

Sorry, what cooperating means in this context?

> Yes, a programmable interface would be useful, but only for a limited
> set of workloads. I don't think it's how most people are going to want
> to use this hardware technology.

It seems syscall interface handles all usecases which the cgroup
interface handles.

> -- 
> Matt Fleming, Intel Open Source Technology Center

Tentative interface, please comment.

The "return key/use key" scheme would allow COSid sharing similarly to
shmget. Intra-application, that is functional, but I am not experienced
enough with shmget to judge whether there is a better alternative. Would have
to think about how a cross-application setup would work,
and about the simple "cacheset" configuration.
Also, the interface should work for other architectures (TODO item, PPC
at least has similar functionality).

enum cache_rsvt_flags {
   CACHE_RSVT_ROUND_UP   =  (1 << 0),/* round "bytes" up */
   CACHE_RSVT_ROUND_DOWN =  (1 << 1),/* round "bytes" down */
   CACHE_RSVT_EXTAGENTS  =  (1 << 2),/* allow usage of area common with 
external agents */
};

enum cache_rsvt_type {
   CACHE_RSVT_TYPE_CODE = 0,  /* cache reservation is for code */
   CACHE_RSVT_TYPE_DATA,  /* cache reservation is for data */
   CACHE_RSVT_TYPE_BOTH,  /* cache reservation is for code and data */
};

struct cache_reservation {
size_t kbytes;
u32 type;
u32 flags;
};

int sys_cache_reservation(struct cache_reservation *cv);

returns -ENOMEM if not enough space, -EPERM if no permission.
returns keyid > 0 if reservation has been successful, copying actual
number of kbytes reserved to "kbytes".

-

int sys_use_cache_reservation_key(struct cache_reservation *cv, int
key);

returns -EPERM if no permission.
returns -EINVAL if no such key exists.
returns 0 if instantiation of reservation has been successful,
copying actual reservation to cv.

Backward compatibility for processors with no support for code/data
differentiation: by default code and data cache allocation types
fall back to CACHE_RSVT_TYPE_BOTH on older processors (and return the
information that they have done so via "flags").
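
A minimal sketch of the return-key/use-key flow above, in the spirit of the
simple "cacheset" configuration mentioned earlier (hypothetical syscall
numbers; "my_hpc_app" is a placeholder workload; error handling trimmed):

#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#define __NR_cache_reservation         610    /* hypothetical */
#define __NR_use_cache_reservation_key 611    /* hypothetical */

enum cache_rsvt_type {
        CACHE_RSVT_TYPE_CODE = 0,
        CACHE_RSVT_TYPE_DATA,
        CACHE_RSVT_TYPE_BOTH,
};

struct cache_reservation {
        size_t kbytes;
        unsigned int type;
        unsigned int flags;
};

int main(void)
{
        struct cache_reservation cv;
        long key;

        memset(&cv, 0, sizeof(cv));
        cv.kbytes = 1024;
        cv.type = CACHE_RSVT_TYPE_DATA;

        /* Create the reservation; a keyid > 0 is returned on success. */
        key = syscall(__NR_cache_reservation, &cv);
        if (key <= 0) {
                perror("cache_reservation");
                return 1;
        }
        printf("reserved %zu kbytes, key %ld\n", cv.kbytes, key);

        /* A launcher could hand the key to another task, which then
         * instantiates the same reservation before exec'ing the workload. */
        if (fork() == 0) {
                if (syscall(__NR_use_cache_reservation_key, &cv, (int)key))
                        perror("use_cache_reservation_key");
                execlp("my_hpc_app", "my_hpc_app", (char *)NULL);
                _exit(1);
        }
        return 0;
}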




Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-05 Thread Tejun Heo
Hello,

On Wed, Aug 05, 2015 at 01:22:57PM +0100, Matt Fleming wrote:
> I wager that this assertion is wrong. Having individual applications
> program their own cache mask is not going to be the most common
> scenario. Only in very specific situations would you trust an
> application to do that.

As I wrote in the other reply, I don't buy that.  The above only holds
if you exclude use cases where this feature is used by multiple
threads of an application and I can't see a single reason why such
uses would be excluded.

> A much more likely use case is having the sysadmin carve up the cache
> for a workload which may include multiple, uncooperating applications.
> 
> Yes, a programmable interface would be useful, but only for a limited
> set of workloads. I don't think it's how most people are going to want
> to use this hardware technology.

It's actually the other way around.  You can achieve most of what
cgroups can do with programmable interface albeit with some
awkwardness.  The other direction is a lot more heavier and painful.

Thanks.

-- 
tejun


Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-05 Thread Tejun Heo
Hello,

On Tue, Aug 04, 2015 at 07:21:52PM -0700, Vikas Shivappa wrote:
> >I get that this would be an easier "bolt-on" solution but isn't a good
> >solution by itself in the long term.  As I wrote multiple times
> >before, this is a really bad programmable interface.  Unless you're
> >sure that this doesn't have to be programmable for threads of an
> >individual applications,
> 
> Yes, this doesnt have to be a programmable interface for threads. May not be
> a good idea to let the threads decide the cache allocation by themselves
> using this direct interface. We are transfering the decision maker
> responsibility to the system administrator.

I'm having hard time believing that.  There definitely are use cases
where cachelines are trashed among service threads.  Are you
proclaiming that those cases aren't gonna be supported?

> - This interface like you said can easily bolt-on. basically an easy to use
> interface without worrying about the architectural details.

But it's ripe with architectural details.  What I meant by bolt-on was
that this is a shortcut way of introducing this feature without
actually worrying about how this will be used by applications and
that's not a good thing.  We need to be worrying about that.

> - But still does the job. root user can allocate exclusive or overlapping
> cache lines to threads or group of threads.
> - No major roadblocks for usage as we can make the allocations like
> mentioned above and still keep the hierarchy etc and use it when needed.
> - An important factor is that it can co-exist with other interfaces like #2
> and #3 for the same easily. So I donot see a reason why we should not use
> this.
> This is not meant to be a programmable interface, however it does not
> prevent co-existence.

I'm not saying they are mutually exclusive but that we're going
overboard in this direction when programmable interface should be the
priority.  While this mostly happened naturally for other resources
because cgroups was introduced later but I think there's a general
rule to follow there.

> - If root user has to set affinity of threads that he is allocating cache,
> he can do so using other cgroups like cpuset or set the masks seperately
> using taskset. This would let him configure the cache allocation on a
> socket.

Well, root can do whatever it wants with programmable interface too.
The way things are designed, even containment isn't an issue, assign
an ID to all processes by default and change the allocation on that.

> this is a pretty bad interface by itself.
> >
> >>There is already a lot of such usage among different enterprise users at
> >>Intel/google/cisco etc who have been testing the patches posted to lkml and
> >>academically there is plenty of usage as well.
> >
> >I mean, that's the tool you gave them.  Of course they'd be using it
> >but I suspect most of them would do fine with a programmable interface
> >too.  Again, please think of cpu affinity.
> 
> All the methodology to support the feature may need an arbitrator/agent to
> decide the allocation.
> 
> 1. Let the root user or system administrator be the one who decides the
> allocation based on the current usage. We assume this to be one with
> administrative privileges. He could use the cgroup interface to perform the
> task. One way to do the cpu affinity is by mounting cpuset and rdt cgroup
> together.

If you factor in threads of a process, the above model is
fundamentally flawed.  How would root or any external entity find out
what threads are to be allocated what?  Each application would
> constantly have to tell an external agent about what its intentions
are.  This might seem to work in a limited feature testing setup where
you know everything about who's doing what but is no way a widely
deployable solution.  This pretty much degenerates into #3 you listed
below.

> 2. Kernel automatically assigning the cache based on the priority of the apps
> etc. This is something which could be designed to co-exist with the #1 above
> much like how the cpusets cgroup co-exist with the kernel assigning cpus to
> tasks. (the task could be having a cache capacity mask just like the cpu
> affinity mask)

I don't think CAT would be applicable in this manner.  BE allocation
is what the CPU is doing by default already.  I'm highly doubtful
something like CAT would be used automatically in generic systems.  It
requires fairly specific coordination after all.

> 3. User programmable interface , where say a resource management program
> x (and hence apps) could link a library which supports cache alloc/monitoring
> etc and then try to control and monitor the resources. The arbitrator could 
> just
> be the resource management interface itself or the kernel could decide.
>
> If users use this programmable interface, we need to make sure all the apps
> just cannot allocate resources without some interfacing agent (in which case
> they could interface with #2 ?).
> 
> Do you think there are any issues for the user 

Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-05 Thread Matt Fleming
On Sun, 02 Aug, at 12:31:57PM, Tejun Heo wrote:
> 
> But we're doing it the wrong way around.  You can do most of what
> cgroup interface can do with systemcall-like interface with some
> inconvenience.  The other way doesn't really work.  As I wrote in the
> other reply, cgroups is a horrible programmable interface and we don't
> want individual applications to interact with it directly and CAT's
> use cases most definitely include each application programming its own
> cache mask.

I wager that this assertion is wrong. Having individual applications
program their own cache mask is not going to be the most common
scenario. Only in very specific situations would you trust an
application to do that.

A much more likely use case is having the sysadmin carve up the cache
for a workload which may include multiple, uncooperating applications.

Yes, a programmable interface would be useful, but only for a limited
set of workloads. I don't think it's how most people are going to want
to use this hardware technology.

-- 
Matt Fleming, Intel Open Source Technology Center


Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-05 Thread Tejun Heo
Hello,

On Tue, Aug 04, 2015 at 07:21:52PM -0700, Vikas Shivappa wrote:
 I get that this would be an easier bolt-on solution but isn't a good
 solution by itself in the long term.  As I wrote multiple times
 before, this is a really bad programmable interface.  Unless you're
 sure that this doesn't have to be programmable for threads of an
 individual applications,
 
 Yes, this doesnt have to be a programmable interface for threads. May not be
 a good idea to let the threads decide the cache allocation by themselves
 using this direct interface. We are transfering the decision maker
 responsibility to the system administrator.

I'm having hard time believing that.  There definitely are use cases
where cachelines are trashed among service threads.  Are you
proclaiming that those cases aren't gonna be supported?

 - This interface like you said can easily bolt-on. basically an easy to use
 interface without worrying about the architectural details.

But it's ripe with architectural details.  What I meant by bolt-on was
that this is a shortcut way of introducing this feature without
actually worrying about how this will be used by applications and
that's not a good thing.  We need to be worrying about that.

 - But still does the job. root user can allocate exclusive or overlapping
 cache lines to threads or group of threads.
 - No major roadblocks for usage as we can make the allocations like
 mentioned above and still keep the hierarchy etc and use it when needed.
 - An important factor is that it can co-exist with other interfaces like #2
 and #3 for the same easily. So I do not see a reason why we should not use
 this.
 This is not meant to be a programmable interface, however it does not
 prevent co-existence.

I'm not saying they are mutually exclusive but that we're going
overboard in this direction when programmable interface should be the
priority.  This mostly happened naturally for other resources
because cgroups was introduced later, but I think there's a general
rule to follow there.

 - If root user has to set affinity of threads that he is allocating cache,
 he can do so using other cgroups like cpuset or set the masks separately
 using taskset. This would let him configure the cache allocation on a
 socket.

Well, root can do whatever it wants with programmable interface too.
The way things are designed, even containment isn't an issue, assign
an ID to all processes by default and change the allocation on that.

 this is a pretty bad interface by itself.
 
 There is already a lot of such usage among different enterprise users at
 Intel/google/cisco etc who have been testing the patches posted to lkml and
 academically there is plenty of usage as well.
 
 I mean, that's the tool you gave them.  Of course they'd be using it
 but I suspect most of them would do fine with a programmable interface
 too.  Again, please think of cpu affinity.
 
 All the methodology to support the feature may need an arbitrator/agent to
 decide the allocation.
 
 1. Let the root user or system administrator be the one who decides the
 allocation based on the current usage. We assume this to be one with
 administrative privileges. He could use the cgroup interface to perform the
 task. One way to do the cpu affinity is by mounting cpuset and rdt cgroup
 together.

If you factor in threads of a process, the above model is
fundamentally flawed.  How would root or any external entity find out
what threads are to be allocated what?  Each application would
constantly have to tell an external agent about what its intentions
are.  This might seem to work in a limited feature testing setup where
you know everything about who's doing what but is no way a widely
deployable solution.  This pretty much degenerates into #3 you listed
below.

 2. Kernel automatically assigning the cache based on the priority of the apps
 etc. This is something which could be designed to co-exist with the #1 above
 much like how the cpusets cgroup co-exist with the kernel assigning cpus to
 tasks. (the task could be having a cache capacity mask just like the cpu
 affinity mask)

I don't think CAT would be applicable in this manner.  BE allocation
is what the CPU is doing by default already.  I'm highly doubtful
something like CAT would be used automatically in generic systems.  It
requires fairly specific coordination after all.

 3. User programmable interface , where say a resource management program
 x (and hence apps) could link a library which supports cache alloc/monitoring
 etc and then try to control and monitor the resources. The arbitrator could just
 be the resource management interface itself or the kernel could decide.

 If users use this programmable interface, we need to make sure all the apps
 just cannot allocate resources without some interfacing agent (in which case
 they could interface with #2 ?).
 
 Do you think there are any issues for the user programmable interface to
 co-exist with the cgroup interface ?

Isn't that a 

Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-05 Thread Tejun Heo
Hello,

On Wed, Aug 05, 2015 at 01:22:57PM +0100, Matt Fleming wrote:
 I wager that this assertion is wrong. Having individual applications
 program their own cache mask is not going to be the most common
 scenario. Only in very specific situations would you trust an
 application to do that.

As I wrote in the other reply, I don't buy that.  The above only holds
if you exclude use cases where this feature is used by multiple
threads of an application and I can't see a single reason why such
uses would be excluded.

 A much more likely use case is having the sysadmin carve up the cache
 for a workload which may include multiple, uncooperating applications.
 
 Yes, a programmable interface would be useful, but only for a limited
 set of workloads. I don't think it's how most people are going to want
 to use this hardware technology.

It's actually the other way around.  You can achieve most of what
cgroups can do with programmable interface albeit with some
awkwardness.  The other direction is a lot heavier and more painful.

Thanks.

-- 
tejun


Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-05 Thread Marcelo Tosatti
On Wed, Aug 05, 2015 at 01:22:57PM +0100, Matt Fleming wrote:
 On Sun, 02 Aug, at 12:31:57PM, Tejun Heo wrote:
  
  But we're doing it the wrong way around.  You can do most of what
  cgroup interface can do with systemcall-like interface with some
  inconvenience.  The other way doesn't really work.  As I wrote in the
  other reply, cgroups is a horrible programmable interface and we don't
  want individual applications to interact with it directly and CAT's
  use cases most definitely include each application programming its own
  cache mask.
 
 I wager that this assertion is wrong. Having individual applications
 program their own cache mask is not going to be the most common
 scenario. 

What I like about the syscall interface is that it moves the knowledge
of cache behaviour close to the application launching (or inside it),
which allows the following common scenario, say on a multi-purpose
desktop:

Event: launch high performance application: use cache reservation, finish
quickly.
Event: cache hog application: do not thrash the cache.

The two cache reservations are logically unrelated in terms of
configuration, and when configured separately they do not affect each other.

They should be configured separately.

Also, data/code reservation is specific to the application, so its
specification should be close to the application (it's just
cumbersome to maintain that data somewhere else).

 Only in very specific situations would you trust an
 application to do that.

Perhaps ulimit can be used to allow a certain limit on applications.

 A much more likely use case is having the sysadmin carve up the cache
 for a workload which may include multiple, uncooperating applications.

Sorry, what does cooperating mean in this context?

 Yes, a programmable interface would be useful, but only for a limited
 set of workloads. I don't think it's how most people are going to want
 to use this hardware technology.

It seems the syscall interface handles all use cases which the cgroup
interface handles.

 -- 
 Matt Fleming, Intel Open Source Technology Center

Tentative interface, please comment.

The return key/use key scheme would allow COSid sharing similarly to
shmget. Intra-application, that is functional, but I am not experienced
enough with shmget to judge whether there is a better alternative. Would have
to think about how cross-application setup would work,
and how this works in the simple cacheset configuration.
Also, the interface should work for other architectures (TODO item, PPC
at least has similar functionality).

enum cache_rsvt_flags {
   CACHE_RSVT_ROUND_UP   =  (1 << 0),  /* round bytes up */
   CACHE_RSVT_ROUND_DOWN =  (1 << 1),  /* round bytes down */
   CACHE_RSVT_EXTAGENTS  =  (1 << 2),  /* allow usage of area common with
                                          external agents */
};

enum cache_rsvt_type {
   CACHE_RSVT_TYPE_CODE = 0,  /* cache reservation is for code */
   CACHE_RSVT_TYPE_DATA,  /* cache reservation is for data */
   CACHE_RSVT_TYPE_BOTH,  /* cache reservation is for code and data */
};

struct cache_reservation {
   size_t kbytes;
   u32 type;
   u32 flags;
};

int sys_cache_reservation(struct cache_reservation *cv);

returns -ENOMEM if not enough space, -EPERM if no permission.
returns keyid > 0 if reservation has been successful, copying actual
number of kbytes reserved to kbytes.

-

int sys_use_cache_reservation_key(struct cache_reservation *cv, int
key);

returns -EPERM if no permission.
returns -EINVAL if no such key exists.
returns 0 if instantiation of reservation has been successful,
copying actual reservation to cv.

Backward compatibility for processors with no support for code/data
differentiation: by default code and data cache allocation types
fall back to CACHE_RSVT_TYPE_BOTH on older processors (and return the
information that they have done so via flags).
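
A minimal usage sketch of the calling convention above. This is hypothetical:
the two sys_*() wrappers are stubs because no such syscalls exist yet, the u32
fields are spelled as unsigned int so the example compiles standalone, and the
2MB request size is an arbitrary example.

/* Hypothetical usage sketch of the proposed interface; the sys_*()
 * wrappers are stubs since the syscalls do not exist yet. */
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum cache_rsvt_flags {
	CACHE_RSVT_ROUND_UP   = (1 << 0),
	CACHE_RSVT_ROUND_DOWN = (1 << 1),
	CACHE_RSVT_EXTAGENTS  = (1 << 2),
};

enum cache_rsvt_type {
	CACHE_RSVT_TYPE_CODE = 0,
	CACHE_RSVT_TYPE_DATA,
	CACHE_RSVT_TYPE_BOTH,
};

struct cache_reservation {
	size_t kbytes;
	unsigned int type;	/* u32 in the proposal */
	unsigned int flags;	/* u32 in the proposal */
};

static int sys_cache_reservation(struct cache_reservation *cv)
{
	/* would become syscall(__NR_cache_reservation, cv) once a
	 * syscall number is assigned; stubbed out for now */
	(void)cv;
	errno = ENOSYS;
	return -1;
}

static int sys_use_cache_reservation_key(struct cache_reservation *cv, int key)
{
	(void)cv;
	(void)key;
	errno = ENOSYS;
	return -1;
}

int main(void)
{
	struct cache_reservation cv;
	int key;

	memset(&cv, 0, sizeof(cv));
	cv.kbytes = 2048;			/* ask for 2MB */
	cv.type   = CACHE_RSVT_TYPE_BOTH;
	cv.flags  = CACHE_RSVT_ROUND_DOWN;

	key = sys_cache_reservation(&cv);	/* create the reservation */
	if (key < 0) {
		perror("cache_reservation");
		return 1;
	}
	printf("key %d: %zu kbytes actually reserved\n", key, cv.kbytes);

	/* A cooperating task that learned the key (cf. shmget/shmat)
	 * could attach to the same reservation: */
	if (sys_use_cache_reservation_key(&cv, key) < 0)
		perror("use_cache_reservation_key");

	return 0;
}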




Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-04 Thread Vikas Shivappa



On Tue, 4 Aug 2015, Tejun Heo wrote:


Hello, Vikas.

On Tue, Aug 04, 2015 at 11:50:16AM -0700, Vikas Shivappa wrote:

I will make this more clear in the documentation - We intend this cgroup
interface to be used by a root or superuser - more like a system
administrator being able to control the allocation of the threads , the one
who has the knowledge of the usage and being able to decide.


I get that this would be an easier "bolt-on" solution but isn't a good
solution by itself in the long term.  As I wrote multiple times
before, this is a really bad programmable interface.  Unless you're
sure that this doesn't have to be programmable for threads of an
 individual application,


Yes, this doesn't have to be a programmable interface for threads. It may not be a 
good idea to let the threads decide the cache allocation by themselves using this direct 
interface. We are transferring the decision-maker responsibility to the system 
administrator.


- This interface like you said can easily bolt-on. basically an easy to use 
interface without worrying about the architectural details.
- But still does the job. root user can allocate exclusive or overlapping cache 
lines to threads or group of threads.
- No major roadblocks for usage as we can make the allocations like mentioned 
above and still keep the hierarchy etc and use it when needed.
- An important factor is that it can co-exist with other interfaces like #2 and 
#3 for the same easily. So I do not see a reason why we should not use this.
This is not meant to be a programmable interface, however it does not prevent 
co-existence.
- If root user has to set affinity of threads that he is allocating cache, he 
can do so using other cgroups like cpuset or set the masks separately using 
taskset. This would let him configure the cache allocation on a socket.


this is a pretty bad interface by itself.



There is already a lot of such usage among different enterprise users at
Intel/google/cisco etc who have been testing the patches posted to lkml and
academically there is plenty of usage as well.


I mean, that's the tool you gave them.  Of course they'd be using it
but I suspect most of them would do fine with a programmable interface
too.  Again, please think of cpu affinity.


All the methodology to support the feature may need an arbitrator/agent to 
decide the allocation.


1. Let the root user or system administrator be the one who decides the
allocation based on the current usage. We assume this to be one with
administrative privileges. He could use the cgroup interface to perform the
task. One way to do the cpu affinity is by mounting cpuset and rdt cgroup 
together.


2. Kernel automatically assigning the cache based on the priority of the apps
etc. This is something which could be designed to co-exist with the #1 above
much like how the cpusets cgroup co-exist with the kernel assigning cpus to 
tasks. (the task could be having a cache capacity mask 
just like the cpu affinity mask)


3. User programmable interface , where say a resource management program
x (and hence apps) could link a library which supports cache alloc/monitoring
etc and then try to control and monitor the resources. The arbitrator could just
be the resource management interface itself or the kernel could decide.

If users use this programmable interface, we need to 
make sure all the apps just cannot allocate resources without some interfacing 
agent (in which case they could interface with #2 ?).


Do you think there are any issues for the user programmable interface to 
co-exist with the cgroup interface ?


Thanks,
Vikas



Thanks.

--
tejun




Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-04 Thread Tejun Heo
Hello, Vikas.

On Tue, Aug 04, 2015 at 11:50:16AM -0700, Vikas Shivappa wrote:
> I will make this more clear in the documentation - We intend this cgroup
> interface to be used by a root or superuser - more like a system
> administrator being able to control the allocation of the threads , the one
> who has the knowledge of the usage and being able to decide.

I get that this would be an easier "bolt-on" solution but isn't a good
solution by itself in the long term.  As I wrote multiple times
before, this is a really bad programmable interface.  Unless you're
sure that this doesn't have to be programmable for threads of an
individual application, this is a pretty bad interface by itself.

> There is already a lot of such usage among different enterprise users at
> Intel/google/cisco etc who have been testing the patches posted to lkml and
> academically there is plenty of usage as well.

I mean, that's the tool you gave them.  Of course they'd be using it
but I suspect most of them would do fine with a programmable interface
too.  Again, please think of cpu affinity.

Thanks.

-- 
tejun


Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-04 Thread Tejun Heo
Hello,

On Tue, Aug 04, 2015 at 09:55:20AM -0300, Marcelo Tosatti wrote:
...
> Can't "cacheset" helper (similar to taskset) talk to systemd
> to achieve the flexibility you point ?

I don't know.  This is the case in point.  You're now suggesting doing
things completely backwards - a thread of an application talking to
external agent to tweak system management interface so that it can
change the attribute of that thread.  Let's please build a
programmable interface first.  I'm sure there are use cases which
aren't gonna be covered 100% but at the same time I'm sure just simple
inheritable per-thread attribute would cover majority of use cases.
This really isn't that different from CPU affinity after all.  *If* it
turns out that a lot of people yearn for fully hierarchical
enforcement, we sure can do that in the future but at this point it
really looks like an overkill in the wrong direction.

Thanks.

-- 
tejun


Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-04 Thread Vikas Shivappa


Hello Tejun,

On Sun, 2 Aug 2015, Tejun Heo wrote:


Hello, Vikas.

On Fri, Jul 31, 2015 at 09:24:58AM -0700, Vikas Shivappa wrote:

Yes, today we don't have an alternative interface - but we can always build
one. We simply don't have it because till now the Linux kernel just tolerated the
degradation that could have occurred due to cache contention and this is the
first interface we are building.


But we're doing it the wrong way around.  You can do most of what
cgroup interface can do with systemcall-like interface with some
inconvenience.  The other way doesn't really work.  As I wrote in the
other reply, cgroups is a horrible programmable interface and we don't
want individual applications to interact with it directly and CAT's
use cases most definitely include each application programming its own
cache mask.


I will make this more clear in the documentation - We intend this cgroup 
interface to be used by a root or superuser - more like a system administrator 
being able to control the allocation of the threads , the one who has the 
knowledge of the usage and being able to decide.


There is already a lot of such usage among different enterprise users at 
Intel/google/cisco etc who have been testing the patches posted to lkml and 
academically there is plenty of usage as well.


As a quick ref : below is a quick summary of usage

Cache Allocation Technology provides a way for the Software (OS/VMM) to
restrict cache allocation to a defined 'subset' of cache which may be
overlapping with other 'subsets'.
This feature is used when allocating a
line in cache, i.e. when pulling new data into the cache.
- The tasks are grouped into a CLOS (class of service), or grouped into an
administrator-created cgroup.

- Then the OS uses MSR writes to indicate the
CLOSid of the thread when scheduling in (this is done by the kernel) and to indicate 
the cache capacity associated with the CLOSid (the root user indicates the 
capacity for each task).

Currently cache allocation is supported for L3 cache.

More information can be found in the Intel SDM June 2015, Volume 3,
section 17.16.
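
A rough sketch of the two MSR writes described in the summary above, using the
MSR numbers documented in the SDM (IA32_PQR_ASSOC and the IA32_L3_QOS_MASK_n
block). This is illustrative kernel-context code only, not the code from this
patch series.

/* Illustrative only: the two MSR writes behind L3 CAT. Kernel context. */
#define MSR_IA32_PQR_ASSOC	0x0c8f	/* per-CPU: CLOSid of the running thread */
#define MSR_IA32_L3_CBM_BASE	0x0c90	/* IA32_L3_QOS_MASK_0; one MSR per CLOSid */

static inline void wrmsr(unsigned int msr, unsigned int lo, unsigned int hi)
{
	asm volatile("wrmsr" : : "c" (msr), "a" (lo), "d" (hi) : "memory");
}

/* Administrator/root side: program the cache capacity bitmask for a CLOSid. */
static void l3_set_cbm(unsigned int closid, unsigned int cbm)
{
	wrmsr(MSR_IA32_L3_CBM_BASE + closid, cbm, 0);
}

/* Scheduler side: on context switch, tag the CPU with the incoming task's
 * CLOSid (bits 63:32 of IA32_PQR_ASSOC). A real implementation would
 * preserve the RMID kept in the low half instead of writing 0. */
static void switch_closid(unsigned int closid)
{
	wrmsr(MSR_IA32_PQR_ASSOC, 0, closid);
}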

Thanks,
Vikas

Let's build something which is simple and can be used
easily first.  If this turns out to be widely useful and an overall
management capability over it is wanted, we can consider cgroups then.

Thanks.

--
tejun




Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-04 Thread Tejun Heo
Hello,

On Mon, Aug 03, 2015 at 05:32:50PM -0300, Marcelo Tosatti wrote:
> You really want to specify the cache configuration "at once": 
> having process-A exclusive access to 2MB of cache at all times,
> and process-B 4MB exclusive, means you can't have process-C use 4MB of 
> cache exclusively (consider 8MB cache machine).

This is akin to arguing for implementing cpuset without
sched_setaffinity() or any other facility to adjust affinity.  People
have been using affinity fine before cgroups.  Sure, certain things
are cumbersome but cgroups isn't a replacement for a proper API.

> > cgroups is not a superset of a programmable interface.  It has
> > distinctive disadvantages and not a substitute with hirearchy support
> > for regular systemcall-like interface.  I don't think it makes sense
> > to go full-on hierarchical cgroups when we don't have basic interface
> > which is likely to cover many use cases better.  A syscall-like
> > interface combined with a tool similar to taskset would cover a lot in
> > a more accessible way.
> 
> How are you going to specify sharing of portions of cache by two sets
> of tasks with a syscall interface?

Again, think about how people have been using CPU affinity.

> > cpuset-style allocation can be easier for things like this but that
> > should be an addition on top not the one and only interface.  How is
> > it gonna handle if multiple threads of a process want to restrict
> > cache usages to avoid stepping on each other's toes?  Delegate the
> > subdirectory and let the process itself open it and write to files to
> > configure when there isn't even a way to atomically access the
> > process's own directory or a way to synchronize against migration?
> 
> One would preconfigure that in advance - but you are right, a 
> syscall interface is more flexible in that respect.

I'm not trying to say cgroup controller would be useless but the
current approach seems somewhat backwards and over-engineered.  Can't
we just start with something simple?  e.g. a platform device driver
that allows restricting cache usage of a target thread (be that self
or ptraceable target)?
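
Purely as a hypothetical illustration of how small such a "simple" interface
could be: a character-device/ioctl sketch. None of this exists; the device
name, structure, and ioctl numbers are invented for the sketch.

/* Hypothetical sketch only; names and ioctl numbers are made up. */
#include <linux/ioctl.h>

struct cache_limit_req {
	int pid;		/* target task: self or a ptrace-able target */
	unsigned long l3_mask;	/* capacity bitmask the task is restricted to */
};

#define CACHE_LIMIT_SET	_IOW('r', 1, struct cache_limit_req)
#define CACHE_LIMIT_GET	_IOR('r', 2, struct cache_limit_req)

/* userspace usage would then be roughly:
 *	fd = open("/dev/cache_qos", O_RDWR);
 *	ioctl(fd, CACHE_LIMIT_SET, &req);
 */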

Thanks.

-- 
tejun


Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-04 Thread Marcelo Tosatti
On Mon, Aug 03, 2015 at 05:32:50PM -0300, Marcelo Tosatti wrote:
> On Sun, Aug 02, 2015 at 12:23:25PM -0400, Tejun Heo wrote:
> > Hello,
> > 
> > On Fri, Jul 31, 2015 at 12:12:18PM -0300, Marcelo Tosatti wrote:
> > > > I don't really think it makes sense to implement a fully hierarchical
> > > > cgroup solution when there isn't the basic affinity-adjusting
> > > > interface 
> > > 
> > > What is an "affinity adjusting interface" ? Can you give an example
> > > please?
> > 
> > Something similar to sched_setaffinity().  Just a syscall / prctl or
> > whatever programmable interface which sets per-task attribute.
> 
> You really want to specify the cache configuration "at once": 
> having process-A exclusive access to 2MB of cache at all times,
> and process-B 4MB exclusive, means you can't have process-C use 4MB of 
> cache exclusively (consider 8MB cache machine).

That's not true. It's fine to set up the 

task set <--> cache portion

mapping in pieces.

In fact, it's more natural because you don't necessarily know in advance
the entire cache allocation (think of "cp largefile /destination" with
sequential use-once behavior).

However, there is a use-case for sharing: in scenario 1 it might be
possible (and desired) to share code between applications.

> > > > and it isn't clear whether fully hierarchical resource
> > > > distribution would be necessary especially given that the granularity
> > > > of the target resource is very coarse.
> > > 
> > > As i see it, the benefit of the hierarchical structure to the CAT
> > > configuration is simply to organize sharing of cache ways in subtrees
> > > - two cgroups can share a given cache way only if they have a common
> > > parent. 
> > > 
> > > That is the only benefit. Vikas, please correct me if i'm wrong.
> > 
> > cgroups is not a superset of a programmable interface.  It has
> > distinctive disadvantages and not a substitute with hirearchy support
> > for regular systemcall-like interface.  I don't think it makes sense
> > to go full-on hierarchical cgroups when we don't have basic interface
> > which is likely to cover many use cases better.  A syscall-like
> > interface combined with a tool similar to taskset would cover a lot in
> > a more accessible way.
> 
> How are you going to specify sharing of portions of cache by two sets
> of tasks with a syscall interface?
> 
> > > > I can see that how cpuset would seem to invite this sort of usage but
> > > > cpuset itself is more of an arbitrary outgrowth (regardless of
> > > > history) in terms of resource control and most things controlled by
> > > > cpuset already have countepart interface which is readily accessible
> > > > to the normal applications.
> > > 
> > > I can't parse that phrase (due to ignorance). Please educate.
> > 
> > Hmmm... consider CPU affinity.  cpuset definitely is useful for some
> > use cases as a management tool especially if the workloads are not
> > cooperative or delegated; however, it's no substitute for a proper
> > syscall interface and it'd be silly to try to replace that with
> > cpuset.
> > 
> > > > Given that what the feature allows is restricting usage rather than
> > > > granting anything exclusively, a programmable interface wouldn't need
> > > > to worry about complications around priviledges
> > > 
> > > What complications about priviledges you refer to?
> > 
> > It's not granting exclusive access, so individual user applications
> > can be allowed to do whatever it wanna do as long as the issuer has
> > enough priv over the target task.
> 
> Priviledge management with cgroup system: to change cache allocation
> requires priviledge over cgroups.
> 
> Priviledge management with system call interface: applications 
> could be allowed to reserve up to a certain percentage of the cache.
> 
> > > > while being able to reap most of the benefits in an a lot easier way.
> > > > Am I missing something?
> > > 
> > > The interface does allow for exclusive cache usage by an application.
> > > Please read the Intel manual, section 17, it is very instructive.
> > 
> > For that, it'd have to require some CAP but I think just having
> > restrictive interface in the style of CPU or NUMA affinity would go a
> > long way.
> > 
> > > The use cases we have now are the following:
> > > 
> > > Scenario 1: Consider a system with 4 high performance applications
> > > running, one of which is a streaming application that manages a very
> > > large address space from which it reads and writes as it does its 
> > > processing.
> > > As such the application will use all the cache it can get but does
> > > not need much if any cache. So, it spoils the cache for everyone for no
> > > gain on its own. In this case we'd like to constrain it to the
> > > smallest possible amount of cache while at the same time constraining
> > > the other 3 applications to stay out of this thrashed area of the
> > > cache.
> > 
> > A tool in the style of taskset should be enough for the above
> > scenario.
> > 
> 


Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-03 Thread Marcelo Tosatti
On Sun, Aug 02, 2015 at 12:23:25PM -0400, Tejun Heo wrote:
> Hello,
> 
> On Fri, Jul 31, 2015 at 12:12:18PM -0300, Marcelo Tosatti wrote:
> > > I don't really think it makes sense to implement a fully hierarchical
> > > cgroup solution when there isn't the basic affinity-adjusting
> > > interface 
> > 
> > What is an "affinity adjusting interface" ? Can you give an example
> > please?
> 
> Something similar to sched_setaffinity().  Just a syscall / prctl or
> whatever programmable interface which sets per-task attribute.

You really want to specify the cache configuration "at once": 
having process-A exclusive access to 2MB of cache at all times,
and process-B 4MB exclusive, means you can't have process-C use 4MB of 
cache exclusively (consider 8MB cache machine).

But the syscall allows processes to set and retrieve

> > > and it isn't clear whether fully hierarchical resource
> > > distribution would be necessary especially given that the granularity
> > > of the target resource is very coarse.
> > 
> > As i see it, the benefit of the hierarchical structure to the CAT
> > configuration is simply to organize sharing of cache ways in subtrees
> > - two cgroups can share a given cache way only if they have a common
> > parent. 
> > 
> > That is the only benefit. Vikas, please correct me if i'm wrong.
> 
> cgroups is not a superset of a programmable interface.  It has
> distinctive disadvantages and not a substitute with hirearchy support
> for regular systemcall-like interface.  I don't think it makes sense
> to go full-on hierarchical cgroups when we don't have basic interface
> which is likely to cover many use cases better.  A syscall-like
> interface combined with a tool similar to taskset would cover a lot in
> a more accessible way.

How are you going to specify sharing of portions of cache by two sets
of tasks with a syscall interface?

> > > I can see that how cpuset would seem to invite this sort of usage but
> > > cpuset itself is more of an arbitrary outgrowth (regardless of
> > > history) in terms of resource control and most things controlled by
> > > cpuset already have countepart interface which is readily accessible
> > > to the normal applications.
> > 
> > I can't parse that phrase (due to ignorance). Please educate.
> 
> Hmmm... consider CPU affinity.  cpuset definitely is useful for some
> use cases as a management tool especially if the workloads are not
> cooperative or delegated; however, it's no substitute for a proper
> syscall interface and it'd be silly to try to replace that with
> cpuset.
> 
> > > Given that what the feature allows is restricting usage rather than
> > > granting anything exclusively, a programmable interface wouldn't need
> > > to worry about complications around priviledges
> > 
> > What complications about priviledges you refer to?
> 
> It's not granting exclusive access, so individual user applications
> can be allowed to do whatever it wanna do as long as the issuer has
> enough priv over the target task.

Priviledge management with cgroup system: to change cache allocation
requires priviledge over cgroups.

Priviledge management with system call interface: applications 
could be allowed to reserve up to a certain percentage of the cache.

> > > while being able to reap most of the benefits in an a lot easier way.
> > > Am I missing something?
> > 
> > The interface does allow for exclusive cache usage by an application.
> > Please read the Intel manual, section 17, it is very instructive.
> 
> For that, it'd have to require some CAP but I think just having
> restrictive interface in the style of CPU or NUMA affinity would go a
> long way.
> 
> > The use cases we have now are the following:
> > 
> > Scenario 1: Consider a system with 4 high performance applications
> > running, one of which is a streaming application that manages a very
> > large address space from which it reads and writes as it does its 
> > processing.
> > As such the application will use all the cache it can get but does
> > not need much if any cache. So, it spoils the cache for everyone for no
> > gain on its own. In this case we'd like to constrain it to the
> > smallest possible amount of cache while at the same time constraining
> > the other 3 applications to stay out of this thrashed area of the
> > cache.
> 
> A tool in the style of taskset should be enough for the above
> scenario.
> 
> > Scenario 2: We have a numeric application that has been highly optimized
> > to fit in the L2 cache (2M for example). We want to ensure that its
> > cached data does not get flushed from the cache hierarchy while it is
> > scheduled out. In this case we exclusively allocate enough L3 cache to
> > hold all of the L2 cache.
> >
> > Scenario 3: Latency sensitive application executing in a shared
> > environment, where memory to handle an event must be in L3 cache
> > for latency requirements to be met.
> 
> Either isolate CPUs or run other stuff with affinity restricted.
> 
> cpuset-style 


Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-02 Thread Tejun Heo
Hello, Vikas.

On Fri, Jul 31, 2015 at 09:24:58AM -0700, Vikas Shivappa wrote:
> Yes, today we don't have an alternative interface - but we can always build
> one. We simply don't have it because till now the Linux kernel just tolerated the
> degradation that could have occurred due to cache contention and this is the
> first interface we are building.

But we're doing it the wrong way around.  You can do most of what
cgroup interface can do with systemcall-like interface with some
inconvenience.  The other way doesn't really work.  As I wrote in the
other reply, cgroups is a horrible programmable interface and we don't
want individual applications to interact with it directly and CAT's
use cases most definitely include each application programming its own
cache mask.  Let's build something which is simple and can be used
easily first.  If this turns out to be widely useful and an overall
management capability over it is wanted, we can consider cgroups then.

Thanks.

-- 
tejun


Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-08-02 Thread Tejun Heo
Hello,

On Fri, Jul 31, 2015 at 12:12:18PM -0300, Marcelo Tosatti wrote:
> > I don't really think it makes sense to implement a fully hierarchical
> > cgroup solution when there isn't the basic affinity-adjusting
> > interface 
> 
> What is an "affinity adjusting interface" ? Can you give an example
> please?

Something similar to sched_setaffinity().  Just a syscall / prctl or
whatever programmable interface which sets per-task attribute.
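
For reference, the existing CPU-affinity call being used as the analogy; a
per-task cache interface in the same style would look much like it. The
cache_set_affinity() name in the comment is made up purely for illustration.

/* The existing per-task affinity interface used as the analogy above. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(0, &set);	/* restrict the calling task to CPU 0 */
	if (sched_setaffinity(0, sizeof(set), &set) != 0)
		perror("sched_setaffinity");

	/* A hypothetical cache analogue in the same style might look like
	 *	cache_set_affinity(pid, sizeof(cbm), &cbm);
	 * i.e. a plain, inheritable per-task attribute with no filesystem
	 * hierarchy involved. */
	return 0;
}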

> > and it isn't clear whether fully hierarchical resource
> > distribution would be necessary especially given that the granularity
> > of the target resource is very coarse.
> 
> As i see it, the benefit of the hierarchical structure to the CAT
> configuration is simply to organize sharing of cache ways in subtrees
> - two cgroups can share a given cache way only if they have a common
> parent. 
> 
> That is the only benefit. Vikas, please correct me if i'm wrong.

cgroups is not a superset of a programmable interface.  It has
distinctive disadvantages and is not a substitute, with hierarchy support,
for a regular systemcall-like interface.  I don't think it makes sense
to go full-on hierarchical cgroups when we don't have basic interface
which is likely to cover many use cases better.  A syscall-like
interface combined with a tool similar to taskset would cover a lot in
a more accessible way.

> > I can see that how cpuset would seem to invite this sort of usage but
> > cpuset itself is more of an arbitrary outgrowth (regardless of
> > history) in terms of resource control and most things controlled by
> > cpuset already have countepart interface which is readily accessible
> > to the normal applications.
> 
> I can't parse that phrase (due to ignorance). Please educate.

Hmmm... consider CPU affinity.  cpuset definitely is useful for some
use cases as a management tool especially if the workloads are not
cooperative or delegated; however, it's no substitute for a proper
syscall interface and it'd be silly to try to replace that with
cpuset.

> > Given that what the feature allows is restricting usage rather than
> > granting anything exclusively, a programmable interface wouldn't need
> > to worry about complications around privileges
> 
> What complications about privileges do you refer to?

It's not granting exclusive access, so individual user applications
can be allowed to do whatever they want as long as the issuer has
sufficient privilege over the target task.

> > while being able to reap most of the benefits in a much easier way.
> > Am I missing something?
> 
> The interface does allow for exclusive cache usage by an application.
> Please read the Intel manual, section 17, it is very instructive.

For that, it'd have to require some CAP, but I think just having a
restrictive interface in the style of CPU or NUMA affinity would go a
long way.

> The use cases we have now are the following:
> 
> Scenario 1: Consider a system with 4 high performance applications
> running, one of which is a streaming application that manages a very
> large address space from which it reads and writes as it does its processing.
> As such the application will use all the cache it can get but does
> not need much if any cache. So, it spoils the cache for everyone for no
> gain on its own. In this case we'd like to constrain it to the
> smallest possible amount of cache while at the same time constraining
> the other 3 applications to stay out of this thrashed area of the
> cache.

A tool in the style of taskset should be enough for the above
scenario.

> Scenario 2: We have a numeric application that has been highly optimized
> to fit in the L2 cache (2M for example). We want to ensure that its
> cached data does not get flushed from the cache hierarchy while it is
> scheduled out. In this case we exclusively allocate enough L3 cache to
> hold all of the L2 cache.
>
> Scenario 3: Latency sensitive application executing in a shared
> environment, where memory to handle an event must be in L3 cache
> for latency requirements to be met.

Either isolate CPUs or run other stuff with affinity restricted.

cpuset-style allocation can be easier for things like this but that
should be an addition on top, not the one and only interface.  How is
it gonna handle if multiple threads of a process want to restrict
cache usages to avoid stepping on each other's toes?  Delegate the
subdirectory and let the process itself open it and write to files to
configure when there isn't even a way to atomically access the
process's own directory or a way to synchronize against migration?
cgroups may be an okay management interface but a horrible
programmable interface.
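
To make that concrete, here is a sketch of the cgroupfs dance a thread would have to go through on its own; the mount point, the intel_rdt.l3_cbm file name and the directory layout are assumptions loosely based on the proposed controller, not a settled ABI, and the comments mark the races being referred to:

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/syscall.h>

static void write_str(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (!f) {
                perror(path);
                return;
        }
        fputs(val, f);
        fclose(f);
}

int main(void)
{
        /*
         * 1. Discover our own group.  In reality this means parsing
         *    /proc/self/cgroup, and the task can be migrated between
         *    reading it and acting on it -- the missing atomicity noted
         *    above.
         */
        const char *my_cg = "/sys/fs/cgroup/intel_rdt/myapp";   /* assumed path */
        char child[256], path[512], tid[32];

        /* 2. Create a child group for just this thread (needs delegation). */
        snprintf(child, sizeof(child), "%s/worker", my_cg);
        mkdir(child, 0755);

        /* 3. Write a capacity bitmask and migrate ourselves into the group. */
        snprintf(path, sizeof(path), "%s/intel_rdt.l3_cbm", child);     /* assumed file name */
        write_str(path, "0xf");

        snprintf(path, sizeof(path), "%s/tasks", child);
        snprintf(tid, sizeof(tid), "%ld", (long)syscall(SYS_gettid));
        write_str(path, tid);

        return 0;
}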

Sure, if this turns out to be as important as cpu or numa affinity and
gets widely used, creating management burden in many use cases, we sure
can add a cgroups controller for it, but that's a remote possibility at
this point and the current attempt is an over-engineered solution for
problems which haven't been shown to exist.  Let's please first
implement something simple and easy to use.


Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-07-31 Thread Vikas Shivappa



On Thu, 30 Jul 2015, Tejun Heo wrote:


Hello, Vikas.

On Wed, Jul 01, 2015 at 03:21:06PM -0700, Vikas Shivappa wrote:

This patch adds a cgroup subsystem for Intel Resource Director
Technology(RDT) feature and Class of service(CLOSid) management which is
part of common RDT framework.  This cgroup would eventually be used by
all sub-features of RDT and hence be associated with the common RDT
framework as well as sub-feature specific framework.  However current
patch series only adds cache allocation sub-feature specific code.

When a cgroup directory is created it has a CLOSid associated with it
which is inherited from its parent.  The Closid is mapped to a
cache_mask which represents the L3 cache allocation to the cgroup.
Tasks belonging to the cgroup get to fill the cache represented by the
cache_mask.


First of all, I apologize for being so late.  I've been thinking about
it but the thoughts didn't quite crystalize (which isn't to say that
it's very crystal now) until recently.  If I understand correctly,
there are a couple suggested use cases for explicitly managing cache
usage.

1. Pinning known hot areas of memory in cache.


No, cache allocation doesn't do this (or isn't expected to).



2. Explicitly regulating cache usage so that cacheline allocation can
  be better than CPU itself doing it.


Yes, this is what we want to do using cache allocation.



#1 isn't part of this patchset, right?  Is there any plan for working
towards this too?


Cache allocation is not intended to do #1, so we don't have to support this.



For #2, it is likely that the targeted use cases would involve threads
of a process or at least cooperating processes and having a simple API
which just goes "this (or the current) thread is only gonna use this
part of cache" would be a lot easier to use and actually beneficial.

I don't really think it makes sense to implement a fully hierarchical
cgroup solution when there isn't the basic affinity-adjusting
interface and it isn't clear whether fully hierarchical resource
distribution would be necessary especially given that the granularity
of the target resource is very coarse.

I can see how cpuset would seem to invite this sort of usage, but
cpuset itself is more of an arbitrary outgrowth (regardless of
history) in terms of resource control, and most things controlled by
cpuset already have counterpart interfaces which are readily accessible
to normal applications.


Yes, today we don't have an alternative interface - but we can always build one.
We simply don't have it because until now the Linux kernel just tolerated the
degradation that could occur from cache contention, and this is the first
interface we are building.




Given that what the feature allows is restricting usage rather than
granting anything exclusively, a programmable interface wouldn't need
to worry about complications around privileges while being able to
reap most of the benefits in a much easier way.  Am I missing
something?



For #2, with the intel_rdt cgroup we develop a framework where the user can
regulate the cache allocation.  A user-space app could also eventually use this
as underlying support and then build things on top of it depending on
enterprise or other requirements.


A typical use case would be an application which is, say, continuously
polluting the cache (a low-priority app from a cache-usage perspective) by
bringing in data from the network (a copying/streaming app), and thereby not
letting an app which has a legitimate requirement for cache usage (a
high-priority app) use the cache.


We need to map a group of tasks to a particular class of service and provide a
way for the user to specify the cache capacity for that class of service.  We
also need a default cgroup which can hold all the tasks and use all of the cache.
The hierarchical interface can be used by the user as required and does not
really interfere with allocating exclusive blocks of cache - all the user needs
to do is make sure the masks don't overlap.

The user can configure the masks to be exclusive of each other.
But note that overlapping masks provide a very easy way to share cache usage,
which is something you may want to do at times.  The current implementation can be
easily extended to *enforce* exclusive capacity masks between child nodes if
required.  But since the superuser is expected to be the one using this, the need
for that may be limited, or the user can take care of it as described above.  Some
of the emails may have given the impression that we cannot do exclusive
allocations - but that's not true at all: we can configure the masks to have
exclusive cache blocks for different cgroups; it is just left to the user...
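
For illustration, a small sketch with made-up mask values of what exclusive versus overlapping capacity bitmasks look like, and the check a management script could apply to guarantee exclusivity; the 20-way geometry is an assumption:

#include <stdio.h>

/* Example capacity bitmasks (CBMs) on an assumed 20-way L3 (20-bit masks). */
#define CBM_STREAMING   0x00003UL       /* low-priority app: 2 ways, exclusive  */
#define CBM_HIGH_PRIO   0xffff0UL       /* high-priority apps: 16 ways          */
#define CBM_DEFAULT     0xfffffUL       /* default group: may overlap everyone  */

/* Two groups have exclusive cache blocks iff their CBMs do not intersect. */
static int exclusive(unsigned long a, unsigned long b)
{
        return (a & b) == 0;
}

int main(void)
{
        printf("streaming vs high-prio exclusive: %d\n",
               exclusive(CBM_STREAMING, CBM_HIGH_PRIO));        /* 1: no shared ways */
        printf("default vs high-prio exclusive:  %d\n",
               exclusive(CBM_DEFAULT, CBM_HIGH_PRIO));          /* 0: ways are shared */
        return 0;
}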



We did have a lot of discussions during the design and V3, if you remember, and
settled on using a separate controller... Below is one such thread where
we discussed the same.  I don't want to loop through it again with this already
full marathon patch :)



Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-07-31 Thread Marcelo Tosatti
On Thu, Jul 30, 2015 at 03:44:58PM -0400, Tejun Heo wrote:
> Hello, Vikas.
> 
> On Wed, Jul 01, 2015 at 03:21:06PM -0700, Vikas Shivappa wrote:
> > This patch adds a cgroup subsystem for Intel Resource Director
> > Technology(RDT) feature and Class of service(CLOSid) management which is
> > part of common RDT framework.  This cgroup would eventually be used by
> > all sub-features of RDT and hence be associated with the common RDT
> > framework as well as sub-feature specific framework.  However current
> > patch series only adds cache allocation sub-feature specific code.
> > 
> > When a cgroup directory is created it has a CLOSid associated with it
> > which is inherited from its parent.  The Closid is mapped to a
> > cache_mask which represents the L3 cache allocation to the cgroup.
> > Tasks belonging to the cgroup get to fill the cache represented by the
> > cache_mask.
> 
> First of all, I apologize for being so late.  I've been thinking about
> it but the thoughts didn't quite crystalize (which isn't to say that
> it's very crystal now) until recently.  If I understand correctly,
> there are a couple suggested use cases for explicitly managing cache
> usage.
> 
> 1. Pinning known hot areas of memory in cache.
> 
> 2. Explicitly regulating cache usage so that cacheline allocation can
>be better than CPU itself doing it.
> 
> #1 isn't part of this patchset, right?  Is there any plan for working
> towards this too?
> 
> For #2, it is likely that the targeted use cases would involve threads
> of a process or at least cooperating processes and having a simple API
> which just goes "this (or the current) thread is only gonna use this
> part of cache" would be a lot easier to use and actually beneficial.
> 
> I don't really think it makes sense to implement a fully hierarchical
> cgroup solution when there isn't the basic affinity-adjusting
> interface 

What is an "affinity adjusting interface" ? Can you give an example
please?

> and it isn't clear whether fully hierarchical resource
> distribution would be necessary especially given that the granularity
> of the target resource is very coarse.

As I see it, the benefit of the hierarchical structure to the CAT
configuration is simply to organize sharing of cache ways in subtrees
- two cgroups can share a given cache way only if they have a common
parent. 

That is the only benefit. Vikas, please correct me if I'm wrong.

> I can see how cpuset would seem to invite this sort of usage, but
> cpuset itself is more of an arbitrary outgrowth (regardless of
> history) in terms of resource control, and most things controlled by
> cpuset already have counterpart interfaces which are readily accessible
> to normal applications.

I can't parse that phrase (due to ignorance). Please educate.

> Given that what the feature allows is restricting usage rather than
> granting anything exclusively, a programmable interface wouldn't need
> to worry about complications around privileges

What complications about privileges do you refer to?

> while being able to reap most of the benefits in a much easier way.
> Am I missing something?

The interface does allow for exclusive cache usage by an application.
Please read the Intel manual, section 17, it is very instructive.

The use cases we have now are the following:

Scenario 1: Consider a system with 4 high performance applications
running, one of which is a streaming application that manages a very
large address space from which it reads and writes as it does its processing.
As such, the application will use all the cache it can get but does
not need much, if any, cache. So it spoils the cache for everyone for no
gain on its own. In this case we'd like to constrain it to the
smallest possible amount of cache while at the same time constraining
the other 3 applications to stay out of this thrashed area of the
cache.

Scenario 2: We have a numeric application that has been highly optimized
to fit in the L2 cache (2M for example). We want to ensure that its
cached data does not get flushed from the cache hierarchy while it is
scheduled out. In this case we exclusively allocate enough L3 cache to
hold all of the L2 cache.

Scenario 3: Latency sensitive application executing in a shared
environment, where memory to handle an event must be in L3 cache
for latency requirements to be met.
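
For scenario 2, a back-of-the-envelope sketch of the sizing involved, assuming an illustrative 20-way, 20 MB L3 (so 1 MB per way); the real geometry comes from CPUID, and the numbers here are only an example:

#include <stdio.h>

int main(void)
{
        /* Assumed cache geometry for illustration only. */
        unsigned long l3_size   = 20UL * 1024 * 1024;   /* 20 MB L3            */
        unsigned int  l3_ways   = 20;                   /* 20-way CAT bitmask  */
        unsigned long l2_size   = 2UL * 1024 * 1024;    /* working set: 2 MB   */

        unsigned long way_size  = l3_size / l3_ways;                    /* 1 MB per way */
        unsigned int  ways_need = (l2_size + way_size - 1) / way_size;  /* round up     */

        /* Contiguous CBM of 'ways_need' low bits, e.g. 0x3 for 2 ways. */
        unsigned long cbm = (1UL << ways_need) - 1;

        printf("reserve %u way(s), CBM = 0x%lx\n", ways_need, cbm);
        return 0;
}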



Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-07-30 Thread Tejun Heo
Hello, Vikas.

On Wed, Jul 01, 2015 at 03:21:06PM -0700, Vikas Shivappa wrote:
> This patch adds a cgroup subsystem for Intel Resource Director
> Technology(RDT) feature and Class of service(CLOSid) management which is
> part of common RDT framework.  This cgroup would eventually be used by
> all sub-features of RDT and hence be associated with the common RDT
> framework as well as sub-feature specific framework.  However current
> patch series only adds cache allocation sub-feature specific code.
> 
> When a cgroup directory is created it has a CLOSid associated with it
> which is inherited from its parent.  The Closid is mapped to a
> cache_mask which represents the L3 cache allocation to the cgroup.
> Tasks belonging to the cgroup get to fill the cache represented by the
> cache_mask.

First of all, I apologize for being so late.  I've been thinking about
it but the thoughts didn't quite crystalize (which isn't to say that
it's very crystal now) until recently.  If I understand correctly,
there are a couple suggested use cases for explicitly managing cache
usage.

1. Pinning known hot areas of memory in cache.

2. Explicitly regulating cache usage so that cacheline allocation can
   be better than CPU itself doing it.

#1 isn't part of this patchset, right?  Is there any plan for working
towards this too?

For #2, it is likely that the targeted use cases would involve threads
of a process or at least cooperating processes and having a simple API
which just goes "this (or the current) thread is only gonna use this
part of cache" would be a lot easier to use and actually beneficial.

I don't really think it makes sense to implement a fully hierarchical
cgroup solution when there isn't the basic affinity-adjusting
interface and it isn't clear whether fully hierarchical resource
distribution would be necessary especially given that the granularity
of the target resource is very coarse.

I can see how cpuset would seem to invite this sort of usage, but
cpuset itself is more of an arbitrary outgrowth (regardless of
history) in terms of resource control, and most things controlled by
cpuset already have counterpart interfaces which are readily accessible
to normal applications.

Given that what the feature allows is restricting usage rather than
granting anything exclusively, a programmable interface wouldn't need
to worry about complications around privileges while being able to
reap most of the benefits in a much easier way.  Am I missing
something?

Thanks.

-- 
tejun


Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-07-30 Thread Vikas Shivappa



On Tue, 28 Jul 2015, Peter Zijlstra wrote:


On Wed, Jul 01, 2015 at 03:21:06PM -0700, Vikas Shivappa wrote:

 static int __init intel_rdt_late_init(void)
 {
struct cpuinfo_x86 *c = &boot_cpu_data;
+   static struct clos_cbm_map *ccm;
+   u32 maxid, max_cbm_len;
+   size_t sizeb;


Why 'sizeb' ? 'size' is still available, right?


Will fix; a plain 'int size' should be good enough.




+   int err = 0;

-   if (!cpu_has(c, X86_FEATURE_CAT_L3))
+   if (!cpu_has(c, X86_FEATURE_CAT_L3)) {
+   rdt_root_group.css.ss->disabled = 1;
return -ENODEV;
+   }
+   maxid = c->x86_cache_max_closid;
+   max_cbm_len = c->x86_cache_max_cbm_len;
+
+   sizeb = BITS_TO_LONGS(maxid) * sizeof(long);
+   rdtss_info.closmap = kzalloc(sizeb, GFP_KERNEL);
+   if (!rdtss_info.closmap) {
+   err = -ENOMEM;
+   goto out_err;
+   }
+
+   sizeb = maxid * sizeof(struct clos_cbm_map);
+   ccmap = kzalloc(sizeb, GFP_KERNEL);
+   if (!ccmap) {
+   kfree(rdtss_info.closmap);
+   err = -ENOMEM;
+   goto out_err;
+   }


What's the expected size of max_closid? iow, how big of an array are you
in fact allocating here?


The max closid value is a 16-bit quantity.  For systems with large CPU counts it
may be more, but on EPs we have only seen values of 20-30.
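
For reference, a quick sketch of the allocation sizes those numbers imply, assuming a maxid of 32 and the structure layout from the patch on a 64-bit build; both arrays stay tiny:

#include <stdio.h>

/* Mirror of the structure from the patch, for sizing only. */
struct clos_cbm_map {
        unsigned long cache_mask;
        unsigned int clos_refcnt;
};

#define BITS_PER_LONG           (8 * sizeof(long))
#define BITS_TO_LONGS(n)        (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

int main(void)
{
        unsigned int maxid = 32;        /* assumed; EPs reportedly expose 20-30 */

        size_t closmap_sz = BITS_TO_LONGS(maxid) * sizeof(long);
        size_t ccmap_sz   = maxid * sizeof(struct clos_cbm_map);

        /* On x86-64: 1 * 8 = 8 bytes, and 32 * 16 = 512 bytes. */
        printf("closmap: %zu bytes, ccmap: %zu bytes\n", closmap_sz, ccmap_sz);
        return 0;
}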

Thanks,
Vikas






Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-07-30 Thread Vikas Shivappa



On Tue, 28 Jul 2015, Peter Zijlstra wrote:


On Wed, Jul 01, 2015 at 03:21:06PM -0700, Vikas Shivappa wrote:

+struct clos_cbm_map {
+   unsigned long cache_mask;
+   unsigned int clos_refcnt;
+};


This structure is not a map at all, it's the map value. Furthermore,
cache_mask seems a confusing name for the capacity bitmask (CBM).


clos_cbm_table? Since it's really a table which is indexed by the clos.

Will fix the mask names.
Thanks,
Vikas





Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-07-28 Thread Peter Zijlstra
On Wed, Jul 01, 2015 at 03:21:06PM -0700, Vikas Shivappa wrote:
>  static int __init intel_rdt_late_init(void)
>  {
>   struct cpuinfo_x86 *c = &boot_cpu_data;
> + static struct clos_cbm_map *ccm;
> + u32 maxid, max_cbm_len;
> + size_t sizeb;

Why 'sizeb' ? 'size' is still available, right?

> + int err = 0;
>  
> - if (!cpu_has(c, X86_FEATURE_CAT_L3))
> + if (!cpu_has(c, X86_FEATURE_CAT_L3)) {
> + rdt_root_group.css.ss->disabled = 1;
>   return -ENODEV;
> + }
> + maxid = c->x86_cache_max_closid;
> + max_cbm_len = c->x86_cache_max_cbm_len;
> +
> + sizeb = BITS_TO_LONGS(maxid) * sizeof(long);
> + rdtss_info.closmap = kzalloc(sizeb, GFP_KERNEL);
> + if (!rdtss_info.closmap) {
> + err = -ENOMEM;
> + goto out_err;
> + }
> +
> + sizeb = maxid * sizeof(struct clos_cbm_map);
> + ccmap = kzalloc(sizeb, GFP_KERNEL);
> + if (!ccmap) {
> + kfree(rdtss_info.closmap);
> + err = -ENOMEM;
> + goto out_err;
> + }

What's the expected size of max_closid? iow, how big of an array are you
in fact allocating here?


Re: [PATCH 5/9] x86/intel_rdt: Add new cgroup and Class of service management

2015-07-28 Thread Peter Zijlstra
On Wed, Jul 01, 2015 at 03:21:06PM -0700, Vikas Shivappa wrote:
> +struct clos_cbm_map {
> + unsigned long cache_mask;
> + unsigned int clos_refcnt;
> +};

This structure is not a map at all, it's the map value. Furthermore,
cache_mask seems a confusing name for the capacity bitmask (CBM).
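
For illustration, a sketch of the rename Vikas proposes in his reply earlier in this thread (clos_cbm_table, indexed by CLOSid); the field names beyond those quoted here, and the example values, are assumptions:

#include <stdio.h>
#include <stdlib.h>

/* One entry per CLOSid, so the array is a table indexed by closid rather than a "map". */
struct clos_cbm_table {
        unsigned long cbm;              /* capacity bitmask programmed for this CLOS  */
        unsigned int clos_refcnt;       /* groups/tasks referencing this CLOS         */
};

int main(void)
{
        unsigned int maxid = 16;        /* assumed number of CLOSids */

        /* cctable[closid] gives the CBM and refcount for that class of service. */
        struct clos_cbm_table *cctable = calloc(maxid, sizeof(*cctable));

        if (!cctable)
                return 1;

        cctable[0].cbm = 0xfffff;       /* default CLOS: all ways */
        cctable[0].clos_refcnt = 1;

        printf("CLOS 0: cbm=0x%lx refcnt=%u\n", cctable[0].cbm, cctable[0].clos_refcnt);
        free(cctable);
        return 0;
}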

