Re: [libvirt] "[V3] RFC for support cache tune in libvirt"

2017-01-12 Thread Marcelo Tosatti
On Thu, Jan 12, 2017 at 11:06:06AM +, Daniel P. Berrange wrote:
> On Thu, Jan 12, 2017 at 08:48:01AM -0200, Marcelo Tosatti wrote:
> > On Thu, Jan 12, 2017 at 09:44:36AM +0800, 乔立勇(Eli Qiao) wrote:
> > > Hi, it's really good to have you involved in supporting CAT in
> > > libvirt/OpenStack.
> > > Replies inline.
> > > 
> > > 2017-01-11 20:19 GMT+08:00 Marcelo Tosatti :
> > > 
> > > >
> > > > Hi,
> > > >
> > > > Comments/questions related to:
> > > > https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html
> > > >
> > > > 1) root s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2
> > > >
> > > > How does allocation of code/data look like?
> > > >
> > > 
> > > My plan is to expose new options:
> > > 
> > > virsh cachetune kvm02 --l3data.count 2 --l3code.count 2
> > > 
> > > Please note, you can use only l3, or l3data/l3code (if CDP is enabled
> > > when mounting the resctrl fs).
> > 
> > Fine. However, you should be able to emulate a type=both reservation
> > (non cdp) by writing a schemata file with the same CBM bits:
> > 
> > L3code:0=0x000ff;1=0x000ff
> > L3data:0=0x000ff;1=0x000ff
> > 
> > (*)
> > 
> > I don't see how this interface enables that possibility.
> > 
> > I suppose it would be easier for mgmt software to have it
> > done automatically: 
> > 
> > virsh cachetune kvm02 --l3 size_in_kbytes.
> > 
> > Would create the reservations as (*) in resctrlfs, in 
> > case host is CDP enabled.
> 
> You'll be able to query libvirt to determine whether you have
> l3, or l3data + l3code. So mgmt app can decide to emulate
> "type=both", if it sees l3data+l3code as separate items.

No, it can't, because the interface does not allow you to specify
whether l3data and l3code should intersect each other (and by how much).

Unless you add that, which IMO is overkill.

Or a parameter (option 1):

virsh cachetune kvm02 --l3data size_in_kbytes --l3code size_in_kbytes
--share-l3

meaning the reservations share space.

OR (option 2):

virsh cachetune kvm02 --l3 size_in_kbytes
(with internal translation to l3data and l3code reservations in the
same space).
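
To make the intent concrete, here is a rough Python sketch of such an
internal translation (not from any existing implementation): it assumes
resctrl is mounted at /sys/fs/resctrl, takes the per-CBM-bit granularity as
a host-specific constant (the 2816 KiB figure discussed below), and uses a
made-up group name. Real code would pick free contiguous CBM bits rather
than always starting at bit 0:

    import os

    RESCTRL = "/sys/fs/resctrl"      # assumed resctrl mount point
    KBYTES_PER_CBM_BIT = 2816        # host-dependent granularity

    def kbytes_to_cbm(kbytes):
        # Round a KiB request up to whole CBM bits; use a contiguous mask
        # starting at bit 0 purely for illustration.
        bits = max(1, -(-kbytes // KBYTES_PER_CBM_BIT))
        return (1 << bits) - 1

    def write_reservation(group, kbytes, cache_ids=(0, 1), cdp=True):
        # Emulate a type=both reservation: the same CBM for L3code and L3data.
        cbm = kbytes_to_cbm(kbytes)
        masks = ";".join("%d=%x" % (cid, cbm) for cid in cache_ids)
        if cdp:
            schemata = "L3code:%s\nL3data:%s\n" % (masks, masks)
        else:
            schemata = "L3:%s\n" % masks
        with open(os.path.join(RESCTRL, group, "schemata"), "w") as f:
            f.write(schemata)

    write_reservation("kvm02", 5632)   # hypothetical group for the domain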

I don't see any point in having option 1; option 2 is simpler and
removes a tunable from the interface.

Do you have a reason behind the statement that mgmt app should
decide this?

> > > > 4) Usefulness of exposing minimum unit size.
> > > >
> > > > Rather than specify unit sizes (which forces the user
> > > > to convert every time the command is executed), why not specify
> > > > in kbytes and round up?
> > > >
> > > 
> > > I accept this. I proposed to expose the minimum unit size because I'd like
> > > to let users specify the unit count (which, as you say, is not good).
> > > 
> > > As you know, the minimum unit size is decided by the hardware, e.g.
> > > on a host we have 56320 KiB of cache and a max CBM length of 20 bits
> > > (0xfffff), so the minimum cache allocation is 56320/20 = 2816 KiB.
> > > 
> > > If we allow users to specify a cache size instead of a cache unit count,
> > > a user may set the cache to 2817 KiB, and we would have to round it up
> > > to 2816 * 2; 2815 KiB would be wasted.
> > 
> > Yes but the user can know the wasted amount if necessary, if you expose
> > the cache unit size (again, i doubt this will happen in practice because
> > the granularity of the CBM bits is small compared to the cache size).
> > 
> > The problem with the cache unit count specification is that it does not
> > work across different hosts: if a user saves the "cache unit count"
> > value manually in a XML file, then uses that XML file on a different
> > host, the reservation on the new host can become smaller than desired,
> > which violates expectations.
> 
> Yes, public APIs should always use an actual size, usually KB in most
> of our APIs, but sometimes bytes.
> 
> > > Anyway, I am open to using a KiB size and letting libvirt calculate the cbm
> > > bits; I am wondering whether we need to report that the actual_cache_size
> > > may be up to 5632 KiB even when they want 2816 KiB of cache.
> > 
> > Another thing I did on resctrltool is to have a safety margin for
> > allocations: do not let the user allocate all of the cache (that is,
> > leave 0 bytes for the default group). I used one cache unit as the
> > minimum:
> > 
> > if ret == ERR_LOW_SPACE:
> >     print "Warning: free space on default mask is <= %d\n" % \
> >         (kbytes_per_bit_of_cbm)
> >     print "use --force to force"
> 
> Libvirt explicitly aims to avoid making policy decisions like this.
> As your "--force" message shows there, it means you then have to
> add in ways to get around the policy. Libvirt just tries to provide
> the mechanism and leave it to the app to decide on usage policy.

Actually, what I said is nonsense; the kernel does it already.


Re: [libvirt] "[V3] RFC for support cache tune in libvirt"

2017-01-12 Thread Daniel P. Berrange
On Thu, Jan 12, 2017 at 08:48:01AM -0200, Marcelo Tosatti wrote:
> On Thu, Jan 12, 2017 at 09:44:36AM +0800, 乔立勇(Eli Qiao) wrote:
> > hi, It's really good to have you get involved to support CAT in
> > libvirt/OpenStack.
> > replied inlines.
> > 
> > 2017-01-11 20:19 GMT+08:00 Marcelo Tosatti :
> > 
> > >
> > > Hi,
> > >
> > > Comments/questions related to:
> > > https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html
> > >
> > > 1) root s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2
> > >
> > > How does allocation of code/data look like?
> > >
> > 
> > My plan's expose new options:
> > 
> > virsh cachetune kvm02 --l3data.count 2 --l3code.count 2
> > 
> > Please notes, you can use only l3 or l3data/l3code(if enable cdp while
> > mount resctrl fs)
> 
> Fine. However, you should be able to emulate a type=both reservation
> (non cdp) by writing a schemata file with the same CBM bits:
> 
>   L3code:0=0x000ff;1=0x000ff
>   L3data:0=0x000ff;1=0x000ff
> 
> (*)
> 
> I don't see how this interface enables that possibility.
> 
> I suppose it would be easier for mgmt software to have it
> done automatically: 
> 
> virsh cachetune kvm02 --l3 size_in_kbytes.
> 
> Would create the reservations as (*) in resctrlfs, in 
> case host is CDP enabled.

You'll be able to query libvirt to determine whether you have
l3, or l3data + l3code. So mgmt app can decide to emulate
"type=both", if it sees l3data+l3code as separate items.

> 
> > >
> > > 4) Usefulness of exposing minimum unit size.
> > >
> > > Rather than specify unit sizes (which forces the user
> > > to convert every time the command is executed), why not specify
> > > in kbytes and round up?
> > >
> > 
> > I accept this. I proposed to expose the minimum unit size because I'd like
> > to let users specify the unit count (which, as you say, is not good).
> > 
> > As you know, the minimum unit size is decided by the hardware, e.g.
> > on a host we have 56320 KiB of cache and a max CBM length of 20 bits
> > (0xfffff), so the minimum cache allocation is 56320/20 = 2816 KiB.
> > 
> > If we allow users to specify a cache size instead of a cache unit count,
> > a user may set the cache to 2817 KiB, and we would have to round it up
> > to 2816 * 2; 2815 KiB would be wasted.
> 
> Yes but the user can know the wasted amount if necessary, if you expose
> the cache unit size (again, i doubt this will happen in practice because
> the granularity of the CBM bits is small compared to the cache size).
> 
> The problem with the cache unit count specification is that it does not
> work across different hosts: if a user saves the "cache unit count"
> value manually in a XML file, then uses that XML file on a different
> host, the reservation on the new host can become smaller than desired,
> which violates expectations.

Yes, public APIs should always use an actual size, usually KB in most
of our APIs, but sometimes bytes.

> > Anyway , I am open to using KiB size and let libvirt to calculate the cbm
> > bits, am thinking if we need to tell the actual_cache_size is up to 5632
> > KiB even they wants 2816 KiB cache.
> 
> Another thing I did on resctrltool is to have a safety margin for
> allocations: do not let the user allocate all of the cache (that is,
> leave 0 bytes for the default group). I used one cache unit as the
> minimum:
> 
> if ret == ERR_LOW_SPACE:
>     print "Warning: free space on default mask is <= %d\n" % \
>         (kbytes_per_bit_of_cbm)
>     print "use --force to force"

Libvirt explicitly aims to avoid making policy decisions like this.
As your "--force" message shows there, it means you then have to
add in ways to get around the policy. Libvirt just tries to provide
the mechanism and leave it to the app to decide on usage policy.


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|


Re: [libvirt] "[V3] RFC for support cache tune in libvirt"

2017-01-12 Thread Marcelo Tosatti
On Thu, Jan 12, 2017 at 09:44:36AM +0800, 乔立勇(Eli Qiao) wrote:
> hi, It's really good to have you get involved to support CAT in
> libvirt/OpenStack.
> replied inlines.
> 
> 2017-01-11 20:19 GMT+08:00 Marcelo Tosatti :
> 
> >
> > Hi,
> >
> > Comments/questions related to:
> > https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html
> >
> > 1) root s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2
> >
> > How does allocation of code/data look like?
> >
> 
> My plan's expose new options:
> 
> virsh cachetune kvm02 --l3data.count 2 --l3code.count 2
> 
> Please notes, you can use only l3 or l3data/l3code(if enable cdp while
> mount resctrl fs)

Fine. However, you should be able to emulate a type=both reservation
(non cdp) by writing a schemata file with the same CBM bits:

L3code:0=0x000ff;1=0x000ff
L3data:0=0x000ff;1=0x000ff

(*)

I don't see how this interface enables that possibility.

I suppose it would be easier for mgmt software to have it
done automatically: 

virsh cachetune kvm02 --l3 size_in_kbytes.

Would create the reservations as (*) in resctrlfs, in 
case host is CDP enabled.

(also please use kbytes, or give a reason to not use
kbytes).

Note: exposing the unit size is fine as mgmt software might 
decide a placement of VMs which reduces the amount of L3
cache reservation rounding (although i doubt anyone is going
to care about that in practice).

> > 2) 'nodecachestats' command:
> >
> > 3. Add new virsh command 'nodecachestats':
> > This API is to expose vary cache resouce left on each hardware (cpu
> > socket).
> > It will be formated as:
> > .: left size KiB
> >
> > Does this take into account that only contiguous regions of cbm masks
> > can be used for allocations?
> >
> >
> yes, it is the contiguous regions cbm or in another word it's the default
> cbm represent's cache value.
> 
> resctrl doesn't allow set non-contiguous cbm (which is restricted by
> hardware)

OK.

> 
> 
> > Also, it should return the amount of free cache on each cacheid.
> >
> 
> yes, it is.  resource_id == cacheid

OK.
> >
> > 3) The interface should support different sizes for different
> > cache-ids. See the KVM-RT use case at
> > https://www.redhat.com/archives/libvir-list/2017-January/msg00415.html
> > "WHAT THE USER NEEDS TO SPECIFY FOR VIRTUALIZATION (KVM-RT)".
> >
> 
> I don't think it's good to let user specify cache-ids while doing cache
> allocation.

This is necessary for our usecase.

> the cache ids used should rely on what cpu affinity the vm are setting.

The cache ids configuration should match the cpu affinity configuration.

> eg.
> 
> 1. for those host who has only one cache id(one socket host), we don't need
> to set cache id

Right.

> 2. if multiple cache ids(sockets), user should set vcpu -> pcpu mapping
> (define cpuset for a VM), then we (libvirt) need to compute how much cache
> on which cache id should set.
> Which is to say, user should set the cpu affinity before cache allocation.
> 
> I know that the most cases of using CAT is for NFV. As far as I know, NFV
> is using NUMA and cpu pining (vcpu -> pcpu mapping), so we don't need to
> worry about on which cache id we set the cache size.
> 
> So, just let user specify cache size(here my propose is cache unit account)
> and let libvirt detect on which cache id set how many cache.

Ok fine, its OK to not expose this to the user but calculate it
internally in libvirt. As long as you recompute the schematas whenever
cpu affinity changes. But using different cache-id's in schemata is
necessary for our usecase.

> >
> > 4) Usefulness of exposing minimum unit size.
> >
> > Rather than specify unit sizes (which forces the user
> > to convert every time the command is executed), why not specify
> > in kbytes and round up?
> >
> 
> I accept this, I propose to expose minimum unit size because of I'd like to
> let using specify the unit count(which as you say this is not good),
> 
> as you know the minimum unit size is decided by hard ware
> eg
> on a host, we have 56320 KiB cache, and the max cbm length is 20 (f),
> so the minimum cache should be 56320/20 = 2816 KiB
> 
> if we allow use specify cache size instead of cache unit count, user may
> set the cache as 2817 KiB, and we should round up it to 2816 * 2,  there
> will be 2815 KiB wasted.

Yes but the user can know the wasted amount if necessary, if you expose
the cache unit size (again, i doubt this will happen in practice because
the granularity of the CBM bits is small compared to the cache size).

The problem with the cache unit count specification is that it does not
work across different hosts: if a user saves the "cache unit count"
value manually in a XML file, then uses that XML file on a different
host, the reservation on the new host can become smaller than desired,
which violates expectations.
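
To illustrate the rounding being argued about, a small Python sketch using
the numbers from this thread (56320 KiB of L3 and a 20-bit CBM, i.e. 2816
KiB per bit); the helper name is made up:

    CACHE_KB = 56320                   # total L3 size from the example
    CBM_LEN = 20                       # number of CBM bits
    UNIT_KB = CACHE_KB // CBM_LEN      # 2816 KiB per CBM bit

    def round_up(request_kb):
        # Round a KiB request up to whole CBM bits and report the waste.
        bits = -(-request_kb // UNIT_KB)
        granted_kb = bits * UNIT_KB
        return bits, granted_kb, granted_kb - request_kb

    # A 2817 KiB request becomes 2 bits = 5632 KiB, wasting 2815 KiB,
    # exactly the case discussed above.
    print(round_up(2817))              # (2, 5632, 2815)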

> Anyway , I am open to using KiB size and let libvirt to calculate the cbm
> bits, 

Re: [libvirt] "[V3] RFC for support cache tune in libvirt"

2017-01-12 Thread Marcelo Tosatti
On Thu, Jan 12, 2017 at 08:47:58AM -0200, Marcelo Tosatti wrote:
> On Thu, Jan 12, 2017 at 09:44:36AM +0800, 乔立勇(Eli Qiao) wrote:
> > hi, It's really good to have you get involved to support CAT in
> > libvirt/OpenStack.
> > replied inlines.
> > 
> > 2017-01-11 20:19 GMT+08:00 Marcelo Tosatti :
> > 
> > >
> > > Hi,
> > >
> > > Comments/questions related to:
> > > https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html
> > >
> > > 1) root s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2
> > >
> > > How does allocation of code/data look like?
> > >
> > 
> > My plan's expose new options:
> > 
> > virsh cachetune kvm02 --l3data.count 2 --l3code.count 2
> > 
> > Please notes, you can use only l3 or l3data/l3code(if enable cdp while
> > mount resctrl fs)
> 
> Fine. However, you should be able to emulate a type=both reservation
> (non cdp) by writing a schemata file with the same CBM bits:
> 
>   L3code:0=0x000ff;1=0x000ff
>   L3data:0=0x000ff;1=0x000ff
> 
> (*)
> 
> I don't see how this interface enables that possibility.
> 
> I suppose it would be easier for mgmt software to have it
> done automatically: 
> 
> virsh cachetune kvm02 --l3 size_in_kbytes.
> 
> Would create the reservations as (*) in resctrlfs, in 
> case host is CDP enabled.
> 
> (also please use kbytes, or give a reason to not use
> kbytes).
> 
> Note: exposing the unit size is fine as mgmt software might 
> decide a placement of VMs which reduces the amount of L3
> cache reservation rounding (although i doubt anyone is going
> to care about that in practice).
> 
> > > 2) 'nodecachestats' command:
> > >
> > > 3. Add new virsh command 'nodecachestats':
> > > This API is to expose vary cache resouce left on each hardware 
> > > (cpu
> > > socket).
> > > It will be formated as:
> > > .: left size KiB
> > >
> > > Does this take into account that only contiguous regions of cbm masks
> > > can be used for allocations?
> > >
> > >
> > yes, it is the contiguous regions cbm or in another word it's the default
> > cbm represent's cache value.
> > 
> > resctrl doesn't allow set non-contiguous cbm (which is restricted by
> > hardware)
> 
> OK.
> 
> > 
> > 
> > > Also, it should return the amount of free cache on each cacheid.
> > >
> > 
> > yes, it is.  resource_id == cacheid
> 
> OK.
> > >
> > > 3) The interface should support different sizes for different
> > > cache-ids. See the KVM-RT use case at
> > > https://www.redhat.com/archives/libvir-list/2017-January/msg00415.html
> > > "WHAT THE USER NEEDS TO SPECIFY FOR VIRTUALIZATION (KVM-RT)".
> > >
> > 
> > I don't think it's good to let user specify cache-ids while doing cache
> > allocation.
> 
> This is necessary for our usecase.
> 
> > the cache ids used should rely on what cpu affinity the vm are setting.
> 
> The cache ids configuration should match the cpu affinity configuration.
> 
> > eg.
> > 
> > 1. for those host who has only one cache id(one socket host), we don't need
> > to set cache id
> 
> Right.
> 
> > 2. if multiple cache ids(sockets), user should set vcpu -> pcpu mapping
> > (define cpuset for a VM), then we (libvirt) need to compute how much cache
> > on which cache id should set.
> > Which is to say, user should set the cpu affinity before cache allocation.
> > 
> > I know that the most cases of using CAT is for NFV. As far as I know, NFV
> > is using NUMA and cpu pining (vcpu -> pcpu mapping), so we don't need to
> > worry about on which cache id we set the cache size.
> > 
> > So, just let user specify cache size(here my propose is cache unit account)
> > and let libvirt detect on which cache id set how many cache.
> 
> Ok fine, its OK to not expose this to the user but calculate it
> internally in libvirt. As long as you recompute the schematas whenever
> cpu affinity changes. But using different cache-id's in schemata is
> necessary for our usecase.

Hum, thinking again about this, it needs to be per-vcpu. So for the NFV
use-case you want:

vcpu0: no reservation (belongs to the default group).
vcpu1: reservation with particular size.

Then if a vcpu is pinned, "trim" the reservation down to the
particular cache-id where its pinned to.

This is important because it allows vcpu0 workload to not 
interfere with the realtime workload running on vcpu1.
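
A rough sketch of that per-vcpu arrangement on top of resctrl (an
illustration, not libvirt code): it assumes resctrl is mounted at
/sys/fs/resctrl and that the vcpu thread IDs are already known; the group
name, thread ID and masks below are made up. Leaving vcpu0 in the default
group simply means not moving its thread anywhere:

    import os

    RESCTRL = "/sys/fs/resctrl"

    def make_group(name, schemata, tids):
        # Create a resctrl group, restrict its schemata, and move threads in.
        path = os.path.join(RESCTRL, name)
        if not os.path.isdir(path):
            os.mkdir(path)
        with open(os.path.join(path, "schemata"), "w") as f:
            f.write(schemata + "\n")
        with open(os.path.join(path, "tasks"), "w") as f:
            for tid in tids:
                f.write("%d\n" % tid)
                f.flush()              # one tid per write() call

    # vcpu0: stays in the default group, nothing to do.
    # vcpu1: pinned to a pcpu on cache-id 1, so its mask is "trimmed" to
    # cache-id 1, while cache-id 0 keeps the default group's bits.
    make_group("kvm02-vcpu1", "L3:0=fffff;1=00003", [12345])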


Re: [libvirt] [V3] RFC for support cache tune in libvirt

2017-01-12 Thread Martin Kletzander

On Thu, Jan 12, 2017 at 09:20:30AM +, Daniel P. Berrange wrote:

On Thu, Jan 12, 2017 at 11:15:39AM +0800, 乔立勇(Eli Qiao) wrote:

>
>   [cache topology XML example stripped by the list archive]
>

yes, I like this too, it could tell the resource sharing logic by cpus.

Another thought: if the kernel enables CDP, it will split the l3 cache into
code / data types:

  [l3code / l3data bank XML example, stripped by the archive]

So this information should come not only from
/sys/devices/system/cpu/cpu0/cache/index3/size, but also depend on whether
linux resctrl is mounted under /sys/fs/resctrl/



>   [quoted XML fragment stripped by the archive]
>

I think on your system you don't have SMT enabled; on a system with SMT
enabled, we would have:

  [XML example stripped by the archive]
>   [Daniel's full per-socket / per-core cache XML example, stripped by the archive]
>

hmm... l2 and l1 cache are per core, I am not sure if we really need to
tune the l2 and l1 cache at all, that's too low level...

Per my understanding, if we expose this kind of capability, we should
support managing it; I just wonder whether it is too early to expose it,
since the low level (linux kernel) does not support it yet.


We don't need to list l2/l1 cache in the XML right now. The example
above shows that the schema is capable of supporting it in the
future, which is the important thing. So we can start with only
reporting L3, and add l2/l1 later if we find it is needed without
having to change the XML again.



Another idea of mine was to expose those caches that the host supports
allocation on (i.e. a capability a client can use).  But that could feel
messy in the end.  Just a thought.



Regards,
Daniel
--
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



Re: [libvirt] [V3] RFC for support cache tune in libvirt

2017-01-12 Thread Daniel P. Berrange
On Thu, Jan 12, 2017 at 11:15:39AM +0800, 乔立勇(Eli Qiao) wrote:
> >
> >
> > 
> >   
> >   
> >
> 
> yes, I like this too, it could tell the the resource sharing logic by cpus.
> 
> Another thinking is that if kernel enable CDP, it will split l3 cache to
> code / data type
> 
>   
>   
> 
> So these information should not only
> from /sys/devices/system/cpu/cpu0/cache/index3/size , also depend on if
> linux resctrl under /sys/fs/resctrl/
> 
> 
> 
> >   
> >
> 
> I think on your system you don't enable SMT, so if on a system which
> enabled SMT.
> 
> we will have:
>   
> 
> 
>   
> >   [Daniel's full per-socket / per-core cache XML example, stripped by the archive]
> >
> 
> hmm... l2 and l1 cache are per core, I am not sure if we really need to
> tune the l2 and l1 cache at all, that's too low level...
> 
> Per my understanding, if we expose this kinds of capabilities, we should
> support to manage it, just wonder if we are too early to
> expose it since low level (linux kernel) have not support it yet.

We don't need to list l2/l1 cache in the XML right now. The example
above shows that the schema is capable of supporting it in the
future, which is the important thing. So we can start with only
reporting L3, and add l2/l1 later if we find it is needed without
having to change the XML again.


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|


Re: [libvirt] [V3] RFC for support cache tune in libvirt

2017-01-12 Thread Daniel P. Berrange
On Thu, Jan 12, 2017 at 10:58:53AM +0800, 乔立勇(Eli Qiao) wrote:
> 2017-01-11 19:09 GMT+08:00 Daniel P. Berrange :
> 
> > On Wed, Jan 11, 2017 at 11:55:28AM +0100, Martin Kletzander wrote:
> > > On Wed, Jan 11, 2017 at 10:05:26AM +, Daniel P. Berrange wrote:
> > > >
> > > > IIUC, the kernel lets us associate individual PIDs
> > > > with each cache. Since each vCPU is a PID, this means
> > > > we are able to allocate different cache size to
> > > > different CPUs. So we need to be able to represent
> > > > that in the XML. I think we should also represent
> > > > the allocation in a normal size (ie KiB), not in
> > > > count of min unit.
> > > >
> > > > So eg this shows allocating two cache banks and giving
> > > > one to the first 4 cpus, and one to the second 4 cpus
> > > >
> > > >   
> > > >  
> > > >  
> > > >   
> > > >
> > >
> > > I agree with your approach, we just need to keep in mind two more
> > > things.  I/O threads and the mail QEMU (emulator) thread can have
> > > allocations as well.  Also we need to say on which socket the allocation
> > > should be done.
> >
> > Also, I wonder if this is better put in the existing 
> > element, since this is really an aspect of the CPU configuration.
> >
> > Perhaps split configuration of cache banks from the mapping to
> > cpus/iothreads/emulator. Also, per Marcello's mail, we need to
> > include the host cache ID, so we know where to allocate from
> > if there's multiple caches of the same type. So XML could look
> > more like this:
> >
> >
> >
> >
> 
> 
> I don't think we require host_id here. we can only allow setting cache
> allocation only IF the VM has vcpu -> pcpu affinity setting. and let
> libvirt calculate where to set the cache (on which
> cache_id/resource_id/socket_id, the 3 ids are some meaning) since l3 caches
> are cpu's resource, only the VM running on specify cpu can benefit the
> cache.

Let's say the guest is pinned to CPU 3, and there are two separate L3
caches associated with CPU 3. If we don't include host_id, then libvirt
has to decide which of the two possible caches to allocate from. We can
do that, but generally we've tried to avoid such policy decisions in
libvirt before, hence I thought it preferable to have the admin be
explicit about which cache they want.


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|


Re: [libvirt] [V3] RFC for support cache tune in libvirt

2017-01-11 Thread Martin Kletzander

On Thu, Jan 12, 2017 at 10:58:53AM +0800, 乔立勇(Eli Qiao) wrote:

2017-01-11 19:09 GMT+08:00 Daniel P. Berrange :


On Wed, Jan 11, 2017 at 11:55:28AM +0100, Martin Kletzander wrote:
> On Wed, Jan 11, 2017 at 10:05:26AM +, Daniel P. Berrange wrote:
> >
> > IIUC, the kernel lets us associate individual PIDs
> > with each cache. Since each vCPU is a PID, this means
> > we are able to allocate different cache size to
> > different CPUs. So we need to be able to represent
> > that in the XML. I think we should also represent
> > the allocation in a normal size (ie KiB), not in
> > count of min unit.
> >
> > So eg this shows allocating two cache banks and giving
> > one to the first 4 cpus, and one to the second 4 cpus
> >
> >   
> >  
> >  
> >   
> >
>
> I agree with your approach, we just need to keep in mind two more
> things.  I/O threads and the mail QEMU (emulator) thread can have
> allocations as well.  Also we need to say on which socket the allocation
> should be done.

Also, I wonder if this is better put in the existing 
element, since this is really an aspect of the CPU configuration.

Perhaps split configuration of cache banks from the mapping to
cpus/iothreads/emulator. Also, per Marcello's mail, we need to
include the host cache ID, so we know where to allocate from
if there's multiple caches of the same type. So XML could look
more like this:

   
   
   



I don't think we require host_id here. we can only allow setting cache
allocation only IF the VM has vcpu -> pcpu affinity setting. and let
libvirt calculate where to set the cache (on which
cache_id/resource_id/socket_id, the 3 ids are some meaning) since l3 caches
are cpu's resource, only the VM running on specify cpu can benefit the
cache.

if we explicit allocate cache no care about what's the VM's pcpu affinity,
helpless.



One thing we need to decide upfront is whether we are going to be fixing
user misconfiguration and to which extent because I feel like there's
too much discussion about that.  So either:

 a) We make sure that each thread that utilizes CAT is pinned to host
threads without split cache, i.e. it cannot be scheduled outside of
those.  I'm not using socket/core/thread and L3 because we need to
be prepared here just in case any other cache hierarchy is used.

 b) We let user specify whatever they want.

Option (a) requires more code, more work, and must be checked on all
changes (vcpupin API, XML change, CPU hotplug, etc.), but option (b)
goes more with the rest of libvirt's config where we just let the users
shoot themselves in their feet by misconfiguration, i.e. if someone
wants to allocate cache on socket 0 and schedule all CPUs on socket 1,
then it's their fault.  Option (a) can save us some specification from
the XML, because we can compute some of the values.  However, that might
not be very reliable and we might end up requiring all the values
specified at the end anyway.  So from my point of view, I'd rather go
with (b) just so we don't swamp ourselves with the details, also we can
add the checks later.  And most importantly, as mentioned before, it
goes with the rest of the code.








  

   
   
   
   


Regards,
Daniel





--
Best regards
- Eli

天涯无处不重逢
a leaf duckweed belongs to the sea , where not to meet in life



Re: [libvirt] [V3] RFC for support cache tune in libvirt

2017-01-11 Thread Martin Kletzander

On Thu, Jan 12, 2017 at 11:19:07AM +0800, 乔立勇(Eli Qiao) wrote:


more like this:

   
   
   



If so, do we need to extend "virsh cputune" or add a new API like cachetune?



Yeah, sure, that's a detail to be done after the design is done.




   
   
   
   
   


Regards,
Daniel





--
Best regards
- Eli

天涯无处不重逢
a leaf duckweed belongs to the sea , where not to meet in life



Re: [libvirt] [V3] RFC for support cache tune in libvirt

2017-01-11 Thread Eli Qiao
>
> more like this:
>
>
>
>
>
>
If so, we need to extend "virsh cputune" or and new API like cachetune?


>
>
>
>
>
>
>
> Regards,
> Daniel
>



-- 
Best regards
- Eli

天涯无处不重逢
a leaf duckweed belongs to the sea , where not to meet in life

Re: [libvirt] [V3] RFC for support cache tune in libvirt

2017-01-11 Thread Eli Qiao
>
>
> > which shows each socket has its own dedicated L3 cache, and each
> > core has its own L2 & L1 cache.
>
> We need to also include the host cache ID value in the XML to
> let us reliably distinguish / associate with differet cache
> banks when placing guests, if there's multiple caches of the
> same type associated with the same CPU.
>
>   [cache XML example with host cache ids and cpus="3,4,5,9,10,11", stripped by the archive]
>
> > > 3. Add new virsh command 'nodecachestats':
> > > This API is to expose vary cache resouce left on each hardware (cpu
> socket).
> > >
> > > It will be formated as:
> > >
> > > .: left size KiB
> > >
> > > for example I have a 2 socket cpus host, and I'v enabled cat_l3
> feature only
> > >
> > > root@s2600wt:~/linux# virsh nodecachestats
> > > L3.0 : 56320 KiB
> > > L3.1 : 56320 KiB
> > >
> > >   P.S. resource_type can be L3, L3DATA, L3CODE, L2 for now.
> >
> > This feels like something we should have in the capabilities XML too
> > rather than a new command
> >
> > 
> >   
> >   
> >   
> >   
> > 
>
> Opps, ignore this. I remember the reason we always report available
> resource separately from physically present resource, is that we
> don't want to re-generate capabilities XML every time available
> resource changes.
>
> So, yes, we do need some API like  virNodeFreeCache()  / virs nodefreecache
>

yes, we need this.


> We probably want to use an 2d array of typed parameters. The first level of
> the array would represent the cache bank, the second level woudl represent
> the parameters for that bank. eg if we had 3 cache banks, we'd report a
> 3x3 typed parameter array, with parameters for the cache ID, its type and
> the available / free size
>
>id=0
>type=l3
>avail=56320
>
>id=1
>type=l3
>avail=56320
>
>id=2
>type=l3
>avail=56320
>
> Regards,
> Daniel
>



-- 
Best regards
- Eli

天涯无处不重逢
a leaf duckweed belongs to the sea , where not to meet in life

Re: [libvirt] [V3] RFC for support cache tune in libvirt

2017-01-11 Thread Eli Qiao
>
>
>
>   [cache topology XML example stripped by the list archive]
>

yes, I like this too, it could tell the resource sharing logic by cpus.

Another thought: if the kernel enables CDP, it will split the l3 cache into
code / data types:

  [l3code / l3data bank XML example, stripped by the archive]

So this information should come not only from
/sys/devices/system/cpu/cpu0/cache/index3/size, but also depend on whether
linux resctrl is mounted under /sys/fs/resctrl/



>   [quoted XML fragment stripped by the archive]
>

I think on your system you don't have SMT enabled; on a system with SMT
enabled, we would have:

  [XML example stripped by the archive]
>   [Daniel's full per-socket / per-core cache XML example, stripped by the archive]
>

hmm... l2 and l1 cache are per core, I am not sure if we really need to
tune the l2 and l1 cache at all, that's too low level...

Per my understanding, if we expose this kind of capability, we should
support managing it; I just wonder whether it is too early to expose it,
since the low level (linux kernel) does not support it yet.



> which shows each socket has its own dedicated L3 cache, and each
> core has its own L2 & L1 cache.
>
> > 2. Extend capabilities outputs.
> >
> > virsh capabilities | grep resctrl
> > 
> > ...
> >cache_unit='2816'/>
> > 
> >
> > This will tell that the host have enabled resctrl(which you can find
> it in /sys/fs/resctrl),
> > And it supports to allocate 'L3' type cache, total 'L3' cache size is
> 56320 KiB, and the minimum unit size of 'L3' cache is 2816 KiB.
> >   P.S. L3 cache size unit is the minum l3 cache unit can be allocated.
> It's hardware related and can not be changed.
>
> If we're already reported cache in the capabilities from step
> one, then it ought to be extendable to cover this reporting.
>
> 
>   
>   
>   
>   
>   
>   
> 
>
>
Looks good to me.


> note how we report the control info for both l3 caches, since they
> come from separate sockets and thus could conceivably report different
> info if different CPUs were in each socket.
>
> > 3. Add new virsh command 'nodecachestats':
> > This API is to expose vary cache resouce left on each hardware (cpu
> socket).
> >
> > It will be formated as:
> >
> > .: left size KiB
> >
> > for example I have a 2 socket cpus host, and I'v enabled cat_l3 feature
> only
> >
> > root@s2600wt:~/linux# virsh nodecachestats
> > L3.0 : 56320 KiB
> > L3.1 : 56320 KiB
> >
> >   P.S. resource_type can be L3, L3DATA, L3CODE, L2 for now.
>
> This feels like something we should have in the capabilities XML too
> rather than a new command
>
> 
>   
>   
>   
>   
> 
>
> > 4. Add new interface to manage how many cache can be allociated for a
> domain
> >
> > root@s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2
> >
> > root@s2600wt:~/linux# virsh cachetune kvm02
> > l3.count   : 2
> >
> > This will allocate 2 units(2816 * 2) l3 cache for domain kvm02
> >
> > ## Domain XML changes
> >
> > Cache Tuneing
> >
> > 
> >   ...
> >   
> > 2
> >   
> >   ...
> > 
>
> IIUC, the kernel lets us associate individual PIDs
> with each cache. Since each vCPU is a PID, this means
> we are able to allocate different cache size to
> different CPUs. So we need to be able to represent
> that in the XML. I think we should also represent
> the allocation in a normal size (ie KiB), not in
> count of min unit.
>
>
ok


> So eg this shows allocating two cache banks and giving
> one to the first 4 cpus, and one to the second 4 cpus
>
>
>   
>   
>

Oh, that depends on the CPU topology, so I don't like adding cpus =
"0, 1, 2, 3" here; we cannot guarantee the VM will be running on CPUs 0 1 2 3,
so it may not benefit from the cache bank.


>
>
>
> Regards,
> Daniel
>



-- 
Best regards
- Eli

天涯无处不重逢
a leaf duckweed belongs to the sea , where not to meet in life

Re: [libvirt] [V3] RFC for support cache tune in libvirt

2017-01-11 Thread Eli Qiao
2017-01-11 19:09 GMT+08:00 Daniel P. Berrange :

> On Wed, Jan 11, 2017 at 11:55:28AM +0100, Martin Kletzander wrote:
> > On Wed, Jan 11, 2017 at 10:05:26AM +, Daniel P. Berrange wrote:
> > >
> > > IIUC, the kernel lets us associate individual PIDs
> > > with each cache. Since each vCPU is a PID, this means
> > > we are able to allocate different cache size to
> > > different CPUs. So we need to be able to represent
> > > that in the XML. I think we should also represent
> > > the allocation in a normal size (ie KiB), not in
> > > count of min unit.
> > >
> > > So eg this shows allocating two cache banks and giving
> > > one to the first 4 cpus, and one to the second 4 cpus
> > >
> > >   
> > >  
> > >  
> > >   
> > >
> >
> > I agree with your approach, we just need to keep in mind two more
> > things.  I/O threads and the mail QEMU (emulator) thread can have
> > allocations as well.  Also we need to say on which socket the allocation
> > should be done.
>
> Also, I wonder if this is better put in the existing 
> element, since this is really an aspect of the CPU configuration.
>
> Perhaps split configuration of cache banks from the mapping to
> cpus/iothreads/emulator. Also, per Marcello's mail, we need to
> include the host cache ID, so we know where to allocate from
> if there's multiple caches of the same type. So XML could look
> more like this:
>
>
>
>


I don't think we require host_id here. We can allow setting cache
allocation only IF the VM has a vcpu -> pcpu affinity setting, and let
libvirt calculate where to set the cache (on which
cache_id/resource_id/socket_id; the 3 ids mean the same thing), since l3
caches are a cpu resource and only a VM running on specific cpus can
benefit from the cache.

If we explicitly allocate cache without caring about the VM's pcpu
affinity, it is of no help.



>
>
   
>
>
>
>
>
>
> Regards,
> Daniel
>



-- 
Best regards
- Eli

天涯无处不重逢
a leaf duckweed belongs to the sea , where not to meet in life

Re: [libvirt] "[V3] RFC for support cache tune in libvirt"

2017-01-11 Thread Eli Qiao
Hi, it's really good to have you involved in supporting CAT in
libvirt/OpenStack.
Replies inline.

2017-01-11 20:19 GMT+08:00 Marcelo Tosatti :

>
> Hi,
>
> Comments/questions related to:
> https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html
>
> 1) root s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2
>
> How does allocation of code/data look like?
>

My plan is to expose new options:

virsh cachetune kvm02 --l3data.count 2 --l3code.count 2

Please note, you can use only l3, or l3data/l3code (if CDP is enabled when
mounting the resctrl fs).


>
> 2) 'nodecachestats' command:
>
> 3. Add new virsh command 'nodecachestats':
> This API is to expose vary cache resouce left on each hardware (cpu
> socket).
> It will be formated as:
> .: left size KiB
>
> Does this take into account that only contiguous regions of cbm masks
> can be used for allocations?
>
>
Yes, it is the contiguous-region cbm; in other words, it is the cache value
represented by the default cbm.

resctrl doesn't allow setting a non-contiguous cbm (which is a hardware
restriction)



> Also, it should return the amount of free cache on each cacheid.
>

yes, it is.  resource_id == cacheid



>
> 3) The interface should support different sizes for different
> cache-ids. See the KVM-RT use case at
> https://www.redhat.com/archives/libvir-list/2017-January/msg00415.html
> "WHAT THE USER NEEDS TO SPECIFY FOR VIRTUALIZATION (KVM-RT)".
>

I don't think it's good to let user specify cache-ids while doing cache
allocation.

the cache ids used should rely on what cpu affinity the vm are setting.

eg.

1. for hosts that have only one cache id (one-socket hosts), we don't need
to set a cache id
2. if there are multiple cache ids (sockets), the user should set the vcpu ->
pcpu mapping (define a cpuset for the VM), and then we (libvirt) need to
compute how much cache to set on which cache id.
Which is to say, the user should set the cpu affinity before cache allocation.

I know that most cases of using CAT are for NFV. As far as I know, NFV
uses NUMA and cpu pinning (vcpu -> pcpu mapping), so we don't need to
worry about on which cache id we set the cache size.

So, just let the user specify the cache size (here my proposal is a cache
unit count) and let libvirt detect how much cache to set on which cache id.
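
For illustration only, the kind of computation this implies: given the VM's
vcpu -> pcpu pinning and the host's pcpu -> cache-id mapping, work out which
cache ids actually need an allocation (all names and numbers below are made
up):

    def cache_ids_needing_alloc(vcpu_to_pcpu, pcpu_to_cache_id):
        # Only the cache ids reachable by the pinned vcpus need a reservation.
        return {pcpu_to_cache_id[p] for p in vcpu_to_pcpu.values()}

    # Hypothetical 4-vcpu guest pinned entirely to socket 1 of a 2-socket host
    vcpu_to_pcpu = {0: 14, 1: 15, 2: 16, 3: 17}
    pcpu_to_cache_id = {p: (0 if p < 14 else 1) for p in range(28)}
    print(cache_ids_needing_alloc(vcpu_to_pcpu, pcpu_to_cache_id))   # {1}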



>
> 4) Usefulness of exposing minimum unit size.
>
> Rather than specify unit sizes (which forces the user
> to convert every time the command is executed), why not specify
> in kbytes and round up?
>

I accept this. I proposed to expose the minimum unit size because I'd like
to let users specify the unit count (which, as you say, is not good).

As you know, the minimum unit size is decided by the hardware, e.g.
on a host we have 56320 KiB of cache and a max CBM length of 20 bits
(0xfffff), so the minimum cache allocation is 56320/20 = 2816 KiB.

If we allow users to specify a cache size instead of a cache unit count, a
user may set the cache to 2817 KiB, and we would have to round it up to
2816 * 2; 2815 KiB would be wasted.

Anyway, I am open to using a KiB size and letting libvirt calculate the cbm
bits; I am wondering whether we need to report that the actual_cache_size
may be up to 5632 KiB even when they want 2816 KiB of cache.
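
For what it's worth, both the total size and the per-bit granularity can be
derived from files the kernel already exposes; a small sketch (paths as in
sysfs and the resctrl info directory, assuming CDP is off so the resource is
named L3):

    def read(path):
        with open(path) as f:
            return f.read().strip()

    # Total L3 size, e.g. "56320K" (path mentioned elsewhere in this thread)
    size_kb = int(read("/sys/devices/system/cpu/cpu0/cache/index3/size").rstrip("K"))

    # Capacity bitmask length from the resctrl info directory
    # (with CDP enabled the directories are L3CODE / L3DATA instead)
    cbm_mask = read("/sys/fs/resctrl/info/L3/cbm_mask")      # e.g. "fffff"
    cbm_len = bin(int(cbm_mask, 16)).count("1")

    unit_kb = size_kb // cbm_len
    print("L3: %d KiB total, %d-bit CBM, %d KiB per bit" % (size_kb, cbm_len, unit_kb))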



>
>   [resctrl capability XML line stripped by the archive; fragment: cache_unit='2816']
>
> As noted in item 1 of
> https://www.redhat.com/archives/libvir-list/2017-January/msg00494.html,
> "1) Convertion of kbytes (user specification) --> number of CBM bits
> for host.",
> the format where the size is stored is kbytes, so its awkward
> to force users and OpenStack to perform the convertion themselves
> (and zero benefits... nothing changes if you know the unit size).



Hmm... as I see it, libvirt is just a user-space API; I am not sure whether
in libvirt we should bypass some low-level details.


>


> Thanks!
>
>
>
>
>
> --
> libvir-list mailing list
> libvir-list@redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list
>



-- 
Regards Eli
天涯无处不重逢
a leaf duckweed belongs to the sea , where not to meet in life

Re: [libvirt] "[V3] RFC for support cache tune in libvirt"

2017-01-11 Thread Marcelo Tosatti
On Wed, Jan 11, 2017 at 10:34:00AM -0200, Marcelo Tosatti wrote:
> On Wed, Jan 11, 2017 at 10:19:10AM -0200, Marcelo Tosatti wrote:
> > 
> > Hi,
> > 
> > Comments/questions related to:
> > https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html
> > 
> > 1) root s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2
> > 
> > How does allocation of code/data look like?
> > 
> > 2) 'nodecachestats' command:
> > 
> > 3. Add new virsh command 'nodecachestats':
> > This API is to expose vary cache resouce left on each hardware (cpu
> > socket).
> > It will be formated as:
> > .: left size KiB
> > 
> > Does this take into account that only contiguous regions of cbm masks
> > can be used for allocations?
> > 
> > Also, it should return the amount of free cache on each cacheid.
> > 
> > 3) The interface should support different sizes for different
> > cache-ids. See the KVM-RT use case at 
> > https://www.redhat.com/archives/libvir-list/2017-January/msg00415.html
> > "WHAT THE USER NEEDS TO SPECIFY FOR VIRTUALIZATION (KVM-RT)".
> 
> And when the user specification lacks cacheid of a given socket in
> the system, the code should use the default resctrlfs masks
> (that is for the default group).
> 
> > 4) Usefulness of exposing minimum unit size.
> > 
> > Rather than specify unit sizes (which forces the user 
> > to convert every time the command is executed), why not specify 
> > in kbytes and round up?
> > 
> >> cache_unit='2816'/>
> > 
> > As noted in item 1 of
> > https://www.redhat.com/archives/libvir-list/2017-January/msg00494.html,
> > "1) Convertion of kbytes (user specification) --> number of CBM bits
> > for host.", 
> > the format where the size is stored is kbytes, so its awkward 
> > to force users and OpenStack to perform the convertion themselves
> > (and zero benefits... nothing changes if you know the unit size).
> 
> 5) Please perform necessary filesystem locking as described
> at Documentation/x86/intel_rdt_ui.txt in the kernel source.

6) libvirt API should expose the cacheid <-> pcpu mapping
(when implementing cacheid support).
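
A sketch of how that mapping can be gathered from sysfs (using the "id" and
"shared_cpu_list" attributes of each CPU's L3 index directory; whether the
"id" attribute is present depends on the kernel version):

    import glob

    def l3_id_to_cpus():
        # Map L3 cache id -> the cpu list sharing that cache.
        mapping = {}
        for d in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cache/index3"):
            with open(d + "/id") as f:
                cache_id = int(f.read())
            with open(d + "/shared_cpu_list") as f:
                mapping[cache_id] = f.read().strip()
        return mapping

    # e.g. {0: '0-13,28-41', 1: '14-27,42-55'} on a two-socket host
    print(l3_id_to_cpus())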



Re: [libvirt] "[V3] RFC for support cache tune in libvirt"

2017-01-11 Thread Marcelo Tosatti
On Wed, Jan 11, 2017 at 10:19:10AM -0200, Marcelo Tosatti wrote:
> 
> Hi,
> 
> Comments/questions related to:
> https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html
> 
> 1) root s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2
> 
> How does allocation of code/data look like?
> 
> 2) 'nodecachestats' command:
> 
>   3. Add new virsh command 'nodecachestats':
>   This API is to expose vary cache resouce left on each hardware (cpu
>   socket).
>   It will be formated as:
>   .: left size KiB
> 
> Does this take into account that only contiguous regions of cbm masks
> can be used for allocations?
> 
> Also, it should return the amount of free cache on each cacheid.
> 
> 3) The interface should support different sizes for different
> cache-ids. See the KVM-RT use case at 
> https://www.redhat.com/archives/libvir-list/2017-January/msg00415.html
> "WHAT THE USER NEEDS TO SPECIFY FOR VIRTUALIZATION (KVM-RT)".

And when the user specification lacks cacheid of a given socket in
the system, the code should use the default resctrlfs masks
(that is for the default group).

> 4) Usefulness of exposing minimum unit size.
> 
> Rather than specify unit sizes (which forces the user 
> to convert every time the command is executed), why not specify 
> in kbytes and round up?
> 
>cache_unit='2816'/>
> 
> As noted in item 1 of
> https://www.redhat.com/archives/libvir-list/2017-January/msg00494.html,
> "1) Convertion of kbytes (user specification) --> number of CBM bits
> for host.", 
> the format where the size is stored is kbytes, so its awkward 
> to force users and OpenStack to perform the convertion themselves
> (and zero benefits... nothing changes if you know the unit size).

5) Please perform necessary filesystem locking as described
at Documentation/x86/intel_rdt_ui.txt in the kernel source.
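
For reference, a minimal sketch of the locking convention described there
(an advisory flock on the resctrl mount point: exclusive for writers, shared
for readers); the group path written below is only an example:

    import fcntl, os

    RESCTRL = "/sys/fs/resctrl"

    def locked_write(relpath, data):
        # Take the documented exclusive lock on the resctrl root, then write.
        fd = os.open(RESCTRL, os.O_RDONLY)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX)
            with open(os.path.join(RESCTRL, relpath), "w") as f:
                f.write(data)
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
            os.close(fd)

    locked_write("kvm02/schemata", "L3:0=0x000ff;1=0x000ff\n")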




Re: [libvirt] "[V3] RFC for support cache tune in libvirt"

2017-01-11 Thread Marcelo Tosatti

Hi,

Comments/questions related to:
https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html

1) root s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2

How does allocation of code/data look like?

2) 'nodecachestats' command:

3. Add new virsh command 'nodecachestats':
This API is to expose the cache resource left on each hardware (cpu
socket).
It will be formatted as:
<resource_type>.<resource_id>: left size KiB

Does this take into account that only contiguous regions of cbm masks
can be used for allocations?

Also, it should return the amount of free cache on each cacheid.

3) The interface should support different sizes for different
cache-ids. See the KVM-RT use case at 
https://www.redhat.com/archives/libvir-list/2017-January/msg00415.html
"WHAT THE USER NEEDS TO SPECIFY FOR VIRTUALIZATION (KVM-RT)".

4) Usefulness of exposing minimum unit size.

Rather than specify unit sizes (which forces the user 
to convert every time the command is executed), why not specify 
in kbytes and round up?

  

As noted in item 1 of
https://www.redhat.com/archives/libvir-list/2017-January/msg00494.html,
"1) Convertion of kbytes (user specification) --> number of CBM bits
for host.", 
the format where the size is stored is kbytes, so its awkward 
to force users and OpenStack to perform the convertion themselves
(and zero benefits... nothing changes if you know the unit size).


Thanks!


 




Re: [libvirt] [V3] RFC for support cache tune in libvirt

2017-01-11 Thread Daniel P. Berrange
On Wed, Jan 11, 2017 at 10:05:26AM +, Daniel P. Berrange wrote:
> On Tue, Jan 10, 2017 at 07:42:59AM +, Qiao, Liyong wrote:
> > Add support for cache allocation.
> > 
> > Thanks Martin for the previous version comments, this is the v3 version for 
> > RFC , I’v have some PoC code [2]. The follow changes are partly finished by 
> > the PoC.
> > 
> > #Propose Changes
> > 
> > ## virsh command line
> > 
> > 1. Extend output of nodeinfo, to expose L3 cache size for Level 3 (last 
> > level cache size).
> > 
> > This will expose how many cache on a host which can be used.
> > 
> > root@s2600wt:~/linux# virsh nodeinfo | grep L3
> > L3 cache size:   56320 KiB
> 
> Ok, as previously discussed, we should include this in the capabilities
> XML instead and have info about all the caches. We likely also want to
> relate which CPUs are associated with which cache in some way.
> 
> eg if we have this topology
> 
>   [host CPU topology XML example stripped by the archive]
>
> We might have something like this cache info
> 
>
>   [per-socket / per-core cache XML example stripped by the archive]
>
> 
> which shows each socket has its own dedicated L3 cache, and each
> core has its own L2 & L1 cache.

We need to also include the host cache ID value in the XML to
let us reliably distinguish / associate with different cache
banks when placing guests, if there's multiple caches of the
same type associated with the same CPU.

  [cache XML example with host cache id attributes, stripped by the archive]



> > 3. Add new virsh command 'nodecachestats':
> > This API is to expose vary cache resouce left on each hardware (cpu socket).
> > 
> > It will be formated as:
> > 
> > .: left size KiB
> > 
> > for example I have a 2 socket cpus host, and I'v enabled cat_l3 feature only
> > 
> > root@s2600wt:~/linux# virsh nodecachestats
> > L3.0 : 56320 KiB
> > L3.1 : 56320 KiB
> > 
> >   P.S. resource_type can be L3, L3DATA, L3CODE, L2 for now.
> 
> This feels like something we should have in the capabilities XML too
> rather than a new command
> 
> 
>   
>   
>   
>   
> 

Oops, ignore this. I remember the reason we always report available
resource separately from physically present resource, is that we
don't want to re-generate capabilities XML every time available
resource changes.

So, yes, we do need some API like  virNodeFreeCache()  / virsh nodefreecache
We probably want to use a 2d array of typed parameters. The first level of
the array would represent the cache bank, the second level would represent
the parameters for that bank. eg if we had 3 cache banks, we'd report a
3x3 typed parameter array, with parameters for the cache ID, its type and
the available / free size

   id=0
   type=l3
   avail=56320

   id=1
   type=l3
   avail=56320

   id=2
   type=l3
   avail=56320

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|


Re: [libvirt] [V3] RFC for support cache tune in libvirt

2017-01-11 Thread Daniel P. Berrange
On Wed, Jan 11, 2017 at 11:55:28AM +0100, Martin Kletzander wrote:
> On Wed, Jan 11, 2017 at 10:05:26AM +, Daniel P. Berrange wrote:
> > 
> > IIUC, the kernel lets us associate individual PIDs
> > with each cache. Since each vCPU is a PID, this means
> > we are able to allocate different cache size to
> > different CPUs. So we need to be able to represent
> > that in the XML. I think we should also represent
> > the allocation in a normal size (ie KiB), not in
> > count of min unit.
> > 
> > So eg this shows allocating two cache banks and giving
> > one to the first 4 cpus, and one to the second 4 cpus
> > 
> >   
> >  
> >  
> >   
> > 
> 
> I agree with your approach, we just need to keep in mind two more
> things.  I/O threads and the mail QEMU (emulator) thread can have
> allocations as well.  Also we need to say on which socket the allocation
> should be done.

Also, I wonder if this is better put in the existing <cputune>
element, since this is really an aspect of the CPU configuration.

Perhaps split configuration of cache banks from the mapping to
cpus/iothreads/emulator. Also, per Marcelo's mail, we need to
include the host cache ID, so we know where to allocate from
if there's multiple caches of the same type. So XML could look
more like this:

   [proposed cachetune / vcpu mapping XML example stripped by the archive]

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



Re: [libvirt] [V3] RFC for support cache tune in libvirt

2017-01-11 Thread Martin Kletzander

On Wed, Jan 11, 2017 at 10:05:26AM +, Daniel P. Berrange wrote:

On Tue, Jan 10, 2017 at 07:42:59AM +, Qiao, Liyong wrote:

Add support for cache allocation.

Thanks Martin for the previous version comments; this is the v3 version for
RFC. I have some PoC code [2]. The following changes are partly finished by
the PoC.

#Propose Changes

## virsh command line

1. Extend output of nodeinfo, to expose L3 cache size for Level 3 (last level 
cache size).

This will expose how much cache on a host can be used.

root@s2600wt:~/linux# virsh nodeinfo | grep L3
L3 cache size:   56320 KiB


Ok, as previously discussed, we should include this in the capabilities
XML instead and have info about all the caches. We likely also want to
relate which CPUs are associated with which cache in some way.

eg if we have this topology

   [host CPU topology XML example stripped by the archive]

We might have something like this cache info

   [per-socket / per-core cache XML example stripped by the archive]

which shows each socket has its own dedicated L3 cache, and each
core has its own L2 & L1 cache.


2. Extend capabilities outputs.

virsh capabilities | grep resctrl

...
  [resctrl capability XML line stripped by the archive]

This will tell that the host has enabled resctrl (which you can find it in
/sys/fs/resctrl), and that it supports allocating 'L3' type cache; the total
'L3' cache size is 56320 KiB, and the minimum unit size of 'L3' cache is
2816 KiB.
  P.S. The L3 cache size unit is the minimum l3 cache unit that can be
allocated. It's hardware related and cannot be changed.


If we've already reported cache in the capabilities from step
one, then it ought to be extendable to cover this reporting.

   [capabilities XML example with per-cache control info, stripped by the archive]

note how we report the control info for both l3 caches, since they
come from separate sockets and thus could conceivably report different
info if different CPUs were in each socket.


3. Add new virsh command 'nodecachestats':
This API is to expose the cache resource left on each hardware (cpu socket).

It will be formatted as:

<resource_type>.<resource_id>: left size KiB

for example, I have a 2-socket host, and I've enabled the cat_l3 feature only

root@s2600wt:~/linux# virsh nodecachestats
L3.0 : 56320 KiB
L3.1 : 56320 KiB

  P.S. resource_type can be L3, L3DATA, L3CODE, L2 for now.


This feels like something we should have in the capabilities XML too
rather than a new command

   [capabilities XML example stripped by the archive]


4. Add new interface to manage how many cache can be allociated for a domain

root@s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2

root@s2600wt:~/linux# virsh cachetune kvm02
l3.count   : 2

This will allocate 2 units(2816 * 2) l3 cache for domain kvm02

## Domain XML changes

Cache Tuning

  ...
  [cache tuning XML element stripped by the archive; it carried the value 2]
  ...


IIUC, the kernel lets us associate individual PIDs
with each cache. Since each vCPU is a PID, this means
we are able to allocate different cache size to
different CPUs. So we need to be able to represent
that in the XML. I think we should also represent
the allocation in a normal size (ie KiB), not in
count of min unit.

So eg this shows allocating two cache banks and giving
one to the first 4 cpus, and one to the second 4 cpus

   [two-bank cachetune XML example stripped by the archive]



I agree with your approach, we just need to keep in mind two more
things.  I/O threads and the main QEMU (emulator) thread can have
allocations as well.  Also we need to say on which socket the allocation
should be done.



Regards,
Daniel
--
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|



Re: [libvirt] [V3] RFC for support cache tune in libvirt

2017-01-11 Thread Daniel P. Berrange
On Tue, Jan 10, 2017 at 07:42:59AM +, Qiao, Liyong wrote:
> Add support for cache allocation.
> 
> Thanks Martin for the previous version comments; this is the v3 version for
> RFC. I have some PoC code [2]. The following changes are partly finished by
> the PoC.
> 
> #Propose Changes
> 
> ## virsh command line
> 
> 1. Extend output of nodeinfo, to expose L3 cache size for Level 3 (last level 
> cache size).
> 
> This will expose how much cache on a host can be used.
> 
> root@s2600wt:~/linux# virsh nodeinfo | grep L3
> L3 cache size:   56320 KiB

Ok, as previously discussed, we should include this in the capabilities
XML instead and have info about all the caches. We likely also want to
relate which CPUs are associated with which cache in some way.

eg if we have this topology

  [host CPU topology XML example stripped by the archive]

We might have something like this cache info

  [per-socket / per-core cache XML example stripped by the archive]

which shows each socket has its own dedicated L3 cache, and each
core has its own L2 & L1 cache.

> 2. Extend capabilities outputs.
> 
> virsh capabilities | grep resctrl
> 
> ...
>   [resctrl capability XML line stripped by the archive]
> 
> This will tell that the host has enabled resctrl (which you can find it
> in /sys/fs/resctrl), and that it supports allocating 'L3' type cache; the
> total 'L3' cache size is 56320 KiB, and the minimum unit size of 'L3'
> cache is 2816 KiB.
>   P.S. The L3 cache size unit is the minimum l3 cache unit that can be
> allocated. It's hardware related and cannot be changed.

If we've already reported cache in the capabilities from step
one, then it ought to be extendable to cover this reporting.

  [capabilities XML example with per-cache control info, stripped by the archive]

note how we report the control info for both l3 caches, since they
come from separate sockets and thus could conceivably report different
info if different CPUs were in each socket.

> 3. Add new virsh command 'nodecachestats':
> This API is to expose the cache resource left on each hardware (cpu socket).
> 
> It will be formatted as:
> 
> <resource_type>.<resource_id>: left size KiB
> 
> for example, I have a 2-socket host, and I've enabled the cat_l3 feature only
> 
> root@s2600wt:~/linux# virsh nodecachestats
> L3.0 : 56320 KiB
> L3.1 : 56320 KiB
> 
>   P.S. resource_type can be L3, L3DATA, L3CODE, L2 for now.

This feels like something we should have in the capabilities XML too
rather than a new command

  [capabilities XML example stripped by the archive]

> 4. Add new interface to manage how much cache can be allocated for a domain
> 
> root@s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2
> 
> root@s2600wt:~/linux# virsh cachetune kvm02
> l3.count   : 2
> 
> This will allocate 2 units(2816 * 2) l3 cache for domain kvm02
> 
> ## Domain XML changes
> 
> Cache Tuning
> 
>   ...
>   [cache tuning XML element stripped by the archive; it carried the value 2]
>   ...

IIUC, the kernel lets us associate individual PIDs
with each cache. Since each vCPU is a PID, this means
we are able to allocate different cache size to
different CPUs. So we need to be able to represent
that in the XML. I think we should also represent
the allocation in a normal size (ie KiB), not in
count of min unit.

So eg this shows allocating two cache banks and giving
one to the first 4 cpus, and one to the second 4 cpus

   [two-bank cachetune XML example stripped by the archive]


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|
