Re: [libvirt] "[V3] RFC for support cache tune in libvirt"
On Thu, Jan 12, 2017 at 11:06:06AM +, Daniel P. Berrange wrote:
> On Thu, Jan 12, 2017 at 08:48:01AM -0200, Marcelo Tosatti wrote:
> > On Thu, Jan 12, 2017 at 09:44:36AM +0800, 乔立勇(Eli Qiao) wrote:
> > > hi, It's really good to have you get involved to support CAT in
> > > libvirt/OpenStack. Replied inline.
> > >
> > > 2017-01-11 20:19 GMT+08:00 Marcelo Tosatti:
> > > >
> > > > Hi,
> > > >
> > > > Comments/questions related to:
> > > > https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html
> > > >
> > > > 1) root s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2
> > > >
> > > > How does allocation of code/data look like?
> > >
> > > My plan is to expose new options:
> > >
> > >   virsh cachetune kvm02 --l3data.count 2 --l3code.count 2
> > >
> > > Please note, you can use only l3 or l3data/l3code (if CDP is
> > > enabled while mounting the resctrl fs).
> >
> > Fine. However, you should be able to emulate a type=both reservation
> > (non-CDP) by writing a schemata file with the same CBM bits:
> >
> >   L3code:0=0x000ff;1=0x000ff
> >   L3data:0=0x000ff;1=0x000ff
> >
> > (*)
> >
> > I don't see how this interface enables that possibility.
> >
> > I suppose it would be easier for mgmt software to have it
> > done automatically:
> >
> >   virsh cachetune kvm02 --l3 size_in_kbytes
> >
> > would create the reservations as in (*) in resctrlfs, in
> > case the host is CDP enabled.
>
> You'll be able to query libvirt to determine whether you have
> l3, or l3data + l3code. So the mgmt app can decide to emulate
> "type=both" if it sees l3data + l3code as separate items.

No it can't, because the interface does not allow you to specify whether
l3data and l3code should intersect each other (and by how much). Unless
you add that, which IMO is overkill. Or add a parameter (option 1):

  virsh cachetune kvm02 --l3data size_in_kbytes --l3code size_in_kbytes --share-l3

meaning the reservations share space.
OR (option 2):

  virsh cachetune kvm02 --l3 size_in_kbytes

(with internal translation to l3data and l3code reservations in the same
space). I don't see any point in having option 1; option 2 is simpler
and removes a tunable from the interface. Do you have a reason behind
the statement that the mgmt app should decide this?

> > > > 4) Usefulness of exposing minimum unit size.
> > > >
> > > > Rather than specify unit sizes (which forces the user
> > > > to convert every time the command is executed), why not specify
> > > > in kbytes and round up?
> > >
> > > I accept this. I proposed to expose the minimum unit size because
> > > I'd like to let users specify the unit count (which, as you say,
> > > is not good).
> > >
> > > As you know, the minimum unit size is decided by hardware, e.g.
> > > on a host we have 56320 KiB cache, and the max CBM length is 20
> > > bits (fffff), so the minimum cache unit is 56320/20 = 2816 KiB.
> > >
> > > If we allow users to specify cache size instead of cache unit
> > > count, a user may set the cache to 2817 KiB, and we would round
> > > it up to 2816 * 2, so 2815 KiB would be wasted.
> >
> > Yes, but the user can know the wasted amount if necessary, if you
> > expose the cache unit size (again, I doubt this will happen in
> > practice because the granularity of the CBM bits is small compared
> > to the cache size).
> >
> > The problem with the cache unit count specification is that it does
> > not work across different hosts: if a user saves the "cache unit
> > count" value manually in an XML file, then uses that XML file on a
> > different host, the reservation on the new host can become smaller
> > than desired, which violates expectations.
>
> Yes, public APIs should always use an actual size, usually KB in most
> of our APIs, but sometimes bytes.
> > > Anyway, I am open to using KiB sizes and letting libvirt calculate
> > > the CBM bits; I am thinking whether we need to tell the user that
> > > the actual_cache_size is up to 5632 KiB even when they want 2816
> > > KiB of cache.
> >
> > Another thing I did in resctrltool is to have a safety margin for
> > allocations: do not let the user allocate all of the cache (that is,
> > leave 0 bytes for the default group). I used one cache unit as the
> > minimum:
> >
> >     if ret == ERR_LOW_SPACE:
> >         print "Warning: free space on default mask is <= %d\n" % \
> >               (kbytes_per_bit_of_cbm)
> >         print "use --force to force"
>
> Libvirt explicitly aims to avoid making policy decisions like this.
> As your "--force" message shows, it means you then have to add in
> ways to get around the policy. Libvirt just tries to provide the
> mechanism and leave it to the app to decide on usage policy.

Actually, what I said is nonsense; the kernel does it already.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
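Marcelo's "option 2" — a single --l3 size translated internally into matching l3data/l3code reservations when CDP is enabled — is mechanical enough to sketch. The helper below is illustrative only: the function names and defaults are assumptions, not libvirt or resctrltool code; the CBM arithmetic follows the 56320 KiB / 20-bit example used elsewhere in the thread.

```python
def size_to_cbm(size_kib, cache_kib, cbm_len):
    """Round size_kib up to whole CBM bits and return a contiguous run
    of low-order bits (illustrative helper, not libvirt code)."""
    unit_kib = cache_kib // cbm_len       # KiB covered by one CBM bit
    bits = -(-size_kib // unit_kib)       # ceiling division
    bits = max(1, min(bits, cbm_len))
    return (1 << bits) - 1                # e.g. 8 bits -> 0xff

def schemata_lines(size_kib, cache_kib=56320, cbm_len=20, cdp=True):
    """Emulate a type=both reservation under CDP by emitting the same
    CBM bits for both L3code and L3data, as suggested in the thread."""
    cbm = format(size_to_cbm(size_kib, cache_kib, cbm_len), "x")
    if cdp:
        return ["L3code:0=%s;1=%s" % (cbm, cbm),
                "L3data:0=%s;1=%s" % (cbm, cbm)]
    return ["L3:0=%s;1=%s" % (cbm, cbm)]
```

With the thread's numbers, a 22528 KiB request (8 units of 2816 KiB) produces the same 8-bit mask on both the L3code and L3data lines, which is exactly the (*) emulation above.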
Re: [libvirt] "[V3] RFC for support cache tune in libvirt"
On Thu, Jan 12, 2017 at 08:48:01AM -0200, Marcelo Tosatti wrote:
> On Thu, Jan 12, 2017 at 09:44:36AM +0800, 乔立勇(Eli Qiao) wrote:
> > hi, It's really good to have you get involved to support CAT in
> > libvirt/OpenStack. Replied inline.
> >
> > 2017-01-11 20:19 GMT+08:00 Marcelo Tosatti:
> > >
> > > Hi,
> > >
> > > Comments/questions related to:
> > > https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html
> > >
> > > 1) root s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2
> > >
> > > How does allocation of code/data look like?
> >
> > My plan is to expose new options:
> >
> >   virsh cachetune kvm02 --l3data.count 2 --l3code.count 2
> >
> > Please note, you can use only l3 or l3data/l3code (if CDP is enabled
> > while mounting the resctrl fs).
>
> Fine. However, you should be able to emulate a type=both reservation
> (non-CDP) by writing a schemata file with the same CBM bits:
>
>   L3code:0=0x000ff;1=0x000ff
>   L3data:0=0x000ff;1=0x000ff
>
> (*)
>
> I don't see how this interface enables that possibility.
>
> I suppose it would be easier for mgmt software to have it
> done automatically:
>
>   virsh cachetune kvm02 --l3 size_in_kbytes
>
> would create the reservations as in (*) in resctrlfs, in
> case the host is CDP enabled.

You'll be able to query libvirt to determine whether you have
l3, or l3data + l3code. So the mgmt app can decide to emulate
"type=both" if it sees l3data + l3code as separate items.

> > > 4) Usefulness of exposing minimum unit size.
> > >
> > > Rather than specify unit sizes (which forces the user
> > > to convert every time the command is executed), why not specify
> > > in kbytes and round up?
> > I accept this. I proposed to expose the minimum unit size because
> > I'd like to let users specify the unit count (which, as you say, is
> > not good).
> >
> > As you know, the minimum unit size is decided by hardware, e.g. on
> > a host we have 56320 KiB cache, and the max CBM length is 20 bits
> > (fffff), so the minimum cache unit is 56320/20 = 2816 KiB.
> >
> > If we allow users to specify cache size instead of cache unit
> > count, a user may set the cache to 2817 KiB, and we would round it
> > up to 2816 * 2, so 2815 KiB would be wasted.
>
> Yes, but the user can know the wasted amount if necessary, if you
> expose the cache unit size (again, I doubt this will happen in
> practice because the granularity of the CBM bits is small compared to
> the cache size).
>
> The problem with the cache unit count specification is that it does
> not work across different hosts: if a user saves the "cache unit
> count" value manually in an XML file, then uses that XML file on a
> different host, the reservation on the new host can become smaller
> than desired, which violates expectations.

Yes, public APIs should always use an actual size, usually KB in most
of our APIs, but sometimes bytes.

> > Anyway, I am open to using KiB sizes and letting libvirt calculate
> > the CBM bits; I am thinking whether we need to tell the user that
> > the actual_cache_size is up to 5632 KiB even when they want 2816
> > KiB of cache.
>
> Another thing I did in resctrltool is to have a safety margin for
> allocations: do not let the user allocate all of the cache (that is,
> leave 0 bytes for the default group). I used one cache unit as the
> minimum:
>
>     if ret == ERR_LOW_SPACE:
>         print "Warning: free space on default mask is <= %d\n" % \
>               (kbytes_per_bit_of_cbm)
>         print "use --force to force"

Libvirt explicitly aims to avoid making policy decisions like this.
As your "--force" message shows, it means you then have to
add in ways to get around the policy.
Libvirt just tries to provide the mechanism and leave it to the app to
decide on usage policy.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
Re: [libvirt] "[V3] RFC for support cache tune in libvirt"
On Thu, Jan 12, 2017 at 09:44:36AM +0800, 乔立勇(Eli Qiao) wrote:
> hi, It's really good to have you get involved to support CAT in
> libvirt/OpenStack. Replied inline.
>
> 2017-01-11 20:19 GMT+08:00 Marcelo Tosatti:
> >
> > Hi,
> >
> > Comments/questions related to:
> > https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html
> >
> > 1) root s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2
> >
> > How does allocation of code/data look like?
>
> My plan is to expose new options:
>
>   virsh cachetune kvm02 --l3data.count 2 --l3code.count 2
>
> Please note, you can use only l3 or l3data/l3code (if CDP is enabled
> while mounting the resctrl fs).

Fine. However, you should be able to emulate a type=both reservation
(non-CDP) by writing a schemata file with the same CBM bits:

  L3code:0=0x000ff;1=0x000ff
  L3data:0=0x000ff;1=0x000ff

(*)

I don't see how this interface enables that possibility.

I suppose it would be easier for mgmt software to have it done
automatically:

  virsh cachetune kvm02 --l3 size_in_kbytes

would create the reservations as in (*) in resctrlfs, in case the host
is CDP enabled.

(Also, please use kbytes, or give a reason not to use kbytes.)

Note: exposing the unit size is fine, as mgmt software might decide a
placement of VMs which reduces the amount of L3 cache reservation
rounding (although I doubt anyone is going to care about that in
practice).

> > 2) 'nodecachestats' command:
> >
> > 3. Add new virsh command 'nodecachestats':
> > This API is to expose how much cache resource is left on each
> > hardware (cpu socket). It will be formatted as:
> >
> >   <resource_type>.<resource_id>: left size KiB
> >
> > Does this take into account that only contiguous regions of cbm
> > masks can be used for allocations?
>
> Yes, it is the contiguous region's CBM, or in other words it's the
> cache value represented by the default CBM.
>
> resctrl doesn't allow setting a non-contiguous CBM (which is
> restricted by hardware).

OK.

> > Also, it should return the amount of free cache on each cacheid.
> Yes, it is. resource_id == cacheid

OK.

> > 3) The interface should support different sizes for different
> > cache-ids. See the KVM-RT use case at
> > https://www.redhat.com/archives/libvir-list/2017-January/msg00415.html
> > "WHAT THE USER NEEDS TO SPECIFY FOR VIRTUALIZATION (KVM-RT)".
>
> I don't think it's good to let the user specify cache-ids while doing
> cache allocation.

This is necessary for our usecase.

> The cache ids used should rely on what cpu affinity the vm is setting.

The cache ids configuration should match the cpu affinity configuration.

> e.g.
>
> 1. For a host which has only one cache id (a one-socket host), we
> don't need to set the cache id.

Right.

> 2. With multiple cache ids (sockets), the user should set the
> vcpu -> pcpu mapping (define a cpuset for the VM), then we (libvirt)
> need to compute how much cache on which cache id should be set.
> Which is to say, the user should set the cpu affinity before cache
> allocation.
>
> I know that most cases of using CAT are for NFV. As far as I know,
> NFV is using NUMA and cpu pinning (vcpu -> pcpu mapping), so we don't
> need to worry about on which cache id we set the cache size.
>
> So, just let the user specify the cache size (my proposal here is
> cache unit count) and let libvirt detect on which cache id to set how
> much cache.

OK, fine, it's OK to not expose this to the user but calculate it
internally in libvirt, as long as you recompute the schematas whenever
cpu affinity changes. But using different cache-ids in the schemata is
necessary for our usecase.

> > 4) Usefulness of exposing minimum unit size.
> >
> > Rather than specify unit sizes (which forces the user
> > to convert every time the command is executed), why not specify
> > in kbytes and round up?
> I accept this. I proposed to expose the minimum unit size because I'd
> like to let users specify the unit count (which, as you say, is not
> good).
>
> As you know, the minimum unit size is decided by hardware, e.g. on a
> host we have 56320 KiB cache, and the max CBM length is 20 bits
> (fffff), so the minimum cache unit is 56320/20 = 2816 KiB.
>
> If we allow users to specify cache size instead of cache unit count,
> a user may set the cache to 2817 KiB, and we would round it up to
> 2816 * 2, so 2815 KiB would be wasted.

Yes, but the user can know the wasted amount if necessary, if you
expose the cache unit size (again, I doubt this will happen in practice
because the granularity of the CBM bits is small compared to the cache
size).

The problem with the cache unit count specification is that it does not
work across different hosts: if a user saves the "cache unit count"
value manually in an XML file, then uses that XML file on a different
host, the reservation on the new host can become smaller than desired,
which violates expectations.

> Anyway, I am open to using KiB sizes and letting libvirt calculate
> the CBM bits,
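The rounding Eli describes (2817 KiB rounding up to two 2816 KiB units, wasting 2815 KiB) is easy to make concrete. A minimal sketch, assuming the same 56320 KiB cache with a 20-bit CBM; the function name is hypothetical:

```python
def round_allocation(size_kib, cache_kib=56320, cbm_len=20):
    """Round a requested size up to whole cache units and report the
    waste.  Numbers follow the thread's example: 56320 KiB of L3 and a
    20-bit CBM give a 2816 KiB unit (hypothetical helper)."""
    unit_kib = cache_kib // cbm_len
    units = -(-size_kib // unit_kib)      # ceiling division
    actual_kib = units * unit_kib
    return units, actual_kib, actual_kib - size_kib

# 2817 KiB rounds up to 2 units = 5632 KiB, wasting 2815 KiB
units, actual_kib, wasted_kib = round_allocation(2817)
```

This is also why a size-based API is portable across hosts: the same KiB request yields whatever unit count the local hardware implies, rather than a fixed bit count saved in the XML.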
Re: [libvirt] "[V3] RFC for support cache tune in libvirt"
On Thu, Jan 12, 2017 at 08:47:58AM -0200, Marcelo Tosatti wrote:
> On Thu, Jan 12, 2017 at 09:44:36AM +0800, 乔立勇(Eli Qiao) wrote:
> > hi, It's really good to have you get involved to support CAT in
> > libvirt/OpenStack. Replied inline.
> >
> > 2017-01-11 20:19 GMT+08:00 Marcelo Tosatti:
> > >
> > > Hi,
> > >
> > > Comments/questions related to:
> > > https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html
> > >
> > > 1) root s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2
> > >
> > > How does allocation of code/data look like?
> >
> > My plan is to expose new options:
> >
> >   virsh cachetune kvm02 --l3data.count 2 --l3code.count 2
> >
> > Please note, you can use only l3 or l3data/l3code (if CDP is enabled
> > while mounting the resctrl fs).
>
> Fine. However, you should be able to emulate a type=both reservation
> (non-CDP) by writing a schemata file with the same CBM bits:
>
>   L3code:0=0x000ff;1=0x000ff
>   L3data:0=0x000ff;1=0x000ff
>
> (*)
>
> I don't see how this interface enables that possibility.
>
> I suppose it would be easier for mgmt software to have it done
> automatically:
>
>   virsh cachetune kvm02 --l3 size_in_kbytes
>
> would create the reservations as in (*) in resctrlfs, in case the
> host is CDP enabled.
>
> (Also, please use kbytes, or give a reason not to use kbytes.)
>
> Note: exposing the unit size is fine, as mgmt software might decide a
> placement of VMs which reduces the amount of L3 cache reservation
> rounding (although I doubt anyone is going to care about that in
> practice).
>
> > > 2) 'nodecachestats' command:
> > >
> > > 3. Add new virsh command 'nodecachestats':
> > > This API is to expose how much cache resource is left on each
> > > hardware (cpu socket). It will be formatted as:
> > >
> > >   <resource_type>.<resource_id>: left size KiB
> > >
> > > Does this take into account that only contiguous regions of cbm
> > > masks can be used for allocations?
> > Yes, it is the contiguous region's CBM, or in other words it's the
> > cache value represented by the default CBM.
> >
> > resctrl doesn't allow setting a non-contiguous CBM (which is
> > restricted by hardware).
>
> OK.
>
> > > Also, it should return the amount of free cache on each cacheid.
> >
> > Yes, it is. resource_id == cacheid
>
> OK.
>
> > > 3) The interface should support different sizes for different
> > > cache-ids. See the KVM-RT use case at
> > > https://www.redhat.com/archives/libvir-list/2017-January/msg00415.html
> > > "WHAT THE USER NEEDS TO SPECIFY FOR VIRTUALIZATION (KVM-RT)".
> >
> > I don't think it's good to let the user specify cache-ids while
> > doing cache allocation.
>
> This is necessary for our usecase.
>
> > The cache ids used should rely on what cpu affinity the vm is
> > setting.
>
> The cache ids configuration should match the cpu affinity
> configuration.
>
> > e.g.
> >
> > 1. For a host which has only one cache id (a one-socket host), we
> > don't need to set the cache id.
>
> Right.
>
> > 2. With multiple cache ids (sockets), the user should set the
> > vcpu -> pcpu mapping (define a cpuset for the VM), then we
> > (libvirt) need to compute how much cache on which cache id should
> > be set. Which is to say, the user should set the cpu affinity
> > before cache allocation.
> >
> > I know that most cases of using CAT are for NFV. As far as I know,
> > NFV is using NUMA and cpu pinning (vcpu -> pcpu mapping), so we
> > don't need to worry about on which cache id we set the cache size.
> >
> > So, just let the user specify the cache size (my proposal here is
> > cache unit count) and let libvirt detect on which cache id to set
> > how much cache.
>
> OK, fine, it's OK to not expose this to the user but calculate it
> internally in libvirt, as long as you recompute the schematas
> whenever cpu affinity changes. But using different cache-ids in the
> schemata is necessary for our usecase.

Hum, thinking again about this: it needs to be per-vcpu.
So for the NFV use-case you want:

  vcpu0: no reservation (belongs to the default group).
  vcpu1: reservation with a particular size.

Then, if a vcpu is pinned, "trim" the reservation down to the
particular cache-id it is pinned to.

This is important because it allows the vcpu0 workload to not interfere
with the realtime workload running on vcpu1.
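The "trim" step Marcelo describes — a per-vCPU reservation that only applies on the cache-id the vCPU is pinned to — could look like this in schemata terms. A sketch under assumptions: resctrl does not accept an empty CBM, so cache-ids without a reservation keep the default mask here, and the function names are invented for illustration:

```python
def trim_to_cache_id(cbm, pinned_cache_id, cache_ids, default_mask):
    """Per-vCPU reservation trimmed to the cache-id the vCPU is pinned
    to; other cache-ids keep the default mask, since resctrl does not
    accept an empty CBM.  Names are illustrative, not libvirt code."""
    return {cid: (cbm if cid == pinned_cache_id else default_mask)
            for cid in cache_ids}

def format_l3(masks):
    """Render a resctrl L3 schemata line from a {cache_id: mask} dict."""
    return "L3:" + ";".join("%d=%x" % (cid, mask)
                            for cid, mask in sorted(masks.items()))

# vcpu1 pinned to cache-id 1 with a 2-bit reservation; vcpu0 stays in
# the default group and has no schemata of its own at all.
line = format_l3(trim_to_cache_id(0x3, 1, [0, 1], 0xfffff))
```

The point of the trim is visible in the result: the reserved bits appear only under the cache-id the realtime vCPU can actually run on, so work on the other socket cannot consume them.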
Re: [libvirt] [V3] RFC for support cache tune in libvirt
On Thu, Jan 12, 2017 at 09:20:30AM +, Daniel P. Berrange wrote:
> On Thu, Jan 12, 2017 at 11:15:39AM +0800, 乔立勇(Eli Qiao) wrote:
> > yes, I like this too, it could tell the resource sharing logic by
> > cpus.
> >
> > Another thought is that if the kernel enables CDP, it will split l3
> > cache into code/data types. So this information should come not
> > only from /sys/devices/system/cpu/cpu0/cache/index3/size, but also
> > depend on the linux resctrl status under /sys/fs/resctrl/.
> >
> > I think on your system you don't enable SMT; on a system which has
> > SMT enabled we will have:
> >
> > hmm... l2 and l1 cache are per core, I am not sure if we really
> > need to tune the l2 and l1 cache at all, that's too low level...
> >
> > Per my understanding, if we expose these kinds of capabilities, we
> > should support managing them; I just wonder if it is too early to
> > expose them since the low level (linux kernel) does not support it
> > yet.
>
> We don't need to list l2/l1 cache in the XML right now. The example
> above shows that the schema is capable of supporting it in the
> future, which is the important thing. So we can start with only
> reporting L3, and add l2/l1 later if we find it is needed, without
> having to change the XML again.

Another idea of mine was to expose those caches that the host supports
allocation on (i.e. a capability a client can use). But that could feel
messy in the end. Just a thought.

Regards,
Daniel
Re: [libvirt] [V3] RFC for support cache tune in libvirt
On Thu, Jan 12, 2017 at 11:15:39AM +0800, 乔立勇(Eli Qiao) wrote:
> yes, I like this too, it could tell the resource sharing logic by
> cpus.
>
> Another thought is that if the kernel enables CDP, it will split l3
> cache into code/data types.
>
> So this information should come not only from
> /sys/devices/system/cpu/cpu0/cache/index3/size, but also depend on
> the linux resctrl status under /sys/fs/resctrl/.
>
> I think on your system you don't enable SMT; on a system which has
> SMT enabled we will have:
>
> hmm... l2 and l1 cache are per core, I am not sure if we really need
> to tune the l2 and l1 cache at all, that's too low level...
>
> Per my understanding, if we expose these kinds of capabilities, we
> should support managing them; I just wonder if it is too early to
> expose them since the low level (linux kernel) does not support it
> yet.

We don't need to list l2/l1 cache in the XML right now. The example
above shows that the schema is capable of supporting it in the future,
which is the important thing. So we can start with only reporting L3,
and add l2/l1 later if we find it is needed, without having to change
the XML again.

Regards,
Daniel
Re: [libvirt] [V3] RFC for support cache tune in libvirt
On Thu, Jan 12, 2017 at 10:58:53AM +0800, 乔立勇(Eli Qiao) wrote:
> 2017-01-11 19:09 GMT+08:00 Daniel P. Berrange:
> > On Wed, Jan 11, 2017 at 11:55:28AM +0100, Martin Kletzander wrote:
> > > On Wed, Jan 11, 2017 at 10:05:26AM +, Daniel P. Berrange wrote:
> > > >
> > > > IIUC, the kernel lets us associate individual PIDs with each
> > > > cache. Since each vCPU is a PID, this means we are able to
> > > > allocate different cache size to different CPUs. So we need to
> > > > be able to represent that in the XML. I think we should also
> > > > represent the allocation in a normal size (ie KiB), not in
> > > > count of min unit.
> > > >
> > > > So eg this shows allocating two cache banks and giving one to
> > > > the first 4 cpus, and one to the second 4 cpus
> > >
> > > I agree with your approach, we just need to keep in mind two more
> > > things. I/O threads and the main QEMU (emulator) thread can have
> > > allocations as well. Also we need to say on which socket the
> > > allocation should be done.
> >
> > Also, I wonder if this is better put in the existing element, since
> > this is really an aspect of the CPU configuration.
> >
> > Perhaps split configuration of cache banks from the mapping to
> > cpus/iothreads/emulator. Also, per Marcelo's mail, we need to
> > include the host cache ID, so we know where to allocate from if
> > there's multiple caches of the same type. So the XML could look
> > more like this:
>
> I don't think we require host_id here. We can allow setting cache
> allocation only IF the VM has a vcpu -> pcpu affinity setting, and
> let libvirt calculate where to set the cache (on which
> cache_id/resource_id/socket_id; the 3 ids have the same meaning).
> Since l3 caches are a cpu's resource, only a VM running on the
> specific cpu can benefit from the cache.

Let's say the guest is pinned to CPU 3, and there are two separate L3
caches associated with CPU 3.
If we don't include host_id, then libvirt has to decide which of the
two possible caches to allocate from. We can do that, but generally
we've tried to avoid such policy decisions in libvirt before, hence I
thought it preferable to have the admin be explicit about which cache
they want.

Regards,
Daniel
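The host cache ID Daniel refers to is what recent kernels expose in sysfs cacheinfo as a per-cache `id` attribute. A sketch of how management code might map CPUs to their L3 cache id — the sysfs layout follows the kernel's cacheinfo ABI, and the injectable `index_dirs`/`read` parameters exist only so the logic can be exercised without a real /sys; error handling is mostly elided:

```python
import glob
import re

def cpu_from_path(path):
    """Extract the CPU number from a sysfs cacheinfo path."""
    return int(re.search(r"/cpu(\d+)/", path).group(1))

def l3_cache_ids(index_dirs=None, read=None):
    """Map each CPU to the id of its L3 cache using the kernel's sysfs
    cacheinfo ('level' and 'id' attributes; 'id' needs a reasonably
    recent kernel).  A sketch, not libvirt code."""
    if index_dirs is None:
        index_dirs = glob.glob(
            "/sys/devices/system/cpu/cpu[0-9]*/cache/index[0-9]*")
    if read is None:
        read = lambda p: open(p).read()
    mapping = {}
    for d in index_dirs:
        try:
            if read(d + "/level").strip() != "3":
                continue
            mapping[cpu_from_path(d)] = int(read(d + "/id").strip())
        except (OSError, KeyError):
            continue
    return mapping
```

Two CPUs sharing an L3 report the same id, so the inverse of this mapping is exactly the cache-id -> CPU-set association the XML discussion is about.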
Re: [libvirt] [V3] RFC for support cache tune in libvirt
On Thu, Jan 12, 2017 at 10:58:53AM +0800, 乔立勇(Eli Qiao) wrote:
> 2017-01-11 19:09 GMT+08:00 Daniel P. Berrange:
> > On Wed, Jan 11, 2017 at 11:55:28AM +0100, Martin Kletzander wrote:
> > > On Wed, Jan 11, 2017 at 10:05:26AM +, Daniel P. Berrange wrote:
> > > >
> > > > IIUC, the kernel lets us associate individual PIDs with each
> > > > cache. Since each vCPU is a PID, this means we are able to
> > > > allocate different cache size to different CPUs. So we need to
> > > > be able to represent that in the XML. I think we should also
> > > > represent the allocation in a normal size (ie KiB), not in
> > > > count of min unit.
> > > >
> > > > So eg this shows allocating two cache banks and giving one to
> > > > the first 4 cpus, and one to the second 4 cpus
> > >
> > > I agree with your approach, we just need to keep in mind two more
> > > things. I/O threads and the main QEMU (emulator) thread can have
> > > allocations as well. Also we need to say on which socket the
> > > allocation should be done.
> >
> > Also, I wonder if this is better put in the existing element, since
> > this is really an aspect of the CPU configuration.
> >
> > Perhaps split configuration of cache banks from the mapping to
> > cpus/iothreads/emulator. Also, per Marcelo's mail, we need to
> > include the host cache ID, so we know where to allocate from if
> > there's multiple caches of the same type. So the XML could look
> > more like this:
>
> I don't think we require host_id here. We can allow setting cache
> allocation only IF the VM has a vcpu -> pcpu affinity setting, and
> let libvirt calculate where to set the cache (on which
> cache_id/resource_id/socket_id; the 3 ids have the same meaning).
> Since l3 caches are a cpu's resource, only a VM running on the
> specific cpu can benefit from the cache. If we allocate cache
> explicitly with no care about the VM's pcpu affinity, it is helpless.

One thing we need to decide upfront is whether we are going to be
fixing user misconfiguration, and to which extent, because I feel like
there's too much discussion about that.
So either:

a) We make sure that each thread that utilizes CAT is pinned to host
   threads without split cache, i.e. it cannot be scheduled outside of
   those. I'm not using socket/core/thread and L3 because we need to
   be prepared here just in case any other cache hierarchy is used.

b) We let the user specify whatever they want.

Option (a) requires more code, more work, and must be checked on all
changes (vcpupin API, XML change, CPU hotplug, etc.), but option (b)
goes more with the rest of libvirt's config, where we just let the
users shoot themselves in the foot by misconfiguration, i.e. if someone
wants to allocate cache on socket 0 and schedule all CPUs on socket 1,
then it's their fault.

Option (a) can save us some specification in the XML, because we can
compute some of the values. However, that might not be very reliable,
and we might end up requiring all the values to be specified in the end
anyway.

So from my point of view, I'd rather go with (b), just so we don't
swamp ourselves with the details; also, we can add the checks later.
And most importantly, as mentioned before, it goes with the rest of the
code.

--
Best regards - Eli
天涯无处不重逢
a leaf duckweed belongs to the sea, where not to meet in life
Re: [libvirt] [V3] RFC for support cache tune in libvirt
On Thu, Jan 12, 2017 at 11:19:07AM +0800, 乔立勇(Eli Qiao) wrote:
> > more like this:
>
> If so, do we need to extend "virsh cputune" or add a new API like
> cachetune?

Yeah, sure, that's a detail to be done after the design is done.

Regards,
Daniel
Re: [libvirt] [V3] RFC for support cache tune in libvirt
> more like this:

If so, do we need to extend "virsh cputune" or add a new API like
cachetune?
Re: [libvirt] [V3] RFC for support cache tune in libvirt
> > which shows each socket has its own dedicated L3 cache, and each
> > core has its own L2 & L1 cache.
>
> We need to also include the host cache ID value in the XML to let us
> reliably distinguish / associate with different cache banks when
> placing guests, if there's multiple caches of the same type
> associated with the same CPU.
>
>   cpus="3,4,5,9,10,11"/>
>   cpus="3,4,5,9,10,11"/>
>
> > 3. Add new virsh command 'nodecachestats':
> > This API is to expose how much cache resource is left on each
> > hardware (cpu socket).
> >
> > It will be formatted as:
> >
> >   <resource_type>.<resource_id>: left size KiB
> >
> > for example, I have a 2-socket host, and I've enabled the cat_l3
> > feature only:
> >
> >   root@s2600wt:~/linux# virsh nodecachestats
> >   L3.0 : 56320 KiB
> >   L3.1 : 56320 KiB
> >
> > P.S. resource_type can be L3, L3DATA, L3CODE, L2 for now.
>
> This feels like something we should have in the capabilities XML too,
> rather than a new command.
>
> Oops, ignore this. I remember the reason we always report available
> resource separately from physically present resource is that we don't
> want to re-generate the capabilities XML every time the available
> resource changes.
>
> So, yes, we do need some API like virNodeFreeCache() / virsh
> nodefreecache.

yes, we need this.

> We probably want to use a 2d array of typed parameters. The first
> level of the array would represent the cache bank, the second level
> would represent the parameters for that bank.
> E.g. if we had 3 cache banks, we'd report a 3x3 typed parameter
> array, with parameters for the cache ID, its type and the
> available / free size:
>
>   id=0 type=l3 avail=56320
>   id=1 type=l3 avail=56320
>   id=2 type=l3 avail=56320
>
> Regards,
> Daniel
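The "L3.0 : 56320 KiB" output format proposed for nodecachestats is regular enough for a client to parse mechanically. A sketch — the layout is taken from the example output in the thread, and the function name is hypothetical:

```python
import re

def parse_nodecachestats(text):
    """Parse 'L3.0 : 56320 KiB' style lines into {(type, id): KiB}.
    Longer type names must precede 'L3' in the alternation so the
    regex does not stop matching early (a sketch, not libvirt code)."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"\s*(L3DATA|L3CODE|L3|L2)\.(\d+)\s*:\s*(\d+)\s*KiB",
                     line)
        if m:
            stats[(m.group(1), int(m.group(2)))] = int(m.group(3))
    return stats
```

The (type, cache-id) keys line up naturally with the per-bank typed parameters Daniel proposes (id, type, avail), which suggests the two reporting styles carry the same information.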
Re: [libvirt] [V3] RFC for support cache tune in libvirt
> yes, I like this too, it could tell the resource sharing logic by
> cpus.

Another thought is that if the kernel enables CDP, it will split l3
cache into code/data types.

So this information should come not only from
/sys/devices/system/cpu/cpu0/cache/index3/size, but also depend on the
linux resctrl status under /sys/fs/resctrl/.

I think on your system you don't enable SMT; on a system which has SMT
enabled we will have:

hmm... l2 and l1 cache are per core, I am not sure if we really need to
tune the l2 and l1 cache at all, that's too low level...

Per my understanding, if we expose these kinds of capabilities, we
should support managing them; I just wonder if it is too early to
expose them since the low level (linux kernel) does not support it yet.

> which shows each socket has its own dedicated L3 cache, and each
> core has its own L2 & L1 cache.
>
> > 2. Extend capabilities outputs.
> >
> >   virsh capabilities | grep resctrl
> >   ...
> >   cache_unit='2816'/>
> >
> > This will tell that the host has enabled resctrl (which you can
> > find in /sys/fs/resctrl), and it supports allocating 'L3' type
> > cache; the total 'L3' cache size is 56320 KiB, and the minimum
> > unit size of 'L3' cache is 2816 KiB.
> >
> > P.S. The L3 cache size unit is the minimum l3 cache unit that can
> > be allocated. It's hardware related and can not be changed.
>
> If we're already reporting cache in the capabilities from step one,
> then it ought to be extendable to cover this reporting.

Looks good to me.

> note how we report the control info for both l3 caches, since they
> come from separate sockets and thus could conceivably report
> different info if different CPUs were in each socket.
>
> > 3. Add new virsh command 'nodecachestats':
> > This API is to expose how much cache resource is left on each
> > hardware (cpu socket).
> > It will be formatted as: > > <resource_type>.<resource_id> : left size KiB > > for example I have a 2-socket CPU host, and I've enabled the cat_l3 feature only > > root@s2600wt:~/linux# virsh nodecachestats > > L3.0 : 56320 KiB > > L3.1 : 56320 KiB > > P.S. resource_type can be L3, L3DATA, L3CODE, L2 for now. > This feels like something we should have in the capabilities XML too > rather than a new command > > 4. Add new interface to manage how much cache can be allocated for a domain > > root@s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2 > > root@s2600wt:~/linux# virsh cachetune kvm02 > > l3.count : 2 > > This will allocate 2 units (2816 * 2 KiB) of l3 cache for domain kvm02 > > ## Domain XML changes > > Cache Tuning > > ... > > 2 > > ... > IIUC, the kernel lets us associate individual PIDs with each cache. Since each vCPU is a PID, this means we are able to allocate different cache sizes to different CPUs. So we need to be able to represent that in the XML. I think we should also represent the allocation in a normal size (i.e. KiB), not in a count of the min unit. ok > So eg this shows allocating two cache banks and giving one to the first 4 cpus, and one to the second 4 cpus oh, that depends on the CPU topology, so I don't like adding cpus="0,1,2,3" here; we cannot guarantee the VM will run on CPUs 0 1 2 3, so it may not benefit from the cache bank. > Regards, > Daniel -- Best regards - Eli
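The capabilities fragment quoted in this message lost its markup in the archive; only the `cache_unit='2816'` attribute survives. A hypothetical reconstruction, with every element and attribute name other than cache_unit assumed from the surrounding prose rather than taken from the original mail:

```xml
<capabilities>
  <host>
    <!-- hypothetical sketch: resctrl is mounted and supports L3
         allocation; size and cache_unit are in KiB per the prose -->
    <resctrl>
      <cache type='l3' size='56320' cache_unit='2816'/>
    </resctrl>
  </host>
</capabilities>
```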
Re: [libvirt] [V3] RFC for support cache tune in libvirt
2017-01-11 19:09 GMT+08:00 Daniel P. Berrange: > On Wed, Jan 11, 2017 at 11:55:28AM +0100, Martin Kletzander wrote: > > On Wed, Jan 11, 2017 at 10:05:26AM +, Daniel P. Berrange wrote: > > > IIUC, the kernel lets us associate individual PIDs > > > with each cache. Since each vCPU is a PID, this means > > > we are able to allocate different cache sizes to > > > different CPUs. So we need to be able to represent > > > that in the XML. I think we should also represent > > > the allocation in a normal size (ie KiB), not in > > > count of min unit. > > > So eg this shows allocating two cache banks and giving > > > one to the first 4 cpus, and one to the second 4 cpus > > I agree with your approach, we just need to keep in mind two more > > things. I/O threads and the main QEMU (emulator) thread can have > > allocations as well. Also we need to say on which socket the allocation > > should be done. > Also, I wonder if this is better put in the existing <cputune> > element, since this is really an aspect of the CPU configuration. > Perhaps split configuration of cache banks from the mapping to > cpus/iothreads/emulator. Also, per Marcelo's mail, we need to > include the host cache ID, so we know where to allocate from > if there's multiple caches of the same type. So the XML could look > more like this: I don't think we require host_id here. We should allow setting cache allocation only if the VM has a vcpu -> pcpu affinity setting, and let libvirt calculate where to set the cache (on which cache_id/resource_id/socket_id; the 3 ids mean the same thing). Since L3 caches are a CPU's resource, only a VM running on the specific CPUs can benefit from the cache. If we allocate cache explicitly without caring about the VM's pCPU affinity, it is useless. 
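Daniel's proposed XML was lost in the archive. A hypothetical sketch of what a split between cache-bank configuration and cpu/iothread/emulator mapping, with host cache IDs, might look like (all element and attribute names here are assumptions for illustration, not the actual proposal from the thread):

```xml
<cputune>
  <cachetune>
    <!-- hypothetical: define the banks, tied to host cache IDs -->
    <bank id='0' host_id='0' type='l3' size='5632' unit='KiB'/>
    <bank id='1' host_id='1' type='l3' size='5632' unit='KiB'/>
    <!-- hypothetical: map banks to vcpus / iothreads / emulator -->
    <vcpu_bank vcpus='0-3' bank='0'/>
    <vcpu_bank vcpus='4-7' bank='1'/>
    <iothread_bank iothreads='1' bank='0'/>
    <emulator_bank bank='0'/>
  </cachetune>
</cputune>
```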
Re: [libvirt] "[V3] RFC for support cache tune in libvirt"
hi, It's really good to have you get involved to support CAT in libvirt/OpenStack. Replies are inline. 2017-01-11 20:19 GMT+08:00 Marcelo Tosatti: > > Hi, > > Comments/questions related to: > https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html > > 1) root s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2 > > How does the allocation of code/data look? > My plan is to expose new options: virsh cachetune kvm02 --l3data.count 2 --l3code.count 2 Please note, you can use only l3, or l3data/l3code (if CDP is enabled while mounting the resctrl fs) > > 2) 'nodecachestats' command: > > 3. Add new virsh command 'nodecachestats': > This API is to expose the varying cache resource left on each hardware (cpu > socket). > It will be formatted as: > <resource_type>.<resource_id> : left size KiB > > Does this take into account that only contiguous regions of cbm masks > can be used for allocations? > Yes, it is the contiguous-region CBM; in other words, it is the cache value represented by the default CBM. resctrl doesn't allow setting a non-contiguous CBM (which is restricted by hardware) > Also, it should return the amount of free cache on each cacheid. > Yes, it does. resource_id == cacheid > > 3) The interface should support different sizes for different > cache-ids. See the KVM-RT use case at > https://www.redhat.com/archives/libvir-list/2017-January/msg00415.html > "WHAT THE USER NEEDS TO SPECIFY FOR VIRTUALIZATION (KVM-RT)". > I don't think it's good to let users specify cache-ids while doing cache allocation; the cache ids used should depend on what CPU affinity the VM has set. e.g. 1. for a host that has only one cache id (a one-socket host), we don't need to set the cache id 2. with multiple cache ids (sockets), the user should set the vcpu -> pcpu mapping (define a cpuset for the VM), then we (libvirt) need to compute how much cache should be set on which cache id. Which is to say, the user should set the CPU affinity before cache allocation. I know that most cases of using CAT are for NFV. 
As far as I know, NFV uses NUMA and CPU pinning (vcpu -> pcpu mapping), so we don't need to worry about on which cache id we set the cache size. So, just let the user specify the cache size (my proposal here is a cache unit count) and let libvirt detect on which cache id to set how much cache. > > 4) Usefulness of exposing minimum unit size. > > Rather than specify unit sizes (which forces the user > to convert every time the command is executed), why not specify > in kbytes and round up? > I accept this. I proposed to expose the minimum unit size because I'd like to let users specify the unit count (which, as you say, is not good). As you know, the minimum unit size is decided by hardware, e.g. on a host we have 56320 KiB cache and the max CBM length is 20 (fffff), so the minimum cache unit is 56320/20 = 2816 KiB. If we allow users to specify a cache size instead of a cache unit count, a user may set the cache to 2817 KiB, and we would round it up to 2816 * 2, so 2815 KiB would be wasted. Anyway, I am open to using a KiB size and letting libvirt calculate the CBM bits; I am thinking about whether we need to tell them the actual_cache_size is up to 5632 KiB even when they wanted 2816 KiB of cache. > > <cache type='l3' size='56320' cache_unit='2816'/> > > As noted in item 1 of > https://www.redhat.com/archives/libvir-list/2017-January/msg00494.html, > "1) Conversion of kbytes (user specification) --> number of CBM bits > for host.", > the format in which the size is stored is kbytes, so it's awkward > to force users and OpenStack to perform the conversion themselves > (and zero benefits... nothing changes if you know the unit size). Hmm.. as I see it, libvirt is just a user-space API; not sure whether in libvirt we should hide some of the low-level detail.. > > Thanks! 
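The rounding discussed above is simple arithmetic; a minimal sketch (a hypothetical helper, not libvirt API) using the numbers from the thread, where 56320 KiB total and a 20-bit CBM give a 2816 KiB unit:

```python
import math

def kib_to_cbm_bits(request_kib, total_kib, cbm_len):
    """Round a KiB request up to whole CBM bits.

    unit = total / cbm_len is the minimum allocatable granularity,
    e.g. 56320 / 20 = 2816 KiB. Returns (bits, actual_kib) so the
    caller can report the real allocation back to the user.
    """
    unit_kib = total_kib // cbm_len
    bits = math.ceil(request_kib / unit_kib)
    return bits, bits * unit_kib

# A 2817 KiB request needs 2 bits, i.e. 5632 KiB actually allocated
# (2815 KiB of that is the rounding waste Eli mentions).
print(kib_to_cbm_bits(2817, 56320, 20))
```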
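The contiguity restriction Eli and Marcelo discuss above (resctrl rejects CBMs with holes) is easy to validate in software; a minimal sketch, not libvirt code:

```python
def cbm_is_contiguous(cbm):
    """Return True if the set bits of a CBM form one contiguous run.

    Hardware (and hence resctrl) rejects masks like 0b1011; only
    masks such as 0b0110 or 0xfffff are valid allocations.
    """
    if cbm <= 0:
        return False
    # Strip trailing zeros; a contiguous run of ones then satisfies
    # x & (x + 1) == 0 (i.e. x is of the form 2^n - 1).
    while cbm & 1 == 0:
        cbm >>= 1
    return cbm & (cbm + 1) == 0

print(cbm_is_contiguous(0xfffff), cbm_is_contiguous(0b1011))
```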
Re: [libvirt] "[V3] RFC for support cache tune in libvirt"
On Wed, Jan 11, 2017 at 10:34:00AM -0200, Marcelo Tosatti wrote: > On Wed, Jan 11, 2017 at 10:19:10AM -0200, Marcelo Tosatti wrote: > > Hi, > > Comments/questions related to: > > https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html > > 1) root s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2 > > How does the allocation of code/data look? > > 2) 'nodecachestats' command: > > 3. Add new virsh command 'nodecachestats': > > This API is to expose the varying cache resource left on each hardware (cpu > > socket). > > It will be formatted as: > > <resource_type>.<resource_id> : left size KiB > > Does this take into account that only contiguous regions of cbm masks > > can be used for allocations? > > Also, it should return the amount of free cache on each cacheid. > > 3) The interface should support different sizes for different > > cache-ids. See the KVM-RT use case at > > https://www.redhat.com/archives/libvir-list/2017-January/msg00415.html > > "WHAT THE USER NEEDS TO SPECIFY FOR VIRTUALIZATION (KVM-RT)". > And when the user specification lacks the cacheid of a given socket in > the system, the code should use the default resctrlfs masks > (that is, those of the default group). > > 4) Usefulness of exposing minimum unit size. > > Rather than specify unit sizes (which forces the user > > to convert every time the command is executed), why not specify > > in kbytes and round up? > > <cache type='l3' size='56320' cache_unit='2816'/> > > As noted in item 1 of > > https://www.redhat.com/archives/libvir-list/2017-January/msg00494.html, > > "1) Conversion of kbytes (user specification) --> number of CBM bits > > for host.", > > the format in which the size is stored is kbytes, so it's awkward > > to force users and OpenStack to perform the conversion themselves > > (and zero benefits... nothing changes if you know the unit size). > 5) Please perform the necessary filesystem locking as described > in Documentation/x86/intel_rdt_ui.txt in the kernel source. 
6) libvirt API should expose the cacheid <-> pcpu mapping (when implementing cacheid support).
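The locking Marcelo refers to in point 5 is the advisory flock on the resctrl mount point described in Documentation/x86/intel_rdt_ui.txt. A minimal sketch of how a tool might take it; the function name and the hard-coded default path are assumptions, not libvirt code:

```python
import fcntl
import os

def with_resctrl_lock(func, mountpoint="/sys/fs/resctrl"):
    """Run func() while holding the exclusive advisory lock that
    intel_rdt_ui.txt asks writers of schemata/tasks files to take,
    so concurrent tools don't hand out overlapping CBM bits."""
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other writers finish
        return func()
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

Readers that only inspect the schemata would take LOCK_SH instead of LOCK_EX.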
Re: [libvirt] "[V3] RFC for support cache tune in libvirt"
On Wed, Jan 11, 2017 at 10:19:10AM -0200, Marcelo Tosatti wrote: > > Hi, > > Comments/questions related to: > https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html > > 1) root s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2 > > How does the allocation of code/data look? > > 2) 'nodecachestats' command: > > 3. Add new virsh command 'nodecachestats': > This API is to expose the varying cache resource left on each hardware (cpu > socket). > It will be formatted as: > <resource_type>.<resource_id> : left size KiB > > Does this take into account that only contiguous regions of cbm masks > can be used for allocations? > > Also, it should return the amount of free cache on each cacheid. > > 3) The interface should support different sizes for different > cache-ids. See the KVM-RT use case at > https://www.redhat.com/archives/libvir-list/2017-January/msg00415.html > "WHAT THE USER NEEDS TO SPECIFY FOR VIRTUALIZATION (KVM-RT)". And when the user specification lacks the cacheid of a given socket in the system, the code should use the default resctrlfs masks (that is, those of the default group). > 4) Usefulness of exposing minimum unit size. > > Rather than specify unit sizes (which forces the user > to convert every time the command is executed), why not specify > in kbytes and round up? > > <cache type='l3' size='56320' cache_unit='2816'/> > > As noted in item 1 of > https://www.redhat.com/archives/libvir-list/2017-January/msg00494.html, > "1) Conversion of kbytes (user specification) --> number of CBM bits > for host.", > the format in which the size is stored is kbytes, so it's awkward > to force users and OpenStack to perform the conversion themselves > (and zero benefits... nothing changes if you know the unit size). 5) Please perform the necessary filesystem locking as described in Documentation/x86/intel_rdt_ui.txt in the kernel source.
Re: [libvirt] "[V3] RFC for support cache tune in libvirt"
Hi, Comments/questions related to: https://www.redhat.com/archives/libvir-list/2017-January/msg00354.html 1) root s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2 How does the allocation of code/data look? 2) 'nodecachestats' command: 3. Add new virsh command 'nodecachestats': This API is to expose the varying cache resource left on each hardware (cpu socket). It will be formatted as: <resource_type>.<resource_id> : left size KiB Does this take into account that only contiguous regions of cbm masks can be used for allocations? Also, it should return the amount of free cache on each cacheid. 3) The interface should support different sizes for different cache-ids. See the KVM-RT use case at https://www.redhat.com/archives/libvir-list/2017-January/msg00415.html "WHAT THE USER NEEDS TO SPECIFY FOR VIRTUALIZATION (KVM-RT)". 4) Usefulness of exposing minimum unit size. Rather than specify unit sizes (which forces the user to convert every time the command is executed), why not specify in kbytes and round up? As noted in item 1 of https://www.redhat.com/archives/libvir-list/2017-January/msg00494.html, "1) Conversion of kbytes (user specification) --> number of CBM bits for host.", the format in which the size is stored is kbytes, so it's awkward to force users and OpenStack to perform the conversion themselves (and zero benefits... nothing changes if you know the unit size). Thanks!
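Marcelo's point 2 above — that "left size" is only meaningful in contiguous CBM regions — can be illustrated with a sketch (hypothetical helper, not the proposed API) that finds the largest free contiguous run given the masks already allocated to resctrl groups:

```python
def largest_free_run(allocated_masks, cbm_len):
    """Return the largest contiguous run of CBM bits not used by any
    existing allocation -- the real 'left size' a new group could take,
    as opposed to the raw count of free bits."""
    used = 0
    for m in allocated_masks:
        used |= m
    best = cur = 0
    for bit in range(cbm_len):
        if used & (1 << bit):
            cur = 0
        else:
            cur += 1
            best = max(best, cur)
    return best

# 20-bit CBM with bits 8-11 taken: 16 bits are free in total, but a
# new contiguous allocation can use at most 8 of them (bits 12-19).
print(largest_free_run([0xf00], 20))
```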
Re: [libvirt] [V3] RFC for support cache tune in libvirt
On Wed, Jan 11, 2017 at 10:05:26AM +, Daniel P. Berrange wrote: > On Tue, Jan 10, 2017 at 07:42:59AM +, Qiao, Liyong wrote: > > Add support for cache allocation. > > Thanks Martin for the previous version's comments; this is the v3 version of the RFC, and I have some PoC code [2]. The following changes are partly finished in the PoC. > > #Propose Changes > > ## virsh command line > > 1. Extend output of nodeinfo to expose the L3 cache size for Level 3 (last level cache). > > This will expose how much cache on a host can be used. > > root@s2600wt:~/linux# virsh nodeinfo | grep L3 > > L3 cache size: 56320 KiB > Ok, as previously discussed, we should include this in the capabilities > XML instead and have info about all the caches. We likely also want to > relate which CPUs are associated with which cache in some way. > eg if we have this topology, we might have something like this cache info, > which shows each socket has its own dedicated L3 cache, and each > core has its own L2 & L1 cache. We need to also include the host cache ID value in the XML to let us reliably distinguish / associate with different cache banks when placing guests, if there's multiple caches of the same type associated with the same CPU. > > 3. Add new virsh command 'nodecachestats': > > This API is to expose the varying cache resource left on each hardware (cpu socket). > > It will be formatted as: > > <resource_type>.<resource_id> : left size KiB > > for example I have a 2-socket CPU host, and I've enabled the cat_l3 feature only > > root@s2600wt:~/linux# virsh nodecachestats > > L3.0 : 56320 KiB > > L3.1 : 56320 KiB > > P.S. resource_type can be L3, L3DATA, L3CODE, L2 for now. > This feels like something we should have in the capabilities XML too > rather than a new command Oops, ignore this. 
I remember the reason we always report available resources separately from physically present resources is that we don't want to re-generate the capabilities XML every time the available resources change. So, yes, we do need some API like virNodeFreeCache() / virsh nodefreecache. We probably want to use a 2D array of typed parameters. The first level of the array would represent the cache bank; the second level would represent the parameters for that bank. eg if we had 3 cache banks, we'd report a 3x3 typed parameter array, with parameters for the cache ID, its type and the available / free size: id=0 type=l3 avail=56320; id=1 type=l3 avail=56320; id=2 type=l3 avail=56320 Regards, Daniel
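The 2D typed-parameter layout Daniel describes can be sketched as plain data; the field names id/type/avail come from his example, while the function name is made up for illustration:

```python
def node_free_cache(banks):
    """Flatten per-bank info into the 2D typed-parameter shape:
    one row per cache bank, one (name, value) pair per parameter."""
    return [
        [("id", b["id"]), ("type", b["type"]), ("avail", b["avail"])]
        for b in banks
    ]

# Three L3 banks, all fully free -> a 3x3 array of typed parameters.
banks = [{"id": i, "type": "l3", "avail": 56320} for i in range(3)]
print(node_free_cache(banks))
```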
Re: [libvirt] [V3] RFC for support cache tune in libvirt
On Wed, Jan 11, 2017 at 11:55:28AM +0100, Martin Kletzander wrote: > On Wed, Jan 11, 2017 at 10:05:26AM +, Daniel P. Berrange wrote: > > IIUC, the kernel lets us associate individual PIDs > > with each cache. Since each vCPU is a PID, this means > > we are able to allocate different cache sizes to > > different CPUs. So we need to be able to represent > > that in the XML. I think we should also represent > > the allocation in a normal size (ie KiB), not in > > count of min unit. > > So eg this shows allocating two cache banks and giving > > one to the first 4 cpus, and one to the second 4 cpus > I agree with your approach, we just need to keep in mind two more > things. I/O threads and the main QEMU (emulator) thread can have > allocations as well. Also we need to say on which socket the allocation > should be done. Also, I wonder if this is better put in the existing <cputune> element, since this is really an aspect of the CPU configuration. Perhaps split the configuration of cache banks from the mapping to cpus/iothreads/emulator. Also, per Marcelo's mail, we need to include the host cache ID, so we know where to allocate from if there's multiple caches of the same type. So the XML could look more like this: Regards, Daniel
Re: [libvirt] [V3] RFC for support cache tune in libvirt
On Wed, Jan 11, 2017 at 10:05:26AM +, Daniel P. Berrange wrote: On Tue, Jan 10, 2017 at 07:42:59AM +, Qiao, Liyong wrote: Add support for cache allocation. Thanks Martin for the previous version's comments; this is the v3 version of the RFC, and I have some PoC code [2]. The following changes are partly finished in the PoC. #Propose Changes ## virsh command line 1. Extend output of nodeinfo to expose the L3 cache size for Level 3 (last level cache). This will expose how much cache on a host can be used. root@s2600wt:~/linux# virsh nodeinfo | grep L3 L3 cache size: 56320 KiB Ok, as previously discussed, we should include this in the capabilities XML instead and have info about all the caches. We likely also want to relate which CPUs are associated with which cache in some way. eg if we have this topology, we might have something like this cache info, which shows each socket has its own dedicated L3 cache, and each core has its own L2 & L1 cache. 2. Extend capabilities outputs. virsh capabilities | grep resctrl ... This will tell that the host has enabled resctrl (which you can find in /sys/fs/resctrl), and it supports allocating 'L3' type cache; the total 'L3' cache size is 56320 KiB, and the minimum unit size of 'L3' cache is 2816 KiB. P.S. The L3 cache size unit is the minimum L3 cache unit that can be allocated. It's hardware related and cannot be changed. If we're already reporting cache in the capabilities from step one, then it ought to be extendable to cover this reporting. note how we report the control info for both l3 caches, since they come from separate sockets and thus could conceivably report different info if different CPUs were in each socket. 3. Add new virsh command 'nodecachestats': This API is to expose the varying cache resource left on each hardware (cpu socket). It will be formatted as: <resource_type>.<resource_id> : left size KiB for example I have a 2-socket CPU host, and I've enabled the cat_l3 feature only root@s2600wt:~/linux# virsh nodecachestats L3.0 : 56320 KiB L3.1 : 56320 KiB P.S. 
resource_type can be L3, L3DATA, L3CODE, L2 for now. This feels like something we should have in the capabilities XML too rather than a new command 4. Add new interface to manage how much cache can be allocated for a domain root@s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2 root@s2600wt:~/linux# virsh cachetune kvm02 l3.count : 2 This will allocate 2 units (2816 * 2 KiB) of l3 cache for domain kvm02 ## Domain XML changes Cache Tuning ... 2 ... IIUC, the kernel lets us associate individual PIDs with each cache. Since each vCPU is a PID, this means we are able to allocate different cache sizes to different CPUs. So we need to be able to represent that in the XML. I think we should also represent the allocation in a normal size (ie KiB), not in count of min unit. So eg this shows allocating two cache banks and giving one to the first 4 cpus, and one to the second 4 cpus I agree with your approach, we just need to keep in mind two more things. I/O threads and the main QEMU (emulator) thread can have allocations as well. Also we need to say on which socket the allocation should be done. Regards, Daniel
Re: [libvirt] [V3] RFC for support cache tune in libvirt
On Tue, Jan 10, 2017 at 07:42:59AM +, Qiao, Liyong wrote: > Add support for cache allocation. > > Thanks Martin for the previous version's comments; this is the v3 version of the RFC, and I have some PoC code [2]. The following changes are partly finished in the PoC. > > #Propose Changes > > ## virsh command line > > 1. Extend output of nodeinfo to expose the L3 cache size for Level 3 (last level cache). > > This will expose how much cache on a host can be used. > > root@s2600wt:~/linux# virsh nodeinfo | grep L3 > L3 cache size: 56320 KiB Ok, as previously discussed, we should include this in the capabilities XML instead and have info about all the caches. We likely also want to relate which CPUs are associated with which cache in some way. eg if we have this topology, we might have something like this cache info, which shows each socket has its own dedicated L3 cache, and each core has its own L2 & L1 cache. > 2. Extend capabilities outputs. > > virsh capabilities | grep resctrl > > ... > > This will tell that the host has enabled resctrl (which you can find in /sys/fs/resctrl), > and it supports allocating 'L3' type cache; the total 'L3' cache size is 56320 KiB, and the minimum unit size of 'L3' cache is 2816 KiB. > P.S. The L3 cache size unit is the minimum L3 cache unit that can be allocated. It's hardware related and cannot be changed. If we're already reporting cache in the capabilities from step one, then it ought to be extendable to cover this reporting. note how we report the control info for both l3 caches, since they come from separate sockets and thus could conceivably report different info if different CPUs were in each socket. > 3. Add new virsh command 'nodecachestats': > This API is to expose the varying cache resource left on each hardware (cpu socket). 
> It will be formatted as: > <resource_type>.<resource_id> : left size KiB > > for example I have a 2-socket CPU host, and I've enabled the cat_l3 feature only > > root@s2600wt:~/linux# virsh nodecachestats > L3.0 : 56320 KiB > L3.1 : 56320 KiB > > P.S. resource_type can be L3, L3DATA, L3CODE, L2 for now. This feels like something we should have in the capabilities XML too rather than a new command > 4. Add new interface to manage how much cache can be allocated for a domain > > root@s2600wt:~/linux# virsh cachetune kvm02 --l3.count 2 > > root@s2600wt:~/linux# virsh cachetune kvm02 > l3.count : 2 > > This will allocate 2 units (2816 * 2 KiB) of l3 cache for domain kvm02 > > ## Domain XML changes > > Cache Tuning > > ... > > 2 > > ... IIUC, the kernel lets us associate individual PIDs with each cache. Since each vCPU is a PID, this means we are able to allocate different cache sizes to different CPUs. So we need to be able to represent that in the XML. I think we should also represent the allocation in a normal size (ie KiB), not in count of min unit. So eg this shows allocating two cache banks and giving one to the first 4 cpus, and one to the second 4 cpus Regards, Daniel
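The numbers running through this thread (56320 KiB total, 20-bit CBM, 2816 KiB unit) come from combining the sysfs cache size with resctrl's cbm_mask. A sketch of the parsing as pure functions, so it needs no live /sys; the input strings below mimic /sys/devices/system/cpu/cpu0/cache/index3/size and /sys/fs/resctrl/info/L3/cbm_mask:

```python
def parse_cache_size_kib(text):
    """Parse a sysfs cache size string such as '56320K' or '55M'
    into KiB."""
    text = text.strip()
    if text.endswith("K"):
        return int(text[:-1])
    if text.endswith("M"):
        return int(text[:-1]) * 1024
    return int(text)

def cbm_bits(mask_text):
    """Count usable bits in a resctrl cbm_mask string such as
    'fffff' (a 20-bit capacity bitmask)."""
    return bin(int(mask_text.strip(), 16)).count("1")

total = parse_cache_size_kib("56320K")
bits = cbm_bits("fffff")
print(total, bits, total // bits)  # total, CBM length, unit size in KiB
```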