> -----Original Message-----
> From: libvir-list-boun...@redhat.com [mailto:libvir-list-boun...@redhat.com]
> On Behalf Of Ren, Qiaowei
> Sent: Monday, August 10, 2015 9:06 AM
> To: 'Daniel P. Berrange'
> Cc: 'libvir-list@redhat.com'
> Subject: Re: [libvirt] [PATCH 2/3] Qemu: add CMT support
>
> > -----Original Message-----
> > From: Ren, Qiaowei
> > Sent: Tuesday, July 21, 2015 4:00 PM
> > To: Daniel P. Berrange
> > Cc: libvir-list@redhat.com
> > Subject: RE: [libvirt] [PATCH 2/3] Qemu: add CMT support
> >
> > On Jul 20, 2015 22:34, Daniel P. Berrange wrote:
> > > On Mon, Jul 20, 2015 at 01:50:54PM +0000, Ren, Qiaowei wrote:
> > >>
> > >> Daniel P. Berrange wrote on Jul 20, 2015 17:32:
> > >>> On Sun, Jul 05, 2015 at 07:43:43PM +0800, Qiaowei Ren wrote:
> > >>>> One RFC in
> > >>>> https://www.redhat.com/archives/libvir-list/2015-June/msg01509.html
> > >>>>
> > >>>> CMT (Cache Monitoring Technology) can be used to measure the
> > >>>> cache usage of a VM running on the host. This patch extends the
> > >>>> bulk stats API (virDomainListGetStats) to add this field.
> > >>>> Applications based on libvirt can use this API to obtain the
> > >>>> cache usage of a VM. Because the CMT implementation in the Linux
> > >>>> kernel is based on the perf mechanism, this patch enables the
> > >>>> perf event for CMT when the VM is created and disables it when
> > >>>> the VM is destroyed.
> > >>>>
> > >>>> Signed-off-by: Qiaowei Ren <qiaowei....@intel.com>
> > >>>>
> > >>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> > >>>> index 4cfae03..8c678c9 100644
> > >>>> --- a/src/qemu/qemu_driver.c
> > >>>> +++ b/src/qemu/qemu_driver.c
> > >>>> @@ -19320,6 +19320,53 @@ qemuDomainGetStatsBlock(virQEMUDriverPtr driver,
> > >>>>
> > >>>>  #undef QEMU_ADD_COUNT_PARAM
> > >>>>
> > >>>> +static int
> > >>>> +qemuDomainGetStatsCache(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
> > >>>> +                        virDomainObjPtr dom,
> > >>>> +                        virDomainStatsRecordPtr record,
> > >>>> +                        int *maxparams,
> > >>>> +                        unsigned int privflags ATTRIBUTE_UNUSED)
> > >>>
> > >>> So this is a method that is used to collect per-domain information.
> > >>>
> > >>>> +{
> > >>>> +    qemuDomainObjPrivatePtr priv = dom->privateData;
> > >>>> +    FILE *fd;
> > >>>> +    unsigned long long cache = 0;
> > >>>> +    int scaling_factor = 0;
> > >>>> +
> > >>>> +    if (priv->cmt_fd <= 0)
> > >>>> +        return -1;
> > >>>> +
> > >>>> +    if (read(priv->cmt_fd, &cache, sizeof(uint64_t)) < 0) {
> > >>>> +        virReportSystemError(errno, "%s",
> > >>>> +                             _("Unable to read cache data"));
> > >>>> +        return -1;
> > >>>> +    }
> > >>>> +
> > >>>> +    fd = fopen("/sys/devices/intel_cqm/events/llc_occupancy.scale", "r");
> > >>>> +    if (!fd) {
> > >>>> +        virReportSystemError(errno, "%s",
> > >>>> +                             _("Unable to open CMT scale file"));
> > >>>> +        return -1;
> > >>>> +    }
> > >>>> +    if (fscanf(fd, "%d", &scaling_factor) != 1) {
> > >>>> +        virReportSystemError(errno, "%s",
> > >>>> +                             _("Unable to read CMT scale file"));
> > >>>> +        VIR_FORCE_FCLOSE(fd);
> > >>>> +        return -1;
> > >>>> +    }
> > >>>> +    VIR_FORCE_FCLOSE(fd);
> > >>>
> > >>> But this data you are reading is global to the entire host.
> > >>>
> > >> In fact this data is per-domain only. When the perf syscall is
> > >> called to enable the perf event for a domain, the pid of that
> > >> domain is passed.
> > >
> > > Ah, I see - you rely on the open file descriptor to be associated
> > > with the VM pid.
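
That association is what makes the data per-domain: perf_event_open()
binds the counter to a task at open time. A minimal sketch of how such
an event might be opened for a QEMU process follows - this is not code
from the patch, and the intel_cqm sysfs paths plus the hard-coded
config value 1 for llc_occupancy are assumptions about the CQM-era
kernel driver that should be verified on a real host:

#include <linux/perf_event.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static int
open_cmt_event(pid_t qemu_pid)
{
    struct perf_event_attr attr;
    FILE *fp;
    int type = 0;

    /* Dynamic PMUs publish their type id in sysfs. */
    if (!(fp = fopen("/sys/devices/intel_cqm/type", "r")))
        return -1;
    if (fscanf(fp, "%d", &type) != 1) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = type;
    attr.config = 1;   /* llc_occupancy; real code would parse
                        * /sys/devices/intel_cqm/events/llc_occupancy */
    attr.inherit = 1;  /* cover the QEMU worker threads too */
    attr.disabled = 1; /* enabled later via PERF_EVENT_IOC_ENABLE */

    /* pid = QEMU pid, cpu = -1: monitor this task on any CPU. */
    return syscall(__NR_perf_event_open, &attr, qemu_pid, -1, -1, 0);
}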
> > >>>> @@ -5122,6 +5199,15 @@ void qemuProcessStop(virQEMUDriverPtr driver,
> > >>>>      virPortAllocatorRelease(driver->migrationPorts, priv->nbdPort);
> > >>>>      priv->nbdPort = 0;
> > >>>> +
> > >>>> +    /* Disable CMT */
> > >>>> +    if (priv->cmt_fd > 0) {
> > >>>
> > >>> You can't rely on keeping an open file descriptor for the guest
> > >>> because libvirtd may be restarted.
> > >>>
> > >> Sorry, I don't really get the meaning of this. You mean that when
> > >> libvirtd is restarted, the resources which the domain opened
> > >> should be closed, right?
> > >
> > > No, when libvirtd is restarted, the domains must all continue
> > > running without loss of state. You open the FD when starting the
> > > guest, then libvirtd is restarted, and now someone wants to query
> > > the perf data. The perf FD will not be open anymore because
> > > libvirtd was restarted. At the least you'd need to re-open the
> > > file descriptor when libvirtd starts up again, for any running
> > > guest. I'm not really convinced we want to keep the perf file
> > > descriptors open for all the domains for the entire time they are
> > > running. We should really only open them when we actually want to
> > > read the collected data.
> >
> > Got it! We should open/disable them when reading the data.
> >
> > >>>> +        if (ioctl(priv->cmt_fd, PERF_EVENT_IOC_DISABLE) < 0) {
> > >>>> +            virReportSystemError(errno, "%s",
> > >>>> +                                 _("Unable to disable perf event for CMT"));
> > >>>> +        }
> > >>>> +        VIR_FORCE_CLOSE(priv->cmt_fd);
> > >>>> +    }
> > >>>> +
> > >>>>      if (priv->agent) {
> > >>>>          qemuAgentClose(priv->agent);
> > >>>>          priv->agent = NULL;
> > >>>
> > >>> Conceptually I think this approach to the implementation is
> > >>> flawed. While you are turning the perf events on/off for each
> > >>> QEMU process, the data collection does not distinguish data from
> > >>> each QEMU process - the data reported is host wide. So this
> > >>> really doesn't make much sense IMHO.
> > >>>
> > >> As mentioned above, the data reported is only for the domain.
> > >>
> > >>> I'm also wondering whether this is really going to be
> > >>> sufficiently useful on its own. CPUs have countless other
> > >>> performance counters that I would imagine apps/admins will want
> > >>> to read in order to analyse QEMU performance, beyond this new
> > >>> CMT feature. The domain stats API won't really scale up to
> > >>> dealing with arbitrary perf event counter reporting, so I'm not
> > >>> much of a fan of just special-casing CMT in this way.
> > >>>
> > >>> IOW, if we want to support host performance analysis in libvirt,
> > >>> then we probably want to design a set of APIs specifically for
> > >>> this purpose, but I could well see us saying that this is out of
> > >>> scope for libvirt and that apps should just use the Linux perf
> > >>> interfaces directly.
> > >>>
> > >> Yes, I get what you mean. Maybe libvirt doesn't have to be
> > >> responsible for supporting host performance analysis.
> > >>
> > >> But I guess cache usage should be important for each domain: if
> > >> apps based on libvirt can obtain this information, they will be
> > >> able to better check that a domain is working normally, like the
> > >> cpu/memory/block/... stats which libvirt already supports. Do you
> > >> think so?
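
A minimal sketch of the open-read-close pattern Daniel suggests above,
so that no perf fd lives in the domain's private data and a libvirtd
restart loses nothing. open_cmt_event() is the hypothetical helper
from the previous sketch; the scale-file handling mirrors the patch,
and llc_occupancy is an instantaneous occupancy metric, so reading
right after enabling is meaningful:

#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

static int
read_cmt_occupancy(pid_t qemu_pid, unsigned long long *bytes)
{
    uint64_t count = 0;
    double scale = 0;
    FILE *fp;
    int fd = open_cmt_event(qemu_pid);

    if (fd < 0)
        return -1;

    if (ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) < 0 ||
        read(fd, &count, sizeof(count)) != sizeof(count)) {
        close(fd);
        return -1;
    }
    close(fd);  /* nothing to keep open between queries */

    /* Same scale file the patch reads; parsed as a double since the
     * kernel may publish a non-integer factor. */
    if (!(fp = fopen("/sys/devices/intel_cqm/events/llc_occupancy.scale",
                     "r")))
        return -1;
    if (fscanf(fp, "%lf", &scale) != 1) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    *bytes = count * scale;
    return 0;
}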
> > > I'm not saying cache usage is unimportant. There are quite a lot
> > > of other hardware event counters in modern CPUs though, so I'm
> > > asking why we should add just this one special Intel event, and
> > > not any of the other existing performance counters that are
> > > useful in diagnosing performance issues.
> >
> > Ah, I guess you mean that the best way is to provide a set of
> > utility methods for perf operations, like the cgroup support. Then
> > we could use these interfaces to support the many perf counters
> > that are needed.
> >
> > If we have to do so, maybe I can try to implement such methods
> > first and then you can help review them.
> >
> > > Also, I'm thinking that QEMU has many different threads - VCPU
> > > threads, I/O threads, emulator threads and so on. I could see
> > > users wanting distinct profiling for these different functional
> > > areas of QEMU too, instead of just whole-QEMU granularity.
> >
> > Yes, I believe such features would be useful for some users. I am
> > currently working on adding features like CMT into OpenStack, and I
> > only know that profiling for VCPU/IO/emulator threads should not be
> > necessary for OpenStack. ^-^
>
> Hi Daniel,
>
> what do you think about it now? If we have to add CMT support in
> Nova, what do you think is the best way? Can we directly use the
> Linux perf interface in OpenStack?
>
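
Daniel's per-thread point in the quoted thread maps directly onto the
syscall: perf_event_open() accepts a thread id where a pid is
expected. A hedged fragment reusing the hypothetical open_cmt_event()
helper from the first sketch (whose attr.inherit would need to be 0
for a strictly per-thread view):

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

static int
open_cmt_per_thread(pid_t qemu_pid, int *fds, size_t max_fds)
{
    char path[64];
    DIR *dir;
    struct dirent *ent;
    size_t n = 0;

    /* Every QEMU thread (VCPU, I/O, emulator) has an entry here. */
    snprintf(path, sizeof(path), "/proc/%d/task", (int)qemu_pid);
    if (!(dir = opendir(path)))
        return -1;

    while ((ent = readdir(dir)) != NULL && n < max_fds) {
        pid_t tid = atoi(ent->d_name);  /* "." and ".." yield 0 */
        if (tid <= 0)
            continue;
        if ((fds[n] = open_cmt_event(tid)) >= 0)
            n++;
    }
    closedir(dir);
    return (int)n;  /* number of per-thread counters opened */
}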
Hi Daniel,

If we first add utility methods for perf operations into libvirt, like
the cgroup support, and then implement the CMT feature on top of them,
do you think that would be OK?

Thanks,
Qiaowei
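
The "utility methods like cgroup" idea could take roughly the
following shape. Every name below is hypothetical - modelled on the
conventions of src/util/vircgroup.h, purely to illustrate the
proposal, since no such module existed in libvirt at the time of this
thread:

#include <stdint.h>
#include <sys/types.h>

typedef enum {
    VIR_PERF_EVENT_CMT,        /* intel_cqm llc_occupancy */
    VIR_PERF_EVENT_LAST
} virPerfEventType;

typedef struct _virPerf virPerf;
typedef virPerf *virPerfPtr;

virPerfPtr virPerfNew(pid_t pid);   /* bind to one domain's QEMU pid */
void virPerfFree(virPerfPtr perf);

/* Open/close the underlying perf fd around each query, so nothing
 * needs to survive a libvirtd restart. */
int virPerfEventEnable(virPerfPtr perf, virPerfEventType type);
int virPerfEventDisable(virPerfPtr perf, virPerfEventType type);
int virPerfReadEvent(virPerfPtr perf, virPerfEventType type,
                     uint64_t *value);

New counters would then be a matter of extending virPerfEventType
rather than special-casing each one in the QEMU driver.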