Re: [libvirt] Ongoing work on lock contention in qemu driver?

2013-06-03 Thread Daniel P. Berrange
On Fri, May 24, 2013 at 11:37:04AM -0400, Peter Feiner wrote:
 On Wed, May 22, 2013 at 7:31 PM, Peter Feiner pe...@gridcentric.ca wrote:
  Since some security driver operations are costly, I think it's
  worthwhile to reduce the scope of the security manager lock or
  increase the granularity by introducing more locks. After a cursory
  look, the security manager lock seems to have a much broader scope
  than necessary. The system / library calls underlying the security
  drivers are all thread safe (e.g., defining apparmor security profiles
  or chowning disk files), so a global lock isn't strictly necessary.
  Moreover, since most virSecurity calls are made whilst a virDomainObj
  lock is held and the security calls are generally domain specific,
  *most* of the security calls are probably thread safe in the absence
  of the global security manager lock. Obviously some work will have to
  be done to see where the security lock actually matters and some
  finer-grained locks will have to be introduced to handle these
  situations.
 
 To verify that this is worthwhile, I disabled the apparmor driver
 entirely. My 20 VM creation test ran about 10s faster (down from 35s
 to 25s).
 
 After giving this approach a little more thought, I think an
 incremental series of patches is a good way to go. The responsibility
 of locking could be pushed down into the security drivers. At first,
 all of the drivers would lock where their managers' locked. Then each
 driver could be updated to do more fine-grained locking. I'm going to
 work on a patch to push the locking down into the drivers, then I'm
 going to work on a patch for better locking in the apparmor driver.

Yep, that sounds like a sane approach to me. Previously the security
drivers had no locking at all, since they were relying on the global
lock at the QEMU driver level. When I introduced the lock into the
security manager module, I was pessimistic and used coarse locking.
As you say, we can clearly relax this somewhat, if we have the locking
in the individual security drivers.

  I also think it's worthwhile to eliminate locking from the
  virDomainObjList lookups and traversals. Since virDomainObjLists are
  accessed in a bunch of places, I think it's a good defensive idea to
  decouple the performance of these accesses from virDomainObj locks,
  which are held during potentially long-running operations like domain
  creation. An easy way to divorce virDomainObjListSearchName from the
  virDomainObj lock would be to keep a copy of the domain names in the
  virDomainObjList and protect that list with the virDomainObjList lock.
 
 After removing the security driver contention, this was still a
 substantial bottleneck: virConnectDefineXML could still take a few
 seconds. I removed the contention by keeping a copy of the domain
 definition's name in the domain object. Since the name is immutable
 and the domain object is protected by the list lock, the list
 traversal can read the name without taking any additional locks. This
 patch reduced virConnectDefineXML to tens of milliseconds.

Yep, I had a patch to add a secondary hash table to the domain object
list, hashing based on name, but I lost the code when a disk died.
I didn't find it made any difference, but agree we should just do it
anyway, since it'll almost certainly be a problem in some scenarios.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] Ongoing work on lock contention in qemu driver?

2013-05-24 Thread Peter Feiner
On Wed, May 22, 2013 at 7:31 PM, Peter Feiner pe...@gridcentric.ca wrote:
 Since some security driver operations are costly, I think it's
 worthwhile to reduce the scope of the security manager lock or
 increase the granularity by introducing more locks. After a cursory
 look, the security manager lock seems to have a much broader scope
 than necessary. The system / library calls underlying the security
 drivers are all thread safe (e.g., defining apparmor security profiles
 or chowning disk files), so a global lock isn't strictly necessary.
 Moreover, since most virSecurity calls are made whilst a virDomainObj
 lock is held and the security calls are generally domain specific,
 *most* of the security calls are probably thread safe in the absence
 of the global security manager lock. Obviously some work will have to
 be done to see where the security lock actually matters and some
 finer-grained locks will have to be introduced to handle these
 situations.

To verify that this is worthwhile, I disabled the apparmor driver
entirely. My 20 VM creation test ran about 10s faster (down from 35s
to 25s).

After giving this approach a little more thought, I think an
incremental series of patches is a good way to go. The responsibility
of locking could be pushed down into the security drivers. At first,
all of the drivers would lock where their managers' locked. Then each
driver could be updated to do more fine-grained locking. I'm going to
work on a patch to push the locking down into the drivers, then I'm
going to work on a patch for better locking in the apparmor driver.

 I also think it's worthwhile to eliminate locking from the
 virDomainObjList lookups and traversals. Since virDomainObjLists are
 accessed in a bunch of places, I think it's a good defensive idea to
 decouple the performance of these accesses from virDomainObj locks,
 which are held during potentially long-running operations like domain
 creation. An easy way to divorce virDomainObjListSearchName from the
 virDomainObj lock would be to keep a copy of the domain names in the
 virDomainObjList and protect that list with the virDomainObjList lock.

After removing the security driver contention, this was still a
substantial bottleneck: virConnectDefineXML could still take a few
seconds. I removed the contention by keeping a copy of the domain
definition's name in the domain object. Since the name is immutable
and the domain object is protected by the list lock, the list
traversal can read the name without taking any additional locks. This
patch reduced virConnectDefineXML to tens of milliseconds.



Re: [libvirt] Ongoing work on lock contention in qemu driver?

2013-05-22 Thread Peter Feiner
   One theory I had was that the virDomainObjListSearchName method could
   be a bottleneck, because that acquires a lock on every single VM. This
   is invoked when starting a VM, when we call virDomainObjListAddLocked.
   I tried removing this locking though & didn't see any performance
   benefit, so never pursued this further.  Before trying things like
   this again, I think we'd need to find a way to actually identify where
   the true bottlenecks are, rather than guesswork.
...
 Oh someone has already written such a systemtap script

 http://sourceware.org/systemtap/examples/process/mutex-contention.stp

 I think that is preferable to trying to embed special code in
 libvirt for this task.

 Daniel

Cool! The systemtap approach was very fruitful. BTW, at the time of
writing, the example script has a bug. See
http://sourceware.org/ml/systemtap/2013-q2/msg00169.html for the fix.

So the root cause of my bottleneck is the virSecurityManager lock.
From this root cause a few other bottlenecks emerge. The interesting
parts of the mutex-contention.stp report are pasted at the end of this
email. Here's the summary & my analysis:

When a domain is created (domainCreateWithFlags), the domain object's
lock is held. During the domain creation, various virSecurity
functions are called, which all grab the security manager's lock.
Since the security manager's lock is global, some fraction of
domainCreateWithFlags is serialized by this lock. Since some
virSecurity functions can take a long time, such as
virSecurityManagerGenLabel for the apparmor security driver, which
takes around 1s, the serialization that the security manager lock
induces in domainCreateWithFlags is substantial. Since the domain's
object lock is held all of this time, virDomainObjListSearchName
blocks, thereby serializing virConnectDefineXML via
virDomainObjListAdd, as you suggested earlier. Moreover, since the
virDomainObjList lock is held while blocking in
virDomainObjListSearchName, there's measurable contention whilst
looking up domains during domainCreateWithFlags.

Since some security driver operations are costly, I think it's
worthwhile to reduce the scope of the security manager lock or
increase the granularity by introducing more locks. After a cursory
look, the security manager lock seems to have a much broader scope
than necessary. The system / library calls underlying the security
drivers are all thread safe (e.g., defining apparmor security profiles
or chowning disk files), so a global lock isn't strictly necessary.
Moreover, since most virSecurity calls are made whilst a virDomainObj
lock is held and the security calls are generally domain specific,
*most* of the security calls are probably thread safe in the absence
of the global security manager lock. Obviously some work will have to
be done to see where the security lock actually matters and some
finer-grained locks will have to be introduced to handle these
situations.

I also think it's worthwhile to eliminate locking from the
virDomainObjList lookups and traversals. Since virDomainObjLists are
accessed in a bunch of places, I think it's a good defensive idea to
decouple the performance of these accesses from virDomainObj locks,
which are held during potentially long-running operations like domain
creation. An easy way to divorce virDomainObjListSearchName from the
virDomainObj lock would be to keep a copy of the domain names in the
virDomainObjList and protect that list with the virDomainObjList lock.

What do you think?

Peter

==
stack contended 4 times, 261325 avg usec, 576521 max usec, 1045301
total usec, at
__lll_lock_wait+0x1c [libpthread-2.15.so]
_L_lock_858+0xf [libpthread-2.15.so]
__pthread_mutex_lock+0x3a [libpthread-2.15.so]
virDomainObjListFindByUUID+0x21 [libvirt.so.0.1000.4]
qemuDomainGetXMLDesc+0x48 [libvirt_driver_qemu.so]
virDomainGetXMLDesc+0xf5 [libvirt.so.0.1000.4]
remoteDispatchDomainGetXMLDescHelper+0xb6 [libvirtd]
virNetServerProgramDispatch+0x498 [libvirt.so.0.1000.4]
virNetServerProcessMsg+0x2a [libvirt.so.0.1000.4]
virNetServerHandleJob+0x73 [libvirt.so.0.1000.4]
virThreadPoolWorker+0x10e
==
stack contended 12 times, 128053 avg usec, 992567 max usec, 1536640
total usec, at
__lll_lock_wait+0x1c [libpthread-2.15.so]
_L_lock_858+0xf [libpthread-2.15.so]
__pthread_mutex_lock+0x3a [libpthread-2.15.so]
virDomainObjListFindByUUID+0x21 [libvirt.so.0.1000.4]
qemuDomainStartWithFlags+0x5a [libvirt_driver_qemu.so]
virDomainCreateWithFlags+0xf5 [libvirt.so.0.1000.4]
remoteDispatchDomainCreateWithFlagsHelper+0xbe [libvirtd]
virNetServerProgramDispatch+0x498 [libvirt.so.0.1000.4]
virNetServerProcessMsg+0x2a 

Re: [libvirt] Ongoing work on lock contention in qemu driver?

2013-05-16 Thread Daniel P. Berrange
On Thu, May 16, 2013 at 12:09:39PM -0400, Peter Feiner wrote:
 Hello Daniel,
 
 I've been working on improving scalability in OpenStack on libvirt+kvm
 for the last couple of months. I'm particularly interested in reducing
 the time it takes to create VMs when many VMs are requested in
 parallel.
 
 One apparent bottleneck during virtual machine creation is libvirt. As
 more VMs are created in parallel, some libvirt calls (i.e.,
 virConnectGetLibVersion and virDomainCreateWithFlags) take longer
 without a commensurate increase in hardware utilization.
 
 Thanks to your patches in libvirt-1.0.3, the situation has improved.
 Some libvirt calls OpenStack makes during VM creation (i.e.,
 virConnectDefineXML) have no measurable slowdown when many VMs are
 created in parallel. In turn, parallel VM creation in OpenStack is
 significantly faster with libvirt-1.0.3. On my standard benchmark
 (create 20 VMs in parallel, wait until the VM is ACTIVE, which is
 essentially after virDomainCreateWithFlags returns), libvirt-1.0.3
 reduces the median creation time from 90s to 60s when compared to
 libvirt-0.9.8.

How many CPU cores are you testing on ?  That's a good improvement,
but I'd expect the improvement to be greater as the # of cores is larger.

Also did you tune /etc/libvirt/libvirtd.conf at all ? By default we
limit a single connection to only 5 RPC calls. Beyond that calls
queue up, even if libvirtd is otherwise idle. OpenStack uses a
single connection for everything so will hit this. I suspect this
would be why virConnectGetLibVersion would appear to be slow. That
API does absolutely nothing of any consequence, so the only reason
I'd expect that to be slow is if you're hitting a libvirtd RPC
limit causing the API to be queued up.
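For anyone reproducing this, the knobs in question live in /etc/libvirt/libvirtd.conf (parameter names as Peter lists below; the values here are only an example, not a recommendation):

```ini
# /etc/libvirt/libvirtd.conf -- illustrative values
max_clients = 50          # concurrent client connections
max_workers = 50          # worker threads in libvirtd
max_requests = 50         # concurrent RPC calls, all connections
max_client_requests = 50  # concurrent RPC calls per connection (default 5)
```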

 I'd like to know if your concurrency work in the qemu driver is
 ongoing. If it isn't, I'd like to pick the work up myself and work on
 further improvements. Any advice or insight would be appreciated.

I'm not actively doing anything in this area. Mostly because I've got no
clear data on where any remaining bottlenecks are. 

One theory I had was that the virDomainObjListSearchName method could
be a bottleneck, because that acquires a lock on every single VM. This
is invoked when starting a VM, when we call virDomainObjListAddLocked.
I tried removing this locking though & didn't see any performance
benefit, so never pursued this further.  Before trying things like
this again, I think we'd need to find a way to actually identify where
the true bottlenecks are, rather than guesswork.

Daniel



Re: [libvirt] Ongoing work on lock contention in qemu driver?

2013-05-16 Thread Peter Feiner
 How many CPU cores are you testing on ?  That's a good improvement,
 but I'd expect the improvement to be greater as the # of cores is larger.

I'm testing on 12 Cores x 2 HT per core. As I'm working on teasing out
software bottlenecks, I'm intentionally running fewer tasks (20 parallel
creations) than the number of logical cores (24). The memory, disk and
network are also well over provisioned.

 Also did you tune /etc/libvirt/libvirtd.conf at all ? By default we
 limit a single connection to only 5 RPC calls. Beyond that calls
 queue up, even if libvirtd is otherwise idle. OpenStack uses a
 single connection for everything so will hit this. I suspect this
 would be why virConnectGetLibVersion would appear to be slow. That
 API does absolutely nothing of any consequence, so the only reason
 I'd expect that to be slow is if you're hitting a libvirtd RPC
 limit causing the API to be queued up.

I hadn't tuned libvirtd.conf at all. I have just increased
max_{clients,workers,requests,client_requests} to 50 and repeated my
experiment. As you expected, virConnectGetLibVersion is now very fast.
Unfortunately, the median VM creation time didn't change.

 I'm not actively doing anything in this area. Mostly because I've got no
 clear data on where any remaining bottlenecks are.

Unless there are other parameters to tweak, I believe I'm still hitting a
bottleneck. Booting 1 VM vs booting 20 VMs in parallel, the times for libvirt
calls are

virConnectDefineXML*: 13ms vs 4.5s
virDomainCreateWithFlags*: 1.8s vs 20s

* I had said that virConnectDefineXML wasn't serialized in my first email. I
  based that observation on a single trace I looked at :-) In the average case,
  virConnectDefineXML is affected by a bottleneck.

Note that when I took these measurements, I also monitored CPU & disk
utilization. During the 20 VM test, both CPU & disk were well below 100%
for 97% of the test (i.e., 60s test duration, utilization measured with
atop using a 2 second interval, CPU was pegged for 2s).

 One theory I had was that the virDomainObjListSearchName method could
 be a bottleneck, because that acquires a lock on every single VM. This
 is invoked when starting a VM, when we call virDomainObjListAddLocked.
 I tried removing this locking though & didn't see any performance
 benefit, so never pursued this further.  Before trying things like
 this again, I think we'd need to find a way to actually identify where
 the true bottlenecks are, rather than guesswork.

Testing your hypothesis would be straightforward. I'll add some
instrumentation to measure the time spent waiting for the locks and
repeat my 20 VM experiment. Or, if there's some systematic lock
profiling in place, then I can turn that on and report the results.

Thanks,
Peter



Re: [libvirt] Ongoing work on lock contention in qemu driver?

2013-05-16 Thread Daniel P. Berrange
On Thu, May 16, 2013 at 01:00:15PM -0400, Peter Feiner wrote:
  How many CPU cores are you testing on ?  That's a good improvement,
  but I'd expect the improvement to be greater as the # of cores is larger.
 
 I'm testing on 12 Cores x 2 HT per core. As I'm working on teasing out
 software bottlenecks, I'm intentionally running fewer tasks (20 parallel
 creations) than the number of logical cores (24). The memory, disk and
 network are also well over provisioned.
 
  Also did you tune /etc/libvirt/libvirtd.conf at all ? By default we
  limit a single connection to only 5 RPC calls. Beyond that calls
  queue up, even if libvirtd is otherwise idle. OpenStack uses a
  single connection for everything so will hit this. I suspect this
  would be why virConnectGetLibVersion would appear to be slow. That
  API does absolutely nothing of any consequence, so the only reason
  I'd expect that to be slow is if you're hitting a libvirtd RPC
  limit causing the API to be queued up.
 
 I hadn't tuned libvirtd.conf at all. I have just increased
 max_{clients,workers,requests,client_requests} to 50 and repeated my
 experiment. As you expected, virConnectGetLibVersion is now very fast.
 Unfortunately, the median VM creation time didn't change.
 
  I'm not actively doing anything in this area. Mostly because I've got no
  clear data on where any remaining bottlenecks are.
 
 Unless there are other parameters to tweak, I believe I'm still hitting a
 bottleneck. Booting 1 VM vs booting 20 VMs in parallel, the times for libvirt
 calls are
 
 virConnectDefineXML*: 13ms vs 4.5s
 virDomainCreateWithFlags*: 1.8s vs 20s
 
 * I had said that virConnectDefineXML wasn't serialized in my first email. I
   based that observation on a single trace I looked at :-) In the average
   case, virConnectDefineXML is affected by a bottleneck.

virConnectDefineXML would at least hit the possible bottleneck on
the virDomainObjListAddLocked method. In fact that's pretty much
the only contended lock I'd expect it to hit. Nothing else that
it runs has any serious locking involved.

 Note that when I took these measurements, I also monitored CPU & disk
 utilization. During the 20 VM test, both CPU & disk were well below 100%
 for 97% of the test (i.e., 60s test duration, utilization measured with
 atop using a 2 second interval, CPU was pegged for 2s).
 
  One theory I had was that the virDomainObjListSearchName method could
  be a bottleneck, because that acquires a lock on every single VM. This
  is invoked when starting a VM, when we call virDomainObjListAddLocked.
  I tried removing this locking though & didn't see any performance
  benefit, so never pursued this further.  Before trying things like
  this again, I think we'd need to find a way to actually identify where
  the true bottlenecks are, rather than guesswork.
 
 Testing your hypothesis would be straightforward. I'll add some
 instrumentation to measure the time spent waiting for the locks and
 repeat my 20 VM experiment. Or, if there's some systematic lock
 profiling in place, then I can turn that on and report the results.

There's no lock profiling support built-in to libvirt. I'm not sure
of the best way to introduce such support without it impacting the very
thing we're trying to test.  Suggestions welcome

Perhaps a systemtap script would do a reasonable job at it though.
eg record any stack traces associated with long futex_wait() system
calls or something like that.

Regards,
Daniel



Re: [libvirt] Ongoing work on lock contention in qemu driver?

2013-05-16 Thread Peter Feiner
On Thu, May 16, 2013 at 1:18 PM, Daniel P. Berrange berra...@redhat.com wrote:
 On Thu, May 16, 2013 at 01:00:15PM -0400, Peter Feiner wrote:
  How many CPU cores are you testing on ?  That's a good improvement,
  but I'd expect the improvement to be greater as the # of cores is larger.

 I'm testing on 12 Cores x 2 HT per core. As I'm working on teasing out
 software bottlenecks, I'm intentionally running fewer tasks (20 parallel
 creations) than the number of logical cores (24). The memory, disk and
 network are also well over provisioned.

  Also did you tune /etc/libvirt/libvirtd.conf at all ? By default we
  limit a single connection to only 5 RPC calls. Beyond that calls
  queue up, even if libvirtd is otherwise idle. OpenStack uses a
  single connection for everything so will hit this. I suspect this
  would be why virConnectGetLibVersion would appear to be slow. That
  API does absolutely nothing of any consequence, so the only reason
  I'd expect that to be slow is if you're hitting a libvirtd RPC
  limit causing the API to be queued up.

 I hadn't tuned libvirtd.conf at all. I have just increased
 max_{clients,workers,requests,client_requests} to 50 and repeated my
 experiment. As you expected, virConnectGetLibVersion is now very fast.
 Unfortunately, the median VM creation time didn't change.

  I'm not actively doing anything in this area. Mostly because I've got no
  clear data on where any remaining bottlenecks are.

 Unless there are other parameters to tweak, I believe I'm still hitting a
 bottleneck. Booting 1 VM vs booting 20 VMs in parallel, the times for libvirt
 calls are

 virConnectDefineXML*: 13ms vs 4.5s
 virDomainCreateWithFlags*: 1.8s vs 20s

 * I had said that virConnectDefineXML wasn't serialized in my first email. I
   based that observation on a single trace I looked at :-) In the average
   case, virConnectDefineXML is affected by a bottleneck.

 virConnectDefineXML would at least hit the possible bottleneck on
 the virDomainObjListAddLocked method. In fact that's pretty much
 the only contended lock I'd expect it to hit. Nothing else that
 it runs has any serious locking involved.

Okay cool, I'll measure this. I'll also try to figure out what
virDomainCreateWithFlags is waiting on.

 Note that when I took these measurements, I also monitored CPU & disk
 utilization. During the 20 VM test, both CPU & disk were well below 100%
 for 97% of the test (i.e., 60s test duration, utilization measured with
 atop using a 2 second interval, CPU was pegged for 2s).

  One theory I had was that the virDomainObjListSearchName method could
  be a bottleneck, because that acquires a lock on every single VM. This
  is invoked when starting a VM, when we call virDomainObjListAddLocked.
  I tried removing this locking though & didn't see any performance
  benefit, so never pursued this further.  Before trying things like
  this again, I think we'd need to find a way to actually identify where
  the true bottlenecks are, rather than guesswork.

 Testing your hypothesis would be straightforward. I'll add some
 instrumentation to measure the time spent waiting for the locks and
 repeat my 20 VM experiment. Or, if there's some systematic lock
 profiling in place, then I can turn that on and report the results.

 There's no lock profiling support built-in to libvirt. I'm not sure
 of the best way to introduce such support without it impacting the very
 thing we're trying to test.  Suggestions welcome

A straightforward way to keep lock statistics with low overhead and
w/out affecting concurrency would be to use thread local storage
(TLS). At the end of a run, or periodically, the stats could be
aggregated and reported. Since the stats don't have to be precise,
it's OK to do the aggregation racily.

Simple statistics to keep are

* For each lock L, the time spent waiting.
* For each lock L and callsite C, the time spent waiting.

It would probably be sufficient to identify L as the lock's parent
class name. If per-instance stats are necessary, then we could add the
address of the object to the identity of L.

So pseudo code would look something like this:

struct lock_stats {
    /* total wait time per lock class */
    map of (lock_class) to unsigned long: wait_time;
    /* total wait time per (lock class, call site) */
    map of (lock_class, stack_trace) to unsigned long: callsite_wait_time;
};

__thread struct lock_stats *lock_stats;

void thread_local_storage_init(void) {
    lock_stats = new lock_stats;
}

/* return microseconds elapsed since some arbitrary start time */
unsigned long timestamp(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

void virObjectLock(void *anyobj) {
    unsigned long start, elapsed;
    virObjectLockablePtr obj = anyobj;

    start = timestamp();
    virMutexLock(&obj->lock);
    elapsed = timestamp() - start;

    lock_stats->wait_time[obj->parent.klass->name] += elapsed;
    lock_stats->callsite_wait_time[obj->parent.klass->name, get_stack_trace()] += elapsed;
}


Re: [libvirt] Ongoing work on lock contention in qemu driver?

2013-05-16 Thread Daniel P. Berrange
On Thu, May 16, 2013 at 06:18:57PM +0100, Daniel P. Berrange wrote:
 On Thu, May 16, 2013 at 01:00:15PM -0400, Peter Feiner wrote:
   How many CPU cores are you testing on ?  That's a good improvement,
   but I'd expect the improvement to be greater as the # of cores is larger.
  
  I'm testing on 12 Cores x 2 HT per core. As I'm working on teasing out
  software bottlenecks, I'm intentionally running fewer tasks (20 parallel
  creations) than the number of logical cores (24). The memory, disk and
  network are also well over provisioned.
  
   Also did you tune /etc/libvirt/libvirtd.conf at all ? By default we
   limit a single connection to only 5 RPC calls. Beyond that calls
   queue up, even if libvirtd is otherwise idle. OpenStack uses a
   single connection for everything so will hit this. I suspect this
   would be why virConnectGetLibVersion would appear to be slow. That
   API does absolutely nothing of any consequence, so the only reason
   I'd expect that to be slow is if you're hitting a libvirtd RPC
   limit causing the API to be queued up.
  
  I hadn't tuned libvirtd.conf at all. I have just increased
  max_{clients,workers,requests,client_requests} to 50 and repeated my
  experiment. As you expected, virConnectGetLibVersion is now very fast.
  Unfortunately, the median VM creation time didn't change.
  
   I'm not actively doing anything in this area. Mostly because I've got no
   clear data on where any remaining bottlenecks are.
  
  Unless there are other parameters to tweak, I believe I'm still hitting a
  bottleneck. Booting 1 VM vs booting 20 VMs in parallel, the times for
  libvirt calls are
  
  virConnectDefineXML*: 13ms vs 4.5s
  virDomainCreateWithFlags*: 1.8s vs 20s
  
  * I had said that virConnectDefineXML wasn't serialized in my first email. I
    based that observation on a single trace I looked at :-) In the average
    case, virConnectDefineXML is affected by a bottleneck.
 
 virConnectDefineXML would at least hit the possible bottleneck on
 the virDomainObjListAddLocked method. In fact that's pretty much
 the only contended lock I'd expect it to hit. Nothing else that
 it runs has any serious locking involved.
 
  Note that when I took these measurements, I also monitored CPU & disk
  utilization. During the 20 VM test, both CPU & disk were well below 100%
  for 97% of the test (i.e., 60s test duration, utilization measured with
  atop using a 2 second interval, CPU was pegged for 2s).
  
   One theory I had was that the virDomainObjListSearchName method could
   be a bottleneck, because that acquires a lock on every single VM. This
   is invoked when starting a VM, when we call virDomainObjListAddLocked.
   I tried removing this locking though & didn't see any performance
   benefit, so never pursued this further.  Before trying things like
   this again, I think we'd need to find a way to actually identify where
   the true bottlenecks are, rather than guesswork.
  
  Testing your hypothesis would be straightforward. I'll add some
  instrumentation to measure the time spent waiting for the locks and
  repeat my 20 VM experiment. Or, if there's some systematic lock
  profiling in place, then I can turn that on and report the results.
 
 There's no lock profiling support built-in to libvirt. I'm not sure
 of the best way to introduce such support without it impacting the very
 thing we're trying to test.  Suggestions welcome
 
 Perhaps a systemtap script would do a reasonable job at it though.
 eg record any stack traces associated with long futex_wait() system
 calls or something like that.

Oh someone has already written such a systemtap script

http://sourceware.org/systemtap/examples/process/mutex-contention.stp

I think that is preferable to trying to embed special code in
libvirt for this task.

Daniel
