Re: [libvirt] [Qemu-devel] incorrect memory size inside vm

2015-06-18 Thread Andrey Korolyov
 Do you see similar results on your side?

 Best regards


Would you mind sharing your argument set for the emulator? As far as I
understand, you are using plain ballooning for most of the results
above, for which those numbers are expected. The case with 5+ GiB
memory consumption for a deflated 1G guest looks like a bug in the
mixed dimm/balloon configuration if you tried it against the latest
qemu, so please describe your setup a bit more verbosely too.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] Overhead for a default cpu cg placement scheme

2015-06-18 Thread Andrey Korolyov
On Thu, Jun 18, 2015 at 12:09 PM, Daniel P. Berrange
berra...@redhat.com wrote:
 On Wed, Jun 17, 2015 at 10:55:35PM +0300, Andrey Korolyov wrote:

 Sorry for the delay: 'perf numa numa-mem -p 8 -t 2 -P 384 -C 0 -M 0
 -s 200 -zZq --thp 1 --no-data_rand_walk' exposes a relative difference
 of 0.96 to 1. The trick I did (and had successfully forgotten) before
 was setting the value of cfs_quota in a machine-wide group, one level
 up from the individual vcpus.

 Right now, libvirt sets values from
   <cputune>
     <period>100000</period>
     <quota>200000</quota>
   </cputune>
 for each vCPU thread cgroup, which is a bit wrong by my understanding, like
 /cgroup/cpu/machine/vmxx/vcpu0: period=100000, quota=200000
 /cgroup/cpu/machine/vmxx/vcpu1: period=100000, quota=200000
 /cgroup/cpu/machine/vmxx/vcpu2: period=100000, quota=200000
 /cgroup/cpu/machine/vmxx/vcpu3: period=100000, quota=200000


 In other words, the user (me) assumed that he had limited total
 consumption of the VM to two cores, though every thread can consume
 up to a single CPU, resulting in four-core consumption instead. With
 different guest cpu count/quota/host cpu count ratios there would be
 different practical limits for the same period-to-quota ratio, whereas
 a single total quota results in a much more predictable top
 consumption. I had put the same quota-to-period ratio in the VM-level
 directory to meet the expectations set by the config setting, and
 there one can observe the mentioned performance drop.
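
 To illustrate the machine-wide variant (a sketch only, assuming cgroup
 v1 mounted at /sys/fs/cgroup and libvirt's default group naming;
 'vmxx' is a placeholder, not a path from this thread):

   # cap the whole VM at 2 cores (quota = 2 x period)
   echo 100000 > /sys/fs/cgroup/cpu/machine/vmxx/cpu.cfs_period_us
   echo 200000 > /sys/fs/cgroup/cpu/machine/vmxx/cpu.cfs_quota_us
   # leave the per-vCPU child groups unlimited
   for v in /sys/fs/cgroup/cpu/machine/vmxx/vcpu*; do
       echo -1 > "$v/cpu.cfs_quota_us"
   done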

 With the default placement there is no difference in the performance
 numbers, but the behavior of libvirt itself is somewhat controversial
 there. The documentation says this is the correct behavior as well,
 but I think that limiting the vcpu group with a total quota is far
 more flexible than per-vcpu limits, which can negatively impact
 single-threaded processes in the guest; also, the overall consumption
 must be recalculated every time the host core count or guest core
 count changes. Sorry for not mentioning the custom scheme before; if
 my assumption about execution flexibility is plainly wrong, I'll
 withdraw my concerns from above. I have been using 'my' scheme for a
 couple of years in production and it has proved (for me) to be far
 less complex for workload balancing on a cpu-congested hypervisor
 than the generic one.

 As you say, there are two possible directions libvirt could take
 when implementing the scheduler tunables: either apply them to the
 VM as a whole, or apply them to the individual vCPUs. We debated this
 a fair bit, but in the end we took the per-vCPU approach. There were
 two really compelling reasons. First, if users have 2 guests with
 identical configurations, but give one of the guests 2 vCPUs and the
 other guest 4 vCPUs, the general expectation is that the one with
 4 vCPUs will have twice the performance. If we applied the CFS tuning
 at the VM level, then as you added vCPUs you'd get no increase in
 performance. The second reason was that people wanted to be able to
 control the performance of the emulator threads separately from the
 vCPU threads. Now we also have dedicated I/O threads that can have
 different tuning set. This would be impossible if we were always
 setting things at the VM level.
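
 For reference, the separate per-thread-class tuning maps to domain XML
 roughly like this (element names as in the libvirt documentation; the
 values here are illustrative only):

   <cputune>
     <period>100000</period>
     <quota>200000</quota>
     <emulator_period>100000</emulator_period>
     <emulator_quota>50000</emulator_quota>
   </cputune>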

 It would in theory be possible for us to add a further tunable to the
 VM config which allowed VM-level tuning. E.g. we could define something
 like

   <vmtune>
     <period>100000</period>
     <quota>200000</quota>
   </vmtune>

 Semantically, if vmtune was set, we would then forbid use of the
 cputune and emulatortune configurations, as they'd be mutually
 exclusive. In such a case we'd avoid creating the sub-cgroups for
 vCPUs and emulator threads, etc.

 The question is whether the benefit would outweigh the extra code
 complexity to deal with this. I appreciate that you would like this
 kind of setup, but I think we'd probably need more than one person
 requesting it in order to justify the work involved.


Thanks for a quite awesome explanation! I see: the thing that is
obvious for Xen-era hosting (more vCPUs means more power) was not
obvious to me. I agree that, with the approach I proposed, a smaller
number of more powerful cores is always preferable to a large set of
'weak on average' cores. What is still confusing is that one must mind
*three* separate things when setting a limit in the current scheme:
the real or HT core count, the VM's core count, and the quota-to-period
ratio itself, to determine the upper cap on a given VM's consumption.
It would be even more confusing when we start talking about share
ratios: to me it is completely unclear how two VMs with a 2:1 share
ratio for both vCPUs and emulator would behave. Will the emulator
thread starve first under CPU congestion, or vice versa? Will many
vCPU processes with a share equal to the emulator's exert enough
influence inside a capped node to shift the actually available
bandwidths away from 2:1? Will the guest
emulator

Re: [libvirt] Overhead for a default cpu cg placement scheme

2015-06-17 Thread Andrey Korolyov
On Thu, Jun 11, 2015 at 4:30 PM, Daniel P. Berrange berra...@redhat.com wrote:
 On Thu, Jun 11, 2015 at 04:24:18PM +0300, Andrey Korolyov wrote:
 On Thu, Jun 11, 2015 at 4:13 PM, Daniel P. Berrange berra...@redhat.com 
 wrote:
  On Thu, Jun 11, 2015 at 04:06:59PM +0300, Andrey Korolyov wrote:
  On Thu, Jun 11, 2015 at 2:33 PM, Daniel P. Berrange berra...@redhat.com 
  wrote:
   On Thu, Jun 11, 2015 at 02:16:50PM +0300, Andrey Korolyov wrote:
   On Thu, Jun 11, 2015 at 2:09 PM, Daniel P. Berrange 
   berra...@redhat.com wrote:
On Thu, Jun 11, 2015 at 01:50:24PM +0300, Andrey Korolyov wrote:
Hi Daniel,
   
would it be possible to adopt an optional tunable for the virCgroup
mechanism that disables nested (per-thread) cgroup creation? Those
bring visible overhead for many-threaded guest workloads, almost 5% in
a non-congested host CPU state, primarily because the host scheduler
has to make many more decisions with those cgroups than without them.
We also experienced a lot of host lockups with the currently used
cgroup placement and disabled nested behavior a couple of years ago.
Though the current patch simply carves out the mentioned behavior,
leaving only top-level per-machine cgroups, it could serve upstream
after some adaptation; that's why I'm asking about its chances of
acceptance. This message is a kind of feature request: it can either
be accepted/dropped on our side, or someone may lend a hand and redo
it from scratch. The detailed benchmarks were taken against a 3.10.y
host; if anyone is interested in the numbers for the latest stable, I
can update them.
   
When you say nested cgroup creation, are you referring to the modern
libvirt hierarchy or the legacy hierarchy, as described here:

  http://libvirt.org/cgroups.html

The current libvirt setup, used for a year or so now, is much shallower
than previously, to the extent that we'd consider performance problems
with it to be the job of the kernel to fix.
  
   Thanks, I'm referring to the 'new nested' hierarchy for the overhead
   mentioned above. The host crashes I mentioned happened with the old
   hierarchy a while back; I forgot to mention this. Despite the
   flattening of the topology in the current scheme, it should be
   possible to disable fine-grained group creation for the VM threads
   for users who don't need per-vcpu cpu pinning/accounting (the
   overhead is caused by the placement for the cpu cgroup, not by the
   accounting/pinning ones; I'm assuming equal distribution with such a
   disablement for all nested-aware cgroup types); that's the point for
   now.
  
   Ok, so the per-vCPU cgroups are used for a couple of things:

    - Setting scheduler tunables - period/quota/shares/etc
    - Setting CPU pinning
    - Setting NUMA memory pinning

   In addition to the per-vCPU cgroup, we have one cgroup for each
   I/O thread, and also one more for general QEMU emulator threads.

   In the case of CPU pinning we already have automatic fallback to
   sched_setaffinity if the CPUSET controller isn't available.

   We could in theory start off without the per-vCPU/emulator/I/O
   cgroups and only create them as and when the feature is actually
   used. The concern I would have, though, is that changing the cgroups
   layout on the fly may cause unexpected side effects in the behaviour
   of the VM. More critically, there would be a lot of places in the
   code where we would need to deal with this, which could hurt
   maintainability.

   How confident are you that the performance problems you see are
   inherent to the actual use of the cgroups, and not instead a result
   of some particular bad choice of default parameters we might have
   left in the cgroups? In general I'd have a desire to try to work to
   eliminate the perf impact before we consider the complexity of
   disabling this feature.
  
   Regards,
   Daniel
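
   For illustration, the three per-vCPU uses listed above map to domain
   XML along these lines (element names per the libvirt documentation;
   the values are invented for the example):

     <cputune>
       <shares>1024</shares>
       <vcpupin vcpu='0' cpuset='1-2'/>
     </cputune>
     <numatune>
       <memory mode='strict' nodeset='0'/>
     </numatune>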
 
  Hmm, what are you proposing to begin with, in testing terms? By my
  understanding, excessive cgroup usage along with small scheduler
  quanta *will* lead to some overhead anyway. Let's look at the numbers
  I'll bring tomorrow; the mentioned five percent was caught on a guest
  'perf numa xxx' run for different kinds of mappings and host behavior
  (post-3.8): memory auto-migration on/off, a kind of 'numa
  passthrough' (grouping vcpu threads according to the host and
  emulated guest NUMA topologies), and totally scattered and unpinned
  threads within a single NUMA node and across multiple NUMA nodes. As
  the result for 3.10.y, there was a five-percent difference between
  the best-performing case with thread-level cpu cgroups and a 'totally
  scattered' case on a simple mid-range two-socket node. If you think
  the choice of emulated workload is wrong, please let me know; I was
  afraid that a non-synthetic workload in the guest might suffer from a
  range of side factors and therefore chose perf for this task.

Re: [libvirt] [Qemu-devel] incorrect memory size inside vm

2015-06-17 Thread Andrey Korolyov
On Thu, Jun 18, 2015 at 12:21 AM, Vasiliy Tolstov v.tols...@selfip.ru wrote:
 2015-06-17 19:26 GMT+03:00 Vasiliy Tolstov v.tols...@selfip.ru:
 This is bad news =( I have Debian wheezy, which has an old kernel...


 Is it possible to get proper results with the balloon? For example, by
 patching qemu or something like that?



Yes, but I'm afraid I don't fully understand why you need this when
the pure hotplug mechanism is available, aside maybe from the nice
memory stats and easy-to-use deflation the balloon offers. Just
populate a couple of static DIMMs alongside a small enough 'base' e820
memory and use the balloon on top of this setup; you'll get a reserved
memory footprint as small as it would be in a setup with an equal
overall amount of memory populated via the BIOS. For example, you may
use an '-m 128 ... {some amount of memory placed in memory slots}'
setup to achieve what you want.
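
A minimal sketch of such an invocation, assuming QEMU's post-2.1
memory-hotplug syntax (the IDs and sizes are illustrative, not taken
from this thread):

  qemu-system-x86_64 \
    -m 128M,slots=4,maxmem=16G \
    -object memory-backend-ram,id=mem1,size=4G \
    -device pc-dimm,id=dimm1,memdev=mem1 \
    -device virtio-balloon-pci,id=balloon0

Here '-m 128M' keeps the e820 'base' memory small, the pc-dimm carries
the bulk of the RAM, and the balloon is kept only for stats and easy
deflation.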

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] incorrect memory size inside vm

2015-06-17 Thread Andrey Korolyov
On Thu, Jun 18, 2015 at 1:44 AM, Vasiliy Tolstov v.tols...@selfip.ru wrote:
 2015-06-18 1:40 GMT+03:00 Andrey Korolyov and...@xdel.ru:

 Yes, but I'm afraid I don't fully understand why you need this when
 the pure hotplug mechanism is available, aside maybe from the nice
 memory stats and easy-to-use deflation the balloon offers. Just
 populate a couple of static DIMMs alongside a small enough 'base' e820
 memory and use the balloon on top of this setup; you'll get a reserved
 memory footprint as small as it would be in a setup with an equal
 overall amount of memory populated via the BIOS. For example, you may
 use an '-m 128 ... {some amount of memory placed in memory slots}'
 setup to achieve what you want.


 I have Debian wheezy guests with 3.4 kernels (or 3.2...) and many
 others, like 32-bit CentOS 6, openSUSE, Ubuntu, and so on.
 Does memory hotplug work with these distros (kernels)?


Whoosh... technically it is possible, but it would mean an incompatible
fork of both upstream SeaBIOS and QEMU, because the generic way of
plugging DIMMs in is available in guests down to at least a generic
2.6.32. Except maybe for CentOS, where broken kABI would bring serious
consequences, it may be better to just provide a backport repository
with newer kernels, but that doesn't sound very optimistic. For the
historical record: the initial hotplug support proposal from Vasilis
Liaskovitis a couple of years ago worked in exactly the way you are
suggesting, but resurrecting it would mean altering the emulator and
ROM code, as I said above.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] incorrect memory size inside vm

2015-06-17 Thread Andrey Korolyov
On Wed, Jun 17, 2015 at 4:35 PM, Vasiliy Tolstov v.tols...@selfip.ru wrote:
 Hi. I have an issue with an incorrect memory size inside a VM. I'm
 trying to utilize the memory balloon (not memory hotplug, because I
 (may) have guests without memory hotplug support).

 When the domain is started with static memory, everything works fine,
 but when I specify memory = 16384, maxMemory = 16384 and
 currentMemory = 1024 in libvirt, the guest's /proc/meminfo says it has
 only 603608 kB of memory. When I then set the memory to 2 GB via virsh
 setmem, the guest sees only 1652184 kB.

 software versions
 libvirt: 1.2.10
 qemu: 2.3.0
 Guest OS: centos 6.

 qemu.log:
 LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
 HOME=/root USER=root LOGNAME=root QEMU_AUDIO_DRV=none /usr/bin/kvm
 -name 26543 -S -machine pc-i440fx-1.7,accel=kvm,usb=off -m 1024
 -realtime mlock=off -smp 1,maxcpus=4,sockets=4,cores=1,threads=1 -uuid
 4521fb01-c2ca-4269-d2d6-035fd910 -no-user-config -nodefaults
 -chardev 
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/26543.monitor,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
 -no-shutdown -boot strict=on -device
 piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
 virtio-scsi-pci,id=scsi0,num_queues=1,bus=pci.0,addr=0x4 -device
 virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive
 file=/dev/vg4/26543,if=none,id=drive-scsi0-0-0-0,format=raw,cache=none,discard=unmap,aio=native,iops=5000
 -device 
 scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
 -drive if=none,id=drive-scsi0-0-1-0,readonly=on,format=raw -device
 scsi-cd,bus=scsi0.0,channel=0,scsi-id=1,lun=0,drive=drive-scsi0-0-1-0,id=scsi0-0-1-0
 -netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=52 -device
 virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:00:34:f7,bus=pci.0,addr=0x3,rombar=0
 -chardev pty,id=charserial0 -device
 isa-serial,chardev=charserial0,id=serial0 -chardev
 socket,id=charchannel0,path=/var/lib/libvirt/qemu/26543.agent,server,nowait
 -device 
 virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
 -device usb-mouse,id=input0 -device usb-kbd,id=input1 -vnc
 [::]:8,password -device VGA,id=video0,bus=pci.0,addr=0x2 -device
 virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -object
 rng-random,id=rng0,filename=/dev/random -device
 virtio-rng-pci,rng=rng0,max-bytes=1024,period=2000,bus=pci.0,addr=0x7
 -msg timestamp=on

 --
 Vasiliy Tolstov,
 e-mail: v.tols...@selfip.ru



The rest of the visible memory is eaten by reserved kernel areas; for
us this was the main reason to switch to hotplug a couple of years ago.
Unfortunately, you will not be able to scale a VM by an order of
magnitude with the regular balloon mechanism without the mentioned
impact. Igor Mammedov posted hotplug-related patches for 2.6.32 a while
ago, though RHEL6 never adopted them for some reason.
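
A quick way to observe the effect (a sketch; <domain> is a
placeholder):

  # inside the guest: MemTotal stays below the nominal size, since the
  # kernel sizes its reserved areas for the boot-time maximum
  grep MemTotal /proc/meminfo

  # on the host: compare the balloon's 'actual' value with what the
  # guest reports
  virsh dommemstat <domain>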

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] incorrect memory size inside vm

2015-06-17 Thread Andrey Korolyov
On Wed, Jun 17, 2015 at 6:33 PM, Vasiliy Tolstov v.tols...@selfip.ru wrote:
 2015-06-17 17:09 GMT+03:00 Andrey Korolyov and...@xdel.ru:
 The rest of the visible memory is eaten by reserved kernel areas; for
 us this was the main reason to switch to hotplug a couple of years
 ago. Unfortunately, you will not be able to scale a VM by an order of
 magnitude with the regular balloon mechanism without the mentioned
 impact. Igor Mammedov posted hotplug-related patches for 2.6.32 a
 while ago, though RHEL6 never adopted them for some reason.


 Hmm... thanks for the info. From which kernel version does memory hotplug work?

 --
 Vasiliy Tolstov,
 e-mail: v.tols...@selfip.ru

Currently, QEMU memory hotplug should work with 3.8 onwards. The
patches mentioned are an adaptation of the 3.8 functionality for an
older frankenkernel.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] Overhead for a default cpu cg placement scheme

2015-06-11 Thread Andrey Korolyov
Hi Daniel,

would it be possible to adopt an optional tunable for the virCgroup
mechanism that disables nested (per-thread) cgroup creation? Those
bring visible overhead for many-threaded guest workloads, almost 5% in
a non-congested host CPU state, primarily because the host scheduler
has to make many more decisions with those cgroups than without them.
We also experienced a lot of host lockups with the currently used
cgroup placement and disabled nested behavior a couple of years ago.
Though the current patch simply carves out the mentioned behavior,
leaving only top-level per-machine cgroups, it could serve upstream
after some adaptation; that's why I'm asking about its chances of
acceptance. This message is a kind of feature request: it can either
be accepted/dropped on our side, or someone may lend a hand and redo
it from scratch. The detailed benchmarks were taken against a 3.10.y
host; if anyone is interested in the numbers for the latest stable, I
can update them.

Thanks!

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] Overhead for a default cpu cg placement scheme

2015-06-11 Thread Andrey Korolyov
On Thu, Jun 11, 2015 at 2:09 PM, Daniel P. Berrange berra...@redhat.com wrote:
 On Thu, Jun 11, 2015 at 01:50:24PM +0300, Andrey Korolyov wrote:
 Hi Daniel,

 would it be possible to adopt an optional tunable for the virCgroup
 mechanism that disables nested (per-thread) cgroup creation? Those
 bring visible overhead for many-threaded guest workloads, almost 5% in
 a non-congested host CPU state, primarily because the host scheduler
 has to make many more decisions with those cgroups than without them.
 We also experienced a lot of host lockups with the currently used
 cgroup placement and disabled nested behavior a couple of years ago.
 Though the current patch simply carves out the mentioned behavior,
 leaving only top-level per-machine cgroups, it could serve upstream
 after some adaptation; that's why I'm asking about its chances of
 acceptance. This message is a kind of feature request: it can either
 be accepted/dropped on our side, or someone may lend a hand and redo
 it from scratch. The detailed benchmarks were taken against a 3.10.y
 host; if anyone is interested in the numbers for the latest stable, I
 can update them.

 When you say nested cgroup creation, are you referring to the modern
 libvirt hierarchy or the legacy hierarchy, as described here:

   http://libvirt.org/cgroups.html

 The current libvirt setup, used for a year or so now, is much shallower
 than previously, to the extent that we'd consider performance problems
 with it to be the job of the kernel to fix.

 Regards,
 Daniel
 --


Thanks, I'm referring to the 'new nested' hierarchy for the overhead
mentioned above. The host crashes I mentioned happened with the old
hierarchy a while back; I forgot to mention this. Despite the
flattening of the topology in the current scheme, it should be possible
to disable fine-grained group creation for the VM threads for users who
don't need per-vcpu cpu pinning/accounting (the overhead is caused by
the placement for the cpu cgroup, not by the accounting/pinning ones;
I'm assuming equal distribution with such a disablement for all
nested-aware cgroup types); that's the point for now.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] Overhead for a default cpu cg placement scheme

2015-06-11 Thread Andrey Korolyov
On Thu, Jun 11, 2015 at 2:33 PM, Daniel P. Berrange berra...@redhat.com wrote:
 On Thu, Jun 11, 2015 at 02:16:50PM +0300, Andrey Korolyov wrote:
 On Thu, Jun 11, 2015 at 2:09 PM, Daniel P. Berrange berra...@redhat.com 
 wrote:
  On Thu, Jun 11, 2015 at 01:50:24PM +0300, Andrey Korolyov wrote:
  Hi Daniel,
 
  would it be possible to adopt an optional tunable for the virCgroup
  mechanism that disables nested (per-thread) cgroup creation? Those
  bring visible overhead for many-threaded guest workloads, almost 5%
  in a non-congested host CPU state, primarily because the host
  scheduler has to make many more decisions with those cgroups than
  without them. We also experienced a lot of host lockups with the
  currently used cgroup placement and disabled nested behavior a couple
  of years ago. Though the current patch simply carves out the
  mentioned behavior, leaving only top-level per-machine cgroups, it
  could serve upstream after some adaptation; that's why I'm asking
  about its chances of acceptance. This message is a kind of feature
  request: it can either be accepted/dropped on our side, or someone
  may lend a hand and redo it from scratch. The detailed benchmarks
  were taken against a 3.10.y host; if anyone is interested in the
  numbers for the latest stable, I can update them.
 
  When you say nested cgroup creation, are you referring to the modern
  libvirt hierarchy or the legacy hierarchy, as described here:

    http://libvirt.org/cgroups.html

  The current libvirt setup, used for a year or so now, is much
  shallower than previously, to the extent that we'd consider
  performance problems with it to be the job of the kernel to fix.

 Thanks, I'm referring to the 'new nested' hierarchy for the overhead
 mentioned above. The host crashes I mentioned happened with the old
 hierarchy a while back; I forgot to mention this. Despite the
 flattening of the topology in the current scheme, it should be
 possible to disable fine-grained group creation for the VM threads for
 users who don't need per-vcpu cpu pinning/accounting (the overhead is
 caused by the placement for the cpu cgroup, not by the
 accounting/pinning ones; I'm assuming equal distribution with such a
 disablement for all nested-aware cgroup types); that's the point for
 now.

 Ok, so the per-vCPU cgroups are used for a couple of things:

  - Setting scheduler tunables - period/quota/shares/etc
  - Setting CPU pinning
  - Setting NUMA memory pinning

 In addition to the per-vCPU cgroup, we have one cgroup for each
 I/O thread, and also one more for general QEMU emulator threads.

 In the case of CPU pinning we already have automatic fallback to
 sched_setaffinity if the CPUSET controller isn't available.

 We could in theory start off without the per-vCPU/emulator/I/O
 cgroups and only create them as and when the feature is actually
 used. The concern I would have, though, is that changing the cgroups
 layout on the fly may cause unexpected side effects in the behaviour
 of the VM. More critically, there would be a lot of places in the code
 where we would need to deal with this, which could hurt
 maintainability.

 How confident are you that the performance problems you see are
 inherent to the actual use of the cgroups, and not instead a result of
 some particular bad choice of default parameters we might have left in
 the cgroups? In general I'd have a desire to try to work to eliminate
 the perf impact before we consider the complexity of disabling this
 feature.

 Regards,
 Daniel

Hmm, what are you proposing to begin with, in testing terms? By my
understanding, excessive cgroup usage along with small scheduler
quanta *will* lead to some overhead anyway. Let's look at the numbers
I'll bring tomorrow; the mentioned five percent was caught on a guest
'perf numa xxx' run for different kinds of mappings and host behavior
(post-3.8): memory auto-migration on/off, a kind of 'numa passthrough'
(grouping vcpu threads according to the host and emulated guest NUMA
topologies), and totally scattered and unpinned threads within a
single NUMA node and across multiple NUMA nodes. As the result for
3.10.y, there was a five-percent difference between the best-performing
case with thread-level cpu cgroups and a 'totally scattered' case on a
simple mid-range two-socket node. If you think the choice of emulated
workload is wrong, please let me know; I was afraid that a
non-synthetic workload in the guest might suffer from a range of side
factors and therefore chose perf for this task.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] Overhead for a default cpu cg placement scheme

2015-06-11 Thread Andrey Korolyov
On Thu, Jun 11, 2015 at 4:13 PM, Daniel P. Berrange berra...@redhat.com wrote:
 On Thu, Jun 11, 2015 at 04:06:59PM +0300, Andrey Korolyov wrote:
 On Thu, Jun 11, 2015 at 2:33 PM, Daniel P. Berrange berra...@redhat.com 
 wrote:
  On Thu, Jun 11, 2015 at 02:16:50PM +0300, Andrey Korolyov wrote:
  On Thu, Jun 11, 2015 at 2:09 PM, Daniel P. Berrange berra...@redhat.com 
  wrote:
   On Thu, Jun 11, 2015 at 01:50:24PM +0300, Andrey Korolyov wrote:
   Hi Daniel,
  
   would it be possible to adopt an optional tunable for the virCgroup
   mechanism that disables nested (per-thread) cgroup creation? Those
   bring visible overhead for many-threaded guest workloads, almost 5%
   in a non-congested host CPU state, primarily because the host
   scheduler has to make many more decisions with those cgroups than
   without them. We also experienced a lot of host lockups with the
   currently used cgroup placement and disabled nested behavior a
   couple of years ago. Though the current patch simply carves out the
   mentioned behavior, leaving only top-level per-machine cgroups, it
   could serve upstream after some adaptation; that's why I'm asking
   about its chances of acceptance. This message is a kind of feature
   request: it can either be accepted/dropped on our side, or someone
   may lend a hand and redo it from scratch. The detailed benchmarks
   were taken against a 3.10.y host; if anyone is interested in the
   numbers for the latest stable, I can update them.
  
   When you say nested cgroup creation, are you referring to the modern
   libvirt hierarchy or the legacy hierarchy, as described here:

     http://libvirt.org/cgroups.html

   The current libvirt setup, used for a year or so now, is much
   shallower than previously, to the extent that we'd consider
   performance problems with it to be the job of the kernel to fix.
 
  Thanks, I'm referring to the 'new nested' hierarchy for the overhead
  mentioned above. The host crashes I mentioned happened with the old
  hierarchy a while back; I forgot to mention this. Despite the
  flattening of the topology in the current scheme, it should be
  possible to disable fine-grained group creation for the VM threads
  for users who don't need per-vcpu cpu pinning/accounting (the
  overhead is caused by the placement for the cpu cgroup, not by the
  accounting/pinning ones; I'm assuming equal distribution with such a
  disablement for all nested-aware cgroup types); that's the point for
  now.
 
  Ok, so the per-vCPU cgroups are used for a couple of things:

   - Setting scheduler tunables - period/quota/shares/etc
   - Setting CPU pinning
   - Setting NUMA memory pinning

  In addition to the per-vCPU cgroup, we have one cgroup for each
  I/O thread, and also one more for general QEMU emulator threads.

  In the case of CPU pinning we already have automatic fallback to
  sched_setaffinity if the CPUSET controller isn't available.

  We could in theory start off without the per-vCPU/emulator/I/O
  cgroups and only create them as and when the feature is actually
  used. The concern I would have, though, is that changing the cgroups
  layout on the fly may cause unexpected side effects in the behaviour
  of the VM. More critically, there would be a lot of places in the
  code where we would need to deal with this, which could hurt
  maintainability.

  How confident are you that the performance problems you see are
  inherent to the actual use of the cgroups, and not instead a result
  of some particular bad choice of default parameters we might have
  left in the cgroups? In general I'd have a desire to try to work to
  eliminate the perf impact before we consider the complexity of
  disabling this feature.
 
  Regards,
  Daniel

 Hmm, what are you proposing to begin with, in testing terms? By my
 understanding, excessive cgroup usage along with small scheduler
 quanta *will* lead to some overhead anyway. Let's look at the numbers
 I'll bring tomorrow; the mentioned five percent was caught on a guest
 'perf numa xxx' run for different kinds of mappings and host behavior
 (post-3.8): memory auto-migration on/off, a kind of 'numa passthrough'
 (grouping vcpu threads according to the host and emulated guest NUMA
 topologies), and totally scattered and unpinned threads within a
 single NUMA node and across multiple NUMA nodes. As the result for
 3.10.y, there was a five-percent difference between the
 best-performing case with thread-level cpu cgroups and a 'totally
 scattered' case on a simple mid-range two-socket node. If you think
 the choice of emulated workload is wrong, please let me know; I was
 afraid that a non-synthetic workload in the guest might suffer from a
 range of side factors and therefore chose perf for this task.

 Benchmarking isn't my area of expertise, but you should be able to just
 disable the CPUSET controller entirely in qemu.conf. If we got some
 comparative results with & without CPUSET, that'd be an interesting
 place to start.
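
 For reference, a sketch of how that would look in /etc/libvirt/qemu.conf,
 using the existing cgroup_controllers setting and simply leaving
 "cpuset" out of the list:

   cgroup_controllers = [ "cpu", "devices", "memory", "blkio", "cpuacct" ]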

[libvirt] Adding timestamps for all emulator output

2015-02-26 Thread Andrey Korolyov
Hello,

I think it would be useful if libvirt were able to prefix all messages
from the emulator pipes with a date stamp. For example, I am trying to
catch a very rare and non-fatal race with

virtio-serial-bus: Guest failure in adding device virtio-serial0.0

which is specific to Windows guests on qemu-kvm. I have supporting
infrastructure which can tell me the exact times of every action upon
this VM, but as the bug is infrequent, continuous monitoring looks
like overkill for the desired goal: finding a correlation between an
emulator event and a message on stderr. Patching the emulator itself
is barely an option, as it is ugly, requires refreshing the running
code via loopback migration, and is completely non-universal, as every
hypervisor would have to be modified separately.

I'd highly appreciate positive consideration of such a functionality
addition, or I can write a patch shortly if needed. Thanks!

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] Adding timestamps for all emulator output

2015-02-26 Thread Andrey Korolyov
On Thu, Feb 26, 2015 at 5:36 PM, Daniel P. Berrange berra...@redhat.com wrote:
 On Thu, Feb 26, 2015 at 06:29:49PM +0400, Andrey Korolyov wrote:
 Hello,

 I think it would be useful if libvirt were able to prefix all messages
 from the emulator pipes with a date stamp. For example, I am trying to
 catch a very rare and non-fatal race with

 virtio-serial-bus: Guest failure in adding device virtio-serial0.0

 which is specific to Windows guests on qemu-kvm. I have supporting
 infrastructure which can tell me the exact times of every action upon
 this VM, but as the bug is infrequent, continuous monitoring looks
 like overkill for the desired goal: finding a correlation between an
 emulator event and a message on stderr. Patching the emulator itself
 is barely an option, as it is ugly, requires refreshing the running
 code via loopback migration, and is completely non-universal, as every
 hypervisor would have to be modified separately.

 I'd highly appreciate positive consideration of such a functionality
 addition, or I can write a patch shortly if needed. Thanks!

 Just set 'log_timestamp' in /etc/libvirt/qemu.conf and QEMU will then
 include a timestamp in any messages it writes to the stderr log file.



Thanks Daniel, this will definitely help me! Anyway, is there a reason
why a more generic way to add timestamps in libvirt for any given
hypervisor was never created, say, an incompatibility at some point?
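
For reference, a minimal sketch of that setting (the value syntax is
assumed to follow the usual qemu.conf boolean convention):

  # /etc/libvirt/qemu.conf
  log_timestamp = 1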

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] strange behavior when using iotune

2014-11-24 Thread Andrey Korolyov
On Mon, Nov 24, 2014 at 3:02 PM, Vasiliy Tolstov v.tols...@selfip.ru wrote:
 Hi. I'm trying to shape a disk via total_iops_sec in libvirt:
 libvirt 1.2.10
 qemu 2.0.0

 First, when I run the VM with a predefined
 <total_iops_sec>5000</total_iops_sec> I get around 11000 iops (dd
 if=/dev/sda bs=512K of=/dev/null).
 After that I try to set --total_iops_sec 10 via virsh, wanting to
 minimize io, but nothing changes.
 After that I reboot the VM with <total_iops_sec>10</total_iops_sec>
 and get very slow io, but this is expected. However, libvirt says that
 I get around 600 iops.

 My questions are: why can't I change total_iops_sec at run time, and
 why don't the entered values equal the values reported by libvirt?

 Thanks for any suggestions and any help.

 --
 Vasiliy Tolstov,
 e-mail: v.tols...@selfip.ru
 jabber: v...@selfip.ru


Hello Vasiliy,

could you please check the actual values via qemu-monitor-command
domid '{ "execute": "query-block" }', just to be sure we can pin the
potential problem on the emulator itself?

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] strange behavior when using iotune

2014-11-24 Thread Andrey Korolyov
On Mon, Nov 24, 2014 at 5:09 PM, Vasiliy Tolstov v.tols...@selfip.ru wrote:
 2014-11-24 16:57 GMT+03:00 Andrey Korolyov and...@xdel.ru:
 Hello Vasiliy,

 can you please check actual values via qemu-monitor-command domid '{
 execute: query-block}', just to be sure to pin the potential
 problem to the emulator itself?

 virsh qemu-monitor-command 11151 '{ "execute": "query-block" }' | jq '.'
 {
   "return": [
     {
       "io-status": "ok",
       "device": "drive-scsi0-0-0-0",
       "locked": false,
       "removable": false,
       "inserted": {
         "iops_rd": 0,
         "image": {
           "virtual-size": 21474836480,
           "filename": "/dev/vg3/11151",
           "format": "raw",
           "actual-size": 0,
           "dirty-flag": false
         },
         "iops_wr": 0,
         "ro": false,
         "backing_file_depth": 0,
         "drv": "raw",
         "iops": 5000,
         "bps_wr": 0,
         "encrypted": false,
         "bps": 0,
         "bps_rd": 0,
         "iops_max": 500,
         "file": "/dev/vg3/11151",
         "encryption_key_missing": false
       },
       "type": "unknown"
     }
   ],
   "id": "libvirt-22"
 }

 I used this site:
 http://www.ssdfreaks.com/content/599/how-to-convert-mbps-to-iops-or-calculate-iops-from-mbs
 root@11151:~# dd if=/dev/sda bs=4K of=/dev/null
 5242880+0 records in
 5242880+0 records out
 21474836480 bytes (21 GB) copied, 45.2557 s, 475 MB/s

 so in the case of 5000 iops I should see only 19-20 MB/s
 (5000 iops x 4 KiB = ~19.5 MiB/s)


 --
 Vasiliy Tolstov,
 e-mail: v.tols...@selfip.ru
 jabber: v...@selfip.ru

I am not sure how friendly dd's interpretation is to the new
leaky-bucket mechanism, as its results can be a little confusing even
for fio (all operations above the limit in a long-running test will
see 250 ms latency, pushing down the scores in most popular tests like
UnixBench); also, without sync options these results are almost
meaningless. fio with direct=1 (or fsync=1 on a filesystem) may give
more appropriate numbers in your case.
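
A sketch of such a measurement (the job parameters are illustrative,
not from this thread):

  fio --name=iops-check --filename=/dev/sda --rw=randread \
      --bs=4k --direct=1 --ioengine=libaio --iodepth=8 \
      --runtime=60 --time_based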

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] libvirt accidentally destroys guests after being restarted

2013-03-02 Thread Andrey Korolyov
Sorry in advance for a possible top-post; I'm not able to add a proper
message-id here.

Does it ever occur if you don't run with DHCP snooping enabled?

  Stefan

No, please disregard those errors. We don't run DHCP snooping/IP
learning on interfaces, only modified clean-traffic rules; the current
issue almost certainly has no relation to it.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] libvirt accidentally destroys guests after being restarted

2013-03-02 Thread Andrey Korolyov
Stefan,

qemu-1.1.2 with the dfsg-5 patchset, e.g.
http://ftp.de.debian.org/debian/pool/main/q/qemu/qemu_1.1.2+dfsg-5.debian.tar.gz

The VM's XML is quite simple: qemu64 cpu model, pc-1.1 machine model,
one virtio disk, two bridged virtio NICs, and serial and virtio-serial
ptys.

On Sat, Mar 2, 2013 at 7:26 PM, Stefan Berger
stef...@linux.vnet.ibm.com wrote:
 On 03/02/2013 09:39 AM, Andrey Korolyov wrote:

 Sorry in advance for a possible top-post; I'm not able to add a proper
 message-id here.

 Does it ever occur if you don't run with DHCP snooping enabled?

   Stefan

 No, please disregard those errors. We don't run DHCP snooping/IP
 learning on interfaces, only modified clean-traffic rules; the current
 issue almost certainly has no relation to it.

 Ok. Nevertheless, I may convert the noise from NWFilter DHCP snooping
 to only become active when debugging is enabled.

 Other information that may be of interest is the checkout revision of
 libvirt 'beyond' version 1.0.2. What version of QEMU are you using?
 And possibly, what XML are your VMs using? At least that's the info I
 would use to start debugging this, though others may have something in
 the back of their minds...

Stefan


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list