Re: [Qemu-devel] Overcommiting cpu results in all vms offline
Hi Stefan,

Indeed, this is really strange. Could it be a systemd bug? (As you use Proxmox, and Proxmox uses a systemd scope to launch VMs.)

- Original Message -
From: "Stefan Priebe, Profihost AG"
To: "qemu-devel"
Sent: Sunday, 16 September 2018 15:30:34
Subject: [Qemu-devel] Overcommiting cpu results in all vms offline

Hello,

while overcommitting CPU I had several situations where all VMs went offline while two VMs saturated all cores. I believed all VMs would stay online but would just not be able to use all their cores?

My original idea was to automate live migration on high host load to move VMs to another node, but that only makes sense if all VMs stay online.

Is this expected? Anything special needed to achieve this?

Greets,
Stefan
Re: [Qemu-devel] [Bug 1775702] Re: High host CPU load and slower guest after upgrade guest OS Windows 10 to ver 1803
Hi,

Proxmox users have reported this bug: https://forum.proxmox.com/threads/high-cpu-load-for-windows-10-guests-when-idle.44531/#post-213876

The hv_synic and hv_stimer Hyper-V enlightenments fix it (it seems to be related to some HPET change in Windows).

- Original Message -
From: "Lemos Lemosov" <1775...@bugs.launchpad.net>
To: "qemu-devel"
Sent: Wednesday, 1 August 2018 08:43:46
Subject: [Qemu-devel] [Bug 1775702] Re: High host CPU load and slower guest after upgrade guest OS Windows 10 to ver 1803

This bug affects me.

--
You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1775702

Title: High host CPU load and slower guest after upgrade guest OS Windows 10 to ver 1803

Status in QEMU: New

Bug description:
After upgrading a Windows 10 guest to version 1803, the guest VM runs slower and there is high host CPU load even when the guest is almost idle. This did not happen with Windows 10 up to version 1709.

See my first report here: https://askubuntu.com/questions/1033985/kvm-high-host-cpu-load-after-upgrading-vm-to-windows-10-1803
Another user report is here: https://lime-technology.com/forums/topic/71479-windows-10-vm-cpu-usage/

Tested on: Ubuntu 16.04 with qemu 2.5.0 and an i3-3217U, Arch with qemu 2.12 and an i5-7200U, and Ubuntu 18.04 with qemu 2.11.1 and an AMD FX-4300. All three platforms show the same slowdown and higher host CPU load with a Windows 10 1803 VM compared to a Windows 10 1709 VM.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1775702/+subscriptions
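For reference, a sketch of the kind of command line those two enlightenments map to; the underscore flag spellings are the ones accepted by QEMU of that era, the CPU model is a placeholder, and hv_stimer generally also requires hv_synic and hv_time:

  qemu-system-x86_64 -enable-kvm \
    -cpu host,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_vpindex,hv_synic,hv_stimer \
    ...

The idea is that with synthetic timers exposed, Windows 1803 stops relying on the emulated HPET, which appears to be what brings the idle host CPU load back down.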
Re: [Qemu-devel] Multiqueue block layer
>>Heh. I have stopped pushing my patches (and scratched a few itches with >>patchew instead) because I'm still a bit burned out from recent KVM >>stuff, but this may be the injection of enthusiasm that I needed. :) Thanks Paolo for your great work on multiqueue, that's a lot of work since the last years ! I cross my fingers for 2018, and wait for ceph rbd block driver multiqueue implementation :) Regards, Alexandre - Mail original - De: "pbonzini" À: "Stefan Hajnoczi" , "qemu-devel" , "qemu-block" Cc: "Kevin Wolf" , "Fam Zheng" Envoyé: Lundi 19 Février 2018 19:03:19 Objet: Re: [Qemu-devel] Multiqueue block layer On 18/02/2018 19:20, Stefan Hajnoczi wrote: > Paolo's patches have been getting us closer to multiqueue block layer > support but there is a final set of changes required that has become > clearer to me just recently. I'm curious if this matches Paolo's > vision and whether anyone else has comments. > > We need to push the AioContext lock down into BlockDriverState so that > thread-safety is not tied to a single AioContext but to the > BlockDriverState itself. We also need to audit block layer code to > identify places that assume everything is run from a single > AioContext. This is mostly done already. Within BlockDriverState dirty_bitmap_mutex, reqs_lock and the BQL is good enough in many cases. Drivers already have their mutex. > After this is done the final piece is to eliminate > bdrv_set_aio_context(). BlockDriverStates should not be associated > with an AioContext. Instead they should use whichever AioContext they > are invoked under. The current thread's AioContext can be fetched > using qemu_get_current_aio_context(). This is either the main loop > AioContext or an IOThread AioContext. > > The .bdrv_attach/detach_aio_context() callbacks will no longer be > necessary in a world where block driver code is thread-safe and any > AioContext can be used. This is not entirely possible. In particular, network drivers still have a "home context" which is where the file descriptor callbacks are attached to. They could still dispatch I/O from any thread in a multiqueue setup. This is the remaining intermediate step between "no AioContext lock" and "multiqueue". > bdrv_drain_all() and friends do not require extensive modifications > because the bdrv_wakeup() mechanism already works properly when there > are multiple IOThreads involved. Yes, this is already done indeed. > Block jobs no longer need to be in the same AioContext as the > BlockDriverState. For simplicity we may choose to always run them in > the main loop AioContext by default. This may have a performance > impact on tight loops like bdrv_is_allocated() and the initial > mirroring phase, but maybe not. > > The upshot of all this is that bdrv_set_aio_context() goes away while > all block driver code needs to be more aware of thread-safety. It can > no longer assume that everything is called from one AioContext. Correct. > We should optimize file-posix.c and qcow2.c for maximum parallelism > using fine-grained locks and other techniques. The remaining block > drivers can use one CoMutex per BlockDriverState. Even better: there is one thread pool and linux-aio context per I/O thread, file-posix.c should just submit I/O to the current thread with no locking whatsoever. There is still reqs_lock, but that can be optimized easily (see http://lists.gnu.org/archive/html/qemu-devel/2017-04/msg03323.html; now that we have QemuLockable, reqs_lock could also just become a QemuSpin). qcow2.c could be adjusted to use rwlocks. 
> I'm excited that we're relatively close to multiqueue now. I don't > want to jinx it by saying 2018 is the year of the multiqueue block > layer, but I'll say it anyway :). Heh. I have stopped pushing my patches (and scratched a few itches with patchew instead) because I'm still a bit burned out from recent KVM stuff, but this may be the injection of enthusiasm that I needed. :) Actually, I'd be content with removing the AioContext lock in the first half of 2018. 1/3rd of that is gone already---doh! But we're actually pretty close, thanks to you and all the others who have helped reviewing the past 100 or so patches! Paolo
Re: [Qemu-devel] drive-mirroring to nbd is failing with multiple parallel jobs (qemu 2.9 -> 2.11)
Sorry, I just found that the problem is in our Proxmox implementation: we use a socat tunnel for the NBD mirroring, with a 30 s inactivity timeout. So, not a QEMU bug.

Regards,
Alexandre

- Original Message -
From: "aderumier"
To: "qemu-devel"
Sent: Wednesday, 14 February 2018 09:54:21
Subject: [Qemu-devel] drive-mirroring to nbd is failing with multiple parallel jobs (qemu 2.9 -> 2.11)

Hi,

I currently have failing mirroring jobs to NBD when multiple jobs are running in parallel.

Steps to reproduce, with 2 disks:
1) launch a mirroring job of the first disk to the remote NBD target (a running target qemu)
2) wait until it reaches ready = 1, do not complete
3) launch a mirroring job of the second disk to the remote NBD target (the same running target qemu)
-> the mirroring job of the second disk is now running (ready=0); the first disk is still at ready=1 and still mirroring new incoming writes.

Then, after some time, mainly if no new writes are coming to the first disk (around 30-40 s), the first job fails with an input/output error.

Note that I don't have a network or disk problem; I'm able to mirror both disks individually.

Another similar bug report on the Proxmox bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=1664

Maybe related to this: https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg03086.html ?

I don't remember having the problem with qemu 2.7, but I'm able to reproduce it with qemu 2.9 and qemu 2.11.

Best Regards,
Alexandre
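For what it's worth, the failure window matches what an inactivity timeout on such a relay produces. A hypothetical socat invocation of roughly this shape (socket path, host and port are made up) drops the tunnel after 30 idle seconds, which lines up with the 30-40 s of no-write behaviour described in the report quoted above:

  socat -T 30 UNIX-LISTEN:/run/mirror-tunnel.sock,fork TCP:target-host:6000

socat's -T option is the inactivity timeout in seconds, so raising or removing it on the tunnel side avoids the spurious I/O error while a ready mirror job sits idle.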
[Qemu-devel] drive-mirroring to nbd is failing with multiple parallel jobs (qemu 2.9 -> 2.11)
Hi,

I currently have failing mirroring jobs to NBD when multiple jobs are running in parallel.

Steps to reproduce, with 2 disks:
1) launch a mirroring job of the first disk to the remote NBD target (a running target qemu)
2) wait until it reaches ready = 1, do not complete
3) launch a mirroring job of the second disk to the remote NBD target (the same running target qemu)
-> the mirroring job of the second disk is now running (ready=0); the first disk is still at ready=1 and still mirroring new incoming writes.

Then, after some time, mainly if no new writes are coming to the first disk (around 30-40 s), the first job fails with an input/output error.

Note that I don't have a network or disk problem; I'm able to mirror both disks individually.

Another similar bug report on the Proxmox bugzilla: https://bugzilla.proxmox.com/show_bug.cgi?id=1664

Maybe related to this: https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg03086.html ?

I don't remember having the problem with qemu 2.7, but I'm able to reproduce it with qemu 2.9 and qemu 2.11.

Best Regards,
Alexandre
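For context, a sketch of roughly what each of the two parallel jobs looks like at the QMP level; the device name, socket path and export name here are hypothetical, not taken from the report:

  { "execute": "drive-mirror",
    "arguments": { "device": "drive-virtio0",
                   "target": "nbd:unix:/run/mirror-target.sock:exportname=drive-virtio0",
                   "sync": "full",
                   "mode": "existing" } }

The second job is the same command for the second disk, while the first job is left sitting in the ready state instead of being completed.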
Re: [Qemu-devel] [qemu-web PATCH] add a blog post about "Spectre"
Thanks Paolo ! Do we need to update guest kernel too, if qemu use cpumodel=qemu64 ? (For example, I have some very old guests where kernel update is not possible) Regards, Alexandre - Mail original - De: "pbonzini" À: "qemu-devel" Cc: "ehabkost" Envoyé: Jeudi 4 Janvier 2018 18:56:09 Objet: [Qemu-devel] [qemu-web PATCH] add a blog post about "Spectre" --- _posts/2018-01-04-spectre.md | 60 1 file changed, 60 insertions(+) create mode 100644 _posts/2018-01-04-spectre.md diff --git a/_posts/2018-01-04-spectre.md b/_posts/2018-01-04-spectre.md new file mode 100644 index 000..1be86d0 --- /dev/null +++ b/_posts/2018-01-04-spectre.md @@ -0,0 +1,60 @@ +--- +layout: post +title: "QEMU and the Spectre and Meltdown attacks" +date: 2018-01-04 18:00:00 + +author: Paolo Bonzini and Eduardo Habkost +categories: [meltdown, spectre, security, x86] +--- +As you probably know by now, three critical architectural flaws in CPUs have +been recently disclosed that allow user processes to read kernel or hypervisor +memory through cache side-channel attacks. These flaws, collectively +named _Meltdown_ and _Spectre_, affect in one way or another almost +all processors that perform out-of-order execution, including x86 (from +Intel and AMD), POWER, s390 and ARM processors. + +No microcode updates are required to block the _Meltdown_ attack; it is +enough to update the guest operating system to a version that separates +the user and kernel address spaces (known as _page table isolation_ for +the Linux kernel). Therefore, this post will focus on _Spectre_, and +especially on [CVE-2017-5715](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-5715). + +Fixing or mitigating _Spectre_ in general, and CVE-2017-5715 in particular, +requires cooperation between the processor and the operating system kernel or +hypervisor; the processor can be updated through microcode or millicode +patches to provide the required functionality. CVE-2017-5715 allows guests +to read potentially sensitive data from hypervisor memory; however, __patching +the host kernel is sufficient to block this attack__. + +On the other hand, in order to protect the guest kernel from a malicious +userspace, updates are also needed to the guest kernel and, depending on +the processor architecture, to QEMU. Just like on bare-metal, the guest +kernel will use the new functionality provided by the microcode or millicode +updates. When running under a hypervisor, processor emulation is mostly out of +QEMU's scope, so QEMU's role in the fix is small, but nevertheless important. +In the case of KVM: + +* QEMU configures the hypervisor to emulate a specific processor model. +For x86, QEMU has to be aware of new CPUID bits introduced by the microcode +update, and it must provide them to guests depending on how the guest is +configured. + +* upon virtual machine migration, QEMU reads the CPU state on the source +and transmits it to the destination. For x86, QEMU has to be aware of new +model specific registers (MSRs). + +Right now, there are no public patches to KVM that expose the new CPUID bits +and MSRs to the virtual machines, therefore there is no urgent need to update +QEMU; remember that __updating the host kernel is enough to protect the +host from malicious guests__. Nevertheless, updates will be posted to the +qemu-devel mailing list in the next few days, and a 2.11.1 patch release +will be released with the fix. + +As of today, the QEMU project is not aware of whether similar changes will +be required for non-x86 processors. 
If so, they will also be posted to the +mailing list and backported to recent stable releases. + +For more information on the vulnerabilities, please refer to the [Google Security +Blog](https://security.googleblog.com/2018/01/todays-cpu-vulnerability-what-you-need.html) +and [Google Project +Zero](https://googleprojectzero.blogspot.it/2018/01/reading-privileged-memory-with-side.html) +posts on the topic, as well as the [Spectre and Meltdown FAQ](https://meltdownattack.com/#faq). -- 2.14.3
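For reference, once the KVM and QEMU patches announced above were released (QEMU 2.11.1 and later), the new capability shows up as an ordinary CPU feature flag that has to be enabled explicitly for existing models. A sketch, assuming an updated host kernel and microcode:

  # Intel: expose IBRS/IBPB via the spec-ctrl feature bit
  qemu-system-x86_64 -enable-kvm -cpu Haswell,+spec-ctrl ...
  # AMD: the equivalent bit is ibpb
  qemu-system-x86_64 -enable-kvm -cpu EPYC,+ibpb ...

Alternatively, the "-IBRS"/"-IBPB" variants of the named CPU models added alongside the fix (e.g. Haswell-IBRS) include the bit by default.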
Re: [Qemu-devel] CVE-2017-5715: relevant qemu patches
>>So you need:
>>1.) intel / amd cpu microcode update
>>2.) qemu update to pass the new MSR and CPU flags from the microcode update
>>3.) host kernel update
>>4.) guest kernel update

Are you sure we need to patch the guest kernel if we are able to patch qemu? I have some pretty old guests (Linux and Windows).

If I understand correctly: patching the host kernel prevents a VM from reading the memory of another VM (the most critical case), and patching the guest kernel prevents a process in the VM from accessing the memory of another process in the same VM. Right?

- Original Message -
From: "Stefan Priebe, Profihost AG"
To: "aderumier"
Cc: "qemu-devel"
Sent: Thursday, 4 January 2018 09:17:41
Subject: Re: [Qemu-devel] CVE-2017-5715: relevant qemu patches

On 04.01.2018 at 08:27, Alexandre DERUMIER wrote:
> does somebody have a redhat account to see the content of:
>
> https://access.redhat.com/solutions/3307851
> "Impacts of CVE-2017-5754, CVE-2017-5753, and CVE-2017-5715 to Red Hat
> Virtualization products"

i don't have one but the content might be something like this:
https://www.suse.com/de-de/support/kb/doc/?id=7022512

So you need:
1.) intel / amd cpu microcode update
2.) qemu update to pass the new MSR and CPU flags from the microcode update
3.) host kernel update
4.) guest kernel update

The microcode update and the kernel update are publicly available but i'm missing the qemu one.

Greets,
Stefan

> - Original Message -
> From: "aderumier"
> To: "Stefan Priebe, Profihost AG"
> Cc: "qemu-devel"
> Sent: Thursday, 4 January 2018 08:24:34
> Subject: Re: [Qemu-devel] CVE-2017-5715: relevant qemu patches
>
>>> Can anybody point me to the relevant qemu patches?
>
> I haven't found them yet.
>
> Do you know if a VM using the kvm64 cpu model is protected or not?
>
> - Original Message -
> From: "Stefan Priebe, Profihost AG"
> To: "qemu-devel"
> Sent: Thursday, 4 January 2018 07:27:01
> Subject: [Qemu-devel] CVE-2017-5715: relevant qemu patches
>
> Hello,
>
> i've seen some vendors have updated qemu regarding meltdown / spectre.
>
> f.e.:
>
> CVE-2017-5715: QEMU was updated to allow passing through new MSR and
> CPUID flags from the host VM to the CPU, to allow enabling/disabling
> branch prediction features in the Intel CPU. (bsc#1068032)
>
> Can anybody point me to the relevant qemu patches?
>
> Thanks!
>
> Greets,
> Stefan
>
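For what it's worth, that matches my understanding: the host kernel update protects the host and the other guests, while the guest kernel update protects processes inside that guest from each other, and the guest kernel can only do so if the new MSRs/CPUID bits are actually exposed to it, which is the QEMU part. Whether that has happened can be checked from inside the guest. A sketch (the sysfs path needs kernel 4.15+, and flag names vary by vendor and kernel version):

  # inside the guest: which mitigations does this kernel report?
  grep . /sys/devices/system/cpu/vulnerabilities/*
  # does the vCPU advertise the new capability at all?
  grep -oE 'spec_ctrl|ibpb' /proc/cpuinfo | sort -u

If the second command prints nothing, even a fully patched guest kernel has nothing to work with.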
Re: [Qemu-devel] CVE-2017-5715: relevant qemu patches
Does somebody have a Red Hat account to see the content of:

https://access.redhat.com/solutions/3307851
"Impacts of CVE-2017-5754, CVE-2017-5753, and CVE-2017-5715 to Red Hat Virtualization products"

- Original Message -
From: "aderumier"
To: "Stefan Priebe, Profihost AG"
Cc: "qemu-devel"
Sent: Thursday, 4 January 2018 08:24:34
Subject: Re: [Qemu-devel] CVE-2017-5715: relevant qemu patches

>>Can anybody point me to the relevant qemu patches?

I haven't found them yet.

Do you know if a VM using the kvm64 cpu model is protected or not?

- Original Message -
From: "Stefan Priebe, Profihost AG"
To: "qemu-devel"
Sent: Thursday, 4 January 2018 07:27:01
Subject: [Qemu-devel] CVE-2017-5715: relevant qemu patches

Hello,

i've seen some vendors have updated qemu regarding meltdown / spectre.

f.e.:

CVE-2017-5715: QEMU was updated to allow passing through new MSR and
CPUID flags from the host VM to the CPU, to allow enabling/disabling
branch prediction features in the Intel CPU. (bsc#1068032)

Can anybody point me to the relevant qemu patches?

Thanks!

Greets,
Stefan
Re: [Qemu-devel] CVE-2017-5715: relevant qemu patches
>>Can anybody point me to the relevant qemu patches?

I haven't found them yet.

Do you know if a VM using the kvm64 cpu model is protected or not?

- Original Message -
From: "Stefan Priebe, Profihost AG"
To: "qemu-devel"
Sent: Thursday, 4 January 2018 07:27:01
Subject: [Qemu-devel] CVE-2017-5715: relevant qemu patches

Hello,

i've seen some vendors have updated qemu regarding meltdown / spectre.

f.e.:

CVE-2017-5715: QEMU was updated to allow passing through new MSR and
CPUID flags from the host VM to the CPU, to allow enabling/disabling
branch prediction features in the Intel CPU. (bsc#1068032)

Can anybody point me to the relevant qemu patches?

Thanks!

Greets,
Stefan
Re: [Qemu-devel] dropped pkts with Qemu on tap interace (RX)
Hi Stefan,

>>The tap devices on the target vm shows dropped RX packages on BOTH tap
>>interfaces - strangely with the same amount of pkts?

That's strange indeed. If you tcpdump the tap interfaces, do you see incoming traffic on only one interface, or spread randomly across both? (Can you provide the guest network configuration for both interfaces?)

I see that you have enabled multiqueue on one of the interfaces. Have you set up the multiqueue part correctly inside the guest, and do you have enough vCPUs to handle all the queues?

- Original Message -
From: "Stefan Priebe, Profihost AG"
To: "qemu-devel"
Sent: Tuesday, 2 January 2018 12:17:29
Subject: [Qemu-devel] dropped pkts with Qemu on tap interace (RX)

Hello,

currently i'm trying to fix a problem where we have "random" missing packets. We're doing an ssh connect from machine a to machine b every 5 minutes via rsync and ssh. Sometimes it happens that we get this cron message:

"Connection to 192.168.0.2 closed by remote host.
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(226) [sender=3.1.2]
ssh: connect to host 192.168.0.2 port 22: Connection refused"

The tap devices on the target vm show dropped RX packets on BOTH tap interfaces - strangely with the same amount of pkts?

# ifconfig tap317i0; ifconfig tap317i1
tap317i0  Link encap:Ethernet  HWaddr 6e:cb:65:94:bb:bf
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:2238445 errors:0 dropped:13159 overruns:0 frame:0
          TX packets:9655853 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:177991267 (169.7 MiB)  TX bytes:910412749 (868.2 MiB)

tap317i1  Link encap:Ethernet  HWaddr 96:f8:b5:d0:9a:07
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:1516085 errors:0 dropped:13159 overruns:0 frame:0
          TX packets:1446964 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1597564313 (1.4 GiB)  TX bytes:3517734365 (3.2 GiB)

Any ideas how to inspect this issue?

Greets,
Stefan
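For reference, a sketch of a matched multiqueue setup on both sides; interface name, queue count and IDs here are hypothetical, and the vector count follows the usual 2*queues+2 rule:

  # host: 4 queue pairs on the tap backend
  -netdev tap,id=net1,ifname=tap317i1,script=no,downscript=no,vhost=on,queues=4 \
  -device virtio-net-pci,netdev=net1,mq=on,vectors=10

  # guest: actually enable the extra queues, otherwise only one is used
  ethtool -L eth1 combined 4

A mismatch here (queues offered by the host but never enabled in the guest, or more queues than vCPUs to service them) can show up as RX drops on the tap device, because packets queued for the guest are discarded when the virtio queues are not drained fast enough.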
Re: [Qemu-devel] [PATCH] virtio: fix descriptor counting in virtqueue_pop
Hi,

Has somebody reviewed this patch? I'm also able to reproduce the VM crash, like the Proxmox user, and this patch fixes it for me too.

Regards,
Alexandre

- Original Message -
From: "Wolfgang Bumiller"
To: "qemu-devel"
Cc: "pbonzini", "Michael S. Tsirkin"
Sent: Wednesday, 20 September 2017 08:09:33
Subject: [Qemu-devel] [PATCH] virtio: fix descriptor counting in virtqueue_pop

While changing the s/g list allocation, commit 3b3b0628 also changed the descriptor counting to count iovec entries as split by cpu_physical_memory_map(). Previously only the actual descriptor entries were counted and the split into the iovec happened afterwards in virtqueue_map(). Count the entries again instead to avoid erroneous "Looped descriptor" errors.

Reported-by: Hans Middelhoek
Link: https://forum.proxmox.com/threads/vm-crash-with-memory-hotplug.35904/
Fixes: 3b3b0628217e ("virtio: slim down allocation of VirtQueueElements")
Signed-off-by: Wolfgang Bumiller
---
 hw/virtio/virtio.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 890b4d7eb7..33bb770177 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -834,7 +834,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     int64_t len;
     VirtIODevice *vdev = vq->vdev;
     VirtQueueElement *elem = NULL;
-    unsigned out_num, in_num;
+    unsigned out_num, in_num, elem_entries;
     hwaddr addr[VIRTQUEUE_MAX_SIZE];
     struct iovec iov[VIRTQUEUE_MAX_SIZE];
     VRingDesc desc;
@@ -852,7 +852,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     smp_rmb();
 
     /* When we start there are none of either input nor output. */
-    out_num = in_num = 0;
+    out_num = in_num = elem_entries = 0;
 
     max = vq->vring.num;
 
@@ -922,7 +922,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     }
 
     /* If we've got too many, that implies a descriptor loop. */
-    if ((in_num + out_num) > max) {
+    if (++elem_entries > max) {
         virtio_error(vdev, "Looped descriptor");
         goto err_undo_map;
     }
-- 
2.11.0
Re: [Qemu-devel] blockdev-mirror , how to replace old nodename by new nodename ?
>>What I don't understand is how do you map device to the blockdev ? Ah,ok, got it. info qtree show me that device drive is now the new nodename ! So it's working fine. Thanks for help. - Mail original - De: "aderumier" À: "Kashyap Chamarthy" Cc: "qemu-devel" Envoyé: Mercredi 19 Avril 2017 15:36:48 Objet: Re: [Qemu-devel] blockdev-mirror , how to replace old nodename by new nodename ? Ok, I have same result than my test: {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 1073741824, "filename": "base.raw", "format": "raw", "actual-size": 0, "dirty-flag": false}, "iops_wr": 0, "ro": false, "node-name": "node1", "backing_file_depth": 0, "drv": "raw", "iops": 0, "bps_wr": 0, "write_threshold": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": false, "writeback": true}, "file": "base.qcow2, "encryption_key_missing": false}, {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 1073741824, "filename": "base.qcow2", "format": "file", "actual-size": 0, "dirty-flag": false}, "iops_wr": 0, "ro": false, "node-name": "#block194", "backing_file_depth": 0, "drv": "file", "iops": 0, "bps_wr": 0, "write_threshold": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": false, "writeback": true}, "file": "base.qcow2", "encryption_key_missing": false}, {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 1073741824, "filename": "target.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": 352256, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "refcount-bits": 16, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "node-name": "foo", "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "write_threshold": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": false, "writeback": true}, "file": "target.qcow2", "encryption_key_missing": false}, {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 1074135040, "filename": "target.qcow2", "format": "file", "actual-size": 352256, "dirty-flag": false}, "iops_wr": 0, "ro": false, "node-name": "#block008", "backing_file_depth": 0, "drv": "file", "iops": 0, "bps_wr": 0, "write_threshold": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": false, "writeback": true}, "file": "target.qcow2", "encryption_key_missing": false}]} What I don't understand is how do you map device to the blockdev ? qemu doc said something like -device virtio-blk,drive=blk0 where blk0 is the nodename. but when I blockdev-mirror, the target blockdev has a new nodename. So how does it work ? - Mail original - De: "Kashyap Chamarthy" À: "aderumier" Cc: "qemu-devel" Envoyé: Mercredi 19 Avril 2017 14:58:01 Objet: Re: [Qemu-devel] blockdev-mirror , how to replace old nodename by new nodename ? On Wed, Apr 19, 2017 at 02:28:46PM +0200, Alexandre DERUMIER wrote: > Thanks I'll try. (I'm on 2.9.0-rc4) > > can you send your initial qemu command line ? $ ~/build/qemu-upstream/x86_64-softmmu/qemu-system-x86_64 \ -display none -nodefconfig -nodefaults -m 512 \ -device virtio-scsi-pci,id=scsi -device virtio-serial-pci \ -blockdev node-name=foo,driver=qcow2,file.driver=file,file.filename=./base.qcow2 \ -monitor stdio -qmp unix:./qmp-sock,server,nowait > also : > > >>"execute":"blockdev-add", > >>"arguments":{ &g
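To summarize the working flow from this thread as one condensed sketch (names are the ones used in the thread, except the job id, which is arbitrary): the device keeps pointing at whatever node backs it, and the switch to the target node happens at block-job-complete rather than at blockdev-mirror time, which is why query-named-block-nodes still shows the old association until the job is completed.

  { "execute": "blockdev-add",
    "arguments": { "driver": "raw", "node-name": "tempmirror",
                   "file": { "driver": "file", "filename": "/var/lib/vz/images/138/vm-138-disk-2.raw" } } }

  { "execute": "blockdev-mirror",
    "arguments": { "job-id": "mirror0", "device": "drive-virtio0",
                   "target": "tempmirror", "sync": "full" } }

  ... wait for BLOCK_JOB_READY, then ...

  { "execute": "block-job-complete", "arguments": { "device": "mirror0" } }

After completion, "info qtree" (or query-block) shows the guest device backed by the tempmirror node; the node names themselves are not renamed.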
Re: [Qemu-devel] blockdev-mirror , how to replace old nodename by new nodename ?
Ok, I have same result than my test: {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 1073741824, "filename": "base.raw", "format": "raw", "actual-size": 0, "dirty-flag": false}, "iops_wr": 0, "ro": false, "node-name": "node1", "backing_file_depth": 0, "drv": "raw", "iops": 0, "bps_wr": 0, "write_threshold": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": false, "writeback": true}, "file": "base.qcow2, "encryption_key_missing": false}, {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 1073741824, "filename": "base.qcow2", "format": "file", "actual-size": 0, "dirty-flag": false}, "iops_wr": 0, "ro": false, "node-name": "#block194", "backing_file_depth": 0, "drv": "file", "iops": 0, "bps_wr": 0, "write_threshold": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": false, "writeback": true}, "file": "base.qcow2", "encryption_key_missing": false}, {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 1073741824, "filename": "target.qcow2", "cluster-size": 65536, "format": "qcow2", "actual-size": 352256, "format-specific": {"type": "qcow2", "data": {"compat": "1.1", "lazy-refcounts": false, "refcount-bits": 16, "corrupt": false}}, "dirty-flag": false}, "iops_wr": 0, "ro": false, "node-name": "foo", "backing_file_depth": 0, "drv": "qcow2", "iops": 0, "bps_wr": 0, "write_threshold": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": false, "writeback": true}, "file": "target.qcow2", "encryption_key_missing": false}, {"iops_rd": 0, "detect_zeroes": "off", "image": {"virtual-size": 1074135040, "filename": "target.qcow2", "format": "file", "actual-size": 352256, "dirty-flag": false}, "iops_wr": 0, "ro": false, "node-name": "#block008", "backing_file_depth": 0, "drv": "file", "iops": 0, "bps_wr": 0, "write_threshold": 0, "encrypted": false, "bps": 0, "bps_rd": 0, "cache": {"no-flush": false, "direct": false, "writeback": true}, "file": "target.qcow2", "encryption_key_missing": false}]} What I don't understand is how do you map device to the blockdev ? qemu doc said something like -device virtio-blk,drive=blk0 where blk0 is the nodename. but when I blockdev-mirror, the target blockdev has a new nodename. So how does it work ? - Mail original - De: "Kashyap Chamarthy" À: "aderumier" Cc: "qemu-devel" Envoyé: Mercredi 19 Avril 2017 14:58:01 Objet: Re: [Qemu-devel] blockdev-mirror , how to replace old nodename by new nodename ? On Wed, Apr 19, 2017 at 02:28:46PM +0200, Alexandre DERUMIER wrote: > Thanks I'll try. (I'm on 2.9.0-rc4) > > can you send your initial qemu command line ? $ ~/build/qemu-upstream/x86_64-softmmu/qemu-system-x86_64 \ -display none -nodefconfig -nodefaults -m 512 \ -device virtio-scsi-pci,id=scsi -device virtio-serial-pci \ -blockdev node-name=foo,driver=qcow2,file.driver=file,file.filename=./base.qcow2 \ -monitor stdio -qmp unix:./qmp-sock,server,nowait > also : > > >>"execute":"blockdev-add", > >>"arguments":{ > >>"driver":"qcow2", > >>"node-name":"node1", Oops, it should be 'node2'. Copy / paste mistake, sorry. > ... > > then > > > >>"execute":"blockdev-mirror", > >>"arguments":{ > >>"device":"foo", > >>"job-id":"job-2", > >>"target":"node2",
Re: [Qemu-devel] blockdev-mirror , how to replace old nodename by new nodename ?
Thanks I'll try. (I'm on 2.9.0-rc4) can you send your initial qemu command line ? also : >>"execute":"blockdev-add", >>"arguments":{ >>"driver":"qcow2", >>"node-name":"node1", ... then >>"execute":"blockdev-mirror", >>"arguments":{ >>"device":"foo", >>"job-id":"job-2", >>"target":"node2",--> node2 ? >>"sync":"full" - Mail original ----- De: "Kashyap Chamarthy" À: "aderumier" Cc: "qemu-devel" Envoyé: Mercredi 19 Avril 2017 12:43:00 Objet: Re: [Qemu-devel] blockdev-mirror , how to replace old nodename by new nodename ? On Wed, Apr 19, 2017 at 09:08:20AM +0200, Alexandre DERUMIER wrote: > Hi, > > I'm trying to implement blockdev-mirror, to replace drive-mirror as we > can pass more options with blockdev-mirror. > > > I would like to mirror an attached blockdev to a new blockdev, then > switch at the end of block-job-complete, like for drive-mirror. [...] > blockdev-mirror: > > {"arguments":{"job-id":"drive-virtio0","target":"tempmirror","sync":"full","replaces":"drive-virtio0","device":"drive-virtio0"},"execute":"blockdev-mirror"} > > > (I have try with or without replaces option) > > then query-name-block-nodes, show vm-138-disk-2.raw file on tempmirror > "node-name", and vm-138-disk1.qcow2 on "drive-virtio0" node-name > > I expected that both was switched, like for drive-mirror. For me, 'blockdev-mirror' does do the switch when I issue 'block-job-complete' (similar to 'drive-mirror') The below is my test from Git: I was here (on Git): $ git describe v2.9.0-rc5 --- $ qemu-img create -f qcow2 /export/target2.qcow2 1G Formatting '/export/target2.qcow2', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16 --- QMP> { "execute":"blockdev-add", "arguments":{ "driver":"qcow2", "node-name":"node1", "file":{ "driver":"file", "filename":"/export/target2.qcow2" } } } {"return": {}} --- QMP> {"execute":"query-named-block-nodes"} [...] --- QMP> { "execute":"blockdev-mirror", "arguments":{ "device":"foo", "job-id":"job-2", "target":"node2", "sync":"full" } } {"return": {}} {"timestamp": {"seconds": 1492598410, "microseconds": 35946}, "event": "BLOCK_JOB_READY", "data": {"device": "job-2", "len": 24182784, "offset": 24182784, "speed": 0, "type": "mirror"}} --- QMP> { "execute":"block-job-complete", "arguments":{ "device":"job-2" } } {"return": {}} {"timestamp": {"seconds": 1492598419, "microseconds": 115458}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job-2", "len": 24182784, "offset": 24182784, "speed": 0, "type": "mirror"}} --- [...] -- /kashyap
Re: [Qemu-devel] blockdev-mirror , how to replace old nodename by new nodename ?
Ok, thanks. I'll retry without "replaces", but I'm pretty sure it didn't work either.

Also, I noticed something strange: when I define the job-id (which is mandatory), it replaces the device option. (If I set a random job-id name, the mirroring doesn't work because it doesn't find the device.)

- Original Message -
From: "Fam Zheng"
To: "aderumier"
Cc: "qemu-devel"
Sent: Wednesday, 19 April 2017 11:02:36
Subject: Re: [Qemu-devel] blockdev-mirror , how to replace old nodename by new nodename ?

On Wed, 04/19 09:08, Alexandre DERUMIER wrote:
> Hi,
>
> I'm trying to implement blockdev-mirror, to replace drive-mirror as we can
> pass more options with blockdev-mirror.
>
> I would like to mirror an attached blockdev to a new blockdev, then switch at
> the end of block-job-complete, like for drive-mirror.
>
> qemu command line (vm-138-disk-1.qcow2 filename on "drive-virtio0" nodename):
>
> -blockdev
> {"node-name":"drive-virtio0","cache":{"direct":true},"driver":"qcow2","file":{"filename":"/var/lib/vz/images/138/vm-138-disk-1.qcow2","aio":"threads","driver":"file"}}
>
> -device
> virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,write-cache=on
>
> add a new blockdev with blockdev-add: (a file vm-138-disk-2.raw has been
> created before, I assign it to a "tempmirror" nodename)
>
> {"arguments":{"driver":"raw","cache":{"direct":true},"node-name":"tempmirror","file":{"driver":"file","aio":"threads","filename":"/var/lib/vz/images/138/vm-138-disk-2.raw"}},"execute":"blockdev-add"}
>
> blockdev-mirror:
>
> {"arguments":{"job-id":"drive-virtio0","target":"tempmirror","sync":"full","replaces":"drive-virtio0","device":"drive-virtio0"},"execute":"blockdev-mirror"}
>
> (I have tried with and without the replaces option)
>
> then query-named-block-nodes shows the vm-138-disk-2.raw file on the tempmirror
> "node-name", and vm-138-disk-1.qcow2 on the "drive-virtio0" node-name
>
> I expected that both were switched, like for drive-mirror.

The replace/switch happens when you do block-job-complete. And I think in your use case "replaces" is not needed. That one is for replacing a node under quorum.

Fam
[Qemu-devel] blockdev-mirror , how to replace old nodename by new nodename ?
Hi, I'm trying to implement blockdev-mirror, to replace drive-mirror as we can pass more options with blockdev-mirror. I would like to mirror an attached blockdev to a new blockdev, then switch at the end of block-job-complete, like for drive-mirror. qemu command line (vm-138-disk-1.qcow2 filename on "drive-virtio0" nodename: -blockdev {"node-name":"drive-virtio0","cache":{"direct":true},"driver":"qcow2","file":{"filename":"/var/lib/vz/images/138/vm-138-disk-1.qcow2","aio":"threads","driver":"file"}} -device virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,write-cache=on add a new blockdev with blockdev-add: (a file vm-138-disk-2.raw has been created before, I assign it to a "tempmirror" nodename) {"arguments":{"driver":"raw","cache":{"direct":true},"node-name":"tempmirror","file":{"driver":"file","aio":"threads","filename":"/var/lib/vz/images/138/vm-138-disk-2.raw"}},"execute":"blockdev-add"} blockdev-mirror: {"arguments":{"job-id":"drive-virtio0","target":"tempmirror","sync":"full","replaces":"drive-virtio0","device":"drive-virtio0"},"execute":"blockdev-mirror"} (I have try with or without replaces option) then query-name-block-nodes, show vm-138-disk-2.raw file on tempmirror "node-name", and vm-138-disk1.qcow2 on "drive-virtio0" node-name I expected that both was switched, like for drive-mirror. Any idea ? Regards, Alexandre [query-named-block-nodes","arguments":{} $VAR1 = [ { 'bps_wr' => 0, 'cache' => { 'no-flush' => bless( do{\(my $o = 0)}, 'JSON::PP::Boolean' ), 'direct' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ), 'writeback' => $VAR1->[0]{'cache'}{'direct'} }, 'bps' => 0, 'iops' => 0, 'node-name' => 'toto', 'ro' => $VAR1->[0]{'cache'}{'no-flush'}, 'image' => { 'actual-size' => 1074794496, 'filename' => '/var/lib/vz/images/138/vm-138-disk-2.raw', 'dirty-flag' => $VAR1->[0]{'cache'}{'no-flush'}, 'format' => 'raw', 'virtual-size' => 1073741824 }, 'write_threshold' => 0, 'iops_rd' => 0, 'bps_rd' => 0, 'encrypted' => $VAR1->[0]{'cache'}{'no-flush'}, 'encryption_key_missing' => $VAR1->[0]{'cache'}{'no-flush'}, 'backing_file_depth' => 0, 'detect_zeroes' => 'off', 'file' => '/var/lib/vz/images/138/vm-138-disk-2.raw', 'drv' => 'raw', 'iops_wr' => 0 }, { 'ro' => $VAR1->[0]{'cache'}{'no-flush'}, 'node-name' => '#block132', 'file' => '/var/lib/vz/images/138/vm-138-disk-2.raw', 'detect_zeroes' => 'off', 'iops_wr' => 0, 'drv' => 'file', 'iops_rd' => 0, 'bps_rd' => 0, 'image' => { 'actual-size' => 1074794496, 'filename' => '/var/lib/vz/images/138/vm-138-disk-2.raw', 'dirty-flag' => $VAR1->[0]{'cache'}{'no-flush'}, 'virtual-size' => 1073741824, 'format' => 'file' }, 'write_threshold' => 0, 'backing_file_depth' => 0, 'encryption_key_missing' => $VAR1->[0]{'cache'}{'no-flush'}, 'encrypted' => $VAR1->[0]{'cache'}{'no-flush'}, 'bps_wr' => 0, 'bps' => 0, 'iops' => 0, 'cache' => { 'no-flush' => $VAR1->[0]{'cache'}{'no-flush'}, 'direct' => $VAR1->[0]{'cache'}{'direct'}, 'writeback' => $VAR1->[0]{'cache'}{'direct'} } }, { 'drv' => 'qcow2', 'iops_wr' => 0, 'detect_zeroes' => 'off', 'file' => '/var/lib/vz/images/138/vm-138-disk-1.qcow2', 'encrypted' => $VAR1->[0]{'cache'}{'no-flush'}, 'encryption_key_missing' => $VAR1->[0]{'cache'}{'no-flush'}, 'backing_file_depth' => 0, 'image' => { 'cluster-size' => 65536, 'virtual-size' => 1073741824, 'format-specific' => { 'type' => 'qcow2', 'data' => { 'corrupt' => $VAR1->[0]{'cache'}{'no-flush'}, 'compat' => '1.1', 'lazy-refcounts' => $VAR1->[0]{'cache'}{'no-flush'}, 'refcount-bits' => 16
Re: [Qemu-devel] [PATCH for-2.9] block: Declare blockdev-add and blockdev-del supported
Pretty awesome news ! Congrat ! So, can we update the wiki changelog ? http://wiki.qemu-project.org/ChangeLog/2.9 "QMP command blockdev-add is still a work in progress. It doesn't support all block drivers, it lacks a matching blockdev-del, and more. It might change incompatibly." - Mail original - De: "Markus Armbruster" À: "qemu-devel" Cc: "Kevin Wolf" , pkre...@redhat.com, "mreitz" Envoyé: Mardi 21 Mars 2017 17:57:53 Objet: [Qemu-devel] [PATCH for-2.9] block: Declare blockdev-add and blockdev-del supported It's been a long journey, but here we are. x-blockdev-remove-medium, x-blockdev-insert-medium and x-blockdev-change need a bit more work, so leave them alone for now. Signed-off-by: Markus Armbruster --- blockdev.c | 4 ++-- qapi/block-core.json | 14 +- tests/qemu-iotests/139 | 8 tests/qemu-iotests/141 | 4 ++-- tests/qemu-iotests/147 | 2 +- 5 files changed, 14 insertions(+), 18 deletions(-) diff --git a/blockdev.c b/blockdev.c index c5b2c2c..040c152 100644 --- a/blockdev.c +++ b/blockdev.c @@ -2835,7 +2835,7 @@ void hmp_drive_del(Monitor *mon, const QDict *qdict) bs = bdrv_find_node(id); if (bs) { - qmp_x_blockdev_del(id, &local_err); + qmp_blockdev_del(id, &local_err); if (local_err) { error_report_err(local_err); } @@ -3900,7 +3900,7 @@ fail: visit_free(v); } -void qmp_x_blockdev_del(const char *node_name, Error **errp) +void qmp_blockdev_del(const char *node_name, Error **errp) { AioContext *aio_context; BlockDriverState *bs; diff --git a/qapi/block-core.json b/qapi/block-core.json index 0f132fc..5d913d4 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -2907,11 +2907,7 @@ # BlockBackend will be created; otherwise, @node-name is mandatory at the top # level and no BlockBackend will be created. # -# Note: This command is still a work in progress. It doesn't support all -# block drivers among other things. Stay away from it unless you want -# to help with its development. -# -# Since: 1.7 +# Since: 2.9 # # Example: # @@ -2957,7 +2953,7 @@ { 'command': 'blockdev-add', 'data': 'BlockdevOptions', 'boxed': true } ## -# @x-blockdev-del: +# @blockdev-del: # # Deletes a block device that has been added using blockdev-add. # The command will fail if the node is attached to a device or is @@ -2969,7 +2965,7 @@ # experimental. Stay away from it unless you want to help with its # development. # -# Since: 2.5 +# Since: 2.9 # # Example: # @@ -2985,13 +2981,13 @@ # } # <- { "return": {} } # -# -> { "execute": "x-blockdev-del", +# -> { "execute": "blockdev-del", # "arguments": { "node-name": "node0" } # } # <- { "return": {} } # ## -{ 'command': 'x-blockdev-del', 'data': { 'node-name': 'str' } } +{ 'command': 'blockdev-del', 'data': { 'node-name': 'str' } } ## # @blockdev-open-tray: diff --git a/tests/qemu-iotests/139 b/tests/qemu-iotests/139 index 6d98e4f..175d8f0 100644 --- a/tests/qemu-iotests/139 +++ b/tests/qemu-iotests/139 @@ -1,6 +1,6 @@ #!/usr/bin/env python # -# Test cases for the QMP 'x-blockdev-del' command +# Test cases for the QMP 'blockdev-del' command # # Copyright (C) 2015 Igalia, S.L. 
# Author: Alberto Garcia @@ -79,7 +79,7 @@ class TestBlockdevDel(iotests.QMPTestCase): # Delete a BlockDriverState def delBlockDriverState(self, node, expect_error = False): self.checkBlockDriverState(node) - result = self.vm.qmp('x-blockdev-del', node_name = node) + result = self.vm.qmp('blockdev-del', node_name = node) if expect_error: self.assert_qmp(result, 'error/class', 'GenericError') else: @@ -173,7 +173,7 @@ class TestBlockdevDel(iotests.QMPTestCase): self.wait_until_completed(id) # Add a BlkDebug node - # Note that the purpose of this is to test the x-blockdev-del + # Note that the purpose of this is to test the blockdev-del # sanity checks, not to create a usable blkdebug drive def addBlkDebug(self, debug, node): self.checkBlockDriverState(node, False) @@ -191,7 +191,7 @@ class TestBlockdevDel(iotests.QMPTestCase): self.checkBlockDriverState(debug) # Add a BlkVerify node - # Note that the purpose of this is to test the x-blockdev-del + # Note that the purpose of this is to test the blockdev-del # sanity checks, not to create a usable blkverify drive def addBlkVerify(self, blkverify, test, raw): self.checkBlockDriverState(test, False) diff --git a/tests/qemu-iotests/141 b/tests/qemu-iotests/141 index 6d8f0a1..27fb1cc 100755 --- a/tests/qemu-iotests/141 +++ b/tests/qemu-iotests/141 @@ -65,7 +65,7 @@ test_blockjob() # We want this to return an error because the block job is still running _send_qemu_cmd $QEMU_HANDLE \ - "{'execute': 'x-blockdev-del', + "{'execute': 'blockdev-del', 'arguments': {'node-name': 'drv0'}}" \ 'error' | _filter_generated_node_ids @@ -75,7 +75,7 @@ test_blockjob() "$3" _send_qemu_cmd $QEMU_HANDLE \ - "{'execute': 'x-blockdev-del', + "{'execute': 'blockdev-del', 'arguments': {'node-name':
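For anyone skimming the flattened diff above, the now-stable command pair amounts to this kind of exchange (hypothetical file name):

  -> { "execute": "blockdev-add",
       "arguments": { "driver": "qcow2", "node-name": "node0",
                      "file": { "driver": "file", "filename": "test.qcow2" } } }
  <- { "return": {} }

  -> { "execute": "blockdev-del", "arguments": { "node-name": "node0" } }
  <- { "return": {} }

As the commit message notes, x-blockdev-remove-medium, x-blockdev-insert-medium and x-blockdev-change keep their experimental x- prefix for now.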
Re: [Qemu-devel] [RFC v5] RBD: Add support readv,writev for rbd
>>No yet. I just test on one qemu-kvm vm. It works fine. >>The performance may need more time. >>Any one can test on this patch if you do fast Hi, I would like to bench it with small 4k read/write. On the ceph side,do we need this PR ? : https://github.com/ceph/ceph/pull/13447 - Mail original - De: "jazeltq" À: "Tiger Hu" Cc: "Josh Durgin" , "Jeff Cody" , "dillaman" , "Kevin Wolf" , mre...@redhat.com, qemu-bl...@nongnu.org, "qemu-devel" , "ceph-devel" , "tianqing" Envoyé: Jeudi 16 Février 2017 15:03:52 Objet: Re: [RFC v5] RBD: Add support readv,writev for rbd No yet. I just test on one qemu-kvm vm. It works fine. The performance may need more time. Any one can test on this patch if you do fast 2017-02-16 20:07 GMT+08:00 Tiger Hu : > Tianqing, > > Do we have any performance data for this patch? Thanks. > > Tiger >> 在 2017年2月16日,下午5:00,jaze...@gmail.com 写道: >> >> From: tianqing >> >> Rbd can do readv and writev directly, so wo do not need to transform >> iov to buf or vice versa any more. >> >> Signed-off-by: tianqing >> --- >> block/rbd.c | 49 ++--- >> 1 file changed, 42 insertions(+), 7 deletions(-) >> >> diff --git a/block/rbd.c b/block/rbd.c >> index a57b3e3..75ae1d6 100644 >> --- a/block/rbd.c >> +++ b/block/rbd.c >> @@ -47,7 +47,7 @@ >> */ >> >> /* rbd_aio_discard added in 0.1.2 */ >> -#if LIBRBD_VERSION_CODE >= LIBRBD_VERSION(0, 1, 2) >> +#if LIBRBD_VERSION_CODE >= LIBRBD_VERSION(12, 0, 0) >> #define LIBRBD_SUPPORTS_DISCARD >> #else >> #undef LIBRBD_SUPPORTS_DISCARD >> @@ -73,7 +73,12 @@ typedef struct RBDAIOCB { >> BlockAIOCB common; >> int64_t ret; >> QEMUIOVector *qiov; >> +/* Note: >> + * The LIBRBD_SUPPORTS_IOVEC is defined in librbd.h. >> + */ >> +#ifndef LIBRBD_SUPPORTS_IOVEC >> char *bounce; >> +#endif >> RBDAIOCmd cmd; >> int error; >> struct BDRVRBDState *s; >> @@ -83,7 +88,9 @@ typedef struct RADOSCB { >> RBDAIOCB *acb; >> struct BDRVRBDState *s; >> int64_t size; >> +#ifndef LIBRBD_SUPPORTS_IOVEC >> char *buf; >> +#endif >> int64_t ret; >> } RADOSCB; >> >> @@ -426,11 +433,21 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb) >> } >> } else { >> if (r < 0) { >> +#ifndef LIBRBD_SUPPORTS_IOVEC >> memset(rcb->buf, 0, rcb->size); >> +#else >> + iov_memset(acb->qiov->iov, acb->qiov->niov, 0, 0, acb->qiov->size); >> +#endif >> acb->ret = r; >> acb->error = 1; >> } else if (r < rcb->size) { >> +#ifndef LIBRBD_SUPPORTS_IOVEC >> memset(rcb->buf + r, 0, rcb->size - r); >> +#else >> + iov_memset(acb->qiov->iov, acb->qiov->niov, >> + r, 0, acb->qiov->size - r); >> +#endif >> + >> if (!acb->error) { >> acb->ret = rcb->size; >> } >> @@ -441,10 +458,12 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb) >> >> g_free(rcb); >> >> +#ifndef LIBRBD_SUPPORTS_IOVEC >> if (acb->cmd == RBD_AIO_READ) { >> qemu_iovec_from_buf(acb->qiov, 0, acb->bounce, acb->qiov->size); >> } >> qemu_vfree(acb->bounce); >> +#endif >> acb->common.cb(acb->common.opaque, (acb->ret > 0 ? 
0 : acb->ret)); >> >> qemu_aio_unref(acb); >> @@ -655,8 +674,10 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs, >> RBDAIOCB *acb; >> RADOSCB *rcb = NULL; >> rbd_completion_t c; >> - char *buf; >> int r; >> +#ifndef LIBRBD_SUPPORTS_IOVEC >> + char *buf = NULL; >> +#endif >> >> BDRVRBDState *s = bs->opaque; >> >> @@ -664,6 +685,8 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs, >> acb->cmd = cmd; >> acb->qiov = qiov; >> assert(!qiov || qiov->size == size); >> +#ifndef LIBRBD_SUPPORTS_IOVEC >> + >> if (cmd == RBD_AIO_DISCARD || cmd == RBD_AIO_FLUSH) { >> acb->bounce = NULL; >> } else { >> @@ -672,19 +695,21 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs, >> goto failed; >> } >> } >> - acb->ret = 0; >> - acb->error = 0; >> - acb->s = s; >> - >> if (cmd == RBD_AIO_WRITE) { >> qemu_iovec_to_buf(acb->qiov, 0, acb->bounce, qiov->size); >> } >> - >> buf = acb->bounce; >> +#endif >> + acb->ret = 0; >> + acb->error = 0; >> + acb->s = s; >> >> rcb = g_new(RADOSCB, 1); >> + >> rcb->acb = acb; >> +#ifndef LIBRBD_SUPPORTS_IOVEC >> rcb->buf = buf; >> +#endif >> rcb->s = acb->s; >> rcb->size = size; >> r = rbd_aio_create_completion(rcb, (rbd_callback_t) rbd_finish_aiocb, &c); >> @@ -694,10 +719,18 @@ static BlockAIOCB *rbd_start_aio(BlockDriverState *bs, >> >> switch (cmd) { >> case RBD_AIO_WRITE: >> +#ifndef LIBRBD_SUPPORTS_IOVEC >> r = rbd_aio_write(s->image, off, size, buf, c); >> +#else >> + r = rbd_aio_writev(s->image, qiov->iov, qiov->niov, off, c); >> +#endif >> break; >> case RBD_AIO_READ: >> +#ifndef LIBRBD_SUPPORTS_IOVEC >> r = rbd_aio_read(s->image, off, size, buf, c); >> +#else >> + r = rbd_aio_readv(s->image, qiov->iov, qiov->niov, off, c); >> +#endif >> break; >> case RBD_AIO_DISCARD: >> r = rbd_aio_discard_wrapper(s->image, off, size, c); >> @@ -719,7 +752,9 @@ failed_completion: >> rbd_aio_releas
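A minimal fio job of the sort one might use for the small 4k read/write benchmark mentioned at the top of this reply; the guest device path is a placeholder:

  [global]
  ioengine=libaio
  direct=1
  bs=4k
  iodepth=32
  numjobs=4
  runtime=60
  time_based=1
  group_reporting=1

  [randrw-4k]
  rw=randrw
  filename=/dev/vdb

Comparing the numbers with and without the readv/writev path should show whether avoiding the bounce-buffer copy is measurable at this block size.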
Re: [Qemu-devel] Data corruption in Qemu 2.7.1
Hi,

Proxmox users have recently reported corruption with qemu 2.7 and scsi-block (passing a physical /dev/sdX through virtio-scsi). It works fine with qemu 2.6, and qemu 2.7 + scsi-hd also works fine.

https://forum.proxmox.com/threads/proxmox-4-4-virtio_scsi-regression.31471/page-2

- Original Message -
From: "Peter Lieven"
To: "qemu-devel"
Cc: "qemu-stable"
Sent: Friday, 13 January 2017 11:44:32
Subject: [Qemu-devel] Data corruption in Qemu 2.7.1

Hi,

i'm currently facing a problem in our testing environment where I see file system corruption with 2.7.1 on iSCSI and local storage (LVM). Trying to bisect, but has anyone observed this before?

Thanks,
Peter
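To make the distinction concrete, the two configurations being compared differ only in the SCSI device type attached to the same passed-through block device. A sketch with hypothetical IDs and device path:

  # affected: full SCSI command passthrough
  -drive file=/dev/sdb,if=none,id=drive-scsi0,format=raw,cache=none \
  -device virtio-scsi-pci,id=scsihw0 \
  -device scsi-block,bus=scsihw0.0,drive=drive-scsi0

  # reported as working: emulated disk on top of the same drive
  -device scsi-hd,bus=scsihw0.0,drive=drive-scsi0

scsi-block forwards SCSI CDBs from the guest to the host device, while scsi-hd goes through QEMU's own block emulation, which is consistent with the regression only showing up in the first case.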
[Qemu-devel] drive_mirror to nbd with tls ?
Hi,

I'm currently trying to implement drive-mirror to NBD with TLS. It works fine without TLS, but with TLS enabled on the target I don't know how to pass tls-creds in the drive-mirror NBD URI; I don't see any option for it in the NBD URI scheme.

Currently I get the error "TLS negotiation required before option 3" when calling drive-mirror to NBD.

Any idea?

Regards,
Alexandre Derumier
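For context, the legacy nbd:host:port:exportname=... target string indeed has no field for credentials. One possible direction (a sketch, not a tested recipe; the exact QMP argument layout varies between QEMU versions) is to create the TLS credentials object, reference it from an NBD client node added with blockdev-add, and mirror to that node instead of to a URI. Host name, export name and certificate directory below are placeholders:

  -object tls-creds-x509,id=tls0,dir=/etc/pki/qemu,endpoint=client

  { "execute": "blockdev-add",
    "arguments": { "driver": "nbd", "node-name": "mirror-target",
                   "server": { "type": "inet", "host": "target-host", "port": "10809" },
                   "export": "drive-virtio0", "tls-creds": "tls0" } }

  { "execute": "blockdev-mirror",
    "arguments": { "job-id": "mirror0", "device": "drive-virtio0",
                   "target": "mirror-target", "sync": "full" } }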
Re: [Qemu-devel] any known virtio-net regressions in Qemu 2.7?
Does rolling the kernel back to the previous version fix the problem? I'm not sure, but "perf" might give you some hints.

- Original Message -
From: "Stefan Priebe, Profihost AG"
To: "aderumier"
Cc: "qemu-devel"
Sent: Wednesday, 14 December 2016 21:36:23
Subject: Re: [Qemu-devel] any known virtio-net regressions in Qemu 2.7?

On 14.12.2016 at 16:33, Alexandre DERUMIER wrote:
> Hi Stefan,
>
> do you have upgraded kernel ?

Yes, sure. But i'm out of ideas how to debug. Sometimes it gives me a constant 80 MB/s, sometimes 125 and sometimes only 6, while on the host the cards are not busy.

Greets,
Stefan

>
> maybe it could be related to the vhost-net module too.
>
>
> - Original Message -
> From: "Stefan Priebe, Profihost AG"
> To: "qemu-devel"
> Sent: Wednesday, 14 December 2016 16:04:08
> Subject: [Qemu-devel] any known virtio-net regressions in Qemu 2.7?
>
> Hello,
>
> after upgrading a cluster OS, Qemu, ... i'm experiencing slow and
> volatile network speeds inside my VMs.
>
> Currently I've no idea what causes this but it's related to the host
> upgrades. Before i was running Qemu 2.6.2.
>
> I'm using virtio for the network cards.
>
> Greets,
> Stefan
>
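A couple of perf invocations of the kind hinted at above, run on the host against the QEMU and vhost threads during a slow transfer (the PID is obviously a placeholder):

  # live view of where the VM's host-side threads spend their time
  perf top -p <qemu-pid>

  # capture ~30 s system-wide and inspect afterwards, including vhost kernel threads
  perf record -g -a -- sleep 30
  perf report

If the time shows up in vhost_net / tap paths on the host rather than inside QEMU, that points back at the host kernel upgrade rather than at QEMU 2.7 itself.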
Re: [Qemu-devel] any known virtio-net regressions in Qemu 2.7?
Hi Stefan,

Did you also upgrade the kernel? Maybe it could be related to the vhost-net module too.

- Original Message -
From: "Stefan Priebe, Profihost AG"
To: "qemu-devel"
Sent: Wednesday, 14 December 2016 16:04:08
Subject: [Qemu-devel] any known virtio-net regressions in Qemu 2.7?

Hello,

after upgrading a cluster OS, Qemu, ... i'm experiencing slow and volatile network speeds inside my VMs.

Currently I've no idea what causes this but it's related to the host upgrades. Before, i was running Qemu 2.6.2.

I'm using virtio for the network cards.

Greets,
Stefan
Re: [Qemu-devel] [PATCH v7] docs: add cpu-hotplug.txt
Hello, I'm looking to implement cpu hotplug, and I have a question about cpu flags currently I have something like -cpu qemu64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -smp 4,sockets=2,cores=2,maxcpus=4 Does I need to define flags like: -smp 2,sockets=2,cores=2,maxcpus=4 -device qemu64-x86_64-cpu,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce,id=cpu1,socket-id=1,core-id=1,thread-id=0 ... ? Another question, is -smp mandatory ? (if I want coldplug all cpus) -smp sockets=2,cores=2,maxcpus=4 -device qemu64-x86_64-cpu,id=cpu1,socket-id=1,core-id=1,thread-id=0 -device qemu64-x86_64-cpu,id=cpu1,socket-id=1,core-id=2,thread-id=0 -device qemu64-x86_64-cpu,id=cpu3,socket-id=2,core-id=1,thread-id=0 -device qemu64-x86_64-cpu,id=cpu4,socket-id=2,core-id=2,thread-id=0 or does I need minimum 1 non unplugable cpu -smp 1,sockets=2,cores=2,maxcpus=4 -device qemu64-x86_64-cpu,id=cpu1,socket-id=1,core-id=2,thread-id=0 -device qemu64-x86_64-cpu,id=cpu3,socket-id=2,core-id=1,thread-id=0 -device qemu64-x86_64-cpu,id=cpu4,socket-id=2,core-id=2,thread-id=0 Regards, Alexandre - Mail original - De: "Dou Liyang" À: "qemu-devel" Cc: "Dou Liyang" , drjo...@redhat.com, "ehabkost" , "Markus Armbruster" , bhar...@linux.vnet.ibm.com, "Fam Zheng" , "Igor Mammedov" , da...@gibson.dropbear.id.au Envoyé: Jeudi 18 Août 2016 03:50:50 Objet: [Qemu-devel] [PATCH v7] docs: add cpu-hotplug.txt This document describes how to use cpu hotplug in QEMU. Signed-off-by: Dou Liyang Reviewed-by: Andrew Jones --- Change log v6 -> v7 >From Bharata's advice 1. add "qom_path" property explanation for "info hotpluggable-cpus" command >From drew's advice 1. Fix some spelling mistake Change log v5 -> v6 >From drew's advice 1. Fix some spelling and grammar mistakes Change log v4 -> v5 1. add an example for sPAPR >From Bharata's advice 1. Fix the examples Change log v3 -> v4 >From David's advice 1. add spapr examples 2. Fix some comment >From drew's advice 1. Fix some syntax Change log v2 -> v3: >From drew's advice: 1. modify the examples. 2. Fix some syntax. Change log v1 -> v2: >From Fam's advice: 1. Fix some comment. Change log v1: >From Igor's advice: 1. Remove any mentioning of apic-id from the document. 2. Remove the "device_del qom_path" from the CPU hot-unplug. 3. Fix some comment. docs/cpu-hotplug.txt | 156 +++ 1 file changed, 156 insertions(+) create mode 100644 docs/cpu-hotplug.txt diff --git a/docs/cpu-hotplug.txt b/docs/cpu-hotplug.txt new file mode 100644 index 000..3667641 --- /dev/null +++ b/docs/cpu-hotplug.txt @@ -0,0 +1,156 @@ +QEMU CPU hotplug + + +This document explains how to use the CPU hotplug feature in QEMU, +which regards the CPU as a device, using -device/device_add and +device_del. + +QEMU support was merged for 2.7. + +Guest support is required for CPU hotplug to work. + +CPU hot-plug + + +In order to be able to hotplug CPUs, QEMU has to be told the maximum +number of CPUs which the guest can have. This is done at startup time +by means of the -smp command-line option, which has the following +format: + + -smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads] + [,sockets=sockets] + +where, + + - "cpus" sets the number of CPUs to 'n' [default=1]. + - "maxcpus" sets the maximum number of CPUs, including offline VCPUs + for hotplug. + - "sockets" sets the number of discrete sockets in the system. + - "cores" sets the number of CPU cores on one socket. + - "threads" sets the number of threads on one CPU core. + +For example, the following command-line: + + qemu [...] 
-smp 4,maxcpus=8,sockets=2,cores=2,threads=2 + +creates a guest with 4 VCPUs and supports up to 8 VCPUs. The CPU topology +is sockets (2) * cores (2) * threads (2) and should compute a number of +slots exactly equal to maxcpus. A computed number of slots greater than +maxcpus will result in error. When the guest finishes loading, the guest +will see 4 VCPUs. More of this below. + +Query available CPU objects +--- + +To add a VCPU, it must be identified by socket-id, core-id, and/or +thread-id parameters. + +Before adding the VCPU, we should know the topology parameters, so +that we can find the available location (socket,core,thread) for a +new VCPU. + +Use the HMP command "info hotpluggable-cpus" to obtain them, for example: + + (qemu) info hotpluggable-cpus + +lists all CPUs including the present and possible hot-pluggable CPUs. +Such as this: + + ... + type: "qemu64-x86_64-cpu" + vcpus_count: "1" + CPUInstance Properties: + socket-id: "1" + core-id: "0" + thread-id: "0" + type: "qemu64-x86_64-cpu" + vcpus_count: "1" + qom_path: "/machine/unattached/device[4]" + CPUInstance Properties: + socket-id: "0" + core-id: "1" + thread-id: "1" + ... + +or + + ... + type: "POWER7_v2.3-spapr-cpu
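On the questions asked at the top of this message, a sketch of how this usually looks, as far as I understand it (IDs and topology values are made up): -smp still has to be given, since it defines the topology and maxcpus that the hotplug slots are carved out of, and the feature flags set with -cpu apply to hotplugged CPUs of the same model as well, so they are not repeated on each -device/device_add:

  qemu-system-x86_64 -enable-kvm \
    -cpu qemu64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce \
    -smp 2,sockets=2,cores=2,maxcpus=4 \
    ...

  # later, at the monitor, fill one of the free slots reported by
  # "info hotpluggable-cpus":
  (qemu) device_add qemu64-x86_64-cpu,id=cpu2,socket-id=1,core-id=0,thread-id=0

Whether a fully cold-plugged layout can be described with -device alone, without any CPU coming from -smp, is exactly the kind of detail the maintainers would have to confirm.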
[Qemu-devel] [PATCH] rbd : disable rbd_cache_writethrough_until_flush for cache=unsafe
Ceph forces writethrough until a flush is detected. With cache=unsafe we never send a flush, so we need to tell Ceph to set rbd_cache_writethrough_until_flush=false in this case.

This greatly speeds up qemu-img convert, which uses cache=unsafe by default.

Signed-off-by: Alexandre Derumier
---
 block/rbd.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/rbd.c b/block/rbd.c
index 0106fea..f3af6c8 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -552,6 +552,10 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
         rados_conf_set(s->cluster, "rbd_cache", "true");
     }
 
+    if (flags & BDRV_O_NO_FLUSH) {
+        rados_conf_set(s->cluster, "rbd_cache_writethrough_until_flush", "false");
+    }
+
     r = rados_connect(s->cluster);
     if (r < 0) {
         error_setg_errno(errp, -r, "error connecting");
-- 
2.1.4
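As a usage note, the case this targets is the plain qemu-img convert invocation toward an RBD destination, something like the example below (pool and image names are placeholders). qemu-img convert opens the destination with cache=unsafe unless told otherwise, so without this patch the whole copy runs in writethrough on the Ceph side, because no flush ever arrives to switch the cache mode:

  qemu-img convert -p -f qcow2 -O raw vm-disk.qcow2 rbd:rbd/vm-disk-1

With the patch, the RBD cache behaves as a genuine writeback cache for the duration of the convert.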
Re: [Qemu-devel] [RFC] virtio-blk: simple multithreaded MQ implementation for bdrv_raw
Hi, >>To avoid any locks in qemu backend and not to introduce thread safety >>into qemu block-layer I open same backend device several times, one >>device per one MQ. e.g. the following is the stack for a virtio-blk >>with num-queues=2: Could it be possible in the future to not open several times the same backend ? I'm thinking about ceph/librbd, which since last version allow only to open once a backend by default (exclusive-lock, which is a requirement for advanced features like rbd-mirroring, fast-diff,) Regards, Alexandre Derumier - Mail original - De: "Stefan Hajnoczi" À: "Roman Pen" Cc: "qemu-devel" , "stefanha" Envoyé: Samedi 28 Mai 2016 00:27:10 Objet: Re: [Qemu-devel] [RFC] virtio-blk: simple multithreaded MQ implementation for bdrv_raw On Fri, May 27, 2016 at 01:55:04PM +0200, Roman Pen wrote: > Hello, all. > > This is RFC because mostly this patch is a quick attempt to get true > multithreaded multiqueue support for a block device with native AIO. > The goal is to squeeze everything possible on lockless IO path from > MQ block on a guest to MQ block on a host. > > To avoid any locks in qemu backend and not to introduce thread safety > into qemu block-layer I open same backend device several times, one > device per one MQ. e.g. the following is the stack for a virtio-blk > with num-queues=2: > > VirtIOBlock > / \ > VirtQueue#0 VirtQueue#1 > IOThread#0 IOThread#1 > BH#0 BH#1 > Backend#0 Backend#1 > \ / > /dev/null0 > > To group all objects related to one vq new structure is introduced: > > typedef struct VirtQueueCtx { > BlockBackend *blk; > struct VirtIOBlock *s; > VirtQueue *vq; > void *rq; > QEMUBH *bh; > QEMUBH *batch_notify_bh; > IOThread *iothread; > Notifier insert_notifier; > Notifier remove_notifier; > /* Operation blocker on BDS */ > Error *blocker; > } VirtQueueCtx; > > And VirtIOBlock includes an array of these contexts: > > typedef struct VirtIOBlock { > VirtIODevice parent_obj; > + VirtQueueCtx mq[VIRTIO_QUEUE_MAX]; > ... > > This patch is based on Stefan's series: "virtio-blk: multiqueue support", > with minor difference: I reverted "virtio-blk: multiqueue batch notify", > which does not make a lot sense when each VQ is handled by it's own > iothread. > > The qemu configuration stays the same, i.e. put num-queues=N and N > iothreads will be started on demand and N drives will be opened: > > qemu -device virtio-blk-pci,num-queues=8 > > My configuration is the following: > > host: > Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, > 8 CPUs, > /dev/nullb0 as backend with the following parameters: > $ cat /sys/module/null_blk/parameters/submit_queues > 8 > $ cat /sys/module/null_blk/parameters/irqmode > 1 > > guest: > 8 VCPUs > > qemu: > -object iothread,id=t0 \ > -drive > if=none,id=d0,file=/dev/nullb0,format=raw,snapshot=off,cache=none,aio=native > \ > -device > virtio-blk-pci,num-queues=$N,iothread=t0,drive=d0,disable-modern=off,disable-legacy=on > > > where $N varies during the tests. 
> > fio: > [global] > description=Emulation of Storage Server Access Pattern > bssplit=512/20:1k/16:2k/9:4k/12:8k/19:16k/10:32k/8:64k/4 > fadvise_hint=0 > rw=randrw:2 > direct=1 > > ioengine=libaio > iodepth=64 > iodepth_batch_submit=64 > iodepth_batch_complete=64 > numjobs=8 > gtod_reduce=1 > group_reporting=1 > > time_based=1 > runtime=30 > > [job] > filename=/dev/vda > > Results: > num-queues RD bw WR bw > -- - - > > * with 1 iothread * > > 1 thr 1 mq 1225MB/s 1221MB/s > 1 thr 2 mq 1559MB/s 1553MB/s > 1 thr 4 mq 1729MB/s 1725MB/s > 1 thr 8 mq 1660MB/s 1655MB/s > > * with N iothreads * > > 2 thr 2 mq 1845MB/s 1842MB/s > 4 thr 4 mq 2187MB/s 2183MB/s > 8 thr 8 mq 1383MB/s 1378MB/s > > Obviously, 8 iothreads + 8 vcpu threads is too much for my machine > with 8 CPUs, but 4 iothreads show quite good result. Cool, thanks for trying this experiment and posting results. It's encouraging to see the improvement. Did you use any CPU affinity settings to co-locate vcpu and iothreads onto host CPUs? Stefan
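To make the "one backend per virtqueue" layout above concrete outside of QEMU, here is a minimal self-contained sketch of the initialisation it implies: one private backend handle plus one I/O thread per queue, so the submission path shares no state and needs no locking. The types and the open_backend()/create_iothread() helpers are placeholders standing in for BlockBackend/IOThread creation; this is not Roman's actual patch.

/* Sketch of the per-virtqueue context layout described in the RFC above.
 * All types and helpers here are stand-ins, not QEMU APIs. */
#include <stdio.h>
#include <stdlib.h>

#define MAX_QUEUES 8

typedef struct Backend  { int fd; } Backend;    /* stand-in for BlockBackend */
typedef struct IOThread { int id; } IOThread;   /* stand-in for IOThread     */

typedef struct VirtQueueCtx {
    Backend  *blk;        /* private backend handle for this queue */
    IOThread *iothread;   /* thread that services this queue       */
    int       vq_index;
} VirtQueueCtx;

typedef struct VirtIOBlock {
    VirtQueueCtx mq[MAX_QUEUES];
    int num_queues;
} VirtIOBlock;

/* Hypothetical helpers: in real code these would open the same image path
 * once per queue and create one iothread object per queue. */
static Backend *open_backend(const char *path)
{
    Backend *b = malloc(sizeof(*b));
    b->fd = -1;
    (void)path;
    return b;
}

static IOThread *create_iothread(int i)
{
    IOThread *t = malloc(sizeof(*t));
    t->id = i;
    return t;
}

static void virtio_blk_init_mq(VirtIOBlock *s, const char *path, int num_queues)
{
    s->num_queues = num_queues;
    for (int i = 0; i < num_queues; i++) {
        /* one independent backend + iothread per virtqueue */
        s->mq[i].blk      = open_backend(path);
        s->mq[i].iothread = create_iothread(i);
        s->mq[i].vq_index = i;
    }
}

int main(void)
{
    VirtIOBlock vblk;
    virtio_blk_init_mq(&vblk, "/dev/nullb0", 4);
    printf("initialised %d queue contexts\n", vblk.num_queues);
    return 0;
}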
Re: [Qemu-devel] [PATCH] fw_cfg: unbreak migration compatibility for 2.4 and earlier machines
>>... but some actual migration testing would be great indeed. I have sent a build with the patch applied to proxmox users for testing; I'm waiting for their results - Mail original - De: "kraxel" À: "Laszlo Ersek" Cc: "qemu-devel" , "Marc Marí" , "aderumier" , "qemu-stable" Envoyé: Vendredi 19 Février 2016 08:09:54 Objet: Re: [PATCH] fw_cfg: unbreak migration compatibility for 2.4 and earlier machines On Do, 2016-02-18 at 20:31 +0100, Laszlo Ersek wrote: > When I reviewed Marc's fw_cfg DMA patches, I completely missed that the > way we set dma_enabled would break migration. Patch looks fine to me ... > Not tested: > * actual migration > * when board code doesn't request DMA support > > Testing feedback from people who use migration would be nice. ... but some actual migration testing would be great indeed. thanks, Gerd
[Qemu-devel] [Bug 1536487] Re: Unable to migrate pc-i440fx-2.4 KVM guest from QEMU 2.5.0 to QEMU 2.4.1
Hi, Proxmox users have reported the same bug (qemu 2.5 with pc-i440fx-2.4 not migrating to qemu 2.4.1) https://forum.proxmox.com/threads/cant-live-migrate-after-dist-upgrade.26097/ I haven't verified it yet, but it seems to be related. -- You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1536487 Title: Unable to migrate pc-i440fx-2.4 KVM guest from QEMU 2.5.0 to QEMU 2.4.1 Status in QEMU: New Bug description: When migrating a pc-i440fx-2.4 KVM guest from QEMU 2.5.0 to QEMU 2.4.1, the target QEMU errors out: qemu-system-x86_64: error while loading state for instance 0x0 of device 'fw_cfg' This appears to be related to the addition of a DMA interface to fw_cfg last October: http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg04568.html "info qtree" on the source QEMU shows that the DMA interface for fw_cfg had been enabled: bus: main-system-bus type System ... dev: fw_cfg_io, id "" iobase = 1296 (0x510) dma_iobase = 1300 (0x514) dma_enabled = true Incidentally, this guest had just undergone a migration from QEMU 2.4.1 to QEMU 2.5.0, so it looks like DMA was enabled simply through the migration. It seems to me that the DMA interface for fw_cfg should only be enabled on pc-i440fx-2.5 machines or higher. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1536487/+subscriptions
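One way to express the constraint in the last paragraph of the bug description is the machine-type compat-property pattern: pre-2.5 machine types force dma_enabled=off on the fw_cfg devices so their migration stream matches QEMU 2.4.x. The standalone sketch below only models that idea; the property and driver names mirror the qtree output above, but the table and lookup are hypothetical, not the actual QEMU fix.

/* Illustrative model (not QEMU code) of gating the fw_cfg DMA interface by
 * machine type: pc-i440fx-2.4 keeps dma_enabled=off so it stays migration
 * compatible with QEMU 2.4.x. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef struct CompatProp {
    const char *machine;    /* machine type the override applies to */
    const char *driver;     /* device driver name                   */
    const char *property;
    const char *value;
} CompatProp;

static const CompatProp compat_2_4[] = {
    { "pc-i440fx-2.4", "fw_cfg_io",  "dma_enabled", "off" },
    { "pc-i440fx-2.4", "fw_cfg_mem", "dma_enabled", "off" },
};

static bool fw_cfg_dma_enabled(const char *machine, const char *driver)
{
    for (size_t i = 0; i < sizeof(compat_2_4) / sizeof(compat_2_4[0]); i++) {
        if (!strcmp(machine, compat_2_4[i].machine) &&
            !strcmp(driver, compat_2_4[i].driver)) {
            return false;          /* compat entry forces the DMA-less device */
        }
    }
    return true;                   /* 2.5+ machines expose the DMA interface */
}

int main(void)
{
    printf("pc-i440fx-2.4/fw_cfg_io: dma_enabled=%d\n",
           fw_cfg_dma_enabled("pc-i440fx-2.4", "fw_cfg_io"));
    printf("pc-i440fx-2.5/fw_cfg_io: dma_enabled=%d\n",
           fw_cfg_dma_enabled("pc-i440fx-2.5", "fw_cfg_io"));
    return 0;
}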
[Qemu-devel] Unable to migrate pc-i440fx-2.4 KVM guest from QEMU 2.5.0 to QEMU 2.4.1
Hi, I (and proxmox users) have the same problem as this bug report https://bugs.launchpad.net/qemu/+bug/1536487 https://forum.proxmox.com/threads/cant-live-migrate-after-dist-upgrade.26097/ Migrating from qemu 2.5 with pc-i440fx-2.4 to qemu 2.4.1 doesn't work. qemu-system-x86_64: error while loading state for instance 0x0 of device 'fw_cfg' Details are in the bug report. Regards, Alexandre
Re: [Qemu-devel] poor virtio-scsi performance (fio testing)
>>May be my default cfq slowdown? Yes ! (in your first mail you said that you used the deadline scheduler ?) cfq doesn't play well with a lot of concurrent jobs. cfq + numjobs=10 : 1 iops cfq + numjobs=1 : 25000 iops deadline + numjobs=1 : 25000 iops deadline + numjobs=10 : 25000 iops - Mail original - De: "Vasiliy Tolstov" À: "aderumier" Cc: "qemu-devel" Envoyé: Mercredi 25 Novembre 2015 11:48:11 Objet: Re: [Qemu-devel] poor virtio-scsi performance (fio testing) 2015-11-25 13:27 GMT+03:00 Alexandre DERUMIER : > I have tested with a raw file, qemu 2.4 + virtio-scsi (without iothread), I'm > around 25k iops > with an intel ssd 3500. (host cpu are xeon v3 3,1ghz) What scheduler you have on host system? May be my default cfq slowdown? -- Vasiliy Tolstov, e-mail: v.tols...@selfip.ru
Re: [Qemu-devel] poor virtio-scsi performance (fio testing)
>>I'm try with classic lvm - sometimes i get more iops, but stable results is the same =) I have tested with a raw file, qemu 2.4 + virtio-scsi (without iothread), I'm around 25k iops with an intel ssd 3500. (host cpu are xeon v3 3,1ghz) randrw: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 ... fio-2.1.11 Starting 10 processes Jobs: 4 (f=3): [m(2),_(3),m(1),_(3),m(1)] [100.0% done] [96211KB/96639KB/0KB /s] [24.6K/24.2K/0 iops] [eta 00m:00s] randrw: (groupid=0, jobs=10): err= 0: pid=25662: Wed Nov 25 11:25:22 2015 read : io=5124.7MB, bw=97083KB/s, iops=24270, runt= 54047msec slat (usec): min=1, max=34577, avg=181.90, stdev=739.20 clat (usec): min=177, max=49641, avg=6511.31, stdev=3176.16 lat (usec): min=185, max=52810, avg=6693.55, stdev=3247.36 clat percentiles (usec): | 1.00th=[ 1704], 5.00th=[ 2576], 10.00th=[ 3184], 20.00th=[ 4016], | 30.00th=[ 4704], 40.00th=[ 5344], 50.00th=[ 5984], 60.00th=[ 6688], | 70.00th=[ 7456], 80.00th=[ 8512], 90.00th=[10304], 95.00th=[12224], | 99.00th=[17024], 99.50th=[19584], 99.90th=[26240], 99.95th=[29568], | 99.99th=[37632] bw (KB /s): min= 6690, max=12432, per=10.02%, avg=9728.49, stdev=796.49 write: io=5115.1MB, bw=96929KB/s, iops=24232, runt= 54047msec slat (usec): min=1, max=37270, avg=188.68, stdev=756.21 clat (usec): min=98, max=54737, avg=6246.50, stdev=3078.27 lat (usec): min=109, max=56078, avg=6435.53, stdev=3134.23 clat percentiles (usec): | 1.00th=[ 1960], 5.00th=[ 2640], 10.00th=[ 3120], 20.00th=[ 3856], | 30.00th=[ 4512], 40.00th=[ 5088], 50.00th=[ 5664], 60.00th=[ 6304], | 70.00th=[ 7072], 80.00th=[ 8160], 90.00th=[ 9920], 95.00th=[11712], | 99.00th=[16768], 99.50th=[19328], 99.90th=[26496], 99.95th=[31104], | 99.99th=[42752] bw (KB /s): min= 7424, max=12712, per=10.02%, avg=9712.21, stdev=768.32 lat (usec) : 100=0.01%, 250=0.01%, 500=0.02%, 750=0.03%, 1000=0.05% lat (msec) : 2=1.49%, 4=19.18%, 10=68.69%, 20=10.10%, 50=0.45% lat (msec) : 100=0.01% cpu : usr=1.28%, sys=8.94%, ctx=329299, majf=0, minf=76 IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% issued: total=r=1311760/w=1309680/d=0, short=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=32 Run status group 0 (all jobs): READ: io=5124.7MB, aggrb=97082KB/s, minb=97082KB/s, maxb=97082KB/s, mint=54047msec, maxt=54047msec WRITE: io=5115.1MB, aggrb=96928KB/s, minb=96928KB/s, maxb=96928KB/s, mint=54047msec, maxt=54047msec Disk stats (read/write): sdb: ios=1307835/1305417, merge=2523/2770, ticks=4309296/3954628, in_queue=8274916, util=99.95% - Mail original - De: "Vasiliy Tolstov" À: "aderumier" Cc: "qemu-devel" Envoyé: Mercredi 25 Novembre 2015 11:12:33 Objet: Re: [Qemu-devel] poor virtio-scsi performance (fio testing) 2015-11-25 13:08 GMT+03:00 Alexandre DERUMIER : > Maybe could you try to create 2 disk in your vm, each with 1 dedicated > iothread, > > then try to run fio on both disk at the same time, and see if performance > improve. > Thats fine, but by default i have only one disk inside vm, so i prefer increase single disk speed. > > But maybe they are some write overhead with lvmthin (because of copy on > write) and sheepdog. > > Do you have tried with classic lvm or raw file ? I'm try with classic lvm - sometimes i get more iops, but stable results is the same =) -- Vasiliy Tolstov, e-mail: v.tols...@selfip.ru
Re: [Qemu-devel] poor virtio-scsi performance (fio testing)
Could you maybe try to create 2 disks in your vm, each with 1 dedicated iothread, then run fio on both disks at the same time, and see if performance improves. But maybe there is some write overhead with lvmthin (because of copy on write) and sheepdog. Have you tried with classic lvm or a raw file ? - Mail original - De: "Vasiliy Tolstov" À: "qemu-devel" Envoyé: Jeudi 19 Novembre 2015 09:16:22 Objet: [Qemu-devel] poor virtio-scsi performance (fio testing) I'm test virtio-scsi on various kernels (with and without scsi-mq) with deadline io scheduler (best performance). I'm test with lvm thin volume and with sheepdog storage. Data goes to ssd that have on host system is about 30K iops. When i'm test via fio [randrw] blocksize=4k filename=/dev/sdb rw=randrw direct=1 buffered=0 ioengine=libaio iodepth=32 group_reporting numjobs=10 runtime=600 I'm always stuck at 11K-12K iops with sheepdog or with lvm. When i'm switch to virtio-blk and enable data-plane i'm get around 16K iops. I'm try to enable virtio-scsi-data-plane but may be miss something (get around 13K iops) I'm use libvirt 1.2.16 and qemu 2.4.1 What can i do to get near 20K-25K iops? (qemu testing drive have cache=none io=native) -- Vasiliy Tolstov, e-mail: v.tols...@selfip.ru
Re: [Qemu-devel] qemu : rbd block driver internal snapshot and vm_stop is hanging forever
>>Can you reproduce with Ceph debug logging enabled (i.e. debug rbd=20 in your >>ceph.conf)? If you could attach the log to the Ceph tracker ticket I opened >>[1], that would be very helpful. >> >>[1] http://tracker.ceph.com/issues/13726 yes,I'm able to reproduce it 100%, I have attached the log to the tracker. Alexandre - Mail original - De: "Jason Dillaman" À: "aderumier" Cc: "ceph-devel" , "qemu-devel" , jdur...@redhat.com Envoyé: Lundi 9 Novembre 2015 14:42:42 Objet: Re: [Qemu-devel] qemu : rbd block driver internal snapshot and vm_stop is hanging forever Can you reproduce with Ceph debug logging enabled (i.e. debug rbd=20 in your ceph.conf)? If you could attach the log to the Ceph tracker ticket I opened [1], that would be very helpful. [1] http://tracker.ceph.com/issues/13726 Thanks, Jason - Original Message - > From: "Alexandre DERUMIER" > To: "ceph-devel" > Cc: "qemu-devel" , jdur...@redhat.com > Sent: Monday, November 9, 2015 5:48:45 AM > Subject: Re: [Qemu-devel] qemu : rbd block driver internal snapshot and > vm_stop is hanging forever > > adding to ceph.conf > > [client] > rbd_non_blocking_aio = false > > > fix the problem for me (with rbd_cache=false) > > > (@cc jdur...@redhat.com) > > > > - Mail original - > De: "Denis V. Lunev" > À: "aderumier" , "ceph-devel" > , "qemu-devel" > Envoyé: Lundi 9 Novembre 2015 08:22:34 > Objet: Re: [Qemu-devel] qemu : rbd block driver internal snapshot and vm_stop > is hanging forever > > On 11/09/2015 10:19 AM, Denis V. Lunev wrote: > > On 11/09/2015 06:10 AM, Alexandre DERUMIER wrote: > >> Hi, > >> > >> with qemu (2.4.1), if I do an internal snapshot of an rbd device, > >> then I pause the vm with vm_stop, > >> > >> the qemu process is hanging forever > >> > >> > >> monitor commands to reproduce: > >> > >> > >> # snapshot_blkdev_internal drive-virtio0 yoursnapname > >> # stop > >> > >> > >> > >> > >> I don't see this with qcow2 or sheepdog block driver for example. > >> > >> > >> Regards, > >> > >> Alexandre > >> > > this could look like the problem I have recenty trying to > > fix with dataplane enabled. Patch series is named as > > > > [PATCH for 2.5 v6 0/10] dataplane snapshot fixes > > > > Den > > anyway, even if above will not help, can you collect gdb > traces from all threads in QEMU process. May be I'll be > able to give a hit. > > Den > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
Re: [Qemu-devel] qemu : rbd block driver internal snapshot and vm_stop is hanging forever
adding to ceph.conf [client] rbd_non_blocking_aio = false fix the problem for me (with rbd_cache=false) (@cc jdur...@redhat.com) - Mail original - De: "Denis V. Lunev" À: "aderumier" , "ceph-devel" , "qemu-devel" Envoyé: Lundi 9 Novembre 2015 08:22:34 Objet: Re: [Qemu-devel] qemu : rbd block driver internal snapshot and vm_stop is hanging forever On 11/09/2015 10:19 AM, Denis V. Lunev wrote: > On 11/09/2015 06:10 AM, Alexandre DERUMIER wrote: >> Hi, >> >> with qemu (2.4.1), if I do an internal snapshot of an rbd device, >> then I pause the vm with vm_stop, >> >> the qemu process is hanging forever >> >> >> monitor commands to reproduce: >> >> >> # snapshot_blkdev_internal drive-virtio0 yoursnapname >> # stop >> >> >> >> >> I don't see this with qcow2 or sheepdog block driver for example. >> >> >> Regards, >> >> Alexandre >> > this could look like the problem I have recenty trying to > fix with dataplane enabled. Patch series is named as > > [PATCH for 2.5 v6 0/10] dataplane snapshot fixes > > Den anyway, even if above will not help, can you collect gdb traces from all threads in QEMU process. May be I'll be able to give a hit. Den
Re: [Qemu-devel] qemu : rbd block driver internal snapshot and vm_stop is hanging forever
Something is really wrong, because guest is also freezing, with a simple snapshot, with cache=none / rbd_cache=false qemu monitor : snapshot_blkdev_internal drive-virtio0 snap1 or rbd command : rbd --image myrbdvolume snap create --snap snap1 Then the guest can't read/write to disk anymore. I have tested with last ceph internalis, same problem - Mail original - De: "aderumier" À: "ceph-devel" , "qemu-devel" Envoyé: Lundi 9 Novembre 2015 04:54:23 Objet: Re: [Qemu-devel] qemu : rbd block driver internal snapshot and vm_stop is hanging forever Also, this occur only with rbd_cache=false or qemu drive cache=none. If I use rbd_cache=true or qemu drive cache=writeback, I don't have this bug. - Mail original - De: "aderumier" À: "ceph-devel" , "qemu-devel" Envoyé: Lundi 9 Novembre 2015 04:23:10 Objet: Re: qemu : rbd block driver internal snapshot and vm_stop is hanging forever Some other infos: I can reproduce it too with manual snapshot with rbd command #rbd --image myrbdvolume snap create --snap snap1 qemu monitor: #stop This is with ceph hammer 0.94.5. in qemu vm_stop, the only thing related to block driver are bdrv_drain_all(); ret = bdrv_flush_all(); - Mail original - De: "aderumier" À: "ceph-devel" , "qemu-devel" Envoyé: Lundi 9 Novembre 2015 04:10:45 Objet: qemu : rbd block driver internal snapshot and vm_stop is hanging forever Hi, with qemu (2.4.1), if I do an internal snapshot of an rbd device, then I pause the vm with vm_stop, the qemu process is hanging forever monitor commands to reproduce: # snapshot_blkdev_internal drive-virtio0 yoursnapname # stop I don't see this with qcow2 or sheepdog block driver for example. Regards, Alexandre -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] qemu : rbd block driver internal snapshot and vm_stop is hanging forever
Also, this occur only with rbd_cache=false or qemu drive cache=none. If I use rbd_cache=true or qemu drive cache=writeback, I don't have this bug. - Mail original - De: "aderumier" À: "ceph-devel" , "qemu-devel" Envoyé: Lundi 9 Novembre 2015 04:23:10 Objet: Re: qemu : rbd block driver internal snapshot and vm_stop is hanging forever Some other infos: I can reproduce it too with manual snapshot with rbd command #rbd --image myrbdvolume snap create --snap snap1 qemu monitor: #stop This is with ceph hammer 0.94.5. in qemu vm_stop, the only thing related to block driver are bdrv_drain_all(); ret = bdrv_flush_all(); - Mail original - De: "aderumier" À: "ceph-devel" , "qemu-devel" Envoyé: Lundi 9 Novembre 2015 04:10:45 Objet: qemu : rbd block driver internal snapshot and vm_stop is hanging forever Hi, with qemu (2.4.1), if I do an internal snapshot of an rbd device, then I pause the vm with vm_stop, the qemu process is hanging forever monitor commands to reproduce: # snapshot_blkdev_internal drive-virtio0 yoursnapname # stop I don't see this with qcow2 or sheepdog block driver for example. Regards, Alexandre -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] qemu : rbd block driver internal snapshot and vm_stop is hanging forever
Some other infos: I can reproduce it too with manual snapshot with rbd command #rbd --image myrbdvolume snap create --snap snap1 qemu monitor: #stop This is with ceph hammer 0.94.5. in qemu vm_stop, the only thing related to block driver are bdrv_drain_all(); ret = bdrv_flush_all(); - Mail original - De: "aderumier" À: "ceph-devel" , "qemu-devel" Envoyé: Lundi 9 Novembre 2015 04:10:45 Objet: qemu : rbd block driver internal snapshot and vm_stop is hanging forever Hi, with qemu (2.4.1), if I do an internal snapshot of an rbd device, then I pause the vm with vm_stop, the qemu process is hanging forever monitor commands to reproduce: # snapshot_blkdev_internal drive-virtio0 yoursnapname # stop I don't see this with qcow2 or sheepdog block driver for example. Regards, Alexandre -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Qemu-devel] qemu : rbd block driver internal snapshot and vm_stop is hanging forever
Hi, with qemu (2.4.1), if I do an internal snapshot of an rbd device, then I pause the vm with vm_stop, the qemu process is hanging forever monitor commands to reproduce: # snapshot_blkdev_internal drive-virtio0 yoursnapname # stop I don't see this with qcow2 or sheepdog block driver for example. Regards, Alexandre
Re: [Qemu-devel] [Bug 1494350] Re: QEMU: causes vCPU steal time overflow on live migration
>>Hi, Hi >> >>I've seen the same issue with debian jessie. >> >>Compiled 4.2.3 from kernel.org with "make localyesconfig", >>no problem any more host kernel or guest kernel ? - Mail original - De: "Markus Breitegger" <1494...@bugs.launchpad.net> À: "qemu-devel" Envoyé: Jeudi 15 Octobre 2015 12:16:07 Objet: [Qemu-devel] [Bug 1494350] Re: QEMU: causes vCPU steal time overflow on live migration Hi, I've seen the same issue with debian jessie. Compiled 4.2.3 from kernel.org with "make localyesconfig", no problem any more -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1494350 Title: QEMU: causes vCPU steal time overflow on live migration Status in QEMU: New Bug description: I'm pasting in text from Debian Bug 785557 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785557 b/c I couldn't find this issue reported. It is present in QEMU 2.3, but I haven't tested later versions. Perhaps someone else will find this bug and confirm for later versions. (Or I will when I have time!) Hi, I'm trying to debug an issue we're having with some debian.org machines running in QEMU 2.1.2 instances (see [1] for more background). In short, after a live migration guests running Debian Jessie (linux 3.16) stop accounting CPU time properly. /proc/stat in the guest shows no increase in user and system time anymore (regardless of workload) and what stands out are extremely large values for steal time: % cat /proc/stat cpu 2400 0 1842 650879168 2579640 0 25 136562317270 0 0 cpu0 1366 0 1028 161392988 1238598 0 11 383803090749 0 0 cpu1 294 0 240 162582008 639105 0 8 39686436048 0 0 cpu2 406 0 338 163331066 383867 0 4 333994238765 0 0 cpu3 332 0 235 163573105 318069 0 1 1223752959076 0 0 intr 355773871 33 10 0 0 0 0 3 0 1 0 0 36 144 0 0 1638612 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5001741 41 0 8516993 0 3669582 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ctxt 837862829 btime 1431642967 processes 8529939 procs_running 1 procs_blocked 0 softirq 225193331 2 77532878 172 7250024 819289 0 54 33739135 176552 105675225 Reading the memory pointed to by the steal time MSRs pre- and post-migration, I can see that post-migration the high bytes are set to 0xff: 
(qemu) xp /8b 0x1fc0cfc0 1fc0cfc0: 0x94 0x57 0x77 0xf5 0xff 0xff 0xff 0xff The "jump" in steal time happens when the guest is resumed on the receiving side. I've also been able to consistently reproduce this on a Ganeti cluster at work, using QEMU 2.1.3 and kernels 3.16 and 4.0 in the guests. The issue goes away if I disable the steal time MSR using `-cpu qemu64,-kvm_steal_time`. So, it looks to me as if the steal time MSR is not set/copied properly during live migration, although AFAICT this should be the case after 917367aa968fd4fef29d340e0c7ec8c608dffaab. After investigating a bit more, it looks like the issue comes from an overflow in the kernel's accumulate_steal_time() (arch/x86/kvm/x86.c:2023): static void accumulate_steal_time(struct kvm_vcpu *vcpu) { u64 delta; if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED)) return; delta = current->sched_info.run_delay - vcpu->arch.st.last_steal; Using systemtap with the attached script to trace KVM execution on the receiving host kernel, we can see that shortly before marking the vCPUs as runnable on a migrated KVM instance with 2 vCPUs, the following happens (** marks lines of interest): ** 0 qemu-system-x86(18446): kvm_arch_vcpu_load: run_delay=7856949 ns st
[Qemu-devel] qemu 2.4 machine + kernel 4.2 + kvm : freeze & cpu 100% at start on some hardware (maybe KVM_CAP_X86_SMM related)
Hi, I'm currently implementing qemu 2.4 for proxmox hypervisors, and a lot of users have reported a qemu freeze with the cpu at 100% when starting. Connecting to the vnc display gives: "qemu guest has not initialized the display yet" Similar bug report here : https://lacyc3.eu/qemu-guest-has-not-initialized-the-display This does not occur on all hardware, for example it freezes on a dell poweredge r710 (xeon E5540), but not on a dell r630 (CPU E5-2687W v3 @ 3.10GHz) or a very old dell poweredge 2950 (xeon 5110 @ 1.60GHz). This is only with qemu 2.4 + kernel 4.2 (kernel 4.1 works fine) + kvm. Not working command line: /usr/bin/kvm -chardev socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait -mon chardev=qmp,mode=control -vnc unix:/var/run/qemu-server/100.vnc,x509,password -pidfile /var/run/qemu-server/100.pid -name test -cpu kvm64 -m 4096 -machine pc-i440fx-2.4 Working command line - qemu 2.4 + kvm + compat 2.3 profile: /usr/bin/kvm -chardev socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait -mon chardev=qmp,mode=control -vnc unix:/var/run/qemu-server/100.vnc,x509,password -pidfile /var/run/qemu-server/100.pid -name test -cpu kvm64 -m 4096 -machine pc-i440fx-2.3 qemu 2.4 without kvm: /usr/bin/kvm -chardev socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait -mon chardev=qmp,mode=control -vnc unix:/var/run/qemu-server/100.vnc,x509,password -pidfile /var/run/qemu-server/100.pid -name test -cpu kvm64 -m 4096 -machine accel=tcg,type=pc-i440fx-2.4 So it works with qemu 2.4 + the machine 2.3 compat profile. Looking at the code: static void pc_compat_2_3(MachineState *machine) { PCMachineState *pcms = PC_MACHINE(machine); savevm_skip_section_footers(); if (kvm_enabled()) { pcms->smm = ON_OFF_AUTO_OFF; } global_state_set_optional(); savevm_skip_configuration(); } If I comment out pcms->smm = ON_OFF_AUTO_OFF; I get the same freeze too. So it seems to come from somewhere in bool pc_machine_is_smm_enabled(PCMachineState *pcms) { bool smm_available = false; if (pcms->smm == ON_OFF_AUTO_OFF) { return false; } if (tcg_enabled() || qtest_enabled()) { smm_available = true; } else if (kvm_enabled()) { smm_available = kvm_has_smm(); >> maybe here ? } if (smm_available) { return true; } if (pcms->smm == ON_OFF_AUTO_ON) { error_report("System Management Mode not supported by this hypervisor."); exit(1); } return false; } bool kvm_has_smm(void) { return kvm_check_extension(kvm_state, KVM_CAP_X86_SMM); } I'm not sure if it's a qemu bug or a kernel/kvm bug. Help is welcome. Regards, Alexandre Derumier
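Since the suspicion above lands on kvm_has_smm(), i.e. on what the host kernel answers for KVM_CAP_X86_SMM, it can help to query that capability directly from userspace and compare a 4.1 and a 4.2 host kernel on an affected machine. The small standalone check below is only a debugging aid (it assumes kernel headers recent enough to define KVM_CAP_X86_SMM):

/* Query KVM_CAP_X86_SMM on the running kernel - the same capability
 * QEMU's kvm_has_smm() checks via kvm_check_extension().
 * Build with: gcc kvm_smm_check.c -o kvm_smm_check */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

int main(void)
{
    int kvm = open("/dev/kvm", O_RDWR);
    if (kvm < 0) {
        perror("open /dev/kvm");
        return 1;
    }
    printf("KVM API version : %d\n", ioctl(kvm, KVM_GET_API_VERSION, 0));
#ifdef KVM_CAP_X86_SMM
    printf("KVM_CAP_X86_SMM : %d\n",
           ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_X86_SMM));
#else
    printf("KVM_CAP_X86_SMM not defined in these kernel headers\n");
#endif
    close(kvm);
    return 0;
}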
Re: [Qemu-devel] [wiki] New wiki page - vhost-user setup with ovs/dpdk backend
Hi, The OVS documentation says that 1GB hugepages are needed https://github.com/openvswitch/ovs/blob/master/INSTALL.DPDK.md is it true ? (as the wiki says 2M hugepages) - Mail original - De: "Star Chang" À: "Marcel Apfelbaum" Cc: "qemu-devel" Envoyé: Mardi 22 Septembre 2015 07:43:13 Objet: Re: [Qemu-devel] [wiki] New wiki page - vhost-user setup with ovs/dpdk backend Thanks Marcel!!! After tailing the logs and basic troubleshooting, we finally can have 2 VMs to ping to each other. One dpdkvhostuser interface is for each VM and we have to new add flows based on dpdkvhostuser ofport numbers. Another issue we might post it to proper mail loop is that we run pktgen-dpdk 2.9.1 inside of VM but no packets counted from pktgen terminal display. It is so weird that we only see 128 tx packets bursted out and then nothing is continue working. After that, we try to dump-flows via ovs-ofctl utility but it shows "connection refued". I am not sure have any one of you guys ideas here? Thanks!!! Bridge "ovsbr0" Port "ovsbr0" Interface "ovsbr0" type: internal Port "dpdk1" Interface "dpdk1" type: dpdk Port "dpdk0" Interface "dpdk0" type: dpdk Port "vhost-user1" Interface "vhost-user1" type: dpdkvhostuser Port "vhost-user0" Interface "vhost-user0" type: dpdkvhostuser On Tue, Sep 22, 2015 at 2:34 AM, Marcel Apfelbaum < mar...@redhat.com > wrote: On 09/21/2015 11:25 AM, Marcel Apfelbaum wrote: On 09/21/2015 08:01 AM, Star Chang wrote: Hi Marcel: Many thanks for you to contribute wiki page in how to configuring vhost-user type with openvsitch/qemu in VM environment. We bring 2 VMs up with vhost-user type. We can see eth0 interfaces created in 2 VMs with proper mac address we assign. After IP address assignment, 2 VMs could not PING to each other when they are in the same subnet. There are not packets at all in count when running ovs-ofctl dump-ports :( However, we check link up state via ovs-ofctl utility but LINK_DOWN as below. Have any one with experiences and give some helps … Thanks!! Hi, I would look into OVS log for specific issues: journalctl --since `date +%T --date="-10 minutes"` The above command will show you the last 10 minutes log. You are welcomed to show us the log. I forgot to CC the list. Thanks, Marcel Thanks, Marcel $ sudo ./utilities/ovs-ofctl show ovsbr0 OFPT_FEATURES_REPLY (xid=0x2): dpid:0670da615e4a n_tables:254, n_buffers:256 capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst 1(vhost-user1): addr:00:00:00:00:00:00 config: PORT_DOWN state: LINK_DOWN speed: 0 Mbps now, 0 Mbps max 2(vhost-user2): addr:00:00:00:00:00:00 config: PORT_DOWN state: LINK_DOWN speed: 0 Mbps now, 0 Mbps max LOCAL(ovsbr0): addr:06:70:da:61:5e:4a config: PORT_DOWN state: LINK_DOWN current: 10MB-FD COPPER speed: 10 Mbps now, 0 Mbps max OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0 Thanks, Star -- The future belongs to those who believe in the beauty of their dreams. -- The future belongs to those who believe in the beauty of their dreams.
[Qemu-devel] [Bug 1494350] Re: QEMU: causes vCPU steal time overflow on live migration
Hi, I confirm this bug, I have seen this a lot of time with debian jessie (kernel 3.16) and ubuntu (kernel 4.X) with qemu 2.2 and qemu 2.3 -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1494350 Title: QEMU: causes vCPU steal time overflow on live migration Status in QEMU: New Bug description: I'm pasting in text from Debian Bug 785557 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=785557 b/c I couldn't find this issue reported. It is present in QEMU 2.3, but I haven't tested later versions. Perhaps someone else will find this bug and confirm for later versions. (Or I will when I have time!) Hi, I'm trying to debug an issue we're having with some debian.org machines running in QEMU 2.1.2 instances (see [1] for more background). In short, after a live migration guests running Debian Jessie (linux 3.16) stop accounting CPU time properly. /proc/stat in the guest shows no increase in user and system time anymore (regardless of workload) and what stands out are extremely large values for steal time: % cat /proc/stat cpu 2400 0 1842 650879168 2579640 0 25 136562317270 0 0 cpu0 1366 0 1028 161392988 1238598 0 11 383803090749 0 0 cpu1 294 0 240 162582008 639105 0 8 39686436048 0 0 cpu2 406 0 338 163331066 383867 0 4 333994238765 0 0 cpu3 332 0 235 163573105 318069 0 1 1223752959076 0 0 intr 355773871 33 10 0 0 0 0 3 0 1 0 0 36 144 0 0 1638612 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5001741 41 0 8516993 0 3669582 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ctxt 837862829 btime 1431642967 processes 8529939 procs_running 1 procs_blocked 0 softirq 225193331 2 77532878 172 7250024 819289 0 54 33739135 176552 105675225 Reading the memory pointed to by the steal time MSRs pre- and post-migration, I can see that post-migration the high bytes are set to 0xff: (qemu) xp /8b 0x1fc0cfc0 1fc0cfc0: 0x94 0x57 0x77 0xf5 0xff 0xff 0xff 0xff The "jump" in steal time happens when the guest is resumed on the receiving side. I've also been able to consistently reproduce this on a Ganeti cluster at work, using QEMU 2.1.3 and kernels 3.16 and 4.0 in the guests. The issue goes away if I disable the steal time MSR using `-cpu qemu64,-kvm_steal_time`. 
So, it looks to me as if the steal time MSR is not set/copied properly during live migration, although AFAICT this should be the case after 917367aa968fd4fef29d340e0c7ec8c608dffaab. After investigating a bit more, it looks like the issue comes from an overflow in the kernel's accumulate_steal_time() (arch/x86/kvm/x86.c:2023): static void accumulate_steal_time(struct kvm_vcpu *vcpu) { u64 delta; if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED)) return; delta = current->sched_info.run_delay - vcpu->arch.st.last_steal; Using systemtap with the attached script to trace KVM execution on the receiving host kernel, we can see that shortly before marking the vCPUs as runnable on a migrated KVM instance with 2 vCPUs, the following happens (** marks lines of interest): ** 0 qemu-system-x86(18446): kvm_arch_vcpu_load: run_delay=7856949 ns steal=7856949 ns 0 qemu-system-x86(18446): -> kvm_arch_vcpu_load 0 vhost-18446(18447): -> kvm_arch_vcpu_should_kick 5 vhost-18446(18447): <- kvm_arch_vcpu_should_kick 23 qem
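The "jump" described above is an unsigned subtraction going negative: after the incoming migration, vcpu->arch.st.last_steal can be larger than the current thread's run_delay, so delta wraps around to a huge value and the high bytes of the steal-time area end up as 0xff. The sketch below only reproduces that arithmetic with example numbers echoing the trace, plus an illustrative clamp; it is not the actual upstream fix, which reworked how the MSR state is restored.

/* Demonstrates the wrap-around in steal-time accounting: if last_steal
 * (carried over across migration) exceeds the destination thread's
 * run_delay, the unsigned subtraction produces a huge "steal" value.
 * The clamp is an illustrative mitigation only. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t accumulate_steal(uint64_t run_delay, uint64_t last_steal)
{
    if (run_delay < last_steal) {
        /* run_delay went "backwards" across migration: account nothing */
        return 0;
    }
    return run_delay - last_steal;
}

int main(void)
{
    uint64_t last_steal = 383803090749ULL;  /* example: stale value from the source   */
    uint64_t run_delay  = 7856949ULL;       /* example: destination vCPU thread value */

    printf("raw delta    : %" PRIu64 " ns (wrapped)\n", run_delay - last_steal);
    printf("clamped delta: %" PRIu64 " ns\n",
           accumulate_steal(run_delay, last_steal));
    return 0;
}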
Re: [Qemu-devel] [Bug 1488363] Re: qemu 2.4.0 hangs using vfio for pci passthrough of graphics card
Hi, proxmox users report same bug here with qemu 2.4: http://forum.proxmox.com/threads/23346-Proxmox-4b1-q35-machines-failing-to-reboot-problems-with-PCI-passthrough we are going to test with reverting the commit to see if it's help. - Mail original - De: "Peter Maloney" <1488...@bugs.launchpad.net> À: "qemu-devel" Envoyé: Mercredi 26 Août 2015 21:48:16 Objet: [Qemu-devel] [Bug 1488363] Re: qemu 2.4.0 hangs using vfio for pci passthrough of graphics card I ran a bisect, and here's the result: b8eb5512fd8a115f164edbbe897cdf8884920ccb is the first bad commit commit b8eb5512fd8a115f164edbbe897cdf8884920ccb Author: Nadav Amit Date: Mon Apr 13 02:32:08 2015 +0300 target-i386: disable LINT0 after reset Due to old Seabios bug, QEMU reenable LINT0 after reset. This bug is long gone and therefore this hack is no longer needed. Since it violates the specifications, it is removed. Signed-off-by: Nadav Amit Message-Id: <1428881529-29459-2-git-send-email-na...@cs.technion.ac.il> Signed-off-by: Paolo Bonzini :04 04 a8ec76841b8d4e837c2cd0d0b82e08c0717a0ec6 d33744231c98c9f588cefbc92f416183f639706f M hw $ git diff 7398dfc7799a50097803db4796c7edb6cd7d47a1 b8eb5512fd8a115f164edbbe897cdf8884920ccb diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c index 042e960..d38d24b 100644 --- a/hw/intc/apic_common.c +++ b/hw/intc/apic_common.c @@ -243,15 +243,6 @@ static void apic_reset_common(DeviceState *dev) info->vapic_base_update(s); apic_init_reset(dev); - - if (bsp) { - /* - * LINT0 delivery mode on CPU #0 is set to ExtInt at initialization - * time typically by BIOS, so PIC interrupt can be delivered to the - * processor when local APIC is enabled. - */ - s->lvt[APIC_LVT_LINT0] = 0x700; - } } /* This function is only used for old state version 1 and 2 */ And then to confirm it: git checkout v2.4.0 git revert b8eb5512fd8a115f164edbbe897cdf8884920ccb And this build works. :) -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1488363 Title: qemu 2.4.0 hangs using vfio for pci passthrough of graphics card Status in QEMU: New Bug description: 2.3.0 (manjaro distro package) works fine. 2.4.0 (manjaro or the arch vanilla one) hangs on the SeaBIOS screen when saying "Press F12 for boot menu". All tested with the same hardware, OS, command and configuration. It also starts without the GPU passed through, even with the USB passed through. I am using the latest SeaBIOS 1.8.2. The release notes say: VFIO Support for resetting AMD Bonaire and Hawaii GPUs Platform device passthrough support for Calxeda xgmac devices So maybe something there broke it. I am using the arch qemu 2.4.0 PKGBUILD (modified to have make -j8 and removed iscsi, gluster, ceph, etc.), which uses vanilla sources and no patches. https://projects.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/qemu I am not using a frontend. I am using a script I wrote that generates the command below. Guest OS here would be 64 bit windows 7, but it didn't start so that's not relevant. Also a Manjaro Linux VM won't start. CPU is AMD FX-8150; board is Gigabyte GA-990FXA-UD5 (990FX chipset). 
full command line (without the \ after each line) is: qemu-system-x86_64 -enable-kvm -M q35 -m 3584 -cpu host -boot c -smp 7,sockets=1,cores=7,threads=1 -vga none -device ioh3420,bus=pcie.0,addr=1c.0,port=1,chassis=1,id=root.1 -device vfio-pci,host=04:00.0,bus=root.1,multifunction=on,x-vga=on,addr=0.0,romfile=Sapphire.R7260X.1024.131106.rom -device vfio-pci,host=00:14.2,bus=pcie.0 -device vfio-pci,host=00:16.0,bus=root.1 -device vfio-pci,host=00:16.2,bus=root.1 -usb -device ahci,bus=pcie.0,id=ahci -drive file=/dev/data/vm1,id=disk1,format=raw,if=virtio,index=0,media=disk,discard=on -drive media=cdrom,id=cdrom,index=5,media=cdrom -netdev type=tap,id=net0,ifname=tap-vm1 -device virtio-net-pci,netdev=net0,mac=00:01:02:03:04:05 -monitor stdio -boot menu=on $ lspci -nn | grep -E "04:00.0|00:14.2|00:16.0|00:16.2" 00:14.2 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 Azalia (Intel HDA) [1002:4383] (rev 40) 00:16.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB OHCI0 Controller [1002:4397] 00:16.2 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 USB EHCI Controller [1002:4396] 04:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Bonaire XTX [Radeon R7 260X] [1002:6658] Also I have this one that also hangs: 05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Juniper XT [Radeon HD 6770] [1002:68ba] To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1488363/+subscriptions
Re: [Qemu-devel] Steal time MSR not set properly during live migration?
Hi, I hit this bug today on 3 debian jessie guests (kernel 3.16), after migration from qemu 2.3 to qemu 2.4. Is it a qemu bug or a guest kernel 3.16 bug ? Regards, Alexandre Derumier - Mail original - De: "Michael Tokarev" À: "qemu-devel" , debian-ad...@lists.debian.org Cc: "Marcelo Tosatti" , "Gleb Natapov" Envoyé: Jeudi 11 Juin 2015 23:42:02 Objet: Re: [Qemu-devel] Steal time MSR not set properly during live migration? 11.06.2015 23:46, Apollon Oikonomopoulos wrote: > On 15:12 Wed 03 Jun , Apollon Oikonomopoulos wrote: >> Any ideas? > > As far as I understand, there is an issue when reading the MSR on the > incoming side: there is a KVM_SET_MSRS vcpu ioctl issued by the main > thread during initialization, that causes the initial vCPU steal time > value to be set using the main thread's (and not the vCPU thread's) > run_delay. Then, upon resuming execution, kvm_arch_load_vcpu uses the > vCPU thread's run_delay to determine steal time, causing an overflow. > The issue was introduced by commit > 917367aa968fd4fef29d340e0c7ec8c608dffaab. > > For the full analysis, see https://bugs.debian.org/785557#64 and the > followup e-mail. Adding Cc's... Thanks, /mjt
Re: [Qemu-devel] qemu 2.4rc2 : iothread hanging vm (2.4rc1 works fine)
>>Stefan has a set of patches that fix it. Yes, I just saw it. I'll test them today. Thanks ! - Mail original - De: "pbonzini" À: "aderumier" , "qemu-devel" Envoyé: Mercredi 29 Juillet 2015 09:39:03 Objet: Re: qemu 2.4rc2 : iothread hanging vm (2.4rc1 works fine) On 29/07/2015 06:50, Alexandre DERUMIER wrote: > seem to come from this commit: > > http://git.qemu.org/?p=qemu.git;a=commit;h=eabc977973103527bbb8fed69c91cfaa6691f8ab > > "AioContext: fix broken ctx->dispatching optimization" Stefan has a set of patches that fix it. Paolo
Re: [Qemu-devel] qemu 2.4rc2 : iothread hanging vm (2.4rc1 works fine)
seem to come from this commit: http://git.qemu.org/?p=qemu.git;a=commit;h=eabc977973103527bbb8fed69c91cfaa6691f8ab "AioContext: fix broken ctx->dispatching optimization" - Mail original - De: "aderumier" À: "qemu-devel" Envoyé: Mercredi 29 Juillet 2015 06:15:48 Objet: [Qemu-devel] qemu 2.4rc2 : iothread hanging vm (2.4rc1 works fine) Hi, since qemu 2.4rc2, when I'm using iothreads, the vm is hanging, qmp queries are not working, vnc not working. (qemu process don't crash). qemu 2.4 rc1 works fine. (don't have bisect it yet) qemu command line: qemu -chardev socket,id=qmp,path=/var/run/qemu-server/150.qmp,server,nowait -mon chardev=qmp,mode=control -vnc unix:/var/run/qemu-server/150.vnc,x509,password -pidfile /var/run/qemu-server/150.pid -daemonize -smbios type=1,manufacturer=dell,version=1,product=3,uuid=f0686bfb-50b8-4d31-a4cb-b1cf60eeb648 -name debianok -smp 2,sockets=2,cores=1,maxcpus=2 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000 -vga cirrus -cpu kvm64,+lahf_lm,+x2apic,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 8192 -k fr -object iothread,id=iothread-virtio0 -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -iscsi initiator-name=iqn.1993-08.org.debian.01.24b0d01a62a3 -drive file=/var/lib/vz/images/150/vm-150-disk-1.raw,if=none,id=drive-virtio0,cache=none,format=raw,aio=native,detect-zeroes=on -device virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,iothread=iothread-virtio0,bootindex=100 Regards, Alexandre
[Qemu-devel] qemu 2.4rc2 : iothread hanging vm (2.4rc1 works fine)
Hi, since qemu 2.4rc2, when I'm using iothreads, the vm is hanging, qmp queries are not working, vnc not working. (qemu process don't crash). qemu 2.4 rc1 works fine. (don't have bisect it yet) qemu command line: qemu -chardev socket,id=qmp,path=/var/run/qemu-server/150.qmp,server,nowait -mon chardev=qmp,mode=control -vnc unix:/var/run/qemu-server/150.vnc,x509,password -pidfile /var/run/qemu-server/150.pid -daemonize -smbios type=1,manufacturer=dell,version=1,product=3,uuid=f0686bfb-50b8-4d31-a4cb-b1cf60eeb648 -name debianok -smp 2,sockets=2,cores=1,maxcpus=2 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000 -vga cirrus -cpu kvm64,+lahf_lm,+x2apic,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 8192 -k fr -object iothread,id=iothread-virtio0 -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -iscsi initiator-name=iqn.1993-08.org.debian.01.24b0d01a62a3 -drive file=/var/lib/vz/images/150/vm-150-disk-1.raw,if=none,id=drive-virtio0,cache=none,format=raw,aio=native,detect-zeroes=on -device virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,iothread=iothread-virtio0,bootindex=100 Regards, Alexandre
Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] mirror: Fix guest responsiveness during bitmap scan
Hi again, I have redone a lot of tests. With raw on nfs --- Patch 3/3 fixes my problem (without applying patch 1/3 and patch 2/3). Without patch 3/3, I'm seeing a lot of lseek; it can take some minutes, with the guest hanging. With patch 3/3, it takes almost no time to generate the bitmap, and there is no guest hang. (I have tested with different raw files, different sizes) With raw on ext3: Here, this hurts a lot. Not with every disk, but with some disks it totally hangs the guest --- PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 4387 root 20 0 8985788 288900 12532 R 99.3 1.8 2:50.82 qemu 10.27% [kernel] [k] ext4_es_find_delayed_extent_range 9.64% [kernel] [k] _raw_read_lock 9.17% [kernel] [k] ext4_es_lookup_extent 6.63% [kernel] [k] down_read 5.49% [kernel] [k] ext4_map_blocks 4.82% [kernel] [k] up_read 3.09% [kernel] [k] ext4_ind_map_blocks 2.58% [kernel] [k] ext4_block_to_path.isra.9 2.18% [kernel] [k] kvm_arch_vcpu_ioctl_run 1.99% [kernel] [k] vmx_get_segment 1.99% [kernel] [k] ext4_llseek 1.62% [kernel] [k] ext4_get_branch 1.43% [kernel] [k] vmx_segment_access_rights.isra.28.part.29 1.23% [kernel] [k] x86_decode_insn 1.00% [kernel] [k] copy_user_generic_string 0.83% [kernel] [k] x86_emulate_instruction 0.81% [kernel] [k] rmode_segment_valid 0.78% [kernel] [k] gfn_to_hva_prot 0.74% [kernel] [k] __es_tree_search 0.73% [kernel] [k] do_insn_fetch 0.70% [kernel] [k] vmx_vcpu_run 0.69% [kernel] [k] vmx_read_guest_seg_selector 0.67% [kernel] [k] vmcs_writel 0.64% [kernel] [k] init_emulate_ctxt 0.62% [kernel] [k] native_write_msr_safe 0.61% [kernel] [k] kvm_apic_has_interrupt 0.57% [kernel] [k] vmx_handle_external_intr 0.57% [kernel] [k] x86_emulate_insn 0.56% [kernel] [k] vmx_handle_exit 0.50% [kernel] [k] native_read_tsc 0.47% [kernel] [k] _cond_resched 0.46% [kernel] [k] decode_operand 0.44% [kernel] [k] __srcu_read_lock 0.41% [kernel] [k] __linearize.isra.25 0.41% [kernel] [k] vmx_read_l1_tsc 0.39% [kernel] [k] clear_page_c 0.39% [kernel] [k] vmx_get_interrupt_shadow 0.36% [kernel] [k] vmx_set_interrupt_shadow It seems to be a known kernel bug: http://qemu.11.n7.nabble.com/Re-2-Strange-problems-with-lseek-in-qemu-img-map-td334414.html http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=14516bb7bb6ffbd49f35389f9ece3b2045ba5815 So, I think patch 3/3 is enough. (it would be great to have it for qemu 2.4) Thanks again Fam for your hard work. Regards, Alexandre - Mail original - De: "aderumier" À: "Fam Zheng" Cc: "Kevin Wolf" , "Stefan Hajnoczi" , "qemu-devel" , qemu-bl...@nongnu.org Envoyé: Vendredi 10 Juillet 2015 12:36:40 Objet: Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] mirror: Fix guest responsiveness during bitmap scan >>Does it completely hang? yes, can't ping network, and vnc console is frozen. >>What does "perf top" show? I'll do test today . (I'm going to vacation this night,I'll try to send results before that) - Mail original - De: "Fam Zheng" À: "aderumier" Cc: "Stefan Hajnoczi" , "Kevin Wolf" , "qemu-devel" , qemu-bl...@nongnu.org Envoyé: Vendredi 10 Juillet 2015 09:13:33 Objet: Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] mirror: Fix guest responsiveness during bitmap scan On Fri, 07/10 08:54, Alexandre DERUMIER wrote: > >>Thinking about this again, I doubt > >>that lengthening the duration with a hardcoded value benifits everyone; and > >>before Alexandre's reply on old server/slow disks > > With 1ms sleep, I can reproduce the hang 100% with a fast cpu (xeon e5 v3 > 3,1ghz) and source raw file on nfs. Does it completely hang?
I can't reproduce this with my machine mirroring from nfs to local: the guest runs smoothly. What does "perf top" show? Fam > > > - Mail original - > De: "Fam Zheng" > À: "Stefan Hajnoczi" > Cc: "Kevin Wolf" , "qemu-devel" , > qemu-bl...@nongnu.org > Env
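The "lot of lseek" reported above comes from the allocation probing the mirror job does while building its dirty bitmap; on filesystems where SEEK_DATA/SEEK_HOLE is slow (the ext3/ext4 indirect-block case linked above), each probe is expensive. The standalone sketch below approximates that probe and can be timed against a given image file outside of QEMU; the file path is arbitrary and this is only an illustration, not QEMU's actual code path.

/* Walk a raw image with SEEK_DATA/SEEK_HOLE, roughly what raw-file
 * allocation queries boil down to.  Timing this loop shows how expensive
 * the probing is on a given filesystem.  Sketch only. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <raw-image>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    off_t end = lseek(fd, 0, SEEK_END);
    off_t pos = 0;
    long probes = 0;

    while (pos < end) {
        off_t data = lseek(fd, pos, SEEK_DATA);   /* next allocated byte     */
        if (data < 0) {
            break;                                /* no more data (ENXIO)    */
        }
        off_t hole = lseek(fd, data, SEEK_HOLE);  /* end of that data extent */
        probes += 2;
        pos = hole;
    }
    printf("%ld lseek() probes over %lld bytes\n", probes, (long long)end);
    close(fd);
    return 0;
}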
Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] mirror: Fix guest responsiveness during bitmap scan
>>Does it completely hang? yes, can't ping network, and vnc console is frozen. >>What does "perf top" show? I'll do test today . (I'm going to vacation this night,I'll try to send results before that) - Mail original - De: "Fam Zheng" À: "aderumier" Cc: "Stefan Hajnoczi" , "Kevin Wolf" , "qemu-devel" , qemu-bl...@nongnu.org Envoyé: Vendredi 10 Juillet 2015 09:13:33 Objet: Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] mirror: Fix guest responsiveness during bitmap scan On Fri, 07/10 08:54, Alexandre DERUMIER wrote: > >>Thinking about this again, I doubt > >>that lengthening the duration with a hardcoded value benifits everyone; and > >>before Alexandre's reply on old server/slow disks > > With 1ms sleep, I can reproduce the hang 100% with a fast cpu (xeon e5 v3 > 3,1ghz) and source raw file on nfs. Does it completely hang? I can't reproduce this with my machine mirroring from nfs to local: the guest runs smoothly. What does "perf top" show? Fam > > > - Mail original - > De: "Fam Zheng" > À: "Stefan Hajnoczi" > Cc: "Kevin Wolf" , "qemu-devel" , > qemu-bl...@nongnu.org > Envoyé: Vendredi 10 Juillet 2015 08:43:50 > Objet: Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] mirror: Fix guest > responsiveness during bitmap scan > > On Thu, 07/09 14:02, Stefan Hajnoczi wrote: > > This patch only converts the mirror block job to use the new relax > > function. The other block jobs (stream, backup, commit) are still using > > a 0 ns delay and are therefore broken. They should probably be > > converted in the same series. > > That's because they all can be perfectly mitigated by setting a reasonable > "speed" that matchs the host horsepower. Thinking about this again, I doubt > that lengthening the duration with a hardcoded value benifits everyone; and > before Alexandre's reply on old server/slow disks, I don't recall any report, > because the coroutines would already yield often enough, when waiting for IO > to > complete. So I am not sure whether they're broken in practice. > > Fam
Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] mirror: Fix guest responsiveness during bitmap scan
>>Thinking about this again, I doubt >>that lengthening the duration with a hardcoded value benifits everyone; and >>before Alexandre's reply on old server/slow disks With 1ms sleep, I can reproduce the hang 100% with a fast cpu (xeon e5 v3 3,1ghz) and source raw file on nfs. - Mail original - De: "Fam Zheng" À: "Stefan Hajnoczi" Cc: "Kevin Wolf" , "qemu-devel" , qemu-bl...@nongnu.org Envoyé: Vendredi 10 Juillet 2015 08:43:50 Objet: Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] mirror: Fix guest responsiveness during bitmap scan On Thu, 07/09 14:02, Stefan Hajnoczi wrote: > This patch only converts the mirror block job to use the new relax > function. The other block jobs (stream, backup, commit) are still using > a 0 ns delay and are therefore broken. They should probably be > converted in the same series. That's because they all can be perfectly mitigated by setting a reasonable "speed" that matchs the host horsepower. Thinking about this again, I doubt that lengthening the duration with a hardcoded value benifits everyone; and before Alexandre's reply on old server/slow disks, I don't recall any report, because the coroutines would already yield often enough, when waiting for IO to complete. So I am not sure whether they're broken in practice. Fam
Re: [Qemu-devel] [Qemu-block] [PATCH 1/3] blockjob: Introduce block_job_relax_cpu
Hi Stefan, >>By the way, why did you choose 10 milliseconds? That is quite long. >> >>If this function is called once per 10 ms disk I/O operations then we >>lose 50% utilization. 1 ms or less would be reasonable. From my tests, 1ms is not enough: it still hangs the guest or qmp queries. 10ms gives me an optimal balance between bitmap scan speed and guest responsiveness. I don't know if this can be computed automatically ? (maybe it depends on disk lseek speed and cpu speed). - Mail original - De: "Stefan Hajnoczi" À: "Fam Zheng" Cc: "Kevin Wolf" , "qemu-devel" , qemu-bl...@nongnu.org Envoyé: Jeudi 9 Juillet 2015 14:54:55 Objet: Re: [Qemu-devel] [Qemu-block] [PATCH 1/3] blockjob: Introduce block_job_relax_cpu On Thu, Jul 09, 2015 at 11:47:56AM +0800, Fam Zheng wrote: > block_job_sleep_ns is called by block job coroutines to yield the > execution to VCPU threads and monitor etc. It is pointless to sleep for > 0 or a few nanoseconds, because that equals to a "yield + enter" with no > intermission in between (the timer fires immediately in the same > iteration of event loop), which means other code still doesn't get a > fair share of main loop / BQL. > > Introduce block_job_relax_cpu which will sleep at least for > BLOCK_JOB_RELAX_CPU_NS. Existing block_job_sleep_ns(job, 0) callers can > be replaced by this later. > > Reported-by: Alexandre DERUMIER > Signed-off-by: Fam Zheng > --- > include/block/blockjob.h | 16 > 1 file changed, 16 insertions(+) > > diff --git a/include/block/blockjob.h b/include/block/blockjob.h > index 57d8ef1..53ac4f4 100644 > --- a/include/block/blockjob.h > +++ b/include/block/blockjob.h > @@ -157,6 +157,22 @@ void *block_job_create(const BlockJobDriver *driver, > BlockDriverState *bs, > */ > void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns); > > +#define BLOCK_JOB_RELAX_CPU_NS 10000000L By the way, why did you choose 10 milliseconds? That is quite long. If this function is called once per 10 ms disk I/O operations then we lose 50% utilization. 1 ms or less would be reasonable. > + > +/** > + * block_job_relax_cpu: > + * @job: The job that calls the function. > + * > + * Sleep a little to avoid intensive cpu time occupation. Block jobs should > + * call this or block_job_sleep_ns (for more precision, but note that 0 ns > is > + * usually not enought) periodically, otherwise the QMP and VCPU could > starve s/enought/enough/ > + * on CPU and/or BQL. > + */ > +static inline void block_job_relax_cpu(BlockJob *job) coroutine_fn is missing.
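One way to avoid hard-coding 10ms, in the spirit of the question above, is to yield based on elapsed wall-clock time instead of sleeping a fixed amount on every iteration: only relax once the scanning loop has monopolised the thread for a full time slice. The sketch below is only an illustration of that idea in plain C; clock_gettime() stands in for QEMU's clock helpers and do_relax() for block_job_sleep_ns(), and the 10ms budget is just a tunable example.

/* Sketch: relax the CPU only after the job has run for a full time slice,
 * rather than sleeping a fixed amount per iteration. */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define RELAX_SLICE_NS 10000000LL   /* run at most ~10ms before yielding (tunable) */

static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static void do_relax(void)              /* stand-in for block_job_sleep_ns() */
{
    struct timespec req = { 0, 1000000 };   /* hand 1ms back to the main loop */
    nanosleep(&req, NULL);
}

int main(void)
{
    int64_t slice_start = now_ns();

    for (long sector = 0; sector < 1000000; sector++) {
        /* ... one unit of bitmap-scanning work would go here ... */

        if (now_ns() - slice_start >= RELAX_SLICE_NS) {
            do_relax();                 /* give vCPUs and QMP a chance to run */
            slice_start = now_ns();
        }
    }
    printf("done\n");
    return 0;
}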
Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] mirror: Fix guest responsiveness during bitmap scan
>>The other block jobs (stream, backup, commit) are still using >>a 0 ns delay and are therefore broken. Also in mirror.c, they are block_job_sleep_ns with delay_ns, where delay_ns can be 0. I have seen sometime,some server/qmp hangs, on a pretty old server/slow disks. (no more in bitmap scan with theses last patches, but on the block mirror itself) if (!s->synced) { block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, delay_ns); if (block_job_is_cancelled(&s->common)) { break; } } else if (!should_complete) { delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0); - Mail original - De: "Stefan Hajnoczi" À: "Fam Zheng" Cc: "Kevin Wolf" , "qemu-devel" , qemu-bl...@nongnu.org Envoyé: Jeudi 9 Juillet 2015 15:02:08 Objet: Re: [Qemu-devel] [Qemu-block] [PATCH 0/3] mirror: Fix guest responsiveness during bitmap scan On Thu, Jul 09, 2015 at 11:47:55AM +0800, Fam Zheng wrote: > This supersedes: > > http://patchwork.ozlabs.org/patch/491415/ > > and [1] which is currently in Jeff's tree. > > Although [1] fixed the QMP responsiveness, Alexandre DERUMIER reported that > guest responsiveness still suffers when we are busy in the initial dirty > bitmap > scanning loop of mirror job. That is because 1) we issue too many lseeks; 2) > we > only sleep for 0 ns which turns out quite ineffective in yielding BQL to vcpu > threads. Both are fixed. > > To reproduce: start a guest, attach a 10G raw image, then mirror it. Your > guest will immediately start to stutter (with patch [1] testing on a local > ext4 > raw image, and "while echo -n .; do sleep 0.05; done" in guest console). > > This series adds block_job_relax_cpu as suggested by Stefan Hajnoczi and uses > it in mirror job; and lets bdrv_is_allocated_above return a larger p_num as > suggested by Paolo Bonzini (although it's done without changing the API). > > [1]: http://patchwork.ozlabs.org/patch/471656/ "block/mirror: Sleep > periodically during bitmap scanning" > > Fam Zheng (3): > blockjob: Introduce block_job_relax_cpu > mirror: Use block_job_relax_cpu during bitmap scanning > mirror: Speed up bitmap initial scanning > > block/mirror.c | 17 +++-- > include/block/blockjob.h | 16 > 2 files changed, 23 insertions(+), 10 deletions(-) This patch only converts the mirror block job to use the new relax function. The other block jobs (stream, backup, commit) are still using a 0 ns delay and are therefore broken. They should probably be converted in the same series.
Re: [Qemu-devel] [PATCH 0/3] mirror: Fix guest responsiveness during bitmap scan
Hi, I have tested them, they works fine here (tested with source :raw on nfs, raw on ext3, and ceph/rbd device). Patch 3 give me a big boost with an empty 10GB source raw on ext3. (take some seconds vs some minutes) - Mail original - De: "Fam Zheng" À: "qemu-devel" Cc: "Kevin Wolf" , "Jeff Cody" , qemu-bl...@nongnu.org Envoyé: Jeudi 9 Juillet 2015 05:47:55 Objet: [Qemu-devel] [PATCH 0/3] mirror: Fix guest responsiveness during bitmap scan This supersedes: http://patchwork.ozlabs.org/patch/491415/ and [1] which is currently in Jeff's tree. Although [1] fixed the QMP responsiveness, Alexandre DERUMIER reported that guest responsiveness still suffers when we are busy in the initial dirty bitmap scanning loop of mirror job. That is because 1) we issue too many lseeks; 2) we only sleep for 0 ns which turns out quite ineffective in yielding BQL to vcpu threads. Both are fixed. To reproduce: start a guest, attach a 10G raw image, then mirror it. Your guest will immediately start to stutter (with patch [1] testing on a local ext4 raw image, and "while echo -n .; do sleep 0.05; done" in guest console). This series adds block_job_relax_cpu as suggested by Stefan Hajnoczi and uses it in mirror job; and lets bdrv_is_allocated_above return a larger p_num as suggested by Paolo Bonzini (although it's done without changing the API). [1]: http://patchwork.ozlabs.org/patch/471656/ "block/mirror: Sleep periodically during bitmap scanning" Fam Zheng (3): blockjob: Introduce block_job_relax_cpu mirror: Use block_job_relax_cpu during bitmap scanning mirror: Speed up bitmap initial scanning block/mirror.c | 17 +++-- include/block/blockjob.h | 16 2 files changed, 23 insertions(+), 10 deletions(-) -- 2.4.3
[Qemu-devel] -cpu host, enforce fail on amd processor (CPUID.80000001H:EDX)
Hi, I'm currently testing -cpu host,enforce, and it's failing to start on amd processors (tested with opteron 61XX, opteron 63xx, FX-6300 and FX-9590) Is it expected ? warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 0] warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 1] warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 2] warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 3] warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 4] warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 5] warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 6] warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 7] warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 8] warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 9] warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 12] warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 13] warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 14] warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 15] warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 16] warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 17] warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 23] warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 24]
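This is not QEMU code, just a quick standalone way to dump what the host really advertises in that leaf, so it can be compared with the bits QEMU complains about (the warnings concern EDX of CPUID leaf 0x80000001):

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx)) {
        fprintf(stderr, "CPUID leaf 0x80000001 not available\n");
        return 1;
    }
    /* Print every EDX bit; a bit reported as missing in the warnings
     * above should show up as 0 here, otherwise it is being filtered
     * by KVM rather than by the hardware. */
    for (int bit = 0; bit < 32; bit++) {
        printf("CPUID.80000001H:EDX [bit %2d] = %u\n", bit, (edx >> bit) & 1);
    }
    return 0;
}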
Re: [Qemu-devel] [PATCH] blockjob: Don't sleep too short
Works fine here, Thanks ! - Mail original - De: "Fam Zheng" À: "qemu-devel" Cc: "Kevin Wolf" , "Jeff Cody" , qemu-bl...@nongnu.org, mre...@redhat.com, js...@redhat.com, "aderumier" , "stefanha" , "pbonzini" Envoyé: Lundi 6 Juillet 2015 05:28:11 Objet: [PATCH] blockjob: Don't sleep too short block_job_sleep_ns is called by block job coroutines to yield the execution to VCPU threads and monitor etc. It is pointless to sleep for 0 or a few nanoseconds, because that equals to a "yield + enter" with no intermission in between (the timer fires immediately in the same iteration of event loop), which means other code still doesn't get a fair share of main loop / BQL. Trim the sleep duration with a minimum value. Reported-by: Alexandre DERUMIER Signed-off-by: Fam Zheng --- blockjob.c | 2 ++ include/block/blockjob.h | 4 +++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/blockjob.c b/blockjob.c index ec46fad..b17ed1f 100644 --- a/blockjob.c +++ b/blockjob.c @@ -238,6 +238,8 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns) return; } + ns = MAX(ns, BLOCK_JOB_SLEEP_NS_MIN); + job->busy = false; if (block_job_is_paused(job)) { qemu_coroutine_yield(); diff --git a/include/block/blockjob.h b/include/block/blockjob.h index 57d8ef1..3deb731 100644 --- a/include/block/blockjob.h +++ b/include/block/blockjob.h @@ -146,11 +146,13 @@ void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs, int64_t speed, BlockCompletionFunc *cb, void *opaque, Error **errp); +#define BLOCK_JOB_SLEEP_NS_MIN 1000L /** * block_job_sleep_ns: * @job: The job that calls the function. * @clock: The clock to sleep on. - * @ns: How many nanoseconds to stop for. + * @ns: How many nanoseconds to stop for. It sleeps at least + * for BLOCK_JOB_SLEEP_NS_MIN ns, even if a smaller value is specified. * * Put the job to sleep (assuming that it wasn't canceled) for @ns * nanoseconds. Canceling the job will interrupt the wait immediately. -- 2.4.3
Re: [Qemu-devel] [PATCH] block/mirror: Sleep periodically during bitmap scanning
Hi, I'm currently testing this patch, and it still hanging for me (mirroring a raw file from nfs) I just wonder why block_job_sleep_ns is set to 0 ? +block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, 0); I have tried to increasing it to a bigger value +block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, 1); And now it's working fine for me, no more qemu hang - Mail original - De: "Jeff Cody" À: "Fam Zheng" Cc: "Kevin Wolf" , "pbonzini" , "qemu-devel" , qemu-bl...@nongnu.org Envoyé: Mardi 16 Juin 2015 06:06:25 Objet: Re: [Qemu-devel] [PATCH] block/mirror: Sleep periodically during bitmap scanning On Wed, May 13, 2015 at 11:11:13AM +0800, Fam Zheng wrote: > Before, we only yield after initializing dirty bitmap, where the QMP > command would return. That may take very long, and guest IO will be > blocked. > > Add sleep points like the later mirror iterations. > > Signed-off-by: Fam Zheng > --- > block/mirror.c | 13 - > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/block/mirror.c b/block/mirror.c > index 1a1d997..baed225 100644 > --- a/block/mirror.c > +++ b/block/mirror.c > @@ -467,11 +467,23 @@ static void coroutine_fn mirror_run(void *opaque) > sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS; > mirror_free_init(s); > > + last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); > if (!s->is_none_mode) { > /* First part, loop on the sectors and initialize the dirty bitmap. */ > BlockDriverState *base = s->base; > for (sector_num = 0; sector_num < end; ) { > int64_t next = (sector_num | (sectors_per_chunk - 1)) + 1; > + int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); > + > + if (now - last_pause_ns > SLICE_TIME) { > + last_pause_ns = now; > + block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, 0); > + } > + > + if (block_job_is_cancelled(&s->common)) { > + goto immediate_exit; > + } > + > ret = bdrv_is_allocated_above(bs, base, > sector_num, next - sector_num, &n); > > @@ -490,7 +502,6 @@ static void coroutine_fn mirror_run(void *opaque) > } > > bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi); > - last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME); > for (;;) { > uint64_t delay_ns = 0; > int64_t cnt; > -- > 2.4.0 > Thanks, applied to my block tree: https://github.com/codyprime/qemu-kvm-jtc/commits/block Jeff
Re: [Qemu-devel] [Qemu-stable] [PATCH v7 0/8] block: Mirror discarded sectors
>>With nfs as source storage, it's really slow currently (lseek slow + a lot of >>nfs ops). it's blocked around 30min for 300GB, with a raw file on a netapp san array through nfs. - Mail original - De: "aderumier" À: "Fam Zheng" Cc: "Kevin Wolf" , qemu-bl...@nongnu.org, "Jeff Cody" , "qemu-devel" , "qemu-stable" , "stefanha" , "pbonzini" , js...@redhat.com, wangxiaol...@ucloud.cn Envoyé: Vendredi 26 Juin 2015 15:36:10 Objet: Re: [Qemu-devel] [Qemu-stable] [PATCH v7 0/8] block: Mirror discarded sectors Hi, >>There is no problem, the observasion by Andrey was just that qmp command >>takes >>a few minutes before returning, because he didn't apply >> >>https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg02511.html Is this patch already apply on the block tree ? With nfs as source storage, it's really slow currently (lseek slow + a lot of nfs ops). - Mail original - De: "Fam Zheng" À: "pbonzini" Cc: "Kevin Wolf" , qemu-bl...@nongnu.org, "Jeff Cody" , "qemu-devel" , "qemu-stable" , "stefanha" , js...@redhat.com, wangxiaol...@ucloud.cn Envoyé: Jeudi 25 Juin 2015 12:45:38 Objet: Re: [Qemu-devel] [Qemu-stable] [PATCH v7 0/8] block: Mirror discarded sectors On Thu, 06/25 09:02, Fam Zheng wrote: > On Wed, 06/24 19:01, Paolo Bonzini wrote: > > > > > > On 24/06/2015 11:08, Fam Zheng wrote: > > >> Stefan, > > >> > > >> The only controversial patches are the qmp/drive-mirror ones (1-3), > > >> while > > >> patches 4-8 are still useful on their own: they fix the mentioned crash > > >> and > > >> improve iotests. > > >> > > >> Shall we merge the second half (of course none of them depend on 1-3) > > >> now that > > >> softfreeze is approaching? > > > > > > Stefan, would you consider applying patches 4-8? > > > > Actually why not apply all of them? Even if blockdev-mirror is a > > superior interface in the long run, the current behavior of drive-mirror > > can cause images to balloon up to the full size, which is bad. > > Extending drive-mirror is okay IMHO for 2.4. > > > > Before we do that, Andrey Korolyov has reported a hang issue with unmap=true, > I'll take a look at it today. There is no problem, the observasion by Andrey was just that qmp command takes a few minutes before returning, because he didn't apply https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg02511.html Fam
Re: [Qemu-devel] [Qemu-stable] [PATCH v7 0/8] block: Mirror discarded sectors
Hi, >>There is no problem, the observasion by Andrey was just that qmp command >>takes >>a few minutes before returning, because he didn't apply >> >>https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg02511.html Is this patch already apply on the block tree ? With nfs as source storage, it's really slow currently (lseek slow + a lot of nfs ops). - Mail original - De: "Fam Zheng" À: "pbonzini" Cc: "Kevin Wolf" , qemu-bl...@nongnu.org, "Jeff Cody" , "qemu-devel" , "qemu-stable" , "stefanha" , js...@redhat.com, wangxiaol...@ucloud.cn Envoyé: Jeudi 25 Juin 2015 12:45:38 Objet: Re: [Qemu-devel] [Qemu-stable] [PATCH v7 0/8] block: Mirror discarded sectors On Thu, 06/25 09:02, Fam Zheng wrote: > On Wed, 06/24 19:01, Paolo Bonzini wrote: > > > > > > On 24/06/2015 11:08, Fam Zheng wrote: > > >> Stefan, > > >> > > >> The only controversial patches are the qmp/drive-mirror ones (1-3), > > >> while > > >> patches 4-8 are still useful on their own: they fix the mentioned crash > > >> and > > >> improve iotests. > > >> > > >> Shall we merge the second half (of course none of them depend on 1-3) > > >> now that > > >> softfreeze is approaching? > > > > > > Stefan, would you consider applying patches 4-8? > > > > Actually why not apply all of them? Even if blockdev-mirror is a > > superior interface in the long run, the current behavior of drive-mirror > > can cause images to balloon up to the full size, which is bad. > > Extending drive-mirror is okay IMHO for 2.4. > > > > Before we do that, Andrey Korolyov has reported a hang issue with unmap=true, > I'll take a look at it today. There is no problem, the observasion by Andrey was just that qmp command takes a few minutes before returning, because he didn't apply https://lists.gnu.org/archive/html/qemu-devel/2015-05/msg02511.html Fam
Re: [Qemu-devel] [PATCH] configure: Add support for jemalloc
>>You can try using or modifying this systemtap script: Thanks, I'll check this. If it can help, I have done a small top of qemu libc vs jemalloc vs tcmalloc libc-2.19.so - started --- PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 5294 root 20 0 26.953g 596744 18468 S 1.0 0.2 0:16.81 qemu fio running (200kiops) -- PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 5294 root 20 0 30.373g 633300 18552 S 2053 0.2 9:37.03 qemu jemalloc started PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 1789 root 20 0 11.640g 609224 18812 S 1.0 0.2 0:21.03 qemu fio running (300kiops) -- PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 1789 root 20 0 12.868g 684924 19144 S 2689 0.3 4:57.96 qemu tcmalloc (16MB) --- started --- PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 1789 root 20 0 11.640g 609224 18812 S 1.0 0.2 0:21.03 qemu fio running (65k iops) --- PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 6900 root 20 0 10.925g 643624 18432 S 1125 0.2 4:18.51 qemu tcmalloc (256MB) --- started --- PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 10640 root 20 0 10.117g 584524 18236 S 0.7 0.2 0:16.28 qemu fio running (200k iops) --- PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 10640 root 20 0 10.983g 685136 18984 S 2566 0.3 12:34.73 qemu - Mail original - De: "pbonzini" À: "aderumier" Cc: "qemu-devel" Envoyé: Mardi 23 Juin 2015 10:57:03 Objet: Re: [PATCH] configure: Add support for jemalloc On 23/06/2015 10:22, Alexandre DERUMIER wrote: > > What is the peak memory usage of jemalloc and tcmalloc? > > I'll try to use an heap profiling to see. > Don't known if "perf" can give me the info easily without profiling ? You can try using or modifying this systemtap script: https://sourceware.org/systemtap/examples/process/proctop.stp Paolo
Re: [Qemu-devel] [PATCH] configure: Add support for jemalloc
Hi Paolo, >>What is the peak memory usage of jemalloc and tcmalloc? I'll try to use an heap profiling to see. Don't known if "perf" can give me the info easily without profiling ? - Mail original - De: "pbonzini" À: "aderumier" , "qemu-devel" Envoyé: Mardi 23 Juin 2015 09:57:19 Objet: Re: [PATCH] configure: Add support for jemalloc On 19/06/2015 12:56, Alexandre Derumier wrote: > This adds "--enable-jemalloc" and "--disable-jemalloc" to allow linking > to jemalloc memory allocator. > > We have already tcmalloc support, > but it seem to not working well with a lot of iothreads/disks. > > The main problem is that tcmalloc use a shared thread cache of 16MB > by default. > With more threads, this cache is shared, and some bad garbage collections > can occur if the cache is too low. > > It's possible to tcmalloc cache increase it with a env var: > TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=256MB It's also possible to do MallocExtension_SetNumericProperty("tcmalloc.max_total_thread_cache_bytes", num_io_threads << 24); What is the peak memory usage of jemalloc and tcmalloc? Paolo > With default 16MB, performances are really bad with more than 2 disks. > Increasing to 256MB, it's helping but still have problem with 16 > disks/iothreads. > > Jemalloc don't have performance problem with default configuration. > > Here the benchmark results in iops of 1 qemu vm randread 4K iodepth=32, > with rbd block backend (librbd is doing a lot of memory allocation), > 1 iothread by disk > > glibc malloc > > > 1 disk 29052 > 2 disks 55878 > 4 disks 127899 > 8 disks 240566 > 15 disks 269976 > > jemalloc > > > 1 disk 41278 > 2 disks 75781 > 4 disks 195351 > 8 disks 294241 > 15 disks 298199 > > tcmalloc 2.2.1 default 16M cache > > > 1 disk 37911 > 2 disks 67698 > 4 disks 41076 > 8 disks 43312 > 15 disks 37569 > > tcmalloc : 256M cache > --- > > 1 disk 33914 > 2 disks 58839 > 4 disks 148205 > 8 disks 213298 > 15 disks 218383 > > Signed-off-by: Alexandre Derumier > --- > configure | 29 + > 1 file changed, 29 insertions(+) > > diff --git a/configure b/configure > index 222694f..2fe1e05 100755 > --- a/configure > +++ b/configure > @@ -336,6 +336,7 @@ vhdx="" > quorum="" > numa="" > tcmalloc="no" > +jemalloc="no" > > # parse CC options first > for opt do > @@ -1147,6 +1148,10 @@ for opt do > ;; > --enable-tcmalloc) tcmalloc="yes" > ;; > + --disable-jemalloc) jemalloc="no" > + ;; > + --enable-jemalloc) jemalloc="yes" > + ;; > *) > echo "ERROR: unknown option $opt" > echo "Try '$0 --help' for more information" > @@ -1420,6 +1425,8 @@ Advanced options (experts only): > --enable-numa enable libnuma support > --disable-tcmalloc disable tcmalloc support > --enable-tcmalloc enable tcmalloc support > + --disable-jemalloc disable jemalloc support > + --enable-jemalloc enable jemalloc support > > NOTE: The object files are built at the place where configure is launched > EOF > @@ -3344,6 +3351,11 @@ EOF > fi > fi > > +if test "$tcmalloc" = "yes" && test "$jemalloc" = "yes" ; then > + echo "ERROR: tcmalloc && jemalloc can't be used at the same time" > + exit 1 > +fi > + > ## > # tcmalloc probe > > @@ -3361,6 +3373,22 @@ EOF > fi > > ## > +# jemalloc probe > + > +if test "$jemalloc" = "yes" ; then > + cat > $TMPC << EOF > +#include > +int main(void) { malloc(1); return 0; } > +EOF > + > + if compile_prog "" "-ljemalloc" ; then > + LIBS="-ljemalloc $LIBS" > + else > + feature_not_found "jemalloc" "install jemalloc devel" > + fi > +fi > + > +## > # signalfd probe > signalfd="no" > cat > $TMPC << EOF > @@ -4499,6 +4527,7 @@ echo "snappy 
support $snappy" > echo "bzip2 support $bzip2" > echo "NUMA host support $numa" > echo "tcmalloc support $tcmalloc" > +echo "jemalloc support $jemalloc" > > if test "$sdl_too_old" = "yes"; then > echo "-> Your SDL version is too old - please upgrade to have SDL support" >
Re: [Qemu-devel] [PATCH] configure: Add support for jemalloc
@cc Fam Zheng , as he's the author of tcmalloc support patch - Mail original - De: "aderumier" À: "qemu-devel" Cc: "aderumier" Envoyé: Vendredi 19 Juin 2015 12:56:58 Objet: [PATCH] configure: Add support for jemalloc This adds "--enable-jemalloc" and "--disable-jemalloc" to allow linking to jemalloc memory allocator. We have already tcmalloc support, but it seem to not working well with a lot of iothreads/disks. The main problem is that tcmalloc use a shared thread cache of 16MB by default. With more threads, this cache is shared, and some bad garbage collections can occur if the cache is too low. It's possible to tcmalloc cache increase it with a env var: TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=256MB With default 16MB, performances are really bad with more than 2 disks. Increasing to 256MB, it's helping but still have problem with 16 disks/iothreads. Jemalloc don't have performance problem with default configuration. Here the benchmark results in iops of 1 qemu vm randread 4K iodepth=32, with rbd block backend (librbd is doing a lot of memory allocation), 1 iothread by disk glibc malloc 1 disk 29052 2 disks 55878 4 disks 127899 8 disks 240566 15 disks 269976 jemalloc 1 disk 41278 2 disks 75781 4 disks 195351 8 disks 294241 15 disks 298199 tcmalloc 2.2.1 default 16M cache 1 disk 37911 2 disks 67698 4 disks 41076 8 disks 43312 15 disks 37569 tcmalloc : 256M cache --- 1 disk 33914 2 disks 58839 4 disks 148205 8 disks 213298 15 disks 218383 Signed-off-by: Alexandre Derumier --- configure | 29 + 1 file changed, 29 insertions(+) diff --git a/configure b/configure index 222694f..2fe1e05 100755 --- a/configure +++ b/configure @@ -336,6 +336,7 @@ vhdx="" quorum="" numa="" tcmalloc="no" +jemalloc="no" # parse CC options first for opt do @@ -1147,6 +1148,10 @@ for opt do ;; --enable-tcmalloc) tcmalloc="yes" ;; + --disable-jemalloc) jemalloc="no" + ;; + --enable-jemalloc) jemalloc="yes" + ;; *) echo "ERROR: unknown option $opt" echo "Try '$0 --help' for more information" @@ -1420,6 +1425,8 @@ Advanced options (experts only): --enable-numa enable libnuma support --disable-tcmalloc disable tcmalloc support --enable-tcmalloc enable tcmalloc support + --disable-jemalloc disable jemalloc support + --enable-jemalloc enable jemalloc support NOTE: The object files are built at the place where configure is launched EOF @@ -3344,6 +3351,11 @@ EOF fi fi +if test "$tcmalloc" = "yes" && test "$jemalloc" = "yes" ; then + echo "ERROR: tcmalloc && jemalloc can't be used at the same time" + exit 1 +fi + ## # tcmalloc probe @@ -3361,6 +3373,22 @@ EOF fi ## +# jemalloc probe + +if test "$jemalloc" = "yes" ; then + cat > $TMPC << EOF +#include +int main(void) { malloc(1); return 0; } +EOF + + if compile_prog "" "-ljemalloc" ; then + LIBS="-ljemalloc $LIBS" + else + feature_not_found "jemalloc" "install jemalloc devel" + fi +fi + +## # signalfd probe signalfd="no" cat > $TMPC << EOF @@ -4499,6 +4527,7 @@ echo "snappy support $snappy" echo "bzip2 support $bzip2" echo "NUMA host support $numa" echo "tcmalloc support $tcmalloc" +echo "jemalloc support $jemalloc" if test "$sdl_too_old" = "yes"; then echo "-> Your SDL version is too old - please upgrade to have SDL support" -- 2.1.4
[Qemu-devel] [PATCH] configure: Add support for jemalloc
This adds "--enable-jemalloc" and "--disable-jemalloc" to allow linking to jemalloc memory allocator. We have already tcmalloc support, but it seem to not working well with a lot of iothreads/disks. The main problem is that tcmalloc use a shared thread cache of 16MB by default. With more threads, this cache is shared, and some bad garbage collections can occur if the cache is too low. It's possible to tcmalloc cache increase it with a env var: TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=256MB With default 16MB, performances are really bad with more than 2 disks. Increasing to 256MB, it's helping but still have problem with 16 disks/iothreads. Jemalloc don't have performance problem with default configuration. Here the benchmark results in iops of 1 qemu vm randread 4K iodepth=32, with rbd block backend (librbd is doing a lot of memory allocation), 1 iothread by disk glibc malloc 1 disk 29052 2 disks 55878 4 disks 127899 8 disks 240566 15 disks269976 jemalloc 1 disk 41278 2 disks 75781 4 disks 195351 8 disks 294241 15 disks298199 tcmalloc 2.2.1 default 16M cache 1 disk 37911 2 disks 67698 4 disks 41076 8 disks 43312 15 disks 37569 tcmalloc : 256M cache --- 1 disk 33914 2 disks58839 4 disks148205 8 disks213298 15 disks 218383 Signed-off-by: Alexandre Derumier --- configure | 29 + 1 file changed, 29 insertions(+) diff --git a/configure b/configure index 222694f..2fe1e05 100755 --- a/configure +++ b/configure @@ -336,6 +336,7 @@ vhdx="" quorum="" numa="" tcmalloc="no" +jemalloc="no" # parse CC options first for opt do @@ -1147,6 +1148,10 @@ for opt do ;; --enable-tcmalloc) tcmalloc="yes" ;; + --disable-jemalloc) jemalloc="no" + ;; + --enable-jemalloc) jemalloc="yes" + ;; *) echo "ERROR: unknown option $opt" echo "Try '$0 --help' for more information" @@ -1420,6 +1425,8 @@ Advanced options (experts only): --enable-numaenable libnuma support --disable-tcmalloc disable tcmalloc support --enable-tcmallocenable tcmalloc support + --disable-jemalloc disable jemalloc support + --enable-jemallocenable jemalloc support NOTE: The object files are built at the place where configure is launched EOF @@ -3344,6 +3351,11 @@ EOF fi fi +if test "$tcmalloc" = "yes" && test "$jemalloc" = "yes" ; then +echo "ERROR: tcmalloc && jemalloc can't be used at the same time" +exit 1 +fi + ## # tcmalloc probe @@ -3361,6 +3373,22 @@ EOF fi ## +# jemalloc probe + +if test "$jemalloc" = "yes" ; then + cat > $TMPC << EOF +#include +int main(void) { malloc(1); return 0; } +EOF + + if compile_prog "" "-ljemalloc" ; then +LIBS="-ljemalloc $LIBS" + else +feature_not_found "jemalloc" "install jemalloc devel" + fi +fi + +## # signalfd probe signalfd="no" cat > $TMPC << EOF @@ -4499,6 +4527,7 @@ echo "snappy support$snappy" echo "bzip2 support $bzip2" echo "NUMA host support $numa" echo "tcmalloc support $tcmalloc" +echo "jemalloc support $jemalloc" if test "$sdl_too_old" = "yes"; then echo "-> Your SDL version is too old - please upgrade to have SDL support" -- 2.1.4
Re: [Qemu-devel] [PATCH 1/2] qapi: Add "detect-zeroes" option to drive-mirror
>>Sorry, forgot to mention - of course I`ve pulled all previous >>zeroing-related queue, so I haven`t had only the QMP-related fix >>running on top of the master :) Hi, I had a discussion about rbd mirroring, detect-zeroes and sparse target, some months ago with paolo http://qemu.11.n7.nabble.com/qemu-drive-mirror-to-rbd-storage-no-sparse-rbd-image-td293642.html Not sure it's related but "Lack of bdrv_co_write_zeroes is why detect-zeroes does not work. Lack of bdrv_get_block_status is why sparse->sparse does not work without detect-zeroes. Paolo " - Mail original - De: "Andrey Korolyov" À: "Fam Zheng" Cc: "Kevin Wolf" , "qemu block" , "Jeff Cody" , "qemu-devel" , "Markus Armbruster" , "stefanha" , "pbonzini" Envoyé: Mercredi 10 Juin 2015 17:48:54 Objet: Re: [Qemu-devel] [PATCH 1/2] qapi: Add "detect-zeroes" option to drive-mirror > '{"execute":"drive-mirror", "arguments": { "device": > "drive-virtio-disk0", "target": > "rbd:dev-rack2/vm33090-dest:id=qemukvm:key=xxx:auth_supported=cephx\\;none:mon_host=10.6.0.1\\:6789\\;10.6.0.3\\:6789\\;10.6.0.4\\:6789", > > "mode": "existing", "sync": "full", "detect-zeroes": true, "format": > "raw" } }' Sorry, forgot to mention - of course I`ve pulled all previous zeroing-related queue, so I haven`t had only the QMP-related fix running on top of the master :)
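For context, here is a conceptual sketch of what detect-zeroes amounts to on the write path. It is simplified and only illustrative of the 2015-era API (the real logic lives in QEMU's generic block layer, not in a helper like this): all-zero payloads are turned into write-zeroes requests, which keep the target sparse only if the driver implements bdrv_co_write_zeroes, which is exactly what rbd was missing.

#include "qemu-common.h"    /* buffer_is_zero() */
#include "block/block.h"

static int coroutine_fn write_detecting_zeroes(BlockDriverState *bs,
                                               int64_t sector_num,
                                               int nb_sectors,
                                               uint8_t *buf,
                                               QEMUIOVector *qiov)
{
    if (buffer_is_zero(buf, (size_t)nb_sectors * BDRV_SECTOR_SIZE)) {
        /* With BDRV_REQ_MAY_UNMAP the zeroed range may even be discarded,
         * keeping a sparse destination sparse. */
        return bdrv_co_write_zeroes(bs, sector_num, nb_sectors,
                                    BDRV_REQ_MAY_UNMAP);
    }
    return bdrv_co_writev(bs, sector_num, nb_sectors, qiov);
}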
Re: [Qemu-devel] poor virtio-scsi performance
Hi, if you want to use multiqueues in the guest, you need to enable it on the virtio-scsi controller; see the example sketched after this message. - Mail original - De: "Vasiliy Tolstov" À: "qemu-devel" , libvir-l...@redhat.com Envoyé: Lundi 8 Juin 2015 12:30:59 Objet: [Qemu-devel] poor virtio-scsi performance Hi all! I suspected poor performance of virtio-scsi driver. I did a few tests: Host machine: linux 3.19.1, QEMU emulator version 2.3.0 Guest machine: linux 4.0.4 part of domain xml: /usr/bin/kvm /dev/ram0 I got by running `modprobe brd rd_size=$((5*1024*1024))` on host machine. fio conf: [readtest] blocksize=4k filename=/dev/sdb (/dev/ram0 when testing from host machine) rw=randread direct=1 buffered=0 ioengine=libaio iodepth=32 results: from host: bw=1594.6MB/s, iops=408196, clat=76usec from guest: bw=398MB/s, iops=99720, clat=316usec Both host and guest system I boot with `scsi_mod.use_blk_mq=Y`. Why difference in 4 times?! -- Vasiliy Tolstov, e-mail: v.tols...@selfip.ru
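As an illustration only (this is not the XML that was originally posted; adjust names and queue counts to the actual setup): with plain QEMU the queue count is a property of the controller, and with libvirt it is the queues attribute of the controller's driver element.

-device virtio-scsi-pci,id=scsi0,num_queues=4

<controller type='scsi' index='0' model='virtio-scsi'>
  <driver queues='4'/>
</controller>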
Re: [Qemu-devel] qmp block-mirror hang, lots of nfs op/s on nfs source disk
>>That is because drive-mirror checks the whole disk for allocated areas. Oh, ok, that's why I'm seeing a lot of lseek in strace lseek(21, 1447493632, 0x4 /* SEEK_??? */) = 107374182400 lseek(21, 1447559168, 0x3 /* SEEK_??? */) = 1447559168 lseek(21, 1447559168, 0x4 /* SEEK_??? */) = 107374182400 lseek(21, 1447624704, 0x3 /* SEEK_??? */) = 1447624704 lseek(21, 1447624704, 0x4 /* SEEK_??? */) = 107374182400 lseek(21, 1447690240, 0x3 /* SEEK_??? */) = 1447690240 lseek(21, 1447690240, 0x4 /* SEEK_??? */) = 107374182400 ... >>We are aware of that and Fam (CCed) is working on it. Thanks for the >>report nevertheless! Ok, thanks ! - Mail original - De: "pbonzini" À: "aderumier" , "qemu-devel" , "Fam Zheng" Envoyé: Lundi 11 Mai 2015 13:46:54 Objet: Re: qmp block-mirror hang, lots of nfs op/s on nfs source disk On 11/05/2015 13:38, Alexandre DERUMIER wrote: > Hi, > > I'm currently playing with drive-mirror, (qemu 2.2) > > and I have qmp hang when drive-mirror is starting. > > just after qmp "drive-mirror" exec, qmp socket or hmp are not > responding. > > After some times it's working again, and I can see result of > query-block-jobs. > > > The source volume is on nfs (v4), and it seem that the bigger the > volume is, the bigger I need to wait before the mirror begin. That is because drive-mirror checks the whole disk for allocated areas. We are aware of that and Fam (CCed) is working on it. Thanks for the report nevertheless! Paolo
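For reference, the 0x3 and 0x4 whence values that strace cannot name are SEEK_DATA (3) and SEEK_HOLE (4). The allocation check drive-mirror performs on a raw file is essentially the loop below, and on NFS every one of those lseek() calls is a server round trip, which is where the long delay and the high op/s come from. A standalone sketch, not QEMU code (build with e.g. gcc -O2 probe_extents.c -o probe_extents):

#define _GNU_SOURCE            /* for SEEK_DATA / SEEK_HOLE */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    off_t end = lseek(fd, 0, SEEK_END);
    off_t pos = 0;

    /* Walk the file the way the raw driver's allocation probing does:
     * SEEK_DATA/SEEK_HOLE pairs until end of file. */
    while (pos < end) {
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            break;              /* ENXIO: no more data extents */
        }
        off_t hole = lseek(fd, data, SEEK_HOLE);
        printf("data extent: %lld..%lld\n", (long long)data, (long long)hole);
        pos = hole;
    }
    close(fd);
    return 0;
}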
[Qemu-devel] qmp block-mirror hang, lots of nfs op/s on nfs source disk
Hi, I'm currently playing with drive-mirror, (qemu 2.2) and I have qmp hang when drive-mirror is starting. just after qmp "drive-mirror" exec, qmp socket or hmp are not responding. After some times it's working again, and I can see result of query-block-jobs. The source volume is on nfs (v4), and it seem that the bigger the volume is, the bigger I need to wait before the mirror begin. Looking at nfsiostat op/s rpc bklog 1806.000.00 read: ops/skB/s kB/op retrans avg RTT (ms)avg exe (ms) 0.000 0.000 0.0000 (0.0%) 0.000 0.000 write:ops/skB/s kB/op retrans avg RTT (ms)avg exe (ms) 0.000 0.000 0.0000 (0.0%) 0.000 0.000 I'm seeing a lot of nfs op/s on source disk (but not read). I wonder if some kind of source file scan is done ? is it expected ? I can reproduce it 100% I found a report on redhat bugzilla here, which seem to have same behaviour (not on nfs): https://bugzilla.redhat.com/show_bug.cgi?id=1185172
Re: [Qemu-devel] VCPU Hot-Unplug Feature
Hi, they are a lot of patches sent on the mailing list recently, cpu hotplug|unplug with device_add|del https://www.redhat.com/archives/libvir-list/2015-February/msg00084.html - Mail original - De: "Fahri Cihan Demirci" À: "qemu-devel" Envoyé: Lundi 4 Mai 2015 18:24:43 Objet: [Qemu-devel] VCPU Hot-Unplug Feature Hello, We are interested in being able to remove a VCPU from an active Libvirt domain running under QEMU/KVM. However, currently that does not seem to be possible because QEMU does not provide an interface for hot-unplugging a VCPU. The corresponding feature page [1] on the QEMU wiki mentions the cpu-add interface and the generalized device add/del interface. So, my question is, is there someone working on a cpu-remove interface or making progress in the generalized device add/del interface to hot-unplug VCPUS? If you are in any way interested in or planning such a feature, we may provide feedback, testing or help in the development effort. Any other input, guidance or corrections would be also most welcome. Thank you. [1] http://wiki.qemu.org/Features/CPUHotplug Best regards, Fahri Cihan Demirci
Re: [Qemu-devel] qcow2 problem: while a qcow2 image run out of disk space the guest paused.
Hi, Isn't it related to drive options ? " werror=action,rerror=action Specify which action to take on write and read errors. Valid actions are: “ignore” (ignore the error and try to continue), “stop” (pause QEMU), “report” (report the error to the guest), “enospc” (pause QEMU only if the host disk is full; report the error to the guest otherwise). The default setting is werror=enospc and rerror=report. " - Mail original - De: "chenxg" À: "Kevin Wolf" , "stefanha" , m...@linux.vnet.ibm.com, t...@cn.ibm.com, borntrae...@de.ibm.com, "Holger Wolf" , "qemu-devel" Envoyé: Vendredi 17 Avril 2015 04:22:57 Objet: [Qemu-devel] qcow2 problem: while a qcow2 image run out of disk space the guest paused. Hi Kevin, While installing a guest into a qcow2 image if the qcow2 image run out of disk space, the guest will stop work and change state to paused without messages. When we tried to free up disk space in the host and the virsh resume to work. The guest then got I/O errors but later on QEMU also crashed. A guest in paused state is certainly not the best result. Probably good way to react are: - return I/O errors to the guest and the let the guest handle those (Linux file system code might just crash) - allow resume only when disk is free again. What's your opinion?
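As a concrete illustration, with option values taken from the description quoted above and a placeholder file name: to have write errors, including ENOSPC, reported to the guest as I/O errors instead of pausing the VM, the drive can be started with

-drive file=vm-disk.qcow2,format=qcow2,if=virtio,werror=report,rerror=report

whereas the default werror=enospc is what produces the pause-on-full-disk behaviour described in the report.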
Re: [Qemu-devel] virtio-scsi + iothread : segfault on drive_del
Ok, thanks paolo ! - Mail original - De: "pbonzini" À: "aderumier" , "qemu-devel" Envoyé: Mercredi 1 Avril 2015 12:27:27 Objet: Re: virtio-scsi + iothread : segfault on drive_del On 01/04/2015 05:34, Alexandre DERUMIER wrote: > > I'm currently testing virtio-scsi and iothread, > > and I'm seeing qemu segfault when I try to remove an scsi drive > on top of an virtio-scsi controller with iothread enabled. > > > virtio-blk + iothread drive_del is supported since this patch > > http://comments.gmane.org/gmane.comp.emulators.qemu/291562 > "block: support drive_del with dataplane > > This series makes hot unplug work with virtio-blk dataplane devices. It > should > also work with Fam's virtio-scsi dataplane patches." virtio-scsi + iothread is experimental and not yet thread-safe. Deadlocks or crashes are expected, and not really fixable without a lot of preparatory changes to core QEMU code. These are being done, but I don't expect it to be fixed before 2.5. Paolo
[Qemu-devel] virtio-scsi + iothread : segfault on drive_del
Hi, I'm currently testing virtio-scsi and iothread, and I'm seeing qemu segfault when I try to remove an scsi drive on top of an virtio-scsi controller with iothread enabled. virtio-blk + iothread drive_del is supported since this patch http://comments.gmane.org/gmane.comp.emulators.qemu/291562 "block: support drive_del with dataplane This series makes hot unplug work with virtio-blk dataplane devices. It should also work with Fam's virtio-scsi dataplane patches." I don't known about virtio-scsi ? Regards, Alexandre
Re: [Qemu-devel] balloon stats not working if qemu is started with -machine option
>> But you do have to set the polling interval to use the feature. Ok thanks Luiz. The problem was that we didn't set polling interval on migration only (on target qemu). Thanks again ! Alexandre - Mail original - De: "Luiz Capitulino" À: "aderumier" Cc: "qemu-devel" , "dietmar" Envoyé: Mardi 10 Mars 2015 14:30:20 Objet: Re: [Qemu-devel] balloon stats not working if qemu is started with -machine option On Mon, 9 Mar 2015 08:04:54 +0100 (CET) Alexandre DERUMIER wrote: > I have forgot to said that we don't setup pooling interval manually. (which > seem to works fine without -machine) > > > Now,if I setup guest-stats-polling-interval with qom-set, > it seem to works fine with -machine option. Setting the polling interval is a required step. The output you sent seems to indicate that the guest sent a stats update to the host unilaterally and those values seem buggy. I can investigate it if you provide full details (guest command-line plus full QMP command sequence). But you do have to set the polling interval to use the feature. > > > > - Mail original - > De: "aderumier" > À: "qemu-devel" , "Luiz Capitulino" > > Cc: "dietmar" > Envoyé: Lundi 9 Mars 2015 07:49:22 > Objet: [Qemu-devel] balloon stats not working if qemu is started with > -machine option > > Hi, > > I have noticed that balloon stats are not working if a qemu guest is started > with -machine option. > > (-machine pc, or any version) . Tested of qemu 1.7,2.1 && 2.2 > > > When the guest is starting (balloon driver not yet loaded) > > $VAR1 = { > 'last-update' => 0, > 'stats' => { > 'stat-free-memory' => -1, > 'stat-swap-in' => -1, > 'stat-total-memory' => -1, > 'stat-major-faults' => -1, > 'stat-minor-faults' => -1, > 'stat-swap-out' => -1 > } > }; > > > then > > when the guest has loaded his driver > > $VAR1 = { > 'last-update' => 1425882998, > 'stats' => { > 'stat-free-memory' => -1, > 'stat-swap-in' => '4039065379203448832', > 'stat-total-memory' => -1, > 'stat-major-faults' => -1, > 'stat-minor-faults' => -1, > 'stat-swap-out' => '-6579759055588294656' > } > }; > > $VAR1 = { > 'last-update' => 1425882998, > 'stats' => { > 'stat-free-memory' => -1, > 'stat-swap-in' => '4039065379203448832', > 'stat-total-memory' => -1, > 'stat-major-faults' => -1, > 'stat-minor-faults' => -1, > 'stat-swap-out' => '-6579759055588294656' > } > }; > > > $VAR1 = { > 'last-update' => 1425882998, > 'stats' => { > 'stat-free-memory' => -1, > 'stat-swap-in' => '4039065379203448832', > 'stat-total-memory' => -1, > 'stat-major-faults' => -1, > 'stat-minor-faults' => -1, > 'stat-swap-out' => '-6579759055588294656' > } > }; > > > It's seem that a some stats are retrieved, but last-update don't increment. > Removing the machine option resolve the problem. > > > I'm working with proxmox team, and a lot of user have reported balloning bug, > because we pass the -machine option when are a doing live migration. > > I'm surprised that -machine pc also have this bug. (Isn't it supposed to be > the default machine config ?) 
> > > > > here the sample command line: > > > /usr/bin/kvm -id 150 -chardev > socket,id=qmp,path=/var/run/qemu-server/150.qmp,server,nowait -mon > chardev=qmp,mode=control -vnc unix:/var/run/qemu-server/150.vnc,x509,password > -pidfile /var/run/qemu-server/150.pid -daemonize -smbios > type=1,manufacturer=dell,version=1,product=3,uuid=f0686bfb-50b8-4d31-a4cb-b1cf60eeb648 > -name debianok -smp 1,sockets=2,cores=1,maxcpus=2 -nodefaults -boot > menu=on,strict=on,reboot-timeout=1000 -vga cirrus -cpu > kvm64,+lahf_lm,+x2apic,+sep -m 4096 -k fr -device > piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device > usb-tablet,id=tablet,bus=uhci.0,port=1 -device > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -iscsi > initiator-name=iqn.1993-08.org.debian.01.24b0d01a62a3 -drive > file=/var/lib/vz/images/150/vm-150-disk-1.raw,if=none,id=drive-virtio0,format=raw,aio=native,cache=none > -device > virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100 > -netdev > type=tap,id=net0,ifname=tap150i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on > -device > virtio-net-pci,mac=76:EF:E9:ED:9D:41,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 > -machine pc >
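For completeness, with the balloon device created as id=balloon0 in the command line above, the polling interval Luiz mentions is set through QOM; for example, a 2 second interval via QMP:

{ "execute": "qom-set",
  "arguments": { "path": "/machine/peripheral/balloon0",
                 "property": "guest-stats-polling-interval",
                 "value": 2 } }

The stats are then read back with qom-get on the guest-stats property of the same path.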
Re: [Qemu-devel] balloon stats not working if qemu is started with -machine option
I have forgot to said that we don't setup pooling interval manually. (which seem to works fine without -machine) Now,if I setup guest-stats-polling-interval with qom-set, it seem to works fine with -machine option. - Mail original - De: "aderumier" À: "qemu-devel" , "Luiz Capitulino" Cc: "dietmar" Envoyé: Lundi 9 Mars 2015 07:49:22 Objet: [Qemu-devel] balloon stats not working if qemu is started with -machine option Hi, I have noticed that balloon stats are not working if a qemu guest is started with -machine option. (-machine pc, or any version) . Tested of qemu 1.7,2.1 && 2.2 When the guest is starting (balloon driver not yet loaded) $VAR1 = { 'last-update' => 0, 'stats' => { 'stat-free-memory' => -1, 'stat-swap-in' => -1, 'stat-total-memory' => -1, 'stat-major-faults' => -1, 'stat-minor-faults' => -1, 'stat-swap-out' => -1 } }; then when the guest has loaded his driver $VAR1 = { 'last-update' => 1425882998, 'stats' => { 'stat-free-memory' => -1, 'stat-swap-in' => '4039065379203448832', 'stat-total-memory' => -1, 'stat-major-faults' => -1, 'stat-minor-faults' => -1, 'stat-swap-out' => '-6579759055588294656' } }; $VAR1 = { 'last-update' => 1425882998, 'stats' => { 'stat-free-memory' => -1, 'stat-swap-in' => '4039065379203448832', 'stat-total-memory' => -1, 'stat-major-faults' => -1, 'stat-minor-faults' => -1, 'stat-swap-out' => '-6579759055588294656' } }; $VAR1 = { 'last-update' => 1425882998, 'stats' => { 'stat-free-memory' => -1, 'stat-swap-in' => '4039065379203448832', 'stat-total-memory' => -1, 'stat-major-faults' => -1, 'stat-minor-faults' => -1, 'stat-swap-out' => '-6579759055588294656' } }; It's seem that a some stats are retrieved, but last-update don't increment. Removing the machine option resolve the problem. I'm working with proxmox team, and a lot of user have reported balloning bug, because we pass the -machine option when are a doing live migration. I'm surprised that -machine pc also have this bug. (Isn't it supposed to be the default machine config ?) here the sample command line: /usr/bin/kvm -id 150 -chardev socket,id=qmp,path=/var/run/qemu-server/150.qmp,server,nowait -mon chardev=qmp,mode=control -vnc unix:/var/run/qemu-server/150.vnc,x509,password -pidfile /var/run/qemu-server/150.pid -daemonize -smbios type=1,manufacturer=dell,version=1,product=3,uuid=f0686bfb-50b8-4d31-a4cb-b1cf60eeb648 -name debianok -smp 1,sockets=2,cores=1,maxcpus=2 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000 -vga cirrus -cpu kvm64,+lahf_lm,+x2apic,+sep -m 4096 -k fr -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -iscsi initiator-name=iqn.1993-08.org.debian.01.24b0d01a62a3 -drive file=/var/lib/vz/images/150/vm-150-disk-1.raw,if=none,id=drive-virtio0,format=raw,aio=native,cache=none -device virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100 -netdev type=tap,id=net0,ifname=tap150i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=76:EF:E9:ED:9D:41,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -machine pc
[Qemu-devel] balloon stats not working if qemu is started with -machine option
Hi, I have noticed that balloon stats are not working if a qemu guest is started with -machine option. (-machine pc, or any version) . Tested of qemu 1.7,2.1 && 2.2 When the guest is starting (balloon driver not yet loaded) $VAR1 = { 'last-update' => 0, 'stats' => { 'stat-free-memory' => -1, 'stat-swap-in' => -1, 'stat-total-memory' => -1, 'stat-major-faults' => -1, 'stat-minor-faults' => -1, 'stat-swap-out' => -1 } }; then when the guest has loaded his driver $VAR1 = { 'last-update' => 1425882998, 'stats' => { 'stat-free-memory' => -1, 'stat-swap-in' => '4039065379203448832', 'stat-total-memory' => -1, 'stat-major-faults' => -1, 'stat-minor-faults' => -1, 'stat-swap-out' => '-6579759055588294656' } }; $VAR1 = { 'last-update' => 1425882998, 'stats' => { 'stat-free-memory' => -1, 'stat-swap-in' => '4039065379203448832', 'stat-total-memory' => -1, 'stat-major-faults' => -1, 'stat-minor-faults' => -1, 'stat-swap-out' => '-6579759055588294656' } }; $VAR1 = { 'last-update' => 1425882998, 'stats' => { 'stat-free-memory' => -1, 'stat-swap-in' => '4039065379203448832', 'stat-total-memory' => -1, 'stat-major-faults' => -1, 'stat-minor-faults' => -1, 'stat-swap-out' => '-6579759055588294656' } }; It's seem that a some stats are retrieved, but last-update don't increment. Removing the machine option resolve the problem. I'm working with proxmox team, and a lot of user have reported balloning bug, because we pass the -machine option when are a doing live migration. I'm surprised that -machine pc also have this bug. (Isn't it supposed to be the default machine config ?) here the sample command line: /usr/bin/kvm -id 150 -chardev socket,id=qmp,path=/var/run/qemu-server/150.qmp,server,nowait -mon chardev=qmp,mode=control -vnc unix:/var/run/qemu-server/150.vnc,x509,password -pidfile /var/run/qemu-server/150.pid -daemonize -smbios type=1,manufacturer=dell,version=1,product=3,uuid=f0686bfb-50b8-4d31-a4cb-b1cf60eeb648 -name debianok -smp 1,sockets=2,cores=1,maxcpus=2 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000 -vga cirrus -cpu kvm64,+lahf_lm,+x2apic,+sep -m 4096 -k fr -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -iscsi initiator-name=iqn.1993-08.org.debian.01.24b0d01a62a3 -drive file=/var/lib/vz/images/150/vm-150-disk-1.raw,if=none,id=drive-virtio0,format=raw,aio=native,cache=none -device virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100 -netdev type=tap,id=net0,ifname=tap150i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=76:EF:E9:ED:9D:41,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -machine pc
Re: [Qemu-devel] [BUG] Balloon malfunctions with memory hotplug
Hi, I think they was already reported some month ago, and a patch was submitted to the mailing list (but waiting that memory unplug was merged before apply it) http://lists.gnu.org/archive/html/qemu-devel/2014-11/msg02362.html - Mail original - De: "Luiz Capitulino" À: "qemu-devel" Cc: "kvm" , "Igor Mammedov" , "zhang zhanghailiang" , pkre...@redhat.com, "Eric Blake" , "Michael S. Tsirkin" , "amit shah" Envoyé: Jeudi 26 Février 2015 20:26:29 Objet: [BUG] Balloon malfunctions with memory hotplug Hello, Reproducer: 1. Start QEMU with balloon and memory hotplug support: # qemu [...] -m 1G,slots=2,maxmem=2G -balloon virtio 2. Check balloon size: (qemu) info balloon balloon: actual=1024 (qemu) 3. Hotplug some memory: (qemu) object_add memory-backend-ram,id=mem1,size=1G (qemu) device_add pc-dimm,id=dimm1,memdev=mem1 4. This is step is _not_ needed to reproduce the problem, but you may need to online memory manually on Linux so that it becomes available in the guest 5. Check balloon size again: (qemu) info balloon balloon: actual=1024 (qemu) BUG: The guest now has 2GB of memory, but the balloon thinks the guest has 1GB One may think that the problem is that the balloon driver is ignoring hotplugged memory. This is not what's happening. If you do balloon your guest, there's nothing stopping the balloon driver in the guest from ballooning hotplugged memory. The problem is that the balloon device in QEMU needs to know the current amount of memory available to the guest. Before memory hotplug this information was easy to obtain: the current amount of memory available to the guest is the memory the guest was booted with. This value is stored in the ram_size global variable in QEMU and this is what the balloon device emulation code uses today. However, when memory is hotplugged ram_size is _not_ updated and the balloon device breaks. I see two possible solutions for this problem: 1. In addition to reading ram_size, the balloon device in QEMU could scan pc-dimm devices to account for hotplugged memory. This solution was already implemented by zhanghailiang: http://lists.gnu.org/archive/html/qemu-devel/2014-11/msg02362.html It works, except that on Linux memory hotplug is a two-step procedure: first memory is inserted then it has to be onlined from user-space. So, if memory is inserted but not onlined this solution gives the opposite problem: the balloon device will report a larger memory amount than the guest actually has. Can we live with that? I guess not, but I'm open for discussion. If QEMU could be notified when Linux makes memory online, then the problem would be gone. But I guess this can't be done. 2. Modify the balloon driver in the guest to inform the balloon device on the host about the current memory available to the guest. This way, whenever the balloon device in QEMU needs to know the current amount of memory in the guest, it asks the guest. This drops any usage of ram_size in the balloon device I'm not completely sure this is feasible though. For example, what happens if the guest reports a memory amount to QEMU and right after this more memory is plugged? Besides, this solution is more complex than solution 1 and won't address older guests. Another important detail is that, I *suspect* that a very similar bug already exists with 32-bit guests even without memory hotplug: what happens if you assign 6GB to a 32-bit without PAE support? I think the same problem we're seeing with memory hotplug will happen and solution 1 won't fix this, although no one seems to care about 32-bit guests... 
Re: [Qemu-devel] slow speed for virtio-scsi since qemu 2.2
H Stefan, only for write ? or also read ? I'll try to reproduce on my test cluster. - Mail original - De: "Stefan Priebe" À: "qemu-devel" Envoyé: Dimanche 15 Février 2015 19:46:12 Objet: [Qemu-devel] slow speed for virtio-scsi since qemu 2.2 Hi, while i get a constant random 4k i/o write speed of 20.000 iops with qemu 2.1.0 or 2.1.3. I get jumping speeds with qemu 2.2 (jumping between 500 iops and 15.000 iop/s). If i use virtio instead of virtio-scsi speed is the same between 2.2 and 2.1. Is there a known regression? Greets, Stefan
Re: [Qemu-devel] [PATCH v2 00/11] cpu: add i386 cpu hot remove support
>>About this, I can do it successfully on my qemu. >>So can you tell us more information about your operation? simply start with -smp 2,sockets=2,cores=2,maxcpus=4 -device kvm64-x86_64-cpu,apic-id=2,id=cpu2 then #device_del cpu2 Guest return [ 324.195024] Unregister pv shared memory for cpu 2 [ 324.250579] smpboot: CPU 2 is now offline but cpu is not remove in qemu. I had also try to eject manualy from guest echo 1 > /sys/bus/acpi/devices/LNXCPU\:02/eject But I have this error message ACPI: \_SB_.CP02: Eject incomplete - status 0xf (maybe is it normal ? or maybe is it a guest bug (kernel 3.16 from debian wheezy backports) ?) >>And I found the patchset you applied is the last version in your last >>email. I think you'd better apply the latest version. I have applied on top of git theses patches series https://www.mail-archive.com/qemu-devel%40nongnu.org/msg272745.html [Qemu-devel] [RESEND PATCH v1 0/5] Common unplug and unplug request cb for memory and CPU hot-unplug. https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg01552.html [Qemu-devel] [PATCH v3 0/7] cpu: add device_add foo-x86_64-cpu support https://www.mail-archive.com/qemu-devel@nongnu.org/msg273870.html [Qemu-devel] [PATCH v2 00/11] cpu: add i386 cpu hot remove support - Mail original - De: "Zhu Guihua" À: "aderumier" Cc: "qemu-devel" , tangc...@cn.fujitsu.com, "guz fnst" , "isimatu yasuaki" , "Anshul Makkar" , "chen fan fnst" , "Igor Mammedov" , "afaerber" Envoyé: Lundi 26 Janvier 2015 04:47:13 Objet: Re: [Qemu-devel] [PATCH v2 00/11] cpu: add i386 cpu hot remove support On Mon, 2015-01-26 at 04:19 +0100, Alexandre DERUMIER wrote: > Thanks for your reply. > > 2 others things: > > 1) > on cpu unplug, I see that the cpu is correctly removed from my linux guest > but not from qemu > About this, I can do it successfully on my qemu. So can you tell us more information about your operation? And I found the patchset you applied is the last version in your last email. I think you'd better apply the latest version. 
Regards, Zhu > starting with a guest with 3cpus: > > guest: #ls -lah /sys/devices/system/ |grep cpu > drwxr-xr-x 6 root root 0 Jan 25 22:16 cpu0 > drwxr-xr-x 6 root root 0 Jan 25 22:16 cpu1 > drwxr-xr-x 6 root root 0 Jan 25 22:16 cpu2 > > hmp: # info cpus > * CPU #0: pc=0x81057022 (halted) thread_id=24972 > CPU #1: pc=0x81057022 (halted) thread_id=24973 > CPU #2: pc=0x81048bc1 (halted) thread_id=25102 > > > then unplug cpu2 > hmp : device_del cpu2 > > guest: > > dmesg: > [ 176.219754] Unregister pv shared memory for cpu 2 > [ 176.278881] smpboot: CPU 2 is now offline > > #ls -lah /sys/devices/system/ |grep cpu > drwxr-xr-x 6 root root 0 Jan 25 22:16 cpu0 > drwxr-xr-x 6 root root 0 Jan 25 22:16 cpu1 > > hmp: # info cpus > * CPU #0: pc=0x81057022 (halted) thread_id=24972 > CPU #1: pc=0x81057022 (halted) thread_id=24973 > CPU #2: pc=0x81048bc1 (halted) thread_id=25102 > > > > > 2)when numa is used, the hotplugged cpu is always on numa node 0 > (cpu_add or device_add cpu) > > > starting a guest, with 2 sockets,1 cores > > -smp 2,sockets=2,cores=1,maxcpus=2 > -object memory-backend-ram,size=256M,id=ram-node0 -numa > node,nodeid=0,cpus=0,memdev=ram-node0 > -object memory-backend-ram,size=256M,id=ram-node1 -numa > node,nodeid=1,cpus=1,memdev=ram-node1 > > hmp: > # info numa > 2 nodes > node 0 cpus: 0 > node 0 size: 256 MB > node 1 cpus: 1 > node 1 size: 256 MB > > ok > > now > > starting with same topology, but with 1cpu at start > -smp 2,sockets=2,cores=1,maxcpus=2 > -object memory-backend-ram,size=256M,id=ram-node0 -numa > node,nodeid=0,cpus=0,memdev=ram-node0 > -object memory-backend-ram,size=256M,id=ram-node1 -numa > node,nodeid=1,cpus=1,memdev=ram-node1 > > # info numa > 2 nodes > node 0 cpus: 0 > node 0 size: 256 MB > node 1 cpus: > node 1 size: 256 MB > > hotpluging a cpu > # device_add kvm64-x86_64-cpu,apic-id=1,id=cpu1 > > # info numa > 2 nodes > node 0 cpus: 0 1 > node 0 size: 256 MB > node 1 cpus: > node 1 size: 256 MB > > cpu1 should be on node1, not node0. > > > Regards, > > Alexandre > > - Mail original - > De: "Zhu Guihua" > À: "aderumier" > Cc: "qemu-devel" , tangc...@cn.fujitsu.com, "guz fnst" > , "isimatu yasuaki" > , "Anshul Makkar" > , "chen fan fnst" > , "Igor Mammedov" , > "afaerber"
Re: [Qemu-devel] [PATCH v2 00/11] cpu: add i386 cpu hot remove support
>>2)when numa is used, the hotplugged cpu is always on numa node 0 >>(cpu_add or device_add cpu) About this, it seem to be a display bug in "info numa", cpu is correctly assigned to numa node1 in the guest. - Mail original - De: "aderumier" À: "Zhu Guihua" Cc: "qemu-devel" , tangc...@cn.fujitsu.com, "guz fnst" , "isimatu yasuaki" , "Anshul Makkar" , "chen fan fnst" , "Igor Mammedov" , "afaerber" Envoyé: Lundi 26 Janvier 2015 04:19:52 Objet: Re: [Qemu-devel] [PATCH v2 00/11] cpu: add i386 cpu hot remove support Thanks for your reply. 2 others things: 1) on cpu unplug, I see that the cpu is correctly removed from my linux guest but not from qemu starting with a guest with 3cpus: guest: #ls -lah /sys/devices/system/ |grep cpu drwxr-xr-x 6 root root 0 Jan 25 22:16 cpu0 drwxr-xr-x 6 root root 0 Jan 25 22:16 cpu1 drwxr-xr-x 6 root root 0 Jan 25 22:16 cpu2 hmp: # info cpus * CPU #0: pc=0x81057022 (halted) thread_id=24972 CPU #1: pc=0x81057022 (halted) thread_id=24973 CPU #2: pc=0x81048bc1 (halted) thread_id=25102 then unplug cpu2 hmp : device_del cpu2 guest: dmesg: [ 176.219754] Unregister pv shared memory for cpu 2 [ 176.278881] smpboot: CPU 2 is now offline #ls -lah /sys/devices/system/ |grep cpu drwxr-xr-x 6 root root 0 Jan 25 22:16 cpu0 drwxr-xr-x 6 root root 0 Jan 25 22:16 cpu1 hmp: # info cpus * CPU #0: pc=0x81057022 (halted) thread_id=24972 CPU #1: pc=0x81057022 (halted) thread_id=24973 CPU #2: pc=0x81048bc1 (halted) thread_id=25102 2)when numa is used, the hotplugged cpu is always on numa node 0 (cpu_add or device_add cpu) starting a guest, with 2 sockets,1 cores -smp 2,sockets=2,cores=1,maxcpus=2 -object memory-backend-ram,size=256M,id=ram-node0 -numa node,nodeid=0,cpus=0,memdev=ram-node0 -object memory-backend-ram,size=256M,id=ram-node1 -numa node,nodeid=1,cpus=1,memdev=ram-node1 hmp: # info numa 2 nodes node 0 cpus: 0 node 0 size: 256 MB node 1 cpus: 1 node 1 size: 256 MB ok now starting with same topology, but with 1cpu at start -smp 2,sockets=2,cores=1,maxcpus=2 -object memory-backend-ram,size=256M,id=ram-node0 -numa node,nodeid=0,cpus=0,memdev=ram-node0 -object memory-backend-ram,size=256M,id=ram-node1 -numa node,nodeid=1,cpus=1,memdev=ram-node1 # info numa 2 nodes node 0 cpus: 0 node 0 size: 256 MB node 1 cpus: node 1 size: 256 MB hotpluging a cpu # device_add kvm64-x86_64-cpu,apic-id=1,id=cpu1 # info numa 2 nodes node 0 cpus: 0 1 node 0 size: 256 MB node 1 cpus: node 1 size: 256 MB cpu1 should be on node1, not node0. Regards, Alexandre - Mail original - De: "Zhu Guihua" À: "aderumier" Cc: "qemu-devel" , tangc...@cn.fujitsu.com, "guz fnst" , "isimatu yasuaki" , "Anshul Makkar" , "chen fan fnst" , "Igor Mammedov" , "afaerber" Envoyé: Lundi 26 Janvier 2015 03:01:48 Objet: Re: [Qemu-devel] [PATCH v2 00/11] cpu: add i386 cpu hot remove support On Fri, 2015-01-23 at 11:24 +0100, Alexandre DERUMIER wrote: > Hello, > > I'm currently testing the new cpu unplug features, > Works fine here with debian guests and kernel 3.14. > Thanks for your test. > But I have notice some small potential bugs, but I'm not sure I'm doing it > right. > > 1)first, to unplug cpu, we need an id for cpu > Yes, if you want to unplug cpu, you must have an id for cpu. > The problem is that the current qemu command line > -smp 1,sockets=2,cores=1,maxcpus=2 > > for example, will create 1 cpu on apic-id 0 without any id, so we can't > unplug it. 
> > > So, I have tried with > > -smp 1,sockets=2,cores=1,maxcpus=2 -device kvm64-x86_64-cpu,apic-id=0,id=cpu0 > > But this give me an error: > "-device kvm64-x86_64-cpu,apic-id=0,id=cpu0: CPU with APIC ID 0 exists" > APIC ID 0 was used by the cpu of '-smp 1'. So you should use apic-id=1 > (also try to set -smp 0, but it's not working). > > > > 2) second problem, if I start with > -smp 1,sockets=2,cores=1,maxcpus=2 > > then hmp: > device_add kvm64-x86_64-cpu,apic-id=1,id=cpu1 > > then hmp : device_del cpu1 > > Got an error:" > This is the last cpu, should not be removed!" > > Oh, it's our problem, thanks for your pointing out. I will fix it in next version. Regards, Zhu > > This is coming from > [PATCH 06/12] pc: add cpu hot unplug request callback support > + if (smp_cpus == 1) { > + error_setg(&local_err, > + "This is the last cpu, should not be removed!"); > + goto out; > + } > > > > So, the only way unplug is working for me, is to start with -smp 2 minimum > -smp 2,sockets=2,cores=1,maxcpus=4 > > Then I can hotplug|unplug cpuid >= 2 > > > > Regards, > > Alexandre Derumier [...]
Re: [Qemu-devel] [PATCH v2 00/11] cpu: add i386 cpu hot remove support
Thanks for your reply. 2 others things: 1) on cpu unplug, I see that the cpu is correctly removed from my linux guest but not from qemu starting with a guest with 3cpus: guest: #ls -lah /sys/devices/system/ |grep cpu drwxr-xr-x 6 root root0 Jan 25 22:16 cpu0 drwxr-xr-x 6 root root0 Jan 25 22:16 cpu1 drwxr-xr-x 6 root root0 Jan 25 22:16 cpu2 hmp: # info cpus * CPU #0: pc=0x81057022 (halted) thread_id=24972 CPU #1: pc=0x81057022 (halted) thread_id=24973 CPU #2: pc=0x81048bc1 (halted) thread_id=25102 then unplug cpu2 hmp : device_del cpu2 guest: dmesg: [ 176.219754] Unregister pv shared memory for cpu 2 [ 176.278881] smpboot: CPU 2 is now offline #ls -lah /sys/devices/system/ |grep cpu drwxr-xr-x 6 root root0 Jan 25 22:16 cpu0 drwxr-xr-x 6 root root0 Jan 25 22:16 cpu1 hmp: # info cpus * CPU #0: pc=0x81057022 (halted) thread_id=24972 CPU #1: pc=0x81057022 (halted) thread_id=24973 CPU #2: pc=0x81048bc1 (halted) thread_id=25102 2)when numa is used, the hotplugged cpu is always on numa node 0 (cpu_add or device_add cpu) starting a guest, with 2 sockets,1 cores -smp 2,sockets=2,cores=1,maxcpus=2 -object memory-backend-ram,size=256M,id=ram-node0 -numa node,nodeid=0,cpus=0,memdev=ram-node0 -object memory-backend-ram,size=256M,id=ram-node1 -numa node,nodeid=1,cpus=1,memdev=ram-node1 hmp: # info numa 2 nodes node 0 cpus: 0 node 0 size: 256 MB node 1 cpus: 1 node 1 size: 256 MB ok now starting with same topology, but with 1cpu at start -smp 2,sockets=2,cores=1,maxcpus=2 -object memory-backend-ram,size=256M,id=ram-node0 -numa node,nodeid=0,cpus=0,memdev=ram-node0 -object memory-backend-ram,size=256M,id=ram-node1 -numa node,nodeid=1,cpus=1,memdev=ram-node1 # info numa 2 nodes node 0 cpus: 0 node 0 size: 256 MB node 1 cpus: node 1 size: 256 MB hotpluging a cpu # device_add kvm64-x86_64-cpu,apic-id=1,id=cpu1 # info numa 2 nodes node 0 cpus: 0 1 node 0 size: 256 MB node 1 cpus: node 1 size: 256 MB cpu1 should be on node1, not node0. Regards, Alexandre - Mail original - De: "Zhu Guihua" À: "aderumier" Cc: "qemu-devel" , tangc...@cn.fujitsu.com, "guz fnst" , "isimatu yasuaki" , "Anshul Makkar" , "chen fan fnst" , "Igor Mammedov" , "afaerber" Envoyé: Lundi 26 Janvier 2015 03:01:48 Objet: Re: [Qemu-devel] [PATCH v2 00/11] cpu: add i386 cpu hot remove support On Fri, 2015-01-23 at 11:24 +0100, Alexandre DERUMIER wrote: > Hello, > > I'm currently testing the new cpu unplug features, > Works fine here with debian guests and kernel 3.14. > Thanks for your test. > But I have notice some small potential bugs, but I'm not sure I'm doing it > right. > > 1)first, to unplug cpu, we need an id for cpu > Yes, if you want to unplug cpu, you must have an id for cpu. > The problem is that the current qemu command line > -smp 1,sockets=2,cores=1,maxcpus=2 > > for example, will create 1 cpu on apic-id 0 without any id, so we can't > unplug it. > > > So, I have tried with > > -smp 1,sockets=2,cores=1,maxcpus=2 -device kvm64-x86_64-cpu,apic-id=0,id=cpu0 > > But this give me an error: > "-device kvm64-x86_64-cpu,apic-id=0,id=cpu0: CPU with APIC ID 0 exists" > APIC ID 0 was used by the cpu of '-smp 1'. So you should use apic-id=1 > (also try to set -smp 0, but it's not working). > > > > 2) second problem, if I start with > -smp 1,sockets=2,cores=1,maxcpus=2 > > then hmp: > device_add kvm64-x86_64-cpu,apic-id=1,id=cpu1 > > then hmp : device_del cpu1 > > Got an error:" > This is the last cpu, should not be removed!" > > Oh, it's our problem, thanks for your pointing out. I will fix it in next version. 
Regards, Zhu > > This is coming from > [PATCH 06/12] pc: add cpu hot unplug request callback support > + if (smp_cpus == 1) { > + error_setg(&local_err, > + "This is the last cpu, should not be removed!"); > + goto out; > + } > > > > So, the only way unplug is working for me, is to start with -smp 2 minimum > -smp 2,sockets=2,cores=1,maxcpus=4 > > Then I can hotplug|unplug cpuid >= 2 > > > > Regards, > > Alexandre Derumier [...]
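For comparison, newer QEMU releases expose the placement explicitly, which avoids the node-0 default described above. A minimal HMP sketch, assuming QEMU 2.7+ for the hotpluggable-cpus query and a later release still for the node-id property (the CPU model, ids and node mapping here are illustrative, not taken from this thread):

(qemu) info hotpluggable-cpus
(qemu) device_add qemu64-x86_64-cpu,id=cpu1,socket-id=1,core-id=0,thread-id=0,node-id=1
(qemu) info numa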
Re: [Qemu-devel] [PATCH v2 00/11] cpu: add i386 cpu hot remove support
Hello, I'm currently testing the new cpu unplug features, Works fine here with debian guests and kernel 3.14. But I have notice some small potential bugs, but I'm not sure I'm doing it right. 1)first, to unplug cpu, we need an id for cpu The problem is that the current qemu command line -smp 1,sockets=2,cores=1,maxcpus=2 for example, will create 1 cpu on apic-id 0 without any id, so we can't unplug it. So, I have tried with -smp 1,sockets=2,cores=1,maxcpus=2 -device kvm64-x86_64-cpu,apic-id=0,id=cpu0 But this give me an error: "-device kvm64-x86_64-cpu,apic-id=0,id=cpu0: CPU with APIC ID 0 exists" (also try to set -smp 0, but it's not working). 2) second problem, if I start with -smp 1,sockets=2,cores=1,maxcpus=2 then hmp: device_add kvm64-x86_64-cpu,apic-id=1,id=cpu1 then hmp : device_del cpu1 Got an error:" This is the last cpu, should not be removed!" This is coming from [PATCH 06/12] pc: add cpu hot unplug request callback support +if (smp_cpus == 1) { +error_setg(&local_err, + "This is the last cpu, should not be removed!"); +goto out; +} So, the only way unplug is working for me, is to start with -smp 2 minimum -smp 2,sockets=2,cores=1,maxcpus=4 Then I can hotplug|unplug cpuid >= 2 Regards, Alexandre Derumier - Mail original - De: "Zhu Guihua" À: "qemu-devel" Cc: "Zhu Guihua" , tangc...@cn.fujitsu.com, "guz fnst" , "isimatu yasuaki" , "Anshul Makkar" , "chen fan fnst" , "Igor Mammedov" , "afaerber" Envoyé: Mercredi 14 Janvier 2015 08:44:53 Objet: [Qemu-devel] [PATCH v2 00/11] cpu: add i386 cpu hot remove support This series is based on chen fan's previous i386 cpu hot remove patchset: https://lists.nongnu.org/archive/html/qemu-devel/2013-12/msg04266.html Via implementing ACPI standard methods _EJ0 in ACPI table, after Guest OS remove one vCPU online, the fireware will store removed bitmap to QEMU, then QEMU could know to notify the assigned vCPU of exiting. Meanwhile, intruduce the QOM command 'device_del' to remove vCPU from QEMU itself. The whole work is based on the new hot plug/unplug framework, ,the unplug request callback does the pre-check and send the request, unplug callback does the removal handling. This series depends on tangchen's common hot plug/unplug enhance patchset. [RESEND PATCH v1 0/5] Common unplug and unplug request cb for memory and CPU hot-unplug https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg00429.html The is the second half of the previous series: [RFC V2 00/10] cpu: add device_add foo-x86_64-cpu and i386 cpu hot remove support https://lists.nongnu.org/archive/html/qemu-devel/2014-08/msg04779.html If you want to test the series, you need to apply the 'device_add foo-x86_64-cpu' patchset first: [PATCH v3 0/7] cpu: add device_add foo-x86_64-cpu support https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg01552.html --- Changelog since v1: -rebase on the latest version. -delete patch i386/cpu: add instance finalize callback, and put it into patchset [PATCH v3 0/6] cpu: add device_add foo-x86_64-cpu support. Changelog since RFC: -splited the i386 cpu hot remove into single thread. -replaced apic_no with apic_id, so does the related stuff to make it work with arbitrary CPU hotadd. -add the icc_device_unrealize callback to handle apic unrealize. -rework on the new hot plug/unplug platform. 
---
Chen Fan (2):
  x86: add x86_cpu_unrealizefn() for cpu apic remove
  cpu hotplug: implement function cpu_status_write() for vcpu ejection

Gu Zheng (5):
  acpi/cpu: add cpu hot unplug request callback function
  acpi/piix4: add cpu hot unplug callback support
  acpi/ich9: add cpu hot unplug support
  pc: add cpu hot unplug callback support
  cpus: reclaim allocated vCPU objects

Zhu Guihua (4):
  acpi/piix4: add cpu hot unplug request callback support
  acpi/ich9: add cpu hot unplug request callback support
  pc: add cpu hot unplug request callback support
  acpi/cpu: add cpu hot unplug callback function

 cpus.c                            | 44
 hw/acpi/cpu_hotplug.c             | 88 ---
 hw/acpi/ich9.c                    | 17 ++--
 hw/acpi/piix4.c                   | 12 +-
 hw/core/qdev.c                    |  2 +-
 hw/cpu/icc_bus.c                  | 11 +
 hw/i386/acpi-dsdt-cpu-hotplug.dsl |  6 ++-
 hw/i386/kvm/apic.c                |  8
 hw/i386/pc.c                      | 62 +--
 hw/intc/apic.c                    | 10 +
 hw/intc/apic_common.c             | 21 ++
 include/hw/acpi/cpu_hotplug.h     |  8
 include/hw/cpu/icc_bus.h          |  1 +
 include/hw/i386/apic_internal.h   |  1 +
 include/hw/qdev-core.h            |  1 +
 include/qom/cpu.h                 |  9
 include/sysemu/kvm.h              |  1 +
 kvm-all.c                         | 57 -
 target-i386/cpu.c                 | 46
 19 files changed, 378 insertions(+), 27 deletions(-)

--
1.9.3
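To summarize the working sequence that comes out of this thread (a sketch against the patchset under test here; the kvm64 CPU model and the ids are the ones used above):

qemu ... -smp 2,sockets=2,cores=1,maxcpus=4 ...
(qemu) device_add kvm64-x86_64-cpu,apic-id=2,id=cpu2
(qemu) info cpus
(qemu) device_del cpu2

That is, the boot CPUs created by -smp have no id, so only CPUs added later with device_add (apic-id >= 2 here) can be unplugged, and at least two CPUs must be present at start to get past the last-cpu check.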
Re: [Qemu-devel] cpu hotplug and windows guest (win2012r2)
Ok, I think it's a windows bug, I have the same topology than you, I don't work. (I see the new vcpu in device manager, but not in taskmgr or perform) Rebooting the os make the new vcpu display, so it's seem to be ok on qemu side. (Just to known, which version of win2012r2 do you use ? standard or datacenter ?) Thanks for your help! Regards, Alexandre - Mail original - De: "Andrey Korolyov" À: "aderumier" Cc: "qemu-devel" Envoyé: Mercredi 21 Janvier 2015 00:16:45 Objet: Re: [Qemu-devel] cpu hotplug and windows guest (win2012r2) On Tue, Jan 20, 2015 at 10:06 PM, Alexandre DERUMIER wrote: > Hi, > > I have tried with numa enabled, and it's still don't work. > Can you send me your vm qemu command line ? > > > Also, with numa I have notice something strange with "info numa" command. > > starting with -smp socket=2,cores=1 > > # info numa > 2 nodes > node 0 cpus: 0 > node 0 size: 2048 MB > node 1 cpus: 1 > node 1 size: 2048 MB > > ok > > > now: > > starting with -smp 1,socket=2,cores=1,maxcpus=2 > > # info numa > 2 nodes > node 0 cpus: 0 > node 0 size: 2048 MB > node 1 cpus: > node 1 size: 2048 MB > > ok > > > now hotplug cpu > > # cpu-add 1 > # info numa > 2 nodes > node 0 cpus: 0 1 > node 0 size: 2048 MB > node 1 cpus: > node 1 size: 2048 MB > > > cpu1 has been hotplugged on numa node0 ?? > > http://lists.gnu.org/archive/html/qemu-devel/2015-01/msg02240.html here are working examples. Yes, everything goes to the node0 unless specified (you can see an example for bounded sockets in this url too).
Re: [Qemu-devel] cpu hotplug and windows guest (win2012r2)
Hi, I have tried with numa enabled, and it's still don't work. Can you send me your vm qemu command line ? Also, with numa I have notice something strange with "info numa" command. starting with -smp socket=2,cores=1 # info numa 2 nodes node 0 cpus: 0 node 0 size: 2048 MB node 1 cpus: 1 node 1 size: 2048 MB ok now: starting with -smp 1,socket=2,cores=1,maxcpus=2 # info numa 2 nodes node 0 cpus: 0 node 0 size: 2048 MB node 1 cpus: node 1 size: 2048 MB ok now hotplug cpu # cpu-add 1 # info numa 2 nodes node 0 cpus: 0 1 node 0 size: 2048 MB node 1 cpus: node 1 size: 2048 MB cpu1 has been hotplugged on numa node0 ?? - Mail original - De: "Andrey Korolyov" À: "aderumier" Cc: "qemu-devel" Envoyé: Mercredi 14 Janvier 2015 17:07:41 Objet: Re: [Qemu-devel] cpu hotplug and windows guest (win2012r2) On Fri, Jan 9, 2015 at 4:35 PM, Andrey Korolyov wrote: > On Fri, Jan 9, 2015 at 1:26 PM, Alexandre DERUMIER > wrote: >> Hi, >> >> I'm currently testing cpu hotplug with a windows 2012R2 standard guest, >> >> and I can't get it too work. (works fine with linux guest). >> >> host kernel : rhel7 3.10 kernel >> qemu 2.2 >> >> >> qemu command line : -smp cpus=1,sockets=2,cores=1,maxcpus=2 >> >> Started with 1cpu, topogoly is 2sockets with 1cores. >> >> >> Then >> >> qmp# cpu-add 1 >> >> >> I can see a new cpu is windows device manager, and event log in the device >> said that it's online. >> >> So it should be ok, but. >> >> I can't see new processor in taskmanager or perfmon. (I had tried to >> relaunch them to be sure.). >> >> >> So, it is a windows bug ? Does I need to do something else ? >> > > Did you populated appropriate topology in arguments? As far as I can > remember from pre-1.1 era CPU hotplug not worked with windows, so it > should be neither lack of configured NUMA or a simply OS-specific > issue. Just to let anyone know that it works well: http://xdel.ru/downloads/windows-hotplugged-cpu.png Topology is a bit rotten (I have two sockets with single CPU in each) and I *need* to specify non-zero amount of memory for each numa node to be seen, if each CPU belongs to different node for example but everything else is just fine.
Re: [Qemu-devel] cpu hotplug and windows guest (win2012r2)
>>it might be a limitation of specific Windows edition, >>I've tested it with W2012R2 Datacenter edition where it's supported for sure. From the ovirt documentation: http://www.ovirt.org/Hot_plug_cpu It should work with any win2012 version. (Indeed for win2008, datacenter edition was needed) - Mail original - De: "Igor Mammedov" À: "aderumier" Cc: "qemu-devel" Envoyé: Lundi 19 Janvier 2015 17:06:37 Objet: Re: [Qemu-devel] cpu hotplug and windows guest (win2012r2) On Fri, 9 Jan 2015 11:26:08 +0100 (CET) Alexandre DERUMIER wrote: > Hi, > > I'm currently testing cpu hotplug with a windows 2012R2 standard guest, > > and I can't get it too work. (works fine with linux guest). > > host kernel : rhel7 3.10 kernel > qemu 2.2 > > > qemu command line : -smp cpus=1,sockets=2,cores=1,maxcpus=2 > > Started with 1cpu, topogoly is 2sockets with 1cores. > > > Then > > qmp# cpu-add 1 > > > I can see a new cpu is windows device manager, and event log in the device > said that it's online. > > So it should be ok, but. > > I can't see new processor in taskmanager or perfmon. (I had tried to relaunch > them to be sure.). > > > So, it is a windows bug ? Does I need to do something else ? it might be a limitation of specific Windows edition, I've tested it with W2012R2 Datacenter edition where it's supported for sure. > > > > > > >
Re: [Qemu-devel] cpu hotplug and windows guest (win2012r2)
>> I *need* to specify non-zero amount of memory for each numa node >>to be seen, if each CPU belongs to different node for example but >>everything else is just fine. Ok, thanks I'll try to enable numa to see if it's helping (I known that numa is mandatory for memory hotplug, maybe for cpu too) - Mail original - De: "Andrey Korolyov" À: "aderumier" Cc: "qemu-devel" Envoyé: Mercredi 14 Janvier 2015 17:07:41 Objet: Re: [Qemu-devel] cpu hotplug and windows guest (win2012r2) On Fri, Jan 9, 2015 at 4:35 PM, Andrey Korolyov wrote: > On Fri, Jan 9, 2015 at 1:26 PM, Alexandre DERUMIER > wrote: >> Hi, >> >> I'm currently testing cpu hotplug with a windows 2012R2 standard guest, >> >> and I can't get it too work. (works fine with linux guest). >> >> host kernel : rhel7 3.10 kernel >> qemu 2.2 >> >> >> qemu command line : -smp cpus=1,sockets=2,cores=1,maxcpus=2 >> >> Started with 1cpu, topogoly is 2sockets with 1cores. >> >> >> Then >> >> qmp# cpu-add 1 >> >> >> I can see a new cpu is windows device manager, and event log in the device >> said that it's online. >> >> So it should be ok, but. >> >> I can't see new processor in taskmanager or perfmon. (I had tried to >> relaunch them to be sure.). >> >> >> So, it is a windows bug ? Does I need to do something else ? >> > > Did you populated appropriate topology in arguments? As far as I can > remember from pre-1.1 era CPU hotplug not worked with windows, so it > should be neither lack of configured NUMA or a simply OS-specific > issue. Just to let anyone know that it works well: http://xdel.ru/downloads/windows-hotplugged-cpu.png Topology is a bit rotten (I have two sockets with single CPU in each) and I *need* to specify non-zero amount of memory for each numa node to be seen, if each CPU belongs to different node for example but everything else is just fine.
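For reference, the kind of command line Andrey describes (two sockets, one per NUMA node, each node with its own non-zero memory backend; sizes and ids are illustrative) would look like:

qemu ... -smp 1,sockets=2,cores=1,maxcpus=2 \
 -object memory-backend-ram,id=ram-node0,size=2048M -numa node,nodeid=0,cpus=0,memdev=ram-node0 \
 -object memory-backend-ram,id=ram-node1,size=2048M -numa node,nodeid=1,cpus=1,memdev=ram-node1

after which cpu-add 1 (or device_add of a cpu device) can be tested as in the rest of the thread.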
[Qemu-devel] cpu hotplug and windows guest (win2012r2)
Hi, I'm currently testing cpu hotplug with a Windows 2012R2 Standard guest, and I can't get it to work. (It works fine with a Linux guest.) host kernel: rhel7 3.10 kernel, qemu 2.2 qemu command line: -smp cpus=1,sockets=2,cores=1,maxcpus=2 Started with 1 cpu; the topology is 2 sockets with 1 core each. Then qmp# cpu-add 1 I can see a new cpu in the Windows device manager, and the event log for the device says that it's online. So it should be ok, but I can't see the new processor in taskmanager or perfmon. (I have tried relaunching them to be sure.) So, is it a Windows bug? Do I need to do something else?
Re: [Qemu-devel] iothread object hotplug ?
>>Yes, there is an object_add/object-add command (respectively HMP and >>QMP), but I don't think libvirt has bindings already. Thanks Paolo ! I'm currently implementing iothread on proxmox, so no libvirt. Alexandre. - Mail original - De: "Paolo Bonzini" À: "Alexandre DERUMIER" , "qemu-devel" Envoyé: Vendredi 14 Novembre 2014 09:37:02 Objet: Re: iothread object hotplug ? On 14/11/2014 08:25, Alexandre DERUMIER wrote: > Hi, > > I would like to known if it's possible to hot-add|hot-plug an iothread object > on a running guest ? Yes, there is an object_add/object-add command (respectively HMP and QMP), but I don't think libvirt has bindings already. Paolo
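A sketch of the hot-add sequence being asked about, in HMP (the drive path and ids are illustrative; the iothread property on virtio-blk-pci assumes QEMU 2.1+):

(qemu) object_add iothread,id=iothread1
(qemu) drive_add 0 if=none,file=/path/to/disk.raw,format=raw,id=drive1
(qemu) device_add virtio-blk-pci,drive=drive1,iothread=iothread1,id=vblk1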
[Qemu-devel] iothread object hotplug ?
Hi, I would like to know if it's possible to hot-add|hot-plug an iothread object on a running guest? (I would like to be able to hotplug new virtio devices on a new iothread at the same time.) Regards, Alexandre
Re: [Qemu-devel] is it possible to use a disk with multiple iothreads ?
>>You are missing debug information unfortunately, Ok thanks, I'll try to add qemu debug symbols. (I have already libc6,librbd,librados debug symbols installed) - Mail original - De: "Paolo Bonzini" À: "Alexandre DERUMIER" , "Stefan Hajnoczi" Cc: "josh durgin" , "qemu-devel" Envoyé: Lundi 27 Octobre 2014 15:47:19 Objet: Re: is it possible to use a disk with multiple iothreads ? On 10/27/2014 03:13 PM, Alexandre DERUMIER wrote: >>> That's very interesting! Please keep Josh and me in CC when you want to >>> >>discuss the results. > Here the aggregate results (perf report details attached in this mail) > > + 33,02% 3974368751 kvm [kernel.kallsyms] > + 23,66% 2847206635 kvm libc-2.13.so > + 18,79% 2262052133 kvm librados.so.2.0.0 > + 11,04% 1328581527 kvm librbd.so.1.0.0 > + 5,87% 706713737 kvm libpthread-2.13.so > + 3,75% 451690142 kvm kvm > + 2,74% 329457334 kvm libstdc++.so.6.0.17 > + 0,51% 61519664 kvm [vdso] + 0,42% 5089 kvm libglib-2.0.so.0.3200.4 + > 0,15% 18119658 kvm libm-2.13.so + 0,05% 5705776 kvm librt-2.13.so + 0,00% > 356625 kvm libz.so.1. 2.7 > > > > >>> >> 23,66% 2847206635 kvm libc-2.13.so > This one is mostly malloc,free,... > > > I see almost same results using fio with rbdengine on the host (outside the > kvm process). > So I think they are all mostly related to librbd. You are missing debug information unfortunately, but the slowdown seems to be related to futexes. Paolo
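For reference, a capture along the lines of the one above, once the qemu/librbd/librados debug packages are installed (the process name kvm and the 30-second window are illustrative, assuming a single VM on the host):

perf record -g -p $(pidof kvm) -- sleep 30
perf report --sort comm,dso,symbol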
Re: [Qemu-devel] is it possible to use a disk with multiple iothreads ?
>>virtio-blk and virtio-scsi emulation only runs in 1 thread at a time. >>It is currently not possible to achieve true multiqueue from guest, >>through QEMU, and down to the host. >> >>This is what the final slides in my presentation were about. Ok Thanks ! >>Regarding Ceph, do you know why it burns a lot of CPU? I'm currently doing some perfs to find why is so huge. One thing is that qemu+ qemu rbd block driver + librbd seem to use 2-3x more cpu, than passing krbd /dev/rbd0 from host to qemu. Don't known yet if the problem is librbd or qemu rbd block driver. - Mail original - De: "Stefan Hajnoczi" À: "Alexandre DERUMIER" Cc: "qemu-devel" , "josh durgin" Envoyé: Vendredi 24 Octobre 2014 11:04:06 Objet: Re: is it possible to use a disk with multiple iothreads ? On Thu, Oct 23, 2014 at 08:26:17PM +0200, Alexandre DERUMIER wrote: > I was reading this interesting presentation, > > http://vmsplice.net/~stefan/stefanha-kvm-forum-2014.pdf > > and I have a specific question. > > > I'm currently evaluate ceph/rbd storage performance through qemu, > > and the current bottleneck seem to be the cpu usage of the iothread. > (rbd procotol cpu usage is really huge). > > Currently It's around 1iops on 1core of recent xeons. > > > > So, I had tried with virtio-scsi multiqueue, but it doesn't help. (Not sure, > but I think that all queues are mapped on 1 iothread ?) > > I had tried with with virtio-blk and iothread/dataplane, it's help a little , > but not too much too. > > So, my question is : Is it possible to use multiple iothreads with 1 disk ? > > (something like 1 queue - 1 iothread?) virtio-blk and virtio-scsi emulation only runs in 1 thread at a time. It is currently not possible to achieve true multiqueue from guest, through QEMU, and down to the host. This is what the final slides in my presentation were about. Regarding Ceph, do you know why it burns a lot of CPU? CCed Josh in case he has ideas for reducing Ceph library CPU utilization. Stefan
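The two setups being compared here, roughly (pool/image names and cache options are illustrative):

# librbd via the qemu rbd block driver
qemu ... -drive file=rbd:rbd/vm-disk-1,if=none,id=drive0,cache=writeback -device virtio-blk-pci,drive=drive0

# kernel rbd, mapped on the host and passed through as a plain block device
rbd map rbd/vm-disk-1
qemu ... -drive file=/dev/rbd0,if=none,id=drive0,cache=none,aio=native -device virtio-blk-pci,drive=drive0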
[Qemu-devel] is it possible to use a disk with multiple iothreads ?
Hi, I was reading this interesting presentation, http://vmsplice.net/~stefan/stefanha-kvm-forum-2014.pdf and I have a specific question. I'm currently evaluating ceph/rbd storage performance through qemu, and the current bottleneck seems to be the cpu usage of the iothread. (rbd protocol cpu usage is really huge.) Currently it's around 1iops on 1 core of recent Xeons. So, I have tried virtio-scsi multiqueue, but it doesn't help. (Not sure, but I think that all queues are mapped onto 1 iothread?) I have also tried virtio-blk with iothread/dataplane; it helps a little, but not much either. So, my question is: is it possible to use multiple iothreads with 1 disk? (Something like 1 queue - 1 iothread?) Regards, Alexandre
Re: [Qemu-devel] qemu drive-mirror to rbd storage : no sparse rbd image
>>Ah, you're right. We need to add an options field, or use a new >>blockdev-mirror command. Ok, thanks. I can't help implement this, but I'll be glad to help with testing. - Mail original - De: "Paolo Bonzini" À: "Alexandre DERUMIER" Cc: "Ceph Devel" , "qemu-devel" Envoyé: Lundi 13 Octobre 2014 09:06:01 Objet: Re: qemu drive-mirror to rbd storage : no sparse rbd image Il 13/10/2014 08:06, Alexandre DERUMIER ha scritto: > > Also, about drive-mirror, I had tried with detect-zeroes with simple qcow2 > file, > and It don't seem to help. > I'm not sure that detect-zeroes is implement in drive-mirror. > > also, the target mirrored volume don't seem to have the detect-zeroes option > > > # info block > drive-virtio1: /source.qcow2 (qcow2) > Detect zeroes: on > > #du -sh source.qcow2 : 2M > > drive-mirror source.qcow2 -> target.qcow2 > > # info block > drive-virtio1: /target.qcow2 (qcow2) > > #du -sh target.qcow2 : 11G > Ah, you're right. We need to add an options field, or use a new blockdev-mirror command. Paolo
Re: [Qemu-devel] qemu drive-mirror to rbd storage : no sparse rbd image
>>Lack of bdrv_co_write_zeroes is why detect-zeroes does not work. >> >>Lack of bdrv_get_block_status is why sparse->sparse does not work >>without detect-zeroes. Ok, thanks Paolo ! Both are missing in rbd block driver. @ceph-devel . Could it be possible to implement them ? Also, about drive-mirror, I had tried with detect-zeroes with simple qcow2 file, and It don't seem to help. I'm not sure that detect-zeroes is implement in drive-mirror. also, the target mirrored volume don't seem to have the detect-zeroes option # info block drive-virtio1: /source.qcow2 (qcow2) Detect zeroes:on #du -sh source.qcow2 : 2M drive-mirror source.qcow2 -> target.qcow2 # info block drive-virtio1: /target.qcow2 (qcow2) #du -sh target.qcow2 : 11G - Mail original - De: "Paolo Bonzini" À: "Alexandre DERUMIER" , "qemu-devel" Cc: "Ceph Devel" Envoyé: Dimanche 12 Octobre 2014 15:02:12 Objet: Re: qemu drive-mirror to rbd storage : no sparse rbd image Il 08/10/2014 13:15, Alexandre DERUMIER ha scritto: > Hi, > > I'm currently planning to migrate our storage to ceph/rbd through qemu > drive-mirror > > and It seem that drive-mirror with rbd block driver, don't create a sparse > image. (all zeros are copied to the target rbd). > > Also note, that it's working fine with "qemu-img convert" , the rbd volume is > sparse after conversion. > > > Could it be related to the "bdrv_co_write_zeroes" missing features in > block/rbd.c ? > > (It's available in other block drivers (scsi,gluster,raw-aio) , and I don't > have this problem with theses block drivers). Lack of bdrv_co_write_zeroes is why detect-zeroes does not work. Lack of bdrv_get_block_status is why sparse->sparse does not work without detect-zeroes. Paolo
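Until those callbacks exist, allocation can at least be checked on both ends; a quick sketch (image and pool names are illustrative, and rbd diff simply lists allocated extents whose lengths the awk sums up):

qemu-img map source.qcow2
rbd diff rbd/target-image | awk '{ sum += $2 } END { print sum/1024/1024 " MB allocated" }'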
Re: [Qemu-devel] qemu drive-mirror to rbd storage : no sparse rbd image
>>These don't tell me much. Maybe it's better to show the actual commands and >>how >>you tell sparse from no sparse? Well, I create 2 empty source images files of 10G. (source.qcow2 and source.raw) then: du -sh source.qcow2 : 2M du -sh source.raw : 0M then I convert them with qemu-img convert : (source.qcow2 -> target.qcow2 , source.raw -> target.raw) du -sh target.qcow2 : 2M du -sh target.raw : 0M (So it's ok here) But If I convert them with drive-mirror: du -sh target.qcow2 : 11G du -sh target.raw : 11G I have also double check with #df, and I see space allocated on filesystem when using drive-mirror. I have the same behavior if target is a rbd storage. Also I have done test again ext3,ext4, and I see the same problem than with xfs. I'm pretty sure that drive-mirror copy zero block, qemu-img convert take around 2s to convert the empty file (because it's skipping zero block), and drive mirror take around 5min. - Mail original - De: "Fam Zheng" À: "Alexandre DERUMIER" Cc: "qemu-devel" , "Ceph Devel" Envoyé: Samedi 11 Octobre 2014 10:25:35 Objet: Re: [Qemu-devel] qemu drive-mirror to rbd storage : no sparse rbd image On Sat, 10/11 10:00, Alexandre DERUMIER wrote: > >>What is the source format? If the zero clusters are actually unallocated in > >>the > >>source image, drive-mirror will not write those clusters either. I.e. with > >>"drive-mirror sync=top", both source and target should have the same > >>"qemu-img > >>map" output. > > Thanks for your reply, > > I had tried drive mirror (sync=full) with > > raw file (sparse) -> rbd (no sparse) > rbd (sparse) -> rbd (no sparse) > raw file (sparse) -> qcow2 on ext4 (sparse) > rbd (sparse) -> raw on ext4 (sparse) > > Also I see that I have the same problem with target file format on xfs. > > raw file (sparse) -> qcow2 on xfs (no sparse) > rbd (sparse) -> raw on xfs (no sparse) > These don't tell me much. Maybe it's better to show the actual commands and how you tell sparse from no sparse? Does "qcow2 -> qcow2" work for you on xfs? > > I only have this problem with drive-mirror, qemu-img convert seem to simply > skip zero blocks. > > > Or maybe this is because I'm using sync=full ? > > What is the difference between full and top ? > > ""sync": what parts of the disk image should be copied to the destination; > possibilities include "full" for all the disk, "top" for only the sectors > allocated in the topmost image". > > (what is topmost image ?) For "sync=top", only the clusters allocated in the image itself is copied; for "full", all those clusters allocated in the image itself, and its backing image, and it's backing's backing image, ..., are copied. The image itself, having a backing image or not, is called the topmost image. Fam
Re: [Qemu-devel] qemu drive-mirror to rbd storage : no sparse rbd image
>>Just a wild guess - Alexandre, did you tried detect-zeroes blk option >>for mirroring targets? Hi, yes, I have also tried with detect-zeroes (on or discard) with virtio and virtio-scsi, doesn't help. (I'm not sure that is implemtend in drive-mirror). As workaround currently, after drive-mirror, I doing fstrim inside the guest (with virtio-scsi + discard), and like this I can free space on rbd storage. - Mail original - De: "Andrey Korolyov" À: "Fam Zheng" Cc: "Alexandre DERUMIER" , "qemu-devel" , "Ceph Devel" Envoyé: Samedi 11 Octobre 2014 10:30:40 Objet: Re: [Qemu-devel] qemu drive-mirror to rbd storage : no sparse rbd image On Sat, Oct 11, 2014 at 12:25 PM, Fam Zheng wrote: > On Sat, 10/11 10:00, Alexandre DERUMIER wrote: >> >>What is the source format? If the zero clusters are actually unallocated >> >>in the >> >>source image, drive-mirror will not write those clusters either. I.e. with >> >>"drive-mirror sync=top", both source and target should have the same >> >>"qemu-img >> >>map" output. >> >> Thanks for your reply, >> >> I had tried drive mirror (sync=full) with >> >> raw file (sparse) -> rbd (no sparse) >> rbd (sparse) -> rbd (no sparse) >> raw file (sparse) -> qcow2 on ext4 (sparse) >> rbd (sparse) -> raw on ext4 (sparse) >> >> Also I see that I have the same problem with target file format on xfs. >> >> raw file (sparse) -> qcow2 on xfs (no sparse) >> rbd (sparse) -> raw on xfs (no sparse) >> > > These don't tell me much. Maybe it's better to show the actual commands and > how > you tell sparse from no sparse? > > Does "qcow2 -> qcow2" work for you on xfs? > >> >> I only have this problem with drive-mirror, qemu-img convert seem to simply >> skip zero blocks. >> >> >> Or maybe this is because I'm using sync=full ? >> >> What is the difference between full and top ? >> >> ""sync": what parts of the disk image should be copied to the destination; >> possibilities include "full" for all the disk, "top" for only the sectors >> allocated in the topmost image". >> >> (what is topmost image ?) > > For "sync=top", only the clusters allocated in the image itself is copied; for > "full", all those clusters allocated in the image itself, and its backing > image, and it's backing's backing image, ..., are copied. > > The image itself, having a backing image or not, is called the topmost image. > > Fam > -- Just a wild guess - Alexandre, did you tried detect-zeroes blk option for mirroring targets?
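The workaround described above, as a sketch (ids and the mount point are illustrative; the discard requests need to be passed through, hence virtio-scsi with discard=on):

qemu ... -device virtio-scsi-pci,id=scsihw0 \
 -drive file=rbd:rbd/vm-disk-1,if=none,id=drive-scsi0,cache=writeback,discard=on \
 -device scsi-hd,drive=drive-scsi0,bus=scsihw0.0

# inside the guest, once the drive-mirror has completed:
fstrim -v /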
Re: [Qemu-devel] qemu drive-mirror to rbd storage : no sparse rbd image
>>What is the source format? If the zero clusters are actually unallocated in >>the >>source image, drive-mirror will not write those clusters either. I.e. with >>"drive-mirror sync=top", both source and target should have the same "qemu-img >>map" output. Thanks for your reply, I had tried drive mirror (sync=full) with raw file (sparse) -> rbd (no sparse) rbd (sparse) -> rbd (no sparse) raw file (sparse) -> qcow2 on ext4 (sparse) rbd (sparse) -> raw on ext4 (sparse) Also I see that I have the same problem with target file format on xfs. raw file (sparse) -> qcow2 on xfs (no sparse) rbd (sparse) -> raw on xfs (no sparse) I only have this problem with drive-mirror, qemu-img convert seem to simply skip zero blocks. Or maybe this is because I'm using sync=full ? What is the difference between full and top ? ""sync": what parts of the disk image should be copied to the destination; possibilities include "full" for all the disk, "top" for only the sectors allocated in the topmost image". (what is topmost image ?) - Mail original - De: "Fam Zheng" À: "Alexandre DERUMIER" Cc: "qemu-devel" , "Ceph Devel" Envoyé: Samedi 11 Octobre 2014 09:01:18 Objet: Re: [Qemu-devel] qemu drive-mirror to rbd storage : no sparse rbd image On Wed, 10/08 13:15, Alexandre DERUMIER wrote: > Hi, > > I'm currently planning to migrate our storage to ceph/rbd through qemu > drive-mirror > > and It seem that drive-mirror with rbd block driver, don't create a sparse > image. (all zeros are copied to the target rbd). > > Also note, that it's working fine with "qemu-img convert" , the rbd volume is > sparse after conversion. What is the source format? If the zero clusters are actually unallocated in the source image, drive-mirror will not write those clusters either. I.e. with "drive-mirror sync=top", both source and target should have the same "qemu-img map" output. Fam > > > Could it be related to the "bdrv_co_write_zeroes" missing features in > block/rbd.c ? > > (It's available in other block drivers (scsi,gluster,raw-aio) , and I don't > have this problem with theses block drivers). > > > > Regards, > > Alexandre Derumier > > >
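Fam's top/full distinction is easiest to see with a concrete backing chain (file names are illustrative):

qemu-img create -f qcow2 base.qcow2 10G
qemu-img create -f qcow2 -b base.qcow2 top.qcow2

Here top.qcow2 is the topmost image. drive-mirror with sync=top copies only the clusters allocated in top.qcow2 itself and leaves the destination backed by base.qcow2, while sync=full copies everything visible through the whole chain, base.qcow2 included.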
[Qemu-devel] drive-mirror remove detect-zeroes drive property
Hi, It seems that the drive-mirror block job removes the detect-zeroes drive property on the target drive: qemu -device virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5 -drive file=/source.raw,if=none,id=drive-scsi2,cache=writeback,discard=on,aio=native,detect-zeroes=unmap # info block drive-scsi2: /source.raw (raw) Detect zeroes: unmap Then do a drive mirror to a new target.raw file: # info block drive-scsi2: /target.raw (raw) Is it a bug?
[Qemu-devel] qemu drive-mirror to rbd storage : no sparse rbd image
Hi, I'm currently planning to migrate our storage to ceph/rbd through qemu drive-mirror, and it seems that drive-mirror with the rbd block driver doesn't create a sparse image (all zeros are copied to the target rbd). Also note that it works fine with "qemu-img convert": the rbd volume is sparse after conversion. Could it be related to the missing "bdrv_co_write_zeroes" feature in block/rbd.c? (It's available in other block drivers (scsi, gluster, raw-aio), and I don't have this problem with those block drivers.) Regards, Alexandre Derumier
Re: [Qemu-devel] is x-data-plane considered "stable" ?
Ok, Great :) Thanks ! - Mail original - De: "Paolo Bonzini" À: "Alexandre DERUMIER" Cc: qemu-devel@nongnu.org, "Scott Sullivan" Envoyé: Vendredi 3 Octobre 2014 17:33:00 Objet: Re: is x-data-plane considered "stable" ? Il 03/10/2014 16:26, Alexandre DERUMIER ha scritto: > Hi Paolo, > > do you you think it'll be possible to use block jobs with dataplane ? > > Or is it technically impossible ? I think Stefan has posted patches for that or is going to do that very soon. The goal is to let dataplane do everything that is possible without it, at the same speed of QEMU 2.0 dataplane (right now it's a bit slower, but can already do much more of course). Same for virtio-scsi. Paolo
Re: [Qemu-devel] is x-data-plane considered "stable" ?
Hi Paolo, do you think it'll be possible to use block jobs with dataplane ? Or is it technically impossible ? - Mail original - De: "Paolo Bonzini" À: "Alexandre DERUMIER" , "Scott Sullivan" Cc: qemu-devel@nongnu.org Envoyé: Vendredi 3 Octobre 2014 15:40:59 Objet: Re: is x-data-plane considered "stable" ? Il 03/10/2014 12:11, Alexandre DERUMIER ha scritto: > I don't now about the stability, but here what is working and I have tested: > > -live migration > -resizing > -io throttling > -hotplugging > > I think block jobs (mirror,backup,...) don't work yet. > > > it's working with virtio-blk and virtio-scsi support is coming for qemu 2.2 virtio-scsi support will _not_ be stable in 2.2. It will only be there for developers, basically. Paolo
Re: [Qemu-devel] is x-data-plane considered "stable" ?
Hi, the x-data-plane syntax is deprecated (it should be removed in qemu 2.2); it now uses iothreads: http://comments.gmane.org/gmane.comp.emulators.qemu/279118 qemu -object iothread,id=iothread0 \ -drive if=none,id=drive0,file=test.qcow2,format=qcow2 \ -device virtio-blk-pci,iothread=iothread0,drive=drive0 I don't know about the stability, but here is what is working and what I have tested: -live migration -resizing -io throttling -hotplugging I think block jobs (mirror, backup, ...) don't work yet. It works with virtio-blk, and virtio-scsi support is coming for qemu 2.2. Regards, alexandre - Mail original - De: "Scott Sullivan" À: qemu-devel@nongnu.org Envoyé: Jeudi 2 Octobre 2014 17:20:55 Objet: [Qemu-devel] is x-data-plane considered "stable" ? Can anyone tell me if in any QEMU release x-data-plane is considered "stable"? If its unclear, I am referring to the feature introduced in QEMU 1.4 for high performance disk I/O called virtio-blk data plane. http://blog.vmsplice.net/2013/03/new-in-qemu-14-high-performance-virtio.html I've tried searching docs, but can't find any mention if its still considered experimental or not in the latest QEMU releases.
[Qemu-devel] ballooning not working on hotplugged pc-dimm
Hello, I was playing with pc-dimm hotplug, and I noticed that ballooning is not working on the memory space of pc-dimm devices. example: qemu -m size=1024,slots=255,maxmem=15000M #free -m : 1024M -> qmp balloon 512M #free -m : 512M -> hotplug pc-dimm 1G: #free -m : 1512M (The behavior is the same if qemu is started with pc-dimm devices.) qemu 2.1, guest kernel 3.12. Does it need a guest balloon module update? Regards, Alexandre Derumier
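The report above, spelled out as monitor commands (the backend/dimm ids are illustrative; the machine is started with -m size=1024,slots=255,maxmem=15000M as described):

(qemu) balloon 512
(qemu) info balloon
(qemu) object_add memory-backend-ram,id=mem1,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1
(qemu) info balloon

and the guest then sees the full 1512M again, i.e. the balloon target does not cover the hotplugged pc-dimm range.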
Re: [Qemu-devel] q35 : virtio-serial on pci bridge : bus not found
>>No, you should lay 'kvm: -device virtio-serial,id=spice,bus=pci.0' before >>"-readconfig /usr/share/qemu-server/pve-q35.cfg",which can assure >>the bus pci.0 has been created. Oh, you are right, I didn't notice that pci bridge was defined after this device. Sorry to disturb the mailing list about this. Thanks ! Alexandre ----- Mail original - De: "Gonglei" À: "Alexandre DERUMIER" Cc: "qemu-devel" Envoyé: Mardi 12 Août 2014 14:09:26 Objet: Re: [Qemu-devel] q35 : virtio-serial on pci bridge : bus not found On 2014/8/12 18:51, Alexandre DERUMIER wrote: > Hi, I can't use virtio-serial, with q35 machine, on a pci bridge (other > devices works fine). > > > Is it a known bug ? > No, you should lay 'kvm: -device virtio-serial,id=spice,bus=pci.0' before "-readconfig /usr/share/qemu-server/pve-q35.cfg",which can assure the bus pci.0 has been created. Best regards, -Gonglei > > error message: > --- > kvm: -device virtio-serial,id=spice,bus=pci.0,addr=0x9: Bus 'pci.0' not found > > > architecture is: > > pcie.0 > --->pcidmi (i82801b11-bridge) > --->pci.0 (pci-bridge) > > > > /usr/bin/kvm -id 600 -chardev > socket,id=qmp,path=/var/run/qemu-server/600.qmp,server,nowait > -mon chardev=qmp,mode=control -vnc > unix:/var/run/qemu-server/600.vnc,x509,password > -pidfile /var/run/qemu-server/600.pid -daemonize -name tankrechner -smp > sockets=1,cores=1 > -nodefaults -boot menu=on -vga qxl -cpu kvm64,+lahf_lm,+x2apic,+sep > -spice > tls-port=61002,addr=127.0.0.1,tls-ciphers=DES-CBC3-SHA,seamless-migration=on > -device virtio-serial,id=spice,bus=pci.0,addr=0x9 > -chardev spicevmc,id=vdagent,name=vdagent > -device virtserialport,chardev=vdagent,name=com.redhat.spice.0 > -m 1024 > -readconfig /usr/share/qemu-server/pve-q35.cfg > -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 > -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 > -drive if=none,id=drive-ide2,media=cdrom,aio=native > -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200 > -drive > file=/mnt/vmfs-hdd-r5_3x698/images/600/vm-600-disk-1.raw,if=none,id=drive-virtio0,format=raw,aio=native,cache=none > -device > virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100 > -netdev > type=tap,id=net0,ifname=tap600i0,script=/var/lib/qemu-server/pve-bridge,vhost=on > -device > virtio-net-pci,mac=EA:07:3A:31:12:7F,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 > -rtc driftfix=slew,base=localtime > -machine type=q35 > > > with pve-q35.cfg > > [device "ehci"] > driver = "ich9-usb-ehci1" > multifunction = "on" > bus = "pcie.0" > addr = "1d.7" > > [device "uhci-1"] > driver = "ich9-usb-uhci1" > multifunction = "on" > bus = "pcie.0" > addr = "1d.0" > masterbus = "ehci.0" > firstport = "0" > > [device "uhci-2"] > driver = "ich9-usb-uhci2" > multifunction = "on" > bus = "pcie.0" > addr = "1d.1" > masterbus = "ehci.0" > firstport = "2" > > [device "uhci-3"] > driver = "ich9-usb-uhci3" > multifunction = "on" > bus = "pcie.0" > addr = "1d.2" > masterbus = "ehci.0" > firstport = "4" > > [device "ehci-2"] > driver = "ich9-usb-ehci2" > multifunction = "on" > bus = "pcie.0" > addr = "1a.7" > > [device "uhci-4"] > driver = "ich9-usb-uhci4" > multifunction = "on" > bus = "pcie.0" > addr = "1a.0" > masterbus = "ehci-2.0" > firstport = "0" > > [device "uhci-5"] > driver = "ich9-usb-uhci5" > multifunction = "on" > bus = "pcie.0" > addr = "1a.1" > masterbus = "ehci-2.0" > firstport = "2" > > [device "uhci-6"] > driver = "ich9-usb-uhci6" > multifunction = "on" > bus = "pcie.0" > addr = "1a.2" > masterbus = "ehci-2.0" > 
firstport = "4" > > [device "audio0"] > driver = "ich9-intel-hda" > bus = "pcie.0" > addr = "1b.0" > > > [device "ich9-pcie-port-1"] > driver = "ioh3420" > multifunction = "on" > bus = "pcie.0" > addr = "1c.0" > port = "1" > chassis = "1" > > [device "ich9-pcie-port-2"] > driver = "ioh3420" > multifunction = "on" > bus = "pcie.0" > addr = "1c.1" > port = "2" > chassis = "2" > > [device "ich9-pcie-port-3"] > driver = "ioh3420" > multifunction = "on" > bus = "pcie.0" > addr = "1c.2" > port = "3" > chassis = "3" > > [device "ich9-pcie-port-4"] > driver = "ioh3420" > multifunction = "on" > bus = "pcie.0" > addr = "1c.3" > port = "4" > chassis = "4" > [device "pcidmi"} > driver = "i82801b11-bridge" > bus = "pcie.0" > addr = "1e.0" > > [device "pci.0"] > driver = "pci-bridge" > bus = "pcidmi" > addr = "1.0" > chassis_nr = "1" > > [device "pci.1"] > driver = "pci-bridge" > bus = "pcidmi" > addr = "2.0" > chassis_nr = "2" > > [device "pci.2"] > driver = "pci-bridge" > bus = "pcidmi" > addr = "3.0" > chassis_nr = "3" > >