Re: [Qemu-devel] [PATCH 1/2] docs: update ivshmem device spec
Hi, Thank you for everyone's interest and work on this. Sorry I haven't been...better. I will offer my knowledge where it helps. And the server is GPL in case that was seen as an issue. On Mon, Jun 23, 2014 at 8:18 AM, Claudio Fontana claudio.font...@huawei.com wrote: Hi, we were reading through this quickly today, and these are some of the questions that we think can came up when reading this. Answers to some of these questions we think we have figured out, but I think it's important to put this information into the documentation. I will quote the file in its entirety, and insert some questions inline. Device Specification for Inter-VM shared memory device -- The Inter-VM shared memory device is designed to share a region of memory to userspace in multiple virtual guests. What does to userspace mean in this context? The userspace of the host, or the userspace in the guest? The memory is intended to be shared between userspaces in the guests. However, since the memory is POSIX shm region, it is visible on the host too. What about The Inter-VM shared memory device is designed to share a memory region (created on the host via the POSIX shared memory API) between multiple QEMU processes running different guests. In order for all guests to be able to pick up the shared memory area, it is modeled by QEMU as a PCI device exposing said memory to the guest as a PCI BAR. Whether in those guests the memory region is used in kernel space or userspace, or there is even any meaning for those terms is guest-dependent I would think (I think of an OSv here, where the application and kernel execute at the same privilege level and in the same address space). I'm not exactly clear what you're asking here. The region is visible to both the guest kernel and userspace (once mounted). The memory region does not belong to any guest, but is a POSIX memory object on the host. Ok that's clear. One thing I would ask is, but I don't know if it makes sense to mention here, is who creates this memory object on the host? I understand in some cases it's the contributed server (what you provide in contrib/), in some cases it's the user of this device who has to write some server code for that, but is it true that also the qemu process itself can create this memory object on its own, without any external process needed? Is this the use case for host-guest only? (Answering based on my original server code) When using the server, the server creates it. Without the server, each qemu process will check if it exists and if it does, it will use it. If it does not exist, the qemu process will create it. Optionally, the device may support sending interrupts to other guests sharing the same memory region. This opens a lot of questions here which are partly answered later (If I understand correctly, not only interrupts are involved, but a complete communication protocol involving registers in BAR0), but what about staying a bit general here, like Optionally, the device may also provide a communication mechanism between guests sharing the same memory region. More details about that in the section 'OPTIONAL ivshmem guest to guest communication protocol'. Thinking out loud, I wonder if this communication mechanism should be part of this device in QEMU, or it should be provided at another layer.. The Inter-VM PCI device --- *BARs* The device supports three BARs. BAR0 is a 1 Kbyte MMIO region to support registers. BAR1 is used for MSI-X when it is enabled in the device. BAR2 is used to map the shared memory object from the host. The size of BAR2 is specified when the guest is started and must be a power of 2 in size. Are BAR0 and BAR1 optional? That's what I would think by reading the whole, but I'm still not sure. Am I forced to map BAR0 and BAR1 anyway? I don't think so, but.. They do not need to be mapped. You do not need to map them if you don't want to use them. If so, can we separate the explanation into the base shared memory feature, and a separate section which explains the OPTIONAL communication mechanism, and the OPTIONAL MSI-X BAR? For example, say that I am a potential ivshmem user (which I am), and I am interested in the shared memory but I want to use my own communication mechanism and protocol between guests, can we make it so that I don't have to wonder whether some of the info I read applies or not? The solution to that I think is to put all the OPTIONAL parts into separate sections. *Registers* Ok, so this should I think go into one such OPTIONAL sections. The device currently supports 4 registers of 32-bits each. Registers are used for synchronization between guests sharing the same memory object when interrupts are supported (this requires using the shared memory server). So use of BAR0 goes together with
Re: [Qemu-devel] [PATCH 1/2] docs: update ivshmem device spec
Hi Vince, Yes, I did see the patches for the new server. I will review within the week. Cheers, Cam On Thu, Jun 26, 2014 at 9:37 AM, Vincent JARDIN vincent.jar...@6wind.com wrote: Hi Cam, FYI, David did implement a new server. http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg04978.html which is easier to maintain. Please, could you review his patch? He'll be back from holiday within 1 week. Best regards, Vincent PS: thanks for your comments
Re: [Qemu-devel] Why I advise against using ivshmem
Hello, Just to add my two bits. I will fully support getting all the necessary parts of ivshmem into tree where appropriate, both qemu and a driver in Linux. I understand those concerns. I do not have the time to fully maintain ivshmem at the level needed, but I will help as much as I can. Sorry for the delay in contributing to this conversation. Cheers, Cam On Sat, Jun 21, 2014 at 3:34 AM, Stefan Hajnoczi stefa...@gmail.com wrote: On Wed, Jun 18, 2014 at 10:57 PM, David Marchand david.march...@6wind.com wrote: On 06/18/2014 12:48 PM, Stefan Hajnoczi wrote: One more thing to add to the list: static void ivshmem_read(void *opaque, const uint8_t * buf, int flags) The flags argument should be size. Size should be checked before accessing buf. You are welcome to send a fix and I will review it. I don't plan to send ivshmem patches in the near future because I don't use or support it. I thought you were interested in bringing ivshmem up to a level where distros feel comfortable enabling and supporting it. Getting there will require effort from you to audit, clean up, and achieve test coverage. That's what a maintainer needs to do in a case like this. Stefan ___ Virtualization mailing list virtualizat...@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [Qemu-devel] [Bug 1314857] Re: seg fault in ivshmem when using ioeventfd=on
Hello, The patch for this later bug has been proposed. I'm not sure why it's not merged. http://patchwork.ozlabs.org/patch/316785/ Cheers, Cam On Thu, May 1, 2014 at 10:53 AM, Gene Snider g...@cvtt.net wrote: When I tried the same thing with git master (latest) I get a different error: qemu_chr_fe_claim_no_fail: error chardev (null) already used ** Also affects: qemu-kvm (Ubuntu) Importance: Undecided Status: New -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1314857 Title: seg fault in ivshmem when using ioeventfd=on Status in QEMU: New Status in “qemu-kvm” package in Ubuntu: New Bug description: When launching qemu with the ivshmem device and the nahanni guest server there is segmentation fault in the setup_ioeventfds function of ivshmem.c. If the ioeventfd=on flag is set the pci_ivshmem_init will call setup_ioeventfds at line 668. This function relies on the 'peers' member of the server info which is not allocated until line 669. To reproduce you will need the nahanni guest server code. The driver code is not needed. You will also need a qcow2 or other bootable image to use for launching qemu. The error occurs before the actual image launch. Start the nahanni ivshmem server with a small global memory space ( although the bug is not allocation specific ) ivshmem -m 1 -n 2 -p /tmp/ivshmem_socket Next launch qemu with initialization for the ivshmem device. qemu-system-x86_64 -hda test_iso.qcow2 -localtime -boot c -chardev socket,path=/tmp/ivshmem_socket,id=ivshmem_socket -device ivshmem,chardev=ivshmem_socket,size=1,ioeventfd=on If gdb is used the following error is recorded: Program received signal SIGSEGV, Segmentation fault. 0x5579dd52 in setup_ioeventfds (s=0x56619580) at /home/genes/work/ubuntu/qemu-kvm-1.0+noroms/hw/ivshmem.c:367 367 for (j = 0; j s-peers[i].nb_eventfds; j++) { (gdb) print s-peers $2 = (Peer *) 0x0 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1314857/+subscriptions
Re: [Qemu-devel] [PATCH RFC] char: fix avail_connections init in qemu_chr_open_eventfd()
Hi David, I'm not sure which is the correct approach. You could either do what you did or you could simply remove the qemu_chr_fe_claim_no_fail() from ivshmem.c. I'm not sure how your change impacts other devices. Sincerely, Cam On Tue, Feb 4, 2014 at 2:17 PM, David Marchand david.march...@6wind.comwrote: Hello, First of all, this is a pure RFC patch, I did not take too much time to dig into qemu source code to find the right solution, but since qemu_chr_open_eventfd() is only used by the code I was looking at, here is a patch. When trying to use a ivshmem server with qemu, ivshmem init code tries to create a CharDriverState object for each eventfd retrieved from the server. To create this object, a call to qemu_chr_open_eventfd() is done. Right after this, before adding a frontend, qemu_chr_fe_claim_no_fail() is called. qemu_chr_open_eventfd() does not set avail_connections to 1, so no frontend can be associated because qemu_chr_fe_claim_no_fail() makes qemu stop right away. I suppose this problem comes from 456d60692310e7ac25cf822cc1e98192ad636ece qemu-char: Call fe_claim / fe_release when not using qdev chr properties. Fix this, by setting avail_connections to 1 in qemu_chr_open_eventfd(). Signed-off-by: David Marchand david.march...@6wind.com --- qemu-char.c |7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/qemu-char.c b/qemu-char.c index 30c5a6a..c0adb04 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -2492,7 +2492,12 @@ static gboolean tcp_chr_read(GIOChannel *chan, GIOCondition cond, void *opaque) #ifndef _WIN32 CharDriverState *qemu_chr_open_eventfd(int eventfd) { -return qemu_chr_open_fd(eventfd, eventfd); +CharDriverState *chr = qemu_chr_open_fd(eventfd, eventfd); + +if (chr) +chr-avail_connections = 1; + +return chr; } #endif -- David Marchand
Re: [Qemu-devel] help me for nahanni device
Hello, Can you please try with the most recent version? I would suggest building from the source code which can be found here: http://qemu-project.org/Main_Page Sincerely, Cam On Tue, Jul 2, 2013 at 7:50 PM, DAI Weibin weibin@alcatel-sbell.com.cnwrote: Hello Cam, I am honored that receive your reply. ** ** 1 the QEMU version is showed as following, the version is 1.0.50 *[root@node1 sh-vm]# /opt/qemu-kvm-rt/x86_64-softmmu/qemu-system-x86_64 -version* *QEMU emulator version 1.0.50 (qemu-kvm-devel), Copyright (c) 2003-2008 Fabrice Bellard* *[root@node1 sh-vm]#* * * 2 ”QEMU/KVM doesn't work” My mean is that qemu has been start, but I can’t log on VM or ping VM IP. When only attach a nahanni device, all are ok. ** ** Thank you. ** ** Dai weibin ** ** *From:* c...@ualberta.ca [mailto:c...@ualberta.ca] *On Behalf Of *Cam Macdonell *Sent:* 2013年7月3日 0:20 *To:* DAI Weibin *Cc:* qemu-devel@nongnu.org *Subject:* Re: [Qemu-devel] help me for nahanni device ** ** Hello Dai, ** ** Which qemu version you are using? When you say QEMU/KVM doesn't work, do you mean it does not start? ** ** Cam ** ** On Mon, Jul 1, 2013 at 12:39 AM, DAI Weibin weibin@alcatel-sbell.com.cn wrote: Hi all, When use nahanni device as following, The QEMU/KVM doesn’t work. I attach two ivshmem devices to VM, QEMU/KVM only supports a ivshmem device, why? Expect your reply! *-device virtio-net-pci,vlan=0,id=net0,mac=DE:AD:BE:09:03:01,bus=pci.0,addr=0x5 \** *** *-net tap,script=/etc/qemu-ifup \* *-device ivshmem,size=256m,shm=l1shm \* *-device ivshmem,size=256m,shm=l2shm \* *-usbdevice tablet \* *-vnc :2 \* *-daemonize* Best regards 戴卫彬 ** **
Re: [Qemu-devel] help me for nahanni device
Hello Dai, Which qemu version you are using? When you say QEMU/KVM doesn't work, do you mean it does not start? Cam On Mon, Jul 1, 2013 at 12:39 AM, DAI Weibin weibin@alcatel-sbell.com.cn wrote: Hi all, When use nahanni device as following, The QEMU/KVM doesn’t work. I attach two ivshmem devices to VM, QEMU/KVM only supports a ivshmem device, why? Expect your reply! *-device virtio-net-pci,vlan=0,id=net0,mac=DE:AD:BE:09:03:01,bus=pci.0,addr=0x5 \* *-net tap,script=/etc/qemu-ifup \* *-device ivshmem,size=256m,shm=l1shm \* *-device ivshmem,size=256m,shm=l2shm \* *-usbdevice tablet \* *-vnc :2 \* *-daemonize* Best regards 戴卫彬
Re: [Qemu-devel] [PATCH v2 2/2] ivshmem: use irqfd to interrupt among VMs
On Wed, Dec 5, 2012 at 1:50 AM, Jan Kiszka jan.kis...@siemens.com wrote: On 2012-12-05 06:34, Cam Macdonell wrote: static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address, uint32_t val, int len) { +bool is_enabled, was_enabled = msi_enabled(pci_dev); + pci_default_write_config(pci_dev, address, val, len); +is_enabled = msi_enabled(pci_dev); Problem 1) in my tests is_enabled is always 0, so I don't think the irqfds are getting setup You likely want to call msix_enabled here. Yup, that gets it working. Liu Ping, can you update the patch to use msix_enabled()? Also, it seems that with irqfd enabled the user-level handlers are not triggered, but it may still be a better idea to not add the user-level handlers to the char devices at all if irqfd is enabled. Cam Jan -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH v2 2/2] ivshmem: use irqfd to interrupt among VMs
On Tue, Dec 4, 2012 at 4:10 AM, Andrew Jones drjo...@redhat.com wrote: - Original Message - On Thu, Nov 29, 2012 at 1:34 AM, liu ping fan qemul...@gmail.com wrote: On Thu, Nov 29, 2012 at 12:42 PM, Cam Macdonell c...@cs.ualberta.ca wrote: On Tue, Nov 27, 2012 at 7:53 PM, liu ping fan qemul...@gmail.com wrote: On Wed, Nov 28, 2012 at 5:48 AM, Cam Macdonell c...@cs.ualberta.ca wrote: On Sat, Nov 24, 2012 at 8:51 PM, Liu Ping Fan qemul...@gmail.com wrote: From: Liu Ping Fan pingf...@linux.vnet.ibm.com Using irqfd, so we can avoid switch between kernel and user when VMs interrupts each other. Nice work. Due to a hardware failure, there will be a small delay in me being able to test this. I'll follow up as soon as I can. BTW where can I find the latest guest code for test? I got the guest code from git://gitorious.org/nahanni/guest-code.git. But it seems outdated, after fixing the unlocked_ioctl, and vm-id bits position (the codes conflict with ivshmem spec), it works (I have tested for V1). Hello, Which device driver are you using? guest-code/kernel_module/standard/kvm_ivshmem.c The uio driver is the recommended one, however if you want to use the kvm_ivshmem one and have it working, then feel free to continue. If the uio driver is the recommended one, then can you please post it to lkml? It should be integrated into drivers/virt with an appropriate Kconfig update. Sure. Should it go under drivers/virt or drivers/uio? It seems the uio drivers all get grouped together. Thanks, Cam Thanks, Drew I had deleted it form the repo, but some users had based solutions off it, so I added it back. btw, my hardware issue has been resolved, so I'll get to testing your patch soon. Sincerely, Cam Cam Regards, Pingfan Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com --- hw/ivshmem.c | 54 +- 1 files changed, 53 insertions(+), 1 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 7c8630c..5709e89 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -19,6 +19,7 @@ #include hw.h #include pc.h #include pci.h +#include msi.h #include msix.h #include kvm.h #include migration.h @@ -83,6 +84,7 @@ typedef struct IVShmemState { uint32_t vectors; uint32_t features; EventfdEntry *eventfd_table; +int *vector_virqs; Error *migration_blocker; @@ -625,16 +627,62 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id) return 0; } +static int ivshmem_vector_use(PCIDevice *dev, unsigned vector, + MSIMessage msg) +{ +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); +int virq; +EventNotifier *n = s-peers[s-vm_id].eventfds[vector]; + +virq = kvm_irqchip_add_msi_route(kvm_state, msg); +if (virq = 0 kvm_irqchip_add_irqfd_notifier(kvm_state, n, virq) = 0) { +s-vector_virqs[vector] = virq; +qemu_chr_add_handlers(s-eventfd_chr[vector], NULL, NULL, NULL, NULL); +} else if (virq = 0) { +kvm_irqchip_release_virq(kvm_state, virq); +error_report(ivshmem, can not setup irqfd\n); +return -1; +} else { +error_report(ivshmem, no enough msi route to setup irqfd\n); +return -1; +} + +return 0; +} + +static void ivshmem_vector_release(PCIDevice *dev, unsigned vector) +{ +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); +EventNotifier *n = s-peers[s-vm_id].eventfds[vector]; +int virq = s-vector_virqs[vector]; + +if (s-vector_virqs[vector] = 0) { +kvm_irqchip_remove_irqfd_notifier(kvm_state, n, virq); +kvm_irqchip_release_virq(kvm_state, virq); +s-vector_virqs[vector] = -1; +} +} + static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address, uint32_t val, int len) { +bool is_enabled, was_enabled = msi_enabled(pci_dev); + pci_default_write_config(pci_dev, address, val, len); +is_enabled = msi_enabled(pci_dev); +if (!was_enabled is_enabled) { +msix_set_vector_notifiers(pci_dev, ivshmem_vector_use, +ivshmem_vector_release); +} else if (was_enabled !is_enabled) { +msix_unset_vector_notifiers(pci_dev); +} } static int pci_ivshmem_init(PCIDevice *dev) { IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); uint8_t *pci_conf; +int i; if (s-sizearg == NULL) s-ivshmem_size = 4 20; /* 4 MB default */ @@ -758,7 +806,10 @@ static int pci_ivshmem_init(PCIDevice *dev) } s-dev.config_write = ivshmem_write_config; - +s-vector_virqs = g_new0(int, s-vectors); +for (i = 0; i s-vectors; i++) { +s-vector_virqs[i] = -1
Re: [Qemu-devel] [PATCH v2 2/2] ivshmem: use irqfd to interrupt among VMs
On Sat, Nov 24, 2012 at 8:51 PM, Liu Ping Fan qemul...@gmail.com wrote: From: Liu Ping Fan pingf...@linux.vnet.ibm.com Using irqfd, so we can avoid switch between kernel and user when VMs interrupts each other. Hi Liu Ping, With this patch applied I was still seeing transitions to user-level on the receipt of an msi interrupt. uncomment DEBUG_IVSHMEM in hw/ivshmem.c (and fix one compile error in the debug statement :) ) IVSHMEM: msix initialized (1 vectors) ... IVSHMEM: interrupt on vector 0x7f2971b1d8d0 0 IVSHMEM: interrupt on vector 0x7f2971b1d8d0 0 if we're using irqfd, this should be avoided. Here's my command-line: -device ivshmem,chardev=nahanni,msi=on,ioeventfd=on,size=2048,use64=1,role=peer There are two issues as I see it: 1) irqfd is not being enabled in my tests 2) the defaults handlers are still being added One difference is that I'm using the UIO driver, which enables PCI using pci_enable_msix as follows pci_enable_msix(ivs_info-dev, ivs_info-msix_entries, ivs_info-nvectors); and succeeds [2.651253] uio_ivshmem :00:04.0: irq 43 for MSI/MSI-X [2.651326] MSI-X enabled (continued below) Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com --- hw/ivshmem.c | 54 +- 1 files changed, 53 insertions(+), 1 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 7c8630c..5709e89 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -19,6 +19,7 @@ #include hw.h #include pc.h #include pci.h +#include msi.h #include msix.h #include kvm.h #include migration.h @@ -83,6 +84,7 @@ typedef struct IVShmemState { uint32_t vectors; uint32_t features; EventfdEntry *eventfd_table; +int *vector_virqs; Error *migration_blocker; @@ -625,16 +627,62 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id) return 0; } +static int ivshmem_vector_use(PCIDevice *dev, unsigned vector, + MSIMessage msg) +{ +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); +int virq; +EventNotifier *n = s-peers[s-vm_id].eventfds[vector]; + +virq = kvm_irqchip_add_msi_route(kvm_state, msg); +if (virq = 0 kvm_irqchip_add_irqfd_notifier(kvm_state, n, virq) = 0) { +s-vector_virqs[vector] = virq; +qemu_chr_add_handlers(s-eventfd_chr[vector], NULL, NULL, NULL, NULL); +} else if (virq = 0) { +kvm_irqchip_release_virq(kvm_state, virq); +error_report(ivshmem, can not setup irqfd\n); +return -1; +} else { +error_report(ivshmem, no enough msi route to setup irqfd\n); +return -1; +} + +return 0; +} + +static void ivshmem_vector_release(PCIDevice *dev, unsigned vector) +{ +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); +EventNotifier *n = s-peers[s-vm_id].eventfds[vector]; +int virq = s-vector_virqs[vector]; + +if (s-vector_virqs[vector] = 0) { +kvm_irqchip_remove_irqfd_notifier(kvm_state, n, virq); +kvm_irqchip_release_virq(kvm_state, virq); +s-vector_virqs[vector] = -1; +} +} + static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address, uint32_t val, int len) { +bool is_enabled, was_enabled = msi_enabled(pci_dev); + pci_default_write_config(pci_dev, address, val, len); +is_enabled = msi_enabled(pci_dev); Problem 1) in my tests is_enabled is always 0, so I don't think the irqfds are getting setup +if (!was_enabled is_enabled) { +msix_set_vector_notifiers(pci_dev, ivshmem_vector_use, +ivshmem_vector_release); +} else if (was_enabled !is_enabled) { +msix_unset_vector_notifiers(pci_dev); +} } static int pci_ivshmem_init(PCIDevice *dev) { IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); uint8_t *pci_conf; +int i; if (s-sizearg == NULL) s-ivshmem_size = 4 20; /* 4 MB default */ @@ -758,7 +806,10 @@ static int pci_ivshmem_init(PCIDevice *dev) } s-dev.config_write = ivshmem_write_config; - +s-vector_virqs = g_new0(int, s-vectors); +for (i = 0; i s-vectors; i++) { +s-vector_virqs[i] = -1; +} return 0; } @@ -770,6 +821,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev) migrate_del_blocker(s-migration_blocker); error_free(s-migration_blocker); } +g_free(s-vector_virqs); memory_region_destroy(s-ivshmem_mmio); memory_region_del_subregion(s-bar, s-ivshmem); -- 1.7.4.4 Problem 2) We'll also have to not add the handlers as below if irqfd is present otherwise we'll get double interrupts, so we'll have to add a check here too. /* if MSI is supported we need multiple interrupts */ if (!ivshmem_has_feature(s, IVSHMEM_MSI)) { s-eventfd_table[vector].pdev = s-dev;
Re: [Qemu-devel] [PATCH v2 1/2] ivshmem: remove msix_write_config
On Sat, Nov 24, 2012 at 8:51 PM, Liu Ping Fan qemul...@gmail.com wrote: From: Liu Ping Fan pingf...@linux.vnet.ibm.com This logic has been integrated into pci core, so remove it. Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com Signed-off-by: Cam Macdonell c...@cs.ualberta.ca --- hw/ivshmem.c |1 - 1 files changed, 0 insertions(+), 1 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index f6dbb21..7c8630c 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -629,7 +629,6 @@ static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address, uint32_t val, int len) { pci_default_write_config(pci_dev, address, val, len); -msix_write_config(pci_dev, address, val, len); } static int pci_ivshmem_init(PCIDevice *dev) -- 1.7.4.4
Re: [Qemu-devel] [PATCH v2 2/2] ivshmem: use irqfd to interrupt among VMs
On Thu, Nov 29, 2012 at 1:34 AM, liu ping fan qemul...@gmail.com wrote: On Thu, Nov 29, 2012 at 12:42 PM, Cam Macdonell c...@cs.ualberta.ca wrote: On Tue, Nov 27, 2012 at 7:53 PM, liu ping fan qemul...@gmail.com wrote: On Wed, Nov 28, 2012 at 5:48 AM, Cam Macdonell c...@cs.ualberta.ca wrote: On Sat, Nov 24, 2012 at 8:51 PM, Liu Ping Fan qemul...@gmail.com wrote: From: Liu Ping Fan pingf...@linux.vnet.ibm.com Using irqfd, so we can avoid switch between kernel and user when VMs interrupts each other. Nice work. Due to a hardware failure, there will be a small delay in me being able to test this. I'll follow up as soon as I can. BTW where can I find the latest guest code for test? I got the guest code from git://gitorious.org/nahanni/guest-code.git. But it seems outdated, after fixing the unlocked_ioctl, and vm-id bits position (the codes conflict with ivshmem spec), it works (I have tested for V1). Hello, Which device driver are you using? guest-code/kernel_module/standard/kvm_ivshmem.c The uio driver is the recommended one, however if you want to use the kvm_ivshmem one and have it working, then feel free to continue. I had deleted it form the repo, but some users had based solutions off it, so I added it back. btw, my hardware issue has been resolved, so I'll get to testing your patch soon. Sincerely, Cam Cam Regards, Pingfan Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com --- hw/ivshmem.c | 54 +- 1 files changed, 53 insertions(+), 1 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 7c8630c..5709e89 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -19,6 +19,7 @@ #include hw.h #include pc.h #include pci.h +#include msi.h #include msix.h #include kvm.h #include migration.h @@ -83,6 +84,7 @@ typedef struct IVShmemState { uint32_t vectors; uint32_t features; EventfdEntry *eventfd_table; +int *vector_virqs; Error *migration_blocker; @@ -625,16 +627,62 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id) return 0; } +static int ivshmem_vector_use(PCIDevice *dev, unsigned vector, + MSIMessage msg) +{ +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); +int virq; +EventNotifier *n = s-peers[s-vm_id].eventfds[vector]; + +virq = kvm_irqchip_add_msi_route(kvm_state, msg); +if (virq = 0 kvm_irqchip_add_irqfd_notifier(kvm_state, n, virq) = 0) { +s-vector_virqs[vector] = virq; +qemu_chr_add_handlers(s-eventfd_chr[vector], NULL, NULL, NULL, NULL); +} else if (virq = 0) { +kvm_irqchip_release_virq(kvm_state, virq); +error_report(ivshmem, can not setup irqfd\n); +return -1; +} else { +error_report(ivshmem, no enough msi route to setup irqfd\n); +return -1; +} + +return 0; +} + +static void ivshmem_vector_release(PCIDevice *dev, unsigned vector) +{ +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); +EventNotifier *n = s-peers[s-vm_id].eventfds[vector]; +int virq = s-vector_virqs[vector]; + +if (s-vector_virqs[vector] = 0) { +kvm_irqchip_remove_irqfd_notifier(kvm_state, n, virq); +kvm_irqchip_release_virq(kvm_state, virq); +s-vector_virqs[vector] = -1; +} +} + static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address, uint32_t val, int len) { +bool is_enabled, was_enabled = msi_enabled(pci_dev); + pci_default_write_config(pci_dev, address, val, len); +is_enabled = msi_enabled(pci_dev); +if (!was_enabled is_enabled) { +msix_set_vector_notifiers(pci_dev, ivshmem_vector_use, +ivshmem_vector_release); +} else if (was_enabled !is_enabled) { +msix_unset_vector_notifiers(pci_dev); +} } static int pci_ivshmem_init(PCIDevice *dev) { IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); uint8_t *pci_conf; +int i; if (s-sizearg == NULL) s-ivshmem_size = 4 20; /* 4 MB default */ @@ -758,7 +806,10 @@ static int pci_ivshmem_init(PCIDevice *dev) } s-dev.config_write = ivshmem_write_config; - +s-vector_virqs = g_new0(int, s-vectors); +for (i = 0; i s-vectors; i++) { +s-vector_virqs[i] = -1; +} return 0; } @@ -770,6 +821,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev) migrate_del_blocker(s-migration_blocker); error_free(s-migration_blocker); } +g_free(s-vector_virqs); memory_region_destroy(s-ivshmem_mmio); memory_region_del_subregion(s-bar, s-ivshmem); -- 1.7.4.4
Re: [Qemu-devel] [PATCH v2 2/2] ivshmem: use irqfd to interrupt among VMs
On Tue, Nov 27, 2012 at 7:53 PM, liu ping fan qemul...@gmail.com wrote: On Wed, Nov 28, 2012 at 5:48 AM, Cam Macdonell c...@cs.ualberta.ca wrote: On Sat, Nov 24, 2012 at 8:51 PM, Liu Ping Fan qemul...@gmail.com wrote: From: Liu Ping Fan pingf...@linux.vnet.ibm.com Using irqfd, so we can avoid switch between kernel and user when VMs interrupts each other. Nice work. Due to a hardware failure, there will be a small delay in me being able to test this. I'll follow up as soon as I can. BTW where can I find the latest guest code for test? I got the guest code from git://gitorious.org/nahanni/guest-code.git. But it seems outdated, after fixing the unlocked_ioctl, and vm-id bits position (the codes conflict with ivshmem spec), it works (I have tested for V1). Hello, Which device driver are you using? Cam Regards, Pingfan Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com --- hw/ivshmem.c | 54 +- 1 files changed, 53 insertions(+), 1 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 7c8630c..5709e89 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -19,6 +19,7 @@ #include hw.h #include pc.h #include pci.h +#include msi.h #include msix.h #include kvm.h #include migration.h @@ -83,6 +84,7 @@ typedef struct IVShmemState { uint32_t vectors; uint32_t features; EventfdEntry *eventfd_table; +int *vector_virqs; Error *migration_blocker; @@ -625,16 +627,62 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id) return 0; } +static int ivshmem_vector_use(PCIDevice *dev, unsigned vector, + MSIMessage msg) +{ +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); +int virq; +EventNotifier *n = s-peers[s-vm_id].eventfds[vector]; + +virq = kvm_irqchip_add_msi_route(kvm_state, msg); +if (virq = 0 kvm_irqchip_add_irqfd_notifier(kvm_state, n, virq) = 0) { +s-vector_virqs[vector] = virq; +qemu_chr_add_handlers(s-eventfd_chr[vector], NULL, NULL, NULL, NULL); +} else if (virq = 0) { +kvm_irqchip_release_virq(kvm_state, virq); +error_report(ivshmem, can not setup irqfd\n); +return -1; +} else { +error_report(ivshmem, no enough msi route to setup irqfd\n); +return -1; +} + +return 0; +} + +static void ivshmem_vector_release(PCIDevice *dev, unsigned vector) +{ +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); +EventNotifier *n = s-peers[s-vm_id].eventfds[vector]; +int virq = s-vector_virqs[vector]; + +if (s-vector_virqs[vector] = 0) { +kvm_irqchip_remove_irqfd_notifier(kvm_state, n, virq); +kvm_irqchip_release_virq(kvm_state, virq); +s-vector_virqs[vector] = -1; +} +} + static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address, uint32_t val, int len) { +bool is_enabled, was_enabled = msi_enabled(pci_dev); + pci_default_write_config(pci_dev, address, val, len); +is_enabled = msi_enabled(pci_dev); +if (!was_enabled is_enabled) { +msix_set_vector_notifiers(pci_dev, ivshmem_vector_use, +ivshmem_vector_release); +} else if (was_enabled !is_enabled) { +msix_unset_vector_notifiers(pci_dev); +} } static int pci_ivshmem_init(PCIDevice *dev) { IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); uint8_t *pci_conf; +int i; if (s-sizearg == NULL) s-ivshmem_size = 4 20; /* 4 MB default */ @@ -758,7 +806,10 @@ static int pci_ivshmem_init(PCIDevice *dev) } s-dev.config_write = ivshmem_write_config; - +s-vector_virqs = g_new0(int, s-vectors); +for (i = 0; i s-vectors; i++) { +s-vector_virqs[i] = -1; +} return 0; } @@ -770,6 +821,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev) migrate_del_blocker(s-migration_blocker); error_free(s-migration_blocker); } +g_free(s-vector_virqs); memory_region_destroy(s-ivshmem_mmio); memory_region_del_subregion(s-bar, s-ivshmem); -- 1.7.4.4
Re: [Qemu-devel] [PATCH v2 2/2] ivshmem: use irqfd to interrupt among VMs
On Sat, Nov 24, 2012 at 8:51 PM, Liu Ping Fan qemul...@gmail.com wrote: From: Liu Ping Fan pingf...@linux.vnet.ibm.com Using irqfd, so we can avoid switch between kernel and user when VMs interrupts each other. Nice work. Due to a hardware failure, there will be a small delay in me being able to test this. I'll follow up as soon as I can. Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com --- hw/ivshmem.c | 54 +- 1 files changed, 53 insertions(+), 1 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 7c8630c..5709e89 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -19,6 +19,7 @@ #include hw.h #include pc.h #include pci.h +#include msi.h #include msix.h #include kvm.h #include migration.h @@ -83,6 +84,7 @@ typedef struct IVShmemState { uint32_t vectors; uint32_t features; EventfdEntry *eventfd_table; +int *vector_virqs; Error *migration_blocker; @@ -625,16 +627,62 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id) return 0; } +static int ivshmem_vector_use(PCIDevice *dev, unsigned vector, + MSIMessage msg) +{ +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); +int virq; +EventNotifier *n = s-peers[s-vm_id].eventfds[vector]; + +virq = kvm_irqchip_add_msi_route(kvm_state, msg); +if (virq = 0 kvm_irqchip_add_irqfd_notifier(kvm_state, n, virq) = 0) { +s-vector_virqs[vector] = virq; +qemu_chr_add_handlers(s-eventfd_chr[vector], NULL, NULL, NULL, NULL); +} else if (virq = 0) { +kvm_irqchip_release_virq(kvm_state, virq); +error_report(ivshmem, can not setup irqfd\n); +return -1; +} else { +error_report(ivshmem, no enough msi route to setup irqfd\n); +return -1; +} + +return 0; +} + +static void ivshmem_vector_release(PCIDevice *dev, unsigned vector) +{ +IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); +EventNotifier *n = s-peers[s-vm_id].eventfds[vector]; +int virq = s-vector_virqs[vector]; + +if (s-vector_virqs[vector] = 0) { +kvm_irqchip_remove_irqfd_notifier(kvm_state, n, virq); +kvm_irqchip_release_virq(kvm_state, virq); +s-vector_virqs[vector] = -1; +} +} + static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address, uint32_t val, int len) { +bool is_enabled, was_enabled = msi_enabled(pci_dev); + pci_default_write_config(pci_dev, address, val, len); +is_enabled = msi_enabled(pci_dev); +if (!was_enabled is_enabled) { +msix_set_vector_notifiers(pci_dev, ivshmem_vector_use, +ivshmem_vector_release); +} else if (was_enabled !is_enabled) { +msix_unset_vector_notifiers(pci_dev); +} } static int pci_ivshmem_init(PCIDevice *dev) { IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); uint8_t *pci_conf; +int i; if (s-sizearg == NULL) s-ivshmem_size = 4 20; /* 4 MB default */ @@ -758,7 +806,10 @@ static int pci_ivshmem_init(PCIDevice *dev) } s-dev.config_write = ivshmem_write_config; - +s-vector_virqs = g_new0(int, s-vectors); +for (i = 0; i s-vectors; i++) { +s-vector_virqs[i] = -1; +} return 0; } @@ -770,6 +821,7 @@ static void pci_ivshmem_uninit(PCIDevice *dev) migrate_del_blocker(s-migration_blocker); error_free(s-migration_blocker); } +g_free(s-vector_virqs); memory_region_destroy(s-ivshmem_mmio); memory_region_del_subregion(s-bar, s-ivshmem); -- 1.7.4.4
Re: [Qemu-devel] [PATCH v4] ivshmem: add 64bit option
Can this patch be included in 1.2? Thanks, Cam On Thu, Aug 23, 2012 at 6:49 AM, Gerd Hoffmann kra...@redhat.com wrote: This patch adds a use64 property which will make the ivshmem driver register a 64bit memory bar when set, so you have something to play with when testing 64bit pci bits. It also allows to have quite big shared memory regions, like this: [root@fedora ~]# lspci -vs1:1 01:01.0 RAM memory: Red Hat, Inc Device 1110 Subsystem: Red Hat, Inc Device 1100 Physical Slot: 1-1 Flags: fast devsel Memory at fd40 (32-bit, non-prefetchable) [disabled] [size=256] Memory at 804000 (64-bit, prefetchable) [size=1G] [ v4: rebase adapt to latest master again ] [ v3: rebase adapt to latest master ] [ v2: default to on as suggested by avi, turn off for pc-$old using compat property ] Signed-off-by: Gerd Hoffmann kra...@redhat.com Tested-by: Cam Macdonell c...@cs.ualberta.ca --- hw/ivshmem.c | 13 ++--- hw/pc_piix.c |4 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index b4d65a6..8c45aa6 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -71,6 +71,8 @@ typedef struct IVShmemState { MemoryRegion bar; MemoryRegion ivshmem; uint64_t ivshmem_size; /* size of shared memory region */ +uint32_t ivshmem_attr; +uint32_t ivshmem_64bit; int shm_fd; /* shared memory file descriptor */ Peer *peers; @@ -339,7 +341,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) { memory_region_add_subregion(s-bar, 0, s-ivshmem); /* region for shared memory */ -pci_register_bar(s-dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, s-bar); +pci_register_bar(s-dev, 2, s-ivshmem_attr, s-bar); } static void ivshmem_add_eventfd(IVShmemState *s, int posn, int i) @@ -701,6 +703,11 @@ static int pci_ivshmem_init(PCIDevice *dev) s-ivshmem_mmio); memory_region_init(s-bar, ivshmem-bar2-container, s-ivshmem_size); +s-ivshmem_attr = PCI_BASE_ADDRESS_SPACE_MEMORY | +PCI_BASE_ADDRESS_MEM_PREFETCH; +if (s-ivshmem_64bit) { +s-ivshmem_attr |= PCI_BASE_ADDRESS_MEM_TYPE_64; +} if ((s-server_chr != NULL) (strncmp(s-server_chr-filename, unix:, 5) == 0)) { @@ -726,8 +733,7 @@ static int pci_ivshmem_init(PCIDevice *dev) /* allocate/initialize space for interrupt handling */ s-peers = g_malloc0(s-nb_peers * sizeof(Peer)); -pci_register_bar(s-dev, 2, - PCI_BASE_ADDRESS_SPACE_MEMORY, s-bar); +pci_register_bar(s-dev, 2, s-ivshmem_attr, s-bar); s-eventfd_chr = g_malloc0(s-vectors * sizeof(CharDriverState *)); @@ -797,6 +803,7 @@ static Property ivshmem_properties[] = { DEFINE_PROP_BIT(msi, IVShmemState, features, IVSHMEM_MSI, true), DEFINE_PROP_STRING(shm, IVShmemState, shmobj), DEFINE_PROP_STRING(role, IVShmemState, role), +DEFINE_PROP_UINT32(use64, IVShmemState, ivshmem_64bit, 1), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 88ff041..c7fbaa0 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -387,6 +387,10 @@ static QEMUMachine pc_machine_v1_2 = { .driver = virtio-blk-pci,\ .property = config-wce,\ .value= off,\ +},{\ +.driver = ivshmem,\ +.property = use64,\ +.value= 0,\ } static QEMUMachine pc_machine_v1_1 = { -- 1.7.1
Re: [Qemu-devel] [PATCH for-1.2] msix: make [un]use vectors on reset/load optional
Yes, will test shortly. Cam On Wed, Aug 29, 2012 at 10:47 AM, Michael S. Tsirkin m...@redhat.com wrote: On Wed, Aug 29, 2012 at 06:44:49PM +0200, Jan Kiszka wrote: On 2012-08-29 18:40, Michael S. Tsirkin wrote: The facility to use/unuse vectors dynamically is helpful for virtio but little else: everyone just seems to use vectors in their init function. Avoid clearing msix vector use info on reset and load. For virtio, clear it explicitly. This should fix regressions reported with ivshmem - though I didn't test this, I verified that virtio keeps working like it did. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/msix.c | 13 +++-- hw/virtio-pci.c | 2 ++ 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/hw/msix.c b/hw/msix.c index 800fc32..d040cc2 100644 --- a/hw/msix.c +++ b/hw/msix.c @@ -340,6 +340,15 @@ static void msix_free_irq_entries(PCIDevice *dev) } } +static void msix_clear_all_vectors(PCIDevice *dev) +{ +int vector; + +for (vector = 0; vector dev-msix_entries_nr; ++vector) { +msix_clr_pending(dev, vector); +} +} + /* Clean up resources for the device. */ void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar, MemoryRegion *pba_bar) { @@ -394,7 +403,7 @@ void msix_load(PCIDevice *dev, QEMUFile *f) return; } -msix_free_irq_entries(dev); +msix_clear_all_vectors(dev); qemu_get_buffer(f, dev-msix_table, n * PCI_MSIX_ENTRY_SIZE); qemu_get_buffer(f, dev-msix_pba, (n + 7) / 8); msix_update_function_masked(dev); @@ -440,7 +449,7 @@ void msix_reset(PCIDevice *dev) if (!msix_present(dev)) { return; } -msix_free_irq_entries(dev); +msix_clear_all_vectors(dev); dev-config[dev-msix_cap + MSIX_CONTROL_OFFSET] = ~dev-wmask[dev-msix_cap + MSIX_CONTROL_OFFSET]; memset(dev-msix_table, 0, dev-msix_entries_nr * PCI_MSIX_ENTRY_SIZE); diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index 125eded..ca0b204 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -131,6 +131,7 @@ static int virtio_pci_load_config(void * opaque, QEMUFile *f) if (ret) { return ret; } +msix_unuse_all_vectors(proxy-pci_dev); msix_load(proxy-pci_dev, f); if (msix_present(proxy-pci_dev)) { qemu_get_be16s(f, proxy-vdev-config_vector); @@ -246,6 +247,7 @@ void virtio_pci_reset(DeviceState *d) VirtIOPCIProxy *proxy = container_of(d, VirtIOPCIProxy, pci_dev.qdev); virtio_pci_stop_ioeventfd(proxy); virtio_reset(proxy-vdev); +msix_unuse_all_vectors(proxy-pci_dev); proxy-flags = ~VIRTIO_PCI_FLAG_BUS_MASTER_BUG; } Fine with me, but let's ask Cam to test. Jan Cam deadline is today - can u test quickly pls? -- Siemens AG, Corporate Technology, CT RTC ITP SDP-DE Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH for-1.2] msix: make [un]use vectors on reset/load optional
Yes, works for me. On Wed, Aug 29, 2012 at 10:40 AM, Michael S. Tsirkin m...@redhat.com wrote: The facility to use/unuse vectors dynamically is helpful for virtio but little else: everyone just seems to use vectors in their init function. Avoid clearing msix vector use info on reset and load. For virtio, clear it explicitly. This should fix regressions reported with ivshmem - though I didn't test this, I verified that virtio keeps working like it did. Signed-off-by: Michael S. Tsirkin m...@redhat.com Tested-by: Cam Macdonell c...@cs.ualberta.ca --- hw/msix.c | 13 +++-- hw/virtio-pci.c | 2 ++ 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/hw/msix.c b/hw/msix.c index 800fc32..d040cc2 100644 --- a/hw/msix.c +++ b/hw/msix.c @@ -340,6 +340,15 @@ static void msix_free_irq_entries(PCIDevice *dev) } } +static void msix_clear_all_vectors(PCIDevice *dev) +{ +int vector; + +for (vector = 0; vector dev-msix_entries_nr; ++vector) { +msix_clr_pending(dev, vector); +} +} + /* Clean up resources for the device. */ void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar, MemoryRegion *pba_bar) { @@ -394,7 +403,7 @@ void msix_load(PCIDevice *dev, QEMUFile *f) return; } -msix_free_irq_entries(dev); +msix_clear_all_vectors(dev); qemu_get_buffer(f, dev-msix_table, n * PCI_MSIX_ENTRY_SIZE); qemu_get_buffer(f, dev-msix_pba, (n + 7) / 8); msix_update_function_masked(dev); @@ -440,7 +449,7 @@ void msix_reset(PCIDevice *dev) if (!msix_present(dev)) { return; } -msix_free_irq_entries(dev); +msix_clear_all_vectors(dev); dev-config[dev-msix_cap + MSIX_CONTROL_OFFSET] = ~dev-wmask[dev-msix_cap + MSIX_CONTROL_OFFSET]; memset(dev-msix_table, 0, dev-msix_entries_nr * PCI_MSIX_ENTRY_SIZE); diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index 125eded..ca0b204 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -131,6 +131,7 @@ static int virtio_pci_load_config(void * opaque, QEMUFile *f) if (ret) { return ret; } +msix_unuse_all_vectors(proxy-pci_dev); msix_load(proxy-pci_dev, f); if (msix_present(proxy-pci_dev)) { qemu_get_be16s(f, proxy-vdev-config_vector); @@ -246,6 +247,7 @@ void virtio_pci_reset(DeviceState *d) VirtIOPCIProxy *proxy = container_of(d, VirtIOPCIProxy, pci_dev.qdev); virtio_pci_stop_ioeventfd(proxy); virtio_reset(proxy-vdev); +msix_unuse_all_vectors(proxy-pci_dev); proxy-flags = ~VIRTIO_PCI_FLAG_BUS_MASTER_BUG; } -- MST
[Qemu-devel] [PATCH for-1.2] ivshmem: remove redundant ioeventfd configuration
setup_ioeventfds() is unnecessary and actually causes a segfault when used ioeventfd=on is used on the command-line. Since ioeventfds are handled within the memory API, it can be removed. Signed-off-by: Cam Macdonell c...@cs.ualberta.ca --- hw/ivshmem.c | 15 --- 1 files changed, 0 insertions(+), 15 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 068e1f3..d0fe5a2 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -387,17 +387,6 @@ static void close_guest_eventfds(IVShmemState *s, int posn) s-peers[posn].nb_eventfds = 0; } -static void setup_ioeventfds(IVShmemState *s) { - -int i, j; - -for (i = 0; i = s-max_peer; i++) { -for (j = 0; j s-peers[i].nb_eventfds; j++) { -ivshmem_add_eventfd(s, i, j); -} -} -} - /* this function increase the dynamic storage need to store data about other * guests */ static void increase_dynamic_storage(IVShmemState *s, int new_min_size) { @@ -678,10 +667,6 @@ static int pci_ivshmem_init(PCIDevice *dev) memory_region_init_io(s-ivshmem_mmio, ivshmem_mmio_ops, s, ivshmem-mmio, IVSHMEM_REG_BAR_SIZE); -if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { -setup_ioeventfds(s); -} - /* region for registers*/ pci_register_bar(s-dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, s-ivshmem_mmio); -- 1.7.5.4
Re: [Qemu-devel] [PATCH v2][for 1.2?] msix: Drop tracking of used vectors
On Fri, Aug 24, 2012 at 12:19 AM, Jan Kiszka jan.kis...@web.de wrote: From: Jan Kiszka jan.kis...@siemens.com This optimization was once used in qemu-kvm to keep KVM route usage low. But now we solved that problem via lazy updates. It also tried to handle the case of vectors shared between different sources of the same device. However, this never really worked and will have to be addressed differently anyway. So drop this obsolete interface. We still need interfaces to clear pending vectors though. Provide msix_clear_vector and msix_clear_all_vectors for this. This also unbreaks MSI-X after reset for ivshmem and megasas as device models can't easily mark their vectors used afterward (megasas didn't even try). Signed-off-by: Jan Kiszka jan.kis...@siemens.com Tested-by: Cam Macdonell c...@cs.ualberta.ca --- This patch has been posted some moons again, and we had a discussion about the impact on the existing users. At that time, MSI-X refactoring for KVM support was not yet merged. Now it is, and it should be better much clearer that vector usage tracking has no business with that feature. hw/ivshmem.c| 20 --- hw/megasas.c|4 --- hw/msix.c | 57 -- hw/msix.h |5 +-- hw/pci.h|2 - hw/virtio-pci.c | 20 +++--- 6 files changed, 23 insertions(+), 85 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 47f2a16..548 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -523,28 +523,11 @@ static void ivshmem_read(void *opaque, const uint8_t * buf, int flags) return; } -/* Select the MSI-X vectors used by device. - * ivshmem maps events to vectors statically, so - * we just enable all vectors on init and after reset. */ -static void ivshmem_use_msix(IVShmemState * s) -{ -int i; - -if (!msix_present(s-dev)) { -return; -} - -for (i = 0; i s-vectors; i++) { -msix_vector_use(s-dev, i); -} -} - static void ivshmem_reset(DeviceState *d) { IVShmemState *s = DO_UPCAST(IVShmemState, dev.qdev, d); s-intrstatus = 0; -ivshmem_use_msix(s); return; } @@ -586,8 +569,6 @@ static void ivshmem_setup_msi(IVShmemState * s) /* allocate QEMU char devices for receiving interrupts */ s-eventfd_table = g_malloc0(s-vectors * sizeof(EventfdEntry)); - -ivshmem_use_msix(s); } static void ivshmem_save(QEMUFile* f, void *opaque) @@ -629,7 +610,6 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id) if (ivshmem_has_feature(proxy, IVSHMEM_MSI)) { msix_load(proxy-dev, f); - ivshmem_use_msix(proxy); } else { proxy-intrstatus = qemu_get_be32(f); proxy-intrmask = qemu_get_be32(f); diff --git a/hw/megasas.c b/hw/megasas.c index c35a15d..4766117 100644 --- a/hw/megasas.c +++ b/hw/megasas.c @@ -2121,10 +2121,6 @@ static int megasas_scsi_init(PCIDevice *dev) pci_register_bar(s-dev, 2, PCI_BASE_ADDRESS_SPACE_IO, s-port_io); pci_register_bar(s-dev, 3, bar_type, s-queue_io); -if (megasas_use_msix(s)) { -msix_vector_use(s-dev, 0); -} - if (!s-sas_addr) { s-sas_addr = ((NAA_LOCALLY_ASSIGNED_ID 24) | IEEE_COMPANY_LOCALLY_ASSIGNED) 36; diff --git a/hw/msix.c b/hw/msix.c index aea340b..163ce4c 100644 --- a/hw/msix.c +++ b/hw/msix.c @@ -273,7 +273,6 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries, dev-msix_table = g_malloc0(table_size); dev-msix_pba = g_malloc0(pba_size); -dev-msix_entry_used = g_malloc0(nentries * sizeof *dev-msix_entry_used); msix_mask_all(dev, nentries); @@ -326,16 +325,6 @@ int msix_init_exclusive_bar(PCIDevice *dev, unsigned short nentries, return 0; } -static void msix_free_irq_entries(PCIDevice *dev) -{ -int vector; - -for (vector = 0; vector dev-msix_entries_nr; ++vector) { -dev-msix_entry_used[vector] = 0; -msix_clr_pending(dev, vector); -} -} - /* Clean up resources for the device. */ void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar, MemoryRegion *pba_bar) { @@ -344,7 +333,6 @@ void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar, MemoryRegion *pba_bar) } pci_del_capability(dev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH); dev-msix_cap = 0; -msix_free_irq_entries(dev); dev-msix_entries_nr = 0; memory_region_del_subregion(pba_bar, dev-msix_pba_mmio); memory_region_destroy(dev-msix_pba_mmio); @@ -354,8 +342,6 @@ void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar, MemoryRegion *pba_bar) memory_region_destroy(dev-msix_table_mmio); g_free(dev-msix_table); dev-msix_table = NULL; -g_free(dev-msix_entry_used); -dev-msix_entry_used = NULL; dev-cap_present = ~QEMU_PCI_CAP_MSIX; return; } @@ -390,7 +376,6 @@ void
[Qemu-devel] MSI-X bug with ivshmem since msix_reset moved to PCI
Hi Jan, I've bisected a bug in which MSI interrupts are not being delivered to the following patch, where msix_reset was moved in tot he PCI core. commit cbd2d4342b3d42ab33baa99f5b7a23491b5692f2 Author: Jan Kiszka jan.kis...@siemens.com Date: Tue May 15 20:09:56 2012 -0300 msi: Invoke msi/msix_reset from PCI core There is no point in pushing this burden to the devices, they tend to forget to call them (like intel-hda, ahci, xhci did). Instead, reset functions are now called from pci_device_reset. They do nothing if MSI/MSI-X is not in use. I've been debugging and it seems that when msix_notify() is triggered the second test in the if fails /* Send an MSI-X message */ void msix_notify(PCIDevice *dev, unsigned vector) { MSIMessage msg; if (vector = dev-msix_entries_nr || !dev-msix_entry_used[vector]) return; … } here is some MSI-X debugging statements msix_init IVSHMEM: msix initialized (1 vectors) IVSHMEM: using vector 0 IVSHMEM: ivshmem_reset IVSHMEM: using vector 0 msix_reset msix_free_irq_entries 0x7fd52d1cea20 msix_free_irq_entries() sets dev-msix_entries_nr to 0, so I think it may be the cause. Shouldn't ivshmem's reset (which reenables the vectors) be triggered by the msix_reset? Thanks, Cam p.s. And apologies, I should've caught this bug closer to that patch being merged.
Re: [Qemu-devel] ivshmem assertion failure with EventNotifier
On Wed, Aug 22, 2012 at 6:01 AM, Paolo Bonzini pbonz...@redhat.com wrote: Il 22/08/2012 06:29, Cam Macdonell ha scritto: Hi Paolo, I've noticed an assertion error when sending interrupts via ivshmem. I bisected to this patch. Does this help? Yes, that solves it. diff --git a/hw/ivshmem.c b/hw/ivshmem.c index b4d65a6..47f2a16 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -366,6 +366,10 @@ static void close_guest_eventfds(IVShmemState *s, int posn) { int i, guest_curr_max; +if (!ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { +return; +} + guest_curr_max = s-peers[posn].nb_eventfds; memory_region_transaction_begin(); Paolo
Re: [Qemu-devel] [PATCH] ivshmem: add 64bit option
Thanks Gerd, might need another rebase. I have had some users request larger memory regions. Quoting Gerd Hoffmann kra...@redhat.com: This patch adds a use64 property which will make the ivshmem driver register a 64bit memory bar when set, so you have something to play with when testing 64bit pci bits. It also allows to have quite big shared memory regions, like this: [root@fedora ~]# lspci -vs1:1 01:01.0 RAM memory: Red Hat, Inc Device 1110 Subsystem: Red Hat, Inc Device 1100 Physical Slot: 1-1 Flags: fast devsel Memory at fd40 (32-bit, non-prefetchable) [disabled] [size=256] Memory at 804000 (64-bit, prefetchable) [size=1G] [ v3: rebase adapt to latest master ] [ v2: default to on as suggested by avi, turn off for pc-$old using compat property ] Signed-off-by: Gerd Hoffmann kra...@redhat.com Tested-by: Cam Macdonell c...@cs.ualberta.ca --- hw/ivshmem.c | 13 ++--- hw/pc_piix.c |4 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 05559b6..8a776e2 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -72,6 +72,8 @@ typedef struct IVShmemState { MemoryRegion ivshmem; MemoryRegion msix_bar; uint64_t ivshmem_size; /* size of shared memory region */ +uint32_t ivshmem_attr; +uint32_t ivshmem_64bit; int shm_fd; /* shared memory file descriptor */ Peer *peers; @@ -344,7 +346,7 @@ static void create_shared_memory_BAR(IVShmemState *s, int fd) { memory_region_add_subregion(s-bar, 0, s-ivshmem); /* region for shared memory */ -pci_register_bar(s-dev, 2, PCI_BASE_ADDRESS_SPACE_MEMORY, s-bar); +pci_register_bar(s-dev, 2, s-ivshmem_attr, s-bar); } static void close_guest_eventfds(IVShmemState *s, int posn) @@ -693,6 +695,11 @@ static int pci_ivshmem_init(PCIDevice *dev) s-ivshmem_mmio); memory_region_init(s-bar, ivshmem-bar2-container, s-ivshmem_size); +s-ivshmem_attr = PCI_BASE_ADDRESS_SPACE_MEMORY | +PCI_BASE_ADDRESS_MEM_PREFETCH; +if (s-ivshmem_64bit) { +s-ivshmem_attr |= PCI_BASE_ADDRESS_MEM_TYPE_64; +} if ((s-server_chr != NULL) (strncmp(s-server_chr-filename, unix:, 5) == 0)) { @@ -718,8 +725,7 @@ static int pci_ivshmem_init(PCIDevice *dev) /* allocate/initialize space for interrupt handling */ s-peers = g_malloc0(s-nb_peers * sizeof(Peer)); -pci_register_bar(s-dev, 2, - PCI_BASE_ADDRESS_SPACE_MEMORY, s-bar); +pci_register_bar(s-dev, 2, s-ivshmem_attr, s-bar); s-eventfd_chr = g_malloc0(s-vectors * sizeof(CharDriverState *)); @@ -791,6 +797,7 @@ static Property ivshmem_properties[] = { DEFINE_PROP_BIT(msi, IVShmemState, features, IVSHMEM_MSI, true), DEFINE_PROP_STRING(shm, IVShmemState, shmobj), DEFINE_PROP_STRING(role, IVShmemState, role), +DEFINE_PROP_UINT32(use64, IVShmemState, ivshmem_64bit, 1), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 0c0096f..5101f5d 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -375,6 +375,10 @@ static QEMUMachine pc_machine_v1_2 = { .driver = qxl,\ .property = vgamem_mb,\ .value= stringify(8),\ +},{\ +.driver = ivshmem,\ +.property = use64,\ +.value= 0,\ } static QEMUMachine pc_machine_v1_1 = { -- 1.7.1
[Qemu-devel] ivshmem assertion failure with EventNotifier
Hi Paolo, I've noticed an assertion error when sending interrupts via ivshmem. I bisected to this patch. commit 563027cc0c94aa4846c18f9d665a4c90f8c42ba8 Author: Paolo Bonzini pbonz...@redhat.com Date: Thu Jul 5 17:16:25 2012 +0200 ivshmem: use EventNotifier and memory API All of ivshmem's usage of eventfd now has a corresponding API in EventNotifier. Simplify the code by using it, and also use the memory API consistently to set up and tear down the ioeventfds. Signed-off-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Avi Kivity a...@redhat.com qemu-system-x86_64: /home/cam/src/git/qemu/memory.c:1244: memory_region_del_even tfd: Assertion `i != mr-ioeventfd_nb' failed. This assertion failure occurs when the eventfd is triggered. I'll continue to dig around, but can you explain what this assertion is catching. Is there an initialization that might be missing? Thanks, Cam
Re: [Qemu-devel] [PATCH master/v1.0.x] ivshmem: add missing msix calls
Hello, Can this patch be merged, please? Thanks, Cam On Mon, Dec 5, 2011 at 12:48 PM, Michael S. Tsirkin m...@redhat.com wrote: ivshmem used msix but didn't call it on either reset or config write paths. This used to partically work since guests don't use all of msi-x configuration fields, and reset is rarely used, but the patch 'msix: track function masked in pci device state' broke that. Fix by adding appropriate calls. Signed-off-by: Michael S. Tsirkin m...@redhat.com Reported-by: Cam Macdonell c...@cs.ualberta.ca Tested-by: Cam Macdonell c...@cs.ualberta.ca --- Please apply the following to both master and 1.0 stable branch. Thanks! diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 242fbea..c58f4d3 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -500,11 +500,29 @@ static void ivshmem_read(void *opaque, const uint8_t * buf, int flags) return; } +/* Select the MSI-X vectors used by device. + * ivshmem maps events to vectors statically, so + * we just enable all vectors on init and after reset. */ +static void ivshmem_use_msix(IVShmemState * s) +{ + int i; + + if (!msix_present(s-dev)) { + return; + } + + for (i = 0; i s-vectors; i++) { + msix_vector_use(s-dev, i); + } +} + static void ivshmem_reset(DeviceState *d) { IVShmemState *s = DO_UPCAST(IVShmemState, dev.qdev, d); s-intrstatus = 0; + msix_reset(s-dev); + ivshmem_use_msix(s); return; } @@ -535,12 +553,8 @@ static uint64_t ivshmem_get_size(IVShmemState * s) { return value; } -static void ivshmem_setup_msi(IVShmemState * s) { - - int i; - - /* allocate the MSI-X vectors */ - +static void ivshmem_setup_msi(IVShmemState * s) +{ memory_region_init(s-msix_bar, ivshmem-msix, 4096); if (!msix_init(s-dev, s-vectors, s-msix_bar, 1, 0)) { pci_register_bar(s-dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, @@ -551,13 +565,10 @@ static void ivshmem_setup_msi(IVShmemState * s) { exit(1); } - /* 'activate' the vectors */ - for (i = 0; i s-vectors; i++) { - msix_vector_use(s-dev, i); - } - /* allocate Qemu char devices for receiving interrupts */ s-eventfd_table = g_malloc0(s-vectors * sizeof(EventfdEntry)); + + ivshmem_use_msix(s); } static void ivshmem_save(QEMUFile* f, void *opaque) @@ -610,6 +621,13 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id) return 0; } +static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address, + uint32_t val, int len) +{ + pci_default_write_config(pci_dev, address, val, len); + msix_write_config(pci_dev, address, val, len); +} + static int pci_ivshmem_init(PCIDevice *dev) { IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); @@ -734,6 +752,8 @@ static int pci_ivshmem_init(PCIDevice *dev) } + s-dev.config_write = ivshmem_write_config; + return 0; }
Re: [Qemu-devel] [PATCH] ivshmem: fix guest unable to start with ioeventfd
Can this patch please be merged? It fixes an important regression with ioeventfd. Thanks, Cam On Thu, Nov 24, 2011 at 3:05 AM, zanghongy...@huawei.com wrote: From: Hongyong Zang zanghongy...@huawei.com When a guest boots with ioeventfd, an error (by gdb) occurs: Program received signal SIGSEGV, Segmentation fault. 0x006009cc in setup_ioeventfds (s=0x171dc40) at /home/louzhengwei/git_source/qemu-kvm/hw/ivshmem.c:363 363 for (j = 0; j s-peers[i].nb_eventfds; j++) { The bug is due to accessing s-peers which is NULL. This patch uses the memory region API to replace the old one kvm_set_ioeventfd_mmio_long(). And this patch makes memory_region_add_eventfd() called in ivshmem_read() when qemu receives eventfd information from ivshmem_server. Signed-off-by: Hongyong Zang zanghongy...@huawei.com --- hw/ivshmem.c | 41 ++--- 1 files changed, 14 insertions(+), 27 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 242fbea..be26f03 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -58,7 +58,6 @@ typedef struct IVShmemState { CharDriverState *server_chr; MemoryRegion ivshmem_mmio; - pcibus_t mmio_addr; /* We might need to register the BAR before we actually have the memory. * So prepare a container MemoryRegion for the BAR immediately and * add a subregion when we have the memory. @@ -346,8 +345,14 @@ static void close_guest_eventfds(IVShmemState *s, int posn) guest_curr_max = s-peers[posn].nb_eventfds; for (i = 0; i guest_curr_max; i++) { - kvm_set_ioeventfd_mmio_long(s-peers[posn].eventfds[i], - s-mmio_addr + DOORBELL, (posn 16) | i, 0); + if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { + memory_region_del_eventfd(s-ivshmem_mmio, + DOORBELL, + 4, + true, + (posn 16) | i, + s-peers[posn].eventfds[i]); + } close(s-peers[posn].eventfds[i]); } @@ -355,22 +360,6 @@ static void close_guest_eventfds(IVShmemState *s, int posn) s-peers[posn].nb_eventfds = 0; } -static void setup_ioeventfds(IVShmemState *s) { - - int i, j; - - for (i = 0; i = s-max_peer; i++) { - for (j = 0; j s-peers[i].nb_eventfds; j++) { - memory_region_add_eventfd(s-ivshmem_mmio, - DOORBELL, - 4, - true, - (i 16) | j, - s-peers[i].eventfds[j]); - } - } -} - /* this function increase the dynamic storage need to store data about other * guests */ static void increase_dynamic_storage(IVShmemState *s, int new_min_size) { @@ -491,10 +480,12 @@ static void ivshmem_read(void *opaque, const uint8_t * buf, int flags) } if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { - if (kvm_set_ioeventfd_mmio_long(incoming_fd, s-mmio_addr + DOORBELL, - (incoming_posn 16) | guest_max_eventfd, 1) 0) { - fprintf(stderr, ivshmem: ioeventfd not available\n); - } + memory_region_add_eventfd(s-ivshmem_mmio, + DOORBELL, + 4, + true, + (incoming_posn 16) | guest_max_eventfd, + incoming_fd); } return; @@ -659,10 +650,6 @@ static int pci_ivshmem_init(PCIDevice *dev) memory_region_init_io(s-ivshmem_mmio, ivshmem_mmio_ops, s, ivshmem-mmio, IVSHMEM_REG_BAR_SIZE); - if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { - setup_ioeventfds(s); - } - /* region for registers*/ pci_register_bar(s-dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, s-ivshmem_mmio); -- 1.7.1
Re: [Qemu-devel] [PATCH for v1.0 1/3] msix: track function masked in pci device state
On Mon, Dec 5, 2011 at 2:08 AM, Michael S. Tsirkin m...@redhat.com wrote: On Sun, Dec 04, 2011 at 04:47:17PM -0700, Cam Macdonell wrote: On Sun, Dec 4, 2011 at 3:20 AM, Michael S. Tsirkin m...@redhat.com wrote: On Fri, Dec 02, 2011 at 04:34:21PM -0700, Cam Macdonell wrote: Based on a git bisect, this patch breaks msi-x interrupt delivery in the ivshmem device. I think the following should fix it. Compiled-only - could you pls check? If yes let's apply to the stable branch. Thanks for the patch Michael. It addresses the need for msix_write_config() to be called, but the addition of the msix_reset() is causing a reset of the vectors after they've been initialized in pci_ivshmem_init(). So, interrupts still aren't delivered with this patch applied as it is. In particular, a reset occurs after pci_ivshmem_init runs, so the msix_entry_used array is reset to 0s, which causes the interrupt delivery to fail. If I comment out the msix_reset(), then interrupts are delivered. Would the reset be caused by a bug in the guest driver? or do I need to reconfigure the msix after reset? I'm unclear as to the proper behaviour after a reset. Thanks, Cam I didn't anticipate this, virtio only configures vectors when msi-x is enabled. For 1.0 it's safest to do the same and just re-enable after reset. So we'll end up with something like the following - does it help? Yes, this fixes interrupt delivery. Side note: exit(1) is not the best way to handle errors in the init function. I think you should add error_report, then goto err on failures to init, cleanup what you have set up meanwhile and return an error code. Thanks, I'll fix that. diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 242fbea..c58f4d3 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -500,11 +500,29 @@ static void ivshmem_read(void *opaque, const uint8_t * buf, int flags) return; } +/* Select the MSI-X vectors used by device. + * ivshmem maps events to vectors statically, so + * we just enable all vectors on init and after reset. */ +static void ivshmem_use_msix(IVShmemState * s) +{ + int i; + + if (!msix_present(s-dev)) { + return; + } + + for (i = 0; i s-vectors; i++) { + msix_vector_use(s-dev, i); + } +} + static void ivshmem_reset(DeviceState *d) { IVShmemState *s = DO_UPCAST(IVShmemState, dev.qdev, d); s-intrstatus = 0; + msix_reset(s-dev); + ivshmem_use_msix(s); return; } @@ -535,12 +553,8 @@ static uint64_t ivshmem_get_size(IVShmemState * s) { return value; } -static void ivshmem_setup_msi(IVShmemState * s) { - - int i; - - /* allocate the MSI-X vectors */ - +static void ivshmem_setup_msi(IVShmemState * s) +{ memory_region_init(s-msix_bar, ivshmem-msix, 4096); if (!msix_init(s-dev, s-vectors, s-msix_bar, 1, 0)) { pci_register_bar(s-dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, @@ -551,13 +565,10 @@ static void ivshmem_setup_msi(IVShmemState * s) { exit(1); } - /* 'activate' the vectors */ - for (i = 0; i s-vectors; i++) { - msix_vector_use(s-dev, i); - } - /* allocate Qemu char devices for receiving interrupts */ s-eventfd_table = g_malloc0(s-vectors * sizeof(EventfdEntry)); + + ivshmem_use_msix(s); } static void ivshmem_save(QEMUFile* f, void *opaque) @@ -610,6 +621,13 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id) return 0; } +static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address, + uint32_t val, int len) +{ + pci_default_write_config(pci_dev, address, val, len); + msix_write_config(pci_dev, address, val, len); +} + static int pci_ivshmem_init(PCIDevice *dev) { IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); @@ -734,6 +752,8 @@ static int pci_ivshmem_init(PCIDevice *dev) } + s-dev.config_write = ivshmem_write_config; + return 0; }
Re: [Qemu-devel] [PATCH] ivshmem: fix guest unable to start with ioeventfd
2011/12/2 Cam Macdonell c...@cs.ualberta.ca: 2011/11/30 Cam Macdonell c...@cs.ualberta.ca: 2011/11/30 Zang Hongyong zanghongy...@huawei.com: Can this bug fix patch be applied yet? Sorry, for not replying yet. I'll test your patch within the next day. Have you confirmed the proper receipt of interrupts in the receiving guests? I can confirm the bug occurs with ioeventfd enabled and that the patches fixes it, but sometime after 15.1, I no longer see interrupts (MSI or regular) being delivered in the guest. I will bisect tomorrow. With Michael's help we debugged msi-x interrupt delivery. With that fix in place, this patch fixes ioeventfd in ivshmem. Cam With this bug, guest os cannot successfully boot with ioeventfd. Thus the new PIO DoorBell patch cannot be posted. Well, you can certainly post the new patch, just clarify that it's dependent on this patch. Sincerely, Cam Thanks, Hongyong 于 2011/11/24,星期四 18:05, zanghongy...@huawei.com 写道: From: Hongyong Zang zanghongy...@huawei.com When a guest boots with ioeventfd, an error (by gdb) occurs: Program received signal SIGSEGV, Segmentation fault. 0x006009cc in setup_ioeventfds (s=0x171dc40) at /home/louzhengwei/git_source/qemu-kvm/hw/ivshmem.c:363 363 for (j = 0; j s-peers[i].nb_eventfds; j++) { The bug is due to accessing s-peers which is NULL. This patch uses the memory region API to replace the old one kvm_set_ioeventfd_mmio_long(). And this patch makes memory_region_add_eventfd() called in ivshmem_read() when qemu receives eventfd information from ivshmem_server. Signed-off-by: Hongyong Zang zanghongy...@huawei.com --- hw/ivshmem.c | 41 ++--- 1 files changed, 14 insertions(+), 27 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 242fbea..be26f03 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -58,7 +58,6 @@ typedef struct IVShmemState { CharDriverState *server_chr; MemoryRegion ivshmem_mmio; -pcibus_t mmio_addr; /* We might need to register the BAR before we actually have the memory. * So prepare a container MemoryRegion for the BAR immediately and * add a subregion when we have the memory. @@ -346,8 +345,14 @@ static void close_guest_eventfds(IVShmemState *s, int posn) guest_curr_max = s-peers[posn].nb_eventfds; for (i = 0; i guest_curr_max; i++) { -kvm_set_ioeventfd_mmio_long(s-peers[posn].eventfds[i], -s-mmio_addr + DOORBELL, (posn 16) | i, 0); +if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { +memory_region_del_eventfd(s-ivshmem_mmio, + DOORBELL, + 4, + true, + (posn 16) | i, + s-peers[posn].eventfds[i]); +} close(s-peers[posn].eventfds[i]); } @@ -355,22 +360,6 @@ static void close_guest_eventfds(IVShmemState *s, int posn) s-peers[posn].nb_eventfds = 0; } -static void setup_ioeventfds(IVShmemState *s) { - -int i, j; - -for (i = 0; i = s-max_peer; i++) { -for (j = 0; j s-peers[i].nb_eventfds; j++) { -memory_region_add_eventfd(s-ivshmem_mmio, - DOORBELL, - 4, - true, - (i 16) | j, - s-peers[i].eventfds[j]); -} -} -} - /* this function increase the dynamic storage need to store data about other * guests */ static void increase_dynamic_storage(IVShmemState *s, int new_min_size) { @@ -491,10 +480,12 @@ static void ivshmem_read(void *opaque, const uint8_t * buf, int flags) } if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { -if (kvm_set_ioeventfd_mmio_long(incoming_fd, s-mmio_addr + DOORBELL, -(incoming_posn 16) | guest_max_eventfd, 1) 0) { -fprintf(stderr, ivshmem: ioeventfd not available\n); -} +memory_region_add_eventfd(s-ivshmem_mmio, + DOORBELL, + 4, + true, + (incoming_posn 16) | guest_max_eventfd, + incoming_fd); } return; @@ -659,10 +650,6 @@ static int pci_ivshmem_init(PCIDevice *dev) memory_region_init_io(s-ivshmem_mmio, ivshmem_mmio_ops, s, ivshmem-mmio, IVSHMEM_REG_BAR_SIZE); -if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { -setup_ioeventfds(s); -} - /* region for registers*/ pci_register_bar(s-dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, s-ivshmem_mmio);
Re: [Qemu-devel] [PATCH for v1.0 1/3] msix: track function masked in pci device state
On Sun, Dec 4, 2011 at 3:20 AM, Michael S. Tsirkin m...@redhat.com wrote: On Fri, Dec 02, 2011 at 04:34:21PM -0700, Cam Macdonell wrote: Based on a git bisect, this patch breaks msi-x interrupt delivery in the ivshmem device. I think the following should fix it. Compiled-only - could you pls check? If yes let's apply to the stable branch. Thanks for the patch Michael. It addresses the need for msix_write_config() to be called, but the addition of the msix_reset() is causing a reset of the vectors after they've been initialized in pci_ivshmem_init(). So, interrupts still aren't delivered with this patch applied as it is. In particular, a reset occurs after pci_ivshmem_init runs, so the msix_entry_used array is reset to 0s, which causes the interrupt delivery to fail. If I comment out the msix_reset(), then interrupts are delivered. Would the reset be caused by a bug in the guest driver? or do I need to reconfigure the msix after reset? I'm unclear as to the proper behaviour after a reset. Thanks, Cam -- ivshmem: add missing msix calls ivshmem used msix but didn't call it on either reset or config write paths. This used to partically work since guests don't use all of msi-x configuration fields, and reset is rarely used, but the patch 'msix: track function masked in pci device state' broke that. Fix by adding appropriate calls. Signed-off-by: Michael S. Tsirkin m...@redhat.com -- diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 242fbea..3680c0f 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -505,6 +505,7 @@ static void ivshmem_reset(DeviceState *d) IVShmemState *s = DO_UPCAST(IVShmemState, dev.qdev, d); s-intrstatus = 0; + msix_reset(s-dev); return; } @@ -610,6 +611,13 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id) return 0; } +static void ivshmem_write_config(PCIDevice *pci_dev, uint32_t address, + uint32_t val, int len) +{ + pci_default_write_config(pci_dev, address, val, len); + msix_write_config(pci_dev, address, val, len); +} + static int pci_ivshmem_init(PCIDevice *dev) { IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); @@ -734,6 +742,8 @@ static int pci_ivshmem_init(PCIDevice *dev) } + s-dev.config_write = ivshmem_write_config; + return 0; }
Re: [Qemu-devel] [PATCH] ivshmem: fix guest unable to start with ioeventfd
2011/11/30 Cam Macdonell c...@cs.ualberta.ca: 2011/11/30 Zang Hongyong zanghongy...@huawei.com: Can this bug fix patch be applied yet? Sorry, for not replying yet. I'll test your patch within the next day. Have you confirmed the proper receipt of interrupts in the receiving guests? I can confirm the bug occurs with ioeventfd enabled and that the patches fixes it, but sometime after 15.1, I no longer see interrupts (MSI or regular) being delivered in the guest. I will bisect tomorrow. Cam With this bug, guest os cannot successfully boot with ioeventfd. Thus the new PIO DoorBell patch cannot be posted. Well, you can certainly post the new patch, just clarify that it's dependent on this patch. Sincerely, Cam Thanks, Hongyong 于 2011/11/24,星期四 18:05, zanghongy...@huawei.com 写道: From: Hongyong Zang zanghongy...@huawei.com When a guest boots with ioeventfd, an error (by gdb) occurs: Program received signal SIGSEGV, Segmentation fault. 0x006009cc in setup_ioeventfds (s=0x171dc40) at /home/louzhengwei/git_source/qemu-kvm/hw/ivshmem.c:363 363 for (j = 0; j s-peers[i].nb_eventfds; j++) { The bug is due to accessing s-peers which is NULL. This patch uses the memory region API to replace the old one kvm_set_ioeventfd_mmio_long(). And this patch makes memory_region_add_eventfd() called in ivshmem_read() when qemu receives eventfd information from ivshmem_server. Signed-off-by: Hongyong Zang zanghongy...@huawei.com --- hw/ivshmem.c | 41 ++--- 1 files changed, 14 insertions(+), 27 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 242fbea..be26f03 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -58,7 +58,6 @@ typedef struct IVShmemState { CharDriverState *server_chr; MemoryRegion ivshmem_mmio; -pcibus_t mmio_addr; /* We might need to register the BAR before we actually have the memory. * So prepare a container MemoryRegion for the BAR immediately and * add a subregion when we have the memory. @@ -346,8 +345,14 @@ static void close_guest_eventfds(IVShmemState *s, int posn) guest_curr_max = s-peers[posn].nb_eventfds; for (i = 0; i guest_curr_max; i++) { -kvm_set_ioeventfd_mmio_long(s-peers[posn].eventfds[i], -s-mmio_addr + DOORBELL, (posn 16) | i, 0); +if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { +memory_region_del_eventfd(s-ivshmem_mmio, + DOORBELL, + 4, + true, + (posn 16) | i, + s-peers[posn].eventfds[i]); +} close(s-peers[posn].eventfds[i]); } @@ -355,22 +360,6 @@ static void close_guest_eventfds(IVShmemState *s, int posn) s-peers[posn].nb_eventfds = 0; } -static void setup_ioeventfds(IVShmemState *s) { - -int i, j; - -for (i = 0; i = s-max_peer; i++) { -for (j = 0; j s-peers[i].nb_eventfds; j++) { -memory_region_add_eventfd(s-ivshmem_mmio, - DOORBELL, - 4, - true, - (i 16) | j, - s-peers[i].eventfds[j]); -} -} -} - /* this function increase the dynamic storage need to store data about other * guests */ static void increase_dynamic_storage(IVShmemState *s, int new_min_size) { @@ -491,10 +480,12 @@ static void ivshmem_read(void *opaque, const uint8_t * buf, int flags) } if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { -if (kvm_set_ioeventfd_mmio_long(incoming_fd, s-mmio_addr + DOORBELL, -(incoming_posn 16) | guest_max_eventfd, 1) 0) { -fprintf(stderr, ivshmem: ioeventfd not available\n); -} +memory_region_add_eventfd(s-ivshmem_mmio, + DOORBELL, + 4, + true, + (incoming_posn 16) | guest_max_eventfd, + incoming_fd); } return; @@ -659,10 +650,6 @@ static int pci_ivshmem_init(PCIDevice *dev) memory_region_init_io(s-ivshmem_mmio, ivshmem_mmio_ops, s, ivshmem-mmio, IVSHMEM_REG_BAR_SIZE); -if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { -setup_ioeventfds(s); -} - /* region for registers*/ pci_register_bar(s-dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, s-ivshmem_mmio);
Re: [Qemu-devel] [PATCH for v1.0 1/3] msix: track function masked in pci device state
Based on a git bisect, this patch breaks msi-x interrupt delivery in the ivshmem device. On Mon, Nov 21, 2011 at 9:57 AM, Michael S. Tsirkin m...@redhat.com wrote: Only go over the table when function is masked. This is not really important for qemu.git but helps fix a bug in qemu-kvm.git. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/msix.c | 21 ++--- hw/pci.h | 2 ++ 2 files changed, 16 insertions(+), 7 deletions(-) diff --git a/hw/msix.c b/hw/msix.c index b15bafc..63b41b9 100644 --- a/hw/msix.c +++ b/hw/msix.c @@ -79,6 +79,7 @@ static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries, /* Make flags bit writable. */ pdev-wmask[config_offset + MSIX_CONTROL_OFFSET] |= MSIX_ENABLE_MASK | MSIX_MASKALL_MASK; + pdev-msix_function_masked = true; return 0; iiuc, this masks the msix by default. } @@ -117,16 +118,11 @@ static void msix_clr_pending(PCIDevice *dev, int vector) *msix_pending_byte(dev, vector) = ~msix_pending_mask(vector); } -static int msix_function_masked(PCIDevice *dev) -{ - return dev-config[dev-msix_cap + MSIX_CONTROL_OFFSET] MSIX_MASKALL_MASK; -} - static int msix_is_masked(PCIDevice *dev, int vector) { unsigned offset = vector * PCI_MSIX_ENTRY_SIZE + PCI_MSIX_ENTRY_VECTOR_CTRL; - return msix_function_masked(dev) || + return dev-msix_function_masked || dev-msix_table_page[offset] PCI_MSIX_ENTRY_CTRL_MASKBIT; } @@ -138,24 +134,34 @@ static void msix_handle_mask_update(PCIDevice *dev, int vector) } } +static void msix_update_function_masked(PCIDevice *dev) +{ + dev-msix_function_masked = !msix_enabled(dev) || + (dev-config[dev-msix_cap + MSIX_CONTROL_OFFSET] MSIX_MASKALL_MASK); +} + /* Handle MSI-X capability config write. */ void msix_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len) { unsigned enable_pos = dev-msix_cap + MSIX_CONTROL_OFFSET; int vector; + bool was_masked; if (!range_covers_byte(addr, len, enable_pos)) { return; } + was_masked = dev-msix_function_masked; + msix_update_function_masked(dev); + if (!msix_enabled(dev)) { return; } pci_device_deassert_intx(dev); - if (msix_function_masked(dev)) { + if (dev-msix_function_masked == was_masked) { return; } So I believe my bug is due to the fact the new logic included in this patch requires msix_write_config() to be called to unmask the vectors. Virtio-pci calls msix_write_config(), but ivshmem does not (nor does PCIe so I'm not sure if it's also affected). I haven't been able to fix the bug yet, but I wanted to make sure I was looking in the correct place. Any help of further explanation of this patch would be greatly appreciated. Sincerely, Cam @@ -300,6 +306,7 @@ void msix_load(PCIDevice *dev, QEMUFile *f) msix_free_irq_entries(dev); qemu_get_buffer(f, dev-msix_table_page, n * PCI_MSIX_ENTRY_SIZE); qemu_get_buffer(f, dev-msix_table_page + MSIX_PAGE_PENDING, (n + 7) / 8); + msix_update_function_masked(dev); } /* Does device support MSI-X? */ diff --git a/hw/pci.h b/hw/pci.h index 4b2e785..625e717 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -178,6 +178,8 @@ struct PCIDevice { unsigned *msix_entry_used; /* Region including the MSI-X table */ uint32_t msix_bar_size; + /* MSIX function mask set or MSIX disabled */ + bool msix_function_masked; /* Version id needed for VMState */ int32_t version_id; -- 1.7.5.53.gc233e
Re: [Qemu-devel] [PATCH] ivshmem: fix guest unable to start with ioeventfd
2011/11/30 Zang Hongyong zanghongy...@huawei.com: Can this bug fix patch be applied yet? Sorry, for not replying yet. I'll test your patch within the next day. With this bug, guest os cannot successfully boot with ioeventfd. Thus the new PIO DoorBell patch cannot be posted. Well, you can certainly post the new patch, just clarify that it's dependent on this patch. Sincerely, Cam Thanks, Hongyong 于 2011/11/24,星期四 18:05, zanghongy...@huawei.com 写道: From: Hongyong Zang zanghongy...@huawei.com When a guest boots with ioeventfd, an error (by gdb) occurs: Program received signal SIGSEGV, Segmentation fault. 0x006009cc in setup_ioeventfds (s=0x171dc40) at /home/louzhengwei/git_source/qemu-kvm/hw/ivshmem.c:363 363 for (j = 0; j s-peers[i].nb_eventfds; j++) { The bug is due to accessing s-peers which is NULL. This patch uses the memory region API to replace the old one kvm_set_ioeventfd_mmio_long(). And this patch makes memory_region_add_eventfd() called in ivshmem_read() when qemu receives eventfd information from ivshmem_server. Signed-off-by: Hongyong Zang zanghongy...@huawei.com --- hw/ivshmem.c | 41 ++--- 1 files changed, 14 insertions(+), 27 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 242fbea..be26f03 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -58,7 +58,6 @@ typedef struct IVShmemState { CharDriverState *server_chr; MemoryRegion ivshmem_mmio; -pcibus_t mmio_addr; /* We might need to register the BAR before we actually have the memory. * So prepare a container MemoryRegion for the BAR immediately and * add a subregion when we have the memory. @@ -346,8 +345,14 @@ static void close_guest_eventfds(IVShmemState *s, int posn) guest_curr_max = s-peers[posn].nb_eventfds; for (i = 0; i guest_curr_max; i++) { -kvm_set_ioeventfd_mmio_long(s-peers[posn].eventfds[i], -s-mmio_addr + DOORBELL, (posn 16) | i, 0); +if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { +memory_region_del_eventfd(s-ivshmem_mmio, + DOORBELL, + 4, + true, + (posn 16) | i, + s-peers[posn].eventfds[i]); +} close(s-peers[posn].eventfds[i]); } @@ -355,22 +360,6 @@ static void close_guest_eventfds(IVShmemState *s, int posn) s-peers[posn].nb_eventfds = 0; } -static void setup_ioeventfds(IVShmemState *s) { - -int i, j; - -for (i = 0; i = s-max_peer; i++) { -for (j = 0; j s-peers[i].nb_eventfds; j++) { -memory_region_add_eventfd(s-ivshmem_mmio, - DOORBELL, - 4, - true, - (i 16) | j, - s-peers[i].eventfds[j]); -} -} -} - /* this function increase the dynamic storage need to store data about other * guests */ static void increase_dynamic_storage(IVShmemState *s, int new_min_size) { @@ -491,10 +480,12 @@ static void ivshmem_read(void *opaque, const uint8_t * buf, int flags) } if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { -if (kvm_set_ioeventfd_mmio_long(incoming_fd, s-mmio_addr + DOORBELL, -(incoming_posn 16) | guest_max_eventfd, 1) 0) { -fprintf(stderr, ivshmem: ioeventfd not available\n); -} +memory_region_add_eventfd(s-ivshmem_mmio, + DOORBELL, + 4, + true, + (incoming_posn 16) | guest_max_eventfd, + incoming_fd); } return; @@ -659,10 +650,6 @@ static int pci_ivshmem_init(PCIDevice *dev) memory_region_init_io(s-ivshmem_mmio, ivshmem_mmio_ops, s, ivshmem-mmio, IVSHMEM_REG_BAR_SIZE); -if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { -setup_ioeventfds(s); -} - /* region for registers*/ pci_register_bar(s-dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, s-ivshmem_mmio);
Re: [Qemu-devel] [PATCH v2] ivshmem: add a new PIO BAR3(Doorbell) besides MMIO BAR0 to reduce notification time
On Thu, Nov 17, 2011 at 10:50 PM, zanghongy...@huawei.com wrote: From: Hongyong Zang zanghongy...@huawei.com This patch, adds a PIO BAR3 for guest notifying qemu. And we find the new notification way of PIO BAR3 reduces 30% time in comparison with the original MMIO BAR0 way. Come to think of it, should we bump the PIO to BAR4 so that the shared memory region could be made a 64-bit BAR and therefore take up BAR2 and BAR3? Signed-off-by: Hongyong Zang zanghongy...@huawei.com --- hw/ivshmem.c | 24 ++-- kvm-all.c | 23 +++ kvm.h | 1 + 3 files changed, 46 insertions(+), 2 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 242fbea..031cdd8 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -29,6 +29,7 @@ #define IVSHMEM_MASTER 1 #define IVSHMEM_REG_BAR_SIZE 0x100 +#define IVSHIO_REG_BAR_SIZE 0x10 //#define DEBUG_IVSHMEM #ifdef DEBUG_IVSHMEM @@ -57,8 +58,10 @@ typedef struct IVShmemState { CharDriverState **eventfd_chr; CharDriverState *server_chr; MemoryRegion ivshmem_mmio; + MemoryRegion ivshmem_pio; pcibus_t mmio_addr; + pcibus_t pio_addr; /* We might need to register the BAR before we actually have the memory. * So prepare a container MemoryRegion for the BAR immediately and * add a subregion when we have the memory. @@ -234,7 +237,7 @@ static uint64_t ivshmem_io_read(void *opaque, target_phys_addr_t addr, return ret; } -static const MemoryRegionOps ivshmem_mmio_ops = { +static const MemoryRegionOps ivshmem_io_ops = { .read = ivshmem_io_read, .write = ivshmem_io_write, .endianness = DEVICE_NATIVE_ENDIAN, @@ -348,6 +351,8 @@ static void close_guest_eventfds(IVShmemState *s, int posn) for (i = 0; i guest_curr_max; i++) { kvm_set_ioeventfd_mmio_long(s-peers[posn].eventfds[i], s-mmio_addr + DOORBELL, (posn 16) | i, 0); + kvm_set_ioeventfd_pio_long(s-peers[posn].eventfds[i], + s-pio_addr + DOORBELL, (posn 16) | i, 0); close(s-peers[posn].eventfds[i]); } @@ -367,6 +372,12 @@ static void setup_ioeventfds(IVShmemState *s) { true, (i 16) | j, s-peers[i].eventfds[j]); + memory_region_add_eventfd(s-ivshmem_pio, + DOORBELL, + 4, + true, + (i 16) | j, + s-peers[i].eventfds[j]); } } } @@ -495,6 +506,10 @@ static void ivshmem_read(void *opaque, const uint8_t * buf, int flags) (incoming_posn 16) | guest_max_eventfd, 1) 0) { fprintf(stderr, ivshmem: ioeventfd not available\n); } + if (kvm_set_ioeventfd_pio_long(incoming_fd, s-pio_addr + DOORBELL, + (incoming_posn 16) | guest_max_eventfd, 1) 0) { + fprintf(stderr, ivshmem: ioeventfd not available\n); + } } return; @@ -656,8 +671,10 @@ static int pci_ivshmem_init(PCIDevice *dev) s-shm_fd = 0; - memory_region_init_io(s-ivshmem_mmio, ivshmem_mmio_ops, s, + memory_region_init_io(s-ivshmem_mmio, ivshmem_io_ops, s, ivshmem-mmio, IVSHMEM_REG_BAR_SIZE); + memory_region_init_io(s-ivshmem_pio, ivshmem_io_ops, s, + ivshmem-pio, IVSHIO_REG_BAR_SIZE); if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { setup_ioeventfds(s); @@ -666,6 +683,8 @@ static int pci_ivshmem_init(PCIDevice *dev) /* region for registers*/ pci_register_bar(s-dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, s-ivshmem_mmio); + pci_register_bar(s-dev, 3, PCI_BASE_ADDRESS_SPACE_IO, + s-ivshmem_pio); memory_region_init(s-bar, ivshmem-bar2-container, s-ivshmem_size); @@ -742,6 +761,7 @@ static int pci_ivshmem_uninit(PCIDevice *dev) IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); memory_region_destroy(s-ivshmem_mmio); + memory_region_destroy(s-ivshmem_pio); memory_region_del_subregion(s-bar, s-ivshmem); memory_region_destroy(s-ivshmem); memory_region_destroy(s-bar); diff --git a/kvm-all.c b/kvm-all.c index 5d500e1..737c2e2 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1396,6 +1396,29 @@ int kvm_set_ioeventfd_mmio_long(int fd, uint32_t addr, uint32_t val, bool assign return 0; } +int kvm_set_ioeventfd_pio_long(int fd, uint32_t addr, uint32_t val, bool assign) +{ + struct kvm_ioeventfd kick = { + .datamatch = val, + .addr = addr, + .len = 4, + .flags = KVM_IOEVENTFD_FLAG_DATAMATCH | KVM_IOEVENTFD_FLAG_PIO, + .fd = fd, + }; + int r; + if (!kvm_enabled()) { + return -ENOSYS;
Re: [Qemu-devel] [PATCH] ivshmem: fix guest unable to start with ioeventfd
On Thu, Nov 24, 2011 at 3:05 AM, zanghongy...@huawei.com wrote: From: Hongyong Zang zanghongy...@huawei.com When a guest boots with ioeventfd, an error (by gdb) occurs: Program received signal SIGSEGV, Segmentation fault. 0x006009cc in setup_ioeventfds (s=0x171dc40) at /home/louzhengwei/git_source/qemu-kvm/hw/ivshmem.c:363 363 for (j = 0; j s-peers[i].nb_eventfds; j++) { The bug is due to accessing s-peers which is NULL. Can you share the command-line that caused the fault? This patch uses the memory region API to replace the old one kvm_set_ioeventfd_mmio_long(). And this patch makes memory_region_add_eventfd() called in ivshmem_read() when qemu receives eventfd information from ivshmem_server. Should this patch be split into two patches, to separate the bug fix from the other changes related to the Memory API? Unless I misunderstand how the two are necessarily related. Cam Signed-off-by: Hongyong Zang zanghongy...@huawei.com --- hw/ivshmem.c | 41 ++--- 1 files changed, 14 insertions(+), 27 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 242fbea..be26f03 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -58,7 +58,6 @@ typedef struct IVShmemState { CharDriverState *server_chr; MemoryRegion ivshmem_mmio; - pcibus_t mmio_addr; /* We might need to register the BAR before we actually have the memory. * So prepare a container MemoryRegion for the BAR immediately and * add a subregion when we have the memory. @@ -346,8 +345,14 @@ static void close_guest_eventfds(IVShmemState *s, int posn) guest_curr_max = s-peers[posn].nb_eventfds; for (i = 0; i guest_curr_max; i++) { - kvm_set_ioeventfd_mmio_long(s-peers[posn].eventfds[i], - s-mmio_addr + DOORBELL, (posn 16) | i, 0); + if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { + memory_region_del_eventfd(s-ivshmem_mmio, + DOORBELL, + 4, + true, + (posn 16) | i, + s-peers[posn].eventfds[i]); + } close(s-peers[posn].eventfds[i]); } @@ -355,22 +360,6 @@ static void close_guest_eventfds(IVShmemState *s, int posn) s-peers[posn].nb_eventfds = 0; } -static void setup_ioeventfds(IVShmemState *s) { - - int i, j; - - for (i = 0; i = s-max_peer; i++) { - for (j = 0; j s-peers[i].nb_eventfds; j++) { - memory_region_add_eventfd(s-ivshmem_mmio, - DOORBELL, - 4, - true, - (i 16) | j, - s-peers[i].eventfds[j]); - } - } -} - /* this function increase the dynamic storage need to store data about other * guests */ static void increase_dynamic_storage(IVShmemState *s, int new_min_size) { @@ -491,10 +480,12 @@ static void ivshmem_read(void *opaque, const uint8_t * buf, int flags) } if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { - if (kvm_set_ioeventfd_mmio_long(incoming_fd, s-mmio_addr + DOORBELL, - (incoming_posn 16) | guest_max_eventfd, 1) 0) { - fprintf(stderr, ivshmem: ioeventfd not available\n); - } + memory_region_add_eventfd(s-ivshmem_mmio, + DOORBELL, + 4, + true, + (incoming_posn 16) | guest_max_eventfd, + incoming_fd); } return; @@ -659,10 +650,6 @@ static int pci_ivshmem_init(PCIDevice *dev) memory_region_init_io(s-ivshmem_mmio, ivshmem_mmio_ops, s, ivshmem-mmio, IVSHMEM_REG_BAR_SIZE); - if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { - setup_ioeventfds(s); - } - /* region for registers*/ pci_register_bar(s-dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, s-ivshmem_mmio); -- 1.7.1
Re: [Qemu-devel] [PATCH] ivshmem: use PIO for BAR0(Doorbell) instead of MMIO to reduce notification time
On Sun, Nov 13, 2011 at 8:56 PM, zanghongy...@huawei.com wrote: From: Hongyong Zang zanghongy...@huawei.com Ivshmem(nahanni) is a mechanism for sharing host memory with VMs running on the same host. Currently, guest notifies qemu by reading or writing ivshmem device's PCI MMIO BAR0(Doorbell). This patch, changes this PCI MMIO BAR0(Doorbell) to PIO. And we find guest accesses PIO BAR 30% faster than MMIO BAR. Nice work :) Test it with: Call 5,000,000 times writing PCI BAR0's DOORBELL register, we got the total time as follows: linux command #time: MMIO(regular interrupt) PIO(regular interrupt) MMIO(msi+ioeventfd) PIO(msi+ioeventfd) real 101.441s 68.863s 70.720s 49.521s user 0.391s 0.305s 0.404s 0.340s sys 46.308s 30.634s 38.740s 27.559s Did you pin the VMs to cores? You're sending between 5-10 notifications per second, did you confirm that they are all being received? Since eventfds do not buffer, some may be lost at that rate. Of course, one would expect that a single notification should be faster based on these results, but I'm just curious. Do you know of any issues with mapping a PIO region to user-space with the UIO driver framework? Thanks, Cam Signed-off-by: Hongyong Zang zanghongy...@huawei.com --- hw/ivshmem.c | 26 +- kvm-all.c | 23 +++ kvm.h | 1 + 3 files changed, 37 insertions(+), 13 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 242fbea..e68d0a7 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -28,7 +28,7 @@ #define IVSHMEM_PEER 0 #define IVSHMEM_MASTER 1 -#define IVSHMEM_REG_BAR_SIZE 0x100 +#define IVSHMEM_REG_BAR_SIZE 0x10 //#define DEBUG_IVSHMEM #ifdef DEBUG_IVSHMEM @@ -56,9 +56,9 @@ typedef struct IVShmemState { CharDriverState **eventfd_chr; CharDriverState *server_chr; - MemoryRegion ivshmem_mmio; + MemoryRegion ivshmem_pio; - pcibus_t mmio_addr; + pcibus_t pio_addr; /* We might need to register the BAR before we actually have the memory. * So prepare a container MemoryRegion for the BAR immediately and * add a subregion when we have the memory. @@ -234,7 +234,7 @@ static uint64_t ivshmem_io_read(void *opaque, target_phys_addr_t addr, return ret; } -static const MemoryRegionOps ivshmem_mmio_ops = { +static const MemoryRegionOps ivshmem_pio_ops = { .read = ivshmem_io_read, .write = ivshmem_io_write, .endianness = DEVICE_NATIVE_ENDIAN, @@ -346,8 +346,8 @@ static void close_guest_eventfds(IVShmemState *s, int posn) guest_curr_max = s-peers[posn].nb_eventfds; for (i = 0; i guest_curr_max; i++) { - kvm_set_ioeventfd_mmio_long(s-peers[posn].eventfds[i], - s-mmio_addr + DOORBELL, (posn 16) | i, 0); + kvm_set_ioeventfd_pio_long(s-peers[posn].eventfds[i], + s-pio_addr + DOORBELL, (posn 16) | i, 0); close(s-peers[posn].eventfds[i]); } @@ -361,7 +361,7 @@ static void setup_ioeventfds(IVShmemState *s) { for (i = 0; i = s-max_peer; i++) { for (j = 0; j s-peers[i].nb_eventfds; j++) { - memory_region_add_eventfd(s-ivshmem_mmio, + memory_region_add_eventfd(s-ivshmem_pio, DOORBELL, 4, true, @@ -491,7 +491,7 @@ static void ivshmem_read(void *opaque, const uint8_t * buf, int flags) } if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { - if (kvm_set_ioeventfd_mmio_long(incoming_fd, s-mmio_addr + DOORBELL, + if (kvm_set_ioeventfd_pio_long(incoming_fd, s-pio_addr + DOORBELL, (incoming_posn 16) | guest_max_eventfd, 1) 0) { fprintf(stderr, ivshmem: ioeventfd not available\n); } @@ -656,16 +656,16 @@ static int pci_ivshmem_init(PCIDevice *dev) s-shm_fd = 0; - memory_region_init_io(s-ivshmem_mmio, ivshmem_mmio_ops, s, - ivshmem-mmio, IVSHMEM_REG_BAR_SIZE); + memory_region_init_io(s-ivshmem_pio, ivshmem_pio_ops, s, + ivshmem-pio, IVSHMEM_REG_BAR_SIZE); if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD)) { setup_ioeventfds(s); } /* region for registers*/ - pci_register_bar(s-dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, - s-ivshmem_mmio); + pci_register_bar(s-dev, 0, PCI_BASE_ADDRESS_SPACE_IO, + s-ivshmem_pio); memory_region_init(s-bar, ivshmem-bar2-container, s-ivshmem_size); @@ -741,7 +741,7 @@ static int pci_ivshmem_uninit(PCIDevice *dev) { IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); - memory_region_destroy(s-ivshmem_mmio); +
Re: [Qemu-devel] Questions regarding ivshmem spec
On Thu, Aug 25, 2011 at 7:29 AM, Sasha Levin levinsasha...@gmail.com wrote: Hello, I am looking to implement an ivshmem device for KVM tools, the purpose is to provide same functionality as QEMU and interoperability with QEMU. Going through the spec (I found here: https://gitorious.org/nahanni/guest-code/blobs/master/device_spec.txt ) and the code in QEMU I have gathered several questions, I'll be happy for some help with it. 1. File handles and guest IDs are passed between the server and the peers using sockets, is the protocol itself documented anywhere? I would like to be able to work alongside QEMU servers/peers. 2. The spec describes DOORBELL as an array of DWORDs, when one guest wants to poke a different guest it would write something into the offset of the other guest in the DOORBELL array. Looking at the implementation in QEMU, DOORBELL is one DWORD, when writing to it the upper WORD is the guest id and the lower WORD is the value. What am I missing here? 3. There are 3 ways for guests to communicate between each other, and I'm assuming all guests using the same SHM block must use the same method. Is it safe to assume we'll always use ioeventfds as in the implementation now? Guests can either access the shared memory region directly or use the server, so I would count that as 2 ways to share memory via ivshmem. Can you clarify what you mean by ways for guests to communicate? As for ioeventfds, they are an optional optimization to eventfds. We use eventfds currently, but since the guest VMs only see file descriptors (passed from the server) another mechanism could replace eventfds if there was some reason to. Cam
Re: [Qemu-devel] Questions regarding ivshmem spec
On Mon, Aug 29, 2011 at 9:25 AM, Sasha Levin levinsasha...@gmail.com wrote: On Thu, 2011-08-25 at 16:29 +0300, Sasha Levin wrote: Hello, I am looking to implement an ivshmem device for KVM tools, the purpose is to provide same functionality as QEMU and interoperability with QEMU. [snip] 1. File handles and guest IDs are passed between the server and the peers using sockets, is the protocol itself documented anywhere? I would like to be able to work alongside QEMU servers/peers. I'm still wondering if someone could do a quick sketch of the client-server protocol or possibly point me to something that documents it? Hi Sasha, I have something like that. I'll be in touch when I find it. Cam -- Sasha.
Re: [Qemu-devel] [PATCH] vmstate: Add unmigratable flag
On Mon, Jun 20, 2011 at 3:05 AM, Jan Kiszka jan.kis...@siemens.com wrote: On 2011-06-19 22:46, Cam Macdonell wrote: On Thu, Jun 9, 2011 at 2:39 PM, Jan Kiszka jan.kis...@web.de wrote: On 2011-06-09 22:00, Anthony Liguori wrote: On 06/09/2011 11:44 AM, Jan Kiszka wrote: A first step towards getting rid of register_device_unmigratable (ivshmem and lacking vmstate support in virtio are blocking this): Allow to register an unmigratable vmstate via qdev, i.e. tag a device declaratively. I thought part of the problem with this was that for some devices (like ivshmem), whether it can be migrated was dynamic. It depends on configuration, state, etc. That only applies to ivshmem (the other user is device assignment which is unconditionally unmigratable). And the ivshmem issue could easily be solved by defining two devices, ivshmem-peer (or just ivshmem) and ivshmem-master, eliminating the need for the role property. I don't think there will ever be a use case for a transformer device that becomes unmigratable during runtime (would be a nightmare for management apps anyway). If breaking the user interface of ivshmem for this is OK, I'll post a patch. Jan The migratability of ivshmem is not dynamic in that it doesn't change at runtime, it's set when the device is created, either role=peer or role=master is specified. So iiuc, this could work with ivshmem. So you are fine with breaking the interface? Everyone else as well? Then I'll cook a patch to sort at least this out for 0.15. To be clear, this would break the interface in that a device cannot specify whether it's migratable via a parameter? Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH] vmstate: Add unmigratable flag
On Thu, Jun 9, 2011 at 2:39 PM, Jan Kiszka jan.kis...@web.de wrote: On 2011-06-09 22:00, Anthony Liguori wrote: On 06/09/2011 11:44 AM, Jan Kiszka wrote: A first step towards getting rid of register_device_unmigratable (ivshmem and lacking vmstate support in virtio are blocking this): Allow to register an unmigratable vmstate via qdev, i.e. tag a device declaratively. I thought part of the problem with this was that for some devices (like ivshmem), whether it can be migrated was dynamic. It depends on configuration, state, etc. That only applies to ivshmem (the other user is device assignment which is unconditionally unmigratable). And the ivshmem issue could easily be solved by defining two devices, ivshmem-peer (or just ivshmem) and ivshmem-master, eliminating the need for the role property. I don't think there will ever be a use case for a transformer device that becomes unmigratable during runtime (would be a nightmare for management apps anyway). If breaking the user interface of ivshmem for this is OK, I'll post a patch. Jan The migratability of ivshmem is not dynamic in that it doesn't change at runtime, it's set when the device is created, either role=peer or role=master is specified. So iiuc, this could work with ivshmem. Cam
Re: [Qemu-devel] [PATCH] RFC: delay pci_update_mappings for 64-bit BARs
On Mon, Dec 13, 2010 at 8:00 PM, Isaku Yamahata yamah...@valinux.co.jp wrote: On Mon, Dec 13, 2010 at 03:43:44PM -0700, Cam Macdonell wrote: Do not call pci_update_mappings on the lower 32-bits of a 64-bit bar. Wait for the upper 32 or else Qemu will try to map on just the lower 32 which is probably going to corrupt memory. I was encountering crashes when mapping certain PCI region sizes. The problem turns out that pci_update_mappings is being called without all 64-bits in the BAR. For example when mapping to 0x18000, once the lower 32-bits were written the remapping happened (mapping to 0x800) which would overwrite something. I'm not certain if this is completely correct, I'm simply testing the lower 4-bits to only be MEM_TYPE_64 flag. Upper 32-bit address parts can be values like 0xff which is tricky to test against. You're assuming that guest OS always write lower 32bit and them upper 32bit. Is the assumption correct? I found Linux does, but I don't know about other OSes. And I couldn't find any sentence about how to update (64bit) BAR in the specs. (Please correct me if I missed it) I think you're right, we probably can't assume the order. Some work around would be necessary regardless of 32bit-or-64bit. because qemu doesn't emulate bus accurately at the moment. How about the followings? If BAR overlaps with RAM, don't map BAR. If BAR overlaps with other BARs, record the overlapping and when updating one of the BARs, update all the overlapping BARs. Which BAR wins depends on the order of updating, it doesn't matter because it's anomaly case. But the addresses in the BARs may not overlap. For example, Linux allocates memory from top down, so I recently had the mapping of a BAR to address 0xffc000 So BAR 0x18 sees 0xc004 Then BAR 0x1c sees 0xff So if I understand what you mean by overlapping BARs, 0xc000 and 0xffc000 will not be detected as overlapping and so we can't record it. But, we can allow harmless mappings of the incomplete lower-32 to proceed and then get remapped when the upper bits are written. (This is what happens currently, but fails when the lower-32 overwrite RAM). Case of writing upper-then-lower (non-Linux case): The addresses in the upper 32-bits are going to be limited to 16-bits (at most 48-bit addresses currently) and so those shouldn't update mappings because they will overlap with RAM. When the lower-bits are written, we have the full 64-bit address and can update mappings. Case of writing lower-then-upper: If the lower 32-bit BAR address doesn't conflict with RAM, map it. When the upper bits are written, update to the correct mapping. We would just have to ensure the first mapping is indeed harmless. Would that work? Cam This way, 32bit BAR case is also covered. thanks, Cam --- hw/pci.c | 5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 438c0d1..3b81792 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -1000,6 +1000,9 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l) { int i, was_irq_disabled = pci_irq_disabled(d); uint32_t config_size = pci_config_size(d); + int is_64 = 0; + + is_64 = ((val 0xf) == PCI_BASE_ADDRESS_MEM_TYPE_64); for (i = 0; i l addr + i config_size; val = 8, ++i) { uint8_t wmask = d-wmask[addr + i]; @@ -1008,7 +1011,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l) d-config[addr + i] = (d-config[addr + i] ~wmask) | (val wmask); d-config[addr + i] = ~(val w1cmask); /* W1C: Write 1 to Clear */ } - if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || + if ((ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) (!is_64)) || ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) || ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) || range_covers_byte(addr, l, PCI_COMMAND)) -- 1.7.0.4 -- yamahata
[Qemu-devel] [PATCH] RFC: delay pci_update_mappings for 64-bit BARs
Do not call pci_update_mappings on the lower 32-bits of a 64-bit bar. Wait for the upper 32 or else Qemu will try to map on just the lower 32 which is probably going to corrupt memory. I was encountering crashes when mapping certain PCI region sizes. The problem turns out that pci_update_mappings is being called without all 64-bits in the BAR. For example when mapping to 0x18000, once the lower 32-bits were written the remapping happened (mapping to 0x800) which would overwrite something. I'm not certain if this is completely correct, I'm simply testing the lower 4-bits to only be MEM_TYPE_64 flag. Upper 32-bit address parts can be values like 0xff which is tricky to test against. Cam --- hw/pci.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 438c0d1..3b81792 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -1000,6 +1000,9 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l) { int i, was_irq_disabled = pci_irq_disabled(d); uint32_t config_size = pci_config_size(d); +int is_64 = 0; + +is_64 = ((val 0xf) == PCI_BASE_ADDRESS_MEM_TYPE_64); for (i = 0; i l addr + i config_size; val = 8, ++i) { uint8_t wmask = d-wmask[addr + i]; @@ -1008,7 +1011,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l) d-config[addr + i] = (d-config[addr + i] ~wmask) | (val wmask); d-config[addr + i] = ~(val w1cmask); /* W1C: Write 1 to Clear */ } -if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || +if ((ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) (!is_64)) || ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) || ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) || range_covers_byte(addr, l, PCI_COMMAND)) -- 1.7.0.4
[Qemu-devel] [PATCH 1/2] add qemu_ram_free_from_ptr
add function to free memory from Qemu that was added via qemu_ram_alloc_from_ptr. Name is a little weird. This is copied from qemu_ram_unmap from qemu-kvm. Signed-off-by: Cam Macdonell c...@cs.ualberta.ca --- cpu-common.h |1 + exec.c | 13 + 2 files changed, 14 insertions(+), 0 deletions(-) diff --git a/cpu-common.h b/cpu-common.h index a543b5d..3f802e1 100644 --- a/cpu-common.h +++ b/cpu-common.h @@ -44,6 +44,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name, ram_addr_t size, void *host); ram_addr_t qemu_ram_alloc(DeviceState *dev, const char *name, ram_addr_t size); void qemu_ram_free(ram_addr_t addr); +void qemu_ram_free_from_ptr(ram_addr_t addr); /* This should only be used for ram local to a device. */ void *qemu_get_ram_ptr(ram_addr_t addr); /* This should not be used by devices. */ diff --git a/exec.c b/exec.c index db9ff55..1f5c8f8 100644 --- a/exec.c +++ b/exec.c @@ -2875,6 +2875,19 @@ ram_addr_t qemu_ram_alloc(DeviceState *dev, const char *name, ram_addr_t size) return qemu_ram_alloc_from_ptr(dev, name, size, NULL); } +void qemu_ram_free_from_ptr(ram_addr_t addr) +{ +RAMBlock *block; + +QLIST_FOREACH(block, ram_list.blocks, next) { +if (addr == block-offset) { +QLIST_REMOVE(block, next); +qemu_free(block); +return; +} +} +} + void qemu_ram_free(ram_addr_t addr) { RAMBlock *block; -- 1.7.0.4
[Qemu-devel] [PATCH 2/2] Unregister shared memory on unplug.
This allows 'peer' ivshmem guests to detach from shared memory before migration and re-attach after migration is complete. Signed-off-by: Cam Macdonell c...@cs.ualberta.ca --- hw/ivshmem.c | 12 1 files changed, 8 insertions(+), 4 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index 06dce70..9254fad 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -68,6 +68,7 @@ typedef struct IVShmemState { int nb_peers; /* how many guests we have space for */ int max_peer; /* maximum numbered peer */ +void *shm_ptr; int vm_id; uint32_t vectors; uint32_t features; @@ -365,14 +366,13 @@ static int check_shm_size(IVShmemState *s, int fd) { * create the BAR and map the memory immediately */ static void create_shared_memory_BAR(IVShmemState *s, int fd) { -void * ptr; - s-shm_fd = fd; -ptr = mmap(0, s-ivshmem_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); +s-shm_ptr = mmap(0, s-ivshmem_size, PROT_READ|PROT_WRITE, +MAP_SHARED, fd, 0); s-ivshmem_offset = qemu_ram_alloc_from_ptr(s-dev.qdev, ivshmem.bar2, -s-ivshmem_size, ptr); +s-ivshmem_size, s-shm_ptr); /* region for shared memory */ pci_register_bar(s-dev, 2, s-ivshmem_size, @@ -797,6 +797,10 @@ static int pci_ivshmem_uninit(PCIDevice *dev) { IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev); +cpu_register_physical_memory(s-shm_pci_addr, s-ivshmem_size, +IO_MEM_UNASSIGNED); +qemu_ram_free_from_ptr(s-ivshmem_offset); +munmap(s-shm_ptr, s-ivshmem_size); cpu_unregister_io_memory(s-ivshmem_mmio_io_addr); unregister_savevm(dev-qdev, ivshmem, s); -- 1.7.0.4
[Qemu-devel] [PATCH 0/2] fixups for ivshmem unplug/migration
We added qemu_ram_alloc_from_ptr, but we need its corresponding free function. With that, we can properly remove the ivshmem memory on unplug. Cam Macdonell (2): add qemu_ram_free_from_ptr Unregister shared memory on unplug. cpu-common.h |1 + exec.c | 13 + hw/ivshmem.c | 12 3 files changed, 22 insertions(+), 4 deletions(-)
[Qemu-devel] Re: [PATCH] pci: allow hotplug removal of cold-plugged devices
On Sun, Nov 14, 2010 at 7:18 AM, Michael S. Tsirkin m...@redhat.com wrote: This patch fixes 5beb8ad503c88a76f2b8106c3b74b4ce485a60e1 which broke hotplug removal of cold plugged devices: - pass addition/removal state to hotplug callbacks - use that in piix and pcie This also fixes an assert on hotplug removal of coldplugged express devices. Reported-by: by Cam Macdonell c...@cs.ualberta.ca. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp Signed-off-by: Michael S. Tsirkin m...@redhat.com Acked-by: Cam Macdonell c...@cs.ualberta.ca --- So I think the below would be the cleanest way to fix the bug as we keep the hot/cold plug logic in a central palce. Untested. Comments? Cam? Yes, it seems to fix the problem. diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c index 66c7885..f549089 100644 --- a/hw/acpi_piix4.c +++ b/hw/acpi_piix4.c @@ -585,7 +585,8 @@ static void pciej_write(void *opaque, uint32_t addr, uint32_t val) PIIX4_DPRINTF(pciej write %x == %d\n, addr, val); } -static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, int state); +static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, + PCIHotplugState state); static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s) { @@ -615,18 +616,23 @@ static void disable_device(PIIX4PMState *s, int slot) s-pci0_status.down |= (1 slot); } -static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, int state) +static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, + PCIHotplugState state) { int slot = PCI_SLOT(dev-devfn); PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev, DO_UPCAST(PCIDevice, qdev, qdev)); - if (!dev-qdev.hotplugged) + /* Don't send event when device is enabled during qemu machine creation: + * it is present on boot, no hotplug event is necessary. We do send an + * event when the device is disabled later. */ + if (state == PCI_COLDPLUG_ENABLED) { return 0; + } s-pci0_status.up = 0; s-pci0_status.down = 0; - if (state) { + if (state == PCI_HOTPLUG_ENABLED) { enable_device(s, slot); } else { disable_device(s, slot); diff --git a/hw/pci.c b/hw/pci.c index 30e1603..316b24f 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -1566,8 +1566,11 @@ static int pci_qdev_init(DeviceState *qdev, DeviceInfo *base) pci_add_option_rom(pci_dev); if (bus-hotplug) { - /* lower layer must check qdev-hotplugged */ - rc = bus-hotplug(bus-hotplug_qdev, pci_dev, 1); + /* Let buses differentiate between hotplug and when device is + * enabled during qemu machine creation. */ + rc = bus-hotplug(bus-hotplug_qdev, pci_dev, + qdev-hotplugged ? PCI_HOTPLUG_ENABLED: + PCI_COLDPLUG_ENABLED); if (rc != 0) { int r = pci_unregister_device(pci_dev-qdev); assert(!r); @@ -1581,7 +1584,8 @@ static int pci_unplug_device(DeviceState *qdev) { PCIDevice *dev = DO_UPCAST(PCIDevice, qdev, qdev); - return dev-bus-hotplug(dev-bus-hotplug_qdev, dev, 0); + return dev-bus-hotplug(dev-bus-hotplug_qdev, dev, + PCI_HOTPLUG_DISABLED); } void pci_qdev_register(PCIDeviceInfo *info) diff --git a/hw/pci.h b/hw/pci.h index 7100804..09b3e4c 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -214,7 +214,15 @@ int pci_device_load(PCIDevice *s, QEMUFile *f); typedef void (*pci_set_irq_fn)(void *opaque, int irq_num, int level); typedef int (*pci_map_irq_fn)(PCIDevice *pci_dev, int irq_num); -typedef int (*pci_hotplug_fn)(DeviceState *qdev, PCIDevice *pci_dev, int state); + +typedef enum { + PCI_HOTPLUG_DISABLED, + PCI_HOTPLUG_ENABLED, + PCI_COLDPLUG_ENABLED, +} PCIHotplugState; + +typedef int (*pci_hotplug_fn)(DeviceState *qdev, PCIDevice *pci_dev, + PCIHotplugState state); void pci_bus_new_inplace(PCIBus *bus, DeviceState *parent, const char *name, int devfn_min); PCIBus *pci_bus_new(DeviceState *parent, const char *name, int devfn_min); diff --git a/hw/pcie.c b/hw/pcie.c index 35918f7..4df48b8 100644 --- a/hw/pcie.c +++ b/hw/pcie.c @@ -192,14 +192,16 @@ static void pcie_cap_slot_event(PCIDevice *dev, PCIExpressHotPlugEvent event) } static int pcie_cap_slot_hotplug(DeviceState *qdev, - PCIDevice *pci_dev, int state) + PCIDevice *pci_dev, PCIHotplugState state) { PCIDevice *d = DO_UPCAST(PCIDevice, qdev, qdev); uint8_t *exp_cap = d-config + d-exp.exp_cap; uint16_t sltsta = pci_get_word(exp_cap + PCI_EXP_SLTSTA); - if (!pci_dev-qdev.hotplugged) { - assert(state); /* this case only happens at machine creation. */ + /* Don't send event when device
[Qemu-devel] Cannot not unplug cold-plugged devices
Hi, I was trying to do a device_del on my ivshmem device and it won't work unless the device is added via hotplug. If the device is coldplugged (added at startup) then nothing happens. I think I tracked this behaviour to the patch below. Is not allowing coldplugged devices to be unplugged the desired behaviour? Thanks, Cam commit 5beb8ad503c88a76f2b8106c3b74b4ce485a60e1 Author: Isaku Yamahata yamah...@valinux.co.jp Date: Mon Sep 6 16:46:18 2010 +0900 pci: call hotplug callback even when not hotplug case for later use. call hotplug callback even when not hotplug case for later use. And move hotplug check into hotplug callback. PCIE slot needs this for card presence detection. Signed-off-by: Isaku Yamahata yamah...@valinux.co.jp Signed-off-by: Michael S. Tsirkin m...@redhat.com diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c index bfa1d9a..24dfcf2 100644 --- a/hw/acpi_piix4.c +++ b/hw/acpi_piix4.c @@ -611,6 +611,9 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev, int state) PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev, DO_UPCAST(PCIDevice, qdev, qdev)); +if (!dev-qdev.hotplugged) +return 0; + s-pci0_status.up = 0; s-pci0_status.down = 0; if (state) {
[Qemu-devel] ACPI error when mapping a 2GB BAR w/ 4GB of RAM
After fixing the resource_size_t return value with pci_resource_alignment, I see one other strange behaviour only when using 4GB of RAM and a 2GB BAR. I haven't found any other combination of RAM/BAR size that triggers this bug. I am using 2.6.36-rc3. ACPI Error: The DSDT has been corrupted or replaced - old, new headers below (20100702/tbutils-372) ACPI: DSDT (null) 01F15 (v01 BXPC BXDSDT 0001 INTL 20090123) ACPI: (null) 0 (v00 ) ACPI Error: Please send DMI info to linux-a...@vger.kernel.org If system does not work as expected, please boot with acpi=copy_dsdt (20100702/tbutils-378) ACPI: PCI Interrupt Link [LNKC] disabled and referenced, BIOS bug ACPI Exception: AE_AML_INVALID_RESOURCE_TYPE, Evaluating _CRS (20100702/pci_link-283) ACPI: Unable to set IRQ for PCI Interrupt Link [LNKC]. Try pci=noacpi or acpi=off virtio-pci :00:03.0: PCI INT A: no GSI - using ISA IRQ 11 Non-volatile memory driver v1.3 Linux agpgart interface v0.103 Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled with acpi=off, the ACPI error output goes away, but the boot does not get any further. here are the PCI writes from Qemu related to this BAR's assignment in case they are helpful, pci_write_config: (val) 0x8004 - 0x18 (addr) IVSHMEM: guest pci addr = 8000, guest h/w addr = 4312137728, size = 8000 pci_read_config: (val) 0x8004 - 0x18 (addr) pci_write_config: (val) 0x1 - 0x1c (addr) IVSHMEM: guest pci addr = 18000, guest h/w addr = 4312137728, size = 8000 pci_read_config: (val) 0x1 - 0x1c (addr) Any pointers are appreciated, Cam
Re: [Qemu-devel] Re: ACPI error when mapping a 2GB BAR w/ 4GB of RAM
On Fri, Sep 17, 2010 at 2:04 PM, Chris Wright chr...@redhat.com wrote: * Cam Macdonell (c...@cs.ualberta.ca) wrote: After fixing the resource_size_t return value with pci_resource_alignment, I see one other strange behaviour only when using 4GB of RAM and a 2GB BAR. I haven't found any other combination of RAM/BAR size that triggers this bug. I am using 2.6.36-rc3. ACPI Error: The DSDT has been corrupted or replaced - old, new headers below (20100702/tbutils-372) ACPI: DSDT (null) 01F15 (v01 BXPC BXDSDT 0001 INTL 20090123) ACPI: (null) 0 (v00 ) ACPI Error: Please send DMI info to linux-a...@vger.kernel.org If system does not work as expected, please boot with acpi=copy_dsdt (20100702/tbutils-378) ACPI: PCI Interrupt Link [LNKC] disabled and referenced, BIOS bug ACPI Exception: AE_AML_INVALID_RESOURCE_TYPE, Evaluating _CRS (20100702/pci_link-283) ACPI: Unable to set IRQ for PCI Interrupt Link [LNKC]. Try pci=noacpi or acpi=off virtio-pci :00:03.0: PCI INT A: no GSI - using ISA IRQ 11 Non-volatile memory driver v1.3 Linux agpgart interface v0.103 Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled IIRC, the pci hole is only 1.5G in the BIOS, can you verify that seabios is doing the right thing? I'm not sure what the right thing for seabios to do is. Here is the seabios output related to the device. PCI: bus=0 devfn=0x20: vendor_id=0x1af4 device_id=0x1110 region 0: 0xf204 init smm init boot device ordering snip Attempting to init PCI bdf 00:04.0 (vd 1af4:1110) Attempting to map option rom on dev 00:04.0 Option rom sizing returned 0 0 Checking rom 0x000c9800 (sig aa55 size 17) Checking rom 0x000cc000 (sig aa55 size 2) Checking rom 0x000c9000 (sig aa55 size 4) Checking rom 0x000c9800 (sig aa55 size 17) Checking rom 0x000cc000 (sig aa55 size 2) Mapping hd drive 0x000fdb50 to 0 Running option rom at c980:0003 Running option rom at cc00:0003 pmm_malloc zone=0x000f515c handle= size=36 align=10 ret=0x000fdaf0 (detail=0x7ffefca0) ebda moved from 9fc00 to 9f400 pmm_malloc zone=0x000f5154 handle= size=2048 align=10 ret=0x0009f800 (detail=0x7ffefb40) finalize PMM malloc finalize when using a BAR of 2GB or less, there is an additional write to the PCI space of the device, which may be from the bios pci_write_config: (val) 0x - 0x18 (addr) pci_read_config: (val) 0x8004 - 0x18 (addr) pci_write_config: (val) 0x4 - 0x18 (addr) pci_write_config: (val) 0x3 - 0x4 (addr) pci_read_config: (val) 0x0 - 0x1c (addr) pci_write_config: (val) 0x - 0x1c (addr) IVSHMEM: guest pci addr = , guest h/w addr = 4312137728, size = 8000 so is it succeeding with smaller sizes ( 2GB) because it fits in the bios' pci hole? Cam thanks, -chris
Re: [Qemu-devel] Re: ACPI error when mapping a 2GB BAR w/ 4GB of RAM
On Fri, Sep 17, 2010 at 2:52 PM, Cam Macdonell c...@cs.ualberta.ca wrote: On Fri, Sep 17, 2010 at 2:04 PM, Chris Wright chr...@redhat.com wrote: * Cam Macdonell (c...@cs.ualberta.ca) wrote: After fixing the resource_size_t return value with pci_resource_alignment, I see one other strange behaviour only when using 4GB of RAM and a 2GB BAR. I haven't found any other combination of RAM/BAR size that triggers this bug. I am using 2.6.36-rc3. ACPI Error: The DSDT has been corrupted or replaced - old, new headers below (20100702/tbutils-372) ACPI: DSDT (null) 01F15 (v01 BXPC BXDSDT 0001 INTL 20090123) ACPI: (null) 0 (v00 ) ACPI Error: Please send DMI info to linux-a...@vger.kernel.org If system does not work as expected, please boot with acpi=copy_dsdt (20100702/tbutils-378) ACPI: PCI Interrupt Link [LNKC] disabled and referenced, BIOS bug ACPI Exception: AE_AML_INVALID_RESOURCE_TYPE, Evaluating _CRS (20100702/pci_link-283) ACPI: Unable to set IRQ for PCI Interrupt Link [LNKC]. Try pci=noacpi or acpi=off virtio-pci :00:03.0: PCI INT A: no GSI - using ISA IRQ 11 Non-volatile memory driver v1.3 Linux agpgart interface v0.103 Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled IIRC, the pci hole is only 1.5G in the BIOS, can you verify that seabios is doing the right thing? I'm not sure what the right thing for seabios to do is. Here is the seabios output related to the device. PCI: bus=0 devfn=0x20: vendor_id=0x1af4 device_id=0x1110 region 0: 0xf204 init smm init boot device ordering snip Attempting to init PCI bdf 00:04.0 (vd 1af4:1110) Attempting to map option rom on dev 00:04.0 Option rom sizing returned 0 0 Checking rom 0x000c9800 (sig aa55 size 17) Checking rom 0x000cc000 (sig aa55 size 2) Checking rom 0x000c9000 (sig aa55 size 4) Checking rom 0x000c9800 (sig aa55 size 17) Checking rom 0x000cc000 (sig aa55 size 2) Mapping hd drive 0x000fdb50 to 0 Running option rom at c980:0003 Running option rom at cc00:0003 pmm_malloc zone=0x000f515c handle= size=36 align=10 ret=0x000fdaf0 (detail=0x7ffefca0) ebda moved from 9fc00 to 9f400 pmm_malloc zone=0x000f5154 handle= size=2048 align=10 ret=0x0009f800 (detail=0x7ffefb40) finalize PMM malloc finalize when using a BAR of 2GB or less, there is an additional write to the PCI space of the device, which may be from the bios pci_write_config: (val) 0x - 0x18 (addr) pci_read_config: (val) 0x8004 - 0x18 (addr) pci_write_config: (val) 0x4 - 0x18 (addr) pci_write_config: (val) 0x3 - 0x4 (addr) pci_read_config: (val) 0x0 - 0x1c (addr) pci_write_config: (val) 0x - 0x1c (addr) IVSHMEM: guest pci addr = , guest h/w addr = 4312137728, size = 8000 so is it succeeding with smaller sizes ( 2GB) because it fits in the bios' pci hole? sorry that should be 2GB. Cam thanks, -chris
Re: [Qemu-devel] Re: ACPI error when mapping a 2GB BAR w/ 4GB of RAM
On Fri, Sep 17, 2010 at 3:15 PM, Chris Wright chr...@redhat.com wrote: * Cam Macdonell (c...@cs.ualberta.ca) wrote: On Fri, Sep 17, 2010 at 2:52 PM, Cam Macdonell c...@cs.ualberta.ca wrote: On Fri, Sep 17, 2010 at 2:04 PM, Chris Wright chr...@redhat.com wrote: * Cam Macdonell (c...@cs.ualberta.ca) wrote: After fixing the resource_size_t return value with pci_resource_alignment, I see one other strange behaviour only when using 4GB of RAM and a 2GB BAR. I haven't found any other combination of RAM/BAR size that triggers this bug. I am using 2.6.36-rc3. ACPI Error: The DSDT has been corrupted or replaced - old, new headers below (20100702/tbutils-372) ACPI: DSDT (null) 01F15 (v01 BXPC BXDSDT 0001 INTL 20090123) ACPI: (null) 0 (v00 ) ACPI Error: Please send DMI info to linux-a...@vger.kernel.org If system does not work as expected, please boot with acpi=copy_dsdt (20100702/tbutils-378) ACPI: PCI Interrupt Link [LNKC] disabled and referenced, BIOS bug ACPI Exception: AE_AML_INVALID_RESOURCE_TYPE, Evaluating _CRS (20100702/pci_link-283) ACPI: Unable to set IRQ for PCI Interrupt Link [LNKC]. Try pci=noacpi or acpi=off virtio-pci :00:03.0: PCI INT A: no GSI - using ISA IRQ 11 Non-volatile memory driver v1.3 Linux agpgart interface v0.103 Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled IIRC, the pci hole is only 1.5G in the BIOS, can you verify that seabios is doing the right thing? I'm not sure what the right thing for seabios to do is. Here is the seabios output related to the device. PCI: bus=0 devfn=0x20: vendor_id=0x1af4 device_id=0x1110 region 0: 0xf204 init smm init boot device ordering snip Attempting to init PCI bdf 00:04.0 (vd 1af4:1110) Attempting to map option rom on dev 00:04.0 Option rom sizing returned 0 0 Checking rom 0x000c9800 (sig aa55 size 17) Checking rom 0x000cc000 (sig aa55 size 2) Checking rom 0x000c9000 (sig aa55 size 4) Checking rom 0x000c9800 (sig aa55 size 17) Checking rom 0x000cc000 (sig aa55 size 2) Mapping hd drive 0x000fdb50 to 0 Running option rom at c980:0003 Running option rom at cc00:0003 pmm_malloc zone=0x000f515c handle= size=36 align=10 ret=0x000fdaf0 (detail=0x7ffefca0) ebda moved from 9fc00 to 9f400 pmm_malloc zone=0x000f5154 handle= size=2048 align=10 ret=0x0009f800 (detail=0x7ffefb40) finalize PMM malloc finalize when using a BAR of 2GB or less, there is an additional write to the PCI space of the device, which may be from the bios pci_write_config: (val) 0x - 0x18 (addr) pci_read_config: (val) 0x8004 - 0x18 (addr) pci_write_config: (val) 0x4 - 0x18 (addr) pci_write_config: (val) 0x3 - 0x4 (addr) pci_read_config: (val) 0x0 - 0x1c (addr) pci_write_config: (val) 0x - 0x1c (addr) IVSHMEM: guest pci addr = , guest h/w addr = 4312137728, size = 8000 so is it succeeding with smaller sizes ( 2GB) because it fits in the bios' pci hole? sorry that should be 2GB. It seems most likely... 2GB also means = 1GB (which would fit in the hole). Although, I have to admit, I'm not sure how seabios handles the hole nowadays. What about 2GB with 32bit BAR? The bios seems to ignore it and the guest can't map it as it cannot find an alignment for it. Seems strange since with 1GB of RAM and 2GB BAR, everything should map below 4GB. thanks, -chris
Re: [Qemu-devel] Guest cannot handle a PCI BAR 1GB
On Sun, Sep 5, 2010 at 10:50 AM, Avi Kivity a...@redhat.com wrote: On 09/04/2010 01:22 AM, Cam Macdonell wrote: Hi, I'm trying to test 2 GB (and eventually larger) BARs with ivshmem and I get an error in the guest that it is able to find a mem resource for a BAR larger than 1GB. I'm using 64-bit BARs. when running with 6GB of RAM and a 1GB BAR for ivshmem, it finds a resource (and searches beyond 32-bit values to find it). Here is a log from printfs added to the loop that searches the resources from find_resource() in kernel/resource.c:363. This is a kernel question, not a qemu issue. Copying lkml. trying 'tmp.start' 1000 to 'tmp.end' fff trying 'tmp.start' 9f400 to 'tmp.end' 9f3ff trying 'tmp.start' a to 'tmp.end' e trying 'tmp.start' 10 to 'tmp.end' f trying 'tmp.start' dfffd000 to 'tmp.end' dfffcfff trying 'tmp.start' e000 to 'tmp.end' efff trying 'tmp.start' f200 to 'tmp.end' f1ff trying 'tmp.start' f2001000 to 'tmp.end' f200 trying 'tmp.start' f202 to 'tmp.end' f201 trying 'tmp.start' f2021000 to 'tmp.end' f202 trying 'tmp.start' f204 to 'tmp.end' f203 trying 'tmp.start' f2040100 to 'tmp.end' febf trying 'tmp.start' fec00400 to 'tmp.end' fffb trying 'tmp.start' 1 to 'tmp.end' trying 'tmp.start' 1a000 to 'tmp.end' pci :00:04.0: BAR 2: assigned [mem 0x1c000-0x1 64bit] pci :00:04.0: BAR 2: set to [mem 0x1c000-0x1 64bit] (PCI address [0x1c000-0x1] and you can see the BAR is successfully assigned. However, with a 2GB BAR (below), the search fails, but it also never searches beyong 32-bits. Again, all that's changed is the size of the ivshmem region. trying 'tmp.start' 1000 to 'tmp.end' fff trying 'tmp.start' 9f400 to 'tmp.end' 9f3ff trying 'tmp.start' a to 'tmp.end' e trying 'tmp.start' 10 to 'tmp.end' f trying 'tmp.start' dfffd000 to 'tmp.end' dfffcfff pci :00:04.0: BAR 2: can't assign mem (size 0x8000) Is there a limit to PCI BAR sizes or resources? Any pointers or further debugging tips are greatly appreciated. What kernel version are you looking at? latest kvm git, 2.6.36-rc2+ Please add printks to the loop so we can see this-start and this-end. It smells like a truncation issue. Success with a 1GB BAR this-start 1000, this-end 9f3ff this-start 9f400, this-end 9 this-start f, this-end f this-start 10, this-end dfffcfff this-start dfffd000, this-end dfff this-start f000, this-end f1ff this-start f200, this-end f2000fff this-start f201, this-end f201 this-start f202, this-end f2020fff this-start f203, this-end f203 this-start f204, this-end f20400ff this-start fec0, this-end fec003ff this-start fffc, this-end this-start 1, this-end 11fff tmp.start 12000, tmp.end pci :00:04.0: BAR 2: assigned [mem 0x14000-0x17fff 64bit] and when it fails with a 2GB BAR, the following is printed this-start 1000, this-end 9f3ff this-start 9f400, this-end 9 this-start f, this-end f this-start 10, this-end dfffcfff this-start dfffd000, this-end dfff pci :00:04.0: BAR 2: can't assign mem (size 0x8000) I added a few more debug statements and found that in the failure case, the function returns that it found a region (the last one printed before the error). I've added printfs for the two tests in the if that determine when a region is found: if (tmp.start tmp.end tmp.end - tmp.start = size - 1) { new-start = tmp.start; new-end = tmp.start + size - 1; printk(KERN_INFO returning 0\n); return 0; } this-start 1000, this-end 9f3ff tmp.start 8000, tmp.end fff true: 8fff = 7fff this-start 9f400, this-end 9 tmp.start 8000, tmp.end 9f3ff true: 8009f3ff = 7fff this-start f, this-end f tmp.start 8000, tmp.end e true: 800e = 7fff this-start 10, this-end dfffcfff tmp.start 8000, tmp.end f true: 800f = 7fff this-start dfffd000, this-end dfff tmp.start 10, tmp.end dfffcfff true: 10 dfffcfff true: dfefcfff = 7fff returning 0 pci :00:04.0: BAR 2: can't assign mem (size 0x8000) -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Guest cannot handle a PCI BAR 1GB
On Mon, Sep 6, 2010 at 10:37 AM, Cam Macdonell c...@cs.ualberta.ca wrote: On Sun, Sep 5, 2010 at 10:50 AM, Avi Kivity a...@redhat.com wrote: On 09/04/2010 01:22 AM, Cam Macdonell wrote: Hi, I'm trying to test 2 GB (and eventually larger) BARs with ivshmem and I get an error in the guest that it is able to find a mem resource for a BAR larger than 1GB. I'm using 64-bit BARs. when running with 6GB of RAM and a 1GB BAR for ivshmem, it finds a resource (and searches beyond 32-bit values to find it). Here is a log from printfs added to the loop that searches the resources from find_resource() in kernel/resource.c:363. This is a kernel question, not a qemu issue. Copying lkml. trying 'tmp.start' 1000 to 'tmp.end' fff trying 'tmp.start' 9f400 to 'tmp.end' 9f3ff trying 'tmp.start' a to 'tmp.end' e trying 'tmp.start' 10 to 'tmp.end' f trying 'tmp.start' dfffd000 to 'tmp.end' dfffcfff trying 'tmp.start' e000 to 'tmp.end' efff trying 'tmp.start' f200 to 'tmp.end' f1ff trying 'tmp.start' f2001000 to 'tmp.end' f200 trying 'tmp.start' f202 to 'tmp.end' f201 trying 'tmp.start' f2021000 to 'tmp.end' f202 trying 'tmp.start' f204 to 'tmp.end' f203 trying 'tmp.start' f2040100 to 'tmp.end' febf trying 'tmp.start' fec00400 to 'tmp.end' fffb trying 'tmp.start' 1 to 'tmp.end' trying 'tmp.start' 1a000 to 'tmp.end' pci :00:04.0: BAR 2: assigned [mem 0x1c000-0x1 64bit] pci :00:04.0: BAR 2: set to [mem 0x1c000-0x1 64bit] (PCI address [0x1c000-0x1] and you can see the BAR is successfully assigned. However, with a 2GB BAR (below), the search fails, but it also never searches beyong 32-bits. Again, all that's changed is the size of the ivshmem region. trying 'tmp.start' 1000 to 'tmp.end' fff trying 'tmp.start' 9f400 to 'tmp.end' 9f3ff trying 'tmp.start' a to 'tmp.end' e trying 'tmp.start' 10 to 'tmp.end' f trying 'tmp.start' dfffd000 to 'tmp.end' dfffcfff pci :00:04.0: BAR 2: can't assign mem (size 0x8000) Is there a limit to PCI BAR sizes or resources? Any pointers or further debugging tips are greatly appreciated. What kernel version are you looking at? latest kvm git, 2.6.36-rc2+ Please add printks to the loop so we can see this-start and this-end. It smells like a truncation issue. Success with a 1GB BAR this-start 1000, this-end 9f3ff this-start 9f400, this-end 9 this-start f, this-end f this-start 10, this-end dfffcfff this-start dfffd000, this-end dfff this-start f000, this-end f1ff this-start f200, this-end f2000fff this-start f201, this-end f201 this-start f202, this-end f2020fff this-start f203, this-end f203 this-start f204, this-end f20400ff this-start fec0, this-end fec003ff this-start fffc, this-end this-start 1, this-end 11fff tmp.start 12000, tmp.end pci :00:04.0: BAR 2: assigned [mem 0x14000-0x17fff 64bit] and when it fails with a 2GB BAR, the following is printed this-start 1000, this-end 9f3ff this-start 9f400, this-end 9 this-start f, this-end f this-start 10, this-end dfffcfff this-start dfffd000, this-end dfff pci :00:04.0: BAR 2: can't assign mem (size 0x8000) I added a few more debug statements and found that in the failure case, the function returns that it found a region (the last one printed before the error). I've added printfs for the two tests in the if that determine when a region is found: if (tmp.start tmp.end tmp.end - tmp.start = size - 1) { new-start = tmp.start; new-end = tmp.start + size - 1; printk(KERN_INFO returning 0\n); return 0; } this-start 1000, this-end 9f3ff tmp.start 8000, tmp.end fff true: 8fff = 7fff this-start 9f400, this-end 9 tmp.start 8000, tmp.end 9f3ff true: 8009f3ff = 7fff this-start f, this-end f tmp.start 8000, tmp.end e true: 800e = 7fff this-start 10, this-end dfffcfff tmp.start 8000, tmp.end f true: 800f = 7fff this-start dfffd000, this-end dfff tmp.start 10, tmp.end dfffcfff true: 10 dfffcfff true: dfefcfff = 7fff returning 0 pci :00:04.0: BAR 2: can't assign mem (size 0x8000) Further to this, it seems tmp.start is getting set to zero by the ALIGN macro 2GB BAR: this-start dfffd000, this-end dfff tmp.start dfffd000 tmp.start 0 tmp.start 10, tmp.end dfffcfff 1GB BAR: this-start dfffd000, this-end dfff tmp.start dfffd000 tmp.start
Re: [Qemu-devel] virtio-serial question
On Fri, Sep 3, 2010 at 2:57 AM, Vasiliy G Tolstov v.tols...@selfip.ru wrote: Hello. Can somebody provide minimal example code that using virtio-serial to communicate with guest os? (work on guest side and hypervisor side) Hi Vasiliy, There are basic examples here: http://fedoraproject.org/wiki/Features/VirtioSerial Cam -- Vasiliy G Tolstov v.tols...@selfip.ru Selfip.Ru
[Qemu-devel] Guest cannot handle a PCI BAR 1GB
Hi, I'm trying to test 2 GB (and eventually larger) BARs with ivshmem and I get an error in the guest that it is able to find a mem resource for a BAR larger than 1GB. I'm using 64-bit BARs. when running with 6GB of RAM and a 1GB BAR for ivshmem, it finds a resource (and searches beyond 32-bit values to find it). Here is a log from printfs added to the loop that searches the resources from find_resource() in kernel/resource.c:363. trying 'tmp.start' 1000 to 'tmp.end' fff trying 'tmp.start' 9f400 to 'tmp.end' 9f3ff trying 'tmp.start' a to 'tmp.end' e trying 'tmp.start' 10 to 'tmp.end' f trying 'tmp.start' dfffd000 to 'tmp.end' dfffcfff trying 'tmp.start' e000 to 'tmp.end' efff trying 'tmp.start' f200 to 'tmp.end' f1ff trying 'tmp.start' f2001000 to 'tmp.end' f200 trying 'tmp.start' f202 to 'tmp.end' f201 trying 'tmp.start' f2021000 to 'tmp.end' f202 trying 'tmp.start' f204 to 'tmp.end' f203 trying 'tmp.start' f2040100 to 'tmp.end' febf trying 'tmp.start' fec00400 to 'tmp.end' fffb trying 'tmp.start' 1 to 'tmp.end' trying 'tmp.start' 1a000 to 'tmp.end' pci :00:04.0: BAR 2: assigned [mem 0x1c000-0x1 64bit] pci :00:04.0: BAR 2: set to [mem 0x1c000-0x1 64bit] (PCI address [0x1c000-0x1] and you can see the BAR is successfully assigned. However, with a 2GB BAR (below), the search fails, but it also never searches beyong 32-bits. Again, all that's changed is the size of the ivshmem region. trying 'tmp.start' 1000 to 'tmp.end' fff trying 'tmp.start' 9f400 to 'tmp.end' 9f3ff trying 'tmp.start' a to 'tmp.end' e trying 'tmp.start' 10 to 'tmp.end' f trying 'tmp.start' dfffd000 to 'tmp.end' dfffcfff pci :00:04.0: BAR 2: can't assign mem (size 0x8000) Is there a limit to PCI BAR sizes or resources? Any pointers or further debugging tips are greatly appreciated. Thanks, Cam
[Qemu-devel] Re: [PATCH] hw/ivshmem.c don't check for negative values on unsigned data types
On Mon, Aug 30, 2010 at 4:31 AM, jes.soren...@redhat.com wrote: From: Jes Sorensen jes.soren...@redhat.com There is no need to check for dest 0 or vector = 0 as both are uint16_t. This should fix problems with broken build with aggressive compiler flags. Reported by Xudong Hao xudong@intel.com Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- hw/ivshmem.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index bbb5cba..afebbc3 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -199,13 +199,13 @@ static void ivshmem_io_writel(void *opaque, target_phys_addr_t addr, case DOORBELL: /* check that dest VM ID is reasonable */ - if ((dest 0) || (dest s-max_peer)) { + if (dest s-max_peer) { IVSHMEM_DPRINTF(Invalid destination VM ID (%d)\n, dest); break; } /* check doorbell range */ - if ((vector = 0) (vector s-peers[dest].nb_eventfds)) { + if (vector s-peers[dest].nb_eventfds) { IVSHMEM_DPRINTF(Writing % PRId64 to VM %d on vector %d\n, write_one, dest, vector); if (write(s-peers[dest].eventfds[vector], -- 1.7.2.2 Acked-by: Cam Macdonell c...@cs.ualberta.ca I sent a patch yesterday that addressed it, but it is probably better to remove the tests that switched to signed 16-bit ints. Anthony, please apply this one. Cam
[Qemu-devel] Re: [PATCH] hw/ivshmem.c don't check for negative values on unsigned data types
On Tue, Aug 31, 2010 at 6:51 PM, Hao, Xudong xudong@intel.com wrote: Hao, Xudong wrote: jes.soren...@redhat.com wrote: From: Jes Sorensen jes.soren...@redhat.com There is no need to check for dest 0 or vector = 0 as both are uint16_t. This should fix problems with broken build with aggressive compiler flags. Reported by Xudong Hao xudong@intel.com Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- hw/ivshmem.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index bbb5cba..afebbc3 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -199,13 +199,13 @@ static void ivshmem_io_writel(void *opaque, target_phys_addr_t addr, case DOORBELL: /* check that dest VM ID is reasonable */ - if ((dest 0) || (dest s-max_peer)) { + if (dest s-max_peer) { IVSHMEM_DPRINTF(Invalid destination VM ID (%d)\n, dest); break; } /* check doorbell range */ - if ((vector = 0) (vector s-peers[dest].nb_eventfds)) { + if (vector s-peers[dest].nb_eventfds) { IVSHMEM_DPRINTF(Writing % PRId64 to VM %d on vector %d\n, write_one, dest, vector); if (write(s-peers[dest].eventfds[vector], This patch works for me. Jes, correct result is this patch works for me on x86_64 system. However, in i386 system, there is another bug: ... CC x86_64-softmmu/ivshmem.o CC x86_64-softmmu/fpu/softfloat-native.o CC x86_64-softmmu/op_helper.o cc1: warnings being treated as errors /home/build/gitrepo/qemu/hw/ivshmem.c: In function 'check_shm_size. /home/build/gitrepo/qemu/hw/ivshmem.c:357: warning: format '%ld' expects type 'long int', but argt 5 has type '__off64_t make[1]: *** [ivshmem.o] Error 1 make[1]: *** Waiting for unfinished jobs make: *** [subdir-x86_64-softmmu] Error 2 Hmm, that was causing problems on 32-bit systems, not 64-bit. Is this with gcc 4.1.2? Please the try the following patch from Avi that should add the necessary cast. http://www.mail-archive.com/qemu-devel@nongnu.org/msg40715.html Cam
[Qemu-devel] Re: [PATCH] hw/ivshmem.c don't check for negative values on unsigned data types
On Tue, Aug 31, 2010 at 9:56 PM, Cam Macdonell c...@cs.ualberta.ca wrote: On Tue, Aug 31, 2010 at 6:51 PM, Hao, Xudong xudong@intel.com wrote: Hao, Xudong wrote: jes.soren...@redhat.com wrote: From: Jes Sorensen jes.soren...@redhat.com There is no need to check for dest 0 or vector = 0 as both are uint16_t. This should fix problems with broken build with aggressive compiler flags. Reported by Xudong Hao xudong@intel.com Signed-off-by: Jes Sorensen jes.soren...@redhat.com --- hw/ivshmem.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index bbb5cba..afebbc3 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -199,13 +199,13 @@ static void ivshmem_io_writel(void *opaque, target_phys_addr_t addr, case DOORBELL: /* check that dest VM ID is reasonable */ - if ((dest 0) || (dest s-max_peer)) { + if (dest s-max_peer) { IVSHMEM_DPRINTF(Invalid destination VM ID (%d)\n, dest); break; } /* check doorbell range */ - if ((vector = 0) (vector s-peers[dest].nb_eventfds)) { + if (vector s-peers[dest].nb_eventfds) { IVSHMEM_DPRINTF(Writing % PRId64 to VM %d on vector %d\n, write_one, dest, vector); if (write(s-peers[dest].eventfds[vector], This patch works for me. Jes, correct result is this patch works for me on x86_64 system. However, in i386 system, there is another bug: ... CC x86_64-softmmu/ivshmem.o CC x86_64-softmmu/fpu/softfloat-native.o CC x86_64-softmmu/op_helper.o cc1: warnings being treated as errors /home/build/gitrepo/qemu/hw/ivshmem.c: In function 'check_shm_size. /home/build/gitrepo/qemu/hw/ivshmem.c:357: warning: format '%ld' expects type 'long int', but argt 5 has type '__off64_t make[1]: *** [ivshmem.o] Error 1 make[1]: *** Waiting for unfinished jobs make: *** [subdir-x86_64-softmmu] Error 2 Hmm, that was causing problems on 32-bit systems, not 64-bit. Is this with gcc 4.1.2? Oh sorry, I read too fast, you did say it was on 32-bit. Avi's patch will do the trick. Cam
[Qemu-devel] [PATCH] Use signed 16-bit values for ivshmem register writes
fixes gcc 4.1 warning Signed-off-by: Cam Macdonell c...@cs.ualberta.ca --- hw/ivshmem.c |4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/ivshmem.c b/hw/ivshmem.c index bbb5cba..fa9c684 100644 --- a/hw/ivshmem.c +++ b/hw/ivshmem.c @@ -181,8 +181,8 @@ static void ivshmem_io_writel(void *opaque, target_phys_addr_t addr, IVShmemState *s = opaque; uint64_t write_one = 1; -uint16_t dest = val 16; -uint16_t vector = val 0xff; +int16_t dest = val 16; +int16_t vector = val 0xff; addr = 0xfc; -- 1.6.6.1
Re: [Qemu-devel] Template for developing a Qemu device with PCIe and MSI-X
On Wed, Aug 25, 2010 at 4:39 PM, Adnan Khaleel ad...@khaleel.us wrote: Hi Isaku, I've made some progress in coding the device template but its no where near complete. I've created some files and am attaching it to this note. Based on what I could gather from the pcie source files I've made a stab at creating a simple model. I've also attached a file for a simple pci device that works under regular Qemu. I would like to duplicate its functionality in your pcie environment for starters. Could you please take a look at the files I've created and tell me if I've understood your pcie model correctly. Any help will be truly appreciated. Adnan Hi Adnan, There is a fairly simple device I've created called ivshmem that is in the qemu git tree. It is a regular PCI device that exports a shared memory object via a BAR and supports a few registers and optional MSI-X interrupts (I had to pick through the virtio code to get MSI-X working, so looking at ivshmem might save you some effort). My device is somewhat similar to a graphics card actually which I recall is your goal. The purpose of ivshmem is to support sharing memory between multiple guests running on the same host. It follows the qdev model which you will need to do. Cam The five files I've modified from your git repository are as follows hw/pci_ids.h // Added vendor id defines hw/pc_q35.c // Device instantiation hw/pcie_msix_template.h // Device header file hw/pcie_msix_template.c // Device file Makefile.objs // Added pcie_msix_template.o to list of objects being built Everything should compile without any warnings or errors. The last file: sc_link_pci.c Is the original PCI device that I'm trying to convert into being PCIe and MSI-X and is included merely for reference to help you understand what I'd like to achieve in your environment. From: Isaku Yamahata [mailto:yamah...@valinux.co.jp] To: Adnan Khaleel [mailto:ad...@khaleel.us] Cc: qemu-devel@nongnu.org Sent: Wed, 18 Aug 2010 22:19:04 -0500 Subject: Re: [Qemu-devel] Template for developing a Qemu device with PCIe and MSI-X On Wed, Aug 18, 2010 at 02:10:10PM -0500, Adnan Khaleel wrote: Hello Qemu developers, I'm interested in developing a device model that plugs into Qemu that is based on a PCIe interface and uses MSI-X. My goal is to ultimately attach a GPU simulator to this PCIe interface and use the entire platfom (Qemu + GPU simulator) for studying cpu, gpu interactions. I'm not terribly familiar with the Qemu device model and I'm looking for some assistance, perhaps a starting template for pcie and msi-x that would offer the basic functionality that I could then build upon. I have looked at the various devices that already modelled that are included with Qemu (v0.12.5 at least) and I've noticed several a few pci devices, eg; ne2k and cirrus-pci etc, however only one device truly seems to utilize both the technologies that I'm interested in and that is the virtio-pci.c I'm not sure what virtio-pci does so I'm not sure if that is a suitable starting point for me. Any help, suggestions etc would be extremely helpful and much appreciated. Qemu doesn't support pcie at the moment. Only partial patches have been merged, still more patches have to be merged for pcie to fully work. The following repo is available. git clone http://people.valinux.co.jp/~yamahata/qemu/q35/qemu git clone http://people.valinux.co.jp/~yamahata/qemu/q35/seabios git clone http://people.valinux.co.jp/~yamahata/qemu/q35/vgabios Note: patched seabios and vgabios are needed, you have to pass ACPI DSDT for q35. example: qemu-system-x86_64 -M pc_q35 -acpitable load_header,data=roms/seabios/src/q35-acpi-dsdt.aml This repo is for those who want to try/develop pcie support, not for upstream merge. So they include patches unsuitable for upstream. The repo includes pcie port switch emulator which utilize pcie and MSI(not MSI-X). The difference between PCI device and PCIe device is configuration space size. By setting PCIDeviceInfo::is_express = 1, you'll get 4K configuration space. Helper functions for pcie are found in qemu/hw/pcie.c For msi-x, see qemu/hw/msix.c. Thanks, -- yamahata
Re: [Qemu-devel] Re: Unusual physical address when using 64-bit BAR
On Tue, Jul 20, 2010 at 9:49 PM, Isaku Yamahata yamah...@valinux.co.jp wrote: Added Cc: seab...@seabios.org On Wed, Jul 21, 2010 at 06:31:01AM +0300, Michael S. Tsirkin wrote: On Tue, Jul 20, 2010 at 06:52:23PM +0900, Isaku Yamahata wrote: On Wed, Jul 14, 2010 at 09:10:28AM -0600, Cam Macdonell wrote: On Tue, Jul 13, 2010 at 8:52 PM, Isaku Yamahata yamah...@valinux.co.jp wrote: On Tue, Jul 13, 2010 at 04:48:19PM -0600, Cam Macdonell wrote: On Tue, Jul 13, 2010 at 2:41 PM, Isaku Yamahata yamah...@valinux.co.jp wrote: On Tue, Jul 13, 2010 at 02:05:51PM -0600, Cam Macdonell wrote: Seabios completely ignore the 64-bitness of the BAR. ?Looks like it also thinks the second half of the BAR is an I/O region instead of memory (hence the c200, that's part of the pci portio region. I've sent the patches to address it. But they haven't been merged yet. seabios doesn't map BARs beyond 4GB. If bar is mapped beyond 4GB, guest BIOS does it. Have those patches been merged yet? They have been merged into seabios upstream now. qemu seabios fork hasn't pulled for a while, though. To see how seabios works, it would help to increase CONFIG_DEBUG_LEVEL in config.h of seabios Where does the output from seabios end up? ?Inside dmesg? It outputs them to the serial console which qemu emulates. seabios is out of kernel control, so dmesg doesn't show it. pci_read_config: (val) 0x0 - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) pci_read_config: (val) 0x - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) pci_read_config: (val) 0x0 - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) seabios BAR3. Not sure how it is mapped from this message. Isn't the BAR3 from the fact that a 64-bit BAR would use both BAR2 and BAR3 to store all 64-bits? Yes. Seabios misbehaves. 64bit bar is(was) a missing feature. -- yamahata With the latest seabios git passed via -bios, I no longer see the 48-bit address, but instead a 32-bit address and then . ?This guest has 1gb of RAM so the address isn't be mapped beyond 4g. Can I see the debug log like before? (hopefully seabios with CONFIG_DEBUG_LEVEL enabled.) Here's the dump from SeaBIOS in the region related to the PCI devices. The SeaBIOS output is identical whether the BAR is 32-bit or 64-bit. PCI: bus=0 devfn=0x10: vendor_id=0x1013 device_id=0x00b8 region 0: 0xf000 region 1: 0xf200 region 6: 0xf201 PCI: bus=0 devfn=0x18: vendor_id=0x1af4 device_id=0x1000 region 0: 0xc020 region 1: 0xf202 region 6: 0xf203 PCI: bus=0 devfn=0x20: vendor_id=0x1af4 device_id=0x1110 region 0: 0xf204 region 1: 0xf2041000 region 2: 0x Is this region (region 2 of devfn=0x20: vendor_id=0x1af4 device_id=0x1110) the BAR in quistion? The value 0 seems odd. Probably BAR address calculation overflowed. Currently seabios doesn't check overflow. I attached the patch. Do you know who sets the BAR to ? Here are the config reads/writes related to the 0x18/1c, the 'IVSHMEM' lines are from the map function passed to pci_register_bar(). It looks like SeaBIOS sets the address to 0 and then the potentially useful e000 address gets mangled into 00. There is something wrong with the debug message of write case, I suppose. All written value are 0, but the resulted effect doesn't seems so. IVSHMEM: guest pci addr = 0, guest h/w addr = 1090912256, size = 536870912 ...snip... pci_read_config: (val) 0x4 - 0x18 (addr) pci_write_config: (val) 0x0 - 0x18 (addr) IVSHMEM: guest pci addr = e000, guest h/w addr = 1090912256, size = 2000 If 0 is written to 0x18, the bar address should be 0, but it says e000. pci_read_config: (val) 0xe004 - 0x18 (addr) The read value isn't 0. and so on... pci_write_config: (val) 0x0 - 0x18 (addr) pci_read_config: (val) 0x0 - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) IVSHMEM: guest pci addr = , guest h/w addr = 1090912256, size = 2000 pci_read_config: (val) 0x - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) and with the 64-bit guest I get this error as well (recall the guest fails to boot on 64-bit) BUG: kvm_dirty_pages_log_change: invalid parameters f000-f0ff diff --git a/src/pciinit.c b/src/pciinit.c index b110531..6eca2ce 100644 --- a/src/pciinit.c +++ b/src/pciinit.c @@ -90,7 +90,8 @@ static int pci_bios_allocate_region(u16 bdf, int region_num) /* If pci_bios_prefmem_addr == 0, keep old behaviour
[Qemu-devel] Re: [qemu-kvm PATCH 0/3] small qemu-kvm cleanups
On Mon, Aug 23, 2010 at 12:49 AM, Avi Kivity a...@redhat.com wrote: On 08/23/2010 09:45 AM, Paolo Bonzini wrote: On 08/17/2010 01:29 PM, Avi Kivity wrote: On 08/12/2010 06:29 PM, Paolo Bonzini wrote: Nothing earth shattering. :) Paolo Bonzini (3): move kvm_set_irqfd to kvm-stub.c This touches kvm-all.c, so should be against uq/master. kvm_set_irqfd is not in upstream qemu, should I add it there even though it is unused? I think Cam (copied) wants it for nahanni devices, so it's a good opportunity. Yes, I have a patch to make use of kvm_set_irqfd in nahanni, so it would be great to have kvm_set_irqfd in qemu. Cam -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] Re: Latest version in Git doesn't link - any ideas?
Yup, it was my patch. Sorry about that. Patches are in the list, just need to be merged. Cam On Sat, Aug 14, 2010 at 4:18 PM, Stefan Weil w...@mail.berlios.de wrote: Am 14.08.2010 23:35, schrieb Nigel Horne: I've seen some activity in this area of late, but the code still fails to link. Any clues anyone? make distclean ./configure --enable-linux-aio --enable-io-thread --enable-kvm make ... LINK arm-softmmu/qemu-system-arm ivshmem.o: In function `ivshmem_mmio_map': ivshmem.c:(.text+0x80f): undefined reference to `kvm_set_ioeventfd_mmio_long' ivshmem.o: In function `ivshmem_read': ivshmem.c:(.text+0x9fc): undefined reference to `kvm_set_ioeventfd_mmio_long' ivshmem.c:(.text+0xa6c): undefined reference to `kvm_set_ioeventfd_mmio_long' collect2: ld returned 1 exit status make[1]: *** [qemu-system-arm] Error 1 make: *** [subdir-arm-softmmu] Error 2 -Nigel Try changing the entry for ivshmem.o (which is not needed for targets without kvm) in Makefile.target like this: obj-$(CONFIG_KVM) += ivshmem.o Regards Stefan
[Qemu-devel] [PATCH 1/2] RESEND: Add kvm_set_ioeventfd_mmio_long definition for non-KVM systems
Signed-off-by: Cam Macdonell c...@cs.ualberta.ca --- kvm-stub.c |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/kvm-stub.c b/kvm-stub.c index 3378bd3..d45f9fa 100644 --- a/kvm-stub.c +++ b/kvm-stub.c @@ -136,3 +136,8 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool assign) { return -ENOSYS; } + +int kvm_set_ioeventfd_mmio_long(int fd, uint32_t adr, uint32_t val, bool assign) +{ +return -ENOSYS; +} -- 1.6.2.5
[Qemu-devel] [PATCH 2/2] RESEND: Disable build of ivshmem on non-KVM systems
Signed-off-by: Cam Macdonell c...@cs.ualberta.ca --- Makefile.target |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/Makefile.target b/Makefile.target index b791492..c8281e9 100644 --- a/Makefile.target +++ b/Makefile.target @@ -191,7 +191,7 @@ obj-y += rtl8139.o obj-y += e1000.o # Inter-VM PCI shared memory -obj-y += ivshmem.o +obj-$(CONFIG_KVM) += ivshmem.o # Hardware support obj-i386-y += vga.o -- 1.6.2.5
Re: [Qemu-devel] [PATCH 1/2] Add kvm_set_ioeventfd_mmio_long definition for non-KVM systems
On Sat, Aug 14, 2010 at 11:24 AM, Andreas Färber andreas.faer...@web.de wrote: Am 11.08.2010 um 20:16 schrieb Cam Macdonell: --- kvm-stub.c | 5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/kvm-stub.c b/kvm-stub.c index 3378bd3..d45f9fa 100644 --- a/kvm-stub.c +++ b/kvm-stub.c @@ -136,3 +136,8 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool assign) { return -ENOSYS; } + +int kvm_set_ioeventfd_mmio_long(int fd, uint32_t adr, uint32_t val, bool assign) +{ + return -ENOSYS; +} -- 1.6.2.5 Acked-by: Andreas Färber andreas.faer...@web.de This fixes linking [1] on Mac OS X. The patch is missing a Signed-off-by though, same for 2/2. I've re-sent with the sign-off. Cam Andreas [1] LINK i386-softmmu/qemu Undefined symbols: _kvm_set_ioeventfd_mmio_long, referenced from: _ivshmem_read in ivshmem.o _ivshmem_read in ivshmem.o _ivshmem_mmio_map in ivshmem.o ld: symbol(s) not found collect2: ld returned 1 exit status make[1]: *** [qemu] Error 1 make: *** [subdir-i386-softmmu] Error 2
Re: [Qemu-devel] [PATCH v8 5/5] RESEND: Inter-VM shared memory PCI device
On Wed, Aug 11, 2010 at 4:35 AM, Stefan Weil w...@mail.berlios.de wrote: Am 27.07.2010 18:54, schrieb Cam Macdonell: resend for bug fix related to removal of irqfd Support an inter-vm shared memory device that maps a shared-memory object as a PCI device in the guest. This patch also supports interrupts between guest by communicating over a unix domain socket. This patch applies to the qemu-kvm repository. -device ivshmem,size=size in format accepted by -m[,shm=shm name] Interrupts are supported between multiple VMs by using a shared memory server by using a chardev socket. -device ivshmem,size=size in format accepted by -m[,shm=shm name] [,chardev=id][,msi=on][,ioeventfd=on][,vectors=n][,role=peer|master] -chardev socket,path=path,id=id The shared memory server, sample programs and init scripts are in a git repo here: www.gitorious.org/nahanni Signed-off-by: Cam Macdonell c...@cs.ualberta.ca --- Makefile.target | 3 + hw/ivshmem.c | 828 +++ qemu-char.c | 6 + qemu-char.h | 3 + qemu-doc.texi | 43 +++ 5 files changed, 883 insertions(+), 0 deletions(-) create mode 100644 hw/ivshmem.c Hi, hw/ivshmem.c breaks compilation on 32 bit hosts, for targets without kvm support and for win32 environments. I sent patches to qemu-devel which fix the first two problems. The win32 problems (missing mmap, maybe more) remain. Could you please fix them? Could that be accomplished with excluding it on Windows on the makefiles? Thanks, Cam Regards Stefan
[Qemu-devel] Re: [PATCH 2/2] ivshmem: Fix compilation without kvm
On Wed, Aug 11, 2010 at 1:05 PM, Paolo Bonzini pbonz...@redhat.com wrote: On 08/11/2010 03:38 AM, Stefan Weil wrote: kvm_set_ioeventfd_mmio_long is only available with CONFIG_KVM. We should just disable ivshmem for non-KVM diff --git a/Makefile.target b/Makefile.target index b791492..c8281e9 100644 --- a/Makefile.target +++ b/Makefile.target @@ -191,7 +191,7 @@ obj-y += rtl8139.o obj-y += e1000.o # Inter-VM PCI shared memory -obj-y += ivshmem.o +obj-$(CONFIG_KVM) += ivshmem.o # Hardware support obj-i386-y += vga.o because it is also breaking Windows builds. Alternatively, the right way to do what this patch does, is to add kvm_set_ioeventfd_mmio_long to kvm-stub.c, and to use obj-$(CONFIG_POSIX) += ivshmem.o in the makefile to work around the Windows build problems. I think we decided to disable it on non-KVM systems to avoid people stumbling into atomicity issues with emulation. Paolo
[Qemu-devel] [PATCH 1/2] Add kvm_set_ioeventfd_mmio_long definition for non-KVM systems
--- kvm-stub.c |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/kvm-stub.c b/kvm-stub.c index 3378bd3..d45f9fa 100644 --- a/kvm-stub.c +++ b/kvm-stub.c @@ -136,3 +136,8 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool assign) { return -ENOSYS; } + +int kvm_set_ioeventfd_mmio_long(int fd, uint32_t adr, uint32_t val, bool assign) +{ +return -ENOSYS; +} -- 1.6.2.5
[Qemu-devel] [PATCH 2/2] Disable build of ivshmem on non-KVM systems
--- Makefile.target |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/Makefile.target b/Makefile.target index b791492..c8281e9 100644 --- a/Makefile.target +++ b/Makefile.target @@ -191,7 +191,7 @@ obj-y += rtl8139.o obj-y += e1000.o # Inter-VM PCI shared memory -obj-y += ivshmem.o +obj-$(CONFIG_KVM) += ivshmem.o # Hardware support obj-i386-y += vga.o -- 1.6.2.5
Re: [Qemu-devel] [PATCH 1/2] Add kvm_set_ioeventfd_mmio_long definition for non-KVM systems
On Wed, Aug 11, 2010 at 2:28 PM, Stefan Weil w...@mail.berlios.de wrote: Am 11.08.2010 20:16, schrieb Cam Macdonell: --- kvm-stub.c | 5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/kvm-stub.c b/kvm-stub.c index 3378bd3..d45f9fa 100644 --- a/kvm-stub.c +++ b/kvm-stub.c @@ -136,3 +136,8 @@ int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool assign) { return -ENOSYS; } + +int kvm_set_ioeventfd_mmio_long(int fd, uint32_t adr, uint32_t val, bool assign) +{ + return -ENOSYS; +} Your 2nd patch disables build of ivshmem.o on non-kvm systems. Only ivshmem.c was using kvm_set_ioeventfd_mmio_long, so up to now, no dummy function in kvm-stub.c is needed. Right. It can be left out for now if that's preferred. Cam
[Qemu-devel] [PATCH v8 5/5] RESEND: Inter-VM shared memory PCI device
resend for bug fix related to removal of irqfd Support an inter-vm shared memory device that maps a shared-memory object as a PCI device in the guest. This patch also supports interrupts between guest by communicating over a unix domain socket. This patch applies to the qemu-kvm repository. -device ivshmem,size=size in format accepted by -m[,shm=shm name] Interrupts are supported between multiple VMs by using a shared memory server by using a chardev socket. -device ivshmem,size=size in format accepted by -m[,shm=shm name] [,chardev=id][,msi=on][,ioeventfd=on][,vectors=n][,role=peer|master] -chardev socket,path=path,id=id The shared memory server, sample programs and init scripts are in a git repo here: www.gitorious.org/nahanni Signed-off-by: Cam Macdonell c...@cs.ualberta.ca --- Makefile.target |3 + hw/ivshmem.c| 828 +++ qemu-char.c |6 + qemu-char.h |3 + qemu-doc.texi | 43 +++ 5 files changed, 883 insertions(+), 0 deletions(-) create mode 100644 hw/ivshmem.c diff --git a/Makefile.target b/Makefile.target index 8a9c427..b791492 100644 --- a/Makefile.target +++ b/Makefile.target @@ -190,6 +190,9 @@ obj-$(CONFIG_USB_OHCI) += usb-ohci.o obj-y += rtl8139.o obj-y += e1000.o +# Inter-VM PCI shared memory +obj-y += ivshmem.o + # Hardware support obj-i386-y += vga.o obj-i386-y += mc146818rtc.o i8259.o pc.o diff --git a/hw/ivshmem.c b/hw/ivshmem.c new file mode 100644 index 000..bbb5cba --- /dev/null +++ b/hw/ivshmem.c @@ -0,0 +1,828 @@ +/* + * Inter-VM Shared Memory PCI device. + * + * Author: + * Cam Macdonell c...@cs.ualberta.ca + * + * Based On: cirrus_vga.c + * Copyright (c) 2004 Fabrice Bellard + * Copyright (c) 2004 Makoto Suzuki (suzu) + * + * and rtl8139.c + * Copyright (c) 2006 Igor Kovalenko + * + * This code is licensed under the GNU GPL v2. + */ +#include hw.h +#include pc.h +#include pci.h +#include msix.h +#include kvm.h + +#include sys/mman.h +#include sys/types.h + +#define IVSHMEM_IOEVENTFD 0 +#define IVSHMEM_MSI 1 + +#define IVSHMEM_PEER0 +#define IVSHMEM_MASTER 1 + +#define IVSHMEM_REG_BAR_SIZE 0x100 + +//#define DEBUG_IVSHMEM +#ifdef DEBUG_IVSHMEM +#define IVSHMEM_DPRINTF(fmt, ...)\ +do {printf(IVSHMEM: fmt, ## __VA_ARGS__); } while (0) +#else +#define IVSHMEM_DPRINTF(fmt, ...) +#endif + +typedef struct Peer { +int nb_eventfds; +int *eventfds; +} Peer; + +typedef struct EventfdEntry { +PCIDevice *pdev; +int vector; +} EventfdEntry; + +typedef struct IVShmemState { +PCIDevice dev; +uint32_t intrmask; +uint32_t intrstatus; +uint32_t doorbell; + +CharDriverState **eventfd_chr; +CharDriverState *server_chr; +int ivshmem_mmio_io_addr; + +pcibus_t mmio_addr; +pcibus_t shm_pci_addr; +uint64_t ivshmem_offset; +uint64_t ivshmem_size; /* size of shared memory region */ +int shm_fd; /* shared memory file descriptor */ + +Peer *peers; +int nb_peers; /* how many guests we have space for */ +int max_peer; /* maximum numbered peer */ + +int vm_id; +uint32_t vectors; +uint32_t features; +EventfdEntry *eventfd_table; + +char * shmobj; +char * sizearg; +char * role; +int role_val; /* scalar to avoid multiple string comparisons */ +} IVShmemState; + +/* registers for the Inter-VM shared memory device */ +enum ivshmem_registers { +INTRMASK = 0, +INTRSTATUS = 4, +IVPOSITION = 8, +DOORBELL = 12, +}; + +static inline uint32_t ivshmem_has_feature(IVShmemState *ivs, +unsigned int feature) { +return (ivs-features (1 feature)); +} + +static inline bool is_power_of_two(uint64_t x) { +return (x (x - 1)) == 0; +} + +static void ivshmem_map(PCIDevice *pci_dev, int region_num, +pcibus_t addr, pcibus_t size, int type) +{ +IVShmemState *s = DO_UPCAST(IVShmemState, dev, pci_dev); + +s-shm_pci_addr = addr; + +if (s-ivshmem_offset 0) { +cpu_register_physical_memory(s-shm_pci_addr, s-ivshmem_size, +s-ivshmem_offset); +} + +IVSHMEM_DPRINTF(guest pci addr = % FMT_PCIBUS , guest h/w addr = % +PRIu64 , size = % FMT_PCIBUS \n, addr, s-ivshmem_offset, size); + +} + +/* accessing registers - based on rtl8139 */ +static void ivshmem_update_irq(IVShmemState *s, int val) +{ +int isr; +isr = (s-intrstatus s-intrmask) 0x; + +/* don't print ISR resets */ +if (isr) { +IVSHMEM_DPRINTF(Set IRQ to %d (%04x %04x)\n, + isr ? 1 : 0, s-intrstatus, s-intrmask); +} + +qemu_set_irq(s-dev.irq[0], (isr != 0)); +} + +static void ivshmem_IntrMask_write(IVShmemState *s, uint32_t val) +{ +IVSHMEM_DPRINTF(IntrMask write(w) val = 0x%04x\n, val); + +s-intrmask = val; + +ivshmem_update_irq(s, val
Re: [Qemu-devel] [PATCH v7 0/4] Inter-VM shared memory device
On Mon, Jul 26, 2010 at 7:48 AM, Anthony Liguori anth...@codemonkey.ws wrote: On 06/15/2010 03:23 PM, Cam Macdonell wrote: Latest patch for PCI shared memory device that maps a host shared memory object to be shared between guests Is this against qemu.git or qemu-kvm.git? It depends on functions like qemu_ram_map() which are not present in qemu.git (and are present in qemu-kvm.git). It is against qemu-kvm.git. Is qemu_ram_map() going into qemu.git? Another function I use (and virtio in qemu-kvm uses as well) is kvm_set_irqfd() that is not qemu.git either. Do I need ifdef these functions with CONFIG_KVM? Please advise as to how to handle these two functions and then I can rebase against qemu.git. Thanks, Cam Regards, Anthony Liguori new in this series - replace marking memory from v6 with marking device as unmigratable indicating that it should be unplugged before migration and re-added after. - 'peer' case changed to require removal before migration, only 'master' devices can be migrated while attached. v6 - migration support with 'master' and 'peer' roles for guest to determine who owns memory - modified phys_ram_dirty array for marking memory as not to be migrated v5: - fixed segfault for non-server case - code style fixes - removed limit on the number of guests - shared memory server is now in qemu.git/contrib - made ioeventfd setup function generic - removed interrupts when guest joined (let application handle it) v4: - moved to single Doorbell register and use datamatch to trigger different VMs rather than one register per eventfd - remove writing arbitrary values to eventfds. Only values of 1 are now written to ensure correct usage Cam Macdonell (4): Device specification for shared memory PCI device Add function to assign ioeventfd to MMIO. Support marking a device as non-migratable Inter-VM shared memory PCI device Makefile.target | 3 + docs/specs/ivshmem_device_spec.txt | 96 + hw/hw.h | 1 + hw/ivshmem.c | 823 kvm-all.c | 32 ++ kvm.h | 1 + qemu-char.c | 6 + qemu-char.h | 3 + qemu-doc.texi | 43 ++ savevm.c | 32 ++- 10 files changed, 1037 insertions(+), 3 deletions(-) create mode 100644 docs/specs/ivshmem_device_spec.txt create mode 100644 hw/ivshmem.c
Re: [Qemu-devel] [PATCH v7 0/4] Inter-VM shared memory device
On Mon, Jul 26, 2010 at 1:51 PM, Avi Kivity a...@redhat.com wrote: On 07/26/2010 10:01 PM, Cam Macdonell wrote: Is this against qemu.git or qemu-kvm.git? It depends on functions like qemu_ram_map() which are not present in qemu.git (and are present in qemu-kvm.git). It is against qemu-kvm.git. Is qemu_ram_map() going into qemu.git? Another function I use (and virtio in qemu-kvm uses as well) is kvm_set_irqfd() that is not qemu.git either. Do I need ifdef these functions with CONFIG_KVM? Please advise as to how to handle these two functions and then I can rebase against qemu.git. Please add qemu_ram_map() as a separate patch to avoid interdependencies. Try to keep it at the same place etc., that will reduce merge difficulties later. qemu_ram_map() isn't my patch, Marcelo pushed it into qemu-kvm, so I dropped my version. Marcelo's version is commit c15414b9 in qemu-kvm.git wrt kvm_set_irqfd(), its usage is optional, yes? if so I recommend just dropping support for it temporarily. I'll work at upstreaming kvm_set_irqfd() so you can re-add this functionality. Yes, kvm_set_irqfd() is optional. I can drop it for now. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] [PATCH v7 0/4] Inter-VM shared memory device
On Mon, Jul 26, 2010 at 2:11 PM, Avi Kivity a...@redhat.com wrote: On 07/26/2010 11:03 PM, Cam Macdonell wrote: Please add qemu_ram_map() as a separate patch to avoid interdependencies. Try to keep it at the same place etc., that will reduce merge difficulties later. qemu_ram_map() isn't my patch, Marcelo pushed it into qemu-kvm, so I dropped my version. You can pick that code and put it in front of your patch set as a patch, or risk more months of delay while Marcelo, Anthony and myself try to figure out who was supposed to be doing what. Marcelo's version is commit c15414b9 in qemu-kvm.git That appears to be a merge. Ah, the header changed in a merge, the original patch is commit 00d53f24. I'll pick it as part of my patch set as you suggest. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] [PATCH v7 0/4] Inter-VM shared memory device
On Mon, Jul 26, 2010 at 2:41 PM, Anthony Liguori anth...@codemonkey.ws wrote: On 07/26/2010 03:27 PM, Avi Kivity wrote: On 07/26/2010 10:41 PM, Anthony Liguori wrote: kvm_set_irqfd() is fine, it just needs to be ported. It should be there due to vhost though? It should, but isn't. qemu_ram_map() is more difficult. I would think the better approach would be to invert things. Instead of a give me a stable mapping that is shared atomically with a guest to this ram region, I would go with go create a ram region with this preallocated memory and then just assume that afterwards, you can make use of it and it's exposed as atomic memory. Isn't that what qemu_ram_map() does? Yup, name is misleading. I thought it was the equivalent of cpu_physical_memory_map() but took a ram_addr_t instead of a target_ulong. If I add it to my patch set, should I change the name to the suggested qemu_ram_alloc_from_ptr() or keep it as is for consistency? Also, the current version of qemu_ram_map() in qemu-kvm.git uses the DeviceState parameter. Are Alex's DeviceState changes going into 0.13 for the qemu_ram_*() functions?? Cam Regards, Anthony Liguori ram_addr_t qemu_ram_map(DeviceState *dev, const char *name, ram_addr_t size, void *host) 'host' is your this preallocated memory, no?
[Qemu-devel] [PATCH v8 3/5] Add function to assign ioeventfd to MMIO.
Signed-off-by: Cam Macdonell c...@cs.ualberta.ca --- kvm-all.c | 32 kvm.h |1 + 2 files changed, 33 insertions(+), 0 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 7635f2f..d9a5dd0 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1241,6 +1241,38 @@ int kvm_set_signal_mask(CPUState *env, const sigset_t *sigset) return r; } +int kvm_set_ioeventfd_mmio_long(int fd, uint32_t addr, uint32_t val, bool assign) +{ +#ifdef KVM_IOEVENTFD +int ret; +struct kvm_ioeventfd iofd; + +iofd.datamatch = val; +iofd.addr = addr; +iofd.len = 4; +iofd.flags = KVM_IOEVENTFD_FLAG_DATAMATCH; +iofd.fd = fd; + +if (!kvm_enabled()) { +return -ENOSYS; +} + +if (!assign) { +iofd.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN; +} + +ret = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, iofd); + +if (ret 0) { +return -errno; +} + +return 0; +#else +return -ENOSYS; +#endif +} + int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool assign) { #ifdef KVM_IOEVENTFD diff --git a/kvm.h b/kvm.h index 93f8187..50b6c01 100644 --- a/kvm.h +++ b/kvm.h @@ -175,6 +175,7 @@ static inline void cpu_synchronize_post_init(CPUState *env) } #endif +int kvm_set_ioeventfd_mmio_long(int fd, uint32_t adr, uint32_t val, bool assign); int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val, bool assign); #endif -- 1.6.3.2.198.g6096d
[Qemu-devel] [PATCH v8 1/5] Add qemu_ram_alloc_from_ptr function
Provide a function to add an allocated region of memory to the qemu RAM. This patch is copied from Marcelo's qemu_ram_map() in qemu-kvm and given the clearer name qemu_ram_alloc_from_ptr(). Signed-off-by: Cam Macdonell c...@cs.ualberta.ca --- cpu-common.h |2 ++ exec.c | 43 +++ 2 files changed, 45 insertions(+), 0 deletions(-) diff --git a/cpu-common.h b/cpu-common.h index 71e7933..0426bc8 100644 --- a/cpu-common.h +++ b/cpu-common.h @@ -40,6 +40,8 @@ static inline void cpu_register_physical_memory(target_phys_addr_t start_addr, } ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr); +ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name, +ram_addr_t size, void *host); ram_addr_t qemu_ram_alloc(DeviceState *dev, const char *name, ram_addr_t size); void qemu_ram_free(ram_addr_t addr); /* This should only be used for ram local to a device. */ diff --git a/exec.c b/exec.c index 868cd7f..8636316 100644 --- a/exec.c +++ b/exec.c @@ -2808,6 +2808,49 @@ static ram_addr_t last_ram_offset(void) return last; } +ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name, +ram_addr_t size, void *host) +{ +RAMBlock *new_block, *block; + +size = TARGET_PAGE_ALIGN(size); +new_block = qemu_mallocz(sizeof(*new_block)); + +if (dev dev-parent_bus dev-parent_bus-info-get_dev_path) { +char *id = dev-parent_bus-info-get_dev_path(dev); +if (id) { +snprintf(new_block-idstr, sizeof(new_block-idstr), %s/, id); +qemu_free(id); +} +} +pstrcat(new_block-idstr, sizeof(new_block-idstr), name); + +QLIST_FOREACH(block, ram_list.blocks, next) { +if (!strcmp(block-idstr, new_block-idstr)) { +fprintf(stderr, RAMBlock \%s\ already registered, abort!\n, +new_block-idstr); +abort(); +} +} + +new_block-host = host; + +new_block-offset = find_ram_offset(size); +new_block-length = size; + +QLIST_INSERT_HEAD(ram_list.blocks, new_block, next); + +ram_list.phys_dirty = qemu_realloc(ram_list.phys_dirty, + last_ram_offset() TARGET_PAGE_BITS); +memset(ram_list.phys_dirty + (new_block-offset TARGET_PAGE_BITS), + 0xff, size TARGET_PAGE_BITS); + +if (kvm_enabled()) +kvm_setup_guest_memory(new_block-host, size); + +return new_block-offset; +} + ram_addr_t qemu_ram_alloc(DeviceState *dev, const char *name, ram_addr_t size) { RAMBlock *new_block, *block; -- 1.6.3.2.198.g6096d
[Qemu-devel] [PATCH v8 0/5] Inter-VM Shared Memory Device
Latest patch for PCI shared memory device that maps a host shared memory object to be shared between guests new in this series v8 - rebased to qemu.git - added qemu_ram_alloc_from_ptr() function - removed irqfd support as support functions are not yet in qemu v7 - replace marking memory from v6 with marking device as unmigratable indicating that it should be unplugged before migration and re-added after. - 'peer' case changed to require removal before migration, only 'master' devices can be migrated while attached. v6 - migration support with 'master' and 'peer' roles for guest to determine who owns memory - modified phys_ram_dirty array for marking memory as not to be migrated v5: - fixed segfault for non-server case - code style fixes - removed limit on the number of guests - shared memory server is now in qemu.git/contrib - made ioeventfd setup function generic - removed interrupts when guest joined (let application handle it) v4: - moved to single Doorbell register and use datamatch to trigger different VMs rather than one register per eventfd - remove writing arbitrary values to eventfds. Only values of 1 are now written to ensure correct usage Cam Macdonell (5): Provide a function to add an allocated region of memory to the qemu RAM. Device specification for shared memory PCI device Add function to assign ioeventfd to MMIO. Support marking a device as non-migratable Inter-VM shared memory PCI device Makefile.target|3 + cpu-common.h |2 + docs/specs/ivshmem_device_spec.txt | 96 exec.c | 43 ++ hw/hw.h|2 + hw/ivshmem.c | 834 kvm-all.c | 32 ++ kvm.h |1 + qemu-char.c|6 + qemu-char.h|3 + qemu-doc.texi | 43 ++ savevm.c | 44 ++- 12 files changed, 1106 insertions(+), 3 deletions(-) create mode 100644 docs/specs/ivshmem_device_spec.txt create mode 100644 hw/ivshmem.c
[Qemu-devel] [PATCH v8 4/5] Support marking a device as non-migratable
A non-migratable device should be removed before migration and re-added after. Signed-off-by: Cam Macdonell c...@cs.ualberta.ca --- hw/hw.h |2 ++ savevm.c | 44 +--- 2 files changed, 43 insertions(+), 3 deletions(-) diff --git a/hw/hw.h b/hw/hw.h index c2de6fe..28b4204 100644 --- a/hw/hw.h +++ b/hw/hw.h @@ -264,6 +264,8 @@ int register_savevm_live(DeviceState *dev, void *opaque); void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque); +void register_device_unmigratable(DeviceState *dev, const char *idstr, +void *opaque); typedef void QEMUResetHandler(void *opaque); diff --git a/savevm.c b/savevm.c index ee27989..6d55d28 100644 --- a/savevm.c +++ b/savevm.c @@ -1005,6 +1005,7 @@ typedef struct SaveStateEntry { const VMStateDescription *vmsd; void *opaque; CompatEntry *compat; +int no_migrate; } SaveStateEntry; @@ -1068,6 +1069,7 @@ int register_savevm_live(DeviceState *dev, se-load_state = load_state; se-opaque = opaque; se-vmsd = NULL; +se-no_migrate = 0; if (dev dev-parent_bus dev-parent_bus-info-get_dev_path) { char *id = dev-parent_bus-info-get_dev_path(dev); @@ -1131,6 +1133,31 @@ void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque) } } +/* mark a device as not to be migrated, that is the device should be + unplugged before migration */ +void register_device_unmigratable(DeviceState *dev, const char *idstr, +void *opaque) +{ +SaveStateEntry *se; +char id[256] = ; + +if (dev dev-parent_bus dev-parent_bus-info-get_dev_path) { +char *path = dev-parent_bus-info-get_dev_path(dev); +if (path) { +pstrcpy(id, sizeof(id), path); +pstrcat(id, sizeof(id), /); +qemu_free(path); +} +} +pstrcat(id, sizeof(id), idstr); + +QTAILQ_FOREACH(se, savevm_handlers, entry) { +if (strcmp(se-idstr, id) == 0 se-opaque == opaque) { +se-no_migrate = 1; +} +} +} + int vmstate_register_with_alias_id(DeviceState *dev, int instance_id, const VMStateDescription *vmsd, void *opaque, int alias_id, @@ -1323,13 +1350,19 @@ static int vmstate_load(QEMUFile *f, SaveStateEntry *se, int version_id) return vmstate_load_state(f, se-vmsd, se-opaque, version_id); } -static void vmstate_save(QEMUFile *f, SaveStateEntry *se) +static int vmstate_save(QEMUFile *f, SaveStateEntry *se) { +if (se-no_migrate) { +return -1; +} + if (!se-vmsd) { /* Old style */ se-save_state(f, se-opaque); -return; +return 0; } vmstate_save_state(f,se-vmsd, se-opaque); + +return 0; } #define QEMU_VM_FILE_MAGIC 0x5145564d @@ -1423,6 +1456,7 @@ int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f) int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f) { SaveStateEntry *se; +int r; cpu_synchronize_all_states(); @@ -1455,7 +1489,11 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f) qemu_put_be32(f, se-instance_id); qemu_put_be32(f, se-version_id); -vmstate_save(f, se); +r = vmstate_save(f, se); +if (r 0) { +monitor_printf(mon, cannot migrate with device '%s'\n, se-idstr); +return r; +} } qemu_put_byte(f, QEMU_VM_EOF); -- 1.6.3.2.198.g6096d
[Qemu-devel] [PATCH v8 2/5] Device specification for shared memory PCI device
Signed-off-by: Cam Macdonell c...@cs.ualberta.ca --- docs/specs/ivshmem_device_spec.txt | 96 1 files changed, 96 insertions(+), 0 deletions(-) create mode 100644 docs/specs/ivshmem_device_spec.txt diff --git a/docs/specs/ivshmem_device_spec.txt b/docs/specs/ivshmem_device_spec.txt new file mode 100644 index 000..23dd2ba --- /dev/null +++ b/docs/specs/ivshmem_device_spec.txt @@ -0,0 +1,96 @@ + +Device Specification for Inter-VM shared memory device +-- + +The Inter-VM shared memory device is designed to share a region of memory to +userspace in multiple virtual guests. The memory region does not belong to any +guest, but is a POSIX memory object on the host. Optionally, the device may +support sending interrupts to other guests sharing the same memory region. + + +The Inter-VM PCI device +--- + +*BARs* + +The device supports three BARs. BAR0 is a 1 Kbyte MMIO region to support +registers. BAR1 is used for MSI-X when it is enabled in the device. BAR2 is +used to map the shared memory object from the host. The size of BAR2 is +specified when the guest is started and must be a power of 2 in size. + +*Registers* + +The device currently supports 4 registers of 32-bits each. Registers +are used for synchronization between guests sharing the same memory object when +interrupts are supported (this requires using the shared memory server). + +The server assigns each VM an ID number and sends this ID number to the Qemu +process when the guest starts. + +enum ivshmem_registers { +IntrMask = 0, +IntrStatus = 4, +IVPosition = 8, +Doorbell = 12 +}; + +The first two registers are the interrupt mask and status registers. Mask and +status are only used with pin-based interrupts. They are unused with MSI +interrupts. + +Status Register: The status register is set to 1 when an interrupt occurs. + +Mask Register: The mask register is bitwise ANDed with the interrupt status +and the result will raise an interrupt if it is non-zero. However, since 1 is +the only value the status will be set to, it is only the first bit of the mask +that has any effect. Therefore interrupts can be masked by setting the first +bit to 0 and unmasked by setting the first bit to 1. + +IVPosition Register: The IVPosition register is read-only and reports the +guest's ID number. The guest IDs are non-negative integers. When using the +server, since the server is a separate process, the VM ID will only be set when +the device is ready (shared memory is received from the server and accessible via +the device). If the device is not ready, the IVPosition will return -1. +Applications should ensure that they have a valid VM ID before accessing the +shared memory. + +Doorbell Register: To interrupt another guest, a guest must write to the +Doorbell register. The doorbell register is 32-bits, logically divided into +two 16-bit fields. The high 16-bits are the guest ID to interrupt and the low +16-bits are the interrupt vector to trigger. The semantics of the value +written to the doorbell depends on whether the device is using MSI or a regular +pin-based interrupt. In short, MSI uses vectors while regular interrupts set the +status register. + +Regular Interrupts + +If regular interrupts are used (due to either a guest not supporting MSI or the +user specifying not to use them on startup) then the value written to the lower +16-bits of the Doorbell register results is arbitrary and will trigger an +interrupt in the destination guest. + +Message Signalled Interrupts + +A ivshmem device may support multiple MSI vectors. If so, the lower 16-bits +written to the Doorbell register must be between 0 and the maximum number of +vectors the guest supports. The lower 16 bits written to the doorbell is the +MSI vector that will be raised in the destination guest. The number of MSI +vectors is configurable but it is set when the VM is started. + +The important thing to remember with MSI is that it is only a signal, no status +is set (since MSI interrupts are not shared). All information other than the +interrupt itself should be communicated via the shared memory region. Devices +supporting multiple MSI vectors can use different vectors to indicate different +events have occurred. The semantics of interrupt vectors are left to the +user's discretion. + + +Usage in the Guest +-- + +The shared memory device is intended to be used with the provided UIO driver. +Very little configuration is needed. The guest should map BAR0 to access the +registers (an array of 32-bit ints allows simple writing) and map BAR2 to +access the shared memory region itself. The size of the shared memory region +is specified when the guest (or shared memory server) is started. A guest may +map the whole shared memory region or only part of it. -- 1.6.3.2.198.g6096d
[Qemu-devel] [PATCH v8 5/5] Inter-VM shared memory PCI device
Support an inter-vm shared memory device that maps a shared-memory object as a PCI device in the guest. This patch also supports interrupts between guest by communicating over a unix domain socket. This patch applies to the qemu-kvm repository. -device ivshmem,size=size in format accepted by -m[,shm=shm name] Interrupts are supported between multiple VMs by using a shared memory server by using a chardev socket. -device ivshmem,size=size in format accepted by -m[,shm=shm name] [,chardev=id][,msi=on][,ioeventfd=on][,vectors=n][,role=peer|master] -chardev socket,path=path,id=id The shared memory server, sample programs and init scripts are in a git repo here: www.gitorious.org/nahanni Signed-off-by: Cam Macdonell c...@cs.ualberta.ca --- Makefile.target |3 + hw/ivshmem.c| 834 +++ qemu-char.c |6 + qemu-char.h |3 + qemu-doc.texi | 43 +++ 5 files changed, 889 insertions(+), 0 deletions(-) create mode 100644 hw/ivshmem.c diff --git a/Makefile.target b/Makefile.target index 3ef4666..4cb3a00 100644 --- a/Makefile.target +++ b/Makefile.target @@ -188,6 +188,9 @@ obj-$(CONFIG_USB_OHCI) += usb-ohci.o obj-y += rtl8139.o obj-y += e1000.o +# Inter-VM PCI shared memory +obj-y += ivshmem.o + # Hardware support obj-i386-y += vga.o obj-i386-y += mc146818rtc.o i8259.o pc.o diff --git a/hw/ivshmem.c b/hw/ivshmem.c new file mode 100644 index 000..3f197ae --- /dev/null +++ b/hw/ivshmem.c @@ -0,0 +1,834 @@ +/* + * Inter-VM Shared Memory PCI device. + * + * Author: + * Cam Macdonell c...@cs.ualberta.ca + * + * Based On: cirrus_vga.c + * Copyright (c) 2004 Fabrice Bellard + * Copyright (c) 2004 Makoto Suzuki (suzu) + * + * and rtl8139.c + * Copyright (c) 2006 Igor Kovalenko + * + * This code is licensed under the GNU GPL v2. + */ +#include hw.h +#include pc.h +#include pci.h +#include msix.h +#include kvm.h + +#include sys/mman.h +#include sys/types.h + +#define IVSHMEM_IOEVENTFD 0 +#define IVSHMEM_MSI 1 + +#define IVSHMEM_PEER0 +#define IVSHMEM_MASTER 1 + +#define IVSHMEM_REG_BAR_SIZE 0x100 + +//#define DEBUG_IVSHMEM +#ifdef DEBUG_IVSHMEM +#define IVSHMEM_DPRINTF(fmt, ...)\ +do {printf(IVSHMEM: fmt, ## __VA_ARGS__); } while (0) +#else +#define IVSHMEM_DPRINTF(fmt, ...) +#endif + +typedef struct Peer { +int nb_eventfds; +int *eventfds; +} Peer; + +typedef struct EventfdEntry { +PCIDevice *pdev; +int vector; +} EventfdEntry; + +typedef struct IVShmemState { +PCIDevice dev; +uint32_t intrmask; +uint32_t intrstatus; +uint32_t doorbell; + +CharDriverState **eventfd_chr; +CharDriverState *server_chr; +int ivshmem_mmio_io_addr; + +pcibus_t mmio_addr; +pcibus_t shm_pci_addr; +uint64_t ivshmem_offset; +uint64_t ivshmem_size; /* size of shared memory region */ +int shm_fd; /* shared memory file descriptor */ + +Peer *peers; +int nb_peers; /* how many guests we have space for */ +int max_peer; /* maximum numbered peer */ + +int vm_id; +uint32_t vectors; +uint32_t features; +EventfdEntry *eventfd_table; + +char * shmobj; +char * sizearg; +char * role; +int role_val; /* scalar to avoid multiple string comparisons */ +} IVShmemState; + +/* registers for the Inter-VM shared memory device */ +enum ivshmem_registers { +INTRMASK = 0, +INTRSTATUS = 4, +IVPOSITION = 8, +DOORBELL = 12, +}; + +static inline uint32_t ivshmem_has_feature(IVShmemState *ivs, +unsigned int feature) { +return (ivs-features (1 feature)); +} + +static inline bool is_power_of_two(uint64_t x) { +return (x (x - 1)) == 0; +} + +static void ivshmem_map(PCIDevice *pci_dev, int region_num, +pcibus_t addr, pcibus_t size, int type) +{ +IVShmemState *s = DO_UPCAST(IVShmemState, dev, pci_dev); + +s-shm_pci_addr = addr; + +if (s-ivshmem_offset 0) { +cpu_register_physical_memory(s-shm_pci_addr, s-ivshmem_size, +s-ivshmem_offset); +} + +IVSHMEM_DPRINTF(guest pci addr = % FMT_PCIBUS , guest h/w addr = % +PRIu64 , size = % FMT_PCIBUS \n, addr, s-ivshmem_offset, size); + +} + +/* accessing registers - based on rtl8139 */ +static void ivshmem_update_irq(IVShmemState *s, int val) +{ +int isr; +isr = (s-intrstatus s-intrmask) 0x; + +/* don't print ISR resets */ +if (isr) { +IVSHMEM_DPRINTF(Set IRQ to %d (%04x %04x)\n, + isr ? 1 : 0, s-intrstatus, s-intrmask); +} + +qemu_set_irq(s-dev.irq[0], (isr != 0)); +} + +static void ivshmem_IntrMask_write(IVShmemState *s, uint32_t val) +{ +IVSHMEM_DPRINTF(IntrMask write(w) val = 0x%04x\n, val); + +s-intrmask = val; + +ivshmem_update_irq(s, val); +} + +static uint32_t ivshmem_IntrMask_read
Re: [Qemu-devel] virtio-9p is not working
On Wed, Jul 21, 2010 at 2:27 AM, Dallas Lee mswpl...@gmail.com wrote: Hi, I have trying to use the virtio-9p for my linux in QEMU, but without success. Here is my option for booting my qemu: i386-softmmu/qemu -kernel bzImage -append console=ttyS0 video=uvesafb:ywrap,overlay:rgb16,480x800...@60 root=/dev/nfs rw nfsroot=10.0.2.2:/root,udp ip=10.0.2.16:eth0:none 5 -net nic,model=virtio -net user -soundhw all -usb -serial telnet:localhost:1200,server -vga std -m 512 -L ./pc-bios -bios bios.bin -virtfs local,path=/home/dallas/nfs,security_model=passthrough,mount_tag=v_tmp The virtio network is working, I could mount the nfs through virio net. And in the guest linux, I tried to mount v9fs by using following command: mount -t 9p -o trans=virtio -o debug=0x v_tmp /mnt but unfortunately I got the error: mount: mounting v_tmp on /mnt failed: No such device And I can't find the v_tmp neither in /sys/devices/virtio-pci/virtio1/ nor in /sys/bus/virtio/drivers/9pnet_virtio/virtio1/ And before building the kernel, I enabled the Plan 9 Ressource Sharing Support under File System/Network File System, I also enabled the following configures: PARAVIRT_GUEST: - Processor type and features - Paravirtualized guest support LGUEST_GUEST: - Processor type and features - Paravirtualized guest support - Lguest guest support VIRTIO_PCI: - Virtualization (VIRTUALIZATION [=y]) - PCI driver for virtio devices VIRTIO_BLK: - Device Drivers - Block devices (BLK_DEV [=y]) - Virtio block driver VIRTIO_NET: - Device Drivers - Network device support (NETDEVICES [=y]) - Virtio network driver Would you please help me to find out the problem why I couldn't mount the v9fs? Thank you very much! BR, Dallas Hi Dallas, what does 'lspci -vv' in the guest show? Is there a device for virtio_9p? do you have CONFIG_NET_9P_VIRTIO=y in your kernel's .config? Cam
Re: [Qemu-devel] Re: Unusual physical address when using 64-bit BAR
On Tue, Jul 13, 2010 at 8:52 PM, Isaku Yamahata yamah...@valinux.co.jp wrote: On Tue, Jul 13, 2010 at 04:48:19PM -0600, Cam Macdonell wrote: On Tue, Jul 13, 2010 at 2:41 PM, Isaku Yamahata yamah...@valinux.co.jp wrote: On Tue, Jul 13, 2010 at 02:05:51PM -0600, Cam Macdonell wrote: Seabios completely ignore the 64-bitness of the BAR. ?Looks like it also thinks the second half of the BAR is an I/O region instead of memory (hence the c200, that's part of the pci portio region. I've sent the patches to address it. But they haven't been merged yet. seabios doesn't map BARs beyond 4GB. If bar is mapped beyond 4GB, guest BIOS does it. Have those patches been merged yet? They have been merged into seabios upstream now. qemu seabios fork hasn't pulled for a while, though. To see how seabios works, it would help to increase CONFIG_DEBUG_LEVEL in config.h of seabios Where does the output from seabios end up? ?Inside dmesg? It outputs them to the serial console which qemu emulates. seabios is out of kernel control, so dmesg doesn't show it. pci_read_config: (val) 0x0 - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) pci_read_config: (val) 0x - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) pci_read_config: (val) 0x0 - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) seabios BAR3. Not sure how it is mapped from this message. Isn't the BAR3 from the fact that a 64-bit BAR would use both BAR2 and BAR3 to store all 64-bits? Yes. Seabios misbehaves. 64bit bar is(was) a missing feature. -- yamahata With the latest seabios git passed via -bios, I no longer see the 48-bit address, but instead a 32-bit address and then . This guest has 1gb of RAM so the address isn't be mapped beyond 4g. Can I see the debug log like before? (hopefully seabios with CONFIG_DEBUG_LEVEL enabled.) Here's the dump from SeaBIOS in the region related to the PCI devices. The SeaBIOS output is identical whether the BAR is 32-bit or 64-bit. PCI: bus=0 devfn=0x10: vendor_id=0x1013 device_id=0x00b8 region 0: 0xf000 region 1: 0xf200 region 6: 0xf201 PCI: bus=0 devfn=0x18: vendor_id=0x1af4 device_id=0x1000 region 0: 0xc020 region 1: 0xf202 region 6: 0xf203 PCI: bus=0 devfn=0x20: vendor_id=0x1af4 device_id=0x1110 region 0: 0xf204 region 1: 0xf2041000 region 2: 0x Do you know who sets the BAR to ? Here are the config reads/writes related to the 0x18/1c, the 'IVSHMEM' lines are from the map function passed to pci_register_bar(). It looks like SeaBIOS sets the address to 0 and then the potentially useful e000 address gets mangled into 00. IVSHMEM: guest pci addr = 0, guest h/w addr = 1090912256, size = 536870912 ...snip... pci_read_config: (val) 0x4 - 0x18 (addr) pci_write_config: (val) 0x0 - 0x18 (addr) IVSHMEM: guest pci addr = e000, guest h/w addr = 1090912256, size = 2000 pci_read_config: (val) 0xe004 - 0x18 (addr) pci_write_config: (val) 0x0 - 0x18 (addr) pci_read_config: (val) 0x0 - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) IVSHMEM: guest pci addr = , guest h/w addr = 1090912256, size = 2000 pci_read_config: (val) 0x - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) and with the 64-bit guest I get this error as well (recall the guest fails to boot on 64-bit) BUG: kvm_dirty_pages_log_change: invalid parameters f000-f0ff If it's seabios, does the following patch help? The patch doesn't make a difference. Perhaps it's Qemu then? diff --git a/src/pciinit.c b/src/pciinit.c index b110531..ce07709 100644 --- a/src/pciinit.c +++ b/src/pciinit.c @@ -117,11 +117,7 @@ static int pci_bios_allocate_region(u16 bdf, int region_num) int is_64bit = !(val PCI_BASE_ADDRESS_SPACE_IO) (val PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64; if (is_64bit) { - if (size 0) { - pci_config_writel(bdf, ofs + 4, 0); - } else { - pci_config_writel(bdf, ofs + 4, ~0); - } + pci_config_writel(bdf, ofs + 4, 0); } return is_64bit; } IVSHMEM: guest pci addr = e000, guest h/w addr = 1090912256, size = 2000 IVSHMEM: guest pci addr = , guest h/w addr = 1090912256, size = 2000 However, booting fails when I use a 64-bit BAR. Booting is fine with a 32-bit BAR. HPET: 1 timers in total, 0 timers will be used for per-cpu timer divide error: [#1] SMP last sysfs file: CPU 0 Modules linked in: Pid: 1, comm: swapper Not tainted 2.6.35-rc1 #280 /Bochs RIP: 0010:[812a801b] [812a801b] hpet_alloc+0x12c/0x35b RSP: 0018:88003e61fd80 EFLAGS: 00010246 RAX: 00038d7ea4c68000 RBX: 88003e6efd80 RCX: RDX: RSI: RDI: 817b8520 RBP
Re: [Qemu-devel] Re: Unusual physical address when using 64-bit BAR
On Tue, Jun 29, 2010 at 9:29 PM, Isaku Yamahata yamah...@valinux.co.jp wrote: On Tue, Jun 29, 2010 at 11:48:13AM -0600, Cam Macdonell wrote: On Tue, Jun 29, 2010 at 12:50 AM, Avi Kivity a...@redhat.com wrote: On 06/28/2010 11:38 PM, Cam Macdonell wrote: Is this really the address the guest programmed, or is qemu misinterpreting it? Well, what's the answer? You're going to have to give me a hint on how to determine that. lspci in the guest shows the following Memory at c200 (64-bit, non-prefetchable) [size=1024M] does that demonstrate a guest generated address? That's the result of a round trip: the guest programmed the address and then read it back. ?It could have been screwed up in the first place, or perhaps qemu screwed it up. Add a printf() to the config space handlers in qemu (likely in your own code) on writes and reads, and show the relevant writes (and reads) for this BAR. That's the theory of deductive debugging; however browsing the code shows the guest is at fault: ? ? ? ?for (i = 0; i PCI_NUM_REGIONS; i++) { ? ? ? ? ? ?int ofs; ? ? ? ? ? ?if (i == PCI_ROM_SLOT) ? ? ? ? ? ? ? ?ofs = PCI_ROM_ADDRESS; ? ? ? ? ? ?else ? ? ? ? ? ? ? ?ofs = PCI_BASE_ADDRESS_0 + i * 4; ? ? ? ? ? ?u32 old = pci_config_readl(bdf, ofs); ? ? ? ? ? ?u32 mask; ? ? ? ? ? ?if (i == PCI_ROM_SLOT) { ? ? ? ? ? ? ? ?mask = PCI_ROM_ADDRESS_MASK; ? ? ? ? ? ? ? ?pci_config_writel(bdf, ofs, mask); ? ? ? ? ? ?} else { ? ? ? ? ? ? ? ?if (old PCI_BASE_ADDRESS_SPACE_IO) ? ? ? ? ? ? ? ? ? ?mask = PCI_BASE_ADDRESS_IO_MASK; ? ? ? ? ? ? ? ?else ? ? ? ? ? ? ? ? ? ?mask = PCI_BASE_ADDRESS_MEM_MASK; ? ? ? ? ? ? ? ?pci_config_writel(bdf, ofs, ~0); ? ? ? ? ? ?} ? ? ? ? ? ?u32 val = pci_config_readl(bdf, ofs); ? ? ? ? ? ?pci_config_writel(bdf, ofs, old); ? ? ? ? ? ?if (val != 0) { ? ? ? ? ? ? ? ?u32 size = (~(val mask)) + 1; ? ? ? ? ? ? ? ?if (val PCI_BASE_ADDRESS_SPACE_IO) ? ? ? ? ? ? ? ? ? ?paddr = pci_bios_io_addr; ? ? ? ? ? ? ? ?else ? ? ? ? ? ? ? ? ? ?paddr = pci_bios_mem_addr; ? ? ? ? ? ? ? ?*paddr = ALIGN(*paddr, size); ? ? ? ? ? ? ? ?pci_set_io_region_addr(bdf, i, *paddr); ? ? ? ? ? ? ? ?*paddr += size; ? ? ? ? ? ?} ? ? ? ?} ? ? ? ?break; ? ?} Seabios completely ignore the 64-bitness of the BAR. ?Looks like it also thinks the second half of the BAR is an I/O region instead of memory (hence the c200, that's part of the pci portio region. I've sent the patches to address it. But they haven't been merged yet. seabios doesn't map BARs beyond 4GB. If bar is mapped beyond 4GB, guest BIOS does it. Have those patches been merged yet? To see how seabios works, it would help to increase CONFIG_DEBUG_LEVEL in config.h of seabios Where does the output from seabios end up? Inside dmesg? Do post those reads and writes, I think there's more than one thing wrong here. Here it is, I added the debug statements to pci_read_config and pci_default_write_config. here are the reads and writes to offsets 0x18 and 0x1c where a 64-bit BAR2 config would be configured It seems that 0x1c is accessed as BAR3. The write value in the log is always 0. Some comment in the log. pci_read_config: (val) 0x4 - 0x18 (addr) pci_write_config: (val) 0x0 - 0x18 (addr) pci_read_config: (val) 0xc004 - 0x18 (addr) pci_write_config: (val) 0x0 - 0x18 (addr) pci_read_config: (val) 0x4 - 0x18 (addr) pci_write_config: (val) 0x0 - 0x18 (addr) pci_read_config: (val) 0x0 - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) pci_read_config: (val) 0x - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) pci_read_config: (val) 0x0 - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) pci_read_config: (val) 0x4 - 0x18 (addr) pci_write_config: (val) 0x0 - 0x18 (addr) pci_read_config: (val) 0xc004 - 0x18 (addr) pci_write_config: (val) 0x0 - 0x18 (addr) pci_read_config: (val) 0xc040 - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) pci_read_config: (val) 0x - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) the complete read/write profile is below along with debug statements from the map functions for the BARs (prefixed with IVSHMEM) pci_read_config: (val) 0x1af4 - 0x0 (addr) pci_read_config: (val) 0x0 - 0xe (addr) pci_read_config: (val) 0x1af4 - 0x0 (addr) pci_read_config: (val) 0x1110 - 0x2 (addr) pci_read_config: (val) 0x0 - 0xe (addr) pci_read_config: (val) 0x1af4 - 0x0 (addr) pci_read_config: (val) 0x0 - 0xe (addr) pci_read_config: (val) 0x500 - 0xa (addr) pci_read_config: (val) 0x1af4 - 0x0 (addr) pci_read_config: (val) 0x1110 - 0x2 (addr) pci_read_config: (val) 0x0 - 0x10 (addr) pci_write_config: (val) 0x0 - 0x10 (addr) pci_read_config: (val) 0xff00 - 0x10 (addr) pci_write_config: (val) 0x0 - 0x10 (addr) pci_read_config: (val) 0x0 - 0x10 (addr) pci_write_config: (val) 0x0 - 0x10 (addr) pci_read_config: (val) 0x0
Re: [Qemu-devel] Re: Unusual physical address when using 64-bit BAR
On Tue, Jul 13, 2010 at 2:41 PM, Isaku Yamahata yamah...@valinux.co.jp wrote: On Tue, Jul 13, 2010 at 02:05:51PM -0600, Cam Macdonell wrote: Seabios completely ignore the 64-bitness of the BAR. ?Looks like it also thinks the second half of the BAR is an I/O region instead of memory (hence the c200, that's part of the pci portio region. I've sent the patches to address it. But they haven't been merged yet. seabios doesn't map BARs beyond 4GB. If bar is mapped beyond 4GB, guest BIOS does it. Have those patches been merged yet? They have been merged into seabios upstream now. qemu seabios fork hasn't pulled for a while, though. To see how seabios works, it would help to increase CONFIG_DEBUG_LEVEL in config.h of seabios Where does the output from seabios end up? Inside dmesg? It outputs them to the serial console which qemu emulates. seabios is out of kernel control, so dmesg doesn't show it. pci_read_config: (val) 0x0 - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) pci_read_config: (val) 0x - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) pci_read_config: (val) 0x0 - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) seabios BAR3. Not sure how it is mapped from this message. Isn't the BAR3 from the fact that a 64-bit BAR would use both BAR2 and BAR3 to store all 64-bits? Yes. Seabios misbehaves. 64bit bar is(was) a missing feature. -- yamahata With the latest seabios git passed via -bios, I no longer see the 48-bit address, but instead a 32-bit address and then . This guest has 1gb of RAM so the address isn't be mapped beyond 4g. IVSHMEM: guest pci addr = e000, guest h/w addr = 1090912256, size = 2000 IVSHMEM: guest pci addr = , guest h/w addr = 1090912256, size = 2000 However, booting fails when I use a 64-bit BAR. Booting is fine with a 32-bit BAR. HPET: 1 timers in total, 0 timers will be used for per-cpu timer divide error: [#1] SMP last sysfs file: CPU 0 Modules linked in: Pid: 1, comm: swapper Not tainted 2.6.35-rc1 #280 /Bochs RIP: 0010:[812a801b] [812a801b] hpet_alloc+0x12c/0x35b RSP: 0018:88003e61fd80 EFLAGS: 00010246 RAX: 00038d7ea4c68000 RBX: 88003e6efd80 RCX: RDX: RSI: RDI: 817b8520 RBP: 88003e61fdc0 R08: 80d0 R09: c900 R10: 88003f9ac600 R11: R12: 88003e61fdd0 R13: c900 R14: R15: 817a00a9 FS: () GS:88000200() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: CR3: 01a42000 CR4: 06f0DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process swapper (pid: 1, threadinfo 88003e61e000, task 88003e62) Stack: 88003f43ab90 88003f43ab90 81ca3174 81b1d596 0 0100 0100 0 88003e61fe80 8102945d fed0 c900 Call Trace: [81b1d596] ? hpet_late_init+0x0/0xea [8102945d] hpet_reserve_platform_timers+0x10b/0x115 [81b1d596] ? hpet_late_init+0x0/0xea [81b1d601] hpet_late_init+0x6b/0xea [81b1d596] ? hpet_late_init+0x0/0xea [81002069] do_one_initcall+0x5e/0x159 [81b0b71e] kernel_init+0x18e/0x21c [8100aa24] kernel_thread_helper+0x4/0x10 [81b0b590] ? kernel_init+0x0/0x21c [8100aa20] ? kernel_thread_helper+0x0/0x10 Code: 89 1d 8a 92 b3 00 48 c1 ea 21 8b 73 34 49 c7 c7 a9 00 7a 81 48 8d 04 02 4c 89 f2 48 c7 c7 20 85 7b 81 48 c1 ea 20 48 89 d1 31 d2 48 f7 f1 83 7b 30 01 48 c7 c1 cd fa 7c 81 49 0f 46 cf 48 89 43 RIP [812a801b] hpet_alloc+0x12c/0x35b RSP 88003e61fd80 ---[ end trace a7919e7f17c0a725 ]--- Kernel panic - not syncing: Attempted to kill init! Pid: 1, comm: swapper Tainted: G D 2.6.35-rc1 #280 Call Trace: [8145b870] panic+0x8b/0x10b [81056a93] ? exit_ptrace+0x38/0x121 [8104f9e8] do_exit+0x7a/0x722 [8104c3bd] ? spin_unlock_irqrestore+0xe/0x10 [8104cfd8] ? kmsg_dump+0x12b/0x145 [8145eaa9] oops_end+0xbf/0xc7 [8100d299] die+0x5a/0x63 [8145e4b3] do_trap+0x121/0x130 [8100b560] do_divide_error+0x96/0x9f [812a801b] ? hpet_alloc+0x12c/0x35b [8100a83b] divide_error+0x1b/0x20 [812a801b] ? hpet_alloc+0x12c/0x35b [81b1d596] ? hpet_late_init+0x0/0xea [8102945d] hpet_reserve_platform_timers+0x10b/0x115 [81b1d596] ? hpet_late_init+0x0/0xea [81b1d601] hpet_late_init+0x6b/0xea [81b1d596] ? hpet_late_init+0x0/0xea [81002069] do_one_initcall+0x5e/0x159 [81b0b71e] kernel_init+0x18e/0x21c [8100aa24] kernel_thread_helper+0x4/0x10
[Qemu-devel] [PATCH v7 RESEND 4/4] Inter-VM shared memory PCI device
Resent (again): Some lines were over 80 characters and debugging is now off. Support an inter-vm shared memory device that maps a shared-memory object as a PCI device in the guest. This patch also supports interrupts between guest by communicating over a unix domain socket. This patch applies to the qemu-kvm repository. -device ivshmem,size=size in format accepted by -m[,shm=shm name] Interrupts are supported between multiple VMs by using a shared memory server by using a chardev socket. -device ivshmem,size=size in format accepted by -m[,shm=shm name] [,chardev=id][,msi=on][,irqfd=on][,vectors=n][,role=peer|master] -chardev socket,path=path,id=id The shared memory server, sample programs and init scripts are in a git repo here: www.gitorious.org/nahanni Signed-off-by: Cam Macdonell c...@cs.ualberta.ca --- Makefile.target |3 + hw/ivshmem.c| 842 +++ qemu-char.c |6 + qemu-char.h |3 + qemu-doc.texi | 43 +++ 5 files changed, 897 insertions(+), 0 deletions(-) create mode 100644 hw/ivshmem.c diff --git a/Makefile.target b/Makefile.target index a0e9747..1e99ec8 100644 --- a/Makefile.target +++ b/Makefile.target @@ -203,6 +203,9 @@ obj-$(CONFIG_USB_OHCI) += usb-ohci.o obj-y += rtl8139.o obj-y += e1000.o +# Inter-VM PCI shared memory +obj-y += ivshmem.o + # Hardware support obj-i386-y += vga.o obj-i386-y += mc146818rtc.o i8259.o pc.o diff --git a/hw/ivshmem.c b/hw/ivshmem.c new file mode 100644 index 000..763b9c2 --- /dev/null +++ b/hw/ivshmem.c @@ -0,0 +1,842 @@ +/* + * Inter-VM Shared Memory PCI device. + * + * Author: + * Cam Macdonell c...@cs.ualberta.ca + * + * Based On: cirrus_vga.c + * Copyright (c) 2004 Fabrice Bellard + * Copyright (c) 2004 Makoto Suzuki (suzu) + * + * and rtl8139.c + * Copyright (c) 2006 Igor Kovalenko + * + * This code is licensed under the GNU GPL v2. + */ +#include hw.h +#include pc.h +#include pci.h +#include msix.h +#include kvm.h + +#include sys/mman.h +#include sys/types.h + +#define IVSHMEM_IRQFD 0 +#define IVSHMEM_MSI 1 + +#define IVSHMEM_PEER0 +#define IVSHMEM_MASTER 1 + +#define IVSHMEM_REG_BAR_SIZE 0x100 + +//#define DEBUG_IVSHMEM +#ifdef DEBUG_IVSHMEM +#define IVSHMEM_DPRINTF(fmt, ...)\ +do {printf(IVSHMEM: fmt, ## __VA_ARGS__); } while (0) +#else +#define IVSHMEM_DPRINTF(fmt, ...) +#endif + +typedef struct Peer { +int nb_eventfds; +int *eventfds; +} Peer; + +typedef struct EventfdEntry { +PCIDevice *pdev; +int vector; +} EventfdEntry; + +typedef struct IVShmemState { +PCIDevice dev; +uint32_t intrmask; +uint32_t intrstatus; +uint32_t doorbell; + +CharDriverState **eventfd_chr; +CharDriverState *server_chr; +int ivshmem_mmio_io_addr; + +pcibus_t mmio_addr; +pcibus_t shm_pci_addr; +uint64_t ivshmem_offset; +uint64_t ivshmem_size; /* size of shared memory region */ +int shm_fd; /* shared memory file descriptor */ + +Peer *peers; +int nb_peers; /* how many guests we have space for */ +int max_peer; /* maximum numbered peer */ + +int vm_id; +uint32_t vectors; +uint32_t features; +EventfdEntry *eventfd_table; + +char * shmobj; +char * sizearg; +char * role; +int role_val; /* scalar to avoid multiple string comparisons */ +} IVShmemState; + +/* registers for the Inter-VM shared memory device */ +enum ivshmem_registers { +INTRMASK = 0, +INTRSTATUS = 4, +IVPOSITION = 8, +DOORBELL = 12, +}; + +static inline uint32_t ivshmem_has_feature(IVShmemState *ivs, +unsigned int feature) { +return (ivs-features (1 feature)); +} + +static inline bool is_power_of_two(uint64_t x) { +return (x (x - 1)) == 0; +} + +static void ivshmem_map(PCIDevice *pci_dev, int region_num, +pcibus_t addr, pcibus_t size, int type) +{ +IVShmemState *s = DO_UPCAST(IVShmemState, dev, pci_dev); + +s-shm_pci_addr = addr; + +if (s-ivshmem_offset 0) { +cpu_register_physical_memory(s-shm_pci_addr, s-ivshmem_size, +s-ivshmem_offset); +} + +IVSHMEM_DPRINTF(guest pci addr = % FMT_PCIBUS , guest h/w addr = % +PRIu64 , size = % FMT_PCIBUS \n, addr, s-ivshmem_offset, size); + +} + +/* accessing registers - based on rtl8139 */ +static void ivshmem_update_irq(IVShmemState *s, int val) +{ +int isr; +isr = (s-intrstatus s-intrmask) 0x; + +/* don't print ISR resets */ +if (isr) { +IVSHMEM_DPRINTF(Set IRQ to %d (%04x %04x)\n, + isr ? 1 : 0, s-intrstatus, s-intrmask); +} + +qemu_set_irq(s-dev.irq[0], (isr != 0)); +} + +static void ivshmem_IntrMask_write(IVShmemState *s, uint32_t val) +{ +IVSHMEM_DPRINTF(IntrMask write(w) val = 0x%04x\n, val); + +s-intrmask = val
Re: [Qemu-devel] KVM call minutes for July 6
On Tue, Jul 6, 2010 at 8:46 AM, Juan Quintela quint...@redhat.com wrote: Today was a short call as Anthony didn't attend. - Jes reminded people that Linux Plumbers deadline is July 19th - Qemu 0.13 (from the agenda, we didn't discuss it without Anthony). List of patches missing commit/comment/review from Anthony. We decided to send to the list an initial list, and anybody that has anything empty just anwer to this email. Juan (i.e. me) was voluntered to keep that list. In this list there is: - migration subsections (juan) will resend today - migration ide: we are in a mess, will resend with subsections. - migration memory with names. Allows migration with hotplug/hotunplug (Alex), it is waiting for comments. - shared memory PCI device is waiting for commit or further comment/review
[Qemu-devel] Re: [PATCH v7 0/4] Inter-VM shared memory device
On Tue, Jun 15, 2010 at 2:23 PM, Cam Macdonell c...@cs.ualberta.ca wrote: Latest patch for PCI shared memory device that maps a host shared memory object to be shared between guests new in this series - replace marking memory from v6 with marking device as unmigratable indicating that it should be unplugged before migration and re-added after. - 'peer' case changed to require removal before migration, only 'master' devices can be migrated while attached. v6 - migration support with 'master' and 'peer' roles for guest to determine who owns memory - modified phys_ram_dirty array for marking memory as not to be migrated v5: - fixed segfault for non-server case - code style fixes - removed limit on the number of guests - shared memory server is now in qemu.git/contrib - made ioeventfd setup function generic - removed interrupts when guest joined (let application handle it) v4: - moved to single Doorbell register and use datamatch to trigger different VMs rather than one register per eventfd - remove writing arbitrary values to eventfds. Only values of 1 are now written to ensure correct usage Cam Macdonell (4): Device specification for shared memory PCI device Add function to assign ioeventfd to MMIO. Support marking a device as non-migratable Inter-VM shared memory PCI device Makefile.target | 3 + docs/specs/ivshmem_device_spec.txt | 96 + hw/hw.h | 1 + hw/ivshmem.c | 823 kvm-all.c | 32 ++ kvm.h | 1 + qemu-char.c | 6 + qemu-char.h | 3 + qemu-doc.texi | 43 ++ savevm.c | 32 ++- 10 files changed, 1037 insertions(+), 3 deletions(-) create mode 100644 docs/specs/ivshmem_device_spec.txt create mode 100644 hw/ivshmem.c Hi, Are there outstanding concerns with this patchset? Can it be merged? I can rebase if necessary. Thanks, Cam
[Qemu-devel] Re: Unusual physical address when using 64-bit BAR
On Tue, Jun 29, 2010 at 12:50 AM, Avi Kivity a...@redhat.com wrote: On 06/28/2010 11:38 PM, Cam Macdonell wrote: Is this really the address the guest programmed, or is qemu misinterpreting it? Well, what's the answer? You're going to have to give me a hint on how to determine that. lspci in the guest shows the following Memory at c200 (64-bit, non-prefetchable) [size=1024M] does that demonstrate a guest generated address? That's the result of a round trip: the guest programmed the address and then read it back. It could have been screwed up in the first place, or perhaps qemu screwed it up. Add a printf() to the config space handlers in qemu (likely in your own code) on writes and reads, and show the relevant writes (and reads) for this BAR. That's the theory of deductive debugging; however browsing the code shows the guest is at fault: for (i = 0; i PCI_NUM_REGIONS; i++) { int ofs; if (i == PCI_ROM_SLOT) ofs = PCI_ROM_ADDRESS; else ofs = PCI_BASE_ADDRESS_0 + i * 4; u32 old = pci_config_readl(bdf, ofs); u32 mask; if (i == PCI_ROM_SLOT) { mask = PCI_ROM_ADDRESS_MASK; pci_config_writel(bdf, ofs, mask); } else { if (old PCI_BASE_ADDRESS_SPACE_IO) mask = PCI_BASE_ADDRESS_IO_MASK; else mask = PCI_BASE_ADDRESS_MEM_MASK; pci_config_writel(bdf, ofs, ~0); } u32 val = pci_config_readl(bdf, ofs); pci_config_writel(bdf, ofs, old); if (val != 0) { u32 size = (~(val mask)) + 1; if (val PCI_BASE_ADDRESS_SPACE_IO) paddr = pci_bios_io_addr; else paddr = pci_bios_mem_addr; *paddr = ALIGN(*paddr, size); pci_set_io_region_addr(bdf, i, *paddr); *paddr += size; } } break; } Seabios completely ignore the 64-bitness of the BAR. Looks like it also thinks the second half of the BAR is an I/O region instead of memory (hence the c200, that's part of the pci portio region. Do post those reads and writes, I think there's more than one thing wrong here. Here it is, I added the debug statements to pci_read_config and pci_default_write_config. here are the reads and writes to offsets 0x18 and 0x1c where a 64-bit BAR2 config would be configured pci_read_config: (val) 0x4 - 0x18 (addr) pci_write_config: (val) 0x0 - 0x18 (addr) pci_read_config: (val) 0xc004 - 0x18 (addr) pci_write_config: (val) 0x0 - 0x18 (addr) pci_read_config: (val) 0x4 - 0x18 (addr) pci_write_config: (val) 0x0 - 0x18 (addr) pci_read_config: (val) 0x0 - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) pci_read_config: (val) 0x - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) pci_read_config: (val) 0x0 - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) pci_read_config: (val) 0x4 - 0x18 (addr) pci_write_config: (val) 0x0 - 0x18 (addr) pci_read_config: (val) 0xc004 - 0x18 (addr) pci_write_config: (val) 0x0 - 0x18 (addr) pci_read_config: (val) 0xc040 - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) pci_read_config: (val) 0x - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) the complete read/write profile is below along with debug statements from the map functions for the BARs (prefixed with IVSHMEM) pci_read_config: (val) 0x1af4 - 0x0 (addr) pci_read_config: (val) 0x0 - 0xe (addr) pci_read_config: (val) 0x1af4 - 0x0 (addr) pci_read_config: (val) 0x1110 - 0x2 (addr) pci_read_config: (val) 0x0 - 0xe (addr) pci_read_config: (val) 0x1af4 - 0x0 (addr) pci_read_config: (val) 0x0 - 0xe (addr) pci_read_config: (val) 0x500 - 0xa (addr) pci_read_config: (val) 0x1af4 - 0x0 (addr) pci_read_config: (val) 0x1110 - 0x2 (addr) pci_read_config: (val) 0x0 - 0x10 (addr) pci_write_config: (val) 0x0 - 0x10 (addr) pci_read_config: (val) 0xff00 - 0x10 (addr) pci_write_config: (val) 0x0 - 0x10 (addr) pci_read_config: (val) 0x0 - 0x10 (addr) pci_write_config: (val) 0x0 - 0x10 (addr) pci_read_config: (val) 0x0 - 0x14 (addr) pci_write_config: (val) 0x0 - 0x14 (addr) pci_read_config: (val) 0xf000 - 0x14 (addr) pci_write_config: (val) 0x0 - 0x14 (addr) pci_read_config: (val) 0x0 - 0x14 (addr) pci_write_config: (val) 0x0 - 0x14 (addr) pci_read_config: (val) 0x4 - 0x18 (addr) pci_write_config: (val) 0x0 - 0x18 (addr) pci_read_config: (val) 0xc004 - 0x18 (addr) pci_write_config: (val) 0x0 - 0x18 (addr) pci_read_config: (val) 0x4 - 0x18 (addr) pci_write_config: (val) 0x0 - 0x18 (addr) pci_read_config: (val) 0x0 - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) pci_read_config: (val) 0x - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c (addr) pci_read_config: (val) 0x0 - 0x1c (addr) pci_write_config: (val) 0x0 - 0x1c
[Qemu-devel] Re: Unusual physical address when using 64-bit BAR
On Sun, Jun 27, 2010 at 2:39 AM, Avi Kivity a...@redhat.com wrote: On 06/25/2010 12:51 AM, Cam Macdonell wrote: On Tue, Jun 15, 2010 at 5:04 AM, Avi Kivitya...@redhat.com wrote: On 06/11/2010 08:31 PM, Cam Macdonell wrote: On Mon, Apr 19, 2010 at 10:41 AM, Cam Macdonellc...@cs.ualberta.ca wrote: Hi, I'm trying to use a 64-bit BAR for my shared memory device. In simply changing the memory type in pci_register_bar() to PCI_BASE_ADDRESS_MEM_TYPE_64 I get an unusual physical address for that BAR (and my driver crashes in pci_ioremap). from lspci: 00:04.0 RAM memory: Qumranet, Inc. Device 1110 Subsystem: Qumranet, Inc. Device 1100 Flags: fast devsel, IRQ 10 Memory at f102 (32-bit, non-prefetchable) [size=1K] Memory at f1021000 (32-bit, non-prefetchable) [size=4K] Memory at c200 (64-bit, non-prefetchable) [size=1024M] Capabilities:access denied 00: f4 1a 10 11 03 00 10 00 00 00 00 05 00 00 00 00 10: 00 00 02 f1 00 10 02 f1 04 00 00 00 00 c2 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 f4 1a 00 11 30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00 with DEBUG_MEMREG, I see kvm_unregister_memory_area:666 Unregistering memory region c200 (4000) kvm_destroy_phys_mem:649 slot 7 start c200 len 0 flags 0 IVSHMEM: addr = 3221225472 size = 1073741824 kvm_register_phys_mem:605 memory: gpa: c200c000, size: 4000, uaddr: 7f6dd7ffe000, slot: 7, flags: 0 kvm_unregister_memory_area:666 Unregistering memory region c200c000 (4000) kvm_destroy_phys_mem:649 slot 7 start c200c000 len 0 flags 0 IVSHMEM: addr = 0 size = 1073741824 kvm_register_phys_mem:605 memory: gpa: c200, size: 4000, uaddr: 7f6dd7ffe000, slot: 7, flags: 0 kvm_unregister_memory_area:666 Unregistering memory region c200 (4000) kvm_destroy_phys_mem:649 slot 7 start c200 len 0 flags 0 IVSHMEM: addr = 0 size = 1073741824 kvm_register_phys_mem:605 memory: gpa: , size: 4000, uaddr: 7f6dd7ffe000, slot: 7, flags: 0 kvm_unregister_memory_area:666 Unregistering memory region (4000) kvm_destroy_phys_mem:649 slot 7 start len 0 flags 0 IVSHMEM: addr = 0 size = 1073741824 kvm_register_phys_mem:605 memory: gpa: c200, size: 4000, uaddr: 7f6dd7ffe000, slot: 7, flags: 0 (the IVSHMEM lines are my debug statements) address sizes : 40 bits physical, 48 bits virtual (guest) address sizes : 38 bits physical, 48 bits virtual (host) Hi, I happened to run into this problem again when trying to use a 64-bit BAR. I did a bit more digging and the test that is failing is called from arch/x86/mm/ioremap.c in the guest and here it is. static inline int phys_addr_valid(resource_size_t addr) { #ifdef CONFIG_PHYS_ADDR_T_64BIT return !(addr boot_cpu_data.x86_phys_bits); #else return 1; #endif } the value of addr (in this case the 48-bit virtual address c200) is shifted to the right shift by boot_cpu_data.x86_phys_bits (which is 40 bits, the physical address size), so a non-zero value is returned which causes the test to fail and generates the invalid physical address error in the guest. Any help is appreciated as to whether this is a Qemu or guest kernel issue. The guest kernel should never have generated an address that is bigger than cpu_phys_bits in the first place. What's the value for cpu_phys_bits in the guest? (/proc/cpuinfo, 'address sizes :' line). Sorry I missed your reply until now. The guest address sizes are as follows: address sizes : 40 bits physical, 48 bits virtual So the address c200 is illegal. Is this really the address the guest programmed, or is qemu misinterpreting it? Well, what's the answer? You're going to have to give me a hint on how to determine that. lspci in the guest shows the following Memory at c200 (64-bit, non-prefetchable) [size=1024M] does that demonstrate a guest generated address? Cam -- error compiling committee.c: too many arguments to function
[Qemu-devel] Re: Unusual physical address when using 64-bit BAR
On Tue, Jun 15, 2010 at 5:04 AM, Avi Kivity a...@redhat.com wrote: On 06/11/2010 08:31 PM, Cam Macdonell wrote: On Mon, Apr 19, 2010 at 10:41 AM, Cam Macdonellc...@cs.ualberta.ca wrote: Hi, I'm trying to use a 64-bit BAR for my shared memory device. In simply changing the memory type in pci_register_bar() to PCI_BASE_ADDRESS_MEM_TYPE_64 I get an unusual physical address for that BAR (and my driver crashes in pci_ioremap). from lspci: 00:04.0 RAM memory: Qumranet, Inc. Device 1110 Subsystem: Qumranet, Inc. Device 1100 Flags: fast devsel, IRQ 10 Memory at f102 (32-bit, non-prefetchable) [size=1K] Memory at f1021000 (32-bit, non-prefetchable) [size=4K] Memory at c200 (64-bit, non-prefetchable) [size=1024M] Capabilities:access denied 00: f4 1a 10 11 03 00 10 00 00 00 00 05 00 00 00 00 10: 00 00 02 f1 00 10 02 f1 04 00 00 00 00 c2 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 f4 1a 00 11 30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00 with DEBUG_MEMREG, I see kvm_unregister_memory_area:666 Unregistering memory region c200 (4000) kvm_destroy_phys_mem:649 slot 7 start c200 len 0 flags 0 IVSHMEM: addr = 3221225472 size = 1073741824 kvm_register_phys_mem:605 memory: gpa: c200c000, size: 4000, uaddr: 7f6dd7ffe000, slot: 7, flags: 0 kvm_unregister_memory_area:666 Unregistering memory region c200c000 (4000) kvm_destroy_phys_mem:649 slot 7 start c200c000 len 0 flags 0 IVSHMEM: addr = 0 size = 1073741824 kvm_register_phys_mem:605 memory: gpa: c200, size: 4000, uaddr: 7f6dd7ffe000, slot: 7, flags: 0 kvm_unregister_memory_area:666 Unregistering memory region c200 (4000) kvm_destroy_phys_mem:649 slot 7 start c200 len 0 flags 0 IVSHMEM: addr = 0 size = 1073741824 kvm_register_phys_mem:605 memory: gpa: , size: 4000, uaddr: 7f6dd7ffe000, slot: 7, flags: 0 kvm_unregister_memory_area:666 Unregistering memory region (4000) kvm_destroy_phys_mem:649 slot 7 start len 0 flags 0 IVSHMEM: addr = 0 size = 1073741824 kvm_register_phys_mem:605 memory: gpa: c200, size: 4000, uaddr: 7f6dd7ffe000, slot: 7, flags: 0 (the IVSHMEM lines are my debug statements) address sizes : 40 bits physical, 48 bits virtual (guest) address sizes : 38 bits physical, 48 bits virtual (host) Hi, I happened to run into this problem again when trying to use a 64-bit BAR. I did a bit more digging and the test that is failing is called from arch/x86/mm/ioremap.c in the guest and here it is. static inline int phys_addr_valid(resource_size_t addr) { #ifdef CONFIG_PHYS_ADDR_T_64BIT return !(addr boot_cpu_data.x86_phys_bits); #else return 1; #endif } the value of addr (in this case the 48-bit virtual address c200) is shifted to the right shift by boot_cpu_data.x86_phys_bits (which is 40 bits, the physical address size), so a non-zero value is returned which causes the test to fail and generates the invalid physical address error in the guest. Any help is appreciated as to whether this is a Qemu or guest kernel issue. The guest kernel should never have generated an address that is bigger than cpu_phys_bits in the first place. What's the value for cpu_phys_bits in the guest? (/proc/cpuinfo, 'address sizes :' line). Sorry I missed your reply until now. The guest address sizes are as follows: address sizes : 40 bits physical, 48 bits virtual Is this really the address the guest programmed, or is qemu misinterpreting it? -- error compiling committee.c: too many arguments to function
[Qemu-devel] [PATCH v7 RESEND 4/4] Inter-VM shared memory PCI device
resent for some printf macros and one addition Support an inter-vm shared memory device that maps a shared-memory object as a PCI device in the guest. This patch also supports interrupts between guest by communicating over a unix domain socket. This patch applies to the qemu-kvm repository. -device ivshmem,size=size in format accepted by -m[,shm=shm name] Interrupts are supported between multiple VMs by using a shared memory server by using a chardev socket. -device ivshmem,size=size in format accepted by -m[,shm=shm name] [,chardev=id][,msi=on][,irqfd=on][,vectors=n][,role=peer|master] -chardev socket,path=path,id=id The shared memory server, sample programs and init scripts are in a git repo here: www.gitorious.org/nahanni Signed-off-by: Cam Macdonell c...@cs.ualberta.ca --- Makefile.target |3 + hw/ivshmem.c| 828 +++ qemu-char.c |6 + qemu-char.h |3 + qemu-doc.texi | 43 +++ 5 files changed, 883 insertions(+), 0 deletions(-) create mode 100644 hw/ivshmem.c diff --git a/Makefile.target b/Makefile.target index a0e9747..1e99ec8 100644 --- a/Makefile.target +++ b/Makefile.target @@ -203,6 +203,9 @@ obj-$(CONFIG_USB_OHCI) += usb-ohci.o obj-y += rtl8139.o obj-y += e1000.o +# Inter-VM PCI shared memory +obj-y += ivshmem.o + # Hardware support obj-i386-y += vga.o obj-i386-y += mc146818rtc.o i8259.o pc.o diff --git a/hw/ivshmem.c b/hw/ivshmem.c new file mode 100644 index 000..8534e50 --- /dev/null +++ b/hw/ivshmem.c @@ -0,0 +1,828 @@ +/* + * Inter-VM Shared Memory PCI device. + * + * Author: + * Cam Macdonell c...@cs.ualberta.ca + * + * Based On: cirrus_vga.c + * Copyright (c) 2004 Fabrice Bellard + * Copyright (c) 2004 Makoto Suzuki (suzu) + * + * and rtl8139.c + * Copyright (c) 2006 Igor Kovalenko + * + * This code is licensed under the GNU GPL v2. + */ +#include hw.h +#include pc.h +#include pci.h +#include msix.h +#include kvm.h + +#include sys/mman.h +#include sys/types.h + +#define IVSHMEM_IRQFD 0 +#define IVSHMEM_MSI 1 + +#define IVSHMEM_PEER0 +#define IVSHMEM_MASTER 1 + +#define IVSHMEM_REG_BAR_SIZE 0x100 + +#define DEBUG_IVSHMEM +#ifdef DEBUG_IVSHMEM +#define IVSHMEM_DPRINTF(fmt, ...)\ +do {printf(IVSHMEM: fmt, ## __VA_ARGS__); } while (0) +#else +#define IVSHMEM_DPRINTF(fmt, ...) +#endif + +typedef struct Peer { +int nb_eventfds; +int *eventfds; +} Peer; + +typedef struct EventfdEntry { +PCIDevice *pdev; +int vector; +} EventfdEntry; + +typedef struct IVShmemState { +PCIDevice dev; +uint32_t intrmask; +uint32_t intrstatus; +uint32_t doorbell; + +CharDriverState **eventfd_chr; +CharDriverState *server_chr; +int ivshmem_mmio_io_addr; + +pcibus_t mmio_addr; +pcibus_t shm_pci_addr; +uint64_t ivshmem_offset; +uint64_t ivshmem_size; /* size of shared memory region */ +int shm_fd; /* shared memory file descriptor */ + +Peer *peers; +int nb_peers; /* how many guests we have space for */ +int max_peer; /* maximum numbered peer */ + +int vm_id; +uint32_t vectors; +uint32_t features; +EventfdEntry *eventfd_table; + +char * shmobj; +char * sizearg; +char * role; +int role_val; /* scalar to avoid multiple string comparisons */ +} IVShmemState; + +/* registers for the Inter-VM shared memory device */ +enum ivshmem_registers { +INTRMASK = 0, +INTRSTATUS = 4, +IVPOSITION = 8, +DOORBELL = 12, +}; + +static inline uint32_t ivshmem_has_feature(IVShmemState *ivs, unsigned int feature) { +return (ivs-features (1 feature)); +} + +static inline bool is_power_of_two(uint64_t x) { +return (x (x - 1)) == 0; +} + +static void ivshmem_map(PCIDevice *pci_dev, int region_num, +pcibus_t addr, pcibus_t size, int type) +{ +IVShmemState *s = DO_UPCAST(IVShmemState, dev, pci_dev); + +s-shm_pci_addr = addr; + +if (s-ivshmem_offset 0) { +cpu_register_physical_memory(s-shm_pci_addr, s-ivshmem_size, +s-ivshmem_offset); +} + +IVSHMEM_DPRINTF(guest pci addr = % FMT_PCIBUS , guest h/w addr = % PRIu64 +, size = % FMT_PCIBUS \n, addr, s-ivshmem_offset, size); + +} + +/* accessing registers - based on rtl8139 */ +static void ivshmem_update_irq(IVShmemState *s, int val) +{ +int isr; +isr = (s-intrstatus s-intrmask) 0x; + +/* don't print ISR resets */ +if (isr) { +IVSHMEM_DPRINTF(Set IRQ to %d (%04x %04x)\n, + isr ? 1 : 0, s-intrstatus, s-intrmask); +} + +qemu_set_irq(s-dev.irq[0], (isr != 0)); +} + +static void ivshmem_IntrMask_write(IVShmemState *s, uint32_t val) +{ +IVSHMEM_DPRINTF(IntrMask write(w) val = 0x%04x\n, val); + +s-intrmask = val; + +ivshmem_update_irq(s, val); +} + +static uint32_t ivshmem_IntrMask_read
Re: [Qemu-devel] Re: [PATCH RFC] Mark a device as non-migratable
On Wed, Jun 16, 2010 at 6:34 AM, Anthony Liguori anth...@codemonkey.ws wrote: On 06/16/2010 12:05 AM, Cam Macdonell wrote: On Tue, Jun 15, 2010 at 4:33 PM, Anthony Liguorianth...@codemonkey.ws wrote: On 06/15/2010 05:26 PM, Cam Macdonell wrote: On Tue, Jun 15, 2010 at 10:32 AM, Anthony Liguorianth...@codemonkey.ws wrote: On 06/15/2010 11:16 AM, Cam Macdonell wrote: How does this look for marking the device as non-migratable? It adds a field 'no_migrate' to the SaveStateEntry and tests for it in vmstate_save. This would replace anything that touches memory. Cam --- hw/hw.h | 1 + savevm.c | 32 +--- 2 files changed, 30 insertions(+), 3 deletions(-) diff --git a/hw/hw.h b/hw/hw.h index d78d814..7c93f08 100644 --- a/hw/hw.h +++ b/hw/hw.h @@ -263,6 +263,7 @@ int register_savevm_live(const char *idstr, void *opaque); void unregister_savevm(const char *idstr, void *opaque); +void mark_no_migrate(const char *idstr, void *opaque); I'm not thrilled with the name but the functionality is spot on. I lack the creativity to offer a better name suggestion :-) Regards, Anthony Liguori Hmmm, in working on this it seems that the memory (from qemu_ram_map()) is still attached even when the device is removed (which causes migration to fail because there is an unexpected memory). Is something like cpu_unregister_physical_memory()/qemu_ram_free() needed? Yes. You need to unregister any memory that you have registered upon device removal. Is there an established way to achieve this? I can't seem find another device that unregisters memory registered with cpu_register_physical_memory(). Is something like cpu_unregister_physical_memory() needed? cpu_register_physical_memory(IO_MEM_UNASSIGNED). If you look at pci.c, you'll see that it automatically unregisters any mapped io regions on remove. It appears that the 'peer' migration won't work until memory hotplug is supported, correct? AFAICT the memory sizes will not match between the source and destination VMs after the device is removed and the memory system currently doesn't support gaps. A technique similar to my patch for non-migratable memory would be needed to mark free'd memory pages without Alex's patches in. For the purposes of my patch, can it be merged without the 'peer' case (pending Alex's patches and hotplug)? Thanks, Cam
[Qemu-devel] [PATCH RFC] Mark a device as non-migratable
How does this look for marking the device as non-migratable? It adds a field 'no_migrate' to the SaveStateEntry and tests for it in vmstate_save. This would replace anything that touches memory. Cam --- hw/hw.h |1 + savevm.c | 32 +--- 2 files changed, 30 insertions(+), 3 deletions(-) diff --git a/hw/hw.h b/hw/hw.h index d78d814..7c93f08 100644 --- a/hw/hw.h +++ b/hw/hw.h @@ -263,6 +263,7 @@ int register_savevm_live(const char *idstr, void *opaque); void unregister_savevm(const char *idstr, void *opaque); +void mark_no_migrate(const char *idstr, void *opaque); typedef void QEMUResetHandler(void *opaque); diff --git a/savevm.c b/savevm.c index 017695b..2642a9c 100644 --- a/savevm.c +++ b/savevm.c @@ -1023,6 +1023,7 @@ typedef struct SaveStateEntry { LoadStateHandler *load_state; const VMStateDescription *vmsd; void *opaque; +int no_migrate; } SaveStateEntry; @@ -1069,6 +1070,7 @@ int register_savevm_live(const char *idstr, se-load_state = load_state; se-opaque = opaque; se-vmsd = NULL; +se-no_migrate = 0; if (instance_id == -1) { se-instance_id = calculate_new_instance_id(idstr); @@ -1103,6 +1105,19 @@ void unregister_savevm(const char *idstr, void *opaque) } } +/* mark a device as not to be migrated, that is the device should be + unplugged before migration */ +void mark_no_migrate(const char *idstr, void *opaque) +{ +SaveStateEntry *se; + +QTAILQ_FOREACH(se, savevm_handlers, entry) { +if (strcmp(se-idstr, idstr) == 0 se-opaque == opaque) { +se-no_migrate = 1; +} +} +} + int vmstate_register_with_alias_id(int instance_id, const VMStateDescription *vmsd, void *opaque, int alias_id, @@ -1277,13 +1292,19 @@ static int vmstate_load(QEMUFile *f, SaveStateEntry *se, int version_id) return vmstate_load_state(f, se-vmsd, se-opaque, version_id); } -static void vmstate_save(QEMUFile *f, SaveStateEntry *se) +static int vmstate_save(QEMUFile *f, SaveStateEntry *se) { +if (se-no_migrate) { +return -1; +} + if (!se-vmsd) { /* Old style */ se-save_state(f, se-opaque); -return; +return 0; } vmstate_save_state(f,se-vmsd, se-opaque); + +return 0; } #define QEMU_VM_FILE_MAGIC 0x5145564d @@ -1377,6 +1398,7 @@ int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f) int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f) { SaveStateEntry *se; +int r; cpu_synchronize_all_states(); @@ -1409,7 +1431,11 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f) qemu_put_be32(f, se-instance_id); qemu_put_be32(f, se-version_id); -vmstate_save(f, se); +r = vmstate_save(f, se); +if (r 0) { +monitor_printf(mon, cannot migrate with device '%s'\n, se-idstr); +return r; +} } qemu_put_byte(f, QEMU_VM_EOF); -- 1.6.3.2.198.g6096d
[Qemu-devel] [PATCH v7 0/4] Inter-VM shared memory device
Latest patch for PCI shared memory device that maps a host shared memory object to be shared between guests new in this series - replace marking memory from v6 with marking device as unmigratable indicating that it should be unplugged before migration and re-added after. - 'peer' case changed to require removal before migration, only 'master' devices can be migrated while attached. v6 - migration support with 'master' and 'peer' roles for guest to determine who owns memory - modified phys_ram_dirty array for marking memory as not to be migrated v5: - fixed segfault for non-server case - code style fixes - removed limit on the number of guests - shared memory server is now in qemu.git/contrib - made ioeventfd setup function generic - removed interrupts when guest joined (let application handle it) v4: - moved to single Doorbell register and use datamatch to trigger different VMs rather than one register per eventfd - remove writing arbitrary values to eventfds. Only values of 1 are now written to ensure correct usage Cam Macdonell (4): Device specification for shared memory PCI device Add function to assign ioeventfd to MMIO. Support marking a device as non-migratable Inter-VM shared memory PCI device Makefile.target|3 + docs/specs/ivshmem_device_spec.txt | 96 + hw/hw.h|1 + hw/ivshmem.c | 823 kvm-all.c | 32 ++ kvm.h |1 + qemu-char.c|6 + qemu-char.h|3 + qemu-doc.texi | 43 ++ savevm.c | 32 ++- 10 files changed, 1037 insertions(+), 3 deletions(-) create mode 100644 docs/specs/ivshmem_device_spec.txt create mode 100644 hw/ivshmem.c
[Qemu-devel] [PATCH v7 3/4] Support marking a device as non-migratable
A non-migratable device should be removed before migration and re-added after. Signed-off-by: Cam Macdonell c...@cs.ualberta.ca --- hw/hw.h |1 + savevm.c | 32 +--- 2 files changed, 30 insertions(+), 3 deletions(-) diff --git a/hw/hw.h b/hw/hw.h index d78d814..894e6b0 100644 --- a/hw/hw.h +++ b/hw/hw.h @@ -263,6 +263,7 @@ int register_savevm_live(const char *idstr, void *opaque); void unregister_savevm(const char *idstr, void *opaque); +void register_device_unmigratable(const char *idstr, void *opaque); typedef void QEMUResetHandler(void *opaque); diff --git a/savevm.c b/savevm.c index ada5c63..ac1d70d 100644 --- a/savevm.c +++ b/savevm.c @@ -1024,6 +1024,7 @@ typedef struct SaveStateEntry { LoadStateHandler *load_state; const VMStateDescription *vmsd; void *opaque; +int no_migrate; } SaveStateEntry; @@ -1070,6 +1071,7 @@ int register_savevm_live(const char *idstr, se-load_state = load_state; se-opaque = opaque; se-vmsd = NULL; +se-no_migrate = 0; if (instance_id == -1) { se-instance_id = calculate_new_instance_id(idstr); @@ -1104,6 +1106,19 @@ void unregister_savevm(const char *idstr, void *opaque) } } +/* mark a device as not to be migrated, that is the device should be + unplugged before migration */ +void register_device_unmigratable(const char *idstr, void *opaque) +{ +SaveStateEntry *se; + +QTAILQ_FOREACH(se, savevm_handlers, entry) { +if (strcmp(se-idstr, idstr) == 0 se-opaque == opaque) { +se-no_migrate = 1; +} +} +} + int vmstate_register_with_alias_id(int instance_id, const VMStateDescription *vmsd, void *opaque, int alias_id, @@ -1278,13 +1293,19 @@ static int vmstate_load(QEMUFile *f, SaveStateEntry *se, int version_id) return vmstate_load_state(f, se-vmsd, se-opaque, version_id); } -static void vmstate_save(QEMUFile *f, SaveStateEntry *se) +static int vmstate_save(QEMUFile *f, SaveStateEntry *se) { +if (se-no_migrate) { +return -1; +} + if (!se-vmsd) { /* Old style */ se-save_state(f, se-opaque); -return; +return 0; } vmstate_save_state(f,se-vmsd, se-opaque); + +return 0; } #define QEMU_VM_FILE_MAGIC 0x5145564d @@ -1378,6 +1399,7 @@ int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f) int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f) { SaveStateEntry *se; +int r; cpu_synchronize_all_states(); @@ -1410,7 +1432,11 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f) qemu_put_be32(f, se-instance_id); qemu_put_be32(f, se-version_id); -vmstate_save(f, se); +r = vmstate_save(f, se); +if (r 0) { +monitor_printf(mon, cannot migrate with device '%s'\n, se-idstr); +return r; +} } qemu_put_byte(f, QEMU_VM_EOF); -- 1.6.3.2.198.g6096d
[Qemu-devel] [PATCH v7 2/4] Add function to assign ioeventfd to MMIO.
Signed-off-by: Cam Macdonell c...@cs.ualberta.ca --- kvm-all.c | 32 kvm.h |1 + 2 files changed, 33 insertions(+), 0 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 47f58a6..2982631 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1257,6 +1257,38 @@ int kvm_set_signal_mask(CPUState *env, const sigset_t *sigset) return r; } +int kvm_set_ioeventfd_mmio_long(int fd, uint32_t addr, uint32_t val, bool assign) +{ +#ifdef KVM_IOEVENTFD +int ret; +struct kvm_ioeventfd iofd; + +iofd.datamatch = val; +iofd.addr = addr; +iofd.len = 4; +iofd.flags = KVM_IOEVENTFD_FLAG_DATAMATCH; +iofd.fd = fd; + +if (!kvm_enabled()) { +return -ENOSYS; +} + +if (!assign) { +iofd.flags |= KVM_IOEVENTFD_FLAG_DEASSIGN; +} + +ret = kvm_vm_ioctl(kvm_state, KVM_IOEVENTFD, iofd); + +if (ret 0) { +return -errno; +} + +return 0; +#else +return -ENOSYS; +#endif +} + int kvm_set_ioeventfd_pio_word(int fd, uint16_t addr, uint16_t val, bool assign) { #ifdef KVM_IOEVENTFD diff --git a/kvm.h b/kvm.h index aab5118..52e3a7f 100644 --- a/kvm.h +++ b/kvm.h @@ -181,6 +181,7 @@ static inline void cpu_synchronize_post_init(CPUState *env) } #endif +int kvm_set_ioeventfd_mmio_long(int fd, uint32_t adr, uint32_t val, bool assign); #if defined(KVM_IRQFD) defined(CONFIG_KVM) int kvm_set_irqfd(int gsi, int fd, bool assigned); -- 1.6.6.1
[Qemu-devel] [PATCH v7 4/4] Inter-VM shared memory PCI device
Support an inter-vm shared memory device that maps a shared-memory object as a PCI device in the guest. This patch also supports interrupts between guest by communicating over a unix domain socket. This patch applies to the qemu-kvm repository. -device ivshmem,size=size in format accepted by -m[,shm=shm name] Interrupts are supported between multiple VMs by using a shared memory server by using a chardev socket. -device ivshmem,size=size in format accepted by -m[,shm=shm name] [,chardev=id][,msi=on][,irqfd=on][,vectors=n][,role=peer|master] -chardev socket,path=path,id=id The shared memory server, sample programs and init scripts are in a git repo here: www.gitorious.org/nahanni Signed-off-by: Cam Macdonell c...@cs.ualberta.ca --- Makefile.target |3 + hw/ivshmem.c| 823 +++ qemu-char.c |6 + qemu-char.h |3 + qemu-doc.texi | 43 +++ 5 files changed, 878 insertions(+), 0 deletions(-) create mode 100644 hw/ivshmem.c diff --git a/Makefile.target b/Makefile.target index a0e9747..1e99ec8 100644 --- a/Makefile.target +++ b/Makefile.target @@ -203,6 +203,9 @@ obj-$(CONFIG_USB_OHCI) += usb-ohci.o obj-y += rtl8139.o obj-y += e1000.o +# Inter-VM PCI shared memory +obj-y += ivshmem.o + # Hardware support obj-i386-y += vga.o obj-i386-y += mc146818rtc.o i8259.o pc.o diff --git a/hw/ivshmem.c b/hw/ivshmem.c new file mode 100644 index 000..af035c3 --- /dev/null +++ b/hw/ivshmem.c @@ -0,0 +1,823 @@ +/* + * Inter-VM Shared Memory PCI device. + * + * Author: + * Cam Macdonell c...@cs.ualberta.ca + * + * Based On: cirrus_vga.c + * Copyright (c) 2004 Fabrice Bellard + * Copyright (c) 2004 Makoto Suzuki (suzu) + * + * and rtl8139.c + * Copyright (c) 2006 Igor Kovalenko + * + * This code is licensed under the GNU GPL v2. + */ +#include hw.h +#include pc.h +#include pci.h +#include msix.h +#include kvm.h + +#include sys/mman.h +#include sys/types.h + +#define IVSHMEM_IRQFD 0 +#define IVSHMEM_MSI 1 + +#define IVSHMEM_PEER0 +#define IVSHMEM_MASTER 1 + +#define IVSHMEM_REG_BAR_SIZE 0x100 + +//#define DEBUG_IVSHMEM +#ifdef DEBUG_IVSHMEM +#define IVSHMEM_DPRINTF(fmt, ...)\ +do {printf(IVSHMEM: fmt, ## __VA_ARGS__); } while (0) +#else +#define IVSHMEM_DPRINTF(fmt, ...) +#endif + +typedef struct Peer { +int nb_eventfds; +int *eventfds; +} Peer; + +typedef struct EventfdEntry { +PCIDevice *pdev; +int vector; +} EventfdEntry; + +typedef struct IVShmemState { +PCIDevice dev; +uint32_t intrmask; +uint32_t intrstatus; +uint32_t doorbell; + +CharDriverState **eventfd_chr; +CharDriverState *server_chr; +int ivshmem_mmio_io_addr; + +pcibus_t mmio_addr; +pcibus_t shm_pci_addr; +uint64_t ivshmem_offset; +uint64_t ivshmem_size; /* size of shared memory region */ +int shm_fd; /* shared memory file descriptor */ + +Peer *peers; +int nb_peers; /* how many guests we have space for */ +int max_peer; /* maximum numbered peer */ + +int vm_id; +uint32_t vectors; +uint32_t features; +EventfdEntry *eventfd_table; + +char * shmobj; +char * sizearg; +char * role; +int role_val; /* scalar to avoid multiple string comparisons */ +} IVShmemState; + +/* registers for the Inter-VM shared memory device */ +enum ivshmem_registers { +INTRMASK = 0, +INTRSTATUS = 4, +IVPOSITION = 8, +DOORBELL = 12, +}; + +static inline uint32_t ivshmem_has_feature(IVShmemState *ivs, unsigned int feature) { +return (ivs-features (1 feature)); +} + +static inline bool is_power_of_two(uint64_t x) { +return (x (x - 1)) == 0; +} + +static void ivshmem_map(PCIDevice *pci_dev, int region_num, +pcibus_t addr, pcibus_t size, int type) +{ +IVShmemState *s = DO_UPCAST(IVShmemState, dev, pci_dev); + +s-shm_pci_addr = addr; + +if (s-ivshmem_offset 0) { +cpu_register_physical_memory(s-shm_pci_addr, s-ivshmem_size, +s-ivshmem_offset); +} + +IVSHMEM_DPRINTF(guest pci addr = % FMT_PCIBUS ,guest h/w addr = % PRIu64 +size = % FMT_PCIBUS \n, addr, s-ivshmem_offset, size); + +} + +/* accessing registers - based on rtl8139 */ +static void ivshmem_update_irq(IVShmemState *s, int val) +{ +int isr; +isr = (s-intrstatus s-intrmask) 0x; + +/* don't print ISR resets */ +if (isr) { +IVSHMEM_DPRINTF(Set IRQ to %d (%04x %04x)\n, + isr ? 1 : 0, s-intrstatus, s-intrmask); +} + +qemu_set_irq(s-dev.irq[0], (isr != 0)); +} + +static void ivshmem_IntrMask_write(IVShmemState *s, uint32_t val) +{ +IVSHMEM_DPRINTF(IntrMask write(w) val = 0x%04x\n, val); + +s-intrmask = val; + +ivshmem_update_irq(s, val); +} + +static uint32_t ivshmem_IntrMask_read(IVShmemState *s) +{ +uint32_t ret = s-intrmask
[Qemu-devel] [PATCH v7 1/4] Device specification for shared memory PCI device
Signed-off-by: Cam Macdonell c...@cs.ualberta.ca --- docs/specs/ivshmem_device_spec.txt | 96 1 files changed, 96 insertions(+), 0 deletions(-) create mode 100644 docs/specs/ivshmem_device_spec.txt diff --git a/docs/specs/ivshmem_device_spec.txt b/docs/specs/ivshmem_device_spec.txt new file mode 100644 index 000..23dd2ba --- /dev/null +++ b/docs/specs/ivshmem_device_spec.txt @@ -0,0 +1,96 @@ + +Device Specification for Inter-VM shared memory device +-- + +The Inter-VM shared memory device is designed to share a region of memory to +userspace in multiple virtual guests. The memory region does not belong to any +guest, but is a POSIX memory object on the host. Optionally, the device may +support sending interrupts to other guests sharing the same memory region. + + +The Inter-VM PCI device +--- + +*BARs* + +The device supports three BARs. BAR0 is a 1 Kbyte MMIO region to support +registers. BAR1 is used for MSI-X when it is enabled in the device. BAR2 is +used to map the shared memory object from the host. The size of BAR2 is +specified when the guest is started and must be a power of 2 in size. + +*Registers* + +The device currently supports 4 registers of 32-bits each. Registers +are used for synchronization between guests sharing the same memory object when +interrupts are supported (this requires using the shared memory server). + +The server assigns each VM an ID number and sends this ID number to the Qemu +process when the guest starts. + +enum ivshmem_registers { +IntrMask = 0, +IntrStatus = 4, +IVPosition = 8, +Doorbell = 12 +}; + +The first two registers are the interrupt mask and status registers. Mask and +status are only used with pin-based interrupts. They are unused with MSI +interrupts. + +Status Register: The status register is set to 1 when an interrupt occurs. + +Mask Register: The mask register is bitwise ANDed with the interrupt status +and the result will raise an interrupt if it is non-zero. However, since 1 is +the only value the status will be set to, it is only the first bit of the mask +that has any effect. Therefore interrupts can be masked by setting the first +bit to 0 and unmasked by setting the first bit to 1. + +IVPosition Register: The IVPosition register is read-only and reports the +guest's ID number. The guest IDs are non-negative integers. When using the +server, since the server is a separate process, the VM ID will only be set when +the device is ready (shared memory is received from the server and accessible via +the device). If the device is not ready, the IVPosition will return -1. +Applications should ensure that they have a valid VM ID before accessing the +shared memory. + +Doorbell Register: To interrupt another guest, a guest must write to the +Doorbell register. The doorbell register is 32-bits, logically divided into +two 16-bit fields. The high 16-bits are the guest ID to interrupt and the low +16-bits are the interrupt vector to trigger. The semantics of the value +written to the doorbell depends on whether the device is using MSI or a regular +pin-based interrupt. In short, MSI uses vectors while regular interrupts set the +status register. + +Regular Interrupts + +If regular interrupts are used (due to either a guest not supporting MSI or the +user specifying not to use them on startup) then the value written to the lower +16-bits of the Doorbell register results is arbitrary and will trigger an +interrupt in the destination guest. + +Message Signalled Interrupts + +A ivshmem device may support multiple MSI vectors. If so, the lower 16-bits +written to the Doorbell register must be between 0 and the maximum number of +vectors the guest supports. The lower 16 bits written to the doorbell is the +MSI vector that will be raised in the destination guest. The number of MSI +vectors is configurable but it is set when the VM is started. + +The important thing to remember with MSI is that it is only a signal, no status +is set (since MSI interrupts are not shared). All information other than the +interrupt itself should be communicated via the shared memory region. Devices +supporting multiple MSI vectors can use different vectors to indicate different +events have occurred. The semantics of interrupt vectors are left to the +user's discretion. + + +Usage in the Guest +-- + +The shared memory device is intended to be used with the provided UIO driver. +Very little configuration is needed. The guest should map BAR0 to access the +registers (an array of 32-bit ints allows simple writing) and map BAR2 to +access the shared memory region itself. The size of the shared memory region +is specified when the guest (or shared memory server) is started. A guest may +map the whole shared memory region or only part of it. -- 1.6.6.1
[Qemu-devel] Re: [PATCH RFC] Mark a device as non-migratable
On Tue, Jun 15, 2010 at 10:32 AM, Anthony Liguori anth...@codemonkey.ws wrote: On 06/15/2010 11:16 AM, Cam Macdonell wrote: How does this look for marking the device as non-migratable? It adds a field 'no_migrate' to the SaveStateEntry and tests for it in vmstate_save. This would replace anything that touches memory. Cam --- hw/hw.h | 1 + savevm.c | 32 +--- 2 files changed, 30 insertions(+), 3 deletions(-) diff --git a/hw/hw.h b/hw/hw.h index d78d814..7c93f08 100644 --- a/hw/hw.h +++ b/hw/hw.h @@ -263,6 +263,7 @@ int register_savevm_live(const char *idstr, void *opaque); void unregister_savevm(const char *idstr, void *opaque); +void mark_no_migrate(const char *idstr, void *opaque); I'm not thrilled with the name but the functionality is spot on. I lack the creativity to offer a better name suggestion :-) Regards, Anthony Liguori Hmmm, in working on this it seems that the memory (from qemu_ram_map()) is still attached even when the device is removed (which causes migration to fail because there is an unexpected memory). Is something like cpu_unregister_physical_memory()/qemu_ram_free() needed? Cam
Re: [Qemu-devel] Re: [PATCH RFC] Mark a device as non-migratable
On Tue, Jun 15, 2010 at 4:33 PM, Anthony Liguori anth...@codemonkey.ws wrote: On 06/15/2010 05:26 PM, Cam Macdonell wrote: On Tue, Jun 15, 2010 at 10:32 AM, Anthony Liguorianth...@codemonkey.ws wrote: On 06/15/2010 11:16 AM, Cam Macdonell wrote: How does this look for marking the device as non-migratable? It adds a field 'no_migrate' to the SaveStateEntry and tests for it in vmstate_save. This would replace anything that touches memory. Cam --- hw/hw.h | 1 + savevm.c | 32 +--- 2 files changed, 30 insertions(+), 3 deletions(-) diff --git a/hw/hw.h b/hw/hw.h index d78d814..7c93f08 100644 --- a/hw/hw.h +++ b/hw/hw.h @@ -263,6 +263,7 @@ int register_savevm_live(const char *idstr, void *opaque); void unregister_savevm(const char *idstr, void *opaque); +void mark_no_migrate(const char *idstr, void *opaque); I'm not thrilled with the name but the functionality is spot on. I lack the creativity to offer a better name suggestion :-) Regards, Anthony Liguori Hmmm, in working on this it seems that the memory (from qemu_ram_map()) is still attached even when the device is removed (which causes migration to fail because there is an unexpected memory). Is something like cpu_unregister_physical_memory()/qemu_ram_free() needed? Yes. You need to unregister any memory that you have registered upon device removal. Is there an established way to achieve this? I can't seem find another device that unregisters memory registered with cpu_register_physical_memory(). Is something like cpu_unregister_physical_memory() needed? Thanks, Cam Regards, Anthony Liguori Cam