Re: [Qemu-devel] virtio-blk broken after system reset
Am 13.11.2010 08:49, Stefan Hajnoczi wrote: > On Fri, Nov 12, 2010 at 10:02 PM, Jan Kiszka wrote: >> Hi, >> >> both after hard and guest-initiated reset, something is seriously broken >> with virtio block devices. If I reset my Linux guest while still in >> grub, the bios will simply fail to read from the disk after the reboot. If I >> reset after Linux touched the device, qemu terminates: >> >> Breakpoint 1, 0x74b945b0 in _exit () from /lib64/libc.so.6 >> (gdb) bt >> #0 0x74b945b0 in _exit () from /lib64/libc.so.6 >> #1 0x74b2948d in __run_exit_handlers () from /lib64/libc.so.6 >> #2 0x74b29535 in exit () from /lib64/libc.so.6 >> #3 0x00568da3 in virtqueue_num_heads (vq=0x17040e0, idx=0) at >> /data/qemu/hw/virtio.c:258 >> #4 0x00569511 in virtqueue_pop (vq=0x17040e0, elem=0x17cea58) at >> /data/qemu/hw/virtio.c:388 >> #5 0x00419e31 in virtio_blk_get_request (s=0x1704010) at >> /data/qemu/hw/virtio-blk.c:132 >> #6 virtio_blk_handle_output (vdev=0x1704010, vq=) at >> /data/qemu/hw/virtio-blk.c:369 >> >> This is with current qemu.git head, haven't tried older versions. Known bug? > > This is a known issue. Gleb has posted a SeaBIOS fix: > > http://www.mail-archive.com/qemu-devel@nongnu.org/msg45849.html > > Currently the patch only appears in SeaBIOS master. Gleb and Kevin > have discussed putting it into 0.6.1.2 stable (see linked thread). > QEMU should then pick that release up. Ah, good. And what about the guest-triggerable qemu exit above? Jan signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] virtio-blk broken after system reset
On Fri, Nov 12, 2010 at 10:02 PM, Jan Kiszka wrote: > Hi, > > both after hard and guest-initiated reset, something is seriously broken > with virtio block devices. If I reset my Linux guest while still in > grub, the bios will simply fail to read from the disk after the reboot. If I > reset after Linux touched the device, qemu terminates: > > Breakpoint 1, 0x74b945b0 in _exit () from /lib64/libc.so.6 > (gdb) bt > #0 0x74b945b0 in _exit () from /lib64/libc.so.6 > #1 0x74b2948d in __run_exit_handlers () from /lib64/libc.so.6 > #2 0x74b29535 in exit () from /lib64/libc.so.6 > #3 0x00568da3 in virtqueue_num_heads (vq=0x17040e0, idx=0) at > /data/qemu/hw/virtio.c:258 > #4 0x00569511 in virtqueue_pop (vq=0x17040e0, elem=0x17cea58) at > /data/qemu/hw/virtio.c:388 > #5 0x00419e31 in virtio_blk_get_request (s=0x1704010) at > /data/qemu/hw/virtio-blk.c:132 > #6 virtio_blk_handle_output (vdev=0x1704010, vq=) at > /data/qemu/hw/virtio-blk.c:369 > > This is with current qemu.git head, haven't tried older versions. Known bug? This is a known issue. Gleb has posted a SeaBIOS fix: http://www.mail-archive.com/qemu-devel@nongnu.org/msg45849.html Currently the patch only appears in SeaBIOS master. Gleb and Kevin have discussed putting it into 0.6.1.2 stable (see linked thread). QEMU should then pick that release up. Stefan
[Qemu-devel] [Bug 510612] Re: sound broken in qemu 0.12.x
** Changed in: qemu-kvm (Debian) Status: Fix Committed => Fix Released -- sound broken in qemu 0.12.x https://bugs.launchpad.net/bugs/510612 You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. Status in QEMU: New Status in “qemu” package in Debian: New Status in “qemu-kvm” package in Debian: Fix Released Bug description: In qemu 0.12.x there is a jitter when sound is played by guest Windows XP SP2. Also there are warnings on console: alsa: Unexpected state 1 alsa: Unexpected state 1 and qemu tries to eat 100% cpu. According to strace it polls alsa device continuously: futex(0x849e60, FUTEX_WAKE_PRIVATE, 1) = 1 select(23, [6 10 12 20 22], [18], [], {1, 0}) = 1 (out [18], left {0, 97}) poll([{fd=18, events=POLLOUT|POLLERR|POLLNVAL}], 1, 0) = 1 ([{fd=18, revents=POLLOUT|POLLERR}]) write(2, "alsa: "..., 6)= 6 write(2, "Unexpected state 1\n"..., 19) = 19 The bug appears with ac97 and es1370 guest sound devices (i didn't test others). With OSS host sound driver there are no warnings and consumption of cpu by qemu is normal, but jitter in the sound is still there.
[Qemu-devel] [Bug 595438] Re: KVM segmentation fault, using SCSI+writeback and linux 2.4 guest
** Changed in: qemu-kvm (Debian) Status: Confirmed => Fix Released -- KVM segmentation fault, using SCSI+writeback and linux 2.4 guest https://bugs.launchpad.net/bugs/595438 You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. Status in Kernel Virtual Machine: Confirmed Status in QEMU: Fix Committed Status in qemu-kvm: Fix Released Status in “qemu-kvm” package in Ubuntu: Fix Released Status in “qemu-kvm” source package in Lucid: Fix Committed Status in “qemu-kvm” package in Debian: Fix Released Bug description: I Use Ubuntu 32 bit 10.04 with standard KVM. I have Intel E7600 @ 3.06GHz processor with VMX In this system I Run: LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -M pc-0.12 -enable-kvm -m 256 -smp 1 -name spamsender -uuid b9cacd5e-08f7-41fd-78c8-89cec59af881 -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/spamsender.monitor,server,nowait -monitor chardev:monitor -boot d -drive file=/mnt/megadiff/cdiso_400_130.iso,if=ide,media=cdrom,index=2 -drive file=/home/mmarkk/spamsender2.img,if=scsi,index=0,format=qcow2,cache=writeback -net nic,macaddr=00:00:00:00:00:00,vlan=0,name=nic.0 -net tap,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 -parallel none -usb -vnc 127.0.0.1:0 -vga cirrus .iso image contain custom distro of 2.4-linux kernel based system. During install process (when .tar.gz actively unpacked), kvm dead with segmentation fault. And ONLY when I choose scsi virtual disk and writeback simultaneously. But, writeback+ide, writethrough+scsi works OK. I use qcow2. It seems, that qcow does not have such problems. Virtual machine get down at random time during file copy. It seems, when qcow2 file size need to be expanded. IMPACT: kvm used with scsi virtual disk and writeback dies with segfault. FIX: is the inclusion of a patch cherry-picked from upstream which dequeues requests before invoking callbacks. It is at http://bazaar.launchpad.net/~serge-hallyn/ubuntu/lucid/qemu-kvm/fix-scsi-writeback/revision/70 TO REPRODUCE: See the command above. REGRESSION POTENTIAL: this is cherry-picked from upstream, and has been tested by the bug reporter with no ill effects.
[Qemu-devel] [Bug 674740] Re: qemu segfaults when security_model=none using virtio-9p-pci driver
Add the space in 'thesecurity'. ** Patch added: "0002-include-a-missing-space-in-and-error-message.patch" https://bugs.launchpad.net/qemu/+bug/674740/+attachment/1731776/+files/0002-include-a-missing-space-in-and-error-message.patch -- qemu segfaults when security_model=none using virtio-9p-pci driver https://bugs.launchpad.net/bugs/674740 You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. Status in QEMU: New Bug description: qemu version: 0.13 commit-id: 6ed912999d6ef636a5be5ccb266d7d3c0f0310b4 example invocation: $ qemu -virtfs local,path=/tmp,security_model=none,mount_tag=mmm r.img one of the following must be specified as thesecurity option: security_model=passthrough security_model=mapped Segmentation fault Patch is attached. Also attached is a patch that addes the space between 'the' and 'security' in 'thesecurity'. Does not affect trunk.
[Qemu-devel] [Bug 674740] Re: qemu segfaults when security_model=none using virtio-9p-pci driver
** Patch added: "patch to fix the bug" https://bugs.launchpad.net/bugs/674740/+attachment/1731775/+files/0001-return-an-error-code-if-the-virtio-9p-pci-driver-can.patch -- qemu segfaults when security_model=none using virtio-9p-pci driver https://bugs.launchpad.net/bugs/674740 You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. Status in QEMU: New Bug description: qemu version: 0.13 commit-id: 6ed912999d6ef636a5be5ccb266d7d3c0f0310b4 example invocation: $ qemu -virtfs local,path=/tmp,security_model=none,mount_tag=mmm r.img one of the following must be specified as thesecurity option: security_model=passthrough security_model=mapped Segmentation fault Patch is attached. Also attached is a patch that addes the space between 'the' and 'security' in 'thesecurity'. Does not affect trunk.
[Qemu-devel] [Bug 674740] [NEW] qemu segfaults when security_model=none using virtio-9p-pci driver
Public bug reported: qemu version: 0.13 commit-id: 6ed912999d6ef636a5be5ccb266d7d3c0f0310b4 example invocation: $ qemu -virtfs local,path=/tmp,security_model=none,mount_tag=mmm r.img one of the following must be specified as thesecurity option: security_model=passthrough security_model=mapped Segmentation fault Patch is attached. Also attached is a patch that addes the space between 'the' and 'security' in 'thesecurity'. Does not affect trunk. ** Affects: qemu Importance: Undecided Status: New -- qemu segfaults when security_model=none using virtio-9p-pci driver https://bugs.launchpad.net/bugs/674740 You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. Status in QEMU: New Bug description: qemu version: 0.13 commit-id: 6ed912999d6ef636a5be5ccb266d7d3c0f0310b4 example invocation: $ qemu -virtfs local,path=/tmp,security_model=none,mount_tag=mmm r.img one of the following must be specified as thesecurity option: security_model=passthrough security_model=mapped Segmentation fault Patch is attached. Also attached is a patch that addes the space between 'the' and 'security' in 'thesecurity'. Does not affect trunk.
Re: [Qemu-devel] [PATCH 4/8] ARM: Return correct result for float-to-integer conversion of NaN
On 11 November 2010 19:21, Nathan Froyd wrote: > On Thu, Nov 11, 2010 at 06:23:58PM +, Peter Maydell wrote: >> The ARM architecture mandates that converting a NaN value to >> integer gives zero. This isn't the behaviour of the SoftFloat >> library, so NaNs must be special-cased. > > This is correct, but it's really only correct if FP traps are disabled. True (in that if you take a trap you don't return any result at all), but isn't it a bit of a red herring for this patchset given that qemu doesn't implement trapping on FP exceptions? I can make the commit message read "gives zero (if Invalid Operation FP exceptions are not being trapped)" if you think that would be clearer. -- PMM
[Qemu-devel] virtio-blk broken after system reset
Hi, both after hard and guest-initiated reset, something is seriously broken with virtio block devices. If I reset my Linux guest while still in grub, the bios will simply fail to read from the disk after the reboot. If I reset after Linux touched the device, qemu terminates: Breakpoint 1, 0x74b945b0 in _exit () from /lib64/libc.so.6 (gdb) bt #0 0x74b945b0 in _exit () from /lib64/libc.so.6 #1 0x74b2948d in __run_exit_handlers () from /lib64/libc.so.6 #2 0x74b29535 in exit () from /lib64/libc.so.6 #3 0x00568da3 in virtqueue_num_heads (vq=0x17040e0, idx=0) at /data/qemu/hw/virtio.c:258 #4 0x00569511 in virtqueue_pop (vq=0x17040e0, elem=0x17cea58) at /data/qemu/hw/virtio.c:388 #5 0x00419e31 in virtio_blk_get_request (s=0x1704010) at /data/qemu/hw/virtio-blk.c:132 #6 virtio_blk_handle_output (vdev=0x1704010, vq=) at /data/qemu/hw/virtio-blk.c:369 This is with current qemu.git head, haven't tried older versions. Known bug? Jan signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] No Virtual Console
On Fri, Nov 12, 2010 at 16:49, wrote: > Yes - that's it! The SDL window. I can't get it to show up ... any ideas? make sure you have "SDL" package installed. Also, if you compile Qemu by your own, please install "SDL-devel" package before ./configure and make. -- regards, Mulyadi Santosa Freelance Linux trainer and consultant blog: the-hydra.blogspot.com training: mulyaditraining.blogspot.com
[Qemu-devel] [PATCH v2 2/9] pci: Remove pci_enable_capability_support()
This interface doesn't make much sense, adding a capability can take care of everything, just provide a means to register capability read/write handlers. Device assignment does it's own thing, so requires a couple ugly hacks that will be cleaned by subsequent patches. Signed-off-by: Alex Williamson --- hw/device-assignment.c | 12 --- hw/pci.c | 52 +--- hw/pci.h |9 +++- 3 files changed, 35 insertions(+), 38 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index bde231d..311c197 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -1292,7 +1292,12 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev) PCIRegion *pci_region = dev->real_device.regions; int next_cap_pt = 0; +pci_dev->cap.supported = 1; +pci_dev->cap.start = PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR; pci_dev->cap.length = 0; +pci_dev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; +pci_dev->config[PCI_CAPABILITY_LIST] = pci_dev->cap.start; + #ifdef KVM_CAP_IRQ_ROUTING #ifdef KVM_CAP_DEVICE_MSI /* Expose MSI capability @@ -1468,9 +1473,10 @@ static int assigned_initfn(struct PCIDevice *pci_dev) dev->h_busnr = dev->host.bus; dev->h_devfn = PCI_DEVFN(dev->host.dev, dev->host.func); -if (pci_enable_capability_support(pci_dev, 0, NULL, -assigned_device_pci_cap_write_config, -assigned_device_pci_cap_init) < 0) +pci_register_capability_handlers(pci_dev, NULL, + assigned_device_pci_cap_write_config); + +if (assigned_device_pci_cap_init(pci_dev) < 0) goto out; /* assign device to guest */ diff --git a/hw/pci.c b/hw/pci.c index 4bc5882..607eb27 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -787,6 +787,8 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus, config_write = pci_default_write_config; pci_dev->config_read = config_read; pci_dev->config_write = config_write; +pci_dev->cap.config_read = pci_default_cap_read_config; +pci_dev->cap.config_write = pci_default_cap_write_config; bus->devices[devfn] = pci_dev; pci_dev->irq = qemu_allocate_irqs(pci_set_irq, pci_dev, PCI_NUM_PINS); pci_dev->version_id = 2; /* Current pci device vmstate version */ @@ -1894,35 +1896,21 @@ PCIDevice *pci_create_simple_multifunction(PCIBus *bus, int devfn, return dev; } -int pci_enable_capability_support(PCIDevice *pci_dev, - uint32_t config_start, - PCICapConfigReadFunc *config_read, - PCICapConfigWriteFunc *config_write, - PCICapConfigInitFunc *config_init) +void pci_register_capability_handlers(PCIDevice *pdev, + PCICapConfigReadFunc *config_read, + PCICapConfigWriteFunc *config_write) { -if (!pci_dev) -return -ENODEV; - -pci_dev->config[0x06] |= 0x10; // status = capabilities - -if (config_start == 0) - pci_dev->cap.start = PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR; -else if (config_start >= 0x40 && config_start < 0xff) -pci_dev->cap.start = config_start; -else -return -EINVAL; +if (config_read) { +pdev->cap.config_read = config_read; +} else { +pdev->cap.config_read = pci_default_cap_read_config; +} -if (config_read) -pci_dev->cap.config_read = config_read; -else -pci_dev->cap.config_read = pci_default_cap_read_config; -if (config_write) -pci_dev->cap.config_write = config_write; -else -pci_dev->cap.config_write = pci_default_cap_write_config; -pci_dev->cap.supported = 1; -pci_dev->config[PCI_CAPABILITY_LIST] = pci_dev->cap.start; -return config_init(pci_dev); +if (config_write) { +pdev->cap.config_write = config_write; +} else { +pdev->cap.config_write = pci_default_cap_write_config; +} } PCIDevice *pci_create(PCIBus *bus, int devfn, const char *name) @@ -2046,12 +2034,16 @@ int pci_add_capability_at_offset(PCIDevice *pdev, uint8_t cap_id, config[PCI_CAP_LIST_ID] = cap_id; config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; pdev->config[PCI_CAPABILITY_LIST] = offset; -pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; memset(pdev->used + offset, 0xFF, size); /* Make capability read-only by default */ memset(pdev->wmask + offset, 0, size); /* Check capability by default */ memset(pdev->cmask + offset, 0xFF, size); + +pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; +pdev->cap.supported = 1; +pdev->cap.start = pdev->cap.start ? MIN(pdev->cap.start, offset) : offset; + return offset; } @@ -2079,8 +2071,10 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id,
[Qemu-devel] [PATCH v2 9/9] pci: Store capability offsets in PCIDevice
This not only makes pci_find_capability a directly lookup, but also allows us to better track added capabilities and avoids the proliferation of random additional capability offset markers. Signed-off-by: Alex Williamson --- hw/msix.c | 15 +++ hw/pci.c | 20 ++-- hw/pci.h |5 +++-- 3 files changed, 28 insertions(+), 12 deletions(-) diff --git a/hw/msix.c b/hw/msix.c index b98b34a..060f27b 100644 --- a/hw/msix.c +++ b/hw/msix.c @@ -204,7 +204,6 @@ static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries, pci_set_long(config + MSIX_PBA_OFFSET, (bar_size + MSIX_PAGE_PENDING) | bar_nr); } -pdev->msix_cap = config_offset; /* Make flags bit writeable. */ pdev->wmask[config_offset + MSIX_CONTROL_OFFSET] |= MSIX_ENABLE_MASK | MSIX_MASKALL_MASK; @@ -253,7 +252,8 @@ static void msix_clr_pending(PCIDevice *dev, int vector) static int msix_function_masked(PCIDevice *dev) { -return dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] & MSIX_MASKALL_MASK; +return dev->config[dev->caps[PCI_CAP_ID_MSIX] + + MSIX_CONTROL_OFFSET] & MSIX_MASKALL_MASK; } static int msix_is_masked(PCIDevice *dev, int vector) @@ -275,7 +275,7 @@ static void msix_handle_mask_update(PCIDevice *dev, int vector) void msix_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len) { -unsigned enable_pos = dev->msix_cap + MSIX_CONTROL_OFFSET; +unsigned enable_pos = dev->caps[PCI_CAP_ID_MSIX] + MSIX_CONTROL_OFFSET; int vector; if (!range_covers_byte(addr, len, enable_pos)) { @@ -334,7 +334,7 @@ static CPUReadMemoryFunc * const msix_mmio_read[] = { void msix_mmio_map(PCIDevice *d, int region_num, pcibus_t addr, pcibus_t size, int type) { -uint8_t *config = d->config + d->msix_cap; +uint8_t *config = d->config + d->caps[PCI_CAP_ID_MSIX]; uint32_t table = pci_get_long(config + MSIX_TABLE_OFFSET); uint32_t offset = table & ~(MSIX_PAGE_SIZE - 1); /* TODO: for assigned devices, we'll want to make it possible to map @@ -437,7 +437,6 @@ int msix_uninit(PCIDevice *dev) if (!(dev->cap_present & QEMU_PCI_CAP_MSIX)) return 0; pci_del_capability(dev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH); -dev->msix_cap = 0; msix_free_irq_entries(dev); dev->msix_entries_nr = 0; cpu_unregister_io_memory(dev->msix_mmio_index); @@ -493,7 +492,7 @@ int msix_present(PCIDevice *dev) int msix_enabled(PCIDevice *dev) { return (dev->cap_present & QEMU_PCI_CAP_MSIX) && -(dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] & +(dev->config[dev->caps[PCI_CAP_ID_MSIX] + MSIX_CONTROL_OFFSET] & MSIX_ENABLE_MASK); } @@ -534,8 +533,8 @@ void msix_reset(PCIDevice *dev) if (!(dev->cap_present & QEMU_PCI_CAP_MSIX)) return; msix_free_irq_entries(dev); -dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &= - ~dev->wmask[dev->msix_cap + MSIX_CONTROL_OFFSET]; +dev->config[dev->caps[PCI_CAP_ID_MSIX] + MSIX_CONTROL_OFFSET] &= + ~dev->wmask[dev->caps[PCI_CAP_ID_MSIX] + MSIX_CONTROL_OFFSET]; memset(dev->msix_table_page, 0, MSIX_PAGE_SIZE); msix_mask_all(dev, dev->msix_entries_nr); } diff --git a/hw/pci.c b/hw/pci.c index bc25be7..773afa5 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -1990,15 +1990,24 @@ int pci_add_capability_at_offset(PCIDevice *pdev, uint8_t cap_id, { uint8_t i, *config = pdev->config + offset; +/* Check overlap with existing capabilities, valid cap, already added */ for (i = 0; i < size; i++) { if (pdev->config_map[offset + i]) { return -EFAULT; } } +if (!cap_id || cap_id > PCI_CAP_ID_MAX) { +return -EINVAL; +} + +if (pdev->caps[cap_id]) { +return -EFAULT; +} + config[PCI_CAP_LIST_ID] = cap_id; config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; -pdev->config[PCI_CAPABILITY_LIST] = offset; +pdev->caps[cap_id] = pdev->config[PCI_CAPABILITY_LIST] = offset; memset(pdev->config_map + offset, cap_id, size); /* Make capability read-only by default */ memset(pdev->wmask + offset, 0, size); @@ -2033,6 +2042,7 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size) /* Clear cmask as device-specific registers can't be checked */ memset(pdev->cmask + offset, 0, size); memset(pdev->config_map + offset, 0, size); +pdev->caps[cap_id] = 0; if (!pdev->config[PCI_CAPABILITY_LIST]) { pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST; @@ -2041,7 +2051,13 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size) uint8_t pci_find_capability(PCIDevice *pdev, uint8_t cap_id) { -return pci_find_capability_list(pdev, cap_id, NULL); +if (cap_id == PCI_CAP_ID_BASIC) { +return 0; +} +if (cap_id && cap_id <= PCI_CAP_ID_MAX
[Qemu-devel] [PATCH v2 8/9] pci: Remove capability read/write config handlers
These are just as easy to handle out of the main config read/write handlers. Also expand cap_map to config_map so we can use it to track all of config space. Signed-off-by: Alex Williamson --- hw/device-assignment.c | 22 +++- hw/pci.c | 66 hw/pci.h | 25 +++--- 3 files changed, 35 insertions(+), 78 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 179c7dc..85fa50d 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -63,6 +63,11 @@ static void assigned_dev_load_option_rom(AssignedDevice *dev); static void assigned_dev_unregister_msix_mmio(AssignedDevice *dev); +static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, + uint8_t cap_id, + uint32_t address, + uint32_t val, int len); + static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region, uint32_t addr, int len, uint32_t *val) { @@ -400,20 +405,25 @@ static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address, { int fd; ssize_t ret; +uint8_t cap_id = pci_get_cap_id(d, address); AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev); DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n", ((d->devfn >> 3) & 0x1F), (d->devfn & 0x7), (uint16_t) address, val, len); +if (cap_id && cap_id != PCI_CAP_ID_BASIC) { +return assigned_device_pci_cap_write_config(d, cap_id, address, +val, len); +} + if (address == 0x4) { pci_default_write_config(d, address, val, len); /* Continue to program the card */ } if ((address >= 0x10 && address <= 0x24) || address == 0x30 || -address == 0x34 || address == 0x3c || address == 0x3d || -pci_access_cap_config(d, address, len)) { +address == 0x34 || address == 0x3c || address == 0x3d) { /* used for update-mappings (BAR emulation) */ pci_default_write_config(d, address, val, len); return; @@ -443,13 +453,14 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address, { uint32_t val = 0; int fd; +uint8_t cap_id = pci_get_cap_id(d, address); ssize_t ret; AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev); if (address < 0x4 || (pci_dev->need_emulate_cmd && address == 0x4) || (address >= 0x10 && address <= 0x24) || address == 0x30 || address == 0x34 || address == 0x3c || address == 0x3d || -pci_access_cap_config(d, address, len)) { +(cap_id && cap_id != PCI_CAP_ID_BASIC)) { val = pci_default_read_config(d, address, len); DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n", (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len); @@ -1249,7 +1260,7 @@ static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t address, uint32_t val, int len) { -pci_default_cap_write_config(pci_dev, cap_id, address, val, len); +pci_default_write_config(pci_dev, address, val, len); switch (cap_id) { #ifdef KVM_CAP_IRQ_ROUTING @@ -1466,9 +1477,6 @@ static int assigned_initfn(struct PCIDevice *pci_dev) dev->h_busnr = dev->host.bus; dev->h_devfn = PCI_DEVFN(dev->host.dev, dev->host.func); -pci_register_capability_handlers(pci_dev, NULL, - assigned_device_pci_cap_write_config); - if (assigned_device_pci_cap_init(pci_dev) < 0) goto out; diff --git a/hw/pci.c b/hw/pci.c index 337afc4..bc25be7 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -730,7 +730,7 @@ static void pci_config_alloc(PCIDevice *pci_dev) pci_dev->config = qemu_mallocz(config_size); pci_dev->cmask = qemu_mallocz(config_size); pci_dev->wmask = qemu_mallocz(config_size); -pci_dev->cap_map = qemu_mallocz(config_size); +pci_dev->config_map = qemu_mallocz(config_size); } static void pci_config_free(PCIDevice *pci_dev) @@ -738,7 +738,7 @@ static void pci_config_free(PCIDevice *pci_dev) qemu_free(pci_dev->config); qemu_free(pci_dev->cmask); qemu_free(pci_dev->wmask); -qemu_free(pci_dev->cap_map); +qemu_free(pci_dev->config_map); } /* -1 for devfn means auto assign */ @@ -767,6 +767,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus, pstrcpy(pci_dev->name, sizeof(pci_dev->name), name); pci_dev->irq_state = 0; pci_config_alloc(pci_dev); +memset(pci_dev->config_map, PCI_CAP_ID_BASIC, PCI_CONFIG_HEADER_SIZE); if (!is_bridge) { pci_set_default_subsystem_id(pci_dev); @@ -787,8 +788,6 @@ s
[Qemu-devel] [PATCH v2 6/9] device-assignment: Move PCI capabilities to match physical hardware
Now that common PCI code doesn't have a hangup on capabilities being contiguous, move assigned device capabilities to match their offset on physical hardware. This helps for drivers that assume a capability configuration and don't bother searching. We can also remove several calls to assigned_dev_pci_read_* because we're overlaying the capability at the same location as the initial copy we made of config space. We can therefore just use pci_get_*. Signed-off-by: Alex Williamson --- hw/device-assignment.c | 67 +++- 1 files changed, 21 insertions(+), 46 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 322fa9f..39f19be 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -366,16 +366,6 @@ static uint8_t assigned_dev_pci_read_byte(PCIDevice *d, int pos) return (uint8_t)assigned_dev_pci_read(d, pos, 1); } -static uint16_t assigned_dev_pci_read_word(PCIDevice *d, int pos) -{ -return (uint16_t)assigned_dev_pci_read(d, pos, 2); -} - -static uint32_t assigned_dev_pci_read_long(PCIDevice *d, int pos) -{ -return assigned_dev_pci_read(d, pos, 4); -} - static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap) { int id; @@ -1285,6 +1275,7 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev) { AssignedDevice *dev = container_of(pci_dev, AssignedDevice, dev); PCIRegion *pci_region = dev->real_device.regions; +int pos; /* Clear initial capabilities pointer and status copied from hw */ pci_set_byte(pci_dev->config + PCI_CAPABILITY_LIST, 0); @@ -1296,60 +1287,44 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev) #ifdef KVM_CAP_DEVICE_MSI /* Expose MSI capability * MSI capability is the 1st capability in capability config */ -if (pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI)) { -int vpos, ppos; -uint16_t flags; - +if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI))) { dev->cap.available |= ASSIGNED_DEVICE_CAP_MSI; -vpos = pci_add_capability(pci_dev, PCI_CAP_ID_MSI, - PCI_CAPABILITY_CONFIG_MSI_LENGTH); - -memset(pci_dev->config + vpos + PCI_CAP_FLAGS, 0, - PCI_CAPABILITY_CONFIG_MSI_LENGTH - PCI_CAP_FLAGS); +pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_MSI, pos, + PCI_CAPABILITY_CONFIG_MSI_LENGTH); /* Only 32-bit/no-mask currently supported */ -ppos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI); -flags = assigned_dev_pci_read_word(pci_dev, ppos + PCI_MSI_FLAGS); -flags &= PCI_MSI_FLAGS_QMASK; -pci_set_word(pci_dev->config + vpos + PCI_MSI_FLAGS, flags); +pci_set_word(pci_dev->config + pos + PCI_MSI_FLAGS, + pci_get_word(pci_dev->config + pos + PCI_MSI_FLAGS) & + PCI_MSI_FLAGS_QMASK); +pci_set_long(pci_dev->config + pos + PCI_MSI_ADDRESS_LO, 0); +pci_set_word(pci_dev->config + pos + PCI_MSI_DATA_32, 0); /* Set writable fields */ -pci_set_word(pci_dev->wmask + vpos + PCI_MSI_FLAGS, +pci_set_word(pci_dev->wmask + pos + PCI_MSI_FLAGS, PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE); -pci_set_long(pci_dev->wmask + vpos + PCI_MSI_ADDRESS_LO, 0xfffc); -pci_set_long(pci_dev->wmask + vpos + PCI_MSI_DATA_32, 0x); +pci_set_long(pci_dev->wmask + pos + PCI_MSI_ADDRESS_LO, 0xfffc); +pci_set_word(pci_dev->wmask + pos + PCI_MSI_DATA_32, 0x); } #endif #ifdef KVM_CAP_DEVICE_MSIX /* Expose MSI-X capability */ -if (pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX)) { -int vpos, ppos, entry_nr, bar_nr; +if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX))) { +int bar_nr; uint32_t msix_table_entry; dev->cap.available |= ASSIGNED_DEVICE_CAP_MSIX; -vpos = pci_add_capability(pci_dev, PCI_CAP_ID_MSIX, - PCI_CAPABILITY_CONFIG_MSIX_LENGTH); +pci_add_capability_at_offset(pci_dev, PCI_CAP_ID_MSIX, pos, + PCI_CAPABILITY_CONFIG_MSIX_LENGTH); -memset(pci_dev->config + vpos + PCI_CAP_FLAGS, 0, - PCI_CAPABILITY_CONFIG_MSIX_LENGTH - PCI_CAP_FLAGS); +pci_set_word(pci_dev->config + pos + PCI_MSIX_FLAGS, + pci_get_word(pci_dev->config + pos + PCI_MSIX_FLAGS) & + PCI_MSIX_TABSIZE); /* Only enable and function mask bits are writable */ -pci_set_word(pci_dev->wmask + vpos + PCI_MSIX_FLAGS, +pci_set_word(pci_dev->wmask + pos + PCI_MSIX_FLAGS, PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL); -ppos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX); - -entry_nr = assigned_dev_pci_read_word(pci_dev, ppos + PCI_MSIX_FLAGS); -entry_nr &= PCI_MSIX_TABSIZE; -pci_se
[Qemu-devel] [PATCH v2 5/9] pci: Remove cap.length, cap.start, cap.supported
Capabilities aren't required to be contiguous, so cap.length never really made much sense. Likewise, cap.start is mostly meaningless too. Both of these are better served by the capability map. We can also get rid of cap.supported, since it's really now unused and redundant with flag in the status word anyway. Signed-off-by: Alex Williamson --- hw/device-assignment.c |4 hw/pci.c |8 +--- hw/pci.h |2 -- 3 files changed, 1 insertions(+), 13 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 74cdd26..322fa9f 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -1292,8 +1292,6 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev) pci_get_word(pci_dev->config + PCI_STATUS) & ~PCI_STATUS_CAP_LIST); -pci_dev->cap.length = 0; - #ifdef KVM_CAP_IRQ_ROUTING #ifdef KVM_CAP_DEVICE_MSI /* Expose MSI capability @@ -1320,7 +1318,6 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev) PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE); pci_set_long(pci_dev->wmask + vpos + PCI_MSI_ADDRESS_LO, 0xfffc); pci_set_long(pci_dev->wmask + vpos + PCI_MSI_DATA_32, 0x); -pci_dev->cap.length += PCI_CAPABILITY_CONFIG_MSI_LENGTH; } #endif #ifdef KVM_CAP_DEVICE_MSIX @@ -1356,7 +1353,6 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev) bar_nr = msix_table_entry & PCI_MSIX_BIR; msix_table_entry &= ~PCI_MSIX_BIR; dev->msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry; -pci_dev->cap.length += PCI_CAPABILITY_CONFIG_MSIX_LENGTH; } #endif #endif diff --git a/hw/pci.c b/hw/pci.c index 80610b3..a0a6126 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -1191,10 +1191,7 @@ static void pci_write_config_with_mask(PCIDevice *d, uint32_t addr, int pci_access_cap_config(PCIDevice *pci_dev, uint32_t address, int len) { -if (pci_dev->cap.supported && address >= pci_dev->cap.start && -(address + len) < pci_dev->cap.start + pci_dev->cap.length) -return 1; -return 0; +return pci_dev->cap_map[address]; } uint32_t pci_default_cap_read_config(PCIDevice *pci_dev, @@ -2041,8 +2038,6 @@ int pci_add_capability_at_offset(PCIDevice *pdev, uint8_t cap_id, memset(pdev->cmask + offset, 0xFF, size); pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; -pdev->cap.supported = 1; -pdev->cap.start = pdev->cap.start ? MIN(pdev->cap.start, offset) : offset; return offset; } @@ -2073,7 +2068,6 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size) if (!pdev->config[PCI_CAPABILITY_LIST]) { pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST; -pdev->cap.start = pdev->cap.length = 0; } } diff --git a/hw/pci.h b/hw/pci.h index 2265c70..177008a 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -208,8 +208,6 @@ struct PCIDevice { /* Device capability configuration space */ struct { -int supported; -unsigned int start, length; PCICapConfigReadFunc *config_read; PCICapConfigWriteFunc *config_write; } cap;
[Qemu-devel] [PATCH v2 4/9] pci: Replace used bitmap with capability byte map
Capabilities are allocated in bytes, so we can track both whether a byte is used and by what capability in the same structure. Remove pci_reserve_capability() as there are no users. Signed-off-by: Alex Williamson --- hw/pci.c | 16 +--- hw/pci.h |6 ++ 2 files changed, 7 insertions(+), 15 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 607eb27..80610b3 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -730,7 +730,7 @@ static void pci_config_alloc(PCIDevice *pci_dev) pci_dev->config = qemu_mallocz(config_size); pci_dev->cmask = qemu_mallocz(config_size); pci_dev->wmask = qemu_mallocz(config_size); -pci_dev->used = qemu_mallocz(config_size); +pci_dev->cap_map = qemu_mallocz(config_size); } static void pci_config_free(PCIDevice *pci_dev) @@ -738,7 +738,7 @@ static void pci_config_free(PCIDevice *pci_dev) qemu_free(pci_dev->config); qemu_free(pci_dev->cmask); qemu_free(pci_dev->wmask); -qemu_free(pci_dev->used); +qemu_free(pci_dev->cap_map); } /* -1 for devfn means auto assign */ @@ -1929,7 +1929,7 @@ static int pci_find_space(PCIDevice *pdev, uint8_t size) int offset = PCI_CONFIG_HEADER_SIZE; int i; for (i = PCI_CONFIG_HEADER_SIZE; i < config_size; ++i) -if (pdev->used[i]) +if (pdev->cap_map[i]) offset = i + 1; else if (i - offset + 1 == size) return offset; @@ -2034,7 +2034,7 @@ int pci_add_capability_at_offset(PCIDevice *pdev, uint8_t cap_id, config[PCI_CAP_LIST_ID] = cap_id; config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; pdev->config[PCI_CAPABILITY_LIST] = offset; -memset(pdev->used + offset, 0xFF, size); +memset(pdev->cap_map + offset, cap_id, size); /* Make capability read-only by default */ memset(pdev->wmask + offset, 0, size); /* Check capability by default */ @@ -2069,7 +2069,7 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size) memset(pdev->wmask + offset, 0xff, size); /* Clear cmask as device-specific registers can't be checked */ memset(pdev->cmask + offset, 0, size); -memset(pdev->used + offset, 0, size); +memset(pdev->cap_map + offset, 0, size); if (!pdev->config[PCI_CAPABILITY_LIST]) { pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST; @@ -2077,12 +2077,6 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size) } } -/* Reserve space for capability at a known offset (to call after load). */ -void pci_reserve_capability(PCIDevice *pdev, uint8_t offset, uint8_t size) -{ -memset(pdev->used + offset, 0xff, size); -} - uint8_t pci_find_capability(PCIDevice *pdev, uint8_t cap_id) { return pci_find_capability_list(pdev, cap_id, NULL); diff --git a/hw/pci.h b/hw/pci.h index 0712e55..2265c70 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -151,8 +151,8 @@ struct PCIDevice { /* Used to implement R/W bytes */ uint8_t *wmask; -/* Used to allocate config space for capabilities. */ -uint8_t *used; +/* Used to allocate config space and track capabilities. */ +uint8_t *cap_map; /* the following fields are read only */ PCIBus *bus; @@ -239,8 +239,6 @@ int pci_add_capability_at_offset(PCIDevice *pci_dev, uint8_t cap_id, void pci_del_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t cap_size); -void pci_reserve_capability(PCIDevice *pci_dev, uint8_t offset, uint8_t size); - uint8_t pci_find_capability(PCIDevice *pci_dev, uint8_t cap_id); uint32_t pci_default_read_config(PCIDevice *d,
[Qemu-devel] [PATCH v2 7/9] pci: Pass ID for capability read/write handlers
Any handlers that actually want to interact with specific capabilities are going to want to know the capability ID being accessed. With the capability map, this is readily available, so we can save handlers the trouble of figuring it out. Signed-off-by: Alex Williamson --- hw/device-assignment.c | 36 ++-- hw/pci.c | 14 -- hw/pci.h |8 3 files changed, 34 insertions(+), 24 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 39f19be..179c7dc 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -1244,30 +1244,38 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos) #endif #endif -static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t address, +static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, + uint8_t cap_id, + uint32_t address, uint32_t val, int len) { -AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev); +pci_default_cap_write_config(pci_dev, cap_id, address, val, len); -pci_default_cap_write_config(pci_dev, address, val, len); +switch (cap_id) { #ifdef KVM_CAP_IRQ_ROUTING +case PCI_CAP_ID_MSI: #ifdef KVM_CAP_DEVICE_MSI -if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) { -int pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSI); -if (ranges_overlap(address, len, pos + PCI_MSI_FLAGS, 1)) { -assigned_dev_update_msi(pci_dev, pos + PCI_MSI_FLAGS); +{ +uint8_t cap = pci_find_capability(pci_dev, cap_id); +if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) { +assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS); +} } -} #endif +break; + +case PCI_CAP_ID_MSIX: #ifdef KVM_CAP_DEVICE_MSIX -if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) { -int pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX); -if (ranges_overlap(address, len, pos + PCI_MSIX_FLAGS + 1, 1)) { -assigned_dev_update_msix(pci_dev, pos + PCI_MSIX_FLAGS); - } -} +{ +uint8_t cap = pci_find_capability(pci_dev, cap_id); +if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) { +assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS); +} +} #endif +break; #endif +} return; } diff --git a/hw/pci.c b/hw/pci.c index a0a6126..337afc4 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -1168,10 +1168,11 @@ static uint32_t pci_read_config(PCIDevice *d, uint32_t pci_default_read_config(PCIDevice *d, uint32_t address, int len) { +uint8_t cap_id; assert(len == 1 || len == 2 || len == 4); -if (pci_access_cap_config(d, address, len)) { -return d->cap.config_read(d, address, len); +if ((cap_id = pci_access_cap_config(d, address, len))) { +return d->cap.config_read(d, cap_id, address, len); } return pci_read_config(d, address, len); @@ -1194,13 +1195,13 @@ int pci_access_cap_config(PCIDevice *pci_dev, uint32_t address, int len) return pci_dev->cap_map[address]; } -uint32_t pci_default_cap_read_config(PCIDevice *pci_dev, +uint32_t pci_default_cap_read_config(PCIDevice *pci_dev, uint8_t cap_id, uint32_t address, int len) { return pci_read_config(pci_dev, address, len); } -void pci_default_cap_write_config(PCIDevice *pci_dev, +void pci_default_cap_write_config(PCIDevice *pci_dev, uint8_t cap_id, uint32_t address, uint32_t val, int len) { pci_write_config_with_mask(pci_dev, address, val, len); @@ -1209,9 +1210,10 @@ void pci_default_cap_write_config(PCIDevice *pci_dev, void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l) { int was_irq_disabled = pci_irq_disabled(d); +uint8_t cap_id; -if (pci_access_cap_config(d, addr, l)) { -d->cap.config_write(d, addr, val, l); +if ((cap_id = pci_access_cap_config(d, addr, l))) { +d->cap.config_write(d, cap_id, addr, val, l); return; } diff --git a/hw/pci.h b/hw/pci.h index 177008a..3f0b4e0 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -83,9 +83,9 @@ typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num, pcibus_t addr, pcibus_t size, int type); typedef int PCIUnregisterFunc(PCIDevice *pci_dev); -typedef void PCICapConfigWriteFunc(PCIDevice *pci_dev, +typedef void PCICapConfigWriteFunc(PCIDevice *pci_dev, uint8_t cap_id, uint32_t address, uint32_t val, int len); -typedef uint32_t PCICapConfigReadFunc(PCIDevice *pci_dev, +typedef uint32_
[Qemu-devel] [PATCH v2 3/9] device-assignment: Use PCI capabilities support
Convert to use common pci_add_capabilities() rather than creating our own mess. Signed-off-by: Alex Williamson --- hw/device-assignment.c | 112 +++- 1 files changed, 63 insertions(+), 49 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 311c197..74cdd26 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -38,6 +38,7 @@ #include "device-assignment.h" #include "loader.h" #include "monitor.h" +#include "range.h" #include /* From linux/ioport.h */ @@ -1075,17 +1076,17 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev, unsigned int ctrl_pos) } if (ctrl_byte & PCI_MSI_FLAGS_ENABLE) { +int pos = ctrl_pos - PCI_MSI_FLAGS; assigned_dev->entry = calloc(1, sizeof(struct kvm_irq_routing_entry)); if (!assigned_dev->entry) { perror("assigned_dev_update_msi: "); return; } assigned_dev->entry->u.msi.address_lo = -*(uint32_t *)(pci_dev->config + pci_dev->cap.start + - PCI_MSI_ADDRESS_LO); +pci_get_long(pci_dev->config + pos + PCI_MSI_ADDRESS_LO); assigned_dev->entry->u.msi.address_hi = 0; -assigned_dev->entry->u.msi.data = *(uint16_t *)(pci_dev->config + -pci_dev->cap.start + PCI_MSI_DATA_32); +assigned_dev->entry->u.msi.data = +pci_get_word(pci_dev->config + pos + PCI_MSI_DATA_32); assigned_dev->entry->type = KVM_IRQ_ROUTING_MSI; r = kvm_get_irq_route_gsi(); if (r < 0) { @@ -1123,10 +1124,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) struct kvm_assigned_msix_entry msix_entry; void *va = adev->msix_table_page; -if (adev->cap.available & ASSIGNED_DEVICE_CAP_MSI) -pos = pci_dev->cap.start + PCI_CAPABILITY_CONFIG_MSI_LENGTH; -else -pos = pci_dev->cap.start; +pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX); entries_max_nr = *(uint16_t *)(pci_dev->config + pos + 2); entries_max_nr &= PCI_MSIX_TABSIZE; @@ -1260,26 +1258,23 @@ static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad uint32_t val, int len) { AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev); -unsigned int pos = pci_dev->cap.start, ctrl_pos; pci_default_cap_write_config(pci_dev, address, val, len); #ifdef KVM_CAP_IRQ_ROUTING #ifdef KVM_CAP_DEVICE_MSI if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) { -ctrl_pos = pos + PCI_MSI_FLAGS; -if (address <= ctrl_pos && address + len > ctrl_pos) -assigned_dev_update_msi(pci_dev, ctrl_pos); -pos += PCI_CAPABILITY_CONFIG_MSI_LENGTH; +int pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSI); +if (ranges_overlap(address, len, pos + PCI_MSI_FLAGS, 1)) { +assigned_dev_update_msi(pci_dev, pos + PCI_MSI_FLAGS); +} } #endif #ifdef KVM_CAP_DEVICE_MSIX if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSIX) { -ctrl_pos = pos + 3; -if (address <= ctrl_pos && address + len > ctrl_pos) { -ctrl_pos--; /* control is word long */ -assigned_dev_update_msix(pci_dev, ctrl_pos); +int pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX); +if (ranges_overlap(address, len, pos + PCI_MSIX_FLAGS + 1, 1)) { +assigned_dev_update_msix(pci_dev, pos + PCI_MSIX_FLAGS); } -pos += PCI_CAPABILITY_CONFIG_MSIX_LENGTH; } #endif #endif @@ -1290,58 +1285,77 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev) { AssignedDevice *dev = container_of(pci_dev, AssignedDevice, dev); PCIRegion *pci_region = dev->real_device.regions; -int next_cap_pt = 0; -pci_dev->cap.supported = 1; -pci_dev->cap.start = PCI_CAPABILITY_CONFIG_DEFAULT_START_ADDR; +/* Clear initial capabilities pointer and status copied from hw */ +pci_set_byte(pci_dev->config + PCI_CAPABILITY_LIST, 0); +pci_set_word(pci_dev->config + PCI_STATUS, + pci_get_word(pci_dev->config + PCI_STATUS) & + ~PCI_STATUS_CAP_LIST); + pci_dev->cap.length = 0; -pci_dev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; -pci_dev->config[PCI_CAPABILITY_LIST] = pci_dev->cap.start; #ifdef KVM_CAP_IRQ_ROUTING #ifdef KVM_CAP_DEVICE_MSI /* Expose MSI capability * MSI capability is the 1st capability in capability config */ if (pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSI)) { +int vpos, ppos; +uint16_t flags; + dev->cap.available |= ASSIGNED_DEVICE_CAP_MSI; -memset(&pci_dev->config[pci_dev->cap.start + pci_dev->cap.length], - 0, PCI_CAPABILITY_CONFIG_MSI_LENGTH); -pci_dev->config[pci_dev->cap.start + pci_dev->cap.length] = -PCI_CAP_ID_MSI; +
[Qemu-devel] [PATCH v2 1/9] pci: pci_default_cap_write_config ignores wmask
Make use of wmask, just like the rest of config space. Signed-off-by: Alex Williamson --- hw/pci.c | 22 ++ 1 files changed, 10 insertions(+), 12 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 92aaa85..4bc5882 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -1175,13 +1175,15 @@ uint32_t pci_default_read_config(PCIDevice *d, return pci_read_config(d, address, len); } -static void pci_write_config(PCIDevice *pci_dev, - uint32_t address, uint32_t val, int len) +static void pci_write_config_with_mask(PCIDevice *d, uint32_t addr, + uint32_t val, int l) { int i; -for (i = 0; i < len; i++) { -pci_dev->config[address + i] = val & 0xff; -val >>= 8; +uint32_t config_size = pci_config_size(d); + +for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) { +uint8_t wmask = d->wmask[addr + i]; +d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask); } } @@ -1202,23 +1204,19 @@ uint32_t pci_default_cap_read_config(PCIDevice *pci_dev, void pci_default_cap_write_config(PCIDevice *pci_dev, uint32_t address, uint32_t val, int len) { -pci_write_config(pci_dev, address, val, len); +pci_write_config_with_mask(pci_dev, address, val, len); } void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l) { -int i, was_irq_disabled = pci_irq_disabled(d); -uint32_t config_size = pci_config_size(d); +int was_irq_disabled = pci_irq_disabled(d); if (pci_access_cap_config(d, addr, l)) { d->cap.config_write(d, addr, val, l); return; } -for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) { -uint8_t wmask = d->wmask[addr + i]; -d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask); -} +pci_write_config_with_mask(d, addr, val, l); #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT if (kvm_enabled() && kvm_irqchip_in_kernel() &&
[Qemu-devel] [PATCH v2 0/9] PCI capability and device assignment improvements
v2: - Fixed the function name in 1/8 per Michael's suggestion. - Removed capability specific config read/write registration - Added more checks to add_capability - Added capability lookup table to PCIDevice I've dropped the RFC patch to add more capabilities to device assignment while I do some more work on it. Please feel free to comment on the v1 version though. Patches still against qemu-kvm, but I hope some of this makes it easier to bring qemu & qemu-kvm closer here. Thanks, Alex v1: This series attempts to clean up capability support between common code and device assignment. In doing so, we can move existing MSI & MSI-X capabilities to offsets matching real hardware, and further enable more capabilities to be exposed. The last patch is only for RFC, I'd like some input on what we should pass directly and where we should only provide read-only/emulated access. Patches 1-7 are submitted for commit. Thanks, Alex --- Alex Williamson (9): pci: Store capability offsets in PCIDevice pci: Remove capability read/write config handlers pci: Pass ID for capability read/write handlers device-assignment: Move PCI capabilities to match physical hardware pci: Remove cap.length, cap.start, cap.supported pci: Replace used bitmap with capability byte map device-assignment: Use PCI capabilities support pci: Remove pci_enable_capability_support() pci: pci_default_cap_write_config ignores wmask hw/device-assignment.c | 157 +--- hw/msix.c | 15 ++--- hw/pci.c | 132 +++- hw/pci.h | 39 ++-- 4 files changed, 145 insertions(+), 198 deletions(-)
Re: [Qemu-devel] [PATCH 0/2] v7 Decouple block device removal from device removal
Kevin Wolf writes: > I have to admit that I didn't follow your discussion very closely any > more after a few versions, so just to confirm: You came to the > conclusion that we want to add drive_del to QMP and not only the human > monitor, even though there is no drive_add in QMP? Fair question. -drive and drive_add conflate host and guest part. They were hacked up to work with -device, and it shows. I'd rather not copy the mess to QMP. Maybe it's best to leave out the QMP part for now, i.e. drop PATCH 2/2. If we can't get blockdev_add, blockdev_del done in time, we reconsider. > The HMP help text doesn't seem to be completely right (sent a comment), > but once that's fixed, I'm going to merge the series based on Markus's ACK.
Re: [Qemu-devel] Re: [PATCH v4 3/5] qed: Table, L2 cache, and cluster functions
On Fri, Nov 12, 2010 at 5:26 PM, Kevin Wolf wrote: > Am 28.10.2010 13:01, schrieb Stefan Hajnoczi: >> This patch adds code to look up data cluster offsets in the image via >> the L1/L2 tables. The L2 tables are writethrough cached in memory for >> performance (each read/write requires a lookup so it is essential to >> cache the tables). >> >> With cluster lookup code in place it is possible to implement >> bdrv_is_allocated() to query the number of contiguous >> allocated/unallocated clusters. >> >> Signed-off-by: Stefan Hajnoczi > > Didn't find any obvious problems with this one, but maybe that just > means that I'm tired and should stop working for today. ;-) Thanks for your review effort, Kevin. Stefan
[Qemu-devel] Re: [PATCH 0/2] v8 Decouple block device removal from device removal
Am 12.11.2010 18:07, schrieb Ryan Harper: > details, details, v8 > > This patch series decouples the detachment of a block device from the > removal of the backing pci-device. Removal of a hotplugged pci device > requires the guest to respond before qemu tears down the block device. > In some cases, the guest may not respond leaving the guest with > continued access to the block device. Mgmt layer doesn't have a > reliable method to force a disconnect. > > The new monitor command, drive_del, will revoke a guests access to the > block device independently of the removal of the pci device. > > The first patch implements drive_del, the second patch implements the > qmp version of the monitor command. > > Changes since v7: > - Fixed up doc strings (delete -> drive_del) > Changes since v6: > - Updated patch description > - Dropped bdrv_unplug and inlined in drive_del > - Explicitly invoke drive_uninit() > Changes since v5: > - Removed dangling pointers in guest and host state. This ensures things > like > info block no longer displays the deleted drive; though info pci will > continue to display the pci device until the guest responds to the removal > request. > - Renamed drive_unplug -> drive_del > Changes since v4: > - Droppped drive_get_by_id patch and use bdrv_find() instead > - Added additional details about drive_unplug to hmp/qmp interface > > Changes since v3: > - Moved QMP command for drive_unplug() to separate patch > > Changes since v2: > - Added QMP command for drive_unplug() > > Changes since v1: > - CodingStyle fixes > - Added qemu_aio_flush() to bdrv_unplug() > > Signed-off-by: Ryan Harper Thanks, applied both to the block branch. Kevin
[Qemu-devel] Re: [PATCH v4 3/5] qed: Table, L2 cache, and cluster functions
Am 28.10.2010 13:01, schrieb Stefan Hajnoczi: > This patch adds code to look up data cluster offsets in the image via > the L1/L2 tables. The L2 tables are writethrough cached in memory for > performance (each read/write requires a lookup so it is essential to > cache the tables). > > With cluster lookup code in place it is possible to implement > bdrv_is_allocated() to query the number of contiguous > allocated/unallocated clusters. > > Signed-off-by: Stefan Hajnoczi Didn't find any obvious problems with this one, but maybe that just means that I'm tired and should stop working for today. ;-) Kevin
Re: [Qemu-devel] Re: [PATCH v4 2/5] qed: Add QEMU Enhanced Disk image format
Am 12.11.2010 18:24, schrieb Markus Armbruster: > Kevin Wolf writes: > >> Am 12.11.2010 17:34, schrieb Stefan Hajnoczi: >>> On Fri, Nov 12, 2010 at 3:43 PM, Kevin Wolf wrote: Am 28.10.2010 13:01, schrieb Stefan Hajnoczi: > +/** > + * Check whether an image format is raw > + * > + * @fmt:Backing file format, may be NULL > + */ > +static bool qed_fmt_is_raw(const char *fmt) > +{ > +return fmt && strcmp(fmt, "raw") == 0; > +} People shouldn't use them directly, but should we also consider file, host_device, etc.? >>> >>> Hrm..I will look into it for v5. I thought we always have a "raw" >>> format on top of "file", "host_device", etc protocols? >> >> That's how it's meant to be used. I think at the moment we still allow >> to directly use file etc. and you know that if something is possible, >> users will do it. > > It's possible, it's perfectly well-defined, and it works. Wow, I'm not used to be confirmed that quickly! Yes, it's possible, it works, but it's only exposed because we didn't pay attention when host_* was split out. There's no real reason to use it (or do you have any example where format=raw doesn't work?) and it should be considered an implementation detail. Kevin
Re: [Qemu-devel] Re: [PATCH v4 2/5] qed: Add QEMU Enhanced Disk image format
Kevin Wolf writes: > Am 12.11.2010 17:34, schrieb Stefan Hajnoczi: >> On Fri, Nov 12, 2010 at 3:43 PM, Kevin Wolf wrote: >>> Am 28.10.2010 13:01, schrieb Stefan Hajnoczi: +/** + * Check whether an image format is raw + * + * @fmt:Backing file format, may be NULL + */ +static bool qed_fmt_is_raw(const char *fmt) +{ +return fmt && strcmp(fmt, "raw") == 0; +} >>> >>> People shouldn't use them directly, but should we also consider file, >>> host_device, etc.? >> >> Hrm..I will look into it for v5. I thought we always have a "raw" >> format on top of "file", "host_device", etc protocols? > > That's how it's meant to be used. I think at the moment we still allow > to directly use file etc. and you know that if something is possible, > users will do it. It's possible, it's perfectly well-defined, and it works. > But instead of listing all of them here, maybe we should just make the > final step and return an error if they are used directly. No. > Unfortunately, > that will bring us back to the discussion about formats and protocols... ;-) /me runs away screaming [...]
[Qemu-devel] [PATCH 1/2] Implement drive_del to decouple block removal from device removal
Currently device hotplug removal code is tied to device removal via ACPI. All pci devices that are removable via device_del() require the guest to respond to the request. In some cases the guest may not respond leaving the device still accessible to the guest. The management layer doesn't currently have a reliable way to revoke access to host resource in the presence of an uncooperative guest. This patch implements a new monitor command, drive_del, which provides an explicit command to revoke access to a host block device. drive_del first quiesces the block device (qemu_aio_flush; bdrv_flush() and bdrv_close()). This prevents further IO from being submitted against the host device. Finally, drive_del cleans up pointers between the drive object (host resource) and the device object (guest resource). Signed-off-by: Ryan Harper --- blockdev.c | 39 +++ blockdev.h |1 + hmp-commands.hx | 18 ++ 3 files changed, 58 insertions(+), 0 deletions(-) diff --git a/blockdev.c b/blockdev.c index 6cb179a..f6ac439 100644 --- a/blockdev.c +++ b/blockdev.c @@ -14,6 +14,8 @@ #include "qemu-option.h" #include "qemu-config.h" #include "sysemu.h" +#include "hw/qdev.h" +#include "block_int.h" static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives); @@ -597,3 +599,40 @@ int do_change_block(Monitor *mon, const char *device, } return monitor_read_bdrv_key_start(mon, bs, NULL, NULL); } + +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ +const char *id = qdict_get_str(qdict, "id"); +BlockDriverState *bs; +BlockDriverState **ptr; +Property *prop; + +bs = bdrv_find(id); +if (!bs) { +qerror_report(QERR_DEVICE_NOT_FOUND, id); +return -1; +} + +/* quiesce block driver; prevent further io */ +qemu_aio_flush(); +bdrv_flush(bs); +bdrv_close(bs); + +/* clean up guest state from pointing to host resource by + * finding and removing DeviceState "drive" property */ +for (prop = bs->peer->info->props; prop && prop->name; prop++) { +if (prop->info->type == PROP_TYPE_DRIVE) { +ptr = qdev_get_prop_ptr(bs->peer, prop); +if ((*ptr) == bs) { +bdrv_detach(bs, bs->peer); +*ptr = NULL; +break; +} +} +} + +/* clean up host side */ +drive_uninit(drive_get_by_blockdev(bs)); + +return 0; +} diff --git a/blockdev.h b/blockdev.h index 653affc..2a0559e 100644 --- a/blockdev.h +++ b/blockdev.h @@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_change_block(Monitor *mon, const char *device, const char *filename, const char *fmt); +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data); #endif diff --git a/hmp-commands.hx b/hmp-commands.hx index e5585ba..c8b0206 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it). ETEXI { +.name = "drive_del", +.args_type = "id:s", +.params = "device", +.help = "remove host block device", +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_drive_del, +}, + +STEXI +...@item drive_del @var{device} +...@findex drive_del +Remove host block device. The result is that guest generated IO is no longer +submitted against the host device underlying the disk. Once a drive has +been deleted, the QEMU Block layer returns -EIO which results in IO +errors in the guest for applications that are reading/writing to the device. +ETEXI + +{ .name = "change", .args_type = "device:B,target:F,arg:s?", .params = "device filename [format]", -- 1.6.3.3
[Qemu-devel] [PATCH 0/2] v8 Decouple block device removal from device removal
details, details, v8 This patch series decouples the detachment of a block device from the removal of the backing pci-device. Removal of a hotplugged pci device requires the guest to respond before qemu tears down the block device. In some cases, the guest may not respond leaving the guest with continued access to the block device. Mgmt layer doesn't have a reliable method to force a disconnect. The new monitor command, drive_del, will revoke a guests access to the block device independently of the removal of the pci device. The first patch implements drive_del, the second patch implements the qmp version of the monitor command. Changes since v7: - Fixed up doc strings (delete -> drive_del) Changes since v6: - Updated patch description - Dropped bdrv_unplug and inlined in drive_del - Explicitly invoke drive_uninit() Changes since v5: - Removed dangling pointers in guest and host state. This ensures things like info block no longer displays the deleted drive; though info pci will continue to display the pci device until the guest responds to the removal request. - Renamed drive_unplug -> drive_del Changes since v4: - Droppped drive_get_by_id patch and use bdrv_find() instead - Added additional details about drive_unplug to hmp/qmp interface Changes since v3: - Moved QMP command for drive_unplug() to separate patch Changes since v2: - Added QMP command for drive_unplug() Changes since v1: - CodingStyle fixes - Added qemu_aio_flush() to bdrv_unplug() Signed-off-by: Ryan Harper
[Qemu-devel] [PATCH 2/2] Add qmp version of drive_del
Signed-off-by: Ryan Harper --- qmp-commands.hx | 29 + 1 files changed, 29 insertions(+), 0 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 793cf1c..1e0d4e9 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -338,6 +338,35 @@ Example: EQMP { +.name = "drive_del", +.args_type = "id:s", +.params = "device", +.help = "remove host block device", +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_drive_del, +}, + +SQMP +drive del +-- + +Remove host block device. The result is that guest generated IO is no longer +submitted against the host device underlying the disk. Once a drive has +been deleted, the QEMU Block layer returns -EIO which results in IO +errors in the guest for applications that are reading/writing to the device. + +Arguments: + +- "id": the device's ID (json-string) + +Example: + +-> { "execute": "drive_del", "arguments": { "id": "drive-virtio-blk1" } } +<- { "return": {} } + +EQMP + +{ .name = "cpu", .args_type = "index:i", .params = "index", -- 1.6.3.3
Re: [Qemu-devel] Re: [Try2][PATCH] Initial implementation of a mpeg1 layer2 streaming audio driver.
Le 12 nov. 2010 à 15:32, Anthony Liguori a écrit : >> I did try years ago, but at least the current wav driver really didn't like >> fifos back then. I recall trying for hours to get it pipe to ffmpeg or >> others without much luck. >> >> Also, this poses several problems about the control of the external process >> (respawn on listener disconnection, close on exit...). > > Doing encoding in QEMU is going to starve the guest pretty bad. For an x86 > guest, that's going to result in issues with time drift, etc. > > Making reencoding with an external process work a bit more nicely also has > the advantage of making this work with other formats like Ogg which are a bit > more Open Source friendly. The current patch uses a separate thread, but since I clone this part from the esdaudio code I didn't check what it was doing. It seems it's not really queueing anything though except the single mix buffer, which kind of defeats the purpose of having a separate thread. This might explain why it lags so much here... I tried increasing the buffer size and lowering the threshold but it doesn't seem to help. I agree it'd be better to use external programs when possible but as I said it's a bit harder to handle the errors and such, and I wanted to have something working. Also, it requires more work to set it up for users, they must install the externals, figure out the command line options... Possibly we can provide default templates for known programs, either text files with % escaping for args, or just a shell script passing env vars maybe... Besides, external or not, IIRC a pipe is by default 4kB max, which isn't much better for decoupling the processing on its own, if the encoder is too slow it will still starve the audio thread, and the rest. Also it all requires more context switching and IPC, which increase the total processing time. So I think it might be interesting to have both. I'll see if I can buffer a bit more in the twolame code and if it helps, then I'll try to merge with the failed attempts I have around at using external progs. François.
Re: [Qemu-devel] Re: [PATCH v4 2/5] qed: Add QEMU Enhanced Disk image format
Am 12.11.2010 17:34, schrieb Stefan Hajnoczi: > On Fri, Nov 12, 2010 at 3:43 PM, Kevin Wolf wrote: >> Am 28.10.2010 13:01, schrieb Stefan Hajnoczi: >>> +/** >>> + * Check whether an image format is raw >>> + * >>> + * @fmt:Backing file format, may be NULL >>> + */ >>> +static bool qed_fmt_is_raw(const char *fmt) >>> +{ >>> +return fmt && strcmp(fmt, "raw") == 0; >>> +} >> >> People shouldn't use them directly, but should we also consider file, >> host_device, etc.? > > Hrm..I will look into it for v5. I thought we always have a "raw" > format on top of "file", "host_device", etc protocols? That's how it's meant to be used. I think at the moment we still allow to directly use file etc. and you know that if something is possible, users will do it. But instead of listing all of them here, maybe we should just make the final step and return an error if they are used directly. Unfortunately, that will bring us back to the discussion about formats and protocols... ;-) >>> +if (s->header.magic != QED_MAGIC) { >>> +return -ENOENT; >>> +} >> >> ENOENT seems a bit odd for a wrong magic number, especially if it's used >> for the error message. Wouldn't EINVAL or ENOTSUP be a closer match? > > You're right, ENOENT is confusing for the user. > >>> +static void bdrv_qed_flush(BlockDriverState *bs) >>> +{ >>> +bdrv_flush(bs->file); >>> +} >> >> This conflicts with one of my recent changes. bdrv_flush should return >> an int now. > > Will fix and will also check for bdrv_flush() failures. > >>> +while (options && options->name) { >>> +if (!strcmp(options->name, BLOCK_OPT_SIZE)) { >>> +image_size = options->value.n; >>> +} else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) { >>> +backing_file = options->value.s; >>> +} else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) { >>> +backing_fmt = options->value.s; >> >> I'm not sure here. It doesn't really matter for QED, but I find it >> strange that -o backing_fmt=foobar works and gives you a non-raw backing >> file. Should we check if the format exists at least? > > I see the same issue in QCOW2 so let's solve this generically in a > separate patch. Okay, that makes sense. Kevin
Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver
On Fri, 12 Nov 2010 17:06:16 +0100 Markus Armbruster wrote: > Luiz Capitulino writes: > > > On Fri, 12 Nov 2010 16:04:39 +0100 > > Markus Armbruster wrote: > > > >> Luiz Capitulino writes: > >> > >> > On Fri, 12 Nov 2010 15:16:33 +0100 > >> > Markus Armbruster wrote: > >> > > >> >> Luiz Capitulino writes: > >> >> > >> >> > On Fri, 12 Nov 2010 11:21:57 +0100 > >> >> > Markus Armbruster wrote: > >> >> > > >> >> >> Luiz Capitulino writes: > >> >> [...] > >> >> >> > +QString *qemu_chr_mem_to_qs(CharDriverState *chr) > >> >> >> > +{ > >> >> >> > +MemoryDriver *d = chr->opaque; > >> >> >> > + > >> >> >> > +if (d->outbuf_size == 0) { > >> >> >> > +return qstring_new(); > >> >> >> > +} > >> >> >> > >> >> >> Why is this necessary? Is qstring_from_substr() broken for empty > >> >> >> substrings? If it is, it ought to be fixed! > >> >> > > >> >> > qstring_from_substr() takes a character range; outbuf_size stores a > >> >> > size, > >> >> > not a string length. So we do: > >> >> > > >> >> >> > +return qstring_from_substr((char *) d->outbuf, 0, > >> >> >> > d->outbuf_size - 1); > >> >> > > >> >> > If outbuf_size is 0, we'll be passing a negative value down. > >> >> > >> >> What's wrong with that? > >> > > >> > Although it's going to work with the current QString implementation, I > >> > don't > >> > think it's it's a good idea to rely on a negative index. > >> > >> How should I extract the substring of S beginning at index B with length > >> L? If I cant't do this for any B, L with interval [B,B+L-1] fully > >> within [0,length(S)], then the API is flawed, and ought to be replaced. > > > > Not sure we're talking about the same problem, anymore. When you said: > > > >> >> What's wrong with that? > > > > What did you mean? Did you mean 'let's not decrement outbuf_size' or did > > you mean 'let's pass -1 anyway'? > > Yes, what's wrong with qstring_from_substr(S, 0, -1)? > > Its function comment is imprecise, it doesn't tell us whether the END-th > character is included in the substring or not. > > The code, however, is clear enough: it *is* included. And the unit test > checks that. > > Therefore, qstring_from_substr("abc", 0, 0) returns the qstring "a". > > > Both seem wrong to me: the substring [0,-1] should be invalid > > Why? > > How do you express "the empty substring starting at 0" then? I didn't consider that when I wrote the code, so it's a matter a defining the behavior we want it to have. > > > and not > > decrementing outbuf_size is wrong, because it contains the buffer size and > > qstring_from_substr() will consume an additional char from the buffer (which > > should be '\0' today, but we shouldn't count on that). > > > >> > >> > Maybe, we could have: > >> > > >> > return qstring_from_substr((char *) d->outbuf, 0, > >> > d->outbuf_size > 0 ? d->outbuf_size - 1 : 0); > >> > > >> > A bit harder to read, but makes the function smaller. > >> > >> Err, doesn't qstring_from_substr(s, 0, 0) extract a substring of length > >> 1? > > > > Yeah, it's a bug. But that doesn't change my suggestion, can we do this way? > > > > This should fix the bug (not even compiled tested): > > > > diff --git a/qstring.c b/qstring.c > > index 4e2ba08..72a25de 100644 > > --- a/qstring.c > > +++ b/qstring.c > > @@ -42,10 +42,10 @@ QString *qstring_from_substr(const char *str, int > > start, int end) > > > > qstring = qemu_malloc(sizeof(*qstring)); > > > > -qstring->length = end - start + 1; > > -qstring->capacity = qstring->length; > > +qstring->length = end - start; > > +qstring->capacity = qstring->length + 1; > > > > -qstring->string = qemu_malloc(qstring->capacity + 1); > > +qstring->string = qemu_malloc(qstring->capacity); > > memcpy(qstring->string, str + start, qstring->length); > > qstring->string[qstring->length] = 0; > > I suspect this will fail your unit test. Haven't checked it yet, but maybe it has to be fixed too.
[Qemu-devel] Re: [PATCH 1/2] Implement drive_del to decouple block removal from device removal
* Kevin Wolf [2010-11-12 10:43]: > Am 12.11.2010 16:38, schrieb Ryan Harper: > > Currently device hotplug removal code is tied to device removal via > > ACPI. All pci devices that are removable via device_del() require the > > guest to respond to the request. In some cases the guest may not > > respond leaving the device still accessible to the guest. The management > > layer doesn't currently have a reliable way to revoke access to host > > resource in the presence of an uncooperative guest. > > > > This patch implements a new monitor command, drive_del, which > > provides an explicit command to revoke access to a host block device. > > > > drive_del first quiesces the block device (qemu_aio_flush; > > bdrv_flush() and bdrv_close()). This prevents further IO from being > > submitted against the host device. Finally, drive_del cleans up > > pointers between the drive object (host resource) and the device > > object (guest resource). > > > > Signed-off-by: Ryan Harper > > --- > > blockdev.c | 39 +++ > > blockdev.h |1 + > > hmp-commands.hx | 18 ++ > > 3 files changed, 58 insertions(+), 0 deletions(-) > > > > diff --git a/blockdev.c b/blockdev.c > > index 6cb179a..f6ac439 100644 > > --- a/blockdev.c > > +++ b/blockdev.c > > @@ -14,6 +14,8 @@ > > #include "qemu-option.h" > > #include "qemu-config.h" > > #include "sysemu.h" > > +#include "hw/qdev.h" > > +#include "block_int.h" > > > > static QTAILQ_HEAD(drivelist, DriveInfo) drives = > > QTAILQ_HEAD_INITIALIZER(drives); > > > > @@ -597,3 +599,40 @@ int do_change_block(Monitor *mon, const char *device, > > } > > return monitor_read_bdrv_key_start(mon, bs, NULL, NULL); > > } > > + > > +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) > > +{ > > +const char *id = qdict_get_str(qdict, "id"); > > +BlockDriverState *bs; > > +BlockDriverState **ptr; > > +Property *prop; > > + > > +bs = bdrv_find(id); > > +if (!bs) { > > +qerror_report(QERR_DEVICE_NOT_FOUND, id); > > +return -1; > > +} > > + > > +/* quiesce block driver; prevent further io */ > > +qemu_aio_flush(); > > +bdrv_flush(bs); > > +bdrv_close(bs); > > + > > +/* clean up guest state from pointing to host resource by > > + * finding and removing DeviceState "drive" property */ > > +for (prop = bs->peer->info->props; prop && prop->name; prop++) { > > +if (prop->info->type == PROP_TYPE_DRIVE) { > > +ptr = qdev_get_prop_ptr(bs->peer, prop); > > +if ((*ptr) == bs) { > > +bdrv_detach(bs, bs->peer); > > +*ptr = NULL; > > +break; > > +} > > +} > > +} > > + > > +/* clean up host side */ > > +drive_uninit(drive_get_by_blockdev(bs)); > > + > > +return 0; > > +} > > diff --git a/blockdev.h b/blockdev.h > > index 653affc..2a0559e 100644 > > --- a/blockdev.h > > +++ b/blockdev.h > > @@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject > > **ret_data); > > int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject > > **ret_data); > > int do_change_block(Monitor *mon, const char *device, > > const char *filename, const char *fmt); > > +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data); > > > > #endif > > diff --git a/hmp-commands.hx b/hmp-commands.hx > > index e5585ba..d6dc18c 100644 > > --- a/hmp-commands.hx > > +++ b/hmp-commands.hx > > @@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it). > > ETEXI > > > > { > > +.name = "drive_del", > > +.args_type = "id:s", > > +.params = "device", > > +.help = "remove host block device", > > +.user_print = monitor_user_noop, > > +.mhandler.cmd_new = do_drive_del, > > +}, > > + > > +STEXI > > +...@item delete @var{device} > > +...@findex delete > > I think this should be @item drive_del and @findex drive_del. Yep. > > Kevin -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ry...@us.ibm.com
Re: [Qemu-devel] [PATCH 0/2] v7 Decouple block device removal from device removal
Am 12.11.2010 17:28, schrieb Markus Armbruster: > Ryan Harper writes: > >> Once more dear friends, v7 >> >> This patch series decouples the detachment of a block device from the >> removal of the backing pci-device. Removal of a hotplugged pci device >> requires the guest to respond before qemu tears down the block device. >> In some cases, the guest may not respond leaving the guest with >> continued access to the block device. Mgmt layer doesn't have a >> reliable method to force a disconnect. >> >> The new monitor command, drive_del, will revoke a guests access to the >> block device independently of the removal of the pci device. >> >> The first patch implements drive_del, the second patch implements the >> qmp version of the monitor command. >> >> Changes since v6: >> - Updated patch description >> - Dropped bdrv_unplug and inlined in drive_del >> - Explicitly invoke drive_uninit() >> Changes since v5: >> - Removed dangling pointers in guest and host state. This ensures things >> like >> info block no longer displays the deleted drive; though info pci will >> continue to display the pci device until the guest responds to the removal >> request. >> - Renamed drive_unplug -> drive_del >> Changes since v4: >> - Droppped drive_get_by_id patch and use bdrv_find() instead >> - Added additional details about drive_unplug to hmp/qmp interface >> >> Changes since v3: >> - Moved QMP command for drive_unplug() to separate patch >> >> Changes since v2: >> - Added QMP command for drive_unplug() >> >> Changes since v1: >> - CodingStyle fixes >> - Added qemu_aio_flush() to bdrv_unplug() >> >> Signed-off-by: Ryan Harper > > ACK series I have to admit that I didn't follow your discussion very closely any more after a few versions, so just to confirm: You came to the conclusion that we want to add drive_del to QMP and not only the human monitor, even though there is no drive_add in QMP? The HMP help text doesn't seem to be completely right (sent a comment), but once that's fixed, I'm going to merge the series based on Markus's ACK. Kevin
[Qemu-devel] Re: [PATCH 1/2] Implement drive_del to decouple block removal from device removal
Am 12.11.2010 16:38, schrieb Ryan Harper: > Currently device hotplug removal code is tied to device removal via > ACPI. All pci devices that are removable via device_del() require the > guest to respond to the request. In some cases the guest may not > respond leaving the device still accessible to the guest. The management > layer doesn't currently have a reliable way to revoke access to host > resource in the presence of an uncooperative guest. > > This patch implements a new monitor command, drive_del, which > provides an explicit command to revoke access to a host block device. > > drive_del first quiesces the block device (qemu_aio_flush; > bdrv_flush() and bdrv_close()). This prevents further IO from being > submitted against the host device. Finally, drive_del cleans up > pointers between the drive object (host resource) and the device > object (guest resource). > > Signed-off-by: Ryan Harper > --- > blockdev.c | 39 +++ > blockdev.h |1 + > hmp-commands.hx | 18 ++ > 3 files changed, 58 insertions(+), 0 deletions(-) > > diff --git a/blockdev.c b/blockdev.c > index 6cb179a..f6ac439 100644 > --- a/blockdev.c > +++ b/blockdev.c > @@ -14,6 +14,8 @@ > #include "qemu-option.h" > #include "qemu-config.h" > #include "sysemu.h" > +#include "hw/qdev.h" > +#include "block_int.h" > > static QTAILQ_HEAD(drivelist, DriveInfo) drives = > QTAILQ_HEAD_INITIALIZER(drives); > > @@ -597,3 +599,40 @@ int do_change_block(Monitor *mon, const char *device, > } > return monitor_read_bdrv_key_start(mon, bs, NULL, NULL); > } > + > +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) > +{ > +const char *id = qdict_get_str(qdict, "id"); > +BlockDriverState *bs; > +BlockDriverState **ptr; > +Property *prop; > + > +bs = bdrv_find(id); > +if (!bs) { > +qerror_report(QERR_DEVICE_NOT_FOUND, id); > +return -1; > +} > + > +/* quiesce block driver; prevent further io */ > +qemu_aio_flush(); > +bdrv_flush(bs); > +bdrv_close(bs); > + > +/* clean up guest state from pointing to host resource by > + * finding and removing DeviceState "drive" property */ > +for (prop = bs->peer->info->props; prop && prop->name; prop++) { > +if (prop->info->type == PROP_TYPE_DRIVE) { > +ptr = qdev_get_prop_ptr(bs->peer, prop); > +if ((*ptr) == bs) { > +bdrv_detach(bs, bs->peer); > +*ptr = NULL; > +break; > +} > +} > +} > + > +/* clean up host side */ > +drive_uninit(drive_get_by_blockdev(bs)); > + > +return 0; > +} > diff --git a/blockdev.h b/blockdev.h > index 653affc..2a0559e 100644 > --- a/blockdev.h > +++ b/blockdev.h > @@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject > **ret_data); > int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject > **ret_data); > int do_change_block(Monitor *mon, const char *device, > const char *filename, const char *fmt); > +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data); > > #endif > diff --git a/hmp-commands.hx b/hmp-commands.hx > index e5585ba..d6dc18c 100644 > --- a/hmp-commands.hx > +++ b/hmp-commands.hx > @@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it). > ETEXI > > { > +.name = "drive_del", > +.args_type = "id:s", > +.params = "device", > +.help = "remove host block device", > +.user_print = monitor_user_noop, > +.mhandler.cmd_new = do_drive_del, > +}, > + > +STEXI > +...@item delete @var{device} > +...@findex delete I think this should be @item drive_del and @findex drive_del. Kevin
Re: [Qemu-devel] Re: [PATCH v4 2/5] qed: Add QEMU Enhanced Disk image format
On Fri, Nov 12, 2010 at 3:43 PM, Kevin Wolf wrote: > Am 28.10.2010 13:01, schrieb Stefan Hajnoczi: >> +/** >> + * Check whether an image format is raw >> + * >> + * @fmt: Backing file format, may be NULL >> + */ >> +static bool qed_fmt_is_raw(const char *fmt) >> +{ >> + return fmt && strcmp(fmt, "raw") == 0; >> +} > > People shouldn't use them directly, but should we also consider file, > host_device, etc.? Hrm..I will look into it for v5. I thought we always have a "raw" format on top of "file", "host_device", etc protocols? >> + if (s->header.magic != QED_MAGIC) { >> + return -ENOENT; >> + } > > ENOENT seems a bit odd for a wrong magic number, especially if it's used > for the error message. Wouldn't EINVAL or ENOTSUP be a closer match? You're right, ENOENT is confusing for the user. >> +static void bdrv_qed_flush(BlockDriverState *bs) >> +{ >> + bdrv_flush(bs->file); >> +} > > This conflicts with one of my recent changes. bdrv_flush should return > an int now. Will fix and will also check for bdrv_flush() failures. >> + while (options && options->name) { >> + if (!strcmp(options->name, BLOCK_OPT_SIZE)) { >> + image_size = options->value.n; >> + } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) { >> + backing_file = options->value.s; >> + } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) { >> + backing_fmt = options->value.s; > > I'm not sure here. It doesn't really matter for QED, but I find it > strange that -o backing_fmt=foobar works and gives you a non-raw backing > file. Should we check if the format exists at least? I see the same issue in QCOW2 so let's solve this generically in a separate patch. Stefan
Re: [Qemu-devel] [PATCH 0/2] v7 Decouple block device removal from device removal
Ryan Harper writes: > Once more dear friends, v7 > > This patch series decouples the detachment of a block device from the > removal of the backing pci-device. Removal of a hotplugged pci device > requires the guest to respond before qemu tears down the block device. > In some cases, the guest may not respond leaving the guest with > continued access to the block device. Mgmt layer doesn't have a > reliable method to force a disconnect. > > The new monitor command, drive_del, will revoke a guests access to the > block device independently of the removal of the pci device. > > The first patch implements drive_del, the second patch implements the > qmp version of the monitor command. > > Changes since v6: > - Updated patch description > - Dropped bdrv_unplug and inlined in drive_del > - Explicitly invoke drive_uninit() > Changes since v5: > - Removed dangling pointers in guest and host state. This ensures things > like > info block no longer displays the deleted drive; though info pci will > continue to display the pci device until the guest responds to the removal > request. > - Renamed drive_unplug -> drive_del > Changes since v4: > - Droppped drive_get_by_id patch and use bdrv_find() instead > - Added additional details about drive_unplug to hmp/qmp interface > > Changes since v3: > - Moved QMP command for drive_unplug() to separate patch > > Changes since v2: > - Added QMP command for drive_unplug() > > Changes since v1: > - CodingStyle fixes > - Added qemu_aio_flush() to bdrv_unplug() > > Signed-off-by: Ryan Harper ACK series
Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver
On Fri, 12 Nov 2010 16:54:14 +0100 Markus Armbruster wrote: > Luiz Capitulino writes: > > > On Fri, 12 Nov 2010 11:16:54 +0100 > > Markus Armbruster wrote: > > > >> Luiz Capitulino writes: > >> > >> > On Thu, 11 Nov 2010 17:32:06 +0100 > >> > Markus Armbruster wrote: > >> > > >> >> Luiz Capitulino writes: > >> >> > >> >> > On Thu, 11 Nov 2010 16:30:26 +0100 > >> >> > Markus Armbruster wrote: > >> >> > > >> >> >> Luiz Capitulino writes: > >> >> >> > >> >> >> > This driver handles in-memory chardev operations. That's, all > >> >> >> > writes > >> >> >> > to this driver are stored in an internal buffer and it doesn't talk > >> >> >> > to the external world in any way. > >> >> >> > > >> >> >> > Right now it's very simple: it supports only writes. But it can be > >> >> >> > easily extended to support more operations. > >> >> >> > > >> >> >> > This is going to be used by the monitor's "HMP passthrough via QMP" > >> >> >> > feature, which needs to run monitor handlers without a backing > >> >> >> > device. > >> >> >> > > >> >> >> > Signed-off-by: Luiz Capitulino > >> >> >> > --- > >> >> >> > qemu-char.c | 66 > >> >> >> > +++ > >> >> >> > qemu-char.h |6 + > >> >> >> > 2 files changed, 72 insertions(+), 0 deletions(-) > >> >> >> > > >> >> >> > diff --git a/qemu-char.c b/qemu-char.c > >> >> >> > index 88997f9..896df14 100644 > >> >> >> > --- a/qemu-char.c > >> >> >> > +++ b/qemu-char.c > >> >> >> > @@ -2275,6 +2275,72 @@ static CharDriverState > >> >> >> > *qemu_chr_open_socket(QemuOpts *opts) > >> >> >> > return NULL; > >> >> >> > } > >> >> >> > > >> >> >> > +/***/ > >> >> >> > +/* Memory chardev */ > >> >> >> > +typedef struct { > >> >> >> > +size_t outbuf_size; > >> >> >> > +size_t outbuf_capacity; > >> >> >> > +uint8_t *outbuf; > >> >> >> > +} MemoryDriver; > >> >> >> > + > >> >> >> > +static int mem_chr_write(CharDriverState *chr, const uint8_t > >> >> >> > *buf, int len) > >> >> >> > +{ > >> >> >> > +MemoryDriver *d = chr->opaque; > >> >> >> > + > >> >> >> > +/* TODO: the QString implementation has the same code, we > >> >> >> > should > >> >> >> > + * introduce a generic way to do this in cutils.c */ > >> >> >> > +if (d->outbuf_capacity < d->outbuf_size + len) { > >> >> >> > +/* grown outbuf */ > >> >> >> > >> >> >> Used to say "grow" (sans n) here. Intentional change? > >> >> > > >> >> > Hum, no. I think I've squashed an older commit while rebasing (but > >> >> > this seems > >> >> > to be the only problem). > >> >> > > >> >> >> > >> >> >> > +d->outbuf_capacity += len; > >> >> >> > +d->outbuf_capacity *= 2; > >> >> >> > +d->outbuf = qemu_realloc(d->outbuf, d->outbuf_capacity); > >> >> >> > +} > >> >> >> > + > >> >> >> > +memcpy(d->outbuf + d->outbuf_size, buf, len); > >> >> >> > +d->outbuf_size += len; > >> >> >> > + > >> >> >> > +return len; > >> >> >> > +} > >> >> >> > + > >> >> >> > +#define DEFAULT_BUF_SIZE 4096 > >> >> >> > >> >> >> It's the *initial* buffer size, isn't it? > >> >> > > >> >> > Yes. > >> >> > >> >> Could we make the name reflect that then? > >> >> > >> >> >> Doubt it's worth a #define (there's just one user), but that's a > >> >> >> matter > >> >> >> of taste. > >> >> >> > >> >> >> > + > >> >> >> > +void qemu_chr_init_mem(CharDriverState *chr) > >> >> >> > +{ > >> >> >> > +MemoryDriver *d; > >> >> >> > + > >> >> >> > +d = qemu_malloc(sizeof(*d)); > >> >> >> > +d->outbuf_size = 0; > >> >> >> > +d->outbuf_capacity = DEFAULT_BUF_SIZE; > >> >> >> > +d->outbuf = qemu_mallocz(d->outbuf_capacity); > >> >> >> > + > >> >> >> > +memset(chr, 0, sizeof(*chr)); > >> >> >> > +chr->opaque = d; > >> >> >> > +chr->chr_write = mem_chr_write; > >> >> >> > +} > >> >> >> > + > >> >> >> > +/* assumes the stored data is a string */ > >> >> >> > >> >> >> What else could it be? Worrying about embedded '\0's? > >> >> > > >> >> > Yes, as the driver itself doesn't interpret the contents of its > >> >> > buffer. > >> >> > >> >> What happens if there are embedded '\0's? > >> > > >> > The string will be shorter than expected? And what if it contains > >> > non-printable characters? > >> > > >> > It's just a cautionary comment to help the user identify such problems, > >> > I think > >> > we're making a whole argument about a quite minor thing. > >> > >> When I see "assumes X" in a function comment, I immediately ask "and > >> what happens when !X?" The default answer is "it explodes, so don't do > >> that". That answer is wrong here. Therefore, I find the comment > >> misleading. > > > > That's how you interpret it, my interpretation is that I might not get > > the expected behavior. > > Actually, this function works just fine for embedded '\0's (I tested > it): it returns the correct QString, with full length and '\0' embedded. Good. > Only l
Re: [Qemu-devel] [PATCH 1/2] Implement drive_del to decouple block removal from device removal
Ryan Harper writes: > Currently device hotplug removal code is tied to device removal via > ACPI. All pci devices that are removable via device_del() require the > guest to respond to the request. In some cases the guest may not > respond leaving the device still accessible to the guest. The management > layer doesn't currently have a reliable way to revoke access to host > resource in the presence of an uncooperative guest. > > This patch implements a new monitor command, drive_del, which > provides an explicit command to revoke access to a host block device. > > drive_del first quiesces the block device (qemu_aio_flush; > bdrv_flush() and bdrv_close()). This prevents further IO from being > submitted against the host device. Finally, drive_del cleans up > pointers between the drive object (host resource) and the device > object (guest resource). > > Signed-off-by: Ryan Harper > --- > blockdev.c | 39 +++ > blockdev.h |1 + > hmp-commands.hx | 18 ++ > 3 files changed, 58 insertions(+), 0 deletions(-) > > diff --git a/blockdev.c b/blockdev.c > index 6cb179a..f6ac439 100644 > --- a/blockdev.c > +++ b/blockdev.c > @@ -14,6 +14,8 @@ > #include "qemu-option.h" > #include "qemu-config.h" > #include "sysemu.h" > +#include "hw/qdev.h" > +#include "block_int.h" > > static QTAILQ_HEAD(drivelist, DriveInfo) drives = > QTAILQ_HEAD_INITIALIZER(drives); > > @@ -597,3 +599,40 @@ int do_change_block(Monitor *mon, const char *device, > } > return monitor_read_bdrv_key_start(mon, bs, NULL, NULL); > } > + > +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) > +{ > +const char *id = qdict_get_str(qdict, "id"); > +BlockDriverState *bs; > +BlockDriverState **ptr; > +Property *prop; > + > +bs = bdrv_find(id); > +if (!bs) { > +qerror_report(QERR_DEVICE_NOT_FOUND, id); > +return -1; > +} > + > +/* quiesce block driver; prevent further io */ > +qemu_aio_flush(); > +bdrv_flush(bs); > +bdrv_close(bs); > + > +/* clean up guest state from pointing to host resource by > + * finding and removing DeviceState "drive" property */ > +for (prop = bs->peer->info->props; prop && prop->name; prop++) { > +if (prop->info->type == PROP_TYPE_DRIVE) { > +ptr = qdev_get_prop_ptr(bs->peer, prop); > +if ((*ptr) == bs) { Superfluous parenthesis around *ptr. Not worth a respin; I've tormented you enough ;) > +bdrv_detach(bs, bs->peer); > +*ptr = NULL; > +break; > +} > +} > +} > + > +/* clean up host side */ > +drive_uninit(drive_get_by_blockdev(bs)); > + > +return 0; > +} > diff --git a/blockdev.h b/blockdev.h > index 653affc..2a0559e 100644 > --- a/blockdev.h > +++ b/blockdev.h > @@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject > **ret_data); > int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject > **ret_data); > int do_change_block(Monitor *mon, const char *device, > const char *filename, const char *fmt); > +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data); > > #endif > diff --git a/hmp-commands.hx b/hmp-commands.hx > index e5585ba..d6dc18c 100644 > --- a/hmp-commands.hx > +++ b/hmp-commands.hx > @@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it). > ETEXI > > { > +.name = "drive_del", > +.args_type = "id:s", > +.params = "device", > +.help = "remove host block device", > +.user_print = monitor_user_noop, > +.mhandler.cmd_new = do_drive_del, > +}, > + > +STEXI > +...@item delete @var{device} > +...@findex delete > +Remove host block device. The result is that guest generated IO is no longer > +submitted against the host device underlying the disk. Once a drive has > +been deleted, the QEMU Block layer returns -EIO which results in IO > +errors in the guest for applications that are reading/writing to the device. > +ETEXI > + > +{ > .name = "change", > .args_type = "device:B,target:F,arg:s?", > .params = "device filename [format]",
[Qemu-devel] Announce: Auto/Lazy-migration Patches RFC on linux-numa list
At last weeks' LPC, there was some interest in my patches for Auto/Lazy Migration to improve locality and possibly performance of unpinned guest VMs on a NUMA platform. As a result of these conversations I have reposted the patches [4 series, ~40 patches] as RFCs to the linux-numa list. Links to threads given below. I have rebased the patches atop 3Nov10 mmotm series [2.6.36 + 3nov mmotm]. The patched kernel builds, boots and survives some fairly heavy testing on an 8 node istanbul x86_64. Under heavy load, I do encounter a race in the somewhat optional migration cache. Currently this generates a warning and carries on, but the one migration cache entry and related page is then wedged. This would need to be resolved. The series/threads in the order applied: [PATCH/RFC 0/14] Shared Policy Overview http://markmail.org/message/trvpl3t7gimvwht6 [PATCH/RFC 0/8] numa - Migrate-on-Fault http://markmail.org/message/mdwbcitql5ka4uws [PATCH/RFC 0/11] numa - Automatic-migration http://markmail.org/message/zik3itmqed65mol2 [PATCH/RFC 1/5] numa - migration cache - core implementation http://markmail.org/message/xvck7enyezx6chyi RESEND: [PATCH/RFC 1/5] numa - migration cache - core implementation http://markmail.org/message/xgvvrnn2nk4nsn2e resend to add back the patch description missing from 1st attempt.
Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver
Luiz Capitulino writes: > On Fri, 12 Nov 2010 16:04:39 +0100 > Markus Armbruster wrote: > >> Luiz Capitulino writes: >> >> > On Fri, 12 Nov 2010 15:16:33 +0100 >> > Markus Armbruster wrote: >> > >> >> Luiz Capitulino writes: >> >> >> >> > On Fri, 12 Nov 2010 11:21:57 +0100 >> >> > Markus Armbruster wrote: >> >> > >> >> >> Luiz Capitulino writes: >> >> [...] >> >> >> > +QString *qemu_chr_mem_to_qs(CharDriverState *chr) >> >> >> > +{ >> >> >> > +MemoryDriver *d = chr->opaque; >> >> >> > + >> >> >> > +if (d->outbuf_size == 0) { >> >> >> > +return qstring_new(); >> >> >> > +} >> >> >> >> >> >> Why is this necessary? Is qstring_from_substr() broken for empty >> >> >> substrings? If it is, it ought to be fixed! >> >> > >> >> > qstring_from_substr() takes a character range; outbuf_size stores a >> >> > size, >> >> > not a string length. So we do: >> >> > >> >> >> > +return qstring_from_substr((char *) d->outbuf, 0, >> >> >> > d->outbuf_size - 1); >> >> > >> >> > If outbuf_size is 0, we'll be passing a negative value down. >> >> >> >> What's wrong with that? >> > >> > Although it's going to work with the current QString implementation, I >> > don't >> > think it's it's a good idea to rely on a negative index. >> >> How should I extract the substring of S beginning at index B with length >> L? If I cant't do this for any B, L with interval [B,B+L-1] fully >> within [0,length(S)], then the API is flawed, and ought to be replaced. > > Not sure we're talking about the same problem, anymore. When you said: > >> >> What's wrong with that? > > What did you mean? Did you mean 'let's not decrement outbuf_size' or did > you mean 'let's pass -1 anyway'? Yes, what's wrong with qstring_from_substr(S, 0, -1)? Its function comment is imprecise, it doesn't tell us whether the END-th character is included in the substring or not. The code, however, is clear enough: it *is* included. And the unit test checks that. Therefore, qstring_from_substr("abc", 0, 0) returns the qstring "a". > Both seem wrong to me: the substring [0,-1] should be invalid Why? How do you express "the empty substring starting at 0" then? > and not > decrementing outbuf_size is wrong, because it contains the buffer size and > qstring_from_substr() will consume an additional char from the buffer (which > should be '\0' today, but we shouldn't count on that). > >> >> > Maybe, we could have: >> > >> > return qstring_from_substr((char *) d->outbuf, 0, >> > d->outbuf_size > 0 ? d->outbuf_size - 1 : 0); >> > >> > A bit harder to read, but makes the function smaller. >> >> Err, doesn't qstring_from_substr(s, 0, 0) extract a substring of length >> 1? > > Yeah, it's a bug. But that doesn't change my suggestion, can we do this way? > > This should fix the bug (not even compiled tested): > > diff --git a/qstring.c b/qstring.c > index 4e2ba08..72a25de 100644 > --- a/qstring.c > +++ b/qstring.c > @@ -42,10 +42,10 @@ QString *qstring_from_substr(const char *str, int start, > int end) > > qstring = qemu_malloc(sizeof(*qstring)); > > -qstring->length = end - start + 1; > -qstring->capacity = qstring->length; > +qstring->length = end - start; > +qstring->capacity = qstring->length + 1; > > -qstring->string = qemu_malloc(qstring->capacity + 1); > +qstring->string = qemu_malloc(qstring->capacity); > memcpy(qstring->string, str + start, qstring->length); > qstring->string[qstring->length] = 0; I suspect this will fail your unit test.
[Qemu-devel] Re: [RfC PATCH] spice: qmp windup: connection events & info command.
On Thu, 11 Nov 2010 13:49:32 +0100 Gerd Hoffmann wrote: > Hi, > > Looking for comments especially from Luiz and Daniel (aka qmp and > libvirt masters) ... > > This patch adds support for connection events to spice. The events are > quite simliar to the vnc events. Unlike vnc spice uses multiple tcp > channels though. qemu will report every single tcp connection (aka > spice channel). If you want track spice sessions only you can filter > for the main channel (channel-type == 1). > > The patch also adds a 'query-spice' monitor command which returns > informations about the spice server configuration and also a list of > channel connections. The first thing we have to check with Daniel is whether or not we're providing the info they would need/expect, apart from that I have the following general comments: 1. It's missing documentation in QMP/qmp-events.txt and qmp-commands.hx (yeah, docs are far from code, hope to fix soon) 2. Can you please split this in two patches? One adding the events and the other adding the query command Some small comments follow. > > cheers, > Gerd > > Cc: lcapitul...@redhat.com > Cc: berra...@redhat.com > --- > monitor.c | 43 +++ > monitor.h |3 + > ui/qemu-spice.h |2 + > ui/spice-core.c | 160 > +++ > 4 files changed, 208 insertions(+), 0 deletions(-) > > diff --git a/monitor.c b/monitor.c > index 8cee35d..4f18cbc 100644 > --- a/monitor.c > +++ b/monitor.c > @@ -59,6 +59,7 @@ > #ifdef CONFIG_SIMPLE_TRACE > #include "trace.h" > #endif > +#include "ui/qemu-spice.h" > > //#define DEBUG > //#define DEBUG_COMPLETION > @@ -459,6 +460,15 @@ void monitor_protocol_event(MonitorEvent event, QObject > *data) > case QEVENT_WATCHDOG: > event_name = "WATCHDOG"; > break; > +case QEVENT_SPICE_CONNECTED: > +event_name = "SPICE_CONNECTED"; > +break; > +case QEVENT_SPICE_INITIALIZED: > +event_name = "SPICE_INITIALIZED"; > +break; > +case QEVENT_SPICE_DISCONNECTED: > +event_name = "SPICE_DISCONNECTED"; > +break; > default: > abort(); > break; > @@ -640,6 +650,19 @@ static void user_async_info_handler(Monitor *mon, const > mon_cmd_t *cmd) > } > } > > +/* > + * generic print handler for hmp 'info $what' > + * simply pretty-print the josn representation > + */ > +#if defined(CONFIG_SPICE) /* because 'info spice' is the only user */ > +static void do_info_generic_print(Monitor *mon, const QObject *data) > +{ > +QString *json = qobject_to_json_pretty(data); > +monitor_printf(mon, "%s\n", qstring_get_str(json)); > +QDECREF(json); > +} > +#endif We definitely need a generic print handler, but I don't that stringifying JSON makes a minimal good user interface. > static void do_info(Monitor *mon, const QDict *qdict) > { > const mon_cmd_t *cmd; > @@ -2533,6 +2556,16 @@ static const mon_cmd_t info_cmds[] = { > .user_print = do_info_vnc_print, > .mhandler.info_new = do_info_vnc, > }, > +#if defined(CONFIG_SPICE) > +{ > +.name = "spice", > +.args_type = "", > +.params = "", > +.help = "show the spice server status", > +.user_print = do_info_generic_print, > +.mhandler.info_new = do_info_spice, > +}, > +#endif > { > .name = "name", > .args_type = "", > @@ -2720,6 +2753,16 @@ static const mon_cmd_t qmp_query_cmds[] = { > .user_print = do_info_vnc_print, > .mhandler.info_new = do_info_vnc, > }, > +#if defined(CONFIG_SPICE) > +{ > +.name = "spice", > +.args_type = "", > +.params = "", > +.help = "show the spice server status", > +.user_print = do_info_generic_print, > +.mhandler.info_new = do_info_spice, > +}, > +#endif > { > .name = "name", > .args_type = "", > diff --git a/monitor.h b/monitor.h > index 2d36bba..4f2d328 100644 > --- a/monitor.h > +++ b/monitor.h > @@ -32,6 +32,9 @@ typedef enum MonitorEvent { > QEVENT_BLOCK_IO_ERROR, > QEVENT_RTC_CHANGE, > QEVENT_WATCHDOG, > +QEVENT_SPICE_CONNECTED, > +QEVENT_SPICE_INITIALIZED, > +QEVENT_SPICE_DISCONNECTED, > QEVENT_MAX, > } MonitorEvent; > > diff --git a/ui/qemu-spice.h b/ui/qemu-spice.h > index 0e3ad9b..33c1e71 100644 > --- a/ui/qemu-spice.h > +++ b/ui/qemu-spice.h > @@ -33,6 +33,8 @@ void qemu_spice_audio_init(void); > void qemu_spice_display_init(DisplayState *ds); > int qemu_spice_add_interface(SpiceBaseInstance *sin); > > +void do_info_spice(Monitor *mon, QObject **ret_data); > + > #else /* CONFIG_SPICE */ > > #define using_spice 0 > diff --git a/ui/spice-core.c b/ui/spice-core.c > index b7fa031..d2c4a70 100644 > --- a/ui/spice-core.c > +++ b/ui/spice
[Qemu-devel] Re: [patch 0/3] block migration fixes
On Thu, Nov 11, 2010 at 06:06:02PM +0900, Yoshiaki Tamura wrote: > Marcelo Tosatti wrote: > >On Tue, Nov 09, 2010 at 03:02:12PM +0900, Yoshiaki Tamura wrote: > >>Marcelo Tosatti wrote: > >>>Following patchset fixes block migration corruption issues. > >> > >>Hi Marcelo, > >> > >>Thanks for looking into this issue. Although we tried your patches, we're > >>still > >>seeing the corruption. If we execute block migration while copying a file > >>locally, md5sum of the copied file doesn't match with the original. > >>Sometimes, > >>the filesystem returns an I/O error. > >> > >>Could you let us know how you tested and debugged? Did you use blkverify? > > > >Yoshiaki, > > > >I first reproduced corruption by copying a large file during "migrate > >-i", with shared base on qcow2 filesystem, as in your original report. > > > >To debug the problem, file with different byte pattern at every 1MB > >(size of dirty chunk) was created and copied directly to an IDE disk in > >the guest. Raw format used for the disk image. > > > >With this patchset, i'm not able to reproduce the original issue > >anymore. > > > >Can you please provide more details on how to reproduce? > > Marcelo, > > We double checked and the patchset does seem to fix the problem. > The was a mistake in our test procedure. Sorry for the confusion. > > Thanks, > > Yoshi I was also experiencing corruption on automated test, but it turned out to be fixed by kvm.git's ae8894c00b560bde4. Thanks.
Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver
Luiz Capitulino writes: > On Fri, 12 Nov 2010 11:16:54 +0100 > Markus Armbruster wrote: > >> Luiz Capitulino writes: >> >> > On Thu, 11 Nov 2010 17:32:06 +0100 >> > Markus Armbruster wrote: >> > >> >> Luiz Capitulino writes: >> >> >> >> > On Thu, 11 Nov 2010 16:30:26 +0100 >> >> > Markus Armbruster wrote: >> >> > >> >> >> Luiz Capitulino writes: >> >> >> >> >> >> > This driver handles in-memory chardev operations. That's, all writes >> >> >> > to this driver are stored in an internal buffer and it doesn't talk >> >> >> > to the external world in any way. >> >> >> > >> >> >> > Right now it's very simple: it supports only writes. But it can be >> >> >> > easily extended to support more operations. >> >> >> > >> >> >> > This is going to be used by the monitor's "HMP passthrough via QMP" >> >> >> > feature, which needs to run monitor handlers without a backing >> >> >> > device. >> >> >> > >> >> >> > Signed-off-by: Luiz Capitulino >> >> >> > --- >> >> >> > qemu-char.c | 66 >> >> >> > +++ >> >> >> > qemu-char.h |6 + >> >> >> > 2 files changed, 72 insertions(+), 0 deletions(-) >> >> >> > >> >> >> > diff --git a/qemu-char.c b/qemu-char.c >> >> >> > index 88997f9..896df14 100644 >> >> >> > --- a/qemu-char.c >> >> >> > +++ b/qemu-char.c >> >> >> > @@ -2275,6 +2275,72 @@ static CharDriverState >> >> >> > *qemu_chr_open_socket(QemuOpts *opts) >> >> >> > return NULL; >> >> >> > } >> >> >> > >> >> >> > +/***/ >> >> >> > +/* Memory chardev */ >> >> >> > +typedef struct { >> >> >> > +size_t outbuf_size; >> >> >> > +size_t outbuf_capacity; >> >> >> > +uint8_t *outbuf; >> >> >> > +} MemoryDriver; >> >> >> > + >> >> >> > +static int mem_chr_write(CharDriverState *chr, const uint8_t *buf, >> >> >> > int len) >> >> >> > +{ >> >> >> > +MemoryDriver *d = chr->opaque; >> >> >> > + >> >> >> > +/* TODO: the QString implementation has the same code, we should >> >> >> > + * introduce a generic way to do this in cutils.c */ >> >> >> > +if (d->outbuf_capacity < d->outbuf_size + len) { >> >> >> > +/* grown outbuf */ >> >> >> >> >> >> Used to say "grow" (sans n) here. Intentional change? >> >> > >> >> > Hum, no. I think I've squashed an older commit while rebasing (but this >> >> > seems >> >> > to be the only problem). >> >> > >> >> >> >> >> >> > +d->outbuf_capacity += len; >> >> >> > +d->outbuf_capacity *= 2; >> >> >> > +d->outbuf = qemu_realloc(d->outbuf, d->outbuf_capacity); >> >> >> > +} >> >> >> > + >> >> >> > +memcpy(d->outbuf + d->outbuf_size, buf, len); >> >> >> > +d->outbuf_size += len; >> >> >> > + >> >> >> > +return len; >> >> >> > +} >> >> >> > + >> >> >> > +#define DEFAULT_BUF_SIZE 4096 >> >> >> >> >> >> It's the *initial* buffer size, isn't it? >> >> > >> >> > Yes. >> >> >> >> Could we make the name reflect that then? >> >> >> >> >> Doubt it's worth a #define (there's just one user), but that's a matter >> >> >> of taste. >> >> >> >> >> >> > + >> >> >> > +void qemu_chr_init_mem(CharDriverState *chr) >> >> >> > +{ >> >> >> > +MemoryDriver *d; >> >> >> > + >> >> >> > +d = qemu_malloc(sizeof(*d)); >> >> >> > +d->outbuf_size = 0; >> >> >> > +d->outbuf_capacity = DEFAULT_BUF_SIZE; >> >> >> > +d->outbuf = qemu_mallocz(d->outbuf_capacity); >> >> >> > + >> >> >> > +memset(chr, 0, sizeof(*chr)); >> >> >> > +chr->opaque = d; >> >> >> > +chr->chr_write = mem_chr_write; >> >> >> > +} >> >> >> > + >> >> >> > +/* assumes the stored data is a string */ >> >> >> >> >> >> What else could it be? Worrying about embedded '\0's? >> >> > >> >> > Yes, as the driver itself doesn't interpret the contents of its >> >> > buffer. >> >> >> >> What happens if there are embedded '\0's? >> > >> > The string will be shorter than expected? And what if it contains >> > non-printable characters? >> > >> > It's just a cautionary comment to help the user identify such problems, I >> > think >> > we're making a whole argument about a quite minor thing. >> >> When I see "assumes X" in a function comment, I immediately ask "and >> what happens when !X?" The default answer is "it explodes, so don't do >> that". That answer is wrong here. Therefore, I find the comment >> misleading. > > That's how you interpret it, my interpretation is that I might not get > the expected behavior. Actually, this function works just fine for embedded '\0's (I tested it): it returns the correct QString, with full length and '\0' embedded. Only later, when we attempt to put that QString on the wire do we screw up, in to_json(). It fails to consider the length, and stops at the first 0. In fact, there's not even a way to get the length of a QString! There's only qstring_get_str(). I'd call that an API bug. You might call it a restriction instead ;) If anything needs a comment, it's qobject_to_json(). But I th
[Qemu-devel] Re: [PATCH 1/8] pci: pci_default_cap_write_config ignores wmask
On Fri, 2010-11-12 at 10:48 +0200, Michael S. Tsirkin wrote: > On Thu, Nov 11, 2010 at 11:03:19PM -0700, Alex Williamson wrote: > > On Fri, 2010-11-12 at 07:22 +0200, Michael S. Tsirkin wrote: > > > On Thu, Nov 11, 2010 at 07:55:01PM -0700, Alex Williamson wrote: > > > > Make use of wmask, just like the rest of config space. > > > > > > > > Signed-off-by: Alex Williamson > > > > --- > > > > > > > > hw/pci.c | 19 --- > > > > 1 files changed, 8 insertions(+), 11 deletions(-) > > > > > > > > diff --git a/hw/pci.c b/hw/pci.c > > > > index 92aaa85..12c47ac 100644 > > > > --- a/hw/pci.c > > > > +++ b/hw/pci.c > > > > @@ -1175,13 +1175,14 @@ uint32_t pci_default_read_config(PCIDevice *d, > > > > return pci_read_config(d, address, len); > > > > } > > > > > > > > -static void pci_write_config(PCIDevice *pci_dev, > > > > - uint32_t address, uint32_t val, int len) > > > > +static void pci_write_config(PCIDevice *d, uint32_t addr, uint32_t > > > > val, int l) > > > > { > > > > int i; > > > > -for (i = 0; i < len; i++) { > > > > -pci_dev->config[address + i] = val & 0xff; > > > > -val >>= 8; > > > > +uint32_t config_size = pci_config_size(d); > > > > + > > > > +for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) { > > > > +uint8_t wmask = d->wmask[addr + i]; > > > > +d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & > > > > wmask); > > > > } > > > > } > > > > > > > > > > > > > Let's not name an internal static helper pci_write_config. > > > This is really update_config_by_mask or something like that. > > > But see below: maybe we don't need it at all? > > > > The function already exists, I just made it do what it seems like it > > should have been doing already. > > Yep. But since you are rewriting this function - could you rename it as > well? Ok > > > > @@ -1207,18 +1208,14 @@ void pci_default_cap_write_config(PCIDevice > > > > *pci_dev, > > > > > > > > void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t > > > > val, int l) > > > > { > > > > -int i, was_irq_disabled = pci_irq_disabled(d); > > > > -uint32_t config_size = pci_config_size(d); > > > > +int was_irq_disabled = pci_irq_disabled(d); > > > > > > > > if (pci_access_cap_config(d, addr, l)) { > > > > d->cap.config_write(d, addr, val, l); > > > > return; > > > > } > > > > > > > > > > I would like to also examine the need for _cap_ > > > functions. Why can assigned devices just do > > > > > > pci_default_write_config > > > if (range_overlap(...msi)) { > > > } > > > if (range_overlap(...msix)) { > > > } > > > and then we could remove all the _cap_ extensions > > > altogether? > > > > I think that somewhere we need to track what capabilities are at what > > offset, config space isn't a performance path, but that look horribly > > inefficient and gets worse with more capabilities. > > Looks like premature optimization to me. I guess when we get more than > say 8 capabilities to support, I'll start to worry. > Even then, these optimizations are better internal in pci core. It's not just an optimization, as noted in another reply, we should be using it to make sure we don't have collisions, and it simply makes the callback code much cleaner to be able to do a switch statement instead of a pile of 'if (ranges_overlap)', IMO. > > Why don't we define capability id 0xff as normal config space (first 64 > > bytes), then add the capability id to read/write_config (this is what > > vfio does). Then the driver can split capability handling off from > > their main functions if they want. > > My feeling is we need higher level APIs than 'capability write'. > Otherwise we get the PCI config handling all over the place. > E.g. a callback when msi is enabled/disabled would make sense, > so that pci core can keep track of current state and only notify > the device when there are things to do. I agree, but it's difficult to provide the flexibility to meet all the needs. Device assignment might want to be called for more or less bit flips than an emulated device, PM is probably a good example of this. We could actually change state on a PMCSR write, but I'm not sure what an emulated device would do. Does that mean we add a callback specifically for that, or do we provide some generic interface that drivers can register which bits they want to know about changing? > > Anyway, I think such an improvement > > could be added incrementally later. Thanks, > > > > Alex > > Sure. > > > > > -for (i = 0; i < l && addr + i < config_size; val >>= 8, ++i) { > > > > -uint8_t wmask = d->wmask[addr + i]; > > > > -d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & > > > > wmask); > > > > -} > > > > +pci_write_config(d, addr, val, l); > > > > > > > > #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT > > > > if (kvm_enabled() && kvm_i
[Qemu-devel] Re: [RFC PATCH 8/8] device-assignment: pass through and stub more PCI caps
On Fri, 2010-11-12 at 11:11 +0200, Michael S. Tsirkin wrote: > On Thu, Nov 11, 2010 at 11:30:07PM -0700, Alex Williamson wrote: > > On Fri, 2010-11-12 at 07:36 +0200, Michael S. Tsirkin wrote: > > > On Thu, Nov 11, 2010 at 07:56:46PM -0700, Alex Williamson wrote: > > > > Some drivers depend on finding capabilities like power management, > > > > PCI express/X, vital product data, or vendor specific fields. Now > > > > that we have better capability support, we can pass more of these > > > > tables through to the guest. Note that VPD and VNDR are direct pass > > > > through capabilies, the rest are mostly empty shells with a few > > > > writable bits where necessary. > > > > > > > > Signed-off-by: Alex Williamson > > > > --- > > > > > > > > hw/device-assignment.c | 160 > > > > +--- > > > > 1 files changed, 149 insertions(+), 11 deletions(-) > > > > > > > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c > > > > index 179c7dc..1b228ad 100644 > > > > --- a/hw/device-assignment.c > > > > +++ b/hw/device-assignment.c > > > > @@ -366,6 +366,27 @@ static uint8_t > > > > assigned_dev_pci_read_byte(PCIDevice *d, int pos) > > > > return (uint8_t)assigned_dev_pci_read(d, pos, 1); > > > > } > > > > > > > > +static void assigned_dev_pci_write(PCIDevice *d, int pos, uint32_t > > > > val, int len) > > > > +{ > > > > +AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev); > > > > +ssize_t ret; > > > > +int fd = pci_dev->real_device.config_fd; > > > > + > > > > +again: > > > > +ret = pwrite(fd, &val, len, pos); > > > > +if (ret != len) { > > > > + if ((ret < 0) && (errno == EINTR || errno == EAGAIN)) > > > > + goto again; > > > > > > > > > do {} while() ? > > > > Sure, this is just a copy of another place that does something similar. > > They should either be merged or both converted in a separate patch. > > > > > > + > > > > + fprintf(stderr, "%s: pwrite failed, ret = %zd errno = %d\n", > > > > + __func__, ret, errno); > > > > + > > > > + exit(1); > > > > +} > > > > + > > > > +return; > > > > +} > > > > + > > > > static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap) > > > > { > > > > int id; > > > > @@ -1244,37 +1265,75 @@ static void assigned_dev_update_msix(PCIDevice > > > > *pci_dev, unsigned int ctrl_pos) > > > > #endif > > > > #endif > > > > > > > > +static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev, > > > > +uint8_t cap_id, > > > > +uint32_t address, > > > > int len) > > > > +{ > > > > +uint8_t cap; > > > > + > > > > +switch (cap_id) { > > > > + > > > > +case PCI_CAP_ID_VPD: > > > > +cap = pci_find_capability(pci_dev, cap_id); > > > > +if (address - cap >= PCI_CAP_FLAGS) { > > > > +return assigned_dev_pci_read(pci_dev, address, len); > > > > +} > > > > +break; > > > > + > > > > +case PCI_CAP_ID_VNDR: > > > > +cap = pci_find_capability(pci_dev, cap_id); > > > > +if (address - cap > PCI_CAP_FLAGS) { > > > > +return assigned_dev_pci_read(pci_dev, address, len); > > > > +} > > > > +break; > > > > +} > > > > + > > > > +return pci_default_cap_read_config(pci_dev, cap_id, address, len); > > > > +} > > > > + > > > > static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, > > > > uint8_t cap_id, > > > > uint32_t address, > > > > uint32_t val, int len) > > > > { > > > > +uint8_t cap; > > > > + > > > > pci_default_cap_write_config(pci_dev, cap_id, address, val, len); > > > > > > > > switch (cap_id) { > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > case PCI_CAP_ID_MSI: > > > > #ifdef KVM_CAP_DEVICE_MSI > > > > -{ > > > > -uint8_t cap = pci_find_capability(pci_dev, cap_id); > > > > -if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) { > > > > -assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS); > > > > -} > > > > +cap = pci_find_capability(pci_dev, cap_id); > > > > +if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) { > > > > +assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS); > > > > } > > > > #endif > > > > break; > > > > > > > > case PCI_CAP_ID_MSIX: > > > > #ifdef KVM_CAP_DEVICE_MSIX > > > > -{ > > > > -uint8_t cap = pci_find_capability(pci_dev, cap_id); > > > > -if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, > > > > 1)) { > > > > -assigned_dev_update_msix(pci_dev, cap + > > > > PCI_MSIX_FLAGS); > > > > -} > > > > +cap = p
[Qemu-devel] [PATCH 2/2] Add qmp version of drive_del
Signed-off-by: Ryan Harper --- qmp-commands.hx | 29 + 1 files changed, 29 insertions(+), 0 deletions(-) diff --git a/qmp-commands.hx b/qmp-commands.hx index 793cf1c..1e0d4e9 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -338,6 +338,35 @@ Example: EQMP { +.name = "drive_del", +.args_type = "id:s", +.params = "device", +.help = "remove host block device", +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_drive_del, +}, + +SQMP +drive del +-- + +Remove host block device. The result is that guest generated IO is no longer +submitted against the host device underlying the disk. Once a drive has +been deleted, the QEMU Block layer returns -EIO which results in IO +errors in the guest for applications that are reading/writing to the device. + +Arguments: + +- "id": the device's ID (json-string) + +Example: + +-> { "execute": "drive_del", "arguments": { "id": "drive-virtio-blk1" } } +<- { "return": {} } + +EQMP + +{ .name = "cpu", .args_type = "index:i", .params = "index", -- 1.6.3.3
[Qemu-devel] Re: [PATCH v4 2/5] qed: Add QEMU Enhanced Disk image format
Am 28.10.2010 13:01, schrieb Stefan Hajnoczi: > This patch introduces the qed on-disk layout and implements image > creation. Later patches add read/write and other functionality. > > Signed-off-by: Stefan Hajnoczi > --- > Makefile.objs |1 + > block/qed.c | 548 > + > block/qed.h | 145 +++ > block_int.h |1 + > 4 files changed, 695 insertions(+), 0 deletions(-) > create mode 100644 block/qed.c > create mode 100644 block/qed.h > > diff --git a/Makefile.objs b/Makefile.objs > index f07fb01..7bae72a 100644 > --- a/Makefile.objs > +++ b/Makefile.objs > @@ -14,6 +14,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o > > block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o > vpc.o vvfat.o > block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o > +block-nested-y += qed.o > block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o > block-nested-$(CONFIG_WIN32) += raw-win32.o > block-nested-$(CONFIG_POSIX) += raw-posix.o > diff --git a/block/qed.c b/block/qed.c > new file mode 100644 > index 000..8469cf0 > --- /dev/null > +++ b/block/qed.c > @@ -0,0 +1,548 @@ > +/* > + * QEMU Enhanced Disk Format > + * > + * Copyright IBM, Corp. 2010 > + * > + * Authors: > + * Stefan Hajnoczi > + * Anthony Liguori > + * > + * This work is licensed under the terms of the GNU LGPL, version 2 or later. > + * See the COPYING.LIB file in the top-level directory. > + * > + */ > + > +#include "qed.h" > + > +static int bdrv_qed_probe(const uint8_t *buf, int buf_size, > + const char *filename) > +{ > +const QEDHeader *header = (const QEDHeader *)buf; > + > +if (buf_size < sizeof(*header)) { > +return 0; > +} > +if (le32_to_cpu(header->magic) != QED_MAGIC) { > +return 0; > +} > +return 100; > +} > + > +/** > + * Check whether an image format is raw > + * > + * @fmt:Backing file format, may be NULL > + */ > +static bool qed_fmt_is_raw(const char *fmt) > +{ > +return fmt && strcmp(fmt, "raw") == 0; > +} People shouldn't use them directly, but should we also consider file, host_device, etc.? > + > +static void qed_header_le_to_cpu(const QEDHeader *le, QEDHeader *cpu) > +{ > +cpu->magic = le32_to_cpu(le->magic); > +cpu->cluster_size = le32_to_cpu(le->cluster_size); > +cpu->table_size = le32_to_cpu(le->table_size); > +cpu->header_size = le32_to_cpu(le->header_size); > +cpu->features = le64_to_cpu(le->features); > +cpu->compat_features = le64_to_cpu(le->compat_features); > +cpu->autoclear_features = le64_to_cpu(le->autoclear_features); > +cpu->l1_table_offset = le64_to_cpu(le->l1_table_offset); > +cpu->image_size = le64_to_cpu(le->image_size); > +cpu->backing_filename_offset = le32_to_cpu(le->backing_filename_offset); > +cpu->backing_filename_size = le32_to_cpu(le->backing_filename_size); > +} > + > +static void qed_header_cpu_to_le(const QEDHeader *cpu, QEDHeader *le) > +{ > +le->magic = cpu_to_le32(cpu->magic); > +le->cluster_size = cpu_to_le32(cpu->cluster_size); > +le->table_size = cpu_to_le32(cpu->table_size); > +le->header_size = cpu_to_le32(cpu->header_size); > +le->features = cpu_to_le64(cpu->features); > +le->compat_features = cpu_to_le64(cpu->compat_features); > +le->autoclear_features = cpu_to_le64(cpu->autoclear_features); > +le->l1_table_offset = cpu_to_le64(cpu->l1_table_offset); > +le->image_size = cpu_to_le64(cpu->image_size); > +le->backing_filename_offset = cpu_to_le32(cpu->backing_filename_offset); > +le->backing_filename_size = cpu_to_le32(cpu->backing_filename_size); > +} > + > +static int qed_write_header_sync(BDRVQEDState *s) > +{ > +QEDHeader le; > +int ret; > + > +qed_header_cpu_to_le(&s->header, &le); > +ret = bdrv_pwrite(s->bs->file, 0, &le, sizeof(le)); > +if (ret != sizeof(le)) { > +return ret; > +} > +return 0; > +} > + > +static uint64_t qed_max_image_size(uint32_t cluster_size, uint32_t > table_size) > +{ > +uint64_t table_entries; > +uint64_t l2_size; > + > +table_entries = (table_size * cluster_size) / sizeof(uint64_t); > +l2_size = table_entries * cluster_size; > + > +return l2_size * table_entries; > +} > + > +static bool qed_is_cluster_size_valid(uint32_t cluster_size) > +{ > +if (cluster_size < QED_MIN_CLUSTER_SIZE || > +cluster_size > QED_MAX_CLUSTER_SIZE) { > +return false; > +} > +if (cluster_size & (cluster_size - 1)) { > +return false; /* not power of 2 */ > +} > +return true; > +} > + > +static bool qed_is_table_size_valid(uint32_t table_size) > +{ > +if (table_size < QED_MIN_TABLE_SIZE || > +table_size > QED_MAX_TABLE_SIZE) { > +return false; > +} > +if (table_size & (table_size - 1)) { > +return false; /* not power of 2 */ > +} > +retur
Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver
On Fri, 12 Nov 2010 16:04:39 +0100 Markus Armbruster wrote: > Luiz Capitulino writes: > > > On Fri, 12 Nov 2010 15:16:33 +0100 > > Markus Armbruster wrote: > > > >> Luiz Capitulino writes: > >> > >> > On Fri, 12 Nov 2010 11:21:57 +0100 > >> > Markus Armbruster wrote: > >> > > >> >> Luiz Capitulino writes: > >> [...] > >> >> > +QString *qemu_chr_mem_to_qs(CharDriverState *chr) > >> >> > +{ > >> >> > +MemoryDriver *d = chr->opaque; > >> >> > + > >> >> > +if (d->outbuf_size == 0) { > >> >> > +return qstring_new(); > >> >> > +} > >> >> > >> >> Why is this necessary? Is qstring_from_substr() broken for empty > >> >> substrings? If it is, it ought to be fixed! > >> > > >> > qstring_from_substr() takes a character range; outbuf_size stores a size, > >> > not a string length. So we do: > >> > > >> >> > +return qstring_from_substr((char *) d->outbuf, 0, d->outbuf_size > >> >> > - 1); > >> > > >> > If outbuf_size is 0, we'll be passing a negative value down. > >> > >> What's wrong with that? > > > > Although it's going to work with the current QString implementation, I don't > > think it's it's a good idea to rely on a negative index. > > How should I extract the substring of S beginning at index B with length > L? If I cant't do this for any B, L with interval [B,B+L-1] fully > within [0,length(S)], then the API is flawed, and ought to be replaced. Not sure we're talking about the same problem, anymore. When you said: > >> What's wrong with that? What did you mean? Did you mean 'let's not decrement outbuf_size' or did you mean 'let's pass -1 anyway'? Both seem wrong to me: the substring [0,-1] should be invalid and not decrementing outbuf_size is wrong, because it contains the buffer size and qstring_from_substr() will consume an additional char from the buffer (which should be '\0' today, but we shouldn't count on that). > > > Maybe, we could have: > > > > return qstring_from_substr((char *) d->outbuf, 0, > > d->outbuf_size > 0 ? d->outbuf_size - 1 : 0); > > > > A bit harder to read, but makes the function smaller. > > Err, doesn't qstring_from_substr(s, 0, 0) extract a substring of length > 1? Yeah, it's a bug. But that doesn't change my suggestion, can we do this way? This should fix the bug (not even compiled tested): diff --git a/qstring.c b/qstring.c index 4e2ba08..72a25de 100644 --- a/qstring.c +++ b/qstring.c @@ -42,10 +42,10 @@ QString *qstring_from_substr(const char *str, int start, int end) qstring = qemu_malloc(sizeof(*qstring)); -qstring->length = end - start + 1; -qstring->capacity = qstring->length; +qstring->length = end - start; +qstring->capacity = qstring->length + 1; -qstring->string = qemu_malloc(qstring->capacity + 1); +qstring->string = qemu_malloc(qstring->capacity); memcpy(qstring->string, str + start, qstring->length); qstring->string[qstring->length] = 0;
[Qemu-devel] [PATCH 1/2] Implement drive_del to decouple block removal from device removal
Currently device hotplug removal code is tied to device removal via ACPI. All pci devices that are removable via device_del() require the guest to respond to the request. In some cases the guest may not respond leaving the device still accessible to the guest. The management layer doesn't currently have a reliable way to revoke access to host resource in the presence of an uncooperative guest. This patch implements a new monitor command, drive_del, which provides an explicit command to revoke access to a host block device. drive_del first quiesces the block device (qemu_aio_flush; bdrv_flush() and bdrv_close()). This prevents further IO from being submitted against the host device. Finally, drive_del cleans up pointers between the drive object (host resource) and the device object (guest resource). Signed-off-by: Ryan Harper --- blockdev.c | 39 +++ blockdev.h |1 + hmp-commands.hx | 18 ++ 3 files changed, 58 insertions(+), 0 deletions(-) diff --git a/blockdev.c b/blockdev.c index 6cb179a..f6ac439 100644 --- a/blockdev.c +++ b/blockdev.c @@ -14,6 +14,8 @@ #include "qemu-option.h" #include "qemu-config.h" #include "sysemu.h" +#include "hw/qdev.h" +#include "block_int.h" static QTAILQ_HEAD(drivelist, DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives); @@ -597,3 +599,40 @@ int do_change_block(Monitor *mon, const char *device, } return monitor_read_bdrv_key_start(mon, bs, NULL, NULL); } + +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ +const char *id = qdict_get_str(qdict, "id"); +BlockDriverState *bs; +BlockDriverState **ptr; +Property *prop; + +bs = bdrv_find(id); +if (!bs) { +qerror_report(QERR_DEVICE_NOT_FOUND, id); +return -1; +} + +/* quiesce block driver; prevent further io */ +qemu_aio_flush(); +bdrv_flush(bs); +bdrv_close(bs); + +/* clean up guest state from pointing to host resource by + * finding and removing DeviceState "drive" property */ +for (prop = bs->peer->info->props; prop && prop->name; prop++) { +if (prop->info->type == PROP_TYPE_DRIVE) { +ptr = qdev_get_prop_ptr(bs->peer, prop); +if ((*ptr) == bs) { +bdrv_detach(bs, bs->peer); +*ptr = NULL; +break; +} +} +} + +/* clean up host side */ +drive_uninit(drive_get_by_blockdev(bs)); + +return 0; +} diff --git a/blockdev.h b/blockdev.h index 653affc..2a0559e 100644 --- a/blockdev.h +++ b/blockdev.h @@ -51,5 +51,6 @@ int do_eject(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_block_set_passwd(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_change_block(Monitor *mon, const char *device, const char *filename, const char *fmt); +int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data); #endif diff --git a/hmp-commands.hx b/hmp-commands.hx index e5585ba..d6dc18c 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -68,6 +68,24 @@ Eject a removable medium (use -f to force it). ETEXI { +.name = "drive_del", +.args_type = "id:s", +.params = "device", +.help = "remove host block device", +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_drive_del, +}, + +STEXI +...@item delete @var{device} +...@findex delete +Remove host block device. The result is that guest generated IO is no longer +submitted against the host device underlying the disk. Once a drive has +been deleted, the QEMU Block layer returns -EIO which results in IO +errors in the guest for applications that are reading/writing to the device. +ETEXI + +{ .name = "change", .args_type = "device:B,target:F,arg:s?", .params = "device filename [format]", -- 1.6.3.3
[Qemu-devel] [PATCH 0/2] v7 Decouple block device removal from device removal
Once more dear friends, v7 This patch series decouples the detachment of a block device from the removal of the backing pci-device. Removal of a hotplugged pci device requires the guest to respond before qemu tears down the block device. In some cases, the guest may not respond leaving the guest with continued access to the block device. Mgmt layer doesn't have a reliable method to force a disconnect. The new monitor command, drive_del, will revoke a guests access to the block device independently of the removal of the pci device. The first patch implements drive_del, the second patch implements the qmp version of the monitor command. Changes since v6: - Updated patch description - Dropped bdrv_unplug and inlined in drive_del - Explicitly invoke drive_uninit() Changes since v5: - Removed dangling pointers in guest and host state. This ensures things like info block no longer displays the deleted drive; though info pci will continue to display the pci device until the guest responds to the removal request. - Renamed drive_unplug -> drive_del Changes since v4: - Droppped drive_get_by_id patch and use bdrv_find() instead - Added additional details about drive_unplug to hmp/qmp interface Changes since v3: - Moved QMP command for drive_unplug() to separate patch Changes since v2: - Added QMP command for drive_unplug() Changes since v1: - CodingStyle fixes - Added qemu_aio_flush() to bdrv_unplug() Signed-off-by: Ryan Harper
[Qemu-devel] Re: [PATCH 4/8] pci: Replace used bitmap with capability byte map
On Fri, 2010-11-12 at 11:02 +0200, Michael S. Tsirkin wrote: > On Thu, Nov 11, 2010 at 11:07:15PM -0700, Alex Williamson wrote: > > On Fri, 2010-11-12 at 07:40 +0200, Michael S. Tsirkin wrote: > > > On Thu, Nov 11, 2010 at 07:55:43PM -0700, Alex Williamson wrote: > > > > Capabilities are allocated in bytes, so we can track both whether > > > > a byte is used and by what capability in the same structure. > > > > > > > > Remove pci_reserve_capability() as there are no users. > > > > > > > > Signed-off-by: Alex Williamson > > > > > > I actually wanted to remove the used array completely and ask > > > all users to add offsets directly. > > > Will this be needed then? > > > > Can you give an example, I don't understand what you mean by asking > > users to add offsets directly. > > Here's a dump of patch in progress. > Something like the below (untested). After applying this we can remove > the whole used array allocator. > > > I think some kind of tracking what's > > where in config space needs to be done somewhere and the common PCI code > > seems like it'd be the place. > > Why do we need it? config lets us scan the capability list > readily enough. I had this idea that we should pack > capabilities in config space, but now I think it's silly. > > > Thanks, > > > > Alex > > pci: remove config space allocator > > pci supports allocating caps in config space so they are packed tightly. > This doesn't seem to be useful, especially since caps must stay at fixed > offsets to ensure backwards compatibility (e.g. for migration). > > Remove this support, and make virtio-pci supply the offset to > the MSIX capability explicitly. > > Signed-off-by: Michael S. Tsirkin So add_capability effectively becomes the add_capability_at_offset that we have in the qemu-kvm tree. I think that's fine, but doesn't necessarily support removing the cap_map. A direct config offset to capability id map is pretty useful for device assignment, which only uses add_capability_at_offset after these patches. We might also want to use it to check drivers to make sure they're not overlapping caps. For instance in the below, we now have core virtio code forcing an offset for the msix cap. What if the driver already placed something at that offset, or what if the driver wants to add capability foo, it should at least error if it tries to place it over something else in the list. Alex > diff --git a/hw/msix.c b/hw/msix.c > index f66d255..6fd3791 100644 > --- a/hw/msix.c > +++ b/hw/msix.c > @@ -52,7 +52,7 @@ int msix_supported; > * Original bar size must be a power of 2 or 0. > * New bar size is returned. */ > static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries, > - unsigned bar_nr, unsigned bar_size) > + unsigned offset, unsigned bar_nr, unsigned > bar_size) > { > int config_offset; > uint8_t *config; > @@ -75,7 +75,7 @@ static int msix_add_config(struct PCIDevice *pdev, unsigned > short nentries, > > pdev->msix_bar_size = new_size; > config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX, > - 0, MSIX_CAP_LENGTH); > + offset, MSIX_CAP_LENGTH); > if (config_offset < 0) > return config_offset; > config = pdev->config + config_offset; > @@ -237,7 +237,7 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned > nentries) > /* Initialize the MSI-X structures. Note: if MSI-X is supported, BAR size is > * modified, it should be retrieved with msix_bar_size. */ > int msix_init(struct PCIDevice *dev, unsigned short nentries, > - unsigned bar_nr, unsigned bar_size) > + unsigned offset, unsigned bar_nr, unsigned bar_size) > { > int ret; > /* Nothing to do if MSI is not supported by interrupt controller */ > @@ -261,7 +261,7 @@ int msix_init(struct PCIDevice *dev, unsigned short > nentries, > } > > dev->msix_entries_nr = nentries; > -ret = msix_add_config(dev, nentries, bar_nr, bar_size); > +ret = msix_add_config(dev, nentries, offset, bar_nr, bar_size); > if (ret) > goto err_config; > > diff --git a/hw/msix.h b/hw/msix.h > index a9f7993..b61e42e 100644 > --- a/hw/msix.h > +++ b/hw/msix.h > @@ -5,7 +5,7 @@ > #include "pci.h" > > int msix_init(PCIDevice *pdev, unsigned short nentries, > - unsigned bar_nr, unsigned bar_size); > + unsigned offset, unsigned bar_nr, unsigned bar_size); > > void msix_write_config(PCIDevice *pci_dev, uint32_t address, > uint32_t val, int len); > diff --git a/hw/pci.c b/hw/pci.c > index aed2d42..e411f12 100644 > --- a/hw/pci.c > +++ b/hw/pci.c > @@ -1737,12 +1737,7 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, > uint8_t offset, uint8_t size) > { > uint8_t *config; > -if (!offset) { > -offset = pci_find_space(pdev, size)
Re: [Qemu-devel] [PATCH] Makefile: Fix check dependency breakage
Luiz Capitulino writes: > Commit b152aa84d52882bb1846485a89baf13aa07c86bc broke the unit-tests > build, fix it. Got bitten by that, and the patch fixes it for me.
Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver
Luiz Capitulino writes: > On Fri, 12 Nov 2010 15:16:33 +0100 > Markus Armbruster wrote: > >> Luiz Capitulino writes: >> >> > On Fri, 12 Nov 2010 11:21:57 +0100 >> > Markus Armbruster wrote: >> > >> >> Luiz Capitulino writes: >> [...] >> >> > +QString *qemu_chr_mem_to_qs(CharDriverState *chr) >> >> > +{ >> >> > +MemoryDriver *d = chr->opaque; >> >> > + >> >> > +if (d->outbuf_size == 0) { >> >> > +return qstring_new(); >> >> > +} >> >> >> >> Why is this necessary? Is qstring_from_substr() broken for empty >> >> substrings? If it is, it ought to be fixed! >> > >> > qstring_from_substr() takes a character range; outbuf_size stores a size, >> > not a string length. So we do: >> > >> >> > +return qstring_from_substr((char *) d->outbuf, 0, d->outbuf_size - >> >> > 1); >> > >> > If outbuf_size is 0, we'll be passing a negative value down. >> >> What's wrong with that? > > Although it's going to work with the current QString implementation, I don't > think it's it's a good idea to rely on a negative index. How should I extract the substring of S beginning at index B with length L? If I cant't do this for any B, L with interval [B,B+L-1] fully within [0,length(S)], then the API is flawed, and ought to be replaced. > Maybe, we could have: > > return qstring_from_substr((char *) d->outbuf, 0, > d->outbuf_size > 0 ? d->outbuf_size - 1 : 0); > > A bit harder to read, but makes the function smaller. Err, doesn't qstring_from_substr(s, 0, 0) extract a substring of length 1?
[Qemu-devel] [PATCH] Makefile: Fix check dependency breakage
Commit b152aa84d52882bb1846485a89baf13aa07c86bc broke the unit-tests build, fix it. Signed-off-by: Luiz Capitulino --- Makefile | 14 -- 1 files changed, 8 insertions(+), 6 deletions(-) diff --git a/Makefile b/Makefile index 02698e9..719aca9 100644 --- a/Makefile +++ b/Makefile @@ -142,12 +142,14 @@ qemu-img-cmds.h: $(SRC_PATH)/qemu-img-cmds.hx check-qint.o check-qstring.o check-qdict.o check-qlist.o check-qfloat.o check-qjson.o: $(GENERATED_HEADERS) -check-qint: check-qint.o qint.o qemu-malloc.o $(trace-obj-y) -check-qstring: check-qstring.o qstring.o qemu-malloc.o $(trace-obj-y) -check-qdict: check-qdict.o qdict.o qfloat.o qint.o qstring.o qbool.o qemu-malloc.o qlist.o $(trace-obj-y) -check-qlist: check-qlist.o qlist.o qint.o qemu-malloc.o $(trace-obj-y) -check-qfloat: check-qfloat.o qfloat.o qemu-malloc.o $(trace-obj-y) -check-qjson: check-qjson.o qfloat.o qint.o qdict.o qstring.o qlist.o qbool.o qjson.o json-streamer.o json-lexer.o json-parser.o qemu-malloc.o $(trace-obj-y) +CHECK_PROG_DEPS = qemu-malloc.o $(oslib-obj-y) $(trace-obj-y) + +check-qint: check-qint.o qint.o $(CHECK_PROG_DEPS) +check-qstring: check-qstring.o qstring.o $(CHECK_PROG_DEPS) +check-qdict: check-qdict.o qdict.o qfloat.o qint.o qstring.o qbool.o qlist.o $(CHECK_PROG_DEPS) +check-qlist: check-qlist.o qlist.o qint.o $(CHECK_PROG_DEPS) +check-qfloat: check-qfloat.o qfloat.o $(CHECK_PROG_DEPS) +check-qjson: check-qjson.o qfloat.o qint.o qdict.o qstring.o qlist.o qbool.o qjson.o json-streamer.o json-lexer.o json-parser.o $(CHECK_PROG_DEPS) clean: # avoid old build problems by removing potentially incorrect old files -- 1.7.3.2.164.g6f10c
Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver
On Fri, 12 Nov 2010 15:16:33 +0100 Markus Armbruster wrote: > Luiz Capitulino writes: > > > On Fri, 12 Nov 2010 11:21:57 +0100 > > Markus Armbruster wrote: > > > >> Luiz Capitulino writes: > [...] > >> > +QString *qemu_chr_mem_to_qs(CharDriverState *chr) > >> > +{ > >> > +MemoryDriver *d = chr->opaque; > >> > + > >> > +if (d->outbuf_size == 0) { > >> > +return qstring_new(); > >> > +} > >> > >> Why is this necessary? Is qstring_from_substr() broken for empty > >> substrings? If it is, it ought to be fixed! > > > > qstring_from_substr() takes a character range; outbuf_size stores a size, > > not a string length. So we do: > > > >> > +return qstring_from_substr((char *) d->outbuf, 0, d->outbuf_size - > >> > 1); > > > > If outbuf_size is 0, we'll be passing a negative value down. > > What's wrong with that? Although it's going to work with the current QString implementation, I don't think it's it's a good idea to rely on a negative index. Maybe, we could have: return qstring_from_substr((char *) d->outbuf, 0, d->outbuf_size > 0 ? d->outbuf_size - 1 : 0); A bit harder to read, but makes the function smaller.
Re: [Qemu-devel] Re: [Try2][PATCH] Initial implementation of a mpeg1 layer2 streaming audio driver.
On 11/08/2010 12:15 AM, François Revol wrote: Le 8 nov. 2010 à 04:57, malc a écrit : And can you, please, elaborate some more on usage scenarios of this thing? cf. http://dev.haiku-os.org/browser/haiku/trunk/3rdparty/mmu_man/onlinedemo/haiku.php Sorry my PHP skills have lapsed aeons ago. Sorry I don't have it installed on a publically accessible machine atm. and possibly http://oszoo.org/wiki/index.php/Main_Page some day... The idea is to use it along with the -vnc option and the VNC applet to present a VM on the web. While not very sexy, but can't you just use wav output to a fifo and compress it via separate process. I did try years ago, but at least the current wav driver really didn't like fifos back then. I recall trying for hours to get it pipe to ffmpeg or others without much luck. Also, this poses several problems about the control of the external process (respawn on listener disconnection, close on exit...). Doing encoding in QEMU is going to starve the guest pretty bad. For an x86 guest, that's going to result in issues with time drift, etc. Making reencoding with an external process work a bit more nicely also has the advantage of making this work with other formats like Ogg which are a bit more Open Source friendly. Regards, Anthony Liguori François.
Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver
Luiz Capitulino writes: > On Fri, 12 Nov 2010 11:21:57 +0100 > Markus Armbruster wrote: > >> Luiz Capitulino writes: [...] >> > +QString *qemu_chr_mem_to_qs(CharDriverState *chr) >> > +{ >> > +MemoryDriver *d = chr->opaque; >> > + >> > +if (d->outbuf_size == 0) { >> > +return qstring_new(); >> > +} >> >> Why is this necessary? Is qstring_from_substr() broken for empty >> substrings? If it is, it ought to be fixed! > > qstring_from_substr() takes a character range; outbuf_size stores a size, > not a string length. So we do: > >> > +return qstring_from_substr((char *) d->outbuf, 0, d->outbuf_size - 1); > > If outbuf_size is 0, we'll be passing a negative value down. What's wrong with that? [...]
Re: [Qemu-devel] Re: [PATCH v4 1/5] docs: Add QED image format specification
On Fri, Nov 12, 2010 at 1:58 PM, Kevin Wolf wrote: > Am 28.10.2010 13:01, schrieb Stefan Hajnoczi: >> Signed-off-by: Stefan Hajnoczi >> --- >> docs/specs/qed_spec.txt | 128 >> +++ >> 1 files changed, 128 insertions(+), 0 deletions(-) >> create mode 100644 docs/specs/qed_spec.txt >> >> diff --git a/docs/specs/qed_spec.txt b/docs/specs/qed_spec.txt >> new file mode 100644 >> index 000..e4425c8 >> --- /dev/null >> +++ b/docs/specs/qed_spec.txt >> @@ -0,0 +1,128 @@ >> +=Specification= >> + >> +The file format looks like this: >> + >> + +--+--+--+-+ >> + | cluster0 | cluster1 | cluster2 | ... | >> + +--+--+--+-+ >> + >> +The first cluster begins with the '''header'''. The header contains >> information about where regular clusters start; this allows the header to be >> extensible and store extra information about the image file. A regular >> cluster may be a '''data cluster''', an '''L2''', or an '''L1 table'''. L1 >> and L2 tables are composed of one or more contiguous clusters. >> + >> +Normally the file size will be a multiple of the cluster size. If the file >> size is not a multiple, extra information after the last cluster may not be >> preserved if data is written. Legitimate extra information should use space >> between the header and the first regular cluster. >> + >> +All fields are little-endian. >> + >> +==Header== >> + Header { >> + uint32_t magic; /* QED\0 */ >> + >> + uint32_t cluster_size; /* in bytes */ >> + uint32_t table_size; /* for L1 and L2 tables, in clusters */ >> + uint32_t header_size; /* in clusters */ >> + >> + uint64_t features; /* format feature bits */ >> + uint64_t compat_features; /* compat feature bits */ >> + uint64_t l1_table_offset; /* in bytes */ >> + uint64_t image_size; /* total logical image size, in bytes */ >> + >> + /* if (features & QED_F_BACKING_FILE) */ >> + uint32_t backing_filename_offset; /* in bytes from start of header */ >> + uint32_t backing_filename_size; /* in bytes */ >> + } >> + >> +Field descriptions: >> +* ''cluster_size'' must be a power of 2 in range [2^12, 2^26]. >> +* ''table_size'' must be a power of 2 in range [1, 16]. >> +* ''header_size'' is the number of clusters used by the header and any >> additional information stored before regular clusters. >> +* ''features'', ''compat_features'', and ''autoclear_features'' are file >> format extension bitmaps. They work as follows: >> +** An image with unknown ''features'' bits enabled must not be opened. >> File format changes that are not backwards-compatible must use ''features'' >> bits. >> +** An image with unknown ''compat_features'' bits enabled can be opened >> safely. The unknown features are simply ignored and represent >> backwards-compatible changes to the file format. >> +** An image with unknown ''autoclear_features'' bits enable can be opened >> safely after clearing the unknown bits. This allows for >> backwards-compatible changes to the file format which degrade gracefully and >> can be re-enabled again by a new program later. > > autoclear features aren't even part of the header in the spec. Thanks for spotting this, I forgot to sync the header with the code. >> +* ''l1_table_offset'' is the offset of the first byte of the L1 table in >> the image file and must be a multiple of ''cluster_size''. >> +* ''image_size'' is the block device size seen by the guest and must be a >> multiple of 512 bytes. >> +* ''backing_filename'' is a string in (byte offset, byte size) form. It is >> not NUL-terminated and has no alignment constraints. >> + >> +Feature bits: >> +* QED_F_BACKING_FILE = 0x01. The image uses a backing file. The backing >> filename string is given in the ''backing_filename_{offset,size}'' fields >> and may be an absolute path or relative to the image file. >> +* QED_F_NEED_CHECK = 0x02. The image needs a consistency check before use. >> +* QED_F_BACKING_FORMAT_NO_PROBE = 0x04. The backing file is a raw disk >> image and no file format autodetection should be attempted. This should be >> used to ensure that raw backing images are never detected as an image format >> if they happen to contain magic constants. >> + >> +There are currently no defined ''compat_features'' or >> ''autoclear_features'' bits. >> + >> +Fields predicated on a feature bit are only used when that feature is set. >> The fields always take up header space, regardless of whether or not the >> feature bit is set. >> + >> +==Tables== >> + >> +Tables provide the translation from logical offsets in the block device to >> cluster offsets in the file. >> + >> + #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t)) >> + >> + Table { >> + uint64_t offsets[TABLE_NOFFSETS]; >> + } >> + >> +The tables are organized as follows: >> + >> +
[Qemu-devel] Re: [PATCH v3 3/3] virtio-pci: Don't use ioeventfd on old kernels
On Fri, Nov 12, 2010 at 1:24 PM, Stefan Hajnoczi wrote: > @@ -1046,6 +1087,11 @@ int kvm_has_xcrs(void) > return kvm_state->xcrs; > } > > +int kvm_has_many_ioeventfds(void) > +{ > + return kvm_state->many_ioeventfds; > +} > + Missing if (!kvm_enabled()) { return 0; }. Will fix in next version, would still appreciate review comments on any other aspect of the patch. Stefan
[Qemu-devel] Re: [PATCH v4 1/5] docs: Add QED image format specification
Am 28.10.2010 13:01, schrieb Stefan Hajnoczi: > Signed-off-by: Stefan Hajnoczi > --- > docs/specs/qed_spec.txt | 128 > +++ > 1 files changed, 128 insertions(+), 0 deletions(-) > create mode 100644 docs/specs/qed_spec.txt > > diff --git a/docs/specs/qed_spec.txt b/docs/specs/qed_spec.txt > new file mode 100644 > index 000..e4425c8 > --- /dev/null > +++ b/docs/specs/qed_spec.txt > @@ -0,0 +1,128 @@ > +=Specification= > + > +The file format looks like this: > + > + +--+--+--+-+ > + | cluster0 | cluster1 | cluster2 | ... | > + +--+--+--+-+ > + > +The first cluster begins with the '''header'''. The header contains > information about where regular clusters start; this allows the header to be > extensible and store extra information about the image file. A regular > cluster may be a '''data cluster''', an '''L2''', or an '''L1 table'''. L1 > and L2 tables are composed of one or more contiguous clusters. > + > +Normally the file size will be a multiple of the cluster size. If the file > size is not a multiple, extra information after the last cluster may not be > preserved if data is written. Legitimate extra information should use space > between the header and the first regular cluster. > + > +All fields are little-endian. > + > +==Header== > + Header { > + uint32_t magic; /* QED\0 */ > + > + uint32_t cluster_size;/* in bytes */ > + uint32_t table_size; /* for L1 and L2 tables, in clusters */ > + uint32_t header_size; /* in clusters */ > + > + uint64_t features;/* format feature bits */ > + uint64_t compat_features; /* compat feature bits */ > + uint64_t l1_table_offset; /* in bytes */ > + uint64_t image_size; /* total logical image size, in bytes */ > + > + /* if (features & QED_F_BACKING_FILE) */ > + uint32_t backing_filename_offset; /* in bytes from start of header */ > + uint32_t backing_filename_size; /* in bytes */ > + } > + > +Field descriptions: > +* ''cluster_size'' must be a power of 2 in range [2^12, 2^26]. > +* ''table_size'' must be a power of 2 in range [1, 16]. > +* ''header_size'' is the number of clusters used by the header and any > additional information stored before regular clusters. > +* ''features'', ''compat_features'', and ''autoclear_features'' are file > format extension bitmaps. They work as follows: > +** An image with unknown ''features'' bits enabled must not be opened. File > format changes that are not backwards-compatible must use ''features'' bits. > +** An image with unknown ''compat_features'' bits enabled can be opened > safely. The unknown features are simply ignored and represent > backwards-compatible changes to the file format. > +** An image with unknown ''autoclear_features'' bits enable can be opened > safely after clearing the unknown bits. This allows for backwards-compatible > changes to the file format which degrade gracefully and can be re-enabled > again by a new program later. autoclear features aren't even part of the header in the spec. > +* ''l1_table_offset'' is the offset of the first byte of the L1 table in the > image file and must be a multiple of ''cluster_size''. > +* ''image_size'' is the block device size seen by the guest and must be a > multiple of 512 bytes. > +* ''backing_filename'' is a string in (byte offset, byte size) form. It is > not NUL-terminated and has no alignment constraints. > + > +Feature bits: > +* QED_F_BACKING_FILE = 0x01. The image uses a backing file. The backing > filename string is given in the ''backing_filename_{offset,size}'' fields and > may be an absolute path or relative to the image file. > +* QED_F_NEED_CHECK = 0x02. The image needs a consistency check before use. > +* QED_F_BACKING_FORMAT_NO_PROBE = 0x04. The backing file is a raw disk > image and no file format autodetection should be attempted. This should be > used to ensure that raw backing images are never detected as an image format > if they happen to contain magic constants. > + > +There are currently no defined ''compat_features'' or ''autoclear_features'' > bits. > + > +Fields predicated on a feature bit are only used when that feature is set. > The fields always take up header space, regardless of whether or not the > feature bit is set. > + > +==Tables== > + > +Tables provide the translation from logical offsets in the block device to > cluster offsets in the file. > + > + #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t)) > + > + Table { > + uint64_t offsets[TABLE_NOFFSETS]; > + } > + > +The tables are organized as follows: > + > ++--+ > +| L1 table | > ++--+ > + ,--' | '--. > + +--+ |+--+ > + | L2 table | ... | L2 tabl
Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver
On Fri, 12 Nov 2010 11:21:57 +0100 Markus Armbruster wrote: > Luiz Capitulino writes: > > > This driver handles in-memory chardev operations. That's, all writes > > to this driver are stored in an internal buffer and it doesn't talk > > to the external world in any way. > > > > Right now it's very simple: it supports only writes. But it can be > > easily extended to support more operations. > > > > This is going to be used by the monitor's "HMP passthrough via QMP" > > feature, which needs to run monitor handlers without a backing > > device. > > > > Signed-off-by: Luiz Capitulino > > --- > > qemu-char.c | 70 > > +++ > > qemu-char.h |7 ++ > > 2 files changed, 77 insertions(+), 0 deletions(-) > > > > diff --git a/qemu-char.c b/qemu-char.c > > index 88997f9..36d23c6 100644 > > --- a/qemu-char.c > > +++ b/qemu-char.c > > @@ -2275,6 +2275,76 @@ static CharDriverState > > *qemu_chr_open_socket(QemuOpts *opts) > > return NULL; > > } > > > > +/***/ > > +/* Memory chardev */ > > +typedef struct { > > +size_t outbuf_size; > > +size_t outbuf_capacity; > > +uint8_t *outbuf; > > +} MemoryDriver; > > + > > +static int mem_chr_write(CharDriverState *chr, const uint8_t *buf, int len) > > +{ > > +MemoryDriver *d = chr->opaque; > > + > > +/* TODO: the QString implementation has the same code, we should > > + * introduce a generic way to do this in cutils.c */ > > +if (d->outbuf_capacity < d->outbuf_size + len) { > > +/* grow outbuf */ > > +d->outbuf_capacity += len; > > +d->outbuf_capacity *= 2; > > +d->outbuf = qemu_realloc(d->outbuf, d->outbuf_capacity); > > +} > > + > > +memcpy(d->outbuf + d->outbuf_size, buf, len); > > +d->outbuf_size += len; > > + > > +return len; > > +} > > + > > +void qemu_chr_init_mem(CharDriverState *chr) > > +{ > > +MemoryDriver *d; > > + > > +d = qemu_malloc(sizeof(*d)); > > +d->outbuf_size = 0; > > +d->outbuf_capacity = 4096; > > +d->outbuf = qemu_mallocz(d->outbuf_capacity); > > + > > +memset(chr, 0, sizeof(*chr)); > > +chr->opaque = d; > > +chr->chr_write = mem_chr_write; > > +} > > + > > +/* assumes the stored data is a string */ > > This could indicate a problem. See my reply in the thread for v2. Replied, but I can't see the problem. > > +QString *qemu_chr_mem_to_qs(CharDriverState *chr) > > +{ > > +MemoryDriver *d = chr->opaque; > > + > > +if (d->outbuf_size == 0) { > > +return qstring_new(); > > +} > > Why is this necessary? Is qstring_from_substr() broken for empty > substrings? If it is, it ought to be fixed! qstring_from_substr() takes a character range; outbuf_size stores a size, not a string length. So we do: > > +return qstring_from_substr((char *) d->outbuf, 0, d->outbuf_size - 1); If outbuf_size is 0, we'll be passing a negative value down. > > +} > > + > > +/* NOTE: this driver can not be closed with qemu_chr_close()! */ > > +void qemu_chr_close_mem(CharDriverState *chr) > > +{ > > +MemoryDriver *d = chr->opaque; > > + > > +qemu_free(d->outbuf); > > +qemu_free(chr->opaque); > > +chr->opaque = NULL; > > +chr->chr_write = NULL; > > +} > > + > > +size_t qemu_chr_mem_osize(const CharDriverState *chr) > > +{ > > +const MemoryDriver *d = chr->opaque; > > +return d->outbuf_size; > > +} > > + > > QemuOpts *qemu_chr_parse_compat(const char *label, const char *filename) > > { > > char host[65], port[33], width[8], height[8]; > > diff --git a/qemu-char.h b/qemu-char.h > > index 18ad12b..e6ee6c4 100644 > > --- a/qemu-char.h > > +++ b/qemu-char.h > > @@ -6,6 +6,7 @@ > > #include "qemu-option.h" > > #include "qemu-config.h" > > #include "qobject.h" > > +#include "qstring.h" > > > > /* character device */ > > > > @@ -100,6 +101,12 @@ CharDriverState *qemu_chr_open_eventfd(int eventfd); > > > > extern int term_escape_char; > > > > +/* memory chardev */ > > +void qemu_chr_init_mem(CharDriverState *chr); > > +void qemu_chr_close_mem(CharDriverState *chr); > > +QString *qemu_chr_mem_to_qs(CharDriverState *chr); > > +size_t qemu_chr_mem_osize(const CharDriverState *chr); > > + > > /* async I/O support */ > > > > int qemu_set_fd_handler2(int fd, >
[Qemu-devel] Re: [PATCH 6/8] device-assignment: Move PCI capabilities to match physical hardware
On Fri, 2010-11-12 at 11:20 +0200, Michael S. Tsirkin wrote: > On Thu, Nov 11, 2010 at 07:56:13PM -0700, Alex Williamson wrote: > > Now that common PCI code doesn't have a hangup on capabilities > > being contiguous, > > Hmm, this comment confused me : there's no requirement of > contigious allocations in current code in pci.c, is there? Exactly, but the code used to have cap.start and cap.length, which implied it was contiguous. Since those were removed in 5/8, we don't need to worry about where the physical capabilities land in config space. Thanks, Alex
Re: [Qemu-devel] [PATCH 2/2] Add support for generating a systemtap tapset static probes
On Fri, Nov 12, 2010 at 1:20 PM, Daniel P. Berrange wrote: > This introduces generation of a qemu.stp/qemu-system-XXX.stp > files which provides tapsets with friendly names for static > probes & their arguments. Instead of > > probe process("qemu").mark("qemu_malloc") { > printf("Malloc %d %p\n", $arg1, $arg2); > } > > It is now possible todo > > probe qemu.system.i386.qemu_malloc { > printf("Malloc %d %p\n", size, ptr); > } > > There is one tapset defined per target arch, for both > user and system emulators. > > * Makefile.target: Generate stp files for each target > * tracetool: Support for generating systemtap tapsets > * configure: Check for whether systemtap is available > with the DTrace backend > > Signed-off-by: Daniel P. Berrange > --- > Makefile.target | 29 - > configure | 7 +++ > tracetool | 128 +++--- > 3 files changed, 146 insertions(+), 18 deletions(-) Reviewed-by: Stefan Hajnoczi
Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver
On Fri, 12 Nov 2010 11:16:54 +0100 Markus Armbruster wrote: > Luiz Capitulino writes: > > > On Thu, 11 Nov 2010 17:32:06 +0100 > > Markus Armbruster wrote: > > > >> Luiz Capitulino writes: > >> > >> > On Thu, 11 Nov 2010 16:30:26 +0100 > >> > Markus Armbruster wrote: > >> > > >> >> Luiz Capitulino writes: > >> >> > >> >> > This driver handles in-memory chardev operations. That's, all writes > >> >> > to this driver are stored in an internal buffer and it doesn't talk > >> >> > to the external world in any way. > >> >> > > >> >> > Right now it's very simple: it supports only writes. But it can be > >> >> > easily extended to support more operations. > >> >> > > >> >> > This is going to be used by the monitor's "HMP passthrough via QMP" > >> >> > feature, which needs to run monitor handlers without a backing > >> >> > device. > >> >> > > >> >> > Signed-off-by: Luiz Capitulino > >> >> > --- > >> >> > qemu-char.c | 66 > >> >> > +++ > >> >> > qemu-char.h |6 + > >> >> > 2 files changed, 72 insertions(+), 0 deletions(-) > >> >> > > >> >> > diff --git a/qemu-char.c b/qemu-char.c > >> >> > index 88997f9..896df14 100644 > >> >> > --- a/qemu-char.c > >> >> > +++ b/qemu-char.c > >> >> > @@ -2275,6 +2275,72 @@ static CharDriverState > >> >> > *qemu_chr_open_socket(QemuOpts *opts) > >> >> > return NULL; > >> >> > } > >> >> > > >> >> > +/***/ > >> >> > +/* Memory chardev */ > >> >> > +typedef struct { > >> >> > +size_t outbuf_size; > >> >> > +size_t outbuf_capacity; > >> >> > +uint8_t *outbuf; > >> >> > +} MemoryDriver; > >> >> > + > >> >> > +static int mem_chr_write(CharDriverState *chr, const uint8_t *buf, > >> >> > int len) > >> >> > +{ > >> >> > +MemoryDriver *d = chr->opaque; > >> >> > + > >> >> > +/* TODO: the QString implementation has the same code, we should > >> >> > + * introduce a generic way to do this in cutils.c */ > >> >> > +if (d->outbuf_capacity < d->outbuf_size + len) { > >> >> > +/* grown outbuf */ > >> >> > >> >> Used to say "grow" (sans n) here. Intentional change? > >> > > >> > Hum, no. I think I've squashed an older commit while rebasing (but this > >> > seems > >> > to be the only problem). > >> > > >> >> > >> >> > +d->outbuf_capacity += len; > >> >> > +d->outbuf_capacity *= 2; > >> >> > +d->outbuf = qemu_realloc(d->outbuf, d->outbuf_capacity); > >> >> > +} > >> >> > + > >> >> > +memcpy(d->outbuf + d->outbuf_size, buf, len); > >> >> > +d->outbuf_size += len; > >> >> > + > >> >> > +return len; > >> >> > +} > >> >> > + > >> >> > +#define DEFAULT_BUF_SIZE 4096 > >> >> > >> >> It's the *initial* buffer size, isn't it? > >> > > >> > Yes. > >> > >> Could we make the name reflect that then? > >> > >> >> Doubt it's worth a #define (there's just one user), but that's a matter > >> >> of taste. > >> >> > >> >> > + > >> >> > +void qemu_chr_init_mem(CharDriverState *chr) > >> >> > +{ > >> >> > +MemoryDriver *d; > >> >> > + > >> >> > +d = qemu_malloc(sizeof(*d)); > >> >> > +d->outbuf_size = 0; > >> >> > +d->outbuf_capacity = DEFAULT_BUF_SIZE; > >> >> > +d->outbuf = qemu_mallocz(d->outbuf_capacity); > >> >> > + > >> >> > +memset(chr, 0, sizeof(*chr)); > >> >> > +chr->opaque = d; > >> >> > +chr->chr_write = mem_chr_write; > >> >> > +} > >> >> > + > >> >> > +/* assumes the stored data is a string */ > >> >> > >> >> What else could it be? Worrying about embedded '\0's? > >> > > >> > Yes, as the driver itself doesn't interpret the contents of its > >> > buffer. > >> > >> What happens if there are embedded '\0's? > > > > The string will be shorter than expected? And what if it contains > > non-printable characters? > > > > It's just a cautionary comment to help the user identify such problems, I > > think > > we're making a whole argument about a quite minor thing. > > When I see "assumes X" in a function comment, I immediately ask "and > what happens when !X?" The default answer is "it explodes, so don't do > that". That answer is wrong here. Therefore, I find the comment > misleading. That's how you interpret it, my interpretation is that I might not get the expected behavior. > Let's figure out what really happens. The human command's output is > sent to the client as a JSON string (response object member return). > JSON strings can consist of Unicode characters, "except for the > characters that must be escaped: quotation mark, reverse solidus, and > the control characters (U+ through U+001F)" (RFC 4627, section 2.5). > > Do we escape these characters? Where in the code? Should be in the json parser, but qemu_chr_mem_to_qs() doesn't assume its users (and it obviously shouldn't). > > >> >> > +QString *qemu_chr_mem_to_qs(CharDriverState *chr) > >> >> > +{ > >> >> > +MemoryDriver *d = chr->opaque; > >> >> > + > >> >> > +if (d->o
[Qemu-devel] [PATCH v3 3/3] virtio-pci: Don't use ioeventfd on old kernels
There used to be a limit of 6 KVM io bus devices inside the kernel. On such a kernel, don't use ioeventfd for virtqueue host notification since the limit is reached too easily. This ensures that existing vhost-net setups (which always use ioeventfd) have ioeventfds available so they can continue to work. Signed-off-by: Stefan Hajnoczi --- hw/virtio-pci.c |4 kvm-all.c | 46 ++ kvm-stub.c |5 + kvm.h |1 + 4 files changed, 56 insertions(+), 0 deletions(-) diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index 117e855..d3a7a9c 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -661,6 +661,10 @@ static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev, pci_register_bar(&proxy->pci_dev, 0, size, PCI_BASE_ADDRESS_SPACE_IO, virtio_map); +if (!kvm_has_many_ioeventfds()) { +proxy->flags &= ~VIRTIO_PCI_FLAG_USE_IOEVENTFD; +} + virtio_bind_device(vdev, &virtio_pci_bindings, proxy); proxy->host_features |= 0x1 << VIRTIO_F_NOTIFY_ON_EMPTY; proxy->host_features |= 0x1 << VIRTIO_F_BAD_FEATURE; diff --git a/kvm-all.c b/kvm-all.c index 37b99c7..ba302bc 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -28,6 +28,11 @@ #include "kvm.h" #include "bswap.h" +/* This check must be after config-host.h is included */ +#ifdef CONFIG_EVENTFD +#include +#endif + /* KVM uses PAGE_SIZE in it's definition of COALESCED_MMIO_MAX */ #define PAGE_SIZE TARGET_PAGE_SIZE @@ -72,6 +77,7 @@ struct KVMState int irqchip_in_kernel; int pit_in_kernel; int xsave, xcrs; +int many_ioeventfds; }; static KVMState *kvm_state; @@ -441,6 +447,39 @@ int kvm_check_extension(KVMState *s, unsigned int extension) return ret; } +static int kvm_check_many_ioeventfds(void) +{ +/* Older kernels have a 6 device limit on the KVM io bus. Find out so we + * can avoid creating too many ioeventfds. + */ +#ifdef CONFIG_EVENTFD +int ioeventfds[7]; +int i, ret = 0; +for (i = 0; i < ARRAY_SIZE(ioeventfds); i++) { +ioeventfds[i] = eventfd(0, EFD_CLOEXEC); +if (ioeventfds[i] < 0) { +break; +} +ret = kvm_set_ioeventfd_pio_word(ioeventfds[i], 0, i, true); +if (ret < 0) { +close(ioeventfds[i]); +break; +} +} + +/* Decide whether many devices are supported or not */ +ret = i == ARRAY_SIZE(ioeventfds); + +while (i-- > 0) { +kvm_set_ioeventfd_pio_word(ioeventfds[i], 0, i, false); +close(ioeventfds[i]); +} +return ret; +#else +return 0; +#endif +} + static void kvm_set_phys_mem(target_phys_addr_t start_addr, ram_addr_t size, ram_addr_t phys_offset) @@ -717,6 +756,8 @@ int kvm_init(int smp_cpus) kvm_state = s; cpu_register_phys_memory_client(&kvm_cpu_phys_memory_client); +s->many_ioeventfds = kvm_check_many_ioeventfds(); + return 0; err: @@ -1046,6 +1087,11 @@ int kvm_has_xcrs(void) return kvm_state->xcrs; } +int kvm_has_many_ioeventfds(void) +{ +return kvm_state->many_ioeventfds; +} + void kvm_setup_guest_memory(void *start, size_t size) { if (!kvm_has_sync_mmu()) { diff --git a/kvm-stub.c b/kvm-stub.c index 5384a4b..33d4476 100644 --- a/kvm-stub.c +++ b/kvm-stub.c @@ -99,6 +99,11 @@ int kvm_has_robust_singlestep(void) return 0; } +int kvm_has_many_ioeventfds(void) +{ +return 0; +} + void kvm_setup_guest_memory(void *start, size_t size) { } diff --git a/kvm.h b/kvm.h index 60a9b42..ce08d42 100644 --- a/kvm.h +++ b/kvm.h @@ -42,6 +42,7 @@ int kvm_has_robust_singlestep(void); int kvm_has_debugregs(void); int kvm_has_xsave(void); int kvm_has_xcrs(void); +int kvm_has_many_ioeventfds(void); #ifdef NEED_CPU_H int kvm_init_vcpu(CPUState *env); -- 1.7.2.3
Re: [Qemu-devel] [PATCH 1/2] Add a DTrace tracing backend targetted for SystemTAP compatability
On Fri, Nov 12, 2010 at 1:20 PM, Daniel P. Berrange wrote: > This introduces a new tracing backend that targets the SystemTAP > implementation of DTrace userspace tracing. The core functionality > should be applicable and standard across any DTrace implementation > on Solaris, OS-X, *BSD, but the Makefile rules will likely need > some small additional changes to cope with OS specific build > requirements. > > This backend builds a little differently from the other tracing > backends. Specifically there is no 'trace.c' file, because the > 'dtrace' command line tool generates a '.o' file directly from > the dtrace probe definition file. The probe definition is usually > named with a '.d' extension but QEMU uses '.d' files for its > external makefile dependancy tracking, so this uses '.dtrace' as > the extension for the probe definition file. > > The 'tracetool' program gains the ability to generate a trace.h > file for DTrace, and also to generate the trace.d file containing > the dtrace probe definition. > > Example usage of a dtrace probe in systemtap looks like: > > probe process("qemu").mark("qemu_malloc") { > printf("Malloc %d %p\n", $arg1, $arg2); > } > > * .gitignore: Ignore trace-dtrace.* > * Makefile: Extra rules for generating DTrace files > * Makefile.obj: Don't build trace.o for DTrace, use > trace-dtrace.o generated by 'dtrace' instead > * tracetool: Support for generating DTrace data files > > Signed-off-by: Daniel P. Berrange > --- > .gitignore | 2 + > Makefile | 23 +++ > Makefile.objs | 4 ++ > configure | 14 ++- > tracetool | 122 > - > 5 files changed, 154 insertions(+), 11 deletions(-) Reviewed-by: Stefan Hajnoczi
[Qemu-devel] [PATCH v3 1/3] virtio-pci: Rename bugs field to flags
The VirtIOPCIProxy bugs field is currently used to enable workarounds for older guests. Rename it to flags so that other per-device behavior can be tracked. A later patch uses the flags field to remember whether ioeventfd should be used for virtqueue host notification. Signed-off-by: Stefan Hajnoczi --- hw/virtio-pci.c | 15 +++ 1 files changed, 7 insertions(+), 8 deletions(-) diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index 729917d..549118d 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -80,9 +80,8 @@ * 12 is historical, and due to x86 page size. */ #define VIRTIO_PCI_QUEUE_ADDR_SHIFT12 -/* We can catch some guest bugs inside here so we continue supporting older - guests. */ -#define VIRTIO_PCI_BUG_BUS_MASTER (1 << 0) +/* Flags track per-device state like workarounds for quirks in older guests. */ +#define VIRTIO_PCI_FLAG_BUS_MASTER_BUG (1 << 0) /* QEMU doesn't strictly need write barriers since everything runs in * lock-step. We'll leave the calls to wmb() in though to make it obvious for @@ -95,7 +94,7 @@ typedef struct { PCIDevice pci_dev; VirtIODevice *vdev; -uint32_t bugs; +uint32_t flags; uint32_t addr; uint32_t class_code; uint32_t nvectors; @@ -159,7 +158,7 @@ static int virtio_pci_load_config(void * opaque, QEMUFile *f) in ready state. Then we have a buggy guest OS. */ if ((proxy->vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) && !(proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER)) { -proxy->bugs |= VIRTIO_PCI_BUG_BUS_MASTER; +proxy->flags |= VIRTIO_PCI_FLAG_BUS_MASTER_BUG; } return 0; } @@ -185,7 +184,7 @@ static void virtio_pci_reset(DeviceState *d) VirtIOPCIProxy *proxy = container_of(d, VirtIOPCIProxy, pci_dev.qdev); virtio_reset(proxy->vdev); msix_reset(&proxy->pci_dev); -proxy->bugs = 0; +proxy->flags = 0; } static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val) @@ -235,7 +234,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val) some safety checks. */ if ((val & VIRTIO_CONFIG_S_DRIVER_OK) && !(proxy->pci_dev.config[PCI_COMMAND] & PCI_COMMAND_MASTER)) { -proxy->bugs |= VIRTIO_PCI_BUG_BUS_MASTER; +proxy->flags |= VIRTIO_PCI_FLAG_BUS_MASTER_BUG; } break; case VIRTIO_MSI_CONFIG_VECTOR: @@ -403,7 +402,7 @@ static void virtio_write_config(PCIDevice *pci_dev, uint32_t address, if (PCI_COMMAND == address) { if (!(val & PCI_COMMAND_MASTER)) { -if (!(proxy->bugs & VIRTIO_PCI_BUG_BUS_MASTER)) { +if (!(proxy->flags & VIRTIO_PCI_FLAG_BUS_MASTER_BUG)) { virtio_set_status(proxy->vdev, proxy->vdev->status & ~VIRTIO_CONFIG_S_DRIVER_OK); } -- 1.7.2.3
Re: [Qemu-devel] Re: [PATCH] Implement a virtio GPU transport
On 11/12/2010 06:14 AM, Ian Molton wrote: On 10/11/10 17:47, Anthony Liguori wrote: On 11/10/2010 11:22 AM, Ian Molton wrote: Ping ? I think the best way forward is to post patches. I posted links to the git trees. I can post patches, but they are *large*. Do you really want me to post them? Yes, and they have to be split up into something reviewable. To summarize what I was trying to express in the thread, I think this is not the right long term architecture but am not opposed to it as a short term solution. I think having a new virtio device is a bad design choice but am not totally opposed to it. Ok! (I agree (that this should be a short term solution) :) ) you want to go for the path of integration, you're going to have to fix all of the coding style issues and make the code fit into QEMU. Dropping a bunch of junk into target-i386/ is not making the code fit into QEMU. I agree. how about hw/gl for the renderer and hw/ for the virtio module? That would be fine. If you post just what you have now in patch form, I can try to provide more concrete advice ignoring the coding style problems. I can post patches, although I dont think LKML would appreciate the volume! I can post them to the qemu list if you do. Yes, qemu is where I was suggesting you post them. Regards, Anthony Liguori -Ian
[Qemu-devel] [PATCH v3 2/3] virtio-pci: Use ioeventfd for virtqueue notify
Virtqueue notify is currently handled synchronously in userspace virtio. This prevents the vcpu from executing guest code while hardware emulation code handles the notify. On systems that support KVM, the ioeventfd mechanism can be used to make virtqueue notify a lightweight exit by deferring hardware emulation to the iothread and allowing the VM to continue execution. This model is similar to how vhost receives virtqueue notifies. The result of this change is improved performance for userspace virtio devices. Virtio-blk throughput increases especially for multithreaded scenarios and virtio-net transmit throughput increases substantially. Some virtio devices are known to have guest drivers which expect a notify to be processed synchronously and spin waiting for completion. Only enable ioeventfd for virtio-blk and virtio-net for now. Care must be taken not to interfere with vhost-net, which already uses ioeventfd host notifiers. The following list shows the behavior implemented in this patch and is designed to take vhost-net into account: * VIRTIO_CONFIG_S_DRIVER_OK -> assign host notifiers, qemu_set_fd_handler(virtio_pci_host_notifier_read) * !VIRTIO_CONFIG_S_DRIVER_OK -> qemu_set_fd_handler(NULL), deassign host notifiers * virtio_pci_set_host_notifier(true) -> qemu_set_fd_handler(NULL) * virtio_pci_set_host_notifier(false) -> qemu_set_fd_handler(virtio_pci_host_notifier_read) Signed-off-by: Stefan Hajnoczi --- hw/virtio-pci.c | 152 ++ hw/virtio.c | 14 - hw/virtio.h | 13 + 3 files changed, 153 insertions(+), 26 deletions(-) Now toggles host notifiers based on VIRTIO_CONFIG_S_DRIVER_OK status changes. The cleanest way I could see was to introduce pre and a post set_status() callbacks. They allow a binding to hook status changes, including the status change from virtio_reset(). diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index 549118d..117e855 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -83,6 +83,11 @@ /* Flags track per-device state like workarounds for quirks in older guests. */ #define VIRTIO_PCI_FLAG_BUS_MASTER_BUG (1 << 0) +/* Performance improves when virtqueue kick processing is decoupled from the + * vcpu thread using ioeventfd for some devices. */ +#define VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT 1 +#define VIRTIO_PCI_FLAG_USE_IOEVENTFD (1 << VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT) + /* QEMU doesn't strictly need write barriers since everything runs in * lock-step. We'll leave the calls to wmb() in though to make it obvious for * KVM or if kqemu gets SMP support. @@ -179,12 +184,125 @@ static int virtio_pci_load_queue(void * opaque, int n, QEMUFile *f) return 0; } +static int virtio_pci_set_host_notifier_ioeventfd(VirtIOPCIProxy *proxy, + int n, bool assign) +{ +VirtQueue *vq = virtio_get_queue(proxy->vdev, n); +EventNotifier *notifier = virtio_queue_get_host_notifier(vq); +int r; +if (assign) { +r = event_notifier_init(notifier, 1); +if (r < 0) { +return r; +} +r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier), + proxy->addr + VIRTIO_PCI_QUEUE_NOTIFY, + n, assign); +if (r < 0) { +event_notifier_cleanup(notifier); +} +} else { +r = kvm_set_ioeventfd_pio_word(event_notifier_get_fd(notifier), + proxy->addr + VIRTIO_PCI_QUEUE_NOTIFY, + n, assign); +if (r < 0) { +return r; +} +event_notifier_cleanup(notifier); +} +return r; +} + +static void virtio_pci_host_notifier_read(void *opaque) +{ +VirtQueue *vq = opaque; +EventNotifier *n = virtio_queue_get_host_notifier(vq); +if (event_notifier_test_and_clear(n)) { +virtio_queue_notify_vq(vq); +} +} + +static void virtio_pci_set_host_notifier_fd_handler(VirtIOPCIProxy *proxy, +int n, bool assign) +{ +VirtQueue *vq = virtio_get_queue(proxy->vdev, n); +EventNotifier *notifier = virtio_queue_get_host_notifier(vq); +if (assign) { +qemu_set_fd_handler(event_notifier_get_fd(notifier), +virtio_pci_host_notifier_read, NULL, vq); +} else { +qemu_set_fd_handler(event_notifier_get_fd(notifier), +NULL, NULL, NULL); +} +} + +static int virtio_pci_set_host_notifiers(VirtIOPCIProxy *proxy, bool assign) +{ +int n, r; + +for (n = 0; n < VIRTIO_PCI_QUEUE_MAX; n++) { +if (!virtio_queue_get_num(proxy->vdev, n)) { +continue; +} + +if (assign) { +r = virtio_pci_set_host_notifier_ioeventfd(proxy, n, true); +if (r < 0) { +goto assign_error; +} + +
[Qemu-devel] [PATCH 2/2] Add support for generating a systemtap tapset static probes
This introduces generation of a qemu.stp/qemu-system-XXX.stp files which provides tapsets with friendly names for static probes & their arguments. Instead of probe process("qemu").mark("qemu_malloc") { printf("Malloc %d %p\n", $arg1, $arg2); } It is now possible todo probe qemu.system.i386.qemu_malloc { printf("Malloc %d %p\n", size, ptr); } There is one tapset defined per target arch, for both user and system emulators. * Makefile.target: Generate stp files for each target * tracetool: Support for generating systemtap tapsets * configure: Check for whether systemtap is available with the DTrace backend Signed-off-by: Daniel P. Berrange --- Makefile.target | 29 - configure |7 +++ tracetool | 128 +++--- 3 files changed, 146 insertions(+), 18 deletions(-) diff --git a/Makefile.target b/Makefile.target index 91e6e74..7c3671c 100644 --- a/Makefile.target +++ b/Makefile.target @@ -40,7 +40,27 @@ kvm.o kvm-all.o vhost.o vhost_net.o: QEMU_CFLAGS+=$(KVM_CFLAGS) config-target.h: config-target.h-timestamp config-target.h-timestamp: config-target.mak -all: $(PROGS) +ifdef CONFIG_SYSTEMTAP_TRACE +stap: $(QEMU_PROG).stp + +ifdef CONFIG_USER_ONLY +TARGET_TYPE=user +else +TARGET_TYPE=system +endif + +$(QEMU_PROG).stp: + $(call quiet-command,sh $(SRC_PATH)/tracetool \ + --$(TRACE_BACKEND) \ + --binary $(bindir)/$(QEMU_PROG) \ + --target-arch $(TARGET_ARCH) \ + --target-type $(TARGET_TYPE) \ + --stap < $(SRC_PATH)/trace-events > $(QEMU_PROG).stp," GEN $(QEMU_PROG).stp") +else +stap: +endif + +all: $(PROGS) stap # Dummy command so that make thinks it has done something @true @@ -340,6 +360,9 @@ clean: rm -f *.o *.a *~ $(PROGS) nwfpe/*.o fpu/*.o rm -f *.d */*.d tcg/*.o ide/*.o rm -f hmp-commands.h qmp-commands.h gdbstub-xml.c +ifdef CONFIG_SYSTEMTAP_TRACE + rm -f *.stp +endif install: all ifneq ($(PROGS),) @@ -348,6 +371,10 @@ ifneq ($(STRIP),) $(STRIP) $(patsubst %,"$(DESTDIR)$(bindir)/%",$(PROGS)) endif endif +ifdef CONFIG_SYSTEMTAP_TRACE + $(INSTALL_DIR) "$(DESTDIR)$(datadir)/../systemtap/tapset" + $(INSTALL_DATA) $(QEMU_PROG).stp "$(DESTDIR)$(datadir)/../systemtap/tapset" +endif # Include automatically generated dependency files -include $(wildcard *.d */*.d) diff --git a/configure b/configure index f8dad3e..2917874 100755 --- a/configure +++ b/configure @@ -2203,6 +2203,10 @@ if test "$trace_backend" = "dtrace"; then echo exit 1 fi + trace_backend_stap="no" + if has 'stap' ; then +trace_backend_stap="yes" + fi fi ## @@ -2645,6 +2649,9 @@ fi if test "$trace_backend" = "simple"; then trace_file="\"$trace_file-%u\"" fi +if test "$trace_backend" = "dtrace" -a "$trace_backend_stap" = "yes" ; then + echo "CONFIG_SYSTEMTAP_TRACE=y" >> $config_host_mak +fi echo "CONFIG_TRACE_FILE=$trace_file" >> $config_host_mak echo "TOOLS=$tools" >> $config_host_mak diff --git a/tracetool b/tracetool index 1ade103..fce491c 100755 --- a/tracetool +++ b/tracetool @@ -23,9 +23,16 @@ Backends: --dtrace DTrace/SystemTAP backend Output formats: - -hGenerate .h file - -cGenerate .c file - -dGenerate .d file (DTrace only) + -h Generate .h file + -c Generate .c file + -d Generate .d file (DTrace only) + --stap Generate .stp file (DTrace with SystemTAP only) + +Options: + --binary [path] Full path to QEMU binary + --target-arch [arch] QEMU emulator target arch + --target-type [type] QEMU emulator target type ('system' or 'user') + EOF exit 1 } @@ -396,6 +403,51 @@ linetod_end_dtrace() EOF } +linetostap_begin_dtrace() +{ +return +} + +linetostap_dtrace() +{ +local i arg name args arglist state +name=$(get_name "$1") +args=$(get_args "$1") +arglist=$(get_argnames "$1", "") +state=$(get_state "$1") +if [ "$state" = "0" ] ; then +name=${name##disable } +fi + +# Define prototype for probe arguments +cat <
[Qemu-devel] [PATCH 1/2] Add a DTrace tracing backend targetted for SystemTAP compatability
This introduces a new tracing backend that targets the SystemTAP implementation of DTrace userspace tracing. The core functionality should be applicable and standard across any DTrace implementation on Solaris, OS-X, *BSD, but the Makefile rules will likely need some small additional changes to cope with OS specific build requirements. This backend builds a little differently from the other tracing backends. Specifically there is no 'trace.c' file, because the 'dtrace' command line tool generates a '.o' file directly from the dtrace probe definition file. The probe definition is usually named with a '.d' extension but QEMU uses '.d' files for its external makefile dependancy tracking, so this uses '.dtrace' as the extension for the probe definition file. The 'tracetool' program gains the ability to generate a trace.h file for DTrace, and also to generate the trace.d file containing the dtrace probe definition. Example usage of a dtrace probe in systemtap looks like: probe process("qemu").mark("qemu_malloc") { printf("Malloc %d %p\n", $arg1, $arg2); } * .gitignore: Ignore trace-dtrace.* * Makefile: Extra rules for generating DTrace files * Makefile.obj: Don't build trace.o for DTrace, use trace-dtrace.o generated by 'dtrace' instead * tracetool: Support for generating DTrace data files Signed-off-by: Daniel P. Berrange --- .gitignore|2 + Makefile | 23 +++ Makefile.objs |4 ++ configure | 14 ++- tracetool | 122 - 5 files changed, 154 insertions(+), 11 deletions(-) diff --git a/.gitignore b/.gitignore index a43e4d1..3efb4ec 100644 --- a/.gitignore +++ b/.gitignore @@ -4,6 +4,8 @@ config-host.* config-target.* trace.h trace.c +trace-dtrace.h +trace-dtrace.dtrace *-timestamp *-softmmu *-darwin-user diff --git a/Makefile b/Makefile index 02698e9..554ad97 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,9 @@ # Makefile for QEMU. GENERATED_HEADERS = config-host.h trace.h qemu-options.def +ifeq ($(TRACE_BACKEND),dtrace) +GENERATED_HEADERS += trace-dtrace.h +endif ifneq ($(wildcard config-host.mak),) # Put the all: rule here so that config-host.mak can contain dependencies. @@ -108,7 +111,11 @@ ui/vnc.o: QEMU_CFLAGS += $(VNC_TLS_CFLAGS) bt-host.o: QEMU_CFLAGS += $(BLUEZ_CFLAGS) +ifeq ($(TRACE_BACKEND),dtrace) +trace.h: trace.h-timestamp trace-dtrace.h +else trace.h: trace.h-timestamp +endif trace.h-timestamp: $(SRC_PATH)/trace-events config-host.mak $(call quiet-command,sh $(SRC_PATH)/tracetool --$(TRACE_BACKEND) -h < $< > $@," GEN trace.h") @cmp -s $@ trace.h || cp $@ trace.h @@ -120,6 +127,20 @@ trace.c-timestamp: $(SRC_PATH)/trace-events config-host.mak trace.o: trace.c $(GENERATED_HEADERS) +trace-dtrace.h: trace-dtrace.dtrace + $(call quiet-command,dtrace -o $@ -h -s $<, " GEN trace-dtrace.h") + +# Normal practice is to name DTrace probe file with a '.d' extension +# but that gets picked up by QEMU's Makefile as an external dependancy +# rule file. So we use '.dtrace' instead +trace-dtrace.dtrace: trace-dtrace.dtrace-timestamp +trace-dtrace.dtrace-timestamp: $(SRC_PATH)/trace-events config-host.mak + $(call quiet-command,sh $(SRC_PATH)/tracetool --$(TRACE_BACKEND) -d < $< > $@," GEN trace-dtrace.dtrace") + @cmp -s $@ trace-dtrace.dtrace || cp $@ trace-dtrace.dtrace + +trace-dtrace.o: trace-dtrace.dtrace $(GENERATED_HEADERS) + $(call quiet-command,dtrace -o $@ -G -s $<, " GEN trace-dtrace.o") + simpletrace.o: simpletrace.c $(GENERATED_HEADERS) version.o: $(SRC_PATH)/version.rc config-host.mak @@ -157,6 +178,8 @@ clean: rm -f slirp/*.o slirp/*.d audio/*.o audio/*.d block/*.o block/*.d net/*.o net/*.d fsdev/*.o fsdev/*.d ui/*.o ui/*.d rm -f qemu-img-cmds.h rm -f trace.c trace.h trace.c-timestamp trace.h-timestamp + rm -f trace-dtrace.dtrace trace-dtrace.dtrace-timestamp + rm -f trace-dtrace.h trace-dtrace.h-timestamp $(MAKE) -C tests clean for d in $(ALL_SUBDIRS) libhw32 libhw64 libuser libdis libdis-user; do \ if test -d $$d; then $(MAKE) -C $$d $@ || exit 1; fi; \ diff --git a/Makefile.objs b/Makefile.objs index 15569af..23b17ce 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -286,11 +286,15 @@ libdis-$(CONFIG_SPARC_DIS) += sparc-dis.o ## # trace +ifeq ($(TRACE_BACKEND),dtrace) +trace-obj-y = trace-dtrace.o +else trace-obj-y = trace.o ifeq ($(TRACE_BACKEND),simple) trace-obj-y += simpletrace.o user-obj-y += qemu-timer-common.o endif +endif vl.o: QEMU_CFLAGS+=$(GPROF_CFLAGS) diff --git a/configure b/configure index 7025d2b..f8dad3e 100755 --- a/configure +++ b/configure @@ -929,7 +929,7 @@ echo " --enable-docsenable documentation build" echo " --disable-docs disable documentation build" echo " --disable-vhost-net disable vhost-net acceleratio
[Qemu-devel] Add support for SystemTAP and DTrace tracing backends (v6)
An update of the SystemTAP/DTrace patches from http://lists.gnu.org/archive/html/qemu-devel/2010-11/msg00563.html The first patch contains the generic DTrace tracing backend support. The second patch contains additional pieces for SystemTAP to generate a tapset for each of the qemu-system-XXX binaries, to simplify life for admins wanting to use the tracing backend. Changed in v6: - Fix handling of probes with no-args in DTrace provider - Generate one tapset per target, for both system and user emulators - Re-write command line arg parsing in tracetool so it doesn't assume a particular ordering of args - Change '-s' to '--stap' in tracetool to avoid confusion - Pass full binary name into tracetool
[Qemu-devel] Re: [PATCH 2/3] virtio-pci: Use ioeventfd for virtqueue notify
On Fri, Nov 12, 2010 at 9:25 AM, Michael S. Tsirkin wrote: > On Fri, Nov 12, 2010 at 09:18:48AM +, Stefan Hajnoczi wrote: >> On Thu, Nov 11, 2010 at 3:53 PM, Michael S. Tsirkin wrote: >> > On Thu, Nov 11, 2010 at 01:47:21PM +, Stefan Hajnoczi wrote: >> >> Care must be taken not to interfere with vhost-net, which already uses >> >> ioeventfd host notifiers. The following list shows the behavior >> >> implemented in >> >> this patch and is designed to take vhost-net into account: >> >> >> >> * VIRTIO_CONFIG_S_DRIVER_OK -> assign host notifiers, >> >> qemu_set_fd_handler(virtio_pci_host_notifier_read) >> > >> > we should also deassign when VIRTIO_CONFIG_S_DRIVER_OK is cleared >> > by io write or bus master bit? >> >> You're right, I'll fix the lifecycle to trigger symmetrically on >> status bit changes rather than VIRTIO_CONFIG_S_DRIVER_OK/reset. >> >> >> +static void virtio_pci_reset_vdev(VirtIOPCIProxy *proxy) >> >> +{ >> >> + /* Poke virtio device so it deassigns its host notifiers (if any) */ >> >> + virtio_set_status(proxy->vdev, 0); >> > >> > Hmm. virtio_reset already sets status to 0. >> > I guess it should just be fixed to call virtio_set_status? >> >> This part is ugly. The problem is that virtio_reset() calls >> virtio_set_status(vdev, 0) but doesn't give the transport binding a >> chance clean up after the virtio device has cleaned up. Since >> virtio-net will spot status=0 and deassign its host notifier, we need >> to perform our own clean up after vhost. >> >> What makes this slightly less of a hack is the fact that virtio-pci.c >> was already causing virtio_set_status(vdev, 0) to be invoked twice >> during reset. When 0 is written to the VIRTIO_PCI_STATUS register, we >> do virtio_set_status(proxy->vdev, val & 0xFF) and then >> virtio_reset(proxy->vdev). So the status byte callback already gets >> invoked twice today. >> >> I've just split this out into virtio_pci_reset_vdev() and (ab)used it >> to correctly clean up virtqueue ioeventfd. >> >> The alternative is to add another callback from virtio.c so we are >> notified after the vdev's reset callback has finished. > > Oh, likely not worth it. Mabe put the above explanation in the comment. > Will this go away now that you move to set notifiers on status write? For v3 I have switched to a bindings callback. I wish it wasn't necessary but the only other ways I can think of catching status writes are hacks which depend on side-effects too much. Stefan
[Qemu-devel] Re: Cannot not unplug cold-plugged devices
On Fri, Nov 12, 2010 at 01:26:30PM +0200, Michael S. Tsirkin wrote: > No, I am just trying to understand why is hotplug event dangerous. > We still get it if we do device add before starting the VM, right? I'm not sure if it's safe to call enable/disable_device() and pm_update_sci() before starting VM. So I'd like to avoid to call them instead of making sure it. If someone else ensures its safety, I'm willing to eliminate the if clause. -- yamahata
Re: [Qemu-devel] Access to specific isa card with Qemu???
On 11/08/10 17:21, Djamel Hakkar wrote: > Hello, > > We have a software that runs on MS-DOS and must communicate with a specific > card installed on port isa. > We want to use this software in Qemu with a machine that runs XP. > Is it possible to access to the ISA port with Qemu in this case? > > Do we have to do a specific development? > Can you help us in this development, how much would it cost? Right now I don't think there is any support for ISA pass-through, but it could probably be done, even if it sounds pretty scary. There's a lot of risk with ISA cards, they can easily take down the whole system, and if it does DMA you are probably out of luck. You might find someone on the list willing to do the work on a contract, but I will let interested parties reply to you directly. I cannot do it. Cheers, Jes
Re: [Qemu-devel] [PATCH] virtio-9p: fix build on !CONFIG_UTIMENSAT v2
On 11/08/10 07:44, M. Mohan Kumar wrote: >> This patch introduce a fallback mechanism for old systems that do not >> support utimensat. This will fix build failure with following warnings: >> >> hw/virtio-9p-local.c: In function 'local_utimensat': >> hw/virtio-9p-local.c:479: warning: implicit declaration of function >> 'utimensat' hw/virtio-9p-local.c:479: warning: nested extern declaration >> of 'utimensat' >> >> and >> >> hw/virtio-9p.c: In function 'v9fs_setattr_post_chmod': >> hw/virtio-9p.c:1410: error: 'UTIME_NOW' undeclared (first use in this >> function) hw/virtio-9p.c:1410: error: (Each undeclared identifier is >> reported only once hw/virtio-9p.c:1410: error: for each function it >> appears in.) >> hw/virtio-9p.c:1413: error: 'UTIME_OMIT' undeclared (first use in this >> function) hw/virtio-9p.c: In function 'v9fs_wstat_post_chmod': >> hw/virtio-9p.c:2905: error: 'UTIME_OMIT' undeclared (first use in this >> function) >> >> Signed-off-by: Hidetoshi Seto >> --- >> hw/virtio-9p-local.c | 32 ++-- >> hw/virtio-9p.h | 10 ++ >> 2 files changed, 40 insertions(+), 2 deletions(-) >> >> diff --git a/hw/virtio-9p-local.c b/hw/virtio-9p-local.c >> index 0d52020..7811d2c 100644 >> --- a/hw/virtio-9p-local.c >> +++ b/hw/virtio-9p-local.c >> @@ -479,10 +479,38 @@ static int local_chown(FsContext *fs_ctx, const char >> *path, FsCred *credp) return -1; >> } >> >> +/* TODO: relocate this to proper file, and make it more generic */ >> +static int qemu_utimensat(int dirfd, const char *path, >> + const struct timespec *times, int flags) >> +{ > > IMHO, this code can be moved to cutils.c It's not a C utility function, so it really belongs in oslib-posix.c, but otherwise I agree. This is emulation of a C library function, it shouldn't be in the 9p code. Cheers, Jes
Re: [Qemu-devel] Re: [PATCH] Implement a virtio GPU transport
On 10/11/10 17:47, Anthony Liguori wrote: On 11/10/2010 11:22 AM, Ian Molton wrote: Ping ? I think the best way forward is to post patches. I posted links to the git trees. I can post patches, but they are *large*. Do you really want me to post them? To summarize what I was trying to express in the thread, I think this is not the right long term architecture but am not opposed to it as a short term solution. I think having a new virtio device is a bad design choice but am not totally opposed to it. Ok! (I agree (that this should be a short term solution) :) ) you want to go for the path of integration, you're going to have to fix all of the coding style issues and make the code fit into QEMU. Dropping a bunch of junk into target-i386/ is not making the code fit into QEMU. I agree. how about hw/gl for the renderer and hw/ for the virtio module? If you post just what you have now in patch form, I can try to provide more concrete advice ignoring the coding style problems. I can post patches, although I dont think LKML would appreciate the volume! I can post them to the qemu list if you do. -Ian
[Qemu-devel] Re: Cannot not unplug cold-plugged devices
On Fri, Nov 12, 2010 at 06:59:52PM +0900, Isaku Yamahata wrote: > On Fri, Nov 12, 2010 at 11:18:57AM +0200, Michael S. Tsirkin wrote: > > On Fri, Nov 12, 2010 at 04:24:11PM +0900, Isaku Yamahata wrote: > > > On Thu, Nov 11, 2010 at 11:29:39PM -0700, Cam Macdonell wrote: > > > > Hi, > > > > > > > > I was trying to do a "device_del" on my ivshmem device and it won't > > > > work unless the device is added via hotplug. If the device is > > > > coldplugged (added at startup) then nothing happens. I think I > > > > tracked this behaviour to the patch below. > > > > > > > > Is not allowing coldplugged devices to be unplugged the desired > > > > behaviour? > > > > > > Oh, my bad. Does the following patch help? > > > > > > >From 45914b2fc95750b65685dfb98a435f58e38b45ba Mon Sep 17 00:00:00 2001 > > > Message-Id: > > > <45914b2fc95750b65685dfb98a435f58e38b45ba.1289546596.git.yamah...@valinux.co.jp> > > > From: Isaku Yamahata > > > Date: Fri, 12 Nov 2010 16:21:35 +0900 > > > Subject: [PATCH] acpi/piix4: allow pci hotplug for cold plugged device. > > > > > > This patch fixes 5beb8ad503c88a76f2b8106c3b74b4ce485a60e1 > > > reported by Cam Macdonell . > > > Before the change set, cold plugged device can be hot unplugged. > > > This patch unbreaks it. > > > > > > Signed-off-by: Isaku Yamahata > > > > > > > > > --- > > > hw/acpi_piix4.c |4 +++- > > > 1 files changed, 3 insertions(+), 1 deletions(-) > > > > > > diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c > > > index 66c7885..bca330e 100644 > > > --- a/hw/acpi_piix4.c > > > +++ b/hw/acpi_piix4.c > > > @@ -621,8 +621,10 @@ static int piix4_device_hotplug(DeviceState *qdev, > > > PCIDevice *dev, int state) > > > PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev, > > > DO_UPCAST(PCIDevice, qdev, qdev)); > > > > > > -if (!dev->qdev.hotplugged) > > > +/* qemu initialization case */ > > > > That comment is pretty obscure :) > > How about the following? > > Reached here during qemu creating a machine before > VM runs. hot plug event shouldn't be triggered > because it's dangerous. > > > > > > > > +if (state && !dev->qdev.hotplugged) { > > > return 0; > > > +} > > > > Do we even need this check at all? > > Hmm, I suppose you don't like the condition more complex. > How about "if (qdev_hotplug)". > An access function to qdev_hotplug in qdev.c would be desirable. No, I am just trying to understand why is hotplug event dangerous. We still get it if we do device add before starting the VM, right? > > > > > > > > > s->pci0_status.up = 0; > > > s->pci0_status.down = 0; > > > -- > > > 1.7.1.1 > > > > > > > > > > > > -- > > > yamahata > > > > -- > yamahata
Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver
Luiz Capitulino writes: > On Thu, 11 Nov 2010 17:32:06 +0100 > Markus Armbruster wrote: > >> Luiz Capitulino writes: >> >> > On Thu, 11 Nov 2010 16:30:26 +0100 >> > Markus Armbruster wrote: >> > >> >> Luiz Capitulino writes: >> >> >> >> > This driver handles in-memory chardev operations. That's, all writes >> >> > to this driver are stored in an internal buffer and it doesn't talk >> >> > to the external world in any way. >> >> > >> >> > Right now it's very simple: it supports only writes. But it can be >> >> > easily extended to support more operations. >> >> > >> >> > This is going to be used by the monitor's "HMP passthrough via QMP" >> >> > feature, which needs to run monitor handlers without a backing >> >> > device. >> >> > >> >> > Signed-off-by: Luiz Capitulino >> >> > --- >> >> > qemu-char.c | 66 >> >> > +++ >> >> > qemu-char.h |6 + >> >> > 2 files changed, 72 insertions(+), 0 deletions(-) >> >> > >> >> > diff --git a/qemu-char.c b/qemu-char.c >> >> > index 88997f9..896df14 100644 >> >> > --- a/qemu-char.c >> >> > +++ b/qemu-char.c >> >> > @@ -2275,6 +2275,72 @@ static CharDriverState >> >> > *qemu_chr_open_socket(QemuOpts *opts) >> >> > return NULL; >> >> > } >> >> > >> >> > +/***/ >> >> > +/* Memory chardev */ >> >> > +typedef struct { >> >> > +size_t outbuf_size; >> >> > +size_t outbuf_capacity; >> >> > +uint8_t *outbuf; >> >> > +} MemoryDriver; >> >> > + >> >> > +static int mem_chr_write(CharDriverState *chr, const uint8_t *buf, int >> >> > len) >> >> > +{ >> >> > +MemoryDriver *d = chr->opaque; >> >> > + >> >> > +/* TODO: the QString implementation has the same code, we should >> >> > + * introduce a generic way to do this in cutils.c */ >> >> > +if (d->outbuf_capacity < d->outbuf_size + len) { >> >> > +/* grown outbuf */ >> >> >> >> Used to say "grow" (sans n) here. Intentional change? >> > >> > Hum, no. I think I've squashed an older commit while rebasing (but this >> > seems >> > to be the only problem). >> > >> >> >> >> > +d->outbuf_capacity += len; >> >> > +d->outbuf_capacity *= 2; >> >> > +d->outbuf = qemu_realloc(d->outbuf, d->outbuf_capacity); >> >> > +} >> >> > + >> >> > +memcpy(d->outbuf + d->outbuf_size, buf, len); >> >> > +d->outbuf_size += len; >> >> > + >> >> > +return len; >> >> > +} >> >> > + >> >> > +#define DEFAULT_BUF_SIZE 4096 >> >> >> >> It's the *initial* buffer size, isn't it? >> > >> > Yes. >> >> Could we make the name reflect that then? >> >> >> Doubt it's worth a #define (there's just one user), but that's a matter >> >> of taste. >> >> >> >> > + >> >> > +void qemu_chr_init_mem(CharDriverState *chr) >> >> > +{ >> >> > +MemoryDriver *d; >> >> > + >> >> > +d = qemu_malloc(sizeof(*d)); >> >> > +d->outbuf_size = 0; >> >> > +d->outbuf_capacity = DEFAULT_BUF_SIZE; >> >> > +d->outbuf = qemu_mallocz(d->outbuf_capacity); >> >> > + >> >> > +memset(chr, 0, sizeof(*chr)); >> >> > +chr->opaque = d; >> >> > +chr->chr_write = mem_chr_write; >> >> > +} >> >> > + >> >> > +/* assumes the stored data is a string */ >> >> >> >> What else could it be? Worrying about embedded '\0's? >> > >> > Yes, as the driver itself doesn't interpret the contents of its >> > buffer. >> >> What happens if there are embedded '\0's? > > The string will be shorter than expected? And what if it contains > non-printable characters? > > It's just a cautionary comment to help the user identify such problems, I > think > we're making a whole argument about a quite minor thing. When I see "assumes X" in a function comment, I immediately ask "and what happens when !X?" The default answer is "it explodes, so don't do that". That answer is wrong here. Therefore, I find the comment misleading. Let's figure out what really happens. The human command's output is sent to the client as a JSON string (response object member return). JSON strings can consist of Unicode characters, "except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+ through U+001F)" (RFC 4627, section 2.5). Do we escape these characters? Where in the code? >> >> > +QString *qemu_chr_mem_to_qs(CharDriverState *chr) >> >> > +{ >> >> > +MemoryDriver *d = chr->opaque; >> >> > + >> >> > +if (d->outbuf_size == 0) { >> >> > +return NULL; >> >> > +} >> >> >> >> Did you forget to change this? We agreed to return an empty QString >> >> when chr contains an empty string. >> > >> > I've changed my mind and forgot to mention it: I thought that we would >> > need to return NULL on error conditions, but turns out that this function >> > never fails. >> > >> > So, I do think it's better to let it that way for two reasons: >> > >> > 1. An empty has at least the '\0' character, but in this case the buffer >> > is rea
Re: [Qemu-devel] [PATCH 1/3] qemu-char: Introduce Memory driver
Luiz Capitulino writes: > This driver handles in-memory chardev operations. That's, all writes > to this driver are stored in an internal buffer and it doesn't talk > to the external world in any way. > > Right now it's very simple: it supports only writes. But it can be > easily extended to support more operations. > > This is going to be used by the monitor's "HMP passthrough via QMP" > feature, which needs to run monitor handlers without a backing > device. > > Signed-off-by: Luiz Capitulino > --- > qemu-char.c | 70 > +++ > qemu-char.h |7 ++ > 2 files changed, 77 insertions(+), 0 deletions(-) > > diff --git a/qemu-char.c b/qemu-char.c > index 88997f9..36d23c6 100644 > --- a/qemu-char.c > +++ b/qemu-char.c > @@ -2275,6 +2275,76 @@ static CharDriverState *qemu_chr_open_socket(QemuOpts > *opts) > return NULL; > } > > +/***/ > +/* Memory chardev */ > +typedef struct { > +size_t outbuf_size; > +size_t outbuf_capacity; > +uint8_t *outbuf; > +} MemoryDriver; > + > +static int mem_chr_write(CharDriverState *chr, const uint8_t *buf, int len) > +{ > +MemoryDriver *d = chr->opaque; > + > +/* TODO: the QString implementation has the same code, we should > + * introduce a generic way to do this in cutils.c */ > +if (d->outbuf_capacity < d->outbuf_size + len) { > +/* grow outbuf */ > +d->outbuf_capacity += len; > +d->outbuf_capacity *= 2; > +d->outbuf = qemu_realloc(d->outbuf, d->outbuf_capacity); > +} > + > +memcpy(d->outbuf + d->outbuf_size, buf, len); > +d->outbuf_size += len; > + > +return len; > +} > + > +void qemu_chr_init_mem(CharDriverState *chr) > +{ > +MemoryDriver *d; > + > +d = qemu_malloc(sizeof(*d)); > +d->outbuf_size = 0; > +d->outbuf_capacity = 4096; > +d->outbuf = qemu_mallocz(d->outbuf_capacity); > + > +memset(chr, 0, sizeof(*chr)); > +chr->opaque = d; > +chr->chr_write = mem_chr_write; > +} > + > +/* assumes the stored data is a string */ This could indicate a problem. See my reply in the thread for v2. > +QString *qemu_chr_mem_to_qs(CharDriverState *chr) > +{ > +MemoryDriver *d = chr->opaque; > + > +if (d->outbuf_size == 0) { > +return qstring_new(); > +} Why is this necessary? Is qstring_from_substr() broken for empty substrings? If it is, it ought to be fixed! > + > +return qstring_from_substr((char *) d->outbuf, 0, d->outbuf_size - 1); > +} > + > +/* NOTE: this driver can not be closed with qemu_chr_close()! */ > +void qemu_chr_close_mem(CharDriverState *chr) > +{ > +MemoryDriver *d = chr->opaque; > + > +qemu_free(d->outbuf); > +qemu_free(chr->opaque); > +chr->opaque = NULL; > +chr->chr_write = NULL; > +} > + > +size_t qemu_chr_mem_osize(const CharDriverState *chr) > +{ > +const MemoryDriver *d = chr->opaque; > +return d->outbuf_size; > +} > + > QemuOpts *qemu_chr_parse_compat(const char *label, const char *filename) > { > char host[65], port[33], width[8], height[8]; > diff --git a/qemu-char.h b/qemu-char.h > index 18ad12b..e6ee6c4 100644 > --- a/qemu-char.h > +++ b/qemu-char.h > @@ -6,6 +6,7 @@ > #include "qemu-option.h" > #include "qemu-config.h" > #include "qobject.h" > +#include "qstring.h" > > /* character device */ > > @@ -100,6 +101,12 @@ CharDriverState *qemu_chr_open_eventfd(int eventfd); > > extern int term_escape_char; > > +/* memory chardev */ > +void qemu_chr_init_mem(CharDriverState *chr); > +void qemu_chr_close_mem(CharDriverState *chr); > +QString *qemu_chr_mem_to_qs(CharDriverState *chr); > +size_t qemu_chr_mem_osize(const CharDriverState *chr); > + > /* async I/O support */ > > int qemu_set_fd_handler2(int fd,
[Qemu-devel] Re: [PATCH] scsi-disk: Move active request asserts
Am 12.11.2010 10:57, schrieb Stefan Hajnoczi: > SCSI read/write requests should not be re-issued before the current > fragment of I/O completes. There are asserts in scsi-disk.c that guard > this constraint but they trigger on SPARC Linux 2.4. It turns out that > the asserts are too early in the code path and don't allow for read > requests to terminate. > > Only the read assert needs to be moved but move the write assert too for > consistency. > > Reported-by: Nigel Horne > Signed-off-by: Stefan Hajnoczi Thanks, applied to the block branch. Kevin
[Qemu-devel] Re: Cannot not unplug cold-plugged devices
On Fri, Nov 12, 2010 at 11:18:57AM +0200, Michael S. Tsirkin wrote: > On Fri, Nov 12, 2010 at 04:24:11PM +0900, Isaku Yamahata wrote: > > On Thu, Nov 11, 2010 at 11:29:39PM -0700, Cam Macdonell wrote: > > > Hi, > > > > > > I was trying to do a "device_del" on my ivshmem device and it won't > > > work unless the device is added via hotplug. If the device is > > > coldplugged (added at startup) then nothing happens. I think I > > > tracked this behaviour to the patch below. > > > > > > Is not allowing coldplugged devices to be unplugged the desired behaviour? > > > > Oh, my bad. Does the following patch help? > > > > >From 45914b2fc95750b65685dfb98a435f58e38b45ba Mon Sep 17 00:00:00 2001 > > Message-Id: > > <45914b2fc95750b65685dfb98a435f58e38b45ba.1289546596.git.yamah...@valinux.co.jp> > > From: Isaku Yamahata > > Date: Fri, 12 Nov 2010 16:21:35 +0900 > > Subject: [PATCH] acpi/piix4: allow pci hotplug for cold plugged device. > > > > This patch fixes 5beb8ad503c88a76f2b8106c3b74b4ce485a60e1 > > reported by Cam Macdonell . > > Before the change set, cold plugged device can be hot unplugged. > > This patch unbreaks it. > > > > Signed-off-by: Isaku Yamahata > > > > > --- > > hw/acpi_piix4.c |4 +++- > > 1 files changed, 3 insertions(+), 1 deletions(-) > > > > diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c > > index 66c7885..bca330e 100644 > > --- a/hw/acpi_piix4.c > > +++ b/hw/acpi_piix4.c > > @@ -621,8 +621,10 @@ static int piix4_device_hotplug(DeviceState *qdev, > > PCIDevice *dev, int state) > > PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev, > > DO_UPCAST(PCIDevice, qdev, qdev)); > > > > -if (!dev->qdev.hotplugged) > > +/* qemu initialization case */ > > That comment is pretty obscure :) How about the following? Reached here during qemu creating a machine before VM runs. hot plug event shouldn't be triggered because it's dangerous. > > > > +if (state && !dev->qdev.hotplugged) { > > return 0; > > +} > > Do we even need this check at all? Hmm, I suppose you don't like the condition more complex. How about "if (qdev_hotplug)". An access function to qdev_hotplug in qdev.c would be desirable. > > > > > s->pci0_status.up = 0; > > s->pci0_status.down = 0; > > -- > > 1.7.1.1 > > > > > > > > -- > > yamahata > -- yamahata
[Qemu-devel] Re: [patch] fix scsi-generic
On 08/09/2010 01:51 AM, adq wrote: Figured out what the problem is - READ DVD STRUCTURE has its xfer length in an unexpected place, so hw/scsi-bus.c retrieves completely the wrong value for the transfer length. Attached nasty hacky (!) patch fixes it as a proof of concept, will see what I can do to clean it up later. I'd probably want it to warn if it sees SCSI commands it doesn't know how to explicitly handle to aid debugging this sort of thing in future. Hi Andrew, are you going to submit a similar patch in definitive form? Paolo
[Qemu-devel] [PATCH 3/4] pci: fix accesses to pci status register
pci status register is 16 bit, not 8 bit. So use helper function to manipulate status register. Signed-off-by: Isaku Yamahata --- hw/pci.c | 21 + 1 files changed, 13 insertions(+), 8 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index a734e14..a16e763 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -127,9 +127,11 @@ static void pci_change_irq_level(PCIDevice *pci_dev, int irq_num, int change) static void pci_update_irq_status(PCIDevice *dev) { if (dev->irq_state) { -dev->config[PCI_STATUS] |= PCI_STATUS_INTERRUPT; +pci_word_test_and_set_mask(dev->config + PCI_STATUS, + PCI_STATUS_INTERRUPT); } else { -dev->config[PCI_STATUS] &= ~PCI_STATUS_INTERRUPT; +pci_word_test_and_clear_mask(dev->config + PCI_STATUS, + PCI_STATUS_INTERRUPT); } } @@ -413,7 +415,7 @@ void pci_device_save(PCIDevice *s, QEMUFile *f) * in irq_state which we are saving. * This makes us compatible with old devices * which never set or clear this bit. */ -s->config[PCI_STATUS] &= ~PCI_STATUS_INTERRUPT; +pci_word_test_and_clear_mask(s->config + PCI_STATUS, PCI_STATUS_INTERRUPT); vmstate_save_state(f, pci_get_vmstate(s), s); /* Restore the interrupt status bit. */ pci_update_irq_status(s); @@ -626,7 +628,7 @@ static void pci_init_cmask(PCIDevice *dev) { pci_set_word(dev->cmask + PCI_VENDOR_ID, 0x); pci_set_word(dev->cmask + PCI_DEVICE_ID, 0x); -dev->cmask[PCI_STATUS] = PCI_STATUS_CAP_LIST; +pci_set_word(dev->cmask + PCI_STATUS, PCI_STATUS_CAP_LIST); dev->cmask[PCI_REVISION_ID] = 0xff; dev->cmask[PCI_CLASS_PROG] = 0xff; pci_set_word(dev->cmask + PCI_CLASS_DEVICE, 0x); @@ -1793,8 +1795,9 @@ static uint8_t pci_find_capability_list(PCIDevice *pdev, uint8_t cap_id, { uint8_t next, prev; -if (!(pdev->config[PCI_STATUS] & PCI_STATUS_CAP_LIST)) +if (!(pci_get_word(pdev->config + PCI_STATUS) & PCI_STATUS_CAP_LIST)) { return 0; +} for (prev = PCI_CAPABILITY_LIST; (next = pdev->config[prev]); prev = next + PCI_CAP_LIST_NEXT) @@ -1900,7 +1903,7 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, config[PCI_CAP_LIST_ID] = cap_id; config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST]; pdev->config[PCI_CAPABILITY_LIST] = offset; -pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST; +pci_word_test_and_set_mask(pdev->config + PCI_STATUS, PCI_STATUS_CAP_LIST); memset(pdev->used + offset, 0xFF, size); /* Make capability read-only by default */ memset(pdev->wmask + offset, 0, size); @@ -1923,8 +1926,10 @@ void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size) memset(pdev->cmask + offset, 0, size); memset(pdev->used + offset, 0, size); -if (!pdev->config[PCI_CAPABILITY_LIST]) -pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST; +if (!pdev->config[PCI_CAPABILITY_LIST]) { +pci_word_test_and_clear_mask(pdev->config + PCI_STATUS, + PCI_STATUS_CAP_LIST); +} } /* Reserve space for capability at a known offset (to call after load). */ -- 1.7.1.1
[Qemu-devel] [PATCH] scsi-disk: Move active request asserts
SCSI read/write requests should not be re-issued before the current fragment of I/O completes. There are asserts in scsi-disk.c that guard this constraint but they trigger on SPARC Linux 2.4. It turns out that the asserts are too early in the code path and don't allow for read requests to terminate. Only the read assert needs to be moved but move the write assert too for consistency. Reported-by: Nigel Horne Signed-off-by: Stefan Hajnoczi --- hw/scsi-disk.c | 12 ++-- 1 files changed, 6 insertions(+), 6 deletions(-) diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c index dc71957..7d85899 100644 --- a/hw/scsi-disk.c +++ b/hw/scsi-disk.c @@ -170,6 +170,9 @@ static void scsi_read_request(SCSIDiskReq *r) return; } +/* No data transfer may already be in progress */ +assert(r->req.aiocb == NULL); + n = r->sector_count; if (n > SCSI_DMA_BUF_SIZE / 512) n = SCSI_DMA_BUF_SIZE / 512; @@ -197,9 +200,6 @@ static void scsi_read_data(SCSIDevice *d, uint32_t tag) return; } -/* No data transfer may already be in progress */ -assert(r->req.aiocb == NULL); - scsi_read_request(r); } @@ -269,6 +269,9 @@ static void scsi_write_request(SCSIDiskReq *r) SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev); uint32_t n; +/* No data transfer may already be in progress */ +assert(r->req.aiocb == NULL); + n = r->iov.iov_len / 512; if (n) { qemu_iovec_init_external(&r->qiov, &r->iov, 1); @@ -298,9 +301,6 @@ static int scsi_write_data(SCSIDevice *d, uint32_t tag) return 1; } -/* No data transfer may already be in progress */ -assert(r->req.aiocb == NULL); - scsi_write_request(r); return 0; -- 1.7.2.3
[Qemu-devel] [PATCH 2/4] pci: clean up pci command register io/memory bit initialization
This patch fixes the initialization of io/memory bit of command register. Those bits for type 1 device is RW. Those bits for type 0 device is RO = 0 if it has no io/memory BAR RW if it has io/memory BAR Signed-off-by: Isaku Yamahata --- hw/pci.c | 20 +--- 1 files changed, 17 insertions(+), 3 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 86900a2..a734e14 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -644,10 +644,14 @@ static void pci_init_wmask(PCIDevice *dev) /* * bit 0: PCI_COMMAND_IO *type 0: if IO BAR is used, RW - *type 1: RW + *This is handled by pci_register_bar() + *type 1: RW: + *This is fixed by pci_init_wmask_bridge() * bit 1: PCI_COMMAND_MEMORY *type 0: if IO BAR is used, RW + *This is handled by pci_register_bar() *type 1: RW + *This is fixed by pci_init_wmask_bridge() * bit 2: PCI_COMMAND_MASTER *type 0: RW if bus master *type 1: RW @@ -682,8 +686,7 @@ static void pci_init_wmask(PCIDevice *dev) * bit 11-15: reserved */ pci_set_word(dev->wmask + PCI_COMMAND, - PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER | - PCI_COMMAND_PARITY | PCI_COMMAND_SERR | + PCI_COMMAND_MASTER | PCI_COMMAND_PARITY | PCI_COMMAND_SERR | PCI_COMMAND_INTX_DISABLE); memset(dev->wmask + PCI_CONFIG_HEADER_SIZE, 0xff, @@ -692,6 +695,9 @@ static void pci_init_wmask(PCIDevice *dev) static void pci_init_wmask_bridge(PCIDevice *d) { +pci_word_test_and_set_mask(d->wmask + PCI_COMMAND, + PCI_COMMAND_IO | PCI_COMMAND_MEMORY); + /* PCI_PRIMARY_BUS, PCI_SECONDARY_BUS, PCI_SUBORDINATE_BUS and PCI_SEC_LETENCY_TIMER */ memset(d->wmask + PCI_PRIMARY_BUS, 0xff, 4); @@ -929,6 +935,14 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num, if (region_num == PCI_ROM_SLOT) { /* ROM enable bit is writeable */ wmask |= PCI_ROM_ADDRESS_ENABLE; +} else { +if (r->type & PCI_BASE_ADDRESS_SPACE_IO) { +pci_word_test_and_set_mask(pci_dev->wmask + PCI_COMMAND, + PCI_COMMAND_IO); +} else { +pci_word_test_and_set_mask(pci_dev->wmask + PCI_COMMAND, + PCI_COMMAND_MEMORY); +} } pci_set_long(pci_dev->config + addr, type); if (!(r->type & PCI_BASE_ADDRESS_SPACE_IO) && -- 1.7.1.1
[Qemu-devel] [PATCH 1/4] pci: revise pci command register initialization
This patch cleans up command register initialization with comments. Signed-off-by: Isaku Yamahata --- hw/pci.c | 42 ++ 1 files changed, 42 insertions(+), 0 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index 8b79ad6..86900a2 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -640,8 +640,50 @@ static void pci_init_wmask(PCIDevice *dev) dev->wmask[PCI_CACHE_LINE_SIZE] = 0xff; dev->wmask[PCI_INTERRUPT_LINE] = 0xff; + +/* + * bit 0: PCI_COMMAND_IO + *type 0: if IO BAR is used, RW + *type 1: RW + * bit 1: PCI_COMMAND_MEMORY + *type 0: if IO BAR is used, RW + *type 1: RW + * bit 2: PCI_COMMAND_MASTER + *type 0: RW if bus master + *type 1: RW + * bit 3: PCI_COMMAND_SPECIAL + *RO=0, optionally RW: Such device should set this bit itself + * bit 4: PCI_COMMAND_INVALIDATE + *RO=0, optionally RW: Such device should set this bit itself + * bit 5: PCI_COMMAND_VGA_PALETTE + *RO=0, optionally RW: Such device should set this bit itself + * bit 6: PCI_COMMAND_PARITY + *RW with exceptions: Such device should clear this bit itself + *Given that qemu doesn't emulate pci bus cycles, so that there + *is no place to generate parity error. So just make this + *register RW is safe because there is no place which refers + *this bit. + *TODO: When device assignment tried to inject PERR# into qemu, + * some extra work would be needed. + * bit 7: PCI_COMMAND_WAIT: reserved (PCI 3.0) + *RO=0 + * bit 8: PCI_COMMAND_SERR + *RW with exceptions: Such device should clear this bit itself + *Given that qemu doesn't emulate pci bus cycles, so that there + *is no place to generate system error. So just make this + *register RW is safe because there is no place which refers + *this bit. + *TODO: When device assignment tried to inject SERR# into qemu, + * some extra work would be needed. + * bit 9: PCI_COMMAND_FAST_BACK + *RO=0, optionally RW: Such device should set this bit itself + * bit 10: PCI_COMMAND_INTX_DISABLE + * RW + * bit 11-15: reserved + */ pci_set_word(dev->wmask + PCI_COMMAND, PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER | + PCI_COMMAND_PARITY | PCI_COMMAND_SERR | PCI_COMMAND_INTX_DISABLE); memset(dev->wmask + PCI_CONFIG_HEADER_SIZE, 0xff, -- 1.7.1.1
[Qemu-devel] [PATCH 0/4] pci: clean up of command/status register
This patch set cleans up of initialization of pci command/status register. Isaku Yamahata (4): pci: revise pci command register initialization pci: clean up pci command register io/memory bit initialization pci: fix accesses to pci status register pci: clean up of pci status register hw/pci.c | 119 -- 1 files changed, 108 insertions(+), 11 deletions(-)
[Qemu-devel] [PATCH 4/4] pci: clean up of pci status register
This patch refine the initialization/reset of pci status registers. Signed-off-by: Isaku Yamahata --- hw/pci.c | 40 ++-- 1 files changed, 38 insertions(+), 2 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index a16e763..184db6c 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -154,6 +154,9 @@ static void pci_device_reset(PCIDevice *dev) pci_word_test_and_clear_mask(dev->config + PCI_COMMAND, pci_get_word(dev->wmask + PCI_COMMAND) | pci_get_word(dev->w1cmask + PCI_COMMAND)); +pci_word_test_and_clear_mask(dev->config + PCI_STATUS, + pci_get_word(dev->wmask + PCI_STATUS) | + pci_get_word(dev->w1cmask + PCI_STATUS)); dev->config[PCI_CACHE_LINE_SIZE] = 0x0; dev->config[PCI_INTERRUPT_LINE] = 0x0; for (r = 0; r < PCI_NUM_REGIONS; ++r) { @@ -636,7 +639,7 @@ static void pci_init_cmask(PCIDevice *dev) dev->cmask[PCI_CAPABILITY_LIST] = 0xff; } -static void pci_init_wmask(PCIDevice *dev) +static void pci_init_wmask_w1cmask(PCIDevice *dev) { int config_size = pci_config_size(dev); @@ -691,6 +694,39 @@ static void pci_init_wmask(PCIDevice *dev) PCI_COMMAND_MASTER | PCI_COMMAND_PARITY | PCI_COMMAND_SERR | PCI_COMMAND_INTX_DISABLE); +/* + * bit 0-2: reserved + * bit 3: PCI_STATUS_INTERRUPT: RO + * bit 4: PCI_STATUS_CAP_LIST: RO + * bit 5: PCI_STATUS_66MHZ: RO + * bit 6: PCI_STATUS_UDF: reserved (PCI 2.2-) + * bit 7: PCI_STATUS_FAST_BACK: RO + * bit 8: PCI_STATUS_PARITY + *type 0: RW for bus master + *type 1: RW1C + * bit 9-10: PCI_STATUS_DEVSEL: RO + * bit 11: PCI_STATUS_SIG_TARGET_ABORT + * type 0: RW1C for targets that is capable of terminating + * a transaction. + * type 1: RW1C + * bit 12: PCI_STATUS_REC_TARGET_ABORT + * type 0: RW1C for masters + * type 1: RW1C + * bit 13: PCI_STATUS_REC_MASTER_ABORT + * type 0: RW1C for masters + * type 1: RW1C + * bit 14: PCI_STATUS_SIG_SYSTEM_ERROR + * type 0: RW1C with execptions + * type 1: RW1C + * bit : PCI_STATUS_DETECTED_PARITY: RW1C + * + * It's safe to set w1mask even for RO. + */ +pci_set_word(dev->w1cmask + PCI_STATUS, + PCI_STATUS_PARITY | PCI_STATUS_SIG_TARGET_ABORT | + PCI_STATUS_REC_TARGET_ABORT | PCI_STATUS_REC_MASTER_ABORT | + PCI_STATUS_SIG_SYSTEM_ERROR | PCI_STATUS_DETECTED_PARITY); + memset(dev->wmask + PCI_CONFIG_HEADER_SIZE, 0xff, config_size - PCI_CONFIG_HEADER_SIZE); } @@ -821,7 +857,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus, pci_set_default_subsystem_id(pci_dev); } pci_init_cmask(pci_dev); -pci_init_wmask(pci_dev); +pci_init_wmask_w1cmask(pci_dev); if (is_bridge) { pci_init_wmask_bridge(pci_dev); } -- 1.7.1.1
Re: [Qemu-devel] No Virtual Console
Yes - that's it! The SDL window. I can't get it to show up ... any ideas? Thanks! On Fri, Nov 12, 2010 00:18 AM, Mulyadi Santosa wrote: > Hi... > > On Fri, Nov 12, 2010 at 03:30, Russell Morris wrote: > > Hi, > > > > > > > > Yes, I am in runlevel 5. I have to admit, I did check /etc/inittab, but I'm > > not sure what I'm looking for ... :-(. A bit lost here as to what you're > > saying, sorry! Can you clarify a bit? > > i mean, do you something like below in your /etc/inittab? > 1:2345:respawn:/sbin/mingetty tty1 > > it could be tty2, tty3 and so on...it represents the console number. > So it reads "in console 1, in run level 2 up to 5, execute mingetty > and make it servicing tty1" > > > Yes, VNC works - I was trying to say that, just not very clearly. What I'm > > after though is to have a separate window open when I launch qemu, which is > > the "target" OS ... make sense? > > I think what you mean here is the standart graphical window, right? > the SDL one, right? > > -- > regards, > > Mulyadi Santosa > Freelance Linux trainer and consultant > > blog: the-hydra.blogspot.com > training: mulyaditraining.blogspot.com > >
[Qemu-devel] (no subject)
[Qemu-devel] [Bug 671831] Re: Sparc guest assert error
** Changed in: qemu Status: New => In Progress ** Changed in: qemu Assignee: (unassigned) => Stefan Hajnoczi (stefanha) -- Sparc guest assert error https://bugs.launchpad.net/bugs/671831 You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. Status in QEMU: In Progress Bug description: The latest version in git (d33ea50a958b2e050d2b28e5f17e3b55e91c6d74) crashes with an assert error when booting a Sparc/Linux guest. The last time I tried it (about a week ago) it worked fine. Yesterdai, I did a git pull, make clean, reran configure and compiled. Host OS: Debian Linux/x86_64 5.0 C Compiler: 4.4.5 Guest OS: Linux/Sparc (2.4) Command Line: qemu-system-sparc -hda ~njh/qemu/sparc/debian.img -nographic -m 128 Build Configure: ./configure --enable-linux-aio --enable-io-thread --enable-kvm GIT commit: d33ea50a958b2e050d2b28e5f17e3b55e91c6d74 Output: Adding Swap: 122532k swap-space (priority -1) . Will now check root file system:fsck 1.40-WIP (14-Nov-2006) [/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a -C0 /dev/sda2 qemu-system-sparc: /home/njh/src/qemu/hw/scsi-disk.c:201: scsi_read_data: Assertion `r->req.aiocb == ((void *)0)' failed. It crashes in the same place every time. (gdb) thread apply all bt: Thread 3 (Thread 17643): #0 0x7f4db21bc8d3 in select () at ../sysdeps/unix/syscall-template.S:82 #1 0x004d02c4 in main_loop_wait (nonblocking=) at /home/njh/src/qemu/vl.c:1246 #2 0x004d0e57 in main_loop (argc=, argv=, envp=) at /home/njh/src/qemu/vl.c:1309 #3 main (argc=, argv=, envp=) at /home/njh/src/qemu/vl.c:2999 Thread 2 (Thread 17645): #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211 #1 0x0042450b in cond_timedwait (unused=) at posix-aio-compat.c:104 #2 aio_thread (unused=) at posix-aio-compat.c:325 #3 0x7f4db3b818ba in start_thread (arg=) at pthread_create.c:300 #4 0x7f4db21c302d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112 #5 0x in ?? () Current language: auto The current source language is "auto; currently asm". Thread 1 (Thread 17644): #0 0x7f4db2126165 in *__GI_raise (sig=) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x7f4db2128f70 in *__GI_abort () at abort.c:92 #2 0x7f4db211f2b1 in *__GI___assert_fail ( assertion=0x52690a "r->req.aiocb == ((void *)0)", file=, line=201, function=0x527480 "scsi_read_data") at assert.c:81 #3 0x0044f363 in scsi_read_data (d=, tag=0) at /home/njh/src/qemu/hw/scsi-disk.c:201 #4 0x004ebd6c in esp_do_dma (s=0x20679d0) at /home/njh/src/qemu/hw/esp.c:377 #5 0x004ec781 in handle_ti (opaque=0x20679d0, addr=, val=) at /home/njh/src/qemu/hw/esp.c:443 #6 esp_mem_writeb (opaque=0x20679d0, addr=, val=) at /home/njh/src/qemu/hw/esp.c:595 #7 0x41b2d971 in ?? () #8 0x in ?? () #9 0x031ad000 in ?? () #10 0x000301adfa20 in ?? () #11 0x1007 in ?? () #12 0x7f4daf80e8a0 in ?? () #13 0x0001 in ?? () #14 0x in ?? ()
[Qemu-devel] Re: [PATCH 2/3] virtio-pci: Use ioeventfd for virtqueue notify
On Fri, Nov 12, 2010 at 09:18:48AM +, Stefan Hajnoczi wrote: > On Thu, Nov 11, 2010 at 3:53 PM, Michael S. Tsirkin wrote: > > On Thu, Nov 11, 2010 at 01:47:21PM +, Stefan Hajnoczi wrote: > >> Care must be taken not to interfere with vhost-net, which already uses > >> ioeventfd host notifiers. The following list shows the behavior > >> implemented in > >> this patch and is designed to take vhost-net into account: > >> > >> * VIRTIO_CONFIG_S_DRIVER_OK -> assign host notifiers, > >> qemu_set_fd_handler(virtio_pci_host_notifier_read) > > > > we should also deassign when VIRTIO_CONFIG_S_DRIVER_OK is cleared > > by io write or bus master bit? > > You're right, I'll fix the lifecycle to trigger symmetrically on > status bit changes rather than VIRTIO_CONFIG_S_DRIVER_OK/reset. > > >> +static void virtio_pci_reset_vdev(VirtIOPCIProxy *proxy) > >> +{ > >> + /* Poke virtio device so it deassigns its host notifiers (if any) */ > >> + virtio_set_status(proxy->vdev, 0); > > > > Hmm. virtio_reset already sets status to 0. > > I guess it should just be fixed to call virtio_set_status? > > This part is ugly. The problem is that virtio_reset() calls > virtio_set_status(vdev, 0) but doesn't give the transport binding a > chance clean up after the virtio device has cleaned up. Since > virtio-net will spot status=0 and deassign its host notifier, we need > to perform our own clean up after vhost. > > What makes this slightly less of a hack is the fact that virtio-pci.c > was already causing virtio_set_status(vdev, 0) to be invoked twice > during reset. When 0 is written to the VIRTIO_PCI_STATUS register, we > do virtio_set_status(proxy->vdev, val & 0xFF) and then > virtio_reset(proxy->vdev). So the status byte callback already gets > invoked twice today. > > I've just split this out into virtio_pci_reset_vdev() and (ab)used it > to correctly clean up virtqueue ioeventfd. > > The alternative is to add another callback from virtio.c so we are > notified after the vdev's reset callback has finished. Oh, likely not worth it. Mabe put the above explanation in the comment. Will this go away now that you move to set notifiers on status write? > >> @@ -223,10 +322,16 @@ static void virtio_ioport_write(void *opaque, > >> uint32_t addr, uint32_t val) > >> virtio_queue_notify(vdev, val); > >> break; > >> case VIRTIO_PCI_STATUS: > >> - virtio_set_status(vdev, val & 0xFF); > >> - if (vdev->status == 0) { > >> - virtio_reset(proxy->vdev); > >> - msix_unuse_all_vectors(&proxy->pci_dev); > >> + if ((val & VIRTIO_CONFIG_S_DRIVER_OK) && > >> + !(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) && > >> + (proxy->flags & VIRTIO_PCI_FLAG_USE_IOEVENTFD)) { > >> + virtio_pci_set_host_notifiers(proxy, true); > >> + } > > > > So we set host notifiers to true from here, but to false > > only on reset? This seems strange. Should not we disable > > notifiers when driver clears OK status? > > How about on bus master disable? > > You're right, this needs to be fixed. > > >> @@ -714,6 +803,8 @@ static PCIDeviceInfo virtio_info[] = { > >> .exit = virtio_net_exit_pci, > >> .romfile = "pxe-virtio.bin", > >> .qdev.props = (Property[]) { > >> + DEFINE_PROP_UINT32("flags", VirtIOPCIProxy, flags, > >> + VIRTIO_PCI_FLAG_USE_IOEVENTFD), > >> DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 3), > >> DEFINE_VIRTIO_NET_FEATURES(VirtIOPCIProxy, host_features), > >> DEFINE_NIC_PROPERTIES(VirtIOPCIProxy, nic), > > > > This ties interface to an internal macro value. Further, user gets to > > tweak other fields in this integer which we don't want. Finally, the > > interface is extremely unfriendly. > > Please use a bit property instead: DEFINE_PROP_BIT. > > Will fix in v3. > > >> diff --git a/hw/virtio.c b/hw/virtio.c > >> index a2a657e..f588e29 100644 > >> --- a/hw/virtio.c > >> +++ b/hw/virtio.c > >> @@ -582,6 +582,11 @@ void virtio_queue_notify(VirtIODevice *vdev, int n) > >> } > >> } > >> > >> +void virtio_queue_notify_vq(VirtQueue *vq) > >> +{ > >> + virtio_queue_notify(vq->vdev, vq - vq->vdev->vq); > > > > Let's implement virtio_queue_notify in terms of virtio_queue_notify_vq. > > Not the other way around. > > Will fix in v3. > > Stefan
Re: [Qemu-devel] Re: [PATCH 2/3] virtio-pci: Use ioeventfd for virtqueue notify
On Thu, Nov 11, 2010 at 4:45 PM, Christoph Hellwig wrote: > On Thu, Nov 11, 2010 at 01:47:21PM +, Stefan Hajnoczi wrote: >> Some virtio devices are known to have guest drivers which expect a notify to >> be >> processed synchronously and spin waiting for completion. Only enable >> ioeventfd >> for virtio-blk and virtio-net for now. > > Who guarantees that less common virtio-blk and virtio-net guest drivers > for non-Linux OSes are fine with it? Maybe you should add a feature flag > that the guest has to ACK to enable it. Virtio-blk and virtio-net are fine. Both of those devices are expected to operate asynchronously. SeaBIOS and gPXE virtio-net drivers spin but they expect to and it is okay in those environments. They already burn CPU today. Virtio-console expects synchronous virtqueue kick. In Linux, virtio_console.c __send_control_msg() and send_buf() will spin. Qemu userspace is able to complete those requests synchronously so that the guest never actually burns CPU (e.g. hw/virtio-serial-bus.c:send_control_msg()). I don't want to burn CPU in places where we previously didn't. It's good that QEMU can decide whether or not to handle virtqueue kick in the vcpu thread. For high performance asynchronous devices like virtio-net and virtio-blk it makes sense to use ioeventfd. For others it may not be useful. I'm not sure a feature bit that exposes this detail to the guest would be useful. Stefan
[Qemu-devel] Re: [PATCH 6/8] device-assignment: Move PCI capabilities to match physical hardware
On Thu, Nov 11, 2010 at 07:56:13PM -0700, Alex Williamson wrote: > Now that common PCI code doesn't have a hangup on capabilities > being contiguous, Hmm, this comment confused me : there's no requirement of contigious allocations in current code in pci.c, is there? -- MST
[Qemu-devel] Re: Cannot not unplug cold-plugged devices
On Fri, Nov 12, 2010 at 04:24:11PM +0900, Isaku Yamahata wrote: > On Thu, Nov 11, 2010 at 11:29:39PM -0700, Cam Macdonell wrote: > > Hi, > > > > I was trying to do a "device_del" on my ivshmem device and it won't > > work unless the device is added via hotplug. If the device is > > coldplugged (added at startup) then nothing happens. I think I > > tracked this behaviour to the patch below. > > > > Is not allowing coldplugged devices to be unplugged the desired behaviour? > > Oh, my bad. Does the following patch help? > > >From 45914b2fc95750b65685dfb98a435f58e38b45ba Mon Sep 17 00:00:00 2001 > Message-Id: > <45914b2fc95750b65685dfb98a435f58e38b45ba.1289546596.git.yamah...@valinux.co.jp> > From: Isaku Yamahata > Date: Fri, 12 Nov 2010 16:21:35 +0900 > Subject: [PATCH] acpi/piix4: allow pci hotplug for cold plugged device. > > This patch fixes 5beb8ad503c88a76f2b8106c3b74b4ce485a60e1 > reported by Cam Macdonell . > Before the change set, cold plugged device can be hot unplugged. > This patch unbreaks it. > > Signed-off-by: Isaku Yamahata > --- > hw/acpi_piix4.c |4 +++- > 1 files changed, 3 insertions(+), 1 deletions(-) > > diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c > index 66c7885..bca330e 100644 > --- a/hw/acpi_piix4.c > +++ b/hw/acpi_piix4.c > @@ -621,8 +621,10 @@ static int piix4_device_hotplug(DeviceState *qdev, > PCIDevice *dev, int state) > PIIX4PMState *s = DO_UPCAST(PIIX4PMState, dev, > DO_UPCAST(PCIDevice, qdev, qdev)); > > -if (!dev->qdev.hotplugged) > +/* qemu initialization case */ That comment is pretty obscure :) > +if (state && !dev->qdev.hotplugged) { > return 0; > +} Do we even need this check at all? > > s->pci0_status.up = 0; > s->pci0_status.down = 0; > -- > 1.7.1.1 > > > > -- > yamahata
[Qemu-devel] Re: [PATCH 2/3] virtio-pci: Use ioeventfd for virtqueue notify
On Thu, Nov 11, 2010 at 3:53 PM, Michael S. Tsirkin wrote: > On Thu, Nov 11, 2010 at 01:47:21PM +, Stefan Hajnoczi wrote: >> Care must be taken not to interfere with vhost-net, which already uses >> ioeventfd host notifiers. The following list shows the behavior implemented >> in >> this patch and is designed to take vhost-net into account: >> >> * VIRTIO_CONFIG_S_DRIVER_OK -> assign host notifiers, >> qemu_set_fd_handler(virtio_pci_host_notifier_read) > > we should also deassign when VIRTIO_CONFIG_S_DRIVER_OK is cleared > by io write or bus master bit? You're right, I'll fix the lifecycle to trigger symmetrically on status bit changes rather than VIRTIO_CONFIG_S_DRIVER_OK/reset. >> +static void virtio_pci_reset_vdev(VirtIOPCIProxy *proxy) >> +{ >> + /* Poke virtio device so it deassigns its host notifiers (if any) */ >> + virtio_set_status(proxy->vdev, 0); > > Hmm. virtio_reset already sets status to 0. > I guess it should just be fixed to call virtio_set_status? This part is ugly. The problem is that virtio_reset() calls virtio_set_status(vdev, 0) but doesn't give the transport binding a chance clean up after the virtio device has cleaned up. Since virtio-net will spot status=0 and deassign its host notifier, we need to perform our own clean up after vhost. What makes this slightly less of a hack is the fact that virtio-pci.c was already causing virtio_set_status(vdev, 0) to be invoked twice during reset. When 0 is written to the VIRTIO_PCI_STATUS register, we do virtio_set_status(proxy->vdev, val & 0xFF) and then virtio_reset(proxy->vdev). So the status byte callback already gets invoked twice today. I've just split this out into virtio_pci_reset_vdev() and (ab)used it to correctly clean up virtqueue ioeventfd. The alternative is to add another callback from virtio.c so we are notified after the vdev's reset callback has finished. >> @@ -223,10 +322,16 @@ static void virtio_ioport_write(void *opaque, uint32_t >> addr, uint32_t val) >> virtio_queue_notify(vdev, val); >> break; >> case VIRTIO_PCI_STATUS: >> - virtio_set_status(vdev, val & 0xFF); >> - if (vdev->status == 0) { >> - virtio_reset(proxy->vdev); >> - msix_unuse_all_vectors(&proxy->pci_dev); >> + if ((val & VIRTIO_CONFIG_S_DRIVER_OK) && >> + !(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK) && >> + (proxy->flags & VIRTIO_PCI_FLAG_USE_IOEVENTFD)) { >> + virtio_pci_set_host_notifiers(proxy, true); >> + } > > So we set host notifiers to true from here, but to false > only on reset? This seems strange. Should not we disable > notifiers when driver clears OK status? > How about on bus master disable? You're right, this needs to be fixed. >> @@ -714,6 +803,8 @@ static PCIDeviceInfo virtio_info[] = { >> .exit = virtio_net_exit_pci, >> .romfile = "pxe-virtio.bin", >> .qdev.props = (Property[]) { >> + DEFINE_PROP_UINT32("flags", VirtIOPCIProxy, flags, >> + VIRTIO_PCI_FLAG_USE_IOEVENTFD), >> DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 3), >> DEFINE_VIRTIO_NET_FEATURES(VirtIOPCIProxy, host_features), >> DEFINE_NIC_PROPERTIES(VirtIOPCIProxy, nic), > > This ties interface to an internal macro value. Further, user gets to > tweak other fields in this integer which we don't want. Finally, the > interface is extremely unfriendly. > Please use a bit property instead: DEFINE_PROP_BIT. Will fix in v3. >> diff --git a/hw/virtio.c b/hw/virtio.c >> index a2a657e..f588e29 100644 >> --- a/hw/virtio.c >> +++ b/hw/virtio.c >> @@ -582,6 +582,11 @@ void virtio_queue_notify(VirtIODevice *vdev, int n) >> } >> } >> >> +void virtio_queue_notify_vq(VirtQueue *vq) >> +{ >> + virtio_queue_notify(vq->vdev, vq - vq->vdev->vq); > > Let's implement virtio_queue_notify in terms of virtio_queue_notify_vq. > Not the other way around. Will fix in v3. Stefan
[Qemu-devel] Re: [RFC PATCH 8/8] device-assignment: pass through and stub more PCI caps
On Thu, Nov 11, 2010 at 11:30:07PM -0700, Alex Williamson wrote: > On Fri, 2010-11-12 at 07:36 +0200, Michael S. Tsirkin wrote: > > On Thu, Nov 11, 2010 at 07:56:46PM -0700, Alex Williamson wrote: > > > Some drivers depend on finding capabilities like power management, > > > PCI express/X, vital product data, or vendor specific fields. Now > > > that we have better capability support, we can pass more of these > > > tables through to the guest. Note that VPD and VNDR are direct pass > > > through capabilies, the rest are mostly empty shells with a few > > > writable bits where necessary. > > > > > > Signed-off-by: Alex Williamson > > > --- > > > > > > hw/device-assignment.c | 160 > > > +--- > > > 1 files changed, 149 insertions(+), 11 deletions(-) > > > > > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c > > > index 179c7dc..1b228ad 100644 > > > --- a/hw/device-assignment.c > > > +++ b/hw/device-assignment.c > > > @@ -366,6 +366,27 @@ static uint8_t assigned_dev_pci_read_byte(PCIDevice > > > *d, int pos) > > > return (uint8_t)assigned_dev_pci_read(d, pos, 1); > > > } > > > > > > +static void assigned_dev_pci_write(PCIDevice *d, int pos, uint32_t val, > > > int len) > > > +{ > > > +AssignedDevice *pci_dev = container_of(d, AssignedDevice, dev); > > > +ssize_t ret; > > > +int fd = pci_dev->real_device.config_fd; > > > + > > > +again: > > > +ret = pwrite(fd, &val, len, pos); > > > +if (ret != len) { > > > + if ((ret < 0) && (errno == EINTR || errno == EAGAIN)) > > > + goto again; > > > > > > do {} while() ? > > Sure, this is just a copy of another place that does something similar. > They should either be merged or both converted in a separate patch. > > > > + > > > + fprintf(stderr, "%s: pwrite failed, ret = %zd errno = %d\n", > > > + __func__, ret, errno); > > > + > > > + exit(1); > > > +} > > > + > > > +return; > > > +} > > > + > > > static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t cap) > > > { > > > int id; > > > @@ -1244,37 +1265,75 @@ static void assigned_dev_update_msix(PCIDevice > > > *pci_dev, unsigned int ctrl_pos) > > > #endif > > > #endif > > > > > > +static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev, > > > +uint8_t cap_id, > > > +uint32_t address, > > > int len) > > > +{ > > > +uint8_t cap; > > > + > > > +switch (cap_id) { > > > + > > > +case PCI_CAP_ID_VPD: > > > +cap = pci_find_capability(pci_dev, cap_id); > > > +if (address - cap >= PCI_CAP_FLAGS) { > > > +return assigned_dev_pci_read(pci_dev, address, len); > > > +} > > > +break; > > > + > > > +case PCI_CAP_ID_VNDR: > > > +cap = pci_find_capability(pci_dev, cap_id); > > > +if (address - cap > PCI_CAP_FLAGS) { > > > +return assigned_dev_pci_read(pci_dev, address, len); > > > +} > > > +break; > > > +} > > > + > > > +return pci_default_cap_read_config(pci_dev, cap_id, address, len); > > > +} > > > + > > > static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, > > > uint8_t cap_id, > > > uint32_t address, > > > uint32_t val, int len) > > > { > > > +uint8_t cap; > > > + > > > pci_default_cap_write_config(pci_dev, cap_id, address, val, len); > > > > > > switch (cap_id) { > > > #ifdef KVM_CAP_IRQ_ROUTING > > > case PCI_CAP_ID_MSI: > > > #ifdef KVM_CAP_DEVICE_MSI > > > -{ > > > -uint8_t cap = pci_find_capability(pci_dev, cap_id); > > > -if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) { > > > -assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS); > > > -} > > > +cap = pci_find_capability(pci_dev, cap_id); > > > +if (ranges_overlap(address - cap, len, PCI_MSI_FLAGS, 1)) { > > > +assigned_dev_update_msi(pci_dev, cap + PCI_MSI_FLAGS); > > > } > > > #endif > > > break; > > > > > > case PCI_CAP_ID_MSIX: > > > #ifdef KVM_CAP_DEVICE_MSIX > > > -{ > > > -uint8_t cap = pci_find_capability(pci_dev, cap_id); > > > -if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, > > > 1)) { > > > -assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS); > > > -} > > > +cap = pci_find_capability(pci_dev, cap_id); > > > +if (ranges_overlap(address - cap, len, PCI_MSIX_FLAGS + 1, 1)) { > > > +assigned_dev_update_msix(pci_dev, cap + PCI_MSIX_FLAGS); > > > } > > > #endif > > > break; > > > #endif > > > + > > > +case PCI_CAP_ID_VPD: > > > +cap = pci_find_capability(pci_dev
[Qemu-devel] Re: [PATCH 4/8] pci: Replace used bitmap with capability byte map
On Thu, Nov 11, 2010 at 11:07:15PM -0700, Alex Williamson wrote: > On Fri, 2010-11-12 at 07:40 +0200, Michael S. Tsirkin wrote: > > On Thu, Nov 11, 2010 at 07:55:43PM -0700, Alex Williamson wrote: > > > Capabilities are allocated in bytes, so we can track both whether > > > a byte is used and by what capability in the same structure. > > > > > > Remove pci_reserve_capability() as there are no users. > > > > > > Signed-off-by: Alex Williamson > > > > I actually wanted to remove the used array completely and ask > > all users to add offsets directly. > > Will this be needed then? > > Can you give an example, I don't understand what you mean by asking > users to add offsets directly. Here's a dump of patch in progress. Something like the below (untested). After applying this we can remove the whole used array allocator. > I think some kind of tracking what's > where in config space needs to be done somewhere and the common PCI code > seems like it'd be the place. Why do we need it? config lets us scan the capability list readily enough. I had this idea that we should pack capabilities in config space, but now I think it's silly. > Thanks, > > Alex pci: remove config space allocator pci supports allocating caps in config space so they are packed tightly. This doesn't seem to be useful, especially since caps must stay at fixed offsets to ensure backwards compatibility (e.g. for migration). Remove this support, and make virtio-pci supply the offset to the MSIX capability explicitly. Signed-off-by: Michael S. Tsirkin diff --git a/hw/msix.c b/hw/msix.c index f66d255..6fd3791 100644 --- a/hw/msix.c +++ b/hw/msix.c @@ -52,7 +52,7 @@ int msix_supported; * Original bar size must be a power of 2 or 0. * New bar size is returned. */ static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries, - unsigned bar_nr, unsigned bar_size) + unsigned offset, unsigned bar_nr, unsigned bar_size) { int config_offset; uint8_t *config; @@ -75,7 +75,7 @@ static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries, pdev->msix_bar_size = new_size; config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX, - 0, MSIX_CAP_LENGTH); + offset, MSIX_CAP_LENGTH); if (config_offset < 0) return config_offset; config = pdev->config + config_offset; @@ -237,7 +237,7 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries) /* Initialize the MSI-X structures. Note: if MSI-X is supported, BAR size is * modified, it should be retrieved with msix_bar_size. */ int msix_init(struct PCIDevice *dev, unsigned short nentries, - unsigned bar_nr, unsigned bar_size) + unsigned offset, unsigned bar_nr, unsigned bar_size) { int ret; /* Nothing to do if MSI is not supported by interrupt controller */ @@ -261,7 +261,7 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries, } dev->msix_entries_nr = nentries; -ret = msix_add_config(dev, nentries, bar_nr, bar_size); +ret = msix_add_config(dev, nentries, offset, bar_nr, bar_size); if (ret) goto err_config; diff --git a/hw/msix.h b/hw/msix.h index a9f7993..b61e42e 100644 --- a/hw/msix.h +++ b/hw/msix.h @@ -5,7 +5,7 @@ #include "pci.h" int msix_init(PCIDevice *pdev, unsigned short nentries, - unsigned bar_nr, unsigned bar_size); + unsigned offset, unsigned bar_nr, unsigned bar_size); void msix_write_config(PCIDevice *pci_dev, uint32_t address, uint32_t val, int len); diff --git a/hw/pci.c b/hw/pci.c index aed2d42..e411f12 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -1737,12 +1737,7 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t offset, uint8_t size) { uint8_t *config; -if (!offset) { -offset = pci_find_space(pdev, size); -if (!offset) { -return -ENOSPC; -} -} +assert(offset >= PCI_CONFIG_HEADER_SIZE); config = pdev->config + offset; config[PCI_CAP_LIST_ID] = cap_id; diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c index 729917d..bcb0266 100644 --- a/hw/virtio-pci.c +++ b/hw/virtio-pci.c @@ -543,7 +543,8 @@ static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev, config[0x3d] = 1; -if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors, 1, 0)) { +if (vdev->nvectors && !msix_init(&proxy->pci_dev, vdev->nvectors, + PCI_CONFIG_HEADER_SIZE, 1, 0)) { pci_register_bar(&proxy->pci_dev, 1, msix_bar_size(&proxy->pci_dev), PCI_BASE_ADDRESS_SPACE_MEMORY,