Re: [Qemu-devel] ideas for improving TLB performance (help with TCG backend wanted)
Emilio G. Cota writes: > On Thu, Sep 20, 2018 at 01:19:51 +0100, Alex Bennée wrote: >> If we are going to have an indirection then we can also drop the >> requirement to scale the TLB according to the number of MMU indexes we >> have to support. It's fairly wasteful when a bunch of them are almost >> never used unless you are running stuff that uses them. > > So with dynamic TLB sizing, what you're suggesting here is to resize > each MMU array independently (depending on their use rate) instead > of using a single "TLB size" for all MMU indexes. Am I understanding > your point correctly? Not quite - I think it would overly complicate the lookup to have a differently sized TLB lookup for each mmu index - even if their usage patterns are different. I just meant that if we already have the cost of an indirection we don't have to ensure: CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE]; CPUIOTLBEntry iotlb[NB_MMU_MODES][CPU_TLB_SIZE]; restrict their sizes so any entry in the 2D array can be indexed directly from env. Currently CPU_TLB_SIZE/CPU_TLB_BITS is restricted by the number of NB_MMU_MODES we have to support. But if each can be flushed and managed separately we can have: CPUTLBEntry *tlb_table[NB_MMU_MODES]; And size CPU_TLB_SIZE for the maximum offset we can mange in the lookup code. This is mainly driven by the varying TCG_TARGET_TLB_DISPLACEMENT_BITS each backend has available to it. > > Thanks, > > E. -- Alex Bennée
[Qemu-devel] [PULL 0/1] libfdt queue 20181002
The following changes since commit a2ef4d9e95400cd387ab4ae19a317741e458fb07: Merge remote-tracking branch 'remotes/kraxel/tags/ui-20181001-pull-request' into staging (2018-10-01 15:44:30 +0100) are available in the Git repository at: git://github.com/dgibson/qemu.git tags/libfdt-20181002 for you to fetch changes up to 0b001b3094bcf80a5a0c3b7655029a10fb63f661: Update dtc/libfdt submodule to v1.4.7 (2018-10-02 13:53:26 +1000) Update dtc submodule to v1.4.7 We have some upcoming things planned for ppc that will require some newer libfdt features. In preparation, update the dtc/libfdt submodule to upstreasm version v1.4.7. David Gibson (1): Update dtc/libfdt submodule to v1.4.7 dtc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
[Qemu-devel] [PULL 1/1] Update dtc/libfdt submodule to v1.4.7
dtc v1.4.7 contains a bunch of improvements to make libfdt safer against handling a corrupted or malicious tree, which is a good thing to have. It also includes an explicit fdt checking function that we'll be wanting in future. Signed-off-by: David Gibson --- dtc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dtc b/dtc index e54388015a..88f18909db 16 --- a/dtc +++ b/dtc @@ -1 +1 @@ -Subproject commit e54388015af1fb4bf04d0bca99caba1074d9cc42 +Subproject commit 88f18909db731a627456f26d779445f84e449536 -- 2.17.1
[Qemu-devel] Have multiple virtio-net devices, but only one of them receives all traffic
Hi, I'm using QEMU 3.0.0 and Linux kernel 4.15.0 on x86 machines. I'm observing pretty weird behavior when I have multiple virtio-net devices. My KVM VM has two virtio-net devices (vhost=off) and I'm using a Linux bridge in the host. The two devices have different MAC/IP addresses. When I tried to access the VM using two different IP addresses (e.g ping or ssh), I found that only one device in VM gets all incoming network traffic while I expected that two devices get traffic for their own IP addresses. I checked it in the several ways. 1) I did ping with two IP addresses from the host/other physical machines in the same subnet, and only one device's interrupt count is increased. 2) I checked the ARP table from the ping sources, and two different IP addresses have the same MAC address. In fact, I dumped ARP messages using tcpdump, and the VM (or the bridge?) replied with the same MAC address for two different IP addresses as attached below. 3) I monitored the host bridge (# bridge monitor) and found that only one device's MAC address is registered. It looks like one device's IP/MAC address is not advertised properly, but I'm not really sure. When I turned off the device getting all the traffic, then the other device starts getting incoming packets; the device's MAC address is registered in the host bridge. The active device only gets traffic for its own IP address, of course. Here's the tcpdump result. IP 10.10.1.100 and 10.10.1.221 are VM's IP addresses. IP 10.10.1.221 is assigned to a device having 52:54:00:12:34:58, but the log shows it is advertised as having ...:57. 23:24:10.983700 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.10.1.100 tell kvm-dest-link-1, length 46 23:24:10.983771 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.10.1.100 is-at 52:54:00:12:34:57 (oui Unknown), length 28 23:24:17.794811 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.10.1.211 tell kvm-dest-link-1, length 46 23:24:17.794869 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.10.1.211 is-at 52:54:00:12:34:57 (oui Unknown), length 28 I would appreciate any help! Thanks, Jintack
Re: [Qemu-devel] [PATCH] PPC: e500: convert SysBus init method to a realize method
On Mon, Oct 01, 2018 at 05:04:13PM +0200, Cédric Le Goater wrote: > Signed-off-by: Cédric Le Goater Applied to ppc-for-3.1, thanks. > --- > hw/pci-host/ppce500.c | 14 ++ > 1 file changed, 6 insertions(+), 8 deletions(-) > > diff --git a/hw/pci-host/ppce500.c b/hw/pci-host/ppce500.c > index eb75e080fc10..b8f8c112e662 100644 > --- a/hw/pci-host/ppce500.c > +++ b/hw/pci-host/ppce500.c > @@ -436,8 +436,9 @@ static AddressSpace *e500_pcihost_set_iommu(PCIBus *bus, > void *opaque, > return &s->bm_as; > } > > -static int e500_pcihost_initfn(SysBusDevice *dev) > +static void e500_pcihost_realize(DeviceState *dev, Error **errp) > { > +SysBusDevice *sbd = SYS_BUS_DEVICE(dev); > PCIHostState *h; > PPCE500PCIState *s; > PCIBus *b; > @@ -447,7 +448,7 @@ static int e500_pcihost_initfn(SysBusDevice *dev) > s = PPC_E500_PCI_HOST_BRIDGE(dev); > > for (i = 0; i < ARRAY_SIZE(s->irq); i++) { > -sysbus_init_irq(dev, &s->irq[i]); > +sysbus_init_irq(sbd, &s->irq[i]); > } > > for (i = 0; i < PCI_NUM_PINS; i++) { > @@ -460,7 +461,7 @@ static int e500_pcihost_initfn(SysBusDevice *dev) > /* PIO lives at the bottom of our bus space */ > memory_region_add_subregion_overlap(&s->busmem, 0, &s->pio, -2); > > -b = pci_register_root_bus(DEVICE(dev), NULL, mpc85xx_pci_set_irq, > +b = pci_register_root_bus(dev, NULL, mpc85xx_pci_set_irq, >mpc85xx_pci_map_irq, s, &s->busmem, &s->pio, >PCI_DEVFN(s->first_slot, 0), 4, TYPE_PCI_BUS); > h->bus = b; > @@ -483,10 +484,8 @@ static int e500_pcihost_initfn(SysBusDevice *dev) > memory_region_add_subregion(&s->container, PCIE500_CFGADDR, > &h->conf_mem); > memory_region_add_subregion(&s->container, PCIE500_CFGDATA, > &h->data_mem); > memory_region_add_subregion(&s->container, PCIE500_REG_BASE, &s->iomem); > -sysbus_init_mmio(dev, &s->container); > +sysbus_init_mmio(sbd, &s->container); > pci_bus_set_route_irq_fn(b, e500_route_intx_pin_to_irq); > - > -return 0; > } > > static void e500_host_bridge_class_init(ObjectClass *klass, void *data) > @@ -526,9 +525,8 @@ static Property pcihost_properties[] = { > static void e500_pcihost_class_init(ObjectClass *klass, void *data) > { > DeviceClass *dc = DEVICE_CLASS(klass); > -SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass); > > -k->init = e500_pcihost_initfn; > +dc->realize = e500_pcihost_realize; > set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories); > dc->props = pcihost_properties; > dc->vmsd = &vmstate_ppce500_pci; -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
Re: [Qemu-devel] [PATCH 0/2] ppc: convert SysBus init method to a realize method
On Mon, Oct 01, 2018 at 01:44:19PM +0200, Cédric Le Goater wrote: > Hello, > > Here are a couple of patches converting the SysBus init method to a > realize method. make check tested only. Applied to ppc-for-3.1, thanks. > > Thanks, > > C. > > > Cédric Le Goater (2): > ppc440_pcix: convert SysBus init method to a realize method > ppc4xx_pci: convert SysBus init method to a realize method > > hw/ppc/ppc440_pcix.c | 14 ++ > hw/ppc/ppc4xx_pci.c | 14 ++ > 2 files changed, 12 insertions(+), 16 deletions(-) > -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature
Re: [Qemu-devel] ideas for improving TLB performance (help with TCG backend wanted)
On Mon, Oct 01, 2018 at 15:40:37 -0500, Richard Henderson wrote: > On 10/1/18 1:34 PM, Emilio G. Cota wrote: > > On Thu, Sep 20, 2018 at 01:19:51 +0100, Alex Bennée wrote: > >> If we are going to have an indirection then we can also drop the > >> requirement to scale the TLB according to the number of MMU indexes we > >> have to support. It's fairly wasteful when a bunch of them are almost > >> never used unless you are running stuff that uses them. > > > > So with dynamic TLB sizing, what you're suggesting here is to resize > > each MMU array independently (depending on their use rate) instead > > of using a single "TLB size" for all MMU indexes. Am I understanding > > your point correctly? > > You cannot do that without flushing the TBs (and with out-of-line memory ops, > the prologue as well) and regenerating. The TLB size is baked into the code. > And we really don't have any extra registers free to vary that. Can you please elaborate on this? I can't see where this is baked into the generated code, other than the TLB lookup. Grepping for CPU_TLB_SIZE and CPU_TLB_BITS only shows a few places. I have written today a prototype of dynamic TLB flushing. It uses no extra registers because mmu_idx is known at generation time. I haven't done any extensive testing yet, but at least it boots aarch64 and x86_64 guests on an x86_64 host. The code (some messy WIP commits in there, sorry) is at: https://github.com/cota/qemu/tree/tlb2 Please take a look -- am I doing anything horribly wrong there? Thanks, Emilio
[Qemu-devel] [Bug 1795527] [NEW] Malformed audio and video output stuttering after upgrade to QEMU 3.0
Public bug reported: My host is an x86_64 Arch Linux OS with a recompiled 4.18.10 hardened kernel, running a few KVM guests with varying OSes and configurations managed through a Libvirt stack. Among these guests I have two Windows 10 VMs with VGA passthrough and PulseAudio-backed virtual audio devices. After upgrading to QEMU 3.0.0, both of the Win10 guests started showing corrupted audio output in the form of unnatural reproduction speed and occasional but consistently misplaced audio fragments originating from what seems to be a circular buffer wrapping over itself (misbehaviour detected by starting some games with known OSTs and dialogues: soundtracks sound accelerated and past dialogue lines start replaying middle-sentence until the next line starts playing). In addition, the video output of the malfunctioning VMs regularly stutters roughly twice a second for a fraction of a second (sync'ed with the suspected buffer wrapping and especially pronounced during not-pre- rendered cutscenes), toghether with mouse freezes that look like actual input misses more than simple lack of screen refreshes. The issue was succesfully reproduced without the managing stack, directly with the following command line, on the most capable Windows guest: QEMU_AUDIO_DRV=pa QEMU_PA_SERVER=127.0.0.1 /usr/bin/qemu-system-x86_64 -name guest=win10_gms,debug-threads=on \ -machine pc-i440fx-3.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off \ -cpu host,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_vendor_id=123456789abc,kvm=off \ -drive file=/usr/share/ovmf/x64/OVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on \ -drive file=/var/lib/libvirt/qemu/nvram/win10_gms_VARS.fd,if=pflash,format=raw,unit=1 \ -m 5120 \ -realtime mlock=off \ -smp 3,sockets=1,cores=3,threads=1 \ -uuid 39b56ee2-6bae-4009-9108-7be26d5d63ac \ -display none \ -no-user-config \ -nodefaults \ -rtc base=localtime,driftfix=slew \ -global kvm-pit.lost_tick_policy=delay \ -no-hpet \ -no-shutdown \ -global PIIX4_PM.disable_s3=1 \ -global PIIX4_PM.disable_s4=1 \ -boot strict=on \ -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 \ -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 \ -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 \ -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 \ -device ahci,id=sata0,bus=pci.0,addr=0x9 \ -drive file=/dev/vms/win10_gaming,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native \ -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on \ -drive file=/dev/sr0,format=raw,if=none,id=drive-sata0-0-0,media=cdrom,readonly=on \ -device ide-cd,bus=sata0.0,drive=drive-sata0-0-0,id=sata0-0-0 \ -device intel-hda,id=sound0,bus=pci.0,addr=0x3 \ -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 \ -device usb-host,hostbus=2,hostaddr=3,id=hostdev0,bus=usb.0,port=1 \ -device vfio-pci,host=01:00.0,id=hostdev1,bus=pci.0,addr=0x6 \ -device vfio-pci,host=01:00.1,id=hostdev2,bus=pci.0,addr=0x7 \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 \ -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ -msg timestamp=on By "purposedly misconfiguring" the codepaths and replacing "pc-i440fx-3.0" with "pc-i440fx-2.11" (basically reverting the config changes I needed to do in order to update the domain definitions), the stuttering seems to disappear (or at least becomes negligible) and the audio output, despite becoming incredibly distorted, is consistent in every other way, with in-order dialogues and (perceived) correct tempo. In order to exclude eventual misconfigurations in the host's audio processing pipeline, I proceeded to update the domain definition's codepath of another guest running Ubuntu 18.04 with a completely different hardware configuration (no video card passthrough and no PulseAudio backconnection, just a plain emulated VirtIO display and Spice audio device). The audio issue presented itself again in th
[Qemu-devel] [PATCH 04/15] hw/ssi/xilinx_spi: Use DeviceState::realize rather than SysBusDevice::init
Move from the legacy SysBusDevice::init method to using DeviceState::realize. Signed-off-by: Philippe Mathieu-Daudé --- hw/ssi/xilinx_spi.c | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/hw/ssi/xilinx_spi.c b/hw/ssi/xilinx_spi.c index 83585bc8b2..3dae303d5b 100644 --- a/hw/ssi/xilinx_spi.c +++ b/hw/ssi/xilinx_spi.c @@ -319,9 +319,9 @@ static const MemoryRegionOps spi_ops = { } }; -static int xilinx_spi_init(SysBusDevice *sbd) +static void xilinx_spi_realize(DeviceState *dev, Error **errp) { -DeviceState *dev = DEVICE(sbd); +SysBusDevice *sbd = SYS_BUS_DEVICE(dev); XilinxSPI *s = XILINX_SPI(dev); int i; @@ -344,8 +344,6 @@ static int xilinx_spi_init(SysBusDevice *sbd) fifo8_create(&s->tx_fifo, FIFO_CAPACITY); fifo8_create(&s->rx_fifo, FIFO_CAPACITY); - -return 0; } static const VMStateDescription vmstate_xilinx_spi = { @@ -368,9 +366,8 @@ static Property xilinx_spi_properties[] = { static void xilinx_spi_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); -SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass); -k->init = xilinx_spi_init; +dc->realize = xilinx_spi_realize; dc->reset = xlx_spi_reset; dc->props = xilinx_spi_properties; dc->vmsd = &vmstate_xilinx_spi; -- 2.19.0
[Qemu-devel] [PATCH 02/15] hw/timer/sun4v-rtc: Convert from DPRINTF() macro to trace events
Signed-off-by: Philippe Mathieu-Daudé --- hw/timer/sun4v-rtc.c | 13 +++-- hw/timer/trace-events | 4 2 files changed, 7 insertions(+), 10 deletions(-) diff --git a/hw/timer/sun4v-rtc.c b/hw/timer/sun4v-rtc.c index 310523225f..13be94f8da 100644 --- a/hw/timer/sun4v-rtc.c +++ b/hw/timer/sun4v-rtc.c @@ -14,15 +14,8 @@ #include "hw/sysbus.h" #include "qemu/timer.h" #include "hw/timer/sun4v-rtc.h" +#include "trace.h" -//#define DEBUG_SUN4V_RTC - -#ifdef DEBUG_SUN4V_RTC -#define DPRINTF(fmt, ...) \ -do { printf("sun4v_rtc: " fmt , ## __VA_ARGS__); } while (0) -#else -#define DPRINTF(fmt, ...) do {} while (0) -#endif #define TYPE_SUN4V_RTC "sun4v_rtc" #define SUN4V_RTC(obj) OBJECT_CHECK(Sun4vRtc, (obj), TYPE_SUN4V_RTC) @@ -41,14 +34,14 @@ static uint64_t sun4v_rtc_read(void *opaque, hwaddr addr, /* accessing the high 32 bits */ val >>= 32; } -DPRINTF("read from " TARGET_FMT_plx " val %lx\n", addr, val); +trace_sun4v_rtc_read(addr, val); return val; } static void sun4v_rtc_write(void *opaque, hwaddr addr, uint64_t val, unsigned size) { -DPRINTF("write 0x%x to " TARGET_FMT_plx "\n", (unsigned)val, addr); +trace_sun4v_rtc_read(addr, val); } static const MemoryRegionOps sun4v_rtc_ops = { diff --git a/hw/timer/trace-events b/hw/timer/trace-events index ca9ad6321a..75bd3b1042 100644 --- a/hw/timer/trace-events +++ b/hw/timer/trace-events @@ -66,5 +66,9 @@ cmsdk_apb_dualtimer_read(uint64_t offset, uint64_t data, unsigned size) "CMSDK A cmsdk_apb_dualtimer_write(uint64_t offset, uint64_t data, unsigned size) "CMSDK APB dualtimer write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u" cmsdk_apb_dualtimer_reset(void) "CMSDK APB dualtimer: reset" +# hw/timer/sun4v-rtc.c +sun4v_rtc_read(uint64_t addr, uint64_t value) "read: addr 0x%" PRIx64 " value 0x%" PRIx64 +sun4v_rtc_write(uint64_t addr, uint64_t value) "write: addr 0x%" PRIx64 " value 0x%" PRIx64 + # hw/timer/xlnx-zynqmp-rtc.c xlnx_zynqmp_rtc_gettime(int year, int month, int day, int hour, int min, int sec) "Get time from host: %d-%d-%d %2d:%02d:%02d" -- 2.19.0
[Qemu-devel] [PATCH 05/15] hw/sh4/sh_pci: Use DeviceState::realize rather than SysBusDevice::init
Move from the legacy SysBusDevice::init method to using DeviceState::realize. Signed-off-by: Philippe Mathieu-Daudé --- hw/sh4/sh_pci.c | 20 +--- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/hw/sh4/sh_pci.c b/hw/sh4/sh_pci.c index 4ec2e35500..84c52df067 100644 --- a/hw/sh4/sh_pci.c +++ b/hw/sh4/sh_pci.c @@ -120,16 +120,15 @@ static void sh_pci_set_irq(void *opaque, int irq_num, int level) qemu_set_irq(pic[irq_num], level); } -static int sh_pci_device_init(SysBusDevice *dev) +static void sh_pci_device_realize(PCIDevice *dev, Error **errp) { -PCIHostState *phb; -SHPCIState *s; +SysBusDevice *sbd = SYS_BUS_DEVICE(dev); +SHPCIState *s = SH_PCI_HOST_BRIDGE(dev); +PCIHostState *phb = PCI_HOST_BRIDGE(s); int i; -s = SH_PCI_HOST_BRIDGE(dev); -phb = PCI_HOST_BRIDGE(s); for (i = 0; i < 4; i++) { -sysbus_init_irq(dev, &s->irq[i]); +sysbus_init_irq(sbd, &s->irq[i]); } phb->bus = pci_register_root_bus(DEVICE(dev), "pci", sh_pci_set_irq, sh_pci_map_irq, @@ -143,13 +142,12 @@ static int sh_pci_device_init(SysBusDevice *dev) &s->memconfig_p4, 0, 0x224); memory_region_init_alias(&s->isa, OBJECT(s), "sh_pci.isa", get_system_io(), 0, 0x4); -sysbus_init_mmio(dev, &s->memconfig_p4); -sysbus_init_mmio(dev, &s->memconfig_a7); +sysbus_init_mmio(sbd, &s->memconfig_p4); +sysbus_init_mmio(sbd, &s->memconfig_a7); s->iobr = 0xfe24; memory_region_add_subregion(get_system_memory(), s->iobr, &s->isa); s->dev = pci_create_simple(phb->bus, PCI_DEVFN(0, 0), "sh_pci_host"); -return 0; } static void sh_pci_host_realize(PCIDevice *d, Error **errp) @@ -187,9 +185,9 @@ static const TypeInfo sh_pci_host_info = { static void sh_pci_device_class_init(ObjectClass *klass, void *data) { -SysBusDeviceClass *sdc = SYS_BUS_DEVICE_CLASS(klass); +PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); -sdc->init = sh_pci_device_init; +k->realize = sh_pci_device_realize; } static const TypeInfo sh_pci_device_info = { -- 2.19.0
[Qemu-devel] [PATCH 10/15] hw/sparc64/niagara: Replace 'empty_slot' by 'unimplemented_device'
The TYPE_EMPTY_SLOT and TYPE_UNIMPLEMENTED_DEVICE are identical devices, however the later use more recent APIs and is more widely used. Replace 'empty_slot' by 'unimplemented_device' to simplify devices code maintenance. Signed-off-by: Philippe Mathieu-Daudé --- default-configs/sparc64-softmmu.mak | 1 - hw/sparc64/niagara.c| 4 ++-- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/default-configs/sparc64-softmmu.mak b/default-configs/sparc64-softmmu.mak index 52edafe547..ce63d47046 100644 --- a/default-configs/sparc64-softmmu.mak +++ b/default-configs/sparc64-softmmu.mak @@ -16,5 +16,4 @@ CONFIG_SIMBA=y CONFIG_SUNHME=y CONFIG_MC146818RTC=y CONFIG_ISA_TESTDEV=y -CONFIG_EMPTY_SLOT=y CONFIG_SUN4V_RTC=y diff --git a/hw/sparc64/niagara.c b/hw/sparc64/niagara.c index 4fa8cb2904..f8a856f611 100644 --- a/hw/sparc64/niagara.c +++ b/hw/sparc64/niagara.c @@ -29,7 +29,7 @@ #include "hw/hw.h" #include "hw/boards.h" #include "hw/char/serial.h" -#include "hw/empty_slot.h" +#include "hw/misc/unimp.h" #include "hw/loader.h" #include "hw/sparc/sparc64.h" #include "hw/timer/sun4v-rtc.h" @@ -161,7 +161,7 @@ static void niagara_init(MachineState *machine) serial_mm_init(sysmem, NIAGARA_UART_BASE, 0, NULL, 115200, serial_hd(0), DEVICE_BIG_ENDIAN); } -empty_slot_init(NIAGARA_IOBBASE, NIAGARA_IOBSIZE); +create_unimplemented_device("sun4v-iob", NIAGARA_IOBBASE, NIAGARA_IOBSIZE); sun4v_rtc_init(NIAGARA_RTC_BASE); } -- 2.19.0
[Qemu-devel] [PATCH 07/15] hw/mips/gt64xxx_pci: Convert gt64120_reset() function into Device reset method
Convert the gt64120_reset() function into a proper Device reset method. Signed-off-by: Philippe Mathieu-Daudé --- hw/mips/gt64xxx_pci.c | 17 +++-- 1 file changed, 3 insertions(+), 14 deletions(-) diff --git a/hw/mips/gt64xxx_pci.c b/hw/mips/gt64xxx_pci.c index 24ad0ad024..dcd1a66329 100644 --- a/hw/mips/gt64xxx_pci.c +++ b/hw/mips/gt64xxx_pci.c @@ -992,9 +992,9 @@ static void gt64120_pci_set_irq(void *opaque, int irq_num, int level) } -static void gt64120_reset(void *opaque) +static void gt64120_reset(DeviceState *dev) { -GT64120State *s = opaque; +GT64120State *s = GT64120_PCI_HOST_BRIDGE(dev); /* FIXME: Malta specific hw assumptions ahead */ @@ -1184,16 +1184,6 @@ PCIBus *gt64120_register(qemu_irq *pic) return phb->bus; } -static int gt64120_init(SysBusDevice *dev) -{ -GT64120State *s; - -s = GT64120_PCI_HOST_BRIDGE(dev); - -qemu_register_reset(gt64120_reset, s); -return 0; -} - static void gt64120_pci_realize(PCIDevice *d, Error **errp) { /* FIXME: Malta specific hw assumptions ahead */ @@ -1241,9 +1231,8 @@ static const TypeInfo gt64120_pci_info = { static void gt64120_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); -SysBusDeviceClass *sdc = SYS_BUS_DEVICE_CLASS(klass); -sdc->init = gt64120_init; +dc->reset = gt64120_reset; dc->vmsd = &vmstate_gt64120; } -- 2.19.0
[Qemu-devel] [PATCH 08/15] hw/mips/gt64xxx_pci: Mark as bridge device
The gt64120 is currently listed as uncategorized device. Mark it as bridge device. Signed-off-by: Philippe Mathieu-Daudé --- hw/mips/gt64xxx_pci.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/mips/gt64xxx_pci.c b/hw/mips/gt64xxx_pci.c index dcd1a66329..1cd8aac658 100644 --- a/hw/mips/gt64xxx_pci.c +++ b/hw/mips/gt64xxx_pci.c @@ -1232,6 +1232,7 @@ static void gt64120_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); +set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories); dc->reset = gt64120_reset; dc->vmsd = &vmstate_gt64120; } -- 2.19.0
[Qemu-devel] [RFC PATCH 13/15] hw/alpha/typhoon: Remove unuseful code
Signed-off-by: Philippe Mathieu-Daudé --- hw/alpha/typhoon.c | 13 - 1 file changed, 13 deletions(-) diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c index d74b5b55e1..8004afe45b 100644 --- a/hw/alpha/typhoon.c +++ b/hw/alpha/typhoon.c @@ -932,23 +932,10 @@ PCIBus *typhoon_init(ram_addr_t ram_size, ISABus **isa_bus, return b; } -static int typhoon_pcihost_init(SysBusDevice *dev) -{ -return 0; -} - -static void typhoon_pcihost_class_init(ObjectClass *klass, void *data) -{ -SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass); - -k->init = typhoon_pcihost_init; -} - static const TypeInfo typhoon_pcihost_info = { .name = TYPE_TYPHOON_PCI_HOST_BRIDGE, .parent= TYPE_PCI_HOST_BRIDGE, .instance_size = sizeof(TyphoonState), -.class_init= typhoon_pcihost_class_init, }; static void typhoon_iommu_memory_region_class_init(ObjectClass *klass, -- 2.19.0
[Qemu-devel] [PATCH 03/15] hw/timer/sun4v-rtc: Use DeviceState::realize rather than SysBusDevice::init
Move from the legacy SysBusDevice::init method to using DeviceState::realize. Signed-off-by: Philippe Mathieu-Daudé --- hw/timer/sun4v-rtc.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/hw/timer/sun4v-rtc.c b/hw/timer/sun4v-rtc.c index 13be94f8da..4e7f6a1eff 100644 --- a/hw/timer/sun4v-rtc.c +++ b/hw/timer/sun4v-rtc.c @@ -63,21 +63,21 @@ void sun4v_rtc_init(hwaddr addr) sysbus_mmio_map(s, 0, addr); } -static int sun4v_rtc_init1(SysBusDevice *dev) +static void sun4v_rtc_realize(DeviceState *dev, Error **errp) { +SysBusDevice *sbd = SYS_BUS_DEVICE(dev); Sun4vRtc *s = SUN4V_RTC(dev); memory_region_init_io(&s->iomem, OBJECT(s), &sun4v_rtc_ops, s, "sun4v-rtc", 0x08ULL); -sysbus_init_mmio(dev, &s->iomem); -return 0; +sysbus_init_mmio(sbd, &s->iomem); } static void sun4v_rtc_class_init(ObjectClass *klass, void *data) { -SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass); +DeviceClass *dc = DEVICE_CLASS(klass); -k->init = sun4v_rtc_init1; +dc->realize = sun4v_rtc_realize; } static const TypeInfo sun4v_rtc_info = { -- 2.19.0
[Qemu-devel] [PATCH 09/15] hw/mips/malta: Replace 'empty_slot' by 'unimplemented_device'
The TYPE_EMPTY_SLOT and TYPE_UNIMPLEMENTED_DEVICE are identical devices, however the later use more recent APIs and is more widely used. Replace 'empty_slot' by 'unimplemented_device' to simplify devices code maintenance. Signed-off-by: Philippe Mathieu-Daudé --- default-configs/mips-softmmu-common.mak | 1 - hw/mips/mips_malta.c| 4 ++-- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/default-configs/mips-softmmu-common.mak b/default-configs/mips-softmmu-common.mak index fae2347ee7..f9d664e120 100644 --- a/default-configs/mips-softmmu-common.mak +++ b/default-configs/mips-softmmu-common.mak @@ -32,7 +32,6 @@ CONFIG_PFLASH_CFI01=y CONFIG_I8259=y CONFIG_MC146818RTC=y CONFIG_ISA_TESTDEV=y -CONFIG_EMPTY_SLOT=y CONFIG_MIPS_CPS=y CONFIG_MIPS_ITU=y CONFIG_I2C=y diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c index 40041d5ec0..4ccfa87c35 100644 --- a/hw/mips/mips_malta.c +++ b/hw/mips/mips_malta.c @@ -53,7 +53,7 @@ #include "sysemu/qtest.h" #include "qapi/error.h" #include "qemu/error-report.h" -#include "hw/empty_slot.h" +#include "hw/misc/unimp.h" #include "sysemu/kvm.h" #include "exec/semihost.h" #include "hw/mips/cps.h" @@ -1216,7 +1216,7 @@ void mips_malta_init(MachineState *machine) /* The whole address space decoded by the GT-64120A doesn't generate exception when accessing invalid memory. Create an empty slot to emulate this feature. */ -empty_slot_init(0, 0x2000); +create_unimplemented_device("gt64120-SysAD", 0, 0x2000); qdev_init_nofail(dev); -- 2.19.0
[Qemu-devel] [PATCH 11/15] hw/sparc/sun4m: Replace 'empty_slot' by 'unimplemented_device'
The TYPE_EMPTY_SLOT and TYPE_UNIMPLEMENTED_DEVICE are identical devices, however the later use more recent APIs and is more widely used. Replace 'empty_slot' by 'unimplemented_device' to simplify devices code maintenance. Signed-off-by: Philippe Mathieu-Daudé --- default-configs/sparc-softmmu.mak | 1 - hw/sparc/sun4m.c | 24 ++-- 2 files changed, 14 insertions(+), 11 deletions(-) diff --git a/default-configs/sparc-softmmu.mak b/default-configs/sparc-softmmu.mak index 12f97eeb20..7369b54467 100644 --- a/default-configs/sparc-softmmu.mak +++ b/default-configs/sparc-softmmu.mak @@ -8,7 +8,6 @@ CONFIG_ESCC=y CONFIG_M48T59=y CONFIG_PTIMER=y CONFIG_FDC=y -CONFIG_EMPTY_SLOT=y CONFIG_PCNET_COMMON=y CONFIG_LANCE=y CONFIG_TCX=y diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c index 3c29b68e67..ca37acf7af 100644 --- a/hw/sparc/sun4m.c +++ b/hw/sparc/sun4m.c @@ -42,7 +42,7 @@ #include "hw/nvram/chrp_nvram.h" #include "hw/nvram/fw_cfg.h" #include "hw/char/escc.h" -#include "hw/empty_slot.h" +#include "hw/misc/unimp.h" #include "hw/loader.h" #include "elf.h" #include "trace.h" @@ -863,7 +863,8 @@ static void sun4m_hw_init(const struct sun4m_hwdef *hwdef, ram_init(0, machine->ram_size, hwdef->max_mem); /* models without ECC don't trap when missing ram is accessed */ if (!hwdef->ecc_base) { -empty_slot_init(machine->ram_size, hwdef->max_mem - machine->ram_size); +create_unimplemented_device("ecc", machine->ram_size, +hwdef->max_mem - machine->ram_size); } prom_init(hwdef->slavio_base, bios_name); @@ -892,9 +893,10 @@ static void sun4m_hw_init(const struct sun4m_hwdef *hwdef, if (hwdef->iommu_pad_base) { /* On the real hardware (SS-5, LX) the MMU is not padded, but aliased. Software shouldn't use aliased addresses, neither should it crash - when does. Using empty_slot instead of aliasing can help with - debugging such accesses */ -empty_slot_init(hwdef->iommu_pad_base,hwdef->iommu_pad_len); + when does. Using the 'unimplemented device' instead of aliasing can + help with debugging such accesses */ +create_unimplemented_device("iommu.alias", hwdef->iommu_pad_base, +hwdef->iommu_pad_len); } sparc32_dma_init(hwdef->dma_base, @@ -944,12 +946,13 @@ static void sun4m_hw_init(const struct sun4m_hwdef *hwdef, for (i = num_vsimms; i < MAX_VSIMMS; i++) { /* vsimm registers probed by OBP */ if (hwdef->vsimm[i].reg_base) { -empty_slot_init(hwdef->vsimm[i].reg_base, 0x2000); +create_unimplemented_device("vsimm", hwdef->vsimm[i].reg_base, +0x2000); } } if (hwdef->sx_base) { -empty_slot_init(hwdef->sx_base, 0x2000); +create_unimplemented_device("sx", hwdef->sx_base, 0x2000); } nvram = m48t59_init(slavio_irq[0], hwdef->nvram_base, 0, 0x2000, 1968, 8); @@ -1012,14 +1015,15 @@ static void sun4m_hw_init(const struct sun4m_hwdef *hwdef, if (hwdef->dbri_base) { /* ISDN chip with attached CS4215 audio codec */ /* prom space */ -empty_slot_init(hwdef->dbri_base+0x1000, 0x30); +create_unimplemented_device("dbri.prom", hwdef->dbri_base + 0x1000, +0x30); /* reg space */ -empty_slot_init(hwdef->dbri_base+0x1, 0x100); +create_unimplemented_device("dbri", hwdef->dbri_base + 0x1, 0x100); } if (hwdef->bpp_base) { /* parallel port */ -empty_slot_init(hwdef->bpp_base, 0x20); +create_unimplemented_device("bpp", hwdef->bpp_base, 0x20); } kernel_size = sun4m_load_kernel(machine->kernel_filename, -- 2.19.0
[Qemu-devel] [PATCH 12/15] hw/core: Remove the 'empty_slot' device
All previous users of TYPE_EMPTY_SLOT now use TYPE_UNIMPLEMENTED_DEVICE. Since TYPE_EMPTY_SLOT is no more used/referenced, remove it. Signed-off-by: Philippe Mathieu-Daudé --- include/hw/empty_slot.h | 7 --- hw/core/empty_slot.c| 103 hw/core/Makefile.objs | 1 - 3 files changed, 111 deletions(-) delete mode 100644 include/hw/empty_slot.h delete mode 100644 hw/core/empty_slot.c diff --git a/include/hw/empty_slot.h b/include/hw/empty_slot.h deleted file mode 100644 index 123a9f8989..00 --- a/include/hw/empty_slot.h +++ /dev/null @@ -1,7 +0,0 @@ -#ifndef HW_EMPTY_SLOT_H -#define HW_EMPTY_SLOT_H - -/* empty_slot.c */ -void empty_slot_init(hwaddr addr, uint64_t slot_size); - -#endif diff --git a/hw/core/empty_slot.c b/hw/core/empty_slot.c deleted file mode 100644 index c1b9c2b104..00 --- a/hw/core/empty_slot.c +++ /dev/null @@ -1,103 +0,0 @@ -/* - * QEMU Empty Slot - * - * The empty_slot device emulates known to a bus but not connected devices. - * - * Copyright (c) 2010 Artyom Tarasenko - * - * This code is licensed under the GNU GPL v2 or (at your option) any later - * version. - */ - -#include "qemu/osdep.h" -#include "hw/hw.h" -#include "hw/sysbus.h" -#include "hw/empty_slot.h" - -//#define DEBUG_EMPTY_SLOT - -#ifdef DEBUG_EMPTY_SLOT -#define DPRINTF(fmt, ...) \ -do { printf("empty_slot: " fmt , ## __VA_ARGS__); } while (0) -#else -#define DPRINTF(fmt, ...) do {} while (0) -#endif - -#define TYPE_EMPTY_SLOT "empty_slot" -#define EMPTY_SLOT(obj) OBJECT_CHECK(EmptySlot, (obj), TYPE_EMPTY_SLOT) - -typedef struct EmptySlot { -SysBusDevice parent_obj; - -MemoryRegion iomem; -uint64_t size; -} EmptySlot; - -static uint64_t empty_slot_read(void *opaque, hwaddr addr, -unsigned size) -{ -DPRINTF("read from " TARGET_FMT_plx "\n", addr); -return 0; -} - -static void empty_slot_write(void *opaque, hwaddr addr, - uint64_t val, unsigned size) -{ -DPRINTF("write 0x%x to " TARGET_FMT_plx "\n", (unsigned)val, addr); -} - -static const MemoryRegionOps empty_slot_ops = { -.read = empty_slot_read, -.write = empty_slot_write, -.endianness = DEVICE_NATIVE_ENDIAN, -}; - -void empty_slot_init(hwaddr addr, uint64_t slot_size) -{ -if (slot_size > 0) { -/* Only empty slots larger than 0 byte need handling. */ -DeviceState *dev; -SysBusDevice *s; -EmptySlot *e; - -dev = qdev_create(NULL, TYPE_EMPTY_SLOT); -s = SYS_BUS_DEVICE(dev); -e = EMPTY_SLOT(dev); -e->size = slot_size; - -qdev_init_nofail(dev); - -sysbus_mmio_map(s, 0, addr); -} -} - -static int empty_slot_init1(SysBusDevice *dev) -{ -EmptySlot *s = EMPTY_SLOT(dev); - -memory_region_init_io(&s->iomem, OBJECT(s), &empty_slot_ops, s, - "empty-slot", s->size); -sysbus_init_mmio(dev, &s->iomem); -return 0; -} - -static void empty_slot_class_init(ObjectClass *klass, void *data) -{ -SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass); - -k->init = empty_slot_init1; -} - -static const TypeInfo empty_slot_info = { -.name = TYPE_EMPTY_SLOT, -.parent= TYPE_SYS_BUS_DEVICE, -.instance_size = sizeof(EmptySlot), -.class_init= empty_slot_class_init, -}; - -static void empty_slot_register_types(void) -{ -type_register_static(&empty_slot_info); -} - -type_init(empty_slot_register_types) diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs index eb88ca979e..fd75172a21 100644 --- a/hw/core/Makefile.objs +++ b/hw/core/Makefile.objs @@ -8,7 +8,6 @@ common-obj-y += irq.o common-obj-y += hotplug.o common-obj-$(CONFIG_SOFTMMU) += nmi.o -common-obj-$(CONFIG_EMPTY_SLOT) += empty_slot.o common-obj-$(CONFIG_XILINX_AXI) += stream.o common-obj-$(CONFIG_PTIMER) += ptimer.o common-obj-$(CONFIG_SOFTMMU) += sysbus.o -- 2.19.0
[Qemu-devel] [PATCH 00/15] another SysBusDevice::init to Device::realize cleanup
Peter suggested [1] another crusade for this merge window, then Cédric jumped on his horse [2]. My turn on my dromedary. - convert few devices to DeviceState::realize, - kill the empty_slot device, - remove unuseful class_init() code [RFC, do we want to keep this?] - few other minor fixes catched while editing Regards, Phil. [1] https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg03605.html [2] https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg1.html Philippe Mathieu-Daudé (15): trace-events: Fix copy/paste typo hw/timer/sun4v-rtc: Convert from DPRINTF() macro to trace events hw/timer/sun4v-rtc: Use DeviceState::realize rather than SysBusDevice::init hw/ssi/xilinx_spi: Use DeviceState::realize rather than SysBusDevice::init hw/sh4/sh_pci: Use DeviceState::realize rather than SysBusDevice::init hw/pci-host/bonito: Use DeviceState::realize rather than SysBusDevice::init hw/mips/gt64xxx_pci: Convert gt64120_reset() function into Device reset method hw/mips/gt64xxx_pci: Mark as bridge device hw/mips/malta: Replace 'empty_slot' by 'unimplemented_device' hw/sparc64/niagara: Replace 'empty_slot' by 'unimplemented_device' hw/sparc/sun4m: Replace 'empty_slot' by 'unimplemented_device' hw/core: Remove the 'empty_slot' device hw/alpha/typhoon: Remove unuseful code hw/hppa/dino: Remove unuseful code hw/mips/malta: Remove unuseful code default-configs/mips-softmmu-common.mak | 1 - default-configs/sparc-softmmu.mak | 1 - default-configs/sparc64-softmmu.mak | 1 - include/hw/empty_slot.h | 7 -- hw/alpha/typhoon.c | 13 --- hw/core/empty_slot.c| 103 hw/hppa/dino.c | 7 -- hw/mips/gt64xxx_pci.c | 18 + hw/mips/mips_malta.c| 17 +--- hw/pci-host/bonito.c| 9 +-- hw/sh4/sh_pci.c | 20 +++-- hw/sparc/sun4m.c| 24 +++--- hw/sparc64/niagara.c| 4 +- hw/ssi/xilinx_spi.c | 9 +-- hw/timer/sun4v-rtc.c| 23 ++ hw/core/Makefile.objs | 1 - hw/timer/trace-events | 6 +- 17 files changed, 50 insertions(+), 214 deletions(-) delete mode 100644 include/hw/empty_slot.h delete mode 100644 hw/core/empty_slot.c -- 2.19.0
[Qemu-devel] [RFC PATCH 15/15] hw/mips/malta: Remove unuseful code
Signed-off-by: Philippe Mathieu-Daudé --- hw/mips/mips_malta.c | 13 - 1 file changed, 13 deletions(-) diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c index 4ccfa87c35..b6633fa141 100644 --- a/hw/mips/mips_malta.c +++ b/hw/mips/mips_malta.c @@ -1422,23 +1422,10 @@ void mips_malta_init(MachineState *machine) pci_vga_init(pci_bus); } -static int mips_malta_sysbus_device_init(SysBusDevice *sysbusdev) -{ -return 0; -} - -static void mips_malta_class_init(ObjectClass *klass, void *data) -{ -SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass); - -k->init = mips_malta_sysbus_device_init; -} - static const TypeInfo mips_malta_device = { .name = TYPE_MIPS_MALTA, .parent= TYPE_SYS_BUS_DEVICE, .instance_size = sizeof(MaltaState), -.class_init= mips_malta_class_init, }; static void mips_malta_machine_init(MachineClass *mc) -- 2.19.0
[Qemu-devel] [PATCH 01/15] trace-events: Fix copy/paste typo
Missed while reviewing 5dd85b4b486. Signed-off-by: Philippe Mathieu-Daudé --- hw/timer/trace-events | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/timer/trace-events b/hw/timer/trace-events index fa4213df5b..ca9ad6321a 100644 --- a/hw/timer/trace-events +++ b/hw/timer/trace-events @@ -56,7 +56,7 @@ systick_timer_tick(void) "systick reload" systick_read(uint64_t addr, uint32_t value, unsigned size) "systick read addr 0x%" PRIx64 " data 0x%" PRIx32 " size %u" systick_write(uint64_t addr, uint32_t value, unsigned size) "systick write addr 0x%" PRIx64 " data 0x%" PRIx32 " size %u" -# hw/char/cmsdk_apb_timer.c +# hw/timer/cmsdk_apb_timer.c cmsdk_apb_timer_read(uint64_t offset, uint64_t data, unsigned size) "CMSDK APB timer read: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u" cmsdk_apb_timer_write(uint64_t offset, uint64_t data, unsigned size) "CMSDK APB timer write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u" cmsdk_apb_timer_reset(void) "CMSDK APB timer: reset" -- 2.19.0
[Qemu-devel] [RFC PATCH 14/15] hw/hppa/dino: Remove unuseful code
Signed-off-by: Philippe Mathieu-Daudé --- hw/hppa/dino.c | 7 --- 1 file changed, 7 deletions(-) diff --git a/hw/hppa/dino.c b/hw/hppa/dino.c index 564b938e3a..31e09942b5 100644 --- a/hw/hppa/dino.c +++ b/hw/hppa/dino.c @@ -488,17 +488,10 @@ PCIBus *dino_init(MemoryRegion *addr_space, return b; } -static int dino_pcihost_init(SysBusDevice *dev) -{ -return 0; -} - static void dino_pcihost_class_init(ObjectClass *klass, void *data) { -SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass); DeviceClass *dc = DEVICE_CLASS(klass); -k->init = dino_pcihost_init; dc->vmsd = &vmstate_dino; } -- 2.19.0
[Qemu-devel] [PATCH 06/15] hw/pci-host/bonito: Use DeviceState::realize rather than SysBusDevice::init
Move from the legacy SysBusDevice::init method to using DeviceState::realize. Signed-off-by: Philippe Mathieu-Daudé --- hw/pci-host/bonito.c | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/hw/pci-host/bonito.c b/hw/pci-host/bonito.c index 9868e2eccc..03d1ec33e3 100644 --- a/hw/pci-host/bonito.c +++ b/hw/pci-host/bonito.c @@ -595,7 +595,7 @@ static const VMStateDescription vmstate_bonito = { } }; -static int bonito_pcihost_initfn(SysBusDevice *dev) +static void bonito_pcihost_realize(PCIDevice *dev, Error **errp) { PCIHostState *phb = PCI_HOST_BRIDGE(dev); @@ -603,8 +603,6 @@ static int bonito_pcihost_initfn(SysBusDevice *dev) pci_bonito_set_irq, pci_bonito_map_irq, dev, get_system_memory(), get_system_io(), 0x28, 32, TYPE_PCI_BUS); - -return 0; } static void bonito_realize(PCIDevice *dev, Error **errp) @@ -684,7 +682,6 @@ PCIBus *bonito_init(qemu_irq *pic) pcihost->pic = pic; qdev_init_nofail(dev); -/* set the pcihost pointer before bonito_initfn is called */ d = pci_create(phb->bus, PCI_DEVFN(0, 0), TYPE_PCI_BONITO); s = PCI_BONITO(d); s->pcihost = pcihost; @@ -726,9 +723,9 @@ static const TypeInfo bonito_info = { static void bonito_pcihost_class_init(ObjectClass *klass, void *data) { -SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass); +PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); -k->init = bonito_pcihost_initfn; +k->realize = bonito_pcihost_realize; } static const TypeInfo bonito_pcihost_info = { -- 2.19.0
Re: [Qemu-devel] ideas for improving TLB performance (help with TCG backend wanted)
On 10/1/18 1:34 PM, Emilio G. Cota wrote: > On Thu, Sep 20, 2018 at 01:19:51 +0100, Alex Bennée wrote: >> If we are going to have an indirection then we can also drop the >> requirement to scale the TLB according to the number of MMU indexes we >> have to support. It's fairly wasteful when a bunch of them are almost >> never used unless you are running stuff that uses them. > > So with dynamic TLB sizing, what you're suggesting here is to resize > each MMU array independently (depending on their use rate) instead > of using a single "TLB size" for all MMU indexes. Am I understanding > your point correctly? You cannot do that without flushing the TBs (and with out-of-line memory ops, the prologue as well) and regenerating. The TLB size is baked into the code. And we really don't have any extra registers free to vary that. r~
[Qemu-devel] [PATCH] lsi: Reselection needed to remove pending commands from queue
Under heavy IO (e.g. fio) the queue is not checked frequently enough for pending commands. As a result some pending commands are timed out by the linux sym53c8xx driver, which sends SCSI Abort messages for the timed out commands. The SCSI Abort messages result in linux errors, which show up in /var/log/messages. e.g. sd 0:0:3:0: [sdd] tag#33 ABORT operation started scsi target0:0:3: control msgout: 80 20 47 d sd 0:0:3:0: ABORT operation complete. scsi target0:0:4: message d sent on bad reselection Add a deadline along with the command when it is added to the queue. When the current command completes, check the queue for pending commands that have exceeded the deadline and if so, simulate a Wait Reselect to handle the pending commands on the queue. When a Wait Reselect is needed, intercept and save the current DMA Scripts Ptr (DSP) contents and load it instead with the pointer to the Reselection Scripts. When Reselection has completed, restore the original DSP contents. Signed-off-by: George Kennedy --- hw/scsi/lsi53c895a.c | 94 ++-- 1 file changed, 91 insertions(+), 3 deletions(-) diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c index 996b406..1e38ceb 100644 --- a/hw/scsi/lsi53c895a.c +++ b/hw/scsi/lsi53c895a.c @@ -198,6 +198,7 @@ typedef struct lsi_request { uint32_t dma_len; uint8_t *dma_buf; uint32_t pending; +uint64_t deadline; int out; QTAILQ_ENTRY(lsi_request) next; } lsi_request; @@ -232,6 +233,9 @@ typedef struct { int command_complete; QTAILQ_HEAD(, lsi_request) queue; lsi_request *current; +int want_resel; /* need resel to handle queued completed cmds */ +uint32_t resel_dsp; /* DMA Scripts Ptr (DSP) of reselection scsi scripts */ +uint32_t next_dsp; /* if want_resel, will be loaded with above */ uint32_t dsa; uint32_t temp; @@ -310,6 +314,44 @@ static inline int lsi_irq_on_rsl(LSIState *s) { return (s->sien0 & LSI_SIST0_RSL) && (s->scid & LSI_SCID_RRE); } +#ifdef DEBUG_LSI +static void showq(LSIState *s) +{ +lsi_request *p; +int count = 0; + +QTAILQ_FOREACH(p, &s->queue, next) { +DPRINTF("%d: tag=%x, pending=%x, deadline=%lx\n", +count++, p->tag, p->pending, p->deadline); +} +} + +static int get_pending_count(LSIState *s) +{ +lsi_request *p; +int count = 0; + +QTAILQ_FOREACH(p, &s->queue, next) { +if (p->pending) { +count++; +} +} +return count; +} +#endif +static int pending_past_deadline(LSIState *s) +{ +lsi_request *p; + +QTAILQ_FOREACH(p, &s->queue, next) { +if (p->pending) { +if (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) > p->deadline) { +return 1; +} +} +} +return 0; +} static void lsi_soft_reset(LSIState *s) { @@ -634,15 +676,22 @@ static void lsi_do_dma(LSIState *s, int out) } } +/* Max time a completed command can be on the queue before Reselection needed */ +#define LSI_DEADLINE1000 /* Add a command to the queue. */ static void lsi_queue_command(LSIState *s) { lsi_request *p = s->current; +uint64_t timeout_ms = LSI_DEADLINE; DPRINTF("Queueing tag=0x%x\n", p->tag); assert(s->current != NULL); assert(s->current->dma_len == 0); + +p->deadline = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + +timeout_ms * 100ULL; + QTAILQ_INSERT_TAIL(&s->queue, s->current, next); s->current = NULL; @@ -668,6 +717,8 @@ static void lsi_reselect(LSIState *s, lsi_request *p) assert(s->current == NULL); QTAILQ_REMOVE(&s->queue, p, next); +DPRINTF("lsi_reselect: p->tag=%x, p->pending=%x, p->deadline=%lx\n", +p->tag, p->pending, p->deadline); s->current = p; id = (p->tag >> 8) & 0xf; @@ -775,6 +826,10 @@ static void lsi_command_complete(SCSIRequest *req, uint32_t status, size_t resid lsi_request_free(s, s->current); scsi_req_unref(req); } +if (pending_past_deadline(s)) { +DPRINTF("lsi_command_complete: want_resel = 1\n"); +s->want_resel = 1; +} lsi_resume_script(s); } @@ -987,7 +1042,7 @@ static void lsi_do_msgout(LSIState *s) s->select_tag |= lsi_get_msgbyte(s) | LSI_TAG_VALID; break; case 0x22: /* ORDERED queue */ -BADF("ORDERED queue not implemented\n"); +DPRINTF("ORDERED queue not implemented\n"); s->select_tag |= lsi_get_msgbyte(s) | LSI_TAG_VALID; break; case 0x0d: @@ -1075,8 +1130,17 @@ static void lsi_memcpy(LSIState *s, uint32_t dest, uint32_t src, int count) static void lsi_wait_reselect(LSIState *s) { lsi_request *p; +#ifdef DEBUG_LSI +int pending_count = get_pending_count(s); -DPRINTF("Wait Reselect\n"); +if (pending_count) { +DPRINTF("Wait Reselect, s->current=%p, pending_count=%d\n", +s->current, pending_count); +
Re: [Qemu-devel] [PATCH v4 3/3] tests: Add unit tests for image locking
On 21.08.18 02:58, Fam Zheng wrote: > Signed-off-by: Fam Zheng > --- > tests/Makefile.include | 2 + > tests/test-image-locking.c | 154 + > 2 files changed, 156 insertions(+) > create mode 100644 tests/test-image-locking.c > > diff --git a/tests/Makefile.include b/tests/Makefile.include > index 760a0f18b6..8cc0595b39 100644 > --- a/tests/Makefile.include > +++ b/tests/Makefile.include > @@ -95,6 +95,7 @@ check-unit-y += tests/test-bdrv-drain$(EXESUF) > check-unit-y += tests/test-blockjob$(EXESUF) > check-unit-y += tests/test-blockjob-txn$(EXESUF) > check-unit-y += tests/test-block-backend$(EXESUF) > +check-unit-y += tests/test-image-locking$(EXESUF) > check-unit-y += tests/test-x86-cpuid$(EXESUF) > # all code tested by test-x86-cpuid is inside topology.h > gcov-files-test-x86-cpuid-y = > @@ -640,6 +641,7 @@ tests/test-bdrv-drain$(EXESUF): tests/test-bdrv-drain.o > $(test-block-obj-y) $(te > tests/test-blockjob$(EXESUF): tests/test-blockjob.o $(test-block-obj-y) > $(test-util-obj-y) > tests/test-blockjob-txn$(EXESUF): tests/test-blockjob-txn.o > $(test-block-obj-y) $(test-util-obj-y) > tests/test-block-backend$(EXESUF): tests/test-block-backend.o > $(test-block-obj-y) $(test-util-obj-y) > +tests/test-image-locking$(EXESUF): tests/test-image-locking.o > $(test-block-obj-y) $(test-util-obj-y) > tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(test-block-obj-y) > tests/test-iov$(EXESUF): tests/test-iov.o $(test-util-obj-y) > tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o $(test-util-obj-y) > $(test-crypto-obj-y) > diff --git a/tests/test-image-locking.c b/tests/test-image-locking.c > new file mode 100644 > index 00..ac1a5fd4ca > --- /dev/null > +++ b/tests/test-image-locking.c > @@ -0,0 +1,154 @@ > +/* > + * Image locking tests > + * > + * Copyright (c) 2018 Red Hat Inc. > + * > + * Author: Fam Zheng > + * > + * Permission is hereby granted, free of charge, to any person obtaining a > copy > + * of this software and associated documentation files (the "Software"), to > deal > + * in the Software without restriction, including without limitation the > rights > + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell > + * copies of the Software, and to permit persons to whom the Software is > + * furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice shall be included in > + * all copies or substantial portions of the Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > FROM, > + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN > + * THE SOFTWARE. > + */ > + > +#include "qemu/osdep.h" > +#include "block/block.h" > +#include "sysemu/block-backend.h" > +#include "qapi/error.h" > +#include "qapi/qmp/qdict.h" > + > +static BlockBackend *open_image(const char *path, > +uint64_t perm, uint64_t shared_perm, > +Error **errp) > +{ > +Error *local_err = NULL; > +BlockBackend *blk; > +QDict *options = qdict_new(); > + > +qdict_put_str(options, "driver", "raw"); > +blk = blk_new_open(path, NULL, options, BDRV_O_RDWR, &local_err); > +if (blk) { > +g_assert_null(local_err); > +if (blk_set_perm(blk, perm, shared_perm, errp)) { > +blk_unref(blk); > +blk = NULL; > +} > +} else { > +error_propagate(errp, local_err); > +} > +return blk; > +} > + > +static void check_locked_bytes(int fd, uint64_t perm_locks, > + uint64_t shared_perm_locks) > +{ > +int i; > + > +if (!perm_locks && !shared_perm_locks) { > +g_assert(!qemu_lock_fd_test(fd, 0, 0, true)); > +return; > +} > +for (i = 0; (1ULL << i) <= BLK_PERM_ALL; i++) { > +uint64_t bit = (1ULL << i); > +bool perm_expected = !!(bit & perm_locks); > +bool shared_perm_expected = !!(bit & shared_perm_locks); > +g_assert_cmpint(perm_expected, ==, > +!!qemu_lock_fd_test(fd, 100 + i, 1, true)); > +g_assert_cmpint(shared_perm_expected, ==, > +!!qemu_lock_fd_test(fd, 200 + i, 1, true)); > +} > +} > + > +static void test_image_locking_basic(void) > +{ > +BlockBackend *blk1, *blk2, *blk3; > +char img_path[] = "/tmp/qtest.XX"; > +uint64_t perm, shared_perm; > + > +int fd = mkstemp(img_path); > +assert(fd >= 0); Don't you want to remove these temporary files after the test? > + > +perm = BLK_PERM_WRITE | B
[Qemu-devel] [PATCH v5 2/9] x86_iommu: move vtd_generate_msi_message in common file
The vtd_generate_msi_message() in intel-iommu is used to construct a MSI Message from IRQ. A similar function will be needed when we add interrupt remapping support in amd-iommu. Moving the function in common file to avoid the code duplication. Rename it to x86_iommu_irq_to_msi_message(). There is no logic changes in the code flow. Signed-off-by: Brijesh Singh Suggested-by: Peter Xu Reviewed-by: Eduardo Habkost Cc: Peter Xu Cc: "Michael S. Tsirkin" Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Cc: Marcel Apfelbaum Cc: Tom Lendacky Cc: Suravee Suthikulpanit --- hw/i386/intel_iommu.c | 32 +++-- hw/i386/x86-iommu.c | 24 include/hw/i386/intel_iommu.h | 59 -- include/hw/i386/x86-iommu.h | 66 +++ 4 files changed, 94 insertions(+), 87 deletions(-) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 84dbc20..014418b 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -2701,7 +2701,7 @@ static int vtd_irte_get(IntelIOMMUState *iommu, uint16_t index, /* Fetch IRQ information of specific IR index */ static int vtd_remap_irq_get(IntelIOMMUState *iommu, uint16_t index, - VTDIrq *irq, uint16_t sid) + X86IOMMUIrq *irq, uint16_t sid) { VTD_IR_TableEntry irte = {}; int ret = 0; @@ -2730,30 +2730,6 @@ static int vtd_remap_irq_get(IntelIOMMUState *iommu, uint16_t index, return 0; } -/* Generate one MSI message from VTDIrq info */ -static void vtd_generate_msi_message(VTDIrq *irq, MSIMessage *msg_out) -{ -VTD_MSIMessage msg = {}; - -/* Generate address bits */ -msg.dest_mode = irq->dest_mode; -msg.redir_hint = irq->redir_hint; -msg.dest = irq->dest; -msg.__addr_hi = irq->dest & 0xff00; -msg.__addr_head = cpu_to_le32(0xfee); -/* Keep this from original MSI address bits */ -msg.__not_used = irq->msi_addr_last_bits; - -/* Generate data bits */ -msg.vector = irq->vector; -msg.delivery_mode = irq->delivery_mode; -msg.level = 1; -msg.trigger_mode = irq->trigger_mode; - -msg_out->address = msg.msi_addr; -msg_out->data = msg.msi_data; -} - /* Interrupt remapping for MSI/MSI-X entry */ static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu, MSIMessage *origin, @@ -2763,7 +2739,7 @@ static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu, int ret = 0; VTD_IR_MSIAddress addr; uint16_t index; -VTDIrq irq = {}; +X86IOMMUIrq irq = {}; assert(origin && translated); @@ -2842,8 +2818,8 @@ static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu, */ irq.msi_addr_last_bits = addr.addr.__not_care; -/* Translate VTDIrq to MSI message */ -vtd_generate_msi_message(&irq, translated); +/* Translate X86IOMMUIrq to MSI message */ +x86_iommu_irq_to_msi_message(&irq, translated); out: trace_vtd_ir_remap_msi(origin->address, origin->data, diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c index 7440cb8..abc3c03 100644 --- a/hw/i386/x86-iommu.c +++ b/hw/i386/x86-iommu.c @@ -53,6 +53,30 @@ void x86_iommu_iec_notify_all(X86IOMMUState *iommu, bool global, } } +/* Generate one MSI message from VTDIrq info */ +void x86_iommu_irq_to_msi_message(X86IOMMUIrq *irq, MSIMessage *msg_out) +{ +X86IOMMU_MSIMessage msg = {}; + +/* Generate address bits */ +msg.dest_mode = irq->dest_mode; +msg.redir_hint = irq->redir_hint; +msg.dest = irq->dest; +msg.__addr_hi = irq->dest & 0xff00; +msg.__addr_head = cpu_to_le32(0xfee); +/* Keep this from original MSI address bits */ +msg.__not_used = irq->msi_addr_last_bits; + +/* Generate data bits */ +msg.vector = irq->vector; +msg.delivery_mode = irq->delivery_mode; +msg.level = 1; +msg.trigger_mode = irq->trigger_mode; + +msg_out->address = msg.msi_addr; +msg_out->data = msg.msi_data; +} + /* Default X86 IOMMU device */ static X86IOMMUState *x86_iommu_default = NULL; diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h index fbfedcb..ed4e758 100644 --- a/include/hw/i386/intel_iommu.h +++ b/include/hw/i386/intel_iommu.h @@ -66,8 +66,6 @@ typedef struct VTDIOTLBEntry VTDIOTLBEntry; typedef struct VTDBus VTDBus; typedef union VTD_IR_TableEntry VTD_IR_TableEntry; typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress; -typedef struct VTDIrq VTDIrq; -typedef struct VTD_MSIMessage VTD_MSIMessage; /* Context-Entry */ struct VTDContextEntry { @@ -197,63 +195,6 @@ union VTD_IR_MSIAddress { uint32_t data; }; -/* Generic IRQ entry information */ -struct VTDIrq { -/* Used by both IOAPIC/MSI interrupt remapping */ -uint8_t trigger_mode; -uint8_t vector; -uint8_t delivery_mode; -uint32_t dest; -uint8_t dest_mode; - -/* only used by MSI interrupt remapping */ -
[Qemu-devel] [PATCH v5 9/9] x86_iommu/amd: Enable Guest virtual APIC support
Now that amd-iommu support interrupt remapping, enable the GASup in IVRS table and GASup in extended feature register to indicate that IOMMU support guest virtual APIC mode. GASup provides option to guest OS to make use of 128-bit IRTE. Note that the GAMSup is set to zero to indicate that amd-iommu does not support guest virtual APIC mode (aka AVIC) which would be used for the nested VMs. See Table 21 from IOMMU spec for interrupt virtualization controls Signed-off-by: Brijesh Singh Reviewed-by: Peter Xu Cc: Peter Xu Cc: "Michael S. Tsirkin" Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Cc: Marcel Apfelbaum Cc: Tom Lendacky Cc: Suravee Suthikulpanit --- hw/i386/acpi-build.c | 3 ++- hw/i386/amd_iommu.h | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 1ef396d..236a20e 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -2518,7 +2518,8 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker) build_append_int_noprefix(table_data, (48UL << 30) | /* HATS */ (48UL << 28) | /* GATS */ - (1UL << 2),/* GTSup */ + (1UL << 2) | /* GTSup */ + (1UL << 6),/* GASup */ 4); /* * Type 1 device entry reporting all devices diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h index 8061e9c..687fcd8 100644 --- a/hw/i386/amd_iommu.h +++ b/hw/i386/amd_iommu.h @@ -176,7 +176,7 @@ /* extended feature support */ #define AMDVI_EXT_FEATURES (AMDVI_FEATURE_PREFETCH | AMDVI_FEATURE_PPR | \ AMDVI_FEATURE_IA | AMDVI_FEATURE_GT | AMDVI_FEATURE_HE | \ -AMDVI_GATS_MODE | AMDVI_HATS_MODE) +AMDVI_GATS_MODE | AMDVI_HATS_MODE | AMDVI_FEATURE_GA) /* capabilities header */ #define AMDVI_CAPAB_FEATURES (AMDVI_CAPAB_FLAT_EXT | \ -- 2.7.4
[Qemu-devel] [Bug 1701449] Re: high memory usage when using rbd with client caching
@Nick: if you can recreate the librbd memory growth, any chance you can help test a potential fix [1]? [1] https://github.com/ceph/ceph/pull/24297 -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1701449 Title: high memory usage when using rbd with client caching Status in QEMU: New Bug description: Hi, we are experiencing a quite high memory usage of a single qemu (used with KVM) process when using RBD with client caching as a disk backend. We are testing with 3GB memory qemu virtual machines and 128MB RBD client cache. When running 'fio' in the virtual machine you can see that after some time the machine uses a lot more memory (RSS) on the hypervisor than she should. We have seen values (in real production machines, no artificially fio tests) of 250% memory overhead. I reproduced this with qemu version 2.9 as well. Here the contents of our ceph.conf on the hypervisor: """ [client] rbd cache writethrough until flush = False rbd cache max dirty = 100663296 rbd cache size = 134217728 rbd cache target dirty = 50331648 """ How to reproduce: * create a virtual machine with a RBD backed disk (100GB or so) * install a linux distribution on it (we are using Ubuntu) * install fio (apt-get install fio) * run fio multiple times with (e.g.) the following test file: """ # This job file tries to mimic the Intel IOMeter File Server Access Pattern [global] description=Emulation of Intel IOmeter File Server Access Pattern randrepeat=0 filename=/root/test.dat # IOMeter defines the server loads as the following: # iodepth=1 Linear # iodepth=4 Very Light # iodepth=8 Light # iodepth=64Moderate # iodepth=256 Heavy iodepth=8 size=80g direct=0 ioengine=libaio [iometer] stonewall bs=4M rw=randrw [iometer_just_write] stonewall bs=4M rw=write [iometer_just_read] stonewall bs=4M rw=read """ You can measure the virtual machine RSS usage on the hypervisor with: virsh dommemstat | grep rss or if you are not using libvirt: grep RSS /proc//status When switching off the RBD client cache, all is ok again, as the process does not use so much memory anymore. There is already a ticket on the ceph bug tracker for this ([1]). However I can reproduce that memory behaviour only when using qemu (maybe it is using librbd in a special way?). Running directly 'fio' with the rbd engine does not result in that high memory usage. [1] http://tracker.ceph.com/issues/20054 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1701449/+subscriptions
[Qemu-devel] [PATCH v5 1/9] x86_iommu: move the kernel-irqchip check in common code
Interrupt remapping needs kernel-irqchip={off|split} on both Intel and AMD platforms. Move the check in common place. Signed-off-by: Brijesh Singh Reviewed-by: Peter Xu Cc: Peter Xu Cc: "Michael S. Tsirkin" Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Cc: Marcel Apfelbaum Cc: Tom Lendacky Cc: Suravee Suthikulpanit --- hw/i386/intel_iommu.c | 7 --- hw/i386/x86-iommu.c | 9 + 2 files changed, 9 insertions(+), 7 deletions(-) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 3dfada1..84dbc20 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -3248,13 +3248,6 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) { X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); -/* Currently Intel IOMMU IR only support "kernel-irqchip={off|split}" */ -if (x86_iommu->intr_supported && kvm_irqchip_in_kernel() && -!kvm_irqchip_is_split()) { -error_setg(errp, "Intel Interrupt Remapping cannot work with " - "kernel-irqchip=on, please use 'split|off'."); -return false; -} if (s->intr_eim == ON_OFF_AUTO_ON && !x86_iommu->intr_supported) { error_setg(errp, "eim=on cannot be selected without intremap=on"); return false; diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c index 8a01a2d..7440cb8 100644 --- a/hw/i386/x86-iommu.c +++ b/hw/i386/x86-iommu.c @@ -25,6 +25,7 @@ #include "qapi/error.h" #include "qemu/error-report.h" #include "trace.h" +#include "sysemu/kvm.h" void x86_iommu_iec_register_notifier(X86IOMMUState *iommu, iec_notify_fn fn, void *data) @@ -94,6 +95,14 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp) return; } +/* Both Intel and AMD IOMMU IR only support "kernel-irqchip={off|split}" */ +if (x86_iommu->intr_supported && kvm_irqchip_in_kernel() && +!kvm_irqchip_is_split()) { +error_setg(errp, "Interrupt Remapping cannot work with " + "kernel-irqchip=on, please use 'split|off'."); +return; +} + if (x86_class->realize) { x86_class->realize(dev, errp); } -- 2.7.4
[Qemu-devel] [PATCH v5 7/9] i386: acpi: add IVHD device entry for IOAPIC
When interrupt remapping is enabled, add a special IVHD device (type IOAPIC). Signed-off-by: Brijesh Singh Acked-by: Peter Xu Cc: Peter Xu Cc: "Michael S. Tsirkin" Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Cc: Marcel Apfelbaum Cc: Tom Lendacky Cc: Suravee Suthikulpanit --- hw/i386/acpi-build.c | 28 +++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 1599caa..1ef396d 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -2467,9 +2467,12 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker) * IVRS table as specified in AMD IOMMU Specification v2.62, Section 5.2 * accessible here http://support.amd.com/TechDocs/48882_IOMMU.pdf */ +#define IOAPIC_SB_DEVID (uint64_t)PCI_BUILD_BDF(0, PCI_DEVFN(0x14, 0)) + static void build_amd_iommu(GArray *table_data, BIOSLinker *linker) { +int ivhd_table_len = 28; int iommu_start = table_data->len; AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default()); @@ -2491,8 +2494,16 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker) (1UL << 6) | /* PrefSup */ (1UL << 7), /* PPRSup */ 1); + +/* + * When interrupt remapping is supported, we add a special IVHD device + * for type IO-APIC. + */ +if (x86_iommu_get_default()->intr_supported) { +ivhd_table_len += 8; +} /* IVHD length */ -build_append_int_noprefix(table_data, 28, 2); +build_append_int_noprefix(table_data, ivhd_table_len, 2); /* DeviceID */ build_append_int_noprefix(table_data, s->devid, 2); /* Capability offset */ @@ -2516,6 +2527,21 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker) */ build_append_int_noprefix(table_data, 0x001, 4); +/* + * Add a special IVHD device type. + * Refer to spec - Table 95: IVHD device entry type codes + * + * Linux IOMMU driver checks for the special IVHD device (type IO-APIC). + * See Linux kernel commit 'c2ff5cf5294bcbd7fa50f7d860e90a66db7e5059' + */ +if (x86_iommu_get_default()->intr_supported) { +build_append_int_noprefix(table_data, + (0x1ull << 56) | /* type IOAPIC */ + (IOAPIC_SB_DEVID << 40) | /* IOAPIC devid */ + 0x48, /* special device */ + 8); +} + build_header(linker, table_data, (void *)(table_data->data + iommu_start), "IVRS", table_data->len - iommu_start, 1, NULL, NULL); } -- 2.7.4
[Qemu-devel] [PATCH v5 6/9] x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled
Emulate the interrupt remapping support when guest virtual APIC is not enabled. For more info Refer: AMD IOMMU spec Rev 3.0 - section 2.2.5.1 When VAPIC is not enabled, it uses interrupt remapping as defined in Table 20 and Figure 15 from IOMMU spec. Signed-off-by: Brijesh Singh Cc: Peter Xu Cc: "Michael S. Tsirkin" Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Cc: Marcel Apfelbaum Cc: Tom Lendacky Cc: Suravee Suthikulpanit --- hw/i386/amd_iommu.c | 199 ++- hw/i386/amd_iommu.h | 46 +++- hw/i386/trace-events | 7 ++ 3 files changed, 250 insertions(+), 2 deletions(-) diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c index 9118a75..8e2f13c 100644 --- a/hw/i386/amd_iommu.c +++ b/hw/i386/amd_iommu.c @@ -28,6 +28,7 @@ #include "qemu/error-report.h" #include "hw/i386/apic_internal.h" #include "trace.h" +#include "hw/i386/apic-msidef.h" /* used AMD-Vi MMIO registers */ const char *amdvi_mmio_low[] = { @@ -1032,21 +1033,146 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion *iommu, hwaddr addr, return ret; } +static int amdvi_get_irte(AMDVIState *s, MSIMessage *origin, uint64_t *dte, + union irte *irte, uint16_t devid) +{ +uint64_t irte_root, offset; + +irte_root = dte[2] & AMDVI_IR_PHYS_ADDR_MASK; +offset = (origin->data & AMDVI_IRTE_OFFSET) << 2; + +trace_amdvi_ir_irte(irte_root, offset); + +if (dma_memory_read(&address_space_memory, irte_root + offset, +irte, sizeof(*irte))) { +trace_amdvi_ir_err("failed to get irte"); +return -AMDVI_IR_GET_IRTE; +} + +trace_amdvi_ir_irte_val(irte->val); + +return 0; +} + +static int amdvi_int_remap_legacy(AMDVIState *iommu, + MSIMessage *origin, + MSIMessage *translated, + uint64_t *dte, + X86IOMMUIrq *irq, + uint16_t sid) +{ +int ret; +union irte irte; + +/* get interrupt remapping table */ +ret = amdvi_get_irte(iommu, origin, dte, &irte, sid); +if (ret < 0) { +return ret; +} + +if (!irte.fields.valid) { +trace_amdvi_ir_target_abort("RemapEn is disabled"); +return -AMDVI_IR_TARGET_ABORT; +} + +if (irte.fields.guest_mode) { +error_report_once("guest mode is not zero"); +return -AMDVI_IR_ERR; +} + +if (irte.fields.int_type > AMDVI_IOAPIC_INT_TYPE_ARBITRATED) { +error_report_once("reserved int_type"); +return -AMDVI_IR_ERR; +} + +irq->delivery_mode = irte.fields.int_type; +irq->vector = irte.fields.vector; +irq->dest_mode = irte.fields.dm; +irq->redir_hint = irte.fields.rq_eoi; +irq->dest = irte.fields.destination; + +return 0; +} + +static int __amdvi_int_remap_msi(AMDVIState *iommu, + MSIMessage *origin, + MSIMessage *translated, + uint64_t *dte, + X86IOMMUIrq *irq, + uint16_t sid) +{ +uint8_t int_ctl; + +int_ctl = (dte[2] >> AMDVI_IR_INTCTL_SHIFT) & 3; +trace_amdvi_ir_intctl(int_ctl); + +switch (int_ctl) { +case AMDVI_IR_INTCTL_PASS: +memcpy(translated, origin, sizeof(*origin)); +return 0; +case AMDVI_IR_INTCTL_REMAP: +break; +case AMDVI_IR_INTCTL_ABORT: +trace_amdvi_ir_target_abort("int_ctl abort"); +return -AMDVI_IR_TARGET_ABORT; +default: +trace_amdvi_ir_err("int_ctl reserved"); +return -AMDVI_IR_ERR; +} + +return amdvi_int_remap_legacy(iommu, origin, translated, dte, irq, sid); +} + /* Interrupt remapping for MSI/MSI-X entry */ static int amdvi_int_remap_msi(AMDVIState *iommu, MSIMessage *origin, MSIMessage *translated, uint16_t sid) { +int ret = 0; +uint64_t pass = 0; +uint64_t dte[4] = { 0 }; +X86IOMMUIrq irq = { 0 }; +uint8_t dest_mode, delivery_mode; + assert(origin && translated); +/* + * When IOMMU is enabled, interrupt remap request will come either from + * IO-APIC or PCI device. If interrupt is from PCI device then it will + * have a valid requester id but if the interrupt is from IO-APIC + * then requester id will be invalid. + */ +if (sid == X86_IOMMU_SID_INVALID) { +sid = AMDVI_IOAPIC_SB_DEVID; +} + trace_amdvi_ir_remap_msi_req(origin->address, origin->data, sid); -if (!iommu || !X86_IOMMU_DEVICE(iommu)->intr_supported) { +/* check if device table entry is set before we go further. */ +if (!iommu || !iommu->devtab_len) { memcpy(translated, origin, sizeof(*origin)); goto out; } +if (!amdvi_get_dte(iomm
[Qemu-devel] [PATCH v5 4/9] x86_iommu/amd: make the address space naming consistent with intel-iommu
To be consistent with intel-iommu: - rename the address space to use '_' instead of '-' - update the memory region relationships Signed-off-by: Brijesh Singh Reviewed-by: Peter Xu Cc: Peter Xu Cc: "Michael S. Tsirkin" Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Cc: Marcel Apfelbaum Cc: Tom Lendacky Cc: Suravee Suthikulpanit --- hw/i386/amd_iommu.c | 34 +++--- 1 file changed, 27 insertions(+), 7 deletions(-) diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c index 7206bb0..4bec1c6 100644 --- a/hw/i386/amd_iommu.c +++ b/hw/i386/amd_iommu.c @@ -55,6 +55,7 @@ struct AMDVIAddressSpace { uint8_t bus_num;/* bus number */ uint8_t devfn; /* device function */ AMDVIState *iommu_state;/* AMDVI - one per machine */ +MemoryRegion root; /* AMDVI Root memory map region */ IOMMUMemoryRegion iommu;/* Device's address translation region */ MemoryRegion iommu_ir; /* Device's interrupt remapping region */ AddressSpace as;/* device's corresponding address space */ @@ -1032,8 +1033,9 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion *iommu, hwaddr addr, static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn) { +char name[128]; AMDVIState *s = opaque; -AMDVIAddressSpace **iommu_as; +AMDVIAddressSpace **iommu_as, *amdvi_dev_as; int bus_num = pci_bus_num(bus); iommu_as = s->address_spaces[bus_num]; @@ -1046,19 +1048,37 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn) /* set up AMD-Vi region */ if (!iommu_as[devfn]) { +snprintf(name, sizeof(name), "amd_iommu_devfn_%d", devfn); + iommu_as[devfn] = g_malloc0(sizeof(AMDVIAddressSpace)); iommu_as[devfn]->bus_num = (uint8_t)bus_num; iommu_as[devfn]->devfn = (uint8_t)devfn; iommu_as[devfn]->iommu_state = s; -memory_region_init_iommu(&iommu_as[devfn]->iommu, - sizeof(iommu_as[devfn]->iommu), +amdvi_dev_as = iommu_as[devfn]; + +/* + * Memory region relationships looks like (Address range shows + * only lower 32 bits to make it short in length...): + * + * |-+---+--| + * | Name| Address range | Priority | + * |-+---+--+ + * | amdvi_root | - |0 | + * | amdvi_iommu| - |1 | + * |-+---+--| + */ +memory_region_init_iommu(&amdvi_dev_as->iommu, + sizeof(amdvi_dev_as->iommu), TYPE_AMD_IOMMU_MEMORY_REGION, OBJECT(s), - "amd-iommu", UINT64_MAX); -address_space_init(&iommu_as[devfn]->as, - MEMORY_REGION(&iommu_as[devfn]->iommu), - "amd-iommu"); + "amd_iommu", UINT64_MAX); +memory_region_init(&amdvi_dev_as->root, OBJECT(s), + "amdvi_root", UINT64_MAX); +address_space_init(&amdvi_dev_as->as, &amdvi_dev_as->root, name); +memory_region_add_subregion_overlap(&amdvi_dev_as->root, 0, + MEMORY_REGION(&amdvi_dev_as->iommu), +1); } return &iommu_as[devfn]->as; } -- 2.7.4
[Qemu-devel] [PATCH v5 0/9] x86_iommu/amd: add interrupt remap support
This series adds the interrupt remapping support for amd-iommu device. IOMMU spec is available at: https://support.amd.com/TechDocs/48882_IOMMU.pdf To enable the interrupt remap use below qemu cli # $QEMU \ -device amd-iommu,intremap=on I have tested FC-28 and Ubuntu 18.04 guest. Linux guest bootup log shows the interrupt remap supports: [root@localhost ~]# dmesg | grep -i AMD-Vi [0.001761] AMD-Vi: Using IVHD type 0x10 [0.003051] AMD-Vi: device: 00:03.0 cap: 0040 seg: 0 flags: d1 info [0.004007] AMD-Vi:mmio-addr: fed8 [0.004874] AMD-Vi: DEV_ALLflags: 00 [0.006236] AMD-Vi: DEV_SPECIAL(IOAPIC[0]) devid: 00:14.0 [0.667943] AMD-Vi: Found IOMMU at :00:03.0 cap 0x40 [0.668727] AMD-Vi: Extended features (0x29d3): [0.669874] AMD-Vi: Interrupt remapping enabled [0.671074] AMD-Vi: Lazy IO/TLB flushing enabled cat /proc/interrupts confirms that its using IR [root@localhost ~]# cat /proc/interrupts CPU0 0: 40 IR-IO-APIC2-edge timer 1: 9 IR-IO-APIC1-edge i8042 4: 1770 IR-IO-APIC4-edge ttyS0 7: 0 IR-IO-APIC7-edge parport0 8: 1 IR-IO-APIC8-edge rtc0 9: 0 IR-IO-APIC9-fasteoi acpi 12: 15 IR-IO-APIC 12-edge i8042 16: 0 IR-IO-APIC 16-fasteoi i801_smbus 24: 0 PCI-MSI 49152-edge AMD-Vi 25: 13070 IR-PCI-MSI 512000-edge ahci[:00:1f.2] 26: 86 IR-PCI-MSI 32768-edge enp0s2-rx-0 27:139 IR-PCI-MSI 32769-edge enp0s2-tx-0 28: 1 IR-PCI-MSI 32770-edge enp0s2 NMI: 0 Non-maskable interrupts LOC: 26686 Local timer interrupts SPU: 0 Spurious interrupts ... ... Cc: "Michael S. Tsirkin" Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Cc: Marcel Apfelbaum Cc: Tom Lendacky Cc: Suravee Suthikulpanit Cc: Peter Xu Changes since v4: - Based on Peter's comment use error_report_once in patch 5 and tracepoint in Patch 6 Changes since v3: - do not treat errors as a passthrough (where applicable) - use error_report_once when applicable Changes since v2: - make the address space rename a separate patch - fix the V=1 check in patch 2 - add more comments in the patches - use amdvi_target_abort trace point where applicable - use error_report_once where applicable - do not cause exit() when configuration mismatch is detected, but log the error so that user knows about it. Changes since v1: - move vtd_generate_msi_message to common code - fix the dest_mode bit extraction - add more comments explaining why we add the special device - some minor cleanups based on Peter's feedbacks Brijesh Singh (9): x86_iommu: move the kernel-irqchip check in common code x86_iommu: move vtd_generate_msi_message in common file x86_iommu/amd: remove V=1 check from amdvi_validate_dte() x86_iommu/amd: make the address space naming consistent with intel-iommu x86_iommu/amd: Prepare for interrupt remap support x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled i386: acpi: add IVHD device entry for IOAPIC x86_iommu/amd: Add interrupt remap support when VAPIC is enabled x86_iommu/amd: Enable Guest virtual APIC support hw/i386/acpi-build.c | 31 +++- hw/i386/amd_iommu.c | 414 +- hw/i386/amd_iommu.h | 96 +- hw/i386/intel_iommu.c | 39 +--- hw/i386/trace-events | 14 ++ hw/i386/x86-iommu.c | 33 include/hw/i386/intel_iommu.h | 59 -- include/hw/i386/x86-iommu.h | 66 +++ 8 files changed, 643 insertions(+), 109 deletions(-) -- 2.7.4
[Qemu-devel] [PATCH v5 5/9] x86_iommu/amd: Prepare for interrupt remap support
Register the interrupt remapping callback and read/write ops for the amd-iommu-ir memory region. amd-iommu-ir is set to higher priority to ensure that this region won't be masked out by other memory regions. Signed-off-by: Brijesh Singh Cc: Peter Xu Cc: "Michael S. Tsirkin" Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Cc: Marcel Apfelbaum Cc: Tom Lendacky Cc: Suravee Suthikulpanit --- hw/i386/amd_iommu.c | 106 +++ hw/i386/amd_iommu.h | 14 ++- hw/i386/trace-events | 5 +++ 3 files changed, 123 insertions(+), 2 deletions(-) diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c index 4bec1c6..9118a75 100644 --- a/hw/i386/amd_iommu.c +++ b/hw/i386/amd_iommu.c @@ -26,6 +26,7 @@ #include "amd_iommu.h" #include "qapi/error.h" #include "qemu/error-report.h" +#include "hw/i386/apic_internal.h" #include "trace.h" /* used AMD-Vi MMIO registers */ @@ -1031,6 +1032,99 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion *iommu, hwaddr addr, return ret; } +/* Interrupt remapping for MSI/MSI-X entry */ +static int amdvi_int_remap_msi(AMDVIState *iommu, + MSIMessage *origin, + MSIMessage *translated, + uint16_t sid) +{ +assert(origin && translated); + +trace_amdvi_ir_remap_msi_req(origin->address, origin->data, sid); + +if (!iommu || !X86_IOMMU_DEVICE(iommu)->intr_supported) { +memcpy(translated, origin, sizeof(*origin)); +goto out; +} + +if (origin->address & AMDVI_MSI_ADDR_HI_MASK) { +trace_amdvi_err("MSI address high 32 bits non-zero when " +"Interrupt Remapping enabled."); +return -AMDVI_IR_ERR; +} + +if ((origin->address & AMDVI_MSI_ADDR_LO_MASK) != APIC_DEFAULT_ADDRESS) { +trace_amdvi_err("MSI is not from IOAPIC."); +return -AMDVI_IR_ERR; +} + +out: +trace_amdvi_ir_remap_msi(origin->address, origin->data, + translated->address, translated->data); +return 0; +} + +static int amdvi_int_remap(X86IOMMUState *iommu, + MSIMessage *origin, + MSIMessage *translated, + uint16_t sid) +{ +return amdvi_int_remap_msi(AMD_IOMMU_DEVICE(iommu), origin, + translated, sid); +} + +static MemTxResult amdvi_mem_ir_write(void *opaque, hwaddr addr, + uint64_t value, unsigned size, + MemTxAttrs attrs) +{ +int ret; +MSIMessage from = { 0, 0 }, to = { 0, 0 }; +uint16_t sid = AMDVI_IOAPIC_SB_DEVID; + +from.address = (uint64_t) addr + AMDVI_INT_ADDR_FIRST; +from.data = (uint32_t) value; + +trace_amdvi_mem_ir_write_req(addr, value, size); + +if (!attrs.unspecified) { +/* We have explicit Source ID */ +sid = attrs.requester_id; +} + +ret = amdvi_int_remap_msi(opaque, &from, &to, sid); +if (ret < 0) { +/* TODO: log the event using IOMMU log event interface */ +error_report_once("failed to remap interrupt from devid 0x%x", sid); +return MEMTX_ERROR; +} + +apic_get_class()->send_msi(&to); + +trace_amdvi_mem_ir_write(to.address, to.data); +return MEMTX_OK; +} + +static MemTxResult amdvi_mem_ir_read(void *opaque, hwaddr addr, + uint64_t *data, unsigned size, + MemTxAttrs attrs) +{ +return MEMTX_OK; +} + +static const MemoryRegionOps amdvi_ir_ops = { +.read_with_attrs = amdvi_mem_ir_read, +.write_with_attrs = amdvi_mem_ir_write, +.endianness = DEVICE_LITTLE_ENDIAN, +.impl = { +.min_access_size = 4, +.max_access_size = 4, +}, +.valid = { +.min_access_size = 4, +.max_access_size = 4, +} +}; + static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn) { char name[128]; @@ -1066,6 +1160,7 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn) * |-+---+--+ * | amdvi_root | - |0 | * | amdvi_iommu| - |1 | + * | amdvi_iommu_ir | fee0-feef | 64 | * |-+---+--| */ memory_region_init_iommu(&amdvi_dev_as->iommu, @@ -1076,6 +1171,13 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn) memory_region_init(&amdvi_dev_as->root, OBJECT(s), "amdvi_root", UINT64_MAX); address_space_init(&amdvi_dev_as->as, &amdvi_dev_as->root, name); +memory_region_init_io(&amdvi_dev_as->iommu_ir, OBJECT(s), + &amdvi_ir_ops, s, "amd_iommu_ir", +
[Qemu-devel] [PATCH v5 3/9] x86_iommu/amd: remove V=1 check from amdvi_validate_dte()
Currently, the amdvi_validate_dte() assumes that a valid DTE will always have V=1. This is not true. The V=1 means that bit[127:1] are valid. A valid DTE can have IV=1 and V=0 (i.e address translation disabled and interrupt remapping enabled) Remove the V=1 check from amdvi_validate_dte(), make the caller responsible to check for V or IV bits. This also fixes a bug in existing code that when error is detected during the translation we'll fail the translation instead of assuming a passthrough mode. Signed-off-by: Brijesh Singh Reviewed-by: Peter Xu Cc: Peter Xu Cc: "Michael S. Tsirkin" Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Cc: Marcel Apfelbaum Cc: Tom Lendacky Cc: Suravee Suthikulpanit --- hw/i386/amd_iommu.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c index 1fd669f..7206bb0 100644 --- a/hw/i386/amd_iommu.c +++ b/hw/i386/amd_iommu.c @@ -807,7 +807,7 @@ static inline uint64_t amdvi_get_perms(uint64_t entry) AMDVI_DEV_PERM_SHIFT; } -/* a valid entry should have V = 1 and reserved bits honoured */ +/* validate that reserved bits are honoured */ static bool amdvi_validate_dte(AMDVIState *s, uint16_t devid, uint64_t *dte) { @@ -820,7 +820,7 @@ static bool amdvi_validate_dte(AMDVIState *s, uint16_t devid, return false; } -return dte[0] & AMDVI_DEV_VALID; +return true; } /* get a device table entry given the devid */ @@ -966,8 +966,12 @@ static void amdvi_do_translate(AMDVIAddressSpace *as, hwaddr addr, return; } -/* devices with V = 0 are not translated */ if (!amdvi_get_dte(s, devid, entry)) { +return; +} + +/* devices with V = 0 are not translated */ +if (!(entry[0] & AMDVI_DEV_VALID)) { goto out; } -- 2.7.4
Re: [Qemu-devel] [PATCH v4 2/3] file-posix: Drop s->lock_fd
On 21.08.18 02:58, Fam Zheng wrote: > The lock_fd field is not strictly necessary because transferring locked > bytes from old fd to the new one shouldn't fail anyway. This spares the > user one fd per image. > > Signed-off-by: Fam Zheng > --- > block/file-posix.c | 37 + > 1 file changed, 13 insertions(+), 24 deletions(-) Reviewed-by: Max Reitz signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH v4 1/3] file-posix: Skip effectiveless OFD lock operations
On 21.08.18 02:58, Fam Zheng wrote: > If we know we've already locked the bytes, don't do it again; similarly > don't unlock a byte if we haven't locked it. This doesn't change the > behavior, but fixes a corner case explained below. > > Libvirt had an error handling bug that an image can get its (ownership, > file mode, SELinux) permissions changed (RHBZ 1584982) by mistake behind > QEMU. Specifically, an image in use by Libvirt VM has: > > $ ls -lhZ b.img > -rw-r--r--. qemu qemu system_u:object_r:svirt_image_t:s0:c600,c690 b.img > > Trying to attach it a second time won't work because of image locking. > And after the error, it becomes: > > $ ls -lhZ b.img > -rw-r--r--. root root system_u:object_r:virt_image_t:s0 b.img > > Then, we won't be able to do OFD lock operations with the existing fd. > In other words, the code such as in blk_detach_dev: > > blk_set_perm(blk, 0, BLK_PERM_ALL, &error_abort); > > can abort() QEMU, out of environmental changes. > > This patch is an easy fix to this and the change is regardlessly > reasonable, so do it. > > Signed-off-by: Fam Zheng > > --- > > v3: Don't misuse s->perm and s->shared_perm. > v2: For s == NULL, unlock all bits. [Kevin] > --- > block/file-posix.c | 54 +- > 1 file changed, 44 insertions(+), 10 deletions(-) Reviewed-by: Max Reitz signature.asc Description: OpenPGP digital signature
[Qemu-devel] [Bug 1701449] Re: high memory usage when using rbd with client caching
Any reason we are keeping this bug and #1674481 separate? We are not sure? -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1701449 Title: high memory usage when using rbd with client caching Status in QEMU: New Bug description: Hi, we are experiencing a quite high memory usage of a single qemu (used with KVM) process when using RBD with client caching as a disk backend. We are testing with 3GB memory qemu virtual machines and 128MB RBD client cache. When running 'fio' in the virtual machine you can see that after some time the machine uses a lot more memory (RSS) on the hypervisor than she should. We have seen values (in real production machines, no artificially fio tests) of 250% memory overhead. I reproduced this with qemu version 2.9 as well. Here the contents of our ceph.conf on the hypervisor: """ [client] rbd cache writethrough until flush = False rbd cache max dirty = 100663296 rbd cache size = 134217728 rbd cache target dirty = 50331648 """ How to reproduce: * create a virtual machine with a RBD backed disk (100GB or so) * install a linux distribution on it (we are using Ubuntu) * install fio (apt-get install fio) * run fio multiple times with (e.g.) the following test file: """ # This job file tries to mimic the Intel IOMeter File Server Access Pattern [global] description=Emulation of Intel IOmeter File Server Access Pattern randrepeat=0 filename=/root/test.dat # IOMeter defines the server loads as the following: # iodepth=1 Linear # iodepth=4 Very Light # iodepth=8 Light # iodepth=64Moderate # iodepth=256 Heavy iodepth=8 size=80g direct=0 ioengine=libaio [iometer] stonewall bs=4M rw=randrw [iometer_just_write] stonewall bs=4M rw=write [iometer_just_read] stonewall bs=4M rw=read """ You can measure the virtual machine RSS usage on the hypervisor with: virsh dommemstat | grep rss or if you are not using libvirt: grep RSS /proc//status When switching off the RBD client cache, all is ok again, as the process does not use so much memory anymore. There is already a ticket on the ceph bug tracker for this ([1]). However I can reproduce that memory behaviour only when using qemu (maybe it is using librbd in a special way?). Running directly 'fio' with the rbd engine does not result in that high memory usage. [1] http://tracker.ceph.com/issues/20054 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1701449/+subscriptions
Re: [Qemu-devel] [PATCH v3 2/3] scripts: add render_block_graph function for QEMUMachine
On 23.08.18 17:46, Vladimir Sementsov-Ogievskiy wrote: > Render block nodes graph with help of graphviz. This new function is > for debugging, so there is no sense to put it into qemu.py as a method > of QEMUMachine. Let's instead put it separately. > > Signed-off-by: Vladimir Sementsov-Ogievskiy > --- > scripts/render_block_graph.py | 120 ++ > 1 file changed, 120 insertions(+) > create mode 100755 scripts/render_block_graph.py Reviewed-by: Max Reitz signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH v4 22/24] numa: handle virtio-pmem in NUMA stats
* David Hildenbrand (da...@redhat.com) wrote: > Account the memory to node 0 for now. Once (if ever) virtio-pmem > supports NUMA, we can account it to the right node. > > Signed-off-by: David Hildenbrand Reviewed-by: Dr. David Alan Gilbert > --- > numa.c | 24 > 1 file changed, 12 insertions(+), 12 deletions(-) > > diff --git a/numa.c b/numa.c > index 81542d4ebb..1ff1418c1e 100644 > --- a/numa.c > +++ b/numa.c > @@ -545,6 +545,7 @@ static void numa_stat_memory_devices(NumaNodeMem > node_mem[]) > MemoryDeviceInfoList *info_list = qmp_memory_device_list(); > MemoryDeviceInfoList *info; > PCDIMMDeviceInfo *pcdimm_info; > +VirtioPMemDeviceInfo *vpi; > > for (info = info_list; info; info = info->next) { > MemoryDeviceInfo *value = info->value; > @@ -552,22 +553,21 @@ static void numa_stat_memory_devices(NumaNodeMem > node_mem[]) > if (value) { > switch (value->type) { > case MEMORY_DEVICE_INFO_KIND_DIMM: > -pcdimm_info = value->u.dimm.data; > -break; > - > case MEMORY_DEVICE_INFO_KIND_NVDIMM: > -pcdimm_info = value->u.nvdimm.data; > -break; > - > -default: > -pcdimm_info = NULL; > -break; > -} > - > -if (pcdimm_info) { > +pcdimm_info = (value->type == MEMORY_DEVICE_INFO_KIND_DIMM) ? > + value->u.dimm.data : value->u.nvdimm.data; > node_mem[pcdimm_info->node].node_mem += pcdimm_info->size; > node_mem[pcdimm_info->node].node_plugged_mem += > pcdimm_info->size; > +break; > +case MEMORY_DEVICE_INFO_KIND_VIRTIO_PMEM: > +vpi = value->u.virtio_pmem.data; > +/* TODO: once we support numa, assign to right node */ > +node_mem[0].node_mem += vpi->size; > +node_mem[0].node_plugged_mem += vpi->size; > +break; > +default: > +g_assert_not_reached(); > } > } > } > -- > 2.17.1 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [PATCH v4 21/24] hmp: handle virtio-pmem when printing memory device infos
* David Hildenbrand (da...@redhat.com) wrote: > Print the memory device info just like for PCDIMM/NVDIMM. > > Signed-off-by: David Hildenbrand Reviewed-by: Dr. David Alan Gilbert > --- > hmp.c | 27 +++ > 1 file changed, 15 insertions(+), 12 deletions(-) > > diff --git a/hmp.c b/hmp.c > index 3a9f797677..159f82e155 100644 > --- a/hmp.c > +++ b/hmp.c > @@ -2532,6 +2532,7 @@ void hmp_info_memory_devices(Monitor *mon, const QDict > *qdict) > Error *err = NULL; > MemoryDeviceInfoList *info_list = qmp_query_memory_devices(&err); > MemoryDeviceInfoList *info; > +VirtioPMemDeviceInfo *vpi; > MemoryDeviceInfo *value; > PCDIMMDeviceInfo *di; > > @@ -2541,19 +2542,9 @@ void hmp_info_memory_devices(Monitor *mon, const QDict > *qdict) > if (value) { > switch (value->type) { > case MEMORY_DEVICE_INFO_KIND_DIMM: > -di = value->u.dimm.data; > -break; > - > case MEMORY_DEVICE_INFO_KIND_NVDIMM: > -di = value->u.nvdimm.data; > -break; > - > -default: > -di = NULL; > -break; > -} > - > -if (di) { > +di = (value->type == MEMORY_DEVICE_INFO_KIND_DIMM) ? > + value->u.dimm.data : value->u.nvdimm.data; > monitor_printf(mon, "Memory device [%s]: \"%s\"\n", > MemoryDeviceInfoKind_str(value->type), > di->id ? di->id : ""); > @@ -2566,6 +2557,18 @@ void hmp_info_memory_devices(Monitor *mon, const QDict > *qdict) > di->hotplugged ? "true" : "false"); > monitor_printf(mon, " hotpluggable: %s\n", > di->hotpluggable ? "true" : "false"); > +break; > +case MEMORY_DEVICE_INFO_KIND_VIRTIO_PMEM: > +vpi = value->u.virtio_pmem.data; > +monitor_printf(mon, "Memory device [%s]: \"%s\"\n", > + MemoryDeviceInfoKind_str(value->type), > + vpi->id ? vpi->id : ""); > +monitor_printf(mon, " memaddr: 0x%" PRIx64 "\n", > vpi->memaddr); > +monitor_printf(mon, " size: %" PRIu64 "\n", vpi->size); > +monitor_printf(mon, " memdev: %s\n", vpi->memdev); > +break; > +default: > +g_assert_not_reached(); > } > } > } > -- > 2.17.1 > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] ideas for improving TLB performance (help with TCG backend wanted)
On Thu, Sep 20, 2018 at 01:19:51 +0100, Alex Bennée wrote: > If we are going to have an indirection then we can also drop the > requirement to scale the TLB according to the number of MMU indexes we > have to support. It's fairly wasteful when a bunch of them are almost > never used unless you are running stuff that uses them. So with dynamic TLB sizing, what you're suggesting here is to resize each MMU array independently (depending on their use rate) instead of using a single "TLB size" for all MMU indexes. Am I understanding your point correctly? Thanks, E.
Re: [Qemu-devel] [PATCH v6 00/25] Fixing record/replay and adding reverse debugging
I've posted bug report with extended tests (incl. case without sleep=off). You may find guest image (kernel) in bug description. https://bugs.launchpad.net/qemu/+bug/1795369 The most annoying thing is that some issues are almost not reproducible. There are definitely race conditions somewhere in qemu code. Running 'stress-ng' utility with CPU and I/O stressors in parallel with qemu execution greatly minimizes amount of attempts when I'm trying to trigger some of issues I encounter. I'll try 'info monitor' command tomorrow, but no guarantees that I'll be able to reproduce issue again. Speaking about '-nographic' and SDL... I've noted that UI greatly minimizes possibility of hanging (but not avoids it completely) when using icount in general, so this effect isn't rr-specific. I've already reported this bug too. пн, 1 окт. 2018 г., 20:14 dovgaluk : > Artem Pisarenko писал 2018-09-30 14:01: > > Feature still broken :( > > Thanks for testing. > > > > > Brief description of my tests. > > > > Guest image is Linux, which just powers off after kernel boots > > (instead of proceeding to user-space /init or /sbin/init). > > Base cmdline: > > qemu-system-x86_64 -nodefaults -machine pc,accel=tcg -m 2048 -cpu > > qemu64 -rtc clock=vm,base=2000-01-01T00:00:00 -kernel bzImage -initrd > > rootfs -append 'nokaslr console=ttyS0 rdinit=/init_poweroff' > > -nographic -serial SERIAL_VALUE -icount > > 1,sleep=off,rr=RR_VALUE,rrfile=icount_rr_capture.bin > > I've never tried it with sleep=off. Can you remove it and try again? > > We also seen a problem with '-nographic'. When we remove this option and > QEMU runs with SDL > window, everything is ok. There is some problem with main loop which may > sleep when there > is no GUI to update, or something like that. We couldn't fix it yet. > > > > > Test 1. When SERIAL_VALUE=none > > Running with RR_VALUE=record completes successfully. > > Running with RR_VALUE=replay doesn't completes. qemu process just > > eating ~100% cpu and memory usage doesn't grow after some moment. I > > don't see what happens because of problem no.2 (see below). > > Try 'info replay' monitor command. Does instruction counter increases? > > > > > Test 2. When SERIAL_VALUE=stdio > > Running with RR_VALUE=record completes successfully. > > > > Running with RR_VALUE=replay caues exit with error: > > > > "qemu-system-x86_64: Missing character write event in the replay log" > > > > These problems are same with qemu 2.12 (both vanilla and with previous > > versions of these patches applied). Furthemore, I consider whole > > icount mode broken and determinism isn't achievable. > > The irony is that I actually don't need record/replay feature. I've > > tried to use it only as instrument to debug failing determinism in > > qemu code. But since replay/record feature itself relies on > > determinism, which is broken, it's no wonder why it fails also (I just > > hoped to bypass it). > > > > Contact me if you need more details. I just tired a lot trying to get > > all these things working... Hope is leaving me... > > Can you share the kernel in case the icount still broken? > > > Pavel Dovgalyuk > > -- С уважением, Артем Писаренко
[Qemu-devel] [PULL 00/15] target/xtensa: preparation for FLIX support
Hi Peter, please pull the following series that rearranges target/xtensa code in preparation for FLIX support. The following changes since commit c5e4e49258e9b89cb34c085a419dd9f862935c48: Merge remote-tracking branch 'remotes/xanclic/tags/pull-block-2018-09-25' into staging (2018-09-25 16:47:35 +0100) are available in the git repository at: git://github.com/OSLL/qemu-xtensa.git tags/20181001-xtensa for you to fetch changes up to d74624e59a6087ccf233843b936aedc9f8a8050a: target/xtensa: extract gen_check_interrupts call (2018-10-01 11:08:36 -0700) target/xtensa: preparation for FLIX support Separate generation of per-instruction code (such as raising exceptions and terminating TB) from per-opcode code. Max Filippov (15): target/xtensa: extract test for an illegal instruction target/xtensa: extract test for privileged instruction target/xtensa: extract test for syscall instruction target/xtensa: extract test for debug exception target/xtensa: extract test for window overflow exception target/xtensa: extract test for window underflow exception target/xtensa: extract test for alloca exception target/xtensa: extract test for cpdisabled exception target/xtensa: extract test for division by zero target/xtensa: extract unconditional TB termination target/xtensa: change SR number checks to assertions target/xtensa: always end TB on CCOUNT access/CCOMPARE write target/xtensa: extract unconditional TB termination via slot 0 target/xtensa: make rsr/wsr helpers return void target/xtensa: extract gen_check_interrupts call target/xtensa/cpu.h | 37 +- target/xtensa/helper.c|6 + target/xtensa/helper.h|2 + target/xtensa/op_helper.c | 73 +- target/xtensa/translate.c | 2673 + 5 files changed, 1817 insertions(+), 974 deletions(-) Thanks. -- Max
Re: [Qemu-devel] racing between pause_all_vcpus() and qemu_cpu_stop()
Peter Maydell writes: > I've been investigating a race condition where sometimes when my > guest writes to a device register which triggers a > qemu_system_reset_request(), it doesn't actually cause a clean reset, > but instead the guest CPU continues to execute instructions. > I managed to repro it under 'rr', which let me walk through enough > of what was going on to determine the following: > > When a guest CPU thread calls qemu_system_reset_request(), this > results in a call to qemu_cpu_stop(current_cpu, true), to > make the CPU come back out to the main loop. We also set the > reset_requested flag, to get the IO thread to actually do the > reset. > > The main loop thread runs main_loop_should_exit(). If there is a > pending reset, it calls pause_all_vcpus(), with the intention > that this quiesces all the guest CPUs before it starts messing > with reset actions. > > pause_all_vcpus() just waits for every cpu to have cpu->stopped set. > However, if the running cpu has just called qemu_cpu_stop() on > itself then it will have set cpu->stopped true but not actually > made it out to the main loop yet. (In the case I'm looking at, > what happens is that as soon as the CPU thread unlocks the > iothread mutex in io_writex() after the device write, the > main thread runs and does all the reset operations.) > > The reset code in the iothread then proceeds to start calling > various reset functions while the CPU thread is still inside > the exec loop, running generated code and so on. This doesn't > seem like what ought to happen. In particular it includes > calling cpu_common_reset(), which clears all kinds of flags > relevant to the still-executing CPU... I would have thought the reset code should be scheduled via safe async work to run in the vCPU context. Why should the main loop get involved at all here? > > Any suggestions for how we should fix this? > > thanks > -- PMM -- Alex Bennée
[Qemu-devel] [PULL 15/23] qcow2: Assign the L2 cache relatively to the image size
From: Leonid Bloch Sufficient L2 cache can noticeably improve the performance when using large images with frequent I/O. Previously, unless 'cache-size' was specified and was large enough, the L2 cache was set to a certain size without taking the virtual image size into account. Now, the L2 cache assignment is aware of the virtual size of the image, and will cover the entire image, unless the cache size needed for that is larger than a certain maximum. This maximum is set to 1 MB by default (enough to cover an 8 GB image with the default cluster size) but can be increased or decreased using the 'l2-cache-size' option. This option was previously documented as the *maximum* L2 cache size, and this patch makes it behave as such, instead of as a constant size. Also, the existing option 'cache-size' can limit the sum of both L2 and refcount caches, as previously. Signed-off-by: Leonid Bloch Reviewed-by: Alberto Garcia Reviewed-by: Kevin Wolf Signed-off-by: Kevin Wolf --- docs/qcow2-cache.txt | 15 ++- block/qcow2.h | 4 +--- block/qcow2.c | 21 + qemu-options.hx| 6 +++--- tests/qemu-iotests/137 | 8 +++- tests/qemu-iotests/137.out | 4 +++- 6 files changed, 33 insertions(+), 25 deletions(-) diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt index 7e28b41bd3..750447ea4f 100644 --- a/docs/qcow2-cache.txt +++ b/docs/qcow2-cache.txt @@ -125,8 +125,12 @@ There are a few things that need to be taken into account: - Both caches must have a size that is a multiple of the cluster size (or the cache entry size: see "Using smaller cache sizes" below). - - The default L2 cache size is 8 clusters or 1MB (whichever is more), - and the minimum is 2 clusters (or 2 cache entries, see below). + - The maximum L2 cache size is 1 MB by default (enough for full coverage + of 8 GB images, with the default cluster size). This value can be + modified using the "l2-cache-size" option. QEMU will not use more memory + than needed to hold all of the image's L2 tables, regardless of this max. + value. The minimal L2 cache size is 2 clusters (or 2 cache entries, see + below). - The default (and minimum) refcount cache size is 4 clusters. @@ -184,9 +188,10 @@ Some things to take into account: always uses the cluster size as the entry size. - If the L2 cache is big enough to hold all of the image's L2 tables - (as explained in the "Choosing the right cache sizes" section - earlier in this document) then none of this is necessary and you - can omit the "l2-cache-entry-size" parameter altogether. + (as explained in the "Choosing the right cache sizes" and "How to + configure the cache sizes" sections in this document) then none of + this is necessary and you can omit the "l2-cache-entry-size" + parameter altogether. Reducing the memory usage diff --git a/block/qcow2.h b/block/qcow2.h index a8d6f757b1..2f8c1fd15c 100644 --- a/block/qcow2.h +++ b/block/qcow2.h @@ -74,9 +74,7 @@ /* Must be at least 4 to cover all cases of refcount table growth */ #define MIN_REFCOUNT_CACHE_SIZE 4 /* clusters */ -/* Whichever is more */ -#define DEFAULT_L2_CACHE_CLUSTERS 8 /* clusters */ -#define DEFAULT_L2_CACHE_SIZE S_1MiB +#define DEFAULT_L2_CACHE_MAX_SIZE S_1MiB #define DEFAULT_CLUSTER_SIZE S_64KiB diff --git a/block/qcow2.c b/block/qcow2.c index cd0053b6ee..589f6c1b1c 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -777,29 +777,35 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts, uint64_t *refcount_cache_size, Error **errp) { BDRVQcow2State *s = bs->opaque; -uint64_t combined_cache_size; +uint64_t combined_cache_size, l2_cache_max_setting; bool l2_cache_size_set, refcount_cache_size_set, combined_cache_size_set; int min_refcount_cache = MIN_REFCOUNT_CACHE_SIZE * s->cluster_size; +uint64_t virtual_disk_size = bs->total_sectors * BDRV_SECTOR_SIZE; +uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8); combined_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_CACHE_SIZE); l2_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_L2_CACHE_SIZE); refcount_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_REFCOUNT_CACHE_SIZE); combined_cache_size = qemu_opt_get_size(opts, QCOW2_OPT_CACHE_SIZE, 0); -*l2_cache_size = qemu_opt_get_size(opts, QCOW2_OPT_L2_CACHE_SIZE, 0); +l2_cache_max_setting = qemu_opt_get_size(opts, QCOW2_OPT_L2_CACHE_SIZE, + DEFAULT_L2_CACHE_MAX_SIZE); *refcount_cache_size = qemu_opt_get_size(opts, QCOW2_OPT_REFCOUNT_CACHE_SIZE, 0); *l2_cache_entry_size = qemu_opt_get_size( opts, QCOW2_OPT_L2_CACHE_ENTRY_SIZE, s->cluster_size); +*l2_cache_size = MIN(max_l2_cache, l2_cache_max_setting); + if (combined_cache_size_set) { if (l2_cache_size_set && re
[Qemu-devel] [PULL 22/23] test-replication: Lock AioContext around blk_unref()
Recently, the test case has started failing because some job related functions want to drop the AioContext lock even though it hasn't been taken: (gdb) bt #0 0x7f51c067c9fb in raise () from /lib64/libc.so.6 #1 0x7f51c067e77d in abort () from /lib64/libc.so.6 #2 0x558c9d5dde7b in error_exit (err=, msg=msg@entry=0x558c9d6fe120 <__func__.18373> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36 #3 0x558c9d6b5263 in qemu_mutex_unlock_impl (mutex=mutex@entry=0x558c9f3999a0, file=file@entry=0x558c9d6fd36f "util/async.c", line=line@entry=516) at util/qemu-thread-posix.c:96 #4 0x558c9d6b0565 in aio_context_release (ctx=ctx@entry=0x558c9f399940) at util/async.c:516 #5 0x558c9d5eb3da in job_completed_txn_abort (job=0x558c9f68e640) at job.c:738 #6 0x558c9d5eb227 in job_finish_sync (job=0x558c9f68e640, finish=finish@entry=0x558c9d5eb8d0 , errp=errp@entry=0x0) at job.c:986 #7 0x558c9d5eb8ee in job_cancel_sync (job=) at job.c:941 #8 0x558c9d64d853 in replication_close (bs=) at block/replication.c:148 #9 0x558c9d5e5c9f in bdrv_close (bs=0x558c9f41b020) at block.c:3420 #10 bdrv_delete (bs=0x558c9f41b020) at block.c:3629 #11 bdrv_unref (bs=0x558c9f41b020) at block.c:4685 #12 0x558c9d62a3f3 in blk_remove_bs (blk=blk@entry=0x558c9f42a7c0) at block/block-backend.c:783 #13 0x558c9d62a667 in blk_delete (blk=0x558c9f42a7c0) at block/block-backend.c:402 #14 blk_unref (blk=0x558c9f42a7c0) at block/block-backend.c:457 #15 0x558c9d5dfcea in test_secondary_stop () at tests/test-replication.c:478 #16 0x7f51c1f13178 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0 #17 0x7f51c1f1337b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0 #18 0x7f51c1f1337b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0 #19 0x7f51c1f13552 in g_test_run_suite () from /lib64/libglib-2.0.so.0 #20 0x7f51c1f13571 in g_test_run () from /lib64/libglib-2.0.so.0 #21 0x558c9d5de31f in main (argc=, argv=) at tests/test-replication.c:581 It is yet unclear whether this should really be considered a bug in the test case or whether blk_unref() should work for callers that haven't taken the AioContext lock, but in order to fix the build tests quickly, just take the AioContext lock around blk_unref(). Signed-off-by: Kevin Wolf --- tests/test-replication.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/tests/test-replication.c b/tests/test-replication.c index c8165ae954..f085d1993a 100644 --- a/tests/test-replication.c +++ b/tests/test-replication.c @@ -207,13 +207,17 @@ static BlockBackend *start_primary(void) static void teardown_primary(void) { BlockBackend *blk; +AioContext *ctx; /* remove P_ID */ blk = blk_by_name(P_ID); assert(blk); +ctx = blk_get_aio_context(blk); +aio_context_acquire(ctx); monitor_remove_blk(blk); blk_unref(blk); +aio_context_release(ctx); } static void test_primary_read(void) @@ -365,20 +369,27 @@ static void teardown_secondary(void) { /* only need to destroy two BBs */ BlockBackend *blk; +AioContext *ctx; /* remove S_LOCAL_DISK_ID */ blk = blk_by_name(S_LOCAL_DISK_ID); assert(blk); +ctx = blk_get_aio_context(blk); +aio_context_acquire(ctx); monitor_remove_blk(blk); blk_unref(blk); +aio_context_release(ctx); /* remove S_ID */ blk = blk_by_name(S_ID); assert(blk); +ctx = blk_get_aio_context(blk); +aio_context_acquire(ctx); monitor_remove_blk(blk); blk_unref(blk); +aio_context_release(ctx); } static void test_secondary_read(void) -- 2.13.6
[Qemu-devel] [PULL 18/23] qcow2: Set the default cache-clean-interval to 10 minutes
From: Leonid Bloch The default cache-clean-interval is set to 10 minutes, in order to lower the overhead of the qcow2 caches (before the default was 0, i.e. disabled). * For non-Linux platforms the default is kept at 0, because cache-clean-interval is not supported there yet. Signed-off-by: Leonid Bloch Reviewed-by: Alberto Garcia Reviewed-by: Kevin Wolf Signed-off-by: Kevin Wolf --- qapi/block-core.json | 3 ++- docs/qcow2-cache.txt | 4 ++-- block/qcow2.h| 4 +++- block/qcow2.c| 2 +- qemu-options.hx | 2 +- 5 files changed, 9 insertions(+), 6 deletions(-) diff --git a/qapi/block-core.json b/qapi/block-core.json index ac3b48ee54..46dac23d2f 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -2895,7 +2895,8 @@ # # @cache-clean-interval: clean unused entries in the L2 and refcount # caches. The interval is in seconds. The default value -# is 0 and it disables this feature (since 2.5) +# is 600, and 0 disables this feature. (since 2.5) +# # @encrypt: Image decryption options. Mandatory for # encrypted images, except when doing a metadata-only # probe of the image. (since 2.10) diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt index 1fcc0658b2..59358b816f 100644 --- a/docs/qcow2-cache.txt +++ b/docs/qcow2-cache.txt @@ -210,8 +210,8 @@ This example removes all unused cache entries every 15 minutes: -drive file=hd.qcow2,cache-clean-interval=900 -If unset, the default value for this parameter is 0 and it disables -this feature. +If unset, the default value for this parameter is 600. Setting it to 0 +disables this feature. Note that this functionality currently relies on the MADV_DONTNEED argument for madvise() to actually free the memory. This is a diff --git a/block/qcow2.h b/block/qcow2.h index 0f0e3534bf..ba430316b9 100644 --- a/block/qcow2.h +++ b/block/qcow2.h @@ -76,13 +76,15 @@ #ifdef CONFIG_LINUX #define DEFAULT_L2_CACHE_MAX_SIZE S_32MiB +#define DEFAULT_CACHE_CLEAN_INTERVAL 600 /* seconds */ #else #define DEFAULT_L2_CACHE_MAX_SIZE S_8MiB +/* Cache clean interval is currently available only on Linux, so must be 0 */ +#define DEFAULT_CACHE_CLEAN_INTERVAL 0 #endif #define DEFAULT_CLUSTER_SIZE S_64KiB - #define QCOW2_OPT_LAZY_REFCOUNTS "lazy-refcounts" #define QCOW2_OPT_DISCARD_REQUEST "pass-discard-request" #define QCOW2_OPT_DISCARD_SNAPSHOT "pass-discard-snapshot" diff --git a/block/qcow2.c b/block/qcow2.c index 20b5093269..95e1c98daa 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -944,7 +944,7 @@ static int qcow2_update_options_prepare(BlockDriverState *bs, /* New interval for cache cleanup timer */ r->cache_clean_interval = qemu_opt_get_number(opts, QCOW2_OPT_CACHE_CLEAN_INTERVAL, -s->cache_clean_interval); +DEFAULT_CACHE_CLEAN_INTERVAL); #ifndef CONFIG_LINUX if (r->cache_clean_interval != 0) { error_setg(errp, QCOW2_OPT_CACHE_CLEAN_INTERVAL diff --git a/qemu-options.hx b/qemu-options.hx index 14aee78c6c..52d9d9f06d 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -747,7 +747,7 @@ it which is not used for the L2 cache) @item cache-clean-interval Clean unused entries in the L2 and refcount caches. The interval is in seconds. -The default value is 0 and it disables this feature. +The default value is 600. Setting it to 0 disables this feature. @item pass-discard-request Whether discard requests to the qcow2 device should be forwarded to the data -- 2.13.6
[Qemu-devel] [PULL 21/23] qcow2: Fix cache-clean-interval documentation
From: Leonid Bloch Fixing cache-clean-interval documentation following the recent change to a default of 600 seconds on supported plarforms (only Linux currently). Signed-off-by: Leonid Bloch Reviewed-by: Eric Blake Signed-off-by: Kevin Wolf --- qapi/block-core.json | 3 ++- docs/qcow2-cache.txt | 20 ++-- qemu-options.hx | 3 ++- 3 files changed, 14 insertions(+), 12 deletions(-) diff --git a/qapi/block-core.json b/qapi/block-core.json index 46dac23d2f..25b8a0e744 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -2895,7 +2895,8 @@ # # @cache-clean-interval: clean unused entries in the L2 and refcount # caches. The interval is in seconds. The default value -# is 600, and 0 disables this feature. (since 2.5) +# is 600 on supporting platforms, and 0 on other +# platforms. 0 disables this feature. (since 2.5) # # @encrypt: Image decryption options. Mandatory for # encrypted images, except when doing a metadata-only diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt index 59358b816f..c459bf5dd3 100644 --- a/docs/qcow2-cache.txt +++ b/docs/qcow2-cache.txt @@ -202,18 +202,18 @@ Reducing the memory usage It is possible to clean unused cache entries in order to reduce the memory usage during periods of low I/O activity. -The parameter "cache-clean-interval" defines an interval (in seconds). -All cache entries that haven't been accessed during that interval are -removed from memory. +The parameter "cache-clean-interval" defines an interval (in seconds), +after which all the cache entries that haven't been accessed during the +interval are removed from memory. Setting this parameter to 0 disables this +feature. -This example removes all unused cache entries every 15 minutes: +The following example removes all unused cache entries every 15 minutes: -drive file=hd.qcow2,cache-clean-interval=900 -If unset, the default value for this parameter is 600. Setting it to 0 -disables this feature. +If unset, the default value for this parameter is 600 on platforms which +support this functionality, and is 0 (disabled) on other platforms. -Note that this functionality currently relies on the MADV_DONTNEED -argument for madvise() to actually free the memory. This is a -Linux-specific feature, so cache-clean-interval is not supported in -other systems. +This functionality currently relies on the MADV_DONTNEED argument for +madvise() to actually free the memory. This is a Linux-specific feature, +so cache-clean-interval is not supported on other systems. diff --git a/qemu-options.hx b/qemu-options.hx index 52d9d9f06d..f139459e80 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -747,7 +747,8 @@ it which is not used for the L2 cache) @item cache-clean-interval Clean unused entries in the L2 and refcount caches. The interval is in seconds. -The default value is 600. Setting it to 0 disables this feature. +The default value is 600 on supporting platforms, and 0 on other platforms. +Setting it to 0 disables this feature. @item pass-discard-request Whether discard requests to the qcow2 device should be forwarded to the data -- 2.13.6
[Qemu-devel] [PULL 13/23] qcow2: Make sizes more humanly readable
From: Leonid Bloch Signed-off-by: Leonid Bloch Reviewed-by: Alberto Garcia Reviewed-by: Kevin Wolf Signed-off-by: Kevin Wolf --- block/qcow2.h | 9 + block/qcow2.c | 2 +- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/block/qcow2.h b/block/qcow2.h index 81b844e936..a8d6f757b1 100644 --- a/block/qcow2.h +++ b/block/qcow2.h @@ -27,6 +27,7 @@ #include "crypto/block.h" #include "qemu/coroutine.h" +#include "qemu/units.h" //#define DEBUG_ALLOC //#define DEBUG_ALLOC2 @@ -43,11 +44,11 @@ /* 8 MB refcount table is enough for 2 PB images at 64k cluster size * (128 GB for 512 byte clusters, 2 EB for 2 MB clusters) */ -#define QCOW_MAX_REFTABLE_SIZE 0x80 +#define QCOW_MAX_REFTABLE_SIZE S_8MiB /* 32 MB L1 table is enough for 2 PB images at 64k cluster size * (128 GB for 512 byte clusters, 2 EB for 2 MB clusters) */ -#define QCOW_MAX_L1_SIZE 0x200 +#define QCOW_MAX_L1_SIZE S_32MiB /* Allow for an average of 1k per snapshot table entry, should be plenty of * space for snapshot names and IDs */ @@ -75,9 +76,9 @@ /* Whichever is more */ #define DEFAULT_L2_CACHE_CLUSTERS 8 /* clusters */ -#define DEFAULT_L2_CACHE_BYTE_SIZE 1048576 /* bytes */ +#define DEFAULT_L2_CACHE_SIZE S_1MiB -#define DEFAULT_CLUSTER_SIZE 65536 +#define DEFAULT_CLUSTER_SIZE S_64KiB #define QCOW2_OPT_LAZY_REFCOUNTS "lazy-refcounts" diff --git a/block/qcow2.c b/block/qcow2.c index c13153735a..d2c07ce9fe 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -830,7 +830,7 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts, } } else { if (!l2_cache_size_set) { -*l2_cache_size = MAX(DEFAULT_L2_CACHE_BYTE_SIZE, +*l2_cache_size = MAX(DEFAULT_L2_CACHE_SIZE, (uint64_t)DEFAULT_L2_CACHE_CLUSTERS * s->cluster_size); } -- 2.13.6
[Qemu-devel] [PULL 23/23] tests/test-bdrv-drain: Fix too late qemu_event_reset()
qemu_event_reset() must be called before the AIO request in a different iothread is submitted. Otherwise the request could be completed before we do the qemu_event_reset() and the test would hang in qemu_event_wait(). Signed-off-by: Kevin Wolf Reviewed-by: Max Reitz Tested-by: Max Reitz --- tests/test-bdrv-drain.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c index c9f29c8b10..ee1740ff06 100644 --- a/tests/test-bdrv-drain.c +++ b/tests/test-bdrv-drain.c @@ -694,6 +694,8 @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread) s->bh_indirection_ctx = ctx_b; aio_ret = -EINPROGRESS; +qemu_event_reset(&done_event); + if (drain_thread == 0) { acb = blk_aio_preadv(blk, 0, &qiov, 0, test_iothread_aio_cb, &aio_ret); } else { @@ -723,7 +725,6 @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread) * but the drain in this thread can continue immediately after * bdrv_dec_in_flight() and aio_ret might be assigned only slightly * later. */ -qemu_event_reset(&done_event); do_drain_begin(drain_type, bs); g_assert_cmpint(bs->in_flight, ==, 0); @@ -743,7 +744,6 @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread) } break; case 1: -qemu_event_reset(&done_event); aio_bh_schedule_oneshot(ctx_a, test_iothread_drain_entry, &data); qemu_event_wait(&done_event); break; -- 2.13.6
[Qemu-devel] [PULL 20/23] block-backend: Set werror/rerror defaults in blk_new()
Currently, the default values for werror and rerror have to be set explicitly with blk_set_on_error() by the callers of blk_new(). The only caller actually doing this is blockdev_init(), which is called for BlockBackends created using -drive. In particular, anonymous BlockBackends created with -device ...,drive= didn't get the correct default set and instead defaulted to the integer value 0 (= BLOCKDEV_ON_ERROR_REPORT). This is the intended default for rerror anyway, but the default for werror should be BLOCKDEV_ON_ERROR_ENOSPC. Set the defaults in blk_new() instead so that they apply no matter what way the BlockBackend was created. Cc: qemu-sta...@nongnu.org Signed-off-by: Kevin Wolf Reviewed-by: Eric Blake Reviewed-by: Fam Zheng --- block/block-backend.c | 3 +++ tests/qemu-iotests/067.out | 1 + 2 files changed, 4 insertions(+) diff --git a/block/block-backend.c b/block/block-backend.c index 7b1ec5071b..dc0cd57724 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -325,6 +325,9 @@ BlockBackend *blk_new(uint64_t perm, uint64_t shared_perm) blk->shared_perm = shared_perm; blk_set_enable_write_cache(blk, true); +blk->on_read_error = BLOCKDEV_ON_ERROR_REPORT; +blk->on_write_error = BLOCKDEV_ON_ERROR_ENOSPC; + block_acct_init(&blk->stats); notifier_list_init(&blk->remove_bs_notifiers); diff --git a/tests/qemu-iotests/067.out b/tests/qemu-iotests/067.out index 2e71cff3ce..b10c71db03 100644 --- a/tests/qemu-iotests/067.out +++ b/tests/qemu-iotests/067.out @@ -385,6 +385,7 @@ Testing: -device virtio-scsi -device scsi-cd,id=cd0 { "return": [ { +"io-status": "ok", "device": "", "locked": false, "removable": true, -- 2.13.6
[Qemu-devel] [PULL 12/23] include: Add a lookup table of sizes
From: Leonid Bloch Adding a lookup table for the powers of two, with the appropriate size prefixes. This is needed when a size has to be stringified, in which case something like '(1 * KiB)' would become a literal '(1 * (1L << 10))' string. Powers of two are used very often for sizes, so such a table will also make it easier and more intuitive to write them. This table is generatred using the following AWK script: BEGIN { suffix="KMGTPE"; for(i=10; i<64; i++) { val=2**i; s=substr(suffix, int(i/10), 1); n=2**(i%10); pad=21-int(log(n)/log(10)); printf("#define S_%d%siB %*d\n", n, s, pad, val); } } Signed-off-by: Leonid Bloch Reviewed-by: Alberto Garcia Reviewed-by: Kevin Wolf Signed-off-by: Kevin Wolf --- include/qemu/units.h | 55 1 file changed, 55 insertions(+) diff --git a/include/qemu/units.h b/include/qemu/units.h index 692db3fbb2..68a7758650 100644 --- a/include/qemu/units.h +++ b/include/qemu/units.h @@ -17,4 +17,59 @@ #define PiB (INT64_C(1) << 50) #define EiB (INT64_C(1) << 60) +#define S_1KiB 1024 +#define S_2KiB 2048 +#define S_4KiB 4096 +#define S_8KiB 8192 +#define S_16KiB16384 +#define S_32KiB32768 +#define S_64KiB65536 +#define S_128KiB 131072 +#define S_256KiB 262144 +#define S_512KiB 524288 +#define S_1MiB 1048576 +#define S_2MiB 2097152 +#define S_4MiB 4194304 +#define S_8MiB 8388608 +#define S_16MiB 16777216 +#define S_32MiB 33554432 +#define S_64MiB 67108864 +#define S_128MiB 134217728 +#define S_256MiB 268435456 +#define S_512MiB 536870912 +#define S_1GiB1073741824 +#define S_2GiB2147483648 +#define S_4GiB4294967296 +#define S_8GiB8589934592 +#define S_16GiB 17179869184 +#define S_32GiB 34359738368 +#define S_64GiB 68719476736 +#define S_128GiB137438953472 +#define S_256GiB274877906944 +#define S_512GiB549755813888 +#define S_1TiB 1099511627776 +#define S_2TiB 219902322 +#define S_4TiB 4398046511104 +#define S_8TiB 8796093022208 +#define S_16TiB 17592186044416 +#define S_32TiB 35184372088832 +#define S_64TiB 70368744177664 +#define S_128TiB 140737488355328 +#define S_256TiB 281474976710656 +#define S_512TiB 562949953421312 +#define S_1PiB 1125899906842624 +#define S_2PiB 2251799813685248 +#define S_4PiB 4503599627370496 +#define S_8PiB 9007199254740992 +#define S_16PiB18014398509481984 +#define S_32PiB36028797018963968 +#define S_64PiB72057594037927936 +#define S_128PiB 144115188075855872 +#define S_256PiB 288230376151711744 +#define S_512PiB 576460752303423488 +#define S_1EiB 1152921504606846976 +#define S_2EiB 2305843009213693952 +#define S_4EiB 4611686018427387904 +#define S_8EiB 9223372036854775808 + #endif -- 2.13.6
[Qemu-devel] [PULL 16/23] qcow2: Increase the default upper limit on the L2 cache size
From: Leonid Bloch The upper limit on the L2 cache size is increased from 1 MB to 32 MB on Linux platforms, and to 8 MB on other platforms (this difference is caused by the ability to set intervals for cache cleaning on Linux platforms only). This is done in order to allow default full coverage with the L2 cache for images of up to 256 GB in size (was 8 GB). Note, that only the needed amount to cover the full image is allocated. The value which is changed here is just the upper limit on the L2 cache size, beyond which it will not grow, even if the size of the image will require it to. Signed-off-by: Leonid Bloch Reviewed-by: Alberto Garcia Reviewed-by: Kevin Wolf Signed-off-by: Kevin Wolf --- docs/qcow2-cache.txt | 15 +-- block/qcow2.h| 6 +- qemu-options.hx | 6 +++--- 3 files changed, 17 insertions(+), 10 deletions(-) diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt index 750447ea4f..1fcc0658b2 100644 --- a/docs/qcow2-cache.txt +++ b/docs/qcow2-cache.txt @@ -125,12 +125,15 @@ There are a few things that need to be taken into account: - Both caches must have a size that is a multiple of the cluster size (or the cache entry size: see "Using smaller cache sizes" below). - - The maximum L2 cache size is 1 MB by default (enough for full coverage - of 8 GB images, with the default cluster size). This value can be - modified using the "l2-cache-size" option. QEMU will not use more memory - than needed to hold all of the image's L2 tables, regardless of this max. - value. The minimal L2 cache size is 2 clusters (or 2 cache entries, see - below). + - The maximum L2 cache size is 32 MB by default on Linux platforms (enough + for full coverage of 256 GB images, with the default cluster size). This + value can be modified using the "l2-cache-size" option. QEMU will not use + more memory than needed to hold all of the image's L2 tables, regardless + of this max. value. + On non-Linux platforms the maximal value is smaller by default (8 MB) and + this difference stems from the fact that on Linux the cache can be cleared + periodically if needed, using the "cache-clean-interval" option (see below). + The minimal L2 cache size is 2 clusters (or 2 cache entries, see below). - The default (and minimum) refcount cache size is 4 clusters. diff --git a/block/qcow2.h b/block/qcow2.h index 2f8c1fd15c..0f0e3534bf 100644 --- a/block/qcow2.h +++ b/block/qcow2.h @@ -74,7 +74,11 @@ /* Must be at least 4 to cover all cases of refcount table growth */ #define MIN_REFCOUNT_CACHE_SIZE 4 /* clusters */ -#define DEFAULT_L2_CACHE_MAX_SIZE S_1MiB +#ifdef CONFIG_LINUX +#define DEFAULT_L2_CACHE_MAX_SIZE S_32MiB +#else +#define DEFAULT_L2_CACHE_MAX_SIZE S_8MiB +#endif #define DEFAULT_CLUSTER_SIZE S_64KiB diff --git a/qemu-options.hx b/qemu-options.hx index 6eef0f5651..14aee78c6c 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -736,9 +736,9 @@ The maximum total size of the L2 table and refcount block caches in bytes @item l2-cache-size The maximum size of the L2 table cache in bytes -(default: if cache-size is not specified - 1M; otherwise, as large as possible -within the cache-size, while permitting the requested or the minimal refcount -cache size) +(default: if cache-size is not specified - 32M on Linux platforms, and 8M on +non-Linux platforms; otherwise, as large as possible within the cache-size, +while permitting the requested or the minimal refcount cache size) @item refcount-cache-size The maximum size of the refcount block cache in bytes -- 2.13.6
[Qemu-devel] [PULL 14/23] qcow2: Avoid duplication in setting the refcount cache size
From: Leonid Bloch The refcount cache size does not need to be set to its minimum value in read_cache_sizes(), as it is set to at least its minimum value in qcow2_update_options_prepare(). Signed-off-by: Leonid Bloch Reviewed-by: Alberto Garcia Reviewed-by: Kevin Wolf Signed-off-by: Kevin Wolf --- block/qcow2.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index d2c07ce9fe..cd0053b6ee 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -834,10 +834,9 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts, (uint64_t)DEFAULT_L2_CACHE_CLUSTERS * s->cluster_size); } -if (!refcount_cache_size_set) { -*refcount_cache_size = min_refcount_cache; -} } +/* l2_cache_size and refcount_cache_size are ensured to have at least + * their minimum values in qcow2_update_options_prepare() */ if (*l2_cache_entry_size < (1 << MIN_CLUSTER_BITS) || *l2_cache_entry_size > s->cluster_size || -- 2.13.6
[Qemu-devel] [PULL 07/23] block: Forbid trying to change unsupported options during reopen
From: Alberto Garcia The bdrv_reopen_prepare() function checks all options passed to each BlockDriverState (in the reopen_state->options QDict) and makes all necessary preparations to apply the option changes requested by the user. Options are removed from the QDict as they are processed, so at the end of bdrv_reopen_prepare() only the options that can't be changed are left. Then a loop goes over all remaining options and verifies that the old and new values are identical, returning an error if they're not. The problem is that at the moment there are options that are removed from the QDict although they can't be changed. The consequence of this is any modification to any of those options is silently ignored: (qemu) qemu-io virtio0 "reopen -o discard=on" This happens when all options from bdrv_runtime_opts are removed from the QDict but then only a few of them are processed. Since it's especially important that "node-name" and "driver" are not changed, the code puts them back into the QDict so they are checked at the end of the function. Instead of putting only those two options back into the QDict, this patch puts all unprocessed options using qemu_opts_to_qdict(). update_flags_from_options() also needs to be modified to prevent BDRV_OPT_CACHE_NO_FLUSH, BDRV_OPT_CACHE_DIRECT and BDRV_OPT_READ_ONLY from going back to the QDict. Signed-off-by: Alberto Garcia Reviewed-by: Max Reitz Signed-off-by: Kevin Wolf --- block.c | 22 +++--- 1 file changed, 7 insertions(+), 15 deletions(-) diff --git a/block.c b/block.c index ff1aded4b8..e36ae3e8d1 100644 --- a/block.c +++ b/block.c @@ -1094,19 +1094,19 @@ static void update_flags_from_options(int *flags, QemuOpts *opts) *flags &= ~BDRV_O_CACHE_MASK; assert(qemu_opt_find(opts, BDRV_OPT_CACHE_NO_FLUSH)); -if (qemu_opt_get_bool(opts, BDRV_OPT_CACHE_NO_FLUSH, false)) { +if (qemu_opt_get_bool_del(opts, BDRV_OPT_CACHE_NO_FLUSH, false)) { *flags |= BDRV_O_NO_FLUSH; } assert(qemu_opt_find(opts, BDRV_OPT_CACHE_DIRECT)); -if (qemu_opt_get_bool(opts, BDRV_OPT_CACHE_DIRECT, false)) { +if (qemu_opt_get_bool_del(opts, BDRV_OPT_CACHE_DIRECT, false)) { *flags |= BDRV_O_NOCACHE; } *flags &= ~BDRV_O_RDWR; assert(qemu_opt_find(opts, BDRV_OPT_READ_ONLY)); -if (!qemu_opt_get_bool(opts, BDRV_OPT_READ_ONLY, false)) { +if (!qemu_opt_get_bool_del(opts, BDRV_OPT_READ_ONLY, false)) { *flags |= BDRV_O_RDWR; } @@ -3156,7 +3156,6 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue, BlockDriver *drv; QemuOpts *opts; QDict *orig_reopen_opts; -const char *value; bool read_only; assert(reopen_state != NULL); @@ -3179,17 +3178,10 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue, update_flags_from_options(&reopen_state->flags, opts); -/* node-name and driver must be unchanged. Put them back into the QDict, so - * that they are checked at the end of this function. */ -value = qemu_opt_get(opts, "node-name"); -if (value) { -qdict_put_str(reopen_state->options, "node-name", value); -} - -value = qemu_opt_get(opts, "driver"); -if (value) { -qdict_put_str(reopen_state->options, "driver", value); -} +/* All other options (including node-name and driver) must be unchanged. + * Put them back into the QDict, so that they are checked at the end + * of this function. */ +qemu_opts_to_qdict(opts, reopen_state->options); /* If we are to stay read-only, do not allow permission change * to r/w. Attempting to set to r/w may fail if either BDRV_O_ALLOW_RDWR is -- 2.13.6
[Qemu-devel] [PULL 11/23] qcow2: Options' documentation fixes
From: Leonid Bloch Signed-off-by: Leonid Bloch Reviewed-by: Alberto Garcia Reviewed-by: Kevin Wolf Signed-off-by: Kevin Wolf --- docs/qcow2-cache.txt | 21 ++--- qemu-options.hx | 9 ++--- 2 files changed, 20 insertions(+), 10 deletions(-) diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt index 8a09a5cc5f..7e28b41bd3 100644 --- a/docs/qcow2-cache.txt +++ b/docs/qcow2-cache.txt @@ -79,14 +79,14 @@ Choosing the right cache sizes In order to choose the cache sizes we need to know how they relate to the amount of allocated space. -The amount of virtual disk that can be mapped by the L2 and refcount +The part of the virtual disk that can be mapped by the L2 and refcount caches (in bytes) is: disk_size = l2_cache_size * cluster_size / 8 disk_size = refcount_cache_size * cluster_size * 8 / refcount_bits With the default values for cluster_size (64KB) and refcount_bits -(16), that is +(16), this becomes: disk_size = l2_cache_size * 8192 disk_size = refcount_cache_size * 32768 @@ -97,12 +97,16 @@ need: l2_cache_size = disk_size_GB * 131072 refcount_cache_size = disk_size_GB * 32768 -QEMU has a default L2 cache of 1MB (1048576 bytes) and a refcount -cache of 256KB (262144 bytes), so using the formulas we've just seen -we have +For example, 1MB of L2 cache is needed to cover every 8 GB of the virtual +image size (given that the default cluster size is used): - 1048576 / 131072 = 8 GB of virtual disk covered by that cache -262144 / 32768 = 8 GB + 8 GB / 8192 = 1 MB + +The refcount cache is 4 times the cluster size by default. With the default +cluster size of 64 KB, it is 256 KB (262144 bytes). This is sufficient for +8 GB of image size: + + 262144 * 32768 = 8 GB How to configure the cache sizes @@ -130,6 +134,9 @@ There are a few things that need to be taken into account: memory as possible to the L2 cache before increasing the refcount cache size. + - At most two of "l2-cache-size", "refcount-cache-size", and "cache-size" + can be set simultaneously. + Unlike L2 tables, refcount blocks are not used during normal I/O but only during allocations and internal snapshots. In most cases they are accessed sequentially (even during random guest I/O) so increasing the diff --git a/qemu-options.hx b/qemu-options.hx index a642ad297f..2db6247eff 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -732,15 +732,18 @@ image file) @item cache-size The maximum total size of the L2 table and refcount block caches in bytes -(default: 1048576 bytes or 8 clusters, whichever is larger) +(default: the sum of l2-cache-size and refcount-cache-size) @item l2-cache-size The maximum size of the L2 table cache in bytes -(default: 4/5 of the total cache size) +(default: if cache-size is not defined - 1048576 bytes or 8 clusters, whichever +is larger; otherwise, as large as possible or needed within the cache-size, +while permitting the requested or the minimal refcount cache size) @item refcount-cache-size The maximum size of the refcount block cache in bytes -(default: 1/5 of the total cache size) +(default: 4 times the cluster size; or if cache-size is specified, the part of +it which is not used for the L2 cache) @item cache-clean-interval Clean unused entries in the L2 and refcount caches. The interval is in seconds. -- 2.13.6
[Qemu-devel] [PULL 09/23] block: Allow changing 'discard' on reopen
From: Alberto Garcia 'discard' is one of the basic BlockdevOptions available for all drivers, but it's not handled by bdrv_reopen_prepare() so any attempt to change it results in an error: (qemu) qemu-io virtio0 "reopen -o discard=on" Cannot change the option 'discard' Since there's no reason why we shouldn't allow changing it and the implementation is simple let's just do it. Signed-off-by: Alberto Garcia Signed-off-by: Kevin Wolf --- block.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/block.c b/block.c index e36ae3e8d1..7fb267bd1f 100644 --- a/block.c +++ b/block.c @@ -3156,6 +3156,7 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue, BlockDriver *drv; QemuOpts *opts; QDict *orig_reopen_opts; +char *discard = NULL; bool read_only; assert(reopen_state != NULL); @@ -3178,6 +3179,15 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue, update_flags_from_options(&reopen_state->flags, opts); +discard = qemu_opt_get_del(opts, "discard"); +if (discard != NULL) { +if (bdrv_parse_discard_flags(discard, &reopen_state->flags) != 0) { +error_setg(errp, "Invalid discard option"); +ret = -EINVAL; +goto error; +} +} + /* All other options (including node-name and driver) must be unchanged. * Put them back into the QDict, so that they are checked at the end * of this function. */ @@ -3291,6 +3301,7 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue, error: qemu_opts_del(opts); qobject_unref(orig_reopen_opts); +g_free(discard); return ret; } -- 2.13.6
[Qemu-devel] [PULL 04/23] block: Remove child references from bs->{options, explicit_options}
From: Alberto Garcia Block drivers allow opening their children using a reference to an existing BlockDriverState. These references remain stored in the 'options' and 'explicit_options' QDicts, but we don't need to keep them once everything is open. What is more important, these values can become wrong if the children change: $ qemu-img create -f qcow2 hd0.qcow2 10M $ qemu-img create -f qcow2 hd1.qcow2 10M $ qemu-img create -f qcow2 hd2.qcow2 10M $ $QEMU -drive if=none,file=hd0.qcow2,node-name=hd0 \ -drive if=none,file=hd1.qcow2,node-name=hd1,backing=hd0 \ -drive file=hd2.qcow2,node-name=hd2,backing=hd1 After this hd2 has hd1 as its backing file. Now let's remove it using block_stream: (qemu) block_stream hd2 0 hd0.qcow2 Now hd0 is the backing file of hd2, but hd2's options QDicts still contain backing=hd1. Signed-off-by: Alberto Garcia Reviewed-by: Max Reitz Signed-off-by: Kevin Wolf --- block.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/block.c b/block.c index c298ca6a19..db71ea5b9a 100644 --- a/block.c +++ b/block.c @@ -2763,12 +2763,15 @@ static BlockDriverState *bdrv_open_inherit(const char *filename, } } -/* Remove all children options from bs->options and bs->explicit_options */ +/* Remove all children options and references + * from bs->options and bs->explicit_options */ QLIST_FOREACH(child, &bs->children, next) { char *child_key_dot; child_key_dot = g_strdup_printf("%s.", child->name); qdict_extract_subqdict(bs->explicit_options, NULL, child_key_dot); qdict_extract_subqdict(bs->options, NULL, child_key_dot); +qdict_del(bs->explicit_options, child->name); +qdict_del(bs->options, child->name); g_free(child_key_dot); } @@ -3290,6 +3293,7 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state) { BlockDriver *drv; BlockDriverState *bs; +BdrvChild *child; bool old_can_write, new_can_write; assert(reopen_state != NULL); @@ -3314,6 +3318,13 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state) bs->open_flags = reopen_state->flags; bs->read_only = !(reopen_state->flags & BDRV_O_RDWR); +/* Remove child references from bs->options and bs->explicit_options. + * Child options were already removed in bdrv_reopen_queue_child() */ +QLIST_FOREACH(child, &bs->children, next) { +qdict_del(bs->explicit_options, child->name); +qdict_del(bs->options, child->name); +} + bdrv_refresh_limits(bs, NULL); bdrv_set_perm(reopen_state->bs, reopen_state->perm, -- 2.13.6
[Qemu-devel] [PULL 08/23] file-posix: Forbid trying to change unsupported options during reopen
From: Alberto Garcia The file-posix code is used for the "file", "host_device" and "host_cdrom" drivers, and it allows reopening images. However the only option that is actually processed is "x-check-cache-dropped", and changes in all other options (e.g. "filename") are silently ignored: (qemu) qemu-io virtio0 "reopen -o file.filename=no-such-file" While we could allow changing some of the other options, let's keep things as they are for now but return an error if the user tries to change any of them. Signed-off-by: Alberto Garcia Reviewed-by: Max Reitz Signed-off-by: Kevin Wolf --- block/file-posix.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/block/file-posix.c b/block/file-posix.c index bc5e54560a..2da3a76355 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -849,8 +849,13 @@ static int raw_reopen_prepare(BDRVReopenState *state, goto out; } -rs->check_cache_dropped = qemu_opt_get_bool(opts, "x-check-cache-dropped", -false); +rs->check_cache_dropped = +qemu_opt_get_bool_del(opts, "x-check-cache-dropped", false); + +/* This driver's reopen function doesn't currently allow changing + * other options, so let's put them back in the original QDict and + * bdrv_reopen_prepare() will detect changes and complain. */ +qemu_opts_to_qdict(opts, state->options); if (s->type == FTYPE_CD) { rs->open_flags |= O_NONBLOCK; -- 2.13.6
[Qemu-devel] [PULL 19/23] qcow2: Explicit number replaced by a constant
From: Leonid Bloch Signed-off-by: Leonid Bloch Reviewed-by: Alberto Garcia Reviewed-by: Kevin Wolf Signed-off-by: Kevin Wolf --- block/qcow2.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 95e1c98daa..7277feda13 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -1324,7 +1324,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options, /* 2^(s->refcount_order - 3) is the refcount width in bytes */ s->refcount_block_bits = s->cluster_bits - (s->refcount_order - 3); s->refcount_block_size = 1 << s->refcount_block_bits; -bs->total_sectors = header.size / 512; +bs->total_sectors = header.size / BDRV_SECTOR_SIZE; s->csize_shift = (62 - (s->cluster_bits - 8)); s->csize_mask = (1 << (s->cluster_bits - 8)) - 1; s->cluster_offset_mask = (1LL << s->csize_shift) - 1; @@ -3450,7 +3450,7 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset, goto fail; } -old_length = bs->total_sectors * 512; +old_length = bs->total_sectors * BDRV_SECTOR_SIZE; new_l1_size = size_to_l1(s, offset); if (offset < old_length) { -- 2.13.6
[Qemu-devel] [PULL 10/23] block: Allow changing 'detect-zeroes' on reopen
From: Alberto Garcia 'detect-zeroes' is one of the basic BlockdevOptions available for all drivers, but it's not handled by bdrv_reopen_prepare(), so any attempt to change it results in an error: (qemu) qemu-io virtio0 "reopen -o detect-zeroes=on" Cannot change the option 'detect-zeroes' Since there's no reason why we shouldn't allow changing it and the implementation is simple let's just do it. Signed-off-by: Alberto Garcia Signed-off-by: Kevin Wolf --- include/block/block.h | 1 + block.c | 64 --- 2 files changed, 41 insertions(+), 24 deletions(-) diff --git a/include/block/block.h b/include/block/block.h index 4edc1e8afa..b189cf422e 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -184,6 +184,7 @@ typedef QSIMPLEQ_HEAD(BlockReopenQueue, BlockReopenQueueEntry) BlockReopenQueue; typedef struct BDRVReopenState { BlockDriverState *bs; int flags; +BlockdevDetectZeroesOptions detect_zeroes; uint64_t perm, shared_perm; QDict *options; QDict *explicit_options; diff --git a/block.c b/block.c index 7fb267bd1f..7710b399a3 100644 --- a/block.c +++ b/block.c @@ -764,6 +764,31 @@ static void bdrv_join_options(BlockDriverState *bs, QDict *options, } } +static BlockdevDetectZeroesOptions bdrv_parse_detect_zeroes(QemuOpts *opts, +int open_flags, +Error **errp) +{ +Error *local_err = NULL; +char *value = qemu_opt_get_del(opts, "detect-zeroes"); +BlockdevDetectZeroesOptions detect_zeroes = +qapi_enum_parse(&BlockdevDetectZeroesOptions_lookup, value, +BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF, &local_err); +g_free(value); +if (local_err) { +error_propagate(errp, local_err); +return detect_zeroes; +} + +if (detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP && +!(open_flags & BDRV_O_UNMAP)) +{ +error_setg(errp, "setting detect-zeroes to unmap is not allowed " + "without setting discard operation to unmap"); +} + +return detect_zeroes; +} + /** * Set open flags for a given discard mode * @@ -1328,7 +1353,6 @@ static int bdrv_open_common(BlockDriverState *bs, BlockBackend *file, const char *driver_name = NULL; const char *node_name = NULL; const char *discard; -const char *detect_zeroes; QemuOpts *opts; BlockDriver *drv; Error *local_err = NULL; @@ -1417,29 +1441,12 @@ static int bdrv_open_common(BlockDriverState *bs, BlockBackend *file, } } -detect_zeroes = qemu_opt_get(opts, "detect-zeroes"); -if (detect_zeroes) { -BlockdevDetectZeroesOptions value = -qapi_enum_parse(&BlockdevDetectZeroesOptions_lookup, -detect_zeroes, -BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF, -&local_err); -if (local_err) { -error_propagate(errp, local_err); -ret = -EINVAL; -goto fail_opts; -} - -if (value == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP && -!(bs->open_flags & BDRV_O_UNMAP)) -{ -error_setg(errp, "setting detect-zeroes to unmap is not allowed " - "without setting discard operation to unmap"); -ret = -EINVAL; -goto fail_opts; -} - -bs->detect_zeroes = value; +bs->detect_zeroes = +bdrv_parse_detect_zeroes(opts, bs->open_flags, &local_err); +if (local_err) { +error_propagate(errp, local_err); +ret = -EINVAL; +goto fail_opts; } if (filename != NULL) { @@ -3188,6 +3195,14 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue, } } +reopen_state->detect_zeroes = +bdrv_parse_detect_zeroes(opts, reopen_state->flags, &local_err); +if (local_err) { +error_propagate(errp, local_err); +ret = -EINVAL; +goto error; +} + /* All other options (including node-name and driver) must be unchanged. * Put them back into the QDict, so that they are checked at the end * of this function. */ @@ -3338,6 +3353,7 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state) bs->options= reopen_state->options; bs->open_flags = reopen_state->flags; bs->read_only = !(reopen_state->flags & BDRV_O_RDWR); +bs->detect_zeroes = reopen_state->detect_zeroes; /* Remove child references from bs->options and bs->explicit_options. * Child options were already removed in bdrv_reopen_queue_child() */ -- 2.13.6
[Qemu-devel] [PULL 17/23] qcow2: Resize the cache upon image resizing
From: Leonid Bloch The caches are now recalculated upon image resizing. This is done because the new default behavior of assigning L2 cache relatively to the image size, implies that the cache will be adapted accordingly after an image resize. Signed-off-by: Leonid Bloch Reviewed-by: Alberto Garcia Signed-off-by: Kevin Wolf --- block/qcow2.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/block/qcow2.c b/block/qcow2.c index 589f6c1b1c..20b5093269 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -3418,6 +3418,7 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset, uint64_t old_length; int64_t new_l1_size; int ret; +QDict *options; if (prealloc != PREALLOC_MODE_OFF && prealloc != PREALLOC_MODE_METADATA && prealloc != PREALLOC_MODE_FALLOC && prealloc != PREALLOC_MODE_FULL) @@ -3642,6 +3643,8 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset, } } +bs->total_sectors = offset / BDRV_SECTOR_SIZE; + /* write updated header.size */ offset = cpu_to_be64(offset); ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, size), @@ -3652,6 +3655,14 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset, } s->l1_vm_state_index = new_l1_size; + +/* Update cache sizes */ +options = qdict_clone_shallow(bs->options); +ret = qcow2_update_options(bs, options, s->flags, errp); +qobject_unref(options); +if (ret < 0) { +goto fail; +} ret = 0; fail: qemu_co_mutex_unlock(&s->lock); -- 2.13.6
[Qemu-devel] [PULL 01/23] file-posix: Include filename in locking error message
From: Fam Zheng Image locking errors happening at device initialization time doesn't say which file cannot be locked, for instance, -device scsi-disk,drive=drive-1: Failed to get shared "write" lock Is another process using the image? could refer to either the overlay image or its backing image. Hoist the error_append_hint to the caller of raw_check_lock_bytes where file name is known, and include it in the error hint. Signed-off-by: Fam Zheng Reviewed-by: Eric Blake Signed-off-by: Kevin Wolf --- block/file-posix.c | 10 +++--- tests/qemu-iotests/153.out | 76 +++--- tests/qemu-iotests/182.out | 2 +- 3 files changed, 45 insertions(+), 43 deletions(-) diff --git a/block/file-posix.c b/block/file-posix.c index fe83cbf0eb..327f39ca45 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -741,8 +741,6 @@ static int raw_check_lock_bytes(int fd, uint64_t perm, uint64_t shared_perm, "Failed to get \"%s\" lock", perm_name); g_free(perm_name); -error_append_hint(errp, - "Is another process using the image?\n"); return ret; } } @@ -758,8 +756,6 @@ static int raw_check_lock_bytes(int fd, uint64_t perm, uint64_t shared_perm, "Failed to get shared \"%s\" lock", perm_name); g_free(perm_name); -error_append_hint(errp, - "Is another process using the image?\n"); return ret; } } @@ -796,6 +792,9 @@ static int raw_handle_perm_lock(BlockDriverState *bs, if (!ret) { return 0; } +error_append_hint(errp, + "Is another process using the image [%s]?\n", + bs->filename); } op = RAW_PL_ABORT; /* fall through to unlock bytes. */ @@ -2217,6 +2216,9 @@ raw_co_create(BlockdevCreateOptions *options, Error **errp) /* Step two: Check that nobody else has taken conflicting locks */ result = raw_check_lock_bytes(fd, perm, shared, errp); if (result < 0) { +error_append_hint(errp, + "Is another process using the image [%s]?\n", + file_opts->filename); goto out_unlock; } diff --git a/tests/qemu-iotests/153.out b/tests/qemu-iotests/153.out index 93eaf10486..884254868c 100644 --- a/tests/qemu-iotests/153.out +++ b/tests/qemu-iotests/153.out @@ -12,11 +12,11 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=33554432 backing_file=TEST_DIR/t == Launching another QEMU, opts: '' == QEMU_PROG: -drive file=TEST_DIR/t.qcow2,if=none,: Failed to get "write" lock -Is another process using the image? +Is another process using the image [TEST_DIR/t.qcow2]? == Launching another QEMU, opts: 'read-only=on' == QEMU_PROG: -drive file=TEST_DIR/t.qcow2,if=none,read-only=on: Failed to get shared "write" lock -Is another process using the image? +Is another process using the image [TEST_DIR/t.qcow2]? == Launching another QEMU, opts: 'read-only=on,force-share=on' == @@ -24,77 +24,77 @@ Is another process using the image? _qemu_io_wrapper -c read 0 512 TEST_DIR/t.qcow2 can't open device TEST_DIR/t.qcow2: Failed to get "write" lock -Is another process using the image? +Is another process using the image [TEST_DIR/t.qcow2]? _qemu_io_wrapper -r -c read 0 512 TEST_DIR/t.qcow2 can't open device TEST_DIR/t.qcow2: Failed to get shared "write" lock -Is another process using the image? +Is another process using the image [TEST_DIR/t.qcow2]? _qemu_io_wrapper -c open TEST_DIR/t.qcow2 -c read 0 512 can't open device TEST_DIR/t.qcow2: Failed to get "write" lock -Is another process using the image? +Is another process using the image [TEST_DIR/t.qcow2]? no file open, try 'help open' _qemu_io_wrapper -c open -r TEST_DIR/t.qcow2 -c read 0 512 can't open device TEST_DIR/t.qcow2: Failed to get shared "write" lock -Is another process using the image? +Is another process using the image [TEST_DIR/t.qcow2]? no file open, try 'help open' _qemu_img_wrapper info TEST_DIR/t.qcow2 qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get shared "write" lock -Is another process using the image? +Is another process using the image [TEST_DIR/t.qcow2]? _qemu_img_wrapper check TEST_DIR/t.qcow2 qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get shared "write" lock -Is another process using the image? +Is another process using the image [TEST_DIR/t.qcow2]? _qemu_img_wrapper compare TEST_DIR/t.qcow2 TEST_DIR/t.qcow2 qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get shared "write" lock -Is another process using the image? +Is another process using the image [TEST_DIR/t.qcow2]? _qemu_img_wrapper map TEST_DIR/t.qcow2 qe
[Qemu-devel] [PULL 03/23] file-posix: x-check-cache-dropped should default to false on reopen
From: Alberto Garcia The default value of x-check-cache-dropped is false. There's no reason to use the previous value as a default in raw_reopen_prepare() because bdrv_reopen_queue_child() already takes care of putting the old options in the BDRVReopenState.options QDict. If x-check-cache-dropped was previously set but is now missing from the reopen QDict then it should be reset to false. Signed-off-by: Alberto Garcia Reviewed-by: Max Reitz Signed-off-by: Kevin Wolf --- block/file-posix.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/file-posix.c b/block/file-posix.c index 327f39ca45..bc5e54560a 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -850,7 +850,7 @@ static int raw_reopen_prepare(BDRVReopenState *state, } rs->check_cache_dropped = qemu_opt_get_bool(opts, "x-check-cache-dropped", -s->check_cache_dropped); +false); if (s->type == FTYPE_CD) { rs->open_flags |= O_NONBLOCK; -- 2.13.6
[Qemu-devel] [PULL 06/23] block: Allow child references on reopen
From: Alberto Garcia In the previous patches we removed all child references from bs->{options,explicit_options} because keeping them is useless and wrong. Because of this, any attempt to reopen a BlockDriverState using a child reference as one of its options would result in a failure, because bdrv_reopen_prepare() would detect that there's a new option (the child reference) that wasn't present in bs->options. But passing child references on reopen can be useful. It's a way to specify a BDS's child without having to pass recursively all of the child's options, and if the reference points to a different BDS then this can allow us to replace the child. However, replacing the child is something that needs to be implemented case by case and only when it makes sense. For now, this patch allows passing a child reference as long as it points to the current child of the BlockDriverState. It's also important to remember that, as a consequence of the previous patches, this child reference will be removed from bs->{options,explicit_options} after the reopening has been completed. Signed-off-by: Alberto Garcia Reviewed-by: Max Reitz Signed-off-by: Kevin Wolf --- block.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/block.c b/block.c index 98aca56785..ff1aded4b8 100644 --- a/block.c +++ b/block.c @@ -3242,6 +3242,24 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue, QObject *new = entry->value; QObject *old = qdict_get(reopen_state->bs->options, entry->key); +/* Allow child references (child_name=node_name) as long as they + * point to the current child (i.e. everything stays the same). */ +if (qobject_type(new) == QTYPE_QSTRING) { +BdrvChild *child; +QLIST_FOREACH(child, &reopen_state->bs->children, next) { +if (!strcmp(child->name, entry->key)) { +break; +} +} + +if (child) { +const char *str = qobject_get_try_str(new); +if (!strcmp(child->bs->node_name, str)) { +continue; /* Found child with this name, skip option */ +} +} +} + /* * TODO: When using -drive to specify blockdev options, all values * will be strings; however, when using -blockdev, blockdev-add or -- 2.13.6
[Qemu-devel] [PULL 05/23] block: Don't look for child references in append_open_options()
From: Alberto Garcia In the previous patch we removed child references from bs->options, so there's no need to look for them here anymore. Signed-off-by: Alberto Garcia Reviewed-by: Max Reitz Signed-off-by: Kevin Wolf --- block.c | 13 + 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/block.c b/block.c index db71ea5b9a..98aca56785 100644 --- a/block.c +++ b/block.c @@ -5150,23 +5150,12 @@ static bool append_open_options(QDict *d, BlockDriverState *bs) { const QDictEntry *entry; QemuOptDesc *desc; -BdrvChild *child; bool found_any = false; for (entry = qdict_first(bs->options); entry; entry = qdict_next(bs->options, entry)) { -/* Exclude node-name references to children */ -QLIST_FOREACH(child, &bs->children, next) { -if (!strcmp(entry->key, child->name)) { -break; -} -} -if (child) { -continue; -} - -/* And exclude all non-driver-specific options */ +/* Exclude all non-driver-specific options */ for (desc = bdrv_runtime_opts.desc; desc->name; desc++) { if (!strcmp(qdict_entry_key(entry), desc->name)) { break; -- 2.13.6
[Qemu-devel] [PULL 00/23] Block layer patches
The following changes since commit 07f426c35eddd79388a23d11cb278600d7e3831d: Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20180926' into staging (2018-09-28 18:56:09 +0100) are available in the git repository at: git://repo.or.cz/qemu/kevin.git tags/for-upstream for you to fetch changes up to dd353157942a59c21da07da5ac8749a871f7c3ed: tests/test-bdrv-drain: Fix too late qemu_event_reset() (2018-10-01 19:13:55 +0200) Block layer patches: - qcow2 cache option default changes (Linux: 32 MB maximum, limited by whatever cache size can be made use of with the specific image; default cache-clean-interval of 10 minutes) - reopen: Allow specifying unchanged child node references, and changing a few generic options (discard, detect-zeroes) - Fix werror/rerror defaults for -device drive= - Test case fixes Alberto Garcia (9): qemu-io: Fix writethrough check in reopen file-posix: x-check-cache-dropped should default to false on reopen block: Remove child references from bs->{options,explicit_options} block: Don't look for child references in append_open_options() block: Allow child references on reopen block: Forbid trying to change unsupported options during reopen file-posix: Forbid trying to change unsupported options during reopen block: Allow changing 'discard' on reopen block: Allow changing 'detect-zeroes' on reopen Fam Zheng (1): file-posix: Include filename in locking error message Kevin Wolf (3): block-backend: Set werror/rerror defaults in blk_new() test-replication: Lock AioContext around blk_unref() tests/test-bdrv-drain: Fix too late qemu_event_reset() Leonid Bloch (10): qcow2: Options' documentation fixes include: Add a lookup table of sizes qcow2: Make sizes more humanly readable qcow2: Avoid duplication in setting the refcount cache size qcow2: Assign the L2 cache relatively to the image size qcow2: Increase the default upper limit on the L2 cache size qcow2: Resize the cache upon image resizing qcow2: Set the default cache-clean-interval to 10 minutes qcow2: Explicit number replaced by a constant qcow2: Fix cache-clean-interval documentation qapi/block-core.json | 4 +- docs/qcow2-cache.txt | 59 block/qcow2.h | 19 --- include/block/block.h | 1 + include/qemu/units.h | 55 ++ block.c| 135 + block/block-backend.c | 3 + block/file-posix.c | 19 +-- block/qcow2.c | 43 +-- qemu-io-cmds.c | 2 +- tests/test-bdrv-drain.c| 4 +- tests/test-replication.c | 11 qemu-options.hx| 12 ++-- tests/qemu-iotests/067.out | 1 + tests/qemu-iotests/137 | 8 ++- tests/qemu-iotests/137.out | 4 +- tests/qemu-iotests/153.out | 76 - tests/qemu-iotests/182.out | 2 +- 18 files changed, 307 insertions(+), 151 deletions(-)
[Qemu-devel] [PULL 02/23] qemu-io: Fix writethrough check in reopen
From: Alberto Garcia "qemu-io reopen" doesn't allow changing the writethrough setting of the cache, but the check is wrong, causing an error even on a simple reopen with the default parameters: $ qemu-img create -f qcow2 hd.qcow2 1M $ qemu-system-x86_64 -monitor stdio -drive if=virtio,file=hd.qcow2 (qemu) qemu-io virtio0 reopen Cannot change cache.writeback: Device attached Signed-off-by: Alberto Garcia Reviewed-by: Max Reitz Signed-off-by: Kevin Wolf --- qemu-io-cmds.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c index 5bf5f28178..db0b3ee5ef 100644 --- a/qemu-io-cmds.c +++ b/qemu-io-cmds.c @@ -2025,7 +2025,7 @@ static int reopen_f(BlockBackend *blk, int argc, char **argv) return -EINVAL; } -if (writethrough != blk_enable_write_cache(blk) && +if (!writethrough != blk_enable_write_cache(blk) && blk_get_attached_dev(blk)) { error_report("Cannot change cache.writeback: Device attached"); -- 2.13.6
Re: [Qemu-devel] [PULL v3 00/79] Misc patches for 2018-09-30
On 1 October 2018 at 16:29, Paolo Bonzini wrote: > The following changes since commit 042938f46e1c477419d1931381fdadffaa49d45e: > > Merge remote-tracking branch > 'remotes/dgilbert/tags/pull-migration-20180926a' into staging (2018-09-28 > 17:07:23 +0100) > > are available in the Git repository at: > > git://github.com/bonzini/qemu.git tags/for-upstream > > for you to fetch changes up to 72d09800e7b85fa1a14b88d341ca8c042d236cba: > > hw/scsi/mptendian: Avoid taking address of fields in packed structs > (2018-10-01 17:28:02 +0200) > > > * configure fix for environment variables (Daniel) > * fix memory leaks (Alex) > * x86_64 MTTCG fixes (Emilio) > * introduce atomic64 (Emilio) > * Fix for virtio hang (Fam, myself) > * SH serial port fix (Geert) > * Deprecate rotation_rate for scsi-block (Fam) > * Extend memory-backend-file availability to all POSIX hosts (Hikaru) > * Memory API cleanups and fixes (Igor, Li Qiang, Peter, Philippe) > * MSI/IOMMU fix (Jan) > * Socket reconnection fixes (Marc-André) > * icount fixes (Emilio, myself) > * QSP fixes for Coverity (myself) > * Some record/replay improovements (Pavel) > * Packed struct fixes (Peter) > * Windows dump fixes and elf2dmp (Viktor) > * kvmclock fix (Yongji) > > Hi; compile failures due to format string issues, I'm afraid: /home/petmay01/qemu-for-merges/hw/mips/mips_r4k.c: In function 'load_kernel': /home/petmay01/qemu-for-merges/hw/mips/mips_r4k.c:139:47: error: format '%li' expects argument of type 'long int', but argument 5 has type 'int64_t {aka long long int}' [-Werror=format=] snprintf((char *)params_buf + 8, 256, "rd_start=0x%" PRIx64 " rd_size=%li %s", ^ cc1: all warnings being treated as errors /home/petmay01/qemu-for-merges/hw/mips/mips_malta.c: In function 'load_kernel': /home/petmay01/qemu-for-merges/hw/mips/mips_malta.c:1073:42: error: format '%li' expects argument of type 'long int', but argument 5 has type 'int64_t {aka long long int}' [-Werror=format=] prom_set(prom_buf, prom_index++, "rd_start=0x%" PRIx64 " rd_size=%li %s", ^ I also saw this in the freebsd VM test: MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} gtester -k --verbose -m=quick tests/test-char TEST: tests/test-char... (pid=15938) /char/null: OK /char/invalid: OK /char/ringbuf: OK /char/mux: OK /char/stdio: OK /char/pipe: OK /char/file: OK /char/file-fifo: OK /char/udp: OK /char/serial:OK /char/hotswap: OK /char/socket/basic: OK /char/socket/reconnect: FAIL GTester: last random seed: R02S615917be6302ea7de5d0fc9c58149953 (pid=16010) /char/socket/fdpass: OK FAIL: tests/test-char ** ERROR:tests/test-char.c:353:char_socket_test_common: assertion failed: (object_property_get_bool(OBJECT(chr_client), "connected", &error_abort)) gmake: *** [/var/tmp/qemu-test.YRve5C/tests/Makefile.include:925: check-tests/test-char] Error 1 Haven't seen that before, so maybe it's something in this patchset; it could be a pre-existing intermittent though I suppose. thanks -- PMM
Re: [Qemu-devel] [PATCH v2 06/15] gdbstub: add multiprocess support to (f|s)ThreadInfo and ThreadExtraInfo
On 01/10/2018 13:56, Luc Michel wrote: > Change the thread info related packets handling to support multiprocess > extension. > > Add the CPUs class name in the extra info to help differentiate > them in multiprocess mode. > > Signed-off-by: Luc Michel > --- > gdbstub.c | 32 ++-- > 1 file changed, 22 insertions(+), 10 deletions(-) > > diff --git a/gdbstub.c b/gdbstub.c > index 3242f0d261..63d7913269 100644 > --- a/gdbstub.c > +++ b/gdbstub.c > @@ -1248,11 +1248,10 @@ out: > static int gdb_handle_packet(GDBState *s, const char *line_buf) > { > CPUState *cpu; > CPUClass *cc; > const char *p; > -uint32_t thread; > uint32_t pid, tid; > int ch, reg_size, type, res; > uint8_t mem_buf[MAX_PACKET_LENGTH]; > char buf[sizeof(mem_buf) + 1 /* trailing NUL */]; > char thread_id[16]; > @@ -1540,30 +1539,43 @@ static int gdb_handle_packet(GDBState *s, const char > *line_buf) > snprintf(buf, sizeof(buf), "QC%s", > gdb_fmt_thread_id(s, cpu, thread_id, > sizeof(thread_id))); > put_packet(s, buf); > break; > } else if (strcmp(p,"fThreadInfo") == 0) { > -s->query_cpu = first_cpu; > +s->query_cpu = gdb_first_cpu(s); > goto report_cpuinfo; > } else if (strcmp(p,"sThreadInfo") == 0) { > report_cpuinfo: > if (s->query_cpu) { > -snprintf(buf, sizeof(buf), "m%x", > cpu_gdb_index(s->query_cpu)); > +snprintf(buf, sizeof(buf), "m%s", > + gdb_fmt_thread_id(s, s->query_cpu, > + thread_id, sizeof(thread_id))); > put_packet(s, buf); > -s->query_cpu = CPU_NEXT(s->query_cpu); > +s->query_cpu = gdb_next_cpu(s, s->query_cpu); > } else > put_packet(s, "l"); > break; > } else if (strncmp(p,"ThreadExtraInfo,", 16) == 0) { > -thread = strtoull(p+16, (char **)&p, 16); > -cpu = find_cpu(thread); > +read_thread_id(p + 16, &p, &pid, &tid); Check similar to patch #3: if (read_thread_id(p, &p, &pid, &tid) == GDB_READ_THREAD_ERR) { put_packet(s, "E22"); break; } With equivalent change: Reviewed-by: Philippe Mathieu-Daudé > +cpu = gdb_get_cpu(s, pid, tid); > if (cpu != NULL) { > cpu_synchronize_state(cpu); > -/* memtohex() doubles the required space */ > -len = snprintf((char *)mem_buf, sizeof(buf) / 2, > - "CPU#%d [%s]", cpu->cpu_index, > - cpu->halted ? "halted " : "running"); > + > +if (s->multiprocess && (s->process_num > 1)) { > +/* Print the CPU model in multiprocess mode */ > +ObjectClass *oc = object_get_class(OBJECT(cpu)); > +const char *cpu_model = object_class_get_name(oc); > +len = snprintf((char *)mem_buf, sizeof(buf) / 2, > + "CPU#%d %s [%s]", cpu->cpu_index, > + cpu_model, > + cpu->halted ? "halted " : "running"); > +} else { > +/* memtohex() doubles the required space */ > +len = snprintf((char *)mem_buf, sizeof(buf) / 2, > + "CPU#%d [%s]", cpu->cpu_index, > + cpu->halted ? "halted " : "running"); > +} > trace_gdbstub_op_extra_info((char *)mem_buf); > memtohex(buf, mem_buf, len); > put_packet(s, buf); > } > break; >
Re: [Qemu-devel] [PATCH v2 03/15] gdbstub: add multiprocess support to 'H' and 'T' packets
On 01/10/2018 13:56, Luc Michel wrote: > Add a couple of helper functions to cope with GDB threads and processes. > > The gdb_get_process() function looks for a process given a pid. > > The gdb_get_cpu() function returns the CPU corresponding to the (pid, > tid) pair given as parameters. > > The read_thread_id() function parses the thread-id sent by the peer. > This function supports the multiprocess extension thread-id syntax. The > return value specifies if the parsing failed, or if a special case was > encountered (all processes or all threads). > > Use them in 'H' and 'T' packets handling to support the multiprocess > extension. > > Signed-off-by: Luc Michel > --- > gdbstub.c | 149 +++--- > 1 file changed, 131 insertions(+), 18 deletions(-) > > diff --git a/gdbstub.c b/gdbstub.c > index ac9d540fda..a21ce3ca18 100644 > --- a/gdbstub.c > +++ b/gdbstub.c > @@ -667,10 +667,74 @@ static uint32_t gdb_get_cpu_pid(const GDBState *s, > CPUState *cpu) > } > > return pid + 1; > } > > +static GDBProcess *gdb_get_process(const GDBState *s, uint32_t pid) > +{ > +uint32_t process_idx; > + > +if (!pid) { > +/* 0 means any process, we take the first one */ > +pid = 1; > +} > + > +/* GDB PIDs are in the range [1;n] */ > +process_idx = pid - 1; > + > +if (process_idx >= s->process_num) { > +return NULL; > +} > + > +return &s->processes[process_idx]; > +} > + > +static GDBProcess *gdb_get_cpu_process(const GDBState *s, CPUState *cpu) > +{ > +return gdb_get_process(s, gdb_get_cpu_pid(s, cpu)); > +} > + > +static CPUState *find_cpu(uint32_t thread_id) > +{ > +CPUState *cpu; > + > +CPU_FOREACH(cpu) { > +if (cpu_gdb_index(cpu) == thread_id) { > +return cpu; > +} > +} > + > +return NULL; > +} > + > +static CPUState *gdb_get_cpu(const GDBState *s, uint32_t pid, uint32_t tid) > +{ > +GDBProcess *process; > +CPUState *cpu = find_cpu(tid); > + > +if (!tid) { > +/* 0 means any thread, we take the first one */ > +tid = 1; > +} > + > +if (cpu == NULL) { > +return NULL; > +} > + > +process = gdb_get_cpu_process(s, cpu); > + > +if (process->pid != pid) { > +return NULL; > +} > + > +if (!process->attached) { > +return NULL; > +} > + > +return cpu; > +} > + > static const char *get_feature_xml(const char *p, const char **newp, > CPUClass *cc) > { > size_t len; > int i; > @@ -923,23 +987,10 @@ static void gdb_set_cpu_pc(GDBState *s, target_ulong pc) > > cpu_synchronize_state(cpu); > cpu_set_pc(cpu, pc); > } > > -static CPUState *find_cpu(uint32_t thread_id) > -{ > -CPUState *cpu; > - > -CPU_FOREACH(cpu) { > -if (cpu_gdb_index(cpu) == thread_id) { > -return cpu; > -} > -} > - > -return NULL; > -} > - > static char *gdb_fmt_thread_id(const GDBState *s, CPUState *cpu, > char *buf, size_t buf_size) > { > if (s->multiprocess) { > snprintf(buf, buf_size, "p%02x.%02x", > @@ -949,10 +1000,64 @@ static char *gdb_fmt_thread_id(const GDBState *s, > CPUState *cpu, > } > > return buf; > } > > +typedef enum GDBThreadIdKind { > +GDB_ONE_THREAD = 0, > +GDB_ALL_THREADS, /* One process, all threads */ > +GDB_ALL_PROCESSES, > +GDB_READ_THREAD_ERR > +} GDBThreadIdKind; > + > +static GDBThreadIdKind read_thread_id(const char *buf, const char **end_buf, > + uint32_t *pid, uint32_t *tid) > +{ > +unsigned long p, t; > +int ret; > + > +if (*buf == 'p') { > +buf++; > +ret = qemu_strtoul(buf, &buf, 16, &p); > + > +if (ret) { > +return GDB_READ_THREAD_ERR; > +} > + > +/* Skip '.' */ > +buf++; > +} else { > +p = 1; > +} > + > +ret = qemu_strtoul(buf, &buf, 16, &t); > + > +if (ret) { > +return GDB_READ_THREAD_ERR; > +} > + > +*end_buf = buf; > + > +if (p == -1) { > +return GDB_ALL_PROCESSES; > +} > + > +if (pid) { > +*pid = p; > +} > + > +if (t == -1) { > +return GDB_ALL_THREADS; > +} > + > +if (tid) { > +*tid = t; > +} > + > +return GDB_ONE_THREAD; > +} > + > static int is_query_packet(const char *p, const char *query, char separator) > { > unsigned int query_len = strlen(query); > > return strncmp(p, query, query_len) == 0 && > @@ -1057,16 +1162,18 @@ static int gdb_handle_packet(GDBState *s, const char > *line_buf) > { > CPUState *cpu; > CPUClass *cc; > const char *p; > uint32_t thread; > +uint32_t pid, tid; > int ch, reg_size, type, res; > uint8_t mem_buf[MAX_PACKET_LENGTH]; > char buf[sizeof(mem_buf) + 1 /* trailing NUL */]; > char thread_id[16];
[Qemu-devel] racing between pause_all_vcpus() and qemu_cpu_stop()
I've been investigating a race condition where sometimes when my guest writes to a device register which triggers a qemu_system_reset_request(), it doesn't actually cause a clean reset, but instead the guest CPU continues to execute instructions. I managed to repro it under 'rr', which let me walk through enough of what was going on to determine the following: When a guest CPU thread calls qemu_system_reset_request(), this results in a call to qemu_cpu_stop(current_cpu, true), to make the CPU come back out to the main loop. We also set the reset_requested flag, to get the IO thread to actually do the reset. The main loop thread runs main_loop_should_exit(). If there is a pending reset, it calls pause_all_vcpus(), with the intention that this quiesces all the guest CPUs before it starts messing with reset actions. pause_all_vcpus() just waits for every cpu to have cpu->stopped set. However, if the running cpu has just called qemu_cpu_stop() on itself then it will have set cpu->stopped true but not actually made it out to the main loop yet. (In the case I'm looking at, what happens is that as soon as the CPU thread unlocks the iothread mutex in io_writex() after the device write, the main thread runs and does all the reset operations.) The reset code in the iothread then proceeds to start calling various reset functions while the CPU thread is still inside the exec loop, running generated code and so on. This doesn't seem like what ought to happen. In particular it includes calling cpu_common_reset(), which clears all kinds of flags relevant to the still-executing CPU... Any suggestions for how we should fix this? thanks -- PMM
Re: [Qemu-devel] [PATCH v2 04/15] gdbstub: add multiprocess support to vCont packets
On 01/10/2018 13:56, Luc Michel wrote: > Add the gdb_first_cpu() and gdb_next_cpu() to iterate over all > the CPUs in currently attached processes. > > Add the gdb_first_cpu_in_process() and gdb_next_cpu_in_process() to > iterate over CPUs of a given process. > > Use them to add multiprocess extension support to vCont packets. > > Signed-off-by: Luc Michel > --- > gdbstub.c | 117 +++--- > 1 file changed, 102 insertions(+), 15 deletions(-) > > diff --git a/gdbstub.c b/gdbstub.c > index a21ce3ca18..779cc8b241 100644 > --- a/gdbstub.c > +++ b/gdbstub.c > @@ -704,10 +704,40 @@ static CPUState *find_cpu(uint32_t thread_id) > } > > return NULL; > } > > +static CPUState *get_first_cpu_in_process(const GDBState *s, > + GDBProcess *process) > +{ > +CPUState *cpu; > + > +CPU_FOREACH(cpu) { > +if (gdb_get_cpu_pid(s, cpu) == process->pid) { > +return cpu; > +} > +} > + > +return NULL; > +} > + > +static CPUState *gdb_next_cpu_in_process(const GDBState *s, CPUState *cpu) > +{ > +uint32_t pid = gdb_get_cpu_pid(s, cpu); > +cpu = CPU_NEXT(cpu); > + > +while (cpu) { > +if (gdb_get_cpu_pid(s, cpu) == pid) { > +break; > +} > + > +cpu = CPU_NEXT(cpu); > +} > + > +return cpu; > +} > + > static CPUState *gdb_get_cpu(const GDBState *s, uint32_t pid, uint32_t tid) > { > GDBProcess *process; > CPUState *cpu = find_cpu(tid); > > @@ -731,10 +761,41 @@ static CPUState *gdb_get_cpu(const GDBState *s, > uint32_t pid, uint32_t tid) > } > > return cpu; > } > > +/* Return the cpu following @cpu, while ignoring > + * unattached processes. > + */ > +static CPUState *gdb_next_cpu(const GDBState *s, CPUState *cpu) > +{ > +cpu = CPU_NEXT(cpu); > + > +while (cpu) { > +if (gdb_get_cpu_process(s, cpu)->attached) { > +break; > +} > + > +cpu = CPU_NEXT(cpu); > +} > + > +return cpu; > +} > + > +/* Return the first attached cpu */ > +static CPUState *gdb_first_cpu(const GDBState *s) > +{ > +CPUState *cpu = first_cpu; > +GDBProcess *process = gdb_get_cpu_process(s, cpu); > + > +if (!process->attached) { > +return gdb_next_cpu(s, cpu); > +} > + > +return cpu; > +} > + > static const char *get_feature_xml(const char *p, const char **newp, > CPUClass *cc) > { > size_t len; > int i; > @@ -1069,14 +1130,16 @@ static int is_query_packet(const char *p, const char > *query, char separator) > * returns -ENOTSUP if a command is unsupported, -EINVAL or -ERANGE if there > is > * a format error, 0 on success. > */ > static int gdb_handle_vcont(GDBState *s, const char *p) > { > -int res, idx, signal = 0; > +int res, signal = 0; > char cur_action; > char *newstates; > unsigned long tmp; > +uint32_t pid, tid; > +GDBProcess *process; > CPUState *cpu; > #ifdef CONFIG_USER_ONLY > int max_cpus = 1; /* global variable max_cpus exists only in system mode > */ > > CPU_FOREACH(cpu) { > @@ -1115,29 +1178,52 @@ static int gdb_handle_vcont(GDBState *s, const char > *p) > } else if (cur_action != 'c' && cur_action != 's') { > /* unknown/invalid/unsupported command */ > res = -ENOTSUP; > goto out; > } > -/* thread specification. special values: (none), -1 = all; 0 = any */ > -if ((p[0] == ':' && p[1] == '-' && p[2] == '1') || (p[0] != ':')) { > -if (*p == ':') { > -p += 3; > -} > -for (idx = 0; idx < max_cpus; idx++) { > -if (newstates[idx] == 1) { > -newstates[idx] = cur_action; > + > +if (*p++ != ':') { > +res = -ENOTSUP; > +goto out; > +} > + > +switch (read_thread_id(p, &p, &pid, &tid)) { > +case GDB_READ_THREAD_ERR: > +res = -EINVAL; > +goto out; > + > +case GDB_ALL_PROCESSES: > +cpu = gdb_first_cpu(s); > +while (cpu) { > +if (newstates[cpu->cpu_index] == 1) { > +newstates[cpu->cpu_index] = cur_action; > } > + > +cpu = gdb_next_cpu(s, cpu); > } > -} else if (*p == ':') { > -p++; > -res = qemu_strtoul(p, &p, 16, &tmp); > -if (res) { > +break; > + > +case GDB_ALL_THREADS: > +process = gdb_get_process(s, pid); > + > +if (!process->attached) { > +res = -EINVAL; > goto out; > } > > -/* 0 means any thread, so we pick the first valid CPU */ > -cpu = tmp ? find_cpu(tmp) : first_cpu; > +cpu = get_first_cpu_in_process(s, process); > +
Re: [Qemu-devel] [PATCH] qcow2: Fix cache-clean-interval documentation
Am 01.10.2018 um 16:35 hat Eric Blake geschrieben: > On 9/29/18 4:54 AM, Leonid Bloch wrote: > > Fixing cache-clean-interval documentation following the recent change to > > a default of 600 seconds on supported plarforms (only Linux currently). > > > > Signed-off-by: Leonid Bloch > > --- > > docs/qcow2-cache.txt | 19 +-- > > qapi/block-core.json | 3 ++- > > qemu-options.hx | 3 ++- > > 3 files changed, 13 insertions(+), 12 deletions(-) > > > > diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt > > index 59358b816f..1778312e09 100644 > > --- a/docs/qcow2-cache.txt > > +++ b/docs/qcow2-cache.txt > > @@ -202,18 +202,17 @@ Reducing the memory usage > > It is possible to clean unused cache entries in order to reduce the > > memory usage during periods of low I/O activity. > > -The parameter "cache-clean-interval" defines an interval (in seconds). > > -All cache entries that haven't been accessed during that interval are > > -removed from memory. > > +The parameter "cache-clean-interval" defines an interval (in seconds), > > +affer which all the cache entries that haven't been accessed during it > > s/affer/after/ > > Maybe s/during it/during the interval/ Thanks, made these fixes and applied it to the block branch. Kevin
Re: [Qemu-devel] [PATCH v2 11/15] gdbstub: add support for vAttach packets
Hi Luc, On 01/10/2018 13:57, Luc Michel wrote: > Add support for the vAttach packets. In multiprocess mode, GDB sends > them to attach to additional processes. > > Signed-off-by: Luc Michel > --- > gdbstub.c | 32 > 1 file changed, 32 insertions(+) > > diff --git a/gdbstub.c b/gdbstub.c > index d372972dd3..b6de821025 100644 > --- a/gdbstub.c > +++ b/gdbstub.c > @@ -1326,10 +1326,42 @@ static int gdb_handle_packet(GDBState *s, const char > *line_buf) > break; > } > goto unknown_command; > } > break; > +} else if (strncmp(p, "Attach", 6) == 0) { Shouldn't this be strncmp("Attach;", 7)? > +unsigned long pid; > + > +p += 7; > + > +qemu_strtoul(p, &p, 16, &pid); You should check for EINVAL/ERANGE here. Rest of the patch is OK. > + > +process = gdb_get_process(s, pid); > + > +if (process == NULL) { > +put_packet(s, "E22"); > +break; > +} > + > +cpu = get_first_cpu_in_process(s, process); > + > +if (cpu == NULL) { > +/* Refuse to attach an empty process */ > +put_packet(s, "E22"); > +break; > +} > + > +process->attached = true; > + > +s->g_cpu = cpu; > +s->c_cpu = cpu; > + > +snprintf(buf, sizeof(buf), "T%02xthread:%s;", GDB_SIGNAL_TRAP, > + gdb_fmt_thread_id(s, cpu, thread_id, > sizeof(thread_id))); > + > +put_packet(s, buf); > +break; > } else { > goto unknown_command; > } > case 'k': > /* Kill the target */ >
Re: [Qemu-devel] [PATCH v2 10/15] gdbstub: add support for extended mode packet
Hi Luc, On 01/10/2018 13:56, Luc Michel wrote: > Add support for the '!' extended mode packet. This is required for the > multiprocess extension. > > Signed-off-by: Luc Michel > --- > gdbstub.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/gdbstub.c b/gdbstub.c > index 299783b3b8..d372972dd3 100644 > --- a/gdbstub.c > +++ b/gdbstub.c > @@ -1280,10 +1280,13 @@ static int gdb_handle_packet(GDBState *s, const char > *line_buf) > trace_gdbstub_io_command(line_buf); > > p = line_buf; > ch = *p++; > switch(ch) { > +case '!': > +put_packet(s, "OK"); Don't we want to also support the 'R' packet? > +break; > case '?': > /* TODO: Make this return the correct value for user-mode. */ > snprintf(buf, sizeof(buf), "T%02xthread:%s;", GDB_SIGNAL_TRAP, > gdb_fmt_thread_id(s, s->c_cpu, thread_id, > sizeof(thread_id))); > put_packet(s, buf); >
Re: [Qemu-devel] [PATCH v2 14/15] gdbstub: add multiprocess extension support
On 01/10/2018 13:57, Luc Michel wrote: > Add multiprocess extension support by enabling multiprocess mode when > the peer requests it, and by replying that we actually support it in the > qSupported reply packet. > > Signed-off-by: Luc Michel > --- > gdbstub.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/gdbstub.c b/gdbstub.c > index 51cc11981e..89f6803533 100644 > --- a/gdbstub.c > +++ b/gdbstub.c > @@ -1692,15 +1692,19 @@ static int gdb_handle_packet(GDBState *s, const char > *line_buf) > put_packet(s, "OK"); > break; > } > #endif /* !CONFIG_USER_ONLY */ > if (is_query_packet(p, "Supported", ':')) { > +if (strstr(p, "multiprocess+")) { > +s->multiprocess = true; > +} > snprintf(buf, sizeof(buf), "PacketSize=%x", MAX_PACKET_LENGTH); > cc = CPU_GET_CLASS(first_cpu); > if (cc->gdb_core_xml_file != NULL) { > pstrcat(buf, sizeof(buf), ";qXfer:features:read+"); > } Maybe move altogether: if (strstr(p, "multiprocess+")) { s->multiprocess = true; } > +pstrcat(buf, sizeof(buf), ";multiprocess+"); Regardless: Reviewed-by: Philippe Mathieu-Daudé > put_packet(s, buf); > break; > } > if (strncmp(p, "Xfer:features:read:", 19) == 0) { > const char *xml; >
Re: [Qemu-devel] [PATCH v2 08/15] gdbstub: add multiprocess support to gdb_vm_state_change()
On 01/10/2018 13:56, Luc Michel wrote: > Add support for multiprocess extension in gdb_vm_state_change() > function. > > Signed-off-by: Luc Michel Reviewed-by: Philippe Mathieu-Daudé > --- > gdbstub.c | 15 --- > 1 file changed, 12 insertions(+), 3 deletions(-) > > diff --git a/gdbstub.c b/gdbstub.c > index 9065e8e140..c1a02c34cd 100644 > --- a/gdbstub.c > +++ b/gdbstub.c > @@ -1692,10 +1692,11 @@ void gdb_set_stop_cpu(CPUState *cpu) > static void gdb_vm_state_change(void *opaque, int running, RunState state) > { > GDBState *s = gdbserver_state; > CPUState *cpu = s->c_cpu; > char buf[256]; > +char thread_id[16]; > const char *type; > int ret; > > if (running || s->state == RS_INACTIVE) { > return; > @@ -1703,10 +1704,18 @@ static void gdb_vm_state_change(void *opaque, int > running, RunState state) > /* Is there a GDB syscall waiting to be sent? */ > if (s->current_syscall_cb) { > put_packet(s, s->syscall_buf); > return; > } > + > +if (cpu == NULL) { > +/* No process attached */ > +return; > +} > + > +gdb_fmt_thread_id(s, cpu, thread_id, sizeof(thread_id)); > + > switch (state) { > case RUN_STATE_DEBUG: > if (cpu->watchpoint_hit) { > switch (cpu->watchpoint_hit->flags & BP_MEM_ACCESS) { > case BP_MEM_READ: > @@ -1720,12 +1729,12 @@ static void gdb_vm_state_change(void *opaque, int > running, RunState state) > break; > } > trace_gdbstub_hit_watchpoint(type, cpu_gdb_index(cpu), > (target_ulong)cpu->watchpoint_hit->vaddr); > snprintf(buf, sizeof(buf), > - "T%02xthread:%02x;%swatch:" TARGET_FMT_lx ";", > - GDB_SIGNAL_TRAP, cpu_gdb_index(cpu), type, > + "T%02xthread:%s;%swatch:" TARGET_FMT_lx ";", > + GDB_SIGNAL_TRAP, thread_id, type, > (target_ulong)cpu->watchpoint_hit->vaddr); > cpu->watchpoint_hit = NULL; > goto send_packet; > } else { > trace_gdbstub_hit_break(); > @@ -1763,11 +1772,11 @@ static void gdb_vm_state_change(void *opaque, int > running, RunState state) > trace_gdbstub_hit_unknown(state); > ret = GDB_SIGNAL_UNKNOWN; > break; > } > gdb_set_stop_cpu(cpu); > -snprintf(buf, sizeof(buf), "T%02xthread:%02x;", ret, cpu_gdb_index(cpu)); > +snprintf(buf, sizeof(buf), "T%02xthread:%s;", ret, thread_id); > > send_packet: > put_packet(s, buf); > > /* disable single step if it was enabled */ >
Re: [Qemu-devel] [PATCH v2 07/15] gdbstub: add multiprocess support to Xfer:features:read:
Hi Luc, On 01/10/2018 13:56, Luc Michel wrote: > Change the Xfer:features:read: packet handling to support the > multiprocess extension. This packet is used to request the XML > description of the CPU. In multiprocess mode, different descriptions can > be sent for different processes. > > This function now takes the process to send the description for as a > parameter, and use a buffer in the process structure to store the > generated description. > > It takes the first CPU of the process to generate the description. > > Signed-off-by: Luc Michel > --- > gdbstub.c | 44 > 1 file changed, 24 insertions(+), 20 deletions(-) > > diff --git a/gdbstub.c b/gdbstub.c > index 63d7913269..9065e8e140 100644 > --- a/gdbstub.c > +++ b/gdbstub.c > @@ -792,55 +792,57 @@ static CPUState *gdb_first_cpu(const GDBState *s) > } > > return cpu; > } > > -static const char *get_feature_xml(const char *p, const char **newp, > - CPUClass *cc) > +static const char *get_feature_xml(const GDBState *s, const char *p, > + const char **newp, GDBProcess *process) > { > size_t len; > int i; > const char *name; > -static char target_xml[1024]; As noted in your patch #2, I'd add GDBProcess::target_xml in this same patch. With this change: Reviewed-by: Philippe Mathieu-Daudé > +CPUState *cpu = get_first_cpu_in_process(s, process); > +CPUClass *cc = CPU_GET_CLASS(cpu); > > len = 0; > while (p[len] && p[len] != ':') > len++; > *newp = p + len; > > name = NULL; > if (strncmp(p, "target.xml", len) == 0) { > +char *buf = process->target_xml; > +const size_t buf_sz = sizeof(process->target_xml); > + > /* Generate the XML description for this CPU. */ > -if (!target_xml[0]) { > +if (!buf[0]) { > GDBRegisterState *r; > -CPUState *cpu = first_cpu; > > -pstrcat(target_xml, sizeof(target_xml), > +pstrcat(buf, buf_sz, > "" > "" > ""); > if (cc->gdb_arch_name) { > gchar *arch = cc->gdb_arch_name(cpu); > -pstrcat(target_xml, sizeof(target_xml), ""); > -pstrcat(target_xml, sizeof(target_xml), arch); > -pstrcat(target_xml, sizeof(target_xml), ""); > +pstrcat(buf, buf_sz, ""); > +pstrcat(buf, buf_sz, arch); > +pstrcat(buf, buf_sz, ""); > g_free(arch); > } > -pstrcat(target_xml, sizeof(target_xml), " -pstrcat(target_xml, sizeof(target_xml), cc->gdb_core_xml_file); > -pstrcat(target_xml, sizeof(target_xml), "\"/>"); > +pstrcat(buf, buf_sz, " +pstrcat(buf, buf_sz, cc->gdb_core_xml_file); > +pstrcat(buf, buf_sz, "\"/>"); > for (r = cpu->gdb_regs; r; r = r->next) { > -pstrcat(target_xml, sizeof(target_xml), " href=\""); > -pstrcat(target_xml, sizeof(target_xml), r->xml); > -pstrcat(target_xml, sizeof(target_xml), "\"/>"); > +pstrcat(buf, buf_sz, " +pstrcat(buf, buf_sz, r->xml); > +pstrcat(buf, buf_sz, "\"/>"); > } > -pstrcat(target_xml, sizeof(target_xml), ""); > +pstrcat(buf, buf_sz, ""); > } > -return target_xml; > +return buf; > } > if (cc->gdb_get_dynamic_xml) { > -CPUState *cpu = first_cpu; > char *xmlname = g_strndup(p, len); > const char *xml = cc->gdb_get_dynamic_xml(cpu, xmlname); > > g_free(xmlname); > if (xml) { > @@ -1246,10 +1248,11 @@ out: > } > > static int gdb_handle_packet(GDBState *s, const char *line_buf) > { > CPUState *cpu; > +GDBProcess *process; > CPUClass *cc; > const char *p; > uint32_t pid, tid; > int ch, reg_size, type, res; > uint8_t mem_buf[MAX_PACKET_LENGTH]; > @@ -1620,18 +1623,19 @@ static int gdb_handle_packet(GDBState *s, const char > *line_buf) > } > if (strncmp(p, "Xfer:features:read:", 19) == 0) { > const char *xml; > target_ulong total_len; > > -cc = CPU_GET_CLASS(first_cpu); > +process = gdb_get_cpu_process(s, s->g_cpu); > +cc = CPU_GET_CLASS(s->g_cpu); > if (cc->gdb_core_xml_file == NULL) { > goto unknown_command; > } > > gdb_has_xml = true; > p += 19; > -xml = get_feature_xml(p, &p, cc); > +xml = get_feature_xml(s, p, &p, process); > if (!xml) { > snprintf(buf, sizeof(buf), "E00"); > put_packet(s, buf); > break; > } >
Re: [Qemu-devel] [PATCH 4/7] qcow2: async scheme for qcow2_co_preadv
01.10.2018 18:49, Max Reitz wrote: On 01.10.18 17:33, Vladimir Sementsov-Ogievskiy wrote: 27.09.2018 21:35, Max Reitz wrote: On 07.08.18 19:43, Vladimir Sementsov-Ogievskiy wrote: Start several async requests instead of read chunk by chunk. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/qcow2.c | 208 -- 1 file changed, 204 insertions(+), 4 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 5e7f2ee318..a0df8d4e50 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -1869,6 +1869,197 @@ out: return ret; } +typedef struct Qcow2WorkerTask { + uint64_t file_cluster_offset; + uint64_t offset; + uint64_t bytes; + uint64_t bytes_done; +} Qcow2WorkerTask; Why don't you make this a union of request-specific structs? ok, will try + +typedef int (*Qcow2DoWorkFunc)(BlockDriverState *bs, QEMUIOVector *qiov, + Qcow2WorkerTask *task); + +typedef struct Qcow2RWState { + BlockDriverState *bs; + QEMUIOVector *qiov; + uint64_t bytes; Maybe make it total_bytes so it doesn't conflict with the value in Qcow2WorkerTask? ok + int ret; + bool waiting_one; + bool waiting_all; + bool finalize; + Coroutine *co; + QSIMPLEQ_HEAD(, Qcow2Worker) free_workers; + QSIMPLEQ_HEAD(, Qcow2Worker) busy_workers; + int online_workers; + Qcow2DoWorkFunc do_work_func; +} Qcow2RWState; + +typedef struct Qcow2Worker { + Qcow2RWState *rws; + Coroutine *co; + Qcow2WorkerTask task; + bool busy; + QSIMPLEQ_ENTRY(Qcow2Worker) entry; +} Qcow2Worker; +#define QCOW2_MAX_WORKERS 64 That's really a bit hidden here. I think it should go into the header. Also I'm not quite sure about the number. In other places we've always used 16. (With the encryption code always allocating a new bounce buffer, this can mean quite a bit of memory usage.) No doubts. + +static coroutine_fn void qcow2_rw_worker(void *opaque); +static Qcow2Worker *qcow2_new_worker(Qcow2RWState *rws) +{ + Qcow2Worker *w = g_new0(Qcow2Worker, 1); + w->rws = rws; + w->co = qemu_coroutine_create(qcow2_rw_worker, w); + + return w; +} + +static void qcow2_free_worker(Qcow2Worker *w) +{ + g_free(w); +} + +static coroutine_fn void qcow2_rw_worker(void *opaque) +{ + Qcow2Worker *w = opaque; + Qcow2RWState *rws = w->rws; + + rws->online_workers++; + + while (!rws->finalize) { + int ret = rws->do_work_func(rws->bs, rws->qiov, &w->task); + if (ret < 0 && rws->ret == 0) { + rws->ret = ret; + } + + if (rws->waiting_all || rws->ret < 0) { + break; + } + + w->busy = false; + QSIMPLEQ_REMOVE(&rws->busy_workers, w, Qcow2Worker, entry); + QSIMPLEQ_INSERT_TAIL(&rws->free_workers, w, entry); + if (rws->waiting_one) { + rws->waiting_one = false; + /* we must unset it here, to prevent queuing rws->co in several + * workers (it may happen if other worker already waits us on mutex, + * so it will be entered after our yield and before rws->co enter) + * + * TODO: rethink this comment, as here (and in other places in the + * file) we moved from qemu_coroutine_add_next to aio_co_wake. + */ + aio_co_wake(rws->co); + } + + qemu_coroutine_yield(); + } + + if (w->busy) { + w->busy = false; + QSIMPLEQ_REMOVE(&rws->busy_workers, w, Qcow2Worker, entry); + } + qcow2_free_worker(w); + rws->online_workers--; + + if (rws->waiting_all && rws->online_workers == 0) { + aio_co_wake(rws->co); + } +} + +static coroutine_fn void qcow2_rws_add_task(Qcow2RWState *rws, + uint64_t file_cluster_offset, + uint64_t offset, + uint64_t bytes, + uint64_t bytes_done) I'd propose just taking a const Qcow2WorkerTask * here. (Makes even more sense if you make it a union.) ok, I'll try this way +{ + Qcow2Worker *w; + + assert(rws->co == qemu_coroutine_self()); + + if (bytes_done == 0 && bytes == rws->bytes) { + Qcow2WorkerTask task = { + .file_cluster_offset = file_cluster_offset, + .offset = offset, + .bytes = bytes, + .bytes_done = bytes_done + }; + rws->ret = rws->do_work_func(rws->bs, rws->qiov, &task); (If so, you'd just pass the pointer along here) + return; + } I like this fast path, but I think it deserves a small comment. (That is a fast path and bypasses the whole worker infrastructure.) + + if (!QSIMPLEQ_EMPTY(&rws->free_workers)) { + w = QSIMPLEQ_FIRST(&rws->free_workers); + QSIMPLEQ_REMOVE_HEAD(&rws->free_workers, entry); + } else if (rws->online_workers < QCOW2_MAX_WORKERS) { + w = qco
Re: [Qemu-devel] [PATCH v2 01/15] gdbstub: introduce GDB processes
Hi Luc, On 01/10/2018 13:56, Luc Michel wrote: > Add a structure GDBProcess that represent processes from the GDB > semantic point of view. > > CPUs can be split into different processes, by grouping them under a QOM > container named after the GDB_CPU_GROUP_NAME macro (`gdb-group[*]'). > Each occurrence of such a container implies the existence of the > corresponding process in the GDB stub. The gdb_cpu_group_container_get() > function can be used to create a new container. > > When no such container are found, all the CPUs are put in a unique GDB > process (create_unique_process()). This is also the case when compiled > in user mode, where multi-processes do not make much sense for now. > > Signed-off-by: Luc Michel > --- > include/exec/gdbstub.h | 8 + > gdbstub.c | 67 ++ > 2 files changed, 75 insertions(+) > > diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h > index 08363969c1..a3e4159bf4 100644 > --- a/include/exec/gdbstub.h > +++ b/include/exec/gdbstub.h > @@ -1,8 +1,10 @@ > #ifndef GDBSTUB_H > #define GDBSTUB_H > > +#include "qom/object.h" > + > #define DEFAULT_GDBSTUB_PORT "1234" > > /* GDB breakpoint/watchpoint types */ > #define GDB_BREAKPOINT_SW0 > #define GDB_BREAKPOINT_HW1 > @@ -129,6 +131,12 @@ void gdbserver_cleanup(void); > extern bool gdb_has_xml; > > /* in gdbstub-xml.c, generated by scripts/feature_to_c.sh */ > extern const char *const xml_builtin[][2]; > > +#define GDB_CPU_GROUP_NAME "gdb-group" > + > +static inline Object *gdb_cpu_group_container_get(Object *parent) > +{ > +return container_get(parent, "/" GDB_CPU_GROUP_NAME "[*]"); > +} > #endif > diff --git a/gdbstub.c b/gdbstub.c > index d6ab95006c..5c86218f49 100644 > --- a/gdbstub.c > +++ b/gdbstub.c > @@ -295,10 +295,17 @@ typedef struct GDBRegisterState { > gdb_reg_cb set_reg; > const char *xml; > struct GDBRegisterState *next; > } GDBRegisterState; > > +typedef struct GDBProcess { > +uint32_t pid; > +bool attached; > + > +char target_xml[1024]; I'd add this field in the patch #7 "support to Xfer:features:read:" where you start using it. > +} GDBProcess; > + > enum RSState { > RS_INACTIVE, > RS_IDLE, > RS_GETLINE, > RS_GETLINE_ESC, > @@ -323,10 +330,13 @@ typedef struct GDBState { > int running_state; > #else > CharBackend chr; > Chardev *mon_chr; > #endif > +bool multiprocess; > +GDBProcess *processes; > +int process_num; > char syscall_buf[256]; > gdb_syscall_complete_cb current_syscall_cb; > } GDBState; > > /* By default use no IRQs and no timers while single stepping so as to > @@ -1750,10 +1760,24 @@ void gdb_exit(CPUArchState *env, int code) > #ifndef CONFIG_USER_ONLY >qemu_chr_fe_deinit(&s->chr, true); > #endif > } > > +/* > + * Create a unique process containing all the CPUs. > + */ > +static void create_unique_process(GDBState *s) > +{ > +GDBProcess *process; > + > +s->processes = g_malloc0(sizeof(GDBProcess)); > +s->process_num = 1; > +process = &s->processes[0]; > + > +process->pid = 1; > +} > + > #ifdef CONFIG_USER_ONLY > int > gdb_handlesig(CPUState *cpu, int sig) > { > GDBState *s; > @@ -1847,10 +1871,11 @@ static bool gdb_accept(void) > } > > s = g_malloc0(sizeof(GDBState)); > s->c_cpu = first_cpu; > s->g_cpu = first_cpu; > +create_unique_process(s); > s->fd = fd; > gdb_has_xml = false; > > gdbserver_state = s; > return true; > @@ -2003,10 +2028,48 @@ static const TypeInfo char_gdb_type_info = { > .name = TYPE_CHARDEV_GDB, > .parent = TYPE_CHARDEV, > .class_init = char_gdb_class_init, > }; > > +static void create_processes(GDBState *s) > +{ > +Object *container; > +int i = 0; > +char process_str[16]; > + > +container = object_resolve_path(GDB_CPU_GROUP_NAME "[0]", NULL); > + > +while (container) { > +s->processes = g_renew(GDBProcess, s->processes, i + 1); > + > +GDBProcess *process = &s->processes[i]; > + > +/* GDB process IDs -1 and 0 are reserved */ > +process->pid = i + 1; > +process->attached = false; > +process->target_xml[0] = '\0'; > + > +i++; > +snprintf(process_str, sizeof(process_str), GDB_CPU_GROUP_NAME > "[%d]", i); > +container = object_resolve_path(process_str, NULL); > +} > + > +if (!s->processes) { > +/* No CPU group specified by the machine */ > +create_unique_process(s); > +} else { > +s->process_num = i; > +} > +} > + > +static void cleanup_processes(GDBState *s) > +{ > +g_free(s->processes); > +s->process_num = 0; > +s->processes = NULL; > +} > + > int gdbserver_start(const char *device) > { > trace_gdbstub_op_start(device); > > GDBState *s; > @@ -2055,15 +2118,19 @@ int gdbserver_start(const char *device) >
Re: [Qemu-devel] [PATCH] test-replication: Lock AioContext around blk_unref()
On 01/10/2018 18:03, Kevin Wolf wrote: >> Given the backtrace, I think bdrv_close should be taking the AioContext >> lock instead of blockdev_close_all_bdrv_states. > Conversely, that would mean that calling bdrv_unref() with the > AioContext lock held is a bug (because close callbacks can involve > AIO_WAIT_WHILE()). I'm not sure if that's very practical. What if bdrv_unref_child dropped the lock? Perhaps it's simpler to just kill the lock though, if bdrv_set_aio_context can be figured out. Paolo > Of course, there will probably be a lot of callers to fix either way > after we define whether to hold the lock for bdrv_unref() or not. Either > you need to add locking to the places where it's missing or you need to > drop the locks in all other places. > > I was leaning towards requiring the lock for bdrv_unref() (and > therefore blk_unref()).
Re: [Qemu-devel] [PATCH] tests/test-bdrv-drain: Fix too late qemu_event_reset()
On 01.10.18 17:10, Kevin Wolf wrote: > qemu_event_reset() must be called before the AIO request in a different > iothread is submitted. Otherwise the request could be completed before > we do the qemu_event_reset() and the test would hang in > qemu_event_wait(). > > Signed-off-by: Kevin Wolf > --- > tests/test-bdrv-drain.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) Reviewed-by: Max Reitz Tested-by: Max Reitz (I still got the same hang as the one you just reported, though -- but it happens just very rarely, so it's definitely better.) signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [Qemu-block] [PULL 00/42] Block patches
Am 01.10.2018 um 16:14 hat Kevin Wolf geschrieben: > Am 01.10.2018 um 15:03 hat Peter Maydell geschrieben: > > On 28 September 2018 at 15:36, Peter Maydell > > wrote: > > > I'm finding that test-bdrv-drain hangs intermittently on my OSX host. > > > > Ping? Between this and test-replication I'm finding that my > > parallel build tests for merges are failing about 50% of the > > time :-( > > Sorry, there wasn't much more than a weekend between your report and > now. > > For the replication one, I think we can just take the AioContext lock in > the test case while we decide how the API should really be used. I'll > prepare a fix for that (and hopefully I'll be able to reproduce the > problem reliably enough to verify the fix). > > Max said he could reproduce some hang in test-bdrv-drain (though we > don't know if this has anything to do with your OS X hang, which looked > rather odd) and would look into it, but I don't think we know the > problem yet. I'll try to reproduce that one after fixing the replication > test. So I sent two patches for the two test cases that should fix the bugs that made the tests fail relatively frequently. I can still reproduce another hang, which is a bit mysterious to me: Thread 2 (Thread 3321.3818): #0 0x7f2ebbdcc4e9 in syscall () from /lib64/libc.so.6 #1 0x5594d095690b in qemu_futex_wait (val=, f=) at /home/kwolf/source/qemu/include/qemu/futex.h:29 #2 qemu_event_wait (ev=ev@entry=0x5594d0bff228 ) at util/qemu-thread-posix.c:442 #3 0x5594d0965f58 in call_rcu_thread (opaque=) at util/rcu.c:261 #4 0x7f2ebc09d36d in start_thread () from /lib64/libpthread.so.0 #5 0x7f2ebbdd1b4f in clone () from /lib64/libc.so.6 Thread 1 (Thread 3321.3321): #0 0x7f2ebc09e89d in pthread_join () from /lib64/libpthread.so.0 #1 0x5594d0956b6f in qemu_thread_join (thread=thread@entry=0x5594d16bd0b8) at util/qemu-thread-posix.c:565 #2 0x5594d091f4d9 in iothread_join (iothread=0x5594d16bd0b0) at tests/iothread.c:62 #3 0x5594d08806cc in test_iothread_common (drain_type=BDRV_DRAIN_ALL, drain_thread=) at tests/test-bdrv-drain.c:763 #4 0x7f2ebd58e178 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0 #5 0x7f2ebd58e37b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0 #6 0x7f2ebd58e37b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0 #7 0x7f2ebd58e51b in g_test_run_suite () from /lib64/libglib-2.0.so.0 #8 0x7f2ebd58e571 in g_test_run () from /lib64/libglib-2.0.so.0 #9 0x5594d087a534 in main (argc=, argv=) at tests/test-bdrv-drain.c:1606 This pthread_join() is waiting for a thread that doesn't even exist any more. I caught the bug in rr and am clearly seeing how the iothread is notified and terminates. But pthread_join() just doesn't return. Kevin
Re: [Qemu-devel] [PATCH] test-replication: Lock AioContext around blk_unref()
Am 01.10.2018 um 17:40 hat Paolo Bonzini geschrieben: > On 01/10/2018 16:32, Kevin Wolf wrote: > > Recently, the test case has started failing because some job related > > functions want to drop the AioContext lock even though it hasn't been > > taken: > > > > (gdb) bt > > #0 0x7f51c067c9fb in raise () from /lib64/libc.so.6 > > #1 0x7f51c067e77d in abort () from /lib64/libc.so.6 > > #2 0x558c9d5dde7b in error_exit (err=, > > msg=msg@entry=0x558c9d6fe120 <__func__.18373> "qemu_mutex_unlock_impl") at > > util/qemu-thread-posix.c:36 > > #3 0x558c9d6b5263 in qemu_mutex_unlock_impl > > (mutex=mutex@entry=0x558c9f3999a0, file=file@entry=0x558c9d6fd36f > > "util/async.c", line=line@entry=516) at util/qemu-thread-posix.c:96 > > #4 0x558c9d6b0565 in aio_context_release > > (ctx=ctx@entry=0x558c9f399940) at util/async.c:516 > > #5 0x558c9d5eb3da in job_completed_txn_abort (job=0x558c9f68e640) > > at job.c:738 > > #6 0x558c9d5eb227 in job_finish_sync (job=0x558c9f68e640, > > finish=finish@entry=0x558c9d5eb8d0 , errp=errp@entry=0x0) > > at job.c:986 > > #7 0x558c9d5eb8ee in job_cancel_sync (job=) at > > job.c:941 > > #8 0x558c9d64d853 in replication_close (bs=) at > > block/replication.c:148 > > #9 0x558c9d5e5c9f in bdrv_close (bs=0x558c9f41b020) at block.c:3420 > > #10 bdrv_delete (bs=0x558c9f41b020) at block.c:3629 > > #11 bdrv_unref (bs=0x558c9f41b020) at block.c:4685 > > #12 0x558c9d62a3f3 in blk_remove_bs (blk=blk@entry=0x558c9f42a7c0) > > at block/block-backend.c:783 > > #13 0x558c9d62a667 in blk_delete (blk=0x558c9f42a7c0) at > > block/block-backend.c:402 > > #14 blk_unref (blk=0x558c9f42a7c0) at block/block-backend.c:457 > > #15 0x558c9d5dfcea in test_secondary_stop () at > > tests/test-replication.c:478 > > #16 0x7f51c1f13178 in g_test_run_suite_internal () from > > /lib64/libglib-2.0.so.0 > > #17 0x7f51c1f1337b in g_test_run_suite_internal () from > > /lib64/libglib-2.0.so.0 > > #18 0x7f51c1f1337b in g_test_run_suite_internal () from > > /lib64/libglib-2.0.so.0 > > #19 0x7f51c1f13552 in g_test_run_suite () from > > /lib64/libglib-2.0.so.0 > > #20 0x7f51c1f13571 in g_test_run () from /lib64/libglib-2.0.so.0 > > #21 0x558c9d5de31f in main (argc=, argv= > out>) at tests/test-replication.c:581 > > > > It is yet unclear whether this should really be considered a bug in the > > test case or whether blk_unref() should work for callers that haven't > > taken the AioContext lock, but in order to fix the build tests quickly, > > just take the AioContext lock around blk_unref(). > > Given the backtrace, I think bdrv_close should be taking the AioContext > lock instead of blockdev_close_all_bdrv_states. Conversely, that would mean that calling bdrv_unref() with the AioContext lock held is a bug (because close callbacks can involve AIO_WAIT_WHILE()). I'm not sure if that's very practical. Of course, there will probably be a lot of callers to fix either way after we define whether to hold the lock for bdrv_unref() or not. Either you need to add locking to the places where it's missing or you need to drop the locks in all other places. I was leaning towards requiring the lock for bdrv_unref() (and therefore blk_unref()). Kevin
Re: [Qemu-devel] [PATCH 3/7] qcow2: split out reading normal clusters from qcow2_co_preadv
01.10.2018 18:39, Max Reitz wrote: On 01.10.18 17:14, Vladimir Sementsov-Ogievskiy wrote: 27.09.2018 20:35, Max Reitz wrote: On 07.08.18 19:43, Vladimir Sementsov-Ogievskiy wrote: Memory allocation may become less efficient for encrypted case. It's a payment for further asynchronous scheme. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/qcow2.c | 114 -- 1 file changed, 64 insertions(+), 50 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 65e3ca00e2..5e7f2ee318 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -1808,6 +1808,67 @@ out: return ret; } +static coroutine_fn int qcow2_co_preadv_normal(BlockDriverState *bs, + uint64_t file_cluster_offset, + uint64_t offset, + uint64_t bytes, + QEMUIOVector *qiov, + uint64_t qiov_offset) +{ +int ret; +BDRVQcow2State *s = bs->opaque; +void *crypt_buf = NULL; +QEMUIOVector hd_qiov; +int offset_in_cluster = offset_into_cluster(s, offset); + +if ((file_cluster_offset & 511) != 0) { +return -EIO; +} + +qemu_iovec_init(&hd_qiov, qiov->niov); So you're not just re-allocating the bounce buffer for every single call, but also this I/O vector. Hm. +if (bs->encrypted) { +assert(s->crypto); +assert(bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size); + +crypt_buf = qemu_try_blockalign(bs->file->bs, bytes); 1. Why did you remove the comment? 2. The check for whether crypt_buf was successfully allocated is missing. 3. Do you have any benchmarks for encrypted images? Having to allocate the bounce buffer for potentially every single cluster sounds rather bad to me. Hmm, no excuses. 1- Will fix, 2 - will fix, 3 - will fix or at least test the performance. +qemu_iovec_add(&hd_qiov, crypt_buf, bytes); +} else { +qemu_iovec_concat(&hd_qiov, qiov, qiov_offset, bytes); qcow2_co_preadv() continues to do this as well. That's superfluous now. hd_qiov is local here.. or what do you mean? qcow2_co_preadv() continues to have its own hd_qiov and appends qiov to it (just like here), but it's unused for normal clusters. So it doesn't have to do that for normal clusters. ah, yes, understand. Will think how to properly handle it. +} + +BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO); +ret = bdrv_co_preadv(bs->file, + file_cluster_offset + offset_in_cluster, + bytes, &hd_qiov, 0); +if (ret < 0) { +goto out; +} + +if (bs->encrypted) { +assert(s->crypto); +assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0); +assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0); +if (qcrypto_block_decrypt(s->crypto, + (s->crypt_physical_offset ? + file_cluster_offset + offset_in_cluster : + offset), + crypt_buf, + bytes, + NULL) < 0) { What's the reason against decrypting this in-place in qiov so we could save the bounce buffer? We allow offsets in clusters, so why can't we just call this function once per involved I/O vector entry? Isn't it unsafe to do decryption in guest buffers? I don't know. Why would it be? Because... Max +ret = -EIO; +goto out; +} +qemu_iovec_from_buf(qiov, qiov_offset, crypt_buf, bytes); ...we're putting the decrypted content there anyway. The only issue I see is that the guest may see encrypted content for a short duration. Hm. Maybe we don't want that. In which case it probably has to stay as it is. Yes, I mean exactly this thing. Max -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH 1/7] qcow2: move qemu_co_mutex_lock below decryption procedure
27.09.2018 20:02, Max Reitz wrote: On 27.09.18 18:58, Max Reitz wrote: On 07.08.18 19:43, Vladimir Sementsov-Ogievskiy wrote: From: "Denis V. Lunev" We are not working with a shared state data in the decruption code and (*decryption) thus this operation is safe. On the other hand this significantly reduces the scope of the lock. Sure, but does it have any effect? This is a coroutine lock and I don't see any yield points besides the bdrv_co_preadv() that was executed outside of the lock anyway. I think it makes sense to move the qemu_co_mutex_lock() down below the decryption, as the commit title suggests. Because then we can decrypt while another coroutine that uses the lock is blocking. But I don't see the point of moving the qemu_co_mutex_unlock() up. (That is non-trivial movement because it means I'd have to find out whether anything that could modify the cluster offset is serialized by the generic block layer. Or, well, not really, because there is no yield point, so it doesn't actually matter. But it doesn't give us any benefit either, I think.) Max It don't make any performance benefit, but it allows to split part about normal cluster reading to separate function. Hmm, in future we can do decryption in threads (like compress threads) and it should improve performance. Max Signed-off-by: Denis V. Lunev --- block/qcow2.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index ec9e6238a0..41def07c67 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -1880,9 +1880,11 @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset, break; case QCOW2_CLUSTER_NORMAL: +qemu_co_mutex_unlock(&s->lock); + if ((cluster_offset & 511) != 0) { ret = -EIO; -goto fail; +goto fail_nolock; } if (bs->encrypted) { @@ -1899,7 +1901,7 @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset, * s->cluster_size); if (cluster_data == NULL) { ret = -ENOMEM; -goto fail; +goto fail_nolock; } } @@ -1909,13 +1911,11 @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset, } BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO); -qemu_co_mutex_unlock(&s->lock); ret = bdrv_co_preadv(bs->file, cluster_offset + offset_in_cluster, cur_bytes, &hd_qiov, 0); -qemu_co_mutex_lock(&s->lock); if (ret < 0) { -goto fail; +goto fail_nolock; } if (bs->encrypted) { assert(s->crypto); @@ -1929,10 +1929,11 @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset, cur_bytes, NULL) < 0) { ret = -EIO; -goto fail; +goto fail_nolock; } qemu_iovec_from_buf(qiov, bytes_done, cluster_data, cur_bytes); } +qemu_co_mutex_lock(&s->lock); break; default: @@ -1950,6 +1951,7 @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset, fail: qemu_co_mutex_unlock(&s->lock); +fail_nolock: qemu_iovec_destroy(&hd_qiov); qemu_vfree(cluster_data); -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH] scripts/device-crash-test: Remove entries for serial devices
On Mon, Oct 01, 2018 at 04:13:10PM +0200, Thomas Huth wrote: > The problem with the various serial devices has been fixed a while > ago in commit 47c4f85a0c27888e12af827471cfef87deb49821 ("hw/char/serial: > Allow disconnected chardevs") already, so we can remove these entries > from the "ignore" list in the device-crash-test script now. > > Signed-off-by: Thomas Huth Thanks, queued on python-next. -- Eduardo
Re: [Qemu-devel] [PATCH 4/7] qcow2: async scheme for qcow2_co_preadv
On 01.10.18 17:33, Vladimir Sementsov-Ogievskiy wrote: > 27.09.2018 21:35, Max Reitz wrote: >> On 07.08.18 19:43, Vladimir Sementsov-Ogievskiy wrote: >>> Start several async requests instead of read chunk by chunk. >>> >>> Signed-off-by: Vladimir Sementsov-Ogievskiy >>> --- >>> block/qcow2.c | 208 >>> -- >>> 1 file changed, 204 insertions(+), 4 deletions(-) >>> >>> diff --git a/block/qcow2.c b/block/qcow2.c >>> index 5e7f2ee318..a0df8d4e50 100644 >>> --- a/block/qcow2.c >>> +++ b/block/qcow2.c >>> @@ -1869,6 +1869,197 @@ out: >>> return ret; >>> } >>> +typedef struct Qcow2WorkerTask { >>> + uint64_t file_cluster_offset; >>> + uint64_t offset; >>> + uint64_t bytes; >>> + uint64_t bytes_done; >>> +} Qcow2WorkerTask; >> Why don't you make this a union of request-specific structs? > > ok, will try > >> >>> + >>> +typedef int (*Qcow2DoWorkFunc)(BlockDriverState *bs, QEMUIOVector >>> *qiov, >>> + Qcow2WorkerTask *task); >>> + >>> +typedef struct Qcow2RWState { >>> + BlockDriverState *bs; >>> + QEMUIOVector *qiov; >>> + uint64_t bytes; >> Maybe make it total_bytes so it doesn't conflict with the value in >> Qcow2WorkerTask? > > ok > >> >>> + int ret; >>> + bool waiting_one; >>> + bool waiting_all; >>> + bool finalize; >>> + Coroutine *co; >>> + QSIMPLEQ_HEAD(, Qcow2Worker) free_workers; >>> + QSIMPLEQ_HEAD(, Qcow2Worker) busy_workers; >>> + int online_workers; >>> + Qcow2DoWorkFunc do_work_func; >>> +} Qcow2RWState; >>> + >>> +typedef struct Qcow2Worker { >>> + Qcow2RWState *rws; >>> + Coroutine *co; >>> + Qcow2WorkerTask task; >>> + bool busy; >>> + QSIMPLEQ_ENTRY(Qcow2Worker) entry; >>> +} Qcow2Worker; >>> +#define QCOW2_MAX_WORKERS 64 >> That's really a bit hidden here. I think it should go into the header. >> >> Also I'm not quite sure about the number. In other places we've always >> used 16. >> >> (With the encryption code always allocating a new bounce buffer, this >> can mean quite a bit of memory usage.) > > No doubts. > >> >>> + >>> +static coroutine_fn void qcow2_rw_worker(void *opaque); >>> +static Qcow2Worker *qcow2_new_worker(Qcow2RWState *rws) >>> +{ >>> + Qcow2Worker *w = g_new0(Qcow2Worker, 1); >>> + w->rws = rws; >>> + w->co = qemu_coroutine_create(qcow2_rw_worker, w); >>> + >>> + return w; >>> +} >>> + >>> +static void qcow2_free_worker(Qcow2Worker *w) >>> +{ >>> + g_free(w); >>> +} >>> + >>> +static coroutine_fn void qcow2_rw_worker(void *opaque) >>> +{ >>> + Qcow2Worker *w = opaque; >>> + Qcow2RWState *rws = w->rws; >>> + >>> + rws->online_workers++; >>> + >>> + while (!rws->finalize) { >>> + int ret = rws->do_work_func(rws->bs, rws->qiov, &w->task); >>> + if (ret < 0 && rws->ret == 0) { >>> + rws->ret = ret; >>> + } >>> + >>> + if (rws->waiting_all || rws->ret < 0) { >>> + break; >>> + } >>> + >>> + w->busy = false; >>> + QSIMPLEQ_REMOVE(&rws->busy_workers, w, Qcow2Worker, entry); >>> + QSIMPLEQ_INSERT_TAIL(&rws->free_workers, w, entry); >>> + if (rws->waiting_one) { >>> + rws->waiting_one = false; >>> + /* we must unset it here, to prevent queuing rws->co in >>> several >>> + * workers (it may happen if other worker already waits >>> us on mutex, >>> + * so it will be entered after our yield and before >>> rws->co enter) >>> + * >>> + * TODO: rethink this comment, as here (and in other >>> places in the >>> + * file) we moved from qemu_coroutine_add_next to >>> aio_co_wake. >>> + */ >>> + aio_co_wake(rws->co); >>> + } >>> + >>> + qemu_coroutine_yield(); >>> + } >>> + >>> + if (w->busy) { >>> + w->busy = false; >>> + QSIMPLEQ_REMOVE(&rws->busy_workers, w, Qcow2Worker, entry); >>> + } >>> + qcow2_free_worker(w); >>> + rws->online_workers--; >>> + >>> + if (rws->waiting_all && rws->online_workers == 0) { >>> + aio_co_wake(rws->co); >>> + } >>> +} >>> + >>> +static coroutine_fn void qcow2_rws_add_task(Qcow2RWState *rws, >>> + uint64_t >>> file_cluster_offset, >>> + uint64_t offset, >>> + uint64_t bytes, >>> + uint64_t bytes_done) >> I'd propose just taking a const Qcow2WorkerTask * here. (Makes even >> more sense if you make it a union.) > > ok, I'll try this way > >> >>> +{ >>> + Qcow2Worker *w; >>> + >>> + assert(rws->co == qemu_coroutine_self()); >>> + >>> + if (bytes_done == 0 && bytes == rws->bytes) { >>> + Qcow2WorkerTask task = { >>> + .file_cluster_offset = file_cluster_offset, >>> + .offset = offset, >>> + .bytes = bytes, >>> +
Re: [Qemu-devel] [PATCH 5/7] qcow2: refactor qcow2_co_pwritev: split out qcow2_co_do_pwritev
On 01.10.18 17:43, Vladimir Sementsov-Ogievskiy wrote: > 28.09.2018 17:23, Max Reitz wrote: >> On 07.08.18 19:43, Vladimir Sementsov-Ogievskiy wrote: >>> Split out block which will be reused in async scheme. >>> >>> Signed-off-by: Vladimir Sementsov-Ogievskiy >>> --- >>> block/qcow2.c | 138 >>> -- >>> 1 file changed, 86 insertions(+), 52 deletions(-) >>> >>> diff --git a/block/qcow2.c b/block/qcow2.c >>> index a0df8d4e50..4d669432d1 100644 >>> --- a/block/qcow2.c >>> +++ b/block/qcow2.c >>> @@ -2210,6 +2210,85 @@ static bool merge_cow(uint64_t offset, >>> unsigned bytes, >>> return false; >>> } >>> +/* qcow2_co_do_pwritev >>> + * Called without s->lock unlocked >>> + * hd_qiov - temp qiov for any use. It is initialized so it is empty >>> and >>> + * support adding up to qiov->niov + 2 elements >>> + * l2meta - if not NULL, qcow2_co_do_pwritev() will consume it. >>> Caller must not >>> + * use it somehow after qcow2_co_do_pwritev() call >>> + */ >>> +static coroutine_fn int qcow2_co_do_pwritev(BlockDriverState *bs, >>> + uint64_t >>> file_cluster_offset, >>> + uint64_t offset, >>> + uint64_t bytes, >>> + QEMUIOVector *qiov, >>> + uint64_t qiov_offset, >>> + QCowL2Meta *l2meta) >>> +{ >>> + int ret; >>> + BDRVQcow2State *s = bs->opaque; >>> + void *crypt_buf = NULL; >>> + QEMUIOVector hd_qiov; >>> + int offset_in_cluster = offset_into_cluster(s, offset); >>> + >>> + qemu_iovec_reset(&hd_qiov); >> This shouldn't be here. >> >>> + qemu_iovec_init(&hd_qiov, qiov->niov); >>> + >>> + if (bs->encrypted) { >>> + assert(s->crypto); >>> + assert(bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size); >>> + crypt_buf = qemu_try_blockalign(bs->file->bs, bytes); >>> + qemu_iovec_to_buf(qiov, qiov_offset, crypt_buf, bytes); >>> + >>> + if (qcrypto_block_encrypt(s->crypto, >>> + (s->crypt_physical_offset ? >>> + file_cluster_offset + >>> offset_in_cluster : >>> + offset), >>> + crypt_buf, >>> + bytes, NULL) < 0) { >> Same question as in the read case: Can't we make do without the bounce >> buffer? > > I think, we should not modify guest buffers.. Hmm, yes, agreed. O:-) Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH 7/7] qcow2: async scheme for qcow2_co_pwritev
28.09.2018 17:35, Max Reitz wrote: On 07.08.18 19:43, Vladimir Sementsov-Ogievskiy wrote: Start several async requests instead of read chunk by chunk. Iotest 026 output is changed, as because of async io error path has changed. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/qcow2.c | 47 ++ tests/qemu-iotests/026.out | 18 --- tests/qemu-iotests/026.out.nocache | 20 3 files changed, 59 insertions(+), 26 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 88b3fb8080..c7501adfc6 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -1874,6 +1874,7 @@ typedef struct Qcow2WorkerTask { uint64_t offset; uint64_t bytes; uint64_t bytes_done; +QCowL2Meta *l2meta; } Qcow2WorkerTask; typedef int (*Qcow2DoWorkFunc)(BlockDriverState *bs, QEMUIOVector *qiov, @@ -1965,11 +1966,16 @@ static coroutine_fn void qcow2_rw_worker(void *opaque) } } +/* qcow2_rws_add_task +* l2meta - opaque pointer for do_work_func, so behavior depends on +* do_work_func, specified in qcow2_init_rws +*/ I think this would work better if Qcow2WorkerTask was indeed a union. static coroutine_fn void qcow2_rws_add_task(Qcow2RWState *rws, uint64_t file_cluster_offset, uint64_t offset, uint64_t bytes, -uint64_t bytes_done) +uint64_t bytes_done, +QCowL2Meta *l2meta) { Qcow2Worker *w; @@ -1980,7 +1986,8 @@ static coroutine_fn void qcow2_rws_add_task(Qcow2RWState *rws, .file_cluster_offset = file_cluster_offset, .offset = offset, .bytes = bytes, -.bytes_done = bytes_done +.bytes_done = bytes_done, +.l2meta = l2meta }; rws->ret = rws->do_work_func(rws->bs, rws->qiov, &task); return; @@ -2006,6 +2013,7 @@ static coroutine_fn void qcow2_rws_add_task(Qcow2RWState *rws, w->task.offset = offset; w->task.bytes = bytes; w->task.bytes_done = bytes_done; +w->task.l2meta = l2meta; qemu_coroutine_enter(w->co); } @@ -2137,7 +2145,7 @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset, qemu_co_mutex_unlock(&s->lock); qcow2_rws_add_task(&rws, cluster_offset, offset, cur_bytes, - bytes_done); + bytes_done, NULL); if (rws.ret < 0) { ret = rws.ret; goto fail_nolock; @@ -2289,6 +2297,19 @@ fail: return ret; } +static coroutine_fn int qcow2_co_do_pwritev_task(BlockDriverState *bs, + QEMUIOVector *qiov, + Qcow2WorkerTask *task) +{ +return qcow2_co_do_pwritev(bs, + task->file_cluster_offset, + task->offset, + task->bytes, + qiov, + task->bytes_done, + task->l2meta); +} + static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags) @@ -2299,16 +2320,19 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes_done = 0; uint8_t *cluster_data = NULL; QCowL2Meta *l2meta = NULL; +Qcow2RWState rws = {0}; trace_qcow2_writev_start_req(qemu_coroutine_self(), offset, bytes); +qcow2_init_rws(&rws, bs, qiov, bytes, qcow2_co_do_pwritev_task); + qemu_iovec_init(&hd_qiov, qiov->niov); s->cluster_cache_offset = -1; /* disable compressed cache */ qemu_co_mutex_lock(&s->lock); -while (bytes != 0) { +while (bytes != 0 && rws.ret == 0) { int offset_in_cluster = offset_into_cluster(s, offset); unsigned int cur_bytes = MIN(bytes, INT_MAX); /* number of sectors in current iteration */ @@ -2340,11 +2364,11 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset, qemu_co_mutex_unlock(&s->lock); - -ret = qcow2_co_do_pwritev(bs, cluster_offset, offset, cur_bytes, - qiov, bytes_done, l2meta); -l2meta = NULL; /* l2meta is consumed by qcow2_co_do_pwritev() */ -if (ret < 0) { +qcow2_rws_add_task(&rws, cluster_offset, offset, cur_bytes, bytes_done, + l2meta); +l2meta = NULL; /* l2m
Re: [Qemu-devel] [PATCH 5/7] qcow2: refactor qcow2_co_pwritev: split out qcow2_co_do_pwritev
28.09.2018 17:23, Max Reitz wrote: On 07.08.18 19:43, Vladimir Sementsov-Ogievskiy wrote: Split out block which will be reused in async scheme. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/qcow2.c | 138 -- 1 file changed, 86 insertions(+), 52 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index a0df8d4e50..4d669432d1 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -2210,6 +2210,85 @@ static bool merge_cow(uint64_t offset, unsigned bytes, return false; } +/* qcow2_co_do_pwritev + * Called without s->lock unlocked + * hd_qiov - temp qiov for any use. It is initialized so it is empty and + * support adding up to qiov->niov + 2 elements + * l2meta - if not NULL, qcow2_co_do_pwritev() will consume it. Caller must not + * use it somehow after qcow2_co_do_pwritev() call + */ +static coroutine_fn int qcow2_co_do_pwritev(BlockDriverState *bs, +uint64_t file_cluster_offset, +uint64_t offset, +uint64_t bytes, +QEMUIOVector *qiov, +uint64_t qiov_offset, +QCowL2Meta *l2meta) +{ +int ret; +BDRVQcow2State *s = bs->opaque; +void *crypt_buf = NULL; +QEMUIOVector hd_qiov; +int offset_in_cluster = offset_into_cluster(s, offset); + +qemu_iovec_reset(&hd_qiov); This shouldn't be here. +qemu_iovec_init(&hd_qiov, qiov->niov); + +if (bs->encrypted) { +assert(s->crypto); +assert(bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size); +crypt_buf = qemu_try_blockalign(bs->file->bs, bytes); +qemu_iovec_to_buf(qiov, qiov_offset, crypt_buf, bytes); + +if (qcrypto_block_encrypt(s->crypto, + (s->crypt_physical_offset ? + file_cluster_offset + offset_in_cluster : + offset), + crypt_buf, + bytes, NULL) < 0) { Same question as in the read case: Can't we make do without the bounce buffer? I think, we should not modify guest buffers.. +ret = -EIO; +goto fail; +} + +qemu_iovec_add(&hd_qiov, crypt_buf, bytes); +} else { +qemu_iovec_concat(&hd_qiov, qiov, qiov_offset, bytes); +} + +/* If we need to do COW, check if it's possible to merge the + * writing of the guest data together with that of the COW regions. + * If it's not possible (or not necessary) then write the + * guest data now. */ +if (!merge_cow(offset, bytes, &hd_qiov, l2meta)) { +BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO); +trace_qcow2_writev_data(qemu_coroutine_self(), +file_cluster_offset + offset_in_cluster); +ret = bdrv_co_pwritev(bs->file, + file_cluster_offset + offset_in_cluster, + bytes, &hd_qiov, 0); +if (ret < 0) { +qemu_co_mutex_lock(&s->lock); +goto fail; +} +} + +qemu_co_mutex_lock(&s->lock); It looks a bit weird to only do some of the lock handling in this function. I don't quite like it, but as long as you keep it around the qcow2_handle_l2meta(), I won't object. A problem is that currently you don't keep it around qcow2_handle_l2meta(). Before going to the fail label, one has to manually lock the mutex (which you don't do in one of the above error paths, but that's fallout from patch 2). Maybe it would be nicer with a separate label that does the lock. Like: ... ret = qcow2_handle_l2meta(bs, &l2meta, true); goto done_locked; fail_unlocked: qemu_co_mutex_lock(&s->lock) done_locked: qcow2_handle_l2meta(bs, &l2meta, false); ... ok + +ret = qcow2_handle_l2meta(bs, &l2meta, true); +if (ret) { +goto fail; +} + +fail: +qcow2_handle_l2meta(bs, &l2meta, false); +qemu_co_mutex_unlock(&s->lock); + +qemu_vfree(crypt_buf); +qemu_iovec_destroy(&hd_qiov); + +return ret; +} + static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags) @@ -2262,63 +2341,16 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset, qemu_co_mutex_unlock(&s->lock); -qemu_iovec_reset(&hd_qiov); -qemu_iovec_concat(&hd_qiov, qiov, bytes_done, cur_bytes); -if (bs->encrypted) { -assert(s->crypto); -if (!cluster_data) { -cluster_data = qemu_try_blockalign(bs->file->bs, -
Re: [Qemu-devel] [PATCH] test-replication: Lock AioContext around blk_unref()
On 01/10/2018 16:32, Kevin Wolf wrote: > Recently, the test case has started failing because some job related > functions want to drop the AioContext lock even though it hasn't been > taken: > > (gdb) bt > #0 0x7f51c067c9fb in raise () from /lib64/libc.so.6 > #1 0x7f51c067e77d in abort () from /lib64/libc.so.6 > #2 0x558c9d5dde7b in error_exit (err=, > msg=msg@entry=0x558c9d6fe120 <__func__.18373> "qemu_mutex_unlock_impl") at > util/qemu-thread-posix.c:36 > #3 0x558c9d6b5263 in qemu_mutex_unlock_impl > (mutex=mutex@entry=0x558c9f3999a0, file=file@entry=0x558c9d6fd36f > "util/async.c", line=line@entry=516) at util/qemu-thread-posix.c:96 > #4 0x558c9d6b0565 in aio_context_release > (ctx=ctx@entry=0x558c9f399940) at util/async.c:516 > #5 0x558c9d5eb3da in job_completed_txn_abort (job=0x558c9f68e640) at > job.c:738 > #6 0x558c9d5eb227 in job_finish_sync (job=0x558c9f68e640, > finish=finish@entry=0x558c9d5eb8d0 , errp=errp@entry=0x0) at > job.c:986 > #7 0x558c9d5eb8ee in job_cancel_sync (job=) at > job.c:941 > #8 0x558c9d64d853 in replication_close (bs=) at > block/replication.c:148 > #9 0x558c9d5e5c9f in bdrv_close (bs=0x558c9f41b020) at block.c:3420 > #10 bdrv_delete (bs=0x558c9f41b020) at block.c:3629 > #11 bdrv_unref (bs=0x558c9f41b020) at block.c:4685 > #12 0x558c9d62a3f3 in blk_remove_bs (blk=blk@entry=0x558c9f42a7c0) at > block/block-backend.c:783 > #13 0x558c9d62a667 in blk_delete (blk=0x558c9f42a7c0) at > block/block-backend.c:402 > #14 blk_unref (blk=0x558c9f42a7c0) at block/block-backend.c:457 > #15 0x558c9d5dfcea in test_secondary_stop () at > tests/test-replication.c:478 > #16 0x7f51c1f13178 in g_test_run_suite_internal () from > /lib64/libglib-2.0.so.0 > #17 0x7f51c1f1337b in g_test_run_suite_internal () from > /lib64/libglib-2.0.so.0 > #18 0x7f51c1f1337b in g_test_run_suite_internal () from > /lib64/libglib-2.0.so.0 > #19 0x7f51c1f13552 in g_test_run_suite () from /lib64/libglib-2.0.so.0 > #20 0x7f51c1f13571 in g_test_run () from /lib64/libglib-2.0.so.0 > #21 0x558c9d5de31f in main (argc=, argv= out>) at tests/test-replication.c:581 > > It is yet unclear whether this should really be considered a bug in the > test case or whether blk_unref() should work for callers that haven't > taken the AioContext lock, but in order to fix the build tests quickly, > just take the AioContext lock around blk_unref(). Given the backtrace, I think bdrv_close should be taking the AioContext lock instead of blockdev_close_all_bdrv_states. Paolo
Re: [Qemu-devel] [PATCH 3/7] qcow2: split out reading normal clusters from qcow2_co_preadv
On 01.10.18 17:14, Vladimir Sementsov-Ogievskiy wrote: > 27.09.2018 20:35, Max Reitz wrote: >> On 07.08.18 19:43, Vladimir Sementsov-Ogievskiy wrote: >>> Memory allocation may become less efficient for encrypted case. It's a >>> payment for further asynchronous scheme. >>> >>> Signed-off-by: Vladimir Sementsov-Ogievskiy >>> --- >>> block/qcow2.c | 114 >>> -- >>> 1 file changed, 64 insertions(+), 50 deletions(-) >>> >>> diff --git a/block/qcow2.c b/block/qcow2.c >>> index 65e3ca00e2..5e7f2ee318 100644 >>> --- a/block/qcow2.c >>> +++ b/block/qcow2.c >>> @@ -1808,6 +1808,67 @@ out: >>> return ret; >>> } >>> >>> +static coroutine_fn int qcow2_co_preadv_normal(BlockDriverState *bs, >>> + uint64_t >>> file_cluster_offset, >>> + uint64_t offset, >>> + uint64_t bytes, >>> + QEMUIOVector *qiov, >>> + uint64_t qiov_offset) >>> +{ >>> +int ret; >>> +BDRVQcow2State *s = bs->opaque; >>> +void *crypt_buf = NULL; >>> +QEMUIOVector hd_qiov; >>> +int offset_in_cluster = offset_into_cluster(s, offset); >>> + >>> +if ((file_cluster_offset & 511) != 0) { >>> +return -EIO; >>> +} >>> + >>> +qemu_iovec_init(&hd_qiov, qiov->niov); >> So you're not just re-allocating the bounce buffer for every single >> call, but also this I/O vector. Hm. >> >>> +if (bs->encrypted) { >>> +assert(s->crypto); >>> +assert(bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size); >>> + >>> +crypt_buf = qemu_try_blockalign(bs->file->bs, bytes); >> 1. Why did you remove the comment? >> >> 2. The check for whether crypt_buf was successfully allocated is missing. >> >> 3. Do you have any benchmarks for encrypted images? Having to allocate >> the bounce buffer for potentially every single cluster sounds rather bad >> to me. > > Hmm, no excuses. 1- Will fix, 2 - will fix, 3 - will fix or at least > test the performance. > >>> +qemu_iovec_add(&hd_qiov, crypt_buf, bytes); >>> +} else { >>> +qemu_iovec_concat(&hd_qiov, qiov, qiov_offset, bytes); >> qcow2_co_preadv() continues to do this as well. That's superfluous now. > > hd_qiov is local here.. or what do you mean? qcow2_co_preadv() continues to have its own hd_qiov and appends qiov to it (just like here), but it's unused for normal clusters. So it doesn't have to do that for normal clusters. >>> +} >>> + >>> +BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO); >>> +ret = bdrv_co_preadv(bs->file, >>> + file_cluster_offset + offset_in_cluster, >>> + bytes, &hd_qiov, 0); >>> +if (ret < 0) { >>> +goto out; >>> +} >>> + >>> +if (bs->encrypted) { >>> +assert(s->crypto); >>> +assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0); >>> +assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0); >>> +if (qcrypto_block_decrypt(s->crypto, >>> + (s->crypt_physical_offset ? >>> + file_cluster_offset + offset_in_cluster >>> : >>> + offset), >>> + crypt_buf, >>> + bytes, >>> + NULL) < 0) { >> What's the reason against decrypting this in-place in qiov so we could >> save the bounce buffer? We allow offsets in clusters, so why can't we >> just call this function once per involved I/O vector entry? > > Isn't it unsafe to do decryption in guest buffers? I don't know. Why would it be? Because... >> Max >> >>> +ret = -EIO; >>> +goto out; >>> +} >>> +qemu_iovec_from_buf(qiov, qiov_offset, crypt_buf, bytes); ...we're putting the decrypted content there anyway. The only issue I see is that the guest may see encrypted content for a short duration. Hm. Maybe we don't want that. In which case it probably has to stay as it is. Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PULL 0/3] Ui 20181001 patches
On 1 October 2018 at 11:42, Gerd Hoffmann wrote: > The following changes since commit 07f426c35eddd79388a23d11cb278600d7e3831d: > > Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20180926' into > staging (2018-09-28 18:56:09 +0100) > > are available in the git repository at: > > git://git.kraxel.org/qemu tags/ui-20181001-pull-request > > for you to fetch changes up to e8b1386ea1719525a1a92df03377764703fe8c64: > > gtk: add zoom-to-fit to gtk options. (2018-10-01 11:29:03 +0200) > > > ui: some small fixes/improvements. > Applied, thanks. -- PMM
Re: [Qemu-devel] [Qemu-ppc] [PATCH] PPC: e500: convert SysBus init method to a realize method
On Mon, 1 Oct 2018 17:04:13 +0200 Cédric Le Goater wrote: > Signed-off-by: Cédric Le Goater > --- Reviewed-by: Greg Kurz > hw/pci-host/ppce500.c | 14 ++ > 1 file changed, 6 insertions(+), 8 deletions(-) > > diff --git a/hw/pci-host/ppce500.c b/hw/pci-host/ppce500.c > index eb75e080fc10..b8f8c112e662 100644 > --- a/hw/pci-host/ppce500.c > +++ b/hw/pci-host/ppce500.c > @@ -436,8 +436,9 @@ static AddressSpace *e500_pcihost_set_iommu(PCIBus *bus, > void *opaque, > return &s->bm_as; > } > > -static int e500_pcihost_initfn(SysBusDevice *dev) > +static void e500_pcihost_realize(DeviceState *dev, Error **errp) > { > +SysBusDevice *sbd = SYS_BUS_DEVICE(dev); > PCIHostState *h; > PPCE500PCIState *s; > PCIBus *b; > @@ -447,7 +448,7 @@ static int e500_pcihost_initfn(SysBusDevice *dev) > s = PPC_E500_PCI_HOST_BRIDGE(dev); > > for (i = 0; i < ARRAY_SIZE(s->irq); i++) { > -sysbus_init_irq(dev, &s->irq[i]); > +sysbus_init_irq(sbd, &s->irq[i]); > } > > for (i = 0; i < PCI_NUM_PINS; i++) { > @@ -460,7 +461,7 @@ static int e500_pcihost_initfn(SysBusDevice *dev) > /* PIO lives at the bottom of our bus space */ > memory_region_add_subregion_overlap(&s->busmem, 0, &s->pio, -2); > > -b = pci_register_root_bus(DEVICE(dev), NULL, mpc85xx_pci_set_irq, > +b = pci_register_root_bus(dev, NULL, mpc85xx_pci_set_irq, >mpc85xx_pci_map_irq, s, &s->busmem, &s->pio, >PCI_DEVFN(s->first_slot, 0), 4, TYPE_PCI_BUS); > h->bus = b; > @@ -483,10 +484,8 @@ static int e500_pcihost_initfn(SysBusDevice *dev) > memory_region_add_subregion(&s->container, PCIE500_CFGADDR, > &h->conf_mem); > memory_region_add_subregion(&s->container, PCIE500_CFGDATA, > &h->data_mem); > memory_region_add_subregion(&s->container, PCIE500_REG_BASE, &s->iomem); > -sysbus_init_mmio(dev, &s->container); > +sysbus_init_mmio(sbd, &s->container); > pci_bus_set_route_irq_fn(b, e500_route_intx_pin_to_irq); > - > -return 0; > } > > static void e500_host_bridge_class_init(ObjectClass *klass, void *data) > @@ -526,9 +525,8 @@ static Property pcihost_properties[] = { > static void e500_pcihost_class_init(ObjectClass *klass, void *data) > { > DeviceClass *dc = DEVICE_CLASS(klass); > -SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass); > > -k->init = e500_pcihost_initfn; > +dc->realize = e500_pcihost_realize; > set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories); > dc->props = pcihost_properties; > dc->vmsd = &vmstate_ppce500_pci;
Re: [Qemu-devel] [PATCH 4/7] qcow2: async scheme for qcow2_co_preadv
27.09.2018 21:35, Max Reitz wrote: On 07.08.18 19:43, Vladimir Sementsov-Ogievskiy wrote: Start several async requests instead of read chunk by chunk. Signed-off-by: Vladimir Sementsov-Ogievskiy --- block/qcow2.c | 208 -- 1 file changed, 204 insertions(+), 4 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 5e7f2ee318..a0df8d4e50 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -1869,6 +1869,197 @@ out: return ret; } +typedef struct Qcow2WorkerTask { +uint64_t file_cluster_offset; +uint64_t offset; +uint64_t bytes; +uint64_t bytes_done; +} Qcow2WorkerTask; Why don't you make this a union of request-specific structs? ok, will try + +typedef int (*Qcow2DoWorkFunc)(BlockDriverState *bs, QEMUIOVector *qiov, + Qcow2WorkerTask *task); + +typedef struct Qcow2RWState { +BlockDriverState *bs; +QEMUIOVector *qiov; +uint64_t bytes; Maybe make it total_bytes so it doesn't conflict with the value in Qcow2WorkerTask? ok +int ret; +bool waiting_one; +bool waiting_all; +bool finalize; +Coroutine *co; +QSIMPLEQ_HEAD(, Qcow2Worker) free_workers; +QSIMPLEQ_HEAD(, Qcow2Worker) busy_workers; +int online_workers; +Qcow2DoWorkFunc do_work_func; +} Qcow2RWState; + +typedef struct Qcow2Worker { +Qcow2RWState *rws; +Coroutine *co; +Qcow2WorkerTask task; +bool busy; +QSIMPLEQ_ENTRY(Qcow2Worker) entry; +} Qcow2Worker; +#define QCOW2_MAX_WORKERS 64 That's really a bit hidden here. I think it should go into the header. Also I'm not quite sure about the number. In other places we've always used 16. (With the encryption code always allocating a new bounce buffer, this can mean quite a bit of memory usage.) No doubts. + +static coroutine_fn void qcow2_rw_worker(void *opaque); +static Qcow2Worker *qcow2_new_worker(Qcow2RWState *rws) +{ +Qcow2Worker *w = g_new0(Qcow2Worker, 1); +w->rws = rws; +w->co = qemu_coroutine_create(qcow2_rw_worker, w); + +return w; +} + +static void qcow2_free_worker(Qcow2Worker *w) +{ +g_free(w); +} + +static coroutine_fn void qcow2_rw_worker(void *opaque) +{ +Qcow2Worker *w = opaque; +Qcow2RWState *rws = w->rws; + +rws->online_workers++; + +while (!rws->finalize) { +int ret = rws->do_work_func(rws->bs, rws->qiov, &w->task); +if (ret < 0 && rws->ret == 0) { +rws->ret = ret; +} + +if (rws->waiting_all || rws->ret < 0) { +break; +} + +w->busy = false; +QSIMPLEQ_REMOVE(&rws->busy_workers, w, Qcow2Worker, entry); +QSIMPLEQ_INSERT_TAIL(&rws->free_workers, w, entry); +if (rws->waiting_one) { +rws->waiting_one = false; +/* we must unset it here, to prevent queuing rws->co in several + * workers (it may happen if other worker already waits us on mutex, + * so it will be entered after our yield and before rws->co enter) + * + * TODO: rethink this comment, as here (and in other places in the + * file) we moved from qemu_coroutine_add_next to aio_co_wake. + */ +aio_co_wake(rws->co); +} + +qemu_coroutine_yield(); +} + +if (w->busy) { +w->busy = false; +QSIMPLEQ_REMOVE(&rws->busy_workers, w, Qcow2Worker, entry); +} +qcow2_free_worker(w); +rws->online_workers--; + +if (rws->waiting_all && rws->online_workers == 0) { +aio_co_wake(rws->co); +} +} + +static coroutine_fn void qcow2_rws_add_task(Qcow2RWState *rws, +uint64_t file_cluster_offset, +uint64_t offset, +uint64_t bytes, +uint64_t bytes_done) I'd propose just taking a const Qcow2WorkerTask * here. (Makes even more sense if you make it a union.) ok, I'll try this way +{ +Qcow2Worker *w; + +assert(rws->co == qemu_coroutine_self()); + +if (bytes_done == 0 && bytes == rws->bytes) { +Qcow2WorkerTask task = { +.file_cluster_offset = file_cluster_offset, +.offset = offset, +.bytes = bytes, +.bytes_done = bytes_done +}; +rws->ret = rws->do_work_func(rws->bs, rws->qiov, &task); (If so, you'd just pass the pointer along here) +return; +} I like this fast path, but I think it deserves a small comment. (That is a fast path and bypasses the whole worker infrastructure.) + +if (!QSIMPLEQ_EMPTY(&rws->free_workers)) { +w = QSIMPLEQ_FIRST(&rws->free_workers); +QSIMPLEQ_REMOVE_HEAD(&rws->free_workers, entry); +} else if (rws->online_workers < QCOW2_MAX_WORKERS) { +w = qcow2_new_worker(rws); +} else { +rws->waiting_one = true; +qemu_