Re: [Qemu-devel] ideas for improving TLB performance (help with TCG backend wanted)

2018-10-01 Thread Alex Bennée


Emilio G. Cota  writes:

> On Thu, Sep 20, 2018 at 01:19:51 +0100, Alex Bennée wrote:
>> If we are going to have an indirection then we can also drop the
>> requirement to scale the TLB according to the number of MMU indexes we
>> have to support. It's fairly wasteful when a bunch of them are almost
>> never used unless you are running stuff that uses them.
>
> So with dynamic TLB sizing, what you're suggesting here is to resize
> each MMU array independently (depending on their use rate) instead
> of using a single "TLB size" for all MMU indexes. Am I understanding
> your point correctly?

Not quite - I think it would overly complicate the lookup to have a
differently sized TLB lookup for each mmu index - even if their usage
patterns are different.

I just meant that if we already have the cost of an indirection we don't
have to ensure:

CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE];
CPUIOTLBEntry iotlb[NB_MMU_MODES][CPU_TLB_SIZE];

restrict their sizes so any entry in the 2D array can be indexed
directly from env. Currently CPU_TLB_SIZE/CPU_TLB_BITS is restricted by
the number of NB_MMU_MODES we have to support. But if each can be
flushed and managed separately we can have:

CPUTLBEntry *tlb_table[NB_MMU_MODES];

And size CPU_TLB_SIZE for the maximum offset we can mange in the lookup
code. This is mainly driven by the varying
TCG_TARGET_TLB_DISPLACEMENT_BITS each backend has available to it.

>
> Thanks,
>
>   E.


--
Alex Bennée



[Qemu-devel] [PULL 0/1] libfdt queue 20181002

2018-10-01 Thread David Gibson
The following changes since commit a2ef4d9e95400cd387ab4ae19a317741e458fb07:

  Merge remote-tracking branch 'remotes/kraxel/tags/ui-20181001-pull-request' 
into staging (2018-10-01 15:44:30 +0100)

are available in the Git repository at:

  git://github.com/dgibson/qemu.git tags/libfdt-20181002

for you to fetch changes up to 0b001b3094bcf80a5a0c3b7655029a10fb63f661:

  Update dtc/libfdt submodule to v1.4.7 (2018-10-02 13:53:26 +1000)


Update dtc submodule to v1.4.7

We have some upcoming things planned for ppc that will require some
newer libfdt features.  In preparation, update the dtc/libfdt
submodule to upstreasm version v1.4.7.


David Gibson (1):
  Update dtc/libfdt submodule to v1.4.7

 dtc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



[Qemu-devel] [PULL 1/1] Update dtc/libfdt submodule to v1.4.7

2018-10-01 Thread David Gibson
dtc v1.4.7 contains a bunch of improvements to make libfdt safer against
handling a corrupted or malicious tree, which is a good thing to have.  It
also includes an explicit fdt checking function that we'll be wanting in
future.

Signed-off-by: David Gibson 
---
 dtc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dtc b/dtc
index e54388015a..88f18909db 16
--- a/dtc
+++ b/dtc
@@ -1 +1 @@
-Subproject commit e54388015af1fb4bf04d0bca99caba1074d9cc42
+Subproject commit 88f18909db731a627456f26d779445f84e449536
-- 
2.17.1




[Qemu-devel] Have multiple virtio-net devices, but only one of them receives all traffic

2018-10-01 Thread Jintack Lim
Hi,

I'm using QEMU 3.0.0 and Linux kernel 4.15.0 on x86 machines. I'm
observing pretty weird behavior when I have multiple virtio-net
devices. My KVM VM has two virtio-net devices (vhost=off) and I'm
using a Linux bridge in the host. The two devices have different
MAC/IP addresses.

When I tried to access the VM using two different IP addresses (e.g
ping or ssh), I found that only one device in VM gets all incoming
network traffic while I expected that two devices get traffic for
their own IP addresses.

I checked it in the several ways.

1) I did ping with two IP addresses from the host/other physical
machines in the same subnet, and only one device's interrupt count is
increased.
2) I checked the ARP table from the ping sources, and two different IP
addresses have the same MAC address. In fact, I dumped ARP messages
using tcpdump, and the VM (or the bridge?) replied with the same MAC
address for two different IP addresses as attached below.
3) I monitored the host bridge (# bridge monitor) and found that only
one device's MAC address is registered.

It looks like one device's IP/MAC address is not advertised properly,
but I'm not really sure. When I turned off the device getting all the
traffic, then the other device starts getting incoming packets; the
device's MAC address is registered in the host bridge. The active
device only gets traffic for its own IP address, of course.

Here's the tcpdump result. IP 10.10.1.100 and 10.10.1.221 are VM's IP
addresses. IP 10.10.1.221 is assigned to a device having
52:54:00:12:34:58, but the log shows it is advertised as having
...:57.

23:24:10.983700 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
10.10.1.100 tell kvm-dest-link-1, length 46
23:24:10.983771 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.10.1.100
is-at 52:54:00:12:34:57 (oui Unknown), length 28
23:24:17.794811 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has
10.10.1.211 tell kvm-dest-link-1, length 46
23:24:17.794869 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.10.1.211
is-at 52:54:00:12:34:57 (oui Unknown), length 28

I would appreciate any help!

Thanks,
Jintack




Re: [Qemu-devel] [PATCH] PPC: e500: convert SysBus init method to a realize method

2018-10-01 Thread David Gibson
On Mon, Oct 01, 2018 at 05:04:13PM +0200, Cédric Le Goater wrote:
> Signed-off-by: Cédric Le Goater 

Applied to ppc-for-3.1, thanks.

> ---
>  hw/pci-host/ppce500.c | 14 ++
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/pci-host/ppce500.c b/hw/pci-host/ppce500.c
> index eb75e080fc10..b8f8c112e662 100644
> --- a/hw/pci-host/ppce500.c
> +++ b/hw/pci-host/ppce500.c
> @@ -436,8 +436,9 @@ static AddressSpace *e500_pcihost_set_iommu(PCIBus *bus, 
> void *opaque,
>  return &s->bm_as;
>  }
>  
> -static int e500_pcihost_initfn(SysBusDevice *dev)
> +static void e500_pcihost_realize(DeviceState *dev, Error **errp)
>  {
> +SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
>  PCIHostState *h;
>  PPCE500PCIState *s;
>  PCIBus *b;
> @@ -447,7 +448,7 @@ static int e500_pcihost_initfn(SysBusDevice *dev)
>  s = PPC_E500_PCI_HOST_BRIDGE(dev);
>  
>  for (i = 0; i < ARRAY_SIZE(s->irq); i++) {
> -sysbus_init_irq(dev, &s->irq[i]);
> +sysbus_init_irq(sbd, &s->irq[i]);
>  }
>  
>  for (i = 0; i < PCI_NUM_PINS; i++) {
> @@ -460,7 +461,7 @@ static int e500_pcihost_initfn(SysBusDevice *dev)
>  /* PIO lives at the bottom of our bus space */
>  memory_region_add_subregion_overlap(&s->busmem, 0, &s->pio, -2);
>  
> -b = pci_register_root_bus(DEVICE(dev), NULL, mpc85xx_pci_set_irq,
> +b = pci_register_root_bus(dev, NULL, mpc85xx_pci_set_irq,
>mpc85xx_pci_map_irq, s, &s->busmem, &s->pio,
>PCI_DEVFN(s->first_slot, 0), 4, TYPE_PCI_BUS);
>  h->bus = b;
> @@ -483,10 +484,8 @@ static int e500_pcihost_initfn(SysBusDevice *dev)
>  memory_region_add_subregion(&s->container, PCIE500_CFGADDR, 
> &h->conf_mem);
>  memory_region_add_subregion(&s->container, PCIE500_CFGDATA, 
> &h->data_mem);
>  memory_region_add_subregion(&s->container, PCIE500_REG_BASE, &s->iomem);
> -sysbus_init_mmio(dev, &s->container);
> +sysbus_init_mmio(sbd, &s->container);
>  pci_bus_set_route_irq_fn(b, e500_route_intx_pin_to_irq);
> -
> -return 0;
>  }
>  
>  static void e500_host_bridge_class_init(ObjectClass *klass, void *data)
> @@ -526,9 +525,8 @@ static Property pcihost_properties[] = {
>  static void e500_pcihost_class_init(ObjectClass *klass, void *data)
>  {
>  DeviceClass *dc = DEVICE_CLASS(klass);
> -SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
>  
> -k->init = e500_pcihost_initfn;
> +dc->realize = e500_pcihost_realize;
>  set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
>  dc->props = pcihost_properties;
>  dc->vmsd = &vmstate_ppce500_pci;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH 0/2] ppc: convert SysBus init method to a realize method

2018-10-01 Thread David Gibson
On Mon, Oct 01, 2018 at 01:44:19PM +0200, Cédric Le Goater wrote:
> Hello,
> 
> Here are a couple of patches converting the SysBus init method to a
> realize method. make check tested only.

Applied to ppc-for-3.1, thanks.

> 
> Thanks,
> 
> C.
> 
> 
> Cédric Le Goater (2):
>   ppc440_pcix: convert SysBus init method to a realize method
>   ppc4xx_pci: convert SysBus init method to a realize method
> 
>  hw/ppc/ppc440_pcix.c | 14 ++
>  hw/ppc/ppc4xx_pci.c  | 14 ++
>  2 files changed, 12 insertions(+), 16 deletions(-)
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] ideas for improving TLB performance (help with TCG backend wanted)

2018-10-01 Thread Emilio G. Cota
On Mon, Oct 01, 2018 at 15:40:37 -0500, Richard Henderson wrote:
> On 10/1/18 1:34 PM, Emilio G. Cota wrote:
> > On Thu, Sep 20, 2018 at 01:19:51 +0100, Alex Bennée wrote:
> >> If we are going to have an indirection then we can also drop the
> >> requirement to scale the TLB according to the number of MMU indexes we
> >> have to support. It's fairly wasteful when a bunch of them are almost
> >> never used unless you are running stuff that uses them.
> > 
> > So with dynamic TLB sizing, what you're suggesting here is to resize
> > each MMU array independently (depending on their use rate) instead
> > of using a single "TLB size" for all MMU indexes. Am I understanding
> > your point correctly?
> 
> You cannot do that without flushing the TBs (and with out-of-line memory ops,
> the prologue as well) and regenerating.  The TLB size is baked into the code.
> And we really don't have any extra registers free to vary that.

Can you please elaborate on this? I can't see where this is
baked into the generated code, other than the TLB lookup.
Grepping for CPU_TLB_SIZE and CPU_TLB_BITS only shows a few
places.

I have written today a prototype of dynamic TLB flushing. It
uses no extra registers because mmu_idx is known at generation time.
I haven't done any extensive testing yet, but at least it boots
aarch64 and x86_64 guests on an x86_64 host.

The code (some messy WIP commits in there, sorry) is at:
  https://github.com/cota/qemu/tree/tlb2

Please take a look -- am I doing anything horribly wrong there?

Thanks,

Emilio



[Qemu-devel] [Bug 1795527] [NEW] Malformed audio and video output stuttering after upgrade to QEMU 3.0

2018-10-01 Thread tlloss
Public bug reported:

My host is an x86_64 Arch Linux OS with a recompiled 4.18.10 hardened
kernel, running a few KVM guests with varying OSes and configurations
managed through a Libvirt stack.

Among these guests I have two Windows 10 VMs with VGA passthrough and
PulseAudio-backed virtual audio devices.

After upgrading to QEMU 3.0.0, both of the Win10 guests started showing
corrupted audio output in the form of unnatural reproduction speed and
occasional but consistently misplaced audio fragments originating from
what seems to be a circular buffer wrapping over itself (misbehaviour
detected by starting some games with known OSTs and dialogues:
soundtracks sound accelerated and past dialogue lines start replaying
middle-sentence until the next line starts playing).

In addition, the video output of the malfunctioning VMs regularly
stutters roughly twice a second for a fraction of a second (sync'ed with
the suspected buffer wrapping and especially pronounced during not-pre-
rendered cutscenes), toghether with mouse freezes that look like actual
input misses more than simple lack of screen refreshes.


The issue was succesfully reproduced without the managing stack, directly with 
the following command line, on the most capable Windows guest:

 QEMU_AUDIO_DRV=pa
 QEMU_PA_SERVER=127.0.0.1
 /usr/bin/qemu-system-x86_64 -name guest=win10_gms,debug-threads=on \
 -machine pc-i440fx-3.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off \  

 
 -cpu 
host,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,hv_vendor_id=123456789abc,kvm=off
 \  
 -drive 
file=/usr/share/ovmf/x64/OVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on \ 
  
 -drive 
file=/var/lib/libvirt/qemu/nvram/win10_gms_VARS.fd,if=pflash,format=raw,unit=1 \
 -m 5120 \  

 -realtime mlock=off \
 -smp 3,sockets=1,cores=3,threads=1 \
 -uuid 39b56ee2-6bae-4009-9108-7be26d5d63ac \
 -display none \ 
 -no-user-config \
 -nodefaults \
 -rtc base=localtime,driftfix=slew \

 
 -global kvm-pit.lost_tick_policy=delay \   
   
 -no-hpet \  
 -no-shutdown \
 -global PIIX4_PM.disable_s3=1 \
 -global PIIX4_PM.disable_s4=1 \
 -boot strict=on \  
 -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 \
 -device 
ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 \
 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 \
 
 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 \
 -device ahci,id=sata0,bus=pci.0,addr=0x9 \ 
 -drive 
file=/dev/vms/win10_gaming,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native
 \
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on
 \
 -drive 
file=/dev/sr0,format=raw,if=none,id=drive-sata0-0-0,media=cdrom,readonly=on \   
 
 -device ide-cd,bus=sata0.0,drive=drive-sata0-0-0,id=sata0-0-0 \
 
 -device intel-hda,id=sound0,bus=pci.0,addr=0x3 \   

 
 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 \   
  
 -device usb-host,hostbus=2,hostaddr=3,id=hostdev0,bus=usb.0,port=1 \
 -device vfio-pci,host=01:00.0,id=hostdev1,bus=pci.0,addr=0x6 \  
 -device vfio-pci,host=01:00.1,id=hostdev2,bus=pci.0,addr=0x7 \
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 \   
 -sandbox 
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
 -msg timestamp=on


By "purposedly misconfiguring" the codepaths and replacing "pc-i440fx-3.0" with 
"pc-i440fx-2.11" (basically reverting the config changes I needed to do in 
order to update the domain definitions), the stuttering seems to disappear (or 
at least becomes negligible) and the audio output, despite becoming incredibly 
distorted, is consistent in every other way, with in-order dialogues and 
(perceived) correct tempo.


In order to exclude eventual misconfigurations in the host's audio processing 
pipeline, I proceeded to update the domain definition's codepath of another 
guest running Ubuntu 18.04 with a completely different hardware configuration 
(no video card passthrough and no PulseAudio backconnection, just a plain 
emulated VirtIO display and Spice audio device).

The audio issue presented itself again in th

[Qemu-devel] [PATCH 04/15] hw/ssi/xilinx_spi: Use DeviceState::realize rather than SysBusDevice::init

2018-10-01 Thread Philippe Mathieu-Daudé
Move from the legacy SysBusDevice::init method to using DeviceState::realize.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/ssi/xilinx_spi.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/hw/ssi/xilinx_spi.c b/hw/ssi/xilinx_spi.c
index 83585bc8b2..3dae303d5b 100644
--- a/hw/ssi/xilinx_spi.c
+++ b/hw/ssi/xilinx_spi.c
@@ -319,9 +319,9 @@ static const MemoryRegionOps spi_ops = {
 }
 };
 
-static int xilinx_spi_init(SysBusDevice *sbd)
+static void xilinx_spi_realize(DeviceState *dev, Error **errp)
 {
-DeviceState *dev = DEVICE(sbd);
+SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
 XilinxSPI *s = XILINX_SPI(dev);
 int i;
 
@@ -344,8 +344,6 @@ static int xilinx_spi_init(SysBusDevice *sbd)
 
 fifo8_create(&s->tx_fifo, FIFO_CAPACITY);
 fifo8_create(&s->rx_fifo, FIFO_CAPACITY);
-
-return 0;
 }
 
 static const VMStateDescription vmstate_xilinx_spi = {
@@ -368,9 +366,8 @@ static Property xilinx_spi_properties[] = {
 static void xilinx_spi_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 
-k->init = xilinx_spi_init;
+dc->realize = xilinx_spi_realize;
 dc->reset = xlx_spi_reset;
 dc->props = xilinx_spi_properties;
 dc->vmsd = &vmstate_xilinx_spi;
-- 
2.19.0




[Qemu-devel] [PATCH 02/15] hw/timer/sun4v-rtc: Convert from DPRINTF() macro to trace events

2018-10-01 Thread Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/timer/sun4v-rtc.c  | 13 +++--
 hw/timer/trace-events |  4 
 2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/hw/timer/sun4v-rtc.c b/hw/timer/sun4v-rtc.c
index 310523225f..13be94f8da 100644
--- a/hw/timer/sun4v-rtc.c
+++ b/hw/timer/sun4v-rtc.c
@@ -14,15 +14,8 @@
 #include "hw/sysbus.h"
 #include "qemu/timer.h"
 #include "hw/timer/sun4v-rtc.h"
+#include "trace.h"
 
-//#define DEBUG_SUN4V_RTC
-
-#ifdef DEBUG_SUN4V_RTC
-#define DPRINTF(fmt, ...)   \
-do { printf("sun4v_rtc: " fmt , ## __VA_ARGS__); } while (0)
-#else
-#define DPRINTF(fmt, ...) do {} while (0)
-#endif
 
 #define TYPE_SUN4V_RTC "sun4v_rtc"
 #define SUN4V_RTC(obj) OBJECT_CHECK(Sun4vRtc, (obj), TYPE_SUN4V_RTC)
@@ -41,14 +34,14 @@ static uint64_t sun4v_rtc_read(void *opaque, hwaddr addr,
 /* accessing the high 32 bits */
 val >>= 32;
 }
-DPRINTF("read from " TARGET_FMT_plx " val %lx\n", addr, val);
+trace_sun4v_rtc_read(addr, val);
 return val;
 }
 
 static void sun4v_rtc_write(void *opaque, hwaddr addr,
  uint64_t val, unsigned size)
 {
-DPRINTF("write 0x%x to " TARGET_FMT_plx "\n", (unsigned)val, addr);
+trace_sun4v_rtc_read(addr, val);
 }
 
 static const MemoryRegionOps sun4v_rtc_ops = {
diff --git a/hw/timer/trace-events b/hw/timer/trace-events
index ca9ad6321a..75bd3b1042 100644
--- a/hw/timer/trace-events
+++ b/hw/timer/trace-events
@@ -66,5 +66,9 @@ cmsdk_apb_dualtimer_read(uint64_t offset, uint64_t data, 
unsigned size) "CMSDK A
 cmsdk_apb_dualtimer_write(uint64_t offset, uint64_t data, unsigned size) 
"CMSDK APB dualtimer write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u"
 cmsdk_apb_dualtimer_reset(void) "CMSDK APB dualtimer: reset"
 
+# hw/timer/sun4v-rtc.c
+sun4v_rtc_read(uint64_t addr, uint64_t value) "read: addr 0x%" PRIx64 " value 
0x%" PRIx64
+sun4v_rtc_write(uint64_t addr, uint64_t value) "write: addr 0x%" PRIx64 " 
value 0x%" PRIx64
+
 # hw/timer/xlnx-zynqmp-rtc.c
 xlnx_zynqmp_rtc_gettime(int year, int month, int day, int hour, int min, int 
sec) "Get time from host: %d-%d-%d %2d:%02d:%02d"
-- 
2.19.0




[Qemu-devel] [PATCH 05/15] hw/sh4/sh_pci: Use DeviceState::realize rather than SysBusDevice::init

2018-10-01 Thread Philippe Mathieu-Daudé
Move from the legacy SysBusDevice::init method to using DeviceState::realize.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/sh4/sh_pci.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/hw/sh4/sh_pci.c b/hw/sh4/sh_pci.c
index 4ec2e35500..84c52df067 100644
--- a/hw/sh4/sh_pci.c
+++ b/hw/sh4/sh_pci.c
@@ -120,16 +120,15 @@ static void sh_pci_set_irq(void *opaque, int irq_num, int 
level)
 qemu_set_irq(pic[irq_num], level);
 }
 
-static int sh_pci_device_init(SysBusDevice *dev)
+static void sh_pci_device_realize(PCIDevice *dev, Error **errp)
 {
-PCIHostState *phb;
-SHPCIState *s;
+SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
+SHPCIState *s = SH_PCI_HOST_BRIDGE(dev);
+PCIHostState *phb = PCI_HOST_BRIDGE(s);
 int i;
 
-s = SH_PCI_HOST_BRIDGE(dev);
-phb = PCI_HOST_BRIDGE(s);
 for (i = 0; i < 4; i++) {
-sysbus_init_irq(dev, &s->irq[i]);
+sysbus_init_irq(sbd, &s->irq[i]);
 }
 phb->bus = pci_register_root_bus(DEVICE(dev), "pci",
  sh_pci_set_irq, sh_pci_map_irq,
@@ -143,13 +142,12 @@ static int sh_pci_device_init(SysBusDevice *dev)
  &s->memconfig_p4, 0, 0x224);
 memory_region_init_alias(&s->isa, OBJECT(s), "sh_pci.isa",
  get_system_io(), 0, 0x4);
-sysbus_init_mmio(dev, &s->memconfig_p4);
-sysbus_init_mmio(dev, &s->memconfig_a7);
+sysbus_init_mmio(sbd, &s->memconfig_p4);
+sysbus_init_mmio(sbd, &s->memconfig_a7);
 s->iobr = 0xfe24;
 memory_region_add_subregion(get_system_memory(), s->iobr, &s->isa);
 
 s->dev = pci_create_simple(phb->bus, PCI_DEVFN(0, 0), "sh_pci_host");
-return 0;
 }
 
 static void sh_pci_host_realize(PCIDevice *d, Error **errp)
@@ -187,9 +185,9 @@ static const TypeInfo sh_pci_host_info = {
 
 static void sh_pci_device_class_init(ObjectClass *klass, void *data)
 {
-SysBusDeviceClass *sdc = SYS_BUS_DEVICE_CLASS(klass);
+PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
-sdc->init = sh_pci_device_init;
+k->realize = sh_pci_device_realize;
 }
 
 static const TypeInfo sh_pci_device_info = {
-- 
2.19.0




[Qemu-devel] [PATCH 10/15] hw/sparc64/niagara: Replace 'empty_slot' by 'unimplemented_device'

2018-10-01 Thread Philippe Mathieu-Daudé
The TYPE_EMPTY_SLOT and TYPE_UNIMPLEMENTED_DEVICE are identical devices,
however the later use more recent APIs and is more widely used.

Replace 'empty_slot' by 'unimplemented_device' to simplify devices code
maintenance.

Signed-off-by: Philippe Mathieu-Daudé 
---
 default-configs/sparc64-softmmu.mak | 1 -
 hw/sparc64/niagara.c| 4 ++--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/default-configs/sparc64-softmmu.mak 
b/default-configs/sparc64-softmmu.mak
index 52edafe547..ce63d47046 100644
--- a/default-configs/sparc64-softmmu.mak
+++ b/default-configs/sparc64-softmmu.mak
@@ -16,5 +16,4 @@ CONFIG_SIMBA=y
 CONFIG_SUNHME=y
 CONFIG_MC146818RTC=y
 CONFIG_ISA_TESTDEV=y
-CONFIG_EMPTY_SLOT=y
 CONFIG_SUN4V_RTC=y
diff --git a/hw/sparc64/niagara.c b/hw/sparc64/niagara.c
index 4fa8cb2904..f8a856f611 100644
--- a/hw/sparc64/niagara.c
+++ b/hw/sparc64/niagara.c
@@ -29,7 +29,7 @@
 #include "hw/hw.h"
 #include "hw/boards.h"
 #include "hw/char/serial.h"
-#include "hw/empty_slot.h"
+#include "hw/misc/unimp.h"
 #include "hw/loader.h"
 #include "hw/sparc/sparc64.h"
 #include "hw/timer/sun4v-rtc.h"
@@ -161,7 +161,7 @@ static void niagara_init(MachineState *machine)
 serial_mm_init(sysmem, NIAGARA_UART_BASE, 0, NULL, 115200,
serial_hd(0), DEVICE_BIG_ENDIAN);
 }
-empty_slot_init(NIAGARA_IOBBASE, NIAGARA_IOBSIZE);
+create_unimplemented_device("sun4v-iob", NIAGARA_IOBBASE, NIAGARA_IOBSIZE);
 sun4v_rtc_init(NIAGARA_RTC_BASE);
 }
 
-- 
2.19.0




[Qemu-devel] [PATCH 07/15] hw/mips/gt64xxx_pci: Convert gt64120_reset() function into Device reset method

2018-10-01 Thread Philippe Mathieu-Daudé
Convert the gt64120_reset() function into a proper Device reset method.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/mips/gt64xxx_pci.c | 17 +++--
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/hw/mips/gt64xxx_pci.c b/hw/mips/gt64xxx_pci.c
index 24ad0ad024..dcd1a66329 100644
--- a/hw/mips/gt64xxx_pci.c
+++ b/hw/mips/gt64xxx_pci.c
@@ -992,9 +992,9 @@ static void gt64120_pci_set_irq(void *opaque, int irq_num, 
int level)
 }
 
 
-static void gt64120_reset(void *opaque)
+static void gt64120_reset(DeviceState *dev)
 {
-GT64120State *s = opaque;
+GT64120State *s = GT64120_PCI_HOST_BRIDGE(dev);
 
 /* FIXME: Malta specific hw assumptions ahead */
 
@@ -1184,16 +1184,6 @@ PCIBus *gt64120_register(qemu_irq *pic)
 return phb->bus;
 }
 
-static int gt64120_init(SysBusDevice *dev)
-{
-GT64120State *s;
-
-s = GT64120_PCI_HOST_BRIDGE(dev);
-
-qemu_register_reset(gt64120_reset, s);
-return 0;
-}
-
 static void gt64120_pci_realize(PCIDevice *d, Error **errp)
 {
 /* FIXME: Malta specific hw assumptions ahead */
@@ -1241,9 +1231,8 @@ static const TypeInfo gt64120_pci_info = {
 static void gt64120_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
-SysBusDeviceClass *sdc = SYS_BUS_DEVICE_CLASS(klass);
 
-sdc->init = gt64120_init;
+dc->reset = gt64120_reset;
 dc->vmsd = &vmstate_gt64120;
 }
 
-- 
2.19.0




[Qemu-devel] [PATCH 08/15] hw/mips/gt64xxx_pci: Mark as bridge device

2018-10-01 Thread Philippe Mathieu-Daudé
The gt64120 is currently listed as uncategorized device.
Mark it as bridge device.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/mips/gt64xxx_pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/mips/gt64xxx_pci.c b/hw/mips/gt64xxx_pci.c
index dcd1a66329..1cd8aac658 100644
--- a/hw/mips/gt64xxx_pci.c
+++ b/hw/mips/gt64xxx_pci.c
@@ -1232,6 +1232,7 @@ static void gt64120_class_init(ObjectClass *klass, void 
*data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 
+set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
 dc->reset = gt64120_reset;
 dc->vmsd = &vmstate_gt64120;
 }
-- 
2.19.0




[Qemu-devel] [RFC PATCH 13/15] hw/alpha/typhoon: Remove unuseful code

2018-10-01 Thread Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/alpha/typhoon.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/hw/alpha/typhoon.c b/hw/alpha/typhoon.c
index d74b5b55e1..8004afe45b 100644
--- a/hw/alpha/typhoon.c
+++ b/hw/alpha/typhoon.c
@@ -932,23 +932,10 @@ PCIBus *typhoon_init(ram_addr_t ram_size, ISABus 
**isa_bus,
 return b;
 }
 
-static int typhoon_pcihost_init(SysBusDevice *dev)
-{
-return 0;
-}
-
-static void typhoon_pcihost_class_init(ObjectClass *klass, void *data)
-{
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
-
-k->init = typhoon_pcihost_init;
-}
-
 static const TypeInfo typhoon_pcihost_info = {
 .name  = TYPE_TYPHOON_PCI_HOST_BRIDGE,
 .parent= TYPE_PCI_HOST_BRIDGE,
 .instance_size = sizeof(TyphoonState),
-.class_init= typhoon_pcihost_class_init,
 };
 
 static void typhoon_iommu_memory_region_class_init(ObjectClass *klass,
-- 
2.19.0




[Qemu-devel] [PATCH 03/15] hw/timer/sun4v-rtc: Use DeviceState::realize rather than SysBusDevice::init

2018-10-01 Thread Philippe Mathieu-Daudé
Move from the legacy SysBusDevice::init method to using DeviceState::realize.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/timer/sun4v-rtc.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/timer/sun4v-rtc.c b/hw/timer/sun4v-rtc.c
index 13be94f8da..4e7f6a1eff 100644
--- a/hw/timer/sun4v-rtc.c
+++ b/hw/timer/sun4v-rtc.c
@@ -63,21 +63,21 @@ void sun4v_rtc_init(hwaddr addr)
 sysbus_mmio_map(s, 0, addr);
 }
 
-static int sun4v_rtc_init1(SysBusDevice *dev)
+static void sun4v_rtc_realize(DeviceState *dev, Error **errp)
 {
+SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
 Sun4vRtc *s = SUN4V_RTC(dev);
 
 memory_region_init_io(&s->iomem, OBJECT(s), &sun4v_rtc_ops, s,
   "sun4v-rtc", 0x08ULL);
-sysbus_init_mmio(dev, &s->iomem);
-return 0;
+sysbus_init_mmio(sbd, &s->iomem);
 }
 
 static void sun4v_rtc_class_init(ObjectClass *klass, void *data)
 {
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
+DeviceClass *dc = DEVICE_CLASS(klass);
 
-k->init = sun4v_rtc_init1;
+dc->realize = sun4v_rtc_realize;
 }
 
 static const TypeInfo sun4v_rtc_info = {
-- 
2.19.0




[Qemu-devel] [PATCH 09/15] hw/mips/malta: Replace 'empty_slot' by 'unimplemented_device'

2018-10-01 Thread Philippe Mathieu-Daudé
The TYPE_EMPTY_SLOT and TYPE_UNIMPLEMENTED_DEVICE are identical devices,
however the later use more recent APIs and is more widely used.

Replace 'empty_slot' by 'unimplemented_device' to simplify devices code
maintenance.

Signed-off-by: Philippe Mathieu-Daudé 
---
 default-configs/mips-softmmu-common.mak | 1 -
 hw/mips/mips_malta.c| 4 ++--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/default-configs/mips-softmmu-common.mak 
b/default-configs/mips-softmmu-common.mak
index fae2347ee7..f9d664e120 100644
--- a/default-configs/mips-softmmu-common.mak
+++ b/default-configs/mips-softmmu-common.mak
@@ -32,7 +32,6 @@ CONFIG_PFLASH_CFI01=y
 CONFIG_I8259=y
 CONFIG_MC146818RTC=y
 CONFIG_ISA_TESTDEV=y
-CONFIG_EMPTY_SLOT=y
 CONFIG_MIPS_CPS=y
 CONFIG_MIPS_ITU=y
 CONFIG_I2C=y
diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c
index 40041d5ec0..4ccfa87c35 100644
--- a/hw/mips/mips_malta.c
+++ b/hw/mips/mips_malta.c
@@ -53,7 +53,7 @@
 #include "sysemu/qtest.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
-#include "hw/empty_slot.h"
+#include "hw/misc/unimp.h"
 #include "sysemu/kvm.h"
 #include "exec/semihost.h"
 #include "hw/mips/cps.h"
@@ -1216,7 +1216,7 @@ void mips_malta_init(MachineState *machine)
 /* The whole address space decoded by the GT-64120A doesn't generate
exception when accessing invalid memory. Create an empty slot to
emulate this feature. */
-empty_slot_init(0, 0x2000);
+create_unimplemented_device("gt64120-SysAD", 0, 0x2000);
 
 qdev_init_nofail(dev);
 
-- 
2.19.0




[Qemu-devel] [PATCH 11/15] hw/sparc/sun4m: Replace 'empty_slot' by 'unimplemented_device'

2018-10-01 Thread Philippe Mathieu-Daudé
The TYPE_EMPTY_SLOT and TYPE_UNIMPLEMENTED_DEVICE are identical devices,
however the later use more recent APIs and is more widely used.

Replace 'empty_slot' by 'unimplemented_device' to simplify devices code
maintenance.

Signed-off-by: Philippe Mathieu-Daudé 
---
 default-configs/sparc-softmmu.mak |  1 -
 hw/sparc/sun4m.c  | 24 ++--
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/default-configs/sparc-softmmu.mak 
b/default-configs/sparc-softmmu.mak
index 12f97eeb20..7369b54467 100644
--- a/default-configs/sparc-softmmu.mak
+++ b/default-configs/sparc-softmmu.mak
@@ -8,7 +8,6 @@ CONFIG_ESCC=y
 CONFIG_M48T59=y
 CONFIG_PTIMER=y
 CONFIG_FDC=y
-CONFIG_EMPTY_SLOT=y
 CONFIG_PCNET_COMMON=y
 CONFIG_LANCE=y
 CONFIG_TCX=y
diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
index 3c29b68e67..ca37acf7af 100644
--- a/hw/sparc/sun4m.c
+++ b/hw/sparc/sun4m.c
@@ -42,7 +42,7 @@
 #include "hw/nvram/chrp_nvram.h"
 #include "hw/nvram/fw_cfg.h"
 #include "hw/char/escc.h"
-#include "hw/empty_slot.h"
+#include "hw/misc/unimp.h"
 #include "hw/loader.h"
 #include "elf.h"
 #include "trace.h"
@@ -863,7 +863,8 @@ static void sun4m_hw_init(const struct sun4m_hwdef *hwdef,
 ram_init(0, machine->ram_size, hwdef->max_mem);
 /* models without ECC don't trap when missing ram is accessed */
 if (!hwdef->ecc_base) {
-empty_slot_init(machine->ram_size, hwdef->max_mem - machine->ram_size);
+create_unimplemented_device("ecc", machine->ram_size,
+hwdef->max_mem - machine->ram_size);
 }
 
 prom_init(hwdef->slavio_base, bios_name);
@@ -892,9 +893,10 @@ static void sun4m_hw_init(const struct sun4m_hwdef *hwdef,
 if (hwdef->iommu_pad_base) {
 /* On the real hardware (SS-5, LX) the MMU is not padded, but aliased.
Software shouldn't use aliased addresses, neither should it crash
-   when does. Using empty_slot instead of aliasing can help with
-   debugging such accesses */
-empty_slot_init(hwdef->iommu_pad_base,hwdef->iommu_pad_len);
+   when does. Using the 'unimplemented device' instead of aliasing can
+   help with debugging such accesses */
+create_unimplemented_device("iommu.alias", hwdef->iommu_pad_base,
+hwdef->iommu_pad_len);
 }
 
 sparc32_dma_init(hwdef->dma_base,
@@ -944,12 +946,13 @@ static void sun4m_hw_init(const struct sun4m_hwdef *hwdef,
 for (i = num_vsimms; i < MAX_VSIMMS; i++) {
 /* vsimm registers probed by OBP */
 if (hwdef->vsimm[i].reg_base) {
-empty_slot_init(hwdef->vsimm[i].reg_base, 0x2000);
+create_unimplemented_device("vsimm", hwdef->vsimm[i].reg_base,
+0x2000);
 }
 }
 
 if (hwdef->sx_base) {
-empty_slot_init(hwdef->sx_base, 0x2000);
+create_unimplemented_device("sx", hwdef->sx_base, 0x2000);
 }
 
 nvram = m48t59_init(slavio_irq[0], hwdef->nvram_base, 0, 0x2000, 1968, 8);
@@ -1012,14 +1015,15 @@ static void sun4m_hw_init(const struct sun4m_hwdef 
*hwdef,
 if (hwdef->dbri_base) {
 /* ISDN chip with attached CS4215 audio codec */
 /* prom space */
-empty_slot_init(hwdef->dbri_base+0x1000, 0x30);
+create_unimplemented_device("dbri.prom", hwdef->dbri_base + 0x1000,
+0x30);
 /* reg space */
-empty_slot_init(hwdef->dbri_base+0x1, 0x100);
+create_unimplemented_device("dbri", hwdef->dbri_base + 0x1, 0x100);
 }
 
 if (hwdef->bpp_base) {
 /* parallel port */
-empty_slot_init(hwdef->bpp_base, 0x20);
+create_unimplemented_device("bpp", hwdef->bpp_base, 0x20);
 }
 
 kernel_size = sun4m_load_kernel(machine->kernel_filename,
-- 
2.19.0




[Qemu-devel] [PATCH 12/15] hw/core: Remove the 'empty_slot' device

2018-10-01 Thread Philippe Mathieu-Daudé
All previous users of TYPE_EMPTY_SLOT now use TYPE_UNIMPLEMENTED_DEVICE.

Since TYPE_EMPTY_SLOT is no more used/referenced, remove it.

Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/empty_slot.h |   7 ---
 hw/core/empty_slot.c| 103 
 hw/core/Makefile.objs   |   1 -
 3 files changed, 111 deletions(-)
 delete mode 100644 include/hw/empty_slot.h
 delete mode 100644 hw/core/empty_slot.c

diff --git a/include/hw/empty_slot.h b/include/hw/empty_slot.h
deleted file mode 100644
index 123a9f8989..00
--- a/include/hw/empty_slot.h
+++ /dev/null
@@ -1,7 +0,0 @@
-#ifndef HW_EMPTY_SLOT_H
-#define HW_EMPTY_SLOT_H
-
-/* empty_slot.c */
-void empty_slot_init(hwaddr addr, uint64_t slot_size);
-
-#endif
diff --git a/hw/core/empty_slot.c b/hw/core/empty_slot.c
deleted file mode 100644
index c1b9c2b104..00
--- a/hw/core/empty_slot.c
+++ /dev/null
@@ -1,103 +0,0 @@
-/*
- * QEMU Empty Slot
- *
- * The empty_slot device emulates known to a bus but not connected devices.
- *
- * Copyright (c) 2010 Artyom Tarasenko
- *
- * This code is licensed under the GNU GPL v2 or (at your option) any later
- * version.
- */
-
-#include "qemu/osdep.h"
-#include "hw/hw.h"
-#include "hw/sysbus.h"
-#include "hw/empty_slot.h"
-
-//#define DEBUG_EMPTY_SLOT
-
-#ifdef DEBUG_EMPTY_SLOT
-#define DPRINTF(fmt, ...)   \
-do { printf("empty_slot: " fmt , ## __VA_ARGS__); } while (0)
-#else
-#define DPRINTF(fmt, ...) do {} while (0)
-#endif
-
-#define TYPE_EMPTY_SLOT "empty_slot"
-#define EMPTY_SLOT(obj) OBJECT_CHECK(EmptySlot, (obj), TYPE_EMPTY_SLOT)
-
-typedef struct EmptySlot {
-SysBusDevice parent_obj;
-
-MemoryRegion iomem;
-uint64_t size;
-} EmptySlot;
-
-static uint64_t empty_slot_read(void *opaque, hwaddr addr,
-unsigned size)
-{
-DPRINTF("read from " TARGET_FMT_plx "\n", addr);
-return 0;
-}
-
-static void empty_slot_write(void *opaque, hwaddr addr,
- uint64_t val, unsigned size)
-{
-DPRINTF("write 0x%x to " TARGET_FMT_plx "\n", (unsigned)val, addr);
-}
-
-static const MemoryRegionOps empty_slot_ops = {
-.read = empty_slot_read,
-.write = empty_slot_write,
-.endianness = DEVICE_NATIVE_ENDIAN,
-};
-
-void empty_slot_init(hwaddr addr, uint64_t slot_size)
-{
-if (slot_size > 0) {
-/* Only empty slots larger than 0 byte need handling. */
-DeviceState *dev;
-SysBusDevice *s;
-EmptySlot *e;
-
-dev = qdev_create(NULL, TYPE_EMPTY_SLOT);
-s = SYS_BUS_DEVICE(dev);
-e = EMPTY_SLOT(dev);
-e->size = slot_size;
-
-qdev_init_nofail(dev);
-
-sysbus_mmio_map(s, 0, addr);
-}
-}
-
-static int empty_slot_init1(SysBusDevice *dev)
-{
-EmptySlot *s = EMPTY_SLOT(dev);
-
-memory_region_init_io(&s->iomem, OBJECT(s), &empty_slot_ops, s,
-  "empty-slot", s->size);
-sysbus_init_mmio(dev, &s->iomem);
-return 0;
-}
-
-static void empty_slot_class_init(ObjectClass *klass, void *data)
-{
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
-
-k->init = empty_slot_init1;
-}
-
-static const TypeInfo empty_slot_info = {
-.name  = TYPE_EMPTY_SLOT,
-.parent= TYPE_SYS_BUS_DEVICE,
-.instance_size = sizeof(EmptySlot),
-.class_init= empty_slot_class_init,
-};
-
-static void empty_slot_register_types(void)
-{
-type_register_static(&empty_slot_info);
-}
-
-type_init(empty_slot_register_types)
diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
index eb88ca979e..fd75172a21 100644
--- a/hw/core/Makefile.objs
+++ b/hw/core/Makefile.objs
@@ -8,7 +8,6 @@ common-obj-y += irq.o
 common-obj-y += hotplug.o
 common-obj-$(CONFIG_SOFTMMU) += nmi.o
 
-common-obj-$(CONFIG_EMPTY_SLOT) += empty_slot.o
 common-obj-$(CONFIG_XILINX_AXI) += stream.o
 common-obj-$(CONFIG_PTIMER) += ptimer.o
 common-obj-$(CONFIG_SOFTMMU) += sysbus.o
-- 
2.19.0




[Qemu-devel] [PATCH 00/15] another SysBusDevice::init to Device::realize cleanup

2018-10-01 Thread Philippe Mathieu-Daudé
Peter suggested [1] another crusade for this merge window,
then Cédric jumped on his horse [2]. My turn on my dromedary.

- convert few devices to DeviceState::realize,
- kill the empty_slot device,
- remove unuseful class_init() code [RFC, do we want to keep this?]
- few other minor fixes catched while editing

Regards,

Phil.

[1] https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg03605.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg1.html

Philippe Mathieu-Daudé (15):
  trace-events: Fix copy/paste typo
  hw/timer/sun4v-rtc: Convert from DPRINTF() macro to trace events
  hw/timer/sun4v-rtc: Use DeviceState::realize rather than
SysBusDevice::init
  hw/ssi/xilinx_spi: Use DeviceState::realize rather than
SysBusDevice::init
  hw/sh4/sh_pci: Use DeviceState::realize rather than SysBusDevice::init
  hw/pci-host/bonito: Use DeviceState::realize rather than
SysBusDevice::init
  hw/mips/gt64xxx_pci: Convert gt64120_reset() function into Device
reset method
  hw/mips/gt64xxx_pci: Mark as bridge device
  hw/mips/malta: Replace 'empty_slot' by 'unimplemented_device'
  hw/sparc64/niagara: Replace 'empty_slot' by 'unimplemented_device'
  hw/sparc/sun4m: Replace 'empty_slot' by 'unimplemented_device'
  hw/core: Remove the 'empty_slot' device
  hw/alpha/typhoon: Remove unuseful code
  hw/hppa/dino: Remove unuseful code
  hw/mips/malta: Remove unuseful code

 default-configs/mips-softmmu-common.mak |   1 -
 default-configs/sparc-softmmu.mak   |   1 -
 default-configs/sparc64-softmmu.mak |   1 -
 include/hw/empty_slot.h |   7 --
 hw/alpha/typhoon.c  |  13 ---
 hw/core/empty_slot.c| 103 
 hw/hppa/dino.c  |   7 --
 hw/mips/gt64xxx_pci.c   |  18 +
 hw/mips/mips_malta.c|  17 +---
 hw/pci-host/bonito.c|   9 +--
 hw/sh4/sh_pci.c |  20 +++--
 hw/sparc/sun4m.c|  24 +++---
 hw/sparc64/niagara.c|   4 +-
 hw/ssi/xilinx_spi.c |   9 +--
 hw/timer/sun4v-rtc.c|  23 ++
 hw/core/Makefile.objs   |   1 -
 hw/timer/trace-events   |   6 +-
 17 files changed, 50 insertions(+), 214 deletions(-)
 delete mode 100644 include/hw/empty_slot.h
 delete mode 100644 hw/core/empty_slot.c

-- 
2.19.0




[Qemu-devel] [RFC PATCH 15/15] hw/mips/malta: Remove unuseful code

2018-10-01 Thread Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/mips/mips_malta.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/hw/mips/mips_malta.c b/hw/mips/mips_malta.c
index 4ccfa87c35..b6633fa141 100644
--- a/hw/mips/mips_malta.c
+++ b/hw/mips/mips_malta.c
@@ -1422,23 +1422,10 @@ void mips_malta_init(MachineState *machine)
 pci_vga_init(pci_bus);
 }
 
-static int mips_malta_sysbus_device_init(SysBusDevice *sysbusdev)
-{
-return 0;
-}
-
-static void mips_malta_class_init(ObjectClass *klass, void *data)
-{
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
-
-k->init = mips_malta_sysbus_device_init;
-}
-
 static const TypeInfo mips_malta_device = {
 .name  = TYPE_MIPS_MALTA,
 .parent= TYPE_SYS_BUS_DEVICE,
 .instance_size = sizeof(MaltaState),
-.class_init= mips_malta_class_init,
 };
 
 static void mips_malta_machine_init(MachineClass *mc)
-- 
2.19.0




[Qemu-devel] [PATCH 01/15] trace-events: Fix copy/paste typo

2018-10-01 Thread Philippe Mathieu-Daudé
Missed while reviewing 5dd85b4b486.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/timer/trace-events | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/timer/trace-events b/hw/timer/trace-events
index fa4213df5b..ca9ad6321a 100644
--- a/hw/timer/trace-events
+++ b/hw/timer/trace-events
@@ -56,7 +56,7 @@ systick_timer_tick(void) "systick reload"
 systick_read(uint64_t addr, uint32_t value, unsigned size) "systick read addr 
0x%" PRIx64 " data 0x%" PRIx32 " size %u"
 systick_write(uint64_t addr, uint32_t value, unsigned size) "systick write 
addr 0x%" PRIx64 " data 0x%" PRIx32 " size %u"
 
-# hw/char/cmsdk_apb_timer.c
+# hw/timer/cmsdk_apb_timer.c
 cmsdk_apb_timer_read(uint64_t offset, uint64_t data, unsigned size) "CMSDK APB 
timer read: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u"
 cmsdk_apb_timer_write(uint64_t offset, uint64_t data, unsigned size) "CMSDK 
APB timer write: offset 0x%" PRIx64 " data 0x%" PRIx64 " size %u"
 cmsdk_apb_timer_reset(void) "CMSDK APB timer: reset"
-- 
2.19.0




[Qemu-devel] [RFC PATCH 14/15] hw/hppa/dino: Remove unuseful code

2018-10-01 Thread Philippe Mathieu-Daudé
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/hppa/dino.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/hw/hppa/dino.c b/hw/hppa/dino.c
index 564b938e3a..31e09942b5 100644
--- a/hw/hppa/dino.c
+++ b/hw/hppa/dino.c
@@ -488,17 +488,10 @@ PCIBus *dino_init(MemoryRegion *addr_space,
 return b;
 }
 
-static int dino_pcihost_init(SysBusDevice *dev)
-{
-return 0;
-}
-
 static void dino_pcihost_class_init(ObjectClass *klass, void *data)
 {
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
 DeviceClass *dc = DEVICE_CLASS(klass);
 
-k->init = dino_pcihost_init;
 dc->vmsd = &vmstate_dino;
 }
 
-- 
2.19.0




[Qemu-devel] [PATCH 06/15] hw/pci-host/bonito: Use DeviceState::realize rather than SysBusDevice::init

2018-10-01 Thread Philippe Mathieu-Daudé
Move from the legacy SysBusDevice::init method to using DeviceState::realize.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/pci-host/bonito.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/hw/pci-host/bonito.c b/hw/pci-host/bonito.c
index 9868e2eccc..03d1ec33e3 100644
--- a/hw/pci-host/bonito.c
+++ b/hw/pci-host/bonito.c
@@ -595,7 +595,7 @@ static const VMStateDescription vmstate_bonito = {
 }
 };
 
-static int bonito_pcihost_initfn(SysBusDevice *dev)
+static void bonito_pcihost_realize(PCIDevice *dev, Error **errp)
 {
 PCIHostState *phb = PCI_HOST_BRIDGE(dev);
 
@@ -603,8 +603,6 @@ static int bonito_pcihost_initfn(SysBusDevice *dev)
  pci_bonito_set_irq, pci_bonito_map_irq,
  dev, get_system_memory(), get_system_io(),
  0x28, 32, TYPE_PCI_BUS);
-
-return 0;
 }
 
 static void bonito_realize(PCIDevice *dev, Error **errp)
@@ -684,7 +682,6 @@ PCIBus *bonito_init(qemu_irq *pic)
 pcihost->pic = pic;
 qdev_init_nofail(dev);
 
-/* set the pcihost pointer before bonito_initfn is called */
 d = pci_create(phb->bus, PCI_DEVFN(0, 0), TYPE_PCI_BONITO);
 s = PCI_BONITO(d);
 s->pcihost = pcihost;
@@ -726,9 +723,9 @@ static const TypeInfo bonito_info = {
 
 static void bonito_pcihost_class_init(ObjectClass *klass, void *data)
 {
-SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
+PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
-k->init = bonito_pcihost_initfn;
+k->realize = bonito_pcihost_realize;
 }
 
 static const TypeInfo bonito_pcihost_info = {
-- 
2.19.0




Re: [Qemu-devel] ideas for improving TLB performance (help with TCG backend wanted)

2018-10-01 Thread Richard Henderson
On 10/1/18 1:34 PM, Emilio G. Cota wrote:
> On Thu, Sep 20, 2018 at 01:19:51 +0100, Alex Bennée wrote:
>> If we are going to have an indirection then we can also drop the
>> requirement to scale the TLB according to the number of MMU indexes we
>> have to support. It's fairly wasteful when a bunch of them are almost
>> never used unless you are running stuff that uses them.
> 
> So with dynamic TLB sizing, what you're suggesting here is to resize
> each MMU array independently (depending on their use rate) instead
> of using a single "TLB size" for all MMU indexes. Am I understanding
> your point correctly?

You cannot do that without flushing the TBs (and with out-of-line memory ops,
the prologue as well) and regenerating.  The TLB size is baked into the code.
And we really don't have any extra registers free to vary that.


r~



[Qemu-devel] [PATCH] lsi: Reselection needed to remove pending commands from queue

2018-10-01 Thread George Kennedy
Under heavy IO (e.g. fio) the queue is not checked frequently enough for
pending commands. As a result some pending commands are timed out by the
linux sym53c8xx driver, which sends SCSI Abort messages for the timed out
commands. The SCSI Abort messages result in linux errors, which show up
in /var/log/messages.

e.g.
sd 0:0:3:0: [sdd] tag#33 ABORT operation started
scsi target0:0:3: control msgout:
80 20 47 d
sd 0:0:3:0: ABORT operation complete.
scsi target0:0:4: message d sent on bad reselection

Add a deadline along with the command when it is added to the queue.
When the current command completes, check the queue for pending commands
that have exceeded the deadline and if so, simulate a Wait Reselect
to handle the pending commands on the queue.

When a Wait Reselect is needed, intercept and save the current DMA Scripts
Ptr (DSP) contents and load it instead with the pointer to the Reselection
Scripts.  When Reselection has completed, restore the original DSP contents.

Signed-off-by: George Kennedy 
---
 hw/scsi/lsi53c895a.c | 94 ++--
 1 file changed, 91 insertions(+), 3 deletions(-)

diff --git a/hw/scsi/lsi53c895a.c b/hw/scsi/lsi53c895a.c
index 996b406..1e38ceb 100644
--- a/hw/scsi/lsi53c895a.c
+++ b/hw/scsi/lsi53c895a.c
@@ -198,6 +198,7 @@ typedef struct lsi_request {
 uint32_t dma_len;
 uint8_t *dma_buf;
 uint32_t pending;
+uint64_t deadline;
 int out;
 QTAILQ_ENTRY(lsi_request) next;
 } lsi_request;
@@ -232,6 +233,9 @@ typedef struct {
 int command_complete;
 QTAILQ_HEAD(, lsi_request) queue;
 lsi_request *current;
+int want_resel; /* need resel to handle queued completed cmds */
+uint32_t resel_dsp; /* DMA Scripts Ptr (DSP) of reselection scsi scripts */
+uint32_t next_dsp;  /* if want_resel, will be loaded with above */
 
 uint32_t dsa;
 uint32_t temp;
@@ -310,6 +314,44 @@ static inline int lsi_irq_on_rsl(LSIState *s)
 {
 return (s->sien0 & LSI_SIST0_RSL) && (s->scid & LSI_SCID_RRE);
 }
+#ifdef DEBUG_LSI
+static void showq(LSIState *s)
+{
+lsi_request *p;
+int count = 0;
+
+QTAILQ_FOREACH(p, &s->queue, next) {
+DPRINTF("%d: tag=%x, pending=%x, deadline=%lx\n",
+count++, p->tag, p->pending, p->deadline);
+}
+}
+
+static int get_pending_count(LSIState *s)
+{
+lsi_request *p;
+int count = 0;
+
+QTAILQ_FOREACH(p, &s->queue, next) {
+if (p->pending) {
+count++;
+}
+}
+return count;
+}
+#endif
+static int pending_past_deadline(LSIState *s)
+{
+lsi_request *p;
+
+QTAILQ_FOREACH(p, &s->queue, next) {
+if (p->pending) {
+if (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) > p->deadline) {
+return 1;
+}
+}
+}
+return 0;
+}
 
 static void lsi_soft_reset(LSIState *s)
 {
@@ -634,15 +676,22 @@ static void lsi_do_dma(LSIState *s, int out)
 }
 }
 
+/* Max time a completed command can be on the queue before Reselection needed 
*/
+#define LSI_DEADLINE1000
 
 /* Add a command to the queue.  */
 static void lsi_queue_command(LSIState *s)
 {
 lsi_request *p = s->current;
+uint64_t timeout_ms = LSI_DEADLINE;
 
 DPRINTF("Queueing tag=0x%x\n", p->tag);
 assert(s->current != NULL);
 assert(s->current->dma_len == 0);
+
+p->deadline = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
+timeout_ms * 100ULL;
+
 QTAILQ_INSERT_TAIL(&s->queue, s->current, next);
 s->current = NULL;
 
@@ -668,6 +717,8 @@ static void lsi_reselect(LSIState *s, lsi_request *p)
 
 assert(s->current == NULL);
 QTAILQ_REMOVE(&s->queue, p, next);
+DPRINTF("lsi_reselect: p->tag=%x, p->pending=%x, p->deadline=%lx\n",
+p->tag, p->pending, p->deadline);
 s->current = p;
 
 id = (p->tag >> 8) & 0xf;
@@ -775,6 +826,10 @@ static void lsi_command_complete(SCSIRequest *req, 
uint32_t status, size_t resid
 lsi_request_free(s, s->current);
 scsi_req_unref(req);
 }
+if (pending_past_deadline(s)) {
+DPRINTF("lsi_command_complete: want_resel = 1\n");
+s->want_resel = 1;
+}
 lsi_resume_script(s);
 }
 
@@ -987,7 +1042,7 @@ static void lsi_do_msgout(LSIState *s)
 s->select_tag |= lsi_get_msgbyte(s) | LSI_TAG_VALID;
 break;
 case 0x22: /* ORDERED queue */
-BADF("ORDERED queue not implemented\n");
+DPRINTF("ORDERED queue not implemented\n");
 s->select_tag |= lsi_get_msgbyte(s) | LSI_TAG_VALID;
 break;
 case 0x0d:
@@ -1075,8 +1130,17 @@ static void lsi_memcpy(LSIState *s, uint32_t dest, 
uint32_t src, int count)
 static void lsi_wait_reselect(LSIState *s)
 {
 lsi_request *p;
+#ifdef DEBUG_LSI
+int pending_count = get_pending_count(s);
 
-DPRINTF("Wait Reselect\n");
+if (pending_count) {
+DPRINTF("Wait Reselect, s->current=%p, pending_count=%d\n",
+s->current, pending_count);
+

Re: [Qemu-devel] [PATCH v4 3/3] tests: Add unit tests for image locking

2018-10-01 Thread Max Reitz
On 21.08.18 02:58, Fam Zheng wrote:
> Signed-off-by: Fam Zheng 
> ---
>  tests/Makefile.include |   2 +
>  tests/test-image-locking.c | 154 +
>  2 files changed, 156 insertions(+)
>  create mode 100644 tests/test-image-locking.c
> 
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index 760a0f18b6..8cc0595b39 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -95,6 +95,7 @@ check-unit-y += tests/test-bdrv-drain$(EXESUF)
>  check-unit-y += tests/test-blockjob$(EXESUF)
>  check-unit-y += tests/test-blockjob-txn$(EXESUF)
>  check-unit-y += tests/test-block-backend$(EXESUF)
> +check-unit-y += tests/test-image-locking$(EXESUF)
>  check-unit-y += tests/test-x86-cpuid$(EXESUF)
>  # all code tested by test-x86-cpuid is inside topology.h
>  gcov-files-test-x86-cpuid-y =
> @@ -640,6 +641,7 @@ tests/test-bdrv-drain$(EXESUF): tests/test-bdrv-drain.o 
> $(test-block-obj-y) $(te
>  tests/test-blockjob$(EXESUF): tests/test-blockjob.o $(test-block-obj-y) 
> $(test-util-obj-y)
>  tests/test-blockjob-txn$(EXESUF): tests/test-blockjob-txn.o 
> $(test-block-obj-y) $(test-util-obj-y)
>  tests/test-block-backend$(EXESUF): tests/test-block-backend.o 
> $(test-block-obj-y) $(test-util-obj-y)
> +tests/test-image-locking$(EXESUF): tests/test-image-locking.o 
> $(test-block-obj-y) $(test-util-obj-y)
>  tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(test-block-obj-y)
>  tests/test-iov$(EXESUF): tests/test-iov.o $(test-util-obj-y)
>  tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o $(test-util-obj-y) 
> $(test-crypto-obj-y)
> diff --git a/tests/test-image-locking.c b/tests/test-image-locking.c
> new file mode 100644
> index 00..ac1a5fd4ca
> --- /dev/null
> +++ b/tests/test-image-locking.c
> @@ -0,0 +1,154 @@
> +/*
> + * Image locking tests
> + *
> + * Copyright (c) 2018 Red Hat Inc.
> + *
> + * Author: Fam Zheng 
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a 
> copy
> + * of this software and associated documentation files (the "Software"), to 
> deal
> + * in the Software without restriction, including without limitation the 
> rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "block/block.h"
> +#include "sysemu/block-backend.h"
> +#include "qapi/error.h"
> +#include "qapi/qmp/qdict.h"
> +
> +static BlockBackend *open_image(const char *path,
> +uint64_t perm, uint64_t shared_perm,
> +Error **errp)
> +{
> +Error *local_err = NULL;
> +BlockBackend *blk;
> +QDict *options = qdict_new();
> +
> +qdict_put_str(options, "driver", "raw");
> +blk = blk_new_open(path, NULL, options, BDRV_O_RDWR, &local_err);
> +if (blk) {
> +g_assert_null(local_err);
> +if (blk_set_perm(blk, perm, shared_perm, errp)) {
> +blk_unref(blk);
> +blk = NULL;
> +}
> +} else {
> +error_propagate(errp, local_err);
> +}
> +return blk;
> +}
> +
> +static void check_locked_bytes(int fd, uint64_t perm_locks,
> +   uint64_t shared_perm_locks)
> +{
> +int i;
> +
> +if (!perm_locks && !shared_perm_locks) {
> +g_assert(!qemu_lock_fd_test(fd, 0, 0, true));
> +return;
> +}
> +for (i = 0; (1ULL << i) <= BLK_PERM_ALL; i++) {
> +uint64_t bit = (1ULL << i);
> +bool perm_expected = !!(bit & perm_locks);
> +bool shared_perm_expected = !!(bit & shared_perm_locks);
> +g_assert_cmpint(perm_expected, ==,
> +!!qemu_lock_fd_test(fd, 100 + i, 1, true));
> +g_assert_cmpint(shared_perm_expected, ==,
> +!!qemu_lock_fd_test(fd, 200 + i, 1, true));
> +}
> +}
> +
> +static void test_image_locking_basic(void)
> +{
> +BlockBackend *blk1, *blk2, *blk3;
> +char img_path[] = "/tmp/qtest.XX";
> +uint64_t perm, shared_perm;
> +
> +int fd = mkstemp(img_path);
> +assert(fd >= 0);

Don't you want to remove these temporary files after the test?

> +
> +perm = BLK_PERM_WRITE | B

[Qemu-devel] [PATCH v5 2/9] x86_iommu: move vtd_generate_msi_message in common file

2018-10-01 Thread Singh, Brijesh
The vtd_generate_msi_message() in intel-iommu is used to construct a MSI
Message from IRQ. A similar function will be needed when we add interrupt
remapping support in amd-iommu. Moving the function in common file to
avoid the code duplication. Rename it to x86_iommu_irq_to_msi_message().
There is no logic changes in the code flow.

Signed-off-by: Brijesh Singh 
Suggested-by: Peter Xu 
Reviewed-by: Eduardo Habkost 
Cc: Peter Xu 
Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
---
 hw/i386/intel_iommu.c | 32 +++--
 hw/i386/x86-iommu.c   | 24 
 include/hw/i386/intel_iommu.h | 59 --
 include/hw/i386/x86-iommu.h   | 66 +++
 4 files changed, 94 insertions(+), 87 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 84dbc20..014418b 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2701,7 +2701,7 @@ static int vtd_irte_get(IntelIOMMUState *iommu, uint16_t 
index,
 
 /* Fetch IRQ information of specific IR index */
 static int vtd_remap_irq_get(IntelIOMMUState *iommu, uint16_t index,
- VTDIrq *irq, uint16_t sid)
+ X86IOMMUIrq *irq, uint16_t sid)
 {
 VTD_IR_TableEntry irte = {};
 int ret = 0;
@@ -2730,30 +2730,6 @@ static int vtd_remap_irq_get(IntelIOMMUState *iommu, 
uint16_t index,
 return 0;
 }
 
-/* Generate one MSI message from VTDIrq info */
-static void vtd_generate_msi_message(VTDIrq *irq, MSIMessage *msg_out)
-{
-VTD_MSIMessage msg = {};
-
-/* Generate address bits */
-msg.dest_mode = irq->dest_mode;
-msg.redir_hint = irq->redir_hint;
-msg.dest = irq->dest;
-msg.__addr_hi = irq->dest & 0xff00;
-msg.__addr_head = cpu_to_le32(0xfee);
-/* Keep this from original MSI address bits */
-msg.__not_used = irq->msi_addr_last_bits;
-
-/* Generate data bits */
-msg.vector = irq->vector;
-msg.delivery_mode = irq->delivery_mode;
-msg.level = 1;
-msg.trigger_mode = irq->trigger_mode;
-
-msg_out->address = msg.msi_addr;
-msg_out->data = msg.msi_data;
-}
-
 /* Interrupt remapping for MSI/MSI-X entry */
 static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu,
MSIMessage *origin,
@@ -2763,7 +2739,7 @@ static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu,
 int ret = 0;
 VTD_IR_MSIAddress addr;
 uint16_t index;
-VTDIrq irq = {};
+X86IOMMUIrq irq = {};
 
 assert(origin && translated);
 
@@ -2842,8 +2818,8 @@ static int vtd_interrupt_remap_msi(IntelIOMMUState *iommu,
  */
 irq.msi_addr_last_bits = addr.addr.__not_care;
 
-/* Translate VTDIrq to MSI message */
-vtd_generate_msi_message(&irq, translated);
+/* Translate X86IOMMUIrq to MSI message */
+x86_iommu_irq_to_msi_message(&irq, translated);
 
 out:
 trace_vtd_ir_remap_msi(origin->address, origin->data,
diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index 7440cb8..abc3c03 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -53,6 +53,30 @@ void x86_iommu_iec_notify_all(X86IOMMUState *iommu, bool 
global,
 }
 }
 
+/* Generate one MSI message from VTDIrq info */
+void x86_iommu_irq_to_msi_message(X86IOMMUIrq *irq, MSIMessage *msg_out)
+{
+X86IOMMU_MSIMessage msg = {};
+
+/* Generate address bits */
+msg.dest_mode = irq->dest_mode;
+msg.redir_hint = irq->redir_hint;
+msg.dest = irq->dest;
+msg.__addr_hi = irq->dest & 0xff00;
+msg.__addr_head = cpu_to_le32(0xfee);
+/* Keep this from original MSI address bits */
+msg.__not_used = irq->msi_addr_last_bits;
+
+/* Generate data bits */
+msg.vector = irq->vector;
+msg.delivery_mode = irq->delivery_mode;
+msg.level = 1;
+msg.trigger_mode = irq->trigger_mode;
+
+msg_out->address = msg.msi_addr;
+msg_out->data = msg.msi_data;
+}
+
 /* Default X86 IOMMU device */
 static X86IOMMUState *x86_iommu_default = NULL;
 
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index fbfedcb..ed4e758 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -66,8 +66,6 @@ typedef struct VTDIOTLBEntry VTDIOTLBEntry;
 typedef struct VTDBus VTDBus;
 typedef union VTD_IR_TableEntry VTD_IR_TableEntry;
 typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress;
-typedef struct VTDIrq VTDIrq;
-typedef struct VTD_MSIMessage VTD_MSIMessage;
 
 /* Context-Entry */
 struct VTDContextEntry {
@@ -197,63 +195,6 @@ union VTD_IR_MSIAddress {
 uint32_t data;
 };
 
-/* Generic IRQ entry information */
-struct VTDIrq {
-/* Used by both IOAPIC/MSI interrupt remapping */
-uint8_t trigger_mode;
-uint8_t vector;
-uint8_t delivery_mode;
-uint32_t dest;
-uint8_t dest_mode;
-
-/* only used by MSI interrupt remapping */
-

[Qemu-devel] [PATCH v5 9/9] x86_iommu/amd: Enable Guest virtual APIC support

2018-10-01 Thread Singh, Brijesh
Now that amd-iommu support interrupt remapping, enable the GASup in IVRS
table and GASup in extended feature register to indicate that IOMMU
support guest virtual APIC mode. GASup provides option to guest OS to
make use of 128-bit IRTE.

Note that the GAMSup is set to zero to indicate that amd-iommu does not
support guest virtual APIC mode (aka AVIC) which would be used for the
nested VMs.

See Table 21 from IOMMU spec for interrupt virtualization controls

Signed-off-by: Brijesh Singh 
Reviewed-by: Peter Xu 
Cc: Peter Xu 
Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
---
 hw/i386/acpi-build.c | 3 ++-
 hw/i386/amd_iommu.h  | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 1ef396d..236a20e 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2518,7 +2518,8 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
 build_append_int_noprefix(table_data,
  (48UL << 30) | /* HATS   */
  (48UL << 28) | /* GATS   */
- (1UL << 2),/* GTSup  */
+ (1UL << 2)   | /* GTSup  */
+ (1UL << 6),/* GASup  */
  4);
 /*
  *   Type 1 device entry reporting all devices
diff --git a/hw/i386/amd_iommu.h b/hw/i386/amd_iommu.h
index 8061e9c..687fcd8 100644
--- a/hw/i386/amd_iommu.h
+++ b/hw/i386/amd_iommu.h
@@ -176,7 +176,7 @@
 /* extended feature support */
 #define AMDVI_EXT_FEATURES (AMDVI_FEATURE_PREFETCH | AMDVI_FEATURE_PPR | \
 AMDVI_FEATURE_IA | AMDVI_FEATURE_GT | AMDVI_FEATURE_HE | \
-AMDVI_GATS_MODE | AMDVI_HATS_MODE)
+AMDVI_GATS_MODE | AMDVI_HATS_MODE | AMDVI_FEATURE_GA)
 
 /* capabilities header */
 #define AMDVI_CAPAB_FEATURES (AMDVI_CAPAB_FLAT_EXT | \
-- 
2.7.4




[Qemu-devel] [Bug 1701449] Re: high memory usage when using rbd with client caching

2018-10-01 Thread Jason Dillaman
@Nick: if you can recreate the librbd memory growth, any chance you can
help test a potential fix [1]?

[1] https://github.com/ceph/ceph/pull/24297

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1701449

Title:
  high memory usage when using rbd with client caching

Status in QEMU:
  New

Bug description:
  Hi,
  we are experiencing a quite high memory usage of a single qemu (used with 
KVM) process when using RBD with client caching as a disk backend. We are 
testing with 3GB memory qemu virtual machines and 128MB RBD client cache. When 
running 'fio' in the virtual machine you can see that after some time the 
machine uses a lot more memory (RSS) on the hypervisor than she should. We have 
seen values (in real production machines, no artificially fio tests) of 250% 
memory overhead. I reproduced this with qemu version 2.9 as well.

  Here the contents of our ceph.conf on the hypervisor:
  """
  [client]
  rbd cache writethrough until flush = False
  rbd cache max dirty = 100663296
  rbd cache size = 134217728
  rbd cache target dirty = 50331648
  """

  How to reproduce:
  * create a virtual machine with a RBD backed disk (100GB or so)
  * install a linux distribution on it (we are using Ubuntu)
  * install fio (apt-get install fio)
  * run fio multiple times with (e.g.) the following test file:
  """
  # This job file tries to mimic the Intel IOMeter File Server Access Pattern
  [global]
  description=Emulation of Intel IOmeter File Server Access Pattern
  randrepeat=0
  filename=/root/test.dat
  # IOMeter defines the server loads as the following:
  # iodepth=1 Linear
  # iodepth=4 Very Light
  # iodepth=8 Light
  # iodepth=64Moderate
  # iodepth=256   Heavy
  iodepth=8
  size=80g
  direct=0
  ioengine=libaio

  [iometer]
  stonewall
  bs=4M
  rw=randrw

  [iometer_just_write]
  stonewall
  bs=4M
  rw=write

  [iometer_just_read]
  stonewall
  bs=4M
  rw=read
  """

  You can measure the virtual machine RSS usage on the hypervisor with:
virsh dommemstat  | grep rss
  or if you are not using libvirt:
grep RSS /proc//status

  When switching off the RBD client cache, all is ok again, as the
  process does not use so much memory anymore.

  There is already a ticket on the ceph bug tracker for this ([1]).
  However I can reproduce that memory behaviour only when using qemu
  (maybe it is using librbd in a special way?). Running directly 'fio'
  with the rbd engine does not result in that high memory usage.

  [1] http://tracker.ceph.com/issues/20054

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1701449/+subscriptions



[Qemu-devel] [PATCH v5 1/9] x86_iommu: move the kernel-irqchip check in common code

2018-10-01 Thread Singh, Brijesh
Interrupt remapping needs kernel-irqchip={off|split} on both Intel and AMD
platforms. Move the check in common place.

Signed-off-by: Brijesh Singh 
Reviewed-by: Peter Xu 
Cc: Peter Xu 
Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
---
 hw/i386/intel_iommu.c | 7 ---
 hw/i386/x86-iommu.c   | 9 +
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 3dfada1..84dbc20 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3248,13 +3248,6 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error 
**errp)
 {
 X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
 
-/* Currently Intel IOMMU IR only support "kernel-irqchip={off|split}" */
-if (x86_iommu->intr_supported && kvm_irqchip_in_kernel() &&
-!kvm_irqchip_is_split()) {
-error_setg(errp, "Intel Interrupt Remapping cannot work with "
- "kernel-irqchip=on, please use 'split|off'.");
-return false;
-}
 if (s->intr_eim == ON_OFF_AUTO_ON && !x86_iommu->intr_supported) {
 error_setg(errp, "eim=on cannot be selected without intremap=on");
 return false;
diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index 8a01a2d..7440cb8 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -25,6 +25,7 @@
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "trace.h"
+#include "sysemu/kvm.h"
 
 void x86_iommu_iec_register_notifier(X86IOMMUState *iommu,
  iec_notify_fn fn, void *data)
@@ -94,6 +95,14 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
 return;
 }
 
+/* Both Intel and AMD IOMMU IR only support "kernel-irqchip={off|split}" */
+if (x86_iommu->intr_supported && kvm_irqchip_in_kernel() &&
+!kvm_irqchip_is_split()) {
+error_setg(errp, "Interrupt Remapping cannot work with "
+ "kernel-irqchip=on, please use 'split|off'.");
+return;
+}
+
 if (x86_class->realize) {
 x86_class->realize(dev, errp);
 }
-- 
2.7.4




[Qemu-devel] [PATCH v5 7/9] i386: acpi: add IVHD device entry for IOAPIC

2018-10-01 Thread Singh, Brijesh
When interrupt remapping is enabled, add a special IVHD device
(type IOAPIC).

Signed-off-by: Brijesh Singh 
Acked-by: Peter Xu 
Cc: Peter Xu 
Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
---
 hw/i386/acpi-build.c | 28 +++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 1599caa..1ef396d 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2467,9 +2467,12 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
  *   IVRS table as specified in AMD IOMMU Specification v2.62, Section 5.2
  *   accessible here http://support.amd.com/TechDocs/48882_IOMMU.pdf
  */
+#define IOAPIC_SB_DEVID   (uint64_t)PCI_BUILD_BDF(0, PCI_DEVFN(0x14, 0))
+
 static void
 build_amd_iommu(GArray *table_data, BIOSLinker *linker)
 {
+int ivhd_table_len = 28;
 int iommu_start = table_data->len;
 AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default());
 
@@ -2491,8 +2494,16 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
  (1UL << 6) | /* PrefSup  */
  (1UL << 7),  /* PPRSup   */
  1);
+
+/*
+ * When interrupt remapping is supported, we add a special IVHD device
+ * for type IO-APIC.
+ */
+if (x86_iommu_get_default()->intr_supported) {
+ivhd_table_len += 8;
+}
 /* IVHD length */
-build_append_int_noprefix(table_data, 28, 2);
+build_append_int_noprefix(table_data, ivhd_table_len, 2);
 /* DeviceID */
 build_append_int_noprefix(table_data, s->devid, 2);
 /* Capability offset */
@@ -2516,6 +2527,21 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
  */
 build_append_int_noprefix(table_data, 0x001, 4);
 
+/*
+ * Add a special IVHD device type.
+ * Refer to spec - Table 95: IVHD device entry type codes
+ *
+ * Linux IOMMU driver checks for the special IVHD device (type IO-APIC).
+ * See Linux kernel commit 'c2ff5cf5294bcbd7fa50f7d860e90a66db7e5059'
+ */
+if (x86_iommu_get_default()->intr_supported) {
+build_append_int_noprefix(table_data,
+ (0x1ull << 56) |   /* type IOAPIC */
+ (IOAPIC_SB_DEVID << 40) |  /* IOAPIC devid */
+ 0x48,  /* special device 
*/
+ 8);
+}
+
 build_header(linker, table_data, (void *)(table_data->data + iommu_start),
  "IVRS", table_data->len - iommu_start, 1, NULL, NULL);
 }
-- 
2.7.4




[Qemu-devel] [PATCH v5 6/9] x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled

2018-10-01 Thread Singh, Brijesh
Emulate the interrupt remapping support when guest virtual APIC is
not enabled.

For more info Refer: AMD IOMMU spec Rev 3.0 - section 2.2.5.1

When VAPIC is not enabled, it uses interrupt remapping as defined in
Table 20 and Figure 15 from IOMMU spec.

Signed-off-by: Brijesh Singh 
Cc: Peter Xu 
Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
---
 hw/i386/amd_iommu.c  | 199 ++-
 hw/i386/amd_iommu.h  |  46 +++-
 hw/i386/trace-events |   7 ++
 3 files changed, 250 insertions(+), 2 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 9118a75..8e2f13c 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -28,6 +28,7 @@
 #include "qemu/error-report.h"
 #include "hw/i386/apic_internal.h"
 #include "trace.h"
+#include "hw/i386/apic-msidef.h"
 
 /* used AMD-Vi MMIO registers */
 const char *amdvi_mmio_low[] = {
@@ -1032,21 +1033,146 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion 
*iommu, hwaddr addr,
 return ret;
 }
 
+static int amdvi_get_irte(AMDVIState *s, MSIMessage *origin, uint64_t *dte,
+  union irte *irte, uint16_t devid)
+{
+uint64_t irte_root, offset;
+
+irte_root = dte[2] & AMDVI_IR_PHYS_ADDR_MASK;
+offset = (origin->data & AMDVI_IRTE_OFFSET) << 2;
+
+trace_amdvi_ir_irte(irte_root, offset);
+
+if (dma_memory_read(&address_space_memory, irte_root + offset,
+irte, sizeof(*irte))) {
+trace_amdvi_ir_err("failed to get irte");
+return -AMDVI_IR_GET_IRTE;
+}
+
+trace_amdvi_ir_irte_val(irte->val);
+
+return 0;
+}
+
+static int amdvi_int_remap_legacy(AMDVIState *iommu,
+  MSIMessage *origin,
+  MSIMessage *translated,
+  uint64_t *dte,
+  X86IOMMUIrq *irq,
+  uint16_t sid)
+{
+int ret;
+union irte irte;
+
+/* get interrupt remapping table */
+ret = amdvi_get_irte(iommu, origin, dte, &irte, sid);
+if (ret < 0) {
+return ret;
+}
+
+if (!irte.fields.valid) {
+trace_amdvi_ir_target_abort("RemapEn is disabled");
+return -AMDVI_IR_TARGET_ABORT;
+}
+
+if (irte.fields.guest_mode) {
+error_report_once("guest mode is not zero");
+return -AMDVI_IR_ERR;
+}
+
+if (irte.fields.int_type > AMDVI_IOAPIC_INT_TYPE_ARBITRATED) {
+error_report_once("reserved int_type");
+return -AMDVI_IR_ERR;
+}
+
+irq->delivery_mode = irte.fields.int_type;
+irq->vector = irte.fields.vector;
+irq->dest_mode = irte.fields.dm;
+irq->redir_hint = irte.fields.rq_eoi;
+irq->dest = irte.fields.destination;
+
+return 0;
+}
+
+static int __amdvi_int_remap_msi(AMDVIState *iommu,
+ MSIMessage *origin,
+ MSIMessage *translated,
+ uint64_t *dte,
+ X86IOMMUIrq *irq,
+ uint16_t sid)
+{
+uint8_t int_ctl;
+
+int_ctl = (dte[2] >> AMDVI_IR_INTCTL_SHIFT) & 3;
+trace_amdvi_ir_intctl(int_ctl);
+
+switch (int_ctl) {
+case AMDVI_IR_INTCTL_PASS:
+memcpy(translated, origin, sizeof(*origin));
+return 0;
+case AMDVI_IR_INTCTL_REMAP:
+break;
+case AMDVI_IR_INTCTL_ABORT:
+trace_amdvi_ir_target_abort("int_ctl abort");
+return -AMDVI_IR_TARGET_ABORT;
+default:
+trace_amdvi_ir_err("int_ctl reserved");
+return -AMDVI_IR_ERR;
+}
+
+return amdvi_int_remap_legacy(iommu, origin, translated, dte, irq, sid);
+}
+
 /* Interrupt remapping for MSI/MSI-X entry */
 static int amdvi_int_remap_msi(AMDVIState *iommu,
MSIMessage *origin,
MSIMessage *translated,
uint16_t sid)
 {
+int ret = 0;
+uint64_t pass = 0;
+uint64_t dte[4] = { 0 };
+X86IOMMUIrq irq = { 0 };
+uint8_t dest_mode, delivery_mode;
+
 assert(origin && translated);
 
+/*
+ * When IOMMU is enabled, interrupt remap request will come either from
+ * IO-APIC or PCI device. If interrupt is from PCI device then it will
+ * have a valid requester id but if the interrupt is from IO-APIC
+ * then requester id will be invalid.
+ */
+if (sid == X86_IOMMU_SID_INVALID) {
+sid = AMDVI_IOAPIC_SB_DEVID;
+}
+
 trace_amdvi_ir_remap_msi_req(origin->address, origin->data, sid);
 
-if (!iommu || !X86_IOMMU_DEVICE(iommu)->intr_supported) {
+/* check if device table entry is set before we go further. */
+if (!iommu || !iommu->devtab_len) {
 memcpy(translated, origin, sizeof(*origin));
 goto out;
 }
 
+if (!amdvi_get_dte(iomm

[Qemu-devel] [PATCH v5 4/9] x86_iommu/amd: make the address space naming consistent with intel-iommu

2018-10-01 Thread Singh, Brijesh
To be consistent with intel-iommu:

- rename the address space to use '_' instead of '-'
- update the memory region relationships

Signed-off-by: Brijesh Singh 
Reviewed-by: Peter Xu 
Cc: Peter Xu 
Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
---
 hw/i386/amd_iommu.c | 34 +++---
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 7206bb0..4bec1c6 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -55,6 +55,7 @@ struct AMDVIAddressSpace {
 uint8_t bus_num;/* bus number   */
 uint8_t devfn;  /* device function  */
 AMDVIState *iommu_state;/* AMDVI - one per machine  */
+MemoryRegion root;  /* AMDVI Root memory map region */
 IOMMUMemoryRegion iommu;/* Device's address translation region  */
 MemoryRegion iommu_ir;  /* Device's interrupt remapping region  */
 AddressSpace as;/* device's corresponding address space */
@@ -1032,8 +1033,9 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion 
*iommu, hwaddr addr,
 
 static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 {
+char name[128];
 AMDVIState *s = opaque;
-AMDVIAddressSpace **iommu_as;
+AMDVIAddressSpace **iommu_as, *amdvi_dev_as;
 int bus_num = pci_bus_num(bus);
 
 iommu_as = s->address_spaces[bus_num];
@@ -1046,19 +1048,37 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, 
void *opaque, int devfn)
 
 /* set up AMD-Vi region */
 if (!iommu_as[devfn]) {
+snprintf(name, sizeof(name), "amd_iommu_devfn_%d", devfn);
+
 iommu_as[devfn] = g_malloc0(sizeof(AMDVIAddressSpace));
 iommu_as[devfn]->bus_num = (uint8_t)bus_num;
 iommu_as[devfn]->devfn = (uint8_t)devfn;
 iommu_as[devfn]->iommu_state = s;
 
-memory_region_init_iommu(&iommu_as[devfn]->iommu,
- sizeof(iommu_as[devfn]->iommu),
+amdvi_dev_as = iommu_as[devfn];
+
+/*
+ * Memory region relationships looks like (Address range shows
+ * only lower 32 bits to make it short in length...):
+ *
+ * |-+---+--|
+ * | Name| Address range | Priority |
+ * |-+---+--+
+ * | amdvi_root  | - |0 |
+ * |  amdvi_iommu| - |1 |
+ * |-+---+--|
+ */
+memory_region_init_iommu(&amdvi_dev_as->iommu,
+ sizeof(amdvi_dev_as->iommu),
  TYPE_AMD_IOMMU_MEMORY_REGION,
  OBJECT(s),
- "amd-iommu", UINT64_MAX);
-address_space_init(&iommu_as[devfn]->as,
-   MEMORY_REGION(&iommu_as[devfn]->iommu),
-   "amd-iommu");
+ "amd_iommu", UINT64_MAX);
+memory_region_init(&amdvi_dev_as->root, OBJECT(s),
+   "amdvi_root", UINT64_MAX);
+address_space_init(&amdvi_dev_as->as, &amdvi_dev_as->root, name);
+memory_region_add_subregion_overlap(&amdvi_dev_as->root, 0,
+
MEMORY_REGION(&amdvi_dev_as->iommu),
+1);
 }
 return &iommu_as[devfn]->as;
 }
-- 
2.7.4




[Qemu-devel] [PATCH v5 0/9] x86_iommu/amd: add interrupt remap support

2018-10-01 Thread Singh, Brijesh
This series adds the interrupt remapping support for amd-iommu device.

IOMMU spec is available at: https://support.amd.com/TechDocs/48882_IOMMU.pdf

To enable the interrupt remap use below qemu cli
# $QEMU \
  -device amd-iommu,intremap=on

I have tested FC-28 and Ubuntu 18.04 guest. 

Linux guest bootup log shows the interrupt remap supports:

[root@localhost ~]# dmesg | grep -i AMD-Vi
[0.001761] AMD-Vi: Using IVHD type 0x10
[0.003051] AMD-Vi: device: 00:03.0 cap: 0040 seg: 0 flags: d1 info 
[0.004007] AMD-Vi:mmio-addr: fed8
[0.004874] AMD-Vi:   DEV_ALLflags: 00
[0.006236] AMD-Vi:   DEV_SPECIAL(IOAPIC[0]) devid: 00:14.0
[0.667943] AMD-Vi: Found IOMMU at :00:03.0 cap 0x40
[0.668727] AMD-Vi: Extended features (0x29d3):
[0.669874] AMD-Vi: Interrupt remapping enabled
[0.671074] AMD-Vi: Lazy IO/TLB flushing enabled

cat /proc/interrupts confirms that its using IR

[root@localhost ~]# cat /proc/interrupts 
CPU0   
 0: 40  IR-IO-APIC2-edge  timer
 1:  9  IR-IO-APIC1-edge  i8042
 4:   1770  IR-IO-APIC4-edge  ttyS0
 7:  0  IR-IO-APIC7-edge  parport0
 8:  1  IR-IO-APIC8-edge  rtc0
 9:  0  IR-IO-APIC9-fasteoi   acpi
12: 15  IR-IO-APIC   12-edge  i8042
16:  0  IR-IO-APIC   16-fasteoi   i801_smbus
24:  0   PCI-MSI 49152-edge  AMD-Vi
25:  13070  IR-PCI-MSI 512000-edge  ahci[:00:1f.2]
26: 86  IR-PCI-MSI 32768-edge  enp0s2-rx-0
27:139  IR-PCI-MSI 32769-edge  enp0s2-tx-0
28:  1  IR-PCI-MSI 32770-edge  enp0s2
NMI:  0   Non-maskable interrupts
LOC:  26686   Local timer interrupts
SPU:  0   Spurious interrupts
...
...

Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
Cc: Peter Xu 

Changes since v4:
- Based on Peter's comment use error_report_once in patch 5 and
  tracepoint in Patch 6

Changes since v3:
- do not treat errors as a passthrough (where applicable)
- use error_report_once when applicable

Changes since v2:
- make the address space rename a separate patch
- fix the V=1 check in patch 2
- add more comments in the patches
- use amdvi_target_abort trace point where applicable
- use error_report_once where applicable
- do not cause exit() when configuration mismatch is detected,
  but log the error so that user knows about it.

Changes since v1:
- move vtd_generate_msi_message to common code
- fix the dest_mode bit extraction
- add more comments explaining why we add the special device
- some minor cleanups based on Peter's feedbacks
Brijesh Singh (9):
  x86_iommu: move the kernel-irqchip check in common code
  x86_iommu: move vtd_generate_msi_message in common file
  x86_iommu/amd: remove V=1 check from amdvi_validate_dte()
  x86_iommu/amd: make the address space naming consistent with
intel-iommu
  x86_iommu/amd: Prepare for interrupt remap support
  x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled
  i386: acpi: add IVHD device entry for IOAPIC
  x86_iommu/amd: Add interrupt remap support when VAPIC is enabled
  x86_iommu/amd: Enable Guest virtual APIC support

 hw/i386/acpi-build.c  |  31 +++-
 hw/i386/amd_iommu.c   | 414 +-
 hw/i386/amd_iommu.h   |  96 +-
 hw/i386/intel_iommu.c |  39 +---
 hw/i386/trace-events  |  14 ++
 hw/i386/x86-iommu.c   |  33 
 include/hw/i386/intel_iommu.h |  59 --
 include/hw/i386/x86-iommu.h   |  66 +++
 8 files changed, 643 insertions(+), 109 deletions(-)

-- 
2.7.4




[Qemu-devel] [PATCH v5 5/9] x86_iommu/amd: Prepare for interrupt remap support

2018-10-01 Thread Singh, Brijesh
Register the interrupt remapping callback and read/write ops for the
amd-iommu-ir memory region.

amd-iommu-ir is set to higher priority to ensure that this region won't
be masked out by other memory regions.

Signed-off-by: Brijesh Singh 
Cc: Peter Xu 
Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
---
 hw/i386/amd_iommu.c  | 106 +++
 hw/i386/amd_iommu.h  |  14 ++-
 hw/i386/trace-events |   5 +++
 3 files changed, 123 insertions(+), 2 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 4bec1c6..9118a75 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -26,6 +26,7 @@
 #include "amd_iommu.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
+#include "hw/i386/apic_internal.h"
 #include "trace.h"
 
 /* used AMD-Vi MMIO registers */
@@ -1031,6 +1032,99 @@ static IOMMUTLBEntry amdvi_translate(IOMMUMemoryRegion 
*iommu, hwaddr addr,
 return ret;
 }
 
+/* Interrupt remapping for MSI/MSI-X entry */
+static int amdvi_int_remap_msi(AMDVIState *iommu,
+   MSIMessage *origin,
+   MSIMessage *translated,
+   uint16_t sid)
+{
+assert(origin && translated);
+
+trace_amdvi_ir_remap_msi_req(origin->address, origin->data, sid);
+
+if (!iommu || !X86_IOMMU_DEVICE(iommu)->intr_supported) {
+memcpy(translated, origin, sizeof(*origin));
+goto out;
+}
+
+if (origin->address & AMDVI_MSI_ADDR_HI_MASK) {
+trace_amdvi_err("MSI address high 32 bits non-zero when "
+"Interrupt Remapping enabled.");
+return -AMDVI_IR_ERR;
+}
+
+if ((origin->address & AMDVI_MSI_ADDR_LO_MASK) != APIC_DEFAULT_ADDRESS) {
+trace_amdvi_err("MSI is not from IOAPIC.");
+return -AMDVI_IR_ERR;
+}
+
+out:
+trace_amdvi_ir_remap_msi(origin->address, origin->data,
+ translated->address, translated->data);
+return 0;
+}
+
+static int amdvi_int_remap(X86IOMMUState *iommu,
+   MSIMessage *origin,
+   MSIMessage *translated,
+   uint16_t sid)
+{
+return amdvi_int_remap_msi(AMD_IOMMU_DEVICE(iommu), origin,
+   translated, sid);
+}
+
+static MemTxResult amdvi_mem_ir_write(void *opaque, hwaddr addr,
+  uint64_t value, unsigned size,
+  MemTxAttrs attrs)
+{
+int ret;
+MSIMessage from = { 0, 0 }, to = { 0, 0 };
+uint16_t sid = AMDVI_IOAPIC_SB_DEVID;
+
+from.address = (uint64_t) addr + AMDVI_INT_ADDR_FIRST;
+from.data = (uint32_t) value;
+
+trace_amdvi_mem_ir_write_req(addr, value, size);
+
+if (!attrs.unspecified) {
+/* We have explicit Source ID */
+sid = attrs.requester_id;
+}
+
+ret = amdvi_int_remap_msi(opaque, &from, &to, sid);
+if (ret < 0) {
+/* TODO: log the event using IOMMU log event interface */
+error_report_once("failed to remap interrupt from devid 0x%x", sid);
+return MEMTX_ERROR;
+}
+
+apic_get_class()->send_msi(&to);
+
+trace_amdvi_mem_ir_write(to.address, to.data);
+return MEMTX_OK;
+}
+
+static MemTxResult amdvi_mem_ir_read(void *opaque, hwaddr addr,
+ uint64_t *data, unsigned size,
+ MemTxAttrs attrs)
+{
+return MEMTX_OK;
+}
+
+static const MemoryRegionOps amdvi_ir_ops = {
+.read_with_attrs = amdvi_mem_ir_read,
+.write_with_attrs = amdvi_mem_ir_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.impl = {
+.min_access_size = 4,
+.max_access_size = 4,
+},
+.valid = {
+.min_access_size = 4,
+.max_access_size = 4,
+}
+};
+
 static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
 {
 char name[128];
@@ -1066,6 +1160,7 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, 
void *opaque, int devfn)
  * |-+---+--+
  * | amdvi_root  | - |0 |
  * |  amdvi_iommu| - |1 |
+ * |  amdvi_iommu_ir | fee0-feef |   64 |
  * |-+---+--|
  */
 memory_region_init_iommu(&amdvi_dev_as->iommu,
@@ -1076,6 +1171,13 @@ static AddressSpace *amdvi_host_dma_iommu(PCIBus *bus, 
void *opaque, int devfn)
 memory_region_init(&amdvi_dev_as->root, OBJECT(s),
"amdvi_root", UINT64_MAX);
 address_space_init(&amdvi_dev_as->as, &amdvi_dev_as->root, name);
+memory_region_init_io(&amdvi_dev_as->iommu_ir, OBJECT(s),
+  &amdvi_ir_ops, s, "amd_iommu_ir",
+   

[Qemu-devel] [PATCH v5 3/9] x86_iommu/amd: remove V=1 check from amdvi_validate_dte()

2018-10-01 Thread Singh, Brijesh
Currently, the amdvi_validate_dte() assumes that a valid DTE will
always have V=1. This is not true. The V=1 means that bit[127:1] are
valid. A valid DTE can have IV=1 and V=0 (i.e address translation
disabled and interrupt remapping enabled)

Remove the V=1 check from amdvi_validate_dte(), make the caller
responsible to check for V or IV bits.

This also fixes a bug in existing code that when error is
detected during the translation we'll fail the translation
instead of assuming a passthrough mode.

Signed-off-by: Brijesh Singh 
Reviewed-by: Peter Xu 
Cc: Peter Xu 
Cc: "Michael S. Tsirkin" 
Cc: Paolo Bonzini 
Cc: Richard Henderson 
Cc: Eduardo Habkost 
Cc: Marcel Apfelbaum 
Cc: Tom Lendacky 
Cc: Suravee Suthikulpanit 
---
 hw/i386/amd_iommu.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/i386/amd_iommu.c b/hw/i386/amd_iommu.c
index 1fd669f..7206bb0 100644
--- a/hw/i386/amd_iommu.c
+++ b/hw/i386/amd_iommu.c
@@ -807,7 +807,7 @@ static inline uint64_t amdvi_get_perms(uint64_t entry)
AMDVI_DEV_PERM_SHIFT;
 }
 
-/* a valid entry should have V = 1 and reserved bits honoured */
+/* validate that reserved bits are honoured */
 static bool amdvi_validate_dte(AMDVIState *s, uint16_t devid,
uint64_t *dte)
 {
@@ -820,7 +820,7 @@ static bool amdvi_validate_dte(AMDVIState *s, uint16_t 
devid,
 return false;
 }
 
-return dte[0] & AMDVI_DEV_VALID;
+return true;
 }
 
 /* get a device table entry given the devid */
@@ -966,8 +966,12 @@ static void amdvi_do_translate(AMDVIAddressSpace *as, 
hwaddr addr,
 return;
 }
 
-/* devices with V = 0 are not translated */
 if (!amdvi_get_dte(s, devid, entry)) {
+return;
+}
+
+/* devices with V = 0 are not translated */
+if (!(entry[0] & AMDVI_DEV_VALID)) {
 goto out;
 }
 
-- 
2.7.4




Re: [Qemu-devel] [PATCH v4 2/3] file-posix: Drop s->lock_fd

2018-10-01 Thread Max Reitz
On 21.08.18 02:58, Fam Zheng wrote:
> The lock_fd field is not strictly necessary because transferring locked
> bytes from old fd to the new one shouldn't fail anyway. This spares the
> user one fd per image.
> 
> Signed-off-by: Fam Zheng 
> ---
>  block/file-posix.c | 37 +
>  1 file changed, 13 insertions(+), 24 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v4 1/3] file-posix: Skip effectiveless OFD lock operations

2018-10-01 Thread Max Reitz
On 21.08.18 02:58, Fam Zheng wrote:
> If we know we've already locked the bytes, don't do it again; similarly
> don't unlock a byte if we haven't locked it. This doesn't change the
> behavior, but fixes a corner case explained below.
> 
> Libvirt had an error handling bug that an image can get its (ownership,
> file mode, SELinux) permissions changed (RHBZ 1584982) by mistake behind
> QEMU. Specifically, an image in use by Libvirt VM has:
> 
> $ ls -lhZ b.img
> -rw-r--r--. qemu qemu system_u:object_r:svirt_image_t:s0:c600,c690 b.img
> 
> Trying to attach it a second time won't work because of image locking.
> And after the error, it becomes:
> 
> $ ls -lhZ b.img
> -rw-r--r--. root root system_u:object_r:virt_image_t:s0 b.img
> 
> Then, we won't be able to do OFD lock operations with the existing fd.
> In other words, the code such as in blk_detach_dev:
> 
> blk_set_perm(blk, 0, BLK_PERM_ALL, &error_abort);
> 
> can abort() QEMU, out of environmental changes.
> 
> This patch is an easy fix to this and the change is regardlessly
> reasonable, so do it.
> 
> Signed-off-by: Fam Zheng 
> 
> ---
> 
> v3: Don't misuse s->perm and s->shared_perm.
> v2: For s == NULL, unlock all bits. [Kevin]
> ---
>  block/file-posix.c | 54 +-
>  1 file changed, 44 insertions(+), 10 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [Bug 1701449] Re: high memory usage when using rbd with client caching

2018-10-01 Thread Andreas Hasenack
Any reason we are keeping this bug and #1674481 separate? We are not
sure?

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1701449

Title:
  high memory usage when using rbd with client caching

Status in QEMU:
  New

Bug description:
  Hi,
  we are experiencing a quite high memory usage of a single qemu (used with 
KVM) process when using RBD with client caching as a disk backend. We are 
testing with 3GB memory qemu virtual machines and 128MB RBD client cache. When 
running 'fio' in the virtual machine you can see that after some time the 
machine uses a lot more memory (RSS) on the hypervisor than she should. We have 
seen values (in real production machines, no artificially fio tests) of 250% 
memory overhead. I reproduced this with qemu version 2.9 as well.

  Here the contents of our ceph.conf on the hypervisor:
  """
  [client]
  rbd cache writethrough until flush = False
  rbd cache max dirty = 100663296
  rbd cache size = 134217728
  rbd cache target dirty = 50331648
  """

  How to reproduce:
  * create a virtual machine with a RBD backed disk (100GB or so)
  * install a linux distribution on it (we are using Ubuntu)
  * install fio (apt-get install fio)
  * run fio multiple times with (e.g.) the following test file:
  """
  # This job file tries to mimic the Intel IOMeter File Server Access Pattern
  [global]
  description=Emulation of Intel IOmeter File Server Access Pattern
  randrepeat=0
  filename=/root/test.dat
  # IOMeter defines the server loads as the following:
  # iodepth=1 Linear
  # iodepth=4 Very Light
  # iodepth=8 Light
  # iodepth=64Moderate
  # iodepth=256   Heavy
  iodepth=8
  size=80g
  direct=0
  ioengine=libaio

  [iometer]
  stonewall
  bs=4M
  rw=randrw

  [iometer_just_write]
  stonewall
  bs=4M
  rw=write

  [iometer_just_read]
  stonewall
  bs=4M
  rw=read
  """

  You can measure the virtual machine RSS usage on the hypervisor with:
virsh dommemstat  | grep rss
  or if you are not using libvirt:
grep RSS /proc//status

  When switching off the RBD client cache, all is ok again, as the
  process does not use so much memory anymore.

  There is already a ticket on the ceph bug tracker for this ([1]).
  However I can reproduce that memory behaviour only when using qemu
  (maybe it is using librbd in a special way?). Running directly 'fio'
  with the rbd engine does not result in that high memory usage.

  [1] http://tracker.ceph.com/issues/20054

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1701449/+subscriptions



Re: [Qemu-devel] [PATCH v3 2/3] scripts: add render_block_graph function for QEMUMachine

2018-10-01 Thread Max Reitz
On 23.08.18 17:46, Vladimir Sementsov-Ogievskiy wrote:
> Render block nodes graph with help of graphviz. This new function is
> for debugging, so there is no sense to put it into qemu.py as a method
> of QEMUMachine. Let's instead put it separately.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  scripts/render_block_graph.py | 120 ++
>  1 file changed, 120 insertions(+)
>  create mode 100755 scripts/render_block_graph.py

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v4 22/24] numa: handle virtio-pmem in NUMA stats

2018-10-01 Thread Dr. David Alan Gilbert
* David Hildenbrand (da...@redhat.com) wrote:
> Account the memory to node 0 for now. Once (if ever) virtio-pmem
> supports NUMA, we can account it to the right node.
> 
> Signed-off-by: David Hildenbrand 

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  numa.c | 24 
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/numa.c b/numa.c
> index 81542d4ebb..1ff1418c1e 100644
> --- a/numa.c
> +++ b/numa.c
> @@ -545,6 +545,7 @@ static void numa_stat_memory_devices(NumaNodeMem 
> node_mem[])
>  MemoryDeviceInfoList *info_list = qmp_memory_device_list();
>  MemoryDeviceInfoList *info;
>  PCDIMMDeviceInfo *pcdimm_info;
> +VirtioPMemDeviceInfo *vpi;
>  
>  for (info = info_list; info; info = info->next) {
>  MemoryDeviceInfo *value = info->value;
> @@ -552,22 +553,21 @@ static void numa_stat_memory_devices(NumaNodeMem 
> node_mem[])
>  if (value) {
>  switch (value->type) {
>  case MEMORY_DEVICE_INFO_KIND_DIMM:
> -pcdimm_info = value->u.dimm.data;
> -break;
> -
>  case MEMORY_DEVICE_INFO_KIND_NVDIMM:
> -pcdimm_info = value->u.nvdimm.data;
> -break;
> -
> -default:
> -pcdimm_info = NULL;
> -break;
> -}
> -
> -if (pcdimm_info) {
> +pcdimm_info = (value->type == MEMORY_DEVICE_INFO_KIND_DIMM) ?
> +  value->u.dimm.data : value->u.nvdimm.data;
>  node_mem[pcdimm_info->node].node_mem += pcdimm_info->size;
>  node_mem[pcdimm_info->node].node_plugged_mem +=
>  pcdimm_info->size;
> +break;
> +case MEMORY_DEVICE_INFO_KIND_VIRTIO_PMEM:
> +vpi = value->u.virtio_pmem.data;
> +/* TODO: once we support numa, assign to right node */
> +node_mem[0].node_mem += vpi->size;
> +node_mem[0].node_plugged_mem += vpi->size;
> +break;
> +default:
> +g_assert_not_reached();
>  }
>  }
>  }
> -- 
> 2.17.1
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH v4 21/24] hmp: handle virtio-pmem when printing memory device infos

2018-10-01 Thread Dr. David Alan Gilbert
* David Hildenbrand (da...@redhat.com) wrote:
> Print the memory device info just like for PCDIMM/NVDIMM.
> 
> Signed-off-by: David Hildenbrand 

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  hmp.c | 27 +++
>  1 file changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/hmp.c b/hmp.c
> index 3a9f797677..159f82e155 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -2532,6 +2532,7 @@ void hmp_info_memory_devices(Monitor *mon, const QDict 
> *qdict)
>  Error *err = NULL;
>  MemoryDeviceInfoList *info_list = qmp_query_memory_devices(&err);
>  MemoryDeviceInfoList *info;
> +VirtioPMemDeviceInfo *vpi;
>  MemoryDeviceInfo *value;
>  PCDIMMDeviceInfo *di;
>  
> @@ -2541,19 +2542,9 @@ void hmp_info_memory_devices(Monitor *mon, const QDict 
> *qdict)
>  if (value) {
>  switch (value->type) {
>  case MEMORY_DEVICE_INFO_KIND_DIMM:
> -di = value->u.dimm.data;
> -break;
> -
>  case MEMORY_DEVICE_INFO_KIND_NVDIMM:
> -di = value->u.nvdimm.data;
> -break;
> -
> -default:
> -di = NULL;
> -break;
> -}
> -
> -if (di) {
> +di = (value->type == MEMORY_DEVICE_INFO_KIND_DIMM) ?
> + value->u.dimm.data : value->u.nvdimm.data;
>  monitor_printf(mon, "Memory device [%s]: \"%s\"\n",
> MemoryDeviceInfoKind_str(value->type),
> di->id ? di->id : "");
> @@ -2566,6 +2557,18 @@ void hmp_info_memory_devices(Monitor *mon, const QDict 
> *qdict)
> di->hotplugged ? "true" : "false");
>  monitor_printf(mon, "  hotpluggable: %s\n",
> di->hotpluggable ? "true" : "false");
> +break;
> +case MEMORY_DEVICE_INFO_KIND_VIRTIO_PMEM:
> +vpi = value->u.virtio_pmem.data;
> +monitor_printf(mon, "Memory device [%s]: \"%s\"\n",
> +   MemoryDeviceInfoKind_str(value->type),
> +   vpi->id ? vpi->id : "");
> +monitor_printf(mon, "  memaddr: 0x%" PRIx64 "\n", 
> vpi->memaddr);
> +monitor_printf(mon, "  size: %" PRIu64 "\n", vpi->size);
> +monitor_printf(mon, "  memdev: %s\n", vpi->memdev);
> +break;
> +default:
> +g_assert_not_reached();
>  }
>  }
>  }
> -- 
> 2.17.1
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] ideas for improving TLB performance (help with TCG backend wanted)

2018-10-01 Thread Emilio G. Cota
On Thu, Sep 20, 2018 at 01:19:51 +0100, Alex Bennée wrote:
> If we are going to have an indirection then we can also drop the
> requirement to scale the TLB according to the number of MMU indexes we
> have to support. It's fairly wasteful when a bunch of them are almost
> never used unless you are running stuff that uses them.

So with dynamic TLB sizing, what you're suggesting here is to resize
each MMU array independently (depending on their use rate) instead
of using a single "TLB size" for all MMU indexes. Am I understanding
your point correctly?

Thanks,

E.



Re: [Qemu-devel] [PATCH v6 00/25] Fixing record/replay and adding reverse debugging

2018-10-01 Thread Artem Pisarenko
I've posted bug report with extended tests (incl. case without sleep=off).
You may find guest image (kernel) in bug description.
https://bugs.launchpad.net/qemu/+bug/1795369

The most annoying thing is that some issues are almost not reproducible.
There are definitely race conditions somewhere in qemu code. Running
'stress-ng' utility with CPU and I/O stressors in parallel with qemu
execution greatly minimizes amount of attempts when I'm trying to trigger
some of issues I encounter.

I'll try 'info monitor' command tomorrow, but no guarantees that I'll be
able to reproduce issue again.

Speaking about '-nographic' and SDL... I've noted that UI greatly minimizes
possibility of hanging (but not avoids it completely) when using icount in
general, so this effect isn't rr-specific. I've already reported this bug
too.


пн, 1 окт. 2018 г., 20:14 dovgaluk :

> Artem Pisarenko писал 2018-09-30 14:01:
> > Feature still broken :(
>
> Thanks for testing.
>
> >
> > Brief description of my tests.
> >
> > Guest image is Linux, which just powers off after kernel boots
> > (instead of proceeding to user-space /init or /sbin/init).
> > Base cmdline:
> > qemu-system-x86_64 -nodefaults -machine pc,accel=tcg -m 2048 -cpu
> > qemu64 -rtc clock=vm,base=2000-01-01T00:00:00 -kernel bzImage -initrd
> > rootfs -append 'nokaslr console=ttyS0 rdinit=/init_poweroff'
> > -nographic -serial SERIAL_VALUE -icount
> > 1,sleep=off,rr=RR_VALUE,rrfile=icount_rr_capture.bin
>
> I've never tried it with sleep=off. Can you remove it and try again?
>
> We also seen a problem with '-nographic'. When we remove this option and
> QEMU runs with SDL
> window, everything is ok. There is some problem with main loop which may
> sleep when there
> is no GUI to update, or something like that. We couldn't fix it yet.
>
> >
> > Test 1. When SERIAL_VALUE=none
> > Running with RR_VALUE=record completes successfully.
> > Running with RR_VALUE=replay doesn't completes. qemu process just
> > eating ~100% cpu and memory usage doesn't grow after some moment. I
> > don't see what happens because of problem no.2 (see below).
>
> Try 'info replay' monitor command. Does instruction counter increases?
>
> >
> > Test 2. When SERIAL_VALUE=stdio
> > Running with RR_VALUE=record completes successfully.
> >
> > Running with RR_VALUE=replay caues exit with error:
> >
> > "qemu-system-x86_64: Missing character write event in the replay log"
> >
> > These problems are same with qemu 2.12 (both vanilla and with previous
> > versions of these patches applied). Furthemore, I consider whole
> > icount mode broken and determinism isn't achievable.
> > The irony is that I actually don't need record/replay feature. I've
> > tried to use it only as instrument to debug failing determinism in
> > qemu code. But since replay/record feature itself relies on
> > determinism, which is broken, it's no wonder why it fails also (I just
> > hoped to bypass it).
> >
> > Contact me if you need more details. I just tired a lot trying to get
> > all these things working... Hope is leaving me...
>
> Can you share the kernel in case the icount still broken?
>
>
> Pavel Dovgalyuk
>
> --

С уважением,
  Артем Писаренко


[Qemu-devel] [PULL 00/15] target/xtensa: preparation for FLIX support

2018-10-01 Thread Max Filippov
Hi Peter,

please pull the following series that rearranges target/xtensa code in
preparation for FLIX support.
The following changes since commit c5e4e49258e9b89cb34c085a419dd9f862935c48:

  Merge remote-tracking branch 'remotes/xanclic/tags/pull-block-2018-09-25' 
into staging (2018-09-25 16:47:35 +0100)

are available in the git repository at:

  git://github.com/OSLL/qemu-xtensa.git tags/20181001-xtensa

for you to fetch changes up to d74624e59a6087ccf233843b936aedc9f8a8050a:

  target/xtensa: extract gen_check_interrupts call (2018-10-01 11:08:36 -0700)


target/xtensa: preparation for FLIX support

Separate generation of per-instruction code (such as raising exceptions
and terminating TB) from per-opcode code.


Max Filippov (15):
  target/xtensa: extract test for an illegal instruction
  target/xtensa: extract test for privileged instruction
  target/xtensa: extract test for syscall instruction
  target/xtensa: extract test for debug exception
  target/xtensa: extract test for window overflow exception
  target/xtensa: extract test for window underflow exception
  target/xtensa: extract test for alloca exception
  target/xtensa: extract test for cpdisabled exception
  target/xtensa: extract test for division by zero
  target/xtensa: extract unconditional TB termination
  target/xtensa: change SR number checks to assertions
  target/xtensa: always end TB on CCOUNT access/CCOMPARE write
  target/xtensa: extract unconditional TB termination via slot 0
  target/xtensa: make rsr/wsr helpers return void
  target/xtensa: extract gen_check_interrupts call

 target/xtensa/cpu.h   |   37 +-
 target/xtensa/helper.c|6 +
 target/xtensa/helper.h|2 +
 target/xtensa/op_helper.c |   73 +-
 target/xtensa/translate.c | 2673 +
 5 files changed, 1817 insertions(+), 974 deletions(-)

Thanks.
-- Max



Re: [Qemu-devel] racing between pause_all_vcpus() and qemu_cpu_stop()

2018-10-01 Thread Alex Bennée


Peter Maydell  writes:

> I've been investigating a race condition where sometimes when my
> guest writes to a device register which triggers a
> qemu_system_reset_request(), it doesn't actually cause a clean reset,
> but instead the guest CPU continues to execute instructions.
> I managed to repro it under 'rr', which let me walk through enough
> of what was going on to determine the following:
>
> When a guest CPU thread calls qemu_system_reset_request(), this
> results in a call to qemu_cpu_stop(current_cpu, true), to
> make the CPU come back out to the main loop. We also set the
> reset_requested flag, to get the IO thread to actually do the
> reset.
>
> The main loop thread runs main_loop_should_exit(). If there is a
> pending reset, it calls pause_all_vcpus(), with the intention
> that this quiesces all the guest CPUs before it starts messing
> with reset actions.
>
> pause_all_vcpus() just waits for every cpu to have cpu->stopped set.
> However, if the running cpu has just called qemu_cpu_stop() on
> itself then it will have set cpu->stopped true but not actually
> made it out to the main loop yet. (In the case I'm looking at,
> what happens is that as soon as the CPU thread unlocks the
> iothread mutex in io_writex() after the device write, the
> main thread runs and does all the reset operations.)
>
> The reset code in the iothread then proceeds to start calling
> various reset functions while the CPU thread is still inside
> the exec loop, running generated code and so on. This doesn't
> seem like what ought to happen. In particular it includes
> calling cpu_common_reset(), which clears all kinds of flags
> relevant to the still-executing CPU...

I would have thought the reset code should be scheduled via safe async
work to run in the vCPU context. Why should the main loop get involved
at all here?

>
> Any suggestions for how we should fix this?
>
> thanks
> -- PMM


--
Alex Bennée



[Qemu-devel] [PULL 15/23] qcow2: Assign the L2 cache relatively to the image size

2018-10-01 Thread Kevin Wolf
From: Leonid Bloch 

Sufficient L2 cache can noticeably improve the performance when using
large images with frequent I/O.

Previously, unless 'cache-size' was specified and was large enough, the
L2 cache was set to a certain size without taking the virtual image size
into account.

Now, the L2 cache assignment is aware of the virtual size of the image,
and will cover the entire image, unless the cache size needed for that is
larger than a certain maximum. This maximum is set to 1 MB by default
(enough to cover an 8 GB image with the default cluster size) but can
be increased or decreased using the 'l2-cache-size' option. This option
was previously documented as the *maximum* L2 cache size, and this patch
makes it behave as such, instead of as a constant size. Also, the
existing option 'cache-size' can limit the sum of both L2 and refcount
caches, as previously.

Signed-off-by: Leonid Bloch 
Reviewed-by: Alberto Garcia 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 docs/qcow2-cache.txt   | 15 ++-
 block/qcow2.h  |  4 +---
 block/qcow2.c  | 21 +
 qemu-options.hx|  6 +++---
 tests/qemu-iotests/137 |  8 +++-
 tests/qemu-iotests/137.out |  4 +++-
 6 files changed, 33 insertions(+), 25 deletions(-)

diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
index 7e28b41bd3..750447ea4f 100644
--- a/docs/qcow2-cache.txt
+++ b/docs/qcow2-cache.txt
@@ -125,8 +125,12 @@ There are a few things that need to be taken into account:
  - Both caches must have a size that is a multiple of the cluster size
(or the cache entry size: see "Using smaller cache sizes" below).
 
- - The default L2 cache size is 8 clusters or 1MB (whichever is more),
-   and the minimum is 2 clusters (or 2 cache entries, see below).
+ - The maximum L2 cache size is 1 MB by default (enough for full coverage
+   of 8 GB images, with the default cluster size). This value can be
+   modified using the "l2-cache-size" option. QEMU will not use more memory
+   than needed to hold all of the image's L2 tables, regardless of this max.
+   value. The minimal L2 cache size is 2 clusters (or 2 cache entries, see
+   below).
 
  - The default (and minimum) refcount cache size is 4 clusters.
 
@@ -184,9 +188,10 @@ Some things to take into account:
always uses the cluster size as the entry size.
 
  - If the L2 cache is big enough to hold all of the image's L2 tables
-   (as explained in the "Choosing the right cache sizes" section
-   earlier in this document) then none of this is necessary and you
-   can omit the "l2-cache-entry-size" parameter altogether.
+   (as explained in the "Choosing the right cache sizes" and "How to
+   configure the cache sizes" sections in this document) then none of
+   this is necessary and you can omit the "l2-cache-entry-size"
+   parameter altogether.
 
 
 Reducing the memory usage
diff --git a/block/qcow2.h b/block/qcow2.h
index a8d6f757b1..2f8c1fd15c 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -74,9 +74,7 @@
 /* Must be at least 4 to cover all cases of refcount table growth */
 #define MIN_REFCOUNT_CACHE_SIZE 4 /* clusters */
 
-/* Whichever is more */
-#define DEFAULT_L2_CACHE_CLUSTERS 8 /* clusters */
-#define DEFAULT_L2_CACHE_SIZE S_1MiB
+#define DEFAULT_L2_CACHE_MAX_SIZE S_1MiB
 
 #define DEFAULT_CLUSTER_SIZE S_64KiB
 
diff --git a/block/qcow2.c b/block/qcow2.c
index cd0053b6ee..589f6c1b1c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -777,29 +777,35 @@ static void read_cache_sizes(BlockDriverState *bs, 
QemuOpts *opts,
  uint64_t *refcount_cache_size, Error **errp)
 {
 BDRVQcow2State *s = bs->opaque;
-uint64_t combined_cache_size;
+uint64_t combined_cache_size, l2_cache_max_setting;
 bool l2_cache_size_set, refcount_cache_size_set, combined_cache_size_set;
 int min_refcount_cache = MIN_REFCOUNT_CACHE_SIZE * s->cluster_size;
+uint64_t virtual_disk_size = bs->total_sectors * BDRV_SECTOR_SIZE;
+uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
 
 combined_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_CACHE_SIZE);
 l2_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_L2_CACHE_SIZE);
 refcount_cache_size_set = qemu_opt_get(opts, 
QCOW2_OPT_REFCOUNT_CACHE_SIZE);
 
 combined_cache_size = qemu_opt_get_size(opts, QCOW2_OPT_CACHE_SIZE, 0);
-*l2_cache_size = qemu_opt_get_size(opts, QCOW2_OPT_L2_CACHE_SIZE, 0);
+l2_cache_max_setting = qemu_opt_get_size(opts, QCOW2_OPT_L2_CACHE_SIZE,
+ DEFAULT_L2_CACHE_MAX_SIZE);
 *refcount_cache_size = qemu_opt_get_size(opts,
  QCOW2_OPT_REFCOUNT_CACHE_SIZE, 0);
 
 *l2_cache_entry_size = qemu_opt_get_size(
 opts, QCOW2_OPT_L2_CACHE_ENTRY_SIZE, s->cluster_size);
 
+*l2_cache_size = MIN(max_l2_cache, l2_cache_max_setting);
+
 if (combined_cache_size_set) {
 if (l2_cache_size_set && re

[Qemu-devel] [PULL 22/23] test-replication: Lock AioContext around blk_unref()

2018-10-01 Thread Kevin Wolf
Recently, the test case has started failing because some job related
functions want to drop the AioContext lock even though it hasn't been
taken:

(gdb) bt
#0  0x7f51c067c9fb in raise () from /lib64/libc.so.6
#1  0x7f51c067e77d in abort () from /lib64/libc.so.6
#2  0x558c9d5dde7b in error_exit (err=, 
msg=msg@entry=0x558c9d6fe120 <__func__.18373> "qemu_mutex_unlock_impl") at 
util/qemu-thread-posix.c:36
#3  0x558c9d6b5263 in qemu_mutex_unlock_impl 
(mutex=mutex@entry=0x558c9f3999a0, file=file@entry=0x558c9d6fd36f 
"util/async.c", line=line@entry=516) at util/qemu-thread-posix.c:96
#4  0x558c9d6b0565 in aio_context_release 
(ctx=ctx@entry=0x558c9f399940) at util/async.c:516
#5  0x558c9d5eb3da in job_completed_txn_abort (job=0x558c9f68e640) at 
job.c:738
#6  0x558c9d5eb227 in job_finish_sync (job=0x558c9f68e640, 
finish=finish@entry=0x558c9d5eb8d0 , errp=errp@entry=0x0) at 
job.c:986
#7  0x558c9d5eb8ee in job_cancel_sync (job=) at job.c:941
#8  0x558c9d64d853 in replication_close (bs=) at 
block/replication.c:148
#9  0x558c9d5e5c9f in bdrv_close (bs=0x558c9f41b020) at block.c:3420
#10 bdrv_delete (bs=0x558c9f41b020) at block.c:3629
#11 bdrv_unref (bs=0x558c9f41b020) at block.c:4685
#12 0x558c9d62a3f3 in blk_remove_bs (blk=blk@entry=0x558c9f42a7c0) at 
block/block-backend.c:783
#13 0x558c9d62a667 in blk_delete (blk=0x558c9f42a7c0) at 
block/block-backend.c:402
#14 blk_unref (blk=0x558c9f42a7c0) at block/block-backend.c:457
#15 0x558c9d5dfcea in test_secondary_stop () at 
tests/test-replication.c:478
#16 0x7f51c1f13178 in g_test_run_suite_internal () from 
/lib64/libglib-2.0.so.0
#17 0x7f51c1f1337b in g_test_run_suite_internal () from 
/lib64/libglib-2.0.so.0
#18 0x7f51c1f1337b in g_test_run_suite_internal () from 
/lib64/libglib-2.0.so.0
#19 0x7f51c1f13552 in g_test_run_suite () from /lib64/libglib-2.0.so.0
#20 0x7f51c1f13571 in g_test_run () from /lib64/libglib-2.0.so.0
#21 0x558c9d5de31f in main (argc=, argv=) 
at tests/test-replication.c:581

It is yet unclear whether this should really be considered a bug in the
test case or whether blk_unref() should work for callers that haven't
taken the AioContext lock, but in order to fix the build tests quickly,
just take the AioContext lock around blk_unref().

Signed-off-by: Kevin Wolf 
---
 tests/test-replication.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/tests/test-replication.c b/tests/test-replication.c
index c8165ae954..f085d1993a 100644
--- a/tests/test-replication.c
+++ b/tests/test-replication.c
@@ -207,13 +207,17 @@ static BlockBackend *start_primary(void)
 static void teardown_primary(void)
 {
 BlockBackend *blk;
+AioContext *ctx;
 
 /* remove P_ID */
 blk = blk_by_name(P_ID);
 assert(blk);
 
+ctx = blk_get_aio_context(blk);
+aio_context_acquire(ctx);
 monitor_remove_blk(blk);
 blk_unref(blk);
+aio_context_release(ctx);
 }
 
 static void test_primary_read(void)
@@ -365,20 +369,27 @@ static void teardown_secondary(void)
 {
 /* only need to destroy two BBs */
 BlockBackend *blk;
+AioContext *ctx;
 
 /* remove S_LOCAL_DISK_ID */
 blk = blk_by_name(S_LOCAL_DISK_ID);
 assert(blk);
 
+ctx = blk_get_aio_context(blk);
+aio_context_acquire(ctx);
 monitor_remove_blk(blk);
 blk_unref(blk);
+aio_context_release(ctx);
 
 /* remove S_ID */
 blk = blk_by_name(S_ID);
 assert(blk);
 
+ctx = blk_get_aio_context(blk);
+aio_context_acquire(ctx);
 monitor_remove_blk(blk);
 blk_unref(blk);
+aio_context_release(ctx);
 }
 
 static void test_secondary_read(void)
-- 
2.13.6




[Qemu-devel] [PULL 18/23] qcow2: Set the default cache-clean-interval to 10 minutes

2018-10-01 Thread Kevin Wolf
From: Leonid Bloch 

The default cache-clean-interval is set to 10 minutes, in order to lower
the overhead of the qcow2 caches (before the default was 0, i.e.
disabled).

* For non-Linux platforms the default is kept at 0, because
  cache-clean-interval is not supported there yet.

Signed-off-by: Leonid Bloch 
Reviewed-by: Alberto Garcia 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 qapi/block-core.json | 3 ++-
 docs/qcow2-cache.txt | 4 ++--
 block/qcow2.h| 4 +++-
 block/qcow2.c| 2 +-
 qemu-options.hx  | 2 +-
 5 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index ac3b48ee54..46dac23d2f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2895,7 +2895,8 @@
 #
 # @cache-clean-interval:  clean unused entries in the L2 and refcount
 # caches. The interval is in seconds. The default value
-# is 0 and it disables this feature (since 2.5)
+# is 600, and 0 disables this feature. (since 2.5)
+#
 # @encrypt:   Image decryption options. Mandatory for
 # encrypted images, except when doing a metadata-only
 # probe of the image. (since 2.10)
diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
index 1fcc0658b2..59358b816f 100644
--- a/docs/qcow2-cache.txt
+++ b/docs/qcow2-cache.txt
@@ -210,8 +210,8 @@ This example removes all unused cache entries every 15 
minutes:
 
-drive file=hd.qcow2,cache-clean-interval=900
 
-If unset, the default value for this parameter is 0 and it disables
-this feature.
+If unset, the default value for this parameter is 600. Setting it to 0
+disables this feature.
 
 Note that this functionality currently relies on the MADV_DONTNEED
 argument for madvise() to actually free the memory. This is a
diff --git a/block/qcow2.h b/block/qcow2.h
index 0f0e3534bf..ba430316b9 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -76,13 +76,15 @@
 
 #ifdef CONFIG_LINUX
 #define DEFAULT_L2_CACHE_MAX_SIZE S_32MiB
+#define DEFAULT_CACHE_CLEAN_INTERVAL 600  /* seconds */
 #else
 #define DEFAULT_L2_CACHE_MAX_SIZE S_8MiB
+/* Cache clean interval is currently available only on Linux, so must be 0 */
+#define DEFAULT_CACHE_CLEAN_INTERVAL 0
 #endif
 
 #define DEFAULT_CLUSTER_SIZE S_64KiB
 
-
 #define QCOW2_OPT_LAZY_REFCOUNTS "lazy-refcounts"
 #define QCOW2_OPT_DISCARD_REQUEST "pass-discard-request"
 #define QCOW2_OPT_DISCARD_SNAPSHOT "pass-discard-snapshot"
diff --git a/block/qcow2.c b/block/qcow2.c
index 20b5093269..95e1c98daa 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -944,7 +944,7 @@ static int qcow2_update_options_prepare(BlockDriverState 
*bs,
 /* New interval for cache cleanup timer */
 r->cache_clean_interval =
 qemu_opt_get_number(opts, QCOW2_OPT_CACHE_CLEAN_INTERVAL,
-s->cache_clean_interval);
+DEFAULT_CACHE_CLEAN_INTERVAL);
 #ifndef CONFIG_LINUX
 if (r->cache_clean_interval != 0) {
 error_setg(errp, QCOW2_OPT_CACHE_CLEAN_INTERVAL
diff --git a/qemu-options.hx b/qemu-options.hx
index 14aee78c6c..52d9d9f06d 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -747,7 +747,7 @@ it which is not used for the L2 cache)
 
 @item cache-clean-interval
 Clean unused entries in the L2 and refcount caches. The interval is in seconds.
-The default value is 0 and it disables this feature.
+The default value is 600. Setting it to 0 disables this feature.
 
 @item pass-discard-request
 Whether discard requests to the qcow2 device should be forwarded to the data
-- 
2.13.6




[Qemu-devel] [PULL 21/23] qcow2: Fix cache-clean-interval documentation

2018-10-01 Thread Kevin Wolf
From: Leonid Bloch 

Fixing cache-clean-interval documentation following the recent change to
a default of 600 seconds on supported plarforms (only Linux currently).

Signed-off-by: Leonid Bloch 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 qapi/block-core.json |  3 ++-
 docs/qcow2-cache.txt | 20 ++--
 qemu-options.hx  |  3 ++-
 3 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 46dac23d2f..25b8a0e744 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2895,7 +2895,8 @@
 #
 # @cache-clean-interval:  clean unused entries in the L2 and refcount
 # caches. The interval is in seconds. The default value
-# is 600, and 0 disables this feature. (since 2.5)
+# is 600 on supporting platforms, and 0 on other
+# platforms. 0 disables this feature. (since 2.5)
 #
 # @encrypt:   Image decryption options. Mandatory for
 # encrypted images, except when doing a metadata-only
diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
index 59358b816f..c459bf5dd3 100644
--- a/docs/qcow2-cache.txt
+++ b/docs/qcow2-cache.txt
@@ -202,18 +202,18 @@ Reducing the memory usage
 It is possible to clean unused cache entries in order to reduce the
 memory usage during periods of low I/O activity.
 
-The parameter "cache-clean-interval" defines an interval (in seconds).
-All cache entries that haven't been accessed during that interval are
-removed from memory.
+The parameter "cache-clean-interval" defines an interval (in seconds),
+after which all the cache entries that haven't been accessed during the
+interval are removed from memory. Setting this parameter to 0 disables this
+feature.
 
-This example removes all unused cache entries every 15 minutes:
+The following example removes all unused cache entries every 15 minutes:
 
-drive file=hd.qcow2,cache-clean-interval=900
 
-If unset, the default value for this parameter is 600. Setting it to 0
-disables this feature.
+If unset, the default value for this parameter is 600 on platforms which
+support this functionality, and is 0 (disabled) on other platforms.
 
-Note that this functionality currently relies on the MADV_DONTNEED
-argument for madvise() to actually free the memory. This is a
-Linux-specific feature, so cache-clean-interval is not supported in
-other systems.
+This functionality currently relies on the MADV_DONTNEED argument for
+madvise() to actually free the memory. This is a Linux-specific feature,
+so cache-clean-interval is not supported on other systems.
diff --git a/qemu-options.hx b/qemu-options.hx
index 52d9d9f06d..f139459e80 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -747,7 +747,8 @@ it which is not used for the L2 cache)
 
 @item cache-clean-interval
 Clean unused entries in the L2 and refcount caches. The interval is in seconds.
-The default value is 600. Setting it to 0 disables this feature.
+The default value is 600 on supporting platforms, and 0 on other platforms.
+Setting it to 0 disables this feature.
 
 @item pass-discard-request
 Whether discard requests to the qcow2 device should be forwarded to the data
-- 
2.13.6




[Qemu-devel] [PULL 13/23] qcow2: Make sizes more humanly readable

2018-10-01 Thread Kevin Wolf
From: Leonid Bloch 

Signed-off-by: Leonid Bloch 
Reviewed-by: Alberto Garcia 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 block/qcow2.h | 9 +
 block/qcow2.c | 2 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 81b844e936..a8d6f757b1 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -27,6 +27,7 @@
 
 #include "crypto/block.h"
 #include "qemu/coroutine.h"
+#include "qemu/units.h"
 
 //#define DEBUG_ALLOC
 //#define DEBUG_ALLOC2
@@ -43,11 +44,11 @@
 
 /* 8 MB refcount table is enough for 2 PB images at 64k cluster size
  * (128 GB for 512 byte clusters, 2 EB for 2 MB clusters) */
-#define QCOW_MAX_REFTABLE_SIZE 0x80
+#define QCOW_MAX_REFTABLE_SIZE S_8MiB
 
 /* 32 MB L1 table is enough for 2 PB images at 64k cluster size
  * (128 GB for 512 byte clusters, 2 EB for 2 MB clusters) */
-#define QCOW_MAX_L1_SIZE 0x200
+#define QCOW_MAX_L1_SIZE S_32MiB
 
 /* Allow for an average of 1k per snapshot table entry, should be plenty of
  * space for snapshot names and IDs */
@@ -75,9 +76,9 @@
 
 /* Whichever is more */
 #define DEFAULT_L2_CACHE_CLUSTERS 8 /* clusters */
-#define DEFAULT_L2_CACHE_BYTE_SIZE 1048576 /* bytes */
+#define DEFAULT_L2_CACHE_SIZE S_1MiB
 
-#define DEFAULT_CLUSTER_SIZE 65536
+#define DEFAULT_CLUSTER_SIZE S_64KiB
 
 
 #define QCOW2_OPT_LAZY_REFCOUNTS "lazy-refcounts"
diff --git a/block/qcow2.c b/block/qcow2.c
index c13153735a..d2c07ce9fe 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -830,7 +830,7 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts 
*opts,
 }
 } else {
 if (!l2_cache_size_set) {
-*l2_cache_size = MAX(DEFAULT_L2_CACHE_BYTE_SIZE,
+*l2_cache_size = MAX(DEFAULT_L2_CACHE_SIZE,
  (uint64_t)DEFAULT_L2_CACHE_CLUSTERS
  * s->cluster_size);
 }
-- 
2.13.6




[Qemu-devel] [PULL 23/23] tests/test-bdrv-drain: Fix too late qemu_event_reset()

2018-10-01 Thread Kevin Wolf
qemu_event_reset() must be called before the AIO request in a different
iothread is submitted. Otherwise the request could be completed before
we do the qemu_event_reset() and the test would hang in
qemu_event_wait().

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Tested-by: Max Reitz 
---
 tests/test-bdrv-drain.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index c9f29c8b10..ee1740ff06 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -694,6 +694,8 @@ static void test_iothread_common(enum drain_type 
drain_type, int drain_thread)
 s->bh_indirection_ctx = ctx_b;
 
 aio_ret = -EINPROGRESS;
+qemu_event_reset(&done_event);
+
 if (drain_thread == 0) {
 acb = blk_aio_preadv(blk, 0, &qiov, 0, test_iothread_aio_cb, &aio_ret);
 } else {
@@ -723,7 +725,6 @@ static void test_iothread_common(enum drain_type 
drain_type, int drain_thread)
  * but the drain in this thread can continue immediately after
  * bdrv_dec_in_flight() and aio_ret might be assigned only slightly
  * later. */
-qemu_event_reset(&done_event);
 do_drain_begin(drain_type, bs);
 g_assert_cmpint(bs->in_flight, ==, 0);
 
@@ -743,7 +744,6 @@ static void test_iothread_common(enum drain_type 
drain_type, int drain_thread)
 }
 break;
 case 1:
-qemu_event_reset(&done_event);
 aio_bh_schedule_oneshot(ctx_a, test_iothread_drain_entry, &data);
 qemu_event_wait(&done_event);
 break;
-- 
2.13.6




[Qemu-devel] [PULL 20/23] block-backend: Set werror/rerror defaults in blk_new()

2018-10-01 Thread Kevin Wolf
Currently, the default values for werror and rerror have to be set
explicitly with blk_set_on_error() by the callers of blk_new(). The only
caller actually doing this is blockdev_init(), which is called for
BlockBackends created using -drive.

In particular, anonymous BlockBackends created with
-device ...,drive= didn't get the correct default set and
instead defaulted to the integer value 0 (= BLOCKDEV_ON_ERROR_REPORT).
This is the intended default for rerror anyway, but the default for
werror should be BLOCKDEV_ON_ERROR_ENOSPC.

Set the defaults in blk_new() instead so that they apply no matter what
way the BlockBackend was created.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Kevin Wolf 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
---
 block/block-backend.c  | 3 +++
 tests/qemu-iotests/067.out | 1 +
 2 files changed, 4 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 7b1ec5071b..dc0cd57724 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -325,6 +325,9 @@ BlockBackend *blk_new(uint64_t perm, uint64_t shared_perm)
 blk->shared_perm = shared_perm;
 blk_set_enable_write_cache(blk, true);
 
+blk->on_read_error = BLOCKDEV_ON_ERROR_REPORT;
+blk->on_write_error = BLOCKDEV_ON_ERROR_ENOSPC;
+
 block_acct_init(&blk->stats);
 
 notifier_list_init(&blk->remove_bs_notifiers);
diff --git a/tests/qemu-iotests/067.out b/tests/qemu-iotests/067.out
index 2e71cff3ce..b10c71db03 100644
--- a/tests/qemu-iotests/067.out
+++ b/tests/qemu-iotests/067.out
@@ -385,6 +385,7 @@ Testing: -device virtio-scsi -device scsi-cd,id=cd0
 {
 "return": [
 {
+"io-status": "ok",
 "device": "",
 "locked": false,
 "removable": true,
-- 
2.13.6




[Qemu-devel] [PULL 12/23] include: Add a lookup table of sizes

2018-10-01 Thread Kevin Wolf
From: Leonid Bloch 

Adding a lookup table for the powers of two, with the appropriate size
prefixes. This is needed when a size has to be stringified, in which
case something like '(1 * KiB)' would become a literal '(1 * (1L << 10))'
string. Powers of two are used very often for sizes, so such a table
will also make it easier and more intuitive to write them.

This table is generatred using the following AWK script:

BEGIN {
suffix="KMGTPE";
for(i=10; i<64; i++) {
val=2**i;
s=substr(suffix, int(i/10), 1);
n=2**(i%10);
pad=21-int(log(n)/log(10));
printf("#define S_%d%siB %*d\n", n, s, pad, val);
}
}

Signed-off-by: Leonid Bloch 
Reviewed-by: Alberto Garcia 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 include/qemu/units.h | 55 
 1 file changed, 55 insertions(+)

diff --git a/include/qemu/units.h b/include/qemu/units.h
index 692db3fbb2..68a7758650 100644
--- a/include/qemu/units.h
+++ b/include/qemu/units.h
@@ -17,4 +17,59 @@
 #define PiB (INT64_C(1) << 50)
 #define EiB (INT64_C(1) << 60)
 
+#define S_1KiB  1024
+#define S_2KiB  2048
+#define S_4KiB  4096
+#define S_8KiB  8192
+#define S_16KiB16384
+#define S_32KiB32768
+#define S_64KiB65536
+#define S_128KiB  131072
+#define S_256KiB  262144
+#define S_512KiB  524288
+#define S_1MiB   1048576
+#define S_2MiB   2097152
+#define S_4MiB   4194304
+#define S_8MiB   8388608
+#define S_16MiB 16777216
+#define S_32MiB 33554432
+#define S_64MiB 67108864
+#define S_128MiB   134217728
+#define S_256MiB   268435456
+#define S_512MiB   536870912
+#define S_1GiB1073741824
+#define S_2GiB2147483648
+#define S_4GiB4294967296
+#define S_8GiB8589934592
+#define S_16GiB  17179869184
+#define S_32GiB  34359738368
+#define S_64GiB  68719476736
+#define S_128GiB137438953472
+#define S_256GiB274877906944
+#define S_512GiB549755813888
+#define S_1TiB 1099511627776
+#define S_2TiB 219902322
+#define S_4TiB 4398046511104
+#define S_8TiB 8796093022208
+#define S_16TiB   17592186044416
+#define S_32TiB   35184372088832
+#define S_64TiB   70368744177664
+#define S_128TiB 140737488355328
+#define S_256TiB 281474976710656
+#define S_512TiB 562949953421312
+#define S_1PiB  1125899906842624
+#define S_2PiB  2251799813685248
+#define S_4PiB  4503599627370496
+#define S_8PiB  9007199254740992
+#define S_16PiB18014398509481984
+#define S_32PiB36028797018963968
+#define S_64PiB72057594037927936
+#define S_128PiB  144115188075855872
+#define S_256PiB  288230376151711744
+#define S_512PiB  576460752303423488
+#define S_1EiB   1152921504606846976
+#define S_2EiB   2305843009213693952
+#define S_4EiB   4611686018427387904
+#define S_8EiB   9223372036854775808
+
 #endif
-- 
2.13.6




[Qemu-devel] [PULL 16/23] qcow2: Increase the default upper limit on the L2 cache size

2018-10-01 Thread Kevin Wolf
From: Leonid Bloch 

The upper limit on the L2 cache size is increased from 1 MB to 32 MB
on Linux platforms, and to 8 MB on other platforms (this difference is
caused by the ability to set intervals for cache cleaning on Linux
platforms only).

This is done in order to allow default full coverage with the L2 cache
for images of up to 256 GB in size (was 8 GB). Note, that only the
needed amount to cover the full image is allocated. The value which is
changed here is just the upper limit on the L2 cache size, beyond which
it will not grow, even if the size of the image will require it to.

Signed-off-by: Leonid Bloch 
Reviewed-by: Alberto Garcia 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 docs/qcow2-cache.txt | 15 +--
 block/qcow2.h|  6 +-
 qemu-options.hx  |  6 +++---
 3 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
index 750447ea4f..1fcc0658b2 100644
--- a/docs/qcow2-cache.txt
+++ b/docs/qcow2-cache.txt
@@ -125,12 +125,15 @@ There are a few things that need to be taken into account:
  - Both caches must have a size that is a multiple of the cluster size
(or the cache entry size: see "Using smaller cache sizes" below).
 
- - The maximum L2 cache size is 1 MB by default (enough for full coverage
-   of 8 GB images, with the default cluster size). This value can be
-   modified using the "l2-cache-size" option. QEMU will not use more memory
-   than needed to hold all of the image's L2 tables, regardless of this max.
-   value. The minimal L2 cache size is 2 clusters (or 2 cache entries, see
-   below).
+ - The maximum L2 cache size is 32 MB by default on Linux platforms (enough
+   for full coverage of 256 GB images, with the default cluster size). This
+   value can be modified using the "l2-cache-size" option. QEMU will not use
+   more memory than needed to hold all of the image's L2 tables, regardless
+   of this max. value.
+   On non-Linux platforms the maximal value is smaller by default (8 MB) and
+   this difference stems from the fact that on Linux the cache can be cleared
+   periodically if needed, using the "cache-clean-interval" option (see below).
+   The minimal L2 cache size is 2 clusters (or 2 cache entries, see below).
 
  - The default (and minimum) refcount cache size is 4 clusters.
 
diff --git a/block/qcow2.h b/block/qcow2.h
index 2f8c1fd15c..0f0e3534bf 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -74,7 +74,11 @@
 /* Must be at least 4 to cover all cases of refcount table growth */
 #define MIN_REFCOUNT_CACHE_SIZE 4 /* clusters */
 
-#define DEFAULT_L2_CACHE_MAX_SIZE S_1MiB
+#ifdef CONFIG_LINUX
+#define DEFAULT_L2_CACHE_MAX_SIZE S_32MiB
+#else
+#define DEFAULT_L2_CACHE_MAX_SIZE S_8MiB
+#endif
 
 #define DEFAULT_CLUSTER_SIZE S_64KiB
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 6eef0f5651..14aee78c6c 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -736,9 +736,9 @@ The maximum total size of the L2 table and refcount block 
caches in bytes
 
 @item l2-cache-size
 The maximum size of the L2 table cache in bytes
-(default: if cache-size is not specified - 1M; otherwise, as large as possible
-within the cache-size, while permitting the requested or the minimal refcount
-cache size)
+(default: if cache-size is not specified - 32M on Linux platforms, and 8M on
+non-Linux platforms; otherwise, as large as possible within the cache-size,
+while permitting the requested or the minimal refcount cache size)
 
 @item refcount-cache-size
 The maximum size of the refcount block cache in bytes
-- 
2.13.6




[Qemu-devel] [PULL 14/23] qcow2: Avoid duplication in setting the refcount cache size

2018-10-01 Thread Kevin Wolf
From: Leonid Bloch 

The refcount cache size does not need to be set to its minimum value in
read_cache_sizes(), as it is set to at least its minimum value in
qcow2_update_options_prepare().

Signed-off-by: Leonid Bloch 
Reviewed-by: Alberto Garcia 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 block/qcow2.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index d2c07ce9fe..cd0053b6ee 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -834,10 +834,9 @@ static void read_cache_sizes(BlockDriverState *bs, 
QemuOpts *opts,
  (uint64_t)DEFAULT_L2_CACHE_CLUSTERS
  * s->cluster_size);
 }
-if (!refcount_cache_size_set) {
-*refcount_cache_size = min_refcount_cache;
-}
 }
+/* l2_cache_size and refcount_cache_size are ensured to have at least
+ * their minimum values in qcow2_update_options_prepare() */
 
 if (*l2_cache_entry_size < (1 << MIN_CLUSTER_BITS) ||
 *l2_cache_entry_size > s->cluster_size ||
-- 
2.13.6




[Qemu-devel] [PULL 07/23] block: Forbid trying to change unsupported options during reopen

2018-10-01 Thread Kevin Wolf
From: Alberto Garcia 

The bdrv_reopen_prepare() function checks all options passed to each
BlockDriverState (in the reopen_state->options QDict) and makes all
necessary preparations to apply the option changes requested by the
user.

Options are removed from the QDict as they are processed, so at the
end of bdrv_reopen_prepare() only the options that can't be changed
are left. Then a loop goes over all remaining options and verifies
that the old and new values are identical, returning an error if
they're not.

The problem is that at the moment there are options that are removed
from the QDict although they can't be changed. The consequence of this
is any modification to any of those options is silently ignored:

   (qemu) qemu-io virtio0 "reopen -o discard=on"

This happens when all options from bdrv_runtime_opts are removed
from the QDict but then only a few of them are processed. Since
it's especially important that "node-name" and "driver" are not
changed, the code puts them back into the QDict so they are checked
at the end of the function. Instead of putting only those two options
back into the QDict, this patch puts all unprocessed options using
qemu_opts_to_qdict().

update_flags_from_options() also needs to be modified to prevent
BDRV_OPT_CACHE_NO_FLUSH, BDRV_OPT_CACHE_DIRECT and BDRV_OPT_READ_ONLY
from going back to the QDict.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 block.c | 22 +++---
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/block.c b/block.c
index ff1aded4b8..e36ae3e8d1 100644
--- a/block.c
+++ b/block.c
@@ -1094,19 +1094,19 @@ static void update_flags_from_options(int *flags, 
QemuOpts *opts)
 *flags &= ~BDRV_O_CACHE_MASK;
 
 assert(qemu_opt_find(opts, BDRV_OPT_CACHE_NO_FLUSH));
-if (qemu_opt_get_bool(opts, BDRV_OPT_CACHE_NO_FLUSH, false)) {
+if (qemu_opt_get_bool_del(opts, BDRV_OPT_CACHE_NO_FLUSH, false)) {
 *flags |= BDRV_O_NO_FLUSH;
 }
 
 assert(qemu_opt_find(opts, BDRV_OPT_CACHE_DIRECT));
-if (qemu_opt_get_bool(opts, BDRV_OPT_CACHE_DIRECT, false)) {
+if (qemu_opt_get_bool_del(opts, BDRV_OPT_CACHE_DIRECT, false)) {
 *flags |= BDRV_O_NOCACHE;
 }
 
 *flags &= ~BDRV_O_RDWR;
 
 assert(qemu_opt_find(opts, BDRV_OPT_READ_ONLY));
-if (!qemu_opt_get_bool(opts, BDRV_OPT_READ_ONLY, false)) {
+if (!qemu_opt_get_bool_del(opts, BDRV_OPT_READ_ONLY, false)) {
 *flags |= BDRV_O_RDWR;
 }
 
@@ -3156,7 +3156,6 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, 
BlockReopenQueue *queue,
 BlockDriver *drv;
 QemuOpts *opts;
 QDict *orig_reopen_opts;
-const char *value;
 bool read_only;
 
 assert(reopen_state != NULL);
@@ -3179,17 +3178,10 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, 
BlockReopenQueue *queue,
 
 update_flags_from_options(&reopen_state->flags, opts);
 
-/* node-name and driver must be unchanged. Put them back into the QDict, so
- * that they are checked at the end of this function. */
-value = qemu_opt_get(opts, "node-name");
-if (value) {
-qdict_put_str(reopen_state->options, "node-name", value);
-}
-
-value = qemu_opt_get(opts, "driver");
-if (value) {
-qdict_put_str(reopen_state->options, "driver", value);
-}
+/* All other options (including node-name and driver) must be unchanged.
+ * Put them back into the QDict, so that they are checked at the end
+ * of this function. */
+qemu_opts_to_qdict(opts, reopen_state->options);
 
 /* If we are to stay read-only, do not allow permission change
  * to r/w. Attempting to set to r/w may fail if either BDRV_O_ALLOW_RDWR is
-- 
2.13.6




[Qemu-devel] [PULL 11/23] qcow2: Options' documentation fixes

2018-10-01 Thread Kevin Wolf
From: Leonid Bloch 

Signed-off-by: Leonid Bloch 
Reviewed-by: Alberto Garcia 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 docs/qcow2-cache.txt | 21 ++---
 qemu-options.hx  |  9 ++---
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
index 8a09a5cc5f..7e28b41bd3 100644
--- a/docs/qcow2-cache.txt
+++ b/docs/qcow2-cache.txt
@@ -79,14 +79,14 @@ Choosing the right cache sizes
 In order to choose the cache sizes we need to know how they relate to
 the amount of allocated space.
 
-The amount of virtual disk that can be mapped by the L2 and refcount
+The part of the virtual disk that can be mapped by the L2 and refcount
 caches (in bytes) is:
 
disk_size = l2_cache_size * cluster_size / 8
disk_size = refcount_cache_size * cluster_size * 8 / refcount_bits
 
 With the default values for cluster_size (64KB) and refcount_bits
-(16), that is
+(16), this becomes:
 
disk_size = l2_cache_size * 8192
disk_size = refcount_cache_size * 32768
@@ -97,12 +97,16 @@ need:
l2_cache_size = disk_size_GB * 131072
refcount_cache_size = disk_size_GB * 32768
 
-QEMU has a default L2 cache of 1MB (1048576 bytes) and a refcount
-cache of 256KB (262144 bytes), so using the formulas we've just seen
-we have
+For example, 1MB of L2 cache is needed to cover every 8 GB of the virtual
+image size (given that the default cluster size is used):
 
-   1048576 / 131072 = 8 GB of virtual disk covered by that cache
-262144 /  32768 = 8 GB
+   8 GB / 8192 = 1 MB
+
+The refcount cache is 4 times the cluster size by default. With the default
+cluster size of 64 KB, it is 256 KB (262144 bytes). This is sufficient for
+8 GB of image size:
+
+   262144 * 32768 = 8 GB
 
 
 How to configure the cache sizes
@@ -130,6 +134,9 @@ There are a few things that need to be taken into account:
memory as possible to the L2 cache before increasing the refcount
cache size.
 
+ - At most two of "l2-cache-size", "refcount-cache-size", and "cache-size"
+   can be set simultaneously.
+
 Unlike L2 tables, refcount blocks are not used during normal I/O but
 only during allocations and internal snapshots. In most cases they are
 accessed sequentially (even during random guest I/O) so increasing the
diff --git a/qemu-options.hx b/qemu-options.hx
index a642ad297f..2db6247eff 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -732,15 +732,18 @@ image file)
 
 @item cache-size
 The maximum total size of the L2 table and refcount block caches in bytes
-(default: 1048576 bytes or 8 clusters, whichever is larger)
+(default: the sum of l2-cache-size and refcount-cache-size)
 
 @item l2-cache-size
 The maximum size of the L2 table cache in bytes
-(default: 4/5 of the total cache size)
+(default: if cache-size is not defined - 1048576 bytes or 8 clusters, whichever
+is larger; otherwise, as large as possible or needed within the cache-size,
+while permitting the requested or the minimal refcount cache size)
 
 @item refcount-cache-size
 The maximum size of the refcount block cache in bytes
-(default: 1/5 of the total cache size)
+(default: 4 times the cluster size; or if cache-size is specified, the part of
+it which is not used for the L2 cache)
 
 @item cache-clean-interval
 Clean unused entries in the L2 and refcount caches. The interval is in seconds.
-- 
2.13.6




[Qemu-devel] [PULL 09/23] block: Allow changing 'discard' on reopen

2018-10-01 Thread Kevin Wolf
From: Alberto Garcia 

'discard' is one of the basic BlockdevOptions available for all
drivers, but it's not handled by bdrv_reopen_prepare() so any attempt
to change it results in an error:

   (qemu) qemu-io virtio0 "reopen -o discard=on"
   Cannot change the option 'discard'

Since there's no reason why we shouldn't allow changing it and the
implementation is simple let's just do it.

Signed-off-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 block.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/block.c b/block.c
index e36ae3e8d1..7fb267bd1f 100644
--- a/block.c
+++ b/block.c
@@ -3156,6 +3156,7 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, 
BlockReopenQueue *queue,
 BlockDriver *drv;
 QemuOpts *opts;
 QDict *orig_reopen_opts;
+char *discard = NULL;
 bool read_only;
 
 assert(reopen_state != NULL);
@@ -3178,6 +3179,15 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, 
BlockReopenQueue *queue,
 
 update_flags_from_options(&reopen_state->flags, opts);
 
+discard = qemu_opt_get_del(opts, "discard");
+if (discard != NULL) {
+if (bdrv_parse_discard_flags(discard, &reopen_state->flags) != 0) {
+error_setg(errp, "Invalid discard option");
+ret = -EINVAL;
+goto error;
+}
+}
+
 /* All other options (including node-name and driver) must be unchanged.
  * Put them back into the QDict, so that they are checked at the end
  * of this function. */
@@ -3291,6 +3301,7 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, 
BlockReopenQueue *queue,
 error:
 qemu_opts_del(opts);
 qobject_unref(orig_reopen_opts);
+g_free(discard);
 return ret;
 }
 
-- 
2.13.6




[Qemu-devel] [PULL 04/23] block: Remove child references from bs->{options, explicit_options}

2018-10-01 Thread Kevin Wolf
From: Alberto Garcia 

Block drivers allow opening their children using a reference to an
existing BlockDriverState. These references remain stored in the
'options' and 'explicit_options' QDicts, but we don't need to keep
them once everything is open.

What is more important, these values can become wrong if the children
change:

$ qemu-img create -f qcow2 hd0.qcow2 10M
$ qemu-img create -f qcow2 hd1.qcow2 10M
$ qemu-img create -f qcow2 hd2.qcow2 10M
$ $QEMU -drive if=none,file=hd0.qcow2,node-name=hd0 \
-drive if=none,file=hd1.qcow2,node-name=hd1,backing=hd0 \
-drive file=hd2.qcow2,node-name=hd2,backing=hd1

After this hd2 has hd1 as its backing file. Now let's remove it using
block_stream:

(qemu) block_stream hd2 0 hd0.qcow2

Now hd0 is the backing file of hd2, but hd2's options QDicts still
contain backing=hd1.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 block.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index c298ca6a19..db71ea5b9a 100644
--- a/block.c
+++ b/block.c
@@ -2763,12 +2763,15 @@ static BlockDriverState *bdrv_open_inherit(const char 
*filename,
 }
 }
 
-/* Remove all children options from bs->options and bs->explicit_options */
+/* Remove all children options and references
+ * from bs->options and bs->explicit_options */
 QLIST_FOREACH(child, &bs->children, next) {
 char *child_key_dot;
 child_key_dot = g_strdup_printf("%s.", child->name);
 qdict_extract_subqdict(bs->explicit_options, NULL, child_key_dot);
 qdict_extract_subqdict(bs->options, NULL, child_key_dot);
+qdict_del(bs->explicit_options, child->name);
+qdict_del(bs->options, child->name);
 g_free(child_key_dot);
 }
 
@@ -3290,6 +3293,7 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
 {
 BlockDriver *drv;
 BlockDriverState *bs;
+BdrvChild *child;
 bool old_can_write, new_can_write;
 
 assert(reopen_state != NULL);
@@ -3314,6 +3318,13 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
 bs->open_flags = reopen_state->flags;
 bs->read_only = !(reopen_state->flags & BDRV_O_RDWR);
 
+/* Remove child references from bs->options and bs->explicit_options.
+ * Child options were already removed in bdrv_reopen_queue_child() */
+QLIST_FOREACH(child, &bs->children, next) {
+qdict_del(bs->explicit_options, child->name);
+qdict_del(bs->options, child->name);
+}
+
 bdrv_refresh_limits(bs, NULL);
 
 bdrv_set_perm(reopen_state->bs, reopen_state->perm,
-- 
2.13.6




[Qemu-devel] [PULL 08/23] file-posix: Forbid trying to change unsupported options during reopen

2018-10-01 Thread Kevin Wolf
From: Alberto Garcia 

The file-posix code is used for the "file", "host_device" and
"host_cdrom" drivers, and it allows reopening images. However the only
option that is actually processed is "x-check-cache-dropped", and
changes in all other options (e.g. "filename") are silently ignored:

   (qemu) qemu-io virtio0 "reopen -o file.filename=no-such-file"

While we could allow changing some of the other options, let's keep
things as they are for now but return an error if the user tries to
change any of them.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 block/file-posix.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index bc5e54560a..2da3a76355 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -849,8 +849,13 @@ static int raw_reopen_prepare(BDRVReopenState *state,
 goto out;
 }
 
-rs->check_cache_dropped = qemu_opt_get_bool(opts, "x-check-cache-dropped",
-false);
+rs->check_cache_dropped =
+qemu_opt_get_bool_del(opts, "x-check-cache-dropped", false);
+
+/* This driver's reopen function doesn't currently allow changing
+ * other options, so let's put them back in the original QDict and
+ * bdrv_reopen_prepare() will detect changes and complain. */
+qemu_opts_to_qdict(opts, state->options);
 
 if (s->type == FTYPE_CD) {
 rs->open_flags |= O_NONBLOCK;
-- 
2.13.6




[Qemu-devel] [PULL 19/23] qcow2: Explicit number replaced by a constant

2018-10-01 Thread Kevin Wolf
From: Leonid Bloch 

Signed-off-by: Leonid Bloch 
Reviewed-by: Alberto Garcia 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 block/qcow2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 95e1c98daa..7277feda13 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1324,7 +1324,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState 
*bs, QDict *options,
 /* 2^(s->refcount_order - 3) is the refcount width in bytes */
 s->refcount_block_bits = s->cluster_bits - (s->refcount_order - 3);
 s->refcount_block_size = 1 << s->refcount_block_bits;
-bs->total_sectors = header.size / 512;
+bs->total_sectors = header.size / BDRV_SECTOR_SIZE;
 s->csize_shift = (62 - (s->cluster_bits - 8));
 s->csize_mask = (1 << (s->cluster_bits - 8)) - 1;
 s->cluster_offset_mask = (1LL << s->csize_shift) - 1;
@@ -3450,7 +3450,7 @@ static int coroutine_fn 
qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
 goto fail;
 }
 
-old_length = bs->total_sectors * 512;
+old_length = bs->total_sectors * BDRV_SECTOR_SIZE;
 new_l1_size = size_to_l1(s, offset);
 
 if (offset < old_length) {
-- 
2.13.6




[Qemu-devel] [PULL 10/23] block: Allow changing 'detect-zeroes' on reopen

2018-10-01 Thread Kevin Wolf
From: Alberto Garcia 

'detect-zeroes' is one of the basic BlockdevOptions available for all
drivers, but it's not handled by bdrv_reopen_prepare(), so any attempt
to change it results in an error:

   (qemu) qemu-io virtio0 "reopen -o detect-zeroes=on"
   Cannot change the option 'detect-zeroes'

Since there's no reason why we shouldn't allow changing it and the
implementation is simple let's just do it.

Signed-off-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 include/block/block.h |  1 +
 block.c   | 64 ---
 2 files changed, 41 insertions(+), 24 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 4edc1e8afa..b189cf422e 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -184,6 +184,7 @@ typedef QSIMPLEQ_HEAD(BlockReopenQueue, 
BlockReopenQueueEntry) BlockReopenQueue;
 typedef struct BDRVReopenState {
 BlockDriverState *bs;
 int flags;
+BlockdevDetectZeroesOptions detect_zeroes;
 uint64_t perm, shared_perm;
 QDict *options;
 QDict *explicit_options;
diff --git a/block.c b/block.c
index 7fb267bd1f..7710b399a3 100644
--- a/block.c
+++ b/block.c
@@ -764,6 +764,31 @@ static void bdrv_join_options(BlockDriverState *bs, QDict 
*options,
 }
 }
 
+static BlockdevDetectZeroesOptions bdrv_parse_detect_zeroes(QemuOpts *opts,
+int open_flags,
+Error **errp)
+{
+Error *local_err = NULL;
+char *value = qemu_opt_get_del(opts, "detect-zeroes");
+BlockdevDetectZeroesOptions detect_zeroes =
+qapi_enum_parse(&BlockdevDetectZeroesOptions_lookup, value,
+BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF, &local_err);
+g_free(value);
+if (local_err) {
+error_propagate(errp, local_err);
+return detect_zeroes;
+}
+
+if (detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP &&
+!(open_flags & BDRV_O_UNMAP))
+{
+error_setg(errp, "setting detect-zeroes to unmap is not allowed "
+   "without setting discard operation to unmap");
+}
+
+return detect_zeroes;
+}
+
 /**
  * Set open flags for a given discard mode
  *
@@ -1328,7 +1353,6 @@ static int bdrv_open_common(BlockDriverState *bs, 
BlockBackend *file,
 const char *driver_name = NULL;
 const char *node_name = NULL;
 const char *discard;
-const char *detect_zeroes;
 QemuOpts *opts;
 BlockDriver *drv;
 Error *local_err = NULL;
@@ -1417,29 +1441,12 @@ static int bdrv_open_common(BlockDriverState *bs, 
BlockBackend *file,
 }
 }
 
-detect_zeroes = qemu_opt_get(opts, "detect-zeroes");
-if (detect_zeroes) {
-BlockdevDetectZeroesOptions value =
-qapi_enum_parse(&BlockdevDetectZeroesOptions_lookup,
-detect_zeroes,
-BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF,
-&local_err);
-if (local_err) {
-error_propagate(errp, local_err);
-ret = -EINVAL;
-goto fail_opts;
-}
-
-if (value == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP &&
-!(bs->open_flags & BDRV_O_UNMAP))
-{
-error_setg(errp, "setting detect-zeroes to unmap is not allowed "
- "without setting discard operation to unmap");
-ret = -EINVAL;
-goto fail_opts;
-}
-
-bs->detect_zeroes = value;
+bs->detect_zeroes =
+bdrv_parse_detect_zeroes(opts, bs->open_flags, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+ret = -EINVAL;
+goto fail_opts;
 }
 
 if (filename != NULL) {
@@ -3188,6 +3195,14 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, 
BlockReopenQueue *queue,
 }
 }
 
+reopen_state->detect_zeroes =
+bdrv_parse_detect_zeroes(opts, reopen_state->flags, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+ret = -EINVAL;
+goto error;
+}
+
 /* All other options (including node-name and driver) must be unchanged.
  * Put them back into the QDict, so that they are checked at the end
  * of this function. */
@@ -3338,6 +3353,7 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
 bs->options= reopen_state->options;
 bs->open_flags = reopen_state->flags;
 bs->read_only = !(reopen_state->flags & BDRV_O_RDWR);
+bs->detect_zeroes  = reopen_state->detect_zeroes;
 
 /* Remove child references from bs->options and bs->explicit_options.
  * Child options were already removed in bdrv_reopen_queue_child() */
-- 
2.13.6




[Qemu-devel] [PULL 17/23] qcow2: Resize the cache upon image resizing

2018-10-01 Thread Kevin Wolf
From: Leonid Bloch 

The caches are now recalculated upon image resizing. This is done
because the new default behavior of assigning L2 cache relatively to
the image size, implies that the cache will be adapted accordingly
after an image resize.

Signed-off-by: Leonid Bloch 
Reviewed-by: Alberto Garcia 
Signed-off-by: Kevin Wolf 
---
 block/qcow2.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 589f6c1b1c..20b5093269 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3418,6 +3418,7 @@ static int coroutine_fn 
qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
 uint64_t old_length;
 int64_t new_l1_size;
 int ret;
+QDict *options;
 
 if (prealloc != PREALLOC_MODE_OFF && prealloc != PREALLOC_MODE_METADATA &&
 prealloc != PREALLOC_MODE_FALLOC && prealloc != PREALLOC_MODE_FULL)
@@ -3642,6 +3643,8 @@ static int coroutine_fn 
qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
 }
 }
 
+bs->total_sectors = offset / BDRV_SECTOR_SIZE;
+
 /* write updated header.size */
 offset = cpu_to_be64(offset);
 ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, size),
@@ -3652,6 +3655,14 @@ static int coroutine_fn 
qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
 }
 
 s->l1_vm_state_index = new_l1_size;
+
+/* Update cache sizes */
+options = qdict_clone_shallow(bs->options);
+ret = qcow2_update_options(bs, options, s->flags, errp);
+qobject_unref(options);
+if (ret < 0) {
+goto fail;
+}
 ret = 0;
 fail:
 qemu_co_mutex_unlock(&s->lock);
-- 
2.13.6




[Qemu-devel] [PULL 01/23] file-posix: Include filename in locking error message

2018-10-01 Thread Kevin Wolf
From: Fam Zheng 

Image locking errors happening at device initialization time doesn't say
which file cannot be locked, for instance,

-device scsi-disk,drive=drive-1: Failed to get shared "write" lock
Is another process using the image?

could refer to either the overlay image or its backing image.

Hoist the error_append_hint to the caller of raw_check_lock_bytes where
file name is known, and include it in the error hint.

Signed-off-by: Fam Zheng 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 block/file-posix.c | 10 +++---
 tests/qemu-iotests/153.out | 76 +++---
 tests/qemu-iotests/182.out |  2 +-
 3 files changed, 45 insertions(+), 43 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index fe83cbf0eb..327f39ca45 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -741,8 +741,6 @@ static int raw_check_lock_bytes(int fd, uint64_t perm, 
uint64_t shared_perm,
"Failed to get \"%s\" lock",
perm_name);
 g_free(perm_name);
-error_append_hint(errp,
-  "Is another process using the image?\n");
 return ret;
 }
 }
@@ -758,8 +756,6 @@ static int raw_check_lock_bytes(int fd, uint64_t perm, 
uint64_t shared_perm,
"Failed to get shared \"%s\" lock",
perm_name);
 g_free(perm_name);
-error_append_hint(errp,
-  "Is another process using the image?\n");
 return ret;
 }
 }
@@ -796,6 +792,9 @@ static int raw_handle_perm_lock(BlockDriverState *bs,
 if (!ret) {
 return 0;
 }
+error_append_hint(errp,
+  "Is another process using the image [%s]?\n",
+  bs->filename);
 }
 op = RAW_PL_ABORT;
 /* fall through to unlock bytes. */
@@ -2217,6 +2216,9 @@ raw_co_create(BlockdevCreateOptions *options, Error 
**errp)
 /* Step two: Check that nobody else has taken conflicting locks */
 result = raw_check_lock_bytes(fd, perm, shared, errp);
 if (result < 0) {
+error_append_hint(errp,
+  "Is another process using the image [%s]?\n",
+  file_opts->filename);
 goto out_unlock;
 }
 
diff --git a/tests/qemu-iotests/153.out b/tests/qemu-iotests/153.out
index 93eaf10486..884254868c 100644
--- a/tests/qemu-iotests/153.out
+++ b/tests/qemu-iotests/153.out
@@ -12,11 +12,11 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=33554432 
backing_file=TEST_DIR/t
 
 == Launching another QEMU, opts: '' ==
 QEMU_PROG: -drive file=TEST_DIR/t.qcow2,if=none,: Failed to get "write" lock
-Is another process using the image?
+Is another process using the image [TEST_DIR/t.qcow2]?
 
 == Launching another QEMU, opts: 'read-only=on' ==
 QEMU_PROG: -drive file=TEST_DIR/t.qcow2,if=none,read-only=on: Failed to get 
shared "write" lock
-Is another process using the image?
+Is another process using the image [TEST_DIR/t.qcow2]?
 
 == Launching another QEMU, opts: 'read-only=on,force-share=on' ==
 
@@ -24,77 +24,77 @@ Is another process using the image?
 
 _qemu_io_wrapper -c read 0 512 TEST_DIR/t.qcow2
 can't open device TEST_DIR/t.qcow2: Failed to get "write" lock
-Is another process using the image?
+Is another process using the image [TEST_DIR/t.qcow2]?
 
 _qemu_io_wrapper -r -c read 0 512 TEST_DIR/t.qcow2
 can't open device TEST_DIR/t.qcow2: Failed to get shared "write" lock
-Is another process using the image?
+Is another process using the image [TEST_DIR/t.qcow2]?
 
 _qemu_io_wrapper -c open  TEST_DIR/t.qcow2 -c read 0 512
 can't open device TEST_DIR/t.qcow2: Failed to get "write" lock
-Is another process using the image?
+Is another process using the image [TEST_DIR/t.qcow2]?
 no file open, try 'help open'
 
 _qemu_io_wrapper -c open -r  TEST_DIR/t.qcow2 -c read 0 512
 can't open device TEST_DIR/t.qcow2: Failed to get shared "write" lock
-Is another process using the image?
+Is another process using the image [TEST_DIR/t.qcow2]?
 no file open, try 'help open'
 
 _qemu_img_wrapper info TEST_DIR/t.qcow2
 qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get shared "write" lock
-Is another process using the image?
+Is another process using the image [TEST_DIR/t.qcow2]?
 
 _qemu_img_wrapper check TEST_DIR/t.qcow2
 qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get shared "write" lock
-Is another process using the image?
+Is another process using the image [TEST_DIR/t.qcow2]?
 
 _qemu_img_wrapper compare TEST_DIR/t.qcow2 TEST_DIR/t.qcow2
 qemu-img: Could not open 'TEST_DIR/t.qcow2': Failed to get shared "write" lock
-Is another process using the image?
+Is another process using the image [TEST_DIR/t.qcow2]?
 
 _qemu_img_wrapper map TEST_DIR/t.qcow2
 qe

[Qemu-devel] [PULL 03/23] file-posix: x-check-cache-dropped should default to false on reopen

2018-10-01 Thread Kevin Wolf
From: Alberto Garcia 

The default value of x-check-cache-dropped is false. There's no reason
to use the previous value as a default in raw_reopen_prepare() because
bdrv_reopen_queue_child() already takes care of putting the old
options in the BDRVReopenState.options QDict.

If x-check-cache-dropped was previously set but is now missing from
the reopen QDict then it should be reset to false.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 block/file-posix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 327f39ca45..bc5e54560a 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -850,7 +850,7 @@ static int raw_reopen_prepare(BDRVReopenState *state,
 }
 
 rs->check_cache_dropped = qemu_opt_get_bool(opts, "x-check-cache-dropped",
-s->check_cache_dropped);
+false);
 
 if (s->type == FTYPE_CD) {
 rs->open_flags |= O_NONBLOCK;
-- 
2.13.6




[Qemu-devel] [PULL 06/23] block: Allow child references on reopen

2018-10-01 Thread Kevin Wolf
From: Alberto Garcia 

In the previous patches we removed all child references from
bs->{options,explicit_options} because keeping them is useless and
wrong.

Because of this, any attempt to reopen a BlockDriverState using a
child reference as one of its options would result in a failure,
because bdrv_reopen_prepare() would detect that there's a new option
(the child reference) that wasn't present in bs->options.

But passing child references on reopen can be useful. It's a way to
specify a BDS's child without having to pass recursively all of the
child's options, and if the reference points to a different BDS then
this can allow us to replace the child.

However, replacing the child is something that needs to be implemented
case by case and only when it makes sense. For now, this patch allows
passing a child reference as long as it points to the current child of
the BlockDriverState.

It's also important to remember that, as a consequence of the
previous patches, this child reference will be removed from
bs->{options,explicit_options} after the reopening has been completed.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 block.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/block.c b/block.c
index 98aca56785..ff1aded4b8 100644
--- a/block.c
+++ b/block.c
@@ -3242,6 +3242,24 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, 
BlockReopenQueue *queue,
 QObject *new = entry->value;
 QObject *old = qdict_get(reopen_state->bs->options, entry->key);
 
+/* Allow child references (child_name=node_name) as long as they
+ * point to the current child (i.e. everything stays the same). */
+if (qobject_type(new) == QTYPE_QSTRING) {
+BdrvChild *child;
+QLIST_FOREACH(child, &reopen_state->bs->children, next) {
+if (!strcmp(child->name, entry->key)) {
+break;
+}
+}
+
+if (child) {
+const char *str = qobject_get_try_str(new);
+if (!strcmp(child->bs->node_name, str)) {
+continue; /* Found child with this name, skip option */
+}
+}
+}
+
 /*
  * TODO: When using -drive to specify blockdev options, all values
  * will be strings; however, when using -blockdev, blockdev-add or
-- 
2.13.6




[Qemu-devel] [PULL 05/23] block: Don't look for child references in append_open_options()

2018-10-01 Thread Kevin Wolf
From: Alberto Garcia 

In the previous patch we removed child references from bs->options, so
there's no need to look for them here anymore.

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 block.c | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/block.c b/block.c
index db71ea5b9a..98aca56785 100644
--- a/block.c
+++ b/block.c
@@ -5150,23 +5150,12 @@ static bool append_open_options(QDict *d, 
BlockDriverState *bs)
 {
 const QDictEntry *entry;
 QemuOptDesc *desc;
-BdrvChild *child;
 bool found_any = false;
 
 for (entry = qdict_first(bs->options); entry;
  entry = qdict_next(bs->options, entry))
 {
-/* Exclude node-name references to children */
-QLIST_FOREACH(child, &bs->children, next) {
-if (!strcmp(entry->key, child->name)) {
-break;
-}
-}
-if (child) {
-continue;
-}
-
-/* And exclude all non-driver-specific options */
+/* Exclude all non-driver-specific options */
 for (desc = bdrv_runtime_opts.desc; desc->name; desc++) {
 if (!strcmp(qdict_entry_key(entry), desc->name)) {
 break;
-- 
2.13.6




[Qemu-devel] [PULL 00/23] Block layer patches

2018-10-01 Thread Kevin Wolf
The following changes since commit 07f426c35eddd79388a23d11cb278600d7e3831d:

  Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20180926' into 
staging (2018-09-28 18:56:09 +0100)

are available in the git repository at:

  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to dd353157942a59c21da07da5ac8749a871f7c3ed:

  tests/test-bdrv-drain: Fix too late qemu_event_reset() (2018-10-01 19:13:55 
+0200)


Block layer patches:

- qcow2 cache option default changes (Linux: 32 MB maximum, limited by
  whatever cache size can be made use of with the specific image;
  default cache-clean-interval of 10 minutes)
- reopen: Allow specifying unchanged child node references, and changing
  a few generic options (discard, detect-zeroes)
- Fix werror/rerror defaults for -device drive=
- Test case fixes


Alberto Garcia (9):
  qemu-io: Fix writethrough check in reopen
  file-posix: x-check-cache-dropped should default to false on reopen
  block: Remove child references from bs->{options,explicit_options}
  block: Don't look for child references in append_open_options()
  block: Allow child references on reopen
  block: Forbid trying to change unsupported options during reopen
  file-posix: Forbid trying to change unsupported options during reopen
  block: Allow changing 'discard' on reopen
  block: Allow changing 'detect-zeroes' on reopen

Fam Zheng (1):
  file-posix: Include filename in locking error message

Kevin Wolf (3):
  block-backend: Set werror/rerror defaults in blk_new()
  test-replication: Lock AioContext around blk_unref()
  tests/test-bdrv-drain: Fix too late qemu_event_reset()

Leonid Bloch (10):
  qcow2: Options' documentation fixes
  include: Add a lookup table of sizes
  qcow2: Make sizes more humanly readable
  qcow2: Avoid duplication in setting the refcount cache size
  qcow2: Assign the L2 cache relatively to the image size
  qcow2: Increase the default upper limit on the L2 cache size
  qcow2: Resize the cache upon image resizing
  qcow2: Set the default cache-clean-interval to 10 minutes
  qcow2: Explicit number replaced by a constant
  qcow2: Fix cache-clean-interval documentation

 qapi/block-core.json   |   4 +-
 docs/qcow2-cache.txt   |  59 
 block/qcow2.h  |  19 ---
 include/block/block.h  |   1 +
 include/qemu/units.h   |  55 ++
 block.c| 135 +
 block/block-backend.c  |   3 +
 block/file-posix.c |  19 +--
 block/qcow2.c  |  43 +--
 qemu-io-cmds.c |   2 +-
 tests/test-bdrv-drain.c|   4 +-
 tests/test-replication.c   |  11 
 qemu-options.hx|  12 ++--
 tests/qemu-iotests/067.out |   1 +
 tests/qemu-iotests/137 |   8 ++-
 tests/qemu-iotests/137.out |   4 +-
 tests/qemu-iotests/153.out |  76 -
 tests/qemu-iotests/182.out |   2 +-
 18 files changed, 307 insertions(+), 151 deletions(-)



[Qemu-devel] [PULL 02/23] qemu-io: Fix writethrough check in reopen

2018-10-01 Thread Kevin Wolf
From: Alberto Garcia 

"qemu-io reopen" doesn't allow changing the writethrough setting of
the cache, but the check is wrong, causing an error even on a simple
reopen with the default parameters:

   $ qemu-img create -f qcow2 hd.qcow2 1M
   $ qemu-system-x86_64 -monitor stdio -drive if=virtio,file=hd.qcow2
   (qemu) qemu-io virtio0 reopen
   Cannot change cache.writeback: Device attached

Signed-off-by: Alberto Garcia 
Reviewed-by: Max Reitz 
Signed-off-by: Kevin Wolf 
---
 qemu-io-cmds.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 5bf5f28178..db0b3ee5ef 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -2025,7 +2025,7 @@ static int reopen_f(BlockBackend *blk, int argc, char 
**argv)
 return -EINVAL;
 }
 
-if (writethrough != blk_enable_write_cache(blk) &&
+if (!writethrough != blk_enable_write_cache(blk) &&
 blk_get_attached_dev(blk))
 {
 error_report("Cannot change cache.writeback: Device attached");
-- 
2.13.6




Re: [Qemu-devel] [PULL v3 00/79] Misc patches for 2018-09-30

2018-10-01 Thread Peter Maydell
On 1 October 2018 at 16:29, Paolo Bonzini  wrote:
> The following changes since commit 042938f46e1c477419d1931381fdadffaa49d45e:
>
>   Merge remote-tracking branch 
> 'remotes/dgilbert/tags/pull-migration-20180926a' into staging (2018-09-28 
> 17:07:23 +0100)
>
> are available in the Git repository at:
>
>   git://github.com/bonzini/qemu.git tags/for-upstream
>
> for you to fetch changes up to 72d09800e7b85fa1a14b88d341ca8c042d236cba:
>
>   hw/scsi/mptendian: Avoid taking address of fields in packed structs 
> (2018-10-01 17:28:02 +0200)
>
> 
> * configure fix for environment variables (Daniel)
> * fix memory leaks (Alex)
> * x86_64 MTTCG fixes (Emilio)
> * introduce atomic64 (Emilio)
> * Fix for virtio hang (Fam, myself)
> * SH serial port fix (Geert)
> * Deprecate rotation_rate for scsi-block (Fam)
> * Extend memory-backend-file availability to all POSIX hosts (Hikaru)
> * Memory API cleanups and fixes (Igor, Li Qiang, Peter, Philippe)
> * MSI/IOMMU fix (Jan)
> * Socket reconnection fixes (Marc-André)
> * icount fixes (Emilio, myself)
> * QSP fixes for Coverity (myself)
> * Some record/replay improovements (Pavel)
> * Packed struct fixes (Peter)
> * Windows dump fixes and elf2dmp (Viktor)
> * kvmclock fix (Yongji)
>
> 

Hi; compile failures due to format string issues, I'm afraid:

/home/petmay01/qemu-for-merges/hw/mips/mips_r4k.c: In function 'load_kernel':
/home/petmay01/qemu-for-merges/hw/mips/mips_r4k.c:139:47: error:
format '%li' expects argument of type 'long int', but argument 5 has
type 'int64_t {aka long long int}' [-Werror=format=]
 snprintf((char *)params_buf + 8, 256, "rd_start=0x%" PRIx64 "
rd_size=%li %s",
   ^
cc1: all warnings being treated as errors

/home/petmay01/qemu-for-merges/hw/mips/mips_malta.c: In function 'load_kernel':
/home/petmay01/qemu-for-merges/hw/mips/mips_malta.c:1073:42: error:
format '%li' expects argument of type 'long int', but argument 5 has
type 'int64_t {aka long long int}' [-Werror=format=]
 prom_set(prom_buf, prom_index++, "rd_start=0x%" PRIx64 "
rd_size=%li %s",
  ^

I also saw this in the freebsd VM test:

MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
gtester -k --verbose -m=quick tests/test-char
TEST: tests/test-char... (pid=15938)
  /char/null:  OK
  /char/invalid:   OK
  /char/ringbuf:   OK
  /char/mux:   OK
  /char/stdio: OK
  /char/pipe:  OK
  /char/file:  OK
  /char/file-fifo: OK
  /char/udp:   OK
  /char/serial:OK
  /char/hotswap:   OK
  /char/socket/basic:  OK
  /char/socket/reconnect:  FAIL
GTester: last random seed: R02S615917be6302ea7de5d0fc9c58149953
(pid=16010)
  /char/socket/fdpass: OK
FAIL: tests/test-char
**
ERROR:tests/test-char.c:353:char_socket_test_common: assertion failed:
(object_property_get_bool(OBJECT(chr_client), "connected",
&error_abort))
gmake: *** [/var/tmp/qemu-test.YRve5C/tests/Makefile.include:925:
check-tests/test-char] Error 1


Haven't seen that before, so maybe it's something in this patchset;
it could be a pre-existing intermittent though I suppose.

thanks
-- PMM



Re: [Qemu-devel] [PATCH v2 06/15] gdbstub: add multiprocess support to (f|s)ThreadInfo and ThreadExtraInfo

2018-10-01 Thread Philippe Mathieu-Daudé



On 01/10/2018 13:56, Luc Michel wrote:
> Change the thread info related packets handling to support multiprocess
> extension.
> 
> Add the CPUs class name in the extra info to help differentiate
> them in multiprocess mode.
> 
> Signed-off-by: Luc Michel 
> ---
>  gdbstub.c | 32 ++--
>  1 file changed, 22 insertions(+), 10 deletions(-)
> 
> diff --git a/gdbstub.c b/gdbstub.c
> index 3242f0d261..63d7913269 100644
> --- a/gdbstub.c
> +++ b/gdbstub.c
> @@ -1248,11 +1248,10 @@ out:
>  static int gdb_handle_packet(GDBState *s, const char *line_buf)
>  {
>  CPUState *cpu;
>  CPUClass *cc;
>  const char *p;
> -uint32_t thread;
>  uint32_t pid, tid;
>  int ch, reg_size, type, res;
>  uint8_t mem_buf[MAX_PACKET_LENGTH];
>  char buf[sizeof(mem_buf) + 1 /* trailing NUL */];
>  char thread_id[16];
> @@ -1540,30 +1539,43 @@ static int gdb_handle_packet(GDBState *s, const char 
> *line_buf)
>  snprintf(buf, sizeof(buf), "QC%s",
>   gdb_fmt_thread_id(s, cpu, thread_id, 
> sizeof(thread_id)));
>  put_packet(s, buf);
>  break;
>  } else if (strcmp(p,"fThreadInfo") == 0) {
> -s->query_cpu = first_cpu;
> +s->query_cpu = gdb_first_cpu(s);
>  goto report_cpuinfo;
>  } else if (strcmp(p,"sThreadInfo") == 0) {
>  report_cpuinfo:
>  if (s->query_cpu) {
> -snprintf(buf, sizeof(buf), "m%x", 
> cpu_gdb_index(s->query_cpu));
> +snprintf(buf, sizeof(buf), "m%s",
> + gdb_fmt_thread_id(s, s->query_cpu,
> +   thread_id, sizeof(thread_id)));
>  put_packet(s, buf);
> -s->query_cpu = CPU_NEXT(s->query_cpu);
> +s->query_cpu = gdb_next_cpu(s, s->query_cpu);
>  } else
>  put_packet(s, "l");
>  break;
>  } else if (strncmp(p,"ThreadExtraInfo,", 16) == 0) {
> -thread = strtoull(p+16, (char **)&p, 16);
> -cpu = find_cpu(thread);
> +read_thread_id(p + 16, &p, &pid, &tid);

Check similar to patch #3:

if (read_thread_id(p, &p, &pid, &tid) == GDB_READ_THREAD_ERR) {
put_packet(s, "E22");
break;
}

With equivalent change:
Reviewed-by: Philippe Mathieu-Daudé 

> +cpu = gdb_get_cpu(s, pid, tid);
>  if (cpu != NULL) {
>  cpu_synchronize_state(cpu);
> -/* memtohex() doubles the required space */
> -len = snprintf((char *)mem_buf, sizeof(buf) / 2,
> -   "CPU#%d [%s]", cpu->cpu_index,
> -   cpu->halted ? "halted " : "running");
> +
> +if (s->multiprocess && (s->process_num > 1)) {
> +/* Print the CPU model in multiprocess mode */
> +ObjectClass *oc = object_get_class(OBJECT(cpu));
> +const char *cpu_model = object_class_get_name(oc);
> +len = snprintf((char *)mem_buf, sizeof(buf) / 2,
> +   "CPU#%d %s [%s]", cpu->cpu_index,
> +   cpu_model,
> +   cpu->halted ? "halted " : "running");
> +} else {
> +/* memtohex() doubles the required space */
> +len = snprintf((char *)mem_buf, sizeof(buf) / 2,
> +   "CPU#%d [%s]", cpu->cpu_index,
> +   cpu->halted ? "halted " : "running");
> +}
>  trace_gdbstub_op_extra_info((char *)mem_buf);
>  memtohex(buf, mem_buf, len);
>  put_packet(s, buf);
>  }
>  break;
> 



Re: [Qemu-devel] [PATCH v2 03/15] gdbstub: add multiprocess support to 'H' and 'T' packets

2018-10-01 Thread Philippe Mathieu-Daudé
On 01/10/2018 13:56, Luc Michel wrote:
> Add a couple of helper functions to cope with GDB threads and processes.
> 
> The gdb_get_process() function looks for a process given a pid.
> 
> The gdb_get_cpu() function returns the CPU corresponding to the (pid,
> tid) pair given as parameters.
> 
> The read_thread_id() function parses the thread-id sent by the peer.
> This function supports the multiprocess extension thread-id syntax.  The
> return value specifies if the parsing failed, or if a special case was
> encountered (all processes or all threads).
> 
> Use them in 'H' and 'T' packets handling to support the multiprocess
> extension.
> 
> Signed-off-by: Luc Michel 
> ---
>  gdbstub.c | 149 +++---
>  1 file changed, 131 insertions(+), 18 deletions(-)
> 
> diff --git a/gdbstub.c b/gdbstub.c
> index ac9d540fda..a21ce3ca18 100644
> --- a/gdbstub.c
> +++ b/gdbstub.c
> @@ -667,10 +667,74 @@ static uint32_t gdb_get_cpu_pid(const GDBState *s, 
> CPUState *cpu)
>  }
>  
>  return pid + 1;
>  }
>  
> +static GDBProcess *gdb_get_process(const GDBState *s, uint32_t pid)
> +{
> +uint32_t process_idx;
> +
> +if (!pid) {
> +/* 0 means any process, we take the first one */
> +pid = 1;
> +}
> +
> +/* GDB PIDs are in the range [1;n] */
> +process_idx = pid - 1;
> +
> +if (process_idx >= s->process_num) {
> +return NULL;
> +}
> +
> +return &s->processes[process_idx];
> +}
> +
> +static GDBProcess *gdb_get_cpu_process(const GDBState *s, CPUState *cpu)
> +{
> +return gdb_get_process(s, gdb_get_cpu_pid(s, cpu));
> +}
> +
> +static CPUState *find_cpu(uint32_t thread_id)
> +{
> +CPUState *cpu;
> +
> +CPU_FOREACH(cpu) {
> +if (cpu_gdb_index(cpu) == thread_id) {
> +return cpu;
> +}
> +}
> +
> +return NULL;
> +}
> +
> +static CPUState *gdb_get_cpu(const GDBState *s, uint32_t pid, uint32_t tid)
> +{
> +GDBProcess *process;
> +CPUState *cpu = find_cpu(tid);
> +
> +if (!tid) {
> +/* 0 means any thread, we take the first one */
> +tid = 1;
> +}
> +
> +if (cpu == NULL) {
> +return NULL;
> +}
> +
> +process = gdb_get_cpu_process(s, cpu);
> +
> +if (process->pid != pid) {
> +return NULL;
> +}
> +
> +if (!process->attached) {
> +return NULL;
> +}
> +
> +return cpu;
> +}
> +
>  static const char *get_feature_xml(const char *p, const char **newp,
> CPUClass *cc)
>  {
>  size_t len;
>  int i;
> @@ -923,23 +987,10 @@ static void gdb_set_cpu_pc(GDBState *s, target_ulong pc)
>  
>  cpu_synchronize_state(cpu);
>  cpu_set_pc(cpu, pc);
>  }
>  
> -static CPUState *find_cpu(uint32_t thread_id)
> -{
> -CPUState *cpu;
> -
> -CPU_FOREACH(cpu) {
> -if (cpu_gdb_index(cpu) == thread_id) {
> -return cpu;
> -}
> -}
> -
> -return NULL;
> -}
> -
>  static char *gdb_fmt_thread_id(const GDBState *s, CPUState *cpu,
> char *buf, size_t buf_size)
>  {
>  if (s->multiprocess) {
>  snprintf(buf, buf_size, "p%02x.%02x",
> @@ -949,10 +1000,64 @@ static char *gdb_fmt_thread_id(const GDBState *s, 
> CPUState *cpu,
>  }
>  
>  return buf;
>  }
>  
> +typedef enum GDBThreadIdKind {
> +GDB_ONE_THREAD = 0,
> +GDB_ALL_THREADS, /* One process, all threads */
> +GDB_ALL_PROCESSES,
> +GDB_READ_THREAD_ERR
> +} GDBThreadIdKind;
> +
> +static GDBThreadIdKind read_thread_id(const char *buf, const char **end_buf,
> +  uint32_t *pid, uint32_t *tid)
> +{
> +unsigned long p, t;
> +int ret;
> +
> +if (*buf == 'p') {
> +buf++;
> +ret = qemu_strtoul(buf, &buf, 16, &p);
> +
> +if (ret) {
> +return GDB_READ_THREAD_ERR;
> +}
> +
> +/* Skip '.' */
> +buf++;
> +} else {
> +p = 1;
> +}
> +
> +ret = qemu_strtoul(buf, &buf, 16, &t);
> +
> +if (ret) {
> +return GDB_READ_THREAD_ERR;
> +}
> +
> +*end_buf = buf;
> +
> +if (p == -1) {
> +return GDB_ALL_PROCESSES;
> +}
> +
> +if (pid) {
> +*pid = p;
> +}
> +
> +if (t == -1) {
> +return GDB_ALL_THREADS;
> +}
> +
> +if (tid) {
> +*tid = t;
> +}
> +
> +return GDB_ONE_THREAD;
> +}
> +
>  static int is_query_packet(const char *p, const char *query, char separator)
>  {
>  unsigned int query_len = strlen(query);
>  
>  return strncmp(p, query, query_len) == 0 &&
> @@ -1057,16 +1162,18 @@ static int gdb_handle_packet(GDBState *s, const char 
> *line_buf)
>  {
>  CPUState *cpu;
>  CPUClass *cc;
>  const char *p;
>  uint32_t thread;
> +uint32_t pid, tid;
>  int ch, reg_size, type, res;
>  uint8_t mem_buf[MAX_PACKET_LENGTH];
>  char buf[sizeof(mem_buf) + 1 /* trailing NUL */];
>  char thread_id[16];

[Qemu-devel] racing between pause_all_vcpus() and qemu_cpu_stop()

2018-10-01 Thread Peter Maydell
I've been investigating a race condition where sometimes when my
guest writes to a device register which triggers a
qemu_system_reset_request(), it doesn't actually cause a clean reset,
but instead the guest CPU continues to execute instructions.
I managed to repro it under 'rr', which let me walk through enough
of what was going on to determine the following:

When a guest CPU thread calls qemu_system_reset_request(), this
results in a call to qemu_cpu_stop(current_cpu, true), to
make the CPU come back out to the main loop. We also set the
reset_requested flag, to get the IO thread to actually do the
reset.

The main loop thread runs main_loop_should_exit(). If there is a
pending reset, it calls pause_all_vcpus(), with the intention
that this quiesces all the guest CPUs before it starts messing
with reset actions.

pause_all_vcpus() just waits for every cpu to have cpu->stopped set.
However, if the running cpu has just called qemu_cpu_stop() on
itself then it will have set cpu->stopped true but not actually
made it out to the main loop yet. (In the case I'm looking at,
what happens is that as soon as the CPU thread unlocks the
iothread mutex in io_writex() after the device write, the
main thread runs and does all the reset operations.)

The reset code in the iothread then proceeds to start calling
various reset functions while the CPU thread is still inside
the exec loop, running generated code and so on. This doesn't
seem like what ought to happen. In particular it includes
calling cpu_common_reset(), which clears all kinds of flags
relevant to the still-executing CPU...

Any suggestions for how we should fix this?

thanks
-- PMM



Re: [Qemu-devel] [PATCH v2 04/15] gdbstub: add multiprocess support to vCont packets

2018-10-01 Thread Philippe Mathieu-Daudé
On 01/10/2018 13:56, Luc Michel wrote:
> Add the gdb_first_cpu() and gdb_next_cpu() to iterate over all
> the CPUs in currently attached processes.
> 
> Add the gdb_first_cpu_in_process() and gdb_next_cpu_in_process() to
> iterate over CPUs of a given process.
> 
> Use them to add multiprocess extension support to vCont packets.
> 
> Signed-off-by: Luc Michel 
> ---
>  gdbstub.c | 117 +++---
>  1 file changed, 102 insertions(+), 15 deletions(-)
> 
> diff --git a/gdbstub.c b/gdbstub.c
> index a21ce3ca18..779cc8b241 100644
> --- a/gdbstub.c
> +++ b/gdbstub.c
> @@ -704,10 +704,40 @@ static CPUState *find_cpu(uint32_t thread_id)
>  }
>  
>  return NULL;
>  }
>  
> +static CPUState *get_first_cpu_in_process(const GDBState *s,
> +  GDBProcess *process)
> +{
> +CPUState *cpu;
> +
> +CPU_FOREACH(cpu) {
> +if (gdb_get_cpu_pid(s, cpu) == process->pid) {
> +return cpu;
> +}
> +}
> +
> +return NULL;
> +}
> +
> +static CPUState *gdb_next_cpu_in_process(const GDBState *s, CPUState *cpu)
> +{
> +uint32_t pid = gdb_get_cpu_pid(s, cpu);
> +cpu = CPU_NEXT(cpu);
> +
> +while (cpu) {
> +if (gdb_get_cpu_pid(s, cpu) == pid) {
> +break;
> +}
> +
> +cpu = CPU_NEXT(cpu);
> +}
> +
> +return cpu;
> +}
> +
>  static CPUState *gdb_get_cpu(const GDBState *s, uint32_t pid, uint32_t tid)
>  {
>  GDBProcess *process;
>  CPUState *cpu = find_cpu(tid);
>  
> @@ -731,10 +761,41 @@ static CPUState *gdb_get_cpu(const GDBState *s, 
> uint32_t pid, uint32_t tid)
>  }
>  
>  return cpu;
>  }
>  
> +/* Return the cpu following @cpu, while ignoring
> + * unattached processes.
> + */
> +static CPUState *gdb_next_cpu(const GDBState *s, CPUState *cpu)
> +{
> +cpu = CPU_NEXT(cpu);
> +
> +while (cpu) {
> +if (gdb_get_cpu_process(s, cpu)->attached) {
> +break;
> +}
> +
> +cpu = CPU_NEXT(cpu);
> +}
> +
> +return cpu;
> +}
> +
> +/* Return the first attached cpu */
> +static CPUState *gdb_first_cpu(const GDBState *s)
> +{
> +CPUState *cpu = first_cpu;
> +GDBProcess *process = gdb_get_cpu_process(s, cpu);
> +
> +if (!process->attached) {
> +return gdb_next_cpu(s, cpu);
> +}
> +
> +return cpu;
> +}
> +
>  static const char *get_feature_xml(const char *p, const char **newp,
> CPUClass *cc)
>  {
>  size_t len;
>  int i;
> @@ -1069,14 +1130,16 @@ static int is_query_packet(const char *p, const char 
> *query, char separator)
>   * returns -ENOTSUP if a command is unsupported, -EINVAL or -ERANGE if there 
> is
>   * a format error, 0 on success.
>   */
>  static int gdb_handle_vcont(GDBState *s, const char *p)
>  {
> -int res, idx, signal = 0;
> +int res, signal = 0;
>  char cur_action;
>  char *newstates;
>  unsigned long tmp;
> +uint32_t pid, tid;
> +GDBProcess *process;
>  CPUState *cpu;
>  #ifdef CONFIG_USER_ONLY
>  int max_cpus = 1; /* global variable max_cpus exists only in system mode 
> */
>  
>  CPU_FOREACH(cpu) {
> @@ -1115,29 +1178,52 @@ static int gdb_handle_vcont(GDBState *s, const char 
> *p)
>  } else if (cur_action != 'c' && cur_action != 's') {
>  /* unknown/invalid/unsupported command */
>  res = -ENOTSUP;
>  goto out;
>  }
> -/* thread specification. special values: (none), -1 = all; 0 = any */
> -if ((p[0] == ':' && p[1] == '-' && p[2] == '1') || (p[0] != ':')) {
> -if (*p == ':') {
> -p += 3;
> -}
> -for (idx = 0; idx < max_cpus; idx++) {
> -if (newstates[idx] == 1) {
> -newstates[idx] = cur_action;
> +
> +if (*p++ != ':') {
> +res = -ENOTSUP;
> +goto out;
> +}
> +
> +switch (read_thread_id(p, &p, &pid, &tid)) {
> +case GDB_READ_THREAD_ERR:
> +res = -EINVAL;
> +goto out;
> +
> +case GDB_ALL_PROCESSES:
> +cpu = gdb_first_cpu(s);
> +while (cpu) {
> +if (newstates[cpu->cpu_index] == 1) {
> +newstates[cpu->cpu_index] = cur_action;
>  }
> +
> +cpu = gdb_next_cpu(s, cpu);
>  }
> -} else if (*p == ':') {
> -p++;
> -res = qemu_strtoul(p, &p, 16, &tmp);
> -if (res) {
> +break;
> +
> +case GDB_ALL_THREADS:
> +process = gdb_get_process(s, pid);
> +
> +if (!process->attached) {
> +res = -EINVAL;
>  goto out;
>  }
>  
> -/* 0 means any thread, so we pick the first valid CPU */
> -cpu = tmp ? find_cpu(tmp) : first_cpu;
> +cpu = get_first_cpu_in_process(s, process);
> +

Re: [Qemu-devel] [PATCH] qcow2: Fix cache-clean-interval documentation

2018-10-01 Thread Kevin Wolf
Am 01.10.2018 um 16:35 hat Eric Blake geschrieben:
> On 9/29/18 4:54 AM, Leonid Bloch wrote:
> > Fixing cache-clean-interval documentation following the recent change to
> > a default of 600 seconds on supported plarforms (only Linux currently).
> > 
> > Signed-off-by: Leonid Bloch 
> > ---
> >   docs/qcow2-cache.txt | 19 +--
> >   qapi/block-core.json |  3 ++-
> >   qemu-options.hx  |  3 ++-
> >   3 files changed, 13 insertions(+), 12 deletions(-)
> > 
> > diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
> > index 59358b816f..1778312e09 100644
> > --- a/docs/qcow2-cache.txt
> > +++ b/docs/qcow2-cache.txt
> > @@ -202,18 +202,17 @@ Reducing the memory usage
> >   It is possible to clean unused cache entries in order to reduce the
> >   memory usage during periods of low I/O activity.
> > -The parameter "cache-clean-interval" defines an interval (in seconds).
> > -All cache entries that haven't been accessed during that interval are
> > -removed from memory.
> > +The parameter "cache-clean-interval" defines an interval (in seconds),
> > +affer which all the cache entries that haven't been accessed during it
> 
> s/affer/after/
> 
> Maybe s/during it/during the interval/

Thanks, made these fixes and applied it to the block branch.

Kevin



Re: [Qemu-devel] [PATCH v2 11/15] gdbstub: add support for vAttach packets

2018-10-01 Thread Philippe Mathieu-Daudé
Hi Luc,

On 01/10/2018 13:57, Luc Michel wrote:
> Add support for the vAttach packets. In multiprocess mode, GDB sends
> them to attach to additional processes.
> 
> Signed-off-by: Luc Michel 
> ---
>  gdbstub.c | 32 
>  1 file changed, 32 insertions(+)
> 
> diff --git a/gdbstub.c b/gdbstub.c
> index d372972dd3..b6de821025 100644
> --- a/gdbstub.c
> +++ b/gdbstub.c
> @@ -1326,10 +1326,42 @@ static int gdb_handle_packet(GDBState *s, const char 
> *line_buf)
>  break;
>  }
>  goto unknown_command;
>  }
>  break;
> +} else if (strncmp(p, "Attach", 6) == 0) {

Shouldn't this be strncmp("Attach;", 7)?

> +unsigned long pid;
> +
> +p += 7;
> +
> +qemu_strtoul(p, &p, 16, &pid);

You should check for EINVAL/ERANGE here.

Rest of the patch is OK.

> +
> +process = gdb_get_process(s, pid);
> +
> +if (process == NULL) {
> +put_packet(s, "E22");
> +break;
> +}
> +
> +cpu = get_first_cpu_in_process(s, process);
> +
> +if (cpu == NULL) {
> +/* Refuse to attach an empty process */
> +put_packet(s, "E22");
> +break;
> +}
> +
> +process->attached = true;
> +
> +s->g_cpu = cpu;
> +s->c_cpu = cpu;
> +
> +snprintf(buf, sizeof(buf), "T%02xthread:%s;", GDB_SIGNAL_TRAP,
> + gdb_fmt_thread_id(s, cpu, thread_id, 
> sizeof(thread_id)));
> +
> +put_packet(s, buf);
> +break;
>  } else {
>  goto unknown_command;
>  }
>  case 'k':
>  /* Kill the target */
> 



Re: [Qemu-devel] [PATCH v2 10/15] gdbstub: add support for extended mode packet

2018-10-01 Thread Philippe Mathieu-Daudé
Hi Luc,

On 01/10/2018 13:56, Luc Michel wrote:
> Add support for the '!' extended mode packet. This is required for the
> multiprocess extension.
> 
> Signed-off-by: Luc Michel 
> ---
>  gdbstub.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/gdbstub.c b/gdbstub.c
> index 299783b3b8..d372972dd3 100644
> --- a/gdbstub.c
> +++ b/gdbstub.c
> @@ -1280,10 +1280,13 @@ static int gdb_handle_packet(GDBState *s, const char 
> *line_buf)
>  trace_gdbstub_io_command(line_buf);
>  
>  p = line_buf;
>  ch = *p++;
>  switch(ch) {
> +case '!':
> +put_packet(s, "OK");

Don't we want to also support the 'R' packet?

> +break;
>  case '?':
>  /* TODO: Make this return the correct value for user-mode.  */
>  snprintf(buf, sizeof(buf), "T%02xthread:%s;", GDB_SIGNAL_TRAP,
>   gdb_fmt_thread_id(s, s->c_cpu, thread_id, 
> sizeof(thread_id)));
>  put_packet(s, buf);
> 



Re: [Qemu-devel] [PATCH v2 14/15] gdbstub: add multiprocess extension support

2018-10-01 Thread Philippe Mathieu-Daudé
On 01/10/2018 13:57, Luc Michel wrote:
> Add multiprocess extension support by enabling multiprocess mode when
> the peer requests it, and by replying that we actually support it in the
> qSupported reply packet.
> 
> Signed-off-by: Luc Michel 
> ---
>  gdbstub.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/gdbstub.c b/gdbstub.c
> index 51cc11981e..89f6803533 100644
> --- a/gdbstub.c
> +++ b/gdbstub.c
> @@ -1692,15 +1692,19 @@ static int gdb_handle_packet(GDBState *s, const char 
> *line_buf)
>  put_packet(s, "OK");
>  break;
>  }
>  #endif /* !CONFIG_USER_ONLY */
>  if (is_query_packet(p, "Supported", ':')) {
> +if (strstr(p, "multiprocess+")) {
> +s->multiprocess = true;
> +}
>  snprintf(buf, sizeof(buf), "PacketSize=%x", MAX_PACKET_LENGTH);
>  cc = CPU_GET_CLASS(first_cpu);
>  if (cc->gdb_core_xml_file != NULL) {
>  pstrcat(buf, sizeof(buf), ";qXfer:features:read+");
>  }

Maybe move altogether:

   if (strstr(p, "multiprocess+")) {
   s->multiprocess = true;
   }

> +pstrcat(buf, sizeof(buf), ";multiprocess+");

Regardless:
Reviewed-by: Philippe Mathieu-Daudé 

>  put_packet(s, buf);
>  break;
>  }
>  if (strncmp(p, "Xfer:features:read:", 19) == 0) {
>  const char *xml;
> 



Re: [Qemu-devel] [PATCH v2 08/15] gdbstub: add multiprocess support to gdb_vm_state_change()

2018-10-01 Thread Philippe Mathieu-Daudé
On 01/10/2018 13:56, Luc Michel wrote:
> Add support for multiprocess extension in gdb_vm_state_change()
> function.
> 
> Signed-off-by: Luc Michel 

Reviewed-by: Philippe Mathieu-Daudé 

> ---
>  gdbstub.c | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/gdbstub.c b/gdbstub.c
> index 9065e8e140..c1a02c34cd 100644
> --- a/gdbstub.c
> +++ b/gdbstub.c
> @@ -1692,10 +1692,11 @@ void gdb_set_stop_cpu(CPUState *cpu)
>  static void gdb_vm_state_change(void *opaque, int running, RunState state)
>  {
>  GDBState *s = gdbserver_state;
>  CPUState *cpu = s->c_cpu;
>  char buf[256];
> +char thread_id[16];
>  const char *type;
>  int ret;
>  
>  if (running || s->state == RS_INACTIVE) {
>  return;
> @@ -1703,10 +1704,18 @@ static void gdb_vm_state_change(void *opaque, int 
> running, RunState state)
>  /* Is there a GDB syscall waiting to be sent?  */
>  if (s->current_syscall_cb) {
>  put_packet(s, s->syscall_buf);
>  return;
>  }
> +
> +if (cpu == NULL) {
> +/* No process attached */
> +return;
> +}
> +
> +gdb_fmt_thread_id(s, cpu, thread_id, sizeof(thread_id));
> +
>  switch (state) {
>  case RUN_STATE_DEBUG:
>  if (cpu->watchpoint_hit) {
>  switch (cpu->watchpoint_hit->flags & BP_MEM_ACCESS) {
>  case BP_MEM_READ:
> @@ -1720,12 +1729,12 @@ static void gdb_vm_state_change(void *opaque, int 
> running, RunState state)
>  break;
>  }
>  trace_gdbstub_hit_watchpoint(type, cpu_gdb_index(cpu),
>  (target_ulong)cpu->watchpoint_hit->vaddr);
>  snprintf(buf, sizeof(buf),
> - "T%02xthread:%02x;%swatch:" TARGET_FMT_lx ";",
> - GDB_SIGNAL_TRAP, cpu_gdb_index(cpu), type,
> + "T%02xthread:%s;%swatch:" TARGET_FMT_lx ";",
> + GDB_SIGNAL_TRAP, thread_id, type,
>   (target_ulong)cpu->watchpoint_hit->vaddr);
>  cpu->watchpoint_hit = NULL;
>  goto send_packet;
>  } else {
>  trace_gdbstub_hit_break();
> @@ -1763,11 +1772,11 @@ static void gdb_vm_state_change(void *opaque, int 
> running, RunState state)
>  trace_gdbstub_hit_unknown(state);
>  ret = GDB_SIGNAL_UNKNOWN;
>  break;
>  }
>  gdb_set_stop_cpu(cpu);
> -snprintf(buf, sizeof(buf), "T%02xthread:%02x;", ret, cpu_gdb_index(cpu));
> +snprintf(buf, sizeof(buf), "T%02xthread:%s;", ret, thread_id);
>  
>  send_packet:
>  put_packet(s, buf);
>  
>  /* disable single step if it was enabled */
> 



Re: [Qemu-devel] [PATCH v2 07/15] gdbstub: add multiprocess support to Xfer:features:read:

2018-10-01 Thread Philippe Mathieu-Daudé
Hi Luc,

On 01/10/2018 13:56, Luc Michel wrote:
> Change the Xfer:features:read: packet handling to support the
> multiprocess extension. This packet is used to request the XML
> description of the CPU. In multiprocess mode, different descriptions can
> be sent for different processes.
> 
> This function now takes the process to send the description for as a
> parameter, and use a buffer in the process structure to store the
> generated description.
> 
> It takes the first CPU of the process to generate the description.
> 
> Signed-off-by: Luc Michel 
> ---
>  gdbstub.c | 44 
>  1 file changed, 24 insertions(+), 20 deletions(-)
> 
> diff --git a/gdbstub.c b/gdbstub.c
> index 63d7913269..9065e8e140 100644
> --- a/gdbstub.c
> +++ b/gdbstub.c
> @@ -792,55 +792,57 @@ static CPUState *gdb_first_cpu(const GDBState *s)
>  }
>  
>  return cpu;
>  }
>  
> -static const char *get_feature_xml(const char *p, const char **newp,
> -   CPUClass *cc)
> +static const char *get_feature_xml(const GDBState *s, const char *p,
> +   const char **newp, GDBProcess *process)
>  {
>  size_t len;
>  int i;
>  const char *name;
> -static char target_xml[1024];

As noted in your patch #2, I'd add GDBProcess::target_xml in this same
patch.

With this change:
Reviewed-by: Philippe Mathieu-Daudé 

> +CPUState *cpu = get_first_cpu_in_process(s, process);
> +CPUClass *cc = CPU_GET_CLASS(cpu);
>  
>  len = 0;
>  while (p[len] && p[len] != ':')
>  len++;
>  *newp = p + len;
>  
>  name = NULL;
>  if (strncmp(p, "target.xml", len) == 0) {
> +char *buf = process->target_xml;
> +const size_t buf_sz = sizeof(process->target_xml);
> +
>  /* Generate the XML description for this CPU.  */
> -if (!target_xml[0]) {
> +if (!buf[0]) {
>  GDBRegisterState *r;
> -CPUState *cpu = first_cpu;
>  
> -pstrcat(target_xml, sizeof(target_xml),
> +pstrcat(buf, buf_sz,
>  ""
>  ""
>  "");
>  if (cc->gdb_arch_name) {
>  gchar *arch = cc->gdb_arch_name(cpu);
> -pstrcat(target_xml, sizeof(target_xml), "");
> -pstrcat(target_xml, sizeof(target_xml), arch);
> -pstrcat(target_xml, sizeof(target_xml), "");
> +pstrcat(buf, buf_sz, "");
> +pstrcat(buf, buf_sz, arch);
> +pstrcat(buf, buf_sz, "");
>  g_free(arch);
>  }
> -pstrcat(target_xml, sizeof(target_xml), " -pstrcat(target_xml, sizeof(target_xml), cc->gdb_core_xml_file);
> -pstrcat(target_xml, sizeof(target_xml), "\"/>");
> +pstrcat(buf, buf_sz, " +pstrcat(buf, buf_sz, cc->gdb_core_xml_file);
> +pstrcat(buf, buf_sz, "\"/>");
>  for (r = cpu->gdb_regs; r; r = r->next) {
> -pstrcat(target_xml, sizeof(target_xml), " href=\"");
> -pstrcat(target_xml, sizeof(target_xml), r->xml);
> -pstrcat(target_xml, sizeof(target_xml), "\"/>");
> +pstrcat(buf, buf_sz, " +pstrcat(buf, buf_sz, r->xml);
> +pstrcat(buf, buf_sz, "\"/>");
>  }
> -pstrcat(target_xml, sizeof(target_xml), "");
> +pstrcat(buf, buf_sz, "");
>  }
> -return target_xml;
> +return buf;
>  }
>  if (cc->gdb_get_dynamic_xml) {
> -CPUState *cpu = first_cpu;
>  char *xmlname = g_strndup(p, len);
>  const char *xml = cc->gdb_get_dynamic_xml(cpu, xmlname);
>  
>  g_free(xmlname);
>  if (xml) {
> @@ -1246,10 +1248,11 @@ out:
>  }
>  
>  static int gdb_handle_packet(GDBState *s, const char *line_buf)
>  {
>  CPUState *cpu;
> +GDBProcess *process;
>  CPUClass *cc;
>  const char *p;
>  uint32_t pid, tid;
>  int ch, reg_size, type, res;
>  uint8_t mem_buf[MAX_PACKET_LENGTH];
> @@ -1620,18 +1623,19 @@ static int gdb_handle_packet(GDBState *s, const char 
> *line_buf)
>  }
>  if (strncmp(p, "Xfer:features:read:", 19) == 0) {
>  const char *xml;
>  target_ulong total_len;
>  
> -cc = CPU_GET_CLASS(first_cpu);
> +process = gdb_get_cpu_process(s, s->g_cpu);
> +cc = CPU_GET_CLASS(s->g_cpu);
>  if (cc->gdb_core_xml_file == NULL) {
>  goto unknown_command;
>  }
>  
>  gdb_has_xml = true;
>  p += 19;
> -xml = get_feature_xml(p, &p, cc);
> +xml = get_feature_xml(s, p, &p, process);
>  if (!xml) {
>  snprintf(buf, sizeof(buf), "E00");
>  put_packet(s, buf);
>  break;
>  }
> 



Re: [Qemu-devel] [PATCH 4/7] qcow2: async scheme for qcow2_co_preadv

2018-10-01 Thread Vladimir Sementsov-Ogievskiy

01.10.2018 18:49, Max Reitz wrote:

On 01.10.18 17:33, Vladimir Sementsov-Ogievskiy wrote:

27.09.2018 21:35, Max Reitz wrote:

On 07.08.18 19:43, Vladimir Sementsov-Ogievskiy wrote:

Start several async requests instead of read chunk by chunk.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
   block/qcow2.c | 208
--
   1 file changed, 204 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 5e7f2ee318..a0df8d4e50 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1869,6 +1869,197 @@ out:
   return ret;
   }
   +typedef struct Qcow2WorkerTask {
+    uint64_t file_cluster_offset;
+    uint64_t offset;
+    uint64_t bytes;
+    uint64_t bytes_done;
+} Qcow2WorkerTask;

Why don't you make this a union of request-specific structs?

ok, will try


+
+typedef int (*Qcow2DoWorkFunc)(BlockDriverState *bs, QEMUIOVector
*qiov,
+   Qcow2WorkerTask *task);
+
+typedef struct Qcow2RWState {
+    BlockDriverState *bs;
+    QEMUIOVector *qiov;
+    uint64_t bytes;

Maybe make it total_bytes so it doesn't conflict with the value in
Qcow2WorkerTask?

ok


+    int ret;
+    bool waiting_one;
+    bool waiting_all;
+    bool finalize;
+    Coroutine *co;
+    QSIMPLEQ_HEAD(, Qcow2Worker) free_workers;
+    QSIMPLEQ_HEAD(, Qcow2Worker) busy_workers;
+    int online_workers;
+    Qcow2DoWorkFunc do_work_func;
+} Qcow2RWState;
+
+typedef struct Qcow2Worker {
+    Qcow2RWState *rws;
+    Coroutine *co;
+    Qcow2WorkerTask task;
+    bool busy;
+    QSIMPLEQ_ENTRY(Qcow2Worker) entry;
+} Qcow2Worker;
+#define QCOW2_MAX_WORKERS 64

That's really a bit hidden here.  I think it should go into the header.

Also I'm not quite sure about the number.  In other places we've always
used 16.

(With the encryption code always allocating a new bounce buffer, this
can mean quite a bit of memory usage.)

No doubts.


+
+static coroutine_fn void qcow2_rw_worker(void *opaque);
+static Qcow2Worker *qcow2_new_worker(Qcow2RWState *rws)
+{
+    Qcow2Worker *w = g_new0(Qcow2Worker, 1);
+    w->rws = rws;
+    w->co = qemu_coroutine_create(qcow2_rw_worker, w);
+
+    return w;
+}
+
+static void qcow2_free_worker(Qcow2Worker *w)
+{
+    g_free(w);
+}
+
+static coroutine_fn void qcow2_rw_worker(void *opaque)
+{
+    Qcow2Worker *w = opaque;
+    Qcow2RWState *rws = w->rws;
+
+    rws->online_workers++;
+
+    while (!rws->finalize) {
+    int ret = rws->do_work_func(rws->bs, rws->qiov, &w->task);
+    if (ret < 0 && rws->ret == 0) {
+    rws->ret = ret;
+    }
+
+    if (rws->waiting_all || rws->ret < 0) {
+    break;
+    }
+
+    w->busy = false;
+    QSIMPLEQ_REMOVE(&rws->busy_workers, w, Qcow2Worker, entry);
+    QSIMPLEQ_INSERT_TAIL(&rws->free_workers, w, entry);
+    if (rws->waiting_one) {
+    rws->waiting_one = false;
+    /* we must unset it here, to prevent queuing rws->co in
several
+ * workers (it may happen if other worker already waits
us on mutex,
+ * so it will be entered after our yield and before
rws->co enter)
+ *
+ * TODO: rethink this comment, as here (and in other
places in the
+ * file) we moved from qemu_coroutine_add_next to
aio_co_wake.
+ */
+    aio_co_wake(rws->co);
+    }
+
+    qemu_coroutine_yield();
+    }
+
+    if (w->busy) {
+    w->busy = false;
+    QSIMPLEQ_REMOVE(&rws->busy_workers, w, Qcow2Worker, entry);
+    }
+    qcow2_free_worker(w);
+    rws->online_workers--;
+
+    if (rws->waiting_all && rws->online_workers == 0) {
+    aio_co_wake(rws->co);
+    }
+}
+
+static coroutine_fn void qcow2_rws_add_task(Qcow2RWState *rws,
+    uint64_t
file_cluster_offset,
+    uint64_t offset,
+    uint64_t bytes,
+    uint64_t bytes_done)

I'd propose just taking a const Qcow2WorkerTask * here.  (Makes even
more sense if you make it a union.)

ok, I'll try this way


+{
+    Qcow2Worker *w;
+
+    assert(rws->co == qemu_coroutine_self());
+
+    if (bytes_done == 0 && bytes == rws->bytes) {
+    Qcow2WorkerTask task = {
+    .file_cluster_offset = file_cluster_offset,
+    .offset = offset,
+    .bytes = bytes,
+    .bytes_done = bytes_done
+    };
+    rws->ret = rws->do_work_func(rws->bs, rws->qiov, &task);

(If so, you'd just pass the pointer along here)


+    return;
+    }

I like this fast path, but I think it deserves a small comment.  (That
is a fast path and bypasses the whole worker infrastructure.)


+
+    if (!QSIMPLEQ_EMPTY(&rws->free_workers)) {
+    w = QSIMPLEQ_FIRST(&rws->free_workers);
+    QSIMPLEQ_REMOVE_HEAD(&rws->free_workers, entry);
+    } else if (rws->online_workers < QCOW2_MAX_WORKERS) {
+    w = qco

Re: [Qemu-devel] [PATCH v2 01/15] gdbstub: introduce GDB processes

2018-10-01 Thread Philippe Mathieu-Daudé
Hi Luc,

On 01/10/2018 13:56, Luc Michel wrote:
> Add a structure GDBProcess that represent processes from the GDB
> semantic point of view.
> 
> CPUs can be split into different processes, by grouping them under a QOM
> container named after the GDB_CPU_GROUP_NAME macro (`gdb-group[*]').
> Each occurrence of such a container implies the existence of the
> corresponding process in the GDB stub. The gdb_cpu_group_container_get()
> function can be used to create a new container.
> 
> When no such container are found, all the CPUs are put in a unique GDB
> process (create_unique_process()). This is also the case when compiled
> in user mode, where multi-processes do not make much sense for now.
> 
> Signed-off-by: Luc Michel 
> ---
>  include/exec/gdbstub.h |  8 +
>  gdbstub.c  | 67 ++
>  2 files changed, 75 insertions(+)
> 
> diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
> index 08363969c1..a3e4159bf4 100644
> --- a/include/exec/gdbstub.h
> +++ b/include/exec/gdbstub.h
> @@ -1,8 +1,10 @@
>  #ifndef GDBSTUB_H
>  #define GDBSTUB_H
>  
> +#include "qom/object.h"
> +
>  #define DEFAULT_GDBSTUB_PORT "1234"
>  
>  /* GDB breakpoint/watchpoint types */
>  #define GDB_BREAKPOINT_SW0
>  #define GDB_BREAKPOINT_HW1
> @@ -129,6 +131,12 @@ void gdbserver_cleanup(void);
>  extern bool gdb_has_xml;
>  
>  /* in gdbstub-xml.c, generated by scripts/feature_to_c.sh */
>  extern const char *const xml_builtin[][2];
>  
> +#define GDB_CPU_GROUP_NAME  "gdb-group"
> +
> +static inline Object *gdb_cpu_group_container_get(Object *parent)
> +{
> +return container_get(parent, "/" GDB_CPU_GROUP_NAME "[*]");
> +}
>  #endif
> diff --git a/gdbstub.c b/gdbstub.c
> index d6ab95006c..5c86218f49 100644
> --- a/gdbstub.c
> +++ b/gdbstub.c
> @@ -295,10 +295,17 @@ typedef struct GDBRegisterState {
>  gdb_reg_cb set_reg;
>  const char *xml;
>  struct GDBRegisterState *next;
>  } GDBRegisterState;
>  
> +typedef struct GDBProcess {
> +uint32_t pid;
> +bool attached;
> +
> +char target_xml[1024];

I'd add this field in the patch #7 "support to Xfer:features:read:"
where you start using it.

> +} GDBProcess;
> +
>  enum RSState {
>  RS_INACTIVE,
>  RS_IDLE,
>  RS_GETLINE,
>  RS_GETLINE_ESC,
> @@ -323,10 +330,13 @@ typedef struct GDBState {
>  int running_state;
>  #else
>  CharBackend chr;
>  Chardev *mon_chr;
>  #endif
> +bool multiprocess;
> +GDBProcess *processes;
> +int process_num;
>  char syscall_buf[256];
>  gdb_syscall_complete_cb current_syscall_cb;
>  } GDBState;
>  
>  /* By default use no IRQs and no timers while single stepping so as to
> @@ -1750,10 +1760,24 @@ void gdb_exit(CPUArchState *env, int code)
>  #ifndef CONFIG_USER_ONLY
>qemu_chr_fe_deinit(&s->chr, true);
>  #endif
>  }
>  
> +/*
> + * Create a unique process containing all the CPUs.
> + */
> +static void create_unique_process(GDBState *s)
> +{
> +GDBProcess *process;
> +
> +s->processes = g_malloc0(sizeof(GDBProcess));
> +s->process_num = 1;
> +process = &s->processes[0];
> +
> +process->pid = 1;
> +}
> +
>  #ifdef CONFIG_USER_ONLY
>  int
>  gdb_handlesig(CPUState *cpu, int sig)
>  {
>  GDBState *s;
> @@ -1847,10 +1871,11 @@ static bool gdb_accept(void)
>  }
>  
>  s = g_malloc0(sizeof(GDBState));
>  s->c_cpu = first_cpu;
>  s->g_cpu = first_cpu;
> +create_unique_process(s);
>  s->fd = fd;
>  gdb_has_xml = false;
>  
>  gdbserver_state = s;
>  return true;
> @@ -2003,10 +2028,48 @@ static const TypeInfo char_gdb_type_info = {
>  .name = TYPE_CHARDEV_GDB,
>  .parent = TYPE_CHARDEV,
>  .class_init = char_gdb_class_init,
>  };
>  
> +static void create_processes(GDBState *s)
> +{
> +Object *container;
> +int i = 0;
> +char process_str[16];
> +
> +container = object_resolve_path(GDB_CPU_GROUP_NAME "[0]", NULL);
> +
> +while (container) {
> +s->processes = g_renew(GDBProcess, s->processes, i + 1);
> +
> +GDBProcess *process = &s->processes[i];
> +
> +/* GDB process IDs -1 and 0 are reserved */
> +process->pid = i + 1;
> +process->attached = false;
> +process->target_xml[0] = '\0';
> +
> +i++;
> +snprintf(process_str, sizeof(process_str), GDB_CPU_GROUP_NAME 
> "[%d]", i);
> +container = object_resolve_path(process_str, NULL);
> +}
> +
> +if (!s->processes) {
> +/* No CPU group specified by the machine */
> +create_unique_process(s);
> +} else {
> +s->process_num = i;
> +}
> +}
> +
> +static void cleanup_processes(GDBState *s)
> +{
> +g_free(s->processes);
> +s->process_num = 0;
> +s->processes = NULL;
> +}
> +
>  int gdbserver_start(const char *device)
>  {
>  trace_gdbstub_op_start(device);
>  
>  GDBState *s;
> @@ -2055,15 +2118,19 @@ int gdbserver_start(const char *device)
>   

Re: [Qemu-devel] [PATCH] test-replication: Lock AioContext around blk_unref()

2018-10-01 Thread Paolo Bonzini
On 01/10/2018 18:03, Kevin Wolf wrote:
>> Given the backtrace, I think bdrv_close should be taking the AioContext
>> lock instead of blockdev_close_all_bdrv_states.
> Conversely, that would mean that calling bdrv_unref() with the
> AioContext lock held is a bug (because close callbacks can involve
> AIO_WAIT_WHILE()). I'm not sure if that's very practical.

What if bdrv_unref_child dropped the lock?

Perhaps it's simpler to just kill the lock though, if
bdrv_set_aio_context can be figured out.

Paolo

> Of course, there will probably be a lot of callers to fix either way
> after we define whether to hold the lock for bdrv_unref() or not. Either
> you need to add locking to the places where it's missing or you need to
> drop the locks in all other places.
> 
> I was leaning towards requiring the lock for bdrv_unref() (and
> therefore blk_unref()).




Re: [Qemu-devel] [PATCH] tests/test-bdrv-drain: Fix too late qemu_event_reset()

2018-10-01 Thread Max Reitz
On 01.10.18 17:10, Kevin Wolf wrote:
> qemu_event_reset() must be called before the AIO request in a different
> iothread is submitted. Otherwise the request could be completed before
> we do the qemu_event_reset() and the test would hang in
> qemu_event_wait().
> 
> Signed-off-by: Kevin Wolf 
> ---
>  tests/test-bdrv-drain.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Max Reitz 
Tested-by: Max Reitz 

(I still got the same hang as the one you just reported, though -- but
it happens just very rarely, so it's definitely better.)



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [Qemu-block] [PULL 00/42] Block patches

2018-10-01 Thread Kevin Wolf
Am 01.10.2018 um 16:14 hat Kevin Wolf geschrieben:
> Am 01.10.2018 um 15:03 hat Peter Maydell geschrieben:
> > On 28 September 2018 at 15:36, Peter Maydell  
> > wrote:
> > > I'm finding that test-bdrv-drain hangs intermittently on my OSX host.
> > 
> > Ping? Between this and test-replication I'm finding that my
> > parallel build tests for merges are failing about 50% of the
> > time :-(
> 
> Sorry, there wasn't much more than a weekend between your report and
> now.
> 
> For the replication one, I think we can just take the AioContext lock in
> the test case while we decide how the API should really be used. I'll
> prepare a fix for that (and hopefully I'll be able to reproduce the
> problem reliably enough to verify the fix).
> 
> Max said he could reproduce some hang in test-bdrv-drain (though we
> don't know if this has anything to do with your OS X hang, which looked
> rather odd) and would look into it, but I don't think we know the
> problem yet. I'll try to reproduce that one after fixing the replication
> test.

So I sent two patches for the two test cases that should fix the bugs
that made the tests fail relatively frequently. I can still reproduce
another hang, which is a bit mysterious to me:

Thread 2 (Thread 3321.3818):
#0  0x7f2ebbdcc4e9 in syscall () from /lib64/libc.so.6
#1  0x5594d095690b in qemu_futex_wait (val=, f=) at /home/kwolf/source/qemu/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0x5594d0bff228 ) at 
util/qemu-thread-posix.c:442
#3  0x5594d0965f58 in call_rcu_thread (opaque=) at 
util/rcu.c:261
#4  0x7f2ebc09d36d in start_thread () from /lib64/libpthread.so.0
#5  0x7f2ebbdd1b4f in clone () from /lib64/libc.so.6

Thread 1 (Thread 3321.3321):
#0  0x7f2ebc09e89d in pthread_join () from /lib64/libpthread.so.0
#1  0x5594d0956b6f in qemu_thread_join (thread=thread@entry=0x5594d16bd0b8) 
at util/qemu-thread-posix.c:565
#2  0x5594d091f4d9 in iothread_join (iothread=0x5594d16bd0b0) at 
tests/iothread.c:62
#3  0x5594d08806cc in test_iothread_common (drain_type=BDRV_DRAIN_ALL, 
drain_thread=) at tests/test-bdrv-drain.c:763
#4  0x7f2ebd58e178 in g_test_run_suite_internal () from 
/lib64/libglib-2.0.so.0
#5  0x7f2ebd58e37b in g_test_run_suite_internal () from 
/lib64/libglib-2.0.so.0
#6  0x7f2ebd58e37b in g_test_run_suite_internal () from 
/lib64/libglib-2.0.so.0
#7  0x7f2ebd58e51b in g_test_run_suite () from /lib64/libglib-2.0.so.0
#8  0x7f2ebd58e571 in g_test_run () from /lib64/libglib-2.0.so.0
#9  0x5594d087a534 in main (argc=, argv=) at 
tests/test-bdrv-drain.c:1606

This pthread_join() is waiting for a thread that doesn't even exist any
more. I caught the bug in rr and am clearly seeing how the iothread is
notified and terminates. But pthread_join() just doesn't return.

Kevin



Re: [Qemu-devel] [PATCH] test-replication: Lock AioContext around blk_unref()

2018-10-01 Thread Kevin Wolf
Am 01.10.2018 um 17:40 hat Paolo Bonzini geschrieben:
> On 01/10/2018 16:32, Kevin Wolf wrote:
> > Recently, the test case has started failing because some job related
> > functions want to drop the AioContext lock even though it hasn't been
> > taken:
> > 
> > (gdb) bt
> > #0  0x7f51c067c9fb in raise () from /lib64/libc.so.6
> > #1  0x7f51c067e77d in abort () from /lib64/libc.so.6
> > #2  0x558c9d5dde7b in error_exit (err=, 
> > msg=msg@entry=0x558c9d6fe120 <__func__.18373> "qemu_mutex_unlock_impl") at 
> > util/qemu-thread-posix.c:36
> > #3  0x558c9d6b5263 in qemu_mutex_unlock_impl 
> > (mutex=mutex@entry=0x558c9f3999a0, file=file@entry=0x558c9d6fd36f 
> > "util/async.c", line=line@entry=516) at util/qemu-thread-posix.c:96
> > #4  0x558c9d6b0565 in aio_context_release 
> > (ctx=ctx@entry=0x558c9f399940) at util/async.c:516
> > #5  0x558c9d5eb3da in job_completed_txn_abort (job=0x558c9f68e640) 
> > at job.c:738
> > #6  0x558c9d5eb227 in job_finish_sync (job=0x558c9f68e640, 
> > finish=finish@entry=0x558c9d5eb8d0 , errp=errp@entry=0x0) 
> > at job.c:986
> > #7  0x558c9d5eb8ee in job_cancel_sync (job=) at 
> > job.c:941
> > #8  0x558c9d64d853 in replication_close (bs=) at 
> > block/replication.c:148
> > #9  0x558c9d5e5c9f in bdrv_close (bs=0x558c9f41b020) at block.c:3420
> > #10 bdrv_delete (bs=0x558c9f41b020) at block.c:3629
> > #11 bdrv_unref (bs=0x558c9f41b020) at block.c:4685
> > #12 0x558c9d62a3f3 in blk_remove_bs (blk=blk@entry=0x558c9f42a7c0) 
> > at block/block-backend.c:783
> > #13 0x558c9d62a667 in blk_delete (blk=0x558c9f42a7c0) at 
> > block/block-backend.c:402
> > #14 blk_unref (blk=0x558c9f42a7c0) at block/block-backend.c:457
> > #15 0x558c9d5dfcea in test_secondary_stop () at 
> > tests/test-replication.c:478
> > #16 0x7f51c1f13178 in g_test_run_suite_internal () from 
> > /lib64/libglib-2.0.so.0
> > #17 0x7f51c1f1337b in g_test_run_suite_internal () from 
> > /lib64/libglib-2.0.so.0
> > #18 0x7f51c1f1337b in g_test_run_suite_internal () from 
> > /lib64/libglib-2.0.so.0
> > #19 0x7f51c1f13552 in g_test_run_suite () from 
> > /lib64/libglib-2.0.so.0
> > #20 0x7f51c1f13571 in g_test_run () from /lib64/libglib-2.0.so.0
> > #21 0x558c9d5de31f in main (argc=, argv= > out>) at tests/test-replication.c:581
> > 
> > It is yet unclear whether this should really be considered a bug in the
> > test case or whether blk_unref() should work for callers that haven't
> > taken the AioContext lock, but in order to fix the build tests quickly,
> > just take the AioContext lock around blk_unref().
> 
> Given the backtrace, I think bdrv_close should be taking the AioContext
> lock instead of blockdev_close_all_bdrv_states.

Conversely, that would mean that calling bdrv_unref() with the
AioContext lock held is a bug (because close callbacks can involve
AIO_WAIT_WHILE()). I'm not sure if that's very practical.

Of course, there will probably be a lot of callers to fix either way
after we define whether to hold the lock for bdrv_unref() or not. Either
you need to add locking to the places where it's missing or you need to
drop the locks in all other places.

I was leaning towards requiring the lock for bdrv_unref() (and
therefore blk_unref()).

Kevin



Re: [Qemu-devel] [PATCH 3/7] qcow2: split out reading normal clusters from qcow2_co_preadv

2018-10-01 Thread Vladimir Sementsov-Ogievskiy

01.10.2018 18:39, Max Reitz wrote:

On 01.10.18 17:14, Vladimir Sementsov-Ogievskiy wrote:

27.09.2018 20:35, Max Reitz wrote:

On 07.08.18 19:43, Vladimir Sementsov-Ogievskiy wrote:

Memory allocation may become less efficient for encrypted case. It's a
payment for further asynchronous scheme.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/qcow2.c | 114 --
  1 file changed, 64 insertions(+), 50 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 65e3ca00e2..5e7f2ee318 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1808,6 +1808,67 @@ out:
  return ret;
  }
  
+static coroutine_fn int qcow2_co_preadv_normal(BlockDriverState *bs,

+   uint64_t file_cluster_offset,
+   uint64_t offset,
+   uint64_t bytes,
+   QEMUIOVector *qiov,
+   uint64_t qiov_offset)
+{
+int ret;
+BDRVQcow2State *s = bs->opaque;
+void *crypt_buf = NULL;
+QEMUIOVector hd_qiov;
+int offset_in_cluster = offset_into_cluster(s, offset);
+
+if ((file_cluster_offset & 511) != 0) {
+return -EIO;
+}
+
+qemu_iovec_init(&hd_qiov, qiov->niov);

So you're not just re-allocating the bounce buffer for every single
call, but also this I/O vector.  Hm.


+if (bs->encrypted) {
+assert(s->crypto);
+assert(bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
+
+crypt_buf = qemu_try_blockalign(bs->file->bs, bytes);

1. Why did you remove the comment?

2. The check for whether crypt_buf was successfully allocated is missing.

3. Do you have any benchmarks for encrypted images?  Having to allocate
the bounce buffer for potentially every single cluster sounds rather bad
to me.

Hmm, no excuses. 1- Will fix, 2 - will fix, 3 - will fix or at least
test the performance.


+qemu_iovec_add(&hd_qiov, crypt_buf, bytes);
+} else {
+qemu_iovec_concat(&hd_qiov, qiov, qiov_offset, bytes);

qcow2_co_preadv() continues to do this as well.  That's superfluous now.

hd_qiov is local here.. or what do you mean?

qcow2_co_preadv() continues to have its own hd_qiov and appends qiov to
it (just like here), but it's unused for normal clusters.  So it doesn't
have to do that for normal clusters.


ah, yes, understand. Will think how to properly handle it.




+}
+
+BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
+ret = bdrv_co_preadv(bs->file,
+ file_cluster_offset + offset_in_cluster,
+ bytes, &hd_qiov, 0);
+if (ret < 0) {
+goto out;
+}
+
+if (bs->encrypted) {
+assert(s->crypto);
+assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
+assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+if (qcrypto_block_decrypt(s->crypto,
+  (s->crypt_physical_offset ?
+   file_cluster_offset + offset_in_cluster :
+   offset),
+  crypt_buf,
+  bytes,
+  NULL) < 0) {

What's the reason against decrypting this in-place in qiov so we could
save the bounce buffer?  We allow offsets in clusters, so why can't we
just call this function once per involved I/O vector entry?

Isn't it unsafe to do decryption in guest buffers?

I don't know.  Why would it be?  Because...


Max


+ret = -EIO;
+goto out;
+}
+qemu_iovec_from_buf(qiov, qiov_offset, crypt_buf, bytes);

...we're putting the decrypted content there anyway.

The only issue I see is that the guest may see encrypted content for a
short duration.  Hm.  Maybe we don't want that.  In which case it
probably has to stay as it is.


Yes, I mean exactly this thing.



Max




--
Best regards,
Vladimir




Re: [Qemu-devel] [PATCH 1/7] qcow2: move qemu_co_mutex_lock below decryption procedure

2018-10-01 Thread Vladimir Sementsov-Ogievskiy

27.09.2018 20:02, Max Reitz wrote:

On 27.09.18 18:58, Max Reitz wrote:

On 07.08.18 19:43, Vladimir Sementsov-Ogievskiy wrote:

From: "Denis V. Lunev" 

We are not working with a shared state data in the decruption code and

(*decryption)


thus this operation is safe. On the other hand this significantly
reduces the scope of the lock.

Sure, but does it have any effect?  This is a coroutine lock and I don't
see any yield points besides the bdrv_co_preadv() that was executed
outside of the lock anyway.

I think it makes sense to move the qemu_co_mutex_lock() down below the
decryption, as the commit title suggests.  Because then we can decrypt
while another coroutine that uses the lock is blocking.

But I don't see the point of moving the qemu_co_mutex_unlock() up.
(That is non-trivial movement because it means I'd have to find out
whether anything that could modify the cluster offset is serialized by
the generic block layer.  Or, well, not really, because there is no
yield point, so it doesn't actually matter.  But it doesn't give us any
benefit either, I think.)

Max


It don't make any performance benefit, but it allows to split part about 
normal cluster reading to separate function.


Hmm, in future we can do decryption in threads (like compress threads) 
and it should improve performance.





Max


Signed-off-by: Denis V. Lunev 
---
  block/qcow2.c | 14 --
  1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index ec9e6238a0..41def07c67 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1880,9 +1880,11 @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState 
*bs, uint64_t offset,
  break;
  
  case QCOW2_CLUSTER_NORMAL:

+qemu_co_mutex_unlock(&s->lock);
+
  if ((cluster_offset & 511) != 0) {
  ret = -EIO;
-goto fail;
+goto fail_nolock;
  }
  
  if (bs->encrypted) {

@@ -1899,7 +1901,7 @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState 
*bs, uint64_t offset,
  * s->cluster_size);
  if (cluster_data == NULL) {
  ret = -ENOMEM;
-goto fail;
+goto fail_nolock;
  }
  }
  
@@ -1909,13 +1911,11 @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,

  }
  
  BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);

-qemu_co_mutex_unlock(&s->lock);
  ret = bdrv_co_preadv(bs->file,
   cluster_offset + offset_in_cluster,
   cur_bytes, &hd_qiov, 0);
-qemu_co_mutex_lock(&s->lock);
  if (ret < 0) {
-goto fail;
+goto fail_nolock;
  }
  if (bs->encrypted) {
  assert(s->crypto);
@@ -1929,10 +1929,11 @@ static coroutine_fn int 
qcow2_co_preadv(BlockDriverState *bs, uint64_t offset,
cur_bytes,
NULL) < 0) {
  ret = -EIO;
-goto fail;
+goto fail_nolock;
  }
  qemu_iovec_from_buf(qiov, bytes_done, cluster_data, 
cur_bytes);
  }
+qemu_co_mutex_lock(&s->lock);
  break;
  
  default:

@@ -1950,6 +1951,7 @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState 
*bs, uint64_t offset,
  fail:
  qemu_co_mutex_unlock(&s->lock);
  
+fail_nolock:

  qemu_iovec_destroy(&hd_qiov);
  qemu_vfree(cluster_data);
  








--
Best regards,
Vladimir




Re: [Qemu-devel] [PATCH] scripts/device-crash-test: Remove entries for serial devices

2018-10-01 Thread Eduardo Habkost
On Mon, Oct 01, 2018 at 04:13:10PM +0200, Thomas Huth wrote:
> The problem with the various serial devices has been fixed a while
> ago in commit 47c4f85a0c27888e12af827471cfef87deb49821 ("hw/char/serial:
> Allow disconnected chardevs") already, so we can remove these entries
> from the "ignore" list in the device-crash-test script now.
> 
> Signed-off-by: Thomas Huth 

Thanks, queued on python-next.

-- 
Eduardo



Re: [Qemu-devel] [PATCH 4/7] qcow2: async scheme for qcow2_co_preadv

2018-10-01 Thread Max Reitz
On 01.10.18 17:33, Vladimir Sementsov-Ogievskiy wrote:
> 27.09.2018 21:35, Max Reitz wrote:
>> On 07.08.18 19:43, Vladimir Sementsov-Ogievskiy wrote:
>>> Start several async requests instead of read chunk by chunk.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   block/qcow2.c | 208
>>> --
>>>   1 file changed, 204 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>> index 5e7f2ee318..a0df8d4e50 100644
>>> --- a/block/qcow2.c
>>> +++ b/block/qcow2.c
>>> @@ -1869,6 +1869,197 @@ out:
>>>   return ret;
>>>   }
>>>   +typedef struct Qcow2WorkerTask {
>>> +    uint64_t file_cluster_offset;
>>> +    uint64_t offset;
>>> +    uint64_t bytes;
>>> +    uint64_t bytes_done;
>>> +} Qcow2WorkerTask;
>> Why don't you make this a union of request-specific structs?
> 
> ok, will try
> 
>>
>>> +
>>> +typedef int (*Qcow2DoWorkFunc)(BlockDriverState *bs, QEMUIOVector
>>> *qiov,
>>> +   Qcow2WorkerTask *task);
>>> +
>>> +typedef struct Qcow2RWState {
>>> +    BlockDriverState *bs;
>>> +    QEMUIOVector *qiov;
>>> +    uint64_t bytes;
>> Maybe make it total_bytes so it doesn't conflict with the value in
>> Qcow2WorkerTask?
> 
> ok
> 
>>
>>> +    int ret;
>>> +    bool waiting_one;
>>> +    bool waiting_all;
>>> +    bool finalize;
>>> +    Coroutine *co;
>>> +    QSIMPLEQ_HEAD(, Qcow2Worker) free_workers;
>>> +    QSIMPLEQ_HEAD(, Qcow2Worker) busy_workers;
>>> +    int online_workers;
>>> +    Qcow2DoWorkFunc do_work_func;
>>> +} Qcow2RWState;
>>> +
>>> +typedef struct Qcow2Worker {
>>> +    Qcow2RWState *rws;
>>> +    Coroutine *co;
>>> +    Qcow2WorkerTask task;
>>> +    bool busy;
>>> +    QSIMPLEQ_ENTRY(Qcow2Worker) entry;
>>> +} Qcow2Worker;
>>> +#define QCOW2_MAX_WORKERS 64
>> That's really a bit hidden here.  I think it should go into the header.
>>
>> Also I'm not quite sure about the number.  In other places we've always
>> used 16.
>>
>> (With the encryption code always allocating a new bounce buffer, this
>> can mean quite a bit of memory usage.)
> 
> No doubts.
> 
>>
>>> +
>>> +static coroutine_fn void qcow2_rw_worker(void *opaque);
>>> +static Qcow2Worker *qcow2_new_worker(Qcow2RWState *rws)
>>> +{
>>> +    Qcow2Worker *w = g_new0(Qcow2Worker, 1);
>>> +    w->rws = rws;
>>> +    w->co = qemu_coroutine_create(qcow2_rw_worker, w);
>>> +
>>> +    return w;
>>> +}
>>> +
>>> +static void qcow2_free_worker(Qcow2Worker *w)
>>> +{
>>> +    g_free(w);
>>> +}
>>> +
>>> +static coroutine_fn void qcow2_rw_worker(void *opaque)
>>> +{
>>> +    Qcow2Worker *w = opaque;
>>> +    Qcow2RWState *rws = w->rws;
>>> +
>>> +    rws->online_workers++;
>>> +
>>> +    while (!rws->finalize) {
>>> +    int ret = rws->do_work_func(rws->bs, rws->qiov, &w->task);
>>> +    if (ret < 0 && rws->ret == 0) {
>>> +    rws->ret = ret;
>>> +    }
>>> +
>>> +    if (rws->waiting_all || rws->ret < 0) {
>>> +    break;
>>> +    }
>>> +
>>> +    w->busy = false;
>>> +    QSIMPLEQ_REMOVE(&rws->busy_workers, w, Qcow2Worker, entry);
>>> +    QSIMPLEQ_INSERT_TAIL(&rws->free_workers, w, entry);
>>> +    if (rws->waiting_one) {
>>> +    rws->waiting_one = false;
>>> +    /* we must unset it here, to prevent queuing rws->co in
>>> several
>>> + * workers (it may happen if other worker already waits
>>> us on mutex,
>>> + * so it will be entered after our yield and before
>>> rws->co enter)
>>> + *
>>> + * TODO: rethink this comment, as here (and in other
>>> places in the
>>> + * file) we moved from qemu_coroutine_add_next to
>>> aio_co_wake.
>>> + */
>>> +    aio_co_wake(rws->co);
>>> +    }
>>> +
>>> +    qemu_coroutine_yield();
>>> +    }
>>> +
>>> +    if (w->busy) {
>>> +    w->busy = false;
>>> +    QSIMPLEQ_REMOVE(&rws->busy_workers, w, Qcow2Worker, entry);
>>> +    }
>>> +    qcow2_free_worker(w);
>>> +    rws->online_workers--;
>>> +
>>> +    if (rws->waiting_all && rws->online_workers == 0) {
>>> +    aio_co_wake(rws->co);
>>> +    }
>>> +}
>>> +
>>> +static coroutine_fn void qcow2_rws_add_task(Qcow2RWState *rws,
>>> +    uint64_t
>>> file_cluster_offset,
>>> +    uint64_t offset,
>>> +    uint64_t bytes,
>>> +    uint64_t bytes_done)
>> I'd propose just taking a const Qcow2WorkerTask * here.  (Makes even
>> more sense if you make it a union.)
> 
> ok, I'll try this way
> 
>>
>>> +{
>>> +    Qcow2Worker *w;
>>> +
>>> +    assert(rws->co == qemu_coroutine_self());
>>> +
>>> +    if (bytes_done == 0 && bytes == rws->bytes) {
>>> +    Qcow2WorkerTask task = {
>>> +    .file_cluster_offset = file_cluster_offset,
>>> +    .offset = offset,
>>> +    .bytes = bytes,
>>> +  

Re: [Qemu-devel] [PATCH 5/7] qcow2: refactor qcow2_co_pwritev: split out qcow2_co_do_pwritev

2018-10-01 Thread Max Reitz
On 01.10.18 17:43, Vladimir Sementsov-Ogievskiy wrote:
> 28.09.2018 17:23, Max Reitz wrote:
>> On 07.08.18 19:43, Vladimir Sementsov-Ogievskiy wrote:
>>> Split out block which will be reused in async scheme.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   block/qcow2.c | 138
>>> --
>>>   1 file changed, 86 insertions(+), 52 deletions(-)
>>>
>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>> index a0df8d4e50..4d669432d1 100644
>>> --- a/block/qcow2.c
>>> +++ b/block/qcow2.c
>>> @@ -2210,6 +2210,85 @@ static bool merge_cow(uint64_t offset,
>>> unsigned bytes,
>>>   return false;
>>>   }
>>>   +/* qcow2_co_do_pwritev
>>> + * Called without s->lock unlocked
>>> + * hd_qiov - temp qiov for any use. It is initialized so it is empty
>>> and
>>> + *   support adding up to qiov->niov + 2 elements
>>> + * l2meta  - if not NULL, qcow2_co_do_pwritev() will consume it.
>>> Caller must not
>>> + *   use it somehow after qcow2_co_do_pwritev() call
>>> + */
>>> +static coroutine_fn int qcow2_co_do_pwritev(BlockDriverState *bs,
>>> +    uint64_t
>>> file_cluster_offset,
>>> +    uint64_t offset,
>>> +    uint64_t bytes,
>>> +    QEMUIOVector *qiov,
>>> +    uint64_t qiov_offset,
>>> +    QCowL2Meta *l2meta)
>>> +{
>>> +    int ret;
>>> +    BDRVQcow2State *s = bs->opaque;
>>> +    void *crypt_buf = NULL;
>>> +    QEMUIOVector hd_qiov;
>>> +    int offset_in_cluster = offset_into_cluster(s, offset);
>>> +
>>> +    qemu_iovec_reset(&hd_qiov);
>> This shouldn't be here.
>>
>>> +    qemu_iovec_init(&hd_qiov, qiov->niov);
>>> +
>>> +    if (bs->encrypted) {
>>> +    assert(s->crypto);
>>> +    assert(bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
>>> +    crypt_buf = qemu_try_blockalign(bs->file->bs, bytes);
>>> +    qemu_iovec_to_buf(qiov, qiov_offset, crypt_buf, bytes);
>>> +
>>> +    if (qcrypto_block_encrypt(s->crypto,
>>> +  (s->crypt_physical_offset ?
>>> +   file_cluster_offset +
>>> offset_in_cluster :
>>> +   offset),
>>> +  crypt_buf,
>>> +  bytes, NULL) < 0) {
>> Same question as in the read case: Can't we make do without the bounce
>> buffer?
> 
> I think, we should not modify guest buffers..

Hmm, yes, agreed. O:-)

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 7/7] qcow2: async scheme for qcow2_co_pwritev

2018-10-01 Thread Vladimir Sementsov-Ogievskiy

28.09.2018 17:35, Max Reitz wrote:

On 07.08.18 19:43, Vladimir Sementsov-Ogievskiy wrote:

Start several async requests instead of read chunk by chunk.

Iotest 026 output is changed, as because of async io error path has
changed.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/qcow2.c  | 47 ++
  tests/qemu-iotests/026.out | 18 ---
  tests/qemu-iotests/026.out.nocache | 20 
  3 files changed, 59 insertions(+), 26 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 88b3fb8080..c7501adfc6 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1874,6 +1874,7 @@ typedef struct Qcow2WorkerTask {
  uint64_t offset;
  uint64_t bytes;
  uint64_t bytes_done;
+QCowL2Meta *l2meta;
  } Qcow2WorkerTask;
  
  typedef int (*Qcow2DoWorkFunc)(BlockDriverState *bs, QEMUIOVector *qiov,

@@ -1965,11 +1966,16 @@ static coroutine_fn void qcow2_rw_worker(void *opaque)
  }
  }
  
+/* qcow2_rws_add_task

+* l2meta - opaque pointer for do_work_func, so behavior depends on
+*  do_work_func, specified in qcow2_init_rws
+*/

I think this would work better if Qcow2WorkerTask was indeed a union.


  static coroutine_fn void qcow2_rws_add_task(Qcow2RWState *rws,
  uint64_t file_cluster_offset,
  uint64_t offset,
  uint64_t bytes,
-uint64_t bytes_done)
+uint64_t bytes_done,
+QCowL2Meta *l2meta)
  {
  Qcow2Worker *w;
  
@@ -1980,7 +1986,8 @@ static coroutine_fn void qcow2_rws_add_task(Qcow2RWState *rws,

  .file_cluster_offset = file_cluster_offset,
  .offset = offset,
  .bytes = bytes,
-.bytes_done = bytes_done
+.bytes_done = bytes_done,
+.l2meta = l2meta
  };
  rws->ret = rws->do_work_func(rws->bs, rws->qiov, &task);
  return;
@@ -2006,6 +2013,7 @@ static coroutine_fn void qcow2_rws_add_task(Qcow2RWState 
*rws,
  w->task.offset = offset;
  w->task.bytes = bytes;
  w->task.bytes_done = bytes_done;
+w->task.l2meta = l2meta;
  
  qemu_coroutine_enter(w->co);

  }
@@ -2137,7 +2145,7 @@ static coroutine_fn int qcow2_co_preadv(BlockDriverState 
*bs, uint64_t offset,
  qemu_co_mutex_unlock(&s->lock);
  
  qcow2_rws_add_task(&rws, cluster_offset, offset, cur_bytes,

-   bytes_done);
+   bytes_done, NULL);
  if (rws.ret < 0) {
  ret = rws.ret;
  goto fail_nolock;
@@ -2289,6 +2297,19 @@ fail:
  return ret;
  }
  
+static coroutine_fn int qcow2_co_do_pwritev_task(BlockDriverState *bs,

+ QEMUIOVector *qiov,
+ Qcow2WorkerTask *task)
+{
+return qcow2_co_do_pwritev(bs,
+   task->file_cluster_offset,
+   task->offset,
+   task->bytes,
+   qiov,
+   task->bytes_done,
+   task->l2meta);
+}
+
  static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t 
offset,
   uint64_t bytes, QEMUIOVector *qiov,
   int flags)
@@ -2299,16 +2320,19 @@ static coroutine_fn int 
qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
  uint64_t bytes_done = 0;
  uint8_t *cluster_data = NULL;
  QCowL2Meta *l2meta = NULL;
+Qcow2RWState rws = {0};
  
  trace_qcow2_writev_start_req(qemu_coroutine_self(), offset, bytes);
  
+qcow2_init_rws(&rws, bs, qiov, bytes, qcow2_co_do_pwritev_task);

+
  qemu_iovec_init(&hd_qiov, qiov->niov);
  
  s->cluster_cache_offset = -1; /* disable compressed cache */
  
  qemu_co_mutex_lock(&s->lock);
  
-while (bytes != 0) {

+while (bytes != 0 && rws.ret == 0) {
  int offset_in_cluster = offset_into_cluster(s, offset);
  unsigned int cur_bytes = MIN(bytes, INT_MAX); /* number of sectors in
   current iteration */
@@ -2340,11 +2364,11 @@ static coroutine_fn int 
qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
  
  qemu_co_mutex_unlock(&s->lock);
  
-

-ret = qcow2_co_do_pwritev(bs, cluster_offset, offset, cur_bytes,
-  qiov, bytes_done, l2meta);
-l2meta = NULL; /* l2meta is consumed by qcow2_co_do_pwritev() */
-if (ret < 0) {
+qcow2_rws_add_task(&rws, cluster_offset, offset, cur_bytes, bytes_done,
+   l2meta);
+l2meta = NULL; /* l2m

Re: [Qemu-devel] [PATCH 5/7] qcow2: refactor qcow2_co_pwritev: split out qcow2_co_do_pwritev

2018-10-01 Thread Vladimir Sementsov-Ogievskiy

28.09.2018 17:23, Max Reitz wrote:

On 07.08.18 19:43, Vladimir Sementsov-Ogievskiy wrote:

Split out block which will be reused in async scheme.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/qcow2.c | 138 --
  1 file changed, 86 insertions(+), 52 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index a0df8d4e50..4d669432d1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2210,6 +2210,85 @@ static bool merge_cow(uint64_t offset, unsigned bytes,
  return false;
  }
  
+/* qcow2_co_do_pwritev

+ * Called without s->lock unlocked
+ * hd_qiov - temp qiov for any use. It is initialized so it is empty and
+ *   support adding up to qiov->niov + 2 elements
+ * l2meta  - if not NULL, qcow2_co_do_pwritev() will consume it. Caller must 
not
+ *   use it somehow after qcow2_co_do_pwritev() call
+ */
+static coroutine_fn int qcow2_co_do_pwritev(BlockDriverState *bs,
+uint64_t file_cluster_offset,
+uint64_t offset,
+uint64_t bytes,
+QEMUIOVector *qiov,
+uint64_t qiov_offset,
+QCowL2Meta *l2meta)
+{
+int ret;
+BDRVQcow2State *s = bs->opaque;
+void *crypt_buf = NULL;
+QEMUIOVector hd_qiov;
+int offset_in_cluster = offset_into_cluster(s, offset);
+
+qemu_iovec_reset(&hd_qiov);

This shouldn't be here.


+qemu_iovec_init(&hd_qiov, qiov->niov);
+
+if (bs->encrypted) {
+assert(s->crypto);
+assert(bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
+crypt_buf = qemu_try_blockalign(bs->file->bs, bytes);
+qemu_iovec_to_buf(qiov, qiov_offset, crypt_buf, bytes);
+
+if (qcrypto_block_encrypt(s->crypto,
+  (s->crypt_physical_offset ?
+   file_cluster_offset + offset_in_cluster :
+   offset),
+  crypt_buf,
+  bytes, NULL) < 0) {

Same question as in the read case: Can't we make do without the bounce
buffer?


I think, we should not modify guest buffers..




+ret = -EIO;
+goto fail;
+}
+
+qemu_iovec_add(&hd_qiov, crypt_buf, bytes);
+} else {
+qemu_iovec_concat(&hd_qiov, qiov, qiov_offset, bytes);
+}
+
+/* If we need to do COW, check if it's possible to merge the
+ * writing of the guest data together with that of the COW regions.
+ * If it's not possible (or not necessary) then write the
+ * guest data now. */
+if (!merge_cow(offset, bytes, &hd_qiov, l2meta)) {
+BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
+trace_qcow2_writev_data(qemu_coroutine_self(),
+file_cluster_offset + offset_in_cluster);
+ret = bdrv_co_pwritev(bs->file,
+  file_cluster_offset + offset_in_cluster,
+  bytes, &hd_qiov, 0);
+if (ret < 0) {
+qemu_co_mutex_lock(&s->lock);
+goto fail;
+}
+}
+
+qemu_co_mutex_lock(&s->lock);

It looks a bit weird to only do some of the lock handling in this
function.  I don't quite like it, but as long as you keep it around the
qcow2_handle_l2meta(), I won't object.

A problem is that currently you don't keep it around
qcow2_handle_l2meta().  Before going to the fail label, one has to
manually lock the mutex (which you don't do in one of the above error
paths, but that's fallout from patch 2).  Maybe it would be nicer with a
separate label that does the lock.

Like:

 ...
 ret = qcow2_handle_l2meta(bs, &l2meta, true);
 goto done_locked;

fail_unlocked:
 qemu_co_mutex_lock(&s->lock)
done_locked:
 qcow2_handle_l2meta(bs, &l2meta, false);
 ...


ok




+
+ret = qcow2_handle_l2meta(bs, &l2meta, true);
+if (ret) {
+goto fail;
+}
+
+fail:
+qcow2_handle_l2meta(bs, &l2meta, false);
+qemu_co_mutex_unlock(&s->lock);
+
+qemu_vfree(crypt_buf);
+qemu_iovec_destroy(&hd_qiov);
+
+return ret;
+}
+
  static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t 
offset,
   uint64_t bytes, QEMUIOVector *qiov,
   int flags)
@@ -2262,63 +2341,16 @@ static coroutine_fn int 
qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
  
  qemu_co_mutex_unlock(&s->lock);
  
-qemu_iovec_reset(&hd_qiov);

-qemu_iovec_concat(&hd_qiov, qiov, bytes_done, cur_bytes);
  
-if (bs->encrypted) {

-assert(s->crypto);
-if (!cluster_data) {
-cluster_data = qemu_try_blockalign(bs->file->bs,
-

Re: [Qemu-devel] [PATCH] test-replication: Lock AioContext around blk_unref()

2018-10-01 Thread Paolo Bonzini
On 01/10/2018 16:32, Kevin Wolf wrote:
> Recently, the test case has started failing because some job related
> functions want to drop the AioContext lock even though it hasn't been
> taken:
> 
> (gdb) bt
> #0  0x7f51c067c9fb in raise () from /lib64/libc.so.6
> #1  0x7f51c067e77d in abort () from /lib64/libc.so.6
> #2  0x558c9d5dde7b in error_exit (err=, 
> msg=msg@entry=0x558c9d6fe120 <__func__.18373> "qemu_mutex_unlock_impl") at 
> util/qemu-thread-posix.c:36
> #3  0x558c9d6b5263 in qemu_mutex_unlock_impl 
> (mutex=mutex@entry=0x558c9f3999a0, file=file@entry=0x558c9d6fd36f 
> "util/async.c", line=line@entry=516) at util/qemu-thread-posix.c:96
> #4  0x558c9d6b0565 in aio_context_release 
> (ctx=ctx@entry=0x558c9f399940) at util/async.c:516
> #5  0x558c9d5eb3da in job_completed_txn_abort (job=0x558c9f68e640) at 
> job.c:738
> #6  0x558c9d5eb227 in job_finish_sync (job=0x558c9f68e640, 
> finish=finish@entry=0x558c9d5eb8d0 , errp=errp@entry=0x0) at 
> job.c:986
> #7  0x558c9d5eb8ee in job_cancel_sync (job=) at 
> job.c:941
> #8  0x558c9d64d853 in replication_close (bs=) at 
> block/replication.c:148
> #9  0x558c9d5e5c9f in bdrv_close (bs=0x558c9f41b020) at block.c:3420
> #10 bdrv_delete (bs=0x558c9f41b020) at block.c:3629
> #11 bdrv_unref (bs=0x558c9f41b020) at block.c:4685
> #12 0x558c9d62a3f3 in blk_remove_bs (blk=blk@entry=0x558c9f42a7c0) at 
> block/block-backend.c:783
> #13 0x558c9d62a667 in blk_delete (blk=0x558c9f42a7c0) at 
> block/block-backend.c:402
> #14 blk_unref (blk=0x558c9f42a7c0) at block/block-backend.c:457
> #15 0x558c9d5dfcea in test_secondary_stop () at 
> tests/test-replication.c:478
> #16 0x7f51c1f13178 in g_test_run_suite_internal () from 
> /lib64/libglib-2.0.so.0
> #17 0x7f51c1f1337b in g_test_run_suite_internal () from 
> /lib64/libglib-2.0.so.0
> #18 0x7f51c1f1337b in g_test_run_suite_internal () from 
> /lib64/libglib-2.0.so.0
> #19 0x7f51c1f13552 in g_test_run_suite () from /lib64/libglib-2.0.so.0
> #20 0x7f51c1f13571 in g_test_run () from /lib64/libglib-2.0.so.0
> #21 0x558c9d5de31f in main (argc=, argv= out>) at tests/test-replication.c:581
> 
> It is yet unclear whether this should really be considered a bug in the
> test case or whether blk_unref() should work for callers that haven't
> taken the AioContext lock, but in order to fix the build tests quickly,
> just take the AioContext lock around blk_unref().

Given the backtrace, I think bdrv_close should be taking the AioContext
lock instead of blockdev_close_all_bdrv_states.

Paolo



Re: [Qemu-devel] [PATCH 3/7] qcow2: split out reading normal clusters from qcow2_co_preadv

2018-10-01 Thread Max Reitz
On 01.10.18 17:14, Vladimir Sementsov-Ogievskiy wrote:
> 27.09.2018 20:35, Max Reitz wrote:
>> On 07.08.18 19:43, Vladimir Sementsov-Ogievskiy wrote:
>>> Memory allocation may become less efficient for encrypted case. It's a
>>> payment for further asynchronous scheme.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>  block/qcow2.c | 114 
>>> --
>>>  1 file changed, 64 insertions(+), 50 deletions(-)
>>>
>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>> index 65e3ca00e2..5e7f2ee318 100644
>>> --- a/block/qcow2.c
>>> +++ b/block/qcow2.c
>>> @@ -1808,6 +1808,67 @@ out:
>>>  return ret;
>>>  }
>>>  
>>> +static coroutine_fn int qcow2_co_preadv_normal(BlockDriverState *bs,
>>> +   uint64_t 
>>> file_cluster_offset,
>>> +   uint64_t offset,
>>> +   uint64_t bytes,
>>> +   QEMUIOVector *qiov,
>>> +   uint64_t qiov_offset)
>>> +{
>>> +int ret;
>>> +BDRVQcow2State *s = bs->opaque;
>>> +void *crypt_buf = NULL;
>>> +QEMUIOVector hd_qiov;
>>> +int offset_in_cluster = offset_into_cluster(s, offset);
>>> +
>>> +if ((file_cluster_offset & 511) != 0) {
>>> +return -EIO;
>>> +}
>>> +
>>> +qemu_iovec_init(&hd_qiov, qiov->niov);
>> So you're not just re-allocating the bounce buffer for every single
>> call, but also this I/O vector.  Hm.
>>
>>> +if (bs->encrypted) {
>>> +assert(s->crypto);
>>> +assert(bytes <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
>>> +
>>> +crypt_buf = qemu_try_blockalign(bs->file->bs, bytes);
>> 1. Why did you remove the comment?
>>
>> 2. The check for whether crypt_buf was successfully allocated is missing.
>>
>> 3. Do you have any benchmarks for encrypted images?  Having to allocate
>> the bounce buffer for potentially every single cluster sounds rather bad
>> to me.
> 
> Hmm, no excuses. 1- Will fix, 2 - will fix, 3 - will fix or at least
> test the performance.
> 
>>> +qemu_iovec_add(&hd_qiov, crypt_buf, bytes);
>>> +} else {
>>> +qemu_iovec_concat(&hd_qiov, qiov, qiov_offset, bytes);
>> qcow2_co_preadv() continues to do this as well.  That's superfluous now.
> 
> hd_qiov is local here.. or what do you mean?

qcow2_co_preadv() continues to have its own hd_qiov and appends qiov to
it (just like here), but it's unused for normal clusters.  So it doesn't
have to do that for normal clusters.

>>> +}
>>> +
>>> +BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
>>> +ret = bdrv_co_preadv(bs->file,
>>> + file_cluster_offset + offset_in_cluster,
>>> + bytes, &hd_qiov, 0);
>>> +if (ret < 0) {
>>> +goto out;
>>> +}
>>> +
>>> +if (bs->encrypted) {
>>> +assert(s->crypto);
>>> +assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
>>> +assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
>>> +if (qcrypto_block_decrypt(s->crypto,
>>> +  (s->crypt_physical_offset ?
>>> +   file_cluster_offset + offset_in_cluster 
>>> :
>>> +   offset),
>>> +  crypt_buf,
>>> +  bytes,
>>> +  NULL) < 0) {
>> What's the reason against decrypting this in-place in qiov so we could
>> save the bounce buffer?  We allow offsets in clusters, so why can't we
>> just call this function once per involved I/O vector entry?
> 
> Isn't it unsafe to do decryption in guest buffers?

I don't know.  Why would it be?  Because...

>> Max
>>
>>> +ret = -EIO;
>>> +goto out;
>>> +}
>>> +qemu_iovec_from_buf(qiov, qiov_offset, crypt_buf, bytes);

...we're putting the decrypted content there anyway.

The only issue I see is that the guest may see encrypted content for a
short duration.  Hm.  Maybe we don't want that.  In which case it
probably has to stay as it is.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PULL 0/3] Ui 20181001 patches

2018-10-01 Thread Peter Maydell
On 1 October 2018 at 11:42, Gerd Hoffmann  wrote:
> The following changes since commit 07f426c35eddd79388a23d11cb278600d7e3831d:
>
>   Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20180926' into 
> staging (2018-09-28 18:56:09 +0100)
>
> are available in the git repository at:
>
>   git://git.kraxel.org/qemu tags/ui-20181001-pull-request
>
> for you to fetch changes up to e8b1386ea1719525a1a92df03377764703fe8c64:
>
>   gtk: add zoom-to-fit to gtk options. (2018-10-01 11:29:03 +0200)
>
> 
> ui: some small fixes/improvements.
>
Applied, thanks.

-- PMM



Re: [Qemu-devel] [Qemu-ppc] [PATCH] PPC: e500: convert SysBus init method to a realize method

2018-10-01 Thread Greg Kurz
On Mon,  1 Oct 2018 17:04:13 +0200
Cédric Le Goater  wrote:

> Signed-off-by: Cédric Le Goater 
> ---

Reviewed-by: Greg Kurz 

>  hw/pci-host/ppce500.c | 14 ++
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/pci-host/ppce500.c b/hw/pci-host/ppce500.c
> index eb75e080fc10..b8f8c112e662 100644
> --- a/hw/pci-host/ppce500.c
> +++ b/hw/pci-host/ppce500.c
> @@ -436,8 +436,9 @@ static AddressSpace *e500_pcihost_set_iommu(PCIBus *bus, 
> void *opaque,
>  return &s->bm_as;
>  }
>  
> -static int e500_pcihost_initfn(SysBusDevice *dev)
> +static void e500_pcihost_realize(DeviceState *dev, Error **errp)
>  {
> +SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
>  PCIHostState *h;
>  PPCE500PCIState *s;
>  PCIBus *b;
> @@ -447,7 +448,7 @@ static int e500_pcihost_initfn(SysBusDevice *dev)
>  s = PPC_E500_PCI_HOST_BRIDGE(dev);
>  
>  for (i = 0; i < ARRAY_SIZE(s->irq); i++) {
> -sysbus_init_irq(dev, &s->irq[i]);
> +sysbus_init_irq(sbd, &s->irq[i]);
>  }
>  
>  for (i = 0; i < PCI_NUM_PINS; i++) {
> @@ -460,7 +461,7 @@ static int e500_pcihost_initfn(SysBusDevice *dev)
>  /* PIO lives at the bottom of our bus space */
>  memory_region_add_subregion_overlap(&s->busmem, 0, &s->pio, -2);
>  
> -b = pci_register_root_bus(DEVICE(dev), NULL, mpc85xx_pci_set_irq,
> +b = pci_register_root_bus(dev, NULL, mpc85xx_pci_set_irq,
>mpc85xx_pci_map_irq, s, &s->busmem, &s->pio,
>PCI_DEVFN(s->first_slot, 0), 4, TYPE_PCI_BUS);
>  h->bus = b;
> @@ -483,10 +484,8 @@ static int e500_pcihost_initfn(SysBusDevice *dev)
>  memory_region_add_subregion(&s->container, PCIE500_CFGADDR, 
> &h->conf_mem);
>  memory_region_add_subregion(&s->container, PCIE500_CFGDATA, 
> &h->data_mem);
>  memory_region_add_subregion(&s->container, PCIE500_REG_BASE, &s->iomem);
> -sysbus_init_mmio(dev, &s->container);
> +sysbus_init_mmio(sbd, &s->container);
>  pci_bus_set_route_irq_fn(b, e500_route_intx_pin_to_irq);
> -
> -return 0;
>  }
>  
>  static void e500_host_bridge_class_init(ObjectClass *klass, void *data)
> @@ -526,9 +525,8 @@ static Property pcihost_properties[] = {
>  static void e500_pcihost_class_init(ObjectClass *klass, void *data)
>  {
>  DeviceClass *dc = DEVICE_CLASS(klass);
> -SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
>  
> -k->init = e500_pcihost_initfn;
> +dc->realize = e500_pcihost_realize;
>  set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
>  dc->props = pcihost_properties;
>  dc->vmsd = &vmstate_ppce500_pci;




Re: [Qemu-devel] [PATCH 4/7] qcow2: async scheme for qcow2_co_preadv

2018-10-01 Thread Vladimir Sementsov-Ogievskiy

27.09.2018 21:35, Max Reitz wrote:

On 07.08.18 19:43, Vladimir Sementsov-Ogievskiy wrote:

Start several async requests instead of read chunk by chunk.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/qcow2.c | 208 --
  1 file changed, 204 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 5e7f2ee318..a0df8d4e50 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1869,6 +1869,197 @@ out:
  return ret;
  }
  
+typedef struct Qcow2WorkerTask {

+uint64_t file_cluster_offset;
+uint64_t offset;
+uint64_t bytes;
+uint64_t bytes_done;
+} Qcow2WorkerTask;

Why don't you make this a union of request-specific structs?


ok, will try




+
+typedef int (*Qcow2DoWorkFunc)(BlockDriverState *bs, QEMUIOVector *qiov,
+   Qcow2WorkerTask *task);
+
+typedef struct Qcow2RWState {
+BlockDriverState *bs;
+QEMUIOVector *qiov;
+uint64_t bytes;

Maybe make it total_bytes so it doesn't conflict with the value in
Qcow2WorkerTask?


ok




+int ret;
+bool waiting_one;
+bool waiting_all;
+bool finalize;
+Coroutine *co;
+QSIMPLEQ_HEAD(, Qcow2Worker) free_workers;
+QSIMPLEQ_HEAD(, Qcow2Worker) busy_workers;
+int online_workers;
+Qcow2DoWorkFunc do_work_func;
+} Qcow2RWState;
+
+typedef struct Qcow2Worker {
+Qcow2RWState *rws;
+Coroutine *co;
+Qcow2WorkerTask task;
+bool busy;
+QSIMPLEQ_ENTRY(Qcow2Worker) entry;
+} Qcow2Worker;
+#define QCOW2_MAX_WORKERS 64

That's really a bit hidden here.  I think it should go into the header.

Also I'm not quite sure about the number.  In other places we've always
used 16.

(With the encryption code always allocating a new bounce buffer, this
can mean quite a bit of memory usage.)


No doubts.




+
+static coroutine_fn void qcow2_rw_worker(void *opaque);
+static Qcow2Worker *qcow2_new_worker(Qcow2RWState *rws)
+{
+Qcow2Worker *w = g_new0(Qcow2Worker, 1);
+w->rws = rws;
+w->co = qemu_coroutine_create(qcow2_rw_worker, w);
+
+return w;
+}
+
+static void qcow2_free_worker(Qcow2Worker *w)
+{
+g_free(w);
+}
+
+static coroutine_fn void qcow2_rw_worker(void *opaque)
+{
+Qcow2Worker *w = opaque;
+Qcow2RWState *rws = w->rws;
+
+rws->online_workers++;
+
+while (!rws->finalize) {
+int ret = rws->do_work_func(rws->bs, rws->qiov, &w->task);
+if (ret < 0 && rws->ret == 0) {
+rws->ret = ret;
+}
+
+if (rws->waiting_all || rws->ret < 0) {
+break;
+}
+
+w->busy = false;
+QSIMPLEQ_REMOVE(&rws->busy_workers, w, Qcow2Worker, entry);
+QSIMPLEQ_INSERT_TAIL(&rws->free_workers, w, entry);
+if (rws->waiting_one) {
+rws->waiting_one = false;
+/* we must unset it here, to prevent queuing rws->co in several
+ * workers (it may happen if other worker already waits us on 
mutex,
+ * so it will be entered after our yield and before rws->co enter)
+ *
+ * TODO: rethink this comment, as here (and in other places in the
+ * file) we moved from qemu_coroutine_add_next to aio_co_wake.
+ */
+aio_co_wake(rws->co);
+}
+
+qemu_coroutine_yield();
+}
+
+if (w->busy) {
+w->busy = false;
+QSIMPLEQ_REMOVE(&rws->busy_workers, w, Qcow2Worker, entry);
+}
+qcow2_free_worker(w);
+rws->online_workers--;
+
+if (rws->waiting_all && rws->online_workers == 0) {
+aio_co_wake(rws->co);
+}
+}
+
+static coroutine_fn void qcow2_rws_add_task(Qcow2RWState *rws,
+uint64_t file_cluster_offset,
+uint64_t offset,
+uint64_t bytes,
+uint64_t bytes_done)

I'd propose just taking a const Qcow2WorkerTask * here.  (Makes even
more sense if you make it a union.)


ok, I'll try this way




+{
+Qcow2Worker *w;
+
+assert(rws->co == qemu_coroutine_self());
+
+if (bytes_done == 0 && bytes == rws->bytes) {
+Qcow2WorkerTask task = {
+.file_cluster_offset = file_cluster_offset,
+.offset = offset,
+.bytes = bytes,
+.bytes_done = bytes_done
+};
+rws->ret = rws->do_work_func(rws->bs, rws->qiov, &task);

(If so, you'd just pass the pointer along here)


+return;
+}

I like this fast path, but I think it deserves a small comment.  (That
is a fast path and bypasses the whole worker infrastructure.)


+
+if (!QSIMPLEQ_EMPTY(&rws->free_workers)) {
+w = QSIMPLEQ_FIRST(&rws->free_workers);
+QSIMPLEQ_REMOVE_HEAD(&rws->free_workers, entry);
+} else if (rws->online_workers < QCOW2_MAX_WORKERS) {
+w = qcow2_new_worker(rws);
+} else {
+rws->waiting_one = true;
+qemu_

  1   2   3   >