[Qemu-devel] change x86 default machine type to Q35?

2017-07-04 Thread Chao Peng
Hi,
 
Q35 has been in QEMU for quite a while. Compared to the current default
i440FX, Q35 is probably not that mature and not widely used, however in
some case, Q35 has advantages, for example, in supporting new features.
For instance, we have some features require PCI-e support which is only
available on Q35 and some others need it for EFI support. It is of
course not necessary to change it as the default but if more and more
features have dependencies on Q35 because of requiring much more modern
features then I think it may be worth to do so. In such case we can have
more people to use it and find problems we may know or not know. There
are certainly some drawbacks:
-Compatibility: current code or script may need adjustment
-Quality: we may suffer more bugs on Q35
 
Any thoughts?
 
Chao



Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation

2017-07-04 Thread Tian, Kevin
> From: Liu, Yi L
> Sent: Monday, July 3, 2017 6:31 PM
> 
> Hi Jean,
> 
> 
> >
> > > 2. Define a structure in include/uapi/linux/iommu.h(newly added header
> file)
> > >
> > > struct iommu_tlb_invalidate {
> > >   __u32   scope;
> > > /* pasid-selective invalidation described by @pasid */
> > > #define IOMMU_INVALIDATE_PASID(1 << 0)
> > > /* address-selevtive invalidation described by (@vaddr, @size) */
> > > #define IOMMU_INVALIDATE_VADDR(1 << 1)

For VT-d above two flags are related. There is no method of flushing
(@vaddr, @size) for all pasids, which doesn't make sense. address-
selective invalidation is valid only for a given pasid. So it's not appropriate
to put them in same level of scope definition at least for VT-d.

> > >   __u32   flags;
> > > /*  targets non-pasid mappings, @pasid is not valid */
> > > #define IOMMU_INVALIDATE_NO_PASID (1 << 0)
> >
> > Although it was my proposal, I don't like this flag. In ARM SMMU, we're
> > using a special mode where PASID 0 is reserved and any traffic without
> > PASID uses entry 0 of the PASID table. So I proposed the "NO_PASID" flag
> > to invalidate that special context explicitly. But this means that
> > invalidation packet targeted at that context will have "scope = PASID" and
> > "flags = NO_PASID", which is utterly confusing.
> >
> > I now think that we should get rid of the IOMMU_INVALIDATE_NO_PASID
> flag
> > and just use PASID 0 to invalidate this context on ARM. I don't think
> > other architectures would use the NO_PASID flag anyway, but might be
> mistaken.
> 
> I may suggest to keep it so far. On VT-d, we may pass some data in opaque,
> so
> we may work without it. But if other vendor want to issue non-PASID tagged
> cache, then may encounter problem.

I'm worried about what's the criteria which attribute should be abstracted
in common structure and which can be left to opaque. It doesn't make
much sense to do such abstraction purely because different vendor formats
have some common fields. Usually we do such abstraction because 
vendor-agnostic code need to do some common handling before going to
vendor specific code. However in this case VFIO is not expected to do anything
with those IOMMU specific attributes. Then the structure is directly forwarded
to IOMMU driver, which simply translates the structure into vendor specific
opaque data again. Then why bothering to do double translations in Qemu
and IOMMU driver side?

Take VT-d for example. Below is a summary of all possible selections around
invalidation of 1st level structure for svm:

Scope: All PASIDs, single PASID
for each PASID:
all mappings, or page-selective mappings (addr, size)
invalidation target:
IOTLB entries (leaf)
paging structure cache (non-leaf)
PASID cache (pasid->cr3)
invalidation hint:
whether global pages are included
drain reads/writes

Above are pretty architectural attributes if just looking at functional
purpose. Then if we really consider defining a common structure, it
might be more natural to define a superset of all vendors' capabilities
and remove the opaque field at all. But as said earlier the purpose of
doing such abstraction is not clear if there is no vendor-agnostic
user actually digesting those fields. Then should we reconsider the
full opaque approach?

Welcome comments since I may overlook something here. :-)

Thanks
Kevin



Re: [Qemu-devel] [RFC v2 6/8] virtio-iommu: Implement the translation and commands

2017-07-04 Thread Auger Eric
Hi Bharat,

On 04/07/2017 11:13, Bharat Bhushan wrote:
> Hi Eric,
> 
>> -Original Message-
>> From: Eric Auger [mailto:eric.au...@redhat.com]
>> Sent: Wednesday, June 07, 2017 9:31 PM
>> To: eric.auger@gmail.com; eric.au...@redhat.com;
>> peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com;
>> qemu-...@nongnu.org; qemu-devel@nongnu.org; jean-
>> philippe.bruc...@arm.com
>> Cc: will.dea...@arm.com; robin.mur...@arm.com; kevin.t...@intel.com;
>> marc.zyng...@arm.com; christoffer.d...@linaro.org; drjo...@redhat.com;
>> w...@redhat.com; t...@semihalf.com; Bharat Bhushan
>> 
>> Subject: [RFC v2 6/8] virtio-iommu: Implement the translation and
>> commands
>>
>> This patch adds the actual implementation for the translation routine
>> and the virtio-iommu commands.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> v1 -> v2:
>> - fix compilation issue reported by autobuild system
>> ---
>>  hw/virtio/trace-events   |   6 ++
>>  hw/virtio/virtio-iommu.c | 202
>> +--
>>  2 files changed, 202 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
>> index 341dbdf..9196b63 100644
>> --- a/hw/virtio/trace-events
>> +++ b/hw/virtio/trace-events
>> @@ -33,3 +33,9 @@ virtio_iommu_detach(uint32_t dev, uint32_t flags)
>> "dev=%d flags=%d"
>>  virtio_iommu_map(uint32_t as, uint64_t phys_addr, uint64_t virt_addr,
>> uint64_t size, uint32_t flags) "as= %d phys_addr=0x%"PRIx64"
>> virt_addr=0x%"PRIx64" size=0x%"PRIx64" flags=%d"
>>  virtio_iommu_unmap(uint32_t as, uint64_t virt_addr, uint64_t size, uint32_t
>> reserved) "as= %d virt_addr=0x%"PRIx64" size=0x%"PRIx64" reserved=%d"
>>  virtio_iommu_translate(const char *name, uint32_t rid, uint64_t iova, int
>> flag) "mr=%s rid=%d addr=0x%"PRIx64" flag=%d"
>> +virtio_iommu_new_asid(uint32_t asid) "Allocate a new asid=%d"
>> +virtio_iommu_new_devid(uint32_t devid) "Allocate a new devid=%d"
>> +virtio_iommu_unmap_left_interval(uint64_t low, uint64_t high, uint64_t
>> next_low, uint64_t next_high) "Unmap left [0x%"PRIx64",0x%"PRIx64"],
>> new interval=[0x%"PRIx64",0x%"PRIx64"]"
>> +virtio_iommu_unmap_right_interval(uint64_t low, uint64_t high, uint64_t
>> next_low, uint64_t next_high) "Unmap right [0x%"PRIx64",0x%"PRIx64"],
>> new interval=[0x%"PRIx64",0x%"PRIx64"]"
>> +virtio_iommu_unmap_inc_interval(uint64_t low, uint64_t high) "Unmap inc
>> [0x%"PRIx64",0x%"PRIx64"]"
>> +virtio_iommu_translate_result(uint64_t virt_addr, uint64_t phys_addr,
>> uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>> index 902c779..cd188fc 100644
>> --- a/hw/virtio/virtio-iommu.c
>> +++ b/hw/virtio/virtio-iommu.c
>> @@ -32,10 +32,37 @@
>>  #include "hw/virtio/virtio-bus.h"
>>  #include "hw/virtio/virtio-access.h"
>>  #include "hw/virtio/virtio-iommu.h"
>> +#include "hw/pci/pci_bus.h"
>> +#include "hw/pci/pci.h"
>>
>>  /* Max size */
>>  #define VIOMMU_DEFAULT_QUEUE_SIZE 256
>>
>> +typedef struct viommu_as viommu_as;
>> +
>> +typedef struct viommu_mapping {
>> +uint64_t virt_addr;
>> +uint64_t phys_addr;
>> +uint64_t size;
>> +uint32_t flags;
>> +} viommu_mapping;
>> +
>> +typedef struct viommu_interval {
>> +uint64_t low;
>> +uint64_t high;
>> +} viommu_interval;
>> +
>> +typedef struct viommu_dev {
>> +uint32_t id;
>> +viommu_as *as;
>> +} viommu_dev;
>> +
>> +struct viommu_as {
>> +uint32_t id;
>> +uint32_t nr_devices;
>> +GTree *mappings;
>> +};
>> +
>>  static inline uint16_t smmu_get_sid(IOMMUDevice *dev)
>>  {
>>  return  ((pci_bus_num(dev->bus) & 0xff) << 8) | dev->devfn;
>> @@ -88,6 +115,19 @@ static void virtio_iommu_init_as(VirtIOIOMMU *s)
>>  }
>>  }
>>
>> +static gint interval_cmp(gconstpointer a, gconstpointer b, gpointer
>> user_data)
>> +{
>> +viommu_interval *inta = (viommu_interval *)a;
>> +viommu_interval *intb = (viommu_interval *)b;
>> +
>> +if (inta->high <= intb->low) {
>> +return -1;
>> +} else if (intb->high <= inta->low) {
>> +return 1;
>> +} else {
>> +return 0;
>> +}
>> +}
>>
>>  static int virtio_iommu_attach(VirtIOIOMMU *s,
>> struct virtio_iommu_req_attach *req)
>> @@ -95,10 +135,34 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
>>  uint32_t asid = le32_to_cpu(req->address_space);
>>  uint32_t devid = le32_to_cpu(req->device);
>>  uint32_t reserved = le32_to_cpu(req->reserved);
>> +viommu_as *as;
>> +viommu_dev *dev;
>>
>>  trace_virtio_iommu_attach(asid, devid, reserved);
>>
>> -return VIRTIO_IOMMU_S_UNSUPP;
>> +dev = g_tree_lookup(s->devices, GUINT_TO_POINTER(devid));
>> +if (dev) {
>> +return -1;
>> +}
>> +
>> +as = g_tree_lookup(s->address_spaces, GUINT_TO_POINTER(asid));
>> +if (!as) {
>> +as = g_malloc0(sizeof(*as));
>> +as->id = asid;
>> +as->mappings = g_tree_new_full((GC

[Qemu-devel] 答复: Re: [PATCH RESEND v6] qga: Add support networkinterface statistics in guest-network-get-interfaces command

2017-07-04 Thread lu.zhipeng
Thanks for the review.




>Hi>On Tue, Jul 4, 2017 at 10:51 AM ZhiPeng Lu  wrote:> 
>>we can get the network interface statistics inside a virtual machine by> 
>>guest-network-get-interfaces command. it is very useful for us to monitor> 
>>and analyze network traffic.


>It's nicer if you give v1->v2->..->v6 change summary at each revision

it's a good  suggestion.




>> Signed-off-by: ZhiPeng Lu >>--->>  
>> qga/commands-posix.c | 80>> 
>> +++->>  qga/qapi-schema.json 
>> | 38 ->>  2 files changed, 116 insertions(+), 2 
>> deletions(-) diff --git a/qga/commands-posix.c b/qga/commands-posix.c>> 
>> index 915df9e..233b024 100644>> --- a/qga/commands-posix.c>> +++ 
>> b/qga/commands-posix.c>> @@ -1638,6 +1638,73 @@ 
>> guest_find_interface(GuestNetworkInterfaceList>> *head,>>  return head>> 
>>  } +>> +static int str_trim_off(const char *s, int off, int lmt)>> +{>> 
>> +for ( off < lmt ++off) {>> +if (!isspace(s[off])) {>> + 
>>break>> +}>> +}>> +return off>> +}>> +>> +static int 
>> guest_get_network_stats(const char *name,>> +   
>> GuestNetworkInterfaceStat *stats)>> +{>> +int name_len>> +char const 
>> *devinfo = "/proc/net/dev">> +FILE *fp>> +char *line = NULL, 
>> *colon>> +size_t n>> +fp = fopen(devinfo, "r")>> +if (!fp) {>>+  
>>   return -1>> +}>> +name_len = strlen(name)>> +while 
>> (getline(&line, &n, fp) != -1) {>> +long long dummy>> +long 
>> long rx_bytes>> +long long rx_packets>> +long long rx_errs>> 
>> +long long rx_dropped>> +long long tx_bytes>> +long 
>> long tx_packets>> +long long tx_errs>> +long long 
>> tx_dropped>>> Why have local variables, and not pass stats->filed directly?

if passing stats->filed directly ,stats->filed stores wrong data when it 
doesn't  find the network interface.


>>+int trim_off>>+colon = strchr(line, ':')>> +if 
>>(!colon) {>> +continue>> +}>> +trim_off = 
>>str_trim_off(line, 0, strlen(line))>>>Couldn't you  call g_strchomp() ?







i agreed,i will call g_strchomp().

>> +if (colon - name_len - trim_off == line &&>>+   
>> strncmp(line + trim_off, name, colon - line - trim_off) == 0) {>> +  
>>   if (sscanf(colon + 1,>> +"%lld %lld %lld %lld %lld %lld 
>> %lld %lld %lld %lld %lld>> %lld %lld %lld %lld %lld",>> +  
>> &rx_bytes, &rx_packets, &rx_errs, &rx_dropped,>> +  &dummy, 
>> &dummy, &dummy, &dummy,>> +  &tx_bytes, &tx_packets, 
>> &tx_errs, &tx_dropped,>> +  &dummy, &dummy, &dummy, &dummy) 
>> != 16) {>> +continue>> +}>> +
>> stats->rx_bytes = rx_bytes>>+stats->rx_packets = rx_packets>> +  
>>  stats->rx_errs = rx_errs>> +stats->rx_dropped = 
>> rx_dropped>> +stats->tx_bytes = tx_bytes>> +
>> stats->tx_packets = tx_packets>> +stats->tx_errs = tx_errs>> +   
>>  stats->tx_dropped = tx_dropped>> +fclose(fp)>> +
>> return 0>> +}>> +}>> +fclose(fp)>> +
>> g_debug("/proc/net/dev: Interface not found")>> +return -1>> +}>> +>>  
>> /*>>   * Build information about guest interfaces>>  */>> @@ -1654,6 +1721,7 
>> @@ GuestNetworkInterfaceList>> *qmp_guest_network_get_interfaces(Error 
>> **errp)>>  for (ifa = ifap ifa ifa = ifa->ifa_next) {> > 
>> GuestNetworkInterfaceList *info> > GuestIpAddressList **address_list 
>> = NULL, *address_item = NULL>> +GuestNetworkInterfaceStat  
>> *interface_stat = NULL>>  char addr4[INET_ADDRSTRLEN]>>  
>> char addr6[INET6_ADDRSTRLEN]>>  int sock>> @@ -1773,7 +1841,17 @@ 
>> GuestNetworkInterfaceList>> *qmp_guest_network_get_interfaces(Error 
>> **errp)  info->value->has_ip_addresses = true ->> +
>> if (!info->value->has_statistics) {>> +interface_stat = 
>> g_malloc0(sizeof(*interface_stat))>> +if 
>> (guest_get_network_stats(info->value->name,>> +
>> interface_stat) == -1) {>> +info->value->has_statistics = 
>> false>> +g_free(interface_stat)>> +} else {>> +  
>>   info->value->statistics = interface_stat>> +
>> info->value->has_statistics = true>> +}>> +}>>  
>> }  freeifaddrs(ifap)>> diff --git a/qga/qapi-schema.json 
>> b/qga/qapi-schema.json>> index a02dbf2..948219b 100644>> --- 
>> a/qga/qapi-schema.json>> +++ b/qga/qapi-schema.json>> @@ -635,6 +635,38 @@>> 
>> 'prefix': 'int'} }  ##>> +# @GuestNetworkInterfaceStat:>> 
>> +#>> +# @rx-bytes: total bytes received>> +#>> +# @rx-packets: total packets 
>> received>>

Re: [Qemu-devel] [PATCH RFC v19 00/13] QEMU AVR 8 bit cores

2017-07-04 Thread Michael Rolnik
Hi Richard.

Thank you for finding a bug.
As for the testing
1. I have a small program that calculates fibonacci numbers. I use it to
test mainly stack operations for different CPU flavors (1, 2 or 3 bytes PC).
2. I manually verified that I can debug with gdb, that includes
a. stepping
b. running
c. stopping
d. breakpoints
e. backtrace shows correct stack
3. I manually verified that I can save and load vm states
4. `make check` passes

If you please point me to a doc that explains how automatic unit tests can
be added to a QEMU target, I will happily add them.


Regards,
Michael


On Wed, Jul 5, 2017 at 1:38 AM, Richard Henderson  wrote:

> On 06/21/2017 09:15 PM, Michael Rolnik wrote:
>
>> Hi all,
>>
>> are there any action items for me?
>>
>
> What kind of testing are you doing for this?
>
> I just briefly browsed through the code again and happened to see that ROR
> has a critical typo.  Considering that ROR must be used for multi-byte
> shifts, I'm wondering how you wouldn't have found this via even cursory
> testing.
>
> +int avr_translate_ROR(CPUAVRState *env, DisasContext *ctx, uint32_t
>> opcode)
>> +{
>> +TCGv Rd = cpu_r[ROR_Rd(opcode)];
>> +TCGv t0 = tcg_temp_new_i32();
>> +
>> +tcg_gen_shli_tl(t0, cpu_Cf, 7);
>> +tcg_gen_andi_tl(cpu_Cf, Rd, 0);
>>
>
>
> r~
>



-- 
Best Regards,
Michael Rolnik


Re: [Qemu-devel] [PATCH] virtio-scsi: finalize IOMMU support

2017-07-04 Thread Paolo Bonzini


On 05/07/2017 08:23, Fam Zheng wrote:
>>  req->vq = vq;
>>  req->dev = s;
>> -qemu_sglist_init(&req->qsgl, DEVICE(s), 8, &address_space_memory);
>> +qemu_sglist_init(&req->qsgl, DEVICE(s), 8, vdev->dma_as);
> Not related to this patch, but is our vIOMMU iotlb implementation safe for 
> data
> plane (both virtio-blk and virtio-scsi)? There we don't have BQL to 
> synchronize
> between IOVA lookup and update, which means threads can race when accessing 
> the
> iotlb GHashTable.
> 
> Did I overlook a synchronization mechanism? Or is it missing? Paolo, MST?

I think it's missing.  One way to fix it could be to finish the
MemoryRegionCache work and have an IOMMU->virtio notifier, similar to
how vhost works.

Paolo



Re: [Qemu-devel] [PATCH] virtio-scsi: finalize IOMMU support

2017-07-04 Thread Fam Zheng
On Tue, 07/04 20:21, Jason Wang wrote:
> After converting to use DMA api for virtio devices, we should use
> dma_as instead of address_space_memory. Otherwise it won't work if
> IOMMU is enabled.
> 
> Fixes: commit 8607f5c3072c ("virtio: convert to use DMA api")
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Jason Wang 
> ---
>  hw/scsi/virtio-scsi.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index f46f06d..d076fe7 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -43,12 +43,13 @@ static inline SCSIDevice 
> *virtio_scsi_device_find(VirtIOSCSI *s, uint8_t *lun)
>  
>  void virtio_scsi_init_req(VirtIOSCSI *s, VirtQueue *vq, VirtIOSCSIReq *req)
>  {
> +VirtIODevice *vdev = VIRTIO_DEVICE(s);
>  const size_t zero_skip =
>  offsetof(VirtIOSCSIReq, resp_iov) + sizeof(req->resp_iov);
>  
>  req->vq = vq;
>  req->dev = s;
> -qemu_sglist_init(&req->qsgl, DEVICE(s), 8, &address_space_memory);
> +qemu_sglist_init(&req->qsgl, DEVICE(s), 8, vdev->dma_as);

Not related to this patch, but is our vIOMMU iotlb implementation safe for data
plane (both virtio-blk and virtio-scsi)? There we don't have BQL to synchronize
between IOVA lookup and update, which means threads can race when accessing the
iotlb GHashTable.

Did I overlook a synchronization mechanism? Or is it missing? Paolo, MST?

Fam

>  qemu_iovec_init(&req->resp_iov, 1);
>  memset((uint8_t *)req + zero_skip, 0, sizeof(*req) - zero_skip);
>  }
> -- 
> 2.7.4
> 
> 



[Qemu-devel] [PATCH] vnc: Set default kbd delay to 10ms

2017-07-04 Thread Alexander Graf
The default keyboard delay time in the input layer is 10ms. I don't know
how that number came to be, but empirical tests on some OpenQA driven ARM
systems show that 10ms really is a reasonable default number for the delay.

With the current 1ms we're constantly typing faster than the guest receives
keyboard events from our XHCI attached USB HID device.

This patch moves the delay to 10ms. That way our default is much safer (good!)
and also consistent with the input layer default (also good!).

Signed-off-by: Alexander Graf 
---
 qemu-options.hx | 2 +-
 ui/vnc.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 896ff17..64806f0 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1730,7 +1730,7 @@ spec but is traditional QEMU behavior.
 @item key-delay-ms
 
 Set keyboard delay, for key down and key up events, in milliseconds.
-Default is 1.  Keyboards are low-bandwidth devices, so this slowdown
+Default is 10.  Keyboards are low-bandwidth devices, so this slowdown
 can help the device and guest to keep up and not lose events in case
 events are arriving in bulk.  Possible causes for the latter are flaky
 network connections, or scripts for automated testing.
diff --git a/ui/vnc.c b/ui/vnc.c
index 26136f5..eb91559 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -3808,7 +3808,7 @@ void vnc_display_open(const char *id, Error **errp)
 }
 
 lock_key_sync = qemu_opt_get_bool(opts, "lock-key-sync", true);
-key_delay_ms = qemu_opt_get_number(opts, "key-delay-ms", 1);
+key_delay_ms = qemu_opt_get_number(opts, "key-delay-ms", 10);
 sasl = qemu_opt_get_bool(opts, "sasl", false);
 #ifndef CONFIG_VNC_SASL
 if (sasl) {
-- 
1.8.5.6




Re: [Qemu-devel] Managing architectural restrictions with -device and libvirt

2017-07-04 Thread Markus Armbruster
Copying Marcel for PCI expertise.

Mark Cave-Ayland  writes:

> Hi all,
>
> I've been working on a patchset that brings the sun4u machine on
> qemu-system-sparc64 much closer to a real Ultra 5, however due to
> various design restrictions I need to be able to restrict how devices
> are added to the machine with -device.
>
> On a real Ultra 5, the root PCI bus (sabre) has 2 PCI bridges (simba A
> and simba B) with the onboard devices attached to simba A with 2 free
> slots, and an initially empty simba B.
>
> Firstly, is it possible to restrict the machine so that devices cannot
> be directly plugged into the root PCI bus, but only behind one of the
> PCI bridges? There is also an additional restriction in that slot 0
> behind simba A must be left empty to ensure that the ebus (containing
> the onboard devices) is the first device allocated.

I figure sabre, simba A, simba B and the onboard devices attached to
simba A are all created by MachineClass init().

What device provides "the ebus", and how is it created?

Can you provide a list of all onboard PCI devices and how they are
connected?  Diagram would be best.

The real sabre has two slots, and doesn't support hot (un)plug.  Can we
simply model that?  If yes, the root PCI bus is full after init(), and
remains full.  Takes care of "cannot directly plugged into the root PCI
bus".

> Secondly, how does libvirt handle these type of restrictions? Is it able
> to get the information from QEMU or is there some kind of libvirt
> profile that needs to be updated? And do newer versions of libvirt have
> the ability to attach devices behind PCI bridges using a GUI such as
> virt-manager, or is that only something that can only be done by
> directly editing the domain XML?

Got nothing to add to Dan's reply.



Re: [Qemu-devel] [RFC PATCH 7/8] VFIO: Add new IOCTL for IOMMU TLB invalidate propagation

2017-07-04 Thread Tian, Kevin
> From: Liu, Yi L [mailto:yi.l@linux.intel.com]
> Sent: Sunday, May 14, 2017 6:55 PM
> 
> On Fri, May 12, 2017 at 03:58:43PM -0600, Alex Williamson wrote:
> > On Wed, 26 Apr 2017 18:12:04 +0800
> > "Liu, Yi L"  wrote:
> >
> > > From: "Liu, Yi L" 
> > >
> > > This patch adds VFIO_IOMMU_TLB_INVALIDATE to propagate IOMMU
> TLB
> > > invalidate request from guest to host.
> > >
> > > In the case of SVM virtualization on VT-d, host IOMMU driver has
> > > no knowledge of caching structure updates unless the guest
> > > invalidation activities are passed down to the host. So a new
> > > IOCTL is needed to propagate the guest cache invalidation through
> > > VFIO.
> > >
> > > Signed-off-by: Liu, Yi L 
> > > ---
> > >  include/uapi/linux/vfio.h | 9 +
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > > index 6b97987..50c51f8 100644
> > > --- a/include/uapi/linux/vfio.h
> > > +++ b/include/uapi/linux/vfio.h
> > > @@ -564,6 +564,15 @@ struct vfio_device_svm {
> > >
> > >  #define VFIO_IOMMU_SVM_BIND_TASK _IO(VFIO_TYPE, VFIO_BASE +
> 22)
> > >
> > > +/* For IOMMU TLB Invalidation Propagation */
> > > +struct vfio_iommu_tlb_invalidate {
> > > + __u32   argsz;
> > > + __u32   length;
> > > + __u8data[];
> > > +};
> > > +
> > > +#define VFIO_IOMMU_TLB_INVALIDATE_IO(VFIO_TYPE, VFIO_BASE +
> 23)
> >
> > I'm kind of wondering why this isn't just a new flag bit on
> > vfio_device_svm, the data structure is so similar.  Of course data
> > needs to be fully specified in uapi.
> 
> Hi Alex,
> 
> For this part, it depends on using opaque structure or not. The following
> link mentioned it in [Open] session.
> 
> http://www.spinics.net/lists/kvm/msg148798.html
> 
> If we pick the full opaque solution for iommu tlb invalidate propagation.
> Then I may add a flag bit on vfio_device_svm and also add definition in
> uapi as you suggested.
> 

there is another benefit to keep it as a separate command. For now
we only need to invalidate 1st level translation (GVA->GPA) for SVM,
since 1st level page table is provided by guest while directly walked
by IOMMU. It's possible some vendor may also choose to implement
a nested 2nd level translation (e.g. GIOVA->GPA->HPA) then hardware
can directly walk guest GIOVA page table thus explicit invalidation is
also required. We'd better not to limit invalidation interface with 
svm structure.

Thanks
Kevin



Re: [Qemu-devel] [PATCH 00/35] RFC: coroutine annotations & clang check

2017-07-04 Thread Markus Armbruster
Marc-André Lureau  writes:

> Hi,
>
> After investigating a bit using clang-tidy to do some coroutine checks
> (and hitting a wall as there are no pre-processor info in the AST), it
> was suggested to me on the clang mailing list to try to use
> -Wthread-safety. I had to modify clang a bit to make it work on qemu
> code base (annotations on function typedef etc,
> https://github.com/elmarco/clang qemu-ta branch - very hackish state).

Hardcore!

Are your clang modifications upstreamable once cleaned up?

> The analysis simply checks that coroutine_fn are called from a
> coroutine "context" (or "role"). I couldn't find any misuse in qemu
> code base, however, a number of coroutine_fn annotations are missing.
>
> (I think it would make sense to squash all the "mark coroutine_fn"
> commits if we apply them, I tried to split them by domains/maintainer
> to ease review)
[...]
>  49 files changed, 299 insertions(+), 132 deletions(-)

"A number of coroutine_fn annotations are missing" seems to be an
understatement :)



Re: [Qemu-devel] [PATCH v7 4/4] net/socket: Improve -net socket error reporting

2017-07-04 Thread Mao Zhongyi



On 07/05/2017 12:24 AM, Markus Armbruster wrote:

Mao Zhongyi  writes:


When -net socket fails, it first reports a specific error, then
a generic one, like this:

$ qemu-system-x86_64 -net socket,
qemu-system-x86_64: -net socket: exactly one of fd=, listen=, connect=, 
mcast= or udp= is required
qemu-system-x86_64: -net socket: Device 'socket' could not be initialized

Convert net_socket_*_init() to Error to get rid of the superfluous second
error message. After the patch, the effect like this:

$ qemu-system-x86_64 -net socket,
qemu-system-x86_64: -net socket: exactly one of fd=, listen=, connect=, 
mcast= or udp= is required

At the same time, add many explicit error handling message when it fails.

Cc: jasow...@redhat.com
Cc: arm...@redhat.com
Cc: berra...@redhat.com
Signed-off-by: Mao Zhongyi 
---
 net/socket.c | 94 +---
 1 file changed, 45 insertions(+), 49 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index bd80b3c..a891c3a 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -494,22 +494,21 @@ static void net_socket_accept(void *opaque)
 static int net_socket_listen_init(NetClientState *peer,
   const char *model,
   const char *name,
-  const char *host_str)
+  const char *host_str,
+  Error **errp)
 {
 NetClientState *nc;
 NetSocketState *s;
 struct sockaddr_in saddr;
 int fd, ret;
-Error *err = NULL;

-if (parse_host_port(&saddr, host_str, &err) < 0) {
-error_report_err(err);
+if (parse_host_port(&saddr, host_str, errp) < 0) {
 return -1;
 }

 fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
 if (fd < 0) {
-perror("socket");
+error_setg_errno(errp, errno, "failed to create stream socket");
 return -1;
 }
 qemu_set_nonblock(fd);
@@ -518,13 +517,14 @@ static int net_socket_listen_init(NetClientState *peer,

 ret = bind(fd, (struct sockaddr *)&saddr, sizeof(saddr));
 if (ret < 0) {
-perror("bind");
+error_setg_errno(errp, errno, "bind ip=%s to socket failed",
+ inet_ntoa(saddr.sin_addr));


Comment on the same error message in PATCH 2 applies.


OK, I will fix the similar problems right away.




 closesocket(fd);
 return -1;
 }
 ret = listen(fd, 0);
 if (ret < 0) {
-perror("listen");
+error_setg_errno(errp, errno, "listen socket failed");


Suggest "can't listen on socket".


OK, I will.



-fd = net_socket_mcast_create(&saddr, param_localaddr, &err);

[...]

 if (sock->has_listen) {
-if (net_socket_listen_init(peer, "socket", name, sock->listen) == -1) {
+if (net_socket_listen_init(peer, "socket", name,
+sock->listen, errp) == -1) {


Please avoid breaking the line in the middle of an operator's operand
when you could just as well break it at the operator, like this:

   if (net_socket_listen_init(peer, "socket", name, sock->listen, errp)
   == -1) {

Since you're touching this line anyway, I suggest to replace == -1 by
the more idiomatic < 0.



OK, I see.

Thanks,
Mao








Re: [Qemu-devel] [PATCH v7 2/4] net/socket: Convert several helper functions to Error

2017-07-04 Thread Mao Zhongyi

Hi, Markus

On 07/04/2017 10:54 PM, Markus Armbruster wrote:

Mao Zhongyi  writes:


Currently, net_socket_mcast_create(), net_socket_fd_init_dgram() and
net_socket_fd_init() use the function such as fprintf(), perror() to
report an error message.

Now, convert these functions to Error.

Cc: jasow...@redhat.com
Cc: arm...@redhat.com
Cc: berra...@redhat.com
Signed-off-by: Mao Zhongyi 
Reviewed-by: Daniel P. Berrange 
---
 net/socket.c | 81 +---
 1 file changed, 50 insertions(+), 31 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index 7d05e70..44fb504 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -209,7 +209,9 @@ static void net_socket_send_dgram(void *opaque)
 }
 }

-static int net_socket_mcast_create(struct sockaddr_in *mcastaddr, struct 
in_addr *localaddr)
+static int net_socket_mcast_create(struct sockaddr_in *mcastaddr,
+   struct in_addr *localaddr,
+   Error **errp)
 {
 struct ip_mreq imr;
 int fd;
@@ -221,16 +223,16 @@ static int net_socket_mcast_create(struct sockaddr_in 
*mcastaddr, struct in_addr
 #endif

 if (!IN_MULTICAST(ntohl(mcastaddr->sin_addr.s_addr))) {
-fprintf(stderr, "qemu: error: specified mcastaddr \"%s\" (0x%08x) "
-"does not contain a multicast address\n",
-inet_ntoa(mcastaddr->sin_addr),
-(int)ntohl(mcastaddr->sin_addr.s_addr));
+error_setg(errp, "specified mcastaddr %s (0x%08x) "
+   "does not contain a multicast address",
+   inet_ntoa(mcastaddr->sin_addr),
+   (int)ntohl(mcastaddr->sin_addr.s_addr));
 return -1;



You could drop the empty line here, and ...


OK, I see.




 }


... maybe insert one here.


 fd = qemu_socket(PF_INET, SOCK_DGRAM, 0);
 if (fd < 0) {
-perror("socket(PF_INET, SOCK_DGRAM)");
+error_setg_errno(errp, errno, "failed to create datagram socket");


I suggested "can't create datagram socket" in my review of v5.  I prefer
"can't " over "failed to " myself, but grep
shows more strings starting "failed to" than with "can't".  I guess
reporting "failed to " is more common.  Okay.



Yes, I also prefer "can't ", but I found the "failed to "
in source code more generic and thus borrowed from it. I think both of these 
are ok,
it's depends on individual style. I decided to follow my heart and use "can't 
..."
in the next version, thanks.



 return -1;
 }

@@ -242,13 +244,15 @@ static int net_socket_mcast_create(struct sockaddr_in 
*mcastaddr, struct in_addr
 val = 1;
 ret = qemu_setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val));
 if (ret < 0) {
-perror("setsockopt(SOL_SOCKET, SO_REUSEADDR)");
+error_setg_errno(errp, errno, "set the socket 'SO_REUSEADDR'"
+ " attribute failed");


Adapting my review of v5 to the next instance: I'm not a native speaker,
but "set FOO failed" doesn't feel right to me.  Where's the subject?
"Create" is a verb.  "Setting FOO failed" has a subject, but doesn't
feel right.  What about "can't set socket option SO_REUSEADDR"?


I‘ll examine all the similar issues and fix them completely in the next
version.



Also, please avoid breaking the line in the middle of an argument when
you could just as well break it between arguments, like this:

   error_setg_errno(errp, errno,
"can't set socket attribute SO_REUSEADDR");



Yes, it is more reasonable than before. But sometimes if the argument is
so long that it has to break the line in the middle of an argument.


 goto fail;
 }

 ret = bind(fd, (struct sockaddr *)mcastaddr, sizeof(*mcastaddr));
 if (ret < 0) {
-perror("bind");
+error_setg_errno(errp, errno, "bind ip=%s to socket failed",
+ inet_ntoa(mcastaddr->sin_addr));


Likewise, "can't bind ip=%s to socket".


 goto fail;
 }

@@ -263,7 +267,8 @@ static int net_socket_mcast_create(struct sockaddr_in 
*mcastaddr, struct in_addr
 ret = qemu_setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
   &imr, sizeof(struct ip_mreq));
 if (ret < 0) {
-perror("setsockopt(IP_ADD_MEMBERSHIP)");
+error_setg_errno(errp, errno, "add socket to multicast group %s"
+ " failed", inet_ntoa(imr.imr_multiaddr));


Likewise.


 goto fail;
 }

@@ -272,7 +277,8 @@ static int net_socket_mcast_create(struct sockaddr_in 
*mcastaddr, struct in_addr
 ret = qemu_setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP,
   &loop, sizeof(loop));
 if (ret < 0) {
-perror("setsockopt(SOL_IP, IP_MULTICAST_LOOP)");
+error_setg_errno(errp, errno, "force multicast message to loopback"
+ " failed");


Likwise.


 goto fail;
 }

@@ -281,7 +287,8 @@ static int net_s

Re: [Qemu-devel] [PATCH] virtio-scsi: finalize IOMMU support

2017-07-04 Thread Wei Xu
On Tue, Jul 04, 2017 at 08:21:06PM +0800, Jason Wang wrote:
> After converting to use DMA api for virtio devices, we should use
> dma_as instead of address_space_memory. Otherwise it won't work if
> IOMMU is enabled.
> 
> Fixes: commit 8607f5c3072c ("virtio: convert to use DMA api")
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Jason Wang 
> ---
>  hw/scsi/virtio-scsi.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index f46f06d..d076fe7 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -43,12 +43,13 @@ static inline SCSIDevice 
> *virtio_scsi_device_find(VirtIOSCSI *s, uint8_t *lun)
>  
>  void virtio_scsi_init_req(VirtIOSCSI *s, VirtQueue *vq, VirtIOSCSIReq *req)
>  {
> +VirtIODevice *vdev = VIRTIO_DEVICE(s);
>  const size_t zero_skip =
>  offsetof(VirtIOSCSIReq, resp_iov) + sizeof(req->resp_iov);
>  
>  req->vq = vq;
>  req->dev = s;
> -qemu_sglist_init(&req->qsgl, DEVICE(s), 8, &address_space_memory);
> +qemu_sglist_init(&req->qsgl, DEVICE(s), 8, vdev->dma_as);
>  qemu_iovec_init(&req->resp_iov, 1);
>  memset((uint8_t *)req + zero_skip, 0, sizeof(*req) - zero_skip);
>  }

Reviewed-by: Wei Xu 

> -- 
> 2.7.4
> 
> 



Re: [Qemu-devel] [PATCH v4 1/1] virtio-scsi-ccw: use ioeventfd even when KVM is disabled

2017-07-04 Thread QingFeng Hao



在 2017/7/4 22:04, Christian Borntraeger 写道:

On 07/04/2017 03:23 PM, QingFeng Hao wrote:

This patch is based on a similar patch from Stefan Hajnoczi -
commit c324fd0a39c ("virtio-pci: use ioeventfd even when KVM is disabled")

Do not check kvm_eventfds_enabled() when KVM is disabled since it
always returns 0.  Since commit 8c56c1a592b5092d91da8d8943c1d6462a6f
("memory: emulate ioeventfd") it has been possible to use ioeventfds in
qtest or TCG mode.

This patch makes -device virtio-scsi-ccw,iothread=iothread0 work even
when KVM is disabled.
Currently we don't have an equivalent to "memory: emulate ioeventfd"
for ccw yet, but that this doesn't hurt and qemu-iotests 068 can pass with
skipping iothread arguments.

I have tested that virtio-scsi-ccw works under tcg both with and without
iothread.

This patch fixes qemu-iotests 068, which was accidentally merged early
despite the dependency on ioeventfd.

Signed-off-by: QingFeng Hao 
Reviewed-by: Cornelia Huck 

I will take it via the s390-next tree.

thanks applied.

Ok, thanks.

---
  hw/s390x/virtio-ccw.c | 2 +-
  target/s390x/cpu.h| 6 +-
  2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index 90d37cb9ff..35896eb007 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -711,7 +711,7 @@ static void virtio_ccw_device_realize(VirtioCcwDevice *dev, 
Error **errp)
  sch->cssid, sch->ssid, sch->schid, sch->devno,
  ccw_dev->devno.valid ? "user-configured" : "auto-configured");

-if (!kvm_eventfds_enabled()) {
+if (kvm_enabled() && !kvm_eventfds_enabled()) {
  dev->flags &= ~VIRTIO_CCW_FLAG_USE_IOEVENTFD;
  }

diff --git a/target/s390x/cpu.h b/target/s390x/cpu.h
index 9faca04b52..bdb9bdbc9d 100644
--- a/target/s390x/cpu.h
+++ b/target/s390x/cpu.h
@@ -1264,7 +1264,11 @@ static inline int 
s390_assign_subch_ioeventfd(EventNotifier *notifier,
uint32_t sch_id, int vq,
bool assign)
  {
-return kvm_s390_assign_subch_ioeventfd(notifier, sch_id, vq, assign);
+if (kvm_enabled()) {
+return kvm_s390_assign_subch_ioeventfd(notifier, sch_id, vq, assign);
+} else {
+return 0;
+}
  }

  static inline void s390_crypto_reset(void)



--
Regards
QingFeng Hao




Re: [Qemu-devel] [PATCH 7/7] MAINTAINERS: add Dump maintainers

2017-07-04 Thread Laszlo Ersek
On 06/29/17 15:23, Marc-André Lureau wrote:
> Proposing myself, since I have some familiarity with the code now.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  MAINTAINERS | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 839f7ca063..45a0eb4cb0 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1272,6 +1272,13 @@ S: Maintained
>  F: device_tree.c
>  F: include/sysemu/device_tree.h
>  
> +Dump
> +S: Supported
> +M: Marc-André Lureau 
> +F: dump.c
> +F: include/sysemu/dump.h
> +F: include/sysemu/dump-arch.h
> +
>  Error reporting
>  M: Markus Armbruster 
>  S: Supported
> 

That's very kind of you, thanks!

Do you have suggestions for the following files?

scripts/dump-guest-memory.py
stubs/dump.c
target/arm/arch_dump.c
target/i386/arch_dump.c
target/ppc/arch_dump.c
target/s390x/arch_dump.c

(If not, I'm OK with that too.)

Thanks!
Laszlo



Re: [Qemu-devel] [PATCH 6/7] scripts/dump-guest-memory.py: add vmcoreinfo

2017-07-04 Thread Laszlo Ersek
On 06/29/17 15:23, Marc-André Lureau wrote:
> Add vmcoreinfo ELF note if vmcoreinfo device is ready.
> 
> To help the python script, add a little global vmcoreinfo_gdb
> structure, that is populated with vmcoreinfo_gdb_update().
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  scripts/dump-guest-memory.py | 32 
>  include/hw/acpi/vmcoreinfo.h |  1 +
>  hw/acpi/vmcoreinfo.c | 16 
>  3 files changed, 49 insertions(+)
> 
> diff --git a/scripts/dump-guest-memory.py b/scripts/dump-guest-memory.py
> index f7c6635f15..16c3d7cb10 100644
> --- a/scripts/dump-guest-memory.py
> +++ b/scripts/dump-guest-memory.py
> @@ -120,6 +120,20 @@ class ELF(object):
>  self.segments[0].p_filesz += ctypes.sizeof(note)
>  self.segments[0].p_memsz += ctypes.sizeof(note)
>  
> +
> +def add_vmcoreinfo_note(self, vmcoreinfo):
> +"""Adds a vmcoreinfo note to the ELF dump."""
> +chead = type(get_arch_note(self.endianness, 0, 0))
> +header = chead.from_buffer_copy(vmcoreinfo[0:ctypes.sizeof(chead)])
> +note = get_arch_note(self.endianness,
> + header.n_namesz - 1, header.n_descsz)
> +ctypes.memmove(ctypes.pointer(note), vmcoreinfo, ctypes.sizeof(note))
> +header_size = ctypes.sizeof(note) - header.n_descsz
> +
> +self.notes.append(note)
> +self.segments[0].p_filesz += ctypes.sizeof(note)
> +self.segments[0].p_memsz += ctypes.sizeof(note)
> +
>  def add_segment(self, p_type, p_paddr, p_size):
>  """Adds a segment to the elf."""
>  
> @@ -505,6 +519,23 @@ shape and this command should mostly work."""
>  cur += chunk_size
>  left -= chunk_size
>  
> +def add_vmcoreinfo(self):
> +qemu_core = gdb.inferiors()[0]
> +
> +gdb.execute("call vmcoreinfo_gdb_update()")

I think it's a bad idea to call a function from a process that's just
crashed.

If this feature is so important, maybe we can simply set a global
pointer variable at the end of vmcoreinfo_realize(); something like:

static void vmcoreinfo_realize(DeviceState *dev, Error **errp)
{
static VmcoreinfoState * volatile vmcoreinfo_gdb_helper;
[...]
vmcoreinfo_gdb_helper = VMCOREINFO(dev);
}

- vmcoreinfo_gdb_helper has function scope, so no other code can abuse
  it
- it has static storage duration so gdb can access it at any time
- the pointer (not the pointed-to object) is qualified volatile, so gcc
  cannot optimize out the pointer assignment (which it might be tempted
  to do otherwise, due to the pointer never being read within QEMU)

Then you can use "vmcoreinfo_gdb_helper->vmcoreinfo_addr_le" to
implement all the logic in "dump-guest-memory.py".

Just my two cents, of course.

Thanks
Laszlo

> +avail = gdb.parse_and_eval("vmcoreinfo_gdb.available")
> +if not avail:
> +return;
> +
> +addr = gdb.parse_and_eval("vmcoreinfo_gdb.paddr")
> +size = gdb.parse_and_eval("vmcoreinfo_gdb.size")
> +for block in self.guest_phys_blocks:
> +if block["target_start"] <= addr < block["target_end"]:
> +haddr = block["host_addr"] + (addr - block["target_start"])
> +vmcoreinfo = qemu_core.read_memory(haddr, size)
> +self.elf.add_vmcoreinfo_note(vmcoreinfo.tobytes())
> +return
> +
>  def invoke(self, args, from_tty):
>  """Handles command invocation from gdb."""
>  
> @@ -518,6 +549,7 @@ shape and this command should mostly work."""
>  
>  self.elf = ELF(argv[1])
>  self.guest_phys_blocks = get_guest_phys_blocks()
> +self.add_vmcoreinfo()
>  
>  with open(argv[0], "wb") as vmcore:
>  self.dump_init(vmcore)
> diff --git a/include/hw/acpi/vmcoreinfo.h b/include/hw/acpi/vmcoreinfo.h
> index 40fe99c3ed..4efa678237 100644
> --- a/include/hw/acpi/vmcoreinfo.h
> +++ b/include/hw/acpi/vmcoreinfo.h
> @@ -32,5 +32,6 @@ void vmcoreinfo_build_acpi(VmcoreinfoState *vis, GArray 
> *table_data,
>  void vmcoreinfo_add_fw_cfg(VmcoreinfoState *vis, FWCfgState *s, GArray 
> *vmci);
>  bool vmcoreinfo_get(VmcoreinfoState *vis, uint64_t *paddr, uint64_t *size,
>  Error **errp);
> +void vmcoreinfo_gdb_update(void);
>  
>  #endif
> diff --git a/hw/acpi/vmcoreinfo.c b/hw/acpi/vmcoreinfo.c
> index 216e0bb83a..75e3330813 100644
> --- a/hw/acpi/vmcoreinfo.c
> +++ b/hw/acpi/vmcoreinfo.c
> @@ -145,6 +145,22 @@ bool vmcoreinfo_get(VmcoreinfoState *vis,
>  return true;
>  }
>  
> +struct vmcoreinfo_gdb {
> +bool available;
> +uint64_t paddr;
> +uint64_t size;
> +} vmcoreinfo_gdb;
> +
> +void vmcoreinfo_gdb_update(void)
> +{
> +Object *vmci = find_vmcoreinfo_dev();
> +
> +vmcoreinfo_gdb.available = vmci ?
> +vmcoreinfo_get(VMCOREINFO(vmci),
> +   &vmcoreinfo_gdb.paddr, &vmcoreinfo_gdb.size, NULL)
> +: false;
> +}
> +
>  static const V

Re: [Qemu-devel] [PATCH 5/7] kdump: add vmcoreinfo ELF note

2017-07-04 Thread Laszlo Ersek
On 06/29/17 15:23, Marc-André Lureau wrote:
> kdump header provides offset and size of the vmcoreinfo ELF note,
> append it if available.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  dump.c | 48 
>  1 file changed, 44 insertions(+), 4 deletions(-)
> 
> diff --git a/dump.c b/dump.c
> index 8fda5cc1ed..b78bc1fda7 100644
> --- a/dump.c
> +++ b/dump.c
> @@ -788,8 +788,9 @@ static void create_header32(DumpState *s, Error **errp)
>  uint32_t sub_hdr_size;
>  uint32_t bitmap_blocks;
>  uint32_t status = 0;
> -uint64_t offset_note;
> +uint64_t offset_note, offset_vmcoreinfo, size_vmcoreinfo = 0;
>  Error *local_err = NULL;
> +uint8_t *vmcoreinfo = NULL;
>  
>  /* write common header, the version of kdump-compressed format is 6th */
>  size = sizeof(DiskDumpHeader32);
> @@ -838,7 +839,18 @@ static void create_header32(DumpState *s, Error **errp)
>  kh->phys_base = cpu_to_dump32(s, s->dump_info.phys_base);
>  kh->dump_level = cpu_to_dump32(s, DUMP_LEVEL);
>  
> -offset_note = DISKDUMP_HEADER_BLOCKS * block_size + size;
> +offset_vmcoreinfo = DISKDUMP_HEADER_BLOCKS * block_size + size;
> +if (s->vmcoreinfo) {
> +uint64_t hsize, name_size;
> +
> +get_note_sizes(s, s->vmcoreinfo, &hsize, &name_size, 
> &size_vmcoreinfo);

Should we round up "size_vmcoreinfo" as well? (Without the rounding,
offset_note might become unaligned, plus I simply don't know what
alignment is expected from kh->size_vmcoreinfo.)

> +vmcoreinfo =
> +s->vmcoreinfo + (DIV_ROUND_UP(hsize, 4) + 
> DIV_ROUND_UP(name_size, 4)) * 4;
> +kh->offset_vmcoreinfo = cpu_to_dump64(s, offset_vmcoreinfo);
> +kh->size_vmcoreinfo = cpu_to_dump32(s, size_vmcoreinfo);
> +}
> +
> +offset_note = offset_vmcoreinfo + size_vmcoreinfo;
>  kh->offset_note = cpu_to_dump64(s, offset_note);
>  kh->note_size = cpu_to_dump32(s, s->note_size);
>  
> @@ -848,6 +860,14 @@ static void create_header32(DumpState *s, Error **errp)
>  goto out;
>  }
>  
> +if (vmcoreinfo) {
> +if (write_buffer(s->fd, offset_vmcoreinfo, vmcoreinfo,
> + size_vmcoreinfo) < 0) {
> +error_setg(errp, "dump: failed to vmcoreinfo");

The verb "write" is missing from the message.

Same comments for create_header64() below.

Thanks
Laszlo

> +goto out;
> +}
> +}
> +
>  /* write note */
>  s->note_buf = g_malloc0(s->note_size);
>  s->note_buf_offset = 0;
> @@ -888,8 +908,9 @@ static void create_header64(DumpState *s, Error **errp)
>  uint32_t sub_hdr_size;
>  uint32_t bitmap_blocks;
>  uint32_t status = 0;
> -uint64_t offset_note;
> +uint64_t offset_note, offset_vmcoreinfo, size_vmcoreinfo = 0;
>  Error *local_err = NULL;
> +uint8_t *vmcoreinfo = NULL;
>  
>  /* write common header, the version of kdump-compressed format is 6th */
>  size = sizeof(DiskDumpHeader64);
> @@ -938,7 +959,18 @@ static void create_header64(DumpState *s, Error **errp)
>  kh->phys_base = cpu_to_dump64(s, s->dump_info.phys_base);
>  kh->dump_level = cpu_to_dump32(s, DUMP_LEVEL);
>  
> -offset_note = DISKDUMP_HEADER_BLOCKS * block_size + size;
> +offset_vmcoreinfo = DISKDUMP_HEADER_BLOCKS * block_size + size;
> +if (s->vmcoreinfo) {
> +uint64_t hsize, name_size;
> +
> +get_note_sizes(s, s->vmcoreinfo, &hsize, &name_size, 
> &size_vmcoreinfo);
> +vmcoreinfo =
> +s->vmcoreinfo + (DIV_ROUND_UP(hsize, 4) + 
> DIV_ROUND_UP(name_size, 4)) * 4;
> +kh->offset_vmcoreinfo = cpu_to_dump64(s, offset_vmcoreinfo);
> +kh->size_vmcoreinfo = cpu_to_dump64(s, size_vmcoreinfo);
> +}
> +
> +offset_note = offset_vmcoreinfo + size_vmcoreinfo;
>  kh->offset_note = cpu_to_dump64(s, offset_note);
>  kh->note_size = cpu_to_dump64(s, s->note_size);
>  
> @@ -948,6 +980,14 @@ static void create_header64(DumpState *s, Error **errp)
>  goto out;
>  }
>  
> +if (vmcoreinfo) {
> +if (write_buffer(s->fd, offset_vmcoreinfo, vmcoreinfo,
> + size_vmcoreinfo) < 0) {
> +error_setg(errp, "dump: failed to vmcoreinfo");
> +goto out;
> +}
> +}
> +
>  /* write note */
>  s->note_buf = g_malloc0(s->note_size);
>  s->note_buf_offset = 0;
> 




Re: [Qemu-devel] [PATCH] virtio-scsi: finalize IOMMU support

2017-07-04 Thread Fam Zheng
On Tue, 07/04 20:21, Jason Wang wrote:
> After converting to use DMA api for virtio devices, we should use
> dma_as instead of address_space_memory. Otherwise it won't work if
> IOMMU is enabled.
> 
> Fixes: commit 8607f5c3072c ("virtio: convert to use DMA api")
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Jason Wang 

Reviewed-by: Fam Zheng 



Re: [Qemu-devel] [PATCH 4/7] dump: add vmcoreinfo ELF note

2017-07-04 Thread Laszlo Ersek
On 06/29/17 15:23, Marc-André Lureau wrote:
> Read the vmcoreinfo ELF PT_NOTE from guest memory when vmcoreinfo
> device provides the location, and write it as an ELF note in the dump.
> 
> There are now 2 possible sources of phys_base information.
> 
> (1) arch guessed value from arch_dump_info_get()

The function is called cpu_get_dump_info().

> (2) vmcoreinfo ELF note NUMBER(phys_base)= field
> 
> NUMBER(phys_base) in vmcoreinfo has only been recently introduced
> in Linux 4.10 (401721ecd1dc "kexec: export the value of phys_base
> instead of symbol address").
> 
> Since (2) has better chances to be accurate, the guessed value is
> replaced by the value from the vmcoreinfo ELF note.
> 
> The phys_base value is stored in the same dump field locations as
> before, and may duplicate the information available in the vmcoreinfo
> ELF PT_NOTE. Crash tools should be prepared to handle this case.
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  include/sysemu/dump.h |   2 +
>  dump.c| 117 
> ++
>  2 files changed, 119 insertions(+)
> 
> diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
> index 2672a15f8b..111a7dcaa4 100644
> --- a/include/sysemu/dump.h
> +++ b/include/sysemu/dump.h
> @@ -192,6 +192,8 @@ typedef struct DumpState {
>* this could be used to calculate
>* how much work we have
>* finished. */
> +uint8_t *vmcoreinfo; /* ELF note content */
> +size_t vmcoreinfo_size;
>  } DumpState;
>  
>  uint16_t cpu_to_dump16(DumpState *s, uint16_t val);
> diff --git a/dump.c b/dump.c
> index d9090a24cc..8fda5cc1ed 100644
> --- a/dump.c
> +++ b/dump.c
> @@ -26,6 +26,8 @@
>  #include "qapi/qmp/qerror.h"
>  #include "qmp-commands.h"
>  #include "qapi-event.h"
> +#include "qemu/error-report.h"
> +#include "hw/acpi/vmcoreinfo.h"
>  
>  #include 
>  #ifdef CONFIG_LZO
> @@ -38,6 +40,11 @@
>  #define ELF_MACHINE_UNAME "Unknown"
>  #endif
>  
> +#define ELF_NOTE_SIZE(hdr_size, name_size, desc_size)   \
> +((DIV_ROUND_UP((hdr_size), 4)   \
> +  + DIV_ROUND_UP((name_size), 4)\
> +  + DIV_ROUND_UP((desc_size), 4)) * 4)
> +

This looks really useful to me, but (I think?) we generally leave the
operator hanging at the end of the line:

#define ELF_NOTE_SIZE(hdr_size, name_size, desc_size) \
((DIV_ROUND_UP((hdr_size), 4) +   \
  DIV_ROUND_UP((name_size), 4) +  \
  DIV_ROUND_UP((desc_size), 4)) * 4)

>  uint16_t cpu_to_dump16(DumpState *s, uint16_t val)
>  {
>  if (s->dump_info.d_endian == ELFDATA2LSB) {
> @@ -76,6 +83,8 @@ static int dump_cleanup(DumpState *s)
>  guest_phys_blocks_free(&s->guest_phys_blocks);
>  memory_mapping_list_free(&s->list);
>  close(s->fd);
> +g_free(s->vmcoreinfo);
> +s->vmcoreinfo = NULL;
>  if (s->resume) {
>  if (s->detached) {
>  qemu_mutex_lock_iothread();
> @@ -235,6 +244,19 @@ static inline int cpu_index(CPUState *cpu)
>  return cpu->cpu_index + 1;
>  }
>  
> +static void write_vmcoreinfo_note(WriteCoreDumpFunction f, DumpState *s,
> +  Error **errp)
> +{
> +int ret;
> +
> +if (s->vmcoreinfo) {
> +ret = f(s->vmcoreinfo, s->vmcoreinfo_size, s);
> +if (ret < 0) {
> +error_setg(errp, "dump: failed to write vmcoreinfo");
> +}
> +}
> +}
> +
>  static void write_elf64_notes(WriteCoreDumpFunction f, DumpState *s,
>Error **errp)
>  {
> @@ -258,6 +280,8 @@ static void write_elf64_notes(WriteCoreDumpFunction f, 
> DumpState *s,
>  return;
>  }
>  }
> +
> +write_vmcoreinfo_note(f, s, errp);
>  }
>  
>  static void write_elf32_note(DumpState *s, Error **errp)
> @@ -303,6 +327,8 @@ static void write_elf32_notes(WriteCoreDumpFunction f, 
> DumpState *s,
>  return;
>  }
>  }
> +
> +write_vmcoreinfo_note(f, s, errp);
>  }
>  

Wait, I'm confused again. You explained why it was OK to hook this logic
into the kdump handling too, but I don't think I understand your
explanation, so let me repeat my confusion below :)

In the ELF case, this code works fine, I think. As long as the guest
provided us with a well-formed note, a well-formed note will be appended
to the ELF dump.

But, this code is also invoked in the kdump case, and I don't understand
why that's a good thing. If I understand the next patch correctly, the
kdump format already provides crash with a (trimmed) copy of the guest
kernels vmcoreinfo note. So in the kdump case, why do we have to create
yet another copy of the guest kernel's vmcoreinfo note?

Thus, my confusion persists, and I can only think (again) that
write_vmcoreinfo_note() should be called from dump_begin() only (at the
end). (And the s->note_size adjustment should take that int

Re: [Qemu-devel] [Bug 1701971] [NEW] multithreading not working right under qemu user mode for sh4

2017-07-04 Thread Richard Henderson

On 07/02/2017 12:53 PM, Bruno Haible wrote:

In a multithreaded program running under qemu-sh4 (version 2.9.0),
thread termination and/or pthread_join is not working right.


QEMU does not support the roll-back atomic sequences used by linux on most 
uniprocessor SH.


Nor do we support the ll/sc form used by SH4A, although fixing that should be 
trivial.



r~



Re: [Qemu-devel] [PATCH RFC v19 00/13] QEMU AVR 8 bit cores

2017-07-04 Thread Richard Henderson

On 06/21/2017 09:15 PM, Michael Rolnik wrote:

Hi all,

are there any action items for me?


What kind of testing are you doing for this?

I just briefly browsed through the code again and happened to see that ROR has 
a critical typo.  Considering that ROR must be used for multi-byte shifts, I'm 
wondering how you wouldn't have found this via even cursory testing.



+int avr_translate_ROR(CPUAVRState *env, DisasContext *ctx, uint32_t opcode)
+{
+TCGv Rd = cpu_r[ROR_Rd(opcode)];
+TCGv t0 = tcg_temp_new_i32();
+
+tcg_gen_shli_tl(t0, cpu_Cf, 7);
+tcg_gen_andi_tl(cpu_Cf, Rd, 0);



r~



Re: [Qemu-devel] [PATCH v11 04/29] target: [tcg] Add generic translation framework

2017-07-04 Thread Peter Maydell
On 4 July 2017 at 23:31, Richard Henderson  wrote:
> On 07/04/2017 09:14 AM, Peter Maydell wrote:
>> I kind of like not having CPUState* in DisasContext, because
>> it enforces the rule that you can't read from fields of
>> it inside the target translate.c code without jumping through
>> a hoop (ie copying the info from CPUState->foo to
>> DisasContext->foo). That then acts as a useful flag in code
>> review (or when writing the code) to confirm that foo really
>> is constant for the life of the simulation (or to recommend
>> using a TB flag instead).
>
>
> I don't see how the spelling "cpu" vs "dc->cpu" really affects that.

If you put it in dc->cpu then everywhere that gets the dc gets
the cpu, unavoidably (which is why I'm suggesting it would
be nicer not to do that). If you don't put it in dc then you
can structure your translate code so that pretty much all of
it gets the dc but not the cpu. target/arm/translate-a64.c
does this, for instance.

thanks
-- PMM



Re: [Qemu-devel] [PATCH v11 04/29] target: [tcg] Add generic translation framework

2017-07-04 Thread Richard Henderson

On 07/04/2017 09:14 AM, Peter Maydell wrote:

On 4 July 2017 at 19:59, Lluís Vilanova  wrote:

Richard Henderson writes:

Any reason not to stuff the cpu pointer into the DisasContextBase instead of
passing it around separately?


None, really. I'll move it from DisasContext (in targets where it's present)
into DisasContextBase, and use that one everywhere.


I kind of like not having CPUState* in DisasContext, because
it enforces the rule that you can't read from fields of
it inside the target translate.c code without jumping through
a hoop (ie copying the info from CPUState->foo to
DisasContext->foo). That then acts as a useful flag in code
review (or when writing the code) to confirm that foo really
is constant for the life of the simulation (or to recommend
using a TB flag instead).


I don't see how the spelling "cpu" vs "dc->cpu" really affects that.

More practically, I don't see that "cpu" will actually be used by most of those 
hooks.  But because of things like cpu->some_target_feature, it's kind of hard 
to predict.



r~



Re: [Qemu-devel] [PATCH 2/7] acpi: add vmcoreinfo device

2017-07-04 Thread Laszlo Ersek
comments below

On 06/29/17 15:23, Marc-André Lureau wrote:
> The VM coreinfo (vmcoreinfo) device is an emulated device which
> exposes a 4k memory range to the guest to store various informations
> useful to debug the guest OS. (it is greatly inspired by the VMGENID
> device implementation)
> 
> This is an early-boot alternative to the qemu-ga VMDUMP_INFO event
> proposed in "[PATCH 00/21] WIP: dump: add kaslr support".
> 
> A proof-of-concept kernel module:
> https://github.com/elmarco/vmgenid-test/blob/master/qemuvmci-test.c
> 
> Signed-off-by: Marc-André Lureau 
> ---
>  include/hw/acpi/aml-build.h|   1 +
>  include/hw/acpi/vmcoreinfo.h   |  36 +++
>  hw/acpi/aml-build.c|   2 +
>  hw/acpi/vmcoreinfo.c   | 198 
> +
>  hw/i386/acpi-build.c   |  14 +++
>  default-configs/arm-softmmu.mak|   1 +
>  default-configs/i386-softmmu.mak   |   1 +
>  default-configs/x86_64-softmmu.mak |   1 +
>  docs/specs/vmcoreinfo.txt  | 138 ++
>  hw/acpi/Makefile.objs  |   1 +
>  10 files changed, 393 insertions(+)
>  create mode 100644 include/hw/acpi/vmcoreinfo.h
>  create mode 100644 hw/acpi/vmcoreinfo.c
>  create mode 100644 docs/specs/vmcoreinfo.txt
>
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index 88d0738d76..cf781bcd34 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -211,6 +211,7 @@ struct AcpiBuildTables {
>  GArray *rsdp;
>  GArray *tcpalog;
>  GArray *vmgenid;
> +GArray *vmcoreinfo;
>  BIOSLinker *linker;
>  } AcpiBuildTables;
>  
> diff --git a/include/hw/acpi/vmcoreinfo.h b/include/hw/acpi/vmcoreinfo.h
> new file mode 100644
> index 00..40fe99c3ed
> --- /dev/null
> +++ b/include/hw/acpi/vmcoreinfo.h
> @@ -0,0 +1,36 @@
> +#ifndef ACPI_VMCOREINFO_H
> +#define ACPI_VMCOREINFO_H
> +
> +#include "hw/acpi/bios-linker-loader.h"
> +#include "hw/qdev.h"
> +
> +#define VMCOREINFO_DEVICE   "vmcoreinfo"
> +#define VMCOREINFO_FW_CFG_FILE  "etc/vmcoreinfo"
> +#define VMCOREINFO_ADDR_FW_CFG_FILE "etc/vmcoreinfo-addr"
> +
> +#define VMCOREINFO_FW_CFG_SIZE  4096 /* Occupy a page of memory */
> +#define VMCOREINFO_OFFSET   40   /* allow space for
> +  * OVMF SDT Header Probe Supressor
> +  */
> +
> +#define VMCOREINFO(obj) OBJECT_CHECK(VmcoreinfoState, (obj), 
> VMCOREINFO_DEVICE)
> +
> +typedef struct VmcoreinfoState {

I think this should be spelled with a bit more camel-casing, like
VMCoreInfoState or some such.

> +DeviceClass parent_obj;
> +uint8_t vmcoreinfo_addr_le[8];   /* Address of memory region */
> +bool write_pointer_available;
> +} VmcoreinfoState;
> +
> +/* returns NULL unless there is exactly one device */
> +static inline Object *find_vmcoreinfo_dev(void)
> +{
> +return object_resolve_path_type("", VMCOREINFO_DEVICE, NULL);
> +}
> +
> +void vmcoreinfo_build_acpi(VmcoreinfoState *vis, GArray *table_data,
> +   GArray *vmci, BIOSLinker *linker);
> +void vmcoreinfo_add_fw_cfg(VmcoreinfoState *vis, FWCfgState *s, GArray 
> *vmci);
> +bool vmcoreinfo_get(VmcoreinfoState *vis, uint64_t *paddr, uint64_t *size,
> +Error **errp);
> +
> +#endif
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index 36a6cc450e..47043ade4a 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -1561,6 +1561,7 @@ void acpi_build_tables_init(AcpiBuildTables *tables)
>  tables->table_data = g_array_new(false, true /* clear */, 1);
>  tables->tcpalog = g_array_new(false, true /* clear */, 1);
>  tables->vmgenid = g_array_new(false, true /* clear */, 1);
> +tables->vmcoreinfo = g_array_new(false, true /* clear */, 1);
>  tables->linker = bios_linker_loader_init();
>  }
>  
> @@ -1571,6 +1572,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, 
> bool mfre)
>  g_array_free(tables->table_data, true);
>  g_array_free(tables->tcpalog, mfre);
>  g_array_free(tables->vmgenid, mfre);
> +g_array_free(tables->vmcoreinfo, mfre);
>  }
>  
>  /* Build rsdt table */
> diff --git a/hw/acpi/vmcoreinfo.c b/hw/acpi/vmcoreinfo.c
> new file mode 100644
> index 00..216e0bb83a
> --- /dev/null
> +++ b/hw/acpi/vmcoreinfo.c
> @@ -0,0 +1,198 @@
> +/*
> + *  Virtual Machine coreinfo device
> + *  (based on Virtual Machine Generation ID Device)
> + *
> + *  Copyright (C) 2017 Red Hat, Inc.
> + *  Copyright (C) 2017 Skyport Systems.
> + *
> + *  Authors: Marc-André Lureau 
> + *   Ben Warren 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +#include "qemu/osdep.h"
> +#include "hw/acpi/acpi.h"
> +#include "hw/acpi/aml-build.h"
> +#include "hw/acpi/vmcoreinfo.h"
> +#include "hw/nvram/fw_cfg.h"
> +

[Qemu-devel] [PATCH 31/35] parallels: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/parallels.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/parallels.c b/block/parallels.c
index 8be46a7d48..213e42b9d2 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -472,7 +472,8 @@ static int parallels_check(BlockDriverState *bs, 
BdrvCheckResult *res,
 }
 
 
-static int parallels_create(const char *filename, QemuOpts *opts, Error **errp)
+static int coroutine_fn
+parallels_create(const char *filename, QemuOpts *opts, Error **errp)
 {
 int64_t total_size, cl_size;
 uint8_t tmp[BDRV_SECTOR_SIZE];
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 35/35] vpc: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/vpc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/vpc.c b/block/vpc.c
index 4240ba9d1c..1b4aba20bd 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -872,7 +872,8 @@ static int create_fixed_disk(BlockBackend *blk, uint8_t 
*buf,
 return ret;
 }
 
-static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
+static int coroutine_fn
+vpc_create(const char *filename, QemuOpts *opts, Error **errp)
 {
 uint8_t buf[1024];
 VHDFooter *footer = (VHDFooter *) buf;
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 28/35] 9p: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 hw/9pfs/9p.h | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/9pfs/9p.h b/hw/9pfs/9p.h
index d1cfeaf10e..935a6c9a3c 100644
--- a/hw/9pfs/9p.h
+++ b/hw/9pfs/9p.h
@@ -312,21 +312,24 @@ typedef struct V9fsGetlock
 extern int open_fd_hw;
 extern int total_open_fd;
 
-static inline void v9fs_path_write_lock(V9fsState *s)
+static inline void coroutine_fn
+v9fs_path_write_lock(V9fsState *s)
 {
 if (s->ctx.export_flags & V9FS_PATHNAME_FSCONTEXT) {
 qemu_co_rwlock_wrlock(&s->rename_lock);
 }
 }
 
-static inline void v9fs_path_read_lock(V9fsState *s)
+static inline void coroutine_fn
+v9fs_path_read_lock(V9fsState *s)
 {
 if (s->ctx.export_flags & V9FS_PATHNAME_FSCONTEXT) {
 qemu_co_rwlock_rdlock(&s->rename_lock);
 }
 }
 
-static inline void v9fs_path_unlock(V9fsState *s)
+static inline void coroutine_fn
+v9fs_path_unlock(V9fsState *s)
 {
 if (s->ctx.export_flags & V9FS_PATHNAME_FSCONTEXT) {
 qemu_co_rwlock_unlock(&s->rename_lock);
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 33/35] vdi: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/vdi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/vdi.c b/block/vdi.c
index 79af47763b..53cd7f64d8 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -716,7 +716,8 @@ vdi_co_pwritev(BlockDriverState *bs, uint64_t offset, 
uint64_t bytes,
 return ret;
 }
 
-static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
+static int coroutine_fn
+vdi_create(const char *filename, QemuOpts *opts, Error **errp)
 {
 int ret = 0;
 uint64_t bytes = 0;
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 26/35] iscsi: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/iscsi.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 54067e2620..e16311cb4a 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1005,7 +1005,8 @@ static void iscsi_ioctl_handle_emulated(IscsiAIOCB *acb, 
int req, void *buf)
 qemu_bh_schedule(acb->bh);
 }
 
-static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
+static BlockAIOCB * coroutine_fn
+iscsi_aio_ioctl(BlockDriverState *bs,
 unsigned long int req, void *buf,
 BlockCompletionFunc *cb, void *opaque)
 {
@@ -2107,7 +2108,8 @@ static int iscsi_truncate(BlockDriverState *bs, int64_t 
offset, Error **errp)
 return 0;
 }
 
-static int iscsi_create(const char *filename, QemuOpts *opts, Error **errp)
+static int coroutine_fn
+iscsi_create(const char *filename, QemuOpts *opts, Error **errp)
 {
 int ret = 0;
 int64_t total_size = 0;
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 30/35] block-backend: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 include/sysemu/block-backend.h |  4 ++--
 block/block-backend.c  | 36 
 2 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 1e05281fff..2f967037af 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -165,8 +165,8 @@ int blk_co_ioctl(BlockBackend *blk, unsigned long int req, 
void *buf);
 int blk_ioctl(BlockBackend *blk, unsigned long int req, void *buf);
 BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned long int req, void *buf,
   BlockCompletionFunc *cb, void *opaque);
-int blk_co_pdiscard(BlockBackend *blk, int64_t offset, int bytes);
-int blk_co_flush(BlockBackend *blk);
+int coroutine_fn blk_co_pdiscard(BlockBackend *blk, int64_t offset, int bytes);
+int coroutine_fn blk_co_flush(BlockBackend *blk);
 int blk_flush(BlockBackend *blk);
 int blk_commit_all(void);
 void blk_drain(BlockBackend *blk);
diff --git a/block/block-backend.c b/block/block-backend.c
index 56fc0a4d1e..a48aa4f900 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1032,7 +1032,8 @@ typedef struct BlkRwCo {
 BdrvRequestFlags flags;
 } BlkRwCo;
 
-static void blk_read_entry(void *opaque)
+static void coroutine_fn
+blk_read_entry(void *opaque)
 {
 BlkRwCo *rwco = opaque;
 
@@ -1040,7 +1041,8 @@ static void blk_read_entry(void *opaque)
   rwco->qiov, rwco->flags);
 }
 
-static void blk_write_entry(void *opaque)
+static void coroutine_fn
+blk_write_entry(void *opaque)
 {
 BlkRwCo *rwco = opaque;
 
@@ -1195,7 +1197,8 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, 
int64_t offset, int bytes,
 return &acb->common;
 }
 
-static void blk_aio_read_entry(void *opaque)
+static void coroutine_fn
+blk_aio_read_entry(void *opaque)
 {
 BlkAioEmAIOCB *acb = opaque;
 BlkRwCo *rwco = &acb->rwco;
@@ -1206,7 +1209,8 @@ static void blk_aio_read_entry(void *opaque)
 blk_aio_complete(acb);
 }
 
-static void blk_aio_write_entry(void *opaque)
+static void coroutine_fn
+blk_aio_write_entry(void *opaque)
 {
 BlkAioEmAIOCB *acb = opaque;
 BlkRwCo *rwco = &acb->rwco;
@@ -1288,7 +1292,8 @@ BlockAIOCB *blk_aio_pwritev(BlockBackend *blk, int64_t 
offset,
 blk_aio_write_entry, flags, cb, opaque);
 }
 
-static void blk_aio_flush_entry(void *opaque)
+static void coroutine_fn
+blk_aio_flush_entry(void *opaque)
 {
 BlkAioEmAIOCB *acb = opaque;
 BlkRwCo *rwco = &acb->rwco;
@@ -1303,7 +1308,8 @@ BlockAIOCB *blk_aio_flush(BlockBackend *blk,
 return blk_aio_prwv(blk, 0, 0, NULL, blk_aio_flush_entry, 0, cb, opaque);
 }
 
-static void blk_aio_pdiscard_entry(void *opaque)
+static void coroutine_fn
+blk_aio_pdiscard_entry(void *opaque)
 {
 BlkAioEmAIOCB *acb = opaque;
 BlkRwCo *rwco = &acb->rwco;
@@ -1339,7 +1345,8 @@ int blk_co_ioctl(BlockBackend *blk, unsigned long int 
req, void *buf)
 return bdrv_co_ioctl(blk_bs(blk), req, buf);
 }
 
-static void blk_ioctl_entry(void *opaque)
+static void coroutine_fn
+blk_ioctl_entry(void *opaque)
 {
 BlkRwCo *rwco = opaque;
 rwco->ret = blk_co_ioctl(rwco->blk, rwco->offset,
@@ -1351,7 +1358,8 @@ int blk_ioctl(BlockBackend *blk, unsigned long int req, 
void *buf)
 return blk_prw(blk, req, buf, 0, blk_ioctl_entry, 0);
 }
 
-static void blk_aio_ioctl_entry(void *opaque)
+static void coroutine_fn
+blk_aio_ioctl_entry(void *opaque)
 {
 BlkAioEmAIOCB *acb = opaque;
 BlkRwCo *rwco = &acb->rwco;
@@ -1376,7 +1384,8 @@ BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned 
long int req, void *buf,
 return blk_aio_prwv(blk, req, 0, &qiov, blk_aio_ioctl_entry, 0, cb, 
opaque);
 }
 
-int blk_co_pdiscard(BlockBackend *blk, int64_t offset, int bytes)
+int coroutine_fn
+blk_co_pdiscard(BlockBackend *blk, int64_t offset, int bytes)
 {
 int ret = blk_check_byte_request(blk, offset, bytes);
 if (ret < 0) {
@@ -1386,7 +1395,8 @@ int blk_co_pdiscard(BlockBackend *blk, int64_t offset, 
int bytes)
 return bdrv_co_pdiscard(blk_bs(blk), offset, bytes);
 }
 
-int blk_co_flush(BlockBackend *blk)
+int coroutine_fn
+blk_co_flush(BlockBackend *blk)
 {
 if (!blk_is_available(blk)) {
 return -ENOMEDIUM;
@@ -1395,7 +1405,8 @@ int blk_co_flush(BlockBackend *blk)
 return bdrv_co_flush(blk_bs(blk));
 }
 
-static void blk_flush_entry(void *opaque)
+static void coroutine_fn
+blk_flush_entry(void *opaque)
 {
 BlkRwCo *rwco = opaque;
 rwco->ret = blk_co_flush(rwco->blk);
@@ -1785,7 +1796,8 @@ int blk_truncate(BlockBackend *blk, int64_t offset, Error 
**errp)
 return bdrv_truncate(blk->root, offset, errp);
 }
 
-static void blk_pdiscard_entry(void *opaque)
+static void coroutine_fn
+blk_pdiscard_entry(void *opaque)
 {
 BlkRwCo *rwco = opaque;
 rwco->ret = blk_co_pdiscard(rwco->blk, rwco->offset, rwco->qiov->size);
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 25/35] mirror: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/mirror.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 68744a17e8..2f0a9946d9 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -224,7 +224,8 @@ static int mirror_cow_align(MirrorBlockJob *s,
 return ret;
 }
 
-static inline void mirror_wait_for_io(MirrorBlockJob *s)
+static inline void coroutine_fn
+mirror_wait_for_io(MirrorBlockJob *s)
 {
 assert(!s->waiting_for_io);
 s->waiting_for_io = true;
@@ -239,7 +240,8 @@ static inline void mirror_wait_for_io(MirrorBlockJob *s)
  *  (new_end - sector_num) if tail is rounded up or down due to
  *  alignment or buffer limit.
  */
-static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
+static int coroutine_fn
+mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
   int nb_sectors)
 {
 BlockBackend *source = s->common.blk;
@@ -490,7 +492,8 @@ static void mirror_free_init(MirrorBlockJob *s)
  * mirror_resume() because mirror_run() will begin iterating again
  * when the job is resumed.
  */
-static void mirror_wait_for_all_io(MirrorBlockJob *s)
+static void coroutine_fn
+mirror_wait_for_all_io(MirrorBlockJob *s)
 {
 while (s->in_flight > 0) {
 mirror_wait_for_io(s);
@@ -605,7 +608,8 @@ static void mirror_exit(BlockJob *job, void *opaque)
 bdrv_unref(src);
 }
 
-static void mirror_throttle(MirrorBlockJob *s)
+static void coroutine_fn
+mirror_throttle(MirrorBlockJob *s)
 {
 int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
 
@@ -984,7 +988,8 @@ static void mirror_complete(BlockJob *job, Error **errp)
 block_job_enter(&s->common);
 }
 
-static void mirror_pause(BlockJob *job)
+static void coroutine_fn
+mirror_pause(BlockJob *job)
 {
 MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
 
-- 
2.13.1.395.gf7b71de06




Re: [Qemu-devel] [PATCH] vhost: fix a memory leak

2017-07-04 Thread Michael S. Tsirkin
On Tue, Jul 04, 2017 at 11:59:54PM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Wed, Jul 5, 2017 at 12:33 AM, Peng Hao  wrote:
> > vhost exists a call for g_file_get_contents, but not call g_free.
> >
> > Signed-off-by: Peng Hao
> 
> Reviewed-by: Marc-André Lureau 
> 
> mst: is this better?

That's fine, thanks!

> > ---
> >  hw/virtio/vhost-backend.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> > index 4e31de1..2c481d6 100644
> > --- a/hw/virtio/vhost-backend.c
> > +++ b/hw/virtio/vhost-backend.c
> > @@ -52,11 +52,13 @@ static int vhost_kernel_memslots_limit(struct vhost_dev 
> > *dev)
> >  &s, NULL, NULL)) {
> >  uint64_t val = g_ascii_strtoull(s, NULL, 10);
> >  if (!((val == G_MAXUINT64 || !val) && errno)) {
> > +g_free(s);
> >  return val;
> >  }
> >  error_report("ignoring invalid max_mem_regions value in vhost 
> > module:"
> >   " %s", s);
> >  }
> > +g_free(s);
> >  return limit;
> >  }
> >
> > --
> > 1.8.3.1
> >
> >
> >
> 
> 
> 
> -- 
> Marc-André Lureau



[Qemu-devel] [PATCH 23/35] ssh: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/ssh.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/ssh.c b/block/ssh.c
index 52964416da..03a8ebe6f7 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -813,7 +813,8 @@ static QemuOptsList ssh_create_opts = {
 }
 };
 
-static int ssh_create(const char *filename, QemuOpts *opts, Error **errp)
+static int coroutine_fn
+ssh_create(const char *filename, QemuOpts *opts, Error **errp)
 {
 int r, ret;
 int64_t total_size = 0;
@@ -1029,7 +1030,8 @@ static coroutine_fn int ssh_co_readv(BlockDriverState *bs,
 return ret;
 }
 
-static int ssh_write(BDRVSSHState *s, BlockDriverState *bs,
+static int coroutine_fn
+ssh_write(BDRVSSHState *s, BlockDriverState *bs,
  int64_t offset, size_t size,
  QEMUIOVector *qiov)
 {
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 34/35] vhdx: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/vhdx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/vhdx.c b/block/vhdx.c
index 8b270b57c9..56b54f3ed7 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -1787,7 +1787,8 @@ exit:
  *. ~ --- ~  ~  ~ ---.
  *   1MB
  */
-static int vhdx_create(const char *filename, QemuOpts *opts, Error **errp)
+static int coroutine_fn
+vhdx_create(const char *filename, QemuOpts *opts, Error **errp)
 {
 int ret = 0;
 uint64_t image_size = (uint64_t) 2 * GiB;
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 17/35] curl: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/curl.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/curl.c b/block/curl.c
index 2a244e2439..d3719dc086 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -855,7 +855,8 @@ out_noclean:
 return -EINVAL;
 }
 
-static void curl_setup_preadv(BlockDriverState *bs, CURLAIOCB *acb)
+static void coroutine_fn
+curl_setup_preadv(BlockDriverState *bs, CURLAIOCB *acb)
 {
 CURLState *state;
 int running;
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 27/35] file-posix: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/file-posix.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 3927fabf06..adafbbb6a0 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1483,7 +1483,8 @@ static int aio_worker(void *arg)
 return ret;
 }
 
-static int paio_submit_co(BlockDriverState *bs, int fd,
+static int coroutine_fn
+paio_submit_co(BlockDriverState *bs, int fd,
   int64_t offset, QEMUIOVector *qiov,
   int bytes, int type)
 {
@@ -1599,7 +1600,8 @@ static void raw_aio_unplug(BlockDriverState *bs)
 #endif
 }
 
-static BlockAIOCB *raw_aio_flush(BlockDriverState *bs,
+static BlockAIOCB * coroutine_fn
+raw_aio_flush(BlockDriverState *bs,
 BlockCompletionFunc *cb, void *opaque)
 {
 BDRVRawState *s = bs->opaque;
@@ -1835,7 +1837,8 @@ static int64_t 
raw_get_allocated_file_size(BlockDriverState *bs)
 return (int64_t)st.st_blocks * 512;
 }
 
-static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
+static int coroutine_fn
+raw_create(const char *filename, QemuOpts *opts, Error **errp)
 {
 int fd;
 int result = 0;
@@ -2526,7 +2529,8 @@ hdev_open_Mac_error:
 
 #if defined(__linux__)
 
-static BlockAIOCB *hdev_aio_ioctl(BlockDriverState *bs,
+static BlockAIOCB * coroutine_fn
+hdev_aio_ioctl(BlockDriverState *bs,
 unsigned long int req, void *buf,
 BlockCompletionFunc *cb, void *opaque)
 {
@@ -2592,7 +2596,8 @@ static coroutine_fn int 
hdev_co_pwrite_zeroes(BlockDriverState *bs,
 return -ENOTSUP;
 }
 
-static int hdev_create(const char *filename, QemuOpts *opts,
+static int coroutine_fn
+hdev_create(const char *filename, QemuOpts *opts,
Error **errp)
 {
 int fd;
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 20/35] quorum: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/quorum.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 55ba916655..b086d70daa 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -264,7 +264,8 @@ static void quorum_report_bad_versions(BDRVQuorumState *s,
 }
 }
 
-static void quorum_rewrite_entry(void *opaque)
+static void coroutine_fn
+quorum_rewrite_entry(void *opaque)
 {
 QuorumCo *co = opaque;
 QuorumAIOCB *acb = co->acb;
@@ -282,7 +283,8 @@ static void quorum_rewrite_entry(void *opaque)
 }
 }
 
-static bool quorum_rewrite_bad_versions(QuorumAIOCB *acb,
+static bool coroutine_fn
+quorum_rewrite_bad_versions(QuorumAIOCB *acb,
 QuorumVoteValue *value)
 {
 QuorumVoteVersion *version;
@@ -497,7 +499,8 @@ static int quorum_vote_error(QuorumAIOCB *acb)
 return ret;
 }
 
-static void quorum_vote(QuorumAIOCB *acb)
+static void coroutine_fn
+quorum_vote(QuorumAIOCB *acb)
 {
 bool quorum = true;
 int i, j, ret;
@@ -577,7 +580,8 @@ free_exit:
 quorum_free_vote_list(&acb->votes);
 }
 
-static void read_quorum_children_entry(void *opaque)
+static void coroutine_fn
+read_quorum_children_entry(void *opaque)
 {
 QuorumCo *co = opaque;
 QuorumAIOCB *acb = co->acb;
@@ -605,7 +609,7 @@ static void read_quorum_children_entry(void *opaque)
 }
 }
 
-static int read_quorum_children(QuorumAIOCB *acb)
+static int coroutine_fn read_quorum_children(QuorumAIOCB *acb)
 {
 BDRVQuorumState *s = acb->bs->opaque;
 int i, ret;
@@ -648,7 +652,8 @@ static int read_quorum_children(QuorumAIOCB *acb)
 return ret;
 }
 
-static int read_fifo_child(QuorumAIOCB *acb)
+static int coroutine_fn
+read_fifo_child(QuorumAIOCB *acb)
 {
 BDRVQuorumState *s = acb->bs->opaque;
 int n, ret;
@@ -669,7 +674,8 @@ static int read_fifo_child(QuorumAIOCB *acb)
 return ret;
 }
 
-static int quorum_co_preadv(BlockDriverState *bs, uint64_t offset,
+static int coroutine_fn
+quorum_co_preadv(BlockDriverState *bs, uint64_t offset,
 uint64_t bytes, QEMUIOVector *qiov, int flags)
 {
 BDRVQuorumState *s = bs->opaque;
@@ -689,7 +695,7 @@ static int quorum_co_preadv(BlockDriverState *bs, uint64_t 
offset,
 return ret;
 }
 
-static void write_quorum_entry(void *opaque)
+static void coroutine_fn write_quorum_entry(void *opaque)
 {
 QuorumCo *co = opaque;
 QuorumAIOCB *acb = co->acb;
@@ -715,7 +721,8 @@ static void write_quorum_entry(void *opaque)
 }
 }
 
-static int quorum_co_pwritev(BlockDriverState *bs, uint64_t offset,
+static int coroutine_fn
+quorum_co_pwritev(BlockDriverState *bs, uint64_t offset,
  uint64_t bytes, QEMUIOVector *qiov, int flags)
 {
 BDRVQuorumState *s = bs->opaque;
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 32/35] qed: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/qed.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/qed.c b/block/qed.c
index 385381a78a..dd2859a1c9 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -622,7 +622,8 @@ out:
 return ret;
 }
 
-static int bdrv_qed_create(const char *filename, QemuOpts *opts, Error **errp)
+static int coroutine_fn
+bdrv_qed_create(const char *filename, QemuOpts *opts, Error **errp)
 {
 uint64_t image_size = 0;
 uint32_t cluster_size = QED_DEFAULT_CLUSTER_SIZE;
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 16/35] crypto: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/crypto.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/crypto.c b/block/crypto.c
index 10e5ddccaa..0e30a4ea06 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -568,7 +568,8 @@ static int block_crypto_open_luks(BlockDriverState *bs,
  bs, options, flags, errp);
 }
 
-static int block_crypto_create_luks(const char *filename,
+static int coroutine_fn
+block_crypto_create_luks(const char *filename,
 QemuOpts *opts,
 Error **errp)
 {
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 22/35] sheepdog: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/sheepdog.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 83bc43dde4..64ff275db9 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -481,7 +481,8 @@ static inline AIOReq *alloc_aio_req(BDRVSheepdogState *s, 
SheepdogAIOCB *acb,
 return aio_req;
 }
 
-static void wait_for_overlapping_aiocb(BDRVSheepdogState *s, SheepdogAIOCB 
*acb)
+static void coroutine_fn
+wait_for_overlapping_aiocb(BDRVSheepdogState *s, SheepdogAIOCB *acb)
 {
 SheepdogAIOCB *cb;
 
@@ -494,7 +495,8 @@ retry:
 }
 }
 
-static void sd_aio_setup(SheepdogAIOCB *acb, BDRVSheepdogState *s,
+static void coroutine_fn
+sd_aio_setup(SheepdogAIOCB *acb, BDRVSheepdogState *s,
  QEMUIOVector *qiov, int64_t sector_num, int 
nb_sectors,
  int type)
 {
@@ -1954,7 +1956,8 @@ static int parse_block_size_shift(BDRVSheepdogState *s, 
QemuOpts *opt)
 return 0;
 }
 
-static int sd_create(const char *filename, QemuOpts *opts,
+static int coroutine_fn
+sd_create(const char *filename, QemuOpts *opts,
  Error **errp)
 {
 Error *err = NULL;
@@ -2431,7 +2434,8 @@ static void coroutine_fn sd_co_rw_vector(SheepdogAIOCB 
*acb)
 }
 }
 
-static void sd_aio_complete(SheepdogAIOCB *acb)
+static void coroutine_fn
+sd_aio_complete(SheepdogAIOCB *acb)
 {
 if (acb->aiocb_type == AIOCB_FLUSH_CACHE) {
 return;
@@ -2905,7 +2909,8 @@ cleanup:
 return ret;
 }
 
-static int sd_save_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
+static int coroutine_fn
+sd_save_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
int64_t pos)
 {
 BDRVSheepdogState *s = bs->opaque;
@@ -2920,7 +2925,8 @@ static int sd_save_vmstate(BlockDriverState *bs, 
QEMUIOVector *qiov,
 return ret;
 }
 
-static int sd_load_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
+static int coroutine_fn
+sd_load_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
int64_t pos)
 {
 BDRVSheepdogState *s = bs->opaque;
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 18/35] gluster: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/gluster.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/gluster.c b/block/gluster.c
index addceed6eb..dea8ab43a5 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -965,7 +965,8 @@ static coroutine_fn int 
qemu_gluster_co_pwrite_zeroes(BlockDriverState *bs,
 }
 #endif
 
-static int qemu_gluster_create(const char *filename,
+static int coroutine_fn
+qemu_gluster_create(const char *filename,
QemuOpts *opts, Error **errp)
 {
 BlockdevOptionsGluster *gconf;
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 15/35] backup: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 include/block/block_backup.h | 4 ++--
 block/backup.c   | 9 ++---
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/block/block_backup.h b/include/block/block_backup.h
index 8a759477a3..415cf8519d 100644
--- a/include/block/block_backup.h
+++ b/include/block/block_backup.h
@@ -27,12 +27,12 @@ typedef struct CowRequest {
 CoQueue wait_queue; /* coroutines blocked on this request */
 } CowRequest;
 
-void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
+void coroutine_fn backup_wait_for_overlapping_requests(BlockJob *job, int64_t 
sector_num,
   int nb_sectors);
 void backup_cow_request_begin(CowRequest *req, BlockJob *job,
   int64_t sector_num,
   int nb_sectors);
-void backup_cow_request_end(CowRequest *req);
+void coroutine_fn backup_cow_request_end(CowRequest *req);
 
 void backup_do_checkpoint(BlockJob *job, Error **errp);
 
diff --git a/block/backup.c b/block/backup.c
index 5387fbd84e..58ddd80b3f 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -84,7 +84,8 @@ static void cow_request_begin(CowRequest *req, BackupBlockJob 
*job,
 }
 
 /* Forget about a completed request */
-static void cow_request_end(CowRequest *req)
+static void coroutine_fn
+cow_request_end(CowRequest *req)
 {
 QLIST_REMOVE(req, list);
 qemu_co_queue_restart_all(&req->wait_queue);
@@ -275,7 +276,8 @@ void backup_do_checkpoint(BlockJob *job, Error **errp)
 bitmap_zero(backup_job->done_bitmap, len);
 }
 
-void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
+void coroutine_fn
+backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
   int nb_sectors)
 {
 BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
@@ -304,7 +306,8 @@ void backup_cow_request_begin(CowRequest *req, BlockJob 
*job,
 cow_request_begin(req, backup_job, start, end);
 }
 
-void backup_cow_request_end(CowRequest *req)
+void coroutine_fn
+backup_cow_request_end(CowRequest *req)
 {
 cow_request_end(req);
 }
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 29/35] block: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/blkdebug.c  | 15 ++-
 block/blkverify.c |  3 ++-
 block/io.c|  9 ++---
 3 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index a1b24b9b0d..d55e2e69c8 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -483,7 +483,8 @@ out:
 return ret;
 }
 
-static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
+static int coroutine_fn
+rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes)
 {
 BDRVBlkdebugState *s = bs->opaque;
 BlkdebugRule *rule = NULL;
@@ -563,7 +564,8 @@ blkdebug_co_pwritev(BlockDriverState *bs, uint64_t offset, 
uint64_t bytes,
 return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
 }
 
-static int blkdebug_co_flush(BlockDriverState *bs)
+static int coroutine_fn
+blkdebug_co_flush(BlockDriverState *bs)
 {
 int err = rule_check(bs, 0, 0);
 
@@ -656,7 +658,8 @@ static void blkdebug_close(BlockDriverState *bs)
 g_free(s->config_file);
 }
 
-static void suspend_request(BlockDriverState *bs, BlkdebugRule *rule)
+static void coroutine_fn
+suspend_request(BlockDriverState *bs, BlkdebugRule *rule)
 {
 BDRVBlkdebugState *s = bs->opaque;
 BlkdebugSuspendedReq r;
@@ -681,7 +684,8 @@ static void suspend_request(BlockDriverState *bs, 
BlkdebugRule *rule)
 g_free(r.tag);
 }
 
-static bool process_rule(BlockDriverState *bs, struct BlkdebugRule *rule,
+static bool coroutine_fn
+process_rule(BlockDriverState *bs, struct BlkdebugRule *rule,
 bool injected)
 {
 BDRVBlkdebugState *s = bs->opaque;
@@ -712,7 +716,8 @@ static bool process_rule(BlockDriverState *bs, struct 
BlkdebugRule *rule,
 return injected;
 }
 
-static void blkdebug_debug_event(BlockDriverState *bs, BlkdebugEvent event)
+static void coroutine_fn
+blkdebug_debug_event(BlockDriverState *bs, BlkdebugEvent event)
 {
 BDRVBlkdebugState *s = bs->opaque;
 struct BlkdebugRule *rule, *next;
diff --git a/block/blkverify.c b/block/blkverify.c
index 06369f9eac..d0c946173a 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -255,7 +255,8 @@ blkverify_co_pwritev(BlockDriverState *bs, uint64_t offset, 
uint64_t bytes,
 return blkverify_co_prwv(bs, &r, offset, bytes, qiov, qiov, flags, true);
 }
 
-static int blkverify_co_flush(BlockDriverState *bs)
+static int coroutine_fn
+blkverify_co_flush(BlockDriverState *bs)
 {
 BDRVBlkverifyState *s = bs->opaque;
 
diff --git a/block/io.c b/block/io.c
index 14b88c8609..a53a86df3e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -366,7 +366,8 @@ void bdrv_drain_all(void)
  *
  * This function should be called when a tracked request is completing.
  */
-static void tracked_request_end(BdrvTrackedRequest *req)
+static void coroutine_fn
+tracked_request_end(BdrvTrackedRequest *req)
 {
 if (req->serialising) {
 atomic_dec(&req->bs->serialising_in_flight);
@@ -381,7 +382,8 @@ static void tracked_request_end(BdrvTrackedRequest *req)
 /**
  * Add an active request to the tracked requests list
  */
-static void tracked_request_begin(BdrvTrackedRequest *req,
+static void coroutine_fn
+tracked_request_begin(BdrvTrackedRequest *req,
   BlockDriverState *bs,
   int64_t offset,
   unsigned int bytes,
@@ -2430,7 +2432,8 @@ int bdrv_pdiscard(BlockDriverState *bs, int64_t offset, 
int bytes)
 return rwco.ret;
 }
 
-int bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf)
+int coroutine_fn
+bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf)
 {
 BlockDriver *drv = bs->drv;
 CoroutineIOCompletion co = {
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 19/35] nfs: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/nfs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/nfs.c b/block/nfs.c
index c3c5de0113..3f393a95a4 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -679,7 +679,8 @@ static QemuOptsList nfs_create_opts = {
 }
 };
 
-static int nfs_file_create(const char *url, QemuOpts *opts, Error **errp)
+static int coroutine_fn
+nfs_file_create(const char *url, QemuOpts *opts, Error **errp)
 {
 int ret = 0;
 int64_t total_size = 0;
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 13/35] nbd: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/nbd-client.h | 10 +-
 block/nbd-client.c | 24 
 block/nbd.c|  3 ++-
 nbd/server.c   |  3 ++-
 4 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/block/nbd-client.h b/block/nbd-client.h
index 49636bc621..473d1f88fd 100644
--- a/block/nbd-client.h
+++ b/block/nbd-client.h
@@ -42,13 +42,13 @@ int nbd_client_init(BlockDriverState *bs,
 Error **errp);
 void nbd_client_close(BlockDriverState *bs);
 
-int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset, int bytes);
-int nbd_client_co_flush(BlockDriverState *bs);
-int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
+int coroutine_fn nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset, 
int bytes);
+int coroutine_fn nbd_client_co_flush(BlockDriverState *bs);
+int coroutine_fn nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
   uint64_t bytes, QEMUIOVector *qiov, int flags);
-int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
+int coroutine_fn nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t 
offset,
 int bytes, BdrvRequestFlags flags);
-int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
+int coroutine_fn nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
  uint64_t bytes, QEMUIOVector *qiov, int flags);
 
 void nbd_client_detach_aio_context(BlockDriverState *bs);
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 02e928142e..63c0210c37 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -111,7 +111,8 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
 s->read_reply_co = NULL;
 }
 
-static int nbd_co_send_request(BlockDriverState *bs,
+static int coroutine_fn
+nbd_co_send_request(BlockDriverState *bs,
NBDRequest *request,
QEMUIOVector *qiov)
 {
@@ -158,7 +159,8 @@ static int nbd_co_send_request(BlockDriverState *bs,
 return rc;
 }
 
-static void nbd_co_receive_reply(NBDClientSession *s,
+static void coroutine_fn
+nbd_co_receive_reply(NBDClientSession *s,
  NBDRequest *request,
  NBDReply *reply,
  QEMUIOVector *qiov)
@@ -185,7 +187,8 @@ static void nbd_co_receive_reply(NBDClientSession *s,
 }
 }
 
-static void nbd_coroutine_end(BlockDriverState *bs,
+static void coroutine_fn
+nbd_coroutine_end(BlockDriverState *bs,
   NBDRequest *request)
 {
 NBDClientSession *s = nbd_get_client_session(bs);
@@ -204,7 +207,8 @@ static void nbd_coroutine_end(BlockDriverState *bs,
 qemu_co_mutex_unlock(&s->send_mutex);
 }
 
-int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
+int coroutine_fn
+nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
  uint64_t bytes, QEMUIOVector *qiov, int flags)
 {
 NBDClientSession *client = nbd_get_client_session(bs);
@@ -229,7 +233,8 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t 
offset,
 return -reply.error;
 }
 
-int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
+int coroutine_fn
+nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
   uint64_t bytes, QEMUIOVector *qiov, int flags)
 {
 NBDClientSession *client = nbd_get_client_session(bs);
@@ -258,7 +263,8 @@ int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t 
offset,
 return -reply.error;
 }
 
-int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
+int coroutine_fn
+nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
 int bytes, BdrvRequestFlags flags)
 {
 ssize_t ret;
@@ -292,7 +298,8 @@ int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, 
int64_t offset,
 return -reply.error;
 }
 
-int nbd_client_co_flush(BlockDriverState *bs)
+int coroutine_fn
+nbd_client_co_flush(BlockDriverState *bs)
 {
 NBDClientSession *client = nbd_get_client_session(bs);
 NBDRequest request = { .type = NBD_CMD_FLUSH };
@@ -316,7 +323,8 @@ int nbd_client_co_flush(BlockDriverState *bs)
 return -reply.error;
 }
 
-int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset, int bytes)
+int coroutine_fn
+nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset, int bytes)
 {
 NBDClientSession *client = nbd_get_client_session(bs);
 NBDRequest request = {
diff --git a/block/nbd.c b/block/nbd.c
index d529305330..8689b27d7d 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -465,7 +465,8 @@ static int nbd_open(BlockDriverState *bs, QDict *options, 
int flags,
 return ret;
 }
 
-static int nbd_co_flush(BlockDriverState *bs)
+static int coroutine_fn
+nbd_co_flush(BlockDriverState *bs)
 {
 return nbd_client_co_flush(bs);
 }
diff --git a/nbd/server.c b/nbd/server.c
index 8a70c054a6..411

[Qemu-devel] [PATCH 14/35] migration: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 migration/migration.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 51ccd1a4c5..3370482637 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -303,7 +303,8 @@ static void process_incoming_migration_bh(void *opaque)
 migration_incoming_state_destroy();
 }
 
-static void process_incoming_migration_co(void *opaque)
+static void coroutine_fn
+process_incoming_migration_co(void *opaque)
 {
 QEMUFile *f = opaque;
 MigrationIncomingState *mis = migration_incoming_get_current();
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 24/35] null: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/null.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/block/null.c b/block/null.c
index 876f90965b..4c8afe16d7 100644
--- a/block/null.c
+++ b/block/null.c
@@ -167,7 +167,8 @@ static inline BlockAIOCB *null_aio_common(BlockDriverState 
*bs,
 return &acb->common;
 }
 
-static BlockAIOCB *null_aio_readv(BlockDriverState *bs,
+static BlockAIOCB * coroutine_fn
+null_aio_readv(BlockDriverState *bs,
   int64_t sector_num, QEMUIOVector *qiov,
   int nb_sectors,
   BlockCompletionFunc *cb,
@@ -182,7 +183,8 @@ static BlockAIOCB *null_aio_readv(BlockDriverState *bs,
 return null_aio_common(bs, cb, opaque);
 }
 
-static BlockAIOCB *null_aio_writev(BlockDriverState *bs,
+static BlockAIOCB * coroutine_fn
+null_aio_writev(BlockDriverState *bs,
int64_t sector_num, QEMUIOVector *qiov,
int nb_sectors,
BlockCompletionFunc *cb,
@@ -191,7 +193,8 @@ static BlockAIOCB *null_aio_writev(BlockDriverState *bs,
 return null_aio_common(bs, cb, opaque);
 }
 
-static BlockAIOCB *null_aio_flush(BlockDriverState *bs,
+static BlockAIOCB * coroutine_fn
+null_aio_flush(BlockDriverState *bs,
   BlockCompletionFunc *cb,
   void *opaque)
 {
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 05/35] coroutine: remove coroutine_fn from qemu_co_queue_run_restart()

2017-07-04 Thread Marc-André Lureau
The function can be invoked from non-coroutine context.

Signed-off-by: Marc-André Lureau 
---
 include/qemu/coroutine_int.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/qemu/coroutine_int.h b/include/qemu/coroutine_int.h
index cb98892bba..e9fc72c5b0 100644
--- a/include/qemu/coroutine_int.h
+++ b/include/qemu/coroutine_int.h
@@ -61,6 +61,6 @@ Coroutine *qemu_coroutine_new(void);
 void qemu_coroutine_delete(Coroutine *co);
 CoroutineAction qemu_coroutine_switch(Coroutine *from, Coroutine *to,
   CoroutineAction action);
-void coroutine_fn qemu_co_queue_run_restart(Coroutine *co);
+void qemu_co_queue_run_restart(Coroutine *co);
 
 #endif
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 10/35] vmdk: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/vmdk.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index 55581b03fe..f8422e8971 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1334,7 +1334,8 @@ static int64_t coroutine_fn 
vmdk_co_get_block_status(BlockDriverState *bs,
 return ret;
 }
 
-static int vmdk_write_extent(VmdkExtent *extent, int64_t cluster_offset,
+static int coroutine_fn
+vmdk_write_extent(VmdkExtent *extent, int64_t cluster_offset,
 int64_t offset_in_cluster, QEMUIOVector *qiov,
 uint64_t qiov_offset, uint64_t n_bytes,
 uint64_t offset)
@@ -1406,7 +1407,8 @@ static int vmdk_write_extent(VmdkExtent *extent, int64_t 
cluster_offset,
 return ret;
 }
 
-static int vmdk_read_extent(VmdkExtent *extent, int64_t cluster_offset,
+static int coroutine_fn
+vmdk_read_extent(VmdkExtent *extent, int64_t cluster_offset,
 int64_t offset_in_cluster, QEMUIOVector *qiov,
 int bytes)
 {
@@ -1551,7 +1553,8 @@ fail:
  *
  * Returns: error code with 0 for success.
  */
-static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
+static int coroutine_fn
+vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
uint64_t bytes, QEMUIOVector *qiov,
bool zeroed, bool zero_dry_run)
 {
@@ -1857,7 +1860,8 @@ static int filename_decompose(const char *filename, char 
*path, char *prefix,
 return VMDK_OK;
 }
 
-static int vmdk_create(const char *filename, QemuOpts *opts, Error **errp)
+static int coroutine_fn
+vmdk_create(const char *filename, QemuOpts *opts, Error **errp)
 {
 int idx = 0;
 BlockBackend *new_blk = NULL;
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 21/35] rbd: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/rbd.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 9da02cdceb..7b4d548cd2 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -348,7 +348,8 @@ static QemuOptsList runtime_opts = {
 },
 };
 
-static int qemu_rbd_create(const char *filename, QemuOpts *opts, Error **errp)
+static int coroutine_fn
+qemu_rbd_create(const char *filename, QemuOpts *opts, Error **errp)
 {
 Error *local_err = NULL;
 int64_t bytes = 0;
@@ -861,7 +862,8 @@ failed:
 return NULL;
 }
 
-static BlockAIOCB *qemu_rbd_aio_readv(BlockDriverState *bs,
+static BlockAIOCB * coroutine_fn
+qemu_rbd_aio_readv(BlockDriverState *bs,
   int64_t sector_num,
   QEMUIOVector *qiov,
   int nb_sectors,
@@ -873,7 +875,8 @@ static BlockAIOCB *qemu_rbd_aio_readv(BlockDriverState *bs,
  RBD_AIO_READ);
 }
 
-static BlockAIOCB *qemu_rbd_aio_writev(BlockDriverState *bs,
+static BlockAIOCB * coroutine_fn
+qemu_rbd_aio_writev(BlockDriverState *bs,
int64_t sector_num,
QEMUIOVector *qiov,
int nb_sectors,
@@ -886,7 +889,8 @@ static BlockAIOCB *qemu_rbd_aio_writev(BlockDriverState *bs,
 }
 
 #ifdef LIBRBD_SUPPORTS_AIO_FLUSH
-static BlockAIOCB *qemu_rbd_aio_flush(BlockDriverState *bs,
+static BlockAIOCB * coroutine_fn
+qemu_rbd_aio_flush(BlockDriverState *bs,
   BlockCompletionFunc *cb,
   void *opaque)
 {
@@ -1063,7 +1067,8 @@ static int qemu_rbd_snap_list(BlockDriverState *bs,
 }
 
 #ifdef LIBRBD_SUPPORTS_DISCARD
-static BlockAIOCB *qemu_rbd_aio_pdiscard(BlockDriverState *bs,
+static BlockAIOCB * coroutine_fn
+qemu_rbd_aio_pdiscard(BlockDriverState *bs,
  int64_t offset,
  int bytes,
  BlockCompletionFunc *cb,
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 12/35] raw: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/raw-format.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/raw-format.c b/block/raw-format.c
index 0d185fe41b..402d3b9fba 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -361,7 +361,8 @@ static void raw_lock_medium(BlockDriverState *bs, bool 
locked)
 bdrv_lock_medium(bs->file->bs, locked);
 }
 
-static int raw_co_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
+static int coroutine_fn
+raw_co_ioctl(BlockDriverState *bs, unsigned long int req, void *buf)
 {
 BDRVRawState *s = bs->opaque;
 if (s->offset || s->has_size) {
@@ -375,7 +376,8 @@ static int raw_has_zero_init(BlockDriverState *bs)
 return bdrv_has_zero_init(bs->file->bs);
 }
 
-static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
+static int coroutine_fn
+raw_create(const char *filename, QemuOpts *opts, Error **errp)
 {
 return bdrv_create_file(filename, opts, errp);
 }
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 04/35] coroutine: remove coroutine_fn from qemu_coroutine_self()

2017-07-04 Thread Marc-André Lureau
The function may be safely called from non-coroutine context.

Signed-off-by: Marc-André Lureau 
---
 include/qemu/coroutine.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 35ff394f51..ec55fe295c 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -121,7 +121,7 @@ void coroutine_fn qemu_coroutine_yield(void);
 /**
  * Get the currently executing coroutine
  */
-Coroutine *coroutine_fn qemu_coroutine_self(void);
+Coroutine *qemu_coroutine_self(void);
 
 /**
  * Return whether or not currently inside a coroutine
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 08/35] block: all bdrv_aio callbacks are coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 include/block/block_int.h | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 15fa602150..93eb2a9528 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -133,15 +133,15 @@ struct BlockDriver {
 void (*bdrv_refresh_filename)(BlockDriverState *bs, QDict *options);
 
 /* aio */
-BlockAIOCB *(*bdrv_aio_readv)(BlockDriverState *bs,
+BlockAIOCB * coroutine_fn (*bdrv_aio_readv)(BlockDriverState *bs,
 int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
 BlockCompletionFunc *cb, void *opaque);
-BlockAIOCB *(*bdrv_aio_writev)(BlockDriverState *bs,
+BlockAIOCB * coroutine_fn (*bdrv_aio_writev)(BlockDriverState *bs,
 int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
 BlockCompletionFunc *cb, void *opaque);
-BlockAIOCB *(*bdrv_aio_flush)(BlockDriverState *bs,
+BlockAIOCB * coroutine_fn (*bdrv_aio_flush)(BlockDriverState *bs,
 BlockCompletionFunc *cb, void *opaque);
-BlockAIOCB *(*bdrv_aio_pdiscard)(BlockDriverState *bs,
+BlockAIOCB * coroutine_fn (*bdrv_aio_pdiscard)(BlockDriverState *bs,
 int64_t offset, int bytes,
 BlockCompletionFunc *cb, void *opaque);
 
@@ -247,7 +247,7 @@ struct BlockDriver {
 void (*bdrv_lock_medium)(BlockDriverState *bs, bool locked);
 
 /* to control generic scsi devices */
-BlockAIOCB *(*bdrv_aio_ioctl)(BlockDriverState *bs,
+BlockAIOCB * coroutine_fn (*bdrv_aio_ioctl)(BlockDriverState *bs,
 unsigned long int req, void *buf,
 BlockCompletionFunc *cb, void *opaque);
 int coroutine_fn (*bdrv_co_ioctl)(BlockDriverState *bs,
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 11/35] qcow2: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
Signed-off-by: Marc-André Lureau 
---
 block/qcow2.h |  6 --
 block/qcow.c  |  4 +++-
 block/qcow2-cluster.c | 11 +++
 block/qcow2.c | 15 ++-
 4 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index 87b15eb4aa..a32b47b7f6 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -550,14 +550,16 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t 
sector_num,
 
 int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
  unsigned int *bytes, uint64_t *cluster_offset);
-int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
+int coroutine_fn
+qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
unsigned int *bytes, uint64_t *host_offset,
QCowL2Meta **m);
 uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
  uint64_t offset,
  int compressed_size);
 
-int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m);
+int coroutine_fn
+qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m);
 int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
   uint64_t bytes, enum qcow2_discard_type type,
   bool full_discard);
diff --git a/block/qcow.c b/block/qcow.c
index 7bd94dcd46..d84ae7fb74 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -796,7 +796,9 @@ static void qcow_close(BlockDriverState *bs)
 error_free(s->migration_blocker);
 }
 
-static int qcow_create(const char *filename, QemuOpts *opts, Error **errp)
+
+static int coroutine_fn
+qcow_create(const char *filename, QemuOpts *opts, Error **errp)
 {
 int header_size, backing_filename_len, l1_size, shift, i;
 QCowHeader header;
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 3d341fd9cb..964d23aee8 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -761,7 +761,8 @@ uint64_t 
qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
 return cluster_offset;
 }
 
-static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
+static int coroutine_fn
+perform_cow(BlockDriverState *bs, QCowL2Meta *m)
 {
 BDRVQcow2State *s = bs->opaque;
 Qcow2COWRegion *start = &m->cow_start;
@@ -890,7 +891,7 @@ fail:
 return ret;
 }
 
-int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
+int coroutine_fn qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta 
*m)
 {
 BDRVQcow2State *s = bs->opaque;
 int i, j = 0, l2_index, ret;
@@ -1014,7 +1015,8 @@ out:
  *   information on cluster allocation may be invalid now. The caller
  *   must start over anyway, so consider *cur_bytes undefined.
  */
-static int handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
+static int coroutine_fn
+handle_dependencies(BlockDriverState *bs, uint64_t guest_offset,
 uint64_t *cur_bytes, QCowL2Meta **m)
 {
 BDRVQcow2State *s = bs->opaque;
@@ -1413,7 +1415,8 @@ fail:
  *
  * Return 0 on success and -errno in error cases
  */
-int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
+int coroutine_fn
+qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
unsigned int *bytes, uint64_t *host_offset,
QCowL2Meta **m)
 {
diff --git a/block/qcow2.c b/block/qcow2.c
index 2f94f0326e..6ecf1489dc 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2079,7 +2079,8 @@ static int qcow2_change_backing_file(BlockDriverState *bs,
 return qcow2_update_header(bs);
 }
 
-static int preallocate(BlockDriverState *bs)
+static int coroutine_fn
+preallocate(BlockDriverState *bs)
 {
 uint64_t bytes;
 uint64_t offset;
@@ -2140,7 +2141,8 @@ static int preallocate(BlockDriverState *bs)
 return 0;
 }
 
-static int qcow2_create2(const char *filename, int64_t total_size,
+static int coroutine_fn
+qcow2_create2(const char *filename, int64_t total_size,
  const char *backing_file, const char *backing_format,
  int flags, size_t cluster_size, PreallocMode prealloc,
  QemuOpts *opts, int version, int refcount_order,
@@ -2390,7 +2392,8 @@ out:
 return ret;
 }
 
-static int qcow2_create(const char *filename, QemuOpts *opts, Error **errp)
+static int coroutine_fn
+qcow2_create(const char *filename, QemuOpts *opts, Error **errp)
 {
 char *backing_file = NULL;
 char *backing_fmt = NULL;
@@ -3011,7 +3014,8 @@ static void dump_refcounts(BlockDriverState *bs)
 }
 #endif
 
-static int qcow2_save_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
+static int coroutine_fn
+qcow2_save_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
   int64_t pos)
 {
 BDRVQcow2State *s = bs->opaque;
@@ -3021,7 +3025,8 @@ static int qcow2_save_vmsta

[Qemu-devel] [PATCH 07/35] blockjob: mark coroutine_fn

2017-07-04 Thread Marc-André Lureau
/home/elmarco/src/qemu/blockjob.c:820:9: error: calling function 
'qemu_coroutine_yield' requires holding role '_coroutine_fn' exclusively 
[-Werror,-Wthread-safety-analysis]
qemu_coroutine_yield();
^
/home/elmarco/src/qemu/blockjob.c:824:5: error: calling function 
'block_job_pause_point' requires holding role '_coroutine_fn' exclusively 
[-Werror,-Wthread-safety-analysis]
block_job_pause_point(job);
^
Signed-off-by: Marc-André Lureau 
---
 include/block/blockjob_int.h | 4 ++--
 blockjob.c   | 6 --
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index f13ad05c0d..a3bc01fd51 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -145,7 +145,7 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
  * Put the job to sleep (assuming that it wasn't canceled) for @ns
  * nanoseconds.  Canceling the job will interrupt the wait immediately.
  */
-void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns);
+void coroutine_fn block_job_sleep_ns(BlockJob *job, QEMUClockType type, 
int64_t ns);
 
 /**
  * block_job_yield:
@@ -153,7 +153,7 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, 
int64_t ns);
  *
  * Yield the block job coroutine.
  */
-void block_job_yield(BlockJob *job);
+void coroutine_fn block_job_yield(BlockJob *job);
 
 /**
  * block_job_pause_all:
diff --git a/blockjob.c b/blockjob.c
index 70a78188b7..96424323c1 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -788,7 +788,8 @@ bool block_job_is_cancelled(BlockJob *job)
 return job->cancelled;
 }
 
-void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
+void coroutine_fn
+block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns)
 {
 assert(job->busy);
 
@@ -806,7 +807,8 @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, 
int64_t ns)
 block_job_pause_point(job);
 }
 
-void block_job_yield(BlockJob *job)
+void coroutine_fn
+block_job_yield(BlockJob *job)
 {
 assert(job->busy);
 
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 01/35] WIP: coroutine: annotate coroutine with clang thread safety attributes

2017-07-04 Thread Marc-André Lureau
It is possible to use clang -Wthread-safety to do some basic coroutine
checks:
http://lists.llvm.org/pipermail/cfe-dev/2017-June/054372.html
https://clang.llvm.org/docs/ThreadSafetyAnalysis.html

This will basically check that you don't call accidentally a coroutine
function from a non-coroutine, as this may crash at run time if the
coroutine function yields.

I had to modify clang to support annotations on typedef and function
pointers, and check some function assignments/arguments. The end
result is quire far from ready for upstream review, but could serve as
basis for more checks or work. (https://github.com/elmarco/clang
qemu-ta branch)

Signed-off-by: Marc-André Lureau 
---
 include/qemu/coroutine.h | 31 ++-
 util/coroutine-sigaltstack.c |  2 ++
 util/coroutine-ucontext.c|  2 ++
 util/coroutine-win32.c   |  2 ++
 util/qemu-coroutine.c|  2 ++
 5 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index a4509bd977..35ff394f51 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -28,6 +28,34 @@
  * These functions are re-entrant and may be used outside the global mutex.
  */
 
+/* clang thread-safety attributes, used for static analysis of the CFG */
+#if defined(__clang__) && (!defined(SWIG))
+#define THREAD_ANNOTATION_ATTRIBUTE__(x)   __attribute__((x))
+#else
+#define THREAD_ANNOTATION_ATTRIBUTE__(x)
+#endif
+
+#define TAA_ROLE\
+THREAD_ANNOTATION_ATTRIBUTE__(capability("role"))
+
+#define TAA_REQUIRES(...)   \
+THREAD_ANNOTATION_ATTRIBUTE__(requires_capability(__VA_ARGS__))
+
+#define TAA_ACQUIRE(R)  \
+THREAD_ANNOTATION_ATTRIBUTE__(acquire_capability(R))
+
+#define TAA_RELEASE(R)  \
+THREAD_ANNOTATION_ATTRIBUTE__(release_capability(R))
+
+#define TAA_NO_ANALYSYS \
+THREAD_ANNOTATION_ATTRIBUTE__(no_thread_safety_analysis)
+
+typedef int TAA_ROLE coroutine_role;
+extern coroutine_role _coroutine_fn;
+
+static inline void co_role_acquire(coroutine_role R) TAA_ACQUIRE(R) 
TAA_NO_ANALYSYS {}
+static inline void co_role_release(coroutine_role R) TAA_RELEASE(R) 
TAA_NO_ANALYSYS {}
+
 /**
  * Mark a function that executes in coroutine context
  *
@@ -42,7 +70,8 @@
  *   
  *   }
  */
-#define coroutine_fn
+
+#define coroutine_fn TAA_REQUIRES(_coroutine_fn)
 
 typedef struct Coroutine Coroutine;
 
diff --git a/util/coroutine-sigaltstack.c b/util/coroutine-sigaltstack.c
index f6fc49a0e5..05d1a378d1 100644
--- a/util/coroutine-sigaltstack.c
+++ b/util/coroutine-sigaltstack.c
@@ -98,7 +98,9 @@ static void coroutine_bootstrap(CoroutineSigAltStack *self, 
Coroutine *co)
 }
 
 while (true) {
+co_role_acquire(_coroutine_fn);
 co->entry(co->entry_arg);
+co_role_release(_coroutine_fn);
 qemu_coroutine_switch(co, co->caller, COROUTINE_TERMINATE);
 }
 }
diff --git a/util/coroutine-ucontext.c b/util/coroutine-ucontext.c
index 6621f3f692..010fbaedf1 100644
--- a/util/coroutine-ucontext.c
+++ b/util/coroutine-ucontext.c
@@ -76,7 +76,9 @@ static void coroutine_trampoline(int i0, int i1)
 }
 
 while (true) {
+co_role_acquire(_coroutine_fn);
 co->entry(co->entry_arg);
+co_role_release(_coroutine_fn);
 qemu_coroutine_switch(co, co->caller, COROUTINE_TERMINATE);
 }
 }
diff --git a/util/coroutine-win32.c b/util/coroutine-win32.c
index de6bd4fd3e..75a3bed543 100644
--- a/util/coroutine-win32.c
+++ b/util/coroutine-win32.c
@@ -64,7 +64,9 @@ static void CALLBACK coroutine_trampoline(void *co_)
 Coroutine *co = co_;
 
 while (true) {
+co_role_acquire(_coroutine_fn);
 co->entry(co->entry_arg);
+co_role_release(_coroutine_fn);
 qemu_coroutine_switch(co, co->caller, COROUTINE_TERMINATE);
 }
 }
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index d6095c1d5a..efa0f20e69 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -25,6 +25,8 @@ enum {
 POOL_BATCH_SIZE = 64,
 };
 
+coroutine_role _coroutine_fn;
+
 /** Free list to speed up creation */
 static QSLIST_HEAD(, Coroutine) release_pool = QSLIST_HEAD_INITIALIZER(pool);
 static unsigned int release_pool_size;
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 03/35] test-coroutine: fix coroutine attribute

2017-07-04 Thread Marc-André Lureau
  CC  tests/test-coroutine.o
qemu/tests/test-coroutine.c:390:5: warning: calling function 
'qemu_coroutine_yield' requires holding role '_coroutine_fn' exclusively 
[-Wthread-safety-analysis]
qemu_coroutine_yield();
^
qemu/tests/test-coroutine.c:403:14: warning: Unmached attributes 
[-Wthread-safety-analysis]
co = qemu_coroutine_create(perf_cost_func, &i);
 ^
2 warnings generated.

Signed-off-by: Marc-André Lureau 
---
 tests/test-coroutine.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/test-coroutine.c b/tests/test-coroutine.c
index abd97c23c1..f14277af83 100644
--- a/tests/test-coroutine.c
+++ b/tests/test-coroutine.c
@@ -385,7 +385,7 @@ static void perf_baseline(void)
 maxcycles, duration);
 }
 
-static __attribute__((noinline)) void perf_cost_func(void *opaque)
+static __attribute__((noinline)) void coroutine_fn perf_cost_func(void *opaque)
 {
 qemu_coroutine_yield();
 }
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 09/35] block: bdrv_create() and bdrv_debug_event() are coroutine_fn

2017-07-04 Thread Marc-André Lureau
Called from coroutine.

Signed-off-by: Marc-André Lureau 
---
 include/block/block_int.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 93eb2a9528..a183c72b7c 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -126,7 +126,7 @@ struct BlockDriver {
 int (*bdrv_file_open)(BlockDriverState *bs, QDict *options, int flags,
   Error **errp);
 void (*bdrv_close)(BlockDriverState *bs);
-int (*bdrv_create)(const char *filename, QemuOpts *opts, Error **errp);
+int coroutine_fn (*bdrv_create)(const char *filename, QemuOpts *opts, 
Error **errp);
 int (*bdrv_set_key)(BlockDriverState *bs, const char *key);
 int (*bdrv_make_empty)(BlockDriverState *bs);
 
@@ -267,7 +267,7 @@ struct BlockDriver {
   BlockDriverAmendStatusCB *status_cb,
   void *cb_opaque);
 
-void (*bdrv_debug_event)(BlockDriverState *bs, BlkdebugEvent event);
+void coroutine_fn (*bdrv_debug_event)(BlockDriverState *bs, BlkdebugEvent 
event);
 
 /* TODO Better pass a option string/QDict/QemuOpts to add any rule? */
 int (*bdrv_debug_breakpoint)(BlockDriverState *bs, const char *event,
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 06/35] coroutine: mark CoRwLock coroutine_fn

2017-07-04 Thread Marc-André Lureau
  CC  util/qemu-coroutine-lock.o
/home/elmarco/src/qemu/util/qemu-coroutine-lock.c:371:5: warning: calling 
function 'qemu_co_mutex_lock' requires holding role '_coroutine_fn' exclusively 
[-Wthread-safety-analysis]
qemu_co_mutex_lock(&lock->mutex);
^
/home/elmarco/src/qemu/util/qemu-coroutine-lock.c:371:5: warning: calling 
function 'qemu_co_mutex_lock' requires holding role '_coroutine_fn' exclusively 
[-Wthread-safety-analysis]
/home/elmarco/src/qemu/util/qemu-coroutine-lock.c:374:9: warning: calling 
function 'qemu_co_queue_wait' requires holding role '_coroutine_fn' exclusively 
[-Wthread-safety-analysis]
qemu_co_queue_wait(&lock->queue, &lock->mutex);
^
/home/elmarco/src/qemu/util/qemu-coroutine-lock.c:374:9: warning: calling 
function 'qemu_co_queue_wait' requires holding role '_coroutine_fn' exclusively 
[-Wthread-safety-analysis]
/home/elmarco/src/qemu/util/qemu-coroutine-lock.c:377:5: warning: calling 
function 'qemu_co_mutex_unlock' requires holding role '_coroutine_fn' 
exclusively [-Wthread-safety-analysis]
qemu_co_mutex_unlock(&lock->mutex);
^
/home/elmarco/src/qemu/util/qemu-coroutine-lock.c:377:5: warning: calling 
function 'qemu_co_mutex_unlock' requires holding role '_coroutine_fn' 
exclusively [-Wthread-safety-analysis]
/home/elmarco/src/qemu/util/qemu-coroutine-lock.c:390:9: warning: calling 
function 'qemu_co_queue_restart_all' requires holding role '_coroutine_fn' 
exclusively [-Wthread-safety-analysis]
qemu_co_queue_restart_all(&lock->queue);
^
/home/elmarco/src/qemu/util/qemu-coroutine-lock.c:390:9: warning: calling 
function 'qemu_co_queue_restart_all' requires holding role '_coroutine_fn' 
exclusively [-Wthread-safety-analysis]
/home/elmarco/src/qemu/util/qemu-coroutine-lock.c:394:9: warning: calling 
function 'qemu_co_mutex_lock' requires holding role '_coroutine_fn' exclusively 
[-Wthread-safety-analysis]
qemu_co_mutex_lock(&lock->mutex);
^
/home/elmarco/src/qemu/util/qemu-coroutine-lock.c:394:9: warning: calling 
function 'qemu_co_mutex_lock' requires holding role '_coroutine_fn' exclusively 
[-Wthread-safety-analysis]
/home/elmarco/src/qemu/util/qemu-coroutine-lock.c:399:13: warning: calling 
function 'qemu_co_queue_next' requires holding role '_coroutine_fn' exclusively 
[-Wthread-safety-analysis]
qemu_co_queue_next(&lock->queue);
^
/home/elmarco/src/qemu/util/qemu-coroutine-lock.c:399:13: warning: calling 
function 'qemu_co_queue_next' requires holding role '_coroutine_fn' exclusively 
[-Wthread-safety-analysis]
/home/elmarco/src/qemu/util/qemu-coroutine-lock.c:402:5: warning: calling 
function 'qemu_co_mutex_unlock' requires holding role '_coroutine_fn' 
exclusively [-Wthread-safety-analysis]
qemu_co_mutex_unlock(&lock->mutex);
^
/home/elmarco/src/qemu/util/qemu-coroutine-lock.c:402:5: warning: calling 
function 'qemu_co_mutex_unlock' requires holding role '_coroutine_fn' 
exclusively [-Wthread-safety-analysis]
/home/elmarco/src/qemu/util/qemu-coroutine-lock.c:407:5: warning: calling 
function 'qemu_co_mutex_lock' requires holding role '_coroutine_fn' exclusively 
[-Wthread-safety-analysis]
qemu_co_mutex_lock(&lock->mutex);
^
/home/elmarco/src/qemu/util/qemu-coroutine-lock.c:407:5: warning: calling 
function 'qemu_co_mutex_lock' requires holding role '_coroutine_fn' exclusively 
[-Wthread-safety-analysis]
/home/elmarco/src/qemu/util/qemu-coroutine-lock.c:410:9: warning: calling 
function 'qemu_co_queue_wait' requires holding role '_coroutine_fn' exclusively 
[-Wthread-safety-analysis]
qemu_co_queue_wait(&lock->queue, &lock->mutex);
^
/home/elmarco/src/qemu/util/qemu-coroutine-lock.c:410:9: warning: calling 
function 'qemu_co_queue_wait' requires holding role '_coroutine_fn' exclusively 
[-Wthread-safety-analysis]

Signed-off-by: Marc-André Lureau 
---
 include/qemu/coroutine.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index ec55fe295c..5698414c14 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -255,20 +255,20 @@ void qemu_co_rwlock_init(CoRwlock *lock);
  * of a parallel writer, control is transferred to the caller of the current
  * coroutine.
  */
-void qemu_co_rwlock_rdlock(CoRwlock *lock);
+void coroutine_fn qemu_co_rwlock_rdlock(CoRwlock *lock);
 
 /**
  * Write Locks the mutex. If the lock cannot be taken immediately because
  * of a parallel reader, control is transferred to the caller of the current
  * coroutine.
  */
-void qemu_co_rwlock_wrlock(CoRwlock *lock);
+void coroutine_fn qemu_co_rwlock_wrlock(CoRwlock *lock);
 
 /**
  * Unlocks the read/write lock and schedules the next coroutine that was
  * waiting for this lock to be run.
  */
-void qemu_co_rwlock_unlock(CoRwlock *lock);
+void coroutine_fn qemu_co_rwlock_unlock(CoRwlock *lock);
 
 /**
  * Yield the coroutine for a given duration
-- 
2.13.1.395.gf7b71de06




[Qemu-devel] [PATCH 02/35] WIP: coroutine: manually tag the fast-paths

2017-07-04 Thread Marc-André Lureau
Some functions are both regular and coroutine. They may call coroutine
functions in some cases, if it is known to be running in a coroutine.

Signed-off-by: Marc-André Lureau 
---
 block.c |  2 ++
 block/block-backend.c   |  2 ++
 block/io.c  | 16 +++-
 block/sheepdog.c|  2 ++
 block/throttle-groups.c | 10 --
 migration/rdma.c|  2 ++
 6 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index 694396281b..b08c006da4 100644
--- a/block.c
+++ b/block.c
@@ -443,7 +443,9 @@ int bdrv_create(BlockDriver *drv, const char* filename,
 
 if (qemu_in_coroutine()) {
 /* Fast-path if already in coroutine context */
+co_role_acquire(_coroutine_fn);
 bdrv_create_co_entry(&cco);
+co_role_release(_coroutine_fn);
 } else {
 co = qemu_coroutine_create(bdrv_create_co_entry, &cco);
 qemu_coroutine_enter(co);
diff --git a/block/block-backend.c b/block/block-backend.c
index 0df3457a09..56fc0a4d1e 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1072,7 +1072,9 @@ static int blk_prw(BlockBackend *blk, int64_t offset, 
uint8_t *buf,
 
 if (qemu_in_coroutine()) {
 /* Fast-path if already in coroutine context */
+co_role_acquire(_coroutine_fn);
 co_entry(&rwco);
+co_role_release(_coroutine_fn);
 } else {
 Coroutine *co = qemu_coroutine_create(co_entry, &rwco);
 bdrv_coroutine_enter(blk_bs(blk), co);
diff --git a/block/io.c b/block/io.c
index 2de7c77983..14b88c8609 100644
--- a/block/io.c
+++ b/block/io.c
@@ -229,7 +229,9 @@ static void coroutine_fn 
bdrv_co_yield_to_drain(BlockDriverState *bs)
 void bdrv_drained_begin(BlockDriverState *bs)
 {
 if (qemu_in_coroutine()) {
+co_role_acquire(_coroutine_fn);
 bdrv_co_yield_to_drain(bs);
+co_role_release(_coroutine_fn);
 return;
 }
 
@@ -616,7 +618,9 @@ static int bdrv_prwv_co(BdrvChild *child, int64_t offset,
 
 if (qemu_in_coroutine()) {
 /* Fast-path if already in coroutine context */
+co_role_acquire(_coroutine_fn);
 bdrv_rw_co_entry(&rwco);
+co_role_release(_coroutine_fn);
 } else {
 co = qemu_coroutine_create(bdrv_rw_co_entry, &rwco);
 bdrv_coroutine_enter(child->bs, co);
@@ -1901,7 +1905,9 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
 
 if (qemu_in_coroutine()) {
 /* Fast-path if already in coroutine context */
+co_role_acquire(_coroutine_fn);
 bdrv_get_block_status_above_co_entry(&data);
+co_role_release(_coroutine_fn);
 } else {
 co = qemu_coroutine_create(bdrv_get_block_status_above_co_entry,
&data);
@@ -2027,7 +2033,11 @@ bdrv_rw_vmstate(BlockDriverState *bs, QEMUIOVector 
*qiov, int64_t pos,
 bool is_read)
 {
 if (qemu_in_coroutine()) {
-return bdrv_co_rw_vmstate(bs, qiov, pos, is_read);
+int ret;
+co_role_acquire(_coroutine_fn);
+ret = bdrv_co_rw_vmstate(bs, qiov, pos, is_read);
+co_role_release(_coroutine_fn);
+return ret;
 } else {
 BdrvVmstateCo data = {
 .bs = bs,
@@ -2259,7 +2269,9 @@ int bdrv_flush(BlockDriverState *bs)
 
 if (qemu_in_coroutine()) {
 /* Fast-path if already in coroutine context */
+co_role_acquire(_coroutine_fn);
 bdrv_flush_co_entry(&flush_co);
+co_role_release(_coroutine_fn);
 } else {
 co = qemu_coroutine_create(bdrv_flush_co_entry, &flush_co);
 bdrv_coroutine_enter(bs, co);
@@ -2406,7 +2418,9 @@ int bdrv_pdiscard(BlockDriverState *bs, int64_t offset, 
int bytes)
 
 if (qemu_in_coroutine()) {
 /* Fast-path if already in coroutine context */
+co_role_acquire(_coroutine_fn);
 bdrv_pdiscard_co_entry(&rwco);
+co_role_release(_coroutine_fn);
 } else {
 co = qemu_coroutine_create(bdrv_pdiscard_co_entry, &rwco);
 bdrv_coroutine_enter(bs, co);
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 08d7b11e9d..83bc43dde4 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -726,7 +726,9 @@ static int do_req(int sockfd, BlockDriverState *bs, 
SheepdogReq *hdr,
 };
 
 if (qemu_in_coroutine()) {
+co_role_acquire(_coroutine_fn);
 do_co_req(&srco);
+co_role_release(_coroutine_fn);
 } else {
 co = qemu_coroutine_create(do_co_req, &srco);
 if (bs) {
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index da2b490c38..8778f78965 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -304,9 +304,15 @@ static void schedule_next_request(BlockBackend *blk, bool 
is_write)
 
 /* If it doesn't have to wait, queue it for immediate execution */
 if (!must_wait) {
+bool handled = false;
+
+if (qemu_in_coroutine()) {
+co_role_acquire(_coroutine_

[Qemu-devel] [PATCH 00/35] RFC: coroutine annotations & clang check

2017-07-04 Thread Marc-André Lureau
Hi,

After investigating a bit using clang-tidy to do some coroutine checks
(and hitting a wall as there are no pre-processor info in the AST), it
was suggested to me on the clang mailing list to try to use
-Wthread-safety. I had to modify clang a bit to make it work on qemu
code base (annotations on function typedef etc,
https://github.com/elmarco/clang qemu-ta branch - very hackish state).

The analysis simply checks that coroutine_fn are called from a
coroutine "context" (or "role"). I couldn't find any misuse in qemu
code base, however, a number of coroutine_fn annotations are missing.

(I think it would make sense to squash all the "mark coroutine_fn"
commits if we apply them, I tried to split them by domains/maintainer
to ease review)

Marc-André Lureau (35):
  WIP: coroutine: annotate coroutine with clang thread safety attributes
  WIP: coroutine: manually tag the fast-paths
  test-coroutine: fix coroutine attribute
  coroutine: remove coroutine_fn from qemu_coroutine_self()
  coroutine: remove coroutine_fn from qemu_co_queue_run_restart()
  coroutine: mark CoRwLock coroutine_fn
  blockjob: mark coroutine_fn
  block: all bdrv_aio callbacks are coroutine_fn
  block: bdrv_create() and bdrv_debug_event() are coroutine_fn
  vmdk: mark coroutine_fn
  qcow2: mark coroutine_fn
  raw: mark coroutine_fn
  nbd: mark coroutine_fn
  migration: mark coroutine_fn
  backup: mark coroutine_fn
  crypto: mark coroutine_fn
  curl: mark coroutine_fn
  gluster: mark coroutine_fn
  nfs: mark coroutine_fn
  quorum: mark coroutine_fn
  rbd: mark coroutine_fn
  sheepdog: mark coroutine_fn
  ssh: mark coroutine_fn
  null: mark coroutine_fn
  mirror: mark coroutine_fn
  iscsi: mark coroutine_fn
  file-posix: mark coroutine_fn
  9p: mark coroutine_fn
  block: mark coroutine_fn
  block-backend: mark coroutine_fn
  parallels: mark coroutine_fn
  qed: mark coroutine_fn
  vdi: mark coroutine_fn
  vhdx: mark coroutine_fn
  vpc: mark coroutine_fn

 block/nbd-client.h | 10 +-
 block/qcow2.h  |  6 --
 hw/9pfs/9p.h   |  9 ++---
 include/block/block_backup.h   |  4 ++--
 include/block/block_int.h  | 14 +++---
 include/block/blockjob_int.h   |  4 ++--
 include/qemu/coroutine.h   | 39 ++-
 include/qemu/coroutine_int.h   |  2 +-
 include/sysemu/block-backend.h |  4 ++--
 block.c|  2 ++
 block/backup.c |  9 ++---
 block/blkdebug.c   | 15 ++-
 block/blkverify.c  |  3 ++-
 block/block-backend.c  | 38 ++
 block/crypto.c |  3 ++-
 block/curl.c   |  3 ++-
 block/file-posix.c | 15 ++-
 block/gluster.c|  3 ++-
 block/io.c | 25 +
 block/iscsi.c  |  6 --
 block/mirror.c | 15 ++-
 block/nbd-client.c | 24 
 block/nbd.c|  3 ++-
 block/nfs.c|  3 ++-
 block/null.c   |  9 ++---
 block/parallels.c  |  3 ++-
 block/qcow.c   |  4 +++-
 block/qcow2-cluster.c  | 11 +++
 block/qcow2.c  | 15 ++-
 block/qed.c|  3 ++-
 block/quorum.c | 25 -
 block/raw-format.c |  6 --
 block/rbd.c| 15 ++-
 block/sheepdog.c   | 20 ++--
 block/ssh.c|  6 --
 block/throttle-groups.c| 10 --
 block/vdi.c|  3 ++-
 block/vhdx.c   |  3 ++-
 block/vmdk.c   | 12 
 block/vpc.c|  3 ++-
 blockjob.c |  6 --
 migration/migration.c  |  3 ++-
 migration/rdma.c   |  2 ++
 nbd/server.c   |  3 ++-
 tests/test-coroutine.c |  2 +-
 util/coroutine-sigaltstack.c   |  2 ++
 util/coroutine-ucontext.c  |  2 ++
 util/coroutine-win32.c |  2 ++
 util/qemu-coroutine.c  |  2 ++
 49 files changed, 299 insertions(+), 132 deletions(-)

-- 
2.13.1.395.gf7b71de06




Re: [Qemu-devel] [PATCH] vhost: fix a memory leak

2017-07-04 Thread Marc-André Lureau
Hi

On Wed, Jul 5, 2017 at 12:33 AM, Peng Hao  wrote:
> vhost exists a call for g_file_get_contents, but not call g_free.
>
> Signed-off-by: Peng Hao

Reviewed-by: Marc-André Lureau 

mst: is this better?

> ---
>  hw/virtio/vhost-backend.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> index 4e31de1..2c481d6 100644
> --- a/hw/virtio/vhost-backend.c
> +++ b/hw/virtio/vhost-backend.c
> @@ -52,11 +52,13 @@ static int vhost_kernel_memslots_limit(struct vhost_dev 
> *dev)
>  &s, NULL, NULL)) {
>  uint64_t val = g_ascii_strtoull(s, NULL, 10);
>  if (!((val == G_MAXUINT64 || !val) && errno)) {
> +g_free(s);
>  return val;
>  }
>  error_report("ignoring invalid max_mem_regions value in vhost 
> module:"
>   " %s", s);
>  }
> +g_free(s);
>  return limit;
>  }
>
> --
> 1.8.3.1
>
>
>



-- 
Marc-André Lureau



Re: [Qemu-devel] [RFC 25/29] vhu: enable = false on get_vring_base

2017-07-04 Thread Michael S. Tsirkin
On Wed, Jun 28, 2017 at 08:00:43PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> When we receive a GET_VRING_BASE message set enable = false
> to stop any new received packets modifying the ring.
> 
> Signed-off-by: Dr. David Alan Gilbert 

I think I already reviewed a similar patch.
Spec says:
Client must only process each ring when it is started.

IMHO the real fix is to fix client to check the started
flag before processing the ring.

> ---
>  contrib/libvhost-user/libvhost-user.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/contrib/libvhost-user/libvhost-user.c 
> b/contrib/libvhost-user/libvhost-user.c
> index ceddeac74f..d37052b7b0 100644
> --- a/contrib/libvhost-user/libvhost-user.c
> +++ b/contrib/libvhost-user/libvhost-user.c
> @@ -652,6 +652,7 @@ vu_get_vring_base_exec(VuDev *dev, VhostUserMsg *vmsg)
>  vmsg->size = sizeof(vmsg->payload.state);
>  
>  dev->vq[index].started = false;
> +dev->vq[index].enable = false;
>  if (dev->iface->queue_set_started) {
>  dev->iface->queue_set_started(dev, index, false);
>  }
> -- 
> 2.13.0



Re: [Qemu-devel] [PATCH] vhost: fix a memory leak

2017-07-04 Thread Michael S. Tsirkin
On Tue, Jul 04, 2017 at 02:21:08PM +, Marc-André Lureau wrote:
> Hi
> 
> On Tue, Jul 4, 2017 at 4:16 PM Peng Hao  wrote:
> 
> vhost exists a call for g_file_get_contents, but not call g_free.
> 
> Signed-off-by: Peng Hao
> 
> 
>  Reviewed-by: Marc-André Lureau 
> 
> 
> ---
>  hw/virtio/vhost-backend.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> index 4e31de1..2c481d6 100644
> --- a/hw/virtio/vhost-backend.c
> +++ b/hw/virtio/vhost-backend.c
> @@ -52,11 +52,13 @@ static int vhost_kernel_memslots_limit(struct 
> vhost_dev
> *dev)
>                              &s, NULL, NULL)) {
>          uint64_t val = g_ascii_strtoull(s, NULL, 10);
>          if (!((val == G_MAXUINT64 || !val) && errno)) {
> +            g_free(s);
>              return val;
>          }
>          error_report("ignoring invalid max_mem_regions value in vhost
> module:"
>                       " %s", s);
>      }
> +    g_free(s);
>      return limit;
>  }
> 
> --
> 1.8.3.1
> 

Thanks for the review.
I'd like to include your tag in commit log.
For that, could you please repost your ack in text format using
some other mail client?
My scripts don't handle the way your mail client scrambles text.

> 
> 
> --
> Marc-André Lureau



Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/5] spapr: DRC cleanups (part VI)

2017-07-04 Thread Daniel Henrique Barboza
I just tested this patch set on top of current ppc-for-2.10 branch 
(which contains
the patches from part V). It applied cleanly but required a couple of 
trivial

fixes to build probably because it was made on top of an older code base.

The trivial migration test worked fine. The libvirt scenario (attaching 
a device on
target before migration, try to unplug after migration) isn't working as 
expected
but we have a different result with this series. Instead of silently 
failing to unplug

with error messages on dmesg, the hot unplug works on QEMU level:

(qemu) device_del core1
(qemu)
(qemu) info cpus
* CPU #0: nip=0xc00a3e0c thread_id=86162
(qemu) info hotpluggable-cpus
Hotpluggable CPUs:
  type: "host-spapr-cpu-core"
  vcpus_count: "1"
  CPUInstance Properties:
core-id: "3"
  type: "host-spapr-cpu-core"
  vcpus_count: "1"
  CPUInstance Properties:
core-id: "2"
  type: "host-spapr-cpu-core"
  vcpus_count: "1"
  CPUInstance Properties:
core-id: "1"
  type: "host-spapr-cpu-core"
  vcpus_count: "1"
  qom_path: "/machine/unattached/device[0]"
  CPUInstance Properties:
core-id: "0"
(qemu)


However, any operation on the guest afterwards (tried with lscpu and 
dmesg) seems

to hung the guest. This is what I got when trying to do a dmesg after the
hot unplug:

danielhb@ubuntu1704:~$
danielhb@ubuntu1704:~$
danielhb@ubuntu1704:~$ dmesg | tail -n 5
^C

[  362.749693] INFO: task kworker/u8:1:30 blocked for more than 120 seconds.
[  362.749819]   Not tainted 4.10.0-26-generic #30-Ubuntu
[  362.749892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  362.750224] INFO: task kworker/0:3:1887 blocked for more than 120 
seconds.

[  362.750320]   Not tainted 4.10.0-26-generic #30-Ubuntu
[  362.750394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.

[  483.568842] INFO: task kworker/u8:1:30 blocked for more than 120 seconds.
[  483.568997]   Not tainted 4.10.0-26-generic #30-Ubuntu
[  483.569067] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  483.569364] INFO: task kworker/0:3:1887 blocked for more than 120 
seconds.

(...)


I am not sure if it was intended for this scenario to work with this 
patch set already. Figured

it can serve as a FYI for the next series.


Given that hotplug/unplug without migration still works and on the 
assumption that

we'll look more into this libvirt scenario in the next spins, +1.


Tested-by: Daniel Barboza 



On 06/21/2017 06:18 AM, David Gibson wrote:

This sixth set of DRC cleanup patches (to be applied on top of the
patches from part V) is a complete rework of DRC state management.  We
stop tracking some unnecessary things, and change the basic state
representation to a simpler and more robust model.

Most of the patches in this set "break" migration from earlier git
snapshots, but not from any released qemu version.  The previous
migration stream format had multiple problems, so better to fix them
now, before 2.10 is out.

David Gibson (5):
   spapr: Remove 'awaiting_allocation' DRC flag
   spapr: Refactor spapr_drc_detach()
   spapr: Cleanups relating to DRC awaiting_release field
   spapr: Consolidate DRC state variables
   spapr: Remove sPAPRConfigureConnectorState sub-structure

  hw/ppc/spapr.c |   9 +-
  hw/ppc/spapr_drc.c | 321 +
  hw/ppc/spapr_pci.c |  13 +-
  hw/ppc/trace-events|   3 +-
  include/hw/ppc/spapr_drc.h |  53 +---
  5 files changed, 187 insertions(+), 212 deletions(-)






Re: [Qemu-devel] [PATCH] specs: Describe the TPM support in QEMU

2017-07-04 Thread Laszlo Ersek
On 06/29/17 20:00, Stefan Berger wrote:
> This patch adds a description of the current TPM support in QEMU
> to the specs.
> 
> Several public specs are referenced via their landing page on the
> trustedcomputinggroup.org website.
> 
> Signed-off-by: Stefan Berger 
> ---
>  docs/specs/tpm.txt | 98 
> ++
>  1 file changed, 98 insertions(+)
>  create mode 100644 docs/specs/tpm.txt
> 
> diff --git a/docs/specs/tpm.txt b/docs/specs/tpm.txt
> new file mode 100644
> index 000..6472989
> --- /dev/null
> +++ b/docs/specs/tpm.txt
> @@ -0,0 +1,98 @@
> +
> +QEMU TPM Device
> +===
> +
> += Guest-side Hardware Interface =
> +
> +The QEMU TPM emulation implements a TPM TIS hardware interface following
> +the Trusted Computing Group's specification "TCG PC Client Specific TPM
> +Interface Specification (TIS)", Specifcation Version 1.3, 21 March 2013.
> +This specification, or a later version of it, can be accessed from the
> +following URL:
> +
> +https://trustedcomputinggroup.org/pc-client-work-group-pc-client-specific-tpm-interface-specification-tis/
> +
> +The TIS interface makes a memory mapped IO region in the area 0xfed4 -
> +0xfed44fff available to the guest operating system.
> +
> +
> +QEMU files related to TPM TIS interfaceL

... OK, Eric mentioned the typo already

> + - hw/tpm/tpm_tis.c
> + - hw/tpm/tpm_tis.h
> +
> +
> += ACPI Interface =
> +
> +The TPM device is defined with ACPI ID "PNP0C31". QEMU builds a SSDT and 
> passes
> +it into the guest through the fw_cfg device. The device description contains
> +the base address of the TIS interface 0xfed4 and the size of the MMIO 
> area
> +(0x5000). In case a TPM2 is used by QEMU, a TPM2 ACPI table is also provided.
> +The device is described to be used in polling mode rather than interrupt mode
> +primarily because no unused IRQ could be found.
> +
> +To support measurement logs to be written by the firmware, e.g. SeaBIOS, a 
> TCPA
> +table is implemented. This table provides a 64kb buffer where the firmware 
> can
> +write its log into. For TPM 2 only a more recent version of the TPM2 table
> +provides support for measurements logs and a TCPA table does not need to be
> +created.
> +
> +The TCPA and TPM2 ACPI tables follow the Trusted Computing Group 
> specification
> +"TCG ACPI Specification" Family "1.2" and "2.0", Level 00 Revision 00.37. 
> This
> +specification, or a later version of it, can be accessed from the following
> +URL:
> +
> +https://trustedcomputinggroup.org/tcg-acpi-specification/
> +
> +
> +QEMU files related to TPM ACPI tables:
> + - hw/i386/acpi-build.c
> + - include/hw/acpi/tpm.h
> +
> +
> += TPM backend devices =
> +
> +The TPM implementation is split into two parts, frontend and backend. 

git-am complains that this line adds a whitespace error (trailing space,
namely).

> +The frontend part is the hardware interface, such as the TPM TIS interface
> +described earlier, and the other part is the TPM backend interface. The
> +backend interfaces implement the interaction with a TPM device,
> +which may be a physical or an emulated device. The split between the front-
> +and backend devices allows a frontend to be connected with any available
> +backend. This enables the TIS interface to be used with the passthrough
> +backend or the swtpm backend.

In the previous (RFC-like) version, you wrote "(future) swtpm" -- I
think that would be more precise. When QEMU gets swtpm support, we can
update this language (because we should update the filename list(s) as
well). What do you think?

> +
> +
> +QEMU file related to TPM backends:

s/file/files/

> + - backends/tpm.c
> + - include/sysemu/tpm_backend.h
> + - include/sysemu/tpm_backend_int.h

Is this generic backend code? Because for the passthrough backend, you
provide a separate file list below (which is great, but then we should
qualify *this* list as "generic" or some such).

> +
> +
> +== The QEMU TPM passthrough device ==
> +
> +In case QEMU is run on Linux as the host operating system it is possible to
> +make the hardware TPM device available to a single QEMU guest. In this case 
> the
> +user must make sure that no other program is using the device, e.g., 
> /dev/tpm0,
> +before trying to start QEMU with it.
> +
> +The passthrough driver uses the host's TPM device for sending TPM commands
> +and receiving responses from. Besides that it accesses the TPM device's sysfs
> +entry for support of command cancellation. Since none of the state of a 
> hardware
> +TPM can be migrated between hosts, virtual machine migration is disabled when
> +the TPM passthrough driver is used.
> +
> +Since the host's TPM device will already be initialize by the host's 
> firmware,

s/initialize/initialized/

... ah, also pointed out by Eric.

> +certain commands, e.g. TPM_Startup(), sent by the virtual firmware for device
> +initialization, will fail. In this case the firmware should simply not use 
> the
> +TPM.
> +
> +Sharing the d

Re: [Qemu-devel] Managing architectural restrictions with -device and libvirt

2017-07-04 Thread Daniel P. Berrange
On Tue, Jul 04, 2017 at 07:25:41PM +0100, Mark Cave-Ayland wrote:
> Hi all,
> 
> I've been working on a patchset that brings the sun4u machine on
> qemu-system-sparc64 much closer to a real Ultra 5, however due to
> various design restrictions I need to be able to restrict how devices
> are added to the machine with -device.
> 
> On a real Ultra 5, the root PCI bus (sabre) has 2 PCI bridges (simba A
> and simba B) with the onboard devices attached to simba A with 2 free
> slots, and an initially empty simba B.
> 
> Firstly, is it possible to restrict the machine so that devices cannot
> be directly plugged into the root PCI bus, but only behind one of the
> PCI bridges? There is also an additional restriction in that slot 0
> behind simba A must be left empty to ensure that the ebus (containing
> the onboard devices) is the first device allocated.
> 
> Secondly, how does libvirt handle these type of restrictions? Is it able
> to get the information from QEMU or is there some kind of libvirt
> profile that needs to be updated? And do newer versions of libvirt have
> the ability to attach devices behind PCI bridges using a GUI such as
> virt-manager, or is that only something that can only be done by
> directly editing the domain XML?

Libvirt has a bunch of code that provides different logic for various
machine types when doing addressing & setting up default PCI controllers.
Libvirt support for sparc machine types almost certainly broken since
I'm not aware of anyone having looked at it forever. Probably best to
re-ask your Q's on the libvir-list to get info from the people familiar
with this, since I'm fuzzy on precise details of what you'd need todo.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-devel] [RFC 25/29] vhu: enable = false on get_vring_base

2017-07-04 Thread Maxime Coquelin



On 06/28/2017 09:00 PM, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert" 

When we receive a GET_VRING_BASE message set enable = false
to stop any new received packets modifying the ring.

Signed-off-by: Dr. David Alan Gilbert 


Reviewed-by: Maxime Coquelin 

Maxime

---
  contrib/libvhost-user/libvhost-user.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/contrib/libvhost-user/libvhost-user.c 
b/contrib/libvhost-user/libvhost-user.c
index ceddeac74f..d37052b7b0 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -652,6 +652,7 @@ vu_get_vring_base_exec(VuDev *dev, VhostUserMsg *vmsg)
  vmsg->size = sizeof(vmsg->payload.state);
  
  dev->vq[index].started = false;

+dev->vq[index].enable = false;
  if (dev->iface->queue_set_started) {
  dev->iface->queue_set_started(dev, index, false);
  }





Re: [Qemu-devel] [RFC 24/29] vhost+postcopy: Lock around set_mem_table

2017-07-04 Thread Maxime Coquelin



On 06/28/2017 09:00 PM, Dr. David Alan Gilbert (git) wrote:

From: "Dr. David Alan Gilbert"

**HACK - better solution needed **
We have the situation where:

  qemu  bridge

  send set_mem_table
   map memory
   a)  mark area with UFD
   send reply with map addresses
   b)  start using
   c) receive reply

   As soon as (a) happens qemu might start seeing faults
from memory accesses (but doesn't until b); but it can't
process those faults until (c) when it's received the
mmap addresses.

Make the fault handler spin until it gets the reply in (c).

At the very least this needs some proper locks, but preferably
we need to split the message.


Yes, maybe the slave channel could be used to send the ufds with
a dedicated request? The backend would set the reply-ack flag, so that
it starts accessing the guest memory only when Qemu is ready to handle
faults.

Note that the slave channel support has not been implemented in Qemu's
libvhost-user yet, but this is something I can do if we feel the need.

Maxime



Re: [Qemu-devel] [PATCH v11 04/29] target: [tcg] Add generic translation framework

2017-07-04 Thread Peter Maydell
On 4 July 2017 at 19:59, Lluís Vilanova  wrote:
> Richard Henderson writes:
>
>> On 06/28/2017 05:32 AM, Lluís Vilanova wrote:
>>> +void (*init_disas_context)(DisasContextBase *db, CPUState *cpu);
>>> +void (*init_globals)(DisasContextBase *db, CPUState *cpu);
>>> +void (*tb_start)(DisasContextBase *db, CPUState *cpu);
>>> +void (*insn_start)(DisasContextBase *db, CPUState *cpu);
>>> +BreakpointCheckType (*breakpoint_check)(DisasContextBase *db, CPUState 
>>> *cpu,
>>> +const CPUBreakpoint *bp);
>>> +target_ulong (*translate_insn)(DisasContextBase *db, CPUState *cpu);
>>> +void (*tb_stop)(DisasContextBase *db, CPUState *cpu);
>>> +void (*disas_log)(const DisasContextBase *db, CPUState *cpu);
>
>> Any reason not to stuff the cpu pointer into the DisasContextBase instead of
>> passing it around separately?
>
> None, really. I'll move it from DisasContext (in targets where it's present)
> into DisasContextBase, and use that one everywhere.

I kind of like not having CPUState* in DisasContext, because
it enforces the rule that you can't read from fields of
it inside the target translate.c code without jumping through
a hoop (ie copying the info from CPUState->foo to
DisasContext->foo). That then acts as a useful flag in code
review (or when writing the code) to confirm that foo really
is constant for the life of the simulation (or to recommend
using a TB flag instead).

thanks
-- PMM



[Qemu-devel] [Bug 1545052] Re: RDMA migration will hang forever if target QEMU fails to load vmstate

2017-07-04 Thread Dr. David Alan Gilbert
Fix series posted upstream:
0001-migration-rdma-Fix-race-on-source.patch
0002-migration-Close-file-on-failed-migration-load.patch
0003-migration-rdma-Allow-cancelling-while-waiting-for-wr.patch
0004-migration-rdma-Safely-convert-control-types.patch
0005-migration-rdma-Send-error-during-cancelling.patch


** Changed in: qemu
   Status: Confirmed => In Progress

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1545052

Title:
  RDMA migration will hang forever if target QEMU fails to load vmstate

Status in QEMU:
  In Progress

Bug description:
  Get a pair of machines with infiniband support. On one host run

  $ qemu-system-x86_64 -monitor stdio -incoming rdma:ibme: -vnc :1
  -m 1000

  To start an incoming migration.

  
  Now on the other host, run QEMU with an intentionally different configuration 
(ie different RAM size)

  $ qemu-system-x86_64 -monitor stdio -vnc :1 -m 2000

  Now trigger a migration on this source host

  (qemu) migrate rdma:ibpair:

  
  You will see on the target host, that it failed to load migration:

  dest_init RDMA Device opened: kernel name mlx4_0 uverbs device name uverbs0, 
infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, 
infiniband class device path /sys/class/infiniband/mlx4_0, transport: (2) 
Ethernet
  qemu-system-x86_64: Length mismatch: pc.ram: 0x7d00 in != 0x3e80: 
Invalid argument
  qemu-system-x86_64: error while loading state for instance 0x0 of device 'ram'

  This is to be expected, however, at this point QEMU has hung and no
  longer responds to the monitor

  GDB shows the target host is stuck in this callpath

  #0  0x739141cd in write () at ../sysdeps/unix/syscall-template.S:81
  #1  0x727fe795 in rdma_get_cm_event.part.15 () from 
/lib64/librdmacm.so.1
  #2  0x5593e445 in qemu_rdma_cleanup (rdma=0x7fff9647e010) at 
migration/rdma.c:2210
  #3  0x5593ea45 in qemu_rdma_close (opaque=0x57796770) at 
migration/rdma.c:2652
  #4  0x559397cc in qemu_fclose (f=f@entry=0x564b1450) at 
migration/qemu-file.c:270
  #5  0x55936b88 in process_incoming_migration_co 
(opaque=0x564b1450) at migration/migration.c:361
  #6  0x55a25a1a in coroutine_trampoline (i0=, 
i1=) at util/coroutine-ucontext.c:79
  #7  0x7fffef5b3110 in ?? () from /lib64/libc.so.6


  Now, back on the source host again, you would expect to see that the
  migrate command failed. Instead, this QEMU is hung too.

  GDB shows the source host, migrate thread, is stuck in this callpath:

  #0  0x7391522d in read#1  0x700efd93 in ibv_get_cq_event () 
at /lib64/libibverbs.so.1
  #2  0x559403f2 in qemu_rdma_block_for_wrid 
(rdma=rdma@entry=0x7fff3d07e010, wrid_requested=wrid_requested@entry=4000, 
byte_len=byte_len@entry=0x7fff39de370c) at migration/rdma.c:1511
  #3  0x5594058a in qemu_rdma_exchange_get_response 
(rdma=0x7fff3d07e010, head=head@entry=0x7fff39de3780, 
expecting=expecting@entry=2, idx=idx@entry=0)
  at migration/rdma.c:1648
  #4  0x55941e71 in qemu_rdma_exchange_send (rdma=0x7fff3d07e010, 
head=0x7fff39de3840, data=0x0, resp=0x7fff39de3870, resp_idx=0x7fff39de3880, 
callback=0x0) at migration/rdma.c:1725
  #5  0x559447e4 in qemu_rdma_registration_stop (f=, 
opaque=, flags=0, data=) at migration/rdma.c:3302
  #6  0x5593bc4b in ram_control_after_iterate 
(f=f@entry=0x564c20f0, flags=flags@entry=0) at migration/qemu-file.c:157
  #7  0x55740b59 in ram_save_setup (f=0x564c20f0, opaque=) at /home/berrange/src/virt/qemu/migration/ram.c:1959
  #8  0x557451c1 in qemu_savevm_state_begin (f=0x564c20f0, 
params=params@entry=0x55f6f048 )
  at /home/berrange/src/virt/qemu/migration/savevm.c:919
  #9  0x559381a5 in migration_thread (opaque=0x55f6f000 
) at migration/migration.c:1633
  #10 0x7390edc5 in start_thread (arg=0x7fff39de4700) at 
pthread_create.c:308

  
  It should have aborted migrate and set the status to failed.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1545052/+subscriptions



Re: [Qemu-devel] [PATCH v11 04/29] target: [tcg] Add generic translation framework

2017-07-04 Thread Lluís Vilanova
Richard Henderson writes:

> On 06/28/2017 05:32 AM, Lluís Vilanova wrote:
>> +void (*init_disas_context)(DisasContextBase *db, CPUState *cpu);
>> +void (*init_globals)(DisasContextBase *db, CPUState *cpu);
>> +void (*tb_start)(DisasContextBase *db, CPUState *cpu);
>> +void (*insn_start)(DisasContextBase *db, CPUState *cpu);
>> +BreakpointCheckType (*breakpoint_check)(DisasContextBase *db, CPUState 
>> *cpu,
>> +const CPUBreakpoint *bp);
>> +target_ulong (*translate_insn)(DisasContextBase *db, CPUState *cpu);
>> +void (*tb_stop)(DisasContextBase *db, CPUState *cpu);
>> +void (*disas_log)(const DisasContextBase *db, CPUState *cpu);

> Any reason not to stuff the cpu pointer into the DisasContextBase instead of
> passing it around separately?

None, really. I'll move it from DisasContext (in targets where it's present)
into DisasContextBase, and use that one everywhere.


> Otherwise,

> Reviewed-by: Richard Henderson 


Thanks,
  Lluis



[Qemu-devel] [PATCH 3/5] migration/rdma: Allow cancelling while waiting for wrid

2017-07-04 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

When waiting for a WRID, if the other side dies we end up waiting
for ever with no way to cancel the migration.
Cure this by poll()ing the fd first with a timeout and checking
error flags and migration state.

Signed-off-by: Dr. David Alan Gilbert 
---
 migration/rdma.c | 54 --
 1 file changed, 48 insertions(+), 6 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 6111e10c70..7273ae9929 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -1466,6 +1466,52 @@ static uint64_t qemu_rdma_poll(RDMAContext *rdma, 
uint64_t *wr_id_out,
 return  0;
 }
 
+/* Wait for activity on the completion channel.
+ * Returns 0 on success, none-0 on error.
+ */
+static int qemu_rdma_wait_comp_channel(RDMAContext *rdma)
+{
+/*
+ * Coroutine doesn't start until migration_fd_process_incoming()
+ * so don't yield unless we know we're running inside of a coroutine.
+ */
+if (rdma->migration_started_on_destination) {
+yield_until_fd_readable(rdma->comp_channel->fd);
+} else {
+/* This is the source side, we're in a separate thread
+ * or destination prior to migration_fd_process_incoming()
+ * we can't yield; so we have to poll the fd.
+ * But we need to be able to handle 'cancel' or an error
+ * without hanging forever.
+ */
+while (!rdma->error_state && !rdma->error_reported &&
+   !rdma->received_error) {
+GPollFD pfds[1];
+pfds[0].fd = rdma->comp_channel->fd;
+pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
+/* 0.5s timeout, should be fine for a 'cancel' */
+switch (qemu_poll_ns(pfds, 1, 500 * 1000 * 1000)) {
+case 1: /* fd active */
+return 0;
+
+case 0: /* Timeout, go around again */
+break;
+
+default: /* Error of some type */
+return -1;
+}
+
+if (migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) {
+/* Bail out and let the cancellation happen */
+return -EPIPE;
+}
+}
+}
+
+return rdma->error_state || rdma->error_reported ||
+   rdma->received_error;
+}
+
 /*
  * Block until the next work request has completed.
  *
@@ -1513,12 +1559,8 @@ static int qemu_rdma_block_for_wrid(RDMAContext *rdma, 
int wrid_requested,
 }
 
 while (1) {
-/*
- * Coroutine doesn't start until migration_fd_process_incoming()
- * so don't yield unless we know we're running inside of a coroutine.
- */
-if (rdma->migration_started_on_destination) {
-yield_until_fd_readable(rdma->comp_channel->fd);
+if (qemu_rdma_wait_comp_channel(rdma)) {
+goto err_block_for_wrid;
 }
 
 if (ibv_get_cq_event(rdma->comp_channel, &cq, &cq_ctx)) {
-- 
2.13.0




[Qemu-devel] [PATCH 5/5] migration/rdma: Send error during cancelling

2017-07-04 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

When we issue a cancel and clean up the RDMA channel
send a CONTROL_ERROR to get the destination to quit.

The rdma_cleanup code waits for the event to come back
from the rdma_disconnect; but that wont happen until the
destination quits and there's currently nothing to force
it.

Note this makes the case of a cancel work while the destination
is alive, and it already works if the destination is
truly dead.  Note it doesn't fix the case where the destination
is hung (we get stuck waiting for the rdma_disconnect event).

Signed-off-by: Dr. David Alan Gilbert 
---
 migration/rdma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index bfb0a43740..3d17db3a23 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2260,7 +2260,9 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
 int ret, idx;
 
 if (rdma->cm_id && rdma->connected) {
-if (rdma->error_state && !rdma->received_error) {
+if ((rdma->error_state ||
+ migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) &&
+!rdma->received_error) {
 RDMAControlHeader head = { .len = 0,
.type = RDMA_CONTROL_ERROR,
.repeat = 1,
-- 
2.13.0




[Qemu-devel] [PATCH 0/5] A bunch of RDMA fixes

2017-07-04 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 


Hi,
  This is a bunch of RDMA fixes, the first is a race
I spotted a while ago that you don't hit during normal
operation; the rest are to do with migration failure and
cancellation that I started looking at because of lp1545052 which
is a failure to recover on the source if the destination
fails.

  I'm pretty sure there are other cases where the source
might hang waiting for a failed destination; particularly
if the destination hangs rather than fails completely;
one I know of is waiting for the event after the rdma_disconnect
but I don't have a good fix for it.  Suggestions welcome.

(I sent the 'Fix race on source' patch yesterday, but this
set supersedes it).

Dave

Dr. David Alan Gilbert (5):
  migration/rdma: Fix race on source
  migration: Close file on failed migration load
  migration/rdma: Allow cancelling while waiting for wrid
  migration/rdma: Safely convert control types
  migration/rdma: Send error during cancelling

 migration/migration.c |   1 +
 migration/rdma.c  | 124 --
 2 files changed, 90 insertions(+), 35 deletions(-)

-- 
2.13.0




[Qemu-devel] [PATCH 4/5] migration/rdma: Safely convert control types

2017-07-04 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

control_desc[] is an array of strings that correspond to a
series of message types; they're used only for error messages, but if
the message type is seriously broken then we could go off the end of
the array.

Convert the array to a function control_desc() that bound checks.

Signed-off-by: Dr. David Alan Gilbert 
---
 migration/rdma.c | 54 --
 1 file changed, 32 insertions(+), 22 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 7273ae9929..bfb0a43740 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -165,20 +165,6 @@ enum {
 RDMA_CONTROL_UNREGISTER_FINISHED, /* unpinning finished */
 };
 
-static const char *control_desc[] = {
-[RDMA_CONTROL_NONE] = "NONE",
-[RDMA_CONTROL_ERROR] = "ERROR",
-[RDMA_CONTROL_READY] = "READY",
-[RDMA_CONTROL_QEMU_FILE] = "QEMU FILE",
-[RDMA_CONTROL_RAM_BLOCKS_REQUEST] = "RAM BLOCKS REQUEST",
-[RDMA_CONTROL_RAM_BLOCKS_RESULT] = "RAM BLOCKS RESULT",
-[RDMA_CONTROL_COMPRESS] = "COMPRESS",
-[RDMA_CONTROL_REGISTER_REQUEST] = "REGISTER REQUEST",
-[RDMA_CONTROL_REGISTER_RESULT] = "REGISTER RESULT",
-[RDMA_CONTROL_REGISTER_FINISHED] = "REGISTER FINISHED",
-[RDMA_CONTROL_UNREGISTER_REQUEST] = "UNREGISTER REQUEST",
-[RDMA_CONTROL_UNREGISTER_FINISHED] = "UNREGISTER FINISHED",
-};
 
 /*
  * Memory and MR structures used to represent an IB Send/Recv work request.
@@ -251,6 +237,30 @@ typedef struct QEMU_PACKED RDMADestBlock {
 uint32_t padding;
 } RDMADestBlock;
 
+static const char *control_desc(unsigned int rdma_control)
+{
+static const char *strs[] = {
+[RDMA_CONTROL_NONE] = "NONE",
+[RDMA_CONTROL_ERROR] = "ERROR",
+[RDMA_CONTROL_READY] = "READY",
+[RDMA_CONTROL_QEMU_FILE] = "QEMU FILE",
+[RDMA_CONTROL_RAM_BLOCKS_REQUEST] = "RAM BLOCKS REQUEST",
+[RDMA_CONTROL_RAM_BLOCKS_RESULT] = "RAM BLOCKS RESULT",
+[RDMA_CONTROL_COMPRESS] = "COMPRESS",
+[RDMA_CONTROL_REGISTER_REQUEST] = "REGISTER REQUEST",
+[RDMA_CONTROL_REGISTER_RESULT] = "REGISTER RESULT",
+[RDMA_CONTROL_REGISTER_FINISHED] = "REGISTER FINISHED",
+[RDMA_CONTROL_UNREGISTER_REQUEST] = "UNREGISTER REQUEST",
+[RDMA_CONTROL_UNREGISTER_FINISHED] = "UNREGISTER FINISHED",
+};
+
+if (rdma_control > RDMA_CONTROL_UNREGISTER_FINISHED) {
+return "??BAD CONTROL VALUE??";
+}
+
+return strs[rdma_control];
+}
+
 static uint64_t htonll(uint64_t v)
 {
 union { uint32_t lv[2]; uint64_t llv; } u;
@@ -1632,7 +1642,7 @@ static int qemu_rdma_post_send_control(RDMAContext *rdma, 
uint8_t *buf,
.num_sge = 1,
 };
 
-trace_qemu_rdma_post_send_control(control_desc[head->type]);
+trace_qemu_rdma_post_send_control(control_desc(head->type));
 
 /*
  * We don't actually need to do a memcpy() in here if we used
@@ -1711,16 +1721,16 @@ static int qemu_rdma_exchange_get_response(RDMAContext 
*rdma,
 network_to_control((void *) rdma->wr_data[idx].control);
 memcpy(head, rdma->wr_data[idx].control, sizeof(RDMAControlHeader));
 
-trace_qemu_rdma_exchange_get_response_start(control_desc[expecting]);
+trace_qemu_rdma_exchange_get_response_start(control_desc(expecting));
 
 if (expecting == RDMA_CONTROL_NONE) {
-trace_qemu_rdma_exchange_get_response_none(control_desc[head->type],
+trace_qemu_rdma_exchange_get_response_none(control_desc(head->type),
  head->type);
 } else if (head->type != expecting || head->type == RDMA_CONTROL_ERROR) {
 error_report("Was expecting a %s (%d) control message"
 ", but got: %s (%d), length: %d",
-control_desc[expecting], expecting,
-control_desc[head->type], head->type, head->len);
+control_desc(expecting), expecting,
+control_desc(head->type), head->type, head->len);
 if (head->type == RDMA_CONTROL_ERROR) {
 rdma->received_error = true;
 }
@@ -1830,7 +1840,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, 
RDMAControlHeader *head,
 }
 }
 
-trace_qemu_rdma_exchange_send_waiting(control_desc[resp->type]);
+trace_qemu_rdma_exchange_send_waiting(control_desc(resp->type));
 ret = qemu_rdma_exchange_get_response(rdma, resp,
   resp->type, RDMA_WRID_DATA);
 
@@ -1842,7 +1852,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, 
RDMAControlHeader *head,
 if (resp_idx) {
 *resp_idx = RDMA_WRID_DATA;
 }
-trace_qemu_rdma_exchange_send_received(control_desc[resp->type]);
+trace_qemu_rdma_exchange_send_received(control_desc(resp->type));
 }
 
 rdma->control_ready_expected = 1;
@@ -3392,7 +3402,7 @@ static int qemu_rdma_registration_h

[Qemu-devel] [PATCH 2/5] migration: Close file on failed migration load

2017-07-04 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

Closing the file before exit on a failure allows
the source to cleanup better, especially with RDMA.

Partial fix for https://bugs.launchpad.net/qemu/+bug/1545052

Signed-off-by: Dr. David Alan Gilbert 
---
 migration/migration.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/migration/migration.c b/migration/migration.c
index 51ccd1a4c5..21d6902a29 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -355,6 +355,7 @@ static void process_incoming_migration_co(void *opaque)
   MIGRATION_STATUS_FAILED);
 error_report("load of migration failed: %s", strerror(-ret));
 migrate_decompress_threads_join();
+qemu_fclose(mis->from_src_file);
 exit(EXIT_FAILURE);
 }
 
-- 
2.13.0




[Qemu-devel] [PATCH 1/5] migration/rdma: Fix race on source

2017-07-04 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

Fix a race where the destination might try and send the source a
WRID_READY before the source has done a post-recv for it.

rdma_post_recv has to happen after the qp exists, and we're
OK since we've already called qemu_rdma_source_init that calls
qemu_alloc_qp.

This corresponds to:
https://bugzilla.redhat.com/show_bug.cgi?id=1285044

The race can be triggered by adding a few ms wait before this
post_recv_control (which was originally due to me turning on loads of
debug).

Signed-off-by: Dr. David Alan Gilbert 
---
 migration/rdma.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index c6bc607a03..6111e10c70 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2365,6 +2365,12 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error 
**errp)
 
 caps_to_network(&cap);
 
+ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
+if (ret) {
+ERROR(errp, "posting second control recv");
+goto err_rdma_source_connect;
+}
+
 ret = rdma_connect(rdma->cm_id, &conn_param);
 if (ret) {
 perror("rdma_connect");
@@ -2405,12 +2411,6 @@ static int qemu_rdma_connect(RDMAContext *rdma, Error 
**errp)
 
 rdma_ack_cm_event(cm_event);
 
-ret = qemu_rdma_post_recv_control(rdma, RDMA_WRID_READY);
-if (ret) {
-ERROR(errp, "posting second control recv!");
-goto err_rdma_source_connect;
-}
-
 rdma->control_ready_expected = 1;
 rdma->nb_sent = 0;
 return 0;
-- 
2.13.0




Re: [Qemu-devel] [PATCH 20/22] target/i386: add the tcg_enabled() in target/i386/

2017-07-04 Thread Richard Henderson

On 07/04/2017 01:12 AM, Paolo Bonzini wrote:

From: Yang Zhong

Add the tcg_enabled() where the x86 target needs to disable
TCG-specific code.

Signed-off-by: Yang Zhong
Signed-off-by: Paolo Bonzini
---
v2: do not touch bpt_helper.c, adjust caller in machine.c [Richard]


Reviewed-by: Richard Henderson 


r~




Re: [Qemu-devel] [PATCH 18/22] target/i386: split cpu_set_mxcsr() and make cpu_set_fpuc() inline

2017-07-04 Thread Richard Henderson

On 07/04/2017 01:12 AM, Paolo Bonzini wrote:

From: Yang Zhong

Split the cpu_set_mxcsr() and make cpu_set_fpuc() inline with specific
tcg code.

Signed-off-by: Yang Zhong
Signed-off-by: Paolo Bonzini
---
v2: renamed tcg_update_mxcsr [Richard],
added missing call to cpu_post_load


Reviewed-by: Richard Henderson 


r~




Re: [Qemu-devel] [PATCH 16/22] target/i386: move cpu_sync_bndcs_hflags() function

2017-07-04 Thread Richard Henderson

On 07/04/2017 01:12 AM, Paolo Bonzini wrote:

From: Yang Zhong

Move cpu_sync_bndcs_hflags() function from mpx_helper.c
to helper.c because mpx_helper.c need be disabled when
tcg is disabled.

Signed-off-by: Yang Zhong
Signed-off-by: Paolo Bonzini
---
v2: moved cpu_report_tpr_access hunk later [Richard]

  target/i386/helper.c | 30 ++
  target/i386/mpx_helper.c | 30 --
  2 files changed, 30 insertions(+), 30 deletions(-)


Reviewed-by: Richard Henderson 


r~




Re: [Qemu-devel] [PATCH 14/22] tcg: add CONFIG_TCG guards in headers

2017-07-04 Thread Richard Henderson

On 07/04/2017 01:12 AM, Paolo Bonzini wrote:

From: Yang Zhong

Add the CONFIG_TCG for exec-all.h. Since function tlb_set_page_with_attrs()
is defined in ./accel/tcg/cputlb.c, which will be disabled if tcg is disabled.
This function need be implemented in accel/stubs/tcg-stub.c for disable-tcg.

Signed-off-by: Yang Zhong
Signed-off-by: Paolo Bonzini
---
v2: do not touch include/exec/helper-proto.h [Richard]

  include/exec/cpu-defs.h |  4 +++-
  include/exec/cputlb.h   |  2 +-
  include/exec/exec-all.h | 53 ++---
  3 files changed, 32 insertions(+), 27 deletions(-)


Reviewed-by: Richard Henderson 


r~



[Qemu-devel] Managing architectural restrictions with -device and libvirt

2017-07-04 Thread Mark Cave-Ayland
Hi all,

I've been working on a patchset that brings the sun4u machine on
qemu-system-sparc64 much closer to a real Ultra 5, however due to
various design restrictions I need to be able to restrict how devices
are added to the machine with -device.

On a real Ultra 5, the root PCI bus (sabre) has 2 PCI bridges (simba A
and simba B) with the onboard devices attached to simba A with 2 free
slots, and an initially empty simba B.

Firstly, is it possible to restrict the machine so that devices cannot
be directly plugged into the root PCI bus, but only behind one of the
PCI bridges? There is also an additional restriction in that slot 0
behind simba A must be left empty to ensure that the ebus (containing
the onboard devices) is the first device allocated.

Secondly, how does libvirt handle these type of restrictions? Is it able
to get the information from QEMU or is there some kind of libvirt
profile that needs to be updated? And do newer versions of libvirt have
the ability to attach devices behind PCI bridges using a GUI such as
virt-manager, or is that only something that can only be done by
directly editing the domain XML?


ATB,

Mark.




Re: [Qemu-devel] [PATCHv7 3/6] fw_cfg: switch fw_cfg_find() to locate the fw_cfg device by type rather than path

2017-07-04 Thread Mark Cave-Ayland
On 03/07/17 10:42, Igor Mammedov wrote:
> On Thu, 29 Jun 2017 15:07:17 +0100
> Mark Cave-Ayland  wrote:
> 
>> This will enable the fw_cfg device to be placed anywhere within the QOM tree
>> regardless of its machine location.
>>
>> Signed-off-by: Mark Cave-Ayland 
>> ---
>>  hw/nvram/fw_cfg.c |2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
>> index 99bdbc2..0fe7404 100644
>> --- a/hw/nvram/fw_cfg.c
>> +++ b/hw/nvram/fw_cfg.c
>> @@ -1017,7 +1017,7 @@ FWCfgState *fw_cfg_init_mem(hwaddr ctl_addr, hwaddr 
>> data_addr)
>>  
>>  FWCfgState *fw_cfg_find(void)
>>  {
>> -return FW_CFG(object_resolve_path(FW_CFG_PATH, NULL));
>> +return FW_CFG(object_resolve_path_type("", TYPE_FW_CFG, NULL));
> I insist on using ambiguous argument
> 
> see why it's needed
>  https://www.mail-archive.com/qemu-devel@nongnu.org/msg460692.html

Yes I did take into account your previous comment about the ambiguous
argument, but with the fw_cfg_unattached_at_realize() check added in the
v7 patchset, it is actually impossible to realize more than one fw_cfg
device, and at realize time that device must have been linked as a child
device to a parent device.

Hence it should be impossible for the above situation where more than
one fw_cfg device can exist in the QOM tree to occur. Have I missed
something in the logic which would prevent this from happening?


ATB,

Mark.




Re: [Qemu-devel] [PATCHv7 5/6] fw_cfg: move qdev_init_nofail() from fw_cfg_init1() to callers

2017-07-04 Thread Mark Cave-Ayland
On 03/07/17 10:39, Igor Mammedov wrote:

> On Thu, 29 Jun 2017 15:07:19 +0100
> Mark Cave-Ayland  wrote:
> 
>> When looking to instantiate a TYPE_FW_CFG_MEM or TYPE_FW_CFG_IO device to be
>> able to wire it up differently, it is much more convenient for the caller to
>> instantiate the device and have the fw_cfg default files already preloaded
>> during realize.
>>
>> Move fw_cfg_init1() to the end of both the fw_cfg_mem_realize() and
>> fw_cfg_io_realize() functions so it no longer needs to be called manually
>> when instantiating the device, and also rename it to fw_cfg_common_realize()
>> which better describes its new purpose.
>>
>> Since it is now the responsibility of the machine to wire up the fw_cfg 
>> device
>> it is necessary to introduce a object_property_add_child() call into
>> fw_cfg_init_io() and fw_cfg_init_mem() to link the fw_cfg device to the root
>> machine object as before.
>>
>> Finally we can now convert the asserts() preventing multiple fw_cfg devices
>> and unparented fw_cfg devices being instantiated and replace them with proper
>> error reporting at realize time. This allows us to remove FW_CFG_NAME and
>> FW_CFG_PATH since they are no longer required.
>>
>> Signed-off-by: Mark Cave-Ayland 
>> ---
>>  hw/nvram/fw_cfg.c |   41 +
>>  1 file changed, 29 insertions(+), 12 deletions(-)
>>
>> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
>> index 2291121..31029ac 100644
>> --- a/hw/nvram/fw_cfg.c
>> +++ b/hw/nvram/fw_cfg.c
>> @@ -37,9 +37,6 @@
>>  
>>  #define FW_CFG_FILE_SLOTS_DFLT 0x20
>>  
>> -#define FW_CFG_NAME "fw_cfg"
>> -#define FW_CFG_PATH "/machine/" FW_CFG_NAME
>> -
>>  #define TYPE_FW_CFG "fw_cfg"
>>  #define TYPE_FW_CFG_IO  "fw_cfg_io"
>>  #define TYPE_FW_CFG_MEM "fw_cfg_mem"
>> @@ -920,19 +917,22 @@ static int fw_cfg_unattached_at_realize(void)
>>  }
>>  
>>  
>> -static void fw_cfg_init1(DeviceState *dev)
>> +static void fw_cfg_common_realize(DeviceState *dev, Error **errp)
>>  {
>>  FWCfgState *s = FW_CFG(dev);
>>  MachineState *machine = MACHINE(qdev_get_machine());
>>  uint32_t version = FW_CFG_VERSION;
>>  
>> -assert(!object_resolve_path(FW_CFG_PATH, NULL));
>> -
>> -object_property_add_child(OBJECT(machine), FW_CFG_NAME, OBJECT(s), 
>> NULL);
>> -
>> -qdev_init_nofail(dev);
>> +if (!fw_cfg_find()) {
> maybe add comment that here, that fw_cfg_find() will return NULL if more than
> 1 device is found.

This should be the same behaviour as before, i.e. a NULL means that the
fw_cfg device couldn't be found. It seems intuitive to me from the name
of the function exactly what it does, so I don't find the lack of
comment too confusing. Does anyone else have any thoughts here?

> 
>> +error_setg(errp, "at most one %s device is permitted", TYPE_FW_CFG);
> s/TYPE_FW_CFG/object_get_typename()/
> so it would print leaf type name 

Previously the code would add the device to the machine at FW_CFG_PATH
which was fixed at /machine/fw_cfg regardless of whether it was
fw_cfg_io or fw_cfg_mem type (see the top of fw_cfg.c).

This was a deliberate attempt to preserve the existing behaviour and if
we were to give the leaf type name I think this would be misleading,
since it implies you could have one fw_cfg_io and one fw_cfg_mem which
isn't correct.

> 
>> +return;
>> +}
>>  
>> -assert(!fw_cfg_unattached_at_realize());
>> +if (fw_cfg_unattached_at_realize()) {
> as I pointed out in v6, this condition will always be false,
> I suggest to drop 4/6 patch and this hunk here so it won't to confuse
> readers with assumption that condition might succeed.

Definitely look more closely at the fw_cfg_unattached_at_realize()
implementation in patch 4. You'll see that the check to determine if the
device is attached does not check obj->parent but checks to see if the
device exists under /machine/unattached which is what the
device_set_realised() does if the device doesn't have a parent.

I can confirm that I've given this a fairly good test with parented and
non-parented objects and it seems to work well here. Does it not work
for you in testing?

> 
>> +error_setg(errp, "%s device must be added as a child device before "
>> +   "realize", TYPE_FW_CFG);
>> +return;
>> +}
>>  
>>  fw_cfg_add_bytes(s, FW_CFG_SIGNATURE, (char *)"QEMU", 4);
>>  fw_cfg_add_bytes(s, FW_CFG_UUID, &qemu_uuid, 16);
>> @@ -965,7 +965,9 @@ FWCfgState *fw_cfg_init_io_dma(uint32_t iobase, uint32_t 
>> dma_iobase,
>>  qdev_prop_set_bit(dev, "dma_enabled", false);
>>  }
>>  
>> -fw_cfg_init1(dev);
>> +object_property_add_child(OBJECT(qdev_get_machine()), TYPE_FW_CFG,
>> +  OBJECT(dev), NULL);
>> +qdev_init_nofail(dev);
>>  
>>  sbd = SYS_BUS_DEVICE(dev);
>>  ios = FW_CFG_IO(dev);
>> @@ -1003,7 +1005,9 @@ FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr,
>>  qdev_prop_set_bit(dev, "dma_enabled", false);
>>  }
>

Re: [Qemu-devel] [PATCH 3/7] s390x: fix error propagation in kvm-flic's realize

2017-07-04 Thread Christian Borntraeger
On 07/04/2017 06:59 PM, Cornelia Huck wrote:
> On Tue, 4 Jul 2017 17:08:52 +0200
> Halil Pasic  wrote:
> 
>  cd.type = KVM_DEV_TYPE_FLIC;
>  ret = kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd);
>  if (ret < 0) {
> -trace_flic_create_device(errno);
> -return;
> +error_setg_errno(&errp_local, errno, "Creating the KVM device 
> failed");
> +trace_flic_no_device_api(errno);  

 Err... this should still be trace_flic_create_device(), no?  
>>>
>>> I'm afraid you are right! Probably a copy paste error :/
>>>   
>>
>> Do you think the traces are still appropriate once we have
>> proper error propagation?
> 
> They add less value, but I'd just keep them.
> 
>>
>> I did not feel comfortable removing them but thinking again,
>> than might be the thing to do.
>>
>> @Christian:
>> I think we should really fix this the one way or the other.
>> Can you tell me what is the proper procedure?
> 
> I'd vote for just fixing the trace event. Smaller change, and we can
> revisit this later.
> 

+1




[Qemu-devel] [PATCH 3/3] hw: Use new memory_region_allocate_aux_memory() function

2017-07-04 Thread Peter Maydell
Use the new utility function memory_region_allocate_aux_memory()
instead of manually calling memory_region_init_ram() and then
vmstate_register_ram_global().

Patch automatically created using the included coccinelle script:
 spatch --in-place -sp_file scripts/coccinelle/allocate_aux_mem.cocci -dir hw

Signed-off-by: Peter Maydell 
---
 hw/arm/exynos4210.c   | 10 --
 hw/arm/exynos4_boards.c   | 12 +---
 hw/arm/integratorcp.c |  5 ++---
 hw/arm/mainstone.c|  5 ++---
 hw/arm/musicpal.c |  5 ++---
 hw/arm/omap1.c|  5 ++---
 hw/arm/omap2.c|  5 ++---
 hw/arm/omap_sx1.c | 10 --
 hw/arm/palm.c |  4 +---
 hw/arm/pxa2xx.c   | 20 
 hw/arm/realview.c | 14 +-
 hw/arm/spitz.c|  3 +--
 hw/arm/stellaris.c|  9 +++--
 hw/arm/stm32f205_soc.c| 10 +++---
 hw/arm/tosa.c |  3 +--
 hw/arm/vexpress.c | 12 +++-
 hw/arm/virt.c |  3 +--
 hw/arm/xilinx_zynq.c  |  5 ++---
 hw/arm/xlnx-zynqmp.c  |  5 ++---
 hw/block/onenand.c|  5 ++---
 hw/cris/axis_dev88.c  |  5 ++---
 hw/display/cg3.c  |  5 ++---
 hw/display/sm501.c|  5 ++---
 hw/display/tc6393xb.c |  5 ++---
 hw/display/tcx.c  |  5 ++---
 hw/display/vmware_vga.c   |  5 ++---
 hw/i386/pc.c  |  5 ++---
 hw/i386/pc_sysfw.c|  8 +++-
 hw/i386/xen/xen-hvm.c |  4 +---
 hw/input/milkymist-softusb.c  | 10 --
 hw/m68k/an5206.c  |  3 +--
 hw/m68k/mcf5208.c |  3 +--
 hw/microblaze/petalogix_ml605_mmu.c   | 11 +--
 hw/microblaze/petalogix_s3adsp1800_mmu.c  | 12 +---
 hw/mips/mips_fulong2e.c   |  4 +---
 hw/mips/mips_jazz.c   | 10 --
 hw/mips/mips_mipssim.c|  5 ++---
 hw/mips/mips_r4k.c|  5 ++---
 hw/moxie/moxiesim.c   |  6 ++
 hw/net/milkymist-minimac2.c   |  6 +++---
 hw/openrisc/openrisc_sim.c|  3 +--
 hw/pci-host/prep.c|  5 +
 hw/ppc/mac_newworld.c |  5 ++---
 hw/ppc/mac_oldworld.c |  5 ++---
 hw/ppc/ppc405_boards.c| 14 +-
 hw/ppc/ppc405_uc.c|  5 ++---
 hw/s390x/sclp.c   |  5 ++---
 hw/sh4/r2d.c  |  3 +--
 hw/sh4/shix.c | 13 +
 hw/sparc/leon3.c  |  3 +--
 hw/sparc/sun4m.c  | 13 +
 hw/sparc64/sun4u.c| 10 --
 hw/tricore/tricore_testboard.c| 30 --
 hw/unicore32/puv3.c   |  4 +---
 hw/xtensa/sim.c   |  6 ++
 hw/xtensa/xtfpga.c| 14 +-
 scripts/coccinelle/allocate_aux_mem.cocci | 14 ++
 57 files changed, 168 insertions(+), 256 deletions(-)
 create mode 100644 scripts/coccinelle/allocate_aux_mem.cocci

diff --git a/hw/arm/exynos4210.c b/hw/arm/exynos4210.c
index 0050626..92c57bb 100644
--- a/hw/arm/exynos4210.c
+++ b/hw/arm/exynos4210.c
@@ -276,9 +276,8 @@ Exynos4210State *exynos4210_init(MemoryRegion *system_mem)
 &s->chipid_mem);
 
 /* Internal ROM */
-memory_region_init_ram(&s->irom_mem, NULL, "exynos4210.irom",
-   EXYNOS4210_IROM_SIZE, &error_fatal);
-vmstate_register_ram_global(&s->irom_mem);
+memory_region_allocate_aux_memory(&s->irom_mem, NULL, "exynos4210.irom",
+  EXYNOS4210_IROM_SIZE);
 memory_region_set_readonly(&s->irom_mem, true);
 memory_region_add_subregion(system_mem, EXYNOS4210_IROM_BASE_ADDR,
 &s->irom_mem);
@@ -292,9 +291,8 @@ Exynos4210State *exynos4210_init(MemoryRegion *system_mem)
 &s->irom_alias_mem);
 
 /* Internal RAM */
-memory_region_init_ram(&s->iram_mem, NULL, "exynos4210.iram",
-   EXYNOS4210_IRAM_SIZE, &error_fatal);
-vmstate_register_ram_global(&s->iram_mem);
+memory_region_allocate_aux_memory(&s->iram_mem, NULL, "exynos4210.iram",
+  EXYNOS4210_IRAM_SIZE);
 memory_region_add_subregion(system_mem, EXYNOS4210_IRAM_B

[Qemu-devel] [PATCH 1/3] include/hw/boards.h: Document memory_region_allocate_system_memory()

2017-07-04 Thread Peter Maydell
Add a documentation comment for memory_region_allocate_system_memory().

In particular, the reason for this function's existence and the
requirement on board code to call it exactly once are non-obvious.

Signed-off-by: Peter Maydell 
---
 include/hw/boards.h | 28 
 1 file changed, 28 insertions(+)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index 76ce021..1bc5389 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -9,6 +9,34 @@
 #include "qom/object.h"
 #include "qom/cpu.h"
 
+/**
+ * memory_region_allocate_system_memory - Allocate a board's main memory
+ * @mr: the #MemoryRegion to be initialized
+ * @owner: the object that tracks the region's reference count
+ * @name: name of the memory region
+ * @ram_size: size of the region in bytes
+ *
+ * This function allocates the main memory for a board model, and
+ * initializes @mr appropriately. It also arranges for the memory
+ * to be migrated (by calling vmstate_register_ram_global()).
+ *
+ * Memory allocated via this function will be backed with the memory
+ * backend the user provided using -mem-path if appropriate; this
+ * is typically used to cause host huge pages to be used.
+ * This function should therefore be called by a board exactly once,
+ * for the primary or largest RAM area it implements.
+ *
+ * For boards where the major RAM is split into two parts in the memory
+ * map, you can deal with this by calling 
memory_region_allocate_system_memory()
+ * once to get a MemoryRegion with enough RAM for both parts, and then
+ * creating alias MemoryRegions via memory_region_init_alias() which
+ * alias into different parts of the RAM MemoryRegion and can be mapped
+ * into the memory map in the appropriate places.
+ *
+ * Smaller pieces of memory (display RAM, static RAMs, etc) don't need
+ * to be backed via the -mem-path memory backend and can simply
+ * be created via memory_region_init_ram().
+ */
 void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
   const char *name,
   uint64_t ram_size);
-- 
2.7.4




[Qemu-devel] [PATCH 0/3] Add new utility function memory_region_allocate_aux_memory()

2017-07-04 Thread Peter Maydell
Many board models and several devices need to create auxiliary
regions of RAM (in addition to the main lump of 'system' memory),
to model static RAMs, video memory, ROMs, etc. Currently they do
this with a sequence like:
   memory_region_init_ram(sram, NULL, "sram", 0x1, &error_fatal);
   vmstate_register_ram_global(sram);

but the need for the second function call is not obvious and if
omitted the bug is subtle (possible migration failure). This series
wraps the two calls up into a single new function
memory_region_allocate_aux_memory(), named to parallel the existing
memory_region_allocate_system_memory().

Patch 1 documents memory_region_allocate_system_memory() which
has a couple of non-obvious wrinkles. Patch 2 implements the new
utility function. Patch 3 changes a lot of callsites to use it
with the aid of the included coccinelle patch. (I think most
of the remaining callsites of vmstate_register_ram_global() are
not changed just because they report the error up to their caller
rather than using error_fatal. I'm not sure what kind of error you
might get back that wasn't "out of memory", which we usually make
fatal anyway, so perhaps we could change those callsites too.)

thanks
-- PMM

Peter Maydell (3):
  include/hw/boards.h: Document memory_region_allocate_system_memory()
  memory.h: Add new utility function memory_region_allocate_aux_memory()
  hw: Use new memory_region_allocate_aux_memory() function

 include/exec/memory.h | 23 +++
 include/hw/boards.h   | 29 +
 hw/arm/exynos4210.c   | 10 --
 hw/arm/exynos4_boards.c   | 12 +---
 hw/arm/integratorcp.c |  5 ++---
 hw/arm/mainstone.c|  5 ++---
 hw/arm/musicpal.c |  5 ++---
 hw/arm/omap1.c|  5 ++---
 hw/arm/omap2.c|  5 ++---
 hw/arm/omap_sx1.c | 10 --
 hw/arm/palm.c |  4 +---
 hw/arm/pxa2xx.c   | 20 
 hw/arm/realview.c | 14 +-
 hw/arm/spitz.c|  3 +--
 hw/arm/stellaris.c|  9 +++--
 hw/arm/stm32f205_soc.c| 10 +++---
 hw/arm/tosa.c |  3 +--
 hw/arm/vexpress.c | 12 +++-
 hw/arm/virt.c |  3 +--
 hw/arm/xilinx_zynq.c  |  5 ++---
 hw/arm/xlnx-zynqmp.c  |  5 ++---
 hw/block/onenand.c|  5 ++---
 hw/cris/axis_dev88.c  |  5 ++---
 hw/display/cg3.c  |  5 ++---
 hw/display/sm501.c|  5 ++---
 hw/display/tc6393xb.c |  5 ++---
 hw/display/tcx.c  |  5 ++---
 hw/display/vmware_vga.c   |  5 ++---
 hw/i386/pc.c  |  5 ++---
 hw/i386/pc_sysfw.c|  8 +++-
 hw/i386/xen/xen-hvm.c |  4 +---
 hw/input/milkymist-softusb.c  | 10 --
 hw/m68k/an5206.c  |  3 +--
 hw/m68k/mcf5208.c |  3 +--
 hw/microblaze/petalogix_ml605_mmu.c   | 11 +--
 hw/microblaze/petalogix_s3adsp1800_mmu.c  | 12 +---
 hw/mips/mips_fulong2e.c   |  4 +---
 hw/mips/mips_jazz.c   | 10 --
 hw/mips/mips_mipssim.c|  5 ++---
 hw/mips/mips_r4k.c|  5 ++---
 hw/moxie/moxiesim.c   |  6 ++
 hw/net/milkymist-minimac2.c   |  6 +++---
 hw/openrisc/openrisc_sim.c|  3 +--
 hw/pci-host/prep.c|  5 +
 hw/ppc/mac_newworld.c |  5 ++---
 hw/ppc/mac_oldworld.c |  5 ++---
 hw/ppc/ppc405_boards.c| 14 +-
 hw/ppc/ppc405_uc.c|  5 ++---
 hw/s390x/sclp.c   |  5 ++---
 hw/sh4/r2d.c  |  3 +--
 hw/sh4/shix.c | 13 +
 hw/sparc/leon3.c  |  3 +--
 hw/sparc/sun4m.c  | 13 +
 hw/sparc64/sun4u.c| 10 --
 hw/tricore/tricore_testboard.c| 30 --
 hw/unicore32/puv3.c   |  4 +---
 hw/xtensa/sim.c   |  6 ++
 hw/xtensa/xtfpga.c| 14 +-
 memory.c  |  8 
 scripts/coccinelle/allocate_aux_mem.cocci | 14 ++
 60 files changed, 228 insertions(+), 256 deletions(-)
 create mode 100644 scripts/coccinelle/allocate_aux_mem.cocci

-- 
2.7.4


[Qemu-devel] [PATCH 2/3] memory.h: Add new utility function memory_region_allocate_aux_memory()

2017-07-04 Thread Peter Maydell
Add a new utility function memory_region_allocate_aux_memory()
for board code to use to create auxiliary memory regions (such
as display RAM  or static RAMs). This parallels the existing
memory_region_allocate_system_memory() and wraps up the very
common sequence of:
   memory_region_init_ram(sram, NULL, "sram", 0x1, &error_fatal);
   vmstate_register_ram_global(sram);

Although this is the parallel function to the existing
memory_region_allocate_system_memory(), we don't put it in
boards.h/numa.c with it because this function is useful for
devices, not just boards, and it has nothing to do with NUMA.

Signed-off-by: Peter Maydell 
---
 include/exec/memory.h | 23 +++
 include/hw/boards.h   |  3 ++-
 memory.c  |  8 
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 8503685..77fc777 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -412,6 +412,12 @@ void memory_region_init_io(MemoryRegion *mr,
  *must be unique within any device
  * @size: size of the region.
  * @errp: pointer to Error*, to store an error if it happens.
+ *
+ * Note that it is the caller's responsibility to ensure that the contents
+ * of the RAM are migrated (by calling vmstate_register_ram_global(), or
+ * otherwise). The utility function memory_region_allocate_aux_memory()
+ * may be used to combine the "initialize MR" and "register contents for
+ * migration" steps.
  */
 void memory_region_init_ram(MemoryRegion *mr,
 struct Object *owner,
@@ -631,6 +637,23 @@ void memory_region_init_iommu(MemoryRegion *mr,
   uint64_t size);
 
 /**
+ * memory_region_allocate_aux_memory - Allocate auxiliary (non-main) memory
+ * @mr: the #MemoryRegion to be initialized
+ * @owner: the object that tracks the region's reference count
+ * @name: name of the memory region
+ * @ram_size: size of the region in bytes
+ *
+ * This function allocates RAM for a board model or device, and
+ * arranges for it to be migrated (by calling vmstate_register_ram_global()).
+ * Board models should call memory_region_allocate_system_memory()
+ * exactly once to allocate the memory for the primary or largest RAM
+ * area on the board, and then can use this function for smaller
+ * pieces of memory such as display RAM or static RAMs.
+ */
+void memory_region_allocate_aux_memory(MemoryRegion *mr, Object *owner,
+   const char *name, uint64_t ram_size);
+
+/**
  * memory_region_owner: get a memory region's owner.
  *
  * @mr: the memory region being queried.
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 1bc5389..a127a97 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -35,7 +35,8 @@
  *
  * Smaller pieces of memory (display RAM, static RAMs, etc) don't need
  * to be backed via the -mem-path memory backend and can simply
- * be created via memory_region_init_ram().
+ * be created via memory_region_allocate_aux_memory() or
+ * memory_region_init_ram().
  */
 void memory_region_allocate_system_memory(MemoryRegion *mr, Object *owner,
   const char *name,
diff --git a/memory.c b/memory.c
index 1044bba..25ac7d7 100644
--- a/memory.c
+++ b/memory.c
@@ -32,6 +32,7 @@
 #include "sysemu/sysemu.h"
 #include "hw/misc/mmio_interface.h"
 #include "hw/qdev-properties.h"
+#include "migration/vmstate.h"
 
 //#define DEBUG_UNASSIGNED
 
@@ -2817,6 +2818,13 @@ void mtree_info(fprintf_function mon_printf, void *f, 
bool flatview)
 }
 }
 
+void memory_region_allocate_aux_memory(MemoryRegion *mr, Object *owner,
+   const char *name, uint64_t ram_size)
+{
+memory_region_init_ram(mr, owner, name, ram_size, &error_fatal);
+vmstate_register_ram_global(mr);
+}
+
 static const TypeInfo memory_region_info = {
 .parent = TYPE_OBJECT,
 .name   = TYPE_MEMORY_REGION,
-- 
2.7.4




Re: [Qemu-devel] [PATCH 3/7] s390x: fix error propagation in kvm-flic's realize

2017-07-04 Thread Cornelia Huck
On Tue, 4 Jul 2017 17:08:52 +0200
Halil Pasic  wrote:

> >>>  cd.type = KVM_DEV_TYPE_FLIC;
> >>>  ret = kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &cd);
> >>>  if (ret < 0) {
> >>> -trace_flic_create_device(errno);
> >>> -return;
> >>> +error_setg_errno(&errp_local, errno, "Creating the KVM device 
> >>> failed");
> >>> +trace_flic_no_device_api(errno);  
> >>
> >> Err... this should still be trace_flic_create_device(), no?  
> > 
> > I'm afraid you are right! Probably a copy paste error :/
> >   
> 
> Do you think the traces are still appropriate once we have
> proper error propagation?

They add less value, but I'd just keep them.

> 
> I did not feel comfortable removing them but thinking again,
> than might be the thing to do.
> 
> @Christian:
> I think we should really fix this the one way or the other.
> Can you tell me what is the proper procedure?

I'd vote for just fixing the trace event. Smaller change, and we can
revisit this later.



Re: [Qemu-devel] [PATCH v2 3/4] xen/mapcache: introduce xen_replace_cache_entry()

2017-07-04 Thread Igor Druzhinin
On 04/07/17 17:42, Paul Durrant wrote:
>> -Original Message-
>> From: Igor Druzhinin
>> Sent: 04 July 2017 17:34
>> To: Paul Durrant ; xen-de...@lists.xenproject.org;
>> qemu-devel@nongnu.org
>> Cc: sstabell...@kernel.org; Anthony Perard ;
>> pbonz...@redhat.com
>> Subject: Re: [PATCH v2 3/4] xen/mapcache: introduce
>> xen_replace_cache_entry()
>>
>> On 04/07/17 17:27, Paul Durrant wrote:
 -Original Message-
 From: Igor Druzhinin
 Sent: 04 July 2017 16:48
 To: xen-de...@lists.xenproject.org; qemu-devel@nongnu.org
 Cc: Igor Druzhinin ; sstabell...@kernel.org;
 Anthony Perard ; Paul Durrant
 ; pbonz...@redhat.com
 Subject: [PATCH v2 3/4] xen/mapcache: introduce
 xen_replace_cache_entry()

 This new call is trying to update a requested map cache entry
 according to the changes in the physmap. The call is searching
 for the entry, unmaps it and maps again at the same place using
 a new guest address. If the mapping is dummy this call will
 make it real.

 This function makes use of a new xenforeignmemory_map2() call
 with an extended interface that was recently introduced in
 libxenforeignmemory [1].
>>>
>>> I don't understand how the compat layer works here. If
>> xenforeignmemory_map2() is not available then you can't control the
>> placement in virtual address space.
>>>
>>
>> If it's not 4.10 or newer xenforeignmemory_map2() doesn't exist and is
>> going to be defined as xenforeignmemory_map(). At the same time
>> XEN_COMPAT_PHYSMAP is defined and the entry replace function (which
>> relies on xenforeignmemory_map2 functionality) is never going to be called.
>>
>> If you mean that I should incorporate this into the description I can do it.
> 
> AFAICT XEN_COMPAT_PHYSMAP is not introduced until patch #4 though.
> 
> The problem really comes down to defining xenforeignmemory_map2() in terms of 
> xenforeignmemory_map(). It basically can't be safely done. Could you define 
> xenforeignmemory_map2() as abort() in the compat case instead? 
>

xen_replace_cache_entry() is not called in patch #3. Which means it's
safe to use a fallback version (xenforeignmemory_map) in
xen_remap_bucket here.

Igor

>   Paul
> 
>>
>> Igor
>>
>>>   Paul
>>>

 [1] https://www.mail-archive.com/xen-
>> de...@lists.xen.org/msg113007.html

 Signed-off-by: Igor Druzhinin 
 ---
  configure | 18 ++
  hw/i386/xen/xen-mapcache.c| 79
 ++-
  include/hw/xen/xen_common.h   |  7 
  include/sysemu/xen-mapcache.h | 11 +-
  4 files changed, 106 insertions(+), 9 deletions(-)

 diff --git a/configure b/configure
 index c571ad1..ad6156b 100755
 --- a/configure
 +++ b/configure
 @@ -2021,6 +2021,24 @@ EOF
  # Xen unstable
  elif
  cat > $TMPC <>>> +#undef XC_WANT_COMPAT_MAP_FOREIGN_API
 +#include 
 +int main(void) {
 +  xenforeignmemory_handle *xfmem;
 +
 +  xfmem = xenforeignmemory_open(0, 0);
 +  xenforeignmemory_map2(xfmem, 0, 0, 0, 0, 0, 0, 0);
 +
 +  return 0;
 +}
 +EOF
 +compile_prog "" "$xen_libs -lxendevicemodel $xen_stable_libs"
 +  then
 +  xen_stable_libs="-lxendevicemodel $xen_stable_libs"
 +  xen_ctrl_version=41000
 +  xen=yes
 +elif
 +cat > $TMPC <>>>  #undef XC_WANT_COMPAT_DEVICEMODEL_API
  #define __XEN_TOOLS__
  #include 
 diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
 index cd4e746..a988be7 100644
 --- a/hw/i386/xen/xen-mapcache.c
 +++ b/hw/i386/xen/xen-mapcache.c
 @@ -151,6 +151,7 @@ void
>> xen_map_cache_init(phys_offset_to_gaddr_t f,
 void *opaque)
  }

  static void xen_remap_bucket(MapCacheEntry *entry,
 + void *vaddr,
   hwaddr size,
   hwaddr address_index,
   bool dummy)
 @@ -167,7 +168,9 @@ static void xen_remap_bucket(MapCacheEntry
 *entry,
  err = g_malloc0(nb_pfn * sizeof (int));

  if (entry->vaddr_base != NULL) {
 -ram_block_notify_remove(entry->vaddr_base, entry->size);
 +if (entry->vaddr_base != vaddr) {
 +ram_block_notify_remove(entry->vaddr_base, entry->size);
 +}
  if (munmap(entry->vaddr_base, entry->size) != 0) {
  perror("unmap fails");
  exit(-1);
 @@ -181,11 +184,11 @@ static void xen_remap_bucket(MapCacheEntry
 *entry,
  }

  if (!dummy) {
 -vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
 -   PROT_READ|PROT_WRITE,
 +vaddr_base = xenforeignmemory_map2(xen_fmem, xen_domid,
 vaddr,
 +   

Re: [Qemu-devel] [PATCH v2 3/4] xen/mapcache: introduce xen_replace_cache_entry()

2017-07-04 Thread Paul Durrant
> -Original Message-
> From: Igor Druzhinin
> Sent: 04 July 2017 17:34
> To: Paul Durrant ; xen-de...@lists.xenproject.org;
> qemu-devel@nongnu.org
> Cc: sstabell...@kernel.org; Anthony Perard ;
> pbonz...@redhat.com
> Subject: Re: [PATCH v2 3/4] xen/mapcache: introduce
> xen_replace_cache_entry()
> 
> On 04/07/17 17:27, Paul Durrant wrote:
> >> -Original Message-
> >> From: Igor Druzhinin
> >> Sent: 04 July 2017 16:48
> >> To: xen-de...@lists.xenproject.org; qemu-devel@nongnu.org
> >> Cc: Igor Druzhinin ; sstabell...@kernel.org;
> >> Anthony Perard ; Paul Durrant
> >> ; pbonz...@redhat.com
> >> Subject: [PATCH v2 3/4] xen/mapcache: introduce
> >> xen_replace_cache_entry()
> >>
> >> This new call is trying to update a requested map cache entry
> >> according to the changes in the physmap. The call is searching
> >> for the entry, unmaps it and maps again at the same place using
> >> a new guest address. If the mapping is dummy this call will
> >> make it real.
> >>
> >> This function makes use of a new xenforeignmemory_map2() call
> >> with an extended interface that was recently introduced in
> >> libxenforeignmemory [1].
> >
> > I don't understand how the compat layer works here. If
> xenforeignmemory_map2() is not available then you can't control the
> placement in virtual address space.
> >
> 
> If it's not 4.10 or newer xenforeignmemory_map2() doesn't exist and is
> going to be defined as xenforeignmemory_map(). At the same time
> XEN_COMPAT_PHYSMAP is defined and the entry replace function (which
> relies on xenforeignmemory_map2 functionality) is never going to be called.
> 
> If you mean that I should incorporate this into the description I can do it.

AFAICT XEN_COMPAT_PHYSMAP is not introduced until patch #4 though.

The problem really comes down to defining xenforeignmemory_map2() in terms of 
xenforeignmemory_map(). It basically can't be safely done. Could you define 
xenforeignmemory_map2() as abort() in the compat case instead? 

  Paul

> 
> Igor
> 
> >   Paul
> >
> >>
> >> [1] https://www.mail-archive.com/xen-
> de...@lists.xen.org/msg113007.html
> >>
> >> Signed-off-by: Igor Druzhinin 
> >> ---
> >>  configure | 18 ++
> >>  hw/i386/xen/xen-mapcache.c| 79
> >> ++-
> >>  include/hw/xen/xen_common.h   |  7 
> >>  include/sysemu/xen-mapcache.h | 11 +-
> >>  4 files changed, 106 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/configure b/configure
> >> index c571ad1..ad6156b 100755
> >> --- a/configure
> >> +++ b/configure
> >> @@ -2021,6 +2021,24 @@ EOF
> >>  # Xen unstable
> >>  elif
> >>  cat > $TMPC < >> +#undef XC_WANT_COMPAT_MAP_FOREIGN_API
> >> +#include 
> >> +int main(void) {
> >> +  xenforeignmemory_handle *xfmem;
> >> +
> >> +  xfmem = xenforeignmemory_open(0, 0);
> >> +  xenforeignmemory_map2(xfmem, 0, 0, 0, 0, 0, 0, 0);
> >> +
> >> +  return 0;
> >> +}
> >> +EOF
> >> +compile_prog "" "$xen_libs -lxendevicemodel $xen_stable_libs"
> >> +  then
> >> +  xen_stable_libs="-lxendevicemodel $xen_stable_libs"
> >> +  xen_ctrl_version=41000
> >> +  xen=yes
> >> +elif
> >> +cat > $TMPC < >>  #undef XC_WANT_COMPAT_DEVICEMODEL_API
> >>  #define __XEN_TOOLS__
> >>  #include 
> >> diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
> >> index cd4e746..a988be7 100644
> >> --- a/hw/i386/xen/xen-mapcache.c
> >> +++ b/hw/i386/xen/xen-mapcache.c
> >> @@ -151,6 +151,7 @@ void
> xen_map_cache_init(phys_offset_to_gaddr_t f,
> >> void *opaque)
> >>  }
> >>
> >>  static void xen_remap_bucket(MapCacheEntry *entry,
> >> + void *vaddr,
> >>   hwaddr size,
> >>   hwaddr address_index,
> >>   bool dummy)
> >> @@ -167,7 +168,9 @@ static void xen_remap_bucket(MapCacheEntry
> >> *entry,
> >>  err = g_malloc0(nb_pfn * sizeof (int));
> >>
> >>  if (entry->vaddr_base != NULL) {
> >> -ram_block_notify_remove(entry->vaddr_base, entry->size);
> >> +if (entry->vaddr_base != vaddr) {
> >> +ram_block_notify_remove(entry->vaddr_base, entry->size);
> >> +}
> >>  if (munmap(entry->vaddr_base, entry->size) != 0) {
> >>  perror("unmap fails");
> >>  exit(-1);
> >> @@ -181,11 +184,11 @@ static void xen_remap_bucket(MapCacheEntry
> >> *entry,
> >>  }
> >>
> >>  if (!dummy) {
> >> -vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
> >> -   PROT_READ|PROT_WRITE,
> >> +vaddr_base = xenforeignmemory_map2(xen_fmem, xen_domid,
> >> vaddr,
> >> +   PROT_READ|PROT_WRITE, 0,
> >> nb_pfn, pfns, err);
> >>  if (vaddr_base == NULL) {
> >> -perror("xenforeignmemory_map");
> >> +perror("xenforeignmemory_map2");

Re: [Qemu-devel] [PATCH v2 3/4] xen/mapcache: introduce xen_replace_cache_entry()

2017-07-04 Thread Igor Druzhinin
On 04/07/17 17:27, Paul Durrant wrote:
>> -Original Message-
>> From: Igor Druzhinin
>> Sent: 04 July 2017 16:48
>> To: xen-de...@lists.xenproject.org; qemu-devel@nongnu.org
>> Cc: Igor Druzhinin ; sstabell...@kernel.org;
>> Anthony Perard ; Paul Durrant
>> ; pbonz...@redhat.com
>> Subject: [PATCH v2 3/4] xen/mapcache: introduce
>> xen_replace_cache_entry()
>>
>> This new call is trying to update a requested map cache entry
>> according to the changes in the physmap. The call is searching
>> for the entry, unmaps it and maps again at the same place using
>> a new guest address. If the mapping is dummy this call will
>> make it real.
>>
>> This function makes use of a new xenforeignmemory_map2() call
>> with an extended interface that was recently introduced in
>> libxenforeignmemory [1].
> 
> I don't understand how the compat layer works here. If 
> xenforeignmemory_map2() is not available then you can't control the placement 
> in virtual address space.
> 

If it's not 4.10 or newer xenforeignmemory_map2() doesn't exist and is
going to be defined as xenforeignmemory_map(). At the same time
XEN_COMPAT_PHYSMAP is defined and the entry replace function (which
relies on xenforeignmemory_map2 functionality) is never going to be called.

If you mean that I should incorporate this into the description I can do it.

Igor

>   Paul
> 
>>
>> [1] https://www.mail-archive.com/xen-devel@lists.xen.org/msg113007.html
>>
>> Signed-off-by: Igor Druzhinin 
>> ---
>>  configure | 18 ++
>>  hw/i386/xen/xen-mapcache.c| 79
>> ++-
>>  include/hw/xen/xen_common.h   |  7 
>>  include/sysemu/xen-mapcache.h | 11 +-
>>  4 files changed, 106 insertions(+), 9 deletions(-)
>>
>> diff --git a/configure b/configure
>> index c571ad1..ad6156b 100755
>> --- a/configure
>> +++ b/configure
>> @@ -2021,6 +2021,24 @@ EOF
>>  # Xen unstable
>>  elif
>>  cat > $TMPC <> +#undef XC_WANT_COMPAT_MAP_FOREIGN_API
>> +#include 
>> +int main(void) {
>> +  xenforeignmemory_handle *xfmem;
>> +
>> +  xfmem = xenforeignmemory_open(0, 0);
>> +  xenforeignmemory_map2(xfmem, 0, 0, 0, 0, 0, 0, 0);
>> +
>> +  return 0;
>> +}
>> +EOF
>> +compile_prog "" "$xen_libs -lxendevicemodel $xen_stable_libs"
>> +  then
>> +  xen_stable_libs="-lxendevicemodel $xen_stable_libs"
>> +  xen_ctrl_version=41000
>> +  xen=yes
>> +elif
>> +cat > $TMPC <>  #undef XC_WANT_COMPAT_DEVICEMODEL_API
>>  #define __XEN_TOOLS__
>>  #include 
>> diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
>> index cd4e746..a988be7 100644
>> --- a/hw/i386/xen/xen-mapcache.c
>> +++ b/hw/i386/xen/xen-mapcache.c
>> @@ -151,6 +151,7 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f,
>> void *opaque)
>>  }
>>
>>  static void xen_remap_bucket(MapCacheEntry *entry,
>> + void *vaddr,
>>   hwaddr size,
>>   hwaddr address_index,
>>   bool dummy)
>> @@ -167,7 +168,9 @@ static void xen_remap_bucket(MapCacheEntry
>> *entry,
>>  err = g_malloc0(nb_pfn * sizeof (int));
>>
>>  if (entry->vaddr_base != NULL) {
>> -ram_block_notify_remove(entry->vaddr_base, entry->size);
>> +if (entry->vaddr_base != vaddr) {
>> +ram_block_notify_remove(entry->vaddr_base, entry->size);
>> +}
>>  if (munmap(entry->vaddr_base, entry->size) != 0) {
>>  perror("unmap fails");
>>  exit(-1);
>> @@ -181,11 +184,11 @@ static void xen_remap_bucket(MapCacheEntry
>> *entry,
>>  }
>>
>>  if (!dummy) {
>> -vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
>> -   PROT_READ|PROT_WRITE,
>> +vaddr_base = xenforeignmemory_map2(xen_fmem, xen_domid,
>> vaddr,
>> +   PROT_READ|PROT_WRITE, 0,
>> nb_pfn, pfns, err);
>>  if (vaddr_base == NULL) {
>> -perror("xenforeignmemory_map");
>> +perror("xenforeignmemory_map2");
>>  exit(-1);
>>  }
>>  entry->flags &= ~(XEN_MAPCACHE_ENTRY_DUMMY);
>> @@ -194,7 +197,7 @@ static void xen_remap_bucket(MapCacheEntry
>> *entry,
>>   * We create dummy mappings where we are unable to create a foreign
>>   * mapping immediately due to certain circumstances (i.e. on resume
>> now)
>>   */
>> -vaddr_base = mmap(NULL, size, PROT_READ|PROT_WRITE,
>> +vaddr_base = mmap(vaddr, size, PROT_READ|PROT_WRITE,
>>MAP_ANON|MAP_SHARED, -1, 0);
>>  if (vaddr_base == NULL) {
>>  perror("mmap");
>> @@ -203,13 +206,16 @@ static void xen_remap_bucket(MapCacheEntry
>> *entry,
>>  entry->flags |= XEN_MAPCACHE_ENTRY_DUMMY;
>>  }
>>
>> +if (entry->vaddr_base == NULL || entry->vaddr_base != vaddr) {
>> +

Re: [Qemu-devel] [PATCH v2 3/4] xen/mapcache: introduce xen_replace_cache_entry()

2017-07-04 Thread Paul Durrant
> -Original Message-
> From: Igor Druzhinin
> Sent: 04 July 2017 16:48
> To: xen-de...@lists.xenproject.org; qemu-devel@nongnu.org
> Cc: Igor Druzhinin ; sstabell...@kernel.org;
> Anthony Perard ; Paul Durrant
> ; pbonz...@redhat.com
> Subject: [PATCH v2 3/4] xen/mapcache: introduce
> xen_replace_cache_entry()
> 
> This new call is trying to update a requested map cache entry
> according to the changes in the physmap. The call is searching
> for the entry, unmaps it and maps again at the same place using
> a new guest address. If the mapping is dummy this call will
> make it real.
> 
> This function makes use of a new xenforeignmemory_map2() call
> with an extended interface that was recently introduced in
> libxenforeignmemory [1].

I don't understand how the compat layer works here. If xenforeignmemory_map2() 
is not available then you can't control the placement in virtual address space.

  Paul

> 
> [1] https://www.mail-archive.com/xen-devel@lists.xen.org/msg113007.html
> 
> Signed-off-by: Igor Druzhinin 
> ---
>  configure | 18 ++
>  hw/i386/xen/xen-mapcache.c| 79
> ++-
>  include/hw/xen/xen_common.h   |  7 
>  include/sysemu/xen-mapcache.h | 11 +-
>  4 files changed, 106 insertions(+), 9 deletions(-)
> 
> diff --git a/configure b/configure
> index c571ad1..ad6156b 100755
> --- a/configure
> +++ b/configure
> @@ -2021,6 +2021,24 @@ EOF
>  # Xen unstable
>  elif
>  cat > $TMPC < +#undef XC_WANT_COMPAT_MAP_FOREIGN_API
> +#include 
> +int main(void) {
> +  xenforeignmemory_handle *xfmem;
> +
> +  xfmem = xenforeignmemory_open(0, 0);
> +  xenforeignmemory_map2(xfmem, 0, 0, 0, 0, 0, 0, 0);
> +
> +  return 0;
> +}
> +EOF
> +compile_prog "" "$xen_libs -lxendevicemodel $xen_stable_libs"
> +  then
> +  xen_stable_libs="-lxendevicemodel $xen_stable_libs"
> +  xen_ctrl_version=41000
> +  xen=yes
> +elif
> +cat > $TMPC <  #undef XC_WANT_COMPAT_DEVICEMODEL_API
>  #define __XEN_TOOLS__
>  #include 
> diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
> index cd4e746..a988be7 100644
> --- a/hw/i386/xen/xen-mapcache.c
> +++ b/hw/i386/xen/xen-mapcache.c
> @@ -151,6 +151,7 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f,
> void *opaque)
>  }
> 
>  static void xen_remap_bucket(MapCacheEntry *entry,
> + void *vaddr,
>   hwaddr size,
>   hwaddr address_index,
>   bool dummy)
> @@ -167,7 +168,9 @@ static void xen_remap_bucket(MapCacheEntry
> *entry,
>  err = g_malloc0(nb_pfn * sizeof (int));
> 
>  if (entry->vaddr_base != NULL) {
> -ram_block_notify_remove(entry->vaddr_base, entry->size);
> +if (entry->vaddr_base != vaddr) {
> +ram_block_notify_remove(entry->vaddr_base, entry->size);
> +}
>  if (munmap(entry->vaddr_base, entry->size) != 0) {
>  perror("unmap fails");
>  exit(-1);
> @@ -181,11 +184,11 @@ static void xen_remap_bucket(MapCacheEntry
> *entry,
>  }
> 
>  if (!dummy) {
> -vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
> -   PROT_READ|PROT_WRITE,
> +vaddr_base = xenforeignmemory_map2(xen_fmem, xen_domid,
> vaddr,
> +   PROT_READ|PROT_WRITE, 0,
> nb_pfn, pfns, err);
>  if (vaddr_base == NULL) {
> -perror("xenforeignmemory_map");
> +perror("xenforeignmemory_map2");
>  exit(-1);
>  }
>  entry->flags &= ~(XEN_MAPCACHE_ENTRY_DUMMY);
> @@ -194,7 +197,7 @@ static void xen_remap_bucket(MapCacheEntry
> *entry,
>   * We create dummy mappings where we are unable to create a foreign
>   * mapping immediately due to certain circumstances (i.e. on resume
> now)
>   */
> -vaddr_base = mmap(NULL, size, PROT_READ|PROT_WRITE,
> +vaddr_base = mmap(vaddr, size, PROT_READ|PROT_WRITE,
>MAP_ANON|MAP_SHARED, -1, 0);
>  if (vaddr_base == NULL) {
>  perror("mmap");
> @@ -203,13 +206,16 @@ static void xen_remap_bucket(MapCacheEntry
> *entry,
>  entry->flags |= XEN_MAPCACHE_ENTRY_DUMMY;
>  }
> 
> +if (entry->vaddr_base == NULL || entry->vaddr_base != vaddr) {
> +ram_block_notify_add(vaddr_base, size);
> +}
> +
>  entry->vaddr_base = vaddr_base;
>  entry->paddr_index = address_index;
>  entry->size = size;
>  entry->valid_mapping = (unsigned long *) g_malloc0(sizeof(unsigned long)
> *
>  BITS_TO_LONGS(size >> XC_PAGE_SHIFT));
> 
> -ram_block_notify_add(entry->vaddr_base, entry->size);
>  bitmap_zero(entry->valid_mapping, nb_pfn);
>  for (i = 0; i < nb_pfn; i++) {
>  if (!err[i]) {
> @@ -282,14 +288,14 @@ tryagain:
>  if 

Re: [Qemu-devel] [PATCH v7 4/4] net/socket: Improve -net socket error reporting

2017-07-04 Thread Markus Armbruster
Mao Zhongyi  writes:

> When -net socket fails, it first reports a specific error, then
> a generic one, like this:
>
> $ qemu-system-x86_64 -net socket,
> qemu-system-x86_64: -net socket: exactly one of fd=, listen=, connect=, 
> mcast= or udp= is required
> qemu-system-x86_64: -net socket: Device 'socket' could not be initialized
>
> Convert net_socket_*_init() to Error to get rid of the superfluous second
> error message. After the patch, the effect like this:
>
> $ qemu-system-x86_64 -net socket,
> qemu-system-x86_64: -net socket: exactly one of fd=, listen=, connect=, 
> mcast= or udp= is required
>
> At the same time, add many explicit error handling message when it fails.
>
> Cc: jasow...@redhat.com
> Cc: arm...@redhat.com
> Cc: berra...@redhat.com
> Signed-off-by: Mao Zhongyi 
> ---
>  net/socket.c | 94 
> +---
>  1 file changed, 45 insertions(+), 49 deletions(-)
>
> diff --git a/net/socket.c b/net/socket.c
> index bd80b3c..a891c3a 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -494,22 +494,21 @@ static void net_socket_accept(void *opaque)
>  static int net_socket_listen_init(NetClientState *peer,
>const char *model,
>const char *name,
> -  const char *host_str)
> +  const char *host_str,
> +  Error **errp)
>  {
>  NetClientState *nc;
>  NetSocketState *s;
>  struct sockaddr_in saddr;
>  int fd, ret;
> -Error *err = NULL;
>  
> -if (parse_host_port(&saddr, host_str, &err) < 0) {
> -error_report_err(err);
> +if (parse_host_port(&saddr, host_str, errp) < 0) {
>  return -1;
>  }
>  
>  fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
>  if (fd < 0) {
> -perror("socket");
> +error_setg_errno(errp, errno, "failed to create stream socket");
>  return -1;
>  }
>  qemu_set_nonblock(fd);
> @@ -518,13 +517,14 @@ static int net_socket_listen_init(NetClientState *peer,
>  
>  ret = bind(fd, (struct sockaddr *)&saddr, sizeof(saddr));
>  if (ret < 0) {
> -perror("bind");
> +error_setg_errno(errp, errno, "bind ip=%s to socket failed",
> + inet_ntoa(saddr.sin_addr));

Comment on the same error message in PATCH 2 applies.

>  closesocket(fd);
>  return -1;
>  }
>  ret = listen(fd, 0);
>  if (ret < 0) {
> -perror("listen");
> +error_setg_errno(errp, errno, "listen socket failed");

Suggest "can't listen on socket".

>  closesocket(fd);
>  return -1;
>  }
> @@ -543,21 +543,20 @@ static int net_socket_listen_init(NetClientState *peer,
>  static int net_socket_connect_init(NetClientState *peer,
> const char *model,
> const char *name,
> -   const char *host_str)
> +   const char *host_str,
> +   Error **errp)
>  {
>  NetSocketState *s;
>  int fd, connected, ret;
>  struct sockaddr_in saddr;
> -Error *err = NULL;
>  
> -if (parse_host_port(&saddr, host_str, &err) < 0) {
> -error_report_err(err);
> +if (parse_host_port(&saddr, host_str, errp) < 0) {
>  return -1;
>  }
>  
>  fd = qemu_socket(PF_INET, SOCK_STREAM, 0);
>  if (fd < 0) {
> -perror("socket");
> +error_setg_errno(errp, errno, "failed to create stream socket");
>  return -1;
>  }
>  qemu_set_nonblock(fd);
> @@ -573,7 +572,7 @@ static int net_socket_connect_init(NetClientState *peer,
> errno == EINVAL) {
>  break;
>  } else {
> -perror("connect");
> +error_setg_errno(errp, errno, "connection failed");

Suggest "can't connect socket".

>  closesocket(fd);
>  return -1;
>  }
> @@ -582,9 +581,8 @@ static int net_socket_connect_init(NetClientState *peer,
>  break;
>  }
>  }
> -s = net_socket_fd_init(peer, model, name, fd, connected, &err);
> +s = net_socket_fd_init(peer, model, name, fd, connected, errp);
>  if (!s) {
> -error_report_err(err);
>  return -1;
>  }
>  snprintf(s->nc.info_str, sizeof(s->nc.info_str),
> @@ -597,36 +595,36 @@ static int net_socket_mcast_init(NetClientState *peer,
>   const char *model,
>   const char *name,
>   const char *host_str,
> - const char *localaddr_str)
> + const char *localaddr_str,
> + Error **errp)
>  {
>  NetSocketState *s;
>  int fd;
>  struct sockaddr_in 

Re: [Qemu-devel] [PATCH v2 2/4] xen/mapcache: add an ability to create dummy mappings

2017-07-04 Thread Paul Durrant
> -Original Message-
> From: Igor Druzhinin
> Sent: 04 July 2017 16:48
> To: xen-de...@lists.xenproject.org; qemu-devel@nongnu.org
> Cc: Igor Druzhinin ; sstabell...@kernel.org;
> Anthony Perard ; Paul Durrant
> ; pbonz...@redhat.com
> Subject: [PATCH v2 2/4] xen/mapcache: add an ability to create dummy
> mappings
> 
> Dummys are simple anonymous mappings that are placed instead
> of regular foreign mappings in certain situations when we need
> to postpone the actual mapping but still have to give a
> memory region to QEMU to play with.
> 
> This is planned to be used for restore on Xen.
> 
> Signed-off-by: Igor Druzhinin 

Reviewed-by: Paul Durrant 

> ---
>  hw/i386/xen/xen-mapcache.c | 40
> 
>  1 file changed, 32 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
> index e60156c..cd4e746 100644
> --- a/hw/i386/xen/xen-mapcache.c
> +++ b/hw/i386/xen/xen-mapcache.c
> @@ -53,6 +53,8 @@ typedef struct MapCacheEntry {
>  uint8_t *vaddr_base;
>  unsigned long *valid_mapping;
>  uint8_t lock;
> +#define XEN_MAPCACHE_ENTRY_DUMMY (1 << 0)
> +uint8_t flags;
>  hwaddr size;
>  struct MapCacheEntry *next;
>  } MapCacheEntry;
> @@ -150,7 +152,8 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f,
> void *opaque)
> 
>  static void xen_remap_bucket(MapCacheEntry *entry,
>   hwaddr size,
> - hwaddr address_index)
> + hwaddr address_index,
> + bool dummy)
>  {
>  uint8_t *vaddr_base;
>  xen_pfn_t *pfns;
> @@ -177,11 +180,27 @@ static void xen_remap_bucket(MapCacheEntry
> *entry,
>  pfns[i] = (address_index << (MCACHE_BUCKET_SHIFT-XC_PAGE_SHIFT))
> + i;
>  }
> 
> -vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
> PROT_READ|PROT_WRITE,
> -  nb_pfn, pfns, err);
> -if (vaddr_base == NULL) {
> -perror("xenforeignmemory_map");
> -exit(-1);
> +if (!dummy) {
> +vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
> +   PROT_READ|PROT_WRITE,
> +   nb_pfn, pfns, err);
> +if (vaddr_base == NULL) {
> +perror("xenforeignmemory_map");
> +exit(-1);
> +}
> +entry->flags &= ~(XEN_MAPCACHE_ENTRY_DUMMY);
> +} else {
> +/*
> + * We create dummy mappings where we are unable to create a foreign
> + * mapping immediately due to certain circumstances (i.e. on resume
> now)
> + */
> +vaddr_base = mmap(NULL, size, PROT_READ|PROT_WRITE,
> +  MAP_ANON|MAP_SHARED, -1, 0);
> +if (vaddr_base == NULL) {
> +perror("mmap");
> +exit(-1);
> +}
> +entry->flags |= XEN_MAPCACHE_ENTRY_DUMMY;
>  }
> 
>  entry->vaddr_base = vaddr_base;
> @@ -211,6 +230,7 @@ static uint8_t *xen_map_cache_unlocked(hwaddr
> phys_addr, hwaddr size,
>  hwaddr cache_size = size;
>  hwaddr test_bit_size;
>  bool translated = false;
> +bool dummy = false;
> 
>  tryagain:
>  address_index  = phys_addr >> MCACHE_BUCKET_SHIFT;
> @@ -262,14 +282,14 @@ tryagain:
>  if (!entry) {
>  entry = g_malloc0(sizeof (MapCacheEntry));
>  pentry->next = entry;
> -xen_remap_bucket(entry, cache_size, address_index);
> +xen_remap_bucket(entry, cache_size, address_index, dummy);
>  } else if (!entry->lock) {
>  if (!entry->vaddr_base || entry->paddr_index != address_index ||
>  entry->size != cache_size ||
>  !test_bits(address_offset >> XC_PAGE_SHIFT,
>  test_bit_size >> XC_PAGE_SHIFT,
>  entry->valid_mapping)) {
> -xen_remap_bucket(entry, cache_size, address_index);
> +xen_remap_bucket(entry, cache_size, address_index, dummy);
>  }
>  }
> 
> @@ -282,6 +302,10 @@ tryagain:
>  translated = true;
>  goto tryagain;
>  }
> +if (!dummy && runstate_check(RUN_STATE_INMIGRATE)) {
> +dummy = true;
> +goto tryagain;
> +}
>  trace_xen_map_cache_return(NULL);
>  return NULL;
>  }
> --
> 2.7.4




Re: [Qemu-devel] [PATCH v2 1/4] xen: move physmap saving into a separate function

2017-07-04 Thread Paul Durrant


> -Original Message-
> From: Igor Druzhinin
> Sent: 04 July 2017 16:48
> To: xen-de...@lists.xenproject.org; qemu-devel@nongnu.org
> Cc: Igor Druzhinin ; sstabell...@kernel.org;
> Anthony Perard ; Paul Durrant
> ; pbonz...@redhat.com
> Subject: [PATCH v2 1/4] xen: move physmap saving into a separate function
> 
> Non-functional change.
> 
> Signed-off-by: Igor Druzhinin 

Reviewed-by: Paul Durrant 

> ---
>  hw/i386/xen/xen-hvm.c | 57 
> ---
>  1 file changed, 31 insertions(+), 26 deletions(-)
> 
> diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
> index cffa7e2..d259cf7 100644
> --- a/hw/i386/xen/xen-hvm.c
> +++ b/hw/i386/xen/xen-hvm.c
> @@ -305,6 +305,36 @@ static hwaddr xen_phys_offset_to_gaddr(hwaddr
> start_addr,
>  return start_addr;
>  }
> 
> +static int xen_save_physmap(XenIOState *state, XenPhysmap *physmap)
> +{
> +char path[80], value[17];
> +
> +snprintf(path, sizeof(path),
> +"/local/domain/0/device-
> model/%d/physmap/%"PRIx64"/start_addr",
> +xen_domid, (uint64_t)physmap->phys_offset);
> +snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)physmap-
> >start_addr);
> +if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
> +return -1;
> +}
> +snprintf(path, sizeof(path),
> +"/local/domain/0/device-model/%d/physmap/%"PRIx64"/size",
> +xen_domid, (uint64_t)physmap->phys_offset);
> +snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)physmap->size);
> +if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
> +return -1;
> +}
> +if (physmap->name) {
> +snprintf(path, sizeof(path),
> +"/local/domain/0/device-model/%d/physmap/%"PRIx64"/name",
> +xen_domid, (uint64_t)physmap->phys_offset);
> +if (!xs_write(state->xenstore, 0, path,
> +  physmap->name, strlen(physmap->name))) {
> +return -1;
> +}
> +}
> +return 0;
> +}
> +
>  static int xen_add_to_physmap(XenIOState *state,
>hwaddr start_addr,
>ram_addr_t size,
> @@ -316,7 +346,6 @@ static int xen_add_to_physmap(XenIOState *state,
>  XenPhysmap *physmap = NULL;
>  hwaddr pfn, start_gpfn;
>  hwaddr phys_offset = memory_region_get_ram_addr(mr);
> -char path[80], value[17];
>  const char *mr_name;
> 
>  if (get_physmapping(state, start_addr, size)) {
> @@ -368,31 +397,7 @@ go_physmap:
> start_addr >> TARGET_PAGE_BITS,
> (start_addr + size - 1) >> 
> TARGET_PAGE_BITS,
> XEN_DOMCTL_MEM_CACHEATTR_WB);
> -
> -snprintf(path, sizeof(path),
> -"/local/domain/0/device-
> model/%d/physmap/%"PRIx64"/start_addr",
> -xen_domid, (uint64_t)phys_offset);
> -snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)start_addr);
> -if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
> -return -1;
> -}
> -snprintf(path, sizeof(path),
> -"/local/domain/0/device-model/%d/physmap/%"PRIx64"/size",
> -xen_domid, (uint64_t)phys_offset);
> -snprintf(value, sizeof(value), "%"PRIx64, (uint64_t)size);
> -if (!xs_write(state->xenstore, 0, path, value, strlen(value))) {
> -return -1;
> -}
> -if (mr_name) {
> -snprintf(path, sizeof(path),
> -"/local/domain/0/device-model/%d/physmap/%"PRIx64"/name",
> -xen_domid, (uint64_t)phys_offset);
> -if (!xs_write(state->xenstore, 0, path, mr_name, strlen(mr_name))) {
> -return -1;
> -}
> -}
> -
> -return 0;
> +return xen_save_physmap(state, physmap);
>  }
> 
>  static int xen_remove_from_physmap(XenIOState *state,
> --
> 2.7.4




[Qemu-devel] [PATCH v2 3/4] xen/mapcache: introduce xen_replace_cache_entry()

2017-07-04 Thread Igor Druzhinin
This new call is trying to update a requested map cache entry
according to the changes in the physmap. The call is searching
for the entry, unmaps it and maps again at the same place using
a new guest address. If the mapping is dummy this call will
make it real.

This function makes use of a new xenforeignmemory_map2() call
with an extended interface that was recently introduced in
libxenforeignmemory [1].

[1] https://www.mail-archive.com/xen-devel@lists.xen.org/msg113007.html

Signed-off-by: Igor Druzhinin 
---
 configure | 18 ++
 hw/i386/xen/xen-mapcache.c| 79 ++-
 include/hw/xen/xen_common.h   |  7 
 include/sysemu/xen-mapcache.h | 11 +-
 4 files changed, 106 insertions(+), 9 deletions(-)

diff --git a/configure b/configure
index c571ad1..ad6156b 100755
--- a/configure
+++ b/configure
@@ -2021,6 +2021,24 @@ EOF
 # Xen unstable
 elif
 cat > $TMPC <
+int main(void) {
+  xenforeignmemory_handle *xfmem;
+
+  xfmem = xenforeignmemory_open(0, 0);
+  xenforeignmemory_map2(xfmem, 0, 0, 0, 0, 0, 0, 0);
+
+  return 0;
+}
+EOF
+compile_prog "" "$xen_libs -lxendevicemodel $xen_stable_libs"
+  then
+  xen_stable_libs="-lxendevicemodel $xen_stable_libs"
+  xen_ctrl_version=41000
+  xen=yes
+elif
+cat > $TMPC <
diff --git a/hw/i386/xen/xen-mapcache.c b/hw/i386/xen/xen-mapcache.c
index cd4e746..a988be7 100644
--- a/hw/i386/xen/xen-mapcache.c
+++ b/hw/i386/xen/xen-mapcache.c
@@ -151,6 +151,7 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, void 
*opaque)
 }
 
 static void xen_remap_bucket(MapCacheEntry *entry,
+ void *vaddr,
  hwaddr size,
  hwaddr address_index,
  bool dummy)
@@ -167,7 +168,9 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 err = g_malloc0(nb_pfn * sizeof (int));
 
 if (entry->vaddr_base != NULL) {
-ram_block_notify_remove(entry->vaddr_base, entry->size);
+if (entry->vaddr_base != vaddr) {
+ram_block_notify_remove(entry->vaddr_base, entry->size);
+}
 if (munmap(entry->vaddr_base, entry->size) != 0) {
 perror("unmap fails");
 exit(-1);
@@ -181,11 +184,11 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 }
 
 if (!dummy) {
-vaddr_base = xenforeignmemory_map(xen_fmem, xen_domid,
-   PROT_READ|PROT_WRITE,
+vaddr_base = xenforeignmemory_map2(xen_fmem, xen_domid, vaddr,
+   PROT_READ|PROT_WRITE, 0,
nb_pfn, pfns, err);
 if (vaddr_base == NULL) {
-perror("xenforeignmemory_map");
+perror("xenforeignmemory_map2");
 exit(-1);
 }
 entry->flags &= ~(XEN_MAPCACHE_ENTRY_DUMMY);
@@ -194,7 +197,7 @@ static void xen_remap_bucket(MapCacheEntry *entry,
  * We create dummy mappings where we are unable to create a foreign
  * mapping immediately due to certain circumstances (i.e. on resume 
now)
  */
-vaddr_base = mmap(NULL, size, PROT_READ|PROT_WRITE,
+vaddr_base = mmap(vaddr, size, PROT_READ|PROT_WRITE,
   MAP_ANON|MAP_SHARED, -1, 0);
 if (vaddr_base == NULL) {
 perror("mmap");
@@ -203,13 +206,16 @@ static void xen_remap_bucket(MapCacheEntry *entry,
 entry->flags |= XEN_MAPCACHE_ENTRY_DUMMY;
 }
 
+if (entry->vaddr_base == NULL || entry->vaddr_base != vaddr) {
+ram_block_notify_add(vaddr_base, size);
+}
+
 entry->vaddr_base = vaddr_base;
 entry->paddr_index = address_index;
 entry->size = size;
 entry->valid_mapping = (unsigned long *) g_malloc0(sizeof(unsigned long) *
 BITS_TO_LONGS(size >> XC_PAGE_SHIFT));
 
-ram_block_notify_add(entry->vaddr_base, entry->size);
 bitmap_zero(entry->valid_mapping, nb_pfn);
 for (i = 0; i < nb_pfn; i++) {
 if (!err[i]) {
@@ -282,14 +288,14 @@ tryagain:
 if (!entry) {
 entry = g_malloc0(sizeof (MapCacheEntry));
 pentry->next = entry;
-xen_remap_bucket(entry, cache_size, address_index, dummy);
+xen_remap_bucket(entry, NULL, cache_size, address_index, dummy);
 } else if (!entry->lock) {
 if (!entry->vaddr_base || entry->paddr_index != address_index ||
 entry->size != cache_size ||
 !test_bits(address_offset >> XC_PAGE_SHIFT,
 test_bit_size >> XC_PAGE_SHIFT,
 entry->valid_mapping)) {
-xen_remap_bucket(entry, cache_size, address_index, dummy);
+xen_remap_bucket(entry, NULL, cache_size, address_index, dummy);
 }
 }
 
@@ -486,3 +492,60 @@ void xen_invalidate_map_cache(void)
 
 mapcache_unlock();
 }
+
+static uint8_t *xen_replace_cache_entry_unlocked

  1   2   3   4   >