date:20160525

Re: [Qemu-devel] [Qemu-ppc] [PATCH] MAINTAINERS: Add David Gibson as ppc maintainer

2016-05-25 Thread Mark Cave-Ayland

On 26/05/16 07:16, David Gibson wrote:

> I've been de facto co-maintainer of all ppc target related code for some
> time.  Alex Graf is working on other things and doesn't have a whole lot of
> time for qemu ppc maintainership.  So, update the MAINTAINERS file to
> reflect this.
> 
> Signed-off-by: David Gibson 
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 81e7fac..012a99b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -165,6 +165,7 @@ F: hw/openrisc/
>  F: tests/tcg/openrisc/
>  
>  PowerPC
> +M: David Gibson 
>  M: Alexander Graf 
>  L: qemu-...@nongnu.org
>  S: Maintained

Excellent news!

Acked-by: Mark Cave-Ayland 


ATB,

Mark.

Re: [Qemu-devel] [for-2.7 PATCH v3 00/15] Core based CPU hotplug for PowerPC sPAPR

2016-05-25 Thread David Gibson

On Thu, May 12, 2016 at 09:18:10AM +0530, Bharata B Rao wrote:
> Hi,
> 
> This is v3 of "Core based CPU hotplug for PowerPC sPAPR". The hotplug
> semantics look like this:
> 
> (qemu) device_add POWER8E-spapr-cpu-core,id=core2,core=16[,threads=4]
> (qemu) device_add POWER8E_v2.1-spapr-cpu-core,id=core2,core=16[,threads=4]
> 
> Changes in v3
> -------------
> - Moved CPU ObjectClass pointer from sPAPR specific CPU core type to
>   its parent type, the abstract sPAPR CPU core type. This largely reduces
>   the use of macros.
> - Including Igor's QMP query-hotpluggable-cpus patches in this series.
> - Added HMP version for query-hotpluggable-cpus.
> - Added a patch to prevent QEMU crash due to DRC detach racing against attach.
> - Addressed miscellaneous review comments from previous post.
> 
> v2.1: https://lists.gnu.org/archive/html/qemu-ppc/2016-03/msg00649.html
> 
> Bharata B Rao (11):
>   exec: Remove cpu from cpus list during cpu_exec_exit()
>   exec: Do vmstate unregistration from cpu_exec_exit()
>   cpu: Add a sync version of cpu_remove()
>   cpu: Abstract CPU core type
>   spapr: Abstract CPU core device and type specific core devices
>   spapr: convert boot CPUs into CPU core devices
>   spapr: CPU hotplug support
>   xics,xics_kvm: Handle CPU unplug correctly
>   spapr_drc: Prevent detach racing against attach for CPU DR
>   spapr: CPU hot unplug support
>   hmp: Add 'info hotpluggable-cpus' HMP command

Paolo,

Patches 1-4 of this series stand on their own, are AFAICT in your
bailiwick, and I think they're ready to go.  Are you ready to merge
these yet?


Igor,

Patches 5-6 are not ppc specific.  Do you think they're ready to go as
a first step towards new cpu hotplug infrastructure?

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH] MAINTAINERS: Add David Gibson as ppc maintainer

2016-05-25 Thread David Gibson

On Thu, May 26, 2016 at 08:18:26AM +0200, Alexander Graf wrote:
> 
> 
> > On 26.05.2016 at 08:16, David Gibson wrote:
> > 
> > I've been de facto co-maintainer of all ppc target related code for some
> > time.  Alex Graf is working on other things and doesn't have a whole lot of
> > time for qemu ppc maintainership.  So, update the MAINTAINERS file to
> > reflect this.
> > 
> > Signed-off-by: David Gibson 
> 
> Reviewed-by: Alexander Graf 

Peter, do you want to merge this yourself, or should I bundle it in
with my next ppc pull request?

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



[Qemu-devel] [PATCH 1/2] raw-posix: Fetch max sectors for host block device from sysfs

2016-05-25 Thread Fam Zheng

The max_sectors_kb limit that the host exposes in sysfs is sometimes a
useful value that we should take into account.

Signed-off-by: Fam Zheng 
---
 block/raw-posix.c | 47 +++
 1 file changed, 47 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index a4f5a1b..d3796ad 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -729,9 +729,56 @@ static void raw_reopen_abort(BDRVReopenState *state)
 state->opaque = NULL;
 }
 
+static int hdev_get_max_transfer_length(dev_t dev)
+{
+int ret;
+int fd;
+char *path;
+const char *end;
+char buf[32];
+long len;
+
+path = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_sectors_kb",
+   major(dev), minor(dev));
+fd = open(path, O_RDONLY);
+if (fd < 0) {
+ret = -errno;
+goto out;
+}
+ret = read(fd, buf, sizeof(buf) - 1);
+if (ret < 0) {
+ret = -errno;
+goto out_close;
+} else if (ret == 0) {
+ret = -EIO;
+goto out_close;
+}
+buf[ret] = 0;
+/* The file ends with '\n'; pass 'end' so that is accepted. */
+ret = qemu_strtol(buf, &end, 10, &len);
+if (ret == 0 && end && *end == '\n') {
+ret = len * 1024 / BDRV_SECTOR_SIZE;
+}
+out_close:
+close(fd);
+out:
+g_free(path);
+return ret;
+}
+
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
+struct stat st;
+
+if (!fstat(s->fd, &st)) {
+if (S_ISBLK(st.st_mode)) {
+int ret = hdev_get_max_transfer_length(st.st_rdev);
+if (ret >= 0) {
+bs->bl.max_transfer_length = ret;
+}
+}
+}
 
 raw_probe_alignment(bs, s->fd, errp);
 bs->bl.min_mem_alignment = s->buf_align;
-- 
2.8.2
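
For illustration, the text-to-sectors conversion done in the patch above can
be sketched as a standalone C function. This is only a sketch under stated
assumptions: the name max_sectors_from_sysfs_text and the SECTOR_SIZE constant
are hypothetical (the latter stands in for BDRV_SECTOR_SIZE, assumed here to
be 512), and plain strtol() is used in place of qemu_strtol().

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Assumption: stands in for QEMU's BDRV_SECTOR_SIZE. */
#define SECTOR_SIZE 512

/*
 * Hypothetical helper: convert the text of a max_sectors_kb sysfs file
 * (for example "120\n") into a count of SECTOR_SIZE-byte sectors.
 * Returns 0 on success, or a negative errno value on bad input.
 */
static int max_sectors_from_sysfs_text(const char *text, long *sectors)
{
    char *end = NULL;
    long kb;

    assert(text != NULL && sectors != NULL);

    errno = 0;
    kb = strtol(text, &end, 10);
    if (errno != 0) {
        return -errno;
    }
    /* Require at least one digit, optionally followed by one newline. */
    if (end == text) {
        return -EINVAL;
    }
    if (*end == '\n') {
        end++;
    }
    if (*end != '\0') {
        return -EINVAL;
    }
    /* Reject negative values and values that would overflow below. */
    if (kb < 0 || kb > LONG_MAX / 1024) {
        return -ERANGE;
    }
    *sectors = kb * 1024 / SECTOR_SIZE;
    return 0;
}
```

With the 120KB host limit mentioned in the 0/2 cover letter, an input of
"120\n" gives 240 sectors.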

Re: [Qemu-devel] [PATCH v6 for-2.7 00/28] Convert migration to QIOChannel & support

2016-05-25 Thread Amit Shah

On (Wed) 27 Apr 2016 [11:04:50], Daniel P. Berrange wrote:
> This is an update of patches that were previously posted
> 
>   FYI: https://lists.gnu.org/archive/html/qemu-devel/2015-09/msg00829.html
>v1: https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg01914.html
>v2: https://lists.gnu.org/archive/html/qemu-devel/2016-02/msg03509.html
>v3: https://lists.gnu.org/archive/html/qemu-devel/2016-02/msg06279.html
>v4: https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg02769.html
>v5: https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg04591.html
> 
> There are no functional changes since v5 posting. This is just a
> rebase to resolve conflicts against master. From my POV this is
> ready for 2.7.

Thanks, I've applied this series, and sent a pull req.

I've rebased it to master, which caused a few patches to change (but
not significantly).  The main changes were the dropping of x- in
autoconverge, and the change from qjson.c to migration/qjson.c in
Makefiles.

I've also updated the 'Since 2.6' to 'Since 2.7' in the qapi
descriptions.

Each patch compiles fine, and the diffstat is the same (except for one
patch in migration.c, which did not need wrapping after the x- drop).

The only outstanding request I see is from Juan and me to consider
renaming of the functions that start migrations; that can be addressed
in a follow-up series.

Dan, can you also fill out a feature page (or two) for this series
based on the template here:

http://qemu-project.org/NewFeatureTemplate

That'll help with testing efforts.  I've seen the extensive
documentation on your blog; links to it would be great.

Thanks!

Amit

Re: [Qemu-devel] [PATCH V2] block/io: optimize bdrv_co_pwritev for small requests

2016-05-25 Thread Fam Zheng

On Tue, 05/24 16:30, Peter Lieven wrote:
> in a read-modify-write cycle a small request might cause
> head and tail to fall into the same aligned block. Currently
> QEMU reads the same block twice in this case which is
> not necessary.
> 
> Signed-off-by: Peter Lieven 

Thanks, applied to my block branch:

https://github.com/famz/qemu/tree/block
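
The case described in the quoted commit message can be sketched as a small
standalone C function. This is a model of the idea only, not the
bdrv_co_pwritev code: the function name rmw_block_reads and its parameters are
hypothetical. It counts how many aligned blocks have to be read before an
unaligned write, and reads the block only once when head and tail share it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of aligned blocks that must be read before
 * writing 'bytes' bytes at 'offset' with a request alignment of 'align'.
 */
static int rmw_block_reads(uint64_t offset, uint64_t bytes, uint64_t align)
{
    uint64_t head, end, tail;
    bool need_head, need_tail;

    assert(align > 0 && bytes > 0);

    head = offset % align;       /* bytes missing before the request */
    end = offset + bytes;
    tail = end % align;          /* bytes used in the last block */
    need_head = head != 0;
    need_tail = tail != 0;

    if (need_head && need_tail) {
        uint64_t head_block = offset - head;  /* start of first block */
        uint64_t tail_block = end - tail;     /* start of last block */

        /* A small request can have head and tail in the same block. */
        return head_block == tail_block ? 1 : 2;
    }
    return (need_head || need_tail) ? 1 : 0;
}
```

For a 100-byte write at offset 512 with 4096-byte alignment, head and tail
both land in the block starting at 0, so one read is enough.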

[Qemu-devel] [PATCH 0/2] block: Expose host block dev I/O size limit in scsi-block/scsi-generic

2016-05-25 Thread Fam Zheng

Host devices passed through as scsi-block or scsi-generic may have a compound
maximum I/O limit (arising from physical hardware limits, driver quirks, file
system configuration, etc.) that is presented in the sysfs entry. The SG_IOs we
issue should respect this. However, the value is currently not discoverable by
the guest, because the vHBA (virtio-scsi) presents a fixed emulated limit,
while the INQUIRY (vpd=0xb0, block limits page) response speaks solely for the
LUN itself, not the host kernel. The issue is observed on scsi passthrough of
host usb or dm-multipath disks, and it is not specific to certain device types.

The proposed solution is to collect the host sysfs limit in the raw-posix driver
when a block device is used, and to intercept the INQUIRY response to clip the
max xfer len field.

This fixes booting a SanDisk usb-key with an upstream kernel.  The usb disk
reports a nonsensically large value in INQUIRY, while the host (usb quirk?) only
allows 120KB.


Fam Zheng (2):
  raw-posix: Fetch max sectors for host block device from sysfs
  scsi-generic: Merge block max xfer len in INQUIRY response

 block/raw-posix.c  | 47 +++
 hw/scsi/scsi-generic.c | 12 
 2 files changed, 59 insertions(+)

-- 
2.8.2

[Qemu-devel] [PATCH 2/2] scsi-generic: Merge block max xfer len in INQUIRY response

2016-05-25 Thread Fam Zheng

The rationale is similar to the mode sense response interception above:
this is practically the only channel to communicate constraints from
elsewhere, such as the host and the block driver.

The scsi bus we attach onto can have a larger max xfer len than what is
accepted by the host file system (guarding between the host scsi LUN and
QEMU), in which case the SG_IO we generate would get -EINVAL.

Signed-off-by: Fam Zheng 
---
 hw/scsi/scsi-generic.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 7459465..71372a8 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -222,6 +222,18 @@ static void scsi_read_complete(void * opaque, int ret)
 r->buf[3] |= 0x80;
 }
 }
+if (s->type == TYPE_DISK &&
+r->req.cmd.buf[0] == INQUIRY &&
+r->req.cmd.buf[2] == 0xb0) {
+uint32_t max_xfer_len = blk_get_max_transfer_length(s->conf.blk);
+if (max_xfer_len) {
+stl_be_p(&r->buf[8], max_xfer_len);
+/* Also take care of the opt xfer len. */
+if (ldl_be_p(&r->buf[12]) > max_xfer_len) {
+stl_be_p(&r->buf[12], max_xfer_len);
+}
+}
+}
 scsi_req_data(&r->req, len);
 scsi_req_unref(&r->req);
 }
-- 
2.8.2
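
The field handling in the patch above can be sketched in standalone C. This is
a sketch under assumptions, not the patch itself: load_be32, store_be32 and
clamp_block_limits are hypothetical stand-ins for ldl_be_p, stl_be_p and the
inline logic, and the offsets (8 for max xfer len, 12 for opt xfer len) are
taken from the patch. One deliberate difference: the patch stores the host
value unconditionally, while this sketch keeps a smaller nonzero value that
the device itself reported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian value from p. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Store v at p as a 32-bit big-endian value. */
static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Hypothetical helper: clip the transfer length fields of a block limits
 * page (vpd=0xb0) to host_max. A host_max of 0 means "no host limit".
 */
static void clamp_block_limits(uint8_t *page, size_t len, uint32_t host_max)
{
    uint32_t dev_max;

    assert(page != NULL);
    if (host_max == 0 || len < 16) {
        return;
    }
    /* Max xfer len: a device value of 0 is treated as "not reported". */
    dev_max = load_be32(&page[8]);
    if (dev_max == 0 || dev_max > host_max) {
        store_be32(&page[8], host_max);
    }
    /* Also take care of the opt xfer len. */
    if (load_be32(&page[12]) > host_max) {
        store_be32(&page[12], host_max);
    }
}
```

A device that reports 0xffffffff for max xfer len is clipped to the host
value, while a device that already reports something smaller is left alone.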

[Qemu-devel] [PULL 25/28] migration: define 'tls-creds' and 'tls-hostname' migration parameters

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

Define two new migration parameters to be used with TLS encryption.
The 'tls-creds' parameter provides the ID of an instance of the
'tls-creds' object type, or rather a subclass such as 'tls-creds-x509'.
Providing these credentials will enable use of TLS on the migration
data stream.

If using x509 certificates, together with a migration URI that does
not include a hostname, the 'tls-hostname' parameter provides the
hostname to use when verifying the server's x509 certificate. This
allows TLS to be used in combination with fd: and exec: protocols
where a TCP connection is established by a 3rd party outside of
QEMU.

NB, this requires changing the migrate_set_parameter method in the
HMP to accept a 's' (string) value instead of 'i' (integer). This
is backwards compatible, because the parsing of strings allows the
quotes to be optional, thus any integer is also a valid string.
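
The compatibility argument can be sketched in standalone C: the value always
arrives as a string, and it is converted to an integer only for parameters
that need one. This is a minimal sketch, assuming hypothetical names
(param_kind, param_value, parse_param) that are not part of the HMP code, and
using plain strtol() rather than qemu_strtol().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical parameter kinds; not QEMU's MigrationParameter enum. */
enum param_kind { PARAM_INT, PARAM_STR };

struct param_value {
    long int_value;         /* set when the kind is PARAM_INT */
    const char *str_value;  /* set when the kind is PARAM_STR */
};

/*
 * Hypothetical helper: interpret valuestr according to kind.
 * Returns 0 on success, or a negative errno value if an integer
 * parameter was given a value that is not an integer.
 */
static int parse_param(enum param_kind kind, const char *valuestr,
                       struct param_value *out)
{
    char *end = NULL;
    long v;

    assert(valuestr != NULL && out != NULL);

    if (kind == PARAM_STR) {
        /* Any text is acceptable, including text made of digits. */
        out->str_value = valuestr;
        return 0;
    }

    errno = 0;
    v = strtol(valuestr, &end, 10);
    if (errno != 0) {
        return -errno;
    }
    if (end == valuestr || *end != '\0') {
        return -EINVAL;
    }
    out->int_value = v;
    return 0;
}
```

An existing command line that passes "9" for an integer parameter still
works, and a value such as "tls0" is rejected only when an integer is needed.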

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Message-Id: <1461751518-12128-26-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 hmp-commands.hx   |  2 +-
 hmp.c | 44 --
 migration/migration.c | 14 +
 qapi-schema.json  | 58 ---
 4 files changed, 108 insertions(+), 10 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 4f4f60a..98b4b1a 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1008,7 +1008,7 @@ ETEXI
 
 {
 .name   = "migrate_set_parameter",
-.args_type  = "parameter:s,value:i",
+.args_type  = "parameter:s,value:s",
 .params = "parameter value",
 .help   = "Set the parameter for migration",
 .mhandler.cmd = hmp_migrate_set_parameter,
diff --git a/hmp.c b/hmp.c
index a464ca9..a4b1d3d 100644
--- a/hmp.c
+++ b/hmp.c
@@ -294,6 +294,12 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, " %s: %" PRId64,
 MigrationParameter_lookup[MIGRATION_PARAMETER_CPU_THROTTLE_INCREMENT],
 params->cpu_throttle_increment);
+monitor_printf(mon, " %s: '%s'",
+MigrationParameter_lookup[MIGRATION_PARAMETER_TLS_CREDS],
+params->tls_creds ? : "");
+monitor_printf(mon, " %s: '%s'",
+MigrationParameter_lookup[MIGRATION_PARAMETER_TLS_HOSTNAME],
+params->tls_hostname ? : "");
 monitor_printf(mon, "\n");
 }
 
@@ -1243,13 +1249,17 @@ void hmp_migrate_set_capability(Monitor *mon, const QDict *qdict)
 void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
 {
 const char *param = qdict_get_str(qdict, "parameter");
-int value = qdict_get_int(qdict, "value");
+const char *valuestr = qdict_get_str(qdict, "value");
+long valueint = 0;
 Error *err = NULL;
 bool has_compress_level = false;
 bool has_compress_threads = false;
 bool has_decompress_threads = false;
 bool has_cpu_throttle_initial = false;
 bool has_cpu_throttle_increment = false;
+bool has_tls_creds = false;
+bool has_tls_hostname = false;
+bool use_int_value = false;
 int i;
 
 for (i = 0; i < MIGRATION_PARAMETER__MAX; i++) {
@@ -1257,25 +1267,46 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
 switch (i) {
 case MIGRATION_PARAMETER_COMPRESS_LEVEL:
 has_compress_level = true;
+use_int_value = true;
 break;
 case MIGRATION_PARAMETER_COMPRESS_THREADS:
 has_compress_threads = true;
+use_int_value = true;
 break;
 case MIGRATION_PARAMETER_DECOMPRESS_THREADS:
 has_decompress_threads = true;
+use_int_value = true;
 break;
 case MIGRATION_PARAMETER_CPU_THROTTLE_INITIAL:
 has_cpu_throttle_initial = true;
+use_int_value = true;
 break;
 case MIGRATION_PARAMETER_CPU_THROTTLE_INCREMENT:
 has_cpu_throttle_increment = true;
 break;
+case MIGRATION_PARAMETER_TLS_CREDS:
+has_tls_creds = true;
+break;
+case MIGRATION_PARAMETER_TLS_HOSTNAME:
+has_tls_hostname = true;
+break;
+}
+
+if (use_int_value) {
+if (qemu_strtol(valuestr, NULL, 10, &valueint) < 0) {
+error_setg(&err, "Unable to parse '%s' as an int",
+   valuestr);
+goto cleanup;
+}
 }
-qmp_migrate_set_parameters(has_compress_level, value,
-   has_compress_threads, value,
-   has_decompress_threads, value,
-   has_cpu_throttle_initi

[Qemu-devel] [PATCH] MAINTAINERS: Add David Gibson as ppc maintainer

2016-05-25 Thread David Gibson

I've been de facto co-maintainer of all ppc target related code for some
time.  Alex Graf is working on other things and doesn't have a whole lot of
time for qemu ppc maintainership.  So, update the MAINTAINERS file to
reflect this.

Signed-off-by: David Gibson 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 81e7fac..012a99b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -165,6 +165,7 @@ F: hw/openrisc/
 F: tests/tcg/openrisc/
 
 PowerPC
+M: David Gibson 
 M: Alexander Graf 
 L: qemu-...@nongnu.org
 S: Maintained
-- 
2.5.5

[Qemu-devel] [PULL 21/28] migration: delete QEMUFile sockets implementation

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

Now that the tcp, unix and fd migration backends have converted
to use the QIOChannel based QEMUFile, there is no user remaining
for the sockets based QEMUFile impl and it can be deleted.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Message-Id: <1461751518-12128-22-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 include/migration/qemu-file.h |   2 -
 migration/Makefile.objs   |   2 +-
 migration/qemu-file-unix.c| 323 --
 3 files changed, 1 insertion(+), 326 deletions(-)
 delete mode 100644 migration/qemu-file-unix.c

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index edaf598..ba5fe08 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -128,8 +128,6 @@ typedef struct QEMUFileHooks {
 
 QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops);
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
-QEMUFile *qemu_fdopen(int fd, const char *mode);
-QEMUFile *qemu_fopen_socket(int fd, const char *mode);
 QEMUFile *qemu_fopen_channel_input(QIOChannel *ioc);
 QEMUFile *qemu_fopen_channel_output(QIOChannel *ioc);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 9e977a4..5260048 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,6 +1,6 @@
 common-obj-y += migration.o socket.o fd.o exec.o
 common-obj-y += vmstate.o
-common-obj-y += qemu-file.o qemu-file-unix.o qemu-file-stdio.o
+common-obj-y += qemu-file.o qemu-file-stdio.o
 common-obj-y += qemu-file-channel.o
 common-obj-y += xbzrle.o postcopy-ram.o
 common-obj-y += qjson.o
diff --git a/migration/qemu-file-unix.c b/migration/qemu-file-unix.c
deleted file mode 100644
index 4474e18..0000000
--- a/migration/qemu-file-unix.c
+++ /dev/null
@@ -1,323 +0,0 @@
-/*
- * QEMU System Emulator
- *
- * Copyright (c) 2003-2008 Fabrice Bellard
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-#include "qemu/osdep.h"
-#include "qemu-common.h"
-#include "qemu/error-report.h"
-#include "qemu/iov.h"
-#include "qemu/sockets.h"
-#include "qemu/coroutine.h"
-#include "migration/qemu-file.h"
-#include "migration/qemu-file-internal.h"
-
-typedef struct QEMUFileSocket {
-int fd;
-QEMUFile *file;
-} QEMUFileSocket;
-
-static ssize_t socket_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
-int64_t pos)
-{
-QEMUFileSocket *s = opaque;
-ssize_t len;
-ssize_t size = iov_size(iov, iovcnt);
-ssize_t offset = 0;
-int err;
-
-while (size > 0) {
-len = iov_send(s->fd, iov, iovcnt, offset, size);
-
-if (len > 0) {
-size -= len;
-offset += len;
-}
-
-if (size > 0) {
-if (errno != EAGAIN && errno != EWOULDBLOCK) {
-error_report("socket_writev_buffer: Got err=%d for (%zu/%zu)",
- errno, (size_t)size, (size_t)len);
-/*
- * If I've already sent some but only just got the error, I
- * could return the amount validly sent so far and wait for the
- * next call to report the error, but I'd rather flag the error
- * immediately.
- */
-return -errno;
-}
-
-/* Emulate blocking */
-GPollFD pfd;
-
-pfd.fd = s->fd;
-pfd.events = G_IO_OUT | G_IO_ERR;
-pfd.revents = 0;
-TFR(err = g_poll(&pfd, 1, -1 /* no timeout */));
-/* Errors other than EINTR intentionally ignored */
-}
- }
-
-return offset;
-}
-
-static int socket_get_fd(void *opaque)
-{
-QEMUFileSocket *s = opaque;
-
-return s->fd;
-}
-
-static ssize_t socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
-

[Qemu-devel] [PULL 26/28] migration: add support for encrypting data with TLS

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

This extends the migration_set_incoming_channel and
migration_set_outgoing_channel methods so that they
will automatically wrap the QIOChannel in a
QIOChannelTLS instance if TLS credentials are configured
in the migration parameters.

This allows TLS to work for tcp, unix, fd and exec
migration protocols. It does not (currently) work for
RDMA since it does not use these APIs, but it is
unlikely that TLS would be desired with RDMA anyway
since it would degrade the performance to that seen
with TCP, defeating the purpose of using RDMA.

On the target host, QEMU would be launched with a set
of TLS credentials for a server endpoint

 $ qemu-system-x86_64 -monitor stdio -incoming defer \
-object tls-creds-x509,dir=/home/berrange/security/qemutls,endpoint=server,id=tls0 \
...other args...

To enable incoming TLS migration, two monitor commands are
then used:

  (qemu) migrate_set_parameter tls-creds tls0
  (qemu) migrate_incoming tcp:myhostname:9000

On the source host, QEMU is launched in a similar
manner but using client endpoint credentials

 $ qemu-system-x86_64 -monitor stdio \
-object tls-creds-x509,dir=/home/berrange/security/qemutls,endpoint=client,id=tls0 \
...other args...

To enable outgoing TLS migration, two monitor commands are
then used:

  (qemu) migrate_set_parameter tls-creds tls0
  (qemu) migrate tcp:otherhostname:9000

Thanks to earlier improvements to error reporting,
TLS errors can be seen in 'info migrate' when doing a
detached migration. For example:

  (qemu) info migrate
  capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off x-postcopy-ram: off
  Migration status: failed
  total time: 0 milliseconds
  error description: TLS handshake failed: The TLS connection was non-properly terminated.

Or

  (qemu) info migrate
  capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off x-postcopy-ram: off
  Migration status: failed
  total time: 0 milliseconds
  error description: Certificate does not match the hostname localhost

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Message-Id: <1461751518-12128-27-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 include/migration/migration.h |  12 +++-
 migration/Makefile.objs   |   1 +
 migration/exec.c  |   2 +-
 migration/fd.c|   2 +-
 migration/migration.c |  40 +--
 migration/socket.c|  34 +++--
 migration/tls.c   | 161 ++
 trace-events  |  12 +++-
 8 files changed, 247 insertions(+), 17 deletions(-)
 create mode 100644 migration/tls.c

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 74105a1..13b12b7 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -188,8 +188,18 @@ void qemu_start_incoming_migration(const char *uri, Error **errp);
 void migration_set_incoming_channel(MigrationState *s,
 QIOChannel *ioc);
 
+void migration_tls_set_incoming_channel(MigrationState *s,
+QIOChannel *ioc,
+Error **errp);
+
 void migration_set_outgoing_channel(MigrationState *s,
-QIOChannel *ioc);
+QIOChannel *ioc,
+const char *hostname);
+
+void migration_tls_set_outgoing_channel(MigrationState *s,
+QIOChannel *ioc,
+const char *hostname,
+Error **errp);
 
 uint64_t migrate_max_downtime(void);
 
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 7e1bec2..30ad945 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,4 +1,5 @@
 common-obj-y += migration.o socket.o fd.o exec.o
+common-obj-y += tls.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o
 common-obj-y += qemu-file-channel.o
diff --git a/migration/exec.c b/migration/exec.c
index c825e27..1515cc3 100644
--- a/migration/exec.c
+++ b/migration/exec.c
@@ -38,7 +38,7 @@ void exec_start_outgoing_migration(MigrationState *s, const char *command, Error
 return;
 }
 
-migration_set_outgoing_channel(s, ioc);
+migration_set_outgoing_channel(s, ioc, NULL);
 object_unref(OBJECT(ioc));
 }
 
diff --git a/migration/fd.c b/migration/fd.c
index 60a75b8..fc5c9ee 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -38,7 +38,7 @@ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **
 return;
 }
 
-migration_set_outgoing_channel(s, ioc);
+migration_set_outgoing_channel(s, ioc, NULL);
 object_unref(OBJECT(ioc));
 }
 
diff --git a/migration/migration.c b/migration/migration.c
index 9b037ab..7ecbade 1

[Qemu-devel] [PULL 28/28] migration: remove qemu_get_fd method from QEMUFile

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

Now that there is a set_blocking callback in QEMUFileOps,
and all users needing non-blocking support have been
converted to QIOChannel, there is no longer any codepath
requiring the qemu_get_fd() method for QEMUFile. Remove it
to avoid further code being introduced with an expectation
of direct file handle access.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Message-Id: <1461751518-12128-29-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 include/migration/qemu-file.h |  1 -
 migration/qemu-file.c | 14 --
 2 files changed, 15 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 36af5f4..2409a98 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -103,7 +103,6 @@ typedef int (QEMUFileShutdownFunc)(void *opaque, bool rd, bool wr);
 typedef struct QEMUFileOps {
 QEMUFileGetBufferFunc *get_buffer;
 QEMUFileCloseFunc *close;
-QEMUFileGetFD *get_fd;
 QEMUFileSetBlocking *set_blocking;
 QEMUFileWritevBufferFunc *writev_buffer;
 QEMURetPathFunc *get_return_path;
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 6790040..8aea1c7 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -268,14 +268,6 @@ static ssize_t qemu_fill_buffer(QEMUFile *f)
 return len;
 }
 
-int qemu_get_fd(QEMUFile *f)
-{
-if (f->ops->get_fd) {
-return f->ops->get_fd(f->opaque);
-}
-return -1;
-}
-
 void qemu_update_position(QEMUFile *f, size_t size)
 {
 f->pos += size;
@@ -688,11 +680,5 @@ void qemu_file_set_blocking(QEMUFile *f, bool block)
 {
 if (f->ops->set_blocking) {
 f->ops->set_blocking(f->opaque, block);
-} else {
-if (block) {
-qemu_set_block(qemu_get_fd(f));
-} else {
-qemu_set_nonblock(qemu_get_fd(f));
-}
 }
 }
-- 
2.5.5

[Qemu-devel] [PULL 18/28] migration: convert savevm to use QIOChannel for writing to files

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

Convert the exec savevm code to use QIOChannel and QEMUFileChannel,
instead of the stdio APIs.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Message-Id: <1461751518-12128-19-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 migration/savevm.c   |  8 +---
 tests/Makefile   |  4 ++--
 tests/test-vmstate.c | 11 ++-
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 43031a0..2bd3452 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -52,6 +52,7 @@
 #include "block/qapi.h"
 #include "qemu/cutils.h"
 #include "io/channel-buffer.h"
+#include "io/channel-file.h"
 
 #ifndef ETH_P_RARP
 #define ETH_P_RARP 0x8035
@@ -2046,6 +2047,7 @@ void hmp_savevm(Monitor *mon, const QDict *qdict)
 void qmp_xen_save_devices_state(const char *filename, Error **errp)
 {
 QEMUFile *f;
+QIOChannelFile *ioc;
 int saved_vm_running;
 int ret;
 
@@ -2053,11 +2055,11 @@ void qmp_xen_save_devices_state(const char *filename, Error **errp)
 vm_stop(RUN_STATE_SAVE_VM);
 global_state_store_running();
 
-f = qemu_fopen(filename, "wb");
-if (!f) {
-error_setg_file_open(errp, errno, filename);
+ioc = qio_channel_file_new_path(filename, O_WRONLY | O_CREAT, 0660, errp);
+if (!ioc) {
 goto the_end;
 }
+f = qemu_fopen_channel_output(QIO_CHANNEL(ioc));
 ret = qemu_save_device_state(f);
 qemu_fclose(f);
 if (ret < 0) {
diff --git a/tests/Makefile b/tests/Makefile
index b5cb75e..8397b96 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -439,8 +439,8 @@ tests/test-qdev-global-props$(EXESUF): tests/test-qdev-global-props.o \
$(test-qapi-obj-y)
 tests/test-vmstate$(EXESUF): tests/test-vmstate.o \
migration/vmstate.o migration/qemu-file.o \
-migration/qemu-file-unix.o migration/qjson.o \
-   $(test-qom-obj-y)
+migration/qemu-file-channel.o migration/qjson.o \
+   $(test-io-obj-y)
 tests/test-timed-average$(EXESUF): tests/test-timed-average.o qemu-timer.o \
$(test-util-obj-y)
 tests/test-base64$(EXESUF): tests/test-base64.o \
diff --git a/tests/test-vmstate.c b/tests/test-vmstate.c
index f337cf6..d19b16a 100644
--- a/tests/test-vmstate.c
+++ b/tests/test-vmstate.c
@@ -29,6 +29,7 @@
 #include "migration/migration.h"
 #include "migration/vmstate.h"
 #include "qemu/coroutine.h"
+#include "io/channel-file.h"
 
 static char temp_file[] = "/tmp/vmst.test.XXXXXX";
 static int temp_fd;
@@ -49,11 +50,17 @@ void yield_until_fd_readable(int fd)
 static QEMUFile *open_test_file(bool write)
 {
 int fd = dup(temp_fd);
+QIOChannel *ioc;
 lseek(fd, 0, SEEK_SET);
 if (write) {
 g_assert_cmpint(ftruncate(fd, 0), ==, 0);
 }
-return qemu_fdopen(fd, write ? "wb" : "rb");
+ioc = QIO_CHANNEL(qio_channel_file_new_fd(fd));
+if (write) {
+return qemu_fopen_channel_output(ioc);
+} else {
+return qemu_fopen_channel_input(ioc);
+}
 }
 
 #define SUCCESS(val) \
@@ -469,6 +476,8 @@ int main(int argc, char **argv)
 {
 temp_fd = mkstemp(temp_file);
 
+module_call_init(MODULE_INIT_QOM);
+
 g_test_init(&argc, &argv, NULL);
 g_test_add_func("/vmstate/simple/primitive", test_simple_primitive);
 g_test_add_func("/vmstate/versioned/load/v1", test_load_v1);
-- 
2.5.5

[Qemu-devel] [PULL 22/28] migration: delete QEMUFile stdio implementation

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

Now that the exec migration backend and savevm have converted
to use the QIOChannel based QEMUFile, there is no user remaining
for the stdio based QEMUFile impl and it can be deleted.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Message-Id: <1461751518-12128-23-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 include/migration/qemu-file.h |   2 -
 migration/Makefile.objs   |   2 +-
 migration/qemu-file-stdio.c   | 196 --
 3 files changed, 1 insertion(+), 199 deletions(-)
 delete mode 100644 migration/qemu-file-stdio.c

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index ba5fe08..43eba9b 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -127,10 +127,8 @@ typedef struct QEMUFileHooks {
 } QEMUFileHooks;
 
 QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops);
-QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fopen_channel_input(QIOChannel *ioc);
 QEMUFile *qemu_fopen_channel_output(QIOChannel *ioc);
-QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks);
 int qemu_get_fd(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 5260048..7e1bec2 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,6 +1,6 @@
 common-obj-y += migration.o socket.o fd.o exec.o
 common-obj-y += vmstate.o
-common-obj-y += qemu-file.o qemu-file-stdio.o
+common-obj-y += qemu-file.o
 common-obj-y += qemu-file-channel.o
 common-obj-y += xbzrle.o postcopy-ram.o
 common-obj-y += qjson.o
diff --git a/migration/qemu-file-stdio.c b/migration/qemu-file-stdio.c
deleted file mode 100644
index f402e8f..0000000
--- a/migration/qemu-file-stdio.c
+++ /dev/null
@@ -1,196 +0,0 @@
-/*
- * QEMU System Emulator
- *
- * Copyright (c) 2003-2008 Fabrice Bellard
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-#include "qemu/osdep.h"
-#include "qemu-common.h"
-#include "qemu/coroutine.h"
-#include "migration/qemu-file.h"
-
-typedef struct QEMUFileStdio {
-FILE *stdio_file;
-QEMUFile *file;
-} QEMUFileStdio;
-
-static int stdio_get_fd(void *opaque)
-{
-QEMUFileStdio *s = opaque;
-
-return fileno(s->stdio_file);
-}
-
-static ssize_t stdio_put_buffer(void *opaque, const uint8_t *buf, int64_t pos,
-size_t size)
-{
-QEMUFileStdio *s = opaque;
-size_t res;
-
-res = fwrite(buf, 1, size, s->stdio_file);
-
-if (res != size) {
-return -errno;
-}
-return res;
-}
-
-static ssize_t stdio_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
-size_t size)
-{
-QEMUFileStdio *s = opaque;
-FILE *fp = s->stdio_file;
-ssize_t bytes;
-
-for (;;) {
-clearerr(fp);
-bytes = fread(buf, 1, size, fp);
-if (bytes != 0 || !ferror(fp)) {
-break;
-}
-if (errno == EAGAIN) {
-yield_until_fd_readable(fileno(fp));
-} else if (errno != EINTR) {
-break;
-}
-}
-return bytes;
-}
-
-static int stdio_pclose(void *opaque)
-{
-QEMUFileStdio *s = opaque;
-int ret;
-ret = pclose(s->stdio_file);
-if (ret == -1) {
-ret = -errno;
-} else if (!WIFEXITED(ret) || WEXITSTATUS(ret) != 0) {
-/* close succeeded, but non-zero exit code: */
-ret = -EIO; /* fake errno value */
-}
-g_free(s);
-return ret;
-}
-
-static int stdio_fclose(void *opaque)
-{
-QEMUFileStdio *s = opaque;
-int ret = 0;
-
-if (qemu_file_is_writable(s->file)) {
-int fd = fileno(s->stdio_file);
-struct stat st;
-
-ret = fstat(fd, &st);
-if (ret == 0 && S_ISREG(st.st_mode)) {
-/*
- * If the file handle

[Qemu-devel] [PULL 27/28] migration: remove support for non-iovec based write handlers

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

All the remaining QEMUFile implementations provide an iovec
based write handler, so the put_buffer callback can be removed
to simplify the code.
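
To illustrate why a separate flat-buffer callback becomes redundant once every backend has an iovec-based handler, here is a minimal sketch. It is not the QEMU code: SketchSink, sketch_writev_buffer and sketch_put_buffer are hypothetical names, and the in-memory sink merely stands in for a real backend. The point is only that a contiguous write is the one-element case of a vectored write.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical in-memory sink; stands in for a real backend. */
typedef struct SketchSink {
    uint8_t data[256];
    size_t used;
} SketchSink;

/*
 * iovec-based write handler (hypothetical): copies each element in order.
 * Returns the number of bytes written, or -1 if the sink overflows
 * (elements before the overflowing one have already been copied).
 */
static ssize_t sketch_writev_buffer(void *opaque, const struct iovec *iov,
                                    int iovcnt)
{
    SketchSink *s = opaque;
    ssize_t total = 0;
    int i;

    for (i = 0; i < iovcnt; i++) {
        if (iov[i].iov_len > sizeof(s->data) - s->used) {
            return -1;
        }
        memcpy(s->data + s->used, iov[i].iov_base, iov[i].iov_len);
        s->used += iov[i].iov_len;
        total += (ssize_t)iov[i].iov_len;
    }
    return total;
}

/* A flat-buffer write expressed as a one-element vectored write. */
static ssize_t sketch_put_buffer(void *opaque, const uint8_t *buf, size_t size)
{
    struct iovec one = { .iov_base = (void *)buf, .iov_len = size };

    return sketch_writev_buffer(opaque, &one, 1);
}
```

With this shape, callers that only have a single buffer lose nothing when the flat callback goes away.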

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Message-Id: <1461751518-12128-28-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 include/migration/qemu-file.h |  9 -
 migration/qemu-file.c | 36 
 migration/savevm.c|  8 
 3 files changed, 8 insertions(+), 45 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 43eba9b..36af5f4 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -28,14 +28,6 @@
 #include "io/channel.h"
 
 
-/* This function writes a chunk of data to a file at the given position.
- * The pos argument can be ignored if the file is only being used for
- * streaming.  The handler must write all of the data or return a negative
- * errno value.
- */
-typedef ssize_t (QEMUFilePutBufferFunc)(void *opaque, const uint8_t *buf,
-int64_t pos, size_t size);
-
 /* Read a chunk of data from a file at the given position.  The pos argument
  * can be ignored if the file is only be used for streaming.  The number of
  * bytes actually read should be returned.
@@ -109,7 +101,6 @@ typedef QEMUFile *(QEMURetPathFunc)(void *opaque);
 typedef int (QEMUFileShutdownFunc)(void *opaque, bool rd, bool wr);
 
 typedef struct QEMUFileOps {
-QEMUFilePutBufferFunc *put_buffer;
 QEMUFileGetBufferFunc *get_buffer;
 QEMUFileCloseFunc *close;
 QEMUFileGetFD *get_fd;
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index cf743d1..6790040 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -129,7 +129,7 @@ void qemu_file_set_error(QEMUFile *f, int ret)
 
 bool qemu_file_is_writable(QEMUFile *f)
 {
-return f->ops->writev_buffer || f->ops->put_buffer;
+return f->ops->writev_buffer;
 }
 
 /**
@@ -148,16 +148,9 @@ void qemu_fflush(QEMUFile *f)
 return;
 }
 
-if (f->ops->writev_buffer) {
-if (f->iovcnt > 0) {
-expect = iov_size(f->iov, f->iovcnt);
-ret = f->ops->writev_buffer(f->opaque, f->iov, f->iovcnt, f->pos);
-}
-} else {
-if (f->buf_index > 0) {
-expect = f->buf_index;
-ret = f->ops->put_buffer(f->opaque, f->buf, f->pos, f->buf_index);
-}
+if (f->iovcnt > 0) {
+expect = iov_size(f->iov, f->iovcnt);
+ret = f->ops->writev_buffer(f->opaque, f->iov, f->iovcnt, f->pos);
 }
 
 if (ret >= 0) {
@@ -337,11 +330,6 @@ static void add_to_iovec(QEMUFile *f, const uint8_t *buf, size_t size)
 
 void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, size_t size)
 {
-if (!f->ops->writev_buffer) {
-qemu_put_buffer(f, buf, size);
-return;
-}
-
 if (f->last_error) {
 return;
 }
@@ -365,9 +353,7 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, size_t size)
 }
 memcpy(f->buf + f->buf_index, buf, l);
 f->bytes_xfer += l;
-if (f->ops->writev_buffer) {
-add_to_iovec(f, f->buf + f->buf_index, l);
-}
+add_to_iovec(f, f->buf + f->buf_index, l);
 f->buf_index += l;
 if (f->buf_index == IO_BUF_SIZE) {
 qemu_fflush(f);
@@ -388,9 +374,7 @@ void qemu_put_byte(QEMUFile *f, int v)
 
 f->buf[f->buf_index] = v;
 f->bytes_xfer++;
-if (f->ops->writev_buffer) {
-add_to_iovec(f, f->buf + f->buf_index, 1);
-}
+add_to_iovec(f, f->buf + f->buf_index, 1);
 f->buf_index++;
 if (f->buf_index == IO_BUF_SIZE) {
 qemu_fflush(f);
@@ -554,12 +538,8 @@ int64_t qemu_ftell_fast(QEMUFile *f)
 int64_t ret = f->pos;
 int i;
 
-if (f->ops->writev_buffer) {
-for (i = 0; i < f->iovcnt; i++) {
-ret += f->iov[i].iov_len;
-}
-} else {
-ret += f->buf_index;
+for (i = 0; i < f->iovcnt; i++) {
+ret += f->iov[i].iov_len;
 }
 
 return ret;
diff --git a/migration/savevm.c b/migration/savevm.c
index 2bd3452..6c21231 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -160,13 +160,6 @@ static ssize_t block_writev_buffer(void *opaque, struct iovec *iov, int iovcnt,
 return qiov.size;
 }
 
-static ssize_t block_put_buffer(void *opaque, const uint8_t *buf,
-int64_t pos, size_t size)
-{
-bdrv_save_vmstate(opaque, buf, pos, size);
-return size;
-}
-
 static ssize_t block_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
 size_t size)
 {
@@ -184,7 +177,6 @@ static const QEMUFileOps bdrv_read_ops = {
 };
 
 static const QEMUFileOps bdrv_write_ops = {
-.put_buffer = block_put_buffer,
 .writev_buffer  = block_writev_buffer,
 .close  = bdrv_fclose
 };
-- 
2.5.5

[Qemu-devel] [PULL 19/28] migration: delete QEMUFile buffer implementation

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

The qemu_bufopen() method is no longer used, so the memory
buffer based QEMUFile backend can be deleted entirely.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Message-Id: <1461751518-12128-20-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 include/migration/qemu-file.h |  6 ---
 migration/qemu-file-buf.c | 96 ---
 2 files changed, 102 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 0329ccc..6618d19 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -140,7 +140,6 @@ QEMUFile *qemu_fopen_socket(int fd, const char *mode);
 QEMUFile *qemu_fopen_channel_input(QIOChannel *ioc);
 QEMUFile *qemu_fopen_channel_output(QIOChannel *ioc);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
-QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input);
 void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks);
 int qemu_get_fd(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
@@ -166,11 +165,6 @@ ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf,
  off_t pos, size_t count);
 
 
-/*
- * For use on files opened with qemu_bufopen
- */
-const QEMUSizedBuffer *qemu_buf_get(QEMUFile *f);
-
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
 qemu_put_byte(f, (int)v);
diff --git a/migration/qemu-file-buf.c b/migration/qemu-file-buf.c
index 7b8e78e..668ab35 100644
--- a/migration/qemu-file-buf.c
+++ b/migration/qemu-file-buf.c
@@ -366,99 +366,3 @@ ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *source,
 
 return count;
 }
-
-typedef struct QEMUBuffer {
-QEMUSizedBuffer *qsb;
-QEMUFile *file;
-bool qsb_allocated;
-} QEMUBuffer;
-
-static ssize_t buf_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
-  size_t size)
-{
-QEMUBuffer *s = opaque;
-ssize_t len = qsb_get_length(s->qsb) - pos;
-
-if (len <= 0) {
-return 0;
-}
-
-if (len > size) {
-len = size;
-}
-return qsb_get_buffer(s->qsb, pos, len, buf);
-}
-
-static ssize_t buf_put_buffer(void *opaque, const uint8_t *buf,
-  int64_t pos, size_t size)
-{
-QEMUBuffer *s = opaque;
-
-return qsb_write_at(s->qsb, buf, pos, size);
-}
-
-static int buf_close(void *opaque)
-{
-QEMUBuffer *s = opaque;
-
-if (s->qsb_allocated) {
-qsb_free(s->qsb);
-}
-
-g_free(s);
-
-return 0;
-}
-
-const QEMUSizedBuffer *qemu_buf_get(QEMUFile *f)
-{
-QEMUBuffer *p;
-
-qemu_fflush(f);
-
-p = f->opaque;
-
-return p->qsb;
-}
-
-static const QEMUFileOps buf_read_ops = {
-.get_buffer = buf_get_buffer,
-.close =  buf_close,
-};
-
-static const QEMUFileOps buf_write_ops = {
-.put_buffer = buf_put_buffer,
-.close =  buf_close,
-};
-
-QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input)
-{
-QEMUBuffer *s;
-
-if (mode == NULL || (mode[0] != 'r' && mode[0] != 'w') ||
-mode[1] != '\0') {
-error_report("qemu_bufopen: Argument validity check failed");
-return NULL;
-}
-
-s = g_new0(QEMUBuffer, 1);
-s->qsb = input;
-
-if (s->qsb == NULL) {
-s->qsb = qsb_create(NULL, 0);
-s->qsb_allocated = true;
-}
-if (!s->qsb) {
-g_free(s);
-error_report("qemu_bufopen: qsb_create failed");
-return NULL;
-}
-
-
-if (mode[0] == 'r') {
-s->file = qemu_fopen_ops(s, &buf_read_ops);
-} else {
-s->file = qemu_fopen_ops(s, &buf_write_ops);
-}
-return s->file;
-}
-- 
2.5.5

[Qemu-devel] [PULL 12/28] migration: convert unix socket protocol to use QIOChannel

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

Convert the unix socket migration protocol driver to use
QIOChannel and QEMUFileChannel, instead of plain sockets
APIs. It can be unconditionally built, since the socket
impl of QIOChannel will report a suitable error on platforms
where UNIX sockets are unavailable.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Message-Id: <1461751518-12128-13-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 migration/Makefile.objs |   4 +-
 migration/migration.c   |   4 ++
 migration/unix.c| 120 
 trace-events|   5 ++
 4 files changed, 80 insertions(+), 53 deletions(-)

diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index facf251..ad7fed9 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,4 +1,4 @@
-common-obj-y += migration.o tcp.o
+common-obj-y += migration.o tcp.o unix.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += qemu-file-channel.o
@@ -6,7 +6,7 @@ common-obj-y += xbzrle.o postcopy-ram.o
 common-obj-y += qjson.o
 
 common-obj-$(CONFIG_RDMA) += rdma.o
-common-obj-$(CONFIG_POSIX) += exec.o unix.o fd.o
+common-obj-$(CONFIG_POSIX) += exec.o fd.o
 
 common-obj-y += block.o
 
diff --git a/migration/migration.c b/migration/migration.c
index 1ab9535..a38da3a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -314,8 +314,10 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
 #if !defined(WIN32)
 } else if (strstart(uri, "exec:", &p)) {
 exec_start_incoming_migration(p, errp);
+#endif
 } else if (strstart(uri, "unix:", &p)) {
 unix_start_incoming_migration(p, errp);
+#if !defined(WIN32)
 } else if (strstart(uri, "fd:", &p)) {
 fd_start_incoming_migration(p, errp);
 #endif
@@ -1072,8 +1074,10 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk,
 #if !defined(WIN32)
 } else if (strstart(uri, "exec:", &p)) {
 exec_start_outgoing_migration(s, p, &local_err);
+#endif
 } else if (strstart(uri, "unix:", &p)) {
 unix_start_outgoing_migration(s, p, &local_err);
+#if !defined(WIN32)
 } else if (strstart(uri, "fd:", &p)) {
 fd_start_outgoing_migration(s, p, &local_err);
 #endif
diff --git a/migration/unix.c b/migration/unix.c
index b3537fd..75205d4 100644
--- a/migration/unix.c
+++ b/migration/unix.c
@@ -1,10 +1,11 @@
 /*
  * QEMU live migration via Unix Domain Sockets
  *
- * Copyright Red Hat, Inc. 2009
+ * Copyright Red Hat, Inc. 2009-2016
  *
  * Authors:
  *  Chris Lalancette 
+ *  Daniel P. Berrange 
  *
  * This work is licensed under the terms of the GNU GPL, version 2.  See
  * the COPYING file in the top-level directory.
@@ -17,87 +18,104 @@
 
 #include "qemu-common.h"
 #include "qemu/error-report.h"
-#include "qemu/sockets.h"
-#include "qemu/main-loop.h"
+#include "qapi/error.h"
 #include "migration/migration.h"
 #include "migration/qemu-file.h"
-#include "block/block.h"
+#include "io/channel-socket.h"
+#include "trace.h"
 
-//#define DEBUG_MIGRATION_UNIX
 
-#ifdef DEBUG_MIGRATION_UNIX
-#define DPRINTF(fmt, ...) \
-do { printf("migration-unix: " fmt, ## __VA_ARGS__); } while (0)
-#else
-#define DPRINTF(fmt, ...) \
-do { } while (0)
-#endif
+static SocketAddress *unix_build_address(const char *path)
+{
+SocketAddress *saddr;
+
+saddr = g_new0(SocketAddress, 1);
+saddr->type = SOCKET_ADDRESS_KIND_UNIX;
+saddr->u.q_unix.data = g_new0(UnixSocketAddress, 1);
+saddr->u.q_unix.data->path = g_strdup(path);
+
+return saddr;
+}
 
-static void unix_wait_for_connect(int fd, Error *err, void *opaque)
+
+static void unix_outgoing_migration(Object *src,
+Error *err,
+gpointer opaque)
 {
 MigrationState *s = opaque;
+QIOChannel *sioc = QIO_CHANNEL(src);
 
-if (fd < 0) {
-DPRINTF("migrate connect error: %s\n", error_get_pretty(err));
+if (err) {
+trace_migration_unix_outgoing_error(error_get_pretty(err));
 s->to_dst_file = NULL;
 migrate_fd_error(s, err);
 } else {
-DPRINTF("migrate connect success\n");
-s->to_dst_file = qemu_fopen_socket(fd, "wb");
-migrate_fd_connect(s);
+trace_migration_unix_outgoing_connected();
+migration_set_outgoing_channel(s, sioc);
 }
+object_unref(src);
 }
 
+
 void unix_start_outgoing_migration(MigrationState *s, const char *path, Error **errp)
 {
-unix_nonblocking_connect(path, unix_wait_for_connect, s, errp);
+SocketAddress *saddr = unix_build_address(path);
+QIOChannelSocket *sioc;
+sioc = qio_channel_socket_new();
+qio_channel_socket_connect_async(sioc,
+ saddr,
+ unix_outgoing_migration,
+ s,
+

Re: [Qemu-devel] [PATCH] MAINTAINERS: Add David Gibson as ppc maintainer

2016-05-25 Thread Alexander Graf



> On 26.05.2016 at 08:16, David Gibson wrote:
> 
> I've been de facto co-maintainer of all ppc target related code for some
> time.  Alex Graf is working on other things and doesn't have a whole lot of
> time for qemu ppc maintainership.  So, update the MAINTAINERS file to
> reflect this.
> 
> Signed-off-by: David Gibson 

Reviewed-by: Alexander Graf 

Alex

> ---
> MAINTAINERS | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 81e7fac..012a99b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -165,6 +165,7 @@ F: hw/openrisc/
> F: tests/tcg/openrisc/
> 
> PowerPC
> +M: David Gibson 
> M: Alexander Graf 
> L: qemu-...@nongnu.org
> S: Maintained
> -- 
> 2.5.5
>

[Qemu-devel] [PULL 23/28] migration: move definition of struct QEMUFile back into qemu-file.c

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

Now that the memory buffer based QEMUFile impl is gone, there
is no need for any backend to be accessing internals of the
QEMUFile struct, so it can be moved back into qemu-file.c
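
The encapsulation this restores is the usual C opaque-type pattern: the header exposes only an incomplete type plus functions, and the full struct definition lives in a single implementation file. The following is a minimal single-file sketch of that pattern, not the QEMU code; SketchFile and the sketch_file_* functions are hypothetical names chosen for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* --- what a public header would expose (hypothetical names) --- */
typedef struct SketchFile SketchFile;   /* incomplete type: layout hidden */
SketchFile *sketch_file_new(void);
void sketch_file_put_byte(SketchFile *f, uint8_t v);
int sketch_file_buffered(const SketchFile *f);
void sketch_file_free(SketchFile *f);

/* --- what stays private to the implementation file --- */
#define SKETCH_BUF_SIZE 32

struct SketchFile {
    int buf_index;                  /* number of bytes currently buffered */
    uint8_t buf[SKETCH_BUF_SIZE];
};

SketchFile *sketch_file_new(void)
{
    /* calloc zeroes the struct, so buf_index starts at 0 */
    return calloc(1, sizeof(SketchFile));
}

void sketch_file_put_byte(SketchFile *f, uint8_t v)
{
    /* silently drops bytes once the buffer is full; a real file would flush */
    if (f->buf_index < SKETCH_BUF_SIZE) {
        f->buf[f->buf_index++] = v;
    }
}

int sketch_file_buffered(const SketchFile *f)
{
    return f->buf_index;
}

void sketch_file_free(SketchFile *f)
{
    free(f);
}
```

Code that sees only the declarations in the first half cannot touch buf or buf_index directly, which is the property a private struct definition provides.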

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Message-Id: <1461751518-12128-24-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 migration/qemu-file-internal.h | 54 --
 migration/qemu-file.c  | 24 ++-
 2 files changed, 23 insertions(+), 55 deletions(-)
 delete mode 100644 migration/qemu-file-internal.h

diff --git a/migration/qemu-file-internal.h b/migration/qemu-file-internal.h
deleted file mode 100644
index 8fdfa95..000
--- a/migration/qemu-file-internal.h
+++ /dev/null
@@ -1,54 +0,0 @@
-/*
- * QEMU System Emulator
- *
- * Copyright (c) 2003-2008 Fabrice Bellard
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-
-#ifndef QEMU_FILE_INTERNAL_H
-#define QEMU_FILE_INTERNAL_H 1
-
-#include "qemu-common.h"
-#include "qemu/iov.h"
-
-#define IO_BUF_SIZE 32768
-#define MAX_IOV_SIZE MIN(IOV_MAX, 64)
-
-struct QEMUFile {
-const QEMUFileOps *ops;
-const QEMUFileHooks *hooks;
-void *opaque;
-
-int64_t bytes_xfer;
-int64_t xfer_limit;
-
-int64_t pos; /* start of buffer when writing, end of buffer
-when reading */
-int buf_index;
-int buf_size; /* 0 when writing */
-uint8_t buf[IO_BUF_SIZE];
-
-struct iovec iov[MAX_IOV_SIZE];
-unsigned int iovcnt;
-
-int last_error;
-};
-
-#endif
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 2b25dec..cf743d1 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -30,9 +30,31 @@
 #include "qemu/coroutine.h"
 #include "migration/migration.h"
 #include "migration/qemu-file.h"
-#include "migration/qemu-file-internal.h"
 #include "trace.h"
 
+#define IO_BUF_SIZE 32768
+#define MAX_IOV_SIZE MIN(IOV_MAX, 64)
+
+struct QEMUFile {
+const QEMUFileOps *ops;
+const QEMUFileHooks *hooks;
+void *opaque;
+
+int64_t bytes_xfer;
+int64_t xfer_limit;
+
+int64_t pos; /* start of buffer when writing, end of buffer
+when reading */
+int buf_index;
+int buf_size; /* 0 when writing */
+uint8_t buf[IO_BUF_SIZE];
+
+struct iovec iov[MAX_IOV_SIZE];
+unsigned int iovcnt;
+
+int last_error;
+};
+
 /*
  * Stop a file from being read/written - not all backing files can do this
  * typically only sockets can.
-- 
2.5.5

[Qemu-devel] [PULL 20/28] migration: delete QEMUSizedBuffer struct

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

Now that we don't have a buffer based QEMUFile
implementation, the QEMUSizedBuffer code is also
unused and can be deleted. A simpler buffer class
also exists in util/buffer.c which other code can
use as needed.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Message-Id: <1461751518-12128-21-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 include/migration/qemu-file.h |  16 --
 include/qemu/typedefs.h   |   1 -
 migration/Makefile.objs   |   2 +-
 migration/qemu-file-buf.c | 368 --
 4 files changed, 1 insertion(+), 386 deletions(-)
 delete mode 100644 migration/qemu-file-buf.c

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 6618d19..edaf598 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -126,13 +126,6 @@ typedef struct QEMUFileHooks {
 QEMURamSaveFunc *save_page;
 } QEMUFileHooks;
 
-struct QEMUSizedBuffer {
-struct iovec *iov;
-size_t n_iov;
-size_t size; /* total allocated size in all iov's */
-size_t used; /* number of used bytes */
-};
-
 QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops);
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
@@ -155,15 +148,6 @@ void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, size_t size);
 bool qemu_file_mode_is_not_valid(const char *mode);
 bool qemu_file_is_writable(QEMUFile *f);
 
-QEMUSizedBuffer *qsb_create(const uint8_t *buffer, size_t len);
-void qsb_free(QEMUSizedBuffer *);
-size_t qsb_set_length(QEMUSizedBuffer *qsb, size_t length);
-size_t qsb_get_length(const QEMUSizedBuffer *qsb);
-ssize_t qsb_get_buffer(const QEMUSizedBuffer *, off_t start, size_t count,
-   uint8_t *buf);
-ssize_t qsb_write_at(QEMUSizedBuffer *qsb, const uint8_t *buf,
- off_t pos, size_t count);
-
 
 static inline void qemu_put_ubyte(QEMUFile *f, unsigned int v)
 {
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 1dcf6f5..b113fcf 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -82,7 +82,6 @@ typedef struct QemuOpt QemuOpt;
 typedef struct QemuOpts QemuOpts;
 typedef struct QemuOptsList QemuOptsList;
 typedef struct QEMUSGList QEMUSGList;
-typedef struct QEMUSizedBuffer QEMUSizedBuffer;
 typedef struct QEMUTimer QEMUTimer;
 typedef struct QEMUTimerListGroup QEMUTimerListGroup;
 typedef struct QObject QObject;
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index c73fb8a..9e977a4 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,6 +1,6 @@
 common-obj-y += migration.o socket.o fd.o exec.o
 common-obj-y += vmstate.o
-common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
+common-obj-y += qemu-file.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += qemu-file-channel.o
 common-obj-y += xbzrle.o postcopy-ram.o
 common-obj-y += qjson.o
diff --git a/migration/qemu-file-buf.c b/migration/qemu-file-buf.c
deleted file mode 100644
index 668ab35..000
--- a/migration/qemu-file-buf.c
+++ /dev/null
@@ -1,368 +0,0 @@
-/*
- * QEMU System Emulator
- *
- * Copyright (c) 2003-2008 Fabrice Bellard
- * Copyright (c) 2014 IBM Corp.
- *
- * Authors:
- *  Stefan Berger 
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-#include "qemu/osdep.h"
-#include "qemu-common.h"
-#include "qemu/error-report.h"
-#include "qemu/iov.h"
-#include "qemu/sockets.h"
-#include "qemu/coroutine.h"
-#include "migration/migration.h"
-#include "migration/qemu-file.h"
-#include "migration/qemu-file-internal.h"
-#include "trace.h"
-
-#define QSB_CHUNK_SIZE  (1 << 10)
-#define QSB_MAX_CHUNK_SIZE  (16 * QSB_CHUNK_SIZE)
-
-/**
- * Create a QEMUSizedBuffer
- * This type of buffer uses scatter-gather lists internally and
- * can grow to any size. Any data array in

[Qemu-devel] [PULL 08/28] migration: introduce a new QEMUFile impl based on QIOChannel

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

Introduce a new QEMUFile implementation that is based on
the QIOChannel objects. This impl is different from existing
impls in that there is no file descriptor that can be made
available, as some channels may be based on higher level
protocols such as TLS.

Although the QIOChannel based implementation can trivially
provide a bi-directional stream, initially we have separate
functions for opening input & output directions to fit with
the expectation of the current QEMUFile interface.
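
As a rough sketch of the two points above (no file descriptor is guaranteed, and input and output are opened as separate one-directional files), consider a callback table where each direction fills in only the callbacks it supports. This is an illustration under assumptions, not the qemu-file-channel.c code: SketchOps and every sketch_* name are hypothetical, and the callbacks are trivial stubs.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical callback table; unset members mean "not supported". */
typedef struct SketchOps {
    ssize_t (*get_buffer)(void *opaque, unsigned char *buf, size_t size);
    ssize_t (*write_buffer)(void *opaque, const unsigned char *buf,
                            size_t size);
    int (*get_fd)(void *opaque);    /* NULL when no descriptor exists */
} SketchOps;

/* Stub reader: always reports end of file. */
static ssize_t sketch_read(void *opaque, unsigned char *buf, size_t size)
{
    (void)opaque; (void)buf; (void)size;
    return 0;
}

/* Stub writer: discards the data and reports it all as written. */
static ssize_t sketch_write(void *opaque, const unsigned char *buf,
                            size_t size)
{
    (void)opaque; (void)buf;
    return (ssize_t)size;
}

/* One table per direction; neither provides get_fd. */
static const SketchOps sketch_input_ops = { .get_buffer = sketch_read };
static const SketchOps sketch_output_ops = { .write_buffer = sketch_write };

/* Direction is inferred from which callback is present. */
static int sketch_is_writable(const SketchOps *ops)
{
    return ops->write_buffer != NULL;
}

/* Callers must cope with -1 when the backend has no descriptor. */
static int sketch_get_fd(const SketchOps *ops, void *opaque)
{
    return ops->get_fd ? ops->get_fd(opaque) : -1;
}
```

A backend layered on something like TLS simply leaves get_fd unset, so code that previously assumed a descriptor has to handle the -1 case.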

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Message-Id: <1461751518-12128-9-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 include/migration/qemu-file.h |   4 +
 migration/Makefile.objs   |   1 +
 migration/qemu-file-channel.c | 180 ++
 3 files changed, 185 insertions(+)
 create mode 100644 migration/qemu-file-channel.c

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 2dea81f..0329ccc 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -23,7 +23,9 @@
  */
 #ifndef QEMU_FILE_H
 #define QEMU_FILE_H 1
+#include "qemu-common.h"
 #include "exec/cpu-common.h"
+#include "io/channel.h"
 
 
 /* This function writes a chunk of data to a file at the given position.
@@ -135,6 +137,8 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops);
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd, const char *mode);
+QEMUFile *qemu_fopen_channel_input(QIOChannel *ioc);
+QEMUFile *qemu_fopen_channel_output(QIOChannel *ioc);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input);
 void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks);
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index d25ff48..facf251 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,6 +1,7 @@
 common-obj-y += migration.o tcp.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
+common-obj-y += qemu-file-channel.o
 common-obj-y += xbzrle.o postcopy-ram.o
 common-obj-y += qjson.o
 
diff --git a/migration/qemu-file-channel.c b/migration/qemu-file-channel.c
new file mode 100644
index 000..45c13f1
--- /dev/null
+++ b/migration/qemu-file-channel.c
@@ -0,0 +1,180 @@
+/*
+ * QEMUFile backend for QIOChannel objects
+ *
+ * Copyright (c) 2015-2016 Red Hat, Inc
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "migration/qemu-file.h"
+#include "io/channel-socket.h"
+#include "qemu/iov.h"
+
+
+static ssize_t channel_writev_buffer(void *opaque,
+ struct iovec *iov,
+ int iovcnt,
+ int64_t pos)
+{
+QIOChannel *ioc = QIO_CHANNEL(opaque);
+ssize_t done = 0;
+struct iovec *local_iov = g_new(struct iovec, iovcnt);
+struct iovec *local_iov_head = local_iov;
+unsigned int nlocal_iov = iovcnt;
+
+nlocal_iov = iov_copy(local_iov, nlocal_iov,
+  iov, iovcnt,
+  0, iov_size(iov, iovcnt));
+
+while (nlocal_iov > 0) {
+ssize_t len;
+len = qio_channel_writev(ioc, local_iov, nlocal_iov, NULL);
+if (len == QIO_CHANNEL_ERR_BLOCK) {
+qio_channel_wait(ioc, G_IO_OUT);
+continue;
+}
+if (len < 0) {
+/* XXX handle Error objects */
+done = -EIO;
+goto cleanup;
+}
+
+iov_discard_front(&local_iov, &nlocal_iov, len);
+done += len;
+}
+
+ cleanup:
+g_free(local_iov_head);
+return done;
+}
+
+
+static ssize_t channel_get_buffer(void *opaque,
+

[Qemu-devel] [PULL 17/28] migration: convert RDMA to use QIOChannel interface

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

This converts the RDMA code to provide a subclass of QIOChannel
that uses RDMA for the data transport.

This implementation of RDMA does not correctly handle non-blocking
mode. Reads might block if there was not already some pending data
and writes will block until all data is sent. This flawed behaviour
was already present in the existing impl, so appears to not be a
critical problem at this time. It should be on the list of things
to fix in the future though.

The RDMA code would be much better off if it could be split up into
a generic RDMA layer, a QIOChannel impl based on RDMA, and then
the RDMA migration glue. This is left as a future exercise for
the brave.
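
For readers unfamiliar with what a vectored write over a message-oriented transport involves, the general shape is: walk each iovec element and send it in pieces no larger than a fixed per-message cap, blocking until everything is sent. The sketch below shows that loop in isolation. It is not the rdma.c code: SKETCH_SEND_INCREMENT, SketchSendFn and the sketch_* functions are hypothetical, and the cap of 4 bytes is chosen only to keep the example small.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

#define SKETCH_SEND_INCREMENT 4     /* hypothetical per-message cap */

/* Hypothetical transport hook: sends one message, returns < 0 on error. */
typedef int (*SketchSendFn)(void *opaque, const uint8_t *data, size_t len);

/*
 * Sends every iovec element in chunks of at most SKETCH_SEND_INCREMENT
 * bytes. Returns the total byte count, or the negative error from send_fn.
 */
static ssize_t sketch_chunked_writev(const struct iovec *iov, size_t niov,
                                     SketchSendFn send_fn, void *opaque)
{
    ssize_t done = 0;
    size_t i;

    for (i = 0; i < niov; i++) {
        size_t remaining = iov[i].iov_len;
        const uint8_t *data = iov[i].iov_base;

        while (remaining) {
            size_t len = remaining < SKETCH_SEND_INCREMENT
                         ? remaining : SKETCH_SEND_INCREMENT;
            int ret = send_fn(opaque, data, len);

            if (ret < 0) {
                return ret;
            }
            remaining -= len;
            data += len;
            done += (ssize_t)len;
        }
    }
    return done;
}

/* Test transport: only counts how many messages were sent. */
static int sketch_count_send(void *opaque, const uint8_t *data, size_t len)
{
    (void)data; (void)len;
    (*(int *)opaque)++;
    return 0;
}
```

Because the loop never returns early with a partial count, a caller cannot tell it to stop when the transport would block, which is the kind of limitation the non-blocking caveat above describes.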

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Message-Id: <1461751518-12128-18-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 migration/rdma.c | 374 ---
 1 file changed, 275 insertions(+), 99 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index f8578b9..51bafc7 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2,10 +2,12 @@
  * RDMA protocol and interfaces
  *
  * Copyright IBM, Corp. 2010-2013
+ * Copyright Red Hat, Inc. 2015-2016
  *
  * Authors:
  *  Michael R. Hines 
  *  Jiuxing Liu 
+ *  Daniel P. Berrange 
  *
  * This work is licensed under the terms of the GNU GPL, version 2 or
  * later.  See the COPYING file in the top-level directory.
@@ -374,14 +376,20 @@ typedef struct RDMAContext {
 GHashTable *blockmap;
 } RDMAContext;
 
-/*
- * Interface to the rest of the migration call stack.
- */
-typedef struct QEMUFileRDMA {
+#define TYPE_QIO_CHANNEL_RDMA "qio-channel-rdma"
+#define QIO_CHANNEL_RDMA(obj) \
+OBJECT_CHECK(QIOChannelRDMA, (obj), TYPE_QIO_CHANNEL_RDMA)
+
+typedef struct QIOChannelRDMA QIOChannelRDMA;
+
+
+struct QIOChannelRDMA {
+QIOChannel parent;
 RDMAContext *rdma;
+QEMUFile *file;
 size_t len;
-void *file;
-} QEMUFileRDMA;
+bool blocking; /* XXX we don't actually honour this yet */
+};
 
 /*
  * Main structure for IB Send/Recv control messages.
@@ -2518,15 +2526,19 @@ static void *qemu_rdma_data_init(const char *host_port, Error **errp)
  * SEND messages for control only.
  * VM's ram is handled with regular RDMA messages.
  */
-static ssize_t qemu_rdma_put_buffer(void *opaque, const uint8_t *buf,
-int64_t pos, size_t size)
-{
-QEMUFileRDMA *r = opaque;
-QEMUFile *f = r->file;
-RDMAContext *rdma = r->rdma;
-size_t remaining = size;
-uint8_t * data = (void *) buf;
+static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
+   const struct iovec *iov,
+   size_t niov,
+   int *fds,
+   size_t nfds,
+   Error **errp)
+{
+QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
+QEMUFile *f = rioc->file;
+RDMAContext *rdma = rioc->rdma;
 int ret;
+ssize_t done = 0;
+size_t i;
 
 CHECK_ERROR_STATE();
 
@@ -2540,27 +2552,31 @@ static ssize_t qemu_rdma_put_buffer(void *opaque, const uint8_t *buf,
 return ret;
 }
 
-while (remaining) {
-RDMAControlHeader head;
+for (i = 0; i < niov; i++) {
+size_t remaining = iov[i].iov_len;
+uint8_t * data = (void *)iov[i].iov_base;
+while (remaining) {
+RDMAControlHeader head;
 
-r->len = MIN(remaining, RDMA_SEND_INCREMENT);
-remaining -= r->len;
+rioc->len = MIN(remaining, RDMA_SEND_INCREMENT);
+remaining -= rioc->len;
 
-/* Guaranteed to fit due to RDMA_SEND_INCREMENT MIN above */
-head.len = (uint32_t)r->len;
-head.type = RDMA_CONTROL_QEMU_FILE;
+head.len = rioc->len;
+head.type = RDMA_CONTROL_QEMU_FILE;
 
-ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
+ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
 
-if (ret < 0) {
-rdma->error_state = ret;
-return ret;
-}
+if (ret < 0) {
+rdma->error_state = ret;
+return ret;
+}
 
-data += r->len;
+data += rioc->len;
+done += rioc->len;
+}
 }
 
-return size;
+return done;
 }
 
 static size_t qemu_rdma_fill(RDMAContext *rdma, uint8_t *buf,
@@ -2585,41 +2601,74 @@ static size_t qemu_rdma_fill(RDMAContext *rdma, uint8_t *buf,
  * RDMA links don't use bytestreams, so we have to
  * return bytes to QEMUFile opportunistically.
  */
-static ssize_t qemu_rdma_get_buffer(void *opaque, uint8_t *buf,
-int64_t pos, size_t size)
-{
-QEMUFileRDMA *r = opaque;
-RDMAContext *rdma = r->rdma;
+static ssize_t qio_ch

[Qemu-devel] [PULL 24/28] migration: don't use an array for storing migrate parameters

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

The MigrateState struct uses an array for storing migration
parameters. This presumes that all future parameters will
be integers too, which is not going to be the case. There
is no functional reason why an array is used, if anything
it makes the code less clear. The QAPI schema already
defines a struct - MigrationParameters - capable of storing
all the individual parameters, so just use that instead of
an array.
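
To make the argument concrete, here is a minimal sketch of the
struct-based layout. It is not the code from this patch: the names
DemoParameters, demo_defaults and demo_set_compress_level, the default
values, and the tls_hostname field are all hypothetical and only stand
in for "a future parameter that is not an integer".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a QAPI-style parameters struct. */
typedef struct DemoParameters {
    int64_t compress_level;
    int64_t compress_threads;
    /* A non-integer field, which an int[] indexed by enum could not hold. */
    const char *tls_hostname;
} DemoParameters;

/* Defaults are set by field name; the values here are illustrative only. */
static DemoParameters demo_defaults(void)
{
    DemoParameters p = {
        .compress_level = 1,
        .compress_threads = 8,
        .tls_hostname = NULL,
    };
    return p;
}

/* Mirrors the "has_xxx / xxx" pairing used for optional arguments. */
static void demo_set_compress_level(DemoParameters *p,
                                    bool has_level, int64_t level)
{
    if (has_level) {
        p->compress_level = level;
    }
}
```

With named fields the compiler checks each member's type, whereas an
array indexed by an enum forces every parameter into the same type.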

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Message-Id: <1461751518-12128-25-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 include/migration/migration.h |  5 +++-
 migration/migration.c | 55 ++-
 migration/ram.c   |  6 ++---
 3 files changed, 29 insertions(+), 37 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index d24c6ef..74105a1 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -135,9 +135,12 @@ struct MigrationState
 QemuThread thread;
 QEMUBH *cleanup_bh;
 QEMUFile *to_dst_file;
-int parameters[MIGRATION_PARAMETER__MAX];
+
+/* New style params from 'migrate-set-parameters' */
+MigrationParameters parameters;
 
 int state;
+/* Old style params from 'migrate' command */
 MigrationParams params;
 
 /* State related to return path */
diff --git a/migration/migration.c b/migration/migration.c
index 695384b..27064af 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -82,16 +82,13 @@ MigrationState *migrate_get_current(void)
 .bandwidth_limit = MAX_THROTTLE,
 .xbzrle_cache_size = DEFAULT_MIGRATE_CACHE_SIZE,
 .mbps = -1,
-.parameters[MIGRATION_PARAMETER_COMPRESS_LEVEL] =
-DEFAULT_MIGRATE_COMPRESS_LEVEL,
-.parameters[MIGRATION_PARAMETER_COMPRESS_THREADS] =
-DEFAULT_MIGRATE_COMPRESS_THREAD_COUNT,
-.parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS] =
-DEFAULT_MIGRATE_DECOMPRESS_THREAD_COUNT,
-.parameters[MIGRATION_PARAMETER_CPU_THROTTLE_INITIAL] =
-DEFAULT_MIGRATE_CPU_THROTTLE_INITIAL,
-.parameters[MIGRATION_PARAMETER_CPU_THROTTLE_INCREMENT] =
-DEFAULT_MIGRATE_CPU_THROTTLE_INCREMENT,
+.parameters = {
+.compress_level = DEFAULT_MIGRATE_COMPRESS_LEVEL,
+.compress_threads = DEFAULT_MIGRATE_COMPRESS_THREAD_COUNT,
+.decompress_threads = DEFAULT_MIGRATE_DECOMPRESS_THREAD_COUNT,
+.cpu_throttle_initial = DEFAULT_MIGRATE_CPU_THROTTLE_INITIAL,
+.cpu_throttle_increment = DEFAULT_MIGRATE_CPU_THROTTLE_INCREMENT,
+},
 };
 
 if (!once) {
@@ -534,15 +531,11 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
 MigrationState *s = migrate_get_current();
 
 params = g_malloc0(sizeof(*params));
-params->compress_level = s->parameters[MIGRATION_PARAMETER_COMPRESS_LEVEL];
-params->compress_threads =
-s->parameters[MIGRATION_PARAMETER_COMPRESS_THREADS];
-params->decompress_threads =
-s->parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS];
-params->cpu_throttle_initial =
-s->parameters[MIGRATION_PARAMETER_CPU_THROTTLE_INITIAL];
-params->cpu_throttle_increment =
-s->parameters[MIGRATION_PARAMETER_CPU_THROTTLE_INCREMENT];
+params->compress_level = s->parameters.compress_level;
+params->compress_threads = s->parameters.compress_threads;
+params->decompress_threads = s->parameters.decompress_threads;
+params->cpu_throttle_initial = s->parameters.cpu_throttle_initial;
+params->cpu_throttle_increment = s->parameters.cpu_throttle_increment;
 
 return params;
 }
@@ -743,7 +736,8 @@ void qmp_migrate_set_parameters(bool has_compress_level,
 bool has_cpu_throttle_initial,
 int64_t cpu_throttle_initial,
 bool has_cpu_throttle_increment,
-int64_t cpu_throttle_increment, Error **errp)
+int64_t cpu_throttle_increment,
+Error **errp)
 {
 MigrationState *s = migrate_get_current();
 
@@ -780,26 +774,23 @@ void qmp_migrate_set_parameters(bool has_compress_level,
 }
 
 if (has_compress_level) {
-s->parameters[MIGRATION_PARAMETER_COMPRESS_LEVEL] = compress_level;
+s->parameters.compress_level = compress_level;
 }
 if (has_compress_threads) {
-s->parameters[MIGRATION_PARAMETER_COMPRESS_THREADS] = compress_threads;
+s->parameters.compress_threads = compress_threads;
 }
 if (has_decompress_threads) {
-s->parameters[MIGRATION_PARAMETER_DECOMPRESS_THREADS] =
-decompress_threads;
+s->parameters.decompress_threads = decompress_threads;

[Qemu-devel] [PULL 14/28] migration: convert tcp socket protocol to use QIOChannel

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

Drop the current TCP socket migration driver and extend
the new generic socket driver to cope with the TCP address
format.
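
The idea of one generic driver fed by different address builders can
be sketched as below. This is a simplified illustration, not the QEMU
implementation: DemoAddr and the demo_* helpers are hypothetical, and
the "host:port" split is far cruder than what inet_parse() accepts
(no IPv6 brackets, no options).

```c
#include <stdlib.h>
#include <string.h>

typedef enum { DEMO_ADDR_INET, DEMO_ADDR_UNIX } DemoAddrKind;

/* Hypothetical tagged union, loosely shaped like a socket address. */
typedef struct DemoAddr {
    DemoAddrKind kind;
    union {
        struct { char *host; char *port; } inet;
        struct { char *path; } q_unix;
    } u;
} DemoAddr;

/* Copy the first n bytes of s into a fresh NUL-terminated string. */
static char *demo_strndup(const char *s, size_t n)
{
    char *out = malloc(n + 1);
    if (out) {
        memcpy(out, s, n);
        out[n] = '\0';
    }
    return out;
}

/* Split "host:port" at the last colon; returns NULL on malformed input. */
static DemoAddr *demo_tcp_build_address(const char *host_port)
{
    const char *colon = strrchr(host_port, ':');
    DemoAddr *a;

    if (!colon || colon[1] == '\0') {
        return NULL;
    }
    a = calloc(1, sizeof(*a));
    if (!a) {
        return NULL;
    }
    a->kind = DEMO_ADDR_INET;
    a->u.inet.host = demo_strndup(host_port, (size_t)(colon - host_port));
    a->u.inet.port = demo_strndup(colon + 1, strlen(colon + 1));
    if (!a->u.inet.host || !a->u.inet.port) {
        free(a->u.inet.host);
        free(a->u.inet.port);
        free(a);
        return NULL;
    }
    return a;
}

/* The UNIX variant differs only in how the address is filled in. */
static DemoAddr *demo_unix_build_address(const char *path)
{
    DemoAddr *a = calloc(1, sizeof(*a));
    if (!a) {
        return NULL;
    }
    a->kind = DEMO_ADDR_UNIX;
    a->u.q_unix.path = demo_strndup(path, strlen(path));
    if (!a->u.q_unix.path) {
        free(a);
        return NULL;
    }
    return a;
}

static void demo_addr_free(DemoAddr *a)
{
    if (!a) {
        return;
    }
    if (a->kind == DEMO_ADDR_INET) {
        free(a->u.inet.host);
        free(a->u.inet.port);
    } else {
        free(a->u.q_unix.path);
    }
    free(a);
}
```

Note that the builder can return NULL, so a caller in this sketch
would need to check the result before handing it to the shared
connect/listen code.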

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Message-Id: <1461751518-12128-15-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 migration/Makefile.objs |   2 +-
 migration/socket.c  |  31 +++
 migration/tcp.c | 102 
 3 files changed, 32 insertions(+), 103 deletions(-)
 delete mode 100644 migration/tcp.c

diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index b0b9550..4221861 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,4 +1,4 @@
-common-obj-y += migration.o tcp.o socket.o
+common-obj-y += migration.o socket.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += qemu-file-channel.o
diff --git a/migration/socket.c b/migration/socket.c
index a9911d6..25457ea 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -25,6 +25,23 @@
 #include "trace.h"
 
 
+static SocketAddress *tcp_build_address(const char *host_port, Error **errp)
+{
+InetSocketAddress *iaddr = inet_parse(host_port, errp);
+SocketAddress *saddr;
+
+if (!iaddr) {
+return NULL;
+}
+
+saddr = g_new0(SocketAddress, 1);
+saddr->type = SOCKET_ADDRESS_KIND_INET;
+saddr->u.inet.data = iaddr;
+
+return saddr;
+}
+
+
 static SocketAddress *unix_build_address(const char *path)
 {
 SocketAddress *saddr;
@@ -69,6 +86,14 @@ static void socket_start_outgoing_migration(MigrationState *s,
 qapi_free_SocketAddress(saddr);
 }
 
+void tcp_start_outgoing_migration(MigrationState *s,
+  const char *host_port,
+  Error **errp)
+{
+SocketAddress *saddr = tcp_build_address(host_port, errp);
+socket_start_outgoing_migration(s, saddr, errp);
+}
+
 void unix_start_outgoing_migration(MigrationState *s,
const char *path,
Error **errp)
@@ -125,6 +150,12 @@ static void socket_start_incoming_migration(SocketAddress *saddr,
 qapi_free_SocketAddress(saddr);
 }
 
+void tcp_start_incoming_migration(const char *host_port, Error **errp)
+{
+SocketAddress *saddr = tcp_build_address(host_port, errp);
+socket_start_incoming_migration(saddr, errp);
+}
+
 void unix_start_incoming_migration(const char *path, Error **errp)
 {
 SocketAddress *saddr = unix_build_address(path);
diff --git a/migration/tcp.c b/migration/tcp.c
deleted file mode 100644
index d0e0db9..000
--- a/migration/tcp.c
+++ /dev/null
@@ -1,102 +0,0 @@
-/*
- * QEMU live migration
- *
- * Copyright IBM, Corp. 2008
- *
- * Authors:
- *  Anthony Liguori   
- *
- * This work is licensed under the terms of the GNU GPL, version 2.  See
- * the COPYING file in the top-level directory.
- *
- * Contributions after 2012-01-13 are licensed under the terms of the
- * GNU GPL, version 2 or (at your option) any later version.
- */
-
-#include "qemu/osdep.h"
-
-#include "qemu-common.h"
-#include "qemu/error-report.h"
-#include "qemu/sockets.h"
-#include "migration/migration.h"
-#include "migration/qemu-file.h"
-#include "block/block.h"
-#include "qemu/main-loop.h"
-
-//#define DEBUG_MIGRATION_TCP
-
-#ifdef DEBUG_MIGRATION_TCP
-#define DPRINTF(fmt, ...) \
-do { printf("migration-tcp: " fmt, ## __VA_ARGS__); } while (0)
-#else
-#define DPRINTF(fmt, ...) \
-do { } while (0)
-#endif
-
-static void tcp_wait_for_connect(int fd, Error *err, void *opaque)
-{
-MigrationState *s = opaque;
-
-if (fd < 0) {
-DPRINTF("migrate connect error: %s\n", error_get_pretty(err));
-s->to_dst_file = NULL;
-migrate_fd_error(s, err);
-} else {
-DPRINTF("migrate connect success\n");
-s->to_dst_file = qemu_fopen_socket(fd, "wb");
-migrate_fd_connect(s);
-}
-}
-
-void tcp_start_outgoing_migration(MigrationState *s, const char *host_port, Error **errp)
-{
-inet_nonblocking_connect(host_port, tcp_wait_for_connect, s, errp);
-}
-
-static void tcp_accept_incoming_migration(void *opaque)
-{
-struct sockaddr_in addr;
-socklen_t addrlen = sizeof(addr);
-int s = (intptr_t)opaque;
-QEMUFile *f;
-int c;
-
-do {
-c = qemu_accept(s, (struct sockaddr *)&addr, &addrlen);
-} while (c < 0 && errno == EINTR);
-qemu_set_fd_handler(s, NULL, NULL, NULL);
-closesocket(s);
-
-DPRINTF("accepted migration\n");
-
-if (c < 0) {
-error_report("could not accept migration connection (%s)",
- strerror(errno));
-return;
-}
-
-f = qemu_fopen_socket(c, "rb");
-if (f == NULL) {
-error_report("could not qemu_fopen socket");
-goto out;
-}
-
-process_incoming_migration(f);
-return;
-
-out:
-closesocket(c);
-}
-

[Qemu-devel] [PULL 06/28] migration: introduce set_blocking function in QEMUFileOps

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

Remove the assumption that every QEMUFile implementation has
a file descriptor available by introducing a new function
in QEMUFileOps to change the blocking state of a QEMUFile.

If the new function is not set, qemu_file_set_blocking() falls
back to the original code, which uses the get_fd method.
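
The optional-hook-with-fallback pattern described above can be
sketched in isolation as follows. This is an illustration under
stated assumptions, not the patch itself: DemoFile, DemoFileOps and
the two demo backends are hypothetical, and the fd fallback uses
plain POSIX fcntl() rather than QEMU's qemu_set_block() /
qemu_set_nonblock() helpers.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoFileOps {
    int (*get_fd)(void *opaque);                     /* optional */
    int (*set_blocking)(void *opaque, bool enabled); /* optional */
} DemoFileOps;

typedef struct DemoFile {
    const DemoFileOps *ops;
    void *opaque;
} DemoFile;

/* Fallback path: toggle O_NONBLOCK on a raw file descriptor. */
static int demo_fd_set_blocking(int fd, bool block)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0) {
        return -1;
    }
    flags = block ? (flags & ~O_NONBLOCK) : (flags | O_NONBLOCK);
    return fcntl(fd, F_SETFL, flags);
}

/* Prefer the backend's own hook; fall back to get_fd when it is absent. */
static int demo_file_set_blocking(DemoFile *f, bool block)
{
    if (f->ops->set_blocking) {
        return f->ops->set_blocking(f->opaque, block);
    }
    if (f->ops->get_fd) {
        return demo_fd_set_blocking(f->ops->get_fd(f->opaque), block);
    }
    return -1;
}

/* Demo backend with no fd at all: it just records the requested mode. */
typedef struct DemoMemBackend {
    bool blocking;
} DemoMemBackend;

static int demo_mem_set_blocking(void *opaque, bool enabled)
{
    ((DemoMemBackend *)opaque)->blocking = enabled;
    return 0;
}

static const DemoFileOps demo_mem_ops = {
    .get_fd = NULL,
    .set_blocking = demo_mem_set_blocking,
};

/* Demo backend that only exposes an fd, exercising the fallback. */
static int demo_fd_get_fd(void *opaque)
{
    return *(int *)opaque;
}

static const DemoFileOps demo_fd_ops = {
    .get_fd = demo_fd_get_fd,
    .set_blocking = NULL,
};
```

A backend such as demo_mem_ops never needs to pretend it has a file
descriptor, which is the assumption this patch removes.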

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Reviewed-by: Juan Quintela 
Message-Id: <1461751518-12128-7-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 include/migration/qemu-file.h |  5 +
 migration/migration.c |  4 +---
 migration/qemu-file.c | 10 +++---
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 1934a64..2dea81f 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -54,6 +54,10 @@ typedef int (QEMUFileCloseFunc)(void *opaque);
  */
 typedef int (QEMUFileGetFD)(void *opaque);
 
+/* Called to change the blocking mode of the file
+ */
+typedef int (QEMUFileSetBlocking)(void *opaque, bool enabled);
+
 /*
  * This function writes an iovec to file. The handler must write all
  * of the data or return a negative errno value.
@@ -107,6 +111,7 @@ typedef struct QEMUFileOps {
 QEMUFileGetBufferFunc *get_buffer;
 QEMUFileCloseFunc *close;
 QEMUFileGetFD *get_fd;
+QEMUFileSetBlocking *set_blocking;
 QEMUFileWritevBufferFunc *writev_buffer;
 QEMURetPathFunc *get_return_path;
 QEMUFileShutdownFunc *shut_down;
diff --git a/migration/migration.c b/migration/migration.c
index f5327e8..ac7790f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -422,11 +422,9 @@ static void process_incoming_migration_co(void *opaque)
 void process_incoming_migration(QEMUFile *f)
 {
 Coroutine *co = qemu_coroutine_create(process_incoming_migration_co);
-int fd = qemu_get_fd(f);
 
-assert(fd != -1);
 migrate_decompress_threads_create();
-qemu_set_nonblock(fd);
+qemu_file_set_blocking(f, false);
 qemu_coroutine_enter(co, f);
 }
 
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index b480b72..2b25dec 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -684,9 +684,13 @@ size_t qemu_get_counted_string(QEMUFile *f, char buf[256])
  */
 void qemu_file_set_blocking(QEMUFile *f, bool block)
 {
-if (block) {
-qemu_set_block(qemu_get_fd(f));
+if (f->ops->set_blocking) {
+f->ops->set_blocking(f->opaque, block);
 } else {
-qemu_set_nonblock(qemu_get_fd(f));
+if (block) {
+qemu_set_block(qemu_get_fd(f));
+} else {
+qemu_set_nonblock(qemu_get_fd(f));
+}
 }
 }
-- 
2.5.5

[Qemu-devel] [PULL 16/28] migration: convert exec socket protocol to use QIOChannel

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

Convert the exec socket migration protocol driver to use
QIOChannel and QEMUFileChannel, instead of the stdio
popen APIs. It can be unconditionally built because the
QIOChannelCommand class can report suitable error messages
on platforms which can't fork processes.
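
For readers unfamiliar with what spawning the command involves, the
incoming direction can be approximated with plain POSIX calls as
below. This is only a sketch and says nothing about how
QIOChannelCommand is implemented internally: demo_spawn_reader is a
hypothetical helper, it is POSIX-only, and it covers just the
read-from-command case with the same { "/bin/sh", "-c", command }
argv shape used in the patch.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run "/bin/sh -c command" with its stdout connected to a pipe.
 * Returns the read end of the pipe, or -1 on failure. The child's
 * pid is stored in *pid_out so the caller can waitpid() on it.
 */
static int demo_spawn_reader(const char *command, pid_t *pid_out)
{
    int fds[2];
    pid_t pid;
    char *const argv[] = { "/bin/sh", "-c", (char *)command, NULL };

    if (pipe(fds) < 0) {
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: route stdout into the pipe, then exec the shell. */
        close(fds[0]);
        if (fds[1] != STDOUT_FILENO) {
            if (dup2(fds[1], STDOUT_FILENO) < 0) {
                _exit(127);
            }
            close(fds[1]);
        }
        execv(argv[0], argv);
        _exit(127); /* only reached if execv() failed */
    }

    /* Parent: keep only the read end. */
    close(fds[1]);
    *pid_out = pid;
    return fds[0];
}
```

The outgoing direction would be the mirror image, with the parent
keeping the write end and the child reading from its stdin.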

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Message-Id: <1461751518-12128-17-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 migration/Makefile.objs |  3 +--
 migration/exec.c| 62 +
 migration/migration.c   |  4 
 trace-events|  4 
 4 files changed, 37 insertions(+), 36 deletions(-)

diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 2e5040c..c73fb8a 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,4 +1,4 @@
-common-obj-y += migration.o socket.o fd.o
+common-obj-y += migration.o socket.o fd.o exec.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += qemu-file-channel.o
@@ -6,7 +6,6 @@ common-obj-y += xbzrle.o postcopy-ram.o
 common-obj-y += qjson.o
 
 common-obj-$(CONFIG_RDMA) += rdma.o
-common-obj-$(CONFIG_POSIX) += exec.o
 
 common-obj-y += block.o
 
diff --git a/migration/exec.c b/migration/exec.c
index 5594209..c825e27 100644
--- a/migration/exec.c
+++ b/migration/exec.c
@@ -3,10 +3,12 @@
  *
  * Copyright IBM, Corp. 2008
  * Copyright Dell MessageOne 2008
+ * Copyright Red Hat, Inc. 2015-2016
  *
  * Authors:
  *  Anthony Liguori   
  *  Charles Duffy 
+ *  Daniel P. Berrange 
  *
  * This work is licensed under the terms of the GNU GPL, version 2.  See
  * the COPYING file in the top-level directory.
@@ -18,53 +20,53 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qemu-common.h"
-#include "qemu/sockets.h"
-#include "qemu/main-loop.h"
 #include "migration/migration.h"
-#include "migration/qemu-file.h"
-#include "block/block.h"
-#include 
+#include "io/channel-command.h"
+#include "trace.h"
 
-//#define DEBUG_MIGRATION_EXEC
-
-#ifdef DEBUG_MIGRATION_EXEC
-#define DPRINTF(fmt, ...) \
-do { printf("migration-exec: " fmt, ## __VA_ARGS__); } while (0)
-#else
-#define DPRINTF(fmt, ...) \
-do { } while (0)
-#endif
 
 void exec_start_outgoing_migration(MigrationState *s, const char *command, Error **errp)
 {
-s->to_dst_file = qemu_popen_cmd(command, "w");
-if (s->to_dst_file == NULL) {
-error_setg_errno(errp, errno, "failed to popen the migration target");
+QIOChannel *ioc;
+const char *argv[] = { "/bin/sh", "-c", command, NULL };
+
+trace_migration_exec_outgoing(command);
+ioc = QIO_CHANNEL(qio_channel_command_new_spawn(argv,
+O_WRONLY,
+errp));
+if (!ioc) {
 return;
 }
 
-migrate_fd_connect(s);
+migration_set_outgoing_channel(s, ioc);
+object_unref(OBJECT(ioc));
 }
 
-static void exec_accept_incoming_migration(void *opaque)
+static gboolean exec_accept_incoming_migration(QIOChannel *ioc,
+   GIOCondition condition,
+   gpointer opaque)
 {
-QEMUFile *f = opaque;
-
-qemu_set_fd_handler(qemu_get_fd(f), NULL, NULL, NULL);
-process_incoming_migration(f);
+migration_set_incoming_channel(migrate_get_current(), ioc);
+object_unref(OBJECT(ioc));
+return FALSE; /* unregister */
 }
 
 void exec_start_incoming_migration(const char *command, Error **errp)
 {
-QEMUFile *f;
+QIOChannel *ioc;
+const char *argv[] = { "/bin/sh", "-c", command, NULL };
 
-DPRINTF("Attempting to start an incoming migration\n");
-f = qemu_popen_cmd(command, "r");
-if(f == NULL) {
-error_setg_errno(errp, errno, "failed to popen the migration source");
+trace_migration_exec_incoming(command);
+ioc = QIO_CHANNEL(qio_channel_command_new_spawn(argv,
+O_RDONLY,
+errp));
+if (!ioc) {
 return;
 }
 
-qemu_set_fd_handler(qemu_get_fd(f), exec_accept_incoming_migration, NULL,
-f);
+qio_channel_add_watch(ioc,
+  G_IO_IN,
+  exec_accept_incoming_migration,
+  NULL,
+  NULL);
 }
diff --git a/migration/migration.c b/migration/migration.c
index 8decc7d..695384b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -311,10 +311,8 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
 } else if (strstart(uri, "rdma:", &p)) {
 rdma_start_incoming_migration(p, errp);
 #endif
-#if !defined(WIN32)
 } else if (strstart(uri, "exec:", &p)) {
 exec_start_incoming_migration(p, errp);
-#endif
 } else if (strstart

[Qemu-devel] [PULL 15/28] migration: convert fd socket protocol to use QIOChannel

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

Convert the fd socket migration protocol driver to use
QIOChannel and QEMUFileChannel, instead of plain sockets
APIs. It can be unconditionally built because the
QIOChannel APIs it uses will take care to report suitable
error messages if needed.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Message-Id: <1461751518-12128-16-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 migration/Makefile.objs |  4 +--
 migration/fd.c  | 75 +++--
 migration/migration.c   |  4 ---
 trace-events|  4 +++
 4 files changed, 35 insertions(+), 52 deletions(-)

diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index 4221861..2e5040c 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,4 +1,4 @@
-common-obj-y += migration.o socket.o
+common-obj-y += migration.o socket.o fd.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += qemu-file-channel.o
@@ -6,7 +6,7 @@ common-obj-y += xbzrle.o postcopy-ram.o
 common-obj-y += qjson.o
 
 common-obj-$(CONFIG_RDMA) += rdma.o
-common-obj-$(CONFIG_POSIX) += exec.o fd.o
+common-obj-$(CONFIG_POSIX) += exec.o
 
 common-obj-y += block.o
 
diff --git a/migration/fd.c b/migration/fd.c
index 3d788bb..60a75b8 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -1,10 +1,11 @@
 /*
  * QEMU live migration via generic fd
  *
- * Copyright Red Hat, Inc. 2009
+ * Copyright Red Hat, Inc. 2009-2016
  *
  * Authors:
  *  Chris Lalancette 
+ *  Daniel P. Berrange 
  *
  * This work is licensed under the terms of the GNU GPL, version 2.  See
  * the COPYING file in the top-level directory.
@@ -16,75 +17,57 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qemu-common.h"
-#include "qemu/main-loop.h"
-#include "qemu/sockets.h"
 #include "migration/migration.h"
 #include "monitor/monitor.h"
-#include "migration/qemu-file.h"
-#include "block/block.h"
+#include "io/channel-util.h"
+#include "trace.h"
 
-//#define DEBUG_MIGRATION_FD
-
-#ifdef DEBUG_MIGRATION_FD
-#define DPRINTF(fmt, ...) \
-do { printf("migration-fd: " fmt, ## __VA_ARGS__); } while (0)
-#else
-#define DPRINTF(fmt, ...) \
-do { } while (0)
-#endif
-
-static bool fd_is_socket(int fd)
-{
-struct stat stat;
-int ret = fstat(fd, &stat);
-if (ret == -1) {
-/* When in doubt say no */
-return false;
-}
-return S_ISSOCK(stat.st_mode);
-}
 
 void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
 {
+QIOChannel *ioc;
 int fd = monitor_get_fd(cur_mon, fdname, errp);
 if (fd == -1) {
 return;
 }
 
-if (fd_is_socket(fd)) {
-s->to_dst_file = qemu_fopen_socket(fd, "wb");
-} else {
-s->to_dst_file = qemu_fdopen(fd, "wb");
+trace_migration_fd_outgoing(fd);
+ioc = qio_channel_new_fd(fd, errp);
+if (!ioc) {
+close(fd);
+return;
 }
 
-migrate_fd_connect(s);
+migration_set_outgoing_channel(s, ioc);
+object_unref(OBJECT(ioc));
 }
 
-static void fd_accept_incoming_migration(void *opaque)
+static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
+ GIOCondition condition,
+ gpointer opaque)
 {
-QEMUFile *f = opaque;
-
-qemu_set_fd_handler(qemu_get_fd(f), NULL, NULL, NULL);
-process_incoming_migration(f);
+migration_set_incoming_channel(migrate_get_current(), ioc);
+object_unref(OBJECT(ioc));
+return FALSE; /* unregister */
 }
 
 void fd_start_incoming_migration(const char *infd, Error **errp)
 {
+QIOChannel *ioc;
 int fd;
-QEMUFile *f;
-
-DPRINTF("Attempting to start an incoming migration via fd\n");
 
 fd = strtol(infd, NULL, 0);
-if (fd_is_socket(fd)) {
-f = qemu_fopen_socket(fd, "rb");
-} else {
-f = qemu_fdopen(fd, "rb");
-}
-if(f == NULL) {
-error_setg_errno(errp, errno, "failed to open the source descriptor");
+trace_migration_fd_incoming(fd);
+
+ioc = qio_channel_new_fd(fd, errp);
+if (!ioc) {
+close(fd);
 return;
 }
 
-qemu_set_fd_handler(fd, fd_accept_incoming_migration, NULL, f);
+qio_channel_add_watch(ioc,
+  G_IO_IN,
+  fd_accept_incoming_migration,
+  NULL,
+  NULL);
 }
diff --git a/migration/migration.c b/migration/migration.c
index a38da3a..8decc7d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -317,10 +317,8 @@ void qemu_start_incoming_migration(const char *uri, Error **errp)
 #endif
 } else if (strstart(uri, "unix:", &p)) {
 unix_start_incoming_migration(p, errp);
-#if !defined(WIN32)
 } else if (strstart(uri, "fd:", &p)) {
 fd_start_incoming_migration(p, errp);
-#endif
 } else {
 e

[Qemu-devel] [PULL 07/28] migration: force QEMUFile to blocking mode for outgoing migration

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

Instead of relying on the default QEMUFile I/O blocking flag
state, explicitly turn on blocking I/O for outgoing migration
since it takes place in a background thread.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Reviewed-by: Juan Quintela 
Message-Id: <1461751518-12128-8-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 migration/migration.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/migration/migration.c b/migration/migration.c
index ac7790f..c8d10ee 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1791,6 +1791,7 @@ void migrate_fd_connect(MigrationState *s)
 s->expected_downtime = max_downtime/100;
 s->cleanup_bh = qemu_bh_new(migrate_fd_cleanup, s);
 
+qemu_file_set_blocking(s->to_dst_file, true);
 qemu_file_set_rate_limit(s->to_dst_file,
  s->bandwidth_limit / XFER_LIMIT_RATIO);
 
-- 
2.5.5

[Qemu-devel] [PULL 10/28] migration: add reporting of errors for outgoing migration

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

Currently, if an application initiates an outgoing migration,
it may or may not get an error reported back on failure. If
the error occurs synchronously to the 'migrate' command
execution, the client app will see the error message. This
is the case for DNS lookup failures. If the error occurs
asynchronously to the monitor command though, the error
will be thrown away and the client left guessing about
what went wrong. This is the case for failure to connect
to the TCP server (e.g. due to a wrong port, firewall
rules, or other similar errors).

In the future we'll be adding more scope for errors to
happen asynchronously with the TLS protocol handshake.
TLS errors are hard to diagnose even when they are well
reported, so discarding errors entirely will make it
impossible to debug TLS connection problems.

Management apps which do migration are already using
'query-migrate' / 'info migrate' to check up on progress
of background migration operations and to see their end
status. This is a fine place to also include the error
message when things go wrong.

This patch thus adds an 'error-desc' field to the
MigrationInfo struct, which will be populated when
the 'status' is set to 'failed':

(qemu) migrate -d tcp:localhost:9001
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off x-postcopy-ram: off
Migration status: failed (Error connecting to socket: Connection refused)
total time: 0 milliseconds

In the HMP, when doing non-detached migration, it is
also possible to display this error message directly
to the app.

(qemu) migrate tcp:localhost:9001
Error connecting to socket: Connection refused

Or with QMP:

  {
"execute": "query-migrate",
"arguments": {}
  }
  {
"return": {
  "status": "failed",
  "error-desc": "address resolution failed for myhost:9000: No address associated with hostname"
}
  }
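
The flow described above (remember the last error, then expose it
only when the status is 'failed') can be sketched as follows. This
is a simplified model, not the patch: DemoMigState, DemoMigInfo and
the demo_* functions are hypothetical, and a plain string stands in
for QEMU's Error object.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef enum {
    DEMO_STATUS_NONE,
    DEMO_STATUS_ACTIVE,
    DEMO_STATUS_FAILED,
} DemoStatus;

/* Long-lived migration state: keeps the last error that occurred. */
typedef struct DemoMigState {
    DemoStatus status;
    char *error;
} DemoMigState;

/* Snapshot handed back to a 'query-migrate' style caller. */
typedef struct DemoMigInfo {
    DemoStatus status;
    bool has_error_desc;
    char *error_desc; /* owned by the caller when has_error_desc is true */
} DemoMigInfo;

static char *demo_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *out = malloc(n);
    if (out) {
        memcpy(out, s, n);
    }
    return out;
}

/* Called from the asynchronous failure path, e.g. a connect callback. */
static void demo_fd_error(DemoMigState *s, const char *msg)
{
    free(s->error);
    s->error = demo_strdup(msg);
    s->status = DEMO_STATUS_FAILED;
}

/* The error description is only populated for the failed status. */
static DemoMigInfo demo_query(const DemoMigState *s)
{
    DemoMigInfo info = { .status = s->status };

    if (s->status == DEMO_STATUS_FAILED && s->error) {
        info.error_desc = demo_strdup(s->error);
        info.has_error_desc = info.error_desc != NULL;
    }
    return info;
}
```

Because the message is stored on the state rather than printed at
the point of failure, a client polling for status still sees it
after the monitor command that started the migration has returned.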

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Reviewed-by: Juan Quintela 
Message-Id: <1461751518-12128-11-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 hmp.c | 13 -
 include/migration/migration.h |  5 -
 include/qapi/error.h  |  2 +-
 migration/migration.c | 15 ---
 migration/rdma.c  | 10 +++---
 migration/tcp.c   |  2 +-
 migration/unix.c  |  2 +-
 qapi-schema.json  |  7 ++-
 trace-events  |  2 +-
 util/error.c  |  2 +-
 10 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/hmp.c b/hmp.c
index 9f9bcf9..a464ca9 100644
--- a/hmp.c
+++ b/hmp.c
@@ -35,6 +35,7 @@
 #include "block/qapi.h"
 #include "qemu-io.h"
 #include "qemu/cutils.h"
+#include "qemu/error-report.h"
 
 #ifdef CONFIG_SPICE
 #include 
@@ -168,8 +169,15 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
 }
 
 if (info->has_status) {
-monitor_printf(mon, "Migration status: %s\n",
+monitor_printf(mon, "Migration status: %s",
MigrationStatus_lookup[info->status]);
+if (info->status == MIGRATION_STATUS_FAILED &&
+info->has_error_desc) {
+monitor_printf(mon, " (%s)\n", info->error_desc);
+} else {
+monitor_printf(mon, "\n");
+}
+
 monitor_printf(mon, "total time: %" PRIu64 " milliseconds\n",
info->total_time);
 if (info->has_expected_downtime) {
@@ -1533,6 +1541,9 @@ static void hmp_migrate_status_cb(void *opaque)
 if (status->is_block_migration) {
 monitor_printf(status->mon, "\n");
 }
+if (info->has_error_desc) {
+error_report("%s", info->error_desc);
+}
 monitor_resume(status->mon);
 timer_del(status->timer);
 g_free(status);
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 87ad577..d24c6ef 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -171,6 +171,9 @@ struct MigrationState
 QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest) src_page_requests;
 /* The RAMBlock used in the last src_page_request */
 RAMBlock *last_req_rb;
+
+/* The last error that occurred */
+Error *error;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
@@ -207,7 +210,7 @@ void rdma_start_outgoing_migration(void *opaque, const char *host_port, Error **
 
 void rdma_start_incoming_migration(const char *host_port, Error **errp);
 
-void migrate_fd_error(MigrationState *s);
+void migrate_fd_error(MigrationState *s, const Error *error);
 
 void migrate_fd_connect(MigrationState *s);
 
diff --git a/include/qapi/error.h b/include/qapi/error.h
index 11be232..0576659 100644
--- a/include/qapi/error.h
+++ b/include/qapi/error.h
@@ -134,7 +134,7 @@ typedef enum ErrorClass {
 /*
  * Get @err's human-readable error messa

[Qemu-devel] [PULL 13/28] migration: rename unix.c to socket.c

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

The unix.c file will be nearly the same as the tcp.c file,
only differing in the initial SocketAddress creation code.
Rename unix.c to socket.c and refactor it a little to
prepare for merging the TCP code.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Message-Id: <1461751518-12128-14-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 migration/Makefile.objs |   2 +-
 migration/socket.c  | 132 
 migration/unix.c| 121 
 trace-events|   8 +--
 4 files changed, 137 insertions(+), 126 deletions(-)
 create mode 100644 migration/socket.c
 delete mode 100644 migration/unix.c

diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index ad7fed9..b0b9550 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -1,4 +1,4 @@
-common-obj-y += migration.o tcp.o unix.o
+common-obj-y += migration.o tcp.o socket.o
 common-obj-y += vmstate.o
 common-obj-y += qemu-file.o qemu-file-buf.o qemu-file-unix.o qemu-file-stdio.o
 common-obj-y += qemu-file-channel.o
diff --git a/migration/socket.c b/migration/socket.c
new file mode 100644
index 000..a9911d6
--- /dev/null
+++ b/migration/socket.c
@@ -0,0 +1,132 @@
+/*
+ * QEMU live migration via Unix Domain Sockets
+ *
+ * Copyright Red Hat, Inc. 2009-2016
+ *
+ * Authors:
+ *  Chris Lalancette 
+ *  Daniel P. Berrange 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Contributions after 2012-01-13 are licensed under the terms of the
+ * GNU GPL, version 2 or (at your option) any later version.
+ */
+
+#include "qemu/osdep.h"
+
+#include "qemu-common.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "migration/migration.h"
+#include "migration/qemu-file.h"
+#include "io/channel-socket.h"
+#include "trace.h"
+
+
+static SocketAddress *unix_build_address(const char *path)
+{
+SocketAddress *saddr;
+
+saddr = g_new0(SocketAddress, 1);
+saddr->type = SOCKET_ADDRESS_KIND_UNIX;
+saddr->u.q_unix.data = g_new0(UnixSocketAddress, 1);
+saddr->u.q_unix.data->path = g_strdup(path);
+
+return saddr;
+}
+
+
+static void socket_outgoing_migration(Object *src,
+  Error *err,
+  gpointer opaque)
+{
+MigrationState *s = opaque;
+QIOChannel *sioc = QIO_CHANNEL(src);
+
+if (err) {
+trace_migration_socket_outgoing_error(error_get_pretty(err));
+s->to_dst_file = NULL;
+migrate_fd_error(s, err);
+} else {
+trace_migration_socket_outgoing_connected();
+migration_set_outgoing_channel(s, sioc);
+}
+object_unref(src);
+}
+
+static void socket_start_outgoing_migration(MigrationState *s,
+SocketAddress *saddr,
+Error **errp)
+{
+QIOChannelSocket *sioc = qio_channel_socket_new();
+qio_channel_socket_connect_async(sioc,
+ saddr,
+ socket_outgoing_migration,
+ s,
+ NULL);
+qapi_free_SocketAddress(saddr);
+}
+
+void unix_start_outgoing_migration(MigrationState *s,
+   const char *path,
+   Error **errp)
+{
+SocketAddress *saddr = unix_build_address(path);
+socket_start_outgoing_migration(s, saddr, errp);
+}
+
+
+static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
+ GIOCondition condition,
+ gpointer opaque)
+{
+QIOChannelSocket *sioc;
+Error *err = NULL;
+
+sioc = qio_channel_socket_accept(QIO_CHANNEL_SOCKET(ioc),
+ &err);
+if (!sioc) {
+error_report("could not accept migration connection (%s)",
+ error_get_pretty(err));
+goto out;
+}
+
+trace_migration_socket_incoming_accepted();
+
+migration_set_incoming_channel(migrate_get_current(),
+   QIO_CHANNEL(sioc));
+object_unref(OBJECT(sioc));
+
+out:
+/* Close listening socket as it is no longer needed */
+qio_channel_close(ioc, NULL);
+return FALSE; /* unregister */
+}
+
+
+static void socket_start_incoming_migration(SocketAddress *saddr,
+Error **errp)
+{
+QIOChannelSocket *listen_ioc = qio_channel_socket_new();
+
+if (qio_channel_socket_listen_sync(listen_ioc, saddr, errp) < 0) {
+object_unref(OBJECT(listen_ioc));
+qapi_free_SocketAddress(saddr);
+return;
+}
+
+qio_channel_add_watch(QIO_CHANNEL(listen_ioc),
+  G_IO_IN,
+

[Qemu-devel] [PULL 11/28] migration: convert post-copy to use QIOChannelBuffer

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

The post-copy code does some I/O to/from an intermediate
in-memory buffer rather than direct to the underlying
I/O channel. Switch this code to use QIOChannelBuffer
instead of QEMUSizedBuffer.
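
As background, an in-memory channel of this kind boils down to a
growable byte buffer whose contents can later be handed over as a
plain (pointer, length) pair, which matches the new
qemu_savevm_send_packaged(f, buf, len) signature in this patch. The
sketch below is hypothetical (DemoBuffer and demo_buffer_write are
not QEMU names) and ignores size_t overflow for very large writes.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; zero-initialise before first use. */
typedef struct DemoBuffer {
    uint8_t *data;
    size_t usage;    /* bytes currently stored */
    size_t capacity; /* bytes allocated */
} DemoBuffer;

/* Append len bytes, doubling the allocation as needed. Returns 0 or -1. */
static int demo_buffer_write(DemoBuffer *b, const void *src, size_t len)
{
    if (len == 0) {
        return 0;
    }
    if (len > b->capacity - b->usage) {
        size_t need = b->usage + len;
        size_t cap = b->capacity ? b->capacity : 64;
        uint8_t *p;

        while (cap < need) {
            cap *= 2;
        }
        p = realloc(b->data, cap);
        if (!p) {
            return -1;
        }
        b->data = p;
        b->capacity = cap;
    }
    memcpy(b->data + b->usage, src, len);
    b->usage += len;
    return 0;
}
```

Once all writes are done, b.data and b.usage describe the whole
package as one contiguous block.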

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Message-Id: <1461751518-12128-12-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 docs/migration.txt  |  4 ++--
 include/sysemu/sysemu.h |  2 +-
 migration/migration.c   | 15 +++
 migration/savevm.c  | 47 ---
 4 files changed, 26 insertions(+), 42 deletions(-)

diff --git a/docs/migration.txt b/docs/migration.txt
index 90209ab..6503c17 100644
--- a/docs/migration.txt
+++ b/docs/migration.txt
@@ -403,8 +403,8 @@ listen thread: --- page -- page -- page 
-- page -- page --
 
 On receipt of CMD_PACKAGED (1)
All the data associated with the package - the ( ... ) section in the
-diagram - is read into memory (into a QEMUSizedBuffer), and the main thread
-recurses into qemu_loadvm_state_main to process the contents of the package (2)
+diagram - is read into memory, and the main thread recurses into
+qemu_loadvm_state_main to process the contents of the package (2)
 which contains commands (3,6) and devices (4...)
 
 On receipt of 'postcopy listen' - 3 -(i.e. the 1st command in the package)
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 618169c..9428141 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -119,7 +119,7 @@ void qemu_savevm_command_send(QEMUFile *f, enum qemu_vm_cmd 
command,
   uint16_t len, uint8_t *data);
 void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
 void qemu_savevm_send_open_return_path(QEMUFile *f);
-int qemu_savevm_send_packaged(QEMUFile *f, const QEMUSizedBuffer *qsb);
+int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len);
 void qemu_savevm_send_postcopy_advise(QEMUFile *f);
 void qemu_savevm_send_postcopy_listen(QEMUFile *f);
 void qemu_savevm_send_postcopy_run(QEMUFile *f);
diff --git a/migration/migration.c b/migration/migration.c
index 1420ccc..1ab9535 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -34,6 +34,7 @@
 #include "qom/cpu.h"
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
+#include "io/channel-buffer.h"
 
 #define MAX_THROTTLE  (32 << 20)  /* Migration transfer speed throttling */
 
@@ -1457,7 +1458,8 @@ static int 
await_return_path_close_on_source(MigrationState *ms)
 static int postcopy_start(MigrationState *ms, bool *old_vm_running)
 {
 int ret;
-const QEMUSizedBuffer *qsb;
+QIOChannelBuffer *bioc;
+QEMUFile *fb;
 int64_t time_at_stop = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 migrate_set_state(&ms->state, MIGRATION_STATUS_ACTIVE,
   MIGRATION_STATUS_POSTCOPY_ACTIVE);
@@ -1516,11 +1518,9 @@ static int postcopy_start(MigrationState *ms, bool 
*old_vm_running)
  * So we wrap the device state up in a package with a length at the start;
  * to do this we use a qemu_buf to hold the whole of the device state.
  */
-QEMUFile *fb = qemu_bufopen("w", NULL);
-if (!fb) {
-error_report("Failed to create buffered file");
-goto fail;
-}
+bioc = qio_channel_buffer_new(4096);
+fb = qemu_fopen_channel_output(QIO_CHANNEL(bioc));
+object_unref(OBJECT(bioc));
 
 /*
  * Make sure the receiver can get incoming pages before we send the rest
@@ -1534,10 +1534,9 @@ static int postcopy_start(MigrationState *ms, bool 
*old_vm_running)
 qemu_savevm_send_postcopy_run(fb);
 
 /* <><> end of stuff going into the package */
-qsb = qemu_buf_get(fb);
 
 /* Now send that blob */
-if (qemu_savevm_send_packaged(ms->to_dst_file, qsb)) {
+if (qemu_savevm_send_packaged(ms->to_dst_file, bioc->data, bioc->usage)) {
 goto fail_closefb;
 }
 qemu_fclose(fb);
diff --git a/migration/savevm.c b/migration/savevm.c
index 65ce0c6..43031a0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -51,6 +51,7 @@
 #include "block/snapshot.h"
 #include "block/qapi.h"
 #include "qemu/cutils.h"
+#include "io/channel-buffer.h"
 
 #ifndef ETH_P_RARP
 #define ETH_P_RARP 0x8035
@@ -760,10 +761,8 @@ void qemu_savevm_send_open_return_path(QEMUFile *f)
  *0 on success
  *-ve on error
  */
-int qemu_savevm_send_packaged(QEMUFile *f, const QEMUSizedBuffer *qsb)
+int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len)
 {
-size_t cur_iov;
-size_t len = qsb_get_length(qsb);
 uint32_t tmp;
 
 if (len > MAX_VM_CMD_PACKAGED_SIZE) {
@@ -777,18 +776,7 @@ int qemu_savevm_send_packaged(QEMUFile *f, const 
QEMUSizedBuffer *qsb)
 trace_qemu_savevm_send_packaged();
 qemu_savevm_command_send(f, MIG_CMD_PACKAGED, 4, (uint8_t *)&tmp);
 
-/* all the data follows (concatinating the iov's) */
-for (cur_iov = 0; cur_iov < qsb->n_
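
The comment in postcopy_start() above describes wrapping the device state in a package with a length at the start, and the visible part of qemu_savevm_send_packaged() sends a 4-byte length before the data. A minimal standalone sketch of that framing follows. It is not the QEMU implementation: demo_pack(), DEMO_MAX_PACKAGED_SIZE and the big-endian byte order of the length field are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size limit, standing in for MAX_VM_CMD_PACKAGED_SIZE. */
#define DEMO_MAX_PACKAGED_SIZE (1u << 24)

/*
 * Hypothetical helper: write a 4-byte length (big-endian, an assumption)
 * followed by the payload into `out`.
 * Returns the total number of bytes written, or 0 if the payload exceeds
 * the limit or does not fit into `out`.
 */
static size_t demo_pack(uint8_t *out, size_t out_cap,
                        const uint8_t *buf, size_t len)
{
    if (len > DEMO_MAX_PACKAGED_SIZE || out_cap < 4 || len > out_cap - 4) {
        return 0;
    }

    /* Length header, most significant byte first */
    out[0] = (uint8_t)(len >> 24);
    out[1] = (uint8_t)(len >> 16);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)len;

    /* The payload follows the header as one contiguous blob */
    memcpy(out + 4, buf, len);
    return len + 4;
}
```

Passing a flat (buf, len) pair, as the new function signature does, means the sender no longer has to walk an iovec array to concatenate the data.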

[Qemu-devel] [PULL 01/28] s390: use FILE instead of QEMUFile for creating text file

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

The s390 skeys monitor command needs to write out a plain text
file. Currently it is using the QEMUFile class for this, but
work is ongoing to refactor QEMUFile and eliminate much code
related to it. The only feature qemu_fopen() gives over fopen()
is support for QEMU FD passing, but this can be achieved with
qemu_open() + fdopen() too. Switching to regular stdio FILE
APIs avoids the need to sprintf via an intermediate buffer, which
slightly simplifies the code.

Reviewed-by: Eric Blake 
Signed-off-by: Daniel P. Berrange 
Reviewed-by: Juan Quintela 
Message-Id: <1461751518-12128-2-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 hw/s390x/s390-skeys.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/hw/s390x/s390-skeys.c b/hw/s390x/s390-skeys.c
index d772cfc..e2d4e1a 100644
--- a/hw/s390x/s390-skeys.c
+++ b/hw/s390x/s390-skeys.c
@@ -47,15 +47,11 @@ void s390_skeys_init(void)
 qdev_init_nofail(DEVICE(obj));
 }
 
-static void write_keys(QEMUFile *f, uint8_t *keys, uint64_t startgfn,
+static void write_keys(FILE *f, uint8_t *keys, uint64_t startgfn,
uint64_t count, Error **errp)
 {
 uint64_t curpage = startgfn;
 uint64_t maxpage = curpage + count - 1;
-const char *fmt = "page=%03" PRIx64 ": key(%d) => ACC=%X, FP=%d, REF=%d,"
-  " ch=%d, reserved=%d\n";
-char buf[128];
-int len;
 
 for (; curpage <= maxpage; curpage++) {
 uint8_t acc = (*keys & 0xF0) >> 4;
@@ -64,10 +60,9 @@ static void write_keys(QEMUFile *f, uint8_t *keys, uint64_t 
startgfn,
 int ch = (*keys & 0x02);
 int res = (*keys & 0x01);
 
-len = snprintf(buf, sizeof(buf), fmt, curpage,
-   *keys, acc, fp, ref, ch, res);
-assert(len < sizeof(buf));
-qemu_put_buffer(f, (uint8_t *)buf, len);
+fprintf(f, "page=%03" PRIx64 ": key(%d) => ACC=%X, FP=%d, REF=%d,"
+" ch=%d, reserved=%d\n",
+curpage, *keys, acc, fp, ref, ch, res);
 keys++;
 }
 }
@@ -116,7 +111,8 @@ void qmp_dump_skeys(const char *filename, Error **errp)
 vaddr cur_gfn = 0;
 uint8_t *buf;
 int ret;
-QEMUFile *f;
+int fd;
+FILE *f;
 
 /* Quick check to see if guest is using storage keys*/
 if (!skeyclass->skeys_enabled(ss)) {
@@ -125,8 +121,14 @@ void qmp_dump_skeys(const char *filename, Error **errp)
 return;
 }
 
-f = qemu_fopen(filename, "wb");
+fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0600);
+if (fd < 0) {
+error_setg_file_open(errp, errno, filename);
+return;
+}
+f = fdopen(fd, "wb");
 if (!f) {
+close(fd);
 error_setg_file_open(errp, errno, filename);
 return;
 }
@@ -162,7 +164,7 @@ out_free:
 error_propagate(errp, lerr);
 g_free(buf);
 out:
-qemu_fclose(f);
+fclose(f);
 }
 
 static void qemu_s390_skeys_init(Object *obj)
-- 
2.5.5

[Qemu-devel] [PULL 09/28] migration: add helpers for creating QEMUFile from a QIOChannel

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

Currently creating a QEMUFile instance from a QIOChannel is
quite simple, only requiring a single call to
qemu_fopen_channel_input or qemu_fopen_channel_output,
depending on the end of the migration connection.

When QEMU gains TLS support, however, there will need to be
a TLS negotiation done in between creation of the QIOChannel
and creation of the final QEMUFile. Introduce some helper
methods that will encapsulate this logic, isolating the
migration protocol drivers from knowledge about TLS.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Acked-by: Juan Quintela 
Message-Id: <1461751518-12128-10-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 include/migration/migration.h |  6 ++
 migration/migration.c | 21 +
 2 files changed, 27 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index 9e36a97..87ad577 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -179,6 +179,12 @@ void process_incoming_migration(QEMUFile *f);
 
 void qemu_start_incoming_migration(const char *uri, Error **errp);
 
+void migration_set_incoming_channel(MigrationState *s,
+QIOChannel *ioc);
+
+void migration_set_outgoing_channel(MigrationState *s,
+QIOChannel *ioc);
+
 uint64_t migrate_max_downtime(void);
 
 void exec_start_incoming_migration(const char *host_port, Error **errp);
diff --git a/migration/migration.c b/migration/migration.c
index c8d10ee..c960e16 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -428,6 +428,27 @@ void process_incoming_migration(QEMUFile *f)
 qemu_coroutine_enter(co, f);
 }
 
+
+void migration_set_incoming_channel(MigrationState *s,
+QIOChannel *ioc)
+{
+QEMUFile *f = qemu_fopen_channel_input(ioc);
+
+process_incoming_migration(f);
+}
+
+
+void migration_set_outgoing_channel(MigrationState *s,
+QIOChannel *ioc)
+{
+QEMUFile *f = qemu_fopen_channel_output(ioc);
+
+s->to_dst_file = f;
+
+migrate_fd_connect(s);
+}
+
+
 /*
  * Send a message on the return channel back to the source
  * of the migration.
-- 
2.5.5

[Qemu-devel] [PULL 04/28] migration: ensure qemu_fflush() always writes full data amount

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

The QEMUFile writev_buffer / put_buffer functions are expected
to write out the full set of requested data, blocking until
complete. The qemu_fflush() caller does not expect to deal with
partial writes. Clarify the function comments and add a sanity
check to the code to catch mistaken implementations.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Reviewed-by: Juan Quintela 
Message-Id: <1461751518-12128-5-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 include/migration/qemu-file.h |  6 --
 migration/qemu-file.c | 16 
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 3f6b4ed..5909ff0 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -28,7 +28,8 @@
 
 /* This function writes a chunk of data to a file at the given position.
  * The pos argument can be ignored if the file is only being used for
- * streaming.  The handler should try to write all of the data it can.
+ * streaming.  The handler must write all of the data or return a negative
+ * errno value.
  */
 typedef ssize_t (QEMUFilePutBufferFunc)(void *opaque, const uint8_t *buf,
 int64_t pos, size_t size);
@@ -54,7 +55,8 @@ typedef int (QEMUFileCloseFunc)(void *opaque);
 typedef int (QEMUFileGetFD)(void *opaque);
 
 /*
- * This function writes an iovec to file.
+ * This function writes an iovec to file. The handler must write all
+ * of the data or return a negative errno value.
  */
 typedef ssize_t (QEMUFileWritevBufferFunc)(void *opaque, struct iovec *iov,
int iovcnt, int64_t pos);
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 6f4a129..656db4a 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -108,11 +108,13 @@ bool qemu_file_is_writable(QEMUFile *f)
  * Flushes QEMUFile buffer
  *
  * If there is writev_buffer QEMUFileOps it uses it otherwise uses
- * put_buffer ops.
+ * put_buffer ops. This will flush all pending data. If data was
+ * only partially flushed, it will set an error state.
  */
 void qemu_fflush(QEMUFile *f)
 {
 ssize_t ret = 0;
+ssize_t expect = 0;
 
 if (!qemu_file_is_writable(f)) {
 return;
@@ -120,21 +122,27 @@ void qemu_fflush(QEMUFile *f)
 
 if (f->ops->writev_buffer) {
 if (f->iovcnt > 0) {
+expect = iov_size(f->iov, f->iovcnt);
 ret = f->ops->writev_buffer(f->opaque, f->iov, f->iovcnt, f->pos);
 }
 } else {
 if (f->buf_index > 0) {
+expect = f->buf_index;
 ret = f->ops->put_buffer(f->opaque, f->buf, f->pos, f->buf_index);
 }
 }
+
 if (ret >= 0) {
 f->pos += ret;
 }
+/* We expect the QEMUFile write impl to send the full
+ * data set we requested, so sanity check that.
+ */
+if (ret != expect) {
+qemu_file_set_error(f, ret < 0 ? ret : -EIO);
+}
 f->buf_index = 0;
 f->iovcnt = 0;
-if (ret < 0) {
-qemu_file_set_error(f, ret);
-}
 }
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags)
-- 
2.5.5
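
The sanity check added to qemu_fflush() can be illustrated with a small standalone sketch. The names below (put_fn, flush_checked and the three demo handlers) are hypothetical and are not part of QEMU. The sketch only mirrors the rule that a handler's own error is passed through and a short write is turned into -EIO.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical write handler type, loosely modelled on put_buffer */
typedef ssize_t (*put_fn)(void *opaque, const unsigned char *buf, size_t size);

/*
 * Call the handler once and verify that it wrote everything.
 * Returns 0 on a full write, the handler's negative errno on failure,
 * or -EIO if the handler reported a partial write.
 */
static int flush_checked(put_fn put, void *opaque,
                         const unsigned char *buf, size_t size)
{
    ssize_t expect = (ssize_t)size;
    ssize_t ret = 0;

    if (size > 0) {
        ret = put(opaque, buf, size);
    }
    if (ret != expect) {
        return ret < 0 ? (int)ret : -EIO;
    }
    return 0;
}

/* Demo handler that claims to have written all of the data */
static ssize_t put_full(void *opaque, const unsigned char *buf, size_t size)
{
    (void)opaque;
    (void)buf;
    return (ssize_t)size;
}

/* Demo handler that violates the contract by writing only half */
static ssize_t put_short(void *opaque, const unsigned char *buf, size_t size)
{
    (void)opaque;
    (void)buf;
    return (ssize_t)(size / 2);
}

/* Demo handler that fails outright */
static ssize_t put_fail(void *opaque, const unsigned char *buf, size_t size)
{
    (void)opaque;
    (void)buf;
    (void)size;
    return -EPIPE;
}
```

With these handlers, flush_checked() yields 0 for put_full, -EIO for put_short and -EPIPE for put_fail, so a caller never has to handle a partial write itself.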

[Qemu-devel] [PULL 00/28] migration: support for TLS

2016-05-25 Thread Amit Shah

The following changes since commit 287db79df8af8e31f18e262feb5e05103a09e4d4:

  Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into 
staging (2016-05-24 13:06:33 +0100)

are available in the git repository at:

  https://git.kernel.org/pub/scm/virt/qemu/amit/migration.git 
tags/migration-2.7-2

for you to fetch changes up to 12992c16d9afd8a23a94a84ad532a1adedf9e511:

  migration: remove qemu_get_fd method from QEMUFile (2016-05-26 11:32:21 +0530)


migration: add TLS support to the migration data channel

This is a big refactoring of the migration backend code - moving away from
QEMUFile to the new QIOChannel framework introduced here.  This brings a
good level of abstraction and reduction of many lines of code.

This series also adds the ability for many backends (all except RDMA) to
use TLS for encrypting the migration data between the endpoints.




Daniel P. Berrange (28):
  s390: use FILE instead of QEMUFile for creating text file
  io: avoid double-free when closing QIOChannelBuffer
  migration: remove use of qemu_bufopen from vmstate tests
  migration: ensure qemu_fflush() always writes full data amount
  migration: split migration hooks out of QEMUFileOps
  migration: introduce set_blocking function in QEMUFileOps
  migration: force QEMUFile to blocking mode for outgoing migration
  migration: introduce a new QEMUFile impl based on QIOChannel
  migration: add helpers for creating QEMUFile from a QIOChannel
  migration: add reporting of errors for outgoing migration
  migration: convert post-copy to use QIOChannelBuffer
  migration: convert unix socket protocol to use QIOChannel
  migration: rename unix.c to socket.c
  migration: convert tcp socket protocol to use QIOChannel
  migration: convert fd socket protocol to use QIOChannel
  migration: convert exec socket protocol to use QIOChannel
  migration: convert RDMA to use QIOChannel interface
  migration: convert savevm to use QIOChannel for writing to files
  migration: delete QEMUFile buffer implementation
  migration: delete QEMUSizedBuffer struct
  migration: delete QEMUFile sockets implementation
  migration: delete QEMUFile stdio implementation
  migration: move definition of struct QEMUFile back into qemu-file.c
  migration: don't use an array for storing migrate parameters
  migration: define 'tls-creds' and 'tls-hostname' migration parameters
  migration: add support for encrypting data with TLS
  migration: remove support for non-iovec based write handlers
  migration: remove qemu_get_fd method from QEMUFile

 docs/migration.txt |   4 +-
 hmp-commands.hx|   2 +-
 hmp.c  |  57 -
 hw/s390x/s390-skeys.c  |  26 +--
 include/migration/migration.h  |  26 ++-
 include/migration/qemu-file.h  |  57 ++---
 include/qapi/error.h   |   2 +-
 include/qemu/typedefs.h|   1 -
 include/sysemu/sysemu.h|   2 +-
 io/channel-buffer.c|   1 +
 migration/Makefile.objs|   7 +-
 migration/exec.c   |  62 +++---
 migration/fd.c |  75 +++
 migration/migration.c  | 157 +-
 migration/qemu-file-buf.c  | 464 -
 migration/qemu-file-channel.c  | 180 
 migration/qemu-file-internal.h |  53 -
 migration/qemu-file-stdio.c| 196 -
 migration/qemu-file-unix.c | 323 
 migration/qemu-file.c  | 110 +-
 migration/ram.c|   6 +-
 migration/rdma.c   | 380 -
 migration/savevm.c |  63 ++
 migration/socket.c | 183 
 migration/tcp.c| 102 -
 migration/tls.c| 161 ++
 migration/unix.c   | 103 -
 qapi-schema.json   |  65 +-
 tests/Makefile |   6 +-
 tests/test-vmstate.c   |  55 ++---
 trace-events   |  25 ++-
 util/error.c   |   2 +-
 32 files changed, 1281 insertions(+), 1675 deletions(-)
 delete mode 100644 migration/qemu-file-buf.c
 create mode 100644 migration/qemu-file-channel.c
 delete mode 100644 migration/qemu-file-internal.h
 delete mode 100644 migration/qemu-file-stdio.c
 delete mode 100644 migration/qemu-file-unix.c
 create mode 100644 migration/socket.c
 delete mode 100644 migration/tcp.c
 create mode 100644 migration/tls.c
 delete mode 100644 migration/unix.c

-- 
2.5.5

[Qemu-devel] [PULL 02/28] io: avoid double-free when closing QIOChannelBuffer

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

The QIOChannelBuffer's close implementation will free
the internal data buffer. It failed to reset the pointer
to NULL though, so when the object is later finalized
it will free it a second time, with a predictable crash.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Reviewed-by: Juan Quintela 
Message-Id: <1461751518-12128-3-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 io/channel-buffer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/io/channel-buffer.c b/io/channel-buffer.c
index 3e5117b..43d7959 100644
--- a/io/channel-buffer.c
+++ b/io/channel-buffer.c
@@ -140,6 +140,7 @@ static int qio_channel_buffer_close(QIOChannel *ioc,
 QIOChannelBuffer *bioc = QIO_CHANNEL_BUFFER(ioc);
 
 g_free(bioc->data);
+bioc->data = NULL;
 bioc->capacity = bioc->usage = bioc->offset = 0;
 
 return 0;
-- 
2.5.5
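
The failure mode fixed here is the common close-then-finalize double free. A minimal standalone sketch of the pattern follows, using plain malloc()/free() in place of GLib and a hypothetical struct demo_buffer in place of QIOChannelBuffer.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a buffer-backed channel object */
struct demo_buffer {
    unsigned char *data;
    size_t capacity;
    size_t usage;
};

/* Explicit close: release the storage and forget the pointer */
static void demo_buffer_close(struct demo_buffer *b)
{
    free(b->data);
    b->data = NULL;  /* without this, finalize would free it again */
    b->capacity = b->usage = 0;
}

/* Finalizer: runs when the object is destroyed, with or without a close */
static void demo_buffer_finalize(struct demo_buffer *b)
{
    free(b->data);   /* free(NULL) is defined to do nothing */
    b->data = NULL;
}
```

Resetting the pointer lets the finalizer stay unconditional: it is safe whether or not close ran first. g_free() behaves the same way when given NULL.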

[Qemu-devel] [PULL 03/28] migration: remove use of qemu_bufopen from vmstate tests

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

Some of the test-vmstate.c test cases use a temporary file
while others use a memory buffer. To facilitate the future
removal of the qemu_bufopen() function, convert all the tests
to use a temporary file.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Reviewed-by: Juan Quintela 
Message-Id: <1461751518-12128-4-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 tests/Makefile   |  2 +-
 tests/test-vmstate.c | 44 +---
 2 files changed, 14 insertions(+), 32 deletions(-)

diff --git a/tests/Makefile b/tests/Makefile
index 1bbd1ca..b5cb75e 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -438,7 +438,7 @@ tests/test-qdev-global-props$(EXESUF): 
tests/test-qdev-global-props.o \
hw/core/fw-path-provider.o \
$(test-qapi-obj-y)
 tests/test-vmstate$(EXESUF): tests/test-vmstate.o \
-   migration/vmstate.o migration/qemu-file.o migration/qemu-file-buf.o \
+   migration/vmstate.o migration/qemu-file.o \
 migration/qemu-file-unix.o migration/qjson.o \
$(test-qom-obj-y)
 tests/test-timed-average$(EXESUF): tests/test-timed-average.o qemu-timer.o \
diff --git a/tests/test-vmstate.c b/tests/test-vmstate.c
index 713d444..f337cf6 100644
--- a/tests/test-vmstate.c
+++ b/tests/test-vmstate.c
@@ -44,11 +44,6 @@ void yield_until_fd_readable(int fd)
 select(fd + 1, &fds, NULL, NULL, NULL);
 }
 
-/*
- * Some tests use 'open_test_file' to work on a real fd, some use
- * an in memory file (QEMUSizedBuffer+qemu_bufopen); we could pick one
- * but this way we test both.
- */
 
 /* Duplicate temp_fd and seek to the beginning of the file */
 static QEMUFile *open_test_file(bool write)
@@ -61,20 +56,6 @@ static QEMUFile *open_test_file(bool write)
 return qemu_fdopen(fd, write ? "wb" : "rb");
 }
 
-/*
- * Check that the contents of the memory-buffered file f match
- * the given size/data.
- */
-static void check_mem_file(QEMUFile *f, void *data, size_t size)
-{
-uint8_t *result = g_malloc(size);
-const QEMUSizedBuffer *qsb = qemu_buf_get(f);
-g_assert_cmpint(qsb_get_length(qsb), ==, size);
-g_assert_cmpint(qsb_get_buffer(qsb, 0, size, result), ==, size);
-g_assert_cmpint(memcmp(result, data, size), ==, 0);
-g_free(result);
-}
-
 #define SUCCESS(val) \
 g_assert_cmpint((val), ==, 0)
 
@@ -392,7 +373,7 @@ static const VMStateDescription vmstate_skipping = {
 
 static void test_save_noskip(void)
 {
-QEMUFile *fsave = qemu_bufopen("w", NULL);
+QEMUFile *fsave = open_test_file(true);
 TestStruct obj = { .a = 1, .b = 2, .c = 3, .d = 4, .e = 5, .f = 6,
.skip_c_e = false };
 vmstate_save_state(fsave, &vmstate_skipping, &obj, NULL);
@@ -406,13 +387,14 @@ static void test_save_noskip(void)
 0, 0, 0, 5, /* e */
 0, 0, 0, 0, 0, 0, 0, 6, /* f */
 };
-check_mem_file(fsave, expected, sizeof(expected));
+
 qemu_fclose(fsave);
+compare_vmstate(expected, sizeof(expected));
 }
 
 static void test_save_skip(void)
 {
-QEMUFile *fsave = qemu_bufopen("w", NULL);
+QEMUFile *fsave = open_test_file(true);
 TestStruct obj = { .a = 1, .b = 2, .c = 3, .d = 4, .e = 5, .f = 6,
.skip_c_e = true };
 vmstate_save_state(fsave, &vmstate_skipping, &obj, NULL);
@@ -424,13 +406,14 @@ static void test_save_skip(void)
 0, 0, 0, 0, 0, 0, 0, 4, /* d */
 0, 0, 0, 0, 0, 0, 0, 6, /* f */
 };
-check_mem_file(fsave, expected, sizeof(expected));
 
 qemu_fclose(fsave);
+compare_vmstate(expected, sizeof(expected));
 }
 
 static void test_load_noskip(void)
 {
+QEMUFile *fsave = open_test_file(true);
 uint8_t buf[] = {
 0, 0, 0, 10, /* a */
 0, 0, 0, 20, /* b */
@@ -440,10 +423,10 @@ static void test_load_noskip(void)
 0, 0, 0, 0, 0, 0, 0, 60, /* f */
 QEMU_VM_EOF, /* just to ensure we won't get EOF reported prematurely */
 };
+qemu_put_buffer(fsave, buf, sizeof(buf));
+qemu_fclose(fsave);
 
-QEMUSizedBuffer *qsb = qsb_create(buf, sizeof(buf));
-g_assert(qsb);
-QEMUFile *loading = qemu_bufopen("r", qsb);
+QEMUFile *loading = open_test_file(false);
 TestStruct obj = { .skip_c_e = false };
 vmstate_load_state(loading, &vmstate_skipping, &obj, 2);
 g_assert(!qemu_file_get_error(loading));
@@ -454,11 +437,11 @@ static void test_load_noskip(void)
 g_assert_cmpint(obj.e, ==, 50);
 g_assert_cmpint(obj.f, ==, 60);
 qemu_fclose(loading);
-qsb_free(qsb);
 }
 
 static void test_load_skip(void)
 {
+QEMUFile *fsave = open_test_file(true);
 uint8_t buf[] = {
 0, 0, 0, 10, /* a */
 0, 0, 0, 20, /* b */
@@ -466,10 +449,10 @@ static void test_load_skip(void)
 0, 0, 0, 0, 0, 0, 0, 60, /* f */
 QEMU_VM_EOF, /* just to ensure we won't get EOF reported prematurely */
 };
+qemu_put

[Qemu-devel] [PULL 05/28] migration: split migration hooks out of QEMUFileOps

2016-05-25 Thread Amit Shah

From: "Daniel P. Berrange" 

The QEMUFileOps struct contains the I/O subsystem callbacks
and the migration stage hooks. Split the hooks out into a
separate QEMUFileHooks struct to make it easier to refactor
the I/O side of QEMUFile without affecting the hooks.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Daniel P. Berrange 
Reviewed-by: Juan Quintela 
Message-Id: <1461751518-12128-6-git-send-email-berra...@redhat.com>
Signed-off-by: Amit Shah 
---
 include/migration/qemu-file.h  | 10 +++---
 migration/qemu-file-internal.h |  1 +
 migration/qemu-file.c  | 24 +++-
 migration/rdma.c   |  8 
 4 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 5909ff0..1934a64 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -108,13 +108,16 @@ typedef struct QEMUFileOps {
 QEMUFileCloseFunc *close;
 QEMUFileGetFD *get_fd;
 QEMUFileWritevBufferFunc *writev_buffer;
+QEMURetPathFunc *get_return_path;
+QEMUFileShutdownFunc *shut_down;
+} QEMUFileOps;
+
+typedef struct QEMUFileHooks {
 QEMURamHookFunc *before_ram_iterate;
 QEMURamHookFunc *after_ram_iterate;
 QEMURamHookFunc *hook_ram_load;
 QEMURamSaveFunc *save_page;
-QEMURetPathFunc *get_return_path;
-QEMUFileShutdownFunc *shut_down;
-} QEMUFileOps;
+} QEMUFileHooks;
 
 struct QEMUSizedBuffer {
 struct iovec *iov;
@@ -129,6 +132,7 @@ QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 QEMUFile *qemu_bufopen(const char *mode, QEMUSizedBuffer *input);
+void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks);
 int qemu_get_fd(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 int64_t qemu_ftell(QEMUFile *f);
diff --git a/migration/qemu-file-internal.h b/migration/qemu-file-internal.h
index d95e853..8fdfa95 100644
--- a/migration/qemu-file-internal.h
+++ b/migration/qemu-file-internal.h
@@ -33,6 +33,7 @@
 
 struct QEMUFile {
 const QEMUFileOps *ops;
+const QEMUFileHooks *hooks;
 void *opaque;
 
 int64_t bytes_xfer;
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 656db4a..b480b72 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -80,6 +80,12 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps 
*ops)
 return f;
 }
 
+
+void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks)
+{
+f->hooks = hooks;
+}
+
 /*
  * Get last error for stream f
  *
@@ -149,8 +155,8 @@ void ram_control_before_iterate(QEMUFile *f, uint64_t flags)
 {
 int ret = 0;
 
-if (f->ops->before_ram_iterate) {
-ret = f->ops->before_ram_iterate(f, f->opaque, flags, NULL);
+if (f->hooks && f->hooks->before_ram_iterate) {
+ret = f->hooks->before_ram_iterate(f, f->opaque, flags, NULL);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
@@ -161,8 +167,8 @@ void ram_control_after_iterate(QEMUFile *f, uint64_t flags)
 {
 int ret = 0;
 
-if (f->ops->after_ram_iterate) {
-ret = f->ops->after_ram_iterate(f, f->opaque, flags, NULL);
+if (f->hooks && f->hooks->after_ram_iterate) {
+ret = f->hooks->after_ram_iterate(f, f->opaque, flags, NULL);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
@@ -173,8 +179,8 @@ void ram_control_load_hook(QEMUFile *f, uint64_t flags, 
void *data)
 {
 int ret = -EINVAL;
 
-if (f->ops->hook_ram_load) {
-ret = f->ops->hook_ram_load(f, f->opaque, flags, data);
+if (f->hooks && f->hooks->hook_ram_load) {
+ret = f->hooks->hook_ram_load(f, f->opaque, flags, data);
 if (ret < 0) {
 qemu_file_set_error(f, ret);
 }
@@ -193,9 +199,9 @@ size_t ram_control_save_page(QEMUFile *f, ram_addr_t 
block_offset,
  ram_addr_t offset, size_t size,
  uint64_t *bytes_sent)
 {
-if (f->ops->save_page) {
-int ret = f->ops->save_page(f, f->opaque, block_offset,
-offset, size, bytes_sent);
+if (f->hooks && f->hooks->save_page) {
+int ret = f->hooks->save_page(f, f->opaque, block_offset,
+  offset, size, bytes_sent);
 
 if (ret != RAM_SAVE_CONTROL_DELAYED) {
 if (bytes_sent && *bytes_sent > 0) {
diff --git a/migration/rdma.c b/migration/rdma.c
index f6a9992..0d067a1 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3380,12 +3380,18 @@ static const QEMUFileOps rdma_read_ops = {
 .get_buffer= qemu_rdma_get_buffer,
 .get_fd= qemu_rdma_get_fd,
 .close = qemu_rdma_close,
+};
+
+static const QEMUFileHooks rdma_read_hooks = {
 .hook_ram_load = rdma_load_hook,
 };
 
 static const QEMUFileOps rdma_write_ops = {
 .put_buffer = qemu_rdma_put_buf
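
The ops/hooks split described in this patch can be sketched in isolation as follows. All demo_* names are hypothetical and the structs are heavily reduced. The point is only that hooks live in a separate, optional table, so every call site has to check both the table pointer and the individual callback.

```c
#include <stddef.h>

struct demo_file;

/* Hypothetical hook signature */
typedef int (*demo_hook_fn)(struct demo_file *f, unsigned flags);

/* I/O callbacks: always present for a file */
struct demo_ops {
    int (*close)(void *opaque);
};

/* Migration-stage hooks: optional, installed separately */
struct demo_hooks {
    demo_hook_fn before_iterate;
    demo_hook_fn after_iterate;
};

struct demo_file {
    const struct demo_ops *ops;
    const struct demo_hooks *hooks;  /* NULL unless hooks were installed */
    void *opaque;
    int last_error;
};

static void demo_set_hooks(struct demo_file *f, const struct demo_hooks *hooks)
{
    f->hooks = hooks;
}

/* Run the hook if one is installed and record any negative return value */
static void demo_before_iterate(struct demo_file *f, unsigned flags)
{
    if (f->hooks && f->hooks->before_iterate) {
        int ret = f->hooks->before_iterate(f, flags);
        if (ret < 0) {
            f->last_error = ret;
        }
    }
}

/* Demo hook that always fails, to exercise the error path */
static int demo_failing_hook(struct demo_file *f, unsigned flags)
{
    (void)f;
    (void)flags;
    return -5;
}
```

A file with no hooks installed skips the call entirely, which is why the patched functions test f->hooks before dereferencing it.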

Re: [Qemu-devel] [PATCH qemu v16 14/19] vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2)

2016-05-25 Thread David Gibson

On Mon, May 16, 2016 at 02:20:33PM -0600, Alex Williamson wrote:
> On Mon, 16 May 2016 11:10:05 +1000
> Alexey Kardashevskiy  wrote:
> 
> > On 05/14/2016 08:25 AM, Alex Williamson wrote:
> > > On Wed,  4 May 2016 16:52:26 +1000
> > > Alexey Kardashevskiy  wrote:
> > >  
> > >> This makes use of the new "memory registering" feature. The idea is
> > >> to provide the userspace ability to notify the host kernel about pages
> > >> which are going to be used for DMA. Having this information, the host
> > >> kernel can pin them all once per user process, do locked pages
> > > >> accounting (once) and not spend time on doing that in real time with
> > >> possible failures which cannot be handled nicely in some cases.
> > >>
> > >> This adds a prereg memory listener which listens on address_space_memory
> > >> and notifies a VFIO container about memory which needs to be
> > >> pinned/unpinned. VFIO MMIO regions (i.e. "skip dump" regions) are 
> > >> skipped.
> > >>
> > >> As there is no per-IOMMU-type release() callback anymore, this stores
> > >> the IOMMU type in the container so vfio_listener_release() can determine
> > >> if it needs to unregister @prereg_listener.
> > >>
> > >> The feature is only enabled for SPAPR IOMMU v2. The host kernel changes
> > >> are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this does
> > >> not call it when v2 is detected and enabled.
> > >>
> > >> This enforces guest RAM blocks to be host page size aligned; however
> > >> this is not new as KVM already requires memory slots to be host page
> > >> size aligned.
> > >>
> > >> Signed-off-by: Alexey Kardashevskiy 
> > >> ---
> > >> Changes:
> > >> v16:
> > >> * switched to 64bit math everywhere as there is no chance to see
> > >> region_add on RAM blocks even remotely close to 1<<64bytes.
> > >>
> > >> v15:
> > >> * banned unaligned sections
> > > >> * added a vfio_prereg_gpa_to_ua() helper
> > >>
> > >> v14:
> > >> * s/free_container_exit/listener_release_exit/g
> > >> * added "if memory_region_is_iommu()" to 
> > >> vfio_prereg_listener_skipped_section
> > >> ---
> > >>  hw/vfio/Makefile.objs |   1 +
> > >>  hw/vfio/common.c  |  38 +---
> > >>  hw/vfio/prereg.c  | 137 
> > >> ++
> > >>  include/hw/vfio/vfio-common.h |   4 ++
> > >>  trace-events  |   2 +
> > >>  5 files changed, 172 insertions(+), 10 deletions(-)
> > >>  create mode 100644 hw/vfio/prereg.c
> > >>
> > >> diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
> > >> index ceddbb8..5800e0e 100644
> > >> --- a/hw/vfio/Makefile.objs
> > >> +++ b/hw/vfio/Makefile.objs
> > >> @@ -4,4 +4,5 @@ obj-$(CONFIG_PCI) += pci.o pci-quirks.o
> > >>  obj-$(CONFIG_SOFTMMU) += platform.o
> > >>  obj-$(CONFIG_SOFTMMU) += calxeda-xgmac.o
> > >>  obj-$(CONFIG_SOFTMMU) += amd-xgbe.o
> > >> +obj-$(CONFIG_SOFTMMU) += prereg.o
> > >>  endif
> > >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > >> index 2050040..496eb82 100644
> > >> --- a/hw/vfio/common.c
> > >> +++ b/hw/vfio/common.c
> > >> @@ -501,6 +501,9 @@ static const MemoryListener vfio_memory_listener = {
> > >>  static void vfio_listener_release(VFIOContainer *container)
> > >>  {
> > >>  memory_listener_unregister(&container->listener);
> > >> +if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
> > >> +memory_listener_unregister(&container->prereg_listener);
> > >> +}
> > >>  }
> > >>
> > >>  int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion 
> > >> *region,
> > >> @@ -808,8 +811,8 @@ static int vfio_connect_container(VFIOGroup *group, 
> > >> AddressSpace *as)
> > >>  goto free_container_exit;
> > >>  }
> > >>
> > >> -ret = ioctl(fd, VFIO_SET_IOMMU,
> > >> -v2 ? VFIO_TYPE1v2_IOMMU : VFIO_TYPE1_IOMMU);
> > >> +container->iommu_type = v2 ? VFIO_TYPE1v2_IOMMU : 
> > >> VFIO_TYPE1_IOMMU;
> > >> +ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_type);
> > >>  if (ret) {
> > >>  error_report("vfio: failed to set iommu for container: %m");
> > >>  ret = -errno;
> > >> @@ -834,8 +837,10 @@ static int vfio_connect_container(VFIOGroup *group, 
> > >> AddressSpace *as)
> > >>  if ((ret == 0) && (info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
> > >>  container->iova_pgsizes = info.iova_pgsizes;
> > >>  }
> > >> -} else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU)) {
> > >> +} else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU) ||
> > >> +   ioctl(fd, VFIO_CHECK_EXTENSION, 
> > >> VFIO_SPAPR_TCE_v2_IOMMU)) {
> > >>  struct vfio_iommu_spapr_tce_info info;
> > >> +bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, 
> > >> VFIO_SPAPR_TCE_v2_IOMMU);
> > >>
> > >>  ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
> > >>  if (ret) {
> > >> @@ -843,7 +848,9 @@ static int vfio_connect_container(VFIOGroup

[Qemu-devel] inconsistent handling of "qemu64" CPU model

2016-05-25 Thread Chris Friesen


Hi,

I'm not sure where the problem lies, hence the CC to both lists.  Please copy me 
on the reply.


I'm playing with OpenStack's devstack environment on an Ubuntu 14.04 host with a 
Celeron 2961Y CPU.  (libvirt detects it as a Nehalem with a bunch of extra 
features.)  Qemu gives version 2.2.0 (Debian 1:2.2+dfsg-5expubuntu9.7~cloud2).


If I don't specify a virtual CPU model, it appears to give me a "qemu64" CPU, 
and /proc/cpuinfo in the guest instance looks something like this:


processor: 0
vendor_id: GenuineIntel
cpu family: 6
model: 6
model name: QEMU Virtual CPU version 2.2.0
stepping: 3
microcode: 0x1
flags: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush
mmx fxsr sse sse2 syscall nx lm rep_good nopl pni vmx cx16 x2apic popcnt 
hypervisor lahf_lm abm vnmi ept



However, if I explicitly specify a custom CPU model of "qemu64" the instance 
refuses to boot and I get a log saying:


libvirtError: unsupported configuration: guest and host CPU are not compatible: 
Host CPU does not provide required features: svm


When this happens, some of the XML for the domain looks like this:
  
hvm
 

  
qemu64

  

Of course "svm" is an AMD flag and I'm running an Intel CPU.  But why does it 
work when I just rely on the default virtual CPU?  Is kvm_default_unset_features 
handled differently when it's implicit vs explicit?


If I explicitly specify a custom CPU model of "kvm64" then it boots, but of 
course I get a different virtual CPU from what I get if I don't specify anything.


Following some old suggestions I tried turning off nested kvm, deleting 
/var/cache/libvirt/qemu/capabilities/*, and restarting libvirtd.  Didn't help.


So...anyone got any ideas what's going on?  Is there no way to explicitly 
specify the model that you get by default?



Thanks,
Chris

Re: [Qemu-devel] [PATCH qemu v16 09/19] spapr_iommu: Finish renaming vfio_accel to need_vfio

2016-05-25 Thread David Gibson

On Wed, May 04, 2016 at 04:52:21PM +1000, Alexey Kardashevskiy wrote:
> 6a81dd17 "spapr_iommu: Rename vfio_accel parameter" renamed vfio_accel
> flag everywhere but one spot was missed.
> 
> Signed-off-by: Alexey Kardashevskiy 
> Reviewed-by: David Gibson 

Applied to ppc-for-2.7.


> ---
>  target-ppc/kvm_ppc.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
> index fc79312..3b2090e 100644
> --- a/target-ppc/kvm_ppc.h
> +++ b/target-ppc/kvm_ppc.h
> @@ -163,7 +163,7 @@ static inline bool kvmppc_spapr_use_multitce(void)
>  
>  static inline void *kvmppc_create_spapr_tce(uint32_t liobn,
>  uint32_t window_size, int *fd,
> -bool vfio_accel)
> +bool need_vfio)
>  {
>  return NULL;
>  }

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH qemu v16 07/19] spapr_iommu: Move table allocation to helpers

2016-05-25 Thread David Gibson

On Wed, May 04, 2016 at 04:52:19PM +1000, Alexey Kardashevskiy wrote:
> At the moment, the presence of vfio-pci devices on a bus affects the way
> the guest view table is allocated. If there is no vfio-pci on a PHB
> and the host kernel supports KVM acceleration of H_PUT_TCE, a table
> is allocated in KVM. However, if there is vfio-pci and we do not yet have
> KVM acceleration for these, the table has to be allocated by
> the userspace. At the moment the table is allocated once at boot time
> but next patches will reallocate it.
> 
> This moves kvmppc_create_spapr_tce/g_malloc0 and their counterparts
> to helpers.
> 
> Signed-off-by: Alexey Kardashevskiy 
> Reviewed-by: David Gibson 

This is a reasonable clean up on its own, so I've applied to ppc-for-2.7.

> ---
>  hw/ppc/spapr_iommu.c | 58 
> +++-
>  trace-events |  2 +-
>  2 files changed, 40 insertions(+), 20 deletions(-)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 277f289..8132f64 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -75,6 +75,37 @@ static IOMMUAccessFlags 
> spapr_tce_iommu_access_flags(uint64_t tce)
>  }
>  }
>  
> +static uint64_t *spapr_tce_alloc_table(uint32_t liobn,
> +   uint32_t page_shift,
> +   uint32_t nb_table,
> +   int *fd,
> +   bool need_vfio)
> +{
> +uint64_t *table = NULL;
> +uint64_t window_size = (uint64_t)nb_table << page_shift;
> +
> +if (kvm_enabled() && !(window_size >> 32)) {
> +table = kvmppc_create_spapr_tce(liobn, window_size, fd, need_vfio);
> +}
> +
> +if (!table) {
> +*fd = -1;
> +table = g_malloc0(nb_table * sizeof(uint64_t));
> +}
> +
> +trace_spapr_iommu_new_table(liobn, table, *fd);
> +
> +return table;
> +}
> +
> +static void spapr_tce_free_table(uint64_t *table, int fd, uint32_t nb_table)
> +{
> +if (!kvm_enabled() ||
> +(kvmppc_remove_spapr_tce(table, fd, nb_table) != 0)) {
> +g_free(table);
> +}
> +}
> +
>  /* Called from RCU critical section */
>  static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr 
> addr,
> bool is_write)
> @@ -141,21 +172,13 @@ static MemoryRegionIOMMUOps spapr_iommu_ops = {
>  static int spapr_tce_table_realize(DeviceState *dev)
>  {
>  sPAPRTCETable *tcet = SPAPR_TCE_TABLE(dev);
> -uint64_t window_size = (uint64_t)tcet->nb_table << tcet->page_shift;
>  
> -if (kvm_enabled() && !(window_size >> 32)) {
> -tcet->table = kvmppc_create_spapr_tce(tcet->liobn,
> -  window_size,
> -  &tcet->fd,
> -  tcet->need_vfio);
> -}
> -
> -if (!tcet->table) {
> -size_t table_size = tcet->nb_table * sizeof(uint64_t);
> -tcet->table = g_malloc0(table_size);
> -}
> -
> -trace_spapr_iommu_new_table(tcet->liobn, tcet, tcet->table, tcet->fd);
> +tcet->fd = -1;
> +tcet->table = spapr_tce_alloc_table(tcet->liobn,
> +tcet->page_shift,
> +tcet->nb_table,
> +&tcet->fd,
> +tcet->need_vfio);
>  
>  memory_region_init_iommu(&tcet->iommu, OBJECT(dev), &spapr_iommu_ops,
>   "iommu-spapr",
> @@ -241,11 +264,8 @@ static void spapr_tce_table_unrealize(DeviceState *dev, 
> Error **errp)
>  
>  QLIST_REMOVE(tcet, list);
>  
> -if (!kvm_enabled() ||
> -(kvmppc_remove_spapr_tce(tcet->table, tcet->fd,
> - tcet->nb_table) != 0)) {
> -g_free(tcet->table);
> -}
> +spapr_tce_free_table(tcet->table, tcet->fd, tcet->nb_table);
> +tcet->fd = -1;
>  }
>  
>  MemoryRegion *spapr_tce_get_iommu(sPAPRTCETable *tcet)
> diff --git a/trace-events b/trace-events
> index 8350743..d96d344 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1431,7 +1431,7 @@ spapr_iommu_pci_get(uint64_t liobn, uint64_t ioba, 
> uint64_t ret, uint64_t tce) "
>  spapr_iommu_pci_indirect(uint64_t liobn, uint64_t ioba, uint64_t tce, 
> uint64_t iobaN, uint64_t tceN, uint64_t ret) "liobn=%"PRIx64" 
> ioba=0x%"PRIx64" tcelist=0x%"PRIx64" iobaN=0x%"PRIx64" tceN=0x%"PRIx64" 
> ret=%"PRId64
>  spapr_iommu_pci_stuff(uint64_t liobn, uint64_t ioba, uint64_t tce_value, 
> uint64_t npages, uint64_t ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" 
> tcevalue=0x%"PRIx64" npages=%"PRId64" ret=%"PRId64
>  spapr_iommu_xlate(uint64_t liobn, uint64_t ioba, uint64_t tce, unsigned 
> perm, unsigned pgsize) "liobn=%"PRIx64" 0x%"PRIx64" -> 0x%"PRIx64" perm=%u 
> mask=%x"
> -spapr_iommu_new_table(uint64_t liobn, void *tcet, void *table, int fd) 
> "liobn=%"PRI

Re: [Qemu-devel] [PATCH qemu v16 08/19] spapr_iommu: Introduce "enabled" state for TCE table

2016-05-25 Thread David Gibson

On Wed, May 04, 2016 at 04:52:20PM +1000, Alexey Kardashevskiy wrote:
> Currently TCE tables are created once at start and their sizes never
> change. We are going to change that by introducing Dynamic DMA windows
> support, where the DMA configuration may change during guest execution.
> 
> This changes spapr_tce_new_table() to create an empty zero-size IOMMU
> memory region (IOMMU MR). Only LIOBN is assigned by the time of creation.
> It still will be called once at the owner object (VIO or PHB) creation.
> 
> This introduces an "enabled" state for TCE table objects with two
> helper functions - spapr_tce_table_enable()/spapr_tce_table_disable().
> - spapr_tce_table_enable() receives TCE table parameters, allocates
> a guest view of the TCE table (in the user space or KVM) and
> sets the correct size on the IOMMU MR.
> - spapr_tce_table_disable() disposes the table and resets the IOMMU MR
> size.
> 
> This changes the PHB reset handler to do the default DMA initialization
> instead of spapr_phb_realize(). This makes no difference now, but
> later with more than just one DMA window, we will have to remove them all
> and create the default one on a system reset.
> 
> No visible change in behaviour is expected except the actual table
> will be reallocated every reset. We might optimize this later.
> 
> The other way to implement this would be to dynamically create/remove
> the TCE table QOM objects, but this would make migration impossible
> as the migration code expects all QOM objects to exist at the receiver
> so we have to have TCE table objects created when migration begins.
> 
> spapr_tce_table_do_enable() is separated from spapr_tce_table_enable()
> as later it will be called at the sPAPRTCETable post-migration stage when
> it already has all the properties set after the migration; the same is
> done for spapr_tce_table_disable().
> 
> Signed-off-by: Alexey Kardashevskiy 
> Reviewed-by: David Gibson 
> ---
> Changes:
> v15:
> * made adjustments after removing spapr_phb_dma_window_enable()
> 
> v14:
> * added spapr_tce_table_do_disable(), will make difference in following
> patch with fully dynamic table migration
> 
> # Conflicts:
> # hw/ppc/spapr_pci.c
> ---
>  hw/ppc/spapr_iommu.c   | 86 
> --
>  hw/ppc/spapr_pci.c |  8 +++--
>  hw/ppc/spapr_vio.c |  8 ++---
>  include/hw/ppc/spapr.h | 10 +++---
>  4 files changed, 75 insertions(+), 37 deletions(-)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 8132f64..9bcd3f6 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -17,6 +17,7 @@
>   * License along with this library; if not, see <http://www.gnu.org/licenses/>.
>   */
>  #include "qemu/osdep.h"
> +#include "qemu/error-report.h"
>  #include "hw/hw.h"
>  #include "sysemu/kvm.h"
>  #include "hw/qdev.h"
> @@ -174,15 +175,9 @@ static int spapr_tce_table_realize(DeviceState *dev)
>  sPAPRTCETable *tcet = SPAPR_TCE_TABLE(dev);
>  
>  tcet->fd = -1;
> -tcet->table = spapr_tce_alloc_table(tcet->liobn,
> -tcet->page_shift,
> -tcet->nb_table,
> -&tcet->fd,
> -tcet->need_vfio);
> -
> +tcet->need_vfio = false;
>  memory_region_init_iommu(&tcet->iommu, OBJECT(dev), &spapr_iommu_ops,
> - "iommu-spapr",
> - (uint64_t)tcet->nb_table << tcet->page_shift);
> + "iommu-spapr", 0);
>  
>  QLIST_INSERT_HEAD(&spapr_tce_tables, tcet, list);
>  
> @@ -224,14 +219,10 @@ void spapr_tce_set_need_vfio(sPAPRTCETable *tcet, bool 
> need_vfio)
>  tcet->table = newtable;
>  }
>  
> -sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
> -   uint64_t bus_offset,
> -   uint32_t page_shift,
> -   uint32_t nb_table,
> -   bool need_vfio)
> +sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn)
>  {
>  sPAPRTCETable *tcet;
> -char tmp[64];
> +char tmp[32];
>  
>  if (spapr_tce_find_by_liobn(liobn)) {
>  fprintf(stderr, "Attempted to create TCE table with duplicate"
> @@ -239,16 +230,8 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, 
> uint32_t liobn,
>  return NULL;
>  }
>  
> -if (!nb_table) {
> -return NULL;
> -}
> -
>  tcet = SPAPR_TCE_TABLE(object_new(TYPE_SPAPR_TCE_TABLE));
>  tcet->liobn = liobn;
> -tcet->bus_offset = bus_offset;
> -tcet->page_shift = page_shift;
> -tcet->nb_table = nb_table;
> -tcet->need_vfio = need_vfio;
>  
>  snprintf(tmp, sizeof(tmp), "tce-table-%x", liobn);
>  object_property_add_child(OBJECT(owner), tmp, OBJECT(tcet), NULL);
> @@ -258,14 +241,69 @@ sPAPRTCETable *spapr_tce_new_table

Re: [Qemu-devel] [PATCH qemu v16 06/19] spapr_pci: Use correct DMA LIOBN when composing the device tree

2016-05-25 Thread David Gibson

On Wed, May 04, 2016 at 04:52:18PM +1000, Alexey Kardashevskiy wrote:
> The user could have picked LIOBN via the CLI but the device tree
> rendering code would still use the value derived from the PHB index
> (which is the default fallback if LIOBN is not set in the CLI).
> 
> This replaces SPAPR_PCI_LIOBN() with the actual DMA LIOBN value.
> 
> Signed-off-by: Alexey Kardashevskiy 

Applied to ppc-for-2.7.

> ---
> Changes:
> v16:
> * new in the series
> ---
>  hw/ppc/spapr_pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 573e635..742d127 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1815,7 +1815,7 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
>  _FDT(fdt_setprop(fdt, bus_off, "interrupt-map", &interrupt_map,
>   sizeof(interrupt_map)));
>  
> -tcet = spapr_tce_find_by_liobn(SPAPR_PCI_LIOBN(phb->index, 0));
> +tcet = spapr_tce_find_by_liobn(phb->dma_liobn);
>  if (!tcet) {
>  return -1;
>  }

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH qemu v16 10/19] spapr_iommu: Migrate full state

2016-05-25 Thread David Gibson

On Wed, May 04, 2016 at 04:52:22PM +1000, Alexey Kardashevskiy wrote:
> The source guest could have reallocated the default TCE table and
> migrated a bigger/smaller table. This adds reallocation in post_load()
> if the default table size is different on source and destination.
> 
> This adds @bus_offset, @page_shift, @enabled to the migration stream.
> These cannot change without dynamic DMA windows so no change in
> behavior is expected now.
> 
> Signed-off-by: Alexey Kardashevskiy 
> David Gibson 
> ---
> Changes:
> v15:
> * squashed "migrate full state" into this
> * added missing tcet->mig_nb_table initialization in 
> spapr_tce_table_pre_save()
> * instead of bumping the version, moved extra parameters to subsection
> 
> v14:
> * new to the series
> ---
>  hw/ppc/spapr_iommu.c   | 67 
> --
>  include/hw/ppc/spapr.h |  2 ++
>  trace-events   |  2 ++
>  3 files changed, 69 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 9bcd3f6..52b1e0d 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -137,33 +137,96 @@ static IOMMUTLBEntry 
> spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr,
>  return ret;
>  }
>  
> +static void spapr_tce_table_pre_save(void *opaque)
> +{
> +sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
> +
> +tcet->mig_table = tcet->table;
> +tcet->mig_nb_table = tcet->nb_table;
> +
> +trace_spapr_iommu_pre_save(tcet->liobn, tcet->mig_nb_table,
> +   tcet->bus_offset, tcet->page_shift);
> +}
> +
> +static void spapr_tce_table_do_enable(sPAPRTCETable *tcet);
> +static void spapr_tce_table_do_disable(sPAPRTCETable *tcet);
> +
>  static int spapr_tce_table_post_load(void *opaque, int version_id)
>  {
>  sPAPRTCETable *tcet = SPAPR_TCE_TABLE(opaque);
> +uint32_t old_nb_table = tcet->nb_table;
>  
>  if (tcet->vdev) {
>  spapr_vio_set_bypass(tcet->vdev, tcet->bypass);
>  }
>  
> +if (tcet->enabled) {
> +if (tcet->nb_table != tcet->mig_nb_table) {
> +if (tcet->nb_table) {
> +spapr_tce_table_do_disable(tcet);
> +}
> +tcet->nb_table = tcet->mig_nb_table;
> +spapr_tce_table_do_enable(tcet);
> +}
> +
> +memcpy(tcet->table, tcet->mig_table,
> +   tcet->nb_table * sizeof(tcet->table[0]));
> +
> +free(tcet->mig_table);
> +tcet->mig_table = NULL;
> +} else if (tcet->table) {
> +/* Destination guest has a default table but source does not -> free 
> */
> +spapr_tce_table_do_disable(tcet);
> +}
> +
> +trace_spapr_iommu_post_load(tcet->liobn, old_nb_table, tcet->nb_table,
> +tcet->bus_offset, tcet->page_shift);
> +
>  return 0;
>  }
>  
> +static bool spapr_tce_table_ex_needed(void *opaque)
> +{
> +sPAPRTCETable *tcet = opaque;
> +
> +return tcet->bus_offset || tcet->page_shift != 0xC;

|| !tcet->enabled ??

AFAICT you're assuming that the existing tcet on the destination will
be enabled prior to an incoming migration.

> +}
> +
> +static const VMStateDescription vmstate_spapr_tce_table_ex = {
> +.name = "spapr_iommu_ex",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.needed = spapr_tce_table_ex_needed,
> +.fields = (VMStateField[]) {
> +VMSTATE_BOOL(enabled, sPAPRTCETable),

..or could you encode enabled as !!mig_nb_table?

> +VMSTATE_UINT64(bus_offset, sPAPRTCETable),
> +VMSTATE_UINT32(page_shift, sPAPRTCETable),
> +VMSTATE_END_OF_LIST()
> +},
> +};
> +
>  static const VMStateDescription vmstate_spapr_tce_table = {
>  .name = "spapr_iommu",
>  .version_id = 2,
>  .minimum_version_id = 2,
> +.pre_save = spapr_tce_table_pre_save,
>  .post_load = spapr_tce_table_post_load,
>  .fields  = (VMStateField []) {
>  /* Sanity check */
>  VMSTATE_UINT32_EQUAL(liobn, sPAPRTCETable),
> -VMSTATE_UINT32_EQUAL(nb_table, sPAPRTCETable),
>  
>  /* IOMMU state */
> +VMSTATE_UINT32(mig_nb_table, sPAPRTCETable),
>  VMSTATE_BOOL(bypass, sPAPRTCETable),
> -VMSTATE_VARRAY_UINT32(table, sPAPRTCETable, nb_table, 0, 
> vmstate_info_uint64, uint64_t),
> +VMSTATE_VARRAY_UINT32_ALLOC(mig_table, sPAPRTCETable, mig_nb_table, 
> 0,
> +vmstate_info_uint64, uint64_t),
>  
>  VMSTATE_END_OF_LIST()
>  },
> +.subsections = (const VMStateDescription*[]) {
> +&vmstate_spapr_tce_table_ex,
> +NULL
> +}
>  };
>  
>  static MemoryRegionIOMMUOps spapr_iommu_ops = {
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 0140810..d36dda2 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -540,6 +540,8 @@ struct sPAPRTCETable {
>  uint64_t bus_offset;
>  uint32_t page_shift;
>  uint64_t *t

[Qemu-devel] [PATCH v3 0/5] qcow2_co_write_zeroes and related improvements

2016-05-25 Thread Eric Blake

This series improves write_zeroes for qcow2.

Since the work conflicts with my proposed patches to switch
write_zeroes to a byte-based interface, I figured I'd fix the
bugs and get this part nailed first, then rebase my other
work on top, rather than making Denis have to do the dirty work.

Changes from v2:
- patch 1: close to a rewrite, same concept but in fewer lines
of code, and with testsuite change to back it up as valid; but
keep authorship
- patch 2, 3: unchanged
- patch 4 from original series already applied
- patch 4 in this series is new
- patch 5: rewrite of the original v5 that catches even more
cases; claim authorship

[hmm, maybe I should treat 1 and 5 the same on whether to leave
authorship unchanged or just credit Denis for the original idea]

Denis V. Lunev (3):
  block: split write_zeroes always
  qcow2: simplify logic in qcow2_co_write_zeroes
  qcow2: add tracepoints for qcow2_co_write_zeroes

Eric Blake (2):
  qemu-iotests: Test one more spot for optimizing write_zeroes
  qcow2: Catch more unaligned write_zero into zero cluster

 block/io.c | 30 -
 block/qcow2.c  | 67 --
 tests/qemu-iotests/154 | 40 +++
 tests/qemu-iotests/154.out | 55 -
 trace-events   |  2 ++
 5 files changed, 137 insertions(+), 57 deletions(-)

-- 
2.5.5

[Qemu-devel] [PATCH v3 3/5] qcow2: add tracepoints for qcow2_co_write_zeroes

2016-05-25 Thread Eric Blake

From: "Denis V. Lunev" 

This patch follows guidelines of all other tracepoints in qcow2, like ones
in qcow2_co_writev. I think that they should dump values in the same
quantities or be changed all together.

Signed-off-by: Denis V. Lunev 
CC: Eric Blake 
CC: Kevin Wolf 
Message-Id: <1463476543-3087-4-git-send-email-...@openvz.org>
[eblake: typo fix in commit message]
Signed-off-by: Eric Blake 
---
 block/qcow2.c | 5 +
 trace-events  | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index 2f73201..105fd5e 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2437,6 +2437,9 @@ static coroutine_fn int 
qcow2_co_write_zeroes(BlockDriverState *bs,
 int head = sector_num % s->cluster_sectors;
 int tail = (sector_num + nb_sectors) % s->cluster_sectors;

+trace_qcow2_write_zeroes_start_req(qemu_coroutine_self(), sector_num,
+   nb_sectors);
+
 if (head != 0 || tail != 0) {
 int64_t cl_start = sector_num - head;

@@ -2459,6 +2462,8 @@ static coroutine_fn int 
qcow2_co_write_zeroes(BlockDriverState *bs,
 qemu_co_mutex_lock(&s->lock);
 }

+trace_qcow2_write_zeroes(qemu_coroutine_self(), sector_num, nb_sectors);
+
 /* Whatever is left can use real zero clusters */
 ret = qcow2_zero_clusters(bs, sector_num << BDRV_SECTOR_BITS, nb_sectors);
 qemu_co_mutex_unlock(&s->lock);
diff --git a/trace-events b/trace-events
index 4450d8f..46726cc 100644
--- a/trace-events
+++ b/trace-events
@@ -612,6 +612,8 @@ qcow2_writev_done_req(void *co, int ret) "co %p ret %d"
 qcow2_writev_start_part(void *co) "co %p"
 qcow2_writev_done_part(void *co, int cur_nr_sectors) "co %p cur_nr_sectors %d"
 qcow2_writev_data(void *co, uint64_t offset) "co %p offset %" PRIx64
+qcow2_write_zeroes_start_req(void *co, int64_t sector, int nb_sectors) "co %p 
sector %" PRIx64 " nb_sectors %d"
+qcow2_write_zeroes(void *co, int64_t sector, int nb_sectors) "co %p sector %" 
PRIx64 " nb_sectors %d"

 # block/qcow2-cluster.c
 qcow2_alloc_clusters_offset(void *co, uint64_t offset, int num) "co %p offset 
%" PRIx64 " num %d"
-- 
2.5.5

[Qemu-devel] [PATCH v3 5/5] qcow2: Catch more unaligned write_zero into zero cluster

2016-05-25 Thread Eric Blake

is_zero_cluster() and is_zero_cluster_top_locked() are used only
by qcow2_co_write_zeroes().  The former is too broad (we don't
care if the sectors we are about to overwrite are non-zero, only
that all other sectors in the cluster are zero), so it needs to
be called up to twice but with smaller limits - rename it along
with adding the needed parameter.  The latter can be inlined for
more compact code.
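
To make the "up to twice but with smaller limits" arithmetic concrete,
here is a rough standalone sketch (not the code from this patch).  It
assumes cluster_sectors is a power of two and that the request fits
inside a single cluster; the names cluster_remainder and
compute_remainder are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the two sector ranges of a cluster that lie
 * outside an unaligned request and so must already read as zero. */
struct cluster_remainder {
    int64_t before_start;   /* first sector of the cluster */
    uint32_t before_count;  /* sectors between cluster start and request */
    int64_t after_start;    /* first sector past the request */
    uint32_t after_count;   /* sectors between request end and cluster end */
};

/* Assumes cluster_sectors is a power of two and the request
 * [sector_num, sector_num + nb_sectors) does not cross a cluster. */
static struct cluster_remainder compute_remainder(int64_t sector_num,
                                                  uint32_t nb_sectors,
                                                  uint32_t cluster_sectors)
{
    struct cluster_remainder r;
    uint32_t head = (uint32_t)(sector_num % cluster_sectors);
    uint32_t tail = (uint32_t)((sector_num + nb_sectors) % cluster_sectors);

    r.before_start = sector_num - head;
    r.before_count = head;
    r.after_start = sector_num + nb_sectors;
    /* Unsigned negation wraps, so this is (cluster_sectors - tail)
     * modulo cluster_sectors: 0 when the request ends on a boundary. */
    r.after_count = -tail & (cluster_sectors - 1);
    return r;
}
```

For example, with 8 sectors per cluster, a request at sector 10 for 4
sectors leaves sectors 8-9 before it and 14-15 after it, which are the
two ranges that would be passed to is_zero_sectors().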

The testsuite change shows that we now have a sparser top file
when an unaligned write_zeroes overwrites the only portion of
the backing file with data.

Based on a patch proposal by Denis V. Lunev.

CC: Denis V. Lunev 
Signed-off-by: Eric Blake 
---
 block/qcow2.c  | 47 +++---
 tests/qemu-iotests/154.out |  8 
 2 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 105fd5e..ecac399 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2406,26 +2406,19 @@ finish:
 }


-static bool is_zero_cluster(BlockDriverState *bs, int64_t start)
+static bool is_zero_sectors(BlockDriverState *bs, int64_t start,
+uint32_t count)
 {
-BDRVQcow2State *s = bs->opaque;
 int nr;
 BlockDriverState *file;
-int64_t res = bdrv_get_block_status_above(bs, NULL, start,
-  s->cluster_sectors, &nr, &file);
-return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == s->cluster_sectors;
-}
+int64_t res;

-static bool is_zero_cluster_top_locked(BlockDriverState *bs, int64_t start)
-{
-BDRVQcow2State *s = bs->opaque;
-int nr = s->cluster_sectors;
-uint64_t off;
-int ret;
-
-ret = qcow2_get_cluster_offset(bs, start << BDRV_SECTOR_BITS, &nr, &off);
-assert(nr == s->cluster_sectors);
-return ret == QCOW2_CLUSTER_UNALLOCATED || ret == QCOW2_CLUSTER_ZERO;
+if (!count) {
+return true;
+}
+res = bdrv_get_block_status_above(bs, NULL, start, count,
+  &nr, &file);
+return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == count;
 }

 static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs,
@@ -2434,27 +2427,33 @@ static coroutine_fn int 
qcow2_co_write_zeroes(BlockDriverState *bs,
 int ret;
 BDRVQcow2State *s = bs->opaque;

-int head = sector_num % s->cluster_sectors;
-int tail = (sector_num + nb_sectors) % s->cluster_sectors;
+uint32_t head = sector_num % s->cluster_sectors;
+uint32_t tail = (sector_num + nb_sectors) % s->cluster_sectors;

 trace_qcow2_write_zeroes_start_req(qemu_coroutine_self(), sector_num,
nb_sectors);

-if (head != 0 || tail != 0) {
+if (head || tail) {
 int64_t cl_start = sector_num - head;
+uint64_t off;
+int nr;

 assert(cl_start + s->cluster_sectors >= sector_num + nb_sectors);

-sector_num = cl_start;
-nb_sectors = s->cluster_sectors;
-
-if (!is_zero_cluster(bs, sector_num)) {
+/* check whether remainder of cluster already reads as zero */
+if (!(is_zero_sectors(bs, cl_start, head) &&
+  is_zero_sectors(bs, sector_num + nb_sectors,
+  -tail & (s->cluster_sectors - 1)))) {
 return -ENOTSUP;
 }

 qemu_co_mutex_lock(&s->lock);
 /* We can have new write after previous check */
-if (!is_zero_cluster_top_locked(bs, sector_num)) {
+sector_num = cl_start;
+nb_sectors = nr = s->cluster_sectors;
+ret = qcow2_get_cluster_offset(bs, cl_start << BDRV_SECTOR_BITS,
+   &nr, &off);
+if (ret != QCOW2_CLUSTER_UNALLOCATED && ret != QCOW2_CLUSTER_ZERO) {
 qemu_co_mutex_unlock(&s->lock);
 return -ENOTSUP;
 }
diff --git a/tests/qemu-iotests/154.out b/tests/qemu-iotests/154.out
index 531fd8c..da9eabd 100644
--- a/tests/qemu-iotests/154.out
+++ b/tests/qemu-iotests/154.out
@@ -102,13 +102,13 @@ wrote 2048/2048 bytes at offset 29696
 read 4096/4096 bytes at offset 28672
 4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 [{ "start": 0, "length": 4096, "depth": 1, "zero": true, "data": false},
-{ "start": 4096, "length": 4096, "depth": 0, "zero": false, "data": true, 
"offset": 20480},
+{ "start": 4096, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 8192, "length": 4096, "depth": 1, "zero": true, "data": false},
-{ "start": 12288, "length": 4096, "depth": 0, "zero": false, "data": true, 
"offset": 24576},
+{ "start": 12288, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 16384, "length": 4096, "depth": 1, "zero": true, "data": false},
-{ "start": 20480, "length": 4096, "depth": 0, "zero": false, "data": true, 
"offset": 28672},
+{ "start": 20480, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 24576, "length": 4096, "depth": 1, "zero": true, "data": false},
-{ "start": 286

[Qemu-devel] [PATCH v3 1/5] block: split write_zeroes always

2016-05-25 Thread Eric Blake

From: "Denis V. Lunev" 

We should split requests even if they are less than write_zeroes_alignment.
For example we can have the following request:
  offset 62k
  size   4k
  write_zeroes_alignment 64k
The original code sent 1 request covering 2 qcow2 clusters, and resulted
in both clusters being allocated. But by splitting the request, we can
cater to the case where one of the two clusters can be zeroed as a
whole, for only 1 cluster allocated after the operation.
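
As a rough illustration of the split described above (a standalone
sketch, not the code from this patch): the function below walks a
request in sector units and cuts it at alignment boundaries.  The names
zero_chunk and split_write_zeroes are made up for the example, and the
max_write_zeroes clamp that the real loop also applies is left out.
With 512-byte sectors, the 62k/4k/64k example becomes sector 124,
8 sectors, alignment 128.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: one sub-request produced by the split. */
struct zero_chunk {
    int64_t sector_num;
    int nb_sectors;
};

/*
 * Split [sector_num, sector_num + nb_sectors) so that no unaligned piece
 * crosses a multiple of alignment (a power of two, in sectors).
 * Returns the number of chunks stored in out, or -1 if out is too small.
 */
static int split_write_zeroes(int64_t sector_num, int nb_sectors,
                              int alignment, struct zero_chunk *out,
                              int max_out)
{
    int head, tail;
    int n = 0;

    assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
    head = (int)(sector_num & (alignment - 1));
    tail = (int)((sector_num + nb_sectors) & (alignment - 1));

    while (nb_sectors > 0) {
        int num = nb_sectors;

        if (head) {
            /* Small request up to the first aligned sector. */
            int to_boundary = alignment - head;
            num = nb_sectors < to_boundary ? nb_sectors : to_boundary;
            head = 0;
        } else if (tail && num > alignment) {
            /* Shorten the request to the last aligned sector. */
            num -= tail;
        }

        if (n == max_out) {
            return -1;
        }
        out[n].sector_num = sector_num;
        out[n].nb_sectors = num;
        n++;

        sector_num += num;
        nb_sectors -= num;
    }
    return n;
}
```

For the example request this yields two pieces, 4 sectors at 124 and
4 sectors at 128, so each piece touches only one cluster.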

Signed-off-by: Denis V. Lunev 
CC: Eric Blake 
CC: Kevin Wolf 
Message-Id: <1463476543-3087-2-git-send-email-...@openvz.org>

[eblake: Avoid exceeding nb_sectors, hoist alignment checks out of
loop, and update testsuite to show that patch works]

Signed-off-by: Eric Blake 
---
 block/io.c | 30 +-
 tests/qemu-iotests/154.out | 18 --
 2 files changed, 29 insertions(+), 19 deletions(-)

diff --git a/block/io.c b/block/io.c
index f474b9a..26b5845 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1118,28 +1118,32 @@ static int coroutine_fn 
bdrv_co_do_write_zeroes(BlockDriverState *bs,
 struct iovec iov = {0};
 int ret = 0;
 bool need_flush = false;
+int head = 0;
+int tail = 0;

 int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_write_zeroes,
 BDRV_REQUEST_MAX_SECTORS);
+if (bs->bl.write_zeroes_alignment) {
+assert(is_power_of_2(bs->bl.write_zeroes_alignment));
+head = sector_num & (bs->bl.write_zeroes_alignment - 1);
+tail = (sector_num + nb_sectors) & (bs->bl.write_zeroes_alignment - 1);
+max_write_zeroes &= ~(bs->bl.write_zeroes_alignment - 1);
+}

 while (nb_sectors > 0 && !ret) {
 int num = nb_sectors;

 /* Align request.  Block drivers can expect the "bulk" of the request
- * to be aligned.
+ * to be aligned, and that unaligned requests do not cross cluster
+ * boundaries.
  */
-if (bs->bl.write_zeroes_alignment
-&& num > bs->bl.write_zeroes_alignment) {
-if (sector_num % bs->bl.write_zeroes_alignment != 0) {
-/* Make a small request up to the first aligned sector.  */
-num = bs->bl.write_zeroes_alignment;
-num -= sector_num % bs->bl.write_zeroes_alignment;
-} else if ((sector_num + num) % bs->bl.write_zeroes_alignment != 
0) {
-/* Shorten the request to the last aligned sector.  num cannot
- * underflow because num > bs->bl.write_zeroes_alignment.
- */
-num -= (sector_num + num) % bs->bl.write_zeroes_alignment;
-}
+if (head) {
+/* Make a small request up to the first aligned sector.  */
+num = MIN(nb_sectors, bs->bl.write_zeroes_alignment - head);
+head = 0;
+} else if (tail && num > bs->bl.write_zeroes_alignment) {
+/* Shorten the request to the last aligned sector.  */
+num -= tail;
 }

 /* limit request size */
diff --git a/tests/qemu-iotests/154.out b/tests/qemu-iotests/154.out
index 8946b73..b9d27c5 100644
--- a/tests/qemu-iotests/154.out
+++ b/tests/qemu-iotests/154.out
@@ -106,11 +106,14 @@ read 1024/1024 bytes at offset 67584
 read 5120/5120 bytes at offset 68608
 5 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 [{ "start": 0, "length": 32768, "depth": 1, "zero": true, "data": false},
-{ "start": 32768, "length": 8192, "depth": 0, "zero": false, "data": true, 
"offset": 20480},
+{ "start": 32768, "length": 4096, "depth": 0, "zero": false, "data": true, 
"offset": 20480},
+{ "start": 36864, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 40960, "length": 8192, "depth": 1, "zero": true, "data": false},
-{ "start": 49152, "length": 8192, "depth": 0, "zero": false, "data": true, 
"offset": 28672},
+{ "start": 49152, "length": 4096, "depth": 0, "zero": false, "data": true, 
"offset": 24576},
+{ "start": 53248, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 57344, "length": 8192, "depth": 1, "zero": true, "data": false},
-{ "start": 65536, "length": 8192, "depth": 0, "zero": false, "data": true, 
"offset": 36864},
+{ "start": 65536, "length": 4096, "depth": 0, "zero": false, "data": true, 
"offset": 28672},
+{ "start": 69632, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 73728, "length": 134144000, "depth": 1, "zero": true, "data": 
false}]

 == spanning two clusters, non-zero after request ==
@@ -145,11 +148,14 @@ read 7168/7168 bytes at offset 65536
 read 1024/1024 bytes at offset 72704
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 [{ "start": 0, "length": 32768, "depth": 1, "zero": true, "data": false},
-{ "start": 32768, "length": 8192, "depth": 0, "zero": false, "data": true, 
"offset": 20480},
+{ "start": 32768, "length": 4096, "depth": 0, "zero": true, "data": false},
+{ "start": 36864, "

[Qemu-devel] [PATCH v3 4/5] qemu-iotests: Test one more spot for optimizing write_zeroes

2016-05-25 Thread Eric Blake

Add another test to 154, showing that we currently allocate a
data cluster in the top layer if any sector of the backing file
was allocated.  The next patch will optimize this case.

Signed-off-by: Eric Blake 
---
 tests/qemu-iotests/154 | 40 
 tests/qemu-iotests/154.out | 37 +
 2 files changed, 77 insertions(+)

diff --git a/tests/qemu-iotests/154 b/tests/qemu-iotests/154
index 23f1b3a..5905c55 100755
--- a/tests/qemu-iotests/154
+++ b/tests/qemu-iotests/154
@@ -115,6 +115,46 @@ $QEMU_IO -c "read -P 0 40k 3k" "$TEST_IMG" | 
_filter_qemu_io
 $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map

 echo
+echo == write_zeroes covers non-zero data ==
+
+CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $size
+_make_test_img -b "$TEST_IMG.base"
+
+# non-zero data at front of request
+# Backing file: -- XX -- --
+# Active layer: -- 00 00 --
+
+$QEMU_IO -c "write -P 0x11 5k 1k" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z 5k 2k" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 4k 4k" "$TEST_IMG" | _filter_qemu_io
+
+# non-zero data at end of request
+# Backing file: -- -- XX --
+# Active layer: -- 00 00 --
+
+$QEMU_IO -c "write -P 0x11 14k 1k" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z 13k 2k" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 12k 4k" "$TEST_IMG" | _filter_qemu_io
+
+# non-zero data matches size of request
+# Backing file: -- XX XX --
+# Active layer: -- 00 00 --
+
+$QEMU_IO -c "write -P 0x11 21k 2k" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z 21k 2k" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 20k 4k" "$TEST_IMG" | _filter_qemu_io
+
+# non-zero data smaller than request
+# Backing file: -- -X X- --
+# Active layer: -- 00 00 --
+
+$QEMU_IO -c "write -P 0x11 30208 1k" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -z 29k 2k" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0 28k 4k" "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
+
+echo
 echo == spanning two clusters, non-zero before request ==

 CLUSTER_SIZE=512 TEST_IMG="$TEST_IMG.base" _make_test_img $size
diff --git a/tests/qemu-iotests/154.out b/tests/qemu-iotests/154.out
index b9d27c5..531fd8c 100644
--- a/tests/qemu-iotests/154.out
+++ b/tests/qemu-iotests/154.out
@@ -74,6 +74,43 @@ read 3072/3072 bytes at offset 40960
 { "start": 40960, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 24576},
 { "start": 45056, "length": 134172672, "depth": 1, "zero": true, "data": false}]

+== write_zeroes covers non-zero data ==
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134217728
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 backing_file=TEST_DIR/t.IMGFMT.base
+wrote 1024/1024 bytes at offset 5120
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2048/2048 bytes at offset 5120
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset 4096
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1024/1024 bytes at offset 14336
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2048/2048 bytes at offset 13312
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset 12288
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2048/2048 bytes at offset 21504
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2048/2048 bytes at offset 21504
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset 20480
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1024/1024 bytes at offset 30208
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 2048/2048 bytes at offset 29696
+2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 4096/4096 bytes at offset 28672
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+[{ "start": 0, "length": 4096, "depth": 1, "zero": true, "data": false},
+{ "start": 4096, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 20480},
+{ "start": 8192, "length": 4096, "depth": 1, "zero": true, "data": false},
+{ "start": 12288, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 24576},
+{ "start": 16384, "length": 4096, "depth": 1, "zero": true, "data": false},
+{ "start": 20480, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 28672},
+{ "start": 24576, "length": 4096, "depth": 1, "zero": true, "data": false},
+{ "start": 28672, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": 32768},
+{ "start": 32768, "length": 134184960, "depth": 1, "zero": true, "data": false}]
+
 == spanning two clusters, non-zero before request ==
 Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=134217728
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 backing_file=TEST_DIR/t.IMGFMT.base
-- 
2.5.5

[Qemu-devel] [PATCH v3 2/5] qcow2: simplify logic in qcow2_co_write_zeroes

2016-05-25 Thread Eric Blake

From: "Denis V. Lunev" 

Unaligned requests will occupy only one cluster. This is true since the
previous commit. Simplify the code taking this consideration into
account.

In other words, the caller is now buggy if it ever passes us an unaligned
request that crosses cluster boundaries (the only requests that can cross
boundaries will be aligned).

There are no other changes so far.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Eric Blake 
CC: Eric Blake 
CC: Kevin Wolf 
Message-Id: <1463476543-3087-3-git-send-email-...@openvz.org>
---
 block/qcow2.c | 23 +--
 1 file changed, 5 insertions(+), 18 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index c9306a7..2f73201 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2438,33 +2438,20 @@ static coroutine_fn int qcow2_co_write_zeroes(BlockDriverState *bs,
 int tail = (sector_num + nb_sectors) % s->cluster_sectors;

 if (head != 0 || tail != 0) {
-int64_t cl_end = -1;
+int64_t cl_start = sector_num - head;

-sector_num -= head;
-nb_sectors += head;
+assert(cl_start + s->cluster_sectors >= sector_num + nb_sectors);

-if (tail != 0) {
-nb_sectors += s->cluster_sectors - tail;
-}
+sector_num = cl_start;
+nb_sectors = s->cluster_sectors;

 if (!is_zero_cluster(bs, sector_num)) {
 return -ENOTSUP;
 }

-if (nb_sectors > s->cluster_sectors) {
-/* Technically the request can cover 2 clusters, f.e. 4k write
-   at s->cluster_sectors - 2k offset. One of these cluster can
-   be zeroed, one unallocated */
-cl_end = sector_num + nb_sectors - s->cluster_sectors;
-if (!is_zero_cluster(bs, cl_end)) {
-return -ENOTSUP;
-}
-}
-
 qemu_co_mutex_lock(&s->lock);
 /* We can have new write after previous check */
-if (!is_zero_cluster_top_locked(bs, sector_num) ||
-(cl_end > 0 && !is_zero_cluster_top_locked(bs, cl_end))) {
+if (!is_zero_cluster_top_locked(bs, sector_num)) {
 qemu_co_mutex_unlock(&s->lock);
 return -ENOTSUP;
 }
-- 
2.5.5

[Qemu-devel] [PULL V3 20/20] net/net: Add SocketReadState for reuse codes

2016-05-25 Thread Jason Wang

From: Zhang Chen 

This function comes from net/socket.c; move it to net.c and net.h.
Add SocketReadState so that other code can reuse net_fill_rstate().
Suggested by Jason.

v4:
 - move 'rs->finalize = finalize' to rs_init()

v3:
 - remove SocketReadState init callback
 - put finalize callback to net_fill_rstate()

v2:
 - rename ReadState to SocketReadState
 - add SocketReadState init and finalize callback

v1:
 - init patch

Signed-off-by: Zhang Chen 
Signed-off-by: Li Zhijian 
Signed-off-by: Wen Congyang 
Signed-off-by: Jason Wang 
---
 include/net/net.h   | 13 +
 net/filter-mirror.c | 66 +++--
 net/net.c   | 70 
 net/socket.c| 77 +++--
 4 files changed, 121 insertions(+), 105 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index 422bc3e..a69e382 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -62,6 +62,8 @@ typedef void (SetOffload)(NetClientState *, int, int, int, int, int);
 typedef void (SetVnetHdrLen)(NetClientState *, int);
 typedef int (SetVnetLE)(NetClientState *, bool);
 typedef int (SetVnetBE)(NetClientState *, bool);
+typedef struct SocketReadState SocketReadState;
+typedef void (SocketReadStateFinalize)(SocketReadState *rs);
 
 typedef struct NetClientInfo {
 NetClientOptionsKind type;
@@ -107,6 +109,15 @@ typedef struct NICState {
 bool peer_deleted;
 } NICState;
 
+struct SocketReadState {
+int state; /* 0 = getting length, 1 = getting data */
+uint32_t index;
+uint32_t packet_len;
+uint8_t buf[NET_BUFSIZE];
+SocketReadStateFinalize *finalize;
+};
+
+int net_fill_rstate(SocketReadState *rs, const uint8_t *buf, int size);
 char *qemu_mac_strdup_printf(const uint8_t *macaddr);
 NetClientState *qemu_find_netdev(const char *id);
 int qemu_find_net_clients_except(const char *id, NetClientState **ncs,
@@ -165,6 +176,8 @@ ssize_t qemu_deliver_packet_iov(NetClientState *sender,
 
 void print_net_client(Monitor *mon, NetClientState *nc);
 void hmp_info_network(Monitor *mon, const QDict *qdict);
+void net_socket_rs_init(SocketReadState *rs,
+SocketReadStateFinalize *finalize);
 
 /* NIC info */
 
diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index c0c4dc6..35df374 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -40,10 +40,7 @@ typedef struct MirrorState {
 char *outdev;
 CharDriverState *chr_in;
 CharDriverState *chr_out;
-int state; /* 0 = getting length, 1 = getting data */
-unsigned int index;
-unsigned int packet_len;
-uint8_t buf[REDIRECTOR_MAX_LEN];
+SocketReadState rs;
 } MirrorState;
 
 static int filter_mirror_send(CharDriverState *chr_out,
@@ -108,51 +105,12 @@ static void redirector_chr_read(void *opaque, const uint8_t *buf, int size)
 {
 NetFilterState *nf = opaque;
 MirrorState *s = FILTER_REDIRECTOR(nf);
-unsigned int l;
-
-while (size > 0) {
-/* reassemble a packet from the network */
-switch (s->state) { /* 0 = getting length, 1 = getting data */
-case 0:
-l = 4 - s->index;
-if (l > size) {
-l = size;
-}
-memcpy(s->buf + s->index, buf, l);
-buf += l;
-size -= l;
-s->index += l;
-if (s->index == 4) {
-/* got length */
-s->packet_len = ntohl(*(uint32_t *)s->buf);
-s->index = 0;
-s->state = 1;
-}
-break;
-case 1:
-l = s->packet_len - s->index;
-if (l > size) {
-l = size;
-}
-if (s->index + l <= sizeof(s->buf)) {
-memcpy(s->buf + s->index, buf, l);
-} else {
-error_report("serious error: oversized packet received.");
-s->index = s->state = 0;
-qemu_chr_add_handlers(s->chr_in, NULL, NULL, NULL, NULL);
-return;
-}
-
-s->index += l;
-buf += l;
-size -= l;
-if (s->index >= s->packet_len) {
-s->index = 0;
-s->state = 0;
-redirector_to_filter(nf, s->buf, s->packet_len);
-}
-break;
-}
+int ret;
+
+ret = net_fill_rstate(&s->rs, buf, size);
+
+if (ret == -1) {
+qemu_chr_add_handlers(s->chr_in, NULL, NULL, NULL, NULL);
 }
 }
 
@@ -258,6 +216,14 @@ static void filter_mirror_setup(NetFilterState *nf, Error **errp)
 }
 }
 
+static void redirector_rs_finalize(SocketReadState *rs)
+{
+MirrorState *s = container_of(rs, MirrorState, rs);
+NetFilterState *nf = NETFILTER(s);
+
+redirector_to_filter(nf, rs->buf, rs->packet_len);
+}
+
 static void filter_redirector_setup(NetFilterState *nf, Error **errp)
 {
 MirrorState *s = FILTER_REDIRECTOR(nf);

[Qemu-devel] [PULL V3 16/20] e1000: Move out code that will be reused in e1000e

2016-05-25 Thread Jason Wang

From: Dmitry Fleytman 

Move code that will be shared into separate files.

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
Signed-off-by: Jason Wang 
---
 MAINTAINERS|   5 +
 hw/net/Makefile.objs   |   2 +-
 hw/net/e1000.c | 411 +++--
 hw/net/e1000x_common.c | 267 
 hw/net/e1000x_common.h | 213 +
 trace-events   |  13 ++
 6 files changed, 591 insertions(+), 320 deletions(-)
 create mode 100644 hw/net/e1000x_common.c
 create mode 100644 hw/net/e1000x_common.h

diff --git a/MAINTAINERS b/MAINTAINERS
index dc5e536..e379f38 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -980,6 +980,11 @@ F: hw/acpi/nvdimm.c
 F: hw/mem/nvdimm.c
 F: include/hw/mem/nvdimm.h
 
+e1000x
+M: Dmitry Fleytman 
+S: Maintained
+F: hw/net/e1000x*
+
 Subsystems
 --
 Audio
diff --git a/hw/net/Makefile.objs b/hw/net/Makefile.objs
index 527d264..bc69948 100644
--- a/hw/net/Makefile.objs
+++ b/hw/net/Makefile.objs
@@ -6,7 +6,7 @@ common-obj-$(CONFIG_NE2000_PCI) += ne2000.o
 common-obj-$(CONFIG_EEPRO100_PCI) += eepro100.o
 common-obj-$(CONFIG_PCNET_PCI) += pcnet-pci.o
 common-obj-$(CONFIG_PCNET_COMMON) += pcnet.o
-common-obj-$(CONFIG_E1000_PCI) += e1000.o
+common-obj-$(CONFIG_E1000_PCI) += e1000.o e1000x_common.o
 common-obj-$(CONFIG_RTL8139_PCI) += rtl8139.o
 common-obj-$(CONFIG_VMXNET3_PCI) += net_tx_pkt.o net_rx_pkt.o
 common-obj-$(CONFIG_VMXNET3_PCI) += vmxnet3.o
diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 8e79b55..36e3dbe 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -36,7 +36,7 @@
 #include "qemu/iov.h"
 #include "qemu/range.h"
 
-#include "e1000_regs.h"
+#include "e1000x_common.h"
 
 static const uint8_t bcast[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
 
@@ -64,11 +64,6 @@ static int debugflags = DBGBIT(TXERR) | DBGBIT(GENERAL);
 #define PNPMMIO_SIZE  0x2
 #define MIN_BUF_SIZE  60 /* Min. octets in an ethernet frame sans FCS */
 
-/* this is the size past which hardware will drop packets when setting LPE=0 */
-#define MAXIMUM_ETHERNET_VLAN_SIZE 1522
-/* this is the size past which hardware will drop packets when setting LPE=1 */
-#define MAXIMUM_ETHERNET_LPE_SIZE 16384
-
 #define MAXIMUM_ETHERNET_HDR_LEN (14+4)
 
 /*
@@ -102,22 +97,9 @@ typedef struct E1000State_st {
 unsigned char vlan[4];
 unsigned char data[0x1];
 uint16_t size;
-unsigned char sum_needed;
 unsigned char vlan_needed;
-uint8_t ipcss;
-uint8_t ipcso;
-uint16_t ipcse;
-uint8_t tucss;
-uint8_t tucso;
-uint16_t tucse;
-uint8_t hdr_len;
-uint16_t mss;
-uint32_t paylen;
+e1000x_txd_props props;
 uint16_t tso_frames;
-char tse;
-int8_t ip;
-int8_t tcp;
-char cptse; // current packet tse bit
 } tx;
 
 struct {
@@ -162,52 +144,19 @@ typedef struct E1000BaseClass {
 #define E1000_DEVICE_GET_CLASS(obj) \
 OBJECT_GET_CLASS(E1000BaseClass, (obj), TYPE_E1000_BASE)
 
-#define defreg(x)    x = (E1000_##x>>2)
-enum {
-defreg(CTRL),defreg(EECD),defreg(EERD),defreg(GPRC),
-defreg(GPTC),defreg(ICR), defreg(ICS), defreg(IMC),
-defreg(IMS), defreg(LEDCTL),  defreg(MANC),defreg(MDIC),
-defreg(MPC), defreg(PBA), defreg(RCTL),defreg(RDBAH),
-defreg(RDBAL),   defreg(RDH), defreg(RDLEN),   defreg(RDT),
-defreg(STATUS),  defreg(SWSM),defreg(TCTL),defreg(TDBAH),
-defreg(TDBAL),   defreg(TDH), defreg(TDLEN),   defreg(TDT),
-defreg(TORH),defreg(TORL),defreg(TOTH),defreg(TOTL),
-defreg(TPR), defreg(TPT), defreg(TXDCTL),  defreg(WUFC),
-defreg(RA),  defreg(MTA), defreg(CRCERRS), defreg(VFTA),
-defreg(VET), defreg(RDTR),defreg(RADV),defreg(TADV),
-defreg(ITR), defreg(FCRUC),   defreg(TDFH),defreg(TDFT),
-defreg(TDFHS),   defreg(TDFTS),   defreg(TDFPC),   defreg(RDFH),
-defreg(RDFT),defreg(RDFHS),   defreg(RDFTS),   defreg(RDFPC),
-defreg(IPAV),defreg(WUC), defreg(WUS), defreg(AIT),
-defreg(IP6AT),   defreg(IP4AT),   defreg(FFLT),defreg(FFMT),
-defreg(FFVT),defreg(WUPM),defreg(PBM), defreg(SCC),
-defreg(ECOL),defreg(MCC), defreg(LATECOL), defreg(COLC),
-defreg(DC),  defreg(TNCRS),   defreg(SEC), defreg(CEXTERR),
-defreg(RLEC),defreg(XONRXC),  defreg(XONTXC),  defreg(XOFFRXC),
-defreg(XOFFTXC), defreg(RFC), defreg(RJC), defreg(RNBC),
-defreg(TSCTFC),  defreg(MGTPRC),  defreg(MGTPDC),  defreg(MGTPTC),
-defreg(RUC), defreg(ROC), defreg(GORCL),   defreg(GORCH),
-defreg(GOTCL),   defreg(GOTCH),   defreg(BPRC),defreg(MPRC),
-defreg(TSCTC),   defreg(PRC64),   defreg(PRC127),  defreg(PRC255),
-defreg(PRC511),  defreg(PRC1023), defreg(PRC1522), defreg(PTC

[Qemu-devel] [PULL V3 15/20] e1000_regs: Add definitions for Intel 82574-specific bits

2016-05-25 Thread Jason Wang

From: Dmitry Fleytman 

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
Signed-off-by: Jason Wang 
---
 hw/net/e1000_regs.h | 345 +++-
 1 file changed, 342 insertions(+), 3 deletions(-)

diff --git a/hw/net/e1000_regs.h b/hw/net/e1000_regs.h
index 1c40244..d62b3fa 100644
--- a/hw/net/e1000_regs.h
+++ b/hw/net/e1000_regs.h
@@ -85,6 +85,7 @@
 #define E1000_DEV_ID_82573E  0x108B
 #define E1000_DEV_ID_82573E_IAMT 0x108C
 #define E1000_DEV_ID_82573L  0x109A
+#define E1000_DEV_ID_82574L  0x10D3
 #define E1000_DEV_ID_82546GB_QUAD_COPPER_KSP3 0x10B5
 #define E1000_DEV_ID_80003ES2LAN_COPPER_DPT 0x1096
 #define E1000_DEV_ID_80003ES2LAN_SERDES_DPT 0x1098
@@ -104,6 +105,7 @@
 #define E1000_PHY_ID2_82544x 0xC30
 #define E1000_PHY_ID2_8254xx_DEFAULT 0xC20 /* 82540x, 82545x, and 82546x */
 #define E1000_PHY_ID2_82573x 0xCC0
+#define E1000_PHY_ID2_82574x 0xCB1
 
 /* Register Set. (82543, 82544)
  *
@@ -135,8 +137,11 @@
 #define E1000_ITR  0x000C4  /* Interrupt Throttling Rate - RW */
 #define E1000_ICS  0x000C8  /* Interrupt Cause Set - WO */
 #define E1000_IMS  0x000D0  /* Interrupt Mask Set - RW */
+#define E1000_EIAC 0x000DC  /* Ext. Interrupt Auto Clear - RW */
 #define E1000_IMC  0x000D8  /* Interrupt Mask Clear - WO */
 #define E1000_IAM  0x000E0  /* Interrupt Acknowledge Auto Mask */
+#define E1000_IVAR 0x000E4  /* Interrupt Vector Allocation Register - RW */
+#define E1000_EITR 0x000E8  /* Extended Interrupt Throttling Rate - RW */
 #define E1000_RCTL 0x00100  /* RX Control - RW */
 #define E1000_RDTR1    0x02820  /* RX Delay Timer (1) - RW */
 #define E1000_RDBAL1   0x02900  /* RX Descriptor Base Address Low (1) - RW */
@@ -145,6 +150,7 @@
 #define E1000_RDH1 0x02910  /* RX Descriptor Head (1) - RW */
 #define E1000_RDT1 0x02918  /* RX Descriptor Tail (1) - RW */
 #define E1000_FCTTV    0x00170  /* Flow Control Transmit Timer Value - RW */
+#define E1000_FCRTV    0x05F40  /* Flow Control Refresh Timer Value - RW */
 #define E1000_TXCW 0x00178  /* TX Configuration Word - RW */
 #define E1000_RXCW 0x00180  /* RX Configuration Word - RO */
 #define E1000_TCTL 0x00400  /* TX Control - RW */
@@ -161,6 +167,10 @@
 #define E1000_PBM  0x1  /* Packet Buffer Memory - RW */
 #define E1000_PBS  0x01008  /* Packet Buffer Size - RW */
 #define E1000_EEMNGCTL 0x01010  /* MNG EEprom Control */
+#define E1000_EEMNGDATA 0x01014 /* MNG EEPROM Read/Write data */
+#define E1000_FLMNGCTL 0x01018 /* MNG Flash Control */
+#define E1000_FLMNGDATA 0x0101C /* MNG FLASH Read data */
+#define E1000_FLMNGCNT 0x01020 /* MNG FLASH Read Counter */
 #define E1000_FLASH_UPDATES 1000
 #define E1000_EEARBC   0x01024  /* EEPROM Auto Read Bus Control */
 #define E1000_FLASHT   0x01028  /* FLASH Timer Register */
@@ -169,9 +179,12 @@
 #define E1000_FLSWDATA 0x01034  /* FLASH data register */
 #define E1000_FLSWCNT  0x01038  /* FLASH Access Counter */
 #define E1000_FLOP 0x0103C  /* FLASH Opcode Register */
+#define E1000_FLOL 0x01050  /* FEEP Auto Load */
 #define E1000_ERT  0x02008  /* Early Rx Threshold - RW */
 #define E1000_FCRTL    0x02160  /* Flow Control Receive Threshold Low - RW */
+#define E1000_FCRTL_A  0x00168  /* Alias to FCRTL */
 #define E1000_FCRTH    0x02168  /* Flow Control Receive Threshold High - RW */
+#define E1000_FCRTH_A  0x00160  /* Alias to FCRTH */
 #define E1000_PSRCTL   0x02170  /* Packet Split Receive Control - RW */
 #define E1000_RDBAL    0x02800  /* RX Descriptor Base Address Low - RW */
 #define E1000_RDBAH    0x02804  /* RX Descriptor Base Address High - RW */
@@ -179,11 +192,17 @@
 #define E1000_RDH  0x02810  /* RX Descriptor Head - RW */
 #define E1000_RDT  0x02818  /* RX Descriptor Tail - RW */
 #define E1000_RDTR 0x02820  /* RX Delay Timer - RW */
+#define E1000_RDTR_A   0x00108  /* Alias to RDTR */
 #define E1000_RDBAL0   E1000_RDBAL /* RX Desc Base Address Low (0) - RW */
+#define E1000_RDBAL0_A 0x00110 /* Alias to RDBAL0 */
 #define E1000_RDBAH0   E1000_RDBAH /* RX Desc Base Address High (0) - RW */
+#define E1000_RDBAH0_A 0x00114 /* Alias to RDBAH0 */
 #define E1000_RDLEN0   E1000_RDLEN /* RX Desc Length (0) - RW */
+#define E1000_RDLEN0_A 0x00118 /* Alias to RDLEN0 */
 #define E1000_RDH0 E1000_RDH   /* RX Desc Head (0) - RW */
+#define E1000_RDH0_A   0x00120 /* Alias to RDH0 */
 #define E1000_RDT0 E1000_RDT   /* RX Desc Tail (0) - RW */
+#define E1000_RDT0_A   0x00128 /* Alias to RDT0 */
 #define E1000_RDTR0    E1000_RDTR  /* RX Delay Timer (0) - RW */
 #define E1000_RXDCTL   0x02828  /* RX Descriptor Control queue 0 - RW */
 #define E1000_RXDCTL1  0x02928  /* RX Descriptor Control queue 1 - RW */
@@ -192,22 +211,33 @@
 #define E1000_RAID 0x02C08  /* Receive Ack Interrupt Delay - RW */
 #define E1000_TXDMAC   0x03000  /* TX DMA Co

[Qemu-devel] [PULL V3 14/20] vmxnet3: Use pci_dma_* API instead of cpu_physical_memory_*

2016-05-25 Thread Jason Wang

From: Dmitry Fleytman 

This makes the device and the network packet abstractions
ready for IOMMU.

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
Signed-off-by: Jason Wang 
---
 hw/net/net_tx_pkt.c | 16 +++-
 hw/net/net_tx_pkt.h |  5 +++--
 hw/net/vmxnet3.c| 51 ++-
 3 files changed, 44 insertions(+), 28 deletions(-)

diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index ad2258c..dbcbe23 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -20,6 +20,7 @@
 #include "net/checksum.h"
 #include "net/tap.h"
 #include "net/net.h"
+#include "hw/pci/pci.h"
 
 enum {
 NET_TX_PKT_VHDR_FRAG = 0,
@@ -30,6 +31,8 @@ enum {
 
 /* TX packet private context */
 struct NetTxPkt {
+PCIDevice *pci_dev;
+
 struct virtio_net_hdr virt_hdr;
 bool has_virt_hdr;
 
@@ -54,11 +57,13 @@ struct NetTxPkt {
 bool is_loopback;
 };
 
-void net_tx_pkt_init(struct NetTxPkt **pkt, uint32_t max_frags,
-bool has_virt_hdr)
+void net_tx_pkt_init(struct NetTxPkt **pkt, PCIDevice *pci_dev,
+uint32_t max_frags, bool has_virt_hdr)
 {
 struct NetTxPkt *p = g_malloc0(sizeof *p);
 
+p->pci_dev = pci_dev;
+
 p->vec = g_malloc((sizeof *p->vec) *
 (max_frags + NET_TX_PKT_PL_START_FRAG));
 
@@ -383,7 +388,8 @@ bool net_tx_pkt_add_raw_fragment(struct NetTxPkt *pkt, hwaddr pa,
 ventry = &pkt->raw[pkt->raw_frags];
 mapped_len = len;
 
-ventry->iov_base = cpu_physical_memory_map(pa, &mapped_len, false);
+ventry->iov_base = pci_dma_map(pkt->pci_dev, pa,
+   &mapped_len, DMA_DIRECTION_TO_DEVICE);
 
 if ((ventry->iov_base != NULL) && (len == mapped_len)) {
 ventry->iov_len = mapped_len;
@@ -444,8 +450,8 @@ void net_tx_pkt_reset(struct NetTxPkt *pkt)
 assert(pkt->raw);
 for (i = 0; i < pkt->raw_frags; i++) {
 assert(pkt->raw[i].iov_base);
-cpu_physical_memory_unmap(pkt->raw[i].iov_base, pkt->raw[i].iov_len,
-  false, pkt->raw[i].iov_len);
+pci_dma_unmap(pkt->pci_dev, pkt->raw[i].iov_base, pkt->raw[i].iov_len,
+  DMA_DIRECTION_TO_DEVICE, 0);
 }
 pkt->raw_frags = 0;
 
diff --git a/hw/net/net_tx_pkt.h b/hw/net/net_tx_pkt.h
index e49772d..07b9a20 100644
--- a/hw/net/net_tx_pkt.h
+++ b/hw/net/net_tx_pkt.h
@@ -31,11 +31,12 @@ struct NetTxPkt;
  * Init function for tx packet functionality
  *
  * @pkt:packet pointer
+ * @pci_dev:PCI device processing this packet
  * @max_frags:  max tx ip fragments
  * @has_virt_hdr:   device uses virtio header.
  */
-void net_tx_pkt_init(struct NetTxPkt **pkt, uint32_t max_frags,
-bool has_virt_hdr);
+void net_tx_pkt_init(struct NetTxPkt **pkt, PCIDevice *pci_dev,
+uint32_t max_frags, bool has_virt_hdr);
 
 /**
  * Clean all tx packet resources.
diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 33cd07d..16645e6 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -802,7 +802,9 @@ vmxnet3_pop_rxc_descr(VMXNET3State *s, int qidx, uint32_t *descr_gen)
 hwaddr daddr =
 vmxnet3_ring_curr_cell_pa(&s->rxq_descr[qidx].comp_ring);
 
-cpu_physical_memory_read(daddr, &rxcd, sizeof(struct Vmxnet3_RxCompDesc));
+pci_dma_read(PCI_DEVICE(s), daddr,
+ &rxcd, sizeof(struct Vmxnet3_RxCompDesc));
+
 ring_gen = vmxnet3_ring_curr_gen(&s->rxq_descr[qidx].comp_ring);
 
 if (rxcd.gen != ring_gen) {
@@ -1023,10 +1025,11 @@ nocsum:
 }
 
 static void
-vmxnet3_physical_memory_writev(const struct iovec *iov,
-   size_t start_iov_off,
-   hwaddr target_addr,
-   size_t bytes_to_copy)
+vmxnet3_pci_dma_writev(PCIDevice *pci_dev,
+   const struct iovec *iov,
+   size_t start_iov_off,
+   hwaddr target_addr,
+   size_t bytes_to_copy)
 {
 size_t curr_off = 0;
 size_t copied = 0;
@@ -1036,9 +1039,9 @@ vmxnet3_physical_memory_writev(const struct iovec *iov,
 size_t chunk_len =
 MIN((curr_off + iov->iov_len) - start_iov_off, bytes_to_copy);
 
-cpu_physical_memory_write(target_addr + copied,
-  iov->iov_base + start_iov_off - curr_off,
-  chunk_len);
+pci_dma_write(pci_dev, target_addr + copied,
+  iov->iov_base + start_iov_off - curr_off,
+  chunk_len);
 
 copied += chunk_len;
 start_iov_off += chunk_len;
@@ -1088,15 +1091,15 @@ vmxnet3_indicate_packet(VMXNET3State *s)
 }
 
 chunk_size = MIN(bytes_left, rxd.len);
-vmxnet3_physical_memory_writev(data, bytes_copied,
-   le64_to_cpu(rxd.addr), chunk_size);
+vmxnet3_pci_dma_writev(PCI_DEVICE(s), data,

[Qemu-devel] [PULL V3 13/20] net_pkt: Extend packet abstraction as required by e1000e functionality

2016-05-25 Thread Jason Wang

This patch extends the TX/RX packet abstractions with features that will
be used by the e1000e device implementation.

Changes are:

  1. Support iovec lists for RX buffers
  2. Deeper RX packet parsing
  3. Loopback option for TX packets
  4. Extended VLAN header handling
  5. RSS processing for RX packets

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
Signed-off-by: Jason Wang 
---
 hw/net/net_rx_pkt.c| 473 +
 hw/net/net_rx_pkt.h| 193 +++-
 hw/net/net_tx_pkt.c| 204 +
 hw/net/net_tx_pkt.h|  60 ++-
 include/net/checksum.h |   4 +-
 include/net/eth.h  | 153 +++-
 net/checksum.c |   7 +-
 net/eth.c  | 410 +-
 trace-events   |  39 
 9 files changed, 1335 insertions(+), 208 deletions(-)

diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c
index 8a4f29f..1019b50 100644
--- a/hw/net/net_rx_pkt.c
+++ b/hw/net/net_rx_pkt.c
@@ -16,24 +16,16 @@
  */
 
 #include "qemu/osdep.h"
+#include "trace.h"
 #include "net_rx_pkt.h"
-#include "net/eth.h"
-#include "qemu-common.h"
-#include "qemu/iov.h"
 #include "net/checksum.h"
 #include "net/tap.h"
 
-/*
- * RX packet may contain up to 2 fragments - rebuilt eth header
- * in case of VLAN tag stripping
- * and payload received from QEMU - in any case
- */
-#define NET_MAX_RX_PACKET_FRAGMENTS (2)
-
 struct NetRxPkt {
 struct virtio_net_hdr virt_hdr;
-uint8_t ehdr_buf[ETH_MAX_L2_HDR_LEN];
-struct iovec vec[NET_MAX_RX_PACKET_FRAGMENTS];
+uint8_t ehdr_buf[sizeof(struct eth_header)];
+struct iovec *vec;
+uint16_t vec_len_total;
 uint16_t vec_len;
 uint32_t tot_len;
 uint16_t tci;
@@ -46,17 +38,31 @@ struct NetRxPkt {
 bool isip6;
 bool isudp;
 bool istcp;
+
+size_t l3hdr_off;
+size_t l4hdr_off;
+size_t l5hdr_off;
+
+eth_ip6_hdr_info ip6hdr_info;
+eth_ip4_hdr_info ip4hdr_info;
+eth_l4_hdr_info  l4hdr_info;
 };
 
 void net_rx_pkt_init(struct NetRxPkt **pkt, bool has_virt_hdr)
 {
 struct NetRxPkt *p = g_malloc0(sizeof *p);
 p->has_virt_hdr = has_virt_hdr;
+p->vec = NULL;
+p->vec_len_total = 0;
 *pkt = p;
 }
 
 void net_rx_pkt_uninit(struct NetRxPkt *pkt)
 {
+if (pkt->vec_len_total != 0) {
+g_free(pkt->vec);
+}
+
 g_free(pkt);
 }
 
@@ -66,33 +72,88 @@ struct virtio_net_hdr *net_rx_pkt_get_vhdr(struct NetRxPkt *pkt)
 return &pkt->virt_hdr;
 }
 
-void net_rx_pkt_attach_data(struct NetRxPkt *pkt, const void *data,
-   size_t len, bool strip_vlan)
+static inline void
+net_rx_pkt_iovec_realloc(struct NetRxPkt *pkt,
+int new_iov_len)
+{
+if (pkt->vec_len_total < new_iov_len) {
+g_free(pkt->vec);
+pkt->vec = g_malloc(sizeof(*pkt->vec) * new_iov_len);
+pkt->vec_len_total = new_iov_len;
+}
+}
+
+static void
+net_rx_pkt_pull_data(struct NetRxPkt *pkt,
+const struct iovec *iov, int iovcnt,
+size_t ploff)
+{
+if (pkt->vlan_stripped) {
+net_rx_pkt_iovec_realloc(pkt, iovcnt + 1);
+
+pkt->vec[0].iov_base = pkt->ehdr_buf;
+pkt->vec[0].iov_len = sizeof(pkt->ehdr_buf);
+
+pkt->tot_len =
+iov_size(iov, iovcnt) - ploff + sizeof(struct eth_header);
+
+pkt->vec_len = iov_copy(pkt->vec + 1, pkt->vec_len_total - 1,
+iov, iovcnt, ploff, pkt->tot_len);
+} else {
+net_rx_pkt_iovec_realloc(pkt, iovcnt);
+
+pkt->tot_len = iov_size(iov, iovcnt) - ploff;
+pkt->vec_len = iov_copy(pkt->vec, pkt->vec_len_total,
+iov, iovcnt, ploff, pkt->tot_len);
+}
+
+eth_get_protocols(pkt->vec, pkt->vec_len, &pkt->isip4, &pkt->isip6,
+  &pkt->isudp, &pkt->istcp,
+  &pkt->l3hdr_off, &pkt->l4hdr_off, &pkt->l5hdr_off,
+  &pkt->ip6hdr_info, &pkt->ip4hdr_info, &pkt->l4hdr_info);
+
+trace_net_rx_pkt_parsed(pkt->isip4, pkt->isip6, pkt->isudp, pkt->istcp,
+pkt->l3hdr_off, pkt->l4hdr_off, pkt->l5hdr_off);
+}
+
+void net_rx_pkt_attach_iovec(struct NetRxPkt *pkt,
+const struct iovec *iov, int iovcnt,
+size_t iovoff, bool strip_vlan)
 {
 uint16_t tci = 0;
-uint16_t ploff;
+uint16_t ploff = iovoff;
 assert(pkt);
 pkt->vlan_stripped = false;
 
 if (strip_vlan) {
-pkt->vlan_stripped = eth_strip_vlan(data, pkt->ehdr_buf, &ploff, &tci);
+pkt->vlan_stripped = eth_strip_vlan(iov, iovcnt, iovoff, pkt->ehdr_buf,
+&ploff, &tci);
 }
 
-if (pkt->vlan_stripped) {
-pkt->vec[0].iov_base = pkt->ehdr_buf;
-pkt->vec[0].iov_len = ploff - sizeof(struct vlan_header);

[Qemu-devel] [PULL V3 18/20] e1000e: Introduce qtest for e1000e device

2016-05-25 Thread Jason Wang

From: Dmitry Fleytman 

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
Signed-off-by: Jason Wang 
---
 tests/Makefile  |   3 +
 tests/e1000e-test.c | 480 
 2 files changed, 483 insertions(+)
 create mode 100644 tests/e1000e-test.c

diff --git a/tests/Makefile b/tests/Makefile
index e328b81..b196489 100644
--- a/tests/Makefile
+++ b/tests/Makefile
@@ -142,6 +142,8 @@ gcov-files-virtio-y += $(gcov-files-virtioserial-y)
 
 check-qtest-pci-y += tests/e1000-test$(EXESUF)
 gcov-files-pci-y += hw/net/e1000.c
+check-qtest-pci-y += tests/e1000e-test$(EXESUF)
+gcov-files-pci-y += hw/net/e1000e.c hw/net/e1000e_core.c
 check-qtest-pci-y += tests/rtl8139-test$(EXESUF)
 gcov-files-pci-y += hw/net/rtl8139.c
 check-qtest-pci-y += tests/pcnet-test$(EXESUF)
@@ -550,6 +552,7 @@ tests/i440fx-test$(EXESUF): tests/i440fx-test.o $(libqos-pc-obj-y)
 tests/q35-test$(EXESUF): tests/q35-test.o $(libqos-pc-obj-y)
 tests/fw_cfg-test$(EXESUF): tests/fw_cfg-test.o $(libqos-pc-obj-y)
 tests/e1000-test$(EXESUF): tests/e1000-test.o
+tests/e1000e-test$(EXESUF): tests/e1000e-test.o $(libqos-pc-obj-y)
 tests/rtl8139-test$(EXESUF): tests/rtl8139-test.o $(libqos-pc-obj-y)
 tests/pcnet-test$(EXESUF): tests/pcnet-test.o
 tests/eepro100-test$(EXESUF): tests/eepro100-test.o
diff --git a/tests/e1000e-test.c b/tests/e1000e-test.c
new file mode 100644
index 000..d6e6311
--- /dev/null
+++ b/tests/e1000e-test.c
@@ -0,0 +1,480 @@
+ /*
+ * QTest testcase for e1000e NIC
+ *
+ * Copyright (c) 2015 Ravello Systems LTD (http://ravellosystems.com)
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ * Dmitry Fleytman 
+ * Leonid Bloch 
+ * Yan Vugenfirer 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+
+#include "qemu/osdep.h"
+#include 
+#include "libqtest.h"
+#include "qemu-common.h"
+#include "libqos/pci-pc.h"
+#include "qemu/sockets.h"
+#include "qemu/iov.h"
+#include "qemu/bitops.h"
+#include "libqos/malloc.h"
+#include "libqos/malloc-pc.h"
+#include "libqos/malloc-generic.h"
+
+#define E1000E_IMS  (0x00d0)
+
+#define E1000E_STATUS   (0x0008)
+#define E1000E_STATUS_LU BIT(1)
+#define E1000E_STATUS_ASDV1000 BIT(9)
+
+#define E1000E_CTRL (0x)
+#define E1000E_CTRL_RESET BIT(26)
+
+#define E1000E_RCTL (0x0100)
+#define E1000E_RCTL_EN  BIT(1)
+#define E1000E_RCTL_UPE BIT(3)
+#define E1000E_RCTL_MPE BIT(4)
+
+#define E1000E_RFCTL (0x5008)
+#define E1000E_RFCTL_EXTEN  BIT(15)
+
+#define E1000E_TCTL (0x0400)
+#define E1000E_TCTL_EN  BIT(1)
+
+#define E1000E_CTRL_EXT (0x0018)
+#define E1000E_CTRL_EXT_DRV_LOAD BIT(28)
+#define E1000E_CTRL_EXT_TXLSFLOW BIT(22)
+
+#define E1000E_RX0_MSG_ID   (0)
+#define E1000E_TX0_MSG_ID   (1)
+#define E1000E_OTHER_MSG_ID (2)
+
+#define E1000E_IVAR (0x00E4)
+#define E1000E_IVAR_TEST_CFG ((E1000E_RX0_MSG_ID << 0)    | BIT(3)  | \
+                              (E1000E_TX0_MSG_ID << 8)    | BIT(11) | \
+                              (E1000E_OTHER_MSG_ID << 16) | BIT(19) | \
+                              BIT(31))
+
+#define E1000E_RING_LEN (0x1000)
+#define E1000E_TXD_LEN  (16)
+#define E1000E_RXD_LEN  (16)
+
+#define E1000E_TDBAL (0x3800)
+#define E1000E_TDBAH (0x3804)
+#define E1000E_TDLEN (0x3808)
+#define E1000E_TDH  (0x3810)
+#define E1000E_TDT  (0x3818)
+
+#define E1000E_RDBAL (0x2800)
+#define E1000E_RDBAH (0x2804)
+#define E1000E_RDLEN (0x2808)
+#define E1000E_RDH  (0x2810)
+#define E1000E_RDT  (0x2818)
+
+typedef struct {
+QPCIDevice *pci_dev;
+void *mac_regs;
+
+uint64_t tx_ring;
+uint64_t rx_ring;
+} e1000e_device;
+
+static int test_sockets[2];
+static QGuestAllocator *test_alloc;
+static QPCIBus *test_bus;
+
+static void e1000e_pci_foreach_callback(QPCIDevice *dev, int devfn, void *data)
+{
+*(QPCIDevice **) data = dev;
+}
+
+static QPCIDevice *e1000e_device_find(QPCIBus *bus)
+{
+static const int e1000e_vendor_id = 0x8086;
+static const int e1000e_dev_id = 0x10D3;
+
+QPCIDevice *e1000e_dev = NULL;
+
+qpci_device_foreach(bus, e1000e_vendor_id, e1000e_dev_id,
+e1000e_pci_foreach_callback, &e1000e_dev);
+
+g_assert_nonnull(e1000e_dev);

[Qemu-devel] [PULL V3 19/20] net: vl: Move default_net to vl.c

2016-05-25 Thread Jason Wang

From: Eduardo Habkost 

All handling of defaults (default_* variables) is inside vl.c,
move default_net there too, so we can more easily refactor that
code later.

Reviewed-by: Paolo Bonzini 
Signed-off-by: Eduardo Habkost 
Signed-off-by: Jason Wang 
---
 include/net/net.h |  1 -
 net/net.c | 23 ---
 vl.c  | 24 +++-
 3 files changed, 23 insertions(+), 25 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index 129d46b..422bc3e 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -183,7 +183,6 @@ struct NICInfo {
 
 extern int nb_nics;
 extern NICInfo nd_table[MAX_NICS];
-extern int default_net;
 extern const char *host_net_devices[];
 
 /* from net.c */
diff --git a/net/net.c b/net/net.c
index 0ad6217..1680b68 100644
--- a/net/net.c
+++ b/net/net.c
@@ -76,8 +76,6 @@ const char *host_net_devices[] = {
 NULL,
 };
 
-int default_net = 1;
-
 /***/
 /* network device redirectors */
 
@@ -1415,18 +1413,6 @@ void net_check_clients(void)
 NetClientState *nc;
 int i;
 
-/* Don't warn about the default network setup that you get if
- * no command line -net or -netdev options are specified. There
- * are two cases that we would otherwise complain about:
- * (1) board doesn't support a NIC but the implicit "-net nic"
- * requested one
- * (2) CONFIG_SLIRP not set, in which case the implicit "-net nic"
- * sets up a nic that isn't connected to anything.
- */
-if (default_net) {
-return;
-}
-
 net_hub_check_clients();
 
 QTAILQ_FOREACH(nc, &net_clients, next) {
@@ -1483,14 +1469,6 @@ int net_init_clients(void)
 {
 QemuOptsList *net = qemu_find_opts("net");
 
-if (default_net) {
-/* if no clients, we use a default config */
-qemu_opts_set(net, NULL, "type", "nic", &error_abort);
-#ifdef CONFIG_SLIRP
-qemu_opts_set(net, NULL, "type", "user", &error_abort);
-#endif
-}
-
 net_change_state_entry =
 qemu_add_vm_change_state_handler(net_vm_change_state_handler, NULL);
 
@@ -1521,7 +1499,6 @@ int net_client_parse(QemuOptsList *opts_list, const char *optarg)
 return -1;
 }
 
-default_net = 0;
 return 0;
 }
 
diff --git a/vl.c b/vl.c
index 18d1423..2f74fe8 100644
--- a/vl.c
+++ b/vl.c
@@ -207,6 +207,7 @@ static int default_floppy = 1;
 static int default_cdrom = 1;
 static int default_sdcard = 1;
 static int default_vga = 1;
+static int default_net = 1;
 
 static struct {
 const char *driver;
@@ -3267,11 +3268,13 @@ int main(int argc, char **argv, char **envp)
 fd_bootchk = 0;
 break;
 case QEMU_OPTION_netdev:
+default_net = 0;
 if (net_client_parse(qemu_find_opts("netdev"), optarg) == -1) {
 exit(1);
 }
 break;
 case QEMU_OPTION_net:
+default_net = 0;
 if (net_client_parse(qemu_find_opts("net"), optarg) == -1) {
 exit(1);
 }
@@ -4361,6 +4364,14 @@ int main(int argc, char **argv, char **envp)
 /* clean up network at qemu process termination */
 atexit(&net_cleanup);
 
+if (default_net) {
+QemuOptsList *net = qemu_find_opts("net");
+qemu_opts_set(net, NULL, "type", "nic", &error_abort);
+#ifdef CONFIG_SLIRP
+qemu_opts_set(net, NULL, "type", "user", &error_abort);
+#endif
+}
+
 if (net_init_clients() < 0) {
 exit(1);
 }
@@ -4509,7 +4520,18 @@ int main(int argc, char **argv, char **envp)
 /* Did we create any drives that we failed to create a device for? */
 drive_check_orphaned();
 
-net_check_clients();
+/* Don't warn about the default network setup that you get if
+ * no command line -net or -netdev options are specified. There
+ * are two cases that we would otherwise complain about:
+ * (1) board doesn't support a NIC but the implicit "-net nic"
+ * requested one
+ * (2) CONFIG_SLIRP not set, in which case the implicit "-net nic"
+ * sets up a nic that isn't connected to anything.
+ */
+if (!default_net) {
+net_check_clients();
+}
+
 
 if (boot_once) {
 qemu_boot_set(boot_once, &error_fatal);
-- 
2.7.4
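
A minimal sketch of the flag logic this patch centralises in vl.c, for readers
skimming the diff: the implicit NIC setup is wanted only when neither -net nor
-netdev appears on the command line. The helper name wants_default_net() is
hypothetical and the argument scan is much simpler than QEMU's real option
parser; it only illustrates the "default unless overridden" pattern.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not QEMU code: returns 1 when the implicit
 * "-net nic" (plus "-net user" if SLIRP is built in) should be set up,
 * and 0 as soon as the user passes -net or -netdev explicitly.
 */
static int wants_default_net(int argc, const char *const *argv)
{
    int default_net = 1;    /* mirrors "static int default_net = 1;" */
    int i;

    assert(argv != NULL);

    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-net") == 0 ||
            strcmp(argv[i], "-netdev") == 0) {
            default_net = 0;    /* any explicit option disables defaults */
        }
    }
    return default_net;
}
```

With the flag local to one file, both the default setup and the decision to
skip net_check_clients() can be read in one place.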

[Qemu-devel] [PULL V3 10/20] vmxnet3: Use common MAC address tracing macros

2016-05-25 Thread Jason Wang

From: Dmitry Fleytman 

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
Signed-off-by: Jason Wang 
---
 hw/net/vmxnet3.c  | 8 
 hw/net/vmxnet_debug.h | 3 ---
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 586e915..200d2ea 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -474,7 +474,7 @@ static void vmxnet3_set_variable_mac(VMXNET3State *s, uint32_t h, uint32_t l)
 s->conf.macaddr.a[4] = VMXNET3_GET_BYTE(h, 0);
 s->conf.macaddr.a[5] = VMXNET3_GET_BYTE(h, 1);
 
-VMW_CFPRN("Variable MAC: " VMXNET_MF, VMXNET_MA(s->conf.macaddr.a));
+VMW_CFPRN("Variable MAC: " MAC_FMT, MAC_ARG(s->conf.macaddr.a));
 
 qemu_format_nic_info_str(qemu_get_queue(s->nic), s->conf.macaddr.a);
 }
@@ -1219,7 +1219,7 @@ static void vmxnet3_reset_interrupt_states(VMXNET3State *s)
 static void vmxnet3_reset_mac(VMXNET3State *s)
 {
 memcpy(&s->conf.macaddr.a, &s->perm_mac.a, sizeof(s->perm_mac.a));
-VMW_CFPRN("MAC address set to: " VMXNET_MF, VMXNET_MA(s->conf.macaddr.a));
+VMW_CFPRN("MAC address set to: " MAC_FMT, MAC_ARG(s->conf.macaddr.a));
 }
 
 static void vmxnet3_deactivate_device(VMXNET3State *s)
@@ -1301,7 +1301,7 @@ static void vmxnet3_update_mcast_filters(VMXNET3State *s)
 cpu_physical_memory_read(mcast_list_pa, s->mcast_list, list_bytes);
 VMW_CFPRN("Current multicast list len is %d:", s->mcast_list_len);
 for (i = 0; i < s->mcast_list_len; i++) {
-VMW_CFPRN("\t" VMXNET_MF, VMXNET_MA(s->mcast_list[i].a));
+VMW_CFPRN("\t" MAC_FMT, MAC_ARG(s->mcast_list[i].a));
 }
 }
 }
@@ -2102,7 +2102,7 @@ static void vmxnet3_net_init(VMXNET3State *s)
 
 s->link_status_and_speed = VMXNET3_LINK_SPEED | VMXNET3_LINK_STATUS_UP;
 
-VMW_CFPRN("Permanent MAC: " VMXNET_MF, VMXNET_MA(s->perm_mac.a));
+VMW_CFPRN("Permanent MAC: " MAC_FMT, MAC_ARG(s->perm_mac.a));
 
 s->nic = qemu_new_nic(&net_vmxnet3_info, &s->conf,
   object_get_typename(OBJECT(s)),
diff --git a/hw/net/vmxnet_debug.h b/hw/net/vmxnet_debug.h
index 96495db..5aab00b 100644
--- a/hw/net/vmxnet_debug.h
+++ b/hw/net/vmxnet_debug.h
@@ -142,7 +142,4 @@
 } \
 } while (0)
 
-#define VMXNET_MF   "%02X:%02X:%02X:%02X:%02X:%02X"
-#define VMXNET_MA(a)(a)[0], (a)[1], (a)[2], (a)[3], (a)[4], (a)[5]
-
 #endif /* _QEMU_VMXNET3_DEBUG_H  */
-- 
2.7.4

[Qemu-devel] [PULL V3 11/20] net_pkt: Name vmxnet3 packet abstractions more generic

2016-05-25 Thread Jason Wang

From: Dmitry Fleytman 

This patch drops the "vmx" prefix from the packet abstraction names
to emphasize that they are generic and not tied to any
specific network device.

These abstractions will be reused by the e1000e emulation implementation
introduced by the following patches, so their names need generalization.

This patch (except renamed files, adjusted comments and changes in MAINTAINERS)
was produced by:

git grep -lz 'vmxnet_tx_pkt' | xargs -0 perl -i'' -pE "s/vmxnet_tx_pkt/net_tx_pkt/g"
git grep -lz 'vmxnet_rx_pkt' | xargs -0 perl -i'' -pE "s/vmxnet_rx_pkt/net_rx_pkt/g"
git grep -lz 'VmxnetTxPkt' | xargs -0 perl -i'' -pE "s/VmxnetTxPkt/NetTxPkt/g"
git grep -lz 'VMXNET_TX_PKT' | xargs -0 perl -i'' -pE "s/VMXNET_TX_PKT/NET_TX_PKT/g"
git grep -lz 'VmxnetRxPkt' | xargs -0 perl -i'' -pE "s/VmxnetRxPkt/NetRxPkt/g"
git grep -lz 'VMXNET_RX_PKT' | xargs -0 perl -i'' -pE "s/VMXNET_RX_PKT/NET_RX_PKT/g"
sed -ie 's/VMXNET_/NET_/g' hw/net/vmxnet_rx_pkt.c
sed -ie 's/VMXNET_/NET_/g' hw/net/vmxnet_tx_pkt.c

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
Signed-off-by: Jason Wang 
---
 MAINTAINERS|   8 +
 hw/net/Makefile.objs   |   2 +-
 hw/net/net_rx_pkt.c| 187 
 hw/net/net_rx_pkt.h| 174 +++
 hw/net/net_tx_pkt.c| 581 +
 hw/net/net_tx_pkt.h| 146 +
 hw/net/vmxnet3.c   |  88 
 hw/net/vmxnet_rx_pkt.c | 187 
 hw/net/vmxnet_rx_pkt.h | 174 ---
 hw/net/vmxnet_tx_pkt.c | 581 -
 hw/net/vmxnet_tx_pkt.h | 146 -
 tests/Makefile |   4 +-
 12 files changed, 1143 insertions(+), 1135 deletions(-)
 create mode 100644 hw/net/net_rx_pkt.c
 create mode 100644 hw/net/net_rx_pkt.h
 create mode 100644 hw/net/net_tx_pkt.c
 create mode 100644 hw/net/net_tx_pkt.h
 delete mode 100644 hw/net/vmxnet_rx_pkt.c
 delete mode 100644 hw/net/vmxnet_rx_pkt.h
 delete mode 100644 hw/net/vmxnet_tx_pkt.c
 delete mode 100644 hw/net/vmxnet_tx_pkt.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 81e7fac..dc5e536 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -953,6 +953,14 @@ S: Maintained
 F: hw/*/xilinx_*
 F: include/hw/xilinx.h
 
+Network packet abstractions
+M: Dmitry Fleytman 
+S: Maintained
+F: include/net/eth.h
+F: net/eth.c
+F: hw/net/net_rx_pkt*
+F: hw/net/net_tx_pkt*
+
 Vmware
 M: Dmitry Fleytman 
 S: Maintained
diff --git a/hw/net/Makefile.objs b/hw/net/Makefile.objs
index 64d0449..527d264 100644
--- a/hw/net/Makefile.objs
+++ b/hw/net/Makefile.objs
@@ -8,7 +8,7 @@ common-obj-$(CONFIG_PCNET_PCI) += pcnet-pci.o
 common-obj-$(CONFIG_PCNET_COMMON) += pcnet.o
 common-obj-$(CONFIG_E1000_PCI) += e1000.o
 common-obj-$(CONFIG_RTL8139_PCI) += rtl8139.o
-common-obj-$(CONFIG_VMXNET3_PCI) += vmxnet_tx_pkt.o vmxnet_rx_pkt.o
+common-obj-$(CONFIG_VMXNET3_PCI) += net_tx_pkt.o net_rx_pkt.o
 common-obj-$(CONFIG_VMXNET3_PCI) += vmxnet3.o
 
 common-obj-$(CONFIG_SMC91C111) += smc91c111.o
diff --git a/hw/net/net_rx_pkt.c b/hw/net/net_rx_pkt.c
new file mode 100644
index 0000000..8a4f29f
--- /dev/null
+++ b/hw/net/net_rx_pkt.c
@@ -0,0 +1,187 @@
+/*
+ * QEMU RX packets abstractions
+ *
+ * Copyright (c) 2012 Ravello Systems LTD (http://ravellosystems.com)
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ * Dmitry Fleytman 
+ * Tamir Shomer 
+ * Yan Vugenfirer 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "net_rx_pkt.h"
+#include "net/eth.h"
+#include "qemu-common.h"
+#include "qemu/iov.h"
+#include "net/checksum.h"
+#include "net/tap.h"
+
+/*
+ * RX packet may contain up to 2 fragments - rebuilt eth header
+ * in case of VLAN tag stripping
+ * and payload received from QEMU - in any case
+ */
+#define NET_MAX_RX_PACKET_FRAGMENTS (2)
+
+struct NetRxPkt {
+struct virtio_net_hdr virt_hdr;
+uint8_t ehdr_buf[ETH_MAX_L2_HDR_LEN];
+struct iovec vec[NET_MAX_RX_PACKET_FRAGMENTS];
+uint16_t vec_len;
+uint32_t tot_len;
+uint16_t tci;
+bool vlan_stripped;
+bool has_virt_hdr;
+eth_pkt_types_e packet_type;
+
+/* Analysis results */
+bool isip4;
+bool isip6;
+bool isudp;
+bool istcp;
+};
+
+void net_rx_pkt_init(struct NetRxPkt **pkt, bool has_virt_hdr)
+{
+struct NetRxPkt *p = g_malloc0(sizeof *p);
+p->has_virt_hdr = has_virt_hdr;
+*pkt = p;
+}
+
+void net_rx_pkt_uninit(struct NetRxPkt *pkt)
+{
+g_free(pkt);
+}
+
+struct virtio_net_hdr *net_rx_pkt_get_vhdr(struct NetRxPkt *pkt)
+{
+assert(pkt);
+return &pkt->virt_hdr;
+}
+
+void net_rx_pkt_attach_data(struct NetRxPkt *pkt, const void *data,
+   size_t len, bool strip_vlan)
+{
+uint16_t tci = 0;
+uint16_t ploff;
+assert(pkt);
+pkt->vlan_stripped = false;
+
+if (strip_vlan) {
+p

[Qemu-devel] [PULL V3 09/20] net: Add macros for MAC address tracing

2016-05-25 Thread Jason Wang

From: Dmitry Fleytman 

These macros will be used by future commits introducing
e1000e device emulation and by vmxnet3 tracing code.

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
Signed-off-by: Jason Wang 
---
 include/net/net.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/include/net/net.h b/include/net/net.h
index 73e4c46..129d46b 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -9,6 +9,11 @@
 #include "migration/vmstate.h"
 #include "qapi-types.h"
 
+#define MAC_FMT "%02X:%02X:%02X:%02X:%02X:%02X"
+#define MAC_ARG(x) ((uint8_t *)(x))[0], ((uint8_t *)(x))[1], \
+   ((uint8_t *)(x))[2], ((uint8_t *)(x))[3], \
+   ((uint8_t *)(x))[4], ((uint8_t *)(x))[5]
+
 #define MAX_QUEUE_NUM 1024
 
 /* Maximum GSO packet size (64k) plus plenty of room for
-- 
2.7.4
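
For illustration, a small standalone sketch of how the two macros combine in a
printf-style call. The macro bodies are copied from the patch; format_mac() is
a hypothetical helper used only to make the example testable, not a QEMU
function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Macros as introduced by the patch. */
#define MAC_FMT "%02X:%02X:%02X:%02X:%02X:%02X"
#define MAC_ARG(x) ((uint8_t *)(x))[0], ((uint8_t *)(x))[1], \
                   ((uint8_t *)(x))[2], ((uint8_t *)(x))[3], \
                   ((uint8_t *)(x))[4], ((uint8_t *)(x))[5]

/*
 * Hypothetical helper: MAC_FMT is pasted into the format string by
 * string-literal concatenation, MAC_ARG expands to six arguments.
 * Returns the number of characters snprintf() would have written.
 */
static int format_mac(char *buf, size_t len, uint8_t *mac)
{
    assert(buf != NULL && mac != NULL);
    return snprintf(buf, len, "MAC: " MAC_FMT, MAC_ARG(mac));
}
```

Because MAC_ARG casts its argument to uint8_t *, it accepts both a plain byte
array and the address of a structure whose first six bytes are the address.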

[Qemu-devel] [PULL V3 08/20] net: Introduce Toeplitz hash calculator

2016-05-25 Thread Jason Wang

From: Dmitry Fleytman 

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
Signed-off-by: Jason Wang 
---
 include/net/checksum.h | 45 +
 1 file changed, 45 insertions(+)

diff --git a/include/net/checksum.h b/include/net/checksum.h
index 7de1acb..dd8b4f6 100644
--- a/include/net/checksum.h
+++ b/include/net/checksum.h
@@ -18,6 +18,7 @@
 #ifndef QEMU_NET_CHECKSUM_H
 #define QEMU_NET_CHECKSUM_H
 
+#include "qemu/bswap.h"
 struct iovec;
 
 uint32_t net_checksum_add_cont(int len, uint8_t *buf, int seq);
@@ -50,4 +51,48 @@ uint32_t net_checksum_add_iov(const struct iovec *iov,
   const unsigned int iov_cnt,
   uint32_t iov_off, uint32_t size);
 
+typedef struct toeplitz_key_st {
+uint32_t leftmost_32_bits;
+uint8_t *next_byte;
+} net_toeplitz_key;
+
+static inline
+void net_toeplitz_key_init(net_toeplitz_key *key, uint8_t *key_bytes)
+{
+key->leftmost_32_bits = be32_to_cpu(*(uint32_t *)key_bytes);
+key->next_byte = key_bytes + sizeof(uint32_t);
+}
+
+static inline
+void net_toeplitz_add(uint32_t *result,
+  uint8_t *input,
+  uint32_t len,
+  net_toeplitz_key *key)
+{
+register uint32_t accumulator = *result;
+register uint32_t leftmost_32_bits = key->leftmost_32_bits;
+register uint32_t byte;
+
+for (byte = 0; byte < len; byte++) {
+register uint8_t input_byte = input[byte];
+register uint8_t key_byte = *(key->next_byte++);
+register uint8_t bit;
+
+for (bit = 0; bit < 8; bit++) {
+if (input_byte & (1 << 7)) {
+accumulator ^= leftmost_32_bits;
+}
+
+leftmost_32_bits =
+(leftmost_32_bits << 1) | ((key_byte & (1 << 7)) >> 7);
+
+input_byte <<= 1;
+key_byte <<= 1;
+}
+}
+
+key->leftmost_32_bits = leftmost_32_bits;
+*result = accumulator;
+}
+
 #endif /* QEMU_NET_CHECKSUM_H */
-- 
2.7.4
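
The patch carries no description, so here is a hedged standalone sketch of the
same bit-serial Toeplitz scheme: a 32-bit window slides over the key one bit
per input bit, and the window is XORed into the result whenever the current
input bit is set. The names below (toeplitz_key, toeplitz_add, toeplitz_hash,
example_key) are illustrative and not the QEMU API; the key bytes are an
arbitrary example. As in the patch, the key must be at least len + 4 bytes
long, because one key byte is consumed per input byte after the initial 32
bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t leftmost_32_bits;   /* current 32-bit window into the key */
    const uint8_t *next_byte;    /* next key byte to shift in */
} toeplitz_key;

/* Arbitrary 40-byte example key (assumption, any long-enough key works). */
static const uint8_t example_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

static void toeplitz_key_init(toeplitz_key *key, const uint8_t *key_bytes)
{
    assert(key != NULL && key_bytes != NULL);
    /* Load the first four key bytes big-endian, without unaligned access. */
    key->leftmost_32_bits = ((uint32_t)key_bytes[0] << 24) |
                            ((uint32_t)key_bytes[1] << 16) |
                            ((uint32_t)key_bytes[2] << 8) |
                            (uint32_t)key_bytes[3];
    key->next_byte = key_bytes + 4;
}

/* Feed len input bytes; may be called repeatedly on consecutive chunks. */
static void toeplitz_add(uint32_t *result, const uint8_t *input,
                         size_t len, toeplitz_key *key)
{
    uint32_t acc = *result;
    uint32_t window = key->leftmost_32_bits;
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        uint8_t in = input[i];
        uint8_t kb = *(key->next_byte++);

        for (bit = 0; bit < 8; bit++) {
            if (in & 0x80) {
                acc ^= window;          /* input bit set: mix in window */
            }
            /* Slide the window left by one key bit. */
            window = (window << 1) | (uint32_t)((kb & 0x80) >> 7);
            in = (uint8_t)(in << 1);
            kb = (uint8_t)(kb << 1);
        }
    }

    key->leftmost_32_bits = window;
    *result = acc;
}

/* One-shot convenience wrapper. */
static uint32_t toeplitz_hash(const uint8_t *key_bytes,
                              const uint8_t *input, size_t len)
{
    toeplitz_key key;
    uint32_t result = 0;

    toeplitz_key_init(&key, key_bytes);
    toeplitz_add(&result, input, len, &key);
    return result;
}
```

Keeping the window and key cursor in a small state structure is what lets a
caller hash a tuple in several pieces (addresses first, ports later) and get
the same result as hashing the concatenation.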

[Qemu-devel] [PULL V3 05/20] pcie: Add support for PCIe CAP v1

2016-05-25 Thread Jason Wang

From: Dmitry Fleytman 

Added support for PCIe CAP v1, while reusing some of the existing v2
infrastructure.

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
Signed-off-by: Jason Wang 
---
 hw/pci/pcie.c  | 84 --
 include/hw/pci/pcie.h  |  4 +++
 include/hw/pci/pcie_regs.h |  5 +--
 3 files changed, 73 insertions(+), 20 deletions(-)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 728386a..24cfc3b 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -43,26 +43,15 @@
 /***
  * pci express capability helper functions
  */
-int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port)
-{
-int pos;
-uint8_t *exp_cap;
-
-assert(pci_is_express(dev));
-
-pos = pci_add_capability(dev, PCI_CAP_ID_EXP, offset,
- PCI_EXP_VER2_SIZEOF);
-if (pos < 0) {
-return pos;
-}
-dev->exp.exp_cap = pos;
-exp_cap = dev->config + pos;
 
+static void
+pcie_cap_v1_fill(uint8_t *exp_cap, uint8_t port, uint8_t type, uint8_t version)
+{
 /* capability register
-   interrupt message number defaults to 0 */
+interrupt message number defaults to 0 */
 pci_set_word(exp_cap + PCI_EXP_FLAGS,
  ((type << PCI_EXP_FLAGS_TYPE_SHIFT) & PCI_EXP_FLAGS_TYPE) |
- PCI_EXP_FLAGS_VER2);
+ version);
 
 /* device capability register
  * table 7-12:
@@ -81,7 +70,27 @@ int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port)
 
 pci_set_word(exp_cap + PCI_EXP_LNKSTA,
  PCI_EXP_LNK_MLW_1 | PCI_EXP_LNK_LS_25 |PCI_EXP_LNKSTA_DLLLA);
+}
+
+int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port)
+{
+/* PCIe cap v2 init */
+int pos;
+uint8_t *exp_cap;
+
+assert(pci_is_express(dev));
+
+pos = pci_add_capability(dev, PCI_CAP_ID_EXP, offset, PCI_EXP_VER2_SIZEOF);
+if (pos < 0) {
+return pos;
+}
+dev->exp.exp_cap = pos;
+exp_cap = dev->config + pos;
+
+/* Filling values common with v1 */
+pcie_cap_v1_fill(exp_cap, port, type, PCI_EXP_FLAGS_VER2);
 
+/* Filling v2 specific values */
 pci_set_long(exp_cap + PCI_EXP_DEVCAP2,
  PCI_EXP_DEVCAP2_EFF | PCI_EXP_DEVCAP2_EETLPP);
 
@@ -89,7 +98,29 @@ int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port)
 return pos;
 }
 
-int pcie_endpoint_cap_init(PCIDevice *dev, uint8_t offset)
+int pcie_cap_v1_init(PCIDevice *dev, uint8_t offset, uint8_t type,
+ uint8_t port)
+{
+/* PCIe cap v1 init */
+int pos;
+uint8_t *exp_cap;
+
+assert(pci_is_express(dev));
+
+pos = pci_add_capability(dev, PCI_CAP_ID_EXP, offset, PCI_EXP_VER1_SIZEOF);
+if (pos < 0) {
+return pos;
+}
+dev->exp.exp_cap = pos;
+exp_cap = dev->config + pos;
+
+pcie_cap_v1_fill(exp_cap, port, type, PCI_EXP_FLAGS_VER1);
+
+return pos;
+}
+
+static int
+pcie_endpoint_cap_common_init(PCIDevice *dev, uint8_t offset, uint8_t cap_size)
 {
 uint8_t type = PCI_EXP_TYPE_ENDPOINT;
 
@@ -102,7 +133,19 @@ int pcie_endpoint_cap_init(PCIDevice *dev, uint8_t offset)
 type = PCI_EXP_TYPE_RC_END;
 }
 
-return pcie_cap_init(dev, offset, type, 0);
+return (cap_size == PCI_EXP_VER1_SIZEOF)
+? pcie_cap_v1_init(dev, offset, type, 0)
+: pcie_cap_init(dev, offset, type, 0);
+}
+
+int pcie_endpoint_cap_init(PCIDevice *dev, uint8_t offset)
+{
+return pcie_endpoint_cap_common_init(dev, offset, PCI_EXP_VER2_SIZEOF);
+}
+
+int pcie_endpoint_cap_v1_init(PCIDevice *dev, uint8_t offset)
+{
+return pcie_endpoint_cap_common_init(dev, offset, PCI_EXP_VER1_SIZEOF);
 }
 
 void pcie_cap_exit(PCIDevice *dev)
@@ -110,6 +153,11 @@ void pcie_cap_exit(PCIDevice *dev)
 pci_del_capability(dev, PCI_CAP_ID_EXP, PCI_EXP_VER2_SIZEOF);
 }
 
+void pcie_cap_v1_exit(PCIDevice *dev)
+{
+pci_del_capability(dev, PCI_CAP_ID_EXP, PCI_EXP_VER1_SIZEOF);
+}
+
 uint8_t pcie_cap_get_type(const PCIDevice *dev)
 {
 uint32_t pos = dev->exp.exp_cap;
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index b48a7a2..cbbf0c5 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -80,8 +80,12 @@ struct PCIExpressDevice {
 
 /* PCI express capability helper functions */
 int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type, uint8_t port);
+int pcie_cap_v1_init(PCIDevice *dev, uint8_t offset,
+ uint8_t type, uint8_t port);
 int pcie_endpoint_cap_init(PCIDevice *dev, uint8_t offset);
 void pcie_cap_exit(PCIDevice *dev);
+int pcie_endpoint_cap_v1_init(PCIDevice *dev, uint8_t offset);
+void pcie_cap_v1_exit(PCIDevice *dev);
 uint8_t pcie_cap_get_type(const PCIDevice *dev);
 void pcie_cap_flags_set_vector(PCIDevice *dev, uint8_t vector);
 uint8_t pcie_cap_flags_get_vector(PCIDev
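
A hedged sketch of the two ideas in this patch: the capability flags word is
built from the device type and a version number, and the endpoint helper picks
v1 or v2 based on the capability size. All SK_* constants and function names
are local stand-ins; the numeric values are assumptions taken from the usual
PCI register layout, since the pcie_regs.h hunk is not visible in this
message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, named locally so they cannot be mistaken for QEMU's. */
#define SK_EXP_FLAGS_TYPE        0x00f0  /* device/port type field */
#define SK_EXP_FLAGS_TYPE_SHIFT  4
#define SK_EXP_FLAGS_VER1        1
#define SK_EXP_FLAGS_VER2        2
#define SK_EXP_VER1_SIZEOF       0x14    /* assumed v1 capability size */
#define SK_EXP_VER2_SIZEOF       0x3c    /* assumed v2 capability size */

/* Compose the flags word the way pcie_cap_v1_fill() does in the patch. */
static uint16_t sk_exp_flags(uint8_t type, uint8_t version)
{
    assert(version == SK_EXP_FLAGS_VER1 || version == SK_EXP_FLAGS_VER2);
    return (uint16_t)(((type << SK_EXP_FLAGS_TYPE_SHIFT) & SK_EXP_FLAGS_TYPE)
                      | version);
}

/* Mirror of the size-based dispatch in pcie_endpoint_cap_common_init(). */
static uint8_t sk_version_for_size(uint8_t cap_size)
{
    return (cap_size == SK_EXP_VER1_SIZEOF) ? SK_EXP_FLAGS_VER1
                                            : SK_EXP_FLAGS_VER2;
}
```

Passing the version into a shared fill routine is what allows the v1 and v2
init functions to differ only in the capability size and the v2-only
registers.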

[Qemu-devel] [PULL V3 07/20] vmxnet3: Use generic function for DSN capability definition

2016-05-25 Thread Jason Wang

From: Dmitry Fleytman 

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
Signed-off-by: Jason Wang 
---
 hw/net/vmxnet3.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 20f26b7..586e915 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -2255,9 +2255,9 @@ static const MemoryRegionOps b1_ops = {
 },
 };
 
-static uint8_t *vmxnet3_device_serial_num(VMXNET3State *s)
+static uint64_t vmxnet3_device_serial_num(VMXNET3State *s)
 {
-static uint64_t dsn_payload;
+uint64_t dsn_payload;
 uint8_t *dsnp = (uint8_t *)&dsn_payload;
 
 dsnp[0] = 0xfe;
@@ -2268,7 +2268,7 @@ static uint8_t *vmxnet3_device_serial_num(VMXNET3State *s)
 dsnp[5] = s->conf.macaddr.a[1];
 dsnp[6] = s->conf.macaddr.a[2];
 dsnp[7] = 0xff;
-return dsnp;
+return dsn_payload;
 }
 
 static void vmxnet3_pci_realize(PCIDevice *pci_dev, Error **errp)
@@ -2313,10 +2313,8 @@ static void vmxnet3_pci_realize(PCIDevice *pci_dev, Error **errp)
 pcie_endpoint_cap_init(pci_dev, VMXNET3_EXP_EP_OFFSET);
 }
 
-pcie_add_capability(pci_dev, PCI_EXT_CAP_ID_DSN, 0x1,
-VMXNET3_DSN_OFFSET, PCI_EXT_CAP_DSN_SIZEOF);
-memcpy(pci_dev->config + VMXNET3_DSN_OFFSET + 4,
-   vmxnet3_device_serial_num(s), sizeof(uint64_t));
+pcie_dev_ser_num_init(pci_dev, VMXNET3_DSN_OFFSET,
+  vmxnet3_device_serial_num(s));
 }
 
 register_savevm(dev, "vmxnet3-msix", -1, 1,
-- 
2.7.4

[Qemu-devel] [PULL V3 06/20] pcie: Introduce function for DSN capability creation

2016-05-25 Thread Jason Wang

From: Dmitry Fleytman 

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
Signed-off-by: Jason Wang 
---
 hw/pci/pcie.c | 10 ++
 include/hw/pci/pcie.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 24cfc3b..9599fde 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -695,3 +695,13 @@ void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn)
 offset, PCI_ARI_SIZEOF);
 pci_set_long(dev->config + offset + PCI_ARI_CAP, (nextfn & 0xff) << 8);
 }
+
+void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t ser_num)
+{
+static const int pci_dsn_ver = 1;
+static const int pci_dsn_cap = 4;
+
+pcie_add_capability(dev, PCI_EXT_CAP_ID_DSN, pci_dsn_ver, offset,
+PCI_EXT_CAP_DSN_SIZEOF);
+pci_set_quad(dev->config + offset + pci_dsn_cap, ser_num);
+}
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index cbbf0c5..056d25e 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -119,6 +119,7 @@ void pcie_add_capability(PCIDevice *dev,
  uint16_t offset, uint16_t size);
 
 void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn);
+void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t ser_num);
 
 extern const VMStateDescription vmstate_pcie_device;
 
-- 
2.7.4

[Qemu-devel] [PULL V3 03/20] msix: make msix_clr_pending() visible for clients

2016-05-25 Thread Jason Wang

From: Dmitry Fleytman 

This function will be used by e1000e device code.

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
Signed-off-by: Jason Wang 
---
 hw/pci/msix.c | 2 +-
 include/hw/pci/msix.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index b75f0e9..0ec1cb1 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -72,7 +72,7 @@ void msix_set_pending(PCIDevice *dev, unsigned int vector)
 *msix_pending_byte(dev, vector) |= msix_pending_mask(vector);
 }
 
-static void msix_clr_pending(PCIDevice *dev, int vector)
+void msix_clr_pending(PCIDevice *dev, int vector)
 {
 *msix_pending_byte(dev, vector) &= ~msix_pending_mask(vector);
 }
diff --git a/include/hw/pci/msix.h b/include/hw/pci/msix.h
index 72e5f93..048a29d 100644
--- a/include/hw/pci/msix.h
+++ b/include/hw/pci/msix.h
@@ -29,6 +29,7 @@ int msix_present(PCIDevice *dev);
 
 bool msix_is_masked(PCIDevice *dev, unsigned vector);
 void msix_set_pending(PCIDevice *dev, unsigned vector);
+void msix_clr_pending(PCIDevice *dev, int vector);
 
 int msix_vector_use(PCIDevice *dev, unsigned vector);
 void msix_vector_unuse(PCIDevice *dev, unsigned vector);
-- 
2.7.4

[Qemu-devel] [PULL V3 12/20] rtl8139: Move more TCP definitions to common header

2016-05-25 Thread Jason Wang

From: Dmitry Fleytman 

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
Signed-off-by: Jason Wang 
---
 hw/net/rtl8139.c  | 5 -
 include/net/eth.h | 8 
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index 1e5ec14..562c1fd 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -1867,11 +1867,6 @@ static int rtl8139_transmit_one(RTL8139State *s, int descriptor)
 return 1;
 }
 
-/* structures and macros for task offloading */
-#define TCP_HEADER_DATA_OFFSET(tcp) (((be16_to_cpu(tcp->th_offset_flags) >> 12)&0xf) << 2)
-#define TCP_FLAGS_ONLY(flags) ((flags)&0x3f)
-#define TCP_HEADER_FLAGS(tcp) TCP_FLAGS_ONLY(be16_to_cpu(tcp->th_offset_flags))
-
 #define TCP_HEADER_CLEAR_FLAGS(tcp, off) ((tcp)->th_offset_flags &= cpu_to_be16(~TCP_FLAGS_ONLY(off)))
 
 /* produces ones' complement sum of data */
diff --git a/include/net/eth.h b/include/net/eth.h
index 18d0be3..5a32259 100644
--- a/include/net/eth.h
+++ b/include/net/eth.h
@@ -67,6 +67,14 @@ typedef struct tcp_header {
 uint16_t th_urp;/* urgent pointer */
 } tcp_header;
 
+#define TCP_FLAGS_ONLY(flags) ((flags) & 0x3f)
+
+#define TCP_HEADER_FLAGS(tcp) \
+TCP_FLAGS_ONLY(be16_to_cpu((tcp)->th_offset_flags))
+
+#define TCP_HEADER_DATA_OFFSET(tcp) \
+(((be16_to_cpu((tcp)->th_offset_flags) >> 12) & 0xf) << 2)
+
 typedef struct udp_header {
 uint16_t uh_sport; /* source port */
 uint16_t uh_dport; /* destination port */
-- 
2.7.4

[Qemu-devel] [PULL V3 04/20] pci: Introduce define for PM capability version 1.1

2016-05-25 Thread Jason Wang

From: Dmitry Fleytman 

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Dmitry Fleytman 
Signed-off-by: Leonid Bloch 
Signed-off-by: Jason Wang 
---
 include/hw/pci/pci_regs.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/hw/pci/pci_regs.h b/include/hw/pci/pci_regs.h
index ba8cbe9..7a83142 100644
--- a/include/hw/pci/pci_regs.h
+++ b/include/hw/pci/pci_regs.h
@@ -1 +1,3 @@
 #include "standard-headers/linux/pci_regs.h"
+
+#define  PCI_PM_CAP_VER_1_1 0x0002  /* PCI PM spec ver. 1.1 */
-- 
2.7.4

[Qemu-devel] [PULL V3 01/20] net/tap: Allocating Large sized arrays to heap

2016-05-25 Thread Jason Wang

From: Zhou Jie 

net_init_tap has a huge stack usage of approximately 8192 bytes.
Move the large arrays to the heap to reduce stack usage.

Signed-off-by: Zhou Jie 
Signed-off-by: Jason Wang 
---
 net/tap.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 740e8a2..49817c7 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -769,8 +769,8 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
 return -1;
 }
 } else if (tap->has_fds) {
-char *fds[MAX_TAP_QUEUES];
-char *vhost_fds[MAX_TAP_QUEUES];
+char **fds = g_new(char *, MAX_TAP_QUEUES);
+char **vhost_fds = g_new(char *, MAX_TAP_QUEUES);
 int nfds, nvhosts;
 
 if (tap->has_ifname || tap->has_script || tap->has_downscript ||
@@ -818,6 +818,8 @@ int net_init_tap(const NetClientOptions *opts, const char *name,
 return -1;
 }
 }
+g_free(fds);
+g_free(vhost_fds);
 } else if (tap->has_helper) {
 if (tap->has_ifname || tap->has_script || tap->has_downscript ||
 tap->has_vnet_hdr || tap->has_queues || tap->has_vhostfds) {
-- 
2.7.4

[Qemu-devel] [PULL V3 00/20] Net patches

2016-05-25 Thread Jason Wang

The following changes since commit 287db79df8af8e31f18e262feb5e05103a09e4d4:

  Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging (2016-05-24 13:06:33 +0100)

are available in the git repository at:

  https://github.com/jasowang/qemu.git tags/net-pull-request

for you to fetch changes up to 136796b070ddd09dd14ef73e77ae20419ba6554a:

  net/net: Add SocketReadState for reuse codes (2016-05-26 09:58:22 +0800)



Main changes:
- e1000e emulation
- convert vmxnet3 to use DMA API
Changes from V2:
- fix clang build
Changes from V1:
- fix 32bit build


Dmitry Fleytman (14):
  msix: make msix_clr_pending() visible for clients
  pci: Introduce define for PM capability version 1.1
  pcie: Add support for PCIe CAP v1
  pcie: Introduce function for DSN capability creation
  vmxnet3: Use generic function for DSN capability definition
  net: Introduce Toeplitz hash calculator
  net: Add macros for MAC address tracing
  vmxnet3: Use common MAC address tracing macros
  net_pkt: Name vmxnet3 packet abstractions more generic
  rtl8139: Move more TCP definitions to common header
  vmxnet3: Use pci_dma_* API instead of cpu_physical_memory_*
  e1000_regs: Add definitions for Intel 82574-specific bits
  e1000: Move out code that will be reused in e1000e
  e1000e: Introduce qtest for e1000e device

Eduardo Habkost (1):
  net: vl: Move default_net to vl.c

Jason Wang (2):
  net_pkt: Extend packet abstraction as required by e1000e functionality
  net: Introduce e1000e device emulation

Prasad J Pandit (1):
  net: mipsnet: check packet length against buffer

Zhang Chen (1):
  net/net: Add SocketReadState for reuse codes

Zhou Jie (1):
  net/tap: Allocating Large sized arrays to heap

 MAINTAINERS  |   18 +
 default-configs/pci.mak  |1 +
 hw/net/Makefile.objs |5 +-
 hw/net/e1000.c   |  411 +---
 hw/net/e1000_regs.h  |  349 ++-
 hw/net/e1000e.c  |  739 +++
 hw/net/e1000e_core.c | 3478 ++
 hw/net/e1000e_core.h |  146 ++
 hw/net/e1000x_common.c   |  267 +++
 hw/net/e1000x_common.h   |  213 ++
 hw/net/mipsnet.c |3 +
 hw/net/net_rx_pkt.c  |  600 ++
 hw/net/net_rx_pkt.h  |  363 
 hw/net/{vmxnet_tx_pkt.c => net_tx_pkt.c} |  358 +--
 hw/net/net_tx_pkt.h  |  191 ++
 hw/net/rtl8139.c |5 -
 hw/net/vmxnet3.c |  155 +-
 hw/net/vmxnet_debug.h|3 -
 hw/net/vmxnet_rx_pkt.c   |  187 --
 hw/net/vmxnet_rx_pkt.h   |  174 --
 hw/net/vmxnet_tx_pkt.h   |  146 --
 hw/pci/msix.c|2 +-
 hw/pci/pcie.c|   94 +-
 include/hw/pci/msix.h|1 +
 include/hw/pci/pci_regs.h|2 +
 include/hw/pci/pcie.h|5 +
 include/hw/pci/pcie_regs.h   |5 +-
 include/net/checksum.h   |   49 +-
 include/net/eth.h|  161 +-
 include/net/net.h|   19 +-
 net/checksum.c   |7 +-
 net/eth.c|  410 +++-
 net/filter-mirror.c  |   66 +-
 net/net.c|   93 +-
 net/socket.c |   77 +-
 net/tap.c|6 +-
 tests/Makefile   |7 +-
 tests/e1000e-test.c  |  480 +
 trace-events |  211 ++
 vl.c |   24 +-
 40 files changed, 8221 insertions(+), 1310 deletions(-)
 create mode 100644 hw/net/e1000e.c
 create mode 100644 hw/net/e1000e_core.c
 create mode 100644 hw/net/e1000e_core.h
 create mode 100644 hw/net/e1000x_common.c
 create mode 100644 hw/net/e1000x_common.h
 create mode 100644 hw/net/net_rx_pkt.c
 create mode 100644 hw/net/net_rx_pkt.h
 rename hw/net/{vmxnet_tx_pkt.c => net_tx_pkt.c} (53%)
 create mode 100644 hw/net/net_tx_pkt.h
 delete mode 100644 hw/net/vmxnet_rx_pkt.c
 delete mode 100644 hw/net/vmxnet_rx_pkt.h
 delete mode 100644 hw/net/vmxnet_tx_pkt.h
 create mode 100644 tests/e1000e-test.c

[Qemu-devel] [PULL V3 02/20] net: mipsnet: check packet length against buffer

2016-05-25 Thread Jason Wang

From: Prasad J Pandit 

When receiving packets, the MIPSnet network device uses a
receive buffer of 1514 bytes. If the controller accepts
larger (MTU-sized) packets, this could lead to memory corruption.
Add a check to avoid it.

Reported-by: Oleksandr Bazhaniuk 
Signed-off-by: Prasad J Pandit 
Signed-off-by: Jason Wang 
---
 hw/net/mipsnet.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/net/mipsnet.c b/hw/net/mipsnet.c
index 740cd98..cf8b823 100644
--- a/hw/net/mipsnet.c
+++ b/hw/net/mipsnet.c
@@ -83,6 +83,9 @@ static ssize_t mipsnet_receive(NetClientState *nc, const uint8_t *buf, size_t si
 if (!mipsnet_can_receive(nc))
 return 0;
 
+if (size >= sizeof(s->rx_buffer)) {
+return 0;
+}
 s->busy = 1;
 
 /* Just accept everything. */
-- 
2.7.4
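
A hedged sketch of the guard this patch adds, reduced to a standalone receive
routine. SketchNicState and sketch_receive() are hypothetical; only the 1514
byte buffer size and the ">= sizeof(rx_buffer)" comparison come from the
patch. Note that with ">=" a frame of exactly 1514 bytes is dropped as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_RX_BUFFER_SIZE 1514   /* size quoted in the commit message */

typedef struct {
    uint8_t rx_buffer[SKETCH_RX_BUFFER_SIZE];
    size_t rx_count;                 /* bytes currently held in rx_buffer */
} SketchNicState;

/* Returns the number of bytes accepted, or 0 if the frame is dropped. */
static size_t sketch_receive(SketchNicState *s, const uint8_t *buf,
                             size_t size)
{
    assert(s != NULL && buf != NULL);

    /* Reject anything that would not fit before touching the buffer. */
    if (size >= sizeof(s->rx_buffer)) {
        return 0;
    }

    memcpy(s->rx_buffer, buf, size);
    s->rx_count = size;
    return size;
}
```

Checking the length before the copy keeps the device state untouched for
oversized frames, so a rejected packet cannot leave a partially written
buffer behind.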

Re: [Qemu-devel] [PATCH qemu v16 05/19] vfio: Check that IOMMU MR translates to system address space

2016-05-25 Thread David Gibson

On Wed, May 04, 2016 at 04:52:17PM +1000, Alexey Kardashevskiy wrote:
> At the moment IOMMU MR only translates to the system memory.
> However, if some new code changes this, we will need a clear indication of
> why it is not working, so here is the check.
> 
> Signed-off-by: Alexey Kardashevskiy 
> Reviewed-by: David Gibson 

Alex,

I think this is a reasonable sanity check regardless of what happens
with the rest of the series.  Can you apply this?

> ---
> Changes:
> v15:
> * added some spaces
> 
> v14:
> * new to the series
> ---
>  hw/vfio/common.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index f32cc49..6d23d0f 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -266,6 +266,12 @@ static void vfio_iommu_map_notify(Notifier *n, void *data)
>  
>  trace_vfio_iommu_map_notify(iova, iova + iotlb->addr_mask);
>  
> +if (iotlb->target_as != &address_space_memory) {
> +error_report("Wrong target AS \"%s\", only system memory is allowed",
> + iotlb->target_as->name ? iotlb->target_as->name : "none");
> +return;
> +}
> +
>  /*
>   * The IOMMU TLB entry we have just covers translation through
>   * this IOMMU to its immediate target.  We need to translate

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



Re: [Qemu-devel] [PATCH qemu v16 03/19] memory: Fix IOMMU replay base address

2016-05-25 Thread David Gibson

On Wed, May 04, 2016 at 04:52:15PM +1000, Alexey Kardashevskiy wrote:
> Since a788f227 "memory: Allow replay of IOMMU mapping notifications",
> when a new VFIO listener is added, all existing IOMMU mappings are
> replayed. However, the base address of an IOMMU memory region (IOMMU MR)
> is ignored. This is not a problem for the existing user (pseries) with
> its default 32-bit DMA window starting at 0, but it is a problem if there
> is another DMA window.
> 
> This stores the IOMMU's offset_within_address_space and adjusts
> the IOVA before calling vfio_dma_map/vfio_dma_unmap.
> 
> As the IOMMU notifier expects IOVA offset rather than the absolute
> address, this also adjusts IOVA in sPAPR H_PUT_TCE handler before
> calling notifier(s).
> 
> Signed-off-by: Alexey Kardashevskiy 
> Reviewed-by: David Gibson 

Alex, this is a real fix independent of the other stuff.  Can we apply
it ASAP?

> ---
> Changes:
> v15:
> * accounted section->offset_within_region
> * s/giommu->offset_within_address_space/giommu->iommu_offset/
> ---
>  hw/ppc/spapr_iommu.c  |  2 +-
>  hw/vfio/common.c  | 14 --
>  include/hw/vfio/vfio-common.h |  1 +
>  3 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 7dd4588..277f289 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -277,7 +277,7 @@ static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
>  tcet->table[index] = tce;
>  
>  entry.target_as = &address_space_memory,
> -entry.iova = ioba & page_mask;
> +entry.iova = (ioba - tcet->bus_offset) & page_mask;
>  entry.translated_addr = tce & page_mask;
>  entry.addr_mask = ~page_mask;
>  entry.perm = spapr_tce_iommu_access_flags(tce);
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 0b40262..f32cc49 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -257,14 +257,14 @@ static void vfio_iommu_map_notify(Notifier *n, void *data)
>  VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
>  VFIOContainer *container = giommu->container;
>  IOMMUTLBEntry *iotlb = data;
> +hwaddr iova = iotlb->iova + giommu->iommu_offset;
>  MemoryRegion *mr;
>  hwaddr xlat;
>  hwaddr len = iotlb->addr_mask + 1;
>  void *vaddr;
>  int ret;
>  
> -trace_vfio_iommu_map_notify(iotlb->iova,
> -iotlb->iova + iotlb->addr_mask);
> +trace_vfio_iommu_map_notify(iova, iova + iotlb->addr_mask);
>  
>  /*
>   * The IOMMU TLB entry we have just covers translation through
> @@ -291,21 +291,21 @@ static void vfio_iommu_map_notify(Notifier *n, void *data)
>  
>  if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
>  vaddr = memory_region_get_ram_ptr(mr) + xlat;
> -ret = vfio_dma_map(container, iotlb->iova,
> +ret = vfio_dma_map(container, iova,
> iotlb->addr_mask + 1, vaddr,
> !(iotlb->perm & IOMMU_WO) || mr->readonly);
>  if (ret) {
>  error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
>   "0x%"HWADDR_PRIx", %p) = %d (%m)",
> - container, iotlb->iova,
> + container, iova,
>   iotlb->addr_mask + 1, vaddr, ret);
>  }
>  } else {
> -ret = vfio_dma_unmap(container, iotlb->iova, iotlb->addr_mask + 1);
> +ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1);
>  if (ret) {
>  error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
>   "0x%"HWADDR_PRIx") = %d (%m)",
> - container, iotlb->iova,
> + container, iova,
>   iotlb->addr_mask + 1, ret);
>  }
>  }
> @@ -377,6 +377,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
>   */
>  giommu = g_malloc0(sizeof(*giommu));
>  giommu->iommu = section->mr;
> +giommu->iommu_offset = section->offset_within_address_space -
> +section->offset_within_region;
>  giommu->container = container;
>  giommu->n.notify = vfio_iommu_map_notify;
>  QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index eb0e1b0..c9b6622 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -90,6 +90,7 @@ typedef struct VFIOContainer {
>  typedef struct VFIOGuestIOMMU {
>  VFIOContainer *container;
>  MemoryRegion *iommu;
> +hwaddr iommu_offset;
>  Notifier n;
>  QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
>  } VFIOGuestIOMMU;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
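
Editorial sketch (not part of the patch): the address arithmetic in the diff
above can be checked with a few lines of standalone C. The struct and the
function names are hypothetical and trimmed down for illustration; the three
expressions mirror the ones in the diff, assuming a DMA window whose bus
address equals its offset in the address space.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t hwaddr;

/* Hypothetical, trimmed-down view of a memory region section. */
typedef struct {
    hwaddr offset_within_address_space; /* section start in the address space */
    hwaddr offset_within_region;        /* section start inside the IOMMU MR */
} DemoSection;

/* Offset that turns a region-relative IOVA into an absolute address. */
static hwaddr demo_iommu_offset(const DemoSection *section)
{
    return section->offset_within_address_space -
           section->offset_within_region;
}

/* Notifier side: the IOTLB entry carries an IOVA relative to the IOMMU MR. */
static hwaddr demo_notify_iova(hwaddr iotlb_iova, hwaddr iommu_offset)
{
    return iotlb_iova + iommu_offset;
}

/* Emulated H_PUT_TCE side: strip the window base before notifying. */
static hwaddr demo_put_tce_iova(hwaddr ioba, hwaddr bus_offset,
                                hwaddr page_mask)
{
    assert(ioba >= bus_offset);
    return (ioba - bus_offset) & page_mask;
}
```

With a window at 0 both adjustments are no-ops, which is why the default
32-bit window never exposed the problem.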

Re: [Qemu-devel] [PATCH qemu v16 02/19] memory: Call region_del() callbacks on memory listener unregistering

2016-05-25 Thread David Gibson

On Thu, May 05, 2016 at 04:45:04PM -0600, Alex Williamson wrote:
> On Wed,  4 May 2016 16:52:14 +1000
> Alexey Kardashevskiy  wrote:
> 
> > When a new memory listener is registered, listener_add_address_space()
> > is called, which in turn calls the region_add() callbacks for memory regions.
> > However, when the memory listener is unregistered, it is just removed from
> > the listening chain and no region_del() is called.
> > 
> > This adds listener_del_address_space() and uses it in
> > memory_listener_unregister(). listener_add_address_space() was used as
> > a template with the following changes:
> > s/log_global_start/log_global_stop/
> > s/log_start/log_stop/
> > s/region_add/region_del/
> > 
> > This will allow the following patches to add/remove DMA windows
> > dynamically from VFIO's PCI address space's region_add()/region_del().
> 
> Following patch 1 comments, it would be a bug if the kernel actually
> needed this to do cleanup, we must release everything if QEMU gets shot
> with a SIGKILL anyway.  So what does this cleanup facilitate in QEMU?
> Having QEMU trigger an unmap for each region_del is not going to be as
> efficient as just dropping the container and letting the kernel handle
> the cleanup all in one go.  Thanks,

So, what the kernel does is kind of a red herring, because that's only
relevant to the specific case of the VFIO listener, whereas this is a
change to the behaviour of all memory listeners.

It seems plausible that some memory listeners could have a legitimate
reason to want cleanup region_del calls when unregistered.  But we
know this could be expensive for other listeners, so I don't think we
should make that behaviour standard.

So I'd be thinking either a special unregister_with_delete() call, or
a standalone "delete all" helper function.

Assuming this is still needed at all, once the other changes to the
reference counting we've discussed have been done.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
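
Editorial sketch (not from the thread): one way to read the two options
above is an opt-in "delete all" helper next to a cheap default unregister.
Every type and function name here is hypothetical; this is a toy model with
a single listener, not QEMU's memory API.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_REGIONS 8

typedef struct DemoListener DemoListener;
struct DemoListener {
    void (*region_add)(DemoListener *l, int region_id);
    void (*region_del)(DemoListener *l, int region_id);
    int added;      /* bookkeeping for the example only */
    int deleted;
};

typedef struct {
    int regions[DEMO_MAX_REGIONS];
    size_t nr_regions;
    DemoListener *listener;     /* a single listener keeps the sketch short */
} DemoAddressSpace;

static void demo_count_add(DemoListener *l, int region_id)
{
    (void)region_id;
    l->added++;
}

static void demo_count_del(DemoListener *l, int region_id)
{
    (void)region_id;
    l->deleted++;
}

/* Registration replays region_add() for every existing region. */
static void demo_listener_register(DemoAddressSpace *as, DemoListener *l)
{
    assert(as->listener == NULL);
    as->listener = l;
    for (size_t i = 0; i < as->nr_regions; i++) {
        if (l->region_add) {
            l->region_add(l, as->regions[i]);
        }
    }
}

/* Cheap default: just drop the listener, no callbacks. */
static void demo_listener_unregister(DemoAddressSpace *as)
{
    as->listener = NULL;
}

/* Standalone helper: replay region_del() in reverse order. */
static void demo_listener_delete_all(DemoAddressSpace *as)
{
    DemoListener *l = as->listener;

    if (!l || !l->region_del) {
        return;
    }
    for (size_t i = as->nr_regions; i > 0; i--) {
        l->region_del(l, as->regions[i - 1]);
    }
}

/* Opt-in variant for listeners that need the cleanup calls. */
static void demo_listener_unregister_with_delete(DemoAddressSpace *as)
{
    demo_listener_delete_all(as);
    demo_listener_unregister(as);
}
```

Keeping the plain unregister callback-free means listeners that do not need
per-region cleanup pay nothing extra.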


Re: [Qemu-devel] [PATCH qemu v16 01/19] vfio: Delay DMA address space listener release

2016-05-25 Thread David Gibson

On Wed, May 25, 2016 at 07:59:26AM -0600, Alex Williamson wrote:
> On Wed, 25 May 2016 16:34:37 +1000
> David Gibson  wrote:
> 
> > On Fri, May 13, 2016 at 04:24:53PM -0600, Alex Williamson wrote:
> > > On Fri, 13 May 2016 17:16:48 +1000
> > > Alexey Kardashevskiy  wrote:
> > >   
> > > > On 05/06/2016 08:39 AM, Alex Williamson wrote:  
> > > > > On Wed,  4 May 2016 16:52:13 +1000
> > > > > Alexey Kardashevskiy  wrote:
> > > > >
> > > > >> This postpones VFIO container deinitialization to let region_del()
> > > > >> callbacks (called via vfio_listener_release) do proper clean up
> > > > >> while the group is still attached to the container.
> > > > >
> > > > > Any mappings within the container should clean themselves up when the
> > > > > > container is deprivileged by removing the last group in the kernel. Is
> > > > > the issue that that doesn't happen, which would be a spapr vfio kernel
> > > > > bug, or that our QEMU side structures get all out of whack if we let
> > > > > that happen?
> > > > 
> > > > My mailbase got corrupted, missed that.
> > > > 
> > > > This is mostly for "[PATCH qemu v16 17/19] spapr_iommu, vfio, memory:
> > > > Notify IOMMU about starting/stopping being used by VFIO", I should have
> > > > put 01/19 and 02/19 right before 17/19, sorry about that.
> > > 
> > > Which I object to, it's just ridiculous to have vfio start/stop
> > > callbacks in a set of generic iommu region ops.  
> > 
> > It's ugly, but I don't actually see a better way to do this (the
> > general concept of having vfio start/stop callbacks, that is, not the
> > specifics of the patches).
> > 
> > The fact is that how we implement the guest side IOMMU *does* need to
> > change depending on whether VFIO devices are present or not. 
> 
> No, how the guest side iommu is implemented needs to change depending
> on whether there's someone, anyone, in QEMU that cares about the iommu,
> which can be determined by whether the iommu notifier has any clients.
> Alexey has posted another patch that does this.

*thinks*  ah, yes, you're right of course.  So instead we need some
hook that's triggered on transition of number of notifier listeners
from zero<->non-zero.

> > That's
> > due essentially to incompatibilities between a couple of kernel
> > mechanisms.  Which in itself is ugly, but nonetheless real.
> > 
> > A (usually blank) vfio on/off callback in the guest side IOMMU ops
> > seems like the least-bad way to handle this.
> 
> I disagree, we already call memory_region_register_iommu_notifier() to
> indicate we care about the guest iommu, so the abstraction is already
> there, there's absolutely no reason to make a vfio specific interface.
> Thanks,
> 
> Alex
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
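
Editorial sketch (not from the thread): a hook that fires only on the
zero<->non-zero transition of the notifier count could look roughly like
this. The struct, the callback name and the counters are hypothetical and
are not QEMU's IOMMU notifier API.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct DemoIOMMU DemoIOMMU;
struct DemoIOMMU {
    unsigned notifier_count;
    /* Called only when the count moves between zero and non-zero. */
    void (*notify_active_changed)(DemoIOMMU *iommu, bool active);
    bool active;        /* bookkeeping for the example only */
    int transitions;
};

/* Example hook: record the state and count how often it changed. */
static void demo_record_transition(DemoIOMMU *iommu, bool active)
{
    iommu->active = active;
    iommu->transitions++;
}

static void demo_register_notifier(DemoIOMMU *iommu)
{
    /* 0 -> 1: the first client appeared. */
    if (iommu->notifier_count++ == 0 && iommu->notify_active_changed) {
        iommu->notify_active_changed(iommu, true);
    }
}

static void demo_unregister_notifier(DemoIOMMU *iommu)
{
    assert(iommu->notifier_count > 0);
    /* 1 -> 0: the last client went away. */
    if (--iommu->notifier_count == 0 && iommu->notify_active_changed) {
        iommu->notify_active_changed(iommu, false);
    }
}
```

The guest-side IOMMU never learns who its clients are, only whether it has
any, which keeps the interface free of VFIO-specific callbacks.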



Re: [Qemu-devel] [PATCH v4 00/15] Dirty bitmap changes for migration/persistence work

2016-05-25 Thread Fam Zheng

On Wed, 05/25 17:45, Vladimir Sementsov-Ogievskiy wrote:
> Hi!
> 
> Are you going to update the series in the near future?

Yes, probably in a couple days.

Fam

> 
> On 08.03.2016 07:44, Fam Zheng wrote:
> > v4: Rebase.
> >  Add rev-by from John in patches 1-5, 7, 8.
> >  Remove BdrvDirtyBitmap typedef from dirty-bitmap.h in patch 4. [Max]
> >  Add assertion on bm->meta in patch 9. [John]
> > 
> > Two major features are added to block dirty bitmap (and underlying HBitmap)
> > in this series: meta bitmap and serialization, together with all other
> > supportive patches.
> > 
> > Both operations are common in dirty bitmap migration and persistence: they
> > need to find whether and which part of the dirty bitmap in question has
> > changed with meta dirty bitmap, and they need to write it to the target with
> > serialization.
> > 
> > 
> > Fam Zheng (13):
> >backup: Use Bitmap to replace "s->bitmap"
> >block: Include hbitmap.h in block.h
> >typedefs: Add BdrvDirtyBitmap
> >block: Move block dirty bitmap code to separate files
> >block: Remove unused typedef of BlockDriverDirtyHandler
> >block: Hide HBitmap in block dirty bitmap interface
> >HBitmap: Introduce "meta" bitmap to track bit changes
> >tests: Add test code for meta bitmap
> >block: Support meta dirty bitmap
> >block: Add two dirty bitmap getters
> >block: Assert that bdrv_release_dirty_bitmap succeeded
> >tests: Add test code for hbitmap serialization
> >block: More operations for meta dirty bitmap
> > 
> > Vladimir Sementsov-Ogievskiy (2):
> >hbitmap: serialization
> >block: BdrvDirtyBitmap serialization interface
> > 
> >   block.c  | 360 -
> >   block/Makefile.objs  |   2 +-
> >   block/backup.c   |  25 +-
> >   block/dirty-bitmap.c | 535 +++
> >   block/mirror.c   |  15 +-
> >   include/block/block.h|  40 +---
> >   include/block/dirty-bitmap.h |  75 ++
> >   include/qemu/hbitmap.h   |  96 
> >   include/qemu/typedefs.h  |   2 +
> >   tests/test-hbitmap.c | 255 +
> >   util/hbitmap.c   | 203 ++--
> >   11 files changed, 1177 insertions(+), 431 deletions(-)
> >   create mode 100644 block/dirty-bitmap.c
> >   create mode 100644 include/block/dirty-bitmap.h
> > 
> 
> 
> -- 
> Best regards,
> Vladimir
> 
>

Re: [Qemu-devel] [RFC PATCH v4 1/3] Mediated device Core driver

2016-05-25 Thread Alex Williamson

On Wed, 25 May 2016 01:28:15 +0530
Kirti Wankhede  wrote:

> Design for Mediated Device Driver:
> The main purpose of this driver is to provide a common interface for mediated
> device management that can be used by different drivers of different
> devices.
> 
> This module provides a generic interface to create the device, add it to
> mediated bus, add device to IOMMU group and then add it to vfio group.
> 
> Below is the high-level block diagram, with Nvidia, Intel and IBM devices
> as examples, since these are the devices which are going to actively use
> this module as of now.
> 
>      +---------------+
>      |               |
>      | +-----------+ |  mdev_register_driver() +--------------+
>      | |           | +<------------------------+ __init()     |
>      | |           | |                         |              |
>      | |  mdev     | +------------------------>+              |<-> VFIO user
>      | |  bus      | |     probe()/remove()    | vfio_mpci.ko |    APIs
>      | |  driver   | |                         |              |
>      | |           | |                         +--------------+
>      | |           | |  mdev_register_driver() +--------------+
>      | |           | +<------------------------+ __init()     |
>      | |           | |                         |              |
>      | |           | +------------------------>+              |<-> VFIO user
>      | +-----------+ |     probe()/remove()    | vfio_mccw.ko |    APIs
>      |               |                         |              |
>      |  MDEV CORE    |                         +--------------+
>      |   MODULE      |
>      |   mdev.ko     |
>      | +-----------+ |  mdev_register_device() +--------------+
>      | |           | +<------------------------+              |
>      | |           | |                         |  nvidia.ko   |<-> physical
>      | |           | +------------------------>+              |    device
>      | |           | |        callback         +--------------+
>      | | Physical  | |
>      | |  device   | |  mdev_register_device() +--------------+
>      | | interface | |<------------------------+              |
>      | |           | |                         |  i915.ko     |<-> physical
>      | |           | +------------------------>+              |    device
>      | |           | |        callback         +--------------+
>      | |           | |
>      | |           | |  mdev_register_device() +--------------+
>      | |           | +<------------------------+              |
>      | |           | |                         | ccw_device.ko|<-> physical
>      | |           | +------------------------>+              |    device
>      | |           | |        callback         +--------------+
>      | +-----------+ |
>      +---------------+
> 
> Core driver provides two types of registration interfaces:
> 1. Registration interface for mediated bus driver:
> 
> /**
>  * struct mdev_driver - Mediated device's driver
>  * @name:   driver name
>  * @probe:  called when new device created
>  * @remove: called when device removed
>  * @match:  called when new device or driver is added for this bus.
>  *          Return 1 if given device can be handled by given driver and
>  *          zero otherwise.
>  * @driver: device driver structure
>  *
>  **/
> struct mdev_driver {
>     const char *name;
>     int  (*probe)(struct device *dev);
>     void (*remove)(struct device *dev);
>     int  (*match)(struct device *dev);
>     struct device_driver driver;
> };
> 
> int  mdev_register_driver(struct mdev_driver *drv, struct module *owner);
> void mdev_unregister_driver(struct mdev_driver *drv);
> 
> A mediated device's driver should use this interface to register with the
> core driver. Once registered, that driver is responsible for adding the
> mediated device to a VFIO group.
> 
> 2. Physical device driver interface
> This interface gives the vendor driver a set of APIs to manage physical
> device related work in its own driver. The APIs are:
> - supported_config: provide supported configuration list by the vendor
>   driver
> - create: to allocate basic resources in vendor driver for a mediated
> device.
> - destroy: to free resources in vendor driver when mediated device is
>  destroyed.
> - start: to initiate mediated device initialization process from vendor
>driver when VM boots and before QEMU starts.
> - shutdown: to teardown mediated device resources during VM teardown.
> - read : read emulation callback.
> - write: write emulation callback.
> - set_irqs: send interrupt configuration information that QEMU sets.
> - get_region_info: to provide region size and its flags for the mediated
>  device.
> - validate_map_request: to validate remap pfn request.

nit, vfio is a userspace driver interface where QEMU is simply a user
of that interface.  We should never assume QEMU is the only user.

> 
> This registration interface should be used by vendor drivers to register
> each physical device to mdev core driver.
> 
> Signed-off-by: Kirti Wankhede
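
Editorial sketch (not part of the patch): the match()/probe() contract
described for struct mdev_driver can be modelled in userspace as below. This
is a mock for illustration only. In the kernel the binding is done by the
driver core, and all demo_* names, the fixed-size driver table and the
string device type are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device: a type string plus the name of the bound driver. */
struct demo_device {
    const char *type;
    const char *bound_driver;
};

/* Userspace stand-in with the same callback shape as struct mdev_driver. */
struct demo_mdev_driver {
    const char *name;
    int  (*probe)(struct demo_device *dev);
    void (*remove)(struct demo_device *dev);
    int  (*match)(struct demo_device *dev);  /* 1 if handled, 0 otherwise */
};

#define DEMO_MAX_DRIVERS 4
static struct demo_mdev_driver *demo_drivers[DEMO_MAX_DRIVERS];
static size_t demo_nr_drivers;

static int demo_register_driver(struct demo_mdev_driver *drv)
{
    assert(drv != NULL && drv->probe != NULL);
    if (demo_nr_drivers == DEMO_MAX_DRIVERS) {
        return -1;
    }
    demo_drivers[demo_nr_drivers++] = drv;
    return 0;
}

/* Bind to the first driver whose match() returns 1 and whose probe() works. */
static int demo_bus_add_device(struct demo_device *dev)
{
    for (size_t i = 0; i < demo_nr_drivers; i++) {
        struct demo_mdev_driver *drv = demo_drivers[i];

        if (drv->match && drv->match(dev) == 1 && drv->probe(dev) == 0) {
            dev->bound_driver = drv->name;
            return 0;
        }
    }
    return -1;
}

/* Example callbacks for a driver that only handles "pci" devices. */
static int demo_pci_match(struct demo_device *dev)
{
    return strcmp(dev->type, "pci") == 0;
}

static int demo_pci_probe(struct demo_device *dev)
{
    (void)dev;
    return 0;
}
```

In the design above, probe() is where a driver such as vfio_mpci would add
the mediated device to a VFIO group.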

[Qemu-devel] [PULL 0/1] QOM devices patch queue 2016-05-25

2016-05-25 Thread Andreas Färber

Hello Peter,

This is my QOM (devices) patch queue. Please pull.

I've needed to build-fix it twice by now, so if I fixed the #includes wrongly
please pick it up as patch and tweak it or apply a cleanup on top.

Thanks,
Andreas

P.S. I don't seem to have a MAINTAINERS patch to go with it yet, but part of
the discussion prompting the patch was about creating more fine-grained sections
reflecting maintenance reality, so we might want to consider adding a new one?
The intended follow-up was refactoring tree-walking and device reset for s390x.

Cc: Peter Maydell 
Cc: Eduardo Habkost 
Cc: David Hildenbrand 

The following changes since commit 287db79df8af8e31f18e262feb5e05103a09e4d4:

  Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging (2016-05-24 13:06:33 +0100)

are available in the git repository at:

  git://github.com/afaerber/qemu-cpu.git tags/qom-devices-for-peter

for you to fetch changes up to 8cce06115b6f69e4596e9f81455140759cca8c9d:

  qdev: Start disentangling bus from device (2016-05-25 23:24:35 +0200)


QOM infrastructure fixes and device conversions

* Start splitting up qdev.c


Andreas Färber (1):
  qdev: Start disentangling bus from device

 hw/core/Makefile.objs |   1 +
 hw/core/bus.c | 251 ++
 hw/core/qdev.c| 222 
 3 files changed, 252 insertions(+), 222 deletions(-)
 create mode 100644 hw/core/bus.c

[Qemu-devel] [PULL 1/1] qdev: Start disentangling bus from device

2016-05-25 Thread Andreas Färber

Move bus type and related APIs to a separate file bus.c.
This is a first step in breaking up qdev.c into more manageable chunks.

Reviewed-by: Peter Maydell 
[AF: Rebased onto osdep.h]
Signed-off-by: Andreas Färber 
---
 hw/core/Makefile.objs |   1 +
 hw/core/bus.c | 251 ++
 hw/core/qdev.c| 222 
 3 files changed, 252 insertions(+), 222 deletions(-)
 create mode 100644 hw/core/bus.c

diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
index 70951d4..82a9ef8 100644
--- a/hw/core/Makefile.objs
+++ b/hw/core/Makefile.objs
@@ -1,5 +1,6 @@
 # core qdev-related obj files, also used by *-user:
 common-obj-y += qdev.o qdev-properties.o
+common-obj-y += bus.o
 common-obj-y += fw-path-provider.o
 # irq.o needed for qdev GPIO handling:
 common-obj-y += irq.o
diff --git a/hw/core/bus.c b/hw/core/bus.c
new file mode 100644
index 000..3e3f8ac
--- /dev/null
+++ b/hw/core/bus.c
@@ -0,0 +1,251 @@
+/*
+ *  Dynamic device configuration and creation -- buses.
+ *
+ *  Copyright (c) 2009 CodeSourcery
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+#include "hw/qdev.h"
+#include "qapi/error.h"
+
+static void qbus_set_hotplug_handler_internal(BusState *bus, Object *handler,
+  Error **errp)
+{
+
+object_property_set_link(OBJECT(bus), OBJECT(handler),
+ QDEV_HOTPLUG_HANDLER_PROPERTY, errp);
+}
+
+void qbus_set_hotplug_handler(BusState *bus, DeviceState *handler, Error **errp)
+{
+qbus_set_hotplug_handler_internal(bus, OBJECT(handler), errp);
+}
+
+void qbus_set_bus_hotplug_handler(BusState *bus, Error **errp)
+{
+qbus_set_hotplug_handler_internal(bus, OBJECT(bus), errp);
+}
+
+int qbus_walk_children(BusState *bus,
+   qdev_walkerfn *pre_devfn, qbus_walkerfn *pre_busfn,
+   qdev_walkerfn *post_devfn, qbus_walkerfn *post_busfn,
+   void *opaque)
+{
+BusChild *kid;
+int err;
+
+if (pre_busfn) {
+err = pre_busfn(bus, opaque);
+if (err) {
+return err;
+}
+}
+
+QTAILQ_FOREACH(kid, &bus->children, sibling) {
+err = qdev_walk_children(kid->child,
+ pre_devfn, pre_busfn,
+ post_devfn, post_busfn, opaque);
+if (err < 0) {
+return err;
+}
+}
+
+if (post_busfn) {
+err = post_busfn(bus, opaque);
+if (err) {
+return err;
+}
+}
+
+return 0;
+}
+
+static void qbus_realize(BusState *bus, DeviceState *parent, const char *name)
+{
+const char *typename = object_get_typename(OBJECT(bus));
+BusClass *bc;
+char *buf;
+int i, len, bus_id;
+
+bus->parent = parent;
+
+if (name) {
+bus->name = g_strdup(name);
+} else if (bus->parent && bus->parent->id) {
+/* parent device has id -> use it plus parent-bus-id for bus name */
+bus_id = bus->parent->num_child_bus;
+
+len = strlen(bus->parent->id) + 16;
+buf = g_malloc(len);
+snprintf(buf, len, "%s.%d", bus->parent->id, bus_id);
+bus->name = buf;
+} else {
+/* no id -> use lowercase bus type plus global bus-id for bus name */
+bc = BUS_GET_CLASS(bus);
+bus_id = bc->automatic_ids++;
+
+len = strlen(typename) + 16;
+buf = g_malloc(len);
+len = snprintf(buf, len, "%s.%d", typename, bus_id);
+for (i = 0; i < len; i++) {
+buf[i] = qemu_tolower(buf[i]);
+}
+bus->name = buf;
+}
+
+if (bus->parent) {
+QLIST_INSERT_HEAD(&bus->parent->child_bus, bus, sibling);
+bus->parent->num_child_bus++;
+object_property_add_child(OBJECT(bus->parent), bus->name, OBJECT(bus), NULL);
+object_unref(OBJECT(bus));
+} else if (bus != sysbus_get_default()) {
+/* TODO: once all bus devices are qdevified,
+   only reset handler for main_system_bus should be registered here. */
+qemu_register_reset(qbus_reset_all_fn, bus);
+}
+}
+
+static void bus_unparent(Object *obj)
+{
+BusState *bus = BUS(obj);
+BusChild *kid;
+
+while ((kid = QTAILQ_FIRST(&bus->

[Qemu-devel] [PATCH v1 1/1] zynqmp: Add the ZCU102 board

2016-05-25 Thread Alistair Francis

Most Zynq UltraScale+ users will be targeting and using the ZCU102
board instead of the development-focused EP108. To make our QEMU machine
names clearer, add a ZCU102 machine model.

Signed-off-by: Alistair Francis 
---
There are differences between the two boards, but at the moment we don't
support those differences in QEMU so it doesn't make sense to use a new
init function. So for the time being this machine is the same as the
EP108 machine.

 hw/arm/xlnx-ep108.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/arm/xlnx-ep108.c b/hw/arm/xlnx-ep108.c
index 40f7cc1..34b4641 100644
--- a/hw/arm/xlnx-ep108.c
+++ b/hw/arm/xlnx-ep108.c
@@ -114,3 +114,11 @@ static void xlnx_ep108_machine_init(MachineClass *mc)
 }
 
 DEFINE_MACHINE("xlnx-ep108", xlnx_ep108_machine_init)
+
+static void xlnx_zcu102_machine_init(MachineClass *mc)
+{
+    mc->desc = "Xilinx ZynqMP ZCU102 board";
+    mc->init = xlnx_ep108_init;
+}
+
+DEFINE_MACHINE("xlnx-zcu102", xlnx_zcu102_machine_init)
-- 
2.7.4

Re: [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ

2016-05-25 Thread Jianjun Duan



On 05/25/2016 12:22 PM, Paolo Bonzini wrote:
>> 1 QTAILQ should only be accessed using the interfaces defined in
>> queue.h. Its structs should not be directly used. So I created
>> interfaces in queue.h to query about its layout. If the implementation
>> is changed, these interfaces should be changed accordingly. Code using
>> these interfaces should not break.
> 
> You don't need to query the layout, as long as the knowledge
> remains hidden in QTAILQ_RAW_* macros.  And because QTAILQ_*_OFFSET
> returns constant values, you can just put the knowledge of the offsets
> directly in QTAILQ_RAW_FOREACH and QTAILQ_RAW_INSERT_TAIL.
> 
>> 2 Based on point 1, vmstate_load_state/vmstate_put_state in vmstate.c
>> doesn't exactly know the structs of QTAILQ head and entry. So pointer
>> arithmetic is needed to put/get a QTAILQ. To do it, we need those 6
>> parameters to be passed in. So it is not redundant if we want to
>> use only the interfaces.
> 
> No, you only need two.  The other four are internal to qemu/queue.h.
> Just like QTAILQ users do not know about tqh_* and tqe_*, they need not
> know about their offsets, only the fields that contain them.
> 

In vmstate_load/put_state usually we use a VMSD to describe the type for
dump/load purpose. The metadata for the QTAILQ is for the same purpose.
Of course we can encode the information in a VMSD or VMStateField if
we don't have to change much.

The user may only care about the position of head and entry. But to
implement QTAILQ_RAW_***, we need more offset information than that.
If we don't query the offsets using something like offset() and store
it in a metadata, we have to make the assumption that all the pointer
types have the same size. Since in vmstate_load/put_state we don't have
any type information about the QTAILQ instance, we cannot use offset()
in QTAILQ_RAW_*** macros. May have to stick the constants there for
first/last/next/prev in QTAILQ_RAW_***. Not sure if it will work for
all arches.
>> 3 At this moment, vmstate_load_state/vmstate_put_state couldn't handle a
>> queue, or a list, or another recursive structure. To make it
>> extensible, I think a metadata is needed. The idea is for any
>> structure which needs special handling, customized metadata/put/get
>> should provide enough flexibility to hack around.
> 
> I think your solution is a bit overengineered.  If the metadata can
> fit in the VMStateField, you can use VMStateField.
> 
> Thanks,
> 
> Paolo
> 

Thanks,
Jianjun
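
Editorial sketch (not from the thread): the "two offsets only" idea can be
illustrated with an untyped tail queue whose link layout is fixed, so that
generic code needs just the head pointer and the offset of the entry field
inside each element. The DemoLink/DemoHead layout and all demo_* names are
assumptions made for this example and are not the macros from qemu/queue.h.

```c
#include <assert.h>
#include <stddef.h>

/* Untyped link fields; the layout is fixed so raw code can rely on it. */
typedef struct DemoLink {
    void  *next;    /* next element, or NULL */
    void **prev;    /* address of the previous element's next pointer */
} DemoLink;

typedef struct DemoHead {
    void  *first;   /* first element, or NULL */
    void **last;    /* address of the last element's next pointer */
} DemoHead;

static void demo_raw_init(DemoHead *head)
{
    head->first = NULL;
    head->last = &head->first;
}

/* Locate the link fields of an element from the entry offset alone. */
static DemoLink *demo_raw_link(void *elm, size_t entry_offset)
{
    assert(elm != NULL);
    return (DemoLink *)((char *)elm + entry_offset);
}

static void demo_raw_insert_tail(DemoHead *head, void *elm,
                                 size_t entry_offset)
{
    DemoLink *link = demo_raw_link(elm, entry_offset);

    link->next = NULL;
    link->prev = head->last;
    *head->last = elm;
    head->last = &link->next;
}

static void *demo_raw_first(const DemoHead *head)
{
    return head->first;
}

static void *demo_raw_next(void *elm, size_t entry_offset)
{
    return demo_raw_link(elm, entry_offset)->next;
}

/* Example element type; only offsetof(DemoItem, entry) leaves this file. */
typedef struct DemoItem {
    int value;
    DemoLink entry;
} DemoItem;

/* A consumer that walks the queue knowing only the entry offset. */
static int demo_sum(const DemoHead *head, size_t entry_offset)
{
    int sum = 0;

    for (void *elm = demo_raw_first(head); elm;
         elm = demo_raw_next(elm, entry_offset)) {
        sum += ((DemoItem *)elm)->value;
    }
    return sum;
}
```

Because the links are plain void pointers here, the walker makes no
assumption about the size of typed pointers; whether the same shortcut is
valid for the typed QTAILQ structs is exactly the portability question
raised above.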

Re: [Qemu-devel] [RFC PATCH 2/3] tcg: Add support for fence generation in x86 backend

2016-05-25 Thread Pranith Kumar

Hi Richard,

Thank you for the helpful comments.

On Wed, May 25, 2016 at 1:35 PM, Richard Henderson  wrote:
> On 05/24/2016 10:18 AM, Pranith Kumar wrote:
>> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
>> index 92be341..93ea42e 100644
>> --- a/tcg/i386/tcg-target.h
>> +++ b/tcg/i386/tcg-target.h
>> @@ -100,6 +100,7 @@ extern bool have_bmi1;
>>  #define TCG_TARGET_HAS_muls2_i32        1
>>  #define TCG_TARGET_HAS_muluh_i32        0
>>  #define TCG_TARGET_HAS_mulsh_i32        0
>> +#define TCG_TARGET_HAS_fence            1
>
>
> This has to be defined for all hosts.

OK. I will add an entry in tcg.h with default 0 and override in
individual architecture once it is implemented.

>> @@ -347,6 +347,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
>>  #define OPC_SHRX        (0xf7 | P_EXT38 | P_SIMDF2)
>>  #define OPC_TESTL       (0x85)
>>  #define OPC_XCHG_ax_r32 (0x90)
>> +#define OPC_MFENCE      (0xAE | P_EXT)
>
> Why define OPC_MFENCE if you're not going to use it?  Of course, it's not
> exactly a complete and useful definition, so maybe just delete OPC_MFENCE.

I want to use OPC_MFENCE instead of hard-coding the value in
tcg_out_fence(), but as you said the definition is not complete (it
currently generates only 0x0FAE). I am trying to figure out how to
generate 0x0FAEF0 using the definition.

>
> Also, for 32-bit you need to check for sse2 before outputting this.  See
> also the existing cpuid checks in tcg_target_init and the fallback smp_mb
> definition for pre-gcc-4.4.

OK, I'll check the current code and do something similar.

Thanks,
-- 
Pranith
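
Editorial sketch (not from the thread): one way to get from an opcode
definition that yields only 0x0F 0xAE to the full 0x0F 0xAE 0xF0 sequence is
to emit the opcode through a generic helper and append the third byte
explicitly, after checking for SSE2. The code buffer, the DEMO_P_EXT flag
value and the demo_* helpers are hypothetical and are not the actual TCG
backend code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical code buffer standing in for the backend's output stream. */
typedef struct {
    uint8_t code[16];
    size_t  len;
} DemoCodeBuf;

static void demo_out8(DemoCodeBuf *s, uint8_t v)
{
    assert(s->len < sizeof(s->code));
    s->code[s->len++] = v;
}

/* Illustrative flag: "prefix the opcode with the 0x0F escape byte". */
#define DEMO_P_EXT      0x100
#define DEMO_OPC_MFENCE (0xAE | DEMO_P_EXT)

static void demo_out_opc(DemoCodeBuf *s, int opc)
{
    if (opc & DEMO_P_EXT) {
        demo_out8(s, 0x0F);
    }
    demo_out8(s, (uint8_t)(opc & 0xFF));
}

/*
 * Emit MFENCE (0F AE F0) when the host has SSE2.  Returns false without
 * emitting anything otherwise, so the caller can pick a fallback.
 */
static bool demo_out_fence(DemoCodeBuf *s, bool have_sse2)
{
    if (!have_sse2) {
        return false;
    }
    demo_out_opc(s, DEMO_OPC_MFENCE);
    demo_out8(s, 0xF0);     /* ModRM byte: mod=11, reg=/6, rm=000 */
    return true;
}
```

The opcode macro covers only the first two bytes; the trailing 0xF0 is the
ModRM byte that selects MFENCE within the 0F AE group, so it has to be
emitted separately.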

Re: [Qemu-devel] [RFC PATCH 2/3] tcg: Add support for fence generation in x86 backend

2016-05-25 Thread Sergey Fedorov

On 25/05/16 22:59, Pranith Kumar wrote:
> On Wed, May 25, 2016 at 3:43 PM, Sergey Fedorov  wrote:
>> I think it would be better not to defer native support for the operation.
>> It should be a relatively simple instruction. Otherwise we could wind up
>> deferring this indefinitely.
>>
> Agreed. I will go with the native generation for now.

I mean we'd better implement native support for all the supported host
architectures right away.

Kind regards,
Sergey

Re: [Qemu-devel] [RFC PATCH 2/3] tcg: Add support for fence generation in x86 backend

2016-05-25 Thread Pranith Kumar

On Wed, May 25, 2016 at 3:43 PM, Sergey Fedorov  wrote:
>
> I think it would be better not to defer native support for the operation.
> It should be a relatively simple instruction. Otherwise we could wind up
> deferring this indefinitely.
>

Agreed. I will go with the native generation for now.

Thanks,
-- 
Pranith

Re: [Qemu-devel] [RFC PATCH 2/3] tcg: Add support for fence generation in x86 backend

2016-05-25 Thread Pranith Kumar

On Wed, May 25, 2016 at 3:25 PM, Alex Bennée  wrote:
> Should we make the emitting of the function call/TCGop conditional on
> MTTCG being enabled? If we are running in round-robin mode there is no
> need to issue any fence operations.
>

Also, we should check if SMP (> 1 processors) is enabled since fences
are not necessary on UP systems.

-- 
Pranith

Re: [Qemu-devel] [RFC PATCH 2/3] tcg: Add support for fence generation in x86 backend

2016-05-25 Thread Richard Henderson


On 05/25/2016 12:25 PM, Alex Bennée wrote:
> That would solve the problem of converting the various backends
> piecemeal - although obviously we should move to all backends having
> "native" support ASAP. However by introducing expensive substitute
> functions we will slow down the translations as each front end is
> expanded to translate the target barrier ops.

Obviously.  We could in fact do that all up front if desired.  It doesn't take
long to look up the barrier instructions for each isa.

> Should we make the emitting of the function call/TCGop conditional on
> MTTCG being enabled? If we are running in round-robin mode there is no
> need to issue any fence operations.

Probably.  But to keep the translators clean we should probably hide that
within tcg_gen_fence().



r~
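
Editorial sketch (not from the thread): hiding the decision inside the
generator function might look like the following. The context struct, its
fields and the demo_* names are hypothetical; the counter stands in for
actually emitting a fence op.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical translation context; field names are illustrative only. */
typedef struct {
    bool     mttcg_enabled;   /* vCPUs run in parallel host threads */
    unsigned smp_cpus;        /* number of guest CPUs */
    unsigned fences_emitted;  /* stand-in for emitted fence ops */
} DemoTranslateCtx;

/* Fences only matter when more than one vCPU can really run concurrently. */
static bool demo_fence_needed(const DemoTranslateCtx *ctx)
{
    return ctx->mttcg_enabled && ctx->smp_cpus > 1;
}

/* Front ends call this unconditionally; the policy lives in one place. */
static void demo_gen_fence(DemoTranslateCtx *ctx)
{
    assert(ctx != NULL);
    if (!demo_fence_needed(ctx)) {
        return;
    }
    ctx->fences_emitted++;
}
```

With this shape each target translator emits its barrier instructions the
same way in every mode, and round-robin or uniprocessor runs pay nothing.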

Re: [Qemu-devel] [RFC PATCH 2/3] tcg: Add support for fence generation in x86 backend

2016-05-25 Thread Sergey Fedorov

On 25/05/16 22:25, Alex Bennée wrote:
> Richard Henderson  writes:
>> On 05/24/2016 10:18 AM, Pranith Kumar wrote:
>>> Signed-off-by: Pranith Kumar 
>>> ---
>>>  tcg/i386/tcg-target.h | 1 +
>>>  tcg/i386/tcg-target.inc.c | 9 +
>>>  tcg/tcg-opc.h | 2 +-
>>>  tcg/tcg.c | 1 +
>>>  4 files changed, 12 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
>>> index 92be341..93ea42e 100644
>>> --- a/tcg/i386/tcg-target.h
>>> +++ b/tcg/i386/tcg-target.h
>>> @@ -100,6 +100,7 @@ extern bool have_bmi1;
>>>  #define TCG_TARGET_HAS_muls2_i32        1
>>>  #define TCG_TARGET_HAS_muluh_i32        0
>>>  #define TCG_TARGET_HAS_mulsh_i32        0
>>> +#define TCG_TARGET_HAS_fence            1
>> This has to be defined for all hosts.
>>
>> The default implementation should be a function call into tcg-runtime.c that
>> calls smp_mb().
> That would solve the problem of converting the various backends
> piecemeal - although obviously we should move to all backends having
> "native" support ASAP. However by introducing expensive substitute
> functions we will slow down the translations as each front end is
> expanded to translate the target barrier ops.

I think it would be better not to defer native support for the operation.
It should be a relatively simple instruction. Otherwise we could wind up
deferring this indefinitely.

> Should we make the emitting of the function call/TCGop conditional on
> MTTCG being enabled? If we are running in round-robin mode there is no
> need to issue any fence operations.

Good idea.

Kind regards,
Sergey

Re: [Qemu-devel] [RFC PATCH 2/3] tcg: Add support for fence generation in x86 backend

2016-05-25 Thread Alex Bennée


Richard Henderson  writes:

> On 05/24/2016 10:18 AM, Pranith Kumar wrote:
>> Signed-off-by: Pranith Kumar 
>> ---
>>  tcg/i386/tcg-target.h | 1 +
>>  tcg/i386/tcg-target.inc.c | 9 +
>>  tcg/tcg-opc.h | 2 +-
>>  tcg/tcg.c | 1 +
>>  4 files changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
>> index 92be341..93ea42e 100644
>> --- a/tcg/i386/tcg-target.h
>> +++ b/tcg/i386/tcg-target.h
>> @@ -100,6 +100,7 @@ extern bool have_bmi1;
>>  #define TCG_TARGET_HAS_muls2_i32        1
>>  #define TCG_TARGET_HAS_muluh_i32        0
>>  #define TCG_TARGET_HAS_mulsh_i32        0
>> +#define TCG_TARGET_HAS_fence            1
>
> This has to be defined for all hosts.
>
> The default implementation should be a function call into tcg-runtime.c that
> calls smp_mb().

That would solve the problem of converting the various backends
piecemeal - although obviously we should move to all backends having
"native" support ASAP. However by introducing expensive substitute
functions we will slow down the translations as each front end is
expanded to translate the target barrier ops.

Should we make the emitting of the function call/TCGop conditional on
MTTCG being enabled? If we are running in round-robin mode there is no
need to issue any fence operations.
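
A rough sketch of the two ideas above, not code from this series: a runtime
helper that a backend without a native fence op could call, and a front-end
hook that emits nothing when MTTCG is off. The names helper_fence, CallList
and gen_guest_barrier are hypothetical, and C11 atomic_thread_fence stands in
for QEMU's smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Fallback helper: a full barrier, standing in for QEMU's smp_mb(). */
static void helper_fence(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

typedef void (*helper_fn)(void);

/* Toy stand-in for the helper calls emitted for one translation block. */
enum { MAX_CALLS = 8 };

typedef struct {
    helper_fn calls[MAX_CALLS];
    size_t n;
} CallList;

/*
 * Front-end hook for a guest barrier instruction. With a single
 * round-robin vCPU thread there is nothing to order, so no call is
 * emitted at all.
 */
static void gen_guest_barrier(CallList *tb, bool mttcg_enabled)
{
    if (!mttcg_enabled) {
        return;
    }
    assert(tb->n < MAX_CALLS);
    tb->calls[tb->n++] = helper_fence;
}
```

The cost Alex mentions shows up only in the MTTCG branch: each guest barrier
becomes an out-of-line call until the backend grows a native op.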

>
>> @@ -347,6 +347,7 @@ static inline int tcg_target_const_match(tcg_target_long val, TCGType type,
>>  #define OPC_SHRX        (0xf7 | P_EXT38 | P_SIMDF2)
>>  #define OPC_TESTL       (0x85)
>>  #define OPC_XCHG_ax_r32 (0x90)
>> +#define OPC_MFENCE      (0xAE | P_EXT)
>>
>>  #define OPC_GRP3_Ev     (0xf7)
>>  #define OPC_GRP5        (0xff)
>> @@ -686,6 +687,14 @@ static inline void tcg_out_pushi(TCGContext *s, tcg_target_long val)
>>      }
>>  }
>>
>> +static inline void tcg_out_fence(TCGContext *s)
>> +{
>> +    /* TODO: Figure out an appropriate place for the encoding */
>> +    tcg_out8(s, 0x0F);
>> +    tcg_out8(s, 0xAE);
>> +    tcg_out8(s, 0xF0);
>> +}
>
> Why define OPC_MFENCE if you're not going to use it?  Of course, it's not
> exactly a complete and useful definition, so maybe just delete OPC_MFENCE.
>
> Also, for 32-bit you need to check for sse2 before outputting this.  See also
> the existing cpuid checks in tcg_target_init and the fallback smp_mb
> definition for pre-gcc-4.4.
>
>
> r~


--
Alex Bennée
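
To illustrate the review comment about checking for SSE2, here is a minimal
sketch of an emitter that picks the encoding at run time. It is an
assumption-laden stand-in, not the backend code: emit_full_barrier and its
have_sse2 argument are hypothetical, and the non-SSE2 path uses a locked
read-modify-write on the stack, which is one common way to get a full barrier
on older 32-bit x86.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write a full memory barrier into buf and return the number of bytes
 * used. have_sse2 is assumed to come from a CPUID probe done elsewhere.
 */
static size_t emit_full_barrier(uint8_t *buf, size_t cap, bool have_sse2)
{
    /* mfence: 0F AE F0 (requires SSE2) */
    static const uint8_t mfence[] = { 0x0F, 0xAE, 0xF0 };
    /* lock orl $0, (%esp): F0 83 0C 24 00 */
    static const uint8_t lock_or[] = { 0xF0, 0x83, 0x0C, 0x24, 0x00 };

    const uint8_t *src = have_sse2 ? mfence : lock_or;
    size_t len = have_sse2 ? sizeof(mfence) : sizeof(lock_or);

    assert(len <= cap);
    for (size_t i = 0; i < len; i++) {
        buf[i] = src[i];
    }
    return len;
}
```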

Re: [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ

2016-05-25 Thread Paolo Bonzini

> 1 QTAILQ should only be accessed using the interfaces defined in
> queue.h. Its structs should not be directly used. So I created
> interfaces in queue.h to query about its layout. If the implementation
> is changed, these interfaces should be changed accordingly. Code using
> these interfaces should not break.

You don't need to query the layout, as long as the knowledge
remains hidden in QTAILQ_RAW_* macros.  And because QTAILQ_*_OFFSET
returns constant values, you can just put the knowledge of the offsets
directly in QTAILQ_RAW_FOREACH and QTAILQ_RAW_INSERT_TAIL.

> 2 Based on point 1, vmstate_load_state/vmstate_put_state in vmstate.c
> doesn't exactly know the structs of QTAILQ head and entry. So pointer
> arithmetic is needed to put/get a QTAILQ. To do it, we need those 6
> parameters to be passed in. So it is not redundant if we want to use
> only the interfaces.

No, you only need two.  The other four are internal to qemu/queue.h.
Just like QTAILQ users do not know about tqh_* and tqe_*, they need not
know about their offsets, only the fields that contain them.
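
As a sketch of what "only two" can look like: if the forward pointer is the
first member of both the head and the entry, a type-erased walk needs just
the head and the offset of the entry field inside the element. RawHead,
RawEntry and the raw_* functions below are hypothetical stand-ins for the
proposed QTAILQ_RAW_* macros, and the layout is an assumption for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: the forward pointer comes first in both structs. */
typedef struct RawHead {
    void *first;   /* first element, or NULL */
    void **last;   /* address of the last element's next pointer */
} RawHead;

typedef struct RawEntry {
    void *next;    /* next element, or NULL */
    void **prev;   /* address of the previous next pointer */
} RawEntry;

/* Locate the link field inside an element of unknown type. */
static RawEntry *raw_entry(void *elm, size_t entry_off)
{
    return (RawEntry *)((char *)elm + entry_off);
}

static void raw_init(RawHead *head)
{
    head->first = NULL;
    head->last = &head->first;
}

static void raw_insert_tail(RawHead *head, void *elm, size_t entry_off)
{
    RawEntry *e;

    assert(elm != NULL);
    e = raw_entry(elm, entry_off);
    e->next = NULL;
    e->prev = head->last;
    *head->last = elm;
    head->last = &e->next;
}

/* Walk the queue knowing only the entry offset. */
static size_t raw_count(const RawHead *head, size_t entry_off)
{
    size_t n = 0;

    for (void *elm = head->first; elm; elm = raw_entry(elm, entry_off)->next) {
        n++;
    }
    return n;
}
```

A migration routine built on this would take the head address and
offsetof(type, link) and nothing else about the queue's internals.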

> 3 At this moment, vmstate_load_state/vmstate_put_state couldn't handle a
> queue, or a list, or another recursive structure. To make it
> extensible, I think metadata is needed. The idea is that for any
> structure which needs special handling, customized metadata/put/get
> should provide enough flexibility to hack around.

I think your solution is a bit overengineered.  If the metadata can
fit in the VMStateField, you can use VMStateField.

Thanks,

Paolo

[Qemu-devel] [PATCH v7 2/3] generic-loader: Add a generic loader

2016-05-25 Thread Alistair Francis

Add a generic loader to QEMU which can be used to load images or set
memory values.

Signed-off-by: Alistair Francis 
---
V7:
 - Rebase
V6:
 - Add error checking
V5:
 - Rebase
V4:
 - Allow the loader to work with every architecture
 - Move the file to hw/core
 - Increase the maximum number of CPUs
 - Make the CPU operations conditional
 - Convert the cpu option to cpu-num
 - Require the user to specify endianness
V3:
 - Pass the ram_size to load_image_targphys()
V2:
 - Add maintainers entry
 - Perform bounds checking
 - Register and unregister the reset in the realise/unrealise
Changes since RFC:
 - Add BE support

 MAINTAINERS  |   6 ++
 hw/core/Makefile.objs|   2 +
 hw/core/generic-loader.c | 170 +++
 include/hw/core/generic-loader.h |  45 +++
 4 files changed, 223 insertions(+)
 create mode 100644 hw/core/generic-loader.c
 create mode 100644 include/hw/core/generic-loader.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 81e7fac..b4a77d8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -972,6 +972,12 @@ F: hw/acpi/nvdimm.c
 F: hw/mem/nvdimm.c
 F: include/hw/mem/nvdimm.h
 
+Generic Loader
+M: Alistair Francis 
+S: Maintained
+F: hw/core/generic-loader.c
+F: include/hw/core/generic-loader.h
+
 Subsystems
 --
 Audio
diff --git a/hw/core/Makefile.objs b/hw/core/Makefile.objs
index 70951d4..eb00e7d 100644
--- a/hw/core/Makefile.objs
+++ b/hw/core/Makefile.objs
@@ -15,3 +15,5 @@ common-obj-$(CONFIG_SOFTMMU) += null-machine.o
 common-obj-$(CONFIG_SOFTMMU) += loader.o
 common-obj-$(CONFIG_SOFTMMU) += qdev-properties-system.o
 common-obj-$(CONFIG_PLATFORM_BUS) += platform-bus.o
+
+obj-$(CONFIG_SOFTMMU) += generic-loader.o
diff --git a/hw/core/generic-loader.c b/hw/core/generic-loader.c
new file mode 100644
index 000..7160d58
--- /dev/null
+++ b/hw/core/generic-loader.c
@@ -0,0 +1,170 @@
+/*
+ * Generic Loader
+ *
+ * Copyright (C) 2014 Li Guang
+ * Copyright (C) 2016 Xilinx Inc.
+ * Written by Li Guang 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ */
+
+#include "qemu/osdep.h"
+#include "qom/cpu.h"
+#include "hw/sysbus.h"
+#include "sysemu/dma.h"
+#include "hw/loader.h"
+#include "qapi/error.h"
+#include "hw/core/generic-loader.h"
+
+#define CPU_NONE 0x
+
+static void generic_loader_reset(void *opaque)
+{
+    GenericLoaderState *s = GENERIC_LOADER(opaque);
+
+    if (s->cpu) {
+        CPUClass *cc = CPU_GET_CLASS(s->cpu);
+        cpu_reset(s->cpu);
+        if (cc) {
+            cc->set_pc(s->cpu, s->addr);
+        }
+    }
+
+    if (s->data_len) {
+        assert(s->data_len < sizeof(s->data));
+        dma_memory_write((s->cpu ? s->cpu : first_cpu)->as, s->addr, &s->data,
+                         s->data_len);
+    }
+}
+
+static void generic_loader_realize(DeviceState *dev, Error **errp)
+{
+    GenericLoaderState *s = GENERIC_LOADER(dev);
+    hwaddr entry;
+    int big_endian;
+    int size = 0;
+
+    /* Perform some error checking on the user's options */
+    if (s->data || s->data_len || s->data_be) {
+        /* User is loading memory values */
+        if (s->file) {
+            error_setg(errp, "Specifying a file is not supported when loading "
+                       "memory values.");
+            return;
+        } else if (s->force_raw) {
+            error_setg(errp, "Specifying force raw is not supported when "
+                       "loading memory values.");
+            return;
+        } else if (!s->data || !s->data_len) {
+            error_setg(errp, "Both data and data length must be specified.");
+            return;
+        }
+    } else if (s->file || s->force_raw) {
+        /* User is loading an image */
+        if (s->data || s->data_len || s->data_be) {
+            error_setg(errp, "Data can not be specified when loading an "
+                       "image.");
+            return;
+        }
+    }
+
+    qemu_register_reset(generic_loader_reset, dev);
+
+    if (s->cpu_num != CPU_NONE) {
+        s->cpu = qemu_get_cpu(s->cpu_num);
+        if (!s->cpu) {
+            error_setg(errp, "Specified boot CPU#%d is nonexistent",
+                       s->cpu_num);
+            return;
+        }
+    }
+
+#ifdef TARGET_WORDS_BIGENDIAN
+    big_endian = 1;
+#else
+    big_endian = 0;
+#endif
+
+    if (s->file) {
+        if (!s->force_raw) {
+            size = load_elf(s->file, NULL, NULL, &entry, NULL, NULL,
+                            big_endian, 0, 0, 0);
+
+            if (size < 0) {
+                size = load

[Qemu-devel] [PATCH v7 0/3] Add a generic loader

2016-05-25 Thread Alistair Francis

This work is based on the original work by Li Guang with extra
features added by Peter C and myself.

The idea of this loader is to allow the user to load multiple images
or values into QEMU at startup.

Memory values can be loaded like this:
-device loader,addr=0xfd1a0104,data=0x800e,data-len=4

Images can be loaded like this:
-device loader,file=./images/u-boot.elf,cpu-num=0

This can be useful, and we use it a lot at Xilinx to load multiple images
into a machine at creation (ATF, kernel and DTB, for example).

It can also be used to set registers.

This patch series also makes the load_elf() function more generic by not
requiring an architecture.

V7:
 - Fix typo in comment
 - Rebase
V6:
 - Add error checking
V5:
 - Rebase
V4:
 - Re-write documentation
 - Allow the loader to work with every architecture
 - Move the file to hw/core
 - Increase the maximum number of CPUs
 - Make the CPU operations conditional
 - Convert the cpu option to cpu-num
 - Require the user to specify endianness
V2:
 - Add an entry to the maintainers file
 - Add some documentation
 - Perform bounds checking on the data_len
 - Register and unregister the reset in the realise/unrealise
Changes since RFC:
 - Add support for BE


Alistair Francis (3):
  loader: Allow ELF loader to auto-detect the ELF arch
  generic-loader: Add a generic loader
  docs: Add a generic loader explanation document

 MAINTAINERS  |   6 ++
 docs/generic-loader.txt  |  54 +
 hw/core/Makefile.objs|   2 +
 hw/core/generic-loader.c | 170 +++
 hw/core/loader.c |  10 +++
 include/hw/core/generic-loader.h |  45 +++
 6 files changed, 287 insertions(+)
 create mode 100644 docs/generic-loader.txt
 create mode 100644 hw/core/generic-loader.c
 create mode 100644 include/hw/core/generic-loader.h

-- 
2.7.4

[Qemu-devel] [PATCH v7 3/3] docs: Add a generic loader explanation document

2016-05-25 Thread Alistair Francis

Signed-off-by: Alistair Francis 
---
V6:
 - Fixup documentation
V4:
 - Re-write to be more comprehensive

 docs/generic-loader.txt | 54 +
 1 file changed, 54 insertions(+)
 create mode 100644 docs/generic-loader.txt

diff --git a/docs/generic-loader.txt b/docs/generic-loader.txt
new file mode 100644
index 000..effb244
--- /dev/null
+++ b/docs/generic-loader.txt
@@ -0,0 +1,54 @@
+Copyright (c) 2016 Xilinx Inc.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.  See
+the COPYING file in the top-level directory.
+
+
+The 'loader' device allows the user to load multiple images or values into
+QEMU at startup.
+
+Loading Memory Values
+---------------------
+The loader device allows memory values to be set from the command line. This
+can be done by following the syntax below:
+
+    -device loader,addr=<addr>,data=<data>,data-len=<data-len>
+    -device loader,addr=<addr>,cpu-num=<cpu-num>
+
+NOTE: It is also possible to mix the commands above, e.g. include the cpu-num
+      argument with the data argument.
+
+    <addr>      - The address to store the data or the value to set the
+                  CPU's PC
+    <data>      - The value to be written to the addr. The maximum size of the
+                  data is 8 bytes.
+    <data-len>  - The length of the data in bytes. This argument must be
+                  included if the data argument is.
+    <data-be>   - Set to true if the data to be stored on the guest should be
+                  written as big endian data. The default is to write little
+                  endian data.
+    <cpu-num>   - This will cause the CPU to be reset and the PC to be set to
+                  the value of addr.
+
+An example of loading value 0x800e to address 0xfd1a0104 is:
+-device loader,addr=0xfd1a0104,data=0x800e,data-len=4
+
+Loading Files
+-------------
+The loader device also allows files to be loaded into memory. This can be done
+similarly to setting memory values. The syntax is shown below:
+
+    -device loader,file=<file>,addr=<addr>,cpu-num=<cpu-num>,force-raw=<force-raw>
+
+    <file>      - A file to be loaded into memory
+    <addr>      - The address in memory at which the file should be loaded.
+                  This is ignored if you are using an ELF (unless force-raw
+                  is true). This is required if you aren't loading an ELF.
+    <cpu-num>   - This specifies the CPU that should be used. This is an
+                  optional argument and will cause the CPU's PC to be set to
+                  where the image is stored. This option should only be used
+                  for the boot image.
+    <force-raw> - Forces the file to be treated as a raw image. This can be
+                  used to specify the load address of ELF files.
+
+An example of loading an ELF file which CPU0 will boot is shown below:
+-device loader,file=./images/boot.elf,cpu-num=0
-- 
2.7.4
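
For readers wondering what the data, data-len and big-endian options amount
to in bytes, here is a small sketch. It is not taken from the patch:
pack_value is a hypothetical helper that lays out the low data-len bytes of a
value in either byte order, using the 0x800e example from the document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Store the low data_len bytes of value into out, little endian unless
 * big_endian is set. data_len is limited to 8, matching the documented
 * maximum data size.
 */
static void pack_value(uint64_t value, size_t data_len, bool big_endian,
                       uint8_t *out)
{
    assert(data_len >= 1 && data_len <= sizeof(value));
    for (size_t i = 0; i < data_len; i++) {
        uint8_t byte = (uint8_t)(value >> (8 * i));
        out[big_endian ? data_len - 1 - i : i] = byte;
    }
}
```

With data=0x800e and data-len=4, the little endian layout is 0e 80 00 00 and
the big endian layout is 00 00 80 0e.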

[Qemu-devel] [PATCH v7 1/3] loader: Allow ELF loader to auto-detect the ELF arch

2016-05-25 Thread Alistair Francis

If the caller didn't specify an architecture for the ELF machine
the load_elf() function will auto detect it based on the ELF file.

Signed-off-by: Alistair Francis 
---
V7:
 - Fix typo

 hw/core/loader.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/hw/core/loader.c b/hw/core/loader.c
index 53e0e41..a8a372d 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -419,6 +419,7 @@ int load_elf(const char *filename, uint64_t (*translate_fn)(void *, uint64_t),
 {
     int fd, data_order, target_data_order, must_swab, ret = ELF_LOAD_FAILED;
     uint8_t e_ident[EI_NIDENT];
+    uint16_t e_machine;
 
     fd = open(filename, O_RDONLY | O_BINARY);
     if (fd < 0) {
@@ -451,6 +452,15 @@ int load_elf(const char *filename, uint64_t (*translate_fn)(void *, uint64_t),
         goto fail;
     }
 
+    if (elf_machine < 1) {
+        /* The caller didn't specify an ARCH, we can figure it out */
+        lseek(fd, 0x12, SEEK_SET);
+        if (read(fd, &e_machine, sizeof(e_machine)) != sizeof(e_machine)) {
+            goto fail;
+        }
+        elf_machine = e_machine;
+    }
+
     lseek(fd, 0, SEEK_SET);
     if (e_ident[EI_CLASS] == ELFCLASS64) {
         ret = load_elf64(filename, fd, translate_fn, translate_opaque, must_swab,
-- 
2.7.4
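
For reference, a standalone sketch of the same lookup done on an in-memory
header. The e_machine field sits at offset 0x12 in both ELF32 and ELF64
headers and is stored in the file's own byte order, so this version decodes
it according to e_ident[EI_DATA]. The function name elf_machine_from_header
is hypothetical, and the explicit byte-order handling is my addition, not
something the patch above does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    EI_DATA_OFF   = 5,    /* e_ident[EI_DATA]: 1 = little, 2 = big endian */
    E_MACHINE_OFF = 0x12  /* e_machine, same offset for ELF32 and ELF64 */
};

/* Returns e_machine, or -1 if the buffer is too short or not an ELF file. */
static int elf_machine_from_header(const uint8_t *hdr, size_t len)
{
    uint8_t b0, b1;

    assert(hdr != NULL);
    if (len < E_MACHINE_OFF + 2) {
        return -1;
    }
    if (hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F') {
        return -1;
    }

    b0 = hdr[E_MACHINE_OFF];
    b1 = hdr[E_MACHINE_OFF + 1];
    switch (hdr[EI_DATA_OFF]) {
    case 1:  /* ELFDATA2LSB */
        return b0 | (b1 << 8);
    case 2:  /* ELFDATA2MSB */
        return (b0 << 8) | b1;
    default:
        return -1;
    }
}
```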

Re: [Qemu-devel] [PATCH 1/5] block: split write_zeroes always

2016-05-25 Thread Eric Blake

On 05/17/2016 10:34 AM, Kevin Wolf wrote:
> Am 17.05.2016 um 11:15 hat Denis V. Lunev geschrieben:
>> We should split requests even if they are less than write_zeroes_alignment.
>> For example we can have the following request:
>>   offset 62k
>>   size   4k
>>   write_zeroes_alignment 64k
>> The original code sent 1 request covering 2 qcow2 clusters, and resulted
>> in both clusters being allocated. But by splitting the request, we can
>> cater to the case where one of the two clusters can be zeroed as a
>> whole, for only 1 cluster allocated after the operation.
>>
>> Signed-off-by: Denis V. Lunev 
>> CC: Eric Blake 
>> CC: Kevin Wolf 
>> ---
>>  block/io.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/io.c b/block/io.c
>> index cd6d71a..6a24ea8 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -1172,13 +1172,13 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>>          /* Align request.  Block drivers can expect the "bulk" of the request
>>           * to be aligned.
>>           */
>> -        if (bs->bl.write_zeroes_alignment
>> -            && num > bs->bl.write_zeroes_alignment) {
>> +        if (bs->bl.write_zeroes_alignment) {
>>              if (sector_num % bs->bl.write_zeroes_alignment != 0) {
>>                  /* Make a small request up to the first aligned sector.  */
>>                  num = bs->bl.write_zeroes_alignment;
>>                  num -= sector_num % bs->bl.write_zeroes_alignment;
> 
> Turns out this doesn't work. If this is a small request that zeros
> something in the middle of a single cluster (i.e. we have untouched data
> both before and after the request in the same cluster), then num can now
> become greater than nb_sectors, so that we end up zeroing too much.

I'm planning on folding in a working version of this patch in my
byte-based write_zeroes conversion series.  As part of the patch, I'm
also hoisting the division out of the loop (no guarantees that the
compiler can spot that bs->bl.write_zeroes_alignment will be a power of
two, to optimize it to a shift).
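
To make the failure mode concrete, here is a small sketch of the splitting
rule with the clamp that Kevin's case needs. It is only an illustration under
my own assumptions, not the code I intend to post: next_chunk is a
hypothetical name and it works in bytes rather than sectors.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Length of the next sub-request starting at offset, with `remaining`
 * bytes left and `align` as the preferred alignment (0 means none).
 * An unaligned head stops at the next boundary but never exceeds
 * `remaining`; an aligned start takes whole aligned units and leaves
 * any tail for a later call.
 */
static uint64_t next_chunk(uint64_t offset, uint64_t remaining, uint64_t align)
{
    uint64_t misalign;

    assert(remaining > 0);
    if (align == 0) {
        return remaining;
    }

    misalign = offset % align;
    if (misalign != 0) {
        uint64_t head = align - misalign;
        return head < remaining ? head : remaining;
    }
    if (remaining > align) {
        return remaining - remaining % align;
    }
    return remaining;
}
```

For the example in the commit message (offset 62k, size 4k, alignment 64k)
this gives a 2k head followed by a 2k piece at 64k. For a small request in
the middle of a cluster, the head is clamped to the request size instead of
running on to the boundary.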



-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ

2016-05-25 Thread Jianjun Duan

I will try to explain my design rationale in detail here.

1 QTAILQ should only be accessed using the interfaces defined in
queue.h. Its structs should not be directly used. So I created
interfaces in queue.h to query about its layout. If the implementation
is changed, these interfaces should be changed accordingly. Code using
these interfaces should not break.

2 Based on point 1, vmstate_load_state/vmstate_put_state in vmstate.c
doesn't exactly know the structs of QTAILQ head and entry. So pointer
arithmetic is needed to put/get a QTAILQ. To do it, we need those 6
parameters to be passed in. So it is not redundant if we want to use
only the interfaces.

3 At this moment, vmstate_load_state/vmstate_put_state couldn't handle a
queue, or a list, or another recursive structure. To make it
extensible, I think metadata is needed. The idea is that for any
structure which needs special handling, customized metadata/put/get
should provide enough flexibility to hack around.

There are two issues I tried to address. One is the recursive queue;
the other is working around the hiding of the QTAILQ implementation
details. It seems we need to agree on how the latter should be
addressed.

I will try to fix those 35 call sites of VMStateInfo.

Thanks,
Jianjun

On 05/25/2016 10:51 AM, Paolo Bonzini wrote:
>>>> +    /*
>>>> +     * Following 3 fields are for VMStateField which needs customized
>>>> +     * handling, such as QTAILQ in qemu/queue.h, lists, and tree.
>>>> +     */
>>>> +    const void *meta_data;
>>>> +    int (*extend_get)(QEMUFile *f, const void *metadata, void *opaque);
>>>> +    void (*extend_put)(QEMUFile *f, const void *metadata, void *opaque,
>>>> +                       QJSON *vmdesc);
>>>>  } VMStateField;
>>>
>>> Do not add these two function pointers to VMStateField, instead add
>>> QJSON* and VMStateField* arguments as needed to VMStateInfo's get and put.
>>
>> That is definitely the ideal way. However, VMStateInfo's get/put are
>> already used extensively. If I change them, I need to change all of
>> their call sites. I am not sure whether that will be welcome.
> 
> Sure, don't be worried. :)  True, there are quite a few VMStateInfos (about
> 35) but the changes are simple.
> 
>>>> +#define VMSTATE_QTAILQ_METADATA(_field, _state, _type, _next, _vmsd) { \
>>>> +    .first = QTAILQ_FIRST_OFFSET(typeof_field(_state, _field)),        \
>>>> +    .last = QTAILQ_LAST_OFFSET(typeof_field(_state, _field)),          \
>>>> +    .next = QTAILQ_NEXT_OFFSET(_type, _next),                          \
>>>> +    .prev = QTAILQ_PREV_OFFSET(_type, _next),                          \
>>>> +    .entry = offsetof(_type, _next),                                   \
>>>> +    .size = sizeof(_type),                                             \
>>>> +    .vmsd = &(_vmsd),                                                  \
>>>> +}
>>>
>>> .last and .prev are unnecessary, since they come just after .first and
>>> .next (and their use is hidden behind QTAILQ_RAW_*).  .first and .next
>   
> 
> Actually, .first and .next are always 0.  I misread your changes to
> qemu/queue.h.  See below.
> 
>>> can be placed in .offset and .num_offset respectively.  So you don't
>>> really need an extra metadata struct.
>>>
>>> If you prefer you could have something like
>>>
>>> union {
>>>     size_t num_offset;    /* VMS_VARRAY */
>>>     size_t size_offset;   /* VMS_VBUFFER */
>>>     size_t next_offset;   /* VMS_TAILQ */
>>> } offsets;
>>
>> Actually I explored the approach in which I use a VMSD to encode all the
>> information. But a VMSD describes actual memory layout. Interpreting it
>> another way could be confusing.
>>
>> One of the assumptions about QTAILQ is that we can only use the
>> interfaces defined in queue.h to access it. I intend not to depend on
>> its actual layout. From this perspective, these 6 parameters are
>> needed.
> 
> You are already adding QTAILQ_RAW_*, aren't you?  Those interfaces
> would need to know about the layout, and you are passing redundant
> information:
> 
> - .next_offset should always be 0
> - .prev_offset should always be sizeof(void *)
> - .first_offset should always be 0
> - .last_offset should always be sizeof(void *)
> 
> so you only need head and entry, which you can store in .offset and
> .num_offset.  The .vmsd field you have in ->metadata can be stored
> in VMStateField's .vmsd, and likewise for .size (which can be stored
> in VMStateField's .size).
> 
> Thanks,
> 
> Paolo
>

Re: [Qemu-devel] [RFC v2 03/11] docs: new design document multi-thread-tcg.txt (DRAFTING)

2016-05-25 Thread Sergey Fedorov

On 25/05/16 21:03, Paolo Bonzini wrote:
>> The page table seems to be protected by 'mmap_lock' in user mode
>> emulation but by 'tb_lock' in system mode emulation. It may turn out to
>> be possible to read it safely even with no lock held.
> Yes, it is possible to at least follow the radix tree safely with no
> lock held.  The fields in the leaves can be either lockless or protected
> by a lock.
>
> The radix tree can be followed without a lock just like you do with RCU.
> The difference with RCU is that:
>
> 1) the leaves are protected with a lock, so you don't do the "copy";
> instead after reading you lock around updates
>
> 2) the radix tree is only ever added to, so you don't need to protect
> the reads with rcu_read_lock/rcu_read_unlock.  rcu_read_lock and
> rcu_read_unlock are only needed to inform the deleters that something
> cannot yet go away.  Without deleters, you don't need rcu_read_lock
> and rcu_read_unlock (but you still need atomic_rcu_read/atomic_rcu_set).
>
>

Yes, however looking closer at how the leaves are used, I can't see much
point in doing this so far...

Thanks,
Sergey

Re: [Qemu-devel] [RFC v2 03/11] docs: new design document multi-thread-tcg.txt (DRAFTING)

2016-05-25 Thread Paolo Bonzini

> The page table seems to be protected by 'mmap_lock' in user mode
> emulation but by 'tb_lock' in system mode emulation. It may turn out to
> be possible to read it safely even with no lock held.

Yes, it is possible to at least follow the radix tree safely with no
lock held.  The fields in the leaves can be either lockless or protected
by a lock.

The radix tree can be followed without a lock just like you do with RCU.
The difference with RCU is that:

1) the leaves are protected with a lock, so you don't do the "copy";
instead after reading you lock around updates

2) the radix tree is only ever added to, so you don't need to protect
the reads with rcu_read_lock/rcu_read_unlock.  rcu_read_lock and
rcu_read_unlock are only needed to inform the deleters that something
cannot yet go away.  Without deleters, you don't need rcu_read_lock
and rcu_read_unlock (but you still need atomic_rcu_read/atomic_rcu_set).
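
A minimal sketch of point 2, under several assumptions of mine: a two-level
table that is only ever added to, with C11 acquire loads standing in for
atomic_rcu_read and a compare-and-swap publish standing in for
atomic_rcu_set. The names Leaf, Level2, leaf_find and leaf_find_alloc are
hypothetical, and serializing writers with a lock instead of the
compare-and-swap would work equally well.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

enum { L2_BITS = 4, L1_SIZE = 16, L2_SIZE = 1 << L2_BITS };

typedef struct Leaf {
    int value;  /* in the scheme described, guarded by its own lock */
} Leaf;

typedef struct Level2 {
    Leaf leaves[L2_SIZE];
} Level2;

/* Top level: slots start out NULL and are filled in at most once. */
static _Atomic(Level2 *) l1_map[L1_SIZE];

/* Lockless reader: returns NULL if the slot has not been populated yet. */
static Leaf *leaf_find(unsigned index)
{
    Level2 *l2;

    assert(index < L1_SIZE * L2_SIZE);
    l2 = atomic_load_explicit(&l1_map[index >> L2_BITS], memory_order_acquire);
    return l2 ? &l2->leaves[index & (L2_SIZE - 1)] : NULL;
}

/* Populate on demand. Nodes are published once and never freed. */
static Leaf *leaf_find_alloc(unsigned index)
{
    _Atomic(Level2 *) *slot;
    Level2 *l2;

    assert(index < L1_SIZE * L2_SIZE);
    slot = &l1_map[index >> L2_BITS];
    l2 = atomic_load_explicit(slot, memory_order_acquire);
    if (!l2) {
        Level2 *expected = NULL;
        Level2 *fresh = calloc(1, sizeof(*fresh));

        if (!fresh) {
            return NULL;
        }
        if (atomic_compare_exchange_strong_explicit(slot, &expected, fresh,
                                                    memory_order_acq_rel,
                                                    memory_order_acquire)) {
            l2 = fresh;
        } else {
            free(fresh);   /* another thread published first */
            l2 = expected;
        }
    }
    return &l2->leaves[index & (L2_SIZE - 1)];
}
```

Because nothing is ever removed, a reader that sees a non-NULL pointer can
keep using it indefinitely, which is why no read-side critical section is
needed here.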

Paolo

Re: [Qemu-devel] [QEMU RFC PATCH v2 4/6] Migration: migrate QTAILQ

2016-05-25 Thread Paolo Bonzini

> >> +    /*
> >> +     * Following 3 fields are for VMStateField which needs customized
> >> +     * handling, such as QTAILQ in qemu/queue.h, lists, and tree.
> >> +     */
> >> +    const void *meta_data;
> >> +    int (*extend_get)(QEMUFile *f, const void *metadata, void *opaque);
> >> +    void (*extend_put)(QEMUFile *f, const void *metadata, void *opaque,
> >> +                       QJSON *vmdesc);
> >>  } VMStateField;
> > 
> > Do not add these two function pointers to VMStateField, instead add
> > QJSON* and VMStateField* arguments as needed to VMStateInfo's get and put.
> 
> That is definitely the ideal way. However, VMStateInfo's get/put are
> already used extensively. If I change them, I need to change all of
> their call sites. I am not sure whether that will be welcome.

Sure, don't be worried. :)  True, there are quite a few VMStateInfos (about
35) but the changes are simple.

> >> +#define VMSTATE_QTAILQ_METADATA(_field, _state, _type, _next, _vmsd) { \
> >> +.first = QTAILQ_FIRST_OFFSET(typeof_field(_state, _field)),\
> >> +.last =  QTAILQ_LAST_OFFSET(typeof_field(_state, _field)), \
> >> +.next = QTAILQ_NEXT_OFFSET(_type, _next),  \
> >> +.prev = QTAILQ_PREV_OFFSET(_type, _next),  \
> >> +.entry = offsetof(_type, _next),   \
> >> +.size = sizeof(_type), \
> >> +.vmsd = &(_vmsd),  \
> >> +}
> > 
> > .last and .prev are unnecessary, since they come just after .first and
> > .next (and their use is hidden behind QTAILQ_RAW_*).  .first and .next
  

Actually, .first and .next are always 0.  I misread your changes to
qemu/queue.h.  See below.

> > can be placed in .offset and .num_offset respectively.  So you don't
> > really need an extra metadata struct.
> > 
> > If you prefer you could have something like
> > 
> > union {
> >     size_t num_offset;    /* VMS_VARRAY */
> >     size_t size_offset;   /* VMS_VBUFFER */
> >     size_t next_offset;   /* VMS_TAILQ */
> > } offsets;
>
> Actually I explored the approach in which I use a VMSD to encode all the
> information. But a VMSD describes actual memory layout. Interpreting it
> another way could be confusing.
> 
> One of the assumptions about QTAILQ is that we can only use the
> interfaces defined in queue.h to access it. I intend not to depend on
> its actual layout. From this perspective, these 6 parameters are
> needed.

You are already adding QTAILQ_RAW_*, aren't you?  Those interfaces
would need to know about the layout, and you are passing redundant
information:

- .next_offset should always be 0
- .prev_offset should always be sizeof(void *)
- .first_offset should always be 0
- .last_offset should always be sizeof(void *)

so you only need head and entry, which you can store in .offset and
.num_offset.  The .vmsd field you have in ->metadata can be stored
in VMStateField's .vmsd, and likewise for .size (which can be stored
in VMStateField's .size).

Thanks,

Paolo

Re: [Qemu-devel] [PATCH v6 1/3] loader: Allow ELF loader to auto-detect the ELF arch

2016-05-25 Thread Alistair Francis

On Tue, May 24, 2016 at 3:08 PM, Cleber Rosa  wrote:
>
> On 05/13/2016 05:37 PM, Alistair Francis wrote:
>>
>>
>> +if (elf_machine < 1) {
>> +/* The caller didn't specify and ARCH, we can figure it out */
>
>
> Spotted a comment typo: s/and/an/

Thanks, sending a version 7 with it fixed.

Thanks,

Alistair

>
>
>> +        lseek(fd, 0x12, SEEK_SET);
>> +        if (read(fd, &e_machine, sizeof(e_machine)) != sizeof(e_machine)) {
>> +            goto fail;
>> +        }
>> +        elf_machine = e_machine;
>> +    }
>> +
>> +
>
>
> --
> Cleber Rosa
> [ Sr Software Engineer - Virtualization Team - Red Hat ]
> [ Avocado Test Framework - avocado-framework.github.io ]
>
