Re: [Qemu-devel] [PATCH 0/5] Initial write support for MTP objects

2017-09-13 Thread no-reply
Hi,

This series failed build test on s390x host. Please find the details below.

Message-id: 20170914053148.2512-1-...@redhat.com
Type: series
Subject: [Qemu-devel] [PATCH 0/5] Initial write support for MTP objects

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
set -e
echo "=== ENV ==="
env
echo "=== PACKAGES ==="
rpm -qa
echo "=== TEST BEGIN ==="
CC=$HOME/bin/cc
INSTALL=$PWD/install
BUILD=$PWD/build
echo -n "Using CC: "
realpath $CC
mkdir -p $BUILD $INSTALL
SRC=$PWD
cd $BUILD
$SRC/configure --cc=$CC --prefix=$INSTALL
make -j4
# XXX: we need reliable clean up
# make check -j4 V=1
make install
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]  
patchew/1505318697-77161-1-git-send-email-imamm...@redhat.com -> 
patchew/1505318697-77161-1-git-send-email-imamm...@redhat.com
 - [tag update]  patchew/1505328054-23805-1-git-send-email-th...@redhat.com 
-> patchew/1505328054-23805-1-git-send-email-th...@redhat.com
 - [tag update]  patchew/20170912203119.24166-1-ebl...@redhat.com -> 
patchew/20170912203119.24166-1-ebl...@redhat.com
 - [tag update]  patchew/20170913142036.2469-1-lviv...@redhat.com -> 
patchew/20170913142036.2469-1-lviv...@redhat.com
 - [tag update]  patchew/20170913221149.30382-1-f4...@amsat.org -> 
patchew/20170913221149.30382-1-f4...@amsat.org
 * [new tag] patchew/20170914053148.2512-1-...@redhat.com -> 
patchew/20170914053148.2512-1-...@redhat.com
Switched to a new branch 'test'
d18138a usb-mtp: Advertise SendObjectInfo for write support
af36a8b usb-mtp: Introduce write support for MTP objects
965ffc4 usb-mtp: Support delete of mtp objects
cc267c6 usb-mtp: print parent path in IN_IGNORED trace fn
481a4e2 usb-mtp: Add one more argument when building results

=== OUTPUT BEGIN ===
=== ENV ===
XDG_SESSION_ID=2311
SHELL=/bin/sh
USER=fam
PATCHEW=/home/fam/patchew/patchew-cli -s http://patchew.org --nodebug
PATH=/usr/bin:/bin
PWD=/var/tmp/patchew-tester-tmp-3hahtxvb/src
LANG=en_US.UTF-8
HOME=/home/fam
SHLVL=2
LOGNAME=fam
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1012/bus
XDG_RUNTIME_DIR=/run/user/1012
_=/usr/bin/env
=== PACKAGES ===
gpg-pubkey-873529b8-54e386ff
xz-libs-5.2.2-2.fc24.s390x
libxshmfence-1.2-3.fc24.s390x
giflib-4.1.6-15.fc24.s390x
trousers-lib-0.3.13-6.fc24.s390x
ncurses-base-6.0-6.20160709.fc25.noarch
gmp-6.1.1-1.fc25.s390x
libidn-1.33-1.fc25.s390x
slang-2.3.0-7.fc25.s390x
pkgconfig-0.29.1-1.fc25.s390x
alsa-lib-1.1.1-2.fc25.s390x
yum-metadata-parser-1.1.4-17.fc25.s390x
python3-slip-dbus-0.6.4-4.fc25.noarch
python2-cssselect-0.9.2-1.fc25.noarch
createrepo_c-libs-0.10.0-6.fc25.s390x
initscripts-9.69-1.fc25.s390x
parted-3.2-21.fc25.s390x
flex-2.6.0-3.fc25.s390x
colord-libs-1.3.4-1.fc25.s390x
python-osbs-client-0.33-3.fc25.noarch
perl-Pod-Simple-3.35-1.fc25.noarch
python2-simplejson-3.10.0-1.fc25.s390x
brltty-5.4-2.fc25.s390x
librados2-10.2.4-2.fc25.s390x
tcp_wrappers-7.6-83.fc25.s390x
libcephfs_jni1-10.2.4-2.fc25.s390x
nettle-devel-3.3-1.fc25.s390x
bzip2-devel-1.0.6-21.fc25.s390x
libuuid-2.28.2-2.fc25.s390x
python3-dnf-1.1.10-6.fc25.noarch
texlive-kpathsea-doc-svn41139-33.fc25.1.noarch
openssh-7.4p1-4.fc25.s390x
texlive-kpathsea-bin-svn40473-33.20160520.fc25.1.s390x
texlive-graphics-svn41015-33.fc25.1.noarch
texlive-dvipdfmx-def-svn40328-33.fc25.1.noarch
texlive-mfware-svn40768-33.fc25.1.noarch
texlive-texlive-scripts-svn41433-33.fc25.1.noarch
texlive-euro-svn22191.1.1-33.fc25.1.noarch
texlive-etex-svn37057.0-33.fc25.1.noarch
texlive-iftex-svn29654.0.2-33.fc25.1.noarch
texlive-palatino-svn31835.0-33.fc25.1.noarch
texlive-texlive-docindex-svn41430-33.fc25.1.noarch
texlive-xunicode-svn30466.0.981-33.fc25.1.noarch
texlive-koma-script-svn41508-33.fc25.1.noarch
texlive-pst-grad-svn15878.1.06-33.fc25.1.noarch
texlive-pst-blur-svn15878.2.0-33.fc25.1.noarch
texlive-jknapltx-svn19440.0-33.fc25.1.noarch
texinfo-6.1-4.fc25.s390x
openssl-devel-1.0.2k-1.fc25.s390x
jansson-2.10-2.fc25.s390x
fedora-repos-25-4.noarch
python3-libs-3.5.3-6.fc25.s390x
perl-Errno-1.25-387.fc25.s390x
acl-2.2.52-13.fc25.s390x
systemd-pam-231-17.fc25.s390x
NetworkManager-libnm-1.4.4-5.fc25.s390x
poppler-0.45.0-5.fc25.s390x
ccache-3.3.4-1.fc25.s390x
valgrind-3.12.0-9.fc25.s390x
perl-open-1.10-387.fc25.noarch
libgcc-6.4.1-1.fc25.s390x
libsoup-2.56.1-1.fc25.s390x
libstdc++-devel-6.4.1-1.fc25.s390x
nss-softokn-devel-3.31.0-1.0.fc25.s390x
libobjc-6.4.1-1.fc25.s390x
python2-rpm-4.13.0.1-2.fc25.s390x
python2-gluster-3.10.5-1.fc25.s390x
rpm-build-4.13.0.1-2.fc25.s390x
glibc-static-2.24-10.fc25.s390x
lz4-1.8.0-1.fc25.s390x
xapian-core-libs-1.2.24-1.fc25.s390x
elfutils-libelf-devel-0.169-1.fc25.s390x
libaio-0.3.110-6.fc24.s390x
libfontenc-1.1.3-3.fc24.s390x
lzo-2.08-8.fc24.s390x
isl-0.14-5.fc24.s390x
libXau-1.0.8-6.fc24.s390x
linux-atm-libs-2.5.1-14.fc24.s390x
libXext-1.3.3-4.fc24.s390x

Re: [Qemu-devel] [PATCH] dump: do not dump non-existent guest memory

2017-09-13 Thread Peter Xu
On Wed, Sep 13, 2017 at 09:58:04AM +0200, Cornelia Huck wrote:
> On Tue, 12 Sep 2017 12:17:51 +0200
> Marc-André Lureau  wrote:
> 
> > Hi Peter
> > 
> > On Tue, Sep 12, 2017 at 5:33 AM, Peter Xu  wrote:
> > > On Mon, Sep 11, 2017 at 03:26:27PM +0200, Cornelia Huck wrote:  
> > >> It does not really make sense to dump memory that is not there.
> > >>
> > >> Moreover, that fixes a segmentation fault when calling dump-guest-memory
> > >> with no filter for a machine with no memory defined.
> > >>
> > >> New behaviour is:
> > >>
> > >> (qemu) dump-guest-memory /dev/null
> > >> dump: no guest memory to dump
> > >> (qemu) dump-guest-memory /dev/null 0 4096
> > >> dump: no guest memory to dump
> > >>
> > >> Signed-off-by: Cornelia Huck 
> > >> ---
> > >>
> > >> Another unmaintained file. Joy. cc:ing some more-or-less random folks.  
> > >
> > > I thought Marc-André had proposed to be the maintainer? But indeed I
> > > didn't see the line in maintainer file.
> > >
> > > Anyway, if anyone think I am ok to maintain this single file
> > > (considering that I haven't posted patches on it for 2 years, and I
> > > haven't been working on any kind of maintainer job), please let me
> > > know. I'm glad to start with it (or with Marc-André).
> > >  
> > 
> > That would be great!
> > thanks
> 
> Now on to the most important question: Will you merge this patch? :)
> 
> (It should probably go together with "tests/hmp: test "none" machine
> with memory".)

I posted a RFC to maintain this file with Marc-André Lureau, but not
getting any acks yet, so I guess at least I don't have that
permission, yet...

And I see Laruent posted this already:

  [Qemu-devel] [PATCH v4 0/4] hmp: fix "dump-quest-memory" segfault

So I assume all these four patches can go with Dave's tree.  Thanks,

-- 
Peter Xu



[Qemu-devel] [PATCH 1/5] usb-mtp: Add one more argument when building results

2017-09-13 Thread Bandan Das
From: Bandan Das 

The response to a SendObjectInfo consists of the storageid,
parent obejct handle anad the handle reserved for the new
incoming object

Signed-off-by: Bandan Das 
---
 hw/usb/dev-mtp.c | 50 +++---
 1 file changed, 27 insertions(+), 23 deletions(-)

diff --git a/hw/usb/dev-mtp.c b/hw/usb/dev-mtp.c
index 94c2e94..b55aa82 100644
--- a/hw/usb/dev-mtp.c
+++ b/hw/usb/dev-mtp.c
@@ -765,7 +765,8 @@ static void usb_mtp_add_time(MTPData *data, time_t time)
 /* --- */
 
 static void usb_mtp_queue_result(MTPState *s, uint16_t code, uint32_t trans,
- int argc, uint32_t arg0, uint32_t arg1)
+ int argc, uint32_t arg0, uint32_t arg1,
+ uint32_t arg2)
 {
 MTPControl *c = g_new0(MTPControl, 1);
 
@@ -778,6 +779,9 @@ static void usb_mtp_queue_result(MTPState *s, uint16_t 
code, uint32_t trans,
 if (argc > 1) {
 c->argv[1] = arg1;
 }
+if (argc > 2) {
+c->argv[2] = arg2;
+}
 
 assert(s->result == NULL);
 s->result = c;
@@ -1119,7 +1123,7 @@ static void usb_mtp_command(MTPState *s, MTPControl *c)
 /* sanity checks */
 if (c->code >= CMD_CLOSE_SESSION && s->session == 0) {
 usb_mtp_queue_result(s, RES_SESSION_NOT_OPEN,
- c->trans, 0, 0, 0);
+ c->trans, 0, 0, 0, 0);
 return;
 }
 
@@ -1131,12 +1135,12 @@ static void usb_mtp_command(MTPState *s, MTPControl *c)
 case CMD_OPEN_SESSION:
 if (s->session) {
 usb_mtp_queue_result(s, RES_SESSION_ALREADY_OPEN,
- c->trans, 1, s->session, 0);
+ c->trans, 1, s->session, 0, 0);
 return;
 }
 if (c->argv[0] == 0) {
 usb_mtp_queue_result(s, RES_INVALID_PARAMETER,
- c->trans, 0, 0, 0);
+ c->trans, 0, 0, 0, 0);
 return;
 }
 trace_usb_mtp_op_open_session(s->dev.addr);
@@ -1165,7 +1169,7 @@ static void usb_mtp_command(MTPState *s, MTPControl *c)
 if (c->argv[0] != QEMU_STORAGE_ID &&
 c->argv[0] != 0x) {
 usb_mtp_queue_result(s, RES_INVALID_STORAGE_ID,
- c->trans, 0, 0, 0);
+ c->trans, 0, 0, 0, 0);
 return;
 }
 data_in = usb_mtp_get_storage_info(s, c);
@@ -1175,12 +1179,12 @@ static void usb_mtp_command(MTPState *s, MTPControl *c)
 if (c->argv[0] != QEMU_STORAGE_ID &&
 c->argv[0] != 0x) {
 usb_mtp_queue_result(s, RES_INVALID_STORAGE_ID,
- c->trans, 0, 0, 0);
+ c->trans, 0, 0, 0, 0);
 return;
 }
 if (c->argv[1] != 0x) {
 usb_mtp_queue_result(s, RES_SPEC_BY_FORMAT_UNSUPPORTED,
- c->trans, 0, 0, 0);
+ c->trans, 0, 0, 0, 0);
 return;
 }
 if (c->argv[2] == 0x ||
@@ -1191,12 +1195,12 @@ static void usb_mtp_command(MTPState *s, MTPControl *c)
 }
 if (o == NULL) {
 usb_mtp_queue_result(s, RES_INVALID_OBJECT_HANDLE,
- c->trans, 0, 0, 0);
+ c->trans, 0, 0, 0, 0);
 return;
 }
 if (o->format != FMT_ASSOCIATION) {
 usb_mtp_queue_result(s, RES_INVALID_PARENT_OBJECT,
- c->trans, 0, 0, 0);
+ c->trans, 0, 0, 0, 0);
 return;
 }
 usb_mtp_object_readdir(s, o);
@@ -1212,7 +1216,7 @@ static void usb_mtp_command(MTPState *s, MTPControl *c)
 o = usb_mtp_object_lookup(s, c->argv[0]);
 if (o == NULL) {
 usb_mtp_queue_result(s, RES_INVALID_OBJECT_HANDLE,
- c->trans, 0, 0, 0);
+ c->trans, 0, 0, 0, 0);
 return;
 }
 data_in = usb_mtp_get_object_info(s, c, o);
@@ -1221,18 +1225,18 @@ static void usb_mtp_command(MTPState *s, MTPControl *c)
 o = usb_mtp_object_lookup(s, c->argv[0]);
 if (o == NULL) {
 usb_mtp_queue_result(s, RES_INVALID_OBJECT_HANDLE,
- c->trans, 0, 0, 0);
+ c->trans, 0, 0, 0, 0);
 return;
 }
 if (o->format == FMT_ASSOCIATION) {
 usb_mtp_queue_result(s, RES_INVALID_OBJECT_HANDLE,
- c->trans, 0, 0, 0);
+ c->trans, 0, 0, 0, 0);
 return;
 }
 data_in = usb_mtp_get_object(s, c, o);

[Qemu-devel] [PATCH 4/5] usb-mtp: Introduce write support for MTP objects

2017-09-13 Thread Bandan Das
From: Bandan Das 

Allow write operations on behalf of the initiator. The
precursor to write is the sending of the write metadata
that consists of the ObjectInfo dataset. This patch introduces
a flag that is set when the responder is ready to receive
write data based on a previous SendObjectInfo operation by
the initiator (The SendObjectInfo implementation is in a
later patch)

Signed-off-by: Bandan Das 
---
 hw/usb/dev-mtp.c | 159 ++-
 1 file changed, 157 insertions(+), 2 deletions(-)

diff --git a/hw/usb/dev-mtp.c b/hw/usb/dev-mtp.c
index 76494f3..2ff9f0f 100644
--- a/hw/usb/dev-mtp.c
+++ b/hw/usb/dev-mtp.c
@@ -47,6 +47,7 @@ enum mtp_code {
 CMD_GET_OBJECT_INFO= 0x1008,
 CMD_GET_OBJECT = 0x1009,
 CMD_DELETE_OBJECT  = 0x100b,
+CMD_SEND_OBJECT= 0x100d,
 CMD_GET_PARTIAL_OBJECT = 0x101b,
 CMD_GET_OBJECT_PROPS_SUPPORTED = 0x9801,
 CMD_GET_OBJECT_PROP_DESC   = 0x9802,
@@ -66,7 +67,9 @@ enum mtp_code {
 RES_STORE_READ_ONLY= 0x200e,
 RES_PARTIAL_DELETE = 0x2012,
 RES_SPEC_BY_FORMAT_UNSUPPORTED = 0x2014,
+RES_INVALID_OBJECTINFO = 0x2015,
 RES_INVALID_PARENT_OBJECT  = 0x201a,
+RES_STORE_FULL = 0x200c,
 RES_INVALID_PARAMETER  = 0x201d,
 RES_SESSION_ALREADY_OPEN   = 0x201e,
 RES_INVALID_OBJECT_PROP_CODE   = 0xA801,
@@ -182,6 +185,14 @@ struct MTPState {
 int  inotifyfd;
 QTAILQ_HEAD(events, MTPMonEntry) events;
 #endif
+/* Responder is expecting a write operation */
+bool write_pending;
+struct {
+uint32_t parent_handle;
+uint16_t format;
+uint32_t size;
+char *filename;
+} dataset;
 };
 
 #define TYPE_USB_MTP "usb-mtp"
@@ -803,6 +814,7 @@ static MTPData *usb_mtp_get_device_info(MTPState *s, 
MTPControl *c)
 CMD_GET_OBJECT_HANDLES,
 CMD_GET_OBJECT_INFO,
 CMD_DELETE_OBJECT,
+CMD_SEND_OBJECT,
 CMD_GET_OBJECT,
 CMD_GET_PARTIAL_OBJECT,
 CMD_GET_OBJECT_PROPS_SUPPORTED,
@@ -1381,6 +1393,14 @@ static void usb_mtp_command(MTPState *s, MTPControl *c)
 nres = 1;
 res0 = data_in->length;
 break;
+case CMD_SEND_OBJECT:
+if (!s->write_pending) {
+usb_mtp_queue_result(s, RES_INVALID_OBJECTINFO,
+ c->trans, 0, 0, 0, 0);
+return;
+}
+s->data_out = usb_mtp_data_alloc(c);
+return;
 case CMD_GET_OBJECT_PROPS_SUPPORTED:
 if (c->argv[0] != FMT_UNDEFINED_OBJECT &&
 c->argv[0] != FMT_ASSOCIATION) {
@@ -1475,12 +1495,133 @@ static void usb_mtp_cancel_packet(USBDevice *dev, 
USBPacket *p)
 fprintf(stderr, "%s\n", __func__);
 }
 
+mode_t getumask(void)
+{
+mode_t mask = umask(0);
+umask(mask);
+return mask;
+}
+
+static void usb_mtp_write_data(MTPState *s)
+{
+MTPData *d = s->data_out;
+MTPObject *parent =
+usb_mtp_object_lookup(s, s->dataset.parent_handle);
+char *path;
+int rc = -1;
+mode_t mask = ~getumask() & 0777;
+
+assert(d != NULL);
+
+if (parent == NULL || !s->write_pending) {
+usb_mtp_queue_result(s, RES_INVALID_OBJECTINFO, d->trans,
+ 0, 0, 0, 0);
+return;
+}
+
+if (s->dataset.filename) {
+path = g_strdup_printf("%s/%s", parent->path, s->dataset.filename);
+if (s->dataset.format == FMT_ASSOCIATION) {
+d->fd = mkdir(path, mask);
+goto free;
+}
+if (s->dataset.size < d->length) {
+usb_mtp_queue_result(s, RES_STORE_FULL, d->trans,
+ 0, 0, 0, 0);
+goto done;
+}
+d->fd = open(path, O_CREAT | O_WRONLY, mask);
+if (d->fd == -1) {
+usb_mtp_queue_result(s, RES_STORE_FULL, d->trans,
+ 0, 0, 0, 0);
+goto done;
+}
+
+/*
+ * Return success if initiator sent 0 sized data
+ */
+if (!s->dataset.size) {
+goto success;
+}
+
+rc = write(d->fd, d->data, s->dataset.size);
+if (rc == -1) {
+usb_mtp_queue_result(s, RES_STORE_FULL, d->trans,
+ 0, 0, 0, 0);
+goto done;
+}
+if (rc != s->dataset.size) {
+usb_mtp_queue_result(s, RES_INCOMPLETE_TRANSFER, d->trans,
+ 0, 0, 0, 0);
+goto done;
+}
+}
+
+success:
+usb_mtp_queue_result(s, RES_OK, d->trans,
+ 0, 0, 0, 0);
+
+done:
+/*
+ * The write dataset is kept around and freed only
+ * on success or if another write request comes in
+ */
+if (d->fd != -1) {
+close(d->fd);
+}
+free:
+

[Qemu-devel] [PATCH 2/5] usb-mtp: print parent path in IN_IGNORED trace fn

2017-09-13 Thread Bandan Das
From: Bandan Das 

Fix a possible null dereference when deleting a folder and
its contents. An ignored event might be received for its contents
after the parent folder is deleted which will return a null object.

Signed-off-by: Bandan Das 
---
 hw/usb/dev-mtp.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/usb/dev-mtp.c b/hw/usb/dev-mtp.c
index b55aa82..63f8f3b 100644
--- a/hw/usb/dev-mtp.c
+++ b/hw/usb/dev-mtp.c
@@ -540,9 +540,8 @@ static void inotify_watchfn(void *arg)
 break;
 
 case IN_IGNORED:
-o = usb_mtp_object_lookup_name(parent, event->name, 
event->len);
-trace_usb_mtp_inotify_event(s->dev.addr, o->path,
-  event->mask, "Obj ignored");
+trace_usb_mtp_inotify_event(s->dev.addr, parent->path,
+  event->mask, "Obj parent dir ignored");
 break;
 
 default:
-- 
2.9.4




[Qemu-devel] [PATCH 0/5] Initial write support for MTP objects

2017-09-13 Thread Bandan Das
These patches implement write support for Qemu's MTP
emulation. Simple tests such as delete/move/edit/copy work ok.
Current issues/TODO:

 - File transfers > 4GB has not been tested and will probably not work
 - Some (or most) MTP clients don't advertise hidden files and folders (names
 that start with a .) even though iiuc Qemu MTP does advertise these files.
 This can confuse certain applications such as text editors or git.
 - Also related, file editors typically run fsync when saving. Depending on
 the MTP client, it may choose not to implement it (such as simple-mtpfs that 
runs
 on top of fuse).
 - Needs more testing :)


Bandan Das (5):
  usb-mtp: Add one more argument when building results
  usb-mtp: print parent path in IN_IGNORED trace fn
  usb-mtp: Support delete of mtp objects
  usb-mtp: Introduce write support for MTP objects
  usb-mtp: Advertise SendObjectInfo for write support

 hw/usb/dev-mtp.c | 449 +++
 1 file changed, 421 insertions(+), 28 deletions(-)

-- 
2.9.4




[Qemu-devel] [PATCH 5/5] usb-mtp: Advertise SendObjectInfo for write support

2017-09-13 Thread Bandan Das
From: Bandan Das 

This patch implements a dummy ObjectInfo structure so that
it's easy to typecast the incoming data. If the metadata is
valid, write_pending is set. Also, the incoming filename
is utf-16, so, instead of depending on external libraries, just
implement a simple function to get the filename

Signed-off-by: Bandan Das 
---
 hw/usb/dev-mtp.c | 114 +++
 1 file changed, 114 insertions(+)

diff --git a/hw/usb/dev-mtp.c b/hw/usb/dev-mtp.c
index 2ff9f0f..e5152db 100644
--- a/hw/usb/dev-mtp.c
+++ b/hw/usb/dev-mtp.c
@@ -47,6 +47,7 @@ enum mtp_code {
 CMD_GET_OBJECT_INFO= 0x1008,
 CMD_GET_OBJECT = 0x1009,
 CMD_DELETE_OBJECT  = 0x100b,
+CMD_SEND_OBJECT_INFO   = 0x100c,
 CMD_SEND_OBJECT= 0x100d,
 CMD_GET_PARTIAL_OBJECT = 0x101b,
 CMD_GET_OBJECT_PROPS_SUPPORTED = 0x9801,
@@ -66,8 +67,10 @@ enum mtp_code {
 RES_INVALID_OBJECT_FORMAT_CODE = 0x200b,
 RES_STORE_READ_ONLY= 0x200e,
 RES_PARTIAL_DELETE = 0x2012,
+RES_STORE_NOT_AVAILABLE= 0x2013,
 RES_SPEC_BY_FORMAT_UNSUPPORTED = 0x2014,
 RES_INVALID_OBJECTINFO = 0x2015,
+RES_DESTINATION_UNSUPPORTED= 0x2020,
 RES_INVALID_PARENT_OBJECT  = 0x201a,
 RES_STORE_FULL = 0x200c,
 RES_INVALID_PARAMETER  = 0x201d,
@@ -195,6 +198,24 @@ struct MTPState {
 } dataset;
 };
 
+/*
+ * ObjectInfo dataset received from initiator
+ * Fields we don't care about are ignored
+ */
+typedef struct {
+char __pad1[4];
+uint16_t format;
+char __pad2[2];
+uint32_t size;
+char __pad3[30];
+uint16_t assoc_type;
+uint32_t assoc_desc;
+char __pad4[4];
+uint8_t length;
+uint16_t filename[0];
+/* string and other data follows */
+} QEMU_PACKED ObjectInfo;
+
 #define TYPE_USB_MTP "usb-mtp"
 #define USB_MTP(obj) OBJECT_CHECK(MTPState, (obj), TYPE_USB_MTP)
 
@@ -814,6 +835,7 @@ static MTPData *usb_mtp_get_device_info(MTPState *s, 
MTPControl *c)
 CMD_GET_OBJECT_HANDLES,
 CMD_GET_OBJECT_INFO,
 CMD_DELETE_OBJECT,
+CMD_SEND_OBJECT_INFO,
 CMD_SEND_OBJECT,
 CMD_GET_OBJECT,
 CMD_GET_PARTIAL_OBJECT,
@@ -1393,6 +1415,35 @@ static void usb_mtp_command(MTPState *s, MTPControl *c)
 nres = 1;
 res0 = data_in->length;
 break;
+case CMD_SEND_OBJECT_INFO:
+/* First parameter points to storage id or is 0 */
+if (c->argv[0] && (c->argv[0] != QEMU_STORAGE_ID)) {
+usb_mtp_queue_result(s, RES_STORE_NOT_AVAILABLE, c->trans,
+ 0, 0, 0, 0);
+} else if (c->argv[1] && !c->argv[0]) {
+/* If second parameter is specified, first must also be specified 
*/
+usb_mtp_queue_result(s, RES_DESTINATION_UNSUPPORTED, c->trans,
+ 0, 0, 0, 0);
+} else {
+uint32_t handle = c->argv[1];
+if (handle == 0x || handle == 0) {
+/* root object */
+o = QTAILQ_FIRST(>objects);
+} else {
+o = usb_mtp_object_lookup(s, handle);
+}
+if (o == NULL) {
+usb_mtp_queue_result(s, RES_INVALID_OBJECT_HANDLE, c->trans,
+ 0, 0, 0, 0);
+}
+if (o->format != FMT_ASSOCIATION) {
+usb_mtp_queue_result(s, RES_INVALID_PARENT_OBJECT, c->trans,
+ 0, 0, 0, 0);
+}
+}
+s->dataset.parent_handle = o->handle;
+s->data_out = usb_mtp_data_alloc(c);
+return;
 case CMD_SEND_OBJECT:
 if (!s->write_pending) {
 usb_mtp_queue_result(s, RES_INVALID_OBJECTINFO,
@@ -1502,6 +1553,17 @@ mode_t getumask(void)
 return mask;
 }
 
+static void utf16_to_str(uint8_t len, uint16_t *arr, char *name)
+{
+int count;
+
+for (count = 0; count < len; count++) {
+/* Check for valid ascii */
+assert(!(arr[count] & 0xFF80));
+name[count] = arr[count];
+}
+}
+
 static void usb_mtp_write_data(MTPState *s)
 {
 MTPData *d = s->data_out;
@@ -1575,6 +1637,45 @@ free:
 s->write_pending = false;
 }
 
+static void usb_mtp_write_metadata(MTPState *s)
+{
+MTPData *d = s->data_out;
+ObjectInfo *dataset = (ObjectInfo *)d->data;
+char *filename = g_new0(char, dataset->length);
+MTPObject *o;
+MTPObject *p = usb_mtp_object_lookup(s, s->dataset.parent_handle);
+uint32_t next_handle = s->next_handle;
+
+assert(!s->write_pending);
+
+utf16_to_str(dataset->length, dataset->filename, filename);
+
+o = usb_mtp_object_lookup_name(p, filename, dataset->length);
+if (o != NULL) {
+next_handle = o->handle;
+}
+
+s->dataset.filename = filename;
+s->dataset.format = 

[Qemu-devel] [PATCH 3/5] usb-mtp: Support delete of mtp objects

2017-09-13 Thread Bandan Das
From: Bandan Das 

This is required because write of existing objects by the
initiator is acheived by making a temporary buffer with the
new changes, deleting the old file and then writing a new file
with the same name.

Note that this operation will fail as of this patch
since the store is advertised RO

Signed-off-by: Bandan Das 
---
 hw/usb/dev-mtp.c | 121 +++
 1 file changed, 121 insertions(+)

diff --git a/hw/usb/dev-mtp.c b/hw/usb/dev-mtp.c
index 63f8f3b..76494f3 100644
--- a/hw/usb/dev-mtp.c
+++ b/hw/usb/dev-mtp.c
@@ -46,6 +46,7 @@ enum mtp_code {
 CMD_GET_OBJECT_HANDLES = 0x1007,
 CMD_GET_OBJECT_INFO= 0x1008,
 CMD_GET_OBJECT = 0x1009,
+CMD_DELETE_OBJECT  = 0x100b,
 CMD_GET_PARTIAL_OBJECT = 0x101b,
 CMD_GET_OBJECT_PROPS_SUPPORTED = 0x9801,
 CMD_GET_OBJECT_PROP_DESC   = 0x9802,
@@ -62,6 +63,8 @@ enum mtp_code {
 RES_INVALID_STORAGE_ID = 0x2008,
 RES_INVALID_OBJECT_HANDLE  = 0x2009,
 RES_INVALID_OBJECT_FORMAT_CODE = 0x200b,
+RES_STORE_READ_ONLY= 0x200e,
+RES_PARTIAL_DELETE = 0x2012,
 RES_SPEC_BY_FORMAT_UNSUPPORTED = 0x2014,
 RES_INVALID_PARENT_OBJECT  = 0x201a,
 RES_INVALID_PARAMETER  = 0x201d,
@@ -799,6 +802,7 @@ static MTPData *usb_mtp_get_device_info(MTPState *s, 
MTPControl *c)
 CMD_GET_NUM_OBJECTS,
 CMD_GET_OBJECT_HANDLES,
 CMD_GET_OBJECT_INFO,
+CMD_DELETE_OBJECT,
 CMD_GET_OBJECT,
 CMD_GET_PARTIAL_OBJECT,
 CMD_GET_OBJECT_PROPS_SUPPORTED,
@@ -1113,6 +1117,120 @@ static MTPData *usb_mtp_get_object_prop_value(MTPState 
*s, MTPControl *c,
 return d;
 }
 
+/* Return correct return code for a delete event */
+enum {
+ALL_DELETE,
+PARTIAL_DELETE,
+READ_ONLY,
+};
+
+#ifndef CONFIG_INOTIFY1
+/* Assumes that children, if any, have been already freed */
+static void usb_mtp_object_free_one(MTPState *s, MTPObject *o)
+{
+assert(o->nchildren == 0);
+QTAILQ_REMOVE(>objects, o, next);
+g_free(o->name);
+g_free(o->path);
+g_free(o);
+}
+#endif
+
+static int usb_mtp_deletefn(MTPState *s, MTPObject *o, uint32_t trans)
+{
+MTPObject *iter, *iter2;
+bool partial_delete = false;
+bool success = false;
+
+/*
+ * TODO: Add support for Protection Status
+ */
+
+QLIST_FOREACH(iter, >children, list) {
+if (iter->format == FMT_ASSOCIATION) {
+QLIST_FOREACH(iter2, >children, list) {
+usb_mtp_deletefn(s, iter2, trans);
+}
+}
+}
+
+if (o->format == FMT_UNDEFINED_OBJECT) {
+if (remove(o->path)) {
+partial_delete = true;
+} else {
+#ifndef CONFIG_INOTIFY1
+usb_mtp_object_free_one(s, o);
+#endif
+success = true;
+}
+}
+
+if (o->format == FMT_ASSOCIATION) {
+if (rmdir(o->path)) {
+partial_delete = true;
+} else {
+#ifndef CONFIG_INOTIFY1
+usb_mtp_object_free_one(s, o);
+#endif
+success = true;
+}
+}
+
+if (success && partial_delete) {
+return PARTIAL_DELETE;
+}
+if (!success && partial_delete) {
+return READ_ONLY;
+}
+return ALL_DELETE;
+}
+
+static void usb_mtp_object_delete(MTPState *s, uint32_t handle,
+  uint32_t format_code, uint32_t trans)
+{
+MTPObject *o;
+int ret;
+
+/* Return error if store is read-only */
+if (!FLAG_SET(s, MTP_FLAG_WRITABLE)) {
+usb_mtp_queue_result(s, RES_STORE_READ_ONLY,
+ trans, 0, 0, 0, 0);
+return;
+}
+
+if (format_code != 0) {
+usb_mtp_queue_result(s, RES_SPEC_BY_FORMAT_UNSUPPORTED,
+ trans, 0, 0, 0, 0);
+return;
+}
+
+if (handle == 0xFFF) {
+o = QTAILQ_FIRST(>objects);
+} else {
+o = usb_mtp_object_lookup(s, handle);
+}
+if (o == NULL) {
+usb_mtp_queue_result(s, RES_INVALID_OBJECT_HANDLE,
+ trans, 0, 0, 0, 0);
+return;
+}
+
+ret = usb_mtp_deletefn(s, o, trans);
+if (ret == PARTIAL_DELETE) {
+usb_mtp_queue_result(s, RES_PARTIAL_DELETE,
+ trans, 0, 0, 0, 0);
+return;
+} else if (ret == READ_ONLY) {
+usb_mtp_queue_result(s, RES_STORE_READ_ONLY, trans,
+ 0, 0, 0, 0);
+return;
+} else {
+usb_mtp_queue_result(s, RES_OK, trans,
+ 0, 0, 0, 0);
+return;
+}
+}
+
 static void usb_mtp_command(MTPState *s, MTPControl *c)
 {
 MTPData *data_in = NULL;
@@ -1239,6 +1357,9 @@ static void usb_mtp_command(MTPState *s, MTPControl *c)
 return;
 }
 break;
+case CMD_DELETE_OBJECT:
+

Re: [Qemu-devel] [PATCH] ppc/pnv: fix cores per chip for multiple cpus

2017-09-13 Thread Nikunj A Dadhania
David Gibson  writes:

> On Mon, Sep 11, 2017 at 10:40:10AM +0530, Nikunj A Dadhania wrote:
>> David Gibson  writes:
>> 
>> > On Wed, Sep 06, 2017 at 01:57:48PM +0530, Nikunj A Dadhania wrote:
>> >> When the user does not provide the cpu topology, e.g. "-smp 4", machine 
>> >> fails to
>> >> initialize 4 cpus. Compute the chip per cores depending on the number of 
>> >> chips
>> >> and smt threads.
>> >> 
>> >> Signed-off-by: Nikunj A Dadhania 
>> >
>> > I don't understand why simply treating smp_cores as cores per chip is 
>> > wrong.
>> 
>> We do not have SMT support and when "-smp 4" is passed, smp_cores is
>> always set to 1. So only once core with one thread finally show up in
>> the guest. Moreover, I see spapr too doing similar thing in
>> spapr_init_cpus() with boot_cores_nr.
>
> I'm ok with adding an error if smp_threads > 1, since that won't
> work.  Breaking the identity that smp_cores is the number of cores per
> socket (which should correspond to one chip) is not ok, though.
>
> I think you're misinterpreting the boot_cores_nr stuff - that's just
> about translating the number of initially online vcpus into a number
> of initially online cores.

I thought, I am doing the same here for PowerNV, number of online cores
is equal to initial online vcpus / threads per core

   int boot_cores_nr = smp_cpus / smp_threads;

Only difference that I see in PowerNV is that we have multiple chips
(max 2, at the moment)

cores_per_chip = smp_cpus / (smp_threads * pnv->num_chips);

And in case user has provided sane smp_cores, we use it.

Regards,
Nikunj




Re: [Qemu-devel] [PATCH] accel/hax: move hax-stub.c to accel/stubs/

2017-09-13 Thread Philippe Mathieu-Daudé
>> --- a/accel/stubs/Makefile.objs
>> +++ b/accel/stubs/Makefile.objs
>> @@ -1,2 +1,3 @@
>>  obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
>>  obj-$(call lnot,$(CONFIG_TCG)) += tcg-stub.o
>> +obj-$(call lnot,$(CONFIG_HAX)) += hax-stub.o
>
> Reviewed-by: Stefan Weil 

Thanks!

>
> Personally, I like alphabetic order for such lines.
> Maybe that can be done when it is applied.

Fine by me ;)

Richard if you plan to take this, tell me if you prefer another patch
alphabetically sorted.

Regards,

Phil.



Re: [Qemu-devel] [Qemu devel v8 PATCH 2/5] msf2: Microsemi Smartfusion2 System Register block

2017-09-13 Thread Philippe Mathieu-Daudé

Hi Sundeep,

On 09/07/2017 04:24 PM, Subbaraya Sundeep wrote:

Added Sytem register block of Smartfusion2.
This block has PLL registers which are accessed by guest.

Signed-off-by: Subbaraya Sundeep 
Reviewed-by: Alistair Francis 
---
  hw/misc/Makefile.objs |   1 +
  hw/misc/msf2-sysreg.c | 195 ++
  include/hw/misc/msf2-sysreg.h |  78 +
  3 files changed, 274 insertions(+)
  create mode 100644 hw/misc/msf2-sysreg.c
  create mode 100644 include/hw/misc/msf2-sysreg.h

diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index 29fb922..e8f0a02 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -59,3 +59,4 @@ obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
  obj-$(CONFIG_AUX) += auxbus.o
  obj-$(CONFIG_ASPEED_SOC) += aspeed_scu.o aspeed_sdmc.o
  obj-y += mmio_interface.o
+obj-$(CONFIG_MSF2) += msf2-sysreg.o
diff --git a/hw/misc/msf2-sysreg.c b/hw/misc/msf2-sysreg.c
new file mode 100644
index 000..40d603d
--- /dev/null
+++ b/hw/misc/msf2-sysreg.c
@@ -0,0 +1,195 @@
+/*
+ * System Register block model of Microsemi SmartFusion2.
+ *
+ * Copyright (c) 2017 Subbaraya Sundeep 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/misc/msf2-sysreg.h"
+
+#ifndef MSF2_SYSREG_ERR_DEBUG
+#define MSF2_SYSREG_ERR_DEBUG  0
+#endif
+
+#define DB_PRINT_L(lvl, fmt, args...) do { \
+if (MSF2_SYSREG_ERR_DEBUG >= lvl) { \
+qemu_log("%s: " fmt "\n", __func__, ## args); \
+} \
+} while (0);
+
+#define DB_PRINT(fmt, args...) DB_PRINT_L(1, fmt, ## args)
+
+static inline int msf2_divbits(uint32_t div)


Please directly use ctz32() instead of msf2_divbits()


+{
+int ret = 0;
+
+switch (div) {
+case 1:
+ret = 0;
+break;
+case 2:
+ret = 1;
+break;
+case 4:
+ret = 2;
+break;
+case 8:
+ret = 4;
+break;
+case 16:
+ret = 5;
+break;
+case 32:
+ret = 6;
+break;
+default:
+break;
+}
+
+return ret;
+}
+
+static void msf2_sysreg_reset(DeviceState *d)
+{
+MSF2SysregState *s = MSF2_SYSREG(d);
+
+DB_PRINT("RESET");
+
+s->regs[MSSDDR_PLL_STATUS_LOW_CR] = 0x021A2358;
+s->regs[MSSDDR_PLL_STATUS] = 0x3;
+s->regs[MSSDDR_FACC1_CR] = msf2_divbits(s->apb0div) << 5 |
+   msf2_divbits(s->apb1div) << 2;
+}
+
+static uint64_t msf2_sysreg_read(void *opaque, hwaddr offset,
+unsigned size)
+{
+MSF2SysregState *s = opaque;
+uint32_t ret = 0;
+
+offset >>= 2;
+if (offset < ARRAY_SIZE(s->regs)) {


This comment is controversial, I'll let Peter nod.

The SYSREG behaves differently regarding which bus access it (CPU, AHB).
You are implementing CPU access to the SYSREG, the registers have 
different permissions when accessed by the CPU. (see the SmartFusion2 
User Guide: Table 21-1 "Register Types" and Table 21-2 "Register Map").


I'd think of this stub:

switch(reg) {
case register_supported1:
case register_supported2:
case register_supported3:
   ret = s->regs[offset];
   trace_msf2_sysreg_read(offset, ret);
   break;
case RO-U:
   qemu_log_mask(LOG_GUEST_ERROR, "Illegal AHB access 0x%08"...
   break;
case W1P:
   qemu_log_mask(LOG_GUEST_ERROR, "Illegal read access ...
   break;
case RW:
case RW-P:
case RO:
case RO-P:
default:
   ret = s->regs[offset];
   qemu_log_mask(LOG_UNIMP, "...
   break;
}

Anyway keep this comment for further improvements, it's probably not 
useful for your use case.



+ret = s->regs[offset];
+DB_PRINT("addr: 0x%08" HWADDR_PRIx " data: 0x%08" PRIx32,
+offset << 2, ret);


Can you replace DB_PRINT by traces? such:

   trace_msf2_sysreg_read(...);


+} else {
+qemu_log_mask(LOG_GUEST_ERROR,
+"%s: Bad offset 0x%08" HWADDR_PRIx "\n", __func__,
+offset << 2);
+}
+
+return ret;
+}
+
+static void msf2_sysreg_write(void *opaque, hwaddr offset,
+  uint64_t val, unsigned size)
+{
+MSF2SysregState *s = opaque;
+uint32_t newval = val;
+
+DB_PRINT("addr: 0x%08" HWADDR_PRIx " data: 0x%08" PRIx64,
+offset, val);
+
+offset >>= 2;
+
+switch (offset) {
+case MSSDDR_PLL_STATUS:

   trace_msf2_sysreg_write_pll_status();


+break;
+
+case ESRAM_CR:
+if (newval != 

[Qemu-devel] [Bug 1037675] Re: Guest Kernel Panic if using "-cpu host" in qemu-kvm 1.1.1

2017-09-13 Thread Launchpad Bug Tracker
[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1037675

Title:
  Guest Kernel Panic if using "-cpu host" in qemu-kvm 1.1.1

Status in QEMU:
  Expired

Bug description:
  After Upgrading to qemu-kvm-1.1.1-r1 from version 1.0.1-r1 my virtual
  machines (running gentoo linux) panic at intel_pmu_init. (detailed
  information including stacktrace are in the uploaded screenshot). When
  i remove the "-cpu host" option, the system starts normally.

  the command line from whicht the system is bootet:

  qemu-kvm -vnc :7 -usbdevice tablet -daemonize -m 256 -drive
  file=/data/virtual_machines/wgs-l08.img,if=virtio  -boot c -k de -net
  nic,model=virtio,macaddr=12:12:00:12:34:63,vlan=0 -net
  tap,ifname=qtap6,script=no,downscript=no,vlan=0 -smp 2 -enable-kvm
  -cpu host -monitor unix:/var/run/qemu-
  kvm/wgs-l08.monitor,server,nowait

  also reported on gentoo bug tracker (with some more details of the
  host): https://bugs.gentoo.org/show_bug.cgi?id=431640

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1037675/+subscriptions



Re: [Qemu-devel] [PATCH v7 31/38] libqtest: Merge qtest_clock_*() with clock_*()

2017-09-13 Thread Thomas Huth
On 12.09.2017 15:35, Eric Blake wrote:
> On 09/12/2017 05:45 AM, Thomas Huth wrote:
>> On 11.09.2017 19:20, Eric Blake wrote:
>>> Maintaining two layers of libqtest APIs, one that takes an explicit
>>> QTestState object, and the other that uses the implicit global_qtest,
>>> is annoying.  In the interest of getting rid of global implicit
>>> state and having less code to maintain, merge:
>>>  qtest_clock_set()
>>>  qtest_clock_step()
>>>  qtest_clock_step_next()
>>> with their short counterparts.  All callers that previously
>>> used the short form now make it explicit that they are relying on
>>> global_qtest, and later patches can then clean things up to remove
>>> the global variable.
>>>
> 
>>> @@ -446,7 +446,7 @@ int64_t qtest_clock_step(QTestState *s, int64_t step);
>>>   *
>>>   * Returns: The current value of the QEMU_CLOCK_VIRTUAL in nanoseconds.
>>>   */
>>> -int64_t qtest_clock_set(QTestState *s, int64_t val);
>>> +int64_t clock_set(QTestState *s, int64_t val);
>>  Could we please keep the "qtest" prefix here and rather get rid of the
>> other ones? Even if it's more to type, I prefer to have a proper prefix
>> here so that it is clear at the first sight that the functions belong to
>> the qtest framework.
> 
> I suppose we can, although it makes more lines that are likely to bump
> up against 80 columns, and thus slightly more churn to reformat things
> to keep checkpatch happy.  I like the shorter name, because less typing
> is easier to remember.  I'd prefer a second opinion on naming before
> doing anything about it though - Markus or Paolo, do you have any
> preference?

IMHO you should at least keep the qtest prefix in patch 33/38 to avoid
confusion with the system definitions that have the same names (see "man
2 outb" for example).

 Thomas



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v2] tests: Introduce generic device hot-plug/hot-unplug functions

2017-09-13 Thread Peter Xu
On Wed, Sep 13, 2017 at 08:40:54PM +0200, Thomas Huth wrote:
> A lot of tests provide code for adding and removing a device via the
> device_add and device_del QMP commands. Maintaining this code in so many
> places is cumbersome and error-prone (some of the code parts check the
> responses for device deletion in an incorrect way, for example, we've got
> to deal with both, error code and DEVICE_DEL event here). So let's provide
> some proper generic functions for adding and removing a device instead.
> 
> The code for correctly unplugging a device has been taken from a patch
> from Peter Xu.
> 
> Signed-off-by: Thomas Huth 

Reviewed-by: Peter Xu 

I verified that this patch passes the "make check" either on master
branch or my private branch (which broke before this one). So:

Tested-by: Peter Xu 

Thanks!

-- 
Peter Xu



Re: [Qemu-devel] [Qemu devel v8 PATCH 0/5] Add support for Smartfusion2 SoC

2017-09-13 Thread Philippe Mathieu-Daudé

On 09/08/2017 04:24 AM, sundeep subbaraya wrote:

Hi Phillipe,

On Fri, Sep 8, 2017 at 3:14 AM, Philippe Mathieu-Daudé > wrote:


Hi Subbaraya,

very good work!

On 09/07/2017 04:24 PM, Subbaraya Sundeep wrote:

Hi Qemu-devel,

I am trying to add Smartfusion2 SoC.
SoC is from Microsemi and System on Module(SOM)
board is from Emcraft systems. Smartfusion2 has hardened
Microcontroller(Cortex-M3)based Sub System and FPGA fabric.
At the moment only system timer, sysreg and SPI
controller are modelled.

Testing:
./arm-softmmu/qemu-system-arm -M smartfusion2-som -serial
mon:stdio \
-kernel u-boot.bin -display none -drive
file=spi.bin,if=mtd,format=raw


"-M emcraft-sf2" ;)


copy/paste error :)


U-Boot 2010.03-00147-g7da5092 (Jul 03 2017 - 09:04:50)

CPU  : SmartFusion2 SoC (Cortex-M3 Hard IP)
Freqs: CORTEX-M3=142MHz,PCLK0=71MHz,PCLK1=71MHz
Board: M2S-FG484-SOM Rev 1A, www.emcraft.com 
DRAM:  64 MB
*** Warning - bad CRC, using default environment

In:    serial
Out:   serial
Err:   serial
Net:   M2S_MAC
Hit any key to stop autoboot:  0
16384 KiB S25FL128P_64K at 0:0 is now current device
## Booting kernel from Legacy Image at a0007fc0 ...
    Image Name:   dtskernel
    Image Type:   ARM Linux Kernel Image (uncompressed)
    Data Size:    2664224 Bytes =  2.5 MB
    Load Address: a0008000
    Entry Point:  a0008001
    Verifying Checksum ... OK
    Loading Kernel Image ... OK
OK

Starting kernel ...

NVIC: Bad read offset 0xd74
[    0.00] Booting Linux on physical CPU 0x0
[    0.00] Linux version 4.5.0-1-g3aa90e8-dirty
(sundeep@sundeep-Vostro-3458) (gcc version 5.4.0 (GCC) ) #102
PREEMPT Tue May 16 19:43:40 IST 2017
[    0.00] CPU: ARMv7-M [410fc231] revision 1 (ARMv7M), cr=
[    0.00] CPU: unknown data cache, unknown instruction cache
[    0.00] Machine model: Microsemi SmartFusion 2 development board
[    0.00] Kernel command line: console=ttyS0,115200n8 panic=10
mem=64M@0xa000 earlyprintk
[    0.00] Memory: 62204K/65536K available (1472K kernel code,
73K rwdata, 652K rodata, 400K init, 120K bss, 3332K reserved, 0K
cma-reserved)
[    0.00] NR_IRQS:16 nr_irqs:16 16
[    0.001178] sched_clock: 32 bits at 83MHz, resolution 12ns, wraps
every 25873297401ns
[    0.003085] clocksource: msf2_clocksource: mask: 0x
max_cycles: 0x, max_idle_ns: 23027234290 ns
[    0.009732] timer at 40004000, irq=16
[    0.014475] Calibrating delay loop... 442.36 BogoMIPS (lpj=2211840)
[    0.653685] Serial: 8250/16550 driver, 1 ports, IRQ sharing disabled
[    0.690789] console [ttyS0] disabled
[    0.696659] 4000.serial: ttyS0 at MMIO 0x4000 (irq = 17,
base_baud = 5187500) is a 16550
[    0.702725] console [ttyS0] enabled
[    0.826251] Freeing unused kernel memory: 400K (a021c000 - a028)
init started: BusyBox v1.24.1 (2017-05-15 09:57:00 IST)
~ #

(qemu) info mtree
address-space: cpu-memory
   - (prio 0, i/o): armv7m-container
     - (prio -1, i/o): system
       -0003 (prio 0, i/o): alias
MSF2.eNVM.alias @MSF2.eNVM -0003
       2000-2000 (prio 0, ram): MSF2.eSRAM
       4000-401f (prio 0, i/o): serial
       40001000-4000103f (prio 0, i/o): mss-spi
       40004000-4000402f (prio 0, i/o): mss-timer
       40011000-4001103f (prio 0, i/o): mss-spi
       40038000-400382ff (prio 0, i/o): msf2-sysreg
       6000-6003 (prio 0, rom): MSF2.eNVM
       a000-a3ff (prio 0, ram): ddr-ram
     2200-23ff (prio 0, i/o): bitband
     4200-43ff (prio 0, i/o): bitband
     e000e000-e000efff (prio 0, i/o): nvic
       e000e000-e000efff (prio 0, i/o): nvic_sysregs
       e000e010-e000e0ef (prio 1, i/o): systick

(qemu) info qtree
bus: main-system-bus
   type System
   dev: msf2-soc, id ""
     part-name = "M2S010"
     eNVM-size = 262144 (0x4)
     eSRAM-size = 65536 (0x1)
     m3clk = 14200 (0x876bf80)
     apb0div = 2 (0x2)
     apb1div = 2 (0x2)
   dev: mss-spi, id ""
     gpio-out "sysbus-irq" 2
     mmio 40011000/0040
     bus: spi
       type SSI
   dev: mss-spi, id ""
     gpio-out "sysbus-irq" 2
     mmio 40001000/0040
     bus: 

Re: [Qemu-devel] [PATCH 4/4] ppc: ensure we update the decrementer value during migration

2017-09-13 Thread David Gibson
On Wed, Sep 13, 2017 at 07:58:23PM +0200, Laurent Vivier wrote:
> On 13/09/2017 19:11, Mark Cave-Ayland wrote:
> > On 13/09/17 08:12, David Gibson wrote:
> > 
> >> This is subtly incorrect.  It sets the DECR on load to exactly the
> >> value that was saved.  That effectively means that the DECR is frozen
> >> for the migration downtime, which probably isn't what we want.
> >>
> >> Instead we need to save the DECR as an offset from the timebase, and
> >> restore it relative to the (downtime adjusted) timebase on the
> >> destination.
> >>
> >> The complication with that is that the timebase is generally migrated
> >> at the machine level, not the cpu level: the timebase is generally
> >> synchronized between cpus in the machine, and migrating it at the
> >> individual cpu could break that.  Which means we probably need a
> >> machine level hook to handle the decrementer too, even though it
> >> logically *is* per-cpu, because otherwise we'll be trying to restore
> >> it before the timebase is restored.
> > 
> > I know that we discussed this in-depth last year, however I was working
> > along the lines that Laurent's patch fixed this along the lines of our
> > previous discussion:
> > https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg00338.html (and
> > indeed Laurent's analysis at
> > https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg01489.html).
> > 
> > However looking again at the this patch in the context you mentioned
> > above, I'm starting to wonder if the right solution now is for the
> > machine init function for g3beige/mac99 to do the same as spapr, e.g.
> > add cpu_ppc_clock_vm_state_change() as a vm_change_state_handler and
> > then add VMSTATE_PPC_TIMEBASE_V from the machine PPCTimeBase into my new
> > subsection?
> > 
> > Laurent, do you think that your state change handler would work
> > correctly in this way?
> 
> I think all is explained in the second link you have mentioned, it seems 
> we don't need a state handler as KVM DECR will no be updated by the migrated 
> value:
> 
> hw/ppc/ppc.c
> 
> 736 static void __cpu_ppc_store_decr(PowerPCCPU *cpu, uint64_t *nextp,
> 737  QEMUTimer *timer,
> 738  void (*raise_excp)(void *),
> 739  void (*lower_excp)(PowerPCCPU *),
> 740  uint32_t decr, uint32_t value)
> 741 {
> ...
> 749 if (kvm_enabled()) {
> 750 /* KVM handles decrementer exceptions, we don't need our own 
> timer */
> 751 return;
> 752 }
> ...
> 
> But this allows to migrate it for TCG. And it should be correct because in 
> case of TCG I think [need to check] timebase is stopped too (so offset is 0)
> 
> David, do you agree with that?

Yes, I think so.  Some details might be different, but the basic idea
of migrating the timebase and decrementers at machine level should be
the same for pseries and g3beige/whatever.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/4] ppc: add CPU IRQ state to PPC VMStateDescription

2017-09-13 Thread David Gibson
On Wed, Sep 13, 2017 at 07:13:09PM +0200, Greg Kurz wrote:
> On Wed, 13 Sep 2017 17:44:54 +0100
> Mark Cave-Ayland  wrote:
> 
> > On 13/09/17 07:02, David Gibson wrote:
> > 
> > >>> Alexey - do you recall from your analysis why these fields were no
> > >>> longer deemed necessary, and how your TCG tests were configured?  
> > >>
> > >> I most certainly did not do analysis (my bad. sorry) - I took the patch
> > >> from David as he left the team, fixed to compile and pushed away. I am 
> > >> also
> > >> very suspicions we did not try migrating TCG or anything but pseries. My
> > >> guest that things did not break (if they did not which I am not sure 
> > >> about,
> > >> for the TCG case) because the interrupt controller (XICS) or the
> > >> pseries-guest took care of resending an interrupt which does not seem to 
> > >> be
> > >> the case for mac99.  
> > > 
> > > Right, that's probably true.  The main point, though, is that these
> > > fields were dropped a *long* time ago, when migration was barely
> > > working to begin with.  In particular I'm pretty sure most of the
> > > non-pseries platforms were already pretty broken for migration
> > > (amongst other things).
> > > 
> > > Polishing the mac platforms up to working again, including migration,
> > > is a reasonable goal.  But it can't be at the expense of pseries,
> > > which is already working, used in production, and much better tested
> > > than mac99 or g3beige ever were.  
> > 
> > Oh I completely agree since I'm well aware pseries likely has more users
> > than the Mac machines - my question was directed more about why we
> > support backwards migration.
> > 
> 
> Downstream support backward migration because end users/customers ask for it
> for maximum flexibility when it comes to move workloads around different 
> systems
> with different QEMU versions. This is fairly usual in data centers with many
> systems.

One specific case where this matters is if you want to update the qemu
version on an entire cluster of hosts.  Management layers like
oVirt/RHV allow you to do that without disrupting guests by migrating
them off each host, one by one, allowing you to remove it from the
cluster, update and then reinsert after which guests may be migrated
onto it again.

But that process could obviously take a fair while for a big cluster;
days or even weeks.   In the meantime you have a cluster with mixed
qemu versions, and since oVirt/RHV/whatever will also migrate guests
around the cluster to load balance, that means the possibility of a
backwards migration.

> As others already said, breaking things upstream may turn downstream work
> into a nightmare (and FWIW, most of the people working on ppc are also
> involved in downstream work).

Right.  And I even used to work as a RHV support tech too :).

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/4] ppc: add CPU IRQ state to PPC VMStateDescription

2017-09-13 Thread David Gibson
On Wed, Sep 13, 2017 at 05:44:54PM +0100, Mark Cave-Ayland wrote:
> On 13/09/17 07:02, David Gibson wrote:
> 
> >>> Alexey - do you recall from your analysis why these fields were no
> >>> longer deemed necessary, and how your TCG tests were configured?
> >>
> >> I most certainly did not do analysis (my bad. sorry) - I took the patch
> >> from David as he left the team, fixed to compile and pushed away. I am also
> >> very suspicions we did not try migrating TCG or anything but pseries. My
> >> guest that things did not break (if they did not which I am not sure about,
> >> for the TCG case) because the interrupt controller (XICS) or the
> >> pseries-guest took care of resending an interrupt which does not seem to be
> >> the case for mac99.
> > 
> > Right, that's probably true.  The main point, though, is that these
> > fields were dropped a *long* time ago, when migration was barely
> > working to begin with.  In particular I'm pretty sure most of the
> > non-pseries platforms were already pretty broken for migration
> > (amongst other things).
> > 
> > Polishing the mac platforms up to working again, including migration,
> > is a reasonable goal.  But it can't be at the expense of pseries,
> > which is already working, used in production, and much better tested
> > than mac99 or g3beige ever were.
> 
> Oh I completely agree since I'm well aware pseries likely has more users
> than the Mac machines - my question was directed more about why we
> support backwards migration.
> 
> I spent several hours yesterday poking my Darwin test case with trying
> the different combinations of pending_interrupts, irq_input_state and
> access_type and could easily provoke migration failures unless all 3 of
> the fields were present so a practical test shows they are still
> required for TCG migration. I think ppc_set_irq()'s use of the interrupt
> fields in hw/ppc/ppc.c and the subsequent reference to pending
> interrupts in target/ppc may explain why I see freezes/hangs until a key
> is pressed in many cases.

Ok, I think we need to consider (pending_interrupts and irq_input_state)
separately from access_type.  The first two are pretty closely related
to each other, and I've got at least a rough idea of what the problems
there might be.  access_type I'm pretty sure has to be an unrelated
problem, and I've got much less of a handle on it.

I suspect we could work around the problems with pending_interrupts
and irq_input_state by having a post_load hook in the board level
interrupt controller to reassert its output irq line based on its
current state.  I believe the relevant irq inputs to the cpu are
effectively level triggered, so I think that will be enough.

access_type I don't have any good ideas for yet.  We really need to
work out what the exact race is here that's causing its state to be
lost harmfully.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH 3/4] ppc: add CPU access_type into the migration stream

2017-09-13 Thread David Gibson
On Wed, Sep 13, 2017 at 06:17:21PM +0100, Mark Cave-Ayland wrote:
> On 13/09/17 08:19, David Gibson wrote:
> 
> >> When pausing a VM, does execution stop at the end of the current TB
> >> rather than immediately? If so, perhaps someone could confirm that
> >> guarantee is good enough for access_type?
> > 
> > I'm pretty sure it has to; we'd have to come up out of an individual
> > TB in order to get to the main loop where we check the "please pause"
> > flag.  I'm not sure if that helps us here though - I *think* access
> > type is about carrying information from the point where we trigger an
> > exception to the point where we actually start processing the
> > exception.
> > 
> > This code is really ugly and I've never understood it well :(. It's
> > always seemed bogus to me that we have an essentially global variable
> > to carry information over that small gap, though.  Unfortunately it's
> > unlikely that I'd be able to dive into this and work out if it's
> > really needed any time soon.
> 
> >From my testing yesterday it does appear to be required for TCG (or
> unless this is exposing another bug in the Mac migration) so I can check
> it works here and then maybe someone else confirm it works on KVM?
> 
> A couple of things I've noticed: the heathrow VMStateDescription looks
> good, however I can see that the OpenPIC timers won't likely migrate
> correctly without adding a few more fields - that's easy to fix.

Right.  And since OpenPIC isn't used on any platforms that have real
production use in the wild, it's fine to bump the migration stream
version for it.

> Another thing is that if we're changing the migration stream for Mac
> models Ben has some OpenPIC and other related changes in his wip queue
> that it might make sense to put those in first before properly fixing
> the Mac machine migration.

That would have something to be said for it.

It's probably not essential, though, since I don't consider the
non-pseries platforms at this stage sufficiently mature that we
guarantee a stable migration stream format.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH v2 5/5] arm: drop intermediate cpu_model -> cpu type parsing and use cpu type directly

2017-09-13 Thread Philippe Mathieu-Daudé

Hi Igor,

awesome clean refactor!

just 1 comment inlined.

On 09/13/2017 01:04 PM, Igor Mammedov wrote:

there are 2 use cases to deal with:
   1: fixed CPU models per board/soc
   2: boards with user configurable cpu_model and fallback to
  default cpu_model if user hasn't specified one explicitly

For the 1st
   drop intermediate cpu_model parsing and use const cpu type
   directly, which replaces:
  typename = object_class_get_name(
cpu_class_by_name(TYPE_ARM_CPU, cpu_model))
  object_new(typename)
   with
  object_new(FOO_CPU_TYPE_NAME)
   or
  cpu_generic_init(BASE_CPU_TYPE, "my cpu model")
   with
  cpu_create(FOO_CPU_TYPE_NAME)

as result 1st use case doesn't have to invoke not necessary
translation and not needed code is removed.

For the 2nd
  1: set default cpu type with MachineClass::default_cpu_type and
  2: use generic cpu_model parsing that done before machine_init()
 is run and:
 2.1: drop custom cpu_model parsing where pattern is:
typename = object_class_get_name(
cpu_class_by_name(TYPE_ARM_CPU, cpu_model))
[parse_features(typename, cpu_model, ) ]

 2.2: or replace cpu_generic_init() which does what
  2.1 does + create_cpu(typename) with just
  create_cpu(machine->cpu_type)
as result cpu_name -> cpu_type translation is done using
generic machine code one including parsing optional features
if supported/present (removes a bunch of duplicated cpu_model
parsing code) and default cpu type is defined in an uniform way
within machine_class_init callbacks instead of adhoc places
in boadr's machine_init code.

Signed-off-by: Igor Mammedov 
Reviewed-by: Eduardo Habkost 
---
v2:
  - fix merge conflicts with ignore_memory_transaction_failures
  - fix couple merge conflicts where SoC type string where replaced by type 
macro
  - keep plain prefix string in: strncmp(cpu_type, "pxa27", 5)
  - s/"%s" ARM_CPU_TYPE_SUFFIX/ARM_CPU_TYPE_NAME("%s")/
---
  include/hw/arm/armv7m.h|  2 +-
  include/hw/arm/aspeed_soc.h|  2 +-
  include/hw/arm/stm32f205_soc.h |  2 +-
  target/arm/cpu.h   |  3 +++
  hw/arm/armv7m.c| 40 +---
  hw/arm/aspeed_soc.c| 13 +---
  hw/arm/collie.c| 10 +++--
  hw/arm/exynos4210.c|  6 +-
  hw/arm/gumstix.c   |  5 +++--
  hw/arm/highbank.c  | 10 -
  hw/arm/integratorcp.c  | 30 ++-
  hw/arm/mainstone.c |  9 -
  hw/arm/mps2.c  | 17 +++-
  hw/arm/musicpal.c  |  7 ++-
  hw/arm/netduino2.c |  2 +-
  hw/arm/nseries.c   |  4 +++-
  hw/arm/omap1.c |  7 ++-
  hw/arm/omap2.c |  4 ++--
  hw/arm/omap_sx1.c  |  5 -
  hw/arm/palm.c  |  5 +++--
  hw/arm/pxa2xx.c| 10 -
  hw/arm/realview.c  | 25 +--
  hw/arm/spitz.c | 12 ++-
  hw/arm/stellaris.c | 16 +++
  hw/arm/stm32f205_soc.c |  4 ++--
  hw/arm/strongarm.c | 10 +++--
  hw/arm/tosa.c  |  4 
  hw/arm/versatilepb.c   | 15 +++---
  hw/arm/vexpress.c  | 32 +
  hw/arm/virt.c  | 46 +-
  hw/arm/xilinx_zynq.c   | 10 ++---
  hw/arm/z2.c|  9 +++--
  target/arm/cpu.c   |  2 +-
  33 files changed, 114 insertions(+), 264 deletions(-)

diff --git a/include/hw/arm/armv7m.h b/include/hw/arm/armv7m.h
index 10eb058..68cb30d 100644
--- a/include/hw/arm/armv7m.h
+++ b/include/hw/arm/armv7m.h
@@ -55,7 +55,7 @@ typedef struct ARMv7MState {
  MemoryRegion container;
  
  /* Properties */

-char *cpu_model;
+char *cpu_type;
  /* MemoryRegion the board provides to us (with its devices, RAM, etc) */
  MemoryRegion *board_memory;
  } ARMv7MState;
diff --git a/include/hw/arm/aspeed_soc.h b/include/hw/arm/aspeed_soc.h
index 0b88baa..f26914a 100644
--- a/include/hw/arm/aspeed_soc.h
+++ b/include/hw/arm/aspeed_soc.h
@@ -49,7 +49,7 @@ typedef struct AspeedSoCState {
  
  typedef struct AspeedSoCInfo {

  const char *name;
-const char *cpu_model;
+const char *cpu_type;
  uint32_t silicon_rev;
  hwaddr sdram_base;
  uint64_t sram_size;
diff --git a/include/hw/arm/stm32f205_soc.h b/include/hw/arm/stm32f205_soc.h
index e2dce11..922a733 100644
--- a/include/hw/arm/stm32f205_soc.h
+++ b/include/hw/arm/stm32f205_soc.h
@@ -52,7 +52,7 @@ typedef struct STM32F205State {
  SysBusDevice parent_obj;
  /*< public >*/
  
-char *cpu_model;

+char *cpu_type;
  
  ARMv7MState armv7m;
  
diff --git a/target/arm/cpu.h b/target/arm/cpu.h

index 98b9b26..1bfdd8d 

Re: [Qemu-devel] [PATCH] accel/hax: move hax-stub.c to accel/stubs/

2017-09-13 Thread Stefan Weil
Am 14.09.2017 um 00:11 schrieb Philippe Mathieu-Daudé:
> Suggested-by: Paolo Bonzini 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  Makefile.target  | 1 -
>  hax-stub.c => accel/stubs/hax-stub.c | 0
>  accel/stubs/Makefile.objs| 1 +
>  3 files changed, 1 insertion(+), 1 deletion(-)
>  rename hax-stub.c => accel/stubs/hax-stub.c (100%)
> 
> diff --git a/Makefile.target b/Makefile.target
> index 6361f957fb..af4b60d012 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -101,7 +101,6 @@ obj-y += fpu/softfloat.o
>  obj-y += target/$(TARGET_BASE_ARCH)/
>  obj-y += disas.o
>  obj-$(call notempty,$(TARGET_XML_FILES)) += gdbstub-xml.o
> -obj-$(call lnot,$(CONFIG_HAX)) += hax-stub.o
>  
>  obj-$(CONFIG_LIBDECNUMBER) += libdecnumber/decContext.o
>  obj-$(CONFIG_LIBDECNUMBER) += libdecnumber/decNumber.o
> diff --git a/hax-stub.c b/accel/stubs/hax-stub.c
> similarity index 100%
> rename from hax-stub.c
> rename to accel/stubs/hax-stub.c
> diff --git a/accel/stubs/Makefile.objs b/accel/stubs/Makefile.objs
> index fdfbf7332c..8038ad6553 100644
> --- a/accel/stubs/Makefile.objs
> +++ b/accel/stubs/Makefile.objs
> @@ -1,2 +1,3 @@
>  obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
>  obj-$(call lnot,$(CONFIG_TCG)) += tcg-stub.o
> +obj-$(call lnot,$(CONFIG_HAX)) += hax-stub.o

Reviewed-by: Stefan Weil 

Personally, I like alphabetic order for such lines.
Maybe that can be done when it is applied.

Stefan



Re: [Qemu-devel] [PATCH] target-i386: enable kvm_pv_unhalt by default

2017-09-13 Thread Eduardo Habkost
On Wed, Sep 13, 2017 at 04:39:58PM +0200, Alexander Graf wrote:
> Commit f010bc643a (target-i386: add feature kvm_pv_unhalt) introduced the
> kvm_pv_unhalt feature but didn't enable it by default.
> 
> Without kvm_pv_unhalt we see a measurable degradation in scheduling
> performance, so enabling it by default does make sense IMHO. This patch
> just flips it to default to on by default.
> 
>   [With kvm_pv_unhalt disabled]
>   $ perf bench sched messaging -l 1
> Total time: 8.573 [sec]
> 
>   [With kvm_pv_unhalt enabled]
>   $ perf bench sched messaging -l 1
> Total time: 4.416 [sec]
> 
> Signed-off-by: Alexander Graf 
> 
> ---
> 
> Let's ask everyone who was involved back then whether this is a feature
> that is good to enable by default. My measurements imply so, but who knows...
> 
> Also, I'd kindly like to ask for guidance on how to make this reasonably
> well backwards compatible. I assume we only want to flip the default in newer
> machine models? If so, how?

There are two things that need to be taken care of:

* Guest ABI compatibility: if running an older machine-type, the
  feature needs to remain disabled.
* machine-type runnability: libvirt doesn't expect an existing VM
  to become not runnable on the same host if changing only the
  machine-type.  This can happen if the new machine-type enables
  a feature that's not supported by the host kernel.

Guest ABI compatibility is relatively easy to handle: see how we
deal with kvm-pv-eoi on pc-1.2.  Moving the machine-type-specific
kvm-defaults to PCMachineClass (instead of introducing new
pc_compat_*() functions) would be interesting, though; I will
take a look and try to implement that.

machine-type runnability is trickier: it's a problem in theory if
people can be running old kernels that didn't support the
feature, but may be not a problem in practice if the feature is
so old that everybody running a recent QEMU is also running a
recent enough kernel.  In either case, the problem still needs to
be addressed somehow; just documenting the minimum kernel version
QEMU needs would probably be enough.

For reference, kvm_pv_unhalt was introduced in Linux v3.12-rc1.

> ---
>  target/i386/cpu.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 69676e13e1..c58f4ab24f 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -1581,6 +1581,7 @@ static PropValue kvm_default_props[] = {
>  { "kvm-asyncpf", "on" },
>  { "kvm-steal-time", "on" },
>  { "kvm-pv-eoi", "on" },
> +{ "kvm-pv-unhalt", "on" },
>  { "kvmclock-stable-bit", "on" },
>  { "x2apic", "on" },
>  { "acpi", "off" },
> -- 
> 2.12.3
> 

-- 
Eduardo



Re: [Qemu-devel] [virtio-dev] Re: [RFC 0/8] virtio-crypto: add multiplexing mode support

2017-09-13 Thread Longpeng (Mike)


On 2017/9/14 2:14, Halil Pasic wrote:

> 
> 
> On 09/11/2017 03:10 AM, Longpeng(Mike) wrote:
>> *NOTE*
>> The code realization is based on the latest virtio crypto spec:
>>  [PATCH v19 0/2] virtio-crypto: virtio crypto device specification
>>https://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg05217.html
>>
>> In session mode, the process of create/close a session
>> makes we have a least one full round-trip cost from guest to host to guest
>> to be able to send any data for symmetric algorithms. It gets ourself into
>> synchronization troubles in some scenarios like a web server handling lots
>> of small requests whose algorithms and keys are different.
>>
>> We can support one-blob request (no sessions) as well for symmetric
>> algorithms, including HASH, MAC services. The benefit is obvious for
>> HASH service because it's usually a one-blob operation.
>>
> 
> Hi!
> 
> I've just started looking at this. Patch #1 modifies linux/virtio_crypto.h
> which if I compare with the (almost) latest linux master is different. Thus
> I would expect a corresponding kernel patch set too, but I haven't received
> one, nor did I find a reference in the cover letter.
> 
> I think if I want to test the new features I need the kernel counter-part
> too, or?
> 
> Could you point me to the kernel counterpart?
> 


Hi Halil,

We haven't implemented the kernel frontend part yet, but there's a testcase
based on qtest, you can use it.

Please see the attachment.

-- 
Regards,
Longpeng(Mike)

> Regards,
> Halil
> 
> 
>> Gonglei (3):
>>   virtio-crypto: add stateless crypto request handler
>>   cryptodev: extract one util function
>>   virtio-crypto: add host feature bits support
>>
>> Longpeng(Mike) (5):
>>   virtio-crypto: add new definations for multiplexing mode
>>   virtio-crypto: add session creation logic for mux mode
>>   virtio-crypto: add dataq operation logic for mux mode
>>   cryptodev: add stateless mode cipher support
>>   cryptodev-builtin: add stateless cipher support
>>
>>  backends/cryptodev-builtin.c   | 189 ---
>>  backends/cryptodev.c   |  21 ++
>>  hw/virtio/virtio-crypto.c  | 433 
>> +++--
>>  include/hw/virtio/virtio-crypto.h  |   2 +
>>  include/standard-headers/linux/virtio_crypto.h | 182 ++-
>>  include/sysemu/cryptodev.h |  21 ++
>>  6 files changed, 774 insertions(+), 74 deletions(-)
>>
> 
> 
> -
> To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
> 
> 
> .
> 


-- 
Regards,
Longpeng(Mike)
From 259359700b1847cd66f9c3e04a86a14546f6f0e0 Mon Sep 17 00:00:00 2001
From: Gonglei 
Date: Mon, 8 May 2017 13:42:53 +0800
Subject: [PATCH] qtest: emulate virtio crypto as a legacy device for
 experiment

Because the current qtest framework do not support virtio-1
or latter devices. For experimental purpose,
let's emulate the virtio crypto device as a legacy virtio
device by default. Using 0x1014 as virtio crypto pci device ID
because virtio crypto ID is 20 (0x14).

Signed-off-by: Gonglei 

virtio-crypto-test: add qtest case for virtio-crypto

We can simply test the functions of virtio crypto
device, including session creation, session closing,
cipher encryption and decryption.

Quick usage:
 # make tests/virtio-crypto-test && ./tests/virtio-crypto-test
  CCtests/virtio-crypto-test.o
  LINK  tests/virtio-crypto-test
/virtio/crypto/cbc(aes-128-session-mode): OK
/virtio/crypto/cbc(aes-128-stateless-mode): OK

Signed-off-by: Gonglei 
[rebase on the v19 spec]
Signed-off-by: Longpeng(Mike) 
---
 docs/specs/pci-ids.txt|   2 +
 hw/virtio/virtio-crypto-pci.c |   4 +-
 include/hw/pci/pci.h  |   2 +
 tests/Makefile.include|   3 +
 tests/virtio-crypto-test.c| 600 ++
 5 files changed, 610 insertions(+), 1 deletion(-)
 create mode 100644 tests/virtio-crypto-test.c

diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
index bb99a02..61877b7 100755
--- a/docs/specs/pci-ids.txt
+++ b/docs/specs/pci-ids.txt
@@ -22,6 +22,7 @@ maintained as part of the virtio specification.
 1af4:1004  SCSI host bus adapter device (legacy)
 1af4:1005  entropy generator device (legacy)
 1af4:1009  9p filesystem device (legacy)
+1af4:1014  crypto device (legacy)
 
 1af4:1041  network device (modern)
 1af4:1042  block device (modern)
@@ -32,6 +33,7 @@ maintained as part of the virtio specification.
 1af4:1049  9p filesystem device (modern)
 1af4:1050  virtio gpu device (modern)
 1af4:1052  virtio input device (modern)
+1af4:1054  crypto device (modern)
 
 1af4:10f0  Available for experimental usage without registration.  Must get
to  official ID when the code leaves the test lab 

Re: [Qemu-devel] [PATCH v4 1/4] hmp: fix "dump-quest-memory" segfault (ppc)

2017-09-13 Thread David Gibson
On Wed, Sep 13, 2017 at 04:20:33PM +0200, Laurent Vivier wrote:
> Running QEMU with
> qemu-system-ppc64 -M none -nographic -m 256
> and executing
> dump-guest-memory /dev/null 0 8192
> results in segfault
> 
> Fix by checking if we have CPU, and exit with
> error if there is no CPU:
> 
> (qemu) dump-guest-memory /dev/null
> this feature or command is not currently supported
> 
> Signed-off-by: Laurent Vivier 
> Reviewed-by: Greg Kurz 
> Reviewed-by: Thomas Huth 

Acked-by: David Gibson 

> ---
>  target/ppc/arch_dump.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/target/ppc/arch_dump.c b/target/ppc/arch_dump.c
> index 8e9397aa58..95b9ab6f29 100644
> --- a/target/ppc/arch_dump.c
> +++ b/target/ppc/arch_dump.c
> @@ -224,8 +224,15 @@ typedef struct NoteFuncDescStruct NoteFuncDesc;
>  int cpu_get_dump_info(ArchDumpInfo *info,
>const struct GuestPhysBlockList *guest_phys_blocks)
>  {
> -PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
> -PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
> +PowerPCCPU *cpu;
> +PowerPCCPUClass *pcc;
> +
> +if (first_cpu == NULL) {
> +return -1;
> +}
> +
> +cpu = POWERPC_CPU(first_cpu);
> +pcc = POWERPC_CPU_GET_CLASS(cpu);
>  
>  info->d_machine = PPC_ELF_MACHINE;
>  info->d_class = ELFCLASS;

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [Qemu-block] [PATCH] rbd: Detect rbd image resizes and propagate them

2017-09-13 Thread John Snow


On 09/13/2017 05:36 PM, Adam Wolfe Gordon via Qemu-devel wrote:
> On Wed, Sep 13, 2017 at 2:53 PM, John Snow  wrote:
>> On 09/13/2017 12:44 PM, Adam Wolfe Gordon wrote:
>>> Register a watcher with rbd so that we get notified when an image is
>>> resized. Propagate resizes to parent block devices so that guest devices
>>> get resized without user intervention.
>>>
>>> Signed-off-by: Adam Wolfe Gordon 
>>> ---
>>> Hello!
>>>
>>> We are using this patch in production at DigitalOcean and have tested it 
>>> fairly
>>> extensively. However, we use the same block device configuration 
>>> everywhere, so
>>> I'd be interested to get an answer to the question I've left in the code:>
 /* NOTE: This assumes there's only one layer between us and the
block-backend. Is this always true? */
>>>
>>
>> Even if it were always true presently, the fact that the block layer
>> storage graph can be configured in arbitrary ways suggests that it might
>> not always be true in the future.
>>
>> Probably in practice MOST drives are configured like
>> [BB]<--[Format]<--[Protocol], but the block layer is adding a lot of fun
>> stuff like filter nodes for throttling or debugging, so it's probably
>> too much to assert that the length of the chain will ALWAYS be exactly two.
> 
> OK, thanks - that's exactly what I was wondering. I can add some code
> to traverse the parents until we get to the BB.
> 
>> Can I play dumb and ask what you are trying to accomplish? The concept
>> of a drive deciding to resize itself is not something I have a model for
>> understanding, presently...
> 
> Good question; I probably should have explained the use-case in the
> commit message.
> 
> We have a storage orchestration service that manages our ceph block
> storage clusters and doesn't interact directly with qemu. Volumes get
> resized through the orchestration service, which (after doing some

resized bigger, one hopes ...

> checks, updating external metadata, etc.) issues the resize via
> librbd. The drive doesn't exactly resize itself, but the trigger is
> out-of-band from any VM the drive may be attached to.

>From QEMU's POV, the drive resized itself! Facts on the ground may be
different, of course.

> 
> Previously, we would notify the VM of the resize by issuing a
> blockresize via qmp after doing the resize itself externally. That
> meant we were actually resizing the rbd image twice (though the second
> was, hopefully, a no-op). We occasionally had trouble with the resize
> issued by qemu getting stuck and blocking the qemu main thread.
> Detecting the out-of-band resize lets us avoid the extra rbd_resize
> call and means that we never modify an rbd image's metadata from qemu.
> 

Hm, I see... It sounds like you want an operation here that lets us
detect medium changes without actually attempting to orchestrate one.

It smells like you want the second half of bdrv_truncate without
actually issuing the call. Perhaps you could split this function into
its two halves, and in the event of an external resize being detected,
you could call the latter-half portion of bdrv_truncate.

...if the drive is configured to automatically detect those events, that
is. Conceivably not all resize events that QEMU *could* detect *should*
automatically result in guest-visible changes as soon as they occur.

--js



Re: [Qemu-devel] [libvirt] QEMU -M nvdimm=on and hotplug

2017-09-13 Thread Haozhong Zhang
On 09/13/17 17:28 +0200, Michal Privoznik wrote:
> 
> BTW: I ran a migration from no nvdimm qemu to one that had -M nvdimm=on
> and guest migrated happily. So looks like guest ABI is stable (or at
> least stable enough not to crash). But since ACPI table is changed I
> doubt that.

One example that may cause trouble is that
1/ Guest OS got a pointer to an ACPI table A on the source host (w/o nvdimm=on)
2/ After migrating to the destination host (w/ nvdimm=on), the
   location of ACPI table A is occupied by NFIT. If guest OS tries to
   access ACPI table A via the pointer in step 1/, then it will access
   the wrong table.

> 
> Also, I found that hotunplug is not implemented yet?

So far there is no public specification defining NVDIMM hotunplug, so
we didn't implement it.


Haozhong


> 
> error: internal error: unable to execute QEMU command 'device_del':
> nvdimm device hot unplug is not supported yet.
> 
> Michal



Re: [Qemu-devel] [PATCH v7 12/20] dirty-bitmap: Change bdrv_get_dirty_count() to report bytes

2017-09-13 Thread John Snow


On 09/12/2017 04:31 PM, Eric Blake wrote:
> Thanks to recent cleanups, all callers were scaling a return value
> of sectors into bytes; do the scaling internally instead.
> 
> Signed-off-by: Eric Blake 
> 

Reviewed-by: John Snow 



Re: [Qemu-devel] [PATCH v7 11/20] dirty-bitmap: Change bdrv_dirty_iter_next() to report byte offset

2017-09-13 Thread John Snow


On 09/12/2017 04:31 PM, Eric Blake wrote:
> Thanks to recent cleanups, most callers were scaling a return value
> of sectors into bytes (the exception, in qcow2-bitmap, will be
> converted to byte-based iteration later).  Update the interface to
> do the scaling internally instead.
> 
> In qcow2-bitmap, the code was specifically checking for an error
> to be -1; it is more robust to treat all negative values as an
> error, but at the same time it is also easy enough to ensure we
> return -1 (and not -512) on error.
> 
> Signed-off-by: Eric Blake 
> 

This patch now smells like a bugfix and a separate incremental feature
enhancement.

Do we need to backport the error-checking to a possible 2.10.1?

If no:

Reviewed-by: John Snow 



Re: [Qemu-devel] [PATCH v7 07/20] dirty-bitmap: Track bitmap size by bytes

2017-09-13 Thread John Snow


On 09/12/2017 04:31 PM, Eric Blake wrote:
> We are still using an internal hbitmap that tracks a size in sectors,
> with the granularity scaled down accordingly, because it lets us
> use a shortcut for our iterators which are currently sector-based.
> But there's no reason we can't track the dirty bitmap size in bytes,
> since it is (mostly) an internal-only variable (remember, the size
> is how many bytes are covered by the bitmap, not how many bytes the
> bitmap occupies).  A later cleanup will convert dirty bitmap
> internals to be entirely byte-based, eliminating the intermediate
> sector rounding added here; and technically, since bdrv_getlength()
> already rounds up to sectors, our use of DIV_ROUND_UP is more for
> theoretical completeness than for any actual rounding.
> 
> Use is_power_of_2() while at it, instead of open-coding that.
> 
> Signed-off-by: Eric Blake 
> 

Reviewed-by: John Snow 



Re: [Qemu-devel] [PATCH v7 06/20] dirty-bitmap: Change bdrv_dirty_bitmap_size() to report bytes

2017-09-13 Thread John Snow


On 09/12/2017 04:31 PM, Eric Blake wrote:
> We're already reporting bytes for bdrv_dirty_bitmap_granularity();
> mixing bytes and sectors in our return values is a recipe for
> confusion.  A later cleanup will convert dirty bitmap internals
> to be entirely byte-based, but in the meantime, we should report
> the bitmap size in bytes.
> 
> The only external caller in qcow2-bitmap.c is temporarily more verbose
> (because it is still using sector-based math), but will later be
> switched to track progress by bytes instead of sectors.
> 
> Signed-off-by: Eric Blake 
> 

Reviewed-by: John Snow 



Re: [Qemu-devel] [PATCH 05/19] nvic: Implement AIRCR changes for v8M

2017-09-13 Thread Richard Henderson
On 09/12/2017 11:13 AM, Peter Maydell wrote:
> The Application Interrupt and Reset Control Register has some changes
> for v8M:
>  * new bits SYSRESETREQS, BFHFNMINS and PRIS: these all have
>real state if the security extension is implemented and otherwise
>are constant
>  * the PRIGROUP field is banked between security states
>  * non-secure code can be blocked from using the SYSRESET bit
>to reset the system if SYSRESETREQS is set
> 
> Implement the new state and the changes to register read and write.
> For the moment we ignore the effects of the secure PRIGROUP.
> We will implement the effects of PRIS and BFHFNMIS later.
> 
> Signed-off-by: Peter Maydell 
> ---

Reviewed-by: Richard Henderson 


r~




Re: [Qemu-devel] [PATCH v7 05/20] dirty-bitmap: Check for size query failure during truncate

2017-09-13 Thread John Snow


On 09/12/2017 04:31 PM, Eric Blake wrote:
> We've previously fixed several places where we failed to account
> for possible errors from bdrv_nb_sectors().  Fix another one by
> making bdrv_dirty_bitmap_truncate() report the error rather then
> silently resizing bitmaps to -1.  Then adjust the sole caller

Nice failure mode. It was not immediately obvious to me that this could
fail, but here we all are.

> bdrv_truncate() to both reduce the likelihood of failure (blindly
> calling bdrv_dirty_bitmap_truncate() after refresh_total_sectors()
> fails was not nice) as well as propagate any actual failures.
> 
> Signed-off-by: Eric Blake 
> 
> ---
> v7: new patch [Kevin]
> ---
>  include/block/dirty-bitmap.h |  2 +-
>  block.c  | 19 ++-
>  block/dirty-bitmap.c | 12 
>  3 files changed, 23 insertions(+), 10 deletions(-)
> 
> diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
> index 8fd842eac9..15101b59d5 100644
> --- a/include/block/dirty-bitmap.h
> +++ b/include/block/dirty-bitmap.h
> @@ -83,7 +83,7 @@ int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter);
>  void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *hbi, int64_t sector_num);
>  int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap);
>  int64_t bdrv_get_meta_dirty_count(BdrvDirtyBitmap *bitmap);
> -void bdrv_dirty_bitmap_truncate(BlockDriverState *bs);
> +int bdrv_dirty_bitmap_truncate(BlockDriverState *bs);
>  bool bdrv_dirty_bitmap_readonly(const BdrvDirtyBitmap *bitmap);
>  bool bdrv_has_readonly_bitmaps(BlockDriverState *bs);
>  bool bdrv_dirty_bitmap_get_autoload(const BdrvDirtyBitmap *bitmap);
> diff --git a/block.c b/block.c
> index ee6a48976e..790dcce360 100644
> --- a/block.c
> +++ b/block.c
> @@ -3450,12 +3450,21 @@ int bdrv_truncate(BdrvChild *child, int64_t offset, 
> PreallocMode prealloc,
>  assert(!(bs->open_flags & BDRV_O_INACTIVE));
> 
>  ret = drv->bdrv_truncate(bs, offset, prealloc, errp);
> -if (ret == 0) {
> -ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
> -bdrv_dirty_bitmap_truncate(bs);
> -bdrv_parent_cb_resize(bs);
> -atomic_inc(>write_gen);
> +if (ret < 0) {
> +return ret;
>  }
> +ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
> +if (ret < 0) {
> +error_setg_errno(errp, -ret, "Could not refresh total sector count");
> +return ret;
> +}
> +ret = bdrv_dirty_bitmap_truncate(bs);
> +if (ret < 0) {
> +error_setg_errno(errp, -ret, "Could not refresh total sector count");

You probably meant to write a different error message here.

Also, what happens if the actual truncate call works, but the bitmap
resizing fails? What state does that leave us in?



Re: [Qemu-devel] [PATCH 04/19] nvic: Add cached vectpending_prio state

2017-09-13 Thread Richard Henderson
On 09/12/2017 11:13 AM, Peter Maydell wrote:
> Instead of looking up the pending priority
> in nvic_pending_prio(), cache it in a new state struct
> field. The calculation of the pending priority given
> the interrupt number is more complicated in v8M with
> the security extension, so the caching will be worthwhile.
> 
> This changes nvic_pending_prio() from returning a full
> (group + subpriority) priority value to returning a group
> priority. This doesn't require changes to its callsites
> because we use it only in comparisons of the form
>   execution_prio > nvic_pending_prio()
> and execution priority is always a group priority, so
> a test (exec prio > full prio) is true if and only if
> (execprio > group_prio).
> 
> (Architecturally the expected comparison is with the
> group priority for this sort of "would we preempt" test;
> we were only doing a test with a full priority as an
> optimisation to avoid the mask, which is possible
> precisely because the two comparisons always give the
> same answer.)
> 
> Signed-off-by: Peter Maydell 
> ---
>  include/hw/intc/armv7m_nvic.h |  2 ++
>  hw/intc/armv7m_nvic.c | 23 +--
>  hw/intc/trace-events  |  2 +-
>  3 files changed, 16 insertions(+), 11 deletions(-)

Reviewed-by: Richard Henderson 


r~




Re: [Qemu-devel] [PATCH v4 3/3] Test for full Backup

2017-09-13 Thread John Snow


On 09/08/2017 12:41 PM, Ishani Chugh wrote:
> This patch is the test for full backup implementation in Backup tool.
> The test employs two basic substests:
> 1) Backing up an empty guest and comparing it with base image.
> 2) Writing a pattern to the guest, creating backup and comparing
>with the base image.
> 
> Signed-off-by: Ishani Chugh 
> ---
>  tests/qemu-iotests/191 | 86 
> ++
>  tests/qemu-iotests/191.out | 35 +++
>  tests/qemu-iotests/group   |  1 +
>  3 files changed, 122 insertions(+)
>  create mode 100755 tests/qemu-iotests/191
>  create mode 100644 tests/qemu-iotests/191.out
> 
> diff --git a/tests/qemu-iotests/191 b/tests/qemu-iotests/191
> new file mode 100755
> index 000..16988d8
> --- /dev/null
> +++ b/tests/qemu-iotests/191
> @@ -0,0 +1,86 @@
> +#!/bin/bash
> +#
> +# Test full backup functionality of qemu-backup tool
> +#
> +# Copyright (C) 2017 Ishani Chugh 
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see .
> +#
> +
> +# creator
> +owner=chugh.ish...@research.iiit.ac.in
> +
> +seq=`basename $0`
> +echo "QA output created by $seq"
> +
> +here=`pwd`
> +status=1 # failure is the default!
> +
> +
> +# get standard environment, filters and checks
> +. ./common.rc
> +. ./common.filter
> +. ./common.qemu
> +
> +_supported_fmt generic
> +_supported_proto generic
> +_supported_os Linux
> +
> +_cleanup()
> +{
> +rm -f "$TEST_DIR"/virtio0
> +rm -f "$CONFIG_FILE"
> +}
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +CONFIG_FILE="$TEST_DIR"/backup-config
> +SOCKET=unix:"$TEST_DIR"/backup_socket
> +size=128M
> +
> +_make_test_img "$size"
> +export QEMU_BACKUP_CONFIG="$CONFIG_FILE"
> +qemu_comm_method="monitor"
> +echo
> +_launch_qemu -drive if=virtio,file="$TEST_IMG" -qmp "$SOCKET",server,nowait

QEMU launch invocation is missing the format= parameter. This test fails
with the -raw format because of this.



Re: [Qemu-devel] [PATCH 02/19] nvic: Add banked exception states

2017-09-13 Thread Richard Henderson
On 09/12/2017 11:13 AM, Peter Maydell wrote:
> For the v8M security extension, some exceptions must be banked
> between security states. Add the new vecinfo array which holds
> the state for the banked exceptions and migrate it if the
> CPU the NVIC is attached to implements the security extension.
> 
> Signed-off-by: Peter Maydell 
> ---
>  include/hw/intc/armv7m_nvic.h | 14 +
>  hw/intc/armv7m_nvic.c | 49 
> ++-
>  2 files changed, 62 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson 


r~




Re: [Qemu-devel] [RFC PATCH] spapr_pci: make index property mandatory

2017-09-13 Thread David Gibson
On Mon, Sep 11, 2017 at 01:35:03PM +0200, Greg Kurz wrote:
> Creating several PHBs without index property confuses the DRC code
> and causes issues:
> - only the first index-less PHB is functional, the other ones will
>   silently ignore hotplugging of PCI devices
> - QEMU will even terminate if these PHBs have cold-plugged devices
> 
> qemu-system-ppc64: -device virtio-net,bus=pci2.0: an attached device
>  is still awaiting release
> 
> This happens because DR connectors for child PCI devices are created
> with a DRC index that is derived from the PHB's index property. If the
> PHBs are created without index, then the same value of -1 is used to
> compute the DRC indexes for both PHBs, hence causing the collision.
> 
> Also, the index property is used to compute the placement of the PHB's
> memory regions. It is limited to 31 or 255, depending on the machine
> type version. This fits well with the requirements of DRC indexes, which
> need the PHB index to be a 16-bit value.
> 
> This patch hence makes the index property mandatory. As a consequence,
> the PHB's memory regions and BUID are now always configured according
> to the index, and it is no longer possible to set them from the command
> line. This is a reasonable trade-off, as it is very unlikely that people
> create PHBs without index (at least libvirt doesn't do it FWIW).
> 
> We have to introduce a PHB instance init function to initialize the
> 64-bit window address to -1 because pseries-2.7 and older machines
> don't set it.
> 
> Signed-off-by: Greg Kurz 
> ---
> 
> Hi,
> 
> This is a proposal to address the issue uncovered during the review of the
> PHB hotplug patches:
> 
> https://lists.nongnu.org/archive/html/qemu-ppc/2017-07/msg00557.html
> 
> I'd like to address this properly before resuming work on PHB hotplug
> itself. Please comment.

Yes, I think this looks good.  Commit message should explicitly
mention that this does break backwards compat - but we don't think the
non-index PHB feature was used in practice and the simplification is
worth it.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH 01/19] target/arm: Implement MSR/MRS access to NS banked registers

2017-09-13 Thread Richard Henderson
On 09/12/2017 11:13 AM, Peter Maydell wrote:
> In v8M the MSR and MRS instructions have extra register value
> encodings to allow secure code to access the non-secure banked
> version of various special registers.
> 
> (We don't implement the MSPLIM_NS or PSPLIM_NS aliases, because
> we don't currently implement the stack limit registers at all.)
> 
> Signed-off-by: Peter Maydell 
> ---
>  target/arm/helper.c | 110 
> 
>  1 file changed, 110 insertions(+)

Reviewed-by: Richard Henderson 


r~



Re: [Qemu-devel] Trying to use ccid-card-emulated

2017-09-13 Thread Marc-André Lureau
Hi Patrick

On Wed, Sep 6, 2017 at 5:04 PM Patrick Vacek 
wrote:

> Hello,
>
> I'm trying to emulate a smartcard. I found section 4 of docs/ccid.txt,
> which appears to do exactly what I'm interested in. However, that
> document is a few years old and references CoolKey, which at this point
> seems obsolete, with OpenSC being the preferred succcessor. I've
> followed the rest of steps with success, and tried registering OpenSC
> with NSS (i.e. modutil -dbdir /etc/pki/nssdb -add "CAC Module" -libfile
> /usr/lib/opensc-pkcs11.so), but I'm still not seeing my three
> certificates listed on the device as I'd expect.
>
> I'm using QEMU emulator version 2.8.0(Debian 1:2.8+dfsg-3ubuntu2.3).
> I've also tried using QEMU emulator version 2.10.0 (built from source),
> but the interface has changed and the commands from the documentation
> don't work anymore.
>
> 1. Am I correct to assume that OpenSC is the logical successor to
> CoolKey, and should I expect a simple substitution such as that to work?
>

 That's my understanding too, and it seems Fedora 26 deprecated coolkey.
However, when I tried opensc a few years with qemu/libcacard, it didn't
work. I haven't looked further since.

2. Are there other steps I might be overlooking with OpenSC or with
> getting the certificates recognized on the device?
>

I would first try to get coolkey module to work, before debuging opensc.
Ideally get some help from opensc developper since qemu should still work
with coolkey.

3. If, as I suspect, that document is no longer up to date, what do the
> steps currently look like to get smartcard emulation working?
>

They look still pretty ok to me. certutil usage may have changes, but qemu
& coolkey didn't change I think.

What problems did you have when trying to setup following docs/ccid.txt ?
we may want to update the doc.

Thanks
-- 
Marc-André Lureau


Re: [Qemu-devel] [PATCH v2 00/10] cleanup qemu-iotests

2017-09-13 Thread Eric Blake
On 09/13/2017 05:06 PM, Paolo Bonzini wrote:
> On 13/09/2017 09:59, Fam Zheng wrote:
>> On Wed, 09/13 08:47, Thomas Huth wrote:
>>> Meta comment: Could we maybe also rename "tests/qemu-iotests" to
>>> "tests/iotests" ? The "qemu" prefix sounds always very superfluous to me
>>> here...
>>
>> Sounds good, and saves typing for when this path is manually entered. But 
>> maybe
>> keep tests/qemu-iotests as a symlink to keep old scripts happy?
> 
> The simplest way to keep old scripts happy would be too keep the prefix
> :) even though it does sound unnecessary.

Can't someone manually add the symlink to keep their own scripts working
both before and after the change, even if we don't commit the symlink
into git?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH] accel/hax: move hax-stub.c to accel/stubs/

2017-09-13 Thread Philippe Mathieu-Daudé
Suggested-by: Paolo Bonzini 
Signed-off-by: Philippe Mathieu-Daudé 
---
 Makefile.target  | 1 -
 hax-stub.c => accel/stubs/hax-stub.c | 0
 accel/stubs/Makefile.objs| 1 +
 3 files changed, 1 insertion(+), 1 deletion(-)
 rename hax-stub.c => accel/stubs/hax-stub.c (100%)

diff --git a/Makefile.target b/Makefile.target
index 6361f957fb..af4b60d012 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -101,7 +101,6 @@ obj-y += fpu/softfloat.o
 obj-y += target/$(TARGET_BASE_ARCH)/
 obj-y += disas.o
 obj-$(call notempty,$(TARGET_XML_FILES)) += gdbstub-xml.o
-obj-$(call lnot,$(CONFIG_HAX)) += hax-stub.o
 
 obj-$(CONFIG_LIBDECNUMBER) += libdecnumber/decContext.o
 obj-$(CONFIG_LIBDECNUMBER) += libdecnumber/decNumber.o
diff --git a/hax-stub.c b/accel/stubs/hax-stub.c
similarity index 100%
rename from hax-stub.c
rename to accel/stubs/hax-stub.c
diff --git a/accel/stubs/Makefile.objs b/accel/stubs/Makefile.objs
index fdfbf7332c..8038ad6553 100644
--- a/accel/stubs/Makefile.objs
+++ b/accel/stubs/Makefile.objs
@@ -1,2 +1,3 @@
 obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
 obj-$(call lnot,$(CONFIG_TCG)) += tcg-stub.o
+obj-$(call lnot,$(CONFIG_HAX)) += hax-stub.o
-- 
2.14.1




Re: [Qemu-devel] [PATCH v2 00/10] cleanup qemu-iotests

2017-09-13 Thread Paolo Bonzini
On 13/09/2017 09:59, Fam Zheng wrote:
> On Wed, 09/13 08:47, Thomas Huth wrote:
>> Meta comment: Could we maybe also rename "tests/qemu-iotests" to
>> "tests/iotests" ? The "qemu" prefix sounds always very superfluous to me
>> here...
> 
> Sounds good, and saves typing for when this path is manually entered. But 
> maybe
> keep tests/qemu-iotests as a symlink to keep old scripts happy?

The simplest way to keep old scripts happy would be too keep the prefix
:) even though it does sound unnecessary.

Paolo



Re: [Qemu-devel] [Qemu-block] [PATCH] rbd: Detect rbd image resizes and propagate them

2017-09-13 Thread Adam Wolfe Gordon via Qemu-devel
On Wed, Sep 13, 2017 at 3:21 PM, Jason Dillaman  wrote:
> On Wed, 2017-09-13 at 10:44 -0600, Adam Wolfe Gordon wrote:
>> +parent = bs->inherits_from;
>> +if (parent == NULL) {
>> +error_report("bs %s does not have parent", 
>> bdrv_get_device_or_node_name(bs));
>> +return;
>> +}
>> +
>> +/* Force parents to re-read our size. */
>
> Can you just invoke blk_truncate instead?

blk_truncate will end up calling qemu_rbd_truncate, which does an
rbd_resize. The goal of this patch is to avoid calling rbd_resize from
qemu, since we do resizes from an external orchestration service.

That said, this patch also introduces a cache of the size (in
BDRVRBDState.size_bytes), so it would be possible for
qemu_rbd_truncate to avoid calling rbd_resize when the image is
already the right size. If I did that, we could use blk_truncate when
we detect a resize as well. Would that be a desirable change?



Re: [Qemu-devel] [Qemu-block] [PATCH] rbd: Detect rbd image resizes and propagate them

2017-09-13 Thread Adam Wolfe Gordon via Qemu-devel
On Wed, Sep 13, 2017 at 2:53 PM, John Snow  wrote:
> On 09/13/2017 12:44 PM, Adam Wolfe Gordon wrote:
>> Register a watcher with rbd so that we get notified when an image is
>> resized. Propagate resizes to parent block devices so that guest devices
>> get resized without user intervention.
>>
>> Signed-off-by: Adam Wolfe Gordon 
>> ---
>> Hello!
>>
>> We are using this patch in production at DigitalOcean and have tested it 
>> fairly
>> extensively. However, we use the same block device configuration everywhere, 
>> so
>> I'd be interested to get an answer to the question I've left in the code:>
>>> /* NOTE: This assumes there's only one layer between us and the
>>>block-backend. Is this always true? */
>>
>
> Even if it were always true presently, the fact that the block layer
> storage graph can be configured in arbitrary ways suggests that it might
> not always be true in the future.
>
> Probably in practice MOST drives are configured like
> [BB]<--[Format]<--[Protocol], but the block layer is adding a lot of fun
> stuff like filter nodes for throttling or debugging, so it's probably
> too much to assert that the length of the chain will ALWAYS be exactly two.

OK, thanks - that's exactly what I was wondering. I can add some code
to traverse the parents until we get to the BB.

> Can I play dumb and ask what you are trying to accomplish? The concept
> of a drive deciding to resize itself is not something I have a model for
> understanding, presently...

Good question; I probably should have explained the use-case in the
commit message.

We have a storage orchestration service that manages our ceph block
storage clusters and doesn't interact directly with qemu. Volumes get
resized through the orchestration service, which (after doing some
checks, updating external metadata, etc.) issues the resize via
librbd. The drive doesn't exactly resize itself, but the trigger is
out-of-band from any VM the drive may be attached to.

Previously, we would notify the VM of the resize by issuing a
blockresize via qmp after doing the resize itself externally. That
meant we were actually resizing the rbd image twice (though the second
was, hopefully, a no-op). We occasionally had trouble with the resize
issued by qemu getting stuck and blocking the qemu main thread.
Detecting the out-of-band resize lets us avoid the extra rbd_resize
call and means that we never modify an rbd image's metadata from qemu.



Re: [Qemu-devel] [PATCH] osdep: Fix ROUND_UP(64-bit, 32-bit)

2017-09-13 Thread Paolo Bonzini
On 13/09/2017 23:03, Eric Blake wrote:
> When using bit-wise operations that exploit the power-of-two
> nature of the second argument of ROUND_UP(), we still need to
> ensure that the mask is as wide as the first argument (done
> by using addition of 0 to force proper arithmetic promotion).
> Unpatched, ROUND_UP(2ULL*1024*1024*1024*1024, 512) produces 0,
> instead of the intended 2TiB.
> 
> Broken since its introduction in commit 292c8e50 (v1.5.0).
> 
> CC: qemu-triv...@nongnu.org
> Signed-off-by: Eric Blake 
> 
> ---
> I did not audit to see how many potential users of ROUND_UP
> are actually passing in different sized types where the first
> argument can be larger than UINT32_MAX; I stumbled across the
> problem when iotests 190 started failing on a patch where I
> added a new use.  We can either be conservative and put this
> on qemu-stable no matter what, or go through the effort of an
> audit to see what might be broken (many callers in the block
> layer, but not just there).
> ---
>  include/qemu/osdep.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index 6855b94bbf..7a3000efc5 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -189,13 +189,13 @@ extern int daemon(int, int);
> 
>  /* Round number up to multiple. Requires that d be a power of 2 (see
>   * QEMU_ALIGN_UP for a safer but slower version on arbitrary
> - * numbers) */
> + * numbers); works even if d is a smaller type than n.  */
>  #ifndef ROUND_UP
> -#define ROUND_UP(n,d) (((n) + (d) - 1) & -(d))
> +#define ROUND_UP(n, d) (((n) + (d) - 1) & -((n) - (n) + (d)))
>  #endif
> 
>  #ifndef DIV_ROUND_UP
> -#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
> +#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
>  #endif
> 
>  /*
> 


Cc: qemu-sta...@nongnu.org

Paolo



Re: [Qemu-devel] [Qemu-block] [PATCH] rbd: Detect rbd image resizes and propagate them

2017-09-13 Thread Jason Dillaman
On Wed, 2017-09-13 at 10:44 -0600, Adam Wolfe Gordon wrote:
> Register a watcher with rbd so that we get notified when an image is
> resized. Propagate resizes to parent block devices so that guest devices
> get resized without user intervention.
> 
> Signed-off-by: Adam Wolfe Gordon 
> ---
> Hello!
> 
> We are using this patch in production at DigitalOcean and have tested it 
> fairly
> extensively. However, we use the same block device configuration everywhere, 
> so
> I'd be interested to get an answer to the question I've left in the code:
> 
> > /* NOTE: This assumes there's only one layer between us and the
> >block-backend. Is this always true? */
> 
> If that isn't true, this patch will need a bit of work to traverse up the 
> block
> device tree and find the block-backend.
> 
> Of course, any other feedback would be welcome too - this is my first foray 
> into
> the qemu code.
> 
> Regards,
> Adam
> 
>  block/rbd.c | 80 
> +
>  1 file changed, 80 insertions(+)
> 
> diff --git a/block/rbd.c b/block/rbd.c
> index 144f350e1f..1c9fcbec1f 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -96,6 +96,8 @@ typedef struct BDRVRBDState {
>  rbd_image_t image;
>  char *image_name;
>  char *snap;
> +uint64_t watcher_handle;
> +uint64_t size_bytes;
>  } BDRVRBDState;
>  
>  static char *qemu_rbd_next_tok(char *src, char delim, char **p)
> @@ -540,6 +542,67 @@ out:
>  return rados_str;
>  }
>  
> +/* BH for when rbd notifies us of an update. */
> +static void qemu_rbd_update_bh(void *arg)
> +{
> +BlockDriverState *parent, *bs = arg;
> +BDRVRBDState *s = bs->opaque;
> +bool was_variable_length;
> +uint64_t new_size_bytes;
> +int64_t new_parent_len;
> +BdrvChild *c;
> +int r;
> +
> +r = rbd_get_size(s->image, _size_bytes);
> +if (r < 0) {
> +error_report("error reading size for %s: %s", s->name, strerror(-r));
> +return;
> +}
> +
> +/* Avoid no-op resizes on non-resize notifications. */
> +if (new_size_bytes == s->size_bytes) {
> +error_printf("skipping non-resize rbd cb\n");

Image update callbacks can be invoked for any number of reasons, not
just resize events. Therefore, you probably don't want to have an error
message printed out here.

> +return;
> +}
> +
> +/* NOTE: This assumes there's only one layer between us and the
> +   block-backend. Is this always true? */

I don't think that can be assumed to be true.

> +parent = bs->inherits_from;
> +if (parent == NULL) {
> +error_report("bs %s does not have parent", 
> bdrv_get_device_or_node_name(bs));
> +return;
> +}
> +
> +/* Force parents to re-read our size. */

Can you just invoke blk_truncate instead? 

> +was_variable_length = bs->drv->has_variable_length;
> +bs->drv->has_variable_length = true;
> +new_parent_len = bdrv_getlength(parent);
> +if (new_parent_len < 0) {
> +error_report("getlength failed on parent %s", 
> bdrv_get_device_or_node_name(parent));
> +bs->drv->has_variable_length = was_variable_length;
> +return;
> +}
> +bs->drv->has_variable_length = was_variable_length;
> +
> +/* Notify block backends that that we have resized.
> +   Copied from bdrv_parent_cb_resize. */
> +QLIST_FOREACH(c, >parents, next_parent) {
> +if (c->role->resize) {
> +c->role->resize(c);
> +}
> +}
> +
> +s->size_bytes = new_size_bytes;
> +}
> +
> +/* Called from non-qemu thread - careful! */
> +static void qemu_rbd_update_cb(void *arg)
> +{
> +BlockDriverState *bs = arg;
> +
> +aio_bh_schedule_oneshot(bdrv_get_aio_context(bs), qemu_rbd_update_bh, 
> bs);
> +}
> +
>  static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
>   Error **errp)
>  {
> @@ -672,9 +735,25 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  }
>  }
>  
> +r = rbd_update_watch(s->image, >watcher_handle, qemu_rbd_update_cb, 
> bs);

This API method was only added in the Ceph Jewel release just over a
year ago. This should probably gracefully degrade if compiled against an
older version of the librbd API.

> +if (r < 0) {
> +error_setg_errno(errp, -r, "error registering watcher on %s", 
> s->name);
> +goto failed_watch;
> +}
> +
> +r = rbd_get_size(s->image, >size_bytes);
> +if (r < 0) {
> +error_setg_errno(errp, -r, "error reading size for %s", s->name);
> +goto failed_sz;
> +}
> +
>  qemu_opts_del(opts);
>  return 0;
>  
> +failed_sz:
> +rbd_update_unwatch(s->image, s->watcher_handle);
> +failed_watch:
> +rbd_close(s->image);
>  failed_open:
>  rados_ioctx_destroy(s->io_ctx);
>  failed_shutdown:
> @@ -712,6 +791,7 @@ static void qemu_rbd_close(BlockDriverState *bs)
>  {
>  BDRVRBDState *s = 

Re: [Qemu-devel] [PATCH] hw/isa/pc87312: Mark the device with user_creatable = false

2017-09-13 Thread Hervé Poussineau

Le 13/09/2017 à 11:07, Thomas Huth a écrit :

QEMU currently aborts if you try to use the device at the command
line:

$ ppc64-softmmu/qemu-system-ppc64 -S -machine prep -device pc87312
Unexpected error in qemu_chr_fe_init() at chardev/char-fe.c:222:
qemu-system-ppc64: -device pc87312: Device 'parallel0' is in use
Aborted (core dumped)

It uses parallel_hds in its realize function, so I can not be
instantiated by the user again.

Signed-off-by: Thomas Huth 


Reviewed-by: Hervé Poussineau 

However, a better solution would probably to not directly use parallel_hds[0], 
but to add some Chardev properties to the device, and to fill them in the 
mainboard code.
This is noted as a "Future incompatible change" in changelog since 2.8

Hervé


---
  hw/isa/pc87312.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/hw/isa/pc87312.c b/hw/isa/pc87312.c
index 5ce9f0a..48b29e3 100644
--- a/hw/isa/pc87312.c
+++ b/hw/isa/pc87312.c
@@ -386,6 +386,8 @@ static void pc87312_class_init(ObjectClass *klass, void 
*data)
  dc->reset = pc87312_reset;
  dc->vmsd = _pc87312;
  dc->props = pc87312_properties;
+/* Reason: Uses parallel_hds[0] in realize(), so it can't be used twice */
+dc->user_creatable = false;
  }
  
  static const TypeInfo pc87312_type_info = {







Re: [Qemu-devel] [PATCH v4 00/23] make bdrv_get_block_status byte-based

2017-09-13 Thread Eric Blake
On 09/13/2017 11:03 AM, Eric Blake wrote:
> There are patches floating around to add NBD_CMD_BLOCK_STATUS,
> but NBD wants to report status on byte granularity (even if the
> reporting will probably be naturally aligned to sectors or even
> much higher levels).  I've therefore started the task of
> converting our block status code to report at a byte granularity
> rather than sectors.
> 
> Now that 2.11 is open, I'm rebasing/reposting the remaining patches.
> 
> The overall conversion currently looks like:
> part 1: bdrv_is_allocated (merged, commit 51b0a488)
> part 2: dirty-bitmap (v7 is posted [1], mostly reviewed)
> part 3: bdrv_get_block_status (this series, v3 at [2])
> part 4: .bdrv_co_block_status (v2 is posted [3], but needs a rebase)
> 
> Available as a tag at:
> git fetch git://repo.or.cz/qemu/ericb.git nbd-byte-status-v4
> 
> Based-on: <20170912203119.24166-1-ebl...@redhat.com>
> ([PATCH v7 00/20] make dirty-bitmap byte-based)

Also,

Based-on: <20170913210343.19078-1-ebl...@redhat.com>
(osdep: Fix ROUND_UP(64-bit, 32-bit))

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH] osdep: Fix ROUND_UP(64-bit, 32-bit)

2017-09-13 Thread Eric Blake
When using bit-wise operations that exploit the power-of-two
nature of the second argument of ROUND_UP(), we still need to
ensure that the mask is as wide as the first argument (done
by using addition of 0 to force proper arithmetic promotion).
Unpatched, ROUND_UP(2ULL*1024*1024*1024*1024, 512) produces 0,
instead of the intended 2TiB.

Broken since its introduction in commit 292c8e50 (v1.5.0).

CC: qemu-triv...@nongnu.org
Signed-off-by: Eric Blake 

---
I did not audit to see how many potential users of ROUND_UP
are actually passing in different sized types where the first
argument can be larger than UINT32_MAX; I stumbled across the
problem when iotests 190 started failing on a patch where I
added a new use.  We can either be conservative and put this
on qemu-stable no matter what, or go through the effort of an
audit to see what might be broken (many callers in the block
layer, but not just there).
---
 include/qemu/osdep.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 6855b94bbf..7a3000efc5 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -189,13 +189,13 @@ extern int daemon(int, int);

 /* Round number up to multiple. Requires that d be a power of 2 (see
  * QEMU_ALIGN_UP for a safer but slower version on arbitrary
- * numbers) */
+ * numbers); works even if d is a smaller type than n.  */
 #ifndef ROUND_UP
-#define ROUND_UP(n,d) (((n) + (d) - 1) & -(d))
+#define ROUND_UP(n, d) (((n) + (d) - 1) & -((n) - (n) + (d)))
 #endif

 #ifndef DIV_ROUND_UP
-#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
+#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
 #endif

 /*
-- 
2.13.5




Re: [Qemu-devel] [Qemu-block] [PATCH] rbd: Detect rbd image resizes and propagate them

2017-09-13 Thread John Snow


On 09/13/2017 12:44 PM, Adam Wolfe Gordon wrote:
> Register a watcher with rbd so that we get notified when an image is
> resized. Propagate resizes to parent block devices so that guest devices
> get resized without user intervention.
> 
> Signed-off-by: Adam Wolfe Gordon 
> ---
> Hello!
> 
> We are using this patch in production at DigitalOcean and have tested it 
> fairly
> extensively. However, we use the same block device configuration everywhere, 
> so
> I'd be interested to get an answer to the question I've left in the code:>
>> /* NOTE: This assumes there's only one layer between us and the
>>block-backend. Is this always true? */
> 

Even if it were always true presently, the fact that the block layer
storage graph can be configured in arbitrary ways suggests that it might
not always be true in the future.

Probably in practice MOST drives are configured like
[BB]<--[Format]<--[Protocol], but the block layer is adding a lot of fun
stuff like filter nodes for throttling or debugging, so it's probably
too much to assert that the length of the chain will ALWAYS be exactly two.

Can I play dumb and ask what you are trying to accomplish? The concept
of a drive deciding to resize itself is not something I have a model for
understanding, presently...

> If that isn't true, this patch will need a bit of work to traverse up the 
> block
> device tree and find the block-backend.
> 
> Of course, any other feedback would be welcome too - this is my first foray 
> into
> the qemu code.
> 
> Regards,
> Adam
> 
>  block/rbd.c | 80 
> +
>  1 file changed, 80 insertions(+)
> 
> diff --git a/block/rbd.c b/block/rbd.c
> index 144f350e1f..1c9fcbec1f 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -96,6 +96,8 @@ typedef struct BDRVRBDState {
>  rbd_image_t image;
>  char *image_name;
>  char *snap;
> +uint64_t watcher_handle;
> +uint64_t size_bytes;
>  } BDRVRBDState;
>  
>  static char *qemu_rbd_next_tok(char *src, char delim, char **p)
> @@ -540,6 +542,67 @@ out:
>  return rados_str;
>  }
>  
> +/* BH for when rbd notifies us of an update. */
> +static void qemu_rbd_update_bh(void *arg)
> +{
> +BlockDriverState *parent, *bs = arg;
> +BDRVRBDState *s = bs->opaque;
> +bool was_variable_length;
> +uint64_t new_size_bytes;
> +int64_t new_parent_len;
> +BdrvChild *c;
> +int r;
> +
> +r = rbd_get_size(s->image, _size_bytes);
> +if (r < 0) {
> +error_report("error reading size for %s: %s", s->name, strerror(-r));
> +return;
> +}
> +
> +/* Avoid no-op resizes on non-resize notifications. */
> +if (new_size_bytes == s->size_bytes) {
> +error_printf("skipping non-resize rbd cb\n");
> +return;
> +}
> +
> +/* NOTE: This assumes there's only one layer between us and the
> +   block-backend. Is this always true? */
> +parent = bs->inherits_from;
> +if (parent == NULL) {
> +error_report("bs %s does not have parent", 
> bdrv_get_device_or_node_name(bs));
> +return;
> +}
> +
> +/* Force parents to re-read our size. */
> +was_variable_length = bs->drv->has_variable_length;
> +bs->drv->has_variable_length = true;
> +new_parent_len = bdrv_getlength(parent);
> +if (new_parent_len < 0) {
> +error_report("getlength failed on parent %s", 
> bdrv_get_device_or_node_name(parent));
> +bs->drv->has_variable_length = was_variable_length;
> +return;
> +}
> +bs->drv->has_variable_length = was_variable_length;
> +
> +/* Notify block backends that that we have resized.
> +   Copied from bdrv_parent_cb_resize. */
> +QLIST_FOREACH(c, >parents, next_parent) {
> +if (c->role->resize) {
> +c->role->resize(c);
> +}
> +}
> +
> +s->size_bytes = new_size_bytes;
> +}
> +
> +/* Called from non-qemu thread - careful! */
> +static void qemu_rbd_update_cb(void *arg)
> +{
> +BlockDriverState *bs = arg;
> +
> +aio_bh_schedule_oneshot(bdrv_get_aio_context(bs), qemu_rbd_update_bh, 
> bs);
> +}
> +
>  static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
>   Error **errp)
>  {
> @@ -672,9 +735,25 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  }
>  }
>  
> +r = rbd_update_watch(s->image, >watcher_handle, qemu_rbd_update_cb, 
> bs);
> +if (r < 0) {
> +error_setg_errno(errp, -r, "error registering watcher on %s", 
> s->name);
> +goto failed_watch;
> +}
> +
> +r = rbd_get_size(s->image, >size_bytes);
> +if (r < 0) {
> +error_setg_errno(errp, -r, "error reading size for %s", s->name);
> +goto failed_sz;
> +}
> +
>  qemu_opts_del(opts);
>  return 0;
>  
> +failed_sz:
> +rbd_update_unwatch(s->image, s->watcher_handle);
> +failed_watch:
> +rbd_close(s->image);
>  

Re: [Qemu-devel] [PATCH 2/2] Add --firmwarepath to configure

2017-09-13 Thread Paolo Bonzini
On 13/09/2017 14:09, Gerd Hoffmann wrote:
> Add a firmware path config option to configure.  Multiple directories
> are accepted, with the usual colon as separator.  Default value is
> ${prefix}/share/qemu-firmware.  The path is searched in addition to the
> current search path (typically ${prefix}/share/qemu).
> 
> This prepares qemu for the planned split of the prebuilt firmware blobs
> into a separate project.
> 
> Distributions can also use this to get rid of the firmware symlink farm
> and add -- for example -- /usr/share/seabios to the firmware path
> instead.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  configure |  6 ++
>  vl.c  | 12 +---
>  scripts/create_config |  2 +-
>  3 files changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/configure b/configure
> index fd7e3a5e81..870bdbd3e4 100755
> --- a/configure
> +++ b/configure
> @@ -332,6 +332,7 @@ modules="no"
>  prefix="/usr/local"
>  mandir="\${prefix}/share/man"
>  datadir="\${prefix}/share"
> +firmwarepath="\${prefix}/share/qemu-firmware"
>  qemu_docdir="\${prefix}/share/doc/qemu"
>  bindir="\${prefix}/bin"
>  libdir="\${prefix}/lib"
> @@ -915,6 +916,8 @@ for opt do
>;;
>--localstatedir=*) local_statedir="$optarg"
>;;
> +  --firmwarepath=*) firmwarepath="$optarg"
> +  ;;
>--sbindir=*|--sharedstatedir=*|\
>--oldincludedir=*|--datarootdir=*|--infodir=*|--localedir=*|\
>--htmldir=*|--dvidir=*|--pdfdir=*|--psdir=*)
> @@ -1418,6 +1421,7 @@ Advanced options (experts only):
>--libdir=PATHinstall libraries in PATH
>--sysconfdir=PATHinstall config in PATH$confsuffix
>--localstatedir=PATH install local state in PATH (set at runtime on 
> win32)
> +  --firmwarepath=PATH  search PATH for firmware files

Maybe --firmwaredir or --with-firmwaredir (because firmwaredir is not
one of the "standard" directories)?

> @@ -4232,11 +4233,16 @@ int main(int argc, char **argv, char **envp)
>  qemu_set_log(0);
>  }
>  
> -/* If no data_dir is specified then try to find it relative to the
> -   executable path.  */
> +/* add configured firmware directories */
> +dirs = g_strsplit(CONFIG_QEMU_FIRMWAREPATH, ":", 0);

Windows probably wants to use ; here, so you can use
G_SEARCHPATH_SEPARATOR_S instead of ":".

Thanks,

Paolo

> +for (i = 0; dirs[i] != NULL; i++) {
> +qemu_add_data_dir(dirs[i]);
> +}
> +
> +/* try to find datadir relative to the executable path */
>  qemu_add_data_dir(os_find_datadir());
>  
> -/* If all else fails use the install path specified when building. */
> +/* add the datadir specified when building */
>  qemu_add_data_dir(CONFIG_QEMU_DATADIR);
>  
>  /* -L help lists the data directories and exits. */
> diff --git a/scripts/create_config b/scripts/create_config
> index e6929dd61e..603b826886 100755
> --- a/scripts/create_config
> +++ b/scripts/create_config
> @@ -15,7 +15,7 @@ case $line in
>  echo "#define QEMU_VERSION_MINOR $minor"
>  echo "#define QEMU_VERSION_MICRO $micro"
>  ;;
> - qemu_*dir=*) # qemu-specific directory configuration
> + qemu_*dir=* | qemu_*path=*) # qemu-specific directory configuration
>  name=${line%=*}
>  value=${line#*=}
>  define_name=$(echo $name | LC_ALL=C tr '[a-z]' '[A-Z]')
> 




[Qemu-devel] [Bug 932490] Re: Qemu fails on -fda /dev/fd0 when no medium is present

2017-09-13 Thread Herbert Poetzl
Sorry, it has been more than five years and I don't have a system with a
floppy disk to test anymore.

Best,
Herbert

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/932490

Title:
  Qemu fails on -fda /dev/fd0 when no medium is present

Status in QEMU:
  Incomplete

Bug description:
  # qemu-system-x86_64 --version
  QEMU emulator version 1.0 (qemu-kvm-1.0), Copyright (c) 2003-2008 Fabrice 
Bellard

  # qemu-system-x86_64 -fda /dev/fd0
  qemu-system-x86_64: -fda /dev/fd0: could not open disk image /dev/fd0: No 
such device or address

  Starting with a medium (floppy disk) inserted, then removing or
  changing the medium works fine.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/932490/+subscriptions



Re: [Qemu-devel] [PATCH v4 21/23] block: Align block status requests

2017-09-13 Thread Eric Blake
On 09/13/2017 02:26 PM, Eric Blake wrote:
> On 09/13/2017 11:03 AM, Eric Blake wrote:
>> Any device that has request_alignment greater than 512 should be
>> unable to report status at a finer granularity; it may also be
>> simpler for such devices to be guaranteed that the block layer
>> has rounded things out to the granularity boundary (the way the
>> block layer already rounds all other I/O out).  Besides, getting
>> the code correct for super-sector alignment also benefits us
>> for the fact that our public interface now has byte granularity,
>> even though none of our drivers have byte-level callbacks.
>>
>> Add an assertion in blkdebug that proves that the block layer
>> never requests status of unaligned sections, similar to what it
>> does on other requests (while still keeping the generic helper
>> in place for when future patches add a throttle driver).  Note
>> that iotest 177 already covers this (it would fail if you use
>> just the blkdebug.c hunk without the io.c changes).  Meanwhile,
>> we can drop assertions in callers that no longer have to pass
>> in sector-aligned addresses.
> 
> Bummer - 'git bisect' says this patch causes iotests 190 to hang.  I'm
> investigating root cause, but I'll have to post a fixup once I figure it
> out.

Found it:

> +/* Round out to request_alignment boundaries */
> +align = MAX(bs->bl.request_alignment, BDRV_SECTOR_SIZE);
> +aligned_offset = QEMU_ALIGN_DOWN(offset, align);
> +aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;

ROUND_UP(64-bit, 32-bit) has a subtle bug: it truncates the operation at
32 bits, instead of producing a 64-bit result.  Using QEMU_ROUND_UP
instead does NOT have the bug.

That's a ticking time bomb, so I'll patch ROUND_UP() directly as a
pre-requisite, then reply to the cover letter with a Based-on tag.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH] rbd: Detect rbd image resizes and propagate them

2017-09-13 Thread no-reply
Hi,

This series failed build test on s390x host. Please find the details below.

Subject: [Qemu-devel] [PATCH] rbd: Detect rbd image resizes and propagate them
Message-id: 20170913164424.32129-1-...@digitalocean.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
set -e
echo "=== ENV ==="
env
echo "=== PACKAGES ==="
rpm -qa
echo "=== TEST BEGIN ==="
CC=$HOME/bin/cc
INSTALL=$PWD/install
BUILD=$PWD/build
echo -n "Using CC: "
realpath $CC
mkdir -p $BUILD $INSTALL
SRC=$PWD
cd $BUILD
$SRC/configure --cc=$CC --prefix=$INSTALL
make -j4
# XXX: we need reliable clean up
# make check -j4 V=1
make install
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]  patchew/1505298088-10878-1-git-send-email-th...@redhat.com 
-> patchew/1505298088-10878-1-git-send-email-th...@redhat.com
 * [new tag] patchew/20170913164424.32129-1-...@digitalocean.com -> 
patchew/20170913164424.32129-1-...@digitalocean.com
Switched to a new branch 'test'
c0e97b7 rbd: Detect rbd image resizes and propagate them

=== OUTPUT BEGIN ===
=== ENV ===
XDG_SESSION_ID=862
SHELL=/bin/sh
USER=fam
PATCHEW=/home/fam/patchew/patchew-cli -s http://patchew.org --nodebug
PATH=/usr/bin:/bin
PWD=/var/tmp/patchew-tester-tmp-ihkih90_/src
LANG=en_US.UTF-8
HOME=/home/fam
SHLVL=2
LOGNAME=fam
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1012/bus
XDG_RUNTIME_DIR=/run/user/1012
_=/usr/bin/env
=== PACKAGES ===
gpg-pubkey-873529b8-54e386ff
xz-libs-5.2.2-2.fc24.s390x
libxshmfence-1.2-3.fc24.s390x
giflib-4.1.6-15.fc24.s390x
trousers-lib-0.3.13-6.fc24.s390x
ncurses-base-6.0-6.20160709.fc25.noarch
gmp-6.1.1-1.fc25.s390x
libidn-1.33-1.fc25.s390x
slang-2.3.0-7.fc25.s390x
pkgconfig-0.29.1-1.fc25.s390x
alsa-lib-1.1.1-2.fc25.s390x
yum-metadata-parser-1.1.4-17.fc25.s390x
python3-slip-dbus-0.6.4-4.fc25.noarch
python2-cssselect-0.9.2-1.fc25.noarch
createrepo_c-libs-0.10.0-6.fc25.s390x
initscripts-9.69-1.fc25.s390x
parted-3.2-21.fc25.s390x
flex-2.6.0-3.fc25.s390x
colord-libs-1.3.4-1.fc25.s390x
python-osbs-client-0.33-3.fc25.noarch
perl-Pod-Simple-3.35-1.fc25.noarch
python2-simplejson-3.10.0-1.fc25.s390x
brltty-5.4-2.fc25.s390x
librados2-10.2.4-2.fc25.s390x
tcp_wrappers-7.6-83.fc25.s390x
libcephfs_jni1-10.2.4-2.fc25.s390x
nettle-devel-3.3-1.fc25.s390x
bzip2-devel-1.0.6-21.fc25.s390x
libuuid-2.28.2-2.fc25.s390x
python3-dnf-1.1.10-6.fc25.noarch
texlive-kpathsea-doc-svn41139-33.fc25.1.noarch
openssh-7.4p1-4.fc25.s390x
texlive-kpathsea-bin-svn40473-33.20160520.fc25.1.s390x
texlive-graphics-svn41015-33.fc25.1.noarch
texlive-dvipdfmx-def-svn40328-33.fc25.1.noarch
texlive-mfware-svn40768-33.fc25.1.noarch
texlive-texlive-scripts-svn41433-33.fc25.1.noarch
texlive-euro-svn22191.1.1-33.fc25.1.noarch
texlive-etex-svn37057.0-33.fc25.1.noarch
texlive-iftex-svn29654.0.2-33.fc25.1.noarch
texlive-palatino-svn31835.0-33.fc25.1.noarch
texlive-texlive-docindex-svn41430-33.fc25.1.noarch
texlive-xunicode-svn30466.0.981-33.fc25.1.noarch
texlive-koma-script-svn41508-33.fc25.1.noarch
texlive-pst-grad-svn15878.1.06-33.fc25.1.noarch
texlive-pst-blur-svn15878.2.0-33.fc25.1.noarch
texlive-jknapltx-svn19440.0-33.fc25.1.noarch
texinfo-6.1-4.fc25.s390x
openssl-devel-1.0.2k-1.fc25.s390x
jansson-2.10-2.fc25.s390x
fedora-repos-25-4.noarch
python3-libs-3.5.3-6.fc25.s390x
perl-Errno-1.25-387.fc25.s390x
acl-2.2.52-13.fc25.s390x
systemd-pam-231-17.fc25.s390x
NetworkManager-libnm-1.4.4-5.fc25.s390x
poppler-0.45.0-5.fc25.s390x
ccache-3.3.4-1.fc25.s390x
valgrind-3.12.0-9.fc25.s390x
perl-open-1.10-387.fc25.noarch
libgcc-6.4.1-1.fc25.s390x
libsoup-2.56.1-1.fc25.s390x
libstdc++-devel-6.4.1-1.fc25.s390x
nss-softokn-devel-3.31.0-1.0.fc25.s390x
libobjc-6.4.1-1.fc25.s390x
python2-rpm-4.13.0.1-2.fc25.s390x
python2-gluster-3.10.5-1.fc25.s390x
rpm-build-4.13.0.1-2.fc25.s390x
glibc-static-2.24-10.fc25.s390x
lz4-1.8.0-1.fc25.s390x
xapian-core-libs-1.2.24-1.fc25.s390x
elfutils-libelf-devel-0.169-1.fc25.s390x
libaio-0.3.110-6.fc24.s390x
libfontenc-1.1.3-3.fc24.s390x
lzo-2.08-8.fc24.s390x
isl-0.14-5.fc24.s390x
libXau-1.0.8-6.fc24.s390x
linux-atm-libs-2.5.1-14.fc24.s390x
libXext-1.3.3-4.fc24.s390x
libXxf86vm-1.1.4-3.fc24.s390x
bison-3.0.4-4.fc24.s390x
perl-srpm-macros-1-20.fc25.noarch
gawk-4.1.3-8.fc25.s390x
libwayland-client-1.12.0-1.fc25.s390x
perl-Exporter-5.72-366.fc25.noarch
perl-version-0.99.17-1.fc25.s390x
fftw-libs-double-3.3.5-3.fc25.s390x
libssh2-1.8.0-1.fc25.s390x
ModemManager-glib-1.6.4-1.fc25.s390x
newt-python3-0.52.19-2.fc25.s390x
python-munch-2.0.4-3.fc25.noarch
python-bugzilla-1.2.2-4.fc25.noarch
libedit-3.1-16.20160618cvs.fc25.s390x
createrepo_c-0.10.0-6.fc25.s390x
device-mapper-multipath-libs-0.4.9-83.fc25.s390x
yum-3.4.3-510.fc25.noarch
mozjs17-17.0.0-16.fc25.s390x
libselinux-2.5-13.fc25.s390x
python2-pyparsing-2.1.10-1.fc25.noarch
cairo-gobject-1.14.8-1.fc25.s390x

Re: [Qemu-devel] [PATCH] rbd: Detect rbd image resizes and propagate them

2017-09-13 Thread no-reply
Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Qemu-devel] [PATCH] rbd: Detect rbd image resizes and propagate them
Message-id: 20170913164424.32129-1-...@digitalocean.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 t [tag update]
patchew/1505298088-10878-1-git-send-email-th...@redhat.com -> 
patchew/1505298088-10878-1-git-send-email-th...@redhat.com
 * [new tag]   patchew/20170913164424.32129-1-...@digitalocean.com 
-> patchew/20170913164424.32129-1-...@digitalocean.com
Switched to a new branch 'test'
c0e97b7e5b rbd: Detect rbd image resizes and propagate them

=== OUTPUT BEGIN ===
Checking PATCH 1/1: rbd: Detect rbd image resizes and propagate them...
WARNING: line over 80 characters
#57: FILE: block/rbd.c:572:
+error_report("bs %s does not have parent", 
bdrv_get_device_or_node_name(bs));

ERROR: line over 90 characters
#66: FILE: block/rbd.c:581:
+error_report("getlength failed on parent %s", 
bdrv_get_device_or_node_name(parent));

total: 1 errors, 1 warnings, 107 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org

[Qemu-devel] [PATCH] rbd: Detect rbd image resizes and propagate them

2017-09-13 Thread Adam Wolfe Gordon via Qemu-devel
Register a watcher with rbd so that we get notified when an image is
resized. Propagate resizes to parent block devices so that guest devices
get resized without user intervention.

Signed-off-by: Adam Wolfe Gordon 
---
Hello!

We are using this patch in production at DigitalOcean and have tested it fairly
extensively. However, we use the same block device configuration everywhere, so
I'd be interested to get an answer to the question I've left in the code:

> /* NOTE: This assumes there's only one layer between us and the
>block-backend. Is this always true? */

If that isn't true, this patch will need a bit of work to traverse up the block
device tree and find the block-backend.

Of course, any other feedback would be welcome too - this is my first foray into
the qemu code.

Regards,
Adam

 block/rbd.c | 80 +
 1 file changed, 80 insertions(+)

diff --git a/block/rbd.c b/block/rbd.c
index 144f350e1f..1c9fcbec1f 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -96,6 +96,8 @@ typedef struct BDRVRBDState {
 rbd_image_t image;
 char *image_name;
 char *snap;
+uint64_t watcher_handle;
+uint64_t size_bytes;
 } BDRVRBDState;
 
 static char *qemu_rbd_next_tok(char *src, char delim, char **p)
@@ -540,6 +542,67 @@ out:
 return rados_str;
 }
 
+/* BH for when rbd notifies us of an update. */
+static void qemu_rbd_update_bh(void *arg)
+{
+BlockDriverState *parent, *bs = arg;
+BDRVRBDState *s = bs->opaque;
+bool was_variable_length;
+uint64_t new_size_bytes;
+int64_t new_parent_len;
+BdrvChild *c;
+int r;
+
+r = rbd_get_size(s->image, _size_bytes);
+if (r < 0) {
+error_report("error reading size for %s: %s", s->name, strerror(-r));
+return;
+}
+
+/* Avoid no-op resizes on non-resize notifications. */
+if (new_size_bytes == s->size_bytes) {
+error_printf("skipping non-resize rbd cb\n");
+return;
+}
+
+/* NOTE: This assumes there's only one layer between us and the
+   block-backend. Is this always true? */
+parent = bs->inherits_from;
+if (parent == NULL) {
+error_report("bs %s does not have parent", 
bdrv_get_device_or_node_name(bs));
+return;
+}
+
+/* Force parents to re-read our size. */
+was_variable_length = bs->drv->has_variable_length;
+bs->drv->has_variable_length = true;
+new_parent_len = bdrv_getlength(parent);
+if (new_parent_len < 0) {
+error_report("getlength failed on parent %s", 
bdrv_get_device_or_node_name(parent));
+bs->drv->has_variable_length = was_variable_length;
+return;
+}
+bs->drv->has_variable_length = was_variable_length;
+
+/* Notify block backends that that we have resized.
+   Copied from bdrv_parent_cb_resize. */
+QLIST_FOREACH(c, >parents, next_parent) {
+if (c->role->resize) {
+c->role->resize(c);
+}
+}
+
+s->size_bytes = new_size_bytes;
+}
+
+/* Called from non-qemu thread - careful! */
+static void qemu_rbd_update_cb(void *arg)
+{
+BlockDriverState *bs = arg;
+
+aio_bh_schedule_oneshot(bdrv_get_aio_context(bs), qemu_rbd_update_bh, bs);
+}
+
 static int qemu_rbd_open(BlockDriverState *bs, QDict *options, int flags,
  Error **errp)
 {
@@ -672,9 +735,25 @@ static int qemu_rbd_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 }
 
+r = rbd_update_watch(s->image, >watcher_handle, qemu_rbd_update_cb, bs);
+if (r < 0) {
+error_setg_errno(errp, -r, "error registering watcher on %s", s->name);
+goto failed_watch;
+}
+
+r = rbd_get_size(s->image, >size_bytes);
+if (r < 0) {
+error_setg_errno(errp, -r, "error reading size for %s", s->name);
+goto failed_sz;
+}
+
 qemu_opts_del(opts);
 return 0;
 
+failed_sz:
+rbd_update_unwatch(s->image, s->watcher_handle);
+failed_watch:
+rbd_close(s->image);
 failed_open:
 rados_ioctx_destroy(s->io_ctx);
 failed_shutdown:
@@ -712,6 +791,7 @@ static void qemu_rbd_close(BlockDriverState *bs)
 {
 BDRVRBDState *s = bs->opaque;
 
+rbd_update_unwatch(s->image, s->watcher_handle);
 rbd_close(s->image);
 rados_ioctx_destroy(s->io_ctx);
 g_free(s->snap);
-- 
2.14.1




Re: [Qemu-devel] [PATCH v4 2/3] backup: Adds Backup Tool

2017-09-13 Thread John Snow


On 09/11/2017 01:02 AM, Fam Zheng wrote:
> On Fri, 09/08 22:11, Ishani Chugh wrote:
>> +def build_parser():
>> +backup_tool = BackupTool()
>> +parser = ArgumentParser()
>> +subparsers = parser.add_subparsers(title='Subcommands',
>> +   description='Valid Subcommands',
>> +   help='Subcommand help')
>> +guest_parser = subparsers.add_parser('guest', help='Manage guest(s)')
>> +guest_subparsers = guest_parser.add_subparsers(title='Guest Subparser')
>> +#   Guest list
> 
> Odd indentation of comments. Please align the # at 4th colume line other code
> lines.  The same to below.
> 

Sorry, that's probably my fault based on how I suggested the comments in
the previous email review. (:

Fam is right though, the pythonic way to comment is by aligning the
octothorpe to column 4 instead of leaving it at column 0.

--js



Re: [Qemu-devel] [PATCH v4 21/23] block: Align block status requests

2017-09-13 Thread Eric Blake
On 09/13/2017 11:03 AM, Eric Blake wrote:
> Any device that has request_alignment greater than 512 should be
> unable to report status at a finer granularity; it may also be
> simpler for such devices to be guaranteed that the block layer
> has rounded things out to the granularity boundary (the way the
> block layer already rounds all other I/O out).  Besides, getting
> the code correct for super-sector alignment also benefits us
> for the fact that our public interface now has byte granularity,
> even though none of our drivers have byte-level callbacks.
> 
> Add an assertion in blkdebug that proves that the block layer
> never requests status of unaligned sections, similar to what it
> does on other requests (while still keeping the generic helper
> in place for when future patches add a throttle driver).  Note
> that iotest 177 already covers this (it would fail if you use
> just the blkdebug.c hunk without the io.c changes).  Meanwhile,
> we can drop assertions in callers that no longer have to pass
> in sector-aligned addresses.

Bummer - 'git bisect' says this patch causes iotests 190 to hang.  I'm
investigating root cause, but I'll have to post a fixup once I figure it
out.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v1 0/2] Virtio GPU for S390

2017-09-13 Thread Farhan Ali



On 09/13/2017 03:00 PM, Thomas Huth wrote:

On 12.09.2017 16:26, Farhan Ali wrote:

These patches wire up the virtio-gpu device for CCW bus for S390.

For the S390 architecture which does not natively support any
graphics device, virtio gpu in 2D mode could be used to emulate a
simple graphics card and use VNC as the display.

eg: qemu-system-s390x ... -device virtio-gpu-ccw,devno=fe.0.0101
-vnc host_ip_addr:5900

Note, to actually see any display content the guest kernel needs to
support DRM layer, Virtio GPU driver, the Virtual Terminal layer
etc.


Do you have a list of CONFIG options that need to be enabled there?
Are there also any patches to the guest kernel driver required? Or
did that work out of the box once you've enabled the right CONFIG
options?


It required some kernel hacking. You need to enable the VT layer for
S390 to get any kind of graphics displayed.

I experimented on the guest side to enable the VT layer and run a
framebuffer console and also the Xfce desktop :)

Anyway the CONFIG options I used are:

The DRM configs to enable the DRM layer and virtio-gpu. I went with the 
default options for DRM layer.


CONFIG_DRM

CONFIG_DRM_VIRTIO_GPU

We also need to enable configs for the VT layer

CONFIG_VT

CONFIG_DUMMY_CONSOLE

And to display a framebuffer console for the guest

CONFIG_FRAMEBUFFER_CONSOLE




I would appreciate any feedback on these patches, specially the
first patch.


Patches look good to me, but I'm not at all familiar with the
virtio-gpu code, so that likely does not count...

Anyway, thanks a lot for tackling this! It's pretty cool to finally
have a graphics card on s390x, too :-)

Thomas






Re: [Qemu-devel] [Qemu-block] [PATCH] block: Clean up some bad code in the vvfat driver

2017-09-13 Thread John Snow


On 09/13/2017 06:21 AM, Thomas Huth wrote:
> Remove the unnecessary home-grown redefinition of the assert() macro here,
> and remove the unusable debug code at the end of the checkpoint() function.
> The code there uses assert() with side-effects (assignment to the "mapping"
> variable), which should be avoided. Looking more closely, it seems as it is
> apparently also only usable for one certain directory layout (with a file
> named USB.H in it) and thus is of no use for the rest of the world.
> 
> Signed-off-by: Thomas Huth 

Farewell, bitrot code.

Reviewed-by: John Snow 

Out of curiosity, I wonder ...

jhuston@probe (foobar) ~/s/qemu> git grep '#if 0' | wc -l
320

oh no



Re: [Qemu-devel] [PATCH v1 0/2] Virtio GPU for S390

2017-09-13 Thread Thomas Huth
On 12.09.2017 16:26, Farhan Ali wrote:
> These patches wire up the virtio-gpu device for CCW bus for
> S390.
> 
> For the S390 architecture which does not natively support any graphics
> device, virtio gpu in 2D mode could be used to emulate a simple graphics
> card and use VNC as the display.
> 
> eg: qemu-system-s390x ... -device virtio-gpu-ccw,devno=fe.0.0101
> -vnc host_ip_addr:5900
> 
> Note, to actually see any display content the
> guest kernel needs to support DRM layer, Virtio GPU driver,
> the Virtual Terminal layer etc.

Do you have a list of CONFIG options that need to be enabled there?
Are there also any patches to the guest kernel driver required? Or did
that work out of the box once you've enabled the right CONFIG options?

> I would appreciate any feedback on these patches, specially the
> first patch.

Patches look good to me, but I'm not at all familiar with the virtio-gpu
code, so that likely does not count...

Anyway, thanks a lot for tackling this! It's pretty cool to finally have
a graphics card on s390x, too :-)

 Thomas



[Qemu-devel] [Bug 1716767] Re: file(1) fails with "Invalid argument" on qemu-sh4-user

2017-09-13 Thread James Clarke
With the attached patch, qemu-sh4-static now works for file:

root@debian:/# /usr/bin/qemu-sh4-static.new2 /usr/bin/file /bin/bash
/bin/bash: ELF 32-bit LSB executable, Renesas SH, version 1 (SYSV), dynamically 
linked, interpreter /lib/ld-linux.so.2, 
BuildID[sha1]=579b7ca73585e30c022e6dec83666ecc746f7499, for GNU/Linux 3.2.0, 
stripped

** Patch added: 
"0001-linux-user-syscall.c-Handle-SH4-s-exceptional-alignm.patch"
   
https://bugs.launchpad.net/qemu/+bug/1716767/+attachment/4949652/+files/0001-linux-user-syscall.c-Handle-SH4-s-exceptional-alignm.patch

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1716767

Title:
  file(1) fails with "Invalid argument" on qemu-sh4-user

Status in QEMU:
  New

Bug description:
  We recently discovered that file(1) fails on qemu-sh4-user when
  running on an ELF file:

  (sid_sh4)root@vs94:/# file /bin/bash
  /bin/bash: ERROR: ELF 32-bit LSB executable, Renesas SH, version 1 (SYSV) 
error reading (Invalid argument)
  (sid_sh4)root@vs94:/#

  Running with "-d" yields more output:

  (sid_sh4)root@vs94:/# file -d /bin/bash 2>&1 | tail
  322: >> 7 byte&,=97,"(ARM)"]
  0 == 97 = 0
  mget(type=1, flag=0, offset=7, o=0, nbytes=863324, il=0, nc=1)
  mget/96 @7: 
\000\000\000\000\000\000\000\000\000\002\000*\000\001\000\000\000\250\317A\0004\000\000\000L(\r\000\027\000\000\0004\000
 
\000\n\000(\000\032\000\031\000\006\000\000\0004\000\000\0004\000@\0004\000@\000@\001\000\000@\001\000\000\005\000\000\000\004\000\000\000\003\000\000\000t\001\000\000t\001@\000t\001@\000\023\000\000

  323: >> 7 byte&,=-1,"(embedded)"]
  0 == 18446744073709551615 = 0
  [try softmagic 1]
  [try elf -1]
  /bin/bash: ERROR: ELF 32-bit LSB executable, Renesas SH, version 1 (SYSV) 
error reading (Invalid argument)
  (sid_sh4)root@vs94:/#

  It seems that the comparison above has a bogus (overflown?) value.

  On actual hardware, it works:

  root@tirpitz:~> file /bin/bash
  /bin/bash: ELF 32-bit LSB executable, Renesas SH, version 1 (SYSV), 
dynamically linked, interpreter /lib/ld-linux.so.2, 
BuildID[sha1]=4dd0e4281755827d8bb6686fd481f8c80ea73e9a, for GNU/Linux 3.2.0, 
stripped
  root@tirpitz:~>

  I have uploaded a chroot with Debian unstable which allows to
  reproduce the issue:

  > https://people.debian.org/~glaubitz/sid-sh4-sbuild.tar.gz

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1716767/+subscriptions



[Qemu-devel] [PATCH v2] tests: Introduce generic device hot-plug/hot-unplug functions

2017-09-13 Thread Thomas Huth
A lot of tests provide code for adding and removing a device via the
device_add and device_del QMP commands. Maintaining this code in so many
places is cumbersome and error-prone (some of the code parts check the
responses for device deletion in an incorrect way, for example, we've got
to deal with both, error code and DEVICE_DEL event here). So let's provide
some proper generic functions for adding and removing a device instead.

The code for correctly unplugging a device has been taken from a patch
from Peter Xu.

Signed-off-by: Thomas Huth 
---
 v2:
 - Use the unplug code from Peter Xu's patch
 - Renamed the functions to *device_add/del instead of *hotplug/unplug

 tests/libqos/pci.c | 19 ++--
 tests/libqos/usb.c | 30 ---
 tests/libqtest.c   | 75 ++
 tests/libqtest.h   | 19 
 tests/usb-hcd-uhci-test.c  | 26 ++--
 tests/usb-hcd-xhci-test.c  | 51 +++
 tests/virtio-scsi-test.c   | 24 ++-
 tests/virtio-serial-test.c | 25 ++--
 8 files changed, 113 insertions(+), 156 deletions(-)

diff --git a/tests/libqos/pci.c b/tests/libqos/pci.c
index 2dcdead..df1f98e 100644
--- a/tests/libqos/pci.c
+++ b/tests/libqos/pci.c
@@ -394,21 +394,6 @@ QPCIBar qpci_legacy_iomap(QPCIDevice *dev, uint16_t addr)
 void qpci_plug_device_test(const char *driver, const char *id,
uint8_t slot, const char *opts)
 {
-QDict *response;
-char *cmd;
-
-cmd = g_strdup_printf("{'execute': 'device_add',"
-  " 'arguments': {"
-  "   'driver': '%s',"
-  "   'addr': '%d',"
-  "   %s%s"
-  "   'id': '%s'"
-  "}}", driver, slot,
-  opts ? opts : "", opts ? "," : "",
-  id);
-response = qmp(cmd);
-g_free(cmd);
-g_assert(response);
-g_assert(!qdict_haskey(response, "error"));
-QDECREF(response);
+qtest_qmp_device_add(driver, id, "'addr': '%d'%s%s", slot,
+ opts ? ", " : "", opts ? opts : "");
 }
diff --git a/tests/libqos/usb.c b/tests/libqos/usb.c
index 0cdfaec..2a47604 100644
--- a/tests/libqos/usb.c
+++ b/tests/libqos/usb.c
@@ -40,34 +40,16 @@ void uhci_port_test(struct qhc *hc, int port, uint16_t 
expect)
 void usb_test_hotplug(const char *hcd_id, const int port,
   void (*port_check)(void))
 {
-QDict *response;
-char  *cmd;
+char  *id = g_strdup_printf("usbdev%d", port);
 
-cmd = g_strdup_printf("{'execute': 'device_add',"
-  " 'arguments': {"
-  "   'driver': 'usb-tablet',"
-  "   'port': '%d',"
-  "   'bus': '%s.0',"
-  "   'id': 'usbdev%d'"
-  "}}", port, hcd_id, port);
-response = qmp(cmd);
-g_free(cmd);
-g_assert(response);
-g_assert(!qdict_haskey(response, "error"));
-QDECREF(response);
+qtest_qmp_device_add("usb-tablet", id, "'port': '%d', 'bus': '%s.0'",
+ port, hcd_id);
 
 if (port_check) {
 port_check();
 }
 
-cmd = g_strdup_printf("{'execute': 'device_del',"
-   " 'arguments': {"
-   "   'id': 'usbdev%d'"
-   "}}", port);
-response = qmp(cmd);
-g_free(cmd);
-g_assert(response);
-g_assert(qdict_haskey(response, "event"));
-g_assert(!strcmp(qdict_get_str(response, "event"), "DEVICE_DELETED"));
-QDECREF(response);
+qtest_qmp_device_del(id);
+
+g_free(id);
 }
diff --git a/tests/libqtest.c b/tests/libqtest.c
index b9a1f18..0c12b38 100644
--- a/tests/libqtest.c
+++ b/tests/libqtest.c
@@ -987,3 +987,78 @@ void qtest_cb_for_every_machine(void (*cb)(const char 
*machine))
 qtest_end();
 QDECREF(response);
 }
+
+/*
+ * Generic hot-plugging test via the device_add QMP command.
+ */
+void qtest_qmp_device_add(const char *driver, const char *id, const char *fmt,
+  ...)
+{
+QDict *response;
+char *cmd, *opts = NULL;
+va_list va;
+
+if (fmt) {
+va_start(va, fmt);
+opts = g_strdup_vprintf(fmt, va);
+va_end(va);
+}
+
+cmd = g_strdup_printf("{'execute': 'device_add',"
+  " 'arguments': { 'driver': '%s', 'id': '%s'%s%s }}",
+  driver, id, opts ? ", " : "", opts ? opts : "");
+g_free(opts);
+
+response = qmp(cmd);
+g_free(cmd);
+g_assert(response);
+g_assert(!qdict_haskey(response, "event")); /* We don't expect any events 
*/
+g_assert(!qdict_haskey(response, "error"));
+QDECREF(response);
+}
+
+/*
+ * Generic hot-unplugging test via the device_del QMP command.
+ * Device deletion will get 

[Qemu-devel] [PATCH 13/18] block/mirror: Keep write perm for pending writes

2017-09-13 Thread Max Reitz
The owner of the mirror BDS might retire its write permission; but there
may still be pending mirror operations so the mirror BDS cannot
necessarily retire its write permission for its child then.

Signed-off-by: Max Reitz 
---
 block/mirror.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 05410c94ca..612fab660e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1236,6 +1236,7 @@ static void bdrv_mirror_top_child_perm(BlockDriverState 
*bs, BdrvChild *c,
uint64_t *nperm, uint64_t *nshared)
 {
 MirrorBDSOpaque *s = bs->opaque;
+bool ops_in_flight = s->job && !QTAILQ_EMPTY(>job->ops_in_flight);
 
 if (s->job && s->job->exiting) {
 *nperm = 0;
@@ -1243,9 +1244,10 @@ static void bdrv_mirror_top_child_perm(BlockDriverState 
*bs, BdrvChild *c,
 return;
 }
 
-/* Must be able to forward guest writes to the real image */
+/* Must be able to forward both new and pending guest writes to
+ * the real image */
 *nperm = 0;
-if (perm & BLK_PERM_WRITE) {
+if ((perm & BLK_PERM_WRITE) || ops_in_flight) {
 *nperm |= BLK_PERM_WRITE;
 }
 
-- 
2.13.5




[Qemu-devel] [PATCH 17/18] qemu-io: Add background write

2017-09-13 Thread Max Reitz
Add a new parameter -B to qemu-io's write command.  When used, qemu-io
will not wait for the result of the operation and instead execute it in
the background.

Signed-off-by: Max Reitz 
---
 qemu-io-cmds.c | 83 +-
 1 file changed, 77 insertions(+), 6 deletions(-)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index 2811a89099..c635a248f5 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -481,6 +481,62 @@ static int do_pwrite(BlockBackend *blk, char *buf, int64_t 
offset,
 typedef struct {
 BlockBackend *blk;
 int64_t offset;
+int bytes;
+char *buf;
+int flags;
+} CoBackgroundWrite;
+
+static void coroutine_fn co_background_pwrite_entry(void *opaque)
+{
+CoBackgroundWrite *data = opaque;
+QEMUIOVector qiov;
+int ret;
+
+qemu_iovec_init(, 1);
+qemu_iovec_add(, data->buf, data->bytes);
+
+ret = blk_co_pwritev(data->blk, data->offset, data->bytes, ,
+ data->flags);
+
+qemu_iovec_destroy();
+g_free(data->buf);
+
+if (ret < 0) {
+Error *err;
+error_setg_errno(, -ret, "Background write failed");
+error_report_err(err);
+}
+}
+
+/* Takes ownership of @buf */
+static int do_background_pwrite(BlockBackend *blk, char *buf, int64_t offset,
+int64_t bytes, int flags)
+{
+Coroutine *co;
+CoBackgroundWrite *data;
+
+if (bytes > INT_MAX) {
+return -ERANGE;
+}
+
+data = g_new(CoBackgroundWrite, 1);
+*data = (CoBackgroundWrite){
+.blk= blk,
+.offset = offset,
+.bytes  = bytes,
+.buf= buf,
+.flags  = flags,
+};
+
+co = qemu_coroutine_create(co_background_pwrite_entry, data);
+bdrv_coroutine_enter(blk_bs(blk), co);
+
+return bytes;
+}
+
+typedef struct {
+BlockBackend *blk;
+int64_t offset;
 int64_t bytes;
 int64_t *total;
 int flags;
@@ -931,6 +987,7 @@ static void write_help(void)
 " Writes into a segment of the currently open file, using a buffer\n"
 " filled with a set pattern (0xcdcdcdcd).\n"
 " -b, -- write to the VM state rather than the virtual disk\n"
+" -B, -- just start a background write, do not wait for the result\n"
 " -c, -- write compressed data with blk_write_compressed\n"
 " -f, -- use Force Unit Access semantics\n"
 " -p, -- ignored for backwards compatibility\n"
@@ -951,7 +1008,7 @@ static const cmdinfo_t write_cmd = {
 .perm   = BLK_PERM_WRITE,
 .argmin = 2,
 .argmax = -1,
-.args   = "[-bcCfquz] [-P pattern] off len",
+.args   = "[-bBcCfquz] [-P pattern] off len",
 .oneline= "writes a number of bytes at a specified offset",
 .help   = write_help,
 };
@@ -961,6 +1018,7 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 struct timeval t1, t2;
 bool Cflag = false, qflag = false, bflag = false;
 bool Pflag = false, zflag = false, cflag = false;
+bool background = false;
 int flags = 0;
 int c, cnt;
 char *buf = NULL;
@@ -970,11 +1028,14 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 int64_t total = 0;
 int pattern = 0xcd;
 
-while ((c = getopt(argc, argv, "bcCfpP:quz")) != -1) {
+while ((c = getopt(argc, argv, "bBcCfpP:quz")) != -1) {
 switch (c) {
 case 'b':
 bflag = true;
 break;
+case 'B':
+background = true;
+break;
 case 'c':
 cflag = true;
 break;
@@ -1032,6 +1093,11 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 return 0;
 }
 
+if (background && (bflag || cflag || zflag)) {
+printf("-B cannot be specified together with -b, -c, or -z\n");
+return 0;
+}
+
 offset = cvtnum(argv[optind]);
 if (offset < 0) {
 print_cvtnum_err(offset, argv[optind]);
@@ -1074,6 +1140,8 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 cnt = do_co_pwrite_zeroes(blk, offset, count, flags, );
 } else if (cflag) {
 cnt = do_write_compressed(blk, buf, offset, count, );
+} else if (background) {
+cnt = do_background_pwrite(blk, buf, offset, count, flags);
 } else {
 cnt = do_pwrite(blk, buf, offset, count, flags, );
 }
@@ -1088,12 +1156,15 @@ static int write_f(BlockBackend *blk, int argc, char 
**argv)
 goto out;
 }
 
-/* Finally, report back -- -C gives a parsable format */
-t2 = tsub(t2, t1);
-print_report("wrote", , offset, count, total, cnt, Cflag);
+if (!background) {
+/* Finally, report back -- -C gives a parsable format */
+t2 = tsub(t2, t1);
+print_report("wrote", , offset, count, total, cnt, Cflag);
+}
 
 out:
-if (!zflag) {
+/* do_background_pwrite() takes ownership of the buffer */
+if (!zflag && !background) {
 qemu_io_free(buf);
 }
 
-- 
2.13.5




[Qemu-devel] [PATCH 16/18] block/mirror: Add copy mode QAPI interface

2017-09-13 Thread Max Reitz
This patch allows the user to specify whether to use active or only
passive mode for mirror block jobs.  Currently, this setting will remain
constant for the duration of the entire block job.

Signed-off-by: Max Reitz 
---
 qapi/block-core.json  | 11 +--
 include/block/block_int.h |  4 +++-
 block/mirror.c| 11 ++-
 blockdev.c|  9 -
 4 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index e072cfa67c..40204d367a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1578,6 +1578,9 @@
 # written. Both will result in identical contents.
 # Default is true. (Since 2.4)
 #
+# @copy-mode: when to copy data to the destination; defaults to 'passive'
+# (Since: 2.11)
+#
 # Since: 1.3
 ##
 { 'struct': 'DriveMirror',
@@ -1587,7 +1590,7 @@
 '*speed': 'int', '*granularity': 'uint32',
 '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
 '*on-target-error': 'BlockdevOnError',
-'*unmap': 'bool' } }
+'*unmap': 'bool', '*copy-mode': 'MirrorCopyMode' } }
 
 ##
 # @BlockDirtyBitmap:
@@ -1766,6 +1769,9 @@
 #above @device. If this option is not given, a node name is
 #autogenerated. (Since: 2.9)
 #
+# @copy-mode: when to copy data to the destination; defaults to 'passive'
+# (Since: 2.11)
+#
 # Returns: nothing on success.
 #
 # Since: 2.6
@@ -1786,7 +1792,8 @@
 '*speed': 'int', '*granularity': 'uint32',
 '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
 '*on-target-error': 'BlockdevOnError',
-'*filter-node-name': 'str' } }
+'*filter-node-name': 'str',
+'*copy-mode': 'MirrorCopyMode' } }
 
 ##
 # @block_set_io_throttle:
diff --git a/include/block/block_int.h b/include/block/block_int.h
index fa8bbf1f8b..517b2680ce 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -934,6 +934,7 @@ void commit_active_start(const char *job_id, 
BlockDriverState *bs,
  * @filter_node_name: The node name that should be assigned to the filter
  * driver that the mirror job inserts into the graph above @bs. NULL means that
  * a node name should be autogenerated.
+ * @copy_mode: When to trigger writes to the target.
  * @errp: Error object.
  *
  * Start a mirroring operation on @bs.  Clusters that are allocated
@@ -947,7 +948,8 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
   MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
   BlockdevOnError on_source_error,
   BlockdevOnError on_target_error,
-  bool unmap, const char *filter_node_name, Error **errp);
+  bool unmap, const char *filter_node_name,
+  MirrorCopyMode copy_mode, Error **errp);
 
 /*
  * backup_job_create:
diff --git a/block/mirror.c b/block/mirror.c
index c429aa77bb..8a67935cc4 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1464,7 +1464,7 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
  const BlockJobDriver *driver,
  bool is_none_mode, BlockDriverState *base,
  bool auto_complete, const char *filter_node_name,
- bool is_mirror,
+ bool is_mirror, MirrorCopyMode copy_mode,
  Error **errp)
 {
 MirrorBlockJob *s;
@@ -1574,7 +1574,7 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
 s->on_target_error = on_target_error;
 s->is_none_mode = is_none_mode;
 s->backing_mode = backing_mode;
-s->copy_mode = MIRROR_COPY_MODE_PASSIVE;
+s->copy_mode = copy_mode;
 s->base = base;
 s->granularity = granularity;
 s->buf_size = ROUND_UP(buf_size, granularity);
@@ -1643,7 +1643,8 @@ void mirror_start(const char *job_id, BlockDriverState 
*bs,
   MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
   BlockdevOnError on_source_error,
   BlockdevOnError on_target_error,
-  bool unmap, const char *filter_node_name, Error **errp)
+  bool unmap, const char *filter_node_name,
+  MirrorCopyMode copy_mode, Error **errp)
 {
 bool is_none_mode;
 BlockDriverState *base;
@@ -1658,7 +1659,7 @@ void mirror_start(const char *job_id, BlockDriverState 
*bs,
  speed, granularity, buf_size, backing_mode,
  on_source_error, on_target_error, unmap, NULL, NULL,
  _job_driver, is_none_mode, base, false,
- filter_node_name, true, errp);
+ filter_node_name, true, copy_mode, errp);
 }
 
 void commit_active_start(const char *job_id, BlockDriverState *bs,
@@ 

[Qemu-devel] [PATCH 18/18] iotests: Add test for active mirroring

2017-09-13 Thread Max Reitz
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/151 | 111 +
 tests/qemu-iotests/151.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 117 insertions(+)
 create mode 100755 tests/qemu-iotests/151
 create mode 100644 tests/qemu-iotests/151.out

diff --git a/tests/qemu-iotests/151 b/tests/qemu-iotests/151
new file mode 100755
index 00..49a60773f9
--- /dev/null
+++ b/tests/qemu-iotests/151
@@ -0,0 +1,111 @@
+#!/usr/bin/env python
+#
+# Tests for active mirroring
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import os
+import iotests
+from iotests import qemu_img
+
+source_img = os.path.join(iotests.test_dir, 'source.' + iotests.imgfmt)
+target_img = os.path.join(iotests.test_dir, 'target.' + iotests.imgfmt)
+
+class TestActiveMirror(iotests.QMPTestCase):
+image_len = 128 * 1024 * 1024 # MB
+potential_writes_in_flight = True
+
+def setUp(self):
+qemu_img('create', '-f', iotests.imgfmt, source_img, '128M')
+qemu_img('create', '-f', iotests.imgfmt, target_img, '128M')
+
+blk_source = {'node-name': 'source',
+  'driver': iotests.imgfmt,
+  'file': {'driver': 'file',
+   'filename': source_img}}
+
+blk_target = {'node-name': 'target',
+  'driver': iotests.imgfmt,
+  'file': {'driver': 'file',
+   'filename': target_img}}
+
+self.vm = iotests.VM()
+self.vm.add_blockdev(self.qmp_to_opts(blk_source))
+self.vm.add_blockdev(self.qmp_to_opts(blk_target))
+self.vm.launch()
+
+def tearDown(self):
+self.vm.shutdown()
+
+if not self.potential_writes_in_flight:
+self.assertTrue(iotests.compare_images(source_img, target_img),
+'mirror target does not match source')
+
+os.remove(source_img)
+os.remove(target_img)
+
+def doActiveIO(self, sync_source_and_target):
+# Fill the source image
+self.vm.hmp_qemu_io('source',
+'write -P 1 0 %i' % self.image_len);
+
+# Start some background requests
+for offset in range(0, self.image_len, 1024 * 1024):
+self.vm.hmp_qemu_io('source', 'write -B -P 2 %i 1M' % offset)
+
+# Start the block job
+result = self.vm.qmp('blockdev-mirror',
+ job_id='mirror',
+ filter_node_name='mirror-node',
+ device='source',
+ target='target',
+ sync='full',
+ copy_mode='active-write')
+self.assert_qmp(result, 'return', {})
+
+# Start some more requests
+for offset in range(0, self.image_len, 1024 * 1024):
+self.vm.hmp_qemu_io('mirror-node', 'write -B -P 3 %i 1M' % offset)
+
+# Wait for the READY event
+self.wait_ready(drive='mirror')
+
+# Now start some final requests; all of these (which land on
+# the source) should be settled using the active mechanism.
+# The mirror code itself asserts that the source BDS's dirty
+# bitmap will stay clean between READY and COMPLETED.
+for offset in range(0, self.image_len, 1024 * 1024):
+self.vm.hmp_qemu_io('mirror-node', 'write -B -P 4 %i 1M' % offset)
+
+if sync_source_and_target:
+# If source and target should be in sync after the mirror,
+# we have to flush before completion
+self.vm.hmp_qemu_io('mirror-node', 'flush')
+self.potential_writes_in_flight = False
+
+self.complete_and_wait(drive='mirror', wait_ready=False)
+
+def testActiveIO(self):
+self.doActiveIO(False)
+
+def testActiveIOFlushed(self):
+self.doActiveIO(True)
+
+
+
+if __name__ == '__main__':
+iotests.main(supported_fmts=['qcow2', 'raw'])
diff --git a/tests/qemu-iotests/151.out b/tests/qemu-iotests/151.out
new file mode 100644
index 00..fbc63e62f8
--- /dev/null
+++ b/tests/qemu-iotests/151.out
@@ -0,0 +1,5 @@
+..
+--
+Ran 2 tests
+
+OK
diff --git a/tests/qemu-iotests/group 

[Qemu-devel] [PATCH 10/18] block/mirror: Make source the file child

2017-09-13 Thread Max Reitz
Regarding the source BDS, the mirror BDS is arguably a filter node.
Therefore, the source BDS should be its "file" child.

Signed-off-by: Max Reitz 
---
 block/mirror.c | 127 ++---
 block/qapi.c   |  25 ++---
 tests/qemu-iotests/141.out |   4 +-
 3 files changed, 119 insertions(+), 37 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 9df4157511..05410c94ca 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -77,8 +77,16 @@ typedef struct MirrorBlockJob {
 int target_cluster_size;
 int max_iov;
 bool initial_zeroing_ongoing;
+
+/* Signals that we are no longer accessing source and target and the mirror
+ * BDS should thus relinquish all permissions */
+bool exiting;
 } MirrorBlockJob;
 
+typedef struct MirrorBDSOpaque {
+MirrorBlockJob *job;
+} MirrorBDSOpaque;
+
 struct MirrorOp {
 MirrorBlockJob *s;
 QEMUIOVector qiov;
@@ -595,12 +603,15 @@ static void mirror_exit(BlockJob *job, void *opaque)
 {
 MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
 MirrorExitData *data = opaque;
+MirrorBDSOpaque *bs_opaque = s->mirror_top_bs->opaque;
 AioContext *replace_aio_context = NULL;
 BlockDriverState *src = s->source->bs;
 BlockDriverState *target_bs = blk_bs(s->target);
 BlockDriverState *mirror_top_bs = s->mirror_top_bs;
 Error *local_err = NULL;
 
+s->exiting = true;
+
 bdrv_release_dirty_bitmap(src, s->dirty_bitmap);
 
 /* Make sure that the source BDS doesn't go away before we called
@@ -622,7 +633,7 @@ static void mirror_exit(BlockJob *job, void *opaque)
 
 /* We don't access the source any more. Dropping any WRITE/RESIZE is
  * required before it could become a backing file of target_bs. */
-bdrv_child_try_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL,
+bdrv_child_try_set_perm(mirror_top_bs->file, 0, BLK_PERM_ALL,
 _abort);
 if (s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
 BlockDriverState *backing = s->is_none_mode ? src : s->base;
@@ -673,12 +684,11 @@ static void mirror_exit(BlockJob *job, void *opaque)
 
 /* Remove the mirror filter driver from the graph. Before this, get rid of
  * the blockers on the intermediate nodes so that the resulting state is
- * valid. Also give up permissions on mirror_top_bs->backing, which might
+ * valid. Also give up permissions on mirror_top_bs->file, which might
  * block the removal. */
 block_job_remove_all_bdrv(job);
-bdrv_child_try_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL,
-_abort);
-bdrv_replace_node(mirror_top_bs, backing_bs(mirror_top_bs), _abort);
+bdrv_child_try_set_perm(mirror_top_bs->file, 0, BLK_PERM_ALL, 
_abort);
+bdrv_replace_node(mirror_top_bs, mirror_top_bs->file->bs, _abort);
 
 /* We just changed the BDS the job BB refers to (with either or both of the
  * bdrv_replace_node() calls), so switch the BB back so the cleanup does
@@ -687,6 +697,7 @@ static void mirror_exit(BlockJob *job, void *opaque)
 blk_set_perm(job->blk, 0, BLK_PERM_ALL, _abort);
 blk_insert_bs(job->blk, mirror_top_bs, _abort);
 
+bs_opaque->job = NULL;
 block_job_completed(>common, data->ret);
 
 g_free(data);
@@ -1102,7 +1113,7 @@ static void mirror_drain(BlockJob *job)
 {
 MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
 
-/* Need to keep a reference in case blk_drain triggers execution
+/* Need to keep a reference in case bdrv_drain triggers execution
  * of mirror_complete...
  */
 if (s->target) {
@@ -1135,44 +1146,88 @@ static const BlockJobDriver commit_active_job_driver = {
 .drain  = mirror_drain,
 };
 
+static void source_child_inherit_fmt_options(int *child_flags,
+ QDict *child_options,
+ int parent_flags,
+ QDict *parent_options)
+{
+child_backing.inherit_options(child_flags, child_options,
+  parent_flags, parent_options);
+}
+
+static char *source_child_get_parent_desc(BdrvChild *c)
+{
+return child_backing.get_parent_desc(c);
+}
+
+static void source_child_cb_drained_begin(BdrvChild *c)
+{
+BlockDriverState *bs = c->opaque;
+MirrorBDSOpaque *s = bs->opaque;
+
+if (s && s->job) {
+block_job_drained_begin(>job->common);
+}
+bdrv_drained_begin(bs);
+}
+
+static void source_child_cb_drained_end(BdrvChild *c)
+{
+BlockDriverState *bs = c->opaque;
+MirrorBDSOpaque *s = bs->opaque;
+
+if (s && s->job) {
+block_job_drained_end(>job->common);
+}
+bdrv_drained_end(bs);
+}
+
+static BdrvChildRole source_child_role = {
+.inherit_options= source_child_inherit_fmt_options,
+.get_parent_desc= source_child_get_parent_desc,
+

[Qemu-devel] [PATCH 14/18] block/mirror: Distinguish active from passive ops

2017-09-13 Thread Max Reitz
Currently, the mirror block job only knows passive operations.  But once
we introduce active writes, we need to distinguish between the two; for
example, mirror_wait_for_free_in_flight_slot() should wait for a passive
operation because active writes will not use the same in-flight slots.

Signed-off-by: Max Reitz 
---
 block/mirror.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 612fab660e..8fea619a68 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -96,6 +96,7 @@ struct MirrorOp {
 /* Set by mirror_co_read() before yielding for the first time */
 uint64_t bytes_copied;
 
+bool is_active_write;
 CoQueue waiting_requests;
 
 QTAILQ_ENTRY(MirrorOp) next;
@@ -286,9 +287,14 @@ static inline void 
mirror_wait_for_free_in_flight_slot(MirrorBlockJob *s)
 {
 MirrorOp *op;
 
-op = QTAILQ_FIRST(>ops_in_flight);
-assert(op);
-qemu_co_queue_wait(>waiting_requests, NULL);
+QTAILQ_FOREACH(op, >ops_in_flight, next) {
+if (!op->is_active_write) {
+/* Only non-active operations use up in-flight slots */
+qemu_co_queue_wait(>waiting_requests, NULL);
+return;
+}
+}
+abort();
 }
 
 /* Submit async read while handling COW.
-- 
2.13.5




[Qemu-devel] [PATCH 15/18] block/mirror: Add active mirroring

2017-09-13 Thread Max Reitz
This patch implements active synchronous mirroring.  In active mode, the
passive mechanism will still be in place and is used to copy all
initially dirty clusters off the source disk; but every write request
will write data both to the source and the target disk, so the source
cannot be dirtied faster than data is mirrored to the target.  Also,
once the block job has converged (BLOCK_JOB_READY sent), source and
target are guaranteed to stay in sync (unless an error occurs).

Optionally, dirty data can be copied to the target disk on read
operations, too.

Active mode is completely optional and currently disabled at runtime.  A
later patch will add a way for users to enable it.

Signed-off-by: Max Reitz 
---
 qapi/block-core.json |  23 +++
 block/mirror.c   | 187 +--
 2 files changed, 205 insertions(+), 5 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index bb11815608..e072cfa67c 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -938,6 +938,29 @@
   'data': ['top', 'full', 'none', 'incremental'] }
 
 ##
+# @MirrorCopyMode:
+#
+# An enumeration whose values tell the mirror block job when to
+# trigger writes to the target.
+#
+# @passive: copy data in background only.
+#
+# @active-write: when data is written to the source, write it
+#(synchronously) to the target as well.  In addition,
+#data is copied in background just like in @passive
+#mode.
+#
+# @active-read-write: write data to the target (synchronously) both
+# when it is read from and written to the source.
+# In addition, data is copied in background just
+# like in @passive mode.
+#
+# Since: 2.11
+##
+{ 'enum': 'MirrorCopyMode',
+  'data': ['passive', 'active-write', 'active-read-write'] }
+
+##
 # @BlockJobType:
 #
 # Type of a block job.
diff --git a/block/mirror.c b/block/mirror.c
index 8fea619a68..c429aa77bb 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -54,8 +54,12 @@ typedef struct MirrorBlockJob {
 Error *replace_blocker;
 bool is_none_mode;
 BlockMirrorBackingMode backing_mode;
+MirrorCopyMode copy_mode;
 BlockdevOnError on_source_error, on_target_error;
 bool synced;
+/* Set when the target is synced (dirty bitmap is clean, nothing
+ * in flight) and the job is running in active mode */
+bool actively_synced;
 bool should_complete;
 int64_t granularity;
 size_t buf_size;
@@ -77,6 +81,7 @@ typedef struct MirrorBlockJob {
 int target_cluster_size;
 int max_iov;
 bool initial_zeroing_ongoing;
+int in_active_write_counter;
 
 /* Signals that we are no longer accessing source and target and the mirror
  * BDS should thus relinquish all permissions */
@@ -112,6 +117,7 @@ static BlockErrorAction mirror_error_action(MirrorBlockJob 
*s, bool read,
 int error)
 {
 s->synced = false;
+s->actively_synced = false;
 if (read) {
 return block_job_error_action(>common, s->on_source_error,
   true, error);
@@ -283,13 +289,12 @@ static int mirror_cow_align(MirrorBlockJob *s, int64_t 
*offset,
 return ret;
 }
 
-static inline void mirror_wait_for_free_in_flight_slot(MirrorBlockJob *s)
+static inline void mirror_wait_for_any_operation(MirrorBlockJob *s, bool 
active)
 {
 MirrorOp *op;
 
 QTAILQ_FOREACH(op, >ops_in_flight, next) {
-if (!op->is_active_write) {
-/* Only non-active operations use up in-flight slots */
+if (op->is_active_write == active) {
 qemu_co_queue_wait(>waiting_requests, NULL);
 return;
 }
@@ -297,6 +302,12 @@ static inline void 
mirror_wait_for_free_in_flight_slot(MirrorBlockJob *s)
 abort();
 }
 
+static inline void mirror_wait_for_free_in_flight_slot(MirrorBlockJob *s)
+{
+/* Only non-active operations use up in-flight slots */
+mirror_wait_for_any_operation(s, false);
+}
+
 /* Submit async read while handling COW.
  * Returns: The number of bytes copied after and including offset,
  *  excluding any bytes copied prior to offset due to alignment.
@@ -861,6 +872,7 @@ static void coroutine_fn mirror_run(void *opaque)
 /* Report BLOCK_JOB_READY and wait for complete. */
 block_job_event_ready(>common);
 s->synced = true;
+s->actively_synced = true;
 while (!block_job_is_cancelled(>common) && !s->should_complete) {
 block_job_yield(>common);
 }
@@ -912,6 +924,12 @@ static void coroutine_fn mirror_run(void *opaque)
 int64_t cnt, delta;
 bool should_complete;
 
+/* Do not start passive operations while there are active
+ * writes in progress */
+while (s->in_active_write_counter) {
+mirror_wait_for_any_operation(s, true);
+}
+
 if 

[Qemu-devel] [PATCH 06/18] block/mirror: Use CoQueue to wait on in-flight ops

2017-09-13 Thread Max Reitz
Attach a CoQueue to each in-flight operation so if we need to wait for
any we can use it to wait instead of just blindly yielding and hoping
for some operation to wake us.

A later patch will use this infrastructure to allow requests accessing
the same area of the virtual disk to specifically wait for each other.

Signed-off-by: Max Reitz 
---
 block/mirror.c | 34 +++---
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 2b3297aa61..81253fbad1 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -13,6 +13,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
+#include "qemu/coroutine.h"
 #include "trace.h"
 #include "block/blockjob_int.h"
 #include "block/block_int.h"
@@ -34,6 +35,8 @@ typedef struct MirrorBuffer {
 QSIMPLEQ_ENTRY(MirrorBuffer) next;
 } MirrorBuffer;
 
+typedef struct MirrorOp MirrorOp;
+
 typedef struct MirrorBlockJob {
 BlockJob common;
 RateLimit limit;
@@ -67,15 +70,15 @@ typedef struct MirrorBlockJob {
 unsigned long *in_flight_bitmap;
 int in_flight;
 int64_t bytes_in_flight;
+QTAILQ_HEAD(MirrorOpList, MirrorOp) ops_in_flight;
 int ret;
 bool unmap;
-bool waiting_for_io;
 int target_cluster_size;
 int max_iov;
 bool initial_zeroing_ongoing;
 } MirrorBlockJob;
 
-typedef struct MirrorOp {
+struct MirrorOp {
 MirrorBlockJob *s;
 QEMUIOVector qiov;
 int64_t offset;
@@ -83,7 +86,11 @@ typedef struct MirrorOp {
 
 /* Set by mirror_co_read() before yielding for the first time */
 uint64_t bytes_copied;
-} MirrorOp;
+
+CoQueue waiting_requests;
+
+QTAILQ_ENTRY(MirrorOp) next;
+};
 
 typedef enum MirrorMethod {
 MIRROR_METHOD_COPY,
@@ -124,7 +131,9 @@ static void coroutine_fn mirror_iteration_done(MirrorOp 
*op, int ret)
 
 chunk_num = op->offset / s->granularity;
 nb_chunks = DIV_ROUND_UP(op->bytes, s->granularity);
+
 bitmap_clear(s->in_flight_bitmap, chunk_num, nb_chunks);
+QTAILQ_REMOVE(>ops_in_flight, op, next);
 if (ret >= 0) {
 if (s->cow_bitmap) {
 bitmap_set(s->cow_bitmap, chunk_num, nb_chunks);
@@ -134,11 +143,9 @@ static void coroutine_fn mirror_iteration_done(MirrorOp 
*op, int ret)
 }
 }
 qemu_iovec_destroy(>qiov);
-g_free(op);
 
-if (s->waiting_for_io) {
-qemu_coroutine_enter(s->common.co);
-}
+qemu_co_queue_restart_all(>waiting_requests);
+g_free(op);
 }
 
 static void coroutine_fn mirror_write_complete(MirrorOp *op, int ret)
@@ -233,10 +240,11 @@ static int mirror_cow_align(MirrorBlockJob *s, int64_t 
*offset,
 
 static inline void mirror_wait_for_io(MirrorBlockJob *s)
 {
-assert(!s->waiting_for_io);
-s->waiting_for_io = true;
-qemu_coroutine_yield();
-s->waiting_for_io = false;
+MirrorOp *op;
+
+op = QTAILQ_FIRST(>ops_in_flight);
+assert(op);
+qemu_co_queue_wait(>waiting_requests, NULL);
 }
 
 /* Submit async read while handling COW.
@@ -342,6 +350,7 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t 
offset,
 .offset = offset,
 .bytes  = bytes,
 };
+qemu_co_queue_init(>waiting_requests);
 
 switch (mirror_method) {
 case MIRROR_METHOD_COPY:
@@ -357,6 +366,7 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t 
offset,
 abort();
 }
 
+QTAILQ_INSERT_TAIL(>ops_in_flight, op, next);
 qemu_coroutine_enter(co);
 
 if (mirror_method == MIRROR_METHOD_COPY) {
@@ -1292,6 +1302,8 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
 }
 }
 
+QTAILQ_INIT(>ops_in_flight);
+
 trace_mirror_start(bs, s, opaque);
 block_job_start(>common);
 return;
-- 
2.13.5




[Qemu-devel] [PATCH 12/18] block/dirty-bitmap: Add bdrv_dirty_iter_next_area

2017-09-13 Thread Max Reitz
This new function allows to look for a consecutively dirty area in a
dirty bitmap.

Signed-off-by: Max Reitz 
---
 include/block/dirty-bitmap.h |  2 ++
 block/dirty-bitmap.c | 52 
 2 files changed, 54 insertions(+)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index a79a58d2c3..7654748700 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -90,6 +90,8 @@ void bdrv_set_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
 void bdrv_reset_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
 int64_t cur_sector, int64_t nr_sectors);
 int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter);
+bool bdrv_dirty_iter_next_area(BdrvDirtyBitmapIter *iter, uint64_t max_offset,
+   uint64_t *offset, int *bytes);
 void bdrv_set_dirty_iter(BdrvDirtyBitmapIter *hbi, int64_t sector_num);
 int64_t bdrv_get_dirty_count(BdrvDirtyBitmap *bitmap);
 int64_t bdrv_get_meta_dirty_count(BdrvDirtyBitmap *bitmap);
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index aee57cf8c8..81b2f78016 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -550,6 +550,58 @@ int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter)
 return hbitmap_iter_next(>hbi, true);
 }
 
+/**
+ * Return the next consecutively dirty area in the dirty bitmap
+ * belonging to the given iterator @iter.
+ *
+ * @max_offset: Maximum value that may be returned for
+ *  *offset + *bytes
+ * @offset: Will contain the start offset of the next dirty area
+ * @bytes:  Will contain the length of the next dirty area
+ *
+ * Returns: True if a dirty area could be found before max_offset
+ *  (which means that *offset and *bytes then contain valid
+ *  values), false otherwise.
+ */
+bool bdrv_dirty_iter_next_area(BdrvDirtyBitmapIter *iter, uint64_t max_offset,
+   uint64_t *offset, int *bytes)
+{
+uint32_t granularity = bdrv_dirty_bitmap_granularity(iter->bitmap);
+uint64_t gran_max_offset;
+int sector_gran = granularity >> BDRV_SECTOR_BITS;
+int64_t ret;
+int size;
+
+if (DIV_ROUND_UP(max_offset, BDRV_SECTOR_SIZE) == iter->bitmap->size) {
+/* If max_offset points to the image end, round it up by the
+ * bitmap granularity */
+gran_max_offset = ROUND_UP(max_offset, granularity);
+} else {
+gran_max_offset = max_offset;
+}
+
+ret = hbitmap_iter_next(>hbi, false);
+if (ret < 0 || (ret << BDRV_SECTOR_BITS) + granularity > gran_max_offset) {
+return false;
+}
+
+*offset = ret << BDRV_SECTOR_BITS;
+size = 0;
+
+assert(granularity <= INT_MAX);
+
+do {
+/* Advance iterator */
+ret = hbitmap_iter_next(>hbi, true);
+size += granularity;
+} while ((ret << BDRV_SECTOR_BITS) + granularity <= gran_max_offset &&
+ hbitmap_iter_next(>hbi, false) == ret + sector_gran &&
+ size <= INT_MAX - granularity);
+
+*bytes = MIN(size, max_offset - *offset);
+return true;
+}
+
 /* Called within bdrv_dirty_bitmap_lock..unlock */
 void bdrv_set_dirty_bitmap_locked(BdrvDirtyBitmap *bitmap,
   int64_t cur_sector, int64_t nr_sectors)
-- 
2.13.5




Re: [Qemu-devel] [PATCH v1 2/2] virtio-gpu-ccw: Create a virtio gpu device for the ccw bus

2017-09-13 Thread Thomas Huth
On 13.09.2017 14:09, David Hildenbrand wrote:
> On 12.09.2017 16:26, Farhan Ali wrote:
>> Wire up the virtio-gpu device for the CCW bus. The virtio-gpu
>> is a virtio-1 device, so disable revision 0.
>>
>> Signed-off-by: Farhan Ali 
>> Acked-by: Christian Borntraeger 
>> ---
>>  hw/s390x/virtio-ccw.c | 54 
>> +++
>>  hw/s390x/virtio-ccw.h | 10 ++
>>  2 files changed, 64 insertions(+)
>>
>> diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
>> index b1976fd..3078bf0 100644
>> --- a/hw/s390x/virtio-ccw.c
>> +++ b/hw/s390x/virtio-ccw.c
>> @@ -1007,6 +1007,20 @@ static void virtio_ccw_crypto_realize(VirtioCcwDevice 
>> *ccw_dev, Error **errp)
>>   NULL);
>>  }
>>  
>> +static void virtio_ccw_gpu_realize(VirtioCcwDevice *ccw_dev, Error **errp)
>> +{
>> +VirtIOGPUCcw *dev = VIRTIO_GPU_CCW(ccw_dev);
>> +DeviceState *vdev = DEVICE(>vdev);
>> +Error *err = NULL;
>> +
>> +qdev_set_parent_bus(vdev, BUS(_dev->bus));
>> +object_property_set_bool(OBJECT(vdev), true, "realized", );
>> +if (err) {
>> +error_propagate(errp, err);
>> +return;
>> +}
> 
> You can call error_propagate() unconditionally, it can deal with !err.

I think you even do not need error_propagate here - just pass errp
directly to object_property_set_bool. You exit the function afterwards
anyway.

 Thomas



[Qemu-devel] [PATCH 11/18] hbitmap: Add @advance param to hbitmap_iter_next()

2017-09-13 Thread Max Reitz
This new parameter allows the caller to just query the next dirty
position without moving the iterator.

Signed-off-by: Max Reitz 
---
 include/qemu/hbitmap.h |  4 +++-
 block/dirty-bitmap.c   |  2 +-
 tests/test-hbitmap.c   | 26 +-
 util/hbitmap.c | 10 +++---
 4 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/include/qemu/hbitmap.h b/include/qemu/hbitmap.h
index d3a74a21fc..6a52575ad5 100644
--- a/include/qemu/hbitmap.h
+++ b/include/qemu/hbitmap.h
@@ -316,11 +316,13 @@ void hbitmap_free_meta(HBitmap *hb);
 /**
  * hbitmap_iter_next:
  * @hbi: HBitmapIter to operate on.
+ * @advance: If true, advance the iterator.  Otherwise, the next call
+ *   of this function will return the same result.
  *
  * Return the next bit that is set in @hbi's associated HBitmap,
  * or -1 if all remaining bits are zero.
  */
-int64_t hbitmap_iter_next(HBitmapIter *hbi);
+int64_t hbitmap_iter_next(HBitmapIter *hbi, bool advance);
 
 /**
  * hbitmap_iter_next_word:
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 30462d4f9a..aee57cf8c8 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -547,7 +547,7 @@ void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter)
 
 int64_t bdrv_dirty_iter_next(BdrvDirtyBitmapIter *iter)
 {
-return hbitmap_iter_next(>hbi);
+return hbitmap_iter_next(>hbi, true);
 }
 
 /* Called within bdrv_dirty_bitmap_lock..unlock */
diff --git a/tests/test-hbitmap.c b/tests/test-hbitmap.c
index 1acb353889..e6d4d563cb 100644
--- a/tests/test-hbitmap.c
+++ b/tests/test-hbitmap.c
@@ -46,7 +46,7 @@ static void hbitmap_test_check(TestHBitmapData *data,
 
 i = first;
 for (;;) {
-next = hbitmap_iter_next();
+next = hbitmap_iter_next(, true);
 if (next < 0) {
 next = data->size;
 }
@@ -435,25 +435,25 @@ static void test_hbitmap_iter_granularity(TestHBitmapData 
*data,
 /* Note that hbitmap_test_check has to be invoked manually in this test.  
*/
 hbitmap_test_init(data, 131072 << 7, 7);
 hbitmap_iter_init(, data->hb, 0);
-g_assert_cmpint(hbitmap_iter_next(), <, 0);
+g_assert_cmpint(hbitmap_iter_next(, true), <, 0);
 
 hbitmap_test_set(data, ((L2 + L1 + 1) << 7) + 8, 8);
 hbitmap_iter_init(, data->hb, 0);
-g_assert_cmpint(hbitmap_iter_next(), ==, (L2 + L1 + 1) << 7);
-g_assert_cmpint(hbitmap_iter_next(), <, 0);
+g_assert_cmpint(hbitmap_iter_next(, true), ==, (L2 + L1 + 1) << 7);
+g_assert_cmpint(hbitmap_iter_next(, true), <, 0);
 
 hbitmap_iter_init(, data->hb, (L2 + L1 + 2) << 7);
-g_assert_cmpint(hbitmap_iter_next(), <, 0);
+g_assert_cmpint(hbitmap_iter_next(, true), <, 0);
 
 hbitmap_test_set(data, (131072 << 7) - 8, 8);
 hbitmap_iter_init(, data->hb, 0);
-g_assert_cmpint(hbitmap_iter_next(), ==, (L2 + L1 + 1) << 7);
-g_assert_cmpint(hbitmap_iter_next(), ==, 131071 << 7);
-g_assert_cmpint(hbitmap_iter_next(), <, 0);
+g_assert_cmpint(hbitmap_iter_next(, true), ==, (L2 + L1 + 1) << 7);
+g_assert_cmpint(hbitmap_iter_next(, true), ==, 131071 << 7);
+g_assert_cmpint(hbitmap_iter_next(, true), <, 0);
 
 hbitmap_iter_init(, data->hb, (L2 + L1 + 2) << 7);
-g_assert_cmpint(hbitmap_iter_next(), ==, 131071 << 7);
-g_assert_cmpint(hbitmap_iter_next(), <, 0);
+g_assert_cmpint(hbitmap_iter_next(, true), ==, 131071 << 7);
+g_assert_cmpint(hbitmap_iter_next(, true), <, 0);
 }
 
 static void hbitmap_test_set_boundary_bits(TestHBitmapData *data, ssize_t diff)
@@ -893,7 +893,7 @@ static void test_hbitmap_serialize_zeroes(TestHBitmapData 
*data,
 for (i = 0; i < num_positions; i++) {
 hbitmap_deserialize_zeroes(data->hb, positions[i], min_l1, true);
 hbitmap_iter_init(, data->hb, 0);
-next = hbitmap_iter_next();
+next = hbitmap_iter_next(, true);
 if (i == num_positions - 1) {
 g_assert_cmpint(next, ==, -1);
 } else {
@@ -919,10 +919,10 @@ static void test_hbitmap_iter_and_reset(TestHBitmapData 
*data,
 
 hbitmap_iter_init(, data->hb, BITS_PER_LONG - 1);
 
-hbitmap_iter_next();
+hbitmap_iter_next(, true);
 
 hbitmap_reset_all(data->hb);
-hbitmap_iter_next();
+hbitmap_iter_next(, true);
 }
 
 int main(int argc, char **argv)
diff --git a/util/hbitmap.c b/util/hbitmap.c
index 21535cc90b..96525983ce 100644
--- a/util/hbitmap.c
+++ b/util/hbitmap.c
@@ -141,7 +141,7 @@ unsigned long hbitmap_iter_skip_words(HBitmapIter *hbi)
 return cur;
 }
 
-int64_t hbitmap_iter_next(HBitmapIter *hbi)
+int64_t hbitmap_iter_next(HBitmapIter *hbi, bool advance)
 {
 unsigned long cur = hbi->cur[HBITMAP_LEVELS - 1] &
 hbi->hb->levels[HBITMAP_LEVELS - 1][hbi->pos];
@@ -154,8 +154,12 @@ int64_t hbitmap_iter_next(HBitmapIter *hbi)
 }
 }
 
-/* The next call will resume work from the next bit.  */
-hbi->cur[HBITMAP_LEVELS - 1] = cur & (cur - 1);
+if (advance) {
+/* 

[Qemu-devel] [PATCH 05/18] block/mirror: Convert to coroutines

2017-09-13 Thread Max Reitz
In order to talk to the source BDS (and maybe in the future to the
target BDS as well) directly, we need to convert our existing AIO
requests into coroutine I/O requests.

Signed-off-by: Max Reitz 
---
 block/mirror.c | 134 +
 1 file changed, 78 insertions(+), 56 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 4664b0516f..2b3297aa61 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -80,6 +80,9 @@ typedef struct MirrorOp {
 QEMUIOVector qiov;
 int64_t offset;
 uint64_t bytes;
+
+/* Set by mirror_co_read() before yielding for the first time */
+uint64_t bytes_copied;
 } MirrorOp;
 
 typedef enum MirrorMethod {
@@ -101,7 +104,7 @@ static BlockErrorAction mirror_error_action(MirrorBlockJob 
*s, bool read,
 }
 }
 
-static void mirror_iteration_done(MirrorOp *op, int ret)
+static void coroutine_fn mirror_iteration_done(MirrorOp *op, int ret)
 {
 MirrorBlockJob *s = op->s;
 struct iovec *iov;
@@ -138,9 +141,8 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
 }
 }
 
-static void mirror_write_complete(void *opaque, int ret)
+static void coroutine_fn mirror_write_complete(MirrorOp *op, int ret)
 {
-MirrorOp *op = opaque;
 MirrorBlockJob *s = op->s;
 
 aio_context_acquire(blk_get_aio_context(s->common.blk));
@@ -158,9 +160,8 @@ static void mirror_write_complete(void *opaque, int ret)
 aio_context_release(blk_get_aio_context(s->common.blk));
 }
 
-static void mirror_read_complete(void *opaque, int ret)
+static void coroutine_fn mirror_read_complete(MirrorOp *op, int ret)
 {
-MirrorOp *op = opaque;
 MirrorBlockJob *s = op->s;
 
 aio_context_acquire(blk_get_aio_context(s->common.blk));
@@ -176,8 +177,11 @@ static void mirror_read_complete(void *opaque, int ret)
 
 mirror_iteration_done(op, ret);
 } else {
-blk_aio_pwritev(s->target, op->offset, >qiov,
-0, mirror_write_complete, op);
+int ret;
+
+ret = blk_co_pwritev(s->target, op->offset,
+ op->qiov.size, >qiov, 0);
+mirror_write_complete(op, ret);
 }
 aio_context_release(blk_get_aio_context(s->common.blk));
 }
@@ -242,53 +246,49 @@ static inline void mirror_wait_for_io(MirrorBlockJob *s)
  *  (new_end - offset) if tail is rounded up or down due to
  *  alignment or buffer limit.
  */
-static uint64_t mirror_do_read(MirrorBlockJob *s, int64_t offset,
-   uint64_t bytes)
+static void coroutine_fn mirror_co_read(void *opaque)
 {
+MirrorOp *op = opaque;
+MirrorBlockJob *s = op->s;
 BlockBackend *source = s->common.blk;
 int nb_chunks;
 uint64_t ret;
-MirrorOp *op;
 uint64_t max_bytes;
 
 max_bytes = s->granularity * s->max_iov;
 
 /* We can only handle as much as buf_size at a time. */
-bytes = MIN(s->buf_size, MIN(max_bytes, bytes));
-assert(bytes);
-assert(bytes < BDRV_REQUEST_MAX_BYTES);
-ret = bytes;
+op->bytes = MIN(s->buf_size, MIN(max_bytes, op->bytes));
+assert(op->bytes);
+assert(op->bytes < BDRV_REQUEST_MAX_BYTES);
+op->bytes_copied = op->bytes;
 
 if (s->cow_bitmap) {
-ret += mirror_cow_align(s, , );
+op->bytes_copied += mirror_cow_align(s, >offset, >bytes);
 }
-assert(bytes <= s->buf_size);
+/* Cannot exceed BDRV_REQUEST_MAX_BYTES + INT_MAX */
+assert(op->bytes_copied <= UINT_MAX);
+assert(op->bytes <= s->buf_size);
 /* The offset is granularity-aligned because:
  * 1) Caller passes in aligned values;
  * 2) mirror_cow_align is used only when target cluster is larger. */
-assert(QEMU_IS_ALIGNED(offset, s->granularity));
+assert(QEMU_IS_ALIGNED(op->offset, s->granularity));
 /* The range is sector-aligned, since bdrv_getlength() rounds up. */
-assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
-nb_chunks = DIV_ROUND_UP(bytes, s->granularity);
+assert(QEMU_IS_ALIGNED(op->bytes, BDRV_SECTOR_SIZE));
+nb_chunks = DIV_ROUND_UP(op->bytes, s->granularity);
 
 while (s->buf_free_count < nb_chunks) {
-trace_mirror_yield_in_flight(s, offset, s->in_flight);
+trace_mirror_yield_in_flight(s, op->offset, s->in_flight);
 mirror_wait_for_io(s);
 }
 
-/* Allocate a MirrorOp that is used as an AIO callback.  */
-op = g_new(MirrorOp, 1);
-op->s = s;
-op->offset = offset;
-op->bytes = bytes;
-
 /* Now make a QEMUIOVector taking enough granularity-sized chunks
  * from s->buf_free.
  */
 qemu_iovec_init(>qiov, nb_chunks);
 while (nb_chunks-- > 0) {
 MirrorBuffer *buf = QSIMPLEQ_FIRST(>buf_free);
-size_t remaining = bytes - op->qiov.size;
+size_t remaining = op->bytes - op->qiov.size;
 
 QSIMPLEQ_REMOVE_HEAD(>buf_free, next);
 s->buf_free_count--;
@@ -297,53 +297,75 @@ static uint64_t mirror_do_read(MirrorBlockJob *s, 

[Qemu-devel] [PATCH 08/18] block/mirror: Use source as a BdrvChild

2017-09-13 Thread Max Reitz
With this, the mirror_top_bs is no longer just a technically required
node in the BDS graph but actually represents the block job operation.

Signed-off-by: Max Reitz 
---
 block/mirror.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 2ece38094d..9df4157511 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -43,8 +43,8 @@ typedef struct MirrorBlockJob {
 RateLimit limit;
 BlockBackend *target;
 BlockDriverState *mirror_top_bs;
-BlockDriverState *source;
 BlockDriverState *base;
+BdrvChild *source;
 
 /* The name of the graph node to replace */
 char *replaces;
@@ -294,7 +294,6 @@ static void coroutine_fn mirror_co_read(void *opaque)
 {
 MirrorOp *op = opaque;
 MirrorBlockJob *s = op->s;
-BlockBackend *source = s->common.blk;
 int nb_chunks;
 uint64_t ret;
 uint64_t max_bytes;
@@ -344,7 +343,7 @@ static void coroutine_fn mirror_co_read(void *opaque)
 s->bytes_in_flight += op->bytes;
 trace_mirror_one_iteration(s, op->offset, op->bytes);
 
-ret = blk_co_preadv(source, op->offset, op->bytes, >qiov, 0);
+ret = bdrv_co_preadv(s->source, op->offset, op->bytes, >qiov, 0);
 mirror_read_complete(op, ret);
 }
 
@@ -416,7 +415,7 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t 
offset,
 
 static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 {
-BlockDriverState *source = s->source;
+BlockDriverState *source = s->source->bs;
 MirrorOp *pseudo_op;
 int64_t offset;
 uint64_t delay_ns = 0, ret = 0;
@@ -597,7 +596,7 @@ static void mirror_exit(BlockJob *job, void *opaque)
 MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
 MirrorExitData *data = opaque;
 AioContext *replace_aio_context = NULL;
-BlockDriverState *src = s->source;
+BlockDriverState *src = s->source->bs;
 BlockDriverState *target_bs = blk_bs(s->target);
 BlockDriverState *mirror_top_bs = s->mirror_top_bs;
 Error *local_err = NULL;
@@ -712,7 +711,7 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 {
 int64_t sector_num, end;
 BlockDriverState *base = s->base;
-BlockDriverState *bs = s->source;
+BlockDriverState *bs = s->source->bs;
 BlockDriverState *target_bs = blk_bs(s->target);
 int ret, n;
 int64_t count;
@@ -802,7 +801,7 @@ static void coroutine_fn mirror_run(void *opaque)
 {
 MirrorBlockJob *s = opaque;
 MirrorExitData *data;
-BlockDriverState *bs = s->source;
+BlockDriverState *bs = s->source->bs;
 BlockDriverState *target_bs = blk_bs(s->target);
 bool need_drain = true;
 int64_t length;
@@ -1284,7 +1283,7 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
 /* The block job now has a reference to this node */
 bdrv_unref(mirror_top_bs);
 
-s->source = bs;
+s->source = mirror_top_bs->backing;
 s->mirror_top_bs = mirror_top_bs;
 
 /* No resize for the target either; while the mirror is still running, a
@@ -1330,6 +1329,9 @@ static void mirror_start_job(const char *job_id, 
BlockDriverState *bs,
 s->should_complete = true;
 }
 
+s->source = mirror_top_bs->backing;
+s->mirror_top_bs = mirror_top_bs;
+
 s->dirty_bitmap = bdrv_create_dirty_bitmap(bs, granularity, NULL, errp);
 if (!s->dirty_bitmap) {
 goto fail;
-- 
2.13.5




[Qemu-devel] [PATCH 09/18] block: Generalize should_update_child() rule

2017-09-13 Thread Max Reitz
Currently, bdrv_replace_node() refuses to create loops from one BDS to
itself if the BDS to be replaced is the backing node of the BDS to
replace it: Say there is a node A and a node B.  Replacing B by A means
making all references to B point to A.  If B is a child of A (i.e. A has
a reference to B), that would mean we would have to make this reference
point to A itself -- so we'd create a loop.

bdrv_replace_node() (through should_update_child()) refuses to do so if
B is the backing node of A.  There is no reason why we should create
loops if B is not the backing node of A, though.  The BDS graph should
never contain loops, so we should always refuse to create them.

If B is a child of A and B is to be replaced by A, we should simply
leave B in place there because it is the most sensible choice.

A more specific argument would be: Putting filter drivers into the BDS
graph is basically the same as appending an overlay to a backing chain.
But the main child BDS of a filter driver is not "backing" but "file",
so restricting the no-loop rule to backing nodes would fail here.

Signed-off-by: Max Reitz 
---
 include/block/block_int.h |  2 ++
 block.c   | 44 ++--
 2 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index eaeaad9428..fa8bbf1f8b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -573,6 +573,8 @@ struct BdrvChild {
 QLIST_ENTRY(BdrvChild) next_parent;
 };
 
+typedef QLIST_HEAD(BdrvChildList, BdrvChild) BdrvChildList;
+
 /*
  * Note: the function bdrv_append() copies and swaps contents of
  * BlockDriverStates, so if you add new fields to this struct, please
diff --git a/block.c b/block.c
index 0b55c5a41c..1898b958c9 100644
--- a/block.c
+++ b/block.c
@@ -3134,16 +3134,39 @@ static bool should_update_child(BdrvChild *c, 
BlockDriverState *to)
 return false;
 }
 
-if (c->role == _backing) {
-/* If @from is a backing file of @to, ignore the child to avoid
- * creating a loop. We only want to change the pointer of other
- * parents. */
-QLIST_FOREACH(to_c, >children, next) {
-if (to_c == c) {
-break;
-}
-}
-if (to_c) {
+/* If the child @c belongs to the BDS @to, replacing the current
+ * c->bs by @to would mean to create a loop.
+ *
+ * Such a case occurs when appending a BDS to a backing chain.
+ * For instance, imagine the following chain:
+ *
+ *   guest device -> node A -> further backing chain...
+ *
+ * Now we create a new BDS B which we want to put on top of this
+ * chain, so we first attach A as its backing node:
+ *
+ *   node B
+ * |
+ * v
+ *   guest device -> node A -> further backing chain...
+ *
+ * Finally we want to replace A by B.  When doing that, we want to
+ * replace all pointers to A by pointers to B -- except for the
+ * pointer from B because (1) that would create a loop, and (2)
+ * that pointer should simply stay intact:
+ *
+ *   guest device -> node B
+ * |
+ * v
+ *   node A -> further backing chain...
+ *
+ * In general, when replacing a node A (c->bs) by a node B (@to),
+ * if A is a child of B, that means we cannot replace A by B there
+ * because that would create a loop.  Silently detaching A from B
+ * is also not really an option.  So overall just leaving A in
+ * place there is the most sensible choice. */
+QLIST_FOREACH(to_c, >children, next) {
+if (to_c == c) {
 return false;
 }
 }
@@ -3169,6 +3192,7 @@ void bdrv_replace_node(BlockDriverState *from, 
BlockDriverState *to,
 
 /* Put all parents into @list and calculate their cumulative permissions */
 QLIST_FOREACH_SAFE(c, >parents, next_parent, next) {
+assert(c->bs == from);
 if (!should_update_child(c, to)) {
 continue;
 }
-- 
2.13.5




[Qemu-devel] [PATCH 07/18] block/mirror: Wait for in-flight op conflicts

2017-09-13 Thread Max Reitz
This patch makes the mirror code differentiate between simply waiting
for any operation to complete (mirror_wait_for_free_in_flight_slot())
and specifically waiting for all operations touching a certain range of
the virtual disk to complete (mirror_wait_on_conflicts()).

Signed-off-by: Max Reitz 
---
 block/mirror.c | 85 +++---
 1 file changed, 70 insertions(+), 15 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 81253fbad1..2ece38094d 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -14,6 +14,7 @@
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
 #include "qemu/coroutine.h"
+#include "qemu/range.h"
 #include "trace.h"
 #include "block/blockjob_int.h"
 #include "block/block_int.h"
@@ -111,6 +112,41 @@ static BlockErrorAction mirror_error_action(MirrorBlockJob 
*s, bool read,
 }
 }
 
+static void coroutine_fn mirror_wait_on_conflicts(MirrorOp *self,
+  MirrorBlockJob *s,
+  uint64_t offset,
+  uint64_t bytes)
+{
+uint64_t self_start_chunk = offset / s->granularity;
+uint64_t self_end_chunk = DIV_ROUND_UP(offset + bytes, s->granularity);
+uint64_t self_nb_chunks = self_end_chunk - self_start_chunk;
+
+while (find_next_bit(s->in_flight_bitmap, self_end_chunk,
+ self_start_chunk) < self_end_chunk &&
+   s->ret >= 0)
+{
+MirrorOp *op;
+
+QTAILQ_FOREACH(op, >ops_in_flight, next) {
+uint64_t op_start_chunk = op->offset / s->granularity;
+uint64_t op_nb_chunks = DIV_ROUND_UP(op->offset + op->bytes,
+ s->granularity) -
+op_start_chunk;
+
+if (op == self) {
+continue;
+}
+
+if (ranges_overlap(self_start_chunk, self_nb_chunks,
+   op_start_chunk, op_nb_chunks))
+{
+qemu_co_queue_wait(>waiting_requests, NULL);
+break;
+}
+}
+}
+}
+
 static void coroutine_fn mirror_iteration_done(MirrorOp *op, int ret)
 {
 MirrorBlockJob *s = op->s;
@@ -238,7 +274,7 @@ static int mirror_cow_align(MirrorBlockJob *s, int64_t 
*offset,
 return ret;
 }
 
-static inline void mirror_wait_for_io(MirrorBlockJob *s)
+static inline void mirror_wait_for_free_in_flight_slot(MirrorBlockJob *s)
 {
 MirrorOp *op;
 
@@ -287,7 +323,7 @@ static void coroutine_fn mirror_co_read(void *opaque)
 
 while (s->buf_free_count < nb_chunks) {
 trace_mirror_yield_in_flight(s, op->offset, s->in_flight);
-mirror_wait_for_io(s);
+mirror_wait_for_free_in_flight_slot(s);
 }
 
 /* Now make a QEMUIOVector taking enough granularity-sized chunks
@@ -381,8 +417,9 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t 
offset,
 static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 {
 BlockDriverState *source = s->source;
-int64_t offset, first_chunk;
-uint64_t delay_ns = 0;
+MirrorOp *pseudo_op;
+int64_t offset;
+uint64_t delay_ns = 0, ret = 0;
 /* At least the first dirty chunk is mirrored in one iteration. */
 int nb_chunks = 1;
 int sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
@@ -400,11 +437,7 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 }
 bdrv_dirty_bitmap_unlock(s->dirty_bitmap);
 
-first_chunk = offset / s->granularity;
-while (test_bit(first_chunk, s->in_flight_bitmap)) {
-trace_mirror_yield_in_flight(s, offset, s->in_flight);
-mirror_wait_for_io(s);
-}
+mirror_wait_on_conflicts(NULL, s, offset, 1);
 
 block_job_pause_point(>common);
 
@@ -442,6 +475,20 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
nb_chunks * sectors_per_chunk);
 bdrv_dirty_bitmap_unlock(s->dirty_bitmap);
 
+/* Before claiming an area in the in-flight bitmap, we have to
+ * create a MirrorOp for it so that conflicting requests can wait
+ * for it.  mirror_perform() will create the real MirrorOps later,
+ * for now we just create a pseudo operation that will wake up all
+ * conflicting requests once all real operations have been
+ * launched. */
+pseudo_op = g_new(MirrorOp, 1);
+*pseudo_op = (MirrorOp){
+.offset = offset,
+.bytes  = nb_chunks * s->granularity,
+};
+qemu_co_queue_init(_op->waiting_requests);
+QTAILQ_INSERT_TAIL(>ops_in_flight, pseudo_op, next);
+
 bitmap_set(s->in_flight_bitmap, offset / s->granularity, nb_chunks);
 while (nb_chunks > 0 && offset < s->bdev_length) {
 int64_t ret;
@@ -481,11 +528,12 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 
 while (s->in_flight >= MAX_IN_FLIGHT) 

[Qemu-devel] [PATCH 04/18] block/mirror: Pull out mirror_perform()

2017-09-13 Thread Max Reitz
When converting mirror's I/O to coroutines, we are going to need a point
where these coroutines are created.  mirror_perform() is going to be
that point.

Signed-off-by: Max Reitz 
---
 block/mirror.c | 53 ++---
 1 file changed, 30 insertions(+), 23 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 6531652d73..4664b0516f 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -82,6 +82,12 @@ typedef struct MirrorOp {
 uint64_t bytes;
 } MirrorOp;
 
+typedef enum MirrorMethod {
+MIRROR_METHOD_COPY,
+MIRROR_METHOD_ZERO,
+MIRROR_METHOD_DISCARD,
+} MirrorMethod;
+
 static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
 int error)
 {
@@ -324,6 +330,22 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
 }
 }
 
+static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
+   unsigned bytes, MirrorMethod mirror_method)
+{
+switch (mirror_method) {
+case MIRROR_METHOD_COPY:
+return mirror_do_read(s, offset, bytes);
+case MIRROR_METHOD_ZERO:
+case MIRROR_METHOD_DISCARD:
+mirror_do_zero_or_discard(s, offset, bytes,
+  mirror_method == MIRROR_METHOD_DISCARD);
+return bytes;
+default:
+abort();
+}
+}
+
 static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 {
 BlockDriverState *source = s->source;
@@ -395,11 +417,7 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 unsigned int io_bytes;
 int64_t io_bytes_acct;
 BlockDriverState *file;
-enum MirrorMethod {
-MIRROR_METHOD_COPY,
-MIRROR_METHOD_ZERO,
-MIRROR_METHOD_DISCARD
-} mirror_method = MIRROR_METHOD_COPY;
+MirrorMethod mirror_method = MIRROR_METHOD_COPY;
 
 assert(!(offset % s->granularity));
 ret = bdrv_get_block_status_above(source, NULL,
@@ -439,22 +457,11 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 }
 
 io_bytes = mirror_clip_bytes(s, offset, io_bytes);
-switch (mirror_method) {
-case MIRROR_METHOD_COPY:
-io_bytes = io_bytes_acct = mirror_do_read(s, offset, io_bytes);
-break;
-case MIRROR_METHOD_ZERO:
-case MIRROR_METHOD_DISCARD:
-mirror_do_zero_or_discard(s, offset, io_bytes,
-  mirror_method == MIRROR_METHOD_DISCARD);
-if (write_zeroes_ok) {
-io_bytes_acct = 0;
-} else {
-io_bytes_acct = io_bytes;
-}
-break;
-default:
-abort();
+io_bytes = mirror_perform(s, offset, io_bytes, mirror_method);
+if (mirror_method != MIRROR_METHOD_COPY && write_zeroes_ok) {
+io_bytes_acct = 0;
+} else {
+io_bytes_acct = io_bytes;
 }
 assert(io_bytes);
 offset += io_bytes;
@@ -650,8 +657,8 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 continue;
 }
 
-mirror_do_zero_or_discard(s, sector_num * BDRV_SECTOR_SIZE,
-  nb_sectors * BDRV_SECTOR_SIZE, false);
+mirror_perform(s, sector_num * BDRV_SECTOR_SIZE,
+   nb_sectors * BDRV_SECTOR_SIZE, MIRROR_METHOD_ZERO);
 sector_num += nb_sectors;
 }
 
-- 
2.13.5




[Qemu-devel] [PATCH 03/18] blockjob: Make drained_{begin, end} public

2017-09-13 Thread Max Reitz
When a block job decides to be represented as a BDS and track its
associated child nodes itself instead of having the BlockJob object
track them, it needs to implement the drained_begin/drained_end child
operations.  In order to do that, it has to be able to control drainage
of the block job (i.e. to pause and resume it).  Therefore, we need to
make these operations public.

Signed-off-by: Max Reitz 
---
 include/block/blockjob.h | 15 +++
 blockjob.c   | 20 ++--
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 67c0968fa5..a59f316788 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -339,6 +339,21 @@ void block_job_ref(BlockJob *job);
 void block_job_unref(BlockJob *job);
 
 /**
+ * block_job_drained_begin:
+ *
+ * Inhibit I/O requests initiated by the block job.
+ */
+void block_job_drained_begin(BlockJob *job);
+
+/**
+ * block_job_drained_end:
+ *
+ * Resume I/O after it has been paused through
+ * block_job_drained_begin().
+ */
+void block_job_drained_end(BlockJob *job);
+
+/**
  * block_job_txn_unref:
  *
  * Release a reference that was previously acquired with block_job_txn_add_job
diff --git a/blockjob.c b/blockjob.c
index 3a0c49137e..4312a121fa 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -217,21 +217,29 @@ static const BdrvChildRole child_job = {
 .stay_at_node   = true,
 };
 
-static void block_job_drained_begin(void *opaque)
+void block_job_drained_begin(BlockJob *job)
 {
-BlockJob *job = opaque;
 block_job_pause(job);
 }
 
-static void block_job_drained_end(void *opaque)
+static void block_job_drained_begin_op(void *opaque)
+{
+block_job_drained_begin(opaque);
+}
+
+void block_job_drained_end(BlockJob *job)
 {
-BlockJob *job = opaque;
 block_job_resume(job);
 }
 
+static void block_job_drained_end_op(void *opaque)
+{
+block_job_drained_end(opaque);
+}
+
 static const BlockDevOps block_job_dev_ops = {
-.drained_begin = block_job_drained_begin,
-.drained_end = block_job_drained_end,
+.drained_begin = block_job_drained_begin_op,
+.drained_end = block_job_drained_end_op,
 };
 
 void block_job_remove_all_bdrv(BlockJob *job)
-- 
2.13.5




[Qemu-devel] [PATCH 02/18] block: BDS deletion during bdrv_drain_recurse

2017-09-13 Thread Max Reitz
Drainined a BDS child may lead to both the original BDS and/or its other
children being deleted (e.g. if the original BDS represents a block
job).  We should prepare for this in both bdrv_drain_recurse() and
bdrv_drained_begin() by monitoring whether the BDS we are about to drain
still exists at all.

Signed-off-by: Max Reitz 
---
 block/io.c | 72 +-
 1 file changed, 52 insertions(+), 20 deletions(-)

diff --git a/block/io.c b/block/io.c
index 4378ae4c7d..8ec1a564ad 100644
--- a/block/io.c
+++ b/block/io.c
@@ -182,33 +182,57 @@ static void bdrv_drain_invoke(BlockDriverState *bs)
 
 static bool bdrv_drain_recurse(BlockDriverState *bs)
 {
-BdrvChild *child, *tmp;
+BdrvChild *child;
 bool waited;
+struct BDSToDrain {
+BlockDriverState *bs;
+BdrvDeletedStatus del_stat;
+QLIST_ENTRY(BDSToDrain) next;
+};
+QLIST_HEAD(, BDSToDrain) bs_list = QLIST_HEAD_INITIALIZER(bs_list);
+bool in_main_loop =
+qemu_get_current_aio_context() == qemu_get_aio_context();
 
 waited = BDRV_POLL_WHILE(bs, atomic_read(>in_flight) > 0);
 
 /* Ensure any pending metadata writes are submitted to bs->file.  */
 bdrv_drain_invoke(bs);
 
-QLIST_FOREACH_SAFE(child, >children, next, tmp) {
-BlockDriverState *bs = child->bs;
-bool in_main_loop =
-qemu_get_current_aio_context() == qemu_get_aio_context();
-assert(bs->refcnt > 0);
-if (in_main_loop) {
-/* In case the recursive bdrv_drain_recurse processes a
- * block_job_defer_to_main_loop BH and modifies the graph,
- * let's hold a reference to bs until we are done.
- *
- * IOThread doesn't have such a BH, and it is not safe to call
- * bdrv_unref without BQL, so skip doing it there.
- */
-bdrv_ref(bs);
-}
-waited |= bdrv_drain_recurse(bs);
-if (in_main_loop) {
-bdrv_unref(bs);
+/* Draining children may result in other children being removed and maybe
+ * even deleted, so copy the children list first */
+QLIST_FOREACH(child, >children, next) {
+struct BDSToDrain *bs2d = g_new0(struct BDSToDrain, 1);
+
+bs2d->bs = child->bs;
+QLIST_INSERT_HEAD(>deleted_status, >del_stat, next);
+
+QLIST_INSERT_HEAD(_list, bs2d, next);
+}
+
+while (!QLIST_EMPTY(_list)) {
+struct BDSToDrain *bs2d = QLIST_FIRST(_list);
+QLIST_REMOVE(bs2d, next);
+
+if (!bs2d->del_stat.deleted) {
+QLIST_REMOVE(>del_stat, next);
+
+if (in_main_loop) {
+/* In case the recursive bdrv_drain_recurse processes a
+ * block_job_defer_to_main_loop BH and modifies the graph,
+ * let's hold a reference to the BDS until we are done.
+ *
+ * IOThread doesn't have such a BH, and it is not safe to call
+ * bdrv_unref without BQL, so skip doing it there.
+ */
+bdrv_ref(bs2d->bs);
+}
+waited |= bdrv_drain_recurse(bs2d->bs);
+if (in_main_loop) {
+bdrv_unref(bs2d->bs);
+}
 }
+
+g_free(bs2d);
 }
 
 return waited;
@@ -252,17 +276,25 @@ static void coroutine_fn 
bdrv_co_yield_to_drain(BlockDriverState *bs)
 
 void bdrv_drained_begin(BlockDriverState *bs)
 {
+BdrvDeletedStatus del_stat = { .deleted = false };
+
 if (qemu_in_coroutine()) {
 bdrv_co_yield_to_drain(bs);
 return;
 }
 
+QLIST_INSERT_HEAD(>deleted_status, _stat, next);
+
 if (atomic_fetch_inc(>quiesce_counter) == 0) {
 aio_disable_external(bdrv_get_aio_context(bs));
 bdrv_parent_drained_begin(bs);
 }
 
-bdrv_drain_recurse(bs);
+if (!del_stat.deleted) {
+QLIST_REMOVE(_stat, next);
+
+bdrv_drain_recurse(bs);
+}
 }
 
 void bdrv_drained_end(BlockDriverState *bs)
-- 
2.13.5




[Qemu-devel] [PATCH 00/18] block/mirror: Add active-sync mirroring

2017-09-13 Thread Max Reitz
This series implements an active and synchronous mirroring mode.

Currently, the mirror block job is passive an asynchronous: Depending on
your start conditions, some part of the source disk starts as "dirty".
Then, the block job will (as a background operation) continuously copy
dirty parts to the target disk until all of the source disk is clean.
In the meantime, any write to the source disk dirties the affected area.

One effect of this operational mode is that the job may never converge:
If the writes to the source happen faster than the block job copies data
to the target, the job can never finish.

When the active mode implemented in this series is enabled, every write
request to the source will automatically trigger a synchronous write to
the target right afterwards.  Therefore, the source can never get dirty
faster than data is copied to the target.  Most importantly, once source
and target are in sync (BLOCK_JOB_READY is emitted), they will not
diverge (unless e.g. an I/O error occurs).

Active mirroring also improves on a second issue of the passive mode: We
do not have to read data from the source in order to write it to the
target.  When new data is written to the source in active mode, it is
automatically mirrored to the target, which saves us the superfluous
read from the source.
(Optionally, one can choose to also mirror data read from the source.
This does not necessarily help with convergence, but it saves an extra
read operation (in exchange for slower read access to the source because
this mirroring is implemented synchronously).)


There may be a couple of things to do on top of this series:
- Allow switching between active and passive mode at runtime: This
  should not be too difficult to implement, the main question is how to
  expose it to the user.
  (I seem to recall we wanted some form of block-job-set-option
  command...?)

- Implement an asynchronous active mode: May be detrimental when it
  comes to convergence, but it might be nice to have anyway.  May or may
  not be complicated to implement.

- Make the target a BdrvChild of the mirror BDS: This series does some
  work to make the mirror BDS a more integral part of the block job (see
  below for more).  One of the things I wanted to do is to make both the
  source and the target plain children of that BDS, and I did have
  patches to do this.  However, at some point continuing to do this for
  the target seemed rather difficult, and also a bit pointless, so I
  decided to keep it for later.
  (To be specific, that "some point" was exactly when I tried to rebase
  onto 045a2f8254c.)


=== Structure of this series ===

The first half (up until patch 10) restructures parts of the mirror
block job:

- Patches 4/5:
  The job is converted to use coroutines instead of AIO.
  (because this is probably where we want to go, and also because active
   mirroring will need to wait on conflicting in-flight operations, and
   I really don't want to wait on in-flight AIO requests)

  This is done primarily by patch 5, with patch 4 being necessary
  beforehand.

- Patches 6/7:
  Every in-flight operation gets a CoQueue so it can be waited on
  (because this allows active mirroring operations to wait for
  conflicting writes)

  This is started by patch 6, and with patch 7, every bit in the
  in-flight bitmap has at least one  corresponding operation in the
  MirrorBlockJob.ops_in_flight list that can be waited on.

- Patches 1/2/3/8/9/10:
  The source node is now no longer used through a BlockBackend (patch 8)
  and also it is now attached to the mirror BDS as the "file" child
  instead of the "backing" child (patch 10).
  This is mostly because I'd personally like the mirror BDS to be a real
  filter BDS instead of some technicality that needs to be there to
  solve op blocker issues.

  Patches 3 and 9 are necessary for patch 10.

  Patches 1 and 2 were necessary for this when I decided to include
  another patch to make the target node an immediate child of the mirror
  BDS, too.  However, as I wrote above, I later decided to put this idea
  off until later, and as long as the mirror BDS only has a single
  child, those patches are not strictly necessary.
  However, I think that those patches are good to have anyway, so I
  decided to keep them.


The second half (patches 11 to 18) implement active mirroring:
- Patch 11 is required by patch 12.  This in turn is required by the
  active-sync mode when mirroring data read from the source to the
  target, because that functionality needs to be able to find all the
  parts of the data read which are actually dirty so we don't copy clean
  data.

- Patches 13 and 14 prepare for the job for active operations.

- Patch 15 implements active mirroring.

- Patch 16 allows it to be used (by adding a parameter to
  blockdev-mirror and drive-mirror).

- Patch 18 adds an iotest which relies on functionality introduced by
  patch 17.



Max Reitz (18):
  block: Add BdrvDeletedStatus
  block: BDS 

[Qemu-devel] [PATCH 01/18] block: Add BdrvDeletedStatus

2017-09-13 Thread Max Reitz
Sometimes an operation may delete a BDS.  It may then not be trivial to
determine this because the BDS object itself cannot be accessed
afterwards.  With this patch, one can attach a BdrvDeletedStatus object
to a BDS which can be used to safely query whether the BDS still exists
even after it has been deleted.

Signed-off-by: Max Reitz 
---
 include/block/block_int.h | 12 
 block.c   |  6 ++
 2 files changed, 18 insertions(+)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index ba4c383393..eaeaad9428 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -498,6 +498,13 @@ typedef struct BdrvAioNotifier {
 QLIST_ENTRY(BdrvAioNotifier) list;
 } BdrvAioNotifier;
 
+typedef struct BdrvDeletedStatus {
+/* Set to true by bdrv_delete() */
+bool deleted;
+
+QLIST_ENTRY(BdrvDeletedStatus) next;
+} BdrvDeletedStatus;
+
 struct BdrvChildRole {
 /* If true, bdrv_replace_node() doesn't change the node this BdrvChild
  * points to. */
@@ -706,6 +713,11 @@ struct BlockDriverState {
 
 /* Only read/written by whoever has set active_flush_req to true.  */
 unsigned int flushed_gen; /* Flushed write generation */
+
+/* When bdrv_delete() is invoked, it will walk through this list
+ * and set every entry's @deleted field to true.  The entries will
+ * not be freed automatically. */
+QLIST_HEAD(, BdrvDeletedStatus) deleted_status;
 };
 
 struct BlockBackendRootState {
diff --git a/block.c b/block.c
index 6dd47e414e..0b55c5a41c 100644
--- a/block.c
+++ b/block.c
@@ -3246,10 +3246,16 @@ out:
 
 static void bdrv_delete(BlockDriverState *bs)
 {
+BdrvDeletedStatus *del_stat;
+
 assert(!bs->job);
 assert(bdrv_op_blocker_is_empty(bs));
 assert(!bs->refcnt);
 
+QLIST_FOREACH(del_stat, >deleted_status, next) {
+del_stat->deleted = true;
+}
+
 bdrv_close(bs);
 
 /* remove from list, if necessary */
-- 
2.13.5




Re: [Qemu-devel] [RFC 0/8] virtio-crypto: add multiplexing mode support

2017-09-13 Thread Halil Pasic


On 09/11/2017 03:10 AM, Longpeng(Mike) wrote:
> *NOTE*
> The code realization is based on the latest virtio crypto spec:
>  [PATCH v19 0/2] virtio-crypto: virtio crypto device specification
>https://lists.nongnu.org/archive/html/qemu-devel/2017-08/msg05217.html
> 
> In session mode, the process of create/close a session
> makes we have a least one full round-trip cost from guest to host to guest
> to be able to send any data for symmetric algorithms. It gets ourself into
> synchronization troubles in some scenarios like a web server handling lots
> of small requests whose algorithms and keys are different.
> 
> We can support one-blob request (no sessions) as well for symmetric
> algorithms, including HASH, MAC services. The benefit is obvious for
> HASH service because it's usually a one-blob operation.
> 

Hi!

I've just started looking at this. Patch #1 modifies linux/virtio_crypto.h
which if I compare with the (almost) latest linux master is different. Thus
I would expect a corresponding kernel patch set too, but I haven't received
one, nor did I find a reference in the cover letter.

I think if I want to test the new features I need the kernel counter-part
too, or?

Could you point me to the kernel counterpart?

Regards,
Halil


> Gonglei (3):
>   virtio-crypto: add stateless crypto request handler
>   cryptodev: extract one util function
>   virtio-crypto: add host feature bits support
> 
> Longpeng(Mike) (5):
>   virtio-crypto: add new definations for multiplexing mode
>   virtio-crypto: add session creation logic for mux mode
>   virtio-crypto: add dataq operation logic for mux mode
>   cryptodev: add stateless mode cipher support
>   cryptodev-builtin: add stateless cipher support
> 
>  backends/cryptodev-builtin.c   | 189 ---
>  backends/cryptodev.c   |  21 ++
>  hw/virtio/virtio-crypto.c  | 433 
> +++--
>  include/hw/virtio/virtio-crypto.h  |   2 +
>  include/standard-headers/linux/virtio_crypto.h | 182 ++-
>  include/sysemu/cryptodev.h |  21 ++
>  6 files changed, 774 insertions(+), 74 deletions(-)
> 




Re: [Qemu-devel] [PATCH v2] accel: default to an actually available accelerator

2017-09-13 Thread Richard Henderson
On 09/13/2017 10:29 AM, Cornelia Huck wrote:
> configure_accelerator() falls back to tcg if no accelerator has
> been specified. Formerly, we could be sure that tcg is always
> available; however, with --disable-tcg, this is not longer true,
> and you are not able to start qemu without explicitly specifying
> another accelerator on those builds.
> 
> Instead, choose an accelerator in the order tcg->kvm->hax.
> 
> Signed-off-by: Cornelia Huck 
> ---
> RFC->v2:
> - drop xen from the list of fallbacks
> - use comma-delimited list of accelerators (much easier)
> - changed one place in qemu-options.hx that had been missed

Queued to tcg-next.  Thanks,


r~



Re: [Qemu-devel] [PATCH 7/7] trace: Add event "guest_inst_info_after"

2017-09-13 Thread Richard Henderson
On 09/10/2017 09:39 AM, Lluís Vilanova wrote:
>  if (TRACE_GUEST_INST_AFTER_ENABLED && translated_insn) {
>  trace_guest_inst_after_tcg(cpu, tcg_ctx.tcg_env, pc_insn);
>  }
> +if (TRACE_GUEST_INST_INFO_AFTER_ENABLED && translated_insn) {
> +trace_guest_inst_info_after_tcg(
> +cpu, tcg_ctx.tcg_env, pc_insn, insn_size);
> +}

Same comments re guest_inst + guest_inst_info, really.


r~



Re: [Qemu-devel] [PATCH 6/7] trace: Add event "guest_inst_after"

2017-09-13 Thread Richard Henderson
On 09/10/2017 09:35 AM, Lluís Vilanova wrote:
> Signed-off-by: Lluís Vilanova 
> ---
>  accel/tcg/translator.c |   23 ++-
>  trace-events   |8 
>  2 files changed, 26 insertions(+), 5 deletions(-)
> 
> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> index d66d601c89..c010aeee45 100644
> --- a/accel/tcg/translator.c
> +++ b/accel/tcg/translator.c
> @@ -35,7 +35,8 @@ void translator_loop_temp_check(DisasContextBase *db)
>  void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
>   CPUState *cpu, TranslationBlock *tb)
>  {
> -target_ulong pc_bbl;
> +target_ulong pc_bbl, pc_insn = 0;
> +bool translated_insn = false;
>  int max_insns;
>  
>  /* Initialize DisasContext */
> @@ -75,10 +76,15 @@ void translator_loop(const TranslatorOps *ops, 
> DisasContextBase *db,
>  tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
>  
>  while (true) {
> -target_ulong pc_insn = db->pc_next;
>  TCGv_i32 insn_size_tcg = 0;
>  int insn_size_opcode_idx;
>  
> +/* Tracing after (previous instruction) */
> +if (db->num_insns > 0) {
> +trace_guest_inst_after_tcg(cpu, tcg_ctx.tcg_env, pc_insn);
> +}

How does this differ from "guest_inst"?  Why would you need two trace points?

Why are you placing this at the beginning of the while loop rather than the end?

> @@ -164,6 +172,9 @@ void translator_loop(const TranslatorOps *ops, 
> DisasContextBase *db,
>  
>  gen_set_inline_region_begin(tcg_ctx.disas.inline_label);
>  
> +if (TRACE_GUEST_INST_AFTER_ENABLED && translated_insn) {
> +trace_guest_inst_after_tcg(cpu, tcg_ctx.tcg_env, pc_insn);
> +}
>  if (TRACE_GUEST_BBL_AFTER_ENABLED) {
>  trace_guest_bbl_after_tcg(cpu, tcg_ctx.tcg_env, pc_bbl);
>  }

I think I'm finally beginning to understand what you're after with your
inlining.  But I still think this should be doable in the appropriate opcode
generating functions.


r~



Re: [Qemu-devel] [PATCH 5/7] trace: Add event "guest_bbl_after"

2017-09-13 Thread Richard Henderson
On 09/10/2017 09:31 AM, Lluís Vilanova wrote:
> +void translator__gen_goto_tb(TCGContext *ctx)
> +{
> +if (ctx->disas.in_guest_code &&
> +(TRACE_GUEST_BBL_AFTER_ENABLED)) {
> +if (ctx->disas.inline_label == NULL) {
> +ctx->disas.inline_label = gen_new_inline_label();
> +}
> +gen_set_inline_point(ctx->disas.inline_label);
> +/* disable next exit_tb */
> +ctx->disas.seen_goto_tb = true;
> +}
> +}
> +
> +void translator__gen_exit_tb(TCGContext *ctx)
> +{
> +if (ctx->disas.in_guest_code && !ctx->disas.seen_goto_tb &&
> +(TRACE_GUEST_BBL_AFTER_ENABLED)) {
> +if (ctx->disas.inline_label == NULL) {
> +ctx->disas.inline_label = gen_new_inline_label();
> +}
> +gen_set_inline_point(ctx->disas.inline_label);
> +/* enable next exit_tb */
> +ctx->disas.seen_goto_tb = false;
> +}
> +}

I don't understand why you wouldn't just modify tcg_gen_goto_tb and
tcg_gen_exit_tb instead.


r~



Re: [Qemu-devel] [PATCH 4/4] ppc: ensure we update the decrementer value during migration

2017-09-13 Thread Laurent Vivier
On 13/09/2017 19:11, Mark Cave-Ayland wrote:
> On 13/09/17 08:12, David Gibson wrote:
> 
>> This is subtly incorrect.  It sets the DECR on load to exactly the
>> value that was saved.  That effectively means that the DECR is frozen
>> for the migration downtime, which probably isn't what we want.
>>
>> Instead we need to save the DECR as an offset from the timebase, and
>> restore it relative to the (downtime adjusted) timebase on the
>> destination.
>>
>> The complication with that is that the timebase is generally migrated
>> at the machine level, not the cpu level: the timebase is generally
>> synchronized between cpus in the machine, and migrating it at the
>> individual cpu could break that.  Which means we probably need a
>> machine level hook to handle the decrementer too, even though it
>> logically *is* per-cpu, because otherwise we'll be trying to restore
>> it before the timebase is restored.
> 
> I know that we discussed this in-depth last year, however I was working
> along the lines that Laurent's patch fixed this along the lines of our
> previous discussion:
> https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg00338.html (and
> indeed Laurent's analysis at
> https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg01489.html).
> 
> However looking again at the this patch in the context you mentioned
> above, I'm starting to wonder if the right solution now is for the
> machine init function for g3beige/mac99 to do the same as spapr, e.g.
> add cpu_ppc_clock_vm_state_change() as a vm_change_state_handler and
> then add VMSTATE_PPC_TIMEBASE_V from the machine PPCTimeBase into my new
> subsection?
> 
> Laurent, do you think that your state change handler would work
> correctly in this way?

I think all is explained in the second link you have mentioned, it seems 
we don't need a state handler as KVM DECR will no be updated by the migrated 
value:

hw/ppc/ppc.c

736 static void __cpu_ppc_store_decr(PowerPCCPU *cpu, uint64_t *nextp,
737  QEMUTimer *timer,
738  void (*raise_excp)(void *),
739  void (*lower_excp)(PowerPCCPU *),
740  uint32_t decr, uint32_t value)
741 {
...
749 if (kvm_enabled()) {
750 /* KVM handles decrementer exceptions, we don't need our own 
timer */
751 return;
752 }
...

But this allows to migrate it for TCG. And it should be correct because in case 
of TCG I think [need to check] timebase is stopped too (so offset is 0)

David, do you agree with that?

Thanks,
Laurent



Re: [Qemu-devel] [PATCH v2] accel: default to an actually available accelerator

2017-09-13 Thread Thomas Huth
On 13.09.2017 19:29, Cornelia Huck wrote:
> configure_accelerator() falls back to tcg if no accelerator has
> been specified. Formerly, we could be sure that tcg is always
> available; however, with --disable-tcg, this is not longer true,
> and you are not able to start qemu without explicitly specifying
> another accelerator on those builds.
> 
> Instead, choose an accelerator in the order tcg->kvm->hax.
> 
> Signed-off-by: Cornelia Huck 
> ---
> RFC->v2:
> - drop xen from the list of fallbacks
> - use comma-delimited list of accelerators (much easier)
> - changed one place in qemu-options.hx that had been missed
> ---
>  accel/accel.c   | 3 +--
>  qemu-options.hx | 8 +---
>  2 files changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/accel/accel.c b/accel/accel.c
> index 8ae40e1e13..e8a3121d06 100644
> --- a/accel/accel.c
> +++ b/accel/accel.c
> @@ -79,8 +79,7 @@ void configure_accelerator(MachineState *ms)
>  
>  accel = qemu_opt_get(qemu_get_machine_opts(), "accel");
>  if (accel == NULL) {
> -/* Use the default "accelerator", tcg */
> -accel = "tcg";
> +accel = "tcg:kvm:hax";
>  }
>  
>  p = accel;
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 9f6e2adfff..2ff88389e0 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -31,7 +31,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
>  "-machine [type=]name[,prop[=value][,...]]\n"
>  "selects emulated machine ('-machine help' for list)\n"
>  "property accel=accel1[:accel2[:...]] selects 
> accelerator\n"
> -"supported accelerators are kvm, xen, hax or tcg 
> (default: tcg)\n"
> +"supported accelerators are kvm, xen, hax or tcg 
> (default: tcg:kvm:hax)\n"
>  "kernel_irqchip=on|off|split controls accelerated 
> irqchip support (default=off)\n"
>  "vmport=on|off|auto controls emulation of vmport 
> (default: auto)\n"
>  "kvm_shadow_mem=size of KVM shadow MMU in bytes\n"
> @@ -66,7 +66,8 @@ Supported machine properties are:
>  @table @option
>  @item accel=@var{accels1}[:@var{accels2}[:...]]
>  This is used to enable an accelerator. Depending on the target architecture,
> -kvm, xen, hax or tcg can be available. By default, tcg is used. If there is
> +kvm, xen, hax or tcg can be available. By default, the first one available
> +out of tcg, kvm, hax (in that order) is used. If there is
>  more than one accelerator specified, the next one is used if the previous one
>  fails to initialize.
>  @item kernel_irqchip=on|off
> @@ -126,7 +127,8 @@ STEXI
>  @item -accel @var{name}[,prop=@var{value}[,...]]
>  @findex -accel
>  This is used to enable an accelerator. Depending on the target architecture,
> -kvm, xen, hax or tcg can be available. By default, tcg is used. If there is
> +kvm, xen, hax or tcg can be available. By default, the first one available
> +out of tcg, kvm, hax (in that order) is used. If there is
>  more than one accelerator specified, the next one is used if the previous one
>  fails to initialize.
>  @table @option

Reviewed-by: Thomas Huth 



Re: [Qemu-devel] [PATCH] pixman: drop submodule

2017-09-13 Thread Konrad Rzeszutek Wilk
On Fri, Sep 01, 2017 at 12:50:23PM +0200, Gerd Hoffmann wrote:
> Drop pixman submodule and support for the "internal" pixman build.
> pixman should be reasonable well established meanwhile that we don't
> need the fallback submodule any more.  While being at it also drop
> some #ifdefs for pixman versions olter than what we require in

s/olter/older/




[Qemu-devel] [PATCH v2] accel: default to an actually available accelerator

2017-09-13 Thread Cornelia Huck
configure_accelerator() falls back to tcg if no accelerator has
been specified. Formerly, we could be sure that tcg is always
available; however, with --disable-tcg, this is not longer true,
and you are not able to start qemu without explicitly specifying
another accelerator on those builds.

Instead, choose an accelerator in the order tcg->kvm->hax.

Signed-off-by: Cornelia Huck 
---
RFC->v2:
- drop xen from the list of fallbacks
- use comma-delimited list of accelerators (much easier)
- changed one place in qemu-options.hx that had been missed
---
 accel/accel.c   | 3 +--
 qemu-options.hx | 8 +---
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/accel/accel.c b/accel/accel.c
index 8ae40e1e13..e8a3121d06 100644
--- a/accel/accel.c
+++ b/accel/accel.c
@@ -79,8 +79,7 @@ void configure_accelerator(MachineState *ms)
 
 accel = qemu_opt_get(qemu_get_machine_opts(), "accel");
 if (accel == NULL) {
-/* Use the default "accelerator", tcg */
-accel = "tcg";
+accel = "tcg:kvm:hax";
 }
 
 p = accel;
diff --git a/qemu-options.hx b/qemu-options.hx
index 9f6e2adfff..2ff88389e0 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -31,7 +31,7 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
 "-machine [type=]name[,prop[=value][,...]]\n"
 "selects emulated machine ('-machine help' for list)\n"
 "property accel=accel1[:accel2[:...]] selects 
accelerator\n"
-"supported accelerators are kvm, xen, hax or tcg (default: 
tcg)\n"
+"supported accelerators are kvm, xen, hax or tcg (default: 
tcg:kvm:hax)\n"
 "kernel_irqchip=on|off|split controls accelerated irqchip 
support (default=off)\n"
 "vmport=on|off|auto controls emulation of vmport (default: 
auto)\n"
 "kvm_shadow_mem=size of KVM shadow MMU in bytes\n"
@@ -66,7 +66,8 @@ Supported machine properties are:
 @table @option
 @item accel=@var{accels1}[:@var{accels2}[:...]]
 This is used to enable an accelerator. Depending on the target architecture,
-kvm, xen, hax or tcg can be available. By default, tcg is used. If there is
+kvm, xen, hax or tcg can be available. By default, the first one available
+out of tcg, kvm, hax (in that order) is used. If there is
 more than one accelerator specified, the next one is used if the previous one
 fails to initialize.
 @item kernel_irqchip=on|off
@@ -126,7 +127,8 @@ STEXI
 @item -accel @var{name}[,prop=@var{value}[,...]]
 @findex -accel
 This is used to enable an accelerator. Depending on the target architecture,
-kvm, xen, hax or tcg can be available. By default, tcg is used. If there is
+kvm, xen, hax or tcg can be available. By default, the first one available
+out of tcg, kvm, hax (in that order) is used. If there is
 more than one accelerator specified, the next one is used if the previous one
 fails to initialize.
 @table @option
-- 
2.13.5




Re: [Qemu-devel] [Qemu devel v8 PATCH 4/5] msf2: Add Smartfusion2 SoC

2017-09-13 Thread Philippe Mathieu-Daudé

Hi Sundeep,

On 09/13/2017 06:21 AM, sundeep subbaraya wrote:

All patches got Reviewed-by. Do you want me to change alias to remap and
send


This is just cosmetic for the "alias MSF2.eNVM.alias":

(qemu) info mtree
address-space: cpu-memory
  - (prio 0, i/o): armv7m-container
- (prio -1, i/o): system
  -0003 (prio 0, i/o): alias 
MSF2.eNVM.alias @MSF2.eNVM -0003

  2000-2000 (prio 0, ram): MSF2.eSRAM
  6000-6003 (prio 0, rom): MSF2.eNVM
  a000-a3ff (prio 0, ram): ddr-ram

> v9 or wait so that you get time to review?

I have a draft for your patch 2, I'll send within the day.

Regards,

Phil.



Re: [Qemu-devel] [PATCH 3/4] ppc: add CPU access_type into the migration stream

2017-09-13 Thread Mark Cave-Ayland
On 13/09/17 08:19, David Gibson wrote:

>> When pausing a VM, does execution stop at the end of the current TB
>> rather than immediately? If so, perhaps someone could confirm that
>> guarantee is good enough for access_type?
> 
> I'm pretty sure it has to; we'd have to come up out of an individual
> TB in order to get to the main loop where we check the "please pause"
> flag.  I'm not sure if that helps us here though - I *think* access
> type is about carrying information from the point where we trigger an
> exception to the point where we actually start processing the
> exception.
> 
> This code is really ugly and I've never understood it well :(. It's
> always seemed bogus to me that we have an essentially global variable
> to carry information over that small gap, though.  Unfortunately it's
> unlikely that I'd be able to dive into this and work out if it's
> really needed any time soon.

>From my testing yesterday it does appear to be required for TCG (or
unless this is exposing another bug in the Mac migration) so I can check
it works here and then maybe someone else confirm it works on KVM?

A couple of things I've noticed: the heathrow VMStateDescription looks
good, however I can see that the OpenPIC timers won't likely migrate
correctly without adding a few more fields - that's easy to fix.

Another thing is that if we're changing the migration stream for Mac
models Ben has some OpenPIC and other related changes in his wip queue
that it might make sense to put those in first before properly fixing
the Mac machine migration.


ATB,

Mark.



Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/4] ppc: add CPU IRQ state to PPC VMStateDescription

2017-09-13 Thread Greg Kurz
On Wed, 13 Sep 2017 17:44:54 +0100
Mark Cave-Ayland  wrote:

> On 13/09/17 07:02, David Gibson wrote:
> 
> >>> Alexey - do you recall from your analysis why these fields were no
> >>> longer deemed necessary, and how your TCG tests were configured?  
> >>
> >> I most certainly did not do analysis (my bad. sorry) - I took the patch
> >> from David as he left the team, fixed to compile and pushed away. I am also
> >> very suspicions we did not try migrating TCG or anything but pseries. My
> >> guest that things did not break (if they did not which I am not sure about,
> >> for the TCG case) because the interrupt controller (XICS) or the
> >> pseries-guest took care of resending an interrupt which does not seem to be
> >> the case for mac99.  
> > 
> > Right, that's probably true.  The main point, though, is that these
> > fields were dropped a *long* time ago, when migration was barely
> > working to begin with.  In particular I'm pretty sure most of the
> > non-pseries platforms were already pretty broken for migration
> > (amongst other things).
> > 
> > Polishing the mac platforms up to working again, including migration,
> > is a reasonable goal.  But it can't be at the expense of pseries,
> > which is already working, used in production, and much better tested
> > than mac99 or g3beige ever were.  
> 
> Oh I completely agree since I'm well aware pseries likely has more users
> than the Mac machines - my question was directed more about why we
> support backwards migration.
> 

Downstream support backward migration because end users/customers ask for it
for maximum flexibility when it comes to move workloads around different systems
with different QEMU versions. This is fairly usual in data centers with many
systems.

As others already said, breaking things upstream may turn downstream work
into a nightmare (and FWIW, most of the people working on ppc are also
involved in downstream work).

Cheers,

--
Greg

> I spent several hours yesterday poking my Darwin test case with trying
> the different combinations of pending_interrupts, irq_input_state and
> access_type and could easily provoke migration failures unless all 3 of
> the fields were present so a practical test shows they are still
> required for TCG migration. I think ppc_set_irq()'s use of the interrupt
> fields in hw/ppc/ppc.c and the subsequent reference to pending
> interrupts in target/ppc may explain why I see freezes/hangs until a key
> is pressed in many cases.
> 
> 
> ATB,
> 
> Mark.



pgpzZX8fLAfs3.pgp
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 4/4] ppc: ensure we update the decrementer value during migration

2017-09-13 Thread Mark Cave-Ayland
On 13/09/17 08:12, David Gibson wrote:

> This is subtly incorrect.  It sets the DECR on load to exactly the
> value that was saved.  That effectively means that the DECR is frozen
> for the migration downtime, which probably isn't what we want.
> 
> Instead we need to save the DECR as an offset from the timebase, and
> restore it relative to the (downtime adjusted) timebase on the
> destination.
> 
> The complication with that is that the timebase is generally migrated
> at the machine level, not the cpu level: the timebase is generally
> synchronized between cpus in the machine, and migrating it at the
> individual cpu could break that.  Which means we probably need a
> machine level hook to handle the decrementer too, even though it
> logically *is* per-cpu, because otherwise we'll be trying to restore
> it before the timebase is restored.

I know that we discussed this in-depth last year, however I was working
along the lines that Laurent's patch fixed this along the lines of our
previous discussion:
https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg00338.html (and
indeed Laurent's analysis at
https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg01489.html).

However looking again at the this patch in the context you mentioned
above, I'm starting to wonder if the right solution now is for the
machine init function for g3beige/mac99 to do the same as spapr, e.g.
add cpu_ppc_clock_vm_state_change() as a vm_change_state_handler and
then add VMSTATE_PPC_TIMEBASE_V from the machine PPCTimeBase into my new
subsection?

Laurent, do you think that your state change handler would work
correctly in this way?


ATB,

Mark.



Re: [Qemu-devel] [PATCH 4/7] tcg: Add support for "inlining" regions of code

2017-09-13 Thread Richard Henderson
On 09/10/2017 09:27 AM, Lluís Vilanova wrote:
> TCG BBLs and instructions have multiple exit points from where to raise
> tracing events, but some of the necessary information in the generic
> disassembly infrastructure is not available until after generating these
> exit points.
> 
> This patch adds support for "inline points" (where the tracing code will
> be placed), and "inline regions" (which identify the TCG code that must
> be inlined). The TCG compiler will basically copy each inline region to
> any inline points that reference it.

I am not keen on this.

Is there a reason you can't just emit the tracing code at the appropriate place
to begin with?  Perhaps I have to wait to see how this is used...


r~



Re: [Qemu-devel] [PATCH 3/7] trace: Add event "guest_inst_info_before"

2017-09-13 Thread Richard Henderson
On 09/10/2017 09:23 AM, Lluís Vilanova wrote:
> Signed-off-by: Lluís Vilanova 
> ---
>  accel/tcg/translator.c |   18 ++
>  trace-events   |9 +
>  2 files changed, 27 insertions(+)
> 
> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> index 287d27b4f7..6598931171 100644
> --- a/accel/tcg/translator.c
> +++ b/accel/tcg/translator.c
> @@ -70,6 +70,8 @@ void translator_loop(const TranslatorOps *ops, 
> DisasContextBase *db,
>  
>  while (true) {
>  target_ulong pc_insn = db->pc_next;
> +TCGv_i32 insn_size_tcg = 0;
> +int insn_size_opcode_idx;

Initializing a TCGv_i32 is wrong.
And surely insn_size_opcode is surely uninitialized?

> +if (TRACE_GUEST_INST_INFO_BEFORE_EXEC_ENABLED) {
> +insn_size_tcg = tcg_temp_new_i32();
> +insn_size_opcode_idx = tcg_op_buf_count();
> +tcg_gen_movi_i32(insn_size_tcg, 0xdeadbeef);
> +
> +trace_guest_inst_info_before_tcg(
> +cpu, tcg_ctx.tcg_env, pc_insn, insn_size_tcg);
> +
> +tcg_temp_free_i32(insn_size_tcg);

There's no reason you can't declare insn_size_tcg right here and avoid the
incorrect initialization above.

Is there a reason to have both "guest_insn" and "guest_insn_info"?


r~



Re: [Qemu-devel] [Qemu devel v8 PATCH 4/5] msf2: Add Smartfusion2 SoC

2017-09-13 Thread sundeep subbaraya
Hi Philippe,

On Wed, Sep 13, 2017 at 2:51 PM, sundeep subbaraya 
wrote:

> Hi Philippe,
>
> All patches got Reviewed-by. Do you want me to change alias to remap and
> send
> v9 or wait so that you get time to review?
>
> Sorry. My bad, I overlooked your comment. I will just drop alias and send
v9.

Thanks,
Sundeep


> Thanks,
> Sundeep
>
> On Fri, Sep 8, 2017 at 12:56 PM, sundeep subbaraya  > wrote:
>
>> Hi Philippe,
>>
>> On Fri, Sep 8, 2017 at 3:08 AM, Philippe Mathieu-Daudé 
>> wrote:
>>
>>> On 09/07/2017 04:24 PM, Subbaraya Sundeep wrote:
>>>
 Smartfusion2 SoC has hardened Microcontroller subsystem
 and flash based FPGA fabric. This patch adds support for
 Microcontroller subsystem in the SoC.

 Signed-off-by: Subbaraya Sundeep 
 Reviewed-by: Alistair Francis 
 ---
   default-configs/arm-softmmu.mak |   1 +
   hw/arm/Makefile.objs|   1 +
   hw/arm/msf2-soc.c   | 218 ++
 ++
   include/hw/arm/msf2-soc.h   |  66 
   4 files changed, 286 insertions(+)
   create mode 100644 hw/arm/msf2-soc.c
   create mode 100644 include/hw/arm/msf2-soc.h

 diff --git a/default-configs/arm-softmmu.mak
 b/default-configs/arm-softmmu.mak
 index bbdd3c1..5059d13 100644
 --- a/default-configs/arm-softmmu.mak
 +++ b/default-configs/arm-softmmu.mak
 @@ -129,3 +129,4 @@ CONFIG_ACPI=y
   CONFIG_SMBIOS=y
   CONFIG_ASPEED_SOC=y
   CONFIG_GPIO_KEY=y
 +CONFIG_MSF2=y
 diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
 index a2e56ec..df36a03 100644
 --- a/hw/arm/Makefile.objs
 +++ b/hw/arm/Makefile.objs
 @@ -19,3 +19,4 @@ obj-$(CONFIG_FSL_IMX31) += fsl-imx31.o kzm.o
   obj-$(CONFIG_FSL_IMX6) += fsl-imx6.o sabrelite.o
   obj-$(CONFIG_ASPEED_SOC) += aspeed_soc.o aspeed.o
   obj-$(CONFIG_MPS2) += mps2.o
 +obj-$(CONFIG_MSF2) += msf2-soc.o
 diff --git a/hw/arm/msf2-soc.c b/hw/arm/msf2-soc.c
 new file mode 100644
 index 000..47fffa4
 --- /dev/null
 +++ b/hw/arm/msf2-soc.c
 @@ -0,0 +1,218 @@
 +/*
 + * SmartFusion2 SoC emulation.
 + *
 + * Copyright (c) 2017 Subbaraya Sundeep 
 + *
 + * Permission is hereby granted, free of charge, to any person
 obtaining a copy
 + * of this software and associated documentation files (the
 "Software"), to deal
 + * in the Software without restriction, including without limitation
 the rights
 + * to use, copy, modify, merge, publish, distribute, sublicense,
 and/or sell
 + * copies of the Software, and to permit persons to whom the Software
 is
 + * furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice shall be
 included in
 + * all copies or substantial portions of the Software.
 + *
 + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
 MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT
 SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
 OR OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 ARISING FROM,
 + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 DEALINGS IN
 + * THE SOFTWARE.
 + */
 +
 +#include "qemu/osdep.h"
 +#include "qapi/error.h"
 +#include "qemu-common.h"
 +#include "hw/arm/arm.h"
 +#include "exec/address-spaces.h"
 +#include "hw/char/serial.h"
 +#include "hw/boards.h"
 +#include "sysemu/block-backend.h"
 +#include "qemu/cutils.h"
 +#include "hw/arm/msf2-soc.h"
 +
 +#define MSF2_TIMER_BASE   0x40004000
 +#define MSF2_SYSREG_BASE  0x40038000
 +
 +#define ENVM_BASE_ADDRESS 0x6000
 +
 +#define SRAM_BASE_ADDRESS 0x2000
 +
 +#define MSF2_ENVM_MAX_SIZE(512 * K_BYTE)
 +
 +/*
 + * eSRAM max size is 80k without SECDED(Single error correction and
 + * dual error detection) feature and 64k with SECDED.
 + * We do not support SECDED now.
 + */
 +#define MSF2_ESRAM_MAX_SIZE   (80 * K_BYTE)
 +
 +static const uint32_t spi_addr[MSF2_NUM_SPIS] = { 0x40001000 ,
 0x40011000 };
 +static const uint32_t uart_addr[MSF2_NUM_UARTS] = { 0x4000 ,
 0x4001 };
 +
 +static const int spi_irq[MSF2_NUM_SPIS] = { 2, 3 };
 +static const int uart_irq[MSF2_NUM_UARTS] = { 10, 11 };
 +static const int timer_irq[MSF2_NUM_TIMERS] = { 14, 15 };
 +
 +static void m2sxxx_soc_initfn(Object *obj)
 +{
 +MSF2State *s = MSF2_SOC(obj);
 +int i;
 

Re: [Qemu-devel] [PATCH 2/7] trace: Add event "guest_inst_before"

2017-09-13 Thread Richard Henderson
On 09/10/2017 09:19 AM, Lluís Vilanova wrote:
>  while (true) {
> +target_ulong pc_insn = db->pc_next;

Why not just "pc"?

> +
>  db->num_insns++;
>  ops->insn_start(db, cpu);
>  tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
> @@ -96,6 +98,7 @@ void translator_loop(const TranslatorOps *ops, 
> DisasContextBase *db,
>  if (db->num_insns == 1) {
>  trace_guest_bbl_before_tcg(cpu, tcg_ctx.tcg_env, db->pc_first);
>  }
> +trace_guest_inst_before_tcg(cpu, tcg_ctx.tcg_env, pc_insn);

I prefer "insn" over "inst".  There are enough other words that begin with
"inst" (e.g. instance) to possibly be confusing.  Either that or it's my 20
years working on gcc that ingrained "insn".  ;-)


r~



Re: [Qemu-devel] [PATCH 1/7] trace: Add event "guest_bbl_before"

2017-09-13 Thread Richard Henderson
On 09/10/2017 09:15 AM, Lluís Vilanova wrote:
> Signed-off-by: Lluís Vilanova 
> ---
>  accel/tcg/translator.c |6 ++
>  trace-events   |   11 +++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> index afa3af478a..91b3b0da32 100644
> --- a/accel/tcg/translator.c
> +++ b/accel/tcg/translator.c
> @@ -13,6 +13,7 @@
>  #include "cpu.h"
>  #include "tcg/tcg.h"
>  #include "tcg/tcg-op.h"
> +#include "trace-tcg.h"
>  #include "exec/exec-all.h"
>  #include "exec/gen-icount.h"
>  #include "exec/log.h"
> @@ -91,6 +92,11 @@ void translator_loop(const TranslatorOps *ops, 
> DisasContextBase *db,
>  }
>  }
>  
> +/* Tracing before */
> +if (db->num_insns == 1) {
> +trace_guest_bbl_before_tcg(cpu, tcg_ctx.tcg_env, db->pc_first);
> +}

Why not place this before the loop, so that you don't
have to check num_insns == 1?

> +vcpu tcg guest_bbl_before(uint64_t vaddr) "vaddr=0x%016"PRIx64, 
> "vaddr=0x%016"PRIx64

You're really going to print both ENV and PC tagged with "vaddr"?
That just seems confusing.

Also, terminology.  A "basic block" ("bb" by preference, not "bbl"), has a
specific meaning (https://en.wikipedia.org/wiki/Basic_block).  What we're
generating here is a TranslationBlock (which may consist of many basic blocks),
and oft contracted within the source as "tb".


r~



  1   2   3   4   5   >