[Qemu-devel] [Bug 1248854] Re: The guest cannot boot up with parameter "-no-acpi"
** Changed in: qemu Status: Confirmed => Fix Committed ** Changed in: qemu Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1248854 Title: The guest cannot boot up with parameter "-no-acpi" Status in QEMU: Fix Released Bug description: Environment: Host OS (ia32/ia32e/IA64):ia32e Guest OS (ia32/ia32e/IA64):ia32e Guest OS Type (Linux/Windows):Linux kvm.git Commit:81e87e26796782e014fd1f2bb9cd8fb6ce4021a8 qemu.git Commit:a126050a103c924b03388a9a64ce9af8c96b0969 Host Kernel Version:3.12.0-rc5 Hardware:Romley_EP ,Ivytowm_EP Bug detailed description: -- when create guest with parameter "-no-acpi", the guest cannot boot up. note: this should be a qemu bug kvm + qemu = result 81e87e26 + a126050a = bad 81e87e26 + b8616055 = good Reproduce steps: 1. create guest qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -net none -no-acpi rhel6u4.qcow Current result: guest cannot boot up Expected result: guest boot up fine. Basic root-causing log: -- [root@vt-snb9 qemu]#qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -net none /root/rhel6u4.qcow -no-acpi VNC server running on `::1:5900' qemu-system-x86_64: /root/qemu/hw/i386/acpi-build.c:135: acpi_get_pm_info: Assertion `obj' failed. Aborted (core dumped) To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1248854/+subscriptions
[Qemu-devel] [Bug 1248854] Re: The guest cannot boot up with parameter "-no-acpi"
kvm.git + qemu.git: ede58222_5c5432e7 test on Romley_EP, Ivytown_EP, create guest with parameter "-no-acpi", the guest boot up fine. ** Changed in: qemu Status: New => Confirmed -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1248854 Title: The guest cannot boot up with parameter "-no-acpi" Status in QEMU: Confirmed Bug description: Environment: Host OS (ia32/ia32e/IA64):ia32e Guest OS (ia32/ia32e/IA64):ia32e Guest OS Type (Linux/Windows):Linux kvm.git Commit:81e87e26796782e014fd1f2bb9cd8fb6ce4021a8 qemu.git Commit:a126050a103c924b03388a9a64ce9af8c96b0969 Host Kernel Version:3.12.0-rc5 Hardware:Romley_EP ,Ivytowm_EP Bug detailed description: -- when create guest with parameter "-no-acpi", the guest cannot boot up. note: this should be a qemu bug kvm + qemu = result 81e87e26 + a126050a = bad 81e87e26 + b8616055 = good Reproduce steps: 1. create guest qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -net none -no-acpi rhel6u4.qcow Current result: guest cannot boot up Expected result: guest boot up fine. Basic root-causing log: -- [root@vt-snb9 qemu]#qemu-system-x86_64 -enable-kvm -m 1024 -smp 2 -net none /root/rhel6u4.qcow -no-acpi VNC server running on `::1:5900' qemu-system-x86_64: /root/qemu/hw/i386/acpi-build.c:135: acpi_get_pm_info: Assertion `obj' failed. Aborted (core dumped) To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1248854/+subscriptions
[Qemu-devel] [RFC PATCH v0 3/3] gluster: Add support for creating zero-filled image
GlusterFS supports creation of zero-filled file on GlusterFS volume by means of an API called glfs_zerofill(). Use this API from QEMU to create an image that is filled with zeroes by using the preallocation option of qemu-img. qemu-img create gluster://server/volume/image -o preallocation=full 10G The allowed values for preallocation are 'full' and 'off'. By default preallocation is off and image is not zero-filled. glfs_zerofill() offloads the writing of zeroes to the server and if the storage supports SCSI WRITESAME, GlusterFS server can issue BLKZEROOUT ioctl to achieve the zeroing. Signed-off-by: Bharata B Rao --- block/gluster.c | 45 + 1 file changed, 45 insertions(+) diff --git a/block/gluster.c b/block/gluster.c index 15f5dfb..2368997 100644 --- a/block/gluster.c +++ b/block/gluster.c @@ -381,6 +381,29 @@ out: qemu_aio_release(acb); return ret; } + +static inline int gluster_supports_zerofill(void) +{ +return 1; +} + +static inline int qemu_gluster_zerofill(struct glfs_fd *fd, int64_t offset, +int64_t size) +{ +return glfs_zerofill(fd, offset, size); +} + +#else +static inline int gluster_supports_zerofill(void) +{ +return 0; +} + +static inline int qemu_gluster_zerofill(struct glfs_fd *fd, int64_t offset, +int64_t size) +{ +return 0; +} #endif static int qemu_gluster_create(const char *filename, @@ -389,6 +412,7 @@ static int qemu_gluster_create(const char *filename, struct glfs *glfs; struct glfs_fd *fd; int ret = 0; +int prealloc = 0; int64_t total_size = 0; GlusterConf *gconf = g_malloc0(sizeof(GlusterConf)); @@ -401,6 +425,18 @@ static int qemu_gluster_create(const char *filename, while (options && options->name) { if (!strcmp(options->name, BLOCK_OPT_SIZE)) { total_size = options->value.n / BDRV_SECTOR_SIZE; +} else if (!strcmp(options->name, BLOCK_OPT_PREALLOC)) { +if (!options->value.s || !strcmp(options->value.s, "off")) { +prealloc = 0; +} else if (!strcmp(options->value.s, "full") && +gluster_supports_zerofill()) { +prealloc = 1; +} else { +error_setg(errp, "Invalid preallocation mode: '%s'" +" or GlusterFS doesn't support zerofill API", + options->value.s); +return -EINVAL; +} } options++; } @@ -413,6 +449,10 @@ static int qemu_gluster_create(const char *filename, if (glfs_ftruncate(fd, total_size * BDRV_SECTOR_SIZE) != 0) { ret = -errno; } +if (prealloc && qemu_gluster_zerofill(fd, 0, +total_size * BDRV_SECTOR_SIZE)) { +ret = -errno; +} if (glfs_close(fd) != 0) { ret = -errno; } @@ -595,6 +635,11 @@ static QEMUOptionParameter qemu_gluster_create_options[] = { .type = OPT_SIZE, .help = "Virtual disk size" }, +{ +.name = BLOCK_OPT_PREALLOC, +.type = OPT_STRING, +.help = "Preallocation mode (allowed values: off, on)" +}, { NULL } }; -- 1.7.11.7
[Qemu-devel] [RFC PATCH v0 2/3] gluster: Implement .bdrv_co_write_zeroes for gluster
Support .bdrv_co_write_zeroes() from gluster driver by using GlusterFS API glfs_zerofill() that off-loads the writing of zeroes to GlusterFS server. Signed-off-by: Bharata B Rao --- block/gluster.c | 101 configure | 8 + 2 files changed, 81 insertions(+), 28 deletions(-) diff --git a/block/gluster.c b/block/gluster.c index 9f85228..15f5dfb 100644 --- a/block/gluster.c +++ b/block/gluster.c @@ -250,6 +250,34 @@ static void qemu_gluster_complete_aio(void *opaque) qemu_aio_release(acb); } +/* + * AIO callback routine called from GlusterFS thread. + */ +static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret, void *arg) +{ +GlusterAIOCB *acb = (GlusterAIOCB *)arg; + +acb->ret = ret; +acb->bh = qemu_bh_new(qemu_gluster_complete_aio, acb); +qemu_bh_schedule(acb->bh); +} + +static void qemu_gluster_aio_cancel(BlockDriverAIOCB *blockacb) +{ +GlusterAIOCB *acb = (GlusterAIOCB *)blockacb; +bool finished = false; + +acb->finished = &finished; +while (!finished) { +qemu_aio_wait(); +} +} + +static const AIOCBInfo gluster_aiocb_info = { +.aiocb_size = sizeof(GlusterAIOCB), +.cancel = qemu_gluster_aio_cancel, +}; + /* TODO Convert to fine grained options */ static QemuOptsList runtime_opts = { .name = "gluster", @@ -322,6 +350,39 @@ out: return ret; } +#ifdef CONFIG_GLUSTERFS_ZEROFILL +static int qemu_gluster_co_write_zeroes(BlockDriverState *bs, +int64_t sector_num, int nb_sectors) +{ +int ret; +GlusterAIOCB *acb; +BDRVGlusterState *s = bs->opaque; +off_t size; +off_t offset; + +offset = sector_num * BDRV_SECTOR_SIZE; +size = nb_sectors * BDRV_SECTOR_SIZE; + +acb = qemu_aio_get(&gluster_aiocb_info, bs, NULL, NULL); +acb->size = size; +acb->ret = 0; +acb->finished = NULL; +acb->coroutine = qemu_coroutine_self(); + +ret = glfs_zerofill_async(s->fd, offset, size, &gluster_finish_aiocb, acb); +if (ret < 0) { +goto out; +} + +qemu_coroutine_yield(); +return acb->ret; + +out: +qemu_aio_release(acb); +return ret; +} +#endif + static int qemu_gluster_create(const char *filename, QEMUOptionParameter *options, Error **errp) { @@ -364,34 +425,6 @@ out: return ret; } -static void qemu_gluster_aio_cancel(BlockDriverAIOCB *blockacb) -{ -GlusterAIOCB *acb = (GlusterAIOCB *)blockacb; -bool finished = false; - -acb->finished = &finished; -while (!finished) { -qemu_aio_wait(); -} -} - -static const AIOCBInfo gluster_aiocb_info = { -.aiocb_size = sizeof(GlusterAIOCB), -.cancel = qemu_gluster_aio_cancel, -}; - -/* - * AIO callback routine called from GlusterFS thread. - */ -static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret, void *arg) -{ -GlusterAIOCB *acb = (GlusterAIOCB *)arg; - -acb->ret = ret; -acb->bh = qemu_bh_new(qemu_gluster_complete_aio, acb); -qemu_bh_schedule(acb->bh); -} - static coroutine_fn int qemu_gluster_aio_rw(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, int write) { @@ -583,6 +616,9 @@ static BlockDriver bdrv_gluster = { #ifdef CONFIG_GLUSTERFS_DISCARD .bdrv_co_discard = qemu_gluster_co_discard, #endif +#ifdef CONFIG_GLUSTERFS_ZEROFILL +.bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes, +#endif .create_options = qemu_gluster_create_options, }; @@ -604,6 +640,9 @@ static BlockDriver bdrv_gluster_tcp = { #ifdef CONFIG_GLUSTERFS_DISCARD .bdrv_co_discard = qemu_gluster_co_discard, #endif +#ifdef CONFIG_GLUSTERFS_ZEROFILL +.bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes, +#endif .create_options = qemu_gluster_create_options, }; @@ -625,6 +664,9 @@ static BlockDriver bdrv_gluster_unix = { #ifdef CONFIG_GLUSTERFS_DISCARD .bdrv_co_discard = qemu_gluster_co_discard, #endif +#ifdef CONFIG_GLUSTERFS_ZEROFILL +.bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes, +#endif .create_options = qemu_gluster_create_options, }; @@ -646,6 +688,9 @@ static BlockDriver bdrv_gluster_rdma = { #ifdef CONFIG_GLUSTERFS_DISCARD .bdrv_co_discard = qemu_gluster_co_discard, #endif +#ifdef CONFIG_GLUSTERFS_ZEROFILL +.bdrv_co_write_zeroes = qemu_gluster_co_write_zeroes, +#endif .create_options = qemu_gluster_create_options, }; diff --git a/configure b/configure index 508f6a5..3c267a4 100755 --- a/configure +++ b/configure @@ -255,6 +255,7 @@ coroutine_pool="" seccomp="" glusterfs="" glusterfs_discard="no" +glusterfs_zerofill="no" virtio_blk_data_plane="" gtk="" gtkabi="2.0" @@ -2670,6 +2671,9 @@ if test "$glusterfs" != "no" ; then if $pkg_config --atleast-version=5 glusterfs-api; then glusterfs_discard="yes" fi +if $pkg_config --atlea
[Qemu-devel] [RFC PATCH v0 1/3] gluster: Convert aio routines into coroutines
Convert the read, write, flush and discard implementations from aio-based ones to coroutine based ones. Signed-off-by: Bharata B Rao --- block/gluster.c | 168 +--- 1 file changed, 63 insertions(+), 105 deletions(-) diff --git a/block/gluster.c b/block/gluster.c index 877686a..9f85228 100644 --- a/block/gluster.c +++ b/block/gluster.c @@ -26,11 +26,11 @@ typedef struct GlusterAIOCB { int ret; bool *finished; QEMUBH *bh; +Coroutine *coroutine; } GlusterAIOCB; typedef struct BDRVGlusterState { struct glfs *glfs; -int fds[2]; struct glfs_fd *fd; int event_reader_pos; GlusterAIOCB *event_acb; @@ -231,46 +231,23 @@ out: return NULL; } -static void qemu_gluster_complete_aio(GlusterAIOCB *acb, BDRVGlusterState *s) +static void qemu_gluster_complete_aio(void *opaque) { -int ret; -bool *finished = acb->finished; -BlockDriverCompletionFunc *cb = acb->common.cb; -void *opaque = acb->common.opaque; - -if (!acb->ret || acb->ret == acb->size) { -ret = 0; /* Success */ -} else if (acb->ret < 0) { -ret = acb->ret; /* Read/Write failed */ -} else { -ret = -EIO; /* Partial read/write - fail it */ -} +GlusterAIOCB *acb = (GlusterAIOCB *)opaque; -qemu_aio_release(acb); -cb(opaque, ret); -if (finished) { -*finished = true; +if (acb->ret == acb->size) { +acb->ret = 0; +} else if (acb->ret > 0) { +acb->ret = -EIO; /* Partial read/write - fail it */ } -} -static void qemu_gluster_aio_event_reader(void *opaque) -{ -BDRVGlusterState *s = opaque; -ssize_t ret; - -do { -char *p = (char *)&s->event_acb; - -ret = read(s->fds[GLUSTER_FD_READ], p + s->event_reader_pos, - sizeof(s->event_acb) - s->event_reader_pos); -if (ret > 0) { -s->event_reader_pos += ret; -if (s->event_reader_pos == sizeof(s->event_acb)) { -s->event_reader_pos = 0; -qemu_gluster_complete_aio(s->event_acb, s); -} -} -} while (ret < 0 && errno == EINTR); +qemu_bh_delete(acb->bh); +acb->bh = NULL; +qemu_coroutine_enter(acb->coroutine, NULL); +if (acb->finished) { +*acb->finished = true; +} +qemu_aio_release(acb); } /* TODO Convert to fine grained options */ @@ -309,7 +286,6 @@ static int qemu_gluster_open(BlockDriverState *bs, QDict *options, filename = qemu_opt_get(opts, "filename"); - s->glfs = qemu_gluster_init(gconf, filename); if (!s->glfs) { ret = -errno; @@ -329,18 +305,8 @@ static int qemu_gluster_open(BlockDriverState *bs, QDict *options, s->fd = glfs_open(s->glfs, gconf->image, open_flags); if (!s->fd) { ret = -errno; -goto out; } -ret = qemu_pipe(s->fds); -if (ret < 0) { -ret = -errno; -goto out; -} -fcntl(s->fds[GLUSTER_FD_READ], F_SETFL, O_NONBLOCK); -qemu_aio_set_fd_handler(s->fds[GLUSTER_FD_READ], -qemu_gluster_aio_event_reader, NULL, s); - out: qemu_opts_del(opts); qemu_gluster_gconf_free(gconf); @@ -414,28 +380,20 @@ static const AIOCBInfo gluster_aiocb_info = { .cancel = qemu_gluster_aio_cancel, }; +/* + * AIO callback routine called from GlusterFS thread. + */ static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret, void *arg) { GlusterAIOCB *acb = (GlusterAIOCB *)arg; -BlockDriverState *bs = acb->common.bs; -BDRVGlusterState *s = bs->opaque; -int retval; acb->ret = ret; -retval = qemu_write_full(s->fds[GLUSTER_FD_WRITE], &acb, sizeof(acb)); -if (retval != sizeof(acb)) { -/* - * Gluster AIO callback thread failed to notify the waiting - * QEMU thread about IO completion. - */ -error_report("Gluster AIO completion failed: %s", strerror(errno)); -abort(); -} +acb->bh = qemu_bh_new(qemu_gluster_complete_aio, acb); +qemu_bh_schedule(acb->bh); } -static BlockDriverAIOCB *qemu_gluster_aio_rw(BlockDriverState *bs, -int64_t sector_num, QEMUIOVector *qiov, int nb_sectors, -BlockDriverCompletionFunc *cb, void *opaque, int write) +static coroutine_fn int qemu_gluster_aio_rw(BlockDriverState *bs, +int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, int write) { int ret; GlusterAIOCB *acb; @@ -446,10 +404,11 @@ static BlockDriverAIOCB *qemu_gluster_aio_rw(BlockDriverState *bs, offset = sector_num * BDRV_SECTOR_SIZE; size = nb_sectors * BDRV_SECTOR_SIZE; -acb = qemu_aio_get(&gluster_aiocb_info, bs, cb, opaque); +acb = qemu_aio_get(&gluster_aiocb_info, bs, NULL, NULL); acb->size = size; acb->ret = 0; acb->finished = NULL; +acb->coroutine = qemu_coroutine_self(); if (write) { ret = glfs_pwritev_async(s->fd, qiov->iov, qiov->niov, offset, 0, @@ -462,11 +421
[Qemu-devel] [RFC PATCH v0 0/3] gluster: conversion to coroutines and supporting write_zeroes
Hi, This series is about converting all the bdrv_aio* implementations in gluster driver to coroutine based implementations. Read, write, flush and discard routines are converted. This also adds support for .bdrv_co_write_zeroes() in gluster and provides a new preallocation option with qemu-img (-o preallocation=full) that can be used for raw images on GlusterFS backend to create fully allocated and zero-filled images. Bharata B Rao (3): gluster: Convert aio routines into coroutines gluster: Implement .bdrv_co_write_zeroes for gluster gluster: Add support for creating zero-filled image block/gluster.c | 298 configure | 8 ++ 2 files changed, 181 insertions(+), 125 deletions(-) -- 1.7.11.7
Re: [Qemu-devel] [PATCH V5 3/5] qemu-iotests: add 058 internal snapshot export with qemu-nbd case
于 2013/11/19 19:29, Kevin Wolf 写道: Am 10.11.2013 um 23:03 hat Wenchao Xia geschrieben: Signed-off-by: Wenchao Xia --- tests/qemu-iotests/058 | 102 tests/qemu-iotests/058.out | 32 ++ tests/qemu-iotests/check |1 + tests/qemu-iotests/group |1 + 4 files changed, 136 insertions(+), 0 deletions(-) create mode 100755 tests/qemu-iotests/058 create mode 100644 tests/qemu-iotests/058.out I think this should have: _supported_proto nbd Or otherwise the check in common needs to be changed. At least for me the test failed without an error message because I didn't have a qemu-nbd symlink in place. Kevin I found a little problem: $QEMU_IMG snapshot -c sn1 "$TEST_IMG" can't work when IMGPROTO=nbd, since in common.rc the image was not exported as raw: if [ $IMGPROTO = "nbd" ]; then eval "$QEMU_NBD -v -t -b 127.0.0.1 -p 10810 $TEST_IMG_FILE &" QEMU_NBD_PID=$! sleep 1 # FIXME: qemu-nbd needs to be listening before we continue fi So later the test case think $TEST_IMG as a raw image. All case used $QEMU_IMG snapshot will fail, such as 015? Do you think this is a bug? Instead if change common.rc to exported as raw(I am not sure whether it is done on purpose), how about add a check called: _required_util in common.rc?
Re: [Qemu-devel] [PATCH for 1.7] .gitignore: Ignore config.status
Am 22.11.2013 07:39, schrieb Fam Zheng: > Signed-off-by: Fam Zheng > --- > .gitignore | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/.gitignore b/.gitignore > index 5584b5f..1c9d63d 100644 > --- a/.gitignore > +++ b/.gitignore > @@ -3,6 +3,7 @@ config-all-devices.* > config-all-disas.* > config-host.* > config-target.* > +config.status > trace/generated-tracers.h > trace/generated-tracers.c > trace/generated-tracers-dtrace.h Reviewed-by: Stefan Weil This is a small bug fix for the build environment without any risk which might be added to QEMU 1.7. Stefan
Re: [Qemu-devel] Buildbot failure: MinGW (was: qga/vss-win32/requester.h compile error)
Am 21.11.2013 09:40, schrieb Stefan Hajnoczi: > Excellent, thanks! The latest buildbot has compiled successfully: > http://buildbot.b1-systems.de/qemu/builders/default_mingw32/builds/784 > > It's now hitting a make check failure unrelated to VSS. > > Thanks, > Stefan 'make check' for MinGW cross builds needs an installation of wine to run the resulting exe files. I also had to fix tests/Makefile to exclude some POSIX-only tests and to makesome smaller code fixes. Now I can run 'make check' for MinGW on a Linux host. Patches will follow after QEMU 1.7 - maybe you can prepare the buildbot with wine. Regards, Stefan
[Qemu-devel] [PATCH] .gitignore: Ignore config.status
Signed-off-by: Fam Zheng --- .gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/.gitignore b/.gitignore index 5584b5f..1c9d63d 100644 --- a/.gitignore +++ b/.gitignore @@ -3,6 +3,7 @@ config-all-devices.* config-all-disas.* config-host.* config-target.* +config.status trace/generated-tracers.h trace/generated-tracers.c trace/generated-tracers-dtrace.h -- 1.8.4.2
Re: [Qemu-devel] [RFC PATCH v3 0/2] Point-in-time snapshot exporting over NBD
On 2013年11月20日 10:32, Ian Main wrote: On Thu, Oct 17, 2013 at 01:36:41PM +0800, Fam Zheng wrote: This series adds for point-in-time snapshot NBD exporting based on blockdev-backup (variant of drive-backup with existing device as target). In general this seems to work great. I did a bunch of testing and was able to mount filesystems over NBD, do many writes (on backed up image)/reads (on target), intense IO etc and all held up fine. There are definitely some issues with some of the commands allowing you to blow things up. I'm interested to hear opinions on whether this is a showstopper at this time or not. Thanks very much for your testing, Ian. QEMU should report error instead of crashing with invalid operations. I added an "operation blocker" and incorporated into the next version. In the future, we still want to review more on those commands and try to enable safe ones, and possibly also allow multiple block jobs on a BDS. For now it is good enough to only allow blockdev-backup on the source and NBD export on the target (which is safe, and all what we need for image fleecing, as this version shows). With that being a working implementation, I've posted v4. Please test and free to make comments. We get a thin point-in-time snapshot by COW mechanism of drive-backup, and export it through built in NBD server. The steps are as below: 1. (SHELL) qemu-img create -f qcow2 BACKUP.qcow2 (Alternatively we can use -o backing_file=RUNNING-VM.img to omit explicitly providing the size by ourselves, but it's risky because RUNNING-VM.qcow2 is used r/w by guest. Whether or not setting backing file in the image file doesn't matter, as we are going to override the backing hd in the next step) 2. (QMP) blockdev-add backing=source-drive file.driver=file file.filename=BACKUP.qcow2 id=target0 if=none driver=qcow2 I had to create a custom python script to make these commands work as they require nested dicts. Is that normal or did I miss something entirely? Yes, I use a python script locally to test it too. Fam
[Qemu-devel] [PATCH v4 5/7] block: Parse "backing" option to reference existing BDS
Signed-off-by: Fam Zheng --- block.c | 37 - include/block/block_int.h | 3 +++ 2 files changed, 35 insertions(+), 5 deletions(-) diff --git a/block.c b/block.c index 3bf4c8a..a5da656 100644 --- a/block.c +++ b/block.c @@ -1166,11 +1166,33 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options, /* If there is a backing file, use it */ if ((flags & BDRV_O_NO_BACKING) == 0) { QDict *backing_options; - -qdict_extract_subqdict(options, &backing_options, "backing."); -ret = bdrv_open_backing_file(bs, backing_options, &local_err); -if (ret < 0) { -goto close_and_fail; +const char *backing_bs; + +backing_bs = qdict_get_try_str(options, "backing"); +if (backing_bs) { +bs->backing_hd = bdrv_find(backing_bs); +qdict_del(options, "backing"); +if (bs->backing_hd) { +bdrv_ref(bs->backing_hd); +assert(!bs->backing_blocker); +error_setg(&bs->backing_blocker, + "device is used as backing hd of '%s'", + bs->device_name); +bdrv_op_block_all(bs->backing_hd, bs->backing_blocker); +pstrcpy(bs->backing_file, sizeof(bs->backing_file), +bs->backing_hd->filename); +pstrcpy(bs->backing_format, sizeof(bs->backing_format), +bs->backing_hd->drv->format_name); +} else { +error_setg(errp, "Backing device not found: %s", backing_bs); +goto close_and_fail; +} +} else { +qdict_extract_subqdict(options, &backing_options, "backing."); +ret = bdrv_open_backing_file(bs, backing_options, &local_err); +if (ret < 0) { +goto close_and_fail; +} } } @@ -1461,6 +1483,11 @@ void bdrv_close(BlockDriverState *bs) if (bs->drv) { if (bs->backing_hd) { +if (error_is_set(&bs->backing_blocker)) { +bdrv_op_unblock_all(bs->backing_hd, +bs->backing_blocker); +error_free(bs->backing_blocker); +} bdrv_unref(bs->backing_hd); bs->backing_hd = NULL; } diff --git a/include/block/block_int.h b/include/block/block_int.h index 60edc80..6db30c9 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -316,6 +316,9 @@ struct BlockDriverState { BlockJob *job; QDict *options; + +/* For backing reference, block the operations of named backing device */ +Error *backing_blocker; }; int get_tmp_filename(char *filename, int size); -- 1.8.4.2
[Qemu-devel] [PATCH v4 7/7] block: Allow backup on referenced named BlockDriverState
Drive backup is a read only operation on source bs. We want to allow this specific case to enable image-fleecing. Note that when image-fleecing job starts, the job still add its blocker to source bs, and any other operation on it will be blocked by that. Signed-off-by: Fam Zheng --- block.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/block.c b/block.c index a5da656..d30be51 100644 --- a/block.c +++ b/block.c @@ -1179,6 +1179,8 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options, "device is used as backing hd of '%s'", bs->device_name); bdrv_op_block_all(bs->backing_hd, bs->backing_blocker); +bdrv_op_unblock(bs->backing_hd, BLOCK_OP_TYPE_BACKUP, +bs->backing_blocker); pstrcpy(bs->backing_file, sizeof(bs->backing_file), bs->backing_hd->filename); pstrcpy(bs->backing_format, sizeof(bs->backing_format), -- 1.8.4.2
[Qemu-devel] [PATCH v4 4/7] block: Add checks of blocker in block operations
Before operate on a BlockDriverState, respective types are checked against bs->op_blockers and it will error out if there's a blocker. Signed-off-by: Fam Zheng --- blockdev-nbd.c | 4 blockdev.c | 25 + 2 files changed, 29 insertions(+) diff --git a/blockdev-nbd.c b/blockdev-nbd.c index 922cf56..c49feae 100644 --- a/blockdev-nbd.c +++ b/blockdev-nbd.c @@ -92,6 +92,10 @@ void qmp_nbd_server_add(const char *device, bool has_writable, bool writable, return; } +if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_NBD_SERVER_ADD, errp)) { +return; +} + if (!has_writable) { writable = false; } diff --git a/blockdev.c b/blockdev.c index eb55b74..1921d27 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1001,6 +1001,11 @@ SnapshotInfo *qmp_blockdev_snapshot_delete_internal_sync(const char *device, return NULL; } +if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_INTERNAL_SNAPSHOT_DELETE, + errp)) { +return NULL; +} + ret = bdrv_snapshot_find_by_id_and_name(bs, id, name, &sn, &local_err); if (error_is_set(&local_err)) { error_propagate(errp, local_err); @@ -1098,6 +1103,10 @@ static void internal_snapshot_prepare(BlkTransactionState *common, return; } +if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_INTERNAL_SNAPSHOT, errp)) { +return; +} + if (!bdrv_is_inserted(bs)) { error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device); return; @@ -1485,6 +1494,10 @@ void qmp_block_passwd(const char *device, const char *password, Error **errp) return; } +if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_PASSWD, errp)) { +return; +} + err = bdrv_set_key(bs, password); if (err == -EINVAL) { error_set(errp, QERR_DEVICE_NOT_ENCRYPTED, bdrv_get_device_name(bs)); @@ -1536,6 +1549,10 @@ void qmp_change_blockdev(const char *device, const char *filename, return; } +if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_CHANGE, errp)) { +return; +} + if (format) { drv = bdrv_find_whitelisted_format(format, bs->read_only); if (!drv) { @@ -1586,6 +1603,10 @@ void qmp_block_set_io_throttle(const char *device, int64_t bps, int64_t bps_rd, return; } +if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_SET_IO_THROTTLE, errp)) { +return; +} + memset(&cfg, 0, sizeof(cfg)); cfg.buckets[THROTTLE_BPS_TOTAL].avg = bps; cfg.buckets[THROTTLE_BPS_READ].avg = bps_rd; @@ -1684,6 +1705,10 @@ void qmp_block_resize(const char *device, int64_t size, Error **errp) return; } +if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_RESIZE, errp)) { +return; +} + if (size < 0) { error_set(errp, QERR_INVALID_PARAMETER_VALUE, "size", "a >0 size"); return; -- 1.8.4.2
[Qemu-devel] [PATCH v4 6/7] qmp: add command 'blockdev-backup'
Similar to drive-backup, but this command uses a device id as target instead of creating/opening an image file. Also add blocker on target bs, since the target is also a named device now. Signed-off-by: Fam Zheng --- block/backup.c | 22 ++ blockdev.c | 46 ++ qapi-schema.json | 49 + qmp-commands.hx | 46 ++ 4 files changed, 163 insertions(+) diff --git a/block/backup.c b/block/backup.c index cad14c9..b608f08 100644 --- a/block/backup.c +++ b/block/backup.c @@ -338,6 +338,7 @@ static void coroutine_fn backup_run(void *opaque) hbitmap_free(job->bitmap); bdrv_iostatus_disable(target); +bdrv_op_unblock_all(target, job->common.blocker); bdrv_unref(target); block_job_completed(&job->common, ret); @@ -363,6 +364,24 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target, return; } +if (!bdrv_is_inserted(bs)) { +error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, bs->device_name); +return; +} + +if (!bdrv_is_inserted(target)) { +error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, target->device_name); +return; +} + +if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP, errp)) { +return; +} + +if (bdrv_op_is_blocked(target, BLOCK_OP_TYPE_BACKUP, errp)) { +return; +} + len = bdrv_getlength(bs); if (len < 0) { error_setg_errno(errp, -len, "unable to get length for '%s'", @@ -376,6 +395,9 @@ void backup_start(BlockDriverState *bs, BlockDriverState *target, return; } +bdrv_op_block_all(target, job->common.blocker); +bdrv_op_unblock(target, BLOCK_OP_TYPE_NBD_SERVER_ADD, job->common.blocker); + job->on_source_error = on_source_error; job->on_target_error = on_target_error; job->target = target; diff --git a/blockdev.c b/blockdev.c index 1921d27..e62fb35 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1900,6 +1900,8 @@ void qmp_drive_backup(const char *device, const char *target, return; } +/* Although backup_run has this check too, we need to use bs->drv below, so + * do an early check redundantly. */ if (!bdrv_is_inserted(bs)) { error_set(errp, QERR_DEVICE_HAS_NO_MEDIUM, device); return; @@ -1974,6 +1976,50 @@ void qmp_drive_backup(const char *device, const char *target, } } +void qmp_blockdev_backup(const char *device, const char *target, + enum MirrorSyncMode sync, + bool has_speed, int64_t speed, + bool has_on_source_error, + BlockdevOnError on_source_error, + bool has_on_target_error, + BlockdevOnError on_target_error, + Error **errp) +{ +BlockDriverState *bs; +BlockDriverState *target_bs; +Error *local_err = NULL; + +if (!has_speed) { +speed = 0; +} +if (!has_on_source_error) { +on_source_error = BLOCKDEV_ON_ERROR_REPORT; +} +if (!has_on_target_error) { +on_target_error = BLOCKDEV_ON_ERROR_REPORT; +} + +bs = bdrv_find(device); +if (!bs) { +error_set(errp, QERR_DEVICE_NOT_FOUND, device); +return; +} + +target_bs = bdrv_find(target); +if (!target_bs) { +error_set(errp, QERR_DEVICE_NOT_FOUND, target); +return; +} + +bdrv_ref(target_bs); +backup_start(bs, target_bs, speed, sync, on_source_error, on_target_error, + block_job_cb, bs, &local_err); +if (local_err != NULL) { +bdrv_unref(target_bs); +error_propagate(errp, local_err); +} +} + #define DEFAULT_MIRROR_BUF_SIZE (10 << 20) void qmp_drive_mirror(const char *device, const char *target, diff --git a/qapi-schema.json b/qapi-schema.json index 4656e8c..bd3c55c 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1844,6 +1844,40 @@ '*on-target-error': 'BlockdevOnError' } } ## +# @BlockdevBackup +# +# @device: the name of the device which should be copied. +# +# @target: the name of the backup target device +# +# @sync: what parts of the disk image should be copied to the destination +#(all the disk, only the sectors allocated in the topmost image, or +#only new I/O). +# +# @speed: #optional the maximum speed, in bytes per second +# +# @on-source-error: #optional the action to take on an error on the source, +# default 'report'. 'stop' and 'enospc' can only be used +# if the block device supports io-status (see BlockInfo). +# +# @on-target-error: #optional the action to take on an error on the target, +# default 'report' (no limitations, since this applies to +# a different block device than @device). +# +# Note that @on-source-error and @on-
[Qemu-devel] [PATCH v4 3/7] block: Replace in_use with operation blocker
This drops BlockDriverState.in_use with op_blockers: - Call bdrv_op_block_all in place of bdrv_set_in_use(bs, 1). - Call bdrv_op_unblock_all in place of bdrv_set_in_use(bs, 0). - Check bdrv_op_is_blocked() in place of bdrv_in_use(bs). The specific types are used, e.g. in place of starting block backup, bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP, ...). - Check bdrv_op_blocker_is_empty() in place of assert(!bs->in_use). Note: there is no block for only specific types for now, i.e. a caller blocks all or none. So although the checks are type specific, the above changes can still be seen as identical in logic. Signed-off-by: Fam Zheng --- block-migration.c | 8 ++-- block.c | 34 +- blockdev.c | 29 +++-- blockjob.c | 12 +++- hw/block/dataplane/virtio-blk.c | 16 ++-- include/block/block.h | 2 -- include/block/block_int.h | 1 - include/block/blockjob.h| 3 +++ 8 files changed, 66 insertions(+), 39 deletions(-) diff --git a/block-migration.c b/block-migration.c index daf9ec1..12afcfa 100644 --- a/block-migration.c +++ b/block-migration.c @@ -58,6 +58,8 @@ typedef struct BlkMigDevState { /* Protected by block migration lock. */ unsigned long *aio_bitmap; int64_t completed_sectors; + +Error *blocker; } BlkMigDevState; typedef struct BlkMigBlock { @@ -336,7 +338,8 @@ static void init_blk_migration_it(void *opaque, BlockDriverState *bs) bmds->completed_sectors = 0; bmds->shared_base = block_mig_state.shared_base; alloc_aio_bitmap(bmds); -bdrv_set_in_use(bs, 1); +error_setg(&bmds->blocker, "block device is in use by migration"); +bdrv_op_block_all(bs, bmds->blocker); bdrv_ref(bs); block_mig_state.total_sector_sum += sectors; @@ -574,7 +577,8 @@ static void blk_mig_cleanup(void) blk_mig_lock(); while ((bmds = QSIMPLEQ_FIRST(&block_mig_state.bmds_list)) != NULL) { QSIMPLEQ_REMOVE_HEAD(&block_mig_state.bmds_list, entry); -bdrv_set_in_use(bmds->bs, 0); +bdrv_op_unblock_all(bmds->bs, bmds->blocker); +error_free(bmds->blocker); bdrv_unref(bmds->bs); g_free(bmds->aio_bitmap); g_free(bmds); diff --git a/block.c b/block.c index 2b18a43..3bf4c8a 100644 --- a/block.c +++ b/block.c @@ -1628,15 +1628,17 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest, bs_dest->refcnt = bs_src->refcnt; /* job */ -bs_dest->in_use = bs_src->in_use; bs_dest->job= bs_src->job; /* keep the same entry in bdrv_states */ pstrcpy(bs_dest->device_name, sizeof(bs_dest->device_name), bs_src->device_name); +memcpy(bs_dest->op_blockers, bs_src->op_blockers, + sizeof(bs_dest->op_blockers)); bs_dest->list = bs_src->list; } +static bool bdrv_op_blocker_is_empty(BlockDriverState *bs); /* * Swap bs contents for two image chains while they are live, * while keeping required fields on the BlockDriverState that is @@ -1658,7 +1660,7 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old) assert(bs_new->dirty_bitmap == NULL); assert(bs_new->job == NULL); assert(bs_new->dev == NULL); -assert(bs_new->in_use == 0); +assert(bdrv_op_blocker_is_empty(bs_new)); assert(bs_new->io_limits_enabled == false); assert(!throttle_have_timer(&bs_new->throttle_state)); @@ -1677,7 +1679,7 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old) /* Check a few fields that should remain attached to the device */ assert(bs_new->dev == NULL); assert(bs_new->job == NULL); -assert(bs_new->in_use == 0); +assert(bdrv_op_blocker_is_empty(bs_new)); assert(bs_new->io_limits_enabled == false); assert(!throttle_have_timer(&bs_new->throttle_state)); @@ -1714,7 +1716,7 @@ static void bdrv_delete(BlockDriverState *bs) { assert(!bs->dev); assert(!bs->job); -assert(!bs->in_use); +assert(bdrv_op_blocker_is_empty(bs)); assert(!bs->refcnt); bdrv_close(bs); @@ -1887,6 +1889,7 @@ int bdrv_commit(BlockDriverState *bs) int ret = 0; uint8_t *buf; char filename[PATH_MAX]; +Error *local_err; if (!drv) return -ENOMEDIUM; @@ -1895,7 +1898,9 @@ int bdrv_commit(BlockDriverState *bs) return -ENOTSUP; } -if (bdrv_in_use(bs) || bdrv_in_use(bs->backing_hd)) { +if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_COMMIT, &local_err) || +bdrv_op_is_blocked(bs->backing_hd, BLOCK_OP_TYPE_COMMIT, &local_err)) { +error_free(local_err); return -EBUSY; } @@ -2831,6 +2836,7 @@ int coroutine_fn bdrv_co_write_zeroes(BlockDriverState *bs, int bdrv_truncate(BlockDriverState *bs, int64_t offset) { BlockDriver *drv = bs->d
[Qemu-devel] [PATCH v4 2/7] block: Introduce op_blockers to BlockDriverState
BlockDriverState.op_blockers is an array of list with BLOCK_OP_TYPE_MAX elements. Each list is a list of blockers of an operation type (BlockOpType), that marks this BDS is currently blocked for certain type of operation with reason errors stored in the list. The rule of usage is: * BDS user who wants to take an operation should check if there's any blocker of the type with bdrv_op_is_blocked(). * BDS user who wants to block certain types of operation, should call bdrv_op_block (or bdrv_op_block_all to block all types of operations, which is similar to bdrv_set_in_use of now). * A blocker is only referenced by op_blockers, so the lifecycle is managed by caller, and shouldn't be lost until unblock, so typically a caller does these: - Allocate a blocker with error_setg or similar, call bdrv_op_block() to block some operations. - Hold the blocker, do his job. - Unblock operations that it blocked, with the same reason pointer passed to bdrv_op_unblock(). - Release the blocker with error_free(). Signed-off-by: Fam Zheng --- block.c | 55 +++ include/block/block.h | 6 ++ include/block/block_int.h | 5 + 3 files changed, 66 insertions(+) diff --git a/block.c b/block.c index 382ea71..2b18a43 100644 --- a/block.c +++ b/block.c @@ -4425,6 +4425,61 @@ void bdrv_unref(BlockDriverState *bs) } } +struct BdrvOpBlocker { +Error *reason; +QLIST_ENTRY(BdrvOpBlocker) list; +}; + +bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp) +{ +BdrvOpBlocker *blocker; +assert(op >=0 && op < BLOCK_OP_TYPE_MAX); +if (!QLIST_EMPTY(&bs->op_blockers[op])) { +blocker = QLIST_FIRST(&bs->op_blockers[op]); +*errp = error_copy(blocker->reason); +return true; +} +return false; +} + +void bdrv_op_block(BlockDriverState *bs, BlockOpType op, Error *reason) +{ +BdrvOpBlocker *blocker; +assert(op >=0 && op < BLOCK_OP_TYPE_MAX); + +blocker = g_malloc0(sizeof(BdrvOpBlocker)); +blocker->reason = reason; +QLIST_INSERT_HEAD(&bs->op_blockers[op], blocker, list); +} + +void bdrv_op_unblock(BlockDriverState *bs, BlockOpType op, Error *reason) +{ +BdrvOpBlocker *blocker, *next; +assert(op >=0 && op < BLOCK_OP_TYPE_MAX); +QLIST_FOREACH_SAFE(blocker, &bs->op_blockers[op], list, next) { +if (blocker->reason == reason) { +QLIST_REMOVE(blocker, list); +g_free(blocker); +} +} +} + +void bdrv_op_block_all(BlockDriverState *bs, Error *reason) +{ +int i; +for (i = 0; i < BLOCK_OP_TYPE_MAX; i++) { +bdrv_op_block(bs, i, reason); +} +} + +void bdrv_op_unblock_all(BlockDriverState *bs, Error *reason) +{ +int i; +for (i = 0; i < BLOCK_OP_TYPE_MAX; i++) { +bdrv_op_unblock(bs, i, reason); +} +} + void bdrv_set_in_use(BlockDriverState *bs, int in_use) { assert(bs->in_use != in_use); diff --git a/include/block/block.h b/include/block/block.h index 3560deb..693d305 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -403,6 +403,12 @@ void bdrv_unref(BlockDriverState *bs); void bdrv_set_in_use(BlockDriverState *bs, int in_use); int bdrv_in_use(BlockDriverState *bs); +bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp); +void bdrv_op_block(BlockDriverState *bs, BlockOpType op, Error *reason); +void bdrv_op_unblock(BlockDriverState *bs, BlockOpType op, Error *reason); +void bdrv_op_block_all(BlockDriverState *bs, Error *reason); +void bdrv_op_unblock_all(BlockDriverState *bs, Error *reason); + #ifdef CONFIG_LINUX_AIO int raw_get_aio_fd(BlockDriverState *bs); #else diff --git a/include/block/block_int.h b/include/block/block_int.h index 1666066..d8cef85 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -230,6 +230,8 @@ struct BlockDriver { QLIST_ENTRY(BlockDriver) list; }; +typedef struct BdrvOpBlocker BdrvOpBlocker; + /* * Note: the function bdrv_append() copies and swaps contents of * BlockDriverStates, so if you add new fields to this struct, please @@ -308,6 +310,9 @@ struct BlockDriverState { QLIST_HEAD(, BdrvTrackedRequest) tracked_requests; +/* operation blockers */ +QLIST_HEAD(, BdrvOpBlocker) op_blockers[BLOCK_OP_TYPE_MAX]; + /* long-running background operation */ BlockJob *job; -- 1.8.4.2
[Qemu-devel] [PATCH v4 0/7] Drop in_use from BlockDriverState and enable point-in-time snapshot exporting over NBD
This series adds for point-in-time snapshot NBD exporting based on blockdev-backup (variant of drive-backup with existing device as target). We get a thin point-in-time snapshot by COW mechanism of drive-backup, and export it through built in NBD server. The steps are as below: 1. (SHELL) qemu-img create -f qcow2 BACKUP.qcow2 (Alternatively we can use -o backing_file=RUNNING-VM.img to omit explicitly providing the size by ourselves, but it's risky because RUNNING-VM.qcow2 is used r/w by guest. Whether or not setting backing file in the image file doesn't matter, as we are going to override the backing hd in the next step) 2. (QMP) blockdev-add backing=source-drive file.driver=file file.filename=BACKUP.qcow2 id=target0 if=none driver=qcow2 (where ide0-hd0 is the running BlockDriverState name for RUNNING-VM.img. This patch implements "backing=" option to override backing_hd for added drive) 3. (QMP) blockdev-backup device=source-drive sync=none target=target0 (this is the QMP command introduced by this series, which use a named device as target of drive-backup) 4. (QMP) nbd-server-add device=target0 When image fleecing done: 1. (QMP) block-job-complete device=ide0-hd0 2. (HMP) drive_del target0 3. (SHELL) rm BACKUP.qcow2 v4: Dropping RFC, this series tries to address the crashing cases with an added safety mechanics. In the first half of series, replace the in_use flag with an operation blocker list, and block all operations on named backing hd: [01/07] qapi: Add BlockOperationType enum [02/07] block: Introduce op_blockers to BlockDriverState [03/07] block: Replace in_use with operation blocker [04/07] block: Add checks of blocker in block operations The second half enables point in time snapshot over NBD, with fixes from last revision: [05/07] block: Parse "backing" option to reference existing BDS Fix NULL dereference if device not found. [06/07] qmp: add command 'blockdev-backup' Moved some checks into backup_run. (Paolo) [07/07] block: Allow backup on referenced named BlockDriverState New. Removes one specific blocker on backing referenced BDS, so we can start a backup job on it. v3: Base on blockdev-add. The syntax blockdev-add backing= is new: This will make referenced BDS in the middle of a backing chain, which has significant effects over all existing block operations in QMP. It needs to be reviewed and tested carefully. I'm listing the commands here that can take a device id as parameter (but may not be complete). These commands do not mutate the backing chain (not directly, at least) and should be safe: block_passwd block_set_io_throttle block-job-set-speed block-job-cancel block-job-pause block-job-resume block-job-complete drive-backup blockdev-snapshot-sync blockdev-snapshot-internal-sync blockdev-snapshot-delete-internal-sync: These should be safe. nbd-server-add These can mutates the chain (removing, closing or swapping BDS), need more work to convert them to safe operations with a BDS in the middle. device_del eject: it does bdrv_close the device. change: internally calls eject. block-commit: it deletes intermediate BDSes, which may include other named BDS. block-stream: drive-mirror: it swaps active with target when complete. Resizing a middle BDS need to be reviewed too: block_resize: TBD. Adding and backing HD referencing will put other BDS in middle, but should not immediately break things: blockdev-add Fam Zheng (7): qapi: Add BlockOperationType enum block: Introduce op_blockers to BlockDriverState block: Replace in_use with operation blocker block: Add checks of blocker in block operations block: Parse "backing" option to reference existing BDS qmp: add command 'blockdev-backup' block: Allow backup on referenced named BlockDriverState block-migration.c | 8 ++- block.c | 124 ++-- block/backup.c | 22 +++ blockdev-nbd.c | 4 ++ blockdev.c | 100 blockjob.c | 12 ++-- hw/block/dataplane/virtio-blk.c | 16 -- include/block/block.h | 8 ++- include/block/block_int.h | 9 ++- include/block/blockjob.h| 3 + qapi-schema.json| 74 qmp-commands.hx | 46 +++ 12 files changed, 384 insertions(+), 42 deletions(-) -- 1.8.4.2
[Qemu-devel] [PATCH v4 1/7] qapi: Add BlockOperationType enum
This adds the enum of all the operations that can be taken on a block device. Signed-off-by: Fam Zheng --- qapi-schema.json | 25 + 1 file changed, 25 insertions(+) diff --git a/qapi-schema.json b/qapi-schema.json index 83fa485..4656e8c 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1440,6 +1440,31 @@ 'data': ['commit', 'stream', 'mirror', 'backup'] } ## +# @BlockOperationType +# Type of a block operation. +# +# Since: 1.8 +## +{ 'enum': 'BlockOpType', + 'data': [ +'backup', +'change', +'commit', +'dataplane', +'drive-del', +'eject', +'external-snapshot', +'internal-snapshot', +'internal-snapshot-delete', +'mirror', +'nbd-server-add', +'passwd', +'resize', +'set-io-throttle', +'stream' +] } + +## # @BlockJobInfo: # # Information about a long-running block device operation. -- 1.8.4.2
[Qemu-devel] [PATCH V16 07/11] NUMA: expand MAX_NODES from 64 to 128
libnuma choosed 128 for MAX_NODES, so we follow libnuma here. Signed-off-by: Wanlong Gao --- include/sysemu/sysemu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index 291aa6a..807619e 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -132,7 +132,7 @@ extern size_t boot_splash_filedata_size; extern uint8_t qemu_extra_params_fw[2]; extern QEMUClockType rtc_clock; -#define MAX_NODES 64 +#define MAX_NODES 128 #define MAX_CPUMASK_BITS 255 extern int nb_numa_nodes; extern int nb_numa_mem_nodes; -- 1.8.5.rc3
[Qemu-devel] [PATCH V16 09/11] NUMA: set guest numa nodes memory policy
Set the guest numa nodes memory policies using the mbind(2) system call node by node. After this patch, we are able to set guest nodes memory policies through the QEMU options, this arms to solve the guest cross nodes memory access performance issue. And as you all know, if PCI-passthrough is used, direct-attached-device uses DMA transfer between device and qemu process. All pages of the guest will be pinned by get_user_pages(). KVM_ASSIGN_PCI_DEVICE ioctl kvm_vm_ioctl_assign_device() =>kvm_assign_device() => kvm_iommu_map_memslots() => kvm_iommu_map_pages() => kvm_pin_pages() So, with direct-attached-device, all guest page's page count will be +1 and any page migration will not work. AutoNUMA won't too. So, we should set the guest nodes memory allocation policies before the pages are really mapped. Signed-off-by: Andre Przywara Signed-off-by: Wanlong Gao --- numa.c | 86 ++ 1 file changed, 86 insertions(+) diff --git a/numa.c b/numa.c index da4dbbd..915a67a 100644 --- a/numa.c +++ b/numa.c @@ -27,6 +27,16 @@ #include "qapi-visit.h" #include "qapi/opts-visitor.h" #include "qapi/dealloc-visitor.h" +#include "exec/memory.h" + +#ifdef __linux__ +#include +#ifndef MPOL_F_RELATIVE_NODES +#define MPOL_F_RELATIVE_NODES (1 << 14) +#define MPOL_F_STATIC_NODES (1 << 15) +#endif +#endif + QemuOptsList qemu_numa_opts = { .name = "numa", .implied_opt_name = "type", @@ -228,6 +238,75 @@ void set_numa_nodes(void) } } +#ifdef __linux__ +static int node_parse_bind_mode(unsigned int nodeid) +{ +int bind_mode; + +switch (numa_info[nodeid].policy) { +case NUMA_NODE_POLICY_DEFAULT: +case NUMA_NODE_POLICY_PREFERRED: +case NUMA_NODE_POLICY_MEMBIND: +case NUMA_NODE_POLICY_INTERLEAVE: +bind_mode = numa_info[nodeid].policy; +break; +default: +bind_mode = NUMA_NODE_POLICY_DEFAULT; +return bind_mode; +} + +bind_mode |= numa_info[nodeid].relative ? +MPOL_F_RELATIVE_NODES : MPOL_F_STATIC_NODES; + +return bind_mode; +} +#endif + +static int set_node_mem_policy(int nodeid) +{ +#ifdef __linux__ +void *ram_ptr; +RAMBlock *block; +ram_addr_t len, ram_offset = 0; +int bind_mode; +int i; + +QTAILQ_FOREACH(block, &ram_list.blocks, next) { +if (!strcmp(block->mr->name, "pc.ram")) { +break; +} +} + +if (block->host == NULL) { +return -1; +} + +ram_ptr = block->host; +for (i = 0; i < nodeid; i++) { +len = numa_info[i].node_mem; +ram_offset += len; +} + +len = numa_info[nodeid].node_mem; +bind_mode = node_parse_bind_mode(nodeid); +unsigned long *nodes = numa_info[nodeid].host_mem; + +/* This is a workaround for a long standing bug in Linux' + * mbind implementation, which cuts off the last specified + * node. To stay compatible should this bug be fixed, we + * specify one more node and zero this one out. + */ +unsigned long maxnode = find_last_bit(nodes, MAX_NODES); +if (syscall(SYS_mbind, ram_ptr + ram_offset, len, bind_mode, +nodes, maxnode + 2, 0)) { +perror("mbind"); +return -1; +} +#endif + +return 0; +} + void set_numa_modes(void) { CPUState *cpu; @@ -240,4 +319,11 @@ void set_numa_modes(void) } } } + +for (i = 0; i < nb_numa_nodes; i++) { +if (set_node_mem_policy(i) == -1) { +fprintf(stderr, +"qemu: can not set host memory policy for node%d\n", i); +} +} } -- 1.8.5.rc3
[Qemu-devel] [PATCH] fixup
Signed-off-by: Wanlong Gao --- hw/i386/pc.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 50ed4cc..74c1f16 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -1072,8 +1072,12 @@ PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size, guest_info->apic_id_limit = pc_apic_id_limit(max_cpus); guest_info->apic_xrupt_override = kvm_allows_irq0_override(); guest_info->numa_nodes = nb_numa_nodes; -guest_info->node_mem = g_memdup(node_mem, guest_info->numa_nodes * +guest_info->node_mem = g_malloc0(guest_info->numa_nodes * sizeof *guest_info->node_mem); +for (i = 0; i < nb_numa_nodes; i++) { +guest_info->node_mem[i] = numa_info[i].node_mem; +} + guest_info->node_cpu = g_malloc0(guest_info->apic_id_limit * sizeof *guest_info->node_cpu); @@ -1081,7 +1085,7 @@ PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size, unsigned int apic_id = x86_cpu_apic_id_from_index(i); assert(apic_id < guest_info->apic_id_limit); for (j = 0; j < nb_numa_nodes; j++) { -if (test_bit(i, node_cpumask[j])) { +if (test_bit(i, numa_info[j].node_cpu)) { guest_info->node_cpu[apic_id] = j; break; } -- 1.8.5.rc0.44.gf26f72d
[Qemu-devel] [PATCH V16 01/11] NUMA: move numa related code to new file numa.c
Signed-off-by: Wanlong Gao --- Makefile.target | 2 +- cpus.c | 14 include/sysemu/cpus.h | 1 - include/sysemu/sysemu.h | 3 + numa.c | 182 vl.c| 139 +--- 6 files changed, 187 insertions(+), 154 deletions(-) create mode 100644 numa.c diff --git a/Makefile.target b/Makefile.target index af6ac7e..0197c17 100644 --- a/Makefile.target +++ b/Makefile.target @@ -109,7 +109,7 @@ endif #CONFIG_BSD_USER # # System emulator target ifdef CONFIG_SOFTMMU -obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o +obj-y += arch_init.o cpus.o monitor.o gdbstub.o balloon.o ioport.o numa.o obj-y += qtest.o obj-y += hw/ obj-$(CONFIG_FDT) += device_tree.o diff --git a/cpus.c b/cpus.c index 01d128d..53360b0 100644 --- a/cpus.c +++ b/cpus.c @@ -1297,20 +1297,6 @@ static void tcg_exec_all(void) exit_request = 0; } -void set_numa_modes(void) -{ -CPUState *cpu; -int i; - -CPU_FOREACH(cpu) { -for (i = 0; i < nb_numa_nodes; i++) { -if (test_bit(cpu->cpu_index, node_cpumask[i])) { -cpu->numa_node = i; -} -} -} -} - void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg) { /* XXX: implement xxx_cpu_list for targets that still miss it */ diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h index 6502488..4f79081 100644 --- a/include/sysemu/cpus.h +++ b/include/sysemu/cpus.h @@ -23,7 +23,6 @@ extern int smp_threads; #define smp_threads 1 #endif -void set_numa_modes(void); void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg); #endif diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index 495dae8..2509649 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -136,6 +136,9 @@ extern QEMUClockType rtc_clock; extern int nb_numa_nodes; extern uint64_t node_mem[MAX_NODES]; extern unsigned long *node_cpumask[MAX_NODES]; +void numa_add(const char *optarg); +void set_numa_nodes(void); +void set_numa_modes(void); #define MAX_OPTION_ROMS 16 typedef struct QEMUOptionRom { diff --git a/numa.c b/numa.c new file mode 100644 index 000..ce7736a --- /dev/null +++ b/numa.c @@ -0,0 +1,182 @@ +/* + * QEMU System Emulator + * + * Copyright (c) 2013 Fujitsu Ltd. + * Author: Wanlong Gao + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include "sysemu/sysemu.h" + +static void numa_node_parse_cpus(int nodenr, const char *cpus) +{ +char *endptr; +unsigned long long value, endvalue; + +/* Empty CPU range strings will be considered valid, they will simply + * not set any bit in the CPU bitmap. + */ +if (!*cpus) { +return; +} + +if (parse_uint(cpus, &value, &endptr, 10) < 0) { +goto error; +} +if (*endptr == '-') { +if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) { +goto error; +} +} else if (*endptr == '\0') { +endvalue = value; +} else { +goto error; +} + +if (endvalue >= MAX_CPUMASK_BITS) { +endvalue = MAX_CPUMASK_BITS - 1; +fprintf(stderr, +"qemu: NUMA: A max of %d VCPUs are supported\n", + MAX_CPUMASK_BITS); +} + +if (endvalue < value) { +goto error; +} + +bitmap_set(node_cpumask[nodenr], value, endvalue-value+1); +return; + +error: +fprintf(stderr, "qemu: Invalid NUMA CPU range: %s\n", cpus); +exit(1); +} + +void numa_add(const char *optarg) +{ +char option[128]; +char *endptr; +unsigned long long nodenr; + +optarg = get_opt_name(option, 128, optarg, ','); +if (*optarg == ',') { +optarg++; +} +if (!strcmp(option, "node")) { + +if (nb_numa_nodes >= MAX_NODES) { +fprintf(s
[Qemu-devel] [PATCH V16 11/11] NUMA: convert hmp command info_numa to use qmp command query_numa
Reviewed-by: Luiz Capitulino Signed-off-by: Wanlong Gao --- hmp.c | 57 + hmp.h | 1 + monitor.c | 21 + 3 files changed, 59 insertions(+), 20 deletions(-) diff --git a/hmp.c b/hmp.c index 32ee285..d6dedd2 100644 --- a/hmp.c +++ b/hmp.c @@ -24,6 +24,10 @@ #include "ui/console.h" #include "block/qapi.h" #include "qemu-io.h" +#include "qapi-visit.h" +#include "qapi/opts-visitor.h" +#include "qapi/dealloc-visitor.h" +#include "sysemu/sysemu.h" static void hmp_handle_error(Monitor *mon, Error **errp) { @@ -1564,3 +1568,56 @@ void hmp_qemu_io(Monitor *mon, const QDict *qdict) hmp_handle_error(mon, &err); } + +void hmp_info_numa(Monitor *mon, const QDict *qdict) +{ +NUMANodeList *node_list, *node; +uint16List *head; +int nodeid; +char *policy_str = NULL; + +node_list = qmp_query_numa(NULL); + +monitor_printf(mon, "%d nodes\n", nb_numa_nodes); +for (node = node_list; node; node = node->next) { +nodeid = node->value->nodeid; +monitor_printf(mon, "node %d cpus:", nodeid); +head = node->value->cpus; +for (head = node->value->cpus; head != NULL; head = head->next) { +monitor_printf(mon, " %d", (int)head->value); +} +monitor_printf(mon, "\n"); +monitor_printf(mon, "node %d size: %" PRId64 " MB\n", + nodeid, node->value->memory >> 20); +switch (node->value->policy) { +case NUMA_NODE_POLICY_DEFAULT: +policy_str = g_strdup("default"); +break; +case NUMA_NODE_POLICY_PREFERRED: +policy_str = g_strdup("preferred"); +break; +case NUMA_NODE_POLICY_MEMBIND: +policy_str = g_strdup("membind"); +break; +case NUMA_NODE_POLICY_INTERLEAVE: +policy_str = g_strdup("interleave"); +break; +default: +break; +} +monitor_printf(mon, "node %d policy: %s\n", + nodeid, policy_str ? : " "); +if (policy_str) { +free(policy_str); +} +monitor_printf(mon, "node %d relative: %s\n", nodeid, + node->value->relative ? "true" : "false"); +monitor_printf(mon, "node %d host-nodes:", nodeid); +for (head = node->value->host_nodes; head != NULL; head = head->next) { +monitor_printf(mon, " %d", (int)head->value); +} +monitor_printf(mon, "\n"); +} + +qapi_free_NUMANodeList(node_list); +} diff --git a/hmp.h b/hmp.h index 54cf71f..4f8d39b 100644 --- a/hmp.h +++ b/hmp.h @@ -37,6 +37,7 @@ void hmp_info_balloon(Monitor *mon, const QDict *qdict); void hmp_info_pci(Monitor *mon, const QDict *qdict); void hmp_info_block_jobs(Monitor *mon, const QDict *qdict); void hmp_info_tpm(Monitor *mon, const QDict *qdict); +void hmp_info_numa(Monitor *mon, const QDict *qdict); void hmp_quit(Monitor *mon, const QDict *qdict); void hmp_stop(Monitor *mon, const QDict *qdict); void hmp_system_reset(Monitor *mon, const QDict *qdict); diff --git a/monitor.c b/monitor.c index b97b7d3..f747a48 100644 --- a/monitor.c +++ b/monitor.c @@ -1989,25 +1989,6 @@ static void do_info_mtree(Monitor *mon, const QDict *qdict) mtree_info((fprintf_function)monitor_printf, mon); } -static void do_info_numa(Monitor *mon, const QDict *qdict) -{ -int i; -CPUState *cpu; - -monitor_printf(mon, "%d nodes\n", nb_numa_nodes); -for (i = 0; i < nb_numa_nodes; i++) { -monitor_printf(mon, "node %d cpus:", i); -CPU_FOREACH(cpu) { -if (cpu->numa_node == i) { -monitor_printf(mon, " %d", cpu->cpu_index); -} -} -monitor_printf(mon, "\n"); -monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i, -numa_info[i].node_mem >> 20); -} -} - #ifdef CONFIG_PROFILER int64_t qemu_time; @@ -2775,7 +2756,7 @@ static mon_cmd_t info_cmds[] = { .args_type = "", .params = "", .help = "show NUMA information", -.mhandler.cmd = do_info_numa, +.mhandler.cmd = hmp_info_numa, }, { .name = "usb", -- 1.8.5.rc3
Re: [Qemu-devel] [PATCH] fixup
Sorry, please ignore this patch. Thanks, Wanlong Gao > Signed-off-by: Wanlong Gao > --- > hw/i386/pc.c | 8 ++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c > index 50ed4cc..74c1f16 100644 > --- a/hw/i386/pc.c > +++ b/hw/i386/pc.c > @@ -1072,8 +1072,12 @@ PcGuestInfo *pc_guest_info_init(ram_addr_t > below_4g_mem_size, > guest_info->apic_id_limit = pc_apic_id_limit(max_cpus); > guest_info->apic_xrupt_override = kvm_allows_irq0_override(); > guest_info->numa_nodes = nb_numa_nodes; > -guest_info->node_mem = g_memdup(node_mem, guest_info->numa_nodes * > +guest_info->node_mem = g_malloc0(guest_info->numa_nodes * > sizeof *guest_info->node_mem); > +for (i = 0; i < nb_numa_nodes; i++) { > +guest_info->node_mem[i] = numa_info[i].node_mem; > +} > + > guest_info->node_cpu = g_malloc0(guest_info->apic_id_limit * > sizeof *guest_info->node_cpu); > > @@ -1081,7 +1085,7 @@ PcGuestInfo *pc_guest_info_init(ram_addr_t > below_4g_mem_size, > unsigned int apic_id = x86_cpu_apic_id_from_index(i); > assert(apic_id < guest_info->apic_id_limit); > for (j = 0; j < nb_numa_nodes; j++) { > -if (test_bit(i, node_cpumask[j])) { > +if (test_bit(i, numa_info[j].node_cpu)) { > guest_info->node_cpu[apic_id] = j; > break; > } >
[Qemu-devel] [PATCH V16 02/11] NUMA: check if the total numa memory size is equal to ram_size
If the total number of the assigned numa nodes memory is not equal to the assigned ram size, it will write the wrong data to ACPI talb, then the guest will ignore the wrong ACPI table and recognize all memory to one node. It's buggy, we should check it to ensure that we write the right data to ACPI table. Signed-off-by: Wanlong Gao --- numa.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/numa.c b/numa.c index ce7736a..beda80e 100644 --- a/numa.c +++ b/numa.c @@ -150,6 +150,16 @@ void set_numa_nodes(void) node_mem[i] = ram_size - usedmem; } +uint64_t numa_total = 0; +for (i = 0; i < nb_numa_nodes; i++) { +numa_total += node_mem[i]; +} +if (numa_total != ram_size) { +fprintf(stderr, "qemu: numa nodes total memory size " +"should equal to ram_size\n"); +exit(1); +} + for (i = 0; i < nb_numa_nodes; i++) { if (!bitmap_empty(node_cpumask[i], MAX_CPUMASK_BITS)) { break; -- 1.8.5.rc3
[Qemu-devel] [PATCH V16 10/11] NUMA: add qmp command query-numa
Add qmp command query-numa to show guest NUMA information. Reviewed-by: Luiz Capitulino Signed-off-by: Wanlong Gao --- numa.c | 66 qapi-schema.json | 36 +++ qmp-commands.hx | 49 + 3 files changed, 151 insertions(+) diff --git a/numa.c b/numa.c index 915a67a..b392190 100644 --- a/numa.c +++ b/numa.c @@ -28,6 +28,7 @@ #include "qapi/opts-visitor.h" #include "qapi/dealloc-visitor.h" #include "exec/memory.h" +#include "qmp-commands.h" #ifdef __linux__ #include @@ -327,3 +328,68 @@ void set_numa_modes(void) } } } + +NUMANodeList *qmp_query_numa(Error **errp) +{ +NUMANodeList *head = NULL, *cur_item = NULL; +CPUState *cpu; +int i; + +for (i = 0; i < nb_numa_nodes; i++) { +NUMANodeList *info; +uint16List *cur_cpu_item = NULL; +info = g_malloc0(sizeof(*info)); +info->value = g_malloc0(sizeof(*info->value)); +info->value->nodeid = i; +CPU_FOREACH(cpu) { +if (cpu->numa_node == i) { +uint16List *node_cpu = g_malloc0(sizeof(*node_cpu)); +node_cpu->value = cpu->cpu_index; + +if (!cur_cpu_item) { +info->value->cpus = cur_cpu_item = node_cpu; +} else { +cur_cpu_item->next = node_cpu; +cur_cpu_item = node_cpu; +} +} +} +info->value->memory = numa_info[i].node_mem; + +#ifdef __linux__ +info->value->policy = numa_info[i].policy; +info->value->relative = numa_info[i].relative; + +unsigned long first, next; +next = first = find_first_bit(numa_info[i].host_mem, MAX_NODES); +if (first == MAX_NODES) { +goto end; +} +uint16List *cur_node_item = g_malloc0(sizeof(*cur_node_item)); +cur_node_item->value = first; +info->value->host_nodes = cur_node_item; +do { +next = find_next_bit(numa_info[i].host_mem, MAX_NODES, + next + 1); +if (next == MAX_NODES) { +break; +} + +uint16List *host_node = g_malloc0(sizeof(*host_node)); +host_node->value = next; +cur_node_item->next = host_node; +cur_node_item = host_node; +} while (true); +end: +#endif + +if (!cur_item) { +head = cur_item = info; +} else { +cur_item->next = info; +cur_item = info; +} +} + +return head; +} diff --git a/qapi-schema.json b/qapi-schema.json index c0dad81..af947e2 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -4289,3 +4289,39 @@ '*policy': 'NumaNodePolicy', '*relative': 'bool', '*host-nodes': ['uint16'] }} + +## +# @NUMANode: +# +# Information of guest NUMA node +# +# @nodeid: NUMA node ID +# +# @cpus: VCPUs contained in this node +# +# @memory: memory size of this node +# +# @policy: memory policy of this node +# +# @relative: if host nodes are relative for memory policy +# +# @host-nodes: host nodes for its memory policy +# +# Since: 1.7 +# +## +{ 'type': 'NUMANode', + 'data': {'nodeid': 'uint16', 'cpus': ['uint16'], 'memory': 'uint64', + 'policy': 'NumaNodePolicy', 'relative': 'bool', + 'host-nodes': ['uint16'] }} + +## +# @query-numa: +# +# Returns a list of information about each guest node. +# +# Returns: a list of @NUMANode for each guest node +# +# Since: 1.7 +## +{ 'command': 'query-numa', 'returns': ['NUMANode'] } diff --git a/qmp-commands.hx b/qmp-commands.hx index fba15cd..c2bc508 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -3295,3 +3295,52 @@ Example (2): <- { "return": {} } EQMP + +{ +.name = "query-numa", +.args_type = "", +.mhandler.cmd_new = qmp_marshal_input_query_numa, +}, + +SQMP +query-numa +- + +Show NUMA information. + +Return a json-array. Each NUMA node is represented by a json-object, +which contains: + +- "nodeid": NUMA node ID (json-int) +- "cpus": a json-arry of contained VCPUs +- "memory": amount of memory in each node in Byte (json-int) +- "policy": memory policy of this node (json-string) +- "relative": if host nodes is relative for its memory policy (json-bool) +- "host-nodes": a json-array of host nodes for its memory policy + +Arguments: + +Example: + +-> { "excute": "query-numa" } +<- { "return":[ +{ +"nodeid": 0, +"cpus": [0, 1], +"memory": 536870912, +"policy": "membind", +"relative": false, +"host-nodes": [0, 1] +}, +{ +"nodeid": 1, +"cpus": [2, 3], +"memory": 536870912, +"policy": "interleave", +"relative": false, +"host-nodes": [1] +} + ] + } + +EQMP --
[Qemu-devel] [PATCH V16 04/11] NUMA: convert -numa option to use OptsVisitor
Signed-off-by: Wanlong Gao --- include/sysemu/sysemu.h | 3 +- numa.c | 148 +++- qapi-schema.json| 30 ++ vl.c| 11 +++- 4 files changed, 114 insertions(+), 78 deletions(-) diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index d873b42..20b05a3 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -140,9 +140,10 @@ typedef struct node_info { DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS); } NodeInfo; extern NodeInfo numa_info[MAX_NODES]; -void numa_add(const char *optarg); void set_numa_nodes(void); void set_numa_modes(void); +extern QemuOptsList qemu_numa_opts; +int numa_init_func(QemuOpts *opts, void *opaque); #define MAX_OPTION_ROMS 16 typedef struct QEMUOptionRom { diff --git a/numa.c b/numa.c index 1bc0fad..c4fa665 100644 --- a/numa.c +++ b/numa.c @@ -24,101 +24,97 @@ */ #include "sysemu/sysemu.h" - -static void numa_node_parse_cpus(int nodenr, const char *cpus) +#include "qapi-visit.h" +#include "qapi/opts-visitor.h" +#include "qapi/dealloc-visitor.h" +QemuOptsList qemu_numa_opts = { +.name = "numa", +.implied_opt_name = "type", +.head = QTAILQ_HEAD_INITIALIZER(qemu_numa_opts.head), +.desc = { { 0 } } /* validated with OptsVisitor */ +}; + +static int numa_node_parse(NumaNodeOptions *opts) { -char *endptr; -unsigned long long value, endvalue; - -/* Empty CPU range strings will be considered valid, they will simply - * not set any bit in the CPU bitmap. - */ -if (!*cpus) { -return; -} +uint16_t nodenr; +uint16List *cpus = NULL; -if (parse_uint(cpus, &value, &endptr, 10) < 0) { -goto error; -} -if (*endptr == '-') { -if (parse_uint_full(endptr + 1, &endvalue, 10) < 0) { -goto error; -} -} else if (*endptr == '\0') { -endvalue = value; +if (opts->has_nodeid) { +nodenr = opts->nodeid; } else { -goto error; +nodenr = nb_numa_nodes; } -if (endvalue >= MAX_CPUMASK_BITS) { -endvalue = MAX_CPUMASK_BITS - 1; -fprintf(stderr, -"qemu: NUMA: A max of %d VCPUs are supported\n", - MAX_CPUMASK_BITS); +if (nodenr >= MAX_NODES) { +fprintf(stderr, "qemu: Max number of NUMA nodes reached: %" +PRIu16 "\n", nodenr); +return -1; } -if (endvalue < value) { -goto error; +for (cpus = opts->cpus; cpus; cpus = cpus->next) { +if (cpus->value > MAX_CPUMASK_BITS) { +fprintf(stderr, "qemu: cpu number %" PRIu16 " is bigger than %d", +cpus->value, MAX_CPUMASK_BITS); +continue; +} +bitmap_set(numa_info[nodenr].node_cpu, cpus->value, 1); } -bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1); -return; +if (opts->has_mem) { +int64_t mem_size; +char *endptr; +mem_size = strtosz(opts->mem, &endptr); +if (mem_size < 0 || *endptr) { +fprintf(stderr, "qemu: invalid numa mem size: %s\n", opts->mem); +return -1; +} +numa_info[nodenr].node_mem = mem_size; +} -error: -fprintf(stderr, "qemu: Invalid NUMA CPU range: %s\n", cpus); -exit(1); +return 0; } -void numa_add(const char *optarg) +int numa_init_func(QemuOpts *opts, void *opaque) { -char option[128]; -char *endptr; -unsigned long long nodenr; - -optarg = get_opt_name(option, 128, optarg, ','); -if (*optarg == ',') { -optarg++; +NumaOptions *object = NULL; +Error *err = NULL; +int ret = 0; + +{ +OptsVisitor *ov = opts_visitor_new(opts); +visit_type_NumaOptions(opts_get_visitor(ov), &object, NULL, &err); +opts_visitor_cleanup(ov); } -if (!strcmp(option, "node")) { - -if (nb_numa_nodes >= MAX_NODES) { -fprintf(stderr, "qemu: too many NUMA nodes\n"); -exit(1); -} -if (get_param_value(option, 128, "nodeid", optarg) == 0) { -nodenr = nb_numa_nodes; -} else { -if (parse_uint_full(option, &nodenr, 10) < 0) { -fprintf(stderr, "qemu: Invalid NUMA nodeid: %s\n", option); -exit(1); -} -} - -if (nodenr >= MAX_NODES) { -fprintf(stderr, "qemu: invalid NUMA nodeid: %llu\n", nodenr); -exit(1); -} +if (error_is_set(&err)) { +fprintf(stderr, "qemu: %s\n", error_get_pretty(err)); +error_free(err); +ret = -1; +goto error; +} -if (get_param_value(option, 128, "mem", optarg) == 0) { -numa_info[nodenr].node_mem = 0; -} else { -int64_t sval; -sval = strtosz(option, &endptr); -if (sval < 0 || *endptr) { -fprintf(stderr, "qemu: invalid numa mem size: %s\n"
[Qemu-devel] [PATCH V16 03/11] NUMA: Add numa_info structure to contain numa nodes info
Add the numa_info structure to contain the numa nodes memory, VCPUs information and the future added numa nodes host memory policies. Reviewed-by: Eduardo Habkost Signed-off-by: Andre Przywara Signed-off-by: Wanlong Gao --- hw/i386/pc.c| 12 include/sysemu/sysemu.h | 8 ++-- monitor.c | 2 +- numa.c | 23 --- vl.c| 7 +++ 5 files changed, 30 insertions(+), 22 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 12c436e..74c1f16 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -670,14 +670,14 @@ static FWCfgState *bochs_bios_init(void) unsigned int apic_id = x86_cpu_apic_id_from_index(i); assert(apic_id < apic_id_limit); for (j = 0; j < nb_numa_nodes; j++) { -if (test_bit(i, node_cpumask[j])) { +if (test_bit(i, numa_info[j].node_cpu)) { numa_fw_cfg[apic_id + 1] = cpu_to_le64(j); break; } } } for (i = 0; i < nb_numa_nodes; i++) { -numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(node_mem[i]); +numa_fw_cfg[apic_id_limit + 1 + i] = cpu_to_le64(numa_info[i].node_mem); } fw_cfg_add_bytes(fw_cfg, FW_CFG_NUMA, numa_fw_cfg, (1 + apic_id_limit + nb_numa_nodes) * @@ -1072,8 +1072,12 @@ PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size, guest_info->apic_id_limit = pc_apic_id_limit(max_cpus); guest_info->apic_xrupt_override = kvm_allows_irq0_override(); guest_info->numa_nodes = nb_numa_nodes; -guest_info->node_mem = g_memdup(node_mem, guest_info->numa_nodes * +guest_info->node_mem = g_malloc0(guest_info->numa_nodes * sizeof *guest_info->node_mem); +for (i = 0; i < nb_numa_nodes; i++) { +guest_info->node_mem[i] = numa_info[i].node_mem; +} + guest_info->node_cpu = g_malloc0(guest_info->apic_id_limit * sizeof *guest_info->node_cpu); @@ -1081,7 +1085,7 @@ PcGuestInfo *pc_guest_info_init(ram_addr_t below_4g_mem_size, unsigned int apic_id = x86_cpu_apic_id_from_index(i); assert(apic_id < guest_info->apic_id_limit); for (j = 0; j < nb_numa_nodes; j++) { -if (test_bit(i, node_cpumask[j])) { +if (test_bit(i, numa_info[j].node_cpu)) { guest_info->node_cpu[apic_id] = j; break; } diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index 2509649..d873b42 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -9,6 +9,7 @@ #include "qapi-types.h" #include "qemu/notify.h" #include "qemu/main-loop.h" +#include "qemu/bitmap.h" /* vl.c */ @@ -134,8 +135,11 @@ extern QEMUClockType rtc_clock; #define MAX_NODES 64 #define MAX_CPUMASK_BITS 255 extern int nb_numa_nodes; -extern uint64_t node_mem[MAX_NODES]; -extern unsigned long *node_cpumask[MAX_NODES]; +typedef struct node_info { +uint64_t node_mem; +DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS); +} NodeInfo; +extern NodeInfo numa_info[MAX_NODES]; void numa_add(const char *optarg); void set_numa_nodes(void); void set_numa_modes(void); diff --git a/monitor.c b/monitor.c index 845f608..b97b7d3 100644 --- a/monitor.c +++ b/monitor.c @@ -2004,7 +2004,7 @@ static void do_info_numa(Monitor *mon, const QDict *qdict) } monitor_printf(mon, "\n"); monitor_printf(mon, "node %d size: %" PRId64 " MB\n", i, -node_mem[i] >> 20); +numa_info[i].node_mem >> 20); } } diff --git a/numa.c b/numa.c index beda80e..1bc0fad 100644 --- a/numa.c +++ b/numa.c @@ -61,7 +61,7 @@ static void numa_node_parse_cpus(int nodenr, const char *cpus) goto error; } -bitmap_set(node_cpumask[nodenr], value, endvalue-value+1); +bitmap_set(numa_info[nodenr].node_cpu, value, endvalue-value+1); return; error: @@ -101,7 +101,7 @@ void numa_add(const char *optarg) } if (get_param_value(option, 128, "mem", optarg) == 0) { -node_mem[nodenr] = 0; +numa_info[nodenr].node_mem = 0; } else { int64_t sval; sval = strtosz(option, &endptr); @@ -109,7 +109,7 @@ void numa_add(const char *optarg) fprintf(stderr, "qemu: invalid numa mem size: %s\n", optarg); exit(1); } -node_mem[nodenr] = sval; +numa_info[nodenr].node_mem = sval; } if (get_param_value(option, 128, "cpus", optarg) != 0) { numa_node_parse_cpus(nodenr, option); @@ -134,7 +134,7 @@ void set_numa_nodes(void) * and distribute the available memory equally across all nodes */ for (i = 0; i < nb_numa_nodes; i++) { -if (node_mem[i] != 0) +if (numa_info[i].node_mem != 0) break; } if (i == nb_
[Qemu-devel] [PATCH V16 06/11] NUMA: add "-numa mem," options
Add "-numa mem," option like following as Paolo suggested: -numa mem,nodeid=0,size=1G This new option will make later coming memory hotplug better. We will use the new options to specify nodes memory info, and just remain "-numa node,mem=xx" as legacy. Reviewed-by: Laszlo Ersek Signed-off-by: Wanlong Gao --- include/sysemu/sysemu.h | 1 + numa.c | 36 qemu-options.hx | 6 -- vl.c| 2 ++ 4 files changed, 43 insertions(+), 2 deletions(-) diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index 20b05a3..291aa6a 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -135,6 +135,7 @@ extern QEMUClockType rtc_clock; #define MAX_NODES 64 #define MAX_CPUMASK_BITS 255 extern int nb_numa_nodes; +extern int nb_numa_mem_nodes; typedef struct node_info { uint64_t node_mem; DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS); diff --git a/numa.c b/numa.c index c4fa665..c676c5e 100644 --- a/numa.c +++ b/numa.c @@ -74,6 +74,31 @@ static int numa_node_parse(NumaNodeOptions *opts) return 0; } +static int numa_mem_parse(NumaMemOptions *opts) +{ +uint16_t nodenr; +uint64_t mem_size; + +if (opts->has_nodeid) { +nodenr = opts->nodeid; +} else { +nodenr = nb_numa_mem_nodes; +} + +if (nodenr >= MAX_NODES) { +fprintf(stderr, "qemu: Max number of NUMA nodes reached: %" +PRIu16 "\n", nodenr); +return -1; +} + +if (opts->has_size) { +mem_size = opts->size; +numa_info[nodenr].node_mem = mem_size; +} + +return 0; +} + int numa_init_func(QemuOpts *opts, void *opaque) { NumaOptions *object = NULL; @@ -101,6 +126,13 @@ int numa_init_func(QemuOpts *opts, void *opaque) } nb_numa_nodes++; break; +case NUMA_OPTIONS_KIND_MEM: +ret = numa_mem_parse(object->mem); +if (ret) { +goto error; +} +nb_numa_mem_nodes++; +break; default: fprintf(stderr, "qemu: Invalid NUMA options type.\n"); ret = -1; @@ -119,6 +151,10 @@ error: void set_numa_nodes(void) { +if (nb_numa_mem_nodes > nb_numa_nodes) { +nb_numa_nodes = nb_numa_mem_nodes; +} + if (nb_numa_nodes > 0) { int i; diff --git a/qemu-options.hx b/qemu-options.hx index 8b94264..e6afb6f 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -95,11 +95,13 @@ specifies the maximum number of hotpluggable CPUs. ETEXI DEF("numa", HAS_ARG, QEMU_OPTION_numa, -"-numa node[,mem=size][,cpus=cpu[-cpu]][,nodeid=node]\n", QEMU_ARCH_ALL) +"-numa node[,nodeid=node][,cpus=cpu[-cpu]]\n" +"-numa mem[,nodeid=node][,size=size]\n" +, QEMU_ARCH_ALL) STEXI @item -numa @var{opts} @findex -numa -Simulate a multi node NUMA system. If mem and cpus are omitted, resources +Simulate a multi node NUMA system. If @var{size} and @var{cpus} are omitted, resources are split equally. ETEXI diff --git a/vl.c b/vl.c index e67f34a..064b821 100644 --- a/vl.c +++ b/vl.c @@ -250,6 +250,7 @@ static QTAILQ_HEAD(, FWBootEntry) fw_boot_order = QTAILQ_HEAD_INITIALIZER(fw_boot_order); int nb_numa_nodes; +int nb_numa_mem_nodes; NodeInfo numa_info[MAX_NODES]; uint8_t qemu_uuid[16]; @@ -2817,6 +2818,7 @@ int main(int argc, char **argv, char **envp) } nb_numa_nodes = 0; +nb_numa_mem_nodes = 0; nb_nics = 0; bdrv_init_with_whitelist(); -- 1.8.5.rc3
[Qemu-devel] [PATCH V16 00/11] Add support for binding guest numa nodes to host numa nodes
As you know, QEMU can't direct it's memory allocation now, this may cause guest cross node access performance regression. And, the worse thing is that if PCI-passthrough is used, direct-attached-device uses DMA transfer between device and qemu process. All pages of the guest will be pinned by get_user_pages(). KVM_ASSIGN_PCI_DEVICE ioctl kvm_vm_ioctl_assign_device() =>kvm_assign_device() => kvm_iommu_map_memslots() => kvm_iommu_map_pages() => kvm_pin_pages() So, with direct-attached-device, all guest page's page count will be +1 and any page migration will not work. AutoNUMA won't too. So, we should set the guest nodes memory allocation policy before the pages are really mapped. According to this patch set, we are able to set guest nodes memory policy like following: -numa node,nodeid=0,cpus=0, \ -numa mem,size=1024M,policy=membind,host-nodes=0-1 \ -numa node,nodeid=1,cpus=1 \ -numa mem,size=1024M,policy=interleave,host-nodes=1 This supports "policy={default|membind|interleave|preferred},relative=true,host-nodes=N-N" like format. And add a QMP command "query-numa" to show numa info through this API. And convert the "info numa" monitor command to use this QMP command "query-numa". This version removes "set-mem-policy" qmp and hmp commands temporarily as Marcelo and Paolo suggested. The simple test is like following: = Before: # numactl -H && /qemu/x86_64-softmmu/qemu-system-x86_64 -m 4096 -smp 2 -numa node,nodeid=0,cpus=0,mem=2048 -numa node,nodeid=1,cpus=1,mem=2048 -hda 6u4ga2.qcow2 -enable-kvm -device pci-assign,host=07:00.1,id=hostdev0,bus=pci.0,addr=0x7 & sleep 40 && numactl -H [1] 13320 available: 2 nodes (0-1) node 0 cpus: 0 2 node 0 size: 5111 MB node 0 free: 4653 MB node 1 cpus: 1 3 node 1 size: 5120 MB node 1 free: 4764 MB node distances: node 0 1 0: 10 20 1: 20 10 available: 2 nodes (0-1) node 0 cpus: 0 2 node 0 size: 5111 MB node 0 free: 4317 MB node 1 cpus: 1 3 node 1 size: 5120 MB node 1 free: 876 MB node distances: node 0 1 0: 10 20 1: 20 10 After: # numactl -H && /qemu/x86_64-softmmu/qemu-system-x86_64 -m 4096 -smp 4 -numa node,nodeid=0,cpus=0,cpus=2 -numa mem,size=2048M,policy=membind,host-nodes=0 -numa node,nodeid=0,cpus=1,cpus=3 -numa mem,size=2048M,policy=membind,host-nodes=1 -hda 6u4ga2.qcow2 -enable-kvm -device pci-assign,host=07:00.1,id=hostdev0,bus=pci.0,addr=0x7 & sleep 40 && numactl -H [1] 10862 available: 2 nodes (0-1) node 0 cpus: 0 2 node 0 size: 5111 MB node 0 free: 4718 MB node 1 cpus: 1 3 node 1 size: 5120 MB node 1 free: 4799 MB node distances: node 0 1 0: 10 20 1: 20 10 available: 2 nodes (0-1) node 0 cpus: 0 2 node 0 size: 5111 MB node 0 free: 2544 MB node 1 cpus: 1 3 node 1 size: 5120 MB node 1 free: 2725 MB node distances: node 0 1 0: 10 20 1: 20 10 === V1->V2: change to use QemuOpts in numa options (Paolo) handle Error in mpol parser (Paolo) change qmp command format to mem-policy=membind,mem-hostnode=0-1 like (Paolo) V2->V3: also handle Error in cpus parser (5/10) split out common parser from cpus and hostnode parser (Bandan 6/10) V3-V4: rebase to request for comments V4->V5: use OptVisitor and split -numa option (Paolo) - s/set-mpol/set-mem-policy (Andreas) - s/mem-policy/policy - s/mem-hostnode/host-nodes fix hmp command process after error (Luiz) add qmp command query-numa and convert info numa to it (Luiz) V5->V6: remove tabs in json file (Laszlo, Paolo) add back "-numa node,mem=xxx" as legacy (Paolo) change cpus and host-nodes to array (Laszlo, Eric) change "nodeid" to "uint16" add NumaMemPolicy enum type (Eric) rebased on Laszlo's "OptsVisitor: support / flatten integer ranges for repeating options" patch set, thanks for Laszlo's help V6-V7: change UInt16 to uint16 (Laszlo) fix a typo in adding qmp command set-mem-policy V7-V8: rebase to current master with Laszlo's V2 of OptsVisitor patch set fix an adding white space line error V8->V9: rebase to current master check if total numa memory size is equal to ram_size (Paolo) add comments to the OptsVisitor stuff in qapi-schema.json (Eric, Laszlo) replace the use of numa_num_configured_nodes() (Andrew) avoid abusing the fact i==nodeid (Andrew) V9->V10: rebase to current master remove libnuma (Andrew) MAX_NODES=64 -> MAX_NODES=128 since libnuma selected 128 (Andrew) use MAX_NODES instead of MAX_CPUMASK_BITS for host_mem bitmap (Andrew) remove a useless clear_bit() operation (Andrew) V10->V11: rebase to current master fix "maxnode" argument of mbind(2) V11->V12: rebase to current master split patch 02/11 of V11 (Eduardo) add some max value check (Eduardo) split MAX_NODES change patch (Eduardo) V12->V13: rebase to current master thanks for L
[Qemu-devel] [PATCH V16 05/11] NUMA: introduce NumaMemOptions
Signed-off-by: Wanlong Gao --- qapi-schema.json | 19 ++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/qapi-schema.json b/qapi-schema.json index db539b6..1043e57 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -4223,7 +4223,8 @@ ## { 'union': 'NumaOptions', 'data': { -'node': 'NumaNodeOptions' }} +'node': 'NumaNodeOptions', +'mem' : 'NumaMemOptions' }} ## # @NumaNodeOptions @@ -4243,3 +4244,19 @@ '*nodeid': 'uint16', '*cpus': ['uint16'], '*mem':'str' }} + +## +# @NumaMemOptions +# +# Set memory information of guest NUMA node. (for OptsVisitor) +# +# @nodeid: #optional NUMA node ID +# +# @size: #optional memory size of this node +# +# Since 1.7 +## +{ 'type': 'NumaMemOptions', + 'data': { + '*nodeid': 'uint16', + '*size': 'size' }} -- 1.8.5.rc3
[Qemu-devel] [PATCH V16 08/11] NUMA: parse guest numa nodes memory policy
The memory policy setting format is like: policy={default|membind|interleave|preferred}[,relative=true],host-nodes=N-N And we are adding this setting as a suboption of "-numa mem,", the memory policy then can be set like following: -numa node,nodeid=0,cpus=0 \ -numa node,nodeid=1,cpus=1 \ -numa mem,nodeid=0,size=1G,policy=membind,host-nodes=0-1 \ -numa mem,nodeid=1,size=1G,policy=interleave,relative=true,host-nodes=1 Signed-off-by: Wanlong Gao --- include/sysemu/sysemu.h | 3 +++ numa.c | 18 ++ qapi-schema.json| 33 +++-- vl.c| 3 +++ 4 files changed, 55 insertions(+), 2 deletions(-) diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index 807619e..82f1447 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -139,6 +139,9 @@ extern int nb_numa_mem_nodes; typedef struct node_info { uint64_t node_mem; DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS); +DECLARE_BITMAP(host_mem, MAX_NODES); +NumaNodePolicy policy; +bool relative; } NodeInfo; extern NodeInfo numa_info[MAX_NODES]; void set_numa_nodes(void); diff --git a/numa.c b/numa.c index c676c5e..da4dbbd 100644 --- a/numa.c +++ b/numa.c @@ -78,6 +78,7 @@ static int numa_mem_parse(NumaMemOptions *opts) { uint16_t nodenr; uint64_t mem_size; +uint16List *nodes; if (opts->has_nodeid) { nodenr = opts->nodeid; @@ -96,6 +97,23 @@ static int numa_mem_parse(NumaMemOptions *opts) numa_info[nodenr].node_mem = mem_size; } +if (opts->has_policy) { +numa_info[nodenr].policy = opts->policy; +} + +if (opts->has_relative) { +numa_info[nodenr].relative = opts->relative; +} + +for (nodes = opts->host_nodes; nodes; nodes = nodes->next) { +if (nodes->value > MAX_NODES) { +fprintf(stderr, "qemu: node number %" PRIu16 " is bigger than %d\n", +nodes->value, MAX_NODES); +continue; +} +bitmap_set(numa_info[nodenr].host_mem, nodes->value, 1); +} + return 0; } diff --git a/qapi-schema.json b/qapi-schema.json index 1043e57..c0dad81 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -4246,6 +4246,26 @@ '*mem':'str' }} ## +# @NumaNodePolicy +# +# NUMA node policy types +# +# @default: restore default policy, remove any nondefault policy +# +# @preferred: set the preferred node for allocation +# +# @membind: a strict policy that restricts memory allocation to the +# nodes specified +# +# @interleave: the page allocations is interleaved across the set +# of nodes specified +# +# Since 1.7 +## +{ 'enum': 'NumaNodePolicy', + 'data': [ 'default', 'preferred', 'membind', 'interleave' ] } + +## # @NumaMemOptions # # Set memory information of guest NUMA node. (for OptsVisitor) @@ -4254,9 +4274,18 @@ # # @size: #optional memory size of this node # +# @policy: #optional memory policy of this node +# +# @relative: #optional if the nodes specified are relative +# +# @host-nodes: #optional host nodes for its memory policy +# # Since 1.7 ## { 'type': 'NumaMemOptions', 'data': { - '*nodeid': 'uint16', - '*size': 'size' }} + '*nodeid': 'uint16', + '*size': 'size', + '*policy': 'NumaNodePolicy', + '*relative': 'bool', + '*host-nodes': ['uint16'] }} diff --git a/vl.c b/vl.c index 064b821..95d03f5 100644 --- a/vl.c +++ b/vl.c @@ -2815,6 +2815,9 @@ int main(int argc, char **argv, char **envp) for (i = 0; i < MAX_NODES; i++) { numa_info[i].node_mem = 0; bitmap_zero(numa_info[i].node_cpu, MAX_CPUMASK_BITS); +bitmap_zero(numa_info[i].host_mem, MAX_NODES); +numa_info[i].policy = NUMA_NODE_POLICY_DEFAULT; +numa_info[i].relative = false; } nb_numa_nodes = 0; -- 1.8.5.rc3
[Qemu-devel] [PATCH] net: Update netdev peer on link change
When a link change occurs on a backend (like tap), we currently do not propage such change to the nic. As a result, when someone turns off a link on a tap device, for instance, then a guest doesn't see that change and continues to try to send traffic or run DHCP even though the lower-layer is disconnected. This is OK when the network is set up as a HUB since the the guest may be connected to other HUB ports too, but when it's set up as a netdev, it makes thinkgs worse. The patch addresses this by setting the peers link down only when the peer is not a HUBPORT device. With this patch, in the following config -netdev tap,id=net0 -device e1000,mac=X,netdev=net0 when net0 link is turned off, the guest e1000 shows lower-layer link down. This allows guests to boot much faster in such configurations. With windows guest, it also allows the network to recover properly since windows will not configure the link-local IPv4 address, and when the link is turned on, the proper address address is configured. Signed-off-by: Vlad Yasevich --- net/net.c | 26 +- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/net/net.c b/net/net.c index 0a88e68..8a084bf 100644 --- a/net/net.c +++ b/net/net.c @@ -1065,15 +1065,23 @@ void qmp_set_link(const char *name, bool up, Error **errp) nc->info->link_status_changed(nc); } -/* Notify peer. Don't update peer link status: this makes it possible to - * disconnect from host network without notifying the guest. - * FIXME: is disconnected link status change operation useful? - * - * Current behaviour is compatible with qemu vlans where there could be - * multiple clients that can still communicate with each other in - * disconnected mode. For now maintain this compatibility. */ -if (nc->peer && nc->peer->info->link_status_changed) { -nc->peer->info->link_status_changed(nc->peer); +if (nc->peer) { +/* Change peer link only if the peer is NIC and then notify peer. + * If the peer is a HUBPORT or a backend, we do not change the + * link status. + * + * This behavior is compatible with qemu vlans where there could be + * multiple clients that can still communicate with each other in + * disconnected mode. For now maintain this compatibility. + */ +if (nc->peer->info->type == NET_CLIENT_OPTIONS_KIND_NIC) { +for (i = 0; i < queues; i++) { +ncs[i]->peer->link_down = !up; +} +} +if (nc->peer->info->link_status_changed) { +nc->peer->info->link_status_changed(nc->peer); +} } } -- 1.8.4.2
Re: [Qemu-devel] [PATCH RFC 0/3] add direct support of event in qapi schema
于 2013/11/13 9:44, Wenchao Xia 写道: > This series add support for tag/keyword 'event' in qapi-schema. > The implemention doesn't generate a struture and visit function > in the background for every event, so it doesn't support nested > structure in the define to avoid trouble. > > It is on top of series: > http://lists.nongnu.org/archive/html/qemu-devel/2013-11/msg01574.html > > Wenchao Xia (3): >1 os-posix: include sys/time.h >2 qapi script: add support of event >3 tests: add test cases for qapi event support > > Makefile|9 +- > Makefile.objs |2 +- > include/sysemu/os-posix.h |2 + > qapi/Makefile.objs |1 + > scripts/qapi-event.py | 355 > +++ > tests/Makefile | 14 +- > tests/qapi-schema/qapi-schema-test.json | 12 + > tests/qapi-schema/qapi-schema-test.out | 10 +- > tests/test-qmp-event.c | 250 ++ > 9 files changed, 646 insertions(+), 9 deletions(-) > create mode 100644 scripts/qapi-event.py > create mode 100644 tests/test-qmp-event.c > Luiz, do you think this series is in the right direction?
Re: [Qemu-devel] [PATCH V5 2/5] qemu-nbd: support internal snapshot export
于 2013/11/19 19:26, Kevin Wolf 写道: Am 10.11.2013 um 23:03 hat Wenchao Xia geschrieben: Now it is possible to directly export an internal snapshot, which can be used to probe the snapshot's contents without qemu-img convert. Signed-off-by: Wenchao Xia --- block/snapshot.c | 18 + include/block/snapshot.h |8 +++ qemu-nbd.c | 47 - qemu-nbd.texi|8 ++- 4 files changed, 78 insertions(+), 3 deletions(-) diff --git a/block/snapshot.c b/block/snapshot.c index e51a7db..7cc45fa 100644 --- a/block/snapshot.c +++ b/block/snapshot.c @@ -25,6 +25,24 @@ #include "block/snapshot.h" #include "block/block_int.h" +QemuOptsList internal_snapshot_opts = { +.name = "snapshot", +.head = QTAILQ_HEAD_INITIALIZER(internal_snapshot_opts.head), +.desc = { +{ +.name = SNAPSHOT_OPT_ID, +.type = QEMU_OPT_STRING, +.help = "snapshot id" +},{ +.name = SNAPSHOT_OPT_NAME, +.type = QEMU_OPT_STRING, +.help = "snapshot name" +},{ +/* end of list */ +} +}, +}; + int bdrv_snapshot_find(BlockDriverState *bs, QEMUSnapshotInfo *sn_info, const char *name) { diff --git a/include/block/snapshot.h b/include/block/snapshot.h index d05bea7..770d9bb 100644 --- a/include/block/snapshot.h +++ b/include/block/snapshot.h @@ -27,6 +27,14 @@ #include "qemu-common.h" #include "qapi/error.h" +#include "qemu/option.h" + + +#define SNAPSHOT_OPT_BASE "snapshot." +#define SNAPSHOT_OPT_ID "snapshot.id" +#define SNAPSHOT_OPT_NAME "snapshot.name" + +extern QemuOptsList internal_snapshot_opts; typedef struct QEMUSnapshotInfo { char id_str[128]; /* unique snapshot id */ diff --git a/qemu-nbd.c b/qemu-nbd.c index c26c98e..f934eaa 100644 --- a/qemu-nbd.c +++ b/qemu-nbd.c @@ -20,6 +20,7 @@ #include "block/block.h" #include "block/nbd.h" #include "qemu/main-loop.h" +#include "block/snapshot.h" #include #include @@ -79,7 +80,14 @@ static void usage(const char *name) "\n" "Block device options:\n" " -r, --read-only export read-only\n" -" -s, --snapshot use snapshot file\n" +" -s, --snapshot use FILE as an external snapshot, create a temporary\n" +" file with backing_file=FILE, redirect the write to\n" +" the temporary one\n" +" -l, --load-snapshot=SNAPSHOT_PARAM\n" +" load an internal snapshot inside FILE and export it\n" +" as an read-only device, SNAPSHOT_PARAM format is\n" +" 'snapshot.id=[ID],snapshot.name=[NAME]', or\n" +" '[ID_OR_NAME]'\n" " -n, --nocachedisable host cache\n" " --cache=MODE set cache mode (none, writeback, ...)\n" #ifdef CONFIG_LINUX_AIO @@ -315,7 +323,9 @@ int main(int argc, char **argv) char *device = NULL; int port = NBD_DEFAULT_PORT; off_t fd_size; -const char *sopt = "hVb:o:p:rsnP:c:dvk:e:f:t"; +QemuOpts *sn_opts = NULL; +const char *sn_id_or_name = NULL; +const char *sopt = "hVb:o:p:rsnP:c:dvk:e:f:tl:"; struct option lopt[] = { { "help", 0, NULL, 'h' }, { "version", 0, NULL, 'V' }, @@ -328,6 +338,7 @@ int main(int argc, char **argv) { "connect", 1, NULL, 'c' }, { "disconnect", 0, NULL, 'd' }, { "snapshot", 0, NULL, 's' }, +{ "load-snapshot", 1, NULL, 'l' }, { "nocache", 0, NULL, 'n' }, { "cache", 1, NULL, QEMU_NBD_OPT_CACHE }, #ifdef CONFIG_LINUX_AIO @@ -428,6 +439,18 @@ int main(int argc, char **argv) errx(EXIT_FAILURE, "Offset must be positive `%s'", optarg); } break; +case 'l': +if (strncmp(optarg, SNAPSHOT_OPT_BASE, strlen(SNAPSHOT_OPT_BASE)) +== 0) { You can avoid this ugly line break by using strstart(): if (strstart(optarg, SNAPSHOT_OPT_BASE, NULL)) { ... Kevin Will use strstart(), thanks for tipping.
Re: [Qemu-devel] [PATCH V5 1/5] snapshot: distinguish id and name in load_tmp
于 2013/11/19 19:20, Kevin Wolf 写道: Am 10.11.2013 um 23:03 hat Wenchao Xia geschrieben: Since later this function will be used so improve it. The only caller of it now is qemu-img, and it is not impacted by introduce function bdrv_snapshot_load_tmp_by_id_or_name() that call bdrv_snapshot_load_tmp() twice to keep old search logic. bdrv_snapshot_load_tmp_by_id_or_name() return int to let caller know the errno, and errno will be used later. Also fix a typo in comments of bdrv_snapshot_delete(). Signed-off-by: Wenchao Xia --- block/qcow2-snapshot.c| 16 ++- block/qcow2.h |5 +++- block/snapshot.c | 60 ++-- include/block/block_int.h |4 ++- include/block/snapshot.h |7 - qemu-img.c|8 - 6 files changed, 90 insertions(+), 10 deletions(-) diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c index 3529c68..aeb5efd 100644 --- a/block/qcow2-snapshot.c +++ b/block/qcow2-snapshot.c @@ -675,7 +675,10 @@ int qcow2_snapshot_list(BlockDriverState *bs, QEMUSnapshotInfo **psn_tab) return s->nb_snapshots; } -int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name) +int qcow2_snapshot_load_tmp(BlockDriverState *bs, +const char *snapshot_id, +const char *name, +Error **errp) { int i, snapshot_index; BDRVQcowState *s = bs->opaque; @@ -683,12 +686,17 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name) uint64_t *new_l1_table; int new_l1_bytes; int ret; +const char *device = bdrv_get_device_name(bs); This is wrong, low-level functions shouldn't need the device name for anything. assert(bs->read_only); /* Search the snapshot */ -snapshot_index = find_snapshot_by_id_or_name(bs, snapshot_name); +snapshot_index = find_snapshot_by_id_and_name(bs, snapshot_id, name); if (snapshot_index < 0) { +error_setg(errp, + "Can't find a snapshot with ID '%s' and name '%s' " + "on device '%s'", + STR_OR_NULL(snapshot_id), STR_OR_NULL(name), device); return -ENOENT; } sn = &s->snapshots[snapshot_index]; I think we already discussed the same thing in the context of a different series: The caller knows which device and which snapshot name or ID he used. The only information he really needs is: error_setg(errp, "Can't find snapshot"); If in the context of the caller's caller this isn't enough, the caller can add this information. I doubt that it's even necessary in this case. I remember that, will follow this rule. @@ -699,6 +707,10 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name) ret = bdrv_pread(bs->file, sn->l1_table_offset, new_l1_table, new_l1_bytes); if (ret < 0) { +error_setg(errp, + "Failed to read l1 table for snapshot with ID '%s' and name " + "'%s' on device '%s'", + sn->id_str, sn->name, device); g_free(new_l1_table); return ret; } diff --git a/block/qcow2.h b/block/qcow2.h index 922e190..303eb26 100644 --- a/block/qcow2.h +++ b/block/qcow2.h @@ -488,7 +488,10 @@ int qcow2_snapshot_delete(BlockDriverState *bs, const char *name, Error **errp); int qcow2_snapshot_list(BlockDriverState *bs, QEMUSnapshotInfo **psn_tab); -int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name); +int qcow2_snapshot_load_tmp(BlockDriverState *bs, +const char *snapshot_id, +const char *name, +Error **errp); void qcow2_free_snapshots(BlockDriverState *bs); int qcow2_read_snapshots(BlockDriverState *bs); diff --git a/block/snapshot.c b/block/snapshot.c index a05c0c0..e51a7db 100644 --- a/block/snapshot.c +++ b/block/snapshot.c @@ -194,7 +194,7 @@ int bdrv_snapshot_goto(BlockDriverState *bs, * If only @snapshot_id is specified, delete the first one with id * @snapshot_id. * If only @name is specified, delete the first one with name @name. - * if none is specified, return -ENINVAL. + * if none is specified, return -EINVAL. * * Returns: 0 on success, -errno on failure. If @bs is not inserted, return * -ENOMEDIUM. If @snapshot_id and @name are both NULL, return -EINVAL. If @bs @@ -265,18 +265,72 @@ int bdrv_snapshot_list(BlockDriverState *bs, return -ENOTSUP; } +/** + * Temporarily load an internal snapshot by @snapshot_id and @name. + * @bs: block device used in the operation + * @snapshot_id: unique snapshot ID, or NULL + * @name: snapshot name, or NULL + * @errp: location to store error + * + * If both @snapshot_id and @name are specified, load the first one with + * id @snapshot_id and name @name. + *
Re: [Qemu-devel] Is there any new progress or plans about "ram live snapshot feature"?
于 2013/11/21 19:02, Zhanghailiang 写道: Hi, Now qemu ram live snapshot feature has some problems, it is based on ‘ram live migration’. The time of snapshot depends on completion time of migration, which is not measurable. Also It may can’t achieve migrate in some situation. I have seen discussion about that in your previous emails, and there seems to be a feasible scheme. I want to know if there is any new progress or somebody is trying to realize the program. Best Regards Hailiang Zhang Hi, Pavel Are you going to implement this feat? If no I'd like to add it in my TODO list.
Re: [Qemu-devel] [PATCH 16/27] acpi: ich9: allow guest to clear SCI rised by GPE
Igor Mammedov wrote: On Thu, 21 Nov 2013 16:32:27 +0800 Li Guang wrote: Michael S. Tsirkin wrote: On Thu, Nov 21, 2013 at 04:18:45PM +0800, Li Guang wrote: Hu Tao wrote: On Thu, Nov 21, 2013 at 09:14:18AM +0200, Michael S. Tsirkin wrote: On Thu, Nov 21, 2013 at 03:38:37AM +0100, Igor Mammedov wrote: it fixes IRQ storm since guest isn't able to lower SCI IRQ after it has been handled when it clears GPE event. Signed-off-by: Igor Mammedov The storm is only on memory hotplug right? IIRC, it happens on cpu hotplug, too. :-), that made remember EC implementation, with EC, SCI will be safer, I think. Hmm you are saying let's use EC for memory hotplug? It can be a bridge between guest and QEMU, with it, we may don't have to bother ASL writing and south-bridge hardware related work(or very little) if we implement EC correctly. Wouldn't it require guest driver though? Beauty of ASL/GPE it's that it supported by Windows and Linux out of box. it did require guest driver, but as a ACPI standard device, the driver is natively implemented by ACPI compatible OS.
Re: [Qemu-devel] [PATCH 16/27] acpi: ich9: allow guest to clear SCI rised by GPE
Michael S. Tsirkin wrote: On Thu, Nov 21, 2013 at 04:32:27PM +0800, Li Guang wrote: Michael S. Tsirkin wrote: On Thu, Nov 21, 2013 at 04:18:45PM +0800, Li Guang wrote: Hu Tao wrote: On Thu, Nov 21, 2013 at 09:14:18AM +0200, Michael S. Tsirkin wrote: On Thu, Nov 21, 2013 at 03:38:37AM +0100, Igor Mammedov wrote: it fixes IRQ storm since guest isn't able to lower SCI IRQ after it has been handled when it clears GPE event. Signed-off-by: Igor Mammedov The storm is only on memory hotplug right? IIRC, it happens on cpu hotplug, too. :-), that made remember EC implementation, with EC, SCI will be safer, I think. Hmm you are saying let's use EC for memory hotplug? It can be a bridge between guest and QEMU, with it, we may don't have to bother ASL writing and south-bridge hardware related work(or very little) if we implement EC correctly. I'd like to see that. Can you write a document (just text) for an imaginary EC support for memory hotplug? Hmm..., with EC, For memory hotplug, at least, ASL at [PATCH 27/27] can be replaced by a simple Method(_Qx) under EC device, IO base operations at [PATCH 15/27] are dispensable, we can relay data by standard operations of EC space and also for SCI, all device changes want to notify guest OS can share same SCI with EC, and the operations are specified at ACPI SPEC. likewise, for CPU hotplug, pvpanic, and even debugcon. and, for odd devices, like pvpanic, guest OS may complain about it, and we may also have to bother on maintaining state of it at QEMU, and writing a driver for guest OS, with EC, functions of device like pvpanic may be implemented silently, and EC is ACPI standard device, each ACPI compatible OS will have a driver for it natively.
[Qemu-devel] [PULL 06/11] target-i386: Fix build by providing stub kvm_arch_get_supported_cpuid()
From: Peter Maydell Fix build failures with clang when KVM is not enabled by providing a stub version of kvm_arch_get_supported_cpuid(). We retain the compile time check that this function isn't called when CONFIG_KVM is not set by guarding the stub with ifndef __OPTIMIZE__ (we assume that an optimizing build will do sufficient constant folding and dead code elimination to remove the calls before linking). Signed-off-by: Peter Maydell Signed-off-by: Paolo Bonzini --- target-i386/kvm-stub.c | 12 1 file changed, 12 insertions(+) diff --git a/target-i386/kvm-stub.c b/target-i386/kvm-stub.c index 11429c4..2b9e801 100644 --- a/target-i386/kvm-stub.c +++ b/target-i386/kvm-stub.c @@ -16,3 +16,15 @@ bool kvm_allows_irq0_override(void) { return 1; } + +#ifndef __OPTIMIZE__ +/* This function is only called inside conditionals which we + * rely on the compiler to optimize out when CONFIG_KVM is not + * defined. + */ +uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function, + uint32_t index, int reg) +{ +abort(); +} +#endif -- 1.8.3.1
[Qemu-devel] [PATCH] vfio-pci: Release all MSI-X vectors when disabled
We were relying on msix_unset_vector_notifiers() to release all the vectors when we disable MSI-X, but this only happens when MSI-X is still enabled on the device. Perform further cleanup by releasing any remaining vectors listed as in-use after this call. This caused a leak of IRQ routes on hotplug depending on how the guest OS prepared the device for removal. Signed-off-by: Alex Williamson Cc: qemu-sta...@nongnu.org --- hw/misc/vfio.c | 12 1 file changed, 12 insertions(+) diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c index f7f8a19..355b018 100644 --- a/hw/misc/vfio.c +++ b/hw/misc/vfio.c @@ -878,8 +878,20 @@ static void vfio_disable_msi_common(VFIODevice *vdev) static void vfio_disable_msix(VFIODevice *vdev) { +int i; + msix_unset_vector_notifiers(&vdev->pdev); +/* + * MSI-X will only release vectors if MSI-X is still enabled on the + * device, check through the rest and release it ourselves if necessary. + */ +for (i = 0; i < vdev->nr_vectors; i++) { +if (vdev->msi_vectors[i].use) { +vfio_msix_vector_release(&vdev->pdev, i); +} +} + if (vdev->nr_vectors) { vfio_disable_irqindex(vdev, VFIO_PCI_MSIX_IRQ_INDEX); }
Re: [Qemu-devel] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive
On 11/21/13 23:26, Eric Blake wrote: > On 11/21/2013 03:21 PM, Laszlo Ersek wrote: >> This patch allows the user to usefully specify >> >> -drive file=img_1,if=pflash,format=raw,readonly \ >> -drive file=img_2,if=pflash,format=raw >> >> on the command line. The flash images will be mapped under 4G in their >> reverse unit order -- that is, with their base addresses progressing >> downwards, in increasing unit order. >> > >> +/* This function maps flash drives from 4G downward, in order of their unit >> + * numbers. Addressing within one flash drive is of course not reversed. >> + * >> + * The drive with unit number 0 is mapped at the highest address, and it is >> + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is not > > s/severral/several/ > I may have meant "very seveal" :) (Will fix if patch is deemed worthy otherwise.) Thanks! Laszlo
Re: [Qemu-devel] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive
On 11/21/2013 03:21 PM, Laszlo Ersek wrote: > This patch allows the user to usefully specify > > -drive file=img_1,if=pflash,format=raw,readonly \ > -drive file=img_2,if=pflash,format=raw > > on the command line. The flash images will be mapped under 4G in their > reverse unit order -- that is, with their base addresses progressing > downwards, in increasing unit order. > > +/* This function maps flash drives from 4G downward, in order of their unit > + * numbers. Addressing within one flash drive is of course not reversed. > + * > + * The drive with unit number 0 is mapped at the highest address, and it is > + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is not s/severral/several/ -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH for-1.7] Revert "e1000/rtl8139: update HMP NIC when every bit is written"
On Mon, Nov 18, 2013 at 04:32:34PM -0700, Alex Williamson wrote: > On Mon, 2013-11-18 at 17:55 -0500, Vlad Yasevich wrote: > > On 11/18/2013 05:40 PM, Alex Williamson wrote: > > > On Mon, 2013-11-18 at 17:07 -0500, Vlad Yasevich wrote: > > >> On 11/18/2013 04:33 PM, Alex Williamson wrote: > > >>> On Mon, 2013-11-18 at 15:57 -0500, Vlad Yasevich wrote: > > On 11/18/2013 03:33 PM, Alex Williamson wrote: > > > On Mon, 2013-11-18 at 15:09 -0500, Vlad Yasevich wrote: > > >> On 11/18/2013 02:58 PM, Alex Williamson wrote: > > >>> On Mon, 2013-11-18 at 21:47 +0200, Michael S. Tsirkin wrote: > > This reverts commit cd5be5829c1ce87aa6b3a7806524fac07ac9a757. > > Digging into hardware specs shows this does not > > actually make QEMU behave more like hardware. > > Let's stick to the tried heuristic for 1.7 and > > possibly revisit for 1.8. > > >>> > > >>> If this is broken, then so are these: > > >>> > > >>> 23c37c37f0280761072c23bf67d3a4f3c0ff25aa > > >>> 7c36507c2b8776266f50c5e2739bd18279953b93 > > >> > > >> These aren't really broken. They just assume that the high order > > >> writes will happen after the low order writes. > > >> > > >> In the case of e1000, this is a little more then an assumption > > >> (particularly in the case of nic initilization). > > > > > > But AIUI there's also a valid bit in that high order byte on e1000, so > > > reverting cd5be582 means we stuff a new mac into qemu less often, but > > > it's still only accurate some of the time. > > > > Yes, there is a slight issue with validity of mac at the time of > > processing packets. I have an outstanding question on the Intel > > list about this behavior with real HW. But, with e1000, the validity > > bit provides a much higher guarantee that a guest that will be > > setting the mac address will write the high register second to > > guarantee that when the valid bit is written, the mac is fully > > valid. As a result we don't really need the e1000 part of the > > cd5be5829. > > >>> > > >>> But doesn't that valid bit mean that a mac update will start and end > > >>> with a write to the high order register? So we're assuming: > > >>> > > >>> a) write RA + 1 (invalidate) > > >>> b) write RA (write low) > > >>> c) write RA + 1 (write high + valid) > > >>> > > >> > > >> No. On update, only steps b and c typically happen. Thus my question > > >> to the on the intel list. > > > > > > So perhaps the bit is some kind of data latch bit and the mac address > > > fields within those registers are effectively scratch until that bit is > > > written? > > > > > >>> Without cd5be5829 the only change is that we don't store a new mac into > > >>> the monitor at b). The mac stored in the monitor is still wrong from a) > > >>> until c). So it's ever so slightly less broken without cd5be5829. > > >> > > >> Since there is really no a), the mac in the monitor is only different > > >> after step b). since it's is incomplete and we expect step c), there > > >> is really no point in updating it. > > > > > > Great, so I have no argument against reverting, or just fixing, that > > > chunk of cd5be5829. Let's implement the latch bit too. > > > > > > > > >> In the case of RTL nic, it is just an assumption, but it hasn't > > >> been shown faulty yet. We do plan to address this a bit more > > >> thoroughly. > > > > > > So how is RTL less broken without cd5be582? AIUI the valid bit is off > > > in a separate register on RTL, so we have no guarantee about order of > > > updating the mac. Without cd5be582 the info in the monitor may be > > > permanently broken if the guest uses a write order other than what we > > > assume. > > > > > > > This one is actually not as bad either. RTL spec requires that > > receive register writes happen as 32 bit word writes. This is > > what linux and bsd drivers do, so from driver perspective, the > > issue is the same. What our emulation layer does is turn these > > 32 bit writes into 4 8-bit writes. This is likely due to some > > very broken and very old drivers, but I am not sure. > > > > So, the information in the monitor will be broken if the guest > > does: write_hi(); write_lo(); A part of me would really like > > to see a guest that does this :) > > >>> > > >>> So the argument for reverting here is that it seems unlikely that the > > >>> dwords get written in the hi->lo order and we'd rather the monitor get > > >>> stuck with the wrong mac forever > > >> > > >> For how many/which guests? I know it's not linux or BSD. I need to > > >> boot windows to see what it does, but I think it does the right thing. > > > > > > How many guests do you plan to test? > > > > I think the proposal was to see if anyone reports an issue :) > > > > > > > >>> than it show
[Qemu-devel] [edk2 PATCH] OvmfPkg: split the variable store to a separate file
Split the variable store off to a separate file when SPLIT_VARSTORE is defined. Even in this case, the preexistent PCDs' values don't change. Qemu must take care of contiguously mapping NVVARSTORE.fd + OVMF.fd so that when concatenated they end exactly at 4GB. Contributed-under: TianoCore Contribution Agreement 1.0 Signed-off-by: Laszlo Ersek --- Generated with 8 lines of context for easier in-patch verification with the help of the clipboard. OvmfPkg/OvmfPkgIa32.fdf| 48 ++ OvmfPkg/OvmfPkgIa32X64.fdf | 48 ++ OvmfPkg/OvmfPkgX64.fdf | 48 ++ OvmfPkg/README | 16 4 files changed, 160 insertions(+) diff --git a/OvmfPkg/OvmfPkgIa32.fdf b/OvmfPkg/OvmfPkgIa32.fdf index c50709c..310d6a9 100644 --- a/OvmfPkg/OvmfPkgIa32.fdf +++ b/OvmfPkg/OvmfPkgIa32.fdf @@ -23,31 +23,51 @@ # [Defines] !if $(TARGET) == RELEASE !ifndef $(FD_SIZE_2MB) DEFINE FD_SIZE_1MB= !endif !endif +!ifdef $(SPLIT_VARSTORE) +!ifdef $(FD_SIZE_1MB) +[FD.NvVarStore] +BaseAddress = 0xFFF0|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFdBaseAddress +SET gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFirmwareFdSize = 0x0010 +Size = 0x2 +ErasePolarity = 1 +BlockSize = 0x1000|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFirmwareBlockSize +NumBlocks = 0x20 +!else +[FD.NvVarStore] +BaseAddress = 0xFFE0|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFdBaseAddress +SET gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFirmwareFdSize = 0x0020 +Size = 0x2 +ErasePolarity = 1 +BlockSize = 0x1000|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFirmwareBlockSize +NumBlocks = 0x20 +!endif +!else !ifdef $(FD_SIZE_1MB) [FD.OVMF] BaseAddress = 0xFFF0|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFdBaseAddress Size = 0x0010|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFirmwareFdSize ErasePolarity = 1 BlockSize = 0x1000|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFirmwareBlockSize NumBlocks = 0x100 !else [FD.OVMF] BaseAddress = 0xFFE0|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFdBaseAddress Size = 0x0020|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFirmwareFdSize ErasePolarity = 1 BlockSize = 0x1000|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFirmwareBlockSize NumBlocks = 0x200 !endif +!endif 0x|0xe000 gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFlashNvStorageVariableBase|gEfiMdeModulePkgTokenSpaceGuid.PcdFlashNvStorageVariableSize #NV_VARIABLE_STORE DATA = { ## This is the EFI_FIRMWARE_VOLUME_HEADER # ZeroVector [] 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, @@ -106,30 +126,58 @@ DATA = { # WriteQueueSize: UINT64 0xE0, 0x0F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 } 0x0001|0x0001 #NV_FTW_SPARE gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFlashNvStorageFtwSpareBase|gEfiMdeModulePkgTokenSpaceGuid.PcdFlashNvStorageFtwSpareSize +!ifdef $(SPLIT_VARSTORE) +!ifdef $(FD_SIZE_1MB) +[FD.OVMF] +BaseAddress = 0xFFF2 +Size = 0x000E +ErasePolarity = 1 +BlockSize = 0x1000 +NumBlocks = 0xE0 + +0x|0x000CC000 +FV = FVMAIN_COMPACT +0x000CC000|0x14000 +FV = SECFV +!else +[FD.OVMF] +BaseAddress = 0xFFE2 +Size = 0x001E +ErasePolarity = 1 +BlockSize = 0x1000 +NumBlocks = 0x1E0 + +0x|0x001AC000 +FV = FVMAIN_COMPACT +0x001AC000|0x34000 +FV = SECFV +!endif +!else !ifdef $(FD_SIZE_1MB) 0x0002|0x000CC000 FV = FVMAIN_COMPACT 0x000EC000|0x14000 FV = SECFV !else 0x0002|0x001AC000 FV = FVMAIN_COMPACT 0x001CC000|0x34000 FV = SECFV !endif +!endif [FD.MEMFD] BaseAddress = 0x80|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfMemFvBase Size = 0x80|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfMemFvSize ErasePolarity = 1 BlockSize = 0x1 diff --git a/OvmfPkg/OvmfPkgIa32X64.fdf b/OvmfPkg/OvmfPkgIa32X64.fdf index d65f40f..8877a89 100644 --- a/OvmfPkg/OvmfPkgIa32X64.fdf +++ b/OvmfPkg/OvmfPkgIa32X64.fdf @@ -23,31 +23,51 @@ # [Defines] !if $(TARGET) == RELEASE !ifndef $(FD_SIZE_2MB) DEFINE FD_SIZE_1MB= !endif !endif +!ifdef $(SPLIT_VARSTORE) +!ifdef $(FD_SIZE_1MB) +[FD.NvVarStore] +BaseAddress = 0xFFF0|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFdBaseAddress +SET gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFirmwareFdSize = 0x0010 +Size = 0x2 +ErasePolarity = 1 +BlockSize = 0x1000|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFirmwareBlockSize +NumBlocks = 0x20 +!else +[FD.NvVarStore] +BaseAddress = 0xFFE0|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFdBaseAddress +SET gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFirmwareFdSize = 0x0020 +Size = 0x2 +ErasePolarity = 1 +BlockSize = 0x1000|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFirmwareBlockSize +NumBlocks = 0x20 +!endif +!else !ifdef $(FD_SIZE_1MB) [FD.OVMF] BaseAddress = 0xFFF0|gUefiOvmfPkgTokenSpaceGuid.PcdOvmfFdBaseAddress Size = 0x0010|gUefi
[Qemu-devel] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive
This patch allows the user to usefully specify -drive file=img_1,if=pflash,format=raw,readonly \ -drive file=img_2,if=pflash,format=raw on the command line. The flash images will be mapped under 4G in their reverse unit order -- that is, with their base addresses progressing downwards, in increasing unit order. (The unit number increases with command line order if not explicitly specified.) This accommodates the following use case: suppose that OVMF is split in two parts, a writeable host file for non-volatile variable storage, and a read-only part for bootstrap and decompressible executable code. The binary code part would be read-only, centrally managed on the host system, and passed in as unit 0. The variable store would be writeable, VM-specific, and passed in as unit 1. ffe0-ffe1 (prio 0, R-): system.flash1 ffe2- (prio 0, R-): system.flash0 (If the guest tries to write to the flash range that is backed by the read-only drive, bdrv_write() in pflash_update() will correctly deny the write with -EACCES, and pflash_update() won't care, which suits us well.) Signed-off-by: Laszlo Ersek --- hw/i386/pc_sysfw.c | 60 -- 1 file changed, 45 insertions(+), 15 deletions(-) diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c index e917c83..1c3e3d6 100644 --- a/hw/i386/pc_sysfw.c +++ b/hw/i386/pc_sysfw.c @@ -72,35 +72,65 @@ static void pc_isa_bios_init(MemoryRegion *rom_memory, memory_region_set_readonly(isa_bios, true); } +/* This function maps flash drives from 4G downward, in order of their unit + * numbers. Addressing within one flash drive is of course not reversed. + * + * The drive with unit number 0 is mapped at the highest address, and it is + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is not + * supported. + * + * The caller is responsible to pass in the non-NULL @pflash_drv that + * corresponds to the flash drive with unit number 0. + */ static void pc_system_flash_init(MemoryRegion *rom_memory, DriveInfo *pflash_drv) { +int unit = 0; BlockDriverState *bdrv; int64_t size; -hwaddr phys_addr; +hwaddr phys_addr = 0x1ULL; int sector_bits, sector_size; pflash_t *system_flash; MemoryRegion *flash_mem; +char name[64]; -bdrv = pflash_drv->bdrv; -size = bdrv_getlength(pflash_drv->bdrv); sector_bits = 12; sector_size = 1 << sector_bits; -if ((size % sector_size) != 0) { -fprintf(stderr, -"qemu: PC system firmware (pflash) must be a multiple of 0x%x\n", -sector_size); -exit(1); -} +do { +bdrv = pflash_drv->bdrv; +size = bdrv_getlength(bdrv); +if ((size % sector_size) != 0) { +fprintf(stderr, +"qemu: PC system firmware (pflash segment %d) must be a " +"multiple of 0x%x\n", unit, sector_size); +exit(1); +} +if (size > phys_addr) { +fprintf(stderr, "qemu: pflash segments must fit under 4G " +"cumulatively\n"); +exit(1); +} -phys_addr = 0x1ULL - size; -system_flash = pflash_cfi01_register(phys_addr, NULL, "system.flash", size, - bdrv, sector_size, size >> sector_bits, - 1, 0x, 0x, 0x, 0x, 0); -flash_mem = pflash_cfi01_get_memory(system_flash); +phys_addr -= size; -pc_isa_bios_init(rom_memory, flash_mem, size); +/* pflash_cfi01_register() creates a deep copy of the name */ +snprintf(name, sizeof name, "system.flash%d", unit); +system_flash = pflash_cfi01_register(phys_addr, NULL /* qdev */, name, + size, bdrv, sector_size, + size >> sector_bits, + 1 /* width */, + 0x /* id0 */, + 0x /* id1 */, + 0x /* id2 */, + 0x /* id3 */, + 0 /* be */); +if (unit == 0) { +flash_mem = pflash_cfi01_get_memory(system_flash); +pc_isa_bios_init(rom_memory, flash_mem, size); +} +pflash_drv = drive_get(IF_PFLASH, 0, ++unit); +} while (pflash_drv != NULL); } static void old_pc_system_rom_init(MemoryRegion *rom_memory, bool isapc_ram_fw) -- 1.8.3.1
Re: [Qemu-devel] [PATCH 16/27] acpi: ich9: allow guest to clear SCI rised by GPE
On Thu, 21 Nov 2013 16:32:27 +0800 Li Guang wrote: > Michael S. Tsirkin wrote: > > On Thu, Nov 21, 2013 at 04:18:45PM +0800, Li Guang wrote: > > > >> Hu Tao wrote: > >> > >>> On Thu, Nov 21, 2013 at 09:14:18AM +0200, Michael S. Tsirkin wrote: > >>> > On Thu, Nov 21, 2013 at 03:38:37AM +0100, Igor Mammedov wrote: > > > it fixes IRQ storm since guest isn't able to lower SCI IRQ > > after it has been handled when it clears GPE event. > > > > Signed-off-by: Igor Mammedov > > > The storm is only on memory hotplug right? > > >>> IIRC, it happens on cpu hotplug, too. > >>> > >>> > >>> > >>> > >> :-), that made remember EC implementation, > >> with EC, SCI will be safer, I think. > >> > > Hmm you are saying let's use EC for memory hotplug? > > > > > > > It can be a bridge between guest and QEMU, > with it, we may don't have to bother ASL writing > and south-bridge hardware related work(or very > little) if we implement EC correctly. > Wouldn't it require guest driver though? Beauty of ASL/GPE it's that it supported by Windows and Linux out of box.
Re: [Qemu-devel] [PATCH 00/27 RFC v7] ACPI memory hotplug
On Thu, Nov 21, 2013 at 02:39:10PM +0100, Igor Mammedov wrote: > On Thu, 21 Nov 2013 08:20:56 +0200 > "Michael S. Tsirkin" wrote: > > > On Thu, Nov 21, 2013 at 03:38:21AM +0100, Igor Mammedov wrote: > > > --- > > > What's new since v6: > > > > > > * DIMM device is split to backend and frontend. Therefore following > > > command/options were added for supporting it: > > > > > > For memdev backend: > > > CLI: -memdev-add > > > Monitor/QMP: memdev-add > > > with options: 'id' and 'size' > > > For dimm frontend: > > > option "size" became readonly, pulling it's size from attached > > > backend > > > added option "memdev" for specifying backend by 'id' > > > > > > * Added Q35 support > > > * Added support for 32 bit guests > > > * build for i386 emulator (not tested) > > > > OK so a large patchset so did not review yet. One question > > due to the dependency on bios honouring etc/reserved-memory-end: is > > there some way to detect old BIOS and fail memory hotplug? > version negotiation between ASL and hardware could be used to that effect. > > QEMU could start with present but disabled memory hotplug and if > QEMU and BIOS ASL could come in agreement that both support it in > compatible way, it could be enabled. So at the moment there's no negotiation, is there? > > > > > --- > > > > > > This series allows to hotplug 'arbitrary' DIMM devices specifying size, > > > NUMA node mapping (guest side), slot and address where to map it, at > > > runtime. > > > > > > Due to ACPI limitation there is need to specify a number of possible > > > DIMM devices. For this task -m option was extended to support > > > following format: > > > > > > -m [mem=]RamSize[,slots=N,maxmem=M] > > > > > > To allow memory hotplug user must specify a pair of additional parameters: > > > 'slots' - number of possible increments > > > 'maxmem' - max possible total memory size QEMU is allowed to use, > > >including RamSize. > > > > > > minimal monitor command syntax to hotplug DIMM device: > > > > > > memdev-add id=memX,size=1G > > > device_add dimm,id=dimmX,memdev=memX > > > > > > DIMM device provides following properties that could be used with > > > device_add / -device to alter default behavior: > > > > > > id- unique string identifying device [mandatory] > > > slot - number in range [0-slots) [optional], if not specified > > > the first free slot is used > > > node - NUMA node id [optional] (default: 0) > > > size - amount of memory to add, readonly derived from backing memdev > > > start - guest's physical address where to plug DIMM [optional], > > > if not specified the first gap in hotplug memory region > > > that fits DIMM is used > > > > > > -device option could be used for adding potentially hotunplugable DIMMs > > > and also for specifying hotplugged DIMMs in migration case. > > > > > > Tested guests: > > > - RHEL 6x64 > > > - Windows 2012DCx64 > > > - Windows 2008DCx64 > > > - Windows 2008DCx32 > > > > > > Known limitations/bugs/TODOs: > > > - only hot-add supported > > > - max number of supported DIMM devices 255 (due to ACPI object name > > >limit), could be increased creating several containers and putting > > >DIMMs there. (exercise for future) > > > - failed hotplug action consumes 1 slot (device_add doesn't delete > > >device properly if realize failed) > > > - e820 table doesn't include DIMM devices added with -device / > > >(or after reboot devices added with device_add) > > > - Windows 2008 remembers DIMM configuration, so if DIMM with other > > >start/size is added into the same slot, it refuses to use it insisting > > >on old mapping. > > > > > > Series is based on mst's PCI tree and requires following SeaBIOS patch: > > > http://patchwork.ozlabs.org/patch/292614/ > > > on top of patches to load ACPI tables from QEMU > > > working SeaBIOS tree for testing is available at: > > > https://github.com/imammedo/seabios/commits/memhp-wip > > > > > > QEMU git tree for testing is available at: > > > https://github.com/imammedo/qemu/commits/memory-hotplug-v7 > > > > > > Example QEMU cmd line: > > > qemu-system-x86_64 -enable-kvm -monitor unix:/tmp/mon,server,nowait \ > > > -m 4096,slots=4,maxmem=8G -L /custome_seabios guest.img > > > > > > PS: > > > Windows guest requires SRAT table for hotplug to work so add extra > > > option: > > >-numa node > > > to QEMU command line. > > > > > > > > > Igor Mammedov (26): > > > acpi: factor out common pm_update_sci() into acpi core > > > rename pci_hotplug_fn to hotplug_fn and make it available for other > > > devices > > > pc: add 'etc/reserved-memory-end' fw_cfg interface for SeaBIOS > > > vl: convert -m to qemu_opts_parse() > > > qapi: add SIZE type parser to string_input_visitor > > > get reference to /backend container via qemu_get_backend() > > > add memdev backen
Re: [Qemu-devel] [RFC PATCH 2/2] rtl8139: update HMP only when the address is fully written
On 11/21/2013 01:04 PM, Vlad Yasevich wrote: > rtl8139 hardware requires 9346 config register to be set into > write mode before mac address can be changed even though it is > not documented. Every driver inspected so far appears to do > this along with comments that this is an undocumented requirement. > > We can use this to help us identify when the mac address has been > completely written. Simple set a flag whenever mac has changed s/Simple/Simply/ > and at the next transition of 9346 register from Write to Normal > mode, we update the HMP. > > Signed-off-by: Vlad Yasevich > --- Comment-only review (ie. I didn't validate the code, just fixing grammar) > +} else if (opmode == Cfg9346_Normal && s->mac_changed) { > +/* Even though it is not documented, it is required to set > + * opmode to Cfg9346_ConfigWrite when changing the mac address > + * of the card and to set to Cfg9346_Normal when done. We > + * can use this as an idication to kick off the notification event. s/idication/indication/ > -qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys); > +if (s->compat_flags & RTL8139_FLAG_MAC_COMPLETE) { > +s->mac_changed = true; > +} else if (addr == MAC0+5) { Doesn't coding style recommend s/MAC0+5/MAC0 + 5/ -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] os x boot broken by commit 11948748495841bd54721b250d68c7b3cb0475ef
Added qemu-devel, since that is where this stuff belongs now. Everyone else, sorry for the dupe... On Thu, Nov 21, 2013 at 07:14:27PM +0100, Paolo Bonzini wrote: > Can you remind us about your DSDT modifications? It should be possible > to patch the HPET and applesmc bits appropriately from QEMU (or to move > them from the DSDT to an SSDT that is built entirely in QEMU). > > It actually isn't impossible that Mac OS X would boot just fine with 1.8... My current DSDT patch (against QEMU) is enclosed below. The HPET basically needs "IRQNoFlags() {2, 8}", which causes XP to bluescreen. So, I've made it conditional on the SMC STA method returning success (0x0B). The SMC node's STA method returns 0x0B unconditionally on real hardware. So I was planning on figuring out what's easier in the context of the most recent QEMU code base: 1. dynamically generating (during qemu runtime initialization) a DSDT entry for SMC with hardcoded 0x0B STA method, whenever "--device isa-applesmc" is present on the qemu command line or 2. writing a static (compile-time) SMC node but with a slightly smarter _STA method, which returns 0x0B when "--device isa-applesmc" was given on the cmdline, or which returns 0x00 in the absence of "--device isa-applesmc". Either 1. or 2. could be used with HPET -- I can make inclusion of IRQNoFlags dependent on either the success or on the presence of SMC._STA() :) Let me know what you think. Thanks, --Gabriel ### # Modify DSDT entry for HPET: conditionally insert "IRQNoFlags() {2, 8}" into # _CRS method only if an AppleSMC DSDT node is also present and enabled (it # otherwise causes WinXP to BSOD). ### diff --git a/hw/i386/acpi-dsdt-hpet.dsl b/hw/i386/acpi-dsdt-hpet.dsl index dfde174..205cf05 100644 --- a/hw/i386/acpi-dsdt-hpet.dsl +++ b/hw/i386/acpi-dsdt-hpet.dsl @@ -38,14 +38,23 @@ Scope(\_SB) { } Return (0x0F) } -Name(_CRS, ResourceTemplate() { -#if 0 /* This makes WinXP BSOD for not yet figured reasons. */ -IRQNoFlags() {2, 8} -#endif +Name(RESP, ResourceTemplate() { Memory32Fixed(ReadOnly, 0xFED0, // Address Base 0x0400, // Address Length ) }) +Name(RESI, ResourceTemplate() { +IRQNoFlags() {2, 8} +}) +Method(_CRS, 0) { +Store(\_SB.PCI0.ISA.SMC._STA(), Local0) +If (LEqual(Local0, 0x0B)) {// AppleSMC present, add IRQ +ConcatenateResTemplate(RESP, RESI, Local1) +Return (Local1) +} else { +Return (RESP) +} +} } } ### # Add DSDT entry for AppleSMC; # TODO: find a way to make the _STA method return 0x0b only if QEMU command # line contains "-device isa-applesmc", and 0x00 otherwise! ### diff --git a/hw/i386/acpi-dsdt-isa.dsl b/hw/i386/acpi-dsdt-isa.dsl index 89caa16..b7a27bb 100644 --- a/hw/i386/acpi-dsdt-isa.dsl +++ b/hw/i386/acpi-dsdt-isa.dsl @@ -16,6 +16,28 @@ /* Common legacy ISA style devices. */ Scope(\_SB.PCI0.ISA) { +Device (SMC) { +Name(_HID, EisaId("APP0001")) +OperationRegion(SMC, SystemIO, 0x0300, 0x20) +Field(SMC, ByteAcc, NoLock, Preserve) { +Offset(0x04), +CMDP, 8, +} +Method(_STA, 0) { +//Store(0x10, CMDP)// APPLESMC_READ_CMD +//Store(CMDP, Local0) +//If (LEqual(Local0, 0x0c)) { +Return (0x0B) +//} Else { +//Return (0x00) +//} +} +Name (_CRS, ResourceTemplate () { +IO (Decode16, 0x0300, 0x0300, 0x01, 0x20) +IRQNoFlags() { 6 } +}) +} + Device(RTC) { Name(_HID, EisaId("PNP0B00")) Name(_CRS, ResourceTemplate() {
Re: [Qemu-devel] [PATCH for-1.7 0/5] acpi unit-test: added tests
Marcel Apfelbaum wrote: On Thu, 2013-11-21 at 22:20 +0100, Erik Rull wrote: Marcel Apfelbaum wrote: Added 2 tests: 1. Basic check of FACS table (missed on prev submission) 2. Compare DSDT and SSDT tables against expected values Test 2: - runs only if iasl is installed on the host machine. - the test plan: 1. Dumps the ACPI tables as AML on the disk. 2. Runs iasl to disassembly the tables into ASL files. 3. Compares them with expected offline ASL files. - the test runs for both default machine and q35. - in case the test fails, it can be easily tweaked to show the differences between the ASL files and understand the issue. Patches: 1/5 - test 1 2/5 - some infrastructure improvements 3/5 - expected asl files for test 2 4/5 - creates links for the expected files if the build directory is not current 5/5 - test 2 Which iasl Version is needed for the ACPI compilation and testing? I have an IASL installed on my build machine, but when trying to compile the ACPI stuff, it fails. Maybe it's just too old, but I didn't find a way to disable the iasl access. Must I uninstall iasl on my machine to get qemu compiled again? I would use the latest version, version 20130823, from https://acpica.org/downloads or the git from git://github.com/acpica/acpica.git I don't think you need iasl on your computer to build qemu. Hope I helped, Marcel Thanks. But then I don't understand the error that appears: CPP x86_64-softmmu/acpi-dsdt.dsl.i.orig ACPI_PREPROCESS x86_64-softmmu/acpi-dsdt.dsl.i IASL x86_64-softmmu/acpi-dsdt.dsl.i make[1]: *** [hw/i386/acpi-dsdt.hex] Error 1 make: *** [subdir-x86_64-softmmu] Error 2 I don't find a chance to disable this access/compilation within configure. If I just missed a possible option, it would be great to point me at it. I found also when grep'ing through the sources that there is an "if" for check whether iasl is present or not. But setting --iasl= (empty) to force a removal of iasl for the qemu compilation gives a configure error. Best regards, Erik The IASL version is: Intel ACPI Component Architecture ASL Optimizing Compiler version 20060912 [Dec 20 2006] Copyright (C) 2000 - 2006 Intel Corporation Supports ACPI Specification Revision 3.0a Thanks for your support. Best regards, Erik
Re: [Qemu-devel] [PATCH for-1.7 0/5] acpi unit-test: added tests
On Thu, 2013-11-21 at 22:20 +0100, Erik Rull wrote: > Marcel Apfelbaum wrote: > > Added 2 tests: > > 1. Basic check of FACS table (missed on prev submission) > > 2. Compare DSDT and SSDT tables against expected values > > > > Test 2: > > - runs only if iasl is installed on the host machine. > > - the test plan: > > 1. Dumps the ACPI tables as AML on the disk. > > 2. Runs iasl to disassembly the tables into ASL files. > > 3. Compares them with expected offline ASL files. > > > > - the test runs for both default machine and q35. > > - in case the test fails, it can be easily tweaked to > > show the differences between the ASL files and > > understand the issue. > > > > Patches: > > 1/5 - test 1 > > 2/5 - some infrastructure improvements > > 3/5 - expected asl files for test 2 > > 4/5 - creates links for the expected files > > if the build directory is not current > > 5/5 - test 2 > > > > Which iasl Version is needed for the ACPI compilation and testing? I have > an IASL installed on my build machine, but when trying to compile the ACPI > stuff, it fails. Maybe it's just too old, but I didn't find a way to > disable the iasl access. Must I uninstall iasl on my machine to get qemu > compiled again? I would use the latest version, version 20130823, from https://acpica.org/downloads or the git from git://github.com/acpica/acpica.git I don't think you need iasl on your computer to build qemu. Hope I helped, Marcel > The IASL version is: > Intel ACPI Component Architecture > ASL Optimizing Compiler version 20060912 [Dec 20 2006] > Copyright (C) 2000 - 2006 Intel Corporation > Supports ACPI Specification Revision 3.0a > > Thanks for your support. > > Best regards, > > Erik > > >
[Qemu-devel] [PULL 02/11] configure: Explicitly set ARFLAGS so we can build with GNU Make 4.0
From: Peter Maydell Our rules.mak adds '-rR' to MAKEFLAGS to indicate that we will be explicitly specifying everything and not relying on any default variables or rules. However we were accidentally relying on the default ARFLAGS ("rv"). This went unnoticed because of a bug in GNU Make 3.82 and earlier which meant that adding -rR to MAKEFLAGS only affected submakes, not the currently running instance. Explicitly set ARFLAGS in config-host.mak, in the same way we handle CFLAGS and LDFLAGS; this will allow us to work with Make 4.0. Thanks to Paul Smith for analyzing this bug for us. Cc: qemu-sta...@nongnu.org Reported-by: Ken Moffat Signed-off-by: Peter Maydell Signed-off-by: Paolo Bonzini --- configure | 5 + 1 file changed, 5 insertions(+) diff --git a/configure b/configure index 508f6a5..ad12688 100755 --- a/configure +++ b/configure @@ -325,6 +325,9 @@ query_pkg_config() { pkg_config=query_pkg_config sdl_config="${SDL_CONFIG-${cross_prefix}sdl-config}" +# If the user hasn't specified ARFLAGS, default to 'rv', just as make does. +ARFLAGS="${ARFLAGS-rv}" + # default flags for all hosts QEMU_CFLAGS="-fno-strict-aliasing $QEMU_CFLAGS" QEMU_CFLAGS="-Wall -Wundef -Wwrite-strings -Wmissing-prototypes $QEMU_CFLAGS" @@ -3695,6 +3698,7 @@ echo "C compiler$cc" echo "Host C compiler $host_cc" echo "C++ compiler $cxx" echo "Objective-C compiler $objcc" +echo "ARFLAGS $ARFLAGS" echo "CFLAGS$CFLAGS" echo "QEMU_CFLAGS $QEMU_CFLAGS" echo "LDFLAGS $LDFLAGS" @@ -4276,6 +4280,7 @@ echo "HOST_CC=$host_cc" >> $config_host_mak echo "CXX=$cxx" >> $config_host_mak echo "OBJCC=$objcc" >> $config_host_mak echo "AR=$ar" >> $config_host_mak +echo "ARFLAGS=$ARFLAGS" >> $config_host_mak echo "AS=$as" >> $config_host_mak echo "CPP=$cpp" >> $config_host_mak echo "OBJCOPY=$objcopy" >> $config_host_mak -- 1.8.3.1
[Qemu-devel] [PULL 04/11] atomic.h: Fix build with clang
From: Peter Maydell clang defines __ATOMIC_SEQ_CST but its implementation of the __atomic_exchange() builtin differs from that of gcc. Move the __clang__ branch of the ifdef ladder to the top and fix its implementation (there is no such builtin as __sync_exchange), so we can compile with clang again. Signed-off-by: Peter Maydell Signed-off-by: Paolo Bonzini --- include/qemu/atomic.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h index 0aa8913..492bce1 100644 --- a/include/qemu/atomic.h +++ b/include/qemu/atomic.h @@ -168,14 +168,14 @@ #endif #ifndef atomic_xchg -#ifdef __ATOMIC_SEQ_CST +#if defined(__clang__) +#define atomic_xchg(ptr, i)__sync_swap(ptr, i) +#elif defined(__ATOMIC_SEQ_CST) #define atomic_xchg(ptr, i)({ \ typeof(*ptr) _new = (i), _old; \ __atomic_exchange(ptr, &_new, &_old, __ATOMIC_SEQ_CST); \ _old; \ }) -#elif defined __clang__ -#define atomic_xchg(ptr, i)__sync_exchange(ptr, i) #else /* __sync_lock_test_and_set() is documented to be an acquire barrier only. */ #define atomic_xchg(ptr, i)(smp_mb(), __sync_lock_test_and_set(ptr, i)) -- 1.8.3.1
Re: [Qemu-devel] [PATCH for-1.7 0/5] acpi unit-test: added tests
Marcel Apfelbaum wrote: Added 2 tests: 1. Basic check of FACS table (missed on prev submission) 2. Compare DSDT and SSDT tables against expected values Test 2: - runs only if iasl is installed on the host machine. - the test plan: 1. Dumps the ACPI tables as AML on the disk. 2. Runs iasl to disassembly the tables into ASL files. 3. Compares them with expected offline ASL files. - the test runs for both default machine and q35. - in case the test fails, it can be easily tweaked to show the differences between the ASL files and understand the issue. Patches: 1/5 - test 1 2/5 - some infrastructure improvements 3/5 - expected asl files for test 2 4/5 - creates links for the expected files if the build directory is not current 5/5 - test 2 Which iasl Version is needed for the ACPI compilation and testing? I have an IASL installed on my build machine, but when trying to compile the ACPI stuff, it fails. Maybe it's just too old, but I didn't find a way to disable the iasl access. Must I uninstall iasl on my machine to get qemu compiled again? The IASL version is: Intel ACPI Component Architecture ASL Optimizing Compiler version 20060912 [Dec 20 2006] Copyright (C) 2000 - 2006 Intel Corporation Supports ACPI Specification Revision 3.0a Thanks for your support. Best regards, Erik
[Qemu-devel] [Bug 1253777] [NEW] OpenBSD VM running on OpenBSD host has sleep calls taking twice as long as they should
Public bug reported: Running a script like while [ 1 ] do date sleep 1 done on the VM will result in the (correct) date being displayed, but it is displayed only every two (!) seconds. We have also noticed that if we connect to the VM's console using VNC, and move the mouse pointer constantly in the VNC window, the script runs normally with updates every second! Note that the script doesn't have to be running on the VM's console - it's also possible to (say) ssh to the VM from a separate machine and run the script and it will display the '2 second' issue, but as soon as you move the mouse pointer constantly in the VNC console window the script starts behaving normally with updates every second. I have only seen this bug when running an OpenBSD VM on an OpenBSD host. Running an OpenBSD VM on a Linux host does not exhibit the problem for me. I also belive (am told) that a Linux VM running on an OpenBSD host does not exhibit the problem. I have been using the OpenBSD 5.4 64 bit distro which comes with qemu 1.5.1 in a package, however I tried compiling qemu 1.6.1 and that has the same bug. In fact older OpenBSD distros have the same issue - going back to OpenBSD distros from two years ago still have the problem. This is not a 'new' bug recently introduced. Initially I wondered if it could be traced to an incorrectly set command line option, but I've since gone through many of the options in the man page simply trying different values (eg. different CPU types ( -cpu) , different emulated PC (-M)) but so far the problem remains. I'm quite happy to run tests in order to track this bug down better. We use qemu running on OpenBSD extensively and find it very useful! ** Affects: qemu Importance: Undecided Status: New ** Tags: openbsd sleep time -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1253777 Title: OpenBSD VM running on OpenBSD host has sleep calls taking twice as long as they should Status in QEMU: New Bug description: Running a script like while [ 1 ] do date sleep 1 done on the VM will result in the (correct) date being displayed, but it is displayed only every two (!) seconds. We have also noticed that if we connect to the VM's console using VNC, and move the mouse pointer constantly in the VNC window, the script runs normally with updates every second! Note that the script doesn't have to be running on the VM's console - it's also possible to (say) ssh to the VM from a separate machine and run the script and it will display the '2 second' issue, but as soon as you move the mouse pointer constantly in the VNC console window the script starts behaving normally with updates every second. I have only seen this bug when running an OpenBSD VM on an OpenBSD host. Running an OpenBSD VM on a Linux host does not exhibit the problem for me. I also belive (am told) that a Linux VM running on an OpenBSD host does not exhibit the problem. I have been using the OpenBSD 5.4 64 bit distro which comes with qemu 1.5.1 in a package, however I tried compiling qemu 1.6.1 and that has the same bug. In fact older OpenBSD distros have the same issue - going back to OpenBSD distros from two years ago still have the problem. This is not a 'new' bug recently introduced. Initially I wondered if it could be traced to an incorrectly set command line option, but I've since gone through many of the options in the man page simply trying different values (eg. different CPU types ( -cpu) , different emulated PC (-M)) but so far the problem remains. I'm quite happy to run tests in order to track this bug down better. We use qemu running on OpenBSD extensively and find it very useful! To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1253777/+subscriptions
Re: [Qemu-devel] [RFC PATCH 1/2] e1000: Use Address_Available bit as HW latch
On 11/21/2013 01:04 PM, Vlad Yasevich wrote: > e1000 provides a E1000_RAH_AV bit on every complete write > to the Receive Address Register. We can use this bit > 2 ways: > 1) To trigger HMP notifications. When the bit is set the > mac address is fully set and we can update the HMP. > > 2) We can turn off he bit on the write to low order bits of s/he/the/ > the Receive Address Register, so that we would not try > to match received traffic to this address when it is > not completely set. > > Signed-off-by: Vlad Yasevich > --- > hw/net/e1000.c | 11 ++- > 1 file changed, 10 insertions(+), 1 deletion(-) > > > -if (index == RA || index == RA + 1) { > +switch (index) { > +case RA: > +/* Mask off AV bit on the write of the low dword. The write of > + * the high dword will set the bit. This way a half-written > + * mac address will not be used to filter on rx. > + */ > +s->mac_reg[RA+1] &= ~E1000_RAH_AV; Does real hardware also auto-clear this bit when writing the low word (thus forcing all drivers to write high word last to make a change take effect)? Or are we risking the case of a driver that writes high word first including the bit, and where real hardware just glitches over the temporary half-written address where our emulation locks the user out entirely? (Asked by someone that has not read the datasheet, so take with a grain of salt) -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCHv2 3/6] ui/vnc: optimize dirty bitmap tracking
On 11/21/2013 01:51 AM, Peter Lieven wrote: > vnc_update_client currently scans the dirty bitmap of each client > bitwise which is a very costly operation if only few bits are dirty. > vnc_refresh_server_surface does almost the same. > this patch optimizes both by utilizing the heavily optimized > function find_next_bit to find the offset of the next dirty > bit in the dirty bitmaps. > > The following artifical test (just the bitmap operation part) running s/artifical/artificial/ > vnc_update_client 65536 times on a 2560x2048 surface illustrates the > performance difference: > > All bits clean - vnc_update_client_new: 0.07 secs > vnc_update_client_old: 10.98 secs > > All bits dirty - vnc_update_client_new: 11.26 secs > vnc_update_client_old: 20.19 secs > > Few bits dirty - vnc_update_client_new: 0.08 secs > vnc_update_client_old: 10.98 secs > > The case for all bits dirty is still rather slow, this > is due to the implementation of find_and_clear_dirty_height. > This will be addresses in a separate patch. s/addresses/addressed/ -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
[Qemu-devel] [RFC PATCH 2/2] rtl8139: update HMP only when the address is fully written
rtl8139 hardware requires 9346 config register to be set into write mode before mac address can be changed even though it is not documented. Every driver inspected so far appears to do this along with comments that this is an undocumented requirement. We can use this to help us identify when the mac address has been completely written. Simple set a flag whenever mac has changed and at the next transition of 9346 register from Write to Normal mode, we update the HMP. Signed-off-by: Vlad Yasevich --- hw/i386/pc_piix.c| 4 hw/i386/pc_q35.c | 4 hw/net/rtl8139.c | 50 +- include/hw/i386/pc.h | 8 4 files changed, 65 insertions(+), 1 deletion(-) diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 2daa111..731ae3b 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -372,6 +372,10 @@ static QEMUMachine pc_i440fx_machine_v1_7 = { PC_I440FX_1_7_MACHINE_OPTIONS, .name = "pc-i440fx-1.7", .init = pc_init_pci_1_7, +.compat_props = (GlobalProperty[]) { +PC_COMPAT_1_7, +{ /* end of list */ } +}, }; #define PC_I440FX_1_6_MACHINE_OPTIONS PC_I440FX_1_7_MACHINE_OPTIONS diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index e2b8907..7b6aedf 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -292,6 +292,10 @@ static QEMUMachine pc_q35_machine_v1_7 = { PC_Q35_1_7_MACHINE_OPTIONS, .name = "pc-q35-1.7", .init = pc_q35_init_1_7, +.compat_props = (GlobalProperty[]) { +PC_COMPAT_1_7, +{ /* end of list */ } +}, }; #define PC_Q35_1_6_MACHINE_OPTIONS PC_Q35_1_7_MACHINE_OPTIONS diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c index 7f2b4db..5c4caec 100644 --- a/hw/net/rtl8139.c +++ b/hw/net/rtl8139.c @@ -476,6 +476,7 @@ typedef struct RTL8139State { uint16_t CpCmd; uint8_t TxThresh; +bool mac_changed; NICState *nic; NICConf conf; @@ -515,6 +516,9 @@ typedef struct RTL8139State { /* Support migration to/from old versions */ int rtl8139_mmio_io_addr_dummy; +#define RTL8139_FLAG_MAC_BIT 0 +#define RTL8139_FLAG_MAC_COMPLETE (1 << RTL8139_FLAG_MAC_BIT) +uint32_tcompat_flags; } RTL8139State; /* Writes tally counters to memory via DMA */ @@ -1215,6 +1219,7 @@ static void rtl8139_reset(DeviceState *d) /* restore MAC address */ memcpy(s->phys, s->conf.macaddr.a, 6); qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys); +s->mac_changed = false; /* reset interrupt mask */ s->IntrStatus = 0; @@ -1563,6 +1568,14 @@ static void rtl8139_Cfg9346_write(RTL8139State *s, uint32_t val) /* Reset. */ val = 0; rtl8139_reset(d); +} else if (opmode == Cfg9346_Normal && s->mac_changed) { +/* Even though it is not documented, it is required to set + * opmode to Cfg9346_ConfigWrite when changing the mac address + * of the card and to set to Cfg9346_Normal when done. We + * can use this as an idication to kick off the notification event. + */ +qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys); +s->mac_changed = false; } s->Cfg9346 = val; @@ -2743,7 +2756,12 @@ static void rtl8139_io_writeb(void *opaque, uint8_t addr, uint32_t val) { case MAC0 ... MAC0+5: s->phys[addr - MAC0] = val; -qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys); +if (s->compat_flags & RTL8139_FLAG_MAC_COMPLETE) { +s->mac_changed = true; +} else if (addr == MAC0+5) { +/* Emulate old style updates on the last write */ +qemu_format_nic_info_str(qemu_get_queue(s->nic), s->phys); +} break; case MAC0+6 ... MAC0+7: /* reserved */ @@ -3256,6 +3274,13 @@ static int rtl8139_post_load(void *opaque, int version_id) * to link status bit in BasicModeStatus */ qemu_get_queue(s->nic)->link_down = (s->BasicModeStatus & 0x04) == 0; +/* Emulate old behavior if we don't support mac change completion + * tracking + */ +if (!(s->compat_flags & RTL8139_FLAG_MAC_COMPLETE)) { +s->mac_changed = false; +} + return 0; } @@ -3286,6 +3311,24 @@ static void rtl8139_pre_save(void *opaque) s->rtl8139_mmio_io_addr_dummy = 0; } +static bool rtl8139_mac_state_needed(void *opaque) +{ +RTL8139State *s = opaque; + +return (s->compat_flags & RTL8139_FLAG_MAC_COMPLETE) && s->mac_changed; +} + +static const VMStateDescription vmstate_rtl8139_mac_state ={ +.name = "rtl8139/mac_state", +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField []) { +VMSTATE_BOOL(mac_changed, RTL8139State), +VMSTATE_END_OF_LIST() +} +}; + static const VMStateDescription vmstate_rtl8139 = { .name = "rtl8139", .version_id = 4, @@ -3371,6 +3414,9 @
Re: [Qemu-devel] [PATCH 0/3] [PULL for 1.7?] qemu-kvm.git uq/master queue
Am 21.11.2013 14:28, schrieb Gleb Natapov: > The following changes since commit fc8ead74674b7129e8f31c2595c76658e5622197: > > Merge remote-tracking branch 'qemu-kvm/uq/master' into staging (2013-10-18 > 10:03:24 -0700) > > are available in the git repository at: > > > git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master > > for you to fetch changes up to ef4cbe14342c1f63b3c754e306218f004f4e26c4: > > kvm: Fix uninitialized cpuid_data (2013-11-07 13:14:56 +0200) > > > Jan Kiszka (1): > pci-assign: Remove dead code for direct I/O region access from userspace > > Paolo Bonzini (1): > KVM: x86: fix typo in KVM_GET_XCRS > > Stefan Weil (1): > kvm: Fix uninitialized cpuid_data > > hw/i386/kvm/pci-assign.c | 56 > +--- > target-i386/kvm.c| 13 --- > 2 files changed, 14 insertions(+), 55 deletions(-) > I think these patches should be included in QEMU 1.7, too. Stefan
Re: [Qemu-devel] [PATCH for-1.7 1/2] acpi-build: fix build on glib < 2.22
Il 21/11/2013 13:17, Michael S. Tsirkin ha scritto: > g_string_vprintf was only introduced in 2.24 so switch to vsnprintf > instead. A bit uglier but name size is fixed at 4 bytes here so it's > easy. > > Reported-by: Richard Henderson > Signed-off-by: Michael S. Tsirkin > --- > hw/i386/acpi-build.c | 11 ++- > 1 file changed, 6 insertions(+), 5 deletions(-) > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c > index 486e705..59a17df 100644 > --- a/hw/i386/acpi-build.c > +++ b/hw/i386/acpi-build.c > @@ -287,16 +287,17 @@ static inline void build_append_array(GArray *array, > GArray *val) > > static void build_append_nameseg(GArray *array, const char *format, ...) > { > -GString *s = g_string_new(""); > +/* It would be nicer to use g_string_vprintf but it's only there in 2.22 > */ > +char s[] = ""; > +int len; > va_list args; > > va_start(args, format); > -g_string_vprintf(s, format, args); > +len = vsnprintf(s, sizeof s, format, args); > va_end(args); > > -assert(s->len == 4); > -g_array_append_vals(array, s->str, s->len); > -g_string_free(s, true); > +assert(len == 4); > +g_array_append_vals(array, s, len); > } > > /* 5.4 Definition Block Encoding */ > Reviewed-by: Paolo Bonzini
[Qemu-devel] [PATCH 2/3] pci-assign: Remove dead code for direct I/O region access from userspace
From: Jan Kiszka This feature was already deprecated back then in qemu-kvm, ie. before pci-assign went upstream. assigned_dev_ioport_rw will never be invoked with resource_fd < 0. Signed-off-by: Jan Kiszka Acked-by: Alex Williamson Signed-off-by: Gleb Natapov --- hw/i386/kvm/pci-assign.c | 56 +--- 1 file changed, 10 insertions(+), 46 deletions(-) diff --git a/hw/i386/kvm/pci-assign.c b/hw/i386/kvm/pci-assign.c index 011764f..4e65110 100644 --- a/hw/i386/kvm/pci-assign.c +++ b/hw/i386/kvm/pci-assign.c @@ -154,55 +154,19 @@ static uint64_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region, uint64_t val = 0; int fd = dev_region->region->resource_fd; -if (fd >= 0) { -if (data) { -DEBUG("pwrite data=%" PRIx64 ", size=%d, e_phys=" TARGET_FMT_plx - ", addr="TARGET_FMT_plx"\n", *data, size, addr, addr); -if (pwrite(fd, data, size, addr) != size) { -error_report("%s - pwrite failed %s", - __func__, strerror(errno)); -} -} else { -if (pread(fd, &val, size, addr) != size) { -error_report("%s - pread failed %s", - __func__, strerror(errno)); -val = (1UL << (size * 8)) - 1; -} -DEBUG("pread val=%" PRIx64 ", size=%d, e_phys=" TARGET_FMT_plx - ", addr=" TARGET_FMT_plx "\n", val, size, addr, addr); +if (data) { +DEBUG("pwrite data=%" PRIx64 ", size=%d, e_phys=" TARGET_FMT_plx + ", addr="TARGET_FMT_plx"\n", *data, size, addr, addr); +if (pwrite(fd, data, size, addr) != size) { +error_report("%s - pwrite failed %s", __func__, strerror(errno)); } } else { -uint32_t port = addr + dev_region->u.r_baseport; - -if (data) { -DEBUG("out data=%" PRIx64 ", size=%d, e_phys=" TARGET_FMT_plx - ", host=%x\n", *data, size, addr, port); -switch (size) { -case 1: -outb(*data, port); -break; -case 2: -outw(*data, port); -break; -case 4: -outl(*data, port); -break; -} -} else { -switch (size) { -case 1: -val = inb(port); -break; -case 2: -val = inw(port); -break; -case 4: -val = inl(port); -break; -} -DEBUG("in data=%" PRIx64 ", size=%d, e_phys=" TARGET_FMT_plx - ", host=%x\n", val, size, addr, port); +if (pread(fd, &val, size, addr) != size) { +error_report("%s - pread failed %s", __func__, strerror(errno)); +val = (1UL << (size * 8)) - 1; } +DEBUG("pread val=%" PRIx64 ", size=%d, e_phys=" TARGET_FMT_plx + ", addr=" TARGET_FMT_plx "\n", val, size, addr, addr); } return val; } -- 1.8.4.rc3
[Qemu-devel] [PATCH 0/3] [PULL] qemu-kvm.git uq/master queue
The following changes since commit fc8ead74674b7129e8f31c2595c76658e5622197: Merge remote-tracking branch 'qemu-kvm/uq/master' into staging (2013-10-18 10:03:24 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master for you to fetch changes up to ef4cbe14342c1f63b3c754e306218f004f4e26c4: kvm: Fix uninitialized cpuid_data (2013-11-07 13:14:56 +0200) Jan Kiszka (1): pci-assign: Remove dead code for direct I/O region access from userspace Paolo Bonzini (1): KVM: x86: fix typo in KVM_GET_XCRS Stefan Weil (1): kvm: Fix uninitialized cpuid_data hw/i386/kvm/pci-assign.c | 56 +--- target-i386/kvm.c| 13 --- 2 files changed, 14 insertions(+), 55 deletions(-)
Re: [Qemu-devel] [PATCH v3 0/6] Add cache mode option to qemu-iotests, and change default mode to "writeback"
On Wed, Nov 20, 2013 at 03:44:11PM +0800, Fam Zheng wrote: > This series adds cache mode option in the iotests framework. Test cases are > updated to make use of cache mode and mask supported modes. > > v3: Change _unsupported_qemu_io_options to _supported_cache_modes. > Change default mode to "writeback". > Clean up some whitespaces in the end of series. > Fix "026.out.nocache" case. > Fix 048 case on tmpfs. > > > Fam Zheng (6): > qemu-iotests: Add "-c " option > qemu-iotests: Honour cache mode in iotests.py > qemu-iotests: Add _supported_cache_modes > qemu-iotests: Change default cache mode to "writeback" > qemu-iotests: Force qcow2 in error path test in 048 > qemu-iotests: Clean up spaces in usage output > > tests/qemu-iotests/026| 2 +- > tests/qemu-iotests/039| 2 +- > tests/qemu-iotests/048| 8 +++- > tests/qemu-iotests/052| 3 +-- > tests/qemu-iotests/check | 2 +- > tests/qemu-iotests/common | 37 +++-- > tests/qemu-iotests/common.rc | 20 ++-- > tests/qemu-iotests/iotests.py | 3 ++- > 8 files changed, 50 insertions(+), 27 deletions(-) In principle I'm happy with the series but there are two instances where you are silently changing what the test does (overriding cache mode and image format). Please skip tests that cannot run instead of silently testing something else. Stefan
Re: [Qemu-devel] [PATCH 16/27] acpi: ich9: allow guest to clear SCI rised by GPE
On Thu, 21 Nov 2013 16:28:40 +0800 Hu Tao wrote: > On Thu, Nov 21, 2013 at 10:26:36AM +0200, Michael S. Tsirkin wrote: > > On Thu, Nov 21, 2013 at 04:12:22PM +0800, Hu Tao wrote: > > > On Thu, Nov 21, 2013 at 09:14:18AM +0200, Michael S. Tsirkin wrote: > > > > On Thu, Nov 21, 2013 at 03:38:37AM +0100, Igor Mammedov wrote: > > > > > it fixes IRQ storm since guest isn't able to lower SCI IRQ > > > > > after it has been handled when it clears GPE event. > > > > > > > > > > Signed-off-by: Igor Mammedov > > > > > > > > The storm is only on memory hotplug right? > > > > > > IIRC, it happens on cpu hotplug, too. > > > > So this is a bugfix then? We should include this in 1.7? > > Yes. but cpu hotplug support is not added for q35, so the bug doesn't > affect anyone, yet. > I was thinking about CPU hotplug for q35 as well, thus patch 1/27 "acpi: factor out common pm_update_sci() into acpi core" moves common for CPU/Memory hotplug ACPI bits into generic code, so making CPU hotplug for q35 would require only unifying CPU hotplug bits and plumbing it in q35 its done for mem hotplug in: [PATCH 17/27] acpi: initialize memory hotplug ACPI ICH9 hardware
[Qemu-devel] [PATCH v3 for-1.7] s390x: fix flat file load on 32 bit systems
pc-bios/s390-zipl.rom is a flat image so it's expected that loading it as elf will fail. It should fall back on loading a flat file, but doesn't on 32 bit systems, instead it fails printing: qemu: hardware error: could not load bootloader 's390-zipl.rom' The result is boot failure. The reason is that a 64 bit unsigned interger which is set to -1 on error is compared to -1UL which on a 32 bit system with gcc is a 32 bit unsigned interger. Since both are unsigned, no sign extension takes place and comparison evaluates to non-equal. There's no reason to do clever tricks: all functions we call actually return int so just use int. And then we can use == -1 everywhere, consistently. Reviewed-by: Alexander Graf Signed-off-by: Michael S. Tsirkin --- changes from v2: drop cleanups that aren't bugfixes, not needed for 1.7 changes from v1: fix more places hw/s390x/ipl.c | 17 + 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c index d69adb2..65d39da 100644 --- a/hw/s390x/ipl.c +++ b/hw/s390x/ipl.c @@ -62,10 +62,10 @@ typedef struct S390IPLState { static int s390_ipl_init(SysBusDevice *dev) { S390IPLState *ipl = S390_IPL(dev); -ram_addr_t kernel_size = 0; +int kernel_size; if (!ipl->kernel) { -ram_addr_t bios_size = 0; +int bios_size; char *bios_filename; /* Load zipl bootloader */ @@ -80,7 +80,7 @@ static int s390_ipl_init(SysBusDevice *dev) bios_size = load_elf(bios_filename, NULL, NULL, &ipl->start_addr, NULL, NULL, 1, ELF_MACHINE, 0); -if (bios_size == -1UL) { +if (bios_size == -1) { bios_size = load_image_targphys(bios_filename, ZIPL_IMAGE_START, 4096); ipl->start_addr = ZIPL_IMAGE_START; @@ -90,17 +90,17 @@ static int s390_ipl_init(SysBusDevice *dev) } g_free(bios_filename); -if ((long)bios_size < 0) { +if (bios_size == -1) { hw_error("could not load bootloader '%s'\n", bios_name); } return 0; } else { kernel_size = load_elf(ipl->kernel, NULL, NULL, NULL, NULL, NULL, 1, ELF_MACHINE, 0); -if (kernel_size == -1UL) { +if (kernel_size == -1) { kernel_size = load_image_targphys(ipl->kernel, 0, ram_size); } -if (kernel_size == -1UL) { +if (kernel_size == -1) { fprintf(stderr, "could not load kernel '%s'\n", ipl->kernel); return -1; } @@ -115,7 +115,8 @@ static int s390_ipl_init(SysBusDevice *dev) ipl->start_addr = KERN_IMAGE_START; } if (ipl->initrd) { -ram_addr_t initrd_offset, initrd_size; +ram_addr_t initrd_offset; +int initrd_size; initrd_offset = INITRD_START; while (kernel_size + 0x10 > initrd_offset) { @@ -123,7 +124,7 @@ static int s390_ipl_init(SysBusDevice *dev) } initrd_size = load_image_targphys(ipl->initrd, initrd_offset, ram_size - initrd_offset); -if (initrd_size == -1UL) { +if (initrd_size == -1) { fprintf(stderr, "qemu: could not load initrd '%s'\n", ipl->initrd); exit(1); } -- MST
[Qemu-devel] [PATCH v2 for-1.7] s390x: fix flat file load on 32 bit systems
pc-bios/s390-zipl.rom is a flat image so it's expected that loading it as elf will fail. It should fall back on loading a flat file, but doesn't on 32 bit systems, instead it fails printing: qemu: hardware error: could not load bootloader 's390-zipl.rom' The result is boot failure. The reason is that a 64 bit unsigned interger which is set to -1 on error is compared to -1UL which on a 32 bit system with gcc is a 32 bit unsigned interger. Since both are unsigned, no sign extension takes place and comparison evaluates to non-equal. There's no reason to do clever tricks: all functions we call actually return int so just use int. In fact ram_addr_t dos not make any sense - it's meaning is "memory handle for migration". And then we can use == -1 everywhere, consistently. Signed-off-by: Michael S. Tsirkin --- Changes from v1: better fix: use int everywhere fix all places with same bug (e.g. -kernel was broken too) hw/s390x/ipl.c | 18 ++ 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c index d69adb2..9570912 100644 --- a/hw/s390x/ipl.c +++ b/hw/s390x/ipl.c @@ -62,10 +62,9 @@ typedef struct S390IPLState { static int s390_ipl_init(SysBusDevice *dev) { S390IPLState *ipl = S390_IPL(dev); -ram_addr_t kernel_size = 0; if (!ipl->kernel) { -ram_addr_t bios_size = 0; +int bios_size; char *bios_filename; /* Load zipl bootloader */ @@ -80,7 +79,7 @@ static int s390_ipl_init(SysBusDevice *dev) bios_size = load_elf(bios_filename, NULL, NULL, &ipl->start_addr, NULL, NULL, 1, ELF_MACHINE, 0); -if (bios_size == -1UL) { +if (bios_size == -1) { bios_size = load_image_targphys(bios_filename, ZIPL_IMAGE_START, 4096); ipl->start_addr = ZIPL_IMAGE_START; @@ -90,17 +89,19 @@ static int s390_ipl_init(SysBusDevice *dev) } g_free(bios_filename); -if ((long)bios_size < 0) { +if (bios_size == -1) { hw_error("could not load bootloader '%s'\n", bios_name); } return 0; } else { +int kernel_size; + kernel_size = load_elf(ipl->kernel, NULL, NULL, NULL, NULL, NULL, 1, ELF_MACHINE, 0); -if (kernel_size == -1UL) { +if (kernel_size == -1) { kernel_size = load_image_targphys(ipl->kernel, 0, ram_size); } -if (kernel_size == -1UL) { +if (kernel_size == -1) { fprintf(stderr, "could not load kernel '%s'\n", ipl->kernel); return -1; } @@ -115,7 +116,8 @@ static int s390_ipl_init(SysBusDevice *dev) ipl->start_addr = KERN_IMAGE_START; } if (ipl->initrd) { -ram_addr_t initrd_offset, initrd_size; +hwaddr initrd_offset; +int initrd_size; initrd_offset = INITRD_START; while (kernel_size + 0x10 > initrd_offset) { @@ -123,7 +125,7 @@ static int s390_ipl_init(SysBusDevice *dev) } initrd_size = load_image_targphys(ipl->initrd, initrd_offset, ram_size - initrd_offset); -if (initrd_size == -1UL) { +if (initrd_size == -1) { fprintf(stderr, "qemu: could not load initrd '%s'\n", ipl->initrd); exit(1); } -- MST
Re: [Qemu-devel] [PATCH for-1.7 2/2] acpi-build: fix build on glib < 2.14
Il 21/11/2013 13:17, Michael S. Tsirkin ha scritto: > g_array_get_element_size was only added in glib 2.14. > Fortunately we don't use it for any arrays where > element size is > 1, so just add an assert. > > Reported-by: Richard Henderson > Signed-off-by: Michael S. Tsirkin > --- > hw/i386/acpi-build.c | 5 - > hw/i386/bios-linker-loader.c | 8 > 2 files changed, 8 insertions(+), 5 deletions(-) > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c > index 59a17df..5f36e7e 100644 > --- a/hw/i386/acpi-build.c > +++ b/hw/i386/acpi-build.c > @@ -425,7 +425,10 @@ static inline void *acpi_data_push(GArray *table_data, > unsigned size) > > static unsigned acpi_data_len(GArray *table) > { > -return table->len * g_array_get_element_size(table); > +#if GLIB_CHECK_VERSION(2, 14, 0) > +assert(g_array_get_element_size(table) == 1); > +#endif > +return table->len; > } > > static void acpi_align_size(GArray *blob, unsigned align) This looks good, but is the below part necessary? It's ugly! Paolo > diff --git a/hw/i386/bios-linker-loader.c b/hw/i386/bios-linker-loader.c > index 0833853..fd23611 100644 > --- a/hw/i386/bios-linker-loader.c > +++ b/hw/i386/bios-linker-loader.c > @@ -90,7 +90,7 @@ enum { > > GArray *bios_linker_loader_init(void) > { > -return g_array_new(false, true /* clear */, > sizeof(BiosLinkerLoaderEntry)); > +return g_array_new(false, true /* clear */, 1); > } > > /* Free linker wrapper and return the linker array. */ > @@ -115,7 +115,7 @@ void bios_linker_loader_alloc(GArray *linker, > BIOS_LINKER_LOADER_ALLOC_ZONE_HIGH); > > /* Alloc entries must come first, so prepend them */ > -g_array_prepend_val(linker, entry); > +g_array_prepend_vals(linker, &entry, sizeof entry); > } > > void bios_linker_loader_add_checksum(GArray *linker, const char *file, > @@ -132,7 +132,7 @@ void bios_linker_loader_add_checksum(GArray *linker, > const char *file, > entry.cksum.start = cpu_to_le32((uint8_t *)start - (uint8_t *)table); > entry.cksum.length = cpu_to_le32(size); > > -g_array_append_val(linker, entry); > +g_array_append_vals(linker, &entry, sizeof entry); > } > > void bios_linker_loader_add_pointer(GArray *linker, > @@ -154,5 +154,5 @@ void bios_linker_loader_add_pointer(GArray *linker, > assert(pointer_size == 1 || pointer_size == 2 || > pointer_size == 4 || pointer_size == 8); > > -g_array_append_val(linker, entry); > +g_array_append_vals(linker, &entry, sizeof entry); > } >
[Qemu-devel] [PATCH 2/3] usb-host-libusb: Add alloc / free streams ops
Signed-off-by: Hans de Goede --- hw/usb/host-libusb.c | 50 ++ 1 file changed, 50 insertions(+) diff --git a/hw/usb/host-libusb.c b/hw/usb/host-libusb.c index 0dd60b2..894875b 100644 --- a/hw/usb/host-libusb.c +++ b/hw/usb/host-libusb.c @@ -1280,6 +1280,54 @@ static void usb_host_handle_reset(USBDevice *udev) } } +static int usb_host_alloc_streams(USBDevice *udev, USBEndpoint **eps, + int nr_eps, int streams) +{ +#if LIBUSBX_API_VERSION >= 0x01000103 +USBHostDevice *s = USB_HOST_DEVICE(udev); +unsigned char endpoints[30]; +int i, rc; + +for (i = 0; i < nr_eps; i++) { +endpoints[i] = eps[i]->nr; +if (eps[i]->pid == USB_TOKEN_IN) { +endpoints[i] |= 0x80; +} +} +rc = libusb_alloc_streams(s->dh, streams, endpoints, nr_eps); +if (rc < 0) { +usb_host_libusb_error("libusb_alloc_streams", rc); +} else if (rc != streams) { +fprintf(stderr, +"libusb_alloc_streams: got less streams then requested %d < %d\n", +rc, streams); +} + +return (rc == streams) ? 0 : -1; +#else +fprintf(stderr, "libusb_alloc_streams: error not implemented\n"); +return -1; +#endif +} + +static void usb_host_free_streams(USBDevice *udev, USBEndpoint **eps, + int nr_eps) +{ +#if LIBUSBX_API_VERSION >= 0x01000103 +USBHostDevice *s = USB_HOST_DEVICE(udev); +unsigned char endpoints[30]; +int i; + +for (i = 0; i < nr_eps; i++) { +endpoints[i] = eps[i]->nr; +if (eps[i]->pid == USB_TOKEN_IN) { +endpoints[i] |= 0x80; +} +} +libusb_free_streams(s->dh, endpoints, nr_eps); +#endif +} + /* * This is *NOT* about restoring state. We have absolutely no idea * what state the host device is in at the moment and whenever it is @@ -1361,6 +1409,8 @@ static void usb_host_class_initfn(ObjectClass *klass, void *data) uc->handle_reset = usb_host_handle_reset; uc->handle_destroy = usb_host_handle_destroy; uc->flush_ep_queue = usb_host_flush_ep_queue; +uc->alloc_streams = usb_host_alloc_streams; +uc->free_streams = usb_host_free_streams; dc->vmsd = &vmstate_usb_host; dc->props = usb_host_dev_properties; set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories); -- 1.8.4.2
Re: [Qemu-devel] [PATCH v3 5/6] qemu-iotests: Force qcow2 in error path test in 048
On 2013年11月21日 20:41, Stefan Hajnoczi wrote: On Wed, Nov 20, 2013 at 03:44:16PM +0800, Fam Zheng wrote: The "raw" doesn't always work on certain file systems (e.g. tmpfs). Use qcow2 to make the allocation status explicit. Signed-off-by: Fam Zheng --- tests/qemu-iotests/048 | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/tests/qemu-iotests/048 b/tests/qemu-iotests/048 index 9def7fc..0fb5d2c 100755 --- a/tests/qemu-iotests/048 +++ b/tests/qemu-iotests/048 @@ -88,7 +88,13 @@ event = "read_aio" errno = "5" once ="off" EOF -_make_test_img $size +if [ "$IMGFMT" = "raw" ]; then +# For raw the allocation status for new image depends on file system, force +# qcow2 in this case, since the tested code is in block layer +IMGFMT=qcow2 _make_test_img $size | _filter_imgfmt If it can't be tested, skip the test. Don't silently test something else. OK. Do you think I should split the test? Since we don't have conditional reference output. Fam
Re: [Qemu-devel] [PATCH for-1.7] s390x: fix flat rom load on 32 bit systems
On 21.11.2013, at 13:24, Cornelia Huck wrote: > On Thu, 21 Nov 2013 14:08:22 +0200 > "Michael S. Tsirkin" wrote: > >> pc-bios/s390-zipl.rom is a flat image so it's expected that >> loading it as elf will fail. >> It should fall back on loading a flat file, but doesn't >> on 32 bit systems, instead it fails printing: >>qemu: hardware error: could not load bootloader 's390-zipl.rom' >> >> The result is boot failure. >> >> The reason is that a 64 bit unsigned interger which is set >> to -1 on error is compared to -1UL which on a 32 bit system >> with gcc is a 32 bit unsigned interger. >> Since both are unsigned, no sign extension takes place and >> comparison evaluates to non-equal. >> >> There's no reason to do clever tricks: -1 will cause >> sign extension to happen correctly automatically. >> >> Signed-off-by: Michael S. Tsirkin >> --- >> hw/s390x/ipl.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c >> index d69adb2..88115e9 100644 >> --- a/hw/s390x/ipl.c >> +++ b/hw/s390x/ipl.c >> @@ -80,7 +80,7 @@ static int s390_ipl_init(SysBusDevice *dev) >> >> bios_size = load_elf(bios_filename, NULL, NULL, &ipl->start_addr, >> NULL, >> NULL, 1, ELF_MACHINE, 0); >> -if (bios_size == -1UL) { >> +if (bios_size == -1) { >> bios_size = load_image_targphys(bios_filename, ZIPL_IMAGE_START, >> 4096); >> ipl->start_addr = ZIPL_IMAGE_START; > > Makes sense, but doesn't the kernel loader just below suffer from just > the same problem? Yes, initrd too. Alex
Re: [Qemu-devel] [PATCH 00/27 RFC v7] ACPI memory hotplug
On Thu, 21 Nov 2013 08:20:56 +0200 "Michael S. Tsirkin" wrote: > On Thu, Nov 21, 2013 at 03:38:21AM +0100, Igor Mammedov wrote: > > --- > > What's new since v6: > > > > * DIMM device is split to backend and frontend. Therefore following > > command/options were added for supporting it: > > > > For memdev backend: > > CLI: -memdev-add > > Monitor/QMP: memdev-add > > with options: 'id' and 'size' > > For dimm frontend: > > option "size" became readonly, pulling it's size from attached backend > > added option "memdev" for specifying backend by 'id' > > > > * Added Q35 support > > * Added support for 32 bit guests > > * build for i386 emulator (not tested) > > OK so a large patchset so did not review yet. One question > due to the dependency on bios honouring etc/reserved-memory-end: is > there some way to detect old BIOS and fail memory hotplug? version negotiation between ASL and hardware could be used to that effect. QEMU could start with present but disabled memory hotplug and if QEMU and BIOS ASL could come in agreement that both support it in compatible way, it could be enabled. > > > --- > > > > This series allows to hotplug 'arbitrary' DIMM devices specifying size, > > NUMA node mapping (guest side), slot and address where to map it, at > > runtime. > > > > Due to ACPI limitation there is need to specify a number of possible > > DIMM devices. For this task -m option was extended to support > > following format: > > > > -m [mem=]RamSize[,slots=N,maxmem=M] > > > > To allow memory hotplug user must specify a pair of additional parameters: > > 'slots' - number of possible increments > > 'maxmem' - max possible total memory size QEMU is allowed to use, > >including RamSize. > > > > minimal monitor command syntax to hotplug DIMM device: > > > > memdev-add id=memX,size=1G > > device_add dimm,id=dimmX,memdev=memX > > > > DIMM device provides following properties that could be used with > > device_add / -device to alter default behavior: > > > > id- unique string identifying device [mandatory] > > slot - number in range [0-slots) [optional], if not specified > > the first free slot is used > > node - NUMA node id [optional] (default: 0) > > size - amount of memory to add, readonly derived from backing memdev > > start - guest's physical address where to plug DIMM [optional], > > if not specified the first gap in hotplug memory region > > that fits DIMM is used > > > > -device option could be used for adding potentially hotunplugable DIMMs > > and also for specifying hotplugged DIMMs in migration case. > > > > Tested guests: > > - RHEL 6x64 > > - Windows 2012DCx64 > > - Windows 2008DCx64 > > - Windows 2008DCx32 > > > > Known limitations/bugs/TODOs: > > - only hot-add supported > > - max number of supported DIMM devices 255 (due to ACPI object name > >limit), could be increased creating several containers and putting > >DIMMs there. (exercise for future) > > - failed hotplug action consumes 1 slot (device_add doesn't delete > >device properly if realize failed) > > - e820 table doesn't include DIMM devices added with -device / > >(or after reboot devices added with device_add) > > - Windows 2008 remembers DIMM configuration, so if DIMM with other > >start/size is added into the same slot, it refuses to use it insisting > >on old mapping. > > > > Series is based on mst's PCI tree and requires following SeaBIOS patch: > > http://patchwork.ozlabs.org/patch/292614/ > > on top of patches to load ACPI tables from QEMU > > working SeaBIOS tree for testing is available at: > > https://github.com/imammedo/seabios/commits/memhp-wip > > > > QEMU git tree for testing is available at: > > https://github.com/imammedo/qemu/commits/memory-hotplug-v7 > > > > Example QEMU cmd line: > > qemu-system-x86_64 -enable-kvm -monitor unix:/tmp/mon,server,nowait \ > > -m 4096,slots=4,maxmem=8G -L /custome_seabios guest.img > > > > PS: > > Windows guest requires SRAT table for hotplug to work so add extra option: > >-numa node > > to QEMU command line. > > > > > > Igor Mammedov (26): > > acpi: factor out common pm_update_sci() into acpi core > > rename pci_hotplug_fn to hotplug_fn and make it available for other > > devices > > pc: add 'etc/reserved-memory-end' fw_cfg interface for SeaBIOS > > vl: convert -m to qemu_opts_parse() > > qapi: add SIZE type parser to string_input_visitor > > get reference to /backend container via qemu_get_backend() > > add memdev backend infrastructure > > dimm: map DimmDevice into DimBus provided address space > > dimm: add busy slot check and slot auto-allocation > > dimm: add busy address check and address auto-allocation > > dimm: add hotplug callback to DimmBus > > acpi: memory hotplug ACPI hardware implementation > > acpi: initialize memory hotplug ACPI PIIX4 hardware > >
Re: [Qemu-devel] [PATCH for-1.7] s390x: fix flat rom load on 32 bit systems
On Thu, Nov 21, 2013 at 01:24:13PM +0100, Cornelia Huck wrote: > On Thu, 21 Nov 2013 14:08:22 +0200 > "Michael S. Tsirkin" wrote: > > > pc-bios/s390-zipl.rom is a flat image so it's expected that > > loading it as elf will fail. > > It should fall back on loading a flat file, but doesn't > > on 32 bit systems, instead it fails printing: > > qemu: hardware error: could not load bootloader 's390-zipl.rom' > > > > The result is boot failure. > > > > The reason is that a 64 bit unsigned interger which is set > > to -1 on error is compared to -1UL which on a 32 bit system > > with gcc is a 32 bit unsigned interger. > > Since both are unsigned, no sign extension takes place and > > comparison evaluates to non-equal. > > > > There's no reason to do clever tricks: -1 will cause > > sign extension to happen correctly automatically. > > > > Signed-off-by: Michael S. Tsirkin > > --- > > hw/s390x/ipl.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/hw/s390x/ipl.c b/hw/s390x/ipl.c > > index d69adb2..88115e9 100644 > > --- a/hw/s390x/ipl.c > > +++ b/hw/s390x/ipl.c > > @@ -80,7 +80,7 @@ static int s390_ipl_init(SysBusDevice *dev) > > > > bios_size = load_elf(bios_filename, NULL, NULL, &ipl->start_addr, > > NULL, > > NULL, 1, ELF_MACHINE, 0); > > -if (bios_size == -1UL) { > > +if (bios_size == -1) { > > bios_size = load_image_targphys(bios_filename, > > ZIPL_IMAGE_START, > > 4096); > > ipl->start_addr = ZIPL_IMAGE_START; > > Makes sense, but doesn't the kernel loader just below suffer from just > the same problem? Yes, v2 fixes this.
[Qemu-devel] [PATCH 1/3] KVM: x86: fix typo in KVM_GET_XCRS
From: Paolo Bonzini Only the first item of the array was ever looked at. No practical effect, but still worth fixing. Signed-off-by: Paolo Bonzini Signed-off-by: Gleb Natapov --- target-i386/kvm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 749aa09..27071e3 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -1314,8 +1314,8 @@ static int kvm_get_xcrs(X86CPU *cpu) for (i = 0; i < xcrs.nr_xcrs; i++) { /* Only support xcr0 now */ -if (xcrs.xcrs[0].xcr == 0) { -env->xcr0 = xcrs.xcrs[0].value; +if (xcrs.xcrs[i].xcr == 0) { +env->xcr0 = xcrs.xcrs[i].value; break; } } -- 1.8.4.rc3
Re: [Qemu-devel] [PATCH v3 5/6] qemu-iotests: Force qcow2 in error path test in 048
On Wed, Nov 20, 2013 at 03:44:16PM +0800, Fam Zheng wrote: > The "raw" doesn't always work on certain file systems (e.g. tmpfs). Use > qcow2 to make the allocation status explicit. > > Signed-off-by: Fam Zheng > --- > tests/qemu-iotests/048 | 8 +++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/tests/qemu-iotests/048 b/tests/qemu-iotests/048 > index 9def7fc..0fb5d2c 100755 > --- a/tests/qemu-iotests/048 > +++ b/tests/qemu-iotests/048 > @@ -88,7 +88,13 @@ event = "read_aio" > errno = "5" > once ="off" > EOF > -_make_test_img $size > +if [ "$IMGFMT" = "raw" ]; then > +# For raw the allocation status for new image depends on file system, > force > +# qcow2 in this case, since the tested code is in block layer > +IMGFMT=qcow2 _make_test_img $size | _filter_imgfmt If it can't be tested, skip the test. Don't silently test something else. Stefan
[Qemu-devel] [PATCH for-1.7 5/5] acpi unit-test: compare DSDT and SSDT tables against expected values
This test will run only if iasl is installed on the host machine. The test plan: 1. Dumps the ACPI tables as AML on the disk. 2. Runs iasl to disassembly the tables into ASL files. 3. Compares them with expected offline ASL files. The test runs for both default machine and q35. In case the test fails, it can be easily tweaked to show the differences between the ASL files and understand the issue. Signed-off-by: Marcel Apfelbaum --- tests/acpi-test.c | 231 +- 1 file changed, 211 insertions(+), 20 deletions(-) diff --git a/tests/acpi-test.c b/tests/acpi-test.c index ca83b1d..d08720d 100644 --- a/tests/acpi-test.c +++ b/tests/acpi-test.c @@ -18,14 +18,26 @@ #include "qemu/compiler.h" #include "hw/i386/acpi-defs.h" +#define MACHINE_PC "pc" +#define MACHINE_Q35 "q35" + /* DSDT and SSDTs format */ typedef struct { AcpiTableHeader header; -uint8_t *aml; -int aml_len; -} AcpiSdtTable; +gchar *aml;/* aml bytecode from guest */ +gsize aml_len; +gchar *aml_file; +gchar *asl;/* asl code generated from aml */ +gsize asl_len; +gchar *asl_file; +gchar *exp_aml;/* expected asl from test file */ +gsize exp_aml_len; +gchar *exp_asl;/* expected asl from test file */ +gsize exp_asl_len; +} QEMU_PACKED AcpiSdtTable; typedef struct { +const char *machine; uint32_t rsdp_addr; AcpiRsdpDescriptor rsdp_table; AcpiRsdtDescriptorRev1 rsdt_table; @@ -33,8 +45,7 @@ typedef struct { AcpiFacsDescriptorRev1 facs_table; uint32_t *rsdt_tables_addr; int rsdt_tables_nr; -AcpiSdtTable dsdt_table; -GArray *ssdt_tables; +GArray *ssdt_tables; /* first is DSDT */ } test_data; #define LOW(x) ((x) & 0xff) @@ -91,8 +102,10 @@ typedef struct { /* Boot sector code: write SIGNATURE into memory, * then halt. + * Q35 machine requires a minimum 0x7e000 bytes disk. + * (bug or feature?) */ -static uint8_t boot_sector[0x200] = { +static uint8_t boot_sector[0x7e000] = { /* 7c00: mov $0xdead,%ax */ [0x00] = 0xb8, [0x01] = LOW(SIGNATURE), @@ -117,17 +130,37 @@ static uint8_t boot_sector[0x200] = { }; static const char *disk = "tests/acpi-test-disk.raw"; +static const char *data_dir = "tests/acpi-test-data"; static void free_test_data(test_data *data) { +AcpiSdtTable *temp; int i; g_free(data->rsdt_tables_addr); + for (i = 0; i < data->ssdt_tables->len; ++i) { -g_free(g_array_index(data->ssdt_tables, AcpiSdtTable, i).aml); +temp = &g_array_index(data->ssdt_tables, AcpiSdtTable, i); +if (temp->aml) { +g_free(temp->aml); +} +if (temp->aml_file) { +unlink(temp->aml_file); +g_free(temp->aml_file); +} +if (temp->asl) { +g_free(temp->asl); +} +if (temp->asl_file) { +unlink(temp->asl_file); +g_free(temp->asl_file); +} +if (temp->exp_asl) { +g_free(temp->exp_asl); +} } + g_array_free(data->ssdt_tables, false); -g_free(data->dsdt_table.aml); } static uint8_t acpi_checksum(const uint8_t *data, int len) @@ -292,34 +325,178 @@ static void test_dst_table(AcpiSdtTable *sdt_table, uint32_t addr) ACPI_READ_ARRAY_PTR(sdt_table->aml, sdt_table->aml_len, addr); checksum = acpi_checksum((uint8_t *)sdt_table, sizeof(AcpiTableHeader)) + - acpi_checksum(sdt_table->aml, sdt_table->aml_len); + acpi_checksum((uint8_t *)sdt_table->aml, sdt_table->aml_len); g_assert(!checksum); } static void test_acpi_dsdt_table(test_data *data) { -AcpiSdtTable *dsdt_table = &data->dsdt_table; +AcpiSdtTable dsdt_table; uint32_t addr = data->fadt_table.dsdt; -test_dst_table(dsdt_table, addr); -g_assert_cmphex(dsdt_table->header.signature, ==, ACPI_DSDT_SIGNATURE); +memset(&dsdt_table, 0, sizeof(dsdt_table)); +data->ssdt_tables = g_array_new(false, true, sizeof(AcpiSdtTable)); + +test_dst_table(&dsdt_table, addr); +g_assert_cmphex(dsdt_table.header.signature, ==, ACPI_DSDT_SIGNATURE); + +/* Place DSDT first */ +g_array_append_val(data->ssdt_tables, dsdt_table); } static void test_acpi_ssdt_tables(test_data *data) { -GArray *ssdt_tables; int ssdt_tables_nr = data->rsdt_tables_nr - 1; /* fadt is first */ int i; -ssdt_tables = g_array_sized_new(false, true, sizeof(AcpiSdtTable), -ssdt_tables_nr); for (i = 0; i < ssdt_tables_nr; i++) { AcpiSdtTable ssdt_table; + +memset(&ssdt_table, 0 , sizeof(ssdt_table)); uint32_t addr = data->rsdt_tables_addr[i + 1]; /* fadt is first */ test_dst_table(&ssdt_table, addr); -g_array_append_val(ssdt_tables, ssdt_table); +g_array_append_val(data->ssdt_tables, ssdt_table); +} +} + +static bool iasl_installed(void)
Re: [Qemu-devel] GIT master fails compilation for ACPI
Hu Tao wrote: On Thu, Nov 21, 2013 at 09:06:43AM +0100, Erik Rull wrote: Hi all, when doing a git clone on the latest master, it fails compiling: CCx86_64-softmmu/memory_mapping.o CCx86_64-softmmu/dump.o CCx86_64-softmmu/xen-stub.o CCx86_64-softmmu/hw/i386/multiboot.o CCx86_64-softmmu/hw/i386/smbios.o CCx86_64-softmmu/hw/i386/pc.o CCx86_64-softmmu/hw/i386/pc_piix.o CCx86_64-softmmu/hw/i386/pc_q35.o CCx86_64-softmmu/hw/i386/pc_sysfw.o CCx86_64-softmmu/hw/i386/kvmvapic.o CPP x86_64-softmmu/acpi-dsdt.dsl.i.orig ACPI_PREPROCESS x86_64-softmmu/acpi-dsdt.dsl.i IASL x86_64-softmmu/acpi-dsdt.dsl.i make[1]: *** [hw/i386/acpi-dsdt.hex] Error 1 make: *** [subdir-x86_64-softmmu] Error 2 erik@debian:~/qemu-test/qemu$ My configure options are: ./configure --prefix= --target-list=x86_64-softmmu --disable-vnc-png Are you sure about --prefix= and... --disable-vnc-jpeg --disable-vnc-tls --disable-vnc-sasl --audio-drv-list= ... --audio-drv-list= ? --enable-sdl --disable-xen --disable-brlapi --disable-bluez --disable-curl --disable-guest-agent --disable-spice I can't even do a configure with exactly the above options, it gives an error message: ERROR: unknown option --disable-vnc-sasl While removing --prefix= and --audio-drv-list=, the compilation goes fine. No, sorry, the error is still the same! I did a make distclean and redid the configure without the options you pointed out. Any further ideas? Are there more possibilities to debug why the compilation fails? Best regards, Erik
[Qemu-devel] [RFC PATCH 1/2] e1000: Use Address_Available bit as HW latch
e1000 provides a E1000_RAH_AV bit on every complete write to the Receive Address Register. We can use this bit 2 ways: 1) To trigger HMP notifications. When the bit is set the mac address is fully set and we can update the HMP. 2) We can turn off he bit on the write to low order bits of the Receive Address Register, so that we would not try to match received traffic to this address when it is not completely set. Signed-off-by: Vlad Yasevich --- hw/net/e1000.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/hw/net/e1000.c b/hw/net/e1000.c index ae63591..82978ea 100644 --- a/hw/net/e1000.c +++ b/hw/net/e1000.c @@ -1106,10 +1106,19 @@ mac_writereg(E1000State *s, int index, uint32_t val) s->mac_reg[index] = val; -if (index == RA || index == RA + 1) { +switch (index) { +case RA: +/* Mask off AV bit on the write of the low dword. The write of + * the high dword will set the bit. This way a half-written + * mac address will not be used to filter on rx. + */ +s->mac_reg[RA+1] &= ~E1000_RAH_AV; +break; +case (RA + 1): macaddr[0] = cpu_to_le32(s->mac_reg[RA]); macaddr[1] = cpu_to_le32(s->mac_reg[RA + 1]); qemu_format_nic_info_str(qemu_get_queue(s->nic), (uint8_t *)macaddr); +break; } } -- 1.8.4.2
[Qemu-devel] [RFC PATCH 0/2] Update HMP only upon mac change completion.
Recent threads regarding e1000/rtl8139 and mac address change notifications prompted some research into the respecitive hw data sheets as well as available drivers. What I found is that each hw has a mechanism that can be used by our emulation layer to determine when the mac address change has completed (when the OS finished writing the mac address), and we can use these mechanisms to trigger HMP notifications. This is only an RFC series. It's been tested and works well. I've split e1000 and rtl8139 changes as they are sufficiently different. e1000 make this very clean and easy, but rtl8139 isn't as nice. Please take a look and I'd like to hear your comments. Thanks -vlad Vlad Yasevich (2): e1000: Use Address_Available bit as HW latch rtl8139: update HMP only when the address is fully written hw/i386/pc_piix.c| 4 hw/i386/pc_q35.c | 4 hw/net/e1000.c | 11 ++- hw/net/rtl8139.c | 50 +- include/hw/i386/pc.h | 8 5 files changed, 75 insertions(+), 2 deletions(-) -- 1.8.4.2
Re: [Qemu-devel] [PATCH for-1.7] qdev-properties-system.c: Allow vlan or netdev for -device, not both
Il 21/11/2013 20:47, Vlad Yasevich ha scritto: > On 11/11/2013 02:50 AM, Paolo Bonzini wrote: >> Il 11/11/2013 06:18, Jason Wang ha scritto: >>> On 11/08/2013 10:13 AM, Vlad Yasevich wrote: It is currently possible to specify things like: -device e1000,netdev=foo,vlan=1 With this usage, whichever argument was specified last (vlan or netdev) overwrites what was previousely set and results in a non-working configuration. Even worse, when used with multiqueue devices, it causes a segmentation fault on exit in qemu_free_net_client. That patch treates the above command line options as invalid and generates an error at start-up. Signed-off-by: Vlad Yasevich --- hw/core/qdev-properties-system.c | 9 + 1 file changed, 9 insertions(+) diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c index 0eada32..729efa8 100644 --- a/hw/core/qdev-properties-system.c +++ b/hw/core/qdev-properties-system.c @@ -205,6 +205,11 @@ static int parse_netdev(DeviceState *dev, const char *str, void **ptr) goto err; } +if (ncs[i]) { +ret = -EINVAL; +goto err; +} + ncs[i] = peers[i]; ncs[i]->queue_index = i; } @@ -301,6 +306,10 @@ static void set_vlan(Object *obj, Visitor *v, void *opaque, *ptr = NULL; return; } +if (*ptr) { +error_set_from_qdev_prop_error(errp, -EINVAL, dev, prop, name); +return; +} hubport = net_hub_port_find(id); if (!hubport) { >>> >>> Acked-by: Jason Wang >> >> This patch is good for 1.7. >> >> Paolo >> > > Hi All > > Just wondering what's to become of this patch? It has an ACK, but has > been abandoned. It does a fix a crash in qemu. I missed this when going through pending patches. Stefan, are you picking it up? Paolo
Re: [Qemu-devel] [PATCH 1/2] block: Enable BDRV_O_SNAPSHOT with driver-specific options
Am 21.11.2013 um 15:33 hat Stefan Hajnoczi geschrieben: > On Tue, Nov 19, 2013 at 04:37:27PM +0100, Kevin Wolf wrote: > > @@ -1114,6 +1093,24 @@ int bdrv_open(BlockDriverState *bs, const char > > *filename, QDict *options, > > goto fail; > > } > > > > +/* Prepare a new options QDict for the temporary file, where user > > + * options refer to the backing file */ > > +if (!options) { > > +options = qdict_new(); > > +} > > You can drop this because options is never NULL: > > options = qdict_clone_shallow(options); > > /* For snapshot=on, create a temporary qcow2 overlay */ > if (flags & BDRV_O_SNAPSHOT) { > ... You're right, I'll remove it in v2. > > +if (filename) { > > +qdict_put(options, "file.filename", > > qstring_from_str(filename)); > > +} > > +if (drv) { > > +qdict_put(options, "driver", > > qstring_from_str(drv->format_name)); > > +} > > + > > +snapshot_options = qdict_new(); > > +qdict_put(snapshot_options, "backing", options); > > +qdict_flatten(snapshot_options); > > + > > +options = snapshot_options; > > One thing I'm not sure about after these operations have been performed: > > bs->options = options; > ... > if (flags & BDRV_O_SNAPSHOT) { > ... > > So bs->options does not reflect what we ended up with in the > BDRV_O_SNAPSHOT case. > > But git grep -- '->options' shows no users of this field. Therefore it > won't cause a problem yet. But can you explain what's going on here? > Either we should keep bs->options up-to-date or we should drop the > field. I think bs->options was meant to be used in cases where we reopen a BDS. If it's currently unused, I guess this means we have some bugs. :-) Should bs->options be the original options passed by the user or the ones modified by BDRV_O_SNAPSHOT? I'm not completely sure, but I think the modified ones make more sense; it would also make it easier to move the whole BDRV_O_SNAPSHOT logic out to drive_init (which I plan to do in a next step). Kevin
[Qemu-devel] [PULL 11/11] qga: Fix compiler warnings (missing format attribute, wrong format strings)
From: Stefan Weil gcc 4.8.2 reports this warning when extra warnings are enabled (-Wextra): CCqga/commands.o qga/commands.c: In function ‘slog’: qga/commands.c:28:5: error: function might be possible candidate for ‘gnu_printf’ format attribute [-Werror=suggest-attribute=format] g_logv("syslog", G_LOG_LEVEL_INFO, fmt, ap); ^ gcc 4.8.2 reports this warning when slog is declared with the gnu_printf format attribute: qga/commands-posix.c: In function ‘qmp_guest_file_open’: qga/commands-posix.c:404:5: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘int64_t’ [-Wformat=] slog("guest-file-open, handle: %d", handle); ^ On 32 bit hosts there are three more warnings which are also fixed here. Signed-off-by: Stefan Weil Signed-off-by: Paolo Bonzini --- qga/commands-posix.c | 8 qga/guest-agent-core.h | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/qga/commands-posix.c b/qga/commands-posix.c index 10682f5..8100bee 100644 --- a/qga/commands-posix.c +++ b/qga/commands-posix.c @@ -401,7 +401,7 @@ int64_t qmp_guest_file_open(const char *path, bool has_mode, const char *mode, E return -1; } -slog("guest-file-open, handle: %d", handle); +slog("guest-file-open, handle: %" PRId64, handle); return handle; } @@ -410,7 +410,7 @@ void qmp_guest_file_close(int64_t handle, Error **err) GuestFileHandle *gfh = guest_file_handle_find(handle, err); int ret; -slog("guest-file-close called, handle: %ld", handle); +slog("guest-file-close called, handle: %" PRId64, handle); if (!gfh) { return; } @@ -451,7 +451,7 @@ struct GuestFileRead *qmp_guest_file_read(int64_t handle, bool has_count, read_count = fread(buf, 1, count, fh); if (ferror(fh)) { error_setg_errno(err, errno, "failed to read file"); -slog("guest-file-read failed, handle: %ld", handle); +slog("guest-file-read failed, handle: %" PRId64, handle); } else { buf[read_count] = 0; read_data = g_malloc0(sizeof(GuestFileRead)); @@ -496,7 +496,7 @@ GuestFileWrite *qmp_guest_file_write(int64_t handle, const char *buf_b64, write_count = fwrite(buf, 1, count, fh); if (ferror(fh)) { error_setg_errno(err, errno, "failed to write to file"); -slog("guest-file-write failed, handle: %ld", handle); +slog("guest-file-write failed, handle: %" PRId64, handle); } else { write_data = g_malloc0(sizeof(GuestFileWrite)); write_data->count = write_count; diff --git a/qga/guest-agent-core.h b/qga/guest-agent-core.h index 624a559..e422208 100644 --- a/qga/guest-agent-core.h +++ b/qga/guest-agent-core.h @@ -29,7 +29,7 @@ GACommandState *ga_command_state_new(void); bool ga_logging_enabled(GAState *s); void ga_disable_logging(GAState *s); void ga_enable_logging(GAState *s); -void slog(const gchar *fmt, ...); +void GCC_FMT_ATTR(1, 2) slog(const gchar *fmt, ...); void ga_set_response_delimited(GAState *s); bool ga_is_frozen(GAState *s); void ga_set_frozen(GAState *s); -- 1.8.3.1
[Qemu-devel] [PULL 01/11] sun4m: Add FCode ROM for TCX framebuffer
From: Mark Cave-Ayland Upstream OpenBIOS now implements SBus probing in order to determine the contents of a physical bus slot, which is required to allow OpenBIOS to identify the framebuffer without help from the fw_cfg interface. SBus probing works by detecting the presence of an FCode program (effectively tokenised Forth) at the base address of each slot, and if present executes it so that it creates its own device node in the OpenBIOS device tree. The FCode ROM is generated as part of the OpenBIOS build and should generally be updated at the same time. Signed-off-by: Mark Cave-Ayland CC: Blue Swirl CC: Bob Breuer CC: Artyom Tarasenko Signed-off-by: Paolo Bonzini --- Makefile | 2 +- hw/display/tcx.c | 26 +- hw/sparc/sun4m.c | 17 ++--- pc-bios/QEMU,tcx.bin | Bin 0 -> 1242 bytes pc-bios/README | 4 ++-- 5 files changed, 38 insertions(+), 11 deletions(-) create mode 100644 pc-bios/QEMU,tcx.bin diff --git a/Makefile b/Makefile index 3321b98..bdff4e4 100644 --- a/Makefile +++ b/Makefile @@ -293,7 +293,7 @@ ifdef INSTALL_BLOBS BLOBS=bios.bin sgabios.bin vgabios.bin vgabios-cirrus.bin \ vgabios-stdvga.bin vgabios-vmware.bin vgabios-qxl.bin \ acpi-dsdt.aml q35-acpi-dsdt.aml \ -ppc_rom.bin openbios-sparc32 openbios-sparc64 openbios-ppc \ +ppc_rom.bin openbios-sparc32 openbios-sparc64 openbios-ppc QEMU,tcx.bin \ pxe-e1000.rom pxe-eepro100.rom pxe-ne2k_pci.rom \ pxe-pcnet.rom pxe-rtl8139.rom pxe-virtio.rom \ efi-e1000.rom efi-eepro100.rom efi-ne2k_pci.rom \ diff --git a/hw/display/tcx.c b/hw/display/tcx.c index 24876d3..873b82c 100644 --- a/hw/display/tcx.c +++ b/hw/display/tcx.c @@ -25,8 +25,12 @@ #include "qemu-common.h" #include "ui/console.h" #include "ui/pixel_ops.h" +#include "hw/loader.h" #include "hw/sysbus.h" +#define TCX_ROM_FILE "QEMU,tcx.bin" +#define FCODE_MAX_ROM_SIZE 0x1 + #define MAXX 1024 #define MAXY 768 #define TCX_DAC_NREGS 16 @@ -43,6 +47,8 @@ typedef struct TCXState { QemuConsole *con; uint8_t *vram; uint32_t *vram24, *cplane; +hwaddr prom_addr; +MemoryRegion rom; MemoryRegion vram_mem; MemoryRegion vram_8bit; MemoryRegion vram_24bit; @@ -529,14 +535,31 @@ static int tcx_init1(SysBusDevice *dev) { TCXState *s = TCX(dev); ram_addr_t vram_offset = 0; -int size; +int size, ret; uint8_t *vram_base; +char *fcode_filename; memory_region_init_ram(&s->vram_mem, OBJECT(s), "tcx.vram", s->vram_size * (1 + 4 + 4)); vmstate_register_ram_global(&s->vram_mem); vram_base = memory_region_get_ram_ptr(&s->vram_mem); +/* FCode ROM */ +memory_region_init_ram(&s->rom, NULL, "tcx.prom", FCODE_MAX_ROM_SIZE); +vmstate_register_ram_global(&s->rom); +memory_region_set_readonly(&s->rom, true); +sysbus_init_mmio(dev, &s->rom); + +fcode_filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, TCX_ROM_FILE); +if (fcode_filename) { +ret = load_image_targphys(fcode_filename, s->prom_addr, + FCODE_MAX_ROM_SIZE); +if (ret < 0 || ret > FCODE_MAX_ROM_SIZE) { +fprintf(stderr, "tcx: could not load prom '%s'\n", TCX_ROM_FILE); +return -1; +} +} + /* 8-bit plane */ s->vram = vram_base; size = s->vram_size; @@ -598,6 +621,7 @@ static Property tcx_properties[] = { DEFINE_PROP_UINT16("width",TCXState, width, -1), DEFINE_PROP_UINT16("height", TCXState, height,-1), DEFINE_PROP_UINT16("depth",TCXState, depth, -1), +DEFINE_PROP_HEX64("prom_addr", TCXState, prom_addr, -1), DEFINE_PROP_END_OF_LIST(), }; diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c index a0d366c..94f7950 100644 --- a/hw/sparc/sun4m.c +++ b/hw/sparc/sun4m.c @@ -537,24 +537,27 @@ static void tcx_init(hwaddr addr, int vram_size, int width, qdev_prop_set_uint16(dev, "width", width); qdev_prop_set_uint16(dev, "height", height); qdev_prop_set_uint16(dev, "depth", depth); +qdev_prop_set_uint64(dev, "prom_addr", addr); qdev_init_nofail(dev); s = SYS_BUS_DEVICE(dev); +/* FCode ROM */ +sysbus_mmio_map(s, 0, addr); /* 8-bit plane */ -sysbus_mmio_map(s, 0, addr + 0x0080ULL); +sysbus_mmio_map(s, 1, addr + 0x0080ULL); /* DAC */ -sysbus_mmio_map(s, 1, addr + 0x0020ULL); +sysbus_mmio_map(s, 2, addr + 0x0020ULL); /* TEC (dummy) */ -sysbus_mmio_map(s, 2, addr + 0x0070ULL); +sysbus_mmio_map(s, 3, addr + 0x0070ULL); /* THC 24 bit: NetBSD writes here even with 8-bit display: dummy */ -sysbus_mmio_map(s, 3, addr + 0x00301000ULL); +sysbus_mmio_map(s, 4, addr + 0x00301000ULL); if (depth == 24) { /* 24-bit plane */ -sysbus_mmio_map(s, 4, addr + 0x0200ULL); +sysbus_mmio_map(s, 5, addr + 0x0200ULL); /* Control plane */ -sysbus_mmio_map(s, 5, addr + 0x0a00
Re: [Qemu-devel] [PATCH for-1.7] qdev-properties-system.c: Allow vlan or netdev for -device, not both
On 11/11/2013 02:50 AM, Paolo Bonzini wrote: > Il 11/11/2013 06:18, Jason Wang ha scritto: >> On 11/08/2013 10:13 AM, Vlad Yasevich wrote: >>> It is currently possible to specify things like: >>> -device e1000,netdev=foo,vlan=1 >>> With this usage, whichever argument was specified last (vlan or netdev) >>> overwrites what was previousely set and results in a non-working >>> configuration. Even worse, when used with multiqueue devices, >>> it causes a segmentation fault on exit in qemu_free_net_client. >>> >>> That patch treates the above command line options as invalid and >>> generates an error at start-up. >>> >>> Signed-off-by: Vlad Yasevich >>> --- >>> hw/core/qdev-properties-system.c | 9 + >>> 1 file changed, 9 insertions(+) >>> >>> diff --git a/hw/core/qdev-properties-system.c >>> b/hw/core/qdev-properties-system.c >>> index 0eada32..729efa8 100644 >>> --- a/hw/core/qdev-properties-system.c >>> +++ b/hw/core/qdev-properties-system.c >>> @@ -205,6 +205,11 @@ static int parse_netdev(DeviceState *dev, const char >>> *str, void **ptr) >>> goto err; >>> } >>> >>> +if (ncs[i]) { >>> +ret = -EINVAL; >>> +goto err; >>> +} >>> + >>> ncs[i] = peers[i]; >>> ncs[i]->queue_index = i; >>> } >>> @@ -301,6 +306,10 @@ static void set_vlan(Object *obj, Visitor *v, void >>> *opaque, >>> *ptr = NULL; >>> return; >>> } >>> +if (*ptr) { >>> +error_set_from_qdev_prop_error(errp, -EINVAL, dev, prop, name); >>> +return; >>> +} >>> >>> hubport = net_hub_port_find(id); >>> if (!hubport) { >> >> Acked-by: Jason Wang > > This patch is good for 1.7. > > Paolo > Hi All Just wondering what's to become of this patch? It has an ACK, but has been abandoned. It does a fix a crash in qemu. Thanks -vlad
[Qemu-devel] [PATCH] seccomp: add kill() to the syscall whitelist
The kill() syscall is triggered with the following command: # qemu -sandbox on -monitor stdio \ -device intel-hda -device hda-duplex -vnc :0 The resulting syslog/audit message: # ausearch -m SECCOMP time->Wed Nov 20 09:52:08 2013 type=SECCOMP msg=audit(1384912328.482:6656): auid=0 uid=0 gid=0 ses=854 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=12087 comm="qemu-kvm" sig=31 syscall=62 compat=0 ip=0x7f7a1d2abc67 code=0x0 # scmp_sys_resolver 62 kill Reported-by: CongLi Tested-by: CongLi Signed-off-by: Paul Moore --- qemu-seccomp.c |1 + 1 file changed, 1 insertion(+) diff --git a/qemu-seccomp.c b/qemu-seccomp.c index 69cee44..cf07869 100644 --- a/qemu-seccomp.c +++ b/qemu-seccomp.c @@ -114,6 +114,7 @@ static const struct QemuSeccompSyscall seccomp_whitelist[] = { { SCMP_SYS(write), 244 }, { SCMP_SYS(fcntl), 243 }, { SCMP_SYS(tgkill), 242 }, +{ SCMP_SYS(kill), 242 }, { SCMP_SYS(rt_sigaction), 242 }, { SCMP_SYS(pipe2), 242 }, { SCMP_SYS(munmap), 242 },
[Qemu-devel] [PATCH for-1.7 4/5] configure: added acpi unit-test files
Ensure configure will set-up links for the files if the build is created in other directory. Signed-off-by: Marcel Apfelbaum --- configure | 4 1 file changed, 4 insertions(+) diff --git a/configure b/configure index 9addff1..127e287 100755 --- a/configure +++ b/configure @@ -4665,6 +4665,10 @@ for bios_file in \ do FILES="$FILES pc-bios/`basename $bios_file`" done +for test_file in `find $source_path/tests/acpi-test-data -type f` +do +FILES="$FILES tests/acpi-test-data`echo $test_file | sed -e 's/.*acpi-test-data//'`" +done mkdir -p $DIRS for f in $FILES ; do if [ -e "$source_path/$f" ] && [ "$source_path" != `pwd` ]; then -- 1.8.3.1
[Qemu-devel] [PATCH for-1.7 1/5] acpi unit-test: load and check facs table
FACS table does not have a checksum, so we can check at least the signature (existence). Signed-off-by: Marcel Apfelbaum --- tests/acpi-test.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/tests/acpi-test.c b/tests/acpi-test.c index d6ff66f..43775cd 100644 --- a/tests/acpi-test.c +++ b/tests/acpi-test.c @@ -30,6 +30,7 @@ typedef struct { AcpiRsdpDescriptor rsdp_table; AcpiRsdtDescriptorRev1 rsdt_table; AcpiFadtDescriptorRev1 fadt_table; +AcpiFacsDescriptorRev1 facs_table; uint32_t *rsdt_tables_addr; int rsdt_tables_nr; AcpiSdtTable dsdt_table; @@ -252,6 +253,22 @@ static void test_acpi_fadt_table(test_data *data) g_assert(!acpi_checksum((uint8_t *)fadt_table, fadt_table->length)); } +static void test_acpi_facs_table(test_data *data) +{ +AcpiFacsDescriptorRev1 *facs_table = &data->facs_table; +uint32_t addr = data->fadt_table.firmware_ctrl; + +ACPI_READ_FIELD(facs_table->signature, addr); +ACPI_READ_FIELD(facs_table->length, addr); +ACPI_READ_FIELD(facs_table->hardware_signature, addr); +ACPI_READ_FIELD(facs_table->firmware_waking_vector, addr); +ACPI_READ_FIELD(facs_table->global_lock, addr); +ACPI_READ_FIELD(facs_table->flags, addr); +ACPI_READ_ARRAY(facs_table->resverved3, addr); + +g_assert_cmphex(facs_table->signature, ==, ACPI_FACS_SIGNATURE); +} + static void test_dst_table(AcpiSdtTable *sdt_table, uint32_t addr) { uint8_t checksum; @@ -329,6 +346,7 @@ static void test_acpi_one(const char *params) test_acpi_rsdp_table(&data); test_acpi_rsdt_table(&data); test_acpi_fadt_table(&data); +test_acpi_facs_table(data); test_acpi_dsdt_table(&data); test_acpi_ssdt_tables(&data); -- 1.8.3.1
[Qemu-devel] [PATCH for-1.7 0/5] acpi unit-test: added tests
Added 2 tests: 1. Basic check of FACS table (missed on prev submission) 2. Compare DSDT and SSDT tables against expected values Test 2: - runs only if iasl is installed on the host machine. - the test plan: 1. Dumps the ACPI tables as AML on the disk. 2. Runs iasl to disassembly the tables into ASL files. 3. Compares them with expected offline ASL files. - the test runs for both default machine and q35. - in case the test fails, it can be easily tweaked to show the differences between the ASL files and understand the issue. Patches: 1/5 - test 1 2/5 - some infrastructure improvements 3/5 - expected asl files for test 2 4/5 - creates links for the expected files if the build directory is not current 5/5 - test 2 Marcel Apfelbaum (5): acpi unit-test: load and check facs table acpi unit-test: adjust the test data structure for better handling acpi unit-test: add test files configure: added acpi unit-test files acpi unit-test: compare DSDT and SSDT tables against expected values configure |4 + tests/acpi-test-data/pc/APIC.dsl | 103 ++ tests/acpi-test-data/pc/DSDT.dsl | 1870 ++ tests/acpi-test-data/pc/FACP.dsl | 99 ++ tests/acpi-test-data/pc/FACS.dsl | 32 + tests/acpi-test-data/pc/HPET.dsl | 43 + tests/acpi-test-data/pc/SSDT.dsl | 634 tests/acpi-test-data/q35/APIC.dsl | 103 ++ tests/acpi-test-data/q35/DSDT.dsl | 3197 + tests/acpi-test-data/q35/FACP.dsl | 99 ++ tests/acpi-test-data/q35/FACS.dsl | 32 + tests/acpi-test-data/q35/HPET.dsl | 43 + tests/acpi-test-data/q35/MCFG.dsl | 36 + tests/acpi-test-data/q35/SSDT.dsl | 665 tests/acpi-test.c | 282 +++- 15 files changed, 7210 insertions(+), 32 deletions(-) create mode 100644 tests/acpi-test-data/pc/APIC.dsl create mode 100644 tests/acpi-test-data/pc/DSDT.dsl create mode 100644 tests/acpi-test-data/pc/FACP.dsl create mode 100644 tests/acpi-test-data/pc/FACS.dsl create mode 100644 tests/acpi-test-data/pc/HPET.dsl create mode 100644 tests/acpi-test-data/pc/SSDT.dsl create mode 100644 tests/acpi-test-data/q35/APIC.dsl create mode 100644 tests/acpi-test-data/q35/DSDT.dsl create mode 100644 tests/acpi-test-data/q35/FACP.dsl create mode 100644 tests/acpi-test-data/q35/FACS.dsl create mode 100644 tests/acpi-test-data/q35/HPET.dsl create mode 100644 tests/acpi-test-data/q35/MCFG.dsl create mode 100644 tests/acpi-test-data/q35/SSDT.dsl -- 1.8.3.1
[Qemu-devel] [PATCH for-1.7 2/5] acpi unit-test: adjust the test data structure for better handling
Ensure more then one instance of test_data may exist at a given time. It will help to compare different acpi table versions. Signed-off-by: Marcel Apfelbaum --- tests/acpi-test.c | 55 --- 1 file changed, 32 insertions(+), 23 deletions(-) diff --git a/tests/acpi-test.c b/tests/acpi-test.c index 43775cd..ca83b1d 100644 --- a/tests/acpi-test.c +++ b/tests/acpi-test.c @@ -34,7 +34,7 @@ typedef struct { uint32_t *rsdt_tables_addr; int rsdt_tables_nr; AcpiSdtTable dsdt_table; -AcpiSdtTable *ssdt_tables; +GArray *ssdt_tables; } test_data; #define LOW(x) ((x) & 0xff) @@ -118,6 +118,18 @@ static uint8_t boot_sector[0x200] = { static const char *disk = "tests/acpi-test-disk.raw"; +static void free_test_data(test_data *data) +{ +int i; + +g_free(data->rsdt_tables_addr); +for (i = 0; i < data->ssdt_tables->len; ++i) { +g_free(g_array_index(data->ssdt_tables, AcpiSdtTable, i).aml); +} +g_array_free(data->ssdt_tables, false); +g_free(data->dsdt_table.aml); +} + static uint8_t acpi_checksum(const uint8_t *data, int len) { int i; @@ -295,30 +307,30 @@ static void test_acpi_dsdt_table(test_data *data) static void test_acpi_ssdt_tables(test_data *data) { -AcpiSdtTable *ssdt_tables; +GArray *ssdt_tables; int ssdt_tables_nr = data->rsdt_tables_nr - 1; /* fadt is first */ int i; -ssdt_tables = g_new0(AcpiSdtTable, ssdt_tables_nr); +ssdt_tables = g_array_sized_new(false, true, sizeof(AcpiSdtTable), +ssdt_tables_nr); for (i = 0; i < ssdt_tables_nr; i++) { -AcpiSdtTable *ssdt_table = &ssdt_tables[i]; +AcpiSdtTable ssdt_table; uint32_t addr = data->rsdt_tables_addr[i + 1]; /* fadt is first */ - -test_dst_table(ssdt_table, addr); +test_dst_table(&ssdt_table, addr); +g_array_append_val(ssdt_tables, ssdt_table); } data->ssdt_tables = ssdt_tables; } -static void test_acpi_one(const char *params) +static void test_acpi_one(const char *params, test_data *data) { char *args; uint8_t signature_low; uint8_t signature_high; uint16_t signature; int i; -test_data data; -memset(&data, 0, sizeof(data)); +memset(data, 0, sizeof(*data)); args = g_strdup_printf("-net none -display none %s %s", params ? params : "", disk); qtest_start(args); @@ -342,20 +354,13 @@ static void test_acpi_one(const char *params) } g_assert_cmphex(signature, ==, SIGNATURE); -test_acpi_rsdp_address(&data); -test_acpi_rsdp_table(&data); -test_acpi_rsdt_table(&data); -test_acpi_fadt_table(&data); +test_acpi_rsdp_address(data); +test_acpi_rsdp_table(data); +test_acpi_rsdt_table(data); +test_acpi_fadt_table(data); test_acpi_facs_table(data); -test_acpi_dsdt_table(&data); -test_acpi_ssdt_tables(&data); - -g_free(data.rsdt_tables_addr); -for (i = 0; i < (data.rsdt_tables_nr - 1); ++i) { -g_free(data.ssdt_tables[i].aml); -} -g_free(data.ssdt_tables); -g_free(data.dsdt_table.aml); +test_acpi_dsdt_table(data); +test_acpi_ssdt_tables(data); qtest_quit(global_qtest); g_free(args); @@ -363,10 +368,14 @@ static void test_acpi_one(const char *params) static void test_acpi_tcg(void) { +test_data data; + /* Supplying -machine accel argument overrides the default (qtest). * This is to make guest actually run. */ -test_acpi_one("-machine accel=tcg"); +test_acpi_one("-machine accel=tcg", &data); + +free_test_data(&data); } int main(int argc, char *argv[]) -- 1.8.3.1
[Qemu-devel] [PATCH 3/6] qemu-option: Add qemu_config_parse_qdict()
This function basically parses command-line options given as a QDict replacing a config file. For instance, the QDict {"section.opt1": 42, "section.opt2": 23} corresponds to the config file: [section] opt1 = 42 opt2 = 23 It is possible to specify multiple sections and also multiple sections of the same type. On the command line, this looks like the following: inject-error.0.event=reftable_load,\ inject-error.1.event=l2_load,\ set-state.event=l1_update This would correspond to the following config file: [inject-error "inject-error.0"] event = reftable_load [inject-error "inject-error.1"] event = l2_load [set-state] event = l1_update Signed-off-by: Max Reitz --- include/qemu/config-file.h | 6 +++ util/qemu-config.c | 111 + 2 files changed, 117 insertions(+) diff --git a/include/qemu/config-file.h b/include/qemu/config-file.h index 508428f..dbd97c4 100644 --- a/include/qemu/config-file.h +++ b/include/qemu/config-file.h @@ -4,6 +4,7 @@ #include #include "qemu/option.h" #include "qapi/error.h" +#include "qapi/qmp/qdict.h" QemuOptsList *qemu_find_opts(const char *group); QemuOptsList *qemu_find_opts_err(const char *group, Error **errp); @@ -18,6 +19,11 @@ int qemu_config_parse(FILE *fp, QemuOptsList **lists, const char *fname); int qemu_read_config_file(const char *filename); +/* Parse QDict options as a replacement for a config file (allowing multiple + enumerated (0..(n-1)) configuration "sections") */ +void qemu_config_parse_qdict(QDict *options, QemuOptsList **lists, + Error **errp); + /* Read default QEMU config files */ int qemu_read_default_config_files(bool userconfig); diff --git a/util/qemu-config.c b/util/qemu-config.c index 04da942..80d82d4 100644 --- a/util/qemu-config.c +++ b/util/qemu-config.c @@ -356,3 +356,114 @@ int qemu_read_config_file(const char *filename) return -EINVAL; } } + +static void config_parse_qdict_section(QDict *options, QemuOptsList *opts, + Error **errp) +{ +QemuOpts *subopts, *subopts_i; +QDict *subqdict, *subqdict_i = NULL; +char *prefix = g_strdup_printf("%s.", opts->name); +Error *local_err = NULL; +size_t orig_size, enum_size; + +qdict_extract_subqdict(options, &subqdict, prefix); +orig_size = qdict_size(subqdict); +if (!orig_size) { +goto out; +} + +subopts = qemu_opts_create_nofail(opts); +qemu_opts_absorb_qdict(subopts, subqdict, &local_err); +if (error_is_set(&local_err)) { +error_propagate(errp, local_err); +goto out; +} + +enum_size = qdict_size(subqdict); +if (enum_size < orig_size && enum_size) { +error_setg(errp, "Unknown option '%s' for '%s'", + qdict_first(subqdict)->key, opts->name); +goto out; +} + +if (enum_size) { +/* Multiple, enumerated rules */ +int i; + +/* Not required anymore */ +qemu_opts_del(subopts); + +for (i = 0; i < INT_MAX; i++) { +char i_prefix[32], opt_name[48]; +size_t snprintf_ret; + +snprintf_ret = snprintf(i_prefix, 32, "%i.", i); +assert(snprintf_ret < 32); + +snprintf_ret = snprintf(opt_name, 48, "%s.%i", opts->name, i); +assert(snprintf_ret < 48); + +qdict_extract_subqdict(subqdict, &subqdict_i, i_prefix); +if (!qdict_size(subqdict_i)) { +break; +} + +subopts_i = qemu_opts_create(opts, opt_name, 1, NULL); +if (!subopts_i) { +error_setg(errp, "Option ID '%s' already in use", opt_name); +goto out; +} + +qemu_opts_absorb_qdict(subopts_i, subqdict_i, &local_err); +if (error_is_set(&local_err)) { +error_propagate(errp, local_err); +qemu_opts_del(subopts_i); +goto out; +} + +if (qdict_size(subqdict_i)) { +error_setg(errp, "[%s] rules don't support the option '%s'", + opts->name, qdict_first(subqdict_i)->key); +qemu_opts_del(subopts_i); +goto out; +} + +/* +if (add_rule(subopts_i, (void *)ard) < 0) { +error_setg(errp, "Could not add [%s] rule %i", opts->name, i); +goto out; +} +*/ + +QDECREF(subqdict_i); +subqdict_i = NULL; +} + +if (qdict_size(subqdict)) { +error_setg(errp, "Unused option '%s' for [%s]", + qdict_first(subqdict)->key, opts->name); +goto out; +} +} + +out: +QDECREF(subqdict); +QDECREF(subqdict_i); + +free(prefix); +} + +void qemu_config_parse_qdict(QDict *options, QemuOptsList **lists, + Error **errp) +{ +int i; +
[Qemu-devel] [PATCH 6/6] blkverify: Don't require protocol filename
If the filename is not prefixed by "blkverify:" in blkverify_parse_filename(), the blkverify driver was not selected through that protocol prefix, but by an explicit command line option (like file.driver=blkverify). Contrary to the current reaction, this is not really a problem; the whole filename just has to be stored (in the x-image option) and the user has to manually specify the x-raw option. Signed-off-by: Max Reitz --- The x-raw option is undocumented and may be removed at any point in time; but I don't see a reason why this should happen. Passing the reference filename through an option maybe was a bit dirty in the past; but by “exposing” it in the way this patch does, it becomes probably the only clean solution. The only problem I can see right now is the naming of the option, but we should always be able to leave it as an alias, if necessary. --- block/blkverify.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/block/blkverify.c b/block/blkverify.c index 3c63528..bdbdd68 100644 --- a/block/blkverify.c +++ b/block/blkverify.c @@ -78,7 +78,9 @@ static void blkverify_parse_filename(const char *filename, QDict *options, /* Parse the blkverify: prefix */ if (!strstart(filename, "blkverify:", &filename)) { -error_setg(errp, "File name string must start with 'blkverify:'"); +/* There was no prefix; therefore, all options have to be already + present in the QDict (except for the filename) */ +qdict_put(options, "x-image", qstring_from_str(filename)); return; } -- 1.8.4.2
[Qemu-devel] [PATCH 0/6] blkdebug/blkverify: Allow command-line configuration
Currently, the configuration of blkdebug and blkverify is done through the "filename" alone. There is now way of manually choosing blkdebug or blkverify as a driver and using a normal image filename. In the case of blkdebug, the filename starts with the protocol prefix, follows up with the name of a configuration file and ends with the name of the image file. In the case of blkverify, the filename starts with the protocol prefix, follows up with the raw reference image filename and ends with the name of the image file. This patch allows the configuration of both drivers completely through command-line options. The driver has to be selected through the file.driver option (or similar), the image filename has to be given as the filename (obviously) and, depending on the driver, further options have to be given to control the behavior. In case of blkverify, the x-raw option specifies the name of the raw reference image file. In case of blkdebug, one may either set the config option to the filename of a configuration file, or the contents of the configuration file may be given directly on the command line (see description of patch 3 for an example). Max Reitz (6): blkdebug: Use errp for read_config() blkdebug: Don't require sophisticated filename qemu-option: Add qemu_config_parse_qdict() blkdebug: Always call read_config() blkdebug: Use command-line in read_config() blkverify: Don't require protocol filename block/blkdebug.c | 50 +--- block/blkverify.c | 4 +- include/qemu/config-file.h | 6 +++ util/qemu-config.c | 111 + 4 files changed, 154 insertions(+), 17 deletions(-) -- 1.8.4.2
Re: [Qemu-devel] [ANNOUNCE] QEMU 1.7.0-rc1 is now available
Sorry for the delays around this release. I finally have proper internet access at home. I've updated the wiki and compressed the schedule for 1.7.0 should go out on Wednesday. I went through patches this morning and believe I have everything committed that was targeted for 1.7 (minus Paolo's pull request). Please confirm and ping anything that isn't committed as appropriate. Regards, Anthony Liguori On Thu, Nov 21, 2013 at 10:42 AM, Anthony Liguori wrote: > Hi, > > On behalf of the QEMU Team, I'd like to announce the availability of the > second release candidate for the QEMU 1.7 release. This release is meant > for testing purposes and should not be used in a production environment. > > http://wiki.qemu.org/download/qemu-1.7.0-rc1.tar.bz2 > > You can help improve the quality of the QEMU 1.7 release by testing this > release and reporting bugs on Launchpad: > > https://bugs.launchpad.net/qemu/ > > The release plan for the 1.7 release is available at: > > http://wiki.qemu.org/Planning/1.7 > > Please add entries to the ChangeLog for the 1.7 release below: > > http://wiki.qemu.org/ChangeLog/Next > > The following changes have been made since v1.7.0-rc0: > > - vfio-pci: Fix multifunction=on (Alex Williamson) > - target-i386: Fix addr32 prefix in gen_lea_modrm (Richard Henderson) > - atomic.h: Fix build with clang (Peter Maydell) > - target-i386: do not override nr_cores for -cpu host (Paolo Bonzini) > - mips jazz: do not raise data bus exception when accessing invalid > addresses (Hervé Poussineau) > - target-i386: yield to another VCPU on PAUSE (Paolo Bonzini) > - rng-egd: offset the point when repeatedly read from the buffer (Amos Kong) > - rng-egd: remove redundant free (Amos Kong) > - virtio-rng: add check of period (Amos Kong) > - s390x: fix flat file load on 32 bit systems (Michael S. Tsirkin) > - acpi-build: fix build on glib < 2.14 (Michael S. Tsirkin) > - acpi-build: fix build on glib < 2.22 (Michael S. Tsirkin) > - target-openrisc: Correct carry flag check of l.addc and l.addic test cases > (Sebastian Macke) > - target-openrisc: Correct memory bounds checking for the tlb buffers > (Sebastian Macke) > - openrisc-timer: Reduce overhead, Separate clock update functions > (Sebastian Macke) > - target-openrisc: Correct wrong epcr register in interrupt handler > (Sebastian Macke) > - target-openrisc: Remove executable flag for every page (Sebastian Macke) > - target-openrisc: Remove unnecessary code generated by jump instructions > (Sebastian Macke) > - target-openrisc: Speed up move instruction (Sebastian Macke) > - The calculation of bytes_xfer in qemu_put_buffer() is wrong (Wangting > (Kathy)) > - migration: drop MADVISE_DONT_NEED for incoming zero pages (Peter Lieven) > - qom: Fix memory leak in object_property_set_link() (Vlad Yasevich) > - qtest: Use -display none by default (Andreas Färber) > - virtio-net: fix the memory leak in rxfilter_notify() (Amos Kong) > - doc: fix hardcoded helper path (Amos Kong) > - tcg-ia64: Introduce tcg_opc_bswap64_i (Richard Henderson) > - tcg-ia64: Introduce tcg_opc_ext_i (Richard Henderson) > - tcg-ia64: Introduce tcg_opc_movi_a (Richard Henderson) > - tcg-ia64: Introduce tcg_opc_mov_a (Richard Henderson) > - tcg-ia64: Use A3 form of logical operations (Richard Henderson) > - tcg-ia64: Use SUB_A3 and ADDS_A4 for subtraction (Richard Henderson) > - tcg-ia64: Use ADDS for small addition (Richard Henderson) > - tcg-ia64: Avoid unnecessary stop bit in tcg_out_alu (Richard Henderson) > - tcg-ia64: Move AREG0 to R32 (Richard Henderson) > - tcg-ia64: Simplify brcond (Richard Henderson) > - tcg-ia64: Handle constant calls (Richard Henderson) > - tcg-ia64: Use shortcuts for nop insns (Richard Henderson) > - tcg-ia64: Use TCGMemOp within qemu_ldst routines (Richard Henderson) > - hw/i386/Makefile.obj: use $(PYTHON) to run .py scripts consistently > (Michael Tokarev) > - configure: Use -B switch only for Python versions which support it (Stefan > Weil) > - qga: Fix shutdown command of guest agent to work with SysV (whitearchey) > - block: Fail if requested driver is not available (Kevin Wolf) > - MAINTAINERS: add block driver sub-maintainers (Stefan Hajnoczi) > - qemu-img: Fix overwriting 'ret' before using (Fam Zheng) > - qemu-iotests: Test qcow2 count_contiguous_clusters() (Kevin Wolf) > - smc91c111: Fix receive starvation (Sebastian Huber) > - qcow2: fix possible corruption when reading multiple clusters (Peter > Lieven) > - qmp: access the local QemuOptsLists for drive option (Amos Kong) > - MAINTAINERS: add block tree repo URLs (Stefan Hajnoczi) > - qemu-iotests: Extend 041 for unbacked mirroring (Max Reitz) > - block/drive-mirror: Check for NULL backing_hd (Max Reitz) > - qapi-schema: Update description for NewImageMode (Max Reitz) > - block: Print its file name if backing file opening failed (Fam Zheng) > - pc: disable pci-info (Igor Mammedov) > - console: Remove unused debug code (Stefan Weil) >
[Qemu-devel] [PULL for-1.7 00/11] Miscellaneous -rc patches
The following changes since commit 394cfa39ba24dd838ace1308ae24961243947fb8: Merge remote-tracking branch 'quintela/migration.next' into staging (2013-11-19 13:03:06 -0800) are available in the git repository at: git://github.com/bonzini/qemu.git tags/for-anthony for you to fetch changes up to d607a52364e7bfc1cd6d3e425b898e86be4e525d: qga: Fix compiler warnings (missing format attribute, wrong format strings) (2013-11-21 17:39:25 +0100) You do not have my key yet, but you can verify it is the same that I use for example in Linux pull requests. Please contact me offlist if you wish to do keysigning. Here are a bunch of 1.7-tagged patches that I was afraid were getting forgotten or that did not have a clear maintainer responsible for making a pull request. Alex Williamson (1): vfio-pci: Fix multifunction=on Amos Kong (2): rng-egd: remove redundant free rng-egd: offset the point when repeatedly read from the buffer Hervé Poussineau (1): mips jazz: do not raise data bus exception when accessing invalid addresses Mark Cave-Ayland (1): sun4m: Add FCode ROM for TCX framebuffer Paolo Bonzini (2): pc: get rid of builtin pvpanic for "-M pc-1.5" target-i386: yield to another VCPU on PAUSE Peter Maydell (3): configure: Explicitly set ARFLAGS so we can build with GNU Make 4.0 atomic.h: Fix build with clang target-i386: Fix build by providing stub kvm_arch_get_supported_cpuid() Stefan Weil (1): qga: Fix compiler warnings (missing format attribute, wrong format strings) Makefile | 2 +- backends/rng-egd.c| 5 +++-- configure | 5 + hw/display/tcx.c | 26 +- hw/i386/pc_piix.c | 7 --- hw/i386/pc_q35.c | 7 --- hw/mips/mips_jazz.c | 24 hw/misc/pvpanic.c | 5 - hw/misc/vfio.c| 7 +++ hw/sparc/sun4m.c | 17 ++--- include/hw/i386/pc.h | 1 - include/qemu/atomic.h | 6 +++--- pc-bios/QEMU,tcx.bin | Bin 0 -> 1242 bytes pc-bios/README| 4 ++-- qga/commands-posix.c | 8 qga/guest-agent-core.h| 2 +- target-i386/helper.h | 1 + target-i386/kvm-stub.c| 12 target-i386/misc_helper.c | 22 -- target-i386/translate.c | 5 - 20 files changed, 122 insertions(+), 44 deletions(-) create mode 100644 pc-bios/QEMU,tcx.bin -- 1.8.3.1
Re: [Qemu-devel] [PATCH 21/27] pc: add memory hotplug 440fx machine
On Thu, 21 Nov 2013 15:13:12 +0100 Andreas Färber wrote: > Am 21.11.2013 06:48, schrieb Li Guang: > > Hi, Igor > > > > Igor Mammedov wrote: > >> Add DimmBus for memory hotplug below 4Gb or above 4Gb depending > >> on initial memory size and hotplug memory size. > >> > > ... > >> +static > >> +void pc_hotplug_memory_init_impl(Object *owner, > >> + MemoryRegion *system_memory, > >> + ram_addr_t low_hotplug_mem_start, > >> + ram_addr_t low_hotplug_mem_end, > >> + DimmBus *hotplug_mem_bus, > >> + ram_addr_t *high_mem_end) > >> +{ > >> +QemuOpts *opts = qemu_opts_find(qemu_find_opts("memory-opts"), > >> NULL); > >> +ram_addr_t ram_size = qemu_opt_get_size(opts, "mem", 0); > >> +ram_addr_t maxmem = qemu_opt_get_size(opts, "maxmem", 0); > >> +ram_addr_t hotplug_mem_size; > >> + > >> +if (maxmem<= ram_size) { > >> +/* Disable ACPI migration code and creation of memory devices > >> in SSDT */ > >> > > > > Why not give the memory that not be hot-added a chance to be placed on > > one memory slot? > > Seconded, I believe I requested that on the previous version already. Because current initial memory allocation is a mess and this already large series would become even more large and intrusive, so far series it relatively not intrusive and self contained. I believe re-factoring of initial memory to use Dimm devices should be done later on top of infrastructure this series provides. > Andreas > > > if all memory can be hot-added and hot-removed, then we can bring in > > more flexibility for > > memory hotplug feature. >
[Qemu-devel] [Bug 1253563] [NEW] bad performance with rng-egd backend
You have been subscribed to a public bug by Amos Kong (amoskong): 1. create listen socket # cat /dev/random | nc -l localhost 1024 2. start vm with rng-egd backend ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -mon chardev=qmp,mode=control,pretty=on -chardev socket,id=qmp,host=localhost,port=1234,server,nowait -m 2000 -device virtio-net-pci,netdev=h1,id=vnet0 -netdev tap,id=h1 -vnc :0 -drive file=/images/RHEL-64-virtio.qcow2 \ -chardev socket,host=localhost,port=1024,id=chr0 \ -object rng-egd,chardev=chr0,id=rng0 \ -device virtio-rng-pci,rng=rng0,max-bytes=1024000,period=1000 (guest) # dd if=/dev/hwrng of=/dev/null note: cancelling dd process by Ctrl+c, it will return the read speed. Problem: the speed is around 1k/s === If I use rng-random backend (filename=/dev/random), the speed is about 350k/s). It seems that when the request entry is added to the list, we don't read the data from queue list immediately. The chr_read() is delayed, the virtio_notify() is delayed. the next request will also be delayed. It effects the speed. I tried to change rng_egd_chr_can_read() always returns 1, the speed is improved to (about 400k/s) Problem: we can't poll the content in time currently Any thoughts? Thanks, Amos ** Affects: qemu Importance: Undecided Status: New -- bad performance with rng-egd backend https://bugs.launchpad.net/bugs/1253563 You received this bug notification because you are a member of qemu-devel-ml, which is subscribed to the bug report.
[Qemu-devel] [PATCH] vl: verify if combination of cpus, sockets, cores and threads is sane
Signed-off-by: Peter Lieven --- vl.c | 34 -- 1 file changed, 20 insertions(+), 14 deletions(-) diff --git a/vl.c b/vl.c index 8d5d874..dc0b41a 100644 --- a/vl.c +++ b/vl.c @@ -1385,35 +1385,41 @@ static QemuOptsList qemu_smp_opts = { static void smp_parse(QemuOpts *opts) { if (opts) { - unsigned cpus= qemu_opt_get_number(opts, "cpus", 0); unsigned sockets = qemu_opt_get_number(opts, "sockets", 0); unsigned cores = qemu_opt_get_number(opts, "cores", 0); unsigned threads = qemu_opt_get_number(opts, "threads", 0); /* compute missing values, prefer sockets over cores over threads */ -if (cpus == 0 || sockets == 0) { +if (cpus == 0) { sockets = sockets > 0 ? sockets : 1; cores = cores > 0 ? cores : 1; threads = threads > 0 ? threads : 1; -if (cpus == 0) { -cpus = cores * threads * sockets; -} +cpus = cores * threads * sockets; +} else if (sockets == 0) { +cores = cores > 0 ? cores : 1; +threads = threads > 0 ? threads : 1; +sockets = cpus / (cores * threads); +} else if (cores == 0) { +threads = threads > 0 ? threads : 1; +cores = cpus / (sockets * threads); } else { -if (cores == 0) { -threads = threads > 0 ? threads : 1; -cores = cpus / (sockets * threads); -} else { -threads = cpus / (cores * sockets); -} +threads = cpus / (sockets * cores); } -max_cpus = qemu_opt_get_number(opts, "maxcpus", 0); +if (cpus != sockets * cores * threads) { +fprintf(stderr, "Illegal CPU layout: %d cpus with %d sockets," +" %d cores per socket and %d threads per core" +" (cpus != sockets * cores * threads)\n", +cpus, sockets, cores, threads); +exit(1); +} smp_cpus = cpus; -smp_cores = cores > 0 ? cores : 1; -smp_threads = threads > 0 ? threads : 1; +smp_cores = cores; +smp_threads = threads; +max_cpus = qemu_opt_get_number(opts, "maxcpus", 0); } if (max_cpus == 0) { -- 1.7.9.5
Re: [Qemu-devel] [PATCH 1/2] block: Enable BDRV_O_SNAPSHOT with driver-specific options
On Tue, Nov 19, 2013 at 04:37:27PM +0100, Kevin Wolf wrote: > @@ -1114,6 +1093,24 @@ int bdrv_open(BlockDriverState *bs, const char > *filename, QDict *options, > goto fail; > } > > +/* Prepare a new options QDict for the temporary file, where user > + * options refer to the backing file */ > +if (!options) { > +options = qdict_new(); > +} You can drop this because options is never NULL: options = qdict_clone_shallow(options); /* For snapshot=on, create a temporary qcow2 overlay */ if (flags & BDRV_O_SNAPSHOT) { ... > +if (filename) { > +qdict_put(options, "file.filename", qstring_from_str(filename)); > +} > +if (drv) { > +qdict_put(options, "driver", qstring_from_str(drv->format_name)); > +} > + > +snapshot_options = qdict_new(); > +qdict_put(snapshot_options, "backing", options); > +qdict_flatten(snapshot_options); > + > +options = snapshot_options; One thing I'm not sure about after these operations have been performed: bs->options = options; ... if (flags & BDRV_O_SNAPSHOT) { ... So bs->options does not reflect what we ended up with in the BDRV_O_SNAPSHOT case. But git grep -- '->options' shows no users of this field. Therefore it won't cause a problem yet. But can you explain what's going on here? Either we should keep bs->options up-to-date or we should drop the field. Stefan
Re: [Qemu-devel] Buildbot failure: multiboot.o link error on OpenBSD current
On 20/11/13 4:15 AM, Stefan Hajnoczi wrote: Sorry I forgot to CC qemu-devel on this: On Wed, Nov 20, 2013 at 10:06 AM, Stefan Hajnoczi wrote: Hi Brad, The QEMU buildbot is failing on brad_openbsd_current as follows: Building optionrom/multiboot.img ld: multiboot.o: relocation R_X86_64_16 can not be used when making a shared object; recompile with -fPIC multiboot.o: could not read symbols: Bad value http://buildbot.b1-systems.de/qemu/builders/default_openbsd_current/builds/590 Do you have time to investigate? If not, please CC people who could help. Thanks, Stefan I have been meaning to try to attempt to come up with a patch to resolve this issue but haven't come up with anything. I have a patch in the OpenBSD ports tree that allows QEMU to build but its just taking that and having it fit within the configure/Makefile infrastructure and applied for OpenBSD only. http://www.openbsd.org/cgi-bin/cvsweb/~checkout~/ports/emulators/qemu/patches/patch-pc-bios_optionrom_Makefile?rev=1.1;content-type=text%2Fplain The issue has come up as OpenBSD 5.3 and newer now builds all binaries by default on almost all of our architectures with PIE by default; so the ROMs need to be built with PIE disabled. -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.