Re: [Qemu-devel] [PULL for-1.8 1/2] pc: disable pci-info
On 11/18/13 12:53, Michael S. Tsirkin wrote: From: Igor Mammedov imamm...@redhat.com The BIOS that we ship in 1.7 does not use pci info from host and so far isn't going to use it. Taking in account problems it caused see 9604f70fdf and to avoid future incompatibility issues, it's safest to disable that interface by default for all machine types including 1.7 as it was never exposed/used by guest. And properly remove/cleanup it during 1.8 development cycle. Signed-off-by: Igor Mammedov imamm...@redhat.com Reviewed-by: Gerd Hoffmann kra...@redhat.com Reviewed-by: Michael S. Tsirkin m...@redhat.com Reviewed-by: Eduardo Habkost ehabk...@redhat.com Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/i386/pc_piix.c | 2 +- hw/i386/pc_q35.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) etc/pci-info is precisely the form and contents that OVMF needs to download ACPI tables from qemu. http://sourceforge.net/mailarchive/forum.php?thread_name=1385450282-27007-3-git-send-email-lersek%40redhat.comforum_name=edk2-devel Please keep this exported in 1.8, for OVMF's sake. Thanks Laszlo
Re: [Qemu-devel] [PATCH v4 1/4] hw/timer: add sunxi timer device
On Tue, Nov 26, 2013 at 5:22 PM, liguang lig.f...@cn.fujitsu.com wrote: Signed-off-by: liguang lig.f...@cn.fujitsu.com --- default-configs/arm-softmmu.mak |2 + hw/timer/Makefile.objs |1 + hw/timer/sunxi-pit.c| 295 +++ include/hw/timer/sunxi-pit.h| 37 + 4 files changed, 335 insertions(+), 0 deletions(-) create mode 100644 hw/timer/sunxi-pit.c create mode 100644 include/hw/timer/sunxi-pit.h diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak index a555eef..7bf5ad0 100644 --- a/default-configs/arm-softmmu.mak +++ b/default-configs/arm-softmmu.mak @@ -81,3 +81,5 @@ CONFIG_VERSATILE_I2C=y CONFIG_SDHCI=y CONFIG_INTEGRATOR_DEBUG=y + +CONFIG_SUNXI_PIT=y diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs index eca5905..f7888e9 100644 --- a/hw/timer/Makefile.objs +++ b/hw/timer/Makefile.objs @@ -27,3 +27,4 @@ obj-$(CONFIG_SH4) += sh_timer.o obj-$(CONFIG_TUSB6010) += tusb6010.o obj-$(CONFIG_MC146818RTC) += mc146818rtc.o +obj-$(CONFIG_SUNXI_PIT) += sunxi-pit.o diff --git a/hw/timer/sunxi-pit.c b/hw/timer/sunxi-pit.c new file mode 100644 index 000..39b84ab --- /dev/null +++ b/hw/timer/sunxi-pit.c @@ -0,0 +1,295 @@ +/* + * Allwinner sunxi timer device emulation + * + * Copyright (C) 2013 Li Guang + * Written by Li Guang lig.f...@cn.fujitsu.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * for more details. + */ + +#include hw/sysbus.h +#include hw/ptimer.h +#include sysemu/sysemu.h +#include hw/timer/sunxi-pit.h + + +typedef struct SunxiTimer { +ptimer_state *timer; +} SunxiTimer; + I don't understand the need for this struct. What was wrong with the direct array of ptimers you had before? +typedef struct SunxiPITState { +/* private */ +SysBusDevice parent_obj; +/* public */ +qemu_irq irq[SUNXI_TIMER_NR]; +SunxiTimer timer[SUNXI_TIMER_NR]; +MemoryRegion iomem; + +uint32_t irq_enable; +uint32_t irq_status; +uint32_t control[SUNXI_TIMER_NR]; +uint32_t interval[SUNXI_TIMER_NR]; +uint32_t count[SUNXI_TIMER_NR]; +uint32_t watch_dog_mode; +uint32_t watch_dog_control; +uint32_t count_lo; +uint32_t count_hi; +uint32_t count_ctl; +} SunxiPITState; + +static uint64_t sunxi_pit_read(void *opaque, hwaddr offset, unsigned size) +{ +SunxiPITState *s = SUNXI_PIT(opaque); +uint8_t index = 0; initializer to 0 un-needed. + +switch (offset) { +case SUNXI_TIMER_IRQ_EN: +return s-irq_enable; +break; +case SUNXI_TIMER_IRQ_ST: +return s-irq_status; +break; +case SUNXI_TIMER_BASE ... SUNXI_TIMER_BASE * 6 + SUNXI_TIMER_COUNT: +index = offset 0xf0; +index = 4; +index -= 1; +switch (offset 0x0f) { +case SUNXI_TIMER_CONTROL: +return s-control[index]; +break; +case SUNXI_TIMER_INTERVAL: +return s-interval[index]; +break; +case SUNXI_TIMER_COUNT: { +SunxiTimer *t = s-timer[index]; +s-count[index] = ptimer_get_count(t-timer); +return s-count[index]; +} +default: +break; +} +break; +case SUNXI_WDOG_CONTROL: +break; +case SUNXI_WDOG_MODE: +break; +case SUNXI_COUNT_LO: +return s-count_lo; +break; +case SUNXI_COUNT_HI: +return s-count_hi; +break; +case SUNXI_COUNT_CTL: +return s-count_ctl; +default: +break; Usual to do a log_guest error here. Same for writes below. +} + +return 0; +} + +static void sunxi_pit_write(void *opaque, hwaddr offset, uint64_t value, +unsigned size) +{ + SunxiPITState *s = SUNXI_PIT(opaque); + uint8_t index = 0; + +switch (offset) { +case SUNXI_TIMER_IRQ_EN: +s-irq_enable = value; +break; +case SUNXI_TIMER_IRQ_ST: +s-irq_status = value; Are you missing a ~ ? This is a clear-to-clear semantic rather than write-to-clear. Also shouldn't this de-assert the relevant interrupt lines? +break; +case SUNXI_TIMER_BASE ... SUNXI_TIMER_BASE * 6 + SUNXI_TIMER_COUNT: +index = offset 0xf0; +index = 4; +index -= 1; +switch (offset 0x0f) { +case SUNXI_TIMER_CONTROL: { +SunxiTimer
[Qemu-devel] [PATCHv2 1.8 1/9] qemu-img: add support for skipping zeroes in input during convert
we currently do not check if a sector is allocated during convert. This means if a sector is unallocated that we allocate a bounce buffer of zeroes, find out its zero later and do not write it in the best case. In the worst case this can lead to reading blocks from a raw device (like iSCSI) altough we could easily know via get_block_status that they are zero and simply skip them. This patch also fixes the progress output not being at 100% after a successful conversion. Signed-off-by: Peter Lieven p...@kamp.de --- qemu-img.c | 85 ++-- 1 file changed, 48 insertions(+), 37 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index dc0c2f0..efb744c 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -1125,13 +1125,15 @@ out3: static int img_convert(int argc, char **argv) { -int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size, +int c, n, n1, bs_n, bs_i, compress, cluster_size, cluster_sectors, skip_create; +int64_t ret = 0; int progress = 0, flags; const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename; BlockDriver *drv, *proto_drv; BlockDriverState **bs = NULL, *out_bs = NULL; -int64_t total_sectors, nb_sectors, sector_num, bs_offset; +int64_t total_sectors, nb_sectors, sector_num, bs_offset, +sector_num_next_status = 0; uint64_t bs_sectors; uint8_t * buf = NULL; const uint8_t *buf1; @@ -1140,7 +1142,6 @@ static int img_convert(int argc, char **argv) QEMUOptionParameter *out_baseimg_param; char *options = NULL; const char *snapshot_name = NULL; -float local_progress = 0; int min_sparse = 8; /* Need at least 4k of zeros for sparse detection */ bool quiet = false; Error *local_err = NULL; @@ -1403,10 +1404,6 @@ static int img_convert(int argc, char **argv) sector_num = 0; nb_sectors = total_sectors; -if (nb_sectors != 0) { -local_progress = (float)100 / -(nb_sectors / MIN(nb_sectors, cluster_sectors)); -} for(;;) { int64_t bs_num; @@ -1464,7 +1461,7 @@ static int img_convert(int argc, char **argv) } } sector_num += n; -qemu_progress_print(local_progress, 100); +qemu_progress_print(100.0 * sector_num / total_sectors, 0); } /* signal EOF to align */ bdrv_write_compressed(out_bs, 0, NULL, 0); @@ -1481,21 +1478,13 @@ static int img_convert(int argc, char **argv) sector_num = 0; // total number of sectors converted so far nb_sectors = total_sectors - sector_num; -if (nb_sectors != 0) { -local_progress = (float)100 / -(nb_sectors / MIN(nb_sectors, IO_BUF_SIZE / 512)); -} for(;;) { nb_sectors = total_sectors - sector_num; if (nb_sectors = 0) { +ret = 0; break; } -if (nb_sectors = (IO_BUF_SIZE / 512)) { -n = (IO_BUF_SIZE / 512); -} else { -n = nb_sectors; -} while (sector_num - bs_offset = bs_sectors) { bs_i ++; @@ -1507,34 +1496,53 @@ static int img_convert(int argc, char **argv) sector_num, bs_i, bs_offset, bs_sectors); */ } -if (n bs_offset + bs_sectors - sector_num) { -n = bs_offset + bs_sectors - sector_num; -} - -/* If the output image is being created as a copy on write image, - assume that sectors which are unallocated in the input image - are present in both the output's and input's base images (no - need to copy them). */ -if (out_baseimg) { -ret = bdrv_is_allocated(bs[bs_i], sector_num - bs_offset, -n, n1); +if ((out_baseimg || has_zero_init) +sector_num = sector_num_next_status) { +n = nb_sectors INT_MAX ? INT_MAX : nb_sectors; +ret = bdrv_get_block_status(bs[bs_i], sector_num - bs_offset, +n, n1); if (ret 0) { -error_report(error while reading metadata for sector - % PRId64 : %s, - sector_num - bs_offset, strerror(-ret)); +error_report(error while reading block status of sector % + PRId64 : %s, sector_num - bs_offset, + strerror(-ret)); goto out; } -if (!ret) { +/* If the output image is zero initialized, we are not working + * on a shared base and the input is zero we can skip the next + * n1 sectors */ +if
[Qemu-devel] [PATCHv2 1.8 4/9] block: add opt_transfer_length to BlockLimits
Signed-off-by: Peter Lieven p...@kamp.de --- include/block/block_int.h |3 +++ 1 file changed, 3 insertions(+) diff --git a/include/block/block_int.h b/include/block/block_int.h index 95140b6..6499516 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -242,6 +242,9 @@ typedef struct BlockLimits { /* optimal alignment for write zeroes requests in sectors */ int64_t write_zeroes_alignment; + +/* optimal transfer length in sectors */ +int opt_transfer_length; } BlockLimits; /* -- 1.7.9.5
[Qemu-devel] [PATCHv2 1.8 2/9] qemu-img: fix usage instruction for qemu-img convert
Reviewed-by: Eric Blake ebl...@redhat.com Signed-off-by: Peter Lieven p...@kamp.de --- qemu-img.c |1 - 1 file changed, 1 deletion(-) diff --git a/qemu-img.c b/qemu-img.c index efb744c..e2d1a0a 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -105,7 +105,6 @@ static void help(void) conversion. If the number of bytes is 0, the source will not be scanned for\n unallocated or zero sectors, and the destination image will always be\n fully allocated\n - images will always be fully allocated\n '--output' takes the format in which the output must be done (human or json)\n '-n' skips the target volume creation (useful if the volume is created\n prior to running qemu-img)\n -- 1.7.9.5
[Qemu-devel] [PATCHv2 1.8 3/9] block/iscsi: set bdi-cluster_size
this patch aims to set bdi-cluster_size to the internal page size of the iscsi target so that enabled callers can align requests properly. Reviewed-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Peter Lieven p...@kamp.de --- block/iscsi.c |7 +++ 1 file changed, 7 insertions(+) diff --git a/block/iscsi.c b/block/iscsi.c index 93fee6d..75d6b87 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -1580,6 +1580,13 @@ static int iscsi_get_info(BlockDriverState *bs, BlockDriverInfo *bdi) IscsiLun *iscsilun = bs-opaque; bdi-unallocated_blocks_are_zero = !!iscsilun-lbprz; bdi-can_write_zeroes_with_unmap = iscsilun-lbprz iscsilun-lbp.lbpws; +/* Guess the internal cluster (page) size of the iscsi target by the means + * of opt_unmap_gran. Transfer the unmap granularity only if it has a + * reasonable size for bdi-cluster_size */ +if (iscsilun-bl.opt_unmap_gran * iscsilun-block_size = 64 * 1024 +iscsilun-bl.opt_unmap_gran * iscsilun-block_size = 16 * 1024 * 1024) { +bdi-cluster_size = iscsilun-bl.opt_unmap_gran * iscsilun-block_size; +} return 0; } -- 1.7.9.5
[Qemu-devel] [PATCHv2 1.8 5/9] block/iscsi: set bs-bl.opt_transfer_length
Signed-off-by: Peter Lieven p...@kamp.de --- block/iscsi.c |3 +++ 1 file changed, 3 insertions(+) diff --git a/block/iscsi.c b/block/iscsi.c index 75d6b87..1109d8c 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -1457,6 +1457,9 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags, } bs-bl.write_zeroes_alignment = sector_lun2qemu(iscsilun-bl.opt_unmap_gran, iscsilun); + +bs-bl.opt_transfer_length = sector_lun2qemu(iscsilun-bl.opt_xfer_len, + iscsilun); } #if defined(LIBISCSI_FEATURE_NOP_COUNTER) -- 1.7.9.5
[Qemu-devel] [PATCHv2 1.8 9/9] qemu-img: increase min_sparse to 128 sectors (64kb)
Suggested-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Peter Lieven p...@kamp.de --- qemu-img.c|4 ++-- qemu-img.texi |2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index cc7540f..6f1179b 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -101,7 +101,7 @@ static void help(void) '-p' show progress of command (only certain commands)\n '-pp' show progress of command in sectors (only convert command)\n '-q' use Quiet mode - do not print any output (except errors)\n - '-S' indicates the consecutive number of bytes (defaults to 4k) that must\n + '-S' indicates the consecutive number of bytes (defaults to 64k) that must\n contain only zeros for qemu-img to create a sparse image during\n conversion. If the number of bytes is 0, the source will not be scanned for\n unallocated or zero sectors, and the destination image will always be\n @@ -1158,7 +1158,7 @@ static int img_convert(int argc, char **argv) QEMUOptionParameter *out_baseimg_param; char *options = NULL; const char *snapshot_name = NULL; -int min_sparse = 8; /* Need at least 4k of zeros for sparse detection */ +int min_sparse = 128; /* Need at least 64k of zeros for sparse detection */ bool quiet = false; Error *local_err = NULL; diff --git a/qemu-img.texi b/qemu-img.texi index cb4a3eb..c0e86ab 100644 --- a/qemu-img.texi +++ b/qemu-img.texi @@ -195,7 +195,7 @@ Image conversion is also useful to get smaller image when using a growable format such as @code{qcow} or @code{cow}: the empty sectors are detected and suppressed from the destination image. -@var{sparse_size} indicates the consecutive number of bytes (defaults to 4k) +@var{sparse_size} indicates the consecutive number of bytes (defaults to 64k) that must contain only zeros for qemu-img to create a sparse image during conversion. If @var{sparse_size} is 0, the source will not be scanned for unallocated or zero sectors, and the destination image will always be -- 1.7.9.5
[Qemu-devel] [PATCHv2 1.8 0/9] qemu-img convert optimizations
this series adds some optimizations for qemu-img during convert which have been developed recently: - skipping input based on get_block_status - variable I/O buffer size - align write requests to cluster_size - show progress in sectors or percent v1-v2: - introduce opt_transfer_length in BlockLimits [Paolo] - remove knobs for iobuffer_size and alignment and use them unconditionally [Paolo] - calculate I/O buffer size by BlockLimits information [Paolo] - change the alignment patch to round down to the last and not to the next aligned sector [Paolo] - limit updates in the sector progress output - new patch to increase the default for min_sparse [Paolo] Peter Lieven (9): qemu-img: add support for skipping zeroes in input during convert qemu-img: fix usage instruction for qemu-img convert block/iscsi: set bdi-cluster_size block: add opt_transfer_length to BlockLimits block/iscsi: set bs-bl.opt_transfer_length qemu-img: dynamically adjust iobuffer size during convert qemu-img: round down request length to an aligned sector qemu-img: add option to show progress in sectors qemu-img: increase min_sparse to 128 sectors (64kb) block/iscsi.c | 10 +++ include/block/block_int.h |3 + qemu-img-cmds.hx |4 +- qemu-img.c| 159 ++--- qemu-img.texi |6 +- 5 files changed, 126 insertions(+), 56 deletions(-) -- 1.7.9.5
[Qemu-devel] [PATCHv2 1.8 6/9] qemu-img: dynamically adjust iobuffer size during convert
since the convert process is basically a sync operation it might be benificial in some case to change the hardcoded I/O buffer size to a greater value. This patch increases the I/O buffer size if the output driver advertises an optimal transfer length or discard alignment that is greater than the default buffer size of 2M. Signed-off-by: Peter Lieven p...@kamp.de --- qemu-img.c | 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index e2d1a0a..b076d44 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -1135,6 +1135,7 @@ static int img_convert(int argc, char **argv) sector_num_next_status = 0; uint64_t bs_sectors; uint8_t * buf = NULL; +size_t bufsectors = IO_BUF_SIZE / BDRV_SECTOR_SIZE; const uint8_t *buf1; BlockDriverInfo bdi; QEMUOptionParameter *param = NULL, *create_options = NULL; @@ -1371,7 +1372,16 @@ static int img_convert(int argc, char **argv) bs_i = 0; bs_offset = 0; bdrv_get_geometry(bs[0], bs_sectors); -buf = qemu_blockalign(out_bs, IO_BUF_SIZE); + +/* increase bufsectors from the default 4096 (2M) if opt_transfer_length + * or discard_alignment of the out_bs is greater. Limit to 32768 (16MB) + * as maximum. */ +bufsectors = MIN(32768, + MAX(bufsectors, MAX(out_bs-bl.opt_transfer_length, + out_bs-bl.discard_alignment)) +); + +buf = qemu_blockalign(out_bs, bufsectors * BDRV_SECTOR_SIZE); if (skip_create) { int64_t output_length = bdrv_getlength(out_bs); @@ -1394,7 +1404,7 @@ static int img_convert(int argc, char **argv) goto out; } cluster_size = bdi.cluster_size; -if (cluster_size = 0 || cluster_size IO_BUF_SIZE) { +if (cluster_size = 0 || cluster_size bufsectors * BDRV_SECTOR_SIZE) { error_report(invalid cluster size); ret = -1; goto out; @@ -1531,8 +1541,8 @@ static int img_convert(int argc, char **argv) sector_num_next_status = sector_num + n1; } -if (nb_sectors = (IO_BUF_SIZE / 512)) { -n = (IO_BUF_SIZE / 512); +if (nb_sectors = bufsectors) { +n = bufsectors; } else { n = nb_sectors; } -- 1.7.9.5
[Qemu-devel] [PATCHv2 1.8 8/9] qemu-img: add option to show progress in sectors
Signed-off-by: Peter Lieven p...@kamp.de --- qemu-img-cmds.hx |4 ++-- qemu-img.c | 31 --- qemu-img.texi|4 +++- 3 files changed, 33 insertions(+), 6 deletions(-) diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx index da1d965..6c8183b 100644 --- a/qemu-img-cmds.hx +++ b/qemu-img-cmds.hx @@ -34,9 +34,9 @@ STEXI ETEXI DEF(convert, img_convert, -convert [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-s snapshot_name] [-S sparse_size] filename [filename2 [...]] output_filename) +convert [-c] [-p|-pp] [-q] [-n] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-s snapshot_name] [-S sparse_size] filename [filename2 [...]] output_filename) STEXI -@item convert [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] @var{filename} [@var{filename2} [...]] @var{output_filename} +@item convert [-c] [-p|-pp] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] @var{filename} [@var{filename2} [...]] @var{output_filename} ETEXI DEF(info, img_info, diff --git a/qemu-img.c b/qemu-img.c index 1421f0f..cc7540f 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -99,6 +99,7 @@ static void help(void) rebasing in this case (useful for renaming the backing file)\n '-h' with or without a command shows this help and lists the supported formats\n '-p' show progress of command (only certain commands)\n + '-pp' show progress of command in sectors (only convert command)\n '-q' use Quiet mode - do not print any output (except errors)\n '-S' indicates the consecutive number of bytes (defaults to 4k) that must\n contain only zeros for qemu-img to create a sparse image during\n @@ -1122,6 +1123,22 @@ out3: return ret; } +static void print_sector_progress(int progress, int64_t sector_num, + int64_t total_sectors) +{ +static int64_t last_sector = -1; +if (progress == 2) { +if (sector_num == 0 || +sector_num last_sector + 0.02 * total_sectors || +sector_num == total_sectors) { +printf(%ld of %ld sectors converted.\r, sector_num, + total_sectors); +fflush(stdout); +last_sector = sector_num; +} +} +} + static int img_convert(int argc, char **argv) { int c, n, n1, bs_n, bs_i, compress, cluster_sectors, skip_create; @@ -1130,7 +1147,7 @@ static int img_convert(int argc, char **argv) const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename; BlockDriver *drv, *proto_drv; BlockDriverState **bs = NULL, *out_bs = NULL; -int64_t total_sectors, nb_sectors, sector_num, bs_offset, +int64_t total_sectors = 0, nb_sectors, sector_num, bs_offset, sector_num_next_status = 0; uint64_t bs_sectors; uint8_t * buf = NULL; @@ -1201,7 +1218,7 @@ static int img_convert(int argc, char **argv) break; } case 'p': -progress = 1; +progress++; break; case 't': cache = optarg; @@ -1227,7 +1244,7 @@ static int img_convert(int argc, char **argv) out_filename = argv[argc - 1]; /* Initialize before goto out */ -qemu_progress_init(progress, 2.0); +qemu_progress_init(progress == 1, 2.0); if (options is_help_option(options)) { ret = print_block_option_help(out_filename, out_fmt); @@ -1258,6 +1275,8 @@ static int img_convert(int argc, char **argv) total_sectors += bs_sectors; } +print_sector_progress(progress, 0, total_sectors); + if (snapshot_name != NULL) { if (bs_n 1) { error_report(No support for concatenating multiple snapshot); @@ -1472,6 +1491,7 @@ static int img_convert(int argc, char **argv) } sector_num += n; qemu_progress_print(100.0 * sector_num / total_sectors, 0); +print_sector_progress(progress, sector_num, total_sectors); } /* signal EOF to align */ bdrv_write_compressed(out_bs, 0, NULL, 0); @@ -1587,11 +1607,16 @@ static int img_convert(int argc, char **argv) buf1 += n1 * 512; } qemu_progress_print(100.0 * sector_num / total_sectors, 0); +print_sector_progress(progress, sector_num, total_sectors); } } out: if (!ret) { qemu_progress_print(100, 0); +print_sector_progress(progress, total_sectors, total_sectors); +} +if (progress == 2) { +printf(\n); } qemu_progress_end(); free_option_parameters(create_options); diff --git a/qemu-img.texi b/qemu-img.texi index da36975..cb4a3eb 100644 --- a/qemu-img.texi +++ b/qemu-img.texi @@ -54,6 +54,8
[Qemu-devel] [PATCHv2 1.8 7/9] qemu-img: round down request length to an aligned sector
this patch shortens requests to end at an aligned sector so that the next request starts aligned. Signed-off-by: Peter Lieven p...@kamp.de --- qemu-img.c | 30 -- 1 file changed, 20 insertions(+), 10 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index b076d44..1421f0f 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -1124,8 +1124,7 @@ out3: static int img_convert(int argc, char **argv) { -int c, n, n1, bs_n, bs_i, compress, cluster_size, -cluster_sectors, skip_create; +int c, n, n1, bs_n, bs_i, compress, cluster_sectors, skip_create; int64_t ret = 0; int progress = 0, flags; const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename; @@ -1397,19 +1396,21 @@ static int img_convert(int argc, char **argv) } } +cluster_sectors = 0; +ret = bdrv_get_info(out_bs, bdi); +if (ret 0 compress) { +error_report(could not get block driver info); +goto out; +} else { +cluster_sectors = bdi.cluster_size / BDRV_SECTOR_SIZE; +} + if (compress) { -ret = bdrv_get_info(out_bs, bdi); -if (ret 0) { -error_report(could not get block driver info); -goto out; -} -cluster_size = bdi.cluster_size; -if (cluster_size = 0 || cluster_size bufsectors * BDRV_SECTOR_SIZE) { +if (cluster_sectors = 0 || cluster_sectors bufsectors) { error_report(invalid cluster size); ret = -1; goto out; } -cluster_sectors = cluster_size 9; sector_num = 0; nb_sectors = total_sectors; @@ -1547,6 +1548,15 @@ static int img_convert(int argc, char **argv) n = nb_sectors; } +/* round down request length to an aligned sector */ +if (cluster_sectors 0 n = cluster_sectors) { +int64_t next_aligned_sector = (sector_num + n); +next_aligned_sector -= next_aligned_sector % cluster_sectors; +if (sector_num + n next_aligned_sector) { +n = next_aligned_sector - sector_num; +} +} + if (n bs_offset + bs_sectors - sector_num) { n = bs_offset + bs_sectors - sector_num; } -- 1.7.9.5
Re: [Qemu-devel] [PATCH v4 1/4] hw/timer: add sunxi timer device
Peter Crosthwaite wrote: On Tue, Nov 26, 2013 at 5:22 PM, liguanglig.f...@cn.fujitsu.com wrote: Signed-off-by: liguanglig.f...@cn.fujitsu.com --- default-configs/arm-softmmu.mak |2 + hw/timer/Makefile.objs |1 + hw/timer/sunxi-pit.c| 295 +++ include/hw/timer/sunxi-pit.h| 37 + 4 files changed, 335 insertions(+), 0 deletions(-) create mode 100644 hw/timer/sunxi-pit.c create mode 100644 include/hw/timer/sunxi-pit.h diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak index a555eef..7bf5ad0 100644 --- a/default-configs/arm-softmmu.mak +++ b/default-configs/arm-softmmu.mak @@ -81,3 +81,5 @@ CONFIG_VERSATILE_I2C=y CONFIG_SDHCI=y CONFIG_INTEGRATOR_DEBUG=y + +CONFIG_SUNXI_PIT=y diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs index eca5905..f7888e9 100644 --- a/hw/timer/Makefile.objs +++ b/hw/timer/Makefile.objs @@ -27,3 +27,4 @@ obj-$(CONFIG_SH4) += sh_timer.o obj-$(CONFIG_TUSB6010) += tusb6010.o obj-$(CONFIG_MC146818RTC) += mc146818rtc.o +obj-$(CONFIG_SUNXI_PIT) += sunxi-pit.o diff --git a/hw/timer/sunxi-pit.c b/hw/timer/sunxi-pit.c new file mode 100644 index 000..39b84ab --- /dev/null +++ b/hw/timer/sunxi-pit.c @@ -0,0 +1,295 @@ +/* + * Allwinner sunxi timer device emulation + * + * Copyright (C) 2013 Li Guang + * Written by Li Guanglig.f...@cn.fujitsu.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * for more details. + */ + +#include hw/sysbus.h +#include hw/ptimer.h +#include sysemu/sysemu.h +#include hw/timer/sunxi-pit.h + + +typedef struct SunxiTimer { +ptimer_state *timer; +} SunxiTimer; + I don't understand the need for this struct. What was wrong with the direct array of ptimers you had before? because I have to pack timer array into VMSD, and there's no VMSTATE_PTIMER_ARRAY for ptimer_state. +typedef struct SunxiPITState { +/* private*/ +SysBusDevice parent_obj; +/* public*/ +qemu_irq irq[SUNXI_TIMER_NR]; +SunxiTimer timer[SUNXI_TIMER_NR]; +MemoryRegion iomem; + +uint32_t irq_enable; +uint32_t irq_status; +uint32_t control[SUNXI_TIMER_NR]; +uint32_t interval[SUNXI_TIMER_NR]; +uint32_t count[SUNXI_TIMER_NR]; +uint32_t watch_dog_mode; +uint32_t watch_dog_control; +uint32_t count_lo; +uint32_t count_hi; +uint32_t count_ctl; +} SunxiPITState; + +static uint64_t sunxi_pit_read(void *opaque, hwaddr offset, unsigned size) +{ +SunxiPITState *s = SUNXI_PIT(opaque); +uint8_t index = 0; initializer to 0 un-needed. OK. + +switch (offset) { +case SUNXI_TIMER_IRQ_EN: +return s-irq_enable; +break; +case SUNXI_TIMER_IRQ_ST: +return s-irq_status; +break; +case SUNXI_TIMER_BASE ... SUNXI_TIMER_BASE * 6 + SUNXI_TIMER_COUNT: +index = offset 0xf0; +index= 4; +index -= 1; +switch (offset 0x0f) { +case SUNXI_TIMER_CONTROL: +return s-control[index]; +break; +case SUNXI_TIMER_INTERVAL: +return s-interval[index]; +break; +case SUNXI_TIMER_COUNT: { +SunxiTimer *t =s-timer[index]; +s-count[index] = ptimer_get_count(t-timer); +return s-count[index]; +} +default: +break; +} +break; +case SUNXI_WDOG_CONTROL: +break; +case SUNXI_WDOG_MODE: +break; +case SUNXI_COUNT_LO: +return s-count_lo; +break; +case SUNXI_COUNT_HI: +return s-count_hi; +break; +case SUNXI_COUNT_CTL: +return s-count_ctl; +default: +break; Usual to do a log_guest error here. Same for writes below. OK. +} + +return 0; +} + +static void sunxi_pit_write(void *opaque, hwaddr offset, uint64_t value, +unsigned size) +{ + SunxiPITState *s = SUNXI_PIT(opaque); + uint8_t index = 0; + +switch (offset) { +case SUNXI_TIMER_IRQ_EN: +s-irq_enable = value; +break; +case SUNXI_TIMER_IRQ_ST: +s-irq_status= value; Are you missing a ~ ? This is a clear-to-clear semantic rather than write-to-clear. yes Also shouldn't this de-assert the relevant interrupt lines? there's no related tips in sunxi SoC data-sheet, and test is fine until now. +break; +case SUNXI_TIMER_BASE ... SUNXI_TIMER_BASE * 6 + SUNXI_TIMER_COUNT: +index = offset
Re: [Qemu-devel] [PATCH for-1.7 0/2] block/drive-mirror: Reuse backing HD for sync=none
Il 25/11/2013 20:28, Max Reitz ha scritto: This series fixes the drive-mirror blockjob in case of none sync mode to always use the old (current) image file as the backing file of the newly created mirrored file (in case of absolute-paths mode). It is rather important to get this into 1.7, as we will introduce an at least pretty strange API in case the original file is unbacked otherwise. Max Reitz (2): block/drive-mirror: Reuse backing HD for sync=none qemu-iotests: Fix test 041 Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Re: [Qemu-devel] [PATCH v4 2/4] hw/intc: add sunxi interrupt controller device
On Tue, Nov 26, 2013 at 5:22 PM, liguang lig.f...@cn.fujitsu.com wrote: Signed-off-by: liguang lig.f...@cn.fujitsu.com --- default-configs/arm-softmmu.mak |1 + hw/intc/Makefile.objs |1 + hw/intc/sunxi-pic.c | 238 +++ include/hw/intc/sunxi-pic.h | 20 4 files changed, 260 insertions(+), 0 deletions(-) create mode 100644 hw/intc/sunxi-pic.c create mode 100644 include/hw/intc/sunxi-pic.h diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak index 7bf5ad0..bbe00e4 100644 --- a/default-configs/arm-softmmu.mak +++ b/default-configs/arm-softmmu.mak @@ -83,3 +83,4 @@ CONFIG_SDHCI=y CONFIG_INTEGRATOR_DEBUG=y CONFIG_SUNXI_PIT=y +CONFIG_SUNXI_PIC=y diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs index 47ac442..dad8c43 100644 --- a/hw/intc/Makefile.objs +++ b/hw/intc/Makefile.objs @@ -12,6 +12,7 @@ common-obj-$(CONFIG_IOAPIC) += ioapic_common.o common-obj-$(CONFIG_ARM_GIC) += arm_gic_common.o common-obj-$(CONFIG_ARM_GIC) += arm_gic.o common-obj-$(CONFIG_OPENPIC) += openpic.o +common-obj-$(CONFIG_SUNXI_PIC) += sunxi-pic.o obj-$(CONFIG_APIC) += apic.o apic_common.o obj-$(CONFIG_ARM_GIC_KVM) += arm_gic_kvm.o diff --git a/hw/intc/sunxi-pic.c b/hw/intc/sunxi-pic.c new file mode 100644 index 000..e84fc55 --- /dev/null +++ b/hw/intc/sunxi-pic.c @@ -0,0 +1,238 @@ +/* + * Allwinner sunxi interrupt controller device emulation + * + * Copyright (C) 2013 Li Guang + * Written by Li Guang lig.f...@cn.fujitsu.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * for more details. + */ + +#include hw/sysbus.h +#include hw/devices.h +#include sysemu/sysemu.h +#include hw/intc/sunxi-pic.h + + +typedef struct SunxiPICState { +/* private */ +SysBusDevice parent_obj; +/* public */ +MemoryRegion iomem; +qemu_irq parent_fiq; +qemu_irq parent_irq; Blank line here for readability. +uint32_t vector; +uint32_t base_addr; +uint32_t protect; +uint32_t nmi; +uint32_t irq_pending[SUNXI_PIC_REG_IDX]; +uint32_t fiq_pending[SUNXI_PIC_REG_IDX]; this appears to be constant 0. I cant find anywhere that sets fiq_pending. +uint32_t select[SUNXI_PIC_REG_IDX]; +uint32_t enable[SUNXI_PIC_REG_IDX]; +uint32_t mask[SUNXI_PIC_REG_IDX]; +/*priority setting here*/ +} SunxiPICState; + +static void sunxi_pic_update(SunxiPICState *s) +{ +uint8_t i = 0, j = 0; Initialisers un-needed. +bool irq = false; + +for (i = 0, j = 0; i SUNXI_PIC_REG_IDX; i++) { +for (j = 0; j 32; j++) { +if (!test_bit(j, (void *)s-mask[i])) { You can save on a level of indentation here with: if (test_bit(j, (void *)s-mask[i])) { continue; } +if (test_bit(j, (void *)s-irq_pending[i])) { +qemu_set_irq(s-parent_irq, 1); qemu_irq_raise() is simpler. I can't find anywhere in the code where the interrupts are lowered. How do they come down, once they are up? +irq = true; +} +if (test_bit(j, (void *)s-fiq_pending[i]) +test_bit(j, (void *)s-select[i])) { +qemu_set_irq(s-parent_fiq, 1); +irq = true; +} +if (irq) { +goto out; What happens if two interrupts are both active, the first setting IRQ the second FIQ? Wont this escape logic cause FIQ raising to potentionally be missed? +} +} +} +} +out: +return; +} + +static void sunxi_pic_set_irq(void *opaque, int irq, int level) +{ +SunxiPICState *s = opaque; + +if (level) { +set_bit(irq, (void *)s-irq_pending[irq/32]); set_bit(irq % 32, ...) +} Is this supposed to set either fiq_pending or irq_pending depending on the select bit? +sunxi_pic_update(s); +} + +static uint64_t sunxi_pic_read(void *opaque, hwaddr offset, unsigned size) +{ +SunxiPICState *s = opaque; +uint8_t index = (offset 0xc)/4; + +switch (offset) { +case SUNXI_PIC_VECTOR: +return s-vector; +break; +case SUNXI_PIC_BASE_ADDR: +return s-base_addr; +break; +case SUNXI_PIC_PROTECT: +return s-protect; +break; +case SUNXI_PIC_NMI: +return s-nmi; +break; +case SUNXI_PIC_IRQ_PENDING ... SUNXI_PIC_IRQ_PENDING + 8: +
Re: [Qemu-devel] [PATCH rebased for-1.8] i386: pc: align gpa-hpa on 1GB boundary (v6)
Il 25/11/2013 22:05, Michael S. Tsirkin ha scritto: Signed-off-by: Marcelo Tosatti mtosa...@redhat.com [Reorganize code, keep same logic. - Paolo] Signed-off-by: Paolo Bonzini pbonz...@redhat.com BTW how about a unit-test for this? Can be something along the lines of the acpi tests: run BIOS, probe what it reported. No, it can't because the change is not guest visible. It is not enabled for machine types =1.8 due to migration compatibility (the migration stream uses offsets within the RAM block, and these change), but there is no difference from the guest point of view. Paolo
Re: [Qemu-devel] [PULL for-1.8 1/2] pc: disable pci-info
On Tue, Nov 26, 2013 at 09:12:50AM +0100, Laszlo Ersek wrote: On 11/18/13 12:53, Michael S. Tsirkin wrote: From: Igor Mammedov imamm...@redhat.com The BIOS that we ship in 1.7 does not use pci info from host and so far isn't going to use it. Taking in account problems it caused see 9604f70fdf and to avoid future incompatibility issues, it's safest to disable that interface by default for all machine types including 1.7 as it was never exposed/used by guest. And properly remove/cleanup it during 1.8 development cycle. Signed-off-by: Igor Mammedov imamm...@redhat.com Reviewed-by: Gerd Hoffmann kra...@redhat.com Reviewed-by: Michael S. Tsirkin m...@redhat.com Reviewed-by: Eduardo Habkost ehabk...@redhat.com Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/i386/pc_piix.c | 2 +- hw/i386/pc_q35.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) etc/pci-info is precisely the form and contents that OVMF needs to download ACPI tables from qemu. http://sourceforge.net/mailarchive/forum.php?thread_name=1385450282-27007-3-git-send-email-lersek%40redhat.comforum_name=edk2-devel Please keep this exported in 1.8, for OVMF's sake. Thanks Laszlo This pull request was misnamed, it was merged for 1.7. Problem is pci-info can't be implemented correctly as defined: for example we don't know where does MMCONFIG resize before it is configured. This patch was acked by several people so we'll need a stronger justification for re-introducing it. seabios manages to enumerate PCI with information exported from qemu so why can't OVMF? I think it's down to other qemu bugs (such as _CRS not covering all of PCI memory), we shall just fix them. -- MST
Re: [Qemu-devel] [PATCH v4 2/4] hw/intc: add sunxi interrupt controller device
Peter Crosthwaite wrote: On Tue, Nov 26, 2013 at 5:22 PM, liguanglig.f...@cn.fujitsu.com wrote: Signed-off-by: liguanglig.f...@cn.fujitsu.com --- default-configs/arm-softmmu.mak |1 + hw/intc/Makefile.objs |1 + hw/intc/sunxi-pic.c | 238 +++ include/hw/intc/sunxi-pic.h | 20 4 files changed, 260 insertions(+), 0 deletions(-) create mode 100644 hw/intc/sunxi-pic.c create mode 100644 include/hw/intc/sunxi-pic.h diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak index 7bf5ad0..bbe00e4 100644 --- a/default-configs/arm-softmmu.mak +++ b/default-configs/arm-softmmu.mak @@ -83,3 +83,4 @@ CONFIG_SDHCI=y CONFIG_INTEGRATOR_DEBUG=y CONFIG_SUNXI_PIT=y +CONFIG_SUNXI_PIC=y diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs index 47ac442..dad8c43 100644 --- a/hw/intc/Makefile.objs +++ b/hw/intc/Makefile.objs @@ -12,6 +12,7 @@ common-obj-$(CONFIG_IOAPIC) += ioapic_common.o common-obj-$(CONFIG_ARM_GIC) += arm_gic_common.o common-obj-$(CONFIG_ARM_GIC) += arm_gic.o common-obj-$(CONFIG_OPENPIC) += openpic.o +common-obj-$(CONFIG_SUNXI_PIC) += sunxi-pic.o obj-$(CONFIG_APIC) += apic.o apic_common.o obj-$(CONFIG_ARM_GIC_KVM) += arm_gic_kvm.o diff --git a/hw/intc/sunxi-pic.c b/hw/intc/sunxi-pic.c new file mode 100644 index 000..e84fc55 --- /dev/null +++ b/hw/intc/sunxi-pic.c @@ -0,0 +1,238 @@ +/* + * Allwinner sunxi interrupt controller device emulation + * + * Copyright (C) 2013 Li Guang + * Written by Li Guanglig.f...@cn.fujitsu.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * for more details. + */ + +#include hw/sysbus.h +#include hw/devices.h +#include sysemu/sysemu.h +#include hw/intc/sunxi-pic.h + + +typedef struct SunxiPICState { +/* private*/ +SysBusDevice parent_obj; +/* public*/ +MemoryRegion iomem; +qemu_irq parent_fiq; +qemu_irq parent_irq; Blank line here for readability. OK +uint32_t vector; +uint32_t base_addr; +uint32_t protect; +uint32_t nmi; +uint32_t irq_pending[SUNXI_PIC_REG_IDX]; +uint32_t fiq_pending[SUNXI_PIC_REG_IDX]; this appears to be constant 0. I cant find anywhere that sets fiq_pending. here, only NMI can generate fiq, and I did take care NMI now, so there's no real case to set fiq. +uint32_t select[SUNXI_PIC_REG_IDX]; +uint32_t enable[SUNXI_PIC_REG_IDX]; +uint32_t mask[SUNXI_PIC_REG_IDX]; +/*priority setting here*/ +} SunxiPICState; + +static void sunxi_pic_update(SunxiPICState *s) +{ +uint8_t i = 0, j = 0; Initialisers un-needed. OK +bool irq = false; + +for (i = 0, j = 0; i SUNXI_PIC_REG_IDX; i++) { +for (j = 0; j 32; j++) { +if (!test_bit(j, (void *)s-mask[i])) { You can save on a level of indentation here with: if (test_bit(j, (void *)s-mask[i])) { continue; } OK +if (test_bit(j, (void *)s-irq_pending[i])) { +qemu_set_irq(s-parent_irq, 1); qemu_irq_raise() is simpler. I can't find anywhere in the code where the interrupts are lowered. How do they come down, once they are up? +irq = true; +} +if (test_bit(j, (void *)s-fiq_pending[i]) +test_bit(j, (void *)s-select[i])) { +qemu_set_irq(s-parent_fiq, 1); +irq = true; +} +if (irq) { +goto out; What happens if two interrupts are both active, the first setting IRQ the second FIQ? Wont this escape logic cause FIQ raising to potentionally be missed? +} +} +} +} +out: +return; +} + +static void sunxi_pic_set_irq(void *opaque, int irq, int level) +{ +SunxiPICState *s = opaque; + +if (level) { +set_bit(irq, (void *)s-irq_pending[irq/32]); set_bit(irq % 32, ...) OK +} Is this supposed to set either fiq_pending or irq_pending depending on the select bit? Yes, but, I as I stated before, maybe I will take NMI later. +sunxi_pic_update(s); +} + +static uint64_t sunxi_pic_read(void *opaque, hwaddr offset, unsigned size) +{ +SunxiPICState *s = opaque; +uint8_t index = (offset 0xc)/4; + +switch (offset) { +case SUNXI_PIC_VECTOR: +return s-vector; +break; +case SUNXI_PIC_BASE_ADDR: +return s-base_addr; +break; +case
Re: [Qemu-devel] [PATCH v4 1/4] hw/timer: add sunxi timer device
On Tue, Nov 26, 2013 at 6:59 PM, Li Guang lig.f...@cn.fujitsu.com wrote: Peter Crosthwaite wrote: On Tue, Nov 26, 2013 at 5:22 PM, liguanglig.f...@cn.fujitsu.com wrote: Signed-off-by: liguanglig.f...@cn.fujitsu.com --- default-configs/arm-softmmu.mak |2 + hw/timer/Makefile.objs |1 + hw/timer/sunxi-pit.c| 295 +++ include/hw/timer/sunxi-pit.h| 37 + 4 files changed, 335 insertions(+), 0 deletions(-) create mode 100644 hw/timer/sunxi-pit.c create mode 100644 include/hw/timer/sunxi-pit.h diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak index a555eef..7bf5ad0 100644 --- a/default-configs/arm-softmmu.mak +++ b/default-configs/arm-softmmu.mak @@ -81,3 +81,5 @@ CONFIG_VERSATILE_I2C=y CONFIG_SDHCI=y CONFIG_INTEGRATOR_DEBUG=y + +CONFIG_SUNXI_PIT=y diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs index eca5905..f7888e9 100644 --- a/hw/timer/Makefile.objs +++ b/hw/timer/Makefile.objs @@ -27,3 +27,4 @@ obj-$(CONFIG_SH4) += sh_timer.o obj-$(CONFIG_TUSB6010) += tusb6010.o obj-$(CONFIG_MC146818RTC) += mc146818rtc.o +obj-$(CONFIG_SUNXI_PIT) += sunxi-pit.o diff --git a/hw/timer/sunxi-pit.c b/hw/timer/sunxi-pit.c new file mode 100644 index 000..39b84ab --- /dev/null +++ b/hw/timer/sunxi-pit.c @@ -0,0 +1,295 @@ +/* + * Allwinner sunxi timer device emulation + * + * Copyright (C) 2013 Li Guang + * Written by Li Guanglig.f...@cn.fujitsu.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * for more details. + */ + +#include hw/sysbus.h +#include hw/ptimer.h +#include sysemu/sysemu.h +#include hw/timer/sunxi-pit.h + + +typedef struct SunxiTimer { +ptimer_state *timer; +} SunxiTimer; + I don't understand the need for this struct. What was wrong with the direct array of ptimers you had before? because I have to pack timer array into VMSD, and there's no VMSTATE_PTIMER_ARRAY for ptimer_state. Anyway you can just use VMSTATE_STRUCT_ARRAY on ptimers own VMSD definition? If you cant, then I think you make a reasonable case for moving the relevant bits and pieces to headers so they are visible. That or implement VMSTATE_PTIMER_ARRAY. +typedef struct SunxiPITState { +/* private*/ +SysBusDevice parent_obj; +/* public*/ +qemu_irq irq[SUNXI_TIMER_NR]; +SunxiTimer timer[SUNXI_TIMER_NR]; +MemoryRegion iomem; + +uint32_t irq_enable; +uint32_t irq_status; +uint32_t control[SUNXI_TIMER_NR]; +uint32_t interval[SUNXI_TIMER_NR]; +uint32_t count[SUNXI_TIMER_NR]; +uint32_t watch_dog_mode; +uint32_t watch_dog_control; +uint32_t count_lo; +uint32_t count_hi; +uint32_t count_ctl; +} SunxiPITState; + +static uint64_t sunxi_pit_read(void *opaque, hwaddr offset, unsigned size) +{ +SunxiPITState *s = SUNXI_PIT(opaque); +uint8_t index = 0; initializer to 0 un-needed. OK. + +switch (offset) { +case SUNXI_TIMER_IRQ_EN: +return s-irq_enable; +break; +case SUNXI_TIMER_IRQ_ST: +return s-irq_status; +break; +case SUNXI_TIMER_BASE ... SUNXI_TIMER_BASE * 6 + SUNXI_TIMER_COUNT: +index = offset 0xf0; +index= 4; +index -= 1; +switch (offset 0x0f) { +case SUNXI_TIMER_CONTROL: +return s-control[index]; +break; +case SUNXI_TIMER_INTERVAL: +return s-interval[index]; +break; +case SUNXI_TIMER_COUNT: { +SunxiTimer *t =s-timer[index]; +s-count[index] = ptimer_get_count(t-timer); +return s-count[index]; +} +default: +break; +} +break; +case SUNXI_WDOG_CONTROL: +break; +case SUNXI_WDOG_MODE: +break; +case SUNXI_COUNT_LO: +return s-count_lo; +break; +case SUNXI_COUNT_HI: +return s-count_hi; +break; +case SUNXI_COUNT_CTL: +return s-count_ctl; +default: +break; Usual to do a log_guest error here. Same for writes below. OK. +} + +return 0; +} + +static void sunxi_pit_write(void *opaque, hwaddr offset, uint64_t value, +unsigned size) +{ + SunxiPITState *s = SUNXI_PIT(opaque); + uint8_t index = 0; + +switch (offset) { +case SUNXI_TIMER_IRQ_EN: +s-irq_enable =
Re: [Qemu-devel] [PATCH v4 1/4] hw/timer: add sunxi timer device
Peter Crosthwaite wrote: On Tue, Nov 26, 2013 at 6:59 PM, Li Guanglig.f...@cn.fujitsu.com wrote: Peter Crosthwaite wrote: On Tue, Nov 26, 2013 at 5:22 PM, liguanglig.f...@cn.fujitsu.com wrote: Signed-off-by: liguanglig.f...@cn.fujitsu.com --- default-configs/arm-softmmu.mak |2 + hw/timer/Makefile.objs |1 + hw/timer/sunxi-pit.c| 295 +++ include/hw/timer/sunxi-pit.h| 37 + 4 files changed, 335 insertions(+), 0 deletions(-) create mode 100644 hw/timer/sunxi-pit.c create mode 100644 include/hw/timer/sunxi-pit.h diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak index a555eef..7bf5ad0 100644 --- a/default-configs/arm-softmmu.mak +++ b/default-configs/arm-softmmu.mak @@ -81,3 +81,5 @@ CONFIG_VERSATILE_I2C=y CONFIG_SDHCI=y CONFIG_INTEGRATOR_DEBUG=y + +CONFIG_SUNXI_PIT=y diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs index eca5905..f7888e9 100644 --- a/hw/timer/Makefile.objs +++ b/hw/timer/Makefile.objs @@ -27,3 +27,4 @@ obj-$(CONFIG_SH4) += sh_timer.o obj-$(CONFIG_TUSB6010) += tusb6010.o obj-$(CONFIG_MC146818RTC) += mc146818rtc.o +obj-$(CONFIG_SUNXI_PIT) += sunxi-pit.o diff --git a/hw/timer/sunxi-pit.c b/hw/timer/sunxi-pit.c new file mode 100644 index 000..39b84ab --- /dev/null +++ b/hw/timer/sunxi-pit.c @@ -0,0 +1,295 @@ +/* + * Allwinner sunxi timer device emulation + * + * Copyright (C) 2013 Li Guang + * Written by Li Guanglig.f...@cn.fujitsu.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * for more details. + */ + +#include hw/sysbus.h +#include hw/ptimer.h +#include sysemu/sysemu.h +#include hw/timer/sunxi-pit.h + + +typedef struct SunxiTimer { +ptimer_state *timer; +} SunxiTimer; + I don't understand the need for this struct. What was wrong with the direct array of ptimers you had before? because I have to pack timer array into VMSD, and there's no VMSTATE_PTIMER_ARRAY for ptimer_state. Anyway you can just use VMSTATE_STRUCT_ARRAY on ptimers own VMSD definition? If you cant, then I think you make a reasonable case for moving the relevant bits and pieces to headers so they are visible. That or implement VMSTATE_PTIMER_ARRAY. right, but that seems be in a separated patch, I consider to use current way, can I? +typedef struct SunxiPITState { +/* private*/ +SysBusDevice parent_obj; +/* public*/ +qemu_irq irq[SUNXI_TIMER_NR]; +SunxiTimer timer[SUNXI_TIMER_NR]; +MemoryRegion iomem; + +uint32_t irq_enable; +uint32_t irq_status; +uint32_t control[SUNXI_TIMER_NR]; +uint32_t interval[SUNXI_TIMER_NR]; +uint32_t count[SUNXI_TIMER_NR]; +uint32_t watch_dog_mode; +uint32_t watch_dog_control; +uint32_t count_lo; +uint32_t count_hi; +uint32_t count_ctl; +} SunxiPITState; + +static uint64_t sunxi_pit_read(void *opaque, hwaddr offset, unsigned size) +{ +SunxiPITState *s = SUNXI_PIT(opaque); +uint8_t index = 0; initializer to 0 un-needed. OK. + +switch (offset) { +case SUNXI_TIMER_IRQ_EN: +return s-irq_enable; +break; +case SUNXI_TIMER_IRQ_ST: +return s-irq_status; +break; +case SUNXI_TIMER_BASE ... SUNXI_TIMER_BASE * 6 + SUNXI_TIMER_COUNT: +index = offset 0xf0; +index= 4; +index -= 1; +switch (offset 0x0f) { +case SUNXI_TIMER_CONTROL: +return s-control[index]; +break; +case SUNXI_TIMER_INTERVAL: +return s-interval[index]; +break; +case SUNXI_TIMER_COUNT: { +SunxiTimer *t =s-timer[index]; +s-count[index] = ptimer_get_count(t-timer); +return s-count[index]; +} +default: +break; +} +break; +case SUNXI_WDOG_CONTROL: +break; +case SUNXI_WDOG_MODE: +break; +case SUNXI_COUNT_LO: +return s-count_lo; +break; +case SUNXI_COUNT_HI: +return s-count_hi; +break; +case SUNXI_COUNT_CTL: +return s-count_ctl; +default: +break; Usual to do a log_guest error here. Same for writes below. OK. +} + +return 0; +} + +static void sunxi_pit_write(void *opaque, hwaddr offset, uint64_t value, +unsigned size) +{ + SunxiPITState *s = SUNXI_PIT(opaque); + uint8_t
Re: [Qemu-devel] [PATCH v4 2/7] block: Introduce op_blockers to BlockDriverState
Il 26/11/2013 03:07, Fam Zheng ha scritto: +void bdrv_op_unblock(BlockDriverState *bs, BlockOpType op, Error *reason) What about making BlockOpType a bitmask, and having bdrv_op_{,un}block take multiple ORed BlockOpTypes? bdrv_op_{,un}block_all then are not necessary, you only need a BLOCK_OPERATION_ALL value. Bitmap is not enough, at least it should be an array. For example when we enable multiple block jobs, there're two stoppers for drive_del, right? I said bitmask, not bitmap. :) So that you can add the same Error to multiple items of the array with a single bdrv_op_block call (by ORing them into the second parameter). Paolo
Re: [Qemu-devel] [PATCH v4 3/4] hw/arm: add sunxi machine type
On Tue, Nov 26, 2013 at 5:22 PM, liguang lig.f...@cn.fujitsu.com wrote: Signed-off-by: liguang lig.f...@cn.fujitsu.com --- hw/arm/Makefile.objs |1 + hw/arm/sunxi-soc.c | 98 ++ 2 files changed, 99 insertions(+), 0 deletions(-) create mode 100644 hw/arm/sunxi-soc.c diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs index 3671b42..f9f3071 100644 --- a/hw/arm/Makefile.objs +++ b/hw/arm/Makefile.objs @@ -5,3 +5,4 @@ obj-y += tosa.o versatilepb.o vexpress.o xilinx_zynq.o z2.o obj-y += armv7m.o exynos4210.o pxa2xx.o pxa2xx_gpio.o pxa2xx_pic.o obj-y += omap1.o omap2.o strongarm.o +obj-y += sunxi-soc.o diff --git a/hw/arm/sunxi-soc.c b/hw/arm/sunxi-soc.c new file mode 100644 index 000..b45af6d --- /dev/null +++ b/hw/arm/sunxi-soc.c @@ -0,0 +1,98 @@ +/* + * Allwinner sunxi series SoC emulation + * + * Copyright (C) 2013 Li Guang + * Written by Li Guang lig.f...@cn.fujitsu.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * for more details. + */ + +#include hw/sysbus.h +#include hw/devices.h +#include hw/boards.h +#include hw/arm/arm.h +#include hw/ptimer.h +#include hw/char/serial.h +#include hw/timer/sunxi-pit.h +#include hw/intc/sunxi-pic.h + +#include sysemu/sysemu.h +#include exec/address-spaces.h + + +#define SUNXI_PIC_REG_BASE 0x01c20400 +#define SUNXI_PIT_REG_BASE 0x01c20c00 +#define SUNXI_UART0_REG_BASE 0x01c28000 + +static struct arm_boot_info sunxi_binfo = { +.loader_start = 0x4000, +.board_id = 0x1008, +}; + +static void sunxi_init(QEMUMachineInitArgs *args) I would check with Andreas/PMM on what the go is with SoCs regarding container devices and boards. My (vague) understanding is that SoCs should be container devices and boards instantiate those containers with off-chip connectivity. This seems flat to me, with everything on board level. +{ +ram_addr_t ram_size = args-ram_size; +const char *cpu_model = args-cpu_model; +const char *kernel_filename = args-kernel_filename; +const char *kernel_cmdline = args-kernel_cmdline; +ARMCPU *cpu; +MemoryRegion *address_space_mem = get_system_memory(); +MemoryRegion *ram = g_new(MemoryRegion, 1); +MemoryRegion *ram_alias = g_new(MemoryRegion, 1); +qemu_irq pic[95]; [SUNXI_PIC_INT_NR] Regards, Peter +DeviceState *dev; +uint8_t i; + +/*here we currently support sunxi-4i*/ +cpu_model = cortex-a8; +cpu = cpu_arm_init(cpu_model); +if (!cpu) { +fprintf(stderr, Unable to find CPU definition\n); +exit(1); +} + +memory_region_init_ram(ram, NULL, sunxi-soc.ram, ram_size); +memory_region_add_subregion(address_space_mem, 0, ram); +memory_region_init_alias(ram_alias, NULL, ram.alias, ram, 0, ram_size); +memory_region_add_subregion(address_space_mem, 0x4000, ram_alias); + +dev = sysbus_create_varargs(TYPE_SUNXI_PIC, SUNXI_PIC_REG_BASE, +qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ), +qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_FIQ), +NULL); +for (i = 0; i SUNXI_PIC_INT_NR; i++) { +pic[i] = qdev_get_gpio_in(dev, i); +} + +sysbus_create_varargs(TYPE_SUNXI_PIT, SUNXI_PIT_REG_BASE, pic[22], pic[23], + pic[24], pic[25], pic[67], pic[68], NULL); + +serial_mm_init(address_space_mem, SUNXI_UART0_REG_BASE, 2, pic[1], 115200, +serial_hds[0], DEVICE_NATIVE_ENDIAN); + +sunxi_binfo.ram_size = ram_size; +sunxi_binfo.kernel_filename = kernel_filename; +sunxi_binfo.kernel_cmdline = kernel_cmdline; +arm_load_kernel(cpu, sunxi_binfo); +} + +static QEMUMachine sunxi_machine = { +.name = sunxi, +.desc = Allwinner's SoC (sunxi series), +.init = sunxi_init, +}; + +static void sunxi_machine_init(void) +{ +qemu_register_machine(sunxi_machine); +} + +machine_init(sunxi_machine_init); -- 1.7.2.5
Re: [Qemu-devel] [PATCH v4 1/4] hw/timer: add sunxi timer device
On Tue, Nov 26, 2013 at 7:19 PM, Li Guang lig.f...@cn.fujitsu.com wrote: Peter Crosthwaite wrote: On Tue, Nov 26, 2013 at 6:59 PM, Li Guanglig.f...@cn.fujitsu.com wrote: Peter Crosthwaite wrote: On Tue, Nov 26, 2013 at 5:22 PM, liguanglig.f...@cn.fujitsu.com wrote: Signed-off-by: liguanglig.f...@cn.fujitsu.com --- default-configs/arm-softmmu.mak |2 + hw/timer/Makefile.objs |1 + hw/timer/sunxi-pit.c| 295 +++ include/hw/timer/sunxi-pit.h| 37 + 4 files changed, 335 insertions(+), 0 deletions(-) create mode 100644 hw/timer/sunxi-pit.c create mode 100644 include/hw/timer/sunxi-pit.h diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak index a555eef..7bf5ad0 100644 --- a/default-configs/arm-softmmu.mak +++ b/default-configs/arm-softmmu.mak @@ -81,3 +81,5 @@ CONFIG_VERSATILE_I2C=y CONFIG_SDHCI=y CONFIG_INTEGRATOR_DEBUG=y + +CONFIG_SUNXI_PIT=y diff --git a/hw/timer/Makefile.objs b/hw/timer/Makefile.objs index eca5905..f7888e9 100644 --- a/hw/timer/Makefile.objs +++ b/hw/timer/Makefile.objs @@ -27,3 +27,4 @@ obj-$(CONFIG_SH4) += sh_timer.o obj-$(CONFIG_TUSB6010) += tusb6010.o obj-$(CONFIG_MC146818RTC) += mc146818rtc.o +obj-$(CONFIG_SUNXI_PIT) += sunxi-pit.o diff --git a/hw/timer/sunxi-pit.c b/hw/timer/sunxi-pit.c new file mode 100644 index 000..39b84ab --- /dev/null +++ b/hw/timer/sunxi-pit.c @@ -0,0 +1,295 @@ +/* + * Allwinner sunxi timer device emulation + * + * Copyright (C) 2013 Li Guang + * Written by Li Guanglig.f...@cn.fujitsu.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * for more details. + */ + +#include hw/sysbus.h +#include hw/ptimer.h +#include sysemu/sysemu.h +#include hw/timer/sunxi-pit.h + + +typedef struct SunxiTimer { +ptimer_state *timer; +} SunxiTimer; + I don't understand the need for this struct. What was wrong with the direct array of ptimers you had before? because I have to pack timer array into VMSD, and there's no VMSTATE_PTIMER_ARRAY for ptimer_state. Anyway you can just use VMSTATE_STRUCT_ARRAY on ptimers own VMSD definition? If you cant, then I think you make a reasonable case for moving the relevant bits and pieces to headers so they are visible. That or implement VMSTATE_PTIMER_ARRAY. right, but that seems be in a separated patch, I consider to use current way, can I? Well you have effectively implemented VMSTATE_PTIMER_ARRAY here locally. Its probably best to just fix ptimer as first patch in this series and it keeps your device model cleaner from the start. You may have already done the hard work. Regards, Peter
Re: [Qemu-devel] [PATCHv2 1.8 2/9] qemu-img: fix usage instruction for qemu-img convert
Il 26/11/2013 09:56, Peter Lieven ha scritto: Reviewed-by: Eric Blake ebl...@redhat.com Signed-off-by: Peter Lieven p...@kamp.de --- qemu-img.c |1 - 1 file changed, 1 deletion(-) diff --git a/qemu-img.c b/qemu-img.c index efb744c..e2d1a0a 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -105,7 +105,6 @@ static void help(void) conversion. If the number of bytes is 0, the source will not be scanned for\n unallocated or zero sectors, and the destination image will always be\n fully allocated\n - images will always be fully allocated\n '--output' takes the format in which the output must be done (human or json)\n '-n' skips the target volume creation (useful if the volume is created\n prior to running qemu-img)\n Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Re: [Qemu-devel] [PATCHv2 1.8 5/9] block/iscsi: set bs-bl.opt_transfer_length
Il 26/11/2013 09:56, Peter Lieven ha scritto: Signed-off-by: Peter Lieven p...@kamp.de --- block/iscsi.c |3 +++ 1 file changed, 3 insertions(+) diff --git a/block/iscsi.c b/block/iscsi.c index 75d6b87..1109d8c 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -1457,6 +1457,9 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags, } bs-bl.write_zeroes_alignment = sector_lun2qemu(iscsilun-bl.opt_unmap_gran, iscsilun); + +bs-bl.opt_transfer_length = sector_lun2qemu(iscsilun-bl.opt_xfer_len, + iscsilun); } #if defined(LIBISCSI_FEATURE_NOP_COUNTER) Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Re: [Qemu-devel] [PATCHv2 1.8 4/9] block: add opt_transfer_length to BlockLimits
Il 26/11/2013 09:56, Peter Lieven ha scritto: Signed-off-by: Peter Lieven p...@kamp.de --- include/block/block_int.h |3 +++ 1 file changed, 3 insertions(+) diff --git a/include/block/block_int.h b/include/block/block_int.h index 95140b6..6499516 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -242,6 +242,9 @@ typedef struct BlockLimits { /* optimal alignment for write zeroes requests in sectors */ int64_t write_zeroes_alignment; + +/* optimal transfer length in sectors */ +int opt_transfer_length; } BlockLimits; /* Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Re: [Qemu-devel] [PATCHv2 1.8 6/9] qemu-img: dynamically adjust iobuffer size during convert
Il 26/11/2013 09:56, Peter Lieven ha scritto: since the convert process is basically a sync operation it might be benificial in some case to change the hardcoded I/O buffer size to a greater value. This patch increases the I/O buffer size if the output driver advertises an optimal transfer length or discard alignment that is greater than the default buffer size of 2M. Signed-off-by: Peter Lieven p...@kamp.de --- qemu-img.c | 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index e2d1a0a..b076d44 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -1135,6 +1135,7 @@ static int img_convert(int argc, char **argv) sector_num_next_status = 0; uint64_t bs_sectors; uint8_t * buf = NULL; +size_t bufsectors = IO_BUF_SIZE / BDRV_SECTOR_SIZE; const uint8_t *buf1; BlockDriverInfo bdi; QEMUOptionParameter *param = NULL, *create_options = NULL; @@ -1371,7 +1372,16 @@ static int img_convert(int argc, char **argv) bs_i = 0; bs_offset = 0; bdrv_get_geometry(bs[0], bs_sectors); -buf = qemu_blockalign(out_bs, IO_BUF_SIZE); + +/* increase bufsectors from the default 4096 (2M) if opt_transfer_length + * or discard_alignment of the out_bs is greater. Limit to 32768 (16MB) + * as maximum. */ +bufsectors = MIN(32768, + MAX(bufsectors, MAX(out_bs-bl.opt_transfer_length, + out_bs-bl.discard_alignment)) +); + +buf = qemu_blockalign(out_bs, bufsectors * BDRV_SECTOR_SIZE); if (skip_create) { int64_t output_length = bdrv_getlength(out_bs); @@ -1394,7 +1404,7 @@ static int img_convert(int argc, char **argv) goto out; } cluster_size = bdi.cluster_size; -if (cluster_size = 0 || cluster_size IO_BUF_SIZE) { +if (cluster_size = 0 || cluster_size bufsectors * BDRV_SECTOR_SIZE) { error_report(invalid cluster size); ret = -1; goto out; @@ -1531,8 +1541,8 @@ static int img_convert(int argc, char **argv) sector_num_next_status = sector_num + n1; } -if (nb_sectors = (IO_BUF_SIZE / 512)) { -n = (IO_BUF_SIZE / 512); +if (nb_sectors = bufsectors) { +n = bufsectors; } else { n = nb_sectors; } Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Re: [Qemu-devel] [PATCHv2 1.8 7/9] qemu-img: round down request length to an aligned sector
Il 26/11/2013 09:56, Peter Lieven ha scritto: this patch shortens requests to end at an aligned sector so that the next request starts aligned. Signed-off-by: Peter Lieven p...@kamp.de --- qemu-img.c | 30 -- 1 file changed, 20 insertions(+), 10 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index b076d44..1421f0f 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -1124,8 +1124,7 @@ out3: static int img_convert(int argc, char **argv) { -int c, n, n1, bs_n, bs_i, compress, cluster_size, -cluster_sectors, skip_create; +int c, n, n1, bs_n, bs_i, compress, cluster_sectors, skip_create; int64_t ret = 0; int progress = 0, flags; const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename; @@ -1397,19 +1396,21 @@ static int img_convert(int argc, char **argv) } } +cluster_sectors = 0; +ret = bdrv_get_info(out_bs, bdi); +if (ret 0 compress) { +error_report(could not get block driver info); +goto out; +} else { +cluster_sectors = bdi.cluster_size / BDRV_SECTOR_SIZE; +} + if (compress) { -ret = bdrv_get_info(out_bs, bdi); -if (ret 0) { -error_report(could not get block driver info); -goto out; -} -cluster_size = bdi.cluster_size; -if (cluster_size = 0 || cluster_size bufsectors * BDRV_SECTOR_SIZE) { +if (cluster_sectors = 0 || cluster_sectors bufsectors) { error_report(invalid cluster size); ret = -1; goto out; } -cluster_sectors = cluster_size 9; sector_num = 0; nb_sectors = total_sectors; @@ -1547,6 +1548,15 @@ static int img_convert(int argc, char **argv) n = nb_sectors; } +/* round down request length to an aligned sector */ If you have to respin, /* do not bother doing this on short requests. They happen when we found an all-zero area, and the next sector to write will not be sector_num + n. */ +if (cluster_sectors 0 n = cluster_sectors) { +int64_t next_aligned_sector = (sector_num + n); +next_aligned_sector -= next_aligned_sector % cluster_sectors; +if (sector_num + n next_aligned_sector) { +n = next_aligned_sector - sector_num; +} +} + if (n bs_offset + bs_sectors - sector_num) { n = bs_offset + bs_sectors - sector_num; } Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Re: [Qemu-devel] [PATCHv2 1.8 9/9] qemu-img: increase min_sparse to 128 sectors (64kb)
Il 26/11/2013 09:56, Peter Lieven ha scritto: Suggested-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Peter Lieven p...@kamp.de --- qemu-img.c|4 ++-- qemu-img.texi |2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index cc7540f..6f1179b 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -101,7 +101,7 @@ static void help(void) '-p' show progress of command (only certain commands)\n '-pp' show progress of command in sectors (only convert command)\n '-q' use Quiet mode - do not print any output (except errors)\n - '-S' indicates the consecutive number of bytes (defaults to 4k) that must\n + '-S' indicates the consecutive number of bytes (defaults to 64k) that must\n contain only zeros for qemu-img to create a sparse image during\n conversion. If the number of bytes is 0, the source will not be scanned for\n unallocated or zero sectors, and the destination image will always be\n @@ -1158,7 +1158,7 @@ static int img_convert(int argc, char **argv) QEMUOptionParameter *out_baseimg_param; char *options = NULL; const char *snapshot_name = NULL; -int min_sparse = 8; /* Need at least 4k of zeros for sparse detection */ +int min_sparse = 128; /* Need at least 64k of zeros for sparse detection */ bool quiet = false; Error *local_err = NULL; diff --git a/qemu-img.texi b/qemu-img.texi index cb4a3eb..c0e86ab 100644 --- a/qemu-img.texi +++ b/qemu-img.texi @@ -195,7 +195,7 @@ Image conversion is also useful to get smaller image when using a growable format such as @code{qcow} or @code{cow}: the empty sectors are detected and suppressed from the destination image. -@var{sparse_size} indicates the consecutive number of bytes (defaults to 4k) +@var{sparse_size} indicates the consecutive number of bytes (defaults to 64k) that must contain only zeros for qemu-img to create a sparse image during conversion. If @var{sparse_size} is 0, the source will not be scanned for unallocated or zero sectors, and the destination image will always be Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Re: [Qemu-devel] [PATCH 14/16] pc: Postpone adding ACPI and SMBIOS to fw_cfg
On Thu, Nov 14, 2013 at 07:40:31AM -0600, Corey Minyard wrote: On 11/14/2013 07:38 AM, Michael S. Tsirkin wrote: On Thu, Nov 14, 2013 at 07:28:00AM -0600, Corey Minyard wrote: On 11/14/2013 01:30 AM, Michael S. Tsirkin wrote: On Tue, Nov 12, 2013 at 10:33:13AM -0600, Corey Minyard wrote: Postpone the addition of the ACPI and SMBIOS tables until after device initialization. This allows devices to add entries to these tables. Signed-off-by: Corey Minyard cminy...@mvsita.com Why delay adding FW_CFG_ACPI_TABLES? These are normally specified by user so they are constant. Because the IPMI device has to dynamically add an entry based on its configuration. Where? Which patch does this? It would be a strange thing to do because FW_CFG_ACPI_TABLES is normally intended for user-supplied tables (though q35 misused it for other purposes). I don't have that patch out yet. I had written some code to dynamically generate ACPI tables, but that was rejected and from what I understand something was done recently so these tables can be dynamically generated some other way. I haven't had time to look into this, but I wanted to get the main patch set out first. I know that these tables are generally fixed, but the entry for IPMI should only be present if the device is specified, and its contents depend on the configuration supplied, -corey OK need some help with that? Main options are - write code in ASL, supply entry in ACPI always, patch some fields to enable/disable it dynamically see how pvpanic entry is disabled for an example - write code in ASL append it to ACPI if necessary see how CPU entries are added depending on number of CPUs for an example - generate code in AML directly see acpi based pci hotplug on pci branch in my tree for an example As it is the tables get added to the BIOS access before devices initialize. -corey --- hw/i386/pc.c | 38 ++ 1 file changed, 30 insertions(+), 8 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index dee409d..765c95e 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -607,8 +607,6 @@ static unsigned int pc_apic_id_limit(unsigned int max_cpus) static FWCfgState *bochs_bios_init(void) { FWCfgState *fw_cfg; -uint8_t *smbios_table; -size_t smbios_len; uint64_t *numa_fw_cfg; int i, j; unsigned int apic_id_limit = pc_apic_id_limit(max_cpus); @@ -631,14 +629,8 @@ static FWCfgState *bochs_bios_init(void) fw_cfg_add_i16(fw_cfg, FW_CFG_MAX_CPUS, (uint16_t)apic_id_limit); fw_cfg_add_i32(fw_cfg, FW_CFG_ID, 1); fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size); -fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES, - acpi_tables, acpi_tables_len); fw_cfg_add_i32(fw_cfg, FW_CFG_IRQ0_OVERRIDE, kvm_allows_irq0_override()); -smbios_table = smbios_get_table(smbios_len); -if (smbios_table) -fw_cfg_add_bytes(fw_cfg, FW_CFG_SMBIOS_ENTRIES, - smbios_table, smbios_len); fw_cfg_add_bytes(fw_cfg, FW_CFG_E820_TABLE, e820_table, sizeof(e820_table)); @@ -1127,6 +1119,31 @@ void pc_acpi_init(const char *default_dsdt) } } +struct pc_bios_post_init { +Notifier post_init; +void *fw_cfg; +}; + +/* Add the ACPI and SMBIOS tables after all the hardware has been initialized. + * This gives devices a chance to add to those tables. + */ +static void pc_bios_post_initfn(Notifier *n, void *opaque) +{ +struct pc_bios_post_init *p = container_of(n, struct pc_bios_post_init, + post_init); +uint8_t *smbios_table; +size_t smbios_len; + +fw_cfg_add_bytes(p-fw_cfg, FW_CFG_ACPI_TABLES, + acpi_tables, acpi_tables_len); +smbios_table = smbios_get_table(smbios_len); +if (smbios_table) +fw_cfg_add_bytes(p-fw_cfg, FW_CFG_SMBIOS_ENTRIES, + smbios_table, smbios_len); +} + +static struct pc_bios_post_init post_init; + FWCfgState *pc_memory_init(MemoryRegion *system_memory, const char *kernel_filename, const char *kernel_cmdline, @@ -1196,6 +1213,11 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory, rom_add_option(option_rom[i].name, option_rom[i].bootindex); } guest_info-fw_cfg = fw_cfg; + +post_init.fw_cfg = fw_cfg; +post_init.post_init.notify = pc_bios_post_initfn; +qemu_add_machine_init_done_notifier(post_init.post_init); + return fw_cfg; } -- 1.8.3.1
Re: [Qemu-devel] os x boot broken by commit 11948748495841bd54721b250d68c7b3cb0475ef
On Thu, Nov 21, 2013 at 05:02:07PM -0500, Gabriel L. Somlo wrote: Added qemu-devel, since that is where this stuff belongs now. Everyone else, sorry for the dupe... On Thu, Nov 21, 2013 at 07:14:27PM +0100, Paolo Bonzini wrote: Can you remind us about your DSDT modifications? It should be possible to patch the HPET and applesmc bits appropriately from QEMU (or to move them from the DSDT to an SSDT that is built entirely in QEMU). It actually isn't impossible that Mac OS X would boot just fine with 1.8... My current DSDT patch (against QEMU) is enclosed below. The HPET basically needs IRQNoFlags() {2, 8}, which causes XP to bluescreen. So, I've made it conditional on the SMC STA method returning success (0x0B). The SMC node's STA method returns 0x0B unconditionally on real hardware. So I was planning on figuring out what's easier in the context of the most recent QEMU code base: 1. dynamically generating (during qemu runtime initialization) a DSDT entry for SMC with hardcoded 0x0B STA method, whenever --device isa-applesmc is present on the qemu command line or 2. writing a static (compile-time) SMC node but with a slightly smarter _STA method, which returns 0x0B when --device isa-applesmc was given on the cmdline, or which returns 0x00 in the absence of --device isa-applesmc. Either 1. or 2. could be used with HPET -- I can make inclusion of IRQNoFlags dependent on either the success or on the presence of SMC._STA() :) Let me know what you think. Thanks, --Gabriel Without discussing whether this is the right thing to do, main options to implement this would be: - write code in ASL, supply entry in ACPI always, patch some fields to enable/disable it dynamically see how pvpanic entry is disabled for an example - write code in ASL append it to ACPI if necessary see how CPU entries are added depending on number of CPUs for an example - generate code in AML directly see acpi based pci hotplug on pci branch in my tree for an example
Re: [Qemu-devel] [PATCHv2 1.8 1/9] qemu-img: add support for skipping zeroes in input during convert
Il 26/11/2013 09:56, Peter Lieven ha scritto: we currently do not check if a sector is allocated during convert. This means if a sector is unallocated that we allocate a bounce buffer of zeroes, find out its zero later and do not write it in the best case. In the worst case this can lead to reading blocks from a raw device (like iSCSI) altough we could easily know via get_block_status that they are zero and simply skip them. This patch also fixes the progress output not being at 100% after a successful conversion. Signed-off-by: Peter Lieven p...@kamp.de --- qemu-img.c | 85 ++-- 1 file changed, 48 insertions(+), 37 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index dc0c2f0..efb744c 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -1125,13 +1125,15 @@ out3: static int img_convert(int argc, char **argv) { -int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size, +int c, n, n1, bs_n, bs_i, compress, cluster_size, cluster_sectors, skip_create; +int64_t ret = 0; int progress = 0, flags; const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename; BlockDriver *drv, *proto_drv; BlockDriverState **bs = NULL, *out_bs = NULL; -int64_t total_sectors, nb_sectors, sector_num, bs_offset; +int64_t total_sectors, nb_sectors, sector_num, bs_offset, +sector_num_next_status = 0; uint64_t bs_sectors; uint8_t * buf = NULL; const uint8_t *buf1; @@ -1140,7 +1142,6 @@ static int img_convert(int argc, char **argv) QEMUOptionParameter *out_baseimg_param; char *options = NULL; const char *snapshot_name = NULL; -float local_progress = 0; int min_sparse = 8; /* Need at least 4k of zeros for sparse detection */ bool quiet = false; Error *local_err = NULL; @@ -1403,10 +1404,6 @@ static int img_convert(int argc, char **argv) sector_num = 0; nb_sectors = total_sectors; -if (nb_sectors != 0) { -local_progress = (float)100 / -(nb_sectors / MIN(nb_sectors, cluster_sectors)); -} for(;;) { int64_t bs_num; @@ -1464,7 +1461,7 @@ static int img_convert(int argc, char **argv) } } sector_num += n; -qemu_progress_print(local_progress, 100); +qemu_progress_print(100.0 * sector_num / total_sectors, 0); } /* signal EOF to align */ bdrv_write_compressed(out_bs, 0, NULL, 0); @@ -1481,21 +1478,13 @@ static int img_convert(int argc, char **argv) sector_num = 0; // total number of sectors converted so far nb_sectors = total_sectors - sector_num; -if (nb_sectors != 0) { -local_progress = (float)100 / -(nb_sectors / MIN(nb_sectors, IO_BUF_SIZE / 512)); -} for(;;) { nb_sectors = total_sectors - sector_num; if (nb_sectors = 0) { +ret = 0; break; } -if (nb_sectors = (IO_BUF_SIZE / 512)) { -n = (IO_BUF_SIZE / 512); -} else { -n = nb_sectors; -} while (sector_num - bs_offset = bs_sectors) { bs_i ++; @@ -1507,34 +1496,53 @@ static int img_convert(int argc, char **argv) sector_num, bs_i, bs_offset, bs_sectors); */ } -if (n bs_offset + bs_sectors - sector_num) { -n = bs_offset + bs_sectors - sector_num; -} - -/* If the output image is being created as a copy on write image, - assume that sectors which are unallocated in the input image - are present in both the output's and input's base images (no - need to copy them). */ -if (out_baseimg) { -ret = bdrv_is_allocated(bs[bs_i], sector_num - bs_offset, -n, n1); +if ((out_baseimg || has_zero_init) +sector_num = sector_num_next_status) { +n = nb_sectors INT_MAX ? INT_MAX : nb_sectors; +ret = bdrv_get_block_status(bs[bs_i], sector_num - bs_offset, +n, n1); if (ret 0) { -error_report(error while reading metadata for sector - % PRId64 : %s, - sector_num - bs_offset, strerror(-ret)); +error_report(error while reading block status of sector % + PRId64 : %s, sector_num - bs_offset, + strerror(-ret)); goto out; } -if (!ret) { +/* If the output image is zero
Re: [Qemu-devel] [PATCHv2 1.8 8/9] qemu-img: add option to show progress in sectors
I think the right way to do this would be to add the functionality to qemu-progress.c (i.e. pass a number of sectors and let it choose between printing % or sectors). I don't like the qemu-progress.c API anyway, so a rewrite would be welcome. :) Paolo Il 26/11/2013 09:56, Peter Lieven ha scritto: Signed-off-by: Peter Lieven p...@kamp.de --- qemu-img-cmds.hx |4 ++-- qemu-img.c | 31 --- qemu-img.texi|4 +++- 3 files changed, 33 insertions(+), 6 deletions(-) diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx index da1d965..6c8183b 100644 --- a/qemu-img-cmds.hx +++ b/qemu-img-cmds.hx @@ -34,9 +34,9 @@ STEXI ETEXI DEF(convert, img_convert, -convert [-c] [-p] [-q] [-n] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-s snapshot_name] [-S sparse_size] filename [filename2 [...]] output_filename) +convert [-c] [-p|-pp] [-q] [-n] [-f fmt] [-t cache] [-O output_fmt] [-o options] [-s snapshot_name] [-S sparse_size] filename [filename2 [...]] output_filename) STEXI -@item convert [-c] [-p] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] @var{filename} [@var{filename2} [...]] @var{output_filename} +@item convert [-c] [-p|-pp] [-q] [-n] [-f @var{fmt}] [-t @var{cache}] [-O @var{output_fmt}] [-o @var{options}] [-s @var{snapshot_name}] [-S @var{sparse_size}] @var{filename} [@var{filename2} [...]] @var{output_filename} ETEXI DEF(info, img_info, diff --git a/qemu-img.c b/qemu-img.c index 1421f0f..cc7540f 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -99,6 +99,7 @@ static void help(void) rebasing in this case (useful for renaming the backing file)\n '-h' with or without a command shows this help and lists the supported formats\n '-p' show progress of command (only certain commands)\n + '-pp' show progress of command in sectors (only convert command)\n '-q' use Quiet mode - do not print any output (except errors)\n '-S' indicates the consecutive number of bytes (defaults to 4k) that must\n contain only zeros for qemu-img to create a sparse image during\n @@ -1122,6 +1123,22 @@ out3: return ret; } +static void print_sector_progress(int progress, int64_t sector_num, + int64_t total_sectors) +{ +static int64_t last_sector = -1; +if (progress == 2) { +if (sector_num == 0 || +sector_num last_sector + 0.02 * total_sectors || +sector_num == total_sectors) { +printf(%ld of %ld sectors converted.\r, sector_num, + total_sectors); +fflush(stdout); +last_sector = sector_num; +} +} +} + static int img_convert(int argc, char **argv) { int c, n, n1, bs_n, bs_i, compress, cluster_sectors, skip_create; @@ -1130,7 +1147,7 @@ static int img_convert(int argc, char **argv) const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename; BlockDriver *drv, *proto_drv; BlockDriverState **bs = NULL, *out_bs = NULL; -int64_t total_sectors, nb_sectors, sector_num, bs_offset, +int64_t total_sectors = 0, nb_sectors, sector_num, bs_offset, sector_num_next_status = 0; uint64_t bs_sectors; uint8_t * buf = NULL; @@ -1201,7 +1218,7 @@ static int img_convert(int argc, char **argv) break; } case 'p': -progress = 1; +progress++; break; case 't': cache = optarg; @@ -1227,7 +1244,7 @@ static int img_convert(int argc, char **argv) out_filename = argv[argc - 1]; /* Initialize before goto out */ -qemu_progress_init(progress, 2.0); +qemu_progress_init(progress == 1, 2.0); if (options is_help_option(options)) { ret = print_block_option_help(out_filename, out_fmt); @@ -1258,6 +1275,8 @@ static int img_convert(int argc, char **argv) total_sectors += bs_sectors; } +print_sector_progress(progress, 0, total_sectors); + if (snapshot_name != NULL) { if (bs_n 1) { error_report(No support for concatenating multiple snapshot); @@ -1472,6 +1491,7 @@ static int img_convert(int argc, char **argv) } sector_num += n; qemu_progress_print(100.0 * sector_num / total_sectors, 0); +print_sector_progress(progress, sector_num, total_sectors); } /* signal EOF to align */ bdrv_write_compressed(out_bs, 0, NULL, 0); @@ -1587,11 +1607,16 @@ static int img_convert(int argc, char **argv) buf1 += n1 * 512; } qemu_progress_print(100.0 * sector_num / total_sectors, 0); +print_sector_progress(progress, sector_num,
Re: [Qemu-devel] os x boot broken by commit 11948748495841bd54721b250d68c7b3cb0475ef
On Fri, Nov 22, 2013 at 10:00:20AM +0100, Paolo Bonzini wrote: Il 21/11/2013 23:02, Gabriel L. Somlo ha scritto: On Thu, Nov 21, 2013 at 07:14:27PM +0100, Paolo Bonzini wrote: Can you remind us about your DSDT modifications? It should be possible to patch the HPET and applesmc bits appropriately from QEMU (or to move them from the DSDT to an SSDT that is built entirely in QEMU). It actually isn't impossible that Mac OS X would boot just fine with 1.8... My current DSDT patch (against QEMU) is enclosed below. The HPET basically needs IRQNoFlags() {2, 8}, which causes XP to bluescreen. The IRQNoFlags(){2,8} setting makes sense if the general configuration register of the HPET has bits 0..1=1 (HPET enabled = 1 and HPET legacy replacement route = 1). That would be something like Field(HPTM, DWordAcc, Lock, Preserve) { VEND, 32, PRD, 32, UNUS, 32 GCNF, 32 } ... Method(_CRS, 0) { Store(GCNF, Local0) If (LEqual(LAnd(Local0, 3), 3)) { // Legacy replacement route ConcatenateResTemplate(RESP, RESI, Local1) Return (Local1) } else { Return (RESP) } } Which reminds me. We run C preprocessor over the source so there's no good reason to use 4-byte names anymore really (iasl also has an integrated preprocessor but that only appeared in 2012, not sure it's wise to rely on that). So simply #define HPET_MEMORY HPTM and use HPET_MEMORY everywhere. If that doesn't work, there are various choices here... (1) Does Mac OS work if you add a _PRS with IRQNoFlags and Memory32Fixed, but leave _CRS as it is? (2) does it work with -no-hpet? (3) you could also make that dependent on _OSI(Darwin). It's unlikely that Linux and/or Windows expose _OSI(Darwin), and anyway the BSOD is only there for Windows XP as I understand it. So, I've made it conditional on the SMC STA method returning success (0x0B). That would mean that running Windows XP on Mac OS X hardware breaks, though. The SMC node's STA method returns 0x0B unconditionally on real hardware. So I was planning on figuring out what's easier in the context of the most recent QEMU code base: 1. dynamically generating (during qemu runtime initialization) a DSDT entry for SMC with hardcoded 0x0B STA method, whenever --device isa-applesmc is present on the qemu command line or 2. writing a static (compile-time) SMC node but with a slightly smarter _STA method, which returns 0x0B when --device isa-applesmc was given on the cmdline, or which returns 0x00 in the absence of --device isa-applesmc. Either would work. See acpi_get_misc_info and patch_ssdt in hw/i386/acpi-build.c. I think device-dependent ACPI stuff should become a QOM interface, but you need not do that. Paolo
Re: [Qemu-devel] [PATCH v4 2/7] block: Introduce op_blockers to BlockDriverState
Il 26/11/2013 11:19, Fam Zheng ha scritto: So that you can add the same Error to multiple items of the array with a single bdrv_op_block call (by ORing them into the second parameter). What data type to contain this? I'm not sure 64 is enough in long term... I would just use an uint64_t, I think you have ~20 operations now. There is still quite some breathing room. Paolo
Re: [Qemu-devel] [PATCH v4 2/7] block: Introduce op_blockers to BlockDriverState
On Tue, Nov 26, 2013 at 6:25 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 26/11/2013 11:19, Fam Zheng ha scritto: So that you can add the same Error to multiple items of the array with a single bdrv_op_block call (by ORing them into the second parameter). What data type to contain this? I'm not sure 64 is enough in long term... I would just use an uint64_t, I think you have ~20 operations now. There is still quite some breathing room. OK. Then, can we still use QAPI enum? Because I think being possible to query blockers makes sense, that way the user doesn't have to trial and error to know what operation is blocked. Fam
Re: [Qemu-devel] [PATCH v4 2/7] block: Introduce op_blockers to BlockDriverState
On Tue, Nov 26, 2013 at 5:22 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 26/11/2013 03:07, Fam Zheng ha scritto: +void bdrv_op_unblock(BlockDriverState *bs, BlockOpType op, Error *reason) What about making BlockOpType a bitmask, and having bdrv_op_{,un}block take multiple ORed BlockOpTypes? bdrv_op_{,un}block_all then are not necessary, you only need a BLOCK_OPERATION_ALL value. Bitmap is not enough, at least it should be an array. For example when we enable multiple block jobs, there're two stoppers for drive_del, right? I said bitmask, not bitmap. :) OK. Sorry.. So that you can add the same Error to multiple items of the array with a single bdrv_op_block call (by ORing them into the second parameter). What data type to contain this? I'm not sure 64 is enough in long term... Fam
Re: [Qemu-devel] [PATCH v4 2/7] block: Introduce op_blockers to BlockDriverState
Il 26/11/2013 11:31, Fam Zheng ha scritto: I would just use an uint64_t, I think you have ~20 operations now. There is still quite some breathing room. OK. Then, can we still use QAPI enum? Because I think being possible to query blockers makes sense, that way the user doesn't have to trial and error to know what operation is blocked. Yes, we can. You would still have to add a bunch of #define BLOCK_OPERATION_MASK_RESIZE (1ULL BLOCK_OPERATION_TYPE_RESIZE) in block.h. Paolo
[Qemu-devel] [PATCH] qcow2: Zero-initialise first cluster for new images
Strictly speaking, this is only required for has_zero_init() == false, but it's easy enough to just do a cluster-aligned write that is padded with zeros after the header. This fixes that after 'qemu-img create' header extensions are attempted to be parsed that are really just random leftover data. Cc: qemu-sta...@nongnu.org Signed-off-by: Kevin Wolf kw...@redhat.com --- block/qcow2.c | 37 + 1 file changed, 21 insertions(+), 16 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 6e5d98d..7c18587 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -1471,7 +1471,7 @@ static int qcow2_create2(const char *filename, int64_t total_size, * size for any qcow2 image. */ BlockDriverState* bs; -QCowHeader header; +QCowHeader *header; uint8_t* refcount_table; Error *local_err = NULL; int ret; @@ -1489,30 +1489,35 @@ static int qcow2_create2(const char *filename, int64_t total_size, } /* Write the header */ -memset(header, 0, sizeof(header)); -header.magic = cpu_to_be32(QCOW_MAGIC); -header.version = cpu_to_be32(version); -header.cluster_bits = cpu_to_be32(cluster_bits); -header.size = cpu_to_be64(0); -header.l1_table_offset = cpu_to_be64(0); -header.l1_size = cpu_to_be32(0); -header.refcount_table_offset = cpu_to_be64(cluster_size); -header.refcount_table_clusters = cpu_to_be32(1); -header.refcount_order = cpu_to_be32(3 + REFCOUNT_SHIFT); -header.header_length = cpu_to_be32(sizeof(header)); +QEMU_BUILD_BUG_ON((1 MIN_CLUSTER_BITS) sizeof(*header)); +header = g_malloc(cluster_size); +memset(header, 0, cluster_size); +*header = (QCowHeader) { +.magic = cpu_to_be32(QCOW_MAGIC), +.version= cpu_to_be32(version), +.cluster_bits = cpu_to_be32(cluster_bits), +.size = cpu_to_be64(0), +.l1_table_offset= cpu_to_be64(0), +.l1_size= cpu_to_be32(0), +.refcount_table_offset = cpu_to_be64(cluster_size), +.refcount_table_clusters= cpu_to_be32(1), +.refcount_order = cpu_to_be32(3 + REFCOUNT_SHIFT), +.header_length = cpu_to_be32(sizeof(*header)), +}; if (flags BLOCK_FLAG_ENCRYPT) { -header.crypt_method = cpu_to_be32(QCOW_CRYPT_AES); +header-crypt_method = cpu_to_be32(QCOW_CRYPT_AES); } else { -header.crypt_method = cpu_to_be32(QCOW_CRYPT_NONE); +header-crypt_method = cpu_to_be32(QCOW_CRYPT_NONE); } if (flags BLOCK_FLAG_LAZY_REFCOUNTS) { -header.compatible_features |= +header-compatible_features |= cpu_to_be64(QCOW2_COMPAT_LAZY_REFCOUNTS); } -ret = bdrv_pwrite(bs, 0, header, sizeof(header)); +ret = bdrv_pwrite(bs, 0, header, cluster_size); +g_free(header); if (ret 0) { error_setg_errno(errp, -ret, Could not write qcow2 header); goto out; -- 1.8.1.4
Re: [Qemu-devel] [PULL for-1.8 1/2] pc: disable pci-info
Hi, I think it's down to other qemu bugs (such as _CRS not covering all of PCI memory), we shall just fix them. i.e. the attached patch should fixup things. cheers, Gerd From a81b8d66e24fd298ce7654d424a378337e6cf132 Mon Sep 17 00:00:00 2001 From: Gerd Hoffmann kra...@redhat.com Date: Tue, 26 Nov 2013 11:46:11 +0100 Subject: [PATCH] piix: fix 32bit pci hole Make the 32bit pci hole start at end of ram, so all possible address space is covered. Of course the firmware can use less than that. Leaving space unused is no problem, mapping pci bars outside the hole causes problems though. Signed-off-by: Gerd Hoffmann kra...@redhat.com --- hw/pci-host/piix.c | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index edc974e..1414a2b 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -345,15 +345,7 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, f-ram_memory = ram_memory; i440fx = I440FX_PCI_HOST_BRIDGE(dev); -/* Set PCI window size the way seabios has always done it. */ -/* Power of 2 so bios can cover it with a single MTRR */ -if (ram_size = 0x8000) { -i440fx-pci_info.w32.begin = 0x8000; -} else if (ram_size = 0xc000) { -i440fx-pci_info.w32.begin = 0xc000; -} else { -i440fx-pci_info.w32.begin = 0xe000; -} +i440fx-pci_info.w32.begin = ram_size; memory_region_init_alias(f-pci_hole, OBJECT(d), pci-hole, f-pci_address_space, pci_hole_start, pci_hole_size); -- 1.8.3.1
Re: [Qemu-devel] [PATCH v4 7/7] block: Allow backup on referenced named BlockDriverState
Am 26.11.2013 um 04:06 hat Fam Zheng geschrieben: On 2013年11月25日 19:23, Kevin Wolf wrote: Am 22.11.2013 um 06:24 hat Fam Zheng geschrieben: Drive backup is a read only operation on source bs. We want to allow this specific case to enable image-fleecing. Note that when image-fleecing job starts, the job still add its blocker to source bs, and any other operation on it will be blocked by that. Signed-off-by: Fam Zheng f...@redhat.com --- block.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/block.c b/block.c index a5da656..d30be51 100644 --- a/block.c +++ b/block.c @@ -1179,6 +1179,8 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options, device is used as backing hd of '%s', bs-device_name); bdrv_op_block_all(bs-backing_hd, bs-backing_blocker); +bdrv_op_unblock(bs-backing_hd, BLOCK_OP_TYPE_BACKUP, +bs-backing_blocker); pstrcpy(bs-backing_file, sizeof(bs-backing_file), bs-backing_hd-filename); pstrcpy(bs-backing_format, sizeof(bs-backing_format), We probably need separate blockers for can be a backup source and can be a backup target. Because I think this allows using it as a read-write target as well, which was not intended. Yes. Will do it. Do we need to cover this in other parts of the code as well, like when adding a new BDS during external snapshot creation? Does it have a name? If not I think we are safe there. Not yet, but I think it won't be long until we do have named nodes created this way, so considering it now certainly can't hurt. Kevin
Re: [Qemu-devel] [PATCH v4 2/7] block: Introduce op_blockers to BlockDriverState
On Tue, Nov 26, 2013 at 6:41 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 26/11/2013 11:31, Fam Zheng ha scritto: I would just use an uint64_t, I think you have ~20 operations now. There is still quite some breathing room. OK. Then, can we still use QAPI enum? Because I think being possible to query blockers makes sense, that way the user doesn't have to trial and error to know what operation is blocked. Yes, we can. You would still have to add a bunch of #define BLOCK_OPERATION_MASK_RESIZE (1ULL BLOCK_OPERATION_TYPE_RESIZE) in block.h. Well that works, but then we have a repeated list of operations in two places, I'm not sure that would be good to have. Kevin, Stefan, any opinion? Fam
Re: [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram
On 11/25/2013 05:48 PM, Paolo Bonzini wrote: Il 25/11/2013 08:29, Lei Li ha scritto: In this case, if the migration would fail just because the misconfiguration of device state on destination, in the meantime the outgoing migration has no aware of this failure, I think it should add such handling (like synchronize of the device state list in incoming side?) to the current migration protocol as it is kind of missing... It can not just rely on the resume of source guest for such failure... or maybe it should be handled in management app to force the configuration right? It is already handled by libvirt, indeed. Basically, -incoming without -S is a broken option because of the missing handshake at the end of migration. With -S something else (either a human or a program) can check that everything went well and choose whether to restart the source or the destination. I see, thanks for your explanation. :-) BTW, do you think we should add such handling to the current migration protocol? Postcopy would fix this (assuming the postcopy phase is reliable) by migrating device data before any page flipping occurs. Are you suggesting that page flipping should be coupled with the postcopy migration for live upgrade of QEMU as your comments in the previous version? In order to make live upgrade reliable, it should. The whole procedure for page flipping migration is straight forward, and the cases of failure I listed are in theory, which never happened at least since many times I have tested (except the case you raised above). But I agree with you on coupling with postcopy migration to make it more reliable, specially for the undetected problems. For this, I am not quite sure I understand it correctly, seems the latest update of post copy migration was sent on last Oct, would you please give some insights on what else could I do for the coupling with postcopy migration? If no, now page flipping is implemented as a migration capability, and it's a good shape already as your comments in the previous version. Although it still needs a little more time to get the numbers of the new vmsplice, I'd to ask your opinion that do you consider it could be merged as an experimental version for now? Paolo -- Lei
Re: [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram
Il 26/11/2013 12:07, Lei Li ha scritto: Basically, -incoming without -S is a broken option because of the missing handshake at the end of migration. With -S something else (either a human or a program) can check that everything went well and choose whether to restart the source or the destination. I see, thanks for your explanation. :-) BTW, do you think we should add such handling to the current migration protocol? I think it's not included by design. The whole procedure for page flipping migration is straight forward, and the cases of failure I listed are in theory, which never happened at least since many times I have tested (except the case you raised above). But I agree with you on coupling with postcopy migration to make it more reliable, specially for the undetected problems. The only problem that worries me is failing to load device data (most likely due to misconfiguration or a bug). For this, I am not quite sure I understand it correctly, seems the latest update of post copy migration was sent on last Oct, would you please give some insights on what else could I do for the coupling with postcopy migration? I don't know the state exactly. Orit and Andrea should know. If no, now page flipping is implemented as a migration capability, and it's a good shape already as your comments in the previous version. Although it still needs a little more time to get the numbers of the new vmsplice, I'd to ask your opinion that do you consider it could be merged as an experimental version for now? Yes, that could be useful. I will review the patch as soon as possible. Paolo
Re: [Qemu-devel] [PATCH 10/17] migration-local: override save_page for page transmit
Il 21/11/2013 10:11, Lei Li ha scritto: This patch implements save_page callback for the outside of page flipping. It will write the address of the page on the Unix socket and flip the page data on pipe by vmsplice(). Every page address would have a header flag RAM_SAVE_FLAG_HOOK, which will trigger the load hook to receive it in incoming side as well. Signed-off-by: Lei Li li...@linux.vnet.ibm.com --- migration-local.c | 54 + 1 files changed, 54 insertions(+), 0 deletions(-) diff --git a/migration-local.c b/migration-local.c index 0f0896b..14207e9 100644 --- a/migration-local.c +++ b/migration-local.c @@ -200,6 +200,59 @@ static int qemu_local_send_pipefd(QEMUFile *f, void *opaque, return 0; } +static size_t qemu_local_save_ram(QEMUFile *f, void *opaque, + MemoryRegion *mr, ram_addr_t offset, + size_t size, int *bytes_sent) +{ +QEMUFileLocal *s = opaque; +ram_addr_t current_addr = mr-ram_addr + offset; +void *ram_addr; +ssize_t ret; + +if (s-unix_page_flipping) { +qemu_fflush(s-file); +qemu_put_be64(s-file, RAM_SAVE_FLAG_HOOK); + +/* Write page address to unix socket */ +qemu_put_be64(s-file, current_addr); + You can write current_addr | RAM_SAVE_FLAG_HOOK. The value will be in the flags argument of the hook_ram_load, you can extract it with flags ~RAM_SAVE_FLAG_HOOK. This cuts by half the data written to the Unix socket. Paolo +ram_addr = memory_region_get_ram_ptr(mr) + offset; + +/* vmsplice page data to pipe */ +struct iovec iov = { +.iov_base = ram_addr, +.iov_len = size, +}; + +/* + * The flag SPLICE_F_MOVE is introduced in kernel for the page + * flipping feature in QEMU, which will movie pages rather than + * copying, previously unused. + * + * If a move is not possible the kernel will transparently falls + * back to copying data. + * + * For older kernels the SPLICE_F_MOVE would be ignored and a copy + * would occur. + */ +ret = vmsplice(s-pipefd[1], iov, 1, SPLICE_F_GIFT | SPLICE_F_MOVE); +if (ret == -1) { +if (errno != EAGAIN errno != EINTR) { +fprintf(stderr, vmsplice save error: %s\n, strerror(errno)); +return ret; +} +} else { +if (bytes_sent) { +*bytes_sent = 1; +} +DPRINTF(block_offset: %lu, offset: %lu\n, block_offset, offset); +return 0; +} +} + +return RAM_SAVE_CONTROL_NOT_SUPP; +} + static const QEMUFileOps pipe_read_ops = { .get_fd= qemu_local_get_sockfd, .get_buffer= qemu_local_get_buffer, @@ -211,6 +264,7 @@ static const QEMUFileOps pipe_write_ops = { .writev_buffer = qemu_local_writev_buffer, .close = qemu_local_close, .before_ram_iterate = qemu_local_send_pipefd, +.save_page = qemu_local_save_ram }; QEMUFile *qemu_fopen_socket_local(int sockfd, const char *mode)
Re: [Qemu-devel] [PATCH] SPARC: Fix LEON3 power down instruction
On 11/25/2013 03:22 PM, Sebastian Huber wrote: The env-pc is not necessarily up-to-date in the helper function. Use the program counter of the disassembly context instead. Looks good. Thanks Sebastian. Reviewed-by: Fabien Chouteau chout...@adacore.com Signed-off-by: Sebastian Huber sebastian.hu...@embedded-brains.de --- target-sparc/helper.c|6 +++--- target-sparc/helper.h|2 +- target-sparc/translate.c |3 ++- 3 files changed, 6 insertions(+), 5 deletions(-) diff --git a/target-sparc/helper.c b/target-sparc/helper.c index e70d1bc..50912ff 100644 --- a/target-sparc/helper.c +++ b/target-sparc/helper.c @@ -314,14 +314,14 @@ target_ulong helper_tsubcctv(CPUSPARCState *env, target_ulong src1, } #ifndef TARGET_SPARC64 -void helper_power_down(CPUSPARCState *env) +void helper_power_down(CPUSPARCState *env, uint32_t pc) { CPUState *cs = CPU(sparc_env_get_cpu(env)); cs-halted = 1; env-exception_index = EXCP_HLT; -env-pc = env-npc; -env-npc = env-pc + 4; +env-pc = pc + 4; +env-npc = pc + 8; cpu_loop_exit(env); } #endif diff --git a/target-sparc/helper.h b/target-sparc/helper.h index fc49cd8..c4752c7 100644 --- a/target-sparc/helper.h +++ b/target-sparc/helper.h @@ -6,7 +6,7 @@ DEF_HELPER_2(trace_insn, void, env, i32) DEF_HELPER_1(rett, void, env) DEF_HELPER_2(wrpsr, void, env, tl) DEF_HELPER_1(rdpsr, tl, env) -DEF_HELPER_1(power_down, void, env) +DEF_HELPER_2(power_down, void, env, i32) #else DEF_HELPER_2(wrpil, void, env, tl) DEF_HELPER_2(wrpstate, void, env, tl) diff --git a/target-sparc/translate.c b/target-sparc/translate.c index 0588d23..d9ee90c 100644 --- a/target-sparc/translate.c +++ b/target-sparc/translate.c @@ -3631,7 +3631,8 @@ static void disas_sparc_insn(DisasContext * dc, unsigned int insn) if ((rd == 0x13) (dc-def-features CPU_FEATURE_POWERDOWN)) { /* LEON3 power-down */ -gen_helper_power_down(cpu_env); +tcg_gen_movi_i32(cpu_tmp0, dc-pc); +gen_helper_power_down(cpu_env, cpu_tmp0); } break; #else
Re: [Qemu-devel] [PATCH 12/17] migration-local: override hook_ram_load
Il 21/11/2013 10:11, Lei Li ha scritto: +static int qemu_local_ram_load(QEMUFile *f, void *opaque, + uint64_t flags) +{ +QEMUFileLocal *s = opaque; +ram_addr_t addr; +struct iovec iov; +ssize_t ret = -EINVAL; + +/* + * PIPE file descriptor will be received by another callback + * get_buffer. + */ +if (pipefd_passed) { +void *host; +/* + * Extract the page address from the 8-byte record and + * read the page data from the pipe. + */ +addr = qemu_get_be64(s-file); +host = qemu_get_ram_ptr(addr); + +iov.iov_base = host; +iov.iov_len = TARGET_PAGE_SIZE; + +/* The flag SPLICE_F_MOVE is introduced in kernel for the page + * flipping feature in QEMU, which will movie pages rather than + * copying, previously unused. + * + * If a move is not possible the kernel will transparently falls + * back to copying data. + * + * For older kernels the SPLICE_F_MOVE would be ignored and a copy + * would occur. + */ +ret = vmsplice(s-pipefd[0], iov, 1, SPLICE_F_MOVE); +if (ret == -1) { +if (errno != EAGAIN errno != EINTR) { +fprintf(stderr, vmsplice() load error: %s, strerror(errno)); +return ret; +} +DPRINTF(vmsplice load error\n); +} else if (ret == 0) { +DPRINTF(stderr, load_page: zero read\n); +} + +DPRINTF(vmsplice (read): %zu\n, ret); +return ret; +} + +return 0; +} I think you need to return -EINVAL if there is no pipe. Paolo
Re: [Qemu-devel] [PATCH 06/17] migration-local: add send_pipefd()
Il 21/11/2013 10:11, Lei Li ha scritto: +struct cmsghdr *cmptr; +char req[1] = { 0x01 }; About this, see my reply to patch 8. +if (pipefd 0) { +msg.msg_control = NULL; +msg.msg_controllen = 0; +/* Negative status means error */ +req[0] = pipefd; No need for this. qemu_fopen_socket_local has failed already, and you will never get here. Paolo +} else { +msg.msg_control = control_un.control; +msg.msg_controllen = sizeof(control_un.control); + +cmptr = CMSG_FIRSTHDR(msg); +cmptr-cmsg_len = CMSG_LEN(sizeof(int)); +cmptr-cmsg_level = SOL_SOCKET; +cmptr-cmsg_type = SCM_RIGHTS; +*((int *) CMSG_DATA(cmptr)) = pipefd; + +msg.msg_name = NULL; +msg.msg_namelen = 0; + +iov[0].iov_base = req; +iov[0].iov_len = sizeof(req); +msg.msg_iov = iov; +msg.msg_iovlen = 1; +}
Re: [Qemu-devel] [PATCH 08/17] add unix_msgfd_lookup() to callback get_buffer
On 11/21/2013 05:11 PM, Lei Li wrote: The control message for exchange of pipe file descriptor should be received by recvmsg, and it might be eaten to stream file by qemu_recv() when receiving by two callbacks. So this patch adds unix_msgfd_lookup() to callback get_buffer as the only one receiver, where the pipe file descriptor would be caughted. Signed-off-by: Lei Li li...@linux.vnet.ibm.com --- migration-local.c | 68 ++-- 1 files changed, 65 insertions(+), 3 deletions(-) diff --git a/migration-local.c b/migration-local.c index e028beb..0f0896b 100644 --- a/migration-local.c +++ b/migration-local.c @@ -50,6 +50,8 @@ typedef struct QEMUFileLocal { bool unix_page_flipping; } QEMUFileLocal; +static bool pipefd_passed; + static int qemu_local_get_sockfd(void *opaque) { QEMUFileLocal *s = opaque; @@ -57,16 +59,76 @@ static int qemu_local_get_sockfd(void *opaque) return s-sockfd; } +static int unix_msgfd_lookup(void *opaque, struct msghdr *msg) +{ +QEMUFileLocal *s = opaque; +struct cmsghdr *cmsg; +bool found = false; + +for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) { +if (cmsg-cmsg_len != CMSG_LEN(sizeof(int)) || +cmsg-cmsg_level != SOL_SOCKET || +cmsg-cmsg_type != SCM_RIGHTS) +continue; + +/* PIPE file descriptor to be received */ +s-pipefd[0] = *((int *)CMSG_DATA(cmsg)); +} + +if (s-pipefd[0] = 0) { And this should be if (s-pipefd[0] 0).. +fprintf(stderr, no pipe fd can be received\n); +return found; +} + +DPRINTF(pipefd successfully received\n); +return s-pipefd[0]; +} + static int qemu_local_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size) { QEMUFileLocal *s = opaque; ssize_t len; +struct msghdr msg = { NULL, }; +struct iovec iov[1]; +union { +struct cmsghdr cmsg; +char control[CMSG_SPACE(sizeof(int))]; +} msg_control; + +iov[0].iov_base = buf; +iov[0].iov_len = size; + +msg.msg_iov = iov; +msg.msg_iovlen = 1; +msg.msg_control = msg_control; +msg.msg_controllen = sizeof(msg_control); for (;;) { -len = qemu_recv(s-sockfd, buf, size, 0); -if (len != -1) { -break; +if (!pipefd_passed) { +/* + * recvmsg is called here to catch the control message for + * the exchange of PIPE file descriptor until it is received. + */ +len = recvmsg(s-sockfd, msg, 0); +if (len != -1) { +if (unix_msgfd_lookup(s, msg) 0) { +pipefd_passed = 1; +/* + * Do not count one byte taken by the PIPE file + * descriptor. + */ +len--; +} else { +len = -1; +} Just found that this 'else' should go away as it will break the normal Unix migration since pipefd_passed will always be 0 for it. I have fixed this in my code, seems I mis-send it for some reason, sorry for this...:-[ +break; +} +} else { +len = qemu_recv(s-sockfd, buf, size, 0); +if (len != -1) { +break; +} } if (socket_error() == EAGAIN) { -- Lei
Re: [Qemu-devel] [PATCH 08/17] add unix_msgfd_lookup() to callback get_buffer
Il 21/11/2013 10:11, Lei Li ha scritto: +/* + * recvmsg is called here to catch the control message for + * the exchange of PIPE file descriptor until it is received. + */ +len = recvmsg(s-sockfd, msg, 0); +if (len != -1) { +if (unix_msgfd_lookup(s, msg) 0) { +pipefd_passed = 1; +/* + * Do not count one byte taken by the PIPE file + * descriptor. + */ +len--; I think adding a byte in the middle of the stream is not reliable. Rather, you should transmit the socket always at the same place, for example in the first call of qemu_local_save_ram, after it has written the 64-bit field. The matching code in qemu_local_ram_load will be like this: static int qemu_local_ram_load(QEMUFile *f, void *opaque, uint64_t flags) { QEMUFileLocal *s = opaque; ram_addr_t addr; struct iovec iov; ssize_t ret = -EINVAL; if (!s-pipefd_received) { /* * send_pipefd was called at this point, and it wrote one byte * to the stream. */ qemu_get_byte(s); s-pipefd_received = true; } if (pipefd_passed) { ... } return -EINVAL; } Also, please move pipefd_passed within QEMUFileLocal. Thanks, Paolo
Re: [Qemu-devel] [PATCH 16/17] migration: adjust migration_thread() process for page flipping
Il 21/11/2013 10:11, Lei Li ha scritto: Signed-off-by: Lei Li li...@linux.vnet.ibm.com --- migration.c | 10 +++--- 1 files changed, 7 insertions(+), 3 deletions(-) diff --git a/migration.c b/migration.c index 4ac466b..0f98ac1 100644 --- a/migration.c +++ b/migration.c @@ -579,10 +579,11 @@ static void *migration_thread(void *opaque) pending_size = qemu_savevm_state_pending(s-file, max_size); DPRINTF(pending size % PRIu64 max % PRIu64 \n, pending_size, max_size); -if (pending_size pending_size = max_size) { +if (pending_size pending_size = max_size +!runstate_needs_reset()) { qemu_savevm_state_iterate(s-file); I'm not sure why you need this. } else { -int ret; +int ret = 0; DPRINTF(done iterating\n); qemu_mutex_lock_iothread(); @@ -590,7 +591,10 @@ static void *migration_thread(void *opaque) qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER); old_vm_running = runstate_is_running(); -ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE); +if (!runstate_needs_reset()) { +ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE); +} This however is okay. Paolo if (ret = 0) { qemu_file_set_rate_limit(s-file, INT_MAX); qemu_savevm_state_complete(s-file);
Re: [Qemu-devel] [PATCH 29/60] AArch64: Add orri instruction emulation
On 09/27/2013 09:42 PM, Richard Henderson wrote: On 09/26/2013 05:48 PM, Alexander Graf wrote: +if (setflags) { +tcg_dst = cpu_reg(dest); +} else { +tcg_dst = cpu_reg_sp(dest); +} Never sp for logicals. This should be ok in my view, the manual explicitly shows in the pseudocode: if d == 31 !setflags then SP[] = result; else X[d] = result; Claudio +handle_orri(s, insn); And yet again, a better function name. r~
Re: [Qemu-devel] [PATCH 16/17] migration: adjust migration_thread() process for page flipping
On 11/26/2013 07:32 PM, Paolo Bonzini wrote: Il 21/11/2013 10:11, Lei Li ha scritto: Signed-off-by: Lei Li li...@linux.vnet.ibm.com --- migration.c | 10 +++--- 1 files changed, 7 insertions(+), 3 deletions(-) diff --git a/migration.c b/migration.c index 4ac466b..0f98ac1 100644 --- a/migration.c +++ b/migration.c @@ -579,10 +579,11 @@ static void *migration_thread(void *opaque) pending_size = qemu_savevm_state_pending(s-file, max_size); DPRINTF(pending size % PRIu64 max % PRIu64 \n, pending_size, max_size); -if (pending_size pending_size = max_size) { +if (pending_size pending_size = max_size +!runstate_needs_reset()) { qemu_savevm_state_iterate(s-file); I'm not sure why you need this. The adjustment here is to avoid the iteration stage for page flipping. Because pending_size = ram_save_remaining() * TARGET_PAGE_SIZE which is not 0 and pending_size max_size (0) at start. In the previous version it was like this: if (pending_size pending_size = max_size !migrate_unix_page_flipping()) { And you said 'This is a bit ugly but I understand the need. Perhaps !runstate_needs_reset() like below?' :) } else { -int ret; +int ret = 0; DPRINTF(done iterating\n); qemu_mutex_lock_iothread(); @@ -590,7 +591,10 @@ static void *migration_thread(void *opaque) qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER); old_vm_running = runstate_is_running(); -ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE); +if (!runstate_needs_reset()) { +ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE); +} This however is okay. Paolo if (ret = 0) { qemu_file_set_rate_limit(s-file, INT_MAX); qemu_savevm_state_complete(s-file); -- Lei
Re: [Qemu-devel] [PATCH 29/60] AArch64: Add orri instruction emulation
On Tue, Nov 26, 2013 at 12:56 PM, Claudio Fontana claudio.font...@linaro.org wrote: On 09/27/2013 09:42 PM, Richard Henderson wrote: On 09/26/2013 05:48 PM, Alexander Graf wrote: +if (setflags) { +tcg_dst = cpu_reg(dest); +} else { +tcg_dst = cpu_reg_sp(dest); +} Never sp for logicals. This should be ok in my view, the manual explicitly shows in the pseudocode: if d == 31 !setflags then SP[] = result; else X[d] = result; Agreed: for immediate logical instructions, destination can be SP except for ANDS. ANDS with destination as r31 is aliased to TST. Laurent Claudio +handle_orri(s, insn); And yet again, a better function name. r~
Re: [Qemu-devel] [PATCH 10/17] migration-local: override save_page for page transmit
On 11/26/2013 07:22 PM, Paolo Bonzini wrote: Il 21/11/2013 10:11, Lei Li ha scritto: This patch implements save_page callback for the outside of page flipping. It will write the address of the page on the Unix socket and flip the page data on pipe by vmsplice(). Every page address would have a header flag RAM_SAVE_FLAG_HOOK, which will trigger the load hook to receive it in incoming side as well. Signed-off-by: Lei Li li...@linux.vnet.ibm.com --- migration-local.c | 54 + 1 files changed, 54 insertions(+), 0 deletions(-) diff --git a/migration-local.c b/migration-local.c index 0f0896b..14207e9 100644 --- a/migration-local.c +++ b/migration-local.c @@ -200,6 +200,59 @@ static int qemu_local_send_pipefd(QEMUFile *f, void *opaque, return 0; } +static size_t qemu_local_save_ram(QEMUFile *f, void *opaque, + MemoryRegion *mr, ram_addr_t offset, + size_t size, int *bytes_sent) +{ +QEMUFileLocal *s = opaque; +ram_addr_t current_addr = mr-ram_addr + offset; +void *ram_addr; +ssize_t ret; + +if (s-unix_page_flipping) { +qemu_fflush(s-file); +qemu_put_be64(s-file, RAM_SAVE_FLAG_HOOK); + +/* Write page address to unix socket */ +qemu_put_be64(s-file, current_addr); + You can write current_addr | RAM_SAVE_FLAG_HOOK. The value will be in the flags argument of the hook_ram_load, you can extract it with flags ~RAM_SAVE_FLAG_HOOK. This cuts by half the data written to the Unix socket. OK, thanks. Paolo +ram_addr = memory_region_get_ram_ptr(mr) + offset; + +/* vmsplice page data to pipe */ +struct iovec iov = { +.iov_base = ram_addr, +.iov_len = size, +}; + +/* + * The flag SPLICE_F_MOVE is introduced in kernel for the page + * flipping feature in QEMU, which will movie pages rather than + * copying, previously unused. + * + * If a move is not possible the kernel will transparently falls + * back to copying data. + * + * For older kernels the SPLICE_F_MOVE would be ignored and a copy + * would occur. + */ +ret = vmsplice(s-pipefd[1], iov, 1, SPLICE_F_GIFT | SPLICE_F_MOVE); +if (ret == -1) { +if (errno != EAGAIN errno != EINTR) { +fprintf(stderr, vmsplice save error: %s\n, strerror(errno)); +return ret; +} +} else { +if (bytes_sent) { +*bytes_sent = 1; +} +DPRINTF(block_offset: %lu, offset: %lu\n, block_offset, offset); +return 0; +} +} + +return RAM_SAVE_CONTROL_NOT_SUPP; +} + static const QEMUFileOps pipe_read_ops = { .get_fd= qemu_local_get_sockfd, .get_buffer= qemu_local_get_buffer, @@ -211,6 +264,7 @@ static const QEMUFileOps pipe_write_ops = { .writev_buffer = qemu_local_writev_buffer, .close = qemu_local_close, .before_ram_iterate = qemu_local_send_pipefd, +.save_page = qemu_local_save_ram }; QEMUFile *qemu_fopen_socket_local(int sockfd, const char *mode) -- Lei
Re: [Qemu-devel] [PATCH 12/17] migration-local: override hook_ram_load
On 11/26/2013 07:25 PM, Paolo Bonzini wrote: Il 21/11/2013 10:11, Lei Li ha scritto: +static int qemu_local_ram_load(QEMUFile *f, void *opaque, + uint64_t flags) +{ +QEMUFileLocal *s = opaque; +ram_addr_t addr; +struct iovec iov; +ssize_t ret = -EINVAL; + +/* + * PIPE file descriptor will be received by another callback + * get_buffer. + */ +if (pipefd_passed) { +void *host; +/* + * Extract the page address from the 8-byte record and + * read the page data from the pipe. + */ +addr = qemu_get_be64(s-file); +host = qemu_get_ram_ptr(addr); + +iov.iov_base = host; +iov.iov_len = TARGET_PAGE_SIZE; + +/* The flag SPLICE_F_MOVE is introduced in kernel for the page + * flipping feature in QEMU, which will movie pages rather than + * copying, previously unused. + * + * If a move is not possible the kernel will transparently falls + * back to copying data. + * + * For older kernels the SPLICE_F_MOVE would be ignored and a copy + * would occur. + */ +ret = vmsplice(s-pipefd[0], iov, 1, SPLICE_F_MOVE); +if (ret == -1) { +if (errno != EAGAIN errno != EINTR) { +fprintf(stderr, vmsplice() load error: %s, strerror(errno)); +return ret; +} +DPRINTF(vmsplice load error\n); +} else if (ret == 0) { +DPRINTF(stderr, load_page: zero read\n); +} + +DPRINTF(vmsplice (read): %zu\n, ret); +return ret; +} + +return 0; +} I think you need to return -EINVAL if there is no pipe. Yes, you are right.. Paolo -- Lei
Re: [Qemu-devel] [Qemu-stable] [PATCH] qcow2: Zero-initialise first cluster for new images
On 2013年11月26日 18:48, Kevin Wolf wrote: Strictly speaking, this is only required for has_zero_init() == false, but it's easy enough to just do a cluster-aligned write that is padded with zeros after the header. This fixes that after 'qemu-img create' header extensions are attempted to be parsed that are really just random leftover data. Cc: qemu-sta...@nongnu.org Signed-off-by: Kevin Wolf kw...@redhat.com --- block/qcow2.c | 37 + 1 file changed, 21 insertions(+), 16 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 6e5d98d..7c18587 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -1471,7 +1471,7 @@ static int qcow2_create2(const char *filename, int64_t total_size, * size for any qcow2 image. */ BlockDriverState* bs; -QCowHeader header; +QCowHeader *header; uint8_t* refcount_table; Error *local_err = NULL; int ret; @@ -1489,30 +1489,35 @@ static int qcow2_create2(const char *filename, int64_t total_size, } /* Write the header */ -memset(header, 0, sizeof(header)); -header.magic = cpu_to_be32(QCOW_MAGIC); -header.version = cpu_to_be32(version); -header.cluster_bits = cpu_to_be32(cluster_bits); -header.size = cpu_to_be64(0); -header.l1_table_offset = cpu_to_be64(0); -header.l1_size = cpu_to_be32(0); -header.refcount_table_offset = cpu_to_be64(cluster_size); -header.refcount_table_clusters = cpu_to_be32(1); -header.refcount_order = cpu_to_be32(3 + REFCOUNT_SHIFT); -header.header_length = cpu_to_be32(sizeof(header)); +QEMU_BUILD_BUG_ON((1 MIN_CLUSTER_BITS) sizeof(*header)); +header = g_malloc(cluster_size); +memset(header, 0, cluster_size); Could just be: header = g_malloc0(cluster_size); But either way, Reviewed-by: Fam Zheng f...@redhat.com +*header = (QCowHeader) { +.magic = cpu_to_be32(QCOW_MAGIC), +.version= cpu_to_be32(version), +.cluster_bits = cpu_to_be32(cluster_bits), +.size = cpu_to_be64(0), +.l1_table_offset= cpu_to_be64(0), +.l1_size= cpu_to_be32(0), +.refcount_table_offset = cpu_to_be64(cluster_size), +.refcount_table_clusters= cpu_to_be32(1), +.refcount_order = cpu_to_be32(3 + REFCOUNT_SHIFT), +.header_length = cpu_to_be32(sizeof(*header)), +}; if (flags BLOCK_FLAG_ENCRYPT) { -header.crypt_method = cpu_to_be32(QCOW_CRYPT_AES); +header-crypt_method = cpu_to_be32(QCOW_CRYPT_AES); } else { -header.crypt_method = cpu_to_be32(QCOW_CRYPT_NONE); +header-crypt_method = cpu_to_be32(QCOW_CRYPT_NONE); } if (flags BLOCK_FLAG_LAZY_REFCOUNTS) { -header.compatible_features |= +header-compatible_features |= cpu_to_be64(QCOW2_COMPAT_LAZY_REFCOUNTS); } -ret = bdrv_pwrite(bs, 0, header, sizeof(header)); +ret = bdrv_pwrite(bs, 0, header, cluster_size); +g_free(header); if (ret 0) { error_setg_errno(errp, -ret, Could not write qcow2 header); goto out;
Re: [Qemu-devel] [PATCH 1.7] [PATCH v3 1/2] linux-user: create target_structsheader to place ipc_perm and shmid_dss
ping http://patchwork.ozlabs.org/patch/287211/ http://patchwork.ozlabs.org/patch/287213/ Can we get these two changes in 1.7? It has been reviewed and waiting for someone who has commits rights for two months now. Regards, Petar From: Petar Jovanovic Sent: Wednesday, November 20, 2013 12:03 AM To: qemu-devel@nongnu.org; riku.voi...@linaro.org Cc: Petar Jovanovic; peter.mayd...@linaro.org; aurel...@aurel32.net; afaer...@suse.de; Alex Bennée Subject: RE: [Qemu-devel] [PATCH v3 1/2] linux-user: create target_structsheader to place ipc_perm and shmid_dss ping http://patchwork.ozlabs.org/patch/287211/ http://patchwork.ozlabs.org/patch/287213/ From: Petar Jovanovic Sent: Tuesday, November 12, 2013 4:40 PM To: qemu-devel@nongnu.org; riku.voi...@linaro.org Cc: Petar Jovanovic; peter.mayd...@linaro.org; aurel...@aurel32.net; afaer...@suse.de; Alex Bennée Subject: RE: [Qemu-devel] [PATCH v3 1/2] linux-user: create target_structsheader to place ipc_perm and shmid_dss ping http://patchwork.ozlabs.org/patch/287211/ http://patchwork.ozlabs.org/patch/287213/ Riku? Regards, Petar
Re: [Qemu-devel] [PATCHv2 1.8 8/9] qemu-img: add option to show progress in sectors
On 26.11.2013 11:04, Paolo Bonzini wrote: I think the right way to do this would be to add the functionality to qemu-progress.c (i.e. pass a number of sectors and let it choose between printing % or sectors). I was thinking about the same, but is this not beyond the scope of this patch? ;-) I don't mind leaving this patch out if you or the maintainers are strongly against it. I just need it internally to show the progress of a convert operation. I could theoretically calculate this the information I need also from the percentage output. Its a little bit more complicated if more than one hard drive is converted. Peter
Re: [Qemu-devel] [PATCH 14/17] add new RanState RAN_STATE_MEMORY_STALE
Il 21/11/2013 10:11, Lei Li ha scritto: { RUN_STATE_DEBUG, RUN_STATE_SUSPENDED }, DEBUG - MEMORY_STALE is missing. Paolo { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED }, { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING }, { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE }, +{ RUN_STATE_SUSPENDED, RUN_STATE_MEMORY_STALE },
Re: [Qemu-devel] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive
Laszlo Ersek ler...@redhat.com writes: On 11/25/13 16:22, Markus Armbruster wrote: Laszlo Ersek ler...@redhat.com writes: On 11/22/13 13:21, Markus Armbruster wrote: Laszlo Ersek ler...@redhat.com writes: This patch allows the user to usefully specify -drive file=img_1,if=pflash,format=raw,readonly \ -drive file=img_2,if=pflash,format=raw on the command line. The flash images will be mapped under 4G in their reverse unit order -- that is, with their base addresses progressing downwards, in increasing unit order. (The unit number increases with command line order if not explicitly specified.) This accommodates the following use case: suppose that OVMF is split in two parts, a writeable host file for non-volatile variable storage, and a read-only part for bootstrap and decompressible executable code. The binary code part would be read-only, centrally managed on the host system, and passed in as unit 0. The variable store would be writeable, VM-specific, and passed in as unit 1. ffe0-ffe1 (prio 0, R-): system.flash1 ffe2- (prio 0, R-): system.flash0 (If the guest tries to write to the flash range that is backed by the read-only drive, bdrv_write() in pflash_update() will correctly deny the write with -EACCES, and pflash_update() won't care, which suits us well.) Before this patch: You can define as many if=pflash drives as you want. Any with non-zero index are silently ignored. If you don't specify one with index=0, you get a ROM at the top of the 32 bit address space, contents taken from -bios (default bios.bin). Up to its last 128KiB are aliased at the top of the ISA address space. If you do specify one with index=0, you get a pflash device at the top of the 32 bit address space, with contents from the drive, and -bios is silently ignored. Up to its last 128KiB are copied to a ROM at the top of the (20 bit) ISA address space. After this patch (please correct misunderstandings): Now the drives after the first unused index are silently ignored. If you don't specify one with index=0, no change. If you do, you now get N pflash devices, where N is the first unused index. Each pflash's contents is taken from the respective drive. The flashes are mapped at the top of the 32 bit address space in reverse index order. -bios is silently ignored, as before. Up to the last 128KiB of the index=0 flash are copied to a ROM at the top of the ISA address space. Thus, no change for index=0. For index=1..N, we now get additional flash devices. Correct? Yes. Thanks. 1. Multiple pflash devices Is there any way for a guest to see the N separate pflash devices, or do they look exactly like a single pflash device? The interpretation of multiple -pflash options is board specific. I grepped the source for IF_PFLASH first, and found some boards that use more than one flash device. Merging them in one contiguous target-phys address range would be i386 specific. This doesn't break compatibility (because multiple pflash devices used not to be supported for i386 targets at all), but I agree that this would posit their interpretation for the future, and thus it might need deeper thought. From looking at hw/block/pflash_cfi01.c, it seems that the guest can interrogate the flash device about its size (nb_blocs is stored in cfi_table starting at 0x2D, and cfi_table can be queried by command 0x98 in pflash_read()). So, if the guest cares, it can figure out that there are multiple devices backing the range. I think. Your stated purpose for multiple -pflash: This accommodates the following use case: suppose that OVMF is split in two parts, a writeable host file for non-volatile variable storage, and a read-only part for bootstrap and decompressible executable code. Such a split between writable part and read-only part makes sense to me. How is it done in physical hardware? Single device with configurable write-protect, or two separate devices? 2. More board code device configuration magic Our long term goal is to configure machines by configuration files (i.e. data) rather than code. Your patch extends the PC board code dealing with if=pflash for multiple drives. Reminder: -drive if=X (where X != none) creates a backend, and leaves it in a place where board code can find it. Board code may elect to create a frontend using that backend. Yes, I'm aware. I did think about this -- eg. just create a drive with if=none, and add a frontend with -device something. But, the frontend here is not a device (a peripheral), it's a memory region with special mmio ops. Pflash frontends can apparently be created with -device cfi.pflash01,...': cfi.pflash01.drive=drive cfi.pflash01.num-blocks=uint32 cfi.pflash01.sector-length=uint64 cfi.pflash01.width=uint8 cfi.pflash01.big-endian=uint8 cfi.pflash01.id0=uint16 cfi.pflash01.id1=uint16 cfi.pflash01.id2=uint16
Re: [Qemu-devel] [PATCHv2 1.8 8/9] qemu-img: add option to show progress in sectors
Il 26/11/2013 13:23, Peter Lieven ha scritto: I think the right way to do this would be to add the functionality to qemu-progress.c (i.e. pass a number of sectors and let it choose between printing % or sectors). I was thinking about the same, but is this not beyond the scope of this patch? ;-) I think the functionality is not important enough to warrant more code in qemu-img.c (even 20 lines is already too much). We should improve the utility libraries instead. I don't mind leaving this patch out if you or the maintainers are strongly against it. Yes, please leave it out. Paolo
[Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table
Hi all, When guest set irq smp_affinity, VMEXIT occurs, then the vcpu thread will IOCTL return to QEMU from hypervisor, then vcpu thread ask the hypervisor to update the irq routing table, in kvm_set_irq_routing, synchronize_rcu is called, current vcpu thread is blocked for so much time to wait RCU grace period, and during this period, this vcpu cannot provide service to VM, so those interrupts delivered to this vcpu cannot be handled in time, and the apps running on this vcpu cannot be serviced too. It's unacceptable in some real-time scenario, e.g. telecom. So, I want to create a single workqueue for each VM, to asynchronously performing the RCU synchronization for irq routing table, and let the vcpu thread return and VMENTRY to service VM immediately, no more need to blocked to wait RCU grace period. And, I have implemented a raw patch, took a test in our telecom environment, above problem disappeared. Any better ideas? Thanks, Zhang Haoyu
Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table
Il 26/11/2013 13:40, Zhanghaoyu (A) ha scritto: When guest set irq smp_affinity, VMEXIT occurs, then the vcpu thread will IOCTL return to QEMU from hypervisor, then vcpu thread ask the hypervisor to update the irq routing table, in kvm_set_irq_routing, synchronize_rcu is called, current vcpu thread is blocked for so much time to wait RCU grace period, and during this period, this vcpu cannot provide service to VM, so those interrupts delivered to this vcpu cannot be handled in time, and the apps running on this vcpu cannot be serviced too. It's unacceptable in some real-time scenario, e.g. telecom. So, I want to create a single workqueue for each VM, to asynchronously performing the RCU synchronization for irq routing table, and let the vcpu thread return and VMENTRY to service VM immediately, no more need to blocked to wait RCU grace period. And, I have implemented a raw patch, took a test in our telecom environment, above problem disappeared. I don't think a workqueue is even needed. You just need to use call_rcu to free old after releasing kvm-irq_lock. What do you think? Paolo
Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table
On Tue, Nov 26, 2013 at 12:40:36PM +, Zhanghaoyu (A) wrote: Hi all, When guest set irq smp_affinity, VMEXIT occurs, then the vcpu thread will IOCTL return to QEMU from hypervisor, then vcpu thread ask the hypervisor to update the irq routing table, Why vcpu thread ask the hypervisor to update the irq routing table on pcpu migration? in kvm_set_irq_routing, synchronize_rcu is called, current vcpu thread is blocked for so much time to wait RCU grace period, and during this period, this vcpu cannot provide service to VM, so those interrupts delivered to this vcpu cannot be handled in time, and the apps running on this vcpu cannot be serviced too. It's unacceptable in some real-time scenario, e.g. telecom. So, I want to create a single workqueue for each VM, to asynchronously performing the RCU synchronization for irq routing table, and let the vcpu thread return and VMENTRY to service VM immediately, no more need to blocked to wait RCU grace period. And, I have implemented a raw patch, took a test in our telecom environment, above problem disappeared. Any better ideas? Thanks, Zhang Haoyu -- Gleb.
Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table
On Tue, Nov 26, 2013 at 02:48:10PM +0200, Gleb Natapov wrote: On Tue, Nov 26, 2013 at 12:40:36PM +, Zhanghaoyu (A) wrote: Hi all, When guest set irq smp_affinity, VMEXIT occurs, then the vcpu thread will IOCTL return to QEMU from hypervisor, then vcpu thread ask the hypervisor to update the irq routing table, Why vcpu thread ask the hypervisor to update the irq routing table on pcpu migration? Ah, I misread. Guest sets irq smp_affinity not host. -- Gleb.
Re: [Qemu-devel] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive
Laszlo Ersek ler...@redhat.com writes: On 11/22/13 13:21, Markus Armbruster wrote: Laszlo Ersek ler...@redhat.com writes: This patch allows the user to usefully specify -drive file=img_1,if=pflash,format=raw,readonly \ -drive file=img_2,if=pflash,format=raw on the command line. The flash images will be mapped under 4G in their reverse unit order -- that is, with their base addresses progressing downwards, in increasing unit order. (The unit number increases with command line order if not explicitly specified.) This accommodates the following use case: suppose that OVMF is split in two parts, a writeable host file for non-volatile variable storage, and a read-only part for bootstrap and decompressible executable code. The binary code part would be read-only, centrally managed on the host system, and passed in as unit 0. The variable store would be writeable, VM-specific, and passed in as unit 1. ffe0-ffe1 (prio 0, R-): system.flash1 ffe2- (prio 0, R-): system.flash0 (If the guest tries to write to the flash range that is backed by the read-only drive, bdrv_write() in pflash_update() will correctly deny the write with -EACCES, and pflash_update() won't care, which suits us well.) Before this patch: You can define as many if=pflash drives as you want. Any with non-zero index are silently ignored. If you don't specify one with index=0, you get a ROM at the top of the 32 bit address space, contents taken from -bios (default bios.bin). Up to its last 128KiB are aliased at the top of the ISA address space. If you do specify one with index=0, you get a pflash device at the top of the 32 bit address space, with contents from the drive, and -bios is silently ignored. Up to its last 128KiB are copied to a ROM at the top of the (20 bit) ISA address space. After this patch (please correct misunderstandings): Now the drives after the first unused index are silently ignored. If you don't specify one with index=0, no change. If you do, you now get N pflash devices, where N is the first unused index. Each pflash's contents is taken from the respective drive. The flashes are mapped at the top of the 32 bit address space in reverse index order. -bios is silently ignored, as before. Up to the last 128KiB of the index=0 flash are copied to a ROM at the top of the ISA address space. Thus, no change for index=0. For index=1..N, we now get additional flash devices. Correct? Yes. Thus, we grab *all* if=pflash drives for this purpose. Your stated use case wants just two. Hmm. Are we sure we'll never want to map an if=pflash device somewhere else? I'm asking because such ABI things are a pain to change later on...
Re: [Qemu-devel] [PATCH 16/17] migration: adjust migration_thread() process for page flipping
Il 26/11/2013 13:03, Lei Li ha scritto: +if (pending_size pending_size = max_size +!runstate_needs_reset()) { qemu_savevm_state_iterate(s-file); I'm not sure why you need this. The adjustment here is to avoid the iteration stage for page flipping. Because pending_size = ram_save_remaining() * TARGET_PAGE_SIZE which is not 0 and pending_size max_size (0) at start. It's still not clear to me that avoiding the iteration stage is necessary. I think it's just an optimization to avoid scanning the bitmap, but: (1) Juan's bitmap optimization will make this mostly unnecessary (2) getting good downtime from page flipping will require postcopy anyway. And you said 'This is a bit ugly but I understand the need. Perhaps !runstate_needs_reset() like below?' :) Oops. I might have said this before thinking about postcopy and/or before seeing the benchmark results from Juan's patches. If this part of the patch is just an optimization, I'd rather leave it out for now. Thanks for putting up with me. :) Paolo
Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table
On Tue, Nov 26, 2013 at 01:47:03PM +0100, Paolo Bonzini wrote: Il 26/11/2013 13:40, Zhanghaoyu (A) ha scritto: When guest set irq smp_affinity, VMEXIT occurs, then the vcpu thread will IOCTL return to QEMU from hypervisor, then vcpu thread ask the hypervisor to update the irq routing table, in kvm_set_irq_routing, synchronize_rcu is called, current vcpu thread is blocked for so much time to wait RCU grace period, and during this period, this vcpu cannot provide service to VM, so those interrupts delivered to this vcpu cannot be handled in time, and the apps running on this vcpu cannot be serviced too. It's unacceptable in some real-time scenario, e.g. telecom. So, I want to create a single workqueue for each VM, to asynchronously performing the RCU synchronization for irq routing table, and let the vcpu thread return and VMENTRY to service VM immediately, no more need to blocked to wait RCU grace period. And, I have implemented a raw patch, took a test in our telecom environment, above problem disappeared. I don't think a workqueue is even needed. You just need to use call_rcu to free old after releasing kvm-irq_lock. What do you think? It should be rate limited somehow. Since it guest triggarable guest may cause host to allocate a lot of memory this way. Is this about MSI interrupt affinity? IIRC changing INT interrupt affinity should not trigger kvm_set_irq_routing update. If this is about MSI only then what about changing userspace to use KVM_SIGNAL_MSI for MSI injection? -- Gleb.
Re: [Qemu-devel] virtio-net: network stops responding in Win2k3 server
Since it's a production machine, I had to try a remedy first: I changed one network card (pub, of course) to e1000 and now it's up from 62 hours (maybe a record!) Here is the output for the other card (virtio): I guess parameters should have been the same for the affected interface too (they are now the same for the tap interface connected to e1000 driver, and reflect the underlying physical interface). Offload parameters for tap-solariconsi: rx-checksumming: off tx-checksumming: off scatter-gather: off tcp-segmentation-offload: off udp-fragmentation-offload: off generic-segmentation-offload: off generic-receive-offload: on large-receive-offload: off I cannot switch back the virtual nic just now. I will try to increase debug on other machines, but most are 64bit Win2k3 OSs, so the virtio-net driver is not the same (it's a 64bit version at least...). Soon or later I will have some debug data to report to the list. In the meantime if you have some configuration advises, feel free to post them. Thanks, Mario 2013/11/24 Yan Vugenfirer yvuge...@redhat.com Hi Mario, Can you check the offload settings of the tap device that is connected to guest? Run “ethtool -k tap-solaripub”. On the guest. Raise the log verbosity by going to device manager - NetKVM device - Advanced tab - Logging.Level and changing it to 4. Use DebugView to record the driver tracing (enable kernel trace): http://technet.microsoft.com/en-us/sysinternals/bb896647.aspx Best regards, Yan.
Re: [Qemu-devel] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive
Laszlo Ersek ler...@redhat.com writes: On 11/25/13 16:32, Markus Armbruster wrote: Laszlo Ersek ler...@redhat.com writes: This patch allows the user to usefully specify -drive file=img_1,if=pflash,format=raw,readonly \ -drive file=img_2,if=pflash,format=raw on the command line. The flash images will be mapped under 4G in their reverse unit order -- that is, with their base addresses progressing downwards, in increasing unit order. (The unit number increases with command line order if not explicitly specified.) This accommodates the following use case: suppose that OVMF is split in two parts, a writeable host file for non-volatile variable storage, and a read-only part for bootstrap and decompressible executable code. The binary code part would be read-only, centrally managed on the host system, and passed in as unit 0. The variable store would be writeable, VM-specific, and passed in as unit 1. ffe0-ffe1 (prio 0, R-): system.flash1 ffe2- (prio 0, R-): system.flash0 (If the guest tries to write to the flash range that is backed by the read-only drive, bdrv_write() in pflash_update() will correctly deny the write with -EACCES, and pflash_update() won't care, which suits us well.) Signed-off-by: Laszlo Ersek ler...@redhat.com --- hw/i386/pc_sysfw.c | 60 -- 1 file changed, 45 insertions(+), 15 deletions(-) diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c index e917c83..1c3e3d6 100644 --- a/hw/i386/pc_sysfw.c +++ b/hw/i386/pc_sysfw.c @@ -72,35 +72,65 @@ static void pc_isa_bios_init(MemoryRegion *rom_memory, memory_region_set_readonly(isa_bios, true); } +/* This function maps flash drives from 4G downward, in order of their unit + * numbers. Addressing within one flash drive is of course not reversed. + * + * The drive with unit number 0 is mapped at the highest address, and it is + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is not + * supported. + * + * The caller is responsible to pass in the non-NULL @pflash_drv that + * corresponds to the flash drive with unit number 0. + */ static void pc_system_flash_init(MemoryRegion *rom_memory, DriveInfo *pflash_drv) { +int unit = 0; BlockDriverState *bdrv; int64_t size; -hwaddr phys_addr; +hwaddr phys_addr = 0x1ULL; int sector_bits, sector_size; pflash_t *system_flash; MemoryRegion *flash_mem; +char name[64]; -bdrv = pflash_drv-bdrv; -size = bdrv_getlength(pflash_drv-bdrv); sector_bits = 12; sector_size = 1 sector_bits; -if ((size % sector_size) != 0) { -fprintf(stderr, -qemu: PC system firmware (pflash) must be a multiple of 0x%x\n, -sector_size); -exit(1); -} +do { +bdrv = pflash_drv-bdrv; +size = bdrv_getlength(bdrv); +if ((size % sector_size) != 0) { +fprintf(stderr, +qemu: PC system firmware (pflash segment %d) must be a +multiple of 0x%x\n, unit, sector_size); +exit(1); +} +if (size phys_addr) { +fprintf(stderr, qemu: pflash segments must fit under 4G +cumulatively\n); You're just following existing bad practice here, but correct use of error_report() would give you nicer messages. Happy to explain if you're interested. I suspect things go haywire long before you hit address zero. Note that both pc_piix.c and pc_q35.c leave a hole in RAM just below 4G. The former's hole starts at 0xe000, the latter's at 0xb000. Should that be the limit? I wanted to do the bare minimal here, to catch obviously wrong backing drives (like a 10G file). This is already more verification than what the current code does. I wouldn't mind more a specific check, but I don't want to suggest (with more specific code) that it's actually *safe* to go down to the limit that I'd put here. For example, the IO-APIC mmio range is [0xFEE0..0xFEE01000[, leaving free 18MB-4KB just below 4G. (Of which the current OVMF, including variable store, takes up 2MB.) Grep IO_APIC_DEFAULT_ADDRESS. I just don't want to send the message it's safe to go all the way down there. Right now the responsibility is with the user (you can specify a single pflash device that's 20MB in size even now), and I'd like to stick with that. I believe that pflash_cfi01_register() sysbus_mmio_map() sysbus_mmio_map_common() memory_region_add_subregion() memory_region_add_subregion_common() could, in theory, find a conflict at runtime (see the #if 0-surrounded collision warning in memory_region_add_subregion_common()). But the memory API doesn't consider such collisions hard errors, and no status code is
Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table
Il 26/11/2013 13:56, Gleb Natapov ha scritto: I don't think a workqueue is even needed. You just need to use call_rcu to free old after releasing kvm-irq_lock. What do you think? It should be rate limited somehow. Since it guest triggarable guest may cause host to allocate a lot of memory this way. True, though if I understand Zhanghaoyu's proposal a workqueue would be even worse. Paolo
Re: [Qemu-devel] [PATCH 04/27] vl: convert -m to qemu_opts_parse()
On Thu, 21 Nov 2013 11:12:43 +0100 Markus Armbruster arm...@redhat.com wrote: Igor Mammedov imamm...@redhat.com writes: Along with conversion extend -m option to support following parameters: Please split this into two patches: first conversion to QemuOpts, then extension. mem - startup memory amount slots - total number of hotplug memory slots maxmem - maximum possible memory slots and maxmem should go in pair and maxmem should be greater than mem for memory hotplug to be usable. v2: make sure maxmem is not less than ram size Signed-off-by: Igor Mammedov imamm...@redhat.com --- qemu-options.hx |9 +- vl.c| 73 ++- 2 files changed, 68 insertions(+), 14 deletions(-) diff --git a/qemu-options.hx b/qemu-options.hx index 8b94264..fe4559b 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -210,8 +210,13 @@ use is discouraged as it may be removed from future versions. ETEXI DEF(m, HAS_ARG, QEMU_OPTION_m, --m megs set virtual RAM size to megs MB [default= -stringify(DEFAULT_RAM_SIZE) ]\n, QEMU_ARCH_ALL) +-m [mem=]megs[,slots=n,maxmem=size]\n +set virtual RAM size to megs MB [default= +stringify(DEFAULT_RAM_SIZE) ]\n +mem=start-up memory amount\n +slots=maximum number of hotplug slots\n +maxmem=maximum total amount of memory\n, +QEMU_ARCH_ALL) Help text is confusing, because it discusses megs twice. Fits right in, as output of -help is generally confusing (to put it politely). What about something like this: -m [mem=]megs[,slots=n,maxmem=size] configure guest RAM mem: initial amount of guest memory (default: XXX) slots: number of hotplug slots (default: none) maxmem: maximum amount of guest memory (default: mem) Sure, I'll fix it. STEXI @item -m @var{megs} @findex -m diff --git a/vl.c b/vl.c index f28674f..5974f0f 100644 --- a/vl.c +++ b/vl.c @@ -529,6 +529,28 @@ static QemuOptsList qemu_msg_opts = { }, }; +static QemuOptsList qemu_mem_opts = { +.name = memory-opts, +.implied_opt_name = mem, +.head = QTAILQ_HEAD_INITIALIZER(qemu_mem_opts.head), +.merge_lists = true, Yes, because multiple -m need to accumulate. +.desc = { +{ +.name = mem, +.type = QEMU_OPT_SIZE, +}, +{ +.name = slots, +.type = QEMU_OPT_NUMBER, +}, +{ +.name = maxmem, +.type = QEMU_OPT_SIZE, +}, +{ /* end of list */ } +}, +}; + /** * Get machine options * @@ -2816,6 +2838,14 @@ static int object_create(QemuOpts *opts, void *opaque) return 0; } +static void qemu_init_default_mem_opts(uint64_t size) +{ +QemuOpts *opts = qemu_opts_create_nofail(qemu_mem_opts); +qemu_opt_set_number(opts, mem, size); +qemu_opt_set_number(opts, maxmem, size); +qemu_opt_set_number(opts, slots, 0); +} + We usually don't put defaults in QemuOpts. Instead, the code querying QemuOpts detects and handles absence of the parameter. Can be as simple as qemu_opt_get_size(opts, mem, DEFAULT_RAM_SIZE * 1024 * 1024). It allows to make number of uses a bit simpler: for example v6: QemuOpts *opts = qemu_opts_find(qemu_find_opts(memory-opts), NULL); if (!opts) { /* no -m x,... was passed to cmd line so no mem hotplug */ return; } mem_st-dev_count = qemu_opt_get_number(opts, slots, 0); becomes QemuOpts *opts = qemu_opts_find(qemu_find_opts(memory-opts), NULL); state-dev_count = qemu_opt_get_number(opts, slots, 0); and eliminates need to pepper code with DEFAULT_RAM_SIZE * 1024 * 1024 or like constants wherever qemu_opt_get_..() is called, that was main reason to set defaults. i.e. set defaults only once instead of spreading them in every place qemu_opt_get_..() is called. See also below. int main(int argc, char **argv, char **envp) { int i; @@ -2887,6 +2917,7 @@ int main(int argc, char **argv, char **envp) qemu_add_opts(qemu_tpmdev_opts); qemu_add_opts(qemu_realtime_opts); qemu_add_opts(qemu_msg_opts); +qemu_add_opts(qemu_mem_opts); runstate_init(); @@ -2901,7 +2932,8 @@ int main(int argc, char **argv, char **envp) module_call_init(MODULE_INIT_MACHINE); machine = find_default_machine(); cpu_model = NULL; -ram_size = 0; +ram_size = DEFAULT_RAM_SIZE * 1024 * 1024; +qemu_init_default_mem_opts(ram_size); snapshot = 0; cyls = heads = secs = 0; translation = BIOS_ATA_TRANSLATION_AUTO; @@ -3178,21 +3210,43 @@ int main(int argc, char **argv, char **envp)
Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table
On Tue, Nov 26, 2013 at 2:47 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 26/11/2013 13:40, Zhanghaoyu (A) ha scritto: When guest set irq smp_affinity, VMEXIT occurs, then the vcpu thread will IOCTL return to QEMU from hypervisor, then vcpu thread ask the hypervisor to update the irq routing table, in kvm_set_irq_routing, synchronize_rcu is called, current vcpu thread is blocked for so much time to wait RCU grace period, and during this period, this vcpu cannot provide service to VM, so those interrupts delivered to this vcpu cannot be handled in time, and the apps running on this vcpu cannot be serviced too. It's unacceptable in some real-time scenario, e.g. telecom. So, I want to create a single workqueue for each VM, to asynchronously performing the RCU synchronization for irq routing table, and let the vcpu thread return and VMENTRY to service VM immediately, no more need to blocked to wait RCU grace period. And, I have implemented a raw patch, took a test in our telecom environment, above problem disappeared. I don't think a workqueue is even needed. You just need to use call_rcu to free old after releasing kvm-irq_lock. What do you think? Can this cause an interrupt to be delivered to the wrong (old) vcpu? The way Linux sets interrupt affinity, it cannot, since changing the affinity is (IIRC) done in the interrupt handler, so the next interrupt cannot be in flight and thus pick up the old interrupt routing table. However it may be vulnerable in other ways.
[Qemu-devel] sparc64 with openbios-sparc64
Hello, Is there any news on emulating sparc64 machine with openbios-sparc64 on a x86_64 host machine? I followed some posts regarding this from 2010 however didn't end-up with a conclusion. Regards, Mahmood
Re: [Qemu-devel] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive
On 11/26/13 13:53, Markus Armbruster wrote: Thus, we grab *all* if=pflash drives for this purpose. Your stated use case wants just two. Hmm. Are we sure we'll never want to map an if=pflash device somewhere else? No, I'm not sure. Thanks Laszlo
Re: [Qemu-devel] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive
On 11/26/13 13:36, Markus Armbruster wrote: Your stated purpose for multiple -pflash: This accommodates the following use case: suppose that OVMF is split in two parts, a writeable host file for non-volatile variable storage, and a read-only part for bootstrap and decompressible executable code. Such a split between writable part and read-only part makes sense to me. How is it done in physical hardware? Single device with configurable write-protect, or two separate devices? (Jordan could help more.) Likely one device that's fully writeable. The flash driver (through which the NvVar updates go) makes sure that a kind of journal is written and that the live variable store is not corrupted even if power is cut during an update. However, if something writes to the flash without going through the driver, it can brick the board. (Trample over the bootstrap code for example.) I think. Laszlo
Re: [Qemu-devel] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive
On 11/26/13 14:11, Markus Armbruster wrote: Laszlo Ersek ler...@redhat.com writes: On 11/25/13 16:32, Markus Armbruster wrote: Laszlo Ersek ler...@redhat.com writes: This patch allows the user to usefully specify -drive file=img_1,if=pflash,format=raw,readonly \ -drive file=img_2,if=pflash,format=raw on the command line. The flash images will be mapped under 4G in their reverse unit order -- that is, with their base addresses progressing downwards, in increasing unit order. (The unit number increases with command line order if not explicitly specified.) This accommodates the following use case: suppose that OVMF is split in two parts, a writeable host file for non-volatile variable storage, and a read-only part for bootstrap and decompressible executable code. The binary code part would be read-only, centrally managed on the host system, and passed in as unit 0. The variable store would be writeable, VM-specific, and passed in as unit 1. ffe0-ffe1 (prio 0, R-): system.flash1 ffe2- (prio 0, R-): system.flash0 (If the guest tries to write to the flash range that is backed by the read-only drive, bdrv_write() in pflash_update() will correctly deny the write with -EACCES, and pflash_update() won't care, which suits us well.) Signed-off-by: Laszlo Ersek ler...@redhat.com --- hw/i386/pc_sysfw.c | 60 -- 1 file changed, 45 insertions(+), 15 deletions(-) diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c index e917c83..1c3e3d6 100644 --- a/hw/i386/pc_sysfw.c +++ b/hw/i386/pc_sysfw.c @@ -72,35 +72,65 @@ static void pc_isa_bios_init(MemoryRegion *rom_memory, memory_region_set_readonly(isa_bios, true); } +/* This function maps flash drives from 4G downward, in order of their unit + * numbers. Addressing within one flash drive is of course not reversed. + * + * The drive with unit number 0 is mapped at the highest address, and it is + * passed to pc_isa_bios_init(). Merging severral drives for isa-bios is not + * supported. + * + * The caller is responsible to pass in the non-NULL @pflash_drv that + * corresponds to the flash drive with unit number 0. + */ static void pc_system_flash_init(MemoryRegion *rom_memory, DriveInfo *pflash_drv) { +int unit = 0; BlockDriverState *bdrv; int64_t size; -hwaddr phys_addr; +hwaddr phys_addr = 0x1ULL; int sector_bits, sector_size; pflash_t *system_flash; MemoryRegion *flash_mem; +char name[64]; -bdrv = pflash_drv-bdrv; -size = bdrv_getlength(pflash_drv-bdrv); sector_bits = 12; sector_size = 1 sector_bits; -if ((size % sector_size) != 0) { -fprintf(stderr, -qemu: PC system firmware (pflash) must be a multiple of 0x%x\n, -sector_size); -exit(1); -} +do { +bdrv = pflash_drv-bdrv; +size = bdrv_getlength(bdrv); +if ((size % sector_size) != 0) { +fprintf(stderr, +qemu: PC system firmware (pflash segment %d) must be a +multiple of 0x%x\n, unit, sector_size); +exit(1); +} +if (size phys_addr) { +fprintf(stderr, qemu: pflash segments must fit under 4G +cumulatively\n); You're just following existing bad practice here, but correct use of error_report() would give you nicer messages. Happy to explain if you're interested. You have the Location thing in mind, eg. automatically binding the error message to the offending command line option, right? I've seen you use it before in another patch. Hm... It was commit ec2df8c10a4585ba4641ae482cf2f5f13daa810e. I will try to address the rest of your comments too when I get back to this patch. Thanks Laszlo I suspect things go haywire long before you hit address zero. Note that both pc_piix.c and pc_q35.c leave a hole in RAM just below 4G. The former's hole starts at 0xe000, the latter's at 0xb000. Should that be the limit? I wanted to do the bare minimal here, to catch obviously wrong backing drives (like a 10G file). This is already more verification than what the current code does. I wouldn't mind more a specific check, but I don't want to suggest (with more specific code) that it's actually *safe* to go down to the limit that I'd put here. For example, the IO-APIC mmio range is [0xFEE0..0xFEE01000[, leaving free 18MB-4KB just below 4G. (Of which the current OVMF, including variable store, takes up 2MB.) Grep IO_APIC_DEFAULT_ADDRESS. I just don't want to send the message it's safe to go all the way down there. Right now the responsibility is with the user (you can specify a single pflash device that's 20MB in size even now), and I'd like to stick with that. I believe that
Re: [Qemu-devel] [PATCHv2 1.8 1/9] qemu-img: add support for skipping zeroes in input during convert
On 26.11.2013 11:02, Paolo Bonzini wrote: Il 26/11/2013 09:56, Peter Lieven ha scritto: we currently do not check if a sector is allocated during convert. This means if a sector is unallocated that we allocate a bounce buffer of zeroes, find out its zero later and do not write it in the best case. In the worst case this can lead to reading blocks from a raw device (like iSCSI) altough we could easily know via get_block_status that they are zero and simply skip them. This patch also fixes the progress output not being at 100% after a successful conversion. Signed-off-by: Peter Lieven p...@kamp.de --- qemu-img.c | 85 ++-- 1 file changed, 48 insertions(+), 37 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index dc0c2f0..efb744c 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -1125,13 +1125,15 @@ out3: static int img_convert(int argc, char **argv) { -int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size, +int c, n, n1, bs_n, bs_i, compress, cluster_size, cluster_sectors, skip_create; +int64_t ret = 0; int progress = 0, flags; const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename; BlockDriver *drv, *proto_drv; BlockDriverState **bs = NULL, *out_bs = NULL; -int64_t total_sectors, nb_sectors, sector_num, bs_offset; +int64_t total_sectors, nb_sectors, sector_num, bs_offset, +sector_num_next_status = 0; uint64_t bs_sectors; uint8_t * buf = NULL; const uint8_t *buf1; @@ -1140,7 +1142,6 @@ static int img_convert(int argc, char **argv) QEMUOptionParameter *out_baseimg_param; char *options = NULL; const char *snapshot_name = NULL; -float local_progress = 0; int min_sparse = 8; /* Need at least 4k of zeros for sparse detection */ bool quiet = false; Error *local_err = NULL; @@ -1403,10 +1404,6 @@ static int img_convert(int argc, char **argv) sector_num = 0; nb_sectors = total_sectors; -if (nb_sectors != 0) { -local_progress = (float)100 / -(nb_sectors / MIN(nb_sectors, cluster_sectors)); -} for(;;) { int64_t bs_num; @@ -1464,7 +1461,7 @@ static int img_convert(int argc, char **argv) } } sector_num += n; -qemu_progress_print(local_progress, 100); +qemu_progress_print(100.0 * sector_num / total_sectors, 0); } /* signal EOF to align */ bdrv_write_compressed(out_bs, 0, NULL, 0); @@ -1481,21 +1478,13 @@ static int img_convert(int argc, char **argv) sector_num = 0; // total number of sectors converted so far nb_sectors = total_sectors - sector_num; -if (nb_sectors != 0) { -local_progress = (float)100 / -(nb_sectors / MIN(nb_sectors, IO_BUF_SIZE / 512)); -} for(;;) { nb_sectors = total_sectors - sector_num; if (nb_sectors = 0) { +ret = 0; break; } -if (nb_sectors = (IO_BUF_SIZE / 512)) { -n = (IO_BUF_SIZE / 512); -} else { -n = nb_sectors; -} while (sector_num - bs_offset = bs_sectors) { bs_i ++; @@ -1507,34 +1496,53 @@ static int img_convert(int argc, char **argv) sector_num, bs_i, bs_offset, bs_sectors); */ } -if (n bs_offset + bs_sectors - sector_num) { -n = bs_offset + bs_sectors - sector_num; -} - -/* If the output image is being created as a copy on write image, - assume that sectors which are unallocated in the input image - are present in both the output's and input's base images (no - need to copy them). */ -if (out_baseimg) { -ret = bdrv_is_allocated(bs[bs_i], sector_num - bs_offset, -n, n1); +if ((out_baseimg || has_zero_init) +sector_num = sector_num_next_status) { +n = nb_sectors INT_MAX ? INT_MAX : nb_sectors; +ret = bdrv_get_block_status(bs[bs_i], sector_num - bs_offset, +n, n1); if (ret 0) { -error_report(error while reading metadata for sector - % PRId64 : %s, - sector_num - bs_offset, strerror(-ret)); +error_report(error while reading block status of sector % + PRId64 : %s, sector_num - bs_offset, + strerror(-ret)); goto out; } -if (!ret) { +/* If the output image is zero initialized, we are
Re: [Qemu-devel] [edk2] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive
Il 25/11/2013 20:45, Laszlo Ersek ha scritto: From looking at hw/block/pflash_cfi01.c, it seems that the guest can interrogate the flash device about its size (nb_blocs is stored in cfi_table starting at 0x2D, and cfi_table can be queried by command 0x98 in pflash_read()). So, if the guest cares, it can figure out that there are multiple devices backing the range. I think. IIUC in the case of OVMF the guest just knows that there are multiple devices backing the range. Is that right? But yes, I think that the guest can figure out that there are multiple devices backing the range. From reading the pflash code further: * the pflash device doesn't care about the location where you write the command (the exception is the block erase command) * the pflash device only cares about the LSB you read from when you read data from it So you can use the last 256 bytes of the first flash (you know it ends at 4GB) to check for the device (there's probably some suggested protocol to do that, I don't know) and query its size. Example with -qtest stdio: writeb 0xff00 0x98 OK readb 0xfff2d OK 0x001f readb 0xfff2e OK 0x readb 0xfff2f OK 0x0010 readb 0xfff30 OK 0x writeb 0xff00 0x98 OK This means the device has 31+1 blocks each 4KB in size. You can then query the next device at 4GB-128KB-256, and so on. Paolo
[Qemu-devel] [PATCH] virtio-rng: correct the default limit rate
We have the following comment about the default limit rate: /* Set a default rate limit of 2^47 bytes per minute or roughly 2TB/s. If * you have an entropy source capable of generating more entropy than this * and you can pass it through via virtio-rng, then hats off to you. Until * then, this is unlimited for all practical purposes. */ But the current rate is (INT64_MAX) bytes per (1 16) ms, it's 128,000 TB/s 2TB/s is a reasonable rate, so this patch fixes the code, not update the document. * change the default period to 60,000 ms -- 1 mins * change the default max-bytes to 2^47 bytes -- INT64_MAX 16 INT64_MAX = 0x 8000 - 1 = 2 ^ 63 - 1 INT64_MAX 16 = 0x 8000 - 1 = 2 ^ 47 Signed-off-by: Amos Kong ak...@redhat.com --- include/hw/virtio/virtio-rng.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/hw/virtio/virtio-rng.h b/include/hw/virtio/virtio-rng.h index debaa15..d64e804 100644 --- a/include/hw/virtio/virtio-rng.h +++ b/include/hw/virtio/virtio-rng.h @@ -53,7 +53,7 @@ typedef struct VirtIORNG { */ #define DEFINE_VIRTIO_RNG_PROPERTIES(_state, _conf_field)\ DEFINE_PROP_UINT64(max-bytes, _state, _conf_field.max_bytes, \ - INT64_MAX), \ -DEFINE_PROP_UINT32(period, _state, _conf_field.period_ms, 1 16) + INT64_MAX 16), \ +DEFINE_PROP_UINT32(period, _state, _conf_field.period_ms, 6) #endif -- 1.8.3.1
Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table
Il 26/11/2013 14:18, Avi Kivity ha scritto: I don't think a workqueue is even needed. You just need to use call_rcu to free old after releasing kvm-irq_lock. What do you think? Can this cause an interrupt to be delivered to the wrong (old) vcpu? No, this would be exactly the same code that is running now: mutex_lock(kvm-irq_lock); old = kvm-irq_routing; kvm_irq_routing_update(kvm, new); mutex_unlock(kvm-irq_lock); synchronize_rcu(); kfree(old); return 0; Except that the kfree would run in the call_rcu kernel thread instead of the vcpu thread. But the vcpus already see the new routing table after the rcu_assign_pointer that is in kvm_irq_routing_update. There is still the problem that Gleb pointed out, though. Paolo The way Linux sets interrupt affinity, it cannot, since changing the affinity is (IIRC) done in the interrupt handler, so the next interrupt cannot be in flight and thus pick up the old interrupt routing table. However it may be vulnerable in other ways.
Re: [Qemu-devel] [edk2] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive
On 11/26/13 14:41, Paolo Bonzini wrote: Il 25/11/2013 20:45, Laszlo Ersek ha scritto: From looking at hw/block/pflash_cfi01.c, it seems that the guest can interrogate the flash device about its size (nb_blocs is stored in cfi_table starting at 0x2D, and cfi_table can be queried by command 0x98 in pflash_read()). So, if the guest cares, it can figure out that there are multiple devices backing the range. I think. IIUC in the case of OVMF the guest just knows that there are multiple devices backing the range. Is that right? No, the guest (the flash driver) is unaware of that. It could know if it wanted to, but it doesn't care. It cares about a base address and a size, and that range is subdivided into various roles. But how many flash devices back the range is not interesting for the driver. Jordan wrote the driver with one flash device in mind, and when I split the image in two parts, I took care to map them so that the base address, the size, and those boundaries stay the same. It's sufficient for the driver to continue working. But yes, I think that the guest can figure out that there are multiple devices backing the range. From reading the pflash code further: * the pflash device doesn't care about the location where you write the command (the exception is the block erase command) * the pflash device only cares about the LSB you read from when you read data from it So you can use the last 256 bytes of the first flash (you know it ends at 4GB) to check for the device (there's probably some suggested protocol to do that, I don't know) and query its size. Example with -qtest stdio: writeb 0xff00 0x98 OK readb 0xfff2d OK 0x001f readb 0xfff2e OK 0x readb 0xfff2f OK 0x0010 readb 0xfff30 OK 0x writeb 0xff00 0x98 OK This means the device has 31+1 blocks each 4KB in size. You can then query the next device at 4GB-128KB-256, and so on. Thanks for testing this in practice. Laszlo
Re: [Qemu-devel] [PATCH 16/17] migration: adjust migration_thread() process for page flipping
On 11/26/2013 08:54 PM, Paolo Bonzini wrote: Il 26/11/2013 13:03, Lei Li ha scritto: +if (pending_size pending_size = max_size +!runstate_needs_reset()) { qemu_savevm_state_iterate(s-file); I'm not sure why you need this. The adjustment here is to avoid the iteration stage for page flipping. Because pending_size = ram_save_remaining() * TARGET_PAGE_SIZE which is not 0 and pending_size max_size (0) at start. It's still not clear to me that avoiding the iteration stage is The purpose of it is not just for optimization, but to avoid the iteration for better alignment. The current flow of page flipping basically has two stages: 1) ram_save_setup stage, it will send all the bytes in this stages to destination, and send_pipefd by ram_control_before_iterate at the end of it. 2) ram_save_complete, it will start to transmit the ram page in ram_save_block, and send the device state after that. So it needs to adjust the current migration process to avoid the iteration stage. necessary. I think it's just an optimization to avoid scanning the bitmap, but: (1) Juan's bitmap optimization will make this mostly unnecessary (2) getting good downtime from page flipping will require postcopy anyway. And you said 'This is a bit ugly but I understand the need. Perhaps !runstate_needs_reset() like below?' :) Oops. I might have said this before thinking about postcopy and/or before seeing the benchmark results from Juan's patches. If this part of the patch is just an optimization, I'd rather leave it out for now. I am afraid that page flipping can not proceed correctly without this.. Thanks for putting up with me. :) Paolo -- Lei
Re: [Qemu-devel] [PATCH] virtio-rng: correct the default limit rate
Il 26/11/2013 14:43, Amos Kong ha scritto: /* Set a default rate limit of 2^47 bytes per minute or roughly 2TB/s. If * you have an entropy source capable of generating more entropy than this * and you can pass it through via virtio-rng, then hats off to you. Until * then, this is unlimited for all practical purposes. */ But the current rate is (INT64_MAX) bytes per (1 16) ms, it's 128,000 TB/s You are changing: * max-bytes from 2^63 to 2^47 * period from 65536 to 6 For a user, changing only period would have no effect, the limit rate would remain effectively infinite. Changing max-bytes would give a 7% higher rate after your patch. Not a big deal, and max-bytes is easier to explain after your patch (bytes/minute) than before (bytes/65536ms). Reviewed-by: Paolo Bonzini pbonz...@redhat.com * change the default period to 60,000 ms -- 1 mins * change the default max-bytes to 2^47 bytes -- INT64_MAX 16 INT64_MAX = 0x 8000 - 1 = 2 ^ 63 - 1 INT64_MAX 16 = 0x 8000 - 1 = 2 ^ 47 Signed-off-by: Amos Kong ak...@redhat.com --- include/hw/virtio/virtio-rng.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/hw/virtio/virtio-rng.h b/include/hw/virtio/virtio-rng.h index debaa15..d64e804 100644 --- a/include/hw/virtio/virtio-rng.h +++ b/include/hw/virtio/virtio-rng.h @@ -53,7 +53,7 @@ typedef struct VirtIORNG { */ #define DEFINE_VIRTIO_RNG_PROPERTIES(_state, _conf_field) \ DEFINE_PROP_UINT64(max-bytes, _state, _conf_field.max_bytes, \ - INT64_MAX), \ -DEFINE_PROP_UINT32(period, _state, _conf_field.period_ms, 1 16) + INT64_MAX 16), \ +DEFINE_PROP_UINT32(period, _state, _conf_field.period_ms, 6)
Re: [Qemu-devel] [PULL for-1.8 1/2] pc: disable pci-info
On Tue, Nov 26, 2013 at 12:00:51PM +0100, Gerd Hoffmann wrote: Hi, I think it's down to other qemu bugs (such as _CRS not covering all of PCI memory), we shall just fix them. i.e. the attached patch should fixup things. cheers, Gerd Yes, I think it's a start. Q35 is a bit harder because of the MMIO region. Do we want to tweak end too? There's all kind of stuff there so need to be careful ... From a81b8d66e24fd298ce7654d424a378337e6cf132 Mon Sep 17 00:00:00 2001 From: Gerd Hoffmann kra...@redhat.com Date: Tue, 26 Nov 2013 11:46:11 +0100 Subject: [PATCH] piix: fix 32bit pci hole Make the 32bit pci hole start at end of ram, so all possible address space is covered. Of course the firmware can use less than that. Leaving space unused is no problem, mapping pci bars outside the hole causes problems though. Signed-off-by: Gerd Hoffmann kra...@redhat.com --- hw/pci-host/piix.c | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index edc974e..1414a2b 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -345,15 +345,7 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, f-ram_memory = ram_memory; i440fx = I440FX_PCI_HOST_BRIDGE(dev); -/* Set PCI window size the way seabios has always done it. */ -/* Power of 2 so bios can cover it with a single MTRR */ -if (ram_size = 0x8000) { -i440fx-pci_info.w32.begin = 0x8000; -} else if (ram_size = 0xc000) { -i440fx-pci_info.w32.begin = 0xc000; -} else { -i440fx-pci_info.w32.begin = 0xe000; -} +i440fx-pci_info.w32.begin = ram_size; memory_region_init_alias(f-pci_hole, OBJECT(d), pci-hole, f-pci_address_space, pci_hole_start, pci_hole_size); -- 1.8.3.1
Re: [Qemu-devel] [PATCH 08/17] add unix_msgfd_lookup() to callback get_buffer
On 11/26/2013 07:31 PM, Paolo Bonzini wrote: Il 21/11/2013 10:11, Lei Li ha scritto: +/* + * recvmsg is called here to catch the control message for + * the exchange of PIPE file descriptor until it is received. + */ +len = recvmsg(s-sockfd, msg, 0); +if (len != -1) { +if (unix_msgfd_lookup(s, msg) 0) { +pipefd_passed = 1; +/* + * Do not count one byte taken by the PIPE file + * descriptor. + */ +len--; I think adding a byte in the middle of the stream is not reliable. Rather, you should transmit the socket always at the same place, for example in the first call of qemu_local_save_ram, after it has written the 64-bit field. I guess 'transmit the socket' you mean transmit the fd? Sorry that I am quite understand your suggestion here.. Do you mean that send_pipefd in the first call of qemu_local_save_ram after it has written the 64-bit field? In this way, get rid of qemu_local_send_pipefd? Currently, the fd control message is sent at the end of the stream in ram_save_setup stage, followed by the ram page. The control message of fd is always at the same place. The matching code in qemu_local_ram_load will be like this: static int qemu_local_ram_load(QEMUFile *f, void *opaque, uint64_t flags) { QEMUFileLocal *s = opaque; ram_addr_t addr; struct iovec iov; ssize_t ret = -EINVAL; if (!s-pipefd_received) { /* * send_pipefd was called at this point, and it wrote one byte * to the stream. */ qemu_get_byte(s); s-pipefd_received = true; } if (pipefd_passed) { ... } return -EINVAL; } Also, please move pipefd_passed within QEMUFileLocal. Thanks, Paolo -- Lei
Re: [Qemu-devel] [PATCH 14/17] add new RanState RAN_STATE_MEMORY_STALE
On 11/26/2013 08:28 PM, Paolo Bonzini wrote: Il 21/11/2013 10:11, Lei Li ha scritto: { RUN_STATE_DEBUG, RUN_STATE_SUSPENDED }, DEBUG - MEMORY_STALE is missing. Good catch, I will add it, thanks. :) Paolo { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED }, { RUN_STATE_SUSPENDED, RUN_STATE_RUNNING }, { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE }, +{ RUN_STATE_SUSPENDED, RUN_STATE_MEMORY_STALE }, -- Lei
[Qemu-devel] [PATCH] SPARC: Add and use CPU_FEATURE_CASA
The LEON3 processor has support for the CASA instruction which is normally only available for SPARC V9 processors. Binutils 2.24 and GCC 4.9 will support this instruction for LEON3. GCC uses it to generate C11 atomic operations. --- target-sparc/cpu.c |3 +- target-sparc/cpu.h |4 ++- target-sparc/helper.h |4 ++- target-sparc/ldst_helper.c | 26 +--- target-sparc/translate.c | 47 --- 5 files changed, 52 insertions(+), 32 deletions(-) diff --git a/target-sparc/cpu.c b/target-sparc/cpu.c index e7f878e..5806e59 100644 --- a/target-sparc/cpu.c +++ b/target-sparc/cpu.c @@ -458,7 +458,8 @@ static const sparc_def_t sparc_defs[] = { .mmu_trcr_mask = 0x, .nwindows = 8, .features = CPU_DEFAULT_FEATURES | CPU_FEATURE_TA0_SHUTDOWN | -CPU_FEATURE_ASR17 | CPU_FEATURE_CACHE_CTRL | CPU_FEATURE_POWERDOWN, +CPU_FEATURE_ASR17 | CPU_FEATURE_CACHE_CTRL | CPU_FEATURE_POWERDOWN | +CPU_FEATURE_CASA, }, #endif }; diff --git a/target-sparc/cpu.h b/target-sparc/cpu.h index 41194ec..f87d7fb 100644 --- a/target-sparc/cpu.h +++ b/target-sparc/cpu.h @@ -271,12 +271,14 @@ typedef struct sparc_def_t { #define CPU_FEATURE_ASR17(1 15) #define CPU_FEATURE_CACHE_CTRL (1 16) #define CPU_FEATURE_POWERDOWN(1 17) +#define CPU_FEATURE_CASA (1 18) #ifndef TARGET_SPARC64 #define CPU_DEFAULT_FEATURES (CPU_FEATURE_FLOAT | CPU_FEATURE_SWAP | \ CPU_FEATURE_MUL | CPU_FEATURE_DIV | \ CPU_FEATURE_FLUSH | CPU_FEATURE_FSQRT | \ - CPU_FEATURE_FMUL | CPU_FEATURE_FSMULD) + CPU_FEATURE_FMUL | CPU_FEATURE_FSMULD | \ + CPU_FEATURE_CASA) #else #define CPU_DEFAULT_FEATURES (CPU_FEATURE_FLOAT | CPU_FEATURE_SWAP | \ CPU_FEATURE_MUL | CPU_FEATURE_DIV | \ diff --git a/target-sparc/helper.h b/target-sparc/helper.h index 5e0eea1..9c4fd56 100644 --- a/target-sparc/helper.h +++ b/target-sparc/helper.h @@ -22,7 +22,6 @@ DEF_HELPER_1(popc, tl, tl) DEF_HELPER_4(ldda_asi, void, env, tl, int, int) DEF_HELPER_5(ldf_asi, void, env, tl, int, int, int) DEF_HELPER_5(stf_asi, void, env, tl, int, int, int) -DEF_HELPER_5(cas_asi, tl, env, tl, tl, tl, i32) DEF_HELPER_5(casx_asi, tl, env, tl, tl, tl, i32) DEF_HELPER_2(set_softint, void, env, i64) DEF_HELPER_2(clear_softint, void, env, i64) @@ -31,6 +30,9 @@ DEF_HELPER_2(tick_set_count, void, ptr, i64) DEF_HELPER_1(tick_get_count, i64, ptr) DEF_HELPER_2(tick_set_limit, void, ptr, i64) #endif +#if !defined(CONFIG_USER_ONLY) || defined(TARGET_SPARC64) +DEF_HELPER_5(cas_asi, tl, env, tl, tl, tl, i32) +#endif DEF_HELPER_3(check_align, void, env, tl, i32) DEF_HELPER_1(debug, void, env) DEF_HELPER_1(save, void, env) diff --git a/target-sparc/ldst_helper.c b/target-sparc/ldst_helper.c index 2936b58..c51b9b0 100644 --- a/target-sparc/ldst_helper.c +++ b/target-sparc/ldst_helper.c @@ -2224,33 +2224,35 @@ void helper_stf_asi(CPUSPARCState *env, target_ulong addr, int asi, int size, } } -target_ulong helper_cas_asi(CPUSPARCState *env, target_ulong addr, -target_ulong val1, target_ulong val2, uint32_t asi) +target_ulong helper_casx_asi(CPUSPARCState *env, target_ulong addr, + target_ulong val1, target_ulong val2, + uint32_t asi) { target_ulong ret; -val2 = 0xUL; -ret = helper_ld_asi(env, addr, asi, 4, 0); -ret = 0xUL; +ret = helper_ld_asi(env, addr, asi, 8, 0); if (val2 == ret) { -helper_st_asi(env, addr, val1 0xUL, asi, 4); +helper_st_asi(env, addr, val1, asi, 8); } return ret; } +#endif /* TARGET_SPARC64 */ -target_ulong helper_casx_asi(CPUSPARCState *env, target_ulong addr, - target_ulong val1, target_ulong val2, - uint32_t asi) +#if !defined(CONFIG_USER_ONLY) || defined(TARGET_SPARC64) +target_ulong helper_cas_asi(CPUSPARCState *env, target_ulong addr, +target_ulong val1, target_ulong val2, uint32_t asi) { target_ulong ret; -ret = helper_ld_asi(env, addr, asi, 8, 0); +val2 = 0xUL; +ret = helper_ld_asi(env, addr, asi, 4, 0); +ret = 0xUL; if (val2 == ret) { -helper_st_asi(env, addr, val1, asi, 8); +helper_st_asi(env, addr, val1 0xUL, asi, 4); } return ret; } -#endif /* TARGET_SPARC64 */ +#endif /* !defined(CONFIG_USER_ONLY) || defined(TARGET_SPARC64) */ void helper_ldqf(CPUSPARCState *env, target_ulong addr, int mem_idx) { diff --git a/target-sparc/translate.c b/target-sparc/translate.c index 38b4519..86743dc 100644 --- a/target-sparc/translate.c +++ b/target-sparc/translate.c @@ -2107,18 +2107,6 @@ static
Re: [Qemu-devel] [edk2] [qemu PATCH] hw/i386/pc_sysfw: support more than one flash drive
Il 26/11/2013 14:53, Laszlo Ersek ha scritto: IIUC in the case of OVMF the guest just knows that there are multiple devices backing the range. Is that right? No, the guest (the flash driver) is unaware of that. It could know if it wanted to, but it doesn't care. It cares about a base address and a size, and that range is subdivided into various roles. But how many flash devices back the range is not interesting for the driver. Jordan wrote the driver with one flash device in mind, and when I split the image in two parts, I took care to map them so that the base address, the size, and those boundaries stay the same. It's sufficient for the driver to continue working. Ah, I see it now. That's because the driver never needs to write an offset into the flash device. Offsets are communicated by writing to different memory addresses. Since the OVMF driver never queries the size of the device, it doesn't care whether the flash is one or two. Paolo
Re: [Qemu-devel] [PULL for-1.8 1/2] pc: disable pci-info
On 11/26/13 10:10, Michael S. Tsirkin wrote: seabios manages to enumerate PCI with information exported from qemu so why can't OVMF? SeaBIOS and qemu duplicate logic (code) between each other. src/fw/pciinit.c grabs RamSize over fw_cfg, and i440fx_mem_addr_setup() sets pcimem_start to one of three values based on RamSize. (One of the three is not explicit there, it's the build default 0xe000.) The same code is visible in qemu in i440fx_init(). I duplicated the same logic in OVMF's PEI one week ago, and it solved the problem. See the patch attached to http://thread.gmane.org/gmane.comp.bios.tianocore.devel/4881/focus=4995. But code/logic duplication is ugly, which is why I've been looking for a better, dynamic solution ever since. I think it's down to other qemu bugs (such as _CRS not covering all of PCI memory), we shall just fix them. I don't know how to fix them. I don't know how to enumerate all PCI regions in use, plus all unassigned ranges, from below, like with a MemoryListener. If I understand correctly, Igor suggested to track devices as they map their regions, but (again, if I understood correctly) you didn't seem to like the idea. Thanks Laszlo
Re: [Qemu-devel] [PATCH 16/17] migration: adjust migration_thread() process for page flipping
Il 26/11/2013 14:53, Lei Li ha scritto: 1) ram_save_setup stage, it will send all the bytes in this stages to destination, and send_pipefd by ram_control_before_iterate at the end of it. ram_save_setup runs doesn't send anything from guest RAM. It sends the lengths of the various blocks. As you said, at the end of ram_save_setup you send the pipefd. ram_save_iterate runs before ram_save_complete. ram_save_iterate and ram_save_complete write data with exactly the same format. Both of them can use ram_save_page It should not matter if some pages are sent as part of ram_save_iterate and others as part of ram_save_complete. One possibility is that you are hitting a bug due to the way you ignore the 0x01 byte that send_pipefd places on the socket. Oops. I might have said this before thinking about postcopy and/or before seeing the benchmark results from Juan's patches. If this part of the patch is just an optimization, I'd rather leave it out for now. I am afraid that page flipping can not proceed correctly without this.. I really would like to understand why, because it really shouldn't (this shouldn't be a place where you need a hook). Paolo
Re: [Qemu-devel] [PULL for-1.8 1/2] pc: disable pci-info
On 11/26/13 15:04, Michael S. Tsirkin wrote: On Tue, Nov 26, 2013 at 12:00:51PM +0100, Gerd Hoffmann wrote: Hi, I think it's down to other qemu bugs (such as _CRS not covering all of PCI memory), we shall just fix them. i.e. the attached patch should fixup things. cheers, Gerd Yes, I think it's a start. Q35 is a bit harder because of the MMIO region. Do we want to tweak end too? There's all kind of stuff there so need to be careful ... I sent you a very similar patch last evening, and you ignored it. In that same email I asked about the mmconfig stuff as well, and you ignored that too. Attached. From a81b8d66e24fd298ce7654d424a378337e6cf132 Mon Sep 17 00:00:00 2001 From: Gerd Hoffmann kra...@redhat.com Date: Tue, 26 Nov 2013 11:46:11 +0100 Subject: [PATCH] piix: fix 32bit pci hole Make the 32bit pci hole start at end of ram, so all possible address space is covered. Of course the firmware can use less than that. Leaving space unused is no problem, mapping pci bars outside the hole causes problems though. Signed-off-by: Gerd Hoffmann kra...@redhat.com --- hw/pci-host/piix.c | 10 +- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index edc974e..1414a2b 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -345,15 +345,7 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, f-ram_memory = ram_memory; i440fx = I440FX_PCI_HOST_BRIDGE(dev); -/* Set PCI window size the way seabios has always done it. */ -/* Power of 2 so bios can cover it with a single MTRR */ -if (ram_size = 0x8000) { -i440fx-pci_info.w32.begin = 0x8000; -} else if (ram_size = 0xc000) { -i440fx-pci_info.w32.begin = 0xc000; -} else { -i440fx-pci_info.w32.begin = 0xe000; -} +i440fx-pci_info.w32.begin = ram_size; This doesn't clamp the w32.begin value into [0x8000, 0xe000], which seems wrong. Guys, I'm confused and annoyed out of my brains by the divergence here. In my perception Michael, Igor, and Gerd are all proposing different things, and my ideas are either rejected or ignored (even though they occasionally overlap with ideas from others). After struggling with this for more than a week, and having being awake for 27 hrs, I'm officially stopping to care right now. Ping me when qemu has something to offer that's neither ridden by SeaBIOS legacy nor requires an AML interpreter in OVMF's PEI phase. Thanks, Laszlo ---BeginMessage--- CC'ing Igor On 11/25/13 14:24, Michael S. Tsirkin wrote: On Sun, Nov 24, 2013 at 06:43:38PM +0100, Laszlo Ersek wrote: Hello Michael, I didn't completely get your point on IRC regarding this subject. You were saying that OVMF shouldn't be changed, but the boundaries exported by qemu should (and that qemu should stop caring about SeaBIOS tradition in this regard). You mentioned MMCONFIG and I don't understand that reference. Can you please elaborate how you think we should proceed? Thanks! Laszlo I refer to this: /* Leave enough space for the biggest MCFG BAR */ /* TODO: this matches current bios behaviour, but * it's not a power of two, which means an MTRR * can't cover it exactly. */ s-mch.pci_info.w32.begin = MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT + MCH_HOST_BRIDGE_PCIEXBAR_MAX; s-mch.pci_info.w32.end = IO_APIC_DEFAULT_ADDRESS; We must make sure CRS does not include the mmcfg region. Sorry but I still don't understand :) First, OVMF is currently for i440fx only, and the quoted part is from hw/pci-host/q35.c. I of course agree that we should fix this in a uniform manner in qemu -- just saying that for OVMF the following would suffice: diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c index edc974e..9753fea 100644 --- a/hw/pci-host/piix.c +++ b/hw/pci-host/piix.c @@ -345,12 +345,10 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, f-ram_memory = ram_memory; i440fx = I440FX_PCI_HOST_BRIDGE(dev); -/* Set PCI window size the way seabios has always done it. */ -/* Power of 2 so bios can cover it with a single MTRR */ if (ram_size = 0x8000) { i440fx-pci_info.w32.begin = 0x8000; -} else if (ram_size = 0xc000) { -i440fx-pci_info.w32.begin = 0xc000; +} else if (ram_size = 0xe000) { +i440fx-pci_info.w32.begin = ram_size; } else { i440fx-pci_info.w32.begin = 0xe000; } Second, the code that you quoted from hw/pci-host/q35.c seems to do exactly what you're saying we must do. static Property mch_props[] = { DEFINE_PROP_UINT64(PCIE_HOST_MCFG_BASE, Q35PCIHost, parent_obj.base_addr, MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT), This sets the default mmconfig base address at MCH_HOST_BRIDGE_PCIEXBAR_DEFAULT. The code you qouted seems to ensure that the range starting at
Re: [Qemu-devel] [PATCH] virtio-rng: correct the default limit rate
On Tue, Nov 26, 2013 at 02:58:26PM +0100, Paolo Bonzini wrote: Il 26/11/2013 14:43, Amos Kong ha scritto: /* Set a default rate limit of 2^47 bytes per minute or roughly 2TB/s. If * you have an entropy source capable of generating more entropy than this * and you can pass it through via virtio-rng, then hats off to you. Until * then, this is unlimited for all practical purposes. */ But the current rate is (INT64_MAX) bytes per (1 16) ms, it's 128,000 TB/s You are changing: * max-bytes from 2^63 to 2^47 * period from 65536 to 6 For a user, changing only period would have no effect, the limit rate would remain effectively infinite. Changing max-bytes would give a 7% higher rate after your patch. The rate will be effected when period OR max-bytes is changed. Before patch (as documenet described): 2^47 / 65536 = 1.953125 TB/s Applied patch: 2^47 / 600,000 = 2.1328125 TB/s (2.1328125 - 1.953125) / 1.953125 = 0.092 It increased 9.2%, as you said, not a big deal. Not a big deal, and max-bytes is easier to explain after your patch (bytes/minute) than before (bytes/65536ms). Reviewed-by: Paolo Bonzini pbonz...@redhat.com Thanks for the quick feedback. Amos
Re: [Qemu-devel] [PATCH] virtio-rng: correct the default limit rate
[adding libvirt] On 11/26/2013 06:58 AM, Paolo Bonzini wrote: Il 26/11/2013 14:43, Amos Kong ha scritto: /* Set a default rate limit of 2^47 bytes per minute or roughly 2TB/s. If * you have an entropy source capable of generating more entropy than this * and you can pass it through via virtio-rng, then hats off to you. Until * then, this is unlimited for all practical purposes. */ But the current rate is (INT64_MAX) bytes per (1 16) ms, it's 128,000 TB/s You are changing: * max-bytes from 2^63 to 2^47 * period from 65536 to 6 For a user, changing only period would have no effect, the limit rate would remain effectively infinite. Changing max-bytes would give a 7% higher rate after your patch. Not a big deal, and max-bytes is easier to explain after your patch (bytes/minute) than before (bytes/65536ms). Reviewed-by: Paolo Bonzini pbonz...@redhat.com Hmm. Libvirt is already converting a user's rate of bytes/period into the qemu parameters, defaulting to 1 second as its default period. Am I correct that as long as libvirt specified both rate AND period, then this change has no impact (and that the 7% change occurs if you specify period while leaving max-bytes alone)? Or is this an ABI change where libvirt will have to be taught to be smart enough to know whether it is old qemu or new qemu to adjust how libvirt does its calculations when converting the user's rate into qemu terms? -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] KVM call agenda for 2013-11-26
Juan Quintela quint...@redhat.com wrote: Hi Please, send any topic that you are interested in covering. As there are no topics for agenda, call is cancelled. Later, Juan. Thanks, Juan. Call details: 10:00 AM to 11:00 AM EDT Every two weeks If you need phone number details, contact me privately.
Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table
On Tue, Nov 26, 2013 at 3:47 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 26/11/2013 14:18, Avi Kivity ha scritto: I don't think a workqueue is even needed. You just need to use call_rcu to free old after releasing kvm-irq_lock. What do you think? Can this cause an interrupt to be delivered to the wrong (old) vcpu? No, this would be exactly the same code that is running now: mutex_lock(kvm-irq_lock); old = kvm-irq_routing; kvm_irq_routing_update(kvm, new); mutex_unlock(kvm-irq_lock); synchronize_rcu(); kfree(old); return 0; Except that the kfree would run in the call_rcu kernel thread instead of the vcpu thread. But the vcpus already see the new routing table after the rcu_assign_pointer that is in kvm_irq_routing_update. I understood the proposal was also to eliminate the synchronize_rcu(), so while new interrupts would see the new routing table, interrupts already in flight could pick up the old one.
Re: [Qemu-devel] [PATCH 4/4] tpm: Provide libtpms software TPM backend
On 11/25/2013 10:04 PM, Xu, Quan wrote: Thanks Bryant, this problem has been solved by following http://www.mail-archive.com/qemu-devel@nongnu.org/msg200808.html;. But there is another problem when run configure with ./configure --target-list=x86_64-softmmu --enable-tpm. The value of libtpms is still no. when I modified tpm_libtpms to yes in configure file directly and make, then reported with error hw/tpm/tpm_libtpms.c:21:33: fatal error: libtpms/tpm_library.h: No such file or directory. Now I am installing libtpms with https://github.com/coreycb/libtpms for libtpms lib. Could you share specific step to configure QEMU based on your patch, if it comes easily to you? Here's what I've been using to build libtpms: $ CFLAGS='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic' $ export CFLAGS $ ./configure --build=x86_64-redhat-linux-gnu --prefix=/usr --libdir=/usr/lib64 $ make $ sudo make install And then the configure you're using above should work for QEMU. BTW, one target of my team is enabling stubdom vtpm for HVM virtual machine on Xen virtualization, your patches and seabios are big breakthroughs. My team is very interested to collaborate with you / Qemu community on similar areas. That's great to hear! Unfortunately, the current approach of linking QEMU against libtpms doesn't look like it's going to make it upstream. So it looks like we need to take a different approach. Btw, I thought Xen already had TPM support. Is that not supported in stubdom's? -- Regards, Corey Bryant I'd be really pleased if you can help me on these issues. Quan Xu Intel -Original Message- From: Corey Bryant [mailto:cor...@linux.vnet.ibm.com] Sent: Monday, November 25, 2013 9:53 PM To: Xu, Quan Cc: qemu-devel@nongnu.org Subject: Re: [Qemu-devel] [PATCH 4/4] tpm: Provide libtpms software TPM backend On 11/24/2013 10:36 PM, Xu, Quan wrote: Bryant, I found that there is some conflict in qemu-options.hx between your patch andqemu-1.7.0-rc1.tar.bz2 http://wiki.qemu-project.org/download/qemu-1.7.0-rc1.tar.bz2. What QEMU version does this patch base on? Thanks. Quan Xu Intel Thanks Quan. I believe I built these on top of commit c2d30667760e3d7b81290d801e567d4f758825ca. I don't think this series is going to make it upstream though so I likely won't be submitting a v2. -- Regards, Corey Bryant
Re: [Qemu-devel] [PATCH] virtio-rng: correct the default limit rate
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Il 26/11/2013 15:32, Eric Blake ha scritto: Hmm. Libvirt is already converting a user's rate of bytes/period into the qemu parameters, defaulting to 1 second as its default period. Am I correct that as long as libvirt specified both rate AND period, then this change has no impact (and that the 7% change occurs if you specify period while leaving max-bytes alone)? Or is this an ABI change where libvirt will have to be taught to be smart enough to know whether it is old qemu or new qemu to adjust how libvirt does its calculations when converting the user's rate into qemu terms? Yes, it is the former---i.e. you're right. Paolo -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.22 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQIcBAEBAgAGBQJSlLPjAAoJEBvWZb6bTYbyDg8QAIk8BIh5ZY0w4aF7AHufxpmn XtvbGIAdb+ztNBtPrL5HJXAvaOvbZNfktmKDA7Cee1Cq8opJPUZQDpwmv7xUshpG yc1YQu3Djb0/kCDV0oNiMQARSwxJ3XMR94SSLfmlI3b7lJVDeviouhV4UdjS3eal ZNWUXya+1bloOC+zX4P0atB5hyrpVhcVHddWLsIs4+6EYcOatHPgl5CK5f5uxq+P 5qkAe1FJTHMkBOJ86kis3s57ZTTqIlkvFiqtJY4IViifHjbMR4PJraO6EQigOxf6 VdV+MIkPm3GA6F3SYOoZNacITBsaRXTJ0kjD+BjUUAFONB7X8//Uob5gSUgmkED7 ATqiDnqSq1jwyV4FAVwYDB397ViLAZYd9k/ai4QZLaclyMqHQKx+RCNN7dmFwCuL EYK0Z9AkoVkJJQQUcERO3ps/T89+tlBVQhBKpae1V2ZKJqCGxLaf0344S+kFicYk bqp87ra5L5J8Qdn+/eBMJsQNa5NDWjKU0TvZSr3tGGoGKKamzwFmnsTzcZ9DHoTZ IVcdHAbToCYnqbnJzORoxFVycSrY8B2AhZOqIz5YQgNV1LDPcK7hzcXZ+rpBu5mU zF9MFHBKrGJo7l4GyC4jFa5osDfGOyGzMyoSvlMuFTG8CUXN/QLmcpzE+NL1QlCj lL7TLkbhYKrQxUPw37+P =mb5e -END PGP SIGNATURE-
Re: [Qemu-devel] [RFC] create a single workqueue for each vm to update vm irq routing table
Il 26/11/2013 15:36, Avi Kivity ha scritto: No, this would be exactly the same code that is running now: mutex_lock(kvm-irq_lock); old = kvm-irq_routing; kvm_irq_routing_update(kvm, new); mutex_unlock(kvm-irq_lock); synchronize_rcu(); kfree(old); return 0; Except that the kfree would run in the call_rcu kernel thread instead of the vcpu thread. But the vcpus already see the new routing table after the rcu_assign_pointer that is in kvm_irq_routing_update. I understood the proposal was also to eliminate the synchronize_rcu(), so while new interrupts would see the new routing table, interrupts already in flight could pick up the old one. Isn't that always the case with RCU? (See my answer above: the vcpus already see the new routing table after the rcu_assign_pointer that is in kvm_irq_routing_update). If you eliminate the synchronize_rcu, new interrupts would see the new routing table, while interrupts already in flight will get a dangling pointer. Paolo